Author: Williams D.

Tags: mathematics, probability theory

Year: 1991

Text
                    Probability with Martingales
David Williams
Statistical Laboratory, DPMMS
Cambridge University
The right of the
University of Cambridge
to print and sell
all manner of books
was granted by
Henry VIII in 1534.
The University has printed
and published continuously
since 1584.
CAMBRIDGE UNIVERSITY PRESS
Cambridge
New York Port Chester
Melbourne Sydney


Published by the Press Syndicate of the University of Cambridge
The Pitt Building, Trumpington Street, Cambridge CB2 1RP
40 West 20th Street, New York, NY 10011, USA
10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1991

Printed in Great Britain at the University Press, Cambridge

British Library cataloguing in publication data available
Library of Congress cataloging in publication data available
Contents Preface — please read! xi A Question of Terminology xiii A Guide to Notation xiv Chapter 0: A Branching-Process Example 1 0.0. Introductory remarks. 0.1. Typical number of children, X. 0.2. Size of nth generation, Zn. 0.3. Use of conditional expectations. 0.4. Extinction probability, tt. 0.5. Pause for thought: measure. 0.6. Our first martingale. 0.7. Convergence (or not) of expectations. 0.8. Finding the distribution of . 0.9. Concrete example. PART A: FOUNDATIONS Chapter 1: Measure Spaces 14 1.0. Introductory remarks. 1.1. Definitions of algebra, a-algebra. 1.2. Ex- Examples. Borel cr-algebras, B(S), B = B(R). 1.3. Definitions concerning set functions. 1.4. Definition of measure space. 1.5. Definitions con- concerning measures. 1.6. Lemma. Uniqueness of extension, x-systems. 1.7. Theorem. Caratheodory's extension theorem. 1.8. Lebesgue measure Leb on (@, l],#@,1]). 1.9. Lemma. Elementary inequalities. 1.10. Lemma. Monotone-convergence properties of measures. 1.11. Example/Warning. Chapter 2: Events 23 2.1. Model for experiment: (O,^7, P). 2.2. The intuitive meaning. 2.3. Examples of (Q^J7) pairs. 2.4. Almost surely (a.s.) 2.5. Reminder: limsup,liminf,| lim, etc. 2.6. Definitions. limsupJ5n, B£n,i.o.). 2.7.
vi Contents First Borel-Cantelli Lemma (BCl). 2.8. Definitions, liminf En,(En,ev). 2.9. Exercise. Chapter 3: Random Variables 29 3.1. Definitions. S-measurable function, mS, (mS)+,bS. 3.2. Elementary Propositions on measurability. 3.3. Lemma. Sums and products of mea- measurable functions are measurable. 3.4. Composition Lemma. 3.5. Lemma on measurability of infs, liminfs of functions. 3.6. Definition. Random variable. 3.7. Example. Coin tossing. 3.8. Definition, cr-algebra generated by a collection of functions on Q. 3.9. Definitions. Law, Distribution Func- Function. 3.10. Properties of distribution functions. 3.11. Existence of random variable with given distribution function. 3.12. Skorokod representation of a random variable with prescribed distribution function. 3.13. Generated a-algebras - a discussion. 3.14. The Monotone-Class Theorem. Chapter 4: Independence 38 4.1. Definitions of independence. 4.2. The 7r-system Lemma; and the more familiar definitions. 4.3. Second Borel-Cantelli Lemma (BC2). 4.4. Example. 4.5. A fundamental question for modelling. 4.6. A coin-tossing model with applications. 4.7. Notation: IID RVs. 4.8. Stochastic processes; Markov chains. 4.9. Monkey typing Shakespeare. 4.10. Definition. Tail c- algebras. 4.11. Theorem. Kolmogorov's 0-1 law. 4.12. Exercise/Warning. Chapter 5: Integration 49 5.0. Notation, etc. /i(/) :=: J fdfi, fi(f\A). 5.1. Integrals of non-negative simple functions, SF+. 5.2. Definition of//(/), / £ (mS)+. 5.3. Monotone- Convergence Theorem (MON). 5.4. The Fatou Lemmas for functions (FA- TOU). 5.5. 'Linearity'. 5.6. Positive and negative parts of /. 5.7. Inte- grable function, C1 (S, S,//). 5.8. Linearity. 5.9. Dominated Convergence Theorem (DOM). 5.10. Scheffe's Lemma (SCHEFFE). 5.11. Remark on uniform integrability. 5.12. The standard machine. 5.13. Integrals over subsets. 5.14. The measure ///, / £ (mS)+. Chapter 6: Expectation 58 Introductory remarks. 6.1. Definition of expectation. 6.2. Convergence theorems. 6.3. The notation E(X;F). 6.4. Markov's inequality. 6.5. Sums of non-negative RVs. 6.6. Jensen's inequality for convex functions. 6.7. Monotonicity of Cp norms. 6.S. The Schwarz inequality. 6.9. C2: Pythagoras, covariance, etc. 6.10. Completeness of Cp A < p < oo). 6.11. Orthogonal projection. 6.12. The 'elementary formula' for expectation. 6.13. Holder from Jensen.
Contents vii Chapter 7: An Easy Strong Law 71 7.1. 'Independence means multiply' - again! 7.2. Strong Law - first version. 7.3. Chebyshev's inequality. 7.4. Weierstrass approximation theorem. Chapter 8: Product Measure 75 8.0. Introduction and advice. 8.1. Product measurable structure, Si X £2. 8.2. Product measure, Fubini's Theorem. 8.3. Joint laws, joint pdfs. 8.4. Independence and product measure. 8.5. B(R)n = B(Rn). 8.6. The n-fold extension. 8.7. Infinite products of probability triples. 8.8. Technical note on the existence of joint laws. PART B: MARTINGALE THEORY Chapter 9: Conditional Expectation 83 9.1. A motivating example. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933). 9.3. The intuitive meaning. 9.4. Conditional ex- expectation as least-squares-best predictor. 9.5. Proof of Theorem 9.2. 9.6. Agreement with traditional expression. 9.7. Properties of conditional ex- expectation: a list. 9.8. Proofs of the properties in Section 9.7. 9.9. Regular conditional probabilities and pdfs. 9.10. Conditioning under independence assumptions. 9.11. Use of symmetry: an example. Chapter 10: Martingales 93 10.1. Filtered spaces. 10.2. Adapted processes. 10.3. Martingale, super- martingale, submartingale. 10.4. Some examples of martingales. 10.5. Fair and unfair games. 10.6. Previsible process, gambling strategy. 10.7. A fun- fundamental principle: you can't beat the system! 10.8. Stopping time. 10.9. Stopped supermartingales are supermartingales. 10.10. Doob's Optional- Stopping Theorem. 10.11. Awaiting the almost inevitable. 10.12. Hitting times for simple random walk. 10.13. Non-negative superharmonic func- functions for Markov chains. Chapter 11: The Convergence Theorem 106 11.1. The picture that says it all. 11.2. Upcrossings. 11.3. Doob's Upcross- ing Lemma. 11.4. Corollary. 11.5. Doob's'Forward'Convergence Theorem. 11.6. Warning. 11.7. Corollary.
viii Contents Chapter 12: Martingales bounded in C2 110 12.0. Introduction. 12.1. Martingales in C2: orthogonality of increments. 12.2. Sums of zero-mean independent random variables in C2. 12.3. Ran- Random signs. 12.4. A symmetrization technique: expanding the sample space. 12.5. Kolmogorov's Three-Series Theorem. 12.6. Cesaro's Lemma. 12.7. Kronecker's Lemma. 12.8. A Strong Law under variance constraints. 12.9. Kolmogorov's Truncation Lemma. 12.10. Kolmogorov's Strong Law of Large Numbers (SLLN). 12.11. Doob decomposition. 12.12. The angle- brackets process (M). 12.13. Relating convergence of M to finiteness of (M)oq. 12.14. A trivial 'Strong Law' for martingales in £2. 12.15. Levy's extension of the Borel-Cantelli Lemmas. 12.16. Comments. Chapter 13: Uniform Integrability 126 13.1. An 'absolute continuity' property. 13.2. Definition. UI family. 13.3. Two simple sufficient conditions for the UI property. 13.4. UI property of conditional expectations. 13.5. Convergence in probability. 13.6. Ele- Elementary proof of (BDD). 13.7. A necessary and sufficient condition for Cl convergence. Chapter 14: UI Martingales 133 14.0. Introduction. 14.1. UI martingales. 14.2. Levy's 'Upward' Theorem. 14.3. Martingale proof of Kolmogorov's 0-1 law. 14.4. Levy's 'Downward' Theorem. 14.5. Martingale proof of the Strong Law. 14.6. Doob's Sub- martingale Inequality. 14.7. Law of the Iterated Logarithm: special case. 14.8. A standard estimate on the normal distribution. 14.9. Remarks on ex- exponential bounds; large deviation theory. 14.10. A consequence of Holder's inequality. 14.11. Doob's Cp inequality. 14.12. Kakutani's Theorem on 'product' martingales. 14.13.The Radon-Nikodym theorem. 14.14. The Radon-Nikodym theorem and conditional expectation. 14.15. Likelihood ratio; equivalent measures. 14.16. Likelihood ratio and conditional expec- expectation. 14.17. Kakutani's Theorem revisited; consistency of LR test. 14.18. Note on Hardy spaces, etc. Chapter 15: Applications 153 15.0. Introduction - please read! 15.1. A trivial martingale-representation result. 15.2. Option pricing; discrete Black-Scholes formula. 15.3. The Mabinogion sheep problem. 15.4. Proof of Lemma 15.3(c). 15.5. Proof of result 15.3(d). 15.6. Recursive nature of conditional probabilities. 15.7. Bayes' formula for bivariate normal distributions. 15.8. Noisy observation of a single random variable. 15.9. The Kalman-Bucy filter. 15.10. Harnesses entangled. 15.11. Harnesses unravelled, 1. 15.12. Harnesses unravelled, 2.
Contents ix PART C: CHARACTERISTIC FUNCTIONS Chapter 16: Basic Properties of CFs 172 16.1. Definition. 16.2. Elementary properties. 16.3. Some uses of char- characteristic functions. 16.4. Three key results. 16.5. Atoms. 16.6. Levy's Inversion Formula. 16.7. A table. Chapter 17: Weak Convergence 179 17.1. The 'elegant' definition. 17.2. A 'practical' formulation. 17.3. Sko- rokhod representation. 17.4. Sequential compactness for Prob(R). 17.5. Tightness. Chapter 18: The Central Limit Theorem 185 18.1. Levy's Convergence Theorem. 18.2. o and 0 notation. 18.3. Some important estimates. 18.4. The Central Limit Theorem. 18.5. Example. 18.6. CF proof of Lemma 12.4. APPENDICES Chapter Al: Appendix to Chapter 1 192 Al.l. A non-measurable subset A of S1. A1.2. J-systems. A1.3. Dynkin's Lemma. A1.4. Proof of Uniqueness Lemma 1.6. A1.5. A-sets: 'algebra' case. A1.6. Outer measures. Al.7. Caratheodory's Lemma. A1.8. Proof of Caratheodory's Theorem. Al.9. Proof of the existence of Lebesgue measure on (@, l],#@,1]). A1.10. Example of non-uniqueness of extension. Al.ll. Completion of a measure space. A1.12. The Baire category theorem. Chapter A3: Appendix to Chapter 3 205 A3.1. Proof of the Monotone-Class Theorem 3.14. A3.2. Discussion of generated a-algebras. Chapter A4: Appendix to Chapter 4 208 A4.1. Kolmogorov's Law of the Iterated Logarithm. A4.2. Strassen's Law of the Iterated Logarithm. A4.3. A model for a Markov chain. Chapter A5: Appendix to Chapter 5 211 A5.1. Doubly monotone arrays. A5.2. The key use of Lemma 1.10(a). A5.3. 'Uniqueness of integral'. A5.4. Proof of the Monotone-Convergence Theorem.
x Contents Chapter A9: Appendix to Chapter 9 214 A9.1. Infinite products: setting things up. A9.2. Proof of A9.1(e). Chapter A13: Appendix to Chapter 13 217 A13.1. Modes of convergence: definitions. A13.2. Modes of convergence: relationships. Chapter A14: Appendix to Chapter 14 219 A14.1. The cr-algebra Tt<> T a stopping time. A14.2. A special case of OST. A14.3. Doob's Optional-Sampling Theorem for UI martingales. A14.4. The result for UI submartingales. Chapter A16: Appendix to Chapter 16 222 A 16.1. Differentiation under the integral sign. Chapter E: Exercises 224 References 243 Index 246
Preface - please read!

The most important chapter in this book is Chapter E: Exercises. I have left the interesting things for you to do. You can start now on the 'EG' exercises, but see 'More about exercises' later in this Preface.

The book, which is essentially the set of lecture notes for a third-year undergraduate course at Cambridge, is as lively an introduction as I can manage to the rigorous theory of probability. Since much of the book is devoted to martingales, it is bound to become very lively: look at those Exercises on Chapter 10!

But, of course, there is that initial plod through the measure-theoretic foundations. It must be said however that measure theory, that most arid of subjects when done for its own sake, becomes amazingly more alive when used in probability, not only because it is then applied, but also because it is immensely enriched.

You cannot avoid measure theory: an event in probability is a measurable set, a random variable is a measurable function on the sample space, the expectation of a random variable is its integral with respect to the probability measure; and so on. To be sure, one can take some central results from measure theory as axiomatic in the main text, giving careful proofs in appendices; and indeed that is exactly what I have done.

Measure theory for its own sake is based on the fundamental addition rule for measures. Probability theory supplements that with the multiplication rule which describes independence; and things are already looking up. But what really enriches and enlivens things is that we deal with lots of σ-algebras, not just the one σ-algebra which is the concern of measure theory.

In planning this book, I decided for every topic what things I considered just a bit too advanced, and, often with sadness, I have ruthlessly omitted them. For a more thorough training in many of the topics covered here, see Billingsley (1979), Chow and Teicher (1978), Chung (1968), Kingman and
Taylor (1966), Laha and Rohatgi (1979), and Neveu (1965). As regards measure theory, I learnt it from Dunford and Schwartz (1958) and Halmos (1959). After reading this book, you must read the still-magnificent Breiman (1968), and, for an excellent indication of what can be done with discrete martingales, Hall and Heyde (1980).

Of course, intuition is much more important than knowledge of measure theory, and you should take every opportunity to sharpen your intuition. There is no better whetstone for this than Aldous (1989), though it is a very demanding book. For appreciating the scope of probability and for learning how to think about it, Karlin and Taylor (1981), Grimmett and Stirzaker (1982), Hall (1988), and Grimmett's recent superb book, Grimmett (1989), on percolation are strongly recommended.

More about exercises. In compiling Chapter E, which consists exactly of the homework sheet I give to the Cambridge students, I have taken into account the fact that this book, like any other mathematics book, implicitly contains a vast number of other exercises, many of which are easier than those in Chapter E. I refer of course to the exercises you create by reading the statement of a result, and then trying to prove it for yourself, before you read the given proof.

One other point about exercises: you will, for example, surely forgive my using expectation E in Exercises on Chapter 4 before E is treated with full rigour in Chapter 6.

Acknowledgements. My first thanks must go to the students who have endured the course on which the book is based and whose quality has made me try hard to make it worthy of them; and to those, especially David Kendall, who had developed the course before it became my privilege to teach it. My thanks to David Tranah and other staff of CUP for their help in converting the course into this book.

Next, I must thank Ben Garling, James Norris and Chris Rogers without whom the book would have contained more errors and obscurities. (The many faults which surely remain in it are my responsibility.) Helen Rutherford and I typed part of the book, but the vast majority of it was typed by Sarah Shea-Simonds in a virtuoso performance worthy of Horowitz. My thanks to Helen and, most especially, to Sarah. Special thanks to my wife, Sheila, too, for all her help.

But my best thanks - and yours if you derive any benefit from the book - must go to three people whose names appear in capitals in the Index: J.L. Doob, A.N. Kolmogorov and P. Lévy: without them, there wouldn't have been much to write about, as Doob (1953) splendidly confirms.

Statistical Laboratory, Cambridge
October 1990                                          David Williams
A Question of Terminology

Random variables: functions or equivalence classes?

At the level of this book, the theory would be more 'elegant' if we regarded a random variable as an equivalence class of measurable functions on the sample space, two functions belonging to the same equivalence class if and only if they are equal almost everywhere. Then the conditional-expectation map X ↦ E(X|𝒢) would be a truly well-defined contraction map from L^p(Ω, F, P) to L^p(Ω, 𝒢, P) for p ≥ 1; and we would not have to keep mentioning versions (representatives of equivalence classes) and would be able to avoid the endless 'almost surely' qualifications.

I have however chosen the 'inelegant' route: firstly, I prefer to work with functions, and confess to preferring 4 + 5 = 2 mod 7 to [4]₇ + [5]₇ = [2]₇. But there is a substantive reason. I hope that this book will tempt you to progress to the much more interesting, and more important, theory where the parameter set of our process is uncountable (e.g. it may be the time-parameter set [0, ∞)). There, the equivalence-class formulation just will not work: the 'cleverness' of introducing quotient spaces loses the subtlety which is essential even for formulating the fundamental results on existence of continuous modifications, etc., unless one performs contortions which are hardly elegant. Even if these contortions allow one to formulate results, one would still have to use genuine functions to prove them; so where does the reality lie?!
A Guide to Notation

► signifies something important, ►► something very important, and ►►► the Martingale Convergence Theorem.

I use ':=' to signify 'is defined to equal'. This Pascal notation is particularly convenient because it can also be used in the reversed sense.

I use analysts' (as opposed to category theorists') conventions:

►    N := {1, 2, 3, ...} ⊂ {0, 1, 2, ...} =: Z+.

Everyone is agreed that R+ := [0, ∞).

For a set B contained in some universal set S, I_B denotes the indicator function of B: that is, I_B : S → {0,1}, with I_B(s) = 1 if s ∈ B, and 0 otherwise.

For a, b ∈ R, a ∧ b := min(a, b), a ∨ b := max(a, b).

CF: characteristic function; DF: distribution function; pdf: probability density function.

σ-algebra, σ(C) (1.1); σ(Y_γ : γ ∈ C) (3.8, 3.13). π-system (1.6); d-system (A1.2).

a.e.: almost everywhere (1.5)
a.s.: almost surely (2.4)
bΣ: the space of bounded Σ-measurable functions (3.1)
B(S):        the Borel σ-algebra on S, B := B(R) (1.2)
C•X:         discrete stochastic integral (10.6)
dλ/dμ:       Radon-Nikodym derivative (5.14)
dQ/dP:       Likelihood Ratio (14.13)
E(X):        expectation E(X) := ∫_Ω X(ω)P(dω) of X (6.3)
E(X; F):     ∫_F X dP (6.3)
E(X|𝒢):      conditional expectation (9.3)
(E_n, ev):   lim inf E_n (2.8)
(E_n, i.o.): lim sup E_n (2.6)
f_X:         probability density function (pdf) of X (6.12)
f_{X,Y}:     joint pdf (8.3)
f_{X|Y}:     conditional pdf (9.6)
F_X:         distribution function of X (3.9)
lim inf:     for sets, (2.8)
lim sup:     for sets, (2.6)
x_n ↑ x:     x_n ↑ x in that x_n ≤ x_{n+1} (∀n) and x_n → x
log:         natural (base e) logarithm
L_X, Λ_X:    law of X (3.9)
L^p, ℒ^p:    Lebesgue spaces (6.7, 6.13)
Leb:         Lebesgue measure (1.8)
mΣ:          space of Σ-measurable functions (3.1)
M^T:         process M stopped at time T (10.9)
⟨M⟩:         angle-brackets process (12.12)
μ(f):        integral of f with respect to μ (5.0, 5.2)
μ(f; A):     ∫_A f dμ (5.0, 5.2)
φ_X:         CF of X (Chapter 16)
φ:           pdf of standard normal N(0,1) distribution
Φ:           DF of N(0,1) distribution
X^T:         X stopped at time T (10.9)
Chapter 0

A Branching-Process Example

(This chapter is not essential for the remainder of the book. You can start with Chapter 1 if you wish.)

0.0. Introductory remarks

The purpose of this chapter is threefold: to take something which is probably well known to you from books such as the immortal Feller (1957) or Ross (1976), so that you start on familiar ground; to make you start to think about some of the problems involved in making the elementary treatment into rigorous mathematics; and to indicate what new results appear if one applies the somewhat more advanced theory developed in this book. We stick to one example: a branching process. This is rich enough to show that the theory has some substance.

0.1. Typical number of children, X

In our model, the number of children of a typical animal (see Notes below for some interpretations of 'child' and 'animal') is a random variable X with values in Z+. We assume that P(X = 0) > 0.

We define the generating function f of X as the map f : [0,1] → [0,1], where

    f(θ) := E(θ^X) = Σ_{k≥0} θ^k P(X = k).

Standard theorems on power series imply that, for θ ∈ [0,1],

    f'(θ) = E(Xθ^{X−1}) = Σ_{k≥1} k θ^{k−1} P(X = k),

and

    μ := E(X) = f'(1) = Σ_k k P(X = k) ≤ ∞.
Of course, f'(1) is here interpreted as

    lim_{θ↑1} (1 − f(θ))/(1 − θ),

since f(1) = 1. We assume that μ < ∞.

Notes. The first application of branching-process theory was to the question of survival of family names; and in that context, animal = man, and child = son. In another context, 'animal' can be 'neutron', and a 'child' of that neutron will signify a neutron released if and when the parent neutron crashes into a nucleus. Whether or not the associated branching process is supercritical can be a matter of real importance. We can often find branching processes embedded in richer structures and can then use the results of this chapter to start the study of more interesting things.

For superb accounts of branching processes, see Athreya and Ney (1972), Harris (1963), Kendall (1966, 1975).

0.2. Size of nth generation, Zn

To be a bit formal: suppose that we are given a doubly infinite sequence

(a)    {X_r^{(m)} : m, r ∈ N}

of independent identically distributed random variables (IID RVs), each with the same distribution as X:

    P(X_r^{(m)} = k) = P(X = k).

The idea is that for n ∈ Z+ and r ∈ N, the variable X_r^{(n+1)} represents the number of children (who will be in the (n+1)th generation) of the rth animal (if there is one) in the nth generation. The fundamental rule therefore is that if Z_n signifies the size of the nth generation, then

(b)    Z_{n+1} = X_1^{(n+1)} + ... + X_{Z_n}^{(n+1)}.

We assume that Z_0 = 1, so that (b) gives a full recursive definition of the sequence (Z_m : m ∈ Z+) from the sequence (a). Our first task is
to calculate the distribution function of Z_n, or equivalently to find the generating function

(c)    f_n(θ) := E(θ^{Z_n}) = Σ_k θ^k P(Z_n = k).

0.3. Use of conditional expectations

The first main result is that for n ∈ Z+ (and θ ∈ [0,1])

(a)    f_{n+1}(θ) = f_n(f(θ)),

so that for each n ∈ Z+, f_n is the n-fold composition

(b)    f_n = f ∘ f ∘ ... ∘ f.

Note that the 0-fold composition is by convention the identity map f_0(θ) = θ, in agreement with - indeed, forced by - the fact that Z_0 = 1.

To prove (a), we use - at the moment in intuitive fashion - the following very special case of the very useful Tower Property of Conditional Expectation:

(c)    E(U) = E[E(U|V)];

to find the expectation of a random variable U, first find the conditional expectation E(U|V) of U given V, and then find the expectation of that. We prove the ultimate form of (c) at a later stage.

We apply (c) with U = θ^{Z_{n+1}} and V = Z_n:

    f_{n+1}(θ) = E(θ^{Z_{n+1}}) = E[E(θ^{Z_{n+1}} | Z_n)].

Now, for k ∈ Z+, the conditional expectation of θ^{Z_{n+1}} given that Z_n = k satisfies

(d)    E(θ^{Z_{n+1}} | Z_n = k) = E(θ^{X_1^{(n+1)} + ... + X_k^{(n+1)}} | Z_n = k).

But Z_n is constructed from variables X_r^{(s)} with s ≤ n, and so Z_n is independent of X_1^{(n+1)}, ..., X_k^{(n+1)}. The conditional expectation given Z_n = k in the right-hand term in (d) must therefore agree with the absolute expectation

(e)    E(θ^{X_1^{(n+1)} + ... + X_k^{(n+1)}}).
But the expression at (e) is an expectation of the product of independent random variables, and, as part of the family of 'independence means multiply' results, we know that this expectation of a product may be rewritten as the product of expectations. Since (for every n and r) E(θ^{X_r^{(n+1)}}) = f(θ), we have proved that

    E(θ^{Z_{n+1}} | Z_n = k) = f(θ)^k,

and this is what it means to say that

    E(θ^{Z_{n+1}} | Z_n) = f(θ)^{Z_n}.

[If V takes only integer values, then when V = k, the conditional expectation E(U|V) of U given V is equal to the conditional expectation E(U|V = k) of U given that V = k. (Sounds reasonable!)]

Property (c) now yields

    f_{n+1}(θ) = E(θ^{Z_{n+1}}) = E[f(θ)^{Z_n}],

and, since E(α^{Z_n}) = f_n(α), result (a) is proved.

Independence and conditional expectations are two of the main topics in this course.

0.4. Extinction probability, π

Let π_n := P(Z_n = 0). Then π_n = f_n(0), so that, by (0.3,b),

(a)    π_{n+1} = f(π_n).

Measure theory confirms our intuition about the extinction probability:

(b)    π := P(Z_m = 0 for some m) = ↑lim π_n.

Because f is continuous, it follows from (a) that

(c)    π = f(π).

The function f is analytic on (0,1), and is non-decreasing and convex (of non-decreasing slope). Also, f(1) = 1 and f(0) = P(X = 0) > 0. The slope f'(1) of f at 1 is μ = E(X). The celebrated pictures opposite now make the following Theorem obvious.

THEOREM. If E(X) > 1, then the extinction probability π is the unique root of the equation π = f(π) which lies strictly between 0 and 1. If E(X) ≤ 1, then π = 1.
[Figure: graphs of y = f(x) against y = x on [0,1].
Case 1: subcritical, μ = f'(1) < 1. Clearly, π = 1. The critical case μ = 1 has a similar picture.
Case 2: supercritical, μ = f'(1) > 1. Now, π < 1.]
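The iteration π_{n+1} = f(π_n), started at π_0 = 0, converges upwards to the extinction probability, so π can be computed numerically once an offspring distribution is fixed. The Python sketch below is my own illustration (not from the book); the truncation level K and the geometric test case are arbitrary choices, and the geometric example anticipates Section 0.9, which predicts π = p/q there.

```python
import numpy as np

def extinction_probability(pmf, tol=1e-12, max_iter=10_000):
    """Iterate pi_{n+1} = f(pi_n) from pi_0 = 0; the increasing limit is pi.

    pmf[k] is P(X = k); f is the generating function of X.
    """
    pmf = np.asarray(pmf, dtype=float)
    powers = np.arange(len(pmf))
    f = lambda theta: np.sum(pmf * theta ** powers)
    pi = 0.0
    for _ in range(max_iter):
        nxt = f(pi)
        if abs(nxt - pi) < tol:
            break
        pi = nxt
    return pi

# Geometric offspring P(X = k) = p q^k with p = 0.25, so mu = q/p = 3 > 1;
# the explicit answer from Section 0.9 is pi = p/q = 1/3.
p, q, K = 0.25, 0.75, 200
geometric_pmf = [p * q**k for k in range(K)]
print(extinction_probability(geometric_pmf))   # ~0.3333
```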
0.5. Pause for thought: measure

Now that we have finished revising what introductory courses on probability theory say about branching-process theory, let us think about why we must find a more precise language. To be sure, the claim at (0.4,b) that

(a)    π = ↑lim π_n

is intuitively plausible, but how could one prove it? We certainly cannot prove it at present because we have no means of stating with pure-mathematical precision what it is supposed to mean. Let us discuss this further.

Back in Section 0.2, we said 'Suppose that we are given a doubly infinite sequence {X_r^{(m)} : m, r ∈ N} of independent identically distributed random variables each with the same distribution as X'. What does this mean? A random variable is a (certain kind of) function on a sample space Ω. We could follow elementary theory in taking Ω to be the set of all outcomes, in other words, taking Ω to be the Cartesian product

    Ω = Π_{r,s ∈ N} Z+,

the typical element ω of Ω being ω = (ω_r^{(s)} : r ∈ N, s ∈ N), and then setting X_r^{(s)}(ω) = ω_r^{(s)}.

Now Ω is an uncountable set, so that we are outside the 'combinatorial' context which makes sense of π_n in the elementary theory. Moreover, if one assumes the Axiom of Choice, one can prove that it is impossible to assign to all subsets of Ω a probability satisfying the 'intuitively obvious' axioms and making the X's IID RVs with the correct common distribution. So, we have to know that the set of ω corresponding to the event 'extinction occurs' is one to which one can uniquely assign a probability (which will then provide a definition of π). Even then, we have to prove (a).

Example. Consider for a moment what is in some ways a bad attempt to construct a 'probability theory'. Let C be the class of subsets C of N for which the 'density'

    ρ(C) := lim_{n↑∞} n^{−1} #{k : 1 ≤ k ≤ n; k ∈ C}

exists. Let C_n := {1, 2, ..., n}. Then C_n ∈ C and C_n ↑ N in the sense that C_n ⊆ C_{n+1}, ∀n, and also ∪C_n = N. However, ρ(C_n) = 0, ∀n, but ρ(N) = 1.
Hence the logic which will allow us correctly to deduce (a) from the fact that

    {Z_n = 0} ↑ {extinction occurs}

fails for the (N, C, ρ) set-up: (N, C, ρ) is not a 'probability triple'. □

There are problems. Measure theory resolves them, but provides a huge bonus in the form of much deeper results such as the Martingale Convergence Theorem which we now take a first look at - at an intuitive level, I hasten to add.

0.6. Our first martingale

Recall from (0.2,b) that

    Z_{n+1} = X_1^{(n+1)} + ... + X_{Z_n}^{(n+1)},

where the X^{(n+1)} variables are independent of the values Z_1, Z_2, ..., Z_n. It is clear from this that

    P(Z_{n+1} = j | Z_0 = i_0, Z_1 = i_1, ..., Z_n = i_n) = P(Z_{n+1} = j | Z_n = i_n),

a result which you will probably recognize as stating that the process Z = (Z_n : n ≥ 0) is a Markov chain. We therefore have

    E(Z_{n+1} | Z_0 = i_0, Z_1 = i_1, ..., Z_n = i_n) = Σ_j j P(Z_{n+1} = j | Z_n = i_n) = E(Z_{n+1} | Z_n = i_n),

or, in a condensed and better notation,

(a)    E(Z_{n+1} | Z_0, Z_1, ..., Z_n) = E(Z_{n+1} | Z_n).

Of course, it is intuitively obvious that

(b)    E(Z_{n+1} | Z_n) = μZ_n,

because each of the Z_n animals in the nth generation has on average μ children. We can confirm result (b) by differentiating the result E(θ^{Z_{n+1}} | Z_0, ..., Z_n) = f(θ)^{Z_n} with respect to θ and setting θ = 1.
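Result (b) is also easy to check by simulation. The sketch below is my own illustration, not from the text; the geometric offspring law and the sample sizes are arbitrary choices. It draws many copies of Z_{n+1} given Z_n = k and compares the empirical mean with μk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Offspring distribution: X on {0,1,2,...} with P(X = k) = p q^k.
p = 0.25
mu = (1 - p) / p          # mean number of children, here 3

def next_generation(k):
    """Given Z_n = k, return Z_{n+1} = X_1 + ... + X_k for fresh IID offspring counts."""
    # rng.geometric counts trials until first success (support 1,2,...), so subtract k.
    return rng.geometric(p, size=k).sum() - k

# Empirical check of E(Z_{n+1} | Z_n = k) = mu * k for a few values of k.
for k in (1, 5, 20):
    samples = [next_generation(k) for _ in range(20_000)]
    print(k, mu * k, np.mean(samples))
```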
Now define

(c)    M_n := Z_n/μ^n,  n ≥ 0.

Then

    E(M_{n+1} | Z_0, Z_1, ..., Z_n) = M_n,

which exactly says that

(d)    M is a martingale relative to the Z process.

Given the history of Z up to stage n, the next value M_{n+1} of M is on average what it is now: M is 'constant on average' in this very sophisticated sense of conditional expectation given 'past' and 'present'. The true statement

(e)    E(M_n) = 1,  ∀n

is of course infinitely cruder.

A statement S is said to be true almost surely (a.s.) or with probability 1 if (surprise, surprise!)

    P(S is true) = 1.

Because our martingale M is non-negative (M_n ≥ 0, ∀n), the Martingale Convergence Theorem implies that it is almost surely true that

(f)    M_∞ := lim M_n exists.

Note that if M_∞ > 0 for some outcome (which can happen with positive probability only when μ > 1), then the statement

    Z_n ~ M_∞ μ^n  (n → ∞)

is a precise formulation of 'exponential growth'. A particularly fascinating question is: suppose that μ > 1; what is the behaviour of Z conditional on the value of M_∞?

0.7. Convergence (or not) of expectations

We know that M_∞ := lim M_n exists with probability 1, and that E(M_n) = 1, ∀n. We might be tempted to believe that E(M_∞) = 1. However, we already know that if μ ≤ 1, then, almost surely, the process dies out and M_n is eventually 0. Hence

(a)    if μ ≤ 1, then M_∞ = 0 (a.s.) and E(M_∞) = 0 ≠ 1 = lim E(M_n).
This is an excellent example to keep in mind when we come to study Fatou's Lemma, valid for any sequence (Y_n) of non-negative random variables:

    E(lim inf Y_n) ≤ lim inf E(Y_n).

What is 'going wrong' at (a) is that (when μ ≤ 1) for large n, the chances are that M_n will be large if M_n is not 0 and, very roughly speaking, this large value times its small probability will keep E(M_n) at 1. See the concrete examples in Section 0.9.

Of course, it is very important to know when

(b)    E(M_∞) = lim E(M_n),

and we do spend quite a considerable time studying this. The best general theorems are rarely good enough to get the best results for concrete problems, as is evidenced by the fact that

(c)    E(M_∞) = 1 if and only if both μ > 1 and E(X log X) < ∞,

where X is the typical number of children. Of course 0 log 0 = 0. If μ > 1 and E(X log X) = ∞, then, even though the process may not die out, M_∞ = 0, a.s.

0.8. Finding the distribution of M_∞

Since M_n → M_∞ (a.s.), it is obvious that for λ > 0,

    exp(−λM_n) → exp(−λM_∞)  (a.s.).

Now since each M_n ≥ 0, the whole sequence (exp(−λM_n)) is bounded in absolute value by the constant 1, independently of the outcome of our experiment. The Bounded Convergence Theorem says that we can now assert what we would wish:

(a)    E exp(−λM_∞) = lim E exp(−λM_n).

Since M_n = Z_n/μ^n and E(θ^{Z_n}) = f_n(θ), we have

(b)    E exp(−λM_n) = f_n(exp(−λ/μ^n)),

so that, in principle (if very rarely in practice), we can calculate the left-hand side of (a). However, for a non-negative random variable Y, the distribution function y ↦ P(Y ≤ y) is completely determined by the map λ ↦ E exp(−λY) on (0, ∞).
Hence, in principle, we can find the distribution of M_∞.

We have seen that the real problem is to calculate the function

    L(λ) := E exp(−λM_∞).

Using (b), the fact that f_{n+1} = f ∘ f_n, and the continuity of L (another consequence of the Bounded Convergence Theorem), you can immediately establish the functional equation:

(c)    L(λμ) = f(L(λ)).

0.9. Concrete example

This concrete example is just about the only one in which one can calculate everything explicitly, but, in the way of mathematics, it is useful in many contexts.

We take the 'typical number of children' X to have a geometric distribution:

(a)    P(X = k) = p q^k  (k ∈ Z+),

where 0 < p < 1, q := 1 − p. Then, as you can easily check,

(b)    f(θ) = p/(1 − qθ)  and  μ = q/p.

To calculate f ∘ f ∘ ... ∘ f, we use a device familiar from the geometry of the upper half-plane. If

    G = ( g11  g12 )
        ( g21  g22 )

is a non-singular 2×2 matrix, define the fractional linear transformation:

(c)    G(θ) = (g11 θ + g12) / (g21 θ + g22).
Then you can check that if H is another such matrix, then G(H(θ)) = (GH)(θ), so that composition of fractional linear transformations corresponds to matrix multiplication.

Suppose that p ≠ q. Then, by the S⁻¹AS = Λ method, for example, we find that the nth power of the matrix corresponding to f is

    ( 0   p )ⁿ  =  ( 1  p ) ( pⁿ   0 ) ( 1  p )⁻¹ ,
    ( −q  1 )      ( 1  q ) ( 0   qⁿ ) ( 1  q )

so that

    f_n(θ) = [pμⁿ(1 − θ) + qθ − p] / [qμⁿ(1 − θ) + qθ − p].

If μ = q/p < 1, then lim_n f_n(θ) = 1, corresponding to the fact that the process dies out.

Suppose now that μ > 1. Then you can easily check that, for λ > 0,

    L(λ) := E exp(−λM_∞) = lim f_n(exp(−λ/μⁿ)) = (pλ + q − p)/(qλ + q − p)
          = π + (1 − π) ∫₀^∞ (1 − π)e^{−(1−π)x} e^{−λx} dx,

from which we deduce that P(M_∞ = 0) = π, and

    P(x < M_∞ < x + dx) = (1 − π)² e^{−(1−π)x} dx  (x > 0),

or, better,

    P(M_∞ > x) = (1 − π)e^{−(1−π)x}  (x > 0).

Suppose that μ < 1. In this case, it is interesting to ask: what is the distribution of Z_n conditioned by Z_n ≠ 0? We find that

(d)    P(Z_n = k | Z_n ≠ 0) = α_n β_n^{k−1}  (k ∈ N),

where
    α_n := (p − q)/(p − qμⁿ),    β_n := q(1 − μⁿ)/(p − qμⁿ),

so 0 < α_n < 1 and α_n + β_n = 1. As n → ∞, we see that α_n → 1 − μ, β_n → μ, so (this is justified)

(e)    lim P(Z_n = k | Z_n ≠ 0) = (1 − μ)μ^{k−1}  (k ∈ N).

Suppose that μ = 1. You can show by induction that

    f_n(θ) = [n − (n − 1)θ] / [(n + 1) − nθ],

and that E(e^{−λZ_n/n} | Z_n ≠ 0) → 1/(1 + λ), corresponding to

(f)    P(Z_n/n > x | Z_n ≠ 0) → e^{−x},  x > 0.

'The Fatou factor'

We know that when μ ≤ 1, we have E(M_n) = 1, ∀n, but E(M_∞) = 0. Can we get some insight into this?

First consider the case when μ < 1. Result (e) makes it plausible that for large n, E(Z_n | Z_n ≠ 0) is roughly (1 − μ)Σ_k kμ^{k−1} = 1/(1 − μ). We know that P(Z_n ≠ 0) = 1 − f_n(0) is roughly (1 − μ)μⁿ, so we should have (roughly)

    E(M_n) = μ^{−n}E(Z_n) = μ^{−n}E(Z_n | Z_n ≠ 0)P(Z_n ≠ 0) ≈ μ^{−n} · (1 − μ)^{−1} · (1 − μ)μⁿ = 1,

which might help explain how the 'balance' E(M_n) = 1 is achieved by big values times small probabilities.
Now consider the case when μ = 1. Then P(Z_n ≠ 0) = 1/(n + 1), and, from (f), Z_n/n conditioned by Z_n ≠ 0 is roughly exponential with mean 1, so that M_n = Z_n conditioned by Z_n ≠ 0 is on average of size about n, the correct order of magnitude for balance.

Warning. We have just been using for 'correct intuitive explanations' exactly the type of argument which might have misled us into thinking that E(M_∞) = 1 in the first place. But, of course, the result

    E(M_n) = E(M_n | Z_n ≠ 0)P(Z_n ≠ 0) = 1

is a matter of obvious fact.
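The 'Fatou factor' is easy to see numerically. The sketch below is my own illustration (not from the book); the geometric offspring law, the horizon of 10 generations and the number of sample paths are arbitrary choices. It reports the empirical mean of M_n = Z_n/μⁿ, which hovers near 1 (noisily, for exactly the big-values-times-small-probabilities reason above), alongside the fraction of paths with Z_n = 0.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_Z(p, n_gens):
    """One path Z_0, ..., Z_{n_gens} with geometric offspring P(X = k) = p q^k."""
    Z = [1]
    for _ in range(n_gens):
        k = Z[-1]
        Z.append(int(rng.geometric(p, size=k).sum() - k) if k > 0 else 0)
    return Z

def fatou_demo(p, n_gens=10, n_paths=20_000):
    q = 1 - p
    mu = q / p
    Zn = np.array([simulate_Z(p, n_gens)[-1] for _ in range(n_paths)])
    Mn = Zn / mu**n_gens
    print(f"mu={mu:.2f}  E(M_n)~{Mn.mean():.3f}  P(Z_n=0)~{(Zn == 0).mean():.3f}")

fatou_demo(p=0.6)   # subcritical, mu = 2/3: mean of M_n stays near 1, extinction fraction near 1
fatou_demo(p=0.4)   # supercritical, mu = 1.5: extinction fraction near pi = p/q = 2/3, M_n has a real limit
```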
PART A: FOUNDATIONS

Chapter 1

Measure Spaces

1.0. Introductory remarks

Topology is about open sets. The characterizing property of a continuous function f is that the inverse image f⁻¹(G) of an open set G is open.

Measure theory is about measurable sets. The characterizing property of a measurable function f is that the inverse image f⁻¹(A) of any measurable set is measurable.

In topology, one axiomatizes the notion of 'open set', insisting in particular that the union of any collection of open sets is open, and that the intersection of a finite collection of open sets is open.

In measure theory, one axiomatizes the notion of 'measurable set', insisting that the union of a countable collection of measurable sets is measurable, and that the intersection of a countable collection of measurable sets is also measurable. Also, the complement of a measurable set must be measurable, and the whole space must be measurable. Thus the measurable sets form a σ-algebra, a structure stable (or 'closed') under countably many set operations. Without the insistence that 'only countably many operations are allowed', measure theory would be self-contradictory - a point lost on certain philosophers of probability.

The probability that a point chosen at random on the surface of the unit sphere S² in R³ falls into the subset F of S² is just the area of F divided by the total area 4π. What could be easier?

However, Banach and Tarski showed (see Wagon (1985)) that if the Axiom of Choice is assumed, as it is throughout conventional mathematics, then there exists a subset F of the unit sphere S² in R³ such that for
3 ≤ k < ∞ (and even for k = ∞), S² is the disjoint union of k exact copies of F:

    S² = τ₁(F) ∪ τ₂(F) ∪ ... ∪ τ_k(F),

where each τ_i is a rotation. If F has an 'area', then that area must simultaneously be 4π/3, 4π/4, ..., 0. The only conclusion is that the set F is non-measurable (not Lebesgue measurable): it is so complicated that one cannot assign an area to it. Banach and Tarski have not broken the Law of Conservation of Area: they have simply operated outside its jurisdiction.

Remarks. (i) Because every rotation τ has a fixed point x on S² such that τ(x) = x, it is not possible to find a subset A of S² and a rotation τ such that A ∪ τ(A) = S² and A ∩ τ(A) = ∅. So, we could not have taken k = 2.

(ii) Banach and Tarski even proved that given any two bounded subsets A and B of R³ each with non-empty interior, it is possible to decompose A into a certain finite number n of disjoint pieces A = ∪_{i=1}^n A_i and B into the same number n of disjoint pieces B = ∪_{i=1}^n B_i, in such a way that, for each i, A_i is Euclid-congruent to B_i!!! So, we can disassemble A and rebuild it as B.

(iii) Section A1.1 (optional!) in the appendix to this chapter gives an Axiom-of-Choice construction of a non-measurable subset of S¹.

This chapter introduces σ-algebras, π-systems, and measures, and emphasizes monotone-convergence properties of measures. We shall see in later chapters that, although not all sets are measurable, it is always the case for probability theory that enough sets are measurable.

1.1. Definitions of algebra, σ-algebra

Let S be a set.

Algebra on S

A collection Σ₀ of subsets of S is called an algebra on S (or algebra of subsets of S) if

(i) S ∈ Σ₀,
(ii) F ∈ Σ₀ ⇒ F^c := S\F ∈ Σ₀,
(iii) F, G ∈ Σ₀ ⇒ F ∪ G ∈ Σ₀.

[Note that ∅ = S^c ∈ Σ₀ and

    F, G ∈ Σ₀ ⇒ F ∩ G = (F^c ∪ G^c)^c ∈ Σ₀.]
Thus, an algebra on S is a family of subsets of S stable under finitely many set operations.

Exercise (optional). Let C be the class of subsets C of N for which the 'density'

    lim_{m↑∞} m^{−1} #{k : 1 ≤ k ≤ m; k ∈ C}

exists. We might like to think of this density (if it exists) as 'the probability that a number chosen at random belongs to C'. But there are many reasons why this does not conform to a proper probability theory. (We saw one in Section 0.5.) For example, you should find elements F and G in C for which F ∩ G ∉ C.

Note on terminology ('algebra' versus 'field'). An algebra in our sense is a true algebra in the algebraists' sense with ∩ as product, and symmetric difference

    A△B := (A ∪ B)\(A ∩ B)

as 'sum', the underlying field of the algebra being the field with 2 elements. (This is why we prefer 'algebra of subsets' to 'field of subsets': there is no way that an algebra of subsets is a field in the algebraists' sense - unless Σ₀ is trivial, that is, Σ₀ = {S, ∅}.)

σ-algebra on S

A collection Σ of subsets of S is called a σ-algebra on S (or σ-algebra of subsets of S) if Σ is an algebra on S such that whenever F_n ∈ Σ (n ∈ N), then

    ∪_n F_n ∈ Σ.

[Note that if Σ is a σ-algebra on S and F_n ∈ Σ for n ∈ N, then

    ∩_n F_n = (∪_n F_n^c)^c ∈ Σ.]

Thus, a σ-algebra on S is a family of subsets of S 'stable under any countable collection of set operations'.

Note. Whereas it is usually possible to write in 'closed form' the typical element of many of the algebras of sets which we shall meet (see Section 1.8 below for a first example), it is usually impossible to write down the typical element of a σ-algebra. This is the reason for our concentrating where possible on the much simpler 'π-systems'.

Measurable space

A pair (S, Σ), where S is a set and Σ is a σ-algebra on S, is called a measurable space. An element of Σ is called a Σ-measurable subset of S.
σ(C), σ-algebra generated by a class C of subsets

Let C be a class of subsets of S. Then σ(C), the σ-algebra generated by C, is the smallest σ-algebra Σ on S such that C ⊆ Σ. It is the intersection of all σ-algebras on S which have C as a subclass. (Obviously, the class of all subsets of S is a σ-algebra which extends C.)

1.2. Examples. Borel σ-algebras, B(S), B = B(R)

Let S be a topological space.

B(S)

B(S), the Borel σ-algebra on S, is the σ-algebra generated by the family of open subsets of S. With slight abuse of notation,

    B(S) := σ(open sets).

B := B(R)

It is standard shorthand that B := B(R). The σ-algebra B is the most important of all σ-algebras. Every subset of R which you meet in everyday use is an element of B; and indeed it is difficult (but possible!) to find a subset of R constructed explicitly (without the Axiom of Choice) which is not in B.

Elements of B can be quite complicated. However, the collection

    π(R) := {(−∞, x] : x ∈ R}

(not a standard notation) is very easy to understand, and it is often the case that all we need to know about B is that

(a)    B = σ(π(R)).

Proof of (a). For each x in R, (−∞, x] = ∩_{n∈N}(−∞, x + n⁻¹), so that, as a countable intersection of open sets, the set (−∞, x] is in B. All that remains to be proved is that every open subset G of R is in σ(π(R)). But every such G is a countable union of open intervals, so we need only show that, for a, b ∈ R with a < b, (a, b) ∈ σ(π(R)). But, for any u with u > a,

    (a, u] = (−∞, u] ∩ (−∞, a]^c ∈ σ(π(R)),

and since, for ε = ½(b − a),

    (a, b) = ∪_n (a, b − εn⁻¹],

we see that (a, b) ∈ σ(π(R)), and the proof is complete.
1.3. Definitions concerning set functions

Let S be a set, let Σ₀ be an algebra on S, and let μ₀ be a non-negative set function

    μ₀ : Σ₀ → [0, ∞].

Additive

Then μ₀ is called additive if μ₀(∅) = 0 and, for F, G ∈ Σ₀,

    F ∩ G = ∅  ⇒  μ₀(F ∪ G) = μ₀(F) + μ₀(G).

Countably additive

The map μ₀ is called countably additive (or σ-additive) if μ₀(∅) = 0 and whenever (F_n : n ∈ N) is a sequence of disjoint sets in Σ₀ with union F = ∪F_n in Σ₀ (note that this is an assumption since Σ₀ need not be a σ-algebra), then

    μ₀(F) = Σ_n μ₀(F_n).

Of course (why?), a countably additive set function is additive.

1.4. Definition of measure space

Let (S, Σ) be a measurable space, so that Σ is a σ-algebra on S. A map

    μ : Σ → [0, ∞]

is called a measure on (S, Σ) if μ is countably additive. The triple (S, Σ, μ) is then called a measure space.

1.5. Definitions concerning measures

Let (S, Σ, μ) be a measure space. Then μ (or indeed the measure space (S, Σ, μ)) is called

finite if μ(S) < ∞,

σ-finite if there is a sequence (S_n : n ∈ N) of elements of Σ such that μ(S_n) < ∞ (∀n ∈ N) and ∪ S_n = S.

Warning. Intuition is usually OK for finite measures, and adapts well for σ-finite measures. However, measures which are not σ-finite can be crazy; fortunately, there are no such measures in this book.
Probability measure, probability triple

Our measure μ is called a probability measure if μ(S) = 1, and (S, Σ, μ) is then called a probability triple.

μ-null element of Σ, almost everywhere (a.e.)

An element F of Σ is called μ-null if μ(F) = 0. A statement S about points s of S is said to hold almost everywhere (a.e.) if

    F := {s : S(s) is false} ∈ Σ and μ(F) = 0.

1.6. LEMMA. Uniqueness of extension, π-systems

Moral: σ-algebras are 'difficult', but π-systems are 'easy'; so we aim to work with the latter.

►(a) Let S be a set. Let I be a π-system on S, that is, a family of subsets of S stable under finite intersection:

    I₁, I₂ ∈ I  ⇒  I₁ ∩ I₂ ∈ I.

Let Σ := σ(I). Suppose that μ₁ and μ₂ are measures on (S, Σ) such that μ₁(S) = μ₂(S) < ∞ and μ₁ = μ₂ on I. Then

    μ₁ = μ₂ on Σ.

►(b) Corollary. If two probability measures agree on a π-system, then they agree on the σ-algebra generated by that π-system.

The example B = σ(π(R)) is of course the most important example of the Σ = σ(I) in the theorem.

This result will play an important role. Indeed, it will be applied more frequently than will the celebrated existence result in Section 1.7. Because of this, the proof of Lemma 1.6 given in Sections A1.2-1.4 of the appendix to this chapter should perhaps be consulted - but read the remainder of this chapter first.
1.7. THEOREM. Carathéodory's Extension Theorem

Let S be a set, let Σ₀ be an algebra on S, and let

    Σ := σ(Σ₀).

If μ₀ is a countably additive map μ₀ : Σ₀ → [0, ∞], then there exists a measure μ on (S, Σ) such that

    μ = μ₀ on Σ₀.

If μ₀(S) < ∞, then, by Lemma 1.6, this extension is unique - an algebra is a π-system!

In a sense, this result should have more ► signs than any other, for without it we could not construct any interesting models. However, once we have our model, we make no further use of the theorem. The proof of this result given in Sections A1.5-1.8 of the appendix is there for completeness. It will do no harm to assume the result for this course. Let us now see how the theorem is used.

1.8. Lebesgue measure Leb on ((0,1], B(0,1])

Let S = (0,1]. For F ⊆ S, say that F ∈ Σ₀ if F may be written as a finite union

(*)    F = (a₁, b₁] ∪ ... ∪ (a_r, b_r],

where r ∈ N, 0 ≤ a₁ ≤ b₁ ≤ ... ≤ a_r ≤ b_r ≤ 1. Then Σ₀ is an algebra on (0,1] and

    σ(Σ₀) = B(0,1].

(We write B(0,1] instead of B((0,1]).) For F as at (*), let

    μ₀(F) := Σ_{k≤r}(b_k − a_k).

Then μ₀ is well-defined and additive on Σ₀ (this is easy). Moreover, μ₀ is countably additive on Σ₀. (This is not trivial. See Section A1.9.) Hence, by Theorem 1.7, there exists a unique measure μ on ((0,1], B(0,1]) extending μ₀ on Σ₀. This measure μ is called Lebesgue measure on ((0,1], B(0,1]) or (loosely) Lebesgue measure on (0,1]. We shall often denote μ by Leb.

Lebesgue measure (still denoted by Leb) on ([0,1], B[0,1]) is of course obtained by a trivial modification, the set {0} having Lebesgue measure 0. Of course, Leb makes precise the concept of length. In a similar way, we can construct (σ-finite) Lebesgue measure (which we also denote by Leb) on R (more strictly, on (R, B(R))).
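To make the definition of μ₀ concrete, here is a small Python sketch (my own illustration, not from the book) that evaluates μ₀ on a set F presented as a disjoint union of intervals (a, b] inside (0,1], and checks that splitting an interval into two pieces leaves the value unchanged - the 'well-defined and additive' point made above. The validation rule used is an assumption of the sketch, not part of the text.

```python
from typing import List, Tuple

Interval = Tuple[float, float]   # (a, b] with 0 <= a <= b <= 1

def mu0(F: List[Interval]) -> float:
    """mu_0(F) = sum of (b_k - a_k) over disjoint half-open intervals making up F."""
    F = sorted(F)
    prev_end, total = 0.0, 0.0
    for a, b in F:
        if not (prev_end <= a <= b <= 1.0):
            raise ValueError("not a disjoint union of intervals (a, b] inside (0, 1]")
        total += b - a
        prev_end = b
    return total

# mu_0 of (0, 1/4] u (1/2, 3/4] is 1/2, and splitting an interval changes nothing:
print(mu0([(0.0, 0.25), (0.5, 0.75)]))                        # 0.5
print(mu0([(0.0, 0.25)]) == mu0([(0.0, 0.1), (0.1, 0.25)]))   # True: finite additivity in action
```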
1.9. LEMMA. Elementary inequalities

Let (S, Σ, μ) be a measure space. Then

(a)    μ(A ∪ B) ≤ μ(A) + μ(B)  (A, B ∈ Σ),

(b)    μ(∪_n F_n) ≤ Σ_n μ(F_n)  (F_n ∈ Σ, n ∈ N).

Furthermore, if μ(S) < ∞, then

(c)    μ(A ∪ B) = μ(A) + μ(B) − μ(A ∩ B)  (A, B ∈ Σ),

(d)    (inclusion-exclusion formula): for F₁, F₂, ..., F_n ∈ Σ,

    μ(F₁ ∪ ... ∪ F_n) = Σ_i μ(F_i) − Σ_{i<j} μ(F_i ∩ F_j) + Σ_{i<j<k} μ(F_i ∩ F_j ∩ F_k) − ... + (−1)^{n+1} μ(F₁ ∩ ... ∩ F_n),

successive partial sums alternating between over- and under-estimates.

You will surely have seen some version of these results previously. Result (c) is obvious because A ∪ B is the disjoint union A ∪ (B\(A ∩ B)). But (c) ⇒ (a) ⇒ (b) - check that 'infinities do not matter'. You can deduce (d) from (c) by induction, but, as we shall see later, the neat way to prove (d) is by integration.

1.10. LEMMA. Monotone-convergence properties of measures

• These results are often needed for making things rigorous. (Peep ahead to the 'Monkey typing Shakespeare' Section 4.9.) Again, let (S, Σ, μ) be a measure space.

►(a) If F_n ∈ Σ (n ∈ N) and F_n ↑ F, then μ(F_n) ↑ μ(F).

Notes. F_n ↑ F means: F_n ⊆ F_{n+1} (∀n ∈ N), ∪F_n = F. Result (a) is the fundamental property of measure.

Proof of (a). Write G₁ := F₁, G_n := F_n\F_{n−1} (n ≥ 2). Then the sets G_n (n ∈ N) are disjoint, and

    μ(F_n) = μ(G₁ ∪ G₂ ∪ ... ∪ G_n) = Σ_{k≤n} μ(G_k) ↑ Σ_{k<∞} μ(G_k) = μ(∪_k G_k) = μ(F). □

Application. In a proper formulation of the branching-process example of Chapter 0,

    {Z_n = 0} ↑ {extinction occurs}, so that π_n ↑ π.

(A proper formulation of the branching-process example will be given later.)
►(b) If G_n ∈ Σ, G_n ↓ G and μ(G_k) < ∞ for some k, then μ(G_n) ↓ μ(G).

Proof of (b). For n ∈ N, let F_n := G_k\G_{k+n}, and now apply part (a). □

Example - to indicate what can 'go wrong'. For n ∈ N, let H_n := (n, ∞). Then Leb(H_n) = ∞, ∀n, but H_n ↓ ∅.

►(c) The union of a countable number of μ-null sets is μ-null.

This is a trivial corollary of results (1.9,b) and (1.10,a).

1.11. Example/Warning

Let (S, Σ, μ) be ([0,1], B[0,1], Leb). Let (ε(k)) be a sequence of strictly positive numbers such that ε(k) ↓ 0. For a single point x of S, we have

(a)    {x} = ∩_k (x − ε(k), x + ε(k)) ∩ S,

so that for every k, μ({x}) ≤ 2ε(k), and so μ({x}) = 0. That {x} is B(S)-measurable follows because {x} is the intersection of the countable number of open subsets of S on the right-hand side of (a).

Let V = Q ∩ [0,1], the set of rationals in [0,1]. Since V is a countable union of singletons: V = {v_n : n ∈ N}, it is clear that V is B[0,1]-measurable and that Leb(V) = 0. We can include V in an open subset of S of measure at most 4ε(k) as follows:

    V ⊆ G_k := ∪_n [(v_n − ε(k)2^{−n}, v_n + ε(k)2^{−n}) ∩ S] =: ∪_n I_{n,k}.

Clearly, H := ∩_k G_k satisfies Leb(H) = 0 and V ⊆ H. Now, it is a consequence of the Baire category theorem (see the appendix to this chapter) that H is uncountable, so

(b)    the set H is an uncountable set of measure 0;

moreover,

    H = ∩_k ∪_n I_{n,k}, whereas ∪_n ∩_k I_{n,k} = V.

Throughout the subject, we have to be careful about interchanging orders of operations.
Chapter 2

Events

2.1. Model for experiment: (Ω, F, P)

A model for an experiment involving randomness takes the form of a probability triple (Ω, F, P) in the sense of Section 1.5.

Sample space

Ω is a set called the sample space.

Sample point

A point ω of Ω is called a sample point.

Event

The σ-algebra F on Ω is called the family of events, so that an event is an element of F, that is, an F-measurable subset of Ω.

By definition of probability triple, P is a probability measure on (Ω, F).

2.2. The intuitive meaning

Tyche, Goddess of Chance, chooses a point ω of Ω 'at random' according to the law P in that, for F in F, P(F) represents the 'probability' (in the sense understood by our intuition) that the point ω chosen by Tyche belongs to F. The chosen point ω determines the outcome of the experiment. Thus there is a map

    Ω → set of outcomes,  ω ↦ outcome.

There is no reason why this 'map' (the co-domain lies in our intuition!) should be one-one. Often it is the case that although there is some obvious 'minimal' or 'canonical' model for an experiment, it is better to use some richer model. (For example, we can read off many properties of coin tossing by imbedding the associated random walk in a Brownian motion.)
2.3. Examples of (Ω, F) pairs

We leave the question of assigning probabilities until later.

(a) Experiment: Toss coin twice. We can take

    Ω = {HH, HT, TH, TT},  F = P(Ω) := set of all subsets of Ω.

In this model, the intuitive event 'At least one head is obtained' is described by the mathematical event (element of F)

    {HH, HT, TH}.

(b) Experiment: Toss coin infinitely often. We can take

    Ω = {H, T}^N,

so that a typical point ω of Ω is a sequence

    ω = (ω_n : n ∈ N).

We certainly wish to speak of the intuitive event 'ω_n = W', where W ∈ {H, T}, and it is natural to choose

    F = σ({ω ∈ Ω : ω_n = W} : n ∈ N, W ∈ {H, T}).

Although F ≠ P(Ω) (accept this!), it turns out that F is big enough; for example, we shall see in Section 3.7 that the truth set of the statement

    (number of heads in n tosses)/n → ½

is an element of F. Note that we can use the current model as a more informative model for the experiment in (a), using the map ω ↦ (ω₁, ω₂) of sample points to outcomes.

(c) Experiment: Choose a point between 0 and 1 uniformly at random. Take

    Ω = [0,1],  F = B[0,1],

ω signifying the point chosen. In this case, we obviously take P = Leb. The sense in which this model contains model (b) for the case of a fair coin will be explained later.
2.4. Almost surely (a.s.)

A statement S about outcomes is said to be true almost surely (a.s.), or with probability 1 (w.p.1), if

    F := {ω : S(ω) is true} ∈ F and P(F) = 1.

(a) Proposition. If F_n ∈ F (n ∈ N) and P(F_n) = 1, ∀n, then P(∩_n F_n) = 1.

Proof. P(F_n^c) = 0, ∀n, so, by Lemma 1.10(c), P(∪_n F_n^c) = 0. But ∩F_n = (∪F_n^c)^c. □

(b) Something to think about. Some distinguished philosophers have tried to develop probability without measure theory. One of the reasons for difficulty is the following.

When the discussion (2.3,b) is extended to define the appropriate probability measure for fair coin tossing, the Strong Law of Large Numbers (SLLN) states that F ∈ F and P(F) = 1, where F, the truth set of the statement 'proportion of heads in n tosses → ½', is defined formally in (2.3,b).

Let A be the set of all maps α : N → N such that α(1) < α(2) < ... . For α ∈ A, let

    F_α := {ω : #{k ≤ n : ω_{α(k)} = H}/n → ½},

the 'truth set of the Strong Law for the subsequence α'. Then, of course, we have P(F_α) = 1, ∀α ∈ A.

Exercise. Prove that

    ∩_{α∈A} F_α = ∅.

(Hint. For any given ω, find an α ... .)

The moral is that the concept of 'almost surely' gives us (i) absolute precision, but also (ii) enough flexibility to avoid the self-contradictions into which those innocent of measure theory too easily fall. (Of course, since philosophers are pompous where we are precise, they are thought to think deeply ... .)

2.5. Reminder: lim sup, lim inf, ↓lim, etc.

(a) Let (x_n : n ∈ N) be a sequence of real numbers. We define

    lim sup x_n := inf_m { sup_{n≥m} x_n } = ↓lim_m { sup_{n≥m} x_n } ∈ [−∞, ∞].
Obviously, y_m := sup_{n≥m} x_n is monotone non-increasing in m, so that the limit of the sequence y_m exists in [−∞, ∞]. The use of ↑lim or ↓lim to signify monotone limits will be handy, as will y_n ↓ y_∞ to signify y_∞ = ↓lim y_n.

(b) Analogously,

    lim inf x_n := sup_m { inf_{n≥m} x_n } = ↑lim_m { inf_{n≥m} x_n } ∈ [−∞, ∞].

(c) We have

    x_n converges in [−∞, ∞]  ⟺  lim sup x_n = lim inf x_n,

and then lim x_n = lim sup x_n = lim inf x_n.

►(d) Note that

(i) if z > lim sup x_n, then x_n < z eventually (that is, for all sufficiently large n);

(ii) if z < lim sup x_n, then x_n > z infinitely often (that is, for infinitely many n).
2.6. Definitions. lim sup E_n, (E_n, i.o.)

Suppose now that (E_n : n ∈ N) is a sequence of events.

►(a) We define

    (E_n, i.o.) := (E_n infinitely often) := lim sup E_n := ∩_m ∪_{n≥m} E_n
                 = {ω : for every m, ∃n(ω) ≥ m such that ω ∈ E_n}
                 = {ω : ω ∈ E_n for infinitely many n}.

►(b) (Reverse Fatou Lemma - needs FINITENESS of P)

    P(lim sup E_n) ≥ lim sup P(E_n).

Proof. Let G_m := ∪_{n≥m} E_n. Then (look at the definition in (a)) G_m ↓ G, where G := lim sup E_n. By result (1.10,b), P(G_m) ↓ P(G). But, clearly, P(G_m) ≥ sup_{n≥m} P(E_n). Hence,

    P(G) ≥ ↓lim_m ( sup_{n≥m} P(E_n) ) =: lim sup P(E_n). □

2.7. First Borel-Cantelli Lemma (BC1)

► Let (E_n : n ∈ N) be a sequence of events such that Σ_n P(E_n) < ∞. Then

    P(lim sup E_n) = P(E_n, i.o.) = 0.

Proof. With the notation of (2.6,b), we have, for each m,

    P(G) ≤ P(G_m) ≤ Σ_{n≥m} P(E_n),

using (1.9,b) and (1.10,a). Now let m ↑ ∞. □

Notes. (i) An instructive proof by integration will be given later.

(ii) Many applications of the First Borel-Cantelli Lemma will be given within this course. Interesting applications require concepts of independence, random variables, etc.

2.8. Definitions. lim inf E_n, (E_n, ev)

Again suppose that (E_n : n ∈ N) is a sequence of events.

►(a) We define

    (E_n, ev) := (E_n eventually) := lim inf E_n := ∪_m ∩_{n≥m} E_n
               = {ω : for some m(ω), ω ∈ E_n, ∀n ≥ m(ω)}
               = {ω : ω ∈ E_n for all large n}.

(b) Note that (E_n, ev)^c = (E_n^c, i.o.).

►(c) (Fatou's Lemma for sets - true for ALL measure spaces)

    P(lim inf E_n) ≤ lim inf P(E_n).

Exercise. Prove this in analogy with the proof of result (2.6,b), using (1.10,a) rather than (1.10,b).
2.9. Exercise

For an event E, define the indicator function I_E on Ω via

    I_E(ω) := 1 if ω ∈ E,  0 if ω ∉ E.

Let (E_n : n ∈ N) be a sequence of events. Prove that, for each ω,

    I_{lim sup E_n}(ω) = lim sup I_{E_n}(ω),

and establish the corresponding result for lim infs.
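The First Borel-Cantelli Lemma of Section 2.7 is easy to 'see' by simulation. In the sketch below (my own illustration; the independence of the E_n and the choice P(E_n) = n^{-2} are assumptions made for the demo, not hypotheses of the lemma), each run records the last index n ≤ 10^5 at which E_n occurs; it is almost always a small number, reflecting P(E_n, i.o.) = 0.

```python
import numpy as np

rng = np.random.default_rng(2)

# Independent events E_n with P(E_n) = 1/n^2, so sum_n P(E_n) < infinity.
N = 100_000
n = np.arange(1, N + 1, dtype=np.int64)

def last_occurrence():
    hits = rng.random(N) < 1.0 / n**2     # indicator of E_n for one sample point omega
    return int(n[hits].max()) if hits.any() else 0

print([last_occurrence() for _ in range(10)])   # typically tiny indices: E_n stops occurring early
```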
Chapter 3

Random Variables

Let (S, Σ) be a measurable space, so that Σ is a σ-algebra on S.

3.1. Definitions. Σ-measurable function, mΣ, (mΣ)⁺, bΣ

Suppose that h : S → R. For A ⊆ R, define

    h⁻¹(A) := {s ∈ S : h(s) ∈ A}.

Then h is called Σ-measurable if h⁻¹ : B → Σ, that is, h⁻¹(A) ∈ Σ, ∀A ∈ B. Here is a picture of a Σ-measurable function h:

    [Figure: a Σ-measurable function h from (S, Σ) to (R, B).]

We write mΣ for the class of Σ-measurable functions on S, and (mΣ)⁺ for the class of non-negative elements in mΣ. We denote by bΣ the class of bounded Σ-measurable functions on S.

Note. Because lim sups of sequences even of finite-valued functions may be infinite, and for other reasons, it is convenient to extend these definitions to functions h taking values in [−∞, ∞] in the obvious way: h is called Σ-measurable if h⁻¹ : B[−∞, ∞] → Σ. Which of the various results stated for real-valued functions extend to functions with values in [−∞, ∞], and what these extensions are, should be obvious.

Borel function

A function h from a topological space S to R is called Borel if h is B(S)-measurable. The most important case is when S itself is R.
3.2. Elementary Propositions on measurability

(a) The map h⁻¹ preserves all set operations:

    h⁻¹(∪_α A_α) = ∪_α h⁻¹(A_α),   h⁻¹(A^c) = (h⁻¹(A))^c,  etc.

Proof. This is just definition chasing. □

►(b) If C ⊆ B and σ(C) = B, then h⁻¹ : C → Σ ⇒ h ∈ mΣ.

Proof. Let E be the class of elements B in B such that h⁻¹(B) ∈ Σ. By result (a), E is a σ-algebra, and, by hypothesis, E ⊇ C. □

(c) If S is topological and h : S → R is continuous, then h is Borel.

Proof. Take C to be the class of open subsets of R, and apply result (b). □

►(d) For any measurable space (S, Σ), a function h : S → R is Σ-measurable if

    {h ≤ c} := {s ∈ S : h(s) ≤ c} ∈ Σ  (∀c ∈ R).

Proof. Take C to be the class π(R) of intervals of the form (−∞, c], c ∈ R, and apply result (b). □

Note. Obviously, similar results apply in which {h ≤ c} is replaced by {h ≥ c}, {h > c}, etc.

3.3. LEMMA. Sums and products of measurable functions are measurable

► mΣ is an algebra over R, that is, if λ ∈ R and h, h₁, h₂ ∈ mΣ, then

    h₁ + h₂ ∈ mΣ,   h₁h₂ ∈ mΣ,   λh ∈ mΣ.

Example of proof. Let c ∈ R. Then for s ∈ S, it is clear that h₁(s) + h₂(s) > c if and only if for some rational q, we have h₁(s) > q > c − h₂(s). In other words,

    {h₁ + h₂ > c} = ∪_{q∈Q} ({h₁ > q} ∩ {h₂ > c − q}),

a countable union of elements of Σ. □
..(8.6) Chapter 8: Random Variables 81 3.4. Composition Lemma. If h e mS and f £ mB, then f o h £ mS. Proof. Draw the picture: C ** D * v O O ► rl r It Note. There are obvious generalizations based on the definition (important in more advanced theory): if (Si, Si) and (S2,S2) are measurable spaces and h : Si —* S2, then h is called Si/S2-measurable if h~l : S2 —* Si. From this point of view, what we have called S-measurable should read SI'B-measurable (or perhaps S/#[—00,00]-measurable). 3.5. LEMMA on measurability of infs, lini infs of functions ► Let (hn : n £ N) be a sequence of elements o/mS. Then (i) inf 7in, (ii) liminf/in, (iii) limsup/in are Yt-measurable (into ([—00,00], ^[—00, 00]), but we shall still write mihn £ mS (for example)). Further, (iv) {s : limhn(s) exists in R} £ S. Proof, (i) {inf hn > c} = Hnl^w — c}* (ii) Let Ln(s) := inf {hr(s) : r > n}. Then Ln £ mS, by part (i). But L(s) := liminf hn(s) =| limLn(s) = supLn(s), (iii) This part is now obvious. (iv) This is also clear because the set on which lim/in exists in R is {limsup/in < 00} fl {liminf hn > —00} D ^"^({O}), where g := limsuphn — liminf hn. D 3.6. Definition. Random variable ►Let (Q^J7) be our (sample space, family of events). A random variable is an element of m^*. Thus, X : Q -> R, X'1 \B-*T.
32 Chapter 8: Random Variables (8.7).. 3.7. Example. Coin tossing Let n = {H,T}N, u = £ {H,T}. As in B.3,b), we define , T}). Let The definition of T guarantees that each Xn is a random variable. By Lemma 3.3, Sn := X\ + X2 + - • • + Xn = number of heads in n tosses is a random variable. Next, for p £ [0,1], we have A f := < a; : number of heads r r- number of tosses 1 p\ J = p} r : L where L+ := limsupn~15n and L~ is the corresponding lim inf. By Lemma 3.5, A6f ► Thus, we have taken an important step towards the Strong Law: the result is meaningful! It only remains to prove that it is true! 3.8. Definition. <r-algebra generated by a collection of functions on Q This is an important idea, discussed further in Section 3.14. (Compare the weakest topology which makes every function in a given family continuous, etc.) In Example 3.7, we have a given set 0, a family (Xn : n £ N) of maps Xn : Q —> R. The best way to think of the cr-algebra J- in that example is as T = a(Xn : n £ N) in the sense now to be described. ►Generally, if we have a collection (F7 : 7 £ C) of maps Y1 : Q -> R, then : 7 € C)
..(8.10) Chapter 8: Random Variables 88 is defined to be the smallest a-algebra y onQ such that each map Yy G G C) is y-measurable. Clearly, G(K7 : 7 G C) = <r({u; G a : Y^w) 6 B} : 7 € C,B 6 B). If X is a random variable for some (ft, J7), then, of course, cr(X) C T. Remarks, (i) The idea introduced in this section is something which you will pick up gradually as you work through the course. Don't worry about it now; think about it, yes! (ii) Normally, tt-systems come to our aid. For example, if (Xn : n G N) is a collection of functions on ft, and Xn denotes a{Xk : k < n), then the union (J Xn is a 7r-system (indeed, an algebra) which generates cr(Xn : n G N). 3.9. Definitions. Law, distribution function Suppose that X is a random variable carried by some probability triple (ft^P). We have or indeed [0,1] 4^- a{X) £^B. Define the law Cx of X by Cx~P°X-\ £X : B-» [0,1]. Then (Exercise!) Cx is a probability measure on (R,H). Since ?r(R) = {(—00, c] : c G R} is a 7r-system which generates B, Uniqueness Lemma 1.6 shows that Cx is determined by the function Fx : R —♦ [0,1] defined as follows: Fx(c) := £x(-oo,c] = P(X < c) = P{u; : X(cj) < c}. The function Fx is called the distribution function of X. 3.10. Properties of distribution functions Suppose thatF is the distribution function F = Fx of some random variable X. Then (a) F : R -> [0,1], F | (that is, x < y => F(x) < F(y)), (b) limx_ooF(a:) = l, limx__ooF(a:) = 0, (c) F w right-continuous. Proof of (c). By using Lemma A.10,b), we see that and this fact together with the monotonicity of Fx shows that Fx is right- continuous. Exercise! Clear up any loose ends.
3.11. Existence of random variable with given distribution function

If F has the properties (a,b,c) in Section 3.10, then, by analogy with Section 1.8 on the existence of Lebesgue measure, we can construct a unique probability measure L on (R, B) such that

L(−∞, x] = F(x), ∀x.

Take (Ω, F, P) = (R, B, L), X(ω) = ω. Then it is tautological that X has law L and distribution function F.

Note. The measure L just described is called the Lebesgue–Stieltjes measure associated with F. Its existence is proved in the next section.

3.12. Skorokhod representation of a random variable with prescribed distribution function

Again let F : R → [0,1] have properties (3.10,a,b,c). We can construct a random variable with distribution function F carried by ([0,1], B[0,1], Leb) as follows. Define (the right-hand equalities, which you can prove, are there for clarification only)

X⁻(ω) := inf{z : F(z) ≥ ω} = sup{y : F(y) < ω},
X⁺(ω) := inf{z : F(z) > ω} = sup{y : F(y) ≤ ω}.

[Figure omitted: a distribution function with a flat stretch and a jump, showing the cases in which X⁻(F(x)) and X⁺(F(x)) differ from x.]

(a) By definition of X⁻,

(ω ≤ F(c)) ⇒ (X⁻(ω) ≤ c).

Now, F(z) ≥ ω whenever z > X⁻(ω); so, by the right-continuity of F, F(X⁻(ω)) ≥ ω, and

(X⁻(ω) ≤ c) ⇒ (ω ≤ F(X⁻(ω)) ≤ F(c)).

Thus, (ω ≤ F(c)) ⟺ (X⁻(ω) ≤ c), so that

P(X⁻ ≤ c) = Leb[0, F(c)] = F(c).

(b) The variable X⁻ therefore has distribution function F, and the measure L in Section 3.11 is just the law of X⁻.

It will be important later to know that

(c) X⁺ also has distribution function F, and that, indeed, P(X⁺ = X⁻) = 1.

Proof of (c). By definition of X⁺,

(ω < F(c)) ⇒ (X⁺(ω) ≤ c),

so that F(c) ≤ P(X⁺ ≤ c); and since X⁻ ≤ X⁺, we also have P(X⁺ ≤ c) ≤ P(X⁻ ≤ c) = F(c), so that X⁺ has distribution function F. Moreover, since X⁻ ≤ X⁺, it is clear that

{X⁻ ≠ X⁺} = ∪_{c∈Q} {X⁻ ≤ c < X⁺}.

But, for every c ∈ R,

P(X⁻ ≤ c < X⁺) = P({X⁻ ≤ c} \ {X⁺ ≤ c}) ≤ F(c) − F(c) = 0.

Since Q is countable, the result follows.  □

Remark. It is in fact true that every experiment you will meet in this (or any other) course can be modelled via the triple ([0,1], B[0,1], Leb). (You will start to be convinced of this by the end of the next chapter.) However, this observation normally has only curiosity value.
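The Skorokhod construction is exactly the 'inverse-CDF' recipe used in practice to simulate a random variable with a prescribed distribution function: feed a uniform ω into X⁻. A minimal Python sketch follows; the choice F(z) = 1 − e^{−z} (a rate-1 exponential) is my own, made only because its generalized inverse is explicit, namely X⁻(ω) = −log(1 − ω).

```python
import numpy as np

# Skorokhod representation in practice: X^-(omega) = inf{z : F(z) >= omega}
# with omega drawn from Leb on [0,1].  For F(z) = 1 - exp(-z) this inverse
# is explicit: X^-(omega) = -log(1 - omega).
rng = np.random.default_rng(1)

def sample_via_skorokhod(n):
    omega = rng.random(n)          # Tyche's choice of omega in [0,1]
    return -np.log1p(-omega)       # X^-(omega)

x = sample_via_skorokhod(200_000)
for c in (0.5, 1.0, 2.0):
    # empirical P(X^- <= c) against F(c); they should be close
    print(c, (x <= c).mean(), 1 - np.exp(-c))
```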
SO Chapter 8: Random Variables (8.18).. 3.13. Generated cr-algebras - a discussion Suppose that (ft,.?7, P) is a model for some experiment, and that the ex- experiment has been performed, so that (see Section 2.2) Tyche has made her choice of a;. Let (Ky : 7 G C) be a collection of random variables associated with our experiment, and suppose that someone reports to you the following information about the chosen point u>: (*) the values Yy(uj), that is, the observed values of the random variables y-r G 6 C). Then the intuitive significance of the cr-algebra y := <r(Y7 : 7 € C) is that it consists precisely of those events F for which, for each and every a>, you can decide whether or not F has occurred (that is, whether or not u; € F) on the basis of the information (*); the information (*) is precisely equivalent to the following information: (**) the values IF(w) (F G y). (a) Exercise. Prove that the <r-algebra cr(Y) generated by a single random variable Y is given by a{Y) = Y~l{B) := ({w : Y(u>) € B} : B £ B), and that <r(Y) is generated by the 7r-system tt(Y) := ({w : Y(w) < x} : x <E R) = F-1(tt(R)). D The following results might help clarify things. Good advice: stop reading this section after (c)! Results (b) and (c) are proved in the appendix to this chapter. (b) If Y : Q —* R, then Z : ft —> R is an <r(Immeasurable function if and only if there exists a Borel function / : R —> R such that Z = f(Y). (c) If Ki, yi,..., Yn are functions from Q to R, then a function Z : Q —> R is cr(Yi, Y2,..., Yn)-measurable if and only if there exists a Borel function / on Rn such that Z = f(Y\, Y2,..., Yn). We shall see in the appendix that the more correct measurability condition on / is that / be '#n-measurable\ (d) If (F7 : 7 £ C) is a collection (parametrized by the infinite set C) of functions from ft to R, then Z : ft -> R is <r(Yy : 7 € C)-measurable if and only if there exists a countable sequence G,- : i G N) of elements of C and a Borel function / on RN such that Warning - for the over-enthusiastic only. For uncountable C, #(RC) is much larger than the C-fold product measure space Yi-yec #(R)- ^ is tne latter rather than the former which gives the appropriate type of / in (d).
..(8.14) ChapterS: Random Variables SI 3.14. The Monotone-Class Theorem In the same way that Uniqueness Lemma 1.6 allows us to deduce results about cr-algebras from results about 7r-systems, the following 'elementary' version of the Monotone-Class Theorem allows us to deduce results about general measurable functions from results about indicators of elements of n- systems. Generally, we shall not use the theorem in the main text, preferring 'just to use bare hands'. However, for product measure in Chapter 8, it becomes indispensable. THEOREM. Let H be a class of bounded functions from a set S into R satisfying the following conditions: (i) 7i is a vector space over R; (ii) the constant function 1 is an element ofTi; (iii) if (fn) is a sequence of non-negative functions in H. such that fn T / where f is a bounded function on S, then f £ 7i. Then if 7i contains the indicator function of every set in some tt- system X, then 7i contains every bounded <r(T)-measurable function on S. For proof, see the appendix to this chapter.
Chapter 4 Independence Let (fi,*F, P) be a probability triple. 4.1. Definitions of independence Note. We focus attention on the cr-algebra formulation (and describe the more familiar forms of independence in terms of it) to acclimatize ourselves to thinking of cr-algebras as the natural means of summarizing information. Section 4.2 shows that the fancy c-algebra definitions agree with the ones from elementary courses. Independent c-algebras ► Sub-<7-algebras ^i,&2»«-- of .F are called independent if, whenever G{ G Qi (i £ N) and i\,..., in are distinct, then n Independent random variables ►Random variables X\,X29 • • • are called independent if the cr-algebras are independent. Independent events ►Events JE?i, £?2»• • • sxe called independent if the <j-algebras £1, £2,... are independent, where Sn is the cr-algebra {$,En,£l\En,Q}. Since €n — °(lEn ), it follows that events E\, 2?2, • • • a^e independent if and only if the random variables Iex , /£;2,... are independent. 88
..D-%) Chapter 4- Independence 89 4.2. The 7r-system Lemma; and the more familiar definitions We know from elementary theory that events E\, E2,... are independent if and only if whenever n £ N and i1?... ,zn are distinct, then n corresponding results involving complements of the JE7«, etc., being conse- consequences of this. We now use the Uniqueness Lemma 1.6 to obtain a significant general- generalization of this idea, allowing us to study independence via (manage- (manageable) 7r-systems rather than (awkward) cr-algebras. Let us concentrate on the case of two cr-algebras. ►(a) LEMMA. Suppose that Q and 7i are sub-a-algebras of J-, and that X and J are ft-systems with cr(J) = G, a{J) = H. Then Q and Ji are independent if and only ifX and J are independent in that P(/n J) = P(/)P(J), 1 el, J ej. Proof Suppose that J and J are independent. For fixed / in J, the measures (check that they are measures!) H »-» P(J n H) and H 1-+ P(I)P(#) on @,7^) have the same total mass P(/), and agree on J. By Lemma 1.6, they therefore agree on <r(J) = 7i. Hence, p(jnh) = p(/)P(#), /€ J, h en. Thus, for fixed H in K, the measures G 1-+ P(G n JI) and G 1-+ P(G)P(fT) on @, ^) have the same total mass P(^), and agree on X. They therefore agree on <r(J) = Q; and this is what we set out to prove. □
j@ Chapter 4-' Independence Suppose now that X and Y are two random variables on (ft, T, P) such that, whenever x,y € R, (b) P(X < x\ Y < y) = P(X < x)P(Y < y). Now, (b) says that the 7r-systems tt(X) and tt(Y) (see Section 3.13) are independent. Hence cr{X) and <r(Y) are independent: that is, X and Y are independent in the sense of Definition 4.1. In the same way, we can prove that random variables Xi, X2,. • • ,Xn are independent if and only if n P(Xk < xk : 1 < k < n) = JJ P(Xk < xk), and all the familiar things from elementary theory. Command: Do Exercise E4.1 now. 4.3. Second Borel-Cantelli Lemma (BC2) If (En : n € N) is a sequence of independent events, then = 00 =^ P(JEJn, i.o.) = P(limsup£n) = 1 Proof First, we have (limsup£n)c = liminfEcn = (J f] Ecn. m With pn denoting P(£^n), we have n>m this equation being true if the condition {n > m} is replaced by condition {r > n > m}, because of independence, and the limit as r f 00 being justified by the monotonicity of the two sides. For x > 0, 1 — x < exp(—x), so that, since Y!,Pn = °°> n>m \ n>m So, P[(limsuPjE;n)c]=0. D Exercise. Prove that if 0 < pn < 1 and S := J2pn < 00, then Y[(l 0. Hint First show that if S < 1, then J](l - Pn) > 1 ~
4.4. Example

Let (X_n : n ∈ N) be a sequence of independent random variables, each exponentially distributed with rate 1:

P(X_n > x) = e^{−x}, x ≥ 0.

Then, for α > 0,

P(X_n > α log n) = n^{−α},

so that, using (BC1) and (BC2),

(a0) P(X_n > α log n for infinitely many n) = 1 if α ≤ 1; = 0 if α > 1.

Now let L := lim sup (X_n / log n). Then

P(L ≥ 1) ≥ P(X_n > log n, i.o.) = 1,

and, for k ∈ N,

P(L > 1 + 2k^{−1}) ≤ P(X_n > (1 + k^{−1}) log n, i.o.) = 0.

Thus, {L > 1} = ∪_k {L > 1 + 2k^{−1}} is P-null, and hence L = 1 almost surely.

Something to think about

In the same way, we can prove the finer result

(a1) P(X_n > log n + α log log n, i.o.) = 1 if α ≤ 1; = 0 if α > 1,

or, even finer,

(a2) P(X_n > log n + log log n + α log log log n, i.o.) = 1 if α ≤ 1; = 0 if α > 1,

or etc. By combining in an appropriate way (think about this!) the sequence of statements (a0), (a1), (a2), ... with the statement that the union of a countable number of null sets is null while the intersection of a sequence of probability-1 sets has probability 1, we can obviously make remarkably precise statements about the size of the big elements in the sequence (X_n). I have included in the appendix to this chapter the statement of a truly fantastic theorem about precise description of long-term behaviour: Strassen's Law.
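A quick Monte Carlo illustration (my own sketch, not part of the text): over the block n ∈ (N/2, N], the largest value of X_n / log n settles down towards 1 as N grows, in line with L = 1 a.s. The block sizes and the random seed are arbitrary choices.

```python
import numpy as np

# Illustration of limsup X_n / log n = 1 a.s. for IID rate-1 exponentials:
# the largest value of X_n / log n over the block n in (N/2, N] hovers
# around 1 once N is large.
rng = np.random.default_rng(2)

for N in (10**3, 10**4, 10**5, 10**6):
    n = np.arange(N // 2 + 1, N + 1)
    x = rng.exponential(scale=1.0, size=n.size)   # X_n, exponential with rate 1
    print(N, round((x / np.log(n)).max(), 3))
```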
42 Chapter 4-' Independence D-4)- A number of exercises in Chapter E are now accessible to you. 4.5. A fundamental question for modelling Can we construct a sequence (Xn : n E N) of independent random variables, Xn having prescribed distribution function Fn ? We have to be able to answer Yes to this question - for example, to be able to construct a rigorous model for the branching-process model of Chapter 0, or indeed for Example 4.4 to make sense. Equation @.2,b) makes it clear that a Yes answer to our question is all that is needed for a rigorous branching-process model. The trick answer based on the existence of Lebesgue measure given in the next section does settle the question. A more satisfying answer is provided by the theory of product measure, a topic deferred to Chapter 8. 4.6. A coin-tossing model with applications Let (ft,^1*, P) be ([0, l],#[0,1], Leb). For u E ft, expand u in binary: (The existence of two different expansions of a dyadic rational is not going to cause any problems because the set D (say) of dyadic rationals in [0,1] has Lebesgue measure 0 - it is a countable set!) An an Exercise, you can prove that the sequence (£„ : n E N), where is a sequence of independent variables each taking the values 0 or 1 with probability \ for either possibility. Clearly, (£n : n E N) provides a model for coin tossing. Now define and so on. We now need a bit of common sense. Since the sequence has the same 'coin-tossing' properties as the full sequence (un : n E N), it is clear that Y± has the uniform distribution on [0,1]; and similarly for the other Y's.
..D-8) Chapter 4- Independence 4$ Since the sequences A,3,6,...), B,5,9,...), ... which give rise to Yi, F2,... are disjoint, and therefore correspond to different sets of tosses of our 'coin', it is intuitively obvious that ► Fi,Y2,--- are independent random variables, each uniformly dis- distributed on [0,1], Now suppose that a sequence (Fn : n £ N) of distribution functions is given. By the Skorokhod representation of Section 3.12, we can find functions gn on [0,1] such that Xn := gn(Yn) has distribution function Fn. But because the Y-variables are independent, the same is obviously true of the X-variables. ► We have therefore succeeded in constructing a family (Xn : n € N) of independent random variables with prescribed distribution functions. Exercise. Satisfy yourself that you could if forced carry through these intuitive arguments rigorously. Obviously, this is again largely a case of utilizing the Uniqueness Lemma 1.6 in much the same way as we did in Section 4.2. 4.7. Notation: IID RVs Many of the most important problems in probability concern sequences of random variables (RVs) which are independent and identically distributed (IID). Thus, if (Xn) is a sequence of IID variables, then the Xn are inde- independent and all have the same distribution function F (say): P(*» < x) = F(x), V?2,Vz. Of course, we now know that for any given distribution function F, we can construct a triple (H,^*, P) carrying a sequence of IID RVs with common distribution function F. In particular, we can construct a rigorous model for our branching process. 4.8. Stochastic processes; Markov chains ►A stochastic process Y parametrized by a set C is a collection Y = (K, : 7 6 C) of random variables on some triple (Q,^*, P). The fundamental question about existence of a stochastic process with prescribed joint distributions is (to all intents and purposes) settled by the famous Daniell-Kolmogorov theorem, which is just beyond the scope of this course.
Our concern will be mainly with processes X = (X_n : n ∈ Z⁺) indexed (or parametrized) by Z⁺. We think of X_n as the value of the process X at time n. For ω ∈ Ω, the map n ↦ X_n(ω) is called the sample path of X corresponding to the sample point ω.

A very important example of a stochastic process is provided by a Markov chain. Let E be a finite or countable set. Let P = (p_ij : i, j ∈ E) be a stochastic E × E matrix, so that for i, j ∈ E, we have

p_ij ≥ 0,  Σ_{k∈E} p_ik = 1.

Let μ be a probability measure on E, so that μ is specified by the values μ_i := μ({i}) (i ∈ E). By a time-homogeneous Markov chain Z = (Z_n : n ∈ Z⁺) on E with initial distribution μ and 1-step transition matrix P is meant a stochastic process Z such that, whenever n ∈ Z⁺ and i_0, i_1, ..., i_n ∈ E,

(a) P(Z_0 = i_0; Z_1 = i_1; ...; Z_n = i_n) = μ_{i_0} p_{i_0 i_1} ... p_{i_{n−1} i_n}.

Exercise. Give a construction of such a chain Z expressing Z_n(ω) explicitly in terms of the values at ω of a suitable family of independent random variables. See the appendix to this chapter.

4.9. Monkey typing Shakespeare

Many interesting events must have probability 0 or 1, and we often show that an event F has probability 0 or 1 by using some argument based on independence to show that P(F)² = P(F). Here is a silly example, to which we apply a silly method, but one which both illustrates very clearly the use of the monotonicity properties of measures in Lemma 1.10 and has a lot of the flavour of the Kolmogorov 0-1 law. See the 'Easy exercise' towards the end of this section for an instantaneous solution to the problem.

Let us agree that correctly typing WS, the Collected Works of Shakespeare, amounts to typing a particular sequence of N symbols on a typewriter. A monkey types symbols at random, one per unit time, producing an infinite sequence (X_n) of IID RVs with values in the set of all possible symbols. We agree that

ε := inf{P(X_1 = x) : x is a symbol} > 0.

Let H be the event that the monkey produces infinitely many copies of WS. Let H_k be the event that the monkey will produce at least k copies of WS in
..D-9) Chapter 4-' Independence 4b all, and let Hm^k be the probability that it will produce at least k copies by time m. Finally, let #(m) be the event that the monkey produces infinitely many copies of WS over the time period [m -f- l,oo). Because the monkey's behaviour over [l,m] is independent of its be- behaviour over [m + 1, oo), we have P(Hm,k n But logic tells us that, for every m, i/(m) = Hi Hence, P{Hm,t n H) = P{Hm,k)P{H). But, as m f oo, Hm<k T Hk, and (Hm,k nfl)| (Hk D H) = H, it being obvious that Hk D H. Hence, by Lemma 1.10(a), P(H) = P(Hk)P(H). However, as k | oo, Hk I H, and so, by Lemma 1.10(b), P(H) = P(H)P(H), whence P(H) = 0 or 1. The Kolmogorov 0-1 law produces a huge class of important events E for which we must have P(E) = 0orP(E) = 1. Fortunately, it does not tell us which - and it therefore generates a lot of interesting problems! Easy exercise. Use the Second Borel-Cantelli Lemma to prove that P(H) = 1. Hint. Let E\ be the event that the monkey produces WS right away, that is, during time period [l,iV]. Then P(E\) > eN. Tricky exercise ( to which we shall return). If the monkey types only capital letters, and is on every occasion equally likely to type any of the 26, how long on average will it take him to produce the sequence 'ABRACADABRA'? The next three sections involve quite subtle topics which take time to as- assimilate. They are not strictly necessary for subsequent chapters. The Kolmogorov 0-1 law is used in one of our two proofs of the Strong Law for IID RVs, but by that stage a quick martingale proof (of the 0-1 law) will have been provided. Note. Perhaps the otherwise-wonderful Tg^- makes its T too like X. Below, I use K, instead of X to avoid the confusion. Script X, X, is too like Greek chi, x-) t°°; btrt we have to live with that.
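For the two exercises above, a simulation can at least make the orders of magnitude vivid. Typing 'ABRACADABRA' over a 26-letter alphabet has an astronomically large mean waiting time, so the Python sketch below (my own, purely illustrative) shrinks the problem to the pattern 'ABA' over the alphabet {A, B}; the empirical mean comes out close to 10 = 2³ + 2, the value predicted by the overlap argument to which the tricky exercise alludes (my claim here, not the book's at this point).

```python
import random

# Waiting time for a monkey typing the alphabet {A, B} uniformly at random
# to first complete the pattern 'ABA'.  ('ABRACADABRA' over 26 letters is
# hopeless to simulate directly: its mean waiting time is astronomical.)
random.seed(0)

def waiting_time(pattern="ABA", alphabet="AB"):
    recent, t = "", 0
    while not recent.endswith(pattern):
        recent = (recent + random.choice(alphabet))[-len(pattern):]
        t += 1
    return t

trials = [waiting_time() for _ in range(100_000)]
print(sum(trials) / len(trials))   # empirically close to 10 = 2^3 + 2^1
```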
46 Chapter 4- Independence D-10).. 4.10. Definition. Tail cr-algebras Let X\, X2, • • • be random variables. Define (a) Tn := <7(Xn+i,Xri+2? • • • )•> T := | | n The cr-algebra T is called the tail a-algebra of the sequence (Xn : n £ N). Now, T contains many important events: for example, (bl) F\ := (limX* exists) := {a; : limX*(a;) exists}, k (b2) F2 := B__\ Xk converges), ,,ov c, fv X\ + X2-] h Xk . \ (b3) F3 := ( lim = exists ) . \ A> / \ / Also, there are many important variables which are in mT: for example, , , r .- IT" A2 i~ * ' ' + A.k (c) fi:=hmsup , which may be ±00, of course.* Exercise. Prove that JFi, JF2 and JF3 are are T-measurable, that the event H in the monkey problem is a tail event, and that the various events of probability 0 and 1 in Section 4.4 are tail events. Hint - to be read only after you have already tried hard. Look at Fz for example. For each n, logic tells us that F$ is equal to the set Now, Xn+i, Xn+2, • • • are all random variables on the triple (fi, 7^, P). That ^3 E Tn now follows from Lemmas 3.3 and 3.5. 4.11. THEOREM. Kolmogorov's 0-1 Law Let (Xn : n £ N) be a sequence of independent random variables, and let T be the tail a-algebra of (Xn : n E N). Then T is P-trivial: that is, (i) FeT => P(F) = 0 or P(F) = 1, (ii) if £ is a T-measurable random variable, then, £ is almost deter- deterministic in that for some constant c in [—00,00], Ptf = c) = 1.
..D-11) Chapter 4- Independence We allow £ = d=oo at (ii) for obvious reasons. Proof of (i). Let Xn := a(X\,... ,Xn), Tn := < /: We claim that Xn and Tn are independent. Proof of claim. The class JC of events of the form u : Xi(cj) <xi:l<k<n}, jr^RU {oo} is a x-system which generates Xn. The class J of sets of the form {u : Xj(lj) < xj : n + 1 < j < n + r}, r G N, Xj G R U {oo} is a 7r-system which generates Tn. But the assumption that the sequence (Xk) is independent implies that K and J are independent. Lemma 4.2(a) now clinches our claim. Step 2: Xn and T are independent. This is obvious because T C Tn. Step 8: We claim that Xqq := &(Xn : n E N) and T are independent. Proof of claim. Because Xn C Xn+i, Vn, the class /Coo := U^" 1S a T~ system (it is generally NOT a <7-algebra!) which generates A^. Moreover, /Coo and T are independent, by Step 2. Lemma 4.2(a) again clinches things. Step 4> Since T C Xqq, T is independent of T! Thus, FeT => P(F) = P(F n F) = P(F)P(F), and P(F) - 0 or 1. D Proof of (ii). By part (i), for every x in R, P(f < x) = 0 or 1. Let c := sup{a: : P(^ < x) = 0}. Then, if c = —oo, it is clear that P(£ = —oo) = 1; and if c = oo, it is clear that P(£ = oo) = 1. So, suppose that c is finite. Then P(£ < c — 1/n) = 0, Vn, so that while, since P(£ < c + 1/n) = 1, Vn, we have
48 Chapter 4- Independence D.11)~ Hence, P{( = c) = 1. □ Remarks. The examples in Section 4.10 show how striking this result is. For example, ifX\,X2,... is a sequence of independent random variables, then either P(V^ Xn converges) = 0 or P(TJ Xn converges) = 1. The Three Series Theorem (Theorem 12.5) completely settles the question of which possibility occurs. So, you can see that the 0-1 law poses numerous interesting questions. Example. In the branching-process example of Chapter 0, the variable Moo := lim Znj\in is measurable on the tail <7-algebra of the sequence (Zn : n E N) but need not be almost deterministic. But then the variables (Zn : n £ N) are not independent. 4.12. Exercise/Warning Let Yq , Y\, Y2,... be independent random variables with P(Kn = +l) = P(Kn = -l)=i, Vn. For n 6 N, define Xn := 10Y1 ... Yn. Prove that the variables X\,X2,... are independent. Define y:=a(YlyY2>...), Tn := a(Xr : r > n). Prove that \ n Hint. Prove that Yq 6 m£ and that Yo is independent of 71. Notes. The phenomenon illustrated by this example tripped up even Kol- mogorov and Wiener. The very simple illustration given here was shown to me by Martin Barlow and Ed Perkins. Deciding when we can assert that (for J> a a-algebra and (Tn) a decreasing sequence of cr-algebras ) n \ n / is a tantalizing problem in many probabilistic contexts.
Chapter 5 Integration 5.0. Notation, etc. /i(f) :=: J/d/x, /i(f;A) Let (S, £,//) be a measure space. We are interested in defining for suitable elements / of mS the (Lebesgue) integral of / with respect to //, for which we shall use the alternative notations: /*(/) :==: Is f(sMds) :==: Is fd^' It is worth mentioning now that we shall also use the equivalent nota- notations for i6S: (with a true definition on the extreme right!) It should be clear that, for example, tifl />*):= Kfr A)> where A = {s G S : f(s) > x). Something else worth emphasizing now is that, of course, summation is a special type of integration. If (an : n £ N) is a sequence of real numbers, then with S = N, S = V(N), and fi the measure on E, S) with fi({k}) = 1 for every A: in N, then s t-* as is /i-integrable if and only if ^ |an| < oo, and then fln ==: / as/i(ds) = I ad/j, Js Js We begin by considering the integral of a function / in (mS)+, allowing such an f to take values in the extended half-line [0, oo]. 49
50 Chapter 5: Integration E.1). 5.1. Integrals of non-negative simple functions, SF+ If A is an element of S, we define oo. The use of fio rather than fi signifies that we currently have only a naive integral defined for simple functions. An element / of (mS)+ is called simple, and we shall then write / G , if / may be written as a finite sum m (a) where a* G [0, oo] and Ak G S. We then define (b) ^o(/) = Vja*/i(Ajfc) < oo (with O.oo := 0 =: oo.O). The first point to be checked is that fJ,o(f) is well-defined; for / will have many different representations of the form (a), and we must ensure that they yield the same value of Ho(f) in (b). Various desirable properties also need to be checked, namely (c), (d) and (e) now to be stated: (c) if f,g G SF+ and fi(f j= g) = 0 then /zo(/) = (d) ('Linearity') if f,g £ SF+ and c > 0 then / + 9 and cf are in SF+, and (e) (Monotonicity) if /,# G 5F+ and / < g, then //o(/) 5: (f) if /, # G SF+ then / A 0 and / V g are in SF+. Checking all the properties just mentioned is a little messy, but it in- involves no point of substance, and in particular no analysis. We skip this, and turn our attention to what matters: the Monotone-Convergence Theorem. 5.2. Definition of//(/), / G (mE)+ ►For / G (mS)+ we define (a) fi(f) := sup{no(h) : h G SF+, h < /} < 00. Clearly, for / G 5F+, we have //(/) = The following result is important.
LEMMA

►(b) If f ∈ (mΣ)⁺ and μ(f) = 0, then μ({f > 0}) = 0.

Proof. Obviously, {f > 0} = ↑lim {f > n^{−1}}. Hence, using (1.10,a), we see that if μ({f > 0}) > 0, then, for some n, μ({f > n^{−1}}) > 0, and then

μ(f) ≥ μ₀( n^{−1} I_{{f > n^{−1}}} ) = n^{−1} μ({f > n^{−1}}) > 0.  □

5.3. Monotone-Convergence Theorem (MON)

►(a) If (f_n) is a sequence of elements of (mΣ)⁺ such that f_n ↑ f, then

μ(f_n) ↑ μ(f) ≤ ∞,

or, in other notation,

∫_S f_n dμ ↑ ∫_S f dμ.

This theorem is really all there is to integration theory. We shall see that other key results such as the Fatou Lemma and the Dominated-Convergence Theorem follow trivially from it.

The (MON) theorem is proved in the Appendix. Obviously, the theorem relates very closely to Lemma 1.10(a), the monotonicity result for measures. The proof of (MON) is not at all difficult, and may be read once you have looked at the following definition of α^{(r)}.

It is convenient to have an explicit way, given f ∈ (mΣ)⁺, of obtaining a sequence f^{(r)} of simple functions such that f^{(r)} ↑ f. For r ∈ N, define the rth staircase function α^{(r)} : [0, ∞] → [0, ∞] as follows:

(b) α^{(r)}(x) = 0 if x = 0; = (i − 1)2^{−r} if (i − 1)2^{−r} < x ≤ i2^{−r} ≤ r (i ∈ N); = r if x > r.

Then f^{(r)} := α^{(r)} ∘ f satisfies f^{(r)} ∈ SF⁺, and f^{(r)} ↑ f, so that, by (MON), μ(f^{(r)}) ↑ μ(f).

We have made α^{(r)} left-continuous so that if f_n ↑ f then α^{(r)}(f_n) ↑ α^{(r)}(f).
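A small Python sketch (mine, not the book's) of the staircase function α^{(r)}, just to make the dyadic construction concrete; the test points are arbitrary.

```python
import numpy as np

# alpha^(r): rounds x down to the dyadic grid {(i-1) 2^{-r}} on (0, r],
# is 0 at 0, and is capped at r for x > r.  It is left-continuous.
def staircase(x, r):
    x = np.asarray(x, dtype=float)
    inside = (np.ceil(x * 2**r) - 1) / 2**r   # (i-1)2^{-r} on ((i-1)2^{-r}, i 2^{-r}]
    return np.where(x > r, float(r), np.where(x > 0, inside, 0.0))

xs = np.array([0.0, 0.3, 0.75, 1.9, 5.0])
for r in (1, 2, 4, 8):
    print(r, staircase(xs, r))   # non-decreasing in r, creeping up towards x
```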
52 Chapter 5: Integration E.8).. Often, we need to apply convergence theorems such as (MON) where the hypothesis (fn | / m the case of (MON)) holds almost everywhere rather than everywhere. Let us see how such adjustments may be made. (c) If f,g e (mS)+ and f = g (a.e.), then fi(f) = fi{g). Proof. Let /<r> = a<r> o /, g^ = a<r> o g. Then /<r> = g^ (a.e.) and so, by E.1,c), /i(/(r)) = K9(r))- Now let r T oo, and use (MON). □ (d) If f £ (mS)+ and (fn) is a sequence in (m£)+ such that, except on a fi-null set TV", fn | /. Then Kfn) T /i(/). Proo/. We have p(f) = fi(fls\N) and /i(/n) = A*(/nIs\iv). But fnls\N T fls\N everywhere. The result now follows from (MON). □ From now on, (MON) is understood to include this extension. We do not bother to spell out such extensions for the other convergence theorems, often stating results with 'almost everywhere' but proving them under the assumption that the exceptional null set is empty. Note on the Riemann integral If, for example, / is a non-negative Riemann integrable function on ([0,1], #[0,1], Leb) with Riemann integral I, then there exists an increasing se- sequence (Ln) of elements of SF+ and a decreasing sequence (Un) of elements of SF+ such that Ln T L < /, Un i U > f and n(Ln) T I» M^n) 1 I. If we define ■{ - i L if L — c7, 0 otherwise, then it is clear that / is Borel measurable, while (since fi{L) = n(U) = 1) {/ ¥" /} ^s a subset of the Borel set {L ^ U} which Lemma 5.2(b) shows to be of measure 0. So / is Lebesgue measurable (see Section Al.ll) and the Riemann integral of / equals the integral of / associated with ([0, l],Le6[0, l],Leb), Le6[0,1] denoting the cr-algebra of Lebesgue measur- measurable subsets of [0,1]. 5.4. The Fatou Lemmas for functions (a) (FATOU) For a sequence (/„) in (mS)+, //(liminf fn) < limin
..E.6) Chapter 5: Integration 53 Proof. We have (*) liminf/„ =f lim#jt, where gk := infn>jb fn. n For n > k, we have fn > gk, so that /i(/n) > /^(^fc), whence < inf and on combining this with an application of (MON) to (*), we obtain ^(liminf/n) =| lim//($*) <| lim inf //(/„) n it It n>k =: liminf/i(/n). D n Reverse Fatou Lemma ► (b) // (/n) is a sequence in (mS)+ such that for some g in (mS)+, we have fn < g,Vn, and fi(g) < oo, then /i(limsup/n) > lim sup//(/„). Proof Apply (FATOU) to the sequence {g - /„). □ 5.5. 'Linearity' For a, $ e R+ andf.g 6 (mS)+, fi(af + /?^) = a/i(/) + ^/i(flf) (< oo). Proof Approximate / and g from below by simple functions, apply E.1,d) to the simple functions, and then use (MON). □ 5.6. Positive and negative parts of/ For / £ mS, we write / = /+ — /~, where f+(s) := max(/(a), 0), f~(s) := Then /+,/-€ (mS)+, and |/| = /++/-.
54 Chapter 5: Integration E.7).. 5.7. Integrable function, ►For / E mS, we say that / is \i-integrable, and write if and then we define Note that, for / 6 £J(S, S, /*), • Hf)\ < K\f\), the familiar rule that the modulus of the integral is less than or equal to the integral of the modulus. We write £X(S, S, /i)+ for the class of non-negative elements in £l(S, E, //). 5.8. Linearity Fora,p e R and f,g G and Proof. This is a totally routine consequence of the result in Section 5.5. □ 5.9. Dominated-Convergence Theorem (DOM) ► Suppose that /n, / E m£, that /n(«s) —► f(s) for every s in S and that the sequence (/„) is dominated by an element g of Cl(S, £,//)+: \fn{s)\<g(s), VsES,VneN, where n(g) < oo. Then fn-+f in £\S, S,/x): that is, /i(|/n - /|) -> 0, whence Kfn) Command: Do Exercise E5.1 now.
..E.11) Chapter 5: Integration 55 Proof. We have \fn - f\ < 2#, where fiBg) < oo, so by the reverse Fatou Lemma 5.4(b), limsup/i(|/n - /|) < //(limsupl/n - /|) = /i@) = 0. Since IM/n) - Kf)\ = Hfn ~ f)\ < MI/» - /I), the theorem is proved. 5.10. Scheffe's Lemma (SCHEFFfi) ►(i) Suppose that /„,/ 6 Cl(S, S,^)+; in particular, fn and f are non- negative. Suppose that fn —> f (a.e.). Then v(\fn - /I) -> 0 if and only if ji(fn) Proof The 'only if part is trivial. Suppose now that 0) Since (/„ - /)" < /, (DOM) shows that (b) Next, But Hfn - /; /» < /)| < K(/n - /Dl - o so that (a) and (b) together imply that Of course, (b) and (c) now yield the desired result. □ Here is the second part of Scheffe's Lemma. ► (ii) Suppose that fn,f E £*(£, £,//) and that /n —> / (a.e.). TTten i ~ /I) "* 0 if and only if »(\fn\) - Exercise. Prove the 'if part of (ii) by using Fatou's Lemma to show that ^(Z^) ~* A£(/±)> an(l then applying (i). Of course, the 'only if part is trivial. 5.11. Remark on uniform integrability The theory of uniform integrability, which we shall establish later for proba- probability triples, gives better insight into the matter of convergence of integrals.
56 Chapter 5: Integration E.12).. 5.12. The standard machine What I call the standard machine is a much cruder alternative to the Monotone-Class Theorem. The idea is that to prove that a 'linear' result is true for all functions /iina space such as Cl(S, H,fi), • first, we show the result is true for the case when h is an indicator function - which it normally is by definition; • then, we use linearity to obtain the result for h in SF+; • next, we use (MON) to obtain the result for h E (m£)+, integrability conditions on h usually being superfluous at this stage; • finally, we show, by writing h = h+ — h~~ and using linearity, that the claimed result is true. It seems to me that, when it works, it is easier to 'watch the standard machine work' than to appeal to the monotone-class result, though there are times when the greater subtlety of the Monotone-Class Theorem is essential. 5.13. Integrals over subsets Recall that for / E (m£)+, we set, for A E D, / Ja If we really want to integrate / over A, we should integrate the restriction /| a with respect to the measure jia (say) which is fi restricted to the measure space (A, £,4), Ha denoting the <7-algebra of subsets of A which belong to E. So we ought to prove that The standard machine does this. If / is the indicator of a set B in A, then both sides of (a) are just fi(A fl B); etc. We discover that for f E mS, we have J\a € mS^; and then S\a € C^A^a^a) if and only if flA E in which case (a) holds.
..E.14) Chapter 5: Integration 57 5.14. The measure //i, / G (m£)+ Let / e (mS)+. For AeS, define (a) A trivial Exercise on the results of Section 5.5 and (MON) shows that (b) (//*) w a measure on E, £). For h 6 (m£)+, and A G 2, we can conjecture that (c) If h is the indicator of a set in E, then (c) is immediate by definition. Our standard machine produces (c), so that we have Result (d) is often used in the following form: 00 if f e (mS)+ and h e (mS), then h e C^S^.ffi) if and only if fh e Cl{S,Y,,ii) and then (ffi)(h) = fi(fh). Proof. We need only prove this for h > 0 in which case it merely says that the measures at (d) agree on 5. d Terminology, and the Radon-Nikodym theorem If A denotes the measure ffi on E, E), we say that A has density f relative to /i, and express this in symbols via dX/dfi = f. We note that in this case, we have for F £ S: (f) n(F) = 0 implies that X(F) = 0; so that only certain measures have density relative to fi. The Radon- Nikodym theorem (proved in Chapter 14) tells us that */ /•* and ^ are cr-finite measures on E, D) such that (f) holds, then A = f/j, for some f £ (mS)+.
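On a finite measure space, the construction λ = fμ and the identity (e) reduce to weighted sums, so a two-line numerical check is possible. The Python sketch below is mine, with an arbitrary finite set, weights, density and integrand.

```python
import numpy as np

# The measure lambda = f.mu on the finite space S = {0,...,4} with weights mu,
# and the identity (f.mu)(h) = mu(f h) from (5.14,e), checked numerically.
rng = np.random.default_rng(7)
mu = rng.random(5)            # a finite measure on 5 points
f = rng.random(5)             # a non-negative density
h = rng.standard_normal(5)    # an integrand

lam = f * mu                  # lambda({s}) = f(s) mu({s})
print(np.sum(h * lam), np.sum(f * h * mu))   # equal: lambda(h) = mu(f h)
```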
Chapter 6 Expectation 6.0. Introductory remarks We work with a probability triple (Q, T, P), and write Cr for £r(£l, J7, P). Recall that a random variable (RV) is an element of mj7, that is an T- measurable function from fi to R. Expectation is just the integral relative to P. Jensen's inequality, which makes critical use of the fact that P@) = 1, is very useful and powerful: it implies the Schvvarz, Holder, ... inequalities for general (S, S,/i). (See Section 6.13.) We study the geometry of the space £2(Q, T, P) in some detail, with a view to several later applications. 6.1. Definition of expectation For a random variable X 6 C1 = £1(Q,^r, P), we define the expectation E(X) ofX by E(X) := / XdP = [ X{u:)P{dw). We also define E(X) (< oo) for X G (mJF)+. In short, E(X) = P(X). That our present definitions agree with those in terms of probability density function (if it exists) etc. will be confirmed in Section 6.12. 6.2. Convergence theorems Suppose that (Xn) is a sequence of RVs, that X is a RV, and that Xn —> X almost surely: P(Xn -► X) = 1. We rephrase the convergence theorems of Chapter 5 in our new notation: 58
..F.4) Chapter 6: Expectation 59 (MON) if 0 < In T X, then E(Xn) T E(X) < oo; (FATOU) ifXn > 0, then E(X) < liminf E(Xn); ►(DOM) if \Xn(u>)\ < Y(u) V(n,u;), where E(F) < oo, then so that E(Xn) -+ ►(SCHEFFE) ifE(\Xn\) -♦ E(\X\), then E(\Xn - X\)-+0; (BDD) if for some finite constant K, |Xn(u;)| < K, V(n,u;), then E{\Xn - X\)-* Q. The newly-added Bounded Convergence Theorem (BDD) is an imme- immediate consequence of (DOM), obtained by taking Y(u>) = K, Vu;; because of the fact that P@) = 1, we have E(F) < oo. It has a direct elementary proof which we shall examine in Section 13.7; but you might well be able to provide it now. As has been mentioned previously, uniform integrability is the key con- concept which gives a proper understanding of convergence theorems. We shall study this, via the elementary (BDD) result, in Chapter 13. 6.3. The notation E(A"; F) For X EC1 (or (m^*)+) and Fe/,we define ► E(X; F) := JF X{u:)P{duj) := E(XIF), where, as ever, 1 if oj if a; Of course, this tallies with the /i(/; A) notation of Chapter 5. 6.4. Markov's inequality Suppose that Z £ mj7 and that g : R —► [0, oo] is B-measurable and non- decreasing. (We know that g{Z) = g o Z 6 (m.Jr)+.) Then Eg(Z) > E(g(Z); Z>c)> g(c)P(Z > c).
Chapter 6: Expectation F-4)~ Examples: for Z G (m^)+, cP(Z > c) < E(Z), (c > 0), eO, cP{\X\ > c) < ► Considerable strength can often be obtained by choosing the optimum 6 for c in ► P(F > c) < e"*cE(e*y), @>O, c G R). 6.5. Sums of non-negative RVs We collect together some useful results. (a) If X £ (m.F)+ and E(Ar) < oo, then P(X < oo) = 1. This is obvious. (b) If (Zk) is a sequence in (m^7)"^", then This is an obvious consequence of linearity and (MON). ►(c) If(Zk) is a sequence in (m^*)+ such that 5ZE(Zjt) < oo, then Y^Zk < oo (a.s.) and so Zk —» 0 (a.s.) This is an immediate consequence of (a) and (b). (d) The First Borel-Cantelli Lemma is a consequence o/(c). For suppose that (Fk) is a sequence of events such that ^P{Fk) < oo. Take Zk = If*- Then E(Zjt) = P(Fk) and, by (c), If* = number of events Fk which occur is a.s. finite. 6.6. Jensen's inequality for convex functions ►A function c : G —♦ R, where G is an open subinterval of R, is called convex on G if its graph lies below any of its chords: for x,y £ G and 0<p=l-g<l, c(px -f qy) < pc(x) + qc(y). It will be explained below that c is automatically continuous on G. If c is twice-differentiable on G, then c is convex if and only if c" > 0. ^Important examples of convex functions: \x\,x2 ,c6x{9 6 R).
..F.7) Chapter 6: Expectation 61 THEOREM. Jensen's inequality ► ► Suppose thai c : G —* R is a convex function on an open subinterval G of R and that X is a random variable such that < oo, P(X 6ff) = l, E|c(X)| < oo. Then Ec(X) > c(E(Ar)). Proof. The fact that c is convex may be rewritten as follows: for u,v,w G G with w < v < w, we have A ^ A 1_ A C(V) ~ C(U) A«,v < At,tU;) where AUtV := v — u It is now clear (why?!) that c is continuous on G, and that for each v in the monotone limits (D-c)(v) :=T Hm A«flM B?+c)(u) :=| lim A exist and satisfy (D-c)(v) < (J94_c)(v). The functions J9_c and J9+c are non-decreasing, and for every v in G, for any m in [(J9_c)(v), (Z>4_c)(v)] we have c(x) > m(x — v) + c(v), x £ G. In particular, we have, almost surely, for ji := c(X) > m(X -,i) + c(/i), m 6 [B and Jensen's inequality follows on taking expectations. □ Remark. For later use, we shall need the obvious fact that (a) c(x) = sup[(D_c)(q)(x - q) + c(q)] = sup(anx + bn) (x 6 G) G n for some sequences (an) and (bn) in R. (Recall that c is continuous.) 6.7. Monotonicity of Cp norms ►For 1 < p < oo, we say that X e Cp = £'(ft,^", P) if oo,
$2 Chapter 6: Expectation F.7). and then we define ► ||X||, := {E(\X\')}i. The monotonicity property referred to in the section title is the following: ►(a) if 1 < p < r < oo and Y G £r, then Y G Cp and \\y\\p < w\\r. ► Proof. For n 6 N, define Xn(u):={\Y(u)\An}'. Then Xn is bounded so that Xn and Xn are both in C1. Taking c{x) xrtp on @, oo), we conclude from Jensen's inequality that A ny\ < E(\Y\r). Now let n f oo and use (MON) to obtain the desired result. D Note. The proof is marked with a ► because it illustrates a simple but effective use of truncation. Vector-space property of Cp (b) Since, for a, b G R+, we have (a + b)p < [2max(a, b)]p < 2p(a Cp is obviously a vector space. 6.8. The Schwarz inequality ►(a) If X and Y are in £2, then XY G C1, and Remark. You will have seen many versions of this result and of its proof before. We use truncation to make the argument rigorous. Proof. By considering \X\ and |F| instead of X and F, we can and do restrict attention to the case when X > 0, Y > 0.
..F.9) Chapter 6: Expectation 68 Write Xn •*= X A n, Yn := Y A n, so that A"n and yn are bounded. For any a,b € R, 2 = a2E(A'2) + 2a6E(Xny«) + *>2E(Fn2), and since the quadratic in a/b (or 6/a, or...) does not have two distinct real roots, {2E(XnYn)}2 < 4E(A2)E(Fn2) < 4E(X2)E(F2). Now let n f oo using (MON). □ The following is an immediate consequence of (a): (b) if X and Y are in C2, then so is X +Y, and we have the triangle law. Remark. The Schwarz inequality is true for any measure space - see Sec- Section 6.13, which gives the extensions of (a) and (b) to Cp. 6.9. C2: Pythagoras, covariance, etc. In this section, we take a brief look at the geometry of C2 and at its con- connections with probabilistic concepts such as covariance, correlation, etc. Covariance and variance If X, Y 6 £2, then by the monotonicity of norms, A", Y 6 £*, so that we may define liX:=E(X), /iy:=E(r). Since the constant functions with values fiXil^Y &re in £2, we see that (a) X:=X-nx, Y :=Y - are in C2. By the Schwarz inequality, XY G £*, and so we may define (b) Cov(X, Y) := E(XY) = The Schwarz inequality further justifies expanding out the product in the final [ ] bracket to yield the alternative formula: (c) Cov(X, Y) = E(XY) - As you know, the variance of X is defined by (d) Var(X) := E[(X - fixf] = E(X2) -& = Cov(X, X).
Inner product, angle

For U, V ∈ L², we define the inner (or scalar) product

(e) ⟨U, V⟩ := E(UV),

and if ‖U‖₂ ≠ 0 and ‖V‖₂ ≠ 0, we define the cosine of the angle θ between U and V by

(f) cos θ = ⟨U, V⟩ / (‖U‖₂ ‖V‖₂).

This has modulus at most 1 by the Schwarz inequality. This ties in with the probabilistic idea of correlation:

(g) the correlation ρ of X and Y is cos α, where α is the angle between the centred variables X − μ_X and Y − μ_Y.

Orthogonality, Pythagoras theorem

L² has the same geometry as any inner-product space (but see 'Quotienting' below). Thus the 'cosine rule' of elementary geometry holds, and the Pythagoras theorem takes the form

(h) ‖U + V‖₂² = ‖U‖₂² + ‖V‖₂² if ⟨U, V⟩ = 0.

If ⟨U, V⟩ = 0, we say that U and V are orthogonal or perpendicular, and write U ⊥ V. In probabilistic language, (h) takes the form (with U, V replaced by the centred variables)

(i) Var(X + Y) = Var(X) + Var(Y) if Cov(X, Y) = 0.

Generally, for X_1, X_2, ..., X_n ∈ L²,

(j) Var(X_1 + X_2 + ... + X_n) = Σ_k Var(X_k) + 2 Σ_{i<j} Cov(X_i, X_j).

I have not marked results such as (i) and (j) with ► because I am sure that they are well known to you.

Parallelogram law

Note that by the bilinearity of ⟨·, ·⟩,

(k) ‖U + V‖₂² + ‖U − V‖₂² = ⟨U + V, U + V⟩ + ⟨U − V, U − V⟩ = 2‖U‖₂² + 2‖V‖₂².
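Since (i) and (j) are purely bilinear identities, they can be checked mechanically on data. The Python sketch below (mine) verifies (j) with sample variances and covariances, which obey the same algebra; the simulated data are arbitrary.

```python
import numpy as np

# Check of (j): Var(X_1 + ... + X_n) = sum_k Var(X_k) + 2 sum_{i<j} Cov(X_i, X_j).
# Sample variances and covariances are bilinear too, so the identity holds for
# them exactly (up to rounding).
rng = np.random.default_rng(5)
X = rng.standard_normal((4, 10_000))   # four rows = four random variables
X[1] += 0.5 * X[0]                     # make two of them correlated
C = np.cov(X)                          # 4 x 4 sample covariance matrix

lhs = np.var(X.sum(axis=0), ddof=1)
rhs = np.trace(C) + 2 * np.sum(np.triu(C, k=1))
print(lhs, rhs)                        # equal up to rounding error
```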
..F.10) Chapter 6: Expectation 65 Quotienting (or lack of it!): L2 Our space C2 does not quite satisfy the requirements for an inner product space because the best we can say is that (see E.2,b)) ||J7||2 = 0 if and only if U = 0 almost surely. In functional analysis, we find an elegant solution by defining an equiv- equivalence relation U ~ V if and only if U = V almost surely and define L2 as '£2 quotiented out by this equivalence relation'. Of course, one needs to check that if for t = 1,2, we have c, G R and 17,-, V{ G C2 with Ui ~ V*, then + c2U2 ~ dVl + c2V2; (UUU2) = (VUV2); that if Un -» U in C2 and Fn - I7n and V~U, then Vn -^ V in £2; etc. As mentioned in *A Question of Terminology', we normally do not do this quotienting in probability theory. Although one might safely do so at the moderately elementary level of this book, one could not do so at a more advanced level. For a Brownian motion {Bt : t G R+}, the crucial property that t h* Bt(u)) is continuous would be meaningless if one replaced the true function Bt on 0 by an equivalence class. 6.10. Completeness of Cp A < p < oo) Let p G [1, oo). The following result (a) is important in functional analysis, and will be crucial for us in the case when p = 2. It is instructive to prove it as an exercise in our probabilistic way of thinking, and we now do so. (a) If (Xn) is a Cauchy sequence in Cp in that sup ||Xr - X3\\p -+ 0 (k -+ oo) r,s>k then there exists X in Cp such that Xr —► X in Cp: \\Xr - X\\p -> 0 (r -> oo). Note. We already know that Cp is a vector space. Property (a) is important in showing that Cp can be made into a Banach space Lp by a quotienting technique of the type mentioned at the end of the preceding section.
C6 Chapter 6: Expectation F.10).. Proof of (a). We show that X may be chosen to be an almost sure limit of a subsequence (Xkn ). Choose a sequence (kn : n 6 N) with kn f oo such that Then E(|X*n+1 -X*J) = ||X,n+1 - AT*m ||! < ||X*n+1 -Xk so that Hence it is almost surely true that the series kn+i ~ Ar*B) converges (even absolutely!), so that limXjfcn(u;) exists for almost all u. Define Then X is ^"-measurable, and Xkn —■> A", a.s. Suppose that n G N and r > /?„. Then, for N B t >n, E (|Xr - Xit Y) = ||Xr - Xk, ||/ < 2-"', so that on letting t f oo and using Fatou's Lemma, we obtain Firstly, Xr — X G £p, so that X £ £p. Secondly, we see that, indeed, Xr -+ X in £P. D . For an easy exercise on £p convergence, see EA13.2. 6.11. Orthogonal projection The result on completeness of £p obtained in the previous section has a number of important consequences for probability theory, and it is perhaps as well to develop one of these while Section 6.10 is fresh in your mind. I hope that you will allow me to present the following result on orthog- orthogonal projection as a piece of geometry for now, deferring discussion of its central role in the theory of conditional expectation until Chapter 9. We write || • || for || • ||2 throughout this section.
..F.11) Chapter 6: Expectation 67 THEOREM Let K, be a vector subspace of C? which is complete in that whenever (Vn) is a sequence in K, which has the Cauchy property that sup then there exists a V in K, such that ||V»-V||->0 (n->oo). Then given X in C2 y there exists Y in K, such that (i) \\X - r|| = A := inf{||X - W\\ : W G £}, (ii) X - Y ± Z, VZ e£. Properties (i) and (ii) ofY in fC are equivalent and ifY shares either property (i) or (ii) with Y', then \\Y - Y|| = 0 (equivalent^, Y = Y, a.s.). Definition. The random variable Y in the theorem is called a version of the orthogonal projection of X onto IC. UY is another version, then Y = Y, a.s. Proof. Choose a sequence (Yn) in IC such that By the parallelogram law F.9,k), \\X - Yr\\2 + \\X - Ys\\2 = 2\\X - \{Yr + Fa)||2 + 2\\\(Yr - Y™2 But \(Yr + Y3) e £, so that \\X - \{Yr + y^,)||2 > A2. It is now obvious that the sequence (Yn) has the Cauchy property so that there exists a Y in K such that \\Yn - Y\\ - 0. Since F.8,b) implies that \\X - Y\\ < \\X -Yn\\ + \\Yn - Y\\, it is clear that \\X - Y\\ = A.
$8 Chapter 6: Expectation F.11). For any Z in £, we have Y + tZ G K for t 6 R, and so whence -2t(Z,X -Y) +t2\\Z\\2 >Q. This can only be the case for all t of small modulus if = 0. □ Remark. The case to which we shall apply this theorem is when K, has the form £2(Q, £, P) for some sub-a-algebra Q of J7. 6.12. The 'elementary formula' for expectation Back to earth! Let X be a random variable. To avoid confusion between different £'s, let us here write Ax on (R, B) for the law of X: Ax(B) := P(X e B). LEMMA Suppose that h is a Borel measurable function from R to R. Then h(X) e Cl(Sl,F,P) if and only if h G Cl(R,B,\x) and then (a) Eh(X) = Ax(h) = / h(x)Ax(dx). Jr Proof We simply feed everything into the standard machine. Result (a) is the definition of Ax if h = Ib (B E B). Linearity then shows that (a) is true if h is a simple function on (R, B). (MON) then implies (a) for h a non-negative function, and linearity allows us to complete the argument. □ Probability density function (pdf) We say that X has a probability density function (pdf) fx if there exists a Borel function fx : R —> [0, oo] such that (b) P(X eB)= f fx{x)dx, JB
..F.13) Chapter 6: Expectation 69 Here we have written dx for what should be Leb(dx). In the language of Section 5.12, result (b) says that Ax has density fx relative to Leb: dAx __ , dLeb JX' The function fx is only defined almost everywhere: any function a.e. equal to fx will also satisfy (b) 'and conversely'. The above lemma extends to E(\h(X)\) < oo if and only if / \h(x)\fx(x)dx < oo and then Eh(X) — I h(x)fx(x)dx. Jr 6.13. Holder from Jensen The truncation technique used to prove the Schwarz inequality in Section 6.8 relied on the fact that P(Q) < oo. However, the Schwarz inequality is true for any measure space, as is the more general Holder inequality. We conclude this chapter with a device (often useful) which yields the Holder inequality for any E, E, fi) from Jensen's inequality for probability triples. Let (S, E, /i) be a measure space. Suppose that ► p > 1 and p~l -f q~l = 1. Write / G £P(S, S,/i) if / G mS and //(|/|p) < oo, and in that case define 11/11, := THEOREM Suppose thai f,g G £P(S, S, //), h G Cq(S, S, //). Then ►(a) (Holder's inequality) fh G £1E',E,/i) and Mfh)\ < V(\fh\) < \\f\\p\\h\Ui ►(b) (Minkowski's inequality)
ft) Chapter 6: Expectation F.IS). Proof of (a). We can obviously restrict attention to the case when f,h >0 and//(/*)> 0. With the notation of Section 5.14, define P "" tip)' so that P is a probability measure on E", £). Define 0 if/(a) The fact that P(«)» < P(w«) now yields Proof of(b). Using Holder's inequality, we have till + g\p) = ti\M + ar1) + ti\g\\f + ^ + \\9\\pA where and (b) follows on rearranging. (The result is non-trivial only if /, g 6 £p, and in that case, the finiteness of A follows from the vector-space property of £P.) D
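On a finite set with arbitrary weights (a finite measure space), Hölder's inequality can be tested directly. The short Python sketch below is mine; the weights, the functions f and h, and the conjugate pair (p, q) are all arbitrary choices made for illustration.

```python
import numpy as np

# Hoelder's inequality on a finite measure space: S = {0,...,9} with weights w,
# mu(g) = sum_s g(s) w(s); check mu(|fh|) <= ||f||_p ||h||_q with 1/p + 1/q = 1.
rng = np.random.default_rng(6)
w = rng.random(10)                     # an arbitrary finite measure on 10 points
p = 1.7
q = p / (p - 1)                        # conjugate exponent
f = rng.standard_normal(10)
h = rng.standard_normal(10)

lhs = np.sum(np.abs(f * h) * w)                                   # mu(|fh|)
rhs = np.sum(np.abs(f)**p * w)**(1/p) * np.sum(np.abs(h)**q * w)**(1/q)
print(lhs <= rhs, lhs, rhs)
```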
Chapter 7 An Easy Strong Law 7.1. 'Independence means multiply' - again! THEOREM Suppose that X and Y are independent RVs, and that X and Y are both in C1. Then XY e C1 and E(AT) = E(X)E(F). In particular, if X andY are independent elements of C2, then Cov(X, Y) = 0 and Var(X + Y) = Var(X) + Var(F). Proof Writing X = X"^ — X , etc., allows us to reduce the problem to the case when X > 0 and Y > 0. This we do. But then, if a^ is our familiar staircase function, then = £ bjlB, where the sums are over finite parameter sets, and where for each i and Ai (in cr(X)) is independent of Bj (in cr(Y)). Hence n B = E E ^ Now let r f oo and use (MON). D Remark. Note especially that if X and Y are independent then X 6 C1 and Y 6 C1 imply that AT G £J. This is not necessarily true when X and 71
72 Chapter 7: An Easy Strong Law G.0).. Y are not independent, and we need the inequalities of Schwarz, Holder, etc. It is important that independence obviates the need for such inequalities. 7.2. Strong Law - first version The following result covers many cases of importance. You should note that though it imposes a 'finite 4th moment' condition, it makes no assumption about identical distributions for the (Xn) sequence. It is remarkable that so fine a result has so simple a proof. THEOREM ► Suppose that X\,X2, • • • are independent random variables, and that for some constant K in [0, oo), = 0, E(Xi)<K, Vk. Let Sn = Xi + X2 + • • • + Xn. Then or again, Sn/n —>■ 0 (a.s.). Proof. We have E(S*) = E[(A'j + Xt + • • • + Xnf] k because, for distinct i,j, k and /, E(XiX]) = E(XiX]Xk) = EiXiXjXkXt) = 0, using independence plus the fact that E(Arj) = 0. [Note that, for example, the fact that E(Xf) < oo implies that E(X]) < oo, by the 'monotonicity of £P norms' result in Section 6.7. Thus X{ and X] are in C1.] We know from Section 6.7 that [E(X?)]2 < E(Xf) < K, Vi. Hence, using independence again, for i ^ j, E(X?X]) = E(Xf)E(Xj) < K.
Thus

E(S_n^4) ≤ nK + 3n(n − 1)K ≤ 3Kn²,

and (see Section 6.5)

E[ Σ (S_n/n)^4 ] ≤ 3K Σ n^{−2} < ∞,

so that Σ (S_n/n)^4 < ∞, a.s., and S_n/n → 0, a.s.  □

Corollary. If the condition E(X_k) = 0 in the theorem is replaced by E(X_k) = μ for some constant μ, then the theorem holds with n^{−1}S_n → μ (a.s.) as its conclusion.

Proof. It is obviously a case of applying the theorem to the sequence (Y_k), where Y_k := X_k − μ. But we need to know that

(a) sup_k E(Y_k^4) < ∞.

This is obvious from Minkowski's inequality (the constant function μ1 on Ω having L⁴ norm |μ|). But we can also prove (a) immediately by the elementary inequality (6.7,b).  □

The next topics indicate a different use of variance.

7.3. Chebyshev's inequality

As you know this says that for c > 0 and X ∈ L²,

c² P(|X − μ| ≥ c) ≤ Var(X), μ := E(X);

and it is obvious.

Example. Consider a sequence (X_n) of IID RVs with values in {0,1} with

p = P(X_n = 1) = 1 − P(X_n = 0).

Then E(X_n) = p and Var(X_n) = p(1 − p) ≤ 1/4. Thus (using Theorem 7.1) S_n := X_1 + X_2 + ... + X_n has expectation np and variance np(1 − p) ≤ n/4, and we have

E(n^{−1}S_n) = p,  Var(n^{−1}S_n) = n^{−2} Var(S_n) ≤ 1/(4n).

Chebyshev's inequality yields

P(|n^{−1}S_n − p| ≥ δ) ≤ 1/(4nδ²),  δ > 0.

7.4. Weierstrass approximation theorem

If f is a continuous function on [0,1] and ε > 0, then there exists a polynomial B such that

sup_{x∈[0,1]} |B(x) − f(x)| ≤ ε.

Proof. Let (X_k), S_n, etc. be as in the Example in Section 7.3. You are well aware that

P[S_n = k] = (n choose k) p^k (1 − p)^{n−k},  0 ≤ k ≤ n.

Hence

B_n(p) := E f(n^{−1}S_n) = Σ_{k=0}^{n} f(k/n) (n choose k) p^k (1 − p)^{n−k},

the 'B' being in deference to Bernstein. Now f is bounded on [0,1]: |f(y)| ≤ K, ∀y ∈ [0,1]. Also, f is uniformly continuous on [0,1]: for our given ε > 0, there exists δ > 0 such that

(a) |x − y| ≤ δ implies that |f(x) − f(y)| ≤ ε/2.

Now, for p ∈ [0,1],

|B_n(p) − f(p)| = |E{f(n^{−1}S_n) − f(p)}|.

Let us write Y_n := |f(n^{−1}S_n) − f(p)| and Z_n := |n^{−1}S_n − p|. Then Z_n ≤ δ implies that Y_n ≤ ε/2, and we have

|B_n(p) − f(p)| ≤ E(Y_n) = E(Y_n; Z_n ≤ δ) + E(Y_n; Z_n > δ) ≤ ε/2 + 2K P(Z_n > δ) ≤ ε/2 + 2K/(4nδ²).

Earlier, we chose a fixed δ at (a). We now choose n so that 2K/(4nδ²) ≤ ε/2. Then |B_n(p) − f(p)| ≤ ε, for all p in [0,1].  □

Now do Exercise E7.1 on inverting Laplace transforms.
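The proof is constructive, so it is easy to watch it work: the Python sketch below (mine) evaluates the Bernstein polynomial B_n for a sample continuous f and prints the sup-norm error, which shrinks as n grows. The test function and the grid are arbitrary choices.

```python
import numpy as np
from math import comb

# Bernstein polynomial B_n(p) = sum_k f(k/n) C(n,k) p^k (1-p)^{n-k},
# i.e. E f(S_n / n) with S_n ~ Binomial(n, p).
def bernstein(f, n, p):
    p = np.asarray(p, dtype=float)
    k = np.arange(n + 1)
    coeffs = np.array([comb(n, j) for j in k], dtype=float)
    terms = coeffs * p[:, None]**k * (1 - p[:, None])**(n - k)
    return terms @ f(k / n)

f = lambda x: np.abs(x - 0.4)            # continuous (but not smooth) test function
grid = np.linspace(0, 1, 501)
for n in (10, 100, 1000):
    err = np.max(np.abs(bernstein(f, n, grid) - f(grid)))
    print(n, err)                        # sup-norm error decreases with n
```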
Chapter 8 Product Measure 8.0. Introduction and advice One of this chapter's main lessons of practical importance is that an 'in- 'interchange of order of integration' result S\ is always valid (both sides possibly being infinite) if / > 0; and is valid for 'signed' / (with both repeated integrals finite) provided that one (then the other) of the integrals of absolute values: / (/ \f(sus2)\fi2(ds2))fii(ds1)= J S\ JS2 J S / (/ is finite. It is a good idea to read through the chapter to get the ideas, but you are strongly recommended to postpone serious study of the contents until a later stage. Except for the matter of infinite products, it is all a case of relentless use of either the standard machine or the Monotone-Class Theorem to prove intuitively obvious things made to look complicated by the notation. When you do begin a serious study, it is important to appreciate when the more subtle Monotone-Class Theorem has to be used instead of the standard machine. 8.1. Product measurable structure, £1 x £2 Let (Si,I]i) and E2, £2) be measurable spaces. Let S denote the Cartesian product S := S\ x S2. For i = 1,2, let pi denote the ith coordinate map, so that 75
76 Chapter 8: Product Measure (8.1). The fundamental definition of S = Si x S2 is as the cr-algebra ►(a) E = <j(/>i,p2). Thus S is generated by the sets of the form together with sets of the form p2~1(B2) = S1 xB2 E2 G S2). Generally, a product cr-algebra is generated by Cartesian products in which one factor is allowed to vary over the a-algebra corresponding to that factor, and all other factors are whole spaces. In the case of our product of two factors, we have (b) (Bl x S2) 0 (Si x B2) = BlxB2 and you can easily check that (c) X={J?i X B2:Bi;€£,} is a 7r-system generating E = Ei x E2. A similar remark would apply for a countable product fl^»»» but you can see that, since we may only take countable intersections in analogues of (b), products of uncountable families of cr-algebras cause problems. The fundamental definition analogous to (a) still works. LEMMA (d) Let Ti denote the class of functions f : S —> R which are in bS and which are such that for each s\ in S\, the map s2 *-► f(s\^s2) is £2-measurable on S2, for each s2 in S2y the map S\ \-* f(s\,s2) is Si -measurable on S\. Then H = bS. Proof. It is clear that if A G J, then IA G U. Verification that H satisfies the conditions (i)-(iii) of the Monotone-Class Theorem 3.14 is straightforward. Since E = a(I), the result follows. □
..(8.2) Chapter 8: Product Measure 77 8.2. Product measure, FubinPs Theorem We continue with the notation of the preceding Section. We suppose that for i = 1,2, m is a finite measure on E*,-, Et). We know from the preceding Section that for / G bE, we may define the integrals := / S2 LEMMA Let 7i be the class of elements in bE such that the following property holds: {and l£(-) G Then H = bE. Proof. If A G X, then, trivially, 1^ G *%. Verification of the conditions of Monotone-Class Theorem 3.14 is straightforward. □ For F G E with indicator function / := I/r, we now define 5! Js2 Fubini's Theorem ► The set function fi is a measure on E, S) called the product mea- measure of fix and jjl2 and we write ji = jjl\ X fi2 and X E2,S2,/x2). Moreover, /j, is the unique measure on (S1, S) for which (a) ti(Ai x A2) = tii{Ai)n2(A2), A{ G E,-. If f G (mS)+, then with the obvious definitions ofif, 1^ we have • fi(f)= I l{{sl)lH{ds1) = f lf2(s2)n2(dS2),
78 Chapter 8: Product Measure (8.2).. in [0, oo]. // / 6 mS and fJ>(\f\) < oo, then equation (a) is valid (with all terms in R). Proof. The fact that /x is a measure is a consequence of linearity and (MON). The fact that fi is then uniquely specified by (a) is obvious from Uniqueness Lemma 1.6 and the fact that o~(Z) = S. Result (b) is automatic for / = 1^, where A 6 T. The Monotone-Class Theorem shows that it is therefore valid for / (E bS, and in particular for / in the SF+ space for (S, S, fi). (MON) then shows that it is valid for G (m£)+; and linearity shows that (b) is valid if//(|/|) < oo. Extension ► All of Fubini's Theorem will work if the (S,-,Et-,/it) are a-finite mea- measure spaces: We have a unique measure \i on (S, S) satisfying (a), etc., etc. We can prove this by breaking up a-finite spaces into countable unions of disjoint finite blocks. Warning The a-finiteness condition cannot be dropped. The standard example is the following. For i = 1,2, take Si = [0,1] and E,- = #[0,1]. Let ji\ be Lebesgue measure and let //2 just count the number of elements in a set. Let F be the diagonal {(x,y) £ Si x S2 : x = y}. Then (check!) F G E, but and result (b) fails, stating that 1 = 0. Something to think about So, our insistence on beginning with bounded functions on products of fi- finite measures was necessary. Perhaps it is worth emphasizing that in our standard machine, things work because we can use indicator functions of any set in our a-algebra, whereas when we can only use indicator functions of sets in a 7r-system, we have to use the Monotone-Class Theorem. We cannot approximate the set F in the Warning example as where each Fn is a finite union of Rectangles' A\ X ^2, each A{ being in B[0,1].
..(8.3) Chapter 8: Product Measure 79 A simple application Suppose that X is a non-negative random variable on (fi, T, P). Consider the measure \i := P x Leb on (O,^7) x ([0, oo),#[0, oo)). Let A := {(u>,z) : 0 < x < X(u>)}, h := 1^. Note that A is the 'region under the graph of X\ Then If H = X(u>), lh2(x) = P(X>x). Thus (c) fjt(A) = E(X) = / P(X > x)dx, dx denoting Leh(dx) as usual. Thus we have obtained one of the well-known formulae for E(X) and also interpreted the integral E(X) as ''area under the graph of X\ Note. It is perhaps worth remarking that the Monotone-Class Theorem, the Fatou Lemma and the reverse Fatou Lemma for functions amount to the corresponding results for sets applied to regions under graphs. 8.3. Joint laws, joint pdfs Let X and Y be two random variables. The (joint) law £x,Y of the pair (X, Y) is the map Cx,y : B(R) x B(R) -> [0,1] defined by CX,y(T) := P[(X, Y) e T). The system {(—oo, x] X (—oo,y] : x,y G R} is a 7r-system which generates x #(R). Hence £x,Y is completely determined by the joint distribution of X and Y which is defined via We now know how to construct Lebesgue measure /i = Leb x Leb on (R, 5(R)J. We say that X and Y have joint probability density function (joint pdf) fx,Y on R2 if for T e B(R) x B(R), p[(x,Y)er\ = = / / h(x, Jr Jr
SO Chapter 8: Product Measure (8.3).. etc., etc., (Fubini's Theorem being used in the last step(s)). Fubini's The- Theorem further shows that fx(x) := / fx,y{x, acts as a pdf for X (Section 6.12), etc., etc. You don't need me to tell you any more of this sort of thing. 8.4. Independence and product measure Let X and Y be two random variables with laws Cx, Cy respectively and distribution functions Fx,Fy respectively. Then the following three state- statements are equivalent: (i) X and Y are independent] n) ^x,y = Lx x Ly\ (iii) FXyy{x,y) = Fx(x)FY(y); moreover, if (X, Y) has 'joint' pdf fx,Y then each of (i)-(iii) is equivalent to (iv) fx,Y(x,y) = fx(x)fy(y) for Leb x Leb almost every (x,y). You do not wish to know more about this either. 8.5. B(R)n = B(Rn) Here again, things are nice and tidy provided we work with finite or count- countable products, but require different concepts (such as Baire a-algebras) if we work with uncountable products. B(Rn) is constructed from the topological space Rn. Now, if pi : Rn —+ R is the iih coordinate map: then pi is continuous, and hence #(Rn)-measurable. Hence Bn :=: B(R)n = a(pi : 1 < i < n) C B{Rn). On the other hand, #(Rn) is generated by the open subsets of Rn, and every such open subset is a countable union of open 4hypercubes' of the form l<k<n and such products are in B(R)n. Hence, B(Rn) = B(R)n. D In probability theory it is almost always product structures Bn which feature rather than B(Rn). See Section 8.8.
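The 'area under the graph' formula E(X) = ∫_0^∞ P(X > x) dx obtained in the simple application earlier in this chapter is easy to check numerically. The sketch below (not from the text) uses the Exp(1) distribution purely as a convenient test case.

```python
# Monte Carlo check of E(X) = integral_0^infinity P(X > x) dx  (Section 8.2).
# The Exp(1) distribution is an arbitrary test case with E(X) = 1.
import random, bisect
random.seed(0)

samples = sorted(random.expovariate(1.0) for _ in range(100_000))
n = len(samples)
mean = sum(samples) / n

# Riemann sum of the empirical tail probability P(X > x) on a grid over [0, 20].
dx = 0.01
grid = [i * dx for i in range(int(20 / dx))]
tail_integral = sum((n - bisect.bisect_right(samples, x)) / n * dx for x in grid)

print(f"sample mean               ≈ {mean:.3f}")
print(f"integral of P(X > x) dx   ≈ {tail_integral:.3f}   (both should be near 1)")
```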
..(8.7) Chapter 8: Product Measure 81 8.6. The n-fold extension So far in this chapter, we have studied the product measure space of two measure spaces and how this relates to the study of two random variables. You are more than able to 'generalize' 'from two to n1 from your experi- experience of similar things in other branches of mathematics. You should give some thought to the associativity of the 'product' in product measure space, something again familiar in analogous contexts. 8.7. Infinite products of probability triples This topic is not a trivial extension of previous results. We concentrate on a restricted context (though an important one) because it allows us to get the main idea in a clear fashion; and extension to infinite products of arbitrary probability triples is then a purely routine exercise. Canonical model for a sequence of independent RVs Let (An : n € N) be a sequence of probability measures on (R, B). We already know from the coin-tossing trickery of Section 4.6 that we can con- construct a sequence (Xn) of independent RVs, Xn having law An. Here is a more elegant and systematic way of doing this. THEOREM Let (An : n 6 N) be a sequence of probability measures on (R,3). Define so that a typical element u> of £1 is a sequence (u>n) in R. Define and let T := <j(Xn : n £ N). Then there exists a unique probability measure P on (ft, .F) such that for r G N and B\, B2,..., Br G B, * nR) = n J We write ,.F, P)= Then the sequence (Xn : n £ N) is a sequence of independent RVs on (ft,^*, P), Xn having law An.
82 Chapter 8: Product Measure (8.7).. Remarks, (i) The uniqueness of P follows in the usual way from Lemma 1.6, because product sets of the form which appear on the left-hand side of (a) form a 7r-system generating T. (ii) We could rewrite (a) more neatly as ns*)=n To see this, use the monotone-convergence property A.10,b) of measures. Proof of the theorem is deferred to the Appendix to Chapter 9. 8.8. Technical note on the existence of joint laws Let (n,^*), (Si,Si) and (£2,^2) be measurable spaces. For i = 1,2, let : n —> Si be such that X~l : S,- —* T. Define S := Si x 52, S := Ei x S2, X(w) := (Xi(w),X2(u>)) G S. Then (Exercise) X~l : S —* T, so that X is an E, S)-valued random variable, and if P is a probability measure on fi, we can talk about the law fi of X (equals the joint law of X\ and X2) on E, S) : /x = P o X~l on S. Suppose now that Si and S2 are metrizable spaces and that S; = (i = 1,2). Then £ is a metrizable space under the product topology. If Si and S2 are separable, then S = B(S), and there is no 'conflict'. However, if Si and S2 are not separable, then B(S) may be strictly larger than S, X need not be an (S, #(S))-valued random variable, and the joint law of X\ and X2 need not exist on (S, B(S)). It is perhaps as well to be warned of such things. Note that the sepa- separability of R was used in proving that 3(Rn) C Bn in Section 8.5.
PART B: MARTINGALE THEORY Chapter 9 Conditional Expectation 9.1. A motivating example Suppose that (ft, T, P) is a probability triple and that X and Z are random variables, X taking the distinct values x\, #2, • • • > xm > Z taking the distinct values z\, ^2,. •., zn. Elementary conditional probability: P(X = Xi\Z = Zj) := P(X = Xi;Z = zj)/P(Z = zj and elementary conditional expectation: E(X\Z = Zj) = are familiar to you. The random variable Y = E(X|Z), the conditional expectation of X given Z, is defined as follows: (a) if Z(w) = jztj, then Y(u) := E(X|Z = ^) =: ^ (say). Z = zx Z — Z2 • • • Z = zn It proves to be very advantageous to look at this idea in a new way. 'Reporting to us the value of Z(cj)' amounts to partitioning Q into ^-atoms' on which Z is constant: Q The a-algebra Q = cr(Z) generated by Z consists of sets {Z 6 B}, B 6 5, and therefore consists precisely of the 2n possible unions of the n Z-atoms. It is clear from (a) that Y is constant on Z-atoms, or, to put it better, (b) Y is ^-measurable. 83
Chapter 9: Conditional Expectation (9.1).. Next, since Y takes the constant value yj on the Z-atom {Z = zj], we have: / YdP = yjP(Z = *,-) = r*iP(X = s,-|Z = zj)P(Z = zj) J{z=*rt j XdP. If we write Gj = {Z = zj}, this says E(YlGj) = E(XlGj). Since for every G in £, /g is a sum of Igj 's, we have E(Y1g) = E(XIg), or VG G (c) / YdP = / Results (b) and (c) suggest the central definition of modern probability. 9.2. Fundamental Theorem and Definition (Kolmogorov, 1933) ► Let (Q,^7, P) be a triple, and X a random variable with E(|-Y|) < oo. Let Q be a sub-a-algebra of T. Then there exists a random variable Y such that (a) Y is G measurable, (b) E(|F|) < oo, (c) for every set G in Q fequivalently, for every set G in some 7r-system which contains ft and generates Q)y we have JYdP= I XdP, VGeQ. G G Moreover^ ifY is another RV with these properties thenY = Y, a.s., that is, P[Y = Y] = 1. A random variable Y with properties (a)-(c) is called sl version of the conditional expectation E(X\G) of X given Q, and we write Y = E(X|£), a.s. Two versions agree a.s., and when one has become familiar with the concept, one identifies different versions and speaks of the conditional ex- expectation E(X\G). But you should think about the 'a.s.' throughout this course. The theorem is proved in Section 9.5, except for the tt-system assertion which you will find at Exercise E9.1. ►Notation. We often write E(X\Z) for E(A>(Z)), E(X|Z1,Z2,...) for E(X\a(Zi,Z2,...)), etc. That this is consistent with the elementary usage is apparent from Section 9.6 below.
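When Ω is finite, a version of E(X | σ(Z)) can be written down exactly as in the motivating example of Section 9.1: average X over the Z-atoms. The small Python sketch below (the particular Ω, P, X and Z are arbitrary choices, not from the text) does this and verifies the defining property E(Y; G) = E(X; G) on each atom G = {Z = z}.

```python
# Conditional expectation E(X | sigma(Z)) on a finite probability space,
# computed by averaging X over the Z-atoms (Sections 9.1-9.2).
import random
random.seed(1)

omega = range(12)
raw = [random.random() for _ in omega]
P = {w: r / sum(raw) for w, r in zip(omega, raw)}    # a probability on Omega
X = {w: (w % 5) + random.random() for w in omega}    # any random variable
Z = {w: w % 3 for w in omega}                        # Z takes values 0, 1, 2

def cond_exp(X, Z, P):
    """Return Y = E(X | sigma(Z)): on the atom {Z = z}, Y = E(X; Z = z)/P(Z = z)."""
    atoms = {}
    for w in P:
        atoms.setdefault(Z[w], []).append(w)
    Y = {}
    for z, ws in atoms.items():
        val = sum(X[w] * P[w] for w in ws) / sum(P[w] for w in ws)
        for w in ws:
            Y[w] = val
    return Y

Y = cond_exp(X, Z, P)
for z in sorted(set(Z.values())):
    G = [w for w in P if Z[w] == z]
    lhs = sum(Y[w] * P[w] for w in G)    # E(Y; G)
    rhs = sum(X[w] * P[w] for w in G)    # E(X; G)
    print(f"atom {{Z = {z}}}:  E(Y;G) = {lhs:.6f}   E(X;G) = {rhs:.6f}")
```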
..(9.5) Chapter 9: Conditional Expectation 85 9.3. The intuitive meaning An experiment has been performed. The only information available to you regarding which sample point u> has been chosen is the set of values Z(u>) for every ^-measurable random variable Z. Then Y{ui) = E(X\Q)(u>) is the expected value of X(u>) given this information. The 'a.s.' ambiguity in the definition is something one has to live with in general, but it is sometimes possible to choose a canonical version of E(X\Q). Note that if Q is the trivial a-algebra {0, Q} (which contains no infor- information), then E(X\g)(u) = E(X) for all u>. 9.4. Conditional expectation as least-squares-best predictor ► If E(X2) < oQj then the conditional expectation Y = E(X|£7) is a version of the orthogonal projection (set Section 6.11) of X onto £2(Q,£/,P). Hence, Y is the least-squares-best Q-measurable pre- predictor of X: amongst all Q-measurable functions (i.e. amongst all predictors which can be computed from the available information), Y minimizes E[(Y - XJ]. No surprise then that conditional expectation (and the martingale the- theory which develops it) is crucial in filtering and control - of space-ships, of industrial processes, or whatever. 9.5. Proof of Theorem 9.2 The standard way to prove Theorem 9.2 (see Section 14.14) is via the Radon- Nikodym theorem described in Section 5.14. However, Section 9.4 suggests a much simpler approach, and this is what we now develop. We can then prove the general Radon-Nikodym theorem by martingale theory. See Section 14.13. First we prove the almost sure uniqueness of a version of E(X\Q). Then we prove the existence of E(X|C7) when X £ C2; and finally, we prove the existence in general. Almost sure uniqueness of E(X\Q) Suppose that X 6 C1 and that Y and Y are versions of E(X\G)- Then \, and = o, \/Geg.
86 Chapter 9: Conditional Expectation (9.5).. Suppose that Y and Y are not almost surely equal. We may assume that the labelling is such that P(F > Y) > 0. Since we see that P(Y - Y > n) > 0 for some n. But the set {Y - Y > n} is in Q, because Y and Y are ^-measurable; and E(Y-Y;Y-Y> n^1) > n~lP(Y - Y > n) > 0, a contradiction. Hence Y = Yy a.s. , D Existence of E(X|£) for X e C2 Suppose that X G C2 := £2(Q,^", P). Let 5 be a sub-a-algebra of T, and let K := £2(G) := £2(n,^,P). By Section 6.10 applied to Q rather than T, we know that K is complete for the C2 norm. By Theorem 6.11 on orthogonal projection we know that there exists Y in K = C2{G) such that (a) E[(X - Yf) = inf{E[(X - Wf]: W € C\Q)}, (b) (X -Y,Z) = 0, VZ in C\g). Now, if G 6 Q, then Z :=Ig € C2(Q) and (b) states that Hence Y is a version of E(X|CJ), as required. Existence of E{X\G) for Ig/!1 By splitting X as X = X+ — -XL, we see that it is enough to deal with the case when X G (£1)+- So assume that X 6 (^1)+- We can now choose bounded variables Xn with 0 < Xn f X. Since each Xn is in £2, we can choose a version Yn of E(-Yn|^). We now need to establish that (c) it is almost surely true that 0 < Yn f. We prove this in a moment. Given that (c) is true, we set Y(uj) := limsupFn(a;). Then Y € mQ, and Yn f Y, a.s. But now (MON) allows us to deduce that E(Y;G) = E(X;G) (G € Q) from the corresponding result for Yn and Xn. D
..(9.6) Chapter 9: Conditional Expectation 87 A positivity result Property (c) follows once we prove that (d) ifU is a non-negative bounded RVy then E{U\G) > 0, a.s. Proof of (d). Let W be a version of E(U\G). If P(W < 0) > 0, then for some n, the set G := {W < — n~~1} in Q has positive probability, so that 0 < E(U;G) = E(W; G) < -n-lP(G) < 0. This contradiction finishes the proof. D 9.6. Agreement with traditional usage The case of two RVs will suffice to illustrate things. So suppose that X and Z are RVs which have a joint probability density function (pdf) Then fz(z) = fnfx,z(x,z)dx acts as a probability density function for Z. Define the elementary conditional pdf fx\z °f X given % vlSi 10 otherwise. Let Hea Borel function on R such that E\h(X)\ = / \h(x)\fx(x)dx < oo, JR where of course fx(x) — /r /x,z(^j z)dz gives a pdf for X. Set g(z) := / h(x)fX\z(x\z)dx. JR Then Y := g(Z) is a version of the conditional expectation of h(X) given Proof. The typical element of v(Z) has the form {u? : Z(u>) 6 B}, where B G B. Hence, we must show that (a) L := E[h(X)lB(Z)] = E[g(Z)lB(Z)] =: R. But L== I Kx)lB{z)fXyz{x,z)dxdz, R= g(z)IB(z)fz(z)dz, and result (a) follows from Fubini's Theorem. D Some of the practice is given in Sections 15.6-15.9, which you can look at now.
Chapter 9: Conditional Expectation (9.7).. ►9.7. Properties of conditional expectation: a list These properties are proved in Section 9.8. All X's satisfy E(|X|) < oo in this list of properties. Of course, Q and 7i denote sub-a-algebras of T. (The use of *c' to denote 'conditional' in (cMON), etc., is obvious.) (a) If Y is any version of E(X\Q) then E(F) = E(X). ( Very useful, this.) (b) If X is Q measurable, then E(X\Q) = X, a.s. (c) (Linearity) E(a1Xl + a2X2\Q) = a^X^G) + a2E(X2|£), a.s. Clarification: if Y\ is a version of E(X\ \Q) and I2 is a version of E(X2 \G), then a\Y\ -f ^2^2 is a version of E(aiXi 4- «2^2|&)- (d) (Positivity) If X > 0, then E(X|£) > 0, a.s. (e) (cMON) If 0 < Xn T Ar, then E(Xn\g) T E(X|^), a.s. (f) (cFATOU) If Xn > 0, then E[liminf Xn\Q] < liminf E[Xn|£], a.s. (g) (cDOM) If \Xn(u)\ < V(w), Vn, EV < 00, and Xn -^ X, a.s., then a.s. (h) (c JENSEN) If c : R -> R is convex, and E|c(X)| < 00, then E[c(X)\g] > c(E[X\g]), a.s. Important corollary: ||E(X|£)||P < \\X\\P for p > 1. (i) (Tower Property) If 7i is a sub-cr-algebra of £, then a.s. Note. We shorthand LHS to E[X|£|ft] for tidiness. (j) ('Taking out what is known') If Z is ^-measurable and bounded, then (*) E[ZX\g] = ZE[X\g], a.s. If p > 1, p-1 +q~l =l,Xe £p(n,^,P) and Z G £*(fi,a,P), then (*) again holds. If X £ (ni^)+, Z G (m^)+, E(X) < 00 and E(ZX) < 00, then (*) holds. (k) (Role of independence) If H is independent of a(cr(X), £), then E[X\<j{g,H)]=E{X\g), a.s. In particular, if X is independent of K, then E(X|ft) = E(X), a.s.
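Property (i), the Tower Property, can be checked in the same elementary finite-Ω way: take H ⊆ G generated by a coarser partition and compare E[E(X|G)|H] with E(X|H). Everything in the sketch below is an arbitrary illustrative choice, not something prescribed by the text.

```python
# Elementary check of the Tower Property E[E(X|G)|H] = E(X|H) for H a
# sub-sigma-algebra of G (property (i) of Section 9.7), on a finite space.
import random
random.seed(2)

omega = range(24)
P = {w: 1 / 24 for w in omega}
X = {w: random.gauss(0, 1) for w in omega}
Zg = {w: w % 6 for w in omega}           # generates G (6 atoms)
Zh = {w: (w % 6) // 2 for w in omega}    # a function of Zg, so sigma(Zh) = H ⊆ G

def cond_exp(X, Z, P):
    atoms = {}
    for w in P:
        atoms.setdefault(Z[w], []).append(w)
    Y = {}
    for ws in atoms.values():
        val = sum(X[w] * P[w] for w in ws) / sum(P[w] for w in ws)
        for w in ws:
            Y[w] = val
    return Y

lhs = cond_exp(cond_exp(X, Zg, P), Zh, P)    # E[ E(X|G) | H ]
rhs = cond_exp(X, Zh, P)                     # E(X|H)
print(max(abs(lhs[w] - rhs[w]) for w in omega))   # should be (numerically) 0
```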
..(9.8) Chapter 9: Conditional Expectation 89 9.8. Proofs of the properties in Section 9.7 Property (a) follows since E(Y;Ct) = E(X;D), 0 being an element of Q. Property (b) is immediate from the definition, as is Property (c) now that its Clarification has been given. Property (d) is not obvious, but the proof of (9.5,d) transfers immedi- immediately to our current situation. Proof of (e). If 0 < Xn T X, then, by (d), if, for each n, Yn is a version of E(Xn\G), then (a.s.) 0 < Yn f. Define Y := limsupyn. Then Y G m£, and Yn | K, a.s. Now use (MON) to deduce from VG 6 G, that E(F; G) = E(J\T; G), WG G Q. (Of course we used a very similar argu- argument in Section 9.5.) □ Proof of (f) and (g). You should check that the argument used to obtain (FATOU) from (MON) in Section 5.4 and the argument used to obtain (DOM) from (FATOU) in Section 5.9 both transfer without difficulty to yield the conditional versions. Doing the careful derivation of (cFATOU) from (cMON) and of (cDOM) from (cFATOU) is an essential exercise for you. D Proof of (h). From F.6,a), there exists a countable sequence ((«n,^n)) of points in R2 such that c(x) = sup(ana: + &n), or 6 R. n For each fixed n we deduce via (d) from c(X) > anX + bn that, almost surely, (**) E[c(X)|ff] > an£[X\g] + bn. By the usual appeal to countability, we can say that almost surely (**) holds simultaneously for all n, whence, almost surely, > sup(an£[X\g] + 6n) = c(E[X|£]). □ n Proof of corollary to (h). Let p > 1. Taking c(x) = \x\p, we see that E(\X\'\g) > |E(X|a)|", a.s.
90 Chapter 9: Conditional Expectation (9.8).. Now take expectations, using property (a). □ Property (i) is virtually immediate from the definition of conditional expectation. Proof of (j). Linearity shows that we can assume that X > 0. Fix a version Y of E(X\G), and fix G in Q. We must prove that if Z is Q-measurable and appropriate integrability conditions hold, then (***) E(ZX;G) = E(ZY;G). We use the standard machine. If Z is the indicator of a set in Q, then (***) is true by definition of the conditional expectation Y. Linearity then shows that (***) holds for Z G SF+(n,£,P). Next, (MON) shows that (*♦*) is true for Z G (m£7)+ with the understanding that both sides might be infinite. All that is necessary to establish that property (j) in the table is correct is to show that under each of the conditions given, E(|ZJ\T|) < oo. This is obvious if Z is bounded and X is in £*, and follows from the Holder inequality if X G £p and Z G Cq where p > 1 and p~l -f q~l = 1. □ Proof of (k). We can assume that X > 0 (and E(X) < oo). For G G Q and H G 7i, XIq and H are independent, so that by Theorem 7.1, E(X; GnH) = EpaG)IH] = E(XlG)P(H). Now if Y = E(X\G) (a version of), then since Y is ^-measurable, YIq is independent of H. so that H] = E(YlG)P(H) and we have Thus the measures on a(C7,7i) of the same finite total mass agree on the 7r-system of sets of the form GC\H(GeQ,He 7i), and hence agree everywhere on <t(G, H). This is exactly what we had to prove. D
..(9.10) Chapter 9: Conditional Expectation 91 9.9. Regular conditional probabilities and pdfs For Fef,we have P(F) = E(IF). For F G T and Q a sub-a-algebra of T, we define P(F\G) to be a version o/E(If|(?)- By linearity and (cMON), we can show that for a fixed sequence (Fn) of disjoint elements of T, we have (a) Except in trivial cases, there are uncountably many sequences of disjoint sets, so we cannot conclude from (a) that there exists a map such that (bl) for F6f, the function uj h-> P(u?, F) is a version ofP(F\G); (b2) for almost every u>y the map F h-* P(u>, F) is a probability measure on J-'. If such a map exists, it is called a regular conditional probability given Q. It is known that regular conditional probabilities exist under most conditions encountered in practice, but they do not always exist. The matter is too technical for a book at this level. See, for example, Parthasarathy A967). Important note. The elementary conditional pdf fx\z(x\z) °f Section 9.6 is a proper - technically, regular - conditional pdf for X given Z in that for every A in 3, > I fx\z(z\Z(v))dx is a version of P(X G A\Z). J A UJ SA Proof. Take h = I a in Section 9.6. □ 9.10. Conditioning under independence assumptions Suppose that r 6 N and that Xi,X2,... ,Xr are independent RVs, having law A*. If h 6 bBr and we define (for x\ 6 R) (a) 7
Q2 Chapter 9: Conditional Expectation (9.10).. then (b) 7^(Xi) is a version of the conditional expectation rfi/y Y" V M V 1 iz|n\ Jx.i, y\.2> • • • 5 -Ar)\^*-1J. Two proofs of (b). We need only show that for I? 6 B, (c) E[/i(X1,X2,...,Xr)IB(X1)) = E^^XOIb^)]. We can do this via the Monotone-Class Theorem, the class H of h satisfying (c) contains the indicator functions of elements in the Tr-system of sets of the form Bi x B2 x ... x Br (Bk e B), etc., etc. Alternatively, we can appeal to the r-fold Fubini Theorem; for (c) says that / h(x)lB(xi)(Ai x A2 x ... x Ar)(dx) where r ,t/)(A2 x ... x Ar)(dy). □ 9.11. Use of symmetry: an example Suppose that X\, X2,... are IID RVs with the same distribution as X, where E(|X|) < oo. Let Sn := Xx + X2 + • • • + Xn, and define We wish to calculate for very good reasons, as we shall see in Chapter 14. Now cr(Xn+i, Xn+2,...) is independent of a(Xi,5n) (which is a sub-a-algebra of a(Xi,... ,Xn)). Hence, by (9.7,k), But if A denotes the law of X, then, with sn denoting x\ -f x2 + • • • 4- xn, we have E(Xi;SneB) = I ... I XiA(dxi)A(dx2)... A(dxn) J Jsn£B = E(X2;SneB)= ... =E(Xn;SneB). Hence, almost surely, = ••• =E(Xn\Sn)
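The conclusion of this symmetry argument is that, almost surely, E(X_1|S_n) = n^{-1} E(S_n|S_n) = S_n/n. For discrete X_k this is easy to see empirically: group simulated samples by the observed value of S_n and compare the conditional average of X_1 with S_n/n. In the sketch below (not from the text) the Bernoulli(1/2) distribution and n = 8 are arbitrary choices.

```python
# Empirical illustration of E(X_1 | S_n) = S_n / n (Section 9.11)
# for IID Bernoulli(1/2) variables.
import random
from collections import defaultdict
random.seed(3)

n, trials = 8, 200_000
sums = defaultdict(lambda: [0.0, 0])     # value of S_n -> [sum of X_1, count]
for _ in range(trials):
    xs = [random.randint(0, 1) for _ in range(n)]
    s = sum(xs)
    rec = sums[s]
    rec[0] += xs[0]
    rec[1] += 1

for s in sorted(sums):
    tot, cnt = sums[s]
    print(f"S_n = {s}:  average of X_1 ≈ {tot / cnt:.3f}   S_n/n = {s / n:.3f}")
```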
Chapter 10 Martingales 10.1. Filtered spaces ►As basic datum, we now take a filtered space (O,^7, {^"n}? P)« Here, F, P) is a probability triple as usual, 7n : n > 0} is a filtration, that is, an increasing family of sub- a- algebras of T\ We define ^ n Intuitive idea. The information about u) in Q, available to us at (or, if you prefer, 'just after') time n consists precisely of the values of Z(lo) for all Tn measurable functions Z. Usually, {J-n} is the natural filtration of some (stochastic) process W = (Wn : n G Z+), and then the information about u? which we have at time n consists of the values. 10.2. Adapted process ►A process X = (Xn : n > 0) is called adapted (to the filtration {Fn}) i each n, Xn is ^-measurable. Intuitive idea. If X is adapted, the value Xn(u>) is known to us at time n. Usually, Tn = <j(Wb, Wk,..., Wn) and Xn = fn(W0, Wu...,Wn) for some -measurable function fn on Rn+1. 93
Chapter 10: Martingales (10.3)..

10.3. Martingale, supermartingale, submartingale

►A process X is called a martingale (relative to ({F_n}, P)) if
(i) X is adapted,
(ii) E(|X_n|) < ∞, ∀n,
(iii) E[X_n | F_{n-1}] = X_{n-1}, a.s. (n ≥ 1).

A supermartingale (relative to ({F_n}, P)) is defined similarly, except that (iii) is replaced by

E[X_n | F_{n-1}] ≤ X_{n-1}, a.s. (n ≥ 1),

and a submartingale is defined with (iii) replaced by

E[X_n | F_{n-1}] ≥ X_{n-1}, a.s. (n ≥ 1).

A supermartingale 'decreases on average'; a submartingale 'increases on average'! [Supermartingale corresponds to superharmonic: a function f on R^n is superharmonic if and only if, for a Brownian motion B on R^n, f(B) is a local supermartingale relative to the natural filtration of B. Compare Section 10.13.]

Note that X is a supermartingale if and only if -X is a submartingale, and that X is a martingale if and only if it is both a supermartingale and a submartingale.

It is important to note that a process X for which X_0 ∈ L^1(Ω, F_0, P) is a martingale [respectively, supermartingale, submartingale] if and only if the process X - X_0 = (X_n - X_0 : n ∈ Z+) has the same property. So we can focus attention on processes which are null at 0.

► If X is for example a supermartingale, then the Tower Property of conditional expectations, (9.7)(i), shows that for m < n,

E[X_n | F_m] ≤ X_m, a.s.

10.4. Some examples of martingales

As we shall see, it is very helpful to view all martingales, supermartingales and submartingales in terms of gambling. But, of course, the enormous importance of martingale theory derives from the fact that martingales crop up in very many contexts. For example, diffusion theory, which used to be studied via methods from Markov-process theory, from the theory of
..A0.4) Chapter 10: Martingales 95 partial differential equations, etc., has been revolutionized by the martingale approach. Let us now look at some simple first examples, and mention an inter- interesting question (solved later) pertaining to each. (a) Sums of independent zero-mean RVs. Let X\, X2,... be a sequence of independent RVs with E(|X*|) < 00, Vfc, and E(Xk) = 0, Vfc. Define (So := 0 and) Then for n > 1, we have (a.s.) The first (a.s.) equality is obvious from the linearity property (9.7,c). Since Sn-i is Fn-i-measurable, we have E(Srl_i|^rrl_i) = Sn-\ (a.s.) by (9.7,b); and since Xn is independent of Tn-x, we have E(Xn|^rn_i) = E(Xn) (a.s.) by (9.7,k). That must explain our notation! Interesting question: when does limS'n exist (a.s.)? See Section 12.5. (b) Products of non-negative independent RVs of mean 1. Let Xi,X2,... be a sequence of independent non-negative random variables with Define (Mo := 1, Fq := {0,0} and) Mn := X1X2 .. 'Xn, Tn := o(X\,X2,... ,Xn). Then, for n > 1, we have (a.s.) (=)Mn_1E(Xn)=Mn_1, so that M is a martingale.
Chapter 10: Martingales A0.4).. It should be remarked that such martingales are not at all artificial. Interesting question. Because M is a non-negative martingale, lim Mn exists (a.s.); this is part of the Martingale Convergence Theorem of the next chapter. When can we say that E(Mqo) = 1? See Sections 14.12 and 14.17. (c) Accumulating data about a random variable. Let {fn} be our filtration, and let f £ C1^,?,P). Define Mn := E(f|.Fn) ('some version of). By the Tower Property (9.7,i), we have (a.s.) Hence Mis a martingale. Interesting question. In this case, we shall be able to say that Mn —> Moo := E(f l-^oo), a-s-> because of Levy's Upward Theorem (Chapter 14). Now Mn is the best predictor of f given the information available to us at time n, and Mo© is the best prediction of £ we can ever make. When can we say that £ = E^I.Foo)? a.s? The answer is not always obvious. See Section 15.8. 10.5. Fair and unfair games Think now of Xn — Xn-i as your net winnings per unit stake in game n (n > 1) in a series of games, played at times n = 1,2, There is no game at time 0. In the martingale case, (a) E[Xn — Xn-i l^n-i] = 0, (game series is fair), and in the supermartingale case, (b) E[Xn — Xn-\ l^n-i] < 0, (game series is unfavourable to you). Note that (a) [respectively (b)] gives a useful way of formulating the mar- martingale [supermartingale] property of X. 10.6. Previsible process, gambling strategy ►We call a process C = (Cn : n 6 N) previsible if Cn is Tn-\ measurable (n > 1).
..(10.8) Chapter 10: Martingales 97

Note that C has parameter set N rather than Z+: C_0 does not exist. Think of C_n as your stake on game n. You have to decide on the value of C_n based on the history up to (and including) time n - 1. This is the intuitive significance of the 'previsible' character of C. Your winnings on game n are C_n(X_n - X_{n-1}), and your total winnings up to time n are

Y_n = Σ_{1≤k≤n} C_k(X_k - X_{k-1}) =: (C • X)_n.

Note that (C • X)_0 = 0, and that Y_n - Y_{n-1} = C_n(X_n - X_{n-1}).

The expression C • X, the martingale transform of X by C, is the discrete analogue of the stochastic integral ∫ C dX. Stochastic-integral theory is one of the greatest achievements of the modern theory of probability.

10.7. A fundamental principle: you can't beat the system!

►(i) Let C be a bounded non-negative previsible process so that, for some K in [0, ∞), |C_n(ω)| ≤ K for every n and every ω. Let X be a supermartingale [respectively martingale]. Then C • X is a supermartingale [martingale] null at 0.

(ii) If C is a bounded previsible process and X is a martingale, then C • X is a martingale null at 0.

(iii) In (i) and (ii), the boundedness condition on C may be replaced by the condition C_n ∈ L^2, ∀n, provided we also insist that X_n ∈ L^2, ∀n.

Proof of (i). Write Y for C • X. Since C_n is bounded, non-negative and F_{n-1} measurable,

E[Y_n - Y_{n-1} | F_{n-1}] = C_n E[X_n - X_{n-1} | F_{n-1}] ≤ 0,   [resp. = 0].

Proofs of (ii) and (iii) are now obvious. (Look again at (9.7,j).)  □

10.8. Stopping time

A map T : Ω → {0, 1, 2, ...; ∞} is called a stopping time if

►(a) {T ≤ n} = {ω : T(ω) ≤ n} ∈ F_n, ∀n ≤ ∞,
98 Chapter 10: Martingales A0.8).. equivalent ly, (b) {T = n} = {u> : T(w) = n} G «Fn, Vn < oo. Note that T can be oo. Proof of the equivalence of (a) and (b). If T has property (a), then {T = n} = {T< n}\{T < n - 1} G JV If T has property (b), then for fc < n, {T = A:} G ^jt C ^"n and {T < n} = |J {T = k] G Tn. 0<k<n Intuitive idea. T is a time when you can decide to stop playing our game. Whether or not you stop immediately after the nth game depends only on the history up to (and including) time n : {T = n] 6 Tn- Example. Suppose that (^4n) is an adapted process, and that B G B. Let T = inf {n > 0 : An G B] = time of first entry of A into set B. By convention, inf@) = oo, so that T = oo if A never enters set B. Obvi- Obviously, {T < n} = (J {Ak eB}e Tn, k<n so that T is a stopping time. Example. Let L = sup{n : n < 10; An G B}, sup@) = 0. Convince yourself that L is NOT a stopping time (unless A is freaky). 10.9. Stopped supermartingales are supermartingales Let X be a supermartingale, and let T be a stopping time. Suppose that you always bet 1 unit and quit playing at (immediately after) time T. Then your 'stake process' is C^T\ where, for n G N, Your 'winnings process' is the process with value at time n equal to • X)n = XTAn - Xo.
..A0.9) Chapter 10: Martingales 99 If XT denotes the process X stopped at T: then Now C^ is clearly bounded (by 1) and non-negative. Moreover, C^ is previsible because Cn can only be 0 or 1 and, for n G N, Result 10.7 now yields the following result. THEOREM. ►(i) If X is a super martingale and T is a stopping time, then the stopped process XT = (Xtah - n £ Z+) is a supermartingale, so that in particular, < E(X0), Vn. ►(ii) If X is a martingale and T is a stopping time, then XT is a martin gale, so that in particular, Vn. It is important to notice that this theorem imposes no extra integra- bility conditions whatsoever (except of course for those implicit in the defi- definition of supermartingale and martingale). But be very careful! Let X be a simple random walk on Z+, starting at 0. Then X is a martingale. Let T be the stopping time: T:=inf{n:Xn = l}. It is well known that P(T < oo) = 1. (See Section 10.12 for a martingale proof of this fact, and for a martingale calculation of the distribution of T.) However, even though = E(Xo) for every we have 1 = E(XT) ^ E(X0) = 0.
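A quick simulation (not from the text) of the warning just given: for simple random walk S and T := inf{n : S_n = 1}, the stopped value S_{T∧n} has mean 0 for every n, exactly as the theorem guarantees, even though S_T = 1 whenever T < ∞. The cap of 1000 steps below is an arbitrary practical truncation.

```python
# Stopped martingale (Section 10.9): E(S_{T∧n}) = 0 for every n, yet S_T = 1
# on {T < infinity}. Simple symmetric random walk, T = first hit of level 1.
import random
random.seed(5)

def stopped_value(n_cap):
    s = 0
    for _ in range(n_cap):
        if s == 1:                     # already stopped
            break
        s += random.choice((-1, 1))
    return s                           # this is S_{T ∧ n_cap}

trials = 20_000
vals = [stopped_value(1000) for _ in range(trials)]
print("E(S_{T∧1000}) ≈", sum(vals) / trials)                   # close to 0
print("fraction of paths already stopped at 1:",
      sum(v == 1 for v in vals) / trials)                      # close to, but below, 1
```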
100 Chapter 10: Martingales A0.9).. We very much want to know when we can say that E(XT) = E(X0) for a martingale X. The following theorem gives some sufficient conditions. 10.10. Doob's Optional-Stopping Theorem ►(a) Let T be a stopping time. Let X be a supermartingale. Then Xt w integrable and E(XT) < E(A'o) in each of the following situations: (i) T is bounded (for some N in N, T(u>) < N, Vu>); (ii) X is bounded (for some K in R+, |Xn(u?)| < K for every n and every us) and T is a.s. finite; (iii) E(T) < oo, and, for some K in R+, < K (b) If any of the conditions (i)-(iii) holds and X is a martingale, then E(XT)=E(Xo). Proof of (a). We know that Xtau is integrable, and TAn - Xo) < 0. For (i), we can take n = N. For (ii), we can let n —► oo in (*) using (BDD). For (iii), we have TAn \Xtau -Xq\ = \ J2(xk -**-i)| < KT and EG\T) < oo, so that (DOM) justifies letting n —> oo in (*) to obtain the answer we want. □ Proof o/(b). Apply (a) to X and to (-X). D
..(10.11) Chapter 10: Martingales 101

Corollary

►(c) Suppose that M is a martingale, the increments M_n - M_{n-1} of which are bounded by some constant K_1. Suppose that C is a previsible process bounded by some constant K_2, and that T is a stopping time such that E(T) < ∞. Then

E(C • M)_T = 0.

Proof of the following final part of the Optional-Stopping Theorem is left as an Exercise. (It's clear whose lemma is needed!)

(d) If X is a non-negative supermartingale, and T is a stopping time which is a.s. finite, then

E(X_T) ≤ E(X_0).

10.11. Awaiting the almost inevitable

In order to be able to apply some of the results of the preceding Section, we need ways of proving that (when true!) E(T) < ∞. The following announcement of the principle that 'whatever always stands a reasonable chance of happening will almost surely happen - sooner rather than later' is often useful.

LEMMA

Suppose that T is a stopping time such that for some N in N and some ε > 0, we have, for every n in N:

P(T ≤ n + N | F_n) > ε, a.s.

Then E(T) < ∞.

You will find the proof of this set as an exercise in Chapter E. Note that if T is the first time by which the monkey in the 'Tricky exercise' at the end of Section 4.9 completes ABRACADABRA, then E(T) < ∞. You will find another exercise in Chapter E inviting you to apply result (c) of the preceding Section to show that

E(T) = 26^11 + 26^4 + 26.

A large number of other Exercises are now accessible to you.
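The 'fair casino' martingale argument behind the ABRACADABRA exercise gives, for a pattern typed letter by letter uniformly at random from an alphabet of size q, an expected waiting time equal to the sum of q^k over those k for which the length-k prefix of the pattern equals its length-k suffix; for ABRACADABRA this is the 26^11 + 26^4 + 26 above. That is far too large to simulate, so the sketch below (not from the text) checks the same formula on the arbitrary small pattern "ABA" over a two-letter alphabet, where it predicts 2^3 + 2 = 10.

```python
# Monte Carlo check of the fair-casino formula for pattern waiting times
# (cf. the ABRACADABRA exercise of Section 10.11), on a small example.
import random
random.seed(6)

pattern, alphabet = "ABA", "AB"

def waiting_time():
    buf, t = "", 0
    while not buf.endswith(pattern):
        buf += random.choice(alphabet)
        t += 1
        buf = buf[-len(pattern):]        # only the most recent letters matter
    return t

trials = 100_000
est = sum(waiting_time() for _ in range(trials)) / trials
borders = [k for k in range(1, len(pattern) + 1)
           if pattern[:k] == pattern[-k:]]
theory = sum(len(alphabet) ** k for k in borders)
print(f"simulated E(T) ≈ {est:.2f},  formula gives {theory}")
```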
102 Chapter 10: Martingales (10.12)..

10.12. Hitting times for simple random walk

Suppose that (X_n : n ∈ N) is a sequence of IID RVs, each X_n having the same distribution as X, where

P(X = 1) = P(X = -1) = 1/2.

Define S_0 := 0, S_n := X_1 + ··· + X_n, and set

T := inf{n : S_n = 1}.

Let F_0 := {∅, Ω}, F_n := σ(X_1, ..., X_n) = σ(S_0, S_1, ..., S_n). Then the process S is adapted (to {F_n}), so that T is a stopping time. We wish to calculate the distribution of T.

For θ ∈ R, E e^{θX} = (e^θ + e^{-θ})/2 = cosh θ, so that

E[(sech θ) e^{θX_n}] = 1, ∀n.

Example (10.4,b) shows that M^θ is a martingale, where

M^θ_n := (sech θ)^n exp(θ S_n).

Since T is a stopping time, and M^θ is a martingale, we have

(a) E M^θ_{T∧n} = E[(sech θ)^{T∧n} exp(θ S_{T∧n})] = 1, ∀n.

Suppose that θ > 0. Then, firstly, exp(θ S_{T∧n}) is bounded by e^θ, so M^θ_{T∧n} is bounded by e^θ. Secondly, as n ↑ ∞, M^θ_{T∧n} → M^θ_T, where the latter is defined to be 0 if T = ∞. The Bounded Convergence Theorem allows us to let n → ∞ in (a) to obtain

1 = lim_n E[(sech θ)^{T∧n} exp(θ S_{T∧n})] = E[(sech θ)^T exp(θ S_T)],

the term inside [·] on the right-hand side correctly being 0 if T = ∞. Since S_T = 1 on {T < ∞}, this yields

(b) E[(sech θ)^T] = e^{-θ} for θ > 0.

We now let θ ↓ 0. Then (sech θ)^T ↑ 1 if T < ∞, and (sech θ)^T = 0 if T = ∞. Either (MON) or (BDD) yields

E I_{{T<∞}} = 1 = P(T < ∞).
..(10.13) Chapter 10: Martingales 103

The above argument has been given carefully to show how to deal with possibly infinite stopping times.

Put α = sech θ in (b) to obtain

(c) E(α^T) = Σ_n α^n P(T = n) = e^{-θ} = α^{-1}{1 - √(1 - α^2)},

so that

P(T = 2m - 1) = (-1)^{m+1} (1/2 choose m).

Intuitive proof of (c)

We have

(d) f(α) := E(α^T) = (1/2)E(α^T | X_1 = 1) + (1/2)E(α^T | X_1 = -1) = (1/2)α + (1/2)α f(α)^2.

The intuitive reason for the very last term is that time 1 has already elapsed, giving the α, and the time taken to go from -1 to 1 has the form T_1 + T_2, where T_1 (the time to go from -1 to 0) and T_2 (the time to go from 0 to 1) are independent, each with the same distribution as T. It is not obvious that T_1 and T_2 are independent, but it is not difficult to devise a proof: the so-called Strong Markov Theorem would allow us to justify (d).

10.13. Non-negative superharmonic functions for Markov chains

Let E be a finite or countable set. Let P = (p_{ij}) be a stochastic E × E matrix, so that, for i, j ∈ E, we have

p_{ij} ≥ 0,   Σ_{k∈E} p_{ik} = 1.

Let μ be a probability measure on E. We know from Section 4.8 that there exists a triple (Ω, F, P^μ) (we now signify the dependence of P on μ) carrying a Markov chain Z = (Z_n : n ∈ Z+) such that (4.8,a) holds. We write 'a.s., P^μ' to signify 'almost surely relative to the P^μ-measure'. Let F_n := σ(Z_0, Z_1, ..., Z_n). It is easy to deduce from (4.8,a) that if we write p(i,j) instead of p_{ij} when typographically convenient, then (a.s., P^μ)

P^μ(Z_{n+1} = j | F_n) = p(Z_n, j),   j ∈ E.

Let h be a non-negative function on E and define the function Ph on E via

(Ph)(i) := Σ_{k∈E} p(i,k) h(k),   i ∈ E.
104 Chapter 10: Martingales A0.13).. Assume that our non-negative function h is finite and P-superharmonic in that Ph < h on E. Then, (cMON) shows that, a.s., so that h(Zn) is a non-negative supermartingale (whatever be the initial distribution /x). Suppose that the chain Z is irreducible recurrent in that fa := P\Tj < oo) = 1, where P1 denotes P*1 when /x is the unit mass (/ij = Sij) at i (see 'Note' below) and Tj := inf {n : n > 1; Zn = j}. Note that the infimum is over {n > 1}, so that /,-,- is the probability of a return to i if Z starts at i. Then, by Theorem 10.10(d), we see that if h is non-negative and P-superharmonic, then, for any i and j in E, so that h is constant on E. Exercise. Explain (at first intuitively, and later with consideration of rigour) why fij = ^Pikfkj +p«j > and deduce that if every non-negative P-superharmonic function is constant, then Z is irreducible recurrent. □ So we have proved that our chain Z is irreducible and recurrent if and only if every non- negative P-superharmonic function is constant. This is a trivial first step in the links between probability and potential theory. Note. The perspicacious reader will have been upset by a lack of precision in this section. I wished to convey what is interesting first. Only the very enthusiastic should read the remainder of this section.
..A0.18) Chapter 10: Martingales 105 The natural thing to do, given the one-step transition matrix P, is to take the canonical model for the Markov chain Z obtained as follows. Let S denote the a-algebra of all subsets of E and define In particular, a point a; of 17 is a sequence of elements of E. For a; in 17 and n in Z+, define Zn(u) := con e E. Then, for each probability measure /x on (JE7, £), there is a unique probability measure P** on A2, ^") such that for n G N and z'o, ii,.. -, «n € E, we have The uniqueness is trivial because u?-sets of the form contained in [•] on the left-hand side of (*), together with 0, form a 7r-system generating T. Existence follows because we can take P^ to be the P^-law of the non- canonical process Z constructed in Section A4.3: Here, we regard Z as the map z:n this map Z being f'jT measurable in that Z~l : T -> f. The canonical model thus obtained is very satisfying because the mea- measurable space (QjJ7) carries all measures P** simultaneously.
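Returning to Section 10.12: the distribution P(T = 2m - 1) = (-1)^{m+1} (1/2 choose m) found there can be compared directly with simulation. In the sketch below (not from the text) the 5000-step cap on each walk is an arbitrary practical truncation (T is a.s. finite but has infinite mean).

```python
# Check of P(T = 2m-1) = (-1)^(m+1) * C(1/2, m) for the first hit of level 1
# by simple symmetric random walk (Section 10.12).
import random
from collections import Counter
random.seed(7)

def binom_half(m):
    """Generalized binomial coefficient C(1/2, m)."""
    c = 1.0
    for j in range(m):
        c *= (0.5 - j) / (j + 1)
    return c

def hit_time(cap=5_000):
    s = 0
    for t in range(1, cap + 1):
        s += random.choice((-1, 1))
        if s == 1:
            return t
    return None                        # walk had not hit 1 by the cap

trials = 50_000
counts = Counter(hit_time() for _ in range(trials))
for m in range(1, 6):
    t = 2 * m - 1
    emp = counts.get(t, 0) / trials
    theory = (-1) ** (m + 1) * binom_half(m)
    print(f"P(T = {t}):  simulated {emp:.4f},  formula {theory:.4f}")
```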
Chapter 11 The Convergence Theorem 11.1. The picture that says it all The top part of Figure 11.1 shows a sample path n h-» Xn(u)) for a process X where Xn — Xn-\ represents your winnings per unit stake on game n. The lower part of the picture illustrates your total-winnings process Y := C • X under the previsible strategy C described as follows: Pick two numbers a and b with a < b. REPEAT Wait until X gets below a Play unit stakes until X gets above b and stop playing UNTIL FALSE (that is, forever!). Black blobs signify where C = 1; and open circles signify where C = 0. Recall that C is not defined at time 0. To be more formal (and to prove inductively that C is previsible), define and, for n > 2, 11.2. Upcrossings The number ?7;v[a,6](u;) of upcrossings of [a,6] made by nn Xn(u>) by time N is defined to be the largest k in Z+ such that we can find 0 < 5i < 11 < s2 < t2 < - - • < sk < tk < N with XSi(u>) < a, Xti(a;) > b (l<i< k). 106
(11.2) Chapter 11: The Convergence Theorem 107

Figure 11.1. [Diagram: a sample path n ↦ X_n(ω) above, and the corresponding total-winnings path Y = C • X below; black blobs mark the times at which C = 1, open circles the times at which C = 0.]
108 Chapter 11: The Convergence Theorem A1.2).. The fundamental inequality (recall that Yq(u>) := 0) ►(D) YN(u) > (b - a)UN[a, 6](w) - [Xn(uj) - a}~ is obvious from the picture: every upcrossing of [a, b] increases the Y-value by at least (b — a), while the [X/v(u>) — a]~ overemphasizes the loss during the last 'interval of play'. 11.3. Doob's Upcrossing Lemma ► Let X be a supermartingale. Let U^[ajb] be the number of upcross- ings o/[a,6] by time N. Then (b - a)EUN[a, b] < E[(XN - a)']. Proof. The process C is previsible, bounded and > 0, and Y = C • X. Hence Y is a supermartingale, and E(Yjv) < 0. The result now follows from 11.4. COROLLARY Let X be a supermartingale bounded in Cl in that supE(|Xn|) < oo. Let a, b 6 R with a < b. Then, with U^la^o] :=| lim;v ##[<*, b], (b - a)EUoo[ay b] < \a\ + sup E(|Xn|) < oo n so that Proof. By A1.3), we have, for (b-a)EUN[a,b] < \a\ + E(\XN\) < \a\ + supE(|Xn|). Now let N t oo, using (MON). □
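A sketch (not from the text) that counts upcrossings along simulated paths of a simple symmetric random walk, which is a martingale and hence a supermartingale, and checks Doob's inequality (11.3), (b - a) E U_N[a,b] ≤ E[(X_N - a)^-]. The interval [a,b] = [-2,2] and the horizon N = 200 are arbitrary choices.

```python
# Counting upcrossings and checking Doob's Upcrossing Lemma (Section 11.3)
# for a simple symmetric random walk.
import random
random.seed(8)

a, b, N = -2, 2, 200

def upcrossings(path, a, b):
    count, below = 0, False
    for x in path:
        if not below and x < a:        # wait until the path gets below a ...
            below = True
        elif below and x > b:          # ... then until it gets above b
            below = False
            count += 1
    return count

trials = 20_000
tot_up, tot_neg = 0, 0.0
for _ in range(trials):
    path, s = [0], 0
    for _ in range(N):
        s += random.choice((-1, 1))
        path.append(s)
    tot_up += upcrossings(path, a, b)
    tot_neg += max(a - path[-1], 0)    # (X_N - a)^-

lhs = (b - a) * tot_up / trials        # (b - a) E U_N[a, b]
rhs = tot_neg / trials                 # E[(X_N - a)^-]
print(f"(b-a) E U_N[a,b] ≈ {lhs:.3f}   ≤   E[(X_N - a)^-] ≈ {rhs:.3f}")
```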
..(11.7) Chapter 11: The Convergence Theorem 109

11.5. Doob's 'Forward' Convergence Theorem

Let X be a supermartingale bounded in L^1:

sup_n E(|X_n|) < ∞.

Then, almost surely, X_∞ := lim X_n exists and is finite.

For definiteness, we define X_∞(ω) := limsup X_n(ω), ∀ω, so that X_∞ is F_∞ measurable and X_∞ = lim X_n, a.s.

Proof (Doob). Write (noting the use of [-∞, ∞]):

Λ := {ω : X_n(ω) does not converge to a limit in [-∞, ∞]}
   = {ω : liminf X_n(ω) < limsup X_n(ω)}
   = ∪_{a,b∈Q: a<b} {ω : liminf X_n(ω) < a < b < limsup X_n(ω)}
   =: ∪_{a,b} Λ_{a,b} (say).

But

Λ_{a,b} ⊆ {ω : U_∞[a,b](ω) = ∞},

so that, by (11.4), P(Λ_{a,b}) = 0. Since Λ is a countable union of sets Λ_{a,b}, we see that P(Λ) = 0, whence

X_∞ := lim X_n exists a.s. in [-∞, ∞].

But Fatou's Lemma shows that

E(|X_∞|) = E(liminf |X_n|) ≤ liminf E(|X_n|) ≤ sup_n E(|X_n|) < ∞,

so that P(X_∞ is finite) = 1.  □

Note. There are other proofs for the discrete-parameter case. None of these is as probabilistic, and none shares the central importance of this one for the continuous-parameter case.

11.6. Warning

As we saw for the branching-process example, it need not be true that E(X_n) → E(X_∞): the convergence X_n → X_∞ need not take place in L^1.

11.7. Corollary

If X is a non-negative supermartingale, then X_∞ := lim X_n exists almost surely.

Proof. X is obviously bounded in L^1, since E(|X_n|) = E(X_n) ≤ E(X_0).  □
Chapter 12 Martingales bounded in C 12.0. Introduction When it works, one of the easiest ways of proving that a martingale M is bounded in C1 is to prove that it is bounded in C2 in the sense that (a) sup||A/n||2 < oo, equivalently, supE(M^) < oo. n n Boundedness in C2 is often easy to check because of a Pythagorean formula (proved in Section 12.1) n E(M2) = E(M02) + The study of sums of independent random variables, a central topic in the classical theory, will be seen to hinge on Theorem 12.2 below, both parts of which have neat martingale proofs. We shall prove the Three- Series Theorem, which says exactly when a sum of independent random variables converges. We shall also prove the general Strong Law of Large Numbers for IID RVs and Levy's extension of the Borel-Cantelli Lemmas. 12.1. Martingales in C2; orthogonality of increments Let M = (Mn : n > 0) be a martingale in C2 in that each Mn is in C2 so that E(M2) < oo, Vn. Then for syt,u,v 6 Z+, with s < t < u < t>, we know that so that Mv — Mu is orthogonal to £2{J-U) (see Section 9.5) and in particular, (a) (Mt-MsyMv- 110
..A2.1) Chapter 12: Martingales bounded in C? Ill Hence the formula n expresses Mn as the sum of orthogonal terms, and Pythagoras's theorem yields n 2\ _ (b) E(M2n) = E(M02) THEOREM Let M be a martingale for which Mn 6 C2, Vn. Then M is bounded in C2 if and only if (c) EEKM* - Mk-iJ] < oo; and when this obtains, Mn —> Moo almost surely and in C2. Proof It is obvious from (b) that condition (c) is equivalent to the statement M is bounded in C?. Suppose now that (c) holds. Then M is bounded in C2, and hence, by the property of monotonicity of norms (Section 6.7), M is bounded in C1. Doob's Convergence Theorem 11.5 shows that Moo •= Hm A/n exists almost surely. The Pythagorean theorem implies that n-f r f (d) E[(Mn+r - Mnf) = J2 E[(Mk - jfc=n-fl Letting r —♦ oo and applying Fatou's Lemma, we obtain (e) E[(Moo-MnJ]< J2 E[(Mk - Mk-iJ) k>n+l Hence (f) limE[(Moo-MnJ]=0, n so that Mn -* Moq in C2. Of course, (f) allows us to deduce from (d) that (e) holds with equality.
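A Monte Carlo check (not from the text) of the orthogonality of increments and of the Pythagorean formula (12.1,b) for the simplest L^2 martingale, simple symmetric random walk; the times s < t ≤ u < v and the number of trials are arbitrary choices.

```python
# Orthogonality of martingale increments and the Pythagorean formula of
# Section 12.1, checked by simulation for simple random walk.
import random
random.seed(11)

s, t, u, v, trials = 5, 10, 20, 40, 100_000
cross, sq_v, sq_incr = 0.0, 0.0, 0.0
for _ in range(trials):
    path = [0]
    for _ in range(v):
        path.append(path[-1] + random.choice((-1, 1)))
    cross += (path[t] - path[s]) * (path[v] - path[u])    # <M_t - M_s, M_v - M_u>
    sq_v += path[v] ** 2                                   # contributes to E(M_v^2)
    sq_incr += sum((path[k] - path[k - 1]) ** 2 for k in range(1, v + 1))

print("E[(M_t - M_s)(M_v - M_u)] ≈", round(cross / trials, 3), "  (theory: 0)")
print("E(M_v^2) ≈", round(sq_v / trials, 2),
      "   sum of E[(M_k - M_{k-1})^2] ≈", round(sq_incr / trials, 2))
```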
112 Chapter 12: Martingales bounded in C? A2.2).. 12.2. Sums of zero-mean independent variables in C2 THEOREM Suppose that (Xk : k £ N) is a sequence of independent random variables such that, for every ky E(Xk) = 0, a\ := Var(Xit) < oo. (a) Then \^-| < oo) implies that i/]^k converges, a.s.) (b) // the variables (X^) are bounded by some constant K in [0,oo) in that \Xk(u>)\ < K, VA:, Vu>, then converges, a.s.) implies that (S_] a\ < °°)- Note. Of course, the Kolmogorov 0-1 law implies that Xk converges) = 0 or 1. Notation. We define Tn := a(Xi,X2,.. .yXn), Mn := X\ + X2 4- h Xn, (with ^o := {0,0}, Mo := 0, by the usual conventions). We also define n so that A$ := 0 and iVo := 0. Proof of (a). We know from A0.4,a) that M is a martingale. Moreover so that, from A2.1,b), n If X)afc < °°j t^len -^ is bounded in £2, so that limMn exists a.s. □
..A2.3) Chapter 12: Martingales bounded in C2 113 Proof of (b). We can strengthen (*) as follows: since Xk is independent of -i, we have, almost surely, = E[X2k\rk-{\ = E{Xl) = a\. A familiar argument now applies: since Mk-i is Tk-\ measurable, o\ = But this result states that N is a martingale. Now let c G @, oo) and define r:=inf{r: \Mr\ > c}. We know that NT is a martingale so that, for every n, EiVj = E[(MJJ] - E^LTAn = 0. But since \MT - MT-i \ = \XT\ < K if T is finite, we see that |Mj| <K + c for every n, whence (**) EATAn<(K + cJ, Vn. However, since ^Xn converges a.s., the partial sums of Y^^k a^ a-s- bounded, and it must be the case that for some c, P(T = oo) > 0. It is now clear from (**) that Aoo •= J2al < °°- ^ Remark. The proof of (b) showed that if (Xn) is a sequence of independent zero-mean RVs uniformly bounded by some constant if, then (P{ partial sums"of X]X* are bounded } > 0) => (J2Xk converges a.s.) Generalization. Sections 12.11-12.16 present the natural martingale form of Theorem 12.2 with applications. 12.3. Random signs Suppose that (an) is a sequence of real numbers and that (en) is a sequence of IID RVs with P(en = ±1) = J.
Chapter 12: Martingales bounded in C2 A2.3).. The results of Section 12.2 show that ^enan converges (a.s.) if and only ifYl°Z. < °°» and that nan (a.s.) oscillates infinitely if ^Zan ~ °°* You should think about how to clinch the latter statement. 12.4. A symmetrization technique: expanding the sample space We need a stronger result than that provided by A2.2,b). LEMMA Suppose that (Xn) is a sequence of independent random variables bounded by a constant K in [0, oo): Then n converges, a.s.) => (£^E(Xn) converges and ]T)Var(Xn) < oo). Proof. If each Xn has mean zero, then of course, this would amount to A2.2,b). There is a nice trick which replaces each Xn by a 'symmetrized version' Z* of mean 0 in such a way as to preserve enough of the structure. Let (O,/,P,(i"n : n e N)) be an exact copy of (ft, T, P, (Xn : n 6 N)). Define and, for u;* = (u;,d;) G ft*, define := Xn(u>)y We think of X£ as Xn lifted to the larger 'sample space' (ft*,^7*, P*). It is clear (and may be proved by applying Uniqueness Lemma 1.6 in a familiar way) that the combined family (X* : n e N) U (X* : n G N) is a family of independent random variables on (ft*,^**, P*), with both X* and X* having the same P*-distribution as the P-distribution of Xn: P* o (X*) =PoIn on (R,5), etc.
..A2.5) Chapter 12: Martingales bounded in C2 115 Now we have (a) (Z* : n G N*) is a zero-mean sequence of independent random variables on (£l*yf*,P*) such that \Zn(u>m)\ < 2K (Vn,Vu;*) and where a\ := Var(Xn). Let G '.— {u> G O>: YlXn(w) converges}, with G defined similarly. Then we are given that P(G) = P(G) = 1, so that P*(G x G) = 1. But Yl zn(^) converges on G x G, so that (b) P*(E^» converges) = 1. From (a) and (b) and A2.2,b), we conclude that <oo, and now it follows from A2.2,a) that (c) YHXn - Hxn)] converges, a.s., the variables in this sum being zero-mean independent, with E[{Xn - 2 l Since (c) holds and ]T)Xn converges (a.s.) by hypothesis, ]TE(Xn) con- converges. D Note. Another proof of the lemma may be found in Section 18.6. 12.5. Kolmogorov's Three-Series Theorem Let (Xn) be a sequence of independent random variables. Then ]P Xn converges almost surely if and only if for some (then for every) K > 0, the following three properties hold: (i) J2P(\Xn\ > K) < oo, n (ii) Y^^(Xrf) converges, (iii) n
11$ Chapter 12: Martingales bounded in C2 A2.5).. where \rK( \ _ J Xn{yf) if |Xn(u;)| >K. Proof of 'if part. Suppose that for some K > 0 properties (i)-(iii) hold. Then £P(A' ^ X%) £P(|X| if) OO  so that by (BC1) P(Xn = X* for all but finitely many n) = 1. It is therefore clear that we need only show that £ X^ converges almost surely; and because of (ii), we need only prove that ? converges, a.s., where Yf := X* - E(X?). However, the sequence (Y^ : n G N) is a zero-mean sequence of independent random variables with = Var(X^). Because of (iii), the desired result now follows from A2.2,a). □ Proof of 'only if part. Suppose that ]T) JVn converges, a.s., and that K is any constant in @, oo). Since it is almost surely true that Xn —♦ 0 whence |Xn| > K for only finitely many n, (BC2) shows that (i) holds. Since (a.s.) Xn = X% for all but finitely many n, we know that ^ converges, a.s. Lemma 12.4 completes the proof. □ Results such as the Three-Series Theorem become poweiful when used in conjunction with Kronecker's Lemma (Section 12.7). 12.6. Cesaro's Lemma Suppose that Fn) is a sequence of strictly positive real numbers with bn | oo, and that (vn) is a convergent sequence of real numbers: vn —♦ Voq 6 R. Then n oo).
..(12.7) Chapter 12: Martingales bounded in L^2 117

Here, b_0 := 0.

Proof. Let ε > 0. Choose N such that v_k ≥ v_∞ - ε whenever k ≥ N. Then

liminf_{n→∞} (1/b_n) Σ_{k=1}^{n} (b_k - b_{k-1}) v_k
   ≥ liminf_{n→∞} [ (1/b_n) Σ_{k=1}^{N} (b_k - b_{k-1}) v_k + ((b_n - b_N)/b_n)(v_∞ - ε) ]
   ≥ 0 + v_∞ - ε.

Since this is true for every ε > 0, we have liminf ≥ v_∞; and since, by a similar argument, limsup ≤ v_∞, the result follows.  □

12.7. Kronecker's Lemma

► Again, let (b_n) denote a sequence of strictly positive real numbers with b_n ↑ ∞. Let (x_n) be a sequence of real numbers, and define s_n := x_1 + x_2 + ··· + x_n. Then

( Σ (x_n/b_n) converges )  ⇒  ( s_n/b_n → 0 ).

Proof. Let u_n := Σ_{k≤n} (x_k/b_k), so that u_∞ := lim u_n exists. Then u_n - u_{n-1} = x_n/b_n. Thus

s_n = Σ_{k=1}^{n} b_k(u_k - u_{k-1}) = b_n u_n - Σ_{k=1}^{n} (b_k - b_{k-1}) u_{k-1}.

Cesàro's Lemma now shows that

s_n/b_n → u_∞ - u_∞ = 0.  □
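Looking back at the random-signs example of Section 12.3 in the light of these results: a simulation sketch (not from the text) with a_n = 1/n, where Σ a_n^2 < ∞ and Σ ε_n a_n converges a.s., against a_n = 1/√n, where Σ a_n^2 = ∞ and the partial sums go on making moves of order 1. The sign sequence, horizon and checkpoints are arbitrary choices.

```python
# Random signs (Section 12.3): sum of eps_n * a_n converges a.s. iff sum a_n^2 < oo.
# One sample path each for a_n = 1/n (convergent) and a_n = 1/sqrt(n) (not).
import math, random
random.seed(9)

signs = [random.choice((-1, 1)) for _ in range(1_000_000)]

def partial_sums(a, checkpoints):
    s, out, cp = 0.0, [], set(checkpoints)
    for n, eps in enumerate(signs, start=1):
        s += eps * a(n)
        if n in cp:
            out.append((n, round(s, 4)))
    return out

checkpoints = [10**k for k in range(2, 7)]
print("a_n = 1/n      :", partial_sums(lambda n: 1 / n, checkpoints))
print("a_n = 1/sqrt(n):", partial_sums(lambda n: 1 / math.sqrt(n), checkpoints))
```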
118 Chapter 12: Martingales bounded in C2 A2.8). 12.8. A Strong Law under variance constraints LEMMA Let (Wn) be a sequence of independent random variables such that n Then n l 2Zk<n ^* ~* ^» a*s** Proof. By Kronecker's Lemma, it is enough to prove that YXWn/n) con- converges, a.s. But this is immediate from Theorem 12.2(a). D Note. We are now going to see that a truncation technique enables us to obtain the general Strong Law for IID RVs from the above lemma. 12.9. Kolmogorov's Truncation Lemma Suppose that Xi,J^2> • • • 0?*e IID RVs each with the same distribution as X, where E(\X\) < oo. Set fi := E(X). Define _(Xn if\Xn\<n, In'~\0 if\Xn\>n. Then (i) E(Yn) -+ //; (ii) P[Yn = Xn eventually] = 1; (iii) £>~2Vax(yn) < oo. Proof of(i). Let 7 ._ /X 'li n •" \ 0 if |X| > n. Then Zn has the same distribution as Fn, so that in particular, E(Zn) = E(Yn). But, asn-> oo, we have Zn -» X, |Z»| < |X|, so, by (DOM), S(Z«) - E(Jf) = /i. D
..A2.10) Chapter 12: Martingales bounded in C2 119 Proof of (ii). We have oo P(Yn ? Xn) ■■ n=l oo so that by (BC1), result (ii) holds. □ Proof of (iii). We have n where, for 0 < z < oo, /(*)= V n-2<2/max(l, We have used the fact that, for n > 1, 1 . 2 = 2 n2 n(n -f 1) \n n + 1 Hence n) < 2E(|X|) < oo. □ 12.10. Kolmogorov's Strong Law of Large Numbers (SLLN) ► LetXuX^... be I1D RVs with E(\Xk\) < oo, \/k. Define Sn := X\ -f X2 + • • • 4- Xn. Then, with 11 := E(Xk), Vk, n Sn —* fiy almost surely. Proof. Define Yn as in Lemma 12.9. By property (ii) of that lemma, we need only show that n-i ^2 Yk -> ^ a.s. k<n
120 Chapter 12: Martingales bounded in C2 A2.10).. But (a) n-1 k<n k<n k<n where Wk := Y* — E(yjt). But, the first term on the right-hand side of (a) tends to /i by A2.9,i) and Cesaro's Lemma; and the second almost surely converges to 0 by Lemma 12.8. □ Notes. The Strong Law is philosophically satisfying in that it gives a pre- precise formulation of E(X) as 'the mean of a large number of independent realizations of X\ We know from Exercise E4.6 that if E(|JV|) = oo, then lim sup \Sn \/n = oo, a.s.. So, we have arrived at the best possible result for the IID case. Discussion of methods. Even though we have achieved a good result, it has to be admitted that the truncation technique seems 'ad hoc*: it does not have the pure-mathematical elegance - the sense of Tightness - which the martingale proof and the proof by ergodic theory (the latter is not in this book) both possess. However, each of the methods can be adapted to cover situations which the others cannot tackle; and, in particular, classical truncation arguments retain great importance. Properly formulated, the argument which gave the result, Theorem 12.2, on which all of this chapter has so far relied, can yield much more. 12.11. Doob decomposition In the following theorem, the statement that 'A is a previsible process null at 0' means of course that Aq = 0 and An G nxT^-i (n G N). THEOREM ►(a) Let (Xn : n £ Z+) be an adapted process with Xn G £2, Vn. Then X has a Doob decomposition (D)
A2.12) Chapter 12: Martingales bounded in C2 121 where M is a martingale null at 0, and A is a previsible process null at 0. Moreover, this decomposition is unique modulo indistinguishability in the sense that if X = Xo -\- M + A is another such decomposition, then ► (b) -Y is a submartingale if and only if the process A is an increasing process in the sense that P(An <An+i,Vn) = l. Proof. If X has a Doob decomposition as at (D), then, since M is a mar- martingale and A is previsible, we have, almost surely, Hence n (c) An = Yl E(** - JY*-! Irk-i), a.s. and if we use (c) to define A, we obtain the required decomposition of X. The 'submartingale' result (b) is now obvious. □ Remark. The Doob-Meyer decomposition, which expresses a sub- submartingale in continuous time as the sum of a local martingale and a pre- previsible increasing process, is a deep result which is the foundation stone for stochastic-integral theory. 12.12. The angle-brackets process (M) Let M be a martingale in C2 and null at 0. Then the conditional form of Jensen's inequality shows that (a) M2 is a submartingale. Thus M has a Doob decomposition (essentially unique): (b) M2 = N + A, where iV is a martingale and A is a previsible increasing process, both iV and A being null at 0. Define Aoo :=| lim^4n, a.s. Notation. The process A is often written (M).
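For simple random walk S (a martingale null at 0), formula (c) of Theorem 12.11 applied to S^2 gives A_n = Σ_{k≤n} E(S_k^2 - S_{k-1}^2 | F_{k-1}) = n, so that ⟨S⟩_n = n and S_n^2 - n is a martingale. A quick Monte Carlo check of this (not from the text; the value of n and the number of trials are arbitrary):

```python
# The angle-brackets process for simple random walk: each increment satisfies
# E[(S_n - S_{n-1})^2 | F_{n-1}] = 1, so <S>_n = n and S_n^2 - n is a martingale.
import random
random.seed(12)

n, trials = 100, 50_000
acc = 0.0
for _ in range(trials):
    s = sum(random.choice((-1, 1)) for _ in range(n))
    acc += s * s - n              # value of the martingale S_n^2 - <S>_n at time n
print("E(S_n^2 - n) ≈", round(acc / trials, 3), "  (theory: 0)")
```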
122 Chapter 12: Martingales bounded in C2 A2.12).. Since E(M2) = E(An), we see that (c) M is bounded in C2 if and only ifE(A<x>) < °°. It is important to note that ► (d) An ~ An-! = E(M2 - Ml-^Fn-i) = E[(Mn - 12.13. Relating convergence of M to finiteness of {M) Again let M be a martingale in C2 and null at 0. Define A := (M). (More strictly, let A be *a version of (M).) THEOREM ►(a) limMn(u;) exists for almost every u for which Aoo(<jj) < oo. ► (b) Suppose that M has uniformly bounded increments in that for some K in R, <x> Then Aoo(u>) < oo for almost every u for which liraMn(w) exists. n Remark. This is obviously an extension - and a very substantial one - of Theorem 12.2. Proof o/(a). Because A is previsible, it is immediate that for every k € N, SO) := inf {n € Z+ : An+l > k) defines a stopping time S(k). Moreover, the stopped process As^ is previs- previsible because for B £ B, and n £ N, {AnAS{k) £ B] = F1 U F2, where n-l -F, := (J {S(k) = r;AreB}& F2 := {An £B}D {S(k) < n - Since is a martingale, we now see that (Ms^) = As(k\ However, the process ASW is bounded by k, so that by A2.12,c), MSW is bounded in C2 and (c) lim MnAs(k) exists almost surely.
..A2.14) Chapter 12: Martingales bounded in C2 123 However, (d) {Aoo < oo} = \J{S(k) = oo}. Result (a) now follows on combining (c) and (d). □ Proof of (b). Suppose that o = oo, sup \Mn\ < oo) > 0. n Then for some c > 0, (e) P(T(c) = oo, Aoo = oo) > 0, where T(c) is the stopping time: T(c):=mf{r:\Mr\>c}. Now, ()n) = 0, and MT(C) is bounded by c + AT. Thus (f) E4r(c)An < (c + KJ, Vn. But (MON) shows that (e) and (f) are incompatible. Result (b) follows. □ Remark. In the proof of (a), we were able to use previsibility to make the jump A$(k) ~ As(k)-i irrelevant. We could not do this for the jump Mt(c) ~~ Mt(c)-i which is why we needed the assumption about bounded increments. 12.14. A trivial 'Strong Law* for martingales in C2 Let M be a martingale in C2 and null at 0, and let A = (M). Since is a bounded previsible process, Wnl= l<k<n defines a martingale W. Moreover, since A + An) is Fn-i measurable, E[(Wn - Wn_1J|^n_1] = A + An)~2(An - An-i) < A + An-i)'1 - A + An)-1, a.S.
124 Chapter 12: Martingales bounded in C2 A2.14)- We see that (W)^ < 1, a.s., so that now shows that ►(a) MnjAn —► 0 a.s. on n exists, a.s.. Kronecker's Lemma 12.15. Levy's extension of the Borel-Cantelli Lemmas THEOREM Suppose that for n 6 N, En 6 Tn- Define Zn := Define lEk = number of Eu(k < n) which occur. -i), and Then, almost surely, (a) (Y^ < oo) => (Zoo < oo), (b) (Foo = oo) =^ (Zn/Yn -♦ 1). Remarks, (i) Since Ef* = P(^), it follows that if X)P(^) < °°> then Foo < oo, a.s. (BCl) therefore follows. (ii) Let (En : n G N) be a sequence of independent events associated with some triple (ft, .F, P), and define Tn = o(E\, -E2,..., -^n). Then , a.s., and (BC2) follows from (b). Proof. Let M be the martingale Z — Y, so that Z — M + Y is the Doob decomposition of the submartingale Z. Then (you check!) An := (M)n Yn% a.s. Jb<n If Fqo < oo, then A^ < oo and limMn exists, so that Z^ is finite. (We are skipping 'except for a null a;-set' statements now.) If oo and Aqq < oo then limMn exists and it is trivial that 1. If Fqo = oo and Aqq — oo, then Mn/An —► 0, so that, a fortiori, Mn/Yn -► 0 and Zn/Yn -►1. □
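A simulation may make Theorem 12.15 more vivid. In the sketch below (an illustrative model, not the text's), and writing ξ_k for P(E_k | F_{k−1}) and Y_n for Σ_{k≤n} ξ_k as in the theorem, E_n occurs with the previsible conditional probability ξ_n = (1 + Z_{n−1})/(n + 1), a Pólya-urn-style rule under which the events are far from independent. Since ξ_n converges a.s. to a strictly positive limit, Y_∞ = ∞ a.s., and part (b) predicts Z_n/Y_n → 1.

```python
import numpy as np

rng = np.random.default_rng(1)

def run(n_steps):
    """E_n occurs with conditional probability xi_n = (1 + Z_{n-1})/(n + 1),
    an F_{n-1}-measurable quantity, so the events are strongly dependent."""
    Z = 0.0                 # Z_n = number of the E_k, k <= n, that have occurred
    Y = 0.0                 # Y_n = sum of the conditional probabilities xi_k
    for n in range(1, n_steps + 1):
        xi = (1.0 + Z) / (n + 1.0)
        Y += xi
        if rng.random() < xi:
            Z += 1.0
    return Z, Y

for n in (10 ** 3, 10 ** 4, 10 ** 5):
    Z, Y = run(n)
    print(f"n = {n:>6}:   Z_n = {Z:9.0f},   Y_n = {Y:11.1f},   Z_n / Y_n = {Z / Y:.4f}")
```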
..A2.16) Chapter 12: Martingales bounded in C? 125 12.16. Comments The last few sections have indicated just how powerful the use of (M) to study M is likely to be. In the same way as one can obtain the conditional version Theorem 12.15 of the Borel-Cantelli Lemmas, one can obtain con- conditional versions of the Three-Series Theorem etc. But a whole new world is opened up: see Neveu A975), for example. In the continuous-time case, things are much more striking still. See, for example, Rogers and Williams A987).
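Before leaving this chapter, a numerical illustration (an aside, not in the text) of Theorem 12.13: for M_n = Σ_{k≤n} a_k ξ_k with independent random signs ξ_k, the bracket is ⟨M⟩_n = Σ_{k≤n} a_k². With a_k = 1/k the bracket converges and the paths settle down to a limit; with a_k = 1 the bracket is infinite and the paths keep wandering (although, as in Section 12.14, M_n/⟨M⟩_n → 0).

```python
import numpy as np

rng = np.random.default_rng(2)
signs = rng.choice([-1.0, 1.0], size=(3, 20000))        # independent random signs xi_k
k = np.arange(1, 20001)

for label, a in (("a_k = 1/k  (so <M>_infinity is finite)  ", 1.0 / k),
                 ("a_k = 1    (so <M>_infinity is infinite)", np.ones(20000))):
    M = np.cumsum(a * signs, axis=1)        # M_n = sum_{k<=n} a_k xi_k;  <M>_n = sum_{k<=n} a_k^2
    print(label)
    for n in (5000, 10000, 20000):
        print(f"   M_n at n = {n:>5} on three paths:", np.round(M[:, n - 1], 3))
```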
Chapter 13 Uniform Integrability We have already seen a number of nice applications of martingale theory. To derive the full benefit, we need something better than the Dominated- Convergence Theorem. In particular, Theorem 13.7 gives a necessary and sufficient condition for a sequence of RVs to converge on C1. The new con- concept required is that of a uniformly integrable (UI) family of random variables. This concept links perfectly with conditional expectations and hence with martingales. The appendix to this chapter contains a discussion of that topic loved by examiners and others: modes of convergence. Our use of the Upcrossing Lemma has meant that this topic does not feature large in the main text of this book. 13.1. An 'absolute continuity' property LEMMA ►(a) Suppose that X 6 C1 = £J(fi, T, P). Then, given e > 0, there exists aS>0 such that for F G F, P(F) < 6 implies that E(|X|; F) < e. Proof If the conclusion is false, then, for some Sq > 0, we can find a sequence (Fn ) of elements of T such that P(Fn)<2-n and E(\X\;Fn) > e0. Let H := limsupi^. Then (BCl) shows that P(J5T) = 0, but the 'Reverse' Fatou Lemma E.4,b) shows that and we have arrived at the required contradiction. □ 126
Corollary
(b) Suppose that X ∈ L^1 and that ε > 0. Then there exists K in [0,∞) such that

E(|X|; |X| > K) < ε.

Proof. Let δ be as in Lemma (a). Since

K P(|X| > K) ≤ E(|X|),

we can choose K such that P(|X| > K) < δ.  □

13.2. Definition. UI family

► A class C of random variables is called uniformly integrable (UI) if given ε > 0, there exists K in [0,∞) such that

E(|X|; |X| > K) < ε,  ∀X ∈ C.

We note that for such a class C, we have (with K_1 relating to ε = 1), for every X ∈ C,

E(|X|) = E(|X|; |X| > K_1) + E(|X|; |X| ≤ K_1) ≤ 1 + K_1.

Thus, a UI family is bounded in L^1.

It is not true that a family bounded in L^1 is UI.

Example. Take (Ω, F, P) = ([0,1], B[0,1], Leb). Let

X_n := n I_{(0, 1/n)}.

Then E(|X_n|) = 1, ∀n, so that (X_n) is bounded in L^1. However, for any K > 0, we have for n > K,

E(|X_n|; |X_n| > K) = n P(X_n = n) = 1,

so that (X_n) is not UI. Here, X_n → 0, but E(X_n) ↛ 0.

13.3. Two simple sufficient conditions for the UI property

(a) Suppose that C is a class of random variables which is bounded in L^p for some p > 1; thus, for some A ∈ [0,∞),

E(|X|^p) ≤ A,  ∀X ∈ C.
228 Chapter IS: Uniform Integrability A3.8).. Then C is UI. Proof. If v > K > 0, then v < Kl"pv^ (obviously!). Hence, for K > 0 and X 6 C, we have E(|J\T|; |JV| > K) < Kx-pE{\X\p; \X\ > K) < K1'*A. The result follows. □ (b) Suppose that C is a class of random variables which is dominated by an integrable non-negative variable Y: \X{u>)\<Y(u>), VXeC and E(K) < oo. Then C is UI. Note. It is precisely this which makes (DOM) work for our (tt,^*, P). Proof It is obvious that, for K > 0 and X € C, E(\X\;\X\>K)<E(Y;Y>K), and now it is only necessary to apply A3.1,b) to Y. □ 13.4. UI property of conditional expectations The mean reason that the UI property fits in so well with martingale theory is the following. See Exercise E13.3 for an important extension. THEOREM ► LetXGC1. Then the class {E(X\6) : g a sub-a-algebra of is uniformly integrable. Note. Because of the business of versions, a formal description of the class C in question would be as follows: Y € C if and only if for some sub-a-algebra Q of F, Y is a version of E(X|C7). Proof. Let e > 0 be given. Choose 6 > 0 such that, for F 6 P(F) < 8 implies that E(|X|; F) < e.
..A3.5) Chapter IS: Uniform Integrability 129 Choose K so that A'^EflA'l) < 6. Now let Q be a sub-cr-algebra of T and let Y be any version of E(X|C7). By Jensen's inequality, (a) |y|<E(|.Y||£), a.s Hence E(|F|) < E(|X|) and A'P(|F| > K) < E(|F|) < E so that p(|y| > K) < s. But {\Y\ > K) e G, so that, from (a) and the definition of conditional expectation, E(|F|; |F| > A') < E(\X\; \Y\ > K) < e. D Note. Now you can see why we needed the more subtle result A3.1,a), not just the result A3.1,b) which has a simpler proof. 13.5. Convergence in probability Let (Xn) be a sequence of random variables, and let X be a random variable. We say that ► Xn —► X in probability if for every e > 0, P(\Xn - X\ > e) -> 0 as n -» oo. LEMMA ► If Xn —► X almost surely, then Xn —► X in probability. Proof. Suppose that Xn —► X almost surely and that e > 0. Then by the Reverse Fatou Lemma 2.6(b) for sets, 0 = P(\Xn - X\ > e, i.o.) = P(limsup{|Xn - X\ > e}) >limsupP(|J\:n-A'|>£), and the result is proved.
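Returning to the example of Section 13.2, here is a quick numerical rendering (illustrative only, with arbitrarily chosen values of n and K). With X_n = n I_{(0,1/n)} on ([0,1], Leb): E(X_n) = 1 for every n, X_n → 0 pointwise on (0,1] (hence in probability, in line with Lemma 13.5 just proved), yet E(|X_n|; |X_n| > K) = 1 as soon as n > K, so the family is not UI and E(X_n) ↛ 0.

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.random(10 ** 6)                    # a large sample from ([0,1], Leb)

K = 10.0
for n in (5, 50, 500, 5000):
    X_n = np.where(omega < 1.0 / n, float(n), 0.0)       # X_n = n 1_(0,1/n)
    tail = X_n[X_n > K].sum() / omega.size               # estimates E(|X_n|; |X_n| > K)
    print(f"n = {n:>4}:  E(X_n) ~ {X_n.mean():.3f},"
          f"  E(|X_n|; |X_n| > {K:.0f}) ~ {tail:.3f},"
          f"  P(X_n > 0) ~ {(X_n > 0).mean():.4f}")
```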
130 Chapter 13: Uniform Integrability A3.5).. Note. As already mentioned, a discussion of the relationships between var- various modes of convergence may be found in the appendix to this chapter. 13.6. Elementary proof of (BDD) We restate the Bounded Convergence Theorem, but under the weaker hy- hypothesis of 'convergence in probability' rather than 'almost sure conver- convergence'. THEOREM (BDD) Let (Xn) be a sequence of RVs, and let X be a RV. Suppose that Xn —+ X in probability and that for some K in [0, oo), we have for every n and iv [X»(u)\ < K. Then E(\X» - X\)-* 0. Proof. Let us check that P(|A'| < K) = 1. Indeed, for k 6 N, P(\X\ > K + it-1) < P(\X - X»\ > k-1), Vn, so that P(\X\ > K + k-1) = 0. Thus P(|X| > A') = P((J{|X| > K + k-1}) = 0. k Let e > 0 be given. Choose no such that P(\Xn - X\ >|£) < -^— when n > n0. Then, for n > no, E(\Xn - X\) = E(\Xn - X\; \Xn - X\ > Je) + E(\Xn - X\; \Xn - X\ <\ <2K?{\Xn-X\>\e)+\e<e. The proof is finished. □ This proof shows (much as does that of the Weierstrass approximation theorem) that convergence in probability is a natural concept.
..A3.7) Chapter 13: Uniform Integrability 131 13.7. A necessary and sufficient condition for Cl convergence THEOREM Let (Xn) be a sequence in £}, and let X E C1. Then Xn —► X in C1, equivalently E(\Xn - X\) —> 0, if and only if the following two conditions are satisfied: (i) Xn —> X in probability y (ii) the sequence (Xn) is VI. I Remarks. It is of course the 'if' part of the theorem which is useful. Since the result is 'best possible', it must improve on (DOM) for our (Q,^*, P) triple; and, of course, result 13.3(b) makes this explicit. Proof of 'if part. Suppose that conditions (i) and (ii) are satisfied. For K £ [0, oo), define a function <pK ' R —► [—■&"> K] as follows: (K \ix>K, <Pk(x) := < x if \x\ < Ky I -K if x < -K. Let e > 0 be given. By the UI property of the (Xn) sequence and A3.1,b), we can choose K so that E{|M*»)-*»I} < |,Vn; E{\<pK(X)-X\] < |. But, since |v?*:(x) - V^A'Cj/)! < \x - y|, we see that <pK(Xn) -> Vk(X) in probability; and by (BDD) in the form of the preceding section, we can choose no such that, for n > no, E{\vK(Xn)-<pK(X)\}<£-. The triangle inequality therefore implies that, for n > no, E(\Xn-X\)<e, and the proof is complete. □ Proof of (only if7 part. Suppose that Xn -» X in Cl. Let e > 0 be given. Choose N such that n>N => E(\Xn-X\)<e/2.
132 Chapter IS: Uniform Integrability (IS.7). By A3.1,a), we can choose 6 > 0 such that whenever P(-F) < 6, we have E(\Xn\;F)<e (l<n<N), E(\X\;F)<e/2. Since (Xn) is bounded in £*, we can choose K such that K'1 supE(\Xr\) < S. r Then for n > TV, we have P(|Xn| > 70 < S and '|; |A'n| > A') + E(|X - Xn\) < e. For n < N, we have POA^I > K) < 6 and Hence (Xn) is a UI family. Since eP(\Xn - X\ >e)< E(\Xn - A'|) = \\Xn - X\\u it is clear that Xn —► X in probability. □
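Theorem 13.7 and the sufficient condition 13.3(a) can be seen working together in a small experiment (an illustration, not from the text). On ([0,1], Leb), take Y_n := √n I_{(0,1/n)}; then Y_n → 0 in probability, E(Y_n²) = 1 for every n, so (Y_n) is bounded in L² and hence UI by 13.3(a), and Theorem 13.7 forces E|Y_n| → 0. Contrast this with the non-UI sequence of Section 13.2, for which L¹ convergence fails even though convergence in probability holds.

```python
import numpy as np

rng = np.random.default_rng(4)
omega = rng.random(10 ** 6)                     # a large sample from ([0,1], Leb)

for n in (10, 100, 1000, 10000):
    Y_n = np.where(omega < 1.0 / n, np.sqrt(n), 0.0)     # Y_n = sqrt(n) 1_(0,1/n)
    # E(Y_n^2) = 1 for every n, so (Y_n) is UI by 13.3(a); Y_n -> 0 in probability,
    # and Theorem 13.7 then gives E|Y_n - 0| -> 0, as the second column shows.
    print(f"n = {n:>5}:  E(Y_n^2) ~ {np.mean(Y_n ** 2):.3f},   E|Y_n| ~ {Y_n.mean():.4f}")
```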
Chapter 14 UI Martingales 14.0. Introduction The first part of this chapter examines what happens when uniform integra- bility is combined with the martingale property. In addition to new results such as Levy's 'Upward* and 'Downward' Theorems, we also obtain new proofs of the Kolmogorov 0-1 Law and of the Strong Law of Large Numbers. The second part of the chapter (beginning at Section 14.6) is concerned with Doob's Sub martingale Inequality. This result implies in particular that for p > 1 (but not for p = 1) a martingale bounded in Cp is dominated by an element of Cp and hence converges both almost surely and in Cp. The Submartingale Inequality is also used to prove Kakutani's Theorem on product-form martingales and, in an illustration of exponential bounds, to prove a very special case of the Law of the Iterated Logarithm. The Radon-Nikodym theorem is then proved, and its relevance to likelihood ratio explained. The topic of optional sampling, important for continuous-parameter theory and in other contexts, is covered in the appendix to this chapter. 14.1. UI martingales Let M be a UI martingale, so that M is a martingale relative to our set-up (ft, T, {Tn}, P) and (Mn : n 6 Z+) is a UI family. Since M is UI, M is bounded in C1 (by A3.2)), and so Moo •= limMn exists almost surely. By Theorem 13.7, it is also true that Mn —> Moo in C1: E(|Mn-Moo|)-4 0. We now prove that Mn = E(Moo|.7:*n), a.s. For F £\Fn, and r > n, the martingale property yields (*) E(Mr; F) = E(Mn; F). 1S3
134 Chapter 14: UI Martingales A4-1) But |E(Mr;F) - E{M^F)\ < E(|Mr - Mod; <E(|Mr-Mod) Hence, on letting r-+ooin (*), we obtain We have proved the following result. THEOREM ► Let M be a UI martingale. Then Mqo :=limMn exists a.s. and in C1 Moreover, for every n, Mn = E(M00\Jrn)9 a.s.. The obvious extension to UI supermartingales may be proved similarly. 14.2. Levy's 'Upward' Theorem ► Let £ e C1{Q.,Jr,P)> and define Mn := E(£\Fn)9 a.s. Then M is a UI martingale and almost surely and in C1. Proof. We know that M is a martingale because of the Tower Property. We know from Theorem 13.4 that M is UI. Hence M^ :— limMn exists a.s. and in C1, and it remains only to prove that M^ = 77, a.s., where n ~ Without loss of generality, we may (and do) assume that £ > 0. Now consider the measures Qi and Q2 on (ft,Poo), where , Q2(F) = E(MX; F), Fe If F € Tny then since E(r/|^*n) = E(£|.Fn) by the Tower Property, E(ij; F) = E(Mn;F) = E(Moo; F),
..A4.3) Chapter 14: UI Martingales 185 the second equality having been proved in Section 14.1. Thus Qi and Q2 agree on the 7r-system (algebra!) (JTn, and hence they agree on Too. Both 77 and M^ are Fqq measurable; more strictly, Moo may be taken to be Too measurable by defining Moo •= limsupMn for every u. Thus, F:={lj:t1> Moo} e and since Qj(F) = Q2(ir), Hence P(r/ > Moo) = 0, and similarly P(Moo > rj) = 0. D 14.3. Martingale proof of Kolmogorov's 0-1 law Recall the result. THEOREM Let Xi,X2i*.. be a sequence of independent RVs. Define Tn := a(Xn-|_i,JVn+2? • • •)> 7~ := I |7^. n Then if F £T, P(F) = 0 or 1. Proof. Define Tn := a{X\,X2,... ,-^n)- Let F 6 T, and let r/ := 1^. Since r/ E b^oo, Levy's Upward Theorem shows that 77 = E(tj\Too) = lim E^l^), a.s. However, for each n, 7; is 7^ measurable, and hence (see Remark below) is independent of Tn. Hence by (9.7,k), ,) = Efo) = P(F), a.s. Hence 77 = P(i^), a.s.; and since 77 only takes the values 0 and 1, the result follows. □ Remark. Of course, we have cheated to some extent in building parts of the earlier proof into the martingale statements used in the proof just given.
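Lévy's Upward Theorem (14.2) can also be watched numerically in a case where E(ξ | F_n) is available in closed form (an illustrative choice, not the text's): with IID fair bits ε_1, ε_2, ... ∈ {0,1} and ξ := Σ_k 2^{-k} ε_k, we have M_n = Σ_{k≤n} 2^{-k} ε_k + 2^{-(n+1)}, so that |M_n − ξ| ≤ 2^{-n}, exhibiting both the a.s. and the L¹ convergence. The sketch truncates ξ at 20 binary digits.

```python
import numpy as np

rng = np.random.default_rng(5)

n_paths, n_bits = 5, 20
eps = rng.integers(0, 2, size=(n_paths, n_bits))       # IID fair bits epsilon_k
weights = 0.5 ** np.arange(1, n_bits + 1)

xi = eps @ weights                                      # xi, truncated at 20 binary digits
for n in (1, 5, 10, 20):
    # E(xi | F_n) = (partial sum over k <= n) + E(sum_{k>n} 2^{-k} eps_k) = partial sum + 2^{-(n+1)}
    M_n = eps[:, :n] @ weights[:n] + 0.5 ** (n + 1)
    print(f"n = {n:>2}:  max |M_n - xi| over the paths = {np.max(np.abs(M_n - xi)):.7f}")
```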
1S6 Chapter l^: UI Martingales A44)" 14.4. Levy's 'Downward' Theorem Suppose that (ft,/*, P) w a probability triple, and that {G-n : n 6 N} w a collection of sub-a-algebras of T such that G-oo := f|£-fc C • • • C fiL(fl+1) C G-n C • • • C £?_!. 7 6 £1(ft,/*, P) anJ define M-n := _oo := limA/_n exw<5 a.^. anJ m and (*) -A/-00 = EG|£_oo), a.s. Proof. The Upcrossing Lemma applied to the martingale (Mk,Qk:-N<k<-l) can be used exactly as in the proof of Doob's Forward Convergence Theorem to show that HmM_n exists a.s. The uniform-integrability result, Theorem 13.4, shows that limM_n exists in C1. That (*) holds (if you like with M-oo := limsupM_n G m£/_oo) follows by now-familiar reasoning: for G € G-oo Q G-n and now let r | oo. D 14.5. Martingale proof of the Strong Law Recall the result (but add C1 convergence as bonus). THEOREM Let Xi,X2,... be IID RVs, with E{\Xk\) < oo,VA\ Let fi be the common value of E(Xn). Write Then n~l Sn —> fi, a.s. and in Cl.
..A4.6) Chapter 14: UI Martingales 137 Proof. Define G-n •= <r(Sn,Sn+i , 5n+25 •••)> G-oo •= j ]£?-n- n We know from Section 9.11 that A'i \G-n) = n5n, a.s. Hence L := lim n^Sn exists a.s. and in C1. For definiteness, define L limsupn~15n for every u>. Then for each &, Xk+i 4- • • • 4- Xk+n L = hm sup n so that L 6 m7i where 7i = <7(Xjt+i,-Yjt+2,.. .)• ^Y Kolmogorov's 0-1 law, = c) = 1 for some c in R. But c = E(£) = YimEin^Sn) = fi. D Exercise. Explain how we could have deduced C1 convergence at A2.10). Hint. Recall SchefFe's Lemma 5.10, and think about how to use it. Remarks. See Meyer A966) for important extensions and applications of the results given so far in this chapter. These extensions include: the Hewitt-Savage 0-1 law, de Finetti's theorem on exchangeable random variables, the Choquet-Deny theorem on bounded harmonic functions for random walks on groups. 14.6. Doob's Submartingale Inequality THEOREM ►(a) Let Z be a non-negative submartingale. Then, for c > 0, cP (sup Zk > c ) < E [Zn] sup Zk > c ) < E(Zn). \k<n J \ k<n J Proof. Let F := {supfc<n Zk > c}. Then F is a disjoint union F = Fq U F\ U ... U Fn,
138 Chapter 14'- UI Martingales A4-6).. where Fo :={Z0 > Fk := {z0 < c} n {z1 < c} n... n {zk-x < c} n {zk > c} Now, Fk € Pk, and Z > c on i**. Hence, Summing over k now yields the result. □ The main reason for the usefulness of the above theorem is the follow- following. LEMMA ►(b) If M is a martingale, c is a convex function, and E|c(Mn)| < oo, Vn, then c(M) is a submartingale. Proof. Apply the conditional form of Jensen's inequality in Table 9.7. □ Kolmogorov's inequality ► Let (Xn : n E N) be a sequence of independent zero-mean RVs in C2. Define o~\ := Var(Ajt). Write n Sn i= Ai -|- • • • -f" A\i, vn := Var(on) = y k=\ Then, for c>0, c2P(sup|5it|>c] < Vn \k<n ) Proof. We know that if we set Tn = o{X\, X2,... ,-^n), then 5 = (Sn) is a martingale. Now apply the Submartingale Inequality to S2. . □ Note. Kolmogorov's inequality was the key step in the original proofs of Kolmogorov's Three-Series Theorem and Strong Law. 14.7. Law of the Iterated Logarithm: special case Let us see how the Submartingale Inequality may be used via so-called exponential bounds to prove a very special case of Kolmogorov's Law of the
Chapter 14: UI Martingales 139 Iterated Logarithm which is described in Section A4.1. (You would do well to take a quick look at this proof even though it is not needed later.) THEOREM Let (Xn : n 6 N) be IID RVs each with the standard normal N@,1) distribution of mean 0 and variance 1. Define Sn := X\ -j- X2 ■+■••• + Xn. Then, almost surely, c limsup r- = 1. BnloglognJ Proof. Throughout the proof, we shall write h(n) := BnloglognM, n > 3. (It will be understood that integers occurring in the proof are greater than e, when this is necessary.) Step 1: An exponential bound. Define Tn := cr(Xi,-X*2,... ,-^n)« Then 5 is a martingale relative to {fn}- It is well known that for 0 £ R, n G N, EeeSn = e^2n < 00. The function x h-> e6x is convex on R, so that eeSn is a submartingale and, by the Submartingale Inequality, we have, for 0 > 0, P (suPt<n Sk > c) = P (suPjt<n e»s* > e«c) < e-««E (e°s» ) . This is a type of exponential bound much used in modern probability theory. In our special case, we have P(supS*>c) \k<n /
Chapter 14: UI Martingales and for c > 0, choosing the best #, namely c/n, we obtain (a) P [supSk>c )< \k<n Step 2: Obtaining an upper bound. Let K be a real number with K > 1 (We are interested in cases when K is close to 1.) Choose cn := Kh(Kn~l). Then P ( sup Sk > cn) < exp(-c2j2Kn) = (n - l)~K(log K \k<Kn / ~K The First Borel-Cantelli Lemma therefore shows that, almost surely, for all large n (all n > no(iv)) we have for A'" < k < A"n, Sk < sup Sk<cn = Kh(Kn~1) < Kh(k). k<Kn Hence, for K > 1, limsup/i(A:)~15jt < Ky a.s. k By taking a sequence of AT-values converging down to 1, we obtain limsup/i(A:)~15^ < 1, a.s. k Step 3: Obtaining a lower bound. Let iV be an integer with N > 1. (We are interested in cases when Ar is very large.) Let £ be a number in @,1). (Of course, e will be small in the cases which interest us.) Write S(r) for 5r, etc., when typographically more convenient. For n € N, define the event Fn := {S{Nn+1) - S(Nn) > A - e)h(Nn+1 - Nn)}. Then (see Proposition 14.8(b) below), P(Fn) = 1 - $(y) > Bw)-l'(y + jT1)-1 exP(-y2/2), where y = A -£){21oglog(iVn+1 - K Thus, ignoring 'logarithmic terms', P(Fn) is roughly (n logN)~^~e^2 so that ^n) = oo. However, the events Fn (n £ N) are clearly independent, so
..(H.8) Chapter 14: UI Martingales that (BC2) shows that, almost surely, infinitely many Fn occur. Thus, for infinitely many n, 5GV«+i) > ^ _ e)fc(#n+1 _ ivn) + S(Nn). But, by Step 2, SGVn) > -2h(Nn) for all large n, so that for infinitely many n, we have - Nn) - It now follows that n (You should check that 'the logarithmic terms do disappear'.) The rest is obvious. D V 14.8. A standard estimate on the normal distribution We used part of the following result in the previous section. Proposition Suppose that X has the standard normal distribution, so that, for x e R, P(X>x) = l- $(*) = / <p(y)dy Jx where (a) 00 Then, for x > o, P(X P(X Proof. Let x > 0. Since <p'(y) = —y<p(y), /CO /» y<p(y)dy >x J X yielding (a).
H2 Chapter 1J,: VI Martingales A4.8).. Since (srV(y))' = -A + s/" A + y-2)ip(y)dy < A + *) / J x yielding (b). □ 14.9. Remarks on exponential bounds; large-deviation theory Obtaining exponential bounds is related to the very powerful theory of large deviations - see Varadhan A984), Deuschel and Stroock A989) - which has an ever-growing number of fields of application. See Ellis A985). You can study exponential bounds in the very specific context of mar- martingales in Neveu A975), Chow and Teicher A978), Garsia A973), etc. Much of the literature is concerned with obtaining exponential bounds which are in a sense best possible. However, Elementary' results such as the Azuma-HoefFding inequality in Exercise E14.1 are very useful in numerous applications. See for example the applications to combinatorics in Bollobas A987). 14.10. A consequence of Holder's inequality Look at the statement of Doob's Cp inequality in the next section in order to see where we are going. LEMMA Suppose that X and Y are non-negative RVs such that cP(X >c)< E(Y; X > c) for every c> 0. Then, for p > 1 and p~1 -f- q~l = 1, we have Proof We obviously have L := / pcp~lP{X > c)dc < / pc*-2E(Y;X > c)dc =: R. Jc=O Jc=O Using Fubini's Theorem with non-negative integrands, we obtain
..A4.11) Chapter 14: UI Martingales 143 [ / / pcp'ldc P{<Lj) = E(Xp) Jn \Jc=o J J Exactly similarly, we find that We apply Holder's inequality to conclude that (**) E(X') < > Suppose that \\Y\\P < oo, and suppose for now that \\X\\P < oo also. Then since (p — l)q = />, we have \\X'~% = so (**) implies that \\X\\P < q\\Y\\p. For general X, note that the hypothesis remains true for X An. Hence ||X A n\\p < ^||y||P for all n, and the result follows using (MON). □ 14.11. Doob's &> inequality THEOREM ►(a) Letp > 1 and define q so thatp~l +q~l = 1. Let Z be a non-negative submartingale bounded in Cp, and define (this is standard notation) Z* :— sup Zk> k£Z+ Then Z* € £p, and indeed The submartingale Z is therefore dominated by the element Z* of Cp. As n —► oo, Zoo •= limZn exists a.s. and in Cp and
Chapter 14: UI Martingales (i4.ll)- (b) If Z is of the form \M\, where M is a martingale bounded in Cp, then := limMn exists a.s. and in Cp, and of course Zoo = |M| a.s. Proof. For n G Z+, define Z* := supJt<nZ*. From Doob's Submartingale Inequality 14.6(a) and Lemma 14.10 we see that Property (*) now follows from the Monotone-Convergence Theorem. Since (-Z) is a supermartingale bounded in £p, and therefore in £*, we know that Zoo := HmZn exists a.s. However, \zn - z\p < BZ*y e £p, so that (DOM) shows that Zn —> Z in Cp. Jensen's inequality shows that is non-decreasing in r, and all the rest is straightforward. □ 14.12. Kakutani's theorem 011 'product* martingales Let Xi,X2,... be independent non-negative RVs, each of mean 1 Define Mo := 1, and, for nGN, let Mn \ = X\X2 ...An. Then M is a non-negative martingale, so that Moo :=limAfn exists a.s. The following five statements are equivalent: (i) E(Afoo) = 1, (ii) Mn -* Moo in Cl; (iii) M is UI; (iv) Y[an > 0 where 0 < an := E(Arj) < 1, If one (then every one) of the above five statements fails to hold, then P(Moo = 0) = 1. Remark. Something of the significance of this theorem is explained in Section 14.17.
..(HAS) Chapter 14: UI Martingales 1J5 Proof. That an < 1 follows from Jensen's inequality. That an > 0 is obvious. First, suppose that statement (iv) holds. Then define y\ vi j — • • • • — y- Then N is a martingale for the same reason that M is. See A0.4,b). We have l = l/(aia2 ... anJ < l/(]~[ *kJ < 00, so that N is bounded in £ . By Doob's £2 inequality, < E Tsupl^l2^ < 4supE|AT2|) < 00, so that M is dominated by M* :— supn \Mn\ G C1. Hence M is UI and properties (i)-(iii) hold. Now consider the case when Y\an = 0. Define iV as at (*). Since N is a non-negative martingale, limiVn exists a.s. But since J*]an = 0, we are forced to conclude that M^ = 0, a.s. The equivalence of (iv) and (v) is known to us from D.3). The theorem is proved. D 14.13. The Radon-Nikodym theorem Martingale theory yields an intuitive — and ''constructive1 — proof of the Radon-Nikodym theorem. We are guided by Meyer A966). We begin with a special case. THEOREM (I) Suppose that (tt,^7, P) is a probability triple in which T is separable in that = a{Fn : n G N) for some sequence (Fn) of subsets of Q. Suppose that Q is a finite measure on (Q^J7) which is absolutely continuous relative to P in that (a) for FeJ7, P(F) = 0 => Q(F) = 0.
Chapter 14: UIMartingales (I4.I8).. Then there exists X in £1(fi,^r, P) such that Q = XP (see Section 5.14) in that Q(F) = / XdP = E(X; F), VF € T. JF The variable X is called a version of the Radon-Nikodym derivative o/Q relative to P on (il^J7). Two such versions agree a.s. We write ,o = -A on j , a.s. Remark. Most of the cr-algebras we have encountered are separable. (The a-algebra of Lebesgue-measurable subsets of [0,1] is not.) Proof. With the method of Section 13.1 (a) in mind, you can prove that property (a) implies that (b) given e > 0, there exists S > 0 such that, for FG P(F) < 6 =» Q(F) < e. Next, define Then for each n, Tn consists of the 2r(n) possible unions of *atoms' of Tnt an atom A of ^-*n being an element of Tn such that 0 is the only proper subset of A which is again an element of Tn. (Each atom A will have the form x nH2n...r\Hni where each Hi is either Fi or Ff.) Define a function Xn : Q, —► [0,00) as follows: if u E Anfk, then ° if (Z(An,k)/P(An,k) if P(An,k)>0. Then Xn € &(Sl,Fn, P) and (c)
..A4.18) Chapter 14: UI Martingales 147 The variable Xn is the obvious version of dQ/dP on A2,^"n). It is obvious from (c) that X = (Xn : n € Z+) is a martingale relative to the filtration (Tn : n 6 Z+), and since this martingale is non-negative, n, exists, a.s. Let e > 0, choose S as at (a), and let K G @, oo) be such that K'lqnp) < 6. Then P(Xn > K) < K'lE(Xn) = K'lQ(il) < 6, so that E(Xn;Xn > K) = q,(Xn > K) < e. The martingale X is therefore UI, so that It now follows from (c) that the measures F*->E(X;F) and F agree on the 7r-system IJ^n, so that they agree on T. All that remains is the proof of uniqueness, which is now standard for us. □ Remark. The familiarity of all of the arguments in the above proof em- emphasizes the close link between the Radon-Nikodym and conditional expec- expectation which is made explicit in Section 14.14. Now for the next part of the theorem ... (II) The assumption that T is separable can be dropped from Part I. Once one has Part II, one can easily extend the result to the case when P and Q are cr-finite measures by partitioning Q, into sets on which both are finite. Proving Part II of the theorem is a piece of 'abstract nonsense' based on the fact that £} (or, more strictly, L1) is a metric space, and in particular on the role of sequential convergence in metric spaces. You might well want to take Part II for granted and skip the remainder of this section. Let Sep be the class of all separable sub-cr-algebras of T. Part I shows that for Q € Sep, there exists Xg in £1(fi,5, P) such that dQ,/dP = XG ; equivalent^, E(X0; G) = Q(<7),
Chapter H: UI Martingales (HAS).. We are going to prove that there exists X in £J(fi, J7, P) such that (d) XG -> X in C1 in the sense that given e > 0, there exists K in Sep such that if K C £ G Sep, then ||A^ - J^||i < £. First, we note that it is enough to prove that (e) (Xq : Q G Sep) is Cauchy in £} in the sense that given e > 0, there exists fC in Sep such that if K C £7t- G Sep for 2 = 1,2, then \\Xgx — Xg7 \\\ < e. Proof that (e) implies (d). Suppose that (e) holds. Choose Kn G Sep such that if K,n C Qi G Sep for i — 1,2, then Let *H(p) = cr(/Ci,/C2,... ,A^n). Then (see the proof of F.10,a)) the limit X := HmX?f(n) exists a.s. and in £x, and indeed, Set X := limsupX7^(n) for definiteness. For any Q G Sep with Q 3 7in we have Result (d) follows. □ Proof of (e). If (e) is false, then (why?!) we can find £0 > 0 and a sequence /C@) C )C{\) C ... of elements of Sep such that \\X)C(n) - -X/C(n+l)||l > £0, Vfl. However, it is easily seen that (Xjc(n)) is a UI martingale relative to the fil- filtration (£(n)), so that X/c(n) converges in C1. The contradiction establishes that (e) is true. □ Proof of Part II of the theorem. We need only show that for X as at (d) and for F G f, we have ; F) =
..A4-15) Chapter H: UI Martingales 149 Choose /C such that for K C Q € Sep, \XQ — J\T||i < £. Then a(K,F) G Sep, where cr(/C, F) is the smallest cr-algebra extending K, and including F; and, by a familiar argument, ; F) - The result follows. □ 14.14. The Radon-Nikodym theorem and conditional expectation Suppose that (ft,.F, P) is a probability triple, and that Q is a sub-a-algebra of T. Let X be a non-negative element of £}(Sl,T, P). Then defines a finite measure Q on A2,£). Moreover, Q is clearly absolutely continuous relative to P on Q, so that, by the Radon-Nikodym theorem, (a version...) Y := dQ/dP exists on Now Y is ^-measurable, and E(F; G) = Q(G) = E(X; G), Ge Hence Y is a version of the conditional expectation of X given Q\ Y = E(X\Q)9 a.s. Remark. The right context for appreciating the close inter-relations be- between martingale convergence, conditional expectation, the Radon-Nikodym theorem, etc., is the geometry of Banach spaces. 14.15. Likelihood ratio, equivalent measures Let P and Q be probability measures on (fi,^7) such that Q is absolutely continuous relative to P, so that a version X of dQ/dP on T exists. We say that Y is (a version of) the likelihood ratio of Q given P. Then P is absolutely continuous relative to Q if and only if P(X > 0) = 1, and then X~1 is a version of dP/dQ. When each of P and Q is absolutely continuous relative to the other, then P and Q are said to be equivalent. Note that it then makes sense to define x/dPdQ, := / X^dP = / Jf Jf / / / f Jf Jf and we can hope for a fuller understanding of what Kakutani achieved
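A small worked example (not in the text) of the quantity ∫√(dP dQ) just defined: if under P and Q a single coordinate has the normal densities f = N(μ,1) and g = N(ν,1), then

$$
\int_{\mathbf R}\sqrt{f(x)\,g(x)}\,dx
=\frac{1}{\sqrt{2\pi}}\int_{\mathbf R}
\exp\!\Bigl(-\tfrac14\bigl[(x-\mu)^2+(x-\nu)^2\bigr]\Bigr)\,dx
=\exp\!\Bigl(-\tfrac18(\mu-\nu)^2\Bigr),
$$

since $(x-\mu)^2+(x-\nu)^2=2\bigl(x-\tfrac12(\mu+\nu)\bigr)^2+\tfrac12(\mu-\nu)^2$. Hence, for product measures built coordinatewise from N(μ_n,1) and N(ν_n,1) as in Section 14.17 below, ∫√(dP dQ) = Π_n exp(−(μ_n − ν_n)²/8), which is strictly positive precisely when Σ(μ_n − ν_n)² < ∞, the condition under which Kakutani's criterion makes P and Q equivalent.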
150 Chapter 1 j: UI Martingales A4-16).. 14.16. Likelihood ratio and conditional expectation Let (fi,^7, P) be a probability triple, and let Q be a probability measure on (fi,^7) which is absolutely continuous relative to P with density X. Let Q be a sub-cr-algebra of F. What (/-measurable function (modulo versions) yields dQ/dP on Q1 Yes, of course, it is Y = E(X|£7), for, yet again, with E denoting P-expectation, E(Y; G) = E(X; G) = Q(G) for G € 0. Hence, if {Fn} is a filtration of (ft,.?7), then the likelihood ratios (*) (dq/dP on rn) = E{X\Tn) form a UI martingale. (This is of course why the martingale proof of the Radon-Nikodym theorem was bound to succeed!) Here and in the next two sections, we are dropping the la.s.' qualifications on such statements as (*): we have outgrown them. 14.17. Kakutani's Theorem revisited; consistency of LR test Let fi = RN, Xn(uj) = ujni and define the a-algebras = a(Xk :iteN), Tn = cr(Xk :l<k<n). Suppose that for each n, fn and gn are everywhere positive probability density functions on R and let rn(x) := gn(x)/fn(x)- Let P [respectively, Q] be the unique measure on (fi, T) which makes the variables Xn independent, Xn having probability density function fn [respectively, gn]. Clearly, but you should prove this, Mn := dQ/dP = Y^i ... Yn on Ttf where Yn = rn(Xn). Note that the variables (Yn : n E N) are independent under P and that each has P-mean 1. For any of a multitude of familiar reasons, M is a martingale. Now if Q is absolutely continuous relative to P on T with dQ/dP — on T, then Mn = £(^1^), and M is UI. Conversely, if M is UI, then exists (a.s., P) and [ , Vn. But then the probability measures and F*-+E(Moo;F) agree on the 7r-system (J^n and so agree on J^*, so that Mqq = dQ/dP on T. Thus Q is absolutely continuous relative to P on T if and only if M is UI.
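Here is a simulation (with an illustrative pair of densities, not prescribed by the text) of the product martingale M just introduced: f is the N(0,1) density, g is the N(θ,1) density, and the data are generated under P, that is from f. Each M_n has P-mean 1; yet, as this section goes on to show via Kakutani's Theorem, M_n → 0 a.s. under P whenever g ≠ f, and the output shows exactly that collapse.

```python
import numpy as np

rng = np.random.default_rng(6)

theta = 0.5                                   # g = N(theta,1) density; f = N(0,1) density
n_paths, n_steps = 6, 2000
X = rng.standard_normal((n_paths, n_steps))   # data generated under P, i.e. from f

# Y_k = g(X_k)/f(X_k) = exp(theta*X_k - theta^2/2), so
# log M_n = sum_{k<=n} (theta*X_k - theta^2/2).
log_M = np.cumsum(theta * X - 0.5 * theta ** 2, axis=1)

for n in (10, 100, 500, 2000):
    print(f"n = {n:>4}:  M_n on six paths =", np.round(np.exp(log_M[:, n - 1]), 6))
# Every E(M_n) equals 1, yet each individual path is collapsing towards 0.
```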
..A4.18) Chapter 14: UI Martingales 151 Kakutani's Theorem therefore implies that Q is equivalent to P on if and only if n n(*)9n(x)dx > 0, equivalently if (*) V / { V7»(*) ~ V9n(x)\ dx < and that then P is also absolutely continuous relative to Q. Suppose now that the Xn are identically distributed independent vari- variables under each of P and Q. Thus, for some probability density functions / and g on R, we have fn = / and gn = g for all n. It is clear from (*) that Q is equivalent to P if and only if / = g almost everywhere with respect to Lebesgue measure, in which case Q = P. Moreover, Kakutani's Theorem also tells us that if Q ^ P, then Mn —► 0 (a.s., P) and this is exactly the consistency of the Likelihood-Ratio Test in Statistics. 14.18. Note on Hardy spaces, etc. (prestissimo!) We have seen in this chapter that for many purposes, the class of UI mar- martingales is a natural one. The appendix to this chapter, on the Optional- Sampling Theorem, provides further evidence of this. However, what we might wish to be true is not always true for UI martingales. For example, if M is a UI martingale and C is a (uniformly) bounded previsible process, then the martingale C*M need not be bounded in £}. (Even so, C • M does converge a.s.!) For many parts of the more advanced theory, one uses the 'Hardy' space Hq of martingales M null at 0 for which one (then each) of the following equivalent conditions holds: (a) M* :=sup|Mn| e (b) [M\i G £\ where [M)n := and [A/]oo =| limMn. By a special case of a celebrated Burkholder-Davis-Gundy theorem, there exist absolute constants cp,Cp A < p < oo) such that i < cr\\[M\i\\p (i < P < « The space TiJ is obviously sandwiched between the union of the spaces of martingales bounded in Cp (p > 1) and the space of UI martingales. Its
152 Chapter 14: UI Martingales A4.18).. identification as the right intermediate space has proved very important. Its name derives from its important links with complex analysis. Proof of the B-D-G inequality or of the equivalence of (a) and (b) is too difficult to give here. But we can take a very quick look at the relevance of (b) to the C • M problem. First, (b) makes it clear that (d) if M E %l and C is a bounded previsible process, then C • M E %q, and we shall now see that, in a sense, this is 'best possible'. Suppose that we have a martingale M null at 0 and a (bounded) previs- previsible process e = (e* : k € N) where the Sk are IID RVs with P(£* = dzl) = J, and where e and M are independent. We want to show that (e) M G H\ if (as well as only if) e • M is bounded in C1. We run into no difficulties of 'regularity' if we condition on M: E\(e . M)\n = EE{|(£ . M)n\ \o{M)} > Z And where did the last inequality appear from? Let (a* : k E N) be a sequence of real numbers. Think of a* as M* — Mk-\ when M is known. Define n Xk := akek, Wn := Xt + ... + Xn, vn = Then (see Section 7.2) = E so that, certainly, E(W^) < 3v^. On combining this fact with Holder's inequality in the form we obtain the special case of Khinchine's inequality we need: >3-*4- □ For more on the topics in this section, see Chow and Teicher A978), Dellacherie and Meyer A980), Doob A981), Durrett A984). The first of these is accessible to the reader of this book; the others are more advanced.
Chapter 15 Applications 15.0. Introduction — please read! The purpose of this chapter is to give some indication of some of the ways in which the theory which we have developed can be applied to real-world problems. We consider only very simple examples, but at a lively pace! In Sections 15.1-15.2, we discuss a trivial case of a celebrated result from mathematical economics, the Black-Scholes option-pricing for- formula. The formula was developed for a continuous-parameter (diffusion) model for stock prices; see, for example, Karatzas and Schreve A988). We present an obvious discretization which also has many treatments in the literature. What needs to be emphasized is that in the discrete case, the result has nothing to do with probability, which is why the answer is com- completely independent of the underlying probability measure. The use of the 'martingale measure' P in Section 15.2 is nothing other than a device for expressing some simple algebra/combinatorics. But in the diffusion set- setting, where the algebra and combinatorics are no longer meaningful, the martingale-representation theorem and Cameron-Martin-Girsanov change- of-measure theorem provide the essential language. I think that this justifies my giving a 'martingale' treatment of something which needs only junior- school algebra. Sections 15.3-15.5 indicate the further development of the martingale formulation of optimality in stochastic control, at which Exercise E10.2 gave a first look. We consider just one 'fun' example, the 'Mabinogion sheep problem?\ but it is an example which illustrates rather well several techniques which may be effectively utilized in other contexts. In Sections 15.6-15.9, we look at some simple problems of filtering: estimating in real-time processes of which only noisy observations can be made. This topic has important applications in engineering (look at the IEEE journals!), in medicine, and in economics. I hope that you will look 153
154 Chapter 15: Applications A5.0).. further into this topic and into the important subject which develops when filtering is combined with stochastic-control theory. See, for example, Davis and Vintner A985) and Whittle A990). Sections 15.10-15.12 consist of first reflections on the problems we en- encounter when we try to extend the martingale concept. 15.1. A trivial martingale-representation result Let S denote the two-point set { — 1,1}, let S denote the set of all subsets of 5, let p G @,1), and let fi be the probability measure on E, S) with Let N £ N. Define (ft, T, P) = E, 2, \i)N so that a typical element of ft is u> — (o;i,u;2,...,aw), ^k 6 {-1,1}- Define €k - ft —► R by £*(u>) := a;*, so that (£i, £21 • • • > £w) SLre HD RVs each with law fi. For 0 < n < Ar, define n Note that E(ek) = l.p + (-1)A ~p) = 2p- 1. We see that (a) Z = (Zn : 0 < n < TV) « fl martingale {relative to ({^*n : 0 < ^ < iV"}, P)). LEMMA If M — (Mn : 0 < n < N) is a martingale (relative to ({^"n : 0 < n < N},P)), then there exists a unique previsible process H such that Z, that is, Mn = Mo -f ELi Remark. Since ^0 = {0,^}, Mo is constant on ft and has to be the common value of the E(Mn). Proof. We simply construct H explicitly. Because Mn is Tn measurable,
..A5.2) Chapter 15: Applications 155 for some function /„ : {—1, l}n —»■ R. Since Mis a martingale, we have - p)fn{uju ... ,a;n_i, -1) Hence the expressions (hi) 2A -P) and (b2) are equal; and if we define Hn(oj) to be their common value, then H is clearly previsible, and simple algebra verifies that M = Mo -f H • Z, as required. You check that # is unique. □ 15.2. Option pricing; discrete Black-Scholes formula Consider an economy in which there are two 'securities': bonds of fixed interest rate r, and stock, the value of which fluctuates randomly. Let iV be a fixed element of N. We suppose that values of units of stock and of bond units change abruptly at times 1,2,..., N. For n = 0,1,..., N, we write Bn — A -f r)nBo for the value of 1 bond unit throughout the open time interval (h, n -j-1), Sn for the value of 1 unit of stock throughout the open time interval n,n -f 1). You start just after time 0 with a fortune of value x made up of Aq units of stock and Vo of bond, so that ^o£o + V0B0 = x. Between times 0 and 1 you invest this in stocks and bonds, so that just before time 1, you have A\ units of stock and Vi of bond so that So, (j4i,Vi) represents the portfolio you have as your 'stake on the first game'.
156 Chapter 15: Applications A5.2).. Just after time n — 1 (where n > 1) you have An-i units of stock and Vn-i units of bond with value By trading stock for bonds or conversely, you rearrange your portfolio be- between times n — 1 and n so that just before time n, your fortune (still of value Xn-i because we assume transaction costs to be zero) is described by n_! = AnSn-x -f VnBn-, (n > 1). Your fortune just after time n is given by (a) Xn = AnSn + VnBn (n>0) and your change in fortune satisfies (b) Xn - Xn-i = An(Sn - 5«.,) + Vn(Bn - Now, and where i?n is the random 'rate of interest of stock at time n\ We may now rewrite (b) as Xn — Xn-i = so that if we set then (d) Fn - yB_! = A + r)-<—«AIl5B_i(ii1, - r). Note that (c) shows Yn to be the discounted value of your fortune at time n, so that the evolution (d) is of primary interest. Let fi,.F,£n(l < n < N), Zn(Q <n<N) and fn(Q < n < N) be as in Section 15.1. Note that no probability measure has been introduced. We build a model in which each Rn takes only values a, b in (—1, oo), where a < r < 6,
..A5.2) Chapter 15: Applications 157 by setting , . _ a + b b — a 00 &n = ~y- + —£n. But then (f) Rn-r = l(b- a)(en - 2p + 1) = J(* - «)(^n - Zn-i), where we now choose Note that (d) and (f) together display Y as a 'stochastic integral' relative to Z. A European option is a contract made just after time 0 which will allow you to buy 1 unit of stock just after time N at a price K\ K is the so-called striking price. If you have made such a contract, then just after time iV, you will exercise the option if Sn > K and will not if Sn < K- Thus, the value at time iV of such a contract is (Sn — K)+. What should you pay for the option at time 0? Black and Scholes provide an answer to this question which is based on the concept of a hedging strategy. A hedging strategy with initial value x for the described option is a portfolio management scheme {(-An> Vn) : 1 < n < N] where the processes A and V are previsible relative to {^V»}i an<3 where, with X satisfying (a) and (b), we have for every a;, (hi) X0(uj) = x, (h2) Xn(u>) > 0 @ < n < N), (h3) Xn(uj)=(Sn(u)-K)+. Anyone employing a hedging strategy will by appropriate portfolio man- management, and without going bankrupt, exactly duplicate the value of the option at time N. Note. Though Black and Scholes insist that Xn(u) > 0, Vn, Vu;, they (and we) do not insist that the processes A and V be positive. A negative value of V for some n amounts to borrowing at the fixed interest rate r. A negative value of A corresponds to ''short-selling' stock, but after you have read the theorem, this may not worry you!
158 Chapter 15: Applications A5.2).. THEOREM A hedging strategy with initial value x exists if and only if := E [A + r)~N(SN - where E is the expectation for the measure P of Section 15.1 with p as at (g). There is a unique hedging strategy with initial value xq, and it involves no short-selling: A is never negative. On the basis of this result, it is claimed that xo is the unique fair price at time 0 of the option. Proof. In the definition of hedging strategy, there is nowhere any mention of an underlying probability measure. Because however of the 'for every uj1 requirement, we should consider only measures on tt for which each point uj has positive mass. Of course, P is such a measure. Suppose now that a hedging strategy with initial value x exists, and let A, Vy X, Y denote the associated processes. From (d) and (f), where F is the previsible process with Fn = ( Of course, F is hounded because there are only finitely many (n,Lo) combi- combinations. Thus Y is a martingale under the P measure, since Z is; and since Yq — x and Yn = A + r)~N(Sn — A")+ by (c) and the definition of hedging strategy, we obtain x = (We did not use the property that X > 0.) Now consider things afresh and define Yn := E (A + r)~N(SN ~ Then Y is a martingale, and by combining (f) with the martingale-represent- martingale-representation result in Section 15.1, we see that for some unique previsible process A, (d) holds. Define Xn := A + r)nFn, Vn := (Xn - AnSn)/Bn Then (a) and (b) hold. Since Xq = x and XN = (SN ~
..A5.8) Chapter 15: Applications 159 the only thing which remains is to prove that A is never negative. Because of the explicit formula A5.1,bl), this reduces to showing that E [(SN-K)+\Sn-USn = A +6)S«-i] > E [(SN - K)+\Sn-uSn = A + a)S»-i] ; and this is intuitively obvious and may be proved by a simple computation on binomial coefficients. 15.3. The Mabinogion sheep problem In the Tale of Peredur ap Efrawg in the very early Welsh folk tales, The Mabinogion (see Jones and Jones A949)), there is a magical flock of sheep, some black, some white. We sacrifice poetry for precision in specifying its behaviour. At each of times 1,2,3,... a sheep (chosen randomly from the entire flock, independently of previous events) bleats; if this bleating sheep is white, one black sheep (if any remain) instantly becomes white; if the bleating sheep is black, then one white sheep (if any remain) instantly becomes black. No births or deaths occur. The controlled system Suppose now that the system can be controlled in that just after time 0 and just after every magical transition time 1,2,..., any number of white sheep may be removed from the system. (White sheep may be removed on numerous occasions.) The object is to maximize the expected final number of black sheep. Consider the following example of a policy: Policy A: at each time of decision, do nothing if there are more black sheep than white sheep or if no black sheep remain; otherwise, immediately reduce the white population to one less than the black population. The value function V for Policy A is the function F:Z+ xZ+ -*[0,oo), where for w,b 6 Z+, V(w, b) denotes the expected final number of black sheep if one adopts Policy A and if there are w white and b black sheep at time 0. Then V is uniquely specified by the fact that, for w,b € Z+, (al)Y@,6) = 6;
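The price x₀ in the theorem is a finite sum and is easy to compute. The sketch below (with made-up parameter values) evaluates x₀ = E[(1+r)^{-N}(S_N − K)^+] under the measure P of Section 15.1, with p = (r − a)/(b − a), the choice made at (g), which makes E(R_n) = r and hence the discounted stock price a P-martingale; here S_N = S₀ Π(1 + R_k) with R_k ∈ {a, b}.

```python
from math import comb

def discrete_bs_price(S0, K, r, a, b, N):
    """x0 = E[(1+r)^{-N} (S_N - K)^+] under the 'martingale measure':
    each period the return is b with probability p and a with probability 1 - p,
    where p = (r - a)/(b - a) is the value that makes E(R_n) = r."""
    p = (r - a) / (b - a)
    assert 0.0 < p < 1.0, "need a < r < b"
    price = 0.0
    for j in range(N + 1):                    # j periods of return b, N - j of return a
        S_N = S0 * (1.0 + b) ** j * (1.0 + a) ** (N - j)
        prob = comb(N, j) * p ** j * (1.0 - p) ** (N - j)
        price += prob * max(S_N - K, 0.0)
    return price / (1.0 + r) ** N

# illustrative parameters only
print(discrete_bs_price(S0=100.0, K=100.0, r=0.01, a=-0.05, b=0.07, N=10))
```

Note that, in line with the remark in Section 15.0, no 'real-world' probability for an up-move appears anywhere: the answer depends only on a, b, r, S₀, K and N.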
160 Chapter 15: Applications A5.8).. (a2) V(w, b) = V(w — 1, b) whenever w > b and w > 0; (a3) V(wtb) = ^jV(u; + 1,6-1) + ^rbV(w -1,6 + 1) whenever w < 6, 6 > 0 and w > 0. It is almost tautological that if Wn and Bn denote the numbers of white and black sheep at time n, then, if we adopt Policy A, then, whatever the initial values of Wq and Bo, (b) V(Wn,Bn) is a martingale relative to the natural filtration of {(Wn,Bn) : n > 0}. (c) LEMMA The following statements are true for w, b £ Z": (cl) V(w,b) > V(w — 1,6) whenever w > 0, (c2) V(w,b) > ^-bV(w + 1,6- 1) + ^hV(w -1,6 + 1) whenever w > 0 and b > 0. Let us suppose that this Lemma is true. (It is proved in the next Section.) Then, for any policy whatsoever, (d) V(Wn,Bn) is a supermartingale. The fact that V(Wn,Bn) converges means that the system must a.s. end up in an absorbing state in which sheep are of one colour. But then ^(T^oo, #oo) is just the final number of black sheep (by definition of V). Since F(iyn, Bn) is a non-negative supermartingale, we have for deterministic Hence, whatever the initial position, the expected final number of black sheep under any policy is no more than the expected final number of black sheep if Policy A is used. Thus Policy A is optimal. In Section 15.5, we prove the following result: V(k, k) - Bk + f - \flrk) -> 0 as h -4 oo. Thus if we start with 10000 black sheep and 10000 white sheep, we finish up with (about) 19824 black sheep on average (over many 'runs'). Of course, the above argument worked because we had correctly guessed the optimal policy to begin with. In this subject area, one often has to make good guesses. Then one usually has to work rather hard to clinch results
..A54) Chapter 15: Applications 161 which correspond in more general situations to Lemma (c) and statement (d). You might find it quite an amusing exercise to prove these results for our special problem now, before reading on. For a problem in economics which utilizes analogous ideas, see Davis and Norman A990). 15.4. Proof of Lemma 15.3(c) It will be convenient to define (a) vk:=V(k,k). Everything hinges on the following results: for 1 < c < k, (bl) V(k - c, k + c) = vk + Bk - _ _ /Ol* "I \ V I 1 (b2) l-c,k = vk + Bk Vk) J J — K which simply reflect A5.3,a3) together with the 'boundary conditions': (c) V(k + 1, k) = vk, V@,2k + 1) = 2k + 1. Now, from A5.3,a2), and hence, from (b2) with c = 1, we find that 1 — 2pk i -f-p/t where pk is the chance of obtaining k heads and k tails in 2k tosses of a fair coin: (e) Pk = 2 -2k Result (d) is the key to proving things by induction.
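Before the inductive proofs, a Monte Carlo sanity check (an illustration, not part of the text; a modest flock size is used so that plain Python runs quickly): simulate Policy A from the state (w, b) = (k, k) many times and compare the average final number of black sheep with the value 2k + 1/2 − √(πk) quoted in Section 15.3 and proved in Section 15.5.

```python
import numpy as np

rng = np.random.default_rng(7)

def policy_A_run(w, b):
    """Run the controlled flock under Policy A until one colour is extinct;
    return the final number of black sheep."""
    while w > 0 and b > 0:
        if w >= b:                          # decision: cut the white flock down to b - 1
            w = b - 1
            if w == 0:
                break
        if rng.random() < w / (w + b):      # a white sheep bleats: one black turns white
            w, b = w + 1, b - 1
        else:                               # a black sheep bleats: one white turns black
            w, b = w - 1, b + 1
    return b

k, runs = 100, 500
average = np.mean([policy_A_run(k, k) for _ in range(runs)])
print("simulated mean final number of black sheep:", average)
print("2k + 1/2 - sqrt(pi k)                     :", 2 * k + 0.5 - np.sqrt(np.pi * k))
```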
162 Chapter 15: Applications A5.4)-- Proof of result A5.3,cl). From A5.3,a2), result A5.3,cl) is automatically true when w > b. Hence, we need only establish the result when w < b. Now if w < b and w -f b is odd, then (w,b) = (& + 1 -c,A? + c) for some c with 1 < c < k. But formulae (b) show that it is enough to show that for 1 < a < k, and since 2k \ fBk\ f 2k - 1 \ / Bk - I -a-l) I \k) ~ [k + a-l / { k we need only establish the case when a = 1: > Bk - vkJVV ( which reduces to (f) vk>2k-p^\ But property (f) follows by induction from (d) using only the fact that Pk is decreasing in k. Proof for the case when b + w is even may be achieved similarly. Proof of result A5.3,c2). Because of A5.3,a3), the result A5.3,c2) is au- automatically true when w < 6, so we need only establish it when w > b. In analogy with the reduction of the 'general a' case to the 'boundary case a = 1' in the proof of A5.3,cl), it is easily shown that it is sufficient to prove A5.3,c2) for the case when (to, b) = (k + 1, k -f 1) for some k. Formulae (b) reduce this problem to showing that + Bk + l)Pk}vk < 2kBk
..A5.5) Chapter 15: Applications 163 and this follows by induction from (d) using only the fact that Bk -f l)pk is increasing in k. 15.5. Proof of result A5.3,d) Define ak := vk - 2k - (pk) Then, from A5.4,d), ajt+1 = A - pk)(*k -h where 2pk Pk - pk := 7-c := 1 Stirling's formula shows that Ck —* 0 as k —> 00, so that given e > 0, we can find N so that <e for k > N. Induction shows that for k > iV, But, since J]/9* = 00, we have J~[(l — pk) = 0, and it is now clear that limsup|afc+i| < ^? so that a^ —► 0. Because of the accurate version of Stirling's formula: n! = B™)* (-) " c*/<12n>, 0 < ^ == 0(n) we have -(»*)* {1 + 0 (I)}, so that Vfc - ^2/: + j - a/ttIJ -> 0, as required. D We now take a quick look at filtering. The central idea combines Bayes' formula with a recursive property which is now illustrated by two examples.
Chapter 15: Applications A5.6).. 15.6. Recursive nature of conditional probabilities Example. Suppose that A, B, C and D are events (elements of T) each with a strictly positive probability. Let us write (for example) ABC for A D B n C. Let us also introduce the notation CA(B) := P(B\A) = P(AB)/P(A) for conditional probabilities. The 'recursive property' in which we are in- interested is exemplified by CABc(D) = CAB(D\C) := 'if we want to find the conditional probability of D given that A, B and C have occurred, we can assume that both A and B have occurred and find the Cab probability of D given C". D Example. Suppose the X, Y, Z and T are RVs such that (X, Yy Z, T) has a strictly positive joint pdf fx,Y,z,T on R4- for B G BA, Then, of course, (X, Y, Z) has joint pdf fx,Y,Z on R3, where = / JR The formula defines a ('regular') conditional pdf of T given X, y, Z: for B G #, we have, with all dependence on u indicated, P(T 6 5|X, F, Z)(w) = EGB(T)|X, Y, Similarly, f
..A5.7) Chapter 15: Applications 165 The recurrence property is exemplified by 15.7. Bayes' formula for bivariate normal distributions With a now-clear notation, we have for RVs X, Y with strictly positive joint fx,Y on R2, Thus (**) the 'constant of proportionality' depending on y but being determined by the fact that / fx\y(x,y)dx = 1. JR The meaning of the following Lemma is clarified within its proof. LEMMA ►(a) Suppose that /i, a, b E R, <Aa< 17, W E @, oo) an(f that X and Y are RVs such that Then where the number V G @,oo) and the RV X are determined as follows: JL-! !L *.-£. V " U * W V ~ ^+ Proof. The absolute distribution of X is N(/i, U), so that
166 Chapter 15: Applications A5.7). The conditional pdf of Y given X is the density of N(a + bX, W), so that -* exp {- Hence, from (**), where 1/V = 1/U + P/W and x/V = fi/U+ b(y-a)/V. The result follows. D COROLLARY (b) 1^j</i tf/ie notation of the Lemma, we have \\X-X\\22=E{(X-XJ}=:V. Proof. Since Cy(X) = N(X, F), we even have E{(X-XJ\Y} = Vy a.s. D 15.8. Noisy observation of a single random variable Let X, r}\, 7/2,. • • be independent RVs, with Let (cn) be a sequence of positive real numbers, and let Yk = X -f Cjt^, ^n = (j(Yi, F2,..., Yn). We regard each Yk as a noisy observation of X. We know that Mn := E(JY|^n) -» Moo := E^l^o) a.s. and in £2. One interesting question mentioned at A0.4,c) is: when is it true that Moo = X a.s.?, or again, is X a.s. equal to an Too -measurable RV?
..A5.8) Chapter 15: Applications 167 Let us write Cn to signify 'regular conditional law' given Y\, Y2,..., Yn. We have Suppose that it is true that where Xn-\ is a linear function of Yi, 1^»,... ,Yn-\ an(i Vn-i a constant in @,00). Then, since Yn — X + cnrjn, we have From the Lemma 15.7(a) on bivariate normals with /i = Xn-UU = K.^a = 0,6 = 1, we have where un vn Yn But the recursive property indicated in Section 15.6 shows that Cn(X)=Cn-1(X\Yn). We have now proved by induction that Cn(X) = N(Xn,Fn), Vn. Now, of course, Mn = Xn and E{(X-MnJ} = Vn. However, n Our martingale M is £2 bounded and so converges in C2. We now see that = X a.s. if and only if ]T} c^2 — 00 .
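The dichotomy just obtained is easy to see numerically. The sketch below (which takes the prior X ~ N(0,1) and η_k ~ N(0,1), an illustrative completion of the set-up of 15.8) iterates the exact recursion 1/V_n = 1/V_{n−1} + 1/c_n² and X̂_n/V_n = X̂_{n−1}/V_{n−1} + Y_n/c_n², once with c_n ≡ 1 (so Σ c_n^{-2} = ∞ and M_n → X) and once with c_n = n (so Σ c_n^{-2} < ∞ and V_n converges to a strictly positive limit).

```python
import numpy as np

rng = np.random.default_rng(8)

def run_filter(c, X, eta, V0=1.0, xhat0=0.0):
    """Exact recursive update of Section 15.8, assuming the prior N(xhat0, V0) for X;
    the observations are Y_n = X + c_n * eta_n with eta_n ~ N(0,1)."""
    xhat, V = xhat0, V0
    for c_n, eta_n in zip(c, eta):
        Y_n = X + c_n * eta_n
        V_new = 1.0 / (1.0 / V + 1.0 / c_n ** 2)      # 1/V_n = 1/V_{n-1} + 1/c_n^2
        xhat = V_new * (xhat / V + Y_n / c_n ** 2)    # Xhat_n/V_n = Xhat_{n-1}/V_{n-1} + Y_n/c_n^2
        V = V_new
    return xhat, V

X = rng.standard_normal()                  # the unknown quantity being observed
eta = rng.standard_normal(10 ** 4)
idx = np.arange(1, 10 ** 4 + 1)

for label, c in (("c_n = 1  (sum c_n^-2 infinite)", np.ones(10 ** 4)),
                 ("c_n = n  (sum c_n^-2 finite)  ", idx.astype(float))):
    M_n, V_n = run_filter(c, X, eta)
    print(f"{label}:  X = {X:+.4f},  M_n = {M_n:+.4f},  V_n = {V_n:.4f}")
```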
168 Chapter 15: Applications A5.9).. 15.9. The Kalman-Bucy filter The method of calculation used in the preceding three sections allows im- immediate derivation of the famous Kalman-Bucy filter. Let A, H', C, K and # be real constants. Suppose that Xq, Yq,£i162, • •., 771,7/2, • • • are independent RVs with C(ek) = C(i)k) = N@,l), C(X0) = N(m,a2),Fo = 0. The true state of a system at time n is supposed given by Xn, where (dynamics) Xn — Xn-i = AXn-i + Hen + g. However the process X cannot be observed directly: we can only observe the process F, where (observation) Just as in Section 15.8, we make the induction hypothesis that where Cn-\ signifies regular conditional law given Y\, Yi,..., Yn-i- Since Xn = aln_i + g + Hen, where we have Cn-i(Xn) = N(oX..i +g,a2Vn-1 Also, since Yn = Yn-i + CXn + Krjn, we have Cn-i(Y«\Xn) = N(Yn-! +CXn,K2). Apply the bivariate-normal Lemma 15.7(a) to find that Cn(Xn) = en-!(Xn\Yn) = N(Xn, Vn), where (KB1) — = - + — Vn aWn-.+W* K*
..A5.10) Chapter 15: Applications 109 Equation (KB1) shows that Vn — f(Vn-\). Examination of the rectan- rectangular hyperbola which is the graph of / shows that Vn —► Vqo, the positive root of V = f(V). If one wishes to give a rigorous treatment of the K-B filter in continuous time, one is forced to use martingale and stochastic-integral techniques. See, for example, Rogers and Williams A987) and references to filtering and control mentioned there. For more on the discrete-time situation, which is very important in practice, and for how filtering does link with stochastic control, see Davis and Vintner A985) or Whittle A990). 15.10. Harnesses entangled The martingale concept is well adapted to processes evolving in time be- because (discrete) time naturally belongs to the ordered space Z"*". The ques- question arises: does the martingale property transfer in some natural way to processes parametrized by (say) Zd? Let me first explain a difficulty described in Williams A973) that arises with models in Z (d = 1) and in Z2, though we do not study the latter here. Suppose that (Xn : n £ Z) is a process such that each Xn £ C1 and that ('almost surely' qualifications will be dropped) (a) E(Xn\Xm:m^n)=\(Xn-1 + Xn+1), Vn. For 77i G Z, define Qm — &{Xk : h < ra), 7im — cr(Xr : r > m). The Tower Property shows that for a, b in Z+ with a < 6, we have for a < r <b, E(xr\ga,nb) = E(xr\xM: r ± s\ga,Hh) so that r h-> E(Xr\ga,7ib) is the linear interpolation = T Xa 4- t b — a b — a Hence, for n £ Z and u £ N, we have, a.s., £(xn\gn.u,nn+1) = -^ +
170 Chapter 15: Applications A5.10).. Now, the a-algebras <r(Gn-u,'Hn+i) decrease as u f oo. Because of Warning 4.12, we had better not claim that they decrease to o(G-oo, Anyway, by the Downward Theorem, we see that the random variable L := limui-oo(Xu/u) exists (a.s.) and n| I |0r(£7n_u, u Hence, by the Tower Property, (b) E(Xn|-L,7Yn+i) = Xn+\ — L whence we have a re versed-martingale property: E(X» - nL\L,nn+1) = Xn+1 - (n + \)L. A further application of the Downward Theorem shows that (c) A := limnfoo(-X"n — nL) exists (a.s.) Hence L = \im(Xu/u). ufoo By using the arguments which led to (b) in the reversed-time sense, we now obtain (d) E(Xn+1\L,Qn) = From (b) and (d) and the Tower Property, Exercise E9.2 shows that Xn_|-i = Xn -f- L. Hence (almost surely) Xn = nL -f A, Vn € Z, so that (almost) all sample paths of X are straight linesl Hammersley A966) suggested that any analogue of (a) should be called a harness property and that the type of result just obtained conveys the idea that every low-dimensional harness is a straitjacket!
..A5.12) Chapter 15: Applications 171 15.11. Harnesses unravelled, 1 The reason that A5.10,a) rules out interesting models is that it is expressed in terms of the idea that each Xn should be a random variable. What one should say is that Xn : ft -+ R but then require only that differences (Xr-Xs:r,seZ) be RVs (that is, be /"-measurable), and that for n,k G Z with k ^ n, E(Xn - Xk\Xm - X* : m ^ n) = J(*»-i - Xk) + I call this a difference harness in Williams A973). Easy exercise. Suppose that (Yn : n £ Z) are IID RVs in C1. Let -Yo be any function on ft. Define n — S v i y^w y if n > 0 if n < 0. Thus, Xn — Xn-i = yn, Vn. Prove that X is a difference harness. 15.12. Harnesses unravelled, 2 In dimension d > 3, we do not need to use the 'difference-process' unravelling described in the preceding section. For d > 3, there is a non-trivial model, related both to Gibbs states in statistical mechanics and to quantum fields, such that each Xn(n 6 Zrf) w a RV and, for n G Zrf, : m E Zd \ {n}) = where U is the set of the 2d unit vectors in Zd. See Williams A973). In addition to a fascinating etymological treatise on the terms 'martin- 'martingale' and 'harness', Hammersley A966) contains many important ideas on harnesses, anticipating later work on stochastic partial differential equations. Many interesting unsolved problems on various kinds of harness remain.
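For the Easy exercise in Section 15.11, the probabilistic heart of the matter is the identity E(Y_n | Y_n + Y_{n+1}) = ½(Y_n + Y_{n+1}), valid for IID integrable increments by exchangeability; conditioning on the remaining differences then gives the harness property. The sketch below (illustrative only) checks the identity by Monte Carlo, binning on s = Y_n + Y_{n+1}, for a deliberately skewed increment distribution.

```python
import numpy as np

rng = np.random.default_rng(9)

# Y_n, Y_{n+1} IID with a skewed (exponential, mean-one) distribution
Y1 = rng.exponential(1.0, size=10 ** 6)
Y2 = rng.exponential(1.0, size=10 ** 6)
S = Y1 + Y2

# estimate E(Y_n | S in a small bin) and compare with S/2
for lo in (0.5, 1.5, 2.5, 3.5):
    mask = (S >= lo) & (S < lo + 0.1)
    print(f"S in [{lo:.1f}, {lo + 0.1:.1f}):  mean of Y_n = {Y1[mask].mean():.3f},"
          f"  mean of S/2 = {(S[mask] / 2).mean():.3f}")
```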
PART C: CHARACTERISTIC FUNCTIONS

Chapter 16

Basic Properties of CFs

Part C is merely the briefest account of the first stages of characteristic function theory. This theory is something very different in spirit from the work in Part B. Part B was about the sample paths of processes. Characteristic function theory is on the one hand part of Fourier-integral theory, and it is proper that it finds its way into that marvellous recent book, Körner (1988); see also the magical Dym and McKean (1972). On the other hand, characteristic functions do have an essential role in both probability and statistics, and I must include these few pages on them. For full treatment, see Chow and Teicher (1978) or Lukacs (1970).

Exercises indicate the analogous Laplace-transform method, and develop in full the method of moments for distributions on [0,1].

16.1. Definition

The characteristic function (CF) φ = φ_X of a random variable X is defined to be the map φ : R → C (important: the domain is R, not C) defined by

φ(θ) := E(e^{iθX}) = E cos(θX) + i E sin(θX).

Let F := F_X be the distribution function of X, and let μ := μ_X denote the law of X. Then

φ(θ) = ∫_R e^{iθx} dF(x) = ∫_R e^{iθx} μ(dx),

so that φ is the Fourier transform of μ, or the Fourier-Stieltjes transform of F. (We do not use the factor (2π)^{-1/2} which is sometimes used in Fourier theory.) We often write φ_F or φ_μ for φ.
16.2. Elementary properties

Let φ = φ_X for a RV X. Then

►(a) φ(0) = 1 (obviously);
►(b) |φ(θ)| ≤ 1, ∀θ;
►(c) θ ↦ φ(θ) is continuous on R;
(d) φ_{-X}(θ) is the complex conjugate of φ_X(θ);
(e) φ_{aX+b}(θ) = e^{iθb} φ_X(aθ) for a, b ∈ R.

You can easily prove these properties. (Use (BDD) to establish (c).)

Note on differentiability (or lack of it). Standard analysis (see Theorem A16.1) implies that if n ∈ N and E(|X|^n) < ∞, then we may formally differentiate φ(θ) = E e^{iθX} n times to obtain

    φ^{(n)}(θ) = E[(iX)^n e^{iθX}].

In particular, φ^{(n)}(0) = i^n E(X^n). However, it is possible for φ'(0) to exist when E(|X|) = ∞. We shall see shortly that φ can be the 'tent-function', so that φ need not be differentiable everywhere, and φ can be 0 outside a bounded interval.

16.3. Some uses of characteristic functions

Amongst uses of CFs are the following:

• to prove the Central Limit Theorem (CLT) and analogues,
• to calculate distributions of limit RVs,
• to prove the 'only if' part of the Three-Series Theorem,
• to obtain estimates on tail probabilities via saddle-point approximation,
• to prove such results as: if X and Y are independent, and X + Y has a normal distribution, then both X and Y have normal distributions.

Only the first three of these uses are discussed in this book.
16.4. Three key results

(a) If X and Y are independent RVs, then

    φ_{X+Y}(θ) = φ_X(θ) φ_Y(θ), ∀θ.

Proof. This is just 'independence means multiply' again:

    E e^{iθ(X+Y)} = E(e^{iθX} e^{iθY}) = E(e^{iθX}) E(e^{iθY}).   □

(b) F may be reconstructed from φ. See Section 16.6 for a precise statement.

(c) 'Weak' convergence of distribution functions corresponds exactly to convergence of the corresponding CFs. See Section 18.1 for a precise statement.

The way in which these results are used in the proof of the Central Limit Theorem is as follows. Suppose that X_1, X_2, ... are IID RVs each with mean 0 and variance 1. From (a) and (16.2,e), we see that if S_n := X_1 + ··· + X_n, then

    φ_{S_n/√n}(θ) = [φ_{X_1}(θ/√n)]^n.

We shall obtain rigorous estimates which show that

    [φ_{X_1}(θ/√n)]^n = [1 − ½θ²/n + o(1/n)]^n → exp(−½θ²), ∀θ.

Since θ ↦ exp(−½θ²) is the CF of the standard normal distribution (as we shall see shortly), it now follows from (b) and (c) that

    the distribution of S_n/√n converges weakly to the distribution function Φ of the standard normal distribution N(0,1).

In this case, this simply means that

    P(S_n/√n ≤ x) → Φ(x), x ∈ R.
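Result (a) can also be checked empirically. Here is a small Python sketch (my own illustration, not from the text; it assumes NumPy) which compares a Monte Carlo estimate of φ_{X+Y} with the product φ_X φ_Y for independent X ~ U[−1,1] and Y ~ Exp(1); the two agree up to simulation error.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 500_000
    X = rng.uniform(-1.0, 1.0, n)          # independent draws of X
    Y = rng.exponential(1.0, n)            # independent draws of Y, independent of X

    def cf(sample, theta):
        # Monte Carlo estimate of E exp(i*theta*(RV))
        return np.mean(np.exp(1j * theta * sample))

    for theta in (0.5, 1.0, 2.0):
        lhs = cf(X + Y, theta)                   # CF of the sum X + Y
        rhs = cf(X, theta) * cf(Y, theta)        # product of the individual CFs
        print(theta, abs(lhs - rhs))             # small: 'independence means multiply'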
16.5. Atoms

In regard to both (16.4,b) and (16.4,c), tidiness of results can be threatened by the presence of atoms.

If P(X = c) > 0, then the law μ of X is said to have an atom at c, and the distribution function F of X has a discontinuity at c:

    F(c) − F(c−) = μ({c}) = P(X = c) > 0.

Now μ can have at most n atoms of mass at least 1/n, so that the number of atoms of μ is at most countable. It therefore follows that, given x ∈ R, there exists a sequence (y_n) of reals with y_n ↓ x such that every y_n is a non-atom of μ (equivalently, a point of continuity of F); and then, by right-continuity of F, F(y_n) ↓ F(x).

16.6. Lévy's Inversion Formula

This theorem puts the fact that F may be reconstructed from φ in very explicit form. (Check that the theorem does imply that if F and G are distribution functions such that φ_F = φ_G on R, then F = G.)

THEOREM

► Let φ be the CF of a RV X which has law μ and distribution function F. Then, for a < b,

(a)    lim_{T↑∞} (1/2π) ∫_{−T}^{T} [(e^{−iθa} − e^{−iθb})/(iθ)] φ(θ) dθ = μ((a,b)) + ½μ({a}) + ½μ({b}).

Moreover, if ∫_R |φ(θ)| dθ < ∞, then X has continuous probability density function f, and

(b)    f(x) = (1/2π) ∫_R e^{−iθx} φ(θ) dθ.

The 'duality' between (b) and the result

(c)    φ(θ) = ∫_R e^{iθx} f(x) dx

can be exploited, as we shall see. The proof of the theorem may be omitted on a first reading.
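Before the proof, here is a numerical illustration of part (b) of the theorem (my own sketch, not part of the original; it assumes NumPy and truncates the θ-integral to a finite range). With φ(θ) = exp(−½θ²), which is integrable, the formula recovers the N(0,1) density.

    import numpy as np

    phi = lambda theta: np.exp(-0.5 * theta ** 2)     # CF of N(0,1); integrable over R

    def density_from_cf(x):
        # numerical version of (b): f(x) = (1/2π) ∫ exp(-iθx) φ(θ) dθ, truncated to [-40, 40]
        theta = np.linspace(-40.0, 40.0, 200_001)
        dtheta = theta[1] - theta[0]
        vals = np.exp(-1j * theta * x) * phi(theta)
        return (vals.sum() * dtheta).real / (2.0 * np.pi)

    for x in (0.0, 1.0, 2.5):
        exact = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)   # N(0,1) density
        print(x, density_from_cf(x), exact)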
176 Chapter 16: Basic Properties of CFs A6.6).. Proof of the theorem. For u,v € R with u < v> \eiv - eiu\ < \v - u\, either from a picture or since / itudt < f |te''f|<ft = / Ju Ju Ju ldt. Let a, 6 £ R, with a < b. Fubini's Theorem allows us to say that, for 0<T< oo, (e) *T —iQa p — i9b -T 10 <p@)d$ = i- f 2nJ_ T --I e I R %0 ei0(x-a) _ ei9(x-b) id x) dB p{dx) provided we show that -T -a) __ ei0{x-b) iO dO < oo. However, inequality (d) shows that Ct < (b — a)T/w, so that (e) is valid. Next, we can exploit the evenness of the cosine function and the oddness of the sine function to obtain >T iO(x-a) _ pi0(x-b) -T iO dS sgn(x - a)S(\x - a\T) - sgn(x - b)S(\x - b\T) where, as usual, sgn(x) := 0 L-l if x > o, if x = 0, if x < 0, and S(U) := / — > 0).
..A6.6) Chapter 16: Basic Properties of CFs 177 Even though the Lebesgue integral f£° x~x s'mxdx does not exist, because / [ j c[x = / [ ] dx = oo, Jo V x ) Jo V x J we have (see Exercise El6.1) lim S(U) = jr . U]oo V } 2 The expression (f) is bounded simultaneously in x and T for our fixed a and 6; and, as T f oo, the expression (f) converges to 0 if x < a or x > 6, § if a: = a or a: = 6, 1 if a < a: < b. The Bounded Convergence Theorem now yields result (a). Suppose now that JR |(/?@)|d0 < oo. We can then let T | oo in result (a) and use (DOM) to obtain (g) F(b) ± J e = Jn provided that F is continuous at a and b. However, (DOM) shows that the right-hand side of (g) is continuous in a and b and (why?!) we can conclude that F has no atoms and that (g) holds for all a and b with a < b. We now have But, by (d), i0(b - a) Hence, the assumption that /R |<£>@)|d0 < oo allows us to use (DOM) to let b —> a in (h) to obtain F'(a) = f(a) := i- / e""V(')d», and, finally, / is continuous by (DOM).
16.7. A table

     Distribution           pdf                                 Support    CF
1.   N(μ, σ²)               (σ√(2π))^{-1} exp(−(x−μ)²/(2σ²))    R          exp(iμθ − ½σ²θ²)
2.   U[0,1]                 1                                   [0,1]      (e^{iθ} − 1)/(iθ)
3.   U[−1,1]                ½                                   [−1,1]     (sin θ)/θ
4.   Double exponential     ½ e^{−|x|}                          R          1/(1 + θ²)
5.   Cauchy                 1/(π(1 + x²))                       R          e^{−|θ|}
6.   Triangular             1 − |x|                             [−1,1]     2(1 − cos θ)/θ²
7.   'Anon'                 (1 − cos x)/(πx²)                   R          (1 − |θ|) I_{[−1,1]}(θ)

The two lines 4 and 5 illustrate the duality between (16.6,b) and (16.6,c), as do the two lines 6 and 7. Hints on verifying the table are given in the exercises on this chapter.
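Two of the entries can also be checked by direct numerical integration. The sketch below (mine, not from the text; it assumes NumPy and uses crude Riemann sums, so agreement is only up to discretization and truncation error) verifies line 6 and the 'dual' line 7.

    import numpy as np

    # Line 6: the triangular density 1 - |x| on [-1, 1] should have CF 2(1 - cos θ)/θ².
    def cf_triangular(theta, m=200_001):
        x = np.linspace(-1.0, 1.0, m)
        dx = x[1] - x[0]
        return ((1.0 - np.abs(x)) * np.exp(1j * theta * x)).sum() * dx

    for theta in (0.5, 1.0, 3.0):
        print(theta, cf_triangular(theta).real, 2.0 * (1.0 - np.cos(theta)) / theta ** 2)

    # Line 7: the density (1 - cos x)/(π x²) should have the 'tent' CF (1 - |θ|) on [-1, 1].
    def cf_anon(theta, L=500.0, m=500_001):
        x = np.linspace(-L, L, m)
        x = x[x != 0.0]                          # drop the origin to avoid 0/0
        dx = 2.0 * L / (m - 1)
        dens = (1.0 - np.cos(x)) / (np.pi * x ** 2)
        return (dens * np.exp(1j * theta * x)).sum() * dx

    for theta in (0.25, 0.5, 1.5):
        print(theta, cf_anon(theta).real, max(1.0 - abs(theta), 0.0))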
Chapter 17
Weak Convergence

In this chapter, we consider the appropriate concept of 'convergence' for probability measures on (R, B). The terminology 'weak convergence' is unfortunate: the concept is closer to 'weak*' than to 'weak' convergence in the senses used by functional analysts. 'Narrow convergence' is the official pure-mathematical term. However, probabilists seem determined to use 'weak convergence' in their sense, and, reluctantly, I follow them here.

We are studying the special case of weak convergence on a Polish (complete, separable, metric) space S when S = R; and we unashamedly use special features of R. For the general theory, see Billingsley (1968) or Parthasarathy (1967) - or, for a superb account of its current scope, Ethier and Kurtz (1986).

Notation. We write Prob(R) for the space of probability measures on R, and C_b(R) for the space of bounded continuous functions on R.

17.1. The 'elegant' definition

Let (μ_n : n ∈ N) be a sequence in Prob(R) and let μ ∈ Prob(R). We say that μ_n converges weakly to μ if (and only if)

(a)    μ_n(h) → μ(h), ∀h ∈ C_b(R),

and then write

(b)    μ_n ⇒ μ.
We know that elements of Prob(R) correspond to distribution functions via the correspondence μ ↔ F, where F(x) = μ(−∞, x]. Weak convergence of distribution functions is defined in the obvious way:

(c)    F_n ⇒ F if and only if μ_n ⇒ μ.

We are generally interested in the case when F_n = F_{X_n}, that is, when F_n is the distribution function of some random variable X_n. Then, by (6.12), we have, for h ∈ C_b(R),

    μ_n(h) = ∫ h(x) dF_n(x) = E h(X_n).

Note that the statement F_n ⇒ F is meaningful even if the X_n's are defined on different probability spaces. However, if X_n (n ∈ N) and X are RVs on the same triple (Ω, F, P), then

(d)    (X_n → X, a.s.) ⇒ (F_{X_n} ⇒ F_X),

and indeed,

(e)    (X_n → X in prob) ⇒ (F_{X_n} ⇒ F_X).

Proof of (d). Suppose that X_n → X, a.s., and that μ_n is the law of X_n and μ is the law of X. Then, for h ∈ C_b(R), we have h(X_n) → h(X), a.s., and, by (BDD),

    μ_n(h) = E h(X_n) → E h(X) = μ(h).   □

Exercise. Prove (e).

17.2. A 'practical' formulation

Example. Atoms are a nuisance. Suppose that X_n ≡ 1/n and X ≡ 0. Let μ_n be the law of X_n, so that μ_n is the unit mass at 1/n, and let μ be the law of X. Then, for h ∈ C_b(R),

    μ_n(h) = h(1/n) → h(0) = μ(h),

so that μ_n ⇒ μ. However, F_n(0) = 0 ↛ F(0) = 1.   □
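The example can be made concrete in a few lines of Python (an illustration of my own, not from the text; it assumes NumPy). Weak convergence holds because μ_n(h) = h(1/n) → h(0) for every bounded continuous h, yet the distribution functions fail to converge at the atom x = 0, while they do converge at every continuity point of F.

    import numpy as np

    h = np.tanh                            # any bounded continuous test function will do
    for n in (1, 10, 100, 1000):
        print(n, h(1.0 / n), "→", h(0.0))  # μ_n(h) = h(1/n) → h(0) = μ(h)

    def F_n(x, n):                         # DF of the unit mass at 1/n
        return 1.0 if x >= 1.0 / n else 0.0

    def F(x):                              # DF of the unit mass at 0
        return 1.0 if x >= 0.0 else 0.0

    print([F_n(0.0, n) for n in (1, 10, 100)], "but F(0) =", F(0.0))     # fails at the atom
    print([F_n(0.5, n) for n in (1, 10, 100)], "and F(0.5) =", F(0.5))   # fine at a continuity point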
..A7.2) Chapter 17: Weak Convergence 181 LEMMA (a) Let (Fn) be a sequence of DFs on R, and let F be a DF on R. Then Fn -^ F if and only if n for every non-atom (that is, every point of continuity) x of F. Proof of 'only if part. Suppose that Fn ^+ F. Let x 6 R, and let S > 0. Define h 6 Cb(R) via h(y) := 1 if y < x, 1 — S~1(y — x) if x < y < x -f < .0 if y > x -h <^. Then fin(h) —+ fi(h). Now, ■Fn(#) < fin(h) and /i(/i) < .F(x -f <^), so that limsupFn(a:) < F(x -f ^). n However, F is right-continuous, so we may let S j 0 to obtain (b) limsupi^z) < F(x\ Vx e R. n In similar fashion, working with y »—> h(y -f S), we find that for x 6 R and n so that (c) liminfFn(ar-) > F(x-), Va: e R. n Inequalities (b) and (c) refine the desired result. □ In the next section, we obtain the 'if part as a consequence of a nice representation.
182 Chapter 17: Weak Convergence A7.3).. 17.3. Skorokhod representation THEOREM Suppose that (Fn : n G N) is a sequence of DFs on R, that F is a DF on R and that Fn(x) —+ F(x) at every point x of continuity of F. Then there exists a probability triple A2,^*, P) carrying a sequence (Xn) of RVs and also a RVX such that and Xn —► X a.s. This is a kind of 'converse' to A7.1,d). Proof. We simply use the construction in Section 3.12. Thus, take define X+{u):= X~{uj) := mi{z : F(z) and define X+, X~ similarly. We know from Section 3.12 that X+ and X~ have DF F and that P(X+ = X~) = 1. Fix uj. Let z be a non-atom of F with z > X~*~(lj). Then F(z) > a;, and hence, for large n, Fn(z) > u, so that X+(lj) < z. So limsupn X+(lj) < z. But (since non-atoms are dense), we can choose z I X~*~(lj). Hence and, by similar arguments, Since X~ < X+ and P(X+ = X~) = 1, the result follows. □ 17.4. Sequential compactness for Prob(R) There is a problem in working with the non-compact space R. Let /in be the unit mass at n. No subsequence of (/in) converges weakly in Prob(R), but fin —y /ioo in Prob(R), where fi^ is the unit mass at oo. Here R is the
..A7.4) Chapter 17: Weak Convergence 183 compact metrizable space [—00,00], the definition of Prob(R) is obvious, and fin m* fico in Prob(R) means that fin(h) -> fioo(h), Vfc e C(R). (We do not need the subscript '&' on C(R) because elements of C(R) are bounded.) It is important to keep remembering that while functions in C(R) tend to limits at -f 00 and —00, functions in C&(R) need not. The space C(R) is separable (it has a countable dense subset) while the space is not. Let me briefly describe how one should think of the next topic. Here I assume some functional analysis, but from the next paragraph (not the next section) on, I resort to elementary bare-hands treatment. By the Riesz rep- representation theorem, the dual space C(R)* of C(R) is the space of bounded signed measures on (R,#(R)). The weak* topology of C(R)* is metrizable (because C(R) is separable), and under this topology the unit ball of C(R)* is compact and contains Prob(R) as a closed subset. The weak* topology of Prob(R) is exactly the probabilists' weak topology, so (a) Prob(R) is a compact metrizable space under our probabilists' weak topology. The bare-hands substitute for result (a) is the following. LEMMA (Helly-Bray) (b) Let (Fn) be any sequence of distribution functions on R. Then there exist a right-continuous non-decreasing function F on R such that 0 < F < 1 and a subsequence (n,) such that (*) limFn.(x) = F(x) at every point of continuity F. Proof. We make an obvious use of 'the diagonal principle'. Take a countable dense set C of R and label it: C = {ci,C2,C3,. . .}. Since (Fn(c\) : n 6 N) is a bounded sequence, it contains a convergent subsequence Fn(ij)(ci) -+ H(ci) (say) as j -» 00 In some subsequence of this subsequence, we shall have FnB,j)(c2) -* H(c2) as j -4 00;
184 Chapter 17: Weak Convergence A7.4)- and so on. If we put n,- = n(i,i), then we shall have: H{c) := limFn<(c) exists for every c in C. Obviously, 0 < H < 1, and H is non-decreasing on C. For i6R, define F(x) := Hm #(c), the 'II' signifying that c decreases strictly to a: through C. (In particular, jP(c) need not equal H(c) for c in C.) Our function F is right-continuous, as you can check. By the 'limsu- pery' of Sections 17.2 and 17.3, you can also check for yourself that (*) holds: I wouldn't dream of depriving you of that pleasure. □ 17.5. Tightness Of course, the function F in the Helly-Bray Lemma 17.4(b) need not be a distribution function. It will be a distribution if and only if lim F(x) = 0, lim F(x) = 1. xjj-oo zTT+oo Definition ► A sequence (Fn) of distribution functions is called tight if, given e > 0, there exists A" > 0 such that pu[-K, K] = F{K) - F{-K-) > 1 - c. You can see the idea: 'tightness stops mass being pushed out to -j-oo or —oo'. LEMMA Suppose that Fn is a sequence of DFs. (a) If Fn Z F for some DF F, then (Fn) is tight (b) If (Fn) is tight, then there exists a subsequence (Fni) and a DF F such that Fni -^ F. This is a really easy limsupery exercise.
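The definition of tightness is easy to see in action. The sketch below (mine, not part of the text; it assumes NumPy) contrasts the escaping family of Section 17.4, the unit masses at n, with the tight family of laws of S_n/√n for ±1 coin tosses, for which Chebyshev's inequality keeps all but 1/K² of the mass inside [−K, K] uniformly in n.

    import numpy as np

    K = 10.0

    # Escaping family: μ_n = unit mass at n; the mass of [-K, K] drops to 0, so (μ_n) is not tight.
    print([1.0 if n <= K else 0.0 for n in (1, 5, 20, 100)])

    # Tight family: laws of S_n/sqrt(n), where S_n is a sum of n independent ±1 coin tosses.
    rng = np.random.default_rng(2)
    for n in (1, 10, 100, 1000):
        S = (2.0 * rng.binomial(n, 0.5, size=10_000) - n) / np.sqrt(n)   # samples of S_n/sqrt(n)
        print(n, np.mean(np.abs(S) <= K))   # empirical mass of [-K, K]; stays near 1 for every n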
Chapter 18 The Central Limit Theorem The Central Limit Theorem (CLT) is one of the great results of mathemat- mathematics. Here we derive it as a corollary of Levy's Convergence Theorem which says that weak convergence of DFs corresponds exactly to pointwise convergence of CFs. 18.1. Levy's Convergence Theorem ► Let (Fn) be a sequence of DFs, and let <pn denote the CF of Fn. Suppose that g@) := \innpn@) exists for all 0 G R, and that g(-) is continuous at 0. Then g — (pp for some distribution function F, and Proof Assume for the moment that (a) the sequence (Fn) is tight Then, by the Helly-Bray result 17.5(b), we can find a subsequence (Fnk) and a DF F such that But then, for 9 £ R, we have ¥>nk(9) -* <Pf@) {Vnk : CF of Fnk) (take h(x) - ei9x). Thus g - <pF. 185
186 Chapter 18: The Central Limit Theorem A8.1).. Now we argue by contradiction. Suppose that (Fn) does not converge weakly to F. Then, for some point x of continuity of F, we can find a subsequence which we shall denote by (Fn) and an rj > 0 such that (*) \Fn(x) - F{x)\ > 77, Vn. But (Fn) is tight, so that we can find a subsequence Fnj and a DF F such that F ^F But then (pni —* <^>, so that (p — (pp = g = <pp- Since a CF determines the corresponding DF uniquely, we see that F = i^, so that, in particular, a: is a non-atom of i71 and (**) PBi The contradiction between (*) and (**) clinches the result. □ We must now prove (a). Proof of tightness of (Fn). Let e > 0 be given. Since the expression is real, it follows that g@) -\- g{—9) is real (and obviously bounded above by 2). Since g is continuous at 0 and equal to 1 at 0, we can choose 8 > 0 such that |1 — #(#)| < \s when \6\ < 8. We now have f6 1 / {2-g@)-g(-0)}M<\e. Jo Since g = lim</?n, the Bounded Convergence Theorem for the finite interval [0,6] shows that there exists no in N such that for n > no, f6 1 / {2 - tpn($) - <pn(-0)}d0 < e. Jo
However,

    (1/δ) ∫_0^δ {2 − φ_n(θ) − φ_n(−θ)} dθ = ∫_R [ (1/δ) ∫_{−δ}^{δ} (1 − e^{iθx}) dθ ] dF_n(x),

the interchange of order of integration being justified by the fact that, since |1 − e^{iθx}| ≤ 2, 'the integral of the absolute value' is clearly finite. We now have, for n ≥ n_0,

    ε > ∫_R [ 2 − 2 sin(δx)/(δx) ] dF_n(x) ≥ 2 ∫_{|x| ≥ 2/δ} [ 1 − 1/|δx| ] dF_n(x) ≥ μ_n{x : |x| > 2δ^{−1}},

and it is now evident that the sequence (F_n) is tight.   □

If you now re-read Section 16.4, you will realize that the next task is to obtain 'Taylor' estimates on characteristic functions.

18.2. o and O notation

Recall that

    f(t) = O(g(t)) as t → L

means that

    limsup_{t→L} |f(t)/g(t)| < ∞,

and that

    f(t) = o(g(t)) as t → L

means that

    f(t)/g(t) → 0 as t → L.
18.3. Some important estimates

For n = 0, 1, 2, ... and x real, define the 'remainder'

    R_n(x) := e^{ix} − Σ_{k=0}^{n} (ix)^k / k! .

Then

    R_0(x) = e^{ix} − 1 = ∫_0^x i e^{iy} dy,

and from these two expressions we see that |R_0(x)| ≤ min(2, |x|). Since

    R_n(x) = i ∫_0^x R_{n−1}(y) dy,

we obtain by induction:

    |R_n(x)| ≤ min( 2|x|^n / n!, |x|^{n+1} / (n+1)! ).

Suppose now that X is a zero-mean RV in L²:

    E(X) = 0,   σ² := Var(X) < ∞.

Then, with φ denoting φ_X, we have

►(a)    |φ(θ) − (1 − ½σ²θ²)| = |E R_2(θX)| ≤ E[ min( 2|θX|²/2!, |θX|³/3! ) ] = θ² E[ min( |X|², |θ||X|³/6 ) ].

The final term within E(·) is dominated by the integrable RV |X|² and tends to 0 as θ → 0. Hence, by (DOM), we have

(b)    φ(θ) = 1 − ½σ²θ² + o(θ²) as θ → 0.

Next, for |z| ≤ ½, and with principal values for logarithms,

    log(1 + z) − z = ∫_0^z [ 1/(1 + w) − 1 ] dw = − ∫_0^z w/(1 + w) dw = −z² ∫_0^1 t/(1 + tz) dt,

and since |1 + tz| ≥ ½ we have

(c)    |log(1 + z) − z| ≤ |z|².
18.4. The Central Limit Theorem

Let (X_n) be an IID sequence, each X_n distributed as X, where

    E(X) = 0,   σ² := Var(X) < ∞.

Define S_n := X_1 + ··· + X_n, and set

    G_n := S_n / (σ√n).

Then, for x ∈ R, we have, as n → ∞,

    P(G_n ≤ x) → Φ(x) := (1/√(2π)) ∫_{−∞}^{x} exp(−½y²) dy.

Proof. Fix θ in R. Then, using (18.3,b),

    φ_{G_n}(θ) = [ φ_X(θ/(σ√n)) ]^n = [ 1 − ½θ²/n + o(1/n) ]^n,

the 'o' now referring to the situation when n → ∞. But now, using (18.3,c), we have, as n → ∞,

    log φ_{G_n}(θ) = n log( 1 − ½θ²/n + o(1/n) ) = −½θ² + o(1).

Hence φ_{G_n}(θ) → exp(−½θ²), and since θ ↦ exp(−½θ²) is the CF of the standard normal distribution, the result follows from Theorem 18.1.   □

18.5. Example

Let us look at a simple example which shows how the method may be adapted to deal with a sequence of independent but non-IID RVs.

With the Record Problem E4.3 in mind, suppose that on some (Ω, F, P), E_1, E_2, ... are independent events with P(E_n) = 1/n. Define
    N_n := I_{E_1} + I_{E_2} + ··· + I_{E_n},

the 'number of records by time n' in the record context. Then

    E(N_n) = Σ_{k≤n} 1/k = log n + γ + o(1),   (γ is Euler's constant)

    Var(N_n) = Σ_{k≤n} (1/k)(1 − 1/k) = log n + O(1).

Let

    G_n := (N_n − log n) / √(log n),

so that E(G_n) → 0, Var(G_n) → 1. Then, for fixed θ in R,

    φ_{G_n}(θ) = E exp( iθ(N_n − log n)/√(log n) ) = e^{−iθ√(log n)} Π_{k=1}^{n} { 1 − 1/k + (1/k) e^{iθ/√(log n)} }.

We see that, as n → ∞, and with t := θ/√(log n),

    log φ_{G_n}(θ) = −iθ√(log n) + Σ_{k=1}^{n} log{ 1 + (e^{it} − 1)/k }
                  = −iθ√(log n) + (e^{it} − 1) log n + o(1)
                  = (e^{it} − 1 − it) log n + o(1)
                  = [ −½t² + O(|t|³) ] log n + o(1) = −½θ² + o(1).

Hence φ_{G_n}(θ) → exp(−½θ²), so that

    P(G_n ≤ x) → Φ(x), x ∈ R.   □

See Hall and Heyde (1980) for some very general limit theorems.

18.6. CF proof of Lemma 12.4

Lemma 12.4 gave us the 'only if' part of the Three-Series Theorem 12.5. Its statement was as follows.
..A8.6) Chapter 18: The Central Limit Theorem 191 LEMMA Suppose that (Xn) is a sequence of independent random variables bounded by a constant K in [0, oo): \Xn(u)\<Ky Vn,Vu;. Then (]TXn converges, a.s.) => (X^E(^n) converges and ]T Var(Xn) < oo). The proof given in Section 12.4 was rather sophisticated. Proof using characteristic functions. First, note that, as a consequence of estimate A8.3,a), if Z is a RV such that for some constant \Z\ < Ki, E(Z) = 0, a2 := Var(Z) < oo, then for \6\ < K^1, we have Now take Zn := Xn - E(Zn). Then E(ZB) = 0, and \Zn\ < 2K. If 2 Var(Xn) = oo, then, for 0 < B7ST)-1, we shall have, for 0 < BK)~\ However, if ^X* converges a.s. to 5, then, by (DOM), J] <pXk@) = Ee and <ps(9) is continuous in 0 with <ps(Q) = 1- We have a contradiction. Hence ]T Var(Xn) = ^ Var(Zn) < oo, and, since E(Zn) = 0, Theorem 12.2(a) shows that > Zn converges a.s. Hence converges a.s., and since it is a deterministic sum, it converges! This last part of the argument was used in Section 12.4.
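To see Theorem 18.4 and Example 18.5 working, here is a simulation sketch (my own illustration, not part of the text; it assumes NumPy). It compares the empirical distribution of G_n with Φ, first for sums of IID uniforms, then for the record counts of 18.5; in the record case the agreement is only rough, because the effective sample size there is log n.

    import numpy as np
    from math import erf, sqrt

    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))        # standard normal DF

    rng = np.random.default_rng(3)

    # Theorem 18.4: X_i ~ U[-1,1], so E X = 0 and σ² = 1/3; G_n = S_n/(σ√n).
    n, reps = 400, 10_000
    X = rng.uniform(-1.0, 1.0, size=(reps, n))
    G = X.sum(axis=1) / np.sqrt(n / 3.0)
    for x in (-1.0, 0.0, 1.5):
        print(x, np.mean(G <= x), Phi(x))                   # empirical P(G_n ≤ x) vs Φ(x)

    # Example 18.5: N_n = number of records among n IID continuous RVs.
    def record_count(n):
        Y = rng.random(n)
        return int(np.sum(Y >= np.maximum.accumulate(Y)))   # Y_k is a record iff it equals the running max

    n, reps = 2_000, 5_000
    N = np.array([record_count(n) for _ in range(reps)])
    G = (N - np.log(n)) / np.sqrt(np.log(n))
    for x in (-1.0, 0.0, 1.5):
        print(x, np.mean(G <= x), Phi(x))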
APPENDICES Chapter Al Appendix to Chapter 1 Al.l. A non-measurable subset A of S1. In the spirit of Banach and Tarski, although, of course, this relatively trivial example pre-dates theirs, we use the Axiom of Choice to show that (a) S1 = (J Aq where the Aq are disjoint sets, each of which may be obtained from any of the others by rotation. If the set A = Aq has a 'length' then it is intuitively clear that result (a) would force 2?r = oo x length (A), an impossibility. To construct the family (Ag : q G Q), proceed as follows. Regard S1 as {el9 : 0 G R} inside C. Define an equivalence relation *v on S1 by writing z ~ w if there exist a and f3 in R such that z = eta, w = elV>, a-/?GQ. Use the Axiom of Choice to produce a set A which has precisely one repre- representative of each equivalence class. Define Aq = eiqA = {eiqz : z £ A}. Then the family (Aq : q 6 Q) has the desired properties. (Obviously, Q could be replaced by Z throughout the above argument.) We do not bring this example to its fully rigorous conclusion. The remainder of this appendix is fully rigorous. 192
..(Al.S) Chapter Al: Appendix to Chapter 1 198 We now set out to prove Uniqueness Lemma 1.6. Al.2. d-systems. Let 5 be a set, and let T> be a collection of subsets of S. Then T> is called a d-system (on S) if (a) S €V, (b) if A,B G V and A C B then B\A G X>, (c) if An £ V and An | A, then iGD. Recall that An 1 A means: An C An+i(Vn) and U^4n = A (d) Proposition. A collection S 0/ subsets of S is a a-algebra if and only if Yt is both a tr-system and a d-systcm. Proof. The 'only if part is trivial, so we prove only the 'if part. Suppose that S is both a 7r-system and a c?-system, and that E, F and En{n GN)GS. Then Ec := S\E <E E, and E U F = S\(EC fl Fc) e E. Hence Gn := £?i U.. .UEn G S and, since Gn T U^, we see that (J Ek G S. Finally, fl((J3l)e€E. n  Definition of c?(C). Suppose that C is a class of subsets of 5. We define d(C) to be the intersection of all d-systems which contain C. Obviously, d(C) is a c?-system, the smallest c?-system which contains C. It is also obvious that d(C) C *(C). A1.3. Dynkin's Lemma ► If X is a K-system, then d(X) = or(J). Thus any c?-system which contains a 7r-system contains the cr-algebra generated by that 7r-system. Proof. Because of Proposition Al.2(d), we need only prove that d{2) is a 7r-system.
194 Chapter Al: Appendix to Chapter 1 (A1.8).. Step 1: Let Vx := {B G d(X) : BDC G d(T), VC G J}. Because J is a tt- system, T>\ D X. It is easily checked that T>\ inherits the d~system structure from d{X). [For, clearly, S G 2>i- Next, if BUB2 G X>i and #i C B2, then, for C in 1", and, since I?2 D C G d(J), B\ D C G c?(J) and d( J) is a d-system, we see that (B2\B1) PI C G d(J), so that B2\Bi G 2>i. Finally, if Bn £ Vi(n G N) and B, then for C G J, so that 5flCG d(X) and B G £>i.] We have shown that X>i is a d-system which contains X, so that (since T>\ C d(T) by its definition) D\ = f: Let D2 := {^ G rf(J) : B D A G d(J), VB G d(l)}. Step 1 showed that T>2 contains X. But, just as in Step 1, we can prove that T>2 inherits the d-system structure from d(X) and that therefore T>2 ~ d(X). But the fact that T>2 = d(X) says that d(T) is a 7r-system. D A1.4. Proof of Uniqueness Lemma 1.6 Recall what the crucial Lemma 1.6 stated: Let S be a set. Let X be a TC-system on S, and let S := cr(X). Suppose that ii\ and ji2 are measures on (S, E) such that fii(S) = f*2(S) <oo and fi\ = fi2 on X. Then on E. Proof. Let Then V is a d-system on 5. [Indeed, the fact that 5 G V is given. If A,B £Vy then in(B\A) = ) so that B\A G 2?. Finally, if Fn G I> and Fn T ^, then by Lemma 1.10(a), so that F G X>.
..(A1.5) Chapter A1: Appendix to Chapter 1 195 Since V is a d-system and V D 2 by hypothesis, Dynkin's Lemma shows that V D cr{X) = S, and the result follows. □ Notes. You should check that no circular argument is entailed by the use of Lemma 1.10 (this is obvious). The reason for the insistence on finiteness in the condition fii(S) = fi2(S) < oo is that we do not wish to try to claim at (*) that oo — co = co — co. Indeed the Lemma 1.6 is false if '< co' is omitted - see Section A 1.10 below. We now aim to prove Caratheodory's Theorem 1.7. A1.5. A-sets: 'algebra' case LEMMA Let Qo be an algebra of subsets of S and let A : Qo -> [0, oo] with A@) = 0. Call an element L of Qo a \-set if L 'splits every element of Qo properly': \(L n G) + A(LC n G) = A(G), V<? G &. Then the class Co of \-sets is an algebra, and A is finitely additive on Co. Moreover, for disjoint L\,L2,... ,Ln € Co and G in Qo, x(\J(LknG))= " Proof. Step 1: Let L\ and Li be A-sets, and let L = L\ H L>2. We wish to prove that L is a A-set. Now Lc PI L2 = L2 fl L\ and Lc PI L\ = L%. Hence, since L2 is a A-set, we have, for any G in Qo, \(LC n G) = \(L2 n Ll n G) + \{LC2 n G)
196 Chapter Al: Appendix to Chapter 1 (Al.5).. and, of course \{LC2 n G) + A(X2 fl G) = A(G). Since Xi is a A-set, A(x2 n xj n G) + A(x n G) = A(x2 n G). On adding the three equations just obtained, we see that A(XC H G) + A(X n G) = A(G), VG G ^o, so that Z is indeed a A-set. Step 2: Since, trivially, 5 is a A-set, and the complement of a A-set is a A-set, it now follows that Co is an algebra. Step 8: If L\ and L2 are disjoint A-sets and G € Qoy then (Xi u l2) n ii = Xi, (Xi u x2) n lj = x2, so, since L\ is a A-set, A((Xi U X2) n G) = A(Xi n G) + A(X2 n G). The proof is now easily completed. D A1.6. Outer measures Let Q be a (j-algebra of subsets of 5. A map A : g -4 [0, oo] is called an outer measure on (S, Q) if (a) A@) = 0; (b) A is increasing: for G±,G2 £ Q with Gi QG2, A(G,)<A(Ga); (c) A is countably subadditive: if (<?*) is any sequence of elements of then
..(A1.7) Chapter Al: Appendix to Chapter 1 197 A1.7. Caratheodory's Lemma. ► Let X be an outer measure on the measurable space (SyQ). Then the X-sets in Q form a a-algebra C on which X is countably additive, so that (S, £, A) is a measure space. Proof. Because of Lemma A 1.5, we need only show that if (Lk) is a disjoint sequence of sets in £, then L := [JLk £ £ and k (a) By the subadditive property of A, for G € G, we have (b) X(G) < X(L PI G) + X(LC H G). Now let Mn := Ujt<n Lk. Lemma A1.5 shows that Mn G Cy so that X(G) = X(Mn H G) + A(M^ n G). However, M^ 3Ic,so that (c) \{G) > X{Mn PI G) + A(LC PI Lemma A 1.5 now allows us to rewrite (c) as X(L° n ik<n so that (d) X(G) > J2 KLk PI G) + A(ZC H G) > A(L n G) + A(LC n G), using the countably subadditive property of A in the last step. On comparing (d) with (b), we see that equality must hold throughout (d) and (b), so that L G C; and then on taking G — L we obtain result (a). D
198 Chapter Al: Appendix to Chapter 1 (A1.8).. A1.8. Proof of Caratheodory's Theorem. Recall that we need to prove the following. Let S be a set, let So be an algebra on S, and let E:=<t(£o). If fio is a countably additive map jiq : Eo —*■ [0, oo], then there exists a measure fi on (i>, S) such that /i = /do on So- Proof Step 1: Let Q be the a-algebra of all subsets of S. For G G Gy define n where the infimum is taken over all sequences (Fn) in Eo with G C [JFn. n We now prove that (a) A is an outer measure on (S,Q). The facts that A@) = 0 and A is increasing are obvious. Suppose that (Gn) is a sequence in Q', such that each X(Gn) is finite. Let e > 0 be given. For each n, choose a sequence (Fntk '- k £ N) of elements of Eo such that Gn C \jFn,k, J2fio(Fn,k) < \(Gn) Then G := \JGnQ UU^n,/k, so that n k n Jb n Since e is arbitrary, we have proved result (a). Step 2: By Caratheodory's Lemma A 1.7, A is a measure on E, £), where C is the (j-algebra of A-sets in Q. All we need show is that (b) So C £, and A = for then E := <j(Eo) C C and we can define fj to be the restriction of A to (S, E).
..(A1.8) Chapter Al: Appendix to Chapter 1 199 Step 8: Proof that A = fio on So. Let F G So. Then, clearly, A(F) < //o^). Now suppose that F C IJn ^n' w^ere ^n € So. As usual, we can define a sequence (En) of disjoint sets: ,k<n such that J5Jn C Fn and |J J5Jn = (J Fn 2 F. Then (J,q{F) = fJ,Q ( IJ(F n -£/n) ) =: / ftOyF H using the countable additivity of fio on So. Hence < so that \(F) > ^iq(F). Step 3 is complete. Step 4- Proof that So C C. Let E G So and G € £7. Then there exists a sequence (.Fn) in So such that G C (Jn Fn, and n Now, by definition of A, n n n since E DG C \J(E D Fn) and Ec D G C \J(EC n i^). Thus, since e is arbitrary, \(G) > X(E n G) + A(EC H G). However, since A is subadditive, A(G) < A(J57 n G) + \{EC n G). We see that J5J is indeed a A-set. □
200 Chapter Al: Appendix to Chapter 1 (A1.9).. A1.9. Proof of the existence of Lebesgue measure on (@,1], 5@,1]). Recall the set-up in Section 1.8. Let S = @,1]. For F C 5, say that FgS0 if F may be written as a finite union where r G N, 0 < a\ < b\ < ... < ar < br < 1. Then (as you should convince yourself) So is an algebra on @,1] and S:=<r(S0) = 5@,1]. (We write 5@,1] rather than 5(@,1]).) For F as at (*), let Of course, a set F may have different expressions as a finite disjoint union of the form (*): for example, However, it is easily seen that jj,q is well defined on So and that //o is finitely additive on Eo- While this is obvious from a picture, you might (or might not) wish to consider how to make the intuitive argument into a formal proof. The key thing is to prove that //o w countably additive on Sq. So, suppose that (Fn) is a sequence of disjoint elements of So with union F in So. We know that if Gn = (JJLj Fk, then n and Gn T F. To prove that fj,Q is countably additive it is enough to show that fio(Gn) , for then n Km/io(Gn) =T H Let Hn = F\Gn. Then Hn G So and /fn | 0. We need only prove that fto(Hn) i 0;
..(A1.9) Chapter Al: Appendix to Chapter 1 201 for then n) T It is clear that an alternative (and final!) rewording of what we need to show is the following: (a) if (Hn) is a decreasing sequence of elements o/So such that for some £ >0, fio(Hn) > 2e, Vn, thenf]Hk ^ 0. Proof of (a). It is obvious from the definition of So that, for each k G N, we can choose Jk G So such that, with Jk denoting the closure of Jjt, Jk C Hk and fi(Hk\Jk) < e2~k. But then (recall that Hn [) Hence, since fio(Hn) > 2e, Vrc, we see that for every and hence f)k<n Jk is non-empty. A fortiori then, for every n, Kn := (j Jk is non-empty. Jt<n That (b) f)Jk^<& (whence now follows from the Heine-Borel theorem: for if (b) is false, then ((Jk)c : k G N) gives a covering of [0,1] by open sets with no finite subcovering. Alternatively, we can argue directly as follows. For each rc, choose a point xn in the non-empty set Kn. Since each xn belongs to the compact set Ji, we can find a subsequence (nq) and a point x of J\ such that xUq —> x. However, for each k, xn G Jk for all but finitely many q, and since Jk is
202 Chapter Al: Appendix to Chapter 1 (A1.9).. compact, it follows that x G J*. Hence x G f)k J*, and property (b) holds. D Since //o is countably additive on So and //o@,1] < oo, it follows that has a unique extension to a measure // on (@, l],#@,1]). This is Lebesgue measure Leb on (@, l],#@,1]). The //o-sets form a cr-algebra strictly larger than 5@,1], namely the cr-algebra of Lebesgue measurable subsets of @,1]. See Section Al.ll. Al.10. Example of non-uniqueness of extension With (£, So) as in Section A1.9, suppose that for F G So, fO ifF = 0, (a) v y "v ' ■ oo The Caratheodory extension of i/q will be obtained as the obvious extension of (a) to S. However, another extension v is given by v(F) = number of elements in F. Al.ll. Completion of a measure space In fact (apart from an 'aside' on the Riemann integral), we do not need completions in this book. Suppose that (£, E, //) is a measure space. Define a class Af of subsets of S as follows: N G Af if and only if 3Z G E such that N C Z and fi(Z) = 0. It is sometimes philosophically satisfying to be able to make precise the idea that 'N in Af is //-measurable and fi(N) = 0'. This is done as follows. For any subset F of 5, write FGE* if 3E,G eH such that E C F C G and //(<? \ E) = 0. It is very easy to show that S* is a cr-algebra on 5 and indeed that £* = <x(S,A/'). With obvious notation we define for FEU*, it being easy to check that //* is well defined. Moreover, it is no problem to prove that (S, £*, //*) is a measure space, the completion of E, S, //).
..(A1.12) Chapter Al: Appendix to Chapter 1 203 For parts of advanced probability, it is essential to complete the basic probability triple (ft, J7, P). In other parts of probability, when (for exam- example) 5 is topological, £ = 5E), and we wish to consider several different measures on E, £), it is meaningless to insist on completion. If we begin with ([0, l],5[0, l],Leb), then #[0,1]* is the cr-algebra of what are called Lebesgue-measurable sets of [0,1]. Then, for example, a function / : [0,1] —► [0,1] is Lebesgue-measurable if the inverse image of every Borel set is Lebesgue-measurable: it need not be true that the inverse image of a Lebesgue-measurable set is Lebesgue-measurable. A1.12. The Baire category theorem In Section 1.11, we studied a subset H of 5 := [0,1] such that (i) H = f) Gk for a sequence (Gk) of open subsets of 5, k (ii) HDV, where V = Q f) S. If H were countable: H = {hr : r G N}, then we would have (a) S = HUHc=(\J{hr})U(\jGl) r expressing S as a countable union n of closed sets where no Fn contains an open interval. [Since V C Gk for every k, G\ C Vc so that Gck contains only irrational points in 5.] However, the Baire category theorem states that if a complete metric space S may be written as a union of a countable sequence of closed sets: then some Fn contains an open ball. Thus the set H must be uncountable. The Baire category theorem has fundamental applications in functional analysis, and some striking applications to probability too! Proof of the Baire category theorem. Assume for the purposes of contradic- contradiction that no Fn contains an open ball. Since F[ is a non-empty open subset of 5, we can find x\ in S and 6\ > 0 such that B(x1,e1)CF1c,
204 Chapter Al: Appendix to Chapter 1 (Al.12).. B(x\, £\) denoting the open ball of radius 6\ centred at x\. Now F2 contains no open ball, so that the open set U2 := is non-empty, and we can find X2 in U2 and £2 > 0 such that B(x2,e2) C U2, £2 <2~1ei. Inductively, choose a sequence (xn) in S and (en) in @,00) so that we have < 2~1en and Since d(xn,xn+i) < 2~1en, it is obvious from the triangle law that (xn) is Cauchy, so that x := \\mxn exists, and that contradicting the fact that (J Fn = 5. D
Chapter A3 Appendix to Chapter 3 A3.1. Proof of the Monotone-Class Theorem 3.14 Recall the statement of the theorem. ► Let 7i be a class of bounded functions from a set S into R satisfying the following conditions: (i) 7i is a vector space over R; (ii) the constant function 1 is an element of7{\ (iii) if (/n) is a sequence of non-negative functions in 7i such that fn T / where f is a bounded function on S, then f G 7i. Then if 7i contains the indicator function of every set in some tt- system X, then H. contains every bounded o'(X)-measurable function on S. Proof. Let T> be the class of sets F in 5 such that If G 7{. It is immediate from (i) - (iii) that T> is a d-system. Since T> contains the 7r-system X, T> contains cr{X). Suppose that / is a crB")-measurable function such that for some K in N, 0 < f(s) < K, Vs G 5. For n G N, define K2n Us) = Y, «2-"Ia(»,o. t=0 where ,i) := {s : i2~n < f(s) < (i -f Since / is cr(Z)-measurable, every A(nyi) G c(X)T so that lA(n,i) € H. Since is a vector space, every /„ G W. But 0 < fn | /, so that / G 7i. 205
206 Chapter AS: Appendix to Chapter 3 (ASA).. If / G bcr(J), we may write / = /+ — /*", where / = max(/, 0) and /- = max(-/,0). Then /+,/" G ba(l) and /+,/" > 0, so that f~ ^ 7^ by what we established above. □ A3.2. Discussion of generated cr-algebras This is one of those situations in which it is actually easier to understand things in a more formal abstract setting. So, suppose that fi, and 5 are sets, and that Y : Q —> 5; S is a a-algebra on S: X : Q -+ R. Because Y~l preserves all set operations, is a cr-algebra on n, and because it is tautologically the smallest <7-algebra y onSl such that Y is y/H measurable (in that K : E —► 3^)» we call it cr(Y): a(Y) = r-2E. LEMMA (a) X is cr(Y)-measurable if and only if X = f(Y) where f is a S-measurable function from S to R. Note. The 'if' part is just the Composition Lemma Proof of 'only if part. It is enough to prove that (b) X G ba(Y) if and only if3f G bE such that X = f(Y). (Otherwise, consider arc tan X, for example.) Though we certainly do not need the Monotone-Class Theorem to prove (b), we may as well use it. So define 7i to be the class of all bounded functions Xon(] such that X = f{Y) for some / G bE. Taking I = cr(F), note that if F G I then F = Y~XB for some B in E, so that = Ib(Y{u>)\
..(AS.2) Chapter AS: Appendix to Chapter S 207 so that If G W. That H is a vector space containing constants is obvious. Finally, suppose that (Xn) is a sequence of elements of 7i such that, for some positive real constant K, 0 < Xn T X < K. For each n, Xn = /n(^) for some /„ in b£. Define / := limsup/n, so that / G bS. Then X = /(F). D One Aa.9 <o be very careful about what Lemma (a) means in practice. To be sure, result C.13,b) is the special case when E, S) = Discussion o/C.13,c). Suppose that Y* : Q —> R for 1 < k < n. We may define a map Y : Q, —> Rn via The problem mentioned at C.13,d) and in the Warning following it shows up here because, before we can apply Lemma (a) to prove C.13,c), we need to prove that ,..., Yn) := ^(^-^(R) : 1 < k < n) = rB(Rn) =: a(Y). [This amounts to proving that the product cr-algebra Ili<fc<n same as #(Rn). See Section 8.5.] Now Yk = ~fk°Y, where 7* is the (continu- (continuous, hence Borel) 'A:th coordinate' map on Rn, so that Yk is <x(F)-measurable. On the other hand, every open subset of Rn is a countable union of open rectangles G\ x • • • x Gn where each Gk is a subinterval of R, and since {Y e Gi x • • • x Gn} = f]{Yk e Gk} G <t(Yu ..., Yn), things do work out. □ You can already see why we are in an appendix, and why we skip discussion of C.13,d).
Chapter A4 Appendix to Chapter 4 This appendix gives the statement of Strassen's Law of the Iterated Loga- Logarithm. Section A4.3 treats the completely different topic of constructing a rigorous model for a Markov chain. A4.1. Kolmogorov's Law of the Iterated Logarithm THEOREM Let Xi, X2,... be IID RVs each with mean 0 and variance 1. Let Sn := Xi -f X2 H + Xn. Then, almost surely, S S limsup n = +1, liminf n = —1. V 2n log log n y/2n log log n This result already gives very precise behaviour on the big values of partial sums. See Section 14.7 for proof in the case when the X's are normally distributed. A4.2. Strassen's Law of the Iterated Logarithm Strassen's Law is a staggering extension of Kolmogorov's result. Let (Xn) and (Sn) be as in the previous section. For each u, let the map t»—> St(u) on [0,00) be the linear interpolation of the map n \-+ Sn on Z+, so that := (t - nMn+i(w) + (n + 1 - t)Sn{uj\ te[n,n + 1). With Kolmogorov's result in mind, define 1 ' og log n 208
..(AJ^.S) Chapter A4: Appendix to Chapter 4 209 so that 11-* Zn(t,io) on [0,1] is a rescaled version of the random walk S run up to time n. Say that a function t »-► /(t,u;) is in the set K(w) of limiting shapes of the path associated with lj if there is a sequence ni(u;), ??2(a;),... in N such that Zn(t,u>) —► /(t,u;) uniformly in / € [0,1]. Now let A' consist of those functions / in C[0,1] which can be written in the Lebesgue-integral form f(t) = / h(s)ds where / Jo Jo < 1. Strassen's Theorem P[K(uj) = K] = 1. Thus, (almost) all paths have the same limiting shapes. Khinchine's law follows from Strassen's precisely because (Exercise!) sup{/(l):/ e A'} = 1, inf{/(l):/ 6 A'} = -1. However, the only element of K for which /(I) = 1 is the function f[t) = t, so the big values of S occur when the whole path (when rescaled) looks like a line of slope 1. Almost every path will, in its Z rescaling, look infinitely often like the function t and infinitely often like the function — t; etc., etc. References. For a highly-motivated classical proof of Strassen's Law, see Freedman A971). For a proof for Brownian motion based on the powerful theory of large deviations, see Stroock A984). A4.3. A model for a Markov chain Let E be a countable set; let fj, be a probability measure on (E,£)y where £ denotes the set of all subsets of E\ and let P denote a stochastic E X E matrix as in Section 4.8. Complicating the notation somewhat for reasons we shall discover later, we wish to construct a probability triple (ft,^", P**) carrying an ^-valued stochastic process (Zn:n € Z+) such that for n£Z+ and i'o,»i,. • •,in we have 'o = io\-"\Zn = in) = flio
210 Chapter A4' Appendix to Chapter 4 (A4-8) The trick is to make (ft,.F, P'*) carry independent ^-valued variables (Z0;Y(i,n):i £ E;n£N) Zq having law fi and such that We can obviously do this via the construction in Section 4.6. For Q € Cl and n G N, define Z«(«) :=?(!,,_, (w),n); and that's it!
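A direct transcription of this construction into Python (an illustrative sketch of my own, not from the text; it assumes NumPy, and the concrete state space, matrix P and law μ below are arbitrary choices) looks as follows.

    import numpy as np

    E = [0, 1]                                   # a concrete countable (here finite) state space
    P = np.array([[0.9, 0.1],
                  [0.4, 0.6]])                   # a stochastic E x E matrix
    mu = np.array([0.5, 0.5])                    # initial law

    rng = np.random.default_rng(4)
    N = 50                                       # horizon of the simulation

    # Independent ingredients, as in Section A4.3: Z_0 with law mu, and, for each i and n,
    # Y(i, n) with P(Y(i, n) = j) = p_{ij}, all independent.
    Z0 = int(rng.choice(E, p=mu))
    Y = {(i, n): int(rng.choice(E, p=P[i])) for i in E for n in range(1, N + 1)}

    # The chain itself: Z_n(ω) := Y(Z_{n-1}(ω), n).
    Z = [Z0]
    for n in range(1, N + 1):
        Z.append(Y[(Z[-1], n)])
    print(Z)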
Chapter A 5 Appendix to Chapter 5 Our task is to prove the Monotone-Convergence Theorem 5.3. We need an elementary preliminary result. A5.1. Doubly monotone arrays Proposition. Let he an array of numbers in [0, oo] which is doubly monotone: for fixed r, yn T as n T so that y^ :=| limj/n exists; n for fixed n, yn T as r T so that yn :=| limy^ exists. r Then n Proof. The result is almost trivial. By replacing each (yn) by arc tan yn , we can assume that the yf "^ are uniformly bounded. Let e > 0 be given. Choose no such that yno > Voo — fe. Then choose r0 such that ylroo) > yno — \e. Then ^_ yno so that y(°°> > y^. Similarly, yoo > y(°°). D A5.2. The key use of Lemma 1.10(a) This is where the fundamental monotonicity property of measures is used. Please re-read Section 5.1 at this stage. 211
212 Chapter A5: Appendix to Chapter 5 (A5.2).. LEMMA (a) Suppose that A € S and that hn £ SF+ and hn Then fio(hn) | fi(A). Proof From E.1,e), fiO(hn) < fi(A), so we need only prove that liminf/io(/in) > Let e > 0, and define in := by Lemma 1.10(a), fJ>(An) T : ^n(^) > 1 — e}. Then An ^ A so that, n <hn so that, by E.1,e), A — e)fi(An) < fio(hn). Hence liminf fio(hn) > A — e)fi(A). Since this is true for every e > 0, the result follows. LEMMA (b) Suppose that f G SF~*~ and that gn e SF+ and gn Then fiQ(gn) | //o(/)- □ Proof We can write / as a finite sum / = £]aJtI>u where the sets ^4* are disjoint and each a* > 0. Then n T and the result follows from Lemma (a). rc T D A5.3. 'Uniqueness of integral' LEMMA (a) Suppose that f 6 (mS)+ and that we have two sequences (/n) of elements of SF+ such that and
..(A5.4) Chapter A5: Appendix to Chapter 5 213 Proof. Let f(nr) := /<r> A /„. Then asrjoo, f{nr) | /n, and as n | oo, /ir) T /(r). Hence, by Lemma A5.2(b), as r The result now follows from Proposition A5.1. □ Recall from Section 5.2 that for / € (mS)+, we define fi(f) := sup{/iO(h) : /* G SF+; h < /} < oo. By definition of//(/), we may choose a sequence /in in SF* such that hn < f and fio(hn) f yw(/). Let us also choose a sequence (#n) of elements of S-F"*" such that gn f /. (We can do this via the 'staircase function' in Section 5.3.) Now let fn := max(gn,h1,h2,.. .,hn). Then /„ E SF+, fn < /, and since /„ > gn, /„ | /• Since /„ < /, f*o(fn) ^ A'C/)? an(i since /n > /in, we see that On combining this fact with Lemma (a), we obtain the next result. (Lemma (a) 'changes our particular sequence to any sequence'.) LEMMA (b) Let f £ (mS)+ and let (fn) be any sequence in SF~*~ such that fn f /. Then n) T A5.4. Proof of the Monotone-Convergence Theorem Recall the statement: Let (fn) be a sequence of elements o/(mS)+ such that fn f /. Then tifn) T Proof Let a^ denote the rth staircase function defined in Section 5.3. Now set fkr) := a(r)(fn), /(r) := «(r)(/)- Since a^ is left-continuous, /£r) | /(r) as n t oo. Since a^r\x) | x, W, fj^ | /n as r | oo. By Lemma A5.2(b), Kfn}) T Kf{r)) ^ n t oo; and by Lemma A5.3(b), fi(fnr)) | *<(/») as r I oo. We also know from Lemma A5.3(b) that jJ.(f^) T /^(/)- The result now follows from Proposition A5.1.
Chapter A 9 Appendix to Chapter 9 This chapter is solely devoted to the proof of the 'infinite-product' Theorem 8.6. It may be read after Section 9.10. It is probably something which a keen student who has read all previous appendices should study with a tutor. A9.1. Infinite products: setting things up Let (An :G N) be a sequence of probability measures on (R,#). Let ft := Y[ R, nGN so that a typical element uj of ft is a sequence lj = (un : n G N) of elements of R. Define Xn(u) := uny and set n := cf\X.\,X2, • • •,Xn). The typical element Fn of Tn has the form (a) Fn = Gn X J] R, Gn 6 k>n Fubini's Theorem shows that on the algebra (NOT a-algebra) we may unambiguously use (a) to define a map P~ : !F~ —* [0,1] via (b) P-(F«) = (A,x...xA«)((? and that P~ is finitely additive on the algebra T~~. However, for each fixed n, 214
..(A9.2) Chapter A9: Appendix to Chapter 9 215 (c) (Oj.FnjP") is a bona fide probability triple which may be identified with rii<Jb<n(^'^'^*:) rn'a (a) and 0*)- Moreover, X\,X2,... ,Xn are independent RVs on (Q,.Fn,P"~). We want to prove that (d) P~ is countably additive on T~ (obviously with the intention of using Caratheodory's Theorem 1.7). Now we know from our proof of the existence of Lebesgue measure (see (A1.9,a)) that it is enough to show that (e) if (Hr) is a sequence of sets in T~ such that Hr 5 #V+i?Vr, and if for some e > 0, P~(Hr) > e for every r, then f]Hr ^ 0. A9.2. Proof of (A9.1,e) Step 1: For every r, there is some n(r) such that Hr G J~n{r) and so hir(w) = hr(u>i,u>2,... ,wn(r)) f°r some hr G bBn^r\ Recall that Xk(w) = w^, and look again at Section A3.2. Step 2: We have (aO) E-hr(XuX2,...,Xn(r)) > e, because the left-hand side of (aO) is exactly P~(Hr). If we work within the probability triple (O,^rn(r), P~), then we know from Section 9.10 that is an explicit version of the conditional expectation of i//r given T\ , and Now, 0 < gr < 1, so that e < Ai(gr) < lAi{^r > e2~1} + e2~1A1{g{1) < A Thus Step 3: However, since Hr D Hr+1, we have (working within where both Hr and Hr+\ are in > gr+i(ui), for every u>i in R.
216 Chapter A9: Appendix to Chapter 9 (A9.2).. Working on (R, 5, Ai) we have Ai{<7r >e2~1} >e2~\ Vr, and gr I so that {gr > e2~1} J,; and by Lemma 1.10(b) on the continuity from above of measures, we have Aifa;! : gr{ux) > e2~\ Vr} > 62. Hence, there exists u>* (say) in R such that (al) E-hr(u>i,X2,...,Xn{r))>e2-\\/r. Step 4' We now repeat Steps 2 and 3 applied to the situation in which (Xi, X2,...) is replaced by hr is replaced by hr(u)*)y where We find that there exists cjJ in R such that (a2) E-hr(ulLjZ,X3,...,Xn(r))>e2-2, Vr. Proceeding inductively, we obtain a sequence a;* = (w; : n G N) with the property that E-hrM,^, ■ ■ -,<(r)) > £2-"(r), Vr. However, and can only be 0 or 1. The only conclusion is that u* G Hr,\fr] and it was exactly the existence of such an uj* which we had to prove. D
Chapter A13
Appendix to Chapter 13

This chapter is devoted to comments on modes of convergence, regarded by many as good for the souls of students, and certainly easy to set examination questions on.

A13.1. Modes of convergence: definitions

Let (X_n : n ∈ N) be a sequence of RVs and let X be a RV, all carried by our triple (Ω, F, P). Let us collect together definitions known to us.

Almost sure convergence
We say that X_n → X almost surely if

    P(X_n → X) = 1.

Convergence in probability
We say that X_n → X in probability if, for every ε > 0,

    P(|X_n − X| > ε) → 0 as n → ∞.

L^p convergence (p ≥ 1)
We say that X_n → X in L^p if each X_n is in L^p, X is in L^p, and

    ||X_n − X||_p → 0 as n → ∞, equivalently, E(|X_n − X|^p) → 0 as n → ∞.
A13.2. Modes of convergence: relationships

Let me state the facts. Convergence in probability is the weakest of the above forms of convergence. Thus

(a)    (X_n → X, a.s.) ⇒ (X_n → X in prob);

(b)    for p ≥ 1, (X_n → X in L^p) ⇒ (X_n → X in prob).

No other implication between any two of our three forms of convergence is valid. But, of course, for r > p ≥ 1,

(c)    (X_n → X in L^r) ⇒ (X_n → X in L^p).

If we know that 'convergence in probability is happening quickly' in that

(d)    Σ_n P(|X_n − X| > ε) < ∞, ∀ε > 0,

then (BC1) allows us to conclude that X_n → X, a.s.

The fact that property (d) implies a.s. convergence is used in proving the following result:

(e) X_n → X in probability if and only if every subsequence of (X_n) contains a further subsequence along which we have almost sure convergence to X.

The only other useful result is that

(f) for p ≥ 1, X_n → X in L^p if and only if the following two statements hold:
    (i) X_n → X in probability,
    (ii) the family (|X_n|^p : n ≥ 1) is UI.

There is only one way to gain an understanding of the above facts, and that is to prove them yourself. The exercises under EA13 provide guidance if you need it.
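The standard example showing that (a) has no converse can be simulated directly (a sketch of my own, not from the text; it assumes NumPy). Take independent X_n with P(X_n = 1) = 1/n and P(X_n = 0) = 1 − 1/n. Then P(|X_n| > ε) = 1/n → 0, so X_n → 0 in probability; but Σ 1/n = ∞, so (BC2) gives X_n = 1 infinitely often almost surely, and there is no almost sure convergence.

    import numpy as np

    rng = np.random.default_rng(5)
    N = 20_000
    n = np.arange(1, N + 1)
    X = (rng.random(N) < 1.0 / n).astype(float)     # independent, P(X_n = 1) = 1/n

    print(X[:100].mean(), X[-5000:].mean())         # the 1's become rare: convergence in probability
    print(np.flatnonzero(X) + 1)                    # indices n with X_n = 1: sparse, but (BC2) says
                                                    # they never stop in the infinite sequence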
Chapter A14 Appendix to Chapter 14 We work with a filtered space (£lyTy {Tn- n £ Z+}, P). This chapter introduces the cr-algebra Tt, where T is a stopping time. The idea is that Tt represents the information available to our observer at (or, if you prefer, immediately after) time T. The Optional-Sampling Theorem says that if X is a uniformly integrable supermartingale and S and T are stopping times with S < T, then we have the natural extension of the supermartingale property: a.s. A14.1. The cr-algebra Tti T a stopping time Recall that a map T: ft —> Z+ U {00} is called a stopping time if {T <n} efn> nel+ U{oo}, equivalently if {T = n} e Tn, n G Z+ U {00}. In each of the above statements, the cn = oo' case follows automatically from the validity of the result for every n in Z+. Let T be a stopping time. Then, for F C Q, we say that F G Tt if Fn{T<n} e equivalently if , n G Z+ U {00}. Then ^V = Tn if T = n; ^r = ^"oo if T = 00; and ^T C ^ for every T. 219
220 Chapter Al4: Appendix to Chapter 14 (AI4.I).. You can easily check that Ft is a cr-algebra. You can also check that if S is another stopping time, then Fsat Q Ft Q Fsvt- Hint. If F G Fsat, then FH{T = n}= |J Fn{SAT = k). D k<n Another detail that needs to be checked is that if X is an adapted process and T is a stopping time, then Xt G ttiFt- Here, X^ is assumed defined in some way such that Xqo is Foo measurable. Proof. For B eB, {XT eB}n{T = n}= {Xn e B) PI {T = n) e Tn. D A14.2. A special case of OST LEMMA Let X be a supermartingale. Let T be a stopping time such that, for some N in N, T(u) < N, Va;. Then XT G C^Q^t.P) and Proof Let F G Ft- Then n<N E(Xn; FH{T = n}) = E(XT] F). n<N (Of course, the fact that \Xt\ < |-X"i |H h \XN | guarantees the result that E(\XT\) < oo.) D A14.3. Doob's Optional-Sampling Theorem for UI martingales Let M be a UI martingale. Then, for any stopping time T, a.s.
..(A14-4) Chapter A14: Appendix to Chapter 14 221 Corollary 1 (a new Optional-Stopping Theorem!) If M is a UI martingale, and T is a stopping time, then E(\Mt\) < oo andE(MT) = E(M0J. Corollary 2 If M is a UI martingale and S and T are stopping times with S < T, then E(MT\FS) = Ms, a.s. Proof of theorem. By Theorem 14.1 and Lemma A14.2, we have, for k £ N, E(Moo|,Fjt) = Mjt, a.s., E(Mk\TTAk) = MTAk, a.s. Hence, by the Tower Property, (*) E(Afool^rAfc) = MTAk, a.s. If F £ TT, then (check!) F n {T < k} G J'rAJb, so that, by (*), (**) E(Moo;FO{T < A:}) = E(MTAk;FD{T < k}) = E(Mr;FO{T < *}). We can (and do) restrict attention to the case when M^ > 0, whence Mn = E(M00|^rn) > 0 for all n. Then, on letting k f oo in (**) and using (MON), we obtain ; F fl {T < oo}) = E(MT; F n {T < oo}). However, the fact that E(Moo; F O {T = oo}) = E(Mr; F n {T = oo}). is tautological. Hence E(Moo*, F) = E(Mt;F). D Corollary 2 now follows from the Tower Property, and Corollary 1 fol- follows from Corollary 2! A14.4. The result for UI submartingales A UI submartingale X has Doob decomposition X = XQ + M + A, where (Exercise: explain why!) E(Aoo) < oo and M is UI. Hence, if T is a stopping time, then, almost surely,
Chapter A16 Appendix to Chapter 16 A16.1. Differentiation under the integral sign Before stating our theorem on this topic, let us examine the type of applica- application we need in Section 16.3. Suppose that X is a RV such that E(|X|) < oo and that h(tyx) = ixetix. (We can treat the real and imaginary parts of h separately.) Note that if [a, b] is a subinterval of R, then the variables {h(t,X) : t G [a, b]} are dominated by \X\, and so are UI. In the theorem, we shall have EH(t,X) = <px(t) - V>x(a), t G [a, 6], and we can conclude that <p'x(t) exists and equals Eh(t,X). THEOREM Let X be a RV carried by (ft, T, P). Suppose that a, b G R with a <b, and that has the 'properties: (i) t *—> h(t,x) is continuous in t for every x in R, (ii) x »—¥ h(t,x) is B-measurable for every t in [a, 6], (iii) the variables {h(t,X) : t G [a,b]} are UI. Then (a) t »—> Eh(t,X) is continuous on [a, 6], (b) h is B[a, b] x B-measurable, 222
..(A16.1) Chapter Al6: Appendix to Chapter 16 223 (c) if H{t, x) := f* h(s, x)ds fora<t<by then for t £ (a, 6), —EH(t,X) exists and equals Eh(t,X). Proof of (a). Since we need only consider the 'sequential case tn —> t\ result (a) follows immediately from Theorem 13.7. Proof of (h). Define^ := 2~n(b - a), Dn := (a + 6Z+) H [a, 6], rn(*) := inf{r G Z>n : r > t], te [a, 6], hn(t,x) := h(rn(t),x), t £ [a, 6], x G R. Then, for J3 G B, U (([r, r + ^) 0 [a, 6]) x {x : ^(r, x) G so that /in is B[a, b] X ^-measurable. Since hn —»• /i on [a, 6] x R, result (b) follows. □ Proof of(c). For F C [a, b] x R, define a(F) := {(t,w) G [a, 6] x SI : (tyX(w)) G F}. If r = A x C where A G 5[a, b] and CG^, then a(r) = Ax (X^C) G 5[a,6] x T. It is now clear that the class of F for which a(F) is an element of B[a, b] X T is a a-algebra containing B[a, b] x j5. The point of all this is that (*) 0t»^0 ^"^ /i(t,X(o;)) is B X J7- measurable since for B G B, {{t,u) : h(t,X(uj)) £ B} is a^^B). (Yes, I know, we could have obtained (*) more directly using the /in's, but it is good to have other methods.) Since the family {h(t,X) : t > 0} is UI, it is bounded in £*, whence / E\h(t,X)\dt<oo. Ja Fubini's Theorem now implies that, for a <t < b, f Eh(s,X)ds = E / h(s,X)ds = EH(t,X), Ja Ja and part (c) now follows. □
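As a numerical check on the application described at the start of Section A16.1 (a sketch of my own, not from the text; it assumes NumPy and uses X ~ Exp(1), for which E|X| < ∞ and φ_X(t) = 1/(1 − it)): a finite difference of the CF, the Monte Carlo estimate of E[iX e^{itX}], and the exact derivative i/(1 − it)² all agree up to simulation and discretization error.

    import numpy as np

    rng = np.random.default_rng(6)
    X = rng.exponential(1.0, 400_000)        # E|X| < ∞, so the h(t,x) = ix e^{itx} example applies

    def phi(t):
        # Monte Carlo estimate of the CF of X at t
        return np.mean(np.exp(1j * t * X))

    t, eps = 0.7, 1e-3
    finite_diff = (phi(t + eps) - phi(t - eps)) / (2.0 * eps)   # numerical derivative of φ at t
    direct = np.mean(1j * X * np.exp(1j * t * X))               # Monte Carlo estimate of E h(t, X)
    exact = 1j / (1.0 - 1j * t) ** 2                            # derivative of the exact CF 1/(1 - it)
    print(finite_diff, direct, exact)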
Chapter E Exercises Starred exercises are more tricky. The first number in an exercise gives a rough indication of which chapter it depends on. 'G' stands for 'a bit of gumption is all that's necessary'. A number of exercises may also be found in the main text. Some are repeated here. We begin with an Antidote to measure-theoretic material - just for fun, though the point that probability is more than mere measure theory needs hammering home. EG.l. Two points are chosen at random on a line AB, each point being chosen according to the uniform distribution on AB, and the choices being made independently of each other. The line AB may now be regarded as divided into three parts. What is the probability that they may be made into a triangle? EG.2. Planet X is a ball with centre 0. Three spaceships A, B and C land at random on its surface, their positions being independent and each uniformly distributed on the surface. Spaceships A and B can communicate directly by radio if ZAOB < 90°. Show that the probability that they can keep in touch (with, for example, A communicating with B via C if necessary) is (tt -f 2)/Dtt). EG.3. Let G be the free group with two generators a and b. Start at time 0 with the unit element 1, the empty word. At each second multiply the current word on the right by one of the four elements a, a,6, b~~l, choosing each with probability 1/4 (independently of previous choices). The choices a,ayo,a ,a,o , a ,ct, o at times 1 to 9 will produce the reduced word aab of length 3 at time 9. Prove that the probability that the reduced word 1 ever occurs at a positive time is 1/3, and explain why it is intuitively clear that (almost surely) (length of reduced word at time n)/n —> \. 224
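For EG.3, a quick simulation (my own sketch, not part of the text; it assumes NumPy, and it necessarily truncates 'ever returns' at a finite horizon, so the first estimate is a slight underestimate) is a useful sanity check on the answers 1/3 and 1/2.

    import numpy as np

    rng = np.random.default_rng(7)
    letters = ['a', 'A', 'b', 'B']                    # 'A' stands for a^{-1}, 'B' for b^{-1}
    inverse = {'a': 'A', 'A': 'a', 'b': 'B', 'B': 'b'}

    def walk(n_steps):
        # keep the reduced word as a stack; record whether it is ever empty at a positive time
        word, returned = [], False
        for _ in range(n_steps):
            x = letters[rng.integers(4)]              # each generator chosen with probability 1/4
            if word and word[-1] == inverse[x]:
                word.pop()                            # cancellation
            else:
                word.append(x)
            if not word:
                returned = True
        return returned, len(word)

    trials, steps = 2_000, 2_000
    results = [walk(steps) for _ in range(trials)]
    print(np.mean([r for r, _ in results]))           # estimate of P(reduced word 1 recurs) ≈ 1/3
    print(np.mean([l for _, l in results]) / steps)   # (length of reduced word at time n)/n ≈ 1/2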
Chapter E: Exercises . 225 EG.4.* (Continuation) Suppose now that the elements a, a-1,&, b~* are chosen instead with respective probabilities a,a,/?,/3, where a > 0,/9 > 0,a -f P — \. Prove that the conditional probability that the reduced word 1 ever occurs at a positive time, given that the element a is chosen at time 1, is the unique root x = r(a) (say) in @,1) of the equation 3x3 + C - 4a  As time goes on, (it is almost surely true that) more and more of the reduced word becomes fixed, so that a final word is built up. If in the final word, the symbols a and a are both replaced by A and the symbols b and 6 are both replaced by B, show that the sequence of ^4's and i?'s obtained is a Markov chain on {A, B} with (for example) _ tt(l-s) PAA~ where y = r(/3). What is the (almost sure) limiting proportion of occurrence of the symbol a in the final word? (Note. This result was used by Professor Lyons of Edinburgh to solve a long-standing problem in potential theory on Riemannian manifolds.) Algebras, etc. El.l. 'Probability' for subsets of N Let 7CN. Say that V has (Cesaro) density i(V) and write V £ CES if n exists. Give an example of sets V^ and V2 in CES for which Vi f) V2 £ CES. Thus, CES is not an algebra. Independence E4.1. Let (ft,^", P) be a probability triple. Let Zi, X2 and Z3 be three 7r-systems on Q. such that, for k = 1,2,3, ^f and Q, G Ik-
226 Chapter E: Exercises Prove that if whenever h G Jjt (k = 1,2,3), then cr(Ji),cr(T2),cr(T3) axe independent. Why did we require that Q, £ E4.2. Let s > 1, and define ((s) := ]T]n6N n~*•> as usual- Let X and Y be independent N-valued random variables with P(X = n) = P(Y = n) = n Prove that the events (-Ep : p prime) , where Ep = {X is divisible by are independent. Explain Euler's formula i/C(*) = p probabilistically. Prove that P(no square other than 1 divides X) = l/£B.s). Let H be the highest common factor of X and Y. Prove that P(H = n) = n-2VCB5). E4.3. Let Xi,X2,... be independent random variables with the same con- continuous distribution function. Let E\ := fi, and, for n > 2, let £Jn := {Xn > Xm,Vm < «} = {a 'Record' occurs at time n}. Convince yourself and your tutor that the events E\, E2,... are independent, with P(En) = 1/n. Borel-Cantelli Lemmas E4.4. Suppose that a coin with probability p of heads is tossed repeatedly. Let Ak be the event that a sequence of k (or more) consecutive heads occurs amongst tosses numbered 2*, 2k -f 1,2* -f- 2,..., 2k+1 - 1. Prove that Hint Let E{ be the event that there are k consecutive heads beginning at toss numbered 2k+(i—l)k. Now make a simple use of the inclusion-exclusion formulae (Lemma 1.9).
E4.5. Prove that if G is a random variable with the normal N(0,1) distribution, then, for x > 0,

P(G > x) = (2π)^{−1/2} ∫_x^∞ e^{−y²/2} dy ≤ (2π)^{−1/2} x^{−1} e^{−x²/2}.

Let X₁, X₂, ... be a sequence of independent N(0,1) variables. Prove that, with probability 1, L ≤ 1, where

L := limsup_n (X_n / √(2 log n)).

(Harder. Prove that P(L = 1) = 1.) [Hint. See Section 14.8.]

Let S_n := X₁ + X₂ + ··· + X_n. Recall that S_n/√n has the N(0,1) distribution. Prove that

P(|S_n| < 2√(n log n), ev) = 1.

Note that this implies the Strong Law: P(S_n/n → 0) = 1.

Remark. The Law of the Iterated Logarithm states that

P( limsup_n S_n / √(2n log log n) = 1 ) = 1.

Do not attempt to prove this now! See Section 14.7.

E4.6. Converse to SLLN
Let Z be a non-negative RV. Let Y be the integer part of Z. Show that

Y = Σ_{n∈N} I_{{Z ≥ n}},

and deduce that

Σ_{n∈N} P(Z ≥ n) ≤ E(Z) ≤ 1 + Σ_{n∈N} P(Z ≥ n).

Let (X_n) be a sequence of IID RVs (independent, identically distributed random variables) with E(|X_n|) = ∞, ∀n. Prove that

Σ_n P(|X_n| > kn) = ∞ (k ∈ N)  and  limsup_n |X_n|/n = ∞, a.s.

Deduce that if S_n = X₁ + X₂ + ··· + X_n, then

limsup_n |S_n|/n = ∞, a.s.
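A quick numerical look at E4.5 (my own sketch, not part of the text; the parameter choices are arbitrary): the first part checks the Gaussian tail bound at x = 3, and the second shows that the maximum of a large sample of N(0,1) variables is a little below √(2 log n), in line with the claim that L ≤ 1.

import math, random

rng = random.Random(13)

x = 3.0
tail = 0.5 * math.erfc(x / math.sqrt(2))                   # exact P(G > x)
bound = math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))
print("P(G > 3) =", tail, " <=  bound", bound)

n = 10**6
m = max(rng.gauss(0.0, 1.0) for _ in range(n))
print("max of n normals / sqrt(2 log n) =", m / math.sqrt(2 * math.log(n)))   # a bit below 1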
E4.7. What's fair about a fair game?
Let X₁, X₂, ... be independent RVs such that

X_n = n² − 1 with probability n^{−2},
X_n = −1 with probability 1 − n^{−2}.

Prove that E(X_n) = 0, ∀n, but that if S_n = X₁ + X₂ + ··· + X_n, then

S_n/n → −1, a.s.

E4.8*. Blackwell's test of imagination
This exercise assumes that you are familiar with continuous-parameter Markov chains with two states. For each n ∈ N, let X^{(n)} = {X^{(n)}(t) : t ≥ 0} be a Markov chain with state-space the two-point set {0,1} with Q-matrix

Q^{(n)} = ( −a_n   a_n )
          (  b_n  −b_n )

and transition function P^{(n)}(t) = exp(tQ^{(n)}). Show that, for every t,

p_{01}^{(n)}(t) ≤ a_n/(a_n + b_n).

The processes (X^{(n)} : n ∈ N) are independent and X^{(n)}(0) = 0 for every n. Each X^{(n)} has right-continuous paths.

Suppose that Σ a_n = ∞ and Σ a_n/b_n < ∞. Prove that if t is a fixed time then

(*) P{X^{(n)}(t) = 1 for infinitely many n} = 0.

Use Weierstrass's M-test to show that Σ_n log p_{00}^{(n)}(t) is uniformly convergent on [0,1], and deduce that

P{X^{(n)}(t) = 0 for ALL n} → 1 as t ↓ 0.

Prove that

P{X^{(n)}(s) = 0, ∀s ≤ t, ∀n} = 0 for every t > 0,

and discuss with your tutor why it is almost surely true that
(**) within every non-empty time interval, infinitely many of the X^{(n)} chains jump.

Now imagine the whole behaviour.

Notes. Almost surely, the process X = (X^{(n)}) spends almost all its time in the countable subset of {0,1}^N consisting of sequences with only finitely many 1's. This follows from (*) and Fubini's Theorem 8.2. However, it is a.s. true that X visits uncountably many points of {0,1}^N during every non-empty time interval. This follows from (**) and the Baire category theorem A1.12. By using much deeper techniques, one can show that for certain choices of (a_n) and (b_n), X will almost certainly visit every point of {0,1}^N uncountably often within a finite time.

Tail σ-algebras

E4.9. Let Y₀, Y₁, Y₂, ... be independent random variables with

P(Y_n = +1) = P(Y_n = −1) = 1/2, ∀n.

For n ∈ N, define X_n := Y₀Y₁···Y_n. Prove that the variables X₁, X₂, ... are independent. Define

𝒴 := σ(Y₁, Y₂, ...),  𝒯_n := σ(X_r : r > n).

Prove that

ℒ := ⋂_n σ(𝒴, 𝒯_n) ≠ σ(𝒴, ⋂_n 𝒯_n) =: ℛ.

Hint. Prove that Y₀ ∈ mℒ and that Y₀ is independent of ℛ.

E4.10. Star Trek, 2
See E10.11, which you can do now.

Dominated-Convergence Theorem

E5.1. Let S := [0,1], Σ := B(S), μ := Leb. Define f_n := n I_{(0,1/n]}. Prove that f_n(s) → 0 for every s in S, but that μ(f_n) = 1 for every n. Draw a picture of g := sup_n |f_n|, and show that g ∉ ℒ¹(S, Σ, μ).
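Returning to E4.7 on the previous page: the paradoxical behaviour is easy to see in simulation. The sketch below (mine, not from the text; N and the seed are arbitrary) generates a few sample paths and prints S_N/N, which is typically close to −1 even though every increment has mean 0.

import random

rng = random.Random(2)

def sample_path(N):
    s = 0
    for n in range(1, N + 1):
        if rng.random() < 1.0 / n**2:
            s += n**2 - 1
        else:
            s -= 1
    return s

N = 10**6
for _ in range(3):
    print("S_N / N =", sample_path(N) / N)   # typically close to -1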
Inclusion-Exclusion Formulae

E5.2. Prove the inclusion-exclusion formulae and inequalities of Section 1.9 by considering integrals of indicator functions.

The Strong Law

E7.1. Inverting Laplace transforms
Let f be a bounded continuous function on [0, ∞). The Laplace transform of f is the function L on (0, ∞) defined by

L(λ) := ∫₀^∞ e^{−λx} f(x) dx.

Let X₁, X₂, ... be independent RVs each with the exponential distribution of rate λ, so

P(X > x) = e^{−λx},  E(X) = 1/λ,  Var(X) = 1/λ².

Show that

E f(S_n) = (−1)^{n−1} λⁿ L^{(n−1)}(λ) / (n−1)!,

where S_n = X₁ + X₂ + ··· + X_n, and L^{(n−1)} denotes the (n − 1)th derivative of L. Prove that f may be recovered from L as follows: for y > 0,

f(y) = lim_{n→∞} [(−1)^{n−1}/(n−1)!] (n/y)ⁿ L^{(n−1)}(n/y).

E7.2. The uniform distribution on the sphere S^{n−1} ⊆ Rⁿ
As usual, write S^{n−1} = {x ∈ Rⁿ : |x| = 1}. You may assume that there is a unique probability measure ν^{n−1} on (S^{n−1}, B(S^{n−1})) such that ν^{n−1}(A) = ν^{n−1}(HA) for every orthogonal n × n matrix H and every A in B(S^{n−1}).

Prove that if X is a vector in Rⁿ, the components of which are independent N(0,1) variables, then for every orthogonal n × n matrix H, the vector HX has the same property. Deduce that X/|X| has law ν^{n−1}.

Let Z₁, Z₂, ... be independent N(0,1) variables and define

R_n := (Z₁² + Z₂² + ··· + Z_n²)^{1/2}.

Prove that R_n/√n → 1, a.s.

Combine these ideas to prove a rather striking fact which relates the normal distribution to the 'infinite-dimensional' sphere and which is important both for Brownian motion and for Fock-space constructions in quantum mechanics:
If, for each n, (Y₁^{(n)}, Y₂^{(n)}, ..., Y_n^{(n)}) is a point chosen on S^{n−1} according to the distribution ν^{n−1}, then

lim_{n→∞} P(√n Y₁^{(n)} ≤ x) = Φ(x) := (2π)^{−1/2} ∫_{−∞}^x e^{−y²/2} dy.

Hint. P(Y₁^{(n)} ≤ u) = P(X₁/R_n ≤ u).

Conditional Expectation

E9.1. Prove that if 𝒢 is a sub-σ-algebra of ℱ and if X ∈ ℒ¹(Ω, ℱ, P) and if Y ∈ ℒ¹(Ω, 𝒢, P) and

(*) E(X; G) = E(Y; G)

for every G in a π-system which contains Ω and generates 𝒢, then (*) holds for every G in 𝒢.

E9.2. Suppose that X, Y ∈ ℒ¹(Ω, ℱ, P) and that

E(X|Y) = Y, a.s.,  E(Y|X) = X, a.s.

Prove that P(X = Y) = 1.

Hint. Consider E(X − Y; X > c, Y ≤ c) + E(X − Y; X ≤ c, Y ≤ c).

Martingales

E10.1. Polya's urn
At time 0, an urn contains 1 black ball and 1 white ball. At each time 1, 2, 3, ..., a ball is chosen at random from the urn and is replaced together with a new ball of the same colour. Just after time n, there are therefore n + 2 balls in the urn, of which B_n + 1 are black, where B_n is the number of black balls chosen by time n.

Let M_n := (B_n + 1)/(n + 2), the proportion of black balls in the urn just after time n. Prove that (relative to a natural filtration which you should specify) M is a martingale.

Prove that P(B_n = k) = (n + 1)^{−1} for 0 ≤ k ≤ n.

What is the distribution of Θ, where Θ := lim M_n?
Prove that, for 0 < θ < 1,

N_n^θ := [(n + 1)! / (B_n! (n − B_n)!)] θ^{B_n} (1 − θ)^{n − B_n}

defines a martingale N^θ. (Continued at E10.8.)

E10.2. Martingale formulation of Bellman's Optimality Principle
Your winnings per unit stake on game n are ε_n, where the ε_n are IID RVs with

P(ε_n = +1) = p,  P(ε_n = −1) = q,  where 1/2 < p = 1 − q < 1.

Your stake C_n on game n must lie between 0 and Z_{n−1}, where Z_{n−1} is your fortune at time n − 1. Your object is to maximize the expected 'interest rate' E log(Z_N/Z₀), where N is a given integer representing the length of the game, and Z₀, your fortune at time 0, is a given constant. Let ℱ_n = σ(ε₁, ..., ε_n) be your 'history' up to time n. Show that if C is any (previsible) strategy, then log Z_n − nα is a supermartingale, where α denotes the 'entropy'

α := p log p + q log q + log 2,

so that E log(Z_N/Z₀) ≤ Nα, but that, for a certain strategy, log Z_n − nα is a martingale. What is the best strategy?

E10.3. Stopping times
Suppose that S and T are stopping times (relative to (Ω, ℱ, {ℱ_n})). Prove that S ∧ T (:= min(S, T)), S ∨ T (:= max(S, T)) and S + T are stopping times.

E10.4. Let S and T be stopping times with S ≤ T. Define the process 1_{(S,T]} with parameter set N via

1_{(S,T]}(n, ω) := 1 if S(ω) < n ≤ T(ω), and 0 otherwise.

Prove that 1_{(S,T]} is previsible, and deduce that if X is a supermartingale, then

E(X_{T∧n}) ≤ E(X_{S∧n}), ∀n.
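Returning to E10.1: Polya's urn is simple to simulate, and the simulation makes the answers plausible before you prove them. The sketch below (my own, not from the text; n, the number of trials and the seed are arbitrary) estimates the distribution of B_n, which should be uniform on {0, 1, ..., n}.

import random

rng = random.Random(5)

def run(n):
    b = 0                                    # black balls drawn so far
    for k in range(n):
        # just before draw k+1 the urn holds k+2 balls, b+1 of them black
        if rng.random() < (b + 1) / (k + 2):
            b += 1
    return b

n, trials = 20, 30000
counts = [0] * (n + 1)
for _ in range(trials):
    counts[run(n)] += 1
print("empirical P(B_n = k), k = 0..n   (exact value 1/(n+1) =", round(1 / (n + 1), 3), "):")
print([round(c / trials, 3) for c in counts])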
E10.5. 'What always stands a reasonable chance of happening will (almost surely) happen - sooner rather than later.'
Suppose that T is a stopping time such that for some N ∈ N and some ε > 0, we have, for every n:

P(T ≤ n + N | ℱ_n) > ε, a.s.

Prove by induction using P(T > kN) = P(T > kN; T > (k − 1)N) that, for k = 1, 2, ...,

P(T > kN) ≤ (1 − ε)^k.

Show that E(T) < ∞.

E10.6. ABRACADABRA
At each of times 1, 2, 3, ..., a monkey types a capital letter at random, the sequence of letters typed forming an IID sequence of RVs each chosen uniformly from amongst the 26 possible capital letters.

Just before each time n = 1, 2, ..., a new gambler arrives on the scene. He bets $1 that

the nth letter will be A.

If he loses, he leaves. If he wins, he receives $26, all of which he bets on the event that

the (n + 1)th letter will be B.

If he loses, he leaves. If he wins, he bets his whole current fortune of $26² that

the (n + 2)th letter will be R,

and so on through the ABRACADABRA sequence. Let T be the first time by which the monkey has produced the consecutive sequence ABRACADABRA. Explain why martingale theory makes it intuitively obvious that

E(T) = 26¹¹ + 26⁴ + 26,

and use result 10.10(c) to prove this. (See Ross (1983) for other such applications.)

E10.7. Gambler's Ruin
Suppose that X₁, X₂, ... are IID RVs with

P(X_n = +1) = p,  P(X_n = −1) = q,  where 0 < p = 1 − q < 1,
and p ≠ q. Suppose that a and b are integers with 0 < a < b. Define

S_n := a + X₁ + ··· + X_n,  T := inf{n : S_n = 0 or S_n = b}.

Let ℱ_n = σ(X₁, ..., X_n) (ℱ₀ = {∅, Ω}). Explain why T satisfies the conditions in Question E10.5. Prove that

M_n := (q/p)^{S_n}  and  N_n := S_n − n(p − q)

define martingales M and N. Deduce the values of P(S_T = 0) and E(T).

E10.8. Bayes' urn
A random number Θ is chosen uniformly between 0 and 1, and a coin with probability Θ of heads is minted. The coin is tossed repeatedly. Let B_n be the number of heads in n tosses.

Prove that (B_n) has exactly the same probabilistic structure as the (B_n) sequence in E10.1 on Polya's urn. Prove that N_n^θ is a regular conditional pdf of Θ given B₁, B₂, ..., B_n. (Continued at E18.5.)

E10.9. Show that if X is a non-negative supermartingale and T is a stopping time, then

E(X_T; T < ∞) ≤ E(X₀).

(Hint. Recall Fatou's Lemma.) Deduce that

c P(sup_n X_n ≥ c) ≤ E(X₀).

E10.10*. The 'Star-ship Enterprise' Problem
The control system on the star-ship Enterprise has gone wonky. All that one can do is to set a distance to be travelled. The spaceship will then move that distance in a randomly chosen direction, then stop. The object is to get into the Solar System, a ball of radius r. Initially, the Enterprise is at a distance R₀ (> r) from the Sun.

Let R_n be the distance from Sun to Enterprise after n 'space-hops'. Use Gauss's theorems on potentials due to spherically-symmetric charge distributions to show that whatever strategy is adopted, 1/R_n is a supermartingale, and that for any strategy which always sets a distance no greater than that from Sun to Enterprise, 1/R_n is a martingale. Use E10.9 to show that

P(Enterprise gets into Solar System) ≤ r/R₀.

For each ε > 0, you can choose a strategy which makes this probability greater than (r/R₀) − ε. What kind of strategy will this be?
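Two of the preceding exercises can be checked numerically; the sketch below is my own working, not part of the text, and the closed forms quoted in the comments are what optional stopping suggests rather than statements taken from the book. For E10.6, E(T) = 26¹¹ + 26⁴ + 26 is far too large to simulate, but the same gambling argument applied to the pattern ABA over a three-letter alphabet predicts E(T) = 3³ + 3 = 30. For E10.7, applying optional stopping to M and N (with ρ := q/p) suggests P(S_T = 0) = (ρ^a − ρ^b)/(1 − ρ^b) and E(T) = (b(1 − P(S_T = 0)) − a)/(p − q).

import random

rng = random.Random(6)
trials = 20000

# --- E10.6 analogue: waiting time for the pattern "ABA" over the alphabet {A, B, C} ---
def waiting_time(pattern="ABA", alphabet="ABC"):
    recent, t = "", 0
    while True:
        t += 1
        recent = (recent + rng.choice(alphabet))[-len(pattern):]
        if recent == pattern:
            return t

print("mean T ~", sum(waiting_time() for _ in range(trials)) / trials, "  (predicted 3^3 + 3 = 30)")

# --- E10.7: gambler's ruin started at a, absorbed at 0 or b, with p = 0.6 ---
p, a, b = 0.6, 3, 10
q = 1 - p
rho = q / p
ruin_theory = (rho**a - rho**b) / (1 - rho**b)
et_theory = (b * (1 - ruin_theory) - a) / (p - q)

ruins, total_t = 0, 0
for _ in range(trials):
    s, t = a, 0
    while 0 < s < b:
        s += 1 if rng.random() < p else -1
        t += 1
    ruins += (s == 0)
    total_t += t
print("P(S_T = 0) ~", ruins / trials, "   theory:", round(ruin_theory, 4))
print("E(T)       ~", total_t / trials, "   theory:", round(et_theory, 3))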
E10.11*. Star Trek, 2
'Captain's Log ... Mr Spock and Chief Engineer Scott have modified the control system so that the Enterprise is confined to move for ever in a fixed plane passing through the Sun. However, the next 'hop-length' is now automatically set to be the current distance to the Sun ('next' and 'current' being updated in the obvious way). Spock is muttering something about logarithms and random walks, but I wonder whether it is (almost) certain that we will get into the Solar System sometime ... '

Hint. Let X_n := log R_n − log R_{n−1}. Prove that X₁, X₂, ... is an IID sequence of variables each of mean 0 and finite variance σ² (say), where σ > 0. Let

S_n := X₁ + X₂ + ··· + X_n.

Prove that if α is a fixed positive number, then

P[inf_n S_n = −∞] ≥ P[S_n < −ασ√n, i.o.] ≥ limsup_n P[S_n < −ασ√n] = Φ(−α) > 0.

(Use the Central Limit Theorem.) Prove that the event {inf_n S_n = −∞} is in the tail σ-algebra of the (X_n) sequence.

E12.1. Branching Process
A branching process Z = {Z_n : n ≥ 0} is constructed in the usual way. Thus, a family {X_k^{(n)} : n, k ≥ 1} of IID Z⁺-valued random variables is supposed given. We define Z₀ := 1 and then define recursively:

Z_{n+1} := X₁^{(n+1)} + ··· + X_{Z_n}^{(n+1)}  (n ≥ 0).

Assume that if X denotes any one of the X_k^{(n)}, then

μ := E(X) < ∞ and 0 < σ² := Var(X) < ∞.

Prove that M_n := Z_n/μⁿ defines a martingale M relative to the filtration ℱ_n := σ(Z₀, Z₁, ..., Z_n). Show that

E(Z_{n+1}² | ℱ_n) = μ² Z_n² + σ² Z_n,

and deduce that M is bounded in ℒ² if and only if μ > 1. Show that, when μ > 1,

Var(M_∞) = σ² / (μ² − μ).
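A simulation sketch for E12.1 (my own illustration, not from the text): the Poisson(μ) offspring law is an assumption made purely for convenience (it has σ² = μ), as are the values of μ, n and the sample sizes. The empirical mean of M_n should stay near 1, and for μ > 1 its variance should be close to σ²/(μ² − μ).

import math, random

rng = random.Random(8)

def poisson(mu):                      # simple inversion sampler, adequate for small mu
    thresh, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= rng.random()
        if p <= thresh:
            return k
        k += 1

def M_n(mu, n):
    z = 1
    for _ in range(n):
        z = sum(poisson(mu) for _ in range(z))
        if z == 0:
            return 0.0                # extinction: Z_n / mu^n = 0 from now on
    return z / mu**n

mu, n, trials = 1.5, 12, 4000
samples = [M_n(mu, n) for _ in range(trials)]
mean = sum(samples) / trials
var = sum((m - mean) ** 2 for m in samples) / trials
print("E(M_n)   ~", round(mean, 3), "  (should be 1)")
print("Var(M_n) ~", round(var, 3), "   sigma^2/(mu^2 - mu) =", mu / (mu**2 - mu))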
E12.2. Use of Kronecker's Lemma
Let E₁, E₂, ... be independent events with P(E_n) = 1/n. Let Y_n := I_{E_n}. Prove that

Σ_k (Y_k − 1/k)/log k converges, a.s.,

and use Kronecker's Lemma to deduce that

N_n / log n → 1, a.s.,

where N_n := Y₁ + ··· + Y_n. An interesting application is to E4.3, when N_n becomes the number of records by time n.

E12.3. Star Trek, 3
Prove that if the strategy in E10.11 is (in the obvious sense) employed - and for ever - in R³ rather than in R², then

Σ_n R_n^{−2} < ∞, a.s.,

where R_n is the distance from the Enterprise to the Sun at time n.

Note. It should be obvious which result plays the key role here, but you should try to make your argument fully rigorous.

Uniform Integrability

E13.1. Prove that a class C of RVs is UI if and only if both of the following conditions (i) and (ii) hold:
(i) C is bounded in ℒ¹, so that A := sup{E(|X|) : X ∈ C} < ∞;
(ii) for every ε > 0, ∃δ > 0 such that if F ∈ ℱ, P(F) < δ and X ∈ C, then E(|X|; F) < ε.

Hint for 'if'. For X ∈ C, P(|X| > K) ≤ K^{−1}A.
Hint for 'only if'. E(|X|; F) ≤ E(|X|; |X| > K) + K P(F).

E13.2. Prove that if C and D are UI classes of RVs, and if we define C + D := {X + Y : X ∈ C, Y ∈ D}, then C + D is UI.
Hint. One way to prove this is to use E13.1.

E13.3. Let C be a UI family of RVs. Say that Y ∈ D if for some X ∈ C and some sub-σ-algebra 𝒢 of ℱ, we have Y = E(X|𝒢), a.s. Prove that D is UI.

E14.1. Hunt's Lemma
Suppose that (X_n) is a sequence of RVs such that X := lim X_n exists a.s. and that (X_n) is dominated by Y in (ℒ¹)⁺:

|X_n(ω)| ≤ Y(ω), ∀(n, ω), and E(Y) < ∞.
Let (ℱ_n) be any filtration. Prove that

E(X_n | ℱ_n) → E(X | ℱ_∞), a.s., where ℱ_∞ := σ(⋃_n ℱ_n).

Hint. Let Z_m := sup_{r≥m} |X_r − X|. Prove that Z_m → 0 a.s. and in ℒ¹. Prove that for n ≥ m, we have, almost surely,

|E(X_n | ℱ_n) − E(X | ℱ_n)| ≤ E(Z_m | ℱ_n).

E14.2. Azuma-Hoeffding Inequality
(a) Show that if Y is a RV with values in [−c, c] and with E(Y) = 0, then, for θ ∈ R,

E(e^{θY}) ≤ cosh θc ≤ exp(θ²c²/2).

(b) Prove that if M is a martingale null at 0 such that for some sequence (c_n : n ∈ N) of positive constants,

|M_n − M_{n−1}| ≤ c_n, ∀n,

then, for x > 0,

P( sup_{k≤n} M_k ≥ x ) ≤ exp( −x² / (2 Σ_{k≤n} c_k²) ).

Hint for (a). Let f(z) := exp(θz), z ∈ [−c, c]. Then, since f is convex,

f(y) ≤ [(c − y)/(2c)] f(−c) + [(c + y)/(2c)] f(c).

Hint for (b). See the proof of (14.7,a).

Characteristic Functions

E16.1. Prove that

lim_{T→∞} ∫₀^T x^{−1} sin x dx = π/2

by integrating z^{−1}e^{iz} around the contour formed by the 'upper' semicircles of radii ε and T and the intervals [−T, −ε] and [ε, T].

E16.2. Prove that if Z has the U[−1, 1] distribution, then

φ_Z(θ) = (sin θ)/θ,
and prove that there do not exist IID RVs X and Y such that

X − Y has the U[−1, 1] distribution.

E16.3. Suppose that X has the Cauchy distribution, and let θ > 0. By integrating e^{iθz}/(1 + z²) around the semicircle formed by [−R, R] together with the 'upper' semicircle centre 0 of radius R, prove that φ_X(θ) = e^{−θ}. Show that φ_X(θ) = e^{−|θ|} for all θ.

Prove that if X₁, X₂, ..., X_n are IID RVs each with the standard Cauchy distribution, then (X₁ + ··· + X_n)/n also has the standard Cauchy distribution.

E16.4. Suppose that X has the standard normal N(0,1) distribution. Let θ > 0. Consider the integral of (2π)^{−1/2} exp(−z²/2) around the rectangular contour (−R) → R → R + iθ → −R + iθ → (−R), and prove that

φ_X(θ) = exp(−θ²/2).

E16.5. Prove that if φ is the characteristic function of a RV X, then φ is non-negative definite in that for complex c₁, c₂, ..., c_n and real θ₁, θ₂, ..., θ_n,

Σ_j Σ_k c_j c̄_k φ(θ_j − θ_k) ≥ 0.

(Hint. Express the LHS as the expectation of |Σ_k c_k e^{iθ_k X}|².) Bochner's Theorem says that φ is a characteristic function if and only if φ(0) = 1, φ is continuous, and φ is non-negative definite! (It is of course understood that here φ : R → C.) E18.6 gives a simpler result in the same spirit.

E16.6. (a) Let (Ω, ℱ, P) = ([0,1], B[0,1], Leb). What is the distribution of the RV Z, where Z(ω) := 2ω − 1? Let ω = Σ_n 2^{−n} R_n(ω) be the binary expansion of ω. Let

U(ω) := Σ_{odd n} 2^{−n} (2R_n(ω) − 1).

Find a random variable V independent of U such that U and V are identically distributed and U + ½V is uniformly distributed on [−1, 1].

(b) Now suppose that (on some probability triple) X and Y are IID RVs such that X + ½Y is uniformly distributed on [−1, 1].
Let φ be the CF of X. Calculate φ(θ)/φ(θ/4). Show that the distribution of X must be the same as that of U in part (a), and deduce that there exists a set F ∈ B[−1, 1] such that Leb(F) = 0 and P(X ∈ F) = 1.

E18.1. (a) Suppose that λ > 0 and that (for n > λ) F_n is the DF associated with the Binomial distribution B(n, λ/n). Prove (using CFs) that F_n converges weakly to F, where F is the DF of the Poisson distribution with parameter λ.

(b) Suppose that X₁, X₂, ... are IID RVs each with the density function (1 − cos x)/(πx²) on R. Prove that for x ∈ R,

lim_{n→∞} P( (X₁ + X₂ + ··· + X_n)/n ≤ x ) = 1/2 + π^{−1} arctan x,

where arctan x ∈ (−π/2, π/2).

E18.2. Prove the Weak Law of Large Numbers in the following form. Suppose that X₁, X₂, ... are IID RVs, each with the same distribution as X. Suppose that X ∈ ℒ¹ and that E(X) = μ. Prove by the use of CFs that the distribution of

A_n := n^{−1}(X₁ + X₂ + ··· + X_n)

converges weakly to the unit mass at μ. Deduce that A_n → μ in probability.

Of course, SLLN implies this Weak Law.

Weak Convergence for Prob[0,1]

E18.3. Let X and Y be RVs taking values in [0,1]. Suppose that

E(X^k) = E(Y^k), k = 0, 1, 2, ... .

Prove that
(i) E p(X) = E p(Y) for every polynomial p,
(ii) E f(X) = E f(Y) for every continuous function f on [0,1],
(iii) P(X ≤ x) = P(Y ≤ x) for every x in [0,1].

Hint for (ii). Use the Weierstrass Theorem 7.4.

E18.4. Suppose that (F_n) is a sequence of DFs with

F_n(x) = 0 for x < 0,  F_n(1) = 1,

for every n.
Suppose that

(*) m_k := lim_n ∫_{[0,1]} x^k dF_n exists for k = 0, 1, 2, ... .

Use the Helly-Bray Lemma and E18.3 to show that F_n ⇒ F, where F is characterized by

∫_{[0,1]} x^k dF = m_k, ∀k.

E18.5. Improving on E18.3: A Moment Inversion Formula
Let F be a distribution function with F(0−) = 0 and F(1) = 1. Let μ be the associated law, and define

m_k := ∫_{[0,1]} x^k dF(x).

Define

Ω := [0,1] × [0,1]^N,  ℱ := B × B^N,  P := μ × Leb^N.

This models the situation in which Θ is chosen with law μ, a coin with probability Θ of heads is then minted, and tossed at times 1, 2, ... . See E10.8. The RV H_k is 1 if the kth toss produces heads, 0 otherwise. Define S_n := H₁ + H₂ + ··· + H_n. By the Strong Law and Fubini's Theorem,

S_n/n → Θ, a.s.

Define a map D on the space of real sequences (a_n : n ∈ Z⁺) by setting Da := (a_n − a_{n+1} : n ∈ Z⁺). Prove that

(*) Σ_{k ≤ nx} (n choose k) (D^{n−k} m)_k → F(x)

at every point x of continuity of F.

E18.6*. Moment Problem
Prove that if (m_k : k ∈ Z⁺) is a sequence of numbers in [0,1], then there exists a RV X with values in [0,1] such that E(X^k) = m_k for every k if and only if m₀ = 1 and

(D^r m)_s ≥ 0  (r, s ∈ Z⁺).

Hint. Define F_n via E18.5(*), and then verify that E18.4(*) holds. You can show that the moments m_{n,k} of F_n satisfy

m_{n,0} = 1,  m_{n,1} = m₁,  m_{n,2} = m₂ + n^{−1}(m₁ − m₂), etc.

You discover the algebra!
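The inversion formula (*) in E18.5 can be tried out numerically. The sketch below is my own illustration (not from the text); it takes the law on [0,1] with F(x) = x², whose moments are m_k = 2/(k + 2), and evaluates the partial sums for a few values of n and x. Exact rational arithmetic is used because the alternating sums defining D^r m cancel too severely for floating point.

from fractions import Fraction
from math import comb

def m(k):
    return Fraction(2, k + 2)

def D_power(r, k):                # (D^r m)_k = sum_j (-1)^j C(r, j) m_{k+j}
    return sum((-1) ** j * comb(r, j) * m(k + j) for j in range(r + 1))

def F_n(x, n):                    # sum over k <= n x of C(n, k) (D^{n-k} m)_k
    return float(sum(comb(n, k) * D_power(n - k, k) for k in range(int(n * x) + 1)))

for x in (0.25, 0.5, 0.75):
    print(x, [round(F_n(x, n), 4) for n in (10, 40, 160)], "   F(x) =", x * x)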
Weak Convergence for Prob[0, ∞)

E18.7. Using Laplace transforms instead of CFs
Suppose that F and G are DFs on R such that F(0−) = G(0−) = 0, and

∫_{[0,∞)} e^{−λx} dF(x) = ∫_{[0,∞)} e^{−λx} dG(x), ∀λ > 0.

Note that the integral on the LHS has a contribution F(0) from {0}. Prove that F = G.

[Hint. One could derive this from the idea in E7.1. However, it is easier to use E18.3, because we know that if X has DF F and Y has DF G, then

E[(e^{−X})ⁿ] = E[(e^{−Y})ⁿ],  n = 0, 1, 2, ... .]

Suppose that (F_n) is a sequence of distribution functions on R each with F_n(0−) = 0 and such that

L(λ) := lim_n ∫_{[0,∞)} e^{−λx} dF_n(x)

exists for λ ≥ 0 and that L is continuous at 0. Prove that (F_n) is tight and that F_n ⇒ F, where

∫_{[0,∞)} e^{−λx} dF(x) = L(λ), ∀λ > 0.

Modes of convergence

EA13.1. (a) Prove that

(X_n → X, a.s.) ⟹ (X_n → X in prob).

Hint. See Section 13.5.

(b) Prove that

(X_n → X in prob) does not imply (X_n → X, a.s.).

Hint. Let X_n := I_{E_n}, where E₁, E₂, ... are independent events.

(c) Prove that if

Σ_n P(|X_n − X| > ε) < ∞, ∀ε > 0,

then X_n → X, a.s.

Hint. Show that the set {ω : X_n(ω) does not converge to X(ω)} may be written

⋃_k {ω : |X_n(ω) − X(ω)| > k^{−1} for infinitely many n}.

(d) Suppose that X_n → X in probability. Prove that there is a subsequence (X_{n_k}) of (X_n) such that X_{n_k} → X, a.s.

Hint. Combine (c) with the 'diagonal principle'.
(e) Deduce from (a) and (d) that X_n → X in probability if and only if every subsequence of (X_n) contains a further subsequence which converges a.s. to X.

EA13.2. Recall that if ξ is a random variable with the standard normal N(0,1) distribution, then

E exp(λξ) = exp(λ²/2), λ ∈ R.

Suppose that ξ₁, ξ₂, ... are IID RVs each with the N(0,1) distribution. Let S_n := Σ_{k=1}^n ξ_k, let a > 0 and b ∈ R, and define X_n := exp(aS_n − bn). Prove that

(X_n → 0, a.s.) ⟺ (b > 0),

but that for r ≥ 1,

(X_n → 0 in ℒ^r) ⟺ (r < 2b/a²).
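A small illustration of EA13.2 (my own sketch, not from the text; the values of a, b and r are arbitrary choices): with a = 1 and b = 0.3, sample paths of X_n = exp(aS_n − bn) die away since b > 0, yet E(X_n^r) = exp(n(r²a²/2 − rb)) explodes for r = 1 because 1 ≥ 2b/a² = 0.6.

import math, random

rng = random.Random(12)
a, b, r = 1.0, 0.3, 1.0

s = 0.0
for n in range(1, 61):
    s += rng.gauss(0.0, 1.0)
    if n % 20 == 0:
        print("n =", n,
              "  X_n =", round(math.exp(a * s - b * n), 6),
              "  E(X_n^r) =", round(math.exp(n * (r * r * a * a / 2 - r * b)), 3))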
References

Aldous, D. (1989), Probability Approximations via the Poisson Clumping Heuristic, Springer, New York.
Athreya, K.B. and Ney, P. (1972), Branching Processes, Springer, New York, Berlin.
Billingsley, P. (1968), Convergence of Probability Measures, Wiley, New York.
Billingsley, P. (1979), Probability and Measure, Wiley, Chichester, New York (2nd edn. 1987).
Bollobás, B. (1987), Martingales, isoperimetric inequalities, and random graphs, Coll. Math. Soc. J. Bolyai 52, 113-39.
Breiman, L. (1968), Probability, Addison-Wesley, Reading, Mass.
Chow, Y.-S. and Teicher, H. (1978), Probability Theory: Independence, Interchangeability, Martingales, Springer, New York, Berlin.
Chung, K.L. (1968), A Course in Probability, Harcourt, Brace and World, New York.
Davis, M.H.A. and Norman, A.R. (1990), Portfolio selection with transaction costs, Maths. of Operations Research (to appear).
Davis, M.H.A. and Vinter, R.B. (1985), Stochastic Modelling and Control, Chapman and Hall, London.
Dellacherie, C. and Meyer, P.-A. (1980), Probabilités et Potentiel, Chaps. V-VIII, Hermann, Paris.
Deuschel, J.-D. and Stroock, D.W. (1989), Large Deviations, Academic Press, Boston.
Doob, J.L. (1953), Stochastic Processes, Wiley, New York.
Doob, J.L. (1984), Classical Potential Theory and its Probabilistic Counterpart, Springer, New York.
Dunford, N. and Schwartz, J.T. (1958), Linear Operators: Part I, General Theory, Interscience, New York.
Durrett, R. (1984), Brownian Motion and Martingales in Analysis, Wadsworth, Belmont, Ca.
Dym, H. and McKean, H.P. (1972), Fourier Series and Integrals, Academic Press, New York.
Ellis, R.S. (1985), Entropy, Large Deviations, and Statistical Mechanics, Springer, New York, Berlin.
Ethier, S.N. and Kurtz, T.G. (1986), Markov Processes: Characterization and Convergence, Wiley, New York.
Feller, W. (1957), Introduction to Probability Theory and its Applications, Vol. 1, 2nd edn., Wiley, New York.
Freedman, D. (1971), Brownian Motion and Diffusion, Holden-Day, San Francisco.
Garsia, A. (1973), Martingale Inequalities: Seminar Notes on Recent Progress, Benjamin, Reading, Mass.
Grimmett, G.R. (1989), Percolation, Springer, New York, Berlin.
Grimmett, G.R. and Stirzaker, D.R. (1982), Probability and Random Processes, Oxford University Press.
Hall, P. (1988), Introduction to the Theory of Coverage Processes, Wiley, New York.
Hall, P. and Heyde, C.C. (1980), Martingale Limit Theory and its Application, Academic Press, New York.
Halmos, P.R. (1959), Measure Theory, Van Nostrand, Princeton, NJ.
Hammersley, J.M. (1966), Harnesses, Proc. Fifth Berkeley Symp. Math. Statist. and Prob., Vol. III, 89-117, University of California Press.
Harris, T.E. (1963), The Theory of Branching Processes, Springer, New York, Berlin.
Jones, G. and Jones, T. (1949), (Translation of) The Mabinogion, Dent, London.
Karatzas, I. and Shreve, S.E. (1988), Brownian Motion and Stochastic Calculus, Springer, New York.
Karlin, S. and Taylor, H.M. (1981), A Second Course in Stochastic Processes, Academic Press, New York.
Kendall, D.G. (1966), Branching processes since 1873, J. London Math. Soc. 41, 385-406.
Kendall, D.G. (1975), The genealogy of genealogy: Branching processes before (and after) 1873, Bull. London Math. Soc. 7, 225-53.
Kingman, J.F.C. and Taylor, S.J. (1966), Introduction to Measure and Probability, Cambridge University Press.
Körner, T.W. (1988), Fourier Analysis, Cambridge University Press.
Laha, R. and Rohatgi, V. (1979), Probability Theory, Wiley, New York.
Lukacs, E. (1970), Characteristic Functions, 2nd edn., Griffin, London.
Meyer, P.-A. (1966), Probability and Potential (English translation), Blaisdell, Waltham, Mass.
Neveu, J. (1965), Mathematical Foundations of the Calculus of Probability (translated from the French), Holden-Day, San Francisco.
Neveu, J. (1975), Discrete-parameter Martingales, North-Holland, Amsterdam.
Parthasarathy, K.R. (1967), Probability Measures on Metric Spaces, Academic Press, New York.
Rogers, L.C.G. and Williams, D. (1987), Diffusions, Markov Processes, and Martingales, 2: Itô calculus, Wiley, Chichester, New York.
Ross, S. (1976), A First Course in Probability, Macmillan, New York.
Ross, S. (1983), Stochastic Processes, Wiley, New York.
Stroock, D.W. (1984), An Introduction to the Theory of Large Deviations, Springer, New York, Berlin.
Varadhan, S.R.S. (1984), Large Deviations and Applications, SIAM, Philadelphia.
Wagon, S. (1985), The Banach-Tarski Paradox, Encyclopaedia of Mathematics, Vol. 24, Cambridge University Press.
Whittle, P. (1990), Risk-sensitive Optimal Control, Wiley, Chichester, New York.
Williams, D. (1973), Some basic theorems on harnesses, in Stochastic Analysis, eds. D.G. Kendall and E.F. Harding, Wiley, New York, pp. 349-66.
Index

(Recall that there is a Guide to Notation on pages xiv-xv.)

ABRACADABRA (4.9, E10.6).
adapted process (10.2): Doob decomposition (12.11).
σ-algebra (1.1).
algebra of sets (1.1).
almost everywhere = a.e. (1.5); almost surely = a.s. (2.4).
atoms: of σ-algebra (9.1, 14.13); of distribution function (16.5).
Azuma-Hoeffding inequality (E14.2).
Baire category theorem (A1.12).
Banach-Tarski paradox (1.0).
Bayes' formula (15.7-15.9).
Bellman Optimality Principle (E10.2, 15.3).
Black-Scholes option-pricing formula (15.2).
Blackwell's Markov chain (E4.8).
Bochner's Theorem (E16.5).
Borel-Cantelli Lemmas: First = BC1 (2.7); Second = BC2 (4.3); Lévy's extension of (12.15).
Bounded Convergence Theorem = BDD (6.2, 13.6).
branching process (Chapter 0, E12.1).
Burkholder-Davis-Gundy inequality (14.18).
Carathéodory's Lemma (A1.7).
Carathéodory's Theorem: statement (1.7); proof (A1.8).
Central Limit Theorem (18.4).
Cesàro's Lemma (12.6).
characteristic functions: definition (16.1); inversion formula (16.6); convergence theorem (18.1).
Chebyshev's inequality (7.3).
coin tossing (3.7).
conditional expectation (Chapter 9): properties (9.7).
conditional probability (9.9).
consistency of Likelihood-Ratio Test (14.17).
contraction property of conditional expectation (9.7,h).
convergence in probability (13.5, A13.2).
convergence theorems for integrals: MON (5.3); Fatou (5.4); DOM (5.9); BDD (6.2, 13.6); for UI RVs (13.7).
convergence theorems for martingales: Main (11.5); for UI case (14.1); Upward (14.2); Downward (14.4).
d-system (A1.2).
differentiation under integral sign (A16.1).
distribution function for RV (3.10-3.11).
Dominated Convergence Theorem = DOM (5.9); conditional (9.7,g).
J.L. DOOB's Convergence Theorem (11.5); Decomposition (12.11); ℒ^p inequality (14.11); Optional Sampling Theorem (A14.3-14.4); Optional Stopping Theorem (10.10, A14.3); Submartingale Inequality (14.6); Upcrossing Lemma (11.2) - and much else!
Downward Theorem (14.4).
Dynkin's Lemma (A1.3).
events (Chapter 2): independent (4.1).
expectation (6.1): conditional (Chapter 9).
'extension of measures': uniqueness (1.6); existence (1.7).
extinction probability (0.4).
fair game (10.5): unfavourable (E4.7).
Fatou Lemmas: for sets (2.6,b), (2.7,c); for functions (5.4); conditional version (9.7,f).
filtered space, filtration (10.1).
filtering (15.6-15.9).
finite and σ-finite measures (1.5).
Forward Convergence Theorem for supermartingales (11.5).
Fubini's Theorem (8.2).
gambler's ruin (E10.7).
gambling strategy (10.6).
Hardy space H¹ (14.18).
harnesses (15.10-15.12).
hedging strategy (15.2).
Helly-Bray Lemma (17.4).
hitting times (10.12).
Hoeffding's inequality (E14.2).
Hölder's inequality (6.13).
Hunt's Lemma (E14.1).
independence: definitions (4.1); π-system criterion (4.2); and conditioning (9.7,k, 9.10).
independence and product measure (8.4).
inequalities: Azuma-Hoeffding (E14.2); Burkholder-Davis-Gundy (14.18); Chebyshev (7.3); Doob's ℒ^p (14.11); Hölder (6.13); Jensen (6.6), and in conditional form (9.7,h); Khinchine - see (14.8); Kolmogorov (14.6); Markov (6.4); Minkowski (6.14); Schwarz (6.8).
infinite products of probability measures (8.7, Chapter A9); Kakutani's Theorem on (14.12, 14.17).
integration (Chapter 5).
Jensen's inequality (6.6); conditional form (9.7,h).
Kakutani's Theorem on likelihood ratios (14.12, 14.17).
Kalman-Bucy filter (15.6-15.9).
A.N. KOLMOGOROV's Definition of Conditional Expectation (9.2); Inequality (14.6); Law of the Iterated Logarithm (A4.1, 14.7); Strong Law of Large Numbers (12.10, 14.5); Three-Series Theorem (12.5); Truncation Lemma (12.9); Zero-One (0-1) Law (4.11, 14.3).
Kronecker's Lemma (12.7).
Laplace transforms: inversion (E7.1); and weak convergence (E18.7).
law of random variable (3.9): joint laws (8.3).
least-squares-best predictor (9.4).
Lebesgue integral (Chapter 5).
Lebesgue measure = Leb (1.8, A1.9).
Lebesgue spaces ℒ^p, L^p (6.10).
P. LÉVY's Convergence Theorem for CFs (18.1); Downward Theorem for martingales (14.4); Extension of Borel-Cantelli Lemmas (12.15); Inversion formula for CFs (16.6); Upward Theorem for martingales (14.2).
Likelihood-Ratio Test, consistency of (14.17).
Mabinogion sheep (15.3-15.5).
Markov chain (4.8, 10.13).
Markov's inequality (6.4).
martingale (Chapters 10-15!): definition (10.3); Convergence Theorem (11.5); Optional-Stopping Theorem (10.9-10.10, A14.3); Optional-Sampling Theorem (Chapter A14).
martingale transform (10.6).
measurable function (3.1).
measurable space (1.1).
measure space (1.4).
Minkowski's inequality (6.14).
Moment Problem (E18.6).
monkey typing Shakespeare (4.9).
Monotone-Class Theorem (3.14, A1.3).
Monotone-Convergence Theorem: for sets (1.10); for functions (5.3, Chapter A5); conditional version (9.7,e).
narrow convergence - see weak convergence.
option pricing (15.2).
Optional-Sampling Theorem (Chapter A14).
Optional-Stopping Theorems (10.9-10.10, A14.3).
optional time - see stopping time (10.8).
orthogonal projection (6.11): and conditional expectation (9.4-9.5).
outer measures (A1.6).
π-system (1.6): Uniqueness Lemmas (1.6, 4.2).
Polya's urn (E10.1, E10.8).
previsible (= predictable) process (10.6).
probability density function = pdf (6.12); joint (8.3).
probability measure (1.5).
probability triple (2.1).
product measures (Chapter 8).
Pythagoras's Theorem (6.9).
Radon-Nikodym theorem (5.14, 14.13-14.14).
random signs (12.3).
random walk: hitting times (10.12, E10.7); on free group (EG.3-EG.4).
Record Problem (E4.3, E12.2, 18.5).
regular conditional probability (9.9).
Riemann integral (5.3).
sample path (4.8).
sample point (2.1).
sample space (2.1).
Schwarz inequality (6.8).
Star Trek problems (E10.10, E10.11, E12.3).
stopped process (10.9).
stopping times (10.8); associated σ-algebras (A14.1).
Strassen's Law of the Iterated Logarithm (A4.2).
Strong Laws (7.2, 12.10, 12.14, 14.5).
submartingales and supermartingales: definitions (10.3); convergence theorem (11.5); optional stopping (10.9-10.10); optional sampling (A14.4).
superharmonic functions for Markov chains (10.13).
symmetrization technique (12.4).
d-system (A1.2); π-system (1.6).
tail σ-algebra (4.10-4.12, 14.3).
Tchebycheff: = Chebyshev (7.3).
Three-Series Theorem (12.5).
tightness (17.5).
Tower Property (9.7,i).
Truncation Lemma (12.9).
uniform integrability (Chapter 13).
Upcrossing Lemma (11.1-11.2).
Uniqueness Lemma (1.6).
weak convergence (Chapter 17): and characteristic functions (18.1); and moments (E18.3-18.4); and Laplace transforms (E18.7).
Weierstrass approximation theorem (7.4).
Zero-one law = 0-1 law (4.11, 14.3).