Author: Kallenberg, O.

Tags: mathematics

ISBN: 0-387-95313-2

Year: 2002
                    Probability and its Applications


A Series of the Applied Probability Trust


Editors: J. Gani, C.C. Heyde, T.G. Kurtz


Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo





Probability and its Applications

Anderson: Continuous-Time Markov Chains.
Azencott/Dacunha-Castelle: Series of Irregular Observations.
Bass: Diffusions and Elliptic Operators.
Bass: Probabilistic Techniques in Analysis.
Choi: ARMA Model Identification.
de la Peña/Giné: Decoupling: From Dependence to Independence.
Galambos/Simonelli: Bonferroni-type Inequalities with Applications.
Gani (Editor): The Craft of Probabilistic Modelling.
Grandell: Aspects of Risk Theory.
Gut: Stopped Random Walks.
Guyon: Random Fields on a Network.
Kallenberg: Foundations of Modern Probability, Second Edition.
Last/Brandt: Marked Point Processes on the Real Line.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences and Processes.
Nualart: The Malliavin Calculus and Related Topics.
Rachev/Rüschendorf: Mass Transportation Problems. Volume I: Theory.
Rachev/Rüschendorf: Mass Transportation Problems. Volume II: Applications.
Resnick: Extreme Values, Regular Variation and Point Processes.
Shedler: Regeneration and Networks of Queues.
Thorisson: Coupling, Stationarity, and Regeneration.
Todorovic: An Introduction to Stochastic Processes and Their Applications.
Olav Kallenberg
Foundations of Modern Probability
Second Edition
Springer
Olav Kallenberg
Department of Mathematics
Auburn University
Auburn, AL 36849
USA

Series Editors:
J. Gani, Stochastic Analysis Group, CMA, Australian National University, Canberra, ACT 0200, Australia
C.C. Heyde, Stochastic Analysis Group, CMA, Australian National University, Canberra, ACT 0200, Australia
T.G. Kurtz, Department of Mathematics, University of Wisconsin, 480 Lincoln Drive, Madison, WI 53706, USA

Mathematics Subject Classification (2000): 60-01

Library of Congress Cataloging-in-Publication Data
Kallenberg, Olav.
Foundations of modern probability / Olav Kallenberg. - 2nd ed.
p. cm. - (Probability and its applications)
Includes bibliographical references and index.
ISBN 0-387-95313-2 (alk. paper)
1. Probabilities. I. Title. II. Springer series in statistics. Probability and its applications.
QA273.K285 2001
519.2--dc21 2001032816

Printed on acid-free paper.

© 2002 by the Applied Probability Trust. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Production managed by Allan Abrams; manufacturing supervised by Jerome Basma. Photocomposed pages prepared by the Bartlett Press. Printed and bound by Maple-Vail Book Manufacturing Group, York, PA. Printed in the United States of America.
9 8 7 6 5 4 3 2

ISBN 0-387-95313-2

Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Praise for the First Edition

"It is truly surprising how much material the author has managed to cover in the book. ... More advanced readers are likely to regard the book as an ideal reference. Indeed, the monograph has the potential to become a (possibly even 'the') major reference book on large parts of probability theory for the next decade or more."
- M. Scheutzow (Berlin)

"I am often asked by mathematicians ... for literature on 'a broad introduction to modern stochastics.' ... Due to this book, my task for answering is made easier. This is it! A concise, broad overview of the main results and techniques ... . From the table of contents it is difficult to believe that behind all these topics a streamlined, readable text is at all possible. It is: Convince yourself. I have no doubt that this text will become a classic. Its main feature of keeping the whole area of probability together and presenting a general overview is a real success. Scores of students ... and indeed researchers will be most grateful!"
- P.A.L. Embrechts (Zürich)

"The theory of probability has grown exponentially during the second half of the twentieth century, and the idea of writing a single volume that could serve as a general reference ... seems almost foolhardy. Yet this is precisely what Professor Kallenberg has attempted ... and he has accomplished it brilliantly. ... With regard to his primary goal, the author has been more successful than I would have imagined possible. It is astonishing that a single volume of just over five hundred pages could contain so much material presented with complete rigor, and still be at least formally self-contained. ... As a general reference for a good deal of modern probability theory [the book] is outstanding. It should have a place in the library of every probabilist. Professor Kallenberg set himself a very difficult task, and he should be congratulated for carrying it out so well."
- R.K. Getoor (La Jolla, California)

"This is a superbly written, high-level introduction to contemporary probability theory. In it, the advanced mathematics student will find basic information, presented in a uniform terminology and notation, essential to gaining access to much present-day research. ... I congratulate Professor Kallenberg on a noteworthy achievement."
- M.F. Neuts (Tucson, Arizona)

"This is a very modern, very ambitious, and very well-written book. The scope is greater than I would have thought possible in a book of this length. This is made possible by the extremely efficient treatment, particularly the proofs ... . [Kallenberg] has succeeded in his mammoth task beyond all reasonable expectations. I think this book is destined to become a modern classic."
- N.H. Bingham (London)
"Kallenberg has ably achieved [his] goal and presents all the important results and techniques that every probabilist should know. ... We do not doubt that the book ... will be widely used as material for advanced postgraduate courses and seminars on various topics in probability."
- jste, European Math. Soc. Newsletter

"This is a very well written book. ... Much effort must have been put into simplifying and streamlining proofs, and the results are quite impressive. ... I would highly recommend [the book] to anybody who wants a good concise reference text on several very important parts of modern probability theory. For a mathematical sciences library, such a book is a must."
- K. Borovkov (Melbourne)

"[This] is an unusual book about a wide range of probability and stochastic processes, written by a single excellent mathematician. ... The graduate student will definitely enjoy reading it, and for the researcher it will become a useful reference book and necessary tool for his or her work."
- T. Mikosch (Groningen)

"The author has succeeded in writing a text containing - in the spirit of Loève's Probability Theory - all the essential results that any probabilist needs to know. Like Loève's classic, this book will become a standard source of study and reference for students and researchers in probability theory."
- R. Kiesel (London)

"Kallenberg's present book would have to qualify as the assimilation of probability par excellence. It is a great edifice of material, clearly and ingeniously presented, without any nonmathematical distractions. Readers wishing to venture into it may do so with confidence that they are in very capable hands."
- F.B. Knight (Urbana, Illinois)

"The presentation of the material is characterized by a surprising clarity and precision. The author's overview over the various subfields of probability theory and his detailed knowledge are impressive. Through an activity over many years as a researcher, academic teacher, and editor, he has acquired a deep competence in many areas. Wherever one reads, all chapters are carefully worked through and brought in streamlined form. One can imagine what an enormous effort it has cost the author to reach this final state, though no signs of this are visible. His goal, as set forth in the preface, of giving clear and economical proofs of the included theorems has been achieved admirably. ... I can't recall that in recent times I have held in my hands a mathematics book so thoroughly worked through."
- H. Rost (Heidelberg)
Preface to the Second Edition

For this new edition the entire text has been carefully revised, and some portions are totally rewritten. More importantly, I have inserted more than a hundred pages of new material, in chapters on general measure and ergodic theory, the asymptotics of Markov processes, and large deviations. The expanded size has made it possible to give a self-contained treatment of the underlying measure theory and to include topics like multivariate and ratio ergodic theorems, shift coupling, Palm distributions, entropy and information, Harris recurrence, invariant measures, strong and weak ergodicity, Strassen's law of the iterated logarithm, and the basic large deviation results of Cramér, Sanov, Schilder, and Freidlin and Ventzel.

Unfortunately, the body of knowledge in probability theory keeps growing at an ever increasing rate, and I am painfully aware that I will never catch up in my efforts to survey the entire subject. Many areas are still totally beyond reach, and a comprehensive treatment of the more recent developments would require another volume or two. I am asking for the reader's patience and understanding.

Many colleagues have pointed out errors or provided helpful information. I am especially grateful for some valuable comments from Wlodzimierz Kuperberg, Michael Scheutzow, Josef Teichmann, and Hermann Thorisson. Some of the new material was presented in our probability seminar at Auburn, where I benefited from stimulating discussions with Bill Hudson, Ming Liao, Lisa Peterson, and Hussain Talibi. My greatest thanks are due, as always, to my wife Jinsoo, whose constant love and support have sustained and inspired me throughout many months of hard work.

Olav Kallenberg
March 2001
Preface to the First Edition

Some thirty years ago it was still possible, as Loève so ably demonstrated, to write a single book in probability theory containing practically everything worth knowing in the subject. The subsequent development has been explosive, and today a corresponding comprehensive coverage would require a whole library. Researchers and graduate students alike seem compelled to a rather extreme degree of specialization. As a result, the subject is threatened by disintegration into dozens or hundreds of subfields.

At the same time the interaction between the areas is livelier than ever, and there is a steadily growing core of key results and techniques that every probabilist needs to know, if only to read the literature in his or her own field. Thus, it seems essential that we all have at least a general overview of the whole area, and we should do what we can to keep the subject together. The present volume is an earnest attempt in that direction.

My original aim was to write a book about "everything." Various space and time constraints forced me to accept more modest and realistic goals for the project. Thus, "foundations" had to be understood in the narrower sense of the early 1970s, and there was no room for some of the more recent developments. I especially regret the omission of topics such as large deviations, Gibbs and Palm measures, interacting particle systems, stochastic differential geometry, Malliavin calculus, SPDEs, measure-valued diffusions, and branching and superprocesses. Clearly plenty of fundamental and intriguing material remains for a possible second volume.

Even with my more limited, revised ambitions, I had to be extremely selective in the choice of material. More importantly, it was necessary to look for the most economical approach to every result I did decide to include.
In the latter respect, I was surprised to see how much could actually be done to simplify and streamline proofs, often handed down through generations of textbook writers. My general preference has been for results conveying some new idea or relationship, whereas many propositions of a more technical nature have been omitted. In the same vein, I have avoided technical or computational proofs that give little insight into the proven results. This conforms with my conviction that the logical structure is what matters most in mathematics, even when applications are the ultimate goal.

Though the book is primarily intended as a general reference, it should also be useful for graduate and seminar courses on different levels, ranging from elementary to advanced. Thus, a first-year graduate course in measure-theoretic probability could be based on the first ten or so chapters, while the rest of the book will readily provide material for more advanced courses on various topics. Though the treatment is formally self-contained, as far as measure theory and probability are concerned, the text is intended for a rather sophisticated reader with at least some rudimentary knowledge of subjects like topology, functional analysis, and complex variables.
x Foundations of Modern Probability

My exposition is based on experiences from the numerous graduate and seminar courses I have been privileged to teach in Sweden and in the United States, ever since I was a graduate student myself. Over the years I have developed a personal approach to almost every topic, and even experts might find something of interest. Thus, many proofs may be new, and every chapter contains results that are not available in the standard textbook literature. It is my sincere hope that the book will convey some of the excitement I still feel for the subject, which is without a doubt (even apart from its utter usefulness) one of the richest and most beautiful areas of modern mathematics.

Notes and Acknowledgments: My first thanks are due to my numerous Swedish teachers, and especially to Peter Jagers, whose 1971 seminar opened my eyes to modern probability. The idea of this book was raised a few years later when the analysts at Gothenburg asked me to give a short lecture course on "probability for mathematicians." Although I objected to the title, the lectures were promptly delivered, and I became convinced of the project's feasibility. For many years afterward I had a faithful and enthusiastic audience in numerous courses on stochastic calculus, SDEs, and Markov processes. I am grateful for that learning opportunity and for the feedback and encouragement I received from colleagues and graduate students.

Inevitably I have benefited immensely from the heritage of countless authors, many of whom are not even listed in the bibliography. I have further been fortunate to know many prominent probabilists of our time, who have often inspired me through their scholarship and personal example. Two people, Klaus Matthes and Gopi Kallianpur, stand out as particularly important influences in connection with my numerous visits to Berlin and Chapel Hill, respectively.
The great Kai Lai Chung, my mentor and friend from recent years, offered penetrating comments on all aspects of the work: linguistic, historical, and mathematical. My colleague Ming Liao, always a stimulating partner for discussions, was kind enough to check my material on potential theory. Early versions of the manuscript were tested on several groups of graduate students, and Kamesh Casukhela, Davorin Dujmovic, and Hussain Talibi in particular were helpful in spotting misprints. Ulrich Albrecht and Ed Slaminka offered generous help with software problems. I am further grateful to John Kimmel, Karina Mikhli, and the Springer production team for their patience with my last-minute revisions and their truly professional handling of the project.

My greatest thanks go to my family, who is my constant source of happiness and inspiration. Without their love, encouragement, and understanding, this work would not have been possible.

Olav Kallenberg
May 1997
Contents

Preface to the Second Edition
Preface to the First Edition

1. Measure Theory - Basic Notions
   Measurable sets and functions; measures and integration; monotone and dominated convergence; transformation of integrals; product measures and Fubini's theorem; Lp-spaces and projection; approximation; measure spaces and kernels

2. Measure Theory - Key Results
   Outer measures and extension; Lebesgue and Lebesgue-Stieltjes measures; Jordan-Hahn and Lebesgue decompositions; Radon-Nikodym theorem; Lebesgue's differentiation theorem; functions of finite variation; Riesz' representation theorem; Haar and invariant measures

3. Processes, Distributions, and Independence
   Random elements and processes; distributions and expectation; independence; zero-one laws; Borel-Cantelli lemma; Bernoulli sequences and existence; moments and continuity of paths

4. Random Sequences, Series, and Averages
   Convergence in probability and in Lp; uniform integrability and tightness; convergence in distribution; convergence of random series; strong laws of large numbers; Portmanteau theorem; continuous mapping and approximation; coupling and measurability

5. Characteristic Functions and Classical Limit Theorems
   Uniqueness and continuity theorem; Poisson convergence; positive and symmetric terms; Lindeberg's condition; general Gaussian convergence; weak laws of large numbers; domain of Gaussian attraction; vague and weak compactness

6. Conditioning and Disintegration
   Conditional expectations and probabilities; regular conditional distributions; disintegration; conditional independence; transfer and coupling; existence of sequences and processes; extension through conditioning

7. Martingales and Optional Times
   Filtrations and optional times; random time-change; martingale property; optional stopping and sampling; maximum and upcrossing inequalities; martingale convergence, regularity, and closure; limits of conditional expectations; regularization of submartingales

8. Markov Processes and Discrete-Time Chains
   Markov property and transition kernels; finite-dimensional distributions and existence; space and time homogeneity; strong Markov property and excursions; invariant distributions and stationarity; recurrence and transience; ergodic behavior of irreducible chains; mean recurrence times

9. Random Walks and Renewal Theory
   Recurrence and transience; dependence on dimension; general recurrence criteria; symmetry and duality; Wiener-Hopf factorization; ladder time and height distribution; stationary renewal process; renewal theorem

10. Stationary Processes and Ergodic Theory
   Stationarity, invariance, and ergodicity; discrete- and continuous-time ergodic theorems; moment and maximum inequalities; multivariate ergodic theorems; sample intensity of a random measure; subadditivity and products of random matrices; conditioning and ergodic decomposition; shift coupling and the invariant σ-field

11. Special Notions of Symmetry and Invariance
   Palm distributions and inversion formulas; stationarity and cycle stationarity; local hitting and conditioning; ergodic properties of Palm measures; exchangeable sequences and processes; strong stationarity and predictable sampling; ballot theorems; entropy and information

12. Poisson and Pure Jump-Type Markov Processes
   Random measures and point processes; Cox processes, randomization, and thinning; mixed Poisson and binomial processes; independence and symmetry criteria; Markov transition and rate kernels; embedded Markov chains and explosion; compound and pseudo-Poisson processes; ergodic behavior of irreducible chains

13. Gaussian Processes and Brownian Motion
   Symmetries of Gaussian distribution; existence and path properties of Brownian motion; strong Markov and reflection properties; arcsine and uniform laws; law of the iterated logarithm; Wiener integrals and isonormal Gaussian processes; multiple Wiener-Itô integrals; chaos expansion of Brownian functionals

14. Skorohod Embedding and Invariance Principles
   Embedding of random variables; approximation of random walks; functional central limit theorem; laws of the iterated logarithm; arcsine laws; approximation of renewal processes; empirical distribution functions; embedding and approximation of martingales

15. Independent Increments and Infinite Divisibility
   Regularity and integral representation; Lévy processes and subordinators; stable processes and first-passage times; infinitely divisible distributions; characteristics and convergence criteria; approximation of Lévy processes and random walks; limit theorems for null arrays; convergence of extremes

16. Convergence of Random Processes, Measures, and Sets
   Relative compactness and tightness; uniform topology on C(K, S); Skorohod's J1-topology; equicontinuity and tightness; convergence of random measures; superposition and thinning; exchangeable sequences and processes; simple point processes and random closed sets

17. Stochastic Integrals and Quadratic Variation
   Continuous local martingales and semimartingales; quadratic variation and covariation; existence and basic properties of the integral; integration by parts and Itô's formula; Fisk-Stratonovich integral; approximation and uniqueness; random time-change; dependence on parameter

18. Continuous Martingales and Brownian Motion
   Real and complex exponential martingales; martingale characterization of Brownian motion; random time-change of martingales; integral representation of martingales; iterated and multiple integrals; change of measure and Girsanov's theorem; Cameron-Martin theorem; Wald's identity and Novikov's condition

19. Feller Processes and Semigroups
   Semigroups, resolvents, and generators; closure and core; Hille-Yosida theorem; existence and regularization; strong Markov property; characteristic operator; diffusions and elliptic operators; convergence and approximation

20. Ergodic Properties of Markov Processes
   Transition and contraction operators; ratio ergodic theorem; space-time invariance and tail triviality; mixing and convergence in total variation; Harris recurrence and transience; existence and uniqueness of invariant measure; distributional and pathwise limits

21. Stochastic Differential Equations and Martingale Problems
   Linear equations and Ornstein-Uhlenbeck processes; strong existence, uniqueness, and nonexplosion criteria; weak solutions and local martingale problems; well-posedness and measurability; pathwise uniqueness and functional solution; weak existence and continuity; transformation of SDEs; strong Markov and Feller properties

22. Local Time, Excursions, and Additive Functionals
   Tanaka's formula and semimartingale local time; occupation density, continuity and approximation; regenerative sets and processes; excursion local time and Poisson process; Ray-Knight theorem; excessive functions and additive functionals; local time at a regular point; additive functionals of Brownian motion

23. One-dimensional SDEs and Diffusions
   Weak existence and uniqueness; pathwise uniqueness and comparison; scale function and speed measure; time-change representation; boundary classification; entrance boundaries and Feller properties; ratio ergodic theorem; recurrence and ergodicity

24. Connections with PDEs and Potential Theory
   Backward equation and Feynman-Kac formula; uniqueness for SDEs from existence for PDEs; harmonic functions and Dirichlet's problem; Green functions as occupation densities; sweeping and equilibrium problems; dependence on conductor and domain; time reversal; capacities and random sets

25. Predictability, Compensation, and Excessive Functions
   Accessible and predictable times; natural and predictable processes; Doob-Meyer decomposition; quasi-left-continuity; compensation of random measures; excessive and superharmonic functions; additive functionals as compensators; Riesz decomposition

26. Semimartingales and General Stochastic Integration
   Predictable covariation and L2-integral; semimartingale integral and covariation; general substitution rule; Doléans' exponential and change of measure; norm and exponential inequalities; martingale integral; decomposition of semimartingales; quasi-martingales and stochastic integrators

27. Large Deviations
   Legendre-Fenchel transform; Cramér's and Schilder's theorems; large-deviation principle and rate function; functional form of the LDP; continuous mapping and extension; perturbation of dynamical systems; empirical processes and entropy; Strassen's law of the iterated logarithm

Appendices
A1. Advanced Measure Theory
   Polish and Borel spaces; measurable inverses; projection and sections
A2. Some Special Spaces
   Function spaces; measure spaces; spaces of closed sets; measure-valued functions; projective limits

Historical and Bibliographical Notes
Bibliography
Symbol Index
Author Index
Subject Index
Words of Wisdom and Folly

"A mathematician who argues from probabilities in geometry is not worth an ace"
- Socrates (on the demands of rigor in mathematics)

"[We will travel a road] full of interest of its own. It familiarizes us with the measurement of variability, and with curious laws of chance that apply to a vast diversity of social subjects"
- Francis Galton (on the wondrous world of probability)

"God doesn't play dice" [i.e., there is no randomness in the universe]
- Albert Einstein (on quantum mechanics and causality)

"It might be possible to prove certain theorems, but they would not be of any interest, since, in practice, one could not verify whether the assumptions are fulfilled"
- Émile Borel (on why bothering with probability)

"[The stated result] is a special case of a very general theorem [the strong Markov property]. The measure [theoretic] ideas involved are somewhat glossed over in the proof, in order to avoid complexities out of keeping with the rest of this paper"
- Joseph L. Doob (on why bothering with generality or mathematical rigor)

"Probability theory [has two hands]: On the right is the rigorous [technical work]; the left hand ... reduces problems to gambling situations, coin-tossing, motions of a physical particle"
- Leo Breiman (on probabilistic thinking)

"There are good taste and bad taste in mathematics just as in music, literature, or cuisine, and one who dabbles in it must stand judged thereby"
- Kai Lai Chung (on the art of writing mathematics)

"The traveler often has the choice between climbing a peak or using a cable car"
- William Feller (on the art of reading mathematics)

"A Catalogue Aria of triumphs is of less benefit [to the student] than an indication of the techniques by which such results are achieved"
- David Williams (on seduction and the art of discovery)

"One needs [for stochastic integration] a six months course [to cover only] the definitions. What is there to do?"
- Paul-André Meyer (on the dilemma of modern math education)

"There were very many [bones] in the open valley; and lo, they were very dry. And [God] said unto me, 'Son of man, can these bones live?' And I answered, 'O Lord, thou knowest.'"
- Ezekiel 37:2-3 (on the ultimate reward of hard studies, as quoted by Chris Rogers and David Williams)
Chapter 1
Measure Theory - Basic Notions

Measurable sets and functions; measures and integration; monotone and dominated convergence; transformation of integrals; product measures and Fubini's theorem; Lp-spaces and projection; approximation; measure spaces and kernels

Modern probability theory is technically a branch of measure theory, and any systematic exposition of the subject must begin with some basic measure-theoretic facts. In this chapter and its sequel we have collected some basic ideas and results from measure theory that will be useful throughout this book. Though most of the quoted propositions may be found in any textbook in real analysis, our emphasis is often somewhat different and has been chosen to suit our special needs. Many readers may prefer to omit these chapters on their first encounter and return for reference when the need arises.

To fix our notation, we begin with some elementary notions from set theory. For subsets A, Ak, B, … of some abstract space Ω, recall the definitions of union A ∪ B or ⋃k Ak, intersection A ∩ B or ⋂k Ak, complement Aᶜ, and difference A \ B = A ∩ Bᶜ. The latter is said to be proper if A ⊃ B. The symmetric difference of A and B is given by A Δ B = (A \ B) ∪ (B \ A). Among basic set relations, we note in particular the distributive laws

    A ∩ ⋃k Bk = ⋃k (A ∩ Bk),    A ∪ ⋂k Bk = ⋂k (A ∪ Bk),

and de Morgan's laws

    (⋃k Ak)ᶜ = ⋂k Akᶜ,    (⋂k Ak)ᶜ = ⋃k Akᶜ,

valid for arbitrary (not necessarily countable) unions and intersections. The latter formulas allow us to convert any relation involving unions (intersections) into the dual formula for intersections (unions).

We define a σ-algebra or σ-field in Ω as a nonempty collection A of subsets of Ω that is closed under countable unions and intersections as well as under complementation. (For a field, closure is required only under finite set operations.) Thus, if A, A₁, A₂, … ∈ A, then also Aᶜ, ⋃k Ak, and ⋂k Ak lie in A.
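The identities above are easy to sanity-check on finite sets. The sketch below (plain Python, with illustrative names not taken from the book) verifies the symmetric difference, a distributive law, and both de Morgan's laws on a small space Ω.

```python
# Finite sanity check of the set identities above.
# Omega, A, B, C are arbitrary illustrative choices.

Omega = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
C = {4, 7}

def complement(X, space=Omega):
    """Complement relative to the ambient space."""
    return space - X

# Symmetric difference: A Δ B = (A \ B) ∪ (B \ A); Python's ^ computes it.
assert (A - B) | (B - A) == A ^ B

# Distributive law: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
assert A & (B | C) == (A & B) | (A & C)

# De Morgan's laws: (A ∪ B ∪ C)ᶜ = Aᶜ ∩ Bᶜ ∩ Cᶜ, and dually
assert complement(A | B | C) == complement(A) & complement(B) & complement(C)
assert complement(A & B & C) == complement(A) | complement(B) | complement(C)

print("all identities hold")
```

Of course this checks only one finite instance; the laws themselves hold for arbitrary, even uncountable, index sets, as noted in the text.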
In particular, the whole space Ω and the empty set ∅ belong to every σ-field. In any space Ω there is a smallest σ-field {∅, Ω} and a largest one 2^Ω, the class of all subsets of Ω. Note that any σ-field A is closed under monotone limits. Thus, if A₁, A₂, … ∈ A with An ↑ A or An ↓ A,
then also A ∈ A. A measurable space is a pair (Ω, A), where Ω is a space and A is a σ-field in Ω.

For any class of σ-fields in Ω, the intersection (but usually not the union) is again a σ-field. If C is an arbitrary class of subsets of Ω, there is a smallest σ-field in Ω containing C, denoted by σ(C) and called the σ-field generated or induced by C. Note that σ(C) can be obtained as the intersection of all σ-fields in Ω that contain C. We endow a metric or topological space S with its Borel σ-field B(S) generated by the topology (class of open subsets) in S, unless a σ-field is otherwise specified. The elements of B(S) are called Borel sets. In the case of the real line ℝ, we often write B instead of B(ℝ).

More primitive classes than σ-fields often arise in applications. A class C of subsets of some space Ω is called a π-system if it is closed under finite intersections, so that A, B ∈ C implies A ∩ B ∈ C. Furthermore, a class D is a λ-system if it contains Ω and is closed under proper differences and increasing limits. Thus, we require that Ω ∈ D, that A, B ∈ D with A ⊃ B implies A \ B ∈ D, and that A₁, A₂, … ∈ D with An ↑ A implies A ∈ D.

The following monotone-class theorem is often useful to extend an established property or relation from a class C to the generated σ-field σ(C). An application of this result is referred to as a monotone-class argument.

Theorem 1.1 (monotone classes, Sierpiński) Let C be a π-system and D a λ-system in some space Ω such that C ⊂ D. Then σ(C) ⊂ D.

Proof: We may clearly assume that D = λ(C), the smallest λ-system containing C. It suffices to show that D is a π-system, since it is then a σ-field containing C and therefore contains the smallest σ-field σ(C) with this property. Thus, we need to show that A ∩ B ∈ D whenever A, B ∈ D.

The relation A ∩ B ∈ D is certainly true when A, B ∈ C, since C is a π-system contained in D. We proceed by extension in two steps.
First we fix any B ∈ C and define A_B = {A ⊂ Ω; A ∩ B ∈ D}. Then A_B is a λ-system containing C, and so it contains the smallest λ-system D with this property. This shows that A ∩ B ∈ D for any A ∈ D and B ∈ C. Next we fix any A ∈ D and define B_A = {B ⊂ Ω; A ∩ B ∈ D}. As before, we note that even B_A contains D, which yields the desired property. ∎

For any family of spaces Ω_t, t ∈ T, we define the Cartesian product X_{t∈T} Ω_t as the class of all collections (ω_t; t ∈ T), where ω_t ∈ Ω_t for all t. When T = {1, …, n} or T = ℕ = {1, 2, …}, we often write the product space as Ω₁ × ⋯ × Ωₙ or Ω₁ × Ω₂ × ⋯, respectively; if Ω_t = Ω for all t, we use the notation Ω^T, Ωⁿ, or Ω^∞. In case of topological spaces Ω_t, we endow X_t Ω_t with the product topology unless a topology is otherwise specified. Now assume that each space Ω_t is equipped with a σ-field A_t. In X_t Ω_t we may then introduce the product σ-field ⊗_t A_t, generated by all one-dimensional cylinder sets A_t × X_{s≠t} Ω_s, where t ∈ T and A_t ∈ A_t. (Note the analogy with the definition of product topologies.) As before, we write A₁ ⊗ ⋯ ⊗ Aₙ, A₁ ⊗ A₂ ⊗ ⋯, A^T, Aⁿ, or A^∞ in the appropriate special cases.
1. Measure Theory - Basic Notions

Lemma 1.2 (product and Borel σ-fields) If S₁, S₂, … are separable metric spaces, then

    B(S₁ × S₂ × ⋯) = B(S₁) ⊗ B(S₂) ⊗ ⋯ .

Thus, for countable products of separable metric spaces, the product and Borel σ-fields agree. In particular, B(ℝᵈ) = (B(ℝ))ᵈ = Bᵈ, the σ-field generated by all rectangular boxes I₁ × ⋯ × I_d, where I₁, …, I_d are arbitrary real intervals. This special case can also be proved directly.

Proof: The assertion may be written as σ(C₁) = σ(C₂), and it suffices to show that C₁ ⊂ σ(C₂) and C₂ ⊂ σ(C₁). For C₂ we may choose the class of all cylinder sets G_k × X_{n≠k} S_n with k ∈ ℕ and G_k open in S_k. Those sets generate the product topology in S = X_n S_n, and so they belong to B(S).

Conversely, we note that S = X_n S_n is again separable. Thus, for any topological base C in S, the open subsets of S are countable unions of sets in C. In particular, we may choose C to consist of all finite intersections of cylinder sets G_k × X_{n≠k} S_n as above. It remains to note that the latter sets lie in ⊗_n B(S_n). ∎

Every point mapping f between two spaces S and T induces a set mapping f⁻¹ in the opposite direction, that is, from 2^T to 2^S, given by

    f⁻¹B = {s ∈ S; f(s) ∈ B},  B ⊂ T.

Note that f⁻¹ preserves the basic set operations in the sense that, for any subsets B and B_k of T,

    f⁻¹Bᶜ = (f⁻¹B)ᶜ,  f⁻¹⋃_k B_k = ⋃_k f⁻¹B_k,  f⁻¹⋂_k B_k = ⋂_k f⁻¹B_k.  (1)

The next result shows that f⁻¹ also preserves σ-fields, in both directions. For convenience, we write

    f⁻¹C = {f⁻¹B; B ∈ C},  C ⊂ 2^T.

Lemma 1.3 (induced σ-fields) Let f be a mapping between two measurable spaces (S, 𝒮) and (T, 𝒯). Then
(i) 𝒮′ = f⁻¹𝒯 is a σ-field in S;
(ii) 𝒯′ = {B ⊂ T; f⁻¹B ∈ 𝒮} is a σ-field in T.

Proof: (i) Let A, A₁, A₂, … ∈ 𝒮′. Then there exist some sets B, B₁, B₂, … ∈ 𝒯 with A = f⁻¹B and Aₙ = f⁻¹Bₙ for each n.
Since 𝒯 is a σ-field, the sets Bᶜ, ⋃ₙ Bₙ, and ⋂ₙ Bₙ all belong to 𝒯, and by (1) we get

    Aᶜ = (f⁻¹B)ᶜ = f⁻¹Bᶜ ∈ f⁻¹𝒯 = 𝒮′,
    ⋃ₙ Aₙ = ⋃ₙ f⁻¹Bₙ = f⁻¹⋃ₙ Bₙ ∈ f⁻¹𝒯 = 𝒮′,
    ⋂ₙ Aₙ = ⋂ₙ f⁻¹Bₙ = f⁻¹⋂ₙ Bₙ ∈ f⁻¹𝒯 = 𝒮′.
(ii) Let B, B₁, B₂, … ∈ 𝒯′, so that f⁻¹B, f⁻¹B₁, f⁻¹B₂, … ∈ 𝒮. Using (1) and the fact that 𝒮 is a σ-field, we get

    f⁻¹Bᶜ = (f⁻¹B)ᶜ ∈ 𝒮,
    f⁻¹⋃ₙ Bₙ = ⋃ₙ f⁻¹Bₙ ∈ 𝒮,
    f⁻¹⋂ₙ Bₙ = ⋂ₙ f⁻¹Bₙ ∈ 𝒮,

which shows that Bᶜ, ⋃ₙ Bₙ, and ⋂ₙ Bₙ all lie in 𝒯′. ∎

Given two measurable spaces (S, 𝒮) and (T, 𝒯), a mapping f: S → T is said to be 𝒮/𝒯-measurable or simply measurable if f⁻¹𝒯 ⊂ 𝒮, that is, if f⁻¹B ∈ 𝒮 for every B ∈ 𝒯. (Note the analogy with the definition of continuity in terms of topologies on S and T.) By the next result, it is enough to verify the defining condition for a generating subclass.

Lemma 1.4 (measurable functions) Consider a mapping f between two measurable spaces (S, 𝒮) and (T, 𝒯), and let C ⊂ 2^T with σ(C) = 𝒯. Then f is 𝒮/𝒯-measurable iff f⁻¹C ⊂ 𝒮.

Proof: Let 𝒯′ = {B ⊂ T; f⁻¹B ∈ 𝒮}. Then C ⊂ 𝒯′ by hypothesis, and 𝒯′ is a σ-field by Lemma 1.3 (ii). Hence, 𝒯′ = σ(𝒯′) ⊃ σ(C) = 𝒯, which shows that f⁻¹B ∈ 𝒮 for all B ∈ 𝒯. ∎

Lemma 1.5 (continuity and measurability) Let f be a continuous mapping between two topological spaces S and T with Borel σ-fields 𝒮 and 𝒯. Then f is 𝒮/𝒯-measurable.

Proof: Let 𝒮′ and 𝒯′ denote the classes of open sets in S and T. Since f is continuous and 𝒮 = σ(𝒮′), we have f⁻¹𝒯′ ⊂ 𝒮′ ⊂ 𝒮. By Lemma 1.4 it follows that f is 𝒮/σ(𝒯′)-measurable. It remains to note that σ(𝒯′) = 𝒯. ∎

We insert a result about subspace topologies and σ-fields that will be needed in Chapter 16. Given a class C of subsets of S and a set A ⊂ S, we define A ∩ C = {A ∩ C; C ∈ C}.

Lemma 1.6 (subspaces) Fix a metric space (S, ρ) with topology 𝒯 and Borel σ-field 𝒮, and let A ⊂ S. Then (A, ρ) has topology 𝒯_A = A ∩ 𝒯 and Borel σ-field 𝒮_A = A ∩ 𝒮.

Proof: The natural embedding I_A: A → S is continuous and hence measurable, and so A ∩ 𝒯 = I_A⁻¹𝒯 ⊂ 𝒯_A and A ∩ 𝒮 = I_A⁻¹𝒮 ⊂ 𝒮_A. Conversely, given any B ∈ 𝒯_A, we may define G = (B ∪ Aᶜ)°, where the complement and interior are with respect to S, and note that B = A ∩ G. Hence, 𝒯_A ⊂ A ∩ 𝒯,
and therefore

    𝒮_A = σ(𝒯_A) ⊂ σ(A ∩ 𝒯) ⊂ σ(A ∩ 𝒮) = A ∩ 𝒮,

where the operation σ(·) refers to the subspace A. ∎

As with continuity, we note that even measurability is preserved by composition.

Lemma 1.7 (composition) Fix three measurable spaces (S, 𝒮), (T, 𝒯), and (U, 𝒰), and consider some measurable mappings f: S → T and g: T → U. Then the composition h = g ∘ f: S → U is again measurable.

Proof: Let C ∈ 𝒰, and note that B ≡ g⁻¹C ∈ 𝒯 since g is measurable. Noting that (g ∘ f)⁻¹ = f⁻¹ ∘ g⁻¹ and using the fact that even f is measurable, we get

    h⁻¹C = (g ∘ f)⁻¹C = f⁻¹g⁻¹C = f⁻¹B ∈ 𝒮. ∎

To state the next result, we note that any collection of functions f_t: Ω → S_t, t ∈ T, defines a mapping f = (f_t) from Ω to X_t S_t given by

    f(ω) = (f_t(ω); t ∈ T),  ω ∈ Ω.  (2)

It is often useful to relate the measurability of f to that of the coordinate mappings f_t.

Lemma 1.8 (collections of functions) Consider any set of functions f_t: Ω → S_t, t ∈ T, where (Ω, A) and (S_t, 𝒮_t), t ∈ T, are measurable spaces, and define f = (f_t): Ω → X_t S_t. Then f is A/⊗_t 𝒮_t-measurable iff f_t is A/𝒮_t-measurable for every t ∈ T.

Proof: We may use Lemma 1.4, with C equal to the class of cylinder sets A_t × X_{s≠t} S_s for arbitrary t ∈ T and A_t ∈ 𝒮_t. ∎

Changing our perspective, assume the f_t in (2) to be mappings into some measurable spaces (S_t, 𝒮_t). In Ω we may then introduce the generated or induced σ-field σ(f) = σ{f_t; t ∈ T}, defined as the smallest σ-field in Ω that makes all the f_t measurable. In other words, σ(f) is the intersection of all σ-fields A in Ω such that f_t is A/𝒮_t-measurable for every t ∈ T. In this notation, the functions f_t are clearly measurable with respect to a σ-field A in Ω iff σ(f) ⊂ A. It is further useful to note that σ(f) agrees with the σ-field in Ω generated by the collection {f_t⁻¹𝒮_t; t ∈ T}.

For functions on or into a Euclidean space ℝᵈ, measurability is understood to be with respect to the Borel σ-field Bᵈ.
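The set-operation identities in (1), on which Lemma 1.3 rests, can be checked mechanically on a small finite example. The sets S, T and the map f below are illustrative choices, not from the text; this is only a sketch of the abstract identities for finite sets.

```python
# Sanity check of (1): the preimage map f^{-1} commutes with
# complement, union, and intersection.
def preimage(f, S, B):
    """Return f^{-1}B = {s in S; f(s) in B}."""
    return {s for s in S if f(s) in B}

S = {0, 1, 2, 3, 4, 5}
T = {0, 1, 2}
f = lambda s: s % 3          # an arbitrary map S -> T

B1, B2 = {0, 1}, {1, 2}

# Complement: f^{-1}(B^c) = (f^{-1}B)^c
assert preimage(f, S, T - B1) == S - preimage(f, S, B1)
# Union: f^{-1}(B1 ∪ B2) = f^{-1}B1 ∪ f^{-1}B2
assert preimage(f, S, B1 | B2) == preimage(f, S, B1) | preimage(f, S, B2)
# Intersection: f^{-1}(B1 ∩ B2) = f^{-1}B1 ∩ f^{-1}B2
assert preimage(f, S, B1 & B2) == preimage(f, S, B1) & preimage(f, S, B2)
```

The same three identities are exactly what makes f⁻¹𝒯 a σ-field in Lemma 1.3 (i).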
Thus, a real-valued function f on some measurable space (Ω, A) is measurable iff {ω; f(ω) ≤ x} ∈ A for all x ∈ ℝ. The same convention applies to functions into the extended real line ℝ̄ = [−∞, ∞] or the extended half-line ℝ̄₊ = [0, ∞], regarded as compactifications of ℝ and ℝ₊ = [0, ∞), respectively. Note that B(ℝ̄) = σ{B, ±∞} and B(ℝ̄₊) = σ{B(ℝ₊), ∞}.

For any set A ⊂ Ω, we define the associated indicator function 1_A: Ω → ℝ to be equal to 1 on A and to 0 on Aᶜ. (The term characteristic function has
a different meaning in probability theory.) For sets A = {ω; f(ω) ∈ B}, it is often convenient to write 1{f ∈ B} instead of 1_A. Assuming A to be a σ-field in Ω, we note that 1_A is A-measurable iff A ∈ A.

Linear combinations of indicator functions are called simple functions. Thus, a general simple function f: Ω → ℝ has the form

    f = c₁1_{A₁} + ⋯ + cₙ1_{Aₙ},

where n ∈ ℤ₊ = {0, 1, …}, c₁, …, cₙ ∈ ℝ, and A₁, …, Aₙ ⊂ Ω. Here we may clearly take c₁, …, cₙ to be the distinct nonzero values attained by f and define A_k = f⁻¹{c_k}, k = 1, …, n. With this choice of representation, we note that f is measurable with respect to a given σ-field A in Ω iff A₁, …, Aₙ ∈ A.

We proceed to show that the class of measurable functions is closed under the basic finite or countable operations occurring in analysis.

Lemma 1.9 (bounds and limits) Let f₁, f₂, … be measurable functions from some measurable space (Ω, A) into ℝ̄. Then sup_n f_n, inf_n f_n, limsup_n f_n, and liminf_n f_n are again measurable.

Proof: To see that sup_n f_n is measurable, write

    {ω; sup_n f_n(ω) ≤ t} = ⋂ₙ {ω; f_n(ω) ≤ t} = ⋂ₙ f_n⁻¹[−∞, t] ∈ A,

and use Lemma 1.4. The measurability of the other three functions follows easily if we write inf_n f_n = −sup_n(−f_n) and note that

    limsup_{n→∞} f_n = inf_n sup_{k≥n} f_k,  liminf_{n→∞} f_n = sup_n inf_{k≥n} f_k. ∎

Since f_n → f iff limsup_n f_n = liminf_n f_n = f, it follows easily that both the set of convergence and the possible limit are measurable. The next result gives an extension to functions with values in more general spaces.

Lemma 1.10 (convergence and limits) Let f₁, f₂, … be measurable functions from a measurable space (Ω, A) into some metric space (S, ρ). Then
(i) {ω; f_n(ω) converges} ∈ A when S is complete;
(ii) f_n → f on Ω implies that f is measurable.

Proof: (i) Since S is complete, the convergence of f_n is equivalent to the Cauchy convergence

    lim_{n→∞} sup_{m≥n} ρ(f_m, f_n) = 0.
Here the left-hand side is measurable by Lemmas 1.5 and 1.9.

(ii) If f_n → f, we have g ∘ f_n → g ∘ f for any continuous function g: S → ℝ, and so g ∘ f is measurable by Lemmas 1.5 and 1.9. Fixing any open set G ⊂ S, we may choose some continuous functions g₁, g₂, …: S → ℝ₊ with g_n ↑ 1_G and conclude from Lemma 1.9 that 1_G ∘ f is measurable. Thus, f⁻¹G ∈ A for every open G ⊂ S, and so f is measurable by Lemma 1.4. ∎
Many results in measure theory are proved by a simple approximation, based on the following observation.

Lemma 1.11 (approximation) For any measurable function f: (Ω, A) → ℝ̄₊, there exist some simple measurable functions f₁, f₂, …: Ω → ℝ₊ with 0 ≤ f_n ↑ f.

Proof: We may define

    f_n(ω) = 2⁻ⁿ⌊2ⁿf(ω)⌋ ∧ n,  ω ∈ Ω, n ∈ ℕ. ∎

To illustrate the method, we may use the last lemma to prove the measurability of the basic arithmetic operations.

Lemma 1.12 (elementary operations) Fix any measurable functions f, g: (Ω, A) → ℝ and constants a, b ∈ ℝ. Then af + bg and fg are again measurable, and so is f/g when g ≠ 0 on Ω.

Proof: By Lemma 1.11 applied to f± = (±f) ∨ 0 and g± = (±g) ∨ 0, we may approximate by simple measurable functions f_n → f and g_n → g. Here af_n + bg_n and f_n g_n are again simple measurable functions. Since they converge to af + bg and fg, respectively, even the latter functions are measurable by Lemma 1.9. The same argument applies to the ratio f/g, provided we choose g_n ≠ 0.

An alternative argument is to write af + bg, fg, or f/g as a composition ψ ∘ φ, where φ = (f, g): Ω → ℝ², and ψ(x, y) is defined as ax + by, xy, or x/y, respectively. The desired measurability then follows by Lemmas 1.2, 1.5, and 1.8. In the case of ratios, we may use the continuity of the mapping (x, y) ↦ x/y on ℝ × (ℝ \ {0}). ∎

For many statements in measure theory and probability, it is convenient first to give a proof for the real line and then to extend the result to more general spaces. In this context, it is useful to identify pairs of measurable spaces S and T that are Borel isomorphic, in the sense that there exists a bijection f: S → T such that both f and f⁻¹ are measurable. A space S that is Borel isomorphic to a Borel subset of [0, 1] is called a Borel space. In particular, any Polish space endowed with its Borel σ-field is known to be a Borel space (cf. Theorem A1.2).
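The explicit dyadic approximation in the proof of Lemma 1.11, fₙ = 2⁻ⁿ⌊2ⁿf⌋ ∧ n, can be examined numerically. A minimal sketch, with an arbitrary illustrative target function f:

```python
import math

def f_n(f, n):
    """The dyadic approximant f_n = 2^{-n} * floor(2^n f) ∧ n."""
    return lambda w: min(math.floor((2 ** n) * f(w)) / (2 ** n), n)

f = lambda w: w ** 2 + 0.3           # any measurable f >= 0 will do
points = [0.0, 0.5, 1.1, 2.7]

for w in points:
    vals = [f_n(f, n)(w) for n in range(1, 30)]
    # 0 <= f_n <= f: each approximant undershoots the target
    assert all(0 <= v <= f(w) for v in vals)
    # f_n is nondecreasing in n (both the dyadic floor and the cap grow)
    assert all(a <= b for a, b in zip(vals, vals[1:]))
    # f_n(w) -> f(w): the error is at most 2^{-n} once n exceeds f(w)
    assert abs(vals[-1] - f(w)) < 1e-6
```

Each fₙ takes at most finitely many values (the dyadics k·2⁻ⁿ up to n), so it is simple, and the three asserted properties are exactly 0 ≤ fₙ ↑ f.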
(Recall that a topological space is said to be Polish if it admits a separable and complete metrization.)

The next result gives a useful functional representation of measurable functions. Given any two functions f and g on the same space Ω, we say that f is g-measurable if the induced σ-fields are related by σ(f) ⊂ σ(g).

Lemma 1.13 (functional representation, Doob) Fix two measurable functions f and g from a space Ω into some measurable spaces (S, 𝒮) and (T, 𝒯), where the former is Borel. Then f is g-measurable iff there exists some measurable mapping h: T → S with f = h ∘ g.

Proof: Since S is Borel, we may assume that S ∈ B([0, 1]). By a suitable modification of h, we may further reduce to the case when S = [0, 1]. If
f = 1_A with a g-measurable A ⊂ Ω, then by Lemma 1.3 there exists some set B ∈ 𝒯 with A = g⁻¹B. In this case f = 1_A = 1_B ∘ g, and we may choose h = 1_B. The result extends by linearity to any simple g-measurable function f. In the general case, there exist by Lemma 1.11 some simple g-measurable functions f₁, f₂, … with 0 ≤ f_n ↑ f, and we may choose associated 𝒯-measurable functions h₁, h₂, …: T → [0, 1] with f_n = h_n ∘ g. Then h = sup_n h_n is again 𝒯-measurable by Lemma 1.9, and we note that

    h ∘ g = (sup_n h_n) ∘ g = sup_n (h_n ∘ g) = sup_n f_n = f. ∎

Given any measurable space (Ω, A), a function μ: A → ℝ̄₊ is said to be countably additive if

    μ ⋃_{k≥1} A_k = Σ_{k≥1} μA_k,  A₁, A₂, … ∈ A disjoint.  (3)

A measure on (Ω, A) is defined as a function μ: A → ℝ̄₊ with μ∅ = 0 and satisfying (3). A triple (Ω, A, μ) as above, where μ is a measure, is called a measure space. From (3) we note that any measure is finitely additive and nondecreasing. This implies in turn the countable subadditivity

    μ ⋃_{k≥1} A_k ≤ Σ_{k≥1} μA_k,  A₁, A₂, … ∈ A.

We note the following basic continuity properties.

Lemma 1.14 (continuity) Let μ be a measure on (Ω, A), and assume that A₁, A₂, … ∈ A. Then
(i) Aₙ ↑ A implies μAₙ ↑ μA;
(ii) Aₙ ↓ A with μA₁ < ∞ implies μAₙ ↓ μA.

Proof: For (i) we may apply (3) to the differences Dₙ = Aₙ \ Aₙ₋₁ with A₀ = ∅. To get (ii), apply (i) to the sets Bₙ = A₁ \ Aₙ. ∎

The simplest measures on a measurable space (Ω, A) are the unit masses or Dirac measures δₓ, x ∈ Ω, given by δₓA = 1_A(x). For any countable set A = {x₁, x₂, …}, we may form the associated counting measure μ = Σₙ δ_{xₙ}. More generally, we may form countable linear combinations of arbitrary measures on Ω, as follows.

Proposition 1.15 (series of measures) For any measures μ₁, μ₂, … on (Ω, A) and constants c₁, c₂, … ≥ 0, the sum μ = Σₙ cₙμₙ is again a measure.
Proof: We need the fact that, for any array of constants c_{ij} ≥ 0, i, j ∈ ℕ,

    Σᵢ Σⱼ c_{ij} = Σⱼ Σᵢ c_{ij}.  (4)

This is trivially true for finite sums. In general, let m, n ∈ ℕ and write

    Σᵢ Σⱼ c_{ij} ≥ Σ_{i≤m} Σ_{j≤n} c_{ij} = Σ_{j≤n} Σ_{i≤m} c_{ij}.
Letting m → ∞ and then n → ∞, we obtain (4) with the inequality ≥. The same argument yields the reverse relation, and the equality follows.

Now consider any disjoint sets A₁, A₂, … ∈ A. Using (4) and the countable additivity of each μₙ, we get

    μ ⋃ₖ Aₖ = Σₙ cₙ μₙ ⋃ₖ Aₖ = Σₙ Σₖ cₙ μₙ Aₖ = Σₖ Σₙ cₙ μₙ Aₖ = Σₖ μAₖ. ∎

The last result may be restated in terms of monotone sequences.

Corollary 1.16 (monotone limits) Let μ₁, μ₂, … be measures on a measurable space (Ω, A) such that either μₙ ↑ μ, or μₙ ↓ μ with μ₁ bounded. Then μ is again a measure on (Ω, A).

Proof: In the increasing case, we may apply Proposition 1.15 to the sum μ = Σₙ (μₙ − μₙ₋₁), where μ₀ = 0. For decreasing sequences, the previous case applies to the increasing measures μ₁ − μₙ. ∎

For any measure μ on (Ω, A) and set B ∈ A, the function ν: A ↦ μ(A ∩ B) is again a measure on (Ω, A), called the restriction of μ to B. Given any countable partition of Ω into disjoint sets A₁, A₂, … ∈ A, we note that μ = Σₙ μₙ, where μₙ denotes the restriction of μ to Aₙ. The measure μ is said to be σ-finite if the partition can be chosen such that μAₙ < ∞ for all n. In that case the restrictions μₙ are clearly bounded.

A measure μ on some topological space S with Borel σ-field 𝒮 is said to be locally finite if every point s ∈ S has a neighborhood where μ is finite. A locally finite measure on a σ-compact space is clearly σ-finite.

It is often useful to identify simple measure-determining classes C ⊂ 𝒮, such that a measure on 𝒮 is uniquely determined by its values on C. For locally finite measures on a Euclidean space ℝᵈ, we may take C to be the class of all bounded rectangles.

Lemma 1.17 (uniqueness) Let μ and ν be bounded measures on some measurable space (Ω, A), and let C be a π-system in Ω such that Ω ∈ C and σ(C) = A. Then μ = ν iff μA = νA for all A ∈ C.
Proof: Assuming μ = ν on C, let D denote the class of sets A ∈ A with μA = νA. Using the condition Ω ∈ C, the finite additivity of μ and ν, and Lemma 1.14, we see that D is a λ-system. Moreover, C ⊂ D by hypothesis. Hence, Theorem 1.1 yields D ⊃ σ(C) = A, which means that μ = ν. The converse assertion is obvious. ∎

For any measure μ on a topological space S, the support supp μ is defined as the smallest closed set F ⊂ S with μFᶜ = 0. If |supp μ| ≤ 1, then μ is said to be degenerate, and we note that μ = cδₛ for some s ∈ S and c ≥ 0. More generally, a measure μ is said to have an atom at s ∈ S if {s} ∈ 𝒮 and μ{s} > 0. For any locally finite measure μ on some σ-compact metric space S, the set A = {s ∈ S; μ{s} > 0} is clearly measurable, and we may define the atomic and diffuse components μₐ and μ_d of μ as the restrictions
of μ to A and its complement. We further say that μ is diffuse if μₐ = 0 and purely atomic if μ_d = 0.

In the important special case when μ is locally finite and integer valued, the set A above is clearly locally finite and hence closed. By Lemma 1.14 we further have supp μ ⊂ A, and so μ is purely atomic. Hence, in this case μ = Σ_{s∈A} cₛδₛ for some integers cₛ. In particular, μ is said to be simple if cₛ = 1 for all s ∈ A. Then clearly μ agrees with the counting measure on its support A.

Any measurable mapping f between two measurable spaces (S, 𝒮) and (T, 𝒯) induces a mapping of measures on 𝒮 into measures on 𝒯. More precisely, given any measure μ on (S, 𝒮), we may define a measure μ ∘ f⁻¹ on (T, 𝒯) by

    (μ ∘ f⁻¹)B = μ(f⁻¹B) = μ{s ∈ S; f(s) ∈ B},  B ∈ 𝒯.

Here the countable additivity of μ ∘ f⁻¹ follows from that of μ together with the fact that f⁻¹ preserves unions and intersections.

Our next aim is to define the integral

    μf = ∫ f dμ = ∫ f(ω) μ(dω)

of a real-valued, measurable function f on some measure space (Ω, A, μ). First assume that f is simple and nonnegative, hence of the form c₁1_{A₁} + ⋯ + cₙ1_{Aₙ} for some n ∈ ℤ₊, A₁, …, Aₙ ∈ A, and c₁, …, cₙ ∈ ℝ₊, and define

    μf = c₁μA₁ + ⋯ + cₙμAₙ.

(Throughout measure theory we are following the convention 0 · ∞ = 0.) Using the finite additivity of μ, it is easy to verify that μf is independent of the choice of representation of f. It is further clear that the mapping f ↦ μf is linear and nondecreasing, in the sense that

    μ(af + bg) = aμf + bμg,  a, b ≥ 0,
    f ≤ g  ⇒  μf ≤ μg.

To extend the integral to any nonnegative measurable function f, we may choose as in Lemma 1.11 some simple measurable functions f₁, f₂, … with 0 ≤ fₙ ↑ f, and define μf = limₙ μfₙ. The following result shows that the limit is independent of the choice of approximating sequence (fₙ).
Lemma 1.18 (consistency) Fix any measurable function f ≥ 0 on some measure space (Ω, A, μ), and let f₁, f₂, … and g be simple measurable functions satisfying 0 ≤ fₙ ↑ f and 0 ≤ g ≤ f. Then limₙ μfₙ ≥ μg.

Proof: By the linearity of μ, it is enough to consider the case when g = 1_A for some A ∈ A. Then fix any ε > 0, and define

    Aₙ = {ω ∈ A; fₙ(ω) > 1 − ε},  n ∈ ℕ.
Here Aₙ ↑ A, and so

    μfₙ ≥ (1 − ε)μAₙ ↑ (1 − ε)μA = (1 − ε)μg.

It remains to let ε → 0. ∎

The linearity and monotonicity extend immediately to arbitrary f ≥ 0, since if fₙ ↑ f and gₙ ↑ g, then afₙ + bgₙ ↑ af + bg, and if also f ≤ g, then fₙ ≤ (fₙ ∨ gₙ) ↑ g. We are now ready to prove the basic continuity property of the integral.

Theorem 1.19 (monotone convergence, Levi) Let f, f₁, f₂, … be measurable functions on (Ω, A, μ) with 0 ≤ fₙ ↑ f. Then μfₙ ↑ μf.

Proof: For each n we may choose some simple measurable functions gₙₖ with 0 ≤ gₙₖ ↑ fₙ as k → ∞. The functions hₙₖ = g₁ₖ ∨ ⋯ ∨ gₙₖ have the same properties and are further nondecreasing in both indices. Hence,

    f ≥ lim_{k→∞} hₖₖ ≥ lim_{k→∞} hₙₖ = fₙ ↑ f,

and so 0 ≤ hₖₖ ↑ f. Using the definition and monotonicity of the integral, we obtain

    μf = lim_{k→∞} μhₖₖ ≤ lim_{k→∞} μfₖ ≤ μf. ∎

The last result leads to the following key inequality.

Lemma 1.20 (Fatou) For any measurable functions f₁, f₂, … ≥ 0 on (Ω, A, μ), we have

    liminf_{n→∞} μfₙ ≥ μ liminf_{n→∞} fₙ.

Proof: Since fₘ ≥ inf_{k≥n} fₖ for all m ≥ n, we have

    inf_{k≥n} μfₖ ≥ μ inf_{k≥n} fₖ,  n ∈ ℕ.

Letting n → ∞, we get by Theorem 1.19

    liminf_{k→∞} μfₖ ≥ lim_{n→∞} μ inf_{k≥n} fₖ = μ liminf_{k→∞} fₖ. ∎

A measurable function f on (Ω, A, μ) is said to be integrable if μ|f| < ∞. In that case f may be written as the difference of two nonnegative, integrable functions g and h (e.g., as f₊ − f₋, where f± = (±f) ∨ 0), and we may define μf = μg − μh. It is easy to check that the extended integral is independent of the choice of representation f = g − h and that μf satisfies the basic linearity and monotonicity properties (the former with arbitrary real coefficients).

We are now ready to state the basic condition that allows us to take limits under the integral sign. For gₙ ≡ g the result reduces to Lebesgue's dominated convergence theorem, a key result in analysis.
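Before stating it, note that the inequality in Fatou's lemma can be strict without some domination hypothesis. A minimal discrete sketch with the counting measure, using "escaping mass" fₙ = 1_{{n}} (a standard illustrative example, truncated to a finite set):

```python
# Fatou on the counting measure over {0,...,N-1}: with f_n = 1_{{n}},
# each integral mu(f_n) = 1, but the pointwise liminf is 0, so
# 0 = mu(liminf f_n) < liminf mu(f_n) = 1.
N = 100

def f(n):                        # f_n represented as a vector on {0,...,N-1}
    return [1.0 if k == n else 0.0 for k in range(N)]

mu = lambda g: sum(g)            # counting-measure integral

integrals = [mu(f(n)) for n in range(N)]
assert all(v == 1.0 for v in integrals)          # liminf_n mu(f_n) = 1

# pointwise liminf over the (finite) index range: min over n
pointwise_liminf = [min(f(n)[k] for n in range(N)) for k in range(N)]
assert mu(pointwise_liminf) == 0.0               # mu(liminf_n f_n) = 0
```

The mass of fₙ "travels to infinity," which Fatou's one-sided inequality tolerates but an equality could not; dominated convergence below rules this out by requiring |fₙ| ≤ gₙ with convergent μgₙ.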
Theorem 1.21 (dominated convergence, Lebesgue) Let f, f₁, f₂, … and g, g₁, g₂, … be measurable functions on (Ω, A, μ) with |fₙ| ≤ gₙ for all n, and such that fₙ → f, gₙ → g, and μgₙ → μg < ∞. Then μfₙ → μf.

Proof: Applying Fatou's lemma to the functions gₙ ± fₙ ≥ 0, we get

    μg + liminf_{n→∞} (±μfₙ) = liminf_{n→∞} μ(gₙ ± fₙ) ≥ μ(g ± f) = μg ± μf.

Subtracting μg < ∞ from each side gives

    μf ≤ liminf_{n→∞} μfₙ ≤ limsup_{n→∞} μfₙ ≤ μf. ∎

The next result shows how integrals are transformed by measurable mappings.

Lemma 1.22 (substitution) Consider a measure space (Ω, A, μ), a measurable space (S, 𝒮), and two measurable mappings f: Ω → S and g: S → ℝ̄. Then

    μ(g ∘ f) = (μ ∘ f⁻¹)g  (5)

whenever either side exists. (In other words, if one side exists, then so does the other and the two are equal.)

Proof: If g is an indicator function, then (5) reduces to the definition of μ ∘ f⁻¹. From here on we may extend by linearity and monotone convergence to any measurable function g ≥ 0. For general g it follows that μ|g ∘ f| = (μ ∘ f⁻¹)|g|, and so the integrals in (5) exist at the same time. When they do, we get (5) by taking differences on both sides. ∎

Turning to the other basic transformation of measures and integrals, fix any measurable function f ≥ 0 on some measure space (Ω, A, μ), and define a function f · μ on A by

    (f · μ)A = μ(1_A f) = ∫_A f dμ,  A ∈ A,

where the last relation defines the integral over a set A. It is easy to check that ν = f · μ is again a measure on (Ω, A). Here f is referred to as the μ-density of ν. The corresponding transformation rule is as follows.

Lemma 1.23 (chain rule) For any measure space (Ω, A, μ) and measurable functions f: Ω → ℝ₊ and g: Ω → ℝ̄, we have

    μ(fg) = (f · μ)g

whenever either side exists.

Proof: As in the last proof, we may begin with the case when g is an indicator function and then extend in steps to the general case. ∎
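For a measure with finitely many atoms, the substitution rule (5) of Lemma 1.22 reduces to regrouping a finite sum, which can be checked directly. The weights and maps below are illustrative choices:

```python
# Substitution for a discrete measure: mu(g ∘ f) = (mu ∘ f^{-1})g.
from collections import defaultdict

mu = {0: 0.2, 1: 0.5, 2: 0.3}        # a measure on S = {0, 1, 2}
f = lambda s: s % 2                  # measurable f: S -> T = {0, 1}
g = lambda t: 10.0 if t == 0 else 1.0

# Left side of (5): integrate g ∘ f against mu
lhs = sum(w * g(f(s)) for s, w in mu.items())

# Right side: first form the image measure mu ∘ f^{-1} on T,
# then integrate g against it
image = defaultdict(float)
for s, w in mu.items():
    image[f(s)] += w                 # mass of f^{-1}{t} collected at t
rhs = sum(w * g(t) for t, w in image.items())

assert abs(lhs - rhs) < 1e-12
```

Both sides group the same products μ{s}·g(f(s)); the image measure simply collects the mass of each fiber f⁻¹{t} before g is applied.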
Given a measure space (Ω, A, μ), a set A ∈ A is said to be μ-null or simply null if μA = 0. A relation between functions on Ω is said to hold almost everywhere with respect to μ (abbreviated as a.e. μ or μ-a.e.) if it
holds for all ω ∈ Ω outside some μ-null set. The following frequently used result explains the relevance of null sets.

Lemma 1.24 (null sets and functions) For any measurable function f ≥ 0 on some measure space (Ω, A, μ), we have μf = 0 iff f = 0 a.e. μ.

Proof: The statement is obvious when f is simple. In the general case, we may choose some simple measurable functions fₙ with 0 ≤ fₙ ↑ f, and note that f = 0 a.e. iff fₙ = 0 a.e. for every n, that is, iff μfₙ = 0 for all n. Here the latter integrals converge to μf, and so the last condition is equivalent to μf = 0. ∎

The last result shows that two integrals agree when the integrands are a.e. equal. We may then allow integrands that are undefined on some μ-null set. It is also clear that the conclusions of Theorems 1.19 and 1.21 remain valid if the hypotheses are only fulfilled outside some null set.

In the other direction, we note that if two σ-finite measures μ and ν are related by ν = f · μ for some density f, then the latter is μ-a.e. unique, which justifies the notation f = dν/dμ. It is further clear that any μ-null set is also a null set for ν. For measures μ and ν with the latter property, we say that ν is absolutely continuous with respect to μ and write ν ≪ μ. The other extreme case is when μ and ν are mutually singular or orthogonal (written as μ ⊥ ν), in the sense that μA = 0 and νAᶜ = 0 for some set A ∈ A.

Given a measure space (Ω, A, μ) and a σ-field F ⊂ A, we define the μ-completion of F in A as the σ-field F^μ = σ(F, N_μ), where N_μ denotes the class of all subsets of arbitrary μ-null sets in A. The description of F^μ can be made more explicit, as follows.

Lemma 1.25 (completion) Consider a measure space (Ω, A, μ), a σ-field F ⊂ A, and a Borel space (S, 𝒮). Then a function f: Ω → S is F^μ-measurable iff there exists some F-measurable function g satisfying f = g a.e. μ.
Proof: Beginning with indicator functions, let G be the class of subsets A ⊂ Ω such that A △ B ∈ N_μ for some B ∈ F. Then A \ B and B \ A are again in N_μ, which implies G ⊂ F^μ. Conversely, F^μ ⊂ G, since G is a σ-field and both F and N_μ are trivially contained in G. Combining the two relations gives G = F^μ, which shows that A ∈ F^μ iff 1_A = 1_B a.e. for some B ∈ F.

In the general case, we may clearly assume that S = [0, 1]. For any F^μ-measurable function f, we may then choose some simple F^μ-measurable functions fₙ such that 0 ≤ fₙ ↑ f. By the result for indicator functions, we may next choose some simple F-measurable functions gₙ such that fₙ = gₙ a.e. for each n. Since a countable union of null sets is again a null set, the function g = limsup_n gₙ has the desired property. ∎

Any measure μ on (Ω, A) has a unique extension to the σ-field A^μ. Indeed, for any A ∈ A^μ there exist by Lemma 1.25 some sets A± ∈ A with
A₋ ⊂ A ⊂ A₊ and μ(A₊ \ A₋) = 0, and any extension must satisfy μA = μA±. With this choice, it is easy to check that μ remains a measure on A^μ.

Our next aims are to construct product measures and to establish the basic condition for changing the order of integration. This requires a preliminary technical lemma.

Lemma 1.26 (sections) Fix two measurable spaces (S, 𝒮) and (T, 𝒯), a measurable function f: S × T → ℝ̄₊, and a σ-finite measure μ on 𝒮. Then
(i) f(s, t) is 𝒮-measurable in s ∈ S for each t ∈ T;
(ii) ∫ f(s, t) μ(ds) is 𝒯-measurable in t ∈ T.

Proof: We may assume that μ is bounded. Both statements are obvious when f = 1_A with A = B × C for some B ∈ 𝒮 and C ∈ 𝒯, and they extend by a monotone class argument to any indicator functions of sets in 𝒮 ⊗ 𝒯. The general case follows by linearity and monotone convergence. ∎

We are now ready to state the main result involving product measures, commonly referred to as Fubini's theorem.

Theorem 1.27 (product measures and iterated integrals, Lebesgue, Fubini, Tonelli) For any σ-finite measure spaces (S, 𝒮, μ) and (T, 𝒯, ν), there exists a unique measure μ ⊗ ν on (S × T, 𝒮 ⊗ 𝒯) satisfying

    (μ ⊗ ν)(B × C) = μB · νC,  B ∈ 𝒮, C ∈ 𝒯.  (6)

Furthermore, for any measurable function f: S × T → ℝ̄₊,

    (μ ⊗ ν)f = ∫ μ(ds) ∫ f(s, t) ν(dt) = ∫ ν(dt) ∫ f(s, t) μ(ds).  (7)

The last relation remains valid for any measurable function f: S × T → ℝ with (μ ⊗ ν)|f| < ∞.

Note that the iterated integrals in (7) are well defined by Lemma 1.26, although the inner integrals νf(s, ·) and μf(·, t) may fail to exist on some null sets in 𝒮 and 𝒯, respectively.

Proof: By Lemma 1.26 we may define

    (μ ⊗ ν)A = ∫ μ(ds) ∫ 1_A(s, t) ν(dt),  A ∈ 𝒮 ⊗ 𝒯,  (8)

which is clearly a measure on 𝒮 ⊗ 𝒯 satisfying (6). By a monotone class argument there can be at most one such measure. In particular, (8) remains true with the order of integration reversed, which proves (7) for indicator functions f.
The formula extends by linearity and monotone convergence to arbitrary measurable functions f ≥ 0. In the general case, we note that (7) holds with f replaced by |f|. If (μ ⊗ ν)|f| < ∞, it follows that N_S = {s ∈ S; ν|f(s, ·)| = ∞} is a μ-null set in 𝒮, whereas N_T = {t ∈ T; μ|f(·, t)| = ∞} is a ν-null set in 𝒯. By Lemma 1.24 we may redefine f(s, t) to be zero when s ∈ N_S or t ∈ N_T. Then (7) follows for f by subtraction of the formulas for f₊ and f₋. ∎
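For measures carried by finitely many atoms, (6) and (7) reduce to the interchange of finite double sums, which can be verified directly. The weights and the function f below are arbitrary illustrative choices:

```python
# Fubini for finite atomic measures: the product-measure integral
# agrees with both iterated integrals.
mu = [0.5, 1.5, 2.0]                 # measure on S = {0, 1, 2}
nu = [1.0, 3.0]                      # measure on T = {0, 1}
f = lambda s, t: (s + 1) * (t + 2)   # a measurable f >= 0

# (mu ⊗ nu)f: integrate against the product weights mu[s] * nu[t]
product = sum(mu[s] * nu[t] * f(s, t)
              for s in range(3) for t in range(2))

# Iterated integral, inner integral over T first (middle term of (7))
s_first = sum(mu[s] * sum(nu[t] * f(s, t) for t in range(2))
              for s in range(3))

# Iterated integral, inner integral over S first (last term of (7))
t_first = sum(nu[t] * sum(mu[s] * f(s, t) for s in range(3))
              for t in range(2))

assert abs(product - s_first) < 1e-12
assert abs(s_first - t_first) < 1e-12
```

The σ-finiteness assumption in Theorem 1.27 is invisible here since both measures are bounded; it is what allows the same regrouping to survive the passage to infinite sums and general integrals.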
The measure μ ⊗ ν in Theorem 1.27 is called the product measure of μ and ν. Iterating the construction in finitely many steps, we obtain product measures μ₁ ⊗ ⋯ ⊗ μₙ = ⊗ₖ μₖ satisfying higher-dimensional versions of (7). If μₖ = μ for all k, we often write the product as μ^⊗n or μⁿ.

By a measurable group we mean a group G endowed with a σ-field 𝒢 such that the group operations in G are 𝒢-measurable. If μ₁, …, μₙ are σ-finite measures on G, we may define the convolution μ₁ ∗ ⋯ ∗ μₙ as the image of the product measure μ₁ ⊗ ⋯ ⊗ μₙ on Gⁿ under the iterated group operation (x₁, …, xₙ) ↦ x₁ ⋯ xₙ. The convolution is said to be associative if (μ₁ ∗ μ₂) ∗ μ₃ = μ₁ ∗ (μ₂ ∗ μ₃) whenever both μ₁ ∗ μ₂ and μ₂ ∗ μ₃ are σ-finite, and commutative if μ₁ ∗ μ₂ = μ₂ ∗ μ₁.

A measure μ on G is said to be right or left invariant if μ ∘ T_g⁻¹ = μ for all g ∈ G, where T_g denotes the right or left shift x ↦ xg or x ↦ gx. When G is Abelian, the shift is called a translation. We may also consider spaces of the form G × S, in which case translations are defined to be mappings of the form T_g: (x, s) ↦ (x + g, s).

Lemma 1.28 (convolution) The convolution of σ-finite measures on a measurable group (G, 𝒢) is associative, and for Abelian G it is also commutative. In the latter case,

    (μ ∗ ν)B = ∫ μ(B − s) ν(ds) = ∫ ν(B − s) μ(ds),  B ∈ 𝒢.

If μ = f · λ and ν = g · λ for some invariant measure λ, then μ ∗ ν has the λ-density

    (f ∗ g)(s) = ∫ f(s − t) g(t) λ(dt) = ∫ f(t) g(s − t) λ(dt),  s ∈ G.

Proof: Use Fubini's theorem. ∎

Given a measure space (Ω, A, μ) and a p > 0, we write L^p = L^p(Ω, A, μ) for the class of all measurable functions f: Ω → ℝ with

    ‖f‖_p = (μ|f|^p)^{1/p} < ∞.
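The convolution of Lemma 1.28 can be illustrated on the Abelian measurable group (ℤ, +), where it is a finite double sum: the image of μ ⊗ ν under addition. The classical example of two fair dice is an illustrative choice:

```python
# Convolution of two measures on (Z, +): push mu ⊗ nu forward
# under (x, y) -> x + y.
from collections import defaultdict

def convolve(mu, nu):
    out = defaultdict(float)
    for x, a in mu.items():
        for y, b in nu.items():
            out[x + y] += a * b      # mass of mu⊗nu at (x,y) lands at x+y
    return dict(out)

die = {k: 1 / 6 for k in range(1, 7)}   # uniform measure on {1,...,6}
two = convolve(die, die)

# The familiar triangular weights of a two-dice sum
assert abs(two[7] - 6 / 36) < 1e-12
assert abs(two[2] - 1 / 36) < 1e-12
assert abs(sum(two.values()) - 1.0) < 1e-12

# Commutativity, as asserted for Abelian G
h = {0: 0.5, 1: 0.5}
assert convolve(die, h) == convolve(h, die)
```

The density formula of the lemma appears here with λ equal to counting measure: (f ∗ g)(s) = Σ_t f(s − t) g(t), which is exactly the inner loop above after a change of variables.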
Lemma 1.29 (Hölder and Minkowski inequalities) For any measurable functions f and g on some measure space (Ω, A, μ), we have
(i) ‖fg‖_r ≤ ‖f‖_p ‖g‖_q for all p, q, r > 0 with p⁻¹ + q⁻¹ = r⁻¹;
(ii) ‖f + g‖_p^{p∧1} ≤ ‖f‖_p^{p∧1} + ‖g‖_p^{p∧1} for all p > 0.

Proof: (i) It is clearly enough to take r = 1 and ‖f‖_p = ‖g‖_q = 1. The relation p⁻¹ + q⁻¹ = 1 implies (p − 1)(q − 1) = 1, and so the equations y = x^{p−1} and x = y^{q−1} are equivalent for x, y ≥ 0. By calculus,

    |fg| ≤ ∫₀^{|f|} x^{p−1} dx + ∫₀^{|g|} y^{q−1} dy = p⁻¹|f|^p + q⁻¹|g|^q,

and so

    ‖fg‖₁ ≤ p⁻¹ ∫ |f|^p dμ + q⁻¹ ∫ |g|^q dμ = p⁻¹ + q⁻¹ = 1.
(ii) The relation holds for $p \le 1$ by the concavity of $x^p$ on $\mathbb{R}_+$. For $p > 1$, we get by (i) with $q = p/(p - 1)$ and $r = 1$
$$\|f + g\|_p^p \le \int |f|\,|f + g|^{p-1}\,d\mu + \int |g|\,|f + g|^{p-1}\,d\mu \le \|f\|_p \|f + g\|_p^{p-1} + \|g\|_p \|f + g\|_p^{p-1}. \quad \Box$$

The inequality in (ii) is often needed in the following extended form.

Corollary 1.30 (extended Minkowski inequality) Let $\mu$, $\nu$, and $f$ be such as in Theorem 1.27, and assume that $\mu f(t) = \int f(s, t)\,\mu(ds)$ exists for $t \in T$ a.e. $\nu$. Write $\|f\|_p(s) = (\nu|f(s, \cdot)|^p)^{1/p}$. Then $\|\mu f\|_p \le \mu\|f\|_p$ for $p \ge 1$.

Proof: Since $|\mu f| \le \mu|f|$, we may assume that $f \ge 0$, and we may also assume that $\|\mu f\|_p \in (0, \infty)$. For $p > 1$, we get by Fubini's theorem and Hölder's inequality
$$\|\mu f\|_p^p = \nu(\mu f)^p = \nu(\mu f \cdot (\mu f)^{p-1}) = \mu\nu(f(\mu f)^{p-1}) \le \mu\|f\|_p\, \|(\mu f)^{p-1}\|_q = \mu\|f\|_p\, \|\mu f\|_p^{p-1},$$
and it remains to divide by $\|\mu f\|_p^{p-1}$. The proof for $p = 1$ is similar but simpler. $\Box$

In particular, Lemma 1.29 shows that $\|\cdot\|_p$ becomes a norm for $p \ge 1$ if we identify functions that agree a.e. For any $p > 0$ and $f, f_1, f_2, \ldots \in L^p$, we write $f_n \to f$ in $L^p$ if $\|f_n - f\|_p \to 0$ and say that $(f_n)$ is Cauchy in $L^p$ if $\|f_m - f_n\|_p \to 0$ as $m, n \to \infty$.

Lemma 1.31 (completeness) Let $(f_n)$ be a Cauchy sequence in $L^p$, where $p > 0$. Then $\|f_n - f\|_p \to 0$ for some $f \in L^p$.

Proof: Choose a subsequence $(n_k) \subset \mathbb{N}$ with $\sum_k \|f_{n_{k+1}} - f_{n_k}\|_p^{p \wedge 1} < \infty$. By Lemma 1.29 and monotone convergence we get $\|\sum_k |f_{n_{k+1}} - f_{n_k}|\|_p^{p \wedge 1} < \infty$, and so $\sum_k |f_{n_{k+1}} - f_{n_k}| < \infty$ a.e. Hence, $(f_{n_k})$ is a.e. Cauchy in $\mathbb{R}$, and so Lemma 1.10 yields $f_{n_k} \to f$ a.e. for some measurable function $f$. By Fatou's lemma,
$$\|f - f_n\|_p \le \liminf_{k \to \infty} \|f_{n_k} - f_n\|_p \le \sup_{m \ge n} \|f_m - f_n\|_p \to 0, \quad n \to \infty,$$
which shows that $f_n \to f$ in $L^p$. $\Box$

The next result gives a useful criterion for convergence in $L^p$.

Lemma 1.32 ($L^p$-convergence) For any $p > 0$, let $f, f_1, f_2, \ldots \in L^p$ with $f_n \to f$ a.e. Then $f_n \to f$ in $L^p$ iff $\|f_n\|_p \to \|f\|_p$.
Proof: If $f_n \to f$ in $L^p$, we get by Lemma 1.29
$$\left| \|f_n\|_p^{p \wedge 1} - \|f\|_p^{p \wedge 1} \right| \le \|f_n - f\|_p^{p \wedge 1} \to 0,$$
and so $\|f_n\|_p \to \|f\|_p$. Now assume instead the latter condition, and define
$$g_n = 2^p(|f_n|^p + |f|^p), \quad g = 2^{p+1}|f|^p.$$
Then $g_n \to g$ a.e. and $\mu g_n \to \mu g < \infty$ by hypothesis. Since also $g_n \ge |f_n - f|^p \to 0$ a.e., Theorem 1.21 yields $\|f_n - f\|_p^p = \mu|f_n - f|^p \to 0$. $\Box$

Taking $p = q = 2$ and $r = 1$ in Lemma 1.29 (i), we get the Cauchy-Buniakovsky or Schwarz inequality $\|fg\|_1 \le \|f\|_2 \|g\|_2$. In particular, we note that, for any $f, g \in L^2$, the inner product $\langle f, g \rangle = \mu(fg)$ exists and satisfies $|\langle f, g \rangle| \le \|f\|_2 \|g\|_2$. From the obvious bilinearity of the inner product, we get the parallelogram identity
$$\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2, \quad f, g \in L^2. \quad (9)$$

Two functions $f, g \in L^2$ are said to be orthogonal (written as $f \perp g$) if $\langle f, g \rangle = 0$. Orthogonality between two subsets $A, B \subset L^2$ means that $f \perp g$ for all $f \in A$ and $g \in B$. A subspace $M \subset L^2$ is said to be linear if $af + bg \in M$ for any $f, g \in M$ and $a, b \in \mathbb{R}$, and closed if $f \in M$ whenever $f$ is the $L^2$-limit of a sequence in $M$.

Theorem 1.33 (orthogonal projection) Let $M$ be a closed linear subspace of $L^2$. Then any function $f \in L^2$ has an a.e. unique decomposition $f = g + h$ with $g \in M$ and $h \perp M$.

Proof: Fix any $f \in L^2$, and define $d = \inf\{\|f - g\|;\ g \in M\}$. Choose $g_1, g_2, \ldots \in M$ with $\|f - g_n\| \to d$. Using the linearity of $M$, the definition of $d$, and (9), we get as $m, n \to \infty$,
$$4d^2 + \|g_m - g_n\|^2 \le \|2f - g_m - g_n\|^2 + \|g_m - g_n\|^2 = 2\|f - g_m\|^2 + 2\|f - g_n\|^2 \to 4d^2.$$
Thus, $\|g_m - g_n\| \to 0$, and so the sequence $(g_n)$ is Cauchy in $L^2$. By Lemma 1.31 it converges toward some $g \in L^2$, and since $M$ is closed we have $g \in M$. Noting that $h = f - g$ has norm $d$, we get for any $l \in M$,
$$d^2 \le \|h + tl\|^2 = d^2 + 2t\langle h, l \rangle + t^2\|l\|^2, \quad t \in \mathbb{R},$$
which implies $\langle h, l \rangle = 0$. Hence, $h \perp M$, as required.

To prove the uniqueness, let $g' + h'$ be another decomposition with the stated properties. Then $g - g' \in M$ and also $g - g' = h' - h \perp M$, so $g - g' \perp g - g'$, which implies $\|g - g'\|^2 = \langle g - g', g - g' \rangle = 0$, and hence $g = g'$ a.e. $\Box$

We proceed with a basic approximation property of sets.
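As a side illustration (not from the text) of the projection in Theorem 1.33: over counting measure on a finite set, $L^2$ is just $\mathbb{R}^n$, a closed linear subspace is the span of finitely many vectors, and the decomposition $f = g + h$ is ordinary least squares. The particular vectors below are made up:

```python
import numpy as np

# L^2 over counting measure on {1,...,5} is R^5 with the usual inner
# product. M = span of the columns of A is a closed linear subspace;
# least squares gives the decomposition f = g + h with g in M, h ⟂ M.
A = np.array([[1., 0.], [1., 1.], [1., 2.], [1., 3.], [1., 4.]])
f = np.array([1., 3., 2., 5., 4.])

coef, *_ = np.linalg.lstsq(A, f, rcond=None)
g = A @ coef          # projection of f onto M
h = f - g             # residual

assert np.allclose(A.T @ h, 0.0)   # h ⟂ M: orthogonal to both basis vectors
assert np.allclose(g + h, f)       # the decomposition recovers f
```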
Lemma 1.34 (regularity) Let $\mu$ be a bounded measure on some metric space $S$ with Borel $\sigma$-field $\mathcal{S}$. Then
$$\mu B = \sup_{F \subset B} \mu F = \inf_{G \supset B} \mu G, \quad B \in \mathcal{S},$$
with $F$ and $G$ restricted to the classes of closed and open subsets of $S$, respectively.

Proof: For any open set $G$ there exist some closed sets $F_n \uparrow G$, and by Lemma 1.14 we get $\mu F_n \uparrow \mu G$. This proves the statement for $B$ belonging to the $\pi$-system $\mathcal{G}$ of all open sets. Letting $\mathcal{D}$ denote the class of all sets $B$ with the stated property, we further note that $\mathcal{D}$ is a $\lambda$-system. Hence, Theorem 1.1 shows that $\mathcal{D} \supset \sigma(\mathcal{G}) = \mathcal{S}$. $\Box$

The last result leads to a basic approximation property for functions.

Lemma 1.35 (approximation) Given a metric space $S$ with Borel $\sigma$-field $\mathcal{S}$, a bounded measure $\mu$ on $(S, \mathcal{S})$, and a constant $p > 0$, the set of bounded, continuous functions on $S$ is dense in $L^p(S, \mathcal{S}, \mu)$. Thus, for any $f \in L^p$ there exist some bounded, continuous functions $f_1, f_2, \ldots: S \to \mathbb{R}$ with $\|f_n - f\|_p \to 0$.

Proof: If $f = 1_A$ with $A \subset S$ open, we may choose some continuous functions $f_n$ with $0 \le f_n \uparrow f$, and then $\|f_n - f\|_p \to 0$ by dominated convergence. By Lemma 1.34 the result remains true for arbitrary $A \in \mathcal{S}$. The further extension to simple measurable functions is immediate. For general $f \in L^p$ we may choose some simple measurable functions $f_n \to f$ with $|f_n| \le |f|$. Since $|f_n - f|^p \le 2^{p+1}|f|^p$, we get $\|f_n - f\|_p \to 0$ by dominated convergence. $\Box$

The next result shows how the pointwise convergence of a sequence of measurable functions is almost uniform.

Lemma 1.36 (near uniformity, Egorov) Let $f, f_1, f_2, \ldots$ be measurable functions on some finite measure space $(\Omega, \mathcal{A}, \mu)$ such that $f_n \to f$ on $\Omega$. Then for any $\varepsilon > 0$ there exists some $A \in \mathcal{A}$ with $\mu A^c < \varepsilon$ such that $f_n \to f$ uniformly on $A$.

Proof: Define
$$A_{m,n} = \bigcap_{k \ge n} \{x \in \Omega;\ |f_k(x) - f(x)| \le m^{-1}\}, \quad m, n \in \mathbb{N}.$$
As $n \to \infty$ for fixed $m$, we have $A_{m,n} \uparrow \Omega$ and hence $\mu A_{m,n}^c \to 0$. Given any $\varepsilon > 0$, we may then choose $n_1, n_2, \ldots \in \mathbb{N}$ so large that $\mu A_{m,n_m}^c < \varepsilon 2^{-m}$ for all $m$. Letting $A = \bigcap_m A_{m,n_m}$, we get
$$\mu A^c \le \mu \bigcup_m A_{m,n_m}^c \le \varepsilon \sum_m 2^{-m} = \varepsilon,$$
and we note that $f_n \to f$ uniformly on $A$. $\Box$
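A standard illustration (not from the text): with $f_n(x) = x^n$ on $[0, 1]$ under Lebesgue measure, $f_n \to 0$ on $[0, 1)$ but not uniformly; removing the set $(1 - \varepsilon, 1]$ of measure $\varepsilon$ makes the convergence uniform. A crude numerical sketch on a grid:

```python
import numpy as np

# f_n(x) = x^n -> 0 for x in [0, 1), but sup over [0, 1) stays near 1
# (no uniform convergence). On A = [0, 1 - eps], sup |f_n| = (1 - eps)^n -> 0.
eps = 0.1
x_all = np.linspace(0.0, 0.999, 1000)   # grid approximating [0, 1)
x_A = x_all[x_all <= 1.0 - eps]         # Egorov set A, with mu(A^c) <= eps

for n in (10, 100, 1000):
    print(n, (x_all ** n).max(), (x_A ** n).max())

assert (x_A ** 1000).max() < 1e-10      # uniform convergence on A
assert (x_all ** 1000).max() > 0.3      # but not on all of [0, 1)
```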
The last two results may be combined to show that every measurable function is almost continuous.

Lemma 1.37 (near continuity, Lusin) Let $f$ be a measurable function on some compact metric space $S$ with Borel $\sigma$-field $\mathcal{S}$ and a bounded measure $\mu$. Then there exist some continuous functions $f_1, f_2, \ldots$ on $S$ such that $\mu\{x;\ f_n(x) \ne f(x)\} \to 0$.

Proof: We may clearly assume that $f$ is bounded. By Lemma 1.35 we may choose some continuous functions $g_1, g_2, \ldots$ on $S$ such that $\mu|g_k - f| \le 2^{-k}$. By Fubini's theorem, we get
$$\mu \sum\nolimits_k |g_k - f| = \sum\nolimits_k \mu|g_k - f| \le \sum\nolimits_k 2^{-k} = 1,$$
and so $\sum_k |g_k - f| < \infty$ a.e., which implies $g_k \to f$ a.e. By Lemma 1.36, we may next choose $A_1, A_2, \ldots \in \mathcal{S}$ with $\mu A_n^c \to 0$ such that the convergence is uniform on each $A_n$. Since each $g_k$ is uniformly continuous on $S$, we conclude that $f$ is uniformly continuous on each $A_n$. By Tietze's extension theorem, the restriction $f|_{A_n}$ then admits a continuous extension $f_n$ to $S$. $\Box$

For any measurable space $(S, \mathcal{S})$, we may introduce the class $\mathcal{M}(S)$ of $\sigma$-finite measures on $S$. The set $\mathcal{M}(S)$ becomes a measurable space in its own right when endowed with the $\sigma$-field induced by the mappings $\pi_B: \mu \mapsto \mu B$, $B \in \mathcal{S}$. Note in particular that the class $\mathcal{P}(S)$ of probability measures on $S$ is a measurable subset of $\mathcal{M}(S)$. In the next two lemmas we state some less obvious measurability properties, which will be needed in subsequent chapters.

Lemma 1.38 (measurability of products) For any measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, the mapping $(\mu, \nu) \mapsto \mu \otimes \nu$ is measurable from $\mathcal{P}(S) \times \mathcal{P}(T)$ to $\mathcal{P}(S \times T)$.

Proof: Note that $(\mu \otimes \nu)A$ is measurable whenever $A = B \times C$ with $B \in \mathcal{S}$ and $C \in \mathcal{T}$, and extend by a monotone class argument. $\Box$

In the context of separable metric spaces $S$, we assume the measures $\mu \in \mathcal{M}(S)$ to be locally finite, in the sense that $\mu B < \infty$ for any bounded Borel set $B$.
Lemma 1.39 (diffuse and atomic parts) For any separable metric space $S$,
(i) the set $D \subset \mathcal{M}(S)$ of degenerate measures on $S$ is measurable;
(ii) the diffuse and purely atomic components $\mu_d$ and $\mu_a$ are measurable functions of $\mu \in \mathcal{M}(S)$.
Proof: (i) Choose a countable topological base $B_1, B_2, \ldots$ in $S$, and define $J = \{(i, j);\ B_i \cap B_j = \emptyset\}$. Then, clearly,
$$D = \Big\{\mu \in \mathcal{M}(S);\ \sum\nolimits_{(i,j) \in J} (\mu B_i)(\mu B_j) = 0\Big\}.$$

(ii) Choose a nested sequence of countable partitions $\mathcal{B}_n$ of $S$ into Borel sets of diameter less than $n^{-1}$. For any $\varepsilon > 0$ and $n \in \mathbb{N}$ we introduce the sets
$$U_n^\varepsilon = \bigcup\{B \in \mathcal{B}_n;\ \mu B > \varepsilon\}, \quad U^\varepsilon = \{s \in S;\ \mu\{s\} > \varepsilon\}, \quad U = \{s \in S;\ \mu\{s\} > 0\}.$$
It is easily seen that $U_n^\varepsilon \downarrow U^\varepsilon$ as $n \to \infty$ and $U^\varepsilon \uparrow U$ as $\varepsilon \to 0$. By dominated convergence, the restrictions $\mu_n^\varepsilon = \mu(U_n^\varepsilon \cap \cdot)$ and $\mu^\varepsilon = \mu(U^\varepsilon \cap \cdot)$ satisfy locally $\mu_n^\varepsilon \downarrow \mu^\varepsilon$ and $\mu^\varepsilon \uparrow \mu_a$. Since $\mu_n^\varepsilon$ is clearly a measurable function of $\mu$, the asserted measurability of $\mu_a$ and $\mu_d$ now follows by Lemma 1.10. $\Box$

Given two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a mapping $\mu: S \times \mathcal{T} \to \overline{\mathbb{R}}_+$ is called a (probability) kernel from $S$ to $T$ if the function $\mu_s B = \mu(s, B)$ is $\mathcal{S}$-measurable in $s \in S$ for fixed $B \in \mathcal{T}$ and a (probability) measure in $B \in \mathcal{T}$ for fixed $s \in S$. Any kernel $\mu$ determines an associated operator that maps suitable functions $f: T \to \mathbb{R}$ into their integrals $\mu f(s) = \int \mu(s, dt)\, f(t)$. Kernels play an important role in probability theory, where they may appear in the guises of random measures, conditional distributions, Markov transition functions, and potentials.

The following characterizations of the kernel property are often useful. For simplicity we restrict our attention to probability kernels.

Lemma 1.40 (kernels) Fix two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a $\pi$-system $\mathcal{C}$ with $\sigma(\mathcal{C}) = \mathcal{T}$, and a family $\mu = \{\mu_s;\ s \in S\}$ of probability measures on $T$. Then these conditions are equivalent:
(i) $\mu$ is a probability kernel from $S$ to $T$;
(ii) $\mu$ is a measurable mapping from $S$ to $\mathcal{P}(T)$;
(iii) $s \mapsto \mu_s B$ is a measurable mapping from $S$ to $[0, 1]$ for every $B \in \mathcal{C}$.

Proof: Since $\pi_B: \mu \mapsto \mu B$ is measurable on $\mathcal{P}(T)$ for every $B \in \mathcal{T}$, condition (ii) implies (iii) by Lemma 1.7.
Furthermore, (iii) implies (i) by a straightforward application of Theorem 1.1. Finally, under (i) we have $\mu^{-1}\pi_B^{-1}[0, x] \in \mathcal{S}$ for all $B \in \mathcal{T}$ and $x \ge 0$, and (ii) follows by Lemma 1.4. $\Box$

Let us now introduce a third measurable space $(U, \mathcal{U})$, and consider two kernels $\mu$ and $\nu$, one from $S$ to $T$ and the other from $S \times T$ to $U$. Imitating the construction of product measures, we may attempt to combine $\mu$ and $\nu$ into a kernel $\mu \otimes \nu$ from $S$ to $T \times U$ given by
$$(\mu \otimes \nu)(s, B) = \int \mu(s, dt) \int \nu(s, t, du)\, 1_B(t, u), \quad B \in \mathcal{T} \otimes \mathcal{U}.$$
The following lemma justifies the formula and provides some further useful information.
Lemma 1.41 (kernels and functions) Fix three measurable spaces $(S, \mathcal{S})$, $(T, \mathcal{T})$, and $(U, \mathcal{U})$. Let $\mu$ and $\nu$ be probability kernels from $S$ to $T$ and from $S \times T$ to $U$, respectively, and consider two measurable functions $f: S \times T \to \mathbb{R}_+$ and $g: S \times T \to U$. Then
(i) $\mu_s f(s, \cdot)$ is a measurable function of $s \in S$;
(ii) $\mu_s \circ (g(s, \cdot))^{-1}$ is a kernel from $S$ to $U$;
(iii) $\mu \otimes \nu$ is a kernel from $S$ to $T \times U$.

Proof: Assertion (i) is obvious when $f$ is the indicator function of a set $A = B \times C$ with $B \in \mathcal{S}$ and $C \in \mathcal{T}$. From here on, we may extend to general $A \in \mathcal{S} \otimes \mathcal{T}$ by a monotone class argument and then to arbitrary $f$ by linearity and monotone convergence. The statements in (ii) and (iii) are easy consequences. $\Box$

For any measurable function $f \ge 0$ on $T \times U$, we get as in Theorem 1.27
$$(\mu \otimes \nu)_s f = \int \mu(s, dt) \int \nu(s, t, du)\, f(t, u), \quad s \in S,$$
or simply $(\mu \otimes \nu)f = \mu(\nu f)$. By iteration we may combine any kernels $\mu_k$ from $S_0 \times \cdots \times S_{k-1}$ to $S_k$, $k = 1, \ldots, n$, into a kernel $\mu_1 \otimes \cdots \otimes \mu_n$ from $S_0$ to $S_1 \times \cdots \times S_n$, given by
$$(\mu_1 \otimes \cdots \otimes \mu_n)f = \mu_1(\mu_2(\cdots(\mu_n f)\cdots))$$
for any measurable function $f \ge 0$ on $S_1 \times \cdots \times S_n$. In applications we may often encounter kernels $\mu_k$ from $S_{k-1}$ to $S_k$, $k = 1, \ldots, n$, in which case the composition $\mu_1 \cdots \mu_n$ is defined as a kernel from $S_0$ to $S_n$, given for measurable $B \subset S_n$ by
$$(\mu_1 \cdots \mu_n)_s B = (\mu_1 \otimes \cdots \otimes \mu_n)_s(S_1 \times \cdots \times S_{n-1} \times B) = \int \mu_1(s, ds_1) \int \mu_2(s_1, ds_2) \cdots \int \mu_{n-1}(s_{n-2}, ds_{n-1})\, \mu_n(s_{n-1}, B).$$

Exercises

1. Prove the triangle inequality $\mu(A \triangle C) \le \mu(A \triangle B) + \mu(B \triangle C)$. (Hint: Note that $1_{A \triangle B} = |1_A - 1_B|$.)

2. Show that Lemma 1.9 is false for uncountable index sets. (Hint: Show that every measurable set depends on countably many coordinates.)

3. For any space $S$, let $\mu A$ denote the cardinality of the set $A \subset S$. Show that $\mu$ is a measure on $(S, 2^S)$.

4. Let $\mathcal{K}$ be the class of compact subsets of some metric space $S$, and let $\mu$ be a bounded measure such that $\inf_{K \in \mathcal{K}} \mu K^c = 0$. Show for any $B \in \mathcal{B}(S)$ that $\mu B = \sup\{\mu K;\ K \in \mathcal{K},\ K \subset B\}$.
5. Show that any absolutely convergent series can be written as an integral with respect to counting measure on $\mathbb{N}$. State series versions of Fatou's lemma and the dominated convergence theorem, and give direct elementary proofs.

6. Give an example of integrable functions $f, f_1, f_2, \ldots$ on some probability space $(\Omega, \mathcal{A}, \mu)$ such that $f_n \to f$ but $\mu f_n \not\to \mu f$.

7. Fix two $\sigma$-finite measures $\mu$ and $\nu$ on some measurable space $(\Omega, \mathcal{F})$ with sub-$\sigma$-field $\mathcal{G}$. Show that if $\mu \ll \nu$ holds on $\mathcal{F}$, it is also true on $\mathcal{G}$. Further show by an example that the converse may fail.

8. Fix two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a measurable function $f: S \to T$, and a measure $\mu$ on $S$ with image $\nu = \mu \circ f^{-1}$. Show that $f$ remains measurable w.r.t. the completions $\mathcal{S}^\mu$ and $\mathcal{T}^\nu$.

9. Fix a measure space $(S, \mathcal{S}, \mu)$ and a $\sigma$-field $\mathcal{T} \subset \mathcal{S}$, let $\mathcal{S}^\mu$ denote the $\mu$-completion of $\mathcal{S}$, and let $\mathcal{T}^\mu$ be the $\sigma$-field generated by $\mathcal{T}$ and the $\mu$-null sets of $\mathcal{S}^\mu$. Show that $A \in \mathcal{T}^\mu$ iff there exist some $B \in \mathcal{T}$ and $N \in \mathcal{S}^\mu$ with $A \triangle B \subset N$ and $\mu N = 0$. Also, show by an example that $\mathcal{T}^\mu$ may be strictly greater than the $\mu$-completion of $\mathcal{T}$.

10. State Fubini's theorem for the case where $\mu$ is any $\sigma$-finite measure and $\nu$ is the counting measure on $\mathbb{N}$. Give a direct proof of this result.

11. Let $f_1, f_2, \ldots$ be $\mu$-integrable functions on some measurable space $S$ such that $g = \sum_k f_k$ exists a.e., and put $g_n = \sum_{k \le n} f_k$. Restate the dominated convergence theorem for the integrals $\mu g_n$ in terms of the functions $f_k$, and compare with the result of the preceding exercise.

12. Extend Theorem 1.27 to the product of $n$ measures.

13. Let $\lambda$ denote Lebesgue measure on $\mathbb{R}_+$, and fix any $p > 0$. Show that the class of step functions with bounded support and finitely many jumps is dense in $L^p(\lambda)$. Generalize to $\mathbb{R}_+^d$.

14. Let $M \supset N$ be closed linear subspaces of $L^2$. Show that if $f \in L^2$ has projections $g$ onto $M$ and $h$ onto $N$, then $g$ has projection $h$ onto $N$.

15. Let $M$ be a closed linear subspace of $L^2$, and let $f, g \in L^2$ with $M$-projections $\hat{f}$ and $\hat{g}$. Show that $\langle \hat{f}, g \rangle = \langle f, \hat{g} \rangle = \langle \hat{f}, \hat{g} \rangle$.

16. Let $\mu_1, \mu_2, \ldots$ be kernels between two measurable spaces $S$ and $T$. Show that the function $\mu = \sum_n \mu_n$ is again a kernel.

17. Fix a function $f$ between two measurable spaces $S$ and $T$, and define $\mu(s, B) = 1_B \circ f(s)$. Show that $\mu$ is a kernel iff $f$ is measurable.

18. Show that if $\mu \ll \nu$ and $\nu f = 0$ with $f \ge 0$, then also $\mu f = 0$. (Hint: Use Lemma 1.24.)

19. For any $\sigma$-finite measures $\mu_1 \ll \mu_2$ and $\nu_1 \ll \nu_2$, show that $\mu_1 \otimes \nu_1 \ll \mu_2 \otimes \nu_2$. (Hint: Use Fubini's theorem and Lemma 1.24.)
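As a closing illustration (not from the text) of the kernel composition defined before the exercises: on finite state spaces a probability kernel is a row-stochastic matrix, and $\mu_1\mu_2$ is matrix multiplication. The particular matrices are made up:

```python
import numpy as np

# On finite spaces S0, S1, S2, a probability kernel from S_{k-1} to S_k
# is a row-stochastic matrix; the composition (mu1 mu2)_s B becomes a
# matrix product.
mu1 = np.array([[0.5, 0.5],
                [0.2, 0.8]])        # kernel from S0 (2 states) to S1 (2 states)
mu2 = np.array([[1.0, 0.0, 0.0],
                [0.1, 0.6, 0.3]])   # kernel from S1 to S2 (3 states)

comp = mu1 @ mu2                    # kernel from S0 to S2

assert np.allclose(comp.sum(axis=1), 1.0)          # rows are probability measures
# (mu1 mu2)(s, {t2}) = sum_t mu1(s, t) * mu2(t, {t2}):
assert np.isclose(comp[0, 1], 0.5 * 0.0 + 0.5 * 0.6)
```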
Chapter 2

Measure Theory - Key Results

Outer measures and extension; Lebesgue and Lebesgue-Stieltjes measures; Jordan-Hahn and Lebesgue decompositions; Radon-Nikodym theorem; Lebesgue's differentiation theorem; functions of finite variation; Riesz' representation theorem; Haar and invariant measures

We continue our introduction to measure theory with a detailed discussion of some basic results of the subject, all of special relevance to probability theory. Again the hurried or impatient reader may skip to the next chapter and return for reference when need arises.

Most important, by far, of the quoted results is the existence of Lebesgue measure, which lies at the heart of most probabilistic constructions, often via a use of the Daniell-Kolmogorov theorem of Chapter 6. A similar role is played by the construction of Haar and other invariant measures, which ensures the existence of uniform distributions or homogeneous Poisson processes on spheres and other manifolds. Other key results include Riesz' representation theorem, which will enable us in Chapter 19 to construct Markov processes with a given generator, via the resolvents and the associated semigroup of transition operators. We may also mention the Radon-Nikodym theorem, of relevance to the theory of conditioning in Chapter 6, Lebesgue's differentiation theorem, instrumental for proving the general ballot theorem in Chapter 11, and various results on functions of bounded variation, important for the theory of predictable processes and general semimartingales in Chapters 25 and 26.

We begin with an ingenious technical result that will play a crucial role for our construction of Lebesgue measure in Theorem 2.2 and for the proof of Riesz' representation Theorem 2.22. By an outer measure on a space $\Omega$ we mean a nondecreasing and countably subadditive set function $\mu: 2^\Omega \to \overline{\mathbb{R}}_+$ with $\mu\emptyset = 0$.
Given an outer measure $\mu$ on $\Omega$, we say that a set $A \subset \Omega$ is $\mu$-measurable if
$$\mu E = \mu(E \cap A) + \mu(E \cap A^c), \quad E \subset \Omega. \quad (1)$$
Note that the inequality $\le$ holds automatically by subadditivity. The following result gives the basic construction of measures from outer measures.
Theorem 2.1 (restriction of outer measure, Carathéodory) Let $\mu$ be an outer measure on $\Omega$, and write $\mathcal{A}$ for the class of $\mu$-measurable sets. Then $\mathcal{A}$ is a $\sigma$-field and the restriction of $\mu$ to $\mathcal{A}$ is a measure.

Proof: Since $\mu\emptyset = 0$, we have for any set $E \subset \Omega$
$$\mu(E \cap \Omega) + \mu(E \cap \Omega^c) = \mu E + \mu\emptyset = \mu E,$$
which shows that $\Omega \in \mathcal{A}$. Also note that trivially $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$. Next assume that $A, B \in \mathcal{A}$. Using (1) for $A$ and $B$ together with the subadditivity of $\mu$, we get for any $E \subset \Omega$
$$\mu E = \mu(E \cap A) + \mu(E \cap A^c) = \mu(E \cap A \cap B) + \mu(E \cap A \cap B^c) + \mu(E \cap A^c) \ge \mu(E \cap (A \cap B)) + \mu(E \cap (A \cap B)^c),$$
which shows that even $A \cap B \in \mathcal{A}$. It follows easily that $\mathcal{A}$ is a field.

If $A, B \in \mathcal{A}$ are disjoint, we also get by (1) for any $E \subset \Omega$
$$\mu(E \cap (A \cup B)) = \mu(E \cap (A \cup B) \cap A) + \mu(E \cap (A \cup B) \cap A^c) = \mu(E \cap A) + \mu(E \cap B). \quad (2)$$

Finally, consider any disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$, and put $U_n = \bigcup_{k \le n} A_k$ and $U = \bigcup_n U_n$. Using (2) recursively along with the monotonicity of $\mu$, we get
$$\mu(E \cap U) \ge \mu(E \cap U_n) = \sum\nolimits_{k \le n} \mu(E \cap A_k).$$
Letting $n \to \infty$ and combining with the subadditivity of $\mu$, we obtain
$$\mu(E \cap U) = \sum\nolimits_k \mu(E \cap A_k). \quad (3)$$
In particular, for $E = \Omega$ we see that $\mu$ is countably additive on $\mathcal{A}$. Noting that $U_n \in \mathcal{A}$ and using (3) twice along with the monotonicity of $\mu$, we also get
$$\mu E = \mu(E \cap U_n) + \mu(E \cap U_n^c) \ge \sum\nolimits_{k \le n} \mu(E \cap A_k) + \mu(E \cap U^c) \to \mu(E \cap U) + \mu(E \cap U^c),$$
which shows that $U \in \mathcal{A}$. Thus, $\mathcal{A}$ is a $\sigma$-field. $\Box$

We are now ready to introduce Lebesgue measure $\lambda$ on $\mathbb{R}$. The length of an interval $I \subset \mathbb{R}$ is denoted by $|I|$.

Theorem 2.2 (Lebesgue measure, Borel) There exists a unique measure $\lambda$ on $(\mathbb{R}, \mathcal{B})$ such that $\lambda I = |I|$ for every interval $I \subset \mathbb{R}$.

As a first step in the proof, we show that the length $|I|$ of intervals $I \subset \mathbb{R}$ admits an extension to an outer measure on $\mathbb{R}$. Then define
$$\lambda A = \inf_{\{I_k\}} \sum\nolimits_k |I_k|, \quad A \subset \mathbb{R}, \quad (4)$$
where the infimum extends over all countable covers of $A$ by open intervals $I_1, I_2, \ldots$. We show that (4) provides the desired extension.

Lemma 2.3 (outer Lebesgue measure) The function $\lambda$ in (4) is an outer measure on $\mathbb{R}$. Moreover, $\lambda I = |I|$ for every interval $I$.

Proof: The set function $\lambda$ is clearly nonnegative and nondecreasing with $\lambda\emptyset = 0$. To prove the countable subadditivity, let $A_1, A_2, \ldots \subset \mathbb{R}$ be arbitrary. For any $\varepsilon > 0$ and $n \in \mathbb{N}$, we may choose some open intervals $I_{n1}, I_{n2}, \ldots$ such that
$$A_n \subset \bigcup\nolimits_k I_{nk}, \quad \lambda A_n \ge \sum\nolimits_k |I_{nk}| - \varepsilon 2^{-n}, \quad n \in \mathbb{N}.$$
Then
$$\lambda \bigcup\nolimits_n A_n \le \sum\nolimits_n \sum\nolimits_k |I_{nk}| \le \sum\nolimits_n \lambda A_n + \varepsilon,$$
and the desired relation follows as we let $\varepsilon \to 0$.

To prove the second assertion, we may assume that $I = [a, b]$ for some finite numbers $a \le b$. Since $I \subset (a - \varepsilon, b + \varepsilon)$ for every $\varepsilon > 0$, we get $\lambda I \le |I| + 2\varepsilon$, and so $\lambda I \le |I|$. To obtain the reverse relation, we need to prove that if $I \subset \bigcup_k I_k$ for some open intervals $I_1, I_2, \ldots$, then $|I| \le \sum_k |I_k|$. By the Heine-Borel theorem, $I$ remains covered by finitely many intervals $I_1, \ldots, I_n$, and it suffices to show that $|I| \le \sum_{k \le n} |I_k|$. This reduces the assertion to the case of finitely many covering intervals.

The statement is clearly true for a single covering interval. Proceeding by induction, we assume the assertion to be true for $n - 1$ covering intervals and turn to the case of covering by $I_1, \ldots, I_n$. Then $b$ belongs to some $I_k = (a_k, b_k)$, and so the interval $I' = I \setminus I_k$ is covered by the remaining intervals $I_j$, $j \ne k$. By the induction hypothesis, we get
$$|I| = b - a \le (b - a_k) + (a_k - a) \le |I_k| + |I'| \le |I_k| + \sum\nolimits_{j \ne k} |I_j| = \sum\nolimits_j |I_j|,$$
as required. $\Box$

The next result ensures that the class of measurable sets in Lemma 2.3 is large enough to contain all Borel sets.

Lemma 2.4 (measurability of intervals) Let $\lambda$ denote the outer measure in Lemma 2.3. Then the interval $(-\infty, a]$ is $\lambda$-measurable for every $a \in \mathbb{R}$.
Proof: For any set $E \subset \mathbb{R}$ and constant $\varepsilon > 0$, we may cover $E$ by some open intervals $I_1, I_2, \ldots$ such that $\lambda E \ge \sum_n |I_n| - \varepsilon$. Writing $I = (-\infty, a]$ and using the subadditivity of $\lambda$ and Lemma 2.3, we get
$$\lambda E + \varepsilon \ge \sum\nolimits_n |I_n| = \sum\nolimits_n |I_n \cap I| + \sum\nolimits_n |I_n \cap I^c| \ge \sum\nolimits_n \lambda(I_n \cap I) + \sum\nolimits_n \lambda(I_n \cap I^c) \ge \lambda(E \cap I) + \lambda(E \cap I^c).$$
Since $\varepsilon$ was arbitrary, it follows that $I$ is $\lambda$-measurable. $\Box$

Proof of Theorem 2.2: Define $\lambda$ as in (4). Then Lemma 2.3 shows that $\lambda$ is an outer measure such that $\lambda I = |I|$ for every interval $I$. Furthermore, Theorem 2.1 shows that $\lambda$ is a measure on the $\sigma$-field $\mathcal{A}$ of all $\lambda$-measurable sets. Finally, Lemma 2.4 shows that $\mathcal{A}$ contains all intervals $(-\infty, a]$ with $a \in \mathbb{R}$. Since the latter sets generate the Borel $\sigma$-field $\mathcal{B}$, we have $\mathcal{B} \subset \mathcal{A}$.

To prove the uniqueness, consider any measure $\mu$ with the stated properties, and put $I_n = [-n, n]$ for $n \in \mathbb{N}$. Using Lemma 1.17 with $\mathcal{C}$ equal to the set of intervals, we see that
$$\lambda(B \cap I_n) = \mu(B \cap I_n), \quad B \in \mathcal{B},\ n \in \mathbb{N}.$$
Letting $n \to \infty$ and using Lemma 1.14, we get $\lambda B = \mu B$ for all $B \in \mathcal{B}$, as required. $\Box$

Before proceeding to a more detailed study of Lebesgue measure, we state an abstract extension theorem that can be proved by essentially the same arguments. Here a nonempty class $\mathcal{I}$ of subsets of a space $\Omega$ is called a semiring if for any $I, J \in \mathcal{I}$ we have $I \cap J \in \mathcal{I}$ and the set $I \cap J^c$ can be written as a union of finitely many disjoint sets $I_1, \ldots, I_n \in \mathcal{I}$.

Theorem 2.5 (extension, Carathéodory) Let $\mu$ be a finitely additive and countably subadditive set function on a semiring $\mathcal{I}$ such that $\mu\emptyset = 0$. Then $\mu$ extends to a measure on $\sigma(\mathcal{I})$.

Proof: Define a set function $\mu^*$ on $2^\Omega$ by
$$\mu^* A = \inf_{\{I_k\}} \sum\nolimits_k \mu I_k, \quad A \subset \Omega,$$
where the infimum extends over all covers of $A$ by sets $I_1, I_2, \ldots \in \mathcal{I}$. Let $\mu^* A = \infty$ when no such cover exists. Proceeding as in the proof of Lemma 2.3, we see that $\mu^*$ is an outer measure on $\Omega$. To check that $\mu^*$ extends $\mu$, fix any $I \in \mathcal{I}$, and consider an arbitrary cover $I_1, I_2, \ldots \in \mathcal{I}$ of $I$. Using both the subadditivity and the finite additivity of $\mu$, we get
$$\mu^* I \le \mu I \le \sum\nolimits_k \mu(I \cap I_k) \le \sum\nolimits_k \mu I_k,$$
which implies $\mu^* I = \mu I$.
By Theorem 2.1, it remains to show that every set $I \in \mathcal{I}$ is $\mu^*$-measurable. Then let $A \subset \Omega$ be covered by some sets $I_1, I_2, \ldots \in \mathcal{I}$ with $\mu^* A \ge \sum_k \mu I_k - \varepsilon$, and proceed as in the proof of Lemma 2.4, noting that $I_n \cap I^c$ is a finite disjoint union of some sets $I_{nj} \in \mathcal{I}$, and therefore $\mu(I_n \cap I^c) = \sum_j \mu I_{nj}$ by the finite additivity of $\mu$. $\Box$
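As a concrete illustration (not from the text) of the semiring property: the half-open intervals $(a, b]$ on the line form a semiring, since the intersection of two such intervals is again one, and $I \cap J^c$ splits into at most two disjoint half-open intervals. A small sketch, with intervals encoded as pairs and hypothetical helper names:

```python
def intersect(i, j):
    """Intersection of half-open intervals (a, b] and (c, d], or None if empty."""
    (a, b), (c, d) = i, j
    lo, hi = max(a, c), min(b, d)
    return (lo, hi) if lo < hi else None

def difference(i, j):
    """I ∩ J^c as a disjoint union of at most two half-open intervals."""
    (a, b), (c, d) = i, j
    parts = []
    if a < min(b, c):                 # piece of I to the left of J
        parts.append((a, min(b, c)))
    if max(a, d) < b:                 # piece of I to the right of J
        parts.append((max(a, d), b))
    return parts

assert intersect((0, 5), (3, 8)) == (3, 5)
assert difference((0, 5), (3, 8)) == [(0, 3)]
assert difference((0, 10), (3, 8)) == [(0, 3), (8, 10)]   # two disjoint pieces
```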
Using Theorem 1.27, we may construct the product measure $\lambda^d = \lambda \otimes \cdots \otimes \lambda$ on $\mathbb{R}^d$ for every $d \in \mathbb{N}$. We call $\lambda^d$ the $d$-dimensional Lebesgue measure. Note that $\lambda^d$ generalizes the ordinary notion of area (when $d = 2$) or volume (when $d \ge 3$). The following result shows that $\lambda^d$ is invariant under arbitrary translations (or shifts) and rotations. We shall also see that the shift invariance characterizes $\lambda^d$ up to a constant factor.

Theorem 2.6 (invariance of Lebesgue measure) Fix any measurable space $(S, \mathcal{S})$ and a measure $\mu$ on $\mathbb{R}^d \times S$ with $\sigma$-finite projection $\nu = \mu((0, 1]^d \times \cdot)$ onto $S$. Then $\mu$ is invariant under shifts in $\mathbb{R}^d$ iff $\mu = \lambda^d \otimes \nu$, in which case $\mu$ remains invariant under arbitrary rigid motions of $\mathbb{R}^d$.

Proof: First assume that $\mu$ is invariant under shifts in $\mathbb{R}^d$. Let $\mathcal{I}$ denote the class of intervals $I = (a, b]$ with rational endpoints, and note that for any $I_1, \ldots, I_d \in \mathcal{I}$ and $C \in \mathcal{S}$ with $\nu C < \infty$,
$$\mu(I_1 \times \cdots \times I_d \times C) = |I_1| \cdots |I_d|\, \nu C = (\lambda^d \otimes \nu)(I_1 \times \cdots \times I_d \times C).$$
For fixed $I_2, \ldots, I_d$ and $C$, the relation extends by monotonicity to arbitrary intervals $I_1$ and then, by the uniqueness in Theorem 2.2, to any $B_1 \in \mathcal{B}$. Proceeding recursively in $d$ steps, we get for arbitrary $B_1, \ldots, B_d \in \mathcal{B}$
$$\mu(B_1 \times \cdots \times B_d \times C) = (\lambda^d \otimes \nu)(B_1 \times \cdots \times B_d \times C),$$
and so $\mu = \lambda^d \otimes \nu$ by the uniqueness in Theorem 1.27.

Conversely, let $\mu = \lambda^d \otimes \nu$. For any $h = (h_1, \ldots, h_d) \in \mathbb{R}^d$, we define the shift operator $T_h: \mathbb{R}^d \to \mathbb{R}^d$ by $T_h x = x + h$ for all $x \in \mathbb{R}^d$. For any intervals $I_1, \ldots, I_d$ and sets $C \in \mathcal{S}$, we have
$$\mu(I_1 \times \cdots \times I_d \times C) = |I_1| \cdots |I_d|\, \nu C = \mu \circ \tilde{T}_h^{-1}(I_1 \times \cdots \times I_d \times C),$$
where $\tilde{T}_h(x, s) = (x + h, s)$. As before, it follows that $\mu = \mu \circ \tilde{T}_h^{-1}$.

It remains to show that $\mu$ is invariant under arbitrary orthogonal transformations $P$ on $\mathbb{R}^d$. Then note that, for any $x, h \in \mathbb{R}^d$,
$$T_h P x = Px + h = P(x + P^{-1}h) = P(x + h') = P T_{h'} x,$$
where $h' = P^{-1}h$.
Since $\mu$ is shift-invariant, we obtain
$$\mu \circ \tilde{P}^{-1} \circ \tilde{T}_h^{-1} = \mu \circ \tilde{T}_{h'}^{-1} \circ \tilde{P}^{-1} = \mu \circ \tilde{P}^{-1},$$
where $\tilde{P}(x, s) = (Px, s)$. Thus, even $\mu \circ \tilde{P}^{-1}$ is shift-invariant and hence of the form $\lambda^d \otimes \nu'$. Writing $B$ for the unit ball in $\mathbb{R}^d$, we get for any $C \in \mathcal{S}$
$$\lambda^d B \cdot \nu' C = \mu \circ \tilde{P}^{-1}(B \times C) = \mu(P^{-1}B \times C) = \mu(B \times C) = \lambda^d B \cdot \nu C.$$
Dividing by $\lambda^d B$ yields $\nu' C = \nu C$. Hence, $\nu' = \nu$, and so $\mu \circ \tilde{P}^{-1} = \mu$. $\Box$
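The rotation invariance of $\lambda^2$ can be checked by crude Monte Carlo (not from the text): the fraction of uniform sample points falling in a Borel set is approximately proportional to its area, and rotating the set leaves that fraction unchanged. A rough sketch, with a made-up rectangle and tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)
pts = rng.uniform(-2.0, 2.0, size=(200_000, 2))   # ~ lambda^2 on [-2, 2]^2

theta = 0.7
P = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])    # rotation of R^2

def in_rect(x):  # the set B = [0, 1] x [0, 0.5], of area 0.5
    return (0 <= x[:, 0]) & (x[:, 0] <= 1) & (0 <= x[:, 1]) & (x[:, 1] <= 0.5)

# x @ P applies P^{-1} = P^T to each row, so this tests x in P(B):
area_B = in_rect(pts).mean() * 16.0       # estimate of lambda^2 B
area_PB = in_rect(pts @ P).mean() * 16.0  # estimate of lambda^2 P(B)

assert abs(area_B - 0.5) < 0.05           # close to the true area
assert abs(area_PB - 0.5) < 0.05          # unchanged by the rotation
```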
We proceed to show that integrable functions on $\mathbb{R}^d$ are continuous in a specified average sense.

Lemma 2.7 (mean continuity) Let $f$ be a measurable function on $\mathbb{R}^d$ with $\lambda^d|f| < \infty$. Then
$$\lim_{h \to 0} \int |f(x + h) - f(x)|\, dx = 0.$$

Proof: By Lemma 1.35 and a simple truncation, we may choose some continuous functions $f_1, f_2, \ldots$ with bounded supports such that $\lambda^d|f_n - f| \to 0$. By the triangle inequality, we get for $n \in \mathbb{N}$ and $h \in \mathbb{R}^d$
$$\int |f(x + h) - f(x)|\, dx \le \int |f_n(x + h) - f_n(x)|\, dx + 2\lambda^d|f_n - f|.$$
Since the $f_n$ are bounded, the right-hand side tends to $0$ by dominated convergence as $h \to 0$ and then $n \to \infty$. $\Box$

By a bounded signed measure on a measurable space $(\Omega, \mathcal{A})$ we mean a bounded function $\nu: \mathcal{A} \to \mathbb{R}$ such that $\nu \bigcup_n A_n = \sum_n \nu A_n$ for any disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$, where the series converges absolutely. We say that two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{A})$ are (mutually) singular or orthogonal and write $\mu \perp \nu$ if there exists some set $A \in \mathcal{A}$ with $\mu A = \nu A^c = 0$. Note that this $A$ may not be unique.

The following result gives the basic decomposition of a signed measure into positive components.

Theorem 2.8 (Hahn decomposition) Any bounded signed measure $\nu$ can be written uniquely as a difference of two bounded, nonnegative, and mutually singular measures $\nu_+$ and $\nu_-$.

Proof: Put $c = \sup\{\nu A;\ A \in \mathcal{A}\}$ and note that, if $A, A' \in \mathcal{A}$ with $\nu A > c - \varepsilon$ and $\nu A' > c - \varepsilon'$, then
$$\nu(A \cup A') = \nu A + \nu A' - \nu(A \cap A') > (c - \varepsilon) + (c - \varepsilon') - c = c - \varepsilon - \varepsilon'.$$
Choosing $A_1, A_2, \ldots \in \mathcal{A}$ with $\nu A_n > c - 2^{-n}$, we get by iteration and countable additivity
$$\nu \bigcup\nolimits_{k > n} A_k \ge c - \sum\nolimits_{k > n} 2^{-k} = c - 2^{-n}, \quad n \in \mathbb{N}.$$
Define $A_+ = \bigcap_n \bigcup_{k > n} A_k$ and $A_- = A_+^c$. Using the countable additivity again, we get $\nu A_+ = c$. Hence, for sets $B \in \mathcal{A}$,
$$\nu B = \nu A_+ - \nu(A_+ \setminus B) \ge 0, \quad B \subset A_+,$$
$$\nu B = \nu(A_+ \cup B) - \nu A_+ \le 0, \quad B \subset A_-.$$
We may then define some measures $\nu_+$ and $\nu_-$ by
$$\nu_+ B = \nu(B \cap A_+), \quad \nu_- B = -\nu(B \cap A_-), \quad B \in \mathcal{A}.$$
To prove the uniqueness, assume also that $\nu = \mu_+ - \mu_-$ for some positive measures $\mu_+ \perp \mu_-$.
Choose a set $B_+ \in \mathcal{A}$ with $\mu_- B_+ = \mu_+ B_+^c = 0$. Then $\nu$ is both positive and negative on the sets $A_+ \setminus B_+$ and $B_+ \setminus A_+$, and therefore $\nu = 0$ on $A_+ \triangle B_+$. Hence, for any $C \in \mathcal{A}$
$$\mu_+ C = \mu_+(B_+ \cap C) = \nu(B_+ \cap C) = \nu(A_+ \cap C) = \nu_+ C,$$
which shows that $\mu_+ = \nu_+$. Then also $\mu_- = \mu_+ - \nu = \nu_+ - \nu = \nu_-$. $\Box$

The last result can be used to construct the maximum $\mu \vee \nu$ and minimum $\mu \wedge \nu$ of two $\sigma$-finite measures $\mu$ and $\nu$.

Corollary 2.9 (maximum and minimum) For any $\sigma$-finite measures $\mu$ and $\nu$ on a common measurable space, there exists a largest measure $\mu \wedge \nu$ bounded by $\mu$ and $\nu$ and a smallest measure $\mu \vee \nu$ bounding $\mu$ and $\nu$. Furthermore,
$$\mu - \mu \wedge \nu \perp \nu - \mu \wedge \nu, \quad \mu \wedge \nu + \mu \vee \nu = \mu + \nu.$$

Proof: We may assume that $\mu$ and $\nu$ are bounded. Letting $\rho_+ - \rho_-$ be the Hahn decomposition of $\mu - \nu$, we put
$$\mu \wedge \nu = \mu - \rho_+, \quad \mu \vee \nu = \mu + \rho_-. \quad \Box$$

For any two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{A})$, we say that $\nu$ is absolutely continuous with respect to $\mu$ and write $\nu \ll \mu$ if $\mu A = 0$ implies $\nu A = 0$ for all $A \in \mathcal{A}$. The following result gives a fundamental decomposition of a measure into an absolutely continuous and a singular component; at the same time it provides a basic representation of the former part.

Theorem 2.10 (Lebesgue decomposition, Radon-Nikodym theorem) For any $\sigma$-finite measures $\mu$ and $\nu$ on $\Omega$, there exist some unique measures $\nu_a \ll \mu$ and $\nu_s \perp \mu$ such that $\nu = \nu_a + \nu_s$. Furthermore, $\nu_a = f \cdot \mu$ for some $\mu$-a.e. unique measurable function $f \ge 0$ on $\Omega$.

Two lemmas will be needed for the proof.

Lemma 2.11 (closure) Fix two measures $\mu$ and $\nu$ on $\Omega$ and some measurable functions $f_1, f_2, \ldots \ge 0$ on $\Omega$ with $f_n \cdot \mu \le \nu$. Then even $f \cdot \mu \le \nu$, where $f = \sup_n f_n$.

Proof: First assume that $f \cdot \mu \le \nu$ and $g \cdot \mu \le \nu$, and put $h = f \vee g$. Writing $A = \{f > g\}$, we get
$$h \cdot \mu = 1_A h \cdot \mu + 1_{A^c} h \cdot \mu = 1_A f \cdot \mu + 1_{A^c} g \cdot \mu \le 1_A \cdot \nu + 1_{A^c} \cdot \nu = \nu.$$
Thus, we may assume that $f_n \uparrow f$. But then $\nu \ge f_n \cdot \mu \uparrow f \cdot \mu$ by monotone convergence, and so $f \cdot \mu \le \nu$. $\Box$

Lemma 2.12 (partial density) Let $\mu$ and $\nu$ be bounded measures on $\Omega$ with $\mu \not\perp \nu$. Then there exists a measurable function $f \ge 0$ on $\Omega$ such that $\mu f > 0$ and $f \cdot \mu \le \nu$.

Proof: For each $n \in \mathbb{N}$ we introduce the signed measure $\chi_n = \nu - n^{-1}\mu$. By Theorem 2.8 we may choose some $A_n^+ \in \mathcal{A}$ with complement $A_n^-$ such that $\pm\chi_n \ge 0$ on $A_n^\pm$. Since the $\chi_n$ are nondecreasing, we may assume that $A_1^+ \subset A_2^+ \subset \cdots$. Writing $A = \bigcup_n A_n^+$ and noting that $A^c = \bigcap_n A_n^- \subset A_n^-$, we obtain
$$\nu A^c \le \nu A_n^- = \chi_n A_n^- + n^{-1}\mu A_n^- \le n^{-1}\mu\Omega \to 0,$$
and so $\nu A^c = 0$. Since $\mu \not\perp \nu$, we get $\mu A > 0$. Furthermore, $A_n^+ \uparrow A$ implies $\mu A_n^+ \uparrow \mu A > 0$, and we may choose $n$ so large that $\mu A_n^+ > 0$. Putting $f = n^{-1}1_{A_n^+}$, we obtain $\mu f = n^{-1}\mu A_n^+ > 0$ and
$$f \cdot \mu = n^{-1}1_{A_n^+} \cdot \mu = 1_{A_n^+} \cdot \nu - 1_{A_n^+} \cdot \chi_n \le \nu. \quad \Box$$

Proof of Theorem 2.10: We may assume that $\mu$ and $\nu$ are bounded. Let $\mathcal{C}$ denote the class of measurable functions $f \ge 0$ on $\Omega$ with $f \cdot \mu \le \nu$, and define $c = \sup\{\mu f;\ f \in \mathcal{C}\}$. Choose $f_1, f_2, \ldots \in \mathcal{C}$ with $\mu f_n \to c$. Then $f = \sup_n f_n \in \mathcal{C}$ by Lemma 2.11 and $\mu f = c$ by monotone convergence. Define $\nu_a = f \cdot \mu$ and $\nu_s = \nu - \nu_a$, and note that $\nu_a \ll \mu$. If $\nu_s \not\perp \mu$, then by Lemma 2.12 there exists a measurable function $g \ge 0$ with $\mu g > 0$ and $g \cdot \mu \le \nu_s$. But then $f + g \in \mathcal{C}$ with $\mu(f + g) > c$, which contradicts the definition of $c$. Thus, $\nu_s \perp \mu$.

To prove the uniqueness of $\nu_a$ and $\nu_s$, assume that also $\nu = \nu_a' + \nu_s'$ for some measures $\nu_a' \ll \mu$ and $\nu_s' \perp \mu$. Choose $A, B \in \mathcal{A}$ with $\nu_s A = \mu A^c = \nu_s' B = \mu B^c = 0$. Then clearly
$$\nu_s(A \cap B) = \nu_s'(A \cap B) = \nu_a(A^c \cup B^c) = \nu_a'(A^c \cup B^c) = 0,$$
and so
$$\nu_a = 1_{A \cap B} \cdot \nu_a = 1_{A \cap B} \cdot \nu = 1_{A \cap B} \cdot \nu_a' = \nu_a', \quad \nu_s = \nu - \nu_a = \nu - \nu_a' = \nu_s'.$$

To see that $f$ is a.e. unique, assume that also $\nu_a = g \cdot \mu$ for some measurable function $g \ge 0$. Writing $h = f - g$ and noting that $h \cdot \mu = 0$, we get
$$\mu|h| = \int_{\{h > 0\}} h\, d\mu - \int_{\{h < 0\}} h\, d\mu = 0,$$
and so $h = 0$ a.e. by Lemma 1.24. $\Box$

We insert a simple corollary that will be useful in Chapter 10.

Corollary 2.13 (splitting) Consider two finite measure spaces $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ and a measurable map $f: S \to T$ such that $\nu \le \mu \circ f^{-1}$. Then there exists a measure $\mu' \le \mu$ on $S$ such that $\nu = \mu' \circ f^{-1}$.

Proof: Put $\mu' = (g \circ f) \cdot \mu$ with $g = d\nu/d(\mu \circ f^{-1})$, and use Lemma 1.22. $\Box$
□

A measure μ on ℝ is said to be locally finite if μI < ∞ for every bounded interval I. The following result gives a basic correspondence between locally finite measures and nondecreasing functions.
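As an illustration (not part of the original text), the Lebesgue decomposition of Theorem 2.10 can be computed explicitly for measures on a finite space, where the density f is just the ratio of point masses on the atoms of μ. A minimal sketch in Python, with the measures chosen arbitrarily for the example:

```python
# Lebesgue decomposition on the finite space {0, 1, 2, 3}:
# mu and nu are given by their point masses (example values).
mu = [0.5, 0.25, 0.25, 0.0]
nu = [0.2, 0.3, 0.0, 0.5]

# On a countable space, the density f of the absolutely continuous part
# is the ratio of point masses on the atoms of mu; the singular part
# lives on the mu-null set {mu = 0}.
f = [n / m if m > 0 else 0.0 for m, n in zip(mu, nu)]
nu_a = [fi * mi for fi, mi in zip(f, mu)]      # nu_a = f . mu << mu
nu_s = [ni - ai for ni, ai in zip(nu, nu_a)]   # nu_s _|_ mu

assert all(abs(a + s - n) < 1e-12 for a, s, n in zip(nu_a, nu_s, nu))
assert all(s == 0.0 for m, s in zip(mu, nu_s) if m > 0)   # nu_s lives off mu
```

Here the uniqueness of the decomposition is transparent: ν_a is forced on the atoms of μ, and ν_s collects the rest.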
2. Measure Theory: Key Results

Proposition 2.14 (Lebesgue-Stieltjes measures) The relation

    μ(a, b] = F(b) − F(a),   −∞ < a ≤ b < ∞,   (5)

defines a one-to-one correspondence between the locally finite measures μ on ℝ and the right-continuous, nondecreasing functions F on ℝ with F(0) = 0.

Proof: Given a locally finite measure μ on ℝ, we define the function F on ℝ by

    F(x) = μ(0, x] for x ≥ 0,   F(x) = −μ(x, 0] for x < 0.

Then F is right-continuous and nondecreasing with F(0) = 0, and it is clearly the unique such function satisfying (5).

Conversely, given a function F as stated, we define the left-continuous, generalized inverse g: ℝ → ℝ by

    g(t) = inf{s ∈ ℝ; F(s) ≥ t},   t ∈ ℝ.

Since g is again nondecreasing, the set g⁻¹(−∞, s] is an extended interval for each s ∈ ℝ, and so g is measurable by Lemma 1.4. We may then define a measure μ on ℝ by μ = λ∘g⁻¹, where λ denotes Lebesgue measure on ℝ. Noting that g(t) ≤ x iff t ≤ F(x), we get for any a ≤ b

    μ(a, b] = λ{t; g(t) ∈ (a, b]} = λ(F(a), F(b)] = F(b) − F(a).

Thus, the restriction of μ to ℝ satisfies (5). The uniqueness of μ may be proved in the same way as for λ in Theorem 2.2. □

We now specialize Theorem 2.10 to the case when μ equals Lebesgue measure and ν is a locally finite measure on ℝ, defined as in Proposition 2.14 in terms of some nondecreasing, right-continuous function F. The Lebesgue decomposition and Radon-Nikodym property may be expressed in terms of F as

    F = F_a + F_s = ∫f + F_s,   (6)

where F_a and F_s correspond to the absolutely continuous and singular components of ν, respectively, and we assume that F_a(0) = 0. Here ∫f denotes the function ∫₀ˣ f(t) dt, where the Lebesgue density f is a locally integrable function on ℝ. The following result extends the fundamental theorem of calculus for Riemann integrals of continuously differentiable functions, namely the fact that differentiation and integration are mutually inverse operations.
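The generalized-inverse construction in the proof of Proposition 2.14 is easy to check numerically. The following sketch (not from the book; the particular F with a unit jump at x = 1 is chosen for illustration) computes g(t) = inf{s; F(s) ≥ t} in closed form and verifies μ(a, b] = λ{t; g(t) ∈ (a, b]} = F(b) − F(a) on a fine grid:

```python
def F(x):
    # right-continuous, nondecreasing, F(0) = 0, with a unit jump at x = 1
    return x + (1.0 if x >= 1 else 0.0)

def g(t):
    # generalized inverse g(t) = inf{s : F(s) >= t}, worked out by hand
    if t <= 1:
        return t
    if t <= 2:
        return 1.0        # the jump of F at 1 makes g flat on (1, 2]
    return t - 1.0

def mu_interval(a, b, lo=-3.0, hi=6.0, n=900000):
    # Lebesgue measure of {t : g(t) in (a, b]}, approximated on a grid
    h = (hi - lo) / n
    return h * sum(1 for i in range(n) if a < g(lo + (i + 0.5) * h) <= b)

for a, b in [(0.5, 1.5), (-1.0, 0.25), (1.0, 3.0)]:
    assert abs(mu_interval(a, b) - (F(b) - F(a))) < 1e-2
```

The jump of F at 1 produces an atom of μ there, reflected in the flat stretch of g.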
Theorem 2.15 (differentiation, Lebesgue) Any nondecreasing and right-continuous function F = ∫f + F_s is differentiable a.e. with derivative F′ = f.

Thus, the two parts of the fundamental theorem generalize to (∫f)′ = f a.e. and ∫F′ = F_a. In other words, the density of an integral can still be
recovered a.e. through differentiation, whereas integration of a derivative yields only the absolutely continuous component of the underlying function. In particular, F is absolutely continuous iff ∫F′ = F − F(0) and singular iff F′ = 0 a.e.

The last result extends trivially to any difference F = F₊ − F₋ between two nondecreasing, right-continuous functions F₊ and F₋. However, it fails for more general functions, already because the derivative may not exist. For example, the paths of Brownian motion introduced in Chapter 13 are a.s. nowhere differentiable.

Two lemmas will be helpful for the proof of the last theorem.

Lemma 2.16 (interval selection) Let ℐ be a class of open intervals with union G. If λG < ∞, there exist some disjoint sets I_1, …, I_n ∈ ℐ with Σ_k |I_k| > λG/4.

Proof: Choose a compact set K ⊂ G with λK > 3λG/4. By compactness we may cover K by finitely many intervals J_1, …, J_m ∈ ℐ. We now define I_1, I_2, … recursively, by letting I_k be the longest interval J_r not yet chosen such that J_r ∩ I_j = ∅ for all j < k. The selection terminates when no such interval exists. If an interval J_r is not selected, it must intersect a longer interval I_k. Writing Î_k for the interval centered at I_k with length 3|I_k|, we obtain K ⊂ ⋃_r J_r ⊂ ⋃_k Î_k, and so

    (3/4) λG < λK ≤ λ ⋃_k Î_k ≤ Σ_k |Î_k| = 3 Σ_k |I_k|. □

Lemma 2.17 (differentiation on null sets) Let F(x) = μ(0, x] for some locally finite measure μ on ℝ, and let A ∈ ℬ with μA = 0. Then F′ = 0 a.e. λ on A.

Proof: By Lemma 1.34 there exists for every δ > 0 some open set G_δ ⊃ A with μG_δ < δ. Define

    A_ε = {x ∈ A; lim sup_{h→0} μ(x−h, x+h)/h > ε},   ε > 0,

and note that each A_ε is measurable, since the lim sup may be taken along the rationals. For every x ∈ A_ε there exists some interval I = (x−h, x+h) ⊂ G_δ with 2μI > ε|I|, and we note that the class ℐ_{ε,δ} of such intervals covers A_ε. Hence, by Lemma 2.16 we may choose some disjoint sets I_1,
…, I_n ∈ ℐ_{ε,δ} with Σ_k |I_k| > λA_ε/4. Then

    λA_ε ≤ 4 Σ_k |I_k| < (8/ε) Σ_k μI_k ≤ (8/ε) μG_δ < 8δ/ε.

As δ → 0, we get λA_ε = 0. Thus, lim sup_{h→0} μ(x−h, x+h)/h ≤ ε a.e. λ on A, and the assertion follows since ε is arbitrary. □
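Theorem 2.15 below asserts in particular that (∫f)′ = f a.e. A quick numerical check (not from the book; the density f = 1_[0,1] is chosen for illustration) confirms this at points where f is continuous:

```python
def f(x):
    # density f = indicator of [0, 1]
    return 1.0 if 0 <= x <= 1 else 0.0

def F(x):
    # F(x) = \int_0^x f(t) dt, computed in closed form
    return min(max(x, 0.0), 1.0)

# Difference quotients of F recover f away from the exceptional
# Lebesgue-null set {0, 1} where f is discontinuous.
h = 1e-8
for x in (-0.5, 0.3, 0.7, 1.5):
    deriv = (F(x + h) - F(x)) / h
    assert abs(deriv - f(x)) < 1e-6
```

At the two exceptional points the one-sided difference quotients disagree, which is consistent with the theorem: the conclusion holds only almost everywhere.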
Proof of Theorem 2.15: Since F_s′ = 0 a.e. λ by Lemma 2.17, we may assume that F = ∫f. Define

    F^∧(x) = lim sup_{h→0} h⁻¹(F(x+h) − F(x)),
    F^∨(x) = lim inf_{h→0} h⁻¹(F(x+h) − F(x)),

and note that F^∧ = 0 a.e. on the set {f = 0} = {x; f(x) = 0} by Lemma 2.17. Applying this to the function F_r = ∫(f − r)⁺ for arbitrary r ∈ ℝ and noting that f ≤ (f − r)⁺ + r, we get F^∧ ≤ r a.e. on {f ≤ r}. Thus, for r restricted to the rationals,

    λ{f < F^∧} ≤ λ ⋃_r {f ≤ r < F^∧} ≤ Σ_r λ{f ≤ r < F^∧} = 0,

which shows that F^∧ ≤ f a.e. Applying this result to −F = ∫(−f) yields F^∨ = −(−F)^∧ ≥ f a.e. Thus, F^∧ = F^∨ = f a.e., and so F′ exists a.e. and equals f. □

For any function F: ℝ → ℝ, we define the total variation of F on the interval [a, b] as

    ‖F‖_a^b = sup_{(t_k)} Σ_k |F(t_k) − F(t_{k−1})|,

where the supremum extends over all finite partitions a = t_0 < t_1 < ⋯ < t_n = b. Similarly, the positive and negative variations of F are defined by the same expression with the absolute value |·| replaced by the positive and negative parts (·)±. Here x± = (±x) ∨ 0, so that x = x⁺ − x⁻ and |x| = x⁺ + x⁻. We also write Δ_a^b F = F(b) − F(a). The following result gives a basic decomposition of functions of locally finite variation, similar to the Hahn decomposition in Theorem 2.8.

Proposition 2.18 (Jordan decomposition) A function F on ℝ has locally finite variation iff it is a difference of two nondecreasing functions F₊ and F₋. In that case,

    ‖F‖_s^t ≤ Δ_s^t F₊ + Δ_s^t F₋,   s < t,   (7)

with equality iff the increments Δ_s^t F± agree with the positive and negative variations of F on (s, t].

Proof: For any s < t we have

    Δ_s^t F = (Δ_s^t F)⁺ − (Δ_s^t F)⁻,
    |Δ_s^t F| = (Δ_s^t F)⁺ + (Δ_s^t F)⁻ = 2(Δ_s^t F)⁻ + Δ_s^t F.

Summing over the intervals in an arbitrary partition s = t_0 < t_1 < ⋯ < t_n = t and taking the supremum of each side, we obtain

    Δ_s^t F = Δ_s^t F₊ − Δ_s^t F₋,
    ‖F‖_s^t = 2Δ_s^t F₋ + Δ_s^t F = Δ_s^t F₊ + Δ_s^t F₋,
where F±(x) denote the positive and negative variations of F on [0, x] (or minus the variations on [x, 0] when x < 0). Thus, F = F(0) + F₊ − F₋, and (7) holds with equality. If also F = G₊ − G₋ for some nondecreasing functions G±, then (Δ_s^t F)± ≤ Δ_s^t G±, and so Δ_s^t F± ≤ Δ_s^t G±. Thus, ‖F‖_s^t ≤ Δ_s^t G₊ + Δ_s^t G₋, and equality holds iff Δ_s^t F± = Δ_s^t G±. □

Next we give another useful decomposition of finite-variation functions.

Proposition 2.19 (left and right continuity) Any function F of locally finite variation can be written as F_r + F_l, where F_r is right-continuous with left-hand limits and F_l is left-continuous with right-hand limits. If F is right-continuous, then so are the minimal components F± in Proposition 2.18.

Proof: By Proposition 2.18 we may assume that F is nondecreasing. The right- and left-hand limits F±(s) then exist at every point s, and we note that F₋(s) ≤ F(s) ≤ F₊(s). Also note that F has at most countably many jump discontinuities. For t ≥ 0, we define

    F_l(t) = Σ_{s∈[0,t)} (F₊(s) − F(s)),   F_r(t) = F(t) − F_l(t);

when t < 0 we need to take the negative of the corresponding sum on (t, 0]. It is easy to check that F_l is left-continuous and F_r is right-continuous, and that both functions are nondecreasing.

To prove the last assertion, assume that F is right-continuous at some point s. If ‖F‖_s^t → c > 0 as t ↓ s, we may choose t − s so small that ‖F‖_s^t < 4c/3. Next we may choose a partition s = t_0 < t_1 < ⋯ < t_n = t of [s, t] such that the corresponding F-increments δ_k satisfy Σ_k |δ_k| > 2c/3. By the right continuity of F at s, we may assume that t_1 − s is small enough that δ_1 = |F(t_1) − F(s)| < c/3. Then ‖F‖_{t_1}^t > c/3, and so

    4c/3 > ‖F‖_s^t = ‖F‖_s^{t_1} + ‖F‖_{t_1}^t ≥ c + c/3 = 4c/3,

a contradiction. Hence c = 0. Assuming F± to be minimal, we obtain Δ_s^t F± ≤ ‖F‖_s^t → 0 as t ↓ s. □

Justified by the last theorem, we may assume our finite-variation functions to be right-continuous.
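As a numerical aside (not in the original text), the positive, negative, and total variations of Proposition 2.18 can be approximated along a fine partition; the example uses F = sin on [0, 2π]:

```python
import math

def variations(F, a, b, n=100000):
    # Positive, negative, and total variation of F over [a, b],
    # approximated along the uniform partition with n subintervals.
    pos = neg = tot = 0.0
    for k in range(1, n + 1):
        d = F(a + k * (b - a) / n) - F(a + (k - 1) * (b - a) / n)
        pos += max(d, 0.0)
        neg += max(-d, 0.0)
        tot += abs(d)
    return pos, neg, tot

F = math.sin
pos, neg, tot = variations(F, 0.0, 2 * math.pi)
# sin rises by 2 and falls by 2 on [0, 2*pi]
assert abs(pos - 2.0) < 1e-3 and abs(neg - 2.0) < 1e-3
assert abs(tot - (pos + neg)) < 1e-9                        # equality in (7)
assert abs((pos - neg) - (F(2 * math.pi) - F(0.0))) < 1e-9  # Delta F = F_+ - F_-
```

For a smooth F the partition sums converge to the true variations, exhibiting the equality case of (7).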
In that case, we have the following basic relation to signed measures. Here we only require the latter to be locally bounded.

Proposition 2.20 (finite-variation functions and signed measures) For any right-continuous function F of locally finite variation, there exists a unique signed measure ν on ℝ such that ν(s, t] = Δ_s^t F for all s < t. Furthermore, the Hahn decomposition ν = ν₊ − ν₋ and the Jordan decomposition F = F₊ − F₋ into minimal components are related by ν±(s, t] = Δ_s^t F±.

Proof: The positive and negative variations F± are right-continuous by Proposition 2.19. Hence, by Proposition 2.14 there exist some locally finite
measures μ± on ℝ such that μ±(s, t] = Δ_s^t F±, and we may take ν = μ₊ − μ₋. To see that this agrees with the Hahn decomposition ν = ν₊ − ν₋, choose A ∈ 𝒜 such that ν₊Aᶜ = ν₋A = 0. For any B ∈ ℬ, we get

    μ₊B ≥ μ₊(B ∩ A) ≥ ν(B ∩ A) = ν₊(B ∩ A) = ν₊B,

which shows that μ₊ ≥ ν₊. Then also μ₋ ≥ ν₋. If the equality fails on some interval (s, t], then

    ‖F‖_s^t = μ₊(s, t] + μ₋(s, t] > ν₊(s, t] + ν₋(s, t],

which contradicts Proposition 2.18. Hence, μ± = ν±. □

A function F: ℝ → ℝ is said to be absolutely continuous if for any a < b and ε > 0 there exists some δ > 0 such that, for any finite collection of disjoint intervals (a_k, b_k] ⊂ (a, b] with Σ_k |b_k − a_k| < δ, we have Σ_k |F(b_k) − F(a_k)| < ε. In particular, we note that every absolutely continuous function is continuous and has locally finite variation. Given a function F of locally finite variation, we say that F is singular if for any a < b and ε > 0 there exist finitely many disjoint intervals (a_k, b_k] ⊂ (a, b] such that Σ_k |b_k − a_k| < ε and ‖F‖_a^b ≤ Σ_k |F(b_k) − F(a_k)| + ε.

We say that a locally finite signed measure ν on ℝ is absolutely continuous or singular if the components ν± of the associated Hahn decomposition satisfy ν± ≪ λ or ν± ⊥ λ, respectively. The following result relates the notions of absolute continuity and singularity for functions and measures.

Proposition 2.21 (absolutely continuous and singular functions) Let F be a right-continuous function on ℝ of locally finite variation, and let ν be the associated signed measure on ℝ with ν(s, t] = Δ_s^t F. Then F is absolutely continuous or singular iff the corresponding property holds for ν.

Proof: If F is absolutely continuous or singular, then the corresponding property holds for the total variation function ‖F‖ with arbitrary a and hence also for the minimal components F± in Proposition 2.20.
Thus, we may assume that F is nondecreasing, so that ν is a positive and locally finite measure on ℝ.

First assume that F is absolutely continuous. If ν ≪ λ fails, there exists a bounded interval I = (a, b) with a subset A ∈ ℬ such that λA = 0 but νA > 0. Taking ε = νA/2, we choose a corresponding δ > 0 as in the definition of absolute continuity. Since A is measurable and has outer Lebesgue measure 0, we may next choose an open set G with A ⊂ G ⊂ I such that λG < δ. But then νA ≤ νG ≤ ε = νA/2, a contradiction. This shows that ν ≪ λ.

Next assume that F is singular, and fix any bounded interval I = (a, b]. Given any ε > 0, we may choose some Borel sets A_1, A_2, … ⊂ I such that λA_n < ε2⁻ⁿ and νA_n → νI. Then B = ⋃_n A_n satisfies λB < ε and νB = νI. Next we may choose some Borel sets B_n ⊂ I with λB_n → 0 and
νB_n = νI. Then C = ⋂_n B_n satisfies λC = 0 and νC = νI, which shows that ν ⊥ λ on I.

Conversely, assume that ν ≪ λ, so that ν = f·λ for some locally integrable function f ≥ 0. Fix any bounded interval I, and put A_n = {x ∈ I; f(x) > n}. Fix any ε > 0. Since νA_n → 0 by Lemma 1.14, we may choose n so large that νA_n < ε/2. Put δ = ε/2n. For any Borel set B ⊂ I with λB < δ we obtain

    νB = ν(B ∩ A_n) + ν(B ∩ A_nᶜ) ≤ νA_n + nλB < ε/2 + nδ = ε.

In particular, this applies to any finite union B of intervals (a_k, b_k] ⊂ I, and so we may conclude that F is absolutely continuous.

Finally, assume that ν ⊥ λ. Fix any finite interval I = (a, b], and choose a Borel set A ⊂ I such that λA = 0 and νA = νI. For any ε > 0 we may choose some open set G ⊃ A with λG < ε. Letting (a_n, b_n) denote the connected components of G and writing I_n = (a_n, b_n], we get Σ_n |I_n| < ε and Σ_n ν(I ∩ I_n) = νI. This shows that F is singular. □

From now on, we assume the basic space S to be locally compact, second countable, and Hausdorff (abbreviated lcscH). Let 𝒢, ℱ, and 𝒦 denote the classes of open, closed, and compact sets in S, and put Ĝ = {G ∈ 𝒢; Ḡ ∈ 𝒦}. Let Ĉ₊ = Ĉ₊(S) denote the class of continuous functions f: S → ℝ₊ with compact support, where the latter is defined as the closure of the set {x ∈ S; f(x) > 0}. Relations such as U ≺ f ≺ V mean that f ∈ Ĉ₊ with 0 ≤ f ≤ 1 and satisfies f = 1 on U and supp f ⊂ V°.

By a positive linear functional on Ĉ₊ we mean a mapping μ: Ĉ₊ → ℝ₊ such that μ(f + g) = μf + μg for all f, g ∈ Ĉ₊. This clearly implies the homogeneity μ(cf) = cμf for any f ∈ Ĉ₊ and c ∈ ℝ₊. A Radon measure on S is defined as a measure μ on the Borel σ-field 𝒮 = ℬ(S) such that μK < ∞ for every K ∈ 𝒦. The following result gives the basic extension of positive linear functionals to measures.
Theorem 2.22 (Riesz representation) If S is lcscH, then every positive linear functional μ on Ĉ₊(S) extends uniquely to a Radon measure on S.

Several lemmas will be needed for the proof, and we begin with a simple topological fact.

Lemma 2.23 (partition of unity) For any open cover G_1, …, G_n of a compact set K ⊂ S, there exist some functions f_1, …, f_n ∈ Ĉ₊(S) with f_k ≺ G_k such that Σ_k f_k = 1 on K.

Proof: For any x ∈ K we may choose some k ≤ n and V ∈ Ĝ with x ∈ V and V̄ ⊂ G_k. By compactness, K is covered by finitely many such sets V_1, …, V_m. For each k ≤ n, let U_k be the union of all sets V_j with V̄_j ⊂ G_k. Then Ū_k ⊂ G_k, and so we may choose g_1, …, g_n ∈ Ĉ₊ with U_k ≺ g_k ≺ G_k. Define

    f_k = g_k (1 − g_1) ⋯ (1 − g_{k−1}),   k = 1, …, n.
Then f_k ≺ G_k for all k, and by induction

    f_1 + ⋯ + f_n = 1 − (1 − g_1) ⋯ (1 − g_n).

It remains to note that Π_k (1 − g_k) = 0 on K, since K ⊂ ⋃_k U_k. □

By an inner content on an lcscH space S we mean a nondecreasing function μ: 𝒢 → ℝ̄₊, finite on Ĝ, such that μ is both finitely additive and countably subadditive, and also satisfies the inner continuity

    μG = sup{μU; U ∈ Ĝ, Ū ⊂ G},   G ∈ 𝒢.   (8)

Lemma 2.24 (inner approximation) For any positive linear functional μ on Ĉ₊(S), we may define an inner content ν on S by

    νG = sup{μf; f ≺ G},   G ∈ 𝒢.

Proof: Note that ν is nondecreasing with ν∅ = 0 and that νG < ∞ for bounded G. It is also clear that ν is inner continuous in the sense of (8). To show that ν is countably subadditive, fix any G_1, G_2, … ∈ 𝒢 and let f ≺ ⋃_k G_k. By compactness, f ≺ ⋃_{k≤n} G_k for some finite n, and by Lemma 2.23 we may choose some functions g_k ≺ G_k such that Σ_k g_k = 1 on supp f. Then the products f_k = g_k f satisfy f_k ≺ G_k and Σ_k f_k = f, and so

    μf = Σ_{k≤n} μf_k ≤ Σ_{k≤n} νG_k ≤ Σ_k νG_k.

Since f ≺ ⋃_k G_k was arbitrary, we obtain ν⋃_k G_k ≤ Σ_k νG_k, as required.

To show that ν is finitely additive, fix any disjoint sets G, G′ ∈ 𝒢. If f ≺ G and f′ ≺ G′, then f + f′ ≺ G ∪ G′, and so

    μf + μf′ = μ(f + f′) ≤ ν(G ∪ G′) ≤ νG + νG′.

Taking the supremum over all f and f′ gives νG + νG′ = ν(G ∪ G′), as required. □

An outer measure μ on S is said to be regular if it is finitely additive on 𝒢 and enjoys the outer and inner regularity

    μA = inf{μG; G ∈ 𝒢, G ⊃ A},   A ⊂ S,   (9)
    μG = sup{μK; K ∈ 𝒦, K ⊂ G},   G ∈ 𝒢.   (10)

Lemma 2.25 (outer approximation) Every inner content μ on S admits an extension to a regular outer measure.

Proof: We may define the extension by (9), since the right-hand side equals μA when A ∈ 𝒢. By the finite additivity on 𝒢 we have 2μ∅ = μ∅ < ∞, which implies μ∅ = 0. To prove the countable subadditivity, fix any A_1, A_2, … ⊂ S.
For any ε > 0 we may choose some G_1, G_2, … ∈ 𝒢 with G_n ⊃ A_n and μG_n ≤ μA_n + ε2⁻ⁿ. Since μ is subadditive on 𝒢, we get

    μ ⋃_n A_n ≤ μ ⋃_n G_n ≤ Σ_n μG_n ≤ Σ_n μA_n + ε.
The desired relation follows since ε was arbitrary. Thus, the extension is an outer measure on S. Finally, the inner regularity in (10) follows from (8) and the monotonicity of μ. □

Lemma 2.26 (measurability) If μ is a regular outer measure on S, then every Borel set in S is μ-measurable.

Proof: Fix any F ∈ ℱ and A ⊂ G ∈ 𝒢. By the inner regularity in (10), we may choose G_1, G_2, … ∈ 𝒢 with Ḡ_n ⊂ G∖F and μG_n → μ(G∖F). Since μ is nondecreasing and finitely additive on 𝒢, we get

    μG ≥ μ(G ∖ ∂G_n) = μG_n + μ(G ∖ Ḡ_n)
       ≥ μG_n + μ(G ∩ F) → μ(G∖F) + μ(G ∩ F)
       ≥ μ(A ∖ F) + μ(A ∩ F).

Using the outer regularity in (9) gives

    μA ≥ μ(A ∖ F) + μ(A ∩ F),   F ∈ ℱ, A ⊂ S.

Hence, every closed set is measurable, and by Theorem 2.1 the measurability extends to σ(ℱ) = ℬ(S) = 𝒮. □

Proof of Theorem 2.22: Construct an inner content ν as in Lemma 2.24, and conclude from Lemma 2.25 that ν admits an extension to a regular outer measure on S. By Theorem 2.1 and Lemma 2.26, the restriction of the latter to 𝒮 = ℬ(S) is a Radon measure on S, here still denoted by ν. To see that μ = ν on Ĉ₊, fix any f ∈ Ĉ₊. For n ∈ ℕ and k ∈ ℤ₊, let

    f_k^n(x) = (nf(x) − k)⁺ ∧ 1,   G_k^n = {nf > k} = {f_k^n > 0}.

Noting that Ḡ_{k+1}^n ⊂ {f_k^n = 1} and using the definition of ν and the outer regularity in (9), we get for appropriate k

    ν f_{k+1}^n ≤ νG_{k+1}^n ≤ μ f_k^n ≤ νG_k^n ≤ ν f_{k−1}^n.

Writing G_0 = G_0^n = {f > 0} and noting that nf = Σ_k f_k^n, we obtain

    nνf − νG_0 ≤ nμf ≤ nνf + νG_0.

Here νG_0 < ∞ since G_0 is bounded. Dividing by n and letting n → ∞ gives μf = νf.

To prove the asserted uniqueness, let μ and ν be Radon measures on S with μf = νf for all f ∈ Ĉ₊. By an inner approximation, we have μG = νG for every G ∈ Ĝ, and a monotone-class argument yields μ = ν. □

By a topological group we mean a group endowed with a topology that renders the group operations continuous.
Thus, the mapping (f, g) ↦ fg is continuous from G² to G, whereas the mapping g ↦ g⁻¹ is continuous from G to G. In the former case, G² is equipped with the product topology.
Introducing the Borel σ-field 𝒢 = ℬ(G), we obtain a measurable group (G, 𝒢), and we note that the group operations are measurable when G is lcscH. A measure μ on G is said to be left-invariant if μ(gB) = μB for all g ∈ G and B ∈ 𝒢, where gB = {gb; b ∈ B}, the left translate of B by g. This is clearly equivalent to ∫f(gk) μ(dk) = μf for any measurable function f: G → ℝ̄₊ and element g ∈ G. The definition of right-invariant measures is similar. We may now state the basic existence and uniqueness theorem for invariant measures on groups.

Theorem 2.27 (Haar measure) On every lcscH group G there exists, uniquely up to a normalization, a left-invariant Radon measure λ ≠ 0. If G is compact, then λ is also right-invariant.

Proof (Weil): For any f, g ∈ Ĉ₊ we define |f|_g = inf Σ_k c_k, where the infimum extends over all finite sets of constants c_1, …, c_n ≥ 0 such that

    f(x) ≤ Σ_{k≤n} c_k g(s_k x),   x ∈ G,

for some s_1, …, s_n ∈ G. By compactness, |f|_g < ∞ when g ≠ 0. We also note that |f|_g is nondecreasing and translation invariant in f, and that it satisfies the subadditivity and homogeneity properties

    |f + f′|_g ≤ |f|_g + |f′|_g,   |cf|_g = c|f|_g,   (11)

as well as the inequalities

    ‖f‖/‖g‖ ≤ |f|_g ≤ |f|_h |h|_g.   (12)

We may normalize |f|_g by fixing an f_0 ∈ Ĉ₊ ∖ {0} and putting

    λ_g f = |f|_g / |f_0|_g,   f, g ∈ Ĉ₊, g ≠ 0.

From (11) and (12) we note that

    λ_g(f + f′) ≤ λ_g f + λ_g f′,   λ_g(cf) = c λ_g f,   (13)
    |f_0|_f⁻¹ ≤ λ_g f ≤ |f|_{f_0}.   (14)

Conversely, λ_g is nearly superadditive in the following sense.

Lemma 2.28 (near superadditivity) For any f, f′ ∈ Ĉ₊ and ε > 0, there exists an open set U ≠ ∅ such that

    λ_g f + λ_g f′ ≤ λ_g(f + f′) + ε,   0 ≠ g ≺ U.

Proof: Fix any h ∈ Ĉ₊ with h = 1 on supp(f + f′), and define for δ > 0

    f_δ = f + f′ + δh,   h_δ = f/f_δ,   h_δ′ = f′/f_δ,

so that h_δ, h_δ′ ∈ Ĉ₊. By compactness we may choose a neighborhood U of the identity element e ∈ G such that

    |h_δ(x) − h_δ(y)| ≤ δ,   |h_δ′(x) − h_δ′(y)| ≤ δ,   x⁻¹y ∈ U.   (15)
Now assume 0 ≠ g ≺ U, and let f_δ(x) ≤ Σ_k c_k g(s_k x) for some s_1, …, s_n ∈ G and c_1, …, c_n ≥ 0. Since g(s_k x) ≠ 0 implies s_k x ∈ U, we have by (15)

    f(x) = f_δ(x) h_δ(x) ≤ Σ_k c_k g(s_k x) h_δ(x) ≤ Σ_k c_k g(s_k x){h_δ(s_k⁻¹) + δ},

and similarly for f′. Noting that h_δ + h_δ′ ≤ 1, we get

    |f|_g + |f′|_g ≤ Σ_k c_k (1 + 2δ).

Taking the infimum over all dominating sums for f_δ and using (11), we conclude that

    |f|_g + |f′|_g ≤ |f_δ|_g (1 + 2δ) ≤ {|f + f′|_g + δ|h|_g}(1 + 2δ).

Now divide by |f_0|_g, and use (14) to obtain

    λ_g f + λ_g f′ ≤ {λ_g(f + f′) + δ λ_g h}(1 + 2δ)
                  ≤ λ_g(f + f′) + 2δ|f + f′|_{f_0} + δ(1 + 2δ)|h|_{f_0},

which tends to λ_g(f + f′) as δ → 0. □

Returning to the proof of Theorem 2.27, we may consider the functionals λ_g as elements of the product space Λ = ℝ₊^{Ĉ₊}. For any neighborhood U of e, let Λ_U denote the closure in Λ of the set {λ_g; 0 ≠ g ≺ U}. Since λ_g f ≤ |f|_{f_0} < ∞ for all f ∈ Ĉ₊ by (14), the Λ_U are compact by Tychonov's theorem. Furthermore, the family {Λ_U; e ∈ U} has the finite intersection property, since U ⊂ V implies Λ_U ⊂ Λ_V. We may then choose an element λ ∈ ⋂_U Λ_U, here regarded as a functional on Ĉ₊. From (14) we note that λ ≠ 0.

To see that λ is linear, fix any f, f′ ∈ Ĉ₊ and a, b ≥ 0, and choose some g_1, g_2, … ∈ Ĉ₊ with supp g_n ↓ {e} such that

    λ_{g_n} f → λf,   λ_{g_n} f′ → λf′,   λ_{g_n}(af + bf′) → λ(af + bf′).

By (13) and Lemma 2.28 we obtain λ(af + bf′) = aλf + bλf′. Thus, λ is a nontrivial, positive linear functional on Ĉ₊, and so by Theorem 2.22 it extends uniquely to a Radon measure on G. The invariance of the functionals λ_g clearly carries over to λ.

Now consider any left-invariant Radon measure λ ≠ 0 on G. Fixing a right-invariant Radon measure μ ≠ 0 and a function h ∈ Ĉ₊ ∖ {0}, we define

    p(x) = ∫ h(y⁻¹x) μ(dy),   x ∈ G,
and we note that p > 0 on G. Using the invariance of λ and μ together with Fubini's theorem, we get for any f ∈ Ĉ₊

    (λh)(μf) = ∫ h(x) λ(dx) ∫ f(y) μ(dy)
             = ∫ h(x) λ(dx) ∫ f(yx) μ(dy)
             = ∫ μ(dy) ∫ h(x) f(yx) λ(dx)
             = ∫ μ(dy) ∫ h(y⁻¹x) f(x) λ(dx)
             = ∫ f(x) λ(dx) ∫ h(y⁻¹x) μ(dy) = λ(fp).

Since f was arbitrary, we conclude that (λh)·μ = p·λ or, equivalently, λ/λh = p⁻¹·μ. Here the right-hand side is independent of λ, and the asserted uniqueness follows. If G is compact, we may choose h ≡ 1 to obtain λ/λG = μ/μG. □

Given a group G and an abstract space S, we define a left action of G on S as a mapping (g, s) ↦ gs from G × S to S such that es = s and (gh)s = g(hs) for any g, h ∈ G and s ∈ S, where e denotes the identity element in G. Similarly, a right action is a mapping (s, g) ↦ sg such that se = s and s(gh) = (sg)h for all s, g, h as above. The action is said to be transitive if for any s, t ∈ S there exists some g ∈ G such that gs = t or sg = t, respectively. All actions are henceforth assumed to be from the left.

If G is a topological group and S is a topological space, we assume the action (x, s) ↦ xs to be continuous from G × S to S. A function h: G → S is said to be proper if h⁻¹K is compact in G for any compact set K ⊂ S; if this holds for every mapping π_s(x) = xs, s ∈ S, we say that the group action is proper. Finally, a measure μ on S is G-invariant if μ(xB) = μB for any x ∈ G and B ∈ 𝒮. This is clearly equivalent to the relation ∫ f(xs) μ(ds) = μf for any measurable function f: S → ℝ̄₊ and element x ∈ G.

We may now state the basic existence and uniqueness result for invariant measures on a general lcscH space. The existence of Haar measures in Theorem 2.27 is a special case.

Theorem 2.29 (invariant measure) Consider an lcscH group G that acts transitively and properly on an lcscH space S. Then there exists, uniquely up to a normalization, a G-invariant Radon measure μ ≠ 0 on S.
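Before turning to the proof, note the simplest instance of Theorem 2.27: on a finite (hence compact) group, Haar measure is normalized counting measure, which is both left- and right-invariant. A small check (not from the book) on the symmetric group S_3, with elements represented as permutation tuples:

```python
from itertools import permutations

# S_3 as permutations of (0, 1, 2); normalized counting measure is Haar.
G = list(permutations(range(3)))
compose = lambda p, q: tuple(p[q[i]] for i in range(3))   # p after q

haar = {g: 1.0 / len(G) for g in G}

for g in G:
    for B in ({G[0], G[3]}, set(G[:4])):          # a few test sets B
        gB = {compose(g, b) for b in B}           # left translate gB
        Bg = {compose(b, g) for b in B}           # right translate Bg
        assert sum(haar[x] for x in gB) == sum(haar[x] for x in B)
        assert sum(haar[x] for x in Bg) == sum(haar[x] for x in B)
```

Invariance here is immediate, since each translation is a bijection of G; the content of Theorem 2.27 lies in the uncountable case.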
Proof: Fix any p ∈ S, and let π denote the mapping x ↦ xp from G to S. Letting λ be a left Haar measure on G, we define μ = λ∘π⁻¹. Since π is proper, we note that μ is a Radon measure on S. To see that μ is
G-invariant, let f ∈ Ĉ₊ be arbitrary, and note that for any x ∈ G

    ∫_S f(xs) μ(ds) = ∫_G f(xyp) λ(dy) = ∫_G f(yp) λ(dy) = μf,

by the invariance of λ.

To prove the uniqueness, let μ be an arbitrary G-invariant Radon measure on S. Introduce the subgroup K = {x ∈ G; xp = p} = π⁻¹{p}, and note that K is compact since π is proper. Let ν be the normalized Haar measure on K, and define

    f̃(x) = ∫_K f(xk) ν(dk),   x ∈ G, f ∈ Ĉ₊(G).

If xp = yp, we have y⁻¹xp = p, and so y⁻¹x = h ∈ K, which implies x = yh. Hence, the left invariance of ν yields

    f̃(x) = f̃(yh) = ∫_K f(yhk) ν(dk) = ∫_K f(yk) ν(dk) = f̃(y).

We may then define a mapping f ↦ f* by

    f*(s) = f̃(x),   s = xp ∈ S, x ∈ G, f ∈ Ĉ₊(G).

For any subset B ⊂ (0, ∞), we note that

    (f*)⁻¹B = π(f̃⁻¹B) ⊂ π[(supp f)·K].

Here the right-hand side is compact, since the sets supp f and K are compact, and since π and the group operation in G are both continuous. Thus, f* has bounded support. Furthermore, f̃ is continuous by dominated convergence, and so f̃⁻¹[t, ∞) is closed and hence compact for every t > 0. By the continuity of π it follows that even (f*)⁻¹[t, ∞) is compact. In particular, f* is measurable. We may now define a functional Λ on Ĉ₊(G) by Λf = μf*, f ∈ Ĉ₊(G). The linearity and positivity of Λ are clear from the corresponding properties of the mapping f ↦ f* and the measure μ. We also note that Λ is finite on Ĉ₊(G) since μ is locally finite. By Theorem 2.22, we may then extend Λ to a Radon measure on G.

To see that Λ is left-invariant, let f ∈ Ĉ₊(G) be arbitrary and define f_y(x) = f(yx). For any s = xp ∈ S and y ∈ G we get

    f_y*(s) = f̃_y(x) = ∫_K f(yxk) ν(dk) = f̃(yx) = f*(ys).

Hence, by the invariance of μ,

    ∫_G f(yx) Λ(dx) = Λf_y = μf_y* = ∫_S f*(ys) μ(ds) = μf* = Λf.
Now fix any g ∈ Ĉ₊(S), and put

    f(x) = g(xp) = g∘π(x),   x ∈ G.

Then f ∈ Ĉ₊(G), because {f > 0} ⊂ π⁻¹ supp g, which is compact since π is proper. By the definition of K, we have for any s = xp ∈ S

    f*(s) = f̃(x) = ∫_K f(xk) ν(dk) = ∫_K g(xkp) ν(dk) = ∫_K g(xp) ν(dk) = g(s),

and so

    μg = μf* = Λf = Λ(g∘π) = (Λ∘π⁻¹)g,

which shows that μ = Λ∘π⁻¹. Since Λ is unique up to a normalization, the same thing is true for μ. □

Exercises

1. Show that if μ_1 = f_1·μ and μ_2 = f_2·μ, then μ_1 ∨ μ_2 = (f_1 ∨ f_2)·μ and μ_1 ∧ μ_2 = (f_1 ∧ f_2)·μ. In particular, we may take μ = μ_1 + μ_2. Extend the result to sequences μ_1, μ_2, ….

2. Consider an arbitrary family μ_i, i ∈ I, of σ-finite measures on some measurable space S. Show that there exists a largest measure μ = ⋀_i μ_i such that μ ≤ μ_i for all i ∈ I. Show also that if the μ_i are bounded by some σ-finite measure ν, there exists a smallest measure μ̂ = ⋁_i μ_i such that μ_i ≤ μ̂ for all i. (Hint: Use Zorn's lemma.)

3. Show that any countably additive set function μ ≥ 0 on a field 𝒜 with μ∅ = 0 extends to a measure on σ(𝒜). Show also that the extension is unique whenever μ is bounded.

4. Extend the first assertion of Theorem 2.6 to the context of general invariant measures, as in Theorem 2.29.

5. Construct d-dimensional Lebesgue measure λ_d directly, by the method of Theorem 2.2. Then show that λ_d agrees with the d-fold product of one-dimensional Lebesgue measures λ.

6. Derive the existence of d-dimensional Lebesgue measure from Riesz' representation theorem and the basic properties of the Riemann integral.

7. Extend the mean continuity in Lemma 2.7 to general invariant measures.

8. For any bounded, signed measure ν on (Ω, 𝒜), show that there exists a smallest measure |ν| such that |νA| ≤ |ν|A for all A ∈ 𝒜. Show also that |ν| = ν₊ + ν₋, where ν± are the components in the Hahn decomposition of ν. Finally, for any bounded, measurable function f on Ω, show that |νf| ≤ |ν||f|.
9. Extend the last result to complex-valued measures χ = μ + iν, where μ and ν are bounded, signed measures on (Ω, 𝒜). Introducing the complex-valued Radon-Nikodym density f = dχ/d(|μ| + |ν|), show that |χ| = |f|·(|μ| + |ν|).

10. Show by an example that the uniqueness in Theorem 2.29 may fail if the group action is not transitive.
Chapter 3

Processes, Distributions, and Independence

Random elements and processes; distributions and expectation; independence; zero-one laws; Borel-Cantelli lemma; Bernoulli sequences and existence; moments and continuity of paths

Armed with the basic notions and results of measure theory from the previous chapter, we may now embark on our study of probability theory itself. The dual purpose of this chapter is to introduce the basic terminology and notation and to prove some fundamental results, many of which are used throughout the remainder of this book.

In modern probability theory it is customary to relate all objects of study to a basic probability space (Ω, 𝒜, P), which is nothing more than a normalized measure space. Random variables may then be defined as measurable functions ξ on Ω, and their expected values as the integrals Eξ = ∫ξ dP. Furthermore, independence between random quantities reduces to a kind of orthogonality between the induced sub-σ-fields. It should be noted, however, that the reference space Ω is introduced only for technical convenience, to provide a consistent mathematical framework. Indeed, the actual choice of Ω plays no role, and the interest focuses instead on the various induced distributions ℒ(ξ) = P∘ξ⁻¹.

The notion of independence is fundamental for all areas of probability theory. Despite its simplicity, it has some truly remarkable consequences. A particularly striking result is Kolmogorov's 0-1 law, which states that every tail event associated with a sequence of independent random elements has probability zero or one. As a consequence, any random variable that depends only on the "tail" of the sequence must be a.s. constant. This result and the related Hewitt-Savage 0-1 law convey much of the flavor of modern probability: Although the individual elements of a random sequence are erratic and unpredictable, the long-term behavior may often conform to deterministic laws and patterns.
Our main objective is to uncover the latter. Here the classical Borel-Cantelli lemma is a useful tool, among others.

To justify our study, we need to ensure the existence of the random objects under discussion. For most purposes, it suffices to use the Lebesgue unit interval ([0, 1], ℬ, λ) as the basic probability space. In this chapter the existence will be proved only for independent random variables with prescribed distributions; we postpone the more general discussion until
Chapter 6. As a key step, we use the binary expansion of real numbers to construct a so-called Bernoulli sequence, consisting of independent random digits 0 or 1 with probabilities 1 − p and p, respectively. Such sequences may be regarded as discrete-time counterparts of the fundamental Poisson process, to be introduced and studied in Chapter 12.

The distribution of a random process X is determined by the finite-dimensional distributions, and those are not affected if we change each value X_t on a null set. It is then natural to look for versions of X with suitable regularity properties. As another striking result, we shall provide a moment condition that ensures the existence of a continuous modification of the process. Regularizations of various kinds are important throughout modern probability theory, as they may enable us to deal with events depending on the values of a process at uncountably many times.

To begin our systematic exposition of the theory, we may fix an arbitrary probability space (Ω, 𝒜, P), where P, the probability measure, has total mass 1. In the probabilistic context the sets A ∈ 𝒜 are called events, and PA = P(A) is called the probability of A. In addition to results valid for all measures, there are properties that depend on the boundedness or normalization of P, such as the relation PAᶜ = 1 − PA and the fact that A_n ↓ A implies PA_n → PA.

Some infinite set operations have special probabilistic significance. Thus, given any sequence of events A_1, A_2, … ∈ 𝒜, we may be interested in the sets {A_n i.o.}, where A_n happens infinitely often, and {A_n ult.}, where A_n happens ultimately (i.e., for all but finitely many n). Those occurrences are events in their own right, expressible in terms of the A_n as

    {A_n i.o.} = {Σ_n 1_{A_n} = ∞} = ⋂_n ⋃_{k≥n} A_k,   (1)
    {A_n ult.} = {Σ_n 1_{A_nᶜ} < ∞} = ⋃_n ⋂_{k≥n} A_k.   (2)

From here on, we omit the argument ω from our notation when there is no risk of confusion.
For example, the expression {Σₙ 1_{Aₙ} = ∞} is used as a convenient shorthand for the unwieldy {ω ∈ Ω; Σₙ 1_{Aₙ}(ω) = ∞}.

The indicator functions of the events in (1) and (2) may be expressed as

  1{Aₙ i.o.} = limsup_{n→∞} 1_{Aₙ},   1{Aₙ ult.} = liminf_{n→∞} 1_{Aₙ},

where, for typographical convenience, we write 1{·} instead of 1_{·}. Applying Fatou's lemma to the functions 1_{Aₙ} and 1_{Aₙᶜ}, we get

  P{Aₙ i.o.} ≥ limsup_{n→∞} PAₙ,   P{Aₙ ult.} ≤ liminf_{n→∞} PAₙ.

Using the continuity and subadditivity of P, we further see from (1) that

  P{Aₙ i.o.} = lim_{n→∞} P ∪_{k≥n} A_k ≤ lim_{n→∞} Σ_{k≥n} PA_k.
3. Processes, Distributions, and Independence

If Σₙ PAₙ < ∞, we get zero on the right, and it follows that P{Aₙ i.o.} = 0. The resulting implication constitutes the easy part of the Borel–Cantelli lemma, to be reconsidered in Theorem 3.18.

Any measurable mapping ξ of Ω into some measurable space (S, S) is called a random element in S. If B ∈ S, then {ξ ∈ B} = ξ⁻¹B ∈ A, and we may consider the associated probabilities

  P{ξ ∈ B} = P(ξ⁻¹B) = (P ∘ ξ⁻¹)B,   B ∈ S.

The set function L(ξ) = P ∘ ξ⁻¹ is a probability measure on the range space S of ξ, called the distribution or law of ξ. We shall also use the term distribution as synonymous with probability measure, even when no generating random element has been introduced.

Random elements are of interest in a wide variety of spaces. A random element in S is called a random variable when S = ℝ, a random vector when S = ℝᵈ, a random sequence when S = ℝ^∞, a random or stochastic process when S is a function space, and a random measure or set when S is a class of measures or sets, respectively. A metric or topological space S will be endowed with its Borel σ-field B(S) unless a σ-field is otherwise specified. For any separable metric space S, it is clear from Lemma 1.2 that ξ = (ξ₁, ξ₂, …) is a random element in S^∞ iff ξ₁, ξ₂, … are random elements in S.

If (S, S) is a measurable space, then any subset A ⊂ S becomes a measurable space in its own right when endowed with the σ-field A ∩ S = {A ∩ B; B ∈ S}. By Lemma 1.6 we note in particular that if S is a metric space with Borel σ-field S, then A ∩ S is the Borel σ-field in A. Any random element in (A, A ∩ S) may clearly be regarded, alternatively, as a random element in S. Conversely, if ξ is a random element in S such that ξ ∈ A a.s. (almost surely or with probability 1) for some A ∈ S, then ξ = η a.s. for some random element η in A.
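On a finite probability space the law P ∘ ξ⁻¹ defined above can be computed directly by pushing the mass of each point forward. The following Python sketch uses a made-up three-point space purely for illustration.

```python
from collections import defaultdict

P  = {"a": 0.2, "b": 0.3, "c": 0.5}   # a probability measure on Ω = {a, b, c}
xi = {"a": 0, "b": 1, "c": 1}         # a random element ξ: Ω → {0, 1}

law = defaultdict(float)              # L(ξ) = P ∘ ξ⁻¹, a measure on the range space
for w, p in P.items():
    law[xi[w]] += p

print(dict(law))   # the law of ξ: mass 0.2 at 0 and 0.8 at 1
```

Since ξ⁻¹ preserves set operations, the resulting set function is automatically a probability measure on the range space.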
Fixing a measurable space (S, S) and an abstract index set T, we shall write S^T for the class of functions f : T → S, and let 𝒮^T denote the σ-field in S^T generated by all evaluation maps π_t : S^T → S, t ∈ T, given by π_t f = f(t). If X : Ω → U ⊂ S^T, then clearly X_t = π_t ∘ X maps Ω into S. Thus, X may also be regarded as a function X(t, ω) = X_t(ω) from T × Ω to S.

Lemma 3.1 (measurability) Fix a measurable space (S, S), an index set T, and a subset U ⊂ S^T. Then a function X : Ω → U is U ∩ 𝒮^T-measurable iff X_t : Ω → S is measurable for every t ∈ T.

Proof: Since X is U-valued, the U ∩ 𝒮^T-measurability is equivalent to measurability with respect to 𝒮^T. The result now follows by Lemma 1.4 from the fact that 𝒮^T is generated by the mappings π_t. □

A mapping X with the properties in Lemma 3.1 is called an S-valued (random) process on T with paths in U. By the lemma it is equivalent to regard X as a collection of random elements X_t in the state space S.
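The two equivalent views of a process — as a path-valued map ω ↦ X(·, ω) and as a collection of random elements X_t — can be pictured with a finite toy example; the array below and both index sets are invented for illustration, with rows indexing T and columns indexing Ω.

```python
import numpy as np

# A real-valued process on T = {0, 1, 2} over a 4-point sample space,
# stored as an array X[t, w] = X_t(w).
X = np.array([[0., 1., 2., 3.],
              [1., 0., 3., 2.],
              [2., 3., 0., 1.]])

path  = X[:, 1]   # the path t ↦ X(t, ω) for the fixed outcome ω = 1
coord = X[0, :]   # the random element X_0 : ω ↦ X(0, ω)
```

Slicing along one axis or the other recovers either view from the same object.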
For any random elements ξ and η in a common measurable space, the equality ξ =ᵈ η means that ξ and η have the same distribution, or L(ξ) = L(η). If X is a random process on some index set T, the associated finite-dimensional distributions are given by

  μ_{t₁,…,tₙ} = L(X_{t₁}, …, X_{tₙ}),   t₁, …, tₙ ∈ T, n ∈ ℕ.

The following result shows that the distribution of a process is determined by the set of finite-dimensional distributions.

Proposition 3.2 (finite-dimensional distributions) Fix any S, T, and U as in Lemma 3.1, and let X and Y be processes on T with paths in U. Then X =ᵈ Y iff

  (X_{t₁}, …, X_{tₙ}) =ᵈ (Y_{t₁}, …, Y_{tₙ}),   t₁, …, tₙ ∈ T, n ∈ ℕ.   (3)

Proof: Assume (3). Let D denote the class of sets A ∈ 𝒮^T with P{X ∈ A} = P{Y ∈ A}, and let C consist of all sets

  A = {f ∈ S^T; (f_{t₁}, …, f_{tₙ}) ∈ B},   t₁, …, tₙ ∈ T, B ∈ S^n, n ∈ ℕ.

Then C is a π-system and D a λ-system, and furthermore C ⊂ D by hypothesis. Hence, 𝒮^T = σ(C) ⊂ D by Theorem 1.1, which means that X =ᵈ Y. □

For any random vector ξ = (ξ₁, …, ξ_d) in ℝᵈ, we define the associated distribution function F by

  F(x₁, …, x_d) = P ∩_{k≤d} {ξ_k ≤ x_k},   x₁, …, x_d ∈ ℝ.

The next result shows that F determines the distribution of ξ.

Lemma 3.3 (distribution functions) Let ξ and η be random vectors in ℝᵈ with distribution functions F and G. Then ξ =ᵈ η iff F = G.

Proof: Use Theorem 1.1. □

The expected value, expectation, or mean of a random variable ξ is defined as

  Eξ = ∫_Ω ξ dP = ∫_ℝ x (P ∘ ξ⁻¹)(dx)   (4)

whenever either integral exists. The last equality then holds by Lemma 1.22. By the same result we note that, for any random element ξ in some measurable space S and for an arbitrary measurable function f : S → ℝ,

  Ef(ξ) = ∫_Ω f(ξ) dP = ∫_S f(s) (P ∘ ξ⁻¹)(ds) = ∫_ℝ x (P ∘ (f ∘ ξ)⁻¹)(dx),   (5)
provided that at least one of the three integrals exists. Integrals over a measurable subset A ⊂ Ω are often denoted by

  E[ξ; A] = E(ξ 1_A) = ∫_A ξ dP,   A ∈ A.

For any random variable ξ and constant p > 0, the integral E|ξ|^p is called the pth absolute moment of ξ. By Hölder's inequality (or by Jensen's inequality in Lemma 3.5) we have ‖ξ‖_p ≤ ‖ξ‖_q for p ≤ q, so the corresponding L^p-spaces are nonincreasing in p. If ξ ∈ L^p and either p ∈ ℕ or ξ ≥ 0, we may further define the pth moment of ξ as Eξ^p.

The following result gives a useful relationship between moments and tail probabilities.

Lemma 3.4 (moments and tails) For any random variable ξ ≥ 0,

  Eξ^p = p ∫₀^∞ P{ξ > t} t^{p−1} dt = p ∫₀^∞ P{ξ ≥ t} t^{p−1} dt,   p > 0.

Proof: By calculus and Fubini's theorem,

  Eξ^p = pE ∫₀^ξ t^{p−1} dt = pE ∫₀^∞ 1{ξ > t} t^{p−1} dt = p ∫₀^∞ P{ξ > t} t^{p−1} dt.

The proof of the second expression is similar. □

A random vector ξ = (ξ₁, …, ξ_d) or process X = (X_t) is said to be integrable if integrability holds for every component ξ_k or value X_t, in which case we may write Eξ = (Eξ₁, …, Eξ_d) or EX = (EX_t). Recall that a function f : ℝᵈ → ℝ is said to be convex if

  f(px + (1−p)y) ≤ pf(x) + (1−p)f(y),   x, y ∈ ℝᵈ, p ∈ [0,1].   (6)

The relation may be written as f(Eξ) ≤ Ef(ξ), where ξ is a random vector in ℝᵈ with P{ξ = x} = 1 − P{ξ = y} = p. The following extension to arbitrary integrable random vectors is known as Jensen's inequality.

Lemma 3.5 (convex maps, Hölder, Jensen) For any integrable random vector ξ in ℝᵈ and convex function f : ℝᵈ → ℝ, we have Ef(ξ) ≥ f(Eξ).

Proof: By a version of the Hahn–Banach theorem, the convexity condition (6) is equivalent to the existence for every s ∈ ℝᵈ of a supporting affine function h_s(x) = ax + b with f ≥ h_s and f(s) = h_s(s). Taking s = Eξ gives

  Ef(ξ) ≥ Eh_s(ξ) = h_s(Eξ) = f(Eξ). □

The covariance of two random variables ξ, η ∈ L² is given by

  cov(ξ, η) = E(ξ − Eξ)(η − Eη) = Eξη − Eξ · Eη.
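The two expressions for the covariance agree, as a quick numerical check confirms. In the Python sketch below, the four-point uniform sample space and the values of ξ and η are arbitrary illustrative choices.

```python
import numpy as np

# ξ and η on a 4-point sample space with uniform probabilities
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 1.0, 4.0, 3.0])

cov1 = np.mean((x - x.mean()) * (y - y.mean()))  # E(ξ−Eξ)(η−Eη)
cov2 = np.mean(x * y) - x.mean() * y.mean()      # Eξη − Eξ·Eη
```

Expanding the product in the first expression and using linearity of E gives the second, which is exactly what the computation verifies.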
The resulting functional is bilinear, in the sense that

  cov(Σ_{j≤m} a_j ξ_j, Σ_{k≤n} b_k η_k) = Σ_{j≤m} Σ_{k≤n} a_j b_k cov(ξ_j, η_k).

Taking ξ = η ∈ L² yields the variance

  var(ξ) = cov(ξ, ξ) = E(ξ − Eξ)² = Eξ² − (Eξ)²,

and we note that, by the Cauchy–Buniakovsky inequality,

  |cov(ξ, η)| ≤ {var(ξ) var(η)}^{1/2}.

Two random variables ξ and η are said to be uncorrelated if cov(ξ, η) = 0. For any collection of random variables ξ_t ∈ L², t ∈ T, the associated covariance function ρ_{s,t} = cov(ξ_s, ξ_t), s, t ∈ T, is nonnegative definite, in the sense that Σ_{i,j} a_i a_j ρ_{t_i,t_j} ≥ 0 for any n ∈ ℕ, t₁, …, tₙ ∈ T, and a₁, …, aₙ ∈ ℝ. This is clear if we write

  Σ_{i,j} a_i a_j ρ_{t_i,t_j} = Σ_{i,j} a_i a_j cov(ξ_{t_i}, ξ_{t_j}) = var{Σ_i a_i ξ_{t_i}} ≥ 0.

The events A_t ∈ A, t ∈ T, are said to be (mutually) independent if, for any distinct indices t₁, …, tₙ ∈ T,

  P ∩_{k≤n} A_{t_k} = Π_{k≤n} PA_{t_k}.   (7)

More generally, we say that the families C_t ⊂ A, t ∈ T, are independent if independence holds between the events A_t for arbitrary A_t ∈ C_t, t ∈ T. Finally, the random elements ξ_t, t ∈ T, are independent if independence holds between the generated σ-fields σ(ξ_t), t ∈ T. Pairwise independence between two objects A and B, ξ and η, or B and C is often denoted by A ⊥⊥ B, ξ ⊥⊥ η, or B ⊥⊥ C, respectively.

The following result is often useful to prove extensions of the independence property.

Lemma 3.6 (extension) If the π-systems C_t, t ∈ T, are independent, then so are the generated σ-fields F_t = σ(C_t), t ∈ T.

Proof: We may clearly assume that C_t ≠ ∅ for all t. Fix any distinct indices t₁, …, tₙ ∈ T, and note that (7) holds for arbitrary A_{t_k} ∈ C_{t_k}, k = 1, …, n. For fixed A_{t₂}, …, A_{tₙ}, we introduce the class D of sets A_{t₁} ∈ A satisfying (7). Then D is a λ-system containing C_{t₁}, and so D ⊃ σ(C_{t₁}) = F_{t₁} by Theorem 1.1. Thus, (7) holds for arbitrary A_{t₁} ∈ F_{t₁} and A_{t_k} ∈ C_{t_k}, k = 2, …, n.
Proceeding recursively in n steps, we obtain the desired extension to arbitrary A_{t_k} ∈ F_{t_k}, k = 1, …, n. □

As an immediate consequence, we obtain the following basic grouping property. Here and in the sequel we shall often write F ∨ G = σ{F, G} and F_S = ∨_{t∈S} F_t = σ{F_t; t ∈ S}.
Corollary 3.7 (grouping) Let F_t, t ∈ T, be independent σ-fields, and let 𝒯 be a disjoint partition of T. Then the σ-fields F_S = ∨_{t∈S} F_t, S ∈ 𝒯, are again independent.

Proof: For any S ∈ 𝒯, let C_S denote the class of all finite intersections of sets in ∪_{t∈S} F_t. Then the classes C_S are independent π-systems, and by Lemma 3.6 the independence extends to the generated σ-fields F_S. □

Though independence between more than two σ-fields is clearly stronger than pairwise independence, we shall see how the full independence may be reduced to the pairwise notion in various ways. Given any set T, we say that a class 𝒯 ⊂ 2^T is separating if, for any s ≠ t in T, there exists some S ∈ 𝒯 such that exactly one of the elements s and t lies in S.

Lemma 3.8 (pairwise independence)
(i) The σ-fields F₁, F₂, … are independent iff ∨_{k≤n} F_k ⊥⊥ F_{n+1} for all n.
(ii) The σ-fields F_t, t ∈ T, are independent iff F_S ⊥⊥ F_{Sᶜ} for all sets S in some separating class 𝒯 ⊂ 2^T.

Proof: The necessity of the two conditions follows from Corollary 3.7. As for the sufficiency, we consider only part (ii), the proof for (i) being similar. Under the stated condition, we need to show that, for any finite subset S ⊂ T, the σ-fields F_s, s ∈ S, are independent. Let |S| denote the cardinality of S, and assume the statement to be true for |S| ≤ n. Proceeding to the case when |S| = n + 1, we may choose U ∈ 𝒯 such that S′ = S ∩ U and S″ = S \ U are nonempty. Since F_{S′} ⊥⊥ F_{S″}, we get for any sets A_s ∈ F_s, s ∈ S,

  P ∩_{s∈S} A_s = (P ∩_{s∈S′} A_s)(P ∩_{s∈S″} A_s) = Π_{s∈S} PA_s,

where the last relation follows from the induction hypothesis. □

A σ-field F is said to be P-trivial if PA = 0 or 1 for every A ∈ F. We further say that a random element is a.s. degenerate if its distribution is a degenerate probability measure.

Lemma 3.9 (triviality and degeneracy) A σ-field F is P-trivial iff F ⊥⊥ F.
In that case, any F-measurable random element ξ taking values in a separable metric space is a.s. degenerate.

Proof: If F ⊥⊥ F, then for any A ∈ F we have PA = P(A ∩ A) = (PA)², and so PA = 0 or 1. Conversely, assume that F is P-trivial. Then for any two sets A, B ∈ F we have P(A ∩ B) = PA ∧ PB = PA · PB, which means that F ⊥⊥ F.

Now assume that F is P-trivial, and let ξ be as stated. For each n we may partition S into countably many disjoint Borel sets B_{nj} of diameter ≤ n⁻¹. Since P{ξ ∈ B_{nj}} = 0 or 1, we have ξ ∈ B_{nj} a.s. for exactly one j,
say for j = jₙ. Hence, ξ ∈ ∩ₙ B_{n,jₙ} a.s. The latter set has diameter 0, so it consists of exactly one point s, and we get ξ = s a.s. □

The next result gives the basic relation between independence and product measures.

Lemma 3.10 (product measures) Let ξ₁, …, ξₙ be random elements in some measurable spaces S₁, …, Sₙ with distributions μ₁, …, μₙ. Then the ξ_k are independent iff ξ = (ξ₁, …, ξₙ) has distribution μ₁ ⊗ ⋯ ⊗ μₙ.

Proof: Assuming the independence, we get for any measurable product set B = B₁ × ⋯ × Bₙ

  P{ξ ∈ B} = Π_{k≤n} P{ξ_k ∈ B_k} = Π_{k≤n} μ_k B_k = (⊗_{k≤n} μ_k) B.

This extends by Theorem 1.1 to arbitrary sets in the product σ-field. □

In conjunction with Fubini's theorem, the last result leads to a useful method of computing expected values.

Lemma 3.11 (conditioning) Let ξ and η be independent random elements in some measurable spaces S and T, and let the function f : S × T → ℝ be measurable with E(E|f(s, η)|)_{s=ξ} < ∞. Then Ef(ξ, η) = E(Ef(s, η))_{s=ξ}.

Proof: Let μ and ν denote the distributions of ξ and η, respectively. Assuming that f ≥ 0 and writing g(s) = Ef(s, η), we get, by Lemma 1.22 and Fubini's theorem,

  Ef(ξ, η) = ∫ f(s, t) (μ ⊗ ν)(ds dt) = ∫ μ(ds) ∫ f(s, t) ν(dt) = ∫ g(s) μ(ds) = Eg(ξ).

For general f, this applies to the function |f|, and so E|f(ξ, η)| < ∞. The desired relation then follows as before. □

In particular, for any independent random variables ξ₁, …, ξₙ, we have

  E Π_k ξ_k = Π_k Eξ_k,   var Σ_k ξ_k = Σ_k var ξ_k,

whenever the expressions on the right exist.

If ξ and η are random elements in a measurable group G, then the product ξη is again a random element in G. The following result gives the connection between independence and the convolutions of Lemma 1.28.

Corollary 3.12 (convolution) Let ξ and η be independent random elements with distributions μ and ν, respectively, in some measurable group G. Then the product ξη has distribution μ * ν.
Proof: For any measurable set B ⊂ G, we get by Lemma 3.10 and the definition of convolution

  P{ξη ∈ B} = (μ ⊗ ν){(x, y) ∈ G²; xy ∈ B} = (μ * ν)B. □
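For independent integer-valued random variables, μ * ν is an ordinary discrete convolution of probability vectors. The Python sketch below checks this for two fair dice (an illustrative choice) against direct enumeration of P{ξ + η = k}.

```python
import numpy as np

die = np.full(6, 1/6)           # μ = ν: the uniform distribution on {1, ..., 6}
conv = np.convolve(die, die)    # μ * ν: the law of ξ + η, carried on {2, ..., 12}

# direct enumeration of P{ξ + η = k}, index 0 corresponding to the sum 2
direct = np.zeros(11)
for i in range(1, 7):
    for j in range(1, 7):
        direct[i + j - 2] += 1/36
```

The mode of the convolution sits at the sum 7, with probability 6/36, as expected for two dice.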
Given any sequence of σ-fields F₁, F₂, …, we introduce the associated tail σ-field

  T = ∩ₙ ∨_{k>n} F_k = ∩ₙ σ{F_k; k > n}.

The following remarkable result shows that T is trivial whenever the Fₙ are independent. An extension appears in Corollary 7.25.

Theorem 3.13 (Kolmogorov's 0–1 law) Let F₁, F₂, … be independent σ-fields. Then the tail σ-field T = ∩ₙ ∨_{k>n} F_k is P-trivial.

Proof: For each n ∈ ℕ, define Tₙ = ∨_{k>n} F_k, and note that F₁, …, Fₙ, Tₙ are independent by Corollary 3.7. Hence, so are the σ-fields F₁, …, Fₙ, T, and then also F₁, F₂, …, T. By the same theorem we obtain (∨ₙ Fₙ) ⊥⊥ T, and so T ⊥⊥ T. Thus, T is P-trivial by Lemma 3.9. □

We shall consider some simple illustrations of the last theorem.

Corollary 3.14 (sums and averages) Let ξ₁, ξ₂, … be independent random variables, and put Sₙ = ξ₁ + ⋯ + ξₙ. Then each of the sequences (Sₙ) and (Sₙ/n) is either a.s. convergent or a.s. divergent. For the latter sequence, the possible limit is a.s. degenerate.

Proof: Define Fₙ = σ{ξₙ}, n ∈ ℕ, and note that the associated tail σ-field T is P-trivial by Theorem 3.13. Since the sets of convergence of (Sₙ) and (Sₙ/n) are T-measurable by Lemma 1.9, the first assertion follows. The second assertion is obtained from Lemma 3.9. □

By a finite permutation of ℕ we mean a bijective map p: ℕ → ℕ such that pₙ = n for all but finitely many n. For any space S, a finite permutation p of ℕ induces a permutation T_p on S^∞ given by

  T_p(s) = s ∘ p = (s_{p₁}, s_{p₂}, …),   s = (s₁, s₂, …) ∈ S^∞.

A set I ⊂ S^∞ is said to be symmetric (under finite permutations) if

  T_p⁻¹ I = {s ∈ S^∞; s ∘ p ∈ I} = I

for every finite permutation p of ℕ. If (S, S) is a measurable space, the symmetric sets I ∈ S^∞ form a sub-σ-field ℐ ⊂ S^∞, called the permutation invariant σ-field in S^∞.
We may now state the other basic 0–1 law, which refers to sequences of random elements that are independent and identically distributed (often abbreviated as i.i.d.).

Theorem 3.15 (Hewitt–Savage 0–1 law) Let ξ be an infinite sequence of i.i.d. random elements in some measurable space (S, S), and let ℐ denote the permutation invariant σ-field in S^∞. Then the σ-field ξ⁻¹ℐ is P-trivial.

Our proof is based on a simple approximation. Write

  AΔB = (A \ B) ∪ (B \ A),
and note that

  P(AΔB) = P(AᶜΔBᶜ) = E|1_A − 1_B|,   A, B ∈ A.   (8)

Lemma 3.16 (approximation) Given any σ-fields F₁ ⊂ F₂ ⊂ ⋯ and a set A ∈ ∨ₙ Fₙ, there exist some A₁, A₂, … ∈ ∪ₙ Fₙ with P(AΔAₙ) → 0.

Proof: Define C = ∪ₙ Fₙ, and let D denote the class of sets A ∈ ∨ₙ Fₙ with the stated property. Then C is a π-system and D a λ-system containing C. By Theorem 1.1 we get ∨ₙ Fₙ = σ(C) ⊂ D. □

Proof of Theorem 3.15: Define μ = L(ξ), put Fₙ = Sⁿ × S^∞, and note that ℐ ⊂ S^∞ = ∨ₙ Fₙ. For any I ∈ ℐ there exist by Lemma 3.16 some Bₙ ∈ Sⁿ such that the corresponding cylinder sets Iₙ = Bₙ × S^∞ satisfy μ(IΔIₙ) → 0. Writing Ĩₙ = Sⁿ × Bₙ × S^∞, it is clear from the symmetry of μ and I that μĨₙ = μIₙ → μI and μ(IΔĨₙ) = μ(IΔIₙ) → 0. Hence, by (8),

  μ(IΔ(Iₙ ∩ Ĩₙ)) ≤ μ(IΔIₙ) + μ(IΔĨₙ) → 0.

Since moreover Iₙ ⊥⊥ Ĩₙ under μ, we get

  μI ← μ(Iₙ ∩ Ĩₙ) = (μIₙ)(μĨₙ) → (μI)².

Thus, μI = (μI)², and so P ∘ ξ⁻¹I = μI = 0 or 1. □

The next result lists some typical applications. Say that a random variable ξ is symmetric if ξ =ᵈ −ξ.

Corollary 3.17 (random walk) Let ξ₁, ξ₂, … be i.i.d., nondegenerate random variables, and put Sₙ = ξ₁ + ⋯ + ξₙ. Then
(i) P{Sₙ ∈ B i.o.} = 0 or 1 for any B ∈ B;
(ii) limsupₙ Sₙ = ∞ a.s. or −∞ a.s.;
(iii) limsupₙ (±Sₙ) = ∞ a.s. if the ξₙ are symmetric.

Proof: Statement (i) is immediate from Theorem 3.15, since for any finite permutation p of ℕ we have x_{p₁} + ⋯ + x_{pₙ} = x₁ + ⋯ + xₙ for all but finitely many n. To prove (ii), conclude from Theorem 3.15 and Lemma 3.9 that limsupₙ Sₙ = c a.s. for some constant c ∈ ℝ̄ = [−∞, ∞]. Hence, a.s.,

  c = limsupₙ Sₙ₊₁ = limsupₙ (Sₙ₊₁ − ξ₁) + ξ₁ = c + ξ₁.

If |c| < ∞, we get ξ₁ = 0 a.s., which contradicts the nondegeneracy of ξ₁. Thus, |c| = ∞. In case (iii), we have

  c = limsupₙ Sₙ ≥ liminfₙ Sₙ = −limsupₙ (−Sₙ) = −c,

and so −c ≤ c ∈ {±∞}, which implies c = ∞.
□

Using a suitable zero–one law, one can often rather easily see that a given event has probability zero or one. Determining which alternative actually occurs is often harder. The following classical result, known as the
Borel–Cantelli lemma, may then be helpful, especially when the events are independent. An extension to the general case appears in Corollary 7.20.

Theorem 3.18 (Borel, Cantelli) Let A₁, A₂, … ∈ A. Then Σₙ PAₙ < ∞ implies P{Aₙ i.o.} = 0, and the two conditions are equivalent when the Aₙ are independent.

Here the first assertion was proved earlier as an application of Fatou's lemma. The use of expected values allows a more transparent argument.

Proof: If Σₙ PAₙ < ∞, we get by monotone convergence

  E Σₙ 1_{Aₙ} = Σₙ E1_{Aₙ} = Σₙ PAₙ < ∞.

Thus, Σₙ 1_{Aₙ} < ∞ a.s., which means that P{Aₙ i.o.} = 0.

Next assume that the Aₙ are independent and satisfy Σₙ PAₙ = ∞. Noting that 1 − x ≤ e^{−x} for all x, we get

  P ∪_{k≥n} A_k = 1 − P ∩_{k≥n} A_kᶜ = 1 − Π_{k≥n} (1 − PA_k)
    ≥ 1 − Π_{k≥n} exp(−PA_k) = 1 − exp{−Σ_{k≥n} PA_k} = 1.

Hence, as n → ∞,

  1 = P ∪_{k≥n} A_k ↓ P ∩ₙ ∪_{k≥n} A_k = P{Aₙ i.o.},

and so the probability on the right equals 1. □

For many purposes it is sufficient to use the Lebesgue unit interval ([0,1], B[0,1], λ) as the basic probability space. In particular, the following result ensures the existence on [0,1] of some independent random variables ξ₁, ξ₂, … with arbitrarily prescribed distributions. The present statement is only preliminary. Thus, we shall remove the independence assumption in Theorem 6.14, prove an extension to arbitrary index sets in Theorem 6.16, and eliminate the restriction on the spaces in Theorem 6.17.

Theorem 3.19 (existence, Borel) For any probability measures μ₁, μ₂, … on some Borel spaces S₁, S₂, …, there exist some independent random elements ξ₁, ξ₂, … on ([0,1], λ) with distributions μ₁, μ₂, …. As a consequence, there exists a probability measure μ on S₁ × S₂ × ⋯ satisfying

  μ ∘ (π₁, …, πₙ)⁻¹ = μ₁ ⊗ ⋯ ⊗ μₙ,   n ∈ ℕ.

For the proof, we first consider two special cases of independent interest.
By a Bernoulli sequence with rate p we mean a sequence of i.i.d. random variables ξ₁, ξ₂, … such that P{ξₙ = 1} = 1 − P{ξₙ = 0} = p. Furthermore, we say that a random variable ϑ is uniformly distributed on [0,1] (written as U(0,1)) if its distribution L(ϑ) equals Lebesgue
measure λ on [0,1]. Every number x ∈ [0,1] has a binary expansion r₁, r₂, … ∈ {0,1} satisfying x = Σₙ rₙ 2⁻ⁿ, and to ensure uniqueness we assume that Σₙ rₙ = ∞ when x > 0. The following result provides a simple construction of a Bernoulli sequence on the Lebesgue unit interval.

Lemma 3.20 (Bernoulli sequence) Let ϑ be a random variable in [0,1] with binary expansion ξ₁, ξ₂, …. Then ϑ is U(0,1) iff the ξₙ form a Bernoulli sequence with rate ½.

Proof: If ϑ is U(0,1), then P ∩_{j≤n} {ξ_j = k_j} = 2⁻ⁿ for all k₁, …, kₙ ∈ {0,1}. Summing over k₁, …, kₙ₋₁ gives P{ξₙ = k} = ½ for k = 0 and 1. A similar calculation yields the asserted independence.

Now assume instead that the ξₙ form a Bernoulli sequence with rate ½. Letting ϑ̃ be U(0,1) with binary expansion ξ̃₁, ξ̃₂, …, we get (ξₙ) =ᵈ (ξ̃ₙ). Thus,

  ϑ = Σₙ ξₙ 2⁻ⁿ =ᵈ Σₙ ξ̃ₙ 2⁻ⁿ = ϑ̃. □

The next result shows how a single U(0,1) random variable can be used to generate a whole sequence.

Lemma 3.21 (reproduction) There exist some measurable functions f₁, f₂, … on [0,1] such that whenever ϑ is U(0,1), the random variables ϑₙ = fₙ(ϑ) are i.i.d. U(0,1).

Proof: For any x ∈ [0,1] we introduce the associated binary expansion g₁(x), g₂(x), …, and note that the g_k are measurable. Rearranging the g_k into a two-dimensional array h_{nj}, n, j ∈ ℕ, we define

  fₙ(x) = Σ_j 2⁻ʲ h_{nj}(x),   x ∈ [0,1], n ∈ ℕ.

By Lemma 3.20 the random variables g_k(ϑ) form a Bernoulli sequence with rate ½, and the same result shows that the variables ϑₙ = fₙ(ϑ) are U(0,1). The latter are further independent by Corollary 3.7. □

Finally, we need to construct a random element with a given distribution from an arbitrary randomization variable. The required lemma is stated in a version for kernels, to meet the needs of Chapters 6, 8, and 14.

Lemma 3.22 (kernels and randomization) Let μ be a probability kernel from a measurable space S to a Borel space T.
Then there exists a measurable function f : S × [0,1] → T such that if ϑ is U(0,1), then f(s, ϑ) has distribution μ(s, ·) for every s ∈ S.

Proof: We may assume that T is a Borel subset of [0,1], in which case we may easily reduce to the case when T = [0,1]. Define

  f(s, t) = sup{x ∈ [0,1]; μ(s, [0, x]) < t},   s ∈ S, t ∈ [0,1],   (9)

and note that f is product measurable on S × [0,1], since the set {(s, t); μ(s, [0, x]) < t} is measurable for each x by Lemma 1.12, and the supremum
in (9) can be restricted to rational x. If ϑ is U(0,1), we get

  P{f(s, ϑ) ≤ x} = P{ϑ ≤ μ(s, [0, x])} = μ(s, [0, x]),   x ∈ [0,1],

and so f(s, ϑ) has distribution μ(s, ·) by Lemma 3.3. □

Proof of Theorem 3.19: By Lemma 3.22 there exist some measurable functions fₙ : [0,1] → Sₙ such that λ ∘ fₙ⁻¹ = μₙ. Letting ϑ be the identity mapping on [0,1] and choosing ϑ₁, ϑ₂, … as in Lemma 3.21, we note that the functions ξₙ = fₙ(ϑₙ), n ∈ ℕ, have the desired joint distribution. □

Next we consider the regularization and sample path properties of random processes. Say that two processes X and Y on the same index set T are versions of each other if X_t = Y_t a.s. for each t ∈ T. In the special case when T = ℝᵈ or ℝ₊, we note that two continuous or right-continuous versions X and Y of the same process are indistinguishable, in the sense that X = Y a.s. In general, the latter notion is clearly stronger.

For any function f between two metric spaces (S, ρ) and (S′, ρ′), the associated modulus of continuity w_f = w(f, ·) is given by

  w_f(r) = sup{ρ′(f(s), f(t)); s, t ∈ S, ρ(s, t) ≤ r},   r > 0.

Note that f is uniformly continuous iff w_f(r) → 0 as r → 0. Say that f is Hölder continuous with exponent c if w_f(r) ≲ r^c as r → 0. The property is said to hold locally if it is true on every bounded set. (Here and in the sequel, the relation f ≲ g between positive functions means that f ≤ cg for some constant c < ∞.)

A simple moment condition ensures the existence of a Hölder-continuous version of a given process on ℝᵈ. Important applications are given in Theorems 13.5, 21.3, and 22.4, and a related tightness criterion appears in Corollary 16.9.

Theorem 3.23 (moments and continuity, Kolmogorov, Loève, Chentsov) Let X be a process on ℝᵈ with values in a complete metric space (S, ρ), and assume for some a, b > 0 that

  E{ρ(X_s, X_t)}^a ≲ |s − t|^{d+b},   s, t ∈ ℝᵈ.
(10)

Then X has a continuous version, and the latter is a.s. locally Hölder continuous with exponent c for any c ∈ (0, b/a).

Proof: It is clearly enough to consider the restriction of X to [0,1]ᵈ. Define

  Dₙ = {(k₁, …, k_d) 2⁻ⁿ; k₁, …, k_d ∈ {1, …, 2ⁿ}},   n ∈ ℕ,

and let

  ζₙ = max{ρ(X_s, X_t); s, t ∈ Dₙ, |s − t| = 2⁻ⁿ},   n ∈ ℕ.

Since

  |{(s, t) ∈ Dₙ²; |s − t| = 2⁻ⁿ}| ≤ d 2^{dn},   n ∈ ℕ,
we get by (10), for any c ∈ (0, b/a),

  E Σₙ (2^{cn} ζₙ)^a = Σₙ 2^{acn} Eζₙ^a ≲ Σₙ 2^{acn} 2^{dn} (2⁻ⁿ)^{d+b} = Σₙ 2^{(ac−b)n} < ∞.

The sum on the left is then a.s. convergent, and therefore ζₙ ≲ 2^{−cn} a.s.

Now any two points s, t ∈ ∪ₙ Dₙ with |s − t| ≤ 2^{−m} can be connected by a piecewise linear path involving, for each n > m, at most 2d steps between nearest neighbors in Dₙ. Thus, for r ∈ [2^{−m−1}, 2^{−m}],

  sup{ρ(X_s, X_t); s, t ∈ ∪ₙ Dₙ, |s − t| ≤ r} ≤ Σ_{n≥m} ζₙ ≲ Σ_{n≥m} 2^{−cn} ≲ 2^{−cm} ≲ r^c,

which shows that X is a.s. Hölder continuous on ∪ₙ Dₙ with exponent c. In particular, there exists a continuous process Y on [0,1]ᵈ that agrees with X a.s. on ∪ₙ Dₙ, and it is easily seen that the Hölder continuity of Y on ∪ₙ Dₙ extends with the same exponent c to the entire cube [0,1]ᵈ.

To show that Y is a version of X, fix any t ∈ [0,1]ᵈ and choose t₁, t₂, … ∈ ∪ₙ Dₙ with tₙ → t. Then X_{tₙ} = Y_{tₙ} a.s. for each n. Furthermore, X_{tₙ} →ᴾ X_t by (10) and Y_{tₙ} → Y_t a.s. by continuity, so X_t = Y_t a.s. □

The next result shows how regularity of the paths may sometimes be established by comparison with a regular process.

Lemma 3.24 (transfer of regularity) Let X =ᵈ Y be random processes on some index set T, taking values in a separable metric space S, and assume that the paths of Y lie in a set U ⊂ S^T that is Borel for the σ-field U = (B(S))^T ∩ U. Then X has a version with paths in U.

Proof: For clarity we may write Ȳ for the path of Y, regarded as a random element in U. Then Ȳ is U-measurable, and by Lemma 1.13 there exists a measurable mapping f : S^T → U such that Ȳ = f(Y) a.s. Define X̄ = f(X), and note that (X, X̄) =ᵈ (Y, Ȳ). Since the diagonal in S² is measurable, we get in particular

  P{X_t = X̄_t} = P{Y_t = Ȳ_t} = 1,   t ∈ T. □

We conclude this chapter with a characterization of distribution functions in ℝᵈ, required in Chapter 5. For any vectors x = (x₁, …, x_d) and y = (y₁, …, y_d), write x ≤ y for the componentwise inequality x_k ≤ y_k, k = 1, …, d, and similarly for x < y. In particular, the distribution function F of a probability measure μ on ℝᵈ is given by F(x) = μ{y; y ≤ x}. Similarly, let x ∨ y denote the componentwise maximum. Put 1 = (1, …, 1) and ∞ = (∞, …, ∞).

For any rectangular box (x, y] = {u; x < u ≤ y} = (x₁, y₁] × ⋯ × (x_d, y_d], we note that μ(x, y] = Σ_u s(u) F(u), where s(u) = (−1)^p with p = Σ_k 1{u_k = x_k} and the summation extends over all corners u of (x, y]. Let F(x, y] denote the stated sum, and say that F has nonnegative increments if
F(x, y] ≥ 0 for all pairs x < y. Let us further say that F is right-continuous if F(xₙ) → F(x) as xₙ ↓ x, and proper if F(x) → 1 or 0 as min_k x_k → ±∞, respectively. The following result characterizes distribution functions in terms of the mentioned properties.

Theorem 3.25 (distribution functions) A function F : ℝᵈ → [0,1] is the distribution function of some probability measure μ on ℝᵈ iff it is right-continuous and proper with nonnegative increments.

Proof: Assume that F has the stated properties, and note that the associated set function F(x, y] is finitely additive. Since F is proper, we further have F(x, y] → 1 as x → −∞ and y → ∞, that is, as (x, y] ↑ (−∞, ∞) = ℝᵈ. Hence, for every n ∈ ℕ there exists a probability measure μₙ on (2⁻ⁿℤ)ᵈ with ℤ = {…, −1, 0, 1, …} such that

  μₙ{2⁻ⁿk} = F(2⁻ⁿ(k − 1), 2⁻ⁿk],   k ∈ ℤᵈ, n ∈ ℕ,

and from the finite additivity of F(x, y] we obtain

  μ_m(2^{−m}(k − 1, k]) = μₙ(2^{−m}(k − 1, k]),   k ∈ ℤᵈ, m ≤ n in ℕ.   (11)

In view of (11), we may split the Lebesgue unit interval ([0,1], B[0,1], λ) recursively to construct some random vectors ξ₁, ξ₂, … with distributions μ₁, μ₂, … such that ξ_m − 2^{−m} ≤ ξₙ ≤ ξ_m for all m ≤ n. In particular, ξ₁ ≥ ξ₂ ≥ ⋯ ≥ ξ₁ − 1, and so ξₙ converges pointwise to some random vector ξ. Define μ = λ ∘ ξ⁻¹.

To see that μ has distribution function F, we note that since F is proper,

  λ{ξₙ ≤ 2⁻ⁿk} = μₙ(−∞, 2⁻ⁿk] = F(2⁻ⁿk),   k ∈ ℤᵈ, n ∈ ℕ.

Since also ξₙ ↓ ξ a.s., Fatou's lemma yields, for dyadic x ∈ ℝᵈ,

  λ{ξ < x} ≤ λ{ξₙ ≤ x ult.} ≤ liminfₙ λ{ξₙ ≤ x} ≤ F(x)
    = limsupₙ λ{ξₙ ≤ x} ≤ λ{ξₙ ≤ x i.o.} ≤ λ{ξ ≤ x},

and so

  F(x) ≤ λ{ξ ≤ x} ≤ F(x + 2⁻ⁿ1),   n ∈ ℕ.

Letting n → ∞ and using the right-continuity of F, we get λ{ξ ≤ x} = F(x), which extends to any x ∈ ℝᵈ by the right-continuity of both sides. □

The last result has the following version for unbounded measures.
Corollary 3.26 (unbounded measures) Let the function F on ℝᵈ be right-continuous with nonnegative increments. Then there exists a measure μ on ℝᵈ such that μ(x, y] = F(x, y] for all x < y in ℝᵈ.

Proof: For any a ∈ ℝᵈ, we may apply Theorem 3.25 to suitably normalized versions of the function F_a(x) = F(a, a ∨ x] to obtain a measure μ_a
on [a, ∞) with μ_a(a, x] = F(a, x] for all x > a. Then clearly μ_a = μ_b on (a ∨ b, ∞) for any a and b, and so the set function μ = sup_a μ_a is a measure with the required property. □

Exercises

1. Give an example of two processes X and Y with different distributions such that X_t =ᵈ Y_t for all t.

2. Let X and Y be {0,1}-valued processes on some index set T. Show that X =ᵈ Y iff P{X_{t₁} + ⋯ + X_{tₙ} > 0} = P{Y_{t₁} + ⋯ + Y_{tₙ} > 0} for all n ∈ ℕ and t₁, …, tₙ ∈ T.

3. Let F be a right-continuous function of bounded variation and with F(−∞) = 0. Show for any random variable ξ that EF(ξ) = ∫ P{ξ ≥ t} F(dt). (Hint: First take F to be the distribution function of some random variable η ⊥⊥ ξ, and use Lemma 3.11.)

4. Consider a random variable ξ ∈ L¹ and a strictly convex function f on ℝ. Show that Ef(ξ) = f(Eξ) iff ξ = Eξ a.s.

5. Assume that ξ = Σ_j a_j ξ_j and η = Σ_j b_j η_j, where the sums converge in L². Show that cov(ξ, η) = Σ_{i,j} a_i b_j cov(ξ_i, η_j), where the double series on the right is absolutely convergent.

6. Let the σ-fields F_{t,n}, t ∈ T, n ∈ ℕ, be nondecreasing in n for each t and independent in t for each n. Show that the independence extends to the σ-fields F_t = ∨ₙ F_{t,n}.

7. For each t ∈ T, let ξ_t, ξ_t¹, ξ_t², … be random elements in some metric space S_t with ξ_tⁿ → ξ_t a.s., and assume for each n ∈ ℕ that the random elements ξ_tⁿ, t ∈ T, are independent. Show that the independence extends to the limits ξ_t. (Hint: First show that E Π_{t∈S} f_t(ξ_t) = Π_{t∈S} Ef_t(ξ_t) for any bounded, continuous functions f_t on S_t and for finite subsets S ⊂ T.)

8. Give an example of three events that are pairwise independent but not independent.

9. Give an example of two random variables that are uncorrelated but not independent.

10. Let ξ₁, ξ₂, … be i.i.d. random elements with distribution μ in some measurable space (S, S). Fix a set A ∈ S with μA > 0, and put τ = inf{k; ξ_k ∈ A}. Show that ξ_τ has distribution μ[·|A] = μ(· ∩ A)/μA.

11.
Let $\xi_1, \xi_2, \ldots$ be independent random variables taking values in $[0,1]$. Show that $E\prod_n \xi_n = \prod_n E\xi_n$. In particular, show that $P\bigcap_n A_n = \prod_n P A_n$ for any independent events $A_1, A_2, \ldots$.

12. Let $\xi_1, \xi_2, \ldots$ be arbitrary random variables. Show that there exist some constants $c_1, c_2, \ldots > 0$ such that the series $\sum_n c_n \xi_n$ converges a.s.
3. Processes, Distributions, and Independence

13. Let $\xi_1, \xi_2, \ldots$ be random variables with $\xi_n \to 0$ a.s. Show that there exists some measurable function $f > 0$ with $\sum_n f(\xi_n) < \infty$ a.s. Also show that the conclusion fails if we only assume $L^1$-convergence.

14. Give an example of events $A_1, A_2, \ldots$ such that $P\{A_n \text{ i.o.}\} = 0$ but $\sum_n P A_n = \infty$.

15. Extend Lemma 3.20 to a correspondence between $U(0,1)$ random variables $\vartheta$ and Bernoulli sequences $\xi_1, \xi_2, \ldots$ with rate $p \in (0,1)$.

16. Give an elementary proof of Theorem 3.25 for $d = 1$. (Hint: Define $\xi = F^{-1}(\vartheta)$, where $\vartheta$ is $U(0,1)$, and note that $\xi$ has distribution function $F$.)

17. Let $\xi_1, \xi_2, \ldots$ be random variables such that $P\{\xi_n \ne 0 \text{ i.o.}\} = 1$. Show that there exist some constants $c_n \in \mathbb{R}$ such that $P\{|c_n \xi_n| \ge 1 \text{ i.o.}\} = 1$. (Hint: Note that $P\{\sum_{k \ge n} |\xi_k| > 0\} \to 1$.)
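The construction in the hint to Exercise 16, setting $\xi = F^{-1}(\vartheta)$ for $\vartheta$ uniform on $(0,1)$, is the standard inverse-transform sampling recipe and is easy to check numerically. The following sketch (our own illustration, not from the text; the function names are ours) draws from an exponential target this way and compares the empirical distribution function of the samples with the target $F$ at a few points.

```python
import math
import random

def inverse_transform_sample(inv_cdf, n, seed=1):
    """Draw n samples as inv_cdf(theta) with theta ~ U(0,1)."""
    rng = random.Random(seed)
    return [inv_cdf(rng.random()) for _ in range(n)]

# Exponential(1) target: F(x) = 1 - exp(-x), so F^{-1}(u) = -log(1 - u).
samples = inverse_transform_sample(lambda u: -math.log1p(-u), 50_000)

def ecdf(xs, x):
    """Empirical distribution function of the sample xs at the point x."""
    return sum(1 for s in xs if s <= x) / len(xs)

# The empirical distribution function should track F closely.
errors = [abs(ecdf(samples, x) - (1 - math.exp(-x))) for x in (0.5, 1.0, 2.0)]
```

By the Glivenko–Cantelli theorem proved later in this chapter (Proposition 4.24), the discrepancies shrink uniformly in $x$ as the sample size grows.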
Chapter 4

Random Sequences, Series, and Averages

Convergence in probability and in $L^p$; uniform integrability and tightness; convergence in distribution; convergence of random series; strong laws of large numbers; Portmanteau theorem; continuous mapping and approximation; coupling and measurability

The first goal of this chapter is to introduce and compare the basic modes of convergence of random quantities. For random elements $\xi$ and $\xi_1, \xi_2, \ldots$ in a metric or topological space $S$, the most commonly used notions are those of almost sure convergence, $\xi_n \to \xi$ a.s., and convergence in probability, $\xi_n \overset{P}{\to} \xi$, corresponding to the general notions of convergence a.e. and in measure, respectively. When $S = \mathbb{R}$, we have the additional concept of $L^p$-convergence, familiar from Chapter 1. Those three notions are used throughout this book. For a special purpose in Chapter 25, we shall also need the notion of weak $L^1$-convergence.

For our second main topic, we shall study the very different concept of convergence in distribution, $\xi_n \overset{d}{\to} \xi$, defined by the condition $Ef(\xi_n) \to Ef(\xi)$ for all bounded, continuous functions $f$ on $S$. This is clearly equivalent to weak convergence of the associated distributions $\mu_n = \mathcal{L}(\xi_n)$ and $\mu = \mathcal{L}(\xi)$, written as $\mu_n \overset{w}{\to} \mu$ and defined by the condition $\mu_n f \to \mu f$ for every $f$ as above. In this chapter we shall only establish the most basic results of weak convergence theory, such as the "Portmanteau" theorem, the continuous mapping and approximation theorems, and the Skorohod coupling. Our development of the general theory continues in Chapters 5 and 16, and further distributional limit theorems appear in Chapters 8, 9, 12, 14, 15, 19, and 23.

Our third main theme is to characterize the convergence of series $\sum_k \xi_k$ and averages $n^{-c} \sum_{k \le n} \xi_k$, where $\xi_1, \xi_2, \ldots$ are independent random variables and $c$ is a positive constant.
The two problems are related by the elementary Kronecker lemma, and the main results are the basic three-series criterion and the strong law of large numbers. The former result is extended in Chapter 7 to the powerful martingale convergence theorem, whereas extensions and refinements of the latter result are proved in Chapters 10 and 14. The mentioned theorems are further related to certain weak convergence results presented in Chapters 5 and 15.

Before beginning our systematic study of the various notions of convergence, we consider a couple of elementary but useful inequalities.

Lemma 4.1 (moments and tails, Bienaymé, Chebyshev, Paley and Zygmund) Let $\xi$ be an $\mathbb{R}_+$-valued random variable with $0 < E\xi < \infty$. Then
$$(1-r)_+^2\, \frac{(E\xi)^2}{E\xi^2} \;\le\; P\{\xi > rE\xi\} \;\le\; \frac{1}{r}, \qquad r > 0. \qquad (1)$$

The second relation in (1) is often referred to as Chebyshev's or Markov's inequality. Assuming that $E\xi^2 < \infty$, we get in particular the well-known estimate
$$P\{|\xi - E\xi| > \varepsilon\} \le \varepsilon^{-2}\, \mathrm{var}(\xi), \qquad \varepsilon > 0.$$

Proof of Lemma 4.1: We may clearly assume that $E\xi = 1$. The upper bound then follows as we take expectations in the inequality $r 1\{\xi \ge r\} \le \xi$. To get the lower bound, we note that for any $r, t > 0$,
$$t^2 1\{\xi > r\} \ge (\xi - r)(2t + r - \xi) = 2\xi(r+t) - r(2t+r) - \xi^2.$$
Taking expected values, we get for $r \in (0,1)$
$$t^2 P\{\xi > r\} \ge 2(r+t) - r(2t+r) - E\xi^2 \ge 2t(1-r) - E\xi^2.$$
Now choose $t = E\xi^2/(1-r)$. $\Box$

For random elements $\xi$ and $\xi_1, \xi_2, \ldots$ in a metric space $(S, \rho)$, we say that $\xi_n$ converges in probability to $\xi$ (written as $\xi_n \overset{P}{\to} \xi$) if
$$\lim_{n\to\infty} P\{\rho(\xi_n, \xi) > \varepsilon\} = 0, \qquad \varepsilon > 0.$$
By Chebyshev's inequality it is equivalent that $E[\rho(\xi_n,\xi) \wedge 1] \to 0$. This notion of convergence is related to the a.s. version as follows.

Lemma 4.2 (subsequence criterion) Let $\xi, \xi_1, \xi_2, \ldots$ be random elements in a metric space $(S,\rho)$. Then $\xi_n \overset{P}{\to} \xi$ iff every subsequence $N' \subset \mathbb{N}$ has a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. In particular, $\xi_n \to \xi$ a.s. implies $\xi_n \overset{P}{\to} \xi$.

This shows in particular that the notion of convergence in probability depends only on the topology and is independent of the metrization $\rho$.

Proof: Assume that $\xi_n \overset{P}{\to} \xi$, and fix an arbitrary subsequence $N' \subset \mathbb{N}$. We may then choose a further subsequence $N'' \subset N'$ such that
$$E \sum_{n \in N''} \{\rho(\xi_n,\xi) \wedge 1\} = \sum_{n \in N''} E[\rho(\xi_n,\xi) \wedge 1] < \infty,$$
where the equality holds by monotone convergence.
The series on the left then converges a.s., which implies $\xi_n \to \xi$ a.s. along $N''$.
Now assume instead the stated condition. If $\xi_n \overset{P}{\to} \xi$ fails, there exists some $\varepsilon > 0$ such that $E[\rho(\xi_n, \xi) \wedge 1] > \varepsilon$ along a subsequence $N' \subset \mathbb{N}$. By hypothesis, $\xi_n \to \xi$ a.s. along a further subsequence $N'' \subset N'$, and by dominated convergence we get $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ along $N''$, a contradiction. $\Box$

For a first application, we shall see how convergence in probability is preserved by continuous mappings.

Lemma 4.3 (continuous mapping) For any metric spaces $S$ and $T$, let $\xi, \xi_1, \xi_2, \ldots$ be random elements in $S$ with $\xi_n \overset{P}{\to} \xi$, and let the mapping $f\colon S \to T$ be measurable and a.s. continuous at $\xi$. Then $f(\xi_n) \overset{P}{\to} f(\xi)$.

Proof: Fix any subsequence $N' \subset \mathbb{N}$. By Lemma 4.2 we have $\xi_n \to \xi$ a.s. along some further subsequence $N'' \subset N'$, and by continuity we get $f(\xi_n) \to f(\xi)$ a.s. along $N''$. Hence, $f(\xi_n) \overset{P}{\to} f(\xi)$ by Lemma 4.2. $\Box$

Now consider a sequence of metric spaces $(S_k, \rho_k)$, and introduce the product space $S = \times_k S_k = S_1 \times S_2 \times \cdots$, endowed with the product topology, a convenient metrization of which is given by
$$\rho(x,y) = \sum_k 2^{-k} \{\rho_k(x_k, y_k) \wedge 1\}, \qquad x, y \in \times_k S_k. \qquad (2)$$
If each $S_k$ is separable, then $\mathcal{B}(S) = \bigotimes_k \mathcal{B}(S_k)$ by Lemma 1.2, and so a random element in $S$ is simply a sequence of random elements in the $S_k$, $k \in \mathbb{N}$.

Lemma 4.4 (random sequences) For any separable metric spaces $S_1, S_2, \ldots$, let $\xi = (\xi^1, \xi^2, \ldots)$ and $\xi_n = (\xi_n^1, \xi_n^2, \ldots)$, $n \in \mathbb{N}$, be random elements in $\times_k S_k$. Then $\xi_n \overset{P}{\to} \xi$ iff $\xi_n^k \overset{P}{\to} \xi^k$ in $S_k$ for each $k$.

Proof: With $\rho$ as in (2), we get for each $n \in \mathbb{N}$
$$E[\rho(\xi_n, \xi) \wedge 1] = E\rho(\xi_n, \xi) = \sum_k 2^{-k} E[\rho_k(\xi_n^k, \xi^k) \wedge 1].$$
Thus, by dominated convergence, $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ iff $E[\rho_k(\xi_n^k, \xi^k) \wedge 1] \to 0$ for all $k$. $\Box$

Combining the last two lemmas, it is easy to see how convergence in probability is preserved by the basic arithmetic operations.

Corollary 4.5 (elementary operations) Let $\xi, \xi_1, \xi_2, \ldots$ and $\eta, \eta_1, \eta_2, \ldots$ be random variables with $\xi_n \overset{P}{\to} \xi$ and $\eta_n \overset{P}{\to} \eta$. Then $a\xi_n + b\eta_n \overset{P}{\to} a\xi + b\eta$ for all $a, b \in \mathbb{R}$, and $\xi_n \eta_n \overset{P}{\to} \xi\eta$. Furthermore, $\xi_n/\eta_n \overset{P}{\to} \xi/\eta$ whenever a.s. $\eta \ne 0$ and $\eta_n \ne 0$ for all $n$.
Proof: By Lemma 4.4 we have $(\xi_n, \eta_n) \overset{P}{\to} (\xi, \eta)$ in $\mathbb{R}^2$, so the results for linear combinations and products follow by Lemma 4.3. To prove the last assertion, we may apply Lemma 4.3 to the function $f\colon (x, y) \mapsto (x/y) 1\{y \ne 0\}$, which is clearly a.s. continuous at $(\xi, \eta)$. $\Box$
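A standard example separating the two modes of convergence in Lemma 4.2 (our own illustration, on the probability space $[0,1)$ with Lebesgue measure) is the "typewriter" sequence: let $\xi_n$ be the indicator of the dyadic interval $[j2^{-k}, (j+1)2^{-k})$, where $n = 2^k + j$ with $0 \le j < 2^k$. Then $P\{\xi_n > 0\} = 2^{-k} \to 0$, so $\xi_n \overset{P}{\to} 0$, yet at every sample point $\xi_n = 1$ infinitely often, so there is no a.s. convergence; along the subsequence $n = 2^k$, however, $\xi_n \to 0$ at every fixed point except $0$, exactly as the subsequence criterion predicts.

```python
def xi(n, omega):
    """Indicator of [j*2^-k, (j+1)*2^-k) at omega, where n = 2^k + j, 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return 1.0 if j * 2.0 ** -k <= omega < (j + 1) * 2.0 ** -k else 0.0

# P{xi_n > 0} = 2^{-k} -> 0: convergence in probability to 0.
prob_positive = [2.0 ** -(n.bit_length() - 1) for n in (4, 64, 1024)]

# At omega = 1/3, each dyadic level contributes one hit, so xi_n = 1 i.o.
hits = sum(xi(n, 1 / 3) for n in range(1, 1025))

# Along the subsequence n = 2^k, the intervals [0, 2^-k) eventually miss 1/3.
subseq = [xi(1 << k, 1 / 3) for k in range(1, 11)]
```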
Let us next examine the associated completeness properties. For any random elements $\xi_1, \xi_2, \ldots$ in a metric space $(S, \rho)$, we say that $(\xi_n)$ is Cauchy (convergent) in probability if $\rho(\xi_m, \xi_n) \overset{P}{\to} 0$ as $m, n \to \infty$, in the sense that $E[\rho(\xi_m, \xi_n) \wedge 1] \to 0$.

Lemma 4.6 (completeness) Let $\xi_1, \xi_2, \ldots$ be random elements in a complete metric space $(S, \rho)$. Then $(\xi_n)$ is Cauchy in probability or a.s. iff $\xi_n \overset{P}{\to} \xi$ or $\xi_n \to \xi$ a.s., respectively, for some random element $\xi$ in $S$.

Proof: The a.s. case is immediate from Lemma 1.10. Assuming $\xi_n \overset{P}{\to} \xi$, we get
$$E[\rho(\xi_m, \xi_n) \wedge 1] \le E[\rho(\xi_m, \xi) \wedge 1] + E[\rho(\xi_n, \xi) \wedge 1] \to 0,$$
which means that $(\xi_n)$ is Cauchy in probability.

Now assume instead the latter condition. Define
$$n_k = \inf\{n \ge k;\ \sup_{m \ge n} E[\rho(\xi_m, \xi_n) \wedge 1] \le 2^{-k}\}, \qquad k \in \mathbb{N}.$$
The $n_k$ are finite and satisfy
$$E \sum_k \{\rho(\xi_{n_k}, \xi_{n_{k+1}}) \wedge 1\} \le \sum_k 2^{-k} < \infty,$$
and so $\sum_k \rho(\xi_{n_k}, \xi_{n_{k+1}}) < \infty$ a.s. The sequence $(\xi_{n_k})$ is then a.s. Cauchy and converges a.s. toward some measurable limit $\xi$. To see that $\xi_n \overset{P}{\to} \xi$, write
$$E[\rho(\xi_m, \xi) \wedge 1] \le E[\rho(\xi_m, \xi_{n_k}) \wedge 1] + E[\rho(\xi_{n_k}, \xi) \wedge 1],$$
and note that the right-hand side tends to zero as $m, k \to \infty$, by the Cauchy convergence of $(\xi_n)$ and dominated convergence. $\Box$

Next consider any probability measures $\mu$ and $\mu_1, \mu_2, \ldots$ on some metric space $(S, \rho)$ with Borel $\sigma$-field $\mathcal{S}$, and say that $\mu_n$ converges weakly to $\mu$ (written as $\mu_n \overset{w}{\to} \mu$) if $\mu_n f \to \mu f$ for every $f \in C_b(S)$, the class of bounded, continuous functions $f\colon S \to \mathbb{R}$. If $\xi$ and $\xi_1, \xi_2, \ldots$ are random elements in $S$, we further say that $\xi_n$ converges in distribution to $\xi$ (written as $\xi_n \overset{d}{\to} \xi$) if $\mathcal{L}(\xi_n) \overset{w}{\to} \mathcal{L}(\xi)$, that is, if $Ef(\xi_n) \to Ef(\xi)$ for all $f \in C_b(S)$. Note that the latter mode of convergence depends only on the distributions and that $\xi$ and the $\xi_n$ need not even be defined on the same probability space. To motivate the definition, note that $x_n \to x$ in a metric space $S$ iff $f(x_n) \to f(x)$ for all continuous functions $f\colon S \to \mathbb{R}$, and also that $\mathcal{L}(\xi)$ is determined by the integrals $Ef(\xi)$ for all $f \in C_b(S)$.
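The defining condition $Ef(\xi_n) \to Ef(\xi)$ can be watched at work in a concrete case (our own illustration, not from the text). Take $\xi_n = (S_n - n/2)/\sqrt{n/4}$ with $S_n$ binomial $(n, \tfrac12)$, which converges in distribution to a standard normal $\xi$, and the bounded continuous function $f(x) = \cos x$, for which $Ef(\xi) = e^{-1/2}$ in closed form. The expectation $Ef(\xi_n)$ is computed exactly from the binomial probabilities, so no simulation noise enters.

```python
import math

def E_f_of_standardized_binomial(n, f):
    """Compute E f((S_n - n/2)/sqrt(n/4)) exactly for S_n ~ Bin(n, 1/2)."""
    total = 0.0
    log_half_n = n * math.log(0.5)
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1)
                   - math.lgamma(n - k + 1) + log_half_n)
        x = (k - n / 2) / math.sqrt(n / 4)
        total += math.exp(log_pmf) * f(x)
    return total

target = math.exp(-0.5)                       # E cos(xi) for xi ~ N(0,1)
approx = E_f_of_standardized_binomial(2000, math.cos)
gap = abs(approx - target)
```

Already at $n = 2000$ the gap is far below $10^{-3}$, reflecting the rapid convergence of $Ef(\xi_n)$ for smooth bounded $f$.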
The following result gives a connection between convergence in probability and in distribution.
Lemma 4.7 (convergence in probability and in distribution) Let $\xi, \xi_1, \xi_2, \ldots$ be random elements in a metric space $(S, \rho)$. Then $\xi_n \overset{P}{\to} \xi$ implies $\xi_n \overset{d}{\to} \xi$, and the two conditions are equivalent when $\xi$ is a.s. constant.

Proof: Assume $\xi_n \overset{P}{\to} \xi$. For any $f \in C_b(S)$ we need to show that $Ef(\xi_n) \to Ef(\xi)$. If the convergence fails, we may choose some subsequence $N' \subset \mathbb{N}$ such that $\inf_{n \in N'} |Ef(\xi_n) - Ef(\xi)| > 0$. By Lemma 4.2 there exists a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. By continuity and dominated convergence we get $Ef(\xi_n) \to Ef(\xi)$ along $N''$, a contradiction.

Conversely, assume that $\xi_n \overset{d}{\to} s \in S$. Since $\rho(x, s) \wedge 1$ is a bounded and continuous function of $x$, we get
$$E[\rho(\xi_n, s) \wedge 1] \to E[\rho(s, s) \wedge 1] = 0,$$
and so $\xi_n \overset{P}{\to} s$. $\Box$

A family of random vectors $\xi_t$, $t \in T$, in $\mathbb{R}^d$ is said to be tight if
$$\lim_{r\to\infty} \sup_{t \in T} P\{|\xi_t| > r\} = 0.$$
For sequences $(\xi_n)$ the condition is clearly equivalent to
$$\lim_{r\to\infty} \limsup_{n\to\infty} P\{|\xi_n| > r\} = 0, \qquad (3)$$
which is often easier to verify. Tightness plays an important role for the compactness methods developed in Chapters 5 and 16. For the moment we note only the following simple connection with weak convergence.

Lemma 4.8 (weak convergence and tightness) Let $\xi, \xi_1, \xi_2, \ldots$ be random vectors in $\mathbb{R}^d$ satisfying $\xi_n \overset{d}{\to} \xi$. Then $(\xi_n)$ is tight.

Proof: Fix any $r > 0$, and define $f(x) = (1 - (r - |x|)_+)_+$. Then
$$\limsup_{n\to\infty} P\{|\xi_n| > r\} \le \lim_{n\to\infty} Ef(\xi_n) = Ef(\xi) \le P\{|\xi| > r - 1\}.$$
Here the right-hand side tends to $0$ as $r \to \infty$, and (3) follows. $\Box$

We may further note the following simple relationship between tightness and convergence in probability.

Lemma 4.9 (tightness and convergence in probability) Let $\xi_1, \xi_2, \ldots$ be random vectors in $\mathbb{R}^d$. Then $(\xi_n)$ is tight iff $c_n \xi_n \overset{P}{\to} 0$ for any constants $c_1, c_2, \ldots \ge 0$ with $c_n \to 0$.

Proof: Assume $(\xi_n)$ to be tight, and let $c_n \to 0$. Fixing any $r, \varepsilon > 0$, and noting that $c_n r \le \varepsilon$ for all but finitely many $n \in \mathbb{N}$, we get
$$\limsup_{n\to\infty} P\{|c_n \xi_n| > \varepsilon\} \le \limsup_{n\to\infty} P\{|\xi_n| > r\}.$$
Here the right-hand side tends to $0$ as $r \to \infty$, and so $P\{|c_n \xi_n| > \varepsilon\} \to 0$. Since $\varepsilon$ was arbitrary, we get $c_n \xi_n \overset{P}{\to} 0$. If instead $(\xi_n)$ is not tight, we may
choose a subsequence $(n_k) \subset \mathbb{N}$ such that $\inf_k P\{|\xi_{n_k}| > k\} > 0$. Letting $c_n = \sup\{k^{-1};\ n_k \ge n\}$, we note that $c_n \to 0$ and yet $P\{|c_{n_k} \xi_{n_k}| > 1\} \not\to 0$. Thus, the stated condition fails. $\Box$

We turn to a related notion for expected values. A family of random variables $\xi_t$, $t \in T$, is said to be uniformly integrable if
$$\lim_{r\to\infty} \sup_{t \in T} E[|\xi_t|;\ |\xi_t| > r] = 0. \qquad (4)$$
For sequences $(\xi_n)$ in $L^1$, this is clearly equivalent to
$$\lim_{r\to\infty} \limsup_{n\to\infty} E[|\xi_n|;\ |\xi_n| > r] = 0. \qquad (5)$$
Condition (4) holds in particular if the $\xi_t$ are $L^p$-bounded for some $p > 1$, in the sense that $\sup_t E|\xi_t|^p < \infty$. To see this, it suffices to write
$$E[|\xi_t|;\ |\xi_t| > r] \le r^{-p+1} E|\xi_t|^p, \qquad r, p > 0.$$
The next result gives a useful characterization of uniform integrability. For motivation we note that if $\xi$ is an integrable random variable, then $E[|\xi|; A] \to 0$ as $PA \to 0$, by Lemma 4.2 and dominated convergence. The latter condition means that $\sup_{A \in \mathcal{A},\, PA < \varepsilon} E[|\xi|; A] \to 0$ as $\varepsilon \to 0$.

Lemma 4.10 (uniform integrability) The random variables $\xi_t$, $t \in T$, are uniformly integrable iff $\sup_t E|\xi_t| < \infty$ and
$$\lim_{PA \to 0} \sup_{t \in T} E[|\xi_t|; A] = 0. \qquad (6)$$

Proof: Assume the $\xi_t$ to be uniformly integrable, and write
$$E[|\xi_t|; A] \le rPA + E[|\xi_t|;\ |\xi_t| > r], \qquad r > 0.$$
Here (6) follows as we let $PA \to 0$ and then $r \to \infty$. To get the boundedness in $L^1$, it suffices to take $A = \Omega$ and choose $r > 0$ large enough.

Conversely, let the $\xi_t$ be $L^1$-bounded and satisfy (6). By Chebyshev's inequality we get as $r \to \infty$
$$\sup_t P\{|\xi_t| > r\} \le r^{-1} \sup_t E|\xi_t| \to 0,$$
and so (4) follows from (6) with $A = \{|\xi_t| > r\}$. $\Box$

The relevance of uniform integrability for the convergence of moments is clear from the following result, which also contains a weak convergence version of Fatou's lemma.

Lemma 4.11 (convergence of means) Let $\xi, \xi_1, \xi_2, \ldots$ be $\mathbb{R}_+$-valued random variables with $\xi_n \overset{d}{\to} \xi$. Then $E\xi \le \liminf_n E\xi_n$, and we have $E\xi_n \to E\xi < \infty$ iff (5) holds.

Proof: For any $r > 0$ the function $x \mapsto x \wedge r$ is bounded and continuous on $\mathbb{R}_+$.
Thus,
$$\liminf_{n\to\infty} E\xi_n \ge \lim_{n\to\infty} E(\xi_n \wedge r) = E(\xi \wedge r),$$
and the first assertion follows as we let $r \to \infty$.

Next assume (5), and note in particular that $E\xi \le \liminf_n E\xi_n < \infty$. For any $r > 0$ we get
$$|E\xi_n - E\xi| \le |E\xi_n - E(\xi_n \wedge r)| + |E(\xi_n \wedge r) - E(\xi \wedge r)| + |E(\xi \wedge r) - E\xi|.$$
Letting $n \to \infty$ and then $r \to \infty$, we obtain $E\xi_n \to E\xi$.

Now assume instead that $E\xi_n \to E\xi < \infty$. Keeping $r > 0$ fixed, we get as $n \to \infty$
$$E[\xi_n;\ \xi_n > r] \le E[\xi_n - \xi_n \wedge (r - \xi_n)_+] \to E[\xi - \xi \wedge (r - \xi)_+].$$
Since $x \wedge (r - x)_+ \uparrow x$ as $r \to \infty$, the right-hand side tends to zero by dominated convergence, and (5) follows. $\Box$

We may now examine the relationship between convergence in $L^p$ and in probability.

Proposition 4.12 ($L^p$-convergence) Fix any $p > 0$, and let $\xi, \xi_1, \xi_2, \ldots \in L^p$ with $\xi_n \overset{P}{\to} \xi$. Then these conditions are equivalent:
(i) $\xi_n \to \xi$ in $L^p$;
(ii) $\|\xi_n\|_p \to \|\xi\|_p$;
(iii) the variables $|\xi_n|^p$, $n \in \mathbb{N}$, are uniformly integrable.
Conversely, (i) implies $\xi_n \overset{P}{\to} \xi$.

Proof: First assume that $\xi_n \to \xi$ in $L^p$. Then $\|\xi_n\|_p \to \|\xi\|_p$ by Lemma 1.29, and by Lemma 4.1 we have, for any $\varepsilon > 0$,
$$P\{|\xi_n - \xi| > \varepsilon\} = P\{|\xi_n - \xi|^p > \varepsilon^p\} \le \varepsilon^{-p} \|\xi_n - \xi\|_p^p \to 0.$$
Thus, $\xi_n \overset{P}{\to} \xi$.

For the remainder of the proof we may assume that $\xi_n \overset{P}{\to} \xi$. In particular, $|\xi_n|^p \overset{d}{\to} |\xi|^p$ by Lemmas 4.3 and 4.7, and so (ii) and (iii) are equivalent by Lemma 4.11. Next assume (ii). If (i) fails, there exists some subsequence $N' \subset \mathbb{N}$ with $\inf_{n \in N'} \|\xi_n - \xi\|_p > 0$. By Lemma 4.2 we may choose a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. But then Lemma 1.32 yields $\|\xi_n - \xi\|_p \to 0$ along $N''$, a contradiction. Thus, (ii) implies (i), and so all three conditions are equivalent. $\Box$

We shall briefly consider yet another notion of convergence of random variables. Assuming $\xi, \xi_1, \ldots \in L^p$ for some $p \in [1, \infty)$, we say that $\xi_n \to \xi$ weakly in $L^p$ if $E\xi_n \eta \to E\xi\eta$ for every $\eta \in L^q$, where $p^{-1} + q^{-1} = 1$. Taking $\eta = |\xi|^{p-1} \operatorname{sgn} \xi$ gives $\|\eta\|_q = \|\xi\|_p^{p-1}$, and so by Hölder's inequality
$$\|\xi\|_p^p = E\xi\eta = \lim_{n\to\infty} E\xi_n \eta \le \|\xi\|_p^{p-1} \liminf_{n\to\infty} \|\xi_n\|_p,$$
which shows that $\|\xi\|_p \le \liminf_n \|\xi_n\|_p$.
Now recall the well-known fact that any $L^2$-bounded sequence has a subsequence that converges weakly in $L^2$. The following related criterion for weak compactness in $L^1$ will be needed in Chapter 25.
Lemma 4.13 (weak $L^1$-compactness, Dunford) Every uniformly integrable sequence of random variables has a subsequence that converges weakly in $L^1$.

Proof: Let $(\xi_n)$ be uniformly integrable. Define $\xi_n^k = \xi_n 1\{|\xi_n| \le k\}$, and note that $(\xi_n^k)$ is $L^2$-bounded in $n$ for each $k$. By the compactness in $L^2$ and a diagonal argument, there exist a subsequence $N' \subset \mathbb{N}$ and some random variables $\eta_1, \eta_2, \ldots$ such that $\xi_n^k \to \eta_k$ holds weakly in $L^2$ and then also in $L^1$, as $n \to \infty$ along $N'$ for fixed $k$. Now $\|\eta_k - \eta_l\|_1 \le \liminf_n \|\xi_n^k - \xi_n^l\|_1$, and by uniform integrability the right-hand side tends to zero as $k, l \to \infty$. Thus, the sequence $(\eta_k)$ is Cauchy in $L^1$, and so it converges in $L^1$ toward some $\xi$. By approximation it follows easily that $\xi_n \to \xi$ weakly in $L^1$ along $N'$. $\Box$

We now derive criteria for the convergence of random series, beginning with an important special case.

Proposition 4.14 (series with positive terms) Let $\xi_1, \xi_2, \ldots$ be independent $\mathbb{R}_+$-valued random variables. Then $\sum_n \xi_n < \infty$ a.s. iff $\sum_n E[\xi_n \wedge 1] < \infty$.

Proof: Assuming the stated condition, we get $E\sum_n(\xi_n \wedge 1) < \infty$ by Fubini's theorem, so $\sum_n(\xi_n \wedge 1) < \infty$ a.s. In particular, $\sum_n 1\{\xi_n > 1\} < \infty$ a.s., so the series $\sum_n(\xi_n \wedge 1)$ and $\sum_n \xi_n$ differ by at most finitely many terms, and we get $\sum_n \xi_n < \infty$ a.s.

Conversely, assume that $\sum_n \xi_n < \infty$ a.s. Then also $\sum_n(\xi_n \wedge 1) < \infty$ a.s., so we may assume that $\xi_n \le 1$ for all $n$. Noting that $1 - x \le e^{-x} \le 1 - ax$ for $x \in [0,1]$, where $a = 1 - e^{-1}$, we get
$$0 < E\exp\Bigl\{-\sum\nolimits_n \xi_n\Bigr\} = \prod_n Ee^{-\xi_n} \le \prod_n (1 - aE\xi_n) \le \prod_n e^{-aE\xi_n} = \exp\Bigl\{-a \sum\nolimits_n E\xi_n\Bigr\},$$
and so $\sum_n E\xi_n < \infty$. $\Box$

To handle more general series, we need the following strengthened version of the Bienaymé–Chebyshev inequality. A further extension appears as Proposition 7.15.

Lemma 4.15 (maximum inequality, Kolmogorov) Let $\xi_1, \xi_2, \ldots$ be independent random variables with mean zero, and put $S_n = \xi_1 + \cdots + \xi_n$. Then
$$P\{\sup_n |S_n| \ge r\} \le r^{-2} \sum_n E\xi_n^2, \qquad r > 0.$$
Proof: We may assume that $\sum_n E\xi_n^2 < \infty$. Writing $\tau = \inf\{n;\ |S_n| \ge r\}$ and noting that $S_k 1\{\tau = k\} \perp\!\!\!\perp (S_n - S_k)$ for $k \le n$, we get
$$\sum_{k\le n} E\xi_k^2 = ES_n^2 \ge \sum_{k\le n} E[S_n^2;\ \tau = k] \ge \sum_{k\le n} \bigl\{E[S_k^2;\ \tau = k] + 2E[S_k(S_n - S_k);\ \tau = k]\bigr\} = \sum_{k\le n} E[S_k^2;\ \tau = k] \ge r^2 P\{\tau \le n\}.$$
As $n \to \infty$, we obtain
$$\sum_k E\xi_k^2 \ge r^2 P\{\tau < \infty\} = r^2 P\{\sup_k |S_k| \ge r\}. \qquad \Box$$

The last result leads easily to the following sufficient condition for the a.s. convergence of random series with independent terms. Conditions that are both necessary and sufficient are given in Theorem 4.18.

Lemma 4.16 (variance criterion for series, Khinchin and Kolmogorov) Let $\xi_1, \xi_2, \ldots$ be independent random variables with mean $0$ and $\sum_n E\xi_n^2 < \infty$. Then $\sum_n \xi_n$ converges a.s.

Proof: Write $S_n = \xi_1 + \cdots + \xi_n$. By Lemma 4.15 we get for any $\varepsilon > 0$
$$P\{\sup_{k \ge n} |S_n - S_k| > \varepsilon\} \le \varepsilon^{-2} \sum_{k > n} E\xi_k^2.$$
Hence, $\sup_{k \ge n} |S_n - S_k| \overset{P}{\to} 0$ as $n \to \infty$, and Lemma 4.2 yields $\sup_{k \ge n} |S_n - S_k| \to 0$ a.s. along a subsequence. Since the last supremum is nonincreasing in $n$, the a.s. convergence extends to the entire sequence, which means that $(S_n)$ is a.s. Cauchy convergent. Thus, $S_n$ converges a.s. by Lemma 4.6. $\Box$

The next result gives the basic connection between series with positive and symmetric terms. By $\xi_n \overset{P}{\to} \infty$ we mean that $P\{\xi_n > r\} \to 1$ for every $r > 0$.

Theorem 4.17 (positive and symmetric terms) Let $\xi_1, \xi_2, \ldots$ be independent, symmetric random variables. Then these conditions are equivalent:
(i) $\sum_n \xi_n$ converges a.s.;
(ii) $\sum_n \xi_n^2 < \infty$ a.s.;
(iii) $\sum_n E(\xi_n^2 \wedge 1) < \infty$.
If the conditions fail, then $|\sum_{k \le n} \xi_k| \overset{P}{\to} \infty$.

Proof: Conditions (ii) and (iii) are equivalent by Proposition 4.14. Next assume (iii), and conclude from Lemma 4.16 that $\sum_n \xi_n 1\{|\xi_n| \le 1\}$ converges a.s. From (iii) and Fubini's theorem we note that also $\sum_n 1\{|\xi_n| > 1\} < \infty$ a.s. Hence, the series $\sum_n \xi_n 1\{|\xi_n| \le 1\}$ and $\sum_n \xi_n$ differ by at most finitely many terms, and so even the latter series converges a.s. Thus, (iii) implies (i). To see that (i) implies (ii), assume instead that (ii) fails. Then $\sum_n \xi_n^2 = \infty$ a.s.
by Kolmogorov's 0-1 law, and so $|S_n| \overset{P}{\to} \infty$, where
$S_n = \sum_{k \le n} \xi_k$. Since the latter condition implies $|S_n| \to \infty$ a.s. along some subsequence, we conclude that even (i) fails. This shows that (i)-(iii) are equivalent.

To prove the final assertion, we introduce an independent sequence of i.i.d. random variables $\vartheta_n$ with $P\{\vartheta_n = \pm 1\} = \tfrac12$, and note that the sequences $(\xi_n)$ and $(\vartheta_n |\xi_n|)$ have the same distribution. Letting $\mu$ denote the distribution of the sequence $(|\xi_n|)$, we get by Lemma 3.11
$$P\{|S_n| \le r\} = \int P\Bigl\{\Bigl|\sum\nolimits_{k \le n} \vartheta_k x_k\Bigr| \le r\Bigr\}\, \mu(dx), \qquad r > 0,$$
and by dominated convergence it is enough to show that the integrand on the right tends to $0$ for $\mu$-almost every $x = (x_1, x_2, \ldots)$. Since $\sum_n x_n^2 = \infty$ a.e., this reduces the argument to the case of nonrandom $|\xi_n| = c_n$, $n \in \mathbb{N}$.

First assume that the $c_n$ are unbounded. For any $r > 0$ we may recursively construct a subsequence $(n_k) \subset \mathbb{N}$ such that $c_{n_1} > r$ and $c_{n_k} > 4\sum_{j<k} c_{n_j}$ for each $k$. Then clearly $P\{\sum_{j \le k} \xi_{n_j} \in I\} \le 2^{-k}$ for every interval $I$ of length $2r$. By convolution we get $P\{|S_n| \le r\} \le 2^{-k}$ for all $n \ge n_k$, which implies $P\{|S_n| \le r\} \to 0$.

Next assume that $c_n \le c < \infty$ for all $n$. Choosing $a > 0$ so small that $\cos x \le e^{-ax^2}$ for $|x| \le 1$, we get for $0 < |t| \le c^{-1}$
$$0 \le Ee^{itS_n} = \prod_{k \le n} \cos(tc_k) \le \prod_{k \le n} \exp(-at^2 c_k^2) = \exp\Bigl\{-at^2 \sum\nolimits_{k \le n} c_k^2\Bigr\} \to 0.$$
Anticipating the elementary Lemma 5.1 of the next chapter, we again get $P\{|S_n| \le r\} \to 0$ for each $r > 0$. $\Box$

The problem of characterizing the convergence, a.s. or in distribution, of a series of independent random variables is solved completely by the following result. Here we write $\mathrm{var}[\xi; A] = \mathrm{var}(\xi 1_A)$.

Theorem 4.18 (three-series criterion, Kolmogorov, Lévy) Let $\xi_1, \xi_2, \ldots$ be independent random variables. Then $\sum_n \xi_n$ converges a.s. iff it converges in distribution, and also iff these conditions are fulfilled:
(i) $\sum_n P\{|\xi_n| > 1\} < \infty$;
(ii) $\sum_n E[\xi_n;\ |\xi_n| \le 1]$ converges;
(iii) $\sum_n \mathrm{var}[\xi_n;\ |\xi_n| \le 1] < \infty$.

For the proof we need the following simple symmetrization inequalities.
Say that $m$ is a median of the random variable $\xi$ if $P\{\xi > m\} \vee P\{\xi < m\} \le \tfrac12$. A symmetrization of $\xi$ is defined as a random variable of the form $\tilde\xi = \xi - \xi'$ with $\xi' \perp\!\!\!\perp \xi$ and $\xi' \overset{d}{=} \xi$. For symmetrized versions of the random variables $\xi_1, \xi_2, \ldots$, we require the same properties for the whole sequences $(\xi_n)$ and $(\xi'_n)$.
Lemma 4.19 (symmetrization) Let $\tilde\xi$ be a symmetrization of a random variable $\xi$ with median $m$. Then
$$\tfrac12 P\{|\xi - m| > r\} \le P\{|\tilde\xi| > r\} \le 2P\{|\xi| > r/2\}, \qquad r > 0.$$

Proof: Assume $\tilde\xi = \xi - \xi'$ as above, and write
$$\{\xi - m > r,\ \xi' \le m\} \cup \{\xi - m < -r,\ \xi' \ge m\} \subset \{|\tilde\xi| > r\} \subset \{|\xi| > r/2\} \cup \{|\xi'| > r/2\}. \qquad \Box$$

We also need a simple centering lemma.

Lemma 4.20 (centering) Let the random variables $\xi_1, \xi_2, \ldots$ and constants $c_1, c_2, \ldots$ be such that both $\xi_n$ and $\xi_n + c_n$ converge in distribution. Then even $c_n$ converges.

Proof: Assume that $\xi_n \overset{d}{\to} \xi$. If $c_n \to \pm\infty$ along some subsequence $N' \subset \mathbb{N}$, then clearly $\xi_n + c_n \overset{P}{\to} \pm\infty$ along $N'$, which contradicts the tightness of $(\xi_n + c_n)$. Thus, the $c_n$ are bounded. Now assume that $c_n \to a$ and $c_n \to b$ along two subsequences $N_1, N_2 \subset \mathbb{N}$. Then $\xi_n + c_n \overset{d}{\to} \xi + a$ along $N_1$ and $\xi_n + c_n \overset{d}{\to} \xi + b$ along $N_2$, so $\xi + a \overset{d}{=} \xi + b$. Iterating this relation, we get $\xi + n(b - a) \overset{d}{=} \xi$ for arbitrary $n \in \mathbb{Z}$, which is impossible unless $a = b$. Thus, all limit points of $(c_n)$ agree, and $c_n$ converges. $\Box$

Proof of Theorem 4.18: Assume conditions (i) through (iii), and define $\xi'_n = \xi_n 1\{|\xi_n| \le 1\}$. By (iii) and Lemma 4.16 the series $\sum_n(\xi'_n - E\xi'_n)$ converges a.s., so by (ii) the same thing is true for $\sum_n \xi'_n$. Finally, $P\{\xi_n \ne \xi'_n \text{ i.o.}\} = 0$ by (i) and the Borel–Cantelli lemma, so $\sum_n(\xi_n - \xi'_n)$ has a.s. finitely many nonzero terms. Hence, even $\sum_n \xi_n$ converges a.s.

Conversely, assume that $\sum_n \xi_n$ converges in distribution. Then Lemma 4.19 shows that the sequence of symmetrized partial sums $\sum_{k \le n} \tilde\xi_k$ is tight, and so $\sum_n \tilde\xi_n$ converges a.s. by Theorem 4.17. In particular, $\tilde\xi_n \to 0$ a.s. For any $\varepsilon > 0$ we obtain $\sum_n P\{|\tilde\xi_n| > \varepsilon\} < \infty$ by the Borel–Cantelli lemma. Hence, $\sum_n P\{|\xi_n - m_n| > \varepsilon\} < \infty$ by Lemma 4.19, where $m_1, m_2, \ldots$ are medians of $\xi_1, \xi_2, \ldots$. Using the Borel–Cantelli lemma again, we get $\xi_n - m_n \to 0$ a.s.

Now let $c_1, c_2, \ldots$ be arbitrary with $m_n - c_n \to 0$. Then even $\xi_n - c_n \to 0$ a.s. Putting $\eta_n = \xi_n 1\{|\xi_n - c_n| \le 1\}$, we get a.s.
$\xi_n = \eta_n$ for all but finitely many $n$, and similarly for the symmetrized variables $\tilde\xi_n$ and $\tilde\eta_n$. Thus, even $\sum_n \eta_n$ converges a.s. Since the $\tilde\eta_n$ are bounded and symmetric, Theorem 4.17 yields $\sum_n \mathrm{var}(\eta_n) = \tfrac12 \sum_n \mathrm{var}(\tilde\eta_n) < \infty$. Thus, $\sum_n(\eta_n - E\eta_n)$ converges a.s. by Lemma 4.16, as does the series $\sum_n(\xi_n - E\eta_n)$. Comparing with the distributional convergence of $\sum_n \xi_n$, we conclude from Lemma 4.20 that $\sum_n E\eta_n$ converges. In particular, $E\eta_n \to 0$ and $\eta_n - E\eta_n \to 0$ a.s., so $\eta_n \to 0$ a.s., and then also $\xi_n \to 0$ a.s. Hence, $m_n \to 0$, so we may take $c_n = 0$ in the previous argument, and conditions (i)-(iii) follow. $\Box$
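For a series of fair random signs, $\xi_n = \pm a_n$ with $0 < a_n \le 1$, the three conditions of Theorem 4.18 collapse to a single one: (i) is void, (ii) holds since $E\xi_n = 0$, and (iii) reads $\sum_n a_n^2 < \infty$. Thus $\sum_n \pm 1/n$ converges a.s. while $\sum_n \pm 1/\sqrt{n}$ does not, and a crude simulation of the tail increments reflects this (our own illustration, not from the text; the function names are ours). The root-mean-square spread of $S_{4000} - S_{1000}$ over many paths estimates the tail variance $\sum_{n > 1000} a_n^2$.

```python
import math
import random

def signed_partial_sums(a, checkpoints, rng):
    """Partial sums of sum_n s_n * a(n) with i.i.d. fair signs s_n, at the checkpoints."""
    s, out, cp = 0.0, [], set(checkpoints)
    for n in range(1, max(checkpoints) + 1):
        s += (1.0 if rng.random() < 0.5 else -1.0) * a(n)
        if n in cp:
            out.append(s)
    return out

rng = random.Random(7)

def tail_spread(a, paths=200):
    """Root-mean-square of S_4000 - S_1000 over independent paths."""
    diffs = []
    for _ in range(paths):
        s1, s4 = signed_partial_sums(a, [1000, 4000], rng)
        diffs.append(s4 - s1)
    return (sum(d * d for d in diffs) / len(diffs)) ** 0.5

spread_conv = tail_spread(lambda n: 1.0 / n)             # sum a_n^2 < infinity
spread_div = tail_spread(lambda n: 1.0 / math.sqrt(n))   # sum a_n^2 = infinity
```

The convergent case shows an essentially frozen tail, while in the divergent case the partial sums keep fluctuating on a unit scale, in line with the final assertion of Theorem 4.17.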
A sequence of random variables $\xi_1, \xi_2, \ldots$ with partial sums $S_n$ is said to obey the strong law of large numbers if $S_n/n$ converges a.s. to a constant. The weak law is defined by the corresponding condition with convergence in probability. The following elementary proposition enables us to convert convergence results for random series into laws of large numbers.

Lemma 4.21 (series and averages, Kronecker) If $\sum_n n^{-c} a_n$ converges for some $a_1, a_2, \ldots \in \mathbb{R}$ and $c > 0$, then $n^{-c} \sum_{k \le n} a_k \to 0$.

Proof: Put $b_n = n^{-c} a_n$, and assume that $\sum_n b_n = b$. By dominated convergence as $n \to \infty$,
$$\sum_{k \le n} b_k - n^{-c} \sum_{k \le n} a_k = \sum_{k \le n} \bigl(1 - (k/n)^c\bigr) b_k = c \sum_{k \le n} b_k \int_{k/n}^1 x^{c-1}\, dx = c \int_0^1 x^{c-1} \sum_{k \le nx} b_k\, dx \to bc \int_0^1 x^{c-1}\, dx = b,$$
and the assertion follows since the first term on the left tends to $b$. $\Box$

The following simple result illustrates the method.

Corollary 4.22 (variance criterion for averages, Kolmogorov) Let $\xi_1, \xi_2, \ldots$ be independent random variables with zero mean such that $\sum_n n^{-2c} E\xi_n^2 < \infty$ for some $c > 0$. Then $n^{-c} \sum_{k \le n} \xi_k \to 0$ a.s.

Proof: The series $\sum_n n^{-c} \xi_n$ converges a.s. by Lemma 4.16, and the assertion follows by Lemma 4.21. $\Box$

In particular, we note that if $\xi, \xi_1, \xi_2, \ldots$ are i.i.d. with $E\xi = 0$ and $E\xi^2 < \infty$, then $n^{-c} \sum_{k \le n} \xi_k \to 0$ a.s. for any $c > \tfrac12$. The statement fails for $c = \tfrac12$, as may be seen by taking $\xi$ to be $N(0,1)$. The best possible normalization is given in Corollary 14.8.

The next result characterizes the stated convergence for arbitrary $c > \tfrac12$. For $c = 1$ we recognize the strong law of large numbers. Corresponding criteria for the weak law are given in Theorem 5.16.

Theorem 4.23 (strong laws of large numbers, Kolmogorov, Marcinkiewicz and Zygmund) Let $\xi, \xi_1, \xi_2, \ldots$ be i.i.d. random variables, and fix any $p \in (0,2)$. Then $n^{-1/p} \sum_{k \le n} \xi_k$ converges a.s. iff $E|\xi|^p < \infty$ and either $p \le 1$ or $E\xi = 0$. In that case the limit equals $E\xi$ for $p = 1$ and is otherwise $0$.

Proof: Assume that $E|\xi|^p < \infty$ and also, for $p > 1$, that $E\xi = 0$.
Define $\xi'_n = \xi_n 1\{|\xi_n| \le n^{1/p}\}$, and note that by Lemma 3.4
$$\sum_n P\{\xi'_n \ne \xi_n\} = \sum_n P\{|\xi|^p > n\} \le \int_0^\infty P\{|\xi|^p > t\}\, dt = E|\xi|^p < \infty.$$
By the Borel–Cantelli lemma we get $P\{\xi'_n \ne \xi_n \text{ i.o.}\} = 0$, and so $\xi'_n = \xi_n$ for all but finitely many $n \in \mathbb{N}$ a.s. It is then equivalent to show that
$n^{-1/p} \sum_{k \le n} \xi'_k \to 0$ a.s. By Lemma 4.21 it suffices to prove instead that $\sum_n n^{-1/p} \xi'_n$ converges a.s.

For $p < 1$, this is clear if we write
$$\sum_n n^{-1/p} E[|\xi|;\ |\xi| \le n^{1/p}] \lesssim \int_0^\infty t^{-1/p} E[|\xi|;\ |\xi| \le t^{1/p}]\, dt = E\Bigl[|\xi| \int_{|\xi|^p}^\infty t^{-1/p}\, dt\Bigr] \asymp E|\xi|^p < \infty.$$

If instead $p > 1$, it suffices by Theorem 4.18 to prove that $\sum_n n^{-1/p} E\xi'_n$ converges and $\sum_n n^{-2/p} \mathrm{var}(\xi'_n) < \infty$. Since $E\xi'_n = -E[\xi;\ |\xi| > n^{1/p}]$, we have for the former series
$$\sum_n n^{-1/p} |E\xi'_n| \le \sum_n n^{-1/p} E[|\xi|;\ |\xi| > n^{1/p}] \lesssim \int_0^\infty t^{-1/p} E[|\xi|;\ |\xi| > t^{1/p}]\, dt = E\Bigl[|\xi| \int_0^{|\xi|^p} t^{-1/p}\, dt\Bigr] \asymp E|\xi|^p < \infty.$$
As for the latter series, we get
$$\sum_n n^{-2/p} E(\xi'_n)^2 = \sum_n n^{-2/p} E[\xi^2;\ |\xi| \le n^{1/p}] \lesssim \int_0^\infty t^{-2/p} E[\xi^2;\ |\xi| \le t^{1/p}]\, dt = E\Bigl[\xi^2 \int_{|\xi|^p}^\infty t^{-2/p}\, dt\Bigr] \asymp E|\xi|^p < \infty.$$

If $p = 1$, then $E\xi'_n = E[\xi;\ |\xi| \le n] \to E\xi$ by dominated convergence. Thus, $n^{-1} \sum_{k \le n} E\xi'_k \to E\xi$, and we may prove instead that $n^{-1} \sum_{k \le n} \xi''_k \to 0$ a.s., where $\xi''_n = \xi'_n - E\xi'_n$. By Lemma 4.21 and Theorem 4.18 it is then enough to show that $\sum_n n^{-2} \mathrm{var}(\xi''_n) < \infty$, which may be seen as before.

Conversely, assume that $n^{-1/p} S_n = n^{-1/p} \sum_{k \le n} \xi_k$ converges a.s. Then
$$\frac{\xi_n}{n^{1/p}} = \frac{S_n}{n^{1/p}} - \Bigl(\frac{n-1}{n}\Bigr)^{1/p} \frac{S_{n-1}}{(n-1)^{1/p}} \to 0 \quad \text{a.s.},$$
and in particular $P\{|\xi_n|^p > n \text{ i.o.}\} = 0$. Hence, by Lemma 3.4 and the Borel–Cantelli lemma,
$$E|\xi|^p = \int_0^\infty P\{|\xi|^p > t\}\, dt \le 1 + \sum_{n \ge 1} P\{|\xi|^p > n\} < \infty.$$
For $p > 1$, the direct assertion yields $n^{-1/p}(S_n - nE\xi) \to 0$ a.s., and so $n^{1-1/p} E\xi$ converges, which implies $E\xi = 0$. $\Box$
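For $p = 1$, Theorem 4.23 is the classical strong law: $n^{-1} \sum_{k \le n} \xi_k \to E\xi$ a.s. whenever $E|\xi| < \infty$. A quick numerical sanity check (our own sketch, with uniform summands and names of our choosing) compares the worst deviation of the running average from the mean, over several simulated paths, at a short and a long horizon.

```python
import random

def running_average_error(n, seed):
    """|n^{-1} sum_{k<=n} xi_k - 1/2| for i.i.d. xi_k uniform on (0,1)."""
    rng = random.Random(seed)
    return abs(sum(rng.random() for _ in range(n)) / n - 0.5)

# Worst error over 20 independent paths, at a short and a long horizon.
worst_short = max(running_average_error(100, seed) for seed in range(20))
worst_long = max(running_average_error(100_000, seed) for seed in range(20))
```

The errors shrink at the rate $n^{-1/2}$ predicted by the remark after Corollary 4.22, since the uniform distribution has a finite variance.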
For a simple application of the law of large numbers, consider an arbitrary sequence of random variables $\xi_1, \xi_2, \ldots$, and define the associated empirical distributions as the random probability measures $\hat\mu_n = n^{-1} \sum_{k \le n} \delta_{\xi_k}$. The corresponding empirical distribution functions $\hat F_n$ are given by
$$\hat F_n(x) = \hat\mu_n(-\infty, x] = n^{-1} \sum_{k \le n} 1\{\xi_k \le x\}, \qquad x \in \mathbb{R},\ n \in \mathbb{N}.$$

Proposition 4.24 (empirical distribution functions, Glivenko, Cantelli) Let $\xi_1, \xi_2, \ldots$ be i.i.d. random variables with distribution function $F$ and empirical distribution functions $\hat F_1, \hat F_2, \ldots$. Then
$$\lim_{n\to\infty} \sup_x |\hat F_n(x) - F(x)| = 0 \quad \text{a.s.} \qquad (7)$$

Proof: By the law of large numbers we have $\hat F_n(x) \to F(x)$ a.s. for every $x \in \mathbb{R}$. Now fix a finite partition $-\infty = x_1 < x_2 < \cdots < x_m = \infty$. By the monotonicity of $F$ and $\hat F_n$,
$$\sup_x |\hat F_n(x) - F(x)| \le \max_k |\hat F_n(x_k) - F(x_k)| + \max_k |F(x_{k+1}-) - F(x_k)|.$$
Letting $n \to \infty$ and refining the partition indefinitely, we get in the limit
$$\limsup_{n\to\infty} \sup_x |\hat F_n(x) - F(x)| \le \sup_x \Delta F(x) \quad \text{a.s.},$$
where $\Delta F(x)$ denotes the jump of $F$ at $x$, which proves (7) when $F$ is continuous.

For general $F$, let $\vartheta_1, \vartheta_2, \ldots$ be i.i.d. $U(0,1)$, and define $\eta_n = g(\vartheta_n)$ for each $n$, where $g(t) = \sup\{x;\ F(x) < t\}$. Then $\eta_n \le x$ iff $\vartheta_n \le F(x)$, and so $(\eta_n) \overset{d}{=} (\xi_n)$. We may then assume that $\xi_n \equiv \eta_n$. Writing $\hat G_1, \hat G_2, \ldots$ for the empirical distribution functions of $\vartheta_1, \vartheta_2, \ldots$, we see that also $\hat F_n = \hat G_n \circ F$. Writing $A = F(\overline{\mathbb{R}})$ and using the result for continuous $F$, we get a.s.
$$\sup_x |\hat F_n(x) - F(x)| = \sup_{t \in A} |\hat G_n(t) - t| \le \sup_{t \in [0,1]} |\hat G_n(t) - t| \to 0. \qquad \Box$$

We turn to a systematic study of convergence in distribution. Although we are currently mostly interested in distributions on Euclidean spaces, it is crucial for future applications that we consider the more general setting of an abstract metric space. In particular, the theory is applied in Chapter 16 to random elements in various function spaces.

Theorem 4.25 (Portmanteau theorem, Alexandrov) For any random elements $\xi, \xi_1, \xi_2, \ldots$
in a metric space $S$, these conditions are equivalent:
(i) $\xi_n \overset{d}{\to} \xi$;
(ii) $\liminf_n P\{\xi_n \in G\} \ge P\{\xi \in G\}$ for any open set $G \subset S$;
(iii) $\limsup_n P\{\xi_n \in F\} \le P\{\xi \in F\}$ for any closed set $F \subset S$;
(iv) $P\{\xi_n \in B\} \to P\{\xi \in B\}$ for any $B \in \mathcal{B}(S)$ with $\xi \notin \partial B$ a.s.

A set $B \in \mathcal{B}(S)$ with $\xi \notin \partial B$ a.s. is often called a $\xi$-continuity set.

Proof: Assume (i), and fix any open set $G \subset S$. Letting $f$ be continuous with $0 \le f \le 1_G$, we get $Ef(\xi_n) \le P\{\xi_n \in G\}$, and (ii) follows as we let
$n \to \infty$ and then $f \uparrow 1_G$. The equivalence between (ii) and (iii) is clear from taking complements. Now assume (ii) and (iii). For any $B \in \mathcal{B}(S)$,
$$P\{\xi \in B^\circ\} \le \liminf_{n\to\infty} P\{\xi_n \in B\} \le \limsup_{n\to\infty} P\{\xi_n \in B\} \le P\{\xi \in \bar B\}.$$
Here the extreme members agree when $\xi \notin \partial B$ a.s., and (iv) follows.

Conversely, assume (iv) and fix any closed set $F \subset S$. Write $F^\varepsilon = \{s \in S;\ \rho(s, F) \le \varepsilon\}$. Then the sets $\partial F^\varepsilon \subset \{s;\ \rho(s, F) = \varepsilon\}$ are disjoint, and so $\xi \notin \partial F^\varepsilon$ for almost every $\varepsilon > 0$. For such an $\varepsilon$ we may write $\limsup_n P\{\xi_n \in F\} \le P\{\xi \in F^\varepsilon\}$, and (iii) follows as we let $\varepsilon \to 0$.

Finally, assume (ii) and let $f \ge 0$ be continuous. By Lemma 3.4 and Fatou's lemma,
$$Ef(\xi) = \int_0^\infty P\{f(\xi) > t\}\, dt \le \int_0^\infty \liminf_{n\to\infty} P\{f(\xi_n) > t\}\, dt \le \liminf_{n\to\infty} \int_0^\infty P\{f(\xi_n) > t\}\, dt = \liminf_{n\to\infty} Ef(\xi_n). \qquad (8)$$
Now let $f$ be continuous with $|f| \le c < \infty$. Applying (8) to $c \pm f$ yields $Ef(\xi_n) \to Ef(\xi)$, which proves (i). $\Box$

For an easy application, we insert a simple lemma that is needed in Chapter 16.

Lemma 4.26 (subspaces) For any metric space $(S, \rho)$ with subspace $A \subset S$, let $\xi, \xi_1, \xi_2, \ldots$ be random elements in $(A, \rho)$. Then $\xi_n \overset{d}{\to} \xi$ in $(A, \rho)$ iff the same convergence holds in $(S, \rho)$.

Proof: Since $\xi, \xi_1, \xi_2, \ldots \in A$, condition (ii) of Theorem 4.25 is equivalent to
$$\liminf_{n\to\infty} P\{\xi_n \in A \cap G\} \ge P\{\xi \in A \cap G\}, \qquad G \subset S \text{ open}.$$
By Lemma 1.6, this is precisely condition (ii) of Theorem 4.25 for the subspace $A$. $\Box$

It is clear directly from the definitions that convergence in distribution is preserved by continuous mappings. The following more general statement is a key result of weak convergence theory.

Theorem 4.27 (continuous mapping, Mann and Wald, Prohorov, Rubin) For any metric spaces $S$ and $T$, let $\xi, \xi_1, \xi_2, \ldots$ be random elements in $S$ with $\xi_n \overset{d}{\to} \xi$, and consider some measurable mappings $f, f_1, f_2, \ldots\colon S \to T$ and a measurable set $C \subset S$ with $\xi \in C$ a.s. such that $f_n(s_n) \to f(s)$ as
This frequently used statement is commonly referred to as the continuous mapping theorem. 
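As an informal numerical check (our own illustration, not part of the text), the conclusion of Theorem 4.27 can be simulated: here ξ_n is a standardized mean of n uniform variables, so ξ_n →d N(0,1), and the continuous map f(x) = x² sends the limit to the chi-squared law with one degree of freedom. The sample sizes below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 400, 20000

# xi_n: standardized mean of n uniforms, so xi_n -> N(0,1) in distribution.
u = rng.uniform(size=(samples, n))
xi_n = (u.mean(axis=1) - 0.5) * np.sqrt(12 * n)

# f(x) = x**2 is continuous, so by the theorem f(xi_n) -> f(xi), the
# chi-squared law with one degree of freedom.
x = xi_n ** 2
y = rng.standard_normal(samples) ** 2   # direct sample from the limit law

for t in (0.5, 1.0, 2.0):
    assert abs((x <= t).mean() - (y <= t).mean()) < 0.02
print("empirical laws of f(xi_n) and f(xi) nearly coincide")
```

The assertions compare the two empirical distribution functions at a few points; with these sample sizes they agree to within sampling error.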
4. Random Sequences, Series, and Averages   77

Proof: Fix any open set G ⊂ T, and let s ∈ f⁻¹G ∩ C. By hypothesis there exist an integer m ∈ ℕ and some neighborhood N of s such that f_k(s′) ∈ G for all k ≥ m and s′ ∈ N. Thus, N ⊂ ⋂_{k≥m} f_k⁻¹G, and so

    f⁻¹G ∩ C ⊂ ⋃_m (⋂_{k≥m} f_k⁻¹G)°.

Now let μ, μ₁, μ₂, … denote the distributions of ξ, ξ₁, ξ₂, …. By Theorem 4.25 we get

    μ(f⁻¹G) ≤ μ ⋃_m (⋂_{k≥m} f_k⁻¹G)° = sup_m μ(⋂_{k≥m} f_k⁻¹G)°
            ≤ sup_m liminf_n μ_n ⋂_{k≥m} f_k⁻¹G ≤ liminf_n μ_n(f_n⁻¹G).

Using the same theorem again gives μ_n ∘ f_n⁻¹ →w μ ∘ f⁻¹, which means that f_n(ξ_n) →d f(ξ).  □

We will now prove an equally useful approximation theorem. Here the idea is to prove ξ_n →d ξ by choosing approximations η_nk of ξ_n and η_k of ξ such that η_nk →d η_k. The desired convergence will follow if we can ensure that the approximation errors are uniformly small.

Theorem 4.28 (approximation) Let ξ, ξ_n, η_k, and η_nk be random elements in a metric space (S, ρ) such that η_nk →d η_k as n → ∞ for fixed k and also η_k →d ξ. Then ξ_n →d ξ holds under the further condition

    lim_k limsup_{n→∞} E[ρ(η_nk, ξ_n) ∧ 1] = 0.   (9)

Proof: For any closed set F ⊂ S and constant ε > 0 we have

    P{ξ_n ∈ F} ≤ P{η_nk ∈ F^ε} + P{ρ(η_nk, ξ_n) > ε},

where F^ε = {s ∈ S; ρ(s, F) ≤ ε}. By Theorem 4.25 we get as n → ∞

    limsup_n P{ξ_n ∈ F} ≤ P{η_k ∈ F^ε} + limsup_n P{ρ(η_nk, ξ_n) > ε}.

Now let k → ∞, and conclude from Theorem 4.25 together with (9) that

    limsup_n P{ξ_n ∈ F} ≤ P{ξ ∈ F^ε}.

As ε → 0, the right-hand side tends to P{ξ ∈ F}. Since F was arbitrary, we get ξ_n →d ξ by Theorem 4.25.  □

Next we consider convergence in distribution on product spaces.
78   Foundations of Modern Probability

Theorem 4.29 (random sequences) For any separable metric spaces S₁, S₂, …, let ξ = (ξ¹, ξ², …) and ξ_n = (ξ_n¹, ξ_n², …), n ∈ ℕ, be random elements in X_k S_k. Then ξ_n →d ξ iff for any functions f_k ∈ C_b(S_k),

    E[f₁(ξ_n¹) ⋯ f_m(ξ_n^m)] → E[f₁(ξ¹) ⋯ f_m(ξ^m)],   m ∈ ℕ.   (10)

In particular, we note that ξ_n →d ξ follows from the finite-dimensional convergence

    (ξ_n¹, …, ξ_n^m) →d (ξ¹, …, ξ^m),   m ∈ ℕ.   (11)

If ξ and the ξ_n have independent components, it is even sufficient that ξ_n^k →d ξ^k for every k.

Proof: The necessity of the condition is clear from the continuity of the projections s ↦ s_k. To prove the sufficiency, we first assume that (10) holds for a fixed m. Writing S_k^ξ = {B ∈ B(S_k); ξ^k ∉ ∂B a.s.} and applying Theorem 4.25 m times, we obtain

    P{(ξ_n¹, …, ξ_n^m) ∈ B} → P{(ξ¹, …, ξ^m) ∈ B},   (12)

for any set B = B₁ × ⋯ × B_m such that B_k ∈ S_k^ξ for all k. Since the S_k are separable, we may choose some countable bases C_k ⊂ S_k^ξ, and we note that C₁ × ⋯ × C_m is then a countable base in S₁ × ⋯ × S_m. Hence, any open set G ⊂ S₁ × ⋯ × S_m can be written as a countable union of measurable rectangles B_j = B_j¹ × ⋯ × B_j^m with B_j^k ∈ S_k^ξ for all k. Since the S_k^ξ are fields, we may easily reduce to the case when the sets B_j are disjoint. By Fatou's lemma and (12) we obtain

    liminf_n P{(ξ_n¹, …, ξ_n^m) ∈ G} ≥ liminf_n Σ_j P{(ξ_n¹, …, ξ_n^m) ∈ B_j}
        ≥ Σ_j P{(ξ¹, …, ξ^m) ∈ B_j} = P{(ξ¹, …, ξ^m) ∈ G},

and so (11) holds by Theorem 4.25.

To see that (11) implies ξ_n →d ξ, fix any a_k ∈ S_k, k ∈ ℕ, and note that the mapping (s₁, …, s_m) ↦ (s₁, …, s_m, a_{m+1}, a_{m+2}, …) is continuous on S₁ × ⋯ × S_m for each m ∈ ℕ. By (11) it follows that

    (ξ_n¹, …, ξ_n^m, a_{m+1}, …) →d (ξ¹, …, ξ^m, a_{m+1}, …),   m ∈ ℕ.   (13)

Writing η^m and η_n^m for the sequences in (13) and letting ρ be the metric in (2), we also note that ρ(ξ, η^m) ≤ 2^{−m} and ρ(ξ_n, η_n^m) ≤ 2^{−m} for all m and n. The convergence ξ_n →d ξ now follows by Theorem 4.28.
□

In discussions involving distributional convergence of a random sequence ξ₁, ξ₂, …, the relationship between the elements ξ_n is often irrelevant. It is then natural to look for a more convenient representation, which may lead to simpler and more transparent proofs.
4. Random Sequences, Series, and Averages   79

Theorem 4.30 (coupling, Skorohod, Dudley) Let ξ, ξ₁, ξ₂, … be random elements in a separable metric space (S, ρ) such that ξ_n →d ξ. Then there exist a probability space with some random elements η =d ξ and η_n =d ξ_n, n ∈ ℕ, such that η_n → η a.s.

In the course of the proof, we need to introduce families of independent random elements with given distributions. The existence of such families is ensured, in general, by Corollary 6.18. When S is complete, we may instead rely on the more elementary Theorem 3.19.

Proof: First assume that S = {1, …, m}, and put p_k = P{ξ = k} and p_k^n = P{ξ_n = k}. Assuming ϑ to be U(0,1) and independent of ξ, we may easily construct some random elements ξ̃_n =d ξ_n such that ξ̃_n = k whenever ξ = k and ϑ ≤ p_k^n/p_k. Since p_k^n → p_k for each k, we get ξ̃_n → ξ a.s.

For general S, fix any p ∈ ℕ, and choose a partition of S into ξ-continuity sets B₁, B₂, … ∈ B(S) of diameter < 2^{−p}. Next choose m so large that P{ξ ∉ ⋃_{k≤m} B_k} < 2^{−p}, and put B₀ = (⋃_{k≤m} B_k)^c. For k = 0, …, m, define κ = k when ξ ∈ B_k and κ_n = k when ξ_n ∈ B_k, n ∈ ℕ. Then κ_n →d κ, and by the result for finite S we may choose some κ̃_n =d κ_n with κ̃_n → κ a.s. Let us further introduce some independent random elements ζ_n^k in S with distributions P[ξ_n ∈ · | ξ_n ∈ B_k], and define ξ̃_n = Σ_k ζ_n^k 1{κ̃_n = k}, so that ξ̃_n =d ξ_n for each n. From the construction it is clear that

    {ρ(ξ, ξ̃_n) > 2^{−p}} ⊂ {κ̃_n ≠ κ} ∪ {ξ ∈ B₀},   n, p ∈ ℕ.

Since κ̃_n → κ a.s. and P{ξ ∈ B₀} < 2^{−p}, there exists for every p some n_p ∈ ℕ with

    P ⋃_{n≥n_p} {ρ(ξ, ξ̃_n) > 2^{−p}} < 2^{−p},   p ∈ ℕ,

and we may further assume that n₁ < n₂ < ⋯. By the Borel–Cantelli lemma we get a.s. sup_{n≥n_p} ρ(ξ, ξ̃_n) ≤ 2^{−p} for all but finitely many p. Now define η_n = ξ̃_n for n_p ≤ n < n_{p+1}, and note that ξ_n =d η_n → ξ a.s.  □

We conclude this chapter with a result on functional representations of limits, needed in Chapters 17 and 21.
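Returning for a moment to Theorem 4.30: on the real line the coupling admits a well-known explicit construction by quantile transforms (compare Exercise 28 below), where a single uniform variable ϑ yields η_n = F_n⁻¹(ϑ) and η = F⁻¹(ϑ). The sketch below is our own illustration with an arbitrary choice of laws, exponential with rates 1 + 1/n converging to rate 1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Quantile coupling on S = R: a single uniform variable theta realizes
# all the laws on one probability space via eta_n = F_n^{-1}(theta),
# and eta_n -> eta a.s. when F_n^{-1} -> F^{-1} a.e. on (0,1).
# Illustration (our choice): mu_n = Exp(1 + 1/n), converging to Exp(1).
theta = rng.uniform(size=50000)

def quantile_exp(rate, u):
    # inverse of the distribution function F(x) = 1 - exp(-rate * x)
    return -np.log1p(-u) / rate

eta = quantile_exp(1.0, theta)              # has the limit law Exp(1)
for n in (10, 100, 1000):
    eta_n = quantile_exp(1 + 1 / n, theta)  # has the law Exp(1 + 1/n)
    # the coupled versions converge pathwise, not merely in distribution
    assert np.max(np.abs(eta_n - eta)) < 25 / n
print("coupled versions converge a.s. while keeping the right laws")
```

Each η_n retains exactly its prescribed marginal law, yet on this common probability space the convergence is pointwise, which is the content of the coupling theorem.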
To motivate the problem, recall from Lemma 4.6 that if ξ_n →P η for some random elements in a complete metric space S, then η = f(ξ) a.s. for some measurable function f : S^∞ → S, where ξ = (ξ_n). Here f depends on the distribution μ of ξ, so a universal representation must be of the form η = f(ξ, μ). For certain purposes, it is crucial to choose a measurable version even of the latter function. To allow constructions by repeated approximation in probability, we need to consider the more general case when η_n →P η for some random elements η_n = f_n(ξ, μ).
80   Foundations of Modern Probability

For a precise statement of the result, let P(S) denote the space of probability measures μ on S, endowed with the σ-field induced by all evaluation maps μ ↦ μB, B ∈ B(S).

Proposition 4.31 (representation of limits) Fix a complete metric space (S, ρ), a measurable space U, and some measurable functions f₁, f₂, … : U × P(U) → S. Then there exist a measurable set A ⊂ P(U) and a measurable function f : U × A → S such that, whenever ξ is a random element in U with distribution μ, the sequence η_n = f_n(ξ, μ) converges in probability iff μ ∈ A, in which case the limit equals f(ξ, μ).

Proof: For sequences s = (s₁, s₂, …) in S, define l(s) = lim_k s_k when the limit exists, and put l(s) = s_∞ otherwise, where s_∞ ∈ S is arbitrary. By Lemma 1.10 we note that l is a measurable mapping from S^∞ to S. Next consider a sequence η = (η₁, η₂, …) of random elements in S, and put ν = L(η). Define n₁, n₂, … as in the proof of Lemma 4.6, and note that each n_k = n_k(ν) is a measurable function of ν. Let C be the set of measures ν such that n_k(ν) < ∞ for all k, and note that η_n converges in probability iff ν ∈ C. Introduce the measurable function

    g(s, ν) = l(s_{n₁(ν)}, s_{n₂(ν)}, …),   s = (s₁, s₂, …) ∈ S^∞,  ν ∈ P(S^∞).

If ν ∈ C, we see from the proof of Lemma 4.6 that η_{n_k(ν)} converges a.s., and so η_n →P g(η, ν).

Now assume that η_n = f_n(ξ, μ) for some random element ξ in U with distribution μ and some measurable functions f_n. It remains to show that ν is a measurable function of μ. But this is clear from Lemma 1.41 (ii) applied to the kernel K(μ, ·) = μ from P(U) to U and the function F = (f₁, f₂, …) : U × P(U) → S^∞.  □

As a simple consequence, we may consider limits in probability of measurable processes. The resulting statement will be useful in Chapter 17.
Corollary 4.32 (measurability of limits, Stricker and Yor) For any measurable space T and complete metric space S, let X¹, X², … be S-valued measurable processes on T. Then there exist a measurable set A ⊂ T and a measurable process X on A such that X_t^n converges in probability iff t ∈ A, in which case X_t^n →P X_t.

Proof: Define ξ_t = (X_t¹, X_t², …) and μ_t = L(ξ_t). By Proposition 4.31 there exist a measurable set C ⊂ P(S^∞) and a measurable function f : S^∞ × C → S such that X_t^n converges in probability iff μ_t ∈ C, in which case X_t^n →P f(ξ_t, μ_t). It remains to note that the mapping t ↦ μ_t is measurable, which is clear from Lemmas 1.4 and 1.26.  □
4. Random Sequences, Series, and Averages   81

Exercises

1. Let ξ₁, …, ξ_n be independent symmetric random variables. Show that P{(Σ_k ξ_k)² > r Σ_k ξ_k²} ≥ (1 − r)²/3 for any r ∈ (0, 1). (Hint: Reduce by means of Lemma 3.11 to the case of nonrandom |ξ_k|, and use Lemma 4.1.)

2. Let ξ₁, …, ξ_n be independent symmetric random variables. Show that P{max_k |ξ_k| > r} ≤ 2P{|S| > r} for all r > 0, where S = Σ_k ξ_k. (Hint: Let η be the first term ξ_k where max_k |ξ_k| is attained, and check that (η, S − η) =d (η, η − S).)

3. Let ξ₁, ξ₂, … be i.i.d. random variables with P{|ξ_n| > t} > 0 for all t > 0. Show that there exist some constants c₁, c₂, … such that c_n ξ_n → 0 in probability but not a.s.

4. Show that a family of random variables ξ_t is tight iff sup_t Ef(|ξ_t|) < ∞ for some increasing function f : ℝ₊ → ℝ₊ with f(x) → ∞ as x → ∞.

5. Consider some random variables ξ_n and η_n such that (ξ_n) is tight and η_n →P 0. Show that even ξ_n η_n →P 0.

6. Show that the random variables ξ_t are uniformly integrable iff sup_t Ef(|ξ_t|) < ∞ for some increasing function f : ℝ₊ → ℝ₊ with f(x)/x → ∞ as x → ∞.

7. Show that the condition sup_t E|ξ_t| < ∞ in Lemma 4.10 can be omitted if P is nonatomic.

8. Let ξ₁, ξ₂, … ∈ L¹. Show that the ξ_n are uniformly integrable iff the condition in Lemma 4.10 holds with sup_n replaced by limsup_n.

9. Deduce the dominated convergence theorem from Lemma 4.11.

10. Show that if {|ξ_t|^p} and {|η_t|^p} are uniformly integrable for some p > 0, then so is {|aξ_t + bη_t|^p} for any a, b ∈ ℝ. (Hint: Use Lemma 4.10.) Use this fact to deduce Proposition 4.12 from Lemma 4.11.

11. Give examples of random variables ξ, ξ₁, ξ₂, … ∈ L² such that ξ_n → ξ holds a.s. but not in L², in L² but not a.s., or in L¹ but not in L².

12. Let ξ₁, ξ₂, … be independent random variables in L². Show that Σ_n ξ_n converges in L² iff Σ_n Eξ_n and Σ_n var(ξ_n) both converge.

13. Give an example of independent symmetric random variables ξ₁, ξ₂, … such that Σ_n ξ_n is a.s. conditionally (nonabsolutely) convergent.

14. Let ξ_n and η_n be symmetric random variables with |ξ_n| ≤ |η_n| such that the pairs (ξ_n, η_n) are independent. Show that Σ_n ξ_n converges whenever Σ_n η_n does.

15. Let ξ₁, ξ₂, … be independent symmetric random variables. Show that E[(Σ_n ξ_n)² ∧ 1] ≤ Σ_n E[ξ_n² ∧ 1] whenever the latter series converges. (Hint: Integrate over the sets where sup_n |ξ_n| ≤ 1 or > 1, respectively.)

16. Consider some independent sequences of symmetric random variables ξ_k, η_k¹, η_k², … with |η_k^n| ≤ |ξ_k| such that Σ_k ξ_k converges, and assume η_k^n →d η_k for each k. Show that Σ_k η_k^n →d Σ_k η_k. (Hint: Use a truncation based on the preceding exercise.)

17. Let Σ_n ξ_n be a convergent series of independent random variables. Show that the sum is a.s. independent of the order of terms iff Σ_n |E[ξ_n; |ξ_n| ≤ 1]| < ∞.

18. Let the random variables ξ_nj be symmetric and independent for each n. Show that Σ_j ξ_nj →P 0 iff Σ_j E[ξ_nj² ∧ 1] → 0.

19. Let ξ_n →d ξ and a_n ξ_n →d ξ for some nondegenerate random variable ξ and some constants a_n > 0. Show that a_n → 1. (Hint: Turning to subsequences, we may assume that a_n → a.)

20. Let ξ_n →d ξ and a_n ξ_n + b_n →d ξ for some nondegenerate random variable ξ, where a_n > 0. Show that a_n → 1 and b_n → 0. (Hint: Symmetrize.)

21. Let ξ₁, ξ₂, … be independent random variables such that a_n Σ_{k≤n} ξ_k converges in probability for some constants a_n → 0. Show that the limit is degenerate.

22. Show that Theorem 4.23 is false for p = 2 by taking the ξ_k to be independent and N(0, 1).

23. Let ξ₁, ξ₂, … be i.i.d. and such that n^{−1/p} Σ_{k≤n} ξ_k is a.s. bounded for some p ∈ (0, 2). Show that E|ξ₁|^p < ∞. (Hint: Argue as in the proof of Theorem 4.23.)

24. Show for p < 1 that the a.s. convergence in Theorem 4.23 remains valid in L^p. (Hint: Truncate the ξ_k.)

25. Give an elementary proof of the strong law of large numbers when E|ξ|⁴ < ∞. (Hint: Assuming Eξ = 0, show that E Σ_n (S_n/n)⁴ < ∞.)

26. Show by examples that Theorem 4.25 is false without the stated restrictions on the sets G, F, and B.

27. Use Theorem 4.30 to give a simple proof of Theorem 4.27 when S is separable. Generalize to random elements ξ and ξ_n in Borel sets C and C_n, respectively, assuming only f_n(x_n) → f(x) for x_n ∈ C_n and x ∈ C with x_n → x. Extend the original proof to that case.

28. Give a short proof of Theorem 4.30 when S = ℝ. (Hint: Note that the right-continuous inverses of the distribution functions F_n and F satisfy F_n⁻¹ → F⁻¹ a.e. on [0, 1].)
Chapter 5

Characteristic Functions and Classical Limit Theorems

Uniqueness and continuity theorem; Poisson convergence; positive and symmetric terms; Lindeberg's condition; general Gaussian convergence; weak laws of large numbers; domain of Gaussian attraction; vague and weak compactness

In this chapter we continue the treatment of weak convergence from Chapter 4 with a detailed discussion of probability measures on Euclidean spaces. Our first aim is to develop the theory of characteristic functions and Laplace transforms. In particular, the basic uniqueness and continuity theorem will be established by simple equicontinuity and approximation arguments. The traditional compactness approach, a highly nontrivial route in higher dimensions, is required only for the case when the limiting function is not known in advance to be a characteristic function. The compactness theory also serves as a crucial bridge to the general theory of weak convergence presented in Chapter 16.

Our second aim is to establish the basic distributional limit theorems in the case of Poisson or Gaussian limits. We shall then consider triangular arrays of random variables ξ_nj, assumed to be independent for each n and such that ξ_nj →P 0 as n → ∞, uniformly in j. In this setting, general criteria will be obtained for the convergence of Σ_j ξ_nj toward a Poisson or Gaussian distribution. Specializing to the case of suitably centered and normalized partial sums from a single i.i.d. sequence ξ₁, ξ₂, …, we may deduce the ultimate versions of the weak law of large numbers and the central limit theorem, including a complete description of the domain of attraction of the Gaussian law.

The mentioned limit theorems lead in Chapters 12 and 13 to some basic characterizations of Poisson and Gaussian processes, which in turn are needed to describe the general independent increment processes in Chapter 15. Even the limit theorems themselves are generalized in various ways in subsequent chapters.
Thus, the Gaussian convergence is extended in Chapter 14 to suitable martingales, and the result is strengthened to uniform approximation of the summation process by the path of a Brownian motion. Similarly, the Poisson convergence is extended in Chapter 16 to a general limit theorem for point processes. A complete solution to the general limit
84   Foundations of Modern Probability

problem for triangular arrays is given in Chapter 15, in connection with our treatment of Lévy processes.

In view of the crucial role of the independence assumption for the methods in this chapter, it may come as a surprise that the scope of the method of characteristic functions and Laplace transforms extends far beyond the present context. Thus, exponential martingales based on characteristic functions play a crucial role in Chapters 15 and 18, whereas Laplace functionals of random measures are used extensively in Chapters 12 and 16. Even more importantly, Laplace transforms play a key role in Chapters 19 and 22, in the guises of resolvents and potentials for Markov processes and their additive functionals, and also in connection with the large deviation theory of Chapter 27.

To begin with the basic definitions, consider a random vector ξ in ℝ^d with distribution μ. The associated characteristic function μ̂ is given by

    μ̂(t) = ∫ e^{itx} μ(dx) = E e^{itξ},   t ∈ ℝ^d,

where tx denotes the inner product t₁x₁ + ⋯ + t_d x_d. For distributions μ on ℝ₊^d, it is often more convenient to consider the Laplace transform μ̃, given by

    μ̃(u) = ∫ e^{−ux} μ(dx) = E e^{−uξ},   u ∈ ℝ₊^d.

Finally, for distributions μ on ℤ₊, it is often preferable to use the (probability) generating function ψ, given by

    ψ(s) = Σ_{n≥0} s^n P{ξ = n} = E s^ξ,   s ∈ [0, 1].

Formally, μ̃(u) = μ̂(iu) and μ̂(t) = μ̃(−it), and so the functions μ̂ and μ̃ are essentially the same, apart from domain. Furthermore, the generating function ψ is related to the Laplace transform μ̃ by μ̃(u) = ψ(e^{−u}) or ψ(s) = μ̃(−log s). Though the characteristic function always exists, it may not be extendable to an analytic function in the complex plane.

For any distribution μ on ℝ^d, we note that the characteristic function φ = μ̂ is uniformly continuous with |φ(t)| ≤ φ(0) = 1. It is also seen to be Hermitian in the sense that φ(−t) = φ̄(t), where the bar denotes complex conjugation.
If ξ has characteristic function φ, then the linear combination aξ = a₁ξ₁ + ⋯ + a_dξ_d has characteristic function t ↦ φ(ta). Also note that if ξ and η are independent random vectors with characteristic functions φ and ψ, then the characteristic function of the pair (ξ, η) is given by the tensor product φ ⊗ ψ : (s, t) ↦ φ(s)ψ(t). In particular, ξ + η has characteristic function φψ, and the characteristic function of the symmetrized variable ξ − ξ′ equals |φ|².

Whenever applicable, the quoted statements carry over to Laplace transforms and generating functions. The latter functions have the further
5. Characteristic Functions and Classical Limit Theorems   85

advantage of being positive, monotone, convex, and analytic, properties that simplify many arguments.

The following result contains some elementary but useful estimates involving characteristic functions. The second inequality was used in the proof of Theorem 4.17, and the remaining relations will be useful in the sequel to establish tightness.

Lemma 5.1 (tail estimates) For any probability measure μ on ℝ, we have

    μ{x; |x| > r} ≤ (r/2) ∫_{−2/r}^{2/r} (1 − μ̂_t) dt,   r > 0,   (1)

    μ[−r, r] ≤ 2r ∫_{−1/r}^{1/r} |μ̂_t| dt,   r > 0.   (2)

If μ is supported by ℝ₊, then also

    μ[r, ∞) ≤ 2(1 − μ̃(1/r)),   r > 0.   (3)

Proof: Using Fubini's theorem and noting that sin x ≤ x/2 for x ≥ 2, we get for any c > 0

    ∫_{−c}^{c} (1 − μ̂_t) dt = ∫ μ(dx) ∫_{−c}^{c} (1 − e^{itx}) dt
        = 2c ∫ (1 − sin cx / cx) μ(dx) ≥ c μ{x; |cx| ≥ 2},

and (1) follows as we take c = 2/r. To prove (2), we may write

    μ[−r, r] ≤ 2 ∫ (sin(x/2r) / (x/2r))² μ(dx)
        = 2r ∫ μ(dx) ∫ (1 − r|t|)⁺ e^{itx} dt
        = 2r ∫ (1 − r|t|)⁺ μ̂_t dt ≤ 2r ∫_{−1/r}^{1/r} |μ̂_t| dt.

To obtain (3), we note that e^{−x} ≤ 1/2 for x ≥ 1. Thus, for t > 0,

    1 − μ̃_t = ∫ (1 − e^{−tx}) μ(dx) ≥ ½ μ{x; tx ≥ 1}.  □

Recall that a family of probability measures μ_α on ℝ^d is said to be tight if

    lim_{r→∞} sup_α μ_α{x; |x| > r} = 0.

The following lemma describes tightness in terms of characteristic functions.
86   Foundations of Modern Probability

Lemma 5.2 (equicontinuity and tightness) A family {μ_α} of probability measures on ℝ^d is tight iff {μ̂_α} is equicontinuous at 0, and then {μ̂_α} is uniformly equicontinuous on ℝ^d. A similar statement holds for the Laplace transforms of distributions on ℝ₊^d.

Proof: The sufficiency is immediate from Lemma 5.1, applied separately in each coordinate. To prove the necessity, let ξ_α denote a random vector with distribution μ_α, and write for any s, t ∈ ℝ^d

    |μ̂_α(s) − μ̂_α(t)| ≤ E|e^{isξ_α} − e^{itξ_α}| = E|1 − e^{i(t−s)ξ_α}| ≤ 2E[|(t − s)ξ_α| ∧ 1].

If {ξ_α} is tight, then by Lemma 4.9 the right-hand side tends to 0 as t − s → 0, uniformly in α, and the asserted uniform equicontinuity follows. The proof for Laplace transforms is similar.  □

For any probability measures μ, μ₁, μ₂, … on ℝ^d, we recall that the weak convergence μ_n →w μ holds by definition iff μ_n f → μf for any bounded, continuous function f on ℝ^d, where μf denotes the integral ∫ f dμ. The usefulness of characteristic functions is mainly due to the following basic result.

Theorem 5.3 (uniqueness and continuity, Lévy) For any probability measures μ, μ₁, μ₂, … on ℝ^d, we have μ_n →w μ iff μ̂_n(t) → μ̂(t) for every t ∈ ℝ^d, and then μ̂_n → μ̂ uniformly on every bounded set. A corresponding statement holds for the Laplace transforms of distributions on ℝ₊^d.

In particular, we may take μ_n ≡ ν and conclude that a probability measure μ on ℝ^d is uniquely determined by its characteristic function μ̂. Similarly, a probability measure μ on ℝ₊^d is seen to be determined by its Laplace transform μ̃.

For the proof of Theorem 5.3, we need the following simple cases or consequences of the Stone–Weierstrass approximation theorem. Here [0, ∞] denotes the compactification of ℝ₊.

Lemma 5.4 (approximation) Every continuous function f : ℝ^d → ℝ with period 2π in each coordinate admits a uniform approximation by linear combinations of cos kx and sin kx, k ∈ ℤ₊^d.
Similarly, every continuous function g : [0, ∞]^d → ℝ₊ can be approximated uniformly by linear combinations of the functions e^{−kx}, k ∈ ℤ₊^d.

Proof of Theorem 5.3: We consider only the case of characteristic functions, the proof for Laplace transforms being similar. If μ_n →w μ, then μ̂_n(t) → μ̂(t) for every t, by the definition of weak convergence. By Lemmas 4.8 and 5.2, the latter convergence is uniform on every bounded set.
5. Characteristic Functions and Classical Limit Theorems   87

Conversely, assume that μ̂_n(t) → μ̂(t) for every t. By Lemma 5.1 and dominated convergence we get, for any a ∈ ℝ^d and r > 0,

    limsup_{n→∞} μ_n{x; |ax| > r} ≤ lim_{n→∞} (r/2) ∫_{−2/r}^{2/r} (1 − μ̂_n(ta)) dt
        = (r/2) ∫_{−2/r}^{2/r} (1 − μ̂(ta)) dt.

Since μ̂ is continuous at 0, the right-hand side tends to 0 as r → ∞, which shows that the sequence (μ_n) is tight. Given any ε > 0, we may then choose r > 0 so large that μ_n{|x| > r} < ε for all n and μ{|x| > r} < ε.

Now fix any bounded, continuous function f : ℝ^d → ℝ, say with |f| ≤ m < ∞. Let f_r denote the restriction of f to the ball {|x| ≤ r}, and extend f_r to a continuous function f̃ on ℝ^d with |f̃| ≤ m and period 2πr in each coordinate. By Lemma 5.4 there exists some linear combination g of the functions cos(kx/r) and sin(kx/r), k ∈ ℤ₊^d, such that |f̃ − g| ≤ ε. Writing ‖·‖ for the supremum norm, we get for any n ∈ ℕ

    |μ_n f − μ_n g| ≤ μ_n{|x| > r} ‖f − f̃‖ + ‖f̃ − g‖ ≤ (2m + 1)ε,

and similarly for μ. Thus,

    |μ_n f − μf| ≤ |μ_n g − μg| + 2(2m + 1)ε,   n ∈ ℕ.

Letting n → ∞ and then ε → 0, we obtain μ_n f → μf. Since f was arbitrary, this proves that μ_n →w μ.  □

The next result provides a way of reducing the d-dimensional case to that of one dimension.

Corollary 5.5 (one-dimensional projections, Cramér and Wold) Let ξ and ξ₁, ξ₂, … be random vectors in ℝ^d. Then ξ_n →d ξ iff tξ_n →d tξ for all t ∈ ℝ^d. For random vectors in ℝ₊^d, it suffices that uξ_n →d uξ for all u ∈ ℝ₊^d.

Proof: If tξ_n →d tξ, then Ee^{itξ_n} → Ee^{itξ} by the definition of weak convergence, and so ξ_n →d ξ by Theorem 5.3. The proof for random vectors in ℝ₊^d is similar.  □

The last result contains in particular a basic uniqueness result, the fact that ξ =d η iff tξ =d tη for all t ∈ ℝ^d or ℝ₊^d, respectively. In other words, a probability measure on ℝ^d is uniquely determined by its one-dimensional projections.
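The multiplication rule φ_{ξ+η} = φψ for independent variables, noted earlier, is easy to check numerically with empirical characteristic functions. The following sketch is our own illustration, with arbitrary choices of laws and evaluation points.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200000

xi = rng.standard_normal(N)      # xi ~ N(0,1)
eta = rng.exponential(size=N)    # eta ~ Exp(1), independent of xi

def ecf(sample, t):
    # empirical characteristic function, an estimate of E exp(it*sample)
    return np.exp(1j * t * sample).mean()

# The characteristic function of xi + eta should equal the product of
# the marginal characteristic functions.
for t in (0.5, 1.0, 2.0):
    lhs = ecf(xi + eta, t)
    rhs = ecf(xi, t) * ecf(eta, t)
    assert abs(lhs - rhs) < 0.01
print("empirical cf of the sum matches the product of the marginals")
```

The agreement is within the Monte Carlo error of order N^{−1/2}, consistent with the identity E e^{it(ξ+η)} = E e^{itξ} E e^{itη}.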
We now apply the continuity theorem to prove some classical limit theorems, and we begin with the case of Poisson convergence. For an introduction, consider for each n ∈ ℕ some i.i.d. random variables ξ_n1, …, ξ_nn with distribution

    P{ξ_nj = 1} = 1 − P{ξ_nj = 0} = c_n,   n ∈ ℕ,
88   Foundations of Modern Probability

and assume that nc_n → c < ∞. Then the sums S_n = ξ_n1 + ⋯ + ξ_nn have generating functions

    ψ_n(s) = (1 − (1 − s)c_n)^n → e^{−c(1−s)} = e^{−c} Σ_{n≥0} c^n s^n / n!,   s ∈ [0, 1].

The limit ψ(s) = e^{−c(1−s)} is the generating function of the Poisson distribution with parameter c, the distribution of a random variable η with probabilities P{η = n} = e^{−c} c^n / n! for n ∈ ℤ₊. Note that the corresponding expected value equals Eη = ψ′(1) = c. Since ψ_n → ψ, it is clear from Theorem 5.3 that S_n →d η.

Before turning to more general cases of Poisson convergence, we need to introduce the notion of a null array. By this we mean a triangular array of random variables or vectors ξ_nj, 1 ≤ j ≤ m_n, n ∈ ℕ, such that the ξ_nj are independent for each n and satisfy

    sup_j E[|ξ_nj| ∧ 1] → 0.   (4)

The latter condition may be thought of as the convergence ξ_nj →P 0 as n → ∞, uniformly in j. When ξ_nj ≥ 0 for all n and j, we may allow the m_n to be infinite.

The following lemma characterizes null arrays in terms of the associated characteristic functions or Laplace transforms.

Lemma 5.6 (null arrays) Consider a triangular array of random vectors ξ_nj with characteristic functions φ_nj or Laplace transforms ψ_nj. Then (4) holds iff, respectively,

    sup_j |1 − φ_nj(t)| → 0,  t ∈ ℝ^d;   inf_j ψ_nj(u) → 1,  u ∈ ℝ₊^d.   (5)

Proof: Relation (4) holds iff ξ_{n,j_n} →P 0 for all sequences (j_n). By Theorem 5.3 this is equivalent to φ_{n,j_n}(t) → 1 for all t and (j_n), which in turn is equivalent to (5). The proof for Laplace transforms is similar.  □

We now give a general criterion for Poisson convergence of the row sums in a null array of integer-valued random variables. The result will be extended in Lemmas 15.15 and 15.24 to more general limiting distributions and in Theorem 16.18 to the context of point processes.

Theorem 5.7 (Poisson convergence) Let (ξ_nj) be a null array of ℤ₊-valued random variables, and let ξ be Poisson distributed with mean c.
Then Σ_j ξ_nj →d ξ iff these conditions hold:

(i) Σ_j P{ξ_nj > 1} → 0;
(ii) Σ_j P{ξ_nj = 1} → c.

Moreover, (i) is equivalent to sup_j ξ_nj ∨ 1 →P 1. If Σ_j ξ_nj converges in distribution, then (i) holds iff the limit is Poisson.
5. Characteristic Functions and Classical Limit Theorems   89

We need the following frequently used lemma.

Lemma 5.8 (sums and products) Consider a null array of constants c_nj ≥ 0, and fix any c ∈ [0, ∞]. Then ∏_j (1 − c_nj) → e^{−c} iff Σ_j c_nj → c.

Proof: Since sup_j c_nj < 1 for large n, the first relation is equivalent to Σ_j log(1 − c_nj) → −c, and the assertion follows from the fact that log(1 − x) = −x + o(x) as x → 0.  □

Proof of Theorem 5.7: Let ψ_nj denote the generating function of ξ_nj. By Theorem 5.3 the convergence Σ_j ξ_nj →d ξ is equivalent to ∏_j ψ_nj(s) → e^{−c(1−s)} for arbitrary s ∈ [0, 1], which holds by Lemmas 5.6 and 5.8 iff

    Σ_j (1 − ψ_nj(s)) → c(1 − s),   s ∈ [0, 1].   (6)

By an easy computation, the sum on the left equals

    (1 − s) Σ_j P{ξ_nj > 0} + Σ_{k>1} (s − s^k) Σ_j P{ξ_nj = k} = T₁ + T₂,   (7)

and we also note that

    s(1 − s) Σ_j P{ξ_nj > 1} ≤ T₂ ≤ s Σ_j P{ξ_nj > 1}.   (8)

Assuming (i) and (ii), it is clear that (6) follows from (7) and (8). Now assume instead that (6) holds. For s = 0 we get Σ_j P{ξ_nj > 0} → c, and so in general T₁ → c(1 − s). But then (6) implies T₂ → 0, and (i) follows by (8). Finally, (ii) is obtained by subtraction.

To prove that (i) is equivalent to sup_j ξ_nj ∨ 1 →P 1, we note that

    P{sup_j ξ_nj ≤ 1} = ∏_j P{ξ_nj ≤ 1} = ∏_j (1 − P{ξ_nj > 1}).

By Lemma 5.8 the right-hand side tends to 1 iff Σ_j P{ξ_nj > 1} → 0, which is the stated equivalence.

To prove the last assertion, put c_nj = P{ξ_nj > 0} and write

    E exp(−Σ_j ξ_nj) ≤ E exp(−Σ_j (ξ_nj ∧ 1)) = ∏_j E exp(−(ξ_nj ∧ 1))
        = ∏_j (1 − (1 − e^{−1})c_nj) ≤ ∏_j exp(−(1 − e^{−1})c_nj)
        = exp(−(1 − e^{−1}) Σ_j c_nj).

If (i) holds and Σ_j ξ_nj →d η, then the left-hand side tends to Ee^{−η} > 0, and so the sums c_n = Σ_j c_nj are bounded. Hence, c_n converges along a subsequence N′ ⊂ ℕ toward some constant c. But then (i) and (ii) hold along N′, and the first assertion shows that η is Poisson with mean c.  □
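The law of rare events behind Theorem 5.7 is easy to visualize by simulation. In this sketch of ours, the rows of the null array consist of n Bernoulli(c/n) variables, so conditions (i) and (ii) hold trivially, and the row sums should follow the Poisson law of mean c; all numerical choices are arbitrary.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
c, n, samples = 2.0, 1000, 100000

# Row sums of the null array (xi_nj): n independent Bernoulli(c/n) terms.
S = rng.binomial(n, c / n, size=samples)

# Conditions (i) and (ii) of the theorem hold here, so the row sums
# should follow the Poisson law P{eta = k} = exp(-c) c^k / k!.
for k in range(5):
    emp = (S == k).mean()
    target = math.exp(-c) * c ** k / math.factorial(k)
    assert abs(emp - target) < 0.01
print("row sums are approximately Poisson(2)")
```

With n = 1000 the total variation distance between the binomial row sum and Poisson(2) is already of order c²/n, well below the sampling error here.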
\\ith P{ k == ::f::1} == , and write Sn == 1 +... + n. Then n- 1 / 2 Sn has characteristic function 'Pn(t) = cosn(n-l/2t) == (1 - t2n-l + O(n- 2 )) n -+ e- t2 / 2 == 'P(t). 
90   Foundations of Modern Probability

By a classical computation, the function e^{−x²/2} has Fourier transform

    ∫_{−∞}^{∞} e^{itx} e^{−x²/2} dx = (2π)^{1/2} e^{−t²/2},   t ∈ ℝ.

Hence, φ is the characteristic function of a probability measure on ℝ with density (2π)^{−1/2} e^{−x²/2}. This is the standard normal or Gaussian distribution N(0, 1), and Theorem 5.3 shows that n^{−1/2} S_n →d ζ, where ζ is N(0, 1). The general Gaussian law N(m, σ²) is defined as the distribution of the random variable η = m + σζ, and we note that η has mean m and variance σ². From the form of the characteristic functions together with the uniqueness property, it is clear that any linear combination of independent Gaussian random variables is again Gaussian.

The convergence to a Gaussian limit generalizes easily to a more general setting, as in the following classical result. The present statement is only preliminary, and a more general version is obtained by different methods in Theorem 5.17.

Proposition 5.9 (central limit theorem, Lindeberg, Lévy) Let ξ, ξ₁, ξ₂, … be i.i.d. random variables with Eξ = 0 and Eξ² = 1, and let ζ be N(0, 1). Then n^{−1/2} Σ_{k≤n} ξ_k →d ζ.

The proof may be based on a simple Taylor expansion.

Lemma 5.10 (Taylor expansion) Let φ be the characteristic function of a random variable ξ with E|ξ|^n < ∞. Then

    φ(t) = Σ_{k=0}^{n} (it)^k Eξ^k / k! + o(t^n),   t → 0.

Proof: Noting that |e^{it} − 1| ≤ |t| for all t ∈ ℝ, we get recursively by dominated convergence

    φ^{(k)}(t) = E(iξ)^k e^{itξ},   t ∈ ℝ,  0 ≤ k ≤ n.

In particular, φ^{(k)}(0) = E(iξ)^k for k ≤ n, and the result follows from Taylor's formula.  □

Proof of Proposition 5.9: Let the ξ_k have characteristic function φ. By Lemma 5.10, the characteristic function of n^{−1/2} S_n equals

    φ_n(t) = (φ(n^{−1/2} t))^n = (1 − t²/2n + o(n^{−1}))^n → e^{−t²/2},

where the convergence holds as n → ∞ for fixed t.  □
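Proposition 5.9 is straightforward to illustrate numerically; in this sketch of ours the summands are centered exponential variables, which satisfy Eξ = 0 and Eξ² = 1 while being visibly non-Gaussian. The sample sizes are arbitrary choices, and the sums are generated in chunks only to keep memory modest.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
n, samples = 1000, 40000

# xi_k = exponential(1) - 1: mean 0, variance 1, clearly non-Gaussian.
# z = n^{-1/2} sum_{k<=n} xi_k, accumulated in chunks of 4000 paths.
z = np.concatenate([
    (rng.exponential(size=(4000, n)).sum(axis=1) - n) / np.sqrt(n)
    for _ in range(samples // 4000)
])

# Compare the empirical distribution function with that of N(0,1).
Phi = NormalDist().cdf
for t in (-1.0, 0.0, 1.0, 2.0):
    assert abs((z <= t).mean() - Phi(t)) < 0.015
print("standardized sums are approximately N(0,1)")
```

The residual discrepancy reflects both the Monte Carlo error and the O(n^{−1/2}) skewness correction, both small at these sizes.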
In this context, we may also derive criteria for convergence toward Gaussian and degenerate limits, respectively. 
5. Characteristic Functions and Classical Limit Theorems   91

Theorem 5.11 (positive and symmetric terms) Let (ξ_nj) be a null array of symmetric random variables, and let ξ be N(0, c) for some c ≥ 0. Then Σ_j ξ_nj →d ξ iff Σ_j ξ_nj² →d c, and also iff these conditions hold:

(i) Σ_j P{|ξ_nj| > ε} → 0 for all ε > 0;
(ii) Σ_j E(ξ_nj² ∧ 1) → c.

Moreover, (i) is equivalent to sup_j |ξ_nj| →P 0. If Σ_j ξ_nj or Σ_j ξ_nj² converges in distribution, then (i) holds iff the limit is Gaussian or degenerate, respectively.

Here the necessity of condition (i) is a remarkable fact that plays a crucial role in our proof of the more general Theorem 5.15. It is instructive to compare the present statement with the corresponding result for random series in Theorem 4.17. Note also the extended version appearing in Proposition 15.23.

Proof: First assume that Σ_j ξ_nj →d ξ. By Theorem 5.3 and Lemmas 5.6 and 5.8 it is equivalent that

    Σ_j E(1 − cos tξ_nj) → ct²/2,   t ∈ ℝ,   (9)

where the convergence is uniform on every bounded interval. Comparing the integrals of (9) over [0, 1] and [0, 2], we get Σ_j Ef(ξ_nj) → 0, where f(0) = 0 and

    f(x) = 3 − 4 sin x / x + sin 2x / 2x,   x ∈ ℝ \ {0}.

Now f is continuous with f(x) → 3 as |x| → ∞, and furthermore f(x) > 0 for x ≠ 0. Indeed, the last relation is equivalent to 8 sin x − sin 2x < 6x for x > 0, which is obvious when x ≥ π/2 and follows by twofold differentiation when x ∈ (0, π/2). Writing g(x) = inf_{|y|≥x} f(y) and letting ε > 0 be arbitrary, we get

    Σ_j P{|ξ_nj| > ε} ≤ Σ_j P{f(ξ_nj) ≥ g(ε)} ≤ Σ_j Ef(ξ_nj)/g(ε) → 0,

which proves (i).

If instead Σ_j ξ_nj² →d c, the corresponding symmetrized variables η_nj satisfy Σ_j η_nj →P 0, and we get Σ_j P{|η_nj| > ε} → 0 as before. By Lemma 4.19 it follows that Σ_j P{|ξ_nj² − m_nj| > ε} → 0, where the m_nj are medians of ξ_nj², and since sup_j m_nj → 0, condition (i) follows again. Using Lemma 5.8, we further note that (i) is equivalent to sup_j |ξ_nj| →P 0. Thus, we may henceforth assume that (i) is fulfilled.

Next we note that, for any t ∈ ℝ and ε > 0,

    Σ_j E[1 − cos tξ_nj; |ξ_nj| ≤ ε] = (t²/2)(1 − O(t²ε²)) Σ_j E[ξ_nj²; |ξ_nj| ≤ ε].

Assuming (i), the equivalence between (9) and (ii) now follows as we let n → ∞ and then ε → 0. To get the corresponding result for the variables
Next we note that, for any $t \in \mathbb{R}$ and $\varepsilon > 0$,
\[ \sum_j E[1 - \cos t\xi_{nj};\; |\xi_{nj}| \le \varepsilon] = \tfrac{1}{2} t^2 \big(1 + O(t^2 \varepsilon^2)\big) \sum_j E[\xi_{nj}^2;\; |\xi_{nj}| \le \varepsilon]. \]
Assuming (i), the equivalence between (9) and (ii) now follows as we let $n \to \infty$ and then $\varepsilon \to 0$. To get the corresponding result for the variables $\zeta_{nj}$, we may instead write
\[ \sum_j E[1 - e^{-t\zeta_{nj}};\; \zeta_{nj} \le \varepsilon] = t \big(1 + O(t\varepsilon)\big) \sum_j E[\zeta_{nj};\; \zeta_{nj} \le \varepsilon], \quad t, \varepsilon > 0, \]
and proceed as before. This completes the proof of the first assertion.

Finally, assume that (i) holds and $\sum_j \xi_{nj} \overset{d}{\to} \eta$. Then the same relation holds for the truncated variables $\xi_{nj} 1\{|\xi_{nj}| \le 1\}$, and so we may assume that $|\xi_{nj}| \le 1$ for all $n$ and $j$. Define $c_n = \sum_j E\xi_{nj}^2$. If $c_n \to \infty$ along some subsequence, then the distribution of $c_n^{-1/2} \sum_j \xi_{nj}$ tends to $N(0,1)$ by the first assertion, which is impossible by Lemmas 4.8 and 4.9. Thus, $(c_n)$ is bounded and converges along some subsequence. By the first assertion, $\sum_j \xi_{nj}$ then tends to some Gaussian limit, so even $\eta$ is Gaussian. □

The following result gives the basic criterion for Gaussian convergence, under a normalization by second moments.

Theorem 5.12 (Gaussian convergence under classical normalization, Lindeberg, Feller) Let $(\xi_{nj})$ be a triangular array of rowwise independent random variables with mean 0 and $\sum_j E\xi_{nj}^2 \to 1$, and let $\xi$ be $N(0,1)$. Then these conditions are equivalent:
(i) $\sum_j \xi_{nj} \overset{d}{\to} \xi$ and $\sup_j E\xi_{nj}^2 \to 0$;
(ii) $\sum_j E[\xi_{nj}^2;\; |\xi_{nj}| > \varepsilon] \to 0$ for all $\varepsilon > 0$.

Here (ii) is the celebrated Lindeberg condition. Our proof is based on two elementary lemmas.

Lemma 5.13 (comparison of products) For any complex numbers $z_1, \dots, z_n$ and $z'_1, \dots, z'_n$ of modulus $\le 1$, we have
\[ \Big| \prod_k z_k - \prod_k z'_k \Big| \le \sum_k |z_k - z'_k|. \]

Proof: For $n = 2$ we get
\[ |z_1 z_2 - z'_1 z'_2| \le |z_1 z_2 - z'_1 z_2| + |z'_1 z_2 - z'_1 z'_2| \le |z_1 - z'_1| + |z_2 - z'_2|, \]
and the general result follows by induction. □

Lemma 5.14 (Taylor expansion) For any $t \in \mathbb{R}$ and $n \in \mathbb{Z}_+$, we have
\[ \Big| e^{it} - \sum_{k=0}^{n} \frac{(it)^k}{k!} \Big| \le \frac{2|t|^n}{n!} \wedge \frac{|t|^{n+1}}{(n+1)!}. \]

Proof: Letting $h_n(t)$ denote the difference on the left, we get
\[ h_n(t) = i \int_0^t h_{n-1}(s)\, ds, \quad t > 0, \; n \in \mathbb{Z}_+. \]
Starting from the obvious relations $|h_{-1}| \equiv 1$ and $|h_0| \le 2$, it follows by induction that $|h_{n-1}(t)| \le |t|^n / n!$ and $|h_n(t)| \le 2|t|^n / n!$. □
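To make the Lindeberg condition concrete, here is a small sketch (not part of the text; the array is an illustrative choice): for the classical array $\xi_{nj} = Z_j/\sqrt{n}$ with $Z_j$ i.i.d. $N(0,1)$, the Lindeberg sum equals $E[Z^2;\, |Z| > \varepsilon\sqrt{n}]$, which can be evaluated in closed form from $\int_a^\infty x^2 \varphi(x)\,dx = a\varphi(a) + (1 - \Phi(a))$.

```python
import math

# Lindeberg sum for xi_{nj} = Z_j / sqrt(n), Z_j i.i.d. N(0,1):
# sum_j E[xi_{nj}^2; |xi_{nj}| > eps] = E[Z^2; |Z| > eps*sqrt(n)].
def phi(x):                       # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def tail(a):                      # P{Z > a}
    return 0.5 * math.erfc(a / math.sqrt(2))

def lindeberg(n, eps):
    a = eps * math.sqrt(n)
    return 2 * (a * phi(a) + tail(a))

for n in (10, 100, 1000):
    print(n, lindeberg(n, 0.1))   # decreases to 0 as n grows
```

The sum tends to 0 for every fixed $\varepsilon > 0$, as condition (ii) requires; for an array with one non-negligible term the same quantity would stay bounded away from 0.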
We return to the proof of Theorem 5.12. At this point we shall prove only the sufficiency of the Lindeberg condition (ii), which is needed for the proof of the main Theorem 5.15. To avoid repetition, we postpone the proof of the necessity part until after the proof of that theorem.

Proof of Theorem 5.12, (ii) $\Rightarrow$ (i): Write $c_{nj} = E\xi_{nj}^2$ and $c_n = \sum_j c_{nj}$. First we note that for any $\varepsilon > 0$
\[ \sup_j c_{nj} \le \varepsilon^2 + \sup_j E[\xi_{nj}^2;\; |\xi_{nj}| > \varepsilon] \le \varepsilon^2 + \sum_j E[\xi_{nj}^2;\; |\xi_{nj}| > \varepsilon], \]
which tends to 0 under (ii), as $n \to \infty$ and then $\varepsilon \to 0$. Now introduce some independent random variables $\zeta_{nj}$ with distributions $N(0, c_{nj})$, and note that $\zeta_n = \sum_j \zeta_{nj}$ is $N(0, c_n)$. Hence, $\zeta_n \overset{d}{\to} \xi$. Letting $\varphi_{nj}$ and $\psi_{nj}$ denote the characteristic functions of $\xi_{nj}$ and $\zeta_{nj}$, respectively, it remains by Theorem 5.3 to show that $\prod_j \varphi_{nj} - \prod_j \psi_{nj} \to 0$. Then conclude from Lemmas 5.13 and 5.14 that, for fixed $t \in \mathbb{R}$,
\[ \Big| \prod_j \varphi_{nj}(t) - \prod_j \psi_{nj}(t) \Big| \le \sum_j |\varphi_{nj}(t) - \psi_{nj}(t)| \]
\[ \le \sum_j \big|\varphi_{nj}(t) - 1 + \tfrac{1}{2} t^2 c_{nj}\big| + \sum_j \big|\psi_{nj}(t) - 1 + \tfrac{1}{2} t^2 c_{nj}\big| \]
\[ \lesssim \sum_j E\xi_{nj}^2 (1 \wedge |\xi_{nj}|) + \sum_j E\zeta_{nj}^2 (1 \wedge |\zeta_{nj}|). \]
For any $\varepsilon > 0$, we have
\[ \sum_j E\xi_{nj}^2 (1 \wedge |\xi_{nj}|) \le \varepsilon \sum_j c_{nj} + \sum_j E[\xi_{nj}^2;\; |\xi_{nj}| > \varepsilon], \]
which tends to 0 by (ii), as $n \to \infty$ and then $\varepsilon \to 0$. Further note that
\[ \sum_j E\zeta_{nj}^2 (1 \wedge |\zeta_{nj}|) \le \sum_j E|\zeta_{nj}|^3 \lesssim \sum_j c_{nj}^{3/2} \le c_n \sup_j c_{nj}^{1/2} \to 0 \]
by the first part of the proof. □

The problem of characterizing the convergence to a Gaussian limit is solved completely by the following result. The reader should notice the striking resemblance between the present conditions and those of the three-series criterion in Theorem 4.18. A far-reaching extension of the present result is obtained by different methods in Chapter 15. As before, $\mathrm{var}[\xi; A] = \mathrm{var}(\xi 1_A)$.

Theorem 5.15 (Gaussian convergence, Feller, Lévy) Let $(\xi_{nj})$ be a null array of random variables, and let $\xi$ be $N(b,c)$ for some constants $b$ and $c$. Then $\sum_j \xi_{nj} \overset{d}{\to} \xi$ iff these conditions hold:
(i) $\sum_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ for all $\varepsilon > 0$;
(ii) $\sum_j E[\xi_{nj};\; |\xi_{nj}| \le 1] \to b$;
(iii) $\sum_j \mathrm{var}[\xi_{nj};\; |\xi_{nj}| \le 1] \to c$.
Moreover, (i) is equivalent to $\sup_j |\xi_{nj}| \overset{P}{\to} 0$. If $\sum_j \xi_{nj}$ converges in distribution, then (i) holds iff the limit is Gaussian.
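The three conditions can be checked by hand for a simple null array; a sketch (parameters are illustrative choices, not from the text): with $\xi_{nj} = b/n + \sigma U_j/\sqrt{n}$, $U_j$ i.i.d. uniform on $[-1,1]$, the truncation at 1 is eventually vacuous, condition (ii) gives the stated $b$, and condition (iii) gives $c = \sigma^2/3$, so the row sums should approach $N(b, c)$.

```python
import numpy as np

# Null array xi_{nj} = b/n + sigma * U_j / sqrt(n), U_j uniform on [-1, 1].
# Conditions (i)-(iii) of Theorem 5.15 hold with this b and c = sigma^2 / 3,
# so the row sums should be approximately N(b, c).
rng = np.random.default_rng(1)
b, sigma = 0.5, 1.5
c = sigma**2 / 3.0                 # Var(U) = 1/3 for uniform on [-1, 1]
n, reps = 2000, 20000
U = rng.uniform(-1.0, 1.0, size=(reps, n))
row_sums = b + sigma * U.sum(axis=1) / np.sqrt(n)
print(row_sums.mean(), row_sums.var())   # approximately b = 0.5 and c = 0.75
```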
Proof: To see that (i) is equivalent to $\sup_j |\xi_{nj}| \overset{P}{\to} 0$, we note that
\[ P\{\sup_j |\xi_{nj}| > \varepsilon\} = 1 - \prod_j \big(1 - P\{|\xi_{nj}| > \varepsilon\}\big), \quad \varepsilon > 0. \]
Since $\sup_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ under both conditions, the assertion follows by Lemma 5.8.

Now assume $\sum_j \xi_{nj} \overset{d}{\to} \xi$. Introduce medians $m_{nj}$ and symmetrizations $\tilde\xi_{nj}$ of the variables $\xi_{nj}$, and note that $m_n = \sup_j |m_{nj}| \to 0$ and $\sum_j \tilde\xi_{nj} \overset{d}{\to} \tilde\xi$, where $\tilde\xi$ is $N(0, 2c)$. By Lemma 4.19 and Theorem 5.11, we get for any $\varepsilon > 0$
\[ \sum_j P\{|\xi_{nj}| > \varepsilon\} \le \sum_j P\{|\xi_{nj} - m_{nj}| > \varepsilon - m_n\} \le 2 \sum_j P\{|\tilde\xi_{nj}| > \varepsilon - m_n\} \to 0. \]
Thus, we may henceforth assume condition (i) and hence that $\sup_j |\xi_{nj}| \overset{P}{\to} 0$. But then $\sum_j \xi_{nj} \overset{d}{\to} \eta$ is equivalent to $\sum_j \xi'_{nj} \overset{d}{\to} \eta$, where $\xi'_{nj} = \xi_{nj} 1\{|\xi_{nj}| \le 1\}$, and so we may further assume that $|\xi_{nj}| \le 1$ a.s. for all $n$ and $j$. In this case (ii) and (iii) reduce to $b_n = \sum_j E\xi_{nj} \to b$ and $c_n = \sum_j \mathrm{var}(\xi_{nj}) \to c$, respectively.

Write $b_{nj} = E\xi_{nj}$, and note that $\sup_j |b_{nj}| \to 0$ because of (i). Assuming (ii) and (iii), we get $\sum_j \xi_{nj} - b_n \overset{d}{\to} \xi - b$ by Theorem 5.12, and so $\sum_j \xi_{nj} \overset{d}{\to} \xi$. Conversely, $\sum_j \xi_{nj} \overset{d}{\to} \xi$ implies $\sum_j \tilde\xi_{nj} \overset{d}{\to} \tilde\xi$, and (iii) follows by Theorem 5.11. But then $\sum_j \xi_{nj} - b_n \overset{d}{\to} \xi - b$, so Lemma 4.20 shows that $b_n$ converges toward some $b'$. Hence, $\sum_j \xi_{nj} \overset{d}{\to} \xi + b' - b$, and so $b' = b$, which means that even (ii) is fulfilled.

It remains to prove that, under condition (i), any limiting distribution is Gaussian. Then assume $\sum_j \xi_{nj} \overset{d}{\to} \eta$, and note that $\sum_j \tilde\xi_{nj} \overset{d}{\to} \tilde\eta$, where $\tilde\eta$ denotes a symmetrization of $\eta$. If $c_n \to \infty$ along some subsequence, then $c_n^{-1/2} \sum_j \tilde\xi_{nj}$ tends to $N(0, 2)$ by the first assertion, which is impossible by Lemma 4.9. Thus, $(c_n)$ is bounded, and we have convergence $c_n \to c$ along some subsequence. But then $\sum_j \xi_{nj} - b_n$ tends to $N(0, c)$, again by the first assertion, and Lemma 4.20 shows that even $b_n$ converges toward some limit $b$. Hence, $\sum_j \xi_{nj}$ tends to $N(b, c)$, which is then the distribution of $\eta$. □

Proof of Theorem 5.12, (i) $\Rightarrow$ (ii): The second condition in (i) implies that $(\xi_{nj})$ is a null array.
Furthermore, we have for any $\varepsilon > 0$
\[ \sum_j \mathrm{var}[\xi_{nj};\; |\xi_{nj}| \le \varepsilon] \le \sum_j E[\xi_{nj}^2;\; |\xi_{nj}| \le \varepsilon] \le \sum_j E\xi_{nj}^2 \to 1. \]
By Theorem 5.15 even the left-hand side tends to 1, and (ii) follows. □
As a first application of Theorem 5.15, we shall prove the following ultimate version of the weak law of large numbers. The result should be compared with the corresponding strong law established in Theorem 4.23.

Theorem 5.16 (weak laws of large numbers) Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. random variables, and fix any $p \in (0,2)$ and $c \in \mathbb{R}$. Then $n^{-1/p} \sum_{k \le n} \xi_k \overset{P}{\to} c$ iff the following condition holds as $r \to \infty$, depending on the value of $p$:
$p < 1$: $r^p P\{|\xi| > r\} \to 0$ and $c = 0$;
$p = 1$: $r P\{|\xi| > r\} \to 0$ and $E[\xi;\; |\xi| \le r] \to c$;
$p > 1$: $r^p P\{|\xi| > r\} \to 0$ and $E\xi = c = 0$.

Proof: Applying Theorem 5.15 to the null array of random variables $\xi_{nj} = n^{-1/p} \xi_j$, $j \le n$, we note that the stated convergence is equivalent to the three conditions
(i) $n P\{|\xi| > n^{1/p} \varepsilon\} \to 0$ for all $\varepsilon > 0$,
(ii) $n^{1 - 1/p} E[\xi;\; |\xi| \le n^{1/p}] \to c$,
(iii) $n^{1 - 2/p} \mathrm{var}[\xi;\; |\xi| \le n^{1/p}] \to 0$.
By the monotonicity of $P\{|\xi| > r^{1/p}\}$, condition (i) is equivalent to $r^p P\{|\xi| > r\} \to 0$. Furthermore, Lemma 3.4 yields for any $r > 0$
\[ r^{p-2}\, \mathrm{var}[\xi;\; |\xi| \le r] \le r^p\, E\big((\xi/r)^2 \wedge 1\big) = r^p \int_0^1 P\{|\xi| > r\sqrt{t}\}\, dt, \]
\[ r^{p-1}\, |E[\xi;\; |\xi| \le r]| \le r^p\, E\big(|\xi/r| \wedge 1\big) = r^p \int_0^1 P\{|\xi| > rt\}\, dt. \]
Since $t^{-a}$ is integrable on $[0,1]$ for any $a < 1$, it follows by dominated convergence that (i) implies (iii) and also that (i) implies (ii) with $c = 0$ when $p < 1$.

If instead $p > 1$, we see from (i) and Lemma 3.4 that
\[ E|\xi| = \int_0^\infty P\{|\xi| > r\}\, dr \lesssim \int_0^\infty (1 \wedge r^{-p})\, dr < \infty. \]
Thus, $E[\xi;\; |\xi| \le r] \to E\xi$, and (ii) implies $E\xi = 0$. Moreover, we get from (i)
\[ r^{p-1} E[|\xi|;\; |\xi| > r] = r^p P\{|\xi| > r\} + r^{p-1} \int_r^\infty P\{|\xi| > t\}\, dt \to 0. \]
Under the further assumption that $E\xi = 0$, we obtain (ii) with $c = 0$.

Finally, let $p = 1$, and conclude from (i) that
\[ E[|\xi|;\; n < |\xi| \le n+1] \lesssim n P\{|\xi| > n\} \to 0. \]
Hence, under (i), condition (ii) is equivalent to $E[\xi;\; |\xi| \le r] \to c$. □

We next extend the central limit theorem in Proposition 5.9 by characterizing convergence of suitably normalized partial sums from a single i.i.d. sequence toward a Gaussian limit.
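A numerical sketch of the case $p < 1$ (an illustrative choice, not from the text): for standard Cauchy summands, $r^{1/2} P\{|\xi| > r\} \to 0$, so with $p = 1/2$ the theorem gives $n^{-2} \sum_{k \le n} \xi_k \overset{P}{\to} 0$, even though the mean does not exist. For $p = 1$ the same distribution fails the condition $r P\{|\xi| > r\} \to 0$, matching the classical failure of the law of large numbers for Cauchy variables.

```python
import numpy as np

# Weak LLN of Theorem 5.16 with p = 1/2 and standard Cauchy summands:
# the sum of n Cauchy variables is distributed as n times one Cauchy,
# so n^{-2} * sum is distributed as (one Cauchy)/n and tends to 0 in
# probability.
rng = np.random.default_rng(2)
n, reps = 10_000, 2000
x = rng.standard_cauchy(size=(reps, n))
norm_sums = np.abs(x.sum(axis=1)) / n**2   # |n^{-1/p} sum|, p = 1/2
print(np.mean(norm_sums < 0.01))           # fraction of small values, near 1
```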
Here a nondecreasing function $L \ge 0$ is said to vary slowly at $\infty$ if $\sup_x L(x) > 0$ and moreover $L(cx) \sim L(x)$ as $x \to \infty$ for each $c > 0$. This holds in particular when $L$ is bounded, but it is also true for many unbounded functions, such as $\log(x \vee 1)$.

Theorem 5.17 (domain of Gaussian attraction, Lévy, Feller, Khinchin) Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. nondegenerate random variables, and let $\zeta$ be $N(0,1)$. Then $a_n \sum_{k \le n} (\xi_k - m_n) \overset{d}{\to} \zeta$ for some constants $a_n$ and $m_n$ iff the function $L(x) = E[\xi^2;\; |\xi| \le x]$ varies slowly at $\infty$, in which case we may take $m_n = E\xi$. In particular, the stated convergence holds with $a_n = n^{-1/2}$ and $m_n = 0$ iff $E\xi = 0$ and $E\xi^2 = 1$.

Even other so-called stable distributions may occur as limits, but the conditions for convergence are too restrictive to be of much interest for applications. Our proof of Theorem 5.17 is based on the following result.

Lemma 5.18 (slow variation, Karamata) Let $\xi$ be a nondegenerate random variable such that $L(x) = E[\xi^2;\; |\xi| \le x]$ varies slowly at $\infty$. Then so does the function $L_m(x) = E[(\xi - m)^2;\; |\xi - m| \le x]$ for every $m \in \mathbb{R}$, and moreover
\[ \lim_{x \to \infty} x^{2-p}\, E[|\xi|^p;\; |\xi| > x] / L(x) = 0, \quad p \in [0, 2). \tag{10} \]

Proof: Fix any constant $r \in (1, 2^{2-p})$, and choose $x_0 > 0$ so large that $L(2x) \le r L(x)$ for all $x \ge x_0$. For such an $x$, we get
\[ x^{2-p}\, E[|\xi|^p;\; |\xi| > x] = x^{2-p} \sum_{n \ge 0} E[|\xi|^p;\; |\xi|/x \in (2^n, 2^{n+1}]] \]
\[ \le \sum_{n \ge 0} 2^{(p-2)n}\, E[\xi^2;\; |\xi|/x \in (2^n, 2^{n+1}]] \le \sum_{n \ge 0} 2^{(p-2)n} (r-1) r^n L(x) = \frac{(r-1) L(x)}{1 - 2^{p-2} r}. \]
Now (10) follows, as we divide by $L(x)$ and let $x \to \infty$ and then $r \to 1$. In particular, we note that $E|\xi|^p < \infty$ for all $p < 2$.

If even $E\xi^2 < \infty$, then $E(\xi - m)^2 < \infty$, and the first assertion is obvious. If instead $E\xi^2 = \infty$, we may write
\[ L_m(x) = E[\xi^2;\; |\xi - m| \le x] + m\, E[m - 2\xi;\; |\xi - m| \le x]. \]
Here the last term is bounded, and the first term lies between the bounds $L(x \mp |m|) \sim L(x)$. Thus, $L_m(x) \sim L(x)$, and the slow variation of $L_m$ follows from that of $L$. □

Proof of Theorem 5.17: Assume that $L$ varies slowly at $\infty$.
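A concrete sketch (an illustrative example, not from the text): the density $f(x) = |x|^{-3}$ on $|x| \ge 1$ has $E\xi^2 = \infty$ but truncated second moment $L(x) = E[\xi^2;\, |\xi| \le x] = 2\log x$, which varies slowly, so this distribution lies in the domain of Gaussian attraction despite its infinite variance. Both the slow variation and relation (10) with $p = 0$ can be checked in closed form.

```python
import math

# For the density f(x) = |x|^{-3} on |x| >= 1:
#   L(x) = E[xi^2; |xi| <= x] = 2 * log(x)   (slowly varying),
#   P{|xi| > x} = x^{-2},
# so x^2 * P{|xi| > x} / L(x) = 1 / (2 log x) -> 0, which is (10) with p = 0.
def L(x):
    return 2.0 * math.log(x)

def tail(x):          # P{|xi| > x} for x >= 1
    return x**-2.0

for x in (1e2, 1e4, 1e8):
    print(L(2 * x) / L(x), x**2 * tail(x) / L(x))   # -> 1 and -> 0
```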
By Lemma 5.18 this is also true for the function $L_m(x) = E[(\xi - m)^2;\; |\xi - m| \le x]$, where $m = E\xi$, and so we may assume that $E\xi = 0$. Now define
\[ c_n = 1 \vee \sup\{x > 0;\; n L(x) \ge x^2\}, \quad n \in \mathbb{N}, \]
and note that $c_n \uparrow \infty$. From the slow variation of $L$ it is further clear that $c_n < \infty$ for all $n$ and that, moreover, $n L(c_n) \sim c_n^2$. In particular, $c_n \sim n^{1/2}$ iff $L(c_n) \to 1$, that is, iff $\mathrm{var}(\xi) = 1$.

We shall verify the conditions of Theorem 5.15 with $b = 0$, $c = 1$, and $\xi_{nj} = \xi_j / c_n$, $j \le n$. Beginning with (i), let $\varepsilon > 0$ be arbitrary, and conclude from Lemma 5.18 that
\[ n P\{|\xi / c_n| > \varepsilon\} \sim \frac{c_n^2\, P\{|\xi| > c_n \varepsilon\}}{L(c_n)} \sim \frac{c_n^2\, P\{|\xi| > c_n \varepsilon\}}{L(c_n \varepsilon)} \to 0. \]
Recalling that $E\xi = 0$, we get by the same lemma
\[ n\, \big|E[\xi / c_n;\; |\xi / c_n| \le 1]\big| \le \frac{n}{c_n}\, E[|\xi|;\; |\xi| > c_n] \sim \frac{c_n\, E[|\xi|;\; |\xi| > c_n]}{L(c_n)} \to 0, \tag{11} \]
which proves (ii). To obtain (iii), we note that in view of (11)
\[ n\, \mathrm{var}[\xi / c_n;\; |\xi / c_n| \le 1] = \frac{n L(c_n)}{c_n^2} - n\, \big(E[\xi / c_n;\; |\xi| \le c_n]\big)^2 \to 1. \]
By Theorem 5.15 the required convergence follows with $a_n = c_n^{-1}$ and $m_n = 0$.

Now assume instead that the stated convergence holds for suitable constants $a_n$ and $m_n$. Then a corresponding result holds for the symmetrized variables $\tilde\xi, \tilde\xi_1, \tilde\xi_2, \dots$ with constants $a_n / \sqrt{2}$ and 0, and so we may assume that $c_n^{-1} \sum_{k \le n} \tilde\xi_k \overset{d}{\to} \zeta$, where $c_n = \sqrt{2}/a_n$. Here, clearly, $c_n \to \infty$ and, moreover, $c_{n+1} \sim c_n$, since even $c_{n+1}^{-1} \sum_{k \le n} \tilde\xi_k \overset{d}{\to} \zeta$ by Theorem 4.28. Now define for $x > 0$
\[ \tilde T(x) = P\{|\tilde\xi| > x\}, \quad \tilde L(x) = E[\tilde\xi^2;\; |\tilde\xi| \le x], \quad \tilde U(x) = E(\tilde\xi^2 \wedge x^2). \]
By Theorem 5.15 we have $n \tilde T(c_n \varepsilon) \to 0$ for all $\varepsilon > 0$, and also $n c_n^{-2} \tilde L(c_n) \to 1$. Thus, $c_n^2\, \tilde T(c_n \varepsilon) / \tilde L(c_n) \to 0$, which extends by monotonicity to
\[ \frac{x^2\, \tilde T(x)}{\tilde U(x)} \le \frac{x^2\, \tilde T(x)}{\tilde L(x)} \to 0, \quad x \to \infty. \]
Next define for any $x > 0$
\[ T(x) = P\{|\xi| > x\}, \quad U(x) = E(\xi^2 \wedge x^2). \]
By Lemma 4.19 we have $T(x + |m|) \le 2 \tilde T(x)$ for any median $m$ of $\xi$. Furthermore, by Lemmas 3.4 and 4.19, we get
\[ \tilde U(x) = \int_0^{x^2} P\{\tilde\xi^2 > t\}\, dt \le 2 \int_0^{x^2} P\{4\xi^2 > t\}\, dt = 8\, U(x/2). \]
Hence, as $x \to \infty$,
\[ \frac{L(2x) - L(x)}{L(x)} \le \frac{4 x^2\, T(x)}{U(x) - x^2\, T(x)} \le \frac{8 x^2\, \tilde T(x - |m|)}{8^{-1} \tilde U(2x) - 2 x^2\, \tilde T(x - |m|)} \to 0, \]
which shows that $L$ is slowly varying.
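The normalizing sequence $c_n = 1 \vee \sup\{x > 0;\, nL(x) \ge x^2\}$ can be computed numerically; a sketch for the earlier example $L(x) = 2\log x$ with tail $x^{-2}$ (an illustrative choice): at the root we have $nL(c_n) = c_n^2$, and $c_n/\sqrt{n} \to \infty$, reflecting the infinite variance.

```python
import math

# Solve n * L(x) = x^2 for the upper root, with L(x) = 2 * log(x):
# bisection on f(x) = x^2 - 2 n log x, which is negative on (2, root)
# and increasing beyond sqrt(n).
def c(n):
    lo, hi = 2.0, 10.0 * math.sqrt(n * math.log(n))
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * mid < 2.0 * n * math.log(mid):   # still inside {n L >= x^2}
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n in (10**3, 10**6):
    cn = c(n)
    # n L(c_n) / c_n^2 = 1 at the root; c_n / sqrt(n) grows without bound
    print(cn, n * 2.0 * math.log(cn) / cn**2, cn / math.sqrt(n))
```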
Finally, assume that $n^{-1/2} \sum_{k \le n} \xi_k \overset{d}{\to} \zeta$. By the previous argument with $c_n = n^{1/2}$, we get $\tilde L(n^{1/2}) \to 2$, which implies $E\tilde\xi^2 = 2$ and hence $\mathrm{var}(\xi) = 1$. But then $n^{-1/2} \sum_{k \le n} (\xi_k - E\xi) \overset{d}{\to} \zeta$, and so by comparison $E\xi = 0$. □

We return to the general problem of characterizing the weak convergence of a sequence of probability measures $\mu_n$ on $\mathbb{R}^d$ in terms of the associated characteristic functions or Laplace transforms $\hat\mu_n$. Suppose that $\hat\mu_n$ converges toward some continuous limit $\varphi$, which is not recognized as a characteristic function or Laplace transform. To conclude that $\mu_n$ converges weakly toward some measure $\mu$, we need an extended version of Theorem 5.3, which in turn requires a compactness argument for its proof.

As a preparation, consider the space $\mathcal{M} = \mathcal{M}(\mathbb{R}^d)$ of locally finite measures on $\mathbb{R}^d$. On $\mathcal{M}$ we may introduce the vague topology, generated by the mappings $\mu \mapsto \mu f = \int f\, d\mu$ for all $f \in C_K^+$, the class of continuous functions $f: \mathbb{R}^d \to \mathbb{R}_+$ with compact support. In particular, $\mu_n$ converges vaguely to $\mu$ (written as $\mu_n \overset{v}{\to} \mu$) iff $\mu_n f \to \mu f$ for all $f \in C_K^+$. If the $\mu_n$ are probability measures, then clearly $\mu \mathbb{R}^d \le 1$. The following version of Helly's selection theorem shows that the set of probability measures on $\mathbb{R}^d$ is vaguely relatively sequentially compact.

Theorem 5.19 (vague sequential compactness, Helly) Any sequence of probability measures on $\mathbb{R}^d$ has a vaguely convergent subsequence.

Proof: Fix any probability measures $\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$, and let $F_1, F_2, \dots$ denote the corresponding distribution functions. Write $\mathbb{Q}$ for the set of rational numbers. By a diagonal argument, the functions $F_n$ converge on $\mathbb{Q}^d$ toward some limit $G$, along a suitable subsequence $N' \subset \mathbb{N}$, and we may define
\[ F(x) = \inf\{G(r);\; r \in \mathbb{Q}^d,\; r > x\}, \quad x \in \mathbb{R}^d. \tag{12} \]
Since each $F_n$ has nonnegative increments, the same is true for $G$ and hence also for $F$.
From (12) and the monotonicity of $G$, it is further clear that $F$ is right-continuous. Hence, by Corollary 3.26 there exists some measure $\mu$ on $\mathbb{R}^d$ with $\mu(x, y] = F(x, y]$ for any bounded rectangular box $(x, y] \subset \mathbb{R}^d$, and it remains to show that $\mu_n \overset{v}{\to} \mu$ along $N'$. Then note that $F_n(x) \to F(x)$ at every continuity point $x$ of $F$. By the monotonicity of $F$ there exist some countable sets $D_1, \dots, D_d \subset \mathbb{R}$ such that $F$ is continuous on $C = D_1^c \times \cdots \times D_d^c$. Then $\mu_n U \to \mu U$ for every finite union $U$ of rectangular boxes with corners in $C$, and by a simple approximation we get for any bounded Borel set $B \subset \mathbb{R}^d$
\[ \mu B^\circ \le \liminf_{n \to \infty} \mu_n B \le \limsup_{n \to \infty} \mu_n B \le \mu \bar B. \tag{13} \]
For any bounded $\mu$-continuity set $B$, we may consider functions $f \in C_K^+$ supported by $B$, and proceed as in the proof of Theorem 4.25 to show that $\mu_n f \to \mu f$. Thus, $\mu_n \overset{v}{\to} \mu$. □
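A vague limit of probability measures can lose mass at infinity; a minimal sketch (the measures and test function are illustrative choices): for $\mu_n = \tfrac{1}{2}(\delta_0 + \delta_n)$ and any $f \in C_K^+$, we have $\mu_n f \to \tfrac{1}{2} f(0)$, so $\mu_n \overset{v}{\to} \tfrac{1}{2}\delta_0$, a sub-probability measure.

```python
import numpy as np

# mu_n = (delta_0 + delta_n) / 2.  For continuous f with compact support,
# mu_n f -> f(0)/2: half the mass escapes to infinity, so the vague limit
# delta_0 / 2 has total mass 1/2 and the sequence (mu_n) is not tight.
def f(x, r=5.0):                  # a continuous bump supported on [-r, r]
    return np.maximum(0.0, 1.0 - np.abs(x) / r)

def mu_f(n, r=5.0):               # integral of f under mu_n
    return 0.5 * (f(0.0, r) + f(float(n), r))

print([mu_f(n) for n in (1, 10, 100)])   # settles at 0.5 once n leaves the support
```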
If $\mu_n \overset{v}{\to} \mu$ for some probability measures $\mu_n$ on $\mathbb{R}^d$, we may still have $\mu \mathbb{R}^d < 1$, due to an escape of mass to infinity. To exclude this possibility, we need to assume that $(\mu_n)$ be tight.

Lemma 5.20 (vague and weak convergence) For any probability measures $\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$ with $\mu_n \overset{v}{\to} \mu$ for some measure $\mu$, we have $\mu \mathbb{R}^d = 1$ iff $(\mu_n)$ is tight, and then $\mu_n \overset{w}{\to} \mu$.

Proof: By a simple approximation, the vague convergence implies (13) for every bounded Borel set $B$, and in particular for the balls $B_r = \{x \in \mathbb{R}^d;\; |x| \le r\}$, $r > 0$. If $\mu \mathbb{R}^d = 1$, then $\mu B_r^\circ \to 1$ as $r \to \infty$, and the first inequality shows that $(\mu_n)$ is tight. Conversely, if $(\mu_n)$ is tight, then $\inf_n \mu_n B_r \to 1$ as $r \to \infty$, and the last inequality yields $\mu \mathbb{R}^d = 1$.

Now assume that $(\mu_n)$ is tight, and fix any bounded continuous function $f: \mathbb{R}^d \to \mathbb{R}$. For any $r > 0$, we may choose some $g_r \in C_K^+$ with $1_{B_r} \le g_r \le 1$ and note that
\[ |\mu_n f - \mu f| \le |\mu_n f - \mu_n (f g_r)| + |\mu_n (f g_r) - \mu (f g_r)| + |\mu (f g_r) - \mu f| \]
\[ \le |\mu_n (f g_r) - \mu (f g_r)| + \|f\|\, (\mu_n + \mu) B_r^c. \]
Here the right-hand side tends to zero as $n \to \infty$ and then $r \to \infty$, so $\mu_n f \to \mu f$. Hence, in this case $\mu_n \overset{w}{\to} \mu$. □

Combining the last two results, we may easily show that the notions of tightness and weak sequential compactness are equivalent. The result is extended in Theorem 16.3, which forms a starting point for the theory of weak convergence on function spaces.

Proposition 5.21 (tightness and weak sequential compactness) A sequence of probability measures on $\mathbb{R}^d$ is tight iff every subsequence has a weakly convergent further subsequence.

Proof: Fix any probability measures $\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$. By Theorem 5.19 every subsequence has a vaguely convergent further subsequence. If $(\mu_n)$ is tight, then by Lemma 5.20 the convergence holds even in the weak sense.

Now assume instead that $(\mu_n)$ has the stated property.
If it fails to be tight, we may choose a sequence $n_k \to \infty$ and some constant $\varepsilon > 0$ such that $\mu_{n_k} B_k^c > \varepsilon$ for all $k \in \mathbb{N}$. By hypothesis there exists some probability measure $\mu$ on $\mathbb{R}^d$ such that $\mu_{n_k} \overset{w}{\to} \mu$ along a subsequence $N' \subset \mathbb{N}$. The sequence $(\mu_{n_k};\; k \in N')$ is then tight by Lemma 4.8, and in particular there exists some $r > 0$ with $\mu_{n_k} B_r^c < \varepsilon$ for all $k \in N'$. For $k > r$ this is a contradiction, and the asserted tightness follows. □

We may now prove the desired extension of Theorem 5.3.
Theorem 5.22 (extended continuity theorem, Lévy, Bochner) Let $\mu_1, \mu_2, \dots$ be probability measures on $\mathbb{R}^d$ with $\hat\mu_n(t) \to \varphi(t)$ for every $t \in \mathbb{R}^d$, where the limit $\varphi$ is continuous at 0. Then $\mu_n \overset{w}{\to} \mu$ for some probability measure $\mu$ on $\mathbb{R}^d$ with $\hat\mu = \varphi$. A corresponding statement holds for the Laplace transforms of measures on $\mathbb{R}_+^d$.

Proof: Assume that $\hat\mu_n \to \varphi$, where the limit is continuous at 0. As in the proof of Theorem 5.3, we may conclude that $(\mu_n)$ is tight. Hence, by Proposition 5.21 there exists some probability measure $\mu$ on $\mathbb{R}^d$ such that $\mu_n \overset{w}{\to} \mu$ along a subsequence $N' \subset \mathbb{N}$. By continuity we get $\hat\mu_n \to \hat\mu$ along $N'$, and so $\varphi = \hat\mu$. Finally, the convergence $\mu_n \overset{w}{\to} \mu$ extends to $\mathbb{N}$ by Theorem 5.3. The proof for Laplace transforms is similar. □

Exercises

1. Show that if $\xi$ and $\eta$ are independent Poisson random variables, then $\xi + \eta$ is again Poisson. Also show that the Poisson property is preserved under convergence in distribution.

2. Show that any linear combination of independent Gaussian random variables is again Gaussian. Also show that the class of Gaussian distributions is preserved under weak convergence.

3. Show that $\varphi_r(t) = (1 - |t|/r)^+$ is a characteristic function for every $r > 0$. (Hint: Compute the Fourier transform $\hat\psi_r$ of the function $\psi_r(t) = 1\{|t| \le r\}$, and note that $\hat\psi_{r/2}^2$ is integrable. Now use Fourier inversion.)

4. Let $\varphi$ be a real, even function that is convex on $\mathbb{R}_+$ and satisfies $\varphi(0) = 1$ and $\varphi(\infty) \in [0, 1]$. Show that $\varphi$ is the characteristic function of some symmetric distribution on $\mathbb{R}$. In particular, $\varphi(t) = e^{-|t|^c}$ is a characteristic function for every $c \in (0, 1]$. (Hint: Approximate by convex combinations of functions $\varphi_r$ as above, and use Theorem 5.22.)

5. Show that if $\hat\mu$ is integrable, then $\mu$ has a bounded and continuous density. (Hint: Let $\varphi_r$ be the triangular density above.
Then $\hat{\hat\varphi}_r = 2\pi \varphi_r$, and so $\int e^{-itu}\, \hat\mu(t)\, \hat\varphi_r(t)\, dt = 2\pi \int \varphi_r(x - u)\, \mu(dx)$. Now let $r \to 0$.)

6. Show that a distribution $\mu$ is supported by some set $a\mathbb{Z} + b$ iff $|\hat\mu(t)| = 1$ for some $t \ne 0$.

7. Give an elementary proof of the continuity theorem for generating functions of distributions on $\mathbb{Z}_+$. (Hint: Note that if $\mu_n \overset{v}{\to} \mu$ for some distributions on $\mathbb{R}_+$, then the Laplace transforms satisfy $\hat\mu_n \to \hat\mu$ on $(0, \infty)$.)

8. The moment-generating function of a distribution $\mu$ on $\mathbb{R}$ is given by $\tilde\mu_t = \int e^{tx}\, \mu(dx)$. Assuming $\tilde\mu_t < \infty$ for all $t$ in some nondegenerate interval $I$, show that $\tilde\mu$ is analytic in the strip $\{z \in \mathbb{C};\; \Re z \in I^\circ\}$. (Hint: Approximate by measures with bounded support.)
9. Let $\mu, \mu_1, \mu_2, \dots$ be distributions on $\mathbb{R}$ with moment-generating functions $\tilde\mu, \tilde\mu_1, \tilde\mu_2, \dots$ such that $\tilde\mu_n \to \tilde\mu < \infty$ on some nondegenerate interval $I$. Show that $\mu_n \overset{w}{\to} \mu$. (Hint: If $\mu_n \overset{v}{\to} \nu$ along some subsequence $N'$, then $\tilde\mu_n \to \tilde\nu$ on $I^\circ$ along $N'$, and so $\tilde\nu = \tilde\mu$ on $I$. By the preceding exercise we get $\nu\mathbb{R} = 1$ and $\hat\nu = \hat\mu$. Thus, $\nu = \mu$.)

10. Let $\mu$ and $\nu$ be distributions on $\mathbb{R}$ with finite moments $\int x^n \mu(dx) = \int x^n \nu(dx) = m_n$, where $\sum_n t^n |m_n| / n! < \infty$ for some $t > 0$. Show that $\mu = \nu$. (Hint: The absolute moments satisfy the same relation for any smaller value of $t$, so the moment-generating functions exist and agree on $(-t, t)$.)

11. For each $n \in \mathbb{N}$, let $\mu_n$ be a distribution on $\mathbb{R}$ with finite moments $m_k^n$, $k \in \mathbb{N}$, such that $\lim_n m_k^n = a_k$ for some constants $a_k$ with $\sum_k t^k |a_k| / k! < \infty$ for some $t > 0$. Show that $\mu_n \overset{w}{\to} \mu$ for some distribution $\mu$ with moments $a_k$. (Hint: Each function $x^k$ is uniformly integrable with respect to the measures $\mu_n$. In particular, $(\mu_n)$ is tight. If $\mu_n \overset{w}{\to} \nu$ along some subsequence, then $\nu$ has moments $a_k$.)

12. Given a distribution $\mu$ on $\mathbb{R} \times \mathbb{R}_+$, introduce the mixed transform $\varphi(s, t) = \int e^{isx - ty}\, \mu(dx\, dy)$, where $s \in \mathbb{R}$ and $t \ge 0$. Prove versions for $\varphi$ of the continuity Theorems 5.3 and 5.22.

13. Consider a null array of random vectors $\xi_{nj} = (\xi_{nj}^1, \dots, \xi_{nj}^d)$ in $\mathbb{Z}_+^d$, let $\xi^1, \dots, \xi^d$ be independent Poisson variables with means $c_1, \dots, c_d$, and put $\xi = (\xi^1, \dots, \xi^d)$. Show that $\sum_j \xi_{nj} \overset{d}{\to} \xi$ iff $\sum_j P\{\xi_{nj}^k = 1\} \to c_k$ for all $k$ and $\sum_j P\{\sum_k \xi_{nj}^k > 1\} \to 0$. (Hint: Introduce independent random variables $\eta_{nj} \overset{d}{=} \xi_{nj}$, and note that $\sum_j \xi_{nj} \overset{d}{\to} \xi$ iff $\sum_j \eta_{nj} \overset{d}{\to} \xi$.)

14. Consider some random variables $\xi \perp\!\!\!\perp \eta$ with finite variance such that the distribution of $(\xi, \eta)$ is rotationally invariant. Show that $\xi$ is centered Gaussian. (Hint: Let $\xi_1, \xi_2, \dots$ be i.i.d. and distributed as $\xi$, and note that $n^{-1/2} \sum_{k \le n} \xi_k$ has the same distribution for all $n$. Now use Proposition 5.9.)

15. Prove a multivariate version of the Taylor expansion in Lemma 5.10.

16. Let $\mu$ have a finite $n$th moment $m_n$. Show that $\hat\mu$ is $n$ times continuously differentiable and satisfies $\hat\mu^{(n)}_0 = i^n m_n$. (Hint: Differentiate $n$ times under the integral sign.)

17. For $\mu$ and $m_n$ as above, show that $\hat\mu^{(2n)}_0$ exists iff $m_{2n} < \infty$. Also, characterize the distributions such that $\hat\mu^{(2n-1)}_0$ exists. (Hint: For $\hat\mu^{(2n)}_0$ proceed as in the proof of Proposition 5.9, and use Theorem 5.17. For $\hat\mu^{(2n-1)}_0$ use Theorem 5.16. Extend by induction to $n > 1$.)

18. Let $\mu$ be a distribution on $\mathbb{R}_+$ with moments $m_n$. Show that the Laplace transform satisfies $\hat\mu^{(n)}_0 = (-1)^n m_n$ whenever either side exists and is finite. (Hint: Prove the statement for $n = 1$, and extend by induction.)

19. Deduce Proposition 5.9 from Theorem 5.12.
20. Let the random variables $\xi$ and $\xi_{nj}$ be such as in Theorem 5.12, and assume that $\sum_j E|\xi_{nj}|^c \to 0$ for some $c > 2$. Show that $\sum_j \xi_{nj} \overset{d}{\to} \xi$.

21. Extend Theorem 5.12 to random vectors in $\mathbb{R}^d$, with the condition $\sum_j E\xi_{nj}^2 \to 1$ replaced by $\sum_j \mathrm{cov}(\xi_{nj}) \to a$, with $\xi$ as $N(0, a)$, and with $\xi_{nj}^2$ replaced by $|\xi_{nj}|^2$. (Hint: Use Corollary 5.5 to reduce to one dimension.)

22. Show that Theorem 5.15 remains true for random vectors in $\mathbb{R}^d$, with $\mathrm{var}[\xi_{nj};\; |\xi_{nj}| \le 1]$ replaced by the corresponding covariance matrix. (Hint: If $a, a_1, a_2, \dots$ are symmetric, nonnegative definite matrices, then $a_n \to a$ iff $u' a_n u \to u' a u$ for all $u \in \mathbb{R}^d$. To see this, use a compactness argument.)

23. Show that Theorems 5.7 and 5.15 remain valid for possibly infinite row-sums $\sum_j \xi_{nj}$. (Hint: Use Theorem 4.17 or 4.18 together with Theorem 4.28.)

24. Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. random variables. Show that $n^{-1/2} \sum_{k \le n} \xi_k$ converges in probability iff $\xi = 0$ a.s. (Hint: Use condition (iii) in Theorem 5.15.)

25. Let $\xi_1, \xi_2, \dots$ be i.i.d. $\mu$, and fix any $p \in (0, 2)$. Find a $\mu$ such that $n^{-1/p} \sum_{k \le n} \xi_k \to 0$ in probability but not a.s.

26. Let $\xi_1, \xi_2, \dots$ be i.i.d., and let $p > 0$ be such that $n^{-1/p} \sum_{k \le n} \xi_k \to 0$ in probability but not a.s. Show that $\limsup_n n^{-1/p} |\sum_{k \le n} \xi_k| = \infty$ a.s. (Hint: Note that $E|\xi_1|^p = \infty$.)

27. Give an example of a distribution with infinite second moment in the domain of attraction of the Gaussian law, and find the corresponding normalization.
Chapter 6

Conditioning and Disintegration

Conditional expectations and probabilities; regular conditional distributions; disintegration; conditional independence; transfer and coupling; existence of sequences and processes; extension through conditioning

Modern probability theory can be said to begin with the notions of conditioning and disintegration. In particular, conditional expectations and distributions are needed already for the definitions of martingales and Markov processes, the two basic dependence structures beyond independence and stationarity. Even in other areas and throughout probability theory, conditioning is constantly used as a basic tool to describe and analyze systems involving randomness. The notion may be thought of in terms of averaging, projection, and disintegration, viewpoints that are all essential for a proper understanding.

In all but the most elementary contexts, one defines conditioning with respect to a $\sigma$-field rather than a single event. In general, the result of the operation is not a constant but a random variable, measurable with respect to the given $\sigma$-field. The idea is familiar from elementary constructions of the conditional expectation $E[\xi|\eta]$, in cases where $(\xi, \eta)$ is a random vector with a nice density, and the result is obtained as a suitable function of $\eta$. This corresponds to conditioning on the $\sigma$-field $\mathcal{F} = \sigma(\eta)$.

The simplest and most intuitive general approach to conditioning is via projection. Here $E[\xi|\mathcal{F}]$ is defined for any $\xi \in L^2$ as the orthogonal Hilbert space projection of $\xi$ onto the linear subspace of $\mathcal{F}$-measurable random variables. The $L^2$-version extends immediately, by continuity, to arbitrary $\xi \in L^1$. From the orthogonality of the projection one gets the relation $E(\xi - E[\xi|\mathcal{F}])\zeta = 0$ for any bounded, $\mathcal{F}$-measurable random variable $\zeta$. This leads in particular to the familiar averaging characterization of $E[\xi|\mathcal{F}]$ as a version of the density $d(\xi \cdot P)/dP$ on the $\sigma$-field $\mathcal{F}$, the existence of which can also be inferred from the Radon-Nikodym theorem.

The conditional expectation is defined only up to a null set, in the sense that any two versions agree a.s. It is then natural to look for versions of the conditional probabilities $P[A|\mathcal{F}] = E[1_A|\mathcal{F}]$ that combine into a random probability measure on $\Omega$. In general, such regular versions exist only for $A$ restricted to suitable sub-$\sigma$-fields. The basic case is when $\xi$ is a random element in some Borel space $S$, and the conditional distribution $P[\xi \in \cdot\,|\mathcal{F}]$ may be constructed as an $\mathcal{F}$-measurable random measure on $S$. If we further assume that $\mathcal{F} = \sigma(\eta)$ for a random element $\eta$ in some space $T$, we may write $P[\xi \in B|\eta] = \mu(\eta, B)$ for some probability kernel $\mu$ from $T$ to $S$. This leads to a decomposition of the distribution of $(\xi, \eta)$ according to the values of $\eta$. The result is formalized in the disintegration theorem, a powerful extension of Fubini's theorem that is often used in subsequent chapters, especially in combination with the (strong) Markov property.

Using conditional distributions, we shall further establish the basic transfer theorem, which may be used to convert any distributional equivalence $\xi \overset{d}{=} f(\eta)$ into a corresponding a.s. representation $\xi = f(\tilde\eta)$ with a suitable $\tilde\eta \overset{d}{=} \eta$. From the latter result, one easily obtains the fundamental Daniell-Kolmogorov theorem, which ensures the existence of random sequences and processes with specified finite-dimensional distributions. A different approach is required for the more general Ionescu Tulcea extension, where the measure is specified by a sequence of conditional distributions.

Further topics treated in this chapter include the notion of conditional independence, which is fundamental for both Markov processes and exchangeability and also plays an important role in Chapter 21, in connection with SDEs. Especially useful in those contexts is the elementary but powerful chain rule. Let us finally call attention to the local property of conditional expectations, which in particular leads to simple and transparent proofs of the strong Markov and optional sampling theorems.

Returning to our construction of conditional expectations, let us fix a probability space $(\Omega, \mathcal{A}, P)$ and consider an arbitrary sub-$\sigma$-field $\mathcal{F} \subset \mathcal{A}$. In $L^2 = L^2(\mathcal{A})$ we may introduce the closed linear subspace $M$, consisting of all random variables $\eta \in L^2$ that agree a.s. with some element of $L^2(\mathcal{F})$. By the Hilbert space projection Theorem 1.33, there exists for every $\xi \in L^2$ an a.s. unique random variable $\eta \in M$ with $\xi - \eta \perp M$, and we define $E^{\mathcal{F}} \xi = E[\xi|\mathcal{F}]$ as an arbitrary $\mathcal{F}$-measurable version of $\eta$.

The $L^2$-projection $E^{\mathcal{F}}$ is easily extended to $L^1$, as follows.

Theorem 6.1 (conditional expectation, Kolmogorov) For any $\sigma$-field $\mathcal{F} \subset \mathcal{A}$ there exists an a.s. unique linear operator $E^{\mathcal{F}}: L^1 \to L^1(\mathcal{F})$ such that
(i) $E[E^{\mathcal{F}} \xi;\; A] = E[\xi;\; A]$, $\xi \in L^1$, $A \in \mathcal{F}$.
The following additional properties hold whenever the corresponding expressions exist for the absolute values:
(ii) $\xi \ge 0$ implies $E^{\mathcal{F}} \xi \ge 0$ a.s.;
(iii) $E|E^{\mathcal{F}} \xi| \le E|\xi|$;
(iv) $0 \le \xi_n \uparrow \xi$ implies $E^{\mathcal{F}} \xi_n \uparrow E^{\mathcal{F}} \xi$ a.s.;
(v) $E^{\mathcal{F}} \xi\eta = \xi\, E^{\mathcal{F}} \eta$ a.s. when $\xi$ is $\mathcal{F}$-measurable;
(vi) $E(\xi\, E^{\mathcal{F}} \eta) = E(\eta\, E^{\mathcal{F}} \xi) = E(E^{\mathcal{F}} \xi \cdot E^{\mathcal{F}} \eta)$;
(vii) $E^{\mathcal{F}} E^{\mathcal{G}} \xi = E^{\mathcal{F}} \xi$ a.s. for all $\mathcal{F} \subset \mathcal{G}$.
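On a finite probability space the projection picture becomes elementary; a minimal sketch (the 12-point space and partitions are illustrative choices): when $\mathcal{F}$ is generated by a partition, $E^{\mathcal{F}}\xi$ is the block average of $\xi$, and we can check the averaging property (i) and the chain rule (vii) for $\mathcal{F} \subset \mathcal{G}$ directly.

```python
import numpy as np

# E[xi | F] on a finite space with F generated by a partition: the block
# average of xi, i.e. the orthogonal L^2 projection onto partition-
# measurable variables.  We check (i) and (vii) of Theorem 6.1.
rng = np.random.default_rng(3)
m = 12
p = np.full(m, 1.0 / m)                         # uniform P on 12 points
xi = rng.normal(size=m)
part_F = [range(0, 6), range(6, 12)]            # coarse partition: F
part_G = [range(0, 3), range(3, 6), range(6, 9), range(9, 12)]  # finer: G

def cond_exp(x, partition):
    out = np.empty_like(x)
    for blk in partition:
        idx = list(blk)
        out[idx] = np.average(x[idx], weights=p[idx])
    return out

eF, eG = cond_exp(xi, part_F), cond_exp(xi, part_G)
A = list(range(0, 6))                           # an event A in F
print(np.dot(p[A], eF[A]) - np.dot(p[A], xi[A]))   # (i): zero up to rounding
print(np.max(np.abs(cond_exp(eG, part_F) - eF)))   # (vii): zero up to rounding
```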
In particular, we note that $E^{\mathcal{F}} \xi = \xi$ a.s. iff $\xi$ has an $\mathcal{F}$-measurable version and that $E^{\mathcal{F}} \xi = E\xi$ a.s. when $\xi \perp\!\!\!\perp \mathcal{F}$. We shall often refer to (i) as the averaging property, to (ii) as the positivity, to (iii) as the $L^1$-contractivity, to (iv) as the monotone convergence property, to (v) as the pull-out property, to (vi) as the self-adjointness, and to (vii) as the chain rule. Since the operator $E^{\mathcal{F}}$ is both self-adjoint by (vi) and idempotent by (vii), it may be thought of as a generalized projection on $L^1$.

The existence of $E^{\mathcal{F}}$ is an immediate consequence of the Radon-Nikodym Theorem 2.10. However, we prefer the following elementary construction from the $L^2$-version.

Proof of Theorem 6.1: First assume that $\xi \in L^2$, and define $E^{\mathcal{F}} \xi$ by projection as above. For any $A \in \mathcal{F}$ we get $\xi - E^{\mathcal{F}} \xi \perp 1_A$, and (i) follows. Taking $A = \{E^{\mathcal{F}} \xi > 0\}$, we get in particular
\[ E|E^{\mathcal{F}} \xi| = E[E^{\mathcal{F}} \xi;\; A] - E[E^{\mathcal{F}} \xi;\; A^c] = E[\xi;\; A] - E[\xi;\; A^c] \le E|\xi|, \]
which proves (iii). Thus, the mapping $E^{\mathcal{F}}$ is uniformly $L^1$-continuous on $L^2$. Also note that $L^2$ is dense in $L^1$ by Lemma 1.11 and that $L^1$ is complete by Lemma 1.31. Hence, $E^{\mathcal{F}}$ extends a.s. uniquely to a linear and continuous mapping on $L^1$. Properties (i) and (iii) extend by continuity to $L^1$, and from Lemma 1.24 we note that $E^{\mathcal{F}} \xi$ is a.s. determined by (i).

If $\xi \ge 0$, we see from (i) with $A = \{E^{\mathcal{F}} \xi < 0\}$ together with Lemma 1.24 that $E^{\mathcal{F}} \xi \ge 0$ a.s., which proves (ii). If $0 \le \xi_n \uparrow \xi$, then $\xi_n \to \xi$ in $L^1$ by dominated convergence, so by (iii) we get $E^{\mathcal{F}} \xi_n \to E^{\mathcal{F}} \xi$ in $L^1$. Now the sequence $(E^{\mathcal{F}} \xi_n)$ is a.s. nondecreasing by (ii), and so by Lemma 4.2 the convergence remains true in the a.s. sense. This proves (iv).

Property (vi) is obvious when $\xi, \eta \in L^2$, and it extends to the general case by means of (iv). To prove (v), we note from the characterization in (i) that $E^{\mathcal{F}} \xi = \xi$ a.s. when $\xi$ is $\mathcal{F}$-measurable. In the general case we need to show that
\[ E[\xi\eta;\; A] = E[\xi\, E^{\mathcal{F}} \eta;\; A], \quad A \in \mathcal{F}, \]
which follows immediately from (vi).
Finally, property (vii) is obvious for $\xi \in L^2$ since $L^2(\mathcal{F}) \subset L^2(\mathcal{G})$, and it extends to the general case by means of (iv). □

The next result shows that the conditional expectation $E^{\mathcal{F}} \xi$ is local in both $\xi$ and $\mathcal{F}$, an observation that simplifies many proofs. Given two $\sigma$-fields $\mathcal{F}$ and $\mathcal{G}$, we say that $\mathcal{F} = \mathcal{G}$ on $A$ if $A \in \mathcal{F} \cap \mathcal{G}$ and $A \cap \mathcal{F} = A \cap \mathcal{G}$.

Lemma 6.2 (local property) Let the $\sigma$-fields $\mathcal{F}, \mathcal{G} \subset \mathcal{A}$ and functions $\xi, \eta \in L^1$ be such that $\mathcal{F} = \mathcal{G}$ and $\xi = \eta$ a.s. on some set $A \in \mathcal{F} \cap \mathcal{G}$. Then $E^{\mathcal{F}} \xi = E^{\mathcal{G}} \eta$ a.s. on $A$.

Proof: Since $1_A E^{\mathcal{F}} \xi$ and $1_A E^{\mathcal{G}} \eta$ are $\mathcal{F} \cap \mathcal{G}$-measurable, we get $B \equiv A \cap \{E^{\mathcal{F}} \xi > E^{\mathcal{G}} \eta\} \in \mathcal{F} \cap \mathcal{G}$, and the averaging property yields
\[ E[E^{\mathcal{F}} \xi;\; B] = E[\xi;\; B] = E[\eta;\; B] = E[E^{\mathcal{G}} \eta;\; B]. \]
Hence, E^F ξ ≤ E^G η a.s. on A by Lemma 1.24. The opposite inequality is obtained by interchanging the roles of (ξ, F) and (η, G). □

The conditional probability of an event A ∈ A, given a σ-field F, is defined as

P^F A = E^F 1_A  or  P[A|F] = E[1_A|F],  A ∈ A.

Thus, P^F A is the a.s. unique random variable in L¹(F) satisfying

E[P^F A; B] = P(A ∩ B),  B ∈ F.

Note that P^F A = PA a.s. iff A ⊥⊥ F, and that P^F A = 1_A a.s. iff A agrees a.s. with a set in F. The positivity of E^F implies 0 ≤ P^F A ≤ 1 a.s., and the monotone convergence property gives

P^F ∪_n A_n = Σ_n P^F A_n a.s.,  A_1, A_2, … ∈ A disjoint.  (1)

However, the random set function P^F is not a measure in general, since the exceptional null set in (1) may depend on the sequence (A_n).

If η is a random element in some measurable space (S, S), we define conditioning on η as conditioning with respect to the induced σ-field σ(η). Thus,

E^η ξ = E^{σ(η)} ξ,  P^η A = P^{σ(η)} A,

or

E[ξ|η] = E[ξ|σ(η)],  P[A|η] = P[A|σ(η)].

By Lemma 1.13, the η-measurable function E^η ξ may be represented in the form f(η), where f is a measurable function on S, determined a.e. L(η) by the averaging property

E[f(η); η ∈ B] = E[ξ; η ∈ B],  B ∈ S.

In particular, the function f depends only on the distribution of (ξ, η). The situation for P^η A is similar. Conditioning with respect to a σ-field F is the special case when η is the identity map from (Ω, A) to (Ω, F).

Motivated by (1), we proceed to examine the existence of measure-valued versions of the functions P^F and P^η. Recall from Chapter 1 that a kernel between two measurable spaces (T, T) and (S, S) is a function μ: T × S → ℝ₊ such that μ(t, B) is T-measurable in t ∈ T for fixed B ∈ S and a measure in B ∈ S for fixed t ∈ T. We say that μ is a probability kernel if μ(t, S) = 1 for all t. Kernels on the basic probability space Ω are called random measures. Now fix a σ-field F ⊂ A and a random element ξ in some measurable space (S, S).
By a regular conditional distribution of ξ, given F, we mean a version of the function P[ξ ∈ · |F] on Ω × S which is a probability kernel from (Ω, F) to (S, S), hence an F-measurable random probability measure on S. More generally, if η is another random element in some measurable
space (T, T), a regular conditional distribution of ξ, given η, is defined as a random measure of the form

μ(η, B) = P[ξ ∈ B|η] a.s.,  B ∈ S,  (2)

where μ is a probability kernel from T to S. In the extreme cases when ξ is F-measurable or independent of F, we note that P[ξ ∈ B|F] has the regular version 1{ξ ∈ B} or P{ξ ∈ B}, respectively. The general case requires some regularity conditions on the space S.

Theorem 6.3 (conditional distribution) For any Borel space S and measurable space T, let ξ and η be random elements in S and T, respectively. Then there exists a probability kernel μ from T to S satisfying P[ξ ∈ ·|η] = μ(η, ·) a.s., and μ is unique a.e. L(η).

Proof: We may assume that S ∈ B(ℝ). For every r ∈ ℚ we may choose some measurable function f_r = f(·, r): T → [0, 1] such that

f(η, r) = P[ξ ≤ r|η] a.s.,  r ∈ ℚ.  (3)

Let A be the set of all t ∈ T such that f(t, r) is nondecreasing in r ∈ ℚ with limits 1 and 0 at ±∞. Since A is specified by countably many measurable conditions, each of which holds a.s. at η, we have A ∈ T and η ∈ A a.s. Now define

F(t, x) = 1_A(t) inf_{r > x} f(t, r) + 1_{A^c}(t) 1{x ≥ 0},  x ∈ ℝ, t ∈ T,

and note that F(t, ·) is a distribution function on ℝ for every t ∈ T. Hence, by Proposition 2.14 there exist some probability measures m(t, ·) on ℝ with

m(t, (−∞, x]) = F(t, x),  x ∈ ℝ, t ∈ T.

The function F(t, x) is clearly measurable in t for each x, and by a monotone class argument it follows that m is a kernel from T to ℝ. By (3) and the monotone convergence property of E^η, we have

m(η, (−∞, x]) = F(η, x) = P[ξ ≤ x|η] a.s.,  x ∈ ℝ.

Using a monotone class argument based on the a.s. monotone convergence property, we may extend the last relation to

m(η, B) = P[ξ ∈ B|η] a.s.,  B ∈ B(ℝ).  (4)

In particular, we get m(η, S^c) = 0 a.s., and so (4) remains true on S = B(ℝ) ∩ S with m replaced by the kernel μ(t, ·)
= m(t, ·) 1{m(t, S) = 1} + δ_s 1{m(t, S) < 1},  t ∈ T,

where s ∈ S is arbitrary. If μ′ is another kernel with the stated property, then

μ(η, (−∞, r]) = P[ξ ≤ r|η] = μ′(η, (−∞, r]) a.s.,  r ∈ ℚ,

and a monotone class argument yields μ(η, ·) = μ′(η, ·) a.s. □
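For finitely supported random elements, the kernel of Theorem 6.3 can be written down by elementary division, and the defining relation P[ξ ∈ ·|η] = μ(η, ·) reduces to P{ξ ∈ B, η = t} = μ(t, B) P{η = t}. The following sketch verifies this on a small joint distribution of our own choosing (the names and numbers are illustrative, not from the text).

```python
from fractions import Fraction as F

# Joint distribution of (xi, eta) on {0,1} x {'a','b'} -- a toy example.
joint = {(0, 'a'): F(1, 8), (1, 'a'): F(3, 8),
         (0, 'b'): F(1, 4), (1, 'b'): F(1, 4)}

eta_marginal = {}
for (x, t), p in joint.items():
    eta_marginal[t] = eta_marginal.get(t, F(0)) + p

# Regular conditional distribution: a probability kernel mu(t, .) on S.
def mu(t, B):
    return sum(joint[(x, t)] for x in (0, 1) if x in B) / eta_marginal[t]

# Kernel property: mu(t, .) is a probability measure for every t.
for t in ('a', 'b'):
    assert mu(t, {0, 1}) == 1

# Averaging property: P{xi in B, eta = t} = mu(t, B) * P{eta = t}.
B = {1}
for t in ('a', 'b'):
    lhs = sum(p for (x, s), p in joint.items() if s == t and x in B)
    assert lhs == mu(t, B) * eta_marginal[t]
```

The division by P{η = t} is exactly where the finite case is easy; the content of Theorem 6.3 is that a consistent family of such conditional laws still exists when no atoms are available to divide by.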
Our next aim is to extend Fubini's theorem, by showing how ordinary and conditional expectations can be computed by integration with respect to suitable conditional distributions. The result may be regarded as a disintegration of measures on a product space into their one-dimensional components.

Theorem 6.4 (disintegration) Fix two measurable spaces S and T, a σ-field F ⊂ A, and a random element ξ in S such that P[ξ ∈ ·|F] has a regular version ν. Further consider an F-measurable random element η in T and a measurable function f on S × T with E|f(ξ, η)| < ∞. Then

E[f(ξ, η)|F] = ∫ ν(ds) f(s, η) a.s.  (5)

The a.s. existence and F-measurability of the integral on the right should be regarded as part of the assertion. In the special case when F = σ(η) and P[ξ ∈ ·|η] = μ(η, ·) for some probability kernel μ from T to S, (5) becomes

E[f(ξ, η)|η] = ∫ μ(η, ds) f(s, η) a.s.  (6)

Integrating (5) and (6), we get the commonly used formulas

Ef(ξ, η) = E ∫ ν(ds) f(s, η) = E ∫ μ(η, ds) f(s, η).  (7)

If ξ ⊥⊥ η, we may take μ(η, ·) = L(ξ), and (7) reduces to the relation in Lemma 3.11.

Proof of Theorem 6.4: If B ∈ S and C ∈ T, we may use the averaging property of conditional expectations to get

P{ξ ∈ B, η ∈ C} = E[P[ξ ∈ B|F]; η ∈ C] = E[νB; η ∈ C] = E ∫ ν(ds) 1{s ∈ B, η ∈ C},

which proves the first relation in (7) for f = 1_{B×C}. The formula extends, along with the measurability of the inner integral on the right, first by a monotone class argument to all measurable indicator functions, and then by linearity and monotone convergence to any measurable function f ≥ 0.

Now fix a measurable function f: S × T → ℝ₊ with Ef(ξ, η) < ∞, and let A ∈ F be arbitrary. Regarding (η, 1_A) as an F-measurable random element in T × {0, 1}, we may conclude from (7) that

E[f(ξ, η); A] = E ∫ ν(ds) f(s, η) 1_A,  A ∈ F.

This proves (5) for f ≥ 0, and the general result follows by taking differences.
□

Applying (7) to functions of the form f(ξ), we may extend many properties of ordinary expectations to a conditional setting. In particular, such
extensions hold for the Jensen, Hölder, and Minkowski inequalities. The first of those implies the L^p-contractivity

‖E^F ξ‖_p ≤ ‖ξ‖_p,  ξ ∈ L^p, p ≥ 1.

Considering conditional distributions of entire sequences (ξ, ξ_1, ξ_2, …), we may further derive conditional versions of the basic continuity properties of ordinary integrals. The following result plays an important role in Chapter 7.

Lemma 6.5 (uniform integrability, Doob) For any ξ ∈ L¹, the conditional expectations E[ξ|F], F ⊂ A, are uniformly integrable.

Proof: By Jensen's inequality and the self-adjointness property,

E[|E^F ξ|; A] ≤ E[E^F |ξ|; A] = E[|ξ| P^F A],  A ∈ A,

and by Lemma 4.10 we need to show that this tends to zero as PA → 0, uniformly in F. By dominated convergence along subsequences, it is then enough to show that P^{F_n} A_n → 0 for any σ-fields F_n ⊂ A and sets A_n ∈ A with PA_n → 0. But this is clear, since E P^{F_n} A_n = PA_n → 0. □

Turning to the topic of conditional independence, consider any sub-σ-fields F_1, …, F_n, G ⊂ A. Imitating the definition of ordinary independence, we say that F_1, …, F_n are conditionally independent, given G, if

P^G ∩_{k≤n} B_k = ∏_{k≤n} P^G B_k a.s.,  B_k ∈ F_k, k = 1, …, n.

For infinite collections of σ-fields F_t, t ∈ T, the same property is required for every finite subcollection F_{t_1}, …, F_{t_n} with distinct indices t_1, …, t_n ∈ T. We use the symbol ⊥⊥_G to denote pairwise conditional independence, given some σ-field G.

Conditional independence involving events A_t or random elements ξ_t, t ∈ T, is defined as before in terms of the induced σ-fields σ(A_t) or σ(ξ_t), respectively, and the notation involving ⊥⊥ carries over to this case. In particular, we note that any F-measurable random elements ξ_t are conditionally independent, given F. If the ξ_t are instead independent of F, then their conditional independence, given F, is equivalent to ordinary independence between the ξ_t.
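To make the definition concrete, here is a minimal discrete check (a toy example of our own, not from the text): for G = σ(ζ) generated by a fair coin ζ, two coins ξ_1, ξ_2 that are i.i.d. given ζ, with success probability depending on ζ, satisfy the product formula P^G(B_1 ∩ B_2) = (P^G B_1)(P^G B_2) on every atom of G, even though ξ_1 and ξ_2 fail to be unconditionally independent.

```python
from fractions import Fraction as F

p = {0: F(1, 4), 1: F(3, 4)}           # success probability given zeta = z

def cond_prob(x1, x2, z):               # P[xi1=x1, xi2=x2 | zeta=z]
    def one(x):
        return p[z] if x == 1 else 1 - p[z]
    return one(x1) * one(x2)

# Conditional independence given sigma(zeta): product form on each atom.
for z in (0, 1):
    for x1 in (0, 1):
        for x2 in (0, 1):
            m1 = sum(cond_prob(x1, y, z) for y in (0, 1))
            m2 = sum(cond_prob(y, x2, z) for y in (0, 1))
            assert cond_prob(x1, x2, z) == m1 * m2

# Unconditionally the product form fails: averaging over zeta mixes regimes.
def prob(x1, x2):                       # P[xi1=x1, xi2=x2], with zeta fair
    return (cond_prob(x1, x2, 0) + cond_prob(x1, x2, 1)) / 2

m1 = sum(prob(1, y) for y in (0, 1))
m2 = sum(prob(y, 1) for y in (0, 1))
assert prob(1, 1) != m1 * m2
```

The failure in the last assertion is exactly the remark above in reverse: the ξ_t are not independent of σ(ζ), so conditional independence given ζ carries no unconditional information.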
By Theorem 6.3, any general statement or formula involving independencies between countably many random elements in some Borel spaces has a conditional counterpart. For example, we see from Lemma 3.8 that the σ-fields F_1, F_2, … are conditionally independent, given some G, iff

(F_1, …, F_n) ⊥⊥_G F_{n+1},  n ∈ ℕ.

Much more can be said in the conditional case, and we begin with a fundamental characterization. Here and below, F, G, … with or without subscripts denote sub-σ-fields of A.
Proposition 6.6 (conditional independence, Doob) For any σ-fields F, G, and H, we have F ⊥⊥_G H iff

P[H|F, G] = P[H|G] a.s.,  H ∈ H.  (8)

Proof: Assuming (8) and using the chain and pull-out properties of conditional expectations, we get for any F ∈ F and H ∈ H

P^G(F ∩ H) = E^G P^{F∨G}(F ∩ H) = E^G[P^{F∨G} H; F] = E^G[P^G H; F] = (P^G F)(P^G H),

which shows that F ⊥⊥_G H. Conversely, assuming F ⊥⊥_G H and using the chain and pull-out properties, we get for any F ∈ F, G ∈ G, and H ∈ H

E[P^G H; F ∩ G] = E[(P^G F)(P^G H); G] = E[P^G(F ∩ H); G] = P(F ∩ G ∩ H).

By a monotone class argument, this extends to

E[P^G H; A] = P(H ∩ A),  A ∈ F ∨ G,

and (8) follows by the averaging characterization of P^{F∨G} H. □

From the last result we may easily deduce some further useful properties. Let G̅ denote the completion of G with respect to the basic σ-field A, generated by G and the family N = {N ⊂ A; A ∈ A, PA = 0}.

Corollary 6.7 For any σ-fields F, G, and H, we have
(i) F ⊥⊥_G H iff F ⊥⊥_G (G, H);
(ii) F ⊥⊥_G F iff F ⊂ G̅.

Proof: (i) By Proposition 6.6, both relations are equivalent to

P[F|G, H] = P[F|G] a.s.,  F ∈ F.

(ii) If F ⊥⊥_G F, then by Proposition 6.6

1_F = P[F|F, G] = P[F|G] a.s.,  F ∈ F,

which implies F ⊂ G̅. Conversely, the latter relation yields

P[F|G] = P[F|G̅] = 1_F = P[F|F, G] a.s.,  F ∈ F,

and so F ⊥⊥_G F by Proposition 6.6. □

The following result is often applied in both directions.
Proposition 6.8 (chain rule) For any σ-fields G, H, and F_1, F_2, …, these conditions are equivalent:
(i) H ⊥⊥_G (F_1, F_2, …);
(ii) H ⊥⊥_{G, F_1, …, F_n} F_{n+1},  n ≥ 0.

In particular, we have the commonly used equivalence

H ⊥⊥_G (F, F′)  ⟺  H ⊥⊥_G F,  H ⊥⊥_{G,F} F′.

Proof: Assuming (i), we get by Proposition 6.6 for any H ∈ H and n ≥ 0

P[H|G, F_1, …, F_n] = P[H|G] = P[H|G, F_1, …, F_{n+1}],

and (ii) follows by another application of Proposition 6.6. Now assume (ii) instead, and conclude by Proposition 6.6 that for any H ∈ H

P[H|G, F_1, …, F_n] = P[H|G, F_1, …, F_{n+1}],  n ≥ 0.

Summing over n < m gives

P[H|G] = P[H|G, F_1, …, F_m],  m ≥ 1,

and so by Proposition 6.6 we have H ⊥⊥_G (F_1, …, F_m) for all m ≥ 1, which extends to (i) by a monotone class argument. □

The last result is even useful for establishing ordinary independence. In fact, taking G = {∅, Ω} in Proposition 6.8, we see that H ⊥⊥ (F_1, F_2, …) iff

H ⊥⊥_{F_1, …, F_n} F_{n+1},  n ≥ 0.

Our next aim is to show how regular conditional distributions can be used to construct random elements with desired properties. This may require an extension of the basic probability space. By an extension of (Ω, A, P) we mean a product space (Ω̂, Â) = (Ω × S, A ⊗ S), equipped with a probability measure P̂ satisfying P̂(· × S) = P. Any random element ξ on Ω may be regarded as a function on Ω̂. Thus, we may formally replace ξ by the random element ξ̂(ω, s) = ξ(ω), which clearly has the same distribution. For extensions of this type, we may retain our original notation and write P and ξ instead of P̂ and ξ̂.

We begin with an elementary extension suggested by Theorem 6.4. The result is needed for various constructions in Chapter 12.
Lemma 6.9 (extension) Fix a probability kernel μ between two measurable spaces S and T, and let ξ be a random element in S. Then there exists a random element η in T, defined on some extension of the original probability space Ω, such that P[η ∈ ·|ξ] = μ(ξ, ·) a.s. and also η ⊥⊥_ξ ζ for every random element ζ on Ω.

Proof: Put (Ω̂, Â) = (Ω × T, A ⊗ T), where T denotes the σ-field in T, and define a probability measure P̂ on Ω̂ by

P̂A = E ∫ 1_A(·, t) μ(ξ, dt),  A ∈ Â.

Then clearly P̂(· × T) = P, and the random element η(ω, t) = t on Ω̂ satisfies P[η ∈ ·|A] = μ(ξ, ·) a.s. In particular, we get η ⊥⊥_ξ A by Proposition 6.6, and so η ⊥⊥_ξ ζ. □

For most constructions we need only a single randomization variable. By this we mean a U(0, 1) random variable ϑ that is independent of all previously introduced random elements and σ-fields. The basic probability space is henceforth assumed to be rich enough to support any randomization variables we may need. This involves no essential loss of generality, since we can always get the condition fulfilled by a simple extension of the original space. In fact, it suffices to take Ω̂ = Ω × [0, 1], Â = A ⊗ B[0, 1], P̂ = P ⊗ λ, where λ denotes Lebesgue measure on [0, 1]. Then ϑ(ω, t) = t is U(0, 1) on Ω̂ with ϑ ⊥⊥ A. By Lemma 3.21 we may use ϑ to produce a whole sequence of independent randomization variables ϑ_1, ϑ_2, … if required.

The following basic result shows how a probabilistic structure can be carried over from one context to another by means of a suitable randomization. Constructions of this type are frequently employed in the sequel.

Theorem 6.10 (transfer) For any measurable space S and Borel space T, let ξ =d ξ̃ and η be random elements in S and T, respectively. Then there exists a random element η̃ in T with (ξ̃, η̃) =d (ξ, η). More precisely, there exists a measurable function f: S × [0, 1] → T such that we may take η̃ = f(ξ̃, ϑ) whenever ϑ ⊥⊥ ξ̃ is U(0, 1).
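The function f in the transfer theorem can often be realized as a conditional quantile transform: for each s, f(s, ·) is the generalized inverse of the distribution function of the conditional law μ(s, ·), so that f(s, ϑ) has distribution μ(s, ·) when ϑ is U(0, 1). A sketch for a finite target space, with a kernel of our own choosing (names and values are illustrative):

```python
from fractions import Fraction as F

# A probability kernel mu from S = {'lo','hi'} to T = {0,1,2}.
mu = {'lo': [F(1, 2), F(1, 3), F(1, 6)],
      'hi': [F(1, 6), F(1, 3), F(1, 2)]}

def f(s, u):
    """Quantile transform: the smallest t with F_s(t) >= u, for u in (0,1]."""
    acc = F(0)
    for t, w in enumerate(mu[s]):
        acc += w
        if u <= acc:
            return t
    return len(mu[s]) - 1

# f(s, .) is constant = t on the interval (F_s(t-1), F_s(t)], so the
# Lebesgue measure of {u: f(s, u) = t} equals mu[s][t]: the image of a
# uniform variable under f(s, .) has exactly the distribution mu(s, .).
for s in mu:
    acc = F(0)
    for t, w in enumerate(mu[s]):
        assert f(s, acc + w / 2) == t    # an interior point of the interval
        assert f(s, acc + w) == t        # the right endpoint
        acc += w
```

Feeding the same uniform variable ϑ through f(ξ̃, ·) for a varying first argument is what couples η̃ to ξ̃ with the prescribed joint law.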
Proof: By Theorem 6.3 there exists a probability kernel μ from S to T satisfying μ(ξ, B) = P[η ∈ B|ξ], B ∈ B[0, 1], and by Lemma 3.22 we may choose a measurable function f: S × [0, 1] → T such that f(s, ϑ) has distribution μ(s, ·) for every s ∈ S. Define η̃ = f(ξ̃, ϑ). Using Lemmas 1.22 and 3.11 together with Theorem 6.4, we get for any
measurable function g: S × [0, 1] → ℝ₊

E g(ξ̃, η̃) = E g(ξ̃, f(ξ̃, ϑ)) = E ∫ g(ξ̃, f(ξ̃, u)) du = E ∫ g(ξ̃, t) μ(ξ̃, dt) = E g(ξ, η),

which shows that (ξ̃, η̃) =d (ξ, η). □

The following version of the last result is often useful to transfer representations of random objects.

Corollary 6.11 (stochastic equations) Fix two Borel spaces S and T, a measurable mapping f: T → S, and some random elements ξ in S and η in T with ξ =d f(η). Then there exists a random element η̃ =d η in T with ξ = f(η̃) a.s.

Proof: By Theorem 6.10 there exists some random element η̃ in T with (ξ, η̃) =d (f(η), η). In particular, η̃ =d η and (ξ, f(η̃)) =d (f(η), f(η)). Since the diagonal in S² is measurable, we get

P{ξ = f(η̃)} = P{f(η) = f(η)} = 1,

and so ξ = f(η̃) a.s. □

The last result leads in particular to a useful extension of Theorem 4.30.

Corollary 6.12 (extended Skorohod coupling) Let f, f_1, f_2, … be measurable functions from a Borel space S to a Polish space T, and let ξ, ξ_1, ξ_2, … be random elements in S with f_n(ξ_n) →d f(ξ). Then there exist some random elements ξ̃ =d ξ and ξ̃_n =d ξ_n such that f_n(ξ̃_n) → f(ξ̃) a.s.

Proof: By Theorem 4.30 there exist some η =d f(ξ) and η_n =d f_n(ξ_n) with η_n → η a.s. By Corollary 6.11 we may further choose some ξ̃ =d ξ and ξ̃_n =d ξ_n such that a.s. f(ξ̃) = η and f_n(ξ̃_n) = η_n for all n. But then f_n(ξ̃_n) → f(ξ̃) a.s. □

The next result clarifies the relationship between randomizations and conditional independence. Important applications appear in Chapters 8, 12, and 21.

Proposition 6.13 (conditional independence and randomization) Let ξ, η, and ζ be random elements in some measurable spaces S, T, and U, respectively, where S is Borel. Then ξ ⊥⊥_η ζ iff ξ = f(η, ϑ) a.s. for some measurable function f: T × [0, 1] → S and some U(0, 1) random variable ϑ ⊥⊥ (η, ζ).

Proof: First assume that ξ = f(η, ϑ) a.s., where f is measurable and ϑ ⊥⊥ (η, ζ).
Then Proposition 6.8 yields ϑ ⊥⊥_η ζ, and so (η, ϑ) ⊥⊥_η ζ by Corollary 6.7, which implies ξ ⊥⊥_η ζ.

Conversely, assume that ξ ⊥⊥_η ζ, and let ϑ ⊥⊥ (η, ζ) be U(0, 1). By Theorem 6.10 there exists some measurable function f: T × [0, 1] → S such that the
random element ξ̃ = f(η, ϑ) satisfies ξ̃ =d ξ and (ξ̃, η) =d (ξ, η). By the sufficiency part, we further note that ξ̃ ⊥⊥_η ζ. Hence, by Proposition 6.6,

P[ξ̃ ∈ ·|η, ζ] = P[ξ̃ ∈ ·|η] = P[ξ ∈ ·|η] = P[ξ ∈ ·|η, ζ],

and so (ξ̃, η, ζ) =d (ξ, η, ζ). By Theorem 6.10 we may choose some ϑ̃ =d ϑ with (ξ, η, ζ, ϑ̃) =d (ξ̃, η, ζ, ϑ). In particular, ϑ̃ ⊥⊥ (η, ζ) and (ξ, f(η, ϑ̃)) =d (ξ̃, f(η, ϑ)). Since ξ̃ = f(η, ϑ) and the diagonal in S² is measurable, we get ξ = f(η, ϑ̃) a.s., and so the stated condition holds with ϑ̃ in place of ϑ. □

We may use the transfer theorem to construct random sequences or processes with given finite-dimensional distributions. Given any measurable spaces S_1, S_2, …, we say that a sequence of probability measures μ_n on S_1 × ⋯ × S_n, n ∈ ℕ, is projective if

μ_{n+1}(· × S_{n+1}) = μ_n,  n ∈ ℕ.  (9)

Theorem 6.14 (existence of random sequences, Daniell) Given a projective sequence of probability measures μ_n on S_1 × ⋯ × S_n, n ∈ ℕ, where S_2, S_3, … are Borel, there exist some random elements ξ_n in S_n, n ∈ ℕ, such that L(ξ_1, …, ξ_n) = μ_n for all n.

Proof: By Lemmas 3.10 and 3.21 there exist some independent random variables ξ_1, ϑ_2, ϑ_3, … such that L(ξ_1) = μ_1 and the ϑ_n are i.i.d. U(0, 1). We proceed to construct ξ_2, ξ_3, … recursively with the desired properties, such that each ξ_n is a measurable function of ξ_1, ϑ_2, …, ϑ_n. Assuming that ξ_1, …, ξ_n have already been constructed, let η_1, …, η_{n+1} be arbitrary with joint distribution μ_{n+1}. The projective property yields (ξ_1, …, ξ_n) =d (η_1, …, η_n), and so by Theorem 6.10 we may form ξ_{n+1} as a measurable function of ξ_1, …, ξ_n, ϑ_{n+1} such that (ξ_1, …, ξ_{n+1}) =d (η_1, …, η_{n+1}). This completes the recursion. □

The last theorem may be used to extend a process from bounded to unbounded domains. We state the result in an abstract form, designed to fulfill the needs of Chapters 18 and 24.
Let I denote the identity mapping on any space.

Corollary 6.15 (projective limit) For any Borel spaces S, S_1, S_2, …, consider some measurable mappings π_n: S → S_n and π^n_k: S_n → S_k, k < n, such that

π^n_k = π^m_k ∘ π^n_m,  k < m < n.  (10)

Let Ŝ denote the set of sequences (s_1, s_2, …) ∈ S_1 × S_2 × ⋯ with π^n_k s_n = s_k for all k < n, and suppose there exists a measurable mapping h: Ŝ → S satisfying (π_1, π_2, …) ∘ h = I on Ŝ. Then for any probability measures μ_n on S_n with μ_n ∘ (π^n_k)^{−1} = μ_k for all k < n, there exists a probability measure μ on S such that μ ∘ π_n^{−1} = μ_n for all n.
Proof: Introduce the measures

μ̂_n = μ_n ∘ (π^n_1, …, π^n_n)^{−1},  n ∈ ℕ,  (11)

and conclude from (10) and the relation between the μ_n that

μ̂_{n+1}(· × S_{n+1}) = μ_{n+1} ∘ (π^{n+1}_1, …, π^{n+1}_n)^{−1} = μ_{n+1} ∘ (π^{n+1}_n)^{−1} ∘ (π^n_1, …, π^n_n)^{−1} = μ_n ∘ (π^n_1, …, π^n_n)^{−1} = μ̂_n.

By Theorem 6.14 there exists some measure μ̂ on S_1 × S_2 × ⋯ with

μ̂ ∘ (π̃_1, …, π̃_n)^{−1} = μ̂_n,  n ∈ ℕ,  (12)

where π̃_1, π̃_2, … denote the coordinate projections in S_1 × S_2 × ⋯. From (10)–(12) we see that μ̂ is restricted to Ŝ, which allows us to define μ = μ̂ ∘ h^{−1}. It remains to note that

μ ∘ π_n^{−1} = μ̂ ∘ (π_n ∘ h)^{−1} = μ̂ ∘ π̃_n^{−1} = μ̂_n ∘ π̃_n^{−1} = μ_n ∘ (π^n_n)^{−1} = μ_n. □

We often need a version of Theorem 6.14 for processes on an arbitrary index set T. For any collection of spaces S_t, t ∈ T, define S_I = ⨉_{t∈I} S_t, I ⊂ T. Similarly, if each S_t is endowed with a σ-field S_t, let S_I denote the product σ-field ⨂_{t∈I} S_t. Finally, if each ξ_t is a random element in S_t, write ξ_I for the restriction of the process (ξ_t) to the index set I.

Now let T̂ and T̃ denote the classes of finite and countable subsets of T, respectively. A family of probability measures μ_I, I ∈ T̂ or T̃, is said to be projective if

μ_J(· × S_{J∖I}) = μ_I,  I ⊂ J in T̂ or T̃.  (13)

Theorem 6.16 (existence of processes, Kolmogorov) For any set of Borel spaces S_t, t ∈ T, consider a projective family of probability measures μ_I on S_I, I ∈ T̂. Then there exist some random elements X_t in S_t, t ∈ T, such that L(X_I) = μ_I for all I ∈ T̂.

Proof: Recall that the product σ-field S_T in S_T is generated by all coordinate projections π_t, t ∈ T, and hence consists of all countable cylinder sets B × S_{T∖U}, B ∈ S_U, U ∈ T̃.
For each U ∈ T̃, there exists by Theorem 6.14 some probability measure μ_U on S_U satisfying

μ_U(· × S_{U∖I}) = μ_I for all finite subsets I ⊂ U,

and by Proposition 3.2 the family μ_U, U ∈ T̃, is again projective. We may then define a function μ: S_T → [0, 1] by

μ(· × S_{T∖U}) = μ_U,  U ∈ T̃.

To check the countable additivity of μ, consider any disjoint sets A_1, A_2, … ∈ S_T. For each n we have A_n = B_n × S_{T∖U_n} for some U_n ∈ T̃
and B_n ∈ S_{U_n}. Writing U = ∪_n U_n and C_n = B_n × S_{U∖U_n}, we get

μ ∪_n A_n = μ_U ∪_n C_n = Σ_n μ_U C_n = Σ_n μA_n.

We may now define the process X = (X_t) as the identity mapping on the probability space (S_T, S_T, μ). □

If the projective sequence in Theorem 6.14 is defined recursively in terms of a sequence of conditional distributions, then no regularity condition is needed on the state spaces. For a precise statement, define the product μ ⊗ ν of two kernels μ and ν as in Chapter 1.

Theorem 6.17 (extension by conditioning, Ionescu Tulcea) For any measurable spaces (S_n, S_n) and probability kernels μ_n from S_1 × ⋯ × S_{n−1} to S_n, n ∈ ℕ, there exist some random elements ξ_n in S_n, n ∈ ℕ, such that L(ξ_1, …, ξ_n) = μ_1 ⊗ ⋯ ⊗ μ_n for all n.

Proof: Put F_n = S_1 ⊗ ⋯ ⊗ S_n and T_n = S_{n+1} × S_{n+2} × ⋯, and note that the class C = ∪_n (F_n × T_n) is a field in T_0 generating the σ-field F_∞. Define an additive function μ on C by

μ(A × T_n) = (μ_1 ⊗ ⋯ ⊗ μ_n)A,  A ∈ F_n, n ∈ ℕ,  (14)

which is clearly independent of the representation C = A × T_n. We need to extend μ to a probability measure on F_∞. By Theorem 2.5, it is then enough to show that μ is continuous at ∅.

For any sequence C_1, C_2, … ∈ C with C_n ↓ ∅, we need to show that μC_n → 0. Renumbering if necessary, we may assume for each n that C_n = A_n × T_n with A_n ∈ F_n. Now define

f^n_k = (μ_{k+1} ⊗ ⋯ ⊗ μ_n) 1_{A_n},  k < n,  (15)

with the understanding that f^n_n = 1_{A_n} for k = n. By Lemma 1.41 (i) and (iii), each f^n_k is an F_k-measurable function on S_1 × ⋯ × S_k, and from (15) we note that

f^n_k = μ_{k+1} f^n_{k+1},  0 ≤ k < n.  (16)

Since C_n ↓ ∅, the functions f^n_k are nonincreasing in n for fixed k, say with limits g_k. By (16) and dominated convergence,

g_k = μ_{k+1} g_{k+1},  k ≥ 0.  (17)

Combining (14) and (15), we get μC_n = f^n_0 ↓ g_0. If g_0 > 0, then by (17) there exists some s_1 ∈ S_1 with g_1(s_1) > 0.
Continuing recursively, we may construct a sequence s = (s_1, s_2, …) ∈ T_0 such that g_n(s_1, …, s_n) > 0 for all n. Then

1_{C_n}(s) = 1_{A_n}(s_1, …, s_n) = f^n_n(s_1, …, s_n) ≥ g_n(s_1, …, s_n) > 0,

and so s ∈ ∩_n C_n, which contradicts the hypothesis C_n ↓ ∅. Thus, g_0 = 0, which means that μC_n → 0. □
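Theorem 6.17 matches the way dependent sequences are simulated in practice: draw ξ_1 from μ_1, then ξ_{n+1} from μ_{n+1}(ξ_1, …, ξ_n; ·), and the resulting law is μ_1 ⊗ ⋯ ⊗ μ_n by construction. The following sketch (a two-step Pólya-urn kernel of our own devising, purely for illustration) computes the product measure μ_1 ⊗ μ_2 by exact enumeration rather than simulation.

```python
from fractions import Fraction as F

def mu1():
    # Initial urn: one red ball (1) and one black ball (0).
    return {1: F(1, 2), 0: F(1, 2)}

def mu2(x1):
    # Polya update: the drawn color is duplicated before the second draw.
    red = 1 + (1 if x1 == 1 else 0)
    return {1: F(red, 3), 0: F(3 - red, 3)}

# The product measure mu1 (x) mu2 on S1 x S2, built by conditioning.
law = {}
for x1, p1 in mu1().items():
    for x2, p2 in mu2(x1).items():
        law[(x1, x2)] = p1 * p2

assert sum(law.values()) == 1
# Exchangeability of the Polya urn: (1,0) and (0,1) are equally likely.
assert law[(1, 0)] == law[(0, 1)] == F(1, 6)
assert law[(1, 1)] == law[(0, 0)] == F(1, 3)
```

Note that nothing here requires any topology on the state spaces: the kernels alone determine the joint law, which is the point of the Ionescu Tulcea theorem.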
As a simple application, we may deduce the existence of independent random elements with arbitrary distributions. The result extends the elementary Theorem 3.19.

Corollary 6.18 (infinite product measures, Łomnicki and Ulam) For any collection of probability spaces (S_t, S_t, μ_t), t ∈ T, there exist some independent random elements ξ_t in S_t with distributions μ_t, t ∈ T.

Proof: For any countable subset I ⊂ T, the associated product measure μ_I = ⨂_{t∈I} μ_t exists by Theorem 6.17. Now proceed as in the proof of Theorem 6.16. □

Exercises

1. Show that (ξ, η) =d (ξ′, η) iff P[ξ ∈ B|η] = P[ξ′ ∈ B|η] a.s. for any measurable set B.

2. Show that E^F ξ = E^G ξ a.s. for all ξ ∈ L¹ iff F̅ = G̅.

3. Show that the averaging property implies the other properties of conditional expectations listed in Theorem 6.1.

4. Let 0 ≤ ξ_n ↑ ξ and 0 ≤ η ≤ ξ, where ξ_1, ξ_2, …, η ∈ L¹, and fix a σ-field F. Show that E^F η ≤ sup_n E^F ξ_n. (Hint: Apply the monotone convergence property to E^F (ξ_n ∧ η).)

5. For any [0, ∞]-valued random variable ξ, define E^F ξ = sup_n E^F (ξ ∧ n). Show that this extension of E^F satisfies the monotone convergence property. (Hint: Use the preceding result.)

6. Show that the above extension of E^F remains characterized by the averaging property and that E^F ξ < ∞ a.s. iff the measure ξ · P = E[ξ; ·] is σ-finite on F. Extend E^F ξ to any random variable ξ such that the measure |ξ| · P is σ-finite on F.

7. Let ξ_1, ξ_2, … be [0, ∞]-valued random variables, and fix any σ-field F. Show that lim inf_n E^F ξ_n ≥ E^F lim inf_n ξ_n a.s.

8. Fix any σ-field F, and let ξ, ξ_1, ξ_2, … be random variables with ξ_n → ξ and E^F sup_n |ξ_n| < ∞ a.s. Show that E^F ξ_n → E^F ξ a.s.

9. Let F be the σ-field generated by some partition A_1, A_2, … ∈ A of Ω. Show for any ξ ∈ L¹ that E[ξ|F] = E[ξ|A_k] ≡ E[ξ; A_k]/PA_k on A_k whenever PA_k > 0.

10. For any σ-field F, event A, and random variable ξ ∈ L¹, show that E[ξ|F, 1_A] = E[ξ; A|F]/P[A|F] a.s. on A.

11.
Let the random variables ξ_1, ξ_2, … ≥ 0 and σ-fields F_1, F_2, … be such that E[ξ_n|F_n] → 0 in probability. Show that ξ_n → 0 in probability. (Hint: Consider the random variables ξ_n ∧ 1.)
12. Let (ξ, η) =d (ξ, η̃), where ξ ∈ L¹. Show that E[ξ|η] =d E[ξ|η̃]. (Hint: If E[ξ|η] = f(η), then E[ξ|η̃] = f(η̃) a.s.)

13. Let (ξ, η) be a random vector in ℝ² with probability density f, put F(y) = ∫ f(x, y) dx, and let g(x, y) = f(x, y)/F(y). Show that P[ξ ∈ B|η] = ∫_B g(x, η) dx a.s.

14. Use conditional distributions to deduce the monotone and dominated convergence theorems for conditional expectations from the corresponding unconditional results.

15. Assume that E^F ξ =d ξ for some ξ ∈ L¹. Show that ξ is a.s. F-measurable. (Hint: Choose a strictly convex function f with Ef(ξ) < ∞, and apply the strict Jensen inequality to the conditional distributions.)

16. Assume that (ξ, η) =d (ξ, ζ), where η is ζ-measurable. Show that ξ ⊥⊥_η ζ. (Hint: Show as above that P[ξ ∈ B|η] =d P[ξ ∈ B|ζ], and deduce the corresponding a.s. equality.)

17. Let ξ be a random element in some separable metric space S. Show that P[ξ ∈ ·|F] is a.s. degenerate iff ξ is a.s. F-measurable. (Hint: Reduce to the case when P[ξ ∈ ·|F] is degenerate everywhere and hence equal to δ_η for some F-measurable random element η in S. Then show that ξ = η a.s.)

18. Assuming ξ ⊥⊥_η ζ and γ ⊥⊥ (ξ, η, ζ), show that ξ ⊥⊥_{η,γ} ζ and ξ ⊥⊥_η (ζ, γ).

19. Extend Lemma 3.6 to the context of conditional independence. Also show that Corollary 3.7 and Lemma 3.8 remain valid for the conditional independence, given some σ-field H.

20. Fix any σ-field F and random element ξ in some Borel space, and define η = P[ξ ∈ ·|F]. Show that ξ ⊥⊥_η F.

21. Let ξ and η be random elements in some Borel space S. Prove the existence of a measurable function f: S × [0, 1] → S and some U(0, 1) random variable γ ⊥⊥ η such that ξ = f(η, γ) a.s. (Hint: Choose f with (f(η, ϑ), η) =d (ξ, η) for any U(0, 1) random variable ϑ ⊥⊥ (ξ, η), and then let (γ, η̃) =d (ϑ, η) with (ξ, η) = (f(η̃, γ), η̃) a.s.)

22. Let ξ and η be random elements in some Borel space S.
Show that we may choose a random element η̃ in S with (ξ, η) =d (ξ, η̃) and η ⊥⊥_ξ η̃.

23. Let the probability measures P and Q on (Ω, A) be related by Q = ξ · P for some random variable ξ ≥ 0, and consider any σ-field F ⊂ A. Show that Q = E_P[ξ|F] · P on F.

24. Assume as before that Q = ξ · P on A, and let F ⊂ A. Show that E_Q[η|F] = E_P[ξη|F]/E_P[ξ|F] a.s. Q for any random variable η ≥ 0.
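Exercises 23 and 24 amount to a Bayes formula for the change of measure Q = ξ · P. On a finite space it can be checked directly: on each atom A of F, E_Q[η|F] equals E_P[ξη|F]/E_P[ξ|F]. A small exact check, with our own toy choices of ξ, η, and a two-atom partition:

```python
from fractions import Fraction as F

# Omega = {0,1,2,3} with uniform P; F generated by the atoms {0,1}, {2,3}.
P = {w: F(1, 4) for w in range(4)}
atoms = [{0, 1}, {2, 3}]
xi = {0: F(1, 2), 1: F(3, 2), 2: F(1, 2), 3: F(3, 2)}   # density dQ/dP
eta = {0: F(2), 1: F(0), 2: F(1), 3: F(3)}

Q = {w: xi[w] * P[w] for w in range(4)}                  # Q = xi . P
assert sum(Q.values()) == 1

def cond_exp(f, weights, A):
    """E[f|F] on the atom A, under the measure given by 'weights'."""
    mass = sum(weights[w] for w in A)
    return sum(f[w] * weights[w] for w in A) / mass

for A in atoms:
    bayes = (cond_exp({w: xi[w] * eta[w] for w in range(4)}, P, A)
             / cond_exp(xi, P, A))       # E_P[xi*eta|F] / E_P[xi|F]
    direct = cond_exp(eta, Q, A)         # E_Q[eta|F], computed under Q itself
    assert bayes == direct
```

The algebra behind the assertion is just the cancellation of P(A) in numerator and denominator; the exercises establish the same identity for general F.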
Chapter 7

Martingales and Optional Times

Filtrations and optional times; random time-change; martingale property; optional stopping and sampling; maximum and upcrossing inequalities; martingale convergence, regularity, and closure; limits of conditional expectations; regularization of submartingales

The importance of martingale methods and ideas can hardly be exaggerated. Indeed, martingales and the associated notions of filtrations and optional times are constantly used in all areas of modern probability; they appear frequently throughout the remainder of this book.

In discrete time a martingale is simply a sequence of integrable random variables centered at the successive conditional means, a centering that can always be achieved by the elementary Doob decomposition. More precisely, given any discrete filtration F = (F_n), that is, an increasing sequence of σ-fields in Ω, we say that a sequence M = (M_n) forms a martingale with respect to F if E[M_n|F_{n−1}] = M_{n−1} a.s. for all n. A special role is played by the class of uniformly integrable martingales, which can be represented in the form M_n = E[ξ|F_n] for some integrable random variable ξ.

Martingale theory owes its usefulness to a number of powerful general results, such as the optional sampling theorem, the submartingale convergence theorem, and a wide range of maximum inequalities. The applications discussed in this chapter include extensions of the Borel-Cantelli lemma and Kolmogorov's 0-1 law. Martingales can also be used to establish the existence of measurable densities and to give a short proof of the law of large numbers.

Much of the discrete-time theory extends immediately to continuous time, thanks to the fundamental regularization theorem, which ensures that every continuous-time martingale with respect to a right-continuous filtration has a right-continuous version with left-hand limits. The implications of this result extend far beyond martingale theory.
In particular, it will enable us in Chapters 15 and 19 to obtain right-continuous versions of independent-increment and Feller processes. The theory of continuous-time martingales is continued in Chapters 17, 18, 25, and 26 with studies of quadratic variation, random time-change, integral representations, removal of drift, additional maximum inequalities, and various decomposition theorems. Martingales also play a basic role for especially the Skorohod embedding in Chapter 14, the stochastic integration in Chapters 17 and 26, and the theories of Feller processes, SDEs, and diffusions in Chapters 19, 21, and 23.

As for the closely related notion of optional times, our present treatment is continued with a more detailed study in Chapter 25. Optional times are fundamental not only for martingale theory but also for various models involving Markov processes. In the latter context they appear frequently in the sequel, especially in Chapters 8, 9, 12, 13, 14, 19, and 22-25.

To begin our systematic exposition of the theory, we may fix an arbitrary index set T ⊂ ℝ. A filtration on T is defined as a nondecreasing family of σ-fields F_t ⊂ A, t ∈ T. We say that a process X on T is adapted to F = (F_t) if X_t is F_t-measurable for every t ∈ T. The smallest filtration with this property, namely F_t = σ{X_s; s ≤ t}, t ∈ T, is called the induced or generated filtration. Here "smallest" is understood in the sense of set inclusion for every fixed t.

By a random time we mean a random element τ in T̄ = T ∪ {sup T}. We say that τ is F-optional or an F-stopping time if {τ ≤ t} ∈ F_t for every t ∈ T, that is, if the process X_t = 1{τ ≤ t} is adapted. (Here and in similar cases, we often omit the prefix F when there is no risk for confusion.) If T is countable, it is clearly equivalent that {τ = t} ∈ F_t for every t ∈ T. For any optional times σ and τ we note that even σ ∨ τ and σ ∧ τ are optional. With every optional time τ we may associate a σ-field

F_τ = {A ∈ A; A ∩ {τ ≤ t} ∈ F_t, t ∈ T}.

Some basic properties of optional times and the associated σ-fields are listed below.

Lemma 7.1 (optional times) For any optional times σ and τ, we have
(i) τ is F_τ-measurable;
(ii) F_τ = F_t on {τ = t} for all t ∈ T;
(iii) F_σ ∩ {σ ≤ τ} ⊂ F_{σ∧τ} = F_σ ∩ F_τ.

In particular, we see from (iii) that {σ ≤ τ} ∈ F_σ ∩ F_τ, that F_σ = F_τ on {σ = τ}, and that F_σ ⊂ F_τ whenever σ ≤ τ.

Proof: (iii) For any A ∈ F_σ and t ∈ T, we have

A ∩ {σ ≤ τ} ∩ {τ ≤ t} = (A ∩ {σ ≤ t}) ∩ {τ ≤ t} ∩ {σ ∧ t ≤ τ ∧ t},

which belongs to F_t since σ ∧ t and τ ∧ t are both F_t-measurable. Hence, F_σ ∩ {σ ≤ τ} ⊂ F_τ. The first relation now follows as we replace τ by σ ∧ τ. Replacing σ and τ by the pairs (σ ∧ τ, σ) and (σ ∧ τ, τ), we obtain F_{σ∧τ} ⊂ F_σ ∩ F_τ. To prove the reverse relation, we note that for any A ∈ F_σ ∩ F_τ and t ∈ T

A ∩ {σ ∧ τ ≤ t} = (A ∩ {σ ≤ t}) ∪ (A ∩ {τ ≤ t}) ∈ F_t,

whence A ∈ F_{σ∧τ}.
Proof: (iii) For any A ∈ F_σ and t ∈ T, we have

A ∩ {σ ≤ τ} ∩ {τ ≤ t} = (A ∩ {σ ≤ t}) ∩ {τ ≤ t} ∩ {σ ∧ t ≤ τ ∧ t},

which belongs to F_t since σ ∧ t and τ ∧ t are both F_t-measurable. Hence, F_σ ∩ {σ ≤ τ} ⊂ F_τ. The first relation now follows as we replace τ by σ ∧ τ. Replacing σ and τ by the pairs (σ ∧ τ, σ) and (σ ∧ τ, τ), we obtain F_{σ∧τ} ⊂ F_σ ∩ F_τ. To prove the reverse relation, we note that for any A ∈ F_σ ∩ F_τ and t ∈ T,

A ∩ {σ ∧ τ ≤ t} = (A ∩ {σ ≤ t}) ∪ (A ∩ {τ ≤ t}) ∈ F_t,

whence A ∈ F_{σ∧τ}. 
7. Martingales and Optional Times 121 (i) Applying (iii) to the pair (T, t) gives {T < t} E F T for all t E T, which extends immediately to any t E IR. Now use Lemma 1.4. (ii) First assume that T = t. Then:F T ==:FT n {T < t} C :Ft- Conversely, assume that A E Ft and sET. If s > t we get An {T < s} == A E :Ft C :Fs, and for s < t we have An {T < s} == 0 E :Fs. Thus, A t:: :FT. This shows that :F". == :Ft when T = t. The general case now follows by part (iii). D Given an arbitrary filtration :F on JR+, we may define a new filtration F+ by Ft == nu>t :Fu, t > 0, and we say that :F is right-continuous if F+ == F. In particular, ;:+ is right-continuous for any filtration :F. We say that a random time T is weakly;: -optional if {T < t} E :Ft for every t > O. In that case 'T + h is clearly :F-optional for every h > 0, and we may define F".+ == nh>oF".+h. When the index set is Z+, we take F+ ==:F and make no difference between strictly and weakly optional times. The following result shows that the notions of optional and weakly optional times agree when F is right-continuous. Lemma 7.2 (weakly optional times) A random timf T is weakly F- optional iff it is :F+ -optional, in which case :FT+ == F; == {A E A; An {T < t} E :Ft, t :> O}. (1) Proof: For any t > 0, we note that {T < t}== n {T<r}, r>t {T<t}== U {T < r}, r<t (2) where r may be restricted to the rationals. If A n {T < t} E :Ft+ for all t, we get by (2) for any t > 0 An{T<t}= U (An{T < r})E:Ft. r<t Conversely, if An {T < t} E :Ft for all t, then (2) yields for any t > 0 and h>O An {T < t} = n (A n {T < r}) E Ft+h' rE(t,t+h) and so A n {T < t} E Ft+. For A == n this proves the first assertion, and for general A E A it proves the second relation in (1). To prove the first relation, we note that A E :Fr+ iff 4 E :FT+h for each h > 0, that is, iff A n {T + h < t} E Ft for all t > 0 and h > O. 
But this is equivalent to A ∩ {τ ≤ t} ∈ F_{t+h} for all t ≥ 0 and h > 0, hence to A ∩ {τ ≤ t} ∈ F_{t+} for every t ≥ 0, which means that A ∈ F_τ⁺. □

We have already seen that the maximum and minimum of two optional times are again optional. The result extends to countable collections as follows. 
122 Foundations of Modern Probability Lemma 7.3 (closure properties) For any random times T1, T2, . .. and filtration :F on JR+ or Z+, we have: (i) If the Tn are :F-optional, then so is a = SUPn Tn. (ii) If the Tn are weakly:F-optional, then so is T = inf n Tn, and we have F: = nn F;: . Proof: To prove (i) and the first assertion in (ii), we note that {IT < t} = nn {Tn < t}, {T < t} = Un {Tn < t}, (3) where the strict inequalities may be replaced by < for the index set T = Z+. To prove the second asse,rtion in (ii), we note that F; c nn:F by Lemma 7.1. Conversely, assuming A E nn:F, we get by (3) for any t > 0 An {T < t} = AnUJT n < t} = Un(An {Tn < t}) E :Ft, with the indicated modification for T = Z+. Thus, A E :F:. 0 Part (ii) of the last result is often useful in connection with the following approximation of optional times from the right. Lemma 7.4 (discrete approximation) For any weakly optional time T in JR+, there exist some countably valued optional times Tn .J-. T. Proof: We may define Tn = 2- n [2 n T + 1], n E N. Then Tn E 2- n N for all n, and Tn .J-. T. Also note that the Tn are optional since {Tn < k2- n } = {T < k2- n } E :F k2 -n. 0 It is now time to relate the optional times to random processes. We say that a process X on JR+ is progressively measurable or simply progressive if its restriction to n x [0, t] is Ft @ 8[0, t]-measurable for every t > o. Note that any progressive process is adapted by Lemma 1.26. Conversely, a simple approximation from the left or right shows that any adapted and left- or right-continuous process is progressive. A set A c n x R+ is said to be progressive if the corresponding indicator function 1A has this property, and we note that the progressive sets form a a-field. Lemma 7.5 (optional evaluation) Fix a filtration F on an index set T, let X be a process on T with values in a measurable space (S, S), and let T be an optional time in T. 
Then X7' is Fr-measurable under each of these conditions: (i) T is countable and X is adapted; (ii) T = JR+ and X is progressive. Proof: In both cases, we need to show that {X r E B, T < t} E :Ft, t > 0, B E S. 
7. Martingales and Optional Times 123 This is clear in case (i) if we write {XrEB}== U {XsEB,r==s}E:Ft, BES. s$;t In case (ii) it is enough to show that Xr/\t is Ft-measurable for every t > o. We may then assume T < t and prove instead that X r is Ft-measurable. Writing X r = X o'l/J where 'ljJ(w) = (w, T(W)), we note that 'ljJ is measurable from Ft to Ft Q9B[O, t] whereas X is measurable on n x [0, t] from :Ft Q9B[O, t] to S. The required measurability of X r now follows by Lemma 1.7. 0 Given a process X on JR+ or Z+ and a set B in the range space of X, we introduce the hitting time TB == inf{t > 0; Xt E B}. It is often important to decide whether TB is optional. The following elementary result covers the most commonly occurring cases. Lemma 7.6 (hitting times) Fix a filtration F on T == IR+ or Z+, let X be an F-adapted process on T with values in a measurable space (S,S), and let B E S. Then TB is weakly optional under each of these conditions: (i) T = Z+; (ii) T == IR+, S is a metric space, B is closed, and X is continuous; (iii) T = R+, S is a topological space, B is open, and X is right- continuous. Proof: In case (i) it is enough to write {TB < n} = U {X k E B} E F n, n EN. kE[l,nJ In case (ii) we get for any t > 0 {TB < t} == U n U {p(Xr,B) < n- 1 } EFt, h>O nEN rEQn[h,t] where p denotes the metric in S. Finally, in case (iii) we get {TB<t}= U {XrEB}EFt, t>O, rEQn(O,t) which suffices by Lemma 7.2. 0 For special purposes we need the following more general but much deeper result, known as the debut theorem. Here and below, a filtration F is said to be complete if the basic a-field A is complete and each :Ft contains all P-null sets in A. 
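Before turning to the deeper debut theorem below, the elementary discrete case (i) of Lemma 7.6 can be made concrete: the event {τ_B ≤ n} is decided by the path up to time n alone, which is precisely the stopping-time property. A minimal sketch (the function names are ours, not the book's; note the text's convention τ_B > 0):

```python
import math

def hitting_time(path, B):
    """First-entry time tau_B = inf{k > 0 : path[k] in B} of a finite path,
    with math.inf when the path never enters B; following the text's
    convention tau_B > 0, the value path[0] is ignored."""
    for k, x in enumerate(path):
        if k > 0 and x in B:
            return k
    return math.inf

def is_stopping_event(path, B, n):
    """Decide the event {tau_B <= n} from path[0..n] alone -- exactly the
    adaptedness condition {tau <= n} in F_n used in Lemma 7.6(i)."""
    return hitting_time(path[: n + 1], B) <= n
```

For the path 0, 1, 2, 1, 3 and B = {2}, the hitting time is 2; whether τ_B ≤ n holds is settled by the first n + 1 values, so altering the path after time n cannot change the verdict.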
124 Foundations of Modern Probability Theorem 7.7 (first entry, Doob, Hunt) Let the set A c + x n be pro- gressive with respect to some right-continuous and complete filtration F. Then the time r(w) == inf{t > 0; (t,w) E A} is :F-optional. Proof: Since A is progressive, we have An [0, t) E Ft Q9 B([O, t]) for every t > O. Noting that {r < t} is the projection of A n [0, t) onto 0, we get fT < t} E Ft by Theorem Al.4, and so r is optional by Lemma 7.2. 0 In applications of the last result and for other purposes, we may need to extend a given filtration F on 1R.+ to make it both right-continuous and complete. Writing A for the completion of A, we put N = {A E A ; P A = O} and define :F t = a{Ft,N}. Then F = ( F t ) is the smallest complete extension of F. Similarly, :F+ == (F t +) is the smallest right-continuous extension of F. We show that the two operations commute and can be combined into a smallest right-continuous and complete extension, known as the (usual) augmentation of F. Lemma 7.8 (augmented filtration) Every filtration F on + has a smallest right-continuous and complete extension g, given by 9t= Ft+ = F t+, t > O. (4) Proof: First we note that Ft+ C Ft + C Ft +, t > o. Conversely, assume that A E F t+. Then A E F t+h for every h > 0, and so, as in Lemma 1.25, there exist some sets Ah E Ft+h with P(AAh) = O. Now choose h n --t 0, and define A' = {Ah n i.o.}. Then A' = Ft+ and P(AA') = 0, so A E Ft+ . Thus, F t+ C Ft+ , which proves the second relation in (4). In particular, the filtration 9 in (4) contains F and is both right-contin- uous and complete. For any filtration 1-£ with those properties, we have Qt = F t+ C 1-l t+ = 1-lt+ = 1-lt, t > 0, which proves the required minimality of g. 0 The next result shows how the a-fields Fr arise naturally in connection with a random time-change. Proposition 7.9 (random time-change) Let X > 0 be a nondecreas- ing, right-continuous process adapted to some right-continuous filtration F. 
Then Ts = inf{t > 0; Xt > s}, s > 0, is a right-continuous process of optional times, generating a right- continuous filtration Qs = Frs' S > O. If X is continuous and the time T is :F-optional, then X r is Q-optional and :F r C Qx.,.. If X is further strictly increasing, then Fr = 9x.,.. 
7. Martingales and Optional Times 125 In the latter case, we have in particular Ft == 9 X t for all t, so the processes (Ts) and (X t ) play symmetric roles. Proof: The times Ts are optional by Lemmas 7.2 and 7.6, and since (T 8 ) is right-continuous, so is (9s) by Lemma 7.3. If X is continuous, then by Lemma 7.1 we get for any F-optional time T > 0 and set A E F T An{XT < s}=An{T < Ts}EFTs==9s, E > O. For A == 0 it follows that X T is 9-optional, and for general A we get A E 9 x T' Thus, F T C 9 x T' Both statements extend by Lemma 7.3 to arbitrary T. Now assume that X is also strictly increasing. For any A E QXt with t > 0 we have An{t < Ts} == An{X t < s} E Qs == FTs' S > 0, and so An {t < Ts < u} E Fu, S > 0, u > t. Taking the union over all S E Q+ -the set of nonnegative rationals-gives A E :Fu, and as u -!. t we get A E :Ft+ == :Ft. Hence, .Ft == 9 x t? which extends as before to t == O. By Lemma 7.1 we now obtain for any A E gX T An {T < t} = An {X T < X t } E QXt == Ft, t > 0, and so A E :FT' Thus, gX T C Fr, so the two a-fields agree. o To motivate the introduction of martingales, we may fix a random variable  E £1 and a filtration F on some index set T, and put Mt == E[IFt], t E T. The process M is clearly integrable (for each t) and adapted, and by the chain rule for conditional expectations we note that Ms == E[Mtl:F s ] a.s., s < t. (5) Any integrable and adapted process M satisfying (5) is called a martingale with respect to :F, or an :F-martingale. When T == Z+, it suffices to require (5) for t == s + 1, so in that case the condition becomes E[LlM n IFn-1] == 0 a.s., n E N, (6) where LlM n = M n - M n - 1 . A process M == (M 1 , . . . , M d ) in ]Rd is said to be a martingale if Ml, . . . , Md are one-dimensional martingales. Replacing the equality in (5) or (6) by an inequality we arrive at the notions of sub- and supermartingales. Thus, a submartingale is defined as an integrable and adapted process X with Xs < E[XtIFs] a.s., s < t. 
(7)

Reversing the inequality sign yields the notion of a supermartingale. In particular, the mean is nondecreasing for submartingales and nonincreasing for supermartingales. (The sign convention is suggested by analogy with sub- and superharmonic functions.)

Given a filtration F on Z_+, we say that a random sequence A = (A_n) with A_0 = 0 is predictable with respect to F, or F-predictable, if A_n is F_{n−1}-measurable for every n ∈ N, that is, if the shifted sequence θA = (A_{n+1}) is adapted. The following elementary result, known as the Doob decomposition, is useful to deduce results for submartingales from the corresponding martingale versions. An extension to continuous time is proved in Chapter 25.

Lemma 7.10 (centering) Any integrable and F-adapted process X on Z_+ has an a.s. unique decomposition M + A, where M is an F-martingale and A is an F-predictable process with A_0 = 0. In particular, X is a submartingale iff A is a.s. nondecreasing.

Proof: If X = M + A for some processes M and A as stated, then clearly ΔA_n = E[ΔX_n | F_{n−1}] a.s. for all n ∈ N, and so

A_n = Σ_{k≤n} E[ΔX_k | F_{k−1}] a.s., n ∈ Z_+, (8)

which proves the required uniqueness. In general, we may define a predictable process A by (8). Then M = X − A is a martingale, since

E[ΔM_n | F_{n−1}] = E[ΔX_n | F_{n−1}] − ΔA_n = 0 a.s., n ∈ N. □

We proceed to show how the martingale and submartingale properties are preserved under various transformations.

Lemma 7.11 (convex maps) Let M be a martingale in R^d, and consider a convex function f: R^d → R such that X = f(M) is integrable. Then X is a submartingale. The statement remains true for any real submartingale M, provided that f is also nondecreasing.

Proof: In the martingale case, the conditional version of Jensen's inequality yields

f(M_s) = f(E[M_t | F_s]) ≤ E[f(M_t) | F_s] a.s., s ≤ t, (9)

which shows that f(M) is a submartingale. If instead M is a submartingale and f is nondecreasing, the first relation in (9) becomes f(M_s) ≤ f(E[M_t | F_s]), and the conclusion remains valid. 
0 The last result is often applied with f(x) = Ixl P for some p > lor, for d = 1, with f(x) = x+ = x V o. We say that an optional time T is bounded if T < U a.s. for some u E T. This is always true when T has a last element. The following result is an elementary version of the basic optional sampling theorem. An extension to continuous-time submartingales appears as Theorem 7.29. 
7. Martingales and Optional Times 127 Theorem 7.12 (optional sampling, Doob) Let M be a martingale on some countable index set T with filtration F, and consider two optional times u and T, where T is bounded. Then M..,. is integrable, and M u /\..,. == E[M..,. IFu] a.s. Proof: By Lemmas 6.2 and 7.1 we get for any t < u in 'T E[Mu IF..,.] = E[Mu 1Ft] = Mt == M..,. a.s. on {7 == t}, and so E[MuIFr] = M..,. a.s. whenever T < U a.s. If a < 7 < u, then Fu c F..,. by Lemma 7.1, and we get E[MrIFu] == E[E[MuIF..,.]IFu] == E[MuIFu] == Jt;f a a.s. On the other hand, clearly E[M..,.IFu] == M T a.s. when 7 < (J !\ u. In the general case, the previous results combine by means of Lemmas 6.2 and 7.1 into E[MrIFu] = E[M..,.IFu/\r] = MU/\T a.s. on {a < T}, E[MTIFu] = E[Mu/\..,.IFu] = MUI\T a.s. on {a > 7}. 0 In particular, we note that if M is a martingale on an arbitrary time scale T with filtration F and (Ts) is a nondecreasing family of bounded, optional times that take countably many values, then the process (M Ts ) is a martingale with respect to the filtration (Frs)' In this sense, the martingale property is preserved by a random time-change. From the last theorem we note that every martingale M satisfies EMu == EM T , for any bounded optional times a and T that take only count- ably many values. An even weaker property characterizes the class of martingales. Lemma 7.13 (martingale criterion) Let M be an integrable, adapted pro- cess on some index set T. Then M is a martingale iff EMu = EM..,. for any T -valued optional times a and T that take at most t'l.VO values. Proof: If s < t in T and A E Fs, then T = slA + t1 A c is optional, and so o == EMt - EM r == EMt - E[Ms; A] - E[Mt; A C ] == E(M t - Ms; A]. Since A is arbitrary, it follows that E[Mt - Ms IFs] = 0 a.s. o The following predictable transformation of martingales is basic for the theory of stochastic integration. 
Corollary 7.14 (martingale transform) Let M be a martingale on some index set T with filtration F, fix an optional time T that takes countably many values, and let 'fJ be a bounded, Fr-measurable random variable. Then the process Nt = 'fJ(Mt - Mtl\r) is again a martingale. Proof: The integrability follows from Theorem 7.12, and the adaptedness is clear if we replace 'fJ by 'fJl{T < t} in the expression for Nt. Now fix any 
bounded optional time σ taking countably many values. By Theorem 7.12 and the pull-out property of conditional expectations, we get a.s.

E[N_σ | F_τ] = η E[M_σ − M_{σ∧τ} | F_τ] = η(M_{σ∧τ} − M_{σ∧τ}) = 0,

and so E N_σ = 0. Thus, N is a martingale by Lemma 7.13. □

In particular, we note that optional stopping preserves the martingale property, in the sense that the stopped process M_t^τ = M_{τ∧t} is a martingale whenever M is a martingale and τ is an optional time that takes countably many values. More generally, we may consider predictable step processes of the form

V_t = Σ_{k≤n} η_k 1{t > τ_k}, t ∈ T,

where τ_1 ≤ ⋯ ≤ τ_n are optional times, and each η_k is a bounded, F_{τ_k}-measurable random variable. For any process X, we may introduce the associated elementary stochastic integral

(V·X)_t = ∫_0^t V dX = Σ_{k≤n} η_k (X_t − X_{t∧τ_k}), t ∈ T.

From Corollary 7.14 we note that V·X is a martingale whenever X is a martingale and each τ_k takes countably many values. In discrete time we may clearly allow V to be any bounded, predictable sequence, in which case

(V·X)_n = Σ_{k≤n} V_k ΔX_k, n ∈ Z_+.

The result for martingales extends in an obvious way to submartingales X, provided that the predictable sequence V is nonnegative.

Our next aim is to derive some basic martingale inequalities. We begin with an extension of Kolmogorov's maximum inequality in Lemma 4.15.

Proposition 7.15 (maximum inequalities, Bernstein, Lévy) Let X be a submartingale on a countable index set T. Then for any r > 0 and u ∈ T,

r P{sup_{t≤u} X_t ≥ r} ≤ E[X_u; sup_{t≤u} X_t ≥ r] ≤ E X_u⁺, (10)

r P{sup_t |X_t| ≥ r} ≤ 3 sup_t E|X_t|. (11)

Proof: By dominated convergence it is enough to consider finite index sets, so we may assume that T = Z_+. Define τ = u ∧ inf{t; X_t ≥ r} and B = {max_{t≤u} X_t ≥ r}. Then τ is an optional time bounded by u, and we note that B ∈ F_τ and X_τ ≥ r on B. Hence, by Lemma 7.10 and Theorem 7.12,

r PB ≤ E[X_τ; B] ≤ E[X_u; B] ≤ E X_u⁺, 
which proves (10). Letting M + A be the Doob decomposition of X and applying (10) to −M, we further get

r P{min_{t≤u} X_t ≤ −r} ≤ r P{min_{t≤u} M_t ≤ −r} ≤ E M_u⁻ = E M_u⁺ − E M_u ≤ E X_u⁺ − E X_0 ≤ 2 max_{t≤u} E|X_t|.

Combining this with (10) yields (11). □

We proceed to derive a basic norm inequality. For processes X on some index set T, we define

X_t* = sup_{s≤t} |X_s|,  X* = sup_{t∈T} |X_t|.

Proposition 7.16 (norm inequality, Doob) Let M be a martingale on a countable index set T, and fix any p, q > 1 with p⁻¹ + q⁻¹ = 1. Then

‖M_t*‖_p ≤ q ‖M_t‖_p, t ∈ T.

Proof: By monotone convergence we may assume that T = Z_+. If ‖M_t‖_p < ∞, then ‖M_s‖_p < ∞ for all s ≤ t by Jensen's inequality, and so we may assume that 0 < ‖M_t*‖_p < ∞. Applying Proposition 7.15 to the submartingale |M|, we get

r P{M_t* ≥ r} ≤ E[|M_t|; M_t* ≥ r], r > 0.

Hence, by Lemma 3.4, Fubini's theorem, and Hölder's inequality,

‖M_t*‖_p^p = p ∫_0^∞ P{M_t* ≥ r} r^{p−1} dr
 ≤ p ∫_0^∞ E[|M_t|; M_t* ≥ r] r^{p−2} dr
 = p E|M_t| ∫_0^{M_t*} r^{p−2} dr = q E|M_t| (M_t*)^{p−1}
 ≤ q ‖M_t‖_p ‖(M_t*)^{p−1}‖_q = q ‖M_t‖_p ‖M_t*‖_p^{p−1}.

It remains to divide by the last factor on the right. □

The next inequality is needed to prove the basic Theorem 7.18. For any function f on T and constants a < b, the number of [a, b]-crossings of f up to time t is defined as the supremum of all n ∈ Z_+ such that there exist times s_1 < t_1 < s_2 < t_2 < ⋯ < s_n < t_n ≤ t in T with f(s_k) < a and f(t_k) > b for all k. The supremum may clearly be infinite. 
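The crossing count just defined is easy to compute for a finite path: alternately wait for a value strictly below a, then for a value strictly above b; the greedy search realizes the supremum in the definition. A small sketch (our function name, not the book's):

```python
def upcrossings(path, a, b):
    """Count completed [a, b]-crossings of a finite path: scan left to
    right, alternately waiting for a value strictly below a and then for a
    value strictly above b.  The greedy choice of crossing times attains
    the supremum in the definition of the crossing number."""
    assert a < b
    count, seen_below = 0, False
    for x in path:
        if not seen_below:
            if x < a:
                seen_below = True
        elif x > b:
            count += 1
            seen_below = False
    return count
```

For instance, the path 0, 2, 0, 2, 0 makes two [0.5, 1.5]-crossings, while a monotone path makes at most one.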
Lemma 7.17 (upcrossing inequality, Doob, Snell) Let X be a submartingale on a countable index set T, and let N_a^b(t) denote the number of [a, b]-crossings of X up to time t. Then

E N_a^b(t) ≤ E(X_t − a)⁺ / (b − a), t ∈ T, a < b in R.

Proof: As before, we may assume that T = Z_+. Since Y = (X − a)⁺ is again a submartingale by Lemma 7.11, and the [a, b]-crossings of X correspond to [0, b − a]-crossings of Y, we may assume that X ≥ 0 and a = 0. Now define recursively the optional times 0 = τ_0 ≤ σ_1 ≤ τ_1 ≤ σ_2 ≤ ⋯ by

σ_k = inf{n ≥ τ_{k−1}; X_n = 0}, τ_k = inf{n ≥ σ_k; X_n ≥ b}, k ∈ N,

and introduce the predictable process

V_n = Σ_{k≥1} 1{σ_k < n ≤ τ_k}, n ∈ N.

Then (1 − V)·X is again a submartingale by Corollary 7.14, and so

E((1 − V)·X)_t ≥ E((1 − V)·X)_0 = 0, t ≥ 0.

Since also (V·X)_t ≥ b N_0^b(t), we get

b E N_0^b(t) ≤ E(V·X)_t ≤ E(1·X)_t = E X_t − E X_0 ≤ E X_t. □

We may now state the fundamental regularity and convergence theorem for submartingales.

Theorem 7.18 (regularity and convergence, Doob) Let X be an L¹-bounded submartingale on a countable index set T. Then X_t converges along every increasing or decreasing sequence in T, outside some fixed P-null set A.

Proof: By Proposition 7.15 we have X* < ∞ a.s., and Lemma 7.17 shows that X has a.s. finitely many upcrossings of every interval [a, b] with rational a < b. Outside the null set A where any of these conditions fails, it is clear that X has the asserted property. □

The following is an interesting and useful application.

Proposition 7.19 (one-sided bounds) Let M be a martingale on Z_+ with ΔM ≤ c a.s. for some constant c < ∞. Then a.s.

{M_n converges} = {sup_n M_n < ∞}.

Proof: Since M − M_0 is again a martingale, we may assume that M_0 = 0. Introduce the optional times

τ_m = inf{n; M_n ≥ m}, m ∈ N.

The stopped processes M^{τ_m} are again martingales by Corollary 7.14. Since M^{τ_m} ≤ m + c a.s., we have E|M^{τ_m}| ≤ 2(m + c) < ∞, and so M^{τ_m} converges a.s. 
7. Martingales and Optional Times 131 by Theorem 7.18. Hence, M converges a.s. on {suPn Mn < oo} = Urn {M - MTm}. The reverse implication is obvious, since every convergent sequence in 1R is bounded. 0 From the last result we may easily derive the following useful extension of the Borel-Cantelli lemma in Theorem 3.18. Corollary 7.20 (extended Borel-Cantelli lemma, Levy) For any filtration :F on Z+, let An E :F n , n E N. Then a.s. {An Lo.} = {2:nP[AnIFn-l] = oo}. Proof: The sequence M n ==  (lA k - P[A k l:Fk-1]) , n E Z+, kn is a martingale with IMnl < 1, and so by Proposition 7.19 P{M n -t oo} == P{M n -t -oo} == O. Hence, a.s. {An i.o.} = {Ln IAn = oo} = {Ln P[AnIFn-d = oo} . A martingale M or submartingale X is said to be closed if u == sup T belongs to T. In the former case, clearly Mt == E[Mul:F t ] a.s. for all t E T. If instead u ft T, we say that M is closable if it can be extended to a martingale on T == TU{u}. If Mt == E[IFt] for some  E L 1 , we may clearly choose Mu == . The next result gives general criteria for closability. An extension to continuous-time submartingales appears as part of Theorem 7.29. o Theorem 7.21 (uniform integrability and closure, Doob) For any mar- tingale M on an unbounded index set T, these conditions are equivalent: (i) M is uniformly integrable; (ii) M is closable at sup T; (iii) M is L 1 -convergent at sup T. Under those conditions, M is closable by the limit in (iii). Proof: First note that (ii) implies (i) by Lemma 6.5. Next (i) implies (iii) by Theorem 7.18 and Proposition 4.12. Finally, assume that Mt -t  in £1 as t ---+ u = sup T. Using the L1-contractivity of conditional expectations, we get as t -t u for fixed s, Ms == E[MtI:Fs] -1 E[I:Fs] in L 1 . Thus, Ms = E[IFs] a.s., and we may take Mu == . This shows that (iii) implies (ii). 0 
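Theorems 7.18 and 7.21 can be watched in action with a Pólya urn: the successive fractions of red balls form a martingale with values in (0, 1), hence bounded and uniformly integrable, so they converge a.s. and in L¹. A quick simulation sketch (parameter and function names are ours):

```python
import random

def polya_fractions(n_steps, red=1, black=1, seed=0):
    """Simulate a Polya urn: draw a ball uniformly at random and return it
    together with one more of the same color.  The successive fractions of
    red balls form a martingale taking values in (0, 1)."""
    rng = random.Random(seed)
    fracs = []
    for _ in range(n_steps):
        if rng.random() < red / (red + black):
            red += 1
        else:
            black += 1
        fracs.append(red / (red + black))
    return fracs
```

Along each run the fractions settle down, illustrating the a.s. convergence; the limit varies from seed to seed, which Theorem 7.18 permits. Uniform integrability, and hence L¹-convergence via Theorem 7.21, is automatic here because the martingale is bounded.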
132 Foundations of Modern Probability For comparison, we may examine the case of LP-convergence for p > 1. Corollary 7.22 (LP-convergence) Let M be a martingale on an unboun- ded index set T, and fix any p > 1. Then M converges in LP iff it is LP -bounded. Proof: We may clearly assume that T is countable. If M is LP-bounded, it converges in L 1 by Theorem 7.18. Since IMIP is also uniformly integrable by Proposition 7.16, the convergence extends to LP by Proposition 4.12. Conversely, if M converges in LP, it is LP-bounded by Lemma 7.11. 0 We now consider the convergence of martingales of the special form Mt = E[IFt], as t increases or decreases along some sequence. Without loss of generality, we may assume that the index set T is unbounded above or below, and define respectively :FOC; = V Ft, tET F-OC; = n :Ft. tET Theorem 7.23 (conditioning limits, Jessen, Levy) Let:F be a filtration on a countable index set T c 1R that is unbounded above or below. Then for any  E L 1 , we have as t ---+ ::1:00 E[IFt] ---+ E[I:F:f:OC;] a.s. and in L 1 . Proof: By Theorems 7.18 and 7.21, the martingale Mt = E[IFt] con- verges a.s. and in L 1 as t ---+ ::f:oo, and the limit M:f:OC; may clearly be taken to be :F:f:oo-measurable. To see that M-:f:CX) = E[IF:f:CX)] a.s., we need to verify the relations E[M:f:CX); A] = E[; A], A E F-:f:CX). (12) Then note that, by the definition of M, E[Mt; A] = E[; A], A E Fs, S < t. (13) This clearly remains true for s = -00, and as t ---+ -00 we get the "minus" version of (12). To get the "plus" version, let t --t 00 in (13) for fixed s, and extend by a monotone class argument to arbitrary A E :FCX). 0 In particular, we note the following useful special case. Corollary 7.24 (Levy) For any filtration F on Z+, we have P[AI:F n ] ---+ 1A a.s., A E :FCX). For a simple application, we consider an extension of Kolmogorov's 0-1 law in Theorem 3.13. Say that two u-fields agree a.s. if they have the same completion with respect to the basic iT-field. 
7. Martingales and Optional Times 133 Corollary 7.25 (tail a-field) If :1"1, F 2 , . .. and Q are independent (7- fields, then nn a{Fn' Fn+!,' " ; Q} = Q a.s. Proof: Let T denote the a-field on the left, and note that T Rg(F 1 V . . . V:F n ) by Proposition 6.8. Using Proposition 6.6 and Corollary 7.24, we get for any A E T P[AIQ] == P[AIQ, F 1 , . . . , Fn] -+ lA a.s" which shows that 7 c 9 a.s. The converse relation is obvious. o The last theorem can be used to give a short proof of the law of large numbers. Then let 1, 2, . .. be i.i.d. random variables in £1, put Sn == 1 + . . . + n, and define :F -n == a{ 5n, 5 n + 1 , . . . }. Here F -O:J is trivial by Theorem 3.15, and for any k < n we have E[k IF -n] == E[lIF -n] a.s., since (k,Sn,5n+l,"') d (1,Sn,5n+l,"')' Hence, by Theorem 7.23, n-1Sn - E[n- 1 5nIF_n] == n- 1 " E[J;;IF-n] L-t k 5:n E[ll:F -n] -+ E[lIF -00] == El' As a further application of Theorem 7.23, we consider a kernel version of the regularization Theorem 6.3. The result is needed in Chapter 21. Proposition 7.26 (regular densities) For any measurable space (5, S) and Borel spaces (T, T) and (U,U), let J-£ be a probability kernel from 5 to T xU. Then the densities J-L( s, dt x B) lI(s,t,B) = (d U) ' S E S, t E T, BE U, (14) J-L s, t x have versions that form a probability kernel from 5 x T to U. Proof: We may assume T and U to be Borel subsets of R, in which case J-L can be regarded as a probability kernel from S to JR2. Letting V n denote the a-field in]R generated by the intervals Ink == [(k -1)2- n , k2- n ), k E Z, we define '"" J-L( s, Ink X B) Mn(s, t, B) = L-t (I U) l{t E Ink}, k J-L S, nk X s E 5, t E T, B E B, under the convention % = O. Then Mn(s, ., B) is a version of the density in (14) with respect to V n , and for fixed sand B it is also a martingale with respect to J.-t(s, . x U). By Theorem 7.23 we get Mn(s, ., B) -t v(s,., B) a.e. J-L(s, . x U). 
Thus, a product-measurable version of ν is given by

ν(s, t, B) = limsup_{n→∞} M_n(s, t, B), s ∈ S, t ∈ T, B ∈ U.

It remains to find a version of ν that is a probability measure on U for fixed s and t. Then proceed as in the proof of Theorem 6.3, noting that 
134 Foundations of Modern Probability in each step the exceptional (s, t)-set A lies in S Q9 T and is such that the sections As = {t E T; (8, t) E A} satisfy J-L(s, As xU) = 0 for all 8 E s. 0 In order to extend the previous theory to martingales on JR+, we need to choose suitably regular versions of the studied processes. The next result provides two closely related regularizations of a given submartingale. Say that a process X on JR+ is right-continuous with left-hand limits (abbrevi- ated as rell) if Xt = X t + for all t > 0 and the left-hand limits Xt- exist and are finite for all t > O. For any process Y on Q+, we write Y+ for the process of right-hand limits ¥t+, t > 0, provided that the latter exist. Theorem 7.27 (regularization, Doob) For any :F-submartingale X on R.+ with restriction Y to Q+, we have: (i) y+ exists and is rell outside some fixed P-null set A, and Z = lAc y+ is a submartingale with respect to the augmented filtration :F+ . (ii) If F is right-continuous, then X has an rcll version iff EX is right- continuous; this holds in particular when X is a martingale. The proof requires an extension of Theorem 7.21 to suitable submartin- gales. Lemma 7.28 (uniform integrability) A submartingale X on Z_ is uniformly integrable iff EX is bounded. Proof: Let EX be bounded. Introduce the predictable sequence an = E[Xnl:Fn-1] > 0, n < 0, and note that E" an = EX o - infn<oEX n < 00. nO - Hence, l:n On < 00 a.s., and so we may define An = " Ok, M n == X n - An, k5:.n n < o. Since EA* < 00 and M is a martingale closed at 0, both A and Mare uniformly integrable. 0 Proof of Theorem 7.27: (i) By Lemma 7.11 the process Y V 0 is £1_ bounded on bounded intervals, and so the same thing is true for Y. Thus, by Theorem 7.18, the right- and left-hand limits ¥t:1: exist outside some fixed P-null set A, and so Z = lAc y+ is rcll. Also note that Z is adapted to :F+ . 
To prove that Z is an F+ -submartingale, fix any times 8 < t, and choose 8n t 8 and t n .i t in Q+ with 8n < t. Then Y Sm < E[¥t n l:F sm ] a.s. for all m and n, and as m ---t 00 we get Zs < E[¥tnIFs+] a.s. by Theorem 7.23. Since }'in --+ Zt in £1 by Lemma 7.28, it follows that Zs < E[ZtIFs+] = E[ZtI F s+] a.s. 
7. Martingales and Optional Times 135 (ii) For any t < t n E Q+, (EX)t n == E(¥t n ), Xt < E[Y't n 1Ft] a.s., and as t n t t we get, by Lemma 7.28 and the right-continuity of F, (EX)t+ = EZ t , Xt < E[Zt 1Ft] = Zt .:1.s. (15) If X has a right-continuous version, then clearly Zt = X:t a.s. Hence, (15) yields (EX)t+ = EXt, which shows that EX is right-continuous. If instead EX is right-continuous, then (15) gives E\Zt - Xtl = EZt - EXt = 0, and so Zt = Xt a.s., which means that Z is a version of X. 0 Justified by the last theorem, we henceforth assume all submartingales to be rcll, unless otherwise specified, and also that the underlying filtration is right-continuous and complete. Most of the previously quoted results for submartingales on a countable index set extend immediately to such a context. In particular, this is true for the convergence rheorem 7.18 and the inequalities in Proposition 7.15 and Lemma 7.17. We proceed to show how Theorems 7.12 and 7.21 extend to submartingales in continuous time. Theorem 7.29 (optional sampling and closure, Doob) Let X be an F- sub martingale on JR+, where X and:F are right-continuous, and consider two optional times l7 and T, where T is bounded. Then X r is integrable, and Xul\r < E[XrIFu] a.s. (16) The statement extends to unbounded times T iff X+ is uniformly integrable. Proof: Introduce the optional times an = 2-n[2 n a + 1] and Tn 2-n[2nr + 1], and conclude from Lemma 7.10 and Theorem 7.12 that XUml\'T n < E[X'T n IFam] a.s., m, n E f. As m  00, we get by Lemma 7.3 and Theorem 7.23 XUl\r n < E[X rn IFa] a.s., n E N. (17) By the result for the index sets 2- n Z+, the random variables Xo;..., X'T2' X T1 form a submartingale with bounded mean and are therefore uni- formly integrable by Lemma 7.28. Thus, (16) follows as we let n -t 00 in (1 7) . If X+ is uniformly integrable, then X is £l-bounded and hence converges a.s. toward some Xoo E £1. 
By Proposition 4.12 we get xi -t xct in £1, and so E[xtIFs] -+ E[XIFs] in £1 for each s. Letting t -t 00 along a sequence, we get by Fatou's lemma Xs < limtE[XtIFs] -liminftE[X;\Fs] < E[XIFs] - E[XI:Fs] == E[XooIFs]. We may now approximate as before to obtain (16) for arbitrary a and T. Conversely, the stated condition implies that there exists some Xoo E £1 with Xs < E[XooIFs] a.s. for all s > 0, and so X;- < E[XIFs] a.s. by Lemma 7.11. Hence, X+ is uniformly integrable by Lemma 6.5. 0 
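Optional sampling yields concrete hitting probabilities. For a simple symmetric random walk S started at 0 and stopped at τ = τ_a ∧ τ_b with a < 0 < b, Theorem 7.12 gives E S_τ = 0, which forces P{τ_b < τ_a} = −a/(b − a) and P{τ_a < τ_b} = b/(b − a). A Monte Carlo check of this identity (a sketch; the function name and parameters are ours):

```python
import random

def hit_b_before_a(a, b, n_paths=20000, seed=1):
    """Monte Carlo estimate of P{tau_b < tau_a} for a simple symmetric
    random walk started at 0, where a < 0 < b.  Optional sampling applied
    to the stopped walk predicts the exact value -a / (b - a)."""
    rng = random.Random(seed)
    hits_b = 0
    for _ in range(n_paths):
        s = 0
        while a < s < b:
            s += 1 if rng.random() < 0.5 else -1
        hits_b += (s == b)
    return hits_b / n_paths
```

With a = −3 and b = 1, the prediction is 3/4 for hitting b first, and correspondingly b/(b − a) = 1/4 for hitting a first; the estimate should agree up to sampling error.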
136 Foundations of Modern Probability For a simple application, we consider the hitting probabilities of a con- tinuous martingale. The result will be useful in Chapters 14, 17, and 23. Corollary 7.30 (first hit) Let M be a continuous martingale with Mo == 0 and P{M* > O} > 0, and define 'x == inf{t > 0; Mt = x}. Then b P[Ta < Tbl M* > 0] < b _ a < P(Ta < Tbl M* > 0], a < 0 < b. Proof: Since I = Ta 1\ Ib is optional by Lemma 7.6, Theorem 7.29 yields EMrl\t == 0 for all t > 0, and so by dominated convergence EM r == O. Hence, o aP{Ta < Tb} + bP{Tb < Ta} + E[Moo; T == 00] < aP{ra < rb} + bP{Tb < Ta, M* > O} bP{M* > O} - (b - a)P{'a < Tb}, which implies the first inequality. The second one follows by taking complements. 0 The next result plays a crucial role in Chapter 19. Lemma 7.31 (absorption) Let X > 0 be a right-continuous supermartin- gale, and put T = inf{t > 0; Xt /\ Xt- = O}. Then X == 0 a.s. on [7,00). Proof: By Theorem 7.27 the process X remains a supermartingale with respect to the right-continuous filtration :F+. The times Tn = inf {t > 0; Xt < n- l } are ;:+-optional by Lemma 7.6, and by the right-continuity of X we have X rn < n- l on {Tn < oo}. Hence, by Theorem 7.29, E[Xt; Tn < t] < E[X rn ; 'n < t] < n- l , t > 0, n E N. Noting that Tn t 7, we get by dominated convergence E[Xt; I < t] = 0, and so Xt = 0 a.s. on {T < t}. The assertion now follows, as we apply this result to all t E Q+ and use the right-continuity of X. 0 We proceed to show how the right-continuity of an increasing sequence of supermartingales extends to the limit. The result is needed in Chapter 25. Theorem 7.32 (increasing limits of supermartingales, Meyer) Let Xl < x 2 < ... be right-continuous supermartingales with sUP n EX[j < 00. Then Xt = sUPn Xr, t > 0, is again an a.s. right-continuous supermartingale. Proof (Doob): By Theorem 7.27 we may assume the filtration to be right- continuous. The supermartingale property carries over to X by monotone convergence. 
To prove the asserted right-continuity, we may assume that $X^1$ is bounded below by an integrable random variable; otherwise, consider the processes obtained by optional stopping at the times $m \wedge \inf\{t;\ X_t^1 < -m\}$ for arbitrary $m > 0$.
7. Martingales and Optional Times 137

Now fix any $\varepsilon > 0$, let $\mathcal{T}$ denote the class of optional times $\tau$ with
$$\limsup_{u \downarrow t} |X_u - X_t| \le 2\varepsilon, \qquad t < \tau,$$
and put $p = \inf_{\tau \in \mathcal{T}} Ee^{-\tau}$. Choose $\sigma_1, \sigma_2, \ldots \in \mathcal{T}$ with $Ee^{-\sigma_n} \to p$, and note that $\sigma = \sup_n \sigma_n \in \mathcal{T}$ with $Ee^{-\sigma} = p$. We need to show that $\sigma = \infty$ a.s. Then introduce the optional times
$$\tau_n = \inf\{t > \sigma;\ |X_t^n - X_\sigma| > \varepsilon\}, \qquad n \in \mathbb{N},$$
and put $\tau = \limsup_n \tau_n$. Noting that
$$|X_t - X_\sigma| = \liminf_{n \to \infty} |X_t^n - X_\sigma| \le \varepsilon, \qquad t \in [\sigma, \tau),$$
we obtain $\tau \in \mathcal{T}$. By the right-continuity of $X^n$, we note that $|X_{\tau_n}^n - X_\sigma| \ge \varepsilon$ on $\{\tau_n < \infty\}$ for every $n$. Furthermore, on the set $A = \{\sigma = \tau < \infty\}$ we have
$$\liminf_{n \to \infty} X_{\tau_n}^n \ge \sup_k \lim_n X_{\tau_n}^k = \sup_k X_\tau^k = X_\tau = X_\sigma,$$
and so $\liminf_n X_{\tau_n}^n \ge X_\sigma + \varepsilon$ on $A$. Since $A \in \mathcal{F}_\sigma$ by Lemma 7.1, we get by Fatou's lemma, optional sampling, and monotone convergence,
$$E[X_\sigma + \varepsilon;\ A] \le E[\liminf_n X_{\tau_n}^n;\ A] \le \liminf_n E[X_{\tau_n}^n;\ A] \le \lim_n E[X_\sigma^n;\ A] = E[X_\sigma;\ A].$$
Thus, $PA = 0$, and so $\tau > \sigma$ a.s. on $\{\sigma < \infty\}$. If $p > 0$, we get the contradiction $Ee^{-\tau} < p$, so $p = 0$. Hence, $\sigma = \infty$ a.s. $\Box$

Exercises

1. Show for any optional times $\sigma$ and $\tau$ that $\{\sigma = \tau\} \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$ and $\mathcal{F}_\sigma = \mathcal{F}_\tau$ on $\{\sigma = \tau\}$. However, $\mathcal{F}_\tau$ and $\mathcal{F}_\infty$ may differ on $\{\tau = \infty\}$.

2. Show that if $\sigma$ and $\tau$ are optional times on the time scale $\mathbb{R}_+$ or $\mathbb{Z}_+$, then so is $\sigma + \tau$.

3. Give an example of a random time that is weakly optional but not optional. (Hint: Let $\mathcal{F}$ be the filtration induced by the process $X_t = \vartheta t$ with $P\{\vartheta = \pm 1\} = \frac{1}{2}$, and take $\tau = \inf\{t;\ X_t > 0\}$.)

4. Fix a random time $\tau$ and a random variable $\xi$ in $\mathbb{R} \setminus \{0\}$. Show that the process $X_t = \xi 1\{\tau \le t\}$ is adapted to a given filtration $\mathcal{F}$ iff $\tau$ is $\mathcal{F}$-optional and $\xi$ is $\mathcal{F}_\tau$-measurable. Give corresponding conditions for the process $Y_t = \xi 1\{\tau < t\}$.

5. Let $\mathcal{P}$ denote the class of sets $A \subset \mathbb{R}_+ \times \Omega$ such that the process $1_A$ is progressive. Show that $\mathcal{P}$ is a $\sigma$-field and that a process $X$ is progressive iff it is $\mathcal{P}$-measurable.
6. Let $X$ be a progressive process with induced filtration $\mathcal{F}$, and fix any optional time $\tau < \infty$. Show that $\sigma\{\tau, X^\tau\} \subset \mathcal{F}_\tau \subset \mathcal{F}_\tau^+ \subset \sigma\{\tau, X^{\tau+h}\}$ for every $h > 0$. (Hint: The first relation becomes an equality when $\tau$ takes only countably many values.) Note that the result may fail when $P\{\tau = \infty\} > 0$.

7. Let $M$ be an $\mathcal{F}$-martingale on some countable index set, and fix an optional time $\tau$. Show that $M - M^\tau$ remains a martingale conditionally on $\mathcal{F}_\tau$. (Hint: Use Theorem 7.12 and Lemma 7.13.) Extend the result to continuous time.

8. Show that any submartingale remains a submartingale with respect to the induced filtration.

9. Let $X^1, X^2, \ldots$ be submartingales such that the process $X = \sup_n X^n$ is integrable. Show that $X$ is again a submartingale. Also show that $\limsup_n X^n$ is a submartingale when even $\sup_n |X^n|$ is integrable.

10. Show that the Doob decomposition of an integrable random sequence $X = (X_n)$ depends on the filtration unless $X$ is a.s. $X_0$-measurable. (Hint: Compare the filtrations induced by $X$ and by the sequence $Y_n = (X_0, X_{n+1})$.)

11. Fix a random time $\tau$ and a random variable $\xi \in L^1$, and define $M_t = \xi 1\{\tau \le t\}$. Show that $M$ is a martingale with respect to the induced filtration $\mathcal{F}$ iff $E[\xi;\ \tau \le t \mid \tau > s] = 0$ for any $s < t$. (Hint: The set $\{\tau > s\}$ is an atom of $\mathcal{F}_s$.)

12. Let $\mathcal{F}$ and $\mathcal{G}$ be filtrations on a common probability space. Show that every $\mathcal{F}$-martingale is a $\mathcal{G}$-martingale iff $\mathcal{F}_t \subset \mathcal{G}_t$ and $\mathcal{G}_t \perp\!\!\!\perp_{\mathcal{F}_t} \mathcal{F}_\infty$ for every $t \ge 0$. (Hint: For the necessity, consider $\mathcal{F}$-martingales of the form $M_s = E[\xi \mid \mathcal{F}_s]$ with $\xi \in L^1(\mathcal{F}_t)$.)

13. Show for any rcll supermartingale $X \ge 0$ and constant $r > 0$ that $rP\{\sup_t X_t \ge r\} \le EX_0$.

14. Let $M$ be an $L^2$-bounded martingale on $\mathbb{Z}_+$. Imitate the proof of Lemma 4.16 to show that $M_n$ converges a.s. and in $L^2$.

15. Give an example of a martingale that is $L^1$-bounded but not uniformly integrable. (Hint: Every positive martingale is $L^1$-bounded.)

16. Show that if $\mathcal{M} \perp\!\!\!\perp_{\mathcal{F}_n} \mathcal{H}$ for some increasing $\sigma$-fields $\mathcal{F}_n$, then $\mathcal{M} \perp\!\!\!\perp_{\mathcal{F}_\infty} \mathcal{H}$.

17.
Let $\xi_n \to \xi$ in $L^1$. Show for any increasing $\sigma$-fields $\mathcal{F}_n$ that $E[\xi_n \mid \mathcal{F}_n] \to E[\xi \mid \mathcal{F}_\infty]$ in $L^1$.

18. Let $\xi, \xi_1, \xi_2, \ldots \in L^1$ with $\xi_n \uparrow \xi$ a.s. Show for any increasing $\sigma$-fields $\mathcal{F}_n$ that $E[\xi_n \mid \mathcal{F}_n] \to E[\xi \mid \mathcal{F}_\infty]$ a.s. (Hint: By Proposition 7.15 we have $\sup_m E[\xi - \xi_n \mid \mathcal{F}_m] \to 0$ a.s. Now use the monotonicity.)

19. Show that any right-continuous submartingale is a.s. rcll.

20. Let $\sigma$ and $\tau$ be optional times with respect to some right-continuous filtration $\mathcal{F}$. Show that the operators $E^{\mathcal{F}_\sigma}$ and $E^{\mathcal{F}_\tau}$ commute on $L^1$ with product $E^{\mathcal{F}_{\sigma \wedge \tau}}$. (Hint: For any $\xi \in L^1$, apply the optional sampling theorem to a right-continuous version of the martingale $M_t = E[\xi \mid \mathcal{F}_t]$.)
21. Let $X \ge 0$ be a supermartingale on $\mathbb{Z}_+$, and let $\tau_0 \le \tau_1 \le \cdots$ be optional times. Show that the sequence $(X_{\tau_n})$ is again a supermartingale. (Hint: Truncate the times $\tau_n$, and use the conditional Fatou lemma.) Show by an example that the result fails for submartingales.

22. For any random time $\tau \ge 0$ and right-continuous filtration $\mathcal{F} = (\mathcal{F}_t)$, show that the process $X_t = P[\tau \le t \mid \mathcal{F}_t]$ has a right-continuous version. (Hint: Use Theorem 7.27 (ii).)
Chapter 8

Markov Processes and Discrete-Time Chains

Markov property and transition kernels; finite-dimensional distributions and existence; space and time homogeneity; strong Markov property and excursions; invariant distributions and stationarity; recurrence and transience; ergodic behavior of irreducible chains; mean recurrence times.

A Markov process may be described informally as a randomized dynamical system, a description that explains the fundamental role that Markov processes play both in theory and in a wide range of applications. Processes of this type appear more or less explicitly throughout the remainder of this book.

To make the above description precise, let us fix any Borel space $S$ and filtration $\mathcal{F}$. An adapted process $X$ in $S$ is said to be Markov if for any times $s < t$ we have $X_t = f_{s,t}(X_s, \vartheta_{s,t})$ a.s. for some measurable function $f_{s,t}$ and some $U(0,1)$ random variable $\vartheta_{s,t} \perp\!\!\!\perp \mathcal{F}_s$. The stated condition is equivalent to the less transparent conditional independence $X_t \perp\!\!\!\perp_{X_s} \mathcal{F}_s$. The process is said to be time-homogeneous if we can take $f_{s,t} = f_{0,t-s}$ and space-homogeneous (when $S = \mathbb{R}^d$) if $f_{s,t}(x, \cdot) = f_{s,t}(0, \cdot) + x$. A more convenient description of the evolution is in terms of the transition kernels $\mu_{s,t}(x, \cdot) = P\{f_{s,t}(x, \vartheta) \in \cdot\}$, which are easily seen to satisfy an a.s. version of the Chapman-Kolmogorov relation $\mu_{s,t}\mu_{t,u} = \mu_{s,u}$. In the usual axiomatic treatment, the latter equation is assumed to hold identically.

This chapter is devoted to some of the most basic and elementary portions of Markov process theory. Thus, the space homogeneity will be shown to be equivalent to the independence of the increments, which motivates our discussion of random walks and Lévy processes in Chapters 9 and 15. In the time-homogeneous case we shall establish a primitive form of the strong Markov property and see how the result simplifies when the process is also space-homogeneous.
Next we shall see how invariance of the initial distribution implies stationarity of the process, which motivates our treatment of stationary processes in Chapter 10. Finally, we shall discuss the classification of states and examine the ergodic behavior of discrete-time Markov chains on a countable state space. The analogous but less elementary theory for continuous-time chains is postponed until Chapter 12.
The general theory of Markov processes is more advanced and is not continued until Chapter 19, which develops the basic theory of Feller processes. In the meantime we shall consider several important subclasses, such as the pure jump-type processes in Chapter 12, Brownian motion and related processes in Chapters 13 and 18, and the above-mentioned random walks and Lévy processes in Chapters 9 and 15. A detailed discussion of diffusion processes appears in Chapters 21 and 23, and additional aspects of Brownian motion are considered in Chapters 22, 24, and 25.

To begin our systematic study of Markov processes, consider an arbitrary time scale $T \subset \mathbb{R}$, equipped with a filtration $\mathcal{F} = (\mathcal{F}_t)$, and fix a measurable space $(S, \mathcal{S})$. An $S$-valued process $X$ on $T$ is said to be a Markov process if it is adapted to $\mathcal{F}$ and such that
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} X_u, \qquad t \le u \text{ in } T. \quad (1)$$
Just as for the martingale property, we note that even the Markov property depends on the choice of filtration, with the weakest version obtained for the filtration induced by $X$. The simple property in (1) may be strengthened as follows.

Lemma 8.1 (extended Markov property) If $X$ satisfies (1), then
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} \{X_u;\ u \ge t\}, \qquad t \in T. \quad (2)$$

Proof: Fix any $t = t_0 < t_1 < \cdots$ in $T$. By (1) we have $\mathcal{F}_{t_n} \perp\!\!\!\perp_{X_{t_n}} X_{t_{n+1}}$ for every $n \ge 0$, and so by Proposition 6.8
$$\mathcal{F}_t \perp\!\!\!\perp_{X_{t_0}, \ldots, X_{t_n}} X_{t_{n+1}}, \qquad n \ge 0.$$
By the same proposition, this is equivalent to
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} (X_{t_1}, X_{t_2}, \ldots),$$
and (2) follows by a monotone class argument. $\Box$

For any times $s < t$ in $T$, we assume the existence of some regular conditional distributions
$$\mu_{s,t}(X_s, B) = P[X_t \in B \mid X_s] = P[X_t \in B \mid \mathcal{F}_s] \text{ a.s.}, \qquad B \in \mathcal{S}. \quad (3)$$
In particular, we note that the transition kernels $\mu_{s,t}$ exist by Theorem 6.3 when $S$ is Borel. We may further introduce the one-dimensional distributions $\nu_t = \mathcal{L}(X_t)$, $t \in T$. When $T$ begins at 0, we shall prove that the distribution of $X$ is uniquely determined by the kernels $\mu_{s,t}$ together with the initial distribution $\nu_0$.
For a precise statement, it is convenient to use the kernel operations introduced in Chapter 1. Note in particular that if $\mu$ and $\nu$ are kernels on
$S$, then $\mu \otimes \nu$ and $\mu\nu$ are kernels from $S$ to $S^2$ and $S$, respectively, given for $s \in S$ by
$$(\mu \otimes \nu)(s, B) = \int \mu(s, dt) \int \nu(t, du)\, 1_B(t, u), \qquad B \in \mathcal{S}^2,$$
$$(\mu\nu)(s, B) = (\mu \otimes \nu)(s, S \times B) = \int \mu(s, dt)\, \nu(t, B), \qquad B \in \mathcal{S}.$$

Proposition 8.2 (finite-dimensional distributions) Let $X$ be a Markov process on $T$ with one-dimensional distributions $\nu_t$ and transition kernels $\mu_{s,t}$. Then for any $t_0 < \cdots < t_n$ in $T$,
$$\mathcal{L}(X_{t_0}, \ldots, X_{t_n}) = \nu_{t_0} \otimes \mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}, \quad (4)$$
$$P[(X_{t_1}, \ldots, X_{t_n}) \in \cdot \mid \mathcal{F}_{t_0}] = (\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(X_{t_0}, \cdot). \quad (5)$$

Proof: Formula (4) is clearly true for $n = 0$. Proceeding by induction, assume (4) to be true with $n$ replaced by $n - 1$, and fix any bounded measurable function $f$ on $S^{n+1}$. Noting that $X_{t_0}, \ldots, X_{t_{n-1}}$ are $\mathcal{F}_{t_{n-1}}$-measurable, we get by Theorem 6.4 and the induction hypothesis
$$Ef(X_{t_0}, \ldots, X_{t_n}) = E\, E[f(X_{t_0}, \ldots, X_{t_n}) \mid \mathcal{F}_{t_{n-1}}] = E \int f(X_{t_0}, \ldots, X_{t_{n-1}}, x_n)\, \mu_{t_{n-1},t_n}(X_{t_{n-1}}, dx_n) = (\nu_{t_0} \otimes \mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}) f,$$
as desired. This completes the proof of (4). In particular, for any $B \in \mathcal{S}$ and $C \in \mathcal{S}^n$ we get
$$P\{(X_{t_0}, \ldots, X_{t_n}) \in B \times C\} = \int_B \nu_{t_0}(dx)\, (\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(x, C) = E[(\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(X_{t_0}, C);\ X_{t_0} \in B],$$
and (5) follows by Theorem 6.1 and Lemma 8.1. $\Box$

An obvious consistency requirement leads to the following basic so-called Chapman-Kolmogorov relation between the transition kernels. Here we say that two kernels $\mu$ and $\mu'$ agree a.s. if $\mu(x, \cdot) = \mu'(x, \cdot)$ for almost every $x$.

Corollary 8.3 (Chapman, Smoluchovsky) For any Markov process in a Borel space $S$, we have $\mu_{s,u} = \mu_{s,t}\mu_{t,u}$ a.s. $\nu_s$, $s < t < u$.

Proof: By Proposition 8.2 we have a.s. for any $B \in \mathcal{S}$
$$\mu_{s,u}(X_s, B) = P[X_u \in B \mid \mathcal{F}_s] = P[(X_t, X_u) \in S \times B \mid \mathcal{F}_s] = (\mu_{s,t} \otimes \mu_{t,u})(X_s, S \times B) = (\mu_{s,t}\mu_{t,u})(X_s, B).$$
Since $\mathcal{S}$ is Borel, we may choose a common null set for all $B$. $\Box$
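On a finite state space, the kernel operations above reduce to array manipulations: a kernel becomes a row-stochastic matrix, $\mu \otimes \nu$ gives the joint law of two consecutive steps, and $\mu\nu$ is the matrix product. A minimal sketch (the matrices are randomly generated illustrations, not from the text):

```python
import numpy as np

# On S = {0, ..., d-1}, a probability kernel is a d x d row-stochastic matrix.
d = 3
rng = np.random.default_rng(0)
mu = rng.random((d, d)); mu /= mu.sum(axis=1, keepdims=True)
nu = rng.random((d, d)); nu /= nu.sum(axis=1, keepdims=True)

# (mu (x) nu)(s, {t} x {u}) = mu(s, {t}) * nu(t, {u})
joint = mu[:, :, None] * nu[None, :, :]        # shape (s, t, u)

# (mu nu)(s, {u}) = sum_t mu(s, dt) nu(t, {u}): marginalize out the middle step
composed = joint.sum(axis=1)

assert np.allclose(composed, mu @ nu)          # agrees with the matrix product
assert np.allclose(composed.sum(axis=1), 1.0)  # again a probability kernel
```

The same tensor construction, iterated, yields the finite-dimensional distributions in (4).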
We henceforth assume that the Chapman-Kolmogorov relation holds identically, so that
$$\mu_{s,u} = \mu_{s,t}\mu_{t,u}, \qquad s < t < u. \quad (6)$$
Thus, we define a Markov process by condition (3), in terms of some transition kernels $\mu_{s,t}$ satisfying (6). In discrete time, when $T = \mathbb{Z}_+$, the latter relation is no restriction, since we may then start from any versions of the kernels $\mu_n = \mu_{n-1,n}$ and define $\mu_{m,n} = \mu_{m+1} \cdots \mu_n$ for arbitrary $m < n$.

Given such a family of transition kernels $\mu_{s,t}$ and an arbitrary initial distribution $\nu$, we need to show that an associated Markov process exists. This is ensured, under weak restrictions, by the following result.

Theorem 8.4 (existence, Kolmogorov) Fix a time scale $T$ starting at 0, a Borel space $(S, \mathcal{S})$, a probability measure $\nu$ on $S$, and a family of probability kernels $\mu_{s,t}$ on $S$, $s < t$ in $T$, satisfying (6). Then there exists an $S$-valued Markov process $X$ on $T$ with initial distribution $\nu$ and transition kernels $\mu_{s,t}$.

Proof: Introduce the probability measures
$$\nu_{t_1, \ldots, t_n} = \nu\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}, \qquad 0 = t_0 < t_1 < \cdots < t_n,\ n \in \mathbb{N}.$$
To see that the family $(\nu_{t_1, \ldots, t_n})$ is projective, let $B \in \mathcal{S}^{n-1}$ be arbitrary, and define for any $k \in \{1, \ldots, n\}$ the set
$$B_k = \{(x_1, \ldots, x_n) \in S^n;\ (x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) \in B\}.$$
Then by (6)
$$\nu_{t_1, \ldots, t_n} B_k = (\nu\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{k-1},t_{k+1}} \otimes \cdots \otimes \mu_{t_{n-1},t_n}) B = \nu_{t_1, \ldots, t_{k-1}, t_{k+1}, \ldots, t_n} B,$$
as desired. By Theorem 6.16 there exists an $S$-valued process $X$ on $T$ with
$$\mathcal{L}(X_{t_1}, \ldots, X_{t_n}) = \nu_{t_1, \ldots, t_n}, \qquad t_1 < \cdots < t_n,\ n \in \mathbb{N}, \quad (7)$$
and, in particular, $\mathcal{L}(X_0) = \nu_0 = \nu$. To see that $X$ is Markov with transition kernels $\mu_{s,t}$, fix any times $s_1 < \cdots < s_n = s < t$ and sets $B \in \mathcal{S}^n$ and $C \in \mathcal{S}$, and conclude from (7) that
$$P\{(X_{s_1}, \ldots, X_{s_n}, X_t) \in B \times C\} = \nu_{s_1, \ldots, s_n, t}(B \times C) = E[\mu_{s,t}(X_s, C);\ (X_{s_1}, \ldots, X_{s_n}) \in B].$$
Writing $\mathcal{F}$ for the filtration induced by $X$, we get by a monotone class argument
$$P[X_t \in C;\ A] = E[\mu_{s,t}(X_s, C);\ A], \qquad A \in \mathcal{F}_s,$$
and so $P[X_t \in C \mid \mathcal{F}_s] = \mu_{s,t}(X_s, C)$ a.s. $\Box$
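In discrete time the identical Chapman-Kolmogorov relation (6) is simply the associativity of the products $\mu_{m,n} = \mu_{m+1} \cdots \mu_n$. For a finite state space this is matrix multiplication, as in the following sketch (the one-step kernels are randomly generated illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 6
steps = []
for n in range(N):
    P = rng.random((d, d))
    steps.append(P / P.sum(axis=1, keepdims=True))  # one-step kernels mu_1,...,mu_N

def mu(m, n):
    """Transition kernel mu_{m,n} = mu_{m+1} ... mu_n (identity when m == n)."""
    out = np.eye(d)
    for k in range(m, n):
        out = out @ steps[k]
    return out

# Chapman-Kolmogorov: mu_{s,u} = mu_{s,t} mu_{t,u} for s < t < u
for s, t, u in [(0, 2, 5), (1, 3, 4)]:
    assert np.allclose(mu(s, u), mu(s, t) @ mu(t, u))
```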
Now assume that $S$ is a measurable Abelian group. A kernel $\mu$ on $S$ is then said to be homogeneous if
$$\mu(x, B) = \mu(0, B - x), \qquad x \in S,\ B \in \mathcal{S}.$$
An $S$-valued Markov process with homogeneous transition kernels $\mu_{s,t}$ is said to be space-homogeneous. Furthermore, we say that a process $X$ in $S$ has independent increments if, for any times $t_0 < \cdots < t_n$, the increments $X_{t_k} - X_{t_{k-1}}$ are mutually independent and independent of $X_0$. More generally, given any filtration $\mathcal{F}$ on $T$, we say that $X$ has $\mathcal{F}$-independent increments if $X$ is adapted to $\mathcal{F}$ and such that $X_t - X_s \perp\!\!\!\perp \mathcal{F}_s$ for all $s < t$ in $T$. Note that the elementary notion of independence corresponds to the case when $\mathcal{F}$ is induced by $X$.

Proposition 8.5 (independent increments and homogeneity) Consider a measurable Abelian group $S$, a filtration $\mathcal{F}$ on some time scale $T$, and an $S$-valued and $\mathcal{F}$-adapted process $X$ on $T$. Then $X$ is space-homogeneous $\mathcal{F}$-Markov iff it has $\mathcal{F}$-independent increments, in which case the transition kernels are given by
$$\mu_{s,t}(x, B) = P\{X_t - X_s \in B - x\}, \qquad x \in S,\ B \in \mathcal{S},\ s < t \text{ in } T. \quad (8)$$

Proof: First assume that $X$ is Markov with transition kernels
$$\mu_{s,t}(x, B) = \mu_{s,t}(B - x), \qquad x \in S,\ B \in \mathcal{S},\ s < t \text{ in } T. \quad (9)$$
By Theorem 6.4, for any $s < t$ in $T$ and $B \in \mathcal{S}$ we get
$$P[X_t - X_s \in B \mid \mathcal{F}_s] = P[X_t \in B + X_s \mid \mathcal{F}_s] = \mu_{s,t}(X_s, B + X_s) = \mu_{s,t} B.$$
Thus, $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution $\mu_{s,t}$, and (8) follows by means of (9).

Conversely, assume that $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution $\mu_{s,t}$. Defining the associated kernel $\mu_{s,t}$ by (9), we get by Theorem 6.4, for any $s$, $t$, and $B$ as before,
$$P[X_t \in B \mid \mathcal{F}_s] = P[X_t - X_s \in B - X_s \mid \mathcal{F}_s] = \mu_{s,t}(B - X_s) = \mu_{s,t}(X_s, B).$$
Thus, $X$ is Markov with the homogeneous transition kernels in (9). $\Box$

We may now specialize to the time-homogeneous case, when $T = \mathbb{R}_+$ or $\mathbb{Z}_+$ and the transition kernels are of the form $\mu_{s,t} = \mu_{t-s}$, so that
$$P[X_t \in B \mid \mathcal{F}_s] = \mu_{t-s}(X_s, B) \text{ a.s.}, \qquad B \in \mathcal{S},\ s < t \text{ in } T.$$
Introducing the initial distribution $\nu = \mathcal{L}(X_0)$, we may write the formulas of Proposition 8.2 as
$$\mathcal{L}(X_{t_0}, \ldots, X_{t_n}) = \nu\mu_{t_0} \otimes \mu_{t_1-t_0} \otimes \cdots \otimes \mu_{t_n-t_{n-1}},$$
$$P[(X_{t_1}, \ldots, X_{t_n}) \in \cdot \mid \mathcal{F}_{t_0}] = (\mu_{t_1-t_0} \otimes \cdots \otimes \mu_{t_n-t_{n-1}})(X_{t_0}, \cdot).$$
The Chapman-Kolmogorov relation now becomes $\mu_{s+t} = \mu_s\mu_t$, $s, t \in T$, which is again assumed to hold identically. We often refer to the family $(\mu_t)$ as a semigroup of transition kernels. The following result justifies the interpretation of a discrete-time Markov process as a randomized dynamical system.

Proposition 8.6 (recursion) Let $X$ be a process on $\mathbb{Z}_+$ with values in a Borel space $S$. Then $X$ is Markov iff there exist some measurable functions $f_1, f_2, \ldots: S \times [0,1] \to S$ and i.i.d. $U(0,1)$ random variables $\vartheta_1, \vartheta_2, \ldots \perp\!\!\!\perp X_0$ such that $X_n = f_n(X_{n-1}, \vartheta_n)$ a.s. for all $n \in \mathbb{N}$. Here we may choose $f_1 = f_2 = \cdots$ iff $X$ is time-homogeneous.

Proof: Let $X$ have the stated representation, and introduce the kernels $\mu_n(x, \cdot) = P\{f_n(x, \vartheta) \in \cdot\}$, where $\vartheta$ is $U(0,1)$. Writing $\mathcal{F}$ for the filtration induced by $X$, we get by Theorem 6.4 for any $B \in \mathcal{S}$
$$P[X_n \in B \mid \mathcal{F}_{n-1}] = P[f_n(X_{n-1}, \vartheta_n) \in B \mid \mathcal{F}_{n-1}] = \lambda\{t;\ f_n(X_{n-1}, t) \in B\} = \mu_n(X_{n-1}, B),$$
which shows that $X$ is Markov with transition kernels $\mu_n$.

Now assume instead the latter condition. By Lemma 3.22 we may choose some associated functions $f_n$ as above. Let $\tilde\vartheta_1, \tilde\vartheta_2, \ldots$ be i.i.d. $U(0,1)$ and independent of $\tilde{X}_0 \stackrel{d}{=} X_0$, and define recursively $\tilde{X}_n = f_n(\tilde{X}_{n-1}, \tilde\vartheta_n)$ for $n \in \mathbb{N}$. As before, $\tilde{X}$ is Markov with transition kernels $\mu_n$. Hence, $\tilde{X} \stackrel{d}{=} X$ by Proposition 8.2, and so by Theorem 6.10 there exist some random variables $\vartheta_n$ with $(X, (\vartheta_n)) \stackrel{d}{=} (\tilde{X}, (\tilde\vartheta_n))$. Since the diagonal in $S^2$ is measurable, the desired representation follows. The last assertion is obvious from the construction. $\Box$

Now fix a transition semigroup $(\mu_t)$ on some Borel space $S$. For any probability measure $\nu$ on $S$, there exists by Theorem 8.4 an associated Markov process $X^\nu$, and by Proposition 3.2 the corresponding distribution $P_\nu$ is uniquely determined by $\nu$. Note that $P_\nu$ is a probability measure on the path space $(S^T, \mathcal{S}^T)$.
For degenerate initial distributions $\delta_x$, we may write $P_x$ instead of $P_{\delta_x}$. Integration with respect to $P_\nu$ or $P_x$ is denoted by $E_\nu$ or $E_x$, respectively.

Lemma 8.7 (mixtures) The measures $P_x$ form a probability kernel from $S$ to $S^T$, and for any initial distribution $\nu$ we have
$$P_\nu A = \int_S P_x(A)\, \nu(dx), \qquad A \in \mathcal{S}^T. \quad (10)$$

Proof: Both the measurability of $P_x A$ and formula (10) are obvious for cylinder sets of the form $A = (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B$. The general case follows easily by a monotone class argument. $\Box$
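The recursion of Proposition 8.6 is also how Markov chains are simulated in practice: for a finite chain, $f$ may be taken as the quantile function of the transition row, fed with uniform variables. A minimal sketch (the transition matrix and the chosen $\vartheta$-values are illustrative assumptions, not from the text):

```python
import numpy as np

# Time-homogeneous case of Proposition 8.6: one fixed update function
# f(x, theta), with f the quantile function of the row P[x, :].
P = np.array([[0.5, 0.5, 0.0],
              [0.1, 0.6, 0.3],
              [0.0, 0.2, 0.8]])
cum = P.cumsum(axis=1)

def f(x, theta):
    """X_n = f(X_{n-1}, theta_n): smallest y with theta <= sum_{z <= y} P[x, z]."""
    return int(np.searchsorted(cum[x], theta))

# Deterministic check with hand-picked values of theta:
path = [0]
for theta in [0.7, 0.05, 0.65, 0.95]:
    path.append(f(path[-1], theta))
assert path == [0, 1, 0, 1, 2]
```

In a simulation one would draw the `theta` values as i.i.d. $U(0,1)$ variables, e.g. via `numpy.random.default_rng().random()`.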
Rather than considering one Markov process $X^\nu$ for each initial distribution $\nu$, it is more convenient to introduce the canonical process $X$, defined as the identity mapping on the path space $(S^T, \mathcal{S}^T)$, and equip the latter space with the different probability measures $P_\nu$. Then $X_t$ agrees with the evaluation map $\pi_t: w \mapsto w_t$ on $S^T$, which is measurable by the definition of $\mathcal{S}^T$. For our present purposes, it is sufficient to endow the path space $S^T$ with the canonical filtration $\mathcal{F}$ induced by $X$.

On $S^T$ we may also introduce the shift operators $\theta_t: S^T \to S^T$, $t \in T$, given by
$$(\theta_t w)_s = w_{s+t}, \qquad s, t \in T,\ w \in S^T,$$
and we note that the $\theta_t$ are measurable with respect to $\mathcal{S}^T$. In the canonical case it is further clear that $\theta_t X = \theta_t = X \circ \theta_t$.

Optional times with respect to a Markov process are often constructed recursively in terms of shifts on the underlying path space. Thus, for any pair of optional times $\sigma$ and $\tau$ on the canonical space, we may consider the random time $\gamma = \sigma + \tau \circ \theta_\sigma$, with the understanding that $\gamma = \infty$ when $\sigma = \infty$. Under weak restrictions on space and filtration, we show that $\gamma$ is again optional. Let $C(S)$ and $D(S)$ denote the spaces of continuous or rcll functions, respectively, from $\mathbb{R}_+$ to $S$.

Proposition 8.8 (compound optional times) For any metric space $S$, let $\sigma$ and $\tau$ be optional times on the canonical space $S^\infty$, $C(S)$, or $D(S)$, endowed with the right-continuous, induced filtration. Then even $\gamma = \sigma + \tau \circ \theta_\sigma$ is optional.

Proof: Since $\sigma \wedge n + \tau \circ \theta_{\sigma \wedge n} \uparrow \gamma$, we may assume by Lemma 7.3 that $\sigma$ is bounded. Let $X$ denote the canonical process with induced filtration $\mathcal{F}$. Since $X$ is $\mathcal{F}^+$-progressive, $X_{\sigma+s} = X_s \circ \theta_\sigma$ is $\mathcal{F}^+_{\sigma+s}$-measurable for every $s \ge 0$ by Lemma 7.5. Fixing any $t \ge 0$, it follows that all sets $A = \{X_s \in B\}$ with $s \le t$ and $B \in \mathcal{S}$ satisfy $\theta_\sigma^{-1} A \in \mathcal{F}^+_{\sigma+t}$. The sets $A$ with the latter property form a $\sigma$-field, and therefore
$$\theta_\sigma^{-1} \mathcal{F}_t \subset \mathcal{F}^+_{\sigma+t}, \qquad t \ge 0. \quad (11)$$
Now fix any $t > 0$, and note that
$$\{\gamma < t\} = \bigcup_{r \in \mathbb{Q} \cap (0,t)} \{\sigma < r,\ \tau \circ \theta_\sigma < t - r\}. \quad (12)$$
For every $r \in (0, t)$ we have $\{\tau < t - r\} \in \mathcal{F}_{t-r}$, so $\theta_\sigma^{-1}\{\tau < t - r\} \in \mathcal{F}^+_{\sigma+t-r}$ by (11), and Lemma 7.2 yields
$$\{\sigma < r,\ \tau \circ \theta_\sigma < t - r\} = \{\sigma + t - r < t\} \cap \theta_\sigma^{-1}\{\tau < t - r\} \in \mathcal{F}_t.$$
Thus, $\{\gamma < t\} \in \mathcal{F}_t$ by (12), and so $\gamma$ is $\mathcal{F}^+$-optional by Lemma 7.2. $\Box$

We proceed to show how the elementary Markov property may be extended to suitable optional times. The present statement is only preliminary, and stronger versions are obtained under further conditions in Theorems 12.14, 13.11, and 19.17.

Proposition 8.9 (strong Markov property) Fix a time-homogeneous Markov process $X$ on $T = \mathbb{R}_+$ or $\mathbb{Z}_+$, and let $\tau$ be an optional time taking countably many values. Then
$$P[\theta_\tau X \in A \mid \mathcal{F}_\tau] = P_{X_\tau} A \text{ a.s. on } \{\tau < \infty\}, \qquad A \in \mathcal{S}^T. \quad (13)$$
If $X$ is canonical, it is equivalent that
$$E_\nu[\xi \circ \theta_\tau \mid \mathcal{F}_\tau] = E_{X_\tau}\xi \quad P_\nu\text{-a.s. on } \{\tau < \infty\}, \quad (14)$$
for any distribution $\nu$ on $S$ and bounded or nonnegative random variable $\xi$.

Since $\{\tau < \infty\} \in \mathcal{F}_\tau$, we note that (13) and (14) make sense by Lemma 6.2, although $\theta_\tau X$ and $P_{X_\tau}$ are defined only for $\tau < \infty$.

Proof: By Lemmas 6.2 and 7.1 we may assume that $\tau = t$ is finite and nonrandom. For sets $A$ of the form
$$A = (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B, \qquad t_1 < \cdots < t_n,\ B \in \mathcal{S}^n,\ n \in \mathbb{N}, \quad (15)$$
Proposition 8.2 yields
$$P[\theta_t X \in A \mid \mathcal{F}_t] = P[(X_{t+t_1}, \ldots, X_{t+t_n}) \in B \mid \mathcal{F}_t] = (\mu_{t_1} \otimes \mu_{t_2-t_1} \otimes \cdots \otimes \mu_{t_n-t_{n-1}})(X_t, B) = P_{X_t} A,$$
which extends by a monotone class argument to arbitrary $A \in \mathcal{S}^T$. In the canonical case we note that (13) is equivalent to (14) with $\xi = 1_A$, since in that case $\xi \circ \theta_\tau = 1\{\theta_\tau X \in A\}$. The result extends by linearity and monotone convergence to general $\xi$. $\Box$

When $X$ is both space- and time-homogeneous, the strong Markov property can be stated without reference to the family $(P_x)$.

Theorem 8.10 (space and time homogeneity) Let $X$ be a space- and time-homogeneous Markov process in a measurable Abelian group $S$. Then
$$P_x A = P_0(A - x), \qquad x \in S,\ A \in \mathcal{S}^T. \quad (16)$$
Furthermore, (13) holds for a given optional time $\tau < \infty$ iff $X_\tau$ is a.s. $\mathcal{F}_\tau$-measurable and
$$\theta_\tau X - X_\tau \stackrel{d}{=} X - X_0, \qquad (\theta_\tau X - X_\tau) \perp\!\!\!\perp \mathcal{F}_\tau. \quad (17)$$
8. Markov Processes and Discrete-Time Chains 147 liminary, and stronger versions are obtained under further conditions in Theorems 12.14, 13.11, and 19.17. Proposition 8.9 (strong Markov property) Fix a time-homogeneous Mar- kov process X on T == 1R.+ or Z+, and let T be an optional time taking countably many values. Then P[O,XEAIF,]=PXTA a.s.on{r<oo}, AES T . (13) If X is canonical, it is equivalent that Ev[ 0 0, IF,] == EXT' Pv-a.s. on {r < oo}, (14) for any distribution v on S and bounded or nonnegative random variable . Since {T < co} E F" we note that (13) and (14) make sense by Lemma 6.2, although O,X and Px.,. are defined only for r < 00. Proof: By Lemmas 6.2 and 7.1 we may assume that T == t is finite and nonrandom. For sets A of the form A == (7r t 1 , . . . , 7r t n ) -1 B, t 1 < . .. < tn, B E sn, n EN, (15 ) Proposition 8.2 yields P[Ot X E AIFtJ P[(X t + t1 , . . . , X t + tn ) E BIFt] (J-ttl  J.L t 2- t l @ . . . Q9 J-ttn-tn-l )(X t , B) == PXtA, which extends by a monotone class argument to arbitrary A E ST. In the canonical case we note that (13) is equivalent to (14) with  == lA, since in that case  00, == 1 {O,X E A}. The result extends by linearity and monotone convergence to general . 0 When X is both space- and time-homogeneous, the strong Markov property can be stated without reference to the family (Px). Theorem 8.10 (space and time homogeneity) Let X be a space- and time- homogeneous Markov process in a measurable Abelian group S. Then PxA==Po(A-x), XES, AES T . (16) Furthermore, (13) holds for a given optional time T < 00 iff X, '/,s a.s. FT-measurable and x-X o d (),X - X". Jl F,. (17) 
148 Foundations of Modern Probability Proof: By Proposition 8.2 we get for any set A as in (15) Px 0 (7r tl , . . . , 7r t n ) -1 B (J-ltl Q9 J-l t 2- t l @ ... Q9 J-ltn-tn-l)(X, B) (J-ltl Q9 J-l t 2- t l Q9 . . . Q9/-lt n -t n -l )(0, B - x) Po 0 (11'" t l'.'.' 7rt n )-l(B - x) = Po(A - x), which extends to (16) by a monotone class argument. Next assume (13). Letting A == 11'"0 1 B with B E S, we get Px A IB(X,) == PX T {7rQ E B} == P[X, E BIF,] a.s., and so X, is a.s. F,-measurable. By (16) and Theorem 6.4 we have P[B,X - X, E AIF,] == PxT(A + X,) == PoA, A EST, (18) which shows that (),X - X, is independent of F, with distribution Po. For T == 0 we get in particular £(X - Xo) = Po, and (17) follows. Next assume (17). To deduce (13), let A E ST be arbitrary, and conclude from (16) and Theorem 6.4 that P[OrX E AIF.,.] P[O,X - X r E A - XrIF.,.] Po(A - X,) == PXrA. o If a time-homogeneous Markov process X has initial distribution v, then the distribution at time t E T equals Vt == V/-lt, or Vt B = J V (dx)f..Lt (x, B), B E 8, t E T. A distribution v is said to be invariant for the semi group (J.-Lt) if Vt is independent of t, so that Vjlt = v for all t E T. We also say that a process X on T is stationary if ()tX d X for all t E T. The two notions are related as follows. Lemma 8.11 (stationarity and invariance) Let X be a time-homogene- ous Markov process on T with transition kernels J-lt and initial distribution v. Then X is stationary iff v is invariant for (J-lt). Proof: Assuming v to be invariant, we get by Proposition 8.2 d (X t + tl ,... ,X t + tn ) = (X tl ,... ,X tn ), t, t 1 < ... < t n in T, and the stationarity of X follows by Proposition 3.2. o For processes X in discrete time, we may consider the sequence of suc- cessive visits to a fixed state YES. Assuming the process to be canonical, we may introduce the hitting time Ty == inf{n E N; X n == y} and then define recursively k+ 1 k () Ty = Ty + Ty 0 ,;, k E Z+, 
starting from $\tau_y^0 = 0$. Let us further introduce the occupation times
$$\kappa_y = \sup\{k;\ \tau_y^k < \infty\} = \sum_{n \ge 1} 1\{X_n = y\}, \qquad y \in S.$$
The next result expresses the distribution of $\kappa_y$ in terms of the hitting probabilities
$$r_{xy} = P_x\{\tau_y < \infty\} = P_x\{\kappa_y > 0\}, \qquad x, y \in S.$$

Proposition 8.12 (occupation times) For any $x, y \in S$ and $k \in \mathbb{N}$,
$$P_x\{\kappa_y \ge k\} = P_x\{\tau_y^k < \infty\} = r_{xy} r_{yy}^{k-1}, \quad (19)$$
$$E_x \kappa_y = \frac{r_{xy}}{1 - r_{yy}}. \quad (20)$$

Proof: By the strong Markov property, we get for any $k \in \mathbb{N}$
$$P_x\{\tau_y^{k+1} < \infty\} = P_x\{\tau_y^k < \infty,\ \tau_y \circ \theta_{\tau_y^k} < \infty\} = P_x\{\tau_y^k < \infty\}\, P_y\{\tau_y < \infty\} = r_{yy} P_x\{\tau_y^k < \infty\},$$
and the second relation in (19) follows by induction on $k$. The first relation is clear from the fact that $\kappa_y \ge k$ iff $\tau_y^k < \infty$. To deduce (20), conclude from (19) and Lemma 3.4 that
$$E_x \kappa_y = \sum_{k \ge 1} P_x\{\kappa_y \ge k\} = \sum_{k \ge 1} r_{xy} r_{yy}^{k-1} = \frac{r_{xy}}{1 - r_{yy}}. \qquad \Box$$

For $x = y$ the last result yields
$$P_x\{\kappa_x \ge k\} = P_x\{\tau_x^k < \infty\} = r_{xx}^k, \qquad k \in \mathbb{N}.$$
Thus, under $P_x$, the number of visits to $x$ is either a.s. infinite or geometrically distributed with mean $E_x\kappa_x + 1 = (1 - r_{xx})^{-1} < \infty$. This leads to a corresponding classification of the states into recurrent and transient ones.

Recurrence can often be deduced from the existence of an invariant distribution. Here and below we write $p^n_{xy} = \mu_n(x, \{y\})$.

Proposition 8.13 (invariant distributions and recurrence) If an invariant distribution $\nu$ exists, then any state $x$ with $\nu\{x\} > 0$ is recurrent.

Proof: By the invariance of $\nu$,
$$0 < \nu\{x\} = \int \nu(dy)\, p^n_{yx}, \qquad n \in \mathbb{N}. \quad (21)$$
Thus, by Proposition 8.12 and Fubini's theorem,
$$\infty = \sum_{n \ge 1} \int \nu(dy)\, p^n_{yx} = \int \nu(dy) \sum_{n \ge 1} p^n_{yx} = \int \nu(dy)\, \frac{r_{yx}}{1 - r_{xx}} \le \frac{1}{1 - r_{xx}}.$$
Hence, $r_{xx} = 1$, and so $x$ is recurrent. $\Box$
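For a finite chain, an invariant distribution can be computed as a normalized left eigenvector of the transition matrix for eigenvalue 1. A sketch with an illustrative matrix (an assumption, not from the text), consistent with Proposition 8.13 in that every state is charged by $\nu$:

```python
import numpy as np

# An invariant distribution solves nu P = nu, i.e. nu is a left eigenvector
# of P for eigenvalue 1 (equivalently an eigenvector of P.T).
P = np.array([[0.0, 1.0, 0.0],
              [0.3, 0.0, 0.7],
              [0.5, 0.5, 0.0]])

vals, vecs = np.linalg.eig(P.T)
k = np.argmin(np.abs(vals - 1.0))      # pick the eigenvalue closest to 1
nu = np.real(vecs[:, k])
nu = nu / nu.sum()                     # normalize (also fixes the sign)

assert np.allclose(nu @ P, nu)         # invariance: nu mu_1 = nu
assert nu.min() > 0                    # every state is charged by nu
```

For an irreducible chain the Perron-Frobenius theorem guarantees that this eigenvector can be chosen strictly positive, matching the recurrence conclusion above.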
The period $d_x$ of a state $x$ is defined as the greatest common divisor of the set $\{n \in \mathbb{N};\ p^n_{xx} > 0\}$, and we say that $x$ is aperiodic if $d_x = 1$.

Proposition 8.14 (positivity) If $x \in S$ is aperiodic, then $p^n_{xx} > 0$ for all but finitely many $n$.

Proof: Define $S = \{n \in \mathbb{N};\ p^n_{xx} > 0\}$, and conclude from the Chapman-Kolmogorov relation that $S$ is closed under addition. Since $S$ has greatest common divisor 1, the generated additive group equals $\mathbb{Z}$. In particular, there exist some $n_1, \ldots, n_k \in S$ and $z_1, \ldots, z_k \in \mathbb{Z}$ with $\sum_j z_j n_j = 1$. Writing $m = n_1 \sum_j |z_j| n_j$, we note that any number $n \ge m$ can be represented, for suitable $h \in \mathbb{Z}_+$ and $r \in \{0, \ldots, n_1 - 1\}$, as
$$n = m + hn_1 + r = hn_1 + \sum_{j \le k} (n_1|z_j| + rz_j) n_j \in S. \qquad \Box$$

For each $x \in S$, the successive excursions of $X$ from $x$ are given by
$$Y_n = X^{\tau_x} \circ \theta_{\tau_x^n}, \qquad n \in \mathbb{Z}_+,$$
as long as $\tau_x^n < \infty$. To allow for infinite excursions, we may introduce an extraneous element $\delta \notin S$, and define $Y_n = \bar\delta = (\delta, \delta, \ldots)$ whenever $\tau_x^n = \infty$. Conversely, $X$ may be recovered from the $Y_n$ through the formulas
$$\tau_x^n = \sum_{k < n} \inf\{t > 0;\ Y_k(t) = x\}, \quad (22)$$
$$X_t = Y_n(t - \tau_x^n), \qquad \tau_x^n \le t < \tau_x^{n+1},\ n \in \mathbb{Z}_+. \quad (23)$$
The distribution $\nu_x = P_x \circ Y_0^{-1}$ is called the excursion law at $x$. When $x$ is recurrent and $r_{yx} = 1$, Proposition 8.9 shows that $Y_1, Y_2, \ldots$ are i.i.d. $\nu_x$ under $P_y$. The result extends to the general case, as follows.

Proposition 8.15 (excursions) Consider a discrete-time Markov process $X$ in a Borel space $S$, and fix any $x \in S$. Then there exist some independent processes $Y_0, Y_1, \ldots$ in $S$, all but $Y_0$ with distribution $\nu_x$, such that $X$ is a.s. given by (22) and (23).

Proof: Put $\tilde{Y}_0 = Y_0$, and let $\tilde{Y}_1, \tilde{Y}_2, \ldots$ be independent of $\tilde{Y}_0$ and i.i.d. $\nu_x$. Construct associated random times $\tilde\tau_x^0, \tilde\tau_x^1, \ldots$ as in (22), and define a process $\tilde{X}$ as in (23). By Corollary 6.11, it is enough to show that $\tilde{X} \stackrel{d}{=} X$. Writing $\kappa = \sup\{n \ge 0;\ \tau_x^n < \infty\}$ and $\tilde\kappa = \sup\{n \ge 0;\ \tilde\tau_x^n < \infty\}$, it is equivalent to show that
$$(Y_0, \ldots, Y_\kappa, \bar\delta, \bar\delta, \ldots) \stackrel{d}{=} (\tilde{Y}_0, \ldots, \tilde{Y}_{\tilde\kappa}, \bar\delta, \bar\delta, \ldots). \quad (24)$$
Using the strong Markov property on the left and the independence of the $\tilde{Y}_n$ on the right, it is easy to check that both sides are Markov processes in $S^{\mathbb{Z}_+} \cup \{\bar\delta\}$ with the same initial distribution and transition kernel. Hence, (24) holds by Proposition 8.2. $\Box$
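The period and the conclusion of Proposition 8.14 are easy to check numerically. A sketch with an illustrative 3-state chain (an assumption, not from the text) in which the chain can return to state 0 in either 2 or 3 steps, so $d_0 = \gcd(2, 3) = 1$:

```python
import numpy as np
from math import gcd
from functools import reduce

# From 0: 0 -> 1 -> 0 (2 steps) or 0 -> 1 -> 2 -> 0 (3 steps).
P = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0]])

def return_times(P, x, N):
    """All n in {1, ..., N} with p^n_{xx} > 0."""
    Q = np.eye(len(P))
    times = []
    for n in range(1, N + 1):
        Q = Q @ P
        if Q[x, x] > 0:
            times.append(n)
    return times

times = return_times(P, 0, 30)
assert reduce(gcd, times) == 1           # state 0 is aperiodic
assert times == list(range(2, 31))       # p^n_{00} > 0 for every n >= 2
```

The numerical semigroup generated by 2 and 3 contains every integer $n \ge 2$, so $p^n_{00} > 0$ for all but one value of $n$, in line with the proposition.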
By a discrete-time Markov chain we mean a Markov process on the time scale $\mathbb{Z}_+$, taking values in a countable state space $S$. In this case the transition kernels of $X$ are determined by the $n$-step transition probabilities $p^n_{ij} = \mu_n(i, \{j\})$, $i, j \in S$, and the Chapman-Kolmogorov relation becomes
$$p^{m+n}_{ik} = \sum_j p^m_{ij}\, p^n_{jk}, \qquad i, k \in S,\ m, n \in \mathbb{N}, \quad (25)$$
or in matrix notation, $p^{m+n} = p^m p^n$. Thus, $p^n$ is the $n$th power of the matrix $p = p^1$, which justifies our notation. Regarding the initial distribution $\nu$ as a row vector $(\nu_i)$, we may write the distribution at time $n$ as $\nu p^n$.

As before, we define $r_{ij} = P_i\{\tau_j < \infty\}$, where $\tau_j = \inf\{n > 0;\ X_n = j\}$. A Markov chain in $S$ is said to be irreducible if $r_{ij} > 0$ for all $i, j \in S$, so that every state can be reached from any other state. For irreducible chains, all states have the same recurrence and periodicity properties.

Proposition 8.16 (irreducible chains) For any irreducible Markov chain,
(i) the states are either all recurrent or all transient;
(ii) all states have the same period;
(iii) if $\nu$ is invariant, then $\nu_i > 0$ for all $i$.

For the proof of (i) we need the following lemma.

Lemma 8.17 (recurrence classes) Let $i \in S$ be recurrent, and define $S_i = \{j \in S;\ r_{ij} > 0\}$. Then $r_{jk} = 1$ for any $j, k \in S_i$, and all states in $S_i$ are recurrent.

Proof: By the recurrence of $i$ and the strong Markov property, we get for any $j \in S_i$
$$0 = P_i\{\tau_j < \infty,\ \tau_i \circ \theta_{\tau_j} = \infty\} = P_i\{\tau_j < \infty\}\, P_j\{\tau_i = \infty\} = r_{ij}(1 - r_{ji}).$$
Since $r_{ij} > 0$ by hypothesis, we obtain $r_{ji} = 1$. Fixing any $m, n \in \mathbb{N}$ with $p^m_{ij}, p^n_{ji} > 0$, we get by (25)
$$E_j\kappa_j \ge \sum_{s \ge 0} p^{m+n+s}_{jj} \ge \sum_{s \ge 0} p^n_{ji}\, p^s_{ii}\, p^m_{ij} = p^n_{ji}\, p^m_{ij} \sum_{s \ge 0} p^s_{ii} = \infty,$$
and so $j$ is recurrent by Proposition 8.12. Reversing the roles of $i$ and $j$ gives $r_{ij} = 1$. Finally, we get for any $j, k \in S_i$
$$r_{jk} \ge P_j\{\tau_i < \infty,\ \tau_k \circ \theta_{\tau_i} < \infty\} = r_{ji} r_{ik} = 1. \qquad \Box$$

Proof of Proposition 8.16: (i) This is clear from Lemma 8.17.
(ii) Fix any $i, j \in S$, and choose $m, n \in \mathbb{N}$ with $p^m_{ij}, p^n_{ji} > 0$. By (25),
$$p^{m+h+n}_{jj} \ge p^n_{ji}\, p^h_{ii}\, p^m_{ij}, \qquad h \ge 0.$$
For $h = 0$ we get $p^{m+n}_{jj} > 0$, and so $d_j \mid (m + n)$ ($d_j$ divides $m + n$). Hence, in general, $p^h_{ii} > 0$ implies $d_j \mid h$, and we get $d_j \le d_i$. Reversing the roles of $i$ and $j$ yields the opposite inequality.

(iii) Fix any $i \in S$. Choosing $j \in S$ with $\nu_j > 0$ and then $n \in \mathbb{N}$ with $p^n_{ji} > 0$, we see from (21) that even $\nu_i > 0$. $\Box$

We may now state the basic ergodic theorem for irreducible Markov chains. Related results will appear in Chapters 12, 19, and 23. For any signed measure $\mu$ we define $\|\mu\| = \sup_A |\mu A|$.

Theorem 8.18 (ergodic behavior, Markov, Kolmogorov, Orey) For any irreducible, aperiodic Markov chain in $S$, exactly one of these cases occurs:
(i) There exists a unique invariant distribution $\nu$; the latter satisfies $\nu_i > 0$ for all $i \in S$, and for any distribution $\mu$ on $S$ we have
$$\lim_{n \to \infty} \|P_\mu \circ \theta_n^{-1} - P_\nu\| = 0. \quad (26)$$
(ii) No invariant distribution exists, and we have
$$\lim_{n \to \infty} p^n_{ij} = 0, \qquad i, j \in S. \quad (27)$$

A Markov chain satisfying (i) is clearly recurrent, whereas one that satisfies (ii) may be either recurrent or transient. This leads to the further classification of the irreducible, aperiodic, and recurrent Markov chains into positive recurrent and null-recurrent ones, depending on whether (i) or (ii) applies.

We shall prove Theorem 8.18 by the powerful method of coupling. Here the general idea is to compare the distributions of two processes $X$ and $Y$ by constructing copies $\tilde{X} \stackrel{d}{=} X$ and $\tilde{Y} \stackrel{d}{=} Y$ on a common probability space. By a suitable choice of joint distribution, one may sometimes reduce the original problem to a pathwise comparison. The coupling approach often leads to simple and transparent proofs; we shall see further applications of the method in Chapters 9, 14, 15, 16, 20, and 23. For our present needs, an elementary coupling by independence is sufficient.

Lemma 8.19 (coupling) Let $X$ and $Y$ be independent Markov chains in $S$ and $T$ with transition matrices $(p_{ii'})$ and $(q_{jj'})$, respectively.
Then $(X, Y)$ is a Markov chain in $S \times T$ with transition matrix $r_{ij,i'j'} = p_{ii'} q_{jj'}$. If $X$ and $Y$ are irreducible and aperiodic, then so is $(X, Y)$; in that case $(X, Y)$ is recurrent whenever invariant distributions exist for both $X$ and $Y$.

Proof: The first assertion is easily proved by computation of the finite-dimensional distributions of $(X, Y)$ for an arbitrary initial distribution $\mu \otimes \nu$ on $S \times T$, using Proposition 8.2. Now assume that $X$ and $Y$ are irreducible and aperiodic. Fixing any $i, i' \in S$ and $j, j' \in T$, we see from Proposition 8.14 that $r^n_{ij,i'j'} = p^n_{ii'} q^n_{jj'} > 0$ for all but finitely many $n \in \mathbb{N}$, and so even $(X, Y)$ has the stated properties. Finally, if $\mu$ and $\nu$ are invariant
8. Markov Processes and Discrete-Time Chains 153

distributions for $X$ and $Y$, respectively, then $\mu \otimes \nu$ is invariant for $(X, Y)$, and the last assertion follows by Proposition 8.13. $\Box$

The point of the construction is that, if the coupled processes eventually meet, their distributions will agree asymptotically.

Lemma 8.20 (strong ergodicity) If the Markov chain in $S^2$ with transition matrix $p_{ii'}p_{jj'}$ is irreducible and recurrent, then for any distributions $\mu$ and $\nu$ on $S$,
$$\lim_{n\to\infty} \|P_\mu \circ \theta_n^{-1} - P_\nu \circ \theta_n^{-1}\| = 0. \qquad (28)$$

Proof (Doeblin): Let $X$ and $Y$ be independent with distributions $P_\mu$ and $P_\nu$. By Lemma 8.19 the pair $(X, Y)$ is again Markov with respect to the induced filtration $\mathcal{F}$, and by Proposition 8.9 it satisfies the strong Markov property at every finite optional time $\tau$. Taking $\tau = \inf\{n \ge 0;\, X_n = Y_n\}$, we get for any measurable set $A \subset S^\infty$
$$P[\theta_\tau X \in A \mid \mathcal{F}_\tau] = P_{X_\tau} A = P_{Y_\tau} A = P[\theta_\tau Y \in A \mid \mathcal{F}_\tau].$$
In particular, $(\tau, X^\tau, \theta_\tau X) \overset{d}{=} (\tau, X^\tau, \theta_\tau Y)$. Defining $\tilde X_n = X_n$ for $n < \tau$ and $\tilde X_n = Y_n$ otherwise, we obtain $\tilde X \overset{d}{=} X$, and so for any $A$ as above
$$|P\{\theta_n X \in A\} - P\{\theta_n Y \in A\}| = |P\{\theta_n \tilde X \in A\} - P\{\theta_n Y \in A\}| = |P\{\theta_n \tilde X \in A,\, \tau > n\} - P\{\theta_n Y \in A,\, \tau > n\}| \le P\{\tau > n\} \to 0. \qquad\Box$$

The next result ensures the existence of an invariant distribution. Here a coupling argument is again useful.

Lemma 8.21 (existence) If (27) fails, there exists an invariant distribution.

Proof: Assume that (27) fails, so that $\limsup_n p^n_{i_0 j_0} > 0$ for some $i_0, j_0 \in S$. By a diagonal argument we may choose a subsequence $N' \subset \mathbb{N}$ and some constants $c_j$ with $c_{j_0} > 0$ such that $p^n_{i_0 j} \to c_j$ along $N'$ for every $j \in S$. Note that $0 < \sum_j c_j \le 1$ by Fatou's lemma.

To extend the convergence to arbitrary $i$, let $X$ and $Y$ be independent processes with the given transition matrix $(p_{ij})$, and conclude from Lemma 8.19 that $(X, Y)$ is an irreducible Markov chain on $S^2$ with transition probabilities $q_{ij,i'j'} = p_{ii'}p_{jj'}$. If $(X, Y)$ is transient, then by Proposition 8.12
$$\sum\nolimits_n (p^n_{ij})^2 = \sum\nolimits_n q^n_{ij,ij} < \infty, \qquad i, j \in S,$$
and (27) follows. The pair $(X, Y)$ is then recurrent, and Lemma 8.20 yields $p^n_{ij} - p^n_{i_0 j} \to 0$ for all $i, j \in S$. Hence, $p^n_{ij} \to c_j$ along $N'$ for all $i$ and $j$.
Next conclude from the Chapman-Kolmogorov relation that
$$p^{n+1}_{ik} = \sum\nolimits_j p^n_{ij} p_{jk} = \sum\nolimits_j p_{ij} p^n_{jk}, \qquad i, k \in S.$$
Using Fatou's lemma on the left and dominated convergence on the right, we get as $n \to \infty$ along $N'$
$$\sum\nolimits_j c_j p_{jk} \le \sum\nolimits_j p_{ij} c_k = c_k, \qquad k \in S. \qquad (29)$$
Summing over $k$ gives the same value $\sum_j c_j \le 1$ on both sides, and so (29) holds with equality. Thus, $(c_i)$ is invariant, and we get an invariant distribution $\nu$ by taking $\nu_i = c_i / \sum_j c_j$. $\Box$

Proof of Theorem 8.18: If no invariant distribution exists, then (27) holds by Lemma 8.21. Now let $\nu$ be an invariant distribution, and note that $\nu_i > 0$ for all $i$ by Proposition 8.16. By Lemma 8.19 the coupled chain in Lemma 8.20 is irreducible and recurrent, so (28) holds for any initial distribution $\mu$, and (26) follows since $P_\nu \circ \theta_n^{-1} = P_\nu$ by Lemma 8.11. If even $\nu'$ is invariant, then (26) yields $P_{\nu'} = P_\nu$, and so $\nu' = \nu$. $\Box$

The limits in Theorem 8.18 may be expressed in terms of the mean recurrence times $E_j\tau_j$, as follows.

Theorem 8.22 (mean recurrence times, Kolmogorov) For any Markov chain in $S$ and states $i, j \in S$ with $j$ aperiodic, we have
$$\lim_{n\to\infty} p^n_{ij} = \frac{P_i\{\tau_j < \infty\}}{E_j \tau_j}. \qquad (30)$$

Proof: First take $i = j$. If $j$ is transient, then $p^n_{jj} \to 0$ and $E_j\tau_j = \infty$, and so (30) is trivially true. If instead $j$ is recurrent, then the restriction of $X$ to the set $S_j = \{i;\, r_{ji} > 0\}$ is irreducible recurrent by Lemma 8.17 and aperiodic by Proposition 8.16. Hence, $p^n_{jj}$ converges by Theorem 8.18. To identify the limit, define
$$L_n = \sup\{k \in \mathbb{Z}_+;\, \tau^k_j \le n\} = \sum_{k=1}^n 1\{X_k = j\}, \qquad n \in \mathbb{N}.$$
The $\tau^n_j$ form a random walk under $P_j$, and so, by the law of large numbers,
$$\frac{L_{\tau^n_j}}{\tau^n_j} = \frac{n}{\tau^n_j} \to \frac{1}{E_j\tau_j} \quad \text{a.s. } P_j.$$
By the monotonicity of $L_k$ and $\tau^n_j$ it follows that $L_n/n \to (E_j\tau_j)^{-1}$ a.s. $P_j$. Noting that $L_n \le n$, we get by dominated convergence
$$\frac{1}{n}\sum_{k=1}^n p^k_{jj} = \frac{E_j L_n}{n} \to \frac{1}{E_j\tau_j},$$
and (30) follows.
Now let $i \ne j$. Using the strong Markov property, the disintegration theorem, and dominated convergence, we get
$$p^n_{ij} = P_i\{X_n = j\} = P_i\{\tau_j \le n,\, (\theta_{\tau_j} X)_{n-\tau_j} = j\} = E_i\big[p^{n-\tau_j}_{jj};\ \tau_j \le n\big] \to P_i\{\tau_j < \infty\}/E_j\tau_j. \qquad\Box$$

We return to continuous time and a general state space, to clarify the nature of the strong Markov property of a process $X$ at finite optional times $\tau$. The condition is clearly a combination of the conditional independence $\theta_\tau X \perp\!\!\!\perp_{X_\tau} \mathcal{F}_\tau$ and the strong homogeneity
$$P[\theta_\tau X \in \cdot \mid X_\tau] = P_{X_\tau} \quad \text{a.s.} \qquad (31)$$
Though (31) appears to be weaker than (13), the two properties are in fact equivalent, under suitable regularity conditions on $X$ and $\mathcal{F}$.

Theorem 8.23 (strong homogeneity) Fix a separable metric space $(S, \rho)$, a probability kernel $(P_x)$ from $S$ to $D(S)$, and a right-continuous filtration $\mathcal{F}$ on $\mathbb{R}_+$. Let $X$ be an $\mathcal{F}$-adapted rcll process in $S$ such that (31) holds for all bounded optional times $\tau$. Then $X$ satisfies the strong Markov property.

Our proof is based on a 0-1 law for absorption probabilities, involving the sets
$$I = \{w \in D;\, w_t \equiv w_0\}, \qquad A = \{x \in S;\, P_x I = 1\}. \qquad (32)$$

Lemma 8.24 (absorption) For $X$ as in Theorem 8.23 and for any optional time $\tau < \infty$, we have
$$P_{X_\tau} I = 1_I(\theta_\tau X) = 1_A(X_\tau) \quad \text{a.s.} \qquad (33)$$

Proof: We may clearly assume that $\tau$ is bounded, say by $n \in \mathbb{N}$. Fix any $h > 0$, and divide $S$ into disjoint Borel sets $B_1, B_2, \dots$ of diameter $< h$. For each $k \in \mathbb{N}$, define
$$\tau_k = n \wedge \inf\{t > \tau;\, \rho(X_\tau, X_t) > h\} \quad \text{on } \{X_\tau \in B_k\}, \qquad (34)$$
and put $\tau_k = \tau$ otherwise. The times $\tau_k$ are again bounded and optional, and we note that
$$\{X_{\tau_k} \in B_k\} \subset \{X_\tau \in B_k,\ \sup\nolimits_{t\in[\tau,n]}\rho(X_\tau, X_t) \le h\}. \qquad (35)$$
Using (31) and (35), we get as $n \to \infty$ and $h \to 0$
$$E[P_{X_\tau} I^c;\ \theta_\tau X \in I] = \sum\nolimits_k E[P_{X_\tau} I^c;\ \theta_\tau X \in I,\ X_\tau \in B_k] \le \sum\nolimits_k E[P_{X_{\tau_k}} I^c;\ X_{\tau_k} \in B_k] = \sum\nolimits_k P\{\theta_{\tau_k} X \notin I,\ X_{\tau_k} \in B_k\} \le \sum\nolimits_k P\{\theta_\tau X \notin I,\ X_\tau \in B_k,\ \sup\nolimits_{t\in[\tau,n]}\rho(X_\tau, X_t) \le h\} \to P\{\theta_\tau X \notin I,\ \sup\nolimits_{t\ge\tau}\rho(X_\tau, X_t) = 0\} = 0,$$
and so $P_{X_\tau} I = 1$ a.s. on $\{\theta_\tau X \in I\}$. Since also $E P_{X_\tau} I = P\{\theta_\tau X \in I\}$ by (31), we obtain the first relation in (33). The second relation follows by the definition of $A$. $\Box$

Proof of Theorem 8.23: Define $I$ and $A$ as in (32). To prove (13) on $\{X_\tau \in A\}$, fix any times $t_1 < \cdots < t_n$ and Borel sets $B_1, \dots, B_n$, write $B = \bigcap_k B_k$, and conclude from (31) and Lemma 8.24 that
$$P\Big[\bigcap\nolimits_k \{X_{\tau+t_k} \in B_k\}\,\Big|\,\mathcal{F}_\tau\Big] = P[X_\tau \in B \mid \mathcal{F}_\tau] = 1\{X_\tau \in B\} = P[X_\tau \in B \mid X_\tau] = P_{X_\tau}\{w_0 \in B\} = P_{X_\tau}\bigcap\nolimits_k \{w_{t_k} \in B_k\}.$$
This extends to (13) by a monotone class argument.

To prove (13) on $\{X_\tau \notin A\}$, we may assume that $\tau \le n$ a.s., and divide $A^c$ into disjoint Borel sets $B_k$ of diameter $< h$. Fix any $F \in \mathcal{F}_\tau$ with $F \subset \{X_\tau \notin A\}$. For each $k \in \mathbb{N}$, define $\tau_k$ as in (34) on the set $F^c \cap \{X_\tau \in B_k\}$, and let $\tau_k = \tau$ otherwise. Note that (35) remains true on $F^c$. Using (31), (35), and Lemma 8.24, we get as $n \to \infty$ and $h \to 0$
$$\big|P[\theta_\tau X \in \cdot\,;\, F] - E[P_{X_\tau};\, F]\big| = \Big|\sum\nolimits_k E\big[1\{\theta_\tau X \in \cdot\} - P_{X_\tau};\ X_\tau \in B_k,\ F\big]\Big| = \Big|\sum\nolimits_k E\big[1\{\theta_{\tau_k} X \in \cdot\} - P_{X_{\tau_k}};\ X_{\tau_k} \in B_k,\ F\big]\Big| = \Big|\sum\nolimits_k E\big[1\{\theta_{\tau_k} X \in \cdot\} - P_{X_{\tau_k}};\ X_{\tau_k} \in B_k,\ F^c\big]\Big| \le \sum\nolimits_k P[X_{\tau_k} \in B_k;\ F^c] \le \sum\nolimits_k P\{X_\tau \in B_k,\ \sup\nolimits_{t\in[\tau,n]}\rho(X_\tau, X_t) \le h\} \to P\{X_\tau \notin A,\ \sup\nolimits_{t\ge\tau}\rho(X_\tau, X_t) = 0\} = 0.$$
Hence, the left-hand side is zero. $\Box$
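The coupling idea behind Lemma 8.20 and Theorem 8.18 is easy to illustrate numerically. The sketch below is not from the text: it uses a hypothetical two-state chain with an arbitrarily chosen transition matrix. It computes the exact total variation distance $\|P_\mu\circ\theta_n^{-1} - P_\nu\circ\theta_n^{-1}\|$ at the marginal level via matrix iteration, and separately estimates the coupling bound $P\{\tau > n\}$ by simulating independent copies until they meet.

```python
import random

# Hypothetical 2-state transition matrix (illustrative, not from the text).
P = [[0.7, 0.3],
     [0.2, 0.8]]

def step(dist):
    """One step of the chain: row vector 'dist' times the matrix P."""
    return [sum(dist[i] * P[i][j] for i in range(2)) for j in range(2)]

def tv(p, q):
    """Total variation distance, sup_A |pA - qA| = (1/2) sum_j |p_j - q_j|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Exact marginal distributions after n steps, from two initial laws mu, nu.
mu, nu = [1.0, 0.0], [0.0, 1.0]
for n in range(30):
    mu, nu = step(mu), step(nu)
print(tv(mu, nu))  # essentially zero, as in Theorem 8.18 (i)

# Coupling time: run independent copies (X, Y) until they first meet.
# By the coupling inequality, the TV distance at time n is at most P{tau > n}.
def coupling_time(rng):
    x, y = 0, 1
    n = 0
    while x != y:
        x = 0 if rng.random() < P[x][0] else 1
        y = 0 if rng.random() < P[y][0] else 1
        n += 1
    return n

rng = random.Random(1)
times = [coupling_time(rng) for _ in range(10000)]
print(sum(t > 5 for t in times) / len(times))  # Monte Carlo estimate of P{tau > 5}
```

Here the exact distance decays geometrically while the simulated tail probability of the meeting time dominates it, exactly as the coupling inequality in the proof of Lemma 8.20 predicts.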
Exercises

1. Let $X$ be a process with $X_s \perp\!\!\!\perp_{X_t} \{X_u;\, u \ge t\}$ for all $s < t$. Show that $X$ is Markov with respect to the induced filtration.

2. Let $X$ be a Markov process in some space $S$, and fix a measurable function $f$ on $S$. Show by an example that the process $Y_t = f(X_t)$ need not be Markov. (Hint: Let $X$ be a simple symmetric random walk on $\mathbb{Z}$, and take $f(x) = [x/2]$.)

3. Let $X$ be a Markov process in $\mathbb{R}$ with transition functions $\mu_t$ satisfying $\mu_t(x, B) = \mu_t(-x, -B)$. Show that the process $Y_t = |X_t|$ is again Markov.

4. Fix any process $X$ on $\mathbb{R}_+$, and define $Y_t = X^t = \{X_{s\wedge t};\, s \ge 0\}$. Show that $Y$ is Markov with respect to the induced filtration.

5. Consider a random element $\xi$ in some Borel space and a filtration $\mathcal{F}$ with $\mathcal{F}_\infty \subset \sigma\{\xi\}$. Show that the measure-valued process $X_t = P[\xi \in \cdot \mid \mathcal{F}_t]$ is Markov. (Hint: Note that $\xi \perp\!\!\!\perp_{X_t} \mathcal{F}_t$ for all $t$.)

6. For any Markov process $X$ on $\mathbb{R}_+$ and time $u > 0$, show that the reversed process $Y_t = X_{u-t}$, $t \in [0, u]$, is Markov with respect to the induced filtration. Also show by an example that a possible time homogeneity of $X$ need not carry over to $Y$.

7. Let $X$ be a time-homogeneous Markov process in some Borel space $S$. Show that there exist some measurable functions $f_h : S \times [0,1] \to S$, $h > 0$, and $U(0,1)$ random variables $\vartheta_{t,h} \perp\!\!\!\perp X^t$, $t, h \ge 0$, such that $X_{t+h} = f_h(X_t, \vartheta_{t,h})$ a.s. for all $t, h \ge 0$.

8. Let $X$ be a time-homogeneous and rcll Markov process in some Polish space $S$. Show that there exist a measurable function $f : S \times [0,1] \to D(\mathbb{R}_+, S)$ and some $U(0,1)$ random variables $\vartheta_t \perp\!\!\!\perp X^t$ such that $\theta_t X = f(X_t, \vartheta_t)$ a.s. Extend the result to optional times taking countably many values.

9. Let $X$ be a process on $\mathbb{R}_+$ with state space $S$, and define $Y_t = (X_t, t)$, $t \ge 0$. Show that $X$ and $Y$ are simultaneously Markov, and that $Y$ is then time-homogeneous. Give a relation between the transition kernels for $X$ and $Y$. Express the strong Markov property of $Y$ at a random time $\tau$ in terms of the process $X$.
10. Let $X$ be a discrete-time Markov process in $S$ with invariant distribution $\nu$. Show for any measurable set $B \subset S$ that $P_\nu\{X_n \in B \text{ i.o.}\} \ge \nu B$. Use the result to give an alternative proof of Proposition 8.13. (Hint: Use Fatou's lemma.)

11. Fix an irreducible Markov chain in $S$ with period $d$. Show that $S$ has a unique partition into subsets $S_1, \dots, S_d$ such that $p_{ij} = 0$ unless $i \in S_k$ and $j \in S_{k+1}$ for some $k \in \{1, \dots, d\}$, where the addition is defined modulo $d$.

12. Let $X$ be an irreducible Markov chain with period $d$, and define $S_1, \dots, S_d$ as above. Show that the restrictions of $(X_{nd})$ to $S_1, \dots, S_d$ are irreducible, aperiodic, and either all positive recurrent or all null-recurrent. In the former case, show that the original chain has a unique invariant distribution $\nu$. Further show that (26) holds iff $\mu S_k = 1/d$ for all $k$. (Hint: If $(X_{nd})$ has an invariant distribution $\nu^k$ in $S_k$, then $\nu^{k+1}_j = \sum_i \nu^k_i p_{ij}$ form an invariant distribution in $S_{k+1}$.)

13. Given a Markov chain $X$ on $S$, define the classes $C_i$ as in Lemma 8.17. Show that if $j \in C_i$ but $i \notin C_j$ for some $i, j \in S$, then $i$ is transient. If instead $i \in C_j$ for every $j \in C_i$, show that $C_i$ is irreducible (i.e., the restriction of $X$ to $C_i$ is an irreducible Markov chain). Further show that the irreducible sets are disjoint and that every state outside all irreducible sets is transient.

14. For an arbitrary Markov chain, show that (26) holds iff $\sum_j |p^n_{ij} - \nu_j| \to 0$ for all $i$.

15. Let $X$ be an irreducible, aperiodic Markov chain in $\mathbb{N}$. Show that $X$ is transient iff $X_n \to \infty$ a.s. under any initial distribution and is null-recurrent iff the same divergence holds in probability but not a.s.

16. Show that, for every irreducible, positive recurrent subset $S_k \subset S$, there exists a unique invariant distribution $\nu_k$ restricted to $S_k$, and that every invariant distribution is a convex combination $\sum_k c_k \nu_k$.

17. Show that a Markov chain on a finite state space $S$ has at least one irreducible set and one invariant distribution. (Hint: Starting from any $i_0 \in S$, choose $i_1 \in C_{i_0}$, $i_2 \in C_{i_1}$, etc. Then $\bigcap_n C_{i_n}$ is irreducible.)

18. Let $X$ and $Y$ be independent Markov processes with transition kernels $\mu_{s,t}$ and $\nu_{s,t}$. Show that $(X, Y)$ is again Markov with transition kernels $\mu_{s,t}(x, \cdot) \otimes \nu_{s,t}(y, \cdot)$. (Hint: Compute the finite-dimensional distributions from Proposition 8.2, or use Proposition 6.8 with no computations.)

19. Let $X$ and $Y$ be independent, irreducible Markov chains with periods $d_1$ and $d_2$. Show that $Z = (X, Y)$ is irreducible iff $d_1$ and $d_2$ have greatest common divisor 1 and that $Z$ then has period $d_1 d_2$.
20. State and prove a discrete-time version of Theorem 8.23. Further simplify the continuous-time proof when S is countable. 
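To close the chapter, Theorems 8.18 and 8.22 lend themselves to a quick numerical check. The sketch below is an illustration, not part of the text: it uses a hypothetical two-state chain with transition matrix $P = \begin{pmatrix}1-a & a\\ b & 1-b\end{pmatrix}$, whose invariant distribution is $\nu = (b/(a+b),\, a/(a+b))$, so that by Theorem 8.22 the mean recurrence time of state 0 is $E_0\tau_0 = 1/\nu_0$.

```python
# Hypothetical two-state chain (illustrative values, not from the text):
#   P = [[1-a, a], [b, 1-b]], 0 < a, b < 1,
# with invariant distribution nu = (b/(a+b), a/(a+b)).  By Theorem 8.22,
# p^n_{00} -> 1/E_0 tau_0 = nu_0.
a, b = 0.3, 0.2
P = [[1 - a, a], [b, 1 - b]]

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = [[1.0, 0.0], [0.0, 1.0]]   # identity = P^0
for _ in range(50):
    Pn = matmul(Pn, P)          # Pn = P^50

nu0 = b / (a + b)
print(Pn[0][0], nu0)  # p^50_{00} is already extremely close to nu_0 = 0.4
```

The convergence is geometric at rate $|1-a-b|$, in line with the ergodic theorem for this irreducible, aperiodic chain.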
Chapter 9

Random Walks and Renewal Theory

Recurrence and transience; dependence on dimension; general recurrence criteria; symmetry and duality; Wiener-Hopf factorization; ladder time and height distribution; stationary renewal process; renewal theorem

A random walk in $\mathbb{R}^d$ is defined as a discrete-time random process $(S_n)$ evolving by i.i.d. steps $\xi_n = \Delta S_n = S_n - S_{n-1}$. For most purposes we may take $S_0 = 0$, so that $S_n = \xi_1 + \cdots + \xi_n$ for all $n$. Random walks may be regarded as the simplest of all Markov processes. Indeed, we recall from Chapter 8 that random walks are precisely the discrete-time Markov processes in $\mathbb{R}^d$ that are both space- and time-homogeneous. (In continuous time, a similar role is played by the so-called Lévy processes, to be studied in Chapter 15.) Despite their simplicity, random walks exhibit many basic features of Markov processes in discrete time and hence may serve as a good introduction to the general subject. We shall further see how random walks enter naturally into the discussion of certain continuous-time phenomena.

Some basic facts about random walks were obtained in previous chapters. Thus, we established some simple 0-1 laws in Chapter 3, and in Chapters 4 and 5 we proved the ultimate versions of the laws of large numbers and the central limit theorem, both of which deal with the asymptotic behavior of $n^{-c}S_n$ for suitable constants $c > 0$. More sophisticated limit theorems of this type will be derived in Chapters 14-16 and 27, often through approximation by a Brownian motion or some other Lévy process.

Random walks in $\mathbb{R}^d$ are either recurrent or transient, and our first major task is to derive a recurrence criterion in terms of the transition distribution $\mu$. We proceed with some striking connections between maximum and return times, anticipating the arcsine laws of Chapters 13, 14, and 15.
This is followed by a detailed study of ladder times and heights for one-dimensional random walks, culminating with the Wiener-Hopf factorization and Baxter's formula. Finally, we prove a two-sided version of the renewal theorem, which describes the asymptotic behavior of the occupation measure and associated intensity for a transient random walk.

In addition to the already mentioned connections to other chapters, we note the relevance of renewal theory for the study of continuous-time Markov chains, as considered in Chapter 12. Renewal processes may further be regarded as constituting an elementary subclass of the regenerative
sets, to be studied in full generality in Chapter 22 in connection with local time and excursion theory.

To begin our systematic discussion of random walks, assume as before that $S_n = \xi_1 + \cdots + \xi_n$ for all $n \in \mathbb{Z}_+$, where the $\xi_n$ are i.i.d. random vectors in $\mathbb{R}^d$. The distribution of $(S_n)$ is then determined by the common distribution $\mu = \mathcal{L}(\xi_n)$ of the increments. By the effective dimension of $(S_n)$ we mean the dimension of the linear subspace spanned by the support of $\mu$. For most purposes, we may assume that the effective dimension agrees with the dimension of the underlying space, since we may otherwise restrict our attention to the generated subspace.

The occupation measure of $(S_n)$ is defined as the random measure
$$\eta B = \sum\nolimits_{n\ge0} 1\{S_n \in B\}, \qquad B \in \mathcal{B}^d.$$
We also need to consider the corresponding intensity measure
$$(E\eta)B = E(\eta B) = \sum\nolimits_{n\ge0} P\{S_n \in B\}, \qquad B \in \mathcal{B}^d.$$
Writing $B^x_\varepsilon = \{y;\, |x - y| < \varepsilon\}$, we may introduce the accessible set $A$, the mean recurrence set $M$, and the recurrence set $R$, given by
$$A = \bigcap_{\varepsilon>0}\{x \in \mathbb{R}^d;\ E\eta B^x_\varepsilon > 0\}, \qquad M = \bigcap_{\varepsilon>0}\{x \in \mathbb{R}^d;\ E\eta B^x_\varepsilon = \infty\}, \qquad R = \bigcap_{\varepsilon>0}\{x \in \mathbb{R}^d;\ \eta B^x_\varepsilon = \infty \text{ a.s.}\}.$$

The following result gives the basic dichotomy for random walks in $\mathbb{R}^d$.

Theorem 9.1 (recurrence dichotomy) Let $(S_n)$ be a random walk in $\mathbb{R}^d$, and define $A$, $M$, and $R$ as above. Then exactly one of these conditions holds:

(i) $R = M = A$, which is then a closed additive subgroup of $\mathbb{R}^d$;

(ii) $R = M = \emptyset$, and $|S_n| \to \infty$ a.s.

A random walk is said to be recurrent if (i) holds and to be transient otherwise.

Proof: Since trivially $R \subset M \subset A$, the relations in (i) and (ii) are equivalent to $A \subset R$ and $M = \emptyset$, respectively. Further note that $A$ is a closed additive semigroup.

First assume $P\{|S_n| \to \infty\} < 1$, so that $P\{|S_n| \le r \text{ i.o.}\} > 0$ for some $r > 0$. Fix any $\varepsilon > 0$, cover the $r$-ball around 0 by finitely many open balls $B_1, \dots, B_n$ of radius $\varepsilon/2$, and note that $P\{S_n \in B_k \text{ i.o.}\} > 0$ for at least one $k$. By the Hewitt-Savage 0-1 law, the latter probability equals 1. Thus, the optional time $\tau = \inf\{n \ge 0;\ S_n \in B_k\}$ is a.s. finite, and the strong Markov property at $\tau$ yields
$$1 = P\{S_n \in B_k \text{ i.o.}\} \le P\{|S_{\tau+n} - S_\tau| < \varepsilon \text{ i.o.}\} = P\{|S_n| < \varepsilon \text{ i.o.}\}.$$
9. Random Walks and Renewal Theory 161

Hence, $0 \in R$ in this case. To extend the latter relation to $A \subset R$, fix any $x \in A$ and $\varepsilon > 0$. By the strong Markov property at $\sigma = \inf\{n \ge 0;\ |S_n - x| < \varepsilon/2\}$,
$$P\{|S_n - x| < \varepsilon \text{ i.o.}\} \ge P\{\sigma < \infty,\ |S_{\sigma+n} - S_\sigma| < \varepsilon/2 \text{ i.o.}\} = P\{\sigma < \infty\}\, P\{|S_n| < \varepsilon/2 \text{ i.o.}\} > 0,$$
and by the Hewitt-Savage 0-1 law the probability on the left equals 1. Thus, $x \in R$. The asserted group property will follow if we can prove that even $-x \in A$. This is clear if we write
$$P\{|S_n + x| < \varepsilon \text{ i.o.}\} = P\{|S_{\sigma+n} - S_\sigma + x| < \varepsilon \text{ i.o.}\} \ge P\{|S_n| < \varepsilon/2 \text{ i.o.}\} = 1.$$

Next assume that $|S_n| \to \infty$ a.s. Fix any $m, k \in \mathbb{N}$, and conclude from the Markov property at $m$ that
$$P\{|S_m| \le r,\ \inf\nolimits_{n\ge k}|S_{m+n}| > r\} \ge P\{|S_m| \le r,\ \inf\nolimits_{n\ge k}|S_{m+n} - S_m| > 2r\} = P\{|S_m| \le r\}\, P\{\inf\nolimits_{n\ge k}|S_n| > 2r\}.$$
Here the event on the left can occur for at most $k$ different values of $m$, and therefore
$$P\{\inf\nolimits_{n\ge k}|S_n| > 2r\}\, \sum\nolimits_m P\{|S_m| \le r\} \le k < \infty, \qquad k \in \mathbb{N}.$$
As $k \to \infty$, the probability on the left tends to 1. Hence, the sum converges, and we get $E\eta B < \infty$ for any bounded set $B$. This shows that $M = \emptyset$. $\Box$

The next result gives some easily verified recurrence criteria.

Theorem 9.2 (recurrence for $d = 1, 2$) A random walk $(S_n)$ in $\mathbb{R}^d$ is recurrent under each of these conditions:

(i) $d = 1$ and $n^{-1}S_n \overset{P}{\to} 0$;

(ii) $d = 2$, $E\xi_1 = 0$, and $E|\xi_1|^2 < \infty$.

In (i) we recognize the weak law of large numbers, which is characterized in Theorem 5.16. In particular, the condition is fulfilled when $E\xi_1 = 0$. By contrast, $E\xi_1 \in (0, \infty]$ implies $S_n \to \infty$ a.s. by the strong law of large numbers, so in that case $(S_n)$ is transient.

Our proof of Theorem 9.2 is based on the following scaling relation. As before, $a \lesssim b$ means that $a \le cb$ for some constant $c > 0$.

Lemma 9.3 (scaling) For any random walk $(S_n)$ in $\mathbb{R}^d$,
$$\sum\nolimits_{n\ge0} P\{|S_n| < r\varepsilon\} \lesssim r^d \sum\nolimits_{n\ge0} P\{|S_n| < \varepsilon\}, \qquad r \ge 1,\ \varepsilon > 0.$$

Proof: Cover the ball $\{x;\, |x| < r\varepsilon\}$ by balls $B_1, \dots, B_m$ of radius $\varepsilon/2$, and note that we can make $m \lesssim r^d$. Introduce the optional times $\tau_k = \inf\{n;\ S_n \in B_k\}$, $k = 1, \dots, m$, and conclude from the strong Markov property that
$$\sum\nolimits_n P\{|S_n| < r\varepsilon\} \le \sum\nolimits_k\sum\nolimits_n P\{S_n \in B_k\} \le \sum\nolimits_k\sum\nolimits_n P\{|S_{\tau_k+n} - S_{\tau_k}| < \varepsilon;\ \tau_k < \infty\} = \sum\nolimits_k P\{\tau_k < \infty\}\, \sum\nolimits_n P\{|S_n| < \varepsilon\} \lesssim r^d \sum\nolimits_n P\{|S_n| < \varepsilon\}. \qquad\Box$$

Proof of Theorem 9.2 (Chung and Ornstein): (i) Fix any $\varepsilon > 0$ and $r \ge 1$, and conclude from Lemma 9.3 that
$$\sum\nolimits_n P\{|S_n| < \varepsilon\} \gtrsim r^{-1}\sum\nolimits_n P\{|S_n| < r\varepsilon\} = \int_0^\infty P\{|S_{[rt]}| < r\varepsilon\}\, dt.$$
Here the integrand on the right tends to 1 as $r \to \infty$, so the integral tends to $\infty$ by Fatou's lemma, and the recurrence of $(S_n)$ follows by Theorem 9.1.

(ii) We may assume that $(S_n)$ is two-dimensional, since the one-dimensional case is already covered by part (i). By the central limit theorem we have $n^{-1/2}S_n \overset{d}{\to} \zeta$, where the random vector $\zeta$ has a nondegenerate normal distribution. In particular, $P\{|\zeta| < c\} \gtrsim c^2$ for bounded $c > 0$. Now fix any $\varepsilon > 0$ and $r \ge 1$, and conclude from Lemma 9.3 that
$$\sum\nolimits_n P\{|S_n| < \varepsilon\} \gtrsim r^{-2}\sum\nolimits_n P\{|S_n| < r\varepsilon\} = \int_0^\infty P\{|S_{[r^2t]}| < r\varepsilon\}\, dt.$$
As $r \to \infty$, we get by Fatou's lemma
$$\sum\nolimits_n P\{|S_n| < \varepsilon\} \gtrsim \int_0^\infty P\{|\zeta| < \varepsilon t^{-1/2}\}\, dt \gtrsim \varepsilon^2 \int_1^\infty t^{-1}\, dt = \infty,$$
and the recurrence follows again by Theorem 9.1. $\Box$

Our next aim is to derive a general recurrence criterion, stated in terms of the characteristic function $\hat\mu$ of $\mu$. Write $B_\varepsilon = \{x \in \mathbb{R}^d;\ |x| < \varepsilon\}$.

Theorem 9.4 (recurrence criterion, Chung and Fuchs) Let $(S_n)$ be a random walk in $\mathbb{R}^d$ based on some distribution $\mu$, and fix any $\varepsilon > 0$. Then $(S_n)$ is recurrent iff
$$\sup_{0<r<1} \int_{B_\varepsilon} \Re\,\frac{dt}{1 - r\hat\mu_t} = \infty. \qquad (1)$$

The proof is based on an elementary identity.

Lemma 9.5 (Parseval) Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^d$ with characteristic functions $\hat\mu$ and $\hat\nu$. Then $\int \hat\mu\, d\nu = \int \hat\nu\, d\mu$.

Proof: Use Fubini's theorem. $\Box$
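The dependence of recurrence on dimension in Theorems 9.2 and 9.8 can be made concrete for the walk in $\mathbb{Z}^d$ whose $d$ coordinates are independent simple symmetric random walks. For this walk, $P\{S_{2n} = 0\} = u_n^d$ with $u_n = 2^{-2n}\binom{2n}{n} \sim (\pi n)^{-1/2}$, so the expected number of visits to the origin, $\sum_n u_n^d$, diverges for $d = 1, 2$ and converges for $d = 3$, matching the dichotomy of Theorem 9.1. The sketch below is an illustration, not part of the text; the truncation level $N$ is arbitrary.

```python
# For the walk in Z^d with independent simple symmetric coordinates,
# P{S_{2n} = 0} = u_n ** d, where u_n = 2^{-2n} * binom(2n, n).
# Since u_n ~ (pi*n)**-0.5, the series sum_n u_n**d diverges for d <= 2
# (recurrence) and converges for d >= 3 (transience), cf. Theorem 9.1.
N = 2000                        # arbitrary truncation level for illustration
u = 1.0                         # u_0 = 1
sums = {1: 1.0, 2: 1.0, 3: 1.0}  # partial sums, starting with the n = 0 term
for n in range(1, N + 1):
    u *= (2 * n - 1) / (2 * n)  # stable recursion u_n = u_{n-1} * (2n-1)/(2n)
    for d in sums:
        sums[d] += u ** d

print(sums)  # d = 1, 2 keep growing with N; d = 3 has nearly converged
```

Raising $N$ makes the $d = 1$ sum grow like $\sqrt{N}$ and the $d = 2$ sum like $\log N$, while the $d = 3$ sum stabilizes near a finite limit.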
Proof of Theorem 9.4: The function $f(s) = (1 - |s|)^+$ has Fourier transform $\hat f(t) = 2t^{-2}(1 - \cos t)$, so the tensor product $f^{\otimes d}(s) = \prod_{k\le d} f(s_k)$ on $\mathbb{R}^d$ has Fourier transform $\hat f^{\otimes d}(t) = \prod_{k\le d}\hat f(t_k)$. Writing $\mu^{*n} = \mathcal{L}(S_n)$, we get by Lemma 9.5, for any $a > 0$ and $n \in \mathbb{Z}_+$,
$$\int \hat f^{\otimes d}(x/a)\, \mu^{*n}(dx) = a^d \int f^{\otimes d}(at)\, \hat\mu^n_t\, dt.$$
By Fubini's theorem it follows that, for any $r \in (0, 1)$,
$$\int \hat f^{\otimes d}(x/a) \sum_{n\ge0} r^n \mu^{*n}(dx) = a^d \int \frac{f^{\otimes d}(at)}{1 - r\hat\mu_t}\, dt. \qquad (2)$$

Now assume that (1) is false. Taking $\delta = \varepsilon^{-1}d^{1/2}$, we get by (2)
$$\sum\nolimits_n P\{|S_n| < \delta\} = \sum\nolimits_n \mu^{*n}(B_\delta) \lesssim \int \hat f^{\otimes d}(x/\delta)\sum\nolimits_n \mu^{*n}(dx) = \delta^d \sup_{r<1}\,\Re\int \frac{f^{\otimes d}(\delta t)}{1 - r\hat\mu_t}\, dt \lesssim \varepsilon^{-d}\sup_{r<1}\int_{B_\varepsilon}\Re\,\frac{dt}{1 - r\hat\mu_t} < \infty,$$
and so $(S_n)$ is transient by Theorem 9.1.

To prove the converse, we note that $\hat f^{\otimes d}$ has Fourier transform $(2\pi)^d f^{\otimes d}$. Hence, (2) remains true with $f$ and $\hat f$ interchanged, apart from a factor $(2\pi)^d$ on the left. If $(S_n)$ is transient, then for any $\varepsilon > 0$, with $\delta = \varepsilon^{-1}d^{1/2}$ we get
$$\sup_{r<1}\int_{B_\varepsilon}\Re\,\frac{dt}{1 - r\hat\mu_t} \lesssim \sup_{r<1}\,\Re\int \frac{\hat f^{\otimes d}(t/\varepsilon)}{1 - r\hat\mu_t}\, dt \lesssim \varepsilon^d \int f^{\otimes d}(\varepsilon x)\sum\nolimits_n \mu^{*n}(dx) \le \varepsilon^d \sum\nolimits_n \mu^{*n}(B_\delta) < \infty. \qquad\Box$$

In particular, we note that if $\mu$ is symmetric in the sense that $\xi_1 \overset{d}{=} -\xi_1$, then $\hat\mu$ is real-valued, and the last criterion reduces to
$$\int_{B_\varepsilon}\frac{dt}{1 - \hat\mu_t} = \infty.$$

By a symmetrization of $(S_n)$ we mean a random walk $\tilde S_n = S_n - S'_n$, $n \ge 0$, where $(S'_n)$ is an independent copy of $(S_n)$. The following result relates the recurrence behavior of $(S_n)$ and $(\tilde S_n)$.

Corollary 9.6 (symmetrization) If a random walk $(S_n)$ is recurrent, then so is the symmetrized version $(\tilde S_n)$.

Proof: Noting that $(\Re z)(\Re z^{-1}) \le 1$ for any complex number $z \ne 0$, we get
$$\Re\,\frac{1}{1 - r\hat\mu^2_t} \le \frac{1}{1 - r\,\Re\hat\mu^2_t} \le \frac{1}{1 - r|\hat\mu_t|^2}.$$
Thus, if $(\tilde S_n)$ is transient, then so is the random walk $(S_{2n})$ by Theorem 9.4. But then $|S_{2n}| \to \infty$ a.s. by Theorem 9.1, and so $|S_{2n+1}| \to \infty$ a.s. By combination, $|S_n| \to \infty$ a.s., which means that $(S_n)$ is transient. $\Box$

The following sufficient conditions for recurrence or transience are often more convenient for applications.

Corollary 9.7 (sufficient conditions) Fix any $\varepsilon > 0$. Then $(S_n)$ is recurrent if
$$\int_{B_\varepsilon} \Re\,\frac{dt}{1 - \hat\mu_t} = \infty \qquad (3)$$
and transient if
$$\int_{B_\varepsilon} \frac{dt}{|1 - \hat\mu_t|} < \infty. \qquad (4)$$

Proof: First assume (3). By Fatou's lemma, we get for any sequence $r_n \uparrow 1$
$$\liminf_{n\to\infty} \int_{B_\varepsilon}\Re\,\frac{dt}{1 - r_n\hat\mu_t} \ge \int_{B_\varepsilon}\lim_{n\to\infty}\Re\,\frac{dt}{1 - r_n\hat\mu_t} = \int_{B_\varepsilon}\Re\,\frac{dt}{1 - \hat\mu_t} = \infty.$$
Thus, (1) holds, and $(S_n)$ is recurrent.

Now assume (4) instead. Decreasing $\varepsilon$ if necessary, we may further assume that $\Re\hat\mu_t > 0$ on $B_\varepsilon$. As before, we get
$$\int_{B_\varepsilon}\Re\,\frac{dt}{1 - r\hat\mu_t} \le \int_{B_\varepsilon}\frac{dt}{|1 - r\hat\mu_t|} \lesssim \int_{B_\varepsilon}\frac{dt}{|1 - \hat\mu_t|} < \infty,$$
and so (1) fails. Thus, $(S_n)$ is transient. $\Box$

The last result enables us to supplement Theorem 9.2 with some conclusive information for $d \ge 3$.

Theorem 9.8 (transience for $d \ge 3$) Any random walk of effective dimension $d \ge 3$ is transient.

Proof: We may assume that the symmetrized distribution is again $d$-dimensional, since $\mu$ is otherwise supported by some hyperplane outside the origin, and the transience follows by the strong law of large numbers. By Corollary 9.6, it is enough to prove that the symmetrized random walk $(\tilde S_n)$ is transient, and so we may assume that $\mu$ is symmetric.

Considering the conditional distributions on $B_r$ and $B^c_r$ for large enough $r > 0$, we may write $\mu$ as a convex combination $c\mu_1 + (1 - c)\mu_2$, where $\mu_1$ is symmetric and $d$-dimensional with bounded support. Letting $(r_{ij})$ denote the covariance matrix of $\mu_1$, we get as in Lemma 5.10
$$\hat\mu_1(t) = 1 - \tfrac12\sum\nolimits_{i,j} r_{ij} t_i t_j + o(|t|^2), \qquad t \to 0.$$
Since the matrix $(r_{ij})$ is positive definite, it follows that $1 - \hat\mu_1(t) \gtrsim |t|^2$ for small enough $|t|$, say for $t \in B_\varepsilon$. A similar relation then holds for $\hat\mu$, and so
$$\int_{B_\varepsilon}\frac{dt}{|1 - \hat\mu_t|} \lesssim \int_{B_\varepsilon}\frac{dt}{|t|^2} \lesssim \int_0^\varepsilon r^{d-3}\, dr < \infty.$$
Thus, $(S_n)$ is transient by Theorem 9.4. $\Box$

We turn to a more detailed study of the one-dimensional random walk $S_n = \xi_1 + \cdots + \xi_n$, $n \in \mathbb{Z}_+$. Say that $(S_n)$ is simple if $|\xi_1| = 1$ a.s. For a simple, symmetric random walk $(S_n)$ we note that
$$u_n \equiv P\{S_{2n} = 0\} = 2^{-2n}\binom{2n}{n}, \qquad n \in \mathbb{Z}_+. \qquad (5)$$
The following result gives a surprising connection between the probabilities $u_n$ and the distribution of the last return to the origin.

Proposition 9.9 (last return, Feller) Let $(S_n)$ be a simple, symmetric random walk in $\mathbb{Z}$, put $\alpha_n = \max\{k \le n;\ S_{2k} = 0\}$, and define $u_n$ by (5). Then
$$P\{\alpha_n = k\} = u_k u_{n-k}, \qquad 0 \le k \le n.$$

Our proof will be based on a simple symmetry property, which will also appear in a continuous-time version as Lemma 13.14.

Lemma 9.10 (reflection principle, André) For any symmetric random walk $(S_n)$ and optional time $\tau$, we have $(S_n) \overset{d}{=} (\tilde S_n)$, where
$$\tilde S_n = S_{n\wedge\tau} - (S_n - S_{n\wedge\tau}), \qquad n \ge 0.$$

Proof: We may clearly assume that $\tau < \infty$ a.s. Writing $S'_n = S_{\tau+n} - S_\tau$, $n \in \mathbb{Z}_+$, we get by the strong Markov property $S' \overset{d}{=} S$ with $S' \perp\!\!\!\perp (S^\tau, \tau)$, and by symmetry $-S' \overset{d}{=} S'$. Hence, by combination, $(-S', S^\tau, \tau) \overset{d}{=} (S', S^\tau, \tau)$, and the assertion follows by suitable assembly. $\Box$

Proof of Proposition 9.9: By the Markov property at time $2k$, we get
$$P\{\alpha_n = k\} = P\{S_{2k} = 0\}\, P\{\alpha_{n-k} = 0\}, \qquad 0 \le k \le n,$$
which reduces the proof to the case when $k = 0$. Thus, it remains to show that
$$P\{S_2 \ne 0, \dots, S_{2n} \ne 0\} = P\{S_{2n} = 0\}, \qquad n \in \mathbb{N}.$$
By the Markov property at time 1, the left-hand side equals
$$\tfrac12 P\{\min\nolimits_{k<2n} S_k = 0\} + \tfrac12 P\{\max\nolimits_{k<2n} S_k = 0\} = P\{M_{2n-1} = 0\},$$
where $M_n = \max_{k\le n} S_k$. Using Lemma 9.10 with $\tau = \inf\{k;\ S_k = 1\}$, we get
$$1 - P\{M_{2n-1} = 0\} = P\{M_{2n-1} \ge 1\} = P\{M_{2n-1} \ge 1,\ S_{2n-1} \ge 1\} + P\{M_{2n-1} \ge 1,\ S_{2n-1} \le 0\} = P\{S_{2n-1} \ge 1\} + P\{S_{2n-1} \ge 2\} = 1 - P\{S_{2n-1} = 1\} = 1 - P\{S_{2n} = 0\}. \qquad\Box$$

We continue with an even more striking connection between the maximum of a symmetric random walk and the last return probabilities in
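Proposition 9.9 can be verified by brute force for small $n$. The sketch below is an illustration, not part of the text: it enumerates all $2^{2n}$ equally likely paths of length $2n = 8$, tabulates the last return time $\alpha_4$ directly from its definition, and compares the counts with $2^{2n} u_k u_{4-k}$.

```python
from itertools import product
from math import comb

n = 4
counts = [0] * (n + 1)
for steps in product((-1, 1), repeat=2 * n):
    s, last = 0, 0
    for i, x in enumerate(steps, start=1):
        s += x
        if s == 0:                 # a zero of the walk occurs only at even i
            last = i // 2
    counts[last] += 1              # alpha_n = max{k <= n; S_{2k} = 0}

# Proposition 9.9: P{alpha_n = k} = u_k * u_{n-k} with u_k = 2^{-2k} binom(2k,k),
# so the path counts should equal 2^{2n} * u_k * u_{n-k} = binom(2k,k)*binom(2(n-k),n-k).
expected = [comb(2 * k, k) * comb(2 * (n - k), n - k) for k in range(n + 1)]
print(counts, expected)  # both equal [70, 40, 36, 40, 70]
```

The symmetry of the counts about $k = n/2$, with the mass piling up at the two endpoints, is the discrete arcsine shape anticipated for Chapters 13-15.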
Proposition 9.9. Related results for Brownian motion and more general random walks will appear in Theorems 13.16 and 14.11.

Theorem 9.11 (first maximum, Sparre-Andersen) Let $(S_n)$ be a random walk based on a symmetric, diffuse distribution, put $M_n = \max_{k\le n} S_k$, and write $\tau_n = \min\{k \ge 0;\ S_k = M_n\}$. Define $\alpha_n$ as in Proposition 9.9 in terms of a simple, symmetric random walk. Then $\tau_n \overset{d}{=} \alpha_n$ for every $n \ge 0$.

Here and below, we shall use the relation
$$(S_1, \dots, S_n) \overset{d}{=} (S_n - S_{n-1}, \dots, S_n - S_0), \qquad n \in \mathbb{N}, \qquad (6)$$
valid for any random walk $(S_n)$. The formula is obvious from the fact that $(\xi_1, \dots, \xi_n) \overset{d}{=} (\xi_n, \dots, \xi_1)$.

Proof of Theorem 9.11: By the symmetry of $(S_n)$ together with (6), we have
$$v_k \equiv P\{\tau_k = 0\} = P\{\tau_k = k\}, \qquad k \ge 0. \qquad (7)$$
Using the Markov property at time $k$, we hence obtain
$$P\{\tau_n = k\} = P\{\tau_k = k\}\, P\{\tau_{n-k} = 0\} = v_k v_{n-k}, \qquad 0 \le k \le n. \qquad (8)$$
Clearly $\alpha_0 = \tau_0 = 0$. Proceeding by induction, assume that $\alpha_k \overset{d}{=} \tau_k$ and hence $u_k = v_k$ for all $k < n$. Comparing (8) with Proposition 9.9, we obtain $P\{\alpha_n = k\} = P\{\tau_n = k\}$ for $0 < k < n$, and by (7) the equality extends to $k = 0$ and $n$. Thus, $\alpha_n \overset{d}{=} \tau_n$. $\Box$

For a general one-dimensional random walk $(S_n)$, we may introduce the ascending ladder times $\tau_1, \tau_2, \dots$, given recursively by
$$\tau_n = \inf\{k > \tau_{n-1};\ S_k > S_{\tau_{n-1}}\}, \qquad n \in \mathbb{N}, \qquad (9)$$
starting with $\tau_0 = 0$. The associated ascending ladder heights are defined as the random variables $S_{\tau_n}$, $n \in \mathbb{N}$, where $S_\infty$ may be interpreted as $\infty$. In a similar way, we may define the descending ladder times $\tau_n^-$ and heights $S_{\tau_n^-}$, $n \in \mathbb{N}$. The times $\tau_n$ and $\tau_n^-$ are clearly optional. By the strong Markov property, we conclude that the pairs $(\tau_n, S_{\tau_n})$ and $(\tau_n^-, S_{\tau_n^-})$ form possibly terminating random walks in $\mathbb{R}^2$.

Replacing the relation $S_k > S_{\tau_{n-1}}$ in (9) by $S_k \ge S_{\sigma_{n-1}}$, we obtain the weak ascending ladder times $\sigma_n$ and heights $S_{\sigma_n}$. Similarly, we may introduce the weak descending ladder times $\sigma_n^-$ and heights $S_{\sigma_n^-}$.

The mentioned sequences are connected by a pair of simple but powerful duality relations.
Lemma 9.12 (duality) Let $\eta$, $\eta'$, $\zeta$, and $\zeta'$ denote the occupation measures of the sequences $(S_{\tau_n})$, $(S_{\sigma_n})$, $(S_n;\ n < \tau_1^-)$, and $(S_n;\ n < \sigma_1^-)$, respectively. Then $E\eta = E\zeta'$ and $E\eta' = E\zeta$.

Proof: By (6) we have for any $B \in \mathcal{B}(0, \infty)$ and $n \in \mathbb{N}$
$$P\{S_1 \wedge \cdots \wedge S_{n-1} > 0,\ S_n \in B\} = P\{S_1 \vee \cdots \vee S_{n-1} < S_n,\ S_n \in B\} = \sum\nolimits_k P\{\tau_k = n,\ S_{\tau_k} \in B\}. \qquad (10)$$
Summing over $n \ge 1$ gives $E\zeta' B = E\eta B$, and the first assertion follows. The proof of the second assertion is similar. $\Box$

The last lemma yields some interesting information. For example, in a simple symmetric random walk, the expected number of visits to an arbitrary state $k \ne 0$ before the first return to 0 is constant and equal to 1. In particular, the mean recurrence time is infinite, and so $(S_n)$ is a null-recurrent Markov chain.

The following result shows how the asymptotic behavior of a random walk is related to the expected values of the ladder times.

Proposition 9.13 (fluctuations and mean ladder times) For any nondegenerate random walk $(S_n)$ in $\mathbb{R}$, exactly one of these cases occurs:

(i) $S_n \to \infty$ a.s. and $E\tau_1 < \infty$;

(ii) $S_n \to -\infty$ a.s. and $E\tau_1^- < \infty$;

(iii) $\limsup_n(\pm S_n) = \infty$ a.s. and $E\sigma_1 = E\sigma_1^- = \infty$.

Proof: By Corollary 3.17 there are only three possibilities: $S_n \to \infty$ a.s., $S_n \to -\infty$ a.s., and $\limsup_n(\pm S_n) = \infty$ a.s. In the first case, $\sigma_n^- < \infty$ for finitely many $n$ only, say for $n < \kappa < \infty$. Here $\kappa$ is geometrically distributed, and so $E\tau_1 = E\kappa < \infty$ by Lemma 9.12. The proof in case (ii) is similar. In case (iii) the variables $\tau_n$ and $\tau_n^-$ are all finite, and Lemma 9.12 yields $E\sigma_1 = E\sigma_1^- = \infty$. $\Box$

Next we shall see how the asymptotic behavior of a random walk is related to the expected values of $\xi_1$ and $S_{\tau_1}$. Here we define $E\xi = E\xi^+ - E\xi^-$ whenever $E\xi^+ \wedge E\xi^- < \infty$.

Proposition 9.14 (fluctuations and mean ladder heights) If $(S_n)$ is a nondegenerate random walk in $\mathbb{R}$, then

(i) $E\xi_1 = 0$ implies $\limsup_n(\pm S_n) = \infty$ a.s.;

(ii) $E\xi_1 \in (0, \infty]$ implies $S_n \to \infty$ a.s. and $ES_{\tau_1} = E\tau_1\, E\xi_1$;

(iii) $E\xi_1^+ = E\xi_1^- = \infty$ implies $ES_{\tau_1} = -ES_{\tau_1^-} = \infty$.

The first assertion is an immediate consequence of Theorem 9.2 (i). It can also be obtained more directly, as follows.

Proof: (i) By symmetry, we may assume that $\limsup_n S_n = \infty$ a.s. If $E\tau_1 < \infty$, then the law of large numbers applies to each of the three ratios
in the equation
$$\frac{S_{\tau_n}}{n} = \frac{\tau_n}{n}\cdot\frac{S_{\tau_n}}{\tau_n}, \qquad n \in \mathbb{N},$$
and we get $0 = E\xi_1\, E\tau_1 = ES_{\tau_1} > 0$. The contradiction shows that $E\tau_1 = \infty$, and so $\liminf_n S_n = -\infty$ by Proposition 9.13.

(ii) In this case $S_n \to \infty$ a.s. by the law of large numbers, and the formula $ES_{\tau_1} = E\tau_1\, E\xi_1$ follows as before.

(iii) This is clear from the relations $S_{\tau_1} \ge \xi_1^+$ and $S_{\tau_1^-} \le -\xi_1^-$. $\Box$

We proceed with a celebrated factorization, which provides some more detailed information about the distributions of ladder times and heights. Here we write $\chi^\pm$ for the possibly defective distributions of the pairs $(\tau_1, S_{\tau_1})$ and $(\tau_1^-, S_{\tau_1^-})$, respectively, and let $\psi^\pm$ denote the corresponding distributions of $(\sigma_1, S_{\sigma_1})$ and $(\sigma_1^-, S_{\sigma_1^-})$. Put $\chi^\pm_n = \chi^\pm(\{n\} \times \cdot)$ and $\psi^\pm_n = \psi^\pm(\{n\} \times \cdot)$. Let us finally introduce the measure $\chi^0$ on $\mathbb{N}$, given by
$$\chi^0_n = P\{S_1 \wedge \cdots \wedge S_{n-1} > 0 = S_n\} = P\{S_1 \vee \cdots \vee S_{n-1} < 0 = S_n\}, \qquad n \in \mathbb{N},$$
where the second equality holds by (6).

Theorem 9.15 (Wiener-Hopf factorization) For any random walk in $\mathbb{R}$ based on some distribution $\mu$, we have
$$\delta_0 - \delta_1 \otimes \mu = (\delta_0 - \chi^+) * (\delta_0 - \psi^-) = (\delta_0 - \psi^+) * (\delta_0 - \chi^-), \qquad (11)$$
$$\delta_0 - \psi^\pm = (\delta_0 - \chi^\pm) * (\delta_0 - \chi^0). \qquad (12)$$

Note that the convolutions in (11) are defined on the space $\mathbb{Z}_+ \times \mathbb{R}$, whereas those in (12) can be regarded as defined on $\mathbb{Z}_+$. Alternatively, we may consider $\chi^0$ as a measure on $\mathbb{N} \times \{0\}$, and interpret all convolutions as defined on $\mathbb{Z}_+ \times \mathbb{R}$.

Proof: Define the measures $\rho_1, \rho_2, \dots$ on $(0, \infty)$ by
$$\rho_n B = P\{S_1 \wedge \cdots \wedge S_{n-1} > 0,\ S_n \in B\} = E\sum\nolimits_k 1\{\tau_k = n,\ S_{\tau_k} \in B\}, \qquad n \in \mathbb{N},\ B \in \mathcal{B}(0, \infty), \qquad (13)$$
where the second equality holds by (10). Put $\rho_0 = \delta_0$, and regard the sequence $\rho = (\rho_n)$ as a measure on $\mathbb{Z}_+ \times (0, \infty)$. Noting that the corresponding measures on $\mathbb{R}$ equal $\rho_n + \psi^-_n$ and using the Markov property at time $n - 1$, we get
$$\rho_n + \psi^-_n = \rho_{n-1} * \mu = (\rho * (\delta_1 \otimes \mu))_n, \qquad n \in \mathbb{N}. \qquad (14)$$
Applying the strong Markov property at $\tau_1$ to the second expression in (13), we see that also
$$\rho_n = \sum_{k=1}^n \chi^+_k * \rho_{n-k} = (\chi^+ * \rho)_n, \qquad n \in \mathbb{N}. \qquad (15)$$
Recalling the values at zero, we get from (14) and (15)
$$ \rho + \psi^- = \delta_0 + \rho * (\delta_1 \otimes \mu), \qquad \rho = \delta_0 + \chi^+ * \rho. $$
Eliminating $\rho$ between the two equations yields the first relation in (11), and the second relation follows by symmetry.

To prove (12), we note that the restriction of $\psi_n^+$ to $(0, \infty)$ equals $\psi_n^+ - \chi_n^0$. Thus, for any $B \in \mathcal{B}(0, \infty)$,
$$ (\chi_n^+ - \psi_n^+ + \chi_n^0) B = P\{\max_{k < n} S_k = 0,\; S_n \in B\}. $$
Decomposing the event on the right according to the time of first return to 0, we get
$$ \chi_n^+ - \psi_n^+ + \chi_n^0 = \sum_{k=1}^{n-1} \chi_k^0\, \chi_{n-k}^+ = (\chi^0 * \chi^+)_n, \qquad n \in \mathbb{N}, $$
and so $\chi^+ - \psi^+ + \chi^0 = \chi^0 * \chi^+$, which is equivalent to the plus-sign version of (12). The minus-sign version follows by symmetry. $\Box$

The preceding factorization yields in particular an explicit formula for the joint distribution of the first ladder time and height.

Theorem 9.16 (ladder distributions, Sparre-Andersen, Baxter) If $(S_n)$ is a random walk in $\mathbb{R}$, then for $|s| < 1$ and $u \ge 0$,
$$ E\, s^{\tau_1} \exp(-u S_{\tau_1}) = 1 - \exp\Big\{ -\sum_{n \ge 1} \frac{s^n}{n}\, E[e^{-u S_n};\; S_n > 0] \Big\}. \qquad (16) $$
A similar relation holds for $(\sigma_1, S_{\sigma_1})$ with $S_n > 0$ replaced by $S_n \ge 0$.

Proof: Introduce the mixed generating and characteristic functions
$$ \hat\chi_{s,t}^+ = E\, s^{\tau_1} \exp(it S_{\tau_1}), \qquad \hat\psi_{s,t}^- = E\, s^{\sigma_1^-} \exp(it S_{\sigma_1^-}), $$
and note that the first relation in (11) is equivalent to
$$ 1 - s \hat\mu_t = (1 - \hat\chi_{s,t}^+)(1 - \hat\psi_{s,t}^-), \qquad |s| < 1,\; t \in \mathbb{R}. $$
Taking logarithms and expanding in Taylor series, we obtain
$$ \sum\nolimits_n n^{-1} (s \hat\mu_t)^n = \sum\nolimits_n n^{-1} (\hat\chi_{s,t}^+)^n + \sum\nolimits_n n^{-1} (\hat\psi_{s,t}^-)^n. $$
For fixed $s \in (-1, 1)$, this equation is of the form $\hat\nu = \hat\nu^+ + \hat\nu^-$, where $\nu$ and $\nu^\pm$ are bounded signed measures on $\mathbb{R}$, $(0, \infty)$, and $(-\infty, 0]$, respectively. By the uniqueness theorem for characteristic functions we get $\nu = \nu^+ + \nu^-$. In particular, $\nu^+$ equals the restriction of $\nu$ to $(0, \infty)$. Thus, the corresponding Laplace transforms agree, and (16) follows by summation of a Taylor series for the logarithm.
A similar argument yields the formula for $(\sigma_1, S_{\sigma_1})$. $\Box$

From the last result we may easily obtain expressions for the probability that a random walk stays negative or nonpositive, and also deduce criteria for its divergence to $-\infty$.
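Formula (16) has a striking consequence that is easy to check by simulation: for $u = 0$ and any continuous symmetric step distribution, $P\{S_n > 0\} = \frac12$ for every $n$, so $E\,s^{\tau_1} = 1 - \sqrt{1 - s}$ regardless of the step law, with universal probabilities $P\{\tau_1 = 1\} = \frac12$, $P\{\tau_1 = 2\} = \frac18$, $P\{\tau_1 = 3\} = \frac1{16}$. The following Monte Carlo sketch (our own illustration, not from the text; the function names are ad hoc) compares these values with simulated frequencies for Gaussian steps. Only three steps per trial are needed, since we merely classify whether $\tau_1 \le 3$.

```python
import random

random.seed(0)

def ladder_class(step, cap=3):
    # Return tau_1 = inf{n: S_n > 0} if tau_1 <= cap, else cap + 1.
    s = 0.0
    for n in range(1, cap + 1):
        s += step()
        if s > 0:
            return n
    return cap + 1

trials = 200000
counts = {1: 0, 2: 0, 3: 0}
for _ in range(trials):
    n = ladder_class(lambda: random.gauss(0.0, 1.0))
    if n in counts:
        counts[n] += 1
freqs = {n: c / trials for n, c in counts.items()}
print(freqs)  # compare with {1: 0.5, 2: 0.125, 3: 0.0625}
```

Replacing the Gaussian by any other continuous symmetric step law should leave the frequencies unchanged, which is exactly the distribution-free content of the Sparre-Andersen formula.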
Corollary 9.17 (negativity and divergence to $-\infty$) For any random walk $(S_n)$ in $\mathbb{R}$, we have
$$ P\{\tau_1 = \infty\} = (E\sigma_1^-)^{-1} = \exp\Big\{ -\sum_{n \ge 1} n^{-1} P\{S_n > 0\} \Big\}, \qquad (17) $$
$$ P\{\sigma_1 = \infty\} = (E\tau_1^-)^{-1} = \exp\Big\{ -\sum_{n \ge 1} n^{-1} P\{S_n \ge 0\} \Big\}. \qquad (18) $$
Furthermore, each of these two conditions is equivalent to $S_n \to -\infty$ a.s.:
$$ \sum_{n \ge 1} n^{-1} P\{S_n > 0\} < \infty, \qquad \sum_{n \ge 1} n^{-1} P\{S_n \ge 0\} < \infty. $$

Proof: The last expression for $P\{\tau_1 = \infty\}$ follows from (16) with $u = 0$ as we let $s \to 1$. Similarly, the formula for $P\{\sigma_1 = \infty\}$ is obtained from the version of (16) for the pair $(\sigma_1, S_{\sigma_1})$. In particular, $P\{\tau_1 = \infty\} > 0$ iff the series in (17) converges, and similarly for the condition $P\{\sigma_1 = \infty\} > 0$ in terms of the series in (18). Since both conditions are equivalent to $S_n \to -\infty$ a.s., the last assertion follows. Finally, the first equalities in (17) and (18) are obtained most easily from Lemma 9.12, if we note that the number of strict or weak ladder times $\tau_n < \infty$ or $\sigma_n < \infty$ is geometrically distributed. $\Box$

We turn to a detailed study of the occupation measure $\eta = \sum_{n \ge 0} \delta_{S_n}$ of a transient random walk in $\mathbb{R}$, based on transition and initial distributions $\mu$ and $\nu$. Recall from Theorem 9.1 that the associated intensity measure $E\eta = \nu * \sum_n \mu^{*n}$ is locally finite. By the strong Markov property, the sequence $(S_{\tau + n} - S_\tau)$ has the same distribution for every finite optional time $\tau$. Thus, a similar invariance holds for the occupation measure, and the associated intensities must agree. A renewal is then said to occur at time $\tau$, and the whole subject is known as renewal theory. In the special case when $\mu$ and $\nu$ are supported by $\mathbb{R}_+$, we refer to $\eta$ as a renewal process based on $\mu$ and $\nu$, and to $E\eta$ as the associated renewal measure. For most purposes, we may assume that $\nu = \delta_0$; if this is not the case, we say that $\eta$ is delayed. The occupation measure $\eta$ is clearly a random measure on $\mathbb{R}$, in the sense that $\eta B$ is a random variable for every bounded Borel set $B$.
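Before developing the renewal theory, we note that the series criterion of Corollary 9.17 lends itself to a quick numerical sanity check. The sketch below (our own illustration) takes Gaussian steps with mean $-\frac12$, so that $S_n \sim N(-n/2,\, n)$ and $P\{S_n > 0\} = \Phi(-\sqrt{n}/2)$; it compares the exponential series in (17) with a truncated-horizon estimate of $P\{\tau_1 = \infty\} = P\{\sup_n S_n \le 0\}$. The finite horizon slightly overestimates that probability, but for a drift this strong the error is negligible.

```python
import math, random

random.seed(2)
mu = -0.5  # drift; steps are N(mu, 1), so S_n ~ N(n*mu, n)

def phi(x):  # standard normal cdf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Right-hand side of (17): exp(-sum_{n>=1} P{S_n > 0}/n)
rhs = math.exp(-sum(phi(mu * math.sqrt(n)) / n for n in range(1, 500)))

# Left-hand side: P{tau_1 = infinity}, estimated over a finite horizon
trials, horizon, stays = 20000, 200, 0
for _ in range(trials):
    s = 0.0
    for _ in range(horizon):
        s += random.gauss(mu, 1.0)
        if s > 0:
            break
    else:
        stays += 1
lhs = stays / trials

print(lhs, rhs)  # both near 0.53
```

The agreement of the two estimates also illustrates the last assertion of the corollary: since the series converges here, the walk drifts to $-\infty$ with a positive probability of never becoming positive.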
From Lemma 12.1 we anticipate the simple fact that the distribution of a random measure on $\mathbb{R}_+$ is determined by the distributions of the integrals $\eta f = \int f\, d\eta$ for all $f \in C_K^+(\mathbb{R}_+)$, the space of continuous functions $f: \mathbb{R}_+ \to \mathbb{R}_+$ with bounded support.

For any measure $\mu$ on $\mathbb{R}_+$ and constant $t \ge 0$, we may introduce the shifted measure $\theta_t \mu$ on $\mathbb{R}_+$, given by $(\theta_t \mu) B = \mu(B + t)$ for arbitrary $B \in \mathcal{B}(\mathbb{R}_+)$. A random measure $\eta$ on $\mathbb{R}_+$ is said to be stationary on $\mathbb{R}_+$ if $\theta_t \eta \stackrel{d}{=} \theta_0 \eta$ for all $t \ge 0$. Given a renewal process $\eta$ based on some distribution $\mu$, we say that the delayed process $\tilde\eta = \delta_\alpha * \eta$ is a stationary version of $\eta$, if the delay distribution $\nu = \mathcal{L}(\alpha)$ is such that the random measure $\tilde\eta$ becomes stationary on $\mathbb{R}_+$. We proceed to show that such a version exists iff $\mu$ has finite mean, in which case $\nu$ is uniquely determined by $\mu$. Write $\lambda$ for Lebesgue measure on $\mathbb{R}_+$.
Proposition 9.18 (stationary renewal process) Let $\eta$ be a renewal process based on some distribution $\mu$ on $\mathbb{R}_+$ with mean $c$. Then $\eta$ has a stationary version $\tilde\eta$ iff $c \in (0, \infty)$. In that case $E\tilde\eta = c^{-1} \lambda$, and the delay distribution of $\tilde\eta$ is uniquely given by $\nu = c^{-1} (\delta_0 - \mu) * \lambda$, or
$$ \nu[0, t] = c^{-1} \int_0^t \mu(s, \infty)\, ds, \qquad t \ge 0. \qquad (19) $$

Proof: By Fubini's theorem,
$$ E\eta = E \sum\nolimits_n \delta_{S_n} = \sum\nolimits_n \mathcal{L}(S_n) = \sum\nolimits_n \nu * \mu^{*n} = \nu + \mu * \sum\nolimits_n \nu * \mu^{*n} = \nu + \mu * E\eta, $$
and so $\nu = (\delta_0 - \mu) * E\eta$. If $\eta$ is stationary, then $E\eta$ is shift invariant, and Theorem 2.6 yields $E\eta = a \lambda$ for some constant $a > 0$. Thus, $\nu = a (\delta_0 - \mu) * \lambda$, and (19) holds with $c^{-1}$ replaced by $a$. As $t \to \infty$, we get $1 = ac$ by Lemma 3.4, which implies $c \in (0, \infty)$ and $a = c^{-1}$.

Conversely, assume that $c \in (0, \infty)$, and let $\nu$ be given by (19). Then
$$ E\eta = \nu * \sum\nolimits_n \mu^{*n} = c^{-1} (\delta_0 - \mu) * \lambda * \sum\nolimits_n \mu^{*n} = c^{-1} \lambda * \Big\{ \sum\nolimits_{n \ge 0} \mu^{*n} - \sum\nolimits_{n \ge 1} \mu^{*n} \Big\} = c^{-1} \lambda. $$
By the strong Markov property, the shifted random measure $\theta_t \eta$ is again a renewal process based on $\mu$, say with delay distribution $\nu_t$. As before,
$$ \nu_t = (\delta_0 - \mu) * (\theta_t E\eta) = (\delta_0 - \mu) * E\eta = \nu, $$
which implies the asserted stationarity of $\eta$. $\Box$

From the last result we may deduce a corresponding statement for the occupation measure of a general random walk.

Proposition 9.19 (stationary occupation measure) Let $\eta$ be the occupation measure of a random walk in $\mathbb{R}$ based on some distributions $\mu$ and $\nu$, where $\mu$ has mean $c \in (0, \infty)$ and $\nu$ is defined as in (19) in terms of the ladder height distribution $\tilde\mu$ and its mean $\tilde c$. Then $\eta$ is stationary on $\mathbb{R}_+$ with intensity $c^{-1}$.

Proof: Since $S_n \to \infty$ a.s., Propositions 9.13 and 9.14 show that the ladder times $\tau_n$ and heights $H_n = S_{\tau_n}$ have finite mean, and by Proposition 9.18 the renewal process $\zeta = \sum_n \delta_{H_n}$ is stationary for the prescribed choice of $\nu$. Fixing $t \ge 0$ and putting $\sigma_t = \inf\{n \in \mathbb{Z}_+;\; S_n > t\}$, we note in particular that $S_{\sigma_t} - t$ has distribution $\nu$.
By the strong Markov property at $\sigma_t$, the sequence $S_{\sigma_t + n} - t$, $n \in \mathbb{Z}_+$, has then the same distribution as $(S_n)$. Since $S_k \le t$ for $k < \sigma_t$, we get $\theta_t \eta \stackrel{d}{=} \eta$ on $\mathbb{R}_+$, which proves the asserted stationarity. To identify the intensity, let $\eta_n$ denote the occupation measure of the sequence $S_k - H_n$, $\tau_n \le k < \tau_{n+1}$, and note that $\eta_n$ is independent of $H_n$ with $\eta_n \stackrel{d}{=} \eta_0$ for each $n$,
by the strong Markov property. Hence, by Fubini's theorem,
$$ E\eta = E \sum\nolimits_n \eta_n * \delta_{H_n} = \sum\nolimits_n E(\delta_{H_n} * E\eta_n) = E\eta_0 * E \sum\nolimits_n \delta_{H_n} = E\eta_0 * E\zeta. $$
Noting that $E\zeta = \tilde c^{-1} \lambda$ by Proposition 9.18, that $E\eta_0(0, \infty) = 0$, and that $\tilde c = c\, E\tau_1$ by Proposition 9.14, we get on $\mathbb{R}_+$
$$ E\eta = E\eta_0 \mathbb{R} \cdot \frac{\lambda}{\tilde c} = \frac{E\tau_1}{\tilde c}\, \lambda = c^{-1} \lambda. \qquad \Box $$

The next result describes the asymptotic behavior of the occupation measure $\eta$ and its intensity $E\eta$. Under weak restrictions on $\mu$, we shall see how $\theta_t \eta$ approaches the corresponding stationary version $\tilde\eta$, whereas $E\eta$ is asymptotically proportional to Lebesgue measure. For simplicity, we assume that the mean of $\mu$ exists in $\overline{\mathbb{R}}$. Thus, if $\xi$ is a random variable with distribution $\mu$, we assume that $E(\xi^+ \wedge \xi^-) < \infty$ and define $E\xi = E\xi^+ - E\xi^-$.

It is natural to state the result in terms of vague convergence for measures on $\mathbb{R}_+$, and the corresponding notion of distributional convergence for random measures. Recall that, for locally finite measures $\nu, \nu_1, \nu_2, \ldots$ on $\mathbb{R}_+$, the vague convergence $\nu_n \stackrel{v}{\to} \nu$ means that $\nu_n f \to \nu f$ for all $f \in C_K^+(\mathbb{R}_+)$. Similarly, if $\eta, \eta_1, \eta_2, \ldots$ are random measures on $\mathbb{R}_+$, we define the distributional convergence $\eta_n \stackrel{vd}{\to} \eta$ by the condition $\eta_n f \stackrel{d}{\to} \eta f$ for every $f \in C_K^+(\mathbb{R}_+)$. (The latter notion of convergence will be studied in detail in Chapter 16.) A measure $\mu$ on $\mathbb{R}$ is said to be nonarithmetic if the additive subgroup generated by $\operatorname{supp} \mu$ is dense in $\mathbb{R}$.

Theorem 9.20 (two-sided renewal theorem, Blackwell, Feller and Orey) Let $\eta$ be the occupation measure of a random walk in $\mathbb{R}$ based on some distributions $\mu$ and $\nu$, where $\mu$ is nonarithmetic with mean $c \in \overline{\mathbb{R}} \setminus \{0\}$. If $c \in (0, \infty)$, let $\tilde\eta$ be the stationary version in Proposition 9.19; otherwise, put $\tilde\eta = 0$. Then as $t \to \infty$,

(i) $\theta_t \eta \stackrel{vd}{\to} \tilde\eta$;
(ii) $\theta_t E\eta \stackrel{v}{\to} E\tilde\eta = (c^{-1} \vee 0) \lambda$.

Our proof is based on two lemmas. First we consider the distribution $\nu_t$ of the first nonnegative ladder height for the shifted process $(S_n - t)$.
For $c \in (0, \infty)$, the key step is to show that $\nu_t$ converges weakly toward the corresponding distribution $\tilde\nu$ for the stationary version. This will be accomplished by a coupling argument.

Lemma 9.21 (asymptotic delay) If $c \in (0, \infty)$, then $\nu_t \stackrel{w}{\to} \tilde\nu$ as $t \to \infty$.

Proof: Let $\alpha$ and $\alpha'$ be independent random variables with distributions $\nu$ and $\tilde\nu$. Choose some i.i.d. sequences $(\xi_k) \perp\!\!\!\perp (\vartheta_k)$ independent of $\alpha$ and $\alpha'$
such that $\mathcal{L}(\xi_k) = \tilde\mu$ and $P\{\vartheta_k = \pm 1\} = \frac12$. Then
$$ S_n = \alpha' - \alpha - \sum_{k \le n} \vartheta_k \xi_k, \qquad n \in \mathbb{Z}_+, $$
is a random walk based on a nonarithmetic distribution with mean 0, and so by Theorems 9.1 and 9.2 the set $\{S_n\}$ is a.s. dense in $\mathbb{R}$. For any $\varepsilon > 0$, the optional time $\sigma = \inf\{n \ge 0;\; S_n \in [0, \varepsilon]\}$ is then a.s. finite. Now define
$$ \vartheta_k' = (-1)^{1\{k \le \sigma\}}\, \vartheta_k, \qquad k \in \mathbb{N}, $$
and note as in Lemma 9.10 that $\{\alpha', (\xi_k, \vartheta_k')\} \stackrel{d}{=} \{\alpha', (\xi_k, \vartheta_k)\}$. Let $\kappa_1 < \kappa_2 < \cdots$ be the values of $k$ with $\vartheta_k = 1$, and define $\kappa_1' < \kappa_2' < \cdots$ similarly in terms of $(\vartheta_k')$. By a simple conditioning argument, the sequences
$$ \tilde S_n = \alpha + \sum_{j \le n} \xi_{\kappa_j}, \qquad \tilde S_n' = \alpha' + \sum_{j \le n} \xi_{\kappa_j'}, \qquad n \in \mathbb{Z}_+, $$
are random walks based on $\tilde\mu$ and the initial distributions $\nu$ and $\tilde\nu$, respectively. Writing $\sigma^\pm = \sum_{k \le \sigma} 1\{\vartheta_k = \pm 1\}$, we note that
$$ \tilde S_{\sigma^- + n}' - \tilde S_{\sigma^+ + n} = S_\sigma \in [0, \varepsilon], \qquad n \in \mathbb{Z}_+. $$
Putting $\gamma = \tilde S_{\sigma^+} \vee \tilde S_{\sigma^-}'$, and considering the first entry of $(\tilde S_n)$ and $(\tilde S_n')$ into the interval $[t, \infty)$, we obtain
$$ \tilde\nu[\varepsilon, x] - P\{\gamma > t\} \le \nu_t[0, x] \le \tilde\nu[0, x + \varepsilon] + P\{\gamma > t\}. $$
Letting $t \to \infty$ and then $\varepsilon \to 0$, and noting that $\tilde\nu\{0\} = 0$ by stationarity, we get $\nu_t[0, x] \to \tilde\nu[0, x]$. $\Box$

The following simple statement will be needed to deduce (ii) from (i) in the main theorem.

Lemma 9.22 (uniform integrability) Let $\eta$ be the occupation measure of a transient random walk $(S_n)$ in $\mathbb{R}^d$ with arbitrary initial distribution, and fix any bounded set $B \in \mathcal{B}^d$. Then the random variables $\eta(B + x)$, $x \in \mathbb{R}^d$, are uniformly integrable.

Proof: Fix any $x \in \mathbb{R}^d$, and put $\tau = \inf\{n \ge 0;\; S_n \in B + x\}$. Letting $\eta_0$ denote the occupation measure of an independent random walk starting at 0, we get by the strong Markov property
$$ \eta(B + x) \stackrel{d}{=} \eta_0(B + x - S_\tau)\, 1\{\tau < \infty\} \le \eta_0(B - B). $$
It remains to note that $E\eta_0(B - B) < \infty$ by Theorem 9.1, since $(S_n)$ is transient. $\Box$

Proof of Theorem 9.20 ($c < \infty$): By Lemma 9.22 it is enough to prove (i). If $c < 0$, then $S_n \to -\infty$ a.s. by the law of large numbers, so $\theta_t \eta = 0$ for sufficiently large $t$, and (i) follows.
If instead $c \in (0, \infty)$, then $\nu_t \stackrel{w}{\to} \tilde\nu$ by Lemma 9.21, and we may choose some random variables $\alpha_t$ and $\alpha$ with distributions $\nu_t$ and $\tilde\nu$, respectively, such that $\alpha_t \to \alpha$ a.s. We may also introduce the occupation measure $\eta_0$ of an independent random walk starting at 0.
Now fix any $f \in C_K^+(\mathbb{R}_+)$, and extend $f$ to $\mathbb{R}$ by putting $f(x) = 0$ for $x < 0$. Since $\tilde\nu \ll \lambda$, we have $\eta_0\{-\alpha\} = 0$ a.s. Hence, by the strong Markov property and dominated convergence,
$$ (\theta_t \eta) f \stackrel{d}{=} \int f(\alpha_t + x)\, \eta_0(dx) \to \int f(\alpha + x)\, \eta_0(dx) \stackrel{d}{=} \tilde\eta f. $$

($c = \infty$): In this case it is clearly enough to prove (ii). Then note that $E\eta = \nu * E\chi * E\zeta$, where $\chi$ is the occupation measure of the ladder height sequence of $(S_n - S_0)$, and $\zeta$ is the occupation measure of the same process prior to the first ladder time. Here $E\zeta \mathbb{R}_- < \infty$ by Proposition 9.13, and so by dominated convergence it suffices to show that $\theta_t E\chi \stackrel{v}{\to} 0$. Since the mean of the ladder height distribution is again infinite by Proposition 9.14, we may henceforth take $\nu = \delta_0$ and let $\mu$ be an arbitrary distribution on $\mathbb{R}_+$ with infinite mean.

Put $I = [0, 1]$, and note that $E\eta(I + t)$ is bounded by Lemma 9.22. Define $b = \limsup_t E\eta(I + t)$, and choose some $t_k \to \infty$ with $E\eta(I + t_k) \to b$. Subtracting the finite measures $\mu^{*j}$ for $j < m$, we get $(\mu^{*m} * E\eta)(I + t_k) \to b$ for all $m \in \mathbb{Z}_+$. Using the reverse Fatou lemma, we obtain for any $B \in \mathcal{B}(\mathbb{R}_+)$
$$ \liminf_{k \to \infty} E\eta(I - B + t_k)\, \mu^{*m} B \ \ge\ \liminf_{k \to \infty} \int_B E\eta(I - x + t_k)\, \mu^{*m}(dx) $$
$$ \ge\ b - \limsup_{k \to \infty} \int_{B^c} E\eta(I - x + t_k)\, \mu^{*m}(dx) $$
$$ \ge\ b - \int_{B^c} \limsup_{k \to \infty} E\eta(I - x + t_k)\, \mu^{*m}(dx) \ \ge\ b\, \mu^{*m} B. \qquad (20) $$
Now fix any $h > 0$ with $\mu(0, h] > 0$. Noting that $E\eta[r, r + h] > 0$ for all $r \ge 0$ and writing $J = [0, a]$ with $a = h + 1$, we get by (20)
$$ \liminf_{k \to \infty} E\eta(J + t_k - r) \ge b, \qquad r \ge a. \qquad (21) $$
Next conclude from the identity $\delta_0 = (\delta_0 - \mu) * E\eta$ that
$$ 1 = \int_0^{t_k} \mu(t_k - x, \infty)\, E\eta(dx) \ \ge\ \sum_{n \ge 1} \mu(na, \infty)\, E\eta(J + t_k - na). $$
As $k \to \infty$, we get by (21) and Fatou's lemma $1 \ge b \sum_{n \ge 1} \mu(na, \infty)$. Since the sum diverges by Lemma 3.4, it follows that $b = 0$. $\Box$

We may use the preceding theory to study the renewal equation $F = f + F * \mu$, which often arises in applications. Here the convolution $F * \mu$ is defined by
$$ (F * \mu)_t = \int_0^t F(t - s)\, \mu(ds), \qquad t \ge 0, $$
whenever the integrals on the right exist. Under suitable regularity conditions, the renewal equation has the unique solution $F = f * \hat\mu$, where $\hat\mu$ denotes the renewal measure $\sum_{n \ge 0} \mu^{*n}$. Additional conditions ensure the solution $F$ to converge at $\infty$.

A precise statement requires some further terminology. By a regular step function we mean a function on $\mathbb{R}_+$ of the form
$$ f_t = \sum_{j \ge 1} a_j\, 1_{(j-1, j]}(t/h), \qquad t \ge 0, \qquad (22) $$
where $h > 0$ and $a_1, a_2, \ldots \in \mathbb{R}$. A measurable function $f$ on $\mathbb{R}_+$ is said to be directly Riemann integrable if $\lambda |f| < \infty$ and there exist some regular step functions $f_n^\pm$ with $f_n^- \le f \le f_n^+$ and $\lambda(f_n^+ - f_n^-) \to 0$.

Corollary 9.23 (renewal equation) Fix a distribution $\mu \ne \delta_0$ on $\mathbb{R}_+$ with associated renewal measure $\hat\mu$, and let $f$ be a locally bounded and measurable function on $\mathbb{R}_+$. Then the equation $F = f + F * \mu$ has the unique, locally bounded solution $F = f * \hat\mu$. If $f$ is also directly Riemann integrable and if $\mu$ is nonarithmetic with mean $c$, then $F_t \to c^{-1} \lambda f$ as $t \to \infty$.

Proof: Iterating the renewal equation gives
$$ F = \sum_{k < n} f * \mu^{*k} + F * \mu^{*n}, \qquad n \in \mathbb{N}. \qquad (23) $$
Now $\mu^{*n}[0, t] \to 0$ as $n \to \infty$ for fixed $t \ge 0$ by the weak law of large numbers, and so for a locally bounded $F$ we have $F * \mu^{*n} \to 0$. If even $f$ is locally bounded, then by (23) and Fubini's theorem,
$$ F = \sum\nolimits_{k \ge 0} f * \mu^{*k} = f * \sum\nolimits_{k \ge 0} \mu^{*k} = f * \hat\mu. $$
Conversely, $f + f * \hat\mu * \mu = f * \hat\mu$, which shows that $F = f * \hat\mu$ solves the given equation.

Now let $\mu$ be nonarithmetic. If $f$ is a regular step function as in (22), then by Theorem 9.20 and dominated convergence we get as $t \to \infty$
$$ F_t = \int_0^t f(t - s)\, \hat\mu(ds) = \sum_{j \ge 1} a_j\, \hat\mu((0, h] + t - jh) \to c^{-1} h \sum_{j \ge 1} a_j = c^{-1} \lambda f. $$
In the general case, we may introduce some regular step functions $f_n^\pm$ with $f_n^- \le f \le f_n^+$ and $\lambda(f_n^+ - f_n^-) \to 0$, and note that
$$ (f_n^- * \hat\mu)_t \le F_t \le (f_n^+ * \hat\mu)_t, \qquad t \ge 0,\; n \in \mathbb{N}. $$
Letting $t \to \infty$ and then $n \to \infty$, we obtain $F_t \to c^{-1} \lambda f$. $\Box$

Exercises

1.
Show that if $(S_n)$ is recurrent, then so is the random walk $(S_{nk})$ for each $k \in \mathbb{N}$. (Hint: If $(S_{nk})$ is transient, then so is $(S_{nk+j})$ for any $j \ge 0$.)
2. For any nondegenerate random walk $(S_n)$ in $\mathbb{R}^d$, show that $|S_n| \stackrel{P}{\to} \infty$. (Hint: Use Lemma 5.1.)

3. Let $(S_n)$ be a random walk in $\mathbb{R}$ based on a symmetric, nondegenerate distribution with bounded support. Show that $(S_n)$ is recurrent, using the fact that $\limsup_n (\pm S_n) = \infty$ a.s.

4. Show that the accessible set $A$ equals the closed semigroup generated by $\operatorname{supp} \mu$. Also show by examples that $A$ may or may not be a group.

5. Let $\nu$ be an invariant measure on the accessible set of a recurrent random walk in $\mathbb{R}^d$. Show by examples that $E\eta$ may or may not be of the form $\infty \cdot \nu$.

6. Show that a nondegenerate random walk in $\mathbb{R}^d$ has no invariant distribution. (Hint: If $\nu$ is invariant, then $\mu * \nu = \nu$.)

7. Show by examples that the conditions in Theorem 9.2 are not necessary. (Hint: For $d = 2$, consider mixtures of $N(0, \sigma^2)$ and use Lemma 5.18.)

8. Consider a random walk $(S_n)$ based on the symmetric $p$-stable distribution on $\mathbb{R}$ with characteristic function $e^{-|t|^p}$. Show that $(S_n)$ is recurrent for $p \ge 1$ and transient for $p < 1$.

9. Let $(S_n)$ be a random walk in $\mathbb{R}^2$ based on the distribution $\mu \otimes \mu$, where $\mu$ is symmetric $p$-stable. Show that $(S_n)$ is recurrent for $p = 2$ and transient for $p < 2$.

10. Let $\mu = c \mu_1 + (1 - c) \mu_2$, where $\mu_1$ and $\mu_2$ are symmetric distributions on $\mathbb{R}^d$ and $c$ is a constant in $(0, 1)$. Show that a random walk based on $\mu$ is recurrent iff recurrence holds for the random walks based on $\mu_1$ and $\mu_2$.

11. Let $\mu = \mu_1 * \mu_2$, where $\mu_1$ and $\mu_2$ are symmetric distributions on $\mathbb{R}^d$. Show that if a random walk based on $\mu$ is recurrent, then so are the random walks based on $\mu_1$ and $\mu_2$. Also show by an example that the converse is false. (Hint: For the latter part, let $\mu_1$ and $\mu_2$ be supported by orthogonal subspaces.)

12. For any symmetric, recurrent random walk on $\mathbb{Z}^d$, show that the expected number of visits to an accessible state $k \ne 0$ before return to the origin equals 1.
(Hint: Compute the distribution, assuming probability $p$ for return before visit to $k$.)

13. Use Proposition 9.13 to show that any nondegenerate random walk in $\mathbb{Z}^d$ has infinite mean recurrence time. Compare with the preceding problem.

14. Show how part (i) of Proposition 9.14 can be strengthened by means of Theorems 5.16 and 9.2.

15. For a nondegenerate random walk in $\mathbb{R}$, show that $\limsup_n S_n = \infty$ a.s. iff $\sigma_1 < \infty$ a.s., and that $S_n \to \infty$ a.s. iff $E\sigma_1 < \infty$. In both conditions, note that $\sigma_1$ can be replaced by $\tau_1$.

16. Let $\eta$ be a renewal process based on some nonarithmetic distribution on $\mathbb{R}_+$. Show for any $\varepsilon > 0$ that $\sup\{t \ge 0;\; E\eta[t, t + \varepsilon] = 0\} < \infty$. (Hint: Imitate the proof of Proposition 8.14.)
17. Let $\mu$ be a distribution on $\mathbb{Z}_+$ such that the group generated by $\operatorname{supp} \mu$ equals $\mathbb{Z}$. Show that Proposition 9.18 remains true with $\nu\{n\} = c^{-1} \mu(n, \infty)$, $n \ge 0$, and prove a corresponding version of Proposition 9.19.

18. Let $\eta$ be the occupation measure of a random walk on $\mathbb{Z}$ based on some distribution $\mu$ with mean $c \in \mathbb{R} \setminus \{0\}$ such that the group generated by $\operatorname{supp} \mu$ equals $\mathbb{Z}$. Show as in Theorem 9.20 that $E\eta\{n\} \to c^{-1} \vee 0$.

19. Derive the renewal theorem for random walks on $\mathbb{Z}_+$ from the ergodic theorem for discrete-time Markov chains, and conversely. (Hint: Given a distribution $\mu$ on $\mathbb{N}$, construct a Markov chain $X$ on $\mathbb{Z}_+$ with $X_{n+1} = X_n + 1$ or 0, and such that the recurrence times at 0 are i.i.d. $\mu$. Note that $X$ is aperiodic iff $\mathbb{Z}$ is the smallest group containing $\operatorname{supp} \mu$.)

20. Fix a distribution $\mu$ on $\mathbb{R}$ with symmetrization $\tilde\mu$. Note that if $\tilde\mu$ is nonarithmetic, then so is $\mu$. Show by an example that the converse is false.

21. Simplify the proof of Lemma 9.21, in the case when even the symmetrization $\tilde\mu$ is nonarithmetic. (Hint: Let $\xi_1, \xi_2, \ldots$ and $\xi_1', \xi_2', \ldots$ be i.i.d. $\tilde\mu$, and define $S_n = \alpha' - \alpha + \sum_{k \le n} (\xi_k' - \xi_k)$.)

22. Show that any monotone and Lebesgue integrable function on $\mathbb{R}_+$ is directly Riemann integrable.

23. State and prove the counterpart of Corollary 9.23 for arithmetic distributions.

24. Let $(\xi_n)$ and $(\eta_n)$ be independent i.i.d. sequences with distributions $\mu$ and $\nu$, put $S_n = \sum_{k \le n} (\xi_k + \eta_k)$, and define $U = \bigcup_{n \ge 0} [S_n, S_n + \xi_{n+1})$. Show that $F_t = P\{t \in U\}$ satisfies the renewal equation $F = f + F * \mu * \nu$ with $f_t = \mu(t, \infty)$. Assuming $\mu$ and $\nu$ to have finite means, show also that $F_t$ converges as $t \to \infty$, and identify the limit.

25. Consider a renewal process $\eta$ based on some nonarithmetic distribution $\mu$ with mean $c < \infty$, fix an $h > 0$, and define $F_t = P\{\eta[t, t + h] = 0\}$. Show that $F = f + F * \mu$, where $f_t = \mu(t + h, \infty)$. Also show that $F_t$ converges as $t \to \infty$, and identify the limit.
(Hint: Consider the first point of $\eta$ in $(0, t)$, if any.)

26. For $\eta$ as above, let $\tau = \inf\{t \ge 0;\; \eta[t, t + h] = 0\}$, and put $F_t = P\{\tau \le t\}$. Show that $F_t = \mu(h, \infty) + \int_0^{h \wedge t} \mu(ds)\, F_{t-s}$, or $F = f + F * \mu_h$, where $\mu_h = 1_{[0,h]} \cdot \mu$ and $f_t \equiv \mu(h, \infty)$.
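To close the chapter, both Proposition 9.18 and Theorem 9.20 can be illustrated numerically. In the sketch below (our own illustration; function names are ad hoc) the spacing distribution is $\mu = U(0, 2)$, which is nonarithmetic with mean $c = 1$, so the stationary delay (19) is $\nu[0, t] = t - t^2/4$ on $[0, 2]$. A delayed process started from $\nu$ then has expected count $T/c$ on $[0, T]$ for every $T$, whereas the undelayed process satisfies Blackwell's relation $E\eta(t, t+h] \to h/c$ only in the limit of large $t$.

```python
import random

random.seed(1)
# Spacings are U(0, 2): nonarithmetic, mean c = 1.

def delay():
    # Sample the stationary delay of (19), nu[0, t] = t - t^2/4 on [0, 2],
    # by inverting the distribution function.
    return 2.0 * (1.0 - (1.0 - random.random()) ** 0.5)

def mean_count(a, b, start, trials=20000):
    # Monte Carlo estimate of E eta(a, b] for the renewal process whose
    # first point is start() and whose further spacings are U(0, 2).
    total = 0
    for _ in range(trials):
        s = start()
        while s <= b:
            if s > a:
                total += 1
            s += random.uniform(0.0, 2.0)
    return total / trials

print(mean_count(-1.0, 5.0, delay))          # stationary version: ~ 5.0 = T/c
print(mean_count(30.0, 31.0, lambda: 0.0))   # Blackwell window:   ~ 1.0 = h/c
```

Replacing `delay` by a different initial distribution destroys the exact identity in the first line but not the limit in the second, which is precisely the content of the renewal theorem.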
Chapter 10

Stationary Processes and Ergodic Theory

Stationarity, invariance, and ergodicity; discrete- and continuous-time ergodic theorems; moment and maximum inequalities; multivariate ergodic theorems; sample intensity of a random measure; subadditivity and products of random matrices; conditioning and ergodic decomposition; shift coupling and the invariant $\sigma$-field

In this chapter we come to the third important dependence structure of probability theory, beside those of martingales and Markov processes, namely stationarity. A stationary process is simply a process whose distribution is invariant under shifts. Stationary processes are important in their own right, and they also arise under broad conditions as steady-state limits of various Markov and renewal-type processes, as we have seen in Chapters 8 and 9 and will see again in Chapters 12, 20, and 23. Our present aim is to present some of the most useful general results for stationary and related processes.

The key result of stationarity theory is Birkhoff's ergodic theorem, which may be regarded as a strong law of large numbers for stationary sequences and processes. After proving the classical ergodic theorems in discrete and continuous time, we turn to the multivariate versions of Zygmund and Wiener, the former in a setting for noncommutative mappings and rectangular regions, the latter in the commutative case but with averages over increasing families of convex sets. Wiener's theorem will also be considered in a version for random measures that will be useful in Chapter 11 for the theory of Palm distributions. We finally present a version of Kingman's subadditive ergodic theorem, along with an important application to random matrices.

In all the mentioned results, the limit is a random variable, measurable with respect to the appropriate invariant $\sigma$-field $\mathcal{I}$. Of special interest then is the ergodic case, when $\mathcal{I}$ is trivial and the limit reduces to a constant.
For general stationary processes, we consider a decomposition of the distribution into ergodic components. The chapter concludes with some basic criteria for coupling and shift coupling of two processes, expressed in terms of the tail and invariant $\sigma$-fields $\mathcal{T}$ and $\mathcal{I}$, respectively. Those results will be helpful to prove some ergodic theorems in Chapters 11 and 20.
Our treatment of stationary sequences and processes is continued in Chapter 11 with some important applications and extensions of the present theory. In particular, we will then derive ergodic theorems for Palm distributions, as well as for entropy and information. In Chapter 20 we show how the basic ergodic theorems admit extensions to suitable contraction operators, which leads to a profound unification of the present theory with the ergodic theory for Markov transition operators. Our treatment of the ratio ergodic theorem is also postponed until then.

Let us now return to the basic notions of stationarity and invariance. Then fix an arbitrary measurable space $(S, \mathcal{S})$. Given a measure $\mu$ and a measurable transformation $T$ on $S$, we say that $T$ is $\mu$-preserving or measure-preserving if $\mu \circ T^{-1} = \mu$. Thus, if $\xi$ is a random element of $S$ with distribution $\mu$, then $T$ is measure-preserving iff $T\xi \stackrel{d}{=} \xi$. In particular, consider a random sequence $\xi = (\xi_0, \xi_1, \ldots)$ in some measurable space $(S', \mathcal{S}')$, and let $\theta$ denote the shift on $S = (S')^\infty$ given by $\theta(x_0, x_1, \ldots) = (x_1, x_2, \ldots)$. Then $\xi$ is said to be stationary if $\theta\xi \stackrel{d}{=} \xi$. We show that the general situation is equivalent to this special case.

Lemma 10.1 (stationarity and invariance) For any random element $\xi$ in $S$ and measurable transformation $T$ on $S$, we have $T\xi \stackrel{d}{=} \xi$ iff the sequence $(T^n \xi)$ is stationary, in which case even $(f(T^n \xi))$ is stationary for every measurable function $f$. Conversely, any stationary random sequence admits such a representation.

Proof: Assuming $T\xi \stackrel{d}{=} \xi$, we get
$$ \theta(f(T^n \xi);\; n \ge 0) = (f(T^{n+1} \xi);\; n \ge 0) = (f(T^n (T\xi));\; n \ge 0) \stackrel{d}{=} (f(T^n \xi);\; n \ge 0), $$
and so $(f(T^n \xi))$ is stationary. Conversely, if $\eta = (\eta_0, \eta_1, \ldots)$ is stationary we may write $\eta_n = \pi_0(\theta^n \eta)$ with $\pi_0(x_0, x_1, \ldots) = x_0$, and we note that $\theta\eta \stackrel{d}{=} \eta$ by the stationarity of $\eta$. $\Box$

In particular, we note that if $\xi_0, \xi_1, \ldots$
is a stationary sequence of random elements in some measurable space $S$, and if $f$ is a measurable mapping of $S^\infty$ into some measurable space $S'$, then the random sequence
$$ \eta_n = f(\xi_n, \xi_{n+1}, \ldots), \qquad n \in \mathbb{Z}_+, $$
is again stationary.

The definition of stationarity extends in the obvious way to random sequences indexed by $\mathbb{Z}$. The two-sided versions have the technical advantage that the associated shift operators form a group, rather than just a semigroup as in the one-sided context. The following result shows that the two cases are essentially equivalent. Here we assume the existence of appropriate randomization variables, as explained in Chapter 6.
Lemma 10.2 (two-sided extension) Any stationary random sequence $\xi_0, \xi_1, \ldots$ in a Borel space admits a stationary extension $\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots$ to the index set $\mathbb{Z}$.

Proof: Assuming $\vartheta_1, \vartheta_2, \ldots$ to be i.i.d. $U(0, 1)$ and independent of $\xi = (\xi_0, \xi_1, \ldots)$, we may construct the $\xi_{-n}$ recursively as functions of $\xi$ and $\vartheta_1, \ldots, \vartheta_n$ such that $(\xi_{-n}, \xi_{-n+1}, \ldots) \stackrel{d}{=} \xi$ for all $n$. In fact, once $\xi_{-1}, \ldots, \xi_{-n}$ have been chosen, the existence of $\xi_{-n-1}$ is clear from Theorem 6.10 if we note that $(\xi_{-n}, \xi_{-n+1}, \ldots) \stackrel{d}{=} \theta\xi$. Finally, the extended sequence is stationary by Proposition 3.2. $\Box$

Now fix a measurable transformation $T$ on some measure space $(S, \mathcal{S}, \mu)$, and let $\mathcal{S}^\mu$ denote the $\mu$-completion of $\mathcal{S}$. We say that a set $I \subseteq S$ is invariant if $T^{-1} I = I$, and almost invariant if $T^{-1} I = I$ a.e. $\mu$, in the sense that $\mu(T^{-1} I \Delta I) = 0$. Since inverse mappings preserve the basic set operations, the classes $\mathcal{I}$ and $\mathcal{I}'$ of invariant sets in $\mathcal{S}$ and almost invariant sets in $\mathcal{S}^\mu$ form $\sigma$-fields in $S$, called the invariant and almost invariant $\sigma$-fields, respectively. A measurable function $f$ on $S$ is said to be invariant if $f \circ T = f$, and almost invariant if $f \circ T = f$ a.e. $\mu$. The following result gives the basic relationship between invariant or almost invariant sets and functions.

Lemma 10.3 (invariant sets and functions) Fix a measure $\mu$ and a measurable transformation $T$ on $S$, and let $f$ be a measurable mapping of $S$ into a Borel space $S'$. Then $f$ is invariant or almost invariant iff it is $\mathcal{I}$-measurable or $\mathcal{I}'$-measurable, respectively.

Proof: We may first apply a Borel isomorphism to reduce to the case when $S' = \mathbb{R}$. If $f$ is invariant or almost invariant, then so is the set $I_x = f^{-1}(-\infty, x)$ for any $x \in \mathbb{R}$, and so $I_x \in \mathcal{I}$ or $\mathcal{I}'$, respectively. Conversely, if $f$ is measurable with respect to $\mathcal{I}$ or $\mathcal{I}'$, then $I_x \in \mathcal{I}$ or $\mathcal{I}'$, respectively, for every $x \in \mathbb{R}$.
Hence, the function $f_n(s) = 2^{-n} [2^n f(s)]$, $s \in S$, is invariant or almost invariant for every $n \in \mathbb{N}$, and the invariance or almost invariance carries over to the limit $f$. $\Box$

The next result clarifies the relationship between the invariant and almost invariant $\sigma$-fields. Here we write $\mathcal{I}^\mu$ for the $\mu$-completion of $\mathcal{I}$ in $\mathcal{S}^\mu$, the $\sigma$-field generated by $\mathcal{I}$ and the $\mu$-null sets in $\mathcal{S}^\mu$.

Lemma 10.4 (almost invariance) For any distribution $\mu$ and $\mu$-preserving transformation $T$ on $S$, the associated invariant and almost invariant $\sigma$-fields $\mathcal{I}$ and $\mathcal{I}'$ are related by $\mathcal{I}' = \mathcal{I}^\mu$.

Proof: If $J \in \mathcal{I}^\mu$, there exists some $I \in \mathcal{I}$ with $\mu(I \Delta J) = 0$. Since $T$ is $\mu$-preserving, we get
$$ \mu(T^{-1} J \Delta J) \le \mu(T^{-1} J \Delta T^{-1} I) + \mu(T^{-1} I \Delta I) + \mu(I \Delta J) = \mu \circ T^{-1}(J \Delta I) + \mu(J \Delta I) = 0, $$
which shows that $J \in \mathcal{I}'$. Conversely, given any $J \in \mathcal{I}'$, we may choose some $J' \in \mathcal{S}$ with $\mu(J \Delta J') = 0$ and put $I = \bigcap_n \bigcup_{k \ge n} T^{-k} J'$. Then, clearly, $I \in \mathcal{I}$ and $\mu(I \Delta J) = 0$, and so $J \in \mathcal{I}^\mu$. $\Box$

A measure-preserving mapping $T$ on some probability space $(S, \mathcal{S}, \mu)$ is said to be ergodic for $\mu$ or simply $\mu$-ergodic if the invariant $\sigma$-field $\mathcal{I}$ is $\mu$-trivial, in the sense that $\mu I = 0$ or 1 for every $I \in \mathcal{I}$. Depending on viewpoint, we may prefer to say that $\mu$ is ergodic for $T$, or $T$-ergodic. The terminology carries over to any random element $\xi$ with distribution $\mu$, which is said to be ergodic whenever this is true for $T$ or $\mu$. Thus, $\xi$ is ergodic iff $P\{\xi \in I\} = 0$ or 1 for any $I \in \mathcal{I}$, that is, iff the $\sigma$-field $\mathcal{I}_\xi = \xi^{-1} \mathcal{I}$ in $\Omega$ is $P$-trivial. In particular, a stationary sequence $\xi = (\xi_n)$ is ergodic if the shift-invariant $\sigma$-field is trivial for the distribution of $\xi$.

The next result shows how the ergodicity of a random element $\xi$ is related to the ergodicity of the generated stationary sequence.

Lemma 10.5 (ergodicity) Let $\xi$ be a random element in $S$ with distribution $\mu$, and let $T$ be a $\mu$-preserving mapping on $S$. Then $\xi$ is $T$-ergodic iff the sequence $(T^n \xi)$ is $\theta$-ergodic, in which case even $\eta = (f(T^n \xi))$ is $\theta$-ergodic for every measurable mapping $f$ on $S$.

Proof: Fix any measurable mapping $f: S \to S'$, and define $F = (f \circ T^n;\; n \ge 0)$, so that $F \circ T = \theta \circ F$. If $I \subseteq (S')^\infty$ is $\theta$-invariant, then
$$ T^{-1} F^{-1} I = F^{-1} \theta^{-1} I = F^{-1} I, $$
and so $F^{-1} I$ is $T$-invariant in $S$. Assuming $\xi$ to be ergodic, we obtain $P\{\eta \in I\} = P\{\xi \in F^{-1} I\} = 0$ or 1, which shows that even $\eta$ is ergodic.

Conversely, let the sequence $(T^n \xi)$ be ergodic, and fix any $T$-invariant set $I$ in $\mathcal{S}$. Put $F = (T^n;\; n \ge 0)$, and define $A = \{s \in S^\infty;\; s_n \in I \text{ i.o.}\}$. Then $I = F^{-1} A$ and $A$ is $\theta$-invariant. Hence,
$$ P\{\xi \in I\} = P\{(T^n \xi) \in A\} = 0 \text{ or } 1, $$
which means that even $\xi$ is ergodic. $\Box$

We may now state the fundamental a.s. and mean ergodic theorem for stationary sequences of random variables.
Recall that $(S, \mathcal{S})$ denotes an arbitrary measurable space, and write $\mathcal{I}_\xi = \xi^{-1} \mathcal{I}$ for convenience.

Theorem 10.6 (ergodic theorem, Birkhoff) Let $\xi$ be a random element in $S$ with distribution $\mu$, and let $T$ be a $\mu$-preserving map on $S$ with invariant $\sigma$-field $\mathcal{I}$. Then for any measurable function $f \ge 0$ on $S$,
$$ n^{-1} \sum_{k < n} f(T^k \xi) \to E[f(\xi) \mid \mathcal{I}_\xi] \quad \text{a.s.} \qquad (1) $$
The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.

The proof is based on a simple, but ingenious, inequality.

Lemma 10.7 (maximal ergodic lemma) Let $\xi = (\xi_k)$ be a stationary sequence of integrable random variables, and put $S_n = \xi_1 + \cdots + \xi_n$. Then $E[\xi_1;\; \sup_n S_n > 0] \ge 0$.
Proof (Garsia): Put $M_n = S_1 \vee \cdots \vee S_n$. Assuming $\xi$ to be defined on the canonical space $\mathbb{R}^\infty$, we note that
$$ S_k = \xi_1 + S_{k-1} \circ \theta \le \xi_1 + (M_n \circ \theta)^+, \qquad k = 1, \ldots, n. $$
Taking maxima yields $M_n \le \xi_1 + (M_n \circ \theta)^+$ for all $n \in \mathbb{N}$, and so by stationarity
$$ E[\xi_1;\; M_n > 0] \ge E[M_n - (M_n \circ \theta)^+;\; M_n > 0] \ge E[(M_n)^+ - (M_n \circ \theta)^+] = 0. $$
Since $M_n \uparrow \sup_n S_n$, the assertion follows by dominated convergence. $\Box$

Proof of Theorem 10.6 (Yosida and Kakutani): First assume that $f \in L^1$, and put $\eta_k = f(T^{k-1} \xi)$ for convenience. Since $E[\eta_1 \mid \mathcal{I}_\xi]$ is an invariant function of $\xi$ by Lemma 10.3, the sequence $\zeta_k = \eta_k - E[\eta_1 \mid \mathcal{I}_\xi]$ is again stationary. Writing $S_n = \zeta_1 + \cdots + \zeta_n$, we define for any $\varepsilon > 0$
$$ A_\varepsilon = \{\limsup_n (S_n / n) > \varepsilon\}, \qquad \zeta_n^\varepsilon = (\zeta_n - \varepsilon) 1_{A_\varepsilon}, $$
and note that the sums $S_n^\varepsilon = \zeta_1^\varepsilon + \cdots + \zeta_n^\varepsilon$ satisfy
$$ \{\sup_n S_n^\varepsilon > 0\} = \{\sup_n (S_n^\varepsilon / n) > 0\} = \{\sup_n (S_n / n) > \varepsilon\} \cap A_\varepsilon = A_\varepsilon. $$
Since $A_\varepsilon \in \mathcal{I}_\xi$, the sequence $(\zeta_n^\varepsilon)$ is stationary, and Lemma 10.7 yields
$$ 0 \le E[\zeta_1^\varepsilon;\; \sup_n S_n^\varepsilon > 0] = E[\zeta_1 - \varepsilon;\; A_\varepsilon] = E[E[\zeta_1 \mid \mathcal{I}_\xi];\; A_\varepsilon] - \varepsilon P A_\varepsilon = -\varepsilon P A_\varepsilon, $$
which implies $P A_\varepsilon = 0$. Thus, $\limsup_n (S_n / n) \le \varepsilon$ a.s., and $\varepsilon$ being arbitrary, we obtain $\limsup_n (S_n / n) \le 0$ a.s. Applying the same result to $-S_n$ yields $\liminf_n (S_n / n) \ge 0$ a.s., and so by combination $S_n / n \to 0$ a.s.

Next assume that $f \in L^p$ for some $p \ge 1$. Using Jensen's inequality and the stationarity of $T^k \xi$, we get for any measurable set $A$ and $r > 0$
$$ E\, 1_A \Big| n^{-1} \sum_{k < n} f(T^k \xi) \Big|^p \le n^{-1} \sum_{k < n} E[|f(T^k \xi)|^p;\; A] \le r^p\, P A + E[|f(\xi)|^p;\; |f(\xi)| > r], $$
which tends to 0 as $P A \to 0$ and then $r \to \infty$. Hence, by Lemma 4.10 the $p$th powers on the left are uniformly integrable, and the asserted $L^p$-convergence follows by Proposition 4.12.

Finally, let $f \ge 0$ be arbitrary and put $E[f(\xi) \mid \mathcal{I}_\xi] = \tilde f$. Conditioning on the event $\{\tilde f \le r\}$ for arbitrary $r > 0$, we see that (1) holds a.s. on $\{\tilde f < \infty\}$. Next we have a.s. for any $r > 0$
$$ \liminf_{n \to \infty} n^{-1} \sum_{k \le n} f(T^k \xi) \ge \lim_{n \to \infty} n^{-1} \sum_{k \le n} (f(T^k \xi) \wedge r) = E[f(\xi) \wedge r \mid \mathcal{I}_\xi]. $$
As $r \to \infty$, the right-hand side tends a.s.
to fj by the monotone convergence property of conditional expectations. In particular, the left-hand side is a.s. infinite on {ij == oo}, as required. 0 
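The a.s. convergence in Theorem 10.6 is easy to observe numerically. The following sketch, purely illustrative and not part of the original text, uses the rotation $Tx = (x + \alpha) \bmod 1$ on $S = [0,1)$ with Lebesgue measure, which is classically known to be ergodic for irrational $\alpha$; the Birkhoff averages of $f(x) = \cos 2\pi x$ therefore approach the space mean $\int_0^1 f\,d\mu = 0$.

```python
import math

# Birkhoff averages n^{-1} sum_{k<n} f(T^k x) for the rotation
# T x = (x + alpha) mod 1 on [0, 1) with Lebesgue measure.
# For irrational alpha the rotation is ergodic, so the averages
# converge to the space mean of f (here 0), for every starting point.

def birkhoff_average(f, alpha, x0, n):
    """Average of f along the first n points of the rotation orbit."""
    s, x = 0.0, x0
    for _ in range(n):
        s += f(x)
        x = (x + alpha) % 1.0
    return s / n

alpha = math.sqrt(2) - 1                   # irrational rotation number
f = lambda x: math.cos(2 * math.pi * x)    # space mean 0

avg = birkhoff_average(f, alpha, x0=0.3, n=100_000)
print(avg)   # close to the space mean 0
```

Since the rotation is ergodic, the invariant $\sigma$-field is trivial and the conditional expectation in (1) reduces to the constant $\mu f$.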
10. Stationary Processes and Ergodic Theory 183

Write $\mathcal{I}$ and $\mathcal{T}$ for the shift-invariant and tail $\sigma$-fields, respectively, in $\mathbb{R}^\infty$, and note that $\mathcal{I} \subset \mathcal{T}$. Thus, for any sequence of random variables $\xi = (\xi_1, \xi_2, \dots)$, we have $\mathcal{I}_\xi = \xi^{-1}\mathcal{I} \subset \xi^{-1}\mathcal{T}$. By Kolmogorov's 0-1 law, the latter $\sigma$-field is trivial when the $\xi_n$ are independent. If they are even i.i.d. and integrable, then Theorem 10.6 yields $n^{-1}(\xi_1 + \cdots + \xi_n) \to E\xi_1$ a.s. and in $L^1$, in agreement with Theorem 4.23. Hence, the last theorem contains the strong law of large numbers.

It is often useful to allow the function $f = f_{n,k}$ in Theorem 10.6 to depend on $n$ or $k$. For later needs, we consider a slightly more general situation.

Corollary 10.8 (approximation, Maker) Let $\xi$ be a random element in $S$ with distribution $\mu$, let $T$ be a $\mu$-preserving map on $S$ with invariant $\sigma$-field $\mathcal{I}$, and consider some measurable functions $f$ and $f_{m,k}$ on $S$.

(i) If $f_{m,k} \to f$ a.s. and $\sup_{m,k}|f_{m,k}| \in L^1$, then as $m, n \to \infty$,

$$ n^{-1}\sum_{k<n} f_{m,k}(T^k\xi) \to E[f(\xi)\,|\,\mathcal{I}_\xi] \quad \text{a.s.} $$

(ii) If $f_{m,k} \to f$ in $L^p$ for some $p \ge 1$, the same convergence holds in $L^p$.

Proof: (i) By Theorem 10.6 we may assume that $f = 0$. Then put $g_r = \sup_{m,k \ge r}|f_{m,k}|$, and conclude from the same result that a.s.

$$ \limsup_{m,n\to\infty}\Big| n^{-1}\sum_{k<n} f_{m,k}(T^k\xi)\Big| \le \lim_{n\to\infty} n^{-1}\sum_{k<n} g_r(T^k\xi) = E[g_r(\xi)\,|\,\mathcal{I}_\xi]. $$

Here $g_r(\xi) \to 0$ a.s., and so by dominated convergence $E[g_r(\xi)|\mathcal{I}_\xi] \to 0$ a.s.

(ii) Assuming $f = 0$, we get by Minkowski's inequality and the invariance of $\mu$

$$ \Big\| n^{-1}\sum_{k<n} f_{m,k}\circ T^k \Big\|_p \le n^{-1}\sum_{k<n} \|f_{m,k}\|_p \to 0. \qquad \Box $$

Our next aim is to extend the ergodic theorem to continuous time. We may then consider a family of transformations $T_t$ on $S$, $t \ge 0$, satisfying the semigroup property $T_{s+t} = T_s T_t$. The semigroup $(T_t)$ is called a flow if it is also measurable, in the sense that the mapping $(x,t) \mapsto T_t x$ is product measurable from $S \times \mathbb{R}_+$ to $S$. The invariant $\sigma$-field $\mathcal{I}$ now consists of all sets $I \in \mathcal{S}$ such that $T_t^{-1}I = I$ for all $t$. A random element $\xi$ in $S$ is said to be $(T_t)$-stationary if $T_t\xi \stackrel{d}{=} \xi$ for all $t \ge 0$.
Corollary 10.9 (continuous-time ergodic theorem) Let $\xi$ be a random element in $S$ with distribution $\mu$, and let $(T_s)$ be a $\mu$-preserving flow on $S$ with invariant $\sigma$-field $\mathcal{I}$. Then for any measurable function $f \ge 0$ on $S$,

$$ \lim_{t\to\infty} t^{-1}\int_0^t f(T_s\xi)\,ds = E[f(\xi)\,|\,\mathcal{I}_\xi] \quad \text{a.s.} \qquad (2) $$

The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.
Proof: In both cases we may assume that $f \ge 0$. Writing $X_s = f(T_s\xi)$, we get by Jensen's inequality and Fubini's theorem

$$ E\Big( t^{-1}\int_0^t X_s\,ds\Big)^p \le E\, t^{-1}\int_0^t X_s^p\,ds = t^{-1}\int_0^t E X_s^p\,ds = E X_0^p < \infty. $$

The required convergence now follows as we apply Theorem 10.6 to the function $g(x) = \int_0^1 f(T_s x)\,ds$ and the discrete shift $T = T_1$.

To identify the limit, we first assume that $f \in L^1$ and introduce the invariant version

$$ \tilde f(\xi) = \lim_{r\to\infty}\limsup_{n\to\infty} n^{-1}\int_r^{r+n} f(T_s\xi)\,ds, $$

which is also $\mathcal{I}_\xi$-measurable. By the stationarity of $T_s\xi$ we have $E^{\mathcal{I}_\xi} f(T_s\xi) = E^{\mathcal{I}_\xi} f(\xi)$ a.s. for all $s \ge 0$. Using Fubini's theorem, the $L^1$-convergence in (2), and the contraction property of conditional expectations, we get as $t \to \infty$

$$ E^{\mathcal{I}_\xi} f(\xi) = E^{\mathcal{I}_\xi}\, t^{-1}\int_0^t f(T_s\xi)\,ds \to E^{\mathcal{I}_\xi} \tilde f(\xi) = \tilde f(\xi), $$

as required. The result extends as before to arbitrary $f \ge 0$. $\Box$

We return to the case when $\xi_1, \xi_2, \dots$ is a stationary sequence of integrable random variables, and put $S_n = \sum_{k\le n}\xi_k$. Since $S_n/n$ converges a.s. by Theorem 10.6, we note that the maximum $M = \sup_n(S_n/n)$ is a.s. finite. The following result, relating the moments of $\xi$ and $M$, is known as the dominated ergodic theorem. Here we write $\log^+ x = \log(x \vee 1)$ for convenience, and $a \lesssim b$ means that $a \le cb$ for a constant $c$ depending only on $p$ or $m$.

Proposition 10.10 (moment inequalities, Hardy and Littlewood, Wiener) Let $\xi = (\xi_k)$ be a stationary sequence of random variables, and put $S_n = \sum_{k\le n}\xi_k$ and $M = \sup_n(S_n/n)$. Then

(i) $E|M|^p \lesssim E|\xi_1|^p$ for fixed $p > 1$;

(ii) $E|M|\log^m_+|M| \lesssim 1 + E|\xi_1|\log^{m+1}_+|\xi_1|$ for fixed $m \ge 0$.

The proof requires a simple estimate related to Lemma 10.7.

Lemma 10.11 (maximum inequality) If $\xi = (\xi_k)$ is stationary in $L^1$, then

$$ r\,P\{\sup{}_n(S_n/n) > 2r\} \le E[\xi_1;\, \xi_1 > r], \quad r > 0. $$

Proof: For any $r > 0$, we put $\xi'_k = \xi_k 1\{\xi_k > r\}$ and note that $\xi_k \le \xi'_k + r$. Assuming $\xi$ to be defined on the canonical space $\mathbb{R}^\infty$ and writing $A_n = S_n/n$, we get

$$ A_n - 2r = A_n\circ(\xi - 2r) \le A_n\circ(\xi' - r), $$

which implies $M - 2r \le M\circ(\xi' - r)$. Applying Lemma 10.7 to the sequence $\xi' - r$, we obtain

$$ r\,P\{M > 2r\} \le r\,P\{M\circ(\xi' - r) > 0\} \le E[\xi'_1;\, M\circ(\xi' - r) > 0] \le E\xi'_1 = E[\xi_1;\, \xi_1 > r]. \qquad \Box $$

Proof of Proposition 10.10: We may clearly assume that $\xi_1 \ge 0$ a.s.

(i) By Lemma 10.11, Fubini's theorem, and some calculus,

$$ EM^p = pE\int_0^M r^{p-1}\,dr = p\int_0^\infty P\{M > r\}\, r^{p-1}\,dr \le 2p\int_0^\infty E[\xi_1;\, 2\xi_1 > r]\, r^{p-2}\,dr $$
$$ = 2pE\,\xi_1\int_0^{2\xi_1} r^{p-2}\,dr = 2p(p-1)^{-1} E\,\xi_1(2\xi_1)^{p-1} \lesssim E\,\xi_1^p. $$

(ii) For $m = 0$, we may write

$$ E(M-1)^+ = \int_1^\infty P\{M > r\}\,dr \le 2\int_1^\infty E[\xi_1;\, 2\xi_1 > r]\, r^{-1}\,dr = 2E\,\xi_1\int_1^{1\vee 2\xi_1} r^{-1}\,dr $$
$$ = 2E\,\xi_1\log^+ 2\xi_1 \le e + 2E[\xi_1\log 2\xi_1;\, 2\xi_1 > e] \lesssim 1 + E\,\xi_1\log^+ \xi_1. $$

For $m > 0$, we instead write

$$ EM\log^m M = \int_0^\infty P\{M\log^m M > r\}\,dr = \int_1^\infty P\{M > t\}\,(m\log^{m-1} t + \log^m t)\,dt $$
$$ \le 2\int_1^\infty E[\xi_1;\, 2\xi_1 > t]\,(m\log^{m-1} t + \log^m t)\, t^{-1}\,dt \le 2E\,\xi_1\int_0^{\log^+ 2\xi_1}(m x^{m-1} + x^m)\,dx $$
$$ = 2E\,\xi_1\Big\{\log^m_+ 2\xi_1 + \frac{\log^{m+1}_+ 2\xi_1}{m+1}\Big\} \le 2e + 4E[\xi_1\log^{m+1} 2\xi_1;\, 2\xi_1 > e] \lesssim 1 + E\,\xi_1\log^{m+1}_+ \xi_1. \qquad \Box $$
Given a measure space $(S, \mathcal{S}, \mu)$, we introduce for any $m \ge 0$ the class $L\log^m L(\mu)$ of measurable functions $f$ on $S$ satisfying $\int |f|\log^m_+|f|\,d\mu < \infty$. Note in particular that $L\log^0 L = L^1$. Using the maximum inequalities of Proposition 10.10, we may prove the following multivariate version of Theorem 10.6 for possibly noncommuting, measure-preserving transformations $T_1, \dots, T_d$.

Theorem 10.12 (multivariate ergodic theorem, Zygmund) Let $\xi$ be a random element in $S$ with distribution $\mu$, let $T_1, \dots, T_d$ be $\mu$-preserving maps on $S$ with invariant $\sigma$-fields $\mathcal{I}_1, \dots, \mathcal{I}_d$, and put $\mathcal{J}_k = \xi^{-1}\mathcal{I}_k$. Then for any $f \in L\log^{d-1}L(\mu)$, we have as $n_1, \dots, n_d \to \infty$

$$ (n_1\cdots n_d)^{-1}\sum_{k_1<n_1}\cdots\sum_{k_d<n_d} f(T_1^{k_1}\cdots T_d^{k_d}\xi) \to E^{\mathcal{J}_d}\cdots E^{\mathcal{J}_1} f(\xi) \quad \text{a.s.} \qquad (3) $$

The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.

Proof: Since $E[f(\xi)|\mathcal{J}_k] = \mu[f|\mathcal{I}_k]\circ\xi$ a.s., e.g. by Theorem 10.6, we may take $\xi$ to be the identity mapping on $S$. For $d = 1$ the result reduces to Theorem 10.6. Now assume the statement to be true up to dimension $d$. Proceeding by induction, consider any $\mu$-preserving maps $T_1, \dots, T_{d+1}$ on $S$ and let $f \in L\log^d L$. By the induction hypothesis, the $d$-dimensional version of (3) holds as stated, and we may write the result in the form $f_m \to \tilde f$ a.s., where $m = (n_1, \dots, n_d)$. Iterating Proposition 10.10, we also note that $\mu\sup_m|f_m| < \infty$. Hence, by Corollary 10.8 (i) we have as $m, n \to \infty$

$$ n^{-1}\sum_{k<n} f_m\circ T_{d+1}^k \to \mu[\tilde f\,|\,\mathcal{I}_{d+1}] \quad \text{a.s.,} $$

as required. The proof of the $L^p$-version is similar. $\Box$

In the commutative case, the last result leads immediately to an interesting relationship between the associated conditional expectations. Let $L^1(\xi)$ denote the set of all integrable, $\xi$-measurable random variables.

Corollary 10.13 (commuting maps and expectations) Assume in Theorem 10.12 that $T_1, \dots, T_d$ commute, and put $\mathcal{J} = \bigcap_k \mathcal{J}_k$. Then $E^{\mathcal{J}_1}\cdots E^{\mathcal{J}_d} = E^{\mathcal{J}}$ on $L^1(\xi)$.

Proof: Since even $T_1^{k_1}, \dots, T_d^{k_d}$ commute for arbitrary $k_1, \dots, k_d \in \mathbb{Z}_+$, Theorem 10.12 yields

$$ E^{\mathcal{J}_1}\cdots E^{\mathcal{J}_d} f(\xi) = E^{\mathcal{J}_{p_1}}\cdots E^{\mathcal{J}_{p_d}} f(\xi) \quad \text{a.s.} \qquad (4) $$

for any measurable function $f \ge 0$ on $S$ and permutation $p_1, \dots, p_d$ of $1, \dots, d$. In particular, the expression in (4) is a.s. $\mathcal{J}_k$-measurable for every $k$ and therefore $\mathcal{J}$-measurable. It remains to note that

$$ E[E^{\mathcal{J}_1}\cdots E^{\mathcal{J}_d} f(\xi);\, A] = E[f(\xi);\, A], \quad A \in \mathcal{J}. \qquad \Box $$

For commuting mappings $T_1, \dots, T_d$ on $S$, we note that the compositions $T^k = T_1^{k_1}\cdots T_d^{k_d}$ form a $d$-dimensional semigroup indexed by $\mathbb{Z}_+^d$. Similarly, when $(T_s^1), \dots, (T_s^d)$ are commuting flows on $S$, the compositions $T^s = T^1_{s_1}\cdots T^d_{s_d}$ form a $d$-dimensional measurable semigroup or flow indexed by $\mathbb{R}_+^d$. In the continuous parameter case, it may be more natural to consider flows indexed by $\mathbb{R}^d$, corresponding to the case of stationary processes on $\mathbb{R}^d$. In this context, one may also want to average over more general sets than rectangles. Here we consider a basic ergodic theorem for increasing sequences of convex sets. Given such a set $B$, we define the inner radius $r(B)$ as the radius of the largest open ball contained in $B$, and write $|B| = \lambda^d B$.

Theorem 10.14 (monotone, multivariate ergodic theorem, Wiener) Let $\xi$ be a random element in $S$ with distribution $\mu$. Consider a flow of $\mu$-preserving maps $T_s$, $s \in \mathbb{R}^d$, on $S$ with invariant $\sigma$-field $\mathcal{I}$, and fix some bounded, convex sets $B_1 \subset B_2 \subset \cdots$ in $\mathcal{B}^d$ with $r(B_n) \to \infty$. Then for any measurable function $f \ge 0$ on $S$,

$$ |B_n|^{-1}\int_{B_n} f(T_s\xi)\,ds \to E[f(\xi)\,|\,\mathcal{I}_\xi] \quad \text{a.s.} $$

The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.

Several lemmas are needed for the proof. We begin with some estimates for convex sets, stated here without proof. Let $\partial_\varepsilon B$ denote the $\varepsilon$-neighborhood of the boundary $\partial B$, and write $\binom{n}{k}$ for binomial coefficients.

Lemma 10.15 (convex sets) If $B \subset \mathbb{R}^d$ is convex and $\varepsilon > 0$, then

(i) $|B - B| \le \binom{2d}{d}|B|$;

(ii) $|\partial_\varepsilon B| \le 2\big((1 + \varepsilon/r(B))^d - 1\big)|B|$.

We continue with a simple geometric estimate.

Lemma 10.16 (space filling) Fix any bounded, convex sets $B_1 \subset \cdots \subset B_m$ in $\mathcal{B}^d$ with $|B_1| > 0$, a bounded set $K \in \mathcal{B}^d$, and a function $p: K \to \{1, \dots, m\}$. Then there exists a finite subset $H \subset K$ such that the sets $B_{p(x)} + x$, $x \in H$, are disjoint and satisfy $|K| \le \binom{2d}{d}\sum_{x\in H}|B_{p(x)}|$.

Proof: Put $C_x = B_{p(x)} + x$, and choose $x_1, x_2, \dots \in K$ recursively, as follows. Once $x_1, \dots, x_{j-1}$ have been selected, we choose $x_j \in K$ with the largest possible $p(x_j)$ such that $C_{x_i} \cap C_{x_j} = \emptyset$ for all $i < j$. The construction terminates when no such $x_j$ exists.
Put $H = \{x_i\}$, and note that the sets $C_x$ with $x \in H$ are disjoint. Now fix any $y \in K$. By the construction of $H$ we have $C_x \cap C_y \ne \emptyset$ for some $x \in H$ with $p(x) \ge p(y)$, and so

$$ y \in B_{p(x)} - B_{p(y)} + x \subset B_{p(x)} - B_{p(x)} + x. $$

Hence, $K \subset \bigcup_{x\in H}(B_{p(x)} - B_{p(x)} + x)$, and so by Lemma 10.15 (i)

$$ |K| \le \sum_{x\in H}|B_{p(x)} - B_{p(x)}| \le \binom{2d}{d}\sum_{x\in H}|B_{p(x)}|. \qquad \Box $$

We may now establish a multivariate version of Lemma 10.11, stated for convenience in terms of random measures (see the detailed discussion below). For motivation, we note that the set function $\eta B = \int_B f(T_s\xi)\,ds$ in Theorem 10.14 is a stationary random measure on $\mathbb{R}^d$ and that the intensity $m$ of $\eta$, defined by the relation $E\eta = m\lambda^d$, is equal to $Ef(\xi)$.

Lemma 10.17 (maximum inequality) Let $\xi$ be a stationary random measure on $\mathbb{R}^d$ with intensity $m$, and let $B_1 \subset B_2 \subset \cdots$ be bounded, convex sets in $\mathcal{B}^d$ with $|B_1| > 0$. Then

$$ r\,P\{\sup{}_k(\xi B_k/|B_k|) > r\} \le m\binom{2d}{d}, \quad r > 0. $$

Proof: Fix any $r, a > 0$ and $n \in \mathbb{N}$, and define a process $\nu$ on $\mathbb{R}^d$ and a random set $K$ in $S_a = \{x \in \mathbb{R}^d;\, |x| \le a\}$ by

$$ \nu(x) = \inf\{k \in \mathbb{N};\, \xi(B_k + x) > r|B_k|\}, \quad x \in \mathbb{R}^d, \qquad K = \{x \in S_a;\, \nu(x) \le n\}. $$

By Lemma 10.16 there exists a finite, random subset $H \subset K$ such that the sets $B_{\nu(x)} + x$, $x \in H$, are disjoint and $|K| \le \binom{2d}{d}\sum_{x\in H}|B_{\nu(x)}|$. Writing $b = \sup\{|x|;\, x \in B_n\}$, we get

$$ \xi S_{a+b} \ge \sum_{x\in H}\xi(B_{\nu(x)} + x) > r\sum_{x\in H}|B_{\nu(x)}| \ge r|K|\Big/\binom{2d}{d}. $$

Taking expectations and using Fubini's theorem and the stationarity and measurability of $\nu$, we obtain

$$ m\binom{2d}{d}|S_{a+b}| \ge rE|K| = r\int_{S_a} P\{\nu(x) \le n\}\,dx = r|S_a|\,P\{\max{}_{k\le n}(\xi B_k/|B_k|) > r\}. $$

Now divide by $|S_a|$, and then let $a \to \infty$ and $n \to \infty$ in this order. $\Box$

We finally need an elementary Hilbert-space result. Recall that a contraction on a Hilbert space $H$ is defined as a linear operator $T$ such that $\|T\xi\| \le \|\xi\|$ for all $\xi \in H$. For any linear subspace $M \subset H$, we write $M^\perp$ for the orthogonal complement and $\bar M$ for the closure of $M$. The adjoint $T^*$ of an operator $T$ is characterized by the identity $(\xi, T\eta) = (T^*\xi, \eta)$, where $(\cdot,\cdot)$ denotes the inner product in $H$.

Lemma 10.18 (invariant subspace) For any family $\mathcal{T}$ of contractions on a Hilbert space $H$, let $N$ denote the $\mathcal{T}$-invariant subspace of $H$, and let $R$ be the linear subspace of $H$ spanned by the set $\{\xi - T\xi;\, \xi \in H,\, T \in \mathcal{T}\}$. Then $N^\perp \subset \bar R$.

Proof: If $\xi \perp R$, then

$$ (\xi - T^*\xi,\, \eta) = (\xi,\, \eta - T\eta) = 0, \quad T \in \mathcal{T},\ \eta \in H, $$

which implies $T^*\xi = \xi$ for every $T \in \mathcal{T}$. Hence, for any $T \in \mathcal{T}$ we have $(T\xi, \xi) = (\xi, T^*\xi) = \|\xi\|^2$, and so by the contraction property,

$$ 0 \le \|T\xi - \xi\|^2 = \|T\xi\|^2 + \|\xi\|^2 - 2(T\xi, \xi) \le 2\|\xi\|^2 - 2\|\xi\|^2 = 0, $$

which implies $T\xi = \xi$. This gives $R^\perp \subset N$, and so $N^\perp \subset (R^\perp)^\perp = \bar R$. $\Box$
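Lemma 10.18 is the key step behind the classical mean ergodic theorem: for a single contraction $T$, the averages $n^{-1}\sum_{k<n}T^k v$ converge to the orthogonal projection of $v$ onto the invariant subspace $N$, since on $R$ the averages telescope to 0. A small numerical sketch on $H = \mathbb{R}^3$, with an operator chosen arbitrarily for illustration: $T$ rotates the first two coordinates by an irrational angle and fixes the third, so $N$ is spanned by the third coordinate axis.

```python
import math

# Averages n^{-1} sum_{k<n} T^k v for a contraction T on R^3:
# a rotation by an irrational angle in the first two coordinates
# (no nonzero invariant vector there) and the identity in the third.
# The averages converge to the projection of v onto N = span{e3}.

theta = math.sqrt(2)     # rotation angle, irrational multiple of pi
c, s = math.cos(theta), math.sin(theta)

def T(v):
    x, y, z = v
    return (c * x - s * y, s * x + c * y, z)

def average(v, n):
    ax = ay = az = 0.0
    w = v
    for _ in range(n):
        ax += w[0]; ay += w[1]; az += w[2]
        w = T(w)
    return (ax / n, ay / n, az / n)

v = (1.0, 2.0, 3.0)
print(average(v, 100_000))   # approaches the projection (0, 0, 3)
```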
Proof of Theorem 10.14: First assume that $f \in L^1$, and define

$$ T_s f = f\circ T_s, \qquad A_n f = |B_n|^{-1}\int_{B_n} T_s f\,ds. $$

For any $\varepsilon > 0$, Lemma 10.18 yields a measurable decomposition

$$ f = \tilde f_\varepsilon + \sum_{k\le m}(g_k^\varepsilon - T_{s_k}g_k^\varepsilon) + h^\varepsilon, $$

where $\tilde f_\varepsilon \in L^2$ is $T_s$-invariant for all $s \in \mathbb{R}^d$, the functions $g_1^\varepsilon, \dots, g_m^\varepsilon$ are bounded, and $E|h^\varepsilon(\xi)| < \varepsilon$. Here clearly $A_n\tilde f_\varepsilon = \tilde f_\varepsilon$. Next, we see from Lemma 10.15 (ii) that, as $n \to \infty$ for fixed $k \le m$ and $\varepsilon > 0$,

$$ \|A_n(g_k^\varepsilon - T_{s_k}g_k^\varepsilon)\| \le \big(|(B_n + s_k)\,\triangle\, B_n|/|B_n|\big)\,\|g_k^\varepsilon\| \le 2\big((1 + |s_k|/r(B_n))^d - 1\big)\|g_k^\varepsilon\| \to 0. $$

Finally, Lemma 10.17 yields

$$ r\,P\{\sup{}_n A_n|h^\varepsilon(\xi)| > r\} \le \binom{2d}{d}E|h^\varepsilon(\xi)| \le \binom{2d}{d}\varepsilon, \quad r, \varepsilon > 0, $$

which implies $\sup_n A_n|h^\varepsilon(\xi)| \stackrel{P}{\to} 0$ as $\varepsilon \to 0$. In particular, it follows that $\liminf_n A_n f(\xi) < \infty$ a.s., which justifies the estimate

$$ (\limsup{}_n - \liminf{}_n)\,A_n f(\xi) \le (\limsup{}_n - \liminf{}_n)\,A_n h^\varepsilon(\xi) \le 2\sup{}_n A_n|h^\varepsilon(\xi)| \stackrel{P}{\to} 0. $$

This shows that the left-hand side vanishes a.s., and the required a.s. convergence follows.

When $f \in L^p$ for some $p \ge 1$, the asserted $L^p$-convergence follows as before from the uniform integrability of the powers $|A_n f(\xi)|^p$. We may now identify the limit, as in the proof of Corollary 10.9, and the a.s. convergence extends to arbitrary $f \ge 0$, as in the case of Theorem 10.6. $\Box$

We turn to a version of Theorem 10.14 for random measures on $\mathbb{R}^d$. Recall that a random measure $\xi$ on $\mathbb{R}^d$ is defined as a locally finite kernel from the basic probability space $(\Omega, \mathcal{A}, P)$ into $\mathbb{R}^d$. In other words, $\xi(\omega, B)$ is required to be a locally finite measure in $B \in \mathcal{B}^d$ for fixed $\omega \in \Omega$ and a random variable in $\omega \in \Omega$ for every bounded set $B \in \mathcal{B}^d$. Alternatively, we may regard $\xi$ as a random element in the space $\mathcal{M}(\mathbb{R}^d)$ of locally finite measures $\mu$ on $\mathbb{R}^d$, endowed with the $\sigma$-field generated by all evaluation maps $\mu \mapsto \mu B$ with $B \in \mathcal{B}^d$. We say that $\xi$ is stationary if $\theta_s\xi \stackrel{d}{=} \xi$ for every $s \in \mathbb{R}^d$, where the shift operators $\theta_s$ on $\mathcal{M}(\mathbb{R}^d)$ are defined by $(\theta_s\mu)B = \mu(B + s)$ for all $B \in \mathcal{B}^d$.
The invariant $\sigma$-field of $\xi$ is given by $\mathcal{I}_\xi = \xi^{-1}\mathcal{I}$, where $\mathcal{I}$ denotes the $\sigma$-field of all shift-invariant, measurable sets in $\mathcal{M}(\mathbb{R}^d)$. We may now define the sample intensity of $\xi$ as the extended-valued random variable $\bar\xi = E[\xi B\,|\,\mathcal{I}_\xi]/|B|$, where $B \in \mathcal{B}^d$ is arbitrary with $|B| \in (0, \infty)$. Note that this expression is independent of $B$, by the stationarity of $\xi$ and Theorem 2.6.
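For a concrete case, a homogeneous Poisson process on $\mathbb{R}$ with rate $\lambda$ is ergodic, so its sample intensity is a.s. the constant $\lambda$, and the averages $\xi[0,t]/t$ approach $\lambda$ for large $t$. A quick simulation sketch, with rate and horizon chosen arbitrarily for illustration:

```python
import random

# Sample intensity of a stationary random measure, illustrated by a
# homogeneous Poisson process on R with rate lam: simulate the points
# via i.i.d. exponential gaps and compare the count over [0, t] with
# lam * t.  (The Poisson process is ergodic, so xi-bar = lam a.s.)

random.seed(1)
lam, t = 2.0, 50_000.0

count, pos = 0, 0.0
while True:
    pos += random.expovariate(lam)   # next inter-point gap
    if pos > t:
        break
    count += 1

print(count / t)   # close to the intensity lam = 2.0
```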
Corollary 10.19 (sample intensity, Nguyen and Zessin) Let $\xi$ be a stationary random measure on $\mathbb{R}^d$, and fix some bounded, convex sets $B_1 \subset B_2 \subset \cdots$ in $\mathcal{B}^d$ with $r(B_n) \to \infty$. Then $\xi B_n/|B_n| \to \bar\xi$ a.s., where $E[\xi\,|\,\mathcal{I}_\xi] = \bar\xi\lambda^d$. The same convergence holds in $L^p$ for some $p \ge 1$ when $\xi[0,1]^d \in L^p$.

Proof: By Fubini's theorem, we have for any $A, B \in \mathcal{B}^d$

$$ \int_B (\theta_s\xi)A\,ds = \int_B ds\int 1_A(t - s)\,\xi(dt) = \int \xi(dt)\int_B 1_A(t - s)\,ds = \xi(1_A * 1_B). $$

Assuming $|A| = 1$ and $A \subset S_a = \{s;\, |s| \le a\}$, and putting $B^+ = B + S_a$ and $B^- = (B^c + S_a)^c$, we note that also $1_A * 1_{B^-} \le 1_B \le 1_A * 1_{B^+}$. Applying this to the sets $B = B_n$ gives

$$ \frac{|B_n^-|}{|B_n|}\cdot\frac{\xi(1_A * 1_{B_n^-})}{|B_n^-|} \le \frac{\xi B_n}{|B_n|} \le \frac{|B_n^+|}{|B_n|}\cdot\frac{\xi(1_A * 1_{B_n^+})}{|B_n^+|}. $$

Since $r(B_n) \to \infty$, Lemma 10.15 (ii) yields $|B_n^\pm|/|B_n| \to 1$. Next we may apply Theorem 10.14 to the function $f(\mu) = \mu A$ and the convex sets $B_n^\pm$ to obtain $\xi(1_A * 1_{B_n^\pm})/|B_n^\pm| \to E[\xi A\,|\,\mathcal{I}_\xi] = \bar\xi$ in the appropriate sense. $\Box$

The $L^p$-versions of Theorem 10.14 and Corollary 10.19 remain valid under weaker conditions than previously indicated. The following results are adequate for most purposes. Here we say that the distributions (probability measures) $\mu_n$ on $\mathbb{R}^d$ are asymptotically invariant if $\|\mu_n - \mu_n * \delta_s\| \to 0$ for every $s \in \mathbb{R}^d$, where $\|\cdot\|$ denotes the total variation norm. Similarly, the weight functions (probability densities) $f_n$ on $\mathbb{R}^d$ are said to be asymptotically invariant if $\lambda^d|f_n - \theta_s f_n| \to 0$ for every $s$. Note that the conclusion of Theorem 10.14 can be written as $\mu_n X \to \bar X$, where $\mu_n = (1_{B_n}\cdot\lambda^d)/|B_n|$, $X_s = f(T_s\xi)$, and $\bar X = E[f(\xi)\,|\,\mathcal{I}_\xi]$.

Corollary 10.20 (mean ergodic theorem)

(i) For any $p \ge 1$, consider on $\mathbb{R}^d$ a stationary, measurable, and $L^p$-valued process $X$ and some asymptotically invariant distributions $\mu_n$. Then $\mu_n X \to \bar X = E[X_0\,|\,\mathcal{I}_X]$ in $L^p$.

(ii) Consider on $\mathbb{R}^d$ a stationary random measure $\xi$ with finite intensity and some asymptotically invariant weight functions $f_n$. Then $\xi f_n \to \bar\xi$ in $L^1$, where $E[\xi\,|\,\mathcal{I}_\xi] = \bar\xi\lambda^d$.
Proof: (i) By Theorem 10.14 we may choose some distributions $\nu_m$ on $\mathbb{R}^d$ such that $\nu_m X \to \bar X$ in $L^p$. Using Minkowski's inequality and its extension in Corollary 1.30, along with the stationarity of $X$, the invariance of $\bar X$, and dominated convergence, we get as $n \to \infty$ and then $m \to \infty$

$$ \|\mu_n X - \bar X\|_p \le \|\mu_n X - (\mu_n * \nu_m)X\|_p + \|(\mu_n * \nu_m)X - \bar X\|_p $$
$$ \le \|\mu_n - \mu_n * \nu_m\|\,\|X_0\|_p + \int \|(\delta_s * \nu_m)X - \bar X\|_p\,\mu_n(ds) $$
$$ \le \|X_0\|_p \int \|\mu_n - \mu_n * \delta_t\|\,\nu_m(dt) + \|\nu_m X - \bar X\|_p \to 0. $$

(ii) By Corollary 10.19 we may choose some weight functions $g_m$ such that $\xi g_m \to \bar\xi$ in $L^1$. Using Minkowski's inequality, the stationarity of $\xi$, the invariance of $\bar\xi$, and dominated convergence, we get as $n \to \infty$ and then $m \to \infty$

$$ E|\xi f_n - \bar\xi| \le E|\xi f_n - \xi(f_n * g_m)| + E|\xi(f_n * g_m) - \bar\xi| $$
$$ \le E\,\xi|f_n - f_n * g_m| + \int E|\xi(\theta_s g_m) - \bar\xi|\, f_n(s)\,ds $$
$$ \le E\bar\xi \int \lambda^d|f_n - \theta_t f_n|\, g_m(t)\,dt + E|\xi g_m - \bar\xi| \to 0. \qquad \Box $$

Additional conditions may be needed to ensure $L^p$-convergence in case (ii) when $\xi B \in L^p$ for bounded sets $B$. It is certainly enough to require $f_n \le c\,g_n$ for some weight functions $g_n$ with $\xi g_n \to \bar\xi$ in $L^p$ and some constant $c > 0$.

Our next aim is to prove a subadditive version of Theorem 10.6. For motivation and subsequent needs, we begin with a simple result for nonrandom sequences. Recall that a sequence $c_1, c_2, \dots \in \mathbb{R}$ is said to be subadditive if $c_{m+n} \le c_m + c_n$ for all $m, n \in \mathbb{N}$.

Lemma 10.21 (subadditivity) For any subadditive sequence $c_1, c_2, \dots \in \mathbb{R}$, we have

$$ \lim_{n\to\infty}\frac{c_n}{n} = \inf_n \frac{c_n}{n} \in [-\infty, \infty). $$

Proof: Iterating the subadditivity relation, we get for any $k, n \in \mathbb{N}$

$$ c_n \le [n/k]\,c_k + c_{n - k[n/k]} \le [n/k]\,c_k + c_0 \vee \cdots \vee c_{k-1}, $$

where $c_0 = 0$. Noting that $[n/k] \sim n/k$ as $n \to \infty$, we get $\limsup_n(c_n/n) \le c_k/k$ for all $k$, and so

$$ \inf_n\frac{c_n}{n} \le \liminf_{n\to\infty}\frac{c_n}{n} \le \limsup_{n\to\infty}\frac{c_n}{n} \le \inf_n\frac{c_n}{n}. \qquad \Box $$

We turn to the more general case of a two-dimensional array $c_{jk}$, $0 \le j < k$, which is said to be subadditive if $c_{0,n} \le c_{0,m} + c_{m,n}$ for all $m < n$. The present notion reduces to the previous one when $c_{jk} = c_{k-j}$ for some sequence $c_k$. We also note that subadditivity holds automatically for arrays of the form $c_{jk} = a_{j+1} + \cdots + a_k$.
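Lemma 10.21 is easy to check numerically. A sketch for the concrete subadditive sequence $c_n = \sqrt n$, subadditive since $\sqrt{m+n} \le \sqrt m + \sqrt n$, for which $\inf_n c_n/n = 0$:

```python
import math

# Fekete-type limit (Lemma 10.21): for a subadditive sequence,
# c_n / n converges to inf_n c_n / n.  Here c_n = sqrt(n), so the
# infimum (and hence the limit) is 0.

c = lambda n: math.sqrt(n)

# verify subadditivity c(m+n) <= c(m) + c(n) on a range of indices
for m in range(1, 50):
    for n in range(1, 50):
        assert c(m + n) <= c(m) + c(n) + 1e-12

print(c(10**8) / 10**8)   # c_n / n, already near the infimum 0
```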
We shall now extend the ergodic theorem to subadditive arrays of random variables $\xi_{jk}$, $0 \le j < k$. For motivation, we recall from Theorem 10.6 that if $\xi_{jk} = \eta_{j+1} + \cdots + \eta_k$ for some stationary and integrable sequence of random variables $\eta_k$, then $\xi_{0,n}/n$ converges a.s. and in $L^1$. A similar result holds for general subadditive arrays $(\xi_{jk})$ that are stationary under simultaneous shifts in the two indices, so that $(\xi_{j+1,k+1}) \stackrel{d}{=} (\xi_{j,k})$. To allow for a wider range of applications, we introduce the slightly weaker assumptions

$$ (\xi_{k,2k},\, \xi_{2k,3k},\, \dots) \stackrel{d}{=} (\xi_{0,k},\, \xi_{k,2k},\, \dots), \quad k \in \mathbb{N}, \qquad (5) $$
$$ (\xi_{k,k+1},\, \xi_{k,k+2},\, \dots) \stackrel{d}{=} (\xi_{0,1},\, \xi_{0,2},\, \dots), \quad k \in \mathbb{N}. \qquad (6) $$

For convenience of reference, we also restate the subadditivity requirement:

$$ \xi_{0,n} \le \xi_{0,m} + \xi_{m,n}, \quad 0 < m < n. \qquad (7) $$

Theorem 10.22 (subadditive ergodic theorem, Kingman) Let $(\xi_{jk})$ be a subadditive array of random variables satisfying (5) and (6), and assume that $E\xi_{0,1}^+ < \infty$. Then $\xi_{0,n}/n$ converges a.s. toward a random variable $\bar\xi$ in $[-\infty, \infty)$ with $E\bar\xi = \inf_n(E\xi_{0,n}/n) = c$. The same convergence holds in $L^1$ when $c > -\infty$. If the sequences in (5) are ergodic, then $\bar\xi$ is a.s. a constant.

Proof (Liggett): Put $\xi_{0,n} = \xi_n$ for convenience. By (6) and (7) we have $E\xi_n^+ \le nE\xi_1^+ < \infty$. We first assume $c > -\infty$, so that the variables $\xi_{m,n}$ are integrable. Iterating (7) gives

$$ \xi_n/n \le \sum_{j=1}^{[n/k]} \xi_{(j-1)k,\,jk}/n + \sum_{j=k[n/k]+1}^{n} \xi_{j-1,\,j}/n, \quad n, k \in \mathbb{N}. \qquad (8) $$

By (5) the sequence $\xi_{(j-1)k,jk}$, $j \in \mathbb{N}$, is stationary for fixed $k$, and so by Theorem 10.6 we have $n^{-1}\sum_{j\le n}\xi_{(j-1)k,jk} \to \tilde\xi_k$ a.s. and in $L^1$, where $E\tilde\xi_k = E\xi_k$. Hence, the first term in (8) tends a.s. and in $L^1$ toward $\tilde\xi_k/k$. Similarly, $n^{-1}\sum_{j\le n}\xi_{j-1,j} \to \tilde\xi_1$ a.s. and in $L^1$, and so the second term in (8) tends in the same sense to 0. Thus, the right-hand side converges a.s. and in $L^1$ toward $\tilde\xi_k/k$, and since $k$ is arbitrary, we get

$$ \limsup{}_n(\xi_n/n) \le \inf{}_n(\tilde\xi_n/n) \equiv \bar\xi < \infty \quad \text{a.s.} \qquad (9) $$

The variables $\xi_n^+/n$ are uniformly integrable by Proposition 4.12, and moreover

$$ E\limsup{}_n(\xi_n/n) \le E\bar\xi \le \inf{}_n(E\tilde\xi_n/n) = \inf{}_n(E\xi_n/n) = c. \qquad (10) $$

To derive a lower bound, let $\kappa_n \perp\!\!\!\perp (\xi_{jk})$ be uniformly distributed over $\{1, \dots, n\}$ for each $n$, and define

$$ \zeta_k^n = \xi_{\kappa_n,\,\kappa_n+k}, \qquad \eta_k^n = \xi_{\kappa_n+k} - \xi_{\kappa_n+k-1}, \quad k \in \mathbb{N}. $$

By (6) we have

$$ (\zeta_1^n,\, \zeta_2^n,\, \dots) \stackrel{d}{=} (\xi_1,\, \xi_2,\, \dots), \quad n \in \mathbb{N}. \qquad (11) $$

Moreover, $\eta_k^n \le \xi_{\kappa_n+k-1,\,\kappa_n+k} \stackrel{d}{=} \xi_1$ by (6) and (7), and so the variables $(\eta_k^n)^+$ are uniformly integrable. On the other hand, the sequence $E\xi_1, E\xi_2, \dots$ is subadditive, and so by Lemma 10.21 we have as $n \to \infty$

$$ E\eta_k^n = n^{-1}(E\xi_{n+k} - E\xi_k) \to \inf{}_n(E\xi_n/n) = c, \quad k \in \mathbb{N}. \qquad (12) $$

In particular, $\sup_n E|\eta_k^n| < \infty$, which shows that the sequence $\eta_k^1, \eta_k^2, \dots$ is tight for each $k$. Hence, by Theorems 4.29, 5.19, and 6.14, there exist some random variables $\zeta_k$ and $\eta_k$ such that

$$ (\zeta_1^n,\, \zeta_2^n,\, \dots;\ \eta_1^n,\, \eta_2^n,\, \dots) \stackrel{d}{\to} (\zeta_1,\, \zeta_2,\, \dots;\ \eta_1,\, \eta_2,\, \dots) \qquad (13) $$

along a subsequence. Here $(\zeta_k) \stackrel{d}{=} (\xi_k)$ by (11), and so by Theorem 6.10 we may assume that $\zeta_k = \xi_k$ for each $k$. The sequence $\eta_1, \eta_2, \dots$ is clearly stationary, and by Lemma 4.11 it is also integrable. From (7) we get

$$ \eta_1^n + \cdots + \eta_k^n = \xi_{\kappa_n+k} - \xi_{\kappa_n} \le \xi_{\kappa_n,\,\kappa_n+k} = \zeta_k^n, $$

and so in the limit $\eta_1 + \cdots + \eta_k \le \xi_k$ a.s. Hence, Theorem 10.6 yields

$$ \xi_n/n \ge n^{-1}\sum_{k\le n}\eta_k \to \tilde\eta \quad \text{a.s. and in } L^1 $$

for some $\tilde\eta \in L^1$. In particular, the variables $\xi_n^-/n$ are uniformly integrable, and so the same thing is true for $\xi_n/n$. Using Lemma 4.11 and the uniform integrability of the variables $(\eta_k^n)^+$ together with (10) and (12), we get

$$ c = \limsup{}_n E\eta_1^n \le E\eta_1 = E\tilde\eta \le E\liminf{}_n(\xi_n/n) \le E\limsup{}_n(\xi_n/n) \le E\bar\xi \le c. $$

Thus, $\xi_n/n$ converges a.s., and by (9) the limit equals $\bar\xi$. Furthermore, by Lemma 4.11 the convergence holds even in $L^1$, and $E\bar\xi = c$. If the sequences in (5) are ergodic, then $\tilde\xi_n = E\xi_n$ a.s. for each $n$, and we get $\bar\xi = c$ a.s.

Now assume instead that $c = -\infty$. Then for each $r \in \mathbb{Z}$, the truncated array $\xi_{m,n} \vee r(n - m)$, $0 \le m < n$, satisfies the hypotheses of the theorem with $c$ replaced by $c_r = \inf_n(E\xi_n^r/n) \ge r$, where $\xi_n^r = \xi_n \vee rn$. Thus, $\xi_n^r/n = (\xi_n/n) \vee r$ converges a.s. toward some random variable $\bar\xi_r$ with mean $c_r$, and so $\xi_n/n \to \inf_r \bar\xi_r \equiv \bar\xi$. Finally, $E\bar\xi \le \inf_r c_r = c = -\infty$ by monotone convergence. $\Box$

As an application of the last theorem, we may derive a celebrated ergodic theorem for products of random matrices.
Theorem 10.23 (random matrices, Furstenberg and Kesten) Consider a stationary sequence of random $d \times d$ matrices $X^k = (X^k_{ij})$ such that $X^1_{ij} > 0$ a.s. and $E|\log X^1_{ij}| < \infty$ for all $i$ and $j$. Then $n^{-1}\log(X^1\cdots X^n)_{ij}$ converges a.s. and in $L^1$ as $n \to \infty$, and the limit is independent of $(i, j)$.

Proof: First let $i = j = 1$, and define

$$ \xi_{m,n} = \log(X^{m+1}\cdots X^n)_{11}, \quad 0 \le m < n. $$

The array $(-\xi_{m,n})$ is clearly subadditive and jointly stationary, and we have $E|\xi_{0,1}| < \infty$ by hypothesis. Further note that

$$ (X^1\cdots X^n)_{11} \le d^{\,n-1}\prod_{k\le n}\max_{i,j}X^k_{ij}. $$

Hence,

$$ \xi_{0,n} - (n-1)\log d \le \sum_{k\le n}\log\max_{i,j}X^k_{ij} \le \sum_{k\le n}\sum_{i,j}|\log X^k_{ij}|, $$

and so

$$ n^{-1}E\xi_{0,n} \le \log d + \sum_{i,j}E|\log X^1_{ij}| < \infty. $$

Thus, by Theorem 10.22 and its proof, there exists an invariant random variable $\bar\xi$ such that $\xi_{0,n}/n \to \bar\xi$ a.s. and in $L^1$.

To extend the convergence to arbitrary $i, j \in \{1, \dots, d\}$, we write for any $n \in \mathbb{N}$

$$ X^2_{i1}(X^3\cdots X^n)_{11}X^{n+1}_{1j} \le (X^2\cdots X^{n+1})_{ij} \le (X^1_{1i}X^{n+2}_{j1})^{-1}(X^1\cdots X^{n+2})_{11}. $$

Noting that $n^{-1}\log X^n_{ij} \to 0$ a.s. and in $L^1$ by Theorem 10.6, and using the stationarity of $(X^n)$ and the invariance of $\bar\xi$, we obtain $n^{-1}\log(X^2\cdots X^{n+1})_{ij} \to \bar\xi$ a.s. and in $L^1$. The desired convergence now follows by stationarity. $\Box$

We turn to the decomposition of an invariant distribution into ergodic components. For motivation, consider the setting of Theorem 10.6 or 10.14, and assume that $S$ is Borel, to ensure the existence of regular conditional distributions. Writing $\eta = P[\xi \in \cdot\,|\,\mathcal{I}_\xi]$, we get

$$ \mathcal{L}(\xi) = EP[\xi \in \cdot\,|\,\mathcal{I}_\xi] = E\eta = \int m\,P\{\eta \in dm\}. \qquad (14) $$

Furthermore,

$$ \eta I = P[\xi \in I\,|\,\mathcal{I}_\xi] = 1\{\xi \in I\} \quad \text{a.s.,} \quad I \in \mathcal{I}, $$

and so $\eta I = 0$ or $1$ a.s. for all $I \in \mathcal{I}$. If we can choose the exceptional null set to be independent of $I$, it follows that $\eta$ is a.s. ergodic, and (14) gives the desired ergodic decomposition of $\mu = \mathcal{L}(\xi)$. Though the suggested result is indeed true, the proof requires a different approach.
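Returning to Theorem 10.23, the common limit, known as the top Lyapunov exponent, is easy to estimate by simulation. The sketch below multiplies i.i.d. copies of two fixed positive $2\times 2$ matrices, an arbitrary choice made purely for illustration, with rescaling to avoid overflow, and checks that all four entries of the product give nearly the same exponent.

```python
import math, random

# Furstenberg-Kesten: n^{-1} log (X^1 ... X^n)_{ij} converges, with a
# limit independent of (i, j).  We multiply n i.i.d. positive matrices,
# rescaling by the largest entry at each step and accumulating its log.

random.seed(7)
A = ((1.0, 2.0), (0.5, 1.5))
B = ((0.3, 1.0), (1.0, 2.0))

def matmul(M, N):
    return tuple(tuple(sum(M[i][k] * N[k][j] for k in range(2))
                       for j in range(2)) for i in range(2))

n = 5000
P, logscale = ((1.0, 0.0), (0.0, 1.0)), 0.0
for _ in range(n):
    P = matmul(P, random.choice([A, B]))
    m = max(max(row) for row in P)        # rescale to avoid overflow
    P = tuple(tuple(x / m for x in row) for row in P)
    logscale += math.log(m)

limits = [(logscale + math.log(P[i][j])) / n for i in range(2) for j in range(2)]
print(limits)   # all four entries give nearly the same exponent
```

That the four estimates agree up to $O(1/n)$ reflects the sandwich inequalities used in the proof above: for positive matrices, the logarithms of different entries of the product differ by a bounded amount.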
Proposition 10.24 (ergodicity by conditioning, Farrell, Varadarajan) Let $\xi$ be a random element with distribution $\mu$ in a Borel space $S$, and let $\mathcal{T} = (T_s;\ s \in \mathbb{R}^d)$ be a measurable group of $\mu$-preserving maps on $S$ with invariant $\sigma$-field $\mathcal{I}$. Then $\eta = P[\xi \in \cdot\,|\,\mathcal{I}_\xi]$ is a.s. invariant and ergodic under $\mathcal{T}$.

For the proof, we fix an increasing sequence of convex sets $B_n \in \mathcal{B}^d$ with $r(B_n) \to \infty$ and introduce on $S$ the probability kernels

$$ \mu_n(x, A) = |B_n|^{-1}\int_{B_n} 1_A(T_s x)\,ds, \quad x \in S,\ A \in \mathcal{S}, $$

and the associated empirical distributions $\eta_n = \mu_n(\xi, \cdot)$. By Theorem 10.14 we note that $\eta_n f \to \eta f$ a.s. for every bounded, measurable function $f$ on $S$, where $\eta = P[\xi \in \cdot\,|\,\mathcal{I}_\xi]$.

We say that a class $\mathcal{C} \subset \mathcal{S}$ is measure-determining if every probability measure on $S$ is uniquely determined by its values on $\mathcal{C}$.

Lemma 10.25 (degenerate limit) Let $A_1, A_2, \dots \in \mathcal{S}$ be measure-determining and such that $\eta_n A_k \to P\{\xi \in A_k\}$ a.s. for each $k$. Then $\xi$ is ergodic.

Proof: By Theorem 10.14 we have $\eta_n A \to \eta A = P[\xi \in A\,|\,\mathcal{I}_\xi]$ a.s. for every $A \in \mathcal{S}$, and so by comparison $\eta A_k = P\{\xi \in A_k\}$ a.s. for all $k$. Since the $A_k$ are measure-determining, it follows that $\eta = \mathcal{L}(\xi)$ a.s. Hence, for any $I \in \mathcal{I}$ we have a.s.

$$ P\{\xi \in I\} = \eta I = P[\xi \in I\,|\,\mathcal{I}_\xi] = 1_I(\xi) \in \{0, 1\}, $$

which implies $P\{\xi \in I\} = 0$ or $1$. $\Box$

Proof of Proposition 10.24: By the stationarity of $\xi$, we have for any $A \in \mathcal{S}$ and $s \in \mathbb{R}^d$

$$ \eta\circ T_s^{-1}A = P[T_s\xi \in A\,|\,\mathcal{I}_\xi] = P[\xi \in A\,|\,\mathcal{I}_\xi] = \eta A \quad \text{a.s.} $$

Since $S$ is Borel, we obtain $\eta\circ T_s^{-1} = \eta$ a.s. for every $s$. Now put $C = [0,1]^d$, and define $\tilde\eta = \int_C (\eta\circ T_s^{-1})\,ds$. Since $\eta$ is a.s. invariant under shifts in $\mathbb{Z}^d$, the variable $\tilde\eta$ is a.s. invariant under arbitrary shifts. Furthermore, by Fubini's theorem,

$$ \lambda^d\{s \in [0,1]^d;\ \eta\circ T_s^{-1} = \eta\} = 1 \quad \text{a.s.,} $$

and therefore $\tilde\eta = \eta$ a.s. This shows that $\eta$ is a.s. $\mathcal{T}$-invariant.

Let us now choose a measure-determining sequence $A_1, A_2, \dots \in \mathcal{S}$, which is possible since $S$ is Borel. Noting that $\eta_n A_k \to \eta A_k$ a.s. for every $k$ by Theorem 10.14, we get by Theorem 6.4

$$ \eta\,\bigcap_k\{x \in S;\ \mu_n(x, A_k) \to \eta A_k\} = P\Big[\bigcap_k\{\eta_n A_k \to \eta A_k\}\,\Big|\,\mathcal{I}_\xi\Big] = 1 \quad \text{a.s.} $$

Since $\eta$ is a.s. a $\mathcal{T}$-invariant probability measure on $S$, Lemma 10.25 applies for every $\omega \in \Omega$ outside a $P$-null set, and we conclude that $\eta$ is a.s. ergodic. $\Box$
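The role of the conditional law $\eta$ can be seen in a toy example, a stationary but non-ergodic sequence obtained by mixing two ergodic rotation components, with the specific components an arbitrary choice for illustration. The Birkhoff averages converge to the $\mathcal{I}_\xi$-measurable variable $c$, not to the overall mean $1/2$, matching the decomposition (14).

```python
import math, random

# A stationary, non-ergodic example: flip a fair coin once to choose
# c in {0, 1}, then observe  c + cos(2 pi (x0 + k alpha))  along an
# ergodic irrational rotation.  The time averages converge to the
# invariant variable c (one ergodic component), not to the mean 1/2.

random.seed(3)
alpha = math.sqrt(2) - 1

def limit_of_averages(n=100_000):
    c = random.randint(0, 1)      # one coin flip selects the component
    x, s = random.random(), 0.0
    for _ in range(n):
        s += c + math.cos(2 * math.pi * x)
        x = (x + alpha) % 1.0
    return c, s / n

c, avg = limit_of_averages()
print(c, avg)   # avg is near c, which is 0 or 1, not near 1/2
```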
We have seen that (14) gives a representation of the distribution $\mu = \mathcal{L}(\xi)$ as a mixture of invariant and ergodic probability measures. The next result shows that this decomposition is unique and characterizes the ergodic measures as extreme points in the convex set of invariant measures. To explain the terminology, recall that a subset $M$ of a linear space is said to be convex if $cm_1 + (1-c)m_2 \in M$ for all $m_1, m_2 \in M$ and $c \in (0,1)$. In that case, we say that $m \in M$ is extreme if for any $m_1, m_2$, and $c$ as above, the relation $m = cm_1 + (1-c)m_2$ implies $m_1 = m_2 = m$. With any set of measures $\mu$ on a measurable space $(S, \mathcal{S})$, we associate the $\sigma$-field generated by all evaluation maps $\pi_B: \mu \mapsto \mu B$, $B \in \mathcal{S}$.

Theorem 10.26 (ergodic decomposition, Krylov and Bogolioubov) Let $\mathcal{T} = (T_s;\ s \in \mathbb{R}^d)$ be a measurable group of transformations on some Borel space $S$. Then the $\mathcal{T}$-invariant distributions on $S$ form a convex set $M$, whose extreme points agree with the ergodic measures in $M$. Moreover, any measure $\mu \in M$ has a unique representation $\mu = \int m\,\nu(dm)$ with $\nu$ restricted to the set of ergodic measures in $M$.

Proof: The set $M$ is clearly convex, and by Proposition 10.24 we have for every $\mu \in M$ a representation $\mu = \int m\,\nu(dm)$, where $\nu$ is a probability measure on the set of ergodic measures in $M$. To see that $\nu$ is unique, we introduce a regular conditional distribution $\eta = \mu[\,\cdot\,|\,\mathcal{I}]$ a.s. $\mu$ on $S$, and note that $\mu_n A \to \eta A$ a.s. $\mu$ for all $A \in \mathcal{S}$ by Theorem 10.14. Thus, for any $A_1, A_2, \dots \in \mathcal{S}$, we have

$$ m\,\bigcap_k\{x \in S;\ \mu_n(x, A_k) \to \eta(x, A_k)\} = 1 \quad \text{a.e. } \nu. $$

The same relation holds with $\eta(x, A_k)$ replaced by $mA_k$, since $\nu$ is restricted to the class of ergodic measures in $M$. Assuming the sets $A_k$ to be measure-determining, we conclude that $m\{x;\ \eta(x, \cdot) = m\} = 1$ a.e. $\nu$. Hence, for any measurable set $A \subset M$,

$$ \mu\{\eta \in A\} = \int m\{\eta \in A\}\,\nu(dm) = \int 1_A(m)\,\nu(dm) = \nu A, $$

which shows that $\nu = \mu\circ\eta^{-1}$.
To prove the equivalence of ergodicity and extremality, fix any measure $\mu \in M$ with ergodic decomposition $\int m\,\nu(dm)$. Let us first assume that $\mu$ is extreme. If it is not ergodic, then $\nu$ is nondegenerate, and we have $\nu = c\nu_1 + (1-c)\nu_2$ for some $\nu_1 \perp \nu_2$ and $c \in (0,1)$. Since $\mu$ is extreme, we obtain $\int m\,\nu_1(dm) = \int m\,\nu_2(dm)$, and so $\nu_1 = \nu_2$ by the uniqueness of the decomposition. The contradiction shows that $\mu$ is ergodic.

Next assume $\mu$ to be ergodic, so that $\nu = \delta_\mu$, and let $\mu = c\mu_1 + (1-c)\mu_2$ with $\mu_1, \mu_2 \in M$ and $c \in (0,1)$. If $\mu_i = \int m\,\nu_i(dm)$ for $i = 1, 2$, then $\delta_\mu = c\nu_1 + (1-c)\nu_2$ by the uniqueness of the decomposition. Hence, $\nu_1 = \nu_2 = \delta_\mu$, and so $\mu_1 = \mu_2$, which shows that $\mu$ is extreme. $\Box$

We conclude the chapter with some powerful coupling results that will be needed for our discussion of Palm distributions in Chapter 11 and also for the ergodic theory of Markov processes developed in Chapter 20. First we consider pairs of measurable processes on $\mathbb{R}_+$ with values in an arbitrary measurable space $S$. In the associated path space, we introduce the invariant $\sigma$-field $\mathcal{I}$ and the tail $\sigma$-field $\mathcal{T} = \bigcap_t \mathcal{T}_t$, where $\mathcal{T}_t = \sigma(\theta_t)$, and we note that $\mathcal{I} \subset \mathcal{T}$. For any signed measure $\nu$, let $\|\nu\|_{\mathcal{A}}$ denote the total variation of $\nu$ on the $\sigma$-field $\mathcal{A}$.

Theorem 10.27 (coupling on $\mathbb{R}_+$, Goldstein, Berbee, Aldous and Thorisson) For any $S$-valued, measurable processes $X$ and $Y$ on $\mathbb{R}_+$, we have

(i) $X \stackrel{d}{=} Y$ on $\mathcal{T}$ iff $(\sigma, \theta_\sigma X) \stackrel{d}{=} (\tau, \theta_\tau Y)$ for some random times $\sigma, \tau \ge 0$, and also iff $\|\mathcal{L}(\theta_t X) - \mathcal{L}(\theta_t Y)\| \to 0$ as $t \to \infty$;

(ii) $X \stackrel{d}{=} Y$ on $\mathcal{I}$ iff $\theta_\sigma X \stackrel{d}{=} \theta_\tau Y$ for some random times $\sigma, \tau \ge 0$, and also iff $\big\|\int_0^1 (\mathcal{L}(\theta_{st}X) - \mathcal{L}(\theta_{st}Y))\,ds\big\| \to 0$ as $t \to \infty$.

If the path space is Borel, we can strengthen the distributional couplings in (i) and (ii) to the a.s. versions $\theta_\sigma X = \theta_\tau\tilde Y$, respectively, for some $\tilde Y \stackrel{d}{=} Y$.

Proof of (i): Let $\mu_1$ and $\mu_2$ be the distributions of $X$ and $Y$, and assume that $\mu_1 = \mu_2$ on $\mathcal{T}$. Write $U = S^{\mathbb{R}_+}$, and define a mapping $p$ on $\mathbb{R}_+ \times U$ by $p(s, x) = (s, \theta_s x)$. Let $\mathcal{C}$ denote the class of all pairs $(\nu_1, \nu_2)$ of measures on $\mathbb{R}_+ \times U$ such that

$$ \nu_1\circ p^{-1} = \nu_2\circ p^{-1}, \qquad \hat\nu_1 \le \mu_1, \quad \hat\nu_2 \le \mu_2, \qquad (15) $$

where $\hat\nu_i = \nu_i(\mathbb{R}_+ \times \cdot)$, and regard $\mathcal{C}$ as partially ordered under componentwise inequality. By Corollary 1.16 we note that every linearly ordered subset has an upper bound in $\mathcal{C}$. Hence, Zorn's lemma ensures the existence of a maximal element $(\nu_1, \nu_2)$. To see that $\hat\nu_1 = \mu_1$ and $\hat\nu_2 = \mu_2$, we define $\mu_i' = \mu_i - \hat\nu_i$, and conclude from the equality in (15) that

$$ \|\mu_1' - \mu_2'\|_{\mathcal{T}_n} = \|\hat\nu_1 - \hat\nu_2\|_{\mathcal{T}_n} \le 2\nu_1((n, \infty)\times U) \to 0, \qquad (16) $$

which implies $\mu_1' = \mu_2'$ on $\mathcal{T}$. Next, by Corollary 2.13, there exist some measures $\mu_i^n \le \mu_i'$ satisfying

$$ \mu_1^n = \mu_2^n = \mu_1' \wedge \mu_2' \quad \text{on } \mathcal{T}_n, \quad n \in \mathbb{N}. $$

Writing $\nu_i^n = \delta_n \otimes \mu_i^n$, we get $\hat\nu_i^n \le \mu_i'$ and $\nu_1^n\circ p^{-1} = \nu_2^n\circ p^{-1}$, and so $(\nu_1 + \nu_1^n,\, \nu_2 + \nu_2^n) \in \mathcal{C}$. Since $(\nu_1, \nu_2)$ is maximal, we obtain $\mu_1^n = \mu_2^n = 0$, and so by Corollary 2.9 we have $\mu_1' \perp \mu_2'$ on $\mathcal{T}_n$ for all $n$. In other words, $\mu_1' A_n = \mu_2' A_n^c = 0$ for some sets $A_n \in \mathcal{T}_n$. But then also $\mu_1' A = \mu_2' A^c = 0$, where $A = \limsup_n A_n \in \mathcal{T}$. Since the $\mu_i'$ agree on $\mathcal{T}$, we obtain $\mu_1' = \mu_2' = 0$, which means that $\hat\nu_i = \mu_i$. Hence, by Theorem 6.10 there exist some random variables $\sigma, \tau \ge 0$ such that the pairs $(\sigma, X)$ and $(\tau, Y)$ have
distributions ν₁ and ν₂, and the desired coupling follows from the equality in (15).

The remaining claims are easy. Thus, the relation (σ, θ_σ X) =d (τ, θ_τ Y) implies ‖μ₁ − μ₂‖_{T_n} → 0 as in (16), and the latter condition yields μ₁ = μ₂ on T. When the path space is Borel, the asserted a.s. coupling follows from the distributional version by Theorem 6.10. □

To avoid repetition, we postpone the proof of part (ii) until after the proof of the next theorem, where we consider a closely related result involving groups G of transformations on an arbitrary measurable space (S, S).

Theorem 10.28 (group coupling, Thorisson) Let the lcscH group G act measurably on a space S, and let ξ and η be random elements in S such that ξ =d η on the G-invariant σ-field I. Then γξ =d η for some random element γ in G.

Proof: Let μ₁ and μ₂ be the distributions of ξ and η. Define p: G × S → S by p(g, s) = gs, and let C denote the class of pairs (ν₁, ν₂) of measures on G × S satisfying (15) with ν̃_i = ν_i(G × ·). Using Zorn's lemma as before, we see that C has a maximal element (ν₁, ν₂), and we claim that ν̃_i = μ_i for i = 1, 2. To see this, let λ be a right-invariant Haar measure on G, which exists by Theorem 2.27. Since λ is σ-finite, we may choose a probability measure λ̂ ∼ λ and define

  μ̂_i = μ_i − ν̃_i,   χ_i = λ̂ ⊗ μ̂_i,  i = 1, 2.

By Corollary 2.13 there exist some measures ν_i′ ≤ χ_i satisfying

  ν₁′ ∘ p⁻¹ = ν₂′ ∘ p⁻¹ = χ₁ ∘ p⁻¹ ∧ χ₂ ∘ p⁻¹.

Then ν̃_i′ ≤ μ̂_i for i = 1, 2, and so (ν₁ + ν₁′, ν₂ + ν₂′) ∈ C. Since (ν₁, ν₂) is maximal, we have ν₁′ = ν₂′ = 0, and so χ₁ ∘ p⁻¹ ⊥ χ₂ ∘ p⁻¹ by Corollary 2.9. In other words, there exists a set A₁ = A₂ᶜ in S such that χ_i ∘ p⁻¹ A_i = 0 for i = 1, 2. Since λ̂ ∼ λ, Fubini's theorem gives

  ∫_S μ̂_i(ds) ∫_G 1_{A_i}(gs) λ̂(dg) = (λ̂ ⊗ μ̂_i) ∘ p⁻¹ A_i = 0.   (17)

By the right invariance of λ and the equivalence λ̂ ∼ λ, the set where the inner integral on the left vanishes is G-invariant and therefore I-measurable in s ∈ S.
Since also μ̂₁ = μ̂₂ on I by (15), equation (17) remains true with A_i replaced by A_{3−i}. Adding the two formulas gives λ̂ ⊗ μ̂_i = 0, and so μ̂_i = 0. Thus, ν̃_i = μ_i for i = 1, 2.

Since G is Borel, there exist by Theorem 6.10 some random elements σ and τ in G such that (σ, ξ) and (τ, η) have distributions ν₁ and ν₂. By (15) we get σξ =d τη, and so the same theorem yields a random element τ̃ in G such that (τ̃, σξ) =d (τ, τη). But then τ̃⁻¹σξ =d τ⁻¹τη = η, which proves the desired relation with γ = τ̃⁻¹σ. □
Proof of Theorem 10.27 (ii): In the last proof, we replace S by the path space U = S^{ℝ₊}, G by the semigroup of shifts θ_t, t ≥ 0, and λ by Lebesgue measure on ℝ₊. Assuming X =d Y on I, we may proceed as before up to equation (17), which now takes the form

  ∫ μ̂_i(dx) ∫₀^∞ 1_{A_i}(θ_t x) dt = (λ ⊗ μ̂_i) ∘ p⁻¹ A_i = 0.   (18)

Writing f_i(x) for the inner integral on the left, we note that for any h ≥ 0

  f_i(θ_h x) = ∫_h^∞ 1_{A_i}(θ_t x) dt = ∫₀^∞ 1_{θ_h⁻¹ A_i}(θ_t x) dt.   (19)

Hence, (18) remains true with A_i replaced by θ_h⁻¹ A_i, and then also for the θ₁-invariant sets

  Ā₁ = limsup_n θ_n⁻¹ A₁,   Ā₂ = liminf_n θ_n⁻¹ A₂,

where n → ∞ along ℕ. Since Ā₁ = Ā₂ᶜ, we may henceforth assume the A_i in (18) to be θ₁-invariant. Then so are the functions f_i, in view of (19). By the monotonicity of f_i ∘ θ_h, the f_i are then θ_h-invariant for all h ≥ 0 and therefore I-measurable. From this point on, we may argue as before to show that θ_σ X =d θ_τ Y for some random variables σ, τ ≥ 0. The remaining assertions are again routine. □

Exercises

1. State and prove continuous-time, two-sided, and higher-dimensional versions of Lemma 10.1.

2. Consider a stationary random sequence ξ = (ξ₁, ξ₂, ...). Show that the ξ_n are i.i.d. iff ξ₁ ⊥⊥ (ξ₂, ξ₃, ...).

3. Fix a Borel space S, and let X be a stationary array of S-valued random elements, indexed by ℕ^d. Show that there exists a stationary array Y indexed by ℤ^d such that X = Y a.s. on ℕ^d.

4. Let X be a stationary process on ℝ₊ with values in some Borel space S. Show that there exists a stationary process Y on ℝ with X =d Y on ℝ₊. Strengthen this to a.s. equality when S is a complete metric space and X is right-continuous.

5. Consider a two-sided, stationary random sequence ξ with restriction η to ℕ. Show that ξ and η are simultaneously ergodic. (Hint: For any measurable, invariant set I ∈ S^ℤ, there exists some measurable, invariant set I′ ∈ S^ℕ with I = S^{ℤ₋} × I′ a.s. L(ξ).)

6.
Establish two-sided and higher-dimensional versions of Lemmas 10.4 and 10.5 as well as of Theorem 10.9. 
7. A measure-preserving transformation T on some probability space (S, 𝒮, μ) is said to be mixing if μ(A ∩ T⁻ⁿB) → μA · μB for all A, B ∈ 𝒮. Prove the counterpart of Lemma 10.5 for mixing. Also, show that any mixing transformation is ergodic. (Hint: For the latter assertion, take A = B to be invariant.)

8. Show that it is enough to verify the mixing property for sets in a generating π-system. Use this fact to prove that any i.i.d. sequence is mixing under shifts.

9. Fix any a ∈ ℝ, and define Ts = s + a (mod 1) on [0, 1]. Show that T fails to be mixing but is ergodic iff a ∉ ℚ. (Hint: To prove the ergodicity, let I ⊂ [0, 1] be T-invariant. Then so is the measure 1_I · λ, and since the points ka are dense in [0, 1], it follows that 1_I · λ is translation invariant. Now use Theorem 2.6.)

10. (Bohl, Sierpiński, Weyl) For any a ∉ ℚ, let μ_n = n⁻¹ Σ_{k≤n} δ_{ka}, where ka is defined modulo 1 as a number in [0, 1]. Show that μ_n →w λ. (Hint: Apply Theorem 10.6 to the mapping of the previous exercise.)

11. Prove that the transformation Ts = 2s (mod 1) on [0, 1] is mixing. Also show how the mapping of Lemma 3.20 can be generated as in Lemma 10.1 by means of T.

12. Note that Theorem 10.6 remains true for invertible shifts T, with averages taken over increasing index sets [a_n, b_n] with b_n − a_n → ∞. Show by an example that the a.s. convergence may fail without the assumption of monotonicity. (Hint: Consider an i.i.d. sequence (ξ_n) and disjoint intervals [a_n, b_n], and use the Borel–Cantelli lemma.)

13. Consider a one- or two-sided stationary random sequence (ξ_n) in some measurable space (S, 𝒮), and fix any B ∈ 𝒮. Show that a.s. either ξ_n ∈ Bᶜ for all n or ξ_n ∈ B i.o. (Hint: Use Theorem 10.6.)

14. (von Neumann) Give a direct proof of the L²-version of Theorem 10.6. (Hint: Define a unitary operator U on L²(S) by Uf = f ∘ T. Let M denote the U-invariant subspace of L², and put A = I − U. Check that M^⊥ = R̄_A, the closed range of A.
By Theorem 1.33 it is enough to take f ∈ M or f ∈ R_A.) Deduce the general Lᵖ-version, and extend the argument to higher dimensions.

15. In the context of Theorem 10.26, show that the ergodic measures form a measurable subset of M. (Hint: Use Lemma 1.41, Proposition 4.31, and Theorem 10.14.)

16. Prove a continuous-time version of Theorem 10.26.

17. Deduce Theorem 4.23 for p < 1 from Theorem 10.22. (Hint: Take X_{m,n} = |S_n − S_m|ᵖ, and note that E|S_n|ᵖ = o(n) when p < 1.)
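Exercises 9 and 10 lend themselves to a quick numerical check (a Python sketch of ours, not part of the text): for irrational a, the empirical measures μ_n of the orbit ka mod 1 should approach Lebesgue measure, which we can monitor through the Kolmogorov–Smirnov distance to the uniform law.

```python
import math

def weyl_points(a, n):
    """The points ka (mod 1), k = 1, ..., n, carrying the
    empirical measure mu_n of Exercise 10."""
    return [math.fmod(k * a, 1.0) for k in range(1, n + 1)]

def ks_uniform(points):
    """Kolmogorov-Smirnov distance between the empirical
    distribution of the points and the uniform law on [0, 1]."""
    xs = sorted(points)
    n = len(xs)
    return max(max((i + 1) / n - x, x - i / n)
               for i, x in enumerate(xs))

a = math.sqrt(2)  # an irrational rotation angle
for n in (10, 100, 1000, 10000):
    print(n, round(ks_uniform(weyl_points(a, n)), 4))
```

The printed distances shrink toward 0, in line with μ_n →w λ; for rational a the orbit is periodic and the distance stalls at a positive value.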
18. Let ξ = (ξ₁, ξ₂, ...) be a stationary sequence of random variables, fix any B ∈ B(ℝ^d), and let η_n be the number of indices k ∈ {1, ..., n − d} with (ξ_k, ..., ξ_{k+d−1}) ∈ B. Prove from Theorem 10.22 that η_n/n converges a.s. Deduce the same result from Theorem 10.6, by considering suitable subsequences.

19. Show that the inequality in Lemma 10.7 can be strengthened to E[ξ₁; sup_n (S_n/n) > 0] ≥ 0. (Hint: Apply the original result to the variables ξ_k + ε, and let ε → 0.)

20. Extend Proposition 10.10 to stationary processes on ℤ^d.

21. Extend Theorem 10.14 to averages over arbitrary rectangles A_n = [0, a_{n1}] × ⋯ × [0, a_{nd}] such that a_{nj} → ∞ and sup_n (a_{ni}/a_{nj}) < ∞ for all i ≠ j. (Hint: Note that Lemma 10.17 extends to this case.)

22. Derive a version of Theorem 10.14 for stationary processes X on ℤ^d. (Hint: By a suitable randomization, construct an associated stationary process X̃ on ℝ^d, apply Theorem 10.14 to X̃, and estimate the error term as in Corollary 10.19.)

23. Give an example of a stationary, simple point process ξ on ℝ^d with a.s. infinite sample intensity ξ̄.

24. Give an example of two processes X and Y on ℝ₊ such that X =d Y on I but not on T.

25. Derive a version of Theorem 10.27 for processes on ℤ₊. Also prove versions for processes on ℝ₊^d and ℤ₊^d.

26. Show that Theorem 10.27 (ii) implies a corresponding result for processes on ℝ. (Hint: Apply Theorem 10.27 to the processes X̃_t = θ_t X and Ỹ_t = θ_t Y.) Also show how the two-sided statement follows from Theorem 10.28.

27. For processes X on ℝ₊, define X̃_t = (X_t, t), and let Ĩ be the associated invariant σ-field. Assuming X and Y to be measurable, show that X =d Y on T iff X̃ =d Ỹ on Ĩ. (Hint: Use Theorem 10.27.)

28. Prove Lemma 10.15 (ii). (Hint (Day): First show that if S_r ⊂ B, then B + S_ε ⊂ (1 + ε/r)B, where S_r denotes an r-ball around 0.)
Chapter 11

Special Notions of Symmetry and Invariance

Palm distributions and inversion formulas; stationarity and cycle stationarity; local hitting and conditioning; ergodic properties of Palm measures; exchangeable sequences and processes; strong stationarity and predictable sampling; ballot theorems; entropy and information

This chapter is devoted to some loosely connected topics that are all related to our previous treatment of stationary processes and ergodic theory. We begin with a discussion of Palm distributions of stationary random measures and point processes. In the simplest setting, when ξ is a stationary, simple point process on ℝ^d, we may think of the associated Palm distribution Q as the conditional distribution, given that ξ has a point at 0. A formal definition is possible when ξ has finite and positive intensity, in which case the mentioned interpretation may be justified by a limit theorem. In the ergodic case, the distributions of the original process and its Palm version agree up to a random shift, which leads to some useful ergodic and averaging relations. Finally, the theory of Palm distributions provides a striking relationship between the notions of stationarity under discrete and continuous shifts.

Asymptotically invariant sampling from a stationary sequence or process leads in the limit to an exchangeable sequence. This is the key observation behind de Finetti's theorem, the fact that exchangeable sequences are mixed i.i.d. It also implies the further equivalence with the notion of spreadability or subsequence invariance, which in turn is equivalent to strong stationarity or invariance in distribution under optional shifts. In the other direction, we consider the striking and useful predictable sampling theorem, the fact that an exchangeable distribution remains invariant under predictable permutations. The latter result will be used in Chapters 13–15 to give simple proofs of the various versions of the arcsine laws.
The chapter concludes with a general so-called ballot theorem for stationary, singular random measures and with a version of the fundamental ergodic theorem of information theory. The former result leads, whenever it applies, to some very precise maximum inequalities, related to those of the preceding chapter and with important applications to queuing theory
and other areas. The latter result relates ergodic theory to the notion of entropy, of such basic importance in statistical mechanics.

The material in this chapter is related in many ways to other parts of the book. In particular, we may point out some links to various applications and extensions, in Chapters 12, 13, and 16, of results for exchangeable sequences and processes. Furthermore, the predictable sampling theorem is related to some results on random time change appearing in Chapters 18 and 25.

A random measure ξ on ℝ^d is defined as a locally finite kernel from the basic probability space to ℝ^d. It is called a point process if ξB is integer-valued for every bounded Borel set B. In the latter case, ξ is said to be simple if ξ{s} ≤ 1 for all s ∈ ℝ^d outside a fixed P-null set. A more detailed discussion of random measures is given in Chapter 12. We begin the present treatment with a basic general property.

Lemma 11.1 (zero–infinity law) If ξ is a stationary random measure on ℝ or ℤ, then ξ[0, ∞) = ∞ a.s. on {ξ ≠ 0}.

Proof: We first consider the case of random measures on ℝ. By the stationarity of ξ and Fatou's lemma, we have for any t ∈ ℝ and h, ε > 0

  P{ξ[t, t + h) > ε} = limsup_n P{ξ[(n − 1)h, nh) > ε}
    ≤ P{ξ[(n − 1)h, nh) > ε i.o.} ≤ P{ξ[0, ∞) = ∞}.

Letting ε → 0, h → ∞, and t → −∞ in this order, we get P{ξ ≠ 0} ≤ P{ξ[0, ∞) = ∞}. Since trivially ξ[0, ∞) = ∞ implies ξ ≠ 0, we obtain

  P{ξ[0, ∞) < ∞, ξ ≠ 0} = P{ξ ≠ 0} − P{ξ[0, ∞) = ∞} ≤ 0,

and the assertion follows. The result for random measures on ℤ may be proved by the same argument with t and h restricted to ℤ. □

Now consider on ℝ^d a random measure ξ and a measurable random process X, taking values in an arbitrary measurable space S. We say that ξ and X are jointly stationary if

  θ_t(X, ξ) ≡ (θ_t X, θ_t ξ) =d (X, ξ) for every t ∈ ℝ^d.
By Theorem 2.6 and the stationarity of ξ, we have Eξ = c λ^d for some constant c ∈ [0, ∞], called the intensity of ξ, and we note that c = Eξ̄, where ξ̄ is the sample intensity in Corollary 10.19. If X and ξ are jointly stationary and ξ has finite and positive intensity, we define the Palm distribution Q_{X,ξ} of (X, ξ) with respect to ξ by the formula

  Q_{X,ξ} f = E ∫_B f(θ_s(X, ξ)) ξ(ds) / EξB,   (1)

for any set B ∈ B^d with λ^d B ∈ (0, ∞) and for measurable functions f ≥ 0 on S^{ℝ^d} × M(ℝ^d). The following result shows that the definition is independent of the choice of B.
Lemma 11.2 (coding) Consider a stationary pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a random measure. Then for any measurable function f ≥ 0, the stationarity carries over to the random measure

  ξ_f B = ∫_B f(θ_s(X, ξ)) ξ(ds),  B ∈ B^d.

Proof: For any t ∈ ℝ^d and B ∈ B^d, a simple computation gives

  (θ_t ξ_f)B = ξ_f(B + t) = ∫_{B+t} f(θ_s(X, ξ)) ξ(ds)
    = ∫ 1_B(s − t) f(θ_s(X, ξ)) ξ(ds)
    = ∫ 1_B(u) f(θ_{u+t}(X, ξ)) ξ(du + t)
    = ∫_B f(θ_u θ_t(X, ξ)) (θ_t ξ)(du).

Writing ξ_f = F(X, ξ) and using the stationarity of (X, ξ), we obtain

  θ_t ξ_f = F(θ_t(X, ξ)) =d F(X, ξ) = ξ_f,  t ∈ ℝ^d. □

The mapping in (1) is essentially a one-to-one correspondence, and we proceed to derive some useful inversion formulas. To state the latter, it is suggestive to introduce a random pair (Y, η) with distribution Q_{X,ξ}, where in view of (1) the process Y can again be chosen to be measurable. When ξ is a simple point process, then so is η, and we note that η{0} = 1 a.s. The result may then be stated in terms of the associated Voronoi cells

  V_μ = {s ∈ ℝ^d; μ(S_{|s|} + s) = 0},  μ ∈ N(ℝ^d),

where N(ℝ^d) is the class of locally finite point measures on ℝ^d and S_r denotes the open ball of radius r around the origin. If also d = 1, we may enumerate the supporting points of μ in increasing order as t_n(μ), subject to the convention t₀(μ) ≤ 0 < t₁(μ). To simplify our statements, we often omit the obvious requirement that the space S and the functions f and g be measurable.

Proposition 11.3 (uniqueness and inversion) Consider a stationary pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a random measure with Eξ̄ ∈ (0, ∞). Then P[(X, ξ) ∈ · | ξ ≠ 0] is uniquely determined by L(Y, η) = Q_{X,ξ}, and the following inversion formulas hold:

(i) For any f ≥ 0 and g > 0 with λ^d g < ∞,

  E[f(X, ξ); ξ ≠ 0] = Eξ̄ · E ∫ (f(θ_s(Y, η)) / (θ_s η)g) g(−s) ds.

(ii) If ξ is a simple point process, we have for any f ≥ 0

  E[f(X, ξ); ξ ≠ 0] = Eξ̄ · E ∫_{V_η} f(θ_s(Y, η)) ds.
(iii) If ξ is a simple point process and d = 1, we have for any f ≥ 0

  E[f(X, ξ); ξ ≠ 0] = Eξ̄ · E ∫₀^{t₁(η)} f(θ_s(Y, η)) ds.

To express the conditional distribution P[(X, ξ) ∈ · | ξ ≠ 0] in terms of L(Y, η), it suffices in each case to divide by the corresponding formula for f = 1. The latter equation also expresses P{ξ ≠ 0}/Eξ̄ in terms of L(η). In particular, this ratio equals E|V_η| in case (ii) and Et₁(η) in case (iii).

Proof: (i) Write (1) in the form

  Eξ̄ · λ^d B · Ef(Y, η) = E ∫_B f(θ_s(X, ξ)) ξ(ds),  B ∈ B^d,

and extend by a monotone class argument to

  Eξ̄ · E ∫ h(Y, η, s) ds = E ∫ h(θ_s(X, ξ), s) ξ(ds),

for any measurable function h ≥ 0 on the appropriate product space. Applying the latter formula to the function h(x, μ, s) = f(θ_{−s}(x, μ), s) and substituting −s for s, we get

  Eξ̄ · E ∫ f(θ_s(Y, η), −s) ds = E ∫ f(X, ξ, s) ξ(ds).   (2)

In particular, we have for measurable g, h ≥ 0

  Eξ̄ · E ∫ h(θ_s(Y, η)) g(−s) ds = E h(X, ξ) ξg.

If g > 0 with λ^d g < ∞, then ξg < ∞ a.s., and the desired relation follows by the further substitution

  h(x, μ) = f(x, μ) 1{μg > 0} / μg.

(ii) Here we may apply (2) to the function

  h(x, μ, s) = f(x, μ) 1{μ{s} = 1, μ S_{|s|} = 0},

and note that (θ_s η) S_{|s|} = 0 iff s ∈ V_η.

(iii) In this case, we apply (2) to the function

  h(x, μ, s) = f(x, μ) 1{t₀(μ) = s},

and note that t₀(θ_s η) = −s iff s ∈ [0, t₁(η)). □

Now consider a simple point process η on ℝ and a measurable process Y on ℝ with values in an arbitrary measurable space (S, 𝒮). We say that the pair (Y, η) is cycle-stationary if η{0} = 1 and t₁(η) < ∞ a.s., and if in addition θ_{t₁(η)}(Y, η) =d (Y, η). The variables t_n(η) are then a.s. finite, and the successive differences Δt_n(η) = t_{n+1}(η) − t_n(η), along with the shifted processes Yⁿ = θ_{t_n(η)} Y, form a stationary sequence in the space
(0, ∞) × S^ℝ. The following result gives a striking relationship between the notions of stationarity and cycle stationarity for pairs (X, ξ) and (Y, η). When d = 1 and ξ ≠ 0 a.s., the definition (1) of the Palm distribution and the inversion formula in Proposition 11.3 (iii) reduce to the nearly symmetric equations

  Ef(Y, η) = E ∫₀¹ f(θ_s(X, ξ)) ξ(ds) / Eξ(0, 1],   (3)

  Ef(X, ξ) = E ∫₀^{t₁(η)} f(θ_s(Y, η)) ds / Et₁(η).   (4)

Theorem 11.4 (cycle stationarity, Kaplan) Equations (3) and (4) provide a one-to-one correspondence between the distributions of all stationary pairs (X, ξ) on ℝ and all cycle-stationary ones (Y, η), where X and Y are measurable processes in S, and ξ and η are simple point processes with ξ ≠ 0 a.s., Eξ(0, 1] < ∞, and Et₁(η) < ∞.

Proof: First assume that (X, ξ) is stationary with ξ ≠ 0 and Eξ(0, 1] < ∞, put σ_k = t_k(ξ), and define L(Y, η) by (3). Then for any n ∈ ℕ and for bounded, measurable f ≥ 0, we have

  n Eξ(0, 1] · Ef(Y, η) = E ∫₀ⁿ f(θ_s(X, ξ)) ξ(ds) = E Σ_{σ_k ∈ (0, n]} f(θ_{σ_k}(X, ξ)).

Writing τ_k = t_k(η), we get by a suitable substitution

  n Eξ(0, 1] · Ef(θ_{τ₁}(Y, η)) = E Σ_{σ_k ∈ (0, n]} f(θ_{σ_{k+1}}(X, ξ)),

and so by subtraction,

  |Ef(θ_{τ₁}(Y, η)) − Ef(Y, η)| ≤ 2‖f‖ / (n Eξ(0, 1]).

As n → ∞, we obtain Ef(θ_{τ₁}(Y, η)) = Ef(Y, η), and therefore θ_{τ₁}(Y, η) =d (Y, η), which means that (Y, η) is cycle-stationary. Also note that (4) holds in this case by Proposition 11.3.

Next assume that (Y, η) is cycle-stationary with Et₁(η) < ∞, and define L(X, ξ) by (4). Then for n and f as before,

  n Eτ₁ · Ef(X, ξ) = E ∫₀^{τ_n} f(θ_s(Y, η)) ds,

and so for any t ∈ ℝ,

  n Eτ₁ · Ef(θ_t(X, ξ)) = E ∫₀^{τ_n} f(θ_{s+t}(Y, η)) ds = E ∫_t^{τ_n + t} f(θ_s(Y, η)) ds.

Hence, by subtraction,

  |Ef(θ_t(X, ξ)) − Ef(X, ξ)| ≤ 2|t| ‖f‖ / (n Eτ₁).
As n → ∞, we get Ef(θ_t(X, ξ)) = Ef(X, ξ), and so θ_t(X, ξ) =d (X, ξ), which means that (X, ξ) is stationary. To see that (X, ξ) and (Y, η) are related by (3), we introduce a possibly unbounded measure space with integration operator Ẽ and a random pair (Ỹ, η̃) satisfying

  Ẽf(Ỹ, η̃) = E ∫₀¹ f(θ_s(X, ξ)) ξ(ds).   (5)

Proceeding as in the proof of Proposition 11.3, except that the monotone class argument requires some extra care since Ẽ may be infinite, we obtain

  Ẽ ∫₀^{t₁(η̃)} f(θ_s(Ỹ, η̃)) ds = Ef(X, ξ) = E ∫₀^{t₁(η)} f(θ_s(Y, η)) ds / Et₁(η).

Replacing f(x, μ) by f(θ_{t₀(μ)}(x, μ)) and noting that t₀(θ_s μ) = −s when μ{0} = 1 and s ∈ [0, t₁(μ)), we get

  Ẽ[t₁(η̃) f(Ỹ, η̃)] = E[t₁(η) f(Y, η)] / Et₁(η).

Hence, by a suitable substitution, Ẽf(Ỹ, η̃) = Ef(Y, η) / Et₁(η). Inserting this into (5) and dividing by the same formula for f = 1, we obtain the required equation. □

When ξ is a simple point process on ℝ^d, we may think of the Palm distribution Q_{X,ξ} as the conditional distribution of (X, ξ), given that ξ{0} = 1. The interpretation is justified by the following result, which also provides an asymptotic formula for the hitting probabilities of small Borel sets. By B_n → 0 we mean that sup{|s|; s ∈ B_n} → 0, and we write ‖·‖ for the total variation norm.

Theorem 11.5 (local hitting and conditioning, Korolyuk, Ryll-Nardzewski, König, Matthes) Consider a stationary pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a simple point process with Eξ̄ ∈ (0, ∞). Let B₁, B₂, ... ∈ B^d with |B_n| > 0 and B_n → 0, and let f be bounded, measurable, and shift-continuous. On {ξB_n = 1}, let σ_n denote the unique point of ξ in B_n. Then

(i) P{ξB_n = 1} ∼ P{ξB_n > 0} ∼ EξB_n;

(ii) ‖P[θ_{σ_n}(X, ξ) ∈ · | ξB_n = 1] − Q_{X,ξ}‖ → 0;

(iii) E[f(X, ξ) | ξB_n > 0] → Q_{X,ξ} f.

Proof: (i) Since η{0} = 1 a.s., we have (θ_s η)B_n > 0 for all s ∈ −B_n. Hence, Proposition 11.3 (ii) yields

  P{ξB_n > 0} = Eξ̄ · E ∫_{V_η} 1{(θ_s η)B_n > 0} ds ≥ Eξ̄ · E|V_η ∩ (−B_n)|.
Dividing by EξB_n = Eξ̄ |B_n| and using Fatou's lemma, we obtain

  liminf_n P{ξB_n > 0} / EξB_n ≥ liminf_n E|V_η ∩ (−B_n)| / |B_n|
    ≥ E liminf_n |V_η ∩ (−B_n)| / |B_n| = 1,

which implies

  liminf_n P{ξB_n = 1} / EξB_n ≥ 2 liminf_n P{ξB_n > 0} / EξB_n − 1 ≥ 1.

The converse relations are obvious, since P{ξB_n = 1} ≤ P{ξB_n > 0} ≤ EξB_n.

(ii) Introduce on S^{ℝ^d} × N(ℝ^d) the measures

  μ_n = E ∫_{B_n} 1{θ_s(X, ξ) ∈ ·} ξ(ds),   ν_n = P[θ_{σ_n}(X, ξ) ∈ ·; ξB_n = 1],

and put m_n = EξB_n and p_n = P{ξB_n = 1}. By (1) the stated total variation becomes

  ‖ν_n/p_n − μ_n/m_n‖ ≤ ‖ν_n/p_n − ν_n/m_n‖ + ‖ν_n − μ_n‖/m_n
    ≤ p_n (1/p_n − 1/m_n) + (m_n − p_n)/m_n = 2(1 − p_n/m_n),

which tends to 0 in view of (i); here we have used that ν_n ≤ μ_n, with total masses p_n and m_n.

(iii) Here we write

  |E[f(X, ξ) | ξB_n > 0] − Q_{X,ξ} f|
    ≤ |E[f(X, ξ) | ξB_n > 0] − E[f(X, ξ) | ξB_n = 1]|
    + |E[f(X, ξ) − f(θ_{σ_n}(X, ξ)) | ξB_n = 1]|
    + |E[f(θ_{σ_n}(X, ξ)) | ξB_n = 1] − Q_{X,ξ} f|.

By (i) and (ii) the first and last terms on the right tend to 0 as n → ∞. To estimate the second term, we introduce on S^{ℝ^d} × N(ℝ^d) the bounded, measurable functions

  g_ε(x, μ) = sup_{|s| ≤ ε} |f(θ_s(x, μ)) − f(x, μ)|,  ε > 0,

and conclude from (ii) that for large enough n

  |E[f(X, ξ) − f(θ_{σ_n}(X, ξ)) | ξB_n = 1]| ≤ E[g_ε(θ_{σ_n}(X, ξ)) | ξB_n = 1] → Q_{X,ξ} g_ε.

Since also Q_{X,ξ} g_ε → 0 by dominated convergence as ε → 0, the desired convergence follows. □
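Part (i) can be illustrated numerically for the simplest stationary point process, the Poisson process on ℝ with intensity λ (an example of ours, not from the text): the count ξ[0, h) is then Poisson(λh), so the ratio P{ξ[0, h) > 0}/Eξ[0, h) = (1 − e^{−λh})/(λh) can be both simulated and computed exactly, and it tends to 1 as h → 0.

```python
import math
import random

random.seed(2)

def poisson(mu):
    """Inversion sampler for a Poisson(mu) variate."""
    n, p = 0, math.exp(-mu)
    c, u = p, random.random()
    while u > c:
        n += 1
        p *= mu / n
        c += p
    return n

lam = 2.0  # intensity of the stationary Poisson process

def hit_ratio(h, trials=200000):
    """Monte Carlo estimate of P{xi[0,h) > 0} / E xi[0,h)."""
    hits = sum(1 for _ in range(trials) if poisson(lam * h) > 0)
    return (hits / trials) / (lam * h)

for h in (1.0, 0.5, 0.1, 0.01):
    print(h, round(hit_ratio(h), 3))  # approaches 1 as h shrinks
```

For h = 0.01 the exact ratio is (1 − e^{−0.02})/0.02 ≈ 0.990, in agreement with the asymptotics P{ξB_n > 0} ∼ EξB_n.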
We turn to a general ergodic theorem for Palm distributions. Given a bounded measure ν ≠ 0 on ℝ^d and a positive or bounded, measurable function f on S^{ℝ^d} × M(ℝ^d), we introduce the average

  f̄_ν(x, μ) = ∫ f(θ_s(x, μ)) ν(ds) / ‖ν‖,  x ∈ S^{ℝ^d}, μ ∈ M(ℝ^d),

where x is understood to be a measurable function on ℝ^d, to ensure the existence of the integral. When ν = 0, we take f̄_ν = 0. Let us say that the weight functions (probability densities) g₁, g₂, ... on ℝ^d are asymptotically invariant if the corresponding property holds for the associated measures g_n · λ^d. For convenience, we may sometimes write g · μ = gμ.

Theorem 11.6 (pointwise averages) Consider a stationary and ergodic pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a random measure with ξ̄ ∈ (0, ∞) a.s. Let L(Y, η) = Q_{X,ξ}. Then for any bounded, measurable function f and asymptotically invariant distributions μ_n or weight functions g_n on ℝ^d, we have

(i) f̄_{μ_n}(Y, η) → Ef(X, ξ) in probability;

(ii) f̄_{g_n ξ}(X, ξ) → Ef(Y, η) in probability.

The same convergence holds a.s. when μ_n = |B_n|⁻¹ 1_{B_n} · λ^d or g_n = |B_n|⁻¹ 1_{B_n}, respectively, for some bounded, convex sets B₁ ⊂ B₂ ⊂ ⋯ in B^d with r(B_n) → ∞.

We can give a short and transparent proof by using the general shift coupling in Theorem 10.28. Since the latter result applies directly only when the sample intensity ξ̄ is a constant (which holds in particular when ξ is ergodic), we need to replace the Palm distribution Q_{X,ξ} in (1) by a suitably modified version Q′_{X,ξ}, given for f ≥ 0 and B ∈ B^d with |B| ∈ (0, ∞) by

  Q′_{X,ξ} f = |B|⁻¹ E[ξ̄⁻¹ ∫_B f(θ_s(X, ξ)) ξ(ds)],

whenever ξ̄ ∈ (0, ∞) a.s. If ξ is ergodic, we note that ξ̄ = Eξ̄ a.s., and therefore Q′_{X,ξ} = Q_{X,ξ}. As previously for Q_{X,ξ}, it is both suggestive and convenient to introduce a random pair (Z, ζ) with distribution Q′_{X,ξ}.
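The interpretation of the Palm version as the process "seen from a typical point," and its relation to the stationary version in Theorem 11.4, can be made concrete by simulation (a sketch of ours with Exp(1) renewal spacings, not part of the text): under the cycle-stationary version the interval (t₀, t₁] has mean Et₁(η) = 1, while the interval covering a fixed "typical" time point in the stationary version is length-biased, with mean Et₁(η)²/Et₁(η) = 2 — the classical inspection paradox.

```python
import random

random.seed(1)

def palm_mean_gap(n):
    """Mean spacing under the Palm (cycle-stationary) version:
    a point at 0 followed by i.i.d. Exp(1) spacings."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n

def covering_gap(t=100.0):
    """Length of the renewal interval covering the time point t,
    for the same renewal process started at 0."""
    s = 0.0
    while True:
        gap = random.expovariate(1.0)
        if s + gap > t:
            return gap
        s += gap

palm = palm_mean_gap(20000)
stat = sum(covering_gap() for _ in range(5000)) / 5000
print(round(palm, 2), round(stat, 2))  # roughly 1.0 and 2.0
```

The discrepancy between the two means is exactly the reason why the unmodified Q_{X,ξ} cannot be read as "the process shifted to a typical point" without the correction built into Q′_{X,ξ}.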
Lemma 11.7 (shift coupling, Thorisson) Consider a stationary pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a random measure with ξ̄ ∈ (0, ∞) a.s. Let L(Z, ζ) = Q′_{X,ξ}. Then there exist some random vectors σ and τ in ℝ^d such that

  (X, ξ) =d θ_σ(Z, ζ),   (Z, ζ) =d θ_τ(X, ξ).

The result suggests that we think of Q′_{X,ξ} as the distribution of (X, ξ) shifted to a "typical" point of ξ. Note that this interpretation fails for Q_{X,ξ} in general.

Proof: Write I for the shift-invariant σ-field in the measurable path space of (X, ξ), and put I_{X,ξ} = (X, ξ)⁻¹I. Letting B = (0, 1]^d and noting that
E[ξB | I_{X,ξ}] = ξ̄, we get for any I ∈ I

  P{(Z, ζ) ∈ I} = E[ξ̄⁻¹ ∫_B 1_I(θ_s(X, ξ)) ξ(ds)] = E[ξ̄⁻¹ ξB; (X, ξ) ∈ I] = P{(X, ξ) ∈ I},

which shows that (X, ξ) =d (Z, ζ) on I. Both assertions now follow from Theorem 10.28. □

Proof of Theorem 11.6: (i) By Lemma 11.7 we may assume that (Y, η) = θ_τ(X, ξ) for some random element τ in ℝ^d. Using Corollary 10.20 (i) and the asymptotic invariance of μ_n, we get

  |f̄_{μ_n}(Y, η) − Ef(X, ξ)| ≤ ‖μ_n − θ_τ μ_n‖ ‖f‖ + |f̄_{μ_n}(X, ξ) − Ef(X, ξ)| → 0 in probability.

The a.s. version follows in the same way from Theorem 10.14.

(ii) Let ξ_f be the stationary and ergodic random measure in Lemma 11.2. Applying Corollary 10.20 (ii) to both ξ and ξ_f and using (1), we obtain

  f̄_{g_n ξ}(X, ξ) = ξ_f g_n / ξ g_n = (ξ_f g_n / λ^d g_n) / (ξ g_n / λ^d g_n) → Eξ_f B / EξB = Ef(Y, η)

in probability, where B = (0, 1]^d. For the pointwise version, we may use Corollary 10.19 instead. □

Taking expected values in Theorem 11.6, we get for bounded f the formulas

  E f̄_{μ_n}(Y, η) → Ef(X, ξ),   E f̄_{g_n ξ}(X, ξ) → Ef(Y, η),

which may be interpreted as limit theorems for suitable space averages of the distributions L(X, ξ) and L(Y, η). We shall prove the less obvious fact that both relations hold uniformly for bounded f. For a striking formulation, we may introduce the possibly defective distributions L̄_μ(X, ξ) and L̄_{gξ}(X, ξ), given for measurable functions f ≥ 0 by

  L̄_μ(X, ξ) f = E f̄_μ(X, ξ),   L̄_{gξ}(X, ξ) f = E f̄_{gξ}(X, ξ).

Theorem 11.8 (distributional averages, Slivnyak, Zähle) Consider a stationary pair (X, ξ) on ℝ^d, where X is a measurable process in S and ξ is a random measure with ξ̄ ∈ (0, ∞) a.s. Let L(Z, ζ) = Q′_{X,ξ}. Then for any asymptotically invariant distributions μ_n or weight functions g_n on ℝ^d,

(i) ‖L̄_{μ_n}(Z, ζ) − L(X, ξ)‖ → 0;

(ii) ‖L̄_{g_n ξ}(X, ξ) − L(Z, ζ)‖ → 0.

Proof: (i) By Lemma 11.7 we may assume that (Z, ζ) = θ_τ(X, ξ). Using Fubini's theorem and the stationarity of (X, ξ), we get for any measurable function f ≥ 0

  L̄_{μ_n}(X, ξ) f = ∫ Ef(θ_s(X, ξ)) μ_n(ds) = Ef(X, ξ) = L(X, ξ) f.
Hence, by Fubini's theorem and dominated convergence,

  ‖L̄_{μ_n}(Z, ζ) − L(X, ξ)‖ = ‖L̄_{μ_n}(θ_τ(X, ξ)) − L̄_{μ_n}(X, ξ)‖
    ≤ E ‖∫ 1{θ_s(X, ξ) ∈ ·} (μ_n − θ_τ μ_n)(ds)‖ ≤ E ‖μ_n − θ_τ μ_n‖ → 0.

(ii) Letting 0 ≤ f ≤ 1 and defining ξ_f as in Lemma 11.2, we get

  ξ_f g_n = ∫ f(θ_s(X, ξ)) g_n(s) ξ(ds) ≤ ξ g_n.

Interpreting ξ_f g_n / ξ g_n as 0 when ξ g_n = 0, we obtain

  |L̄_{g_n ξ}(X, ξ) f − L(Z, ζ) f| = |E(ξ_f g_n / ξ g_n) − E(ξ_f g_n / ξ̄)| ≤ E|1 − ξ g_n / ξ̄|.

Here ξ g_n / ξ̄ → 1 in probability by Corollary 10.20, and moreover

  E(ξ g_n / ξ̄) = E(E[ξ g_n | I_{X,ξ}] / ξ̄) = E(ξ̄ / ξ̄) = 1.

Hence, Proposition 4.12 yields ξ g_n / ξ̄ → 1 in L¹, and the assertion follows. □

To motivate our next main topic, we consider a simple limit theorem for multivariate sampling from a stationary process. Here we consider a measurable process X on some index set T, taking values in a space S, and let τ = (τ₁, τ₂, ...) ⊥⊥ X be a sequence of random elements in T with joint distribution μ. We may then form the associated sampling sequence ξ = X ∘ τ in S^∞, given by

  ξ = (ξ₁, ξ₂, ...) = (X_{τ₁}, X_{τ₂}, ...),

and referred to below as a sample from X with distribution μ. The sampling distributions μ₁, μ₂, ... on T^∞ are said to be asymptotically invariant if their projections onto T^k are asymptotically invariant for every k ∈ ℕ. Recall that I_X denotes the invariant σ-field of X, and note that the conditional distribution η = P[X₀ ∈ · | I_X] exists by Theorem 6.3 when S is Borel.

Lemma 11.9 (asymptotically invariant sampling) Let X be a stationary and measurable process on T = ℝ or ℤ with values in a Polish space S, and form ξ¹, ξ², ... by sampling from X with some asymptotically invariant distributions μ₁, μ₂, ... on T^∞. Then ξⁿ →d ξ in S^∞, where L(ξ) = Eη^∞ with η = P[X₀ ∈ · | I_X].

Proof: Write ξ = (ξ_k) and ξⁿ = (ξ_kⁿ). Fix any asymptotically invariant distributions ν₁, ν₂, ... on T, and let f₁, ..., f_m be measurable functions on S bounded by ±1.
Proceeding as in the proof of Corollary 10.20 (i), we 
get

  |E ∏_k f_k(ξ_kⁿ) − E ∏_k f_k(ξ_k)| ≤ E|μ_n ⊗_k f_k(X) − ∏_k η f_k|
    ≤ ‖μ_n − μ_n ∗ ν_r^{⊗m}‖ + ∫ E|(ν_r^{⊗m} ∗ δ_t) ⊗_k f_k(X) − ∏_k η f_k| μ_n(dt)
    ≤ ∫ ‖μ_n − μ_n ∗ δ_t‖ ν_r^{⊗m}(dt) + Σ_k sup_t E|(ν_r ∗ δ_t) f_k(X) − η f_k|.

Using the asymptotic invariance of μ_n and ν_r together with Corollary 10.20 (i) and dominated convergence, we see that the right-hand side tends to 0 as n → ∞ and then r → ∞. The assertion now follows by Theorem 4.29. □

The last result leads immediately to a version of de Finetti's theorem, the fact that infinite exchangeable sequences are mixed i.i.d. For a precise statement, consider any finite or infinite random sequence ξ = (ξ₁, ξ₂, ...) with index set I, and say that ξ is exchangeable if

  (ξ_{k₁}, ξ_{k₂}, ...) =d (ξ₁, ξ₂, ...)   (6)

for any finite permutation (k₁, k₂, ...) of I. (Here a permutation is said to be finite if it affects only finitely many elements.) For infinite sequences ξ we also consider the formally weaker property of spreadability, where (6) is required for all strictly increasing sequences k₁ < k₂ < ⋯. Note that ξ is then stationary and that any sample from ξ with strictly increasing sampling times τ₁, τ₂, ... has the same distribution as ξ. By Lemma 11.9 we conclude that L(ξ) = Eη^∞ with η = P[ξ₁ ∈ · | I_ξ]. Below we give a slightly stronger conditional statement. Recall that for any random measure η on a measurable space (S, 𝒮), the associated σ-field is generated by the random variables ηB for arbitrary B ∈ 𝒮.

Theorem 11.10 (exchangeable sequences, de Finetti, Ryll-Nardzewski) For any infinite random sequence ξ in a Borel space S, the following conditions are equivalent:

(i) ξ is exchangeable;

(ii) ξ is spreadable;

(iii) P[ξ ∈ · | η] = η^∞ a.s. for some random distribution η on S.

The random measure η is then a.s. unique and equals P[ξ₁ ∈ · | I_ξ].
sequence in S based on the measure 1], we may state condition (iii) in words by saying that  is conditionally i.i.d. Taking expectations of both sides in (iii), we obtain the seemingly weaker condition £() = E1]oo, which says that  is mixed i. i. d. Now the latter condition implies that  is exchangeable, and so, by the stated theorem, the two versions of (iii) are in fact equivalent. 
Proof: Since S is Borel, we may assume that S = [0, 1]. Letting μ_n be the uniform distribution on the product set ∏_k {(k − 1)n + 1, ..., kn} and using the spreadability of ξ, we see from Lemma 11.9 that P{ξ ∈ ·} = Eη^∞. More generally, consider any invariant Borel set I ⊂ S^∞, and note that (6) extends to

  (1_I(ξ), ξ_{k₁}, ξ_{k₂}, ...) =d (1_I(ξ), ξ₁, ξ₂, ...),  k₁ < k₂ < ⋯ .

Applying Lemma 11.9 to the sequence of pairs (ξ_k, 1_I(ξ)), we get as before P({ξ ∈ ·} ∩ {ξ ∈ I}) = E[η^∞; ξ ∈ I], and since η is I_ξ-measurable, it follows that P[ξ ∈ · | η] = η^∞ a.s. To see that η is unique, we may use the law of large numbers and Theorem 6.4 to obtain

  n⁻¹ Σ_{k ≤ n} 1_B(ξ_k) → ηB a.s.,  B ∈ 𝒮. □

The statement of Theorem 11.10 is clearly false for finite sequences. To rescue the result in the finite case, we need to replace the inherent i.i.d. sequences by so-called urn sequences, generated by successive drawing without replacement from a finite set. For a precise statement, fix any measurable space S, and consider a measure of the form μ = Σ_{k ≤ n} δ_{s_k} with s₁, ..., s_n ∈ S. The associated factorial measure μ⁽ⁿ⁾ on Sⁿ is defined by

  μ⁽ⁿ⁾ = Σ_p δ_{s ∘ p},

where the summation extends over all permutations p = (p₁, ..., p_n) of 1, ..., n, and we write s ∘ p = (s_{p₁}, ..., s_{p_n}). Note that μ⁽ⁿ⁾ is independent of the order of s₁, ..., s_n and is measurable as a function of μ.

Lemma 11.11 (finite exchangeable sequences) Let ξ₁, ..., ξ_n be random elements in some measurable space, and put ξ = (ξ_k) and η = Σ_k δ_{ξ_k}. Then ξ is exchangeable iff P[ξ ∈ · | η] = η⁽ⁿ⁾/n! a.s.

Proof: Since η is invariant under permutations of ξ₁, ..., ξ_n, we note that (ξ ∘ p, η) =d (ξ, η) for any permutation p of 1, ..., n. Now introduce an exchangeable permutation π ⊥⊥ ξ of 1, ..., n.
Using Fubini's theorem twice, we get for any measurable sets A and B in appropriate spaces

P{ξ ∈ B, η ∈ A} = P{ξ∘π ∈ B, η ∈ A} = E[P[ξ∘π ∈ B | ξ]; η ∈ A] = E[(n!)^{−1} η^{(n)}B; η ∈ A]. □

Just as for the martingale and Markov properties, even the notions of exchangeability and spreadability may be related to a filtration F = (Fn). Thus, a finite or infinite sequence of random elements ξ = (ξ1, ξ2, ...) is said to be F-exchangeable if it is F-adapted and such that, for every n ≥ 0, the shifted sequence θnξ = (ξ_{n+1}, ξ_{n+2}, ...) is conditionally exchangeable given Fn. For infinite sequences ξ, the definition of F-spreadability is similar. (Since both definitions may be stated without reference to regular
214 Foundations of Modern Probability

conditional distributions, no restrictions are needed on S.) When F is the filtration induced by ξ, the stated properties reduce to the unqualified versions considered earlier. For an infinite sequence ξ, we define strong stationarity or F-stationarity by the condition θτξ =d ξ for every finite optional time τ ≥ 0. By the prediction sequence of ξ we mean the set of conditional distributions

πn = P[θnξ ∈ · | Fn], n ∈ Z+. (7)

The random probability measures π0, π1, ... on S^∞ are said to form a measure-valued martingale if (πnB) is a real-valued martingale for every measurable set B ⊂ S^∞.

The next result shows that strong stationarity is equivalent to exchangeability; it also exhibits an interesting connection with martingale theory.

Lemma 11.12 (strong stationarity) Let ξ be an infinite, F-adapted random sequence in a Borel space S, and let π denote the prediction sequence of ξ. Then these conditions are equivalent:
(i) ξ is F-exchangeable;
(ii) ξ is F-spreadable;
(iii) ξ is F-stationary;
(iv) π is a measure-valued F-martingale.

Proof: Conditions (i) and (ii) are equivalent by Theorem 11.10. Assuming (ii), we get a.s. for any B ∈ S^∞ and n ∈ Z+

E[π_{n+1}B | Fn] = P[θ_{n+1}ξ ∈ B | Fn] = P[θnξ ∈ B | Fn] = πnB, (8)

which proves (iv). Conversely, (ii) follows by iteration from the second equality in (8), and so (ii) and (iv) are equivalent. Next we note that (7) extends by Lemma 6.2 to

πτB = P[θτξ ∈ B | Fτ] a.s., B ∈ S^∞,

for any finite optional time τ. By Lemma 7.13 it follows that (iv) is equivalent to

P{θτξ ∈ B} = EπτB = Eπ0B = P{ξ ∈ B}, B ∈ S^∞,

which in turn is equivalent to (iii). □

We next aim to show how the exchangeability property extends to a wide class of random transformations. For a precise statement, we say that an integer-valued random variable τ is predictable with respect to a given filtration F if the shifted time τ − 1 is F-optional.
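Lemma 11.12 (iv) can be checked by hand on a classical F-exchangeable sequence. The sketch below (an illustration with an assumed Pólya urn model, not from the text) verifies with exact rational arithmetic that the prediction probabilities πn{red} = (#red)/(#total) form a martingale along every branch of the urn tree.

```python
from fractions import Fraction

def step(r, t):
    """Polya urn with r red balls among t: draw one uniformly, return it
    together with an extra ball of the same colour.
    Returns [(probability, next_state)]."""
    p_red = Fraction(r, t)
    return [(p_red, (r + 1, t + 1)), (1 - p_red, (r, t + 1))]

def check_martingale(r, t, depth):
    """Verify E[pi_{n+1} | F_n] = pi_n exactly, recursively to `depth`."""
    if depth == 0:
        return True
    branches = step(r, t)
    expected = sum(p * Fraction(nr, nt) for p, (nr, nt) in branches)
    if expected != Fraction(r, t):
        return False
    return all(check_martingale(nr, nt, depth - 1) for _, (nr, nt) in branches)

# Start with 2 red and 3 black balls.
assert check_martingale(2, 5, depth=8)
```

The computation E[π'] = [r(r+1) + (t − r)r] / (t(t+1)) = r/t is exactly the martingale property of the prediction measure for this exchangeable sequence.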
11. Special Notions of Symmetry and Invariance 215

Theorem 11.13 (predictable sampling) Let ξ = (ξ1, ξ2, ...) be a finite or infinite, F-exchangeable random sequence, and let τ1, ..., τn be a.s. distinct F-predictable times in the index set of ξ. Then

(ξ_{τ1}, ..., ξ_{τn}) =d (ξ1, ..., ξn). (9)

Of special interest is the case of optional skipping, when τ1 < τ2 < ... . If τk ≡ τ + k for some optional time τ < ∞, then (9) reduces to the strong stationarity of Lemma 11.12. In general, we require neither ξ to be infinite nor the τk to be increasing.

For both applications and proof, it is useful to introduce the associated allocation sequence

αj = inf{k; τk = j}, j ∈ I,

where I is the index set of ξ. Note that any finite value of αj gives the position of j in the permuted sequence (ξ_{τk}). The random times τk are clearly predictable iff the αj form a predictable sequence in the sense of Chapter 7.

Proof of Theorem 11.13: First let ξ be indexed by I = {1, ..., n}, so that (τ1, ..., τn) and (α1, ..., αn) are mutually inverse random permutations of I. For each m ∈ {0, ..., n}, put α^m_j = αj for all j ≤ m, and define recursively

α^m_{j+1} = min(I \ {α^m_1, ..., α^m_j}), m ≤ j < n.

Then (α^m_1, ..., α^m_n) is a predictable permutation of 1, ..., n, and α^m_j is F_{m−1}-measurable for j ≥ m. Since also α^m_j = α^{m−1}_j = αj whenever j < m, Theorem 6.4 yields for any bounded measurable functions f1, ..., fn on S

E ∏_j f_{α^m_j}(ξj)
= E E[ ∏_j f_{α^m_j}(ξj) | F_{m−1} ]
= E ∏_{j<m} f_{α^m_j}(ξj) E[ ∏_{j≥m} f_{α^m_j}(ξj) | F_{m−1} ]
= E ∏_{j<m} f_{α^{m−1}_j}(ξj) E[ ∏_{j≥m} f_{α^{m−1}_j}(ξj) | F_{m−1} ]
= E ∏_j f_{α^{m−1}_j}(ξj).

Iterating this relation over m ∈ {1, ..., n} and noting that α^n_j = αj and α^0_j = j for all j, we get

E ∏_k fk(ξ_{τk}) = E ∏_j f_{αj}(ξj) = E ∏_k fk(ξk),

which extends to (9) by a monotone class argument.

Next assume that I = {1, ..., m} with m > n. We may then extend the sequence (τk) to I by recursively defining

τ_{k+1} = min(I \ {τ1, ..., τk}), k ≥ n, (10)
216 Foundations of Modern Probability

so that τ1, ..., τm form a random permutation of I. Using (10), we see by induction that the times τ_{n+1}, ..., τm are again predictable. Hence, the previous case applies, and (9) follows.

Finally, assume that I = N. For each m ∈ N, we introduce the predictable times

τ^m_k = τk 1{τk ≤ m} + (m + k) 1{τk > m}, k = 1, ..., n,

and conclude from the previous version of (9) that

(ξ_{τ^m_1}, ..., ξ_{τ^m_n}) =d (ξ1, ..., ξn). (11)

As m → ∞, we have τ^m_k → τk, and (9) follows from (11) by dominated convergence. □

The last result yields a simple proof of yet another basic property of random walks in R, a striking relation between the first maximum and the number of positive values. The latter result will in turn lead to simple proofs of the arcsine laws in Theorems 13.16 and 14.11.

Corollary 11.14 (sojourns and maxima, Sparre-Andersen) Let ξ1, ..., ξn be exchangeable random variables, and put Sk = ξ1 + ... + ξk. Then

Σ_{k≤n} 1{Sk > 0} =d min{k ≥ 0; Sk = max_{0≤j≤n} Sj}.

Proof: Put ξ̃k = ξ_{n−k+1} for k = 1, ..., n, and note that the ξ̃k remain exchangeable for the filtration F̃k = σ{Sn, ξ̃1, ..., ξ̃k}, k = 0, ..., n. Write S̃k = ξ̃1 + ... + ξ̃k, and introduce the predictable permutation

αk = Σ_{j=0}^{k−1} 1{S̃j < S̃n} + (n − k + 1) 1{S̃_{k−1} ≥ S̃n}, k = 1, ..., n.

Define ηk = Σ_j ξ̃j 1{αj = k} for k = 1, ..., n, and conclude from Theorem 11.13 that (ηk) =d (ξ̃k). Writing S′_k = η1 + ... + ηk, we further note that

min{k ≥ 0; S′_k = max_j S′_j} = Σ_{j=0}^{n−1} 1{S̃j < S̃n} = Σ_{k=1}^{n} 1{Sk > 0}. □

Turning to the case of continuous time, we say that a process X in some topological space is continuous in probability if Xs → Xt in probability as s → t. An R^d-valued process X on R+ is said to be exchangeable or spreadable if it is continuous in probability with X0 = 0 and such that the increments Xt − Xs over any set of disjoint intervals (s, t] of equal length form an exchangeable or spreadable sequence.
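Corollary 11.14 is a distributional identity, so it can be tested by exact enumeration: a uniformly random permutation of fixed values is exchangeable, and the two statistics must then agree in law (not pathwise). A brute-force check, with arbitrarily chosen values:

```python
from collections import Counter
from itertools import accumulate, permutations

def two_stats(xs):
    """Partial sums S_0 = 0, S_1, ..., S_n of xs; return the pair
    (number of strictly positive sums, first index attaining the maximum)."""
    s = [0] + list(accumulate(xs))
    n_pos = sum(1 for v in s[1:] if v > 0)
    first_argmax = min(k for k, v in enumerate(s) if v == max(s))
    return n_pos, first_argmax

# The values below are arbitrary; any choice gives an exchangeable sequence
# under uniformly random permutation.
vals = (2, -1, 3, -2, 1)
left = Counter(two_stats(p)[0] for p in permutations(vals))
right = Counter(two_stats(p)[1] for p in permutations(vals))
assert left == right
```

Note that for an individual permutation the two quantities generally differ; only their distributions over all orderings coincide, exactly as the corollary asserts.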
Finally, we say that X has conditionally stationary and independent increments, given some σ-field I, if the stated property is conditionally true for any finite collection of intervals.

The following continuous-time version of Theorem 11.10 characterizes the exchangeable processes on R+. We postpone the much harder finite-interval
11. Special Notions of Symmetry and Invariance 217

case until Theorem 16.21. The point process case is treated separately by different methods in Theorem 12.12.

Theorem 11.15 (exchangeable processes on R+, Bühlmann) Let the process X on R+ be R^d-valued and continuous in probability with X0 = 0. Then X is spreadable iff it has conditionally stationary and independent increments, given some σ-field I.

Proof: The sufficiency being obvious, it suffices to show that the stated condition is necessary. Thus, assume that X is spreadable. Then the increments ξnk over the dyadic intervals Ink = 2^{−n}(k − 1, k] are spreadable for fixed n, and so by Theorem 11.10 they are conditionally i.i.d. ηn for some random probability measure ηn on R^d. Using Corollary 3.12 and the uniqueness in Theorem 11.10, we obtain

ηn^{*2^{n−m}} = ηm a.s., m ≤ n. (12)

Thus, for any m ≤ n, the increments ξmk are conditionally i.i.d. ηm, given ηn. Since the σ-fields σ(ηn) are a.s. nondecreasing by (12), Theorem 7.23 shows that the ξmk remain conditionally i.i.d. ηm, given I = σ{η0, η1, ...}.

Now fix any disjoint intervals I1, ..., In of equal length with associated increments ξ1, ..., ξn. Here we may approximate by disjoint intervals I^m_1, ..., I^m_n of equal length with dyadic endpoints. For each m, the associated increments ξ^m_k are conditionally i.i.d., given I. Thus, for any bounded, continuous functions f1, ..., fn,

E[ ∏_{k≤n} fk(ξ^m_k) | I ] = ∏_{k≤n} E[ fk(ξ^m_k) | I ]. (13)

Since X is continuous in probability, we have ξ^m_k → ξk in probability for each k, so (13) extends by dominated convergence to the original variables ξk. By suitable approximation and monotone class arguments, we may finally extend the relations to any measurable indicator functions fk = 1Bk. □

We turn to an interesting relationship between the sample intensity ξ̄ of a stationary random measure ξ on R+ and the corresponding maximum over increasing intervals.
It is interesting to compare with the more general but less precise maximum inequalities in Proposition 10.10 and Lemmas 10.11 and 10.17.

For the needs of certain applications, we also consider the case of random measures ξ on [0, 1). Here ξ̄ = ξ[0, 1) by definition, and stationarity is defined as before in terms of the shifts θt on [0, 1), where θts = s + t (mod 1), and correspondingly for sets and measures. Recall that ξ is singular if its absolutely continuous component vanishes. This holds in particular for purely atomic measures ξ.
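The ballot theorem proved next generalizes the classical ballot problem of Bertrand and André (see Exercise 18): if candidate A receives a votes and B receives b < a, the probability that A leads strictly throughout the count is (a − b)/(a + b). As a concrete anchor, this elementary fact is easy to confirm by enumeration:

```python
from fractions import Fraction
from itertools import permutations

def lead_fraction(a, b):
    """Fraction of orderings of a '+1'-votes and b '-1'-votes in which the
    running total stays strictly positive throughout the count."""
    orders = set(permutations([1] * a + [-1] * b))   # distinct vote orders
    good = sum(1 for o in orders
               if all(sum(o[:k]) > 0 for k in range(1, a + b + 1)))
    return Fraction(good, len(orders))

# Bertrand's formula (a - b)/(a + b), for a > b:
assert lead_fraction(3, 1) == Fraction(2, 4)
assert lead_fraction(4, 2) == Fraction(2, 6)
assert lead_fraction(5, 3) == Fraction(2, 8)
```

The enumeration is exponential in a + b and is meant only as a small sanity check of the combinatorial formula, not as a practical computation.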
218 Foundations of Modern Probability

Theorem 11.16 (ballot theorem) Let ξ be a stationary and a.s. singular random measure on R+ or [0, 1). Then there exists a U(0, 1) random variable σ ⊥⊥ I_ξ such that

σ sup_{t>0} t^{−1} ξ[0, t] = ξ̄ a.s. (14)

To justify the statement, we note that singularity is a measurable property of a measure μ. Indeed, by Proposition 2.21, it is equivalent that the function Ft = μ[0, t] be singular. Now it is easy to check that the singularity of F can be described by countably many conditions, each involving the increments of F over finitely many intervals with rational endpoints.

Proof: If ξ is stationary on [0, 1), then the periodic continuation η of ξ, obtained by repeating ξ on every interval [n, n + 1), is clearly stationary on R+, and moreover I_η = I_ξ and η̄ = ξ̄. We may also use the elementary inequality

(x1 + ... + xn)/(t1 + ... + tn) ≤ max_{k≤n} xk/tk, n ∈ N,

valid for arbitrary x1, x2, ... ≥ 0 and t1, t2, ... > 0, to see that sup_{t>0} t^{−1}η[0, t] = sup_{0<t<1} t^{−1}ξ[0, t]. It is then enough to consider random measures on R+. In that case, put Xt = ξ(0, t] and define

At = inf_{s≥t}(s − Xs), αt = 1{At = t − Xt}, t ≥ 0. (15)

Noting that At ≤ t − Xt and using the monotonicity of X, we get for any s < t

As = inf_{r∈[s,t)}(r − Xr) ∧ At ≥ (s − Xt) ∧ At ≥ (s − t + At) ∧ At = s − t + At.

If A0 is finite, then so is At for every t, and we obtain by subtraction

0 ≤ At − As ≤ t − s on {A0 > −∞}, s < t. (16)

Thus, A is nondecreasing and absolutely continuous on {A0 > −∞}.

Now fix a singular path of X such that A0 is finite, and let t > 0 be such that At < t − Xt. Then At + X_{t±} < t by monotonicity, and so, by the left and right continuity of A and X, there exists some ε > 0 such that

As + Xs < s − 2ε, |s − t| < ε.

Then by (16),

s − Xs > As + 2ε ≥ At + ε, |s − t| < ε,

and by (15) it follows that As = At for |s − t| < ε. In particular, A has derivative A′_t = 0 = αt at t. We turn to the complementary set D = {t > 0; At = t − Xt}.
By Theorem 2.15 both A and X are differentiable a.e., the latter with derivative 0, and we form a set D' by excluding the corresponding null sets. We may also 
11. Special Notions of Symmetry and Invariance 219

exclude the at most countably many isolated points of D. Then for any t ∈ D′ we may choose some tn → t in D \ {t}. By the definition of D,

(A_{tn} − At)/(tn − t) = 1 − (X_{tn} − Xt)/(tn − t), n ∈ N,

and as n → ∞ we get A′_t = 1 = αt. Combining this with the result in the previous case gives A′ = α a.e., and since A is absolutely continuous, we conclude from Theorem 2.15 that

At − A0 = ∫_0^t αs ds on {A0 > −∞}, t ≥ 0. (17)

Now recall that Xt/t → ξ̄ a.s. as t → ∞ by Corollary 10.19. When ξ̄ < 1, we see from (15) that −∞ < At/t → 1 − ξ̄ a.s. Also

At + Xt − t = inf_{s≥t}((s − t) − (Xs − Xt)) = inf_{s≥0}(s − θtξ(0, s]),

and hence

αt = 1{inf_{s≥0}(s − θtξ(0, s]) = 0}, t ≥ 0.

Dividing (17) by t and using Corollary 10.9, we get a.s. on {ξ̄ < 1}

P[sup_{t>0}(Xt/t) ≤ 1 | I_ξ] = P[sup_{t≥0}(Xt − t) = 0 | I_ξ] = P[A0 = 0 | I_ξ] = E[α0 | I_ξ] = 1 − ξ̄.

Replacing ξ by rξ and taking complements, we obtain more generally

P[r sup_{t>0}(Xt/t) > 1 | I_ξ] = rξ̄ ∧ 1 a.s., r ≥ 0, (18)

where the result for rξ̄ ∈ [1, ∞) follows by monotonicity. When ξ̄ ∈ (0, ∞), we may simply define σ by (14); if instead ξ̄ = 0 or ∞, we take σ = ϑ, where ϑ is U(0, 1) and independent of ξ. Note that (14) remains true in the latter case, since ξ = 0 a.s. on {ξ̄ = 0} and Xt/t → ∞ a.s. on {ξ̄ = ∞}.

To verify the distributional claim, we conclude from (18) and Theorem 6.4 that, on {ξ̄ ∈ (0, ∞)},

P[σ < r | I_ξ] = P[(r/ξ̄) sup_t(Xt/t) > 1 | I_ξ] = r ∧ 1 a.s., r ≥ 0.

Since the same relation holds trivially when ξ̄ = 0 or ∞, we see that σ is conditionally U(0, 1) given I_ξ, which means that σ is U(0, 1) and independent of I_ξ. □

From the last theorem we may easily deduce a corresponding discrete-time result. Here (14) holds only with inequality and will be supplemented by a sharp relation similar to (18). For a stationary sequence ξ = (ξ1, ξ2, ...) in R+ with invariant σ-field I, we define ξ̄ = E[ξ1 | I] a.s. On {1, ...
, n} we define stationarity in the obvious way in terms of addition modulo n, and we put ξ̄ = n^{−1} Σ_k ξk.
220 Foundations of Modern Probability

Corollary 11.17 (discrete-time ballot theorem) Let ξ = (ξ1, ξ2, ...) be a finite or infinite, stationary sequence of R+-valued random variables, and put Sk = Σ_{j≤k} ξj. Then there exists a U(0, 1) random variable σ ⊥⊥ I such that

σ sup_{k>0} (Sk/k) ≤ ξ̄ a.s. (19)

If the ξk are Z+-valued, we have also

P[sup_{k>0} (Sk − k) ≥ 0 | I] = ξ̄ ∧ 1 a.s. (20)

Proof: Arguing by periodic continuation as before, we may reduce to the case of infinite sequences ξ. Now let ϑ be U(0, 1) and independent of ξ, and define Xt = S_{[t+ϑ]}. Then X has stationary increments, and we note that also I_X = I_ξ and X̄ = ξ̄. By Theorem 11.16 there exists some U(0, 1) random variable σ ⊥⊥ I_X such that a.s.

sup_{k>0} (Sk/k) = sup_{t>0} (S_{[t]}/t) ≤ sup_{t>0} (Xt/t) = ξ̄/σ.

If the ξk are Z+-valued, the same result yields a.s.

P[sup_{k>0} (Sk − k) ≥ 0 | I] = P[sup_{t>0} (Xt − t) > 0 | I_X] = P[sup_{t>0} (Xt/t) > 1 | I] = P[σ < ξ̄ | I] = ξ̄ ∧ 1. □

To state the next result, consider a random element ξ in a countable space S, and put pj = P{ξ = j}. Given an arbitrary σ-field F, we define the information I(j) and the conditional information I(j|F) by

I(j) = −log pj, I(j|F) = −log P[ξ = j | F], j ∈ S.

For motivation, we note the additivity property

I(ξ1, ..., ξn) = I(ξ1) + I(ξ2|ξ1) + ... + I(ξn|ξ1, ..., ξ_{n−1}), (21)

valid for any random elements ξ1, ..., ξn in S. Next we form the associated entropy H(ξ) = EI(ξ) and conditional entropy H(ξ|F) = EI(ξ|F), and note that

H(ξ) = EI(ξ) = −Σ_j pj log pj.

From (21) we see that even H is additive, in the sense that

H(ξ1, ..., ξn) = H(ξ1) + H(ξ2|ξ1) + ... + H(ξn|ξ1, ..., ξ_{n−1}). (22)

If the ξn form a stationary and ergodic sequence such that H(ξ0) < ∞, we show that the averages of the terms in (21) and (22) converge toward a common limit.
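The additivity (22) is an exact identity that can be verified numerically on any joint distribution. A minimal sketch, with an assumed toy joint law of (ξ1, ξ2) chosen only for illustration:

```python
from fractions import Fraction
from math import log, isclose

# Assumed joint distribution of (xi1, xi2) on {0,1} x {'a','b'}; illustrative.
joint = {(0, 'a'): Fraction(1, 8), (0, 'b'): Fraction(3, 8),
         (1, 'a'): Fraction(1, 4), (1, 'b'): Fraction(1, 4)}

def H(dist):
    """Entropy -sum p log p (natural logarithm) of a pmf given as a dict."""
    return sum(-float(p) * log(p) for p in dist.values() if p > 0)

# Marginal of xi1, and conditional entropy H(xi2|xi1) = sum_i p_i H(xi2|xi1=i).
p1 = {}
for (i, j), p in joint.items():
    p1[i] = p1.get(i, Fraction(0)) + p
H_cond = sum(float(p1[i]) * H({j: joint[(i, j)] / p1[i] for j in ('a', 'b')})
             for i in (0, 1))

# Additivity (22) for n = 2: H(xi1, xi2) = H(xi1) + H(xi2|xi1).
assert isclose(H(joint), H(p1) + H_cond, rel_tol=1e-12)
```

The same decomposition chained over n coordinates is exactly (22), and replacing expectations by the random quantities gives (21).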
11. Special Notions of Symmetry and Invariance 221

Theorem 11.18 (entropy and information, Shannon, McMillan, Breiman, Ionescu Tulcea) Let ξ = (ξk) be a stationary and ergodic sequence in a countable space S such that H(ξ0) < ∞. Then

n^{−1} I(ξ1, ..., ξn) → H(ξ0 | ξ_{−1}, ξ_{−2}, ...) a.s. and in L¹.

Note that the condition H(ξ0) < ∞ holds automatically when the state space is finite. Our proof will be based on a technical estimate.

Lemma 11.19 (maximum inequality, Chung, Neveu) For any countably valued random variable ξ and discrete filtration (Fn), we have

E sup_n I(ξ|Fn) ≤ H(ξ) + 1.

Proof: Write pj = P{ξ = j} and η = sup_n I(ξ|Fn). For fixed r > 0, we introduce the optional times

τj = inf{n; I(j|Fn) > r} = inf{n; P[ξ = j|Fn] < e^{−r}}, j ∈ S.

By Lemma 6.2,

P{η > r, ξ = j} = P{τj < ∞, ξ = j} = E[P[ξ = j | F_{τj}]; τj < ∞] ≤ e^{−r} P{τj < ∞} ≤ e^{−r}.

Since the left-hand side is also bounded by pj, Lemma 3.4 yields

Eη = Σ_j E[η; ξ = j] = Σ_j ∫_0^∞ P{η > r, ξ = j} dr ≤ Σ_j ∫_0^∞ (e^{−r} ∧ pj) dr = Σ_j pj(1 − log pj) = H(ξ) + 1. □

Proof of Theorem 11.18 (Breiman): We may assume ξ to be defined on the canonical space S^∞. Then introduce the functions

gk(ξ) = I(ξ0|ξ_{−1}, ..., ξ_{−k+1}), g(ξ) = I(ξ0|ξ_{−1}, ξ_{−2}, ...).

By (21) we may write the assertion in the form

n^{−1} Σ_{k≤n} gk(θkξ) → Eg(ξ) a.s. and in L¹. (23)

Here gk(ξ) → g(ξ) a.s. by martingale convergence and E sup_n gn(ξ) < ∞ by Lemma 11.19. Hence, (23) follows by Corollary 10.8. □

Exercises

1. Show that Lemma 11.1 can be strengthened to lim inf_t t^{−1}ξ[0, t] > 0 a.s. on {ξ ≠ 0}. (Hint: Use Corollary 10.19.)
222 Foundations of Modern Probability

2. Let Ξ be a stationary random set in R. Show that sup Ξ = ∞ a.s. on {Ξ ≠ ∅}. (Hint: Use Lemma 11.1, or prove the result by a similar argument.)

3. For (X, ξ) as in Lemma 11.2, define on the appropriate product space a random measure ζ by ζ(A × B) = ∫_B 1A(θs(X, ξ)) ξ(ds). Show that ζ is again stationary under shifts in R^d.

4. Prove Theorem 11.5 (i) by an elementary argument when the Bn are intervals in R^d. (Hint: If an interval I is partitioned for each n into subintervals Inj with maxj |Inj| → 0, then Σ_j 1{ξI_{nj} ≥ 1} → ξI a.s. Now take expected values and use dominated convergence.)

5. In the context of Theorem 11.8, show that Q_{X,ξ} = Q′_{X,ξ} iff ξ̄ = Eξ̄ ∈ (0, ∞) a.s. Also give examples where Q_{X,ξ} exists while Q′_{X,ξ} does not, and conversely.

6. Let μc be the distribution of the sequence τ1 < τ2 < ..., where ξ = Σ_j δ_{τj} is a stationary Poisson process on R+ with rate c > 0. Show that the μc are asymptotically invariant as c → 0.

7. Show by an example that a finite, exchangeable sequence need not be mixed i.i.d.

8. Let the random sequence ξ be conditionally i.i.d. η. Show that ξ is ergodic iff η is a.s. nonrandom.

9. Let ξ and η be random probability measures on some Borel space such that Eξ^∞ = Eη^∞. Show that ξ =d η. (Hint: Use the law of large numbers.)

10. Let ξ1, ξ2, ... be spreadable random elements in some Borel space S. Prove the existence of a measurable function f: [0, 1]² → S and some i.i.d. U(0, 1) random variables ϑ0, ϑ1, ... such that ξn = f(ϑ0, ϑn) a.s. for all n. (Hint: Use Lemma 3.22, Proposition 6.13, and Theorems 6.10 and 11.10.)

11. Let ξ = (ξ1, ξ2, ...) be an F-spreadable random sequence in some Borel space S. Prove the existence of some random measure η such that, for each n ∈ Z+, the sequence θnξ is conditionally i.i.d. η, given Fn and η.

12. Let ξ1, ..., ξn be exchangeable random variables, fix a Borel set B, and let τ1 < ... < τν be the indices k ∈ {1, ..., n} with Σ_{j<k} ξj ∈ B. Construct a random vector (η1, ..., ηn) =d (ξ1, ..., ξn) such that ξ_{τk} = ηk a.s. for all k ≤ ν. (Hint: Extend the sequence (τk) to k ∈ (ν, n], and apply Theorem 11.13.)

13. Prove a version of Corollary 11.14 for the last maximum.

14. State and prove a continuous-time version of Lemma 11.12. (If no regularity conditions are imposed on the exchangeable processes of Theorem 11.15, we need to consider optional times taking countably many values.)

15. Anticipating the theory of Lévy processes in Chapter 15, show that any exchangeable process on R+ as in Theorem 11.15 has a version with rcll paths.
11. Special Notions of Symmetry and Invariance 223

16. Show by an example that the conclusion of Theorem 11.16 may fail when ξ is not singular.

17. Give an example where the inequality in Corollary 11.17 is a.s. strict. (Hint: Examine the proof.)

18. (Bertrand, André) Show that if two candidates A and B in an election get the proportions p and 1 − p of the votes, then the probability that A will lead throughout the ballot count equals (2p − 1)+. (Hint: Use Corollary 11.17. Alternatively, use a combinatorial argument based on the reflection principle.)

19. Prove the second claim in Corollary 11.17 by a martingale argument, in the case where ξ1, ..., ξn are Z+-valued and exchangeable. (Hint: We may assume that Sn is nonrandom. Then the variables Mk = Sk/k form a reverse martingale, and the result follows by optional sampling.)

20. Prove that the convergence in Theorem 11.18 holds in Lᵖ for arbitrary p > 0 when S is finite. (Hint: Show as in Lemma 11.19 that ||sup_n I(ξ|Fn)||_p < ∞ when ξ is S-valued, and use Corollary 10.8 (ii).)

21. Show that H(ξ, η) ≤ H(ξ) + H(η) for any ξ and η. (Hint: Note that H(η|ξ) ≤ H(η) by Jensen's inequality.)

22. Give an example of a stationary Markov chain (ξn) such that H(ξ1) > 0 but H(ξ1|ξ0) = 0.

23. Give an example of a stationary Markov chain (ξn) such that H(ξ1) = ∞ but H(ξ1|ξ0) < ∞. (Hint: Choose the state space Z+, and consider transition probabilities pij that equal 0 unless j = i + 1 or j = 0.)
Chapter 12

Poisson and Pure Jump-Type Markov Processes

Random measures and point processes; Cox processes, randomization, and thinning; mixed Poisson and binomial processes; independence and symmetry criteria; Markov transition and rate kernels; embedded Markov chains and explosion; compound and pseudo-Poisson processes; ergodic behavior of irreducible chains

Poisson processes and Brownian motion constitute the basic building blocks of modern probability theory. Our first goal in this chapter is to introduce the family of Poisson and related processes. In particular, we construct Poisson processes on bounded sets as mixed binomial processes and derive a variety of Poisson characterizations in terms of independence, symmetry, and renewal properties. A randomization of the underlying intensity measure leads to the richer class of Cox processes. We also consider the related randomizations of general point processes, obtainable through independent motions of the individual point masses. In particular, we will see how the latter type of transformations preserve the Poisson property.

It is usually most convenient to regard Poisson and other point processes on an abstract space as integer-valued random measures. The relevant parts of this chapter may then serve at the same time as an introduction to random measure theory. In particular, Cox processes and randomizations will be used to derive some general uniqueness criteria for simple point processes and diffuse random measures. The notions and results of this chapter form a basis for the corresponding weak convergence theory developed in Chapter 16, where Poisson and Cox processes appear as limits in important special cases.

Our second goal is to continue the theory of Markov processes from Chapter 8 with a detailed study of pure jump-type processes.
The evolution of such a process is governed by a rate kernel α, which determines both the rate at which transitions occur and the associated transition probabilities. For bounded α one gets a pseudo-Poisson process, which may be described as a discrete-time Markov chain with transition times given by an independent, homogeneous Poisson process. Of special interest is the case of compound Poisson processes, where the underlying Markov chain is a random walk. In Chapter 19 we shall see how every Feller process can be
12. Poisson and Pure Jump-Type Markov Processes 225

approximated in a natural way by pseudo-Poisson processes, recognized in that context by the boundedness of their generators. A similar compound Poisson approximation of general Lévy processes is utilized in Chapter 15.

In addition to the already mentioned connections to other topics, we note the fundamental role of Poisson processes for the theory of Lévy processes in Chapter 15 and for excursion theory in Chapter 22. In Chapter 25 the independent-increment characterization of Poisson processes is extended to a criterion in terms of compensators, and we derive some related time-change results. Finally, the ergodic theory for continuous-time Markov chains, developed at the end of this chapter, is analogous to the discrete-time theory of Chapter 8 and will be extended in Chapter 20 to a general class of Feller processes. A related theory for diffusions appears in Chapter 23.

To introduce the basic notions of random measure theory, consider an arbitrary measurable space (S, S). By a random measure on S we mean a σ-finite kernel ξ from the basic probability space (Ω, A, P) into S. Here the σ-finiteness means that there exists a partition B1, B2, ... ∈ S of S such that ξBk < ∞ a.s. for all k. It is often convenient to think of ξ as a random element in the space M(S) of σ-finite measures on S, endowed with the σ-field generated by the projection maps πB: μ ↦ μB for arbitrary B ∈ S. Note that ξB = ξ(·, B) is a random variable in [0, ∞] for every B ∈ S. More generally, it is clear by a simple approximation that ξf = ∫ f dξ is a random variable in [0, ∞] for every measurable function f ≥ 0 on S. The intensity of ξ is defined as the measure Eξ given by (Eξ)B = E(ξB), B ∈ S.

We often encounter the situation when S is a topological space with Borel σ-field S = B(S). In the special case when S is a locally compact, second countable Hausdorff space (abbreviated as lcscH), it is understood that ξ is a.s.
finite on the ring Ŝ of all relatively compact Borel sets. Equivalently, we assume that ξf < ∞ a.s. for every f ∈ C+K(S), the class of continuous functions f ≥ 0 on S with compact support. In this case, the σ-field in M(S) is generated by the projections πf: μ ↦ μf for all f ∈ C+K(S).

The following elementary result provides the basic uniqueness criteria for random measures. Stronger results are given for simple point processes and diffuse random measures in Theorem 12.8, and related convergence criteria appear in Theorem 16.16.

Lemma 12.1 (uniqueness for random measures) Let ξ and η be random measures on S. Then ξ =d η under each of these conditions:
(i) (ξB1, ..., ξBn) =d (ηB1, ..., ηBn) for any B1, ..., Bn ∈ S, n ∈ N;
(ii) ξf =d ηf for any measurable function f ≥ 0 on S.
If S is lcscH, it suffices in (ii) to consider functions f ∈ C+K(S).

Proof: The sufficiency of (i) is clear from Proposition 3.2. Next we note that (i) follows from (ii), as we apply the latter condition to any positive linear combination f = Σk ck 1Bk and use the Cramér–Wold Corollary 5.5.
226 Foundations of Modern Probability

Now assume that S is lcscH, and that (ii) holds for all f ∈ C+K(S). Since C+K(S) is closed under positive linear combinations, we see as before that

(ξf1, ..., ξfn) =d (ηf1, ..., ηfn), f1, ..., fn ∈ C+K(S), n ∈ N.

By Theorem 1.1 it follows that L(ξ) = L(η) on the σ-field G = σ{πf; f ∈ C+K(S)}, where πf: μ ↦ μf, and it remains to show that G contains F = σ{πB; B ∈ S}. Then fix any compact set K ⊂ S, and choose some functions fn ∈ C+K(S) with fn ↓ 1K. Since μfn ↓ μK for every μ ∈ M(S), the mapping πK is G-measurable by Lemma 1.10. Next apply Theorem 1.1 to the Borel subsets of an arbitrary compact set, to see that πB is G-measurable for any B ∈ Ŝ. Hence, F ⊂ G. □

By a point process on S we mean an integer-valued random measure ξ. In other words, we assume ξB to be a Z+-valued random variable for every B ∈ Ŝ. Alternatively, we may think of ξ as a random element in the space N(S) ⊂ M(S) of all σ-finite, integer-valued measures on S. When S is Borel, we may write ξ = Σ_{k≤κ} δ_{γk} for some random elements γ1, γ2, ... in S and κ in Z+ ∪ {∞}, and we note that ξ is simple iff the γk with k ≤ κ are distinct. In general, we may eliminate the possible multiplicities to create a simple point process ξ*, which agrees with the counting measure on the support of ξ. By construction it is clear that ξ* is a measurable function of ξ.

A random measure ξ on a measurable space S is said to have independent increments if the random variables ξB1, ..., ξBn are independent for any disjoint sets B1, ..., Bn ∈ S. By a Poisson process on S with intensity measure μ ∈ M(S) we mean a point process ξ on S with independent increments such that ξB is Poisson with mean μB whenever μB < ∞. By Lemma 12.1 the stated conditions specify the distribution of ξ, which is then determined by the intensity measure μ.
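The defining Poisson property can be checked numerically. For a Poisson process ξ on S = [0, 1] with intensity μ = λ · Lebesgue, the Laplace functional satisfies Ee^{−ξf} = exp{−μ(1 − e^{−f})} (this is Lemma 12.2 (i) below). A Monte Carlo sketch with the illustrative choices λ = 2 and f(s) = s, for which μ(1 − e^{−f}) = λ ∫_0^1 (1 − e^{−s}) ds = λ/e:

```python
import math
import random

def poisson_points(lam, rng):
    """One realization of a rate-lam Poisson process on [0, 1],
    built from i.i.d. exponential gaps between points."""
    pts, t = [], rng.expovariate(lam)
    while t < 1.0:
        pts.append(t)
        t += rng.expovariate(lam)
    return pts

def laplace_mc(lam, f, trials=100_000, seed=1):
    """Monte Carlo estimate of E exp(-xi f)."""
    rng = random.Random(seed)
    total = sum(math.exp(-sum(map(f, poisson_points(lam, rng))))
                for _ in range(trials))
    return total / trials

lam = 2.0
lhs = laplace_mc(lam, lambda s: s)
rhs = math.exp(-lam * math.exp(-1.0))   # exp{-mu(1 - e^{-f})} for f(s) = s
assert abs(lhs - rhs) < 0.01
```

The exponential-gap construction used here is itself a standard way to realize the Poisson process on the line; the general construction on abstract spaces is given in Theorem 12.7.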
More generally, for any random measure η on S, we say that a point process ξ is a Cox process directed by η if it is conditionally Poisson, given η, with E[ξ|η] = η a.s. In particular, we may take η = αμ for some measure μ ∈ M(S) and random variable α ≥ 0 to form a mixed Poisson process based on μ and α.

We next define a ν-randomization ζ of an arbitrary point process ξ on S, where ν is a probability kernel from S to some measurable space T. Assuming first that ξ is nonrandom and equal to μ = Σk δ_{sk}, we may take ζ = Σk δ_{(sk, γk)}, where the γk are independent random elements in T with distributions ν(sk, ·). Note that the distribution Pμ of ζ depends only on μ. In general, we define a ν-randomization ζ of ξ by the condition P[ζ ∈ ·|ξ] = Pξ a.s. In the special case when T = {0, 1} and ν(s, {0}) ≡ p ∈ [0, 1], we refer to the point process ξp = ζ(· × {0}) on S as a p-thinning of ξ. Another special instance is when S = {0}, ξ = κδ0, and ν = μ/μT for some μ ∈ M(T) with μT ∈ (0, ∞), in which case ζ is called a mixed binomial (or sample) process based on μ and κ. Note that ζB is then binomially distributed, conditionally on κ, with parameters νB and κ. If T is Borel,
12. Poisson and Pure Jump-Type Markov Processes 227

we can write ζ = Σ_{k≤κ} δ_{γk}, where the random elements γk are i.i.d. ν and independent of κ.

Our first aim is to examine the relationship between the various point processes introduced so far. Here we may simplify the computations by using the Laplace functional ψξ(f) = Ee^{−ξf} of a random measure ξ, defined for any measurable function f ≥ 0 on the state space S. Note that ψξ determines the distribution L(ξ) by Lemma 12.1 and the uniqueness theorem for Laplace transforms. The following lemma lists some useful formulas. Recall that a kernel ν between two measurable spaces S and T may be regarded as an operator between the associated function spaces, given by νf(s) = ∫ ν(s, dt) f(t). For convenience, we write ν̂(s, ·) = δs ⊗ ν(s, ·), so that μν̂ = μ ⊗ ν.

Lemma 12.2 (Laplace functionals) Let f, g ≥ 0 be measurable.
(i) If ξ is a Poisson process with Eξ = μ, then

Ee^{−ξf} = exp{−μ(1 − e^{−f})}.

Here we may replace f by if when f: S → R with μ(|f| ∧ 1) < ∞.
(ii) If ξ is a Cox process directed by η, then

Ee^{−ξf−ηg} = E exp{−η(1 − e^{−f} + g)}.

(iii) If ζ is a ν-randomization of ξ, then

Ee^{−ζf} = E exp(ξ log ν̂e^{−f}).

(iv) If ξp is a p-thinning of ξ, then

Ee^{−ξpf−ξg} = E exp{−ξ(g − log{1 − p(1 − e^{−f})})}.

(v) If ξ is a mixed binomial process based on μ and κ, then

Ee^{−ξf} = E(μe^{−f}/μS)^κ.

Proof: (i) If α is a Poisson random variable with mean m, then clearly

Ee^{−cα} = e^{−m} Σ_{k≥0} (me^{−c})^k/k! = exp{−m(1 − e^{−c})}, c ∈ C.

Now let f = Σ_{k≤m} ck 1Bk, where ck ≥ 0 and the sets Bk ∈ S are disjoint with μBk < ∞. Then

Ee^{−ξf} = E exp{−Σk ck ξBk} = ∏k Ee^{−ck ξBk} = ∏k exp{−μBk(1 − e^{−ck})} = exp{−Σk μBk(1 − e^{−ck})} = exp{−μ(1 − e^{−f})}.

For general f ≥ 0, we may choose some simple functions fn ≥ 0 with fn ↑ f and conclude by monotone convergence that ξfn → ξf and μ(1 − e^{−fn}) → μ(1 − e^{−f}). The asserted formula then follows by dominated convergence from the version for fn.
228 Foundations of Modern Probability

Now assume that μ(|f| ∧ 1) < ∞. Replacing f by ε|f| in the previous formula and letting ε ↓ 0, we get by dominated convergence P{ξ|f| < ∞} = e^0 = 1, or ξ|f| < ∞ a.s. Next choose some simple functions fn → f with |fn| ≤ |f| and μ|fn| < ∞, and note that |1 − e^{−ifn}| ≤ |f| ∧ 2 by Lemma 5.14. By dominated convergence we obtain ξfn → ξf and μ(1 − e^{−ifn}) → μ(1 − e^{−if}). The extended formula now follows from the version for fn.

(ii) By (i) we have

Ee^{−ξf−ηg} = E e^{−ηg} E[e^{−ξf}|η] = E e^{−ηg} exp{−η(1 − e^{−f})} = E exp{−η(1 − e^{−f} + g)}.

(iii) First assume that ξ = Σk δ_{sk} is nonrandom. Introducing some independent random elements γk in T with distributions ν(sk, ·), we get

Ee^{−ζf} = E exp{−Σk f(sk, γk)} = ∏k Ee^{−f(sk,γk)} = ∏k ν̂e^{−f}(sk) = exp Σk log ν̂e^{−f}(sk) = exp(ξ log ν̂e^{−f}).

Hence, in general, Ee^{−ζf} = E E[e^{−ζf}|ξ] = E exp(ξ log ν̂e^{−f}).

(iv) Apply (iii), or use the same method of proof.

(v) We may assume that ξ = Σ_{k≤κ} δ_{γk}, where γ1, γ2, ... are i.i.d. with distribution μ/μS and independent of κ. Using Fubini's theorem, we get

Ee^{−ξf} = E exp{−Σ_{k≤κ} f(γk)} = E ∏_{k≤κ} Ee^{−f(γk)} = E ∏_{k≤κ} (μe^{−f}/μS) = E(μe^{−f}/μS)^κ. □

It is now easy to prove that the Poisson property is preserved under randomizations. Here is a more general result.

Proposition 12.3 (preservation laws) For any measurable spaces S, T, and U, consider some probability kernels μ: S → T and ν: S × T → U.
(i) If ξ is a Cox process on S directed by η and ζ ⊥⊥_ξ η is a μ-randomization of ξ, then ζ is a Cox process directed by η ⊗ μ.
(ii) If η is a μ-randomization of ξ and ζ is a ν-randomization of η, then ζ is a μ ⊗ ν-randomization of ξ.

Note that the conditional independence in (i) holds automatically when ζ is constructed from ξ by independent randomization, as in Lemma 6.9.
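A familiar consequence of Lemma 12.2 (iv) and these preservation laws is that a p-thinning of a Poisson process is again Poisson, with intensity pμ. On the level of counts this reduces to the mixture identity Σ_k Pois(λ; k) Bin(k, p; m) = Pois(pλ; m), which the sketch below (illustrative parameters) verifies numerically:

```python
from math import comb, exp, lgamma, log, isclose

def pois(lam, k):
    """Poisson pmf, computed via logs to avoid huge factorials."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def binom(k, p, m):
    """Binomial(k, p) pmf at m."""
    return comb(k, m) * p**m * (1 - p)**(k - m)

def thinned_pmf(lam, p, m, kmax=120):
    """P{thinned count = m} = sum_k P{N = k} P{Bin(k, p) = m},
    truncated at kmax (tail negligible for moderate lam)."""
    return sum(pois(lam, k) * binom(k, p, m) for k in range(m, kmax))

lam, p = 3.0, 0.4
assert all(isclose(thinned_pmf(lam, p, m), pois(lam * p, m), rel_tol=1e-6)
           for m in range(10))
```

The same computation with the retained and deleted points tracked jointly would also show that the two thinned processes are independent Poisson, a fact recovered more elegantly from the Laplace functional in (iv).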
12. Poisson and Pure Jump-Type Markov Processes 229 Proof: (i) Using Proposition 6.6 and Lemma 12.2 (ii) and (iii), we get for any measurable functions f, 9 > 0 Ee-(f-TJflg Ee- 1J ji,g E[e-(fl, 1]] Eexp{logjte-f -1]jtg} Eexp{ -1](1 - jle- f + jly)} - Eexp{-ru 1 (1 - e- f + g)}. The result now follows by Lemmas 12.1 (ii) and 12.2 (ii). (ii) By Lemma 12.2 (iii), Ee-(f E exp{ 1] log ve- f} E exp{  log flve- f} E exp {  log (Jl @ v)" e - f } . o We continue with a basic relationship between Poisson and binomial processes. The result leads to an easy construction of the general Poisson process in Theorem 12.7. The significance of mixed Poisson and binomial processes is further clarified by Theorem 12.12 below. Theorem 12.4 (mixed Poisson and binomial processes) Consider a point process  and a a-finite measure J..l on a common space (8,5), and let B 1 , B 2 , ... E S with Bn t S. Then  is a mixed Poisson or binomial process based on J..l iff the same property holds on Bn for every n EN. Proof: First assume that  is a mixed Poisson process based on Jl and a. Then the same property holds for the restriction to any set B E S with J-tB = 00. If instead J..lB E (0, 00 ), let 1] be a mixed binom ial process based on IB . p and "", where"" is conditionally Poisson with mean allB. By Lemma 12.2 (ii) and (v) we have for any measurable function f: S --+ JR+ supported by B Ee- TJf E(p[e- f; B]/ pB) K - E exp( -aJ.lB(l - J..l[e- f; B]/ J.ll1)) - E exp( -aJ.l[1 - e- f; B]) E exp ( - Q J..l (1 - e - f)) == E e -f" f . Thus,  d 1] on B, as required. Next let  be a mixed binomial process on S based on J.L and K, and fix any B E S with J.lB > O. Let Kp be a p-thinning of K, where p = J-tB / J.lS, and consider a mixed binomial process 1] based on IB . J.L and "'p. Using Lemma 12.2 (iv) and (v), we get for any measurable function f > 0 supported byB 
230 Foundations of Modern Probability Ee- 1Jf - E(J.t[e-f;B]/J.tB)P E{ 1-  (1- JL[e:B] ) }  - E(l- J.t(l- e-f)/J.tS)K - J _ E (j-te - f / j-tS) K == Ee -(.f . Again it follows that  d TJ on B. To prove the converse assertion, we may clearly assume that j-tBn E (0,00) for all n, so that 1Bn .  is a mixed binomial process based on IBn . J-t and Bn' If f > 0 is supported by Bm, then by Lemma 12.2 we have for n > m Ee-f = E ( JL[e- f; B n ] ) £;Bn = E ( 1 _ JL(l - e- f) ) £;Bn . J.tBn j.tBn If J-tS < 00, then as n --t 00 we get by dominated convergence Ee- U = E(1- JL(l :;-f) ) £;S = E( JL=f ) £;S Taking f = c1Bm and letting c -+ 0, we see in particular that S < 00 a.s. The relation extends by dominated convergence to arbitrary f > 0, and so by Lemma 12.2 we conclude that  is a mixed binomial process based on ft and €S. If instead jLS = 00, Theorem 5.19 shows that Bn/ /-LBn  a in [0,00] along some subsequence N' c N, where 0 < Q < 00. By Theorem 4.30 we d may choose some an = Bn/ j-tBn such that an -+ 0: a.s. along N', and so by dominated convergence in (1) (1) Ee-f == Eexp(-ajL(l- e- f )). As before, we see that Q < 00 a.s., and by monotone and dominated con- vergence we may extend the relation to arbitrary f > O. Hence, Lemma 12.2 shows that  is a mixed Poisson process based on J-t and Q. 0 The l;t, result leads in particular to a criterion for a Poisson process to '-) be simpTe Recall that a measure J.t on S is said to be diffuse if J..t{ s} == 0 for all s E S. Corollary 12.5 (simplicity and diffuseness) Let  be a Cox process di- rected by some random measure '1}, both defined on a Borel space S. Then  is a.s. simple iff", is a.s. diffuse. Proof: It is enough to establish the corresponding property for mixed binomial processes. Then let ')'1,,2,... be Li.d. with distribution J.L. By 
12. Poisson and Pure Jump- Type Markov Processes 231 Fubini's theorem Phi = ')'j} = J J.L{ s }J.L(ds) = Ls (J.L{ S})2, i =1= j, and so the fj are a.s. distinct iff J.t is diffuse. 0 The following uniqueness assertion will playa crucial role in a subsequent proof. Lemma 12.6 (uniqueness for Cox processes and thinnings) Fix apE (0, 1). (i) For any Cox processes  and' directed by 1] and ffJ', we have  d ' jJ d / 'l 'TJ = 'TJ . (ii) For any p-thinnings p and  of  and', we have p d  iff  d '. Proof: We prove only (i), the argument for (ii) being similar. By Lemma 12.2 (ii) we have for any measurable function 9 > 0 on S Ee-g = E exp{ -'TJ(1 - e- g )} = Ee-1]f, where f = 1-e- g , and similarly for' and 'fl. Assuming  d /, we conclude that Ee-T/f = Ee-T/'f for any measurable function f: S --+ [0,1). Then also Ee-tT/f = Ee- tTl ' f, t E [0,1], and since both sides are analytic for t > 0, the relation extends to all t > O. Hence, Ee- TJf = Ee-T/' f for all bounded, measurable functions f > 0, and Lemma 12.1 (ii) yields 1] d 'TJ'. 0 We proceed to establish the existence of a Poisson process with arbitrary intensity measure on a general measurable space. More generally, we can prove the existence of arbitrary Cox processes and randomizations, which also covers the cases of thinnings and mixed binomial processes. Theorem 12.7 (existence) Fix any measurable spaces Sand T, and allow suitable extensions of the basic probability space. (i) For any random measure", on 5, there exists a Co; process  directed by 'TJ. (ii) For any point process  on S and probability kernelll: S -t T, there exists a v-randomization ( of . Proof: (i) First assume that", = J.t is nonrandom with J-LS E (0,00). By Corollary 6.18 we may choose a Poisson distributed random variable K with EK = J1S and an independent sequence of i.i.d. random elements ,1,,2, . . . in S with distribution J-L/ J-LS. By Theorem 12.4 the random measure  = Ej:::;K 8j is then Poisson with intensity J-L. Next let J1S = 00. 
Since $\mu$ is $\sigma$-finite, we may split $S$ into disjoint subsets $B_1, B_2, \ldots \in \mathcal{S}$ such that $\mu B_k \in (0, \infty)$ for each $k$. As before, there exists for every $k$ a Poisson process $\xi_k$ on $S$ with intensity $\mu_k = 1_{B_k} \cdot \mu$, and by
232 Foundations of Modern Probability Corollary 6.18 we may choose the k to be independent. Writing  == Lk k and using Lemma 12.2 (i), we get for any measurable function f > 0 on S Ee -f ilk Ee-f.kf = ilk exp { -J.Lk(l - e- f )} exp { - L k J.Lk(l - e- f) } exp{ -tt(l - e- f )}. Using Lemmas 12.1 (ii) and 12.2 (i), we conclude that  is a Poisson process with intensity J..L. Now let f,J-t be a Poisson process with intensity tt. Then for any numbers m1, . . . , m n E Z+ and disjoint sets B 1 , .. . , Bn E S, we have p n {J-tBk == mk} = Il e-J-tBk (J.LBk)mk /mk!' kn kn which is a measurable function of J..L. (Here the expression on the right is understood to be 0 when ttBk == 00.) The measurability extends to arbitrary sets Bk E S, since the general probability on the left is a finite sum of such products. 'Now the sets on the left form a 1r-system generating the a-field in N(B), and so by Theorem 1.1 we conclude that PJ-t == £(f,J-t) is a probability kernel from M(S) to N(S). But then Lemma 6.9 ensures the existence, for any random measure 1J on S, of a Cox process  directed by 'TJ. (ii) First let J..L = Lk 6Sk be nonrandom in N(S). By Corollary 6.18 there exist some independent random elements 'Yk in T with distributions V(Sk, .), and we note that (JL == Lk 6 Sk ,'Yk is a v-randomization of J..L. Letting Bl'...' Bn E S x T and 81,. .. ,8n E (0,1), we get by Lemma 12.2 (iii) E exp (I' Lk 1Bk log Sk - exp J.L log vexp Lk 1Bk log Sk exp J.L log v Ilk SBk . Using Lemma 1.41 (i) twice, we see that f) TIk8Bk is a measurable function on S for fixed Sl,.. ., 8n, and hence that the right-hand side is a measurable function of J-L. Differentiating mk times with respect to Sk for each k and taking 81 == . . . == 8n = 0, we conclude that the probability P nk {(J-tBk == mk} is a measurable function of J.L for any ml, . . . , m n E Z+. As before, it follows that PJL == £«(J-t) is a probability kernel from N(B) to N(B x T), and the general result follows by Lemma 6.9. 
□

We may use Cox transformations and thinnings to derive some general uniqueness criteria for simple point processes and diffuse random measures, improving the elementary statements in Lemma 12.1. Related convergence criteria are given in Proposition 16.17 and Theorems 16.28 and 16.29.
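The construction behind Theorem 12.7 (i) is entirely elementary and translates directly into a sampling recipe: on each piece $B_k$ of a $\sigma$-finite partition, draw a Poisson number of i.i.d. points and superpose. A minimal numpy sketch (the function name and calling convention are my own, for illustration only):

```python
import numpy as np

def poisson_on_pieces(mu_masses, samplers, rng):
    """Theorem 12.7 (i)-style construction for a sigma-finite intensity:
    split S into pieces B_k with 0 < mu(B_k) < infinity, build an
    independent Poisson process on each piece, and superpose.

    mu_masses[k]    -- mu(B_k)
    samplers[k](n, rng) -- n i.i.d. points from mu restricted to B_k,
                           normalized to a probability measure."""
    points = []
    for mass, sample in zip(mu_masses, samplers):
        kappa = rng.poisson(mass)          # Poisson number of points
        points.append(sample(kappa, rng))  # i.i.d. with law mu(.)/mass
    return np.concatenate(points)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Lebesgue intensity on (0, 3], split into three unit intervals.
    masses = [1.0, 1.0, 1.0]
    samplers = [lambda n, r, a=k: r.uniform(a, a + 1.0, size=n)
                for k in range(3)]
    xi = poisson_on_pieces(masses, samplers, rng)
    # The superposition is again Poisson, so the total count over
    # (0, 3] has mean and variance both equal to 3.
```

The final superposition step is justified by the Laplace-functional computation in the proof of Theorem 12.7.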
12. Poisson and Pure Jump-Type Markov Processes 233 Theorem 12.8 (one-dimensional uniqueness criteria) let (3, S) be Borel. (i) For any simple point processes  and "l on S, we have  d "l iff P{B = O} = P{'rJB = O} for all B E S. (ii) Let  and"l be simple point processes or diffuse random measures on 5, and fix any C > O. Then  d "l iff Ee-cB == Ee- Cl1B for all B E S. (iii) Let  be a simple point process or diffuse random rneasure on S, and let "l be an arbitrary random measure on S. Then; d "l iff B d TJB for all B E s. Proof: We may clearly assume that S = (0,1]. (i) Let C denote the class of sets {J.L; J.LB == O} with B E S, and note that C is a 7r-system since {JLB = O} n {JLG = O} = {JL(B U C) = O}, B, C E S. By Theorem 1.1 it follows that  d 77 on a(C). Furthermore, writing Inj == 2- n (j -l,j] for n E Nand j = 1,... ,2 n , we have f-L* B = lim L .(JL(B n Inj) 1\ 1), f-L E N(S), B E S, noo 1 which shows that the mapping JL r4 JL* is a(C)-measurable. Since  and TJ are simple, we conclude that  == * d 'rJ* == TJ. (ii) First let { and 1} be diffuse. By Theorem 12.7 we may choose some Cox processes  and ij directed by c{ and C'rJ. Conditioning on  or TJ, respectively, we obtain P{B = O} == Ee-c{B = Ee- cTJB == P{ijB = O}, B E S. (2) Since t and ij are a.s. simple by Corollary 12.5, assertion (i) yields t d iJ, and so  d 'T} by Lemma 12.6. If  and 'TJ are instead simple point processes, then (2) holds by Lemma 12.2 (iv) when  and ij are p-thinnings of  and 'T} with p = 1 - e- c , and the proof may be completed as before. (iii) First let  be a simple point process. Fix any B E S such that 'TJE < 00 a.s. Defining Inj as before, we note that 1}(B n Inj) E Z+ outside a fixed null set. It follows easily that IB . 'T} is a.s. integer valued, and so even 1] is a.s. a point process. Noting that P{1]* B = O} = P{1]B = O} = P{B = OJ, 11 E S, we conclude from (i) that  d 'rJ*. In particular, 'TJ B d  B d 77* B for all B, and so 1}* == 1] a.s. Next assume that  is a.s. 
diffuse. Letting $\tilde\xi$ and $\tilde\eta$ be Cox processes directed by $\xi$ and $\eta$, we note that $\tilde\xi B \overset{d}{=} \tilde\eta B$ for every $B \in \mathcal{S}$. Since $\tilde\xi$ is a.s. simple by Corollary 12.5, it follows as before that $\tilde\xi \overset{d}{=} \tilde\eta$, and so $\xi \overset{d}{=} \eta$ by Lemma 12.6. □
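The thinning device used in the proof above is also easy to simulate. Taking $g = 0$ in Lemma 12.2 (iv) shows that a $p$-thinning of a Poisson process with intensity $\mu$ is again Poisson, with intensity $p\mu$; the sketch below (numpy assumed, names mine) checks this on the counts.

```python
import numpy as np

def p_thinning(points, p, rng):
    """Independent p-thinning of a point configuration: keep each
    point with probability p, delete it with probability 1 - p,
    independently across points."""
    points = np.asarray(points)
    return points[rng.uniform(size=len(points)) < p]

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    counts = []
    for _ in range(5000):
        # Poisson process on [0, 1] with intensity 10 * Lebesgue.
        xi = rng.uniform(size=rng.poisson(10.0))
        counts.append(len(p_thinning(xi, 0.3, rng)))
    # The thinned process is Poisson with intensity 0.3 * 10, so the
    # counts have mean and variance both close to 3.
```

Iterating the map $p \mapsto$ ($p$-thinning) is exactly the Cox-transform machinery used in the point-process half of the proof of Theorem 12.8.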
234 Foundations of Modern Probability As an easy consequence, we get the following characterization of Pois- son processes. To simplify the statement, we may allow a Poisson random variable to have infinite mean, hence to be a.s. infinite. Corollary 12.9 (one-dimensional Poisson criterion, Renyi) Let  be a random measure on a Borel space S such that {s} = 0 a.s. for all S E S. Then  is a Poisson process iff B is Poisson for every B E S, in which case E is u-finite and diffuse. Proof: Assume the stated condition. Then J.L = E is clearly a-finite and diffuse, and by Theorem 12.7 there exists a Poisson process 1] on S with intensity J-l. Then ryB d B for all B E S, and since 1] is a.s. simple by d Corollary 12.5, we conclude from Theorem 12.8 that € = 'rJ. 0 Much of the previous theory can be extended to the case of marks. Given any measurable spaces (S, S) and (K, lC), we define a K -marked point pro- cess on S as a point process  on S x K in the usual sense satisfying ({ s} x K) < 1 identically and such that the projections €(. x Kj) are a-finite point processes on S for some measurable partition K 1 , K 2 , . . . of K. We say that € has independent increments if the point processes €(B 1 x.), . . . , (Bn x.) on K are independent for any disjoint sets B 1 , . . . , Bn E S. We also say that  is a Poisson process if € is Poisson in the usual sense on the product space S x K. The following result characterizes Poisson processes in terms of the independence property. The result plays a crucial role in Chapters 15 and 22. A related characterization in terms of compensators is given in Corollary 25.25. Theorem 12.10 (independence criterion for Poisson, Erlang, Levy) Let  be a K -marked point process on a Borel space S such that €( {s} x K) = 0 a.s. for all S E S. Then  is Poisson iff it has independent increments, in which case E is a-finite with diffuse projections onto S. Proof: We may assume that S = (0,1]. 
Fix any set $B \in \mathcal{S} \otimes \mathcal{K}$ with $\xi B < \infty$ a.s., and note that the projection $\eta = (1_B \cdot \xi)(\cdot \times K)$ is a simple point process on $S$ with independent increments such that $\eta\{s\} = 0$ a.s. for all $s \in S$. Introduce the dyadic intervals $I_{nj} = 2^{-n}(j-1, j]$, and note that $\max_j \eta I_{nj} \vee 1 \to 1$ a.s. Next fix any $\varepsilon > 0$. By dominated convergence, every point $s \in [0,1]$ has an open neighborhood $G_s$ such that $P\{\eta G_s > 0\} < \varepsilon$, and by compactness we may cover $[0,1]$ by finitely many such sets $G_1, \ldots, G_m$. Choosing $n$ so large that every interval $I_{nj}$ lies in one of the $G_k$, we get $\max_j P\{\eta I_{nj} > 0\} < \varepsilon$. This shows that the variables $\eta I_{nj}$ form a null array. Now apply Theorem 5.7 to see that the random variable $\xi B = \eta S = \sum_j \eta I_{nj}$ is Poisson. Since $B$ was arbitrary, Corollary 12.9 then shows that $\xi$ is a Poisson process on $S \times K$. The last assertion is now obvious. □
12. Poisson and Pure Jump-Type Markov Processes 235 The last theorem yields in particular a representation of random mea- sures with independent increments. A version for general processes on 1R+ will be proved in Theorem 15.4. Corollary 12.11 (independent increments) Let € be a random measure on a Borel space S such that  { s} == 0 a. s. for all s. Then  has independent increments iff a.s. B = aB + 1 00 xT}(B x dx), B E S, (3) for some nonrandom measure Q on S and some Poisson process TJ on S x (0,00). Furthermore, B < 00 a.s. for some B E S iff oB < 00 and 1 00 (x 1\ 1) ET}(B x dx) < 00. (4) Proof: Introduce on S x (0,00) the point process TJ == L8 8s,{s}, where the required measurability follows by a simple approximation. Noting that 1] has independent S-increments, and also that 1]({S} x (0,00)) == 1{{s} > O} < 1, S E S, we conclude from Theorem 12.10 that rJ is a Poisson process. Subtracting the atomic part from , we get a diffuse random measure 0 satisfying (3), and we note that a has again independent increments. Hence, a is a.s. nonrandom by Theorem 5.11. Next, Lemma 12.2 (i) yields for any B E S and r > 0 -logEexp {-r 1 00 XT}(B x dX)} = 1 00 (1 - e- rx ) ET}(B x dx). As r -t 0, it follows by dominated convergence that Jo CX ) x1](B x dx) < 00 a.s. iff (4) holds. 0 We proceed to characterize the mixed Poisson and binomial processes by a natural symmetry condition. Related results for more general processes appear in Theorems 11.15 and 16.21. Given a random measure  and a diffuse measure j.L on S, we say that € is j.L-symmetric if  0 1- 1 d  for every JL-preserving mapping f on S. Theorem 12.12 (symmetric point processes) Consider a simple point process € and a diffuse, a-finite measure J.-l on a Borel space S. Then € is J..l-symmetric iff it is a mixed Poisson or binomial process based on J.-l. Proof: By Theorem 12.4 and scaling we may assume that J.-lS == 1. 
By the symmetry of  there exists a function 'P on [0,1] such that P{B == O} = r.p(jjB) for all B, and by Theorem 12.8 (i) it is enough to show that 'P has the desired form. For notational convenience, we may then assume that jj equals Lebesgue measure on (0,1], the general case being similar. Then introduce for suitable j,n E N the intervals Inj == n- 1 (j -1,j], and 
236 Foundations of Modern Probability put nj = Inj 1\ 1. Writing Kn = Lj nj, we get by symmetry k-l . <p(k/n) = E n n - "'n  J , 0 < k < n. . O n - J J= As n -t 00, we have ""n --t K == (O, 1], and so for kin  t E (0,1) k-1 . IT n - ""n - J log . n- J j=O n L log (1 - "'r n ) r=n-k+1 r-v n i n -1 -1 -K L r r-v -K X dx r=n-k+1 n-k K log( 1 - kin) -+ "" log( 1 - t). Hence, the product on the left tends to (1 - t), and so by dominated convergence we get 'P(t) = E(l - t)'" for rational t E (0,1), which extends by monotonicity to all real t E [0, 1]. This clearly agrees with the result for a mixed binomial process on (0,1] with K points. 0 Integrals with respect to Poisson processes occur frequently in applica- tions. The next result gives criteria for the existence of the integrals f, ( - /)/, and ( - J1)/, where  and ' are independent Poisson processes with a common intensity measure J-L. In each case the integral may be de- fined as a limit in probability of elementary integrals fn, ( - /)fn, or ( - JL) / n, respectively, where the / n are bounded with compact support and such that Ifni < If I and fn -4 f. We say that the integral of f exists if the appropriate limit exists and is independent of the choice of approximating functions f n . Lemma 12.13 (Poisson integrals) Let  and ' be independent Poisson processes on S with the same intensity measure JL. Then for any measurable function f on S, we have (i) f exists iff J1( 1/1/\ 1) < 00; (ii) (- /)f exists iff J-L(f2 1\ 1) < 00; (iii) (- J.t)f exists iff Jl;(f2 /\ liD < 00. In each case, it is also equivalent that the corresponding set of approximat- ing elementary integrals is tight. Proof: (i) If Ifl < 00 a.s., then Jl;(lfl /\ 1) < 00 by Lemma 12.2. The converse implication was established in the proof of the same lemma. (ii) First consider a deterministic counting measure II = Lk lJ sk , and define v = Lk {}k 8 sk' where {)1, {)2, . .. are i.i.d. random variables with P{{}k = :f:1} = . 
By Theorem 4.17, the series $\tilde\nu f$ converges a.s. iff $\nu f^2 < \infty$, and otherwise $|\tilde\nu f_n| \overset{P}{\to} \infty$ for any bounded approximations $f_n = 1_{B_n} f$ with $B_n \in \mathcal{S}$. The result extends by conditioning to arbitrary point processes $\nu$ and their symmetric randomizations $\tilde\nu$. Now Proposition
12. Poisson and Pure Jump-Type Markov Processes 237 12.3 exhibits  - ' as such a randomization of the Poisson process  + /, and by part (i) we have (+ /)f2 < 00 a.s. iff J-t(f2 1\ 1) < 00. (iii) Write f == 9 + h, where 9 == fl{lfl < I} and h == jl{lfl > I}. First assume that J-tg2 + ttlhl == tt(f2/\ If I) < 00. Since clearly E( f - jjf)2 == pf2, the integral (- J-t)g exists. Furthermore, h exists by part (i). Hence, even ( - JL)f == ( - p)g + h - JLh exists. Conversely, assume that ( - J-t)f exists. Then so does ( - /)j, and by part (ii) we get ttg 2 + J1{h =1= O} == J1(f2/\ 1) < 00. The existence of (- J-t)g now follows by the direct assertion, and trivially even h exists. Thus, the existence of ph == (- J1)g + h - (- J1)f follows, and so J-tlhl < 00. 0 A Poisson process  on JR+ is said to be time-homogeneous with rate C > 0 if E == CA. In that case Proposition 8.5 shows that Nt == [O, t], t > 0, is a space- and time-homogeneous Markov process. We now introduce a more general class of Markov processes. Say that a process X in some measurable space (8, S) is of pure jump type if its paths are a.s. right-,continuous and constant apart from isolated jumps. In that case we may denote the jump times of X by 71,72, . . . , with the understanding that Tn == 00 if there are fewer than n jumps. By Lemma 7.3 and a simple approximation, the times Tn are optional with respect to the right-continuous filtration :F == (Ft) induced by X. F'or convenience we may choose X to be the identity mapping on the canonical path space 0. When X is Markov, the distribution with initial state x is denoted by Px, and we note that the mapping x t---t Px is a kernel from (8, S) to (0, Foo). We begin our study of pure jump-type Markov processes by proving an extension of the elementary strong Markov property in Proposition 8.9. A further extension appears as Theorem 19.17. 
Theorem 12.14 (strong Markov property, Doob) A pure jump-type Markov process satisfies the strong Markov property at every optional time. Proof: For any optional time r, we may choose some optional times an > r+2- n taking countably many values such that an -+ T a.s. By Proposition 8.9 we get, for any A E:F., n {T < oo} and B E :Foo, P[f)unX E B; A] == E[P x <7n B; A]. (5) By the right-continuity of X, we have P{XD"n =1= X T } -+ o. If B depends on finitely many coordinates, it is also clear that P({OD"nX E B}6{OTX E B}) -+ 0, n -+ CX). Hence, (5) remains true for such sets B with an replaced by T, and the relation extends to the general case by a monotone class argument. 0 We shall now see how the homogeneous Poisson processes may be char- acterized as special renewal processes. Recall that a random variable 'Y is said to be exponentially distributed with rate c > 0 if P {J > t} == e - ct for all t > O. In this cage, clearly E, == c- 1 . 
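The renewal description of the homogeneous Poisson process, made precise in Proposition 12.15 below, doubles as a sampling method: accumulate i.i.d. exponential interarrival times with mean $c^{-1}$ until the time horizon is exceeded. A short numpy sketch (function name mine):

```python
import numpy as np

def poisson_times_by_renewal(rate, horizon, rng):
    """Arrival times in [0, horizon] of a rate-`rate` homogeneous
    Poisson process on R_+, built from i.i.d. Exp(rate) interarrival
    times: partial sums are collected until they leave the window."""
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / rate)   # gap with mean 1/rate
        if t > horizon:
            return np.array(times)
        times.append(t)

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    counts = [len(poisson_times_by_renewal(1.5, 2.0, rng))
              for _ in range(4000)]
    # Counts on [0, 2] should be Poisson with mean 3 = 1.5 * 2,
    # so sample mean and variance are both close to 3.
```

The memoryless property of the exponential law is what makes the resulting counting process Markov, which is the content of the next proposition.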
238 Foundations of Modern Probability Proposition 12.15 (Poisson and renewal processes) Let € be a simple point process on 1R+ with atoms at T1 < T2 < ... , and put TO == o. Then  is homogeneous Poisson with rate c > 0 iff the differences Tn - Tn-1 are i. i. d. and exponentially distributed with mean c- 1 . Proof: First assume that  is Poisson with rate c. Then Nt = [O, t] is a space- and time-homogeneous pure jump-type Markov process. By Lemma 7.6 and Theorem 12.14, the strong Markov property holds at each Tn, and by Theorem 8.10 we get d T1 = Tn+1 - Tn lL (T1,..., Tn), n E N. Thus, the variables Tn - T n -1 are i.i.d., and it remains to note that P{7"l > t} = P{[O,t] == O} == e- c . Conversely, assume that T1, 72, . .. have the stated properties. Consider a homogeneous Poisson process 'fJ with rate c and with atoms at (11 < (12 < ... , and conclude from the necessity part that (an) d (Tn). Hence,  = Ln b-r n d Ln bUn = 'fJ. We proceed to examine the structure of a general pure jump-type Markov process. Here the first and crucial step is to describe the distributions associated with the first jump. Say that a state xES is absorbing if Px{ X = x} = lor, equivalently, if Px{ 71 == oo} == 1. Lemma 12.16 (first jump) If x is nonabsorbing, then under Px the time T1 until the first jump is exponentially distributed and independent of (}"'l X . o Proof: Put 71 = T. Using the Markov property at fixed times, we get for any s, t > 0 Px{T> S + t} == Px{T > S, TO OS > t} = Px{T > S}Px{T > t}. The only nonincreasing solutions to this Cauchy equation are of the form Px{7 > t} = e- ct with c E [0,00]. Since x is nonabsorbing and 7 > 0 a.s., we have c E (0, (0), and so 7 is exponentially distributed with parameter c. By the Markov property at fixed times, we further get for any B E F 00 PX{T > t, (}".X E B} Px{ T > t, (fJ".X) 0 (}t E B} Px{7 > t}Px{(J".X E B}, which shows that T lLO-rX. 
□

Writing $X_\infty = x$ when $X$ is eventually absorbed at $x$, we may define the rate function $c$ and jump transition kernel $\mu$ by
$$c(x) = (E_x \tau_1)^{-1}, \qquad \mu(x, B) = P_x\{X_{\tau_1} \in B\}, \qquad x \in S,\ B \in \mathcal{S}.$$
It is often convenient to combine $c$ and $\mu$ into a rate kernel $\alpha(x, B) = c(x)\,\mu(x, B)$, or $\alpha = c\mu$, where the required measurability is clear from
12. Poisson and Pure Jump- Type Markov Processes 239 that for the kernel (P x ). Note that jj may be reconstrueted from Q, if we add the requirement that jj(x,.) == 8x when a(x,.) == 0, conforming with our convention for absorbing states. This ensures that J.L is a measurable function of ();. The following theorem gives an explicit representation of the process in terms of a discrete-time Markov chain and a sequence of exponentially distributed random variables. The result shows in particular that the dis- tributions Px are uniquely determined by the rate kernel a. As usual, we assume the existence of required randomization variables. Theorem 12.17 (embedded Markov chain) Let X be a pure jump-type Markov process with rate kernel a == Cjj. Then there exist a Markov process Y on Z+ with transition kernel J-L and an independent sequence of i. i. d., exponentially distributed random variables 1'1, 1'2, . .. with mean 1 such that a.s. Xt = Y n , t E [Tn, T n +1), n E Z+, (6) where n  Tk Tn =  (Yi ) , n E Z+. k=l C k-l (7) Proof: To satisfy (6), put TO = 0, and define Y n == X Tn for n E Z+. Intro- duce some Li.d. exponentially distributed random variables T' T' . . .lLX with mean 1, and define for n E N "In == (Tn - Tn-1)C(Yn)1{Tn-l < oo} + "Il{c(}) == O}. By Lemma 12.16, we get for any t > 0, B E 5, and xES with c(x) > 0 Px{l'l > t, Y 1 E B} = Px{TIC(X) > t, Y 1 E B} = e-tJ-L(x,B), and this clearly remains true when c(x) = o. By the strong Markov property we obtain for every n, a.s. on { Tn < oo}, Px[rn+l > t, Y n + 1 E BIFrn] = P Yn {l'l > t, Yl E B} = e-tjj(Yn,B). (8) The strong Markov property also gives Tn+l < 00 a.s. on the set {Tn < 00, C(Y n ) > O}. Arguing recursively, we get {c(Yn) == O} == {Tn+l == oo} a.s., and (7) follows. Using the same relation, it is also easy to check that (8) remains a.s. true on { Tn == oo}, and in both cases we Dlay clearly replace Fr n by Yn == Fr n V a{ T' . . . , T}. 
Thus, the pairs (Tn, Y n ) form a discrete- time Markov process with the desired transition kernel. 13y Proposition 8.2, the latter property together with the initial distribution determine uniquely the joint distribution of Y and (Tn). 0 In applications the rate kernel Q is normally given, and one needs to know whether a corresponding Markov process X exists. As before we may write a(x, B) = c(x)J.t(x, B) for a suitable choice of rate function c: S -+ R+ and transition kernel J.L on S, where J..L(x,.) = 6x when c(x) == 0 and otherwise J1(x, {x}) == o. If X does exist, it clearly may be constructed as in Theorem 
240 Foundations of Modern Probability 12.17. The construction fails when ( = SUP n Tn < 00, in which case an explosion is said to occur at time (. Theorem 12.18 (synthesis) For any kernel Q; == CM on S with a(x, {x}) = 0, consider a Markov chain Y with transition kernel J-l and some i. i.d., expo- nentially distributed random variables 1'1,1'2,... llY with mean 1. Assume that En "tn/C(Yn-l) == 00 a.s. under every initial distribution for Y. Then (6) and (7) define a pure jump-type Markov process with rate kernel Q. Proof: Let Px be the distribution of the sequences Y = (Y n ) and r = (1'n) when Yo = x. For convenience, we may regard (Y, r) as the identity mapping on the canonical space n == 8 00 x R+. Construct X from (Y, r) as in (6) and (7), with Xt = So arbitrary for t > SUP n Tn, and introduce the filtrations y == (Qn) induced by (Y,,) and :F = (Ft) induced by X. It suffices to prove the Markov property Px[8tX E 'IFt] == PXt {X E,}, since the rate kernel may then be identified via Theorem 12.17. Then fix any t > 0 and n E Z+, and define K == sup{k; Tk < t}, f3 = (t - Tn)C(Yn). Put Tm(Y,r) == {(Y k "k+1); k > m}, (Y',r') == T n + 1 (Y,r), and " ,n+1' Since clearly Ft = Yn Va{,' > fj} on {== n}, it is enough by Lemma 6.2 to prove that Px[(Y', r') E ., 'Y' - (3 > rl Qn,,' > (3] = pYn {T(Y, r) E ., ')'1 > r}. Now (y l , r')l1.g n (,"'/, {3) because ')"lL(Qn, Y', r'), and so the left-hand side equals Px[(Y', r') E ., ')" - {3 > rlQn] Px [')" > IQn] = p [( Y' r' ) E . I I! ] Px[,' - /3 > rlQn] == ( p 0 T- 1 ) -T x, n Px ['Y' > .8lgn] Y n e, as required. o To complete the picture, we need a convenient criterion for nonexplosion. Proposition 12.19 (explosion) For any rate kernel Q and initial state x, let (Y n ) and (Tn) be such as in Theorem 12.17. Then a.s. Tn  00 iff 2: n {C(Yn)} -1 = 00. (9) In particular, Tn -t 00 a.s. when x is recurrent for (Y n ). Proof: Write (3n = {c(Y n - 1 )} -1. 
Noting that $E e^{-u\gamma_n} = (1 + u)^{-1}$ for all $u \ge 0$, we get by (7) and Fubini's theorem
$$E[e^{-u\zeta} \mid Y] = \prod\nolimits_n (1 + u\beta_n)^{-1} = \exp\Big\{-\sum\nolimits_n \log(1 + u\beta_n)\Big\} \quad \text{a.s.} \tag{10}$$
12. Poisson and Pure Jump- Type Markov Processes 241 Since !(rI\1) < log(l+r} < r for all r > 0, the series on the right converges for every u > 0 iff En /3n < 00. Letting u  0 in (10), we get by dominated convergence P[( < 001 Y] = 1 {:LJ1n < oo} a.s., which implies (9). If x is visited infinitely often, then the series En!3n has infinitely many terms C;l > 0, and the last assertion follows. 0 By a pseudo-Poisson process in some measurable space 5 we mean a process of the form X == YoN a.s., where Y is a diserete-time Markov process in Sand N is an independent homogeneous Poisson process. Letting Jlt be the transition kernel of Y and writing c for the constant rate of N, we may construct a kernel a(x, B} = CJl(x, B \ {x}}, XES, B E B(S), (11) which is measurable since J.t(x, {x}) is a measurable function of x. The next result characterizes pseudo-Poisson processes in terms of the rate kernel. Proposition 12.20 (pseudo-Poisson processes) A pr-ocess X in some Borel space 5 is pseudo-Poisson iff it is pure jump-type Markov with a bounded rate function. Specifically, if X == YoN a.s. for some Markov chain Y with transition kernel Jlt and an independent Poisson process N with constant rate c, then X has the rate kernel in (II). Proof: Assume that X = YoN with Y and N as stated. Letting T1, T2, . . . be the jump times of N and writing F for the filtration induced by the pair (X, N), it may be seen as in Theorem 12.18 that X is F-:rvlarkov. To identify the rate kernel 0, fix any initial state x, and note that the first jump of X occurs at the first time Tn when Y n leaves x. For each transition of Y, this happens with probability Px = J.t(x, {x}C). By Proposition 12.3 the time until first jump is then exponentially distributed with parameter cpx. If Px > 0, we further note that the location of X after the first jump has distribution J.t( x, . \ {x} ) / Px. Thus, a is given by (11). 
Conversely, let X be a pure jump-type Markov process with uniformly bounded rate kernel Q =1= O. Put r x = a(x, S) and c == sUPx r x , and note that the kernel Jlt(x,.) == c- 1 {a(x,.) + (c - r x )6x}, xES, satisfies (11). Thus, if X' = Y' 0 N' is a pseudo-Poisson process based on J.L and c, then X' is again Markov with rate kernel a, and so X d X'. Hence, Corollary 6.11 yields X = YoN a.s. for some pair (Y, N) d (Y', N'). 0 If the underlying Markov chain Y is a random walk in some measurable Abelian group 5, then X == YoN is called a compound Poisson process. In this case X - Xo J.L Xo, the jump sizes are i.i.d., and the jump times are given by an independent homogeneous Poisson process. Thus, the distribu- tion of X - Xo is determined by the characteristic measure lJ == cJ.t, where c 
is the rate of the jump time process and $\mu$ is the common distribution of the jumps.

A kernel $\alpha$ on $S$ is said to be homogeneous if $\alpha(x, B) = \alpha(0, B - x)$ for all $x$ and $B$. Let us also say that a process $X$ in $S$ has independent increments if $X_t - X_s \perp\!\!\!\perp \{X_r;\, r \le s\}$ for any $s < t$. The next result characterizes compound Poisson processes in two ways, analytically in terms of the rate kernel and probabilistically in terms of the increments of the process.

Corollary 12.21 (compound Poisson processes) For any pure jump-type process $X$ in some measurable Abelian group, these conditions are equivalent:

(i) $X$ is Markov with homogeneous rate kernel;
(ii) $X$ has independent increments;
(iii) $X$ is compound Poisson.

Proof: If a pure jump-type Markov process is space-homogeneous, then its rate kernel is clearly homogeneous; the converse follows from the representation in Theorem 12.17. Thus, (i) and (ii) are equivalent by Proposition 8.5. Next Theorem 12.17 shows that (i) implies (iii), and the converse follows by Theorem 12.18. □

Our next aim is to derive a combined differential and integral equation for the transition kernels $\mu_t$. An abstract version of this result appears in Theorem 19.6. For any measurable and suitably integrable function $f\colon S \to \mathbb{R}$, we define
$$T_t f(x) = \int f(y)\,\mu_t(x, dy) = E_x f(X_t), \quad x \in S,\ t \ge 0.$$

Theorem 12.22 (backward equation, Kolmogorov) Let $\alpha$ be the rate kernel of a pure jump-type Markov process on $S$, and fix any bounded, measurable function $f\colon S \to \mathbb{R}$. Then $T_t f(x)$ is continuously differentiable in $t$ for fixed $x$, and we have
$$\frac{d}{dt}\, T_t f(x) = \int \alpha(x, dy)\, \{T_t f(y) - T_t f(x)\}, \quad t \ge 0,\ x \in S. \tag{12}$$

Proof: Put $\tau = \tau_1$, and let $x \in S$ and $t \ge 0$. By the strong Markov property at $u = \tau \wedge t$ and Theorem 6.4,
$$T_t f(x) = E_x f(X_t) = E_x f((\theta_u X)_{t-u}) = E_x T_{t-u} f(X_u) = f(x)\, P_x\{\tau > t\} + E_x[T_{t-\tau} f(X_\tau);\, \tau \le t] = f(x)\, e^{-t c_x} + \int_0^t e^{-s c_x}\, ds \int \alpha(x, dy)\, T_{t-s} f(y),$$
and so
$$e^{t c_x}\, T_t f(x) = f(x) + \int_0^t e^{s c_x}\, ds \int \alpha(x, dy)\, T_s f(y). \tag{13}$$
12. Poisson and Pure Jump-Type Markov Processes 243

Here the use of the disintegration theorem is justified by the fact that X(ω, t) is product measurable on Ω × ℝ₊ because of the right-continuity of the paths.

From (13) we note that T_t f(x) is continuous in t for each x, and so by dominated convergence the inner integral on the right is continuous in s. Hence, T_t f(x) is continuously differentiable in t, and (12) follows by an easy computation. □

The next result relates the invariant distributions of a pure jump-type Markov process to those of the embedded Markov chain.

Proposition 12.23 (invariance) Let the processes X and Y be related as in Theorem 12.17, and fix a probability measure ν on S with ∫ c dν < ∞. Then ν is invariant for X iff c·ν is invariant for Y.

Proof: By Theorem 12.22 and Fubini's theorem, we have for any bounded measurable function f: S → ℝ

E_ν f(X_t) = ∫ f(x) ν(dx) + ∫₀ᵗ ds ∫ ν(dx) ∫ α(x, dy) {T_s f(y) − T_s f(x)}.

Thus, ν is invariant for X iff the second term on the right is identically zero. Now (12) shows that (d/dt)T_t f(x) is continuous in t, and by dominated convergence this is also true for the integral

I_t = ∫ ν(dx) ∫ α(x, dy) {T_t f(y) − T_t f(x)},  t ≥ 0.

Thus, the condition becomes I_t ≡ 0. Since f is arbitrary, it is enough to take t = 0. Our condition then reduces to (να)f = ν(cf) or (c·ν)μ = c·ν, which means that c·ν is invariant for Y. □

By a continuous-time Markov chain we mean a pure jump-type Markov process on a countable state space S. Here the kernels μ_t may be specified by the set of transition functions p^t_{ij} = μ_t(i, {j}). The connectivity properties are simpler than in discrete time, and the notion of periodicity has no counterpart in the continuous-time theory.

Lemma 12.24 (positivity) For any i, j ∈ S, we have either p^t_{ij} > 0 for all t > 0, or p^t_{ij} = 0 for all t > 0. In particular, p^t_{ii} > 0 for all t and i.

Proof: Let q = (q_{ij}) be the transition matrix of the embedded Markov chain Y in Theorem 12.17.
If q^n_{ij} = P_i{Y_n = j} = 0 for all n ≥ 0, then clearly 1{X_t ≠ j} = 1 a.s. P_i, and so p^t_{ij} = 0 for all t > 0.

If instead q^n_{ij} > 0 for some n ≥ 0, there exist some states i = i₀, i₁, …, i_n = j with q_{i_{k−1}, i_k} > 0 for k = 1, …, n. Noting that the distribution of (γ₁, …, γ_{n+1}) has the positive density ∏_{k≤n+1} e^{−x_k} on ℝ₊^{n+1}, we obtain for any t > 0

p^t_{ij} ≥ P{ ∑_{k=1}^{n} γ_k/c_{i_{k−1}} ≤ t < ∑_{k=1}^{n+1} γ_k/c_{i_{k−1}} } ∏_{k=1}^{n} q_{i_{k−1}, i_k} > 0.
Since p⁰_{ii} = q⁰_{ii} = 1, we get in particular p^t_{ii} > 0 for all t ≥ 0. □

A continuous-time Markov chain is said to be irreducible if p^t_{ij} > 0 for all i, j ∈ S and t > 0. Note that this holds iff the associated discrete-time process Y in Theorem 12.17 is irreducible. In that case clearly sup{t ≥ 0; X_t = j} < ∞ iff sup{n ≥ 0; Y_n = j} < ∞. Thus, when Y is recurrent, the sets {t; X_t = j} are a.s. unbounded under P_i for all i ∈ S; otherwise, they are a.s. bounded. The two possibilities are again referred to as recurrence and transience, respectively.

The basic ergodic Theorem 8.18 for discrete-time Markov chains has an analogous version in continuous time. Further extensions are considered in Chapter 20.

Theorem 12.25 (ergodic behavior) For any irreducible, continuous-time Markov chain in S, exactly one of these cases occurs:
(i) There exists a unique invariant distribution ν; the latter satisfies ν_i > 0 for all i ∈ S, and for any distribution μ on S,

lim_{t→∞} ‖P_μ ∘ θ_t⁻¹ − P_ν‖ = 0.  (14)

(ii) No invariant distribution exists, and p^t_{ij} → 0 for all i, j ∈ S.

Proof: By Lemma 12.24 the discrete-time chain X_{nh}, n ∈ ℤ₊, is irreducible and aperiodic. Assume that (X_{nh}) is positive recurrent for some h > 0, say with invariant distribution ν. Then the chain (X_{nh′}) is positive recurrent for every h′ of the form 2^{−m}h, and by the uniqueness in Theorem 8.18 it has the same invariant distribution. Since the paths are right-continuous, we may conclude by a simple approximation that ν is invariant even for the original process X.

For any distribution μ on S we have

‖P_μ ∘ θ_t⁻¹ − P_ν‖ = ‖∑_i μ_i ∑_j (p^t_{ij} − ν_j) P_j‖ ≤ ∑_i μ_i ∑_j |p^t_{ij} − ν_j|.

Thus, (14) follows by dominated convergence if we can show that the inner sum on the right tends to zero. This is clear if we put n = [t/h] and r = t − nh and note that by Theorem 8.18

∑_k |p^t_{ik} − ν_k| ≤ ∑_j ∑_k |p^{nh}_{ij} − ν_j| p^r_{jk} = ∑_j |p^{nh}_{ij} − ν_j| → 0.
It remains to consider the case when (X_{nh}) is null recurrent or transient for every h > 0. Fixing any i, k ∈ S and writing n = [t/h] and r = t − nh as before, we get

p^t_{ik} = ∑_j p^r_{ij} p^{nh}_{jk} ≤ p^{nh}_{ik} + ∑_{j≠i} p^r_{ij} = p^{nh}_{ik} + (1 − p^r_{ii}),

which tends to zero as t → ∞ and then h → 0, due to Theorem 8.18 and the continuity of p^t_{ii}. □

As in discrete time, we note that condition (ii) of the last theorem holds for any transient Markov chain, whereas a recurrent chain may satisfy either
condition. Recurrent chains satisfying (i) and (ii) are again referred to as positive recurrent and null recurrent, respectively. It is interesting to note that X may be positive recurrent even when the embedded, discrete-time chain Y is null recurrent, and vice versa. On the other hand, X clearly has the same ergodic properties as the discrete-time processes (X_{nh}), h > 0.

Let us next introduce the first exit and recurrence times

γ_j = inf{t ≥ 0; X_t ≠ j},  τ_j = inf{t > γ_j; X_t = j}.

As in Theorem 8.22 for the discrete-time case, we may express the asymptotic transition probabilities in terms of the mean recurrence times E_j τ_j. To avoid trivial exceptions, we confine our attention to nonabsorbing states.

Theorem 12.26 (mean recurrence times) For any continuous-time Markov chain in S and states i, j ∈ S with j nonabsorbing, we have

lim_{t→∞} p^t_{ij} = P_i{τ_j < ∞} / (c_j E_j τ_j).  (15)

Proof: It is enough to take i = j, since the general statement will then follow as in the proof of Theorem 8.22. If j is transient, then 1{X_t = j} → 0 a.s. P_j, and so by dominated convergence p^t_{jj} = P_j{X_t = j} → 0. This agrees with (15), since in this case P_j{τ_j = ∞} > 0.

Turning to the recurrent case, let S_j denote the class of states i accessible from j. Then S_j is clearly irreducible, and so p^t_{jj} converges by Theorem 12.25. To identify the limit, define

L_t = λ{s ≤ t; X_s = j} = ∫₀ᵗ 1{X_s = j} ds,  t ≥ 0,

and let τ_j^n denote the instant of the nth return to j. Letting m, n → ∞ with |m − n| ≤ 1, and using the strong Markov property and the law of large numbers, we get a.s. P_j

L(τ_j^n)/τ_j^m = (L(τ_j^n)/n)(n/m)(m/τ_j^m) → E_j γ_j / E_j τ_j = 1/(c_j E_j τ_j).

By the monotonicity of L, it follows that t⁻¹L_t → (c_j E_j τ_j)⁻¹ a.s. Hence, by Fubini's theorem and dominated convergence,

t⁻¹ ∫₀ᵗ p^s_{jj} ds = E_j(t⁻¹L_t) → 1/(c_j E_j τ_j),

and (15) follows. □
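The limit in (15) is easy to observe numerically. The sketch below is not from the text: it uses a hypothetical two-state chain with jump rates λ (from 0 to 1) and μ (from 1 to 0), estimates p^t_{00} by simulation, and compares it with 1/(c₀ E₀τ₀), which here equals μ/(λ+μ) since c₀ = λ and E₀τ₀ = 1/λ + 1/μ.

```python
import random

def simulate_state(lam, mu, t, rng):
    """Run a two-state chain (rate lam: 0 -> 1, rate mu: 1 -> 0) and return X_t.

    Illustrative example only; exponential holding times as in Theorem 12.17."""
    state, clock = 0, 0.0
    while True:
        rate = lam if state == 0 else mu
        clock += rng.expovariate(rate)   # holding time in the current state
        if clock > t:
            return state
        state = 1 - state

rng = random.Random(0)
lam, mu, t, n = 1.0, 2.0, 10.0, 20000
p00 = sum(simulate_state(lam, mu, t, rng) == 0 for _ in range(n)) / n

# Theorem 12.26: lim p^t_00 = 1/(c_0 E_0 tau_0) = mu/(lam + mu)
limit = mu / (lam + mu)
print(p00, limit)
```

With t well past the relaxation time 1/(λ+μ), the two printed values should agree to within Monte Carlo error.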
Exercises

1. Let ξ be a point process on a Borel space S. Show that ξ = ∑_k δ_{τ_k} for some random elements τ_k in S ∪ {Δ}, where Δ ∉ S is arbitrary. Extend the result to general random measures. (Hint: We may assume that S = ℝ₊.)

2. Show that two random measures ξ and η are independent iff E e^{−ξf−ηg} = E e^{−ξf} E e^{−ηg} for all measurable f, g ≥ 0. Also, in the case of simple point processes, prove the corresponding equivalence in terms of the relations P{ξB + ηC = 0} = P{ξB = 0} P{ηC = 0} for any B, C ∈ Ŝ. (Hint: Regard (ξ, η) as a random measure on 2S.)

3. Let ξ₁, ξ₂, … be independent Poisson processes with intensity measures μ₁, μ₂, … such that the measure μ = ∑_k μ_k is σ-finite. Show that ξ = ∑_k ξ_k is again Poisson with intensity measure μ.

4. Show that the classes of mixed Poisson and binomial processes are preserved under randomization.

5. Let ξ be a Cox process on S directed by some random measure η, and let f be a measurable mapping into some space T such that η ∘ f⁻¹ is a.s. σ-finite. Prove directly from definitions that ξ ∘ f⁻¹ is a Cox process on T directed by η ∘ f⁻¹. Derive a corresponding result for p-thinnings. Also show how the result follows from Proposition 12.3.

6. Consider a p-thinning η of ξ and a q-thinning ζ of η with ζ ⫫_η ξ. Show that ζ is a pq-thinning of ξ.

7. Let ξ be a Cox process directed by η or a p-thinning of η with p ∈ (0,1), and fix two disjoint sets B, C ∈ 𝒮. Show that ξB ⫫ ξC iff ηB ⫫ ηC. (Hint: Compute the Laplace transforms. The "if" assertions can also be obtained from Proposition 6.8.)

8. Use Lemma 12.2 to derive expressions for P{ξB = 0} when ξ is a Cox process directed by η, a μ-randomization of η, or a p-thinning of η. (Hint: Note that E e^{−tξB} → P{ξB = 0} as t → ∞.)

9. Let ξ be a p-thinning of η, where p ∈ (0,1). Show that ξ and η are simultaneously Cox. (Hint: Use Lemma 12.6.)

10. (Fichtner) For a fixed p ∈ (0,1), let η be a p-thinning of a point process ξ on S. Show that ξ is Poisson iff η ⫫ ξ − η.
(Hint: Extend by iteration to arbitrary p. Then a uniform randomization of ξ on S × [0,1] has independent increments in the second variable, and the result follows by Theorem 18.3.)

11. Use Theorem 12.8 to give a simplified proof of Theorem 12.4 in the case when ξ is simple.

12. Derive Theorem 12.4 from Theorem 12.12. (Hint: Note that ξ is symmetric on S iff it is symmetric on Bⁿ for every n. If ξ is simple, the assertion follows immediately from Theorem 12.12. Otherwise, apply the same result to a uniform randomization on S × [0,1].)
13. For ξ as in Theorem 12.12, show that P{ξB = 0} = φ(μB) for some completely monotone function φ. Conclude from the Hausdorff–Bernstein characterization and Theorem 12.8 that ξ is a mixed Poisson or binomial process based on μ.

14. Show that the distribution of a simple point process ξ on ℝ is not determined, in general, by the distributions of ξI for all intervals I. (Hint: If ξ is restricted to {1, …, n}, then the distributions of all ξI give ∑_{k≤n} k(n−k+1) ≤ n³ linear relations between the 2ⁿ − 1 parameters.)

15. Show that the distribution of a point process is not determined, in general, by the one-dimensional distributions. (Hint: If ξ is restricted to {0, 1} with ξ{0} ∨ ξ{1} ≤ n, then the one-dimensional distributions give 4n linear relations between the n(n+2) parameters.)

16. Show that Lemma 12.1 remains valid with B₁, …, B_n restricted to an arbitrary preseparating class C, as defined in Chapter 16 or Appendix A2. Also show that Theorem 12.8 holds with B restricted to a separating class. (Hint: Extend to the case when C = {B ∈ Ŝ; (ξ + η)∂B = 0 a.s.}. Then use monotone class arguments for sets in S and in M(S).)

17. Show that Theorem 12.10 fails in general without the condition ξ({s} × K) = 0 a.s. for all s.

18. Give an example of a non-Poisson point process ξ on S such that ξB is Poisson for every B ∈ 𝒮. (Hint: It suffices to take S = {0, 1}.)

19. Extend Corollary 12.11 to the case when p_s = P{ξ{s} > 0} may be positive. (Hint: By Fatou's lemma, p_s > 0 for at most countably many s.)

20. Prove Theorem 12.13 (i) and (iii) by means of characteristic functions.

21. Let ξ and η be independent Poisson processes on S with Eξ = Eη = μ, and let f₁, f₂, …: S → ℝ be measurable with ∞ > μ(f_n² ∧ 1) → ∞. Show that |(ξ − η)f_n| →P ∞. (Hint: Consider the symmetrization ν̃ of a fixed measure ν ∈ 𝒩(S) with νf_n → ∞, and argue along subsequences as in the proof of Theorem 4.17.)

22.
For any pure jump-type Markov process on S, show that P_x{τ₂ ≤ t} = o(t) for all x ∈ S. Also note that the bound can be sharpened to O(t²) if the rate function is bounded, but not in general. (Hint: Use Lemma 12.16 and dominated convergence.)

23. Show that any transient, discrete-time Markov chain Y can be embedded into an exploding (resp., nonexploding) continuous-time chain X. (Hint: Use Propositions 8.12 and 12.19.)

24. In Corollary 12.21, use the measurability of the mapping X = Y∘N to deduce the implication (iii) ⇒ (i) from its converse. (Hint: Proceed as in the proof of Proposition 12.15.) Also use Proposition 12.3 to show that (iii) implies (ii), and prove the converse by means of Theorem 12.10.
25. Consider a pure jump-type Markov process on (S, 𝒮) with transition kernels μ_t and rate kernel α. Show for any x ∈ S and B ∈ 𝒮 that α(x, B) = μ̇₀(x, B ∖ {x}). (Hint: Take f = 1_{B∖{x}} in Theorem 12.22, and use dominated convergence.)

26. Use Theorem 12.22 to derive a system of differential equations for the transition functions p_{ij}(t) of a continuous-time Markov chain. (Hint: Take f(i) = δ_{ij} for fixed j.)

27. Give an example of a positive recurrent, continuous-time Markov chain such that the embedded discrete-time chain is null recurrent, and vice versa. (Hint: Use Proposition 12.23.)

28. Establish Theorem 12.25 by a direct argument, mimicking the proof of Theorem 8.18.
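As a numerical companion to Corollary 12.21 and the compound Poisson discussion above, the following sketch (illustrative parameters only, not from the text) simulates X_t = Y∘N_t with i.i.d. Gaussian jumps arriving at the times of a rate-c homogeneous Poisson process, and checks the first moment E X_t = c t E(jump), which follows from Wald's identity.

```python
import random

def compound_poisson(c, jump_sampler, t, rng):
    """Sample X_t for a compound Poisson process: N_t ~ Poisson(c*t) jump
    times from a rate-c Poisson process, with i.i.d. jump sizes (Y a random walk)."""
    n, clock = 0, 0.0
    while True:
        clock += rng.expovariate(c)      # exponential inter-jump times, rate c
        if clock > t:
            break
        n += 1
    return sum(jump_sampler(rng) for _ in range(n))

rng = random.Random(1)
c, t = 2.0, 3.0
# Hypothetical jump distribution: N(1, 1), so E(jump) = 1
samples = [compound_poisson(c, lambda r: r.gauss(1.0, 1.0), t, rng)
           for _ in range(20000)]
mean = sum(samples) / len(samples)
print(mean)   # theory: E X_t = c * t * E(jump) = 6.0 here
```

The sample mean should approach c·t·E(jump) = 6 as the number of simulated paths grows.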
Chapter 13

Gaussian Processes and Brownian Motion

Symmetries of Gaussian distribution; existence and path properties of Brownian motion; strong Markov and reflection properties; arcsine and uniform laws; law of the iterated logarithm; Wiener integrals and isonormal Gaussian processes; multiple Wiener–Itô integrals; chaos expansion of Brownian functionals

The main purpose of this chapter is to initiate the study of Brownian motion, arguably the single most important object in modern probability theory. Indeed, we shall see in Chapters 14 and 16 how the Gaussian limit theorems of Chapter 5 can be extended to approximations of broad classes of random walks and discrete-time martingales by a Brownian motion. In Chapter 18 we show how every continuous local martingale may be represented in terms of Brownian motion through a suitable random time-change. Similarly, the results of Chapters 21 and 23 demonstrate how large classes of diffusion processes may be constructed from Brownian motion by various pathwise transformations. Finally, a close relationship between Brownian motion and classical potential theory is uncovered in Chapters 24 and 25.

The easiest construction of Brownian motion is via a so-called isonormal Gaussian process on L²(ℝ₊), whose existence is a consequence of the characteristic spherical symmetry of the multivariate Gaussian distributions. Among the many important properties of Brownian motion, this chapter covers the Hölder continuity and existence of quadratic variation, the strong Markov and reflection properties, the three arcsine laws, and the law of the iterated logarithm.

The values of an isonormal Gaussian process on L²(ℝ₊) may be identified with integrals of L²-functions with respect to the associated Brownian motion. Many processes of interest have representations in terms of such integrals, and in particular we shall consider spectral and moving average representations of stationary Gaussian processes.
More generally, we shall introduce the multiple Wiener–Itô integrals I_n f of functions f ∈ L²(ℝ₊ⁿ) and establish the fundamental chaos expansion of Brownian L²-functionals.

The present material is related to practically every other chapter in the book. Thus, we refer to Chapter 5 for the definition of Gaussian distributions and the basic Gaussian limit theorem, to Chapter 6 for the transfer theorem, to Chapter 7 for properties of martingales and optional times, to Chapter 8 for basic facts about Markov processes, to Chapter 9 for similarities with random walks, to Chapter 11 for some basic symmetry results, and to Chapter 12 for analogies with the Poisson process. Our study of Brownian motion per se is continued in Chapter 18 with the basic recurrence or transience dichotomy, some further invariance properties, and a representation of Brownian martingales. Brownian local time and additive functionals are studied in Chapter 22. In Chapter 24 we consider some basic properties of Brownian hitting distributions, and in Chapter 25 we examine the relationship between excessive functions and additive functionals of Brownian motion. A further discussion of multiple integrals and chaos expansions appears in Chapter 18.

To begin with some basic definitions, we say that a process X on some parameter space T is Gaussian if the random variable c₁X_{t₁} + ⋯ + c_n X_{t_n} is Gaussian for any choice of n ∈ ℕ, t₁, …, t_n ∈ T, and c₁, …, c_n ∈ ℝ. This holds in particular if the X_t are independent Gaussian random variables. A Gaussian process X is said to be centered if EX_t = 0 for all t ∈ T. Let us also say that the processes Xⁱ on Tᵢ, i ∈ I, are jointly Gaussian if the combined process X = {Xⁱ_t; t ∈ Tᵢ, i ∈ I} is Gaussian. The latter condition is certainly fulfilled if the processes Xⁱ are independent and Gaussian.

The following simple facts clarify the fundamental role of the covariance function. As usual, we assume all distributions to be defined on the σ-fields generated by the evaluation maps.

Lemma 13.1 (covariance function) (i) The distribution of a Gaussian process X on T is determined by the functions EX_t and cov(X_s, X_t), s, t ∈ T.
(ii) The jointly Gaussian processes Xⁱ on Tᵢ, i ∈ I, are independent iff cov(Xⁱ_s, Xʲ_t) = 0 for all s ∈ Tᵢ and t ∈ Tⱼ, i ≠ j in I.

Proof: (i) Let X and Y be Gaussian processes on T with the same means and covariances. Then the random variables c₁X_{t₁} + ⋯ + c_n X_{t_n} and c₁Y_{t₁} + ⋯ + c_n Y_{t_n} have the same mean and variance for any c₁, …, c_n ∈ ℝ and t₁, …, t_n ∈ T, n ∈ ℕ, and since both variables are Gaussian, their distributions must agree. By the Cramér–Wold theorem it follows that (X_{t₁}, …, X_{t_n}) =d (Y_{t₁}, …, Y_{t_n}) for any t₁, …, t_n ∈ T, n ∈ ℕ, and so X =d Y by Proposition 3.2.

(ii) Assume the stated condition. To prove the asserted independence, we may assume I to be finite. Introduce some independent processes Yⁱ, i ∈ I, with the same distributions as the Xⁱ, and note that the combined processes X = (Xⁱ) and Y = (Yⁱ) have the same means and covariances. Hence, the joint distributions agree by part (i). In particular, the independence between the processes Yⁱ implies the corresponding property for the processes Xⁱ. □
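Lemma 13.1 (i) also underlies the standard simulation recipe for Gaussian processes (a practical aside, not part of the text): on a finite grid the law of a centered Gaussian process is determined by its covariance matrix K, so a sample path may be generated as X = LZ, where K = LLᵀ is a Cholesky factorization and Z has i.i.d. N(0,1) entries. A sketch, using the Brownian covariance r_{s,t} = s ∧ t as an assumed example:

```python
import numpy as np

# Covariance r(s,t) = min(s,t) of Brownian motion on a grid (assumed example)
grid = np.linspace(0.1, 1.0, 10)
K = np.minimum.outer(grid, grid)

# By Lemma 13.1(i) the centered Gaussian law is determined by K,
# so sampling reduces to X = L Z with K = L L^T and Z i.i.d. N(0,1)
L = np.linalg.cholesky(K)
rng = np.random.default_rng(0)
X = L @ rng.standard_normal((10, 5000))   # 5000 sample paths, one per column

emp = X @ X.T / X.shape[1]                 # empirical covariance across paths
print(np.abs(emp - K).max())
```

The empirical covariance of the simulated paths should match K up to Monte Carlo error, confirming that mean and covariance alone pin down the finite-dimensional distributions.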
The following result characterizes the Gaussian distributions by a simple symmetry property.

Proposition 13.2 (spherical symmetry, Maxwell) Let ξ₁, …, ξ_d be independent random variables, where d ≥ 2. Then the distribution of (ξ₁, …, ξ_d) is spherically symmetric iff the ξᵢ are i.i.d. centered Gaussian.

Proof: Let φ denote the common characteristic function of ξ₁, …, ξ_d, and assume the stated condition. In particular, −ξ₁ =d ξ₁, and so φ is real valued and symmetric. Noting that sξ₁ + tξ₂ =d ξ₁√(s² + t²), we obtain the functional equation φ(s)φ(t) = φ(√(s² + t²)), and so by iteration φⁿ(t) = φ(t√n) for all n. Thus, for rational t² we have φ(t) = e^{at²} for some constant a, and by continuity this extends to all t ∈ ℝ. Finally, we have a ≤ 0 since |φ| ≤ 1.

Conversely, let ξ₁, …, ξ_d be i.i.d. centered Gaussian, and assume that (η₁, …, η_d) = T(ξ₁, …, ξ_d) for some orthogonal transformation T. Then both random vectors are Gaussian, and we may easily verify that cov(ηᵢ, ηⱼ) = cov(ξᵢ, ξⱼ) for all i and j. Hence, the two distributions agree by Lemma 13.1. □

In infinite dimensions, the Gaussian property is essentially a consequence of the rotational symmetry alone, without any assumption of independence.

Theorem 13.3 (unitary invariance, Schoenberg, Freedman) For any infinite sequence of random variables ξ₁, ξ₂, …, the distribution of (ξ₁, …, ξ_n) is spherically symmetric for every n ≥ 1 iff the ξ_k are conditionally i.i.d. N(0, σ²), given some random variable σ² ≥ 0.

Proof: The ξ_n are clearly exchangeable, and so by Theorem 11.10 there exists a random probability measure μ such that the ξ_n are conditionally μ-i.i.d. given μ. By the law of large numbers,

μB = lim_{n→∞} n⁻¹ ∑_{k≤n} 1{ξ_k ∈ B} a.s.,  B ∈ ℬ,

and in particular μ is a.s. {ξ₃, ξ₄, …}-measurable. Now the spherical symmetry implies that, for any orthogonal transformation T on ℝ²,

P[(ξ₁, ξ₂) ∈ B | ξ₃, …, ξ_n] = P[T(ξ₁, ξ₂) ∈ B | ξ₃, …, ξ_n],  B ∈ ℬ(ℝ²).

As n → ∞, we get μ² = μ² ∘ T⁻¹ a.s. Considering a countable dense set of mappings T, it is clear that the exceptional null set can be chosen to be independent of T. Thus, μ² is a.s. spherically symmetric, and so μ is a.s. centered Gaussian by Proposition 13.2. It remains to take σ² = ∫ x²μ(dx). □

Now fix a separable Hilbert space H. By an isonormal Gaussian process on H we mean a centered Gaussian process ηh, h ∈ H, such that E(ηh·ηk) = ⟨h, k⟩, the inner product of h and k. To construct such a process η, we may introduce an orthonormal basis (ONB) e₁, e₂, … ∈ H, and let ξ₁, ξ₂, … be independent N(0,1) random variables. For any element
h = ∑ᵢ bᵢeᵢ, we define ηh = ∑ᵢ bᵢξᵢ, where the series converges a.s. and in L², since ∑ᵢ bᵢ² < ∞. The process η is clearly centered Gaussian. It is also linear, in the sense that η(ah + bk) = aηh + bηk a.s. for all h, k ∈ H and a, b ∈ ℝ. Assuming k = ∑ᵢ cᵢeᵢ, we may compute

E(ηh·ηk) = ∑_{i,j} bᵢcⱼ E(ξᵢξⱼ) = ∑ᵢ bᵢcᵢ = ⟨h, k⟩.

By Lemma 13.1 the stated conditions uniquely determine the distribution of η. In particular, the symmetry in Proposition 13.2 extends to a distributional invariance of η under any unitary transformation on H.

The following result shows how the Gaussian distribution arises naturally in the context of processes with independent increments. It is interesting to compare with the similar Poisson characterization in Theorem 12.10.

Theorem 13.4 (independence and Gaussian property, Lévy) Let X be a continuous process in ℝ^d with independent increments and X₀ = 0. Then X is Gaussian, and there exist some continuous functions b in ℝ^d and a in ℝ^{d²}, the latter with nonnegative definite increments, such that X_t − X_s is N(b_t − b_s, a_t − a_s) for all s ≤ t.

Proof: Fix any s ≤ t in ℝ₊ and u ∈ ℝ^d. For every n ∈ ℕ we may divide the interval [s, t] into n subintervals of equal length, and we denote the corresponding increments of uX by ξ_{n1}, …, ξ_{nn}. By the continuity of X we have max_j |ξ_{nj}| → 0 a.s., and so Theorem 5.15 shows that u(X_t − X_s) = ∑_j ξ_{nj} is a Gaussian random variable. Since X has independent increments, it follows that X is Gaussian.

Writing b_t = EX_t and a_t = cov(X_t), we get E(X_t − X_s) = EX_t − EX_s = b_t − b_s, and so by independence

0 ≤ cov(X_t − X_s) = cov(X_t) − cov(X_s) = a_t − a_s,  s ≤ t.

The continuity of X yields X_s → X_t as s → t, and so b_s → b_t and a_s → a_t. Thus, both functions are continuous. □

If the process X in Theorem 13.4 has stationary, independent increments, then the mean and covariance functions are clearly linear.
The simplest choice in one dimension is to take b = 0 and a_t = t, so that X_t − X_s is N(0, t − s) for all s ≤ t. The next result shows that the corresponding process exists; it also gives an estimate of the local modulus of continuity. More precise rates of continuity are obtained in Theorem 13.18 and Lemma 14.7.

Theorem 13.5 (existence of Brownian motion, Wiener) There exists a continuous Gaussian process B in ℝ with stationary independent increments and B₀ = 0 such that B_t is N(0, t) for every t ≥ 0. Furthermore, B is a.s. locally Hölder continuous with exponent c for any c ∈ (0, ½).

Proof: Let η be an isonormal Gaussian process on L²(ℝ₊, λ), and define B_t = η1_{[0,t]}, t ≥ 0. Since indicator functions of disjoint intervals are orthogonal, the increments of the process B are uncorrelated and hence
independent. Furthermore, we have ‖1_{(s,t]}‖² = t − s for any s ≤ t, and so B_t − B_s is N(0, t − s). For any s < t we get

B_t − B_s =d B_{t−s} =d (t − s)^{1/2} B₁,  (1)

whence E|B_t − B_s|^c = (t − s)^{c/2} E|B₁|^c < ∞, c > 0. The asserted Hölder continuity now follows by Theorem 3.23. □

A process B as in Theorem 13.5 is called a (standard) Brownian motion or a Wiener process. By a Brownian motion in ℝ^d we mean a process B_t = (B_t¹, …, B_t^d), where B¹, …, B^d are independent, one-dimensional Brownian motions. From Proposition 13.2 we note that the distribution of B is invariant under orthogonal transformations of ℝ^d. It is also clear that any continuous process X in ℝ^d with stationary independent increments and X₀ = 0 can be written as X_t = bt + σB_t for some vector b and matrix σ.

From Brownian motion we may construct other important Gaussian processes. For example, a Brownian bridge may be defined as a process on [0,1] with the same distribution as X_t = B_t − tB₁, t ∈ [0,1]. An easy computation shows that X has covariance function r_{s,t} = s(1 − t), 0 ≤ s ≤ t ≤ 1.

The Brownian motion and bridge have many nice symmetry properties. For example, if B is a Brownian motion, then so is −B as well as the process c⁻¹B(c²t) for any c > 0. The latter transformation is especially useful and is often referred to as a Brownian scaling. We also note that, for each u > 0, the processes B_{u+t} − B_u and B_{u−t} − B_u are Brownian motions on ℝ₊ and [0, u], respectively. If B is instead a Brownian bridge, then so are the processes −B_t and B_{1−t}.

The following result gives some less obvious invariance properties. Further, possibly random mappings that preserve the distribution of a Brownian motion or bridge are exhibited in Theorem 13.11, Lemma 13.14, and Proposition 18.9.

Lemma 13.6 (scaling and inversion) If B is a Brownian motion, then so is the process tB_{1/t}, whereas (1 − t)B_{t/(1−t)} and tB_{(1−t)/t} are Brownian bridges.
If B is instead a Brownian bridge, then the processes (1 + t)B_{t/(1+t)} and (1 + t)B_{1/(1+t)} are Brownian motions.

Proof: Since all processes are centered Gaussian, it suffices by Lemma 13.1 to verify that they have the desired covariance functions. This is clear from the expressions s ∧ t and (s ∧ t)(1 − s ∨ t) for the covariance functions of the Brownian motion and bridge. □

From Proposition 8.5 together with Theorem 13.4 we note that any space- and time-homogeneous, continuous Markov process in ℝ^d has the form σB_t + tb + c, where B is a Brownian motion in ℝ^d, σ is a d × d matrix, and
b and c are vectors in ℝ^d. The next result gives a general characterization of Gaussian Markov processes. Here we use the convention 0/0 = 0.

Proposition 13.7 (Gaussian Markov processes) Let X be a Gaussian process on some index set T ⊂ ℝ, and define r_{s,t} = cov(X_s, X_t). Then X is Markov iff

r_{s,u} = r_{s,t} r_{t,u} / r_{t,t},  s ≤ t ≤ u in T.  (2)

If X is further stationary and defined on ℝ, then r_{s,t} = a e^{−b|s−t|} for some constants a ≥ 0 and b ∈ [0, ∞].

Proof: Subtracting the means if necessary, we may assume that EX_t ≡ 0. Now fix any times t ≤ u in T, and choose a ∈ ℝ such that X̃_u ≡ X_u − aX_t ⊥ X_t. Then a = r_{t,u}/r_{t,t} when r_{t,t} ≠ 0, and if r_{t,t} = 0, we may take a = 0. By Lemma 13.1 we get X̃_u ⫫ X_t.

First assume that X is Markov, and let s ≤ t be arbitrary. Then X_s ⫫_{X_t} X_u, and so X_s ⫫_{X_t} X̃_u. Since also X_t ⫫ X̃_u by the choice of a, Proposition 6.8 yields X_s ⫫ X̃_u. Hence, r_{s,u} = a r_{s,t}, and (2) follows as we insert the expression for a.

Conversely, (2) implies X_s ⊥ X̃_u for all s ≤ t, and so ℱ_t ⫫ X̃_u by Lemma 13.1, where ℱ_t = σ{X_s; s ≤ t}. By Proposition 6.8 it follows that ℱ_t ⫫_{X_t} X_u, which is the required Markov property of X at t.

If X is stationary, then r_{s,t} = r_{|s−t|,0} ≡ r_{|s−t|}, and (2) reduces to the Cauchy equation r₀ r_{s+t} = r_s r_t, s, t ≥ 0, which admits the only bounded solutions r_t = a e^{−bt}. □

A continuous, centered Gaussian process on ℝ with covariance function r_t = ½e^{−|t|} is called a stationary Ornstein–Uhlenbeck process. Such a process Y can be expressed in terms of a Brownian motion B as Y_t = e^{−t}B(½e^{2t}), t ∈ ℝ. The last result shows that the Ornstein–Uhlenbeck process is essentially the only stationary Gaussian process that is also a Markov process.

We will now study some basic sample path properties of Brownian motion.

Lemma 13.8 (level sets) If B is a Brownian motion or bridge, then

λ{t; B_t = u} = 0 a.s.,  u ∈ ℝ.

Proof: Introduce the processes X^n_t = B_{[nt]/n}, t ∈ ℝ₊ or [0,1], n ∈ ℕ, and note that X^n_t → B_t for every t.
Since each process Xⁿ is product measurable on Ω × ℝ₊ or Ω × [0,1], the same thing is true for B. Now use Fubini's theorem to conclude that

E λ{t; B_t = u} = ∫ P{B_t = u} dt = 0,  u ∈ ℝ. □

The next result shows that Brownian motion has locally finite quadratic variation. An extension to general continuous semimartingales is obtained in Proposition 17.17.
Theorem 13.9 (quadratic variation, Lévy) Let B be a Brownian motion, and fix any t > 0 and a sequence of partitions 0 = t_{n,0} < t_{n,1} < ⋯ < t_{n,k_n} = t, n ∈ ℕ, such that h_n = max_k (t_{n,k} − t_{n,k−1}) → 0. Then

ζ_n = ∑_k (B_{t_{n,k}} − B_{t_{n,k−1}})² → t in L².  (3)

If the partitions are nested, then also ζ_n → t a.s.

Proof (Doob): To prove (3), we may use the scaling property B_t − B_s =d |t − s|^{1/2}B₁ to obtain

Eζ_n = ∑_k E(B_{t_{n,k}} − B_{t_{n,k−1}})² = ∑_k (t_{n,k} − t_{n,k−1}) EB₁² = t,
var(ζ_n) = ∑_k var(B_{t_{n,k}} − B_{t_{n,k−1}})² = ∑_k (t_{n,k} − t_{n,k−1})² var(B₁²) ≤ h_n t EB₁⁴ → 0.

For nested partitions we may prove the a.s. convergence by showing that the sequence (ζ_n) is a reverse martingale, that is,

E[ζ_{n−1} − ζ_n | ζ_n, ζ_{n+1}, …] = 0 a.s.,  n ∈ ℕ.  (4)

Inserting intermediate partitions if necessary, we may assume that k_n = n for all n. In that case there exist some numbers t₁, t₂, … ∈ [0, t] such that the nth partition has division points t₁, …, t_n. To verify (4) for a fixed n, we may further introduce an auxiliary random variable ϑ ⫫ B with P{ϑ = ±1} = ½, and replace B by the Brownian motion

B′_s = B_{s∧t_n} + ϑ(B_s − B_{s∧t_n}),  s ≥ 0.

Since B′ has the same sums ζ_n, ζ_{n+1}, … as B, whereas ζ_{n−1} − ζ_n is replaced by ϑ(ζ_{n−1} − ζ_n), it is enough to show that E[ϑ(ζ_{n−1} − ζ_n) | ζ_n, ζ_{n+1}, …] = 0 a.s. This is clear from the choice of ϑ if we first condition on ζ_{n−1}, ζ_n, …. □

The last result implies that B has locally unbounded variation. This explains why the stochastic integral ∫V dB cannot be defined as an ordinary Stieltjes integral and a more sophisticated approach is required in Chapter 17.

Corollary 13.10 (linear variation) Brownian motion has a.s. unbounded variation on every interval [s, t] with s < t.

Proof: The quadratic variation vanishes for any continuous function of bounded variation on [s, t]. □

From Proposition 8.5 we note that Brownian motion B is a space-homogeneous Markov process with respect to its induced filtration.
If the Markov property holds for some more general filtration ℱ = (ℱ_t) — that is, if B is adapted to ℱ and such that the process B′_t = B_{s+t} − B_s is independent of
ℱ_s for each s ≥ 0 — we say that B is a Brownian motion with respect to ℱ, or an ℱ-Brownian motion. In particular, we may take ℱ_t = 𝒢_t ∨ 𝒩, t ≥ 0, where 𝒢 is the filtration induced by B and 𝒩 = σ{N ⊂ A; A ∈ 𝒜, PA = 0}. With this construction, ℱ becomes right-continuous by Corollary 7.25.

The Markov property of B will now be extended to suitable optional times. A more general version of this result appears in Theorem 19.17. As in Chapter 7, we write ℱ_t⁺ = ℱ_{t+}.

Theorem 13.11 (strong Markov property, Hunt) For any ℱ-Brownian motion B in ℝ^d and a.s. finite ℱ⁺-optional time τ, the process B′_t = B_{τ+t} − B_τ, t ≥ 0, is again a Brownian motion independent of ℱ_τ⁺.

Proof: As in Lemma 7.4, we may choose some optional times τ_n → τ that take countably many values and satisfy τ_n ≥ τ + 2⁻ⁿ. Then ℱ_τ⁺ ⊂ ⋂_n ℱ_{τ_n} by Lemmas 7.1 and 7.3, and so by Proposition 8.9 and Theorem 8.10 each process B^n_t = B_{τ_n+t} − B_{τ_n}, t ≥ 0, is a Brownian motion independent of ℱ_τ⁺. The continuity of B yields B^n_t → B′_t a.s. for every t. By dominated convergence we then obtain, for any A ∈ ℱ_τ⁺ and t₁, …, t_k ∈ ℝ₊, k ∈ ℕ, and for bounded continuous functions f: ℝ^k → ℝ,

E[f(B′_{t₁}, …, B′_{t_k}); A] = Ef(B_{t₁}, …, B_{t_k}) · PA.

The general relation P[B′ ∈ ·, A] = P{B ∈ ·} · PA now follows by a straightforward extension argument. □

If B is a Brownian motion in ℝ^d, then a process with the same distribution as |B| is called a Bessel process of order d. More general Bessel processes may be obtained as solutions to suitable SDEs. The next result shows that |B| inherits the strong Markov property from B.

Corollary 13.12 (Bessel processes) If B is an ℱ-Brownian motion in ℝ^d, then |B| is a strong ℱ⁺-Markov process.

Proof: By Theorem 13.11 it is enough to show that |B + x| =d |B + y| whenever |x| = |y|. We may then choose an orthogonal transformation T on ℝ^d with Tx = y, and note that

|B + x| = |T(B + x)| = |TB + y| =d |B + y|.
□

We shall use the strong Markov property to derive the distribution of the maximum of Brownian motion up to a fixed time. A stronger result is obtained in Corollary 22.3.

Proposition 13.13 (maximum process, Bachelier) Let $B$ be a Brownian motion in $\mathbb{R}$, and define $M_t = \sup_{s \le t} B_s$, $t \ge 0$. Then
$$M_t \overset{d}{=} M_t - B_t \overset{d}{=} |B_t|, \qquad t \ge 0.$$
For the proof we need the following continuous-time counterpart to Lemma 9.10.
13. Gaussian Processes and Brownian Motion 257 Lemma 13.14 (reflection principle) Consider a Brownian motion Band an associated optional time T. Then B has the same distribution as the reflected process Bt == B tAT - (Bt - B tAr ), t > o. Proof: It is enough to compare the distributions up to a fixed time t, and so we may assume that T < 00. Define B[ == BrAt and B: == B r + t - B T . By Theorem 13.11 the process B' is a Brownian motion independent of (T, BT). Since, moreover, - B' d B', we get (T, BT, B')  (T, B r , - B'). It remains to note that Bt = B[ + B Ct - r )+' Bt = B; - B(t-T)+' t > o. o Proof of Proposition 13.13: By scaling it suffices to take t == 1. Applying Lemma 13.14 with T == inf {t; Bt == x} gives P{M 1 > x, Bl < y} == P{B 1 > 2x - y}, x > yVO. By differentiation it follows that the pair (M 1 , B 1 ) has probability density -2<p'(2x - y), where <p denotes the standard normal density. Changing variables, we may conclude that (M 1 , M 1 - B 1 ) has density -2c.p'(x + y), x, y > o. In particular, both M 1 and M 1 - Bl have density 2c.p(x), x > o. 0 To prepare for the next main result, we shall derive another elementary sample path property. Lemma 13.15 (local extremes) The local maxima and minima of a Brownian motion or bridge are a.s. distinct. Proof: Let B be a Brownian motion, and fix any intervals I == [a, b] and J == [e, d] with b < c. Write sup Bt - sup Bt = sup(B t - Be) + (Be - B b ) - sup(B t - Bb). tEJ tEl tEJ tEl Here the second term on the right has a diffuse distribution, and by in- dependence the same thing is true for the whole expression. In particular, the difference on the left is a.s. nonzero. Since I and J are arbitrary, this proves the result for local maxima. The case of local minirna and the mixed case are similar. The result for the Brownian bridge B O follows from that for Brownian motion, since the distributions of the two processes are equivalent (mutu- ally absolutely continuous) on any interval [0, t] with t < 1. 
To see this, construct from $B$ and $B^0$ the corresponding "bridges"
$$X_s = B_s - \tfrac{s}{t} B_t, \qquad Y_s = B^0_s - \tfrac{s}{t} B^0_t, \qquad s \in [0, t],$$
and check that $B_t \perp\!\!\!\perp X \overset{d}{=} Y \perp\!\!\!\perp B^0_t$. The stated equivalence now follows from the fact that $N(0, t) \sim N(0, t(1-t))$ when $t \in [0, 1)$. □
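Bachelier's identity in Proposition 13.13 above lends itself to a quick numerical check. The following sketch (not from the text; path count and step size are arbitrary choices) simulates discretized Brownian paths on $[0,1]$ and compares the three quantities $M_1$, $M_1 - B_1$, and $|B_1|$; all three sample means should approximate $E|B_1| = \sqrt{2/\pi} \approx 0.798$, up to Monte Carlo and discretization error.

```python
import numpy as np

# Monte Carlo sketch of Proposition 13.13: M_1, M_1 - B_1, and |B_1| are
# equal in distribution. Sample sizes below are arbitrary choices.
rng = np.random.default_rng(0)

n_paths, n_steps = 20000, 1000
dt = 1.0 / n_steps
increments = rng.normal(scale=np.sqrt(dt), size=(n_paths, n_steps))
paths = np.cumsum(increments, axis=1)

M1 = np.maximum(paths.max(axis=1), 0.0)  # M_1 = sup_{s<=1} B_s (B_0 = 0)
D1 = M1 - paths[:, -1]                   # M_1 - B_1
A1 = np.abs(paths[:, -1])                # |B_1|
```

The discrete-grid maximum slightly underestimates the true supremum, so the agreement is only up to a small bias of order $\sqrt{dt}$.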
The next result involves the arcsine law, which may be defined as the distribution of $\xi = \sin^2 \alpha$ when $\alpha$ is $U(0, 2\pi)$. The name comes from the fact that
$$P\{\xi \le t\} = P\{|\sin \alpha| \le \sqrt{t}\} = \frac{2}{\pi} \arcsin \sqrt{t}, \qquad t \in [0, 1].$$
Note that the arcsine distribution is symmetric about $\tfrac12$, since
$$\xi = \sin^2 \alpha \overset{d}{=} \cos^2 \alpha = 1 - \sin^2 \alpha = 1 - \xi.$$
The following celebrated result exhibits three interesting functionals of Brownian motion, all of which are arcsine distributed.

Theorem 13.16 (arcsine laws, Lévy) Let $B$ be a Brownian motion on $[0,1]$ with maximum $M_1$. Then these random variables are all arcsine distributed:
$$\tau_1 = \lambda\{t;\, B_t > 0\}, \qquad \tau_2 = \inf\{t;\, B_t = M_1\}, \qquad \tau_3 = \sup\{t;\, B_t = 0\}.$$
It is interesting to compare the relations $\tau_1 \overset{d}{=} \tau_2 \overset{d}{=} \tau_3$ with the discrete-time versions obtained in Theorem 9.11 and Corollary 11.14. In Theorems 14.11 and 15.21, the arcsine laws are extended by approximation to appropriate random walks and Lévy processes.

Proof: To see that $\tau_1 \overset{d}{=} \tau_2$, let $n \in \mathbb{N}$, and note that by Corollary 11.14
$$n^{-1} \sum\nolimits_{k \le n} 1\{B_{k/n} > 0\} \overset{d}{=} n^{-1} \min\{k \ge 0;\, B_{k/n} = \max\nolimits_{j \le n} B_{j/n}\}.$$
By Lemma 13.15 the right-hand side tends a.s. to $\tau_2$ as $n \to \infty$. To see that the left-hand side converges to $\tau_1$, we may conclude from Lemma 13.8 that
$$\lambda\{t \in [0,1];\, B_t > 0\} + \lambda\{t \in [0,1];\, B_t < 0\} = 1 \quad \text{a.s.}$$
It remains to note that, for any open set $G \subset [0,1]$,
$$\liminf_{n \to \infty} n^{-1} \sum\nolimits_{k \le n} 1_G(k/n) \ge \lambda G.$$
In the case of $\tau_2$, fix any $t \in [0,1]$, let $\xi$ and $\eta$ be independent $N(0,1)$, and let $\alpha$ be $U(0, 2\pi)$. Using Proposition 13.13 and the circular symmetry of the distribution of $(\xi, \eta)$, we get
$$P\{\tau_2 \le t\} = P\{\sup\nolimits_{s \le t}(B_s - B_t) > \sup\nolimits_{s \ge t}(B_s - B_t)\} = P\{|B_t| > |B_1 - B_t|\}$$
$$= P\{t \xi^2 > (1-t) \eta^2\} = P\Big\{\frac{\eta^2}{\xi^2 + \eta^2} < t\Big\} = P\{\sin^2 \alpha < t\}.$$
In the case of $\tau_3$, we may write
$$P\{\tau_3 < t\} = P\{\sup\nolimits_{s \ge t} B_s < 0\} + P\{\inf\nolimits_{s \ge t} B_s > 0\} = 2 P\{\sup\nolimits_{s \ge t}(B_s - B_t) < -B_t\}$$
$$= 2 P\{|B_1 - B_t| < B_t\} = P\{|B_1 - B_t| < |B_t|\} = P\{\tau_2 < t\}. \qquad \square$$
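The defining representation $\xi = \sin^2 \alpha$ can be checked numerically against the closed-form distribution function. The following sketch (sample size is an arbitrary choice, not from the text) compares the empirical distribution of $\sin^2 \alpha$ with $F(t) = (2/\pi)\arcsin\sqrt{t}$ and verifies the symmetry about $\tfrac12$.

```python
import numpy as np

# Numerical sketch of the arcsine law: xi = sin^2(alpha) with alpha uniform
# on (0, 2*pi) has cdf F(t) = (2/pi) * arcsin(sqrt(t)) and mean 1/2.
rng = np.random.default_rng(1)
alpha = rng.uniform(0.0, 2.0 * np.pi, size=200000)
xi = np.sin(alpha) ** 2

def arcsine_cdf(t):
    return (2.0 / np.pi) * np.arcsin(np.sqrt(t))

ts = np.array([0.1, 0.25, 0.5, 0.75, 0.9])
empirical = np.array([(xi <= t).mean() for t in ts])  # empirical cdf values
theoretical = arcsine_cdf(ts)                         # closed-form cdf values
```

Note that $F(\tfrac12) = \tfrac12$ exactly, which is the symmetry observed in the text.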
The first two arcsine laws have the following counterparts for the Brownian bridge.

Theorem 13.17 (uniform laws) Let $B$ be a Brownian bridge with maximum $M_1$. Then these random variables are both $U(0,1)$:
$$\gamma_1 = \lambda\{t;\, B_t > 0\}, \qquad \gamma_2 = \inf\{t;\, B_t = M_1\}.$$
Proof: The relation $\gamma_1 \overset{d}{=} \gamma_2$ may be proved in the same way as for Brownian motion. To see that $\gamma_2$ is $U(0,1)$, write $\langle x \rangle = x - [x]$, and consider for each $u \in [0,1]$ the process $B^u_t = B_{\langle u+t \rangle} - B_u$, $t \in [0,1]$. It is easy to check that $B^u \overset{d}{=} B$ for each $u$, and further that the maximum of $B^u$ occurs at $\langle \gamma_2 - u \rangle$. By Fubini's theorem we hence obtain, for any $t \in [0,1]$,
$$P\{\gamma_2 \le t\} = \int_0^1 P\{\langle \gamma_2 - u \rangle \le t\}\, du = E\, \lambda\{u;\, \langle \gamma_2 - u \rangle \le t\} = t. \qquad \square$$
From Theorem 13.5 we note that $t^{-c} B_t \to 0$ a.s. as $t \to 0$ for any $c \in [0, \tfrac12)$. The following classical result gives the exact growth rate of Brownian motion at $0$ and $\infty$. Extensions to random walks and renewal processes are obtained in Corollaries 14.8 and 14.14. A functional version appears in Theorem 27.18.

Theorem 13.18 (laws of the iterated logarithm, Khinchin) For a Brownian motion $B$ in $\mathbb{R}$, we have a.s.
$$\limsup_{t \to 0} \frac{B_t}{\sqrt{2t \log\log(1/t)}} = \limsup_{t \to \infty} \frac{B_t}{\sqrt{2t \log\log t}} = 1.$$
Proof: The Brownian inversion $\tilde{B}_t = t B_{1/t}$ of Lemma 13.6 converts the two formulas into one another, so it is enough to prove the result for $t \to \infty$. Then we note that, as $u \to \infty$,
$$\int_u^\infty e^{-x^2/2}\, dx \sim u^{-1} \int_u^\infty x e^{-x^2/2}\, dx = u^{-1} e^{-u^2/2}.$$
By Proposition 13.13 we hence obtain, uniformly in $t > 0$,
$$P\{M_t > u t^{1/2}\} = 2 P\{B_t > u t^{1/2}\} \sim (2/\pi)^{1/2} u^{-1} e^{-u^2/2},$$
where $M_t = \sup_{s \le t} B_s$. Writing $h_t = (2t \log\log t)^{1/2}$, we get for any $r > 1$ and $c > 0$
$$P\{M(r^n) > c\, h(r^{n-1})\} \lesssim n^{-c^2/r} (\log n)^{-1/2}, \qquad n \in \mathbb{N}.$$
Fixing $c > 1$ and choosing $r < c^2$, it follows by the Borel-Cantelli lemma that
$$P\{\limsup\nolimits_{t \to \infty} (B_t / h_t) > c\} \le P\{M(r^n) > c\, h(r^{n-1}) \text{ i.o.}\} = 0,$$
which shows that $\limsup_{t \to \infty} (B_t / h_t) \le 1$ a.s.
To prove the reverse inequality, we may write
$$P\{B(r^n) - B(r^{n-1}) > c\, h(r^n)\} \gtrsim n^{-c^2 r/(r-1)} (\log n)^{-1/2}, \qquad n \in \mathbb{N}.$$
260 Foundations of Modern Probability Taking c == {( r - 1) / r } 1/2, we get by the Borel-Cantelli lemma 1 . Bt - B t / r > 1 . B(rn) - B(r n - 1 ) ( r -1 ) 1/2 lID sup lID sup () > a.s. too ht - n--+oo h r n - r The upper bound obtained earlier yields limsuPt--+oo( -Bt/r/h t ) < r- 1 / 2 , and combining the two estimates gives B limsup  > (1 - r- 1 )1/2 - r- 1 / 2 a.s. t--+oo ht Here we may finally let r  00 to obtain lim SUPt--+CXJ (Bt / ht) > 1 a.s. 0 In the proof of Theorem 13.5 we constructed a Brownian motion B from an isonormal Gaussian process 'rJ on L 2 (+,,X) such that Bt = 'rJl[o,t] a.s. for all t > o. If instead we are starting from a Brownian motion B on 1R+, the existence of an associated isonormal Gaussian process T} may be inferred from Theorem 6.10. Since every function h E L2(+,'x) can be approximated by simple step functions, as in the proof of Lemma 1.35, we note that the random variables 'rJh are a.s. unique. We shall see how they can also be constructed directly from B as suitable Wiener integrals J hdB. As already noted, the latter fail to exist in the pathwise Stieltjes sense, and so a different approach is needed. As a first step, we may consider the class S of simple step functions of the form ht = Ljn aj l(tj_l ,tj] (t), t > 0, where n E Z+, 0 == to < . . . < tn, and aI, . . . , an E JR. For such integrands h, we may define the integral in the obvious way as ",h = roo htdBt = Bh = L '< aj(Btj - B tj _ 1 ). Jo J_n Here 'TJh is clearly centered Gaussian with variance E(",h)2 = L, aJ(tj - tj-d = roo h;dt = IIh11 2 , Jn J o where IIhll denotes the norm in L 2 (R+, -X). Thus, the integration h ..-.+ 1Jh = J hdB defines a linear isometry from S c L2(R+,'x) into L2(f!, P). Since S is dense in L2(+, ,X), we may extend the integral by continuity to a linear isometry h ..-.+ 'T}h = J hdB from L 2 (,\) to L 2 (P). Here '1}h is again centered Gaussian for every h E L 2 (,\), and by linearity the whole process h r-+ 'rJh is then Gaussian. 
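The isometry $E(\eta h)^2 = \|h\|^2$ for simple step functions can be checked directly by simulation. The sketch below (the particular grid and coefficients are arbitrary choices, not from the text) samples the Wiener integral of a step function as the Gaussian sum $\sum_j a_j (B_{t_j} - B_{t_{j-1}})$ and compares the sample variance with $\|h\|^2$.

```python
import numpy as np

# Sketch of the Wiener integral eta h for a simple step function
# h = sum_j a_j 1_{(t_{j-1}, t_j]}: the integral is the Gaussian sum
# sum_j a_j (B_{t_j} - B_{t_{j-1}}), with mean 0 and variance ||h||^2.
rng = np.random.default_rng(2)

def wiener_integral_samples(a, t, n_samples):
    """Sample eta h for the step function with values a on the grid t."""
    a = np.asarray(a, dtype=float)
    dt = np.diff(np.asarray(t, dtype=float))          # interval lengths
    incr = rng.normal(scale=np.sqrt(dt), size=(n_samples, len(dt)))
    return incr @ a                                   # sum_j a_j * Delta B_j

a = [1.0, -2.0, 0.5]
t = [0.0, 1.0, 1.5, 3.0]
samples = wiener_integral_samples(a, t, 100000)
norm_sq = sum(aj ** 2 * (t[j + 1] - t[j]) for j, aj in enumerate(a))  # ||h||^2
```

The extension to general $h \in L^2(\lambda)$ proceeds by the continuity argument in the text, not by this pathwise formula.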
By a polarization argument it is also clear that the integration preserves inner products, in the sense that
$$E(\eta h \cdot \eta k) = \int_0^\infty h_t k_t\, dt = \langle h, k \rangle, \qquad h, k \in L^2(\lambda).$$
We shall consider two general ways of representing stationary Gaussian processes in terms of Wiener integrals $\eta h$. Here a complex notation is convenient. By a complex-valued, isonormal Gaussian process on a (real) Hilbert
13. Gaussian Processes and Brownian Motion 261 space H we mean a process ( =  + i1] on H such that  and 1] are indepen- dent, real-valued, isonormal Gaussian processes on H. For any f = 9 + ih with g, h E H, we define (f = g - 1]h + i(h + 1]g). Now let X be a stationary, centered Gaussian process on JR with covariance function rt = E XsXs+t, S, t E JR.. We know that r is non- negative definite, and it is further continuous whenever X is continuous in probability. In that case Bochner's theorem yields a unique spectral representation rt = f: eitxf.-t(dx), t E JR, where the spectral measure J-l is a bounded, symmetric IIteasure on IR. The following result gives a similar spectral representation of the process X itself. By a different argument, the result extends to suitable non- Gaussian processes. As usual, we assume that the basic probability space is rich enough to support the required randomization variables. Proposition 13.19 (spectral representation, Stone, Cramer) Let X be an L 2 -continuous, stationary, centered Gaussian process on IR with spectral measure J-l. Then there exists a complex, isonormal Gau.5sian process ( on L 2 (J.L) such that Xt = ?R f: eitxd(x a.s., t E R Proof: Denoting the right-hand side of (5) by Y, we may compute E Y s yt E J ( cos sx dx - sin sx d'TJx) J ( CDS tx d:x - sin tx d'TJx) J ( cos sx cos tx - sin sx sin tx) f.-t( dx ) - J cos(s - t)x f.-t(dx) = J ei(s-t)x f.-t(dx) = rs-t. Since both X and Yare centered Gaussian, Lemma 13.1 shows that Y d X. Now both X and ( are continuous and defined on the separable spaces L 2 (X) and L2(J-t), and so they may be regarded as randonl elements in suit- able Polish spaces. The a.s. representation in (5) then follows by Theorem 6.10. 0 (5) Another useful representation may be obtained under suitable regularity conditions on the spectral measure J-l. 
Proposition 13.20 (moving average representation) Let $X$ be an $L^2$-continuous, stationary, centered Gaussian process on $\mathbb{R}$ with absolutely continuous spectral measure $\mu$. Then there exist an isonormal Gaussian process $\eta$ on $L^2(\mathbb{R}, \lambda)$ and a function $f \in L^2(\lambda)$ such that
$$X_t = \int_{-\infty}^{\infty} f_{t-s}\, d\eta_s \quad \text{a.s.}, \qquad t \in \mathbb{R}. \tag{6}$$
262 Foundations of Modern Probability Proof: Fix a symmetric density 9 > 0 of Jl;, and define h = 9 1 / 2 . Then h E L 2 (A), and we may introduce the Fourier transform in the sense of Plancherel, is = hs == (27r)-1/2 lim j a eisxhxdx, S E JR, a---+CX) -a (7) which is again real valued and square integrable. For each t E  the function kx = e- itx hx has Fourier transform ks = /s-t, and so by Parseval's relation j oo j oo j oo itx 2 - Tt = -00 e hxdx = -00 hxkxdx = -00 Isls-tds. (8) Now consider any isonormal Gaussian process 1] on L 2 (A). For / as in (7), we may define a process Y on R by the right-hand side of (6). Using (8), we get EYsY s + t = rt for arbitrary s, t E , and so Y d X by Lemma 13.1. Again an appeal to Theorem 6.10 yields the desired a.s. representation of x. 0 For an example, we may consider a moving average representation of the stationary Ornstein-Uhlenbeck process. Then introduce an isonormal Gaussian process 1] on L2(, A) and define Xt = j t es-td'T/s, t > O. -00 The process X is clearly centered Gaussian, and we get j Sl\t rs,t = E XsXt == eu-seu-tdu = e-Is-tl, -00 s, t E , as desired. The Markov property of X follows most easily from the fact that i t s-t u-t Xt = e Xs + s e d'T/u, s < t. We proceed to introduce multiple integrals In = 1]n with respect to an isonormal Gaussian process 1] on a separable (infinite-dimensional) Hilbert space H. Without loss of generality, we may take H to be of the form L 2 (S, J1). Then HQf)n can be identified with L 2 (sn, tlQf)n), where Jl;Qf)n denotes the n-fold product measure J.l; Q9 . . . 0 Jl;, and the tensor product @k<n h k = hI Q9 · . · Q9 h n of the elements hI, . . . , h n E H is equivalent to the function hI (t l ).. . hn(t n ) on sn. Recall that for any ONB el, e2,... in H, the tensor products jn ekj with arbitrary k I ,..., k n E N form an ONB in Hn. We may now state the basic existence and uniqueness result for the integrals In. 
13. Gaussian Processes and Brownian Motion 263 Theorem 13.21 (multiple stochastic integrals, Wiener" Ita) Let 1] be an isonormal Gaussian process on some separable Hilbert space H. Then for every n E N there exists a unique continuous linear mapping In : H0 n -+ L2(p) such that a.s. In Q9 hk = II'TJhk, hI,..., h n E H orthogonal. kn kn Here the uniqueness means that Inh is a.s. unique for every h, and the linearity means that In(af + bg) = aInf + bIng a.s. for any a, b E JR and f, 9 E H0 n . Note in particular that II h == 1]h a.s. For consistency, we define 10 as the identity mapping on . For the proof we may clearly assume that H = £2([0,1], A). Let En denote the class of elementary functions of the form f= 2:CjQ91A' jm kn (9) where the sets A},. . . , Aj E 8[0,1] are disjoint for each j E {1,..., m}. The indicator functions 1 A are then orthogonal for fixed j, and we need J to take Inf = 2: Cj II 1]Aj, jm kn (10) where 'TJA = 'TJ1A. From the linearity in each factor it is clear that the value of Inf is independent of the choice of representation (9) for f. To extend the definition of In to the entire space £2 (Rt., A @n), we need two lemmas. For any function f on 1R+, we introduce the symmetrization l(t l ,..., t n ) = (n!)-l 2: f(t p1 ,..., t pn ), t l ,. .., t n E 1R+, p where the summation extends over all permutations p of {1, . . . , n }. The following result gives the basic L 2 -structure, which later carries over to the general integrals. Lemma 13.22 (isometry) The elementary integrals In! 'In (10) are orthogonal for different n and satisfy E(Inf)2 = n!IIJII 2 < n!lIfI1 2 , f E En. (11) Proof: The second relation in (11) follows from Minkowski's inequality. To prove the remaining assertions, we may first reduce to the case when all sets Aj are chosen from some fixed collection of disjoint sets B l , B 2 , . .. . For any finite index sets J ::/= K in N, we note that E II 1}Bj II 'T]Bk = II E('TJBj)2 II E1]Bj = O. 
(here the products extend over $j \in J$, $k \in K$, $j \in J \cap K$, and $j \in J \triangle K$, respectively). This proves the asserted orthogonality. Since clearly $\langle f, g \rangle = 0$ when $f$ and $g$ involve different index sets, it also reduces the proof of the isometry in (11) to the case when all terms in $f$ involve the same sets $B_1, \ldots, B_n$,
264 Foundations of Modern Probability though in possibly different order. Since In! = In!, we may further assume that! = Q9k 1Bk. But then E(Inf)2 = It E(1]B k )2 = I1k ABk = IIfl1 2 = n!lIiIl 2 , where the last relation holds since, in the present case, the permutations of f are orthogonal. 0 To extend the integral, we need to show that the elementary functions are dense in L2(,X@n). Lemma 13.23 (approximation) The set En is dense in L 2 (;A@n). Proof: By a standard argument based on monotone convergence and a monotone class argument, any function f E L 2 (;A@n) can be approximated by linear combinations of products Q9k<n 1Ak' and so it is enough to ap- proximate functions f of the latter typeThen divide [0, 1] for each m into 2 m intervals Bmj of length 2- m , and define fm = f L Q?)lB m ,jk' (12) jl ,...,jn kn where the summation extends over all collections of distinct indices jl,." ,jn E {I,..., 2 m }. Here 1m E En for each m, and the sum in (12) tends to 1 a.e. AQ9n. Thus, by dominated convergence fm -+ f in L 2 (;A@n). 0 By the last two lemmas, In is defined as a uniformly continuous mapping on a dense subset of L 2 (.,\@n), and so it extends by continuity to all of L 2 (;A@n), with preservation of both the linearity and the norm relations in (11). To complete the proof of Theorem 13.21, it remains to show that InQ9k<nhk = I1k'f}hk for any orthogonal functions hI,...,h n E L 2 (,X). This is an immediate consequence of the following lemma, where for any f E L 2 (.,\@n) and 9 E L2(,X) we write (f 01 g)(tl,"" tn-I) = J f(t 1 ,..., tn)g(tn)dt n . Lemma 13.24 (recursion) For any f E L 2 (.,\@n) and 9 E L2(A) with n EN, we have In+I(f Q9 g) = In! '1]g - nI n - 1 (! Q91 g). (13) Proof: By Fubini's theorem and the Cauchy-Buniakowski inequality, 1I10g11 = 1111I11gll, 11101 gll < 1111111g11 < 1111111gll. Hence, the two sides of (13) are continuous in probability in both f and g, and it is enough to prove the formula for f E £n and 9 E £1. 
By the linearity of each side we may next reduce to the case when $f = \bigotimes_{k \le n} 1_{A_k}$ and $g = 1_A$, where $A_1, \ldots, A_n$ are disjoint and either $A \cap \bigcup_k A_k = \emptyset$ or
13. Gaussian Processes and Brownian Motion 265 A == AI. In the former case we have j @l 9 == 0, so (13) is immediate from the definitions. In the latter case, (13) becomes In+l (A 2 X A 2 X . . . x An) == {( 1]A)2 - ,XA }1]A 2 . . '1]An. (14) Approximating l A 2 as in Lemma 13.23 by functions 1m E £2 with support in A 2 , it is clear that the left-hand side equals 1 2 A 2 '1]A 2 . . . '1]An. This re- duces the proof of (14) to the two-dimensional version 1 2 A 2 == (1]A)2 - 'xA. To prove the latter, we may divide A for each m into 2 m subsets BmJ of measure < 2- m , and note as in Theorem 13.9 and Lemrna 13.23 that (17A)2 = L i (17 B mi)2 + Li-f:- j 17 Bm i 17 B mj ----+ AA + hA2 in £2. 0 The last lemma will be used to derive an explicit representation of the integrals In in terms of the Hermite polynomials Po, PI, . .. . The latter are defined as orthogonal polynomials of degrees 0, 1, . .. with respect to the standard Gaussian distribution on JR. This condition determines each Pn up to a normalization, which we choose for convenience such that the leading coefficient becomes 1. The first few polynomials are then Po ( x) == 1, PI ( x) == x, P2 ( x) == x 2 - 1, P3 ( x) == :1;3 - 3x, Theorem 13.25 (orthogonal representation, Ito) On a separable Hilbert space H, let TJ be an isonormal Gaussian process with associated multi- ple Wiener-ItD integrals 1 1 ,1 2 ,... . Then for any orthonormal elements el,..., em E H and integers nl,..., n m > 1 with sum n, we have In Q9 eJn) == II Pn)(TJej). js.m J5:m Using the linearity of In and writing it == h/llhll, we see that the stated formula is equivalent to the factorization In Q9 hJnj == II Injhfn), hI,..., h k E H orthogonal, (15) j:5m j5:m together with the representation of the individual factors I n h6])n == Ilhllnpn(TJh), h E H \ {O}. (16) Proof: We prove (15) by induction on n. Then assume the relation to hold for all integrals up to order n, fix any orthonormal elements h, hI, . . . ,h m E H and integers k, nl, . . . 
, $n_m \in \mathbb{N}$ with sum $n + 1$, and write $f = \bigotimes_{j \le m} h_j^{\otimes n_j}$. By Lemma 13.24 and the induction hypothesis,
$$I_{n+1}(f \otimes h^{\otimes k}) = I_n(f \otimes h^{\otimes (k-1)}) \cdot \eta h - (k-1)\, I_{n-1}(f \otimes h^{\otimes (k-2)})$$
$$= (I_{n-k+1} f)\, \{I_{k-1} h^{\otimes (k-1)} \cdot \eta h - (k-1)\, I_{k-2} h^{\otimes (k-2)}\} = I_{n-k+1} f \cdot I_k h^{\otimes k}.$$
Using the induction hypothesis again, we obtain the desired extension to $I_{n+1}$.
266 Foundations of Modern Probability It remains to prove (16) for an arbitrary element h E H with IIhll = 1. Then conclude from Lemma 13.24 that In+Ih(n+l) = Inhn . 'fJh - nln_Ih(n-I), n E N. Since 101 = 1 and Ilh = 'fJh, we see by induction that Inh filn is a polynomial in 1Jh of degree n and with leading coefficient 1. By the definition of Hermite polynomials, it remains to show that the integrals Inh@n for different n are orthogonal, which holds by Lemma 13.22. 0 Given an isonormal Gaussian process 1] on some separable Hilbert space H, we introduce the space L2('TJ) = L 2 (f2, a{7]}, P) of 7]-measurable random variables  with E2 < 00. The nth polynomial chaos Pn is defined as the closed linear subspace generated by all polynomials of degree < n in the random variables ".,h, h E H. We also introduce for every n E Z+ the nth homogeneous chaos tin, consisting of all integrals In!, f E H(8)n. The relationship between the mentioned spaces is clarified by the fol- lowing result. As usual, we write EB and e for direct sums and orthogonal complements, respectively. Theorem 13.26 (chaos expansion, Wiener) On a separable Hilbert space H, let 'fJ be an isonormal Gaussian process with associated polynomial and homogeneous chaoses Pn and 1-l n , respectively. Then the 1-l n are orthogonal, closed, linear subspaces of £2 (".,), satisfying n 00 Pn = EB 1ik, n E Z+; £2(TJ) = EB tin. (17) k=O n=O Furthermore, every  E £2(7]) has a unique a.s. representation  = Ln Infn with symmetric elements f n E H@n, n > o. In particular, we note that 1lo = Po == JR and 1i n = Pn e P n - 1 , n E N. Proof: The properties in Lemma 13.22 extend to arbitrary integrands, and so the spaces ll n are mutually orthogonal, closed, linear subspaces of £2(1]). From Lemma 13.23 or Theorem 13.25 we see that also 1-l n C Pn. Conversely, let  be an nth-degree polynomial in the variables 1Jh. We may then choose some orthonormal elements eI, . . . , em E H such that  is an nth-degree polynomial in 'fie 1 , . . 
. , 'fIe m . Since any power ('fJej)k is a linear combination of the variables po(7]ej),... ,Pk('fJej), Theorem 13.25 shows that  is a linear combination of multiple integrals Ikf with k < n, which means that  E E9k<n llk. This proves the first relation in (17). To prove the second relation, let  E L2('TJ) e E9n 1-l n . In particular, ..L('T1h)n for every h E Hand n E Z+. Since Ln l'fJhln In! = e ll1hl E £2, the series eiT}h = Ln(iT}h)nln! converges in £2, and we get -Lei'11h for every h E H. By the linearity of the integral 'fJh, we hence obtain for any 
13. Gaussian Processes and Brownian Motion 267 hI, . . . , h n E H, n EN, E [exp Lk$niUk1]hk] = 0, Ul,..., Un E R Applying the uniqueness theorem for characteristic functions to the distri- butions of (1Jh 1 ,.. . ,1Jh n ) under the bounded measures J-lI == E[I;.], we may conclude that E [; (1] hI, . . . , 'fJ h n ) E B] == 0, B E B OR n ) . By a monotone class argument, this extends to E[; A] == 0 for arbitrary A E (1{TJ}, and since  is 1J-measurable, it follows that  == E[I7J] == 0 a.s. The proof of (17) is then complete. In particular, any element  E L 2 (17) has an orthogonal expansion ==  Infn ==  Inln, .L...t n  0 .L...t n  0 for some elements in E HQ9n with symmetric versions in, n E Z+. Now assume that also  == En Ingn. Projecting onto ll n and using the linearity of In' we get In(gn - in) == o. By the isometry in (] 1) it follows that 119n - inll == 0, and so 9n == In. 0 Exercises 1. Let l,. . . , n be i.i.d. N(m, (12). Show that the random variables ( == n- 1 Ek k and s2 == (n - 1)-1 Ek(k - )2 are independent and that (n - 1)s2 d Ek<n(k - m)2. (Hint: Use the symmetry in Proposition 13.2, and no calculations.) 2. For a Brownian motion B, put tnk == k2- n , and define O,k == Bk - B k - 1 and nk == B tn ,2k-l - (Btn-l,k-l + Btn-1,k)' k, n > 1. Show that the nk are independent Gaussian. Use this fact to construct a Brownian motion from a sequence of i.i.d. N(O, 1) random variables. 3. Let B be a Brownian motion on [0,1], and define Xt == Bt - tB l . Show that X JiB!. Use this fact to express the conditional distribution of B, given B!, in terms of a Brownian bridge. 4. Combine the transformations in Lemma 13.6 with the Brownian scal- ing c- 1 B(c 2 t) to construct a family of transformations preserving the distribution of a Brownian bridge. 5. Show that the Brownian bridge is an inhomogeneous Markov process. (Hint: Use the transformations in Lemma 13.6 or verify the condition in Proposition 13.7.) 6. 
Let $B = (B^1, B^2)$ be a Brownian motion in $\mathbb{R}^2$, and consider some times $t_{nk}$ as in Theorem 13.9. Show that $\sum_k (B^1_{t_{n,k}} - B^1_{t_{n,k-1}})(B^2_{t_{n,k}} - B^2_{t_{n,k-1}}) \to 0$ in $L^2$ or a.s., respectively. (Hint: Reduce to the case of the quadratic variation.)
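As a numerical companion to this exercise (and to the quadratic variation in Theorem 13.9), the following sketch simulates the increments of two independent Brownian components over a fine partition of $[0,1]$; the partition size is an arbitrary choice, not from the text.

```python
import numpy as np

# Numerical sketch: along a fine partition of [0, 1], the sum of squared
# increments of one Brownian component is close to t = 1, while the
# cross-variation of two independent components is close to 0.
rng = np.random.default_rng(3)

n = 2 ** 16                      # number of partition intervals
dB1 = rng.normal(scale=np.sqrt(1.0 / n), size=n)  # increments of B^1
dB2 = rng.normal(scale=np.sqrt(1.0 / n), size=n)  # increments of B^2

quad_var = float(np.sum(dB1 ** 2))   # approximates [B^1]_1 = 1
cross_var = float(np.sum(dB1 * dB2)) # approximates the cross-variation, 0
```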
268 Foundations of Modern Probability 7. Use Theorem 7.27 to construct an rcll version B of Brownian motion. Then show as in Theorem 13.9 that B has quadratic variation [B]t - t, and conclude that B is a.s. continuous. 8. For a Brownian motion B, show that inf {t > 0; Bt > O} == 0 a.s. (Hint: Conclude from Kolmogorov's 0-1 law that the stated event has probability o or 1. Alternatively, use Theorem 13.18.) 9. For a Brownian motion B, define Ta == inf{t > 0; Bt == a}. Compute the density of the distribution of Ta for a :/= 0, and show that ETa == 00. (Hint: Use Proposition 13.13.) 10. For a Brownian motion B, show that Zt == exp(cB t - !c 2 t) is a martin- gale for every c. Use optional sampling to compute the Laplace transform of Ta above, and compare with the preceding result. 11. (Paley, Wiener, and Zygmund) Show that Brownian motion B is a.s. nowhere Lipschitz continuous, and hence nowhere differentiable. (Hint: If B is Lipschitz at t < 1, there exist some K, 8 > 0 such that I Br - B s I < 2hK for all r, s E (t - h, t + h) with h < {yo Apply this to three consecutive n-dyadic intervals (r, s) around t.) 12. Refine the preceding argument to show that B is a.s. nowhere Holder continuous with exponent c > !. 13. Show that the local maxima of a Brownian motion are a.s. dense in  and that the corresponding times are a.s. dense in JR+. (Hint: Use the preceding result.) 14. Show by a direct argument that lim SUPt t- 1 / 2 Bt == 00 a.s. as t -t 0 and 00, where B is a Brownian motion. (Hint: Use Kolmogorov's 0-1 law.) 15. Show that the law of the iterated logarithm for Brownian motion at 0 remains valid for the Brownian bridge. 16. Show for a Brownian motion B in jRd that the process IBI satisfies the law of the iterated logarithm at 0 and 00. 17. Let 1,2,'" be i.i.d. N(D,l). Show that limsuPn(2Iogn)-1/2n == 1 a.s. 18. For a Brownian motion B, show that Mt == t- 1 Bt is a reverse martin- gale, and conclude that t- 1 Bt -t 0 a.s. and in LP, p > 0, as t -t 00. 
(Hint: The limit is degenerate by Kolmogorov's 0-1 law.) Deduce the same result from Theorem 10.9.

19. For a Brownian bridge $B$, show that $M_t = (1-t)^{-1} B_t$ is a martingale on $[0,1)$. Check that $M$ is not $L^1$-bounded.

20. Let $I_n$ be the $n$-fold Wiener-Itô integral w.r.t. Brownian motion $B$ on $\mathbb{R}_+$. Show that the process $M_t = I_n(1_{[0,t]}^{\otimes n})$ is a martingale. Express $M$ in terms of $B$, and compute the expression for $n = 1, 2, 3$. (Hint: Use Theorem 13.25.)
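For Exercise 20, the monic Hermite polynomials of Theorem 13.25 can be generated from the recursion $p_{n+1}(x) = x\,p_n(x) - n\,p_{n-1}(x)$, which mirrors Lemma 13.24. The short sketch below (an added illustration, not from the text) implements the recursion and checks it against the explicit polynomials $p_2(x) = x^2 - 1$ and $p_3(x) = x^3 - 3x$ listed earlier.

```python
# Monic (probabilists') Hermite polynomials via the recursion
# p_{n+1}(x) = x * p_n(x) - n * p_{n-1}(x), with p_0 = 1 and p_1 = x.
def hermite(n, x):
    """Evaluate the monic Hermite polynomial p_n at the point x."""
    p_prev, p_curr = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p_curr = p_curr, x * p_curr - k * p_prev
    return p_curr

# First few values at x = 2: p_0 = 1, p_1 = 2, p_2 = 4 - 1 = 3, p_3 = 8 - 6 = 2.
values = [hermite(n, 2.0) for n in range(4)]
```

By (16), $I_n(1_{[0,t]}^{\otimes n}) = t^{n/2}\, p_n(B_t/\sqrt{t})$, which is the closed form the exercise asks for.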
21. Let $\eta_1, \ldots, \eta_n$ be independent, isonormal Gaussian processes on a separable Hilbert space $H$. Show that there exists a unique continuous linear mapping $\bigotimes_k \eta_k$ from $H^{\otimes n}$ to $L^2(P)$ such that $(\bigotimes_k \eta_k)(\bigotimes_k h_k) = \prod_k \eta_k h_k$ a.s. for all $h_1, \ldots, h_n \in H$. Also show that $\bigotimes_k \eta_k$ is an isometry.
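Before leaving the chapter, the stationary Ornstein-Uhlenbeck example discussed above admits a simple simulation sketch. Here the process is normalized to unit stationary variance (a constant rescaling of the moving-average kernel in the text), and the Markov relation $X_t = e^{s-t} X_s + \text{(independent Gaussian noise)}$ is used as an exact AR(1) update; the step size and sample length are arbitrary choices.

```python
import numpy as np

# Simulation sketch of a stationary Ornstein-Uhlenbeck process with unit
# variance, using the exact AR(1) update implied by its Markov property.
rng = np.random.default_rng(4)

h, n = 0.1, 200000
phi = np.exp(-h)                    # e^{-h}
noise_sd = np.sqrt(1.0 - phi ** 2)  # keeps the stationary variance at 1

x = np.empty(n)
x[0] = rng.normal()                 # start in the stationary law N(0, 1)
for i in range(1, n):
    x[i] = phi * x[i - 1] + noise_sd * rng.normal()

var_est = float(x.var())                  # should be near 1
cov_est = float(np.mean(x[:-1] * x[1:]))  # lag-h covariance, near e^{-h}
```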
Chapter 14 Skorohod Embedding and Invariance Principles Embedding of random variables; approximation of random walks; functional central limit theorem; laws of the iterated logarithm; arcsine laws; approximation of renewal processes; empirical distribution functions; embedding and approximation of martingales In Chapter 5 we used analytic methods to derive criteria for a sum of inde- pendent random variables to be approximately Gaussian. Though this may remain the easiest approach to the classical limit theorems, the results are best understood when viewed as consequences of some general approxima- tion theorems for random processes. The aim of this chapter is to develop a purely probabilistic technique, the so-called Skorohod embedding, for deriving such functional limit theorems. In the simplest setting, we may consider a random walk (Sn) based on some i.i.d. random variables €k with mean 0 and variance 1. In this case there exist a Brownian motion B and some optional times Tl < /2 < ... such that Sn = Br n a.s. for every n. For applications it is essential to choose the Tn such that the differences Tn are again i.i.d. with mean one. The step process 3[t] will then be close to the path of B, and many results for Brownian motion carryover, at least approximately, to the random walk. In particular, the procedure yields versions for random walks of the arcsine laws and the law of the iterated logarithm. From the statements for random walks, similar results may be deduced rather easily for various related processes. In particular, we shall derive a functional central limit theorem and a law of the iterated logarithm for renewal processes, and we shall also see how suitably normalized versions of the empirical distribution functions from an i.i.d. sample can be approxi- mated by a Brownian bridge. 
For an extension in another direction, we shall obtain a version of the Skorohod embedding for general L 2 -martingales and show how any suitably time-changed martingale with small jumps can be approximated by a Brownian motion. The present exposition depends in many ways on material from previous chapters. Thus, we rely on the basic theory of Brownian motion, as set forth in Chapter 13. We also make frequent use of ideas and results from Chapter 7 on martingales and optional times. Finally, occasional references 
14. Skorohod Embedding and Invariance Principles 271

are made to Chapter 4 for empirical distributions, to Chapter 6 for the transfer theorem, to Chapter 9 for random walks and renewal processes, and to Chapter 12 for the Poisson process. More general approximations and functional limit theorems are obtained by different methods in Chapters 15, 16, and 19. We also note the close relationship between the present approximation result for martingales with small jumps and the time-change results for continuous local martingales in Chapter 18.

To clarify the basic ideas, we begin with a detailed discussion of the classical Skorohod embedding for random walks. The main result in this context is the following.

Theorem 14.1 (embedding of random walk, Skorohod) Let $\xi_1, \xi_2, \ldots$ be i.i.d. random variables with mean $0$, and put $S_n = \xi_1 + \cdots + \xi_n$. Then there exists a filtered probability space with a Brownian motion $B$ and some optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that $(B_{\tau_n}) \overset{d}{=} (S_n)$ and the differences $\Delta\tau_n = \tau_n - \tau_{n-1}$ are i.i.d. with $E\,\Delta\tau_n = E \xi_1^2$ and $E(\Delta\tau_n)^2 \le 4 E \xi_1^4$.

Here the moment requirements on the differences $\Delta\tau_n$ are crucial for applications. Without those conditions the statement would be trivially true, since we could then choose $B \perp\!\!\!\perp (\xi_n)$ and define the $\tau_n$ recursively by $\tau_n = \inf\{t > \tau_{n-1};\, B_t = S_n\}$. In that case $E\tau_n = \infty$ unless $\xi_1 = 0$ a.s.

The proof of Theorem 14.1 is based on a sequence of lemmas. First we exhibit some martingales associated with Brownian motion.

Lemma 14.2 (Brownian martingales) For a Brownian motion $B$, the processes $B_t$, $B_t^2 - t$, and $B_t^4 - 6t B_t^2 + 3t^2$ are all martingales.

Proof: Note that $E B_t = E B_t^3 = 0$, $E B_t^2 = t$, and $E B_t^4 = 3t^2$. Write $\mathcal{F}$ for the filtration induced by $B$, let $0 \le s < t$, and recall that the process $\tilde{B}_t = B_{s+t} - B_s$ is again a Brownian motion independent of $\mathcal{F}_s$. Hence,
$$E[B_t^2 \mid \mathcal{F}_s] = E[B_s^2 + 2 B_s \tilde{B}_{t-s} + \tilde{B}_{t-s}^2 \mid \mathcal{F}_s] = B_s^2 + t - s.$$
Moreover,
$$E[B_t^4 \mid \mathcal{F}_s] = E[B_s^4 + 4 B_s^3 \tilde{B}_{t-s} + 6 B_s^2 \tilde{B}_{t-s}^2 + 4 B_s \tilde{B}_{t-s}^3 + \tilde{B}_{t-s}^4 \mid \mathcal{F}_s] = B_s^4 + 6(t-s) B_s^2 + 3(t-s)^2,$$
and so
$$E[B_t^4 - 6 t B_t^2 \mid \mathcal{F}_s] = B_s^4 - 6 s B_s^2 + 3(s^2 - t^2). \qquad \square$$
By optional sampling, we may deduce some useful formulas.

Lemma 14.3 (moment relations) Consider a Brownian motion $B$ and an optional time $\tau$ such that $B^\tau$ is bounded. Then
$$E B_\tau = 0, \qquad E\tau = E B_\tau^2, \qquad E\tau^2 \le 4 E B_\tau^4. \tag{1}$$
Proof: By optional stopping and Lemma 14.2, we get for any $t \ge 0$
$$E B_{\tau \wedge t} = 0, \qquad E(\tau \wedge t) = E B_{\tau \wedge t}^2, \tag{2}$$
272 Foundations of Modern Probability 3E(T 1\ t)2 + EB;'/\t = 6E(T 1\ t)B;/\t. (3) The first two relations in (1) follow from (2) by dominated and monotone convergence as t --+ 00. In particular, we have ET < 00. We may then take limits even in (3) and conclude by dominated and monotone convergence together with the Cauchy-Buniakovsky inequality that 3ET 2 + EB; == 6ET B; < 6(ET 2 EB;)1/2. Writing r == (ET2jEB;.)1/2, we get 3r 2 + 1 < 6r. Thus, 3(r -1)2 < 2, and finally, r < 1 + (2/3)1/2 < 2. 0 The next result shows how an arbitrary distribution with mean 0 can be expressed as a mixture of centered two-point distributions. For any a < 0 < b, let lIa,b denote the unique probability measure on {a, b} with mean O. Clearly, lIa,b == 6 0 when ab == 0, and otherwise Va,b == b6 a - a6 b b - a ' a < 0 < b. It is easy to verify that v is a probability kernel from JR_ x 1R+ to IR. For mappings between two measure spaces, measurability is defined in terms of the O"-fields generated by all evaluation maps 7r B : J-l r-t J-lB, where B is an arbitrary set in the underlying a-field. Lemma 14.4 (randomization) For any distribution J-l on IR with mean zero, there exists a distribution ji on JR_ x JR+ with J-l == J ji( dx dy )vx,y, and we can choose ji to be a measurable function of J-l. Proof (Chung): Let J-l:f: denote the restrictions of jj to 1R:f: \ {O}, define l (x) = x, and put c == J ldJ-l+ == - J ldJ-l-. For any measurable function f: JR --+ JR+ with 1(0) == 0, we get c f fd/-L = J ld/-L+ f fd/-L- - J ld/-L- J fd/-L+ J J (y - x)/-L_(dx)/-L+(dy) f fdvx,y, and so we may take ji(dxdy) = J-l{0}6 0 ,o(dxdy) + c- 1 (y - x)J-L-(dx)J-l+(dy). The measurability of the mapping J-l t-+ ji is clear by a monotone class argument, once we note that ji(A x B) is a measurable function of J-l for arbitrary A, B E B(IR). 0 The embedding in Theorem 14.1 will now be constructed recursively, beginning with the first random variable eEl, 
14. Skorohod Embedding and Invariance Principles 273

Lemma 14.5 (embedding of random variables) For any probability measure μ on ℝ with mean 0, consider a random pair (α, β) with distribution μ̂ as in Lemma 14.4, and let B be an independent Brownian motion. Then the time τ = inf{t ≥ 0; B_t ∈ {α, β}} is optional for the filtration F_t = σ{α, β; B_s, s ≤ t}, and we have

L(B_τ) = μ,  Eτ = ∫ x^2 μ(dx),  Eτ^2 ≤ 4 ∫ x^4 μ(dx).

Proof: The process B is clearly an F-Brownian motion, and τ is F-optional as in Lemma 7.6 (ii). Using Lemma 14.3 and Fubini's theorem gives

L(B_τ) = E P[B_τ ∈ · | α, β] = E ν_{α,β} = μ,
Eτ = E E[τ | α, β] = E ∫ x^2 ν_{α,β}(dx) = ∫ x^2 μ(dx),
Eτ^2 = E E[τ^2 | α, β] ≤ 4E ∫ x^4 ν_{α,β}(dx) = 4 ∫ x^4 μ(dx). □

Proof of Theorem 14.1: Let μ be the common distribution of the ξ_n. Introduce a Brownian motion B and some independent i.i.d. pairs (α_n, β_n), n ∈ ℕ, with the distribution μ̂ of Lemma 14.4. Define recursively the random times 0 = τ₀ ≤ τ₁ ≤ ... by

τ_n = inf{t ≥ τ_{n−1}; B_t − B_{τ_{n−1}} ∈ {α_n, β_n}},  n ∈ ℕ.

Here each τ_n is clearly optional for the filtration F_t = σ{α_k, β_k, k ≥ 1; B_s, s ≤ t}, t ≥ 0, and B is an F-Brownian motion. By the strong Markov property at τ_n, the process B_t^{(n)} = B_{τ_n + t} − B_{τ_n} is then a Brownian motion independent of G_n = σ{τ_k, B_{τ_k}; k ≤ n}. Since moreover (α_{n+1}, β_{n+1}) ⫫ (B^{(n)}, G_n), we obtain (α_{n+1}, β_{n+1}, B^{(n)}) ⫫ G_n, and so the pairs (Δτ_n, ΔB_{τ_n}) are i.i.d. The remaining assertions now follow by Lemma 14.5. □

The last theorem enables us to approximate the entire random walk by a Brownian motion. As before, we assume the underlying probability space to be rich enough to support the required randomization variables.

Theorem 14.6 (approximation of random walk, Skorohod, Strassen) Let ξ₁, ξ₂, ... be i.i.d. random variables with mean 0 and variance 1, and write S_n = ξ₁ + ... + ξ_n. Then there exists a Brownian motion B such that

t^{−1/2} sup_{s≤t} |S_{[s]} − B_s| →P 0,  t → ∞,  (4)

lim_{t→∞} (S_{[t]} − B_t)/√(2t log log t) = 0 a.s.  (5)

The proof of (5) requires the following estimate.
Lemma 14.7 (rate of continuity) For a Brownian motion B in ℝ, we have

lim_{r↓1} limsup_{t→∞} sup_{t≤u≤rt} |B_u − B_t| / √(2t log log t) = 0 a.s.

Proof: Write h(t) = (2t log log t)^{1/2}. It is enough to show that

lim_{r↓1} limsup_{n→∞} sup_{r^n ≤ t ≤ r^{n+1}} |B_t − B_{r^n}| / h(r^n) = 0 a.s.  (6)

Proceeding as in the proof of Theorem 13.18, we get as n → ∞ for fixed r > 1 and c > 0

P{sup_{t∈[r^n, r^{n+1}]} |B_t − B_{r^n}| > c h(r^n)} ≲ P{B(r^n(r − 1)) > c h(r^n)} ≲ n^{−c^2/(r−1)} (log n)^{−1/2}.

(As before, a ≲ b means that a ≤ cb for some constant c > 0.) If c^2 > r − 1, it is clear from the Borel–Cantelli lemma that the limsup in (6) is a.s. bounded by c, and the relation follows as we let r → 1. □

For the main proof, we need to introduce the modulus of continuity

w(f, t, h) = sup_{r,s ≤ t, |r−s| ≤ h} |f_r − f_s|,  t, h > 0.

Proof of Theorem 14.6: By Theorems 6.10 and 14.1 we may choose a Brownian motion B and some optional times 0 = τ₀ ≤ τ₁ ≤ ... such that S_n = B_{τ_n} a.s. for all n, and the differences τ_n − τ_{n−1} are i.i.d. with mean 1. Then τ_n/n → 1 a.s. by the law of large numbers, and so τ_{[t]}/t → 1 a.s. Relation (5) now follows by Lemma 14.7.

Next define δ_t = sup_{s≤t} |τ_{[s]} − s|, t ≥ 0, and note that the a.s. convergence τ_n/n → 1 implies δ_t/t → 0 a.s. Fix any t, h, ε > 0, and conclude by the scaling property of B that

P{t^{−1/2} sup_{s≤t} |B_{τ_{[s]}} − B_s| > ε} ≤ P{w(B, t + th, th) > ε t^{1/2}} + P{δ_t > th}
= P{w(B, 1 + h, h) > ε} + P{t^{−1} δ_t > h}.

Here the right-hand side tends to zero as t → ∞ and then h → 0, and (4) follows. □

As an immediate application of the last theorem, we may extend the law of the iterated logarithm to suitable random walks.
Corollary 14.8 (law of the iterated logarithm, Hartman and Wintner) Let ξ₁, ξ₂, ... be i.i.d. random variables with mean 0 and variance 1, and define S_n = ξ₁ + ... + ξ_n. Then

limsup_{n→∞} S_n / √(2n log log n) = 1 a.s.

Proof: Combine Theorems 13.18 and 14.6. □

To derive a weak convergence result, let D[0,1] denote the space of all functions on [0,1] that are right-continuous with left-hand limits (rcll). For our present needs, it is convenient to equip D[0,1] with the norm ||x|| = sup_t |x_t| and the σ-field D generated by all evaluation maps π_t: x ↦ x_t. The norm is clearly D-measurable, and so the same thing is true for the open balls B_{x,r} = {y; ||x − y|| < r}, x ∈ D[0,1], r > 0. (However, D is strictly smaller than the Borel σ-field induced by the norm.) Given a process X with paths in D[0,1] and a mapping f: D[0,1] → ℝ, we say that f is a.s. continuous at X if X ∉ D_f a.s., where D_f is the set of functions x ∈ D[0,1] where f is discontinuous. (The measurability of D_f is irrelevant here, provided that we interpret the condition in the sense of inner measure.)

We may now state a functional version of the classical central limit theorem.

Theorem 14.9 (functional central limit theorem, Donsker) Let ξ₁, ξ₂, ... be i.i.d. random variables with mean 0 and variance 1, and define

X_t^n = n^{−1/2} Σ_{k≤nt} ξ_k,  t ∈ [0,1], n ∈ ℕ.

Consider a Brownian motion B on [0,1], and let f: D[0,1] → ℝ be measurable and a.s. continuous at B. Then f(X^n) →d f(B).

The result follows immediately from Theorem 14.6 together with the following lemma.

Lemma 14.10 (approximation and convergence) Let X₁, X₂, ... and Y₁, Y₂, ... be rcll processes on [0,1] with Y_n =d Y₁ ≡ Y for all n and ||X_n − Y_n|| →P 0, and let f: D[0,1] → ℝ be measurable and a.s. continuous at Y. Then f(X_n) →d f(Y).

Proof: Put T = ℚ ∩ [0,1]. By Theorem 6.10 there exist some processes X̃_n on T such that (X̃_n, Y) =d (X_n, Y_n) on T for all n. Then each X̃_n is a.s.
bounded and has finitely many upcrossings of any nondegenerate interval, and so the process X̂_n(t) = X̃_n(t+) exists a.s. with paths in D[0,1]. From the right continuity of paths it is also clear that (X̂_n, Y) =d (X_n, Y_n) on [0,1] for every n. To obtain the desired convergence, we note that ||X̂_n − Y|| =d ||X_n − Y_n|| →P 0, and hence f(X_n) =d f(X̂_n) →P f(Y) as in Lemma 4.3. □
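For a concrete instance of Theorem 14.9, take f(x) = sup_t x_t and the coin-flip walk P{ξ_k = ±1} = 1/2: by the reflection principle, P{sup_{t≤1} B_t ≤ c} = 2Φ(c) − 1. The following seeded Monte Carlo sketch (ours, not the book's; the step/path counts and the tolerance are heuristic choices) compares the empirical value at c = 1 with this limit:

```python
import math
import random

random.seed(1)

def walk_max(n):
    """max_k S_k / sqrt(n) for a simple random walk with P{xi = +-1} = 1/2."""
    s = m = 0
    for _ in range(n):
        s += random.choice((-1, 1))
        m = max(m, s)
    return m / math.sqrt(n)

n_steps, n_paths = 400, 4000
emp = sum(walk_max(n_steps) <= 1.0 for _ in range(n_paths)) / n_paths

# Reflection principle: P{sup_{t<=1} B_t <= c} = 2*Phi(c) - 1.
def Phi(c):
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))

exact = 2 * Phi(1.0) - 1          # about 0.6827
assert abs(emp - exact) < 0.06    # heuristic margin for n = 400, 4000 paths
```

The same code with another f (say f(x) = x_1, recovering the central limit theorem) only requires changing `walk_max`, which is the point of the invariance principle.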
In particular, we may recover the central limit theorem in Proposition 5.9 by taking f(x) = x₁ in Theorem 14.9. We may also obtain results that go beyond the classical theory, such as for the choice f(x) = sup_t |x_t|.

As a less obvious application, we shall see how the arcsine laws of Theorem 13.16 can be extended to suitable random walks. Recall that a random variable ξ is said to be arcsine distributed if ξ =d sin^2 α, where α is U(0, 2π).

Theorem 14.11 (arcsine laws, Erdős and Kac, Sparre-Andersen) Let (S_n) be a random walk based on some distribution μ with mean 0 and variance 1, and define for n ∈ ℕ

τ_n^1 = n^{−1} Σ_{k≤n} 1{S_k > 0},
τ_n^2 = n^{−1} min{k ≥ 0; S_k = max_{j≤n} S_j},
τ_n^3 = n^{−1} max{k ≤ n; S_k S_n ≤ 0}.

Then τ_n^i →d τ for i = 1, 2, 3, where τ is arcsine distributed. The results for i = 1, 2 remain valid for any nondegenerate, symmetric distribution μ.

For the proof, we consider on D[0,1] the functionals

f₁(x) = λ{t ∈ [0,1]; x_t > 0},
f₂(x) = inf{t ∈ [0,1]; x_t ∨ x_{t−} = sup_{s≤1} x_s},
f₃(x) = sup{t ∈ [0,1]; x_t x_1 ≤ 0}.

The following result is elementary.

Lemma 14.12 (continuity of functionals) The functionals f_i are measurable. Furthermore, f₁ is continuous at x iff λ{t; x_t = 0} = 0, f₂ is continuous at x iff x_t ∨ x_{t−} has a unique maximum, and f₃ is continuous at x if 0 is not a local extreme of x_t or x_{t−} on (0,1].

Proof of Theorem 14.11: Clearly, τ_n^i = f_i(X^n) for n ∈ ℕ and i = 1, 2, 3, where

X_t^n = n^{−1/2} S_{[nt]},  t ∈ [0,1], n ∈ ℕ.

To prove the first assertion, it suffices by Theorems 13.16 and 14.9 to show that each f_i is a.s. continuous at B. Thus, we need to verify that B a.s. satisfies the conditions in Lemma 14.12. For f₁ this is obvious, since by Fubini's theorem

E λ{t ≤ 1; B_t = 0} = ∫₀^1 P{B_t = 0} dt = 0.

The conditions for f₂ and f₃ follow easily from Lemma 13.15. To prove the last assertion, it is enough to consider τ_n^1, since τ_n^2 has the same distribution by Corollary 11.14.
Then introduce an independent Brownian motion B̃ and define

σ_n^ε = n^{−1} Σ_{k≤n} 1{ε B̃_k + (1 − ε) S_k > 0},  n ∈ ℕ, ε ∈ (0,1].
By the first assertion together with Theorem 9.11 and Corollary 11.14, we have σ_n^ε =d σ_n^1 →d τ. Since P{S_n = 0} → 0, e.g. by Theorem 4.17, we also note that

limsup_{ε→0} |σ_n^ε − τ_n^1| ≤ n^{−1} Σ_{k≤n} 1{S_k = 0} →P 0.

Hence, we may choose some constants ε_n → 0 with σ_n^{ε_n} − τ_n^1 →P 0, and by Theorem 4.28 we get τ_n^1 →d τ. □

Theorem 14.9 is often referred to as an invariance principle, because the limiting distribution of f(X^n) is the same for all i.i.d. sequences (ξ_k) with mean 0 and variance 1. This fact is often useful for applications, since a direct computation may be possible for some special choice of distribution, such as for P{ξ_k = ±1} = 1/2.

The approximation Theorem 14.6 yields a corresponding result for renewal processes, regarded here as nondecreasing step processes.

Theorem 14.13 (approximation of renewal processes) Let N be a renewal process based on some distribution μ with mean 1 and variance σ^2 ∈ (0, ∞). Then there exists a Brownian motion B such that

t^{−1/2} sup_{s≤t} |N_s − s − σB_s| →P 0,  t → ∞,  (7)

lim_{t→∞} (N_t − t − σB_t)/√(2t log log t) = 0 a.s.  (8)

Proof: Let τ₀, τ₁, ... be the renewal times of N, and introduce the random walk S_n = n − τ_n + τ₀, n ∈ ℤ₊. Choosing a Brownian motion B as in Theorem 14.6, we get

lim_{n→∞} (N_{τ_n} − τ_n − σB_n)/√(2n log log n) = lim_{n→∞} (S_n − σB_n)/√(2n log log n) = 0 a.s.

Since τ_n ~ n a.s. by the law of large numbers, we may replace n in the denominator by τ_n, and by Lemma 14.7 we may further replace B_n by B_{τ_n}. Hence,

(N_t − t − σB_t)/√(2t log log t) → 0 a.s. along (τ_n).

Invoking Lemma 14.7, we see that (8) will follow if we can only show that

(τ_{n+1} − τ_n)/√(2τ_n log log τ_n) → 0 a.s.

This may be seen most easily from Theorem 14.6.

From Theorem 14.6 we see that also

n^{−1/2} sup_{k≤n} |N_{τ_k} − τ_k − σB_k| = n^{−1/2} sup_{k≤n} |S_k − τ₀ − σB_k| →P 0,

and by Brownian scaling,

n^{−1/2} w(B, n, 1) =d w(B, 1, n^{−1}) →P 0.
To get (7), it is then enough to show that

n^{−1/2} sup_{k≤n} |τ_k − τ_{k−1} − 1| = n^{−1/2} sup_{k≤n} |S_k − S_{k−1}| →P 0,

which is again clear from Theorem 14.6. □

We may now proceed as in Corollary 14.8 and Theorem 14.9 to deduce an associated law of the iterated logarithm and a weak convergence result.

Corollary 14.14 (limits of renewal processes) Let N be a renewal process based on some distribution μ with mean 1 and variance σ^2 < ∞. Then

limsup_{t→∞} ±(N_t − t)/√(2t log log t) = σ a.s.

If B is a Brownian motion and

X_t^r = (N_{rt} − rt)/(σ r^{1/2}),  t ∈ [0,1], r > 0,

then also f(X^r) →d f(B) as r → ∞ for any measurable function f: D[0,1] → ℝ that is a.s. continuous at B.

The weak convergence part of the last corollary yields a similar result for the empirical distribution functions associated with a sequence of i.i.d. random variables. In this case the asymptotic behavior can be expressed in terms of a Brownian bridge.

Theorem 14.15 (approximation of empirical distribution functions) Let ξ₁, ξ₂, ... be i.i.d. random variables with distribution function F and empirical distribution functions F̂₁, F̂₂, ... . Then there exist some Brownian bridges B^1, B^2, ... such that

sup_x |n^{1/2}(F̂_n(x) − F(x)) − B^n ∘ F(x)| →P 0,  n → ∞.  (9)

Proof: Arguing as in the proof of Proposition 4.24, we may reduce the discussion to the case when the ξ_n are U(0,1), and F(t) = t on [0,1]. Then clearly

n^{1/2}(F̂_n(t) − F(t)) = n^{−1/2} Σ_{k≤n} (1{ξ_k ≤ t} − t),  t ∈ [0,1].

Now introduce for each n an independent Poisson random variable κ_n with mean n, and conclude from Proposition 12.4 that N_t^n = Σ_{k≤κ_n} 1{ξ_k ≤ t} is a homogeneous Poisson process on [0,1] with rate n. By Theorem 14.13 there exist some Brownian motions W^n on [0,1] with

sup_{t≤1} |n^{−1/2}(N_t^n − nt) − W_t^n| →P 0.

For the associated Brownian bridges B_t^n = W_t^n − tW_1^n, we get

sup_{t≤1} |n^{−1/2}(N_t^n − tN_1^n) − B_t^n| →P 0.
To deduce (9), it is enough to show that

n^{−1/2} sup_{t≤1} |Σ_{k ≤ |κ_n − n|} (1{ξ_k ≤ t} − t)| →P 0.  (10)

Here |κ_n − n| →P ∞, e.g. by Proposition 5.9, and so (10) holds by Proposition 4.24 with n^{1/2} replaced by |κ_n − n|. It remains to note that n^{−1/2}|κ_n − n| is tight, since E(κ_n − n)^2 = n. □

Our next aim is to establish martingale versions of the Skorohod embedding Theorem 14.1 and the associated approximation Theorem 14.6.

Theorem 14.16 (embedding of martingales) Let (M_n) be a martingale with M₀ = 0 and induced filtration (G_n). Then there exist a Brownian motion B and some associated optional times 0 = τ₀ ≤ τ₁ ≤ ... such that M_n = B_{τ_n} a.s. for all n and

E[Δτ_n | F_{n−1}] = E[(ΔM_n)^2 | G_{n−1}],  (11)
E[(Δτ_n)^2 | F_{n−1}] ≤ 4E[(ΔM_n)^4 | G_{n−1}],  (12)

where (F_n) denotes the filtration induced by the pairs (M_n, τ_n).

Proof: Let μ₁, μ₂, ... be probability kernels satisfying

P[ΔM_n ∈ · | G_{n−1}] = μ_n(M₁, ..., M_{n−1}; ·) a.s.,  n ∈ ℕ.  (13)

Since the M_n form a martingale, we may assume that μ_n(x; ·) has mean 0 for all x ∈ ℝ^{n−1}. Define the associated measures μ̂_n(x; ·) on ℝ^2 as in Lemma 14.4, and conclude from the measurability part of the lemma that μ̂_n is a probability kernel from ℝ^{n−1} to ℝ^2. Next choose some measurable functions f_n: ℝ^n → ℝ^2 as in Lemma 3.22 such that f_n(x, ϑ) has distribution μ̂_n(x, ·) when ϑ is U(0,1).

Now fix any Brownian motion B' and some independent i.i.d. U(0,1) random variables ϑ₁, ϑ₂, ... . Take τ'₀ = 0, and recursively define the random variables α_n, β_n, and τ'_n, n ∈ ℕ, through the relations

(α_n, β_n) = f_n(B'_{τ'₁}, ..., B'_{τ'_{n−1}}, ϑ_n),  (14)
τ'_n = inf{t ≥ τ'_{n−1}; B'_t − B'_{τ'_{n−1}} ∈ {α_n, β_n}}.  (15)

Since B' is a Brownian motion for the filtration B_t = σ{(B')^t, (ϑ_n)}, t ≥ 0, and each τ'_n is B-optional, the strong Markov property shows that B_t^{(n)} = B'_{τ'_n + t} − B'_{τ'_n} is again a Brownian motion independent of F'_n = σ{τ'_k, B'_{τ'_k}; k ≤ n}. Since also ϑ_{n+1} ⫫ (B^{(n)}, F'_n), we have (B^{(n)}, ϑ_{n+1}) ⫫ F'_n.
Writing G'_n = σ{B'_{τ'_k}; k ≤ n}, it follows easily that

(τ'_{n+1}, B'_{τ'_{n+1}}) ⫫_{G'_n} F'_n.  (16)

By (14) and Theorem 6.4 we have

P[(α_n, β_n) ∈ · | G'_{n−1}] = μ̂_n(B'_{τ'₁}, ..., B'_{τ'_{n−1}}; ·).  (17)
Since also B^{(n−1)} ⫫ (α_n, β_n, G'_{n−1}), the process B^{(n−1)} is conditionally a Brownian motion given (G'_{n−1}, α_n, β_n). Applying Lemma 14.5 to the conditional distributions given G'_{n−1}, we get by (15), (16), and (17)

P[ΔB'_{τ'_n} ∈ · | G'_{n−1}] = μ_n(B'_{τ'₁}, ..., B'_{τ'_{n−1}}; ·),  (18)
E[Δτ'_n | F'_{n−1}] = E[Δτ'_n | G'_{n−1}] = E[(ΔB'_{τ'_n})^2 | G'_{n−1}],  (19)
E[(Δτ'_n)^2 | F'_{n−1}] = E[(Δτ'_n)^2 | G'_{n−1}] ≤ 4E[(ΔB'_{τ'_n})^4 | G'_{n−1}].  (20)

Comparing (13) and (18) gives (B'_{τ'_n}) =d (M_n). By Theorem 6.10 we may then choose a Brownian motion B with associated optional times τ₁, τ₂, ... such that

{B, (M_n), (τ_n)} =d {B', (B'_{τ'_n}), (τ'_n)}.

All a.s. relations between the objects on the right, including also their conditional expectations given any induced σ-fields, remain valid for the objects on the left. In particular, M_n = B_{τ_n} a.s. for all n, and relations (19) and (20) imply the corresponding formulas (11) and (12). □

We may use the last theorem to show how martingales with small jumps can be approximated by a Brownian motion. For martingales M on ℤ₊, we then introduce the quadratic variation [M] and predictable quadratic variation ⟨M⟩, given by

[M]_n = Σ_{k≤n} (ΔM_k)^2,  ⟨M⟩_n = Σ_{k≤n} E[(ΔM_k)^2 | F_{k−1}].

Continuous-time versions of those processes are considered in Chapters 17 and 26.

Theorem 14.17 (approximation of martingales with small jumps) For each n ∈ ℕ, let M^n be an F^n-martingale on ℤ₊ with M₀^n = 0 and |ΔM_k^n| ≤ 1, and assume that sup_k |ΔM_k^n| →P 0. Define

X_t^n = Σ_k ΔM_k^n 1{[M^n]_k ≤ t},  t ∈ [0,1], n ∈ ℕ,

and put ζ_n = [M^n]_∞. Then (X^n − B^n)*_{ζ_n ∧ 1} →P 0 for some Brownian motions B^n. This remains true with [M^n] replaced by ⟨M^n⟩, and we may also replace the condition sup_k |ΔM_k^n| →P 0 by

Σ_k P[|ΔM_k^n| > ε | F^n_{k−1}] →P 0,  ε > 0.  (21)

For the proof, we need to show that the time scales given by the sequences (τ_k^n), [M^n], and ⟨M^n⟩ are asymptotically equivalent.
Lemma 14.18 (time-scale comparison) Assume in Theorem 14.17 that M_k^n = B^n(τ_k^n) a.s. for some Brownian motions B^n and associated optional times τ_k^n as in Theorem 14.16. Put κ_t^n = inf{k; [M^n]_k > t}. Then as n → ∞ for fixed t > 0, we have

sup_{k ≤ κ_t^n} (|τ_k^n − [M^n]_k| ∨ |[M^n]_k − ⟨M^n⟩_k|) →P 0.  (22)

Proof: By optional stopping, we may assume that [M^n] is uniformly bounded and take the supremum in (22) over all k. To handle the second difference in (22), we note that D^n = [M^n] − ⟨M^n⟩ is a martingale for each n. Using the martingale property, Proposition 7.16, and dominated convergence, we get

E(D^n)*^2 ≲ sup_k E(D_k^n)^2 = Σ_k E(ΔD_k^n)^2 ≤ Σ_k E E[(Δ[M^n]_k)^2 | F^n_{k−1}] = E Σ_k (ΔM_k^n)^4 ≲ E sup_k (ΔM_k^n)^2 → 0,

and so (D^n)* →P 0. This clearly remains true if each sequence ⟨M^n⟩ is defined in terms of the filtration G^n induced by M^n.

To complete the proof of (22), it is enough to show, for the latter versions of ⟨M^n⟩, that (τ^n − ⟨M^n⟩)* →P 0. Then let F̃^n denote the filtration induced by the pairs (M_k^n, τ_k^n), k ∈ ℕ, and conclude from (11) that

⟨M^n⟩_m = Σ_{k≤m} E[Δτ_k^n | F̃^n_{k−1}],  m, n ∈ ℕ.

Hence, D̃^n = τ^n − ⟨M^n⟩ is an F̃^n-martingale. Using (11) and (12), we then get as before

E(D̃^n)*^2 ≲ sup_k E(D̃_k^n)^2 = Σ_k E E[(ΔD̃_k^n)^2 | F̃^n_{k−1}] ≤ Σ_k E E[(Δτ_k^n)^2 | F̃^n_{k−1}] ≤ 4 Σ_k E E[(ΔM_k^n)^4 | G^n_{k−1}] = 4E Σ_k (ΔM_k^n)^4 ≲ E sup_k (ΔM_k^n)^2 → 0. □

The sufficiency of (21) is a consequence of the following simple estimate.

Lemma 14.19 (Dvoretzky) For any filtration F on ℤ₊ and sets A_n ∈ F_n, n ∈ ℕ, we have

P ∪_n A_n ≤ P{Σ_n P[A_n | F_{n−1}] > ε} + ε,  ε > 0.
Proof: Write ξ_n = 1_{A_n} and ξ̂_n = P[A_n | F_{n−1}], fix any ε > 0, and define τ = inf{n; ξ̂₁ + ... + ξ̂_n > ε}. Then {τ ≤ n} ∈ F_{n−1} for each n, and so

E Σ_{n<τ} ξ_n = Σ_n E[ξ_n; τ > n] = Σ_n E[ξ̂_n; τ > n] = E Σ_{n<τ} ξ̂_n ≤ ε.

Hence,

P ∪_n A_n ≤ P{τ < ∞} + E Σ_{n<τ} ξ_n ≤ P{Σ_n ξ̂_n > ε} + ε. □

Proof of Theorem 14.17: To prove the result for the time-scales [M^n], we may reduce by optional stopping to the case when [M^n] ≤ 2 for all n. For each n we may choose some Brownian motion B^n and associated optional times τ_k^n as in Theorem 14.16. Then

(X^n − B^n)*_{ζ_n ∧ 1} ≤ w(B^n, 1 + δ_n, δ_n),  n ∈ ℕ,

where δ_n = sup_k {|τ_k^n − [M^n]_k| + (ΔM_k^n)^2}, and so

E[(X^n − B^n)*_{ζ_n ∧ 1} ∧ 1] ≤ E[w(B^n, 1 + h, h) ∧ 1] + P{δ_n > h}.

Since δ_n →P 0 by Lemma 14.18, the right-hand side tends to zero as n → ∞ and then h → 0, and the assertion follows.

In the case of the time scales ⟨M^n⟩, define κ_n = inf{k; [M^n]_k > 2}. Then [M^n]_{κ_n} − ⟨M^n⟩_{κ_n} →P 0 by Lemma 14.18, and so P{⟨M^n⟩_{κ_n} ≤ 1, κ_n < ∞} → 0. We may then reduce by optional stopping to the case when [M^n] ≤ 3. The proof may now be completed as before. □

Though the Skorohod embedding has no natural extension to higher dimensions, one can still obtain useful multidimensional approximations by applying the previous results to each component separately. To illustrate the method, we proceed to show how suitable random walks in ℝ^d can be approximated by continuous processes with stationary, independent increments. Extensions to more general limits are obtained by different methods in Corollary 15.20 and Theorem 16.14.

Theorem 14.20 (approximation of random walks in ℝ^d) Let S^1, S^2, ... be random walks in ℝ^d such that L(S^n_{m_n}) →w N(0, σσ') for some d × d matrix σ and integers m_n → ∞. Then there exist some Brownian motions B^1, B^2, ... in ℝ^d such that the processes X_t^n = S^n_{[m_n t]} satisfy (X^n − σB^n)*_t →P 0 for all t > 0.

Proof: By Theorem 5.15 we have

max_{k ≤ m_n t} |ΔS_k^n| →P 0,  t > 0,
and so we may assume that |ΔS_k^n| ≤ 1 for all n and k. Subtracting the means, we may further assume that ES_k^n = 0.

Applying Theorem 14.17 in each coordinate, we get w(X^n, t, h) →P 0 as n → ∞ and then h → 0. Furthermore, w(σB, t, h) → 0 a.s. as h → 0. Using Theorem 5.15 in both directions gives X^n_{t_n} →d σB_t as t_n → t. By independence it follows that (X^n_{t_1}, ..., X^n_{t_m}) →d σ(B_{t_1}, ..., B_{t_m}) for any t_1, ..., t_m ≥ 0, and so X^n →d σB on ℚ₊ by Theorem 4.29.

By Theorem 4.30 or, more conveniently, by Corollary 6.12 and Theorem A2.2, there exist some rcll processes Y^n =d X^n with Y_t^n → σB_t a.s. for all t ∈ ℚ₊. For any t, h > 0 we have

E[(Y^n − σB)*_t ∧ 1] ≤ E[max_{j ≤ t/h} |Y^n_{jh} − σB_{jh}| ∧ 1] + E[w(Y^n, t, h) ∧ 1] + E[w(σB, t, h) ∧ 1].

Multiplying by e^{−t}, integrating over t > 0, and letting n → ∞ and then h → 0 along ℚ₊, we get by dominated convergence

∫₀^∞ e^{−t} E[(Y^n − σB)*_t ∧ 1] dt → 0.

Hence, by monotonicity, the last integrand tends to zero as n → ∞, and so (Y^n − σB)*_t →P 0 for each t > 0. It remains to use Theorem 6.10. □

Exercises

1. Proceed as in Lemma 14.2 to construct Brownian martingales with leading terms B_t^3 and B_t^5. Use multiple Wiener–Itô integrals to give an alternative proof of the lemma, and find for every n ∈ ℕ a martingale with leading term B_t^n. (Hint: Use Theorem 13.25.)

2. Given a Brownian motion B and an associated optional time τ < ∞, show that Eτ ≥ EB_τ^2. (Hint: Truncate τ and use Fatou's lemma.)

3. For S_n as in Corollary 14.8, show that the sequence of random variables (2n log log n)^{−1/2} S_n, n ≥ 3, is a.s. relatively compact with set of limit points equal to [−1, 1]. (Hint: Prove the corresponding property for Brownian motion, and use Theorem 14.6.)

4. Let ξ₁, ξ₂, ... be i.i.d. random vectors in ℝ^d with mean 0 and covariances δ_ij. Show that the conclusion of Corollary 14.8 holds with S_n replaced by |S_n|.
More precisely, show that the sequence (2n log log n)^{−1/2} S_n, n ≥ 3, is relatively compact in ℝ^d, and that the set of limit points is contained in the closed unit ball. (Hint: Apply Corollary 14.8 to the projections u·S_n for arbitrary u ∈ ℝ^d with |u| = 1.)
5. In Theorem 13.18, show that for any c ∈ (0,1) there exists a sequence t_n → ∞ such that the limsup along (t_n) equals c a.s. Conclude that the set of limit points in the preceding exercise agrees with the closed unit ball in ℝ^d.

6. Condition (21) clearly follows from Σ_k E[|ΔM_k^n| ∧ 1 | F^n_{k−1}] →P 0. Show by an example that the latter condition is strictly stronger. (Hint: Consider a sequence of random walks.)

7. Specialize Lemma 14.18 to random walks, and give a direct proof in this case.

8. In the special case of random walks, show that condition (21) is also necessary. (Hint: Use Theorem 5.15.)

9. Specialize Theorem 14.17 to a sequence of random walks in ℝ, and derive a corresponding extension of Theorem 14.9. Then derive a functional version of Theorem 5.12.

10. Specialize further to the case of successive renormalizations of a single random walk (S_n). Then derive a limit theorem for the values at t = 1, and compare with Proposition 5.9.

11. In the second arcsine law of Theorem 14.11, show that the first maximum on [0,1] can be replaced by the last one. Conclude that the associated times σ_n and τ_n satisfy τ_n − σ_n →P 0. (Hint: Use the corresponding result for Brownian motion. Alternatively, use the symmetry of (S_n) and of the arcsine distribution.)

12. Extend Theorem 14.11 to an arbitrary sequence of symmetric random walks satisfying a Lindeberg condition. Also extend the results for τ_n^1 and τ_n^2 to sequences of random walks based on diffuse, symmetric distributions. Finally, show that the result for τ_n^3 may fail in the latter case. (Hint: Consider the n^{−1}-increments of a compound Poisson process based on the uniform distribution on [−1,1], perturbed by a small diffusion term ε_n B, where B is an independent Brownian motion.)

13. In the context of Theorem 14.20, show that for any Brownian motion B there exist some processes Y^n =d X^n such that (Y^n − σB)*_t → 0 a.s. for all t > 0. Prove a corresponding version of Theorem 14.17.
(Hint: Use Theorem 4.30 or Corollary 6.12.) 
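The first arcsine law of Theorem 14.11 lends itself to a quick seeded simulation for the coin-flip walk P{ξ_k = ±1} = 1/2. In this sketch (ours, not the book's) ties S_k = 0 are given weight 1/2 as a symmetrizing convention, and the tolerance is a heuristic Monte Carlo margin:

```python
import math
import random

random.seed(2)

def frac_positive(n):
    """tau_n^1 for a simple random walk, counting ties S_k = 0 with weight 1/2."""
    s, pos = 0, 0.0
    for _ in range(n):
        s += random.choice((-1, 1))
        pos += 1.0 if s > 0 else (0.5 if s == 0 else 0.0)
    return pos / n

n_steps, n_paths, x = 500, 4000, 0.2
emp = sum(frac_positive(n_steps) <= x for _ in range(n_paths)) / n_paths

# Arcsine CDF: P{tau <= x} = (2/pi) * arcsin(sqrt(x)), about 0.295 at x = 0.2.
exact = (2 / math.pi) * math.asin(math.sqrt(x))
assert abs(emp - exact) < 0.07
```

By the invariance principle, replacing the ±1 steps by any other mean-0, variance-1 increments should leave the limiting value unchanged.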
Chapter 15

Independent Increments and Infinite Divisibility

Regularity and integral representation; Lévy processes and subordinators; stable processes and first-passage times; infinitely divisible distributions; characteristics and convergence criteria; approximation of Lévy processes and random walks; limit theorems for null arrays; convergence of extremes

In Chapters 12 and 13 we saw how Poisson processes and Brownian motion arise as special processes with independent increments. Our present aim is to study more general processes of this type. Under a mild regularity assumption, we shall derive a general representation of independent-increment processes in terms of a Gaussian component and a jump component, where the latter is expressible as a suitably compensated Poisson integral. Of special importance is the time-homogeneous case of so-called Lévy processes, which admit a description in terms of a characteristic triple (a, b, ν), where a is the diffusion rate, b is the drift coefficient, and ν is the Lévy measure that determines the rates for jumps of different sizes.

In the same way that Brownian motion is the basic example of both a diffusion process and a continuous martingale, the general Lévy processes constitute the fundamental cases of both Markov processes and general semimartingales. As a motivation for the general weak convergence theory of Chapter 16, we shall further see how Lévy processes serve as the natural approximations to random walks. In particular, such approximations may be used to extend two of the arcsine laws for Brownian motion to general symmetric Lévy processes. Increasing Lévy processes, also called subordinators, play a basic role in Chapter 22, where they appear in representations of local time and regenerative sets.

The distributions of Lévy processes at fixed times coincide with the infinitely divisible laws, which also arise as the most general limit laws in the classical limit theorems for null arrays.
The special cases of convergence toward Poisson and Gaussian limits were considered in Chapter 5, and now we shall be able to characterize the convergence toward an arbitrary infinitely divisible law. Though characteristic functions will still be needed occasionally as a technical tool, the present treatment is more probabilistic in flavor and involves as crucial steps a centering at truncated means followed by a compound Poisson approximation.

To resume our discussion of general independent-increment processes, say that a process X in ℝ^d is continuous in probability if X_s →P X_t whenever s → t. Let us further say that a function f on ℝ₊ or [0,1] is right-continuous with left-hand limits (abbreviated as rcll) if the right- and left-hand limits f_{t±} exist and are finite and if, moreover, f_{t+} = f_t. A process X is said to be rcll if its paths have this property. In that case only jump discontinuities may occur, and we say that X has a fixed jump at some time t > 0 if P{X_t ≠ X_{t−}} > 0.

The following result gives the basic regularity properties of independent-increment processes. A similar result for Feller processes is obtained by different methods in Theorem 19.15.

Theorem 15.1 (regularization, Lévy) If a process X in ℝ^d is continuous in probability and has independent increments, then X has an rcll version without fixed jumps.

For the proof we shall use a martingale argument based on the characteristic functions

φ_{s,t}(u) = E exp{iu(X_t − X_s)},  u ∈ ℝ^d, 0 ≤ s ≤ t.

Note that φ_{r,s} φ_{s,t} = φ_{r,t} for any r ≤ s ≤ t, and put φ_{0,t} = φ_t. In order to construct associated martingales, we need to know that φ_{s,t} ≠ 0.

Lemma 15.2 (zeros) For any u ∈ ℝ^d and s ≤ t, we have φ_{s,t}(u) ≠ 0.

Proof: Fix any u ∈ ℝ^d and s ≤ t. Since X is continuous in probability, there exists for any r ≥ 0 some h > 0 such that φ_{r,r'}(u) ≠ 0 whenever |r − r'| ≤ h. By compactness we may then choose finitely many division points s = t₀ < t₁ < ... < t_n = t such that φ_{t_{k−1},t_k}(u) ≠ 0 for all k, and by the independence of the increments we get φ_{s,t}(u) = Π_k φ_{t_{k−1},t_k}(u) ≠ 0. □

We also need the following deterministic convergence criterion.

Lemma 15.3 (complex exponentials) Fix any a₁, a₂, ... ∈ ℝ^d. Then a_n converges iff e^{iu·a_n} converges for almost every u ∈ ℝ^d.
Proof: Assume the stated condition. Fix a nondegenerate Gaussian random vector η in ℝ^d, and note that exp{itη·(a_m − a_n)} → 1 a.s. as m, n → ∞ for fixed t ∈ ℝ. By dominated convergence the characteristic function of η·(a_m − a_n) tends to 1, and so η·(a_m − a_n) →P 0 by Theorem 5.3, which implies a_m − a_n → 0. Thus, (a_n) is Cauchy and therefore convergent. □

Proof of Theorem 15.1: We may clearly assume that X₀ = 0. By Lemma 15.2 we may define

M_t^u = e^{iuX_t} / φ_t(u),  t ≥ 0, u ∈ ℝ^d,
which is clearly a martingale in t for each u. Letting Ω_u ⊂ Ω denote the set where e^{iuX_t} has limits from the left and right along ℚ₊ at every t ≥ 0, we see from Theorem 7.18 that PΩ_u = 1. Restating the definition of Ω_u in terms of upcrossings, we note that the set A = {(u, ω); ω ∈ Ω_u} is product measurable in ℝ^d × Ω. Writing A_ω = {u ∈ ℝ^d; ω ∈ Ω_u}, it follows by Fubini's theorem that the set Ω' = {ω; λA_ω^c = 0} has probability 1. If ω ∈ Ω', we have u ∈ A_ω for almost every u ∈ ℝ^d, and so Lemma 15.3 shows that X itself has finite right- and left-hand limits along ℚ₊. Now define X̃_t = X_{t+} on Ω' and X̃ = 0 on Ω'^c, and note that X̃ is rcll everywhere. Further note that X̃ is a version of X, since X_{t+h} →P X_t as h → 0 for fixed t by hypothesis. For the same reason, X̃ has no fixed jumps. □

We proceed to state the general representation theorem. Given any Poisson process η with intensity measure μ = Eη, we recall from Theorem 12.13 that the integral (η − μ)f = ∫ f(x)(η − μ)(dx) exists in the sense of approximation in probability iff μ(f^2 ∧ |f|) < ∞.

Theorem 15.4 (independent-increment processes, Lévy, Itô) Let X be an rcll process in ℝ^d with X₀ = 0. Then X has independent increments and no fixed jumps iff, a.s. for each t ≥ 0,

X_t = m_t + G_t + ∫₀^t ∫_{|x|≤1} x (η − Eη)(ds dx) + ∫₀^t ∫_{|x|>1} x η(ds dx),  (1)

for some continuous function m with m₀ = 0, some continuous centered Gaussian process G with independent increments and G₀ = 0, and some independent Poisson process η on (0,∞) × (ℝ^d \ {0}) with

∫₀^t ∫ (|x|^2 ∧ 1) Eη(ds dx) < ∞,  t ≥ 0.  (2)

In the special case when X is real and nondecreasing, (1) simplifies to

X_t = a_t + ∫₀^t ∫₀^∞ x η(ds dx),  t ≥ 0,  (3)

for some nondecreasing continuous function a with a₀ = 0 and some Poisson process η on (0,∞)^2 with

∫₀^t ∫₀^∞ (x ∧ 1) Eη(ds dx) < ∞,  t ≥ 0.  (4)

Both representations are a.s.
unique, and all functions m or a and processes G and η with the stated properties may occur.

We begin the proof by analyzing the jump structure of X. Let us then introduce the random measure

η = Σ_t δ_{t,ΔX_t} = Σ_t 1{(t, ΔX_t) ∈ ·},  (5)

where the summation extends over all times t > 0 with ΔX_t ≡ X_t − X_{t−} ≠ 0. We say that η is locally X-measurable if, for any s < t, the measure η((s,t] × ·) is a measurable function of the process X_r − X_s, r ∈ [s,t].
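To make the jump measure (5) concrete: for a compound Poisson path, η([0,T] × ·) simply counts the jumps by size, and (anticipating the next lemma) these counts are Poisson with intensity Eη. A seeded simulation sketch, with invented parameters (rate `lam`, jump sizes uniform on {1, 2}, so Eη([0,T] × {x}) = lam·T/2):

```python
import random

random.seed(3)
lam, T, n_rep = 2.0, 10.0, 2000

def jump_measure():
    """eta([0,T] x .) for one compound Poisson path: jump counts by size."""
    counts = {1: 0, 2: 0}
    t = random.expovariate(lam)        # first jump time
    while t <= T:
        counts[random.choice((1, 2))] += 1
        t += random.expovariate(lam)   # next interarrival
    return counts

tot = {1: 0, 2: 0}
for _ in range(n_rep):
    c = jump_measure()
    tot[1] += c[1]
    tot[2] += c[2]

# E eta([0,T] x {x}) = lam * T * nu{x}, with nu uniform on {1, 2}.
for x in (1, 2):
    assert abs(tot[x] / n_rep - lam * T * 0.5) < 0.5
```

The counts over disjoint time intervals or disjoint size sets are independent, reflecting the Poisson character of η established in Lemma 15.5.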
Lemma 15.5 (Poisson process of jumps) Let X be an rcll process in ℝ^d with independent increments and no fixed jumps. Then η in (5) is a locally X-measurable Poisson process on (0,∞) × (ℝ^d \ {0}) satisfying (2). If X is further real-valued and nondecreasing, then η is supported by (0,∞)^2 and satisfies (4).

Proof (beginning): Fix any times s < t, and consider a sequence of partitions s = t_{n,0} < ... < t_{n,n} = t with max_k (t_{n,k} − t_{n,k−1}) → 0. For any continuous function f on ℝ^d that vanishes in a neighborhood of 0, we have

Σ_k f(X_{t_{n,k}} − X_{t_{n,k−1}}) → ∫ f(x) η((s,t] × dx),

which implies the measurability of the integrals on the right. By a simple approximation we may conclude that η((s,t] × B) is measurable for every compact set B ⊂ ℝ^d \ {0}. The measurability extends by a monotone class argument to all random variables ηA with A included in some fixed bounded rectangle [0,t] × B, and the further extension to arbitrary Borel sets is immediate.

Since X has independent increments and no fixed jumps, the same properties hold for η, which is then Poisson by Theorem 12.10. If X is real-valued and nondecreasing, then (4) holds by Theorem 12.13. □

The proof of (2) requires a further lemma, which is also needed for the main proof.

Lemma 15.6 (orthogonality and independence) Let X and Y be rcll processes in ℝ^d with X₀ = Y₀ = 0 such that (X, Y) has independent increments and no fixed jumps. Assume also that Y is a.s. a step process and that ΔX·ΔY = 0 a.s. Then X ⫫ Y.

Proof: Define η as in (5) in terms of Y, and note as before that η is locally Y-measurable whereas Y is locally η-measurable. By a simple transformation of η we may reduce to the case when Y has bounded jumps. Since η is Poisson, Y then has integrable variation on every finite interval. By Corollary 3.7 we need to show that (X_{t_1}, ..., X_{t_n}) ⫫ (Y_{t_1}, ..., Y_{t_n}) for any t₁ < ...
< t_n, and by Lemma 3.8 it suffices to show for all s < t that X_t − X_s ⫫ Y_t − Y_s. Without loss of generality, we may take s = 0 and t = 1. Then fix any u, v ∈ ℝ^d, and introduce the locally bounded martingales

    M_t = e^{iu'X_t} / E e^{iu'X_t},  N_t = e^{iv'Y_t} / E e^{iv'Y_t},  t ≥ 0.
Note that N again has integrable variation on [0,1]. For n ∈ ℕ, we get by the martingale property and dominated convergence

    E M₁N₁ − 1 = E Σ_{k≤n} (M_{k/n} − M_{(k−1)/n})(N_{k/n} − N_{(k−1)/n})
              = E ∫₀¹ (M_{([sn]+1)/n} − M_{[sn]/n}) dN_s
              → E ∫₀¹ ΔM_s dN_s = E Σ_{s≤1} ΔM_s ΔN_s = 0.

Thus, E M₁N₁ = 1, and so

    E e^{iu'X₁ + iv'Y₁} = E e^{iu'X₁} E e^{iv'Y₁},  u, v ∈ ℝ^d.

The asserted independence X₁ ⫫ Y₁ now follows by the uniqueness theorem for characteristic functions. □

End of proof of Lemma 15.5: It remains to prove (2). Then define η_t = η((0,t] × ·), and note that η_t{x; |x| > ε} < ∞ a.s. for all t, ε > 0 because X is rcll. Since η is Poisson, the same relations hold for the measures Eη_t, and so it suffices to prove that

    ∫_{|x|≤1} |x|² Eη_t(dx) < ∞,  t > 0.  (6)

Then introduce for each ε > 0 the process

    X_t^ε = Σ_{s≤t} ΔX_s 1{|ΔX_s| > ε} = ∫_{|x|>ε} x η_t(dx),  t ≥ 0,

and note that X^ε ⫫ X − X^ε by Lemma 15.6. By Lemmas 12.2 (i) and 15.2 we get for any ε, t > 0 and u ∈ ℝ^d \ {0}

    0 < |E e^{iu'X_t}| ≤ |E e^{iu'X_t^ε}| = |E exp ∫_{|x|>ε} iu'x η_t(dx)|
      = |exp ∫_{|x|>ε} (e^{iu'x} − 1) Eη_t(dx)| = exp ∫_{|x|>ε} (cos u'x − 1) Eη_t(dx).

Letting ε → 0 gives

    ∫_{|u'x|≤1} |u'x|² Eη_t(dx) ≲ ∫ (1 − cos u'x) Eη_t(dx) < ∞,

and (6) follows since u is arbitrary. □

Proof of Theorem 15.4: In the nondecreasing case, we may subtract the jump component to obtain a continuous, nondecreasing process Y with independent increments, and from Theorem 5.11 it is clear that Y is a.s. nonrandom. Thus, in this case we get a representation as in (3).
In the general case, introduce for each ε ∈ [0,1] the martingale

    M_t^ε = ∫₀ᵗ ∫_{|x|∈(ε,1]} x (η − Eη)(ds dx),  t ≥ 0.

Put M = M⁰, and let J_t denote the last term in (1). By Proposition 7.16 we have E((M^ε − M⁰)*_t)² → 0 for each t. Thus, M + J has a.s. the same jumps as X, and so the process Y = X − M − J is a.s. continuous. Since η is locally X-measurable, the same thing is true for Y. Theorem 13.4 then shows that Y is Gaussian with continuous mean and covariance functions. Subtracting the means m_t yields a continuous, centered Gaussian process G, and by Lemma 15.6 we get G ⫫ (M^ε + J) for every ε > 0. The independence extends to M by Lemma 3.6, and so G ⫫ η. The uniqueness of η is clear from (5), and G is then determined by subtraction. From Theorem 12.13 it is further seen that the integrals in (1) and (3) exist for any Poisson process η with the stated properties, and we note that the resulting process has independent increments. □

We may now specialize to the time-homogeneous case, when the distribution of X_{t+h} − X_t depends only on h. An rcll process X in ℝ^d with stationary independent increments and X₀ = 0 is called a Lévy process. If X is also real and nonnegative, it is often called a subordinator.

Corollary 15.7 (Lévy processes and subordinators) An rcll process X in ℝ^d is Lévy iff (1) holds with m_t = bt, G_t = σB_t, and Eη = λ ⊗ ν for some b ∈ ℝ^d, some d × d matrix σ, some measure ν on ℝ^d \ {0} with ∫(|x|² ∧ 1)ν(dx) < ∞, and some Brownian motion B ⫫ η in ℝ^d. Furthermore, X is a subordinator iff (3) holds with a_t = at and Eη = λ ⊗ ν for some a ≥ 0 and some measure ν on (0,∞) with ∫(x ∧ 1)ν(dx) < ∞. The triple (σσ', b, ν) or pair (a, ν) is then determined by L(X), and any a, b, σ, and ν with the stated properties may occur.

The measure ν above is called the Lévy measure of X, and the quantities σσ', b, and ν or a and ν are referred to collectively as the characteristics of X.
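The finite-activity case of this representation is easy to simulate: when νℝ^d < ∞, no compensation of small jumps is needed, and X_t is simply a drift plus σB_t plus an ordinary compound Poisson sum. The following sketch (all parameter values are illustrative choices, with standard normal jump sizes; nothing here is taken from the text) checks the moments of X₁ and the Poisson property of the jump counts from Lemma 15.5.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(0)

# Hedged sketch: a Levy process with characteristics (sigma^2, b, nu), where
# nu = rate * N(0,1) is a *finite* Levy measure, so no compensation is needed:
# X_t = b t + sigma B_t + (sum of the jumps up to time t).
b, sigma, rate = 0.5, 1.2, 3.0      # illustrative drift, diffusion, jump rate
t, npaths = 1.0, 20000

K = rng.poisson(rate * t, npaths)   # number of jumps in (0, t], per path
jump_sum = np.empty(npaths)
counts = np.empty(npaths)           # jumps with size in B = (0.5, inf)
for i, k in enumerate(K):
    sizes = rng.standard_normal(k)  # i.i.d. jump sizes given the jump count
    jump_sum[i] = sizes.sum()
    counts[i] = np.count_nonzero(sizes > 0.5)

X1 = b * t + sigma * sqrt(t) * rng.standard_normal(npaths) + jump_sum

# Lemma 15.5: eta((0,t] x B) should be Poisson with mean t * nu(B), where
# nu(B) = rate * P{N(0,1) > 0.5}; a Poisson variable has variance = mean.
nuB = rate * t * 0.5 * erfc(0.5 / sqrt(2))

# Moments from the representation: E X_1 = b (the jumps are centered), and
# Var X_1 = sigma^2 + rate * E J^2 = sigma^2 + rate.
print(counts.mean(), counts.var(), nuB)
print(X1.mean(), X1.var())
```

When ν is infinite, the small jumps must instead enter through the compensated integral M in the proof above, so this direct construction no longer applies.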
Proof: The stationarity of the increments excludes the possibility of fixed jumps, and so X has a representation as in Theorem 15.4. The stationarity also implies that Eη is time invariant. Thus, Theorem 2.6 yields Eη = λ ⊗ ν for some measure ν on ℝ^d \ {0} or (0,∞). The stated conditions on ν are immediate from (2) and (4). Finally, Theorem 13.4 gives the form of the continuous component.

Formula (5) shows that η is a measurable function of X, and so ν is uniquely determined by L(X). The uniqueness of the remaining characteristics then follows by subtraction. □

From the representations in Theorem 15.4 we may easily deduce the following so-called Lévy–Khinchin formulas for the associated characteristic functions or Laplace transforms. Here we write u' for the transpose of u.
Corollary 15.8 (characteristic exponents, Kolmogorov, Lévy) Let X be a Lévy process in ℝ^d with characteristics (a, b, ν). Then E e^{iu'X_t} = e^{tψ_u} for all t ≥ 0 and u ∈ ℝ^d, where

    ψ_u = iu'b − ½ u'au + ∫ (e^{iu'x} − 1 − iu'x 1{|x| ≤ 1}) ν(dx),  u ∈ ℝ^d.  (7)

If X is a subordinator with characteristics (a, ν), then also E e^{−uX_t} = e^{−tχ_u} for all t, u ≥ 0, where

    χ_u = ua + ∫ (1 − e^{−ux}) ν(dx),  u ≥ 0.  (8)

In both cases, the characteristics are determined by L(X₁).

Proof: Formula (8) follows immediately from (3) and Lemma 12.2 (i). Similarly, (7) is obtained from (1) by the same lemma when ν is bounded, and the general case then follows by dominated convergence.

To prove the last assertion, we note that ψ is the unique continuous function with ψ₀ = 0 satisfying e^{ψ_u} = E e^{iu'X₁}. By the uniqueness theorem for characteristic functions and the independence of the increments, ψ determines all finite-dimensional distributions of X, and so the uniqueness of the characteristics follows from the uniqueness in Corollary 15.7. □

From Proposition 8.5 we note that a Lévy process X is Markov for the induced filtration G = (G_t) with translation-invariant transition kernels μ_t(x, B) = μ_t(B − x) = P{X_t ∈ B − x}. More generally, given any filtration F, we say that X is Lévy with respect to F, or simply F-Lévy, if X is adapted to F and such that (X_t − X_s) ⫫ F_s for all s < t. In particular, we may take F_t = G_t ∨ N, t ≥ 0, where N = σ{N ⊂ A; A ∈ A, PA = 0}. Note that the latter filtration is right-continuous by Corollary 7.25.

Just as for Brownian motion in Theorem 13.11, we further see that any process X which is F-Lévy for some right-continuous, complete filtration F is a strong Markov process, in the sense that the process X'_t = X_{τ+t} − X_τ satisfies X' =d X with X' ⫫ F_τ, for any finite optional time τ.

We turn to a brief discussion of some basic symmetry properties.
A process X on ℝ₊ is said to be self-similar if for any r > 0 there exists some s = h(r) > 0 such that the process X_{rt}, t ≥ 0, has the same distribution as sX. Excluding the trivial case when X_t = 0 a.s. for all t ≥ 0, it is clear that h satisfies the Cauchy equation h(xy) = h(x)h(y). If X is right-continuous, then h is continuous, and the only solutions are of the form h(x) = x^α for some α ∈ ℝ.

Let us now return to the context of Lévy processes. We say that such a process X is strictly stable if it is self-similar, and weakly stable if it is self-similar apart from a centering, so that for each r > 0 the process (X_{rt}) has the same distribution as (sX_t + bt) for suitable s and b. In the latter case, the corresponding symmetrized process is strictly stable, and so s is again of the form r^α. In both cases it is clear that α > 0. We may then introduce the index p = α^{−1} and say that X is strictly or weakly p-stable.
The terminology carries over to random variables or vectors with the same distribution as X₁.

Proposition 15.9 (stable Lévy processes) Let X be a nondegenerate Lévy process in ℝ with characteristics (a, b, ν). Then X is weakly p-stable for some p > 0 iff either of these conditions holds:
(i) p = 2 and ν = 0;
(ii) p ∈ (0,2), a = 0, and ν(dx) = c_± |x|^{−p−1} dx on ℝ_± for some c_± ≥ 0.
For subordinators, weak p-stability is equivalent to the condition
(iii) p ∈ (0,1) and ν(dx) = c x^{−p−1} dx on (0,∞) for some c > 0.

Proof: Writing S_r : x ↦ rx for any r > 0, we note that the processes X_{r^p t} and rX have characteristics r^p(a, b, ν) and (r²a, rb, ν ∘ S_r^{−1}), respectively. Since the latter are determined by the distributions, it follows that X is weakly p-stable iff r^p a = r²a and r^p ν = ν ∘ S_r^{−1} for all r > 0. In particular, a = 0 when p ≠ 2. Writing F(x) = ν[x, ∞) or ν(−∞, −x], we also note that r^p F(rx) = F(x) for all r, x > 0, and so F(x) = x^{−p} F(1), which yields the stated form of the density. The condition ∫(x² ∧ 1)ν(dx) < ∞ implies p ∈ (0,2) when ν ≠ 0. If X ≥ 0, we have the stronger condition ∫(x ∧ 1)ν(dx) < ∞, so in this case p < 1. □

If X is weakly p-stable for some p ≠ 1, it can be made strictly p-stable by a suitable centering. In particular, a weakly p-stable subordinator is strictly stable iff the drift component vanishes. In the latter case we simply say that X is stable.

The next result shows how stable subordinators may arise naturally even in the study of continuous processes. Given a Brownian motion B in ℝ, we introduce the maximum process M_t = sup_{s≤t} B_s and its right-continuous inverse

    τ_r = inf{t ≥ 0; M_t > r} = inf{t ≥ 0; B_t > r},  r ≥ 0.  (9)

Theorem 15.10 (first-passage times, Lévy) For a Brownian motion B, the process τ in (9) is a ½-stable subordinator with Lévy measure

    ν(dx) = (2π)^{−1/2} x^{−3/2} dx,  x > 0.
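Before turning to the proof, the Laplace-transform identity E exp(−½u²τ_r) = e^{−ur} derived below can be checked numerically. Rather than simulate whole Brownian paths, the sketch uses the reflection-principle identity P{τ_r ≤ t} = P{|B_t| ≥ r}, which gives the representation τ_r =d r²/Z² with Z standard normal (a standard fact assumed here, not proved in this section).

```python
import numpy as np

rng = np.random.default_rng(1)

# tau_r = first time Brownian motion exceeds level r.  By the reflection
# principle, P{tau_r <= t} = P{|B_t| >= r} = P{r^2 / Z^2 <= t}, so tau_r
# has the same law as r^2 / Z^2 with Z ~ N(0,1).  We check the transform
# E exp(-u^2 tau_r / 2) = e^{-u r} from the proof of Theorem 15.10.
n, r = 200000, 1.0
Z = rng.standard_normal(n)
tau = r**2 / Z**2

lt1 = np.exp(-0.5 * 1.0**2 * tau).mean()   # u = 1: should be near e^{-1}
lt2 = np.exp(-0.5 * 2.0**2 * tau).mean()   # u = 2: should be near e^{-2}
print(lt1, np.exp(-1.0))
print(lt2, np.exp(-2.0))
```

The ½-stability is visible in this representation as well, since τ_{cr} = (cr)²/Z² = c²τ_r pathwise.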
Proof: By Lemma 7.6, the random times τ_r are optional with respect to the right-continuous filtration F induced by B. By the strong Markov property of B, the process τ_{r+s} − τ_r, s ≥ 0, is then independent of F_{τ_r} with the same distribution as τ. Since τ is further adapted to the filtration (F_{τ_r}), it follows that τ has stationary independent increments and hence is a subordinator.

To see that τ is ½-stable, fix any c > 0, put B̃_t = c^{−1}B(c²t), and define τ̃_r = inf{t ≥ 0; B̃_t > r}. Then

    τ_{cr} = inf{t ≥ 0; B_t > cr} = c² inf{t ≥ 0; B̃_t > r} = c²τ̃_r.
By Proposition 15.9 the Lévy measure of τ has a density of the form ax^{−3/2}, x > 0, and it remains to identify a. Then note that the process X_t = exp(uB_t − ½u²t), t ≥ 0, is a martingale for any u ∈ ℝ. In particular, E X_{τ_r∧t} = 1 for any r, t ≥ 0, and since clearly B_{τ_r} = r, we get by dominated convergence

    E exp(−½u²τ_r) = e^{−ur},  u, r ≥ 0.

Taking u = √2 and comparing with Corollary 15.8, we obtain

    √2 / a = ∫₀^∞ (1 − e^{−x}) x^{−3/2} dx = 2 ∫₀^∞ e^{−x} x^{−1/2} dx = 2√π,

which shows that a = (2π)^{−1/2}. □

If we add a negative drift to a Brownian motion, the associated maximum process M becomes bounded, and so τ = M^{−1} terminates by a jump to infinity. For such occasions, it is useful to consider subordinators with possibly infinite jumps. By a generalized subordinator we mean a process of the form X_t = Y_t + ∞ · 1{t ≥ ζ} a.s., where Y is an ordinary subordinator and ζ is an independent, exponentially distributed random variable. In this case we say that X is obtained from Y by exponential killing. The representation in Theorem 15.4 remains valid in the generalized case, except that ν may now have positive mass at ∞. The following characterization is needed in Chapter 22.

Lemma 15.11 (generalized subordinators) Let X be a nondecreasing and right-continuous process in [0,∞] with X₀ = 0, and let F denote the filtration induced by X. Then X is a generalized subordinator iff

    P[X_{s+t} − X_s ∈ · | F_s] = P{X_t ∈ ·} a.s. on {X_s < ∞},  s, t ≥ 0.  (10)

Proof: Writing ζ = inf{t; X_t = ∞}, we get from (10) the Cauchy equation

    P{ζ > s + t} = P{ζ > s} P{ζ > t},  s, t ≥ 0,  (11)

which shows that ζ is exponentially distributed with mean m ∈ (0,∞]. Next define μ_t = P[X_t ∈ · | X_t < ∞], t ≥ 0, and conclude from (10) and (11) that the μ_t form a semigroup under convolution. By Theorem 8.4 there exists a corresponding process Y with stationary, independent increments.
From the right-continuity of X, it follows that Y is continuous in probability. Hence, Y has a version that is a subordinator. Now choose ζ̃ =d ζ with ζ̃ ⫫ Y, and let X̃ denote the process Y killed at ζ̃. Comparing with (10), we note that X̃ =d X. By Theorem 6.10 we may assume that even X̃ = X a.s., which means that X is a generalized subordinator. The converse assertion is obvious. □

The next result provides the basic link between Lévy processes and triangular arrays. A random vector ξ or its distribution is said to be infinitely divisible if for every n ∈ ℕ there exist some i.i.d. random vectors ξ_{n1}, …, ξ_{nn} with Σ_k ξ_{nk} =d ξ. By an i.i.d. array we mean a triangular array of random vectors ξ_{nj}, j ≤ m_n, where the ξ_{nj} are i.i.d. for each n and m_n → ∞.

Theorem 15.12 (Lévy processes and infinite divisibility) For any random vector ξ in ℝ^d, these conditions are equivalent:
(i) ξ is infinitely divisible;
(ii) Σ_j ξ_{nj} →d ξ for some i.i.d. array (ξ_{nj});
(iii) ξ =d X₁ for some Lévy process X in ℝ^d.
Under those conditions, L(X) is determined by L(ξ) = L(X₁).

A simple lemma is needed for the proof.

Lemma 15.13 (individual terms) If the ξ_{nj} are such as in Theorem 15.12 (ii), then ξ_{n1} →P 0.

Proof: Let μ and μ_n denote the distributions of ξ and ξ_{nj}, respectively. Choose r > 0 so small that μ̂ ≠ 0 on [−r, r], and write μ̂ = e^ψ on this interval, where ψ: [−r, r] → ℂ is continuous with ψ(0) = 0. Since the convergence μ̂_n^{m_n} → μ̂ is uniform on bounded intervals, it follows that μ̂_n ≠ 0 on [−r, r] for sufficiently large n. Thus, we may write μ̂_n(u) = e^{ψ_n(u)} for |u| ≤ r, where m_n ψ_n → ψ on [−r, r]. Then ψ_n → 0 on the same interval, and therefore μ̂_n → 1. Now let ε < r^{−1}, and note as in Lemma 5.1 that

    ∫_{−r}^{r} (1 − μ̂_n(u)) du = 2r ∫ (1 − (sin rx)/(rx)) μ_n(dx) ≥ 2r (1 − (sin rε)/(rε)) μ_n{|x| > ε}.

As n → ∞, the left-hand side tends to 0 by dominated convergence, and we get μ_n →w δ₀. □

Proof of Theorem 15.12: Trivially (iii) ⇒ (i) ⇒ (ii). Now let ξ_{nj}, j ≤ m_n, be an i.i.d. array satisfying (ii), put μ_n = L(ξ_{nj}), and fix any k ∈ ℕ. By Lemma 15.13 we may assume that k divides each m_n and write Σ_j ξ_{nj} = η_{n1} + ⋯ + η_{nk}, where the η_{nj} are i.i.d. with distribution μ_n^{*(m_n/k)}. For any u ∈ ℝ^d and r > 0 we have

    (P{u'η_{n1} > r})^k = P{min_{j≤k} u'η_{nj} > r} ≤ P{Σ_{j≤k} u'η_{nj} > kr},

and so the tightness of Σ_j η_{nj} carries over to the sequence (η_{n1}).
By Proposition 5.21 we may extract a weakly convergent subsequence, say with limiting distribution ν_k. Since Σ_j η_{nj} →d ξ, it follows by Theorem 5.3 that ξ has distribution ν_k^{*k}. Thus, (ii) ⇒ (i).

Next assume (i), so that L(ξ) = μ = μ_n^{*n} for each n. By Lemma 15.13 we get μ̂_n → 1 uniformly on bounded intervals, and so μ̂ ≠ 0. We may
then write μ̂ = e^ψ and μ̂_n = e^{ψ_n} for some continuous functions ψ and ψ_n with ψ(0) = ψ_n(0) = 0, and we get ψ = nψ_n for each n. Hence, e^{tψ} is a characteristic function for every t ∈ ℚ₊, and then also for t ∈ ℝ₊ by Theorem 5.22. By Theorem 6.16 there exists a process X with stationary independent increments such that X_t has characteristic function e^{tψ} for every t. Here X is continuous in probability, and so by Theorem 15.1 it has an rcll version, which is the desired Lévy process. Thus, (i) ⇒ (iii). The last assertion is clear from Corollary 15.8. □

Justified by the one-to-one correspondence between infinitely divisible distributions μ and their characteristics (a, b, ν) or (a, ν), we may write μ = id(a, b, ν) or μ = id(a, ν), respectively. The last result shows that the class of infinitely divisible laws is closed under weak convergence, and we proceed to derive explicit convergence criteria. Then define for each h > 0

    a^h = a + ∫_{|x|≤h} xx' ν(dx),  b^h = b − ∫_{h<|x|≤1} x ν(dx),

where ∫_{h<|x|≤1} = −∫_{1<|x|≤h} when h > 1. In the positive case, we define instead a^h = a + ∫_{x≤h} x ν(dx). Let ℝ̄^d denote the one-point compactification of ℝ^d.

Theorem 15.14 (convergence of infinitely divisible distributions)
(i) Let μ = id(a, b, ν) and μ_n = id(a_n, b_n, ν_n) on ℝ^d, and fix any h > 0 with ν{|x| = h} = 0. Then μ_n →w μ iff a_n^h → a^h, b_n^h → b^h, and ν_n →v ν on ℝ̄^d \ {0}.
(ii) Let μ = id(a, ν) and μ_n = id(a_n, ν_n) on ℝ₊, and fix any h > 0 with ν{h} = 0. Then μ_n →w μ iff a_n^h → a^h and ν_n →v ν on (0, ∞].

For the proof, we consider first the one-dimensional case, which allows some important simplifications. Thus, (7) may then be written as

    ψ_u = icu + ∫ (e^{iux} − 1 − iux/(1 + x²)) ((1 + x²)/x²) ν̃(dx),  (12)

where

    ν̃(dx) = a δ₀(dx) + (x²/(1 + x²)) ν(dx),  (13)
    c = b + ∫ (x/(1 + x²) − x 1{|x| ≤ 1}) ν(dx),  (14)

and the integrand in (12) is defined by continuity as −u²/2 when x = 0.
For infinitely divisible distributions on ℝ₊, we may instead introduce the measure

    ν̃(dx) = a δ₀(dx) + (1 − e^{−x}) ν(dx).  (15)

The associated distributions μ are denoted by Id(c, ν̃) and Id(ν̃), respectively.
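As a concrete instance of the infinite divisibility in Theorem 15.12, a Poisson(λ) law is the n-fold convolution of Poisson(λ/n) laws. The identity can be verified numerically by convolving truncated probability mass functions (the values of λ, n, and the truncation point below are illustrative; truncation leaves only an error on the order of machine precision).

```python
import numpy as np
from math import exp, factorial

def poisson_pmf(lam, kmax):
    # pmf of Poisson(lam) on {0, 1, ..., kmax}
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1)])

lam, n, kmax = 3.0, 4, 60
p = poisson_pmf(lam, kmax)
q = poisson_pmf(lam / n, kmax)

conv = q.copy()
for _ in range(n - 1):                      # n-fold convolution of Poisson(lam/n)
    conv = np.convolve(conv, q)[: kmax + 1]

err = float(np.max(np.abs(conv - p)))       # truncation + rounding error only
print(err)
```

The same convolution check works for any other explicitly known infinitely divisible family, such as the negative binomial or gamma laws.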
Lemma 15.15 (one-dimensional convergence criteria)
(i) Let μ = Id(c, ν̃) and μ_n = Id(c_n, ν̃_n) on ℝ. Then μ_n →w μ iff c_n → c and ν̃_n →w ν̃.
(ii) Let μ = Id(ν̃) and μ_n = Id(ν̃_n) on ℝ₊. Then μ_n →w μ iff ν̃_n →w ν̃.

Proof: (i) Defining ψ and ψ_n as in (12), we may write μ̂ = e^ψ and μ̂_n = e^{ψ_n}. If c_n → c and ν̃_n →w ν̃, then ψ_n → ψ by the boundedness and continuity of the integrand in (12), and so μ̂_n → μ̂, which implies μ_n →w μ by Theorem 5.3. Conversely, μ_n →w μ implies μ̂_n → μ̂, uniformly on bounded intervals, and we get ψ_n → ψ in the same sense. Now define

    χ(u) = ∫_{−1}^{1} (ψ(u) − ψ(u + s)) ds = 2 ∫ e^{iux} (1 − (sin x)/x) ((1 + x²)/x²) ν̃(dx),

and similarly for χ_n, where the interchange of integrations is justified by Fubini's theorem. Then χ_n → χ, and so by Theorem 5.3

    (1 − (sin x)/x) ((1 + x²)/x²) ν̃_n(dx) →w (1 − (sin x)/x) ((1 + x²)/x²) ν̃(dx).

Since the integrand is continuous and bounded away from 0, it follows that ν̃_n →w ν̃. This implies convergence of the integral in (12), and by subtraction c_n → c.

(ii) This may be proved directly by the same method, where we note that the functions in (8) satisfy χ_{u+1} − χ_u = ∫ e^{−ux} ν̃(dx). □

Proof of Theorem 15.14: For any finite measures m and m_n on ℝ, we note that m_n →w m iff m_n →v m on ℝ̄ \ {0} and m_n(−h, h) → m(−h, h) for some h > 0 with m{±h} = 0. Thus, for distributions μ and μ_n on ℝ, we have ν̃_n →w ν̃ iff ν_n →v ν on ℝ̄ \ {0} and a_n^h → a^h for any h > 0 with ν{±h} = 0. Similarly, ν̃_n →w ν̃ holds for distributions μ and μ_n on ℝ₊ iff ν_n →v ν on (0, ∞] and a_n^h → a^h for all h > 0 with ν{h} = 0. Thus, (ii) follows immediately from Lemma 15.15. To obtain (i) from the same lemma when d = 1, it remains to notice that the conditions b_n^h → b^h and c_n → c are equivalent when ν_n →v ν and ν{±h} = 0, since |x − x(1 + x²)^{−1}| ≤ |x|³.
Turning to the proof of (i) when d > 1, let us first assume that ν_n →v ν on ℝ̄^d \ {0} and that a_n^h → a^h and b_n^h → b^h for some h > 0 with ν{|x| = h} = 0. To prove μ_n →w μ, it suffices by Corollary 5.5 to show that, for any one-dimensional projection π_u: x ↦ u'x with u ≠ 0, μ_n ∘ π_u^{−1} →w μ ∘ π_u^{−1}. Then fix any k > 0 with ν{|u'x| = k} = 0, and note that μ ∘ π_u^{−1} has the associated characteristics ν^u = ν ∘ π_u^{−1} and

    a^{u,k} = u' a^h u + ∫ (u'x)² {1_{(0,k]}(|u'x|) − 1_{(0,h]}(|x|)} ν(dx),
    b^{u,k} = u' b^h + ∫ u'x {1_{(0,k]}(|u'x|) − 1_{(0,h]}(|x|)} ν(dx).
Let a_n^{u,k}, b_n^{u,k}, and ν_n^u denote the corresponding characteristics of μ_n ∘ π_u^{−1}. Then ν_n^u →v ν^u on ℝ̄ \ {0}, and furthermore a_n^{u,k} → a^{u,k} and b_n^{u,k} → b^{u,k}. The desired convergence now follows from the one-dimensional result.

Conversely, assume that μ_n →w μ. Then μ_n ∘ π_u^{−1} →w μ ∘ π_u^{−1} for every u ≠ 0, and the one-dimensional result yields ν_n^u →v ν^u on ℝ̄ \ {0} as well as a_n^{u,k} → a^{u,k} and b_n^{u,k} → b^{u,k} for any k > 0 with ν{|u'x| = k} = 0. In particular, the sequence (ν_n K) is bounded for every compact set K ⊂ ℝ̄^d \ {0}, and so the sequences (u' a_n^h u) and (u' b_n^h) are bounded for any u ≠ 0 and h > 0. It follows easily that (a_n^h) and (b_n^h) are bounded for every h > 0, and therefore all three sequences are relatively compact.

Given any subsequence N' ⊂ ℕ, we have ν_n →v ν' along a further subsequence N'' ⊂ N' for some measure ν' satisfying ∫(|x|² ∧ 1) ν'(dx) < ∞. Fixing any h > 0 with ν'{|x| = h} = 0, we may choose a still further subsequence N''' such that even a_n^h and b_n^h converge toward some limits a' and b'. The direct assertion then yields μ_n →w μ' along N''', where μ' is infinitely divisible with characteristics determined by (a', b', ν'). Since μ' = μ, we get ν' = ν, a' = a^h, and b' = b^h. Thus, the convergence remains valid along the original sequence. □

By a simple approximation, we may now derive explicit criteria for the convergence Σ_j ξ_{nj} →d ξ in Theorem 15.12. Note that the compound Poisson distribution with characteristic measure μ = L(ξ) is given by μ̃ = id(0, b, μ), where b = E[ξ; |ξ| ≤ 1]. For any array of random vectors ξ_{nj}, we may introduce an associated compound Poisson array, consisting of row-wise independent compound Poisson random vectors ξ̃_{nj} with characteristic measures L(ξ_{nj}).

Corollary 15.16 (i.i.d. arrays) Consider in ℝ^d an i.i.d. array (ξ_{nj}) and an associated compound Poisson array (ξ̃_{nj}), and let ξ be id(a, b, ν). Then Σ_j ξ_{nj} →d ξ iff Σ_j ξ̃_{nj} →d ξ.
For any h > 0 with ν{|x| = h} = 0, it is also equivalent that
(i) m_n L(ξ_{n1}) →v ν on ℝ̄^d \ {0};
(ii) m_n E[ξ_{n1} ξ'_{n1}; |ξ_{n1}| ≤ h] → a^h;
(iii) m_n E[ξ_{n1}; |ξ_{n1}| ≤ h] → b^h.

Proof: Let μ = L(ξ) and write μ̂ = e^ψ, where ψ is continuous with ψ(0) = 0. If μ_n^{*m_n} →w μ, then μ̂_n^{m_n} → μ̂ uniformly on compacts. Thus, on any bounded set B we may write μ̂_n = e^{ψ_n} for large enough n, where the ψ_n are continuous with m_n ψ_n → ψ uniformly on B. Hence, m_n(e^{ψ_n} − 1) → ψ, and since Σ_j ξ̃_{nj} has characteristic function exp{m_n(μ̂_n − 1)}, we get Σ_j ξ̃_{nj} →d ξ. The proof in the other direction is similar. Since L(Σ_j ξ̃_{nj}) is id(0, b_n, m_n μ_n) with b_n = m_n ∫_{|x|≤1} x μ_n(dx), the last assertion follows by Theorem 15.14. □

The weak convergence of infinitely divisible laws extends to a pathwise approximation property for the corresponding Lévy processes.
Theorem 15.17 (approximation of Lévy processes, Skorohod) Let X, X¹, X², … be Lévy processes in ℝ^d with X₁ⁿ →d X₁. Then there exist some processes X̃ⁿ =d Xⁿ such that (X̃ⁿ − X)*_t →P 0 for all t > 0.

Before proving the general result, we consider two special cases.

Lemma 15.18 (compound Poisson case) The conclusion of Theorem 15.17 holds when X, X¹, X², … are compound Poisson with characteristic measures ν, ν₁, ν₂, … satisfying ν_n →w ν.

Proof: Allowing positive mass at the origin, we may assume that ν and the ν_n have the same total mass, which may then be reduced to 1 through a suitable scaling. If ξ₁, ξ₂, … and ξ₁ⁿ, ξ₂ⁿ, … are associated i.i.d. sequences, then (ξ₁ⁿ, ξ₂ⁿ, …) →d (ξ₁, ξ₂, …) by Theorem 4.29, and by Theorem 4.30 we may assume that the convergence holds a.s. Letting N be an independent unit-rate Poisson process, and defining X_t = Σ_{j≤N_t} ξ_j and X̃_tⁿ = Σ_{j≤N_t} ξ_jⁿ, it follows that (X̃ⁿ − X)*_t → 0 a.s. for each t > 0. □

Lemma 15.19 (case of small jumps) The conclusion of Theorem 15.17 holds when E Xⁿ_t ≡ 0 and 1 ≥ (ΔXⁿ)*₁ →P 0.

Proof: Since (ΔXⁿ)*₁ →P 0, we may choose some constants h_n → 0 with m_n = h_n^{−1} ∈ ℕ such that w(Xⁿ, 1, h_n) →P 0. By the stationarity of the increments, it follows that w(Xⁿ, t, h_n) →P 0 for all t > 0. Next, Theorem 15.14 shows that X is centered Gaussian. Thus, there exist as in Theorem 14.20 some processes Yⁿ =d (Xⁿ_{[m_n t] h_n}) with (Yⁿ − X)*_t →P 0 for all t > 0. By Corollary 6.11 we may further choose some processes X̃ⁿ =d Xⁿ with Y_tⁿ = X̃ⁿ_{[m_n t] h_n} a.s. Then, as n → ∞ for fixed t > 0,

    E[(X̃ⁿ − X)*_t ∧ 1] ≤ E[(Yⁿ − X)*_t ∧ 1] + E[w(X̃ⁿ, t, h_n) ∧ 1] → 0. □

Proof of Theorem 15.17: The asserted convergence is clearly equivalent to ρ(X̃ⁿ, X) → 0, where ρ denotes the metric

    ρ(X, Y) = ∫₀^∞ e^{−t} E[(X − Y)*_t ∧ 1] dt.
For any h > 0 we may write X = L^h + M^h + J^h and Xⁿ = L^{n,h} + M^{n,h} + J^{n,h} with L^h_t = b^h t and L^{n,h}_t = b_n^h t, where M^h and M^{n,h} are martingales containing the Gaussian components and all centered jumps of size ≤ h, and the processes J^h and J^{n,h} are formed by all remaining jumps. Write B for the Gaussian component of X, and note that ρ(M^h, B) → 0 as h → 0 by Proposition 7.16. For any h > 0 with ν{|x| = h} = 0, it is clear from Theorem 15.14 that b_n^h → b^h and ν_n^h →w ν^h, where ν^h and ν_n^h denote the restrictions of ν and ν_n, respectively, to the set {|x| > h}. The same theorem yields a_n^h → a as n → ∞ and then h → 0, and so under those conditions M^{n,h}₁ →d B₁.
Now fix any ε > 0. By Lemma 15.19 there exist some constants h, r > 0 and processes M̃^{n,h} =d M^{n,h} such that ρ(M^h, B) < ε and ρ(M̃^{n,h}, B) < ε for all n > r. Furthermore, if ν{|x| = h} = 0, there exist by Lemma 15.18 some number r' > r and processes J̃^{n,h} =d J^{n,h} independent of M̃^{n,h} such that ρ(J^h, J̃^{n,h}) < ε for all n > r'. We may finally choose r'' > r' so large that ρ(L^h, L^{n,h}) < ε for all n > r''. The processes X̃ⁿ = L^{n,h} + M̃^{n,h} + J̃^{n,h} =d Xⁿ then satisfy ρ(X, X̃ⁿ) < 4ε for all n > r''. □

Combining Theorem 15.17 with Corollary 15.16, we get a similar approximation theorem for random walks, which extends the result for Gaussian limits in Theorem 14.20. A slightly weaker result is obtained by different methods in Theorem 16.14.

Corollary 15.20 (approximation of random walks) Consider in ℝ^d a Lévy process X and some random walks S¹, S², … such that S^n_{m_n} →d X₁ for some integers m_n → ∞, and let N be an independent unit-rate Poisson process. Then there exist some processes X̃ⁿ =d (Sⁿ ∘ N_{m_n t}) such that (X̃ⁿ − X)*_t →P 0 for all t > 0.

In particular, we may use this result to extend the first two arcsine laws in Theorem 13.16 to symmetric Lévy processes.

Theorem 15.21 (arcsine laws) Let X be a symmetric Lévy process in ℝ with X₁ ≠ 0 a.s. Then these random variables are arcsine distributed:

    τ₁ = λ{t ≤ 1; X_t > 0},  τ₂ = inf{t ≥ 0; X_t ∨ X_{t−} = sup_{s≤1} X_s}.  (16)

The purpose of the condition X₁ ≠ 0 a.s. is to exclude the degenerate case of pure jump-type processes.

Lemma 15.22 (diffuseness, Doeblin) A measure μ = id(a, b, ν) in ℝ^d is diffuse iff a ≠ 0 or νℝ^d = ∞.

Proof: If a = 0 and νℝ^d < ∞, then μ is compound Poisson apart from a shift, and so it is clearly not diffuse. Conversely, if a ≠ 0 or νℝ^d = ∞, then the same condition holds for at least one coordinate projection, and we may take d = 1. If a > 0, the diffuseness is obvious by Lemma 1.28.
Next assume that ν is unbounded, say with ν(0, ∞) = ∞. For each n ∈ ℕ we may then write ν = ν_n + ν'_n, where ν'_n is supported by (0, n^{−1}) and has total mass log 2. For μ we get a corresponding decomposition μ_n * μ'_n, where μ'_n is compound Poisson with Lévy measure ν'_n and μ'_n{0} = ½. For any x ∈ ℝ and ε > 0 we get

    μ{x} ≤ μ_n{x} μ'_n{0} + μ_n[x − ε, x) μ'_n(0, ε] + μ'_n(ε, ∞)
         ≤ ½ μ_n[x − ε, x] + μ'_n(ε, ∞).

Letting n → ∞ and then ε → 0, and noting that μ'_n →w δ₀ and μ_n →w μ, we get μ{x} ≤ ½ μ{x} by Theorem 4.25, and so μ{x} = 0. □
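The arcsine law for τ₁ in Theorem 15.21 is easy to illustrate by simulation, using the random-walk approximation that drives the proof below, with Brownian motion as the limiting symmetric Lévy process (path and sample counts below are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)

# Fraction of time in [0,1] that the process is positive, approximated by
# the fraction of positive partial sums of a Gaussian random walk.
npaths, nsteps = 10000, 1000
steps = rng.standard_normal((npaths, nsteps)) / np.sqrt(nsteps)
S = np.cumsum(steps, axis=1)
frac = (S > 0).mean(axis=1)          # approximates tau_1 path by path

# Arcsine cdf F(x) = (2/pi) * arcsin(sqrt(x)):  F(1/2) = 1/2,  F(1/4) = 1/3.
p_half = (frac <= 0.50).mean()
p_quarter = (frac <= 0.25).mean()
print(p_half, p_quarter)
```

The U-shape of the arcsine density means the fraction of time positive tends to be near 0 or near 1, not near ½, which the histogram of `frac` makes visible.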
Proof of Theorem 15.21: Introduce the random walk S^n_k = X_{k/n}, let N be an independent unit-rate Poisson process, and define Xⁿ_t = Sⁿ ∘ N_{nt}. By Corollary 15.20 there exist some processes X̃ⁿ =d Xⁿ with (X̃ⁿ − X)*₁ →P 0. Define τ₁ⁿ and τ₂ⁿ as in (16) in terms of X̃ⁿ, and conclude from Lemmas 14.12 and 15.22 that τᵢⁿ →P τᵢ for i = 1, 2. Now define

    σ₁ⁿ = N_n^{−1} Σ_{k≤N_n} 1{S^n_k > 0},  σ₂ⁿ = N_n^{−1} min{k; S^n_k = max_{j≤N_n} S^n_j}.

Since t^{−1}N_t → 1 a.s. by the law of large numbers, we have sup_{t≤1} |n^{−1}N_{nt} − t| → 0 a.s., and so σ₂ⁿ − τ₂ⁿ → 0 a.s. Applying the same law to the sequence of holding times in N, we further note that σ₁ⁿ − τ₁ⁿ →P 0. Hence, σᵢⁿ →d τᵢ for i = 1, 2. Now σ₁ⁿ =d σ₂ⁿ by Corollary 11.14, and by Theorem 14.11 we have σ₂ⁿ →d sin²α, where α is U(0, 2π). Hence, τ₁ =d τ₂ =d sin²α. □

The preceding results will now be used to complete the classical limit theory for sums of independent random variables begun in Chapter 5. Recall that a null array in ℝ^d is defined as a family of random vectors ξ_{nj}, j = 1, …, m_n, n ∈ ℕ, such that the ξ_{nj} are independent for each n and satisfy sup_j E[|ξ_{nj}| ∧ 1] → 0. Our first goal is to extend Theorem 5.11, by giving the basic connection between sums with positive and symmetric terms. Here we write p₂ for the mapping x ↦ x².

Proposition 15.23 (positive and symmetric terms) Let (ξ_{nj}) be a null array of symmetric random variables, and let ξ and η be infinitely divisible with characteristics (a, 0, ν) and (a, ν ∘ p₂^{−1}), respectively, where ν is symmetric and a ≥ 0. Then Σ_j ξ_{nj} →d ξ iff Σ_j ξ²_{nj} →d η.

Again the proof may be based on a simple compound Poisson approximation. Here ξ_n ~ η_n means that ξ_n →d ξ iff η_n →d ξ for any ξ.

Lemma 15.24 (approximation) Let (ξ_{nj}) be a null array of positive or symmetric random variables, and let (ξ̃_{nj}) be an associated compound Poisson array. Then Σ_j ξ_{nj} ~ Σ_j ξ̃_{nj}.

Proof: Write μ = L(ξ) and μ_{nj} = L(ξ_{nj}).
In the symmetric case we need to show that

    Π_j μ̂_{nj} → μ̂  ⟺  Π_j exp(μ̂_{nj} − 1) → μ̂,

which is immediate from Lemmas 5.6 and 5.8. In the nonnegative case, a similar argument applies to the Laplace transforms. □

Proof of Proposition 15.23: Define μ_{nj} = L(ξ_{nj}), and fix any h > 0 with ν{|x| = h} = 0. By Theorem 15.14 (i) and Lemma 15.24 we have
Σ_j ξ_{nj} →d ξ iff

    Σ_j μ_{nj} →v ν on ℝ̄ \ {0},  Σ_j E[ξ²_{nj}; |ξ_{nj}| ≤ h] → a + ∫_{|x|≤h} x² ν(dx),

whereas Σ_j ξ²_{nj} →d η iff

    Σ_j μ_{nj} ∘ p₂^{−1} →v ν ∘ p₂^{−1} on (0, ∞],  Σ_j E[ξ²_{nj}; ξ²_{nj} ≤ h²] → a + ∫_{y≤h²} y (ν ∘ p₂^{−1})(dy).

The two sets of conditions are equivalent by Lemma 1.22. □

The limit problem for general null arrays is more delicate, since a compound Poisson approximation as in Corollary 15.16 or Lemma 15.24 applies only after a careful centering, as specified by the following key result.

Theorem 15.25 (compound Poisson approximation) Let (ξ_{nj}) be a null array of random vectors in ℝ^d, and fix any h > 0. Define η_{nj} = ξ_{nj} − b_{nj}, where b_{nj} = E[ξ_{nj}; |ξ_{nj}| ≤ h], and let (η̃_{nj}) be an associated compound Poisson array. Then

    Σ_j ξ_{nj} ~ Σ_j (η̃_{nj} + b_{nj}).  (17)

A technical estimate is needed for the proof.

Lemma 15.26 (uniform summability) Let the random vectors η_{nj} = ξ_{nj} − b_{nj} in Theorem 15.25 have characteristic functions φ_{nj}. Then either condition in (17) implies

    limsup_{n→∞} Σ_j |1 − φ_{nj}(u)| < ∞,  u ∈ ℝ^d.

Proof: By the definitions of b_{nj}, η_{nj}, and φ_{nj}, we have

    1 − φ_{nj}(u) = E[1 − e^{iu'η_{nj}} + iu'η_{nj} 1{|ξ_{nj}| ≤ h}] − iu'b_{nj} P{|ξ_{nj}| > h}.

Putting

    a_n = Σ_j E[η_{nj} η'_{nj}; |ξ_{nj}| ≤ h],  p_n = Σ_j P{|ξ_{nj}| > h},

and using Lemma 5.14, we get

    Σ_j |1 − φ_{nj}(u)| ≲ u'a_n u + (2 + |u|) p_n.

Hence, it is enough to show that (u'a_n u) and (p_n) are bounded. Assuming the second condition in (17), the desired boundedness follows easily from Theorem 15.14, together with the fact that max_j |b_{nj}| → 0. If instead Σ_j ξ_{nj} →d ξ, we may introduce an independent copy (ξ'_{nj}) of the
array (ξ_{nj}) and apply Theorem 15.14 and Lemma 15.24 to the symmetric random variables ζ_{nj} = u'ξ_{nj} − u'ξ'_{nj}. For any h' > 0, this gives

    limsup_{n→∞} Σ_j P{|ζ_{nj}| > h'} < ∞,  (18)
    limsup_{n→∞} Σ_j E[ζ²_{nj}; |ζ_{nj}| ≤ h'] < ∞.  (19)

The boundedness of p_n follows from (18) and Lemma 4.19. Next we note that (19) remains true with the condition |ζ_{nj}| ≤ h' replaced by |ξ_{nj}| ∨ |ξ'_{nj}| ≤ h. Furthermore, by the independence of ξ_{nj} and ξ'_{nj},

    ½ Σ_j E[ζ²_{nj}; |ξ_{nj}| ∨ |ξ'_{nj}| ≤ h]
      = Σ_j E[(u'η_{nj})²; |ξ_{nj}| ≤ h] P{|ξ_{nj}| ≤ h} − Σ_j (E[u'η_{nj}; |ξ_{nj}| ≤ h])²
      ≥ u'a_n u min_j P{|ξ_{nj}| ≤ h} − Σ_j (u'b_{nj} P{|ξ_{nj}| > h})².

Here the last sum is bounded by p_n max_j (u'b_{nj})² → 0, and the minimum on the right tends to 1. The boundedness of (u'a_n u) now follows by (19). □

Proof of Theorem 15.25: By Lemma 5.13 it is enough to show that Σ_j |φ_{nj}(u) − exp{φ_{nj}(u) − 1}| → 0, where φ_{nj} denotes the characteristic function of η_{nj}. This is clear from Taylor's formula, together with Lemmas 5.6 and 15.26. □

In particular, we may now identify the possible limits.

Corollary 15.27 (limit laws, Feller, Khinchin) Let (ξ_{nj}) be a null array of random vectors in ℝ^d such that Σ_j ξ_{nj} →d ξ for some random vector ξ. Then ξ is infinitely divisible.

Proof: The random vectors η̃_{nj} in Theorem 15.25 are infinitely divisible, so the same thing is true for the sums Σ_j (η̃_{nj} + b_{nj}). The infinite divisibility of ξ then follows by Theorem 15.12. □

We may further combine Theorems 15.14 and 15.25 to obtain explicit convergence criteria for general null arrays. The present result generalizes Theorem 5.15 for Gaussian limits and Corollary 15.16 for i.i.d. arrays. For convenience, we write cov[ξ; A] for the covariance matrix of the random vector ξ1_A.
Theorem 15.28 (convergence criteria for null arrays, Doeblin, Gnedenko) Let (ξ_nj) be a null array of random vectors in ℝ^d, let ξ be id(a, b, ν), and fix any h > 0 with ν{|x| = h} = 0. Then Σ_j ξ_nj →d ξ iff these conditions hold:
(i) Σ_j L(ξ_nj) →v ν on ℝ^d\{0};
(ii) Σ_j cov[ξ_nj; |ξ_nj| ≤ h] → a^h;
(iii) Σ_j E[ξ_nj; |ξ_nj| ≤ h] → b^h.

Proof: Define a_nj = cov[ξ_nj; |ξ_nj| ≤ h] and b_nj = E[ξ_nj; |ξ_nj| ≤ h]. By Theorems 15.14 and 15.25 the convergence Σ_j ξ_nj →d ξ is equivalent to the conditions
(i') Σ_j L(η_nj) →v ν on ℝ^d\{0},
(ii') Σ_j E[η_nj η'_nj; |η_nj| ≤ h] → a^h,
(iii') Σ_j (b_nj + E[η_nj; |η_nj| ≤ h]) → b^h.
Here (i) and (i') are equivalent since max_j |b_nj| → 0. Using (i) and the facts that max_j |b_nj| → 0 and ν{|x| = h} = 0, it is further clear that the sets {|η_nj| ≤ h} in (ii') and (iii') can be replaced by {|ξ_nj| ≤ h}. To prove the equivalence of (ii) and (ii'), it is then enough to note that, in view of (i),
  ||Σ_j {a_nj − E[η_nj η'_nj; |ξ_nj| ≤ h]}|| ≤ ||Σ_j b_nj b'_nj P{|ξ_nj| > h}||
    ≤ max_j |b_nj|² Σ_j P{|ξ_nj| > h} → 0.
Similarly, (iii) and (iii') are equivalent because
  |Σ_j E[η_nj; |ξ_nj| ≤ h]| = |Σ_j b_nj P{|ξ_nj| > h}| ≤ max_j |b_nj| Σ_j P{|ξ_nj| > h} → 0. □

In the one-dimensional case, we give two probabilistic interpretations of the first condition in Theorem 15.28, one of which involves the row-wise extremes. For random measures η and η_n on ℝ\{0}, the convergence η_n →d η on ℝ\{0} is defined by the condition η_n f →d ηf for all f ∈ C_K^+(ℝ\{0}).

Theorem 15.29 (sums and extremes) Let (ξ_nj) be a null array of random variables with distributions μ_nj, and define η_n = Σ_j δ_{ξ_nj} and α_n^± = max_j (±ξ_nj), n ∈ ℕ. Fix a Lévy measure ν on ℝ\{0}, let η be a Poisson process on ℝ\{0} with Eη = ν, and put α^± = sup{x > 0; η{±x} > 0}. Then these conditions are equivalent:
(i) Σ_j μ_nj →v ν on ℝ\{0};
(ii) η_n →d η on ℝ\{0};
(iii) α_n^± →d α^±.
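Before turning to the proof, the equivalence (i) ⟺ (iii) admits a simulation sketch. Take ν(dx) = x⁻² dx on (0, ∞), so that ν(x, ∞) = 1/x and P{α⁺ ≤ x} = e^{−1/x}; the row variables ξ_nj = X_j/n with Pareto X_j are an arbitrary choice satisfying (i). All sample sizes below are illustrative assumptions:

```python
import math
import random

random.seed(1)
n, trials = 200, 10000
below = 0
for _ in range(trials):
    # xi_nj = X_j / n with P{X_j > x} = 1/x for x >= 1 (Pareto), so
    # sum_j P{xi_nj > x} -> nu(x, inf) = 1/x for every x > 0
    row_max = max(1.0 / (1.0 - random.random()) for _ in range(n)) / n
    below += row_max <= 1.0
p_hat = below / trials
# condition (iii): P{alpha_n^+ <= 1} -> exp(-nu(1, inf)) = exp(-1)
print(round(p_hat, 3), round(math.exp(-1), 3))
```

The empirical frequency approaches the Fréchet value e⁻¹ ≈ 0.368, the distribution function of the largest point of the limiting Poisson process.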
The equivalence of (i) and (ii) is an immediate consequence of Theorem 16.18 in the next chapter. Here we give a direct elementary proof.

Proof: Condition (i) holds iff
  Σ_j μ_nj(x, ∞) → ν(x, ∞),   Σ_j μ_nj(−∞, −x) → ν(−∞, −x), (20)
for all x > 0 with ν{±x} = 0. By Lemma 5.8, the first condition in (20) is equivalent to
  P{α_n^+ ≤ x} = Π_j (1 − P{ξ_nj > x}) → e^{−ν(x,∞)} = P{α^+ ≤ x},
which holds for all continuity points x > 0 iff α_n^+ →d α^+. Similarly, the second condition in (20) holds iff α_n^− →d α^−. Thus, (i) and (iii) are equivalent.

To show that (i) implies (ii), we may write the latter condition in the form
  Σ_j f(ξ_nj) →d ηf,  f ∈ C_K^+(ℝ\{0}). (21)
Here the variables f(ξ_nj) form a null array with distributions μ_nj ∘ f⁻¹, and ηf is compound Poisson with characteristic measure ν ∘ f⁻¹. Thus, Theorem 15.14 (ii) shows that (21) is equivalent to the conditions
  Σ_j μ_nj ∘ f⁻¹ →v ν ∘ f⁻¹ on (0, ∞], (22)
  lim_{ε→0} limsup_{n→∞} Σ_j ∫_{f(x)≤ε} f(x) μ_nj(dx) = 0. (23)
Now (22) follows immediately from (i). To deduce (23), it suffices to note that the sum on the left is bounded by Σ_j μ_nj(f ∧ ε) → ν(f ∧ ε).

Finally, assume (ii). By a simple approximation, η_n(x, ∞) →d η(x, ∞) for any x > 0 with ν{x} = 0. In particular, for such an x,
  P{α_n^+ ≤ x} = P{η_n(x, ∞) = 0} → P{η(x, ∞) = 0} = P{α^+ ≤ x},
and so α_n^+ →d α^+. Similarly, α_n^− →d α^−, which proves (iii). □

Exercises

1. Show that a Lévy process X in ℝ is a subordinator iff X₁ ≥ 0 a.s.

2. Show that the Cauchy distribution μ(dx) = π⁻¹(1 + x²)⁻¹ dx is strictly 1-stable, and determine the corresponding Lévy measure ν. (Hint: Check that μ̂(u) = e^{−|u|}. By symmetry, ν(dx) = c x⁻² dx for some c > 0, and it remains to determine c.)

3. Let X be a weakly p-stable Lévy process. If p ≠ 1, show that the process X_t − ct is strictly p-stable for a suitable constant c. Note that the centering fails for p = 1.
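The strict 1-stability claimed in Exercise 2 is easy to probe by simulation: the average of n i.i.d. standard Cauchy variables should again be standard Cauchy, with distribution function 1/2 + π⁻¹ arctan x. A hedged sketch, with arbitrary sample sizes:

```python
import math
import random

random.seed(2)

def cauchy():
    # inverse-CDF sampling: tan(pi*(U - 1/2)) is standard Cauchy
    return math.tan(math.pi * (random.random() - 0.5))

n, trials = 50, 20000
means = [sum(cauchy() for _ in range(n)) / n for _ in range(trials)]
# empirical CDF of the sample mean at a few points, against
# F(x) = 1/2 + atan(x)/pi for the standard Cauchy law
emp = {x: sum(m <= x for m in means) / trials for x in (-1.0, 0.0, 1.0)}
print({x: round(v, 3) for x, v in emp.items()})
```

Note that no law of large numbers intervenes here: the averaged distribution does not concentrate, exactly as strict 1-stability predicts.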
4. Extend Proposition 15.23 to null arrays of spherically symmetric random vectors in ℝ^d.

5. Show by an example that Theorem 15.25 fails without the centering at truncated means. (Hint: Without the centering, condition (ii) of Theorem 15.28 becomes Σ_j E[ξ_nj ξ'_nj; |ξ_nj| ≤ h] → a^h.)

6. Deduce Theorems 5.7 and 5.11 from Theorem 15.14 and Lemma 15.24.

7. For a Lévy process X of effective dimension d ≥ 3, show that |X_t| → ∞ a.s. as t → ∞. (Hint: Define τ = inf{t; |X_t| > 1}, and iterate to form a random walk (S_n). Show that the latter has the same effective dimension as X, and use Theorem 9.8.)

8. Let X be a Lévy process in ℝ, and fix any p ∈ (0, 2). Show that t^{−1/p} X_t converges a.s. iff E|X₁|^p < ∞ and either p ≤ 1 or EX₁ = 0. (Hint: Define a random walk (S_n) as before, show that S₁ satisfies the same moment condition as X₁, and apply Theorem 4.23.)

9. If ξ is id(a, b, ν) and p > 0, show that E|ξ|^p < ∞ iff ∫_{|x|>1} |x|^p ν(dx) < ∞. (Hint: If ν has bounded support, then E|ξ|^p < ∞ for all p. It is then enough to consider compound Poisson distributions, for which the result is elementary.)

10. Show by a direct argument that a ℤ₊-valued random variable ξ is infinitely divisible (on ℤ₊) iff −log E s^ξ = Σ_k (1 − s^k) ν_k, s ∈ (0, 1], for some unique, bounded measure ν = (ν_k) on ℕ. (Hint: Assuming L(ξ) = μ_n^{*n}, use the inequality 1 − x ≤ e^{−x} to show that the sequence (nμ_n) is tight on ℕ. Then nμ_n →v ν along a subsequence for some bounded measure ν on ℕ. Finally note that −log(1 − x) ~ x as x → 0. For the uniqueness, take differences and use the uniqueness theorem for power series.)

11. Show by a direct argument that a random variable ξ ≥ 0 is infinitely divisible iff −log Ee^{−uξ} = ua + ∫(1 − e^{−ux}) ν(dx), u ≥ 0, for some unique constant a ≥ 0 and measure ν on (0, ∞) with ∫(x ∧ 1) ν(dx) < ∞. (Hint: If L(ξ) = μ_n^{*n}, note that the measures χ_n(dx) = n(1 − e^{−x}) μ_n(dx) are tight on ℝ₊.
Then χ_n →v χ along a subsequence, and we may write χ(dx) = a δ₀(dx) + (1 − e^{−x}) ν(dx). The desired representation now follows as before. To get the uniqueness, take differences and use the uniqueness theorem for Laplace transforms.)

12. Show by a direct argument that a random variable ξ is infinitely divisible iff ψ_u = log Ee^{iuξ} exists and is given by (7) for some unique constants a ≥ 0 and b and measure ν on ℝ\{0} with ∫(x² ∧ 1) ν(dx) < ∞. (Hint: Proceed as in Lemma 15.15.)
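The power-series criterion of Exercise 10 can be verified in closed form for a concrete infinitely divisible law on ℤ₊: the negative binomial distribution with pgf (p/(1 − qs))^r, q = 1 − p, whose Lévy measure on ℕ is the logarithmic series ν_k = r q^k/k. A quick numerical check of the identity, with arbitrary parameter values:

```python
import math

# negative binomial NB(r, p) counting failures: pgf E s**xi = (p/(1 - q*s))**r
r, p, s = 2.5, 0.4, 0.7
q = 1.0 - p
lhs = -math.log((p / (1.0 - q * s)) ** r)
# Exercise 10: -log E s**xi = sum_k (1 - s**k) * nu_k with nu_k = r*q**k/k;
# truncating the series at k = 200 leaves an error below q**200
rhs = sum((1.0 - s ** k) * r * q ** k / k for k in range(1, 201))
print(round(lhs, 9), round(rhs, 9))
```

The total mass Σ_k ν_k = −r log p is finite, so ν is indeed a bounded measure on ℕ, as the exercise requires.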
13. Given a semigroup of infinitely divisible distributions μ_t, show that there exists a process X on ℝ₊ with stationary, independent increments and L(X_t) = μ_t for all t ≥ 0. Starting from a suitable Poisson process and an independent Brownian motion, construct a Lévy process Y with the same property. Conclude that X has a version with rcll paths and a similar representation as Y. (Hint: Use Lemma 3.24 and Theorems 6.10 and 6.16.)
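The construction asked for in Exercise 13, a Lévy process assembled from a drift, an independent Brownian motion, and an independent compound Poisson process, can be sketched by direct simulation; the parameter values and the N(0, 1) jump law below are arbitrary choices. With jump rate λ and unit jump variance, the marginal X_t has mean bt and variance t(σ² + λ):

```python
import math
import random

random.seed(3)
b, sigma, lam = 0.5, 1.0, 2.0  # drift, diffusion, jump rate (assumed)

def sample_X(t):
    # Gaussian component plus drift
    val = b * t + sigma * math.sqrt(t) * random.gauss(0.0, 1.0)
    # Poisson(lam*t) jump count, sampled by CDF inversion
    u, k = random.random(), 0
    pk = math.exp(-lam * t)
    cdf = pk
    while u > cdf:
        k += 1
        pk *= lam * t / k
        cdf += pk
    # add k independent N(0, 1) jumps from the compound Poisson part
    return val + sum(random.gauss(0.0, 1.0) for _ in range(k))

xs = [sample_X(1.0) for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(round(mean, 2), round(var, 2))  # targets: 0.5 and 3.0
```

Sampling increments independently over disjoint intervals with the same recipe produces the stationary, independent increments the exercise asks for.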
Chapter 16

Convergence of Random Processes, Measures, and Sets

Relative compactness and tightness; uniform topology on C(K, S); Skorohod's J1-topology; equicontinuity and tightness; convergence of random measures; superposition and thinning; exchangeable sequences and processes; simple point processes and random closed sets

The basic notions of weak or distributional convergence were introduced in Chapter 4, and in Chapter 5 we studied the special case of distributions on Euclidean spaces. The purpose of this chapter is to develop the general weak convergence theory into a powerful tool that applies to a wide range of set, measure, and function spaces. In particular, some functional limit theorems derived in the last two chapters by cumbersome embedding and approximation techniques will then be accessible by straightforward compactness arguments.

The key result is Prohorov's theorem, which gives the basic connection between tightness and relative distributional compactness. This result will enable us to convert some classical compactness criteria into convenient probabilistic versions. In particular, we shall see how the Arzelà–Ascoli theorem yields a corresponding criterion for distributional compactness of continuous processes. Similarly, an optional equicontinuity condition will be shown to guarantee the appropriate compactness for processes that are right-continuous with left-hand limits (rcll). We shall also derive some general criteria for convergence in distribution of random measures and sets, with special attention to the point process case.

The general criteria will be applied to some interesting concrete situations. In addition to some already familiar results from Chapters 14 and 13, we shall obtain a general functional limit theorem for sampling from finite populations and derive convergence criteria for superpositions and thinnings of point processes.
Further applications appear in subsequent chapters, such as a general approximation result for Markov chains in Chapter 19 and a method for constructing weak solutions to SDEs in Chapter 21.

Beginning with the case of continuous processes, let us fix two metric spaces (K, d) and (S, ρ), where K is compact and S is separable and complete, and consider the space C(K, S) of continuous functions from K to
S, endowed with the uniform metric ρ̂(x, y) = sup_{t∈K} ρ(x_t, y_t). For each t ∈ K we may introduce the evaluation map π_t: x ↦ x_t from C(K, S) to S. The following result shows that the random elements in C(K, S) are precisely the continuous S-valued processes on K.

Lemma 16.1 (Borel sets and evaluations) B(C(K, S)) = σ{π_t; t ∈ K}.

Proof: The maps π_t are continuous, hence Borel measurable, and so the generated σ-field C is contained in B(C(K, S)). To prove the reverse relation, we need to show that any open subset G ⊂ C(K, S) lies in C. From the Arzelà–Ascoli Theorem A2.1 we note that C(K, S) is σ-compact and hence separable. Thus, G is a countable union of open balls B_{x,r} = {y ∈ C(K, S); ρ̂(x, y) < r}, and it suffices to prove that the latter lie in C. But this is clear, since for any countable dense set D ⊂ K,
  B̄_{x,r} = ⋂_{t∈D} {y ∈ C(K, S); ρ(x_t, y_t) ≤ r}. □

If X and X^n are random processes on K, we write X^n →fd X for convergence of the finite-dimensional distributions, in the sense that
  (X^n_{t₁}, ..., X^n_{t_k}) →d (X_{t₁}, ..., X_{t_k}),  t₁, ..., t_k ∈ K, k ∈ ℕ. (1)
Though by Proposition 3.2 the distribution of a random process is determined by the family of finite-dimensional distributions, condition (1) is insufficient in general for the convergence X^n →d X in C(K, S). This is already clear when the processes are nonrandom, since pointwise convergence of a sequence of functions need not be uniform. To overcome this difficulty, we may add a compactness condition. Recall that a sequence of random elements ξ₁, ξ₂, ... is said to be relatively compact in distribution if every subsequence has a further subsequence that converges in distribution.

Lemma 16.2 (weak convergence via compactness) Let X, X¹, X², ... be random elements in C(K, S). Then X^n →d X iff X^n →fd X and (X^n) is relatively compact in distribution.

Proof: If X^n →d X, then X^n →fd X by Theorem 4.27, and (X^n) is trivially relatively compact in distribution.
Now assume instead that (X^n) satisfies the two conditions. If X^n does not converge in distribution to X, we may choose a bounded continuous function f: C(K, S) → ℝ and an ε > 0 such that |Ef(X^n) − Ef(X)| > ε along some subsequence N' ⊂ ℕ. By the relative compactness we may choose a further subsequence N'' and a process Y such that X^n →d Y along N''. But then X^n →fd Y along N'', and since also X^n →fd X, Proposition 3.2 yields X =d Y. Thus, X^n →d X along N'', and so Ef(X^n) → Ef(X) along the same sequence, a contradiction. We conclude that X^n →d X. □

The last result shows the importance of finding tractable conditions for a random sequence ξ₁, ξ₂, ... in a metric space S to be relatively compact.
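The insufficiency of condition (1) alone is already visible for deterministic paths: the triangular spikes x_n(t) = (1 − |nt − 1|)⁺ on [0, 1] converge to 0 at every fixed t, yet each has supremum 1, so no uniform convergence (and no relative compactness in C[0, 1]) can hold. A small numerical confirmation, with the evaluation point and grid resolution chosen arbitrarily:

```python
def spike(n, t):
    # triangular spike of height 1 centered at t = 1/n
    return max(0.0, 1.0 - abs(n * t - 1.0))

# pointwise values at a fixed time vanish for large n ...
fixed = [spike(n, 0.3) for n in (10, 100, 1000, 10000)]
# ... but the supremum over a fine grid stays near height 1 for every n
sups = [max(spike(n, k / 100000) for k in range(100001)) for n in (10, 100, 1000)]
print(fixed, [round(s, 6) for s in sups])
```

The escaping mass near t = 0 is exactly what a tightness condition on the modulus of continuity rules out.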
Generalizing a notion from Chapter 4, we say that (ξ_n) is tight if
  sup_K liminf_{n→∞} P{ξ_n ∈ K} = 1, (2)
where the supremum extends over all compact subsets K ⊂ S.

We may now state the key result of weak convergence theory, the equivalence between tightness and relative compactness for random elements in sufficiently regular metric spaces. A version for Euclidean spaces was obtained in Proposition 5.21.

Theorem 16.3 (tightness and relative compactness, Prohorov) For any sequence of random elements ξ₁, ξ₂, ... in a metric space S, tightness implies relative compactness in distribution, and the two conditions are equivalent when S is separable and complete.

In particular, we note that when S is separable and complete, a single random element ξ in S is tight, in the sense that sup_K P{ξ ∈ K} = 1. In that case we may clearly replace the "liminf" in (2) by "inf."

For the proof of Theorem 16.3 we need a simple lemma. Recall from Lemma 1.6 that a random element in a subspace of a metric space S may also be regarded as a random element in S.

Lemma 16.4 (preservation of tightness) Tightness is preserved by continuous mappings. In particular, if (ξ_n) is a tight sequence of random elements in a subspace A of some metric space S, then (ξ_n) remains tight when regarded as a sequence in S.

Proof: Compactness is preserved by continuous mappings. This applies in particular to the natural embedding I: A → S. □

Proof of Theorem 16.3 (Varadarajan): For S = ℝ^d the result was proved in Proposition 5.21. Turning to the case when S = ℝ^∞, consider a tight sequence of random elements ξ_n = (ξ_n¹, ξ_n², ...) in ℝ^∞. Writing η_n^k = (ξ_n¹, ..., ξ_n^k), we conclude from Lemma 16.4 that the sequence (η_n^k; n ∈ ℕ) is tight in ℝ^k for each k ∈ ℕ. Given any subsequence N' ⊂ ℕ, we may then use a diagonal argument to extract a further subsequence N'' such that η_n^k →d some η^k as n → ∞ along N'' for fixed k ∈ ℕ.
The sequence (L(η^k)) is projective by the continuity of the coordinate projections, and so by Theorem 6.14 there exists a random sequence ξ = (ξ¹, ξ², ...) such that (ξ¹, ..., ξ^k) =d η^k for each k. But then ξ_n →fd ξ along N'', and so Theorem 4.29 yields ξ_n →d ξ along the same sequence.

Next assume that S ⊂ ℝ^∞. If (ξ_n) is tight in S, then by Lemma 16.4 it remains tight as a sequence in ℝ^∞. Hence, for any sequence N' ⊂ ℕ there exist a further subsequence N'' and some random element ξ such that ξ_n →d ξ in ℝ^∞ along N''. To show that the convergence remains valid in S, it suffices by Lemma 4.26 to verify that ξ ∈ S a.s. Then choose some compact sets K_m ⊂ S with liminf_n P{ξ_n ∈ K_m} ≥ 1 − 2^{−m} for each m ∈ ℕ.
Since the K_m remain closed in ℝ^∞, Theorem 4.25 yields
  P{ξ ∈ K_m} ≥ limsup_{n∈N''} P{ξ_n ∈ K_m} ≥ liminf_{n→∞} P{ξ_n ∈ K_m} ≥ 1 − 2^{−m},
and so ξ ∈ ⋃_m K_m ⊂ S a.s.

Now assume that S is σ-compact. In particular, it is then separable and therefore homeomorphic to a subset A ⊂ ℝ^∞. By Lemma 16.4 the tightness of (ξ_n) carries over to the image sequence (ξ̃_n) in A, and by Lemma 4.26 the possible relative compactness of (ξ̃_n) implies the same property for (ξ_n). This reduces the discussion to the previous case.

Now turn to the general case. If (ξ_n) is tight, there exist some compact sets K_m ⊂ S with liminf_n P{ξ_n ∈ K_m} ≥ 1 − 2^{−m}. In particular, P{ξ_n ∈ A} → 1, where A = ⋃_m K_m, and so we may choose some random elements η_n in A with P{ξ_n = η_n} → 1. Here (η_n) is again tight, even as a sequence in A, and since A is σ-compact, the previous argument shows that (η_n) is relatively compact as a sequence in A. By Lemma 4.26 it remains relatively compact in S, and by Theorem 4.28 the relative compactness carries over to (ξ_n).

To prove the converse assertion, let S be separable and complete, and assume that (ξ_n) is relatively compact. For any r > 0 we may cover S by some open balls B₁, B₂, ... of radius r. Writing G_k = B₁ ∪ ... ∪ B_k, we claim that
  lim_{k→∞} inf_n P{ξ_n ∈ G_k} = 1. (3)
Indeed, we may otherwise choose some integers n_k ↑ ∞ with sup_k P{ξ_{n_k} ∈ G_k} = c < 1. By the relative compactness we have ξ_{n_k} →d ξ along a subsequence N' ⊂ ℕ for a suitable ξ, and so
  P{ξ ∈ G_m} ≤ liminf_{k∈N'} P{ξ_{n_k} ∈ G_m} ≤ c < 1,  m ∈ ℕ,
which leads as m → ∞ to the absurdity 1 ≤ c < 1. Thus, (3) must be true.

Now take r = m⁻¹ and write G_k^m for the corresponding sets G_k. For any ε > 0 there exist by (3) some k₁, k₂, ... ∈ ℕ with
  inf_n P{ξ_n ∈ G^m_{k_m}} ≥ 1 − ε 2^{−m},  m ∈ ℕ.
Writing A = ⋂_m Ḡ^m_{k_m}, we get inf_n P{ξ_n ∈ A} ≥ 1 − ε. Also, note that A is complete and totally bounded, hence compact. Thus, (ξ_n) is tight.
□

In order to apply the last theorem, we need convenient criteria for tightness. Beginning with the space C(K, S), we may convert the classical Arzelà–Ascoli compactness criterion into a condition for tightness. Then introduce the modulus of continuity
  w(x, h) = sup{ρ(x_s, x_t); d(s, t) ≤ h},  x ∈ C(K, S), h > 0.
The function w(x, h) is clearly continuous for fixed h > 0 and hence a measurable function of x.
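For a path recorded on a regular grid over K = [0, 1] with the Euclidean metrics, the modulus w(x, h) can be approximated directly. A small sketch, where the grid resolution and the Lipschitz test path are arbitrary choices:

```python
def modulus(xs, h):
    # xs[i] approximates the path at t_i = i/(len(xs)-1); w(x, h) is the
    # largest oscillation over index pairs at grid distance at most h
    n = len(xs) - 1
    span = max(1, int(h * n))
    return max(abs(xs[i] - xs[j])
               for i in range(n + 1)
               for j in range(i, min(i + span, n) + 1))

grid = [i / 1000 for i in range(1001)]
lin = [2.0 * t for t in grid]  # Lipschitz path x_t = 2t, so w(x, h) = 2h
print(round(modulus(lin, 0.05), 6), round(modulus(lin, 0.1), 6))
```

For such a Lipschitz path the modulus shrinks linearly in h, the quantitative behavior that makes condition (4) below verifiable in examples.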
Theorem 16.5 (tightness in C(K, S), Prohorov) For any metric spaces K and S, where K is compact and S is separable and complete, let X, X¹, X², ... be random elements in C(K, S). Then X^n →d X iff X^n →fd X and
  lim_{h→0} limsup_{n→∞} E[w(X^n, h) ∧ 1] = 0. (4)

Proof: Since C(K, S) is separable and complete, Theorem 16.3 shows that tightness and relative compactness are equivalent for (X^n). By Lemma 16.2 it is then enough to show that, under the condition X^n →fd X, the tightness of (X^n) is equivalent to (4).

First let (X^n) be tight. For any ε > 0 we may then choose a compact set B ⊂ C(K, S) such that limsup_n P{X^n ∈ B^c} < ε. By the Arzelà–Ascoli Theorem A2.1 we may next choose h > 0 so small that w(x, h) ≤ ε for all x ∈ B. But then limsup_n P{w(X^n, h) > ε} < ε, and (4) follows since ε was arbitrary.

Next assume that (4) holds and X^n →fd X. Since each X^n is continuous, w(X^n, h) → 0 a.s. as h → 0 for fixed n, so the "limsup" in (4) may be replaced by "sup." For any ε > 0 we may then choose h₁, h₂, ... > 0 so small that
  sup_n P{w(X^n, h_k) > 2^{−k}} ≤ 2^{−k−1} ε,  k ∈ ℕ. (5)
Letting t₁, t₂, ... be dense in K, we may further choose some compact sets C₁, C₂, ... ⊂ S such that
  sup_n P{X^n(t_k) ∉ C_k} ≤ 2^{−k−1} ε,  k ∈ ℕ. (6)
Now define B = ⋂_k {x ∈ C(K, S); x(t_k) ∈ C_k, w(x, h_k) ≤ 2^{−k}}. Then B is compact by the Arzelà–Ascoli Theorem A2.1, and from (5) and (6) we get sup_n P{X^n ∈ B^c} ≤ ε. Thus, (X^n) is tight. □

One often needs to replace the compact parameter space K by some more general index set T. Here we assume T to be locally compact, second-countable, and Hausdorff (abbreviated as lcscH) and endow the space C(T, S) of continuous functions from T to S with the topology of uniform convergence on compacts. As before, the Borel σ-field in C(T, S) is generated by the evaluation maps π_t, and so the random elements in C(T, S) are precisely the continuous processes on T taking values in S.
The following result characterizes convergence in distribution of such processes. 
Proposition 16.6 (locally compact parameter space) Let X, X¹, X², ... be random elements in C(T, S), where S is a metric space and T is lcscH. Then X^n →d X iff convergence holds for the restrictions to any compact subset K ⊂ T.

Proof: The necessity is obvious from Theorem 4.27, since the restriction map π_K: C(T, S) → C(K, S) is continuous for any compact set K ⊂ T. To prove the sufficiency, we may choose some compact sets K₁ ⊂ K₂ ⊂ ... ⊂ T with K_j° ↑ T, and let X_i, X_i¹, X_i², ... denote the restrictions of the processes X, X¹, X², ... to K_i. By hypothesis we have X_i^n →d X_i for every i, and so Theorem 4.29 yields (X₁^n, X₂^n, ...) →d (X₁, X₂, ...). Now π = (π_{K₁}, π_{K₂}, ...) is a homeomorphism from C(T, S) onto its range in ×_j C(K_j, S), and so X^n →d X by Lemma 4.26 and Theorem 4.27. □

For a simple illustration, we may prove a version of Donsker's Theorem 14.9. Since Theorem 16.5 applies only to processes with continuous paths, we need to replace the original step processes by their linearly interpolated versions
  X^n_t = n^{−1/2} {Σ_{k≤nt} ξ_k + (nt − [nt]) ξ_{[nt]+1}},  t ≥ 0, n ∈ ℕ. (7)

Corollary 16.7 (functional central limit theorem, Donsker) Let ξ₁, ξ₂, ... be i.i.d. random variables with mean 0 and variance 1, define X¹, X², ... by (7), and let B denote a Brownian motion on ℝ₊. Then X^n →d B in C(ℝ₊).

The following simple estimate may be used to verify the tightness.

Lemma 16.8 (maximum inequality, Ottaviani) Let ξ₁, ξ₂, ... be i.i.d. random variables with mean 0 and variance 1, and put S_n = Σ_{j≤n} ξ_j and S_n^* = max_{k≤n} |S_k|. Then
  P{S_n^* > 2r√n} ≤ P{|S_n| > r√n} / (1 − r^{−2}),  r > 1, n ∈ ℕ.

Proof: Put c = r√n, and define τ = inf{k ∈ ℕ; |S_k| > 2c}. By the strong Markov property at τ and Theorem 6.4,
  P{|S_n| > c} ≥ P{|S_n| > c, S_n^* > 2c} ≥ P{τ ≤ n, |S_n − S_τ| ≤ c}
    ≥ P{S_n^* > 2c} min_{k≤n} P{|S_k| ≤ c},
and by Chebyshev's inequality,
  min_{k≤n} P{|S_k| ≤ c} ≥ min_{k≤n} (1 − kc^{−2}) ≥ 1 − nc^{−2} = 1 − r^{−2}. □

Proof of Corollary 16.7: By Proposition 16.6 it is enough to prove the convergence on [0, 1]. Clearly, X^n →fd X by Proposition 5.9 and Corollary 5.5. Combining the former result with Lemma 16.8, we further get the rough estimate
  lim_{r→∞} r² limsup_{n→∞} P{S_n^* > r√n} = 0,
which implies
  lim_{h→0} h^{−1} limsup_{n→∞} sup_t P{sup_{0≤r≤h} |X^n_{t+r} − X^n_t| > ε} = 0.
Now (4) follows easily, as we divide [0, 1] into subintervals of length ≤ h. □

Next we show how the Kolmogorov–Chentsov criterion in Theorem 3.23 may be converted into a sufficient condition for tightness in C(ℝ^d, S). An important application appears in Theorem 21.9.

Corollary 16.9 (moments and tightness) Let X¹, X², ... be continuous processes on ℝ^d with values in a separable, complete metric space (S, ρ). Assume that (X^n_0) is tight in S and that for suitable constants a, b > 0,
  E{ρ(X^n_s, X^n_t)}^a ≲ |s − t|^{d+b},  s, t ∈ ℝ^d, n ∈ ℕ, (8)
uniformly in n. Then (X^n) is tight in C(ℝ^d, S), and for any c ∈ (0, b/a) the limiting processes are a.s. locally Hölder continuous with exponent c.

Proof: For each process X^n we define the associated quantities ξ_nk as in the proof of Theorem 3.23, and we get Eξ_nk^a ≲ 2^{−kb}. Hence, Lemma 1.29 yields for m, n ∈ ℕ
  ||w(X^n, 2^{−m})||_{a∧1} ≲ Σ_{k≥m} ||ξ_nk||_{a∧1} ≲ Σ_{k≥m} 2^{−kb/(a∨1)} ≲ 2^{−mb/(a∨1)},
which implies (4). Condition (8) extends by Lemma 4.11 to any limiting process X, and the last assertion then follows by Theorem 3.23. □

Let us now fix a separable, complete metric space S, and consider random processes with paths in D(ℝ₊, S), the space of rcll functions f: ℝ₊ → S. We endow D(ℝ₊, S) with the Skorohod J₁-topology, whose basic properties are summarized in Appendix A2. Note in particular that the path space is again Polish and that compactness may be characterized in terms of a modified modulus of continuity w̃, as defined in Theorem A2.2. The following result gives a criterion for weak convergence in D(ℝ₊, S), similar to Theorem 16.5 for C(K, S).

Theorem 16.10 (tightness in D(ℝ₊, S), Skorohod, Prohorov) For any separable, complete metric space S, let X, X¹, X², ...
be random elements in D(ℝ₊, S). Then X^n →d X iff X^n →fd X on some dense subset of T = {t ≥ 0; ΔX_t = 0 a.s.} and
  lim_{h→0} limsup_{n→∞} E[w̃(X^n, t, h) ∧ 1] = 0,  t > 0. (9)

Proof: Since π_t is continuous at every path x ∈ D(ℝ₊, S) with Δx_t = 0, X^n →d X implies X^n →fd X on T by Theorem 4.27. Now use Theorem A2.2 and proceed as in the proof of Theorem 16.5. □
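These criteria can be illustrated on the classical special case of random-walk step processes X^n_t = S_[nt]/√n in D[0, 1], which converge to Brownian motion; the running maximum is continuous at a.e. Brownian path, so the reflection-principle law P{max_{t≤1} B_t ≤ x} = 2Φ(x) − 1 should emerge. A hedged simulation sketch (step law, n, and sample size are arbitrary, and a small discretization bias remains at finite n):

```python
import math
import random

random.seed(4)
n, trials, x = 400, 5000, 1.0
hits = 0
for _ in range(trials):
    s = mx = 0.0
    for _ in range(n):
        s += random.gauss(0.0, 1.0)  # mean-0, variance-1 steps
        mx = max(mx, s)
    hits += mx / math.sqrt(n) <= x   # max of the step process X^n on [0, 1]
p_hat = hits / trials
phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))  # standard normal CDF
target = 2.0 * phi - 1.0                          # P{max B <= x}, about 0.683
print(round(p_hat, 3), round(target, 3))
```

The step processes live in D rather than C, so it is exactly the Skorohod machinery above, not Theorem 16.5, that justifies passing such functionals to the limit.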
Tightness in D(ℝ₊, S) is often verified most easily by means of the following sufficient condition. Given a process X, we say that a random time is X-optional if it is optional with respect to the filtration induced by X.

Theorem 16.11 (optional equicontinuity and tightness, Aldous) For any metric space (S, ρ), let X¹, X², ... be random elements in D(ℝ₊, S). Then (9) holds if for any bounded sequence of X^n-optional times τ_n and any positive constants h_n → 0,
  ρ(X^n_{τ_n}, X^n_{τ_n + h_n}) →P 0,  n → ∞. (10)

The proof will be based on two lemmas, where the first one is a restatement of condition (10).

Lemma 16.12 The condition in Theorem 16.11 is equivalent to
  lim_{h→0} limsup_{n→∞} sup_{σ,τ} E[ρ(X^n_σ, X^n_τ) ∧ 1] = 0,  t > 0, (11)
where the supremum extends over all X^n-optional times σ, τ ≤ t with σ ≤ τ ≤ σ + h.

Proof: Replacing ρ by ρ ∧ 1 if necessary, we may assume that ρ ≤ 1. The condition in Theorem 16.11 is then equivalent to
  lim_{δ→0} limsup_{n→∞} sup_{τ≤t} sup_{h∈[0,δ]} Eρ(X^n_τ, X^n_{τ+h}) = 0,  t > 0,
where the first supremum extends over all X^n-optional times τ ≤ t. To deduce (11), assume that 0 ≤ τ − σ ≤ δ. Then [τ, τ + δ] ⊂ [σ, σ + 2δ], and so by the triangle inequality and a simple substitution,
  δ ρ(X_σ, X_τ) ≤ ∫₀^δ {ρ(X_σ, X_{τ+h}) + ρ(X_τ, X_{τ+h})} dh
    ≤ ∫₀^{2δ} ρ(X_σ, X_{σ+h}) dh + ∫₀^δ ρ(X_τ, X_{τ+h}) dh.
Thus,
  sup_{σ,τ} Eρ(X_σ, X_τ) ≤ 3 sup_τ sup_{h∈[0,2δ]} Eρ(X_τ, X_{τ+h}),
where the suprema extend over all optional times τ ≤ t and σ ∈ [τ − δ, τ]. □

We also need the following elementary estimate.
Lemma 16.13 Let ξ₁, ..., ξ_n ≥ 0 be random variables with sum S_n. Then
  Ee^{−S_n} ≤ e^{−nc} + max_{k≤n} P{ξ_k < c},  c > 0.

Proof: Let p denote the maximum on the right. By the Hölder and Chebyshev inequalities we get
  Ee^{−S_n} = E Π_k e^{−ξ_k} ≤ Π_k (Ee^{−nξ_k})^{1/n} ≤ {(e^{−nc} + p)^{1/n}}^n = e^{−nc} + p. □

Proof of Theorem 16.11: Again we may assume that ρ ≤ 1, and by suitable approximation we may extend condition (11) to weakly optional times σ and τ. For each n ∈ ℕ and ε > 0, we recursively define the weakly X^n-optional times
  σ^ε_{k+1} = inf{s > σ^ε_k; ρ(X^n_{σ^ε_k}, X^n_s) > ε},  k ∈ ℤ₊,
starting with σ^ε₀ = 0. Note that for m ∈ ℕ and t, h > 0,
  w̃(X^n, t, h) ≤ 2ε + Σ_{k<m} 1{σ^ε_{k+1} − σ^ε_k ≤ h, σ^ε_k ≤ t} + 1{σ^ε_m ≤ t}. (12)
Now let v_n(t, h) denote the supremum in (11). By Chebyshev's inequality and a simple truncation,
  P{σ^ε_{k+1} − σ^ε_k ≤ h, σ^ε_k ≤ t} ≤ ε^{−1} v_n(t + h, h),  k ∈ ℕ, t, h > 0, (13)
and so by (11) and (12),
  lim_{h→0} limsup_{n→∞} E w̃(X^n, t, h) ≤ 2ε + limsup_{n→∞} P{σ^ε_m ≤ t}. (14)
Next we conclude from (13) and Lemma 16.13 that, for any c > 0,
  P{σ^ε_m ≤ t} ≤ e^t E[e^{−σ^ε_m}; σ^ε_m ≤ t] ≤ e^t {e^{−mc} + ε^{−1} v_n(t + c, c)}.
By (11) the right-hand side tends to 0 as m, n → ∞ and then c → 0. Hence, the last term in (14) tends to 0 as m → ∞, and (9) follows since ε is arbitrary. □

We may illustrate the use of Theorem 16.11 by proving an extension of Corollary 16.7. A more precise result is obtained by different methods in Corollary 15.20. An extension to Markov chains appears in Theorem 19.28.

Theorem 16.14 (approximation of random walks, Skorohod) Let S¹, S², ... be random walks in ℝ^d such that S^n_{m_n} →d X₁ for some Lévy process X and some integers m_n → ∞. Then the processes X^n_t = S^n_{[m_n t]} satisfy X^n →d X in D(ℝ₊, ℝ^d).

Proof: By Corollary 15.16 we have X^n →fd X, and so by Theorem 16.11 it is enough to show that |X^n_{τ_n + h_n} − X^n_{τ_n}| →P 0 for any finite optional times τ_n and constants h_n → 0. By the strong Markov property of S^n, or
alternatively by Theorem 11.13, we may reduce to the case when τ_n = 0 for all n. Thus, it suffices to show that X^n_{h_n} →P 0 as h_n → 0, which again may be seen from Corollary 15.16. □

For the remainder of this chapter, we assume that S is lcscH with Borel σ-field S. Write Ŝ for the class of relatively compact sets in S. Let M(S) denote the space of locally finite measures on S, endowed with the vague topology induced by the mappings π_f: μ ↦ μf = ∫ f dμ, f ∈ C_K^+. The basic properties of this topology are summarized in Theorem A2.3. Note in particular that M(S) is Polish and that the random elements in M(S) are precisely the random measures on S. Similarly, the point processes on S are random elements in the vaguely closed subspace N(S), consisting of all integer-valued measures in M(S). We begin with the basic tightness criterion.

Lemma 16.15 (tightness of random measures, Prohorov) Let ξ₁, ξ₂, ... be random measures on some lcscH space S. Then the sequence (ξ_n) is relatively compact in distribution iff (ξ_n B) is tight in ℝ₊ for every B ∈ Ŝ.

Proof: By Theorems 16.3 and A2.3 the notions of relative compactness and tightness are equivalent for (ξ_n). If (ξ_n) is tight, then so is (ξ_n f) for every f ∈ C_K^+ by Lemma 16.4, and hence (ξ_n B) is tight for all B ∈ Ŝ. Conversely, assume the latter condition. Choose an open cover G₁, G₂, ... ∈ Ŝ of S, fix any ε > 0, and let r₁, r₂, ... > 0 be large enough that
  sup_n P{ξ_n G_k > r_k} ≤ ε 2^{−k},  k ∈ ℕ. (15)
Then the set A = ⋂_k {μ; μG_k ≤ r_k} is relatively compact by Theorem A2.3 (ii), and (15) yields inf_n P{ξ_n ∈ A} ≥ 1 − ε. Thus, (ξ_n) is tight. □

We may now derive some general convergence criteria for random measures, corresponding to the uniqueness results in Lemma 12.1 and Theorem 12.8. Define Ŝ_ξ = {B ∈ Ŝ; ξ∂B = 0 a.s.}.

Theorem 16.16 (convergence of random measures) Let ξ, ξ₁, ξ₂, ... be random measures on an lcscH space S.
Then these conditions are equivalent:
(i) ξ_n →d ξ;
(ii) ξ_n f →d ξf for all f ∈ C_K^+;
(iii) (ξ_n B₁, ..., ξ_n B_k) →d (ξB₁, ..., ξB_k) for all B₁, ..., B_k ∈ Ŝ_ξ, k ∈ ℕ.
If ξ is a simple point process or a diffuse random measure, it is also equivalent that
(iv) ξ_n B →d ξB for all B ∈ Ŝ_ξ.

Proof: By Theorems 4.27 and A2.3 (iii), condition (i) implies both (ii) and (iii). Conversely, Lemma 16.15 shows that (ξ_n) is relatively compact in distribution under both (ii) and (iii). Arguing as in the proof of Lemma 16.2, it remains to show for any random measures ξ and η on S that ξ =d η if ξf =d ηf for all f ∈ C_K^+, or if
  (ξB₁, ..., ξB_k) =d (ηB₁, ..., ηB_k),  B₁, ..., B_k ∈ Ŝ_{ξ+η}, k ∈ ℕ. (16)
In the former case, this holds by Lemma 12.1; in the latter case it follows by a monotone class argument from Theorem A2.3 (iv). The last assertion is obtained in a similar way from a suitable version of Theorem 12.8 (iii). □

Weaker conditions are required for convergence to a simple point process, as suggested by Theorem 12.8. The following conditions are only sufficient, and a precise criterion is given in Theorem 16.29. Here a class U ⊂ Ŝ is said to be separating if, for any compact and open sets K and G with K ⊂ G, there exists some U ∈ U with K ⊂ U ⊂ G. Furthermore, we say that I ⊂ Ŝ is preseparating if the finite unions of sets in I form a separating class. Applying Lemma A2.6 to the function h(B) = Ee^{−ξB}, we note that the class Ŝ_ξ is separating for any random measure ξ. For Euclidean spaces S, a preseparating class typically consists of rectangular boxes, whereas the corresponding finite unions form a separating class.

Proposition 16.17 (convergence of point processes) Let ξ, ξ₁, ξ₂, ... be point processes on an lcscH space S, where ξ is simple, and fix a separating class U ⊂ Ŝ. Then ξ_n →d ξ under these conditions:
(i) P{ξ_n U = 0} → P{ξU = 0} for all U ∈ U;
(ii) limsup_n Eξ_n K ≤ EξK < ∞ for all compact sets K ⊂ S.

Proof: First note that both (i) and (ii) extend by suitable approximation to sets in Ŝ_ξ. By the usual compactness argument together with Lemma 4.11, it is enough to prove that a point process η is distributed as ξ whenever
  P{ηB = 0} = P{ξB = 0},   EηB ≤ EξB,   B ∈ Ŝ_{ξ+η}.
Here the first relation yields η* =d ξ as in Theorem 12.8 (i). From the second relation we then obtain EηB ≤ Eη*B for all B ∈ Ŝ, which shows that η is a.s. simple.
□

We may illustrate the use of Theorem 16.16 by showing how Poisson and Cox processes may arise as limits under superposition or thinning. Say that the random measures ξ_nj, n, j ∈ ℕ, form a null array if they are independent for fixed n and such that, for every B ∈ Ŝ, the random variables ξ_nj B form a null array in the sense of Chapter 5. The following result is a point process version of Theorem 5.7.
Theorem 16.18 (convergence of superpositions, Grigelionis) Let (ξ_nj) be a null array of point processes on an lcscH space S, and consider a Poisson process ξ on S with Eξ = μ. Then Σ_j ξ_nj →d ξ iff these conditions hold:
(i) Σ_j P{ξ_nj B > 0} → μB for all B ∈ Ŝ_μ;
(ii) Σ_j P{ξ_nj B > 1} → 0 for all B ∈ Ŝ.

Proof: If Σ_j ξ_nj →d ξ, then Σ_j ξ_nj B →d ξB for all B ∈ Ŝ_μ by Theorem 16.16, which implies (i) and (ii) by Theorem 5.7.

Conversely, assume (i) and (ii). To prove that Σ_j ξ_nj →d ξ, we may restrict our attention to an arbitrary compact set C ∈ Ŝ_μ. For notational convenience, we may also assume that S itself is compact. Now define η_nj = ξ_nj 1{ξ_nj S ≤ 1}, and note that (i) and (ii) remain true for the array (η_nj). Moreover, Σ_j η_nj →d ξ implies Σ_j ξ_nj →d ξ by Theorem 4.28. This reduces the discussion to the case when ξ_nj S ≤ 1 for all n and j. Now define μ_nj = Eξ_nj. By (i) we get
  Σ_j μ_nj B = Σ_j Eξ_nj B = Σ_j P{ξ_nj B > 0} → μB,  B ∈ Ŝ_μ,
and so Σ_j μ_nj →w μ by Theorem 4.25. Noting that m(1 − e^{−f}) = 1 − e^{−mf} when m = δ_x or 0 and writing ξ_n = Σ_j ξ_nj, we get by Lemmas 5.8 and 12.2 (i)
  Ee^{−ξ_n f} = Π_j Ee^{−ξ_nj f} = Π_j E{1 − ξ_nj(1 − e^{−f})}
    = Π_j {1 − μ_nj(1 − e^{−f})} ~ exp{−Σ_j μ_nj(1 − e^{−f})}
    → exp(−μ(1 − e^{−f})) = Ee^{−ξf}. □

We may next establish a basic limit theorem for independent thinnings of point processes.

Theorem 16.19 (convergence of thinnings) For every n ∈ ℕ, let ξ_n be a p_n-thinning of some point process η_n on S, where S is lcscH and p_n → 0. Then ξ_n →d some ξ iff p_n η_n →d some η, in which case ξ is distributed as a Cox process directed by η.

Proof: For any f ∈ C_K^+, we get by Lemma 12.2
  Ee^{−ξ_n f} = E exp(η_n log{1 − p_n(1 − e^{−f})}).
Noting that px ≤ −log(1 − px) ≤ −x log(1 − p) for any p, x ∈ [0, 1) and writing p̃_n = −log(1 − p_n), we obtain
  E exp{−p̃_n η_n(1 − e^{−f})} ≤ Ee^{−ξ_n f} ≤ E exp{−p_n η_n(1 − e^{−f})}. (17)
If p_n η_n →d η, then even p̃_n η_n →d η, and so by Lemma 12.2
  Ee^{−ξ_n f} → E exp{−η(1 − e^{−f})} = Ee^{−ξf},
16. Convergence of Random Processes, Measures, and Sets 319

where $\xi$ is a Cox process directed by $\eta$. Hence, $\xi_n \overset{d}{\to} \xi$.

Conversely, assume that $\xi_n \overset{d}{\to} \xi$. Fix any $g \in \hat{C}_K^+$ and let $0 < t < \|g\|^{-1}$. Applying (17) with $f = -\log(1 - tg)$, we get
$$\liminf_{n\to\infty} E\exp\{-tp_n\eta_n g\} \ge E\exp\{\xi\log(1 - tg)\}.$$
Here the right-hand side tends to 1 as $t \to 0$, and so by Lemmas 5.2 and 16.15 the sequence $(p_n\eta_n)$ is tight. For any subsequence $N' \subset \mathbb{N}$, we may then choose a further subsequence $N''$ such that $p_n\eta_n \overset{d}{\to}$ some $\eta$ along $N''$. By the direct assertion, $\xi$ is then distributed as a Cox process directed by $\eta$, which by Lemma 12.6 determines the distribution of $\eta$. Hence, $p_n\eta_n \overset{d}{\to} \eta$ remains true along the original sequence. $\Box$

The last result leads in particular to an interesting characterization of Cox processes.

Corollary 16.20 (Cox processes and thinnings, Mecke) Let $\xi$ be a point process on $S$. Then $\xi$ is Cox iff for every $p \in (0,1)$ there exists a point process $\xi_p$ such that $\xi$ is distributed as a $p$-thinning of $\xi_p$.

Proof: If $\xi$ and $\xi_p$ are Cox processes directed by $\eta$ and $\eta/p$, respectively, then Proposition 12.3 shows that $\xi$ is distributed as a $p$-thinning of $\xi_p$. Conversely, assuming the stated condition for every $p \in (0,1)$, we note that $\xi$ is Cox by Theorem 16.19. $\Box$

The previous theory will now be used to derive a general limit theorem for sums of exchangeable random variables. The result applies in particular to sequences obtained by sampling without replacement from a finite population. It is also general enough to contain a version of Donsker's theorem. The appropriate function space in this case is $D([0,1], \mathbb{R}) = D[0,1]$, to which the results for $D(\mathbb{R}_+)$ apply with obvious modifications.

For motivation, we begin with a description of the possible limits, which are precisely the exchangeable processes on $[0,1]$. Here we say that a process $X$ on $[0,1]$ is exchangeable if it is continuous in probability with $X_0 = 0$ and has exchangeable increments over any set of disjoint intervals of equal length.
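A companion sketch for Theorem 16.19 (again purely illustrative, with hypothetical names): we thin ever denser processes with retention probability $p_n \to 0$ while $p_n\eta_n$ stays stable. Randomizing the intensity makes the limit a genuinely mixed Poisson, i.e. Cox, process, visible as overdispersion (variance exceeding the mean) in the counts.

```python
import random

def thinned_count(n_points, p, rng):
    """Number of points retained after an independent p-thinning."""
    return sum(1 for _ in range(n_points) if rng.random() < p)

rng = random.Random(1)
p = 0.01                                 # small retention probability
samples = []
for _ in range(4000):
    lam = rng.choice([1.0, 3.0])         # random directing intensity
    n_points = int(lam / p)              # so p * eta_n has total mass ~ lam
    samples.append(thinned_count(n_points, p, rng))

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(round(mean, 2), round(var, 2))
# A Poisson limit would give var ~ mean; here var ~ E(lam) + Var(lam)
# exceeds mean ~ E(lam), the signature of a Cox (mixed Poisson) limit.
```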
The following result is a finite-interval version of Theorem 11.15.

Theorem 16.21 (exchangeable processes on [0,1]) A process $X$ on $[0,1]$ is exchangeable iff it has a version with representation
$$X_t = \alpha t + \sigma B_t + \sum_j \beta_j(1\{\tau_j \le t\} - t), \qquad t \in [0,1], \qquad (18)$$
for some Brownian bridge $B$, some independent i.i.d. $U(0,1)$ random variables $\tau_1, \tau_2, \ldots$, and some independent set of coefficients $\alpha$, $\sigma$, and $\beta_1, \beta_2, \ldots$ such that $\sum_j \beta_j^2 < \infty$ a.s. In that case, the sum in (18) converges in probability, uniformly on $[0,1]$, toward an rcll limit.

In particular, we note that a simple point process on $[0,1]$ is symmetric with respect to Lebesgue measure $\lambda$ iff it is a mixed binomial process based on $\lambda$, in agreement with Theorem 12.12. Combining the present result with
320 Foundations of Modern Probability Theorem 11.15, we also see that a continuous process X on JR+ or [0, 1] with Xo = 0 is exchangeable iff it can be written in the form Xt = at + uBt, where B is a Brownian motion or bridge, respectively, and (a,O") is an independent pair of random variables. We first examine the convergence of the series in (18). Lemma 16.22 (convergence of series) For any t E (0,1), the series in (18) converges a.s. iff Ej!3J < 00 a.s. In that case, it converges in proba- bility with respect to the uniform metric on [0, 1], and the sum has a version in D[O, 1]. Proof: For both assertions, we may assume that the coefficients {3j are nonrandom. Then for fixed t E (0,1), the terms are independent and bounded with mean 0 and variance !3Jt(l - t), and so by Theorem 4.18 the series converges iff Lj !3J < 00. To prove the second assertion, let x n denote the nth partial sum in (18), and note that the processes Mf == Xf /(1 - t) are L 2 -martingales on [0,1) with respect to the filtration induced by the processes 1{ Tj < t}. By Doob's inequality we have for any m < nand t E [0, 1) E(X n - X m );2 < E(M n - M m );2 < 4E(Mr - M;n)2 - 4(1 - t)-2 E(Xf - X;n)2 < 4t(1- t)-l '" (3 j>m J' which tends to 0 as m --t 00 for fixed t. Hence, (xn - X); -+ 0 a.s. along p a subsequence for some process X, and then also (xn - X); -+ 0 along - - p N. By symmetry we have also (X n - X); -4 0 for the reflected processes Xt == Xl-t- and X;" = Xf-t-, and so by combination (xn-x)  o. The last assertion now follows from the fact that X n is rcll for every n. 0 We plan to prove Theorem 16.21 together with the following approxima- tion result. Here we consider for every n E N some exchangeable random variables nj, j < m n , where m n -+ 00, and introduce the summation processes X;" == . nj, t E [0,1], n E N. L...,,; J '5:. m n t (19) Our aim is to show that the xn can be approximated by exchangeable processes as in (18). 
The convergence criteria will be stated in terms of the random variables and measures
$$\alpha_n = \sum_j \xi_{nj}, \qquad \kappa_n = \sum_j \xi_{nj}^2\,\delta_{\xi_{nj}}, \qquad n \in \mathbb{N}, \qquad (20)$$
$$\kappa = \sigma^2\delta_0 + \sum_j \beta_j^2\,\delta_{\beta_j}. \qquad (21)$$
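For nonrandom coefficients, the pair in (21) determines the second moments of (18): $\mathrm{Var}\,X_t = (\sigma^2 + \sum_j \beta_j^2)\,t(1-t)$, since the bridge and each centered jump term contribute variance $t(1-t)$ per unit weight. A small Monte Carlo sketch (all names and parameter values illustrative) checks this at a single time point, using the exact $N(0, t(1-t))$ marginal of the Brownian bridge:

```python
import random

def X_at(t, a, sigma, betas, rng):
    """One sample of X_t from representation (18), at a single time t."""
    bridge = rng.gauss(0.0, (t * (1 - t)) ** 0.5)   # B_t ~ N(0, t(1-t))
    jumps = sum(b * ((1.0 if rng.random() <= t else 0.0) - t) for b in betas)
    return a * t + sigma * bridge + jumps

rng = random.Random(2)
a, sigma, betas, t = 1.0, 0.8, [0.5, -0.3, 0.2], 0.5
xs = [X_at(t, a, sigma, betas, rng) for _ in range(20000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
target = (sigma ** 2 + sum(b * b for b in betas)) * t * (1 - t)
print(round(mean, 3), round(var, 3), round(target, 3))
# mean should be near a*t; var should be near target, the total mass of
# the measure in (21) times t(1-t)
```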
Theorem 16.23 (approximation of exchangeable sums) For every $n \in \mathbb{N}$, consider some exchangeable random variables $\xi_{nj}$, $j \le m_n$, and define $X^n$, $\alpha_n$, and $\kappa_n$ by (19) and (20). Assume $m_n \to \infty$. Then $X^n \overset{d}{\to}$ some $X$ in $D[0,1]$ iff $(\alpha_n, \kappa_n) \overset{d}{\to}$ some $(\alpha, \kappa)$ in $\mathbb{R} \times \mathcal{M}(\overline{\mathbb{R}})$, in which case $X$ and $(\alpha, \kappa)$ are related by (18) and (21).

Our proof is based on three auxiliary results. We begin with a simple randomization lemma, which will enable us to reduce the proof to the case of nonrandom coefficients. Recall that if $\nu$ is a measure on $S$ and $\mu$ is a kernel from $S$ to $T$, then $\nu\mu$ denotes the measure $\int \mu(s, \cdot)\,\nu(ds)$ on $T$. For any measurable function $f : T \to \mathbb{R}_+$, we define the measurable function $\mu f$ on $S$ by $\mu f(s) = \int \mu(s, dt)f(t)$.

Lemma 16.24 (randomization) For any metric spaces $S$ and $T$, let $\nu, \nu_1, \nu_2, \ldots$ be probability measures on $S$ with $\nu_n \overset{w}{\to} \nu$, and let $\mu, \mu_1, \mu_2, \ldots$ be probability kernels from $S$ to $T$ such that $s_n \to s$ in $S$ implies $\mu_n(s_n, \cdot) \overset{w}{\to} \mu(s, \cdot)$. Then $\nu_n\mu_n \overset{w}{\to} \nu\mu$.

Proof: Fix any bounded, continuous function $f$ on $T$. Then $\mu_n f(s_n) \to \mu f(s)$ as $s_n \to s$, and so by Theorem 4.27
$$(\nu_n\mu_n)f = \nu_n(\mu_n f) \to \nu(\mu f) = (\nu\mu)f. \qquad \Box$$

To establish tightness of the random measures $\kappa_n$, we need the following conditional hyper-contractivity criterion.

Lemma 16.25 (hyper-contraction and tightness) Let the random variables $\xi_1, \xi_2, \ldots \ge 0$ and $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be such that, for some $a > 0$,
$$E[\xi_n^2\,|\,\mathcal{F}_n] \le a\,(E[\xi_n\,|\,\mathcal{F}_n])^2 < \infty \quad \text{a.s.}, \qquad n \in \mathbb{N}.$$
Then if $(\xi_n)$ is tight, so is the sequence $\eta_n = E[\xi_n\,|\,\mathcal{F}_n]$, $n \in \mathbb{N}$.

Proof: By Lemma 4.9 we need to show that $c_n\eta_n \overset{P}{\to} 0$ whenever $0 \le c_n \to 0$. Then conclude from Lemma 4.1 that, for any $r \in (0,1)$ and $\varepsilon > 0$,
$$0 < (1 - r)^2 a^{-1} \le P[\xi_n > r\eta_n\,|\,\mathcal{F}_n] \le P[c_n\xi_n > r\varepsilon\,|\,\mathcal{F}_n] + 1\{c_n\eta_n \le \varepsilon\}.$$
Here the first term on the right tends in probability to 0, since $c_n\xi_n \overset{P}{\to} 0$ by Lemma 4.9. Hence, $1\{c_n\eta_n \le \varepsilon\} \overset{P}{\to} 1$, which means that $P\{c_n\eta_n > \varepsilon\} \to 0$. Since $\varepsilon$ is arbitrary, we get $c_n\eta_n \overset{P}{\to} 0$.
$\Box$

Since the summation processes in (19) will be approximated by exchangeable processes, as in Theorem 16.21, we finally need a convergence criterion for the latter. This result also has some independent interest.
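Concretely, for sampling without replacement from a finite population, the summation processes of (19) depend on the random order of the draws, while the statistics $\alpha_n$ and the total mass of $\kappa_n$ from (20) do not; this permutation invariance is exactly the exchangeability exploited below. A minimal sketch (function and variable names illustrative):

```python
import random

def summation_process(xi, t):
    """X^n_t of (19): sum of the first floor(m_n * t) variables."""
    return sum(xi[: int(len(xi) * t)])

rng = random.Random(3)
population = [rng.gauss(0, 1) for _ in range(200)]
xi = population[:]
rng.shuffle(xi)              # a uniformly random draw order

alpha_n = summation_process(xi, 1.0)      # alpha_n of (20)
kappa_mass = sum(x * x for x in xi)       # total mass of kappa_n in (20)

# Both statistics are invariant under the permutation, although the
# path t -> X^n_t itself is not.
assert abs(alpha_n - sum(population)) < 1e-9
assert abs(kappa_mass - sum(x * x for x in population)) < 1e-9
print(round(alpha_n, 3), round(kappa_mass, 3))
```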
322 Foundations of Modern Probability Proposition 16.26 (convergence of exchangeable processes) Let the pro- cesses X n and pairs (an, Kn) be related as in (18) and (21). Then X n  some X in D[O, 1] iff (an, h: n )  some (a, h:) in R x M( JR ), in which case even X and (o:,h:) are related by (18) and (21). Proof' First let (on, Kn)  (a, K). To prove xn  X for the correspond- ing processes in (18), it suffices by Lemma 16.24 to assume that all the an and Kn are nonrandom. Thus, we may restrict our attention to processes X n with constant coefficients an, (J' n, and (3nj, j EN. To prove that xn f d ) X, we begin with four special cases. First we note that if an -+ a, then trivially O:nt  at uniformly on [0,1]. Similarly, an -+ a implies anB -+ (J'B in the same sense. Next we consider the case when an == an == 0 and {3n,m+l == !3n,m+2 == . . · == 0 for some fixed mEN. Here we may assume that even a == a == 0 and 13m+l == /3m+2 == . . . == 0, and that moreover !3nj -+ /3j for all j. The convergence xn -+ X is then obvious. Finally, we may assume that an == an == 0 and a == (3l == /32 == ... == O. Then maxj l13nj I -+ 0, and for any s < t we have E(Xr; Xf) == s(1 - t) I: ./3j -+ s(1 - t)a 2 == E(XsX t ). (22) J In this case, X n f d ) X by Theorem 5.12 and Corollary 5.5. By indepen- dence we may combine the four special cases to obtain X n f d ) X whenever j3j == 0 for all but finitely many j. From here on, it is easy to extend to the general case by means of Theorem 4.28, where the required uniform error estimate may be obtained as in (22). To strengthen the convergence to xn  X in D[O,l], it is enough to verify the tightness criterion in Theorem 16.11. Thus, for any Xn-optional times 'Tn and positive constants h n --t 0 with Tn + h n < 1, we need to show that X+hn - Xn  o. By Theorem 11.13 and a simple approximation, it is equivalent that X hn  0, which is clear since E(X hn )2 == ha + h n (l - hn)KnIR -7 O. 
To obtain the reverse implication, assume that $X^n \overset{d}{\to} X$ in $D[0,1]$ for some process $X$. Since $\alpha_n = X_1^n \overset{d}{\to} X_1$, the sequence $(\alpha_n)$ is tight. Next define for $n \in \mathbb{N}$
$$\eta_n = 2X_{1/2}^n - X_1^n = 2\sigma_n B_{1/2} + 2\sum_j \beta_{nj}\big(1\{\tau_j \le \tfrac{1}{2}\} - \tfrac{1}{2}\big).$$
Then
$$E[\eta_n^2\,|\,\kappa_n] = \sigma_n^2 + \sum_j \beta_{nj}^2 = \kappa_n\overline{\mathbb{R}},$$
$$E[\eta_n^4\,|\,\kappa_n] \le 3\Big\{\sigma_n^2 + \sum_j \beta_{nj}^2\Big\}^2 - 2\sum_j \beta_{nj}^4 \le 3(\kappa_n\overline{\mathbb{R}})^2.$$
16. Convergence of Random Processes, Measures, and Sets 323 Since ('TJn) is tight, Lemmas 16.15 and 16.25 show that even (n) is tight, and so the same thing is true for the sequence of pairs (an, n)' The tightness implies relative compactness in distribution, and so every subsequence contains a further subsequence that converges in  x M( JR ) toward some random pair (a, ). Since the measures in (21) form a vaguely closed subset of M(  ), the limit"" has the same form for suitable a and {31, {32,. .. . The direct assertion then yields X n  Y with Y as in (18), and therefore X d Y. Now the coefficients in (18) are measurable functions of Y, and so the distribution of (Q, ) is uniquely determined by that of X. Thus, the limiting distribution is independent of subsequence, and the convergence (an, Kn)  (a, K) remains valid along N. We may finally use Corollary 6.11 to trnsfer the representation (18) to the original process X. 0 Proof of Theorem 16.23: Let 71,12,. .. be i.i.d. U(O,1) and independent of all nj, and define n := L .nj1{Tj < t} := ant + L .nj(1{Tj < t} - t), t E [0,1]. J J Writing nk for the kth jump from the left of yn (including possible 0 jumps when nj := 0), we note that (tnj) d (nj) by exchangeability. Thus, - d - - - X n := X n , where Xr := Ejmnt nj. Furthermore, d(X n , yn) ---t 0 a.s. by Proposition 4.24, where d is the metric in Theorem A2.2. Hence, by Theorem 4.28 it is equivalent to replace xn by yn. But then the assertion follows by Proposition 16.26. 0 Using similar compactness arguments, we may finally prove the main representation theorem for exchangeable processes on [0, 1]. Proof of Theorem 16.21: The sufficiency part being obvious, it is enough to prove the necessity. Thus, assume that X has exchangeable increments. Introd uce the step processes x; = X(2-n[2 n t]), t E [0,1], n E N, define Kn as in (20) in terms of the jump sizes of X n , and put an = X 1. 
If the sequence $(\kappa_n)$ is tight, then $(\alpha_n, \kappa_n) \overset{d}{\to} (\alpha, \kappa)$ along some subsequence, and by Theorem 16.23 we get $X^n \overset{d}{\to} Y$ along the same subsequence, where $Y$ can be represented as in (18). In particular, $X^n \overset{fd}{\to} Y$, and so the finite-dimensional distributions of $X$ and $Y$ agree for dyadic times. The agreement extends to arbitrary times, since both processes are continuous in probability. By Lemma 3.24 it follows that $X$ has a version in $D[0,1]$, and by Corollary 6.11 we obtain the desired representation.

To prove the required tightness of $(\kappa_n)$, denote the increments in $X^n$ by $\xi_{nj}$, put $\zeta_{nj} = \xi_{nj} - 2^{-n}\alpha_n$, and note that
$$\kappa_n\overline{\mathbb{R}} = \sum_j \xi_{nj}^2 = \sum_j (\zeta_{nj} + 2^{-n}\alpha_n)^2. \qquad (23)$$
324 Foundations of Modern Probability Writing TIn = 2Xr!2 - Xr = 2X 1j2 - Xl and noting that E j (nj = 0, we get the elementary estimates E[1J;In] :S L/j + Li#j'i'j = {Lj'j} 2 :S (E[1J;In])2. Since 'r/n is independent of n, the sequence of sums E j (j is tight by Lemma 16.25, and the tightness of (n) follows by (23). 0 For measure-valued processes X n with rcll paths, we show that tight- ness can be characterized in terms of the real-valued projections Xl" f == f f(s)XJ:(ds), f E C1(. Theorem 16.27 (measure-valued processes) Let Xl, X 2 ,. .. be random elements in D(1R+, M (8)), where S is lcscH. Then (X n ) is tight iff (X n f) is tight in D(1R+, 1R+) for every 1 E ct (8). Proof: Assume that (X n f) is tight for every f E Cj(, and fix any £ > o. Let 11,12,... be such as in Theorem A2.4, and choose some compact sets B 1 , B2, . . . c D(IR+, IR+) with P{Xnfk E B k } > 1-c2- k , k,n E N. (24) Then A = nk{p,; P,fk E Bk} is relatively compact in D(JR+,M(S)), and (24) yields P{X n E A} > 1 - c. 0 We turn our attention to random sets. Then fix an IcscH space S, and let F, Q, and /C denote the classes of closed, open, and compact subsets, respectively. We endow :F with the so-called Fell topology, generated by the sets {F; F n G =f:. 0} and {F; F n K == 0} for arbitrary G E g and K E /C. Some basic properties of this topology are summarized in Theorem A2.5. In particular, F is compact and metrizable, and {F; F n B = 0} is universally measurable for every B E S. By a random closed set in S we mean a random element c.p in F. In this context we often write c.p n B = c.pB, and we note that the probabilities P{ c.pB = 0} are well defined. For any random closed set c.p, we introduce the class Sip = {B E Sj P{cpBo = 0} = P{ cpB = 0}} , which is separating by Lemma A2.6. We may now state the basic con- vergence criterion for random sets. It is interesting to note the formal agreement with the first condition in Proposition 16.17. 
16. Convergence of Random Processes, Measures, and Sets 325 Theorem 16.28 (convergence of random sets, Norberg) Let C{J, <PI, C{J2,... be random closed sets in an LescH space S. Then 4?n  <p iff P{C{JnU==0}.-tP{<pU==0}, UEU, (25) for some separating class U c S, in which case we may take U == S<.p. Proof: Write h(B) == P{<pB =1= 0} and hn(B) == P{C{Jn B i= 0}. If <Pn  <P, then by Theorem 4.25, h(BO) < liminf hn(B) < limsuphn(B) < h( B ), n-+oo BE S, n-+oo and so for any B E S'P we get hn(B)  h(B). A Next assume that (25) holds for some separating class U. Fix any B E Scp, and conclude from (25) that, for any U, V E U with U c B c V, h(U) < liminf hn(B) < limsuphn(B) < h(V). n-+oo n-+oo Since U is separating, we may let U t B O to get {<pU =I 0} t {<pBo 1= 0} and hence h(U) t h(BO) = h(B). Next choose some sets V E U with V .t. B , and conclude by the finite intersection property that {<p V =1= 0} t {<p B =1= 0}, which gives h(V) t ( B ) == h(B). Thus, hn(B) .-t h(B), and so (25) remains true for U == S'P' Since F is compact, the sequence {<Pn} is relatively compact by Theorem d 16.3. Thus, for any subsequence N' c N, we have <(In --t 1jJ along a further subsequence for some random closed set 'ljJ. By the direct statement together with (25) we get " A P{ cpB = 0} == P{ 1/JB == 0}, B E S<p n 5'1/;. (26) Since Scp ns"p is separating by Lemma A2.6, we may approximate as before to extend (26) to arbitrary compact sets B. The class of sets {F; FnK == 0} with K compact is clearly a 7f-system, and so a monotone class argument gives <p d 1/J. Since N' is arbitrary, we obtain 'Pn  <p along N. 0 Simple point processes allow the dual descriptions as integer-valued ran- dom measures or locally finite random sets. The corresponding notions of convergence are different, and we proceed to clarify their relationship. Since the mapping J.L ...-.+ sUPP J-L is continuous on N(S), we note that n   im- plies supp n  supp . 
Conversely, assuming the intensity measures $E\xi$ and $E\xi_n$ to be locally finite, we see from Proposition 16.17 and Theorem 16.28 that $\xi_n \overset{d}{\to} \xi$ whenever $\mathrm{supp}\,\xi_n \overset{d}{\to} \mathrm{supp}\,\xi$ and $E\xi_n \overset{v}{\to} E\xi$. The next result gives a general criterion.
326 Foundations of Modern Probability Theorem 16.29 (supports of point processes) Let , 1, 2, . .. be point processes on an lcscH space S, where  is simple, and fix a preseparating  d d class I c Sf,.. Then n -t  iff supp n -t supp  and limsupP{nI > I} < P{I > I}, I E I. (27) n-+oo f Proof: By Corollary 6.12 we may assume that sUPPn -=-t sUPP a.s., and since  is simple we get by Proposition A2.8 limsuP(nB 1\ 1) < B < liminf nB a.s., B E B. (28) n-+oo noo Next we have for any a, b E Z+ {b < a < l}C {a>1}U{a<bI\2} - {b>l}U{a==O,b=l}U{a>l > b}, where all unions are disjoint. Substituting a == I and b == nI, we get by (27) and (28) !im P{I < f,nI 1\ 2} = 0, I E I. nCX) (29) Next let Bel E I and B' == 1\ B, and note that {nB > B} c {nI > I} U {nB' < B'} c {nI 1\ 2 > I} U {I > I} U {nB' < B'}. (30) More generally, assume that B E B is covered by 11,. .. ,1m E I. It may then be partitioned into sets Bk E Bt,. n Ik, k == 1, . . . , m, and by (28), (29), and (30) we get limsupP{nB > B} < pU {lk > I}. n-+oo k (31) Now let B E B and K E K with B c KO. Fix a metric d in S and let E > O. Since I is preseparating, we may choose some 11,. . ., 1m E I with d- diameters < e such that B C Uk Ik C K. Letting PK denote the minimum d-distance between points in K n supp , it follows that the right-hand side of (31) is bounded by P{PK < E}. Since PK > 0 a.s. and E > 0 is arbitrary, we get P{nB > B} -+ o. In view of the second relation in (28), we obtain P d nB -+ B. Thus, n -t  by Theorem 16.16. 0 Exercises 1. For any metric space (S,p), show that if x n -+ x in D(JR+,S) with x continuous, then SUPs<t p(x, xs) -+ 0 for every t > o. (Hint: Note that x is uniformly continuous on every interval [0, t].) 
16. Convergence of Random Processes, Measures, and Sets 327 2. For any separable and complete metric space (8, p), show that if X n  X in D(IR+, S) with X continuous, there exist some processes yn d x n such that sUPs<t p(Ysn, Xs) -+ 0 a.s. for every t > O. (Hint: lTse the preceding result together with Theorems 4.30 and A2.2.) 3. Give an example where X n -+ x and Yn ---t Y in D(+, JR.) and yet (xn, Yn) -1+ (x, y) in D(]R+, }R2). 4. Let X and X n be random elements in D(R+, ]Rd) with X n f d > X and such that uX n  uX in D (IR+ , IR) for every u E IR d. Show that X n  X. (Hint: Proceed as in Theorems 16.27 and A2.4.) 5. Let f be a continuous mapping between two metric spaces Sand T, where S is separable and complete. Show that if X n  X in D(R+, S), then f(X n )  f(X) in D(+, T). (Hint: By Theorem 4.27 it suffices to show that X n -t x in D(JR+, S) implies f(xn) ---t f(x) in D(IR+, T). Since A = {x, Xl, X2,...} is relatively compact in D(JR+, S), Theorem A2.2 shows that U t = Us<t 1fsA is relatively compact in 8 for every t > o. Hence, f is uniformly continuous on each U t .) 6. Show by an example that the condition in Theorem 16.11 is not necessary for tightness. (Hint: Consider nonrandom processes X n. ) 7. In Theorem 16.11, show that it is enough to consider optional times taking finitely many values. (Hint: Approximate from the right and use the right-continuity of the paths.) 8. Let the process X on + be continuous in probability with values in a separable and complete metric space (8, p). Assume that p(X Tn , XTn+h n )  o for any bounded optional times Tn and constants h n  O. Show that X has an rcll version. (Hint: Approximate by suitable step processes and use Theorems 16.10 and 16.11.) 9. Extend Corollary 16. 7 to random vectors in JR d . 10. Let X, Xl, X 2 ,. .. be Levy processes in ]Rd. Show that xn  X in D(R+,JR d ) iff Xl  Xl in Rd. Compare with Theorem 15.17. 11. 
Show that conditions (iii) and (iv) of Theorem 16.16 remain sufficient if we replace $\hat{S}$ by an arbitrary separating class. (Hint: Restate the conditions in terms of Laplace transforms, and extend to $\hat{S}$ by a suitable approximation.)

12. Deduce Theorem 16.18 from Theorem 5.7. (Hint: First assume that $\mu$ is diffuse and use Theorem 16.17. Then extend to the general case by a suitable randomization.)

13. Strengthen the conclusion in Theorem 16.19 to $(\xi_n, p_n\eta_n) \overset{d}{\to} (\xi, \eta)$, where $\xi$ is a Cox process directed by $\eta$.
328 Foundations of Modern Probability 14. For any lcscH space S, let ,1,2,... be Cox processes on S directed by 1], 'fJl, 1]2, . .. . Show that n   iff 'rJn  "I. Prove the corresponding result for p-thinnings with a fixed p E (0,1). 15. Let 1], 1]1 , TJ2, . .. be.A- randomizations of some point processes ,  1 , 2, . .. on an lcscH space S. Show that n   iff 1Jn  'fl. 16. Specialize Theorem 16.23 to suitably normalized sequences of i.i.d. random variables, and compare with Corollary 16.7. 17. Let X be a continuous process on I = ffi:+ or [0,1] with Xo == o. Show that X is exchangeable iff a.s. Xt == at + {J' Bt, tEl, for some Brownian motion or bridge B and some independent pair of random variables Q and a > o. Also show that Q and a are a.s. unique. (Hint: For the last assertion, use the laws of large numbers and the iterated logarithm.) 18. Characterize the Levy processes on [0,1] as special exchangeable processes, in terms of the coefficients in Theorem 16.21. 19. Show that a process X on JR+ is exchangeable iff it has a version that is conditionally Levy with random characteristics (0:, /3, v). (Hint: Theorem 16.21 shows that X has an rcll version. By Theorem 11.15 it is then condi- tionally Levy, given some a-field I. Finally, the characteristics (0:, /3, v) of X are I-measurable by the law of large numbers.) 20. Let X be an rcll, exchangeable process on R+. Show directly from Corollary 16.20 and Theorem 16.21 that the point process of jump sizes on [0,1] is Cox. Also conclude from Theorem 16.19 and the law of large numbers that the point process of jump times and sizes is Cox with directing random measure of the form v Q9 A. 21. For an IcscH space S, let U C S be separating. Show that if KeG with K compact and G open, there exists some U E U with K c UO c U c G. (Hint: First choose B, C E S with K c B O c B c Co c C c G.) 
Chapter 17

Stochastic Integrals and Quadratic Variation

Continuous local martingales and semimartingales; quadratic variation and covariation; existence and basic properties of the integral; integration by parts and Itô's formula; Fisk-Stratonovich integral; approximation and uniqueness; random time-change; dependence on parameter

This chapter introduces the basic notions of stochastic calculus in the special case of continuous integrators. As a first major task, we shall construct the quadratic variation $[M]$ of a continuous local martingale $M$, using an elementary approximation and completeness argument. The processes $M$ and $[M]$ will be related by some useful continuity and norm relations, most importantly by the powerful BDG inequalities.

Given the quadratic variation $[M]$, we may next construct the stochastic integral $\int V\,dM$ for suitable progressive processes $V$, using a simple Hilbert space argument. Combining with the ordinary Stieltjes integral $\int V\,dA$ for processes $A$ of locally finite variation, we may finally extend the integral to arbitrary continuous semimartingales $X = M + A$. The continuity properties of quadratic variation carry over to the stochastic integral, and in conjunction with the obvious linearity they characterize the integration.

The key result for applications is Itô's formula, which shows how semimartingales are transformed under smooth mappings. The present substitution rule differs from the corresponding result for Stieltjes integrals, but the two formulas can be brought into agreement by a suitable modification of the integral. We conclude the chapter with some special topics of importance for applications, such as the transformation of stochastic integrals under a random time-change, and the integration of processes depending on a parameter.

The present material may be regarded as continuing the martingale theory from Chapter 7.
Though no results for Brownian motion are used explicitly in this chapter, the existence of the Brownian quadratic variation in Chapter 13 may serve as a motivation. We shall also need the representation and measurability of limits obtained in Chapter 4. The stochastic calculus developed in this chapter plays an important role throughout the remainder of this book, especially in Chapters 18, 21, 22, and 23. In Chapter 26 the theory is extended to possibly discontinuous semimartingales.
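As a numerical warm-up (a simulation sketch, not part of the text; the step counts are arbitrary), the dyadic quadratic variation of a simulated Brownian path on $[0,1]$ stabilizes near $[B]_1 = 1$, while that of a smooth, finite-variation path vanishes at rate $1/n$, foreshadowing Proposition 17.2 and Theorem 17.5.

```python
import math
import random

def dyadic_qv(path):
    """Sum of squared increments over the grid carried by `path`."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))

def smooth_path(n):
    """A finite-variation path: t -> sin(2 pi t) on an n-point grid."""
    return [math.sin(2 * math.pi * k / n) for k in range(n + 1)]

def brownian_path(n, rng):
    """A random walk with steps of variance 1/n, approximating B on [0, 1]."""
    w, out = 0.0, [0.0]
    for _ in range(n):
        w += rng.gauss(0, 1) / math.sqrt(n)
        out.append(w)
    return out

rng = random.Random(4)
print(round(dyadic_qv(smooth_path(1000)), 4))           # near 0, order 1/n
print(round(dyadic_qv(brownian_path(100000, rng)), 2))  # near [B]_1 = 1
```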
330 Foundations of Modern Probability Throughout the chapter we let F == (Ft) be a right-continuous and complete filtration on 1R+. A process M is said to be a local martingale if it is adapted to :F and such that the stopped and shifted processes MTn - Mo are true martingales for suitable optional times Tn t 00. By a similar localization we may define local L 2 -martingales, locally bounded martingales, locally integrable processes, and so on. The associated optional times Tn are said to form a localizing sequence. Any continuous local martingale may clearly be reduced by localiza- tion to a sequence of bounded, continuous martingales. Conversely, we see by dominated convergence that every bounded local martingale is a true martingale. The following useful result may be less obvious. Lemma 17.1 (localization) Fix any optional times Tn t 00. Then a process M is a local martingale iff M'n has this property for every n. Proof: If M is a local martingale with localizing sequence (un), and if T is an arbitrary optional time, then the processes (M')CT n == (MU n )' are true martingales. Thus, MT is again a local martingale with localizing sequence (an). Conversely, assume that each process MTn is a local martingale with localizing sequence (0';:). Since uk -+ 00 a.s. for each n, we may choose some indices k n with P{ak n < Tn An} < 2- n , n EN. Writing T = Tn A a kn , we get T -+ 00 a.s. by the Borel-Cantelli lemma, and so the optional times r:: = infm>n r:n satisfy r:: t 00 a.s. It remains to " T II note that the processes MTn == (MTn )Tn are true martingales. 0 The next result shows that every continuous martingale of finite variation is a.s. constant. An extension appears as Lemma 25.11. Proposition 17.2 (finite-variation martingales) If M is a continuous local martingale of locally finite variation, then M == Mo a.s. Proof: By localization we may reduce to the case when Mo = 0 and M has bounded variation. 
In fact, let $V_t$ denote the total variation of $M$ on the interval $[0,t]$, and note that $V$ is continuous and adapted. For each $n \in \mathbb{N}$ we may then introduce the optional time $\tau_n = \inf\{t \ge 0;\ V_t = n\}$, and we note that $M^{\tau_n} - M_0$ is a continuous martingale with total variation bounded by $n$. Note also that $\tau_n \to \infty$ and that if $M^{\tau_n} = M_0$ a.s. for each $n$, then even $M = M_0$ a.s.

In the reduced case, fix any $t > 0$, write $t_{n,k} = kt/n$, and conclude from the continuity of $M$ that, a.s.,
$$\zeta_n \equiv \sum_{k \le n} (M_{t_{n,k}} - M_{t_{n,k-1}})^2 \le V_t \max_{k \le n} |M_{t_{n,k}} - M_{t_{n,k-1}}| \to 0.$$
17. Stochastic Integrals and Quadratic Variation 331 Since (n < 2, which is bounded by a constant, it follows by the martingale property and dominated convergence that EM; == E(n ---+ 0, and so Mt == 0 a.s. for each t > O. 0 Our construction of stochastic integrals depends on the quadratic vari- ation and covariation processes, which then need to be constructed first. Here we use a direct approach that has the further advantage of giving some insight into the nature of the basic integration-by-parts formula of Theorem 17.16. An alternative but less elementary approach would be to use the Doob-Meyer decomposition in Chapter 25. The construction utilizes predictable step processes of the form Vi = Lkkl{t > rd = L/7k1(rk,rk+1] (t), t > 0, (1) where the Tn are optional times with Tn t 00 a.s., and the k and 1}k are FTk-measurable random variables for each kEN. For any process X we may introduce the elementary integral process V . X, given as in Chapter 7 by (V. X)t = (t VdX = L k(Xt - X?) = L 1Jk(X;k+l - X;J, (2) J o k k where the series converge since they have only finitely many nonzero terms. Note that (V . X)o = 0 and that V . X inherits the possible continuity properties of X. It is further useful to note that V . X == V . (X - X 0)' The following simple estimate will be needed later. Lemma 17.3 (L 2 -bound) For any continuous L 2 -martingale M with Mo == 0 and predictable step process V with IVI < 1, the process V . M is again an L 2 -martingale, and E(V . M) < EM;. Proof: First assume that the sum in (1) has only finitely many nonzero terms. Then Corollary 7.14 shows that V . M is a martingale, and the L2-bound follows by the computation E(V . M); E Lk 1J(M;k+l - M;J2 E" ( M t - M t ) 2 == EM 2 . k Tk+l Tk t < The estimate extends to the general case by Fatou's lemma, and the martingale property then extends by uniform integrability. 
$\Box$

Let us now introduce the space $\mathcal{M}^2$ of all $L^2$-bounded, continuous martingales $M$ with $M_0 = 0$, and equip $\mathcal{M}^2$ with the norm $\|M\| = \|M_\infty\|_2$. Recall that $\|M^*\|_2 \le 2\|M\|$ by Proposition 7.16.

Lemma 17.4 (completeness) The space $\mathcal{M}^2$ is a Hilbert space.

Proof: Fix any Cauchy sequence $M^1, M^2, \ldots$ in $\mathcal{M}^2$. The sequence $(M_\infty^n)$ is then Cauchy in $L^2$ and thus converges toward some element $\xi \in L^2$.
332 Foundations of Modern Probability Introduce the L2- mar tingale Mt = E[IFt1, t > 0, and note that Moo =  a.s. since  is F oo-measurable. Hence, II(M n - M)*112 < 211M n - Mil == 211M - Moo 112 -+ 0, and so liMn - Mil -+ O. Moreover, (Mn - M)* -+ 0 a.s. along some subsequence, which shows that M is a.s. continuous with Mo = O. 0 We are now ready to prove the existence of the quadratic variation and covariation processes [1\;1] and [M, N]. Extensions to possibly discontinuous processes are considered in Chapter 26. Theorem 17.5 (covariation) For any continuous local martingales !v! and N, there exists an a.s. unique continuous process [M, N] of locally finite variation and with [M, N]o == 0 such that M N - [M, N] is a local martingale. The form [M, N] is a.s. symmetric and bilinear with [M, N] == [M - Mo, N - No] a.s. Furthermore, [M] == [M, M] is a.s. nondecreasing, and for any optional time T, [M T , N] = [MT, NT] == [M, N]T a.s. Proof: The a.s. uniqueness of [M, N] follows from Proposition 17.2, and the symmetry and bilinearity are immediate consequences. If [M, N] exists with the stated properties and T is an optional time, then by Lemma 17.1 the process MT NT - [M, N]T is a local martingale, and so is the process MT(N - NT) by Corollary 7.14. Hence, even MT N - [M, N]T is a local martingale, and so [MT, N] = [MT, NT] == [M, N]T a.s. Furthermore, M N - (M - Mo)(N - No) == MoNo + Mo(N - No) + No(M - Mo) is a local martingale, and so [M - Mo, N - No] = [M, N] a.s. whenever either side exists. If both [M + N] and [M - N1 exist, then 4M N - ([M + N] - [M - N]) == ((M + N)2 - [M + N]) - ((M - N)2 - [M - N]) is a local martingale, and so we may take [M, N] == ([M +N] - [M - N])/4. It is then enough to prove the existence of [M] when Mo = O. First assume that M is bounded. For each n E N, let TO = 0 and define recursively 7;:+1 = inf{t > 7; IMt - MTrl = 2- n }, k > O. Clearly, TT: ---t 00 as k  00 for fixed n. 
Introduce the processes
$$V_t^n = \sum_k M_{\tau_k^n} 1\{t \in (\tau_k^n, \tau_{k+1}^n]\}, \qquad Q_t^n = \sum_k (M_{t\wedge\tau_k^n} - M_{t\wedge\tau_{k-1}^n})^2.$$
The $V^n$ are bounded predictable step processes, and we note that
$$M_t^2 = 2(V^n \cdot M)_t + Q_t^n, \qquad t \ge 0. \qquad (3)$$
By Lemma 17.3 the integrals $V^n \cdot M$ are continuous $L^2$-martingales, and since $|V^n - M| \le 2^{-n}$ for each $n$, we have
$$\|V^m \cdot M - V^n \cdot M\| = \|(V^m - V^n) \cdot M\| \le 2^{-m+1}\|M\|, \qquad m < n.$$
17. Stochastic Integrals and Quadratic Variation 333 Hence, by Lemma 17.4 there exists some continuous martingale N such that (V n . M - N)*  o. The process [M] == M2 - 2N is again continuous, and by (3) we have (Qn _ [M])* = 2(N - V n . M)*  o. In particular, [M] is a.s. nondecreasing on the randorn time set T == { T;:; n, kEN}, and the monotonicity extends by continuity to the clo- sure T . Also note that [M] is constant on each interval in r C , since this is true for M and hence also for every Qn. Thus, [M] is a.s. nondecreasing. Thrning to the unbounded case, we define Tn == inf{t > 0; IMtl == n}, n E N. The processes [MT n ] exist as before, and we note that [MT m JTm =: [MT n ]Tm a.s. for all m < n. Hence, [MT m ] == [MTn] a.s. on [0, Tm], and since Tn -+ 00 there exists a nondecreasing, continuous, and adapted process [M] such that [M] == [MT n ] a.s. on [0, Tn] for each n. Here (MTn)2 - [M]T n is a local martingale for each n, and so M 2 - [M] is a local martingale by Lemma 17.1. 0 We proceed to establish a basic continuity property. Proposition 17.6 (continuity) For any continuous local martingales M n starting at 0, we have M  0 iff [Mn]oo  o. Proof: First let M  O. Fix any c > 0, and define Tn == inf{t > 0; IMn(t)1 > c}, n E N. Write N n = M - [M n ], and note that Nn is a true martingale on JR +. In particular, E[Mn]T n < E 2 , and so by Chebyshev's inequality P{[Mn]oo > E} < P{Tn < oo} + E- 1 E[M n ]Tn < P{M > E} + E. Here the right-hand side tends to zero as n -+ 00 and then c ---t 0, which p shows that [Mn]oo -t 0. The proof in the other direction is similar, except that we need to use a localization argument together with Fatou's lemma to see that a continuous local martingale M with Mo = 0 and E[M]oo < 00 is necessarily £2_ bounded. 0 Next we prove a pair of basic norm inequalities involving the quad- ratic variation, known as the BDG inequalities. Partial extensions to discontinuous martingales are established in Theorem 26.12. 
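For $p = 2$ the BDG inequalities just mentioned are elementary: $EM^{*2}$ and $E[M]_\infty$ are comparable, with Doob's inequality giving $EM^{*2} \le 4EM_\infty^2 = 4E[M]_\infty$. A simulation sketch for Brownian motion on $[0,1]$ (the helper name and sample sizes are illustrative):

```python
import math
import random

def max2_and_qv(n_steps, rng):
    """One Brownian path on [0, 1]: return (sup_t |B_t|)^2 and the
    discretized quadratic variation, whose mean is [B]_1 = 1."""
    w, m, qv = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        step = rng.gauss(0, 1) / math.sqrt(n_steps)
        w += step
        m = max(m, abs(w))
        qv += step * step
    return m * m, qv

rng = random.Random(6)
pairs = [max2_and_qv(2000, rng) for _ in range(2000)]
e_max2 = sum(p[0] for p in pairs) / len(pairs)
e_qv = sum(p[1] for p in pairs) / len(pairs)
print(round(e_max2, 2), round(e_qv, 2))
# E[(B*)^2] should lie between E[B]_1 = 1 and 4 E[B]_1 = 4.
```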
Theorem 17.7 (norm inequalities, Burkholder, Millar, Gundy, Novikov) There exist some constants $c_p \in (0, \infty)$, $p > 0$, such that for any continuous local martingale $M$ with $M_0 = 0$,

$c_p^{-1} E[M]_\infty^{p/2} \le E M^{*p} \le c_p E[M]_\infty^{p/2}, \qquad p > 0.$

Proof: By optional stopping we may assume that $M$ and $[M]$ are bounded. Write $M' = M - M^\tau$ with $\tau = \inf\{t;\ M_t^{*2} = r\}$, and define
$N = (M')^2 - [M']$. By Corollary 7.30 we have for any $r > 0$ and $c \in (0, 2^{-p})$

$P\{M^{*2} > 4r\} - P\{[M]_\infty > cr\} \le P\{M^{*2} > 4r,\ [M]_\infty \le cr\} \le P\{\inf_t N_t \ge -cr,\ \sup_t N_t > r - cr\} \le c\, P\{N^* > 0\} \le c\, P\{M^{*2} \ge r\}.$

Multiplying by $(p/2) r^{p/2-1}$ and integrating over $\mathbb{R}_+$, we get by Lemma 3.4

$2^{-p} E M^{*p} - c^{-p/2} E[M]_\infty^{p/2} \le c\, E M^{*p},$

and the right-hand inequality follows with $c_p = c^{-p/2} / (2^{-p} - c)$.

Next let $N$ be as before with $\tau = \inf\{t;\ [M]_t = r\}$, and write for any $r > 0$ and $c \in (0, 2^{-p/2-2})$

$P\{[M]_\infty > 2r\} - P\{M^{*2} > cr\} \le P\{[M]_\infty > 2r,\ M^{*2} \le cr\} \le P\{\sup_t N_t \le 4cr,\ \inf_t N_t < 4cr - r\} \le 4c\, P\{[M]_\infty \ge r\}.$

Integrating as before yields

$2^{-p/2} E[M]_\infty^{p/2} - c^{-p/2} E M^{*p} \le 4c\, E[M]_\infty^{p/2},$

and the left-hand inequality follows with $c_p = c^{-p/2} / (2^{-p/2} - 4c)$. $\square$

It is often important to decide whether a local martingale is in fact a true martingale. The last proposition yields a useful criterion.

Corollary 17.8 (uniform integrability) Let $M$ be a continuous local martingale satisfying $E(|M_0| + [M]_\infty^{1/2}) < \infty$. Then $M$ is a uniformly integrable martingale.

Proof: By Theorem 17.7 we have $E M^* < \infty$, and the martingale property follows by dominated convergence. $\square$

The basic properties of $[M, N]$ suggest that we think of the covariation process as an inner product. A further justification is given by the following useful Cauchy–Buniakovsky-type inequalities.

Proposition 17.9 (Cauchy-type inequalities, Courrège) For any continuous local martingales $M$ and $N$, we have a.s.

$|[M, N]| \le \int |d[M, N]| \le [M]^{1/2} [N]^{1/2}. \quad (4)$

More generally, we have a.s. for any measurable processes $U$ and $V$

$\int_0^t |UV|\, |d[M, N]| \le (U^2 \cdot [M])_t^{1/2} (V^2 \cdot [N])_t^{1/2}, \qquad t \ge 0.$

Proof: Using the positivity and bilinearity of the covariation, we get a.s. for any $a, b \in \mathbb{R}$ and $t \ge 0$

$0 \le [aM + bN]_t = a^2 [M]_t + 2ab [M, N]_t + b^2 [N]_t.$
By continuity we can choose a common exceptional null set for all $a$ and $b$, and so $[M, N]_t^2 \le [M]_t [N]_t$ a.s. Applying this inequality to the processes $M - M^s$ and $N - N^s$ for any $s < t$, we obtain a.s.

$|[M, N]_t - [M, N]_s| \le ([M]_t - [M]_s)^{1/2} ([N]_t - [N]_s)^{1/2}, \quad (5)$

and by continuity we may again choose a common null set. Now let $0 = t_0 < t_1 < \cdots < t_n = t$ be arbitrary, and conclude from (5) and the classical Cauchy–Buniakovsky inequality that

$|[M, N]_t| \le \sum_k |[M, N]_{t_k} - [M, N]_{t_{k-1}}| \le [M]_t^{1/2} [N]_t^{1/2}.$

To get (4), it remains to take the supremum over all partitions of $[0, t]$.

Next write $d\mu = d[M]$, $d\nu = d[N]$, and $d\rho = |d[M, N]|$, and conclude from (4) that $(\rho I)^2 \le \mu I\, \nu I$ a.s. for every interval $I$. By continuity we may choose the exceptional null set $A$ to be independent of $I$. Letting $G \subset \mathbb{R}_+$ be open with connected components $I_k$ and using the Cauchy–Buniakovsky inequality, we get on $A^c$

$\rho G = \sum_k \rho I_k \le \sum_k (\mu I_k\, \nu I_k)^{1/2} \le \{\sum_j \mu I_j \sum_k \nu I_k\}^{1/2} = (\mu G\, \nu G)^{1/2}.$

By Lemma 1.34 the last relation extends to any $B \in \mathcal{B}(\mathbb{R}_+)$. Now fix any simple measurable functions $f = \sum_k a_k 1_{B_k}$ and $g = \sum_k b_k 1_{B_k}$. Using the Cauchy–Buniakovsky inequality again, we obtain on $A^c$

$\rho |fg| \le \sum_k |a_k b_k|\, \rho B_k \le \sum_k |a_k b_k| (\mu B_k\, \nu B_k)^{1/2} \le \{\sum_j a_j^2 \mu B_j \sum_k b_k^2 \nu B_k\}^{1/2} = (\mu f^2\, \nu g^2)^{1/2},$

which extends by monotone convergence to any measurable functions $f$ and $g$ on $\mathbb{R}_+$. In particular, in view of Lemma 1.33, we may take $f(t) = U_t(\omega)$ and $g(t) = V_t(\omega)$ for fixed $\omega \in A^c$. $\square$

Let $\mathcal{E}$ denote the class of bounded, predictable step processes with jumps at finitely many fixed times. To motivate the construction of general stochastic integrals and for subsequent needs, we shall establish a basic identity for elementary integrals.

Lemma 17.10 (covariation of elementary integrals) For any continuous local martingales $M$, $N$ and processes $U, V \in \mathcal{E}$, the integrals $U \cdot M$ and $V \cdot N$ are again continuous local martingales, and we have

$[U \cdot M, V \cdot N] = (UV) \cdot [M, N] \quad \text{a.s.} \quad (6)$
Proof: We may clearly take $M_0 = N_0 = 0$. The first assertion follows by localization from Lemma 17.3. To prove (6), let $U_t = \sum_{k \le n} \xi_k 1_{(t_k, t_{k+1}]}(t)$, where $\xi_k$ is bounded and $\mathcal{F}_{t_k}$-measurable for each $k$. By localization we may assume $M$, $N$, and $[M, N]$ to be bounded, so that $M$, $N$, and $MN - [M, N]$
are martingales on $\overline{\mathbb{R}}_+$. Then

$E(U \cdot M)_\infty N_\infty = E \sum_j \xi_j (M_{t_{j+1}} - M_{t_j}) \sum_k (N_{t_{k+1}} - N_{t_k}) = E \sum_k \xi_k (M_{t_{k+1}} N_{t_{k+1}} - M_{t_k} N_{t_k}) = E \sum_k \xi_k ([M, N]_{t_{k+1}} - [M, N]_{t_k}) = E(U \cdot [M, N])_\infty.$

Replacing $M$ and $N$ by $M^\tau$ and $N^\tau$ for an arbitrary optional time $\tau$, we get

$E(U \cdot M)_\tau N_\tau = E(U \cdot M^\tau)_\infty N^\tau_\infty = E(U \cdot [M^\tau, N^\tau])_\infty = E(U \cdot [M, N])_\tau.$

By Lemma 7.13 the process $(U \cdot M) N - U \cdot [M, N]$ is then a martingale, and so $[U \cdot M, N] = U \cdot [M, N]$ a.s. The general formula follows by iteration. $\square$

In order to extend the stochastic integral $V \cdot M$ to more general processes $V$, it is convenient to take (6) as the characteristic property. Given a continuous local martingale $M$, let $L(M)$ denote the class of all progressive processes $V$ such that $(V^2 \cdot [M])_t < \infty$ a.s. for every $t \ge 0$.

Theorem 17.11 (stochastic integral, Itô, Kunita and Watanabe) For any continuous local martingale $M$ and process $V \in L(M)$, there exists an a.s. unique continuous local martingale $V \cdot M$ with $(V \cdot M)_0 = 0$ such that $[V \cdot M, N] = V \cdot [M, N]$ a.s. for every continuous local martingale $N$.

Proof: To prove the uniqueness, let $M'$ and $M''$ be continuous local martingales with $M'_0 = M''_0 = 0$ such that $[M', N] = [M'', N] = V \cdot [M, N]$ a.s. for all continuous local martingales $N$. By linearity we get $[M' - M'', N] = 0$ a.s. Taking $N = M' - M''$ gives $[M' - M''] = 0$ a.s. But then $(M' - M'')^2$ is a local martingale starting at $0$, and it easily follows that $M' = M''$ a.s.

To prove the existence, we may first assume that $\|V\|_M^2 = E(V^2 \cdot [M])_\infty < \infty$. Since $V$ is measurable, we get by Proposition 17.9 and the Cauchy–Buniakovsky inequality

$|E(V \cdot [M, N])_\infty| \le \|V\|_M \|N\|, \qquad N \in \mathcal{M}^2.$

The mapping $N \mapsto E(V \cdot [M, N])_\infty$ is then a continuous linear functional on $\mathcal{M}^2$, and so by Lemma 17.4 there exists an element $V \cdot M \in \mathcal{M}^2$ with

$E(V \cdot [M, N])_\infty = E(V \cdot M)_\infty N_\infty, \qquad N \in \mathcal{M}^2.$

Now replace $N$ by $N^\tau$ for an arbitrary optional time $\tau$. By Theorem 17.5 and optional sampling we get

$E(V \cdot [M, N])_\tau = E(V \cdot [M, N]^\tau)_\infty = E(V \cdot [M, N^\tau])_\infty = E(V \cdot M)_\infty N_\tau = E(V \cdot M)_\tau N_\tau.$
Since $V$ is progressive, it follows by Lemma 7.13 that $V \cdot [M, N] - (V \cdot M) N$ is a martingale, which means that $[V \cdot M, N] = V \cdot [M, N]$ a.s. The last
relation extends by localization to arbitrary continuous local martingales $N$.

In the general case, define $\tau_n = \inf\{t \ge 0;\ (V^2 \cdot [M])_t = n\}$. By the previous argument there exist some continuous local martingales $V \cdot M^{\tau_n}$ such that, for any continuous local martingale $N$,

$[V \cdot M^{\tau_n}, N] = V \cdot [M^{\tau_n}, N] \quad \text{a.s.}, \qquad n \in \mathbb{N}. \quad (7)$

For $m < n$ it follows that $(V \cdot M^{\tau_n})^{\tau_m}$ satisfies the corresponding relation with $[M^{\tau_m}, N]$, and so $(V \cdot M^{\tau_n})^{\tau_m} = V \cdot M^{\tau_m}$ a.s. Hence, there exists a continuous process $V \cdot M$ with $(V \cdot M)^{\tau_n} = V \cdot M^{\tau_n}$ a.s. for all $n$, and Lemma 17.1 shows that $V \cdot M$ is again a local martingale. Finally, (7) yields $[V \cdot M, N] = V \cdot [M, N]$ a.s. on $[0, \tau_n]$ for each $n$, and so the same relation holds on $\mathbb{R}_+$. $\square$

By Lemma 17.10 we note that the stochastic integral $V \cdot M$ of the last theorem extends the previously defined elementary integral. It is also clear that $V \cdot M$ is a.s. bilinear in the pair $(V, M)$ and satisfies the following basic continuity property.

Lemma 17.12 (continuity) For any continuous local martingales $M_n$ and processes $V_n \in L(M_n)$, we have $(V_n \cdot M_n)^* \overset{P}{\to} 0$ iff $(V_n^2 \cdot [M_n])_\infty \overset{P}{\to} 0$.

Proof: Recall that $[V_n \cdot M_n] = V_n^2 \cdot [M_n]$ and use Proposition 17.6. $\square$

Before continuing the study of stochastic integrals, it is convenient to extend the definition to a larger class of integrators. A process $X$ is said to be a continuous semimartingale if it can be written as a sum $M + A$, where $M$ is a continuous local martingale and $A$ is a continuous, adapted process of locally finite variation and with $A_0 = 0$. By Proposition 17.2 the decomposition $X = M + A$ is then a.s. unique, and it is often referred to as the canonical decomposition of $X$. By a continuous semimartingale in $\mathbb{R}^d$ we mean a process $X = (X^1, \ldots, X^d)$ such that the component processes $X^k$ are one-dimensional continuous semimartingales.

Let $L(A)$ denote the class of progressive processes $V$ such that the process $(V \cdot A)_t = \int_0^t V\, dA$ exists in the sense of ordinary Stieltjes integration.
For any continuous semimartingale $X = M + A$ we may write $L(X) = L(M) \cap L(A)$, and we define the integral of a process $V \in L(X)$ as the sum $V \cdot X = V \cdot M + V \cdot A$. Note that $V \cdot X$ is again a continuous semimartingale with canonical decomposition $V \cdot M + V \cdot A$. For progressive processes $V$, it is further clear that $V \in L(X)$ iff $V^2 \in L([M])$ and $V \in L(A)$.

From Lemma 17.12 we may easily deduce the following stochastic version of the dominated convergence theorem.

Corollary 17.13 (dominated convergence) For any continuous semimartingale $X$, let $U, V, V_1, V_2, \ldots \in L(X)$ with $|V_n| \le U$ and $V_n \to V$. Then

$(V_n \cdot X - V \cdot X)^*_t \overset{P}{\to} 0, \qquad t \ge 0.$
Proof: Assume that $X = M + A$. Since $U \in L(X)$, we have $U^2 \in L([M])$ and $U \in L(A)$. Hence, by dominated convergence for ordinary Stieltjes integrals, $((V_n - V)^2 \cdot [M])_t \to 0$ and $(V_n \cdot A - V \cdot A)^*_t \to 0$ a.s. By Lemma 17.12 the former convergence implies $(V_n \cdot M - V \cdot M)^*_t \overset{P}{\to} 0$, and the assertion follows. $\square$

The next result extends the elementary chain rule of Lemma 1.23 to stochastic integrals.

Proposition 17.14 (chain rule) Consider a continuous semimartingale $X$ and two progressive processes $U$ and $V$, where $V \in L(X)$. Then $U \in L(V \cdot X)$ iff $UV \in L(X)$, in which case $U \cdot (V \cdot X) = (UV) \cdot X$ a.s.

Proof: Let $M + A$ be the canonical decomposition of $X$. Then $U \in L(V \cdot X)$ iff $U^2 \in L([V \cdot M])$ and $U \in L(V \cdot A)$, whereas $UV \in L(X)$ iff $(UV)^2 \in L([M])$ and $UV \in L(A)$. Since $[V \cdot M] = V^2 \cdot [M]$, the two pairs of conditions are equivalent. The formula $U \cdot (V \cdot A) = (UV) \cdot A$ is elementary. To see that even $U \cdot (V \cdot M) = (UV) \cdot M$ a.s., let $N$ be an arbitrary continuous local martingale, and note that

$[(UV) \cdot M, N] = (UV) \cdot [M, N] = U \cdot (V \cdot [M, N]) = U \cdot [V \cdot M, N] = [U \cdot (V \cdot M), N]. \qquad \square$

The next result shows how the stochastic integral behaves under optional stopping.

Proposition 17.15 (optional stopping) For any continuous semimartingale $X$, process $V \in L(X)$, and optional time $\tau$, we have a.s.

$(V \cdot X)^\tau = V \cdot X^\tau = (V 1_{[0,\tau]}) \cdot X.$

Proof: The relations being obvious for ordinary Stieltjes integrals, we may assume that $X = M$ is a continuous local martingale. Then $(V \cdot M)^\tau$ is a continuous local martingale starting at $0$, and we have

$[(V \cdot M)^\tau, N] = [V \cdot M, N^\tau] = V \cdot [M, N^\tau] = V \cdot [M^\tau, N] = V \cdot [M, N]^\tau = (V 1_{[0,\tau]}) \cdot [M, N].$

Thus, $(V \cdot M)^\tau$ satisfies the conditions characterizing the integrals $V \cdot M^\tau$ and $(V 1_{[0,\tau]}) \cdot M$. $\square$

We may extend the definitions of quadratic variation and covariation to arbitrary continuous semimartingales $X$ and $Y$ with canonical decompositions $M + A$ and $N + B$, respectively, by putting $[X] = [M]$ and $[X, Y] = [M, N]$.
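The covariation just defined can be explored numerically. The following Python sketch (an illustration with arbitrary parameters, not part of the original text) builds two correlated Brownian motions with $[X, Y]_t = \rho t$ and compares the discrete sum of increment products over a fine grid with the limit $\rho$ at $t = 1$; the convergence of such sums is the content of Proposition 17.17 below.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2**14
dt = 1.0 / N
rho = 0.6  # instantaneous correlation, so that [X, Y]_t = rho * t

# Two correlated Brownian motions built from independent increments.
dW1 = rng.normal(0.0, np.sqrt(dt), N)
dW2 = rng.normal(0.0, np.sqrt(dt), N)
dX = dW1
dY = rho * dW1 + np.sqrt(1.0 - rho**2) * dW2

# Discrete covariation sum over the full grid.
cov_sum = float(np.sum(dX * dY))
print(cov_sum)  # should be close to [X, Y]_1 = rho
```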
As a key step toward the development of a stochastic calculus, we show how the covariation process can be expressed in terms of stochastic integrals. In the martingale case, the result is implicit in the proof of Theorem 17.5. 
Theorem 17.16 (integration by parts) For any continuous semimartingales $X$ and $Y$, we have a.s.

$XY = X_0 Y_0 + X \cdot Y + Y \cdot X + [X, Y]. \quad (8)$

Proof: We may take $X = Y$, since the general result will then follow by polarization. First let $X = M \in \mathcal{M}^2$, and define $V^n$ and $Q^n$ as in the proof of Theorem 17.5. Then $V^n \to M$ and $|V^n| \le M^* < \infty$, and so Corollary 17.13 yields $(V^n \cdot M)_t \overset{P}{\to} (M \cdot M)_t$ for each $t \ge 0$. Thus, (8) follows in this case as we let $n \to \infty$ in the relation $M^2 = 2 V^n \cdot M + Q^n$, and it extends by localization to general continuous local martingales $M$ with $M_0 = 0$. If instead $X = A$, formula (8) reduces to $A^2 = 2 A \cdot A$, which holds by Fubini's theorem.

Turning to the general case, we may assume that $X_0 = 0$, since the formula for general $X_0$ will then follow by an easy computation from the result for $X - X_0$. In this case (8) reduces to $X^2 = 2 X \cdot X + [M]$. Subtracting the formulas for $M^2$ and $A^2$, it remains to prove that $AM = A \cdot M + M \cdot A$ a.s. Then fix any $t > 0$, and introduce the processes

$A^n_s = A_{(k-1)t/n}, \qquad M^n_s = M_{kt/n}, \qquad s \in t(k-1, k]/n, \quad k, n \in \mathbb{N},$

which satisfy

$A_t M_t = (A^n \cdot M)_t + (M^n \cdot A)_t, \qquad n \in \mathbb{N}.$

Here $(A^n \cdot M)_t \overset{P}{\to} (A \cdot M)_t$ by Corollary 17.13 and $(M^n \cdot A)_t \to (M \cdot A)_t$ by dominated convergence for ordinary Stieltjes integrals. $\square$

The terms quadratic variation and covariation are justified by the following result, which extends Theorem 13.9 for Brownian motion.

Proposition 17.17 (approximation, Fisk) Let $X$ and $Y$ be continuous semimartingales, fix any $t > 0$, and consider for every $n \in \mathbb{N}$ a partition $0 = t_{n,0} < t_{n,1} < \cdots < t_{n,k_n} = t$ such that $\max_k (t_{n,k} - t_{n,k-1}) \to 0$. Then

$\zeta_n = \sum_k (X_{t_{n,k}} - X_{t_{n,k-1}})(Y_{t_{n,k}} - Y_{t_{n,k-1}}) \overset{P}{\to} [X, Y]_t. \quad (9)$

Proof: We may clearly assume that $X_0 = Y_0 = 0$. Introduce the predictable step processes

$X^n_s = X_{t_{n,k-1}}, \qquad Y^n_s = Y_{t_{n,k-1}}, \qquad s \in (t_{n,k-1}, t_{n,k}], \quad k, n \in \mathbb{N},$

and note that

$X_t Y_t = (X^n \cdot Y)_t + (Y^n \cdot X)_t + \zeta_n, \qquad n \in \mathbb{N}.$
Since $X^n \to X$ and $Y^n \to Y$, and also $(X^n)^*_t \le X^*_t < \infty$ and $(Y^n)^*_t \le Y^*_t < \infty$, we get by Corollary 17.13 and Theorem 17.16

$\zeta_n \overset{P}{\to} X_t Y_t - (X \cdot Y)_t - (Y \cdot X)_t = [X, Y]_t. \qquad \square$

We proceed to prove a version of Itô's formula, arguably the most important formula in modern probability. The result shows that the class of
continuous semimartingales is preserved under smooth mappings; it also exhibits the canonical decomposition of the image process in terms of the components of the original process. Extended versions appear in Corollaries 17.19 and 17.20 as well as in Theorems 22.5 and 26.7.

Let $C^k = C^k(\mathbb{R}^d)$ denote the class of $k$ times continuously differentiable functions on $\mathbb{R}^d$. When $f \in C^2$, we write $f'_i$ and $f''_{ij}$ for the first- and second-order partial derivatives of $f$. Here and below, summation over repeated indices is understood.

Theorem 17.18 (substitution rule, Itô) For any continuous semimartingale $X$ in $\mathbb{R}^d$ and function $f \in C^2(\mathbb{R}^d)$, we have a.s.

$f(X) = f(X_0) + f'_i(X) \cdot X^i + \tfrac{1}{2} f''_{ij}(X) \cdot [X^i, X^j]. \quad (10)$

The result is often written in differential form as

$df(X) = f'_i(X)\, dX^i + \tfrac{1}{2} f''_{ij}(X)\, d[X^i, X^j].$

It is suggestive to think of Itô's formula as a second-order Taylor expansion

$df(X) = f'_i(X)\, dX^i + \tfrac{1}{2} f''_{ij}(X)\, dX^i dX^j,$

where the second-order differential $dX^i dX^j$ is interpreted as $d[X^i, X^j]$.

If $X$ has canonical decomposition $M + A$, we get the corresponding decomposition of $f(X)$ by substituting $M^i + A^i$ for $X^i$ on the right of (10). When $M = 0$, the last term vanishes, and (10) reduces to the familiar substitution rule for ordinary Stieltjes integrals. In general, the appearance of this Itô correction term shows that the Itô integral does not obey the rules of ordinary calculus.

Proof of Theorem 17.18: For notational convenience we may assume that $d = 1$, the general case being similar. Then fix a one-dimensional, continuous semimartingale $X$, and let $\mathcal{C}$ denote the class of functions $f \in C^2$ satisfying (10), now appearing in the form

$f(X) = f(X_0) + f'(X) \cdot X + \tfrac{1}{2} f''(X) \cdot [X]. \quad (11)$

The class $\mathcal{C}$ is clearly a linear subspace of $C^2$ containing the functions $f(x) = 1$ and $f(x) = x$. We shall prove that $\mathcal{C}$ is closed under multiplication and hence contains all polynomials.

To see this, assume that (11) holds for both $f$ and $g$.
Then $F = f(X)$ and $G = g(X)$ are continuous semimartingales, and so, by the definition of the integral together with Proposition 17.14 and Theorem 17.16, we have

$(fg)(X) - (fg)(X_0) = FG - F_0 G_0 = F \cdot G + G \cdot F + [F, G] = F \cdot (g'(X) \cdot X + \tfrac{1}{2} g''(X) \cdot [X]) + G \cdot (f'(X) \cdot X + \tfrac{1}{2} f''(X) \cdot [X]) + [f'(X) \cdot X, g'(X) \cdot X] = (fg' + f'g)(X) \cdot X + \tfrac{1}{2} (fg'' + 2f'g' + f''g)(X) \cdot [X] = (fg)'(X) \cdot X + \tfrac{1}{2} (fg)''(X) \cdot [X].$
Now let $f \in C^2$ be arbitrary. By Weierstrass' approximation theorem, we may choose some polynomials $p_1, p_2, \ldots$ such that $\sup_{|x| \le c} |p_n(x) - f''(x)| \to 0$ for every $c > 0$. Integrating the $p_n$ twice yields polynomials $f_n$ satisfying

$\sup_{|x| \le c} (|f_n(x) - f(x)| \vee |f'_n(x) - f'(x)| \vee |f''_n(x) - f''(x)|) \to 0, \qquad c > 0.$

In particular, $f_n(X_t) \to f(X_t)$ for each $t \ge 0$. Letting $M + A$ be the canonical decomposition of $X$ and using dominated convergence for ordinary Stieltjes integrals, we get for any $t \ge 0$

$(f'_n(X) \cdot A + \tfrac{1}{2} f''_n(X) \cdot [X])_t \to (f'(X) \cdot A + \tfrac{1}{2} f''(X) \cdot [X])_t.$

Similarly, $((f'_n(X) - f'(X))^2 \cdot [M])_t \to 0$ for all $t$, and so by Lemma 17.12

$(f'_n(X) \cdot M)_t \overset{P}{\to} (f'(X) \cdot M)_t, \qquad t \ge 0.$

Thus, equation (11) for the polynomials $f_n$ extends in the limit to the same formula for $f$. $\square$

We sometimes need a local version of the last theorem, involving stochastic integrals up to the time $\zeta_D$ when $X$ first leaves a given domain $D \subset \mathbb{R}^d$. If $X$ is continuous and adapted, then $\zeta_D$ is clearly predictable, in the sense of being announced by some optional times $\tau_n \uparrow \zeta_D$ such that $\tau_n < \zeta_D$ a.s. on $\{\zeta_D > 0\}$ for all $n$. In fact, writing $\rho$ for the Euclidean metric in $\mathbb{R}^d$, we may choose

$\tau_n = \inf\{t \in [0, n];\ \rho(X_t, D^c) \le n^{-1}\}, \qquad n \in \mathbb{N}. \quad (12)$

We say that $X$ is a semimartingale on $[0, \zeta_D)$ if the stopped process $X^{\tau_n}$ is a semimartingale in the usual sense for every $n \in \mathbb{N}$. In that case, we may define the covariation processes $[X^i, X^j]$ on the interval $[0, \zeta_D)$ by requiring $[X^i, X^j]^{\tau_n} = [(X^i)^{\tau_n}, (X^j)^{\tau_n}]$ a.s. for every $n$. Stochastic integrals with respect to $X^1, \ldots, X^d$ are defined on $[0, \zeta_D)$ in a similar way.

Corollary 17.19 (local Itô formula) For any domain $D \subset \mathbb{R}^d$, let $X$ be a continuous semimartingale on $[0, \zeta_D)$. Then (10) holds a.s. on $[0, \zeta_D)$ for every $f \in C^2(D)$.

Proof: Choose some functions $f_n \in C^2(\mathbb{R}^d)$ with $f_n(x) = f(x)$ when $\rho(x, D^c) > n^{-1}$. Applying Theorem 17.18 to $f_n(X^{\tau_n})$ with $\tau_n$ as in (12), we get (10) on $[0, \tau_n]$.
Since $n$ was arbitrary, the result extends to $[0, \zeta_D)$. $\square$

By a complex-valued, continuous semimartingale we mean a process of the form $Z = X + iY$, where $X$ and $Y$ are real continuous semimartingales. The bilinearity of the covariation process suggests that we define the quadratic variation of $Z$ as

$[Z] = [Z, Z] = [X + iY, X + iY] = [X] + 2i[X, Y] - [Y].$
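As a quick numerical check (an illustration only, not part of the original text): for $Z = X + iY$ with $X$, $Y$ independent Brownian motions, $[X]_t = [Y]_t = t$ and $[X, Y] = 0$, so $[Z]$ vanishes identically. The Python sketch below (arbitrary grid size and seed) approximates $[Z]_1$ by discrete sums and should return a value near $0$.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 2**14
dt = 1.0 / N

# Independent Brownian increments for the real and imaginary parts.
dX = rng.normal(0.0, np.sqrt(dt), N)
dY = rng.normal(0.0, np.sqrt(dt), N)

# [Z]_1 = [X]_1 - [Y]_1 + 2i [X, Y]_1, approximated by discrete sums.
qZ = complex(np.sum(dX**2) - np.sum(dY**2) + 2j * np.sum(dX * dY))
print(qZ)  # both real and imaginary parts should be near 0
```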
Let $L(Z)$ denote the class of processes $W = U + iV$ with $U, V \in L(X) \cap L(Y)$. For such a process $W$, we define the integral by

$W \cdot Z = (U + iV) \cdot (X + iY) = U \cdot X - V \cdot Y + i(U \cdot Y + V \cdot X).$

Corollary 17.20 (conformal mapping) Let $f$ be an analytic function on some domain $D \subset \mathbb{C}$. Then (10) holds for any $D$-valued, continuous semimartingale $Z$.

Proof: Writing $f(x + iy) = g(x, y) + ih(x, y)$ for any $x + iy \in D$, we get

$g'_1 + ih'_1 = f', \qquad g'_2 + ih'_2 = if',$

and so by iteration

$g''_{11} + ih''_{11} = f'', \qquad g''_{12} + ih''_{12} = if'', \qquad g''_{22} + ih''_{22} = -f''.$

Equation (10) now follows for $Z = X + iY$, as we apply Corollary 17.19 to the semimartingale $(X, Y)$ and the functions $g$ and $h$. $\square$

We also consider a modification of the Itô integral that does obey the rules of ordinary calculus. Assuming both $X$ and $Y$ to be continuous semimartingales, we define the Fisk–Stratonovich integral by

$\int_0^t X \circ dY = (X \cdot Y)_t + \tfrac{1}{2}[X, Y]_t, \qquad t \ge 0, \quad (13)$

or in differential form $X \circ dY = X\, dY + \tfrac{1}{2}\, d[X, Y]$, where the first term on the right is an ordinary Itô integral.

Corollary 17.21 (modified substitution rule, Fisk, Stratonovich) For any continuous semimartingale $X$ in $\mathbb{R}^d$ and function $f \in C^3(\mathbb{R}^d)$, we have a.s.

$f(X_t) = f(X_0) + \int_0^t f'_i(X) \circ dX^i, \qquad t \ge 0.$

Proof: By Itô's formula,

$f'_i(X) = f'_i(X_0) + f''_{ij}(X) \cdot X^j + \tfrac{1}{2} f'''_{ijk}(X) \cdot [X^j, X^k].$

Using Itô's formula again, together with (6) and (13), we get

$\int_0^t f'_i(X) \circ dX^i = f'_i(X) \cdot X^i + \tfrac{1}{2}[f'_i(X), X^i] = f'_i(X) \cdot X^i + \tfrac{1}{2} f''_{ij}(X) \cdot [X^j, X^i] = f(X) - f(X_0). \qquad \square$

Unfortunately, the more convenient substitution rule of Corollary 17.21 comes at a high price: the new integral does not preserve the martingale property, and it requires even the integrand to be a continuous semimartingale. It is the latter restriction that forces us to impose stronger regularity conditions on the function $f$ in the substitution rule.

Our next task is to establish a basic uniqueness property, justifying our reference to the process $V \cdot$
$M$ in Theorem 17.11 as an integral.
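The difference between the Itô and Fisk–Stratonovich integrals in (13) is visible already for $\int B\, dB$: left-endpoint (Itô) sums approximate $(B_t^2 - t)/2$, while the midpoint sums corresponding to the Stratonovich integral (cf. Exercise 21) give $B_t^2/2$, as ordinary calculus would suggest. A Python sketch follows (an illustration only, not part of the original text; discretization and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 2**14
dt = 1.0 / N
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), N))])
dB = np.diff(B)

ito = float(np.sum(B[:-1] * dB))                    # left-endpoint sums
strat = float(np.sum(0.5 * (B[:-1] + B[1:]) * dB))  # midpoint sums

print(ito, (B[-1]**2 - 1.0) / 2)  # Ito integral of B dB: near (B_1^2 - 1)/2
print(strat, B[-1]**2 / 2)        # Stratonovich: B_1^2 / 2 (ordinary calculus)
```

Note that the midpoint sums telescope exactly to $B_1^2/2$, while the Itô sums differ from them by half the discrete quadratic variation, which is close to $\tfrac{1}{2}$.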
Theorem 17.22 (uniqueness) The integral $V \cdot M$ in Theorem 17.11 is the a.s. unique linear extension of the elementary stochastic integral such that, for any $t > 0$, the convergence $(V_n^2 \cdot [M])_t \overset{P}{\to} 0$ implies $(V_n \cdot M)^*_t \overset{P}{\to} 0$.

The statement follows immediately from Lemmas 17.10 and 17.12, together with the following approximation of progressive processes by predictable step processes.

Lemma 17.23 (approximation) For any continuous semimartingale $X = M + A$ and process $V \in L(X)$, there exist some processes $V_1, V_2, \ldots \in \mathcal{E}$ such that a.s. $((V_n - V)^2 \cdot [M])_t \to 0$ and $((V_n - V) \cdot A)^*_t \to 0$ for every $t \ge 0$.

Proof: It is enough to take $t = 1$, since we can then combine the processes $V_n$ for disjoint finite intervals to construct an approximating sequence on $\mathbb{R}_+$. Furthermore, it suffices to consider approximations in the sense of convergence in probability, since the a.s. versions will then follow for a suitable subsequence. This allows us to perform the construction in steps, first approximating $V$ by bounded and progressive processes $V'$, next approximating each $V'$ by continuous and adapted processes $V''$, and finally approximating each $V''$ by predictable step processes $V'''$. Here the first and last steps are elementary, so we may concentrate on the second step.

Then let $V$ be bounded. We need to construct some continuous, adapted processes $V_n$ such that $((V_n - V)^2 \cdot [M])_1 \to 0$ and $((V_n - V) \cdot A)^*_1 \to 0$ a.s. Since the $V_n$ can be taken to be uniformly bounded, we may replace the former condition by $(|V_n - V| \cdot [M])_1 \to 0$ a.s. Thus, it is enough to establish the approximation $(|V_n - V| \cdot A)_1 \to 0$ in the case when $A$ is a nondecreasing, continuous, adapted process with $A_0 = 0$. Replacing $A_t$ by $A_t + t$ if necessary, we may even assume that $A$ is strictly increasing.

To construct the required approximations, we may introduce the inverse process $\tau_s = \sup\{t \ge 0;\ A_t \le s\}$, and define

$V^h_t = h^{-1} \int_{\tau(A_t - h)}^{t} V\, dA = h^{-1} \int_{(A_t - h)^+}^{A_t} V(\tau_s)\, ds, \qquad t, h > 0.$
By Theorem 2.15 we have $V^h \circ \tau \to V \circ \tau$ as $h \to 0$, a.e. on $[0, A_1]$. Thus, by dominated convergence,

$\int_0^1 |V^h - V|\, dA = \int_0^{A_1} |V^h(\tau_s) - V(\tau_s)|\, ds \to 0.$

The processes $V^h$ are clearly continuous. To prove that they are also adapted, we note that the process $\tau(A_t - h)$ is adapted for every $h > 0$ by the definition of $\tau$. Since $V$ is progressive, it is further seen that $V \cdot A$ is adapted and hence progressive. The adaptedness of $(V \cdot A)_{\tau(A_t - h)}$ now follows by composition. $\square$

Though the class $L(X)$ of stochastic integrands is sufficient for most purposes, it is sometimes useful to allow the integration of slightly more
general processes. Given any continuous semimartingale $X = M + A$, let $\hat{L}(X)$ denote the class of product-measurable processes $V$ such that $(V - \hat{V}) \cdot [M] = 0$ and $(V - \hat{V}) \cdot A = 0$ a.s. for some process $\hat{V} \in L(X)$. For $V \in \hat{L}(X)$ we define $V \cdot X = \hat{V} \cdot X$ a.s. The extension clearly enjoys all the previously established properties of stochastic integration.

It is often important to see how semimartingales, covariation processes, and stochastic integrals are transformed by a random time-change. Let us then consider a nondecreasing, right-continuous family of finite optional times $\tau_s$, $s \ge 0$, here referred to as a finite random time-change $\tau$. If even $\mathcal{F}$ is right-continuous, then by Lemma 7.3 the same thing is true for the induced filtration $\mathcal{G}_s = \mathcal{F}_{\tau_s}$, $s \ge 0$. A process $X$ is said to be $\tau$-continuous if it is a.s. continuous on $\mathbb{R}_+$ and constant on every interval $[\tau_{s-}, \tau_s]$, $s \ge 0$, where $\tau_{0-} = X_{0-} = 0$ by convention.

Theorem 17.24 (random time-change, Kazamaki) Let $\tau$ be a finite random time-change with induced filtration $\mathcal{G}$, and let $X = M + A$ be a $\tau$-continuous $\mathcal{F}$-semimartingale. Then $X \circ \tau$ is a continuous $\mathcal{G}$-semimartingale with canonical decomposition $M \circ \tau + A \circ \tau$ and such that $[X \circ \tau] = [X] \circ \tau$ a.s. Furthermore, $V \in L(X)$ implies $V \circ \tau \in \hat{L}(X \circ \tau)$ and

$(V \circ \tau) \cdot (X \circ \tau) = (V \cdot X) \circ \tau \quad \text{a.s.} \quad (14)$

Proof: It is easy to check that the time-change $X \mapsto X \circ \tau$ preserves continuity, adaptedness, monotonicity, and the local martingale property. In particular, $X \circ \tau$ is then a continuous $\mathcal{G}$-semimartingale with canonical decomposition $M \circ \tau + A \circ \tau$. Since $M^2 - [M]$ is a continuous local martingale, the same thing is true for the time-changed process $M^2 \circ \tau - [M] \circ \tau$, and so $[X \circ \tau] = [M \circ \tau] = [M] \circ \tau = [X] \circ \tau$ a.s.

If $V \in L(X)$, we also note that $V \circ \tau$ is product-measurable, since this is true for both $V$ and $\tau$. Fixing any $t \ge 0$ and using the $\tau$-continuity of $X$, we get

$(1_{[0,t]} \circ \tau) \cdot (X \circ \tau) = 1_{[0, \tau^{-1}_t]} \cdot (X \circ \tau) = (X \circ \tau)^{\tau^{-1}_t} = (1_{[0,t]} \cdot X) \circ \tau,$
which proves (14) when $V = 1_{[0,t]}$. If $X$ has locally finite variation, the result extends by a monotone class argument and monotone convergence to arbitrary $V \in L(X)$. In general, Lemma 17.23 yields the existence of some continuous, adapted processes $V_1, V_2, \ldots$ such that $\int (V_n - V)^2\, d[M] \to 0$ and $\int |(V_n - V)\, dA| \to 0$ a.s. By (14) the corresponding properties hold for the time-changed processes, and since the processes $V_n \circ \tau$ are right-continuous and adapted, hence progressive, we obtain $V \circ \tau \in \hat{L}(X \circ \tau)$. Now assume instead that the approximating processes $V_1, V_2, \ldots$ are predictable step processes. The previous calculation then shows that (14) holds for each $V_n$, and by Lemma 17.12 the relation extends to $V$. $\square$

Let us next consider stochastic integrals of processes depending on a parameter. Given any measurable space $(S, \mathcal{S})$, we say that a process $V$ on
$S \times \mathbb{R}_+$ is progressive if its restriction to $S \times [0, t]$ is $\mathcal{S} \otimes \mathcal{B}_t \otimes \mathcal{F}_t$-measurable for every $t \ge 0$, where $\mathcal{B}_t = \mathcal{B}([0, t])$. A simple version of the following result will be useful in Chapter 18.

Theorem 17.25 (dependence on parameter, Doléans, Stricker and Yor) Let $X$ be a continuous semimartingale, fix a measurable space $S$, and consider a progressive process $V_s(t)$, $s \in S$, $t \ge 0$, such that $V_s \in L(X)$ for every $s \in S$. Then the process $Y_s(t) = (V_s \cdot X)_t$ has a version that is progressive on $S \times \mathbb{R}_+$ and a.s. continuous for each $s \in S$.

Proof: Let $M + A$ be the canonical decomposition of $X$. Assume the existence of some progressive processes $V^n_s$ on $S \times \mathbb{R}_+$ such that, for any $t \ge 0$ and $s \in S$,

$((V^n_s - V_s)^2 \cdot [M])_t \overset{P}{\to} 0, \qquad ((V^n_s - V_s) \cdot A)^*_t \overset{P}{\to} 0.$

Then Lemma 17.12 yields $(V^n_s \cdot X - V_s \cdot X)^*_t \overset{P}{\to} 0$ for every $s$ and $t$. Proceeding as in the proof of Proposition 4.31, we may choose a subsequence $(n_k(s)) \subset \mathbb{N}$, depending measurably on $s$, such that the same convergence holds a.s. along $(n_k(s))$ for any $s$ and $t$. Define $Y_{s,t} = \limsup_k (V^{n_k}_s \cdot X)_t$ whenever this is finite, and put $Y_{s,t} = 0$ otherwise. If we can choose versions of the processes $(V^n_s \cdot X)_t$ that are progressive on $S \times \mathbb{R}_+$ and a.s. continuous for each $s$, then $Y_{s,t}$ is clearly a version of the process $(V_s \cdot X)_t$ with the same properties.

This argument will now be applied in three steps. First we reduce to the case of bounded and progressive integrands by taking $V^n = V 1\{|V| \le n\}$. Next we apply the transformation in the proof of Lemma 17.23, to reduce to the case of continuous and progressive integrands. In the final step, we approximate any continuous, progressive process $V$ by the predictable step processes $V^n_s(t) = V_s(2^{-n}[2^n t])$. Here the integrals $V^n_s \cdot X$ are elementary, and the desired continuity and measurability are obvious by inspection. $\square$

We turn to the related topic of functional representations.
To motivate the problem, note that the construction of the stochastic integral $V \cdot X$ depends in a subtle way on the underlying probability measure $P$ and filtration $\mathcal{F}$. Thus, we cannot expect any universal representation $F(V, X)$ of the integral process $V \cdot X$. In view of Proposition 4.31, one might still hope for a modified representation $F(\mu, V, X)$, where $\mu$ denotes the distribution of $(V, X)$. Even this could be too optimistic, however, since the canonical decomposition of $X$ may also depend on $\mathcal{F}$.

Dictated by our needs in Chapter 21, we restrict our attention to a very special situation, which is still general enough to cover most applications of interest. Fixing any progressive functions $\sigma^i_j$ and $b^i$ of suitable dimension, defined on the path space $C(\mathbb{R}_+, \mathbb{R}^d)$, we may consider an arbitrary adapted process $X$ satisfying the stochastic differential equation

$dX^i_t = \sigma^i_j(t, X)\, dB^j_t + b^i(t, X)\, dt, \quad (15)$
where $B$ is a Brownian motion in $\mathbb{R}^r$. A detailed discussion of such equations is given in Chapter 21. For the moment, we need only the simple fact from Lemma 21.1 that the coefficients $\sigma^i_j(t, X)$ and $b^i(t, X)$ are again progressive. Write $a^{ij} = \sigma^i_k \sigma^j_k$.

Proposition 17.26 (functional representation) For any progressive functions $\sigma$, $b$, and $f$ of suitable dimension, there exists a measurable mapping

$F: \mathcal{P}(C(\mathbb{R}_+, \mathbb{R}^d)) \times C(\mathbb{R}_+, \mathbb{R}^d) \to C(\mathbb{R}_+, \mathbb{R}) \quad (16)$

such that, whenever $X$ is a solution to (15) with $\mathcal{L}(X) = \mu$ and $f_i(X) \in L(X^i)$ for all $i$, we have $f_i(X) \cdot X^i = F(\mu, X)$ a.s.

Proof: From (15) we note that $X$ is a semimartingale with covariation processes $[X^i, X^j] = a^{ij}(X) \cdot \lambda$ and drift components $b^i(X) \cdot \lambda$. Hence, $f_i(X) \in L(X^i)$ for all $i$ iff the processes $(f_i)^2 a^{ii}(X)$ and $f_i b^i(X)$ are a.s. locally Lebesgue integrable. Note that this holds in particular when $f$ is bounded. Now assume that $f^1, f^2, \ldots$ are progressive with

$(f_i - f^n_i)^2 a^{ii}(X) \cdot \lambda \overset{P}{\to} 0, \qquad |(f_i - f^n_i) b^i(X)| \cdot \lambda \overset{P}{\to} 0. \quad (17)$

Then $(f^n_i(X) \cdot X^i - f_i(X) \cdot X^i)^*_t \overset{P}{\to} 0$ for every $t \ge 0$ by Lemma 17.12. Thus, if $f^n_i(X) \cdot X^i = F_n(\mu, X)$ a.s. for some measurable mappings $F_n$ as in (16), then Proposition 4.31 yields a similar representation for the limit $f_i(X) \cdot X^i$.

As in the preceding proof, we may apply this argument in three steps, reducing first to the case when $f$ is bounded, next to the case of continuous $f$, and finally to the case when $f$ is a predictable step function. Here the first and last steps are again elementary. For the second step, we may now use the simpler approximation

$f_n(t, x) = n \int_{(t - n^{-1})^+}^{t} f(s, x)\, ds, \qquad t \ge 0, \quad n \in \mathbb{N}, \quad x \in C(\mathbb{R}_+, \mathbb{R}^d).$

By Theorem 2.15 we have $f_n(t, x) \to f(t, x)$ a.e. in $t$ for each $x \in C(\mathbb{R}_+, \mathbb{R}^d)$, and (17) follows by dominated convergence. $\square$

Exercises

1. Show that if $M$ is a local martingale and $\xi$ is an $\mathcal{F}_0$-measurable random variable, then the process $N_t = \xi M_t$ is again a local martingale.

2.
Use Fatou's lemma to show that every local martingale $M \ge 0$ with $E M_0 < \infty$ is a supermartingale. Also show by an example that $M$ may fail to be a martingale. (Hint: Let $M_t = X_{t/(1-t)^+}$, where $X$ is a Brownian motion starting at $1$, stopped when it reaches $0$.)
3. Fix a continuous local martingale $M$. Show that $M$ and $[M]$ have a.s. the same intervals of constancy. (Hint: For any $r \in \mathbb{Q}_+$, put $\tau = \inf\{t > r;\ [M]_t > [M]_r\}$. Then $M^\tau$ is a continuous local martingale on $[r, \infty)$ with quadratic variation $0$ after $r$, so $M$ is a.s. constant on $[r, \tau]$. Use a similar argument in the other direction.)

4. For any continuous local martingales $M_n$ starting at $0$ and associated optional times $\tau_n$, show that $(M_n)^*_{\tau_n} \overset{P}{\to} 0$ iff $[M_n]_{\tau_n} \overset{P}{\to} 0$. State the corresponding result for stochastic integrals.

5. Show that there exist some continuous semimartingales $X^1, X^2, \ldots$ such that $(X^n)^* \overset{P}{\to} 0$ and yet $[X^n]_t \overset{P}{\not\to} 0$ for all $t > 0$. (Hint: Let $B$ be a Brownian motion stopped at time $1$, put $A^n_{k 2^{-n}} = B_{(k-1) 2^{-n}}$, and interpolate linearly. Define $X^n = B - A^n$.)

6. Consider a Brownian motion $B$ and an optional time $\tau$. Show that $E B_\tau = 0$ when $E \tau^{1/2} < \infty$ and that $E B_\tau^2 = E \tau$ when $E \tau < \infty$. (Hint: Use optional sampling and Theorem 17.7.)

7. Deduce the first inequality in Proposition 17.9 from Proposition 17.17 and the classical Cauchy–Buniakovsky inequality.

8. Prove for any continuous semimartingales $X$ and $Y$ that $[X + Y]^{1/2} \le [X]^{1/2} + [Y]^{1/2}$ a.s.

9. (Kunita and Watanabe) Let $M$ and $N$ be continuous local martingales, and fix any $p, q, r > 0$ with $p^{-1} + q^{-1} = r^{-1}$. Show that $\|[M, N]_t\|_r \le \|[M]_t^{1/2}\|_p \|[N]_t^{1/2}\|_q$ for all $t > 0$.

10. Let $M$, $N$ be continuous local martingales with $M_0 = N_0 = 0$. Show that $M \perp\!\!\!\perp N$ implies $[M, N] = 0$ a.s. Also show by an example that the converse is false. (Hint: Let $M = U \cdot B$ and $N = V \cdot B$ for a Brownian motion $B$ and suitable $U, V \in L(B)$.)

11. Fix a continuous semimartingale $X$, and let $U, V \in L(X)$ with $U = V$ a.s. on some set $A \in \mathcal{F}_0$. Show that $U \cdot X = V \cdot X$ a.s. on $A$. (Hint: Use Proposition 17.15.)

12. Fix a continuous local martingale $M$, and let $U, U_1, U_2, \ldots$ and $V, V_1, V_2, \ldots \in L(M)$ with $|U_n| \le V_n$, $U_n \to U$, $V_n \to V$, and $((V_n - V) \cdot M)^*_t \overset{P}{\to} 0$ for all $t > 0$. Show that $(U_n \cdot M)_t \overset{P}{\to} (U \cdot M)_t$ for all $t$.
(Hint: Write (U_n − U)² ≤ 2(V_n − V)² + 8V², and use Theorem 1.21 and Lemmas 4.2 and 17.12.)

13. Let B be a Brownian bridge. Show that X_t = B_{t∧1} is a semimartingale on ℝ₊ w.r.t. the induced filtration. (Hint: Note that M_t = (1 − t)⁻¹B_t is a martingale on [0, 1), integrate by parts, and check that the compensator has finite variation.)

14. Show by an example that the canonical decomposition of a continuous semimartingale may depend on the filtration. (Hint: Let B be Brownian motion with induced filtration F, put G_t = F_t ∨ σ(B_1), and use the preceding result.)
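The hint of Exercise 13 can be sanity-checked by simulation (a sketch only; grid and sample sizes are our own choices): for a Brownian bridge B, the process M_t = (1 − t)⁻¹B_t has increments uncorrelated with its past, as a martingale should, and Var M_s = s/(1 − s).

```python
import math
import random

random.seed(2)
n, s, t = 4000, 0.5, 0.8

def bridge_pair():
    """Sample (B_s, B_t) for a Brownian bridge via B_u = W_u - u W_1."""
    ws = random.gauss(0.0, math.sqrt(s))
    wt = ws + random.gauss(0.0, math.sqrt(t - s))
    w1 = wt + random.gauss(0.0, math.sqrt(1.0 - t))
    return ws - s * w1, wt - t * w1

pairs = [bridge_pair() for _ in range(n)]
ms = [b / (1.0 - s) for b, _ in pairs]               # M_s
mt = [b / (1.0 - t) for _, b in pairs]               # M_t
mean_ms = sum(ms) / n
var_ms = sum(x * x for x in ms) / n - mean_ms ** 2   # should be s/(1-s) = 1
cov_inc = sum((y - x) * x for x, y in zip(ms, mt)) / n  # ~ Cov(M_t - M_s, M_s) = 0
```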
Foundations of Modern Probability

15. Show by stochastic calculus that t⁻ᵖB_t → 0 a.s. as t → ∞, where B is a Brownian motion and p > ½. (Hint: Integrate by parts to find the canonical decomposition. Compare with the L¹-limit.)

16. Extend Theorem 17.16 to a product of n semimartingales.

17. Consider a Brownian bridge X and a bounded, progressive process V with ∫₀¹ V_t dt = 0 a.s. Show that E ∫₀¹ V dX = 0. (Hint: Integrate by parts to get ∫₀¹ V dX = ∫ (V − V̂) dB, where B is a Brownian motion and V̂_t = (1 − t)⁻¹ ∫ₜ¹ V_s ds.)

18. Show that Proposition 17.17 remains valid for any finite optional times t and t_{nk} satisfying max_k(t_{nk} − t_{n,k−1}) → 0.

19. Let M be a continuous local martingale. Find the canonical decomposition of |M|ᵖ when p ≥ 2, and deduce for such a p the second relation in Theorem 17.7. (Hint: Use Theorem 17.18. For the last part, use Hölder's inequality.)

20. Let M be a continuous local martingale with M_0 = 0 and [M]_∞ ≤ 1. Show for any r ≥ 0 that P{sup_t M_t > r} ≤ e^{−r²/2}. (Hint: Consider the supermartingale Z = exp(cM − c²[M]/2) for a suitable c > 0.)

21. Let X and Y be continuous semimartingales. Fix a t > 0 and a sequence of partitions (t_{nk}) of [0, t] with max_k(t_{nk} − t_{n,k−1}) → 0. Show that ½ Σ_k (Y_{t_{nk}} + Y_{t_{n,k−1}})(X_{t_{nk}} − X_{t_{n,k−1}}) → (Y ∘ X)_t in probability. (Hint: Use Corollary 17.13 and Proposition 17.17.)

22. Show that the Fisk–Stratonovich integral satisfies the chain rule U ∘ (V ∘ X) = (UV) ∘ X. (Hint: Reduce to Itô integrals and use Theorems 17.11 and 17.16 and Proposition 17.14.)

23. A process is predictable if it is measurable with respect to the σ-field in ℝ₊ × Ω induced by all predictable step processes. Show that every predictable process is progressive. Conversely, given a progressive process X and a constant h > 0, show that the process Y_t = X_{(t−h)₊} is predictable.

24. Given a progressive process V and a nondecreasing, continuous, adapted process A, show that there exists some predictable process Ṽ with |V − Ṽ|·A = 0 a.s.
(Hint: Use Lemma 17.23.)

25. Given the preceding statement, deduce Lemma 17.23. (Hint: Begin with predictable V, using a monotone class argument.)

26. Construct the stochastic integral V·M by approximation from elementary integrals, using Lemmas 17.10 and 17.23. Show that the resulting integral satisfies the relation in Theorem 17.11. (Hint: First let M ∈ M² and E(V²·[M])_∞ < ∞, and extend by localization.)

27. Let (V, B) =d (Ṽ, B̃), where B and B̃ are Brownian motions on possibly different filtered probability spaces and V ∈ L(B), Ṽ ∈ L(B̃). Show that (V, B, V·B) =d (Ṽ, B̃, Ṽ·B̃). (Hint: Argue as in the proof of Proposition 17.26.)
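The approximation scheme of Exercise 26 can be illustrated numerically (a sketch with an arbitrary grid of our own choosing): the elementary left-endpoint sums approximating ∫₀¹ B dB converge to the limit identified by Itô calculus, (B_1² − 1)/2.

```python
import math
import random

random.seed(3)
N = 20000                        # number of grid points on [0, 1]
dt = 1.0 / N
sd = math.sqrt(dt)

b, ito_sum = 0.0, 0.0
for _ in range(N):
    db = random.gauss(0.0, sd)
    ito_sum += b * db            # elementary integrand frozen at the left endpoint
    b += db

target = 0.5 * (b * b - 1.0)     # Ito's formula: ∫_0^1 B dB = (B_1^2 - 1)/2
err = abs(ito_sum - target)
```

The discrete identity Σ B_{k−1}ΔB_k = ½(B_1² − Σ(ΔB_k)²) shows that the error is governed by how fast the squared increments approximate [B]_1 = 1.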
28. Let X be a continuous F-semimartingale. Show that X remains a semimartingale conditionally on F_0, and that the conditional quadratic variation agrees with [X]. Also show that if V ∈ L(X), where V = a(Y) for some continuous process Y and measurable function a, then V remains conditionally X-integrable, and the conditional integral agrees with V·X. (Hint: Conditioning on F_0 preserves martingales.)
Chapter 18

Continuous Martingales and Brownian Motion

Real and complex exponential martingales; martingale characterization of Brownian motion; random time-change of martingales; integral representation of martingales; iterated and multiple integrals; change of measure and Girsanov's theorem; Cameron–Martin theorem; Wald's identity and Novikov's condition

This chapter deals with a wide range of applications of the stochastic calculus, the principal tools of which were introduced in the preceding chapter. A recurrent theme is the notion of exponential martingales, which appear in both a real and a complex variety. Exploring the latter yields an effortless approach to Lévy's celebrated martingale characterization of Brownian motion, as well as to the basic random time-change reduction of isotropic continuous local martingales to a Brownian motion. By applying the latter result to suitable compositions of Brownian motion with harmonic or analytic functions, we may deduce some important information about Brownian motion in ℝ^d. Similar methods can be used to analyze a variety of other transformations that lead to Gaussian processes.

As a further application of the exponential martingales, we shall derive stochastic integral representations of Brownian functionals and martingales and examine their relationship to the chaos expansions obtained by different methods in Chapter 13. In this context, we show how the previously introduced multiple Wiener–Itô integrals can be expressed as iterated single Itô integrals. A similar problem, of crucial importance for Chapter 21, is to represent a continuous local martingale with absolutely continuous covariation processes in terms of stochastic integrals with respect to a suitable Brownian motion. Our last main topic is to examine the transformations induced by an absolutely continuous change of probability measure.
The density process turns out to be a real exponential martingale, and any continuous local martingale in the original setting will remain a martingale under the new measure, apart from an additional drift term. The observation is useful for applications, where it is often employed to remove the drift from a given semimartingale. The appropriate change of measure then depends on the
process, and it becomes important to derive effective criteria for a proposed exponential process to be a true martingale.

Our present exposition may be regarded as a continuation of the discussion of martingales and Brownian motion from Chapters 7 and 13, respectively. Changes of time and measure are both important for the theory of stochastic differential equations, as developed in Chapters 21 and 23. The time-change results for continuous martingales have a counterpart for point processes explored in Chapter 25, where general Poisson processes play a role similar to that of the Gaussian processes here. The results about changes of measure are extended in Chapter 26 to the context of possibly discontinuous semimartingales.

To elaborate on the new ideas, we begin with an introduction of complex exponential martingales. It is instructive to compare them with the real versions appearing in Lemma 18.21.

Lemma 18.1 (complex exponential martingales) Let M be a real continuous local martingale with M_0 = 0. Then

Z_t = exp(iM_t + ½[M]_t), t ≥ 0,

is a complex local martingale satisfying Z_t = 1 + i(Z·M)_t a.s.

Proof: Applying Corollary 17.20 to the complex-valued semimartingale X_t = iM_t + ½[M]_t and the entire function f(z) = e^z, we get

dZ_t = Z_t(dX_t + ½ d[X]_t) = Z_t(i dM_t + ½ d[M]_t − ½ d[M]_t) = i Z_t dM_t. □

The next result gives the basic connection between continuous martingales and Gaussian processes. For any subset K of a Hilbert space, we write K̄ for the closed linear subspace generated by K.

Lemma 18.2 (isometries and Gaussian processes) Given a subset K of a Hilbert space H, consider for each h ∈ K a continuous local F-martingale M^h with M^h_0 = 0 such that

[M^h, M^k]_∞ = ⟨h, k⟩ a.s., h, k ∈ K. (1)

Then there exists an isonormal Gaussian process η ⊥⊥ F_0 on K̄ such that M^h_∞ = ηh a.s. for all h ∈ K.

Proof: Fix any linear combination N_t = u_1 M^{h_1}_t + ⋯ + u_n M^{h_n}_t, and conclude from (1) that

[N]_∞ = Σ_{j,k} u_j u_k [M^{h_j}, M^{h_k}]_∞ = Σ_{j,k} u_j u_k ⟨h_j, h_k⟩ = ‖h‖²,

where h = u_1 h_1 + ⋯ + u_n h_n. The process Z = exp(iN + ½[N]) is a.s. bounded, and so by Lemma 18.1 it is a uniformly integrable martingale. Writing ξ = N_∞, we hence obtain for any A ∈ F_0

PA = E[Z_∞; A] = E[exp(iN_∞ + ½[N]_∞); A] = E[e^{iξ}; A] e^{‖h‖²/2}.

Since u_1, ..., u_n were arbitrary, we conclude from the uniqueness theorem for characteristic functions that the random vector (M^{h_1}_∞, ..., M^{h_n}_∞) is independent of F_0 and centered Gaussian with covariances ⟨h_j, h_k⟩. It is now easy to construct a process η with the stated properties. □

As a first application, we may establish the following basic martingale characterization of Brownian motion.

Theorem 18.3 (characterization of Brownian motion, Lévy) Let B = (B¹, ..., B^d) be a process in ℝ^d with B_0 = 0. Then B is an F-Brownian motion iff it is a continuous local F-martingale with [Bⁱ, Bʲ]_t ≡ δ_{ij} t a.s.

Proof: For fixed s < t, we may apply Lemma 18.2 to the continuous local martingales Mⁱ_r = Bⁱ_{r∧t} − Bⁱ_{r∧s}, r ≥ s, i = 1, ..., d, to see that the differences Bⁱ_t − Bⁱ_s are i.i.d. N(0, t − s) and independent of F_s. □

The last theorem suggests the possibility of transforming an arbitrary continuous local martingale M into a Brownian motion through a suitable random time-change. The proposed result is indeed true and admits a natural extension to higher dimensions; for convenience, we consider directly the version in ℝ^d. A continuous local martingale M = (M¹, ..., M^d) is said to be isotropic if a.s. [Mⁱ] = [Mʲ] and [Mⁱ, Mʲ] = 0 for all i ≠ j. Note in particular that this holds for Brownian motion in ℝ^d. When M is a continuous local martingale in ℂ, the condition is clearly equivalent to [M] = 0 a.s., or [ℜM] = [ℑM] and [ℜM, ℑM] = 0 a.s. For isotropic processes M, we refer to [M¹] = ⋯ = [M^d] or [ℜM] = [ℑM] as the rate process of M.

The proof is straightforward when [M]_∞ = ∞ a.s., but in general it requires a rather subtle extension of the filtered probability space. To simplify our statements, we assume the existence of any requested randomization variables. This can always be achieved, as in the elementary context of Chapter 6, by passing from the original setup (Ω, A, F, P) to the product space (Ω̂, Â, F̂, P̂), where Ω̂ = Ω × [0, 1], Â = A ⊗ B, F̂_t = F_t × [0, 1], and P̂ = P ⊗ λ.
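Lévy's characterization (Theorem 18.3) can be sanity-checked numerically (a sketch; grid and horizon are arbitrary choices of ours): for a simulated planar Brownian motion the discretized brackets satisfy [Bⁱ]_t ≈ t and [B¹, B²]_t ≈ 0, the two conditions that together pin down the Brownian law.

```python
import math
import random

random.seed(4)
N, t = 20000, 1.0
sd = math.sqrt(t / N)

qv1 = qv2 = cross = 0.0
for _ in range(N):
    d1 = random.gauss(0.0, sd)   # increment of B^1
    d2 = random.gauss(0.0, sd)   # independent increment of B^2
    qv1 += d1 * d1               # approximates [B^1]_t
    qv2 += d2 * d2               # approximates [B^2]_t
    cross += d1 * d2             # approximates [B^1, B^2]_t
```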
Given two filtrations F and G on Ω, we say that G is a standard extension of F if F_t ⊂ G_t ⊥⊥_{F_t} F for all t ≥ 0. This is precisely the condition needed to ensure that all adaptedness and conditioning properties will be preserved. The notion is still flexible enough to admit a variety of useful constructions.

Theorem 18.4 (time-change reduction, Dambis, Dubins and Schwarz) Let M be an isotropic continuous local F-martingale in ℝ^d with M_0 = 0, and define

τ_s = inf{t ≥ 0; [M¹]_t > s}, G_s = F_{τ_s}, s ≥ 0.

Then there exists in ℝ^d a Brownian motion B with respect to a standard extension of G, such that a.s. B = M∘τ on [0, [M¹]_∞) and M = B∘[M¹].

Proof: We may take d = 1, the proof in higher dimensions being similar. Introduce a Brownian motion X ⊥⊥ F with induced filtration X, and put Ĝ_s = G_s ∨ X_s. Since G ⊥⊥ X, it is clear that Ĝ is a standard extension of both
G and X. In particular, X remains a Brownian motion under Ĝ. Now define

B_s = M_{τ_s} + ∫₀ˢ 1{τ_r = ∞} dX_r, s ≥ 0. (2)

Since M is τ-continuous by Proposition 17.6, Theorem 17.24 shows that the first term M∘τ is a continuous G-martingale, hence also a Ĝ-martingale, with quadratic variation [M∘τ]_s = [M]_{τ_s} = s ∧ [M]_∞, s ≥ 0. The second term in (2) has quadratic variation s − s ∧ [M]_∞, and the covariation vanishes since M∘τ ⊥⊥ X. Thus, [B]_s = s a.s., and so Theorem 18.3 shows that B is a Ĝ-Brownian motion. Finally, B_s = M_{τ_s} for s < [M]_∞, which implies M = B∘[M] a.s. by the τ-continuity of M. □

In two dimensions, isotropic martingales arise naturally through the composition of a complex Brownian motion B with an arbitrary (possibly multi-valued) analytic function f. For a general continuous process X, we may clearly choose a continuous evolution of f(X), as long as X avoids the possible singularities of f. Similar results are available for harmonic functions, which is especially useful in dimensions d ≥ 3, when no analytic functions exist.

Theorem 18.5 (harmonic and analytic maps, Lévy)
(i) Let M be an isotropic, continuous local martingale in ℝ^d, and fix a harmonic function f such that M a.s. avoids the singularities of f. Then f(M) is a local martingale with [f(M)] = |∇f(M)|²·[M¹].
(ii) Let M be a complex, isotropic, continuous local martingale, and fix an analytic function f such that M a.s. avoids the singularities of f. Then f(M) is again an isotropic local martingale, and [ℜf(M)] = |f′(M)|²·[ℜM]. If B is a Brownian motion and f′ ≢ 0, then [ℜf(B)] is a.s. unbounded and strictly increasing.

Proof: (i) Using the isotropy of M, we get by Corollary 17.19

f(M) = f(M_0) + Σᵢ f′ᵢ(M)·Mⁱ + ½Δf(M)·[M¹].

Here the last term vanishes since f is harmonic, and so f(M) is a local martingale. From the isotropy of M we further obtain

[f(M)] = Σᵢ [f′ᵢ(M)·Mⁱ] = Σᵢ (f′ᵢ(M))²·[M¹] = |∇f(M)|²·[M¹].
(ii) Since f is analytic, we get by Corollary 17.20

f(M) = f(M_0) + f′(M)·M + ½f″(M)·[M]. (3)

Here the last term vanishes since M is isotropic. The same property also yields

[f(M)] = [f′(M)·M] = (f′(M))²·[M] = 0,
and so f(M) is again isotropic. Finally, writing M = X + iY and f′(M) = U + iV, we get

[ℜf(M)] = [U·X − V·Y] = (U² + V²)·[X] = |f′(M)|²·[ℜM].

If f′ is not identically 0, it has at most countably many zeros. Hence, by Fubini's theorem

E λ{t ≥ 0; f′(B_t) = 0} = ∫₀^∞ P{f′(B_t) = 0} dt = 0,

and so [ℜf(B)] = |f′(B)|²·λ is a.s. strictly increasing. To see that it is also a.s. unbounded, we note that f(B) converges a.s. on the set {[ℜf(B)]_∞ < ∞}. However, f(B) diverges a.s. since f is nonconstant and the random walk B_0, B_1, ... is recurrent by Theorem 9.2. □

Combining the last two results, we may derive two basic properties of Brownian motion in ℝ^d, namely the polarity of singleton sets when d ≥ 2 and the transience when d ≥ 3. Note that the latter property is a continuous-time counterpart of Theorem 9.8 for random walks. Both properties play important roles for the potential theory developed in Chapter 24. Define τ_a = inf{t > 0; B_t = a}.

Theorem 18.6 (point polarity and transience, Lévy, Kakutani) For a Brownian motion B in ℝ^d, we have the following:
(i) If d ≥ 2, then τ_a = ∞ a.s. for all a ∈ ℝ^d.
(ii) If d ≥ 3, then |B_t| → ∞ a.s. as t → ∞.

Proof: (i) Here we may clearly take d = 2, so we may let B be a complex Brownian motion. Applying Theorem 18.5 (ii) to the entire function e^z, it is seen that M = e^B is a conformal local martingale with unbounded rate [ℜM]. By Theorem 18.4 we have M − 1 = X∘[ℜM] a.s. for some Brownian motion X, and since M ≠ 0 it follows that X a.s. avoids −1. Hence, τ₋₁ = ∞ a.s., and by the scaling and rotational symmetries of B we get τ_a = ∞ a.s. for every a ≠ 0. To extend the result to a = 0, we may conclude from the Markov property at h > 0 that

P_0{τ_0 ∘ θ_h < ∞} = E_0 P_{B_h}{τ_0 < ∞} = 0, h > 0.

As h ↓ 0, we get P_0{τ_0 < ∞} = 0, and so τ_0 = ∞ a.s.

(ii) Here we may take d = 3. For any a ≠ 0 we have τ_a = ∞ a.s.
by claim (i), and so by Theorem 18.5 (i) the process M = |B − a|⁻¹ is a continuous local martingale. By Fatou's lemma M is then an L¹-bounded supermartingale, and so by Theorem 7.18 it converges a.s. toward some random variable ξ. Since M_t → 0 in probability, we have ξ = 0 a.s. □

Combining part (i) of the last result with Theorem 19.11, we note that a complex, isotropic continuous local martingale avoids every fixed point outside the origin. Thus, Theorem 18.5 (ii) applies to any analytic function f with only isolated singularities. Since f is allowed to be multi-valued,
18. Continuous Martingales and Brownian Motion 355 the result applies even to functions with essential singularities, such as to j(z) == loge! + z). For a simple application, we may consider the windings of planar Brownian motion around a fixed point. Corollary 18.7 (skew-product representation, Galmarino) Let B be a complex Brownian motion starting at 1, and choose a continuous version of V == argB with Va == O. Then Vi = Y 0 (IBI- 2 . A)t a.s. for some real Brownian motion Y lLIBI. Proof: Applying Theorem 18.5 (ii) with fez) == log(1 + z), we note that Mt == log IBtl + ivt is an isotropic martingale with rate [RM] == IBI- 2 . A. Hence, by Theorem 18.4 there exists some complex Brownian motion Z == X + iY with M == Z 0 [M] a.s., and the assertion £0110'\\1-8. 0 For a nonisotropic continuous local martingale M in d, there is no single random time-change that will reduce the process to a Brownian motion. However, we may transform each component M i separately, as in Theorem 18.4, to obtain a collection of one-dimensional Brownian mo- tions B I ,..., B d . If the latter processes happen to be independent, they may clearly be combined into a d-dimensional Brownian motion B == (B I , . . . , B d ). It is remarkable that the required independence arises au- tomatically whenever the original components M i are strongly orthogonal, in the sense that [M i , Mj) == 0 a.s. for all i =1= j. Proposition 18.8 (orthogonality and independence, Knight) Let M I , M 2 , . .. be strongly orthogonal, continuous local martingales starting at O. Then there exist some independent Brownian motions B 1 , B 2 , . .. such that M k == B k 0 [M k ] a.s. for every k. Proof: When [Mk]oo == 00 a.s. for all k, the result is an easy consequence of Lemma 18.2. In general, we may introduce a sequence of independent Brownian motions Xl, x 2 , . . . lL F with induced filtration X. Define B: == M k (,:) + Xk((s - [Mk]oo)+), s > 0, kEN, write'l.jJt == -log(1 - t)+, and put 9t == :F1/Jt + X(t-l)+, t > O. 
To check that B¹, B², ... have the desired joint distribution, we may clearly assume the [M^k] to be bounded. Then the processes N^k_t = M^k_{ψ_t} + X^k_{(t−1)₊} are strongly orthogonal, continuous G-martingales with quadratic variations [N^k]_t = [M^k]_{ψ_t} + (t − 1)₊, and we note that B^k_s = N^k(σ^k_s), where σ^k_s = inf{t ≥ 0; [N^k]_t > s}. The assertion now follows from the result for [M^k]_∞ = ∞ a.s. □

As a further application of Lemma 18.2, we consider a simple continuous-time version of Theorem 11.13. Given a continuous semimartingale X on I = ℝ₊ or [0, 1] and a progressive process T on I that takes values in Ī = [0, ∞] or [0, 1], respectively, we may define

(X∘T⁻¹)_t = ∫ 1{T_s ≤ t} dX_s, t ∈ I,
as long as the integrals on the right exist. For motivation, we note that if ξ is a random measure on I with "distribution function" X_t = ξ[0, t], t ∈ I, then X∘T⁻¹ is the distribution function of the transformed measure ξ∘T⁻¹.

Proposition 18.9 (measure-preserving progressive maps) Let B be a Brownian motion or bridge on I = ℝ₊ or [0, 1], respectively, and let T be a progressive process on I such that λ∘T⁻¹ = λ a.s. Then B∘T⁻¹ =d B.

Proof: The result for I = ℝ₊ is an immediate consequence of Lemma 18.2, and so we may assume that B is a Brownian bridge on [0, 1]. Then M_t = B_t/(1 − t) is a martingale on [0, 1), and therefore B is a semimartingale on the same interval. Integrating by parts gives

dB_t = (1 − t) dM_t − M_t dt ≡ dX_t − M_t dt. (4)

Thus, [X]_t = [B]_t = t a.s. for all t, and X is a Brownian motion by Theorem 18.3. Now let V be a bounded, progressive process on [0, 1] such that the integral V̄ = ∫₀¹ V_t dt is a.s. nonrandom. Integrating by parts, we get for any u ∈ [0, 1)

∫₀ᵘ V_t M_t dt = M_u ∫₀ᵘ V_t dt − ∫₀ᵘ dM_t ∫₀ᵗ V_s ds = ∫₀ᵘ dM_t ∫ₜ¹ V_s ds − M_u ∫ᵤ¹ V_t dt.

As u → 1, we have (1 − u)M_u = B_u → 0, and so the last term tends to 0. Hence, by dominated convergence and (4),

∫₀¹ V_t dB_t = ∫₀¹ V_t dX_t − ∫₀¹ V_t M_t dt = ∫₀¹ (V_t − V̂_t) dX_t,

where V̂_t = (1 − t)⁻¹ ∫ₜ¹ V_s ds. If U is another bounded, progressive process, we get by a simple calculation

∫₀¹ (U_t − Û_t)(V_t − V̂_t) dt = ∫₀¹ U_t V_t dt − Ū V̄.

For U_r = 1{T_r ≤ s} and V_r = 1{T_r ≤ t}, the right-hand side becomes s ∧ t − st = E(B_s B_t), and the assertion follows by Lemma 18.2. □

We turn to a basic representation of martingales with respect to a Brownian filtration.
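Proposition 18.9 can also be illustrated concretely (a numerical sketch; the particular map and grid are our own choices): take the deterministic Lebesgue-measure-preserving map T that swaps [0, 1) and [1, 2). Then (B∘T⁻¹)_t = ∫ 1{T_s ≤ t} dB_s simply reassembles the increments of B, so that (B∘T⁻¹)_1 = B_2 − B_1 and (B∘T⁻¹)_2 = B_2, again increments of a Brownian motion.

```python
import math
import random

random.seed(5)
N = 4000                          # grid points on [0, 2]
dt = 2.0 / N
sd = math.sqrt(dt)

def T(s):
    """Swap [0,1) and [1,2); leaves Lebesgue measure invariant."""
    if s < 1.0:
        return s + 1.0
    if s < 2.0:
        return s - 1.0
    return s

db = [random.gauss(0.0, sd) for _ in range(N)]
grid = [k * dt for k in range(N)]
B1 = sum(db[: N // 2])            # B_1
B2 = sum(db)                      # B_2

def transformed(t):
    """Left-endpoint discretization of (B ∘ T^{-1})_t = ∫ 1{T_s <= t} dB_s."""
    return sum(d for s, d in zip(grid, db) if T(s) <= t)

y1 = transformed(1.0)             # ~ B_2 - B_1 pathwise (up to one boundary increment)
y2 = transformed(2.0)             # equals B_2 exactly
```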
Theorem 18.10 (Brownian martingales) Let F be the complete filtration induced by a Brownian motion B = (B¹, ..., B^d) in ℝ^d. Then any local F-martingale M is a.s. continuous, and there exist some (P ⊗ λ)-a.e. unique processes V¹, ..., V^d ∈ L(B¹) such that

M = M_0 + Σ_{k≤d} V^k·B^k a.s. (5)

The statement is essentially equivalent to the following representation of Brownian functionals, which we prove first.

Lemma 18.11 (Brownian functionals, Itô) Let B = (B¹, ..., B^d) be a Brownian motion in ℝ^d, and fix any B-measurable random variable ξ ∈ L² with Eξ = 0. Then there exist some (P ⊗ λ)-a.e. unique processes V¹, ..., V^d ∈ L(B¹) such that ξ = Σ_k (V^k·B^k)_∞ a.s.

Proof (Dellacherie): Let H denote the Hilbert space of B-measurable random variables ξ ∈ L² with Eξ = 0, and write K for the subspace of elements ξ admitting the desired representation Σ_k (V^k·B^k)_∞. For such a ξ we get Eξ² = E Σ_k ((V^k)²·λ)_∞, which implies the asserted uniqueness. By the obvious completeness of L(B¹), it is further seen from the same formula that K is closed. To obtain K = H, we need to show that any ξ ∈ H ⊖ K vanishes a.s.

Then fix any nonrandom functions u¹, ..., u^d ∈ L²(ℝ₊). Put M = Σ_k u^k·B^k, and define the process Z as in Lemma 18.1. Then Z − 1 = iZ·M = i Σ_k (Zu^k)·B^k by Proposition 17.14, and so ξ ⊥ (Z_∞ − 1), or

E ξ exp{i Σ_k (u^k·B^k)_∞} = 0.

Specializing to step functions u^k and using the uniqueness theorem for characteristic functions, we get

E[ξ; (B_{t_1}, ..., B_{t_n}) ∈ C] = 0 for all t_1, ..., t_n ∈ ℝ₊, Borel sets C, and n ∈ ℕ.

By a monotone class argument this extends to E[ξ; A] = 0 for arbitrary A ∈ F_∞, and so ξ = E[ξ | F_∞] = 0 a.s. □

Proof of Theorem 18.10: We may clearly take M_0 = 0, and by suitable localization we may assume that M is uniformly integrable. Then M_∞ exists in L¹(F_∞) and may be approximated in L¹ by some random variables ξ_1, ξ_2, ... ∈ L²(F_∞).
The martingales Mⁿ_t = E[ξ_n | F_t] are a.s. continuous by Lemma 18.11, and by Proposition 7.15 we get, for any ε > 0,

P{(ΔM)* > 2ε} ≤ P{(Mⁿ − M)* > ε} ≤ ε⁻¹ E|ξ_n − M_∞| → 0.

Hence, (ΔM)* = 0 a.s., and so M is a.s. continuous. The remaining assertions now follow by localization from Lemma 18.11. □

Our next theorem deals with the converse problem of finding a Brownian motion B satisfying (5) when the representing processes V^k are given. The result plays a crucial role in Chapter 21.
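For a concrete instance of Lemma 18.11 (a numerical sketch; the particular functional and grid are our own choices), the functional ξ = B_1³ has Eξ = 0 and representing process V_t = 3B_t² + 3(1 − t): indeed E[B_1³ | F_t] = B_t³ + 3B_t(1 − t), and Itô's formula turns this martingale into a stochastic integral with exactly that integrand.

```python
import math
import random

random.seed(6)
paths, N = 200, 4000
dt = 1.0 / N
sd = math.sqrt(dt)

errs = []
for _ in range(paths):
    b, integral = 0.0, 0.0
    for k in range(N):
        t = k * dt
        v = 3.0 * b * b + 3.0 * (1.0 - t)   # V_t = 3 B_t^2 + 3(1 - t)
        db = random.gauss(0.0, sd)
        integral += v * db                  # left-endpoint Ito sum of (V . B)
        b += db
    errs.append(abs(integral - b ** 3))     # compare with xi = B_1^3
mean_err = sum(errs) / paths
```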
Theorem 18.12 (integral representation, Doob) Let M be a continuous local F-martingale in ℝ^d with M_0 = 0 such that [Mⁱ, Mʲ] = (VV′)_{ij}·λ a.s., 1 ≤ i, j ≤ d, for some F-progressive process V taking values in the space of d×n matrices. Then there exists in ℝⁿ a Brownian motion B with respect to a standard extension of F such that M = V·B a.s.

Proof: For any t ≥ 0, let N_t and R_t be the null and range spaces of the matrix V_t, and write N_t^⊥ and R_t^⊥ for their orthogonal complements. Denote the corresponding orthogonal projections by π_{N_t}, π_{R_t}, π_{N_t^⊥}, and π_{R_t^⊥}, respectively. Note that V_t is a bijection from N_t^⊥ to R_t, and write V_t⁻¹ for the inverse mapping from R_t to N_t^⊥. All these mappings are clearly Borel-measurable functions of V_t, and hence again progressive.

Now introduce a Brownian motion X ⊥⊥ F in ℝⁿ with induced filtration X, and note that G_t = F_t ∨ X_t, t ≥ 0, is a standard extension of both F and X. Thus, V remains G-progressive, and the martingale properties of M and X are still valid for G. Consider in ℝⁿ the local G-martingale

B = V⁻¹π_R·M + π_N·X.

The covariation matrix of B has density

(V⁻¹π_R)VV′(V⁻¹π_R)′ + π_N π_N′ = π_{N^⊥} π_{N^⊥}′ + π_N π_N′ = π_{N^⊥} + π_N = I,

and so Theorem 18.3 shows that B is a Brownian motion. Furthermore, the process π_{R^⊥}·M = 0 vanishes a.s., since its covariation matrix has density π_{R^⊥} VV′ π_{R^⊥}′ = 0. Hence, by Proposition 17.14,

V·B = VV⁻¹π_R·M + Vπ_N·X = π_R·M = (π_R + π_{R^⊥})·M = M. □

We may next prove a Fubini-type theorem, which shows how the multiple Wiener–Itô integrals defined in Chapter 13 can be expressed in terms of iterated Itô integrals. Then introduce for each n ∈ ℕ the simplex

Δ_n = {(t_1, ..., t_n) ∈ ℝ₊ⁿ; t_1 < ⋯ < t_n}.

Given a function f ∈ L²(ℝ₊ⁿ, λⁿ), we write f̃ = n! f̂ 1_{Δ_n}, where f̂ denotes the symmetrization of f defined in Chapter 13.
Theorem 18.13 (multiple and iterated integrals) Consider a Brownian motion B in ℝ with associated multiple Wiener–Itô integrals I_n, and fix any f ∈ L²(ℝ₊ⁿ). Then

I_n f = ∫ dB_{t_n} ∫ dB_{t_{n−1}} ⋯ ∫ f̃(t_1, ..., t_n) dB_{t_1} a.s. (6)

Though a formal verification is easy, the existence of the integrals on the right depends in a subtle way on the possibility of choosing suitable versions in each step. The existence of such versions is implicitly regarded as part of the assertion.
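The case n = 3 with f = 1 on [0, 1]³ can be traced concretely (a numerical sketch; grid and sample sizes are our own choices): here f̂ = f and f̃ = 3!·1_{Δ₃}, the innermost integral gives ∫ 6 dB = 6B_t, the middle one gives 3(B_t² − t), and the outermost gives I₃f = B_1³ − 3B_1, the familiar third Hermite-type polynomial of B_1.

```python
import math
import random

random.seed(7)
paths, N = 200, 4000
dt = 1.0 / N
sd = math.sqrt(dt)

errs = []
for _ in range(paths):
    b = i2 = i3 = 0.0
    for _ in range(N):
        db = random.gauss(0.0, sd)
        i1 = 6.0 * b               # innermost layer: ∫ 6 dB = 6 B_t (left endpoint)
        i3 += i2 * db              # outermost layer uses the running middle layer
        i2 += i1 * db              # middle layer uses the running innermost layer
        b += db
    errs.append(abs(i3 - (b ** 3 - 3.0 * b)))   # compare with B_1^3 - 3 B_1
mean_err = sum(errs) / paths
```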
Proof: We shall prove by induction that the iterated integral

V^k_{t_{k+1}, ..., t_n} = ∫ dB_{t_k} ∫ dB_{t_{k−1}} ⋯ ∫ f̃(t_1, ..., t_n) dB_{t_1}

exists for almost all t_{k+1}, ..., t_n, and that V^k has a version supported by Δ_{n−k} that is progressive as a process in t_{k+1} with parameters t_{k+2}, ..., t_n. Furthermore, we shall establish the relation

E (V^k_{t_{k+1}, ..., t_n})² = ∫⋯∫ {f̃(t_1, ..., t_n)}² dt_1 ⋯ dt_k. (7)

This allows us, in the next step, to define V^{k+1}_{t_{k+2}, ..., t_n} for almost all t_{k+2}, ..., t_n.

The integral V⁰ = f̃ clearly has the stated properties. Now assume that a version of the integral V^{k−1}_{t_k, ..., t_n} has been constructed with the desired properties. For any t_{k+1}, ..., t_n such that (7) is finite, Theorem 17.25 shows that the process

X^k_{t, t_{k+1}, ..., t_n} = ∫₀ᵗ V^{k−1}_{t_k, ..., t_n} dB_{t_k}, t ≥ 0,

has a progressive version that is a.s. continuous in t for fixed t_{k+1}, ..., t_n. By Proposition 17.15 we obtain

V^k_{t_{k+1}, ..., t_n} = X^k_{t_{k+1}, t_{k+1}, ..., t_n} a.s., t_{k+1}, ..., t_n ≥ 0,

and the progressivity clearly carries over to V^k, regarded as a process in t_{k+1} with parameters t_{k+2}, ..., t_n. Since V^{k−1} is supported by Δ_{n−k+1}, we may choose X^k to be supported by ℝ₊ × Δ_{n−k}, which ensures V^k to be supported by Δ_{n−k}. Finally, equation (7) for V^{k−1} yields

E (V^k_{t_{k+1}, ..., t_n})² = E ∫ (V^{k−1}_{t_k, ..., t_n})² dt_k = ∫⋯∫ {f̃(t_1, ..., t_n)}² dt_1 ⋯ dt_k.

To prove (6), we note that the right-hand side is linear and L²-continuous in f. Furthermore, the two sides agree for indicator functions of rectangular boxes in Δ_n. The relation extends by a monotone class argument to arbitrary indicator functions in Δ_n, and the further extension to L²(Δ_n) is immediate. It remains to note that I_n f = I_n f̂ = I_n f̃ for any f ∈ L²(ℝ₊ⁿ).
□

Our previous developments have provided two entirely different representations of Brownian functionals with zero mean and finite variance, namely the chaos expansion in Theorem 13.26 and the stochastic integral representation in Lemma 18.11. We proceed to examine how the two formulas are related. For any function f ∈ L²(ℝ₊ⁿ), we define

f̃_t(t_1, ..., t_{n−1}) = f̃(t_1, ..., t_{n−1}, t),

and write I_{n−1}f̃(t) = I_{n−1}f̃_t when ‖f̃_t‖ < ∞.
Proposition 18.14 (chaos and integral representations) Fix a Brownian motion B in ℝ, and let ξ be a B-measurable random variable with chaos expansion Σ_{n≥1} I_n f_n. Then ξ = (V·B)_∞ a.s., where

V_t = Σ_{n≥1} I_{n−1}f̃_n(t), t ≥ 0.

Proof: For any m ∈ ℕ we get, as in the last proof,

∫ dt Σ_{n≥m} E{I_{n−1}f̃_n(t)}² = Σ_{n≥m} ‖f̃_n‖² = Σ_{n≥m} E(I_n f_n)² < ∞. (8)

Since integrals I_n f with different n are orthogonal, it follows that the series for V_t converges in L² for almost every t ≥ 0. On the exceptional set we may redefine V_t to be 0. As before, we may choose progressive versions of the integrals I_{n−1}f̃_n(t), and from the proof of Corollary 4.32 it is clear that even the sum V can be chosen to be progressive. Applying (8) with m = 1, we then obtain V ∈ L(B).

Using Theorem 18.13, we get by a formal calculation

ξ = Σ_{n≥1} I_n f_n = Σ_{n≥1} ∫ I_{n−1}f̃_n(t) dB_t = ∫ dB_t Σ_{n≥1} I_{n−1}f̃_n(t) = ∫ V_t dB_t.

To justify the interchange of integration and summation, we may use (8) and conclude as m → ∞ that

E {∫ dB_t Σ_{n≥m} I_{n−1}f̃_n(t)}² = ∫ dt Σ_{n≥m} E{I_{n−1}f̃_n(t)}² = Σ_{n≥m} E(I_n f_n)² → 0. □

Let us now consider two different probability measures P and Q on the same measurable space (Ω, A), equipped with a right-continuous and P-complete filtration (F_t). If Q ≪ P on F_t, we denote the corresponding density by Z_t, so that Q = Z_t·P on F_t. Since the martingale property depends on the choice of probability measure, we need to distinguish between P-martingales and Q-martingales. Integration with respect to P is denoted by E as usual, and we write F_∞ = ⋁_t F_t.

Lemma 18.15 (absolute continuity) Let Q = Z_t·P on F_t for all t ≥ 0. Then Z is a P-martingale, and it is further uniformly integrable iff Q ≪ P on F_∞. More generally, an adapted process X is a Q-martingale iff XZ is a P-martingale.

Proof: For any adapted process X, we note that X_t is Q-integrable iff X_t Z_t is P-integrable.
If this holds for all t, we may write the Q-martingale property of X as

∫_A X_s dQ = ∫_A X_t dQ, A ∈ F_s, s ≤ t.
By the definition of Z, it is equivalent that

E[X_s Z_s; A] = E[X_t Z_t; A], A ∈ F_s, s ≤ t,

which means that XZ is a P-martingale. This proves the last assertion, and the first statement follows as we take X_t ≡ 1.

Next assume that Z is uniformly P-integrable, say with L¹-limit Z_∞. For any t ≤ u and A ∈ F_t we have QA = E[Z_u; A]. As u → ∞, it follows that QA = E[Z_∞; A], which extends by a monotone class argument to arbitrary A ∈ F_∞. Thus, Q = Z_∞·P on F_∞. Conversely, if Q = ξ·P on F_∞, then Eξ = 1, and the P-martingale M_t = E[ξ | F_t] satisfies Q = M_t·P on F_t for each t. But then Z_t = M_t a.s. for each t, and Z is uniformly P-integrable with limit ξ. □

By the last lemma and Theorem 7.27, we may henceforth assume that the density process Z is rcll. The basic properties may then be extended to optional times and local martingales as follows.

Lemma 18.16 (localization) Let Q = Z_t·P on F_t for all t ≥ 0. Then for any optional time τ, we have

Q = Z_τ·P on F_τ ∩ {τ < ∞}. (9)

Furthermore, an adapted rcll process X is a local Q-martingale iff XZ is a local P-martingale.

Proof: By optional sampling,

QA = E[Z_{τ∧t}; A], A ∈ F_{τ∧t}, t ≥ 0,

and so

Q[A; τ ≤ t] = E[Z_τ; A ∩ {τ ≤ t}], A ∈ F_τ, t ≥ 0.

Equation (9) now follows by monotone convergence as t → ∞. To prove the last assertion, it is enough to show for any optional time τ that X^τ is a Q-martingale iff (XZ)^τ is a P-martingale. This may be seen as before if we note that Q = Z^τ_t·P on F_{τ∧t} for each t. □

We also need the following positivity property.

Lemma 18.17 (positivity) For every t ≥ 0 we have inf_{s≤t} Z_s > 0 a.s. Q.

Proof: By Lemma 7.31 it is enough to show for each t ≥ 0 that Z_t > 0 a.s. Q. This is clear from the fact that Q{Z_t = 0} = E[Z_t; Z_t = 0] = 0. □

In typical applications, the measure Q is not given at the outset but needs to be constructed from the martingale Z. This requires some regularity conditions on the underlying probability space.
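The effect of such a change of measure can be previewed with a weighted Monte Carlo sketch (the drift value and sample size are arbitrary choices of ours): reweighting Brownian samples by the exponential density Z_1 = exp(cB_1 − c²/2) reproduces, under the new measure, the statistics of Brownian motion with drift c, in line with the Girsanov and Cameron–Martin theorems of this chapter.

```python
import math
import random

random.seed(8)
n, c = 10000, 0.5
b1 = [random.gauss(0.0, 1.0) for _ in range(n)]       # samples of B_1 under P
z = [math.exp(c * x - 0.5 * c * c) for x in b1]       # density Z_1 = exp(c B_1 - c^2/2)

mean_P = sum(b1) / n                                  # E_P B_1 ~ 0
mean_Z = sum(z) / n                                   # E_P Z_1 ~ 1 (unit-mean martingale)
mean_Q = sum(zi * xi for zi, xi in zip(z, b1)) / n    # E_Q B_1 = E_P[Z_1 B_1] ~ c
```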
Lemma 18.18 (existence) For any Polish space S, let P be a probability measure on Ω = D(ℝ₊, S), endowed with the right-continuous and complete induced filtration F. Consider an F-martingale Z ≥ 0 with Z_0 = 1. Then there exists a probability measure Q on Ω with Q = Z_t·P on F_t for all t ≥ 0.

Proof: For each t ≥ 0, we may introduce the probability measure Q_t = Z_t·P on F_t, which may be regarded as a measure on D([0, t], S). Since the spaces D([0, t], S) are Polish for the Skorohod topology, Corollary 6.15 ensures the existence of some probability measure Q on D(ℝ₊, S) with projections Q_t. It is easy to verify that Q has the stated properties. □

The following basic result shows how the drift term of a continuous semimartingale is transformed under a change of measure with a continuous density Z. An extension appears in Theorem 26.9.

Theorem 18.19 (transformation of drift, Girsanov, van Schuppen and Wong) Let Q = Z_t·P on F_t for each t ≥ 0, where Z is a.s. continuous. Then for any continuous local P-martingale M, the process M̃ = M − Z⁻¹·[M, Z] is a local Q-martingale.

Proof: First assume that Z⁻¹ is bounded on the support of [M]. Then M̃ is a continuous P-semimartingale, and we get by Proposition 17.14 and an integration by parts

M̃Z − (M̃Z)_0 = M̃·Z + Z·M̃ + [M̃, Z] = M̃·Z + Z·M − [M, Z] + [M, Z] = M̃·Z + Z·M,

which shows that M̃Z is a local P-martingale. Hence, M̃ is a local Q-martingale by Lemma 18.16.

For general Z, we may define τ_n = inf{t ≥ 0; Z_t < 1/n} and conclude as before that M̃^{τ_n} is a local Q-martingale for each n ∈ ℕ. Since τ_n → ∞ a.s. Q by Lemma 18.17, it follows by Lemma 17.1 that M̃ is a local Q-martingale. □

The next result shows how the basic notions of stochastic calculus are preserved under a change of measure. Here [X]_P denotes the quadratic variation of X under the probability measure P. We further write L_P(X) for the class of X-integrable processes V under P, and let (V·X)_P be the corresponding stochastic integral.

Proposition 18.20 (preservation laws) Let Q = Z_t·P on F_t for each t ≥ 0, where Z is continuous. Then any continuous P-semimartingale X is also a Q-semimartingale, and [X]_P = [X]_Q a.s. Q. Furthermore, L_P(X) ⊂ L_Q(X), and for any V ∈ L_P(X) we have (V·X)_P = (V·X)_Q a.s. Q. Finally, any continuous local P-martingale M satisfies (V·M)~ = V·M̃ a.s. Q whenever either side exists.
Proof: Consider a continuous P-semimartingale X = M + A, where M is a continuous local P-martingale and A is a process of locally finite variation. Under Q we may write X = M̃ + Z⁻¹ · [M, Z] + A, where M̃ is the continuous local Q-martingale of Theorem 18.19, and we note that Z⁻¹ · [M, Z] has locally finite variation since Z > 0 a.s. Q by Lemma 18.17. Thus, X is also a Q-semimartingale. The statement for [X] is now clear from Proposition 17.17.

Now assume that V ∈ L_P(X). Then V² ∈ L_P([X]) and V ∈ L_P(A), so the same relations hold under Q, and we get V ∈ L_Q(M̃ + A). Thus, to get V ∈ L_Q(X), it remains to show that V ∈ L_Q(Z⁻¹ · [M, Z]). Since Z > 0 under Q, it is equivalent to show that V ∈ L_Q([M, Z]). But this is clear by Proposition 17.9, since [M̃, Z] = [M, Z] a.s. Q and V ∈ L_Q(M̃).

To prove the last assertion, we note as before that L_Q(M) = L_Q(M̃). If V belongs to either class, then by Proposition 17.14 we get under Q the a.s. relations

    (V · M)~ = V · M − Z⁻¹ · [V · M, Z] = V · M − VZ⁻¹ · [M, Z] = V · M̃. □

In particular, we note that if B is a P-Brownian motion in ℝ^d, then B̃ is a Q-Brownian motion by Theorem 18.3, since both processes are continuous martingales with the same covariation process.

The preceding theory simplifies when P and Q are equivalent on each F_t, since in that case Z > 0 a.s. P by Lemma 18.17. If Z is also continuous, it may be expressed as an exponential martingale. More general processes of this type are considered in Theorem 26.8.

Lemma 18.21 (real exponential martingales) A continuous process Z > 0 is a local martingale iff it has an a.s. representation

    Z_t = ℰ(M)_t ≡ exp(M_t − ½[M]_t),  t ≥ 0,  (10)

for some continuous local martingale M. In that case M is a.s. unique, and for any continuous local martingale N we have [M, N] = Z⁻¹ · [Z, N].

Proof: If M is a continuous local martingale, then so is ℰ(M) by Itô's formula. Conversely, assume that Z > 0 is a continuous local martingale.
Then by Corollary 17.19,

    log Z − log Z_0 = Z⁻¹ · Z − ½Z⁻² · [Z] = Z⁻¹ · Z − ½[Z⁻¹ · Z],

and (10) follows with M = log Z_0 + Z⁻¹ · Z. The last assertion is clear from this expression, and the uniqueness of M follows from Proposition 17.2. □

We shall now see how Theorem 18.19 can be used to eliminate the drift of a continuous semimartingale, and we begin with the simple case of Brownian motion B with a deterministic drift. Here we need the fact that ℰ(B) is a true martingale, as can be seen most easily by a direct computation. By P ~ Q we mean that P ≪ Q and Q ≪ P. Write L²_loc for the class of functions f: ℝ₊ → ℝ^d such that |f|² is locally Lebesgue integrable. For any f ∈ L²_loc we define f · λ = (f₁ · λ, …, f_d · λ), where the components on the right are ordinary Lebesgue integrals.

Theorem 18.22 (shifted Brownian motion, Cameron and Martin) Let F be the complete filtration induced by canonical Brownian motion B in ℝ^d, fix a continuous function h: ℝ₊ → ℝ^d with h_0 = 0, and write P_h for the distribution of B + h. Then P_h ~ P_0 on F_t for all t ≥ 0 iff h = f · λ for some f ∈ L²_loc, in which case P_h = ℰ(f · B)_t · P_0 on F_t.

Proof: If P_h ~ P_0 on each F_t, then by Lemmas 18.15 and 18.17 there exists some P_0-martingale Z > 0 such that P_h = Z_t · P_0 on F_t for all t ≥ 0. Theorem 18.10 shows that Z is a.s. continuous, and by Lemma 18.21 it can then be written as ℰ(M) for some continuous local P_0-martingale M. Using Theorem 18.10 again, we note that M = V · B = Σ_i V_i · B^i a.s. for some processes V_i ∈ L(B^i), and in particular V ∈ L²_loc a.s. By Theorem 18.19 the process B̃ = B − [B, M] = B − V · λ is a P_h-Brownian motion, and so, under P_h, the canonical process B has two semimartingale decompositions, namely

    B = B̃ + V · λ = (B − h) + h.

By Proposition 17.2 the decomposition is a.s. unique, and so V · λ = h a.s. Thus, h = f · λ for some nonrandom function f ∈ L²_loc, and furthermore λ{t ≥ 0; V_t ≠ f_t} = 0 a.s., which implies M = V · B = f · B a.s.

Conversely, assume that h = f · λ for some f ∈ L²_loc. Since M = f · B is a time-changed Brownian motion under P_0, the process Z = ℰ(M) is a P_0-martingale, and by Lemma 18.18 there exists a probability measure Q on C(ℝ₊, ℝ^d) such that Q = Z_t · P_0 on F_t for all t ≥ 0. Moreover, Theorem 18.19 shows that B̃ = B − [B, M] = B − h is a Q-Brownian motion, which means that Q = P_h. In particular, P_h ~ P_0 on each F_t. □
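For a one-dimensional linear drift h_t = ct, the theorem says dP_h/dP_0 = exp(cB_t − c²t/2) on F_t. The sketch below is a numerical sanity check, not from the book; the quadrature routine and all parameter values are ad hoc. Integrating a test function against this density under P_0 should match integrating it directly under the shifted law.

```python
import math

def gauss_expect(g, t, grid=200_000, span=14.0):
    # E g(B_t) for B_t ~ N(0, t), by midpoint quadrature.
    s = math.sqrt(t)
    lo, hi = -span * s, span * s
    h = (hi - lo) / grid
    total = 0.0
    for i in range(grid):
        x = lo + (i + 0.5) * h
        total += g(x) * math.exp(-x * x / (2 * t)) / math.sqrt(2 * math.pi * t) * h
    return total

def under_density(c, t, f):
    # E_0[ exp(c B_t - c^2 t / 2) f(B_t) ]: the shifted law via the density.
    return gauss_expect(lambda x: math.exp(c * x - c * c * t / 2) * f(x), t)

def under_shift(c, t, f):
    # E f(B_t + c t): the same quantity computed directly under P_h.
    return gauss_expect(lambda x: f(x + c * t), t)
```

With f = cos, both sides equal cos(ct) e^{−t/2}, since E cos(N(m, t)) = cos(m) e^{−t/2}.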
In more general cases, Theorem 18.19 and Lemma 18.21 suggest that we might try to remove the drift of a semimartingale through a change of measure of the form Q = ℰ(M)_t · P on F_t for each t ≥ 0, where M is a continuous local martingale with M_0 = 0. By Lemma 18.15 it is then necessary for Z = ℰ(M) to be a true martingale. This is ensured by the following condition.

Theorem 18.23 (uniform integrability, Novikov) Let M be a continuous local martingale with M_0 = 0 such that E e^{[M]_∞/2} < ∞. Then ℰ(M) is a uniformly integrable martingale.

The result will first be proved in a special case.

Lemma 18.24 (Wald's identity) If B is a real Brownian motion and τ is an optional time with E e^{τ/2} < ∞, then E exp(B_τ − ½τ) = 1.

Proof: We first consider the special optional times

    τ_b = inf{t ≥ 0; B_t = t − b},  b > 0.
Since the τ_b remain optional with respect to the right-continuous, induced filtration, we may assume B to be canonical Brownian motion with associated distribution P = P_0. Defining h_t ≡ t and Z = ℰ(B), we see from Theorem 18.22 that P_h = Z_t · P on F_t for all t ≥ 0. Since τ_b < ∞ a.s. under both P and P_h, Lemma 18.16 yields

    E exp(B_{τ_b} − ½τ_b) = EZ_{τ_b} = E[Z_{τ_b}; τ_b < ∞] = P_h{τ_b < ∞} = 1.

In the general case, the stopped process M_t = Z_{t∧τ_b} is a positive martingale, and Fatou's lemma shows that M is also a supermartingale on [0, ∞]. Since, moreover, EM_∞ = EZ_{τ_b} = 1 = EM_0, it is clear from the Doob decomposition that M is a true martingale on [0, ∞]. Hence, by optional sampling,

    1 = EM_τ = EZ_{τ∧τ_b} = E[Z_τ; τ ≤ τ_b] + E[Z_{τ_b}; τ > τ_b].  (11)

By the definition of τ_b and the hypothesis on τ, we get as b → ∞

    E[Z_{τ_b}; τ > τ_b] = e^{−b} E[e^{τ_b/2}; τ > τ_b] ≤ e^{−b} E e^{τ/2} → 0,

and so the last term in (11) tends to zero. Since, moreover, τ_b → ∞, the first term on the right tends to EZ_τ by monotone convergence, and the desired relation EZ_τ = 1 follows. □

Proof of Theorem 18.23: Since ℰ(M) is always a supermartingale on [0, ∞], it is enough, under the stated condition, to show that Eℰ(M)_∞ = 1. We may then use Theorem 18.4 and Proposition 7.9 to reduce to the statement of Lemma 18.24. □

In particular, we obtain the following classical result for Brownian motion.

Corollary 18.25 (removal of drift, Girsanov) Consider in ℝ^d a Brownian motion B and a progressive process V with E exp{½(|V|² · λ)_∞} < ∞. Then Q = ℰ(V · B)_∞ · P is a probability measure, and B̃ = B − V · λ is a Q-Brownian motion.

Proof: Combine Theorems 18.19 and 18.23. □

Exercises

1. Assume in Theorem 18.4 that [M]_∞ = ∞ a.s. Show that M is τ-continuous in the sense of Theorem 17.24, and use Theorem 18.3 to conclude that B = M ∘ τ is a Brownian motion. Also show for any V ∈ L(M) that (V ∘ τ) · B = (V · M) ∘ τ a.s.

2. If B is a real Brownian motion and V ∈ L(B), then X = V · B is a time-changed Brownian motion. Express the required time-change τ in terms of V, and verify that X is τ-continuous.
3. Let M be a real continuous local martingale. Show that M converges a.s. on the set {sup_t M_t < ∞}. (Hint: Use Theorem 18.4.)

4. Let F and G be filtrations on a common probability space (Ω, A, P). Show that G is a standard extension of F iff every F-martingale is also a G-martingale. (Hint: Consider martingales of the form M_t = E[ξ|F_t], where ξ ∈ L¹(F_∞). Here M_t is G_t-measurable for all ξ iff F_t ⊂ G_t, and then M_t = E[ξ|G_t] a.s. for all ξ iff F_∞ ⊥⊥_{F_t} G_t by Proposition 6.6.)

5. Let F and G be right-continuous filtrations such that G is a standard extension of F, and let τ be an F-optional time. Show that F_τ ⊂ G_τ and G_τ ⊥⊥_{F_τ} F_∞. (Hint: Apply optional sampling to the uniformly integrable martingale M_t = E[ξ|F_t] for any ξ ∈ L¹(F_∞).)

6. Let M be a nontrivial isotropic continuous local martingale in ℝ^d, and fix an affine transformation f on ℝ^d. Show that f(M) is again isotropic iff f is conformal (i.e., the composition of a rigid motion with a change of scale).

7. Deduce Theorem 18.6 (ii) from Theorem 9.8. (Hint: Define τ = inf{t; |B_t| = 1}, and iterate the construction to form a random walk in ℝ^d with steps of size 1.)

8. Deduce Theorem 18.3 for d = 1 from Theorem 14.17. (Hint: Proceed as above to construct a discrete-time martingale with jumps of size h. Let h → 0, and use a version of Proposition 17.17.)

9. Consider a real Brownian motion B and a family of progressive processes V^t ∈ L(B), t ≥ 0. Give necessary and sufficient conditions on the V^t for the existence of a Brownian motion B′, such that B′_t = (V^t · B)_∞ a.s. for each t. Verify the conditions in the case of Proposition 18.9.

10. Extend Proposition 18.9 to any continuous, F-exchangeable process X on ℝ₊ or [0, 1]. (Hint: Recall that X_t = αt + σB_t for some Brownian motion or bridge B and some independent pair of random variables α and σ ≥ 0. Note that X remains exchangeable for the filtration G_t = F_t ∨ σ{α, σ}. Hence, so is B, and we may apply Proposition 18.9.)

11. Use Proposition 18.9 to give direct proofs of the relation τ₁ =ᵈ τ₂ in Theorems 13.16 and 13.17. (Hint: Imitate the proof of Theorem 11.14.)

12. For a Brownian motion B and an optional time τ < ∞, show that E exp(B_τ − ½τ) ≤ 1, where the inequality may be strict. (Hint: Truncate and use Fatou's lemma. Note that ½t − B_t → ∞ a.s. by the law of large numbers.)
Chapter 19

Feller Processes and Semigroups

Semigroups, resolvents, and generators; closure and core; Hille–Yosida theorem; existence and regularization; strong Markov property; characteristic operator; diffusions and elliptic operators; convergence and approximation

Our aim in this chapter is to continue the general discussion of continuous-time Markov processes initiated in Chapter 8. We have already seen several important examples of such processes, such as the pure jump-type processes in Chapter 12, Brownian motion in Chapters 13 and 18, and the general Lévy processes in Chapter 15. The present treatment will be supplemented by detailed studies of ergodic properties in Chapter 20, of diffusions in Chapters 21 and 23, and of excursions and additive functionals in Chapters 22 and 25.

The crucial new idea is to regard the transition kernels as operators T_t on an appropriate function space. The Chapman–Kolmogorov relation then turns into the semigroup property T_s T_t = T_{s+t}, which suggests a formal representation T_t = e^{tA} in terms of a generator A. Under suitable regularity conditions (the so-called Feller properties), it is indeed possible to define a generator A that describes the infinitesimal evolution of the underlying process X. Under further hypotheses, X will be shown to have continuous paths iff A is (an extension of) an elliptic differential operator. In general, the powerful Hille–Yosida theorem provides the precise conditions for the existence of a Feller process corresponding to a given operator A.

Using the basic regularity theorem for submartingales from Chapter 7, it will be shown that every Feller process has a version that is right-continuous with left-hand limits (rcll). Given this fundamental result, it is straightforward to extend the strong Markov property to arbitrary Feller processes. We shall also explore some profound connections with martingale theory.
Finally, we shall establish a general continuity theorem for Feller processes and deduce a corresponding approximation of discrete-time Markov chains by diffusions and other continuous-time Markov processes. The proofs of the latter results will require some weak convergence theory from Chapter 16.

To clarify the connection between transition kernels and operators, let μ be an arbitrary probability kernel on some measurable space (S, S). We may then introduce an associated transition operator T, given by

    Tf(x) = ∫ μ(x, dy) f(y),  x ∈ S,  (1)

where f: S → ℝ is assumed to be measurable and either bounded or nonnegative. Approximating f by simple functions, we see by monotone convergence that Tf is again a measurable function on S. It is also clear that T is a positive contraction operator, in the sense that 0 ≤ f ≤ 1 implies 0 ≤ Tf ≤ 1. A special role is played by the identity operator I, which corresponds to the kernel μ(x, ·) = δ_x. The importance of transition operators for the study of Markov processes is due to the following simple fact.

Lemma 19.1 (transition kernels and operators) The probability kernels μ_t, t ≥ 0, satisfy the Chapman–Kolmogorov relation iff the corresponding transition operators T_t have the semigroup property

    T_{s+t} = T_s T_t,  s, t ≥ 0.  (2)

Proof: For any B ∈ S we have T_{s+t}1_B(x) = μ_{s+t}(x, B) and

    (T_s T_t)1_B(x) = T_s(T_t 1_B)(x) = ∫ μ_s(x, dy)(T_t 1_B)(y)
                    = ∫ μ_s(x, dy) μ_t(y, B) = (μ_s μ_t)(x, B).

Thus, the Chapman–Kolmogorov relation is equivalent to T_{s+t}1_B = (T_s T_t)1_B for any B ∈ S. The latter relation extends to (2) by linearity and monotone convergence. □

By analogy with the situation for the Cauchy equation, one might hope to represent the semigroup in the form T_t = e^{tA}, t ≥ 0, for a suitable generator A. For the formula to make sense, the operator A must be suitably bounded, so that the exponential function can be defined through a Taylor expansion. We shall consider a simple case when such a representation exists.

Proposition 19.2 (pseudo-Poisson processes) Let (T_t) be the transition semigroup of a pure jump-type Markov process in S with bounded rate kernel α. Then T_t = e^{tA} for all t ≥ 0, where for any bounded measurable function f: S → ℝ,

    Af(x) = ∫ (f(y) − f(x)) α(x, dy),  x ∈ S.

Proof: Choose a probability kernel μ and a constant c > 0 such that α(x, B) = c μ(x, B \ {x}).
From Proposition 12.20 we see that the process is pseudo-Poisson of the form X = Y ∘ N, where Y is a discrete-time Markov chain with transition kernel μ, and N is an independent Poisson process with fixed rate c. Letting T denote the transition operator associated with μ, we get for any t ≥ 0 and f as stated,

    T_t f(x) = E_x f(X_t) = Σ_{n≥0} E_x[f(Y_n); N_t = n]
             = Σ_{n≥0} P{N_t = n} E_x f(Y_n)
             = Σ_{n≥0} e^{−ct} (ct)^n/n! · T^n f(x) = e^{ct(T−I)} f(x).

Hence, T_t = e^{tA} holds for t ≥ 0 with

    Af(x) = c(T − I)f(x) = c ∫ (f(y) − f(x)) μ(x, dy) = ∫ (f(y) − f(x)) α(x, dy). □

For the further analysis, we assume S to be a locally compact, separable metric space, and we write C_0 = C_0(S) for the class of continuous functions f: S → ℝ with f(x) → 0 as x → ∞. We can make C_0 into a Banach space by introducing the norm ‖f‖ = sup_x |f(x)|. A semigroup of positive contraction operators T_t on C_0 is called a Feller semigroup if it has the additional regularity properties

    (F₁)  T_t C_0 ⊂ C_0,  t ≥ 0,
    (F₂)  T_t f(x) → f(x) as t → 0,  f ∈ C_0, x ∈ S.

In Theorem 19.6 we show that (F₁) and (F₂) together with the semigroup property imply the strong continuity

    (F₃)  T_t f → f as t → 0,  f ∈ C_0.

For motivation, we proceed to clarify the probabilistic significance of those conditions. Then assume for simplicity that S is compact, and also that (T_t) is conservative in the sense that T_t 1 = 1 for all t. For every initial state x, we may then introduce an associated Markov process X_t^x, t ≥ 0, with transition operators T_t.

Lemma 19.3 (Feller properties) Let (T_t) be a conservative transition semigroup on a compact metric space (S, ρ). Then

(F₁) holds iff X_t^x → X_t^y in distribution as x → y, for fixed t ≥ 0;
(F₂) holds iff X_t^x → x in probability as t → 0, for fixed x;
(F₃) holds iff sup_x E_x[ρ(X_s, X_t) ∧ 1] → 0 as s − t → 0.

Proof: The first two statements are obvious, so we shall prove only the third one. Then choose a dense sequence f₁, f₂, … in C = C(S). By the compactness of S we note that x_n → x in S iff f_k(x_n) → f_k(x) for each k. Thus, ρ is topologically equivalent to the metric

    ρ′(x, y) = Σ_k 2^{−k} (|f_k(x) − f_k(y)| ∧ 1),  x, y ∈ S.
Since S is compact, the identity mapping on S is uniformly continuous with respect to ρ and ρ′, and so we may assume that ρ = ρ′. Next we note that, for any f ∈ C, x ∈ S, and t, h ≥ 0,

    E_x(f(X_t) − f(X_{t+h}))² = E_x(f² − 2f T_h f + T_h f²)(X_t)
                              ≤ ‖f² − 2f T_h f + T_h f²‖
                              ≤ 2‖f‖ ‖f − T_h f‖ + ‖f² − T_h f²‖.

Assuming (F₃), we get sup_x E_x|f_k(X_s) − f_k(X_t)| → 0 as s − t → 0 for fixed k, and so by dominated convergence sup_x E_x ρ(X_s, X_t) → 0. Conversely, the latter condition yields T_h f_k → f_k for each k, which implies (F₃). □

Our aim is now to construct the generator of an arbitrary Feller semigroup (T_t) on C_0. In general, there is no bounded linear operator A satisfying T_t = e^{tA}, and we need to look for a suitable substitute. For motivation, we note that if p is a real-valued function on ℝ₊ with representation p_t = e^{ta}, then a can be recovered from p by either differentiation or integration:

    t^{−1}(p_t − 1) → a as t → 0;  ∫_0^∞ e^{−λt} p_t dt = (λ − a)^{−1},  λ > 0.

Motivated by the latter formula, we introduce for each λ > 0 the associated resolvent or potential R_λ, defined as the Laplace transform

    R_λ f = ∫_0^∞ e^{−λt} (T_t f) dt,  f ∈ C_0.

Note that the integral exists, since T_t f(x) is bounded and right-continuous in t ≥ 0 for fixed x ∈ S.

Theorem 19.4 (resolvents and generator) Let (T_t) be a Feller semigroup on C_0 with resolvents R_λ, λ > 0. Then the operators λR_λ are injective contractions on C_0 such that λR_λ → I strongly as λ → ∞. Furthermore, the range 𝒟 = R_λ C_0 is independent of λ and dense in C_0, and there exists an operator A on C_0 with domain 𝒟 such that R_λ^{−1} = λ − A on 𝒟 for every λ > 0. Finally, A commutes on 𝒟 with every T_t.

Proof: If f ∈ C_0, then (F₁) yields T_t f ∈ C_0 for every t, and so by dominated convergence we have R_λ f ∈ C_0 as well. To prove the stated contraction property, we write for any f ∈ C_0

    ‖λR_λ f‖ ≤ λ ∫_0^∞ e^{−λt} ‖T_t f‖ dt ≤ λ‖f‖ ∫_0^∞ e^{−λt} dt = ‖f‖.
A simple computation yields the resolvent equation

    R_λ − R_μ = (μ − λ) R_λ R_μ,  λ, μ > 0,  (3)
which shows that the operators R_λ commute and have a common range 𝒟. If f = R₁g with g ∈ C_0, we get by (3) and as λ → ∞

    ‖λR_λ f − f‖ = ‖(λR_λ − I)R₁ g‖ = ‖(R₁ − I)R_λ g‖ ≤ λ^{−1} ‖R₁ − I‖ ‖g‖ → 0.

The convergence extends by a simple approximation to the closure of 𝒟.

Now introduce the one-point compactification Ŝ = S ∪ {Δ} of S, and extend any f ∈ C_0 to Ĉ = C(Ŝ) by putting f(Δ) = 0. If the closure of 𝒟 is not all of C_0, then by the Hahn–Banach theorem there exists a bounded linear functional φ ≠ 0 on Ĉ such that φR₁f = 0 for all f ∈ C_0. By Riesz's representation Theorem 2.22 we may extend φ to a bounded, signed measure on Ŝ. Letting f ∈ C_0 and using (F₂), we get by dominated convergence as λ → ∞

    0 = λφR_λ f = ∫ φ(dx) ∫_0^∞ λe^{−λt} T_t f(x) dt = ∫ φ(dx) ∫_0^∞ e^{−s} T_{s/λ} f(x) ds → φf,

and so φ = 0. The contradiction shows that 𝒟 is dense in C_0.

To see that the operators R_λ are injective, let f ∈ C_0 with R_{λ₀} f = 0 for some λ₀ > 0. Then (3) yields R_λ f = 0 for every λ > 0, and since λR_λ f → f as λ → ∞, we get f = 0. Hence, the inverses R_λ^{−1} exist on 𝒟. Multiplying (3) by R_λ^{−1} from the left and by R_μ^{−1} from the right, we get on 𝒟 the relation R_μ^{−1} − R_λ^{−1} = μ − λ. Thus, the operator A = λ − R_λ^{−1} on 𝒟 is independent of λ.

To prove the final assertion, we note that T_t and R_λ commute for any t, λ > 0, and write

    T_t(λ − A)R_λ = T_t = (λ − A)R_λ T_t = (λ − A)T_t R_λ. □

The operator A in Theorem 19.4 is called the generator of the semigroup (T_t). If we want to emphasize the role of the domain 𝒟, we say that (T_t) has generator (A, 𝒟). The term is justified by the following lemma.

Lemma 19.5 (uniqueness) A Feller semigroup is uniquely determined by its generator.

Proof: The operator A determines R_λ = (λ − A)^{−1} for all λ > 0. By the uniqueness theorem for Laplace transforms, it then determines the measure μ(dt) = T_t f(x) dt on ℝ₊ for any f ∈ C_0 and x ∈ S.
Since the density T_t f(x) is right-continuous in t for fixed x, the assertion follows. □

We now aim to show that any Feller semigroup is strongly continuous and to derive abstract versions of Kolmogorov's forward and backward equations.
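For a finite state space, the objects of Theorem 19.4 are plain matrices: T_t = e^{tA} and R_λ = (λ − A)^{−1}, so the resolvent equation (3) and the semigroup property can be checked directly. The sketch below is illustrative only and not from the book; the two-state rate matrix is an arbitrary choice.

```python
# 2x2 matrix helpers (pure Python), for the generator of a two-state chain.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_add(A, B, s=1.0):
    return [[A[i][j] + s * B[i][j] for j in range(2)] for i in range(2)]

def mat_inv(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det], [-A[1][0] / det, A[0][0] / det]]

def expm(A, t, n=60):
    # e^{tA} by truncated Taylor series (adequate for this small example).
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, n):
        term = mat_mul(term, [[t * a / k for a in row] for row in A])
        result = mat_add(result, term)
    return result

I2 = [[1.0, 0.0], [0.0, 1.0]]
A = [[-2.0, 2.0], [3.0, -3.0]]  # jump rates: 0 -> 1 at rate 2, 1 -> 0 at rate 3

def resolvent(lam):
    # R_lambda = (lambda - A)^{-1}
    return mat_inv(mat_add([[lam * e for e in row] for row in I2], A, s=-1.0))

# Resolvent equation (3): R_lam - R_mu = (mu - lam) R_lam R_mu.
lam, mu = 1.5, 4.0
lhs = mat_add(resolvent(lam), resolvent(mu), s=-1.0)
rhs = [[(mu - lam) * e for e in row]
       for row in mat_mul(resolvent(lam), resolvent(mu))]

# Semigroup property: T_s T_t = T_{s+t} for T_t = e^{tA}.
T = expm(A, 0.3)
TT = mat_mul(expm(A, 0.1), expm(A, 0.2))
```

Here λR_λ = (I − A/λ)^{−1} → I as λ → ∞, matching the first assertion of Theorem 19.4.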
Theorem 19.6 (strong continuity, forward and backward equations) Let (T_t) be a Feller semigroup with generator (A, 𝒟). Then (T_t) is strongly continuous and satisfies

    T_t f − f = ∫_0^t T_s Af ds,  f ∈ 𝒟, t ≥ 0.  (4)

Furthermore, T_t f is differentiable at 0 iff f ∈ 𝒟, in which case

    (d/dt)(T_t f) = T_t Af = A T_t f,  t ≥ 0.  (5)

To prove this result, we introduce the so-called Yosida approximation

    A^λ = λAR_λ = λ(λR_λ − I),  λ > 0,  (6)

and the associated semigroup T_t^λ = e^{tA^λ}, t ≥ 0. The latter is clearly the transition semigroup of a pseudo-Poisson process with rate λ based on the transition operator λR_λ.

Lemma 19.7 (Yosida approximation) For any f ∈ 𝒟, we have

    ‖T_t f − T_t^λ f‖ ≤ t ‖Af − A^λ f‖,  t, λ > 0,  (7)

and A^λ f → Af as λ → ∞. Furthermore, T_t^λ f → T_t f as λ → ∞ for each f ∈ C_0, uniformly for bounded t ≥ 0.

Proof: By Theorem 19.4 we have A^λ f = λR_λ Af → Af for any f ∈ 𝒟. For fixed λ > 0 it is further clear that h^{−1}(T_h^λ − I) → A^λ in the norm topology as h → 0. Now for any commuting contraction operators B and C,

    ‖B^n f − C^n f‖ ≤ ‖(B^{n−1} + B^{n−2}C + ⋯ + C^{n−1})(Bf − Cf)‖ ≤ n ‖Bf − Cf‖.

Fixing any f ∈ C_0 and t, λ, μ > 0, we hence obtain as h = t/n → 0

    ‖T_t^λ f − T_t^μ f‖ ≤ n ‖T_h^λ f − T_h^μ f‖ = t ‖h^{−1}(T_h^λ − I)f − h^{−1}(T_h^μ − I)f‖ → t ‖A^λ f − A^μ f‖.

For f ∈ 𝒟 it follows that T_t^λ f is Cauchy convergent as λ → ∞ for fixed t, and since 𝒟 is dense in C_0, the same property holds for arbitrary f ∈ C_0. Denoting the limit by T̃_t f, we get in particular

    ‖T_t^λ f − T̃_t f‖ ≤ t ‖A^λ f − Af‖,  f ∈ 𝒟, t ≥ 0.  (8)

Thus, for each f ∈ 𝒟 we have T_t^λ f → T̃_t f as λ → ∞, uniformly for bounded t, which again extends to all f ∈ C_0. To identify T̃_t, we may use the resolvent equation (3) to obtain, for any f ∈ C_0 and λ, μ > 0,

    ∫_0^∞ e^{−λt} T_t^μ μR_μ f dt = (λ − A^μ)^{−1} μR_μ f = (μ/(λ + μ)) R_ν f,  (9)

where ν = λμ(λ + μ)^{−1}. As μ → ∞, we have ν → λ, and so R_ν f → R_λ f. Furthermore,

    ‖T_t^μ μR_μ f − T̃_t f‖ ≤ ‖μR_μ f − f‖ + ‖T_t^μ f − T̃_t f‖ → 0,

so from (9) we get by dominated convergence ∫ e^{−λt} T̃_t f dt = R_λ f. Hence, the semigroups (T_t) and (T̃_t) have the same resolvent operators R_λ, and so they agree by Lemma 19.5. In particular, (7) then follows from (8). □

Proof of Theorem 19.6: The semigroup (T_t^λ) is clearly norm continuous in t for each λ > 0, and so the strong continuity of (T_t) follows by Lemma 19.7 as λ → ∞. Furthermore, we note that h^{−1}(T_h^λ − I) → A^λ as h ↓ 0. Using the semigroup relation and continuity, we obtain more generally

    (d/dt)T_t^λ = A^λ T_t^λ = T_t^λ A^λ,  t ≥ 0,

which implies

    T_t^λ f − f = ∫_0^t T_s^λ A^λ f ds,  f ∈ C_0, t ≥ 0.  (10)

If f ∈ 𝒟, then by Lemma 19.7 we get as λ → ∞

    ‖T_s^λ A^λ f − T_s Af‖ ≤ ‖A^λ f − Af‖ + ‖T_s^λ Af − T_s Af‖ → 0,

uniformly for bounded s, and so (4) follows from (10) as λ → ∞. By the strong continuity of T_t we may differentiate (4) to get the first relation in (5). The second relation holds by Theorem 19.4.

Conversely, assume that h^{−1}(T_h f − f) → g for some pair of functions f, g ∈ C_0. As h → 0, we get

    AR_λ f ← h^{−1}(T_h − I)R_λ f = R_λ h^{−1}(T_h f − f) → R_λ g,

and so

    f = (λ − A)R_λ f = λR_λ f − AR_λ f = R_λ(λf − g) ∈ 𝒟. □

In applications, the domain of a generator A is often hard to identify or too large to be convenient for computations. It is then useful to restrict A to a suitable subdomain. An operator A with domain 𝒟 on some Banach space B is said to be closed if its graph G = {(f, Af); f ∈ 𝒟} is a closed subset of B². In general, we say that A is closable if the closure of G is the graph of a single-valued operator Ā, the so-called closure of A. Note that A is closable iff the conditions 𝒟 ∋ f_n → 0 and Af_n → g imply g = 0. When A is closed, a core for A is defined as a linear subspace D ⊂ 𝒟 such that the restriction A|_D has closure A.
In this case, A is clearly uniquely determined by A|_D. We shall give some conditions ensuring that D ⊂ 𝒟 is a core when A is the generator of a Feller semigroup (T_t) on C_0.
Lemma 19.8 (closure and cores) The generator (A, 𝒟) of a Feller semigroup is closed, and for any λ > 0 a subspace D ⊂ 𝒟 is a core for A iff (λ − A)D is dense in C_0.

Proof: Assume that f₁, f₂, … ∈ 𝒟 with f_n → f and Af_n → g. Then (1 − A)f_n → f − g, and since R₁ is bounded, it follows that f_n → R₁(f − g). Hence, f = R₁(f − g) ∈ 𝒟, and we have (1 − A)f = f − g, or g = Af. Thus, A is closed.

If D is a core for A, then for any g ∈ C_0 and λ > 0 there exist some f₁, f₂, … ∈ D with f_n → R_λ g and Af_n → AR_λ g, and we get (λ − A)f_n → (λ − A)R_λ g = g. Thus, (λ − A)D is dense in C_0.

Conversely, assume that (λ − A)D is dense in C_0. To show that D is a core, fix any f ∈ 𝒟. By hypothesis we may choose some f₁, f₂, … ∈ D with g_n ≡ (λ − A)f_n → (λ − A)f ≡ g. Since R_λ is bounded, we obtain f_n = R_λ g_n → R_λ g = f, and thus Af_n = λf_n − g_n → λf − g = Af. □

A subspace D ⊂ C_0 is said to be invariant under (T_t) if T_t D ⊂ D for all t ≥ 0. In particular, we note that, for any subset B ⊂ C_0, the linear span of ⋃_t T_t B is an invariant subspace of C_0.

Proposition 19.9 (invariance and cores, Watanabe) If (A, 𝒟) is the generator of a Feller semigroup, then any dense, invariant subspace D ⊂ 𝒟 is a core for A.

Proof: By the strong continuity of (T_t) we note that R₁ can be approximated in the strong topology by some finite linear combinations L₁, L₂, … of the operators T_t. Now fix any f ∈ D, and define g_n = L_n f. Noting that A and L_n commute on D by Theorem 19.4, we get

    (1 − A)g_n = (1 − A)L_n f = L_n(1 − A)f → R₁(1 − A)f = f.

Since g_n ∈ D and D is dense in C_0, it follows that (1 − A)D is dense in C_0. Hence, D is a core by Lemma 19.8. □

The Lévy processes in ℝ^d are the archetypes of Feller processes, and we proceed to identify their generators. Let C_0^∞ denote the class of all infinitely differentiable functions f on ℝ^d such that f and all its derivatives belong to C_0 = C_0(ℝ^d).
Theorem 19.10 (Lévy processes) Let T_t, t ≥ 0, be the transition operators of a Lévy process in ℝ^d with characteristics (a, b, ν). Then (T_t) is a Feller semigroup, and C_0^∞ is a core for the associated generator A. Moreover, we have for any f ∈ C_0^∞ and x ∈ ℝ^d

    Af(x) = ½ Σ_{i,j} a_{ij} f″_{ij}(x) + Σ_i b_i f′_i(x)
          + ∫ {f(x + y) − f(x) − Σ_i y_i f′_i(x) 1{|y| ≤ 1}} ν(dy).  (11)
In particular, a standard Brownian motion in ℝ^d has generator ½Δ, and the uniform motion with velocity b ∈ ℝ^d has generator b·∇, both on the core C_0^∞. Here Δ and ∇ denote the Laplace and gradient operators, respectively. Also note that the generator of the jump component has the same form as for the pseudo-Poisson processes in Proposition 19.2, apart from the compensation for small jumps by a linear drift term.

Proof of Theorem 19.10: As t → 0, we have μ_t^{*[t^{−1}]} → μ₁ weakly. Thus, Corollary 15.20 yields μ_t/t → ν vaguely on ℝ^d \ {0} and

    a_{t,h} ≡ t^{−1} ∫_{|x|≤h} xx′ μ_t(dx) → a_h,  b_{t,h} ≡ t^{−1} ∫_{|x|≤h} x μ_t(dx) → b_h,  (12)

provided that h > 0 satisfies ν{|x| = h} = 0. Now fix any f ∈ C_0^∞, and write

    t^{−1}(T_t f(x) − f(x)) = t^{−1} ∫ (f(x + y) − f(x)) μ_t(dy)
        = t^{−1} ∫_{|y|≤h} {f(x + y) − f(x) − Σ_i y_i f′_i(x) − ½ Σ_{i,j} y_i y_j f″_{ij}(x)} μ_t(dy)
        + t^{−1} ∫_{|y|>h} (f(x + y) − f(x)) μ_t(dy) + Σ_i b_{t,h}^i f′_i(x) + ½ Σ_{i,j} a_{t,h}^{ij} f″_{ij}(x).

As t → 0, the last three terms approach the expression in (11), though with a_{ij} replaced by a_{ij}^h and with the integral taken over {|y| > h}. To establish the required convergence, it is then enough to show that the first term on the right tends to zero as h → 0, uniformly for small t > 0. But this is clear from (12), since the integrand is of the order h|y|² by Taylor's formula. From the uniform boundedness of the derivatives of f, we also see that the convergence is uniform in x. Thus, C_0^∞ ⊂ 𝒟 by Theorem 19.6, and (11) holds on C_0^∞.

It remains to show that C_0^∞ is a core for A. Since C_0^∞ is dense in C_0, it suffices by Proposition 19.9 to show that it is also invariant under (T_t). Then note that, by dominated convergence, the differentiation operators commute with each T_t, and use condition (F₁). □

We proceed to characterize the linear operators A on C_0 whose closures Ā are generators of Feller semigroups.

Theorem 19.11 (characterization of generators, Hille, Yosida) Let A be a linear operator on C_0 with domain 𝒟.
Then A is closable and its closure Ā is the generator of a Feller semigroup on C_0 iff these conditions hold:

(i) 𝒟 is dense in C_0;
(ii) the range of λ₀ − A is dense in C_0 for some λ₀ > 0;
(iii) if f ∨ 0 ≤ f(x) for some f ∈ 𝒟 and x ∈ S, then Af(x) ≤ 0.

Condition (iii) is known as the positive-maximum principle.
Proof: First assume that Ā is the generator of a Feller semigroup (T_t). Then (i) and (ii) hold by Theorem 19.4. To prove (iii), let f ∈ 𝒟 and x ∈ S with f⁺ = f ∨ 0 ≤ f(x). Then

    T_t f(x) ≤ T_t f⁺(x) ≤ ‖T_t f⁺‖ ≤ ‖f⁺‖ = f(x),  t ≥ 0,

and so h^{−1}(T_h f − f)(x) ≤ 0. As h → 0, we get Af(x) ≤ 0.

Conversely, assume that A satisfies (i), (ii), and (iii). Let f ∈ 𝒟 be arbitrary, choose x ∈ S with |f(x)| = ‖f‖, and put g = f sgn f(x). Then g ∈ 𝒟 with g⁺ ≤ g(x), and so (iii) yields Ag(x) ≤ 0. Thus, we get for any λ > 0

    ‖(λ − A)f‖ ≥ λg(x) − Ag(x) ≥ λg(x) = λ‖f‖.  (13)

To show that A is closable, let f₁, f₂, … ∈ 𝒟 with f_n → 0 and Af_n → g. By (i) we may choose g₁, g₂, … ∈ 𝒟 with g_m → g, and by (13) we have

    ‖(λ − A)(g_m + λf_n)‖ ≥ λ ‖g_m + λf_n‖,  m, n ∈ ℕ, λ > 0.

As n → ∞, we get ‖(λ − A)g_m − λg‖ ≥ λ‖g_m‖. Here we may divide by λ and let λ → ∞ to obtain ‖g_m − g‖ ≥ ‖g_m‖, which yields ‖g‖ = 0 as m → ∞. Thus, A is closable, and from (13) we note that the closure Ā satisfies

    ‖(λ − Ā)f‖ ≥ λ‖f‖,  λ > 0, f ∈ dom(Ā).  (14)

Now assume that λ_n → λ > 0 and (λ_n − Ā)f_n → g for some f₁, f₂, … ∈ dom(Ā). By (14) the sequence (f_n) is then Cauchy, say with limit f ∈ C_0. By the definition of Ā we get (λ − Ā)f = g, and so g belongs to the range of λ − Ā. Letting Λ denote the set of constants λ > 0 such that λ − Ā has range C_0, it follows in particular that Λ is closed. If we can show that Λ is open as well, then by (ii) we have Λ = (0, ∞).

Then fix any λ ∈ Λ, and conclude from (14) that λ − Ā has a bounded inverse R_λ with norm ‖R_λ‖ ≤ λ^{−1}. For any μ > 0 with |λ − μ| ‖R_λ‖ < 1, we may form the bounded linear operator

    R_μ = Σ_{n≥0} (λ − μ)^n R_λ^{n+1},

and we note that

    (μ − Ā)R_μ = (λ − Ā)R_μ − (λ − μ)R_μ = I.

In particular, μ ∈ Λ, which shows that λ ∈ Λ°.

We may next establish the resolvent equation (3). Then start from the identity

    (λ − Ā)R_λ = (μ − Ā)R_μ = I.
By a simple rearrangement, (λ − Ā)(R_λ − R_μ) = (μ − λ)R_μ, and (3) follows as we multiply from the left by R_λ. In particular, (3) shows that the operators R_λ and R_μ commute for any λ, μ > 0.

Since R_λ(λ − Ā) = I on dom(Ā) and ‖R_λ‖ ≤ λ^{−1}, we have for any f ∈ dom(Ā) as λ → ∞

    ‖λR_λ f − f‖ = ‖R_λ Āf‖ ≤ λ^{−1} ‖Āf‖ → 0.
From (i) and the contractivity of λR_λ, it follows easily that λR_λ → I in the strong topology. Now define A^λ as in (6) and let T_t^λ = e^{tA^λ}. As in the proof of Lemma 19.7, we get T_t^λ f → T_t f for each f ∈ C_0, uniformly for bounded t, where the T_t form a strongly continuous family of contraction operators on C_0 such that ∫ e^{−λt} T_t dt = R_λ for all λ > 0. To deduce the semigroup property, fix any f ∈ C_0 and s, t ≥ 0, and note that as λ → ∞

    (T_{s+t} − T_s T_t)f = (T_{s+t} − T_{s+t}^λ)f + T_s^λ(T_t^λ − T_t)f + (T_s^λ − T_s)T_t f → 0.

The positivity of the operators T_t will follow immediately, if we can show that R_λ is positive for each λ > 0. Then fix any function g ≥ 0 in C_0, and put f = R_λ g, so that g = (λ − Ā)f. By the definition of Ā, there exist some f₁, f₂, … ∈ 𝒟 with f_n → f and Af_n → Āf. If inf_x f(x) < 0, we have inf_x f_n(x) < 0 for all sufficiently large n, and we may choose some x_n ∈ S with f_n(x_n) ≤ f_n ∧ 0. By (iii) we have Af_n(x_n) ≥ 0, and so

    inf_x (λ − A)f_n(x) ≤ (λ − A)f_n(x_n) ≤ λf_n(x_n) = λ inf_x f_n(x).

As n → ∞, we get the contradiction

    0 ≤ inf_x g(x) = inf_x (λ − Ā)f(x) ≤ λ inf_x f(x) < 0.

It remains to show that Ā is the generator of the semigroup (T_t). But this is clear from the fact that the operators λ − Ā are inverses to the resolvent operators R_λ. □

From the proof we note that any operator A on C_0 satisfying the positive-maximum principle in (iii) must be dissipative, in the sense that ‖(λ − A)f‖ ≥ λ‖f‖ for all f ∈ dom(A) and λ > 0. This leads to the following simple observation, which will be needed later.

Lemma 19.12 (maximality) Let (A, 𝒟) be the generator of a Feller semigroup on C_0, and assume that A extends to a linear operator (A′, 𝒟′) satisfying the positive-maximum principle. Then 𝒟′ = 𝒟.

Proof: Fix any f ∈ 𝒟′, and put g = (1 − A′)f. Since A′ is dissipative and (1 − A)R₁ = I on C_0, we get

    ‖f − R₁g‖ ≤ ‖(1 − A′)(f − R₁g)‖ = ‖g − (1 − A)R₁g‖ = 0,

and so f = R₁g ∈ 𝒟. □
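The dissipativity just noted can be watched concretely on a discrete analogue. The sketch below is not from the book: it uses the discrete Dirichlet Laplacian on a finite grid (an ad hoc stand-in for a generator on C_0, with the vanishing boundary playing the role of decay at infinity) and checks both the positive-maximum principle and the resulting sup-norm bound ‖(λ − A)f‖ ≥ λ‖f‖.

```python
def lap(f):
    # Discrete Dirichlet Laplacian on {0, ..., n-1}: f is treated as 0 off the grid.
    n = len(f)
    g = lambda i: f[i] if 0 <= i < n else 0.0
    return [g(i - 1) - 2.0 * f[i] + g(i + 1) for i in range(n)]

def sup_norm(f):
    return max(abs(x) for x in f)

def satisfies_pmp(f):
    # Positive-maximum principle: if f attains a nonnegative maximum at x,
    # then (lap f)(x) <= 0; vacuously true if max f < 0.
    m = max(f)
    if m < 0:
        return True
    x = f.index(m)
    return lap(f)[x] <= 0.0

def dissipativity_gap(f, lam):
    # ||(lam - lap) f|| - lam ||f||, which the principle forces to be >= 0.
    Lf = lap(f)
    g = [lam * fi - li for fi, li in zip(f, Lf)]
    return sup_norm(g) - lam * sup_norm(f)
```

For instance, f = [1.0, −2.0, 3.5, 0.5] has its maximum 3.5 at an interior point where the Laplacian equals −8.5 ≤ 0, and dissipativity_gap(f, 0.7) = 8.5 ≥ 0.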
Our next aim is to show how a nice Markov process can be associated with every Feller semigroup $(T_t)$. In order for the corresponding transition kernels $\mu_t$ to have total mass 1, we need the operators $T_t$ to be conservative, in the sense that $\sup_{f \le 1} T_t f(x) = 1$ for all $x \in S$, where the supremum extends over functions $f \in C_0$ with $f \le 1$. This can be achieved by a suitable extension. Let us then introduce an auxiliary state $\Delta \notin S$ and form the compactified space $\hat S = S \cup \{\Delta\}$, where $\Delta$ is regarded as the point at infinity when $S$ is noncompact, and otherwise as isolated from $S$. Note that any function
378 Foundations of Modern Probability

$f \in C_0$ has a continuous extension to $\hat S$, obtained by putting $f(\Delta) = 0$. We may now extend the original semigroup on $C_0$ to a conservative semigroup on the space $\hat C = C(\hat S)$.

Lemma 19.13 (compactification) Any Feller semigroup $(T_t)$ on $C_0$ admits an extension to a conservative Feller semigroup $(\hat T_t)$ on $\hat C$, given by
$$\hat T_t f = f(\Delta) + T_t\{f - f(\Delta)\}, \qquad t \ge 0,\ f \in \hat C.$$

Proof: It is straightforward to verify that $(\hat T_t)$ is a strongly continuous semigroup on $\hat C$. To show that the operators $\hat T_t$ are positive, fix any $f \in \hat C$ with $f \ge 0$, and note that $g = f(\Delta) - f \in C_0$ with $g \le f(\Delta)$. Hence,
$$T_t g \le T_t g^+ \le \|T_t g^+\| \le \|g^+\| \le f(\Delta),$$
and so $\hat T_t f = f(\Delta) - T_t g \ge 0$. The contraction and conservation properties now follow from the fact that $\hat T_t 1 = 1$. $\Box$

Our next step is to construct an associated semigroup of Markov transition kernels $\mu_t$ on $\hat S$, satisfying
$$T_t f(x) = \int f(y)\,\mu_t(x, dy), \qquad f \in C_0. \tag{15}$$
We say that a state $x \in \hat S$ is absorbing for $(\mu_t)$ if $\mu_t(x, \{x\}) = 1$ for each $t \ge 0$.

Proposition 19.14 (existence) For any Feller semigroup $(T_t)$ on $C_0$, there exists a unique semigroup of Markov transition kernels $\mu_t$ on $\hat S$ satisfying (15) and such that $\Delta$ is absorbing for $(\mu_t)$.

Proof: For fixed $x \in \hat S$ and $t \ge 0$, the mapping $f \mapsto \hat T_t f(x)$ is a positive linear functional on $\hat C$ with norm 1, so by Riesz's representation Theorem 2.22 there exist some probability measures $\mu_t(x, \cdot)$ on $\hat S$ satisfying
$$\hat T_t f(x) = \int f(y)\,\mu_t(x, dy), \qquad f \in \hat C,\ x \in \hat S,\ t \ge 0. \tag{16}$$
The measurability of the right-hand side is clear by continuity. By a standard approximation followed by a monotone class argument, we then obtain the desired measurability of $\mu_t(x, B)$ for any $t \ge 0$ and Borel set $B \subset \hat S$. The Chapman–Kolmogorov relation holds on $\hat S$ by Lemma 19.1. Relation (15) is a special case of (16), and from (16) we further get
$$\int f(y)\,\mu_t(\Delta, dy) = \hat T_t f(\Delta) = f(\Delta) = 0, \qquad f \in C_0,$$
which shows that $\Delta$ is absorbing. The uniqueness of $(\mu_t)$ is a consequence of the last two properties. $\Box$
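The Chapman–Kolmogorov relation $\mu_{s+t} = \mu_s \mu_t$ underlying the semigroup property can be illustrated numerically for the Brownian semigroup, whose transition kernels are the Gaussian densities $p_t(x,y)$. The sketch below (an illustration only; the grid and test function are arbitrary choices) checks $T_s(T_t f) = T_{s+t} f$ by quadrature, and compares with the closed-form image of a Gaussian test function.

```python
import numpy as np

dy = 0.01
y = np.arange(-15.0, 15.0 + dy / 2, dy)
f = np.exp(-y**2)

def heat(t, g):
    """T_t g(x) = int p_t(x, y) g(y) dy with the Gaussian kernel p_t."""
    P = np.exp(-(y[:, None] - y[None, :])**2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)
    return P @ g * dy

s, t = 0.3, 0.7
lhs = heat(s, heat(t, f))   # T_s T_t f
rhs = heat(s + t, f)        # T_{s+t} f
mid = np.abs(y) <= 5
ck_gap = np.max(np.abs(lhs - rhs)[mid])

# For this f the action is explicit: T_t f(x) = (1+2t)^{-1/2} exp(-x^2/(1+2t)).
exact = np.exp(-y**2 / (1.0 + 2.0 * (s + t))) / np.sqrt(1.0 + 2.0 * (s + t))
ex_gap = np.max(np.abs(rhs - exact)[mid])
print(ck_gap, ex_gap)
```

Both gaps should be at the level of quadrature error, far below the scale of $f$.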
For any probability measure $\nu$ on $\hat S$, there exists by Theorem 8.4 a Markov process $X^\nu$ in $\hat S$ with initial distribution $\nu$ and transition kernels $\mu_t$. As before, we denote the distribution of $X^\nu$ by $P_\nu$ and write $E_\nu$ for
the corresponding integration operator. When $\nu = \delta_x$, we often prefer the simpler forms $P_x$ and $E_x$, respectively.

We may now extend Theorem 15.1 to a basic regularization theorem for Feller processes. Given a process $X$, we say that $\Delta$ is absorbing for $X$ if $X_t = \Delta$ or $X_{t-} = \Delta$ implies $X_u = \Delta$ for all $u \ge t$.

Theorem 19.15 (regularization, Kinney) Let $X$ be a Feller process in $\hat S$ with arbitrary initial distribution $\nu$. Then $X$ has an rcll version $\tilde X$ in $\hat S$ such that $\Delta$ is absorbing for $\tilde X$. If $(T_t)$ is conservative and $\nu$ is restricted to $S$, we can choose $\tilde X$ to be rcll even in $S$.

The idea of the proof is to construct a sufficiently rich class of supermartingales, to which the regularity theorems of Chapter 7 can be applied. Let $C_0^+$ denote the class of nonnegative functions in $C_0$.

Lemma 19.16 (resolvents and excessive functions) If $f \in C_0^+$, then the process $Y_t = e^{-t} R_1 f(X_t)$, $t \ge 0$, is a supermartingale under $P_\nu$ for every $\nu$.

Proof: Writing $(\mathcal{G}_t)$ for the filtration induced by $X$, we get for any $t, h \ge 0$
$$E[Y_{t+h} \mid \mathcal{G}_t] = E[e^{-t-h} R_1 f(X_{t+h}) \mid \mathcal{G}_t] = e^{-t-h}\, T_h R_1 f(X_t) = e^{-t-h} \int_0^\infty e^{-s}\, T_{s+h} f(X_t)\, ds = e^{-t} \int_h^\infty e^{-s}\, T_s f(X_t)\, ds \le Y_t. \qquad \Box$$

Proof of Theorem 19.15: By Lemma 19.16 and Theorem 7.27, the process $f(X_t)$ has a.s. right- and left-hand limits along $\mathbb{Q}_+$ for any $f \in \mathcal{D} = \mathrm{dom}(A)$. Since $\mathcal{D}$ is dense in $C_0$, the stated property holds for every $f \in C_0$. By the separability of $C_0$, we may choose the exceptional null set $N$ to be independent of $f$. If $x_1, x_2, \ldots \in \hat S$ are such that $f(x_n)$ converges for every $f \in C_0$, then the compactness of $\hat S$ ensures that $x_n$ converges in the topology of $\hat S$. Thus, on $N^c$ the process $X$ itself has right- and left-hand limits $X_{t\pm}$ along $\mathbb{Q}_+$; on $N$ we may redefine $X$ to be $\Delta$. Then clearly $\tilde X_t = X_{t+}$ is rcll. It remains to show that $\tilde X$ is a version of $X$ or, equivalently, that $X_{t+} = X_t$ a.s. for each $t \ge 0$. But this follows from the fact that $X_{t+h} \stackrel{P}{\to} X_t$ as $h \downarrow 0$, by Lemma 19.3 and dominated convergence.
Now fix any $f \in C_0$ with $f > 0$ on $S$, and note from the strong continuity of $(T_t)$ that even $R_1 f > 0$ on $S$. Applying Lemma 7.31 to the supermartingale $Y_t = e^{-t} R_1 f(\tilde X_t)$, we conclude that $\tilde X = \Delta$ a.s. on the interval $[\zeta, \infty)$, where $\zeta = \inf\{t \ge 0;\ \Delta \in \{\tilde X_t, \tilde X_{t-}\}\}$. Discarding the exceptional null set, we can make this hold identically. If $(T_t)$ is conservative and $\nu$ is restricted to $S$, then $\tilde X_t \in S$ a.s. for every $t \ge 0$. Thus, $\zeta > t$ a.s. for all $t$, and hence $\zeta = \infty$ a.s. Again we may assume that this holds identically. Then $\tilde X_t$ and $\tilde X_{t-}$ take values in $S$, and the stated regularity properties remain valid in $S$. $\Box$
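The excessive-function property behind Lemma 19.16, namely $e^{-h} T_h R_1 f \le R_1 f$ for $f \ge 0$, can be checked numerically for Brownian motion, using the Gaussian kernel for $T_h$ and the explicit Green function $e^{-\sqrt 2 |x-y|}/\sqrt 2$ for $R_1$. A quadrature sketch (an illustration only; the grid and test function are arbitrary):

```python
import numpy as np

dy = 0.01
y = np.arange(-12.0, 12.0 + dy / 2, dy)
f = np.exp(-y**2)  # a nonnegative f in C_0

def R1(g):
    """1-resolvent of Brownian motion via its Green function."""
    K = np.exp(-np.sqrt(2.0) * np.abs(y[:, None] - y[None, :])) / np.sqrt(2.0)
    return K @ g * dy

def heat(h, g):
    """Brownian transition operator T_h (Gaussian convolution)."""
    P = np.exp(-(y[:, None] - y[None, :])**2 / (2.0 * h)) / np.sqrt(2.0 * np.pi * h)
    return P @ g * dy

u = R1(f)
mid = np.abs(y) <= 5
# supermartingale inequality: e^{-h} T_h u <= u pointwise, for several h
viol = max(np.max((np.exp(-h) * heat(h, u) - u)[mid]) for h in (0.1, 0.5, 2.0))
print(viol)
```

Up to quadrature error, `viol` is nonpositive, since $R_1 f - e^{-h} T_h R_1 f = \int_0^h e^{-s} T_s f\, ds \ge 0$.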
In view of the last theorem, we may choose $\Omega$ to be the space of all $\hat S$-valued rcll functions such that the state $\Delta$ is absorbing, and let $X$ be the canonical process on $\Omega$. Processes with different initial distributions $\nu$ are then distinguished by their distributions $P_\nu$ on $\Omega$. Thus, under $P_\nu$ the process $X$ is Markov with initial distribution $\nu$ and transition kernels $\mu_t$, and $X$ has all the regularity properties stated in Theorem 19.15. In particular, $X = \Delta$ on the interval $[\zeta, \infty)$, where $\zeta$ denotes the terminal time
$$\zeta = \inf\{t \ge 0;\ X_t = \Delta \text{ or } X_{t-} = \Delta\}.$$
We take $(\mathcal{F}_t)$ to be the right-continuous filtration generated by $X$, and put $\mathcal{A} = \mathcal{F}_\infty = \bigvee_t \mathcal{F}_t$. The shift operators $\theta_t$ on $\Omega$ are defined as before by $(\theta_t \omega)_s = \omega_{s+t}$, $s, t \ge 0$. The process $X$ with associated distributions $P_\nu$, filtration $\mathcal{F} = (\mathcal{F}_t)$, and shift operators $\theta_t$ is called the canonical Feller process with semigroup $(T_t)$.

We are now ready to state a general version of the strong Markov property. The result extends the special versions obtained in Proposition 8.9 and Theorems 12.14 and 13.11. A further instance of this property appears in Theorem 21.11.

Theorem 19.17 (strong Markov property, Dynkin and Yushkevich, Blumenthal) For any canonical Feller process $X$, initial distribution $\nu$, optional time $\tau$, and random variable $\xi \ge 0$, we have
$$E_\nu[\xi \circ \theta_\tau \mid \mathcal{F}_\tau] = E_{X_\tau} \xi \quad \text{a.s. } P_\nu \text{ on } \{\tau < \infty\}.$$

Proof: By Lemmas 6.2 and 7.1 we may assume that $\tau < \infty$. Let $\mathcal{G}$ denote the filtration induced by $X$. Then Lemma 7.4 shows that the times $\tau_n = 2^{-n}[2^n \tau + 1]$ are $\mathcal{G}$-optional, and by Lemma 7.3 we have $\mathcal{F}_\tau \subset \mathcal{G}_{\tau_n}$ for all $n$. Thus, Proposition 8.9 yields
$$E_\nu[\xi \circ \theta_{\tau_n};\ A] = E_\nu[E_{X_{\tau_n}} \xi;\ A], \qquad A \in \mathcal{F}_\tau,\ n \in \mathbb{N}. \tag{17}$$
To extend the relation to $\tau$, we first assume that $\xi = \prod_{k \le m} f_k(X_{t_k})$ for some $f_1, \ldots, f_m \in C_0$ and $t_1 < \cdots < t_m$. Then $\xi \circ \theta_{\tau_n} \to \xi \circ \theta_\tau$ by the right-continuity of $X$ and the continuity of $f_1, \ldots, f_m$. Writing $h_k = t_k - t_{k-1}$ with $t_0 = 0$, it is also clear from the first Feller property and the right-continuity of $X$ that
$$E_{X_{\tau_n}} \xi = T_{h_1}(f_1 T_{h_2} \cdots (f_{m-1} T_{h_m} f_m) \cdots)(X_{\tau_n}) \to T_{h_1}(f_1 T_{h_2} \cdots (f_{m-1} T_{h_m} f_m) \cdots)(X_\tau) = E_{X_\tau} \xi.$$
Thus, (17) extends to $\tau$ by dominated convergence on both sides. We may finally use standard approximation and monotone class arguments to extend the result to arbitrary $\xi$. $\Box$

As a simple application, we get the following useful zero-one law.
Corollary 19.18 (Blumenthal's 0–1 law) For any canonical Feller process, we have
$$P_x A = 0 \text{ or } 1, \qquad x \in S,\ A \in \mathcal{F}_0.$$

Proof: Taking $\tau = 0$ in Theorem 19.17, we get for any $x \in S$ and $A \in \mathcal{F}_0$
$$1_A = P_x[A \mid \mathcal{F}_0] = P_{X_0} A = P_x A \quad \text{a.s. } P_x. \qquad \Box$$

To appreciate the last result, recall that $\mathcal{F}_0 = \mathcal{F}_{0+}$. In particular, we note that $P_x\{\tau = 0\} = 0$ or $1$ for any state $x \in S$ and $\mathcal{F}$-optional time $\tau$.

The strong Markov property is often used in the following extended form.

Corollary 19.19 (optional projection) For any canonical Feller process $X$, nondecreasing adapted process $Y$, and random variable $\xi \ge 0$, we have
$$E_x \int_0^\infty (E_{X_t} \xi)\, dY_t = E_x \int_0^\infty (\xi \circ \theta_t)\, dY_t, \qquad x \in S.$$

Proof: We may assume that $Y_0 = 0$. Introduce the right-continuous inverse
$$\tau_s = \inf\{t \ge 0;\ Y_t > s\}, \qquad s \ge 0,$$
and note that the times $\tau_s$ are optional by Lemma 7.6. By Theorem 19.17 we have
$$E_x[E_{X_{\tau_s}} \xi;\ \tau_s < \infty] = E_x[E_x[\xi \circ \theta_{\tau_s} \mid \mathcal{F}_{\tau_s}];\ \tau_s < \infty] = E_x[\xi \circ \theta_{\tau_s};\ \tau_s < \infty].$$
Since $\tau_s < \infty$ iff $s < Y_\infty$, we get by integration
$$E_x \int_0^{Y_\infty} (E_{X_{\tau_s}} \xi)\, ds = E_x \int_0^{Y_\infty} (\xi \circ \theta_{\tau_s})\, ds,$$
and the asserted formula follows by Lemma 1.22. $\Box$

Our next aim is to show that any martingale on the canonical space of a Feller process $X$ is a.s. continuous outside the discontinuity set of $X$. For Brownian motion, the result was already noted as a consequence of the integral representation in Theorem 18.10.

Theorem 19.20 (discontinuity sets) Let $X$ be a canonical Feller process with arbitrary initial distribution $\nu$, and let $M$ be a local $P_\nu$-martingale. Then
$$\{t > 0;\ \Delta M_t \neq 0\} \subset \{t > 0;\ X_{t-} \neq X_t\} \quad \text{a.s.} \tag{18}$$

Proof (Chung and Walsh): By localization we may reduce to the case when $M$ is uniformly integrable and hence of the form $M_t = E[\xi \mid \mathcal{F}_t]$ for some $\xi \in L^1$. Let $\mathcal{C}$ denote the class of random variables $\xi \in L^1$ such that the corresponding $M$ satisfies (18). Then $\mathcal{C}$ is a linear subspace of $L^1$. It is further closed, since if $M_t^n = E[\xi_n \mid \mathcal{F}_t]$ with $\|\xi_n\|_1 \to 0$, then
$$P\{\sup\nolimits_t |M_t^n| > \varepsilon\} \le \varepsilon^{-1} E|\xi_n| \to 0, \qquad \varepsilon > 0,$$
and so $\sup_t |M_t^n| \stackrel{P}{\to} 0$. Now let $\xi = \prod_{k \le n} f_k(X_{t_k})$ for some $f_1, \ldots, f_n \in C_0$ and $t_1 < \cdots < t_n$. Writing $h_k = t_k - t_{k-1}$, we note that
$$M_t = \prod\nolimits_{k \le m} f_k(X_{t_k}) \cdot T_{t_{m+1} - t}\, g_{m+1}(X_t), \qquad t \in [t_m, t_{m+1}], \tag{19}$$
where
$$g_k = f_k T_{h_{k+1}}(f_{k+1} T_{h_{k+2}}(\cdots T_{h_n} f_n) \cdots), \qquad k = 1, \ldots, n,$$
with the obvious conventions for $t < t_1$ and $t > t_n$. Since $T_t g(x)$ is jointly continuous in $(t, x)$ for each $g \in C_0$, equation (19) defines a right-continuous version of $M$ satisfying (18), and so $\xi \in \mathcal{C}$. By a simple approximation it follows that $\mathcal{C}$ contains all indicator functions of sets $\bigcap_{k \le n} \{X_{t_k} \in G_k\}$ with $G_1, \ldots, G_n$ open. The result extends by a monotone class argument to any $X$-measurable indicator function $\xi$, and a routine argument yields the final extension to $L^1$. $\Box$

A basic role in the theory is played by the processes
$$M_t^f = f(X_t) - f(X_0) - \int_0^t Af(X_s)\, ds, \qquad t \ge 0,\ f \in \mathcal{D}.$$

Lemma 19.21 (Dynkin's formula) The processes $M^f$ are martingales under any initial distribution $\nu$ for $X$. In particular, we have for any bounded optional time $\tau$
$$E_x f(X_\tau) = f(x) + E_x \int_0^\tau Af(X_s)\, ds, \qquad x \in S,\ f \in \mathcal{D}. \tag{20}$$

Proof: For any $t, h \ge 0$, we have
$$M_{t+h}^f - M_t^f = f(X_{t+h}) - f(X_t) - \int_t^{t+h} Af(X_s)\, ds = M_h^f \circ \theta_t,$$
and so by the Markov property at $t$ and Theorem 19.6
$$E_\nu[M_{t+h}^f \mid \mathcal{F}_t] - M_t^f = E_\nu[M_h^f \circ \theta_t \mid \mathcal{F}_t] = E_{X_t} M_h^f = 0.$$
Thus, $M^f$ is a martingale, and (20) follows by optional sampling. $\Box$

As a preparation for the next major result, we introduce the optional times
$$\tau_h = \inf\{t \ge 0;\ \rho(X_t, X_0) > h\}, \qquad h > 0,$$
where $\rho$ denotes the metric in $S$. Note that a state $x$ is absorbing iff $\tau_h = \infty$ a.s. $P_x$ for every $h > 0$.

Lemma 19.22 (escape times) For any nonabsorbing state $x \in S$, we have $E_x \tau_h < \infty$ for all sufficiently small $h > 0$.

Proof: If $x$ is not absorbing, then $\mu_t(x, B_\varepsilon^x) \le p < 1$ for some $t, \varepsilon > 0$, where $B_\varepsilon^x = \{y;\ \rho(x, y) \le \varepsilon\}$. By Lemma 19.3 and Theorem 4.25 we may
choose $h \in (0, \varepsilon]$ so small that
$$\mu_t(y, B_h^x) \le \mu_t(y, B_\varepsilon^x) \le p, \qquad y \in B_h^x.$$
Then Proposition 8.2 yields
$$P_x\{\tau_h > nt\} \le P_x \bigcap\nolimits_{k \le n} \{X_{kt} \in B_h^x\} \le p^n, \qquad n \in \mathbb{Z}_+,$$
and so by Lemma 3.4
$$E_x \tau_h = \int_0^\infty P_x\{\tau_h > s\}\, ds \le t \sum\nolimits_{n \ge 0} P_x\{\tau_h > nt\} \le t \sum\nolimits_{n \ge 0} p^n = \frac{t}{1 - p} < \infty. \qquad \Box$$

We turn to a probabilistic description of the generator and its domain. Say that $A$ is maximal within a class of linear operators if it extends every member of the class.

Theorem 19.23 (characteristic operator, Dynkin) Let $(A, \mathcal{D})$ be the generator of a Feller process. Then for any $f \in \mathcal{D}$ we have $Af(x) = 0$ if $x$ is absorbing, and otherwise
$$Af(x) = \lim_{h \to 0} \frac{E_x f(X_{\tau_h}) - f(x)}{E_x \tau_h}. \tag{21}$$
Furthermore, $A$ is the maximal operator on $C_0$ with those properties.

Proof: Fix any $f \in \mathcal{D}$. If $x$ is absorbing, then $T_t f(x) = f(x)$ for all $t \ge 0$, and so $Af(x) = 0$. For a nonabsorbing $x$, we get instead by Lemma 19.21
$$E_x f(X_{\tau_h \wedge t}) - f(x) = E_x \int_0^{\tau_h \wedge t} Af(X_s)\, ds, \qquad t, h > 0. \tag{22}$$
By Lemma 19.22 we have $E_x \tau_h < \infty$ for sufficiently small $h > 0$, and so (22) extends by dominated convergence to $t = \infty$. Relation (21) now follows from the continuity of $Af$, together with the fact that $\rho(X_s, x) \le h$ for all $s < \tau_h$. Since the positive maximum principle holds for any extension of $A$ with the stated properties, the last assertion follows by Lemma 19.12. $\Box$

In the special case when $S = \mathbb{R}^d$, let $C_K^\infty$ denote the class of infinitely differentiable functions on $\mathbb{R}^d$ with bounded support. An operator $(A, \mathcal{D})$ with $\mathcal{D} \supset C_K^\infty$ is said to be local on $C_K^\infty$ if $Af(x) = 0$ whenever $f$ vanishes in some neighborhood of $x$. For any generator with this property, we note that the positive-maximum principle implies a local positive-maximum principle, asserting that if $f \in C_K^\infty$ has a local maximum $\ge 0$ at some point $x$, then $Af(x) \le 0$.

The following result gives the basic connection between diffusion processes and elliptic differential operators. This connection is explored further in Chapters 21 and 24.
Theorem 19.24 (Feller diffusions and elliptic operators, Dynkin) Let $(A, \mathcal{D})$ be the generator of a Feller process $X$ in $\mathbb{R}^d$, and assume that $C_K^\infty \subset \mathcal{D}$. Then $X$ is continuous on $[0, \zeta)$, a.s. $P_\nu$ for every $\nu$, iff $A$ is local on $C_K^\infty$. In that case there exist some functions $a_{ij}, b_i, c \in C(\mathbb{R}^d)$, where $c \ge 0$ and the $a_{ij}$ form a symmetric, nonnegative definite matrix, such that for any $f \in C_K^\infty$ and $x \in \mathbb{R}^d$,
$$Af(x) = \tfrac{1}{2} \sum\nolimits_{i,j} a_{ij}(x) f''_{ij}(x) + \sum\nolimits_i b_i(x) f'_i(x) - c(x) f(x). \tag{23}$$

In the situation described by this result, we may choose $\Omega$ to consist of all paths that are continuous on $[0, \zeta)$. The resulting Markov process is referred to as a canonical Feller diffusion.

Proof: If $X$ is continuous on $[0, \zeta)$, then $A$ is local by Theorem 19.23. Conversely, assume that $A$ is local on $C_K^\infty$. Fix any $x \in \mathbb{R}^d$ and $0 < h < m$, and choose $f \in C_K^\infty$ with $f \ge 0$ and support in $\{y;\ h \le |y - x| \le m\}$. Then $Af(y) = 0$ for all $y \in B_h^x$, and so Lemma 19.21 shows that $f(X_{t \wedge \tau_h})$ is a martingale under $P_x$. By dominated convergence we get $E_x f(X_{\tau_h}) = 0$, and since $m$ was arbitrary,
$$P_x\{|X_{\tau_h} - x| \le h \text{ or } X_{\tau_h} = \Delta\} = 1, \qquad x \in \mathbb{R}^d,\ h > 0.$$
Applying the Markov property at fixed times, we obtain for any initial distribution $\nu$
$$P_\nu \bigcap\nolimits_{t \in \mathbb{Q}_+} \theta_t^{-1}\{|X_{\tau_h} - X_0| \le h \text{ or } X_{\tau_h} = \Delta\} = 1, \qquad h > 0,$$
which implies $P_\nu\{\sup_{t < \zeta} |\Delta X_t| \le h\} = 1$, $h > 0$. Hence, $X$ is continuous on $[0, \zeta)$ a.s. $P_\nu$.

To show that (23) holds for suitable $a_{ij}$, $b_i$, and $c$, we choose for every $x \in \mathbb{R}^d$ some functions $f_0^x, f_i^x, f_{ij}^x \in C_K^\infty$ such that, for any $y$ close to $x$,
$$f_0^x(y) = 1, \qquad f_i^x(y) = y_i - x_i, \qquad f_{ij}^x(y) = (y_i - x_i)(y_j - x_j).$$
Putting
$$c(x) = -Af_0^x(x), \qquad b_i(x) = Af_i^x(x), \qquad a_{ij}(x) = Af_{ij}^x(x),$$
we note that (23) holds locally for any function $f \in C_K^\infty$ that agrees near $x$ with a second-degree polynomial. In particular, we may choose $f_0(y) = 1$, $f_i(y) = y_i$, and $f_{ij}(y) = y_i y_j$ near $x$ to obtain
$$Af_0(x) = -c(x), \qquad Af_i(x) = b_i(x) - x_i c(x), \qquad Af_{ij}(x) = a_{ij}(x) + x_i b_j(x) + x_j b_i(x) - x_i x_j c(x).$$
This shows that $c$, $b_i$, and $a_{ij} = a_{ji}$ are continuous.
Applying the local positive-maximum principle to $f_0^x$ gives $c(x) \ge 0$. By the same principle applied to the function
$$f = -\Big\{\sum\nolimits_i u_i f_i^x\Big\}^2 = -\sum\nolimits_{i,j} u_i u_j f_{ij}^x,$$
we get $\sum_{ij} u_i u_j a_{ij}(x) \ge 0$, which shows that $(a_{ij})$ is nonnegative definite. Finally, we consider any function $f \in C_K^\infty$ with a second-order Taylor expansion $\hat f$ around $x$. Here each function
$$g_\pm(y) = \pm(f(y) - \hat f(y)) - \varepsilon |x - y|^2, \qquad \varepsilon > 0,$$
has a local maximum $0$ at $x$, and so
$$Ag_\pm(x) = \pm(Af(x) - A\hat f(x)) - \varepsilon \sum\nolimits_i a_{ii}(x) \le 0, \qquad \varepsilon > 0.$$
Letting $\varepsilon \to 0$ gives $Af(x) = A\hat f(x)$, which shows that (23) is generally true. $\Box$

We consider next a basic convergence theorem for Feller processes, essentially generalizing the result for Lévy processes in Theorem 15.17.

Theorem 19.25 (convergence, Trotter, Sova, Kurtz, Mackevičius) Let $X$ and $X^n$ be Feller processes in $S$ with semigroups $(T_t)$ and $(T_{n,t})$ and generators $(A, \mathcal{D})$ and $(A_n, \mathcal{D}_n)$, respectively. Fix a core $D$ for $A$. Then these conditions are equivalent:
(i) If $f \in D$, there exist some $f_n \in \mathcal{D}_n$ with $f_n \to f$ and $A_n f_n \to Af$.
(ii) $T_{n,t} \to T_t$ strongly for each $t \ge 0$.
(iii) $T_{n,t} f \to T_t f$ for every $f \in C_0$, uniformly for bounded $t \ge 0$.
(iv) If $X_0^n \stackrel{d}{\to} X_0$ in $S$, then $X^n \stackrel{d}{\to} X$ in $D(\mathbb{R}_+, S)$.

For the proof we need two lemmas, the first of which extends Lemma 19.7.

Lemma 19.26 (norm inequality) Let $(T_t)$ and $(T'_t)$ be Feller semigroups with generators $(A, \mathcal{D})$ and $(A', \mathcal{D}')$, respectively, where $A'$ is bounded. Then
$$\|T_t f - T'_t f\| \le \int_0^t \|(A - A') T_s f\|\, ds, \qquad f \in \mathcal{D},\ t \ge 0. \tag{24}$$

Proof: Fix any $f \in \mathcal{D}$ and $t > 0$. Since $(T'_t)$ is norm continuous, we get by Theorem 19.6
$$\frac{d}{ds}(T'_{t-s} T_s f) = T'_{t-s}(A - A') T_s f, \qquad 0 < s < t.$$
Here the right-hand side is continuous in $s$, because of the strong continuity of $(T_s)$, the boundedness of $A'$, the commutativity of $A$ and $T_s$, and the norm continuity of $(T'_s)$. Hence,
$$T_t f - T'_t f = \int_0^t \frac{d}{ds}(T'_{t-s} T_s f)\, ds = \int_0^t T'_{t-s}(A - A') T_s f\, ds,$$
and (24) follows by the contractivity of $T'_{t-s}$. $\Box$

We may next establish a continuity property for the Yosida approximations $A^\lambda$ and $A_n^\lambda$ of $A$ and $A_n$, respectively.

Lemma 19.27 (continuity of Yosida approximation) Let $(A, \mathcal{D})$ and $(A_n, \mathcal{D}_n)$ be the generators of some Feller semigroups satisfying condition (i) of Theorem 19.25. Then $A_n^\lambda \to A^\lambda$ strongly for every $\lambda > 0$.

Proof: By Lemma 19.8 it suffices to show that $A_n^\lambda f \to A^\lambda f$ for every $f \in (\lambda - A)D$. Then define $g = R_\lambda f \in D$. By (i) we may choose some $g_n \in \mathcal{D}_n$ with $g_n \to g$ and $A_n g_n \to Ag$. Then $f_n = (\lambda - A_n) g_n \to (\lambda - A)g = f$, and so
$$\|A_n^\lambda f - A^\lambda f\| = \lambda^2 \|R_\lambda^n f - R_\lambda f\| \le \lambda^2 \|R_\lambda^n (f - f_n)\| + \lambda^2 \|R_\lambda^n f_n - R_\lambda f\| \le \lambda \|f - f_n\| + \lambda^2 \|g_n - g\| \to 0. \qquad \Box$$

Proof of Theorem 19.25: First we show that (i) implies (iii). Since $D$ is dense in $C_0$, it is enough to verify (iii) for $f \in D$. Then choose some functions $f_n$ as in (i), and conclude by Lemmas 19.7 and 19.26 that, for any $n \in \mathbb{N}$ and $t, \lambda > 0$,
$$\|T_{n,t} f - T_t f\| \le \|T_{n,t}(f - f_n)\| + \|(T_{n,t} - T_{n,t}^\lambda) f_n\| + \|T_{n,t}^\lambda (f_n - f)\| + \|(T_{n,t}^\lambda - T_t^\lambda) f\| + \|(T_t^\lambda - T_t) f\| \le 2\|f_n - f\| + t\|(A^\lambda - A) f\| + t\|(A_n - A_n^\lambda) f_n\| + \int_0^t \|(A_n^\lambda - A^\lambda) T_s^\lambda f\|\, ds. \tag{25}$$
By Lemma 19.27 and dominated convergence, the last term tends to zero as $n \to \infty$. For the third term on the right, we get
$$\|(A_n - A_n^\lambda) f_n\| \le \|A_n f_n - Af\| + \|(A - A^\lambda) f\| + \|(A^\lambda - A_n^\lambda) f\| + \|A_n^\lambda (f - f_n)\|,$$
which tends to $\|(A - A^\lambda) f\|$ by the same lemma. Hence,
$$\limsup_{n \to \infty}\, \sup_{t \le u} \|T_{n,t} f - T_t f\| \le 2u\|(A^\lambda - A) f\|, \qquad u, \lambda > 0,$$
and the desired convergence follows by Lemma 19.7 as we let $\lambda \to \infty$.

Conversely, (iii) trivially implies (ii), and so the equivalence of (i)–(iii) will follow if we can show that (ii) implies (i). Then fix any $f \in D$ and $\lambda > 0$, and define $g = (\lambda - A)f$ and $f_n = R_\lambda^n g$. Assuming (ii), we get by dominated convergence $f_n \to R_\lambda g = f$. Since $(\lambda - A_n) f_n = g = (\lambda - A) f$, we also note that $A_n f_n \to Af$. Thus, even (i) holds.

It remains to show that conditions (i)–(iii) are equivalent to (iv).
For convenience, we may then assume that $S$ is compact and the semigroups $(T_t)$ and $(T_{n,t})$ are conservative. First assume (iv). We may establish (ii) by showing that, for any $f \in \hat C$ and $t \ge 0$, we have $T_{n,t} f(x_n) \to T_t f(x)$
whenever $x_n \to x$ in $S$. Then assume that $X_0 = x$ and $X_0^n = x_n$. By Lemma 19.3 the process $X$ is a.s. continuous at $t$. Thus, (iv) yields $X_t^n \stackrel{d}{\to} X_t$, and the desired convergence follows.

Conversely, assume conditions (i)–(iii), and let $X_0^n \stackrel{d}{\to} X_0$. To obtain $X^n \stackrel{fd}{\to} X$, it is enough to show that, for any $f_0, \ldots, f_m \in \hat C$ and $0 = t_0 < t_1 < \cdots < t_m$,
$$\lim_{n \to \infty} E \prod\nolimits_{k \le m} f_k(X_{t_k}^n) = E \prod\nolimits_{k \le m} f_k(X_{t_k}). \tag{26}$$
This holds by hypothesis when $m = 0$. Proceeding by induction, we may use the Markov property to rewrite (26) in the form
$$E \prod\nolimits_{k < m} f_k(X_{t_k}^n) \cdot T_{n,h_m} f_m(X_{t_{m-1}}^n) \to E \prod\nolimits_{k < m} f_k(X_{t_k}) \cdot T_{h_m} f_m(X_{t_{m-1}}), \tag{27}$$
where $h_m = t_m - t_{m-1}$. Since (ii) implies $T_{n,h_m} f_m \to T_{h_m} f_m$, it is equivalent to prove (27) with $T_{n,h_m}$ replaced by $T_{h_m}$. The resulting condition is of the form (26) with $m$ replaced by $m - 1$. This completes the induction and shows that $X^n \stackrel{fd}{\to} X$.

To strengthen the conclusion to $X^n \stackrel{d}{\to} X$, it suffices by Theorems 16.10 and 16.11 to show that $\rho(X_{\tau_n}^n, X_{\tau_n + h_n}^n) \stackrel{P}{\to} 0$ for any finite optional times $\tau_n$ and positive constants $h_n \to 0$. By the strong Markov property we may prove instead that $\rho(X_0^n, X_{h_n}^n) \stackrel{P}{\to} 0$ for any initial distributions $\nu_n$. In view of the compactness of $S$ and Theorem 16.3, we may then assume that $\nu_n \stackrel{w}{\to} \nu$ for some probability measure $\nu$. Fixing any $f, g \in \hat C$ and noting that $T_{h_n} g \to g$ by (iii), we get
$$E f(X_0^n)\, g(X_{h_n}^n) = E (f\, T_{n,h_n} g)(X_0^n) \to E (fg)(X_0),$$
where $\mathcal{L}(X_0) = \nu$. Then $(X_0^n, X_{h_n}^n) \stackrel{d}{\to} (X_0, X_0)$ as before, and in particular $\rho(X_0^n, X_{h_n}^n) \stackrel{d}{\to} \rho(X_0, X_0) = 0$. This completes the proof of (iv). $\Box$

From the last theorem and its proof we may easily deduce a similar approximation property for discrete-time Markov chains. The result extends the approximations for random walks obtained in Corollary 15.20 and Theorem 16.14.

Theorem 19.28 (approximation of Markov chains) Let $Y^1, Y^2, \ldots$ be discrete-time Markov chains in $S$ with transition operators $U_1, U_2, \ldots$, and consider a Feller process $X$ in $S$ with semigroup $(T_t)$ and generator $A$.
Fix a core $D$ for $A$, and assume that $0 < h_n \to 0$. Then conditions (i)–(iv) of Theorem 19.25 remain equivalent for the operators and processes
$$A_n = h_n^{-1}(U_n - I), \qquad T_{n,t} = U_n^{[t/h_n]}, \qquad X_t^n = Y_{[t/h_n]}^n.$$

Proof: Let $N$ be an independent, unit-rate Poisson process, and note that the processes $\tilde X_t^n = Y^n \circ N_{t/h_n}$ are pseudo-Poisson with generators $A_n$. Theorem 19.25 shows that (i) is equivalent to (iv) with $X^n$ replaced
by $\tilde X^n$. By the strong law of large numbers for $N$ together with Theorem 4.28, we also see that (iv) holds simultaneously for the processes $\tilde X^n$ and $X^n$. Thus, (i) and (iv) are equivalent.

Since $X$ is a.s. continuous at fixed times, condition (iv) yields $X_{t_n}^n \stackrel{d}{\to} X_t$ whenever $t_n \to t$ and the processes $X^n$ and $X$ start at fixed points $x_n \to x$ in $S$. Hence, $T_{n,t_n} f(x_n) \to T_t f(x)$ for any $f \in \hat C$, and (iii) follows. Since (iii) trivially implies (ii), it remains to show that (ii) implies (i). Arguing as in the preceding proof, we then need to show that $R_\lambda^n g \to R_\lambda g$ for any $\lambda > 0$ and $g \in C_0$, where $R_\lambda^n = (\lambda - A_n)^{-1}$. Now (ii) yields $\tilde R_\lambda^n g \to R_\lambda g$, where $\tilde R_\lambda^n = \int e^{-\lambda t} T_{n,t}\, dt$, and so it suffices to prove that $(\tilde R_\lambda^n - R_\lambda^n) g \to 0$. Then note that
$$\lambda \tilde R_\lambda^n g - \lambda R_\lambda^n g = E\, g(Y_{\tilde\kappa_n - 1}^n) - E\, g(Y_{\kappa_n - 1}^n),$$
where the random variables $\tilde\kappa_n$ and $\kappa_n$ are independent of $Y^n$ and geometrically distributed with parameters $\tilde p_n = 1 - e^{-\lambda h_n}$ and $p_n = \lambda h_n(1 + \lambda h_n)^{-1}$, respectively. Since $\tilde p_n \sim p_n$, we have $\|\mathcal{L}(\tilde\kappa_n) - \mathcal{L}(\kappa_n)\| \to 0$, and the desired convergence follows by Fubini's theorem. $\Box$

Exercises

1. Examine how the proofs of Theorems 19.4 and 19.6 can be simplified if we assume $(F_3)$ instead of the weaker condition $(F_2)$.

2. Consider a pseudo-Poisson process $X$ on $S$ with rate kernel $Q$. Give conditions ensuring $X$ to be Feller.

3. Verify the resolvent equation (3), and conclude that the range of $R_\lambda$ is independent of $\lambda$.

4. Show that a Feller semigroup $(T_t)$ is uniquely determined by the resolvent operator $R_\lambda$ for a fixed $\lambda > 0$. Interpret the result probabilistically in terms of an independent, exponentially distributed random variable with mean $\lambda^{-1}$. (Hint: Use Theorem 19.4 and Lemma 19.5.)

5. Consider a discrete-time Markov process in $S$ with transition operator $T$, and let $\tau$ be an independent random variable with a fixed geometric distribution. Show that $T$ is uniquely determined by $E_x f(X_\tau)$ for arbitrary $x \in S$ and $f \ge 0$.
(Hint: Apply the preceding result to the associated pseudo-Poisson process.)

6. Give a probabilistic description of the Yosida approximation $T_t^\lambda$ in terms of the original process $X$ and two independent Poisson processes with rate $\lambda$.

7. Given a Feller diffusion semigroup, write the second differential equation in Theorem 19.6, for suitable $f$, as a PDE for the function $T_t f(x)$ on $\mathbb{R}_+ \times \mathbb{R}^d$. Also show that the backward equation of Theorem 12.22 is a special case of the same equation.
8. Consider a Feller process $X$ and an independent subordinator $T$. Show that $Y_t = X(T_t)$ is again Markov, and that $Y$ is Lévy whenever this is true for $X$. If both $T$ and $X$ are stable, then so is $Y$. Find the relationship between the transition semigroups, respectively between the indices of stability.

9. Consider a Feller process $X$ and an independent renewal process $\tau_0, \tau_1, \ldots$. Show that $Y_n = X_{\tau_n}$ is a discrete-time Markov process, and express its transition kernel in terms of the transition semigroup of $X$. Also show that $Y_t = X(\tau_{[t]})$ may fail to be Markov, even when $(\tau_n)$ is Poisson.

10. Let $X$ and $Y$ be independent Feller processes in $S$ and $T$ with generators $A$ and $B$. Show that $(X, Y)$ is a Feller process in $S \times T$ with generator extending $\bar A + \bar B$, where $\bar A$ and $\bar B$ denote the natural extensions of $A$ and $B$ to $C_0(S \times T)$.

11. Consider in $S$ a Feller process with generator $A$ and a pseudo-Poisson process with generator $B$. Construct a Markov process with generator $A + B$.

12. Use Theorem 19.23 to show that the generator of Brownian motion in $\mathbb{R}^d$ extends $A = \frac{1}{2}\Delta$ on the set $\mathcal{D}$ of functions $f \in C_0^2$ with $\Delta f \in C_0$.

13. Let $R_\lambda$ be the $\lambda$-resolvent of Brownian motion in $\mathbb{R}$. For any $f \in C_0$, put $h = R_\lambda f$, and show by direct computation that $\lambda h - \frac{1}{2} h'' = f$. Conclude by Theorem 19.4 that $\frac{1}{2}\Delta$ with domain $\mathcal{D}$, defined as above, extends the generator $A$. Thus, $A = \frac{1}{2}\Delta$ by the preceding exercise or by Lemma 19.12.

14. Show that if $A$ is a bounded generator on $C_0$, then the associated Markov process is pseudo-Poisson. (Hint: Note as in Theorem 19.11 that $A$ satisfies the positive-maximum principle. Next use Riesz' representation theorem to express $A$ in terms of bounded kernels, and show that $A$ has the form of Proposition 19.2.)

15. Let the processes $X^n$ and $X$ be such as in Theorem 16.14. Show that $X_t^n \stackrel{d}{\to} X_t$ for all $t \ge 0$ implies $X^n \stackrel{d}{\to} X$ in $D(\mathbb{R}_+, \mathbb{R}^d)$, and compare with the stated theorem. Also prove a corresponding result for a sequence of Lévy processes $X^n$.
(Hint: Use Theorems 19.28 and 19.25, respectively.) 
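In connection with Theorem 19.28 and Exercise 15, the semigroup convergence $T_{n,t} = U_n^{[t/h_n]} \to T_t$ is easy to visualize for the simple random-walk approximation of Brownian motion, where $U_n f(x) = \frac12\{f(x+\sqrt{h_n}) + f(x-\sqrt{h_n})\}$ and $A_n f = h_n^{-1}(U_n - I)f \to \frac12 f''$. A numerical sketch (an illustration only; the step size, time horizon, and test function are arbitrary choices):

```python
import numpy as np

h = 1e-3                      # time step h_n; the walk's space step is sqrt(h)
dx = np.sqrt(h)
x = np.arange(-12.0, 12.0 + dx / 2, dx)
f = np.exp(-x**2)

# iterate U_n a total of [t/h] times: U_n f(x) = (f(x+dx) + f(x-dx))/2
t = 0.5
g = f.copy()
for _ in range(round(t / h)):
    g[1:-1] = 0.5 * (g[2:] + g[:-2])   # boundary values are negligibly small here

# exact Brownian semigroup for this f: T_t f(x) = (1+2t)^{-1/2} exp(-x^2/(1+2t))
exact = np.exp(-x**2 / (1.0 + 2.0 * t)) / np.sqrt(1.0 + 2.0 * t)
mid = np.abs(x) <= 3
semi_err = np.max(np.abs(g - exact)[mid])

# generator convergence: h^{-1}(U_n - I)f is close to f''/2
gen = (np.roll(f, -1) + np.roll(f, 1) - 2.0 * f) / (2.0 * h)
fpp2 = (2.0 * x**2 - 1.0) * np.exp(-x**2)     # f''/2 for f = exp(-x^2)
gen_err = np.max(np.abs(gen - fpp2)[mid])
print(semi_err, gen_err)
```

Both errors are of order $h$, consistent with condition (i) of Theorem 19.25 for this pair of generators.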
Chapter 20

Ergodic Properties of Markov Processes

transition and contraction operators; ratio ergodic theorem; space-time invariance and tail triviality; mixing and convergence in total variation; Harris recurrence and transience; existence and uniqueness of invariant measure; distributional and pathwise limits

In Chapters 8 and 12 we have seen, under suitable regularity conditions, how the transition probabilities of a discrete- or continuous-time Markov chain converge in total variation toward a unique invariant distribution. Here our main purpose is to study the asymptotic behavior of more general Markov processes and their associated transition kernels. A wide range of powerful tools will then come into play.

We first extend the basic ergodic theorem of Chapter 10 to suitable contraction operators on an arbitrary measure space and establish a general operator version of the ratio ergodic theorem. The relevance of those results for the study of Markov processes is due to the fact that the transition operators are positive $L^1$–$L^\infty$-contractions with respect to any invariant measure $\lambda$ on the state space $S$. The mentioned results cover both the positive recurrent case, where $\lambda S < \infty$, and the null-recurrent case, where $\lambda S = \infty$. Even more remarkably, the same ergodic theorems apply to both the transition probabilities and the sample paths, in each case giving conclusive information about the asymptotic behavior.

Next we prove for an arbitrary Markov process that a certain strong ergodicity condition is equivalent to the triviality of the tail σ-field, the constancy of all bounded, space-time invariant functions, and a uniform mixing condition. We also consider a similar result where all four conditions are replaced by suitably averaged versions. For both sets of equivalences, one gets very simple and transparent proofs by applying the general coupling results of Chapter 10.
In order to apply the mentioned theorems to specific Markov processes, one needs to find regularity conditions ensuring the existence of an invariant measure or the triviality of the tail σ-field. Here we consider a general class of Feller processes which satisfy either a strong recurrence or a uniform transience condition. In the former case, we prove the existence of an invariant measure, required for the application of the mentioned ergodic
20. Ergodic Properties of Markov Processes 391

theorems, and show that the space-time invariant functions are constant, which implies the mentioned strong ergodicity. Our proofs of the latter results depend on some potential theoretic tools related to those developed in Chapter 19.

To begin with the technical developments, we consider a Markov transition operator $T$ on an arbitrary measurable space $(S, \mathcal{S})$. Note that $T$ is positive, in the sense that $f \ge 0$ implies $Tf \ge 0$, and also that $T1 = 1$. As before, we write $P_x$ for the distribution of a Markov process on $\mathbb{Z}_+$ with transition operator $T$ starting at $x \in S$. More generally, we define $P_\mu = \int_S P_x\, \mu(dx)$ for any measure $\mu$ on $S$. A measure $\lambda$ on $S$ is said to be invariant if $\lambda T f = \lambda f$ for any measurable function $f \ge 0$. Writing $\theta$ for the shift on the path space $S^\infty$, we define the associated operator $\hat\theta$ by $\hat\theta f = f \circ \theta$.

For any $p \ge 1$, we say that an operator $T$ on some measure space $(S, \mathcal{S}, \mu)$ is an $L^p$-contraction if $\|Tf\|_p \le \|f\|_p$ for every $f \in L^p$. By an $L^1$–$L^\infty$-contraction we mean an operator that is an $L^p$-contraction for every $p \in [1, \infty]$. The following result shows the relevance of the mentioned notions for the theory of Markov processes.

Lemma 20.1 (Markov processes and contractions) Let $T$ be a Markov transition operator on $(S, \mathcal{S})$ with invariant measure $\lambda$. Then
(i) $T$ is a positive $L^1$–$L^\infty$-contraction on $(S, \lambda)$;
(ii) $\hat\theta$ is a positive $L^1$–$L^\infty$-contraction on $(S^\infty, P_\lambda)$.

Proof: (i) Applying Jensen's inequality to the transition kernel $\mu(x, B) = T1_B(x)$ and using the invariance of $\lambda$, we get for any $p \in [1, \infty)$ and $f \in L^p$
$$\|Tf\|_p^p = \lambda|\mu f|^p \le \lambda\mu|f|^p = \lambda|f|^p = \|f\|_p^p.$$
The result for $p = \infty$ is obvious.

(ii) Proceeding as in Lemma 8.11, we see that $\theta$ is a measure-preserving transformation on $(S^\infty, P_\lambda)$. Hence, for any measurable function $f \ge 0$ on $S^\infty$ and constant $p \ge 1$, we have
$$P_\lambda|\hat\theta f|^p = P_\lambda|f \circ \theta|^p = (P_\lambda \circ \theta^{-1})|f|^p = P_\lambda|f|^p.$$
The contraction property for $p = \infty$ is again obvious. $\Box$
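Lemma 20.1(i) can be checked directly in the finite-state case, where $T$ is a stochastic matrix, an invariant probability solves $\lambda T = \lambda$, and $\|f\|_p^p = \sum_x \lambda_x |f(x)|^p$. A small numerical sketch (the matrix and test function below are arbitrary examples):

```python
import numpy as np

T = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])   # a stochastic matrix (rows sum to 1)

# invariant measure: left eigenvector lam T = lam, normalized to a probability
w, V = np.linalg.eig(T.T)
lam = np.real(V[:, np.argmin(np.abs(w - 1.0))])
lam = lam / lam.sum()

f = np.array([2.0, -1.0, 0.5])
Tf = T @ f                         # (Tf)(x) = sum_y T(x, y) f(y)

def norm(g, p):
    return (lam @ np.abs(g)**p) ** (1.0 / p)

# L^p-contraction for several p, plus the L^infty case
contracts = all(norm(Tf, p) <= norm(f, p) + 1e-12 for p in (1.0, 2.0, 5.0))
sup_ok = np.max(np.abs(Tf)) <= np.max(np.abs(f))
print(contracts, sup_ok)
```

The contraction holds here by exactly the Jensen argument in the proof, since each row of $T$ is a probability vector.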
We shall see how some crucial results of Chapter 10 carry over to the context of positive $L^1$–$L^\infty$-contractions on an arbitrary measure space. First we consider an operator version of Birkhoff's ergodic theorem. To simplify our writing, we introduce the operators $S_n = \sum_{k < n} T^k$, $A_n = S_n/n$, and $Mf = \sup_n A_n f$. Say that $f$ is $T$-invariant if $Tf = f$.
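For a finite irreducible chain, the averages $A_n f = n^{-1}\sum_{k < n} T^k f$ just introduced can be computed explicitly: they converge to the constant function $\lambda f$, where $\lambda$ is the stationary probability, and this constant is the $T$-invariant limit. A numerical sketch (an illustration only; the matrix and test function are arbitrary choices):

```python
import numpy as np

T = np.array([[0.2, 0.5, 0.3],
              [0.3, 0.3, 0.4],
              [0.5, 0.25, 0.25]])
f = np.array([2.0, -1.0, 0.5])

# stationary probability: solve lam T = lam with sum(lam) = 1
n = T.shape[0]
M = np.vstack([T.T - np.eye(n), np.ones(n)])
lam = np.linalg.lstsq(M, np.concatenate([np.zeros(n), [1.0]]), rcond=None)[0]

# A_N f = N^{-1} sum_{k<N} T^k f
N = 5000
g, s = f.copy(), np.zeros(n)
for _ in range(N):
    s += g
    g = T @ g
ANf = s / N

gap = np.max(np.abs(ANf - lam @ f))  # the limit is the constant (lam f)
print(gap)
```

The gap decays like $O(1/N)$, since $T^k f$ converges geometrically to the constant $\lambda f$.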
Theorem 20.2 (operator ergodic theorem, Hopf, Dunford and Schwartz) Let $T$ be a positive $L^1$–$L^\infty$-contraction on a measure space $(S, \mathcal{S}, \mu)$. Then $A_n f$ converges a.e. for every $f \in L^1$ toward a $T$-invariant function $\hat A f \in L^1$.

For the proof, we need to extend the inequalities in Lemmas 10.7 and 10.11 and in Proposition 10.10 (i) to an operator setting.

Lemma 20.3 (maximum inequalities) For any positive $L^1$-contraction $T$ on a measure space $(S, \mathcal{S}, \mu)$, we have
(i) $\mu[f;\ Mf > 0] \ge 0$, $f \in L^1$.
If $T$ is even an $L^1$–$L^\infty$-contraction, then also
(ii) $r\mu\{Mf > 2r\} \le \mu[f;\ f > r]$, $f \in L^1$, $r > 0$;
(iii) $\|Mf\|_p \lesssim \|f\|_p$, $f \in L^p$, $p > 1$.

Proof: (i) For any $f \in L^1$ we write $M_n f = S_1 f \vee \cdots \vee S_n f$ and conclude by positivity that
$$S_k f = f + T S_{k-1} f \le f + T(M_n f)^+, \qquad k = 1, \ldots, n.$$
Hence, $M_n f \le f + T(M_n f)^+$ for all $n$, and so by positivity and contractivity
$$\mu[f;\ M_n f > 0] \ge \mu[M_n f - T(M_n f)^+;\ M_n f > 0] \ge \mu[(M_n f)^+ - T(M_n f)^+] = \|(M_n f)^+\|_1 - \|T(M_n f)^+\|_1 \ge 0.$$
As before, it remains to let $n \to \infty$.

(ii) Put $f_r = f\,1\{f > r\}$. By the $L^\infty$-contractivity and positivity of $A_n$,
$$A_n f - 2r \le A_n(f - 2r) \le A_n(f_r - r), \qquad n \in \mathbb{N},$$
which implies $Mf - 2r \le M(f_r - r)$. Hence, by part (i),
$$r\mu\{Mf > 2r\} \le r\mu\{M(f_r - r) > 0\} \le \mu[f_r;\ M(f_r - r) > 0] \le \mu f_r = \mu[f;\ f > r].$$

(iii) Here the earlier proof applies with only notational changes. $\Box$

Proof of Theorem 20.2: Fix any $f \in L^1$. By dominated convergence, we may approximate $f$ in $L^1$ by functions $\tilde f \in L^1 \cap L^\infty \subset L^2$. By Lemma 10.18, we may next approximate $\tilde f$ in $L^2$ by functions of the form $\hat f + (g - Tg)$, where $\hat f, g \in L^2$ and $T\hat f = \hat f$. Finally, we may approximate $g$ in $L^2$ by functions $\tilde g \in L^1 \cap L^\infty$. Since $T$ contracts $L^2$, the functions $\tilde g - T\tilde g$ will then approximate $g - Tg$ in $L^2$. Combining the three approximations, we have for any $\varepsilon > 0$
$$f = f_\varepsilon + (g_\varepsilon - Tg_\varepsilon) + h_\varepsilon + k_\varepsilon, \tag{1}$$
where $f_\varepsilon \in L^2$ with $Tf_\varepsilon = f_\varepsilon$, $g_\varepsilon \in L^1 \cap L^\infty$, and $\|h_\varepsilon\|_2 \vee \|k_\varepsilon\|_1 \le \varepsilon$.
20. Ergodic Properties of Markov Processes 393

Since f_ε is invariant, we have A_n f_ε = f_ε. Next we note that

  ||A_n(g_ε − Tg_ε)||_∞ = n^{−1} ||g_ε − T^n g_ε||_∞ ≤ 2 n^{−1} ||g_ε||_∞ → 0.  (2)

Hence,

  limsup_{n→∞} A_n f ≤ f_ε + M h_ε + M k_ε < ∞ a.e.,

and similarly for liminf_n A_n f. Combining the two estimates gives

  (limsup_n − liminf_n) A_n f ≤ 2 M|h_ε| + 2 M|k_ε|.

Now Lemma 20.3 yields for any ε, r > 0

  ||M|h_ε|||_2 ≲ ||h_ε||_2 < ε,  μ{M|k_ε| > 2r} ≤ r^{−1} ||k_ε||_1 < ε/r,

and so M|h_ε| + M|k_ε| → 0 a.e. as ε → 0 along a suitable sequence. Thus, A_n f converges a.e. toward some limit Af. To see that Af is T-invariant, we note that by (1) and (2) the a.e. limits Ah_ε and Ak_ε exist and satisfy

  T Af − Af = (TA − A)(h_ε + k_ε).

By the contraction property and Fatou's lemma, the right-hand side tends to 0 a.e. as ε → 0 along some sequence, and we get T Af = Af a.e. □

A problem with the last theorem is that the limit Af may be 0, in which case the a.s. convergence A_n f → Af gives little information about the asymptotic behavior of A_n f. For example, this happens when μS = ∞ and T is the operator induced by a μ-preserving and ergodic transformation θ on S. Then Af is a constant, and the condition Af ∈ L^1 implies Af = 0. To get around this difficulty, we may instead compare the asymptotic behavior of S_n f with that of S_n g for a suitable reference function g ∈ L^1_+. This idea leads to a far-reaching and powerful extension of Birkhoff's theorem.

Theorem 20.4 (ratio ergodic theorem, Chacon and Ornstein) Let T be a positive L^1-contraction on a measure space (S, 𝒮, μ), and fix any f ∈ L^1 and g ∈ L^1_+. Then S_n f / S_n g converges a.e. on the set {S_∞ g > 0}.

Our proof will be based on three lemmas.

Lemma 20.5 (individual terms) T^n f / S_{n+1} g → 0 a.e. on {S_∞ g > 0}.

Proof: We may assume that f ≥ 0. Fix any ε > 0, and define

  h_n = T^n f − ε S_{n+1} g,  A_n = {h_n > 0},  n ≥ 0.

By positivity,

  h_n = T h_{n−1} − ε g ≤ T h_{n−1}^+ − ε g, n ≥ 1.
Examining the cases A_n and A_n^c separately, we conclude that

  h_n ≤ T h_{n−1}^+ − ε 1_{A_n} g, n ≥ 1,

and so by contractivity

  ε μ[g; A_n] ≤ μ(T h_{n−1}^+) − μ h_n^+ ≤ μ h_{n−1}^+ − μ h_n^+.
Summing over n gives

  ε μ Σ_{n≥1} 1_{A_n} g ≤ μ h_0^+ = μ(f − εg)^+ ≤ μf < ∞,

which implies μ[g; A_n i.o.] = 0 and hence limsup_n (T^n f / S_{n+1} g) ≤ ε a.e. on {g > 0}. Since ε was arbitrary, we obtain T^n f / S_{n+1} g → 0 a.e. on {g > 0}. Applying this result to the functions T^m f and T^m g gives the same convergence on {S_{m−1} g = 0 < S_m g} for arbitrary m ≥ 1. □

To state the next result, we introduce the nonlinear filling operator U on L^1, given by Uh = Th_+ − h_−. It is suggestive to think of the sequence U^n h as resulting from successive attempts to fill a hole h_−, by mapping in each step only the matter that has not yet fallen into the hole. We also define M_n h = S_1 h ∨ ⋯ ∨ S_n h.

Lemma 20.6 (filling operator) For any h ∈ L^1 and n ∈ ℕ, we have U^{n−1} h ≥ 0 on {M_n h > 0}.

Proof: Writing h_k = h_+ + (Uh)_+ + ⋯ + (U^k h)_+, we claim that

  h_k ≥ S_{k+1} h, k ≥ 0.  (3)

This holds for k = 0 since h_+ = h + h_− ≥ h. Proceeding by induction, we assume (3) to be true for k = m ≥ 0. Using the induction hypothesis and the definitions of S_k, h_k, and U, we get for k = m + 1

  S_{m+2} h = h + T S_{m+1} h ≤ h + T h_m
   = h + Σ_{k≤m} T(U^k h)_+
   = h + Σ_{k≤m} (U^{k+1} h + (U^k h)_−)
   = h + Σ_{k≤m} ((U^{k+1} h)_+ − (U^{k+1} h)_− + (U^k h)_−)
   = h + h_{m+1} − h_+ + h_− − (U^{m+1} h)_− ≤ h_{m+1}.

This completes the proof of (3). If M_n h > 0 at some point in S, then S_k h > 0 for some k ≤ n, and so by (3) we have h_{k−1} > 0. But then (U^j h)_+ > 0 for some j ≤ k − 1, and therefore (U^j h)_− = 0 for the same j. Since (U^k h)_− is nonincreasing in k, it follows that (U^{n−1} h)_− = 0, and hence U^{n−1} h ≥ 0. □

To state our third and crucial lemma, we write g ∈ 𝒯_1(f) for a given f ∈ L^1_+ if there exists a decomposition f = f_1 + f_2 with f_1, f_2 ∈ L^1_+ such that g = T f_1 + f_2. In particular, we note that f, g ∈ L^1_+ implies U(f − g) = f′ − g for some f′ ∈ 𝒯_1(f). The classes 𝒯_n(f) are defined recursively by 𝒯_{n+1}(f) = 𝒯_1(𝒯_n(f)), and we put 𝒯(f) = ⋃_n 𝒯_n(f). We may now introduce the functionals

  ψ_B f = sup{μ[g; B]; g ∈ 𝒯(f)}, f ∈ L^1_+, B ∈ 𝒮.
Lemma 20.7 (filling functionals) Let f, g ∈ L^1_+ and B ∈ 𝒮. Then

  B ⊂ {limsup_n S_n(f − g) > 0}  ⟹  ψ_B f ≥ ψ_B g.

Proof: Fix any g′ ∈ 𝒯(g) and c > 1. First we show that

  {limsup_n S_n(f − g) > 0} ⊂ {limsup_n S_n(cf − g′) > 0} a.e.  (4)

We may then assume that g′ ∈ 𝒯_1(g), since the general result then follows by iteration in finitely many steps. Letting g′ = r + Ts for some r, s ∈ L^1_+ with r + s = g, we obtain

  S_n(cf − g′) = Σ_{k<n} T^k (cf − r − Ts)
   = S_n(f − g) + (c − 1) S_n f + s − T^n s.

Since T^n s / S_n f = T^{n−1}(Ts)/S_n f → 0 a.e. on {S_∞ f > 0} by Lemma 20.5, we conclude that eventually S_n(cf − g′) ≥ S_n(f − g) a.e. on the same set, and (4) follows.

Combining the given hypothesis with (4), we obtain the a.e. relation B ⊂ {M(cf − g′) > 0}. Now Lemma 20.6 yields

  U^{n−1}(cf − g′) ≥ 0 on B_n = B ∩ {M_n(cf − g′) > 0}, n ∈ ℕ.

Since B_n ↑ B a.e. and U^{n−1}(cf − g′) = f′ − g′ for some f′ ∈ 𝒯(cf), we get

  0 ≤ μ[U^{n−1}(cf − g′); B_n] = μ[f′; B_n] − μ[g′; B_n]
   ≤ ψ_B(cf) − μ[g′; B_n] → c ψ_B f − μ[g′; B],

and so c ψ_B f ≥ μ[g′; B]. It remains to let c → 1 and take the supremum over g′ ∈ 𝒯(g). □

Proof of Theorem 20.4: We may assume that f ≥ 0. On {S_∞ g > 0}, put

  α = liminf_n (S_n f / S_n g) ≤ limsup_n (S_n f / S_n g) = β,

and define α = β = 0 otherwise. Since S_n g is nondecreasing, we have for any c > 0

  {β > c} ⊂ {limsup_n (S_n(f − cg)/S_n g) > 0, S_∞ g > 0}
   ⊂ {limsup_n S_n(f − cg) > 0}.

Writing B = {β = ∞, S_∞ g > 0}, we see from Lemma 20.7 that

  c ψ_B g = ψ_B(cg) ≤ ψ_B f ≤ μf < ∞,

and as c → ∞ we get ψ_B g = 0. But then μ[T^n g; B] = 0 for all n ≥ 0, and therefore μ[S_∞ g; B] = 0. Since S_∞ g > 0 on B, we obtain μB = 0, which means that β < ∞ a.e.

Now define C = {α < a < b < β} for fixed b > a > 0. As before,

  C ⊂ {limsup_n S_n(f − bg) ∧ limsup_n S_n(ag − f) > 0},
and so by Lemma 20.7

  b ψ_C g = ψ_C(bg) ≤ ψ_C f ≤ ψ_C(ag) = a ψ_C g < ∞,

which implies ψ_C g = 0, and therefore μC = 0. Hence,

  μ{α < β} ≤ Σ_{a<b} μ{α < a < b < β} = 0,

where the summation extends over all rational a < b, and so α = β a.e., which proves the asserted convergence. □

We illustrate the use of the last theorem by considering a striking application to discrete-time Markov processes. Given such a process X in S and a measurable function f on S^∞, we define S_n f = Σ_{k<n} f(θ^k X).

Corollary 20.8 (ratio limit theorem) Given a discrete-time Markov process in S with invariant measure λ, we have for any f ∈ L^1(P_λ) and g ∈ L^1_+(P_λ):

(i) S_n f / S_n g converges a.e. P_λ on {y ∈ S^∞; S_∞ g(y) > 0};
(ii) E_x S_n f / E_x S_n g converges a.e. λ on {x ∈ S; E_x S_∞ g > 0}.

Proof: (i) By Lemma 20.1 (ii) we may apply Theorem 20.4 to the L^1–L^∞-contraction on (S^∞, P_λ) induced by the shift θ, and the result follows.

(ii) Writing f̃(x) = E_x f(X) and using the Markov property at k, we get

  E_x f(θ^k X) = E_x E_{X_k} f(X) = E_x f̃(X_k) = T^k f̃(x),

and so

  E_x S_n f = Σ_{k<n} E_x f(θ^k X) = Σ_{k<n} T^k f̃(x) = S̃_n f̃(x),

where S̃_n = T^0 + ⋯ + T^{n−1} on the right. We also note that

  λ f̃ = ∫ f̃(x) λ(dx) = ∫ E_x f(X) λ(dx) = E_λ f(X) = P_λ f.

Now Lemma 20.1 (i) shows that T is a positive L^1–L^∞-contraction on (S, λ). By Theorem 20.4 we conclude that S̃_n f̃(x)/S̃_n g̃(x) converges a.e. λ on the set {x ∈ S; S̃_∞ g̃(x) > 0}, which translates immediately into the asserted statement. □

Now consider a conservative, continuous-time Markov process on an arbitrary state space (S, 𝒮) with distributions P_x and associated expectation operators E_x. On the canonical path space Ω = S^{ℝ_+} we introduce the shift operators θ_t and filtration ℱ = (ℱ_t). A bounded function f: S → ℝ is said to be invariant or harmonic if it is measurable and such that

  f(x) = T_t f(x) = E_x f(X_t), x ∈ S, t ≥ 0.
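Corollary 20.8 admits a simple numerical illustration (an aside, not from the original text; the walk, the sites, and the run length are arbitrary choices). For the simple symmetric random walk on ℤ, counting measure is invariant, so the ratio limit theorem predicts that the ratio of occupation counts of the site {0} and the pair {−1, +1} approaches the ratio 1/2 of their counting-measure masses:

```python
import random

def occupation_ratio(n_steps, seed):
    # Visits of a simple symmetric random walk on Z to the site 0 and to
    # the pair {-1, +1}.  Counting measure is invariant for the walk, so
    # the ratio limit theorem predicts N_n({0}) / N_n({-1,+1}) -> 1/2 a.s.
    rng = random.Random(seed)
    x, n0, n1 = 0, 0, 0
    for _ in range(n_steps):
        if x == 0:
            n0 += 1
        elif x in (-1, 1):
            n1 += 1
        x += 1 if rng.random() < 0.5 else -1
    return n0 / n1

ratio = occupation_ratio(1_000_000, seed=7)
# ratio fluctuates around 1/2; convergence is slow, since the walk is null recurrent
```

Since the walk is null recurrent, the occupation counts themselves grow only like √n, which is why the ratio converges slowly.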
More generally, we say that a bounded function f: S × ℝ_+ → ℝ is space-time invariant or harmonic if it is measurable and satisfies

  f(x, s) = E_x f(X_t, s + t), x ∈ S, s, t ≥ 0.  (5)
For motivation, we note that f is then invariant for the associated space-time process X̃_t = (X_t, s + t) in S × ℝ_+, where the second component is deterministic apart from the possibly random initial value s ≥ 0. Note that X̃ is again a time-homogeneous Markov process, with transition operators T̃_t f(x, s) = E_x f(X_t, s + t). We need the following useful martingale connection.

Lemma 20.9 (space-time invariance) A bounded, measurable function f: S × ℝ_+ → ℝ is space-time invariant iff the process M_t = f(X_t, s + t) is a P_μ-martingale for any μ and s ≥ 0.

Proof: Assume that f is space-time invariant. Letting s, t, h ≥ 0 and using the Markov property of X, we get

  E_μ[M_{t+h} | ℱ_t] = E_μ[f(X_{t+h}, s + t + h) | ℱ_t]
   = E_{X_t} f(X_h, s + t + h)
   = f(X_t, s + t) = M_t,

which shows that M is a P_μ-martingale. Conversely, the martingale property of M for μ = δ_x yields

  E_x f(X_t, s + t) = E_x M_t = E_x M_0 = f(x, s), x ∈ S, s, t ≥ 0,

which means that f is space-time invariant. □

The tail σ-field on Ω is defined as 𝒯 = ⋂_t 𝒯_t, where 𝒯_t = σ(θ_t) = σ{X_s; s ≥ t}. A σ-field 𝒢 on Ω is said to be P_μ-trivial if P_μ A = 0 or 1 for every A ∈ 𝒢. We write P_μ^B = P_μ[· | B] and say that P_μ is mixing if

  lim_{t→∞} ||P_μ ∘ θ_t^{−1} − P_μ^B ∘ θ_t^{−1}|| = 0, B ∈ ℱ_∞ with P_μ B > 0.

The following key result defines the notion of strong ergodicity, as opposed to the weak ergodicity of Theorem 20.11.

Theorem 20.10 (strong ergodicity, Orey) For any conservative, discrete- or continuous-time Markov semigroup with distributions P_μ, these conditions are equivalent:

(i) the tail σ-field 𝒯 is P_μ-trivial for every μ;
(ii) P_μ is mixing for every μ;
(iii) every bounded, space-time invariant function is a constant;
(iv) ||P_μ ∘ θ_t^{−1} − P_ν ∘ θ_t^{−1}|| → 0 as t → ∞ for any μ and ν.

First proof: By Theorem 10.27 (i) we note that (ii) and (iv) are equivalent to the conditions

(ii′) P_μ = P_μ^B on 𝒯 for any μ and B;
(iv′) P_μ = P_ν on 𝒯 for any μ and ν.
We may then prove that (ii′) ⇔ (i) ⇒ (iv′) and (iv) ⇒ (iii) ⇒ (i).
(i) ⇔ (ii′): If P_μ A = 0 or 1, then clearly also P_μ^B A = 0 or 1, which shows that (i) ⇒ (ii′). Conversely, let A ∈ 𝒯 be arbitrary with P_μ A > 0. Taking B = A in (ii′) gives P_μ A = (P_μ A)^2, which implies P_μ A = 1.

(i) ⇒ (iv′): Applying (i) to the distribution ½(μ + ν) gives P_μ A + P_ν A = 0 or 2 for every A ∈ 𝒯, which implies P_μ A = P_ν A = 0 or 1.

(iv) ⇒ (iii): Let f be bounded and space-time invariant. Using (iv) with μ = δ_x and ν = δ_y gives

  |f(x, s) − f(y, s)| = |E_x f(X_t, s + t) − E_y f(X_t, s + t)|
   ≤ ||f|| ||P_x ∘ θ_t^{−1} − P_y ∘ θ_t^{−1}|| → 0,

which shows that f(x, s) = f(s) is independent of x. But then f(s) = f(s + t) by (5), and so f is a constant.

(iii) ⇒ (i): Fix any A ∈ 𝒯. Since A ∈ 𝒯_t = σ(θ_t) for every t ≥ 0, we have A = θ_t^{−1} A_t for some sets A_t ∈ ℱ_∞, and we note that A_t is unique since θ_t is surjective. For any s, t ≥ 0,

  θ_t^{−1} θ_s^{−1} A_{s+t} = θ_{s+t}^{−1} A_{s+t} = A = θ_t^{−1} A_t,

and so θ_s^{−1} A_{s+t} = A_t. Putting f(x, t) = P_x A_t and using the Markov property at time s, we get

  E_x f(X_s, s + t) = E_x P_{X_s} A_{s+t} = E_x P_x[θ_s^{−1} A_{s+t} | ℱ_s] = P_x A_t = f(x, t).

Thus, f is space-time invariant and therefore equal to a constant c ∈ [0, 1]. By the Markov property at t and martingale convergence as t → ∞, we have a.s.

  c = f(X_t, t) = P_{X_t} A_t = P_μ[θ_t^{−1} A_t | ℱ_t] = P_μ[A | ℱ_t] → 1_A,

which implies P_μ A = c ∈ {0, 1}. This shows that 𝒯 is P_μ-trivial. □

Second proof: We can avoid using the rather deep Theorem 10.27 by giving direct proofs of the implications (i) ⇒ (ii) ⇒ (iv).

(i) ⇒ (ii): Assuming (i), we get by reverse martingale convergence

  ||P_μ(· ∩ B) − P_μ(·) P_μ(B)||_{𝒯_t} = ||E_μ[P_μ[B | 𝒯_t] − P_μ B; ·]||_{𝒯_t}
   ≤ E_μ |P_μ[B | 𝒯_t] − P_μ B| → 0.

(ii) ⇒ (iv): Let μ′ − ν′ be the Hahn decomposition of μ − ν, and choose B ∈ 𝒮 with μ′B^c = ν′B = 0. Writing χ = μ′ + ν′ and A = {X_0 ∈ B}, we get by (ii)

  ||P_μ ∘ θ_t^{−1} − P_ν ∘ θ_t^{−1}|| = ||μ′|| ||P_χ^A ∘ θ_t^{−1} − P_χ^{A^c} ∘ θ_t^{−1}|| → 0. □
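Condition (iv) of Theorem 20.10 is easy to observe numerically in the simplest setting of a finite-state chain (a numerical aside, not part of the original text; the 3×3 matrix below is an arbitrary irreducible, aperiodic example). Iterating the kernel shows the total variation distance between the time-n laws under two extreme initial distributions decaying geometrically:

```python
def tv(p, q):
    # total variation distance between two distributions on a finite set
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def push(p, P):
    # one step of the chain: p -> pP
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

# an irreducible, aperiodic chain on three states (arbitrary example)
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]

mu, nu = [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]   # two extreme initial laws
dists = []
for _ in range(40):
    mu, nu = push(mu, P), push(nu, P)
    dists.append(tv(mu, nu))
# dists decreases geometrically to 0
```

For this particular matrix the difference of the two laws happens to be a left eigenvector with eigenvalue 1/2, so the distance after n steps is exactly 2^{−n}.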
The invariant σ-field ℐ on Ω consists of all events A ⊂ Ω such that θ_t^{−1} A = A for all t > 0. Note that a random variable ξ on Ω is ℐ-measurable iff ξ ∘ θ_t = ξ for all t > 0. The invariant σ-field ℐ is clearly contained in the tail σ-field 𝒯. We say that P_μ is weakly mixing if

  lim_{t→∞} || t^{−1} ∫_0^t (P_μ − P_μ^B) ∘ θ_s^{−1} ds || = 0, B ∈ ℱ_∞ with P_μ B > 0,
where it is understood that θ_s = θ_{[s]} when the time scale is discrete. We may now state the weak counterpart of Theorem 20.10.

Theorem 20.11 (weak ergodicity) For any measurable, conservative, discrete- or continuous-time Markov semigroup with distributions P_μ, these conditions are equivalent:

(i) the invariant σ-field ℐ is P_μ-trivial for every μ;
(ii) P_μ is weakly mixing for every μ;
(iii) every bounded, invariant function is a constant;
(iv) ||t^{−1} ∫_0^t (P_μ − P_ν) ∘ θ_s^{−1} ds|| → 0 as t → ∞ for any μ and ν.

Proof: By Theorem 10.27 (ii) we note that (ii) and (iv) are equivalent to the conditions

(ii′) P_μ = P_μ^B on ℐ for any μ and B;
(iv′) P_μ = P_ν on ℐ for any μ and ν.

Here the implications (ii′) ⇔ (i) ⇒ (iv′) may be established as before, and so it is enough to show that (iv) ⇒ (iii) ⇒ (i).

(iv) ⇒ (iii): Let f be bounded and invariant. Then f(x) = E_x f(X_t) = T_t f(x), and therefore f(x) = t^{−1} ∫_0^t T_s f(x) ds. Using (iv) gives

  |f(x) − f(y)| = |t^{−1} ∫_0^t (T_s f(x) − T_s f(y)) ds|
   ≤ ||f|| ||t^{−1} ∫_0^t (P_x − P_y) ∘ θ_s^{−1} ds|| → 0,

which shows that f is a constant.

(iii) ⇒ (i): Fix any A ∈ ℐ, and define f(x) = P_x A. Using the Markov property at t and the invariance of A, we get

  E_x f(X_t) = E_x P_{X_t} A = E_x P_x[θ_t^{−1} A | ℱ_t] = P_x θ_t^{−1} A = P_x A = f(x),

which shows that f is invariant. By (iii) it follows that f equals a constant c ∈ [0, 1]. Hence, by the Markov property and martingale convergence, we have a.s.

  c = f(X_t) = P_{X_t} A = P_μ[θ_t^{−1} A | ℱ_t] = P_μ[A | ℱ_t] → 1_A,

which implies P_μ A = c ∈ {0, 1}. Thus, ℐ is P_μ-trivial. □

Let us now specialize to the case of conservative Feller processes X with distributions P_x, defined on an lcscH (locally compact, second countable Hausdorff) space S with Borel σ-field 𝒮. We say that the process is regular if there exist a locally finite measure ρ on S and a continuous function (x, y, t) ↦ p_t(x, y) > 0 on S^2 × (0, ∞) such that

  P_x{X_t ∈ B} = ∫_B p_t(x, y) ρ(dy), x ∈ S, B ∈ 𝒮, t > 0.
Note that the supporting measure ρ is then unique up to an equivalence, and that supp(ρ) = S by the Feller property. A Feller process is said to be Harris recurrent if it is regular with a supporting measure ρ satisfying

  ∫_0^∞ 1_B(X_t) dt = ∞ a.s. P_x, x ∈ S, B ∈ 𝒮 with ρB > 0.  (6)

Theorem 20.12 (Harris recurrence and ergodicity, Orey) Any Harris recurrent Feller process is strongly ergodic.

Proof: By Theorem 20.10 it suffices to prove that any bounded, space-time invariant function f: S × ℝ_+ → ℝ is a constant. First we show, for fixed x ∈ S, that f(x, t) is independent of t. Assume instead that f(x, h) ≠ f(x, 0) for some h > 0, say f(x, h) > f(x, 0). Recall from Lemma 20.9 that M_t^s = f(X_t, s + t) is a P_y-martingale for any y ∈ S and s ≥ 0. In particular, the limits M_∞^s exist a.s. along hℚ, and we get a.s. P_x

  E_x[M_∞^h − M_∞^0 | ℱ_0] = M_0^h − M_0^0 = f(x, h) − f(x, 0) > 0,

which implies P_x{M_∞^h > M_∞^0} > 0. We may then choose some constants a < b such that

  P_x{M_∞^0 < a < b < M_∞^h} > 0.  (7)

We also note that

  M_t^{s+h} ∘ θ_s = f(X_{s+t}, s + t + h) = M_{s+t}^h, s, t, h ≥ 0.  (8)

With s and t restricted to hℚ_+, we define

  g(y, s) = P_y ⋂_{t≥0} {M_t^s < a < b < M_t^{s+h}}, y ∈ S, s ≥ 0.

Using the Markov property at s and (8), we get a.s. P_x for any r ≤ s

  g(X_s, s) = P_{X_s} ⋂_{t≥0} {M_t^s < a < b < M_t^{s+h}}
   = P_x^{ℱ_s} ⋂_{t≥0} {M_t^s ∘ θ_s < a < b < M_t^{s+h} ∘ θ_s}
   = P_x^{ℱ_s} ⋂_{t≥s} {M_t^0 < a < b < M_t^h}
   ≥ P_x^{ℱ_s} ⋂_{t≥r} {M_t^0 < a < b < M_t^h}.

By martingale convergence, we get a.s. as s → ∞ along hℚ and then r → ∞

  liminf_{s→∞} g(X_s, s) ≥ liminf_{t→∞} 1{M_t^0 < a < b < M_t^h}
   ≥ 1{M_∞^0 < a < b < M_∞^h},

and so by (7)

  P_x{g(X_s, s) → 1} ≥ P_x{M_∞^0 < a < b < M_∞^h} > 0.  (9)

Now fix any nonempty, bounded, open set B ⊂ S. Using (6) and the right-continuity of X, we note that limsup_s 1_B(X_s) = 1 a.s. P_x, and so in view of (9)

  P_x{limsup_s 1_B(X_s) g(X_s, s) = 1} > 0.  (10)
Furthermore, we have by regularity

  p_h(u, v) ∧ p_{2h}(u, v) ≥ ε > 0, u, v ∈ B,  (11)

for some ε > 0. By (10) we may choose some y ∈ B and s > 0 such that g(y, s) > 1 − ½ε ρB. Define for i = 1, 2

  B_i = B \ {u ∈ S; f(u, s + ih) < a < b < f(u, s + (i + 1)h)}.

Using (11), the definitions of B_i, M^s, and g, and the properties of y and s, we get

  ε ρB_i ≤ P_y{X_{ih} ∈ B_i}
   ≤ 1 − P_y{f(X_{ih}, s + ih) < a < b < f(X_{ih}, s + (i + 1)h)}
   = 1 − P_y{M_{ih}^s < a < b < M_{ih}^{s+h}}
   ≤ 1 − g(y, s) < ½ε ρB.

Thus, ρB_1 + ρB_2 < ρB, and there exists some u ∈ B \ (B_1 ∪ B_2). But this yields the contradiction a < b < f(u, s + 2h) < a, which shows that f(x, t) = f(x) is indeed independent of t.

To see that f(x) is also independent of x, we assume instead that

  ρ{x; f(x) < a} ∧ ρ{x; f(x) > b} > 0

for some a < b. Then by (6) the martingale M_t = f(X_t) satisfies

  ∫_0^∞ 1{M_t < a} dt = ∫_0^∞ 1{M_t > b} dt = ∞ a.s.  (12)

Writing M̃ for the right-continuous version of M, which exists by Theorem 7.27 (ii), we get by Fubini's theorem for any x ∈ S

  sup_{u≥0} E_x |∫_0^u 1{M_t < a} dt − ∫_0^u 1{M̃_t < a} dt|
   ≤ ∫_0^∞ E_x |1{M_t < a} − 1{M̃_t < a}| dt
   ≤ ∫_0^∞ P_x{M_t ≠ M̃_t} dt = 0,

and similarly for the events M_t > b and M̃_t > b. Thus, the integrals on the left agree a.s. P_x for all x, and so (12) remains true with M replaced by M̃. In particular,

  liminf_{t→∞} M̃_t ≤ a < b ≤ limsup_{t→∞} M̃_t a.s. P_x.

But this is impossible, since M̃ is a bounded, right-continuous martingale and therefore converges a.s. The contradiction shows that f(x) = c a.e. ρ for some constant c ∈ ℝ. Then for any t > 0,

  f(x) = E_x f(X_t) = ∫ f(y) p_t(x, y) ρ(dy) = c, x ∈ S,
and so f(x) is indeed independent of x. □

Our further analysis of regular Feller processes requires some potential theory. For any measurable functions f, h ≥ 0 on S, we define the h-potential U_h f of f by

  U_h f(x) = E_x ∫_0^∞ e^{−A_t^h} f(X_t) dt, x ∈ S,

where A^h denotes the elementary additive functional

  A_t^h = ∫_0^t h(X_s) ds, t ≥ 0.

When h is a constant a > 0, we note that U_h = U_a agrees with the resolvent operator R_a of the semigroup (T_t), in which case

  U_a f(x) = E_x ∫_0^∞ e^{−at} f(X_t) dt = ∫_0^∞ e^{−at} T_t f(x) dt, x ∈ S.

The classical resolvent equation extends to general h-potentials as follows.

Lemma 20.13 (resolvent equation) Let f ≥ 0 and h ≥ k ≥ 0 be measurable functions on S, and assume that h is bounded. Then U_h h ≤ 1, and

  U_k f = U_h f + U_h(h − k) U_k f = U_h f + U_k(h − k) U_h f.

Proof: For convenience, we write F = f(X), H = h(X), and K = k(X). By Itô's formula for continuous functions of bounded variation,

  e^{−A_t^h} = 1 − ∫_0^t e^{−A_s^h} H_s ds, t ≥ 0,  (13)

which implies U_h h ≤ 1. We may also conclude from the Markov property of X that a.s.

  U_k f(X_t) = E_{X_t} ∫_0^∞ e^{−A_s^k} F_s ds
   = E_x^{ℱ_t} ∫_0^∞ e^{−A_s^k ∘ θ_t} F_{s+t} ds
   = E_x^{ℱ_t} ∫_t^∞ e^{−A_u^k + A_t^k} F_u du.  (14)
Using (13) and (14) together with Fubini's theorem, we get

  U_h(h − k) U_k f(x)
   = E_x ∫_0^∞ e^{−A_t^h} (H_t − K_t) ∫_t^∞ e^{−A_u^k + A_t^k} F_u du dt
   = E_x ∫_0^∞ e^{−A_u^k} F_u ∫_0^u e^{−A_t^h + A_t^k} (H_t − K_t) dt du
   = E_x ∫_0^∞ e^{−A_u^k} F_u (1 − e^{−A_u^h + A_u^k}) du
   = E_x ∫_0^∞ (e^{−A_u^k} − e^{−A_u^h}) F_u du
   = U_k f(x) − U_h f(x).

A similar calculation gives the same expression for U_k(h − k) U_h f(x). □

For a simple application of the resolvent equation, we show that any bounded potential function U_h f is continuous.

Lemma 20.14 (boundedness and continuity) For any regular Feller process on S, let f, h ≥ 0 be bounded, measurable functions on S such that U_h f is bounded. Then U_h f is continuous.

Proof: Using Fatou's lemma and the continuity of p_t(·, y), we get for any time t > 0 and sequence x_n → x in S

  liminf_n T_t f(x_n) = liminf_n ∫ p_t(x_n, y) f(y) ρ(dy)
   ≥ ∫ p_t(x, y) f(y) ρ(dy) = T_t f(x).

If f ≤ c, the same relation applies to the function T_t(c − f) = c − T_t f, and by combination it follows that T_t f is continuous. By dominated convergence, U_a f is then continuous for every a > 0.

Now assume that h ≤ a. Applying the previous result to the bounded, measurable function (a − h) U_h f ≥ 0, we conclude that even U_a(a − h) U_h f is continuous. The continuity of U_h f now follows from Lemma 20.13 with h and k replaced by a and h. □

We proceed with some useful estimates.

Lemma 20.15 (lower bounds) For any regular Feller process on S, there exist some continuous functions h, k: S → (0, 1] such that, for every measurable function f ≥ 0 on S,

(i) U_2 f(x) ≥ ρ(kf) h(x);
(ii) U_h f(x) ≥ ρ(kf) U_h h(x).

Proof: Fix any compact sets K ⊂ S and T ⊂ (0, ∞) with ρK > 0 and λT > 0, where λ denotes Lebesgue measure. Define

  u_a^T(x, y) = ∫_T e^{−at} p_t(x, y) dt, x, y ∈ S,
and note that for any measurable function f ≥ 0 on S,

  U_a f(x) ≥ ∫ u_a^T(x, y) f(y) ρ(dy), x ∈ S.

Using Lemma 20.13 for the constants 4 and 2 gives

  U_2 f(x) = U_4 f(x) + 2 U_4 U_2 f(x)
   ≥ 2 ∫_K u_4^T(x, y) ρ(dy) ∫ u_2^T(y, z) f(z) ρ(dz),

and (i) follows with

  h(x) = 2 ∫_K u_4^T(x, y) ρ(dy) ∧ 1,  k(x) = inf_{y∈K} u_2^T(y, x) ∧ 1.

To deduce (ii), we may combine (i) with Lemma 20.13 for the functions 2 and h to obtain

  U_h f(x) ≥ U_h(2 − h) U_2 f(x) ≥ U_h U_2 f(x) ≥ U_h h(x) ρ(kf).

The continuity of h is clear by dominated convergence. For the same reason, the function u_2^T is jointly continuous on S^2. Since K is compact, the functions u_2^T(y, ·), y ∈ K, are then equicontinuous on S, which yields the required continuity of k. Finally, the relation h > 0 is obvious, whereas k > 0 holds by the compactness of K. □

Fixing a function h as in Lemma 20.15, we introduce the kernel

  Q_x B = U_h(h 1_B)(x), x ∈ S, B ∈ 𝒮,  (15)

and note that Q_x S = U_h h(x) ≤ 1 by Lemma 20.13.

Lemma 20.16 (convergence dichotomy) Let Q be given by (15) in terms of some function h as in Lemma 20.15.

(i) If U_h h ≢ 1, then ||Q^n S|| ≤ r^{n−1}, n ∈ ℕ, for some r ∈ (0, 1).
(ii) If U_h h ≡ 1, then ||Q_x^n − ν|| → 0, x ∈ S, for some Q-invariant probability measure ν ∼ ρ, and every σ-finite, Q-invariant measure on S is proportional to ν.

Proof: (i) Choose k as in Lemma 20.15, fix any a ∈ S with U_h h(a) < 1, and define

  r = 1 − h(a) ρ(hk(1 − U_h h)).

Note that ρ(hk(1 − U_h h)) > 0, since U_h h is continuous by Lemma 20.14. Using Lemma 20.15 (i), we obtain

  0 < 1 − r ≤ h(a) ρk ≤ U_2 1(a) ≤ ½.

Next we see from Lemma 20.15 (ii) that

  (1 − r) U_h h = h(a) ρ(hk(1 − U_h h)) U_h h ≤ U_h(h(1 − U_h h)).
Hence, Q^2 S = U_h(h U_h h) ≤ r U_h h = r QS, and so by iteration

  Q^n S ≤ r^{n−1} QS ≤ r^{n−1}, n ∈ ℕ.

(ii) Introduce a measure ρ̂ = hk · ρ on S. Since U_h h ≡ 1, we get by Lemma 20.15 (ii)

  ρ̂B = ρ(hk 1_B) ≤ U_h(h 1_B)(x) = Q_x B, B ∈ 𝒮.  (16)

Regarding ρ̂ as a kernel, we have for any x, y ∈ S and m, n ∈ ℤ_+

  (Q_x^m − Q_y^n) ρ̂ = (Q_x^m S − Q_y^n S) ρ̂ = 0.  (17)

Iterating (17) and using (16), we get as n → ∞

  ||Q_x^n − Q_x^{n+k}|| = ||(δ_x − Q_x^k) Q^n|| = ||(δ_x − Q_x^k)(Q − ρ̂)^n||
   ≤ ||δ_x − Q_x^k|| sup_z ||Q_z − ρ̂||^n ≤ 2(1 − ρ̂S)^n → 0.

Hence, sup_x ||Q_x^n − ν|| → 0 for some set function ν on 𝒮, and it is easy to see that ν is a Q-invariant probability measure. By Fubini's theorem we note that Q_x ≪ ρ for all x, and so ν = νQ ≪ ρ. Conversely, Lemma 20.15 (ii) yields

  νB = νQ(B) = ν(U_h(h 1_B)) ≥ ρ(hk 1_B),

which shows that even ρ ≪ ν.

Now consider any σ-finite, Q-invariant measure μ on S. By Fatou's lemma, we get for any B ∈ 𝒮

  μB = liminf_n μQ^n B ≥ (μν)B = μS · νB.

Choosing B such that μB < ∞ and νB > 0, we obtain μS < ∞. We may then conclude by dominated convergence that μ = μQ^n → μS · ν, which proves the asserted uniqueness of ν. □

We are now ready to prove the basic recurrence dichotomy for regular Feller processes. Write U = U_0, and say that X is uniformly transient if

  ||U 1_K|| = sup_x E_x ∫_0^∞ 1_K(X_t) dt < ∞, K ⊂ S compact.

Theorem 20.17 (recurrence dichotomy) A regular Feller process is either Harris recurrent or uniformly transient.

Proof: Choose h and k as in Lemma 20.15, and Q, r, and ν as in Lemma 20.16. First assume that U_h h ≢ 1. Letting a ∈ (0, ||h||], we note that ah ≤ (h ∧ a)||h||, and hence

  a U_{h∧a} h ≤ U_{h∧a}(h ∧ a) ||h|| ≤ ||h||.  (18)
Furthermore, Lemma 20.13 yields

  U_{h∧a} h ≤ U_h h + U_h(h U_{h∧a} h) = Q(1 + U_{h∧a} h).

Iterating this relation and using Lemma 20.16 (i) and (18), we get

  U_{h∧a} h ≤ Σ_{l≤n} Q^l 1 + Q^n U_{h∧a} h
   ≤ Σ_l r^{l−1} + r^{n−1} ||U_{h∧a} h||
   ≤ (1 − r)^{−1} + r^{n−1} ||h||/a.

Letting n → ∞ and then a → 0, we conclude by dominated and monotone convergence that Uh ≤ (1 − r)^{−1}. Now fix any compact set K ⊂ S. Since b = inf_K h > 0, we get

  U 1_K(x) ≤ b^{−1} U h(x) ≤ b^{−1}(1 − r)^{−1} < ∞, x ∈ S,

which shows that X is uniformly transient.

Now assume instead that U_h h ≡ 1. Fix any measurable function f on S with 0 ≤ f ≤ h and ρf > 0, and put g = 1 − U_f f. By Lemma 20.13 we get

  g = 1 − U_f f = U_h h − U_h f − U_h(h − f) U_f f
   = U_h((h − f)(1 − U_f f)) = U_h((h − f) g) ≤ U_h(hg) = Qg.  (19)

Iterating this relation and using Lemma 20.16 (ii), we obtain g ≤ Q^n g → νg, where ν ∼ ρ is the unique Q-invariant distribution on S. Inserting this into (19) gives g ≤ U_h((h − f) νg), and so by Lemma 20.15 (ii)

  νg ≤ ν(U_h(h − f)) νg ≤ (1 − ρ(kf)) νg.

Since ρ(kf) > 0, we obtain νg = 0, and so U_f f = 1 − g = 1 a.e. ν ∼ ρ. Recalling that U_f f is continuous by Lemma 20.14 and supp ρ = S, we obtain U_f f ≡ 1. Taking expected values in (13), we conclude that A_∞^f = ∞ a.s. P_x for every x ∈ S. Now fix any compact set K ⊂ S with ρK > 0. Since b = inf_K h > 0, we may choose f = b 1_K, and the desired Harris recurrence follows. □

A measure λ on S is said to be invariant for the semigroup (T_t) if λ(T_t f) = λf for all t > 0 and every measurable function f ≥ 0 on S. In the Harris recurrent case, the existence of an invariant measure λ can be inferred from Lemma 20.16.

Theorem 20.18 (invariant measure, Harris, Watanabe) Any Harris recurrent Feller process on S with supporting measure ρ has a locally finite, invariant measure λ ∼ ρ, and every σ-finite, invariant measure agrees with λ up to a normalization.

To prepare for the proof, we first express the required invariance in terms of the resolvent operators.
Lemma 20.19 (invariance equivalence) Let (T_t) be a Feller semigroup on S with resolvent (U_a), and fix any locally finite measure λ on S and constant c > 0. Then λ is (T_t)-invariant iff it is aU_a-invariant for every a > c.

Proof: If λ is (T_t)-invariant, then Fubini's theorem yields for any measurable function f ≥ 0 and constant a > 0

  λ(U_a f) = ∫_0^∞ e^{−at} λ(T_t f) dt = ∫_0^∞ e^{−at} λf dt = λf/a,  (20)

which shows that λ is aU_a-invariant.

Conversely, assume that λ is aU_a-invariant for every a > c. Then for any measurable function f ≥ 0 on S with λf < ∞, the integrals in (20) agree for all a > c. Hence, by Theorem 5.3 the measures λ(T_t f) e^{−ct} dt and λf e^{−ct} dt agree on ℝ_+, which implies λ(T_t f) = λf for almost every t ≥ 0. By the semigroup property and Fubini's theorem we then obtain for any t ≥ 0

  λ(T_t f) = c λU_c(T_t f) = c λ ∫_0^∞ e^{−cs} T_s T_t f ds
   = c ∫_0^∞ e^{−cs} λ(T_{s+t} f) ds = c ∫_0^∞ e^{−cs} λf ds = λf,

which shows that λ is (T_t)-invariant. □

Proof of Theorem 20.18: Let h, Q, and ν be such as in Lemmas 20.15 and 20.16, and put λ = h^{−1} · ν. Using the definition of λ (twice), the Q-invariance of ν (three times), and Lemma 20.13, we get for any constant a > ||h|| and bounded, measurable function f ≥ 0 on S

  a λU_a f = a ν(h^{−1} U_a f) = a ν U_h U_a f
   = ν(U_h f − U_a f + U_h(h U_a f)) = ν U_h f = ν(h^{−1} f) = λf,

which shows that λ is aU_a-invariant for every such a. By Lemma 20.19 it follows that λ is also (T_t)-invariant.

To prove the asserted uniqueness, consider any σ-finite, (T_t)-invariant measure λ′ on S. By Lemma 20.19, λ′ is even aU_a-invariant for every a > ||h||. Now define ν′ = h · λ′. Letting f ≥ 0 be bounded and measurable on S and using Lemma 20.13, we get as before

  ν′(U_h(hf)) = λ′(h U_h(hf)) = a λ′U_a(h U_h(hf))
   = a λ′(a U_a U_h(hf) − U_h(hf) + U_a(hf))
   = a λ′U_a(hf) = λ′(hf) = ν′f,
which shows that ν′ is Q-invariant. Hence, the uniqueness part of Lemma 20.16 (ii) yields ν′ = cν for some constant c ≥ 0, which implies λ′ = cλ. □

A Harris recurrent Feller process is said to be positive recurrent if the invariant measure λ is bounded and null-recurrent otherwise. In the former case, we may assume that λ is a probability measure on S. For any process X in S, the divergence X_t → ∞ a.s. or X_t →^P ∞ means that 1_K(X_t) → 0 in the same sense for every compact set K ⊂ S.

Theorem 20.20 (distributional limits) For any regular Feller process X and distribution μ on S, the following holds as t → ∞:

(i) If X is positive recurrent with invariant distribution λ and A ∈ ℱ_∞ with P_μ A > 0, then ||P_μ^A ∘ θ_t^{−1} − P_λ|| → 0.
(ii) If X is null-recurrent or transient, then X_t →^{P_μ} ∞.

Proof: (i) Since P_λ ∘ θ_t^{−1} = P_λ by Lemma 8.11, the assertion follows from Theorem 20.12 together with properties (ii) and (iv) of Theorem 20.10.

(ii) (null-recurrent case): For any compact set K ⊂ S and constant ε > 0, we define

  B_t = {x ∈ S; T_t 1_K(x) > μT_t 1_K − ε}, t > 0,

and note that, for any invariant measure λ,

  (μT_t 1_K − ε) λB_t ≤ λ(T_t 1_K) = λK < ∞.  (21)

Since μT_t 1_K − T_t 1_K(x) → 0 for all x ∈ S by Theorem 20.12, we have liminf_t B_t = S, and so λB_t → ∞ by Fatou's lemma. Hence, (21) yields limsup_t μT_t 1_K ≤ ε, and since ε was arbitrary, we obtain

  P_μ{X_t ∈ K} = μT_t 1_K → 0.

(ii) (transient case): Fix any compact set K ⊂ S with ρK > 0, and conclude from the uniform transience of X that U1_K is bounded. Hence, by the Markov property at t and dominated convergence,

  E_μ U1_K(X_t) = E_μ E_{X_t} ∫_0^∞ 1_K(X_s) ds = E_μ ∫_t^∞ 1_K(X_s) ds → 0,

which shows that U1_K(X_t) →^{P_μ} 0. Since U1_K is strictly positive and also continuous by Lemma 20.14, we conclude that X_t →^{P_μ} ∞. □

We complete our discussion of regular Feller processes with a pathwise limit theorem. Recall that "almost surely" means a.s. P_μ for every initial distribution μ on S.
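Before turning to pathwise limits, the escape of mass in Theorem 20.20 (ii) can be checked by hand in a discrete-time analogue (a numerical aside, not from the original text; the walk and the interval are arbitrary choices). The simple symmetric random walk on ℤ is null recurrent, and its exact time-n law, computed by iterating the transition kernel, puts vanishing mass on any fixed bounded set:

```python
def walk_distribution(n):
    # exact law of the simple symmetric random walk on Z after n steps,
    # as a dict {site: probability}, computed by dynamic programming
    p = {0: 1.0}
    for _ in range(n):
        q = {}
        for x, w in p.items():
            q[x - 1] = q.get(x - 1, 0.0) + 0.5 * w
            q[x + 1] = q.get(x + 1, 0.0) + 0.5 * w
        p = q
    return p

def mass_on_interval(p, m):
    # probability assigned to the compact set [-m, m]
    return sum(w for x, w in p.items() if abs(x) <= m)

p_a = mass_on_interval(walk_distribution(400), 10)
p_b = mass_on_interval(walk_distribution(1600), 10)
# p_b < p_a: the occupation probability of [-10, 10] decays with time
```

The decay is of order n^{−1/2}, matching the local central limit theorem.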
Theorem 20.21 (pathwise limits) For any regular Feller process X on S, the following holds as t → ∞:

(i) If X is positive recurrent with invariant distribution λ, then

  t^{−1} ∫_0^t f(θ_s X) ds → E_λ f(X) a.s., f bounded, measurable.

(ii) If X is null-recurrent, then

  t^{−1} ∫_0^t 1_K(X_s) ds → 0 a.s., K ⊂ S compact.

(iii) If X is transient, then X_t → ∞ a.s.

Proof: (i) From Lemma 8.11 and Theorems 20.10 (i) and 20.12 we note that P_λ is stationary and ergodic, and so the assertion holds a.s. P_λ by Corollary 10.9. Since the stated convergence is a tail event and P_μ = P_λ on 𝒯 for any μ, the general result follows.

(ii) Since P_λ is shift-invariant with P_λ{X_s ∈ K} = λK < ∞, the left-hand side converges a.e. P_λ by Theorem 20.2. From Theorems 20.10 and 20.12 we see that the limit is a.e. a constant c ≥ 0. Using Fatou's lemma and Fubini's theorem gives

  E_λ c ≤ liminf_{t→∞} t^{−1} ∫_0^t P_λ{X_s ∈ K} ds = λK < ∞,

which implies c = 0, since ||P_λ|| = ||λ|| = ∞. The general result follows from the fact that P_μ = P_ν on 𝒯 for any distributions μ and ν.

(iii) Fix any compact set K ⊂ S with ρK > 0, and conclude from the Markov property at t ≥ 0 that a.s. P_μ

  U1_K(X_t) = E_{X_t} ∫_0^∞ 1_K(X_r) dr = E_μ^{ℱ_t} ∫_t^∞ 1_K(X_r) dr.

Using the chain rule for conditional expectations, we get for any s ≤ t

  E_μ[U1_K(X_t) | ℱ_s] = E_μ^{ℱ_s} ∫_t^∞ 1_K(X_r) dr
   ≤ E_μ^{ℱ_s} ∫_s^∞ 1_K(X_r) dr = U1_K(X_s),

which shows that U1_K(X_t) is a supermartingale. Since it is also nonnegative and right-continuous, it converges a.s. P_μ as t → ∞, and the limit equals 0 a.s., since U1_K(X_t) →^P 0 by the preceding proof. Since U1_K is strictly positive and continuous, it follows that X_t → ∞ a.s. P_μ. □
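Part (i) has an obvious discrete-time analogue that is easy to simulate (a numerical aside, not part of the original text; the chain, its parameters, and the run length are arbitrary choices). For an ergodic three-state chain with stationary distribution (1/4, 1/2, 1/4), the empirical occupation frequency of the middle state along a single trajectory approaches its stationary probability 1/2:

```python
import random

def empirical_frequency(n_steps, seed):
    # simulate the chain with transition matrix
    #   P = [[1/2, 1/2, 0], [1/4, 1/2, 1/4], [0, 1/2, 1/2]],
    # whose stationary distribution is (1/4, 1/2, 1/4), and return the
    # fraction of time spent in the middle state 1
    P = [[0.50, 0.50, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.50, 0.50]]
    rng = random.Random(seed)
    x, hits = 0, 0
    for _ in range(n_steps):
        if x == 1:
            hits += 1
        u, c = rng.random(), 0.0
        for y, pr in enumerate(P[x]):
            c += pr
            if u < c:
                x = y
                break
    return hits / n_steps

freq = empirical_frequency(200_000, seed=3)
# freq is close to the stationary probability 1/2
```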
Exercises

1. Given a measure space (S, 𝒮, μ), let T be a positive, linear operator on L^1 ∩ L^∞. Show that if T is both an L^1-contraction and an L^∞-contraction, then it is also an L^p-contraction for every p ∈ [1, ∞]. (Hint: Prove a Hölder-type inequality for T.)

2. Extend Lemma 10.3 to arbitrary transition operators T on a measurable space (S, 𝒮). In other words, letting ℐ denote the class of sets B ∈ 𝒮 with T1_B = 1_B, show that an 𝒮-measurable function f ≥ 0 is T-invariant iff it is ℐ-measurable.

3. Prove a continuous-time version of Theorem 20.2 for measurable semigroups of positive L^1–L^∞-contractions. (Hint: Interpolate in the discrete-time result.)

4. Let (T_t) be a measurable, discrete- or continuous-time semigroup of positive L^1–L^∞-contractions on (S, 𝒮, ν), let μ_1, μ_2, … be asymptotically invariant distributions on ℤ_+ or ℝ_+, and define A_n = ∫ T_t μ_n(dt). Show that A_n f → Af in measure, for any f ∈ L^1(ν). (Hint: Proceed as in Theorem 20.2, using the contractivity together with Minkowski's and Chebyshev's inequalities to estimate the remainder terms.)

5. Prove a continuous-time version of Theorem 20.4. (Hint: Use Lemma 20.5 to interpolate in the discrete-time result.)

6. Derive Theorem 10.6 from Theorem 20.4. (Hint: Take g = 1, and proceed as in Corollary 10.9 to identify the limit.)

7. Show that when f ≥ 0, the limit in Theorem 20.4 is strictly positive on the set {S_∞ f ∧ S_∞ g > 0}.

8. Show that the limit in Theorem 20.4 is invariant, at least when T is induced by a measure-preserving map on S.

9. Derive Lemma 20.3 (i) from Lemma 20.6. (Hint: Note that if g ∈ 𝒯(f) with f ∈ L^1_+, then μg ≤ μf. Conclude that for any h ∈ L^1, μ[h; M_n h > 0] ≥ μ[U^{n−1} h; M_n h > 0] ≥ 0.)

10. Show that Brownian motion X in ℝ^d is regular and strongly ergodic for every d ∈ ℕ, with an invariant measure that is unique up to a constant factor.
Also show that X is Harris recurrent for d = 1, 2, uniformly transient for d > 3. 11. Let X be a Markov process with associated space-time process _ X. Show that X is strongly ergodic in the sense of Theorem 20.10 iff X is weakly ergodic in the sense of Theorem 20.11. (Hint: Note that a function is space-time invariant for X iff it is invariant for X.) 12. For a Harris recurrent process on JR+ or Z+, every tail event is clearly a.s. invariant. Show by an example that the statement may fail in the transient case. 
20. Ergodic Properties oE Markov Processes 411 13. State and prove discrete-time versions of Theorems 20.12,20.17, and 20.18. (Hint: The continuous-time arguments apply with obvious changes.) 14. Derive discrete-time versions of Theorems 20.17 and 20.18 from the corresponding continuous-time results. 15. Show that a regular Markov process may be weakly but not strongly ergodic. (Hint: For any strongly ergodic process, the assoeiated space-time process has the stated property. For a less trivial example, consider a suitable supercritical branching process.) 16. Give examples of nonregular Markov processes ",ith no invariant measure, with exactly one (up to a normalization), and with more than one. 17. Show that a discrete-time Markov process X and the corresponding pseudo-Poisson process Y have the same invariant measures. FUrthermore, regularity of X implies that Y is regular, but not conversely. 
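Exercise 10 has a discrete counterpart that is easy to probe by simulation: the simple symmetric random walk on ℤ^d is recurrent for d ≤ 2 and transient for d ≥ 3. The sketch below is an illustration only, not part of the exercises; the walk, step counts, and seeds are arbitrary choices. It counts visits to the origin in d = 1 and d = 3:

```python
import random

def origin_visits(d, steps=20000, seed=0):
    """Count visits to the origin by a simple symmetric random walk
    in Z^d, moving +-1 along a uniformly chosen coordinate axis."""
    rng = random.Random(seed)
    pos = [0] * d
    visits = 0
    for _ in range(steps):
        i = rng.randrange(d)
        pos[i] += rng.choice((-1, 1))
        if all(c == 0 for c in pos):
            visits += 1
    return visits

# aggregate over a few seeds to smooth out randomness
v1 = sum(origin_visits(1, seed=s) for s in range(5))
v3 = sum(origin_visits(3, seed=s) for s in range(5))
print(v1, v3)  # the one-dimensional walk returns far more often
```

The recurrent walk revisits the origin on the order of √n times in n steps, while the transient walk returns only finitely often, mirroring the dichotomy for Brownian motion in the exercise.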
Chapter 21

Stochastic Differential Equations and Martingale Problems

Linear equations and Ornstein-Uhlenbeck processes; strong existence, uniqueness, and nonexplosion criteria; weak solutions and local martingale problems; well-posedness and measurability; pathwise uniqueness and functional solution; weak existence and continuity; transformation of SDEs; strong Markov and Feller properties

In this chapter we shall study classical stochastic differential equations (SDEs) driven by a Brownian motion and clarify the connection with the associated local martingale problems. Originally, the mentioned equations were devised to provide a pathwise construction of diffusions and more general continuous semimartingales. They have later turned out to be useful in a wide range of applications, where they may provide models for a diversity of dynamical systems with random perturbations. The coefficients determine a possibly time-dependent elliptic operator A as in Theorem 19.24, which suggests the associated martingale problem of finding a process X such that the processes M^f in Lemma 19.21 become martingales. It turns out to be essentially equivalent for X to be a weak solution to the given SDE, as will be seen from the fundamental Theorem 21.7.

The theory of SDEs utilizes the basic notions and ideas of stochastic calculus, as developed in Chapters 17 and 18. Occasional references will be made to other chapters, such as to Chapter 6 for conditional independence, to Chapter 7 for martingale theory, to Chapter 16 for weak convergence, and to Chapter 19 for Feller processes. Some further aspects of the theory are displayed at the beginning of Chapter 23 as well as in Theorems 24.2, 26.8, and 27.14.

The SDEs studied in this chapter are typically of the form

  dX_t^i = σ_j^i(t, X) dB_t^j + b^i(t, X) dt,  (1)

or, more explicitly,

  X_t^i = X_0^i + Σ_j ∫₀ᵗ σ_j^i(s, X) dB_s^j + ∫₀ᵗ b^i(s, X) ds,  t ≥ 0.  (2)

Here B = (B¹, …, B^r) is a Brownian motion in ℝ^r with respect to some filtration F, and the solution X = (X¹, …, X^d) is a continuous
21. Stochastic Differential Equations and Martingale Problems 413

F-semimartingale in ℝ^d. Furthermore, the coefficients σ and b are progressive functions of suitable dimension, defined on the canonical path space C(ℝ₊, ℝ^d) equipped with the induced filtration G_t = σ{w_s; s ≤ t}, t ≥ 0. For convenience, we shall often refer to (1) as equation (σ, b).

For the integrals in (2) to exist in the sense of Itô and Lebesgue integration, X must fulfill the integrability conditions

  ∫₀ᵗ ( |a^{ij}(s, X)| + |b^i(s, X)| ) ds < ∞ a.s.,  t ≥ 0,  (3)

where a^{ij} = σ_k^i σ_k^j or a = σσ′, and the bars denote any norms in the spaces of d×d-matrices and d-vectors, respectively. For the existence and adaptedness of the right-hand side, it is also necessary that the integrands in (2) be progressive. This is ensured by the following result.

Lemma 21.1 (progressive functions). Let the function f on ℝ₊ × C(ℝ₊, ℝ^d) be progressive for the induced filtration G on C(ℝ₊, ℝ^d), and let X be a continuous, F-adapted process in ℝ^d. Then the process Y_t = f(t, X) is F-progressive.

Proof: Fix any t ≥ 0. Since X is adapted, we note that π_s(X) = X_s is F_t-measurable for every s ≤ t, where π_s(w) = w_s on C(ℝ₊, ℝ^d). Since G_t = σ{π_s; s ≤ t}, Lemma 1.4 shows that X is F_t/G_t-measurable. Hence, by Lemma 1.8 the mapping φ(s, ω) = (s, X(ω)) is B_t ⊗ F_t / B_t ⊗ G_t-measurable from [0, t] × Ω to [0, t] × C(ℝ₊, ℝ^d), where B_t = B[0, t]. Also note that f is B_t ⊗ G_t-measurable on [0, t] × C(ℝ₊, ℝ^d) since f is progressive. By Lemma 1.7 we conclude that Y = f∘φ is B_t ⊗ F_t / B-measurable on [0, t] × Ω. □

Equation (2) exhibits the solution process X as an ℝ^d-valued semimartingale with drift components b^i(X)·λ and covariation processes [X^i, X^j] = a^{ij}(X)·λ, where a^{ij}(w) = a^{ij}(·, w) and b^i(w) = b^i(·, w). It is natural to regard the densities a(t, X) and b(t, X) as local characteristics of X at time t.
Of special interest is the diffusion case, where σ and b have the form

  σ(t, w) = σ(w_t),  b(t, w) = b(w_t),  t ≥ 0, w ∈ C(ℝ₊, ℝ^d),  (4)

for some measurable functions on ℝ^d. In that case, the local characteristics at time t depend only on the current position X_t of the process, and the progressivity holds automatically.

We shall distinguish between strong and weak solutions to an SDE (σ, b). For the former, the filtered probability space (Ω, F, P) is regarded as given, along with an F-Brownian motion B and an F₀-measurable random vector ξ. A strong solution is then defined as an adapted process X with X₀ = ξ a.s. satisfying (1). In the case of a weak solution, only the initial distribution μ is given, and the solution consists of the triple (Ω, F, P) together with an F-Brownian motion B and an adapted process X with P∘X₀⁻¹ = μ satisfying (1).
This leads to different notions of existence and uniqueness for a given equation (σ, b). Thus, weak existence is said to hold for the initial distribution μ if there is a corresponding weak solution (Ω, F, P, B, X). By contrast, strong existence for the given μ means that there is a strong solution X for every basic triple (F, B, ξ) such that ξ has distribution μ. We further say that uniqueness in law holds for the initial distribution μ if the corresponding weak solutions X have the same distribution. Finally, we say that pathwise uniqueness holds for the initial distribution μ if, for any two solutions X and Y on a common filtered probability space with a given Brownian motion B such that X₀ = Y₀ a.s. with distribution μ, we have X = Y a.s.

One of the simplest SDEs is the Langevin equation

  dX_t = dB_t − X_t dt,  (5)

which is of great importance for both theory and applications. Integrating by parts, we get from (5) the equation d(eᵗX_t) = eᵗdX_t + eᵗX_t dt = eᵗdB_t, which admits the explicit solution

  X_t = e⁻ᵗ X₀ + ∫₀ᵗ e^{−(t−s)} dB_s,  t ≥ 0,  (6)

recognized as an Ornstein-Uhlenbeck process. Conversely, the process in (6) is easily seen to satisfy (5). We further note that θ_t X →ᵈ Y as t → ∞, where Y denotes the stationary version of the process considered in Chapter 13. We can also get the stationary version directly from (6), by choosing X₀ to be N(0, ½) and independent of B.

We turn to a more general class of equations that can be solved explicitly. A further extension appears in Theorem 26.8.

Proposition 21.2 (linear equations). Let U and V be continuous semimartingales, and put Z = exp(V − V₀ − ½[V]). Then the equation dX = dU + X dV has the unique solution

  X = Z { X₀ + Z⁻¹ · (U − [U, V]) }.  (7)

Proof: Define Y = X/Z. Integrating by parts and noting that dZ = Z dV, we get

  dU = dX − X dV = Y dZ + Z dY + d[Y, Z] − X dV = Z dY + d[Y, Z].  (8)

In particular,

  [U, V] = Z · [Y, V] = [Y, Z].  (9)

Substituting (9) into (8) yields Z dY = dU − d[U, V], which implies dY = Z⁻¹ d(U − [U, V]). To get (7), it remains to integrate from 0 to t and note that Y₀ = X₀. Since all steps are reversible, the same argument shows that (7) is indeed a solution. □
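The explicit formula (6) can be checked against a direct discretization of the Langevin equation (5). The sketch below is an illustration only, not part of the text; the Euler scheme, step size, and horizon are assumptions. It drives both the scheme and the discretized formula (6) with the same Brownian increments and compares the endpoints:

```python
import math
import random

def compare_langevin(T=1.0, h=1e-3, x0=1.0, seed=3):
    """Integrate dX = dB - X dt by the Euler scheme and compare with
    the explicit Ornstein-Uhlenbeck formula
        X_t = e^{-t} X_0 + int_0^t e^{-(t-s)} dB_s,
    both evaluated along the same Brownian increments."""
    rng = random.Random(seed)
    n = int(T / h)
    dB = [math.sqrt(h) * rng.gauss(0.0, 1.0) for _ in range(n)]
    # Euler scheme for (5)
    x = x0
    for db in dB:
        x += db - x * h
    # discretized explicit solution (6)
    explicit = math.exp(-T) * x0 + sum(
        math.exp(-(T - k * h)) * db for k, db in enumerate(dB))
    return x, explicit

euler, explicit = compare_langevin()
print(abs(euler - explicit))
```

The two values differ only by the O(h) discretization error, consistent with (6) being the pathwise solution of (5).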
Though most SDEs have no explicit solution, we may still derive general conditions for strong existence, pathwise uniqueness, and continuous dependence on the initial conditions, by imitating the classical Picard iteration for ordinary differential equations. Recall that the relation ≲ denotes inequality up to a constant factor.

Theorem 21.3 (strong solutions and stochastic flows, Itô). Let σ and b be bounded, progressive functions satisfying a Lipschitz condition

  (σ(w) − σ(w′))_t^* + (b(w) − b(w′))_t^* ≲ (w − w′)_t^*,  t ≥ 0,  (10)

and fix a Brownian motion B in ℝ^r with associated complete filtration F. Then there exists a jointly continuous process X = (X_t^x) on ℝ₊ × ℝ^d such that, for any F₀-measurable random vector ξ in ℝ^d, equation (σ, b) has the a.s. unique solution X^ξ starting at ξ.

For one-dimensional diffusion equations, a stronger result is established in Theorem 23.3. The solution process X = (X_t^x) on ℝ₊ × ℝ^d is called the stochastic flow generated by B. Our proof is based on two lemmas, and we begin with an elementary estimate.

Lemma 21.4 (Gronwall). Let f be a continuous function on ℝ₊ such that

  f(t) ≤ a + b ∫₀ᵗ f(s) ds,  t ≥ 0,  (11)

for some a, b ≥ 0. Then f(t) ≤ a e^{bt} for all t ≥ 0.

Proof: We may write (11) as

  (d/dt){ e^{−bt} ∫₀ᵗ f(s) ds } ≤ a e^{−bt},  t ≥ 0.

It remains to integrate over [0, t] and combine with (11). □

To state the next result, let S(X) denote the process defined by the right-hand side of (2).

Lemma 21.5 (local contraction). Let σ and b be bounded, progressive functions satisfying (10), and fix any p ≥ 2. Then there exists a nondecreasing function c ≥ 0 on ℝ₊ such that, for any continuous adapted processes X and Y in ℝ^d,

  E(S(X) − S(Y))_t^{*p} ≤ 2E|X₀ − Y₀|^p + c_t ∫₀ᵗ E(X − Y)_s^{*p} ds,  t ≥ 0.
Proof: By Theorem 17.7, condition (10), and Jensen's inequality,

  E(S(X) − S(Y))_t^{*p} − 2E|X₀ − Y₀|^p
    ≲ E((σ(X) − σ(Y)) · B)_t^{*p} + E((b(X) − b(Y)) · λ)_t^{*p}
    ≲ E(|σ(X) − σ(Y)|² · λ)_t^{p/2} + E(|b(X) − b(Y)| · λ)_t^p
    ≲ E( ∫₀ᵗ (X − Y)_s^{*2} ds )^{p/2} + E( ∫₀ᵗ (X − Y)_s^* ds )^p
    ≤ (t^{p/2−1} + t^{p−1}) ∫₀ᵗ E(X − Y)_s^{*p} ds.  □

Proof of Theorem 21.3: To prove the existence, fix any F₀-measurable random vector ξ in ℝ^d, put X⁰ ≡ ξ, and define recursively Xⁿ = S(Xⁿ⁻¹) for n ≥ 1. Since σ and b are bounded, we have E(X¹ − X⁰)_t^{*2} < ∞, and by Lemma 21.5

  E(Xⁿ⁺¹ − Xⁿ)_t^{*2} ≤ c_t ∫₀ᵗ E(Xⁿ − Xⁿ⁻¹)_s^{*2} ds,  t ≥ 0, n ≥ 1.

Hence, by induction,

  E(Xⁿ⁺¹ − Xⁿ)_t^{*2} ≤ (c_tⁿ tⁿ / n!) E(X¹ − ξ)_t^{*2} < ∞,  t, n ≥ 0.

For any k ∈ ℕ, we get

  ‖ sup_{n≥k} (Xⁿ − Xᵏ)_t^* ‖₂ ≤ Σ_{n≥k} ‖(Xⁿ⁺¹ − Xⁿ)_t^*‖₂ ≤ ‖(X¹ − ξ)_t^*‖₂ Σ_{n≥k} (c_tⁿ tⁿ / n!)^{1/2} < ∞.

Thus, by Lemma 4.6 there exists a continuous adapted process X with X₀ = ξ such that (Xⁿ − X)_t^* → 0 a.s. and in L² for each t ≥ 0. To see that X solves equation (σ, b), we may use Lemma 21.5 to obtain

  E(Xⁿ − S(X))_t^{*2} ≤ c_t ∫₀ᵗ E(Xⁿ⁻¹ − X)_s^{*2} ds,  t ≥ 0.

As n → ∞, we get E(X − S(X))_t^{*2} = 0 for all t, which implies X = S(X) a.s.

Now consider any two solutions X and Y with |X₀ − Y₀| ≤ ε a.s. By Lemma 21.5 we get for any p ≥ 2

  E(X − Y)_t^{*p} ≤ 2ε^p + c_t ∫₀ᵗ E(X − Y)_s^{*p} ds,  t ≥ 0,

and by Lemma 21.4 it follows that

  E(X − Y)_t^{*p} ≤ 2ε^p e^{c_t t},  t ≥ 0.  (12)

If X₀ = Y₀ a.s., we may take ε = 0 and conclude that X = Y a.s., which proves the asserted uniqueness. Letting X^x denote the solution X with
X₀ = x a.s., we get by (12)

  E|X^x − X^y|_t^{*p} ≤ 2|x − y|^p e^{c_t t},  t ≥ 0.

Taking p > d and applying Theorem 3.23 for each T > 0 with the metric ρ_T(f, g) = (f − g)_T^*, we conclude that the process (X_t^x) has a jointly continuous version on ℝ₊ × ℝ^d.

From the construction we note that if X and Y are solutions with X₀ = ξ and Y₀ = η a.s., then X = Y a.s. on the set {ξ = η}. In particular, X = X^ξ a.s. when ξ takes countably many values. In general, we may approximate ξ uniformly by random vectors ξ₁, ξ₂, … in ℚ^d, and by (12) we get X_t^{ξ_n} → X_t in L² for all t ≥ 0. Since also X_t^{ξ_n} → X_t^ξ a.s. by the continuity of the flow, it follows that X_t = X_t^ξ a.s. □

It is often useful to allow the solutions to explode. As in Chapter 19, we may then introduce an absorbing state ∆ at infinity, so that the path space becomes C(ℝ₊, ℝ̄^d) with ℝ̄^d = ℝ^d ∪ {∆}. Define ζ_n = inf{t; |X_t| ≥ n} for each n, put ζ = sup_n ζ_n, and let X_t = ∆ for t ≥ ζ. Given a Brownian motion B in ℝ^r and an adapted process X in the extended path space, we say that X or the pair (X, B) solves equation (σ, b) on the interval [0, ζ) if

  X_{t∧ζ_n} = X₀ + ∫₀^{t∧ζ_n} σ(s, X) dB_s + ∫₀^{t∧ζ_n} b(s, X) ds,  t ≥ 0, n ∈ ℕ.  (13)

When ζ < ∞, we have |X_{ζ_n}| → ∞, and X is said to explode at time ζ.

Conditions for the existence and uniqueness of possibly exploding solutions may be obtained from Theorem 21.3 by suitable localization. The following result is then useful to decide whether explosion can actually occur.

Proposition 21.6 (explosion). The solutions to equation (σ, b) are a.s. nonexploding if

  σ(x)_t^* + b(x)_t^* ≲ 1 + x_t^*,  t ≥ 0.  (14)

Proof: By Proposition 17.15 we may assume that X₀ is bounded. From (13) and (14) we get for suitable constants c_t < ∞

  E X_{t∧ζ_n}^{*2} ≤ 2E|X₀|² + c_t ∫₀ᵗ (1 + E X_{s∧ζ_n}^{*2}) ds,  t ≥ 0, n ∈ ℕ,

and so by Lemma 21.4

  1 + E X_{t∧ζ_n}^{*2} ≤ (1 + 2E|X₀|²) exp(c_t t) < ∞,  t ≥ 0, n ∈ ℕ.

As n → ∞, we obtain E X_t^{*2} < ∞, which implies ζ > t a.s. □
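The role of the linear-growth bound (14) is visible already in the deterministic case σ ≡ 0, where (13) reduces to the ODE x′ = b(x). The sketch below is an illustration only, not part of the text; the Euler discretization, the cap, and the horizon are arbitrary choices. A linearly growing drift produces no explosion on [0, 2], while the superlinear drift b(x) = 1 + x², whose exact solution is tan t, explodes near t = π/2 ≈ 1.57:

```python
def euler_blowup_time(b, T=2.0, h=1e-4, cap=1e6):
    """Euler scheme for the deterministic equation x' = b(x), x(0) = 0,
    the special case sigma = 0 of (13). Returns the first time the
    solution exceeds `cap`, or None if it stays below cap on [0, T]."""
    x, t = 0.0, 0.0
    while t < T:
        x += b(x) * h
        t += h
        if x > cap:
            return t
    return None

linear = euler_blowup_time(lambda x: 1.0 + x)       # satisfies (14): no explosion
quad = euler_blowup_time(lambda x: 1.0 + x * x)     # superlinear: blows up near pi/2
print(linear, quad)
```

The linear case stays bounded (x(t) = eᵗ − 1), exactly as Proposition 21.6 predicts, while the quadratic drift escapes to infinity in finite time.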
Our next aim is to characterize weak solutions to equation (σ, b) by a martingale property that involves only the solution X. Then define

  M_t^f = f(X_t) − f(X₀) − ∫₀ᵗ A_s f(X) ds,  t ≥ 0, f ∈ C_K^∞,  (15)

where the operators A_s are given by

  A_s f(x) = ½ a^{ij}(s, x) f″_{ij}(x_s) + b^i(s, x) f′_i(x_s),  s ≥ 0, f ∈ C_K^∞.  (16)

In the diffusion case we may replace the integrand A_s f(X) in (15) by the expression Af(X_s), where A denotes the elliptic operator

  Af(x) = ½ a^{ij}(x) f″_{ij}(x) + b^i(x) f′_i(x),  f ∈ C_K^∞, x ∈ ℝ^d.  (17)

A continuous process X in ℝ^d, or its distribution P, is said to solve the local martingale problem for (a, b) if M^f is a local martingale for every f ∈ C_K^∞. When a and b are bounded, it is clearly equivalent for M^f to be a true martingale, and the original problem turns into a martingale problem. The (local) martingale problem for (a, b) with initial distribution μ is said to be well posed if it has exactly one solution P_μ. For degenerate initial distributions δ_x, we may write P_x instead of P_{δ_x}.

The next result gives the basic equivalence between weak solutions to an SDE and solutions to the associated local martingale problem.

Theorem 21.7 (weak solutions and martingale problems, Stroock and Varadhan). Let σ and b be progressive, and fix any probability measure P on C(ℝ₊, ℝ^d). Then equation (σ, b) has a weak solution with distribution P iff P solves the local martingale problem for (σσ′, b).

Proof: Write a = σσ′. If (X, B) solves equation (σ, b), then

  [X^i, X^j] = [σ_k^i(X) · B^k, σ_l^j(X) · B^l] = σ_k^i σ_l^j(X) · [B^k, B^l] = a^{ij}(X) · λ.

By Itô's formula we get for any f ∈ C_K^∞

  df(X_t) = f′_i(X_t) dX_t^i + ½ f″_{ij}(X_t) d[X^i, X^j]_t = f′_i(X_t) σ_j^i(t, X) dB_t^j + A_t f(X) dt.

Hence, dM_t^f = f′_i(X_t) σ_j^i(t, X) dB_t^j, and so M^f is a local martingale.

Conversely, assume that X solves the local martingale problem for (a, b). Considering functions f ∈ C_K^∞ with f(x) = x^i for |x| ≤ n, it is clear by a localization argument that the processes

  M_t^i = X_t^i − X₀^i − ∫₀ᵗ b^i(s, X) ds,  t ≥ 0,  (18)

are continuous local martingales. Similarly, we may choose f^{ij} ∈ C_K^∞ with f^{ij}(x) = x^i x^j for |x| ≤ n, to obtain the local martingales

  M^{ij} = X^i X^j − X₀^i X₀^j − (X^i β^j + X^j β^i + a^{ij}) · λ,

where a^{ij} = a^{ij}(X) and β^i = b^i(X). Integrating by parts and using (18), we get

  M^{ij} = X^i · X^j + X^j · X^i + [X^i, X^j] − (X^i β^j + X^j β^i + a^{ij}) · λ
         = X^i · M^j + X^j · M^i + [M^i, M^j] − a^{ij} · λ.
The last two terms on the right then form a local martingale, and so by Proposition 17.2

  [M^i, M^j]_t = ∫₀ᵗ a^{ij}(s, X) ds,  t ≥ 0.

Hence, by Theorem 18.12 there exists a Brownian motion B with respect to a standard extension of the original filtration such that

  M_t^i = ∫₀ᵗ σ_j^i(s, X) dB_s^j,  t ≥ 0.

Substituting this into (18) yields (2), which means that the pair (X, B) solves equation (σ, b). □

For subsequent needs, we note that the previous construction can be made measurable in the following sense.

Lemma 21.8 (functional representation). Let σ and b be progressive. Then there exists a measurable mapping

  F: P(C(ℝ₊, ℝ^d)) × C(ℝ₊, ℝ^d) × [0, 1] → C(ℝ₊, ℝ^r)

such that, if the local martingale problem for (σσ′, b) admits a solution X with distribution P and if ϑ ⊥⊥ X is U(0, 1), then B = F(P, X, ϑ) is a Brownian motion in ℝ^r and the pair (X, B) with induced filtration solves equation (σ, b).

Proof: In the previous construction of B, the only nonelementary step is the stochastic integration with respect to (X, Y) in Theorem 18.12, where Y is an independent Brownian motion, and the integrand is a progressive function of X obtained by some elementary matrix algebra. Since the pair (X, Y) is again a solution to a local martingale problem, Proposition 17.26 yields the desired functional representation. □

Combining the martingale formulation with a compactness argument, we may deduce some general existence and continuity results.

Theorem 21.9 (weak existence and continuity, Skorohod). Let a and b be bounded, progressive functions such that, for any fixed t ≥ 0, the functions a(t, ·) and b(t, ·) are continuous on C(ℝ₊, ℝ^d). Then the martingale problem for (a, b) has a solution P_μ for every initial distribution μ. If those solutions are unique, then the mapping μ ↦ P_μ is weakly continuous.
Proof: For any ε > 0, t ≥ 0, and x ∈ C(ℝ₊, ℝ^d), define

  σ_ε(t, x) = σ((t − ε)₊, x),  b_ε(t, x) = b((t − ε)₊, x),

and let a_ε = σ_ε σ_ε′. Since σ and b are progressive, the processes σ_ε(s, X) and b_ε(s, X), s ≤ t, are measurable functions of X on [0, (t − ε)₊]. Hence, a strong solution X^ε to equation (σ_ε, b_ε) may be constructed recursively on the intervals [(n − 1)ε, nε], n ∈ ℕ, starting from an arbitrary random vector ξ in ℝ^d with distribution μ. Note in particular that X^ε solves the martingale problem for the pair (a_ε, b_ε).

Applying Theorem 17.7 to equation (σ_ε, b_ε) and using the boundedness of σ and b, we get for any p > 0

  E sup_{r≤h} |X_{t+r}^ε − X_t^ε|^p ≲ h^{p/2} + h^p ≲ h^{p/2},  t, ε ≥ 0, h ∈ [0, 1].

For p > 2d it follows by Corollary 16.9 that the family {X^ε} is tight in C(ℝ₊, ℝ^d), and by Theorem 16.3 we may then choose some ε_n → 0 such that X^{ε_n} →ᵈ X for a suitable X. To see that X solves the martingale problem for (a, b), let f ∈ C_K^∞ and s < t be arbitrary, and consider any bounded, continuous function g: C([0, s], ℝ^d) → ℝ. We need to show that

  E { f(X_t) − f(X_s) − ∫ₛᵗ A_r f(X) dr } g(X) = 0.

Then note that X^ε satisfies the corresponding equation for the operators A^ε constructed from the pair (a_ε, b_ε). Writing the two conditions as Eφ(X) = 0 and Eφ_ε(X^ε) = 0, respectively, it suffices by Theorem 4.27 to show that φ_ε(x_ε) → φ(x) whenever x_ε → x in C(ℝ₊, ℝ^d). This follows easily from the continuity conditions imposed on a and b.

Now assume that the solutions P_μ are unique, and let μ_n →ʷ μ. Arguing as before, we see that (P_{μ_n}) is tight, and so by Theorem 16.3 it is also relatively compact. If P_{μ_n} →ʷ Q along some subsequence, then as before we note that Q solves the martingale problem for (a, b) with initial distribution μ. Hence Q = P_μ, and the convergence extends to the original sequence. □

Our next aim is to show how the well-posedness of the local martingale problem for (a, b) extends from degenerate to arbitrary initial distributions. This requires a basic measurability property, which will also be needed later.

Theorem 21.10 (measurability and mixtures, Stroock and Varadhan). Let a and b be progressive and such that, for any x ∈ ℝ^d, the local martingale problem for (a, b) with initial distribution δ_x has a unique solution P_x.
Then (P_x) is a kernel from ℝ^d to C(ℝ₊, ℝ^d), and for every initial distribution μ, the associated local martingale problem has the unique solution P_μ = ∫ P_x μ(dx).

Proof: According to the proof of Theorem 21.7, it is enough to formulate the local martingale problem in terms of functions f belonging to some countable subclass C ⊂ C_K^∞, consisting of suitably truncated versions of the coordinate functions x^i and their products x^i x^j. Now define P = P(C(ℝ₊, ℝ^d)) and P_M = {P_x; x ∈ ℝ^d}, and write X for the canonical process in C(ℝ₊, ℝ^d). Let D denote the class of measures P ∈ P with degenerate projections P∘X₀⁻¹. Next let I consist of all measures P ∈ P such that X satisfies the integrability condition (3). Finally, put τ_n^f = inf{t; |M_t^f| ≥ n}, and let L be the class of measures P ∈ P such that the processes M_t^{f,n} = M^f(t ∧ τ_n^f) exist and are martingales under P for all f ∈ C and n ∈ ℕ. Then clearly P_M = D ∩ I ∩ L.

To prove the asserted kernel property, it is enough to show that P_M is a measurable subset of P, since the desired measurability will then follow by Theorem A1.3 and Lemma 1.40. The measurability of D is clear from Lemma 1.39 (i). Even I is measurable, since the integrals on the left of (3) are measurable by Fubini's theorem. Finally, L ∩ I is a measurable subset of I, since the defining condition is equivalent to countably many relations of the form E[M_t^{f,n} − M_s^{f,n}; F] = 0, with f ∈ C, n ∈ ℕ, s < t in ℚ₊, and F ∈ F_s.

Now fix any probability measure μ on ℝ^d. The measure P_μ = ∫ P_x μ(dx) clearly has initial distribution μ, and from the previous argument we note that P_μ again solves the local martingale problem for (a, b). To prove the uniqueness, let P be any measure with the stated properties. Then E[M_t^{f,n} − M_s^{f,n}; F | X₀] = 0 a.s. for all f, n, s < t, and F as above, and so P[·|X₀] is a.s. a solution to the local martingale problem with initial distribution δ_{X₀}. Thus, P[·|X₀] = P_{X₀} a.s., and we get P = E P_{X₀} = ∫ P_x μ(dx) = P_μ. This extends the well-posedness to arbitrary initial distributions. □

We return to the basic problem of constructing a Feller diffusion with given generator A in (17) as the solution to a suitable SDE or the associated martingale problem. The following result may be regarded as a converse to Theorem 19.24.

Theorem 21.11 (strong Markov and Feller properties, Stroock and Varadhan). Let a and b be measurable functions on ℝ^d such that, for any x ∈ ℝ^d, the local martingale problem for (a, b) with initial distribution δ_x has a unique solution P_x. Then the family (P_x) satisfies the strong Markov property.
If a and b are also bounded and continuous, then the equation T_t f(x) = E_x f(X_t) defines a Feller semigroup on C₀, and the operator A in (17) extends uniquely to the associated generator.

Proof: By Theorem 21.10 it remains to prove that, for any state x ∈ ℝ^d and bounded optional time τ,

  P_x[X∘θ_τ ∈ · | F_τ] = P_{X_τ} a.s.

As in the previous proof, this is equivalent to countably many relations of the form

  E_x[ {(M_t^{f,n} − M_s^{f,n}) 1_F}∘θ_τ | F_τ ] = 0 a.s.  (19)

with s < t and F ∈ F_s, where M^{f,n} denotes the process M^f stopped at τ_n = inf{t; |M_t^f| ≥ n}. Now θ_τ⁻¹ F_s ⊂ F_{τ+s} by Lemma 7.5, and in the diffusion case

  (M_t^{f,n} − M_s^{f,n})∘θ_τ = M^f_{(τ+t)∧σ_n} − M^f_{(τ+s)∧σ_n},
where σ_n = τ + τ_n∘θ_τ, which is again optional by Proposition 8.8. Thus, (19) follows by optional sampling from the local martingale property of M^f under P_x.

Now assume that a and b are also bounded and continuous, and define T_t f(x) = E_x f(X_t). By Theorem 21.9 we note that T_t f is continuous for every f ∈ C₀ and t ≥ 0, and from the continuity of the paths it is clear that T_t f(x) is continuous in t for each x. To see that T_t f ∈ C₀, it remains to show that |X_t^x| → ∞ in probability as |x| → ∞, where X^x has distribution P_x. But this follows from the SDE by the boundedness of σ and b, if for 0 < r < |x| we write

  P{|X_t^x| ≤ r} ≤ P{|X_t^x − x| ≥ |x| − r} ≤ E|X_t^x − x|² / (|x| − r)² ≲ (t + t²) / (|x| − r)²,

and let |x| → ∞ for fixed r and t. The last assertion is obvious from the uniqueness in law together with Theorem 19.23. □

It is usually harder to establish uniqueness in law than to prove weak existence. Some fairly general uniqueness criteria will be obtained in Theorems 23.1 and 24.2. For the moment we shall only exhibit some transformations that may simplify the problem. The following result, based on a change of probability measure, is often useful to eliminate the drift term.

Proposition 21.12 (transformation of drift). Let σ, b, and c be progressive functions of suitable dimension, where c is bounded. Then weak existence holds simultaneously for equations (σ, b) and (σ, b + σc). If, moreover, c = σ′h for some progressive function h, then even uniqueness in law holds simultaneously for the two equations.

Proof: Let X be a weak solution to equation (σ, b), defined on the canonical space for (X, B) with induced filtration F and with probability measure P. Put V = c(X), and note that (V² · λ)_t is bounded for each t. By Lemma 18.18 and Corollary 18.25 there exists a probability measure Q with Q = ℰ(V′ · B)_t · P on F_t for each t ≥ 0, and we note that B̃ = B − V · λ is a Q-Brownian motion. Under Q we further get by Proposition 18.20

  X − X₀ = σ(X) · (B̃ + V · λ) + b(X) · λ = σ(X) · B̃ + (b + σc)(X) · λ,

which shows that X is a weak solution to the SDE (σ, b + σc). Since the same argument applies to equation (σ, b + σc) with c replaced by −c, we conclude that weak existence holds simultaneously for the two equations.

Now let c = σ′h, and assume that uniqueness in law holds for equation (σ, b + ah). Further assume that (X, B) solves equation (σ, b) under both P and Q. Choosing V and B̃ as before, it follows that (X, B̃) solves equation (σ, b + σc) under the transformed distributions ℰ(V′ · B)_t · P and ℰ(V′ · B)_t · Q for (X, B). By hypothesis the latter measures then have the same X-marginal, and the stated condition implies that ℰ(V′ · B) is X-measurable. Thus, the X-marginals agree even for P and Q, which proves the uniqueness
in law for equation (σ, b). Again we may reverse the argument to get an implication in the other direction. □

Next we examine how an SDE of diffusion type can be transformed by a random time-change. The method will be used systematically in Chapter 23 to analyze the one-dimensional case.

Proposition 21.13 (scaling). Fix some measurable functions σ, b, and c > 0 on ℝ^d, where c is bounded away from 0 and ∞. Then weak existence and uniqueness in law hold simultaneously for equations (σ, b) and (cσ, c²b).

Proof: Assume that X solves the local martingale problem for (a, b), and introduce the process V = c⁻²(X) · λ with inverse (τ_s). By optional sampling we note that M^f_{τ_s}, s ≥ 0, is again a local martingale, and the process Y_s = X_{τ_s} satisfies

  M^f_{τ_s} = f(Y_s) − f(Y₀) − ∫₀ˢ c²Af(Y_r) dr.

Thus, Y solves the local martingale problem for (c²a, c²b). Now let T denote the mapping on C(ℝ₊, ℝ^d) leading from X to Y, and write T′ for the corresponding mapping based on c⁻¹. Then T and T′ are mutual inverses, and so by the previous argument applied to both mappings, a measure P ∈ P(C(ℝ₊, ℝ^d)) solves the local martingale problem for (a, b) iff P∘T⁻¹ solves the corresponding problem for (c²a, c²b). Thus, both existence and uniqueness hold simultaneously for the two problems. By Theorem 21.7 the last statement translates immediately into a corresponding assertion for the SDEs. □

Our next aim is to examine the connection between weak and strong solutions. Under appropriate conditions, we shall further establish the existence of a universal functional solution. To explain the subsequent terminology, let G be the filtration induced by the identity mapping (ξ, B) on the canonical space Ω = ℝ^d × C(ℝ₊, ℝ^r), so that G_t = σ{ξ, Bᵗ}, t ≥ 0, where Bᵗ_s = B_{s∧t}. Writing W^r for the r-dimensional Wiener measure, we introduce for any μ ∈ P(ℝ^d) the (μ ⊗ W^r)-completion G_t^μ of G_t. The universal completion Ḡ_t is defined as ∩_μ G_t^μ, and we say that a function

  F: ℝ^d × C(ℝ₊, ℝ^r) → C(ℝ₊, ℝ^d)  (20)

is universally adapted if it is adapted to the filtration Ḡ = (Ḡ_t).

Theorem 21.14 (pathwise uniqueness and functional solution). Let σ and b be progressive and such that weak existence and pathwise uniqueness hold for solutions to equation (σ, b) starting at fixed points. Then strong existence and uniqueness in law hold for any initial distribution, and there exists a measurable and universally adapted function F as in (20) such that every solution (X, B) to equation (σ, b) satisfies X = F(X₀, B) a.s.

Note in particular that the function F above is independent of the initial distribution μ. A key step in the proof, accomplished in Lemma 21.17, is
to establish the corresponding result for a fixed μ. Two further lemmas will be needed, and we begin with a statement that clarifies the connection between adaptedness, strong existence, and functional solutions.

Lemma 21.15 (transfer of strong solution). Let (X, B) solve equation (σ, b), and assume that X is adapted to the complete filtration induced by X₀ and B. Then X = F(X₀, B) a.s. for some Borel-measurable function F as in (20), and for any basic triple (F̃, B̃, ξ̃) with ξ̃ =ᵈ X₀, the process X̃ = F(ξ̃, B̃) is F̃-adapted and such that the pair (X̃, B̃) solves equation (σ, b).

Proof: By Lemma 1.13 we have X = F(X₀, B) a.s. for some Borel-measurable function F as stated. By the same result, there exists for every t ≥ 0 a further representation of the form X_t = G_t(X₀, Bᵗ) a.s., and so F(X₀, B)_t = G_t(X₀, Bᵗ) a.s. Hence, X̃_t = G_t(ξ̃, B̃ᵗ) a.s., and so X̃ is F̃-adapted. Since also (X̃, B̃) =ᵈ (X, B), Proposition 17.26 shows that even the former pair solves equation (σ, b). □

The following result shows that even weak solutions can be transferred to any given probability space with a specified Brownian motion.

Lemma 21.16 (transfer of weak solution). Let (X, B) solve equation (σ, b), and fix any basic triple (F̃, B̃, ξ̃) with ξ̃ =ᵈ X₀. Then there exists a process X̃ ⊥⊥_{ξ̃,B̃} F̃ with X̃₀ = ξ̃ a.s. and (X̃, B̃) =ᵈ (X, B). Furthermore, the filtration G induced by (X̃, F̃) is a standard extension of F̃, and the pair (X̃, B̃) with filtration G solves equation (σ, b).

Proof: By Theorem 6.10 and Proposition 6.13 there exists a process X̃ ⊥⊥_{ξ̃,B̃} F̃ satisfying (X̃, ξ̃, B̃) =ᵈ (X, X₀, B), and in particular X̃₀ = ξ̃ a.s. To see that G is a standard extension of F̃, fix any t ≥ 0 and define B′ = B̃ − B̃ᵗ. Then (X̃ᵗ, B̃ᵗ) ⊥⊥ B′ since the corresponding relation holds for (X, B), and so X̃ᵗ ⊥⊥_{ξ̃,B̃ᵗ} B′. Since also X̃ᵗ ⊥⊥_{ξ̃,B̃} F̃, Proposition 6.8 yields X̃ᵗ ⊥⊥_{ξ̃,B̃ᵗ} (B′, F̃) and hence X̃ᵗ ⊥⊥_{F̃_t} F̃. But then (X̃ᵗ, F̃_t) ⊥⊥_{F̃_t} F̃ by Corollary 6.7, which means that G_t ⊥⊥_{F̃_t} F̃. Since standard extensions preserve martingales, Theorem 18.3 shows that B̃ remains a Brownian motion with respect to G. As in Proposition 17.26, we conclude that the pair (X̃, B̃) solves equation (σ, b). □

We are now ready to establish the crucial relationship between strong existence and pathwise uniqueness.

Lemma 21.17 (strong existence and pathwise uniqueness, Yamada and Watanabe). Assume that weak existence and pathwise uniqueness hold for solutions to equation (σ, b) with initial distribution μ. Then even strong existence and uniqueness in law hold for such solutions, and there exists a measurable function F_μ as in (20) such that any solution (X, B) with initial distribution μ satisfies X = F_μ(X₀, B) a.s.
21. Stochastic Differential Equations and Martingale Problems 425 Proof: Fix any solution (X, B) with initial distribution J-L and associated filtration F. By Lemma 21.16 there exists some process Y Jlxo,B F with Yo ==: Xo a.s. such that (Y, B) solves equation (a, b) for the filtration 9 induced by (Y,F). Since 9 is a standard extension of :F, the pair (X, B) remains a solution for g, and the pathwise uniqueness yields X == Y a.s. For each t > 0 we have Xtl.Lxo,BX t and (Xt,Bt)Jl(B - B t ), and so X t llxo,BtX t a.s. by Proposition 6.8. Thus, Corollary 6.7 (ii) shows that X is adapted to the complete filtration induced by ()(o, B). Hence, by Lemma 21.15 there exists a measurable function Fp, with X == FJ-L(Xo, B) a.s. and such that, for any basic triple (f:, B,) with  = Xo, the process X == FI-L(' B) is f:-adapted and solves equation (a, b) along with B. In particular,  d X since (c;, B) d (X o , B), and the pathwis niqueness shows that X is the a.s. unique solution for the given triple (F, B, ). This proves the uniqueness in law. 0 Proof of Theorem 21.14: By Lemma 21.17 we have uniqueness in law for solutions starting at fixed points, and Theorem 21.10 shows that the corresponding distributions Px form a kernel from d to C(+, JRd). By Lemma 21.8 there exists a measurable mapping G such that, whenever X has distribution Px and 1?lLX is U(O, 1), the process B == G(Px, X, 19) is a Brownian motion in }RT and the pair (X, B) solves equation (a, b). Writing Qx for the distribution of (X, B), it is clear from Lemmas 1.38 and 1.41 (ii) that the mapping x  Qx is a kernel from JRd to C(+,JRd+T). Changing the notation, we may write (X, B) for the canonical process in C(IR+, JRd+T). By Lemma 21.17 we have X == Fx(x, B) == Fx{B) a.s. Qx, and so Qx[X E .IB] == OFx(B) a.s., x E d. (21) By Proposition 7.26 we may choose versions vx,w == Qx[X E .IB E dw] that combine into a probability kernel v from JRd x C(+, JRT) to C(JR+, }Rd). From (21) we see that vx,w is a.s. 
degenerate for each x, and since the set D of degenerate measures is measurable by Lemma 1.39 (i), we can modify ν such that ν_{x,w}D ≡ 1. In that case,

ν_{x,w} = δ_{F(x,w)}, x ∈ ℝ^d, w ∈ C(ℝ_+, ℝ^r), (22)

for some function F as in (20), and the kernel property of ν implies that F is product measurable. Comparing (21) and (22) gives F(x, B) = F_x(B) a.s. for all x.

Now fix any probability measure µ on ℝ^d, and conclude as in Theorem 21.10 that P_µ = ∫ P_x µ(dx) solves the local martingale problem for (a, b) with initial distribution µ. Hence, equation (σ, b) has a solution (X, B) with distribution µ for X_0. Since conditioning on ℱ_0 preserves martingales, the equation remains conditionally valid given X_0. By the pathwise uniqueness in the degenerate case we get P[X = F(X_0, B) | X_0] = 1 a.s., and so X = F(X_0, B) a.s. In particular, the pathwise uniqueness extends to arbitrary initial distributions µ.
Returning to the canonical setting, we may take (ξ, B) to be the identity map on the canonical space ℝ^d × C(ℝ_+, ℝ^r), endowed with the probability measure µ ⊗ W^r and the induced complete filtration 𝒢^µ. By Lemma 21.17 equation (σ, b) has a 𝒢^µ-adapted solution X = F_µ(ξ, B) with X_0 = ξ a.s., and the previous discussion shows that even X = F(ξ, B) a.s. Hence, F is adapted to 𝒢^µ, and since µ is arbitrary, the adaptedness extends to the universal completion 𝒢_t = ⋂_µ 𝒢_t^µ, t ≥ 0. □

Exercises

1. Show that for any c ∈ (0, 1), the stochastic flow X_t^x in Theorem 21.3 is a.s. Hölder continuous in x with exponent c, uniformly for bounded x and t. (Hint: Apply Theorem 3.23 to the estimate in the proof of Theorem 21.3.)

2. Show that a process X in ℝ^d is a Brownian motion iff the process f(X_t) − ½ ∫_0^t Δf(X_s) ds is a martingale for every f ∈ C_K^∞. Compare with Theorem 18.3 and Lemma 19.21.

3. Show that a Brownian bridge in ℝ^d satisfies the SDE dX_t = dB_t − (1 − t)^{-1} X_t dt on [0, 1) with initial condition X_0 = 0. Also show that if X^x denotes the solution starting at x, then the process Y_t^x = X_t^x − (1 − t)x is again a Brownian bridge. (Hint: Note that M_t = X_t/(1 − t) is a martingale on [0, 1) and that Y^x satisfies the same SDE as X.)

4. Solve the preceding SDE, using Proposition 21.2, to express the Brownian bridge in terms of a Brownian motion. Compare with previously known formulas.

5. Given two continuous semimartingales U and V, show that the Fisk–Stratonovich SDE dX = dU + X ∘ dV has the unique solution X = Z(X_0 + Z^{-1} ∘ U), where Z = exp(V − V_0). (Hint: Use Corollary 17.21 and the chain rule for FS-integrals, or derive the result from Proposition 21.2.)

6. Show under suitable conditions how a Fisk–Stratonovich SDE can be converted into an Itô equation, and conversely. Also give a sufficient condition for the existence of a strong solution to an FS-equation.

7.
Show that weak existence and uniqueness in law hold for the SDE dX_t = sgn(X_t+) dB_t with initial condition X_0 = 0, while strong existence and pathwise uniqueness fail. (Hint: Show that any solution X is a Brownian motion, and define B = ∫ sgn(X+) dX. Note that both X and −X satisfy the given SDE.)

8. Show that weak existence holds for the SDE dX_t = sgn(X_t) dB_t with initial condition X_0 = 0, while strong existence and uniqueness in law fail. (Hint: We may take X to be a Brownian motion or put X ≡ 0.)

9. Show that strong existence holds for the SDE dX_t = 1{X_t ≠ 0} dB_t with initial condition X_0 = 0, while uniqueness in law fails. (Hint: Here X = B and X ≡ 0 are both solutions.)
10. Show that a given process may satisfy SDEs with different (σσ', b). (Hint: For a trivial example, take X ≡ 0, b ≡ 0, and σ ≡ 0 or σ(x) = sgn x.)

11. Construct a non-Markovian solution X to the SDE dX_t = sgn(X_t) dB_t. (Hint: We may take X to be a Brownian motion, stopped at the first visit to 0 after time 1. Another interesting choice is to take X to be 0 on [0, 1] and a Brownian motion on [1, ∞).)

12. For X as in Theorem 21.3, construct an SDE in ℝ^{md} satisfied by the process (X^{x_1}, ..., X^{x_m}) for arbitrary x_1, ..., x_m ∈ ℝ^d. Conclude that ℒ(X) is determined by ℒ(X^x, X^y) for arbitrary x, y ∈ ℝ^d. (Hint: Note that ℒ(X^x) is determined by (σσ', b) and x, and apply this result to the m-point motion.)

13. Find two SDEs as in Theorem 21.3 with solutions X and Y such that X^x =d Y^x for all x but ℒ(X) ≠ ℒ(Y). (Hint: We may choose dX = dB and dY = sgn(Y+) dB.)

14. For a diffusion equation (σ, b) as in Theorem 21.3, show that the distribution of the associated flow X determines Σ_j σ_{ij}(x) σ_{kj}(y) for arbitrary pairs i, k ∈ {1, ..., d} and x, y ∈ ℝ^d.

15. Show that if weak existence holds for the SDE (σ, b), then the pathwise uniqueness can be strengthened to the corresponding property for solutions X and Y with respect to possibly different filtrations.

16. Assume that weak existence and the stronger version of pathwise uniqueness hold for the SDE (σ, b). Use Theorem 6.10 and Lemma 21.15 to prove the existence for every µ of an a.s. unique functional solution F(X_0, B) with ℒ(X_0) = µ.
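The bridge equation of Exercise 3 above is easy to probe numerically. The following sketch (plain Python; the step and sample counts are arbitrary choices, not from the text) runs an Euler–Maruyama scheme for dX_t = dB_t − (1 − t)^{-1} X_t dt started at 0, and checks that the sample mean and variance at t = 1/2 are close to the Brownian-bridge values 0 and t(1 − t) = 1/4.

```python
import math
import random

def bridge_at_half(n, rng):
    """Euler-Maruyama for dX = dB - X/(1-t) dt on [0, 1/2], started at X_0 = 0."""
    dt = 1.0 / n
    x = 0.0
    for k in range(n // 2):                    # integrate up to t = 1/2
        t = k * dt
        db = rng.gauss(0.0, math.sqrt(dt))     # Brownian increment over dt
        x += db - x / (1.0 - t) * dt
    return x

rng = random.Random(1)
samples = [bridge_at_half(200, rng) for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(var, 2))           # variance should be near 1/4
```

Run on all of [0, 1) with the final step handled carefully, the same loop drives the path back to 0 at time 1 up to discretization error, in line with Exercise 4.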
Chapter 22

Local Time, Excursions, and Additive Functionals

Tanaka's formula and semimartingale local time; occupation density, continuity and approximation; regenerative sets and processes; excursion local time and Poisson process; Ray–Knight theorem; excessive functions and additive functionals; local time at a regular point; additive functionals of Brownian motion

The central theme of this chapter is the notion of local time, which we will approach in three different ways, namely via stochastic calculus, via excursion theory, and via additive functionals. Here the first approach leads in particular to a useful extension of Itô's formula and to an interpretation of local time as an occupation density. Excursion theory will be developed for processes that are regenerative at a fixed state, and we shall prove the basic Itô representation, involving a Poisson process of excursions on the local time scale. Among the many applications, we consider a version of the Ray–Knight theorem about the spatial variation of Brownian local time. Finally, we shall study continuous additive functionals (CAFs) and their potentials, prove the existence of local time at a regular point, and show that any CAF of one-dimensional Brownian motion is a mixture of local times.

The beginning of this chapter may be regarded as a continuation of the stochastic calculus developed in Chapter 17. The present excursion theory continues the elementary discussion for the discrete-time case in Chapter 8. Though the theory of CAFs is formally developed for Feller processes, few results from Chapter 19 will be needed beyond the strong Markov property and its integrated version in Corollary 19.19. Both semimartingale local time and excursion theory will reappear in Chapter 23 as useful tools for studying one-dimensional SDEs and diffusions. Our discussion of CAFs of Brownian motion and their associated potentials is continued at the end of Chapter 25.
For the stochastic calculus approach to local time, consider an arbitrary continuous semimartingale X in ℝ. The semimartingale local time L^0 of X at 0 may be defined through Tanaka's formula

L^0_t = |X_t| − |X_0| − ∫_0^t sgn(X_s−) dX_s, t ≥ 0, (1)
where sgn(x−) = 1_{(0,∞)}(x) − 1_{(−∞,0]}(x). Note that the stochastic integral on the right exists since the integrand is bounded and progressive. The process L^0 is clearly continuous and adapted with L^0_0 = 0. To motivate the definition, we note that a formal application of Itô's rule to the function f(x) = |x| yields (1) with L^0_t = ∫_0^t δ_0(X_s) d[X]_s.

The following result gives the basic properties of local time at a fixed point. Here we say that a nondecreasing function f is supported by a Borel set A if the associated measure µ satisfies µA^c = 0. The support of f is the smallest closed set with this property.

Theorem 22.1 (semimartingale local time) Let L^0 be the local time at 0 of a continuous semimartingale X. Then L^0 is a.s. nondecreasing, continuous, and supported by the set Z = {t ≥ 0; X_t = 0}. Furthermore, we have a.s.

L^0_t = ( −|X_0| − inf_{s≤t} ∫_0^s sgn(X_r−) dX_r ) ∨ 0, t ≥ 0. (2)

The proof of the last assertion depends on an elementary observation.

Lemma 22.2 (supporting function, Skorohod) Let f be a continuous function on ℝ_+ with f_0 ≥ 0. Then there exists a unique nondecreasing, continuous function g with g_0 = 0 such that h = f + g ≥ 0 and ∫ 1{h > 0} dg = 0, namely

g_t = − inf_{s≤t} (f_s ∧ 0) = sup_{s≤t} (−f_s) ∨ 0, t ≥ 0. (3)

Proof: The function in (3) clearly has the desired properties. To prove the uniqueness, assume that both g and g' have the stated properties, and put h = f + g and h' = f + g'. If g_t < g'_t for some t > 0, define s = sup{r < t; g_r = g'_r}, and note that h' ≥ h' − h = g' − g > 0 on (s, t]. Hence, dg' = 0 on (s, t], and so 0 < g'_t − g_t ≤ g'_s − g_s = 0, a contradiction. □

Proof of Theorem 22.1: For any h > 0, we may choose a convex function f_h ∈ C² such that f_h(x) = −x for x ≤ 0 and f_h(x) = x − h for x ≥ h. Here clearly f_h(x) → |x| and f'_h(x) → sgn(x−) as h → 0. By Itô's formula we get, a.s.
for any t ≥ 0,

Y^h_t ≡ f_h(X_t) − f_h(X_0) − ∫_0^t f'_h(X_s) dX_s = ½ ∫_0^t f''_h(X_s) d[X]_s,

and by Corollary 17.13 and dominated convergence we note that (Y^h − L^0)*_t →P 0 for each t ≥ 0. The first assertion now follows from the fact that the processes Y^h are nondecreasing and satisfy

∫_0^∞ 1{X_s ∉ [0, h]} dY^h_s = 0 a.s., h > 0.

The last assertion is a consequence of Lemma 22.2. □
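Lemma 22.2 is constructive, and formula (3) can be exercised directly on a discretized path. The sketch below (plain Python; the sample path is an arbitrary choice) computes g_t = sup_{s≤t}(−f_s) ∨ 0 by a running maximum and checks the two defining properties: h = f + g ≥ 0, and g increases only where h vanishes.

```python
import math

def skorohod_reflection(f):
    """Given samples of a continuous f with f[0] >= 0, return (g, h) with
    g nondecreasing, g[0] = 0, h = f + g >= 0, and dg carried by {h = 0}."""
    g, h = [], []
    running = 0.0                      # sup_{s<=t} (-f_s) v 0
    for x in f:
        running = max(running, -x)
        g.append(running)
        h.append(x + running)
    return g, h

# a sample path that dips below zero
f = [math.sin(3 * k / 50.0) + 0.3 for k in range(200)]
g, h = skorohod_reflection(f)
assert min(h) >= 0.0
assert all(b >= a for a, b in zip(g, g[1:]))
for k in range(1, len(g)):             # g grows only at zeros of h
    if g[k] > g[k - 1]:
        assert abs(h[k]) < 1e-12
print("ok")
```

Applied to f_t = |X_0| + ∫_0^t sgn(X_s−) dX_s, the construction returns g = L^0 and h = |X|, which is the content of (2).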
In particular, we may deduce a basic relationship between a Brownian motion, its maximum process, and its local time at 0. The result improves the elementary Proposition 13.13.

Corollary 22.3 (local time and maximum process, Lévy) Let L^0 be the local time at 0 of Brownian motion B, and define M_t = sup_{s≤t} B_s. Then (L^0, |B|) =d (M, M − B).

Proof: Define B'_t = −∫_0^t sgn(B_s−) dB_s and M'_t = sup_{s≤t} B'_s, and conclude from (1) and (2) that L^0 = M' and |B| = L^0 − B' = M' − B'. It remains to note that B' =d B by Theorem 18.3. □

The local time L^x at an arbitrary point x ∈ ℝ is defined as the local time of the process X − x at 0. Thus,

L^x_t = |X_t − x| − |X_0 − x| − ∫_0^t sgn(X_s − x −) dX_s, t ≥ 0. (4)

The following result shows that the two-parameter process L = (L^x_t) on ℝ_+ × ℝ has a version that is continuous in t and rcll (right-continuous with left-hand limits) in x. In the martingale case we even have joint continuity.

Theorem 22.4 (regularization, Trotter, Yor) Let X be a continuous semimartingale with canonical decomposition M + A. Then the local time L = (L^x_t) of X has a version that is rcll in x, uniformly for bounded t, and satisfies

L^x_t − L^{x−}_t = 2 ∫_0^t 1{X_s = x} dA_s, x ∈ ℝ, t ∈ ℝ_+. (5)

Proof: By the definition of L we have for any x ∈ ℝ and t ≥ 0

L^x_t = |X_t − x| − |X_0 − x| − ∫_0^t sgn(X_s − x −) dM_s − ∫_0^t sgn(X_s − x −) dA_s. (6)

By dominated convergence the last term has the required continuity properties, and the discontinuities in the space variable are given by the right-hand side of (5). Since the first two terms are trivially continuous in (t, x), it remains to show that the first integral in (6), denoted by I^x_t below, has a jointly continuous version. By localization we may then assume that the processes X − X_0, [M]^{1/2}, and ∫ |dA| are all bounded by some constant c. Fix any p > 2. By Theorem 17.7 we get for any x < y

E((I^x − I^y)*_t)^p ≤ 2^p E((1_{(x,y]}(X) · M)*_t)^p ≲ E(1_{(x,y]}(X) · [M])_t^{p/2}. (7)
To estimate the integral on the right, put y − x = h and choose f ∈ C² with f'' ≥ 2·1_{(x,y]} and |f'| ≤ 2h. By Itô's formula

1_{(x,y]}(X) · [M] ≤ f''(X) · [X] = 2{f(X) − f(X_0) − f'(X) · X} ≤ 4ch + 2|f'(X) · M|, (8)

and by another application of Theorem 17.7

E((f'(X) · M)*_t)^{p/2} ≲ E((f'(X))² · [M])_t^{p/4} ≤ (2ch)^{p/2}. (9)

Combining (7)–(9) gives E((I^x − I^y)*_t)^p ≲ (ch)^{p/2}, and the desired continuity follows by Theorem 3.23. □

By the last result we may henceforth assume the local time L^x_t to be rcll in x. Here the right-continuity is only a convention, consistent with our choice of a left-continuous sign function in (4). If the occupation measure of the finite-variation component A of X is a.s. diffuse, then (5) shows that L is a.s. continuous.

We proceed to give a simultaneous extension of Itô's and Tanaka's formulas. Recall that any convex function f on ℝ has a nondecreasing and left-continuous left derivative f'(x−). The same is then true when f is the difference between two convex functions. In that case there exists a unique signed measure µ_f with µ_f[x, y) = f'(y−) − f'(x−) for all x < y. In particular, µ_f(dx) = f''(x) dx when f ∈ C².

Theorem 22.5 (occupation density, Meyer, Wang) Let X be a continuous semimartingale with right-continuous local time L. Then outside a fixed null set we have, for any measurable function f ≥ 0 on ℝ,

∫_0^t f(X_s) d[X]_s = ∫_{−∞}^∞ f(x) L^x_t dx, t ≥ 0. (10)

If f is the difference of two convex functions, then also

f(X_t) − f(X_0) = ∫_0^t f'(X_s−) dX_s + ½ ∫_{−∞}^∞ L^x_t µ_f(dx), t ≥ 0. (11)

In particular, Theorem 17.18 extends to any function f ∈ C¹(ℝ) such that f' is absolutely continuous with Radon–Nikodym derivative f''.

Note that (11) remains valid for the left-continuous version of L, provided that f'(X−) is replaced by the right derivative f'(X+).

Proof: For f(x) = |x − a|, equation (11) reduces to the definition of L^a.
Since the formula is also trivially true for affine functions f(x) = ax + b, it extends by linearity to the case when µ_f is supported by a finite set. By linearity and a suitable truncation, it remains to prove (11) when µ_f is positive with bounded support and f(−∞) = f'(−∞) = 0. Then define for every n ∈ ℕ the functions

g_n(x) = f'(2^{−n}[2^n x]−), f_n(x) = ∫_{−∞}^x g_n(u) du, x ∈ ℝ,
and note that (11) holds for all f_n. As n → ∞, we get f'_n(x−) = g_n(x−) ↑ f'(x−), and so Corollary 17.13 yields f'_n(X−) · X →P f'(X−) · X. Also note that f_n → f by monotone convergence. It remains to show that ∫ L^x_t µ_{f_n}(dx) → ∫ L^x_t µ_f(dx). Then let h be any bounded, right-continuous function on ℝ, and note that µ_{f_n} h = µ_f h_n with h_n(x) = h(2^{−n}[2^n x + 1]). Since h_n → h, we get µ_f h_n → µ_f h by dominated convergence.

Comparing (11) with Itô's formula, we note that (10) holds a.s. for any t ≥ 0 when f is continuous. For each t ≥ 0, the two sides of (10) define random measures on ℝ, and so by suitable approximation and monotone class arguments we may choose the exceptional null set N to be independent of f. By the continuity of each side, we may also assume that N is independent of t. If f ∈ C¹ with f' as stated, then (11) applies with µ_f(dx) = f''(x) dx, and the last assertion follows by (10). □

In particular, we note that the occupation measure at time t,

η_t A = ∫_0^t 1_A(X_s) d[X]_s, A ∈ ℬ(ℝ), t ≥ 0, (12)

is a.s. absolutely continuous with density L_t. This leads to a simple construction of L.

Corollary 22.6 (right derivative) Outside a fixed P-null set, we have

L^x_t = lim_{h↓0} η_t[x, x + h)/h, t ≥ 0, x ∈ ℝ.

Proof: Use Theorem 22.5 and the right-continuity of L. □

Our next aim is to show how local time arises naturally in the context of regenerative processes. Then consider an rcll process X in some Polish space S such that X is adapted to some right-continuous and complete filtration ℱ. Fix a state a ∈ S, and assume X to be regenerative at a, in the sense that there exists some distribution P_a on the path space satisfying

P[θ_τ X ∈ · | ℱ_τ] = P_a a.s. on {τ < ∞, X_τ = a}, (13)

for every optional time τ. The relation will often be applied to the hitting times τ_r = inf{t > r; X_t = a}, which are optional for all r ≥ 0 by Theorem 7.7. In fact, when X is continuous, the optionality of τ_r follows already from the elementary Lemma 7.6.
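Before we leave the stochastic-calculus setting, Corollary 22.6 invites a quick simulation check. In the sketch below (plain Python; grid sizes and the window h are arbitrary choices), Brownian motion is approximated by a Gaussian random walk, so that d[X]_s = ds and η_1[0, h)/h is just the normalized time spent in [0, h). By Corollary 22.3, the sample mean of this estimate of L^0_1 should be close to E M_1 = √(2/π) ≈ 0.80.

```python
import math
import random

def local_time_estimate(n, h, rng):
    """eta_1[0,h)/h for a Gaussian-walk approximation of Brownian motion;
    since [B]_t = t, the occupation measure is plain time spent in [0,h)."""
    dt = 1.0 / n
    x, occ = 0.0, 0.0
    for _ in range(n):
        if 0.0 <= x < h:
            occ += dt
        x += rng.gauss(0.0, math.sqrt(dt))
    return occ / h

rng = random.Random(2)
est = [local_time_estimate(500, 0.1, rng) for _ in range(1500)]
mean = sum(est) / len(est)
print(round(mean, 2))      # should be near sqrt(2/pi) ~ 0.80
```

Shrinking h while refining the grid reproduces the right-derivative statement of Corollary 22.6 path by path, at the cost of more simulation noise.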
In particular, we note that ℱ_{τ_0} and θ_{τ_0}X are conditionally independent, given that τ_0 < ∞. For simplicity we may henceforth take X to be the canonical process on the path space D = D(ℝ_+, S), equipped with the distribution P = P_a.

Introducing the regenerative set Z = {t ≥ 0; X_t = a}, we may write the last event in (13) simply as {τ ∈ Z}. From the right-continuity of X it is clear that Z ∋ t_n ↓ t implies t ∈ Z, which means that every point in Z̄ \ Z is isolated from the right. Since (Z̄)^c is open and hence a countable union of disjoint open intervals, it follows that Z^c is a countable union of disjoint intervals of the form (u, v) or [u, v). With every such interval we
may associate an excursion process Y_t = X_{(t+u)∧v}, t ≥ 0. Note that a is absorbing for Y, in the sense that Y_t = a for all t ≥ inf{s ≥ 0; Y_s = a}. The number of excursions may be finite or infinite, and if Z is bounded there is clearly a last excursion of infinite length.

We begin with a classification according to the local properties of Z.

Proposition 22.7 (local dichotomies) If the set Z is regenerative, then

(i) either (Z̄)° = ∅ a.s. or Z° is dense in Z̄ a.s.;
(ii) either a.s. all points of Z are isolated, or a.s. none of them is;
(iii) either λZ = 0 a.s. or supp(Z·λ) = Z̄ a.s.

Recall that the set Z̄ is said to be nowhere dense if (Z̄)° = ∅, and that Z̄ is perfect if Z̄ has no isolated points. If Z° is dense in Z̄, then clearly supp(Z·λ) = Z̄, and no isolated points exist.

Proof: By the regenerative property, we have for any optional time τ

P{τ = 0} = E[P[τ = 0 | ℱ_0]; τ = 0] = (P{τ = 0})²,

and so P{τ = 0} = 0 or 1. If σ is another optional time, then τ' = σ + τ∘θ_σ is again optional by Proposition 8.8, and we get

P{τ' − σ ≤ h, σ ∈ Z} = P{τ∘θ_σ ≤ h, σ ∈ Z} = P{τ ≤ h} P{σ ∈ Z}.

Thus, P[τ' − σ ∈ · | σ ∈ Z] = P∘τ^{-1}, and in particular τ = 0 a.s. implies τ' = σ a.s. on {σ ∈ Z}.

(i) Here we apply the previous argument to the optional times τ = inf Z^c and σ = τ_r. If τ > 0 a.s., then τ∘θ_{τ_r} > 0 a.s. on {τ_r < ∞}, and so τ_r ∈ Z° a.s. on the same set. Since the set {τ_r; r ∈ ℚ_+} is dense in Z̄, it follows that Z° is dense in Z̄ a.s. Now assume instead that τ = 0 a.s. Then τ∘θ_{τ_r} = 0 a.s. on {τ_r < ∞}, and so τ_r ∈ cl(Z^c) a.s. on the same set. Hence, Z̄ ⊂ cl(Z^c) a.s., and therefore cl(Z^c) = ℝ_+ a.s. It remains to note that cl(Z^c) = cl((Z̄)^c), since Z^c is a disjoint union of intervals (u, v) or [u, v).

(ii) In this case, we define τ = inf(Z \ {0}). If τ = 0 a.s., then τ∘θ_{τ_r} = 0 a.s. on {τ_r < ∞}. Since every isolated point of Z is of the form τ_r for some r ∈ ℚ_+, it follows that Z has a.s. no isolated points.
If instead τ > 0 a.s., we may define the optional times σ_n recursively by σ_{n+1} = σ_n + τ∘θ_{σ_n}, starting from σ_1 = τ. Then σ_n = Σ_{k≤n} ξ_k, where the ξ_k are i.i.d. and distributed as τ, and so σ_n → ∞ a.s. by the law of large numbers. Thus, Z = {σ_n; n ∈ ℕ, σ_n < ∞} a.s., and a.s. all points of Z are isolated.

(iii) Here we may take τ = inf{t > 0; (Z·λ)_t > 0}. If τ = 0 a.s., then τ∘θ_{τ_r} = 0 a.s. on {τ_r < ∞}, and so τ_r ∈ supp(Z·λ) a.s. on the same set. Hence, Z̄ ⊂ supp(Z·λ) a.s., and the two sets agree a.s. If instead τ > 0 a.s., then τ = τ + τ∘θ_τ > τ a.s. on {τ < ∞}, which implies τ = ∞ a.s. This yields λZ = 0 a.s. □

To examine the global properties of Z, we may introduce the holding time γ = inf Z^c = inf{t > 0; X_t ≠ a}, which is optional by Lemma 7.6. The
following extension of Lemma 12.16 gives some more detailed information about dichotomy (i) above.

Lemma 22.8 (holding time) The time γ is exponentially distributed with mean m ∈ [0, ∞], where m = 0 or ∞ when X is continuous. Furthermore, Z̄ is a.s. nowhere dense when m = 0, and if m > 0 it is a.s. a locally finite union of intervals [σ, τ). Finally, γ ⊥⊥ X∘θ_γ when m < ∞.

Proof: The first and last assertions may be proved as in Lemma 12.16, and the statement for m = 0 was obtained in Proposition 22.7 (i). Now let 0 < m < ∞. Noting that γ∘θ_γ = 0 a.s. on {γ ∈ Z}, we get

0 = P{γ∘θ_γ > 0, γ ∈ Z} = P{γ > 0} P{γ ∈ Z} = P{γ ∈ Z},

so in this case γ ∉ Z a.s. Put σ_0 = 0, let σ_1 = γ + τ_0∘θ_γ, and define recursively σ_{n+1} = σ_n + σ_1∘θ_{σ_n}. Write γ_n = σ_n + γ∘θ_{σ_n}. Then σ_n → ∞ a.s. by the law of large numbers, and so Z = ⋃_n [σ_n, γ_n). If X is continuous, then Z is closed and the last case is excluded. □

The state a is said to be absorbing if m = ∞ and instantaneous if m = 0. In the former case clearly X ≡ a and Z = ℝ_+ a.s. Hence, to avoid trivial exceptions, we may henceforth assume that m < ∞. A separate treatment is sometimes required for the elementary case when the recurrence time γ + τ_0∘θ_γ is a.s. strictly positive. This clearly occurs when Z has a.s. only isolated points or the holding time γ is positive.

We proceed to examine the set of excursions. Since there is no first excursion in general, it is helpful first to focus on excursions of long duration. For any h > 0, let D_h denote the set of excursion paths longer than h, endowed with the σ-field 𝒟_h generated by all evaluation maps π_t, t ≥ 0. Note that D_0 is a Borel space and that D_h ∈ 𝒟_0 for all h. The number of excursions in D_h will be denoted by κ_h. The following result is a continuous-time version of Proposition 8.15.

Lemma 22.9 (long excursions) Fix any h > 0, or h ≥ 0 when the recurrence time is positive.
Then either κ_h = 0 a.s., or κ_h has a geometric distribution with mean m_h ∈ (1, ∞]. In the latter case, X has D_h-excursions Y_h^j, j ≤ κ_h, for some i.i.d. processes Y_h^1, Y_h^2, ... in D_h, where Y_h^{κ_h} is a.s. infinite when m_h < ∞.

Proof: For t ∈ [0, ∞], let κ_h^t denote the number of D_h-excursions completed at time t, and note that κ_h^t ↑ κ_h as t → ∞. Writing p_h = P{κ_h > 0}, we obtain

p_h = P{κ_h^t > 0} + P{κ_h^t = 0, κ_h∘θ_{τ_t} > 0} = P{κ_h^t > 0} + P{κ_h^t = 0} p_h.

Letting t → ∞, we get p_h = p_h + (1 − p_h) p_h, and so p_h = 0 or 1.

Now assume that p_h = 1. Put σ_0 = 0, let σ_1 denote the end of the first D_h-excursion, and recursively define σ_{n+1} = σ_n + σ_1∘θ_{σ_n}. If all excursions are finite, then clearly σ_n < ∞ a.s. for all n, and so κ_h = ∞ a.s. Thus,
the last D_h-excursion is infinite when κ_h < ∞. We may now proceed as in the proof of Proposition 8.15 to construct some i.i.d. processes Y_h^1, Y_h^2, ... in D_h such that X has D_h-excursions Y_h^j, j ≤ κ_h. Since κ_h is the index of the first infinite excursion, we note in particular that κ_h is geometrically distributed with mean q_h^{-1}, where q_h is the probability that Y_h^1 is infinite. □

Now put h̄ = inf{h > 0; κ_h = 0 a.s.}. For any h ∈ (0, h̄) we have κ_h ≥ 1 a.s., and we may define ν_h as the distribution of the first excursion in D_h. The next result shows how the ν_h can be combined into a single measure ν on D_0, the so-called excursion law of X. For convenience, we write ν[·|A] = ν(· ∩ A)/νA whenever 0 < νA < ∞.

Lemma 22.10 (excursion law, Itô) There exists a measure ν on D_0 such that νD_h ∈ (0, ∞) and ν_h = ν[·|D_h] for every h ∈ (0, h̄). Furthermore, ν is unique up to a normalization, and it is bounded iff the recurrence time is a.s. positive.

Proof: Fix any h < k in (0, h̄), and let Y_h^1, Y_h^2, ... be such as in Lemma 22.9. Then the first D_k-excursion is the first process Y_h^j that belongs to D_k, and since the Y_h^j are i.i.d. ν_h, we have

ν_k = ν_h[·|D_k], 0 < h < k < h̄. (14)

Now fix any k ∈ (0, h̄), and define ν̃_h = ν_h/ν_h D_k, h ∈ (0, k]. Then (14) yields ν̃_{h'} = ν̃_h(· ∩ D_{h'}) for any h ≤ h' ≤ k, and so ν̃_h increases as h → 0 toward a measure ν with ν(· ∩ D_h) = ν̃_h for all h ≤ k. For any h ∈ (0, h̄), we get

ν[·|D_h] = ν̃_{h∧k}[·|D_h] = ν_{h∧k}[·|D_h] = ν_h.

If ν' is another measure with the stated property, then

ν(· ∩ D_h)/νD_k = ν_h/ν_h D_k = ν'(· ∩ D_h)/ν'D_k.

As h → 0 for fixed k, we get ν = rν' with r = νD_k/ν'D_k.

If the recurrence time is positive, then (14) remains true for h = 0, and we may take ν = ν_0. Otherwise, let h < k in (0, h̄), and denote by κ_{h,k} the number of D_h-excursions up to the first completed excursion in D_k. For fixed k we have κ_{h,k} → ∞ a.s. as h → 0, since Z̄ is perfect and nowhere dense.
Now κ_{h,k} is geometrically distributed with mean

Eκ_{h,k} = (ν_h D_k)^{-1} = (ν[D_k | D_h])^{-1} = νD_h/νD_k, h < k < h̄,

and so νD_h → ∞ as h → 0. Thus, ν is unbounded. □

When the regenerative set Z has a.s. only isolated points, Lemma 22.9 already gives a complete description of the excursion structure. In the complementary case when Z̄ is a.s. perfect, we have the following fundamental representation in terms of a local time process L and an associated
Poisson point process ξ, both of which can be constructed from the array of holding times and excursions.

Theorem 22.11 (excursion local time and Poisson process, Lévy, Itô) Let X be regenerative at a and such that the closure of Z = {t; X_t = a} is a.s. perfect. Then there exist a nondecreasing, continuous, adapted process L on ℝ_+ with support Z̄ a.s., a Poisson process ξ on ℝ_+ × D_0 with intensity measure of the form λ ⊗ ν, and a constant c ≥ 0, such that Z·λ = cL a.s. and the excursions of X with associated L-values are given by the restriction of ξ to [0, L_∞] × D_0. Furthermore, the product ν·L is a.s. unique.

Proof (beginning): If Eγ = c > 0, we may define ν = ν_0/c and introduce a Poisson process ξ on ℝ_+ × D_0 with intensity measure λ ⊗ ν. Let the points of ξ be (σ_j, Y_j), j ∈ ℕ, and put σ_0 = 0. By Proposition 12.15 the differences γ_j = σ_j − σ_{j−1} are independent and exponentially distributed with mean c. Furthermore, by Proposition 12.3 the processes Y_j are independent of the σ_j and i.i.d. ν_0. Letting κ be the first index j such that Y_j is infinite, we see from Lemmas 22.8 and 22.9 that

{(γ̃_j, Ỹ_j); j ≤ κ̃} =d {(γ_j, Y_j); j ≤ κ}, (15)

where the quantities on the left are the holding times and subsequent excursions of X. By Theorem 6.10 we may redefine ξ such that (15) holds a.s. The stated conditions then become fulfilled with L = Z·λ.

Turning to the case when Eγ = 0, we may define ν as in Lemma 22.10 and let ξ be Poisson λ ⊗ ν, as before. For any h ∈ (0, h̄), the points of ξ in ℝ_+ × D_h may be enumerated from the left as (σ_h^j, Y_h^j), j ∈ ℕ, and we define κ_h as the first index j such that Y_h^j is infinite. The processes Y_h^j are clearly i.i.d. ν_h, and so by Lemma 22.9 we have

{Ỹ_h^j; j ≤ κ̃_h} =d {Y_h^j; j ≤ κ_h}, h ∈ (0, h̄), (16)

where the processes on the left are the D_h-excursions of X. Since longer excursions form subarrays, the entire collections in (16) have the same finite-dimensional distributions, and so by Theorem 6.10 we may redefine ξ such that all relations hold a.s.
Let τ_h^j denote the right endpoint of the jth excursion in D_h, and define

L_t = inf{σ_h^j; h, j > 0, τ_h^j > t}, t ≥ 0.

We need the obvious fact that, for any t ≥ 0 and h, j > 0,

L_t < σ_h^j ⟺ t < τ_h^j. (17)

To see that L is a.s. continuous, we may assume that (16) holds identically. Since ν is infinite, we may further assume the set {σ_h^j; h, j > 0} to be dense in the interval [0, L_∞]. If ΔL_t > 0, there exist some i, j, h > 0 with L_{t−} < σ_h^i < σ_h^j < L_{t+}. By (17) we get t − ε < τ_h^i ≤ τ_h^j ≤ t + ε for every ε > 0, which is impossible. Thus, ΔL_t = 0 for all t.

To prove that Z̄ ⊂ supp L a.s., we may further assume Z̄ to be perfect and nowhere dense for each ω ∈ Ω. If t ∈ Z̄, then for every ε > 0 there
exist some i, j, h > 0 with t − ε < τ_h^i < τ_h^j < t + ε, and by (17) we get L_{t−ε} ≤ σ_h^i < σ_h^j ≤ L_{t+ε}. Thus, L_{t−ε} < L_{t+ε} for all ε > 0, and so t ∈ supp L. □

In the perfect case, it remains to establish the a.s. relation Z·λ = cL for a suitable c and to show that L is unique and adapted. To avoid repetition, we postpone the proof of the former claim until Theorem 22.13. The latter statements are immediate consequences of the following result, which also suggests many explicit constructions of L. Let η_t A denote the number of excursions in a set A ∈ 𝒟_0 completed at time t ≥ 0, and note that η is an adapted, measure-valued process on D_0.

Proposition 22.12 (approximation) If A_1, A_2, ... ∈ 𝒟_0 with ∞ > νA_n → ∞, then

sup_{t≤u} | η_t A_n/νA_n − L_t | →P 0, u ≥ 0.

The same convergence holds a.s. when the A_n are nested. In particular, η_t D_h/νD_h → L_t a.s. as h → 0 for fixed t. Thus, L is a.s. determined by the regenerative set Z.

Proof: Let ξ be such as in Theorem 22.11, and put ξ_s = ξ([0, s] × ·). First assume that the A_n are nested. For any s ≥ 0 we note that (ξ_s A_n) =d (N_{s νA_n}), where N is a unit-rate Poisson process on ℝ_+. Since t^{-1}N_t → 1 a.s. by the law of large numbers and the monotonicity of N, we get

ξ_s A_n/νA_n → s a.s., s ≥ 0. (18)

As in the case of Proposition 4.24, we may strengthen this to

sup_{s≤r} | ξ_s A_n/νA_n − s | → 0 a.s., r ≥ 0. (19)

Without the nestedness assumption, we may introduce a nested sequence A'_1, A'_2, ... with νA'_n = νA_n for all n. Then (19) holds with A_n replaced by A'_n, and since the distributions on the left are the same for each n, the formula for A_n remains valid with convergence in probability. In both cases we may clearly replace r by any positive random variable. The asserted convergence now follows, as we note that ξ_{L_{t−}} ≤ η_t ≤ ξ_{L_t} for all t ≥ 0, and use the continuity of L. □
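Proposition 22.12 says that local time can be read off by counting long excursions. The simulation sketch below (plain Python; the discretization, the cutoff h, and the path counts are arbitrary choices, and zeros of the path are approximated by sign changes of a Gaussian walk) counts the excursions longer than h completed by time 1 and divides by νD_h = √(2/(πh)), the value for the Brownian excursion law in the Tanaka normalization (see Theorem 22.15 below). The sample mean should then approximate E L_1 = √(2/π) ≈ 0.80.

```python
import math
import random

def long_excursion_count(n, h, rng):
    """Number of excursions from 0 longer than h completed by time 1,
    for a Gaussian-walk approximation of Brownian motion."""
    dt = 1.0 / n
    x_prev, start, count = 0.0, 0.0, 0
    for k in range(1, n + 1):
        x = x_prev + rng.gauss(0.0, math.sqrt(dt))
        if x_prev * x <= 0.0:          # sign change ~ zero of the path
            t = k * dt
            if t - start > h:
                count += 1             # a completed excursion in D_h
            start = t
        x_prev = x
    return count

h, rng = 0.05, random.Random(3)
nu_h = math.sqrt(2.0 / (math.pi * h))  # nu{l > h} for Brownian excursions
est = [long_excursion_count(2000, h, rng) / nu_h for _ in range(600)]
mean = sum(est) / len(est)
print(round(mean, 2))                  # E L_1 = sqrt(2/pi) ~ 0.80
```

The estimate is biased for coarse grids, since the sign-change set only mimics the fractal zero set of Brownian motion at scales well above dt, but it illustrates how η_t D_h/νD_h approximates L_t.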
The excursion local time L is described most conveniently in terms of its right-continuous inverse

T_s = L^{-1}_s = inf{t ≥ 0; L_t > s}, s ≥ 0.

To state the next result, we introduce the subset Z' ⊂ Z̄, obtained from Z̄ by omission of all points that are isolated from the right. Let us further write l(u) for the length of an excursion path u ∈ D_0.
Theorem 22.13 (inverse local time) Let L, ξ, ν, and c be such as in Theorem 22.11. Then T = L^{-1} is a generalized subordinator with characteristics (c, ν∘l^{-1}) and a.s. range Z' in ℝ_+, and we have a.s.

T_s = cs + ∫_{[0,s]×D_0} l(u) ξ(dr du), s ≥ 0. (20)

Proof: We may clearly discard the null set where L fails to be continuous with support Z̄. If T_s < ∞ for some s ≥ 0, then T_s ∈ supp L = Z̄ by the definition of T, and since L is continuous, we get T_s ∉ Z̄ \ Z'. Thus, T(ℝ_+) ⊂ Z' ∪ {∞} a.s. Conversely, assume that t ∈ Z'. Then for any ε > 0 we have L_{t+ε} > L_t, and so t ≤ T∘L_t ≤ t + ε. As ε → 0, we get T∘L_t = t. Thus, Z' ⊂ T(ℝ_+) a.s.

For each s ≥ 0, the time T_s is optional by Lemma 7.6. Furthermore, it is clear from Proposition 22.12 that, as long as T_s < ∞, the process θ_s T − T_s is obtainable from X∘θ_{T_s} by a measurable mapping that is independent of s. By the regenerative property and Lemma 15.11, the process T is then a generalized subordinator, and in particular it admits a representation as in Theorem 15.4. Since the jumps of T agree with the lengths of the excursion intervals, we obtain (20) for a suitable c ≥ 0. By Lemma 1.22 the double integral in (20) equals ∫ x (ξ_s∘l^{-1})(dx), which shows that T has Lévy measure E(ξ_1∘l^{-1}) = ν∘l^{-1}.

Substituting s = L_t into (20), we get a.s. for any t ∈ Z'

t = T∘L_t = cL_t + ∫_{[0,L_t]×D_0} l(u) ξ(dr du) = cL_t + (Z^c·λ)_t.

Hence, cL_t = (Z·λ)_t a.s., which extends by continuity to arbitrary t ≥ 0. □

We may justify our terminology by showing that the semimartingale and excursion local times agree whenever both exist.

Proposition 22.14 (reconciliation) Let the continuous semimartingale X in ℝ be regenerative at some a ∈ ℝ with P{L^a ≢ 0} > 0. Then the set Z = {t; X_t = a} is a.s. perfect and nowhere dense, and L^a is a version of the excursion local time at a.

Proof: By Theorem 22.1 the state a is nonabsorbing, and so Z̄ is nowhere dense by Lemma 22.8. Since P{L^a ≢ 0} > 0 and L^a is a.s.
continuous with support in Z̄, Proposition 22.7 shows that Z̄ is a.s. perfect.

Let L be a version of the excursion local time at a, and put T = L^{-1}. Define Y_s = L^a∘T_s for s < L_∞, and let Y_s = ∞ otherwise. By the continuity of L^a we have Y_{s±} = L^a∘T_{s±} for every s < L_∞. If ΔT_s > 0, we note that L^a∘T_{s−} = L^a∘T_s, since (T_{s−}, T_s) is an excursion interval of X and L^a is continuous with support in Z̄. Thus, Y is a.s. continuous on [0, L_∞).

By Corollary 22.6 and Proposition 22.12 the process θ_s Y − Y_s is obtainable from θ_{T_s}X through the same measurable mapping for all s < L_∞. By the regenerative property and Lemma 15.11 it follows that Y is a generalized subordinator, and so by Theorem 15.4 and the continuity of Y there exists some c ≥ 0 with Y_s = cs a.s. on [0, L_∞). For t ∈ Z' we have a.s. T∘L_t = t, and therefore

L^a_t = L^a∘(T∘L_t) = (L^a∘T)∘L_t = cL_t.

This extends to ℝ_+ since both extremes are continuous with support in Z̄. □

For Brownian motion it is convenient to normalize local time according to Tanaka's formula, which leads to a corresponding normalization of the excursion law ν. By the spatial homogeneity of Brownian motion, we may restrict our attention to excursions from 0. The next result shows that excursions of different length have the same distribution apart from a scaling. For a precise statement, we may introduce the scaling operators S_r on D_0, given by

(S_r f)_t = r^{1/2} f_{t/r}, t ≥ 0, r > 0, f ∈ D_0.

Theorem 22.15 (Brownian excursion) Let ν be the normalized excursion law of Brownian motion. Then there exists a unique distribution ν̂ on the set of excursions of unit length such that

ν = (2π)^{-1/2} ∫_0^∞ (ν̂∘S_r^{-1}) r^{-3/2} dr. (21)

Proof: By Theorem 22.13 the inverse local time L^{-1} is a subordinator with Lévy measure ν∘l^{-1}, where l(u) denotes the length of u. Furthermore, L =d M by Corollary 22.3, where M_t = sup_{s≤t} B_s, and so by Theorem 15.10 the measure ν∘l^{-1} has density (2π)^{-1/2} r^{-3/2}, r > 0. As in Theorem 6.3, there exists a probability kernel (ν_r) from (0, ∞) to D_0 such that ν_r∘l^{-1} = δ_r and

ν = (2π)^{-1/2} ∫_0^∞ ν_r r^{-3/2} dr,

and we note that the measures ν_r are unique a.e. λ.

For any r > 0 the process B̂ = S_r B is again a Brownian motion, and by Corollary 22.6 the local time of B̂ at 0 equals L̂ = S_r L. If B has an excursion u ending at time t, then the corresponding excursion S_r u of B̂ ends at time rt, and the local time of B̂ at the new excursion equals L̂_{rt} = r^{1/2} L_t. Thus, the excursion process ξ̂ for B̂ is obtained from the process ξ for B through the mapping

T_r: (s, u) ↦ (r^{1/2} s, S_r u). (22)
Since $\tilde{\xi} \overset{d}{=} \xi$, each $T_r$ leaves the intensity measure $\lambda \otimes \nu$ invariant, and we get

$$\nu \circ S_r^{-1} = r^{1/2}\, \nu, \qquad r > 0. \tag{23}$$

Combining (22) and (23), we get for any $r > 0$

$$\int_0^\infty (\nu_x \circ S_r^{-1})\, x^{-3/2}\, dx = r^{1/2} \int_0^\infty \nu_x\, x^{-3/2}\, dx = \int_0^\infty \nu_{rx}\, x^{-3/2}\, dx,$$
and by the uniqueness in (22) we obtain

$$\nu_x \circ S_r^{-1} = \nu_{rx}, \qquad x > 0 \text{ a.e. } \lambda,\ r > 0.$$

By Fubini's theorem, we may then fix an $x = c > 0$ such that

$$\nu_c \circ S_r^{-1} = \nu_{cr}, \qquad r > 0 \text{ a.e. } \lambda.$$

Define $\hat{\nu} = \nu_c \circ S_{1/c}^{-1}$, and conclude that for almost every $r > 0$

$$\nu_r = \nu_{c(r/c)} = \nu_c \circ S_{r/c}^{-1} = \nu_c \circ S_{1/c}^{-1} \circ S_r^{-1} = \hat{\nu} \circ S_r^{-1}.$$

Substituting this into (22) yields equation (21). If $\mu$ is another probability measure with the stated properties, then for almost every $r > 0$ we have $\mu \circ S_r^{-1} = \hat{\nu} \circ S_r^{-1}$, and hence

$$\mu = \mu \circ S_r^{-1} \circ S_{1/r}^{-1} = \hat{\nu} \circ S_r^{-1} \circ S_{1/r}^{-1} = \hat{\nu}.$$

Thus, $\hat{\nu}$ is unique. $\Box$

By continuity of paths, an excursion of Brownian motion is either positive or negative, and by symmetry the two possibilities have the same probability $\frac{1}{2}$ under $\hat{\nu}$. This leads to the further decomposition $\hat{\nu} = \frac{1}{2}(\hat{\nu}_+ + \hat{\nu}_-)$. A process with distribution $\hat{\nu}_+$ is called a (normalized) Brownian excursion.

For subsequent needs, we continue with a simple computation.

Lemma 22.16 (height distribution) Let $\nu$ be the excursion law of Brownian motion. Then

$$\nu\{u \in D_0;\ \sup\nolimits_t u_t > h\} = (2h)^{-1}, \qquad h > 0.$$

Proof: By Tanaka's formula the process $M = 2(B \vee 0) - L^0 = B + |B| - L^0$ is a martingale, and so we get for $\tau = \inf\{t \ge 0;\ B_t = h\}$

$$E\, L^0_{\tau \wedge t} = 2 E (B_{\tau \wedge t} \vee 0), \qquad t \ge 0.$$

Hence, by monotone and dominated convergence $E L^0_\tau = 2 E(B_\tau \vee 0) = 2h$. On the other hand, Theorem 22.11 shows that $L^0_\tau$ is exponentially distributed with mean $(\nu A_h)^{-1}$, where $A_h = \{u;\ \sup_t u_t > h\}$. $\Box$

The following result gives some remarkably precise information about the spatial behavior of Brownian local time.

Theorem 22.17 (space dependence, Ray, Knight) For Brownian motion $B$ with local time $L$, let $\tau = \inf\{t > 0;\ B_t = 1\}$. Then on $[0, 1]$ the process $S_t = L^{1-t}_\tau$ is a squared Bessel process of order 2.

Several proofs are known. Here we derive the result as an application of the previously developed excursion theory.
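The intensity $(2h)^{-1}$ in Lemma 22.16 has a transparent discrete counterpart: for a simple symmetric random walk, an excursion from $0$ reaches level $m$ before returning to $0$ with probability exactly $1/(2m)$. The following sketch (our own illustration; the function names are not from the text) verifies this in exact rational arithmetic via the gambler's-ruin recursion:

```python
from fractions import Fraction

def hit_prob_before_zero(start, m):
    # P_start{ hit m before 0 } for a simple symmetric random walk.
    # The hitting probability h is harmonic: h(k) = (h(k-1) + h(k+1))/2,
    # with h(0) = 0 and h(m) = 1.  Solve by "shooting" from a trial
    # value h(1) = 1 and rescaling so that h(m) = 1.
    h = [Fraction(0)] * (m + 1)
    h[1] = Fraction(1)
    for k in range(1, m):
        h[k + 1] = 2 * h[k] - h[k - 1]
    return h[start] / h[m]

def excursion_reaches(m):
    # An excursion from 0 starts with a step to +1 or -1; only the upward
    # start (probability 1/2) can reach level m before returning to 0.
    return Fraction(1, 2) * hit_prob_before_zero(1, m)
```

With step size $1$ playing the role of a unit of local time at $0$, `excursion_reaches(m)` equals $1/(2m)$, the discrete analogue of $\nu\{\sup_t u_t > h\} = (2h)^{-1}$.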
Proof (Walsh): Fix any $u \in [0, 1]$, put $\sigma = L^u_\tau$, and let $\xi_\pm$ denote the Poisson processes of positive and negative excursions from $u$. Write $Y$ for the process $B$, stopped when it first hits $u$. Then $Y \perp\!\!\!\perp (\xi_+, \xi_-)$ and $\xi_+ \perp\!\!\!\perp \xi_-$, so $\xi_+ \perp\!\!\!\perp (\xi_-, Y)$. Since $\sigma$ is $\xi_+$-measurable, we obtain $\xi_+ \perp\!\!\!\perp_\sigma (\xi_-, Y)$, which implies the Markov property of $L^x_\tau$ at $x = u$.
To derive the corresponding transition kernels, fix any $x \in [0, u)$, and write $h = u - x$. Put $\tau_0 = 0$, and let $\tau_1, \tau_2, \ldots$ be the right endpoints of those excursions from $x$ that reach $u$. Next define $\zeta_k = L^x_{\tau_{k+1}} - L^x_{\tau_k}$, $k \ge 0$, so that $L^x_\tau = \zeta_0 + \cdots + \zeta_\kappa$ with $\kappa = \sup\{k;\ \tau_k < \tau\}$. By Lemma 22.16 the variables $\zeta_k$ are i.i.d. and exponentially distributed with mean $2h$. Since $\kappa$ agrees with the number of completed $u$-excursions before time $\tau$ that reach $x$, and since such excursions occur at rate $(2h)^{-1}$ in the local time scale, it is further seen that $\kappa$ is conditionally Poisson with mean $\sigma/2h$, given $\sigma$.

We also need the fact that $(\sigma, \kappa) \perp\!\!\!\perp (\zeta_0, \zeta_1, \ldots)$. To see this, define $\sigma_k = L^u_{\tau_k}$. Since $\xi_-$ is Poisson, we note that $(\sigma_1, \sigma_2, \ldots) \perp\!\!\!\perp (\zeta_1, \zeta_2, \ldots)$, and so $(\sigma, \sigma_1, \sigma_2, \ldots) \perp\!\!\!\perp (Y, \zeta_1, \zeta_2, \ldots)$. The desired relation now follows, since $\kappa$ is a measurable function of $(\sigma, \sigma_1, \sigma_2, \ldots)$ and $\zeta_0$ depends measurably on $Y$.

For any $s \ge 0$, we may now compute

$$E\big[ e^{-s L^x_\tau} \,\big|\, \sigma \big] = E\big[ (E e^{-s \zeta_0})^{\kappa + 1} \,\big|\, \sigma \big] = E\big[ (1 + 2sh)^{-\kappa - 1} \,\big|\, \sigma \big] = (1 + 2sh)^{-1} \exp\Big\{ \frac{-s\sigma}{1 + 2sh} \Big\}.$$

In combination with the Markov property of $L^x_\tau$, the last relation is equivalent, via the substitutions $u = 1 - t$ and $2s = (a - t)^{-1}$, to the martingale property of the process

$$M_t = (a - t)^{-1} \exp\Big\{ \frac{-L^{1-t}_\tau}{2(a - t)} \Big\}, \qquad t \in [0, a), \tag{24}$$

for arbitrary $a > 0$. Now let $X$ be a squared Bessel process of order 2, and note that $L^1_\tau = X_0 = 0$ by Theorem 22.4. By Corollary 13.12 the process $X$ is again Markov. To see that $X$ has the same transition kernel as $L^{1-t}_\tau$, it is enough to show for an arbitrary $a > 0$ that the process $M$ in (24) remains a martingale when $L^{1-t}_\tau$ is replaced by $X_t$. This is easily verified by means of Itô's formula, if we note that $X$ is a weak solution to the SDE $dX_t = 2 X_t^{1/2}\, dB_t + 2\, dt$. $\Box$

As an important application of the last result, we may show that the local time is strictly positive on the range of the process.

Corollary 22.18 (range and support) Let $M$ be a continuous local martingale with local time $L$.
Then outside a fixed $P$-null set,

$$\{L^x_t > 0\} = \big\{ \inf\nolimits_{s \le t} M_s < x < \sup\nolimits_{s \le t} M_s \big\}, \qquad x \in \mathbb{R},\ t \ge 0. \tag{25}$$

Proof: By Corollary 22.6 and the continuity of $L$, we have $L^x_t = 0$ for $x$ outside the interval in (25), except on a fixed $P$-null set. To see that $L^x_t > 0$ otherwise, we may reduce by Theorem 18.3 and Corollary 22.6 to the case when $M$ is a Brownian motion $B$. Letting $\tau_u = \inf\{t > 0;\ B_t = u\}$, we see from Theorems 18.6 (i) and 18.16 that, outside a fixed $P$-null set,

$$L^x_{\tau_u} > 0, \qquad 0 < x < u \in \mathbb{Q}_+. \tag{26}$$
If $0 < x < \sup_{s \le t} B_s$ for some $t$ and $x$, there exists some $u \in \mathbb{Q}_+$ with $x < u < \sup_{s \le t} B_s$. But then $\tau_u < t$, and (26) yields $L^x_t \ge L^x_{\tau_u} > 0$. A similar argument applies to the case when $\inf_{s \le t} B_s < x < 0$. $\Box$

Our third approach to local times is via additive functionals and their potentials. To introduce those, consider a canonical Feller process $X$ with state space $S$, associated terminal time $\zeta$, probability measures $P_x$, transition operators $T_t$, shift operators $\theta_t$, and filtration $\mathcal{F}$. By a continuous additive functional (CAF) of $X$ we mean a nondecreasing, continuous, adapted process $A$ with $A_0 = 0$ and $A_{\zeta \vee t} = A_\zeta$, and such that

$$A_{s+t} = A_s + A_t \circ \theta_s \quad \text{a.s.}, \qquad s, t \ge 0, \tag{27}$$

where a.s. without qualification means $P_x$-a.s. for every $x$. By the continuity of $A$, we may choose the exceptional null set to be independent of $t$. If it can also be taken to be independent of $s$, then $A$ is said to be perfect.

For a simple example, let $f \ge 0$ be a bounded, measurable function on $S$, and consider the associated elementary CAF

$$A_t = \int_0^t f(X_s)\, ds, \qquad t \ge 0. \tag{28}$$

More generally, given any CAF $A$ and a function $f$ as above, we may define a new CAF $f \cdot A$ by $(f \cdot A)_t = \int_{s \le t} f(X_s)\, dA_s$, $t \ge 0$. A less trivial example is given by the local time of $X$ at a fixed point $x$, whenever it exists in either sense discussed earlier.

For any CAF $A$ and constant $\alpha \ge 0$, we may introduce the associated $\alpha$-potential

$$U^\alpha_A(x) = E_x \int_0^\infty e^{-\alpha t}\, dA_t, \qquad x \in S,$$

and put $U^\alpha_A f = U^\alpha_{f \cdot A}$. In the special case when $A_t = t \wedge \zeta$, we shall often write $U^\alpha f = U^\alpha_A f$. Note in particular that $U^\alpha_A = U^\alpha f = R_\alpha f$ when $A$ is given by (28). If $\alpha = 0$, we may omit the superscript and write $U = U^0$ and $U_A = U^0_A$.

The next result shows that a CAF is determined by its $\alpha$-potential whenever the latter is finite.

Lemma 22.19 (uniqueness) Let $A$ and $B$ be CAFs of a Feller process $X$ such that $U^\alpha_A = U^\alpha_B < \infty$ for some $\alpha > 0$. Then $A = B$ a.s.
Proof: Define $A^\alpha_t = \int_{s \le t} e^{-\alpha s}\, dA_s$, and conclude from (27) and the Markov property at $t$ that, for any $x \in S$,

$$E_x[A^\alpha_\infty \,|\, \mathcal{F}_t] - A^\alpha_t = e^{-\alpha t} E_x[A^\alpha_\infty \circ \theta_t \,|\, \mathcal{F}_t] = e^{-\alpha t}\, U^\alpha_A(X_t). \tag{29}$$

Comparing with the same relation for $B$, it follows that $A^\alpha - B^\alpha$ is a continuous $P_x$-martingale of finite variation, and so $A^\alpha = B^\alpha$ a.s. $P_x$ by Proposition 17.2. Since $x$ was arbitrary, we get $A = B$ a.s. $\Box$

Given any CAF $A$ of Brownian motion in $\mathbb{R}^d$, we may introduce the associated Revuz measure $\nu_A$, given for any measurable function $g \ge 0$ on
$\mathbb{R}^d$ by $\nu_A g = \bar{E}(g \cdot A)_1$, where $\bar{E} = \int E_x\, dx$. When $A$ is given by (28), we get in particular $\nu_A g = \langle f, g \rangle$, where $\langle \cdot, \cdot \rangle$ denotes the inner product in $L^2(\mathbb{R}^d)$. In general, we need to prove that $\nu_A$ is $\sigma$-finite.

Lemma 22.20 ($\sigma$-finiteness) For any CAF $A$ of Brownian motion $X$ in $\mathbb{R}^d$, the associated Revuz measure $\nu_A$ is $\sigma$-finite.

Proof: Fix any integrable function $f > 0$ on $\mathbb{R}^d$, and define

$$g(x) = E_x \int_0^\infty e^{-t - A_t} f(X_t)\, dt, \qquad x \in \mathbb{R}^d.$$

Using Corollary 19.19, the additivity of $A$, and Fubini's theorem, we get

$$\begin{aligned}
U^1_A g(x) &= E_x \int_0^\infty e^{-t}\, dA_t\, E_{X_t} \int_0^\infty e^{-s - A_s} f(X_s)\, ds \\
&= E_x \int_0^\infty e^{-t}\, dA_t \int_0^\infty e^{-s - A_s \circ \theta_t} f(X_{s+t})\, ds \\
&= E_x \int_0^\infty e^{A_t}\, dA_t \int_t^\infty e^{-s - A_s} f(X_s)\, ds \\
&= E_x \int_0^\infty e^{-s - A_s} f(X_s)\, ds \int_0^s e^{A_t}\, dA_t \\
&= E_x \int_0^\infty e^{-s}\, (1 - e^{-A_s}) f(X_s)\, ds \ \le\ E_0 \int_0^\infty e^{-s} f(X_s + x)\, ds.
\end{aligned}$$

Hence, by Fubini's theorem,

$$e^{-1} \nu_A g \le \int U^1_A g(x)\, dx \le \int dx\, E_0 \int_0^\infty e^{-s} f(X_s + x)\, ds = E_0 \int_0^\infty e^{-s}\, ds \int f(X_s + x)\, dx = \int f(x)\, dx < \infty.$$

The assertion now follows since $g > 0$. $\Box$

Now let $p_t(x)$ denote the transition density $(2\pi t)^{-d/2} e^{-|x|^2/2t}$ of Brownian motion in $\mathbb{R}^d$, and put $u^\alpha(x) = \int_0^\infty e^{-\alpha t} p_t(x)\, dt$. For any measure $\mu$ on $\mathbb{R}^d$, we may introduce the associated $\alpha$-potential $U^\alpha \mu(x) = \int u^\alpha(x - y)\, \mu(dy)$. The following result shows that the Revuz measure has the same potential as the underlying CAF.

Theorem 22.21 ($\alpha$-potentials, Hunt, Revuz) For Brownian motion in $\mathbb{R}^d$, let $A$ be a CAF with Revuz measure $\nu_A$. Then $U^\alpha_A = U^\alpha \nu_A$ for all $\alpha > 0$.

Proof: By Lemma 22.20 we may choose some positive functions $f_n \uparrow 1$ such that $\nu_{f_n \cdot A} 1 = \nu_A f_n < \infty$ for each $n$, and by monotone convergence we have $U^\alpha_{f_n \cdot A} \uparrow U^\alpha_A$ and $U^\alpha \nu_{f_n \cdot A} \uparrow U^\alpha \nu_A$. Thus, we may further assume that $\nu_A$ is bounded. In that case, clearly $U^\alpha_A < \infty$ a.e.
Now fix any bounded, continuous function $f \ge 0$ on $\mathbb{R}^d$, and note that by dominated convergence $U^\alpha f$ is again bounded and continuous. Writing $h = n^{-1}$ for an arbitrary $n \in \mathbb{N}$, we get by dominated convergence and the additivity of $A$

$$\nu_A U^\alpha f = \bar{E} \int_0^1 U^\alpha f(X_s)\, dA_s = \lim_{n \to \infty} \bar{E} \sum\nolimits_{j < n} U^\alpha f(X_{jh})\, A_h \circ \theta_{jh}.$$

Noting that the operator $U^\alpha$ is self-adjoint and using the Markov property, we may write the expression on the right as

$$\sum\nolimits_{j < n} \bar{E}\, U^\alpha f(X_{jh})\, E_{X_{jh}} A_h = n \int U^\alpha f(x)\, E_x A_h\, dx = n\, \langle f,\, U^\alpha E_\cdot A_h \rangle.$$

To estimate the function $U^\alpha E_\cdot A_h$ on the right, it is enough to consider arguments $x$ such that $U^\alpha_A(x) < \infty$. Using the Markov property of $X$ and the additivity of $A$, we get

$$\begin{aligned}
U^\alpha E_\cdot A_h(x) &= E_x \int_0^\infty e^{-\alpha s} E_{X_s} A_h\, ds = E_x \int_0^\infty e^{-\alpha s} (A_h \circ \theta_s)\, ds \\
&= E_x \int_0^\infty e^{-\alpha s} (A_{s+h} - A_s)\, ds \\
&= (e^{\alpha h} - 1)\, E_x \int_0^\infty e^{-\alpha s} A_s\, ds - e^{\alpha h} E_x \int_0^h e^{-\alpha s} A_s\, ds. \tag{30}
\end{aligned}$$

Integrating by parts gives

$$E_x \int_0^\infty e^{-\alpha s} A_s\, ds = \alpha^{-1} E_x \int_0^\infty e^{-\alpha t}\, dA_t = \alpha^{-1} U^\alpha_A(x).$$

Thus, as $n = h^{-1} \to \infty$, the first term on the right of (30) yields in the limit the contribution $\langle f, U^\alpha_A \rangle$. The second term is negligible since

$$\langle f, E_\cdot A_h \rangle \,\lesssim\, \bar{E} A_h = h\, \nu_A 1 \to 0.$$

Hence, $\langle U^\alpha \nu_A, f \rangle = \nu_A U^\alpha f = \langle U^\alpha_A, f \rangle$, and since $f$ is arbitrary, we obtain $U^\alpha_A = U^\alpha \nu_A$ a.e.

To extend this to an identity, fix any $h > 0$ and $x \in \mathbb{R}^d$. Using the additivity of $A$, the Markov property at $h$, the a.e. relation, Fubini's theorem, and the Chapman–Kolmogorov relation, we get

$$\begin{aligned}
e^{\alpha h} E_x \int_h^\infty e^{-\alpha s}\, dA_s &= E_x \int_0^\infty e^{-\alpha s}\, dA_s \circ \theta_h = E_x U^\alpha_A(X_h) = E_x U^\alpha \nu_A(X_h) \\
&= \int \nu_A(dy)\, E_x u^\alpha(X_h - y) = e^{\alpha h} \int \nu_A(dy) \int_h^\infty e^{-\alpha s} p_s(x - y)\, ds.
\end{aligned}$$
The required relation $U^\alpha_A(x) = U^\alpha \nu_A(x)$ now follows by monotone convergence as $h \to 0$. $\Box$

It is now easy to show that a CAF is determined by its Revuz measure.

Corollary 22.22 (uniqueness) If $A$ and $B$ are CAFs of Brownian motion in $\mathbb{R}^d$ with $\nu_A = \nu_B$, then $A = B$ a.s.

Proof: By Lemma 22.20 we may assume that $\nu_A$ is bounded, so that $U^\alpha_A < \infty$ a.e. for all $\alpha > 0$. Now $\nu_A$ determines $U^\alpha_A$ by Theorem 22.21, and from the proof of Lemma 22.19 we note that $U^\alpha_A$ determines $A$ a.s. $P_x$ whenever $U^\alpha_A(x) < \infty$. Since $P_x \circ X_h^{-1} \ll \lambda^d$ for each $h > 0$, it follows that $A \circ \theta_h$ is a.s. unique, and it remains to let $h \to 0$. $\Box$

We turn to the reverse problem of constructing a CAF associated with a given potential. To motivate the following definition, we may take expected values in (29) to get $e^{-\alpha t} T_t U^\alpha_A \le U^\alpha_A$. A function $f$ on $S$ is said to be uniformly $\alpha$-excessive if it is bounded and measurable with $0 \le e^{-\alpha t} T_t f \le f$ for all $t > 0$, and such that $\|T_t f - f\| \to 0$ as $t \to 0$, where $\|\cdot\|$ denotes the supremum norm.

Theorem 22.23 (excessive functions and CAFs, Volkonsky) For any Feller process $X$ in $S$ and constant $\alpha > 0$, let $f \ge 0$ be a uniformly $\alpha$-excessive function on $S$. Then $f = U^\alpha_A$ for some a.s. unique, perfect CAF $A$ of $X$.

Proof: For any bounded, measurable function $g$ on $S$, we get by Fubini's theorem and the Markov property of $X$

$$\begin{aligned}
E_x \Big( \int_0^\infty e^{-\alpha t} g(X_t)\, dt \Big)^2 &= 2 E_x \int_0^\infty e^{-\alpha t} g(X_t)\, dt \int_0^\infty e^{-\alpha (t+h)} g(X_{t+h})\, dh \\
&= 2 E_x \int_0^\infty e^{-2\alpha t} g(X_t)\, dt \int_0^\infty e^{-\alpha h} T_h g(X_t)\, dh \\
&= 2 E_x \int_0^\infty e^{-2\alpha t}\, g U^\alpha g(X_t)\, dt = 2 \int_0^\infty e^{-2\alpha t}\, T_t\, g U^\alpha g(x)\, dt \\
&\le 2 \|U^\alpha g\| \int_0^\infty e^{-\alpha t}\, T_t |g|(x)\, dt \ \le\ 2 \|U^\alpha g\|\, \|U^\alpha |g|\|. \tag{31}
\end{aligned}$$

Now introduce for each $h > 0$ the bounded, nonnegative functions

$$g_h = h^{-1}(f - e^{-\alpha h} T_h f), \qquad f_h = U^\alpha g_h = h^{-1} \int_0^h e^{-\alpha s} T_s f\, ds,$$

and define

$$A_h(t) = \int_0^t e^{-\alpha s} g_h(X_s)\, ds, \qquad M_h(t) = A_h(t) + e^{-\alpha t} f_h(X_t).$$

As in (29), we note that the processes $M_h$ are martingales under $P_x$ for every $x$. Using the continuity of the $A_h$, we get by Proposition 7.16 and
(31), for any $x \in S$ and as $h, k \to 0$,

$$\begin{aligned}
E_x (A_h - A_k)^{*2} &\lesssim E_x \sup\nolimits_{t \in \mathbb{Q}_+} |M_h(t) - M_k(t)|^2 + \|f_h - f_k\|^2 \\
&\lesssim E_x |A_h(\infty) - A_k(\infty)|^2 + \|f_h - f_k\|^2 \\
&\lesssim \|f_h - f_k\|\, \|f_h + f_k\| + \|f_h - f_k\|^2 \to 0.
\end{aligned}$$

Hence, there exists some continuous process $A^\alpha$, independent of $x$, such that $E_x (A_h - A^\alpha)^{*2} \to 0$ for every $x$. For a suitable sequence $h_n \to 0$ we have $(A_{h_n} - A^\alpha)^* \to 0$ a.s. $P_x$ for all $x$, and it follows easily that the associated process $A$ is a.s. a perfect CAF. Taking limits in the relation $f_h(x) = E_x A_h(\infty)$, we also note that $f(x) = E_x A^\alpha(\infty) = U^\alpha_A(x)$. Thus, $A$ has $\alpha$-potential $f$. $\Box$

We will now use the last result to construct local times. Let us say that a CAF $A$ is supported by some set $B \subset S$ if its set of increase is a.s. contained in the closure of the set $\{t \ge 0;\ X_t \in B\}$. In particular, a nonzero and perfect CAF supported by a singleton set $\{x\}$ is called a local time at $x$. This terminology is clearly consistent with our earlier definitions of local time. Writing $\tau_x = \inf\{t > 0;\ X_t = x\}$, we say that $x$ is regular (for itself) if $\tau_x = 0$ a.s. $P_x$. By Proposition 22.7 this holds iff $P_x$-a.s. the random set $Z_x = \{t \ge 0;\ X_t = x\}$ has no isolated points.

Theorem 22.24 (additive functional local time, Blumenthal and Getoor) A Feller process in $S$ has a local time $L$ at a point $a \in S$ iff $a$ is regular. In that case $L$ is a.s. unique up to a normalization, and

$$U^1_L(x) = U^1_L(a)\, E_x e^{-\tau_a} < \infty, \qquad x \in S. \tag{32}$$

Proof: Let $L$ be a local time at $a$. Comparing with the renewal process $L^{-1}_n$, $n \in \mathbb{Z}_+$, we see that $\sup_{x,t} E_x (L_{t+h} - L_t) < \infty$ for every $h > 0$, which implies $U^1_L(x) < \infty$ for all $x$. By the strong Markov property at $\tau = \tau_a$, we get for any $x \in S$

$$U^1_L(x) = E_x (L^1_\infty - L^1_\tau) = E_x e^{-\tau} (L^1_\infty \circ \theta_\tau) = E_x e^{-\tau}\, E_a L^1_\infty = U^1_L(a)\, E_x e^{-\tau},$$

proving (32). The uniqueness assertion now follows by Lemma 22.19.

To prove the existence of $L$, define $f(x) = E_x e^{-\tau}$, and note that $f$ is bounded and measurable.
Since $\tau \le t + \tau \circ \theta_t$, we may also conclude from the Markov property at $t$ that, for any $x \in S$,

$$f(x) = E_x e^{-\tau} \ge e^{-t} E_x (e^{-\tau} \circ \theta_t) = e^{-t} E_x E_{X_t} e^{-\tau} = e^{-t} E_x f(X_t) = e^{-t} T_t f(x).$$

Noting that $\sigma_t = t + \tau \circ \theta_t$ is nondecreasing and tends to $0$ a.s. $P_a$ as $t \to 0$ by the regularity of $a$, we further obtain

$$0 \le f(x) - e^{-h} T_h f(x) = E_x (e^{-\tau} - e^{-\sigma_h}) \le E_x (e^{-\tau} - e^{-\sigma_h \circ \theta_\tau - \tau}) = E_x e^{-\tau}\, E_a (1 - e^{-\sigma_h}) \le E_a (1 - e^{-\sigma_h}) \to 0.$$
Thus, $f$ is uniformly 1-excessive, and so by Theorem 22.23 there exists a perfect CAF $L$ with $U^1_L = f$. To see that $L$ is supported by the singleton $\{a\}$, we may write

$$E_x (L^1_\infty - L^1_\tau) = E_x e^{-\tau}\, E_a L^1_\infty = E_x e^{-\tau}\, E_a e^{-\tau} = E_x e^{-\tau} = E_x L^1_\infty,$$

which implies $L^1_\tau = 0$ a.s. Hence, $L_\tau = 0$ a.s., and so the Markov property yields $L_{\sigma_t} = L_t$ a.s. for all rational $t$. This shows that $L$ has a.s. no point of increase outside the closure of $\{t \ge 0;\ X_t = a\}$. $\Box$

The next result shows that every CAF of one-dimensional Brownian motion is a unique mixture of local times. Recall that $\nu_A$ denotes the Revuz measure of the CAF $A$.

Theorem 22.25 (integral representation, Volkonsky, McKean and Tanaka) For Brownian motion $X$ in $\mathbb{R}$ with local time $L$, a process $A$ is a CAF of $X$ iff it has an a.s. representation

$$A_t = \int_{-\infty}^\infty L^x_t\, \nu(dx), \qquad t \ge 0, \tag{33}$$

for some locally finite measure $\nu$ on $\mathbb{R}$. The latter is then unique and equals $\nu_A$.

Proof: For any measure $\nu$ we may define an associated process $A$ as in (33). If $\nu$ is locally finite, it is clear by the continuity of $L$ and dominated convergence that $A$ is a.s. continuous, hence a CAF. In the opposite case, we note that $\nu$ is infinite in every neighborhood of some point $a \in \mathbb{R}$. Under $P_a$ and for any $t > 0$, the process $L^x_t$ is further a.s. continuous and strictly positive near $x = a$. Hence, $A_t = \infty$ a.s. $P_a$, and $A$ fails to be a CAF.

Next, we conclude from Fubini's theorem and Theorem 22.5 that

$$\bar{E} L^x_1 = \int (E_y L^x_1)\, dy = E_0 \int L^{x-y}_1\, dy = 1.$$

Since $L^x$ is supported by $\{x\}$, we get for any CAF $A$ as in (33)

$$\nu_A f = \bar{E} (f \cdot A)_1 = \bar{E} \int \nu(dx) \int_0^1 f(X_t)\, dL^x_t = \int f(x)\, \nu(dx)\, \bar{E} L^x_1 = \nu f,$$

which shows that $\nu = \nu_A$. Now consider an arbitrary CAF $A$. By Lemma 22.20 there exists some function $f > 0$ with $\nu_A f < \infty$. The process

$$B_t = \int L^x_t\, \nu_{f \cdot A}(dx) = \int L^x_t\, f(x)\, \nu_A(dx), \qquad t \ge 0,$$

is then a CAF with $\nu_B = \nu_{f \cdot A}$, and by Corollary 22.22 we get $B = f \cdot A$ a.s. Thus, $A = f^{-1} \cdot B$ a.s., and (33) follows. $\Box$
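Behind (33) and the Revuz measure is the occupation-density idea: integrating $f$ along the path of $X$ is the same as integrating $f$ against the occupation (local-time) profile in space. For a random-walk path this is exact bookkeeping, and the following sketch (our own illustration, with ad hoc names) checks the two sides agree on one simulated path:

```python
import random
from collections import Counter

random.seed(1)

# One sample path of a simple random walk, a discrete stand-in for X.
steps = 10_000
path = [0]
for _ in range(steps):
    path.append(path[-1] + random.choice((-1, 1)))

def f(x):
    return x * x  # any nonnegative density for the "Revuz measure"

# Time side: the elementary additive functional A_t = sum_{s<t} f(X_s).
A = sum(f(x) for x in path[:-1])

# Space side: integrate f against the occupation profile,
# L^x = number of visits to site x before time t.
L = Counter(path[:-1])
A_via_local_time = sum(f(x) * n for x, n in L.items())
```

The two totals are identical by construction, which is the discrete shadow of $A_t = \int L^x_t\, \nu(dx)$ with $\nu(dx) = f(x)\, dx$.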
Exercises

1. Use Lemma 13.15 to show that the set of increase of Brownian local time at 0 agrees a.s. with the zero set $Z$. Extend the result to any continuous local martingale. (Hint: Apply Lemma 13.15 to the process $\operatorname{sgn}(B) \cdot B$ in Theorem 22.1.)

2. (Lévy) Let $M$ be the maximum process of a Brownian motion $B$. Show that $B$ can be measurably recovered from $M - B$. (Hint: Use Corollaries 22.3 and 22.6.)

3. Use Corollary 22.3 to give a simple proof of the relation $\tau_2 \overset{d}{=} \tau_3$ in Theorem 13.16. (Hint: Recall that the maximum is unique by Lemma 13.15.) Also use Proposition 18.9 to give a direct proof of the relation $\tau_1 \overset{d}{=} \tau_2$. (Hint: Integrate separately over the positive and negative excursions of $B$, and use Lemma 13.15 to identify the minimum.)

4. Show that for any $c \in (0, \frac{1}{2})$, Brownian local time $L^x_t$ is a.s. Hölder continuous in $x$ with exponent $c$, uniformly for bounded $t$. Also show that the bound $c < \frac{1}{2}$ is best possible. (Hint: Apply Theorem 3.23 to the estimate in the proof of Theorem 22.4. For the last assertion, use Theorem 22.17.)

5. Let $M$ be a continuous local martingale such that $M = B \circ [M]$ a.s. for some Brownian motion $B$. Show that if $B$ has local time $L^x_t$, then the local time of $M$ at $x$ equals $L^x \circ [M]$. (Hint: Use Theorem 22.5, and note that $L^x \circ [M]$ is jointly continuous.)

6. For any continuous semimartingale $X$, show that $\int f(X_s, s)\, d[X]_s = \int dx \int f(x, s)\, dL^x_s$ outside a fixed null set. (Hint: Extend Theorem 22.5 by a monotone class argument.)

7. Let $Z$ be the zero set of Brownian motion $B$. Use Proposition 22.12 and Theorem 22.15 to construct its local time $L$ directly from $Z$. Also use Lemma 22.16 to construct $L$ from the heights of the excursions of $B$. Finally, use Corollary 22.6 to construct $L$ from the occupation measure of $B$.

8. Let $\eta$ be the maximum of a Brownian excursion. Show that $E\eta = (\pi/2)^{1/2}$. (Hint: Use Theorem 22.15 and Lemmas 22.16 and 3.4.)

9. Let $L$ be the continuous local time of a continuous local martingale $M$ with $[M]_\infty = \infty$ a.s.
Show that a.s. $L^x_t \to \infty$ as $t \to \infty$, uniformly on compacts. (Hint: Reduce to the case of Brownian motion. Then use Corollary 22.18, the strong Markov property, and the law of large numbers.)

10. Show that the intersection of two regenerative sets is regenerative.

11. Let $L$ be the local time of a regenerative set, and let $\tau$ be an independent, exponentially distributed time. Show that $L_\tau$ is again exponentially distributed. (Hint: Prove a Cauchy equation for the function $P\{L_\tau > s\}$.)

12. For any unbounded regenerative set $Z$, show that $\mathcal{L}(Z)$ is a.s. determined by $Z$. (Hint: Use the law of large numbers.)
13. Let $Z$ be a nontrivial regenerative set. Show that $cZ \overset{d}{=} Z$ for all $c > 0$ iff the inverse local time is strictly stable.

14. Let $X$ be a Feller process in $\mathbb{R}$, and put $M_t = \sup_{s \le t} X_s$. Show that the points of increase of $M$ form a regenerative set. Also prove the same statement for the process $X^*_t = \sup_{s \le t} |X_s|$ when $-X \overset{d}{=} X$.

15. Let $X$ be a strictly stable Lévy process, let $Z$ denote the set of increase of the process $M_t = \sup_{s \le t} X_s$, and write $L$ for the local time of $Z$. Assuming $Z$ to be nontrivial, show that $L^{-1}$ is strictly stable. Also prove the corresponding statement for $X^*$ when $X$ is symmetric.

16. Give an explicit construction of the process $X$ in Theorem 22.11, based on the Poisson process $\xi$ and the constant $c$. (Hint: Use Theorem 22.13 to construct the time scale.)

17. Show that semimartingale local time is preserved under a change of measure $Q = Z_t \cdot P$. Use this result to extend Corollary 22.18 to Brownian motion with a suitable drift. (Hint: Use Proposition 18.20 and Corollary 18.25.)

18. Show that the notion of a continuous additive functional is preserved under a suitable change of measure $Q = Z_t \cdot P$. Use this result to extend Theorem 22.25 to a Brownian motion with drift.
Chapter 23

One-Dimensional SDEs and Diffusions

Weak existence and uniqueness; pathwise uniqueness and comparison; scale function and speed measure; time-change representation; boundary classification; entrance boundaries and Feller properties; ratio ergodic theorem; recurrence and ergodicity

By a diffusion is usually understood a continuous strong Markov process, sometimes required to possess additional regularity properties. The basic example of a diffusion process is Brownian motion, which was first introduced and studied in Chapter 13. More general diffusions, first encountered in Chapter 19, were studied extensively in Chapter 21 as solutions to suitable stochastic differential equations (SDEs). This chapter focuses on the one-dimensional case, which allows a more detailed analysis. Martingale methods are used throughout the chapter, and we make essential use of results on random time-change from Chapters 17 and 18, as well as on local time, excursions, and additive functionals from Chapter 22.

After considering the Engelbert–Schmidt characterization of weak existence and uniqueness for the equation $dX_t = \sigma(X_t)\, dB_t$, we turn to a discussion of various pathwise uniqueness and comparison results for the corresponding equation with drift. Next we proceed to a systematic study of regular diffusions, introduce the notions of scale function and speed measure, and prove the basic representation of a diffusion on a natural scale as a time-changed Brownian motion. Finally, we characterize the different types of boundary behavior, establish the Feller properties for a suitable extension of the process, and examine the recurrence and ergodic properties in the various cases.

To begin with the SDE approach, consider the general one-dimensional diffusion equation $(\sigma, b)$, given by

$$dX_t = \sigma(X_t)\, dB_t + b(X_t)\, dt. \tag{1}$$

From Theorem 21.11 we know that if weak existence and uniqueness in law hold for (1), then the solution process $X$ is a continuous strong Markov process. It is clearly also a semimartingale.

In Proposition 21.12 we saw how the drift term can sometimes be eliminated through a suitable change of the underlying probability measure.
Under suitable regularity conditions on the coefficients, we may use the alternative approach of transforming the state space. Let us then assume that $X$ solves (1), and put $Y_t = p(X_t)$ for some function $p \in C^1$ possessing an absolutely continuous derivative $p'$ with density $p''$. By the generalized Itô formula of Theorem 22.5, we have

$$dY_t = p'(X_t)\, dX_t + \tfrac{1}{2} p''(X_t)\, d[X]_t = (\sigma p')(X_t)\, dB_t + \big( \tfrac{1}{2} \sigma^2 p'' + b p' \big)(X_t)\, dt.$$

Here the drift term vanishes iff $p$ solves the ordinary differential equation

$$\tfrac{1}{2} \sigma^2 p'' + b p' = 0. \tag{2}$$

If $b/\sigma^2$ is locally integrable, then (2) has the explicit solutions

$$p'(x) = c\, \exp\Big\{ -2 \int_0^x (b \sigma^{-2})(u)\, du \Big\}, \qquad x \in \mathbb{R},$$

where $c$ is an arbitrary constant. The desired scale function $p$ is then determined up to an affine transformation, and for $c > 0$ it is strictly increasing with a unique inverse $p^{-1}$. The mapping by $p$ reduces (1) to the form $dY_t = \tilde{\sigma}(Y_t)\, dB_t$, where $\tilde{\sigma} = (\sigma p') \circ p^{-1}$. Since the new equation is equivalent, it is clear that weak or strong existence or uniqueness hold simultaneously for the two equations.

Once the drift has been removed, we are left with an equation of the form

$$dX_t = \sigma(X_t)\, dB_t. \tag{3}$$

Here exact criteria for weak existence and uniqueness may be given in terms of the singularity sets

$$S_\sigma = \Big\{ x \in \mathbb{R};\ \int_{x-}^{x+} \sigma^{-2}(y)\, dy = \infty \Big\}, \qquad N_\sigma = \{ x \in \mathbb{R};\ \sigma(x) = 0 \}.$$

Theorem 23.1 (existence and uniqueness, Engelbert and Schmidt) Weak existence holds for equation (3) with arbitrary initial distribution iff $S_\sigma \subset N_\sigma$. In that case, uniqueness in law holds for every initial distribution iff $S_\sigma = N_\sigma$.

Our proof begins with a lemma, which will also be useful later. Given any measure $\nu$ on $\mathbb{R}$, we may introduce the associated singularity set $S_\nu = \{ x \in \mathbb{R};\ \nu(x-, x+) = \infty \}$. If $B$ is a one-dimensional Brownian motion with associated local time $L$, we may also introduce the additive functional

$$A_s = \int L^x_s\, \nu(dx), \qquad s \ge 0. \tag{4}$$
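As a numerical sanity check on the formula for $p'$, one can pick a concrete drift, build $p'$ by quadrature, and verify that $\frac{1}{2}\sigma^2 p'' + b p'$ vanishes. The sketch below does this for the illustrative choice $b(x) = -x$, $\sigma \equiv 1$ (an Ornstein–Uhlenbeck-type drift, for which $p'(x) = e^{x^2}$ in closed form); all function names and tolerances are our own:

```python
import math

def b(x):
    return -x          # example drift (our choice, not from the text)

def sigma(x):
    return 1.0

def p_prime(x, n=10_000):
    # p'(x) = exp(-2 * int_0^x (b / sigma^2)(u) du), trapezoidal quadrature.
    s = 0.0
    h = x / n
    for k in range(n):
        u0, u1 = k * h, (k + 1) * h
        s += 0.5 * h * (b(u0) / sigma(u0) ** 2 + b(u1) / sigma(u1) ** 2)
    return math.exp(-2.0 * s)

def generator_on_p(x, eps=1e-4):
    # (1/2) sigma^2 p'' + b p', with p'' from a central difference of p'.
    dpp = (p_prime(x + eps) - p_prime(x - eps)) / (2 * eps)
    return 0.5 * sigma(x) ** 2 * dpp + b(x) * p_prime(x)
```

For this drift, `generator_on_p` returns values numerically indistinguishable from zero, confirming that the drift term of $p(X)$ vanishes, i.e., that $p$ is a scale function.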
Lemma 23.2 (singularity set) Let $L$ be the local time of Brownian motion $B$ with arbitrary initial distribution, and define $A$ by (4) for some measure $\nu$ on $\mathbb{R}$. Then a.s.

$$\inf\{ s \ge 0;\ A_s = \infty \} = \inf\{ s \ge 0;\ B_s \in S_\nu \}.$$

Proof: Fix any $t > 0$, and let $R$ be the event where $B_s \notin S_\nu$ on $[0, t]$. Noting that $L^x_t = 0$ a.s. for $x$ outside the range $B[0, t]$, we get a.s. on $R$

$$A_t = \int_{-\infty}^\infty L^x_t\, \nu(dx) \le \nu(B[0, t])\, \sup\nolimits_x L^x_t < \infty,$$

since $B[0, t]$ is compact and $L^x_t$ is a.s. continuous, hence bounded.

Conversely, suppose that $B_s \in S_\nu$ for some $s < t$. To show that $A_t = \infty$ a.s. on this event, we may use the strong Markov property to reduce to the case when $B_0 = a$ is nonrandom in $S_\nu$. But then $L^a_t > 0$ a.s. by Tanaka's formula, and so by the continuity of $L$ we get for small enough $\varepsilon > 0$

$$A_t = \int_{-\infty}^\infty L^x_t\, \nu(dx) \ge \nu(a - \varepsilon, a + \varepsilon) \inf_{|x - a| < \varepsilon} L^x_t = \infty. \qquad \Box$$

Proof of Theorem 23.1: First assume that $S_\sigma \subset N_\sigma$. To prove the asserted weak existence, let $Y$ be a Brownian motion with arbitrary initial distribution $\mu$, and define $\zeta = \inf\{ s \ge 0;\ Y_s \in S_\sigma \}$. By Lemma 23.2 the additive functional

$$A_s = \int_0^s \sigma^{-2}(Y_r)\, dr, \qquad s \ge 0, \tag{5}$$

is continuous and strictly increasing on $[0, \zeta)$, and for $t > \zeta$ we have $A_t = \infty$. Also note that $A_\zeta = \infty$ when $\zeta = \infty$, whereas $A_\zeta$ may be finite when $\zeta < \infty$. In the latter case, $A$ jumps from $A_\zeta$ to $\infty$ at time $\zeta$. Now introduce the inverse

$$\tau_t = \inf\{ s \ge 0;\ A_s > t \}, \qquad t \ge 0. \tag{6}$$

The process $\tau$ is clearly continuous and strictly increasing on $[0, A_\zeta)$, and for $t \ge A_\zeta$ we have $\tau_t = \zeta$. Also note that $X_t = Y_{\tau_t}$ is a continuous local martingale and, moreover,

$$t = A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_r)\, dr = \int_0^t \sigma^{-2}(X_s)\, d\tau_s, \qquad t < A_\zeta.$$

Hence, for $t < A_\zeta$,

$$[X]_t = \tau_t = \int_0^t \sigma^2(X_s)\, ds. \tag{7}$$

Here both sides remain constant after time $A_\zeta$, since $S_\sigma \subset N_\sigma$, and so (7) remains true for all $t \ge 0$. Hence, Theorem 18.12 yields the existence of a Brownian motion $B$ satisfying (3), which means that $X$ is a weak solution with initial distribution $\mu$.
To prove the converse implication, assume that weak existence holds for any initial distribution. To show that $S_\sigma \subset N_\sigma$, we may fix any $x \in S_\sigma$ and
choose a solution $X$ with $X_0 = x$. Since $X$ is a continuous local martingale, Theorem 18.4 yields $X_t = Y_{\tau_t}$ for some Brownian motion $Y$ starting at $x$ and some random time-change $\tau$ satisfying (7). For $A$ as in (5) and for $t \ge 0$ we have

$$A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_r)\, dr = \int_0^t \sigma^{-2}(X_s)\, d\tau_s = \int_0^t 1\{\sigma(X_s) > 0\}\, ds \le t. \tag{8}$$

Since $A_s = \infty$ for $s > 0$ by Lemma 23.2, we get $\tau_t = 0$ a.s., and so $X_t = x$ a.s. But then $x \in N_\sigma$ by (7).

Turning to the uniqueness assertion, assume that $N_\sigma \subset S_\sigma$, and consider a solution $X$ with initial distribution $\mu$. As before, we may write $X_t = Y_{\tau_t}$ a.s., where $Y$ is a Brownian motion with initial distribution $\mu$ and $\tau$ is a random time-change satisfying (7). Define $A$ as in (5), put $\chi = \inf\{ t \ge 0;\ X_t \in S_\sigma \}$, and note that $\tau_\chi = \zeta = \inf\{ s \ge 0;\ Y_s \in S_\sigma \}$. Since $N_\sigma \subset S_\sigma$, we get as in (8)

$$A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_s)\, ds = t, \qquad t < \chi.$$

Furthermore, $A_s = \infty$ for $s > \zeta$ by Lemma 23.2, and so (8) implies $\tau_t \le \zeta$ a.s. for all $t$, which means that $\tau$ remains constant after time $\chi$. Thus, $\tau$ and $A$ are related by (6), which shows that $\tau$, and then also $X$, are measurable functions of $Y$. Since the distribution of $Y$ depends only on $\mu$, the same thing is true for $X$, which proves the asserted uniqueness in law.

To prove the converse, assume that $S_\sigma$ is a proper subset of $N_\sigma$, and fix any $x \in N_\sigma \setminus S_\sigma$. As before, we may construct a solution starting at $x$ by writing $X_t = Y_{\tau_t}$, where $Y$ is a Brownian motion starting at $x$, and $\tau$ is defined as in (6) from the process $A$ in (5). Since $x \notin S_\sigma$, Lemma 23.2 gives $A_{0+} < \infty$ a.s., and so $\tau_t > 0$ a.s. for $t > 0$, which shows that $X$ is a.s. nonconstant. Since $x \in N_\sigma$, equation (3) also has the trivial solution $X_t \equiv x$. Thus, uniqueness in law fails for solutions starting at $x$. $\Box$

Proceeding with a study of pathwise uniqueness, we return to equation (1), and let $w(\sigma, \cdot)$ denote the modulus of continuity of $\sigma$.
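Theorem 23.1 is easy to apply to the power-law family $\sigma(x) = |x|^\gamma$ with $\gamma > 0$: here $N_\sigma = \{0\}$, while $0 \in S_\sigma$ iff $\int_0^\varepsilon y^{-2\gamma}\, dy = \infty$, i.e., iff $\gamma \ge \frac{1}{2}$. So weak existence always holds, and uniqueness in law holds exactly when $\gamma \ge \frac{1}{2}$; for $\gamma < \frac{1}{2}$, both $X \equiv 0$ and a nonconstant time-changed Brownian motion solve the equation from $0$. A tiny sketch of this bookkeeping, with our own function names:

```python
def singular_at_zero(gamma):
    # For sigma(x) = |x|**gamma (gamma > 0), sigma^{-2}(y) = |y|**(-2*gamma),
    # and int_0^eps y^(-2*gamma) dy diverges iff the exponent 2*gamma >= 1.
    return 2 * gamma >= 1

def classify(gamma):
    # The two singularity sets of Theorem 23.1 for sigma(x) = |x|**gamma.
    S = {0} if singular_at_zero(gamma) else set()   # S_sigma
    N = {0}                                         # N_sigma = zeros of sigma
    weak_existence = S <= N        # S_sigma subset of N_sigma
    uniqueness_in_law = (S == N)   # S_sigma equals N_sigma
    return weak_existence, uniqueness_in_law
```

For instance, `classify(0.3)` reports weak existence without uniqueness in law, while `classify(0.5)` and any larger exponent report both.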
Theorem 23.3 (pathwise uniqueness, Skorohod, Yamada and Watanabe) Let $\sigma$ and $b$ be bounded, measurable functions on $\mathbb{R}$, where

$$\int_0^\varepsilon (w(\sigma, h))^{-2}\, dh = \infty, \qquad \varepsilon > 0, \tag{9}$$

and either $b$ is Lipschitz continuous or $\sigma \ne 0$. Then pathwise uniqueness holds for equation $(\sigma, b)$.

The significance of condition (9) is clarified by the following lemma, where for any semimartingale $Y$ we write $L^x_t(Y)$ for the associated local time.
Lemma 23.4 (local time) For $i = 1, 2$, let $X^i$ solve equation $(\sigma, b^i)$, where $\sigma$ satisfies (9). Then $L^0(X^1 - X^2) = 0$ a.s.

Proof: Write $Y = X^1 - X^2$, $L^x = L^x_t(Y)$, and $w(x) = w(\sigma, |x|)$. Using (1) and Theorem 22.5, we get for any $t \ge 0$

$$\int_{-\infty}^\infty \frac{L^x_t\, dx}{w^2(x)} = \int_0^t \frac{d[Y]_s}{w^2(Y_s)} = \int_0^t \bigg\{ \frac{\sigma(X^1_s) - \sigma(X^2_s)}{w(X^1_s - X^2_s)} \bigg\}^2 ds \le t < \infty.$$

By (9) and the right-continuity of $L^x_t$ in $x$, it follows that $L^0 = 0$ a.s. $\Box$

Proof of Theorem 23.3 for $\sigma \ne 0$: By Propositions 21.12 and 21.13 combined with a simple localization argument, we note that uniqueness in law holds for equation $(\sigma, b)$ when $\sigma \ne 0$. To prove the pathwise uniqueness, consider any two solutions $X$ and $Y$ with $X_0 = Y_0$ a.s. Using Tanaka's formula, Lemma 23.4, and equation $(\sigma, b)$, we get

$$\begin{aligned}
d(X_t \vee Y_t) &= dX_t + d(Y_t - X_t)^+ = dX_t + 1\{Y_t > X_t\}\, d(Y_t - X_t) \\
&= 1\{Y_t \le X_t\}\, dX_t + 1\{Y_t > X_t\}\, dY_t \\
&= \sigma(X_t \vee Y_t)\, dB_t + b(X_t \vee Y_t)\, dt,
\end{aligned}$$

which shows that $X \vee Y$ is again a solution. By the uniqueness in law we get $X \overset{d}{=} X \vee Y$. Since $X \le X \vee Y$, it follows that $X = X \vee Y$ a.s., which implies $Y \le X$ a.s. Similarly, $X \le Y$ a.s. $\Box$

The assertion for Lipschitz continuous $b$ is a special case of the following comparison result.

Theorem 23.5 (weak comparison, Skorohod, Yamada) Fix some functions $\sigma$ and $b_1 \ge b_2$, where $\sigma$ satisfies (9) and either $b_1$ or $b_2$ is Lipschitz continuous. For $i = 1, 2$, let $X^i$ solve equation $(\sigma, b_i)$, and assume that $X^1_0 \ge X^2_0$ a.s. Then $X^1 \ge X^2$ a.s.

Proof: By symmetry we may assume that $b_1$ is Lipschitz continuous. Since $X^2_0 \le X^1_0$ a.s., we get by Tanaka's formula and Lemma 23.4

$$(X^2_t - X^1_t)^+ = \int_0^t 1\{X^2_s > X^1_s\} \big( \sigma(X^2_s) - \sigma(X^1_s) \big)\, dB_s + \int_0^t 1\{X^2_s > X^1_s\} \big( b_2(X^2_s) - b_1(X^1_s) \big)\, ds.$$
23. One-dimensional SDEs and Diffusions 455

Using the martingale property of the first term, the Lipschitz continuity of b_1, and the condition b_2 ≤ b_1, we conclude that

  E(X^2_t − X^1_t)^+ ≤ E ∫_0^t 1{X^2_s > X^1_s} (b_1(X^2_s) − b_1(X^1_s)) ds
    ≤ c E ∫_0^t 1{X^2_s > X^1_s} |X^2_s − X^1_s| ds = c ∫_0^t E(X^2_s − X^1_s)^+ ds,

where c is the Lipschitz constant of b_1. By Gronwall's lemma E(X^2_t − X^1_t)^+ = 0, and hence X^2_t ≤ X^1_t a.s. □

Imposing stronger restrictions on the coefficients, we may strengthen the last conclusion to a strict inequality.

Theorem 23.6 (strict comparison) Fix a Lipschitz continuous function σ and some continuous functions b_1 > b_2. For i = 1, 2, let X^i solve equation (σ, b_i), and assume that X^1_0 ≥ X^2_0 a.s. Then X^1 > X^2 a.s. on (0, ∞).

Proof: Since the b_i are continuous with b_1 > b_2, there exists a locally Lipschitz continuous function b on ℝ with b_1 > b > b_2. By Theorem 21.3 equation (σ, b) has a solution X with X_0 = X^1_0 ≥ X^2_0 a.s., and it suffices to show that X^1 ≥ X > X^2 a.s. on (0, ∞). This reduces the discussion to the case when one of the functions b_i is locally Lipschitz. By symmetry we may take that function to be b_1.

By the Lipschitz continuity of σ and b_1, we may define some continuous semimartingales U and V by

  U_t = ∫_0^t (b_1(X^2_s) − b_2(X^2_s)) ds,
  V_t = ∫_0^t (σ(X^1_s) − σ(X^2_s))/(X^1_s − X^2_s) dB_s + ∫_0^t (b_1(X^1_s) − b_1(X^2_s))/(X^1_s − X^2_s) ds,

subject to the convention 0/0 = 0, and we note that

  d(X^1_t − X^2_t) = dU_t + (X^1_t − X^2_t) dV_t.

Letting Z = exp(V − ½[V]) > 0, we get by Proposition 21.2

  X^1_t − X^2_t = Z_t (X^1_0 − X^2_0) + Z_t ∫_0^t Z_s^{-1} (b_1(X^2_s) − b_2(X^2_s)) ds,

and the assertion follows since X^1_0 ≥ X^2_0 a.s. and b_1 > b_2. □

We turn to a systematic study of one-dimensional diffusions. By a diffusion on some interval I ⊂ ℝ we mean a continuous strong Markov process taking values in I. Termination will only be allowed at open endpoints of I. We define τ_y = inf{t > 0; X_t = y} and say that X is regular if P_x{τ_y < ∞} > 0 for any x ∈ I° and y ∈ I. Let us further write τ_{a,b} = τ_a ∧ τ_b.
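The comparison in Theorems 23.5 and 23.6 can be seen in a simple numerical sketch (an illustration only, not part of the text): run an Euler–Maruyama scheme for the two equations with the same driving Brownian increments. For the discrete scheme to preserve the ordering exactly, we take σ constant (so the noise cancels in the difference) and b_1 Lipschitz; all names below are ad hoc.

```python
import math
import random

def euler_difference(b1, b2, sigma, x0, h=0.01, n=2000, seed=1):
    """Euler-Maruyama for dX = b(X) dt + sigma dB, two drifts, shared noise;
    returns the successive differences X1 - X2."""
    rng = random.Random(seed)
    x1 = x2 = x0
    diffs = []
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(h))  # same Brownian increment for both
        x1 += b1(x1) * h + sigma * db
        x2 += b2(x2) * h + sigma * db
        diffs.append(x1 - x2)
    return diffs

# b1(x) = 1 - x dominates b2(x) = -x, both Lipschitz; equal start x0 = 0
diffs = euler_difference(lambda x: 1 - x, lambda x: -x, 1.0, 0.0)
```

With constant σ the difference D_n satisfies D_{n+1} = D_n + h(1 − D_n), so it stays nonnegative and becomes strictly positive after the first step, mirroring the strict comparison of Theorem 23.6 even from equal starting points.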
Our first aim is to transform the general diffusion process into a continuous local martingale, using a suitable change of scale. This corresponds to the removal of the drift in the SDE (1).

Theorem 23.7 (scale function, Feller, Dynkin) For any regular diffusion X on I, there exists a continuous and strictly increasing function p on I such that p(X^{τ_{a,b}}) is a P_x-martingale for all a ≤ x ≤ b in I. Furthermore, an increasing function p has the stated property iff

  P_x{τ_b < τ_a} = (p_x − p_a)/(p_b − p_a),  x ∈ [a, b].   (10)

A function p with the stated property is called a scale function for X, and we say that X is on a natural scale if the scale function can be chosen to be linear. In general, we note that Y = p(X) is a regular diffusion on a natural scale. Our proof begins with a study of the functions

  p_{a,b}(x) = P_x{τ_b < τ_a},  h_{a,b}(x) = E_x τ_{a,b},  a ≤ x ≤ b,

which play a basic role in the subsequent analysis.

Lemma 23.8 (hitting times) For any regular diffusion on I and constants a < b in I, we have
(i) p_{a,b} is continuous and strictly increasing on [a, b];
(ii) h_{a,b} is bounded on [a, b].

In particular, we see from (ii) that τ_{a,b} < ∞ a.s. under P_x for any a ≤ x ≤ b.

Proof: (i) First we show that P_x{τ_b < τ_a} > 0 for any a < x < b. Then introduce the optional time σ_1 = τ_a + τ_x ∘ θ_{τ_a}, and define recursively σ_{n+1} = σ_n + σ_1 ∘ θ_{σ_n}. By the strong Markov property the σ_n form a random walk in [0, ∞] under each P_x. If P_x{τ_b < τ_a} = 0, we get τ_b ≥ σ_n → ∞ a.s. P_x, and so P_x{τ_b = ∞} = 1, which contradicts the regularity of X.

Using the strong Markov property at τ_y, we next obtain

  P_x{τ_b < τ_a} = P_x{τ_y < τ_a} P_y{τ_b < τ_a},  a < x < y < b.   (11)

Since P_x{τ_a < τ_y} > 0, we have P_x{τ_y < τ_a} < 1, which shows that P_x{τ_b < τ_a} is strictly increasing in x. By symmetry it remains to prove that P_y{τ_b < τ_a} is left-continuous on (a, b]. By (11) it is equivalent to show for each x ∈ (a, b) that the mapping y ↦ P_x{τ_y < τ_a} is left-continuous on (x, b].
Then let y_n ↑ y, and note that τ_{y_n} ↑ τ_y a.s. P_x by the continuity of X. Hence, {τ_{y_n} < τ_a} ↓ {τ_y < τ_a}, which implies convergence of the corresponding probabilities.

(ii) Fix any c ∈ (a, b). By the regularity of X we may choose h > 0 so large that

  P_c{τ_a ≤ h} ∧ P_c{τ_b ≤ h} ≡ δ > 0.
If x ∈ (a, c), we may use the strong Markov property at τ_x to get

  δ ≤ P_c{τ_a ≤ h} ≤ P_c{τ_x ≤ h} P_x{τ_a ≤ h} ≤ P_x{τ_a ≤ h} ≤ P_x{τ_{a,b} ≤ h},

and similarly for x ∈ (c, b). By the Markov property at h and induction on n we obtain

  P_x{τ_{a,b} > nh} ≤ (1 − δ)^n,  x ∈ [a, b], n ∈ ℤ_+,

and Lemma 3.4 yields

  E_x τ_{a,b} = ∫_0^∞ P_x{τ_{a,b} > t} dt ≤ h Σ_{n≥0} (1 − δ)^n < ∞. □

Proof of Theorem 23.7: Let p be a locally bounded and measurable function on I such that M = p(X^{τ_{a,b}}) is a martingale under P_x for any a ≤ x ≤ b. Then

  p_x = E_x M_0 = E_x M_∞ = E_x p(X_{τ_{a,b}})
    = p_a P_x{τ_a < τ_b} + p_b P_x{τ_b < τ_a}
    = p_a + (p_b − p_a) P_x{τ_b < τ_a},

and (10) follows, provided that p_a ≠ p_b.

To construct a function p with the stated properties, fix any points u < v in I, and define for arbitrary a ≤ u and b ≥ v in I

  p(x) = (p_{a,b}(x) − p_{a,b}(u)) / (p_{a,b}(v) − p_{a,b}(u)),  x ∈ [a, b].   (12)

To see that p is independent of a and b, consider any larger interval [a′, b′] in I, and conclude from the strong Markov property at τ_{a,b} that, for x ∈ [a, b],

  P_x{τ_{b′} < τ_{a′}} = P_x{τ_a < τ_b} P_a{τ_{b′} < τ_{a′}} + P_x{τ_b < τ_a} P_b{τ_{b′} < τ_{a′}},

or

  p_{a′,b′}(x) = p_{a,b}(x) (p_{a′,b′}(b) − p_{a′,b′}(a)) + p_{a′,b′}(a).

Thus, p_{a,b} and p_{a′,b′} agree on [a, b] up to an affine transformation and so give rise to the same value in (12).

By Lemma 23.8 the constructed function is continuous and strictly increasing, and it remains to show that p(X^{τ_{a,b}}) is a martingale under P_x for any a < b in I. Since the martingale property is preserved by affine transformations, it is equivalent to show that p_{a,b}(X^{τ_{a,b}}) is a P_x-martingale. Then fix any optional time σ, and write τ = σ ∧ τ_{a,b}. By the strong Markov property at τ we get

  E_x p_{a,b}(X_τ) = E_x P_{X_τ}{τ_b < τ_a} = P_x θ_τ^{-1}{τ_b < τ_a} = P_x{τ_b < τ_a} = p_{a,b}(x),

and the desired martingale property follows by Lemma 7.13. □
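On a natural scale, (10) reduces to P_x{τ_b < τ_a} = (x − a)/(b − a). A quick Monte Carlo sanity check (illustrative only, with ad hoc names) uses a symmetric simple random walk, which is on a natural scale with p(x) = x:

```python
import random

def hits_b_first(x, a, b, rng):
    # symmetric simple random walk started at x, stopped at a or b
    while a < x < b:
        x += rng.choice((-1, 1))
    return x == b

rng = random.Random(7)
n = 20000
est = sum(hits_b_first(3, 0, 10, rng) for _ in range(n)) / n
# (10) with p(x) = x predicts (x - a)/(b - a) = 3/10
```

The estimate should agree with 0.3 up to Monte Carlo error of order n^{-1/2}.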
To prepare for the next result, consider a Brownian motion B in ℝ with associated jointly continuous local time L. For any measure ν on ℝ, we may introduce as in (4) the associated additive functional A = ∫ L^x ν(dx) and its right-continuous inverse

  σ_t = inf{s ≥ 0; A_s > t},  t ≥ 0.

If ν ≠ 0, it is clear from the recurrence of B that A is a.s. unbounded. Hence, σ_t < ∞ a.s. for all t, and we may define X_t = B_{σ_t}, t ≥ 0. We shall refer to σ = (σ_t) as the random time-change based on ν and to the process X = B ∘ σ as the correspondingly time-changed Brownian motion.

Theorem 23.9 (speed measure and time-change, Feller, Volkonsky, Itô and McKean) For any regular diffusion on a natural scale in I, there exists a unique measure ν on I° with ν[a, b] ∈ (0, ∞) for all a < b in I°, such that X is a time-changed Brownian motion based on some extension of ν to Ī. Conversely, any such time-change of Brownian motion defines a regular diffusion on I.

The extended version of ν is called the speed measure of the diffusion. Contrary to what the term suggests, we note that the process moves slowly through regions where ν is large. The speed measure of Brownian motion itself is clearly equal to Lebesgue measure. More generally, the speed measure of a regular diffusion solving equation (3) has density σ^{-2}.

To prove the uniqueness of ν we need the following lemma, which is also useful for the subsequent classification of boundary behavior. Here we write σ_{a,b} = inf{s ≥ 0; B_s ∉ (a, b)}.

Lemma 23.10 (Green function) Let X be a time-changed Brownian motion based on ν. Then for any measurable function f ≥ 0 on I and points a < b in I,

  E_x ∫_0^{τ_{a,b}} f(X_t) dt = ∫_a^b g_{a,b}(x, y) f(y) ν(dy),  x ∈ [a, b],   (13)

where

  g_{a,b}(x, y) = E_x L^y_{σ_{a,b}} = 2 (x ∧ y − a)(b − x ∨ y) / (b − a),  x, y ∈ [a, b].   (14)

If X is recurrent, this remains true with a = −∞ or b = ∞.
Taking f ≡ 1 in (13), we get in particular the formula

  h_{a,b}(x) = E_x τ_{a,b} = ∫_a^b g_{a,b}(x, y) ν(dy),  x ∈ [a, b],   (15)

which will be useful later.

Proof: Clearly, τ_{a,b} = A(σ_{a,b}) for any a, b ∈ I, and also for a = −∞ or b = ∞ when X is recurrent. Since L^y is supported by {y}, it follows by (4) that

  ∫_0^{τ_{a,b}} f(X_t) dt = ∫_0^{σ_{a,b}} f(B_s) dA_s = ∫_a^b f(y) L^y_{σ_{a,b}} ν(dy).
Taking expectations gives (13) with g_{a,b}(x, y) = E_x L^y_{σ_{a,b}}. To prove (14), we note that by Tanaka's formula and optional sampling

  E_x L^y_{σ_{a,b} ∧ s} = E_x |B_{σ_{a,b} ∧ s} − y| − |x − y|,  s ≥ 0.

If a and b are finite, we may let s → ∞ and conclude by monotone and dominated convergence that

  g_{a,b}(x, y) = (y − a)(b − x)/(b − a) + (b − y)(x − a)/(b − a) − |x − y|,

which simplifies to (14). The result for infinite a or b follows immediately by monotone convergence. □

The next lemma will enable us to construct the speed measure ν from the functions h_{a,b} in Lemma 23.8.

Lemma 23.11 (consistency) For any regular diffusion on a natural scale in I, there exists a strictly concave function h on I° such that, for any a < b in I,

  h_{a,b}(x) = h(x) − ((x − a)/(b − a)) h(b) − ((b − x)/(b − a)) h(a),  x ∈ [a, b].   (16)

Proof: Fix any u < v in I, and define for any a ≤ u and b ≥ v in I

  h(x) = h_{a,b}(x) − ((x − u)/(v − u)) h_{a,b}(v) − ((v − x)/(v − u)) h_{a,b}(u),  x ∈ [a, b].   (17)

To see that h is independent of a and b, consider any larger interval [a′, b′] in I, and conclude from the strong Markov property at τ_{a,b} that, for x ∈ [a, b],

  E_x τ_{a′,b′} = E_x τ_{a,b} + P_x{τ_a < τ_b} E_a τ_{a′,b′} + P_x{τ_b < τ_a} E_b τ_{a′,b′},

or

  h_{a′,b′}(x) = h_{a,b}(x) + ((b − x)/(b − a)) h_{a′,b′}(a) + ((x − a)/(b − a)) h_{a′,b′}(b).   (18)

Thus, h_{a,b} and h_{a′,b′} agree on [a, b] up to an affine function and therefore yield the same value in (17).

If a ≤ u and b ≥ v, then (17) shows that h and h_{a,b} agree on [a, b] up to an affine function, and (16) follows since h_{a,b}(a) = h_{a,b}(b) = 0. The formula extends by means of (18) to arbitrary a < b in I. □

Since h is strictly concave, its left derivative h′_- is strictly decreasing and left-continuous, and so it determines a measure ν on I° satisfying

  2 ν[a, b) = h′_-(a) − h′_-(b),  a < b in I°.   (19)

For motivation, we note that this expression is consistent with (15). The proof of Theorem 23.9 requires some understanding of the behavior of X at the endpoints of I.
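Formulas (14) and (15) can be checked directly for Brownian motion, whose speed measure is Lebesgue measure: integrating the Green function over y should recover the classical mean exit time E_x τ_{a,b} = (x − a)(b − x). A small quadrature sketch (illustrative, with ad hoc names):

```python
def green(x, y, a, b):
    # Green function g_{a,b}(x, y) = 2 (min(x,y) - a)(b - max(x,y)) / (b - a)
    return 2.0 * (min(x, y) - a) * (b - max(x, y)) / (b - a)

def mean_exit_time(x, a, b, n=4000):
    # h_{a,b}(x) = integral of g_{a,b}(x, y) against Lebesgue measure,
    # approximated by a midpoint rule on n subintervals
    h = (b - a) / n
    return sum(green(x, a + (k + 0.5) * h, a, b) for k in range(n)) * h

x, a, b = 0.7, 0.0, 2.0
approx = mean_exit_time(x, a, b)
exact = (x - a) * (b - x)  # classical mean exit time of BM from (a, b)
```

The quadrature and closed form agree to within the midpoint-rule error, which is tiny here since the integrand is piecewise linear in y.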
If an endpoint b does not belong to I, then by hypothesis the motion terminates when X reaches b. It is clearly equivalent to attach b to I as an absorbing endpoint. For convenience we may then assume that I is a compact interval of the form [a, b], where either endpoint
may be inaccessible, in the sense that a.s. it cannot be reached in finite time from a point in I°.

For either endpoint b, the set Z_b = {t ≥ 0; X_t = b} is regenerative under P_b in the sense of Chapter 22. In particular, we see from Lemma 22.8 that b is either absorbing, in the sense that Z_b = ℝ_+ a.s., or reflecting, in the sense that Z_b° = ∅ a.s. In the latter case, we say that the reflection is fast if λZ_b = 0 and slow if λZ_b > 0. A more detailed discussion of the boundary behavior will be given after the proof of the main theorem.

We first establish Theorem 23.9 in a special case. The general result will then be deduced by a pathwise comparison.

Proof of Theorem 23.9 for absorbing endpoints (Méléard): Let X have distribution P_x, where x ∈ I°, and put ζ = inf{t ≥ 0; X_t ∉ I°}. For any a < b in I° with x ∈ [a, b], the process X^{τ_{a,b}} is a continuous martingale, and so by Theorem 22.5

  h(X_t) = h(x) + ∫_0^t h′_-(X_s) dX_s − ∫ L^y_t ν(dy),  t ∈ [0, ζ),   (20)

where L denotes the local time of X.

Next conclude from Theorem 18.4 that X = B ∘ [X] a.s. for some Brownian motion B starting at x. Using Theorem 22.5 twice, we get in particular, for any nonnegative measurable function f,

  ∫ f(y) L^y_t dy = ∫_0^t f(X_s) d[X]_s = ∫_0^{[X]_t} f(B_s) ds = ∫ f(y) L̃^y_{[X]_t} dy,

where L̃ denotes the local time of B. Hence, L^y_t = L̃^y_{[X]_t} a.s. for t < ζ, and so the last term in (20) equals A_{[X]_t} a.s.

For any optional time σ, put τ = σ ∧ τ_{a,b}, and conclude from the strong Markov property that

  E_x[τ + h_{a,b}(X_τ)] = E_x[τ + E_{X_τ} τ_{a,b}] = E_x[τ + τ_{a,b} ∘ θ_τ] = E_x τ_{a,b} = h_{a,b}(x).

Writing M_t = h(X_t) + t, it follows by Lemma 7.13 that M^{τ_{a,b}} is a P_x-martingale whenever x ∈ [a, b] ⊂ I°. Comparing with (20) and using Proposition 17.2, we obtain A_{[X]_t} = t a.s. for all t ∈ [0, ζ). Since A is continuous and strictly increasing on [0, ζ) with inverse σ, it follows that [X]_t = σ_t a.s. for t < ζ.
The last relation extends to [ζ, ∞), provided that ν is given infinite mass at each endpoint. Then X = B ∘ σ a.s. on ℝ_+. Conversely, it is easily seen that B ∘ σ is a regular diffusion on I whenever σ is a random time-change based on some measure ν with the stated properties. To prove the uniqueness of ν, fix any a < x < b in I°, and apply Lemma 23.10 with f(y) = (g_{a,b}(x, y))^{-1} to see that ν(a, b) is determined by P_x. □

Proof of Theorem 23.9, general case: Define ν on I° as in (19), and extend the definition to I by giving infinite mass to absorbing endpoints.
To every reflecting endpoint we attach a finite mass, to be specified later. Given a Brownian motion B, we note as before that the correspondingly time-changed process X̃ = B ∘ σ is a regular diffusion on I. Letting ζ = sup{t; X_t ∈ I°} and ζ̃ = sup{t; X̃_t ∈ I°}, we further see from the previous case that X^ζ and X̃^ζ̃ have the same distribution for any starting position x ∈ I°.

Now fix any a < b in I°, and define recursively

  χ_1 = ζ + τ_{a,b} ∘ θ_ζ;  χ_{n+1} = χ_n + χ_1 ∘ θ_{χ_n},  n ∈ ℕ.

The processes Y^{a,b}_n = X^ζ ∘ θ_{χ_n} then form a Markov chain in the path space. A similar construction for X̃ yields some processes Ỹ^{a,b}_n, and we note that (Y^{a,b}_n) and (Ỹ^{a,b}_n) have the same distribution for fixed a and b. Since the processes Y^{a′,b′}_n for any smaller interval [a′, b′] can be measurably recovered from those for [a, b], and similarly for Ỹ^{a′,b′}_n, it follows that the whole collections (Y^{a,b}_n) and (Ỹ^{a,b}_n) have the same distribution. By Theorem 6.10 we may then assume that the two families agree a.s.

Now assume that I = [a, b], where a is reflecting. From the properties of Brownian motion we note that the level sets Z_a and Z̃_a for X and X̃ are a.s. perfect. Thus, we may introduce the corresponding excursion point processes ξ and ξ̃, local times L and L̃, and inverse local times T and T̃. Since the excursions within [a, b) agree a.s. for X and X̃, it is clear from the law of large numbers that we may normalize the excursion laws for the two processes such that the corresponding parts of ξ and ξ̃ agree a.s. Then even T and T̃ agree, possibly apart from the lengths of excursions that reach b and the drift coefficient c in Theorem 22.13. For X̃ the latter is proportional to the mass ν{a}, which may now be chosen such that c becomes the same as for X. Note that this choice of ν{a} is independent of the starting position x of the processes X and X̃.

If the other endpoint b is absorbing, then clearly X = X̃ a.s., and the proof is complete.
If b is instead reflecting, then the excursions from b agree a.s. for X and X̃. Repeating the previous argument with the roles of a and b interchanged, we get X = X̃ a.s. after a suitable adjustment of the mass ν{b}. □

We proceed to classify the boundary behavior of a regular diffusion on a natural scale in terms of the speed measure ν. A right endpoint b is called an entrance boundary for X if b is inaccessible, and yet

  lim_{r→∞} inf_{y>x} P_y{τ_x ≤ r} > 0,  x ∈ I°.   (21)

By the Markov property at the times nr, n ∈ ℕ, the limit in (21) then equals 1. In particular, P_y{τ_x < ∞} = 1 for all x < y in I°. As we shall see in Theorem 23.13, an entrance boundary is an endpoint where X may enter but not exit.

The opposite situation occurs at an exit boundary. By this we mean an endpoint b that is accessible and yet naturally absorbing, in the sense that
it remains absorbing even when the charge ν{b} is reduced to zero. If b is accessible but not naturally absorbing, we have already seen how the boundary behavior of X depends on the value of ν{b}. Thus, b in this case is absorbing when ν{b} = ∞, slowly reflecting when ν{b} ∈ (0, ∞), and fast reflecting when ν{b} = 0. For reflecting b it is further clear from Theorem 23.9 that the set Z_b = {t ≥ 0; X_t = b} is a.s. perfect.

Theorem 23.12 (boundary behavior, Feller) Let ν be the speed measure of a regular diffusion on a natural scale in some interval I = [a, b], and fix any u ∈ I°. Then
(i) b is accessible iff it is finite with ∫_u^b (b − x) ν(dx) < ∞;
(ii) b is accessible and reflecting iff it is finite with ν(u, b] < ∞;
(iii) b is an entrance boundary iff it is infinite with ∫_u^∞ x ν(dx) < ∞.

The stated conditions may be translated into corresponding criteria for arbitrary regular diffusions. In the general case it is clear that exit and other accessible boundaries may be infinite, whereas entrance boundaries may be finite. Explosion is said to occur when X reaches an infinite boundary point in finite time. An interesting example of a regular diffusion on (0, ∞) with 0 as an entrance boundary is given by the Bessel process X_t = |B_t|, where B is a Brownian motion in ℝ^d with d ≥ 2.

Proof of Theorem 23.12: (i) Since lim sup_s (±B_s) = ∞ a.s., Theorem 23.9 shows that X cannot explode, so any accessible endpoint is finite. Now assume that a < c < u < b < ∞. Then Lemma 23.8 shows that b is accessible iff h_{c,b}(u) < ∞, which by (15) is equivalent to ∫_u^b (b − x) ν(dx) < ∞.

(ii) In this case b < ∞ by (i), and then Lemma 23.2 shows that b is absorbing iff ν(u, b] = ∞.

(iii) An entrance boundary b is inaccessible by definition, and therefore τ_u = τ_{u,b} a.s. when a < u < b. Arguing as in the proof of Lemma 23.8, we also note that E_y τ_u is bounded for y > u.
If b < ∞, we obtain the contradiction E_y τ_u = h_{u,b}(y) = ∞, and so b must be infinite. From (15) we get by monotone convergence as y → ∞

  E_y τ_u = h_{u,∞}(y) = 2 ∫_u^∞ (x ∧ y − u) ν(dx) → 2 ∫_u^∞ (x − u) ν(dx),

which is finite iff ∫_u^∞ x ν(dx) < ∞. □

We proceed to establish an important regularity property, which also clarifies the nature of entrance boundaries.

Theorem 23.13 (entrance laws and Feller properties) Given a regular diffusion on I, form an extended interval Ī by attaching the possible entrance boundaries to I. Then the original diffusion extends to a continuous Feller process on Ī.
Proof: For any f ∈ C_b, a, x ∈ I, and r, t > 0, we get by the strong Markov property at τ_x ∧ r

  E_a f(X_{τ_x ∧ r + t}) = E_a T_t f(X_{τ_x ∧ r}) = T_t f(x) P_a{τ_x ≤ r} + E_a[T_t f(X_r); τ_x > r].   (22)

To show that T_t f is left-continuous at some y ∈ I, fix any a < y in I°, and choose r > 0 so large that P_a{τ_y ≤ r} > 0. As x ↑ y, we have τ_x ↑ τ_y and hence {τ_x ≤ r} ↓ {τ_y ≤ r}. Thus, the probabilities and expectations in (22) converge to the corresponding expressions for τ_y, and we get T_t f(x) → T_t f(y). The proof of the right-continuity is similar.

If an endpoint b is inaccessible but not of entrance type, and if f(x) → 0 as x → b, then clearly even T_t f(x) → 0 at b for each t > 0. Now assume that ∞ is an entrance boundary, and consider a function f with a finite limit at ∞. We need to show that even T_t f(x) converges as x → ∞ for fixed t. Then conclude from Lemma 23.10 that, as a → ∞,

  sup_{x≥a} E_x τ_a = 2 sup_{x≥a} ∫_a^∞ (x ∧ r − a) ν(dr) = 2 ∫_a^∞ (r − a) ν(dr) → 0.   (23)

Next we note that, for any a < x < y and r > 0,

  P_y{τ_a ≤ r} ≤ P_y{τ_x ≤ r, τ_a − τ_x ≤ r} = P_y{τ_x ≤ r} P_x{τ_a ≤ r} ≤ P_x{τ_a ≤ r}.

Thus P_x ∘ τ_a^{-1} converges vaguely as x → ∞ for fixed a, and in view of (23) the convergence holds even in the weak sense.

Now fix any t and f, and introduce for each a the continuous function g_a(s) = E_a f(X_{(t−s)^+}). By the strong Markov property at τ_a ∧ t and Theorem 6.4, we get for any x, y > a

  |T_t f(x) − T_t f(y)| ≤ |E_x g_a(τ_a) − E_y g_a(τ_a)| + 2‖f‖ (P_x + P_y){τ_a > t}.

Here the right-hand side tends to zero as x, y → ∞ and then a → ∞, because of (23) and the weak convergence of P_x ∘ τ_a^{-1}. Thus, T_t f(x) is Cauchy convergent as x → ∞, and we may denote the limit by T_t f(∞).

It is now easy to check that the extended operators T_t form a Feller semigroup on C_0(Ī). Finally, it is clear from Theorem 19.15 that the associated process starting at a possible entrance boundary again has a continuous version, in the topology of Ī. □
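The entrance-boundary bound (23) is easy to evaluate for a concrete speed measure. As an illustration outside the text, take ν(dr) = e^{−r} dr on (0, ∞): then ∞ is an entrance boundary by Theorem 23.12 (iii), and 2 ∫_a^∞ (r − a) ν(dr) = 2e^{−a} → 0 as a → ∞, in line with (23). A quadrature sketch (names are ad hoc):

```python
import math

def entrance_bound(a, upper=60.0, n=20000):
    # 2 * integral over (a, infinity) of (r - a) exp(-r) dr,
    # approximated by a midpoint rule, truncated at `upper`
    h = (upper - a) / n
    total = 0.0
    for k in range(n):
        r = a + (k + 0.5) * h
        total += (r - a) * math.exp(-r)
    return 2.0 * total * h

b1, b8 = entrance_bound(1.0), entrance_bound(8.0)
```

The exact value is 2e^{−a}, so the bound visibly decreases to zero along increasing levels a, which is what drives the Feller extension argument above.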
We proceed to establish a ratio ergodic theorem for elementary additive functionals of a recurrent diffusion. It is instructive to compare with the general ratio limit theorems of Chapter 20.
Theorem 23.14 (ratio ergodic theorem, Derman, Motoo and Watanabe) Let X be a regular, recurrent diffusion on a natural scale and with speed measure ν. Then for any measurable functions f, g ≥ 0 on I with νf < ∞ and νg > 0,

  lim_{t→∞} (∫_0^t f(X_s) ds) / (∫_0^t g(X_s) ds) = νf/νg  a.s. P_x,  x ∈ I.

Proof: Fix any a < b in I, put τ* = τ_b + τ_a ∘ θ_{τ_b}, and define recursively some optional times σ_0, σ_1, ... by σ_{n+1} = σ_n + τ* ∘ θ_{σ_n}, n ≥ 0, starting with σ_0 = τ_a. Write

  ∫_0^{σ_n} f(X_s) ds = ∫_0^{σ_0} f(X_s) ds + Σ_{k=1}^n ∫_{σ_{k−1}}^{σ_k} f(X_s) ds,   (24)

and note that the terms of the last sum are i.i.d. By the strong Markov property and Lemma 23.10, we get for any x ∈ I

  E_x ∫_{σ_{k−1}}^{σ_k} f(X_s) ds = E_a ∫_0^{τ_b} f(X_s) ds + E_b ∫_0^{τ_a} f(X_s) ds
    = ∫ f(y) {g_{−∞,b}(y, a) + g_{a,∞}(y, b)} ν(dy)
    = 2 ∫ f(y) {(b − y ∨ a)^+ + (y ∧ b − a)^+} ν(dy)
    = 2 (b − a) νf.

From the same lemma, we also see that the first term in (24) is a.s. finite. Hence, by the law of large numbers

  lim_{n→∞} n^{-1} ∫_0^{σ_n} f(X_s) ds = 2 (b − a) νf  a.s. P_x,  x ∈ I.

Writing κ_t = sup{n ≥ 0; σ_n ≤ t}, we get by monotone interpolation

  lim_{t→∞} κ_t^{-1} ∫_0^t f(X_s) ds = 2 (b − a) νf  a.s. P_x,  x ∈ I.   (25)

This remains true when νf = ∞, since we can then apply (25) to some approximating functions f_n ↑ f with νf_n < ∞ and let n → ∞. The assertion now follows as we apply (25) to both f and g. □

We may finally classify the asymptotic behavior of the process, according to the boundedness of the speed measure ν and the nature of the endpoints. For convenience, we may first apply an affine mapping to transform I° into one of the intervals (0, 1), (0, ∞), or (−∞, ∞). Since finite endpoints may be either inaccessible, absorbing, or reflecting (represented below by the brackets (, [, and [[, respectively), we need to distinguish between ten different cases.
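As a discrete analogue of Theorem 23.14 (an illustration, not from the text), consider a symmetric random walk on {0, …, N} with forced reflection at both ends. Its stationary weights are proportional to (1, 2, …, 2, 1), playing the role of the normalized speed measure, and long-run time averages of f(X) converge to the stationary mean, the case g ≡ 1 of the theorem:

```python
import random

def time_average(f, n_states=6, steps=200000, seed=3):
    # reflected symmetric walk on {0, ..., n_states}; the endpoints force a
    # move inward, giving stationary weights (1, 2, ..., 2, 1)
    rng = random.Random(seed)
    x, total = 0, 0.0
    for _ in range(steps):
        if x == 0:
            x = 1
        elif x == n_states:
            x = n_states - 1
        else:
            x += rng.choice((-1, 1))
        total += f(x)
    return total / steps

avg = time_average(lambda x: x)
# stationary mean: sum(x * w) / sum(w) with w = (1,2,2,2,2,2,1), i.e. 36/12 = 3
```

The empirical time average settles near 3.0, the mean of the stationary law, up to a fluctuation of order steps^{-1/2}.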
We say that a diffusion is ν-ergodic if it is recurrent and such that P_x ∘ X_t^{-1} converges weakly to ν/νI for all x. A recurrent diffusion may be either null-recurrent or positive recurrent, depending on whether P_x ∘ X_t^{-1} converges vaguely to 0 or not. Let us also recall that absorption occurs at an endpoint b whenever X_t = b for all sufficiently large t.

Theorem 23.15 (recurrence and ergodicity, Feller, Maruyama and Tanaka) For any regular diffusion on a natural scale and with speed measure ν, the ergodic behavior is the following, depending on the initial position x and the nature of the boundaries:

(−∞, ∞): ν-ergodic if ν is bounded, otherwise null-recurrent;
(0, ∞): converges to 0 a.s.;
[0, ∞): absorbed at 0 a.s.;
[[0, ∞): ν-ergodic if ν is bounded, otherwise null-recurrent;
(0, 1): converges to 0 or 1 with probabilities 1 − x and x, respectively;
[0, 1): absorbed at 0 or converges to 1 with probabilities 1 − x and x, respectively;
[0, 1]: absorbed at 0 or 1 with probabilities 1 − x and x, respectively;
[[0, 1): converges to 1 a.s.;
[[0, 1]: absorbed at 1 a.s.;
[[0, 1]]: ν-ergodic.

We begin our proof with the relatively elementary recurrence properties, which distinguish between the possibilities of absorption, convergence, and recurrence.

Proof of recurrence properties:

[0, 1]: Relation (10) yields P_x{τ_0 < ∞} = 1 − x and P_x{τ_1 < ∞} = x.

[0, ∞): By (10) we have for any b > x

  P_x{τ_0 < ∞} ≥ P_x{τ_0 < τ_b} = (b − x)/b,

which tends to 1 as b → ∞.

(−∞, ∞): The recurrence follows from the previous case.

[[0, ∞): Since 0 is reflecting, we have P_0{τ_y < ∞} > 0 for some y > 0. By the strong Markov property and the regularity of X, this extends to arbitrary y > 0. Arguing as in the proof of Lemma 23.8, we may conclude that P_0{τ_y < ∞} = 1 for all y > 0. The asserted recurrence now follows as we combine with the statement for [0, ∞).

(0, ∞): In this case X = B ∘ [X] a.s. for some Brownian motion B.
Since X > 0, we have [X]_∞ < ∞ a.s., and therefore X converges a.s. Now P_y{τ_{a,b} < ∞} = 1 for any 0 < a < y < b. Applying the Markov property at an arbitrary time t > 0, we conclude that a.s. either
lim inf_t X_t ≤ a or lim sup_t X_t ≥ b. Since a and b are arbitrary, it follows that X_∞ is an endpoint of (0, ∞) and hence equals 0.

(0, 1): Arguing as in the previous case, we get a.s. convergence to either 0 or 1. To find the corresponding probabilities, we conclude from (10) that

  P_x{τ_a < ∞} ≥ P_x{τ_a < τ_b} = (b − x)/(b − a),  0 < a < x < b < 1.

Letting b → 1 and then a → 0, we obtain P_x{X_∞ = 0} ≥ 1 − x. Similarly, P_x{X_∞ = 1} ≥ x, and so equality holds in both relations.

[0, 1): Again X converges to either 0 or 1 with probabilities 1 − x and x, respectively. Furthermore, we note that

  P_x{τ_0 < ∞} ≥ P_x{τ_0 < τ_b} = (b − x)/b,  0 < x < b < 1,

which tends to 1 − x as b → 1. Thus, X gets absorbed when it approaches 0.

[[0, 1]]: Arguing as in the previous case, we get P_0{τ_1 < ∞} = 1, and by symmetry we also have P_1{τ_0 < ∞} = 1.

[[0, 1]: Again we get P_0{τ_1 < ∞} = 1, so the same relation holds for P_x.

[[0, 1): As before, we get P_0{τ_b < ∞} = 1 for all b ∈ (0, 1). By the strong Markov property at τ_b and the result for [0, 1), it follows that P_0{X_t → 1} ≥ b. Letting b → 1, we obtain X_t → 1 a.s. under P_0. The result for P_x now follows by the strong Markov property at τ_x, applied under P_0. □

The ergodic properties will be proved along the lines of Theorem 8.18, which requires some additional lemmas.

Lemma 23.16 (coupling) If X and Y are independent Feller processes, then the pair (X, Y) is again Feller.

Proof: Use Theorem 4.29 and Lemma 19.3. □

Lemma 23.17 (strong ergodicity) Given a regular, recurrent diffusion, we have for any initial distributions μ_1 and μ_2

  lim_{t→∞} ‖P_{μ_1} ∘ θ_t^{-1} − P_{μ_2} ∘ θ_t^{-1}‖ = 0.

Proof: Let X and Y be independent with distributions P_{μ_1} and P_{μ_2}, respectively. By Theorem 23.13 and Lemma 23.16 the pair (X, Y) can be extended to a Feller diffusion, and so by Theorem 19.17 it is again strong Markov with respect to the induced filtration G.
Define τ = inf{t ≥ 0; X_t = Y_t}, and note that τ is G-optional by Lemma 7.6. The assertion now follows as in the case of Lemma 8.20, provided we can show that τ < ∞ a.s.
To see this, assume first that I = ℝ. The processes X and Y are then continuous local martingales. By independence they remain local martingales for the extended filtration G, and so even X − Y is a local G-martingale. Using the independence and recurrence of X and Y, we get [X − Y]_∞ = [X]_∞ + [Y]_∞ = ∞ a.s., which shows that even X − Y is recurrent. In particular, τ < ∞ a.s.

Next let I = [[0, ∞) or [[0, 1]], and define τ_1 = inf{t ≥ 0; X_t = 0} and τ_2 = inf{t ≥ 0; Y_t = 0}. By the continuity and recurrence of X and Y, we get τ ≤ τ_1 ∨ τ_2 < ∞ a.s. □

Our next result is similar to the discrete-time version in Lemma 8.21.

Lemma 23.18 (existence) Every regular, positive recurrent diffusion has an invariant distribution.

Proof: By Theorem 23.13 we may regard the transition kernels μ_t with associated operators T_t as defined on Ī, the interval I with possible entrance boundaries adjoined. Since X is not null-recurrent, we may choose a bounded Borel set B, some x_0 ∈ I, and times t_n → ∞ such that inf_n μ_{t_n}(x_0, B) > 0. By Theorem 5.19 there exists some measure μ on Ī with μĪ > 0 such that μ_{t_n}(x_0, ·) → μ along a subsequence, in the topology of Ī. The convergence extends by Lemma 23.17 to arbitrary x ∈ I, and so

  T_{t_n} f(x) → μf,  f ∈ C_0(Ī), x ∈ I.   (26)

Now fix any h > 0 and f ∈ C_0(Ī), and note that even T_h f ∈ C_0(Ī) by Theorem 23.13. Using (26), the semigroup property, and dominated convergence, we get for any x ∈ I

  μ(T_h f) ← T_{t_n}(T_h f)(x) = T_h(T_{t_n} f)(x) → μf.

Thus, μμ_h = μ for all h, which means that μ is invariant on Ī. In particular, μ(Ī \ I) = 0 by the nature of entrance boundaries, and so the normalized measure μ/μĪ is an invariant distribution on I. □

Our final lemma provides the crucial connection between speed measure and invariant distributions.
Lemma 23.19 (positive recurrence) For a regular, recurrent diffusion on a natural scale and with speed measure ν, these conditions are equivalent:
(i) νI < ∞;
(ii) the process is positive recurrent;
(iii) an invariant distribution exists.
The invariant distribution is then unique and equals ν/νI.

Proof: If the process is null-recurrent, then clearly no invariant distribution exists. The converse is also true by Lemma 23.18, and so (ii) and (iii) are equivalent. Now fix any bounded, measurable function f: I → ℝ_+ with bounded support. By Theorem 23.14, Fubini's theorem, and dominated
convergence, we have for any distribution μ on I

  t^{-1} ∫_0^t E_μ f(X_s) ds = E_μ t^{-1} ∫_0^t f(X_s) ds → νf/νI.

If μ is invariant, we get μf = νf/νI, and so νI < ∞. If instead X is null-recurrent, then E_μ f(X_s) → 0 as s → ∞, and we get νf/νI = 0, which implies νI = ∞. □

End of proof of Theorem 23.15: It remains to consider the cases when I is either (−∞, ∞), [[0, ∞), or [[0, 1]], since we have otherwise convergence or absorption at some endpoint. In the case of [[0, 1]] we note from Theorem 23.12 (ii) that ν is bounded. In the remaining cases ν may be unbounded, and then X is null-recurrent by Lemma 23.19. If ν is bounded, then μ = ν/νI is invariant by the same lemma, and the asserted ν-ergodicity follows from Lemma 23.17 with μ_1 = μ. □

Exercises

1. Prove pathwise uniqueness for the SDE dX_t = (X_t^+)^{1/2} dB_t + c dt with c ≥ 0. Also show that the solutions X^x with X_0 = x satisfy X^x_t ≤ X^y_t a.s. for x ≤ y, up to the time when X^x reaches 0.

2. Let X be Brownian motion in ℝ^d, absorbed at 0. Show that Y = |X|^2 is a regular diffusion on (0, ∞), describe its boundary behavior for different d, and identify the corresponding case of Theorem 23.15. Verify the conclusion by computing the associated scale function and speed measure.

3. Show that solutions to the equation dX_t = σ(X_t) dB_t cannot explode. (Hint: If X explodes at time ζ < ∞, then [X]_ζ = ∞, and the local time of X tends to ∞ as t → ζ, uniformly on compacts. Now use Theorem 22.5 to see that ζ = ∞ a.s.)

4. Assume in Theorem 23.1 that S_σ = N_σ. Show that the solutions X to (3) form a regular diffusion on a natural scale on every connected component I of S_σ^c. Also note that the endpoints of I are either absorbing or exit boundaries for X. (Hint: Use Theorems 21.11, 22.4, and 22.5, and show that the exit time from any compact interval J ⊂ I is finite.)

5.
Assume in Theorem 23.1 that S_σ ⊂ N_σ, and form σ̃ from σ by taking σ̃(x) = 1 on A = N_σ \ S_σ. Show that any solution X to equation (σ̃, 0) also solves equation (σ, 0), but not conversely unless A = ∅. (Hint: Since λA = 0, we have ∫ 1_A(X_t) dt = ∫ 1_A(X_t) d[X]_t = 0 a.s. by Theorem 22.5.)

6. Assume in Theorem 23.1 that S_σ ⊂ N_σ. Show that equation (σ, 0) has solutions that form a regular diffusion on every connected component of S_σ^c. Prove the corresponding statement for the connected components of N_σ^c when N_σ is closed. (Hint: For S_σ^c, use the preceding result. For N_σ^c, take X to be absorbed when it first reaches N_σ.)
7. In the setting of Theorem 23.14, show that the stated relation implies the convergence in Corollary 20.8 (i). Also use the result to prove a law of large numbers for regular, recurrent diffusions with bounded speed measure ν. (Hint: Note that νg > 0 implies ∫ g(X_s) ds > 0 a.s.)
Chapter 24

Connections with PDEs and Potential Theory

Backward equation and Feynman–Kac formula; uniqueness for SDEs from existence for PDEs; harmonic functions and Dirichlet's problem; Green functions as occupation densities; sweeping and equilibrium problems; dependence on conductor and domain; time reversal; capacities and random sets

In Chapters 19 and 21 we saw how elliptic differential operators arise naturally in probability theory as the generators of nice diffusion processes. This fact is the ultimate cause of some profound connections between probability theory and partial differential equations (PDEs). In particular, a suitable extension of the operator ½Δ appears as the generator of Brownian motion in ℝ^d, which leads to a close relationship between classical potential theory and the theory of Brownian motion. More specifically, many basic problems in potential theory can be solved by probabilistic methods, and, conversely, various hitting distributions for Brownian motion can be given a potential-theoretic interpretation.

This chapter explores some of the mentioned connections. First we derive the celebrated Feynman–Kac formula and show how existence of solutions to a given Cauchy problem implies uniqueness of solutions to the associated SDE. We then proceed with a probabilistic construction of Green functions and potentials and solve the Dirichlet, sweeping, and equilibrium problems of classical potential theory in terms of Brownian motion. Finally, we show how Green capacities and alternating set functions can be represented in a natural way in terms of random sets.

Some stochastic calculus from Chapters 17 and 21 is used at the beginning of the chapter, and we also rely on the theory of Feller processes from Chapter 19. As for Brownian motion, the present discussion is essentially self-contained, apart from some elementary facts cited from Chapters 13 and 18. Occasionally we refer to Chapters 4 and 16 for some basic weak convergence theory.
Finally, the results at the end of the chapter require the existence of Poisson processes from Proposition 12.5, as well as some basic facts about the Fell topology listed in Theorem A2.5. Potential theoretic ideas are used in several other chapters, and additional, though essentially unrelated, results appear especially in Chapters 20, 22, and 25.
24. Connections with PDEs and Potential Theory 471

To begin with the general PDE connections, we consider an arbitrary Feller diffusion in ℝ^d with associated semigroup operators T_t and generator (A, 𝒟). Recall from Theorem 19.6 that, for any f ∈ 𝒟, the function u(t, x) = T_t f(x) = E_x f(X_t), t ≥ 0, x ∈ ℝ^d, satisfies Kolmogorov's backward equation u̇ = Au, where u̇ = ∂u/∂t. Thus, u provides a probabilistic solution to the Cauchy problem

u̇ = Au,  u(0, x) = f(x). (1)

Let us now add a potential term vu to (1), where v: ℝ^d → ℝ₊, and consider the more general problem

u̇ = Au − vu,  u(0, x) = f(x). (2)

Here the solution may be expressed in terms of the elementary multiplicative functional e^{−V}, where

V_t = ∫₀^t v(X_s) ds,  t ≥ 0.

Let C^{1,2} denote the class of functions f: ℝ₊ × ℝ^d → ℝ that are of class C¹ in the time variable and of class C² in the space variables. Write C_b(ℝ^d) and C_b⁺(ℝ^d) for the classes of bounded, continuous functions from ℝ^d to ℝ and ℝ₊, respectively.

Theorem 24.1 (Cauchy problem, Feynman, Kac) Let (A, 𝒟) be the generator of a Feller diffusion in ℝ^d, and fix any f ∈ C_b(ℝ^d) and v ∈ C_b⁺(ℝ^d). Then any bounded solution u ∈ C^{1,2} to (2) is given by

u(t, x) = E_x e^{−V_t} f(X_t),  t ≥ 0, x ∈ ℝ^d. (3)

Conversely, (3) solves (2) whenever f ∈ 𝒟.

The expression in (3) has an interesting interpretation in terms of killing. To see this, we may introduce an exponential random variable γ ⫫ X with mean 1, and define ζ = inf{t ≥ 0; V_t > γ}. Letting X̃ denote the process X killed at time ζ, we may express the right-hand side of (3) as E_x f(X̃_t), with the understanding that f(X̃_t) = 0 when t ≥ ζ. In other words, u(t, x) = T̃_t f(x), where T̃_t is the transition operator of the killed process. It is easy to verify directly from (3) that the family (T̃_t) is again a Feller semigroup.

Proof of Theorem 24.1: Assume that u ∈ C^{1,2} is bounded and solves (2), and define for fixed t > 0

M_s = e^{−V_s} u(t − s, X_s),  s ∈ [0, t].
Letting ∼ denote equality apart from a continuous local martingale or its differential, we see from Lemma 19.21, Itô's formula, and (2) that, for any s ≤ t,

dM_s = e^{−V_s} {du(t − s, X_s) − u(t − s, X_s) v(X_s) ds}
     ∼ e^{−V_s} {Au(t − s, X_s) − u̇(t − s, X_s) − u(t − s, X_s) v(X_s)} ds = 0.
472 Foundations of Modern Probability

Thus, M is a continuous local martingale on [0, t). Since M is bounded, the martingale property extends to t, and we get

u(t, x) = E_x M_0 = E_x M_t = E_x e^{−V_t} u(0, X_t) = E_x e^{−V_t} f(X_t).

Next let u be given by (3) for some f ∈ 𝒟. Integrating by parts and using Lemma 19.21, we obtain

d{e^{−V_t} f(X_t)} = e^{−V_t} {df(X_t) − (vf)(X_t) dt} ∼ e^{−V_t} (Af − vf)(X_t) dt.

Taking expectations and differentiating at t = 0, we conclude that the generator of the semigroup T̃_t f(x) = E_x e^{−V_t} f(X_t) = u(t, x) equals Ã = A − v on 𝒟. Equation (2) now follows by the last assertion in Theorem 19.6. □

The converse part of Theorem 24.1 can often be improved in special cases. In particular, if v = 0 and A = ½Δ = ½ ∑_i ∂²/∂x_i², so that X is a Brownian motion and (2) reduces to the standard heat equation, then u(t, x) = E_x f(X_t) solves (2) for any bounded, continuous function f on ℝ^d. To see this, we note that u ∈ C^{1,2} on (0, ∞) × ℝ^d because of the smoothness of the Brownian transition density. We may then obtain (2) by applying the backward equation to the function T_h f(x) for a fixed h ∈ (0, t).

Let us now consider an SDE in ℝ^d of the form

dX_t^i = σ_j^i(X_t) dB_t^j + b^i(X_t) dt, (4)

and introduce the associated elliptic operator

Av(x) = ½ a^{ij}(x) v″_{ij}(x) + b^i(x) v′_i(x),  x ∈ ℝ^d, v ∈ C²,

where a^{ij} = σ_k^i σ_k^j. The next result shows how uniqueness in law for solutions to (4) may be inferred from the existence of solutions to the associated Cauchy problem (1).

Theorem 24.2 (uniqueness, Stroock and Varadhan) If for every f ∈ C_0^∞(ℝ^d) the Cauchy problem in (1) has a bounded solution on [0, ε] × ℝ^d for some ε > 0, then uniqueness in law holds for the SDE (4).

Proof: Fix any f ∈ C_0^∞ and t ∈ (0, ε], and let u be a bounded solution to (1) on [0, t] × ℝ^d. If X solves (4), we note as before that M_s = u(t − s, X_s) is a martingale on [0, t], and so

E f(X_t) = E u(0, X_t) = E M_t = E M_0 = E u(t, X_0).

Thus, the one-dimensional distributions of X on [0, ε] are uniquely determined by the initial distribution.
Now assume that X and Y are solutions with the same initial distribution. To prove that their finite-dimensional distributions agree, it is enough to consider times 0 = t₀ < t₁ < ⋯ < t_n such that t_k − t_{k−1} ≤ ε for all k. Assume that the distributions agree at t₀, …, t_{n−1} ≡ t, and fix any set C = π_{t₀,…,t_{n−1}}^{−1} B with B ∈ 𝓑^{nd}. By Theorem 21.7, both 𝓛(X) and 𝓛(Y) solve the local martingale problem for (a, b). If P{X ∈ C} = P{Y ∈ C} >
0, we see as in the case of Theorem 21.11 that the same property holds for the conditional measures P[θ_t X ∈ · | X ∈ C] and P[θ_t Y ∈ · | Y ∈ C]. Since the corresponding initial distributions agree by hypothesis, the one-dimensional result yields the extension

P{X ∈ C, X_{t+h} ∈ ·} = P{Y ∈ C, Y_{t+h} ∈ ·},  h ∈ (0, ε].

In particular, the distributions agree at times t₀, …, t_n. The general result now follows by induction. □

Let us now specialize to the case when X is Brownian motion in ℝ^d. For any closed set B ⊂ ℝ^d, we introduce the hitting time τ_B = inf{t > 0; X_t ∈ B} and associated hitting kernel

H_B(x, dy) = P_x{τ_B < ∞, X_{τ_B} ∈ dy},  x ∈ ℝ^d.

For suitable functions f, we write H_B f(x) = ∫ f(y) H_B(x, dy).

By a domain in ℝ^d we mean an open, connected subset D ⊂ ℝ^d. A function u: D → ℝ is said to be harmonic if it belongs to C²(D) and satisfies the Laplace equation Δu = 0. We also say that u has the mean-value property if it is locally bounded and measurable, and such that for any ball B ⊂ D with center x, the average of u over the boundary ∂B equals u(x). The following analytic result is crucial for the probabilistic developments.

Lemma 24.3 (harmonic functions, Gauss, Koebe) A function u on a domain D ⊂ ℝ^d is harmonic iff it has the mean-value property, in which case u ∈ C^∞(D).

Proof: First assume that u ∈ C²(D), and fix a ball B ⊂ D with center x. Writing τ = τ_{∂B} and noting that E_x τ < ∞, we get by Itô's formula

E_x u(X_τ) − u(x) = ½ E_x ∫₀^τ Δu(X_s) ds.

Here the first term on the left equals the average of u over ∂B, due to the spherical symmetry of Brownian motion. If u is harmonic, then the right-hand side vanishes, and the mean-value property follows. If instead u is not harmonic, we may choose B such that Δu ≠ 0 on B. But then the right-hand side is nonzero, and so the mean-value property fails.

It remains to show that every function u with the mean-value property is infinitely differentiable.
Now fix any infinitely differentiable and spherically symmetric probability density φ, supported by a ball of radius ε > 0 around the origin. The mean-value property yields u = u ∗ φ on the set where the right-hand side is defined, and by dominated convergence the infinite differentiability of φ carries over to u ∗ φ = u. □

Before proceeding to the potential theoretic developments, we need to introduce a regularity condition on the domain D. Writing ζ = ζ_D = τ_{D^c}, we note that P_x{ζ = 0} = 0 or 1 for every x ∈ ∂D by Corollary 19.18. When this probability is 1, we say that x is regular for D^c or simply regular.
If this holds for every x ∈ ∂D, then the boundary ∂D is said to be regular, and we refer to D as a regular domain.

Regularity is a fairly weak condition. In particular, any domain with a smooth boundary is regular, and we shall see that even various edges and corners are allowed, provided they are not too sharp and directed inward. By a spherical cone in ℝ^d with vertex v and axis a ≠ 0 we mean a set of the form C = {x; ⟨x − v, a⟩ > c|x − v|}, where c ∈ (0, |a|].

Lemma 24.4 (cone condition, Zaremba) Given a domain D ⊂ ℝ^d, let x ∈ ∂D be such that C ∩ G ⊂ D^c for some spherical cone C with vertex x and some neighborhood G of x. Then x is regular for D^c.

Proof: By compactness of the unit sphere in ℝ^d, we may cover ℝ^d by C₁ = C along with finitely many congruent cones C₂, …, C_n with vertex x. By rotational symmetry,

1 = P_x{min_{k≤n} τ_{C_k} = 0} ≤ ∑_{k≤n} P_x{τ_{C_k} = 0} = n P_x{τ_C = 0},

and so P_x{τ_C = 0} > 0. Hence, Corollary 19.18 yields P_x{τ_C = 0} = 1, and we get ζ_D ≤ τ_{C∩G} = 0 a.s. P_x. □

Now fix a domain D ⊂ ℝ^d and a continuous function f: ∂D → ℝ. A function u on D̄ is said to solve the Dirichlet problem (D, f) if u is harmonic on D and continuous on D̄ with u = f on ∂D. The solution may be interpreted as the electrostatic potential in D when the potential on the boundary is given by f.

Theorem 24.5 (Dirichlet problem, Kakutani, Doob) For any regular domain D ⊂ ℝ^d and function f ∈ C_b(∂D), the Dirichlet problem (D, f) is solved by the function

u(x) = E_x[f(X_{ζ_D}); ζ_D < ∞] = H_{D^c} f(x),  x ∈ D. (5)

If ζ_D < ∞ a.s., then this is the only bounded solution; when d ≥ 3 and f ∈ C₀(∂D), it is the only solution in C₀(D̄).

Thus, H_{D^c} agrees with the sweeping (balayage) kernel of Newtonian potential theory, which determines the harmonic measure on ∂D. The following result clarifies the role of the regularity condition on ∂D.
Lemma 24.6 (regularity, Doob) A point b ∈ ∂D is regular for D^c iff, for any f ∈ C_b(∂D), the function u in (5) satisfies u(x) → f(b) as D ∋ x → b.

Proof: First assume that b is regular. For any t > h > 0 and x ∈ D, we get by the Markov property

P_x{ζ > t} ≤ P_x{ζ ∘ θ_h > t − h} = E_x P_{X_h}{ζ > t − h}.

Here the right-hand side is continuous in x, by the continuity of the Gaussian kernel and dominated convergence, and so

limsup_{x→b} P_x{ζ > t} ≤ E_b P_{X_h}{ζ > t − h} = P_b{ζ ∘ θ_h > t − h}.
As h → 0, the probability on the right tends to P_b{ζ > t} = 0, and so P_x{ζ > t} → 0 as x → b, which means that P_x ∘ ζ^{−1} →ʷ δ₀. Since also P_x →ʷ P_b in C(ℝ₊, ℝ^d), Theorem 4.28 yields P_x ∘ (X, ζ)^{−1} →ʷ P_b ∘ (X, 0)^{−1} in C(ℝ₊, ℝ^d) × [0, ∞]. By the continuity of the mapping (x, t) ↦ x_t it follows that P_x ∘ X_ζ^{−1} →ʷ P_b ∘ X₀^{−1} = δ_b, and so u(x) → f(b) by the continuity of f.

Next assume the stated condition. If d = 1, then D is an interval, which is obviously regular. Now assume that d ≥ 2. By the Markov property, we get for any f ∈ C_b(∂D)

u(b) = E_b[f(X_ζ); ζ ≤ h] + E_b[u(X_h); ζ > h],  h > 0.

As h → 0, it follows by dominated convergence that u(b) = f(b), and for f(x) = e^{−|x−b|} we get P_b{X_ζ = b, ζ < ∞} = 1. Since a.s. X_t ≠ b for all t > 0 by Theorem 18.6 (i), we may conclude that P_b{ζ = 0} = 1, and so b is regular. □

Proof of Theorem 24.5: Let u be given by (5), fix any closed ball in D with center x and boundary S, and conclude by the strong Markov property at τ = τ_S that

u(x) = E_x[f(X_ζ); ζ < ∞] = E_x E_{X_τ}[f(X_ζ); ζ < ∞] = E_x u(X_τ).

This shows that u has the mean-value property, and so by Lemma 24.3 it is harmonic. From Lemma 24.6 it is further seen that u is continuous on D̄ with u = f on ∂D. Thus, u solves the Dirichlet problem (D, f).

Now assume that d ≥ 3 and f ∈ C₀(∂D). For any ε > 0 we have

|u(x)| ≤ ε + ‖f‖ P_x{|f(X_ζ)| > ε, ζ < ∞}. (6)

Since X is transient by Theorem 18.6 (ii) and the set {y ∈ ∂D; |f(y)| > ε} is bounded, the right-hand side of (6) tends to 0 as |x| → ∞ and then ε → 0, which shows that u ∈ C₀(D̄).

To prove the asserted uniqueness, it is clearly enough to assume f ≡ 0 and show that any solution u with the stated properties is identically zero. If d ≥ 3 and u ∈ C₀(D̄), then this is clear by Lemma 24.3, which shows that harmonic functions can have no local maxima or minima. Next assume that ζ < ∞ a.s. and u ∈ C_b(D̄).
By Corollary 17.19 we have E_x u(X_{ζ∧n}) = u(x) for any x ∈ D and n ∈ ℕ, and as n → ∞, we get by continuity and dominated convergence u(x) = E_x u(X_ζ) = 0. □

To prepare for our probabilistic construction of the Green function in a domain D ⊂ ℝ^d, we need to study the transition densities of Brownian motion killed on the boundary ∂D. Recall that ordinary Brownian motion in ℝ^d has transition densities

p_t(x, y) = (2πt)^{−d/2} e^{−|x−y|²/2t},  x, y ∈ ℝ^d, t > 0. (7)
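For the half-line D = (0, ∞) with d = 1, the transition density of the killed process admits a closed form by the reflection principle, which makes the behavior at the regular boundary point 0 explicit. The following sketch (the numerical check below is only an illustration of the formula) compares the free density (7) with its reflected correction:

```python
import numpy as np

def p_free(t, x, y):
    # Free Brownian transition density in d = 1, as in (7)
    return np.exp(-(x - y) ** 2 / (2 * t)) / np.sqrt(2 * np.pi * t)

def p_killed_halfline(t, x, y):
    # Density of Brownian motion killed at 0, for D = (0, inf):
    # by the reflection principle the killing correction is p_t(x, -y)
    return p_free(t, x, y) - p_free(t, x, -y)
```

The resulting density is symmetric in x and y and vanishes as x → 0, in line with the symmetry and boundary behavior established below for general domains.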
By the strong Markov property and Theorem 6.4, we get for any t > 0, x ∈ D, and B ∈ 𝓑(D),

P_x{X_t ∈ B} = P_x{X_t ∈ B, t < ζ} + E_x[T_{t−ζ} 1_B(X_ζ); t ≥ ζ].

Thus, the killed process has transition densities

p_t^D(x, y) = p_t(x, y) − E_x[p_{t−ζ}(X_ζ, y); t ≥ ζ],  x, y ∈ D, t > 0. (8)

The following symmetry and continuity properties of p_t^D play a crucial role in the sequel.

Theorem 24.7 (transition density, Hunt) For any domain D in ℝ^d and time t > 0, the function p_t^D is symmetric and continuous on D². If b ∈ ∂D is regular, then p_t^D(x, y) → 0 as x → b for fixed y ∈ D.

Proof: From (7) we note that p_t(x, y) is uniformly continuous in (x, y) for fixed t > 0, as well as in (x, y, t) for |x − y| ≥ ε > 0 and t ≥ 0. By (8) it follows that p_t^D(x, y) is equicontinuous in y ∈ D for fixed t > 0. To prove the continuity in x ∈ D for fixed t > 0 and y ∈ D, it is then enough to show that P_x{X_t ∈ B, t < ζ} is continuous in x for fixed t > 0 and B ∈ 𝓑(D). Letting h ∈ (0, t), we get by the Markov property

P_x{X_t ∈ B, ζ > t} = E_x[P_{X_h}{X_{t−h} ∈ B, ζ > t − h}; ζ > h].

Thus, for any x, y ∈ D,

|(P_x − P_y){X_t ∈ B, t < ζ}| ≤ (P_x + P_y){ζ ≤ h} + ‖P_x ∘ X_h^{−1} − P_y ∘ X_h^{−1}‖,

which tends to 0 as y → x and then h → 0. Combining the continuity in x with the equicontinuity in y, we conclude that p_t^D(x, y) is continuous in (x, y) ∈ D² for fixed t > 0.

To prove the symmetry in x and y, it is now enough to establish the integrated version

∫_C P_x{X_t ∈ B, ζ > t} dx = ∫_B P_x{X_t ∈ C, ζ > t} dx, (9)

for any bounded sets B, C ∈ 𝓑(D). Then fix any compact set F ⊂ D. Letting n ∈ ℕ and writing h = 2^{−n} t and t_k = kh, we get by Proposition 8.2

∫_C P_x{X_{t_k} ∈ F, k < 2ⁿ; X_t ∈ B} dx = ∫_F ⋯ ∫_F 1_C(x₀) 1_B(x_{2ⁿ}) ∏_{k≤2ⁿ} p_h(x_{k−1}, x_k) dx₀ ⋯ dx_{2ⁿ}.

Here the right-hand side is symmetric in the pair (B, C), because of the symmetry of p_h(x, y). By dominated convergence as n → ∞, we obtain (9) with F in place of D, and the stated version follows by monotone convergence as F ↑ D.
To prove the last assertion, we recall from the proof of Lemma 24.6 that P_x ∘ (ζ, X)^{−1} →ʷ P_b ∘ (0, X)^{−1} as x → b with b ∈ ∂D regular. In particular, P_x ∘ (ζ, X_ζ)^{−1} →ʷ δ_{(0,b)}, and by the boundedness and continuity of p_t(x, y) for |x − y| ≥ ε > 0, it is clear from (8) that p_t^D(x, y) → 0. □

A domain D ⊂ ℝ^d is said to be Greenian if either d ≥ 3, or if d ≤ 2 and P_x{ζ_D < ∞} = 1 for all x ∈ D. Since the latter probability is harmonic in x, it is enough by Lemma 24.3 to verify the stated property for a single x ∈ D. Given a Greenian domain D, we may introduce the Green function

g_D(x, y) = ∫₀^∞ p_t^D(x, y) dt,  x, y ∈ D.

For any measure μ on D, we may further introduce the associated Green potential

G_D μ(x) = ∫ g_D(x, y) μ(dy),  x ∈ D.

Writing G_D μ = G_D f when μ(dy) = f(y) dy, we get by Fubini's theorem

E_x ∫₀^{ζ_D} f(X_t) dt = ∫ g_D(x, y) f(y) dy = G_D f(x),  x ∈ D,

which identifies g_D as an occupation density for the killed process.

The next result shows that g_D and G_D agree with the Green function and Green potential of classical potential theory. Thus, G_D μ(x) may be interpreted as the electrostatic potential at x arising from a charge distribution μ in D, when the boundary ∂D is grounded.

Theorem 24.8 (Green function) For any Greenian domain D ⊂ ℝ^d, the function g_D is symmetric on D². Furthermore, g_D(x, y) is harmonic in x ∈ D ∖ {y} for each y ∈ D, and if b ∈ ∂D is regular, then g_D(x, y) → 0 as x → b for fixed y ∈ D.

The proof is straightforward when d ≥ 3, but for d ≤ 2 we need two technical lemmas. We begin with a uniform estimate for large t.

Lemma 24.9 (uniform integrability) Consider a domain D ⊂ ℝ^d, assumed to be bounded when d ≤ 2. Then

lim_{t→∞} sup_{x,y∈D} ∫_t^∞ p_s^D(x, y) ds = 0.

Proof: For d ≥ 3 we may take D = ℝ^d, in which case the result is obvious from (7). Next let d = 2. By obvious domination and scaling arguments, we may then assume that |x| ≤ 1, y = 0, D = {z; |z| < 2}, and t > 1.
Writing p_t(x) = p_t(x, 0), we get by (8)

p_t^D(x, 0) ≤ p_t(x) − E₀[p_{t−ζ}(1); ζ ≤ t/2]
           ≤ p_t(0) − p_t(1) P₀{ζ ≤ t/2}
           ≤ p_t(0) P₀{ζ > t/2} + p_t(0) − p_t(1)
           ≲ t^{−1} P₀{ζ > t/2} + t^{−2}.

As in the case of Lemma 23.8 (ii), we have E₀ζ < ∞, and so by Lemma 3.4 the right-hand side is integrable in t ∈ [1, ∞). The proof for d = 1 is similar. □

We also need the fact that bounded sets have bounded Green potential.

Lemma 24.10 (boundedness) For any Greenian domain D ⊂ ℝ^d and bounded set B ∈ 𝓑(D), the function G_D 1_B is bounded.

Proof: By domination and scaling together with the strong Markov property, it suffices to take B = {x; |x| ≤ 1} and to show that G_D 1_B(0) < ∞. For d ≥ 3 we may further take D = ℝ^d, in which case the result follows by a simple computation. For d = 2 we may assume that D ⊃ C = {x; |x| < 2}. Write σ = (ζ_C + τ_B ∘ θ_{ζ_C}) ∧ ζ_D and τ₀ = 0, and recursively define τ_{k+1} = τ_k + σ ∘ θ_{τ_k}, k ≥ 0. Putting b = (1, 0), we get by the strong Markov property at the times τ_k

G_D 1_B(0) ≤ G_C 1_B(0) + G_C 1_B(b) ∑_{k≥1} P₀{τ_k < ζ}.

Here G_C 1_B(0) ∨ G_C 1_B(b) < ∞ by Lemma 24.9. By the strong Markov property it is further seen that P₀{τ_k < ζ} ≤ p^k, where p = sup_{x∈B} P_x{σ < ζ}. Finally, note that p < 1, since P_x{σ < ζ} is harmonic and hence continuous on B. The proof for d = 1 is similar. □

Proof of Theorem 24.8: The symmetry of g_D is clear from Theorem 24.7. If d ≥ 3, or if d = 2 and D is bounded, it is further seen from Theorem 24.7, Lemma 24.9, and dominated convergence that g_D(x, y) is continuous in x ∈ D ∖ {y} for each y ∈ D. Next we note that G_D 1_B has the mean-value property in D ∖ B̄ for bounded B ∈ 𝓑(D). The property extends by continuity to the density g_D(x, y), which is then harmonic in x ∈ D ∖ {y} for fixed y ∈ D, by Lemma 24.3.

For d = 2 and unbounded D, we define D_n = {x ∈ D; |x| < n}, and note as before that g_{D_n}(x, y) has the mean-value property in x ∈ D_n ∖ {y} for each y ∈ D_n.
Since p_t^{D_n} ↑ p_t^D by dominated convergence, we have g_{D_n} ↑ g_D, and so the mean-value property extends to the limit. For any x ≠ y in D, choose a circular disk B around y with radius ε > 0 small enough that x ∉ B̄ ⊂ D. Then πε² g_D(x, y) = G_D 1_B(x) < ∞ by Lemma 24.10. Thus, by Lemma 24.3 even g_D(x, y) is harmonic in x ∈ D ∖ {y}.

To prove the last assertion, fix any y ∈ D, and assume that x → b ∈ ∂D. Choose a Greenian domain D′ ⊃ D with b ∈ D′. Since p_t^D ≤ p_t^{D′}, and
both p_t^{D′}(·, y) and g_{D′}(·, y) are continuous at b, whereas p_t^D(x, y) → 0 by Theorem 24.7, we get g_D(x, y) → 0 by Theorem 1.21. □

We proceed to show that a measure is determined by its Green potential whenever the latter is finite. An extension appears as part of Theorem 24.12. For convenience, we write

p_t^D μ(x) = ∫ p_t^D(x, y) μ(dy),  x ∈ D, t > 0.

Theorem 24.11 (uniqueness) If μ and ν are measures on a Greenian domain D ⊂ ℝ^d such that G_D μ = G_D ν < ∞, then μ = ν.

Proof: For any t > 0 we have

∫₀^t (p_s^D μ) ds = G_D μ − p_t^D G_D μ = G_D ν − p_t^D G_D ν = ∫₀^t (p_s^D ν) ds. (10)

By the symmetry of p^D, we further get for any measurable function f: D → ℝ₊

∫ f(x) p_s^D μ(x) dx = ∫ f(x) dx ∫ p_s^D(x, y) μ(dy)
                     = ∫ μ(dy) ∫ f(x) p_s^D(x, y) dx = ∫ p_s^D f(y) μ(dy).

Hence,

∫ f(x) dx ∫₀^t p_s^D μ(x) ds = ∫₀^t ds ∫ p_s^D f(y) μ(dy) = ∫ μ(dy) ∫₀^t p_s^D f(y) ds,

and similarly for ν. By (10) we obtain

∫ μ(dy) ∫₀^t p_s^D f(y) ds = ∫ ν(dy) ∫₀^t p_s^D f(y) ds. (11)

Assuming that f ∈ C_K⁺(D), we get p_s^D f → f as s → 0, and so t^{−1} ∫₀^t p_s^D f ds → f. If we can take limits inside the outer integrations in (11), we obtain μf = νf, which implies μ = ν since f is arbitrary.

To justify the argument, it suffices to show that sup_s p_s^D f is μ- and ν-integrable. Then conclude from Theorem 24.7 that f ≲ p_s^D(·, y) for fixed s > 0 and y ∈ D, and from Theorem 24.8 that f ≲ G_D f. The latter property yields p_s^D f ≲ p_s^D G_D f ≤ G_D f, and by the former property we get for any y ∈ D and s > 0

μ(G_D f) = ∫ G_D μ(x) f(x) dx ≲ p_s^D G_D μ(y) ≤ G_D μ(y) < ∞,

and similarly for ν. □
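The identification of g_D as an occupation density is easy to test by simulation in d = 1, where for D = (0, 1) one can check directly that g_D(x, y) = 2(x ∧ y)(1 − (x ∨ y)) solves ½g″ = −δ_y with zero boundary values. A Monte Carlo sketch (the step size, sample counts, and the particular test function are arbitrary illustrative choices):

```python
import numpy as np

def occupation_integral(f, x0, dt=1e-4, n_paths=2000, rng=None):
    """Monte Carlo estimate of E_x0 int_0^zeta f(X_t) dt for Brownian motion
    killed outside D = (0, 1); by the occupation-density identity this
    should match G_D f(x0) = int g_D(x0, y) f(y) dy."""
    rng = np.random.default_rng(rng)
    X = np.full(n_paths, float(x0))
    alive = np.ones(n_paths, dtype=bool)
    total = 0.0
    while alive.any():
        total += f(X[alive]).sum() * dt          # accumulate occupation time
        X[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
        alive &= (X > 0) & (X < 1)               # kill paths leaving D
    return total / n_paths
```

For instance, with f = 1_{(0.4, 0.6)} and x₀ = 0.5, the exact value is ∫_{0.4}^{0.6} g_D(0.5, y) dy = 0.09, and the simulation should reproduce it up to Monte Carlo and discretization error.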
Now let 𝓕_D and 𝓚_D denote the classes of closed and compact subsets of D, and write 𝓕̂_D and 𝓚̂_D for the subclasses of sets with regular boundary. For any B ∈ 𝓕_D we may introduce the associated hitting kernel

H_B^ζ(x, dy) = P_x{τ_B < ζ_D, X_{τ_B} ∈ dy},  x ∈ D.

Note that if X has initial distribution μ, then the hitting distribution of X^{ζ_D} in B equals μH_B^ζ = ∫ μ(dx) H_B^ζ(x, ·).

The next result solves the sweeping problem of classical potential theory. To avoid technical complications, here and below, we shall only consider subsets with regular boundary. In general, the irregular part of the boundary can be shown to be polar, in the sense of being a.s. avoided by a Brownian motion. Given this result, one can easily remove all regularity restrictions.

Theorem 24.12 (sweeping and hitting) For any Greenian domain D ⊂ ℝ^d and subset B ∈ 𝓕̂_D, let μ be a bounded measure on D with G_D μ < ∞ on B. Then μH_B^ζ is the unique measure ν on B with G_D μ = G_D ν on B.

For an electrostatic interpretation, assume that a grounded conductor B is inserted into a domain D with grounded boundary and charge distribution μ. Then a charge distribution −μH_B^ζ arises on B.

A lemma is needed for the proof. Here we define g_{D∖B}(x, y) = 0 whenever x or y lies in B.

Lemma 24.13 (fundamental identity) For any Greenian domain D ⊂ ℝ^d and subset B ∈ 𝓕_D, we have

g_D(x, y) = g_{D∖B}(x, y) + ∫ H_B^ζ(x, dz) g_D(z, y),  x, y ∈ D.

Proof: Write ζ = ζ_D and τ = τ_B. Subtracting relations (8) for the domains D and D ∖ B, and using the strong Markov property at τ together with Theorem 6.4, we get

p_t^D(x, y) − p_t^{D∖B}(x, y)
  = E_x[p_{t−τ}(X_τ, y); τ ≤ ζ ∧ t] − E_x[p_{t−ζ}(X_ζ, y); τ ≤ ζ < t]
  = E_x[p_{t−τ}(X_τ, y); τ ≤ ζ ∧ t] − E_x[E_{X_τ}[p_{t−τ−ζ}(X_ζ, y); ζ < t − τ]; τ ≤ ζ ∧ t]
  = E_x[p_{t−τ}^D(X_τ, y); τ ≤ ζ ∧ t].

Now integrate with respect to t to get

g_D(x, y) − g_{D∖B}(x, y) = E_x[g_D(X_τ, y); τ ≤ ζ] = ∫ H_B^ζ(x, dz) g_D(z, y). □

Proof of Theorem 24.12: Since ∂B is regular, we have H_B^ζ(x, ·)
= δ_x for all x ∈ B, and so by Lemma 24.13 we get for all x ∈ B and z ∈ D

∫ g_D(x, y) H_B^ζ(z, dy) = ∫ g_D(z, y) H_B^ζ(x, dy) = g_D(z, x).
Integrating with respect to μ(dz) gives G_D(μH_B^ζ)(x) = G_D μ(x), which shows that ν = μH_B^ζ has the stated property.

Now consider any measure ν on B with G_D μ = G_D ν on B. Noting that g_{D∖B}(x, ·) = 0 on B whereas H_B^ζ(x, ·) is supported by B, we get by Lemma 24.13, for any x ∈ D,

G_D ν(x) = ∫ ν(dz) g_D(z, x) = ∫ ν(dz) ∫ g_D(z, y) H_B^ζ(x, dy)
         = ∫ H_B^ζ(x, dy) G_D ν(y) = ∫ H_B^ζ(x, dy) G_D μ(y).

Thus, μ determines G_D ν on D, and so ν is unique by Theorem 24.11. □

Let us now turn to the classical equilibrium problem. For any K ∈ 𝓚_D we introduce the last exit or quitting time

γ_K^D = sup{t < ζ_D; X_t ∈ K}

and the associated quitting kernel

L_K^D(x, dy) = P_x{γ_K^D > 0; X(γ_K^D) ∈ dy}.

Theorem 24.14 (equilibrium measure and quitting, Chung) For any Greenian domain D ⊂ ℝ^d and subset K ∈ 𝓚_D, there exists a measure μ_K^D on ∂K such that

L_K^D(x, dy) = g_D(x, y) μ_K^D(dy),  x ∈ D. (12)

Furthermore, μ_K^D is diffuse when d ≥ 2, and if K ∈ 𝓚̂_D, then μ_K^D is the unique measure μ on K satisfying G_D μ = 1 on K.

Here μ_K^D is called the equilibrium measure of K relative to D, and its total mass C_K^D is called the capacity of K in D. For an electrostatic interpretation, assume that a conductor K with potential 1 is inserted into a domain D with grounded boundary. Then a charge distribution μ_K^D arises on the boundary of K.

Proof of Theorem 24.14: Write γ = γ_K^D, and define

l_ε(x) = ε^{−1} P_x{0 < γ ≤ ε},  ε > 0.

Using Fubini's theorem, the simple Markov property, and dominated convergence as ε → 0, we get for any f ∈ C_b(D) and x ∈ D

G_D(f l_ε)(x) = E_x ∫₀^ζ f(X_t) l_ε(X_t) dt
  = ε^{−1} ∫₀^∞ E_x[f(X_t) P_{X_t}{0 < γ ≤ ε}; t < ζ] dt
  = ε^{−1} ∫₀^∞ E_x[f(X_t); t < γ ≤ t + ε] dt
  = ε^{−1} E_x ∫_{(γ−ε)⁺}^{γ} f(X_t) dt
  → E_x[f(X_γ); γ > 0] = L_K^D f(x).
If f has compact support, then for each x we may replace f by the bounded, continuous function f/g_D(x, ·) to get as ε → 0

∫ f(y) l_ε(y) dy → ∫ L_K^D(x, dy) f(y) / g_D(x, y). (13)

Since the left-hand side is independent of x, the same thing is true for the measure

μ_K^D(dy) = L_K^D(x, dy) / g_D(x, y). (14)

If d = 1, we have g_D(x, x) < ∞, and (14) is trivially equivalent to (12). If instead d ≥ 2, then singletons are polar, and so the measure L_K^D(x, ·) is diffuse, which implies the same property for μ_K^D. Thus, (12) and (14) are again equivalent. We may further conclude from the continuity of X that L_K^D(x, ·), and then also μ_K^D, is supported by ∂K.

Integrating (12) over D yields

P_x{τ_K < ζ_D} = G_D μ_K^D(x),  x ∈ D,

and so for K ∈ 𝓚̂_D we get G_D μ_K^D = 1 on K. If ν is another measure on K with G_D ν = 1 on K, then ν = μ_K^D by the uniqueness part of Theorem 24.12. □

The next result relates the equilibrium measures and capacities for different sets K ∈ 𝓚̂_D.

Proposition 24.15 (consistency) For any Greenian domain D ⊂ ℝ^d and subsets K ⊂ B in 𝓚̂_D, we have

μ_K^D = μ_B^D H_K^ζ = μ_B^D L_K^D, (15)
C_K^D = ∫ P_x{τ_K < ζ_D} μ_B^D(dx). (16)

Proof: By Theorem 24.12 and the defining properties of μ_B^D and μ_K^D, we have on K

G_D(μ_B^D H_K^ζ) = G_D μ_B^D = 1 = G_D μ_K^D,

and so μ_B^D H_K^ζ = μ_K^D by the same result. To prove the second relation in (15), we note by Theorem 24.14 that, for any A ∈ 𝓑(K),

μ_B^D L_K^D(A) = ∫ μ_B^D(dx) ∫_A g_D(x, y) μ_K^D(dy) = ∫_A G_D μ_B^D(y) μ_K^D(dy) = μ_K^D(A),

since G_D μ_B^D = 1 on A ⊂ B. Finally, (15) implies (16), since H_K^ζ(x, K) = P_x{τ_K < ζ_D}. □

Some basic properties of capacities and equilibrium measures follow immediately from Proposition 24.15. To explain the terminology, fix any space
S along with a class of subsets 𝒰, closed under finite unions. For any function h: 𝒰 → ℝ and sets U, U₁, U₂, … ∈ 𝒰, we recursively define the differences

Δ_{U₁} h(U) = h(U ∪ U₁) − h(U),
Δ_{U₁,…,U_n} h(U) = Δ_{U_n} {Δ_{U₁,…,U_{n−1}} h(U)},  n > 1,

where the difference Δ_{U_n} in the last formula is taken with respect to U. Note that the higher-order differences Δ_{U₁,…,U_n} are invariant under permutations of U₁, …, U_n. We say that h is alternating or completely monotone if

(−1)^{n+1} Δ_{U₁,…,U_n} h(U) ≥ 0,  n ∈ ℕ, U, U₁, U₂, … ∈ 𝒰.

Corollary 24.16 (dependence on conductor, Choquet) For any Greenian domain D ⊂ ℝ^d, the capacity C_K^D is an alternating function of K ∈ 𝓚̂_D. Furthermore, μ_{K_n}^D →ʷ μ_K^D as K_n ↑ K or K_n ↓ K in 𝓚̂_D.

Proof: Let ψ denote the path of X^ζ, regarded as a random closed set in D. Writing

h_x(K) = P_x{ψK ≠ ∅} = P_x{τ_K < ζ},  x ∈ D ∖ K,

we get by induction

(−1)^{n+1} Δ_{K₁,…,K_n} h_x(K) = P_x{ψK = ∅, ψK₁ ≠ ∅, …, ψK_n ≠ ∅} ≥ 0,

and the first assertion follows by Proposition 24.15 with K ⊂ B°.

To prove the last assertion, we note that trivially τ_{K_n} ↓ τ_K when K_n ↑ K, and that τ_{K_n} ↑ τ_K when K_n ↓ K since the K_n are closed. In the latter case we also note that ⋂_n {τ_{K_n} < ζ} = {τ_K < ζ} by compactness. Thus, in both cases H_{K_n}^ζ(x, ·) →ʷ H_K^ζ(x, ·) for all x ∈ D ∖ ⋃_n K_n, and by dominated convergence in Proposition 24.15 with B° ⊃ ⋃_n K_n we get μ_{K_n}^D →ʷ μ_K^D. □

The next result solves an equilibrium problem involving two conductors.

Corollary 24.17 (condenser theorem) For any disjoint sets B ∈ 𝓕̂_D and K ∈ 𝓚̂_D, there exists a unique signed measure ν on B ∪ K with G_D ν = 0 on B and G_D ν = 1 on K, namely

ν = μ_K^{D∖B} − μ_K^{D∖B} H_B^ζ.

Proof: Applying Theorem 24.14 to the domain D ∖ B with subset K, we get ν = μ_K^{D∖B} on K, and then ν = −μ_K^{D∖B} H_B^ζ on B by Theorem 24.12. □

The symmetry between hitting and quitting kernels in Proposition 24.15 may be extended to an invariance under time reversal of the whole process.
More precisely, putting γ = γ_K^D, we may relate the stopped process X_t^γ = X_{γ∧t} to its reversal X̃_t^γ = X_{(γ−t)⁺}. For convenience, we write
P_μ = ∫ P_x μ(dx) and refer to the induced measures as distributions, even when μ is not normalized.

Theorem 24.18 (time reversal) Given a Greenian domain D ⊂ ℝ^d and a set K ∈ 𝓚̂_D, put γ = γ_K^D and μ = μ_K^D. Then X^γ =ᵈ X̃^γ under P_μ.

Proof: Let P̂_x and Ê_x refer to the process X^ζ. Fix any times 0 = t₀ < t₁ < ⋯ < t_n, and write s_k = t_n − t_k and h_k = t_k − t_{k−1}. For any continuous functions f₀, …, f_n with compact supports in D, we define

f_ε(x) = Ê_x ∏_{k≥0} f_k(X_{s_k}) l_ε(X_{t_n}) = Ê_x ∏_{k≥1} f_k(X_{s_k}) Ê_{X_{s₁}}(f₀ l_ε)(X_{t₁}),

where the last equality holds by the Markov property at s₁. Proceeding as in the proof of Theorem 24.14, we get

∫ (f_ε G_D μ)(x) dx = ∫ G_D f_ε(y) μ(dy) → E_μ ∏_k f_k(X̃_{t_k}^γ) 1{γ > t_n}. (17)

On the other hand, (13) shows that the measure l_ε(x) dx tends vaguely to μ, and so by Theorem 24.7

Ê_x(f₀ l_ε)(X_{t₁}) = ∫ p_{t₁}^D(x, y)(f₀ l_ε)(y) dy → ∫ p_{t₁}^D(x, y) f₀(y) μ(dy).

Using dominated convergence, Fubini's theorem, Proposition 8.2, Theorem 24.7, and the relation G_D μ(x) = P_x{γ > 0}, we obtain

∫ (f_ε G_D μ)(x) dx → ∫ G_D μ(x) dx ∫ f₀(y) μ(dy) Ê_x ∏_{k≥1} f_k(X_{s_k}) p_{t₁}^D(X_{s₁}, y)
  = ∫ f₀(x₀) μ(dx₀) ∫ ⋯ ∫ G_D μ(x_n) ∏_{k≥1} p_{h_k}^D(x_{k−1}, x_k) f_k(x_k) dx_k
  = Ê_μ ∏_k f_k(X_{t_k}) G_D μ(X_{t_n}) = Ê_μ ∏_k f_k(X_{t_k}) 1{γ > t_n}.

Comparing with (17), we see that X^γ and X̃^γ have the same finite-dimensional distributions. □

We may now extend Proposition 24.15 to the case of possibly different Greenian domains D ⊂ D′. Fixing any K ∈ 𝓚̂_D, we recursively define the optional times

τ_j = γ_{j−1} + τ_K ∘ θ_{γ_{j−1}},  γ_j = τ_j + γ_K^D ∘ θ_{τ_j},  j ≥ 1,

starting with γ₀ = 0. In other words, τ_k and γ_k are the times of hitting or quitting K during the k-th excursion in D that reaches K, prior to the exit
time ζ_{D′}. The generalized hitting and quitting kernels are given by

H_K^{D,D′}(x, ·) = E_x ∑_k δ_{X(τ_k)},  L_K^{D,D′}(x, ·) = E_x ∑_k δ_{X(γ_k)},

where the summations extend over all k ∈ ℕ with τ_k < ∞.

Theorem 24.19 (extended consistency relations) Let D ⊂ D′ be Greenian domains in ℝ^d with regular compact subsets K ⊂ K′. Then

μ_K^D = μ_{K′}^{D′} H_K^{D,D′} = μ_{K′}^{D′} L_K^{D,D′}. (18)

Proof: Define l_ε(x) = ε^{−1} P_x{γ₁ ∈ (0, ε]}. Proceeding as in the proof of Theorem 24.14, we get for any x ∈ D′ and f ∈ C_b(D′)

G_{D′}(f l_ε)(x) = ε^{−1} E_x ∫₀^{ζ_{D′}} f(X_t) 1{γ₁ ∘ θ_t ∈ (0, ε]} dt → L_K^{D,D′} f(x).

If f has compact support in D, we may conclude as before that

∫ f(y) μ_K^D(dy) ← ∫ (f l_ε)(y) dy → ∫ L_K^{D,D′}(x, dy) f(y) / g_{D′}(x, y),

and so

L_K^{D,D′}(x, dy) = g_{D′}(x, y) μ_K^D(dy).

Integrating with respect to μ_{K′}^{D′}, and noting that G_{D′} μ_{K′}^{D′} = 1 on K′ ⊃ K, we obtain the second expression for μ_K^D in (18).

To deduce the first expression, we note that H_{K′}^ζ H_K^{D,D′} = H_K^{D,D′} by the strong Markov property at τ_{K′}. Combining with the second expression in (18) and using Theorem 24.18 and Proposition 24.15, we get

μ_K^D = μ_{K′}^{D′} L_K^{D,D′} = μ_{K′}^{D′} H_K^{D,D′}. □

The last result enables us to study the equilibrium measure μ_K^D and capacity C_K^D as functions of both D and K. In particular, we obtain the following continuity and monotonicity properties.

Corollary 24.20 (dependence on domain) For any regular, compact set K ⊂ ℝ^d, the measure μ_K^D is nonincreasing and continuous from above as a function of the Greenian domain D ⊃ K.

Proof: The monotonicity is clear from (18) with K = K′, since H_K^{D,D′}(x, ·) ≥ δ_x for x ∈ K ⊂ D ⊂ D′. It remains to prove that C_K^D is continuous from above and below in D for fixed K. By dominated convergence it is then enough to show that κ_{D_n} → κ_D, where κ_D = sup{j ≥ 0; τ_j < ∞} is the number of D-excursions hitting K.
Assuming D_n ↑ D, we need to show that if X_s, X_t ∈ K and X ∈ D on [s, t], then X ∈ D_n on [s, t] for sufficiently large n. But this is clear from the compactness of the path on the interval [s, t]. If instead D_n ↓ D, we need to show for any r < s < t with X_r, X_t ∈ K and X_s ∉ D that X_s ∉ D_n for sufficiently large n. But this is obvious. □
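In d = 1 the equilibrium potential G_D μ_K^D(x) = P_x{τ_K < ζ_D} of Theorem 24.14 can be computed by hand: for D = (0, 1), K = [a, b], and x ∈ (b, 1), the linearity of one-dimensional harmonic functions (gambler's ruin) gives P_x{τ_K < ζ_D} = (1 − x)/(1 − b). A simulation sketch (step size, seed, and the particular K and x are arbitrary illustrative choices):

```python
import numpy as np

def hit_before_exit(x0, a, b, dt=1e-4, n_paths=4000, rng=None):
    """Estimate P_x0{tau_K < zeta_D}: Brownian motion started at x0 in
    D = (0, 1) reaches K = [a, b] before leaving D (d = 1, Euler steps)."""
    rng = np.random.default_rng(rng)
    X = np.full(n_paths, float(x0))       # x0 assumed outside K
    hit = np.zeros(n_paths, dtype=bool)
    alive = np.ones(n_paths, dtype=bool)
    while alive.any():
        X[alive] += np.sqrt(dt) * rng.standard_normal(alive.sum())
        inK = alive & (X >= a) & (X <= b)  # paths reaching K
        hit |= inK
        alive &= ~inK & (X > 0) & (X < 1)  # others die on leaving D
    return hit.mean()
```

For x₀ = 0.8 and K = [0.45, 0.55], the exact value is (1 − 0.8)/(1 − 0.55) = 4/9, and the estimate should agree up to Monte Carlo and discretization error.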
We proceed to show how Green capacities can be expressed in terms of random sets. Let χ denote the identity mapping on 𝓕_D. Given any measure ν on 𝓕_D ∖ {∅} with ν{χK ≠ ∅} < ∞ for all K ∈ 𝓚_D, we may introduce a Poisson process η on 𝓕_D ∖ {∅} with intensity measure ν and form the associated random closed set φ = ⋃{F; η{F} > 0} in D. Letting π_ν denote the distribution of φ, we note that

π_ν{χK = ∅} = P{η{χK ≠ ∅} = 0} = exp(−ν{χK ≠ ∅}),  K ∈ 𝓚_D.

Theorem 24.21 (Green capacities and random sets, Choquet) For any Greenian domain D ⊂ ℝ^d, there exists a unique measure ν on 𝓕_D ∖ {∅} such that

C_K^D = ν{χK ≠ ∅} = −log π_ν{χK = ∅},  K ∈ 𝓚̂_D.

Proof: Let ψ denote the path of X^ζ in D. Choose sets K_n ↑ D in 𝓚̂_D with K_n ⊂ K°_{n+1} for all n, and put μ_n = μ_{K_n}^D, ψ_n = ψK_n, and χ_n = χK_n. Define

ν_n^p = ∫ P_x{ψ_p ∈ ·, ψ_n ≠ ∅} μ_p(dx),  n ≤ p, (19)

and conclude by the strong Markov property and Proposition 24.15 that

ν_n^q{χ_p ∈ ·, χ_m ≠ ∅} = ν_m^p,  m ≤ n ≤ p ≤ q. (20)

By Corollary 6.15 there exist some measures ν_n on 𝓕_D, n ∈ ℕ, satisfying

ν_n{χ_p ∈ ·} = ν_n^p,  n ≤ p, (21)

and from (20) we note that

ν_n{·, χ_m ≠ ∅} = ν_m,  m ≤ n. (22)

Hence, the measures ν_n agree on {χ_m ≠ ∅} for n ≥ m, and so we may define ν = sup_n ν_n. By (22) we have ν{·, χ_n ≠ ∅} = ν_n for all n. Assuming K ∈ 𝓚̂_D with K ⊂ K_n, we conclude from (19), (21), and Proposition 24.15 that

ν{χK ≠ ∅} = ν_n{χK ≠ ∅} = ν_n^n{χK ≠ ∅}
  = ∫ P_x{ψ_n K ≠ ∅} μ_n(dx) = ∫ P_x{τ_K < ζ} μ_n(dx) = C_K^D.

The uniqueness of ν is clear by a monotone class argument. □

The representation of capacities in terms of random sets will now be extended to the abstract setting of alternating set functions. As in Chapter 16, we may then fix an lcscH space S with Borel σ-field 𝒮, open sets 𝒢, closed sets 𝓕, and compacts 𝓚. Write 𝒮̂ = {B ∈ 𝒮; B̄ ∈ 𝓚}, and recall that a class 𝒰 ⊂ 𝒮̂ is said to be separating if for any K ∈ 𝓚 and G ∈ 𝒢 with K ⊂ G there exists some U ∈ 𝒰 with K ⊂ U ⊂ G.
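The exponential avoidance formula π_ν{χK = ∅} = exp(−ν{χK ≠ ∅}) above is easy to check for a concrete Poisson-based random set, e.g. a planar Boolean model of balls of fixed radius r centered at a homogeneous Poisson process of rate λ. For K a ball of radius R at the origin, and a sampling window large enough that edge effects vanish, the avoidance probability is exp(−λπ(r + R)²). A sketch with illustrative parameters:

```python
import numpy as np

def boolean_model_avoidance(lam, r, R, n_sim=4000, half=2.0, rng=None):
    """Estimate P{phi K = empty} for the Boolean model phi: the union of
    balls of radius r centered at a Poisson process of rate lam in the
    window [-half, half]^2, with K the ball of radius R at the origin.
    A ball hits K iff its center lies within distance r + R of 0, so the
    exact value (no edge effects) is exp(-lam * pi * (r + R)**2)."""
    rng = np.random.default_rng(rng)
    hits = 0
    area = (2 * half) ** 2
    for _ in range(n_sim):
        n = rng.poisson(lam * area)
        centers = rng.uniform(-half, half, size=(n, 2))
        if n and (np.hypot(centers[:, 0], centers[:, 1]) < r + R).any():
            hits += 1
    return 1 - hits / n_sim
```

With λ = 0.3, r = 0.2, R = 0.3, the exact avoidance probability is exp(−0.3π · 0.25) ≈ 0.79, and the simulation should match it up to Monte Carlo error.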
24. Connections with PDEs and Potential Theory 487

For any nondecreasing function h on a separating class 𝒰 ⊂ Ŝ, we define the associated inner and outer capacities h° and ĥ by

h°(G) = sup{h(U); U ∈ 𝒰, Ū ⊂ G},  G ∈ 𝒢,
ĥ(K) = inf{h(U); U ∈ 𝒰, U° ⊃ K},  K ∈ 𝒦.

Note that the formulas remain valid with 𝒰 replaced by any separating subclass. For any random closed set φ in S, the associated hitting function h is given by h(B) = P{φ ∩ B ≠ ∅} for all B ∈ 𝒮.

Theorem 24.22 (alternating functions and random sets, Choquet) The hitting function h of a random closed set in S is alternating with ĥ = h on 𝒦 and h = h° on 𝒢. Conversely, given a separating class 𝒰 ⊂ Ŝ, closed under finite unions, and an alternating function p: 𝒰 → [0,1] with p(∅) = 0, there exists a random closed set with hitting function h such that ĥ = p̂ on 𝒦 and h = p° on 𝒢.

The algebraic part of the construction is clarified by the following lemma.

Lemma 24.23 (discrete case) Assume 𝒰 ⊂ Ŝ to be finite and closed under unions, and let h: 𝒰 → [0,1] be alternating with h(∅) = 0. Then there exists a point process ξ on S such that P{ξU > 0} = h(U) for all U ∈ 𝒰.

Proof: The statement is obvious when 𝒰 = {∅}. Proceeding by induction, assume the assertion to be true when 𝒰 is generated by up to n−1 sets, and consider a class 𝒰 generated by n nonempty sets B_1, ..., B_n. By scaling we may assume that h(B_1 ∪ ... ∪ B_n) = 1. For each j ∈ {1, ..., n}, let 𝒰_j be the class of unions formed by the sets B_i ∖ B_j, i ≠ j, and define

h_j(U) = Δ_U h(B_j) = h(B_j ∪ U) − h(B_j),  U ∈ 𝒰_j.

Then each h_j is again alternating with h_j(∅) = 0, and so the induction hypothesis ensures the existence of some point process ξ_j on ⋃_i B_i ∖ B_j with hitting function h_j. Note that h_j remains the hitting function of ξ_j on all of 𝒰. Let us further introduce a point process ξ_{n+1} with

P ⋂_i {ξ_{n+1} B_i > 0} = (−1)^{n+1} Δ_{B_1,...,B_n} h(∅).
For 1 ≤ j ≤ n+1, let ν_j denote the restriction of L(ξ_j) to the set A_j = ⋂_{i<j} {μ; μB_i > 0}, and put ν = Σ_j ν_j. We may take ξ to be the canonical point process on S with distribution ν. To see that ξ has hitting function h, we note that for any U ∈ 𝒰 and j ≤ n,

ν_j{μ; μU > 0} = P{ξ_j B_1 > 0, ..., ξ_j B_{j−1} > 0, ξ_j U > 0}
= (−1)^{j+1} Δ_{B_1,...,B_{j−1},U} h_j(∅)
= (−1)^{j+1} Δ_{B_1,...,B_{j−1},U} h(B_j).
It remains to show that, for any U ∈ 𝒰 ∖ {∅},

Σ_{j≤n} (−1)^{j+1} Δ_{B_1,...,B_{j−1},U} h(B_j) + (−1)^{n+1} Δ_{B_1,...,B_n} h(∅) = h(U).

This is clear from the fact that

Δ_{B_1,...,B_{j−1},U} h(B_j) = Δ_{B_1,...,B_j,U} h(∅) + Δ_{B_1,...,B_{j−1},U} h(∅). □

Proof of Theorem 24.22: The direct assertion can be proved in the same way as Corollary 24.16. Conversely, let 𝒰 and p be as stated. By Lemma A2.7 we may assume 𝒰 to be countable, say 𝒰 = {U_1, U_2, ...}. For each n, let 𝒰_n be the class of unions formed from U_1, ..., U_n. By Lemma 24.23 there exist some point processes ξ_1, ξ_2, ... on S such that

P{ξ_n U > 0} = p(U),  U ∈ 𝒰_n, n ∈ ℕ.

The space ℱ is compact by Theorem A2.5, and so by Theorem 16.3 there exists some random closed set φ in S such that supp ξ_n →d φ along a subsequence N′ ⊂ ℕ. Writing h_n and h for the associated hitting functions, we get

h(B°) ≤ liminf_{n∈N′} h_n(B) ≤ limsup_{n∈N′} h_n(B) ≤ h(B̄),  B ∈ Ŝ,

and in particular

h(U°) ≤ p(U) ≤ h(Ū),  U ∈ 𝒰.

Using the strengthened separation property K ⊂ U° ⊂ Ū ⊂ G, we may easily conclude that h = p° on 𝒢 and ĥ = p̂ on 𝒦. □

Exercises

1. For a domain D ⊂ ℝ² and point x ∈ ∂D, assume that x ∈ I ⊂ D^c for some line segment I. Show that x is regular for D^c. (Hint: Consider the windings around x of Brownian motion starting at x, using the strong Markov property and Brownian scaling.)

2. Compute the Newtonian potential kernel g = g^D when D = ℝ^d with d ≥ 3, and check by direct computation that g(x,y) is harmonic in x ≠ y for fixed y.

3. For any domain D ⊂ ℝ^d, show that p_t(x,y) − p_t^D(x,y) → 0 as t → 0, uniformly for x ≠ y in a compact set K ⊂ D. Also prove the same convergence as inf{|x|; x ∉ D} → ∞, uniformly for bounded t > 0 and x ≠ y. (Hint: Note that p_t(x,y) is uniformly bounded for |x − y| > ε > 0, and use (8).)

4. Given a domain D ⊂ ℝ^d with d ≥ 3, show that g(x,y) − g^D(x,y) is uniformly bounded for x ≠ y in a compact set K ⊂ D.
Also show that the difference tends to 0 as inf{|x|; x ∉ D} → ∞, uniformly for x ≠ y in K. (Hint: Use Lemma 24.13.)
5. Show that the equilibrium measure μ_K is restricted to the outer boundary of K and agrees for all sets K with the same outer boundary. (Here the outer boundary of K consists of all points x ∈ ∂K that can be connected to D^c or ∞ by a path through K^c.) Prove a corresponding statement for the sweeping measure ν in Theorem 24.12.

6. For any Greenian domain D ⊂ ℝ^d, disjoint sets K_1, ..., K_n ∈ K_D, and constants p_1, ..., p_n ∈ ℝ, show that there exists a unique signed measure ν on ⋃_j K_j with G^D ν = p_j on K_j for all j. (Hint: Use Corollary 24.17 recursively.)

7. Show that if φ_1 and φ_2 are independent random sets with distributions π_{ν_1} and π_{ν_2}, then φ_1 ∪ φ_2 has distribution π_{ν_1+ν_2}.

8. Extend Theorem 24.22 to unbounded functions p. (Hint: Consider the restrictions to compact sets, and proceed as in Theorem 24.21.)
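In a finite toy model, both the avoidance formula preceding Theorem 24.21 and the additivity property of Exercise 7 can be checked by direct computation. The following sketch is purely illustrative: the site sets, the weights, and the function names are hypothetical choices, not part of the text.

```python
import math

# Hypothetical finite model: each measure charges a few closed sets,
# encoded as frozensets of sites, with the indicated weights.
nu1 = {frozenset({1, 2}): 0.7, frozenset({2, 3}): 0.4}
nu2 = {frozenset({4}): 1.1}
K = {2, 4}

def avoidance_probability(nu, K):
    # A Poisson process with intensity nu puts an independent Poisson(nu[F])
    # number of points at each set F, and phi is the union of the charged
    # sets.  phi avoids K iff every F hitting K receives no points, so
    # pi_nu{phi ∩ K = ∅} = exp(-nu{F: F ∩ K ≠ ∅}).
    return math.exp(-sum(w for F, w in nu.items() if F & set(K)))

def combine(nu1, nu2):
    # The measure nu1 + nu2.
    out = dict(nu1)
    for F, w in nu2.items():
        out[F] = out.get(F, 0.0) + w
    return out

# Exercise 7: for independent phi1, phi2 with distributions pi_{nu1} and
# pi_{nu2}, the union avoids K iff both do, so its avoidance function is the
# product -- which is exactly the avoidance function of pi_{nu1 + nu2}.
lhs = avoidance_probability(nu1, K) * avoidance_probability(nu2, K)
rhs = avoidance_probability(combine(nu1, nu2), K)
assert abs(lhs - rhs) < 1e-12
```

The independence of the Poisson counts across sets is what turns the union of random sets into addition of intensity measures.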
Chapter 25

Predictability, Compensation, and Excessive Functions

Accessible and predictable times; natural and predictable processes; Doob–Meyer decomposition; quasi-left-continuity; compensation of random measures; excessive and superharmonic functions; additive functionals as compensators; Riesz decomposition

The purpose of this chapter is to present some fundamental, yet profound, extensions of the theory of martingales and optional times from Chapter 7. A basic role in the advanced theory is played by the notions of predictable times and processes, as well as by various decomposition theorems, the most important being the celebrated Doob–Meyer decomposition, a continuous-time counterpart of the elementary Doob decomposition of Lemma 7.10. Applying the Doob–Meyer decomposition to increasing processes and their associated random measures leads to the notion of a compensator, whose role is analogous to that of the quadratic variation for martingales. In particular, the compensator can be used to transform a fairly general point process into a Poisson process, much as a suitable time-change of a continuous martingale was shown in Chapter 18 to lead to a Brownian motion.

The chapter concludes with some applications to classical potential theory. To explain the main ideas, let f be an excessive function of Brownian motion X on ℝ^d. Then f(X) is a continuous supermartingale under P_x for every x, and so it has a Doob–Meyer decomposition M − A. Here A can be chosen to be a continuous additive functional (CAF) of X, and we obtain an associated Riesz decomposition f = U_A + h, where U_A denotes the potential of A and h is the greatest harmonic minorant of f.

The present material is related in many ways to topics from earlier chapters. Apart from the already mentioned connections, we shall occasionally require some knowledge of random measures and point processes from Chapter 12, of stable Lévy processes from Chapter 15, of stochastic calculus from Chapter 17, of Feller processes from Chapter 19, of additive functionals and their potentials from Chapter 22, and of Green potentials from Chapter 24. The notions and results of this chapter play a crucial role for the analysis of semimartingales and the construction of general stochastic integrals in Chapter 26.
All random objects in this chapter are assumed to be defined on some given probability space Ω with a right-continuous and complete filtration F. In the product space Ω × ℝ₊ we may introduce the predictable σ-field P, generated by all continuous, adapted processes on ℝ₊. The elements of P are called predictable sets, and the P-measurable functions on Ω × ℝ₊ are called predictable processes. Note that every predictable process is progressive. The following lemma provides some useful characterizations of the predictable σ-field.

Lemma 25.1 (predictable σ-field) The predictable σ-field is generated by each of the following classes of sets or processes:
(i) F_0 × ℝ₊ and the sets A × (t,∞) with A ∈ F_t, t ≥ 0;
(ii) F_0 × ℝ₊ and the intervals (τ,∞) for optional times τ;
(iii) the left-continuous, adapted processes.

Proof: Let P_1, P_2, and P_3 be the σ-fields generated by the classes in (i), (ii), and (iii), respectively. Since continuous functions are left-continuous, we have trivially P ⊂ P_3. To see that P_3 ⊂ P_1, it is enough to note that any left-continuous process X can be approximated by the processes

X^n_t = X_0 1_{[0,1]}(nt) + Σ_k X_{k/n} 1_{(k,k+1]}(nt),  t ≥ 0.

Next we obtain P_1 ⊂ P_2 by noting that the random time t_A = t·1_A + ∞·1_{A^c} is optional for any t ≥ 0 and A ∈ F_t. Finally, we may prove the relation P_2 ⊂ P by noting that, for any optional time τ, the process 1_{(τ,∞)} can be approximated by the continuous, adapted processes

X^n_t = (n(t − τ)₊) ∧ 1,  t ≥ 0. □

A random variable τ in [0,∞] is called a predictable time if it is announced by some optional times τ_n ↑ τ with τ_n < τ a.s. on {τ > 0} for all n. With any optional time τ we may associate the σ-field F_τ− generated by F_0 and the classes F_t ∩ {τ > t} for arbitrary t ≥ 0. The following result gives the basic properties of the σ-fields F_τ−. It is interesting to note the similarity with the results for the σ-fields F_τ in Lemma 7.1.
Lemma 25.2 (strict past) For any optional times σ and τ, we have
(i) F_σ ∩ {σ < τ} ⊂ F_τ− ⊂ F_τ;
(ii) if τ is predictable, then {σ < τ} ∈ F_σ− ∩ F_τ−;
(iii) if τ is predictable and announced by (τ_n), then ⋁_n F_{τ_n} = F_τ−.

Proof: (i) For any A ∈ F_σ we note that

A ∩ {σ < τ} = ⋃_{r∈Q₊} (A ∩ {σ < r} ∩ {r < τ}) ∈ F_τ−,

since the intersections on the right are generators of F_τ−. Hence, F_σ ∩ {σ < τ} ⊂ F_τ−. The second relation holds since each generator of F_τ− lies in F_τ.
(ii) Assuming that (τ_n) announces τ, we get by (i)

{τ ≤ σ} = {τ = 0} ∪ ⋂_n {τ_n < σ} ∈ F_σ−.

(iii) For any A ∈ F_{τ_n} we get by (i)

A = (A ∩ {τ_n < τ}) ∪ (A ∩ {τ_n = τ = 0}) ∈ F_τ−,

and so ⋁_n F_{τ_n} ⊂ F_τ−. Conversely, (i) yields for any t ≥ 0 and A ∈ F_t

A ∩ {τ > t} = ⋃_n (A ∩ {τ_n > t}) ∈ ⋁_n F_{τ_n−} ⊂ ⋁_n F_{τ_n},

which shows that F_τ− ⊂ ⋁_n F_{τ_n}. □

Next we examine the relationship between predictable processes and the σ-fields F_τ−. Similar results for progressive processes and the σ-fields F_τ were obtained in Lemma 7.5.

Lemma 25.3 (predictability and strict past)
(i) For any optional time τ and predictable process X, the random variable X_τ 1{τ < ∞} is F_τ−-measurable.
(ii) For any predictable time τ and F_τ−-measurable random variable α, the process X_t = α 1{τ ≤ t} is predictable.

Proof: (i) If X = 1_A 1_{(t,∞)} for some t ≥ 0 and A ∈ F_t, then clearly

{X_τ 1{τ < ∞} = 1} = A ∩ {t < τ < ∞} ∈ F_τ−.

We may now extend by a monotone class argument and subsequent approximation, first to arbitrary predictable indicator functions, and then to the general case.

(ii) We may clearly assume α to be integrable. Fixing an announcing sequence (τ_n) for τ, we define

X^n_t = E[α|F_{τ_n}] (1{0 < τ_n < t} + 1{τ_n = 0}),  t ≥ 0.

Then each X^n is left-continuous and adapted, hence predictable. Moreover, X^n → X on ℝ₊ a.s. by Theorem 7.23 and Lemma 25.2 (iii). □

By a totally inaccessible time we mean an optional time τ such that P{σ = τ < ∞} = 0 for every predictable time σ. An accessible time may then be defined as an optional time τ such that P{σ = τ < ∞} = 0 for every totally inaccessible time σ. For any random time τ, we introduce the associated graph

[τ] = {(t,ω) ∈ ℝ₊ × Ω; τ(ω) = t},

which allows us to express the previous condition on σ and τ as [σ] ∩ [τ] = ∅ a.s. Given any optional time τ and set A ∈ F_τ, the time τ_A = τ·1_A + ∞·1_{A^c} is again optional and is called the restriction of τ to A. We now consider a basic decomposition of optional times.
Related decompositions of increasing processes and martingales are given in Propositions 25.17 and 26.16. 
Proposition 25.4 (decomposition of optional times) For any optional time τ there exists an a.s. unique set A ∈ F_τ ∩ {τ < ∞} such that τ_A is accessible and τ_{A^c} is totally inaccessible. Furthermore, there exist some predictable times τ_1, τ_2, ... with [τ_A] ⊂ ⋃_n [τ_n] a.s.

Proof: Define

p = sup P ⋃_n {τ = τ_n < ∞},  (1)

where the supremum extends over all sequences of predictable times τ_n. Combining sequences such that the probability in (1) approaches p, we may construct a sequence (τ_n) for which the supremum is attained. For such a maximal sequence, we define A as the union in (1). To see that τ_A is accessible, let σ be totally inaccessible. Then [σ] ∩ [τ_n] = ∅ a.s. for every n, and so [σ] ∩ [τ_A] = ∅ a.s. If τ_{A^c} is not totally inaccessible, then P{τ_{A^c} = τ_0 < ∞} > 0 for some predictable time τ_0, which contradicts the maximality of τ_1, τ_2, .... This shows that A has the desired property.

To prove that A is a.s. unique, let B be another set with the stated properties. Then τ_{A∖B} and τ_{B∖A} are both accessible and totally inaccessible, and so τ_{A∖B} = τ_{B∖A} = ∞ a.s., which implies A = B a.s. □

We proceed to establish a version of the celebrated Doob–Meyer decomposition, a cornerstone of modern probability theory. By an increasing process we mean a nondecreasing, right-continuous, and adapted process A with A_0 = 0. We say that A is integrable if EA_∞ < ∞. Recall that all submartingales are assumed to be right-continuous. Local submartingales and locally integrable processes are defined by localization in the usual way.

Theorem 25.5 (decomposition of submartingales, Meyer, Doléans) A process X is a local submartingale iff it has a decomposition X = M + A, where M is a local martingale and A is a locally integrable, increasing, predictable process. In that case M and A are a.s. unique.
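The elementary Doob decomposition behind Theorem 25.5 can be made concrete in discrete time. The sketch below is an illustration only (the walk length N and all function names are ad hoc choices): it computes the predictable increasing part A_n = Σ_{k<n} E[X_{k+1} − X_k | F_k] of the submartingale X_n = S_n² of a simple random walk by exact averaging over all paths sharing a prefix, and checks that the compensator is A_n = n and that M = X − A has conditionally centered increments.

```python
from itertools import product

N = 6
paths = list(product([-1, 1], repeat=N))   # all 2^N equally likely paths

def X(path, n):
    """The submartingale X_n = S_n^2 of the simple random walk S."""
    return sum(path[:n]) ** 2

def cond_exp(prefix, k, f):
    """E[f | F_k] given the first k steps, by exact averaging over all
    continuations that share the prefix."""
    cont = [p for p in paths if p[:k] == prefix]
    return sum(f(p) for p in cont) / len(cont)

def A(path, n):
    """Predictable part A_n = sum_{k<n} E[X_{k+1} - X_k | F_k]."""
    return sum(cond_exp(path[:k], k, lambda p, k=k: X(p, k + 1) - X(p, k))
               for k in range(n))

# Since E[(S_k + s)^2 - S_k^2 | F_k] = E[2 S_k s + 1 | F_k] = 1, the
# compensator is A_n = n on every path ...
for p in paths:
    assert all(abs(A(p, n) - n) < 1e-9 for n in range(N + 1))

# ... and M = X - A is a martingale: its increments are centered given F_k.
for k in range(N):
    for p in paths:
        inc = cond_exp(p[:k], k,
                       lambda q, k=k: (X(q, k + 1) - (k + 1)) - (X(q, k) - k))
        assert abs(inc) < 1e-9
```

The continuous-time theorem replaces this finite conditioning by the dyadic approximation used in the proof of Lemma 25.7 below.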
The process A in the statement is often referred to as the compensator of X, especially when X is increasing. Several proofs of this result are known, most of which seem to require the deep section theorems. Here we give a relatively short and elementary proof, based on Dunford's weak compactness criterion and an approximation of totally inaccessible times. For convenience, we divide the proof into several lemmas.

Let (D) denote the class of measurable processes X such that the family {X_τ} is uniformly integrable, where τ ranges over the set of all finite optional times. By the following result it is enough to consider class (D) submartingales.

Lemma 25.6 (uniform integrability) Any local submartingale X with X_0 = 0 is locally of class (D).

Proof: First reduce to the case when X is a true submartingale. Then introduce for each n the optional time τ = n ∧ inf{t ≥ 0; |X_t| > n}. Here
|X^τ| ≤ n ∨ |X_τ|, which is integrable by Theorem 7.29, and so X^τ is of class (D). □

An increasing process A is said to be natural if it is integrable and such that E∫₀^∞ ΔM_t dA_t = 0 for any bounded martingale M. As a crucial step in the proof of Theorem 25.5, we may establish the following preliminary decomposition, where the compensator A is shown to be natural rather than predictable.

Lemma 25.7 (Meyer) Any submartingale X of class (D) has a decomposition X = M + A, where M is a uniformly integrable martingale and A is a natural, increasing process.

Proof (Rao): We may assume that X_0 = 0. Introduce the n-dyadic times t^n_k = k2^{−n}, k ∈ ℤ₊, and define for any process Y the associated differences Δ^n_k Y = Y_{t^n_{k+1}} − Y_{t^n_k}. Let

A^n_t = Σ_{k<2^n t} E[Δ^n_k X | F_{t^n_k}],  t ≥ 0, n ∈ ℕ,

and note that M^n = X − A^n is a martingale on the n-dyadic set. Writing τ^n_r = inf{t; A^n_t > r} for n ∈ ℕ and r > 0, we get by optional sampling, for any n-dyadic time t,

E[A^n_t − A^n_t ∧ r] ≤ E[A^n_t − A^n_{τ^n_r ∧ t}]
= E[X_t − X_{τ^n_r ∧ t}]
= E[X_t − X_{τ^n_r ∧ t}; A^n_t > r].  (2)

By the martingale property and uniform integrability, we further obtain

2r P{A^n_t > 2r} ≤ E[A^n_t; A^n_t > 2r],  r P{A^n_t > r} ≤ EA^n_t = EX_t < ∞,

and so the probability on the left tends to zero as r → ∞, uniformly in t and n. Since the random variables X_t − X_{τ^n_r ∧ t} are uniformly integrable by (D), the same property holds for the variables A^n_t by (2) and Lemma 4.10. In particular, the sequence (A^n_∞) is uniformly integrable, and each M^n is a uniformly integrable martingale.

By Lemma 4.13 there exists some random variable α ∈ L¹(F_∞) such that A^n_∞ → α weakly in L¹ along some subsequence N′ ⊂ ℕ. Define M_t = E[X_∞ − α|F_t], A = X − M, and note that A_∞ = α a.s. by Theorem 7.23. For any dyadic t and bounded random variable ξ, we get by the martingale and self-adjointness properties

E ξ(A^n_t − A_t) = E ξ(M_t − M^n_t) = E ξ E[M_∞ − M^n_∞|F_t]
= E (M_∞ − M^n_∞) E[ξ|F_t] = E (A^n_∞ − α) E[ξ|F_t] → 0,

as n → ∞ along N′. Thus, A^n_t → A_t weakly in L¹ for dyadic t.
In particular, we get for any dyadic s < t

0 ≤ E[A^n_t − A^n_s; A_t − A_s < 0] → E[(A_t − A_s) ∧ 0] ≤ 0.
Thus, the last expectation vanishes, and therefore A_t ≥ A_s a.s. By right-continuity it follows that A is a.s. nondecreasing. Also note that A_0 = 0 a.s. since A^n_0 = 0 for all n.

To see that A is natural, consider any bounded martingale N, and conclude by Fubini's theorem and the martingale properties of N and A^n − A = M − M^n that

E N_∞ A^n_∞ = Σ_k E N_∞ Δ^n_k A^n = Σ_k E N_{t^n_k} Δ^n_k A^n
= Σ_k E N_{t^n_k} Δ^n_k A = E Σ_k N_{t^n_k} Δ^n_k A.

Now use weak convergence on the left and dominated convergence on the right, and combine with Fubini's theorem and the martingale property of N to get

E ∫₀^∞ N_{t−} dA_t = E N_∞ A_∞ = Σ_k E N_∞ Δ^n_k A = Σ_k E N_{t^n_{k+1}} Δ^n_k A
= E Σ_k N_{t^n_{k+1}} Δ^n_k A → E ∫₀^∞ N_t dA_t.

Hence, E∫₀^∞ ΔN_t dA_t = 0, as required. □

To complete the proof of Theorem 25.5, it remains to show that the compensator A in the last lemma is predictable. This will be inferred from the following ingenious approximation of totally inaccessible times.

Lemma 25.8 (uniform approximation, Doob) For any totally inaccessible time τ, put τ_n = 2^{−n}[2^n τ], and let X^n be a right-continuous version of the process P[τ_n ≤ t | F_t]. Then

lim_{n→∞} sup_{t≥0} |X^n_t − 1{τ ≤ t}| = 0 a.s.  (3)

Proof: Since τ_n ↑ τ, we may assume that X¹_t ≥ X²_t ≥ ... ≥ 1{τ ≤ t} for all t ≥ 0. Then X^n_t = 1 for t ∈ [τ,∞), and on the set {τ = ∞} we have X^n_t ≤ P[τ < ∞|F_t] → 0 a.s. as t → ∞ by Theorem 7.23. Thus, sup_n |X^n_t − 1{τ ≤ t}| → 0 a.s. as t → ∞. To prove (3), it is then enough to show for every ε > 0 that the optional times

σ_n = inf{t ≥ 0; X^n_t − 1{τ ≤ t} > ε},  n ∈ ℕ,

tend a.s. to infinity. The σ_n are clearly nondecreasing, and we denote their limit by σ. Note that either σ_n < τ or σ_n = ∞ for each n. By optional sampling, Theorem 6.4, and Lemma 7.1, we have

X^n_σ 1{σ < ∞} = P[τ_n ≤ σ < ∞|F_σ] → P[τ ≤ σ < ∞|F_σ] = 1{τ ≤ σ < ∞}.

Hence, X^n_σ → 1{τ ≤ σ} a.s. on {σ < ∞}, and so by right-continuity we have on this set σ_n < σ for large enough n.
Thus, σ is predictable and announced by the times σ_n ∧ n.
Next apply the optional sampling and disintegration theorems to the optional times σ_n, to obtain

ε P{σ < ∞} ≤ ε P{σ_n < ∞} ≤ E[X^n_{σ_n}; σ_n < ∞]
= P{τ_n ≤ σ_n < ∞} = P{τ_n ≤ σ_n < τ < ∞} → P{τ = σ < ∞} = 0,

where the last equality holds since τ is totally inaccessible. Thus, σ = ∞ a.s. □

It is now easy to see that A has only accessible jumps.

Lemma 25.9 (accessibility) For any natural increasing process A and totally inaccessible time τ, we have ΔA_τ = 0 a.s. on {τ < ∞}.

Proof: Rescaling if necessary, we may assume that A is a.s. continuous at dyadic times. Define τ_n = 2^{−n}[2^n τ]. Since A is natural, we have

E ∫₀^∞ P[τ_n > t|F_t] dA_t = E ∫₀^∞ P[τ_n > t|F_{t−}] dA_t,

and since τ is totally inaccessible, it follows by Lemma 25.8 that

E A_{τ−} = E ∫₀^∞ 1{τ > t} dA_t = E ∫₀^∞ 1{τ ≥ t} dA_t = E A_τ.

Hence, E[ΔA_τ; τ < ∞] = 0, and so ΔA_τ = 0 a.s. on {τ < ∞}. □

Finally, we need to show that A is predictable.

Lemma 25.10 (Doléans) Every natural increasing process is predictable.

Proof: Fix a natural increasing process A. Consider a bounded martingale M and a predictable time τ < ∞ announced by σ_1, σ_2, .... Then M^τ − M^{σ_k} is again a bounded martingale, and since A is natural, we get by dominated convergence E ΔM_τ ΔA_τ = 0. In particular, we may take M_t = P[B|F_t] with B ∈ F_τ. By optional sampling we have M_τ = 1_B and

M_{τ−} ← M_{σ_k} = P[B|F_{σ_k}] → P[B|F_{τ−}].

Thus, ΔM_τ = 1_B − P[B|F_{τ−}], and so

E[ΔA_τ; B] = E ΔA_τ P[B|F_{τ−}] = E[E[ΔA_τ|F_{τ−}]; B].

Since B was arbitrary in F_τ, we get ΔA_τ = E[ΔA_τ|F_{τ−}] a.s., and so the process ΔA_τ 1{τ ≤ t} is predictable by Lemma 25.3 (ii). It is also natural, since for any bounded martingale M

E ΔA_τ ΔM_τ = E ΔA_τ E[ΔM_τ|F_{τ−}] = 0.

By an elementary construction we have {t ≥ 0; ΔA_t > 0} ⊂ ⋃_n[τ_n] a.s. for some optional times τ_n < ∞, and by Proposition 25.4 and Lemma 25.9 we may assume the latter to be predictable. Taking τ = τ_1 in the previous argument, we may conclude that the process A¹_t = ΔA_{τ_1} 1{τ_1 ≤ t} is both
natural and predictable. Repeating the argument for the process A − A¹ with τ = τ_2 and proceeding by induction, we may conclude that the jump component A^d of A is predictable. Since A − A^d is continuous and hence predictable, the predictability of A follows. □

For the uniqueness assertion we need the following extension of Proposition 17.2.

Lemma 25.11 (constancy criterion) A process M is a predictable martingale of integrable variation iff M_t = M_0 a.s.

Proof: On the predictable σ-field P we define the signed measure

μB = E ∫₀^∞ 1_B(t) dM_t,  B ∈ P,

where the inner integral is an ordinary Lebesgue–Stieltjes integral. The martingale property implies that μ vanishes for sets B of the form F × (t,∞) with F ∈ F_t. By Lemma 25.1 and a monotone class argument it follows that μ = 0 on P. Since M is predictable, the same thing is true for the process ΔM_t = M_t − M_{t−}, and then also for the sets J_± = {t > 0; ±ΔM_t > 0}. Thus, μJ_± = 0, and so ΔM = 0 a.s., which means that M is a.s. continuous. But then M_t = M_0 a.s. by Proposition 17.2. □

Proof of Theorem 25.5: The sufficiency is obvious, and the uniqueness holds by Lemma 25.11. It remains to prove that any local submartingale X has the stated decomposition. By Lemmas 25.6 and 25.11 we may assume that X is of class (D). Then Lemma 25.7 shows that X = M + A for some uniformly integrable martingale M and some natural increasing process A, and by Lemma 25.10 the latter process is predictable. □

The two conditions in Lemma 25.10 are, in fact, equivalent.

Theorem 25.12 (natural and predictable processes, Doléans) An integrable, increasing process is natural iff it is predictable.

Proof: If an integrable, increasing process A is natural, it is also predictable by Lemma 25.10. Now assume instead that A is predictable.
By Lemma 25.7 we have A = M + B for some uniformly integrable martingale M and some natural increasing process B, and Lemma 25.10 shows that B is predictable. But then A = B a.s. by Lemma 25.11, and so A is natural. □

The following useful result is essentially implicit in earlier proofs.
Lemma 25.13 (dual predictable projection) Let X and Y be locally integrable, increasing processes, and assume that Y is predictable. Then X has compensator Y iff E∫V dX = E∫V dY for every predictable process V ≥ 0.

Proof: First reduce by localization to the case when X and Y are integrable. Then Y is the compensator of X iff M = Y − X is a martingale or, equivalently, iff EM_τ = 0 for every optional time τ. This is equivalent to the stated relation for V = 1_{[0,τ]}, and the general result follows by a straightforward monotone class argument. □

We may now establish the fundamental connection between predictable times and processes.

Theorem 25.14 (predictable times and processes, Meyer) For any optional time τ, these conditions are equivalent:
(i) τ is predictable;
(ii) the process 1{τ ≤ t} is predictable;
(iii) EΔM_τ = 0 for any bounded martingale M.

Proof (Chung and Walsh): Since (i) ⇒ (ii) by Lemma 25.3 (ii), and (ii) ⇔ (iii) by Theorem 25.12, it remains to show that (iii) ⇒ (i). We then introduce the martingale M_t = E[e^{−τ}|F_t] and the supermartingale

X_t = e^{−τ∧t} − M_t = E[e^{−τ∧t} − e^{−τ}|F_t] ≥ 0,  t ≥ 0.

Here X_τ = 0 a.s. by optional sampling. Letting σ = inf{t ≥ 0; X_{t−} ∧ X_t = 0}, we see from Lemma 7.31 that {t ≥ 0; X_t = 0} = [σ,∞) a.s., and in particular σ ≤ τ a.s. Using optional sampling again, we get E(e^{−σ} − e^{−τ}) = EX_σ = 0, and so σ = τ a.s. Hence, X_t ∧ X_{t−} > 0 a.s. on [0,τ). Finally, (iii) yields

EX_{τ−} = E(e^{−τ} − M_{τ−}) = E(e^{−τ} − M_τ) = EX_τ = 0,

and so X_{τ−} = 0 a.s. It is now clear that τ is announced by the optional times τ_n = inf{t; X_t < n^{−1}}. □

To illustrate the power of the last result, we may give a short proof of the following useful statement, which can also be proved directly.

Corollary 25.15 (restriction) For any predictable time τ and set A ∈ F_τ−, the restriction τ_A is again predictable.
Proof: The process 1_A 1{τ ≤ t} = 1{τ_A ≤ t} is predictable by Lemma 25.3, and so the time τ_A is predictable by Theorem 25.14. □

We may also use the last theorem to show that predictable martingales are continuous.
Proposition 25.16 (predictable martingales) A local martingale is predictable iff it is a.s. continuous.

Proof: The sufficiency is clear by the definitions. To prove the necessity, we note that, for any optional time τ,

M^τ_t = M_t 1_{[0,τ]}(t) + M_τ 1_{(τ,∞)}(t),  t ≥ 0.

Thus, predictability is preserved by optional stopping, and so we may assume that M is a uniformly integrable martingale. Now fix any ε > 0, and introduce the optional time τ = inf{t ≥ 0; |ΔM_t| > ε}. Since the left-continuous version M_{t−} is predictable, so is the process ΔM_t, as well as the random set A = {t > 0; |ΔM_t| > ε}. Hence, the same thing is true for the random interval [τ,∞) = A ∪ (τ,∞), and therefore τ is predictable by Theorem 25.14. Choosing an announcing sequence (τ_n), we conclude by optional sampling, martingale convergence, and Lemmas 25.2 (iii) and 25.3 (i) that

M_{τ−} ← M_{τ_n} = E[M_τ|F_{τ_n}] → E[M_τ|F_{τ−}] = M_τ.

Thus, τ = ∞ a.s. Since ε was arbitrary, it follows that M is a.s. continuous. □

The decomposition of optional times in Proposition 25.4 may now be extended to increasing processes. We say that an rcll process X or a filtration F is quasi-left-continuous if X_{τ−} = X_τ a.s. on {τ < ∞} or F_{τ−} = F_τ, respectively, for every predictable time τ. We further say that X has accessible jumps if X_{τ−} = X_τ a.s. on {τ < ∞} for every totally inaccessible time τ.

Proposition 25.17 (decomposition of increasing processes) Any purely discontinuous, increasing process A has an a.s. unique decomposition into increasing processes A^q and A^a, where A^q is quasi-left-continuous and A^a has accessible jumps. Furthermore, there exist some predictable times τ_1, τ_2, ... with disjoint graphs such that {t > 0; ΔA^a_t > 0} ⊂ ⋃_n[τ_n] a.s. Finally, if A is locally integrable with compensator Â, then A^q has compensator (Â)^c.

Proof: Introduce the locally integrable process X_t = Σ_{s≤t}(ΔA_s ∧ 1) with compensator X̂, and define A^q = A − A^a = 1{ΔX̂ = 0}·
A, or

A^q_t = A_t − A^a_t = ∫₀^t 1{ΔX̂_s = 0} dA_s,  t ≥ 0.  (4)

For any finite predictable time τ, the graph [τ] is again predictable by Theorem 25.14, and so by Lemma 25.13,

E(ΔA^q_τ ∧ 1) = E[ΔX_τ; ΔX̂_τ = 0] = E[ΔX̂_τ; ΔX̂_τ = 0] = 0,

which shows that A^q is quasi-left-continuous.
Now let τ_{n,0} = 0, and recursively define the random times

τ_{n,k} = inf{t > τ_{n,k−1}; ΔX̂_t ∈ (2^{−n}, 2^{−n+1}]},  n, k ∈ ℕ,

which are predictable by Theorem 25.14. Also note that {t > 0; ΔA^a_t > 0} ⊂ ⋃_{n,k}[τ_{n,k}] a.s. by the definition of A^a. Hence, if τ is a totally inaccessible time, then ΔA^a_τ = 0 a.s. on {τ < ∞}, which shows that A^a has accessible jumps.

To prove the uniqueness, assume that A has two decompositions A^q + A^a = B^q + B^a with the stated properties. Then Y = A^q − B^q = B^a − A^a is quasi-left-continuous with accessible jumps. Hence, by Proposition 25.4 we have ΔY_τ = 0 a.s. on {τ < ∞} for any optional time τ, which means that Y is a.s. continuous. Since it is also purely discontinuous, we get Y = 0 a.s.

If A is locally integrable, we may replace (4) by A^q = 1{ΔÂ = 0}·A, and we also note that (Â)^c = 1{ΔÂ = 0}·Â. Thus, Lemma 25.13 yields for any predictable process V ≥ 0

E ∫V dA^q = E ∫1{ΔÂ = 0}V dA = E ∫1{ΔÂ = 0}V dÂ = E ∫V d(Â)^c,

and the same lemma shows that A^q has compensator (Â)^c. □

By the compensator of an optional time τ we mean the compensator of the associated jump process X_t = 1{τ ≤ t}. The following result characterizes the special categories of optional times in terms of the associated compensators.

Corollary 25.18 (compensation of optional times) Let τ be an optional time with compensator A. Then
(i) τ is predictable iff A is a.s. constant apart from a possible unit jump;
(ii) τ is accessible iff A is a.s. purely discontinuous;
(iii) τ is totally inaccessible iff A is a.s. continuous.
In general, τ has the accessible part τ_D, where D = {ΔA_τ > 0, τ < ∞}.

Proof: (i) If τ is predictable, then so is the process X_t = 1{τ ≤ t} by Theorem 25.14, and hence A = X a.s. Conversely, if A_t = 1{σ ≤ t} for some optional time σ, then the latter is predictable by Theorem 25.14, and Lemma 25.13 yields

P{σ = τ < ∞} = E[ΔX_σ; σ < ∞] = E[ΔA_σ; σ < ∞]
= P{σ < ∞} = EA_∞ = EX_∞ = P{τ < ∞}.
Thus, τ = σ a.s., and so τ is predictable.
(ii) Clearly, τ is accessible iff X has accessible jumps, which holds by Proposition 25.17 iff A = A^d a.s.
(iii) Here we note that τ is totally inaccessible iff X is quasi-left-continuous, which holds by Proposition 25.17 iff A = A^c a.s.
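Part (iii) is illustrated by the classic example of an exponential waiting time: if τ is exponentially distributed with rate c and F is the filtration it generates, the compensator of X_t = 1{τ ≤ t} is known to be the continuous process A_t = c(t ∧ τ), so τ is totally inaccessible. The sketch below (rates, times, and function names are illustrative choices) merely checks numerically that EX_t = EA_t, using EX_t = 1 − e^{−ct} and E[c(t ∧ τ)] = c ∫₀^t P{τ > s} ds.

```python
import math

def EX(c, t):
    # E 1{tau <= t} for tau ~ Exp(c)
    return 1.0 - math.exp(-c * t)

def EA(c, t, steps=100000):
    # E[c (t ∧ tau)] = c * ∫_0^t P{tau > s} ds, by a midpoint Riemann sum
    h = t / steps
    return c * h * sum(math.exp(-c * (i + 0.5) * h) for i in range(steps))

# The compensated process X - A has mean zero at every fixed time.
for c in (0.5, 1.0, 2.0):
    for t in (0.3, 1.0, 2.5):
        assert abs(EX(c, t) - EA(c, t)) < 1e-6
```

The continuity of t ↦ c(t ∧ τ) is exactly what rules out any predictable time catching τ with positive probability.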
The last assertion follows easily from (ii) and (iii). □

The next result characterizes quasi-left-continuity for both filtrations and martingales.

Proposition 25.19 (quasi-left-continuous filtrations, Meyer) For any filtration F, these conditions are equivalent:
(i) every accessible time is predictable;
(ii) F_{τ−} = F_τ on {τ < ∞} for every predictable time τ;
(iii) ΔM_τ = 0 a.s. on {τ < ∞} for every martingale M and predictable time τ.
If the basic σ-field in Ω is taken to be F_∞, then F_{τ−} = F_τ on {τ = ∞} for any optional time τ, and the relation in (ii) extends to all of Ω.

Proof: (i) ⇒ (ii): Let τ be a predictable time, and fix any B ∈ F_τ ∩ {τ < ∞}. Then [τ_B] ⊂ [τ], and so τ_B is accessible, hence by (i) even predictable. The process X_t = 1{τ_B ≤ t} is then predictable by Theorem 25.14, and since X_τ 1{τ < ∞} = 1{τ_B ≤ τ < ∞} = 1_B, Lemma 25.3 (i) yields B ∈ F_{τ−}.

(ii) ⇒ (iii): Fix any martingale M, and let τ be a bounded, predictable time with announcing sequence (τ_n). Using (ii) and Lemma 25.2 (iii), we get as before

M_{τ−} ← M_{τ_n} = E[M_τ|F_{τ_n}] → E[M_τ|F_{τ−}] = E[M_τ|F_τ] = M_τ,

and so M_{τ−} = M_τ a.s.

(iii) ⇒ (i): If τ is accessible, then by Proposition 25.4 there exist some predictable times τ_n with [τ] ⊂ ⋃_n[τ_n] a.s. By (iii) we have ΔM_{τ_n} = 0 a.s. on {τ_n < ∞} for every martingale M and all n, and so ΔM_τ = 0 a.s. on {τ < ∞}. Hence, τ is predictable by Theorem 25.14. □

In particular, quasi-left-continuity holds for canonical Feller processes and their induced filtrations.

Proposition 25.20 (quasi-left-continuity of Feller processes, Blumenthal, Meyer) Let X be a canonical Feller process with arbitrary initial distribution, and fix any optional time τ. Then these conditions are equivalent:
(i) τ is predictable;
(ii) τ is accessible;
(iii) X_{τ−} = X_τ a.s. on {τ < ∞}.
In the special case when X is a.s. continuous, we may conclude that every optional time is predictable.
Proof: (ii) ⇒ (iii): By Proposition 25.4 we may assume that τ is finite and predictable. Fix an announcing sequence (τ_n) and a function f ∈ C_0.
By the strong Markov property, we get for any h > 0

E{f(X_{τ_n}) − f(X_{τ_n + h})}² = E(f² − 2f T_h f + T_h f²)(X_{τ_n})
≤ ‖f² − 2f T_h f + T_h f²‖
≤ 2‖f‖ ‖f − T_h f‖ + ‖f² − T_h f²‖.

Letting n → ∞ and then h ↓ 0, it follows by dominated convergence on the left and by strong continuity on the right that E{f(X_{τ−}) − f(X_τ)}² = 0, which means that f(X_{τ−}) = f(X_τ) a.s. Applying this to a sequence f_1, f_2, ... ∈ C_0 that separates points, we obtain X_{τ−} = X_τ a.s.

(iii) ⇒ (i): By (iii) and Theorem 19.20 we have ΔM_τ = 0 a.s. on {τ < ∞} for every martingale M, and so τ is predictable by Theorem 25.14.

(i) ⇒ (ii): This is trivial. □

The following basic inequality will be needed in the proof of Theorem 26.12.

Proposition 25.21 (norm inequality, Garsia, Neveu) Consider a right- or left-continuous, predictable, increasing process A and a random variable ζ ≥ 0 such that a.s.

E[A_∞ − A_t | F_t] ≤ E[ζ|F_t],  t ≥ 0.  (5)

Then ‖A_∞‖_p ≤ p‖ζ‖_p, p ≥ 1.

In the left-continuous case, predictability is clearly equivalent to adaptedness. The proper interpretation of (5) is to take E[A_t|F_t] = A_t and to choose right-continuous versions of the martingales E[A_∞|F_t] and E[ζ|F_t]. For a right-continuous A, we may clearly choose ζ = Z*, where Z is the supermartingale on the left of (5). We also note that if A is the compensator of an increasing process X, then (5) holds with ζ = X_∞.

Proof: We need to consider only the right-continuous case, the case of a left-continuous process A being similar but simpler. It is enough to assume that A is bounded, since we may otherwise replace A by the process A ∧ u for arbitrary u > 0, and let u → ∞ in the resulting formula. For each r > 0, the random time τ_r = inf{t; A_t > r} is predictable by Theorem 25.14. By optional sampling and Lemma 25.2 we note that (5) remains true with t replaced by τ_r−.
Since τ_r is F_{τ_r −}-measurable by the same lemma, we obtain

E[A_∞ − r; A_∞ > r] ≤ E[A_∞ − r; τ_r < ∞] ≤ E[A_∞ − A_{τ_r −}; τ_r < ∞] ≤ E[ζ; τ_r < ∞] ≤ E[ζ; A_∞ ≥ r].
25. Predictability, Compensation, and Excessive Functions 503

Writing A_∞ = α and letting p⁻¹ + q⁻¹ = 1, we get by Fubini's theorem, Hölder's inequality, and some calculus

‖α‖_p^p = p²q⁻¹ E ∫_0^α (α − r) r^{p−2} dr
= p²q⁻¹ ∫_0^∞ E[α − r; α > r] r^{p−2} dr
≤ p²q⁻¹ ∫_0^∞ E[ζ; α ≥ r] r^{p−2} dr
= p²q⁻¹ E ζ ∫_0^α r^{p−2} dr
= p E ζ α^{p−1} ≤ p ‖ζ‖_p ‖α‖_p^{p−1}.

If ‖α‖_p > 0, we may finally divide both sides by ‖α‖_p^{p−1}. □

Let us now turn our attention to random measures ξ on (0, ∞) × S, where (S, 𝒮) is a Borel space. We say that ξ is adapted, predictable, or locally integrable if there exists a subring Ŝ ⊂ 𝒮 with σ(Ŝ) = 𝒮 such that the process ξ_t B = ξ((0, t] × B) has the corresponding property for every B ∈ Ŝ. In the case of adaptedness or predictability, it is clearly equivalent that the relevant property holds for the measure-valued process ξ_t. Let us further say that a process V on ℝ_+ × S is predictable if it is 𝒫 ⊗ 𝒮-measurable, where 𝒫 denotes the predictable σ-field in ℝ_+ × Ω.

Theorem 25.22 (compensation of random measures, Grigelionis, Jacod) Let ξ be a locally integrable, adapted random measure on some product space (0, ∞) × S, where S is Borel. Then there exists an a.s. unique predictable random measure ξ̂ on (0, ∞) × S such that E ∫ V dξ̂ = E ∫ V dξ for every predictable process V ≥ 0 on ℝ_+ × S.

The random measure ξ̂ above is called the compensator of ξ. By Lemma 25.13 this extends the notion of compensator for real-valued processes. For the proof of Theorem 25.22 we need a simple technical lemma, which can be established by straightforward monotone class arguments.

Lemma 25.23 (predictable random measures)
(i) For any predictable random measure ξ and predictable process V ≥ 0 on (0, ∞) × S, the process V · ξ is again predictable.
(ii) For any predictable process V ≥ 0 on (0, ∞) × S and predictable, measure-valued process ρ on S, the process Y_t = ∫ V_{t,s} ρ_t(ds) is again predictable.
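The calculus step at the start of the proof of Proposition 25.21 rests on the identity p²q⁻¹ ∫_0^α (α − r) r^{p−2} dr = α^p, since the integral equals α^p/(p(p−1)). A minimal numerical sketch in Python (the sample values of p, α and the grid size are arbitrary choices, not part of the text):

```python
# Check of the identity  p^2 q^{-1} * I = alpha^p,  where 1/p + 1/q = 1 and
# I = integral_0^alpha (alpha - r) r^(p-2) dr = alpha^p / (p (p - 1)),
# used in the proof of the Garsia-Neveu norm inequality.

def garsia_constant_check(p, alpha, n=20000):
    q = p / (p - 1.0)                  # conjugate exponent, 1/p + 1/q = 1
    h = alpha / n
    # composite midpoint rule for the integral above
    integral = sum((alpha - (k + 0.5) * h) * ((k + 0.5) * h) ** (p - 2) * h
                   for k in range(n))
    return p * p / q * integral, alpha ** p

for p in (2.0, 3.0, 4.0):
    lhs, rhs = garsia_constant_check(p, alpha=2.0)
    assert abs(lhs - rhs) < 1e-6 * rhs
```

The midpoint rule serves only as an independent check of the closed form; the exponents p ≥ 2 keep the integrand smooth.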
Proof of Theorem 25.22: Since ξ is locally integrable, we may easily construct a predictable process V > 0 on ℝ_+ × S such that E ∫ V dξ < ∞. If the random measure ζ = V · ξ has compensator ζ̂, then by Lemma 25.23 the measure ξ̂ = V⁻¹ · ζ̂ is the compensator of ξ. Thus, we may henceforth assume that E ξ((0, ∞) × S) = 1.
Write η = ξ(· × S). Using the kernel operation ⊗ of Chapter 1, we may introduce the probability measure μ = P ⊗ ξ on Ω × ℝ_+ × S and its projection ν = P ⊗ η onto Ω × ℝ_+. Applying Theorem 6.3 to the restrictions of μ and ν to the σ-fields 𝒫 ⊗ 𝒮 and 𝒫, respectively, we conclude that there exists some probability kernel ρ from (Ω × ℝ_+, 𝒫) to (S, 𝒮) satisfying μ = ν ⊗ ρ, or P ⊗ ξ = P ⊗ η ⊗ ρ on (Ω × ℝ_+ × S, 𝒫 ⊗ 𝒮). Letting η̂ denote the compensator of η, we may introduce the random measure ξ̂ = η̂ ⊗ ρ on ℝ_+ × S.

To see that ξ̂ is the compensator of ξ, we first note that ξ̂ is predictable by Lemma 25.23 (i). Next we consider an arbitrary predictable process V ≥ 0 on ℝ_+ × S, and note that the process Y_t = ∫ V_{t,s} ρ_t(ds) is again predictable by Lemma 25.23 (ii). By Theorem 6.4 and Lemma 25.13 we get

E ∫ V dξ̂ = E ∫ η̂(dt) ∫ V_{t,s} ρ_t(ds) = E ∫ η(dt) ∫ V_{t,s} ρ_t(ds) = E ∫ V dξ.

It remains to note that ξ̂ is a.s. unique by Lemma 25.13. □

Our next aim is to show, under a weak regularity condition, how a point process can be transformed to Poisson by means of a suitable predictable mapping. The result leads to various time-change formulas for point processes, similar to those for continuous local martingales in Chapter 18. Recall that an S-marked point process on (0, ∞) is defined as an integer-valued random measure ξ on (0, ∞) × S such that a.s. ξ({t} × S) ≤ 1 for all t > 0. The condition implies that ξ is locally integrable, and so the existence of the associated compensator ξ̂ is automatic. We say that ξ is quasi-leftcontinuous if ξ([τ] × S) = 0 a.s. for every predictable time τ.

Theorem 25.24 (predictable mapping to Poisson) Fix a Borel space S and a σ-finite measure space (T, μ), let ξ be a quasi-leftcontinuous S-marked point process on (0, ∞) with compensator ξ̂, and let Y be a predictable mapping from ℝ_+ × S to T with ξ̂ ∘ Y⁻¹ = μ a.s. Then η = ξ ∘ Y⁻¹ is a Poisson process on T with Eη = μ.
Proof: For any disjoint measurable sets B_1, …, B_n in T of finite μ-measure, we need to show that ηB_1, …, ηB_n are independent Poisson random variables with means μB_1, …, μB_n. Then introduce for each k ≤ n the processes

J_t^k = ∫_0^{t+} ∫ 1_{B_k}(Y_{s,x}) ξ(ds dx),  Ĵ_t^k = ∫_0^{t+} ∫ 1_{B_k}(Y_{s,x}) ξ̂(ds dx).

Here Ĵ_∞^k = μB_k < ∞ a.s. by hypothesis, and so the J^k are simple, integrable point processes on ℝ_+ with compensators Ĵ^k. For fixed
u_1, …, u_n ≥ 0, we define

X_t = Σ_{k≤n} {u_k J_t^k − (1 − e^{−u_k}) Ĵ_t^k},  t ≥ 0.

The process M_t = e^{−X_t} has bounded variation and finitely many jumps, and so by an elementary change of variables

M_t − 1 = Σ_{s≤t} Δe^{−X_s} − ∫_0^t e^{−X_s} dX_s^c = Σ_{k≤n} ∫_0^{t+} e^{−X_{s−}} (1 − e^{−u_k}) d(Ĵ^k − J^k)_s.

Since the integrands on the right are bounded and predictable, M is a uniformly integrable martingale, and we get EM_∞ = 1. Thus,

E exp{−Σ_k u_k ηB_k} = exp{−Σ_k (1 − e^{−u_k}) μB_k},

and the assertion follows by Theorem 5.3. □

The preceding theorem immediately yields a corresponding Poisson characterization, similar to the characterization of Brownian motion in Theorem 18.3. The result may also be considered as an extension of Theorem 12.10.

Corollary 25.25 (Poisson characterization, Watanabe) Fix a Borel space S and a measure μ on (0, ∞) × S with μ({t} × S) = 0 for all t > 0. Let ξ be an S-marked, F-adapted point process on (0, ∞) with compensator ξ̂. Then ξ is F-Poisson with Eξ = μ iff ξ̂ = μ a.s.

We may further deduce a basic time-change result, similar to Proposition 18.8 for continuous local martingales.

Corollary 25.26 (time-change to Poisson, Papangelou, Meyer) Let N¹, …, Nⁿ be counting processes on ℝ_+ with a.s. unbounded and continuous compensators N̂¹, …, N̂ⁿ, and assume that Σ_k N^k is a.s. simple. Define τ_s^k = inf{t ≥ 0; N̂_t^k > s} and Y_s^k = N^k(τ_s^k). Then Y¹, …, Yⁿ are independent unit-rate Poisson processes.

Proof: We may apply Theorem 25.24 to the random measures ξ = (ξ¹, …, ξⁿ) and ξ̂ = (ξ̂¹, …, ξ̂ⁿ) on {1, …, n} × ℝ_+ induced by (N¹, …, Nⁿ) and (N̂¹, …, N̂ⁿ), respectively, and to the predictable mapping T_{k,t} = (k, N̂_t^k) on {1, …, n} × ℝ_+. It is then enough to verify that, a.s. for fixed k and t,

ξ̂^k{s > 0; N̂_s^k ≤ t} = t,  ξ^k{s > 0; N̂_s^k ≤ t} = N^k(τ_t^k),

which is clear by the continuity of N̂^k.
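The exponential identity in the proof of Theorem 25.24 is the Laplace transform of independent Poisson variables; for a single variable, E e^{−uN} = exp{−λ(1 − e^{−u})} can be checked by summing the series directly. A minimal sketch in Python (the truncation level and the sample values of λ and u are arbitrary):

```python
import math

def poisson_laplace(lam, u, terms=120):
    """E exp(-u N) for N ~ Poisson(lam), via the truncated series
    sum_k exp(-u k) P{N = k}, with P{N = k} computed recursively."""
    term = math.exp(-lam)              # k = 0 term: P{N = 0}
    total = 0.0
    for k in range(terms):
        total += term * math.exp(-u * k)
        term *= lam / (k + 1)          # P{N = k+1} from P{N = k}
    return total

for lam in (0.5, 2.0, 5.0):
    for u in (0.1, 1.0, 3.0):
        assert abs(poisson_laplace(lam, u)
                   - math.exp(-lam * (1 - math.exp(-u)))) < 1e-12
```

The recursive update of the Poisson weights avoids large factorials, and the truncation error is negligible for the moderate means used here.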
□

There is a similar result for stochastic integrals with respect to p-stable Lévy processes, as described in Proposition 15.9. For simplicity, we consider only the case when p < 1.
Proposition 25.27 (time-change of stable integrals) For a p ∈ (0, 1), let X be a strictly p-stable Lévy process, and consider a predictable process V ≥ 0 such that the process A = V^p · λ is a.s. finite but unbounded. Define τ_s = inf{t; A_t > s}, s ≥ 0. Then (V · X) ∘ τ =d X.

Proof: Define a point process ξ on ℝ_+ × (ℝ \ {0}) by ξB = Σ_s 1_B(s, ΔX_s), and recall from Corollary 15.7 and Proposition 15.9 that ξ is Poisson with intensity measure of the form λ ⊗ ν, where ν(dx) = c_± |x|^{−p−1} dx for ±x > 0. In particular, ξ has compensator ξ̂ = λ ⊗ ν. Let the predictable mapping T on ℝ_+ × ℝ be given by T_{s,x} = (A_s, x V_s). Since A is continuous, we have {A_s ≤ t} = {s ≤ τ_t} and A_{τ_t} = t. By Fubini's theorem, we hence obtain for any t, u > 0

(λ ⊗ ν) ∘ T⁻¹([0, t] × (u, ∞)) = (λ ⊗ ν){(s, x); A_s ≤ t, x V_s > u} = ∫_0^{τ_t} ν{x; x V_s > u} ds = ν(u, ∞) ∫_0^{τ_t} V_s^p ds = t ν(u, ∞),

and similarly for the sets [0, t] × (−∞, −u). Thus, ξ̂ ∘ T⁻¹ = λ ⊗ ν a.s., and so Theorem 25.24 yields ξ ∘ T⁻¹ =d ξ. Finally, we note that

(V · X)_{τ_t} = ∫_0^{τ_t +} ∫ x V_s ξ(ds dx) = ∫_0^∞ ∫ x V_s 1{A_s ≤ t} ξ(ds dx) = ∫_0^{t+} ∫ y (ξ ∘ T⁻¹)(dr dy),

where the process on the right has the same distribution as X. □

We turn to an important special case where the compensator can be computed explicitly. By the natural compensator of a random measure ξ we mean the compensator with respect to the induced filtration.

Proposition 25.28 (natural compensator) For any Borel space (S, 𝒮), let (τ, ζ) be a random element in (0, ∞] × S with distribution μ. Then ξ = δ_{τ,ζ} has natural compensator

ξ̂_t B = ∫_{(0, t∧τ]} μ(dr × B) / μ([r, ∞] × S),  t ≥ 0, B ∈ 𝒮. (6)

Proof: The process η_t B on the right of (6) is clearly predictable for every B ∈ 𝒮. It remains to show that M_t = ξ_t B − η_t B is a martingale, hence that E[M_t − M_s; A] = 0 for any s < t and A ∈ F_s. Since M_t = M_s on {τ ≤ s}, and the set {τ > s} is a.s. an atom of F_s, it suffices to show that
E(M_t − M_s) = 0, or EM_t = 0. Then use Fubini's theorem to get

Eη_t B = E ∫_{(0, t∧τ]} μ(dr × B) / μ([r, ∞] × S)
= ∫_{(0, ∞]} μ(dx) ∫_{(0, t∧x]} μ(dr × B) / μ([r, ∞] × S)
= ∫_{(0, t]} (μ(dr × B) / μ([r, ∞] × S)) ∫_{[r, ∞]} μ(dx)
= μ((0, t] × B) = Eξ_t B,

where μ(dx) denotes the marginal distribution μ(dx × S) of τ. □

We turn to some applications of the previous ideas to classical potential theory. Then fix a domain D ⊂ ℝ^d, and let T_t = T_t^D denote the transition operators of Brownian motion X in D, killed at the boundary ∂D. A function f ≥ 0 on D is said to be excessive if T_t f ≤ f for all t > 0 and T_t f → f as t → 0. In this case clearly T_t f ↑ f. Note that if f is excessive, then f(X) is a supermartingale under P_x for every x ∈ D. The basic example of an excessive function is the Green potential G_D ν of a measure ν on a Greenian domain D, provided this potential is finite.

Though excessivity is defined globally in terms of the operators T_t^D, it is in fact a local property. For a precise statement, we say that a measurable function f ≥ 0 on D is superharmonic if, for any ball B in D with center x, the average of f over the sphere ∂B is bounded by f(x). As we shall see, it is enough to consider balls in D of radius less than an arbitrary ε > 0. Recall that f is lower semicontinuous if x_n → x implies liminf_n f(x_n) ≥ f(x).

Theorem 25.29 (superharmonic and excessive functions, Doob) Let f ≥ 0 be a measurable function on a domain D ⊂ ℝ^d. Then f is excessive iff it is superharmonic and lower semicontinuous.

For the proof we need two lemmas, the first of which clarifies the relation between the two continuity properties.

Lemma 25.30 (semicontinuity) Consider a measurable function f ≥ 0 on a domain D ⊂ ℝ^d such that T_t f ≤ f for all t > 0. Then f is excessive iff it is lower semicontinuous.

Proof: First assume that f is excessive, and let x_n → x in D.
By Theorem 24.7 and Fatou's lemma

T_t f(x) = ∫ p_t^D(x, y) f(y) dy ≤ liminf_{n→∞} ∫ p_t^D(x_n, y) f(y) dy = liminf_{n→∞} T_t f(x_n) ≤ liminf_{n→∞} f(x_n),

and as t → 0, we get f(x) ≤ liminf_n f(x_n). Thus, f is lower semicontinuous.
Next assume that f is lower semicontinuous. Using the continuity of X and Fatou's lemma, we get as t → 0 along an arbitrary sequence

f(x) = E_x f(X_0) ≤ E_x liminf_{t→0} f(X_t) ≤ liminf_{t→0} E_x f(X_t) = liminf_{t→0} T_t f(x) ≤ limsup_{t→0} T_t f(x) ≤ f(x).

Thus, T_t f → f, and f is excessive. □

For smooth functions, the superharmonic property is easy to describe.

Lemma 25.31 (smooth functions) A function f ≥ 0 in C²(D) is superharmonic iff Δf ≤ 0, in which case f is also excessive.

Proof: By Itô's formula, the process

M_t = f(X_t) − ½ ∫_0^t Δf(X_s) ds,  t ∈ [0, ζ), (7)

is a continuous local martingale. Now fix any closed ball B ⊂ D with center x, and write τ = τ_{∂B}. Since E_x τ < ∞, we get by dominated convergence

f(x) = E_x f(X_τ) − ½ E_x ∫_0^τ Δf(X_s) ds.

Thus, f is superharmonic iff the last expectation is ≤ 0 for every such ball, and the first assertion follows.

To prove the last statement, we note that the exit time ζ = τ_{∂D} is predictable, say with announcing sequence (τ_n). If Δf ≤ 0, we get from (7) by optional sampling

E_x[f(X_{t∧τ_n}); t < ζ] ≤ E_x f(X_{t∧τ_n}) ≤ f(x).

Hence, Fatou's lemma yields T_t f(x) = E_x[f(X_t); t < ζ] ≤ f(x), and so f is excessive by Lemma 25.30. □

Proof of Theorem 25.29: If f is excessive or superharmonic, then f ∧ n has the same property for every n > 0. The converse statement is also true, by monotone convergence and because lower semicontinuity is preserved by increasing limits. Thus, we may henceforth assume that f is bounded.

Now assume that f is excessive on D. By Lemma 25.30 it is then lower semicontinuous, and it remains to prove that f is superharmonic. Since the property T_t f ≤ f is preserved by passing to a subdomain, we may assume that D is bounded. For each h > 0 we define q_h = h⁻¹(f − T_h f) and f_h = G_D q_h. Since f and D are bounded, we have G_D f < ∞, and so f_h = h⁻¹ ∫_0^h T_s f ds ↑ f. By the strong Markov property we further see that,
for any optional time τ ≤ ζ,

E_x f_h(X_τ) = E_x E_{X_τ} ∫_0^∞ q_h(X_s) ds = E_x ∫_0^∞ q_h(X_{s+τ}) ds = E_x ∫_τ^∞ q_h(X_s) ds ≤ f_h(x).

In particular, f_h is superharmonic for each h, and so by monotone convergence the same property holds for f.

Conversely, assume that f is superharmonic and lower semicontinuous. To prove that f is excessive, it is enough by Lemma 25.30 to show that T_t f ≤ f for all t. Then fix a spherically symmetric probability density ψ ∈ C^∞(ℝ^d) with support in the unit ball, and put ψ_h(x) = h^{−d} ψ(x/h) for each h > 0. Writing ρ for the Euclidean metric in ℝ^d, we may define f_h = ψ_h * f on the set D_h = {x ∈ D; ρ(x, D^c) > h}. Note that f_h ∈ C^∞(D_h) for all h, that f_h is superharmonic on D_h, and that f_h ↑ f. By Lemma 25.31 and monotone convergence we conclude that f is excessive on each set D_h. Letting ζ_h denote the first exit time from D_h, we obtain

E_x[f(X_t); t < ζ_h] ≤ f(x),  h > 0.

As h → 0, we have ζ_h ↑ ζ, and hence {t < ζ_h} ↑ {t < ζ}. Thus, by monotone convergence T_t f(x) ≤ f(x). □

We may now prove the remarkable fact that, although an excessive function f need not be continuous, the supermartingale f(X) is a.s. continuous under P_x for every x.

Theorem 25.32 (continuity, Doob) Fix an excessive function f on a domain D ⊂ ℝ^d, and let X be a Brownian motion killed at ∂D. Then the process f(X_t) is a.s. continuous on [0, ζ).

The proof is based on the following invariance under time reversal of a stationary version of Brownian motion. Though no such process exists in the usual sense, we may consider distributions with respect to the σ-finite measure P̃ = ∫ P_x dx, where P_x is the distribution of a Brownian motion in ℝ^d starting at x.

Lemma 25.33 (time reversal, Doob) For any c > 0, the processes Y_t = X_t and Ỹ_t = X_{c−t} on [0, c] have the same distribution under P̃.
Proof: Introduce the processes

B_t = X_t − X_0,  B̃_t = X_{c−t} − X_c,  t ∈ [0, c],

and note that B and B̃ are Brownian motions on [0, c] under each P_x. Fix any measurable function f ≥ 0 on C([0, c], ℝ^d). By Fubini's theorem and
the invariance of Lebesgue measure under shifts, we get

Ẽ f(Ỹ) = Ẽ f(X_0 − B̃_c + B̃) = ∫ E_x f(x − B̃_c + B̃) dx = ∫ E_0 f(x − B̃_c + B̃) dx = E_0 ∫ f(x − B̃_c + B̃) dx = E_0 ∫ f(x + B) dx = ∫ E_x f(Y) dx = Ẽ f(Y). □

Proof of Theorem 25.32: Since f ∧ n is again excessive for each n > 0 by Theorem 25.29, we may assume that f is bounded. As in the proof of the same theorem, we may then approximate f by smooth excessive functions f_h ↑ f on suitable subdomains D_h ↑ D. Since f_h(X) is a continuous supermartingale up to the exit time ζ_h from D_h, Theorem 7.32 shows that f(X) is a.s. right-continuous on [0, ζ) under any initial distribution μ. Using the Markov property at rational times, we may extend the a.s. right-continuity to the random time set T = {t ≥ 0; X_t ∈ D}.

To strengthen the result to a.s. continuity on T, we note that f(X) is right-continuous on T, a.e. P̃. By Lemma 25.33 it follows that f(X) is also left-continuous on T, a.e. P̃. Thus, f(X) is continuous on T, a.s. P_μ for arbitrary μ ≪ λ^d. Since P_μ ∘ X_h⁻¹ ≪ λ^d for any μ and h > 0, we may conclude that f(X) is a.s. continuous on T ∩ [h, ∞) for any h > 0. This together with the right-continuity at 0 yields the asserted continuity on [0, ζ). □

If f is excessive, then f(X) is a supermartingale under P_x for every x, and so it has a Doob–Meyer decomposition f(X) = M − A. It is remarkable that we can choose A to be a continuous additive functional (CAF) of X, independent of x. A similar situation was encountered in connection with Theorem 22.23.

Theorem 25.34 (compensation by additive functional, Meyer) Let f be an excessive function on a domain D ⊂ ℝ^d, and let P_x be the distribution of Brownian motion in D, killed at ∂D. Then there exists an a.s. unique CAF A of X such that M = f(X) + A is a continuous local P_x-martingale on [0, ζ) for every x ∈ D.

The main difficulty in the proof is to construct a version of the process A that compensates −f(X) under every measure P_μ.
Here the following lemma is helpful.

Lemma 25.35 (universal compensation) Consider an excessive function f on a domain D ⊂ ℝ^d, a distribution m ∼ λ^d on D, and a P_m-compensator A of −f(X) on [0, ζ). Then for any distribution μ and constant h > 0, the process A ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h). In other words, the process M_t = f(X_t) + A_{t−h} ∘ θ_h is a local P_μ-martingale on [h, ζ) for every μ and h.
Proof: For any bounded P_m-martingale M and initial distribution μ ≪ m, we note that M is also a P_μ-martingale. To see this, write k = dμ/dm, and note that P_μ = k(X_0) · P_m. It is equivalent to show that N_t = k(X_0) M_t is a P_m-martingale, which is clear since k(X_0) is F_0-measurable with mean 1.

Now fix any distribution μ and a constant h > 0. To prove the stated property of A, it is enough to show that, for any bounded P_m-martingale M, the process N_t = M_{t−h} ∘ θ_h is a P_μ-martingale on [h, ∞). Then fix any times s < t and sets F ∈ F_h and G ∈ F_s. Using the Markov property at h and noting that P_μ ∘ X_h⁻¹ ≪ m, we get

E_μ[M_t ∘ θ_h; F ∩ θ_h⁻¹ G] = E_μ[E_{X_h}[M_t; G]; F] = E_μ[E_{X_h}[M_s; G]; F] = E_μ[M_s ∘ θ_h; F ∩ θ_h⁻¹ G].

Hence, by a monotone class argument, E_μ[M_t ∘ θ_h | F_{h+s}] = M_s ∘ θ_h a.s. □

Proof of Theorem 25.34: Let A^μ denote the P_μ-compensator of −f(X) on [0, ζ), and note that A^μ is a.s. continuous, e.g. by Theorem 18.10. Fix any distribution m ∼ λ^d on D, and conclude from Lemma 25.35 that A^m ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h) for any μ and h > 0. Since this is also true for the process A^μ_{t+h} − A^μ_h, we get for any μ and h > 0

A^μ_t = A^μ_h + A^m_{t−h} ∘ θ_h,  t ≥ h, a.s. P_μ. (8)

Restricting h to the positive rationals, we may define

A_t = lim_{h→0} A^m_{t−h} ∘ θ_h,  t > 0,

whenever the limit exists and is continuous and nondecreasing with A_0 = 0, and put A = 0 otherwise. By (8) we have A = A^μ a.s. P_μ for every μ, and so A is a P_μ-compensator of −f(X) on [0, ζ) for every μ. For each h > 0 it follows by Lemma 25.35 that A ∘ θ_h is a P_μ-compensator of −f(X ∘ θ_h) on [0, ζ ∘ θ_h), and since this is also true for the process A_{t+h} − A_h, we get A_{t+h} = A_h + A_t ∘ θ_h a.s. P_μ. Thus, A is a CAF. □

We may now establish a probabilistic version of the classical Riesz decomposition. To avoid technical difficulties, we restrict our attention to locally bounded functions f.
By the greatest harmonic minorant of f we mean a harmonic function h < f that dominates all other such functions. Recall that the potential U A of a CAF A of X is given by U A (x) = Ex Aoo . Theorem 25.36 (Riesz decomposition) Fix any locally bounded function f > 0 on a domain D C }Rd, and let X be Brownian motion on D, killed at aD. Then f is excessive iff it has a representation f == U A + h, where A is a CAF of X and h is harmonic with h > O. In that case, A is the compensator of - f{X) and h is the greatest harmonic m,inorant of f. A similar result for uniformly a-excessive functions of an arbitrary Feller process was obtained in Theorem 22.23. From the classical Riesz represen- 
512 Foundations of Modern Pobability tation on Greenian domains, we know that U A may also be written as the Green potential of a unique measure v A, so that f = G D V A + h. In the special case when D = IR d with d > 3, we recall from Theorem 22.21 that l/AB = E (lB . A)l. A similar representation holds in the general case. Proof of Theorem 25.36: First assume that A is a CAF with U A < 00. By the additivity of A and the Markov property of X, we get for any t > 0 UA(X) ExAoo = Ex (At + Aoo oOt) ExAt + ExEXtAoo = ExAt + TtUA(X). By dominated convergence ExAt t 0 as t -+ 0, and so U A is excessive. Even U A + h is then excessive for any harmonic function h > O. Conversely, assume that f is excessive and locally bounded. By Theorem 25.34 there exists some CAF A such that M == I(X) + A is a continuous local martingale on [0, (). For any localizing and announcing sequence Tn t (, we get f(x) == ExMo = EXM'Tn = Exl(X'T n ) + ExAr n > ExAr n . As n -+ 00, it follows by monotone convergence that U A < f. By the additivity of A and the Markov property of X, Ex [Aoo 1Ft] At + Ex[Aoo oOtlFt] At + EXt Aoo == Mt - I(X t ) + UA(X t ). (9) Writing h == I - UA, it follows that heX) is a continuous local martingale. Since h is locally bounded, we may conclude by optional sampling and dominated convergence that h has the mean-value property. Thus, h is harmonic by Lemma 24.3. To prove the uniqueness of A, assume that I also has a representation U B + k for some OAF B and some harmonic function k > O. Proceeding as in (9), we get At - Bt = Ex[Aoo - Boo 1Ft] + h(X t ) - k(X t ), t > 0, which shows that A - B is a continuous local martingale. Hence, Proposition 17.2 yields A = B a.s. To see that h is the greatest harmonic minorant of I, consider any har- monic minorant k > O. Since I - k is again excessive and locally bounded, it has a representation U B + 1 for soe OAF B and some harmonic function l. But then f = U B + k + l, and so A = B a.s. and h = k + l > k. 
□

For any sufficiently regular measure ν on ℝ^d, we may now construct an associated CAF A of Brownian motion X such that A increases only when X visits the support of ν. This clearly extends the notion of local time. For convenience we may write G_D(1_D · ν) = G_D ν.
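The Green potential G_D ν has a transparent discrete analogue that can be computed exactly. For simple random walk killed at the endpoints of {0, 1, …, N}, the expected exit time u(k) = E_k(exit time), which is the potential of the time functional A_t = t, solves u(k) = 1 + ½(u(k−1) + u(k+1)) with u(0) = u(N) = 0 and equals k(N − k). A minimal numerical sketch (the grid size, iteration count, and tolerance are arbitrary choices, not part of the text):

```python
def expected_exit_time(N, iters=4000):
    """Solve u(k) = 1 + (u(k-1) + u(k+1)) / 2, u(0) = u(N) = 0, by value
    iteration; u(k) is the expected exit time of simple random walk from
    {1, ..., N-1} started at k, i.e. the discrete Green potential of
    counting measure.  The closed form is u(k) = k (N - k)."""
    u = [0.0] * (N + 1)
    for _ in range(iters):
        new = [0.0] * (N + 1)
        for k in range(1, N):
            new[k] = 1.0 + 0.5 * (u[k - 1] + u[k + 1])
        u = new
    return u

N = 10
u = expected_exit_time(N)
for k in range(N + 1):
    assert abs(u[k] - k * (N - k)) < 1e-8
```

The iteration converges geometrically since killing at the boundary makes the transition operator strictly substochastic.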
Proposition 25.37 (additive functionals induced by measures) Fix a measure ν on ℝ^d such that U(1_D · ν) is bounded for every bounded domain D. Then there exists an a.s. unique CAF A of Brownian motion X such that, for any D,

E_x A_{ζ_D} = G_D ν(x),  x ∈ D. (10)

Conversely, ν is uniquely determined by A. Furthermore,

supp A ⊂ {t ≥ 0; X_t ∈ supp ν} a.s. (11)

The proof is straightforward, given the classical Riesz decomposition, and we shall indicate the main steps only.

Proof: A simple calculation shows that G_D ν is excessive for any bounded domain D. Since G_D ν ≤ U(1_D · ν), it is further bounded. Hence, by Theorem 25.36 there exist a CAF A^D of X on [0, ζ_D) and a harmonic function h_D ≥ 0 such that G_D ν = U_{A^D} + h_D. In fact, h_D = 0 by Riesz' theorem.

Now consider another bounded domain D' ⊃ D. We claim that G_{D'} ν − G_D ν is harmonic on D. This is clear from the analytic definitions, and it also follows, under a regularity condition, from Lemma 24.13. Since A^D and A^{D'} are compensators of −G_D ν(X) and −G_{D'} ν(X), respectively, we conclude that A^D − A^{D'} is a martingale on [0, ζ_D), and so A^D = A^{D'} a.s. up to time ζ_D.

Now choose a sequence of bounded domains D_n ↑ ℝ^d, and define A = sup_n A^{D_n}, so that A = A^D a.s. on [0, ζ_D) for all D. It is easy to see that A is a CAF of X and that (10) holds for any bounded domain D. The uniqueness of ν is clear from the uniqueness in the classical Riesz decomposition. Finally, we obtain (11) by noting that G_D ν is harmonic on D \ supp ν for every D, so that G_D ν(X) is a local martingale on the predictable set {t < ζ_D; X_t ∉ supp ν}. □

Exercises

1. Show by an example that the σ-fields F_τ and F_{τ−} may differ. (Hint: Take τ to be constant.)

2. Give examples of optional times that are predictable; accessible but not predictable; and totally inaccessible. (Hint: Use Corollary 25.18.)

3. Show by an example that a right-continuous, adapted process need not be predictable.
(Hint: Use Theorem 25.14.)

4. Given a Brownian motion B on [0, 1], let F be the filtration induced by X_t = (B_t, B_1). Find the Doob–Meyer decomposition B = M + A on [0, 1), and show that A has a.s. finite variation on [0, 1].

5. For any totally inaccessible time τ, show that sup_t |P[τ ≤ t + ε | F_t] − 1{τ ≤ t}| → 0 a.s. as ε → 0. Derive a corresponding result for the compensator. (Hint: Use Lemma 25.8.)
6. Let the process X be adapted and rcll. Show that X is predictable iff it has accessible jumps and ΔX_τ is F_{τ−}-measurable for every predictable time τ < ∞. (Hint: Use Proposition 25.17 and Lemmas 25.2 and 25.3.)

7. Show that the compensator A of a quasi-leftcontinuous local submartingale is a.s. continuous. (Hint: Note that A has accessible jumps. Use optional sampling at an arbitrary predictable time τ < ∞ with announcing sequence (τ_n).)

8. Extend Corollary 25.26 to possibly bounded compensators. Show that the result fails in general when the compensators are not continuous.

9. Show that any general inequality involving an increasing process A and its compensator Â remains valid in discrete time. (Hint: Embed the discrete-time process and filtration into continuous time.)
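In the spirit of Exercise 9, compensators are explicit in discrete time: the compensator of an adapted, integrable sequence X is Â_n = Σ_{k≤n} E[X_k − X_{k−1} | F_{k−1}]. For X_k = S_k² with S a simple random walk this gives Â_k = k, so M_k = S_k² − k is a martingale, which can be verified exactly by enumerating all sign paths. A minimal sketch in Python (the horizon n is an arbitrary choice):

```python
from collections import defaultdict
from itertools import product

def doob_decomposition_check(n=6):
    """For X_k = S_k^2, S a simple random walk, the discrete-time compensator
    is A_k = sum_{j<=k} E[X_j - X_{j-1} | F_{j-1}] = k, so M_k = S_k^2 - k
    should be a martingale.  Verify E[M_k | F_{k-1}] = M_{k-1} exactly by
    enumerating all 2^n equally likely sign paths."""
    paths = list(product([-1, 1], repeat=n))
    for k in range(1, n + 1):
        groups = defaultdict(list)      # paths sharing the same F_{k-1} atom
        for p in paths:
            groups[p[:k - 1]].append(p)
        for prefix, ps in groups.items():
            s_prev = sum(prefix)
            m_prev = s_prev ** 2 - (k - 1)
            m_cond = sum((s_prev + p[k - 1]) ** 2 - k for p in ps) / len(ps)
            if abs(m_cond - m_prev) > 1e-12:
                return False
    return True

assert doob_decomposition_check()
```

Each atom of F_{k−1} is a fixed prefix of signs, so the conditional expectation is a plain average over the paths extending that prefix.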
Chapter 26

Semimartingales and General Stochastic Integration

Predictable covariation and L²-integral; semimartingale integral and covariation; general substitution rule; Doléans' exponential and change of measure; norm and exponential inequalities; martingale integral; decomposition of semimartingales; quasi-martingales and stochastic integrators

In this chapter we shall use the previously established Doob–Meyer decomposition to extend the stochastic integral of Chapter 17 to possibly discontinuous semimartingales. The construction proceeds in three steps. First we imitate the definition of the L²-integral V · M from Chapter 17, using a predictable version ⟨M, N⟩ of the covariation process. A suitable truncation then allows us to extend the integral to arbitrary semimartingales X and bounded, predictable processes V. The ordinary covariation [X, Y] can now be defined by the integration-by-parts formula, and we may use some generalized versions of the BDG inequalities from Chapter 17 to extend the martingale integral V · M to more general integrands V.

Once the stochastic integral is defined, we may develop a stochastic calculus for general semimartingales. In particular, we shall prove an extension of Itô's formula, solve a basic stochastic differential equation, and establish a general Girsanov-type theorem for absolutely continuous changes of the probability measure. The latter material extends the appropriate portions of Chapters 18 and 21.

The stochastic integral and covariation process, together with the Doob–Meyer decomposition from the preceding chapter, provide the tools for a more detailed analysis of semimartingales. Thus, we may now establish two general decompositions, similar to the decompositions of optional times and increasing processes in Chapter 25.
We shall further derive some exponential inequalities for martingales with bounded jumps, characterize local quasi-martingales as special semimartingales, and show that no continuous extension of the predictable integral exists beyond the context of semimartingales.

Throughout this chapter, M² denotes the class of uniformly square-integrable martingales. As in Lemma 17.4, we note that M² is a Hilbert space for the norm ‖M‖ = (E M_∞²)^{1/2}. We define M₀² as the closed linear subspace of martingales M ∈ M² with M_0 = 0. The corresponding
classes M²_loc and M²_{0,loc} are defined as the sets of processes M such that the stopped versions M^{τ_n} belong to M² or M₀², respectively, for some sequence of optional times τ_n → ∞.

For every M ∈ M²_loc we note that M² is a local submartingale. The corresponding compensator, denoted by ⟨M⟩, is called the predictable quadratic variation of M. More generally, we may define the predictable covariation ⟨M, N⟩ of two processes M, N ∈ M²_loc as the compensator of MN, also computable by the polarization formula

4⟨M, N⟩ = ⟨M + N⟩ − ⟨M − N⟩.

Note that ⟨M, M⟩ = ⟨M⟩. If M and N are continuous, then clearly ⟨M, N⟩ = [M, N] a.s. The following result collects some further useful properties.

Proposition 26.1 (predictable covariation) For any M, M^n, N ∈ M²_loc,

(i) ⟨M, N⟩ = ⟨M − M_0, N − N_0⟩ a.s.;
(ii) ⟨M⟩ is a.s. increasing, and ⟨M, N⟩ is a.s. symmetric and bilinear;
(iii) |⟨M, N⟩| ≤ ∫ |d⟨M, N⟩| ≤ ⟨M⟩^{1/2} ⟨N⟩^{1/2} a.s.;
(iv) ⟨M, N⟩^τ = ⟨M^τ, N⟩ = ⟨M^τ, N^τ⟩ a.s. for any optional time τ;
(v) ⟨M^n⟩_∞ → 0 in probability implies (M^n − M^n_0)* → 0 in probability.

Proof: By Lemma 25.11 we note that ⟨M, N⟩ is the a.s. unique predictable process of locally integrable variation and starting at 0 such that MN − ⟨M, N⟩ is a local martingale. The symmetry and bilinearity in (ii) follow immediately, as does property (i), since M N_0, M_0 N, and M_0 N_0 are all local martingales. Property (iii) is proved in the same way as Proposition 17.9, and (iv) is obtained as in Theorem 17.5.

To prove (v), we may assume that M^n_0 = 0 for all n. Let ⟨M^n⟩_∞ → 0 in probability. Fix any ε > 0, and define τ_n = inf{t; ⟨M^n⟩_t > ε}. Since ⟨M^n⟩ is predictable, even τ_n is predictable by Theorem 25.14 and is therefore announced by some sequence τ_{nk} ↑ τ_n. The latter may be chosen such that M^n is an L²-martingale and (M^n)² − ⟨M^n⟩ a uniformly integrable martingale on [0, τ_{nk}] for every k. By Proposition 7.16

E (M^n)*²_{τ_{nk}} ≲ E (M^n_{τ_{nk}})² = E ⟨M^n⟩_{τ_{nk}} ≤ ε,

and as k → ∞, we get E (M^n)*²_{τ_n −} ≲ ε.
Now fix any δ > 0, and write

P{(M^n)*² > δ} ≤ P{τ_n < ∞} + δ⁻¹ E (M^n)*²_{τ_n −} ≲ P{⟨M^n⟩_∞ ≥ ε} + δ⁻¹ ε.

Here the right-hand side tends to zero as n → ∞ and then ε → 0. □

We may use the predictable quadratic variation to extend the Itô integral from Chapter 17. As before, let 𝓔 denote the class of bounded, predictable step processes V with jumps at finitely many fixed times. We refer to the corresponding integral V · X as the elementary predictable integral.
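In discrete time the predictable covariation is ⟨M, N⟩_n = Σ_{k≤n} E[ΔM_k ΔN_k | F_{k−1}], and the polarization formula reduces to 4ab = (a + b)² − (a − b)² applied to conditional increments. A minimal sketch that checks the identity exactly by enumeration (the predictable weight 1 + |S_{k−1}| in the second martingale is an arbitrary illustrative choice):

```python
from itertools import product

def polarization_check(n=5):
    """Discrete-time predictable covariation
        <M, N>_n = sum_k E[dM_k dN_k | F_{k-1}].
    M has increments eps_k; N has predictably weighted increments
    (1 + |S_{k-1}|) eps_k.  Verify 4<M, N> = <M + N> - <M - N> on every
    path, computing conditional expectations by exhaustive enumeration."""
    paths = list(product([-1, 1], repeat=n))

    def dM(q, k):
        return q[k - 1]

    def dN(q, k):
        return (1 + abs(sum(q[:k - 1]))) * q[k - 1]   # predictable weight

    def bracket(fU, fV, p):
        total = 0.0
        for k in range(1, n + 1):
            group = [q for q in paths if q[:k - 1] == p[:k - 1]]
            total += sum(fU(q, k) * fV(q, k) for q in group) / len(group)
        return total

    for p in paths:
        mn = bracket(dM, dN, p)
        plus = bracket(lambda q, k: dM(q, k) + dN(q, k),
                       lambda q, k: dM(q, k) + dN(q, k), p)
        minus = bracket(lambda q, k: dM(q, k) - dN(q, k),
                        lambda q, k: dM(q, k) - dN(q, k), p)
        if abs(4 * mn - (plus - minus)) > 1e-9:
            return False
    return True

assert polarization_check()
```

Since polarization is an algebraic identity inside each conditional expectation, the check holds exactly, up to floating-point rounding.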
Given any M ∈ M²_loc, let L²(M) be the class of predictable processes V such that (V² · ⟨M⟩)_t < ∞ a.s. for every t > 0. We first consider integrals V · M with M ∈ M²_loc and V ∈ L²(M). Here the integral process belongs to M²_{0,loc}, the class of local L²-martingales starting at 0. In the following statement, it is understood that M, N ∈ M²_loc and that U and V are predictable processes such that the stated integrals exist.

Theorem 26.2 (L²-integral, Courrège, Kunita and Watanabe) The elementary predictable integral extends a.s. uniquely to a bilinear map of any M ∈ M²_loc and V ∈ L²(M) into V · M ∈ M²_{0,loc}, such that (V_n² · ⟨M^n⟩)_t → 0 in probability implies (V_n · M^n)*_t → 0 in probability for every t > 0. Furthermore,

(i) ⟨V · M, N⟩ = V · ⟨M, N⟩ a.s. for all N ∈ M²_loc;
(ii) U · (V · M) = (UV) · M a.s.;
(iii) Δ(V · M) = V ΔM a.s.;
(iv) (V · M)^τ = V · M^τ = (V 1_{[0,τ]}) · M a.s. for any optional time τ;

where property (i) characterizes the integral.

The proof depends on an elementary approximation property, corresponding to Lemma 17.23 in the continuous case.

Lemma 26.3 (approximation) Let V be a predictable process with |V|^p ∈ L(A), where A is increasing and p ≥ 1. Then there exist some V¹, V², … ∈ 𝓔 with (|V^n − V|^p · A)_t → 0 a.s. for all t > 0.

Proof: It is enough to establish the approximation (|V^n − V|^p · A)_t → 0 in probability. By Minkowski's inequality we may then approximate in steps, and by dominated convergence we may first reduce to the case when V is simple. Each term may then be approximated separately, and so we may next assume that V = 1_B for some predictable set B. Approximating separately on disjoint intervals, we may finally reduce to the case when B ⊂ Ω × [0, t] for some t > 0. The desired approximation is then obtained from Lemma 25.1 by a monotone class argument. □

Proof of Theorem 26.2: As in Theorem 17.11, we may construct the integral V · M as the a.s. unique element of M²_{0,loc} satisfying (i). The mapping (V, M) ↦ V ·
$M$ is clearly bilinear, and by the analogue of Lemma 17.10 it extends the elementary predictable integral. Properties (ii) and (iv) may be obtained in the same way as in Propositions 17.14 and 17.15. The stated continuity property follows immediately from (i) and Proposition 26.1 (v). To get the stated uniqueness, it is then enough to apply Lemma 26.3 with $A = \langle M\rangle$ and $p = 2$.

To prove (iii), we note from Lemma 26.3 with $A_t = \langle M\rangle_t + \sum_{s\le t}(\Delta M_s)^2$ that there exist some processes $V^n \in \mathcal{E}$ satisfying $V^n\Delta M \to V\,\Delta M$ and $(V^n \cdot M - V \cdot M)^* \to 0$ a.s. In particular, $\Delta(V^n \cdot M) \to \Delta(V \cdot M)$ a.s., and so (iii) follows from the corresponding relation for the elementary integrals $V^n \cdot M$. The argument relies on the fact that $\sum_{s\le t}(\Delta M_s)^2 < \infty$ a.s.
518 Foundations of Modern Probability

To verify this, we may assume that $M \in \mathcal{M}^2_0$ and define $t_{n,k} = kt2^{-n}$ for $k \le 2^n$. By Fatou's lemma,
$$E\sum_{s\le t}(\Delta M_s)^2 \le E\liminf_{n\to\infty}\sum_k\big(M_{t_{n,k}} - M_{t_{n,k-1}}\big)^2 \le \liminf_{n\to\infty}E\sum_k\big(M_{t_{n,k}} - M_{t_{n,k-1}}\big)^2 = EM_t^2 < \infty. \qquad\Box$$

A semimartingale is defined as a right-continuous, adapted process $X$ admitting a decomposition $M + A$, where $M$ is a local martingale and $A$ is a process of locally finite variation starting at 0. If the variation of $A$ is even locally integrable, we can write $X = (M + A - \hat{A}) + \hat{A}$, where $\hat{A}$ denotes the compensator of $A$. Hence, in this case we can choose $A$ to be predictable. The decomposition is then a.s. unique by Propositions 17.2 and 25.16, and $X$ is called a special semimartingale with canonical decomposition $M + A$.

Lévy processes are the basic examples of semimartingales. In particular, we note that a Lévy process is a special semimartingale iff its Lévy measure $\nu$ satisfies $\int(x^2 \wedge |x|)\,\nu(dx) < \infty$. From Theorem 25.5 it is further seen that any local submartingale is a special semimartingale.

The next result extends the stochastic integration to general semimartingales. At this stage we consider only locally bounded integrands, which covers most applications of interest.

Theorem 26.4 (semimartingale integral, Doléans-Dade and Meyer) The $L^2$-integral of Theorem 26.2 and the Lebesgue–Stieltjes integral extend a.s. uniquely to a bilinear map of any semimartingale $X$ and locally bounded, predictable process $V$ into a semimartingale $V \cdot X$. This integral satisfies conditions (ii)–(iv) of Theorem 26.2 and is such that, if $V \ge |V^n| \to 0$ for some locally bounded, predictable processes $V, V^1, V^2, \ldots$, then $(V^n \cdot X)^*_t \overset{P}{\to} 0$ for all $t > 0$. Finally, $V \cdot X$ is a local martingale whenever this holds for $X$.

Our proof relies on the following basic decomposition.
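Before turning to that decomposition, the special-semimartingale criterion for Lévy processes can be checked numerically in a simple case. For the one-sided stable-like measure $\nu(dx) = x^{-1-\alpha}\,dx$ on $(0,\infty)$, the integral $\int(x^2\wedge x)\,\nu(dx)$ always converges near 0 for $\alpha < 2$, but converges at infinity only when $\alpha > 1$. The sketch below is an added illustration under these assumptions, not part of the original text.

```python
import numpy as np

def levy_criterion(alpha, R):
    """Approximate the integral of (x^2 ^ x) * x^(-1-alpha) over (0, R]
    for nu(dx) = x^(-1-alpha) dx on (0, infinity).  Finiteness of the
    limit as R grows is the special-semimartingale criterion for this
    (illustrative) Levy measure: finite iff alpha > 1."""
    x = np.logspace(-8, np.log10(R), 200_000)
    f = np.minimum(x**2, x) * x**(-1.0 - alpha)
    # manual trapezoid rule, to stay independent of NumPy version details
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

conv_small, conv_big = levy_criterion(1.5, 1e3), levy_criterion(1.5, 1e6)
div_small,  div_big  = levy_criterion(0.5, 1e3), levy_criterion(0.5, 1e6)
# alpha = 1.5: the value stabilizes (near 4); alpha = 0.5: it keeps growing.
```

Extending the truncation level $R$ leaves the $\alpha = 1.5$ value essentially unchanged while the $\alpha = 0.5$ value grows without bound, matching the criterion.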
Lemma 26.5 (truncation, Doléans-Dade, Jacod and Mémin, Yan) Any local martingale $M$ has a decomposition into local martingales $M'$ and $M''$, where $M'$ has locally integrable variation and $|\Delta M''| \le 1$ a.s.

Proof: Define
$$A_t = \sum_{s\le t}\Delta M_s\,1\{|\Delta M_s| > \tfrac12\}, \quad t \ge 0.$$
By optional sampling, we note that $A$ has locally integrable variation. Let $\hat{A}$ denote the compensator of $A$, and put $M' = A - \hat{A}$ and $M'' = M - M'$. Then $M'$ and $M''$ are again local martingales, and $M'$ has locally integrable variation. Furthermore,
$$|\Delta M''| \le |\Delta M - \Delta A| + |\Delta\hat{A}| \le \tfrac12 + |\Delta\hat{A}|,$$
and so it suffices to show that $|\Delta\hat{A}| \le \tfrac12$. Since the constructions of $A$ and $\hat{A}$ commute with optional stopping, we may then assume that $M$ and
$M'$ are uniformly integrable. Now $\hat{A}$ is predictable, so the times $\tau = n \wedge \inf\{t;\ |\Delta\hat{A}_t| > \tfrac12\}$ are predictable by Theorem 25.14, and it is enough to show that $|\Delta\hat{A}_\tau| \le \tfrac12$ a.s. Clearly, $E[M_\tau|\mathcal{F}_{\tau-}] = M_{\tau-}$, so $E[\Delta M_\tau|\mathcal{F}_{\tau-}] = 0$ a.s., and so by Lemma 25.3
$$|\Delta\hat{A}_\tau| = \big|E[\Delta A_\tau|\mathcal{F}_{\tau-}]\big| = \big|E[\Delta M_\tau;\ |\Delta M_\tau| > \tfrac12\,|\,\mathcal{F}_{\tau-}]\big| = \big|E[\Delta M_\tau;\ |\Delta M_\tau| \le \tfrac12\,|\,\mathcal{F}_{\tau-}]\big| \le \tfrac12. \qquad\Box$$

Proof of Theorem 26.4: By Lemma 26.5 we may write $X = M + A$, where $M$ is a local martingale with bounded jumps, hence a local $L^2$-martingale, and $A$ has locally finite variation. For any locally bounded, predictable process $V$ we may then define $V \cdot X = V \cdot M + V \cdot A$, where the first term is the integral in Theorem 26.2, and the second term is an ordinary Lebesgue–Stieltjes integral. If $V \ge |V^n| \to 0$, then $((V^n)^2 \cdot \langle M\rangle)_t \to 0$ and $(V^n \cdot A)^*_t \to 0$ by dominated convergence, and so Theorem 26.2 yields $(V^n \cdot X)^*_t \overset{P}{\to} 0$ for all $t > 0$.

To prove the uniqueness, it suffices to show that if $M = A$ is a local $L^2$-martingale of locally finite variation, then $V \cdot M = V \cdot A$ a.s. for every locally bounded, predictable process $V$, where $V \cdot M$ is the integral in Theorem 26.2 and $V \cdot A$ is an elementary Stieltjes integral. The two integrals clearly agree when $V \in \mathcal{E}$. For general $V$, we may approximate as in Lemma 26.3 by processes $V^n \in \mathcal{E}$ such that $((V^n - V)^2 \cdot \langle M\rangle)^* \to 0$ and $(|V^n - V| \cdot A)^* \to 0$ a.s. But then $(V^n \cdot M)_t \overset{P}{\to} (V \cdot M)_t$ and $(V^n \cdot A)_t \to (V \cdot A)_t$ for every $t > 0$, and the desired equality follows.

To prove the last assertion, we may reduce by means of Lemma 26.5 and a suitable localization to the case when $V$ is bounded and $X$ is a martingale whose total variation $A$ is integrable. By Lemma 26.3 we may next choose some uniformly bounded processes $V^1, V^2, \ldots \in \mathcal{E}$ such that $(|V^n - V| \cdot A)_t \to 0$ a.s. for every $t > 0$. Then $(V^n \cdot X)_t \to (V \cdot X)_t$ a.s. for all $t$, and by dominated convergence this remains true in $L^1$. Thus, the martingale property of $V^n \cdot X$ carries over to $V \cdot X$.
$\Box$

For any semimartingales $X$ and $Y$, the left-continuous versions $X_- = (X_{t-})$ and $Y_- = (Y_{t-})$ are locally bounded and predictable, and so they can serve as integrands in the general stochastic integral. We may then define the quadratic variation $[X]$ and covariation $[X, Y]$ by the integration-by-parts formulas
$$[X] = X^2 - X_0^2 - 2X_- \cdot X,$$
$$[X, Y] = XY - X_0Y_0 - X_- \cdot Y - Y_- \cdot X = ([X + Y] - [X - Y])/4. \qquad (1)$$
In particular, $[X] = [X, X]$. Here we list some further basic properties of the covariation.
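First, though, formula (1) can be sanity-checked numerically: on a discrete path the identity is exact, $X_n^2 - X_0^2 - 2\sum_k X_{k-1}\Delta X_k = \sum_k(\Delta X_k)^2$, so the two expressions for $[X]$ can be compared directly. The sketch below is an added illustration, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# A discretized path with a few larger jumps (purely illustrative).
dX = rng.normal(scale=0.05, size=1000)
dX[::100] += rng.normal(scale=0.5, size=10)
X = np.concatenate([[1.0], 1.0 + np.cumsum(dX)])

# Discrete analogue of the stochastic integral X_- . X ...
int_Xminus_dX = np.sum(X[:-1] * np.diff(X))
# ... so that [X] computed from the integration-by-parts formula ...
qv_by_parts = X[-1]**2 - X[0]**2 - 2.0 * int_Xminus_dX
# ... agrees exactly with the sum of squared increments.
qv_direct = np.sum(np.diff(X)**2)
```

The agreement is an algebraic identity for each path; the content of the continuous-time theory lies in making sense of $X_- \cdot X$ in the limit.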
Theorem 26.6 (covariation) For any semimartingales $X$ and $Y$,
(i) $[X, Y] = [X - X_0, Y - Y_0]$ a.s.;
(ii) $[X]$ is a.s. nondecreasing, and $[X, Y]$ is a.s. symmetric and bilinear;
(iii) $|[X, Y]| \le \int|d[X, Y]| \le [X]^{1/2}[Y]^{1/2}$ a.s.;
(iv) $\Delta[X] = (\Delta X)^2$ and $\Delta[X, Y] = \Delta X\,\Delta Y$ a.s.;
(v) $[V \cdot X, Y] = V \cdot [X, Y]$ a.s. for any locally bounded, predictable $V$;
(vi) $[X^\tau, Y] = [X^\tau, Y^\tau] = [X, Y]^\tau$ a.s. for any optional time $\tau$;
(vii) if $M, N \in \mathcal{M}^2_{\rm loc}$, then $[M, N]$ has compensator $\langle M, N\rangle$;
(viii) if $A$ has locally finite variation, then $[X, A]_t = \sum_{s\le t}\Delta X_s\Delta A_s$ a.s.

Proof: The symmetry and bilinearity of $[X, Y]$ are obvious from (1), and to get (i) it remains to check that $[X, Y_0] = 0$.

(ii) We may extend Proposition 17.17 with the same proof to general semimartingales. In particular, $[X]_s \le [X]_t$ a.s. for any $s < t$. By right-continuity the exceptional null set can be chosen to be independent of $s$ and $t$, which means that $[X]$ is a.s. nondecreasing. Relation (iii) may now be proved as in Proposition 17.9.

(iv) By (1) and Theorem 26.2 (iii),
$$\Delta[X, Y]_t = \Delta(XY)_t - \Delta(X_- \cdot Y)_t - \Delta(Y_- \cdot X)_t = X_tY_t - X_{t-}Y_{t-} - X_{t-}\Delta Y_t - Y_{t-}\Delta X_t = \Delta X_t\,\Delta Y_t.$$

(v) For $V \in \mathcal{E}$ the relation follows most easily from the extended version of Proposition 17.17. Also note that both sides are a.s. linear in $V$. Now let $V, V^1, V^2, \ldots$ be locally bounded and predictable with $V \ge |V^n| \to 0$. Then $V^n \cdot [X, Y] \to 0$ by dominated convergence, and by Theorem 26.4 we have
$$[V^n \cdot X, Y] = (V^n \cdot X)Y - (V^n \cdot X)_- \cdot Y - (V^nY_-) \cdot X \overset{P}{\to} 0.$$
Using a monotone class argument, we may now extend the relation to arbitrary $V$.

(vi) This follows from (v) with $V = 1_{[0,\tau]}$.

(vii) Since $M_- \cdot N$ and $N_- \cdot M$ are local martingales, the assertion follows from (1) and the definition of $\langle M, N\rangle$.

(viii) For step processes $A$ the stated relation follows from the extended version of Proposition 17.17.
Now assume instead that $\Delta A \le \varepsilon$, and conclude from the same result and property (iii) together with the ordinary Cauchy–Buniakovsky inequality that
$$[X, A]_t^2 \vee \Big(\sum_{s\le t}\Delta X_s\Delta A_s\Big)^2 \le [X]_t[A]_t \le \varepsilon[X]_t\int_0^t|dA_s|.$$
The assertion now follows by a simple approximation. $\Box$

We may now extend the Itô formula of Theorem 17.18 to a substitution rule for general semimartingales. By a semimartingale in $\mathbb{R}^d$ we mean a process $X = (X^1, \ldots, X^d)$ such that each component $X^i$ is a one-dimensional
semimartingale. Let $[X^i, X^j]^c$ denote the continuous components of the finite-variation processes $[X^i, X^j]$, and write $f'_i$ and $f''_{ij}$ for the first- and second-order partial derivatives of $f$, respectively. Summation over repeated indices is understood as before.

Theorem 26.7 (substitution rule, Kunita and Watanabe) For any semimartingale $X = (X^1, \ldots, X^d)$ in $\mathbb{R}^d$ and function $f \in C^2(\mathbb{R}^d)$, we have
$$f(X_t) = f(X_0) + \int_0^t f'_i(X_{s-})\,dX^i_s + \tfrac12\int_0^t f''_{ij}(X_{s-})\,d[X^i, X^j]^c_s + \sum_{s\le t}\big\{\Delta f(X_s) - f'_i(X_{s-})\Delta X^i_s\big\}. \qquad (2)$$

Proof: Assuming that (2) holds for some function $f \in C^2(\mathbb{R}^d)$, we shall prove for any $k \in \{1, \ldots, d\}$ that (2) remains true for $g(x) = x_kf(x)$. Then note that by (1)
$$g(X) = g(X_0) + X^k_- \cdot f(X) + f(X_-) \cdot X^k + [X^k, f(X)]. \qquad (3)$$
Writing $\tilde{f}(x, y) = f(x) - f(y) - f'_i(y)(x_i - y_i)$, we get by (2) and property (ii) of Theorem 26.2
$$X^k_- \cdot f(X) = X^k_-f'_i(X_-) \cdot X^i + \tfrac12X^k_-f''_{ij}(X_-) \cdot [X^i, X^j]^c + \sum_s X^k_{s-}\tilde{f}(X_s, X_{s-}). \qquad (4)$$
Next we note that, by properties (ii), (iv), (v), and (viii) of Theorem 26.6,
$$[X^k, f(X)] = f'_i(X_-) \cdot [X^k, X^i] + \sum_s\Delta X^k_s\,\tilde{f}(X_s, X_{s-}) = f'_i(X_-) \cdot [X^k, X^i]^c + \sum_s\Delta X^k_s\,\Delta f(X_s). \qquad (5)$$
Inserting (4) and (5) into (3), and using the elementary formulas
$$g'_i(x) = \delta_{ik}f(x) + x_kf'_i(x),$$
$$g''_{ij}(x) = \delta_{ik}f'_j(x) + \delta_{jk}f'_i(x) + x_kf''_{ij}(x),$$
$$\tilde{g}(x, y) = (x_k - y_k)(f(x) - f(y)) + y_k\tilde{f}(x, y),$$
we obtain after some simplification the desired expression for $g(X)$.

Equation (2) is trivially true for constant functions, and it extends by induction and linearity to arbitrary polynomials. Now any function $f \in C^2(\mathbb{R}^d)$ may be approximated by polynomials, in such a way that all derivatives up to the second order tend uniformly to those of $f$ on every compact set. To prove (2) for $f$, it is then enough to show that the right-hand side tends to zero in probability, as $f$ and its first- and second-order derivatives tend to zero, uniformly on compact sets.
For the two integrals in (2), this is clear by the dominated convergence property of Theorem 26.4, and it remains to consider the last term. Writing $B_t = \{x \in \mathbb{R}^d;\ |x| \le X^*_t\}$ and $\|g\|_B = \sup_B|g|$, we get by Taylor's formula
in $\mathbb{R}^d$
$$\sum_{s\le t}|\tilde{f}(X_s, X_{s-})| \le \sum_{i,j}\|f''_{ij}\|_{B_t}\sum_{s\le t}|\Delta X_s|^2 \le \sum_{i,j}\|f''_{ij}\|_{B_t}\sum_i[X^i]_t \to 0.$$
The same estimate shows that the last term has locally finite variation. $\Box$

To illustrate the use of the general substitution rule, we consider a partial extension of Proposition 21.2 to general semimartingales.

Theorem 26.8 (Doléans' exponential) For any semimartingale $X$ with $X_0 = 0$, the equation $Z = 1 + Z_- \cdot X$ has the a.s. unique solution
$$Z_t = \mathcal{E}(X)_t = \exp\big(X_t - \tfrac12[X]^c_t\big)\prod_{s\le t}(1 + \Delta X_s)e^{-\Delta X_s}, \quad t \ge 0. \qquad (6)$$

Note that the infinite product in (6) is a.s. absolutely convergent, since $\sum_{s\le t}(\Delta X_s)^2 \le [X]_t < \infty$. However, we may have $\Delta X_s = -1$ for some $s > 0$, in which case $Z_t = 0$ for $t \ge s$. The process $\mathcal{E}(X)$ in (6) is called the Doléans exponential of $X$. When $X$ is continuous, we get $\mathcal{E}(X) = \exp(X - \tfrac12[X])$, in agreement with the notation of Lemma 18.21. For processes $A$ of locally finite variation, formula (6) simplifies to
$$\mathcal{E}(A)_t = \exp(A^c_t)\prod_{s\le t}(1 + \Delta A_s), \quad t \ge 0.$$

Proof of Theorem 26.8: To check that (6) is a solution, we may write $Z = f(Y, V)$, where $Y = X - \tfrac12[X]^c$, $V = \prod(1 + \Delta X)e^{-\Delta X}$, and $f(y, v) = e^yv$. By Theorem 26.7 we get
$$Z - 1 = Z_- \cdot Y + e^{Y_-} \cdot V + \tfrac12Z_- \cdot [X]^c + \sum\big\{\Delta Z - Z_-\Delta Y - e^{Y_-}\Delta V\big\}. \qquad (7)$$
Now $e^{Y_-} \cdot V = \sum e^{Y_-}\Delta V$ since $V$ is of pure-jump type, and furthermore $\Delta Z = Z_-\Delta X$. Hence, the right-hand side of (7) simplifies to $Z_- \cdot X$, as desired.

To prove the uniqueness, let $Z$ be an arbitrary solution, and put $V = Ze^{-Y}$, where $Y = X - \tfrac12[X]^c$ as before. By Theorem 26.7 we get
$$V - 1 = e^{-Y_-} \cdot Z - V_- \cdot Y + \tfrac12V_- \cdot [X]^c - e^{-Y_-} \cdot [X, Z]^c + \sum\big\{\Delta V + V_-\Delta Y - e^{-Y_-}\Delta Z\big\}$$
$$= V_- \cdot X - V_- \cdot X + \tfrac12V_- \cdot [X]^c + \tfrac12V_- \cdot [X]^c - V_- \cdot [X]^c + \sum\big\{\Delta V + V_-\Delta X - V_-\Delta X\big\} = \sum\Delta V_s.$$
Thus, $V$ is a purely discontinuous process of locally finite variation. We may further compute
$$\Delta V = Ze^{-Y} - Z_-e^{-Y_-} = (Z_- + \Delta Z)e^{-Y_- - \Delta Y} - Z_-e^{-Y_-} = V_-\big\{(1 + \Delta X)e^{-\Delta X} - 1\big\},$$
which shows that $V = 1 + V_- \cdot A$ with $A = \sum\{(1 + \Delta X)e^{-\Delta X} - 1\}$.

It remains to show that the homogeneous equation $V = V_- \cdot A$ has the unique solution $V = 0$. Then define $R_t = \int_{(0,t]}|dA|$, and conclude from Theorem 26.7 and the convexity of the function $x \mapsto x^n$ that
$$R^n = nR^{n-1}_- \cdot R + \sum\big(\Delta R^n - nR^{n-1}_-\Delta R\big) \ge nR^{n-1}_- \cdot R. \qquad (8)$$
We may now prove by induction that
$$V^*_t \le V^*_t\,R^n_t/n!, \quad t \ge 0,\ n \in \mathbb{Z}_+. \qquad (9)$$
This is obvious for $n = 0$, and assuming (9) to be true for $n - 1$, we get by (8)
$$V^*_t = (V_- \cdot A)^*_t \le V^*_t\,(R^{n-1}_- \cdot R)_t/(n-1)! \le V^*_t\,R^n_t/n!,$$
as required. Since $R^n_t/n! \to 0$ as $n \to \infty$, relation (9) yields $V^*_t = 0$ for all $t > 0$. $\Box$

The equation $Z = 1 + Z_- \cdot X$ arises naturally in connection with changes of probability measure. The following result extends Proposition 18.20 to general local martingales.

Theorem 26.9 (change of measure, van Schuppen and Wong) Let $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$, and consider a local $P$-martingale $M$ such that the process $[M, Z]$ has locally integrable variation and $P$-compensator $\langle M, Z\rangle$. Then $\tilde{M} = M - Z_-^{-1} \cdot \langle M, Z\rangle$ is a local $Q$-martingale.

A lemma will be needed for the proof.

Lemma 26.10 (integration by parts) If $X$ is a semimartingale and $A$ is a predictable process of locally finite variation, then $AX = A \cdot X + X_- \cdot A$ a.s.

Proof: We need to show that $\Delta A \cdot X = [A, X]$ a.s., which by Theorem 26.6 (viii) is equivalent to
$$\int_{(0,t]}\Delta A_s\,dX_s = \sum_{s\le t}\Delta A_s\Delta X_s, \quad t \ge 0.$$
Noting that the series on the right is absolutely convergent by the Cauchy–Buniakovsky inequality, we may reduce, by dominated convergence on each side, to the case when $A$ is constant apart from finitely many jumps. Using Lemma 25.3 and Theorem 25.14, we may next proceed to the case when $A$ has at most one jump, occurring at some predictable time $\tau$. Introducing
an announcing sequence $(\tau_n)$ and writing $Y = \Delta A \cdot X$, we get by property (iv) of Theorem 26.2
$$Y_{\tau_n\wedge t} = 0 = Y_t - Y_{\tau\wedge t} \quad \text{a.s.}, \quad t \ge 0,\ n \in \mathbb{N}.$$
Thus, even $Y$ is constant apart from a possible jump at $\tau$. Finally, property (iii) of Theorem 26.2 yields $\Delta Y_\tau = \Delta A_\tau\Delta X_\tau$ a.s. on $\{\tau < \infty\}$. $\Box$

Proof of Theorem 26.9: For each $n \in \mathbb{N}$, let $\tau_n = \inf\{t;\ Z_t < 1/n\}$, and note that $\tau_n \to \infty$ a.s. $Q$ by Lemma 18.17. Hence, $\tilde{M}$ is well defined under $Q$, and it suffices as in Lemma 18.15 to show that $(\tilde{M}Z)^{\tau_n}$ is a local $P$-martingale for every $n$. Writing $\sim$ for equality up to a local $P$-martingale, we may conclude from Lemma 26.10 with $X = Z$ and $A = Z_-^{-1} \cdot \langle M, Z\rangle$ that, on every interval $[0, \tau_n]$,
$$MZ \sim [M, Z] \sim \langle M, Z\rangle = Z_- \cdot A \sim AZ.$$
Thus, we get $\tilde{M}Z = (M - A)Z \sim 0$, as required. $\Box$

Using the last theorem, we may easily show that the class of semimartingales is invariant under absolutely continuous changes of the probability measure. A special case of this result was previously obtained as part of Proposition 18.20.

Corollary 26.11 (preservation law, Jacod) If $Q \ll P$ on $\mathcal{F}_t$ for all $t \ge 0$, then every $P$-semimartingale is also a $Q$-semimartingale.

Proof: Assume that $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$. We need to show that every local $P$-martingale $M$ is a $Q$-semimartingale. By Lemma 26.5 we may then assume $M$ to be bounded, so that $[M]$ is locally bounded. By Theorem 26.9 it suffices to show that $[M, Z]$ has locally integrable variation, and by Theorem 26.6 (iii) it is then enough to prove that $[Z]^{1/2}$ is locally integrable. Now Theorem 26.6 (iv) yields
$$[Z]^{1/2}_t \le [Z]^{1/2}_{t-} + |\Delta Z_t| \le [Z]^{1/2}_{t-} + Z_{t-} + |Z_t|, \quad t \ge 0,$$
and so the desired integrability follows by optional sampling. $\Box$

Our next aim is to extend the BDG inequalities of Theorem 17.7 to general local martingales. Such an extension turns out to be possible only for exponents $p \ge 1$.
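A quick Monte Carlo sanity check of the $p = 1$ case of the inequalities about to be stated: for a simple random walk the quadratic variation is deterministic, $[M]_n = n$, so $E[M]_n^{1/2} = \sqrt{n}$, and the norm equivalence forces $EM^*_n/\sqrt{n}$ to stay within universal bounds. The sketch below is an added illustration under these assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simple random walk: increments +-1, so [M]_n = n and [M]_n^{1/2} = sqrt(n).
n, sims = 400, 5000
steps = rng.choice([-1.0, 1.0], size=(sims, n))
M = np.cumsum(steps, axis=1)

EMstar = np.abs(M).max(axis=1).mean()   # Monte Carlo estimate of E M*_n
ratio = EMstar / np.sqrt(n)             # must stay bounded above and below
```

Here the ratio settles near the Brownian value $E\sup_{[0,1]}|B| \approx 1.25$, comfortably inside any fixed constants $c_1^{-1}$ and $c_1$.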
Theorem 26.12 (norm inequalities, Burkholder, Davis, Gundy) There exist some constants $c_p \in (0, \infty)$, $p \ge 1$, such that for any local martingale $M$ with $M_0 = 0$,
$$c_p^{-1}E[M]^{p/2}_\infty \le EM^{*p} \le c_pE[M]^{p/2}_\infty, \quad p \ge 1. \qquad (10)$$

As in Corollary 17.8, it follows in particular that $M$ is a uniformly integrable martingale whenever $E[M]^{1/2}_\infty < \infty$.

Proof for $p = 1$ (Davis): To exploit the symmetry of the argument, we write $M^\natural$ and $M^\#$ for the processes $M^*$ and $[M]^{1/2}$, taken in either order.
Put $J = \Delta M$, and define
$$A_t = \sum_{s\le t}J_s1\{|J_s| > 2J^*_{s-}\}, \quad t \ge 0.$$
Since $|\Delta A| \le 2\Delta J^*$, we have
$$\int_0^\infty|dA_s| = \sum_s|\Delta A_s| \le 2J^*_\infty \le 4M^\natural_\infty.$$
Writing $\hat{A}$ for the compensator of $A$ and putting $D = A - \hat{A}$, we get
$$ED^*_\infty \vee ED^\#_\infty \le E\int_0^\infty|dD_s| \lesssim E\int_0^\infty|dA_s| \lesssim EM^\natural_\infty. \qquad (11)$$
To get a similar estimate for $N = M - D$, we introduce the optional times
$$\tau_r = \inf\{t;\ N^\natural_t \vee J^*_t > r\}, \quad r > 0,$$
and note that
$$P\{N^\#_\infty > r\} \le P\{\tau_r < \infty\} + P\{\tau_r = \infty,\ N^\#_\infty > r\} \le P\{N^\natural_\infty > r\} + P\{J^*_\infty > r\} + P\{N^\#_{\tau_r} > r\}. \qquad (12)$$
Arguing as in the proof of Lemma 26.5, we get $|\Delta N| \le 4J^*_-$, and so
$$N^\natural_{\tau_r} \le N^\natural_\infty \wedge \big(N^\natural_{\tau_r-} + 4J^*_{\tau_r-}\big) \le N^\natural_\infty \wedge 5r.$$
Since $N^2 - [N]$ is a local martingale, we get by Chebyshev's inequality or Proposition 7.15, respectively,
$$r^2P\{N^\#_{\tau_r} > r\} \lesssim E(N^\natural_{\tau_r})^2 \lesssim E(N^\natural_\infty \wedge r)^2.$$
Hence, by Fubini's theorem and some calculus,
$$\int_0^\infty P\{N^\#_{\tau_r} > r\}\,dr \lesssim \int_0^\infty E(N^\natural_\infty \wedge r)^2r^{-2}\,dr \lesssim EN^\natural_\infty.$$
Combining this with (11)–(12) and using Lemma 3.4, we get
$$EN^\#_\infty = \int_0^\infty P\{N^\#_\infty > r\}\,dr \le \int_0^\infty\big(P\{N^\natural_\infty > r\} + P\{J^*_\infty > r\} + P\{N^\#_{\tau_r} > r\}\big)\,dr \lesssim EN^\natural_\infty + EJ^*_\infty \lesssim EM^\natural_\infty.$$
It remains to note that $EM^\#_\infty \le ED^\#_\infty + EN^\#_\infty$. $\Box$

Extension to $p \ge 1$ (Garsia): For any $t \ge 0$ and $B \in \mathcal{F}_t$, we may apply (10) with $p = 1$ to the local martingale $1_B(M - M^t)$ to get a.s.
$$c_1^{-1}E\big[[M - M^t]^{1/2}_\infty\,\big|\,\mathcal{F}_t\big] \le E\big[(M - M^t)^*_\infty\,\big|\,\mathcal{F}_t\big] \le c_1E\big[[M - M^t]^{1/2}_\infty\,\big|\,\mathcal{F}_t\big].$$
Since
$$[M]^{1/2}_\infty - [M]^{1/2}_t \le [M - M^t]^{1/2}_\infty \le [M]^{1/2}_\infty, \qquad M^*_\infty - M^*_t \le (M - M^t)^*_\infty \le 2M^*_\infty,$$
the relation $E[A_\infty - A_t|\mathcal{F}_t] \lesssim E[\zeta|\mathcal{F}_t]$ occurring in Proposition 25.21 holds with $A_t = [M]^{1/2}_t$ and $\zeta = M^*_\infty$, and also with $A_t = M^*_t$ and $\zeta = [M]^{1/2}_\infty$. Since
$$\Delta M^*_t \vee \Delta[M]^{1/2}_t \le |\Delta M_t| \le [M]^{1/2}_t \wedge 2M^*_t,$$
we have in both cases $\Delta A_\tau \lesssim E[\zeta|\mathcal{F}_\tau]$ a.s. for every optional time $\tau$, and so the cited condition remains fulfilled for the left-continuous version $A_-$. Hence, Proposition 25.21 yields $\|A_\infty\|_p \lesssim \|\zeta\|_p$ for every $p \ge 1$, and (10) follows. $\Box$

We may use the last theorem to extend the stochastic integral to a larger class of integrands. Then write $\mathcal{M}$ for the space of local martingales and $\mathcal{M}_0$ for the subclass of processes $M$ with $M_0 = 0$. For any $M \in \mathcal{M}$, let $L(M)$ denote the class of predictable processes $V$ such that $(V^2 \cdot [M])^{1/2}$ is locally integrable.

Theorem 26.13 (martingale integral, Meyer) The elementary predictable integral extends a.s. uniquely to a bilinear map of any $M \in \mathcal{M}$ and $V \in L(M)$ into $V \cdot M \in \mathcal{M}_0$, such that if $V, V^1, V^2, \ldots \in L(M)$ with $|V^n| \le V$ and $((V^n)^2 \cdot [M])_t \overset{P}{\to} 0$ for some $t > 0$, then $(V^n \cdot M)^*_t \overset{P}{\to} 0$. This integral satisfies properties (ii)–(iv) of Theorem 26.2 and is characterized by the condition
$$[V \cdot M, N] = V \cdot [M, N] \quad \text{a.s.}, \quad N \in \mathcal{M}. \qquad (13)$$

Proof: For the construction of the integral, we may reduce by localization to the case when $E(M - M_0)^* < \infty$ and $E(V^2 \cdot [M])^{1/2}_\infty < \infty$. For each $n \in \mathbb{N}$, define $V^n = V1\{|V| \le n\}$. Then $V^n \cdot M \in \mathcal{M}_0$ by Theorem 26.4, and by Theorem 26.12 we have $E(V^n \cdot M)^* < \infty$. Using Theorems 26.6 (v) and 26.12, Minkowski's inequality, and dominated convergence, we obtain
$$E(V^m \cdot M - V^n \cdot M)^* \lesssim E\big[(V^m - V^n) \cdot M\big]^{1/2}_\infty = E\big((V^m - V^n)^2 \cdot [M]\big)^{1/2}_\infty \to 0.$$
Hence, there exists a process $V \cdot M$ with $E(V^n \cdot M - V \cdot M)^* \to 0$, and clearly $V \cdot M \in \mathcal{M}_0$ and $E(V \cdot M)^* < \infty$.

To prove (13), we note that the relation holds for each $V^n$ by Theorem 26.6 (v). Since $E[V^n \cdot M - V \cdot M]^{1/2}_\infty \to 0$ by Theorem 26.12, we get by Theorem 26.6 (iii) for any $N \in \mathcal{M}$ and $t > 0$
$$\big|[V^n \cdot M, N]_t - [V \cdot M, N]_t\big| \le [V^n \cdot M - V \cdot M]^{1/2}_t[N]^{1/2}_t \overset{P}{\to} 0. \qquad (14)$$
Next we note that, by Theorem 26.6 (iii) and (v),
$$\int_0^t|V^n\,d[M, N]| = \int_0^t\big|d[V^n \cdot M, N]\big| \le [V^n \cdot M]^{1/2}_t[N]^{1/2}_t.$$
As $n \to \infty$, we get by monotone convergence on the left, and Minkowski's inequality on the right,
$$\int_0^t|V\,d[M, N]| \le [V \cdot M]^{1/2}_t[N]^{1/2}_t < \infty.$$
Hence, by dominated convergence $V^n \cdot [M, N] \to V \cdot [M, N]$, and (13) follows by combination with (14). To see that (13) determines $V \cdot M$, it remains to note that if $[M] = 0$ a.s. for some $M \in \mathcal{M}_0$, then $M^* = 0$ a.s. by Theorem 26.12.

To prove the stated continuity property, we may reduce by localization to the case when $E(V^2 \cdot [M])^{1/2}_\infty < \infty$. But then $E((V^n)^2 \cdot [M])^{1/2}_\infty \to 0$ by dominated convergence, and Theorem 26.12 yields $E(V^n \cdot M)^* \to 0$. To prove the uniqueness of the integral, it is enough to consider bounded integrands $V$. We may then approximate as in Lemma 26.3 by uniformly bounded processes $V^n \in \mathcal{E}$ with $((V^n - V)^2 \cdot [M])_t \to 0$, and conclude that $(V^n \cdot M - V \cdot M)^* \overset{P}{\to} 0$. Of the remaining properties in Theorem 26.2, relation (ii) may be proved as before by means of (13), whereas (iii) and (iv) follow most easily by truncation from the corresponding statements in Theorem 26.4. $\Box$

A semimartingale $X = M + A$ is said to be purely discontinuous if there exist some local martingales $M^1, M^2, \ldots$ of locally finite variation such that $E\big((M - M^n)^*_t\big)^2 \to 0$ for every $t > 0$. The property is clearly independent of the choice of decomposition $X = M + A$. To motivate the terminology, we note that any martingale $M$ of locally finite variation may be written as $M = M_0 + A - \hat{A}$, where $A_t = \sum_{s\le t}\Delta M_s$ and $\hat{A}$ denotes the compensator of $A$. Thus, $M - M_0$ is in this case a compensated sum of jumps.

Our present goal is to establish a fundamental decomposition of a general semimartingale $X$ into a continuous and a purely discontinuous component, corresponding to the elementary decomposition of the quadratic variation $[X]$ into a continuous part and a jump part. In this connection the reader is cautioned that, although any adapted process of locally finite variation is a purely discontinuous semimartingale, it may not be purely discontinuous in the sense of real analysis.

Theorem 26.14 (decomposition of semimartingales, Yoeurp, Meyer) Every semimartingale $X$ has an a.s. unique decomposition $X = X_0 + X^c + X^d$, where $X^c$ is a continuous local martingale with $X^c_0 = 0$ and $X^d$ is a purely discontinuous semimartingale. Furthermore, $[X^c] = [X]^c$ and $[X^d] = [X]^d$ a.s.
In this connection the reader is cautioned that, although any adapted process of locally finite variation is a purely discontinuous semimartingale, it may not be purely discontinuous in the sense of real analysis. Theorem 26.14 (decomposition of semimartingales, Yoeurp, Meyer) Ev- ery semimartingale X has an a.s. unique decomposition X = Xo+xc+X d , where XC is a continuous local martingale with X8 = 0 and X d is a purely discontinuous semimartingale. Furthermore, [XC] = [X]C and [X d ] = [X]d a.s. Proof: To decompose X it is enough to consider the martingale com- ponent in any decomposition X = Xo + M + A, and by Lemma 26.5 we may assume that M E M5,loc. We may then choose some optional times 
$\tau_n \uparrow \infty$, with $\tau_0 = 0$, such that $M^{\tau_n} \in \mathcal{M}^2_0$ for each $n$. It is enough to construct the desired decomposition for each process $M^{\tau_n} - M^{\tau_{n-1}}$, which reduces the discussion to the case when $M \in \mathcal{M}^2_0$. Now let $\mathcal{C}$ and $\mathcal{D}$ denote the classes of continuous and purely discontinuous processes in $\mathcal{M}^2_0$, and note that both are closed linear subspaces of the Hilbert space $\mathcal{M}^2_0$. The desired decomposition will follow from Theorem 1.33 if we can show that $\mathcal{D}^\perp \subset \mathcal{C}$.

Then let $M \in \mathcal{D}^\perp$. To see that $M$ is continuous, fix any $\varepsilon > 0$, and put $\tau = \inf\{t;\ \Delta M_t > \varepsilon\}$. Define $A_t = 1\{\tau \le t\}$, let $\hat{A}$ denote the compensator of $A$, and put $N = A - \hat{A}$. Integrating by parts and using Lemma 25.13 gives
$$E\hat{A}^2_\infty \lesssim E\int\hat{A}\,d\hat{A} = E\int\hat{A}\,dA = E[\hat{A}_\tau;\ \tau < \infty] \le E\hat{A}_\infty = EA_\infty \le 1.$$
Thus, $N$ is $L^2$-bounded and hence lies in $\mathcal{D}$. For any bounded martingale $M'$, we get
$$EM'_\infty N_\infty = E\int M'\,dN = E\int\Delta M'\,dN = E\int\Delta M'\,dA = E[\Delta M'_\tau;\ \tau < \infty],$$
where the first equality is obtained as in the proof of Lemma 25.7, the second is due to the predictability of $M'_-$, and the third holds since $\hat{A}$ is predictable and hence natural. Letting $M' \to M$ in $\mathcal{M}^2$, we obtain
$$0 = EM_\infty N_\infty = E[\Delta M_\tau;\ \tau < \infty] \ge \varepsilon P\{\tau < \infty\}.$$
Thus, $\Delta M \le \varepsilon$ a.s., and therefore $\Delta M \le 0$ a.s. since $\varepsilon$ is arbitrary. Similarly, $\Delta M \ge 0$ a.s., and the desired continuity follows.

Next assume that $M \in \mathcal{D}$ and $N \in \mathcal{C}$, and choose martingales $M^n \to M$ of locally finite variation. By Theorem 26.6 (vi) and (vii) and optional sampling, we get for any optional time $\tau$
$$0 = E[M^n, N]_\tau = EM^n_\tau N_\tau \to EM_\tau N_\tau = E[M, N]_\tau,$$
and so $[M, N]$ is a martingale by Lemma 7.13. Since it is also continuous by Theorem 26.6 (iv), Proposition 17.2 yields $[M, N] = 0$ a.s. In particular, $EM_\infty N_\infty = 0$, which shows that $\mathcal{C} \perp \mathcal{D}$. The uniqueness assertion now follows easily.

To prove the last assertion, we conclude from Theorem 26.6 (iv) that, for any $M \in \mathcal{M}^2$,
$$[M]_t = [M]^c_t + \sum_{s\le t}(\Delta M_s)^2, \quad t \ge 0. \qquad (15)$$
Letting $M \in \mathcal{D}$, we may choose martingales of locally finite variation $M^n \to M$. By Theorem 26.6 (vi) and (viii) we have $[M^n]^c = 0$ and $E[M^n - M]_\infty$
$\to 0$. For any $t > 0$, we get by Minkowski's inequality and (15)
$$\Big|\Big\{\sum_{s\le t}(\Delta M^n_s)^2\Big\}^{1/2} - \Big\{\sum_{s\le t}(\Delta M_s)^2\Big\}^{1/2}\Big| \le \Big\{\sum_{s\le t}(\Delta M^n_s - \Delta M_s)^2\Big\}^{1/2} \le [M^n - M]^{1/2}_t \overset{P}{\to} 0,$$
$$\big|[M^n]^{1/2}_t - [M]^{1/2}_t\big| \le [M^n - M]^{1/2}_t \overset{P}{\to} 0.$$
Taking limits in (15) for the martingales $M^n$, we get the same formula for $M$ without the term $[M]^c_t$, which shows that $[M] = [M]^d$.

Now consider any $M \in \mathcal{M}^2$. Using the strong orthogonality $[M^c, M^d] = 0$, we get a.s.
$$[M]^c + [M]^d = [M] = [M^c + M^d] = [M^c] + [M^d],$$
which shows that even $[M^c] = [M]^c$ a.s. By the same argument combined with Theorem 26.6 (viii) we obtain $[X^d] = [X]^d$ a.s. for any semimartingale $X$. $\Box$

The last result immediately yields an explicit formula for the covariation of two semimartingales.

Corollary 26.15 (decomposition of covariation) For any semimartingale $X$, the process $X^c$ is the a.s. unique continuous local martingale $M$ with $M_0 = 0$ such that $[X - M]$ is purely discontinuous. Furthermore, we have a.s. for any semimartingales $X$ and $Y$
$$[X, Y]_t = [X^c, Y^c]_t + \sum_{s\le t}\Delta X_s\Delta Y_s, \quad t \ge 0. \qquad (16)$$

In particular, we note that $(V \cdot X)^c = V \cdot X^c$ a.s. for any semimartingale $X$ and locally bounded, predictable process $V$.

Proof: If $M$ has the stated properties, then $[(X - M)^c] = [X - M]^c = 0$ a.s., and so $(X - M)^c = 0$ a.s. Thus, $X - M$ is purely discontinuous. Formula (16) holds by Theorem 26.6 (iv) and Theorem 26.14 when $X = Y$, and the general result follows by polarization. $\Box$

The purely discontinuous component of a local martingale has a further decomposition, similar to the decompositions of optional times and increasing processes in Propositions 25.4 and 25.17.

Corollary 26.16 (decomposition of martingales, Yoeurp) Every purely discontinuous local martingale $M$ has an a.s. unique decomposition $M = M_0 + M^q + M^a$ with purely discontinuous $M^q, M^a \in \mathcal{M}_0$, where $M^q$ is quasi-leftcontinuous and $M^a$ has accessible jumps. Furthermore, there exist some predictable times $\tau_1, \tau_2, \ldots$
with disjoint graphs such that $\{t;\ \Delta M^a_t \ne 0\} \subset \bigcup_n[\tau_n]$ a.s. Finally, $[M^q] = [M]^q$ and $[M^a] = [M]^a$ a.s., and also $\langle M^q\rangle = \langle M\rangle^c$ and $\langle M^a\rangle = \langle M\rangle^d$ a.s. when $M \in \mathcal{M}^2_{\rm loc}$.

Proof: Introduce the locally integrable process $A_t = \sum_{s\le t}\{(\Delta M_s)^2 \wedge 1\}$ with compensator $\hat{A}$, and define
$$M^a = 1\{\Delta\hat{A} > 0\} \cdot M, \qquad M^q = M - M_0 - M^a = 1\{\Delta\hat{A} = 0\} \cdot M.$$
By Theorem 26.4 we have $M^q, M^a \in \mathcal{M}_0$ and $\Delta M^q = 1\{\Delta\hat{A} = 0\}\Delta M$ a.s. Furthermore, $M^q$ and $M^a$ are purely discontinuous by Corollary 26.15. The proof may now be completed as in the case of Proposition 25.17. $\Box$

We may illustrate the use of the previous decompositions by proving two exponential inequalities for martingales with bounded jumps.

Theorem 26.17 (exponential inequalities) Let $M$ be a local martingale with $M_0 = 0$ such that $|\Delta M| \le c$ for some constant $c < 1$.
(i) If $[M]_\infty \le 1$ a.s., then
$$P\{M^* > r\} \lesssim \exp\Big\{-\frac{r^2}{2(1 + rc)}\Big\}, \quad r > 0.$$
(ii) If $\langle M\rangle_\infty \le 1$ a.s., then
$$P\{M^* > r\} \lesssim \exp\Big\{-\frac{r}{2c}\log(1 + rc)\Big\}, \quad r > 0.$$

For continuous martingales both bounds reduce to $e^{-r^2/2}$, which can also be obtained directly by more elementary methods. For the proof of Theorem 26.17 we need two lemmas. We begin with a characterization of certain pure jump-type martingales.

Lemma 26.18 (accessible jump-type martingales) Let $N$ be a pure jump-type process with integrable variation and accessible jumps. Then $N$ is a martingale iff $E[\Delta N_\tau|\mathcal{F}_{\tau-}] = 0$ a.s. for every finite predictable time $\tau$.

Proof: By Proposition 25.17 there exist some predictable times $\tau_1, \tau_2, \ldots$ with disjoint graphs such that $\{t > 0;\ \Delta N_t \ne 0\} \subset \bigcup_n[\tau_n]$. Assuming the stated condition, we get by Fubini's theorem and Lemma 25.2, for any bounded optional time $\tau$,
$$EN_\tau = \sum_n E[\Delta N_{\tau_n};\ \tau_n \le \tau] = \sum_n E\big[E[\Delta N_{\tau_n}|\mathcal{F}_{\tau_n-}];\ \tau_n \le \tau\big] = 0,$$
and so $N$ is a martingale by Lemma 7.13. Conversely, given any uniformly integrable martingale $N$ and finite predictable time $\tau$, we have a.s. $E[N_\tau|\mathcal{F}_{\tau-}] = N_{\tau-}$ and hence $E[\Delta N_\tau|\mathcal{F}_{\tau-}] = 0$. $\Box$

For general martingales $M$, the process $Z = e^{M - [M]/2}$ in Lemma 18.21 is not necessarily a martingale. For many purposes, however, it can be replaced by a similar supermartingale.

Lemma 26.19 (exponential supermartingales) Let $M$ be a local martingale with $M_0 = 0$ and $|\Delta M| \le c < \infty$ a.s., and put $a = f(c)$ and $b = g(c)$, where
$$f(x) = -\big(x + \log(1 - x)_+\big)x^{-2}, \qquad g(x) = (e^x - 1 - x)x^{-2}.$$
Then the processes $X = e^{M - a[M]}$ and $Y = e^{M - b\langle M\rangle}$ are supermartingales.
Proof: In the case of $X$ we may clearly assume that $c < 1$. By Theorem 26.7 we get, in an obvious shorthand notation,
$$X_-^{-1} \cdot X = M - (a - \tfrac12)[M]^c + \sum\big\{e^{\Delta M - a(\Delta M)^2} - 1 - \Delta M\big\}.$$
Here the first term on the right is a local martingale, and the second term is nonincreasing since $a \ge \tfrac12$. To see that even the sum is nonincreasing, we need to show that $\exp(x - ax^2) \le 1 + x$, or $f(-x) \le f(c)$, whenever $|x| \le c$; this holds since $f$ is nondecreasing. Thus, $X_-^{-1} \cdot X$ is a local supermartingale, and since $X > 0$, the same thing is true for $X_- \cdot (X_-^{-1} \cdot X) = X$. By Fatou's lemma it follows that $X$ is a true supermartingale.

In the case of $Y$, we may decompose $M$ according to Theorem 26.14 and Corollary 26.16 as $M = M^c + M^q + M^a$, and conclude by Theorem 26.7 that
$$Y_-^{-1} \cdot Y = M - b\langle M\rangle^c + \tfrac12[M]^c + \sum\big\{e^{\Delta M - b\Delta\langle M\rangle} - 1 - \Delta M\big\}$$
$$= M + b\big([M^q] - \langle M^q\rangle\big) - (b - \tfrac12)[M]^c + \sum\Big\{e^{\Delta M - b\Delta\langle M\rangle} - \frac{1 + \Delta M + b(\Delta M)^2}{1 + b\Delta\langle M\rangle}\Big\} + \sum\Big\{\frac{1 + \Delta M^a + b(\Delta M^a)^2}{1 + b\Delta\langle M^a\rangle} - 1 - \Delta M^a\Big\}.$$
Here the first two terms on the right are local martingales, and the third term is nonincreasing since $b \ge \tfrac12$. Even the first sum of jumps is nonincreasing, since $e^x - 1 - x \le bx^2$ for $|x| \le c$ and $e^y \ge 1 + y$ for $y \ge 0$. The last sum clearly defines a purely discontinuous process $N$ of locally finite variation and with accessible jumps. Fixing any finite predictable time $\tau$ and writing $\xi = \Delta M_\tau$ and $\eta = \Delta\langle M\rangle_\tau$, we note that
$$E\Big|\frac{1 + \xi + b\xi^2}{1 + b\eta} - 1 - \xi\Big| \le E\big|1 + \xi + b\xi^2 - (1 + \xi)(1 + b\eta)\big| = bE\big|\xi^2 - (1 + \xi)\eta\big| \le b(2 + c)E\xi^2.$$
Since $E\sum_t(\Delta M_t)^2 \le E[M]_\infty = E\langle M\rangle_\infty \le 1$, we conclude that the total variation of $N$ is integrable. Using Lemmas 25.3 and 26.18, we also note that a.s. $E[\xi|\mathcal{F}_{\tau-}] = 0$ and
$$E[\xi^2|\mathcal{F}_{\tau-}] = E\big[\Delta[M]_\tau\big|\mathcal{F}_{\tau-}\big] = E[\eta|\mathcal{F}_{\tau-}] = \eta.$$
Thus,
$$E\Big[\frac{1 + \xi + b\xi^2}{1 + b\eta} - 1 - \xi\,\Big|\,\mathcal{F}_{\tau-}\Big] = 0,$$
and Lemma 26.18 shows that $N$ is a martingale. The proof may now be completed as before. $\Box$
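Before the proof of Theorem 26.17, a Monte Carlo illustration of bound (i): for a random walk with increments $\pm 1/\sqrt{n}$ we have $|\Delta M| = c = 1/\sqrt{n}$ and $[M]_n = 1$ exactly, so the observed tail of $M^*$ can be compared with $\exp\{-r^2/2(1+rc)\}$. This is an added numerical sketch under these assumptions, not part of the original text; the theorem asserts the bound only up to a universal constant.

```python
import numpy as np

rng = np.random.default_rng(2)

# Scaled random walk: jumps +-c with c = 1/sqrt(n), so [M]_n = 1 a.s.
n, r, sims = 400, 2.0, 10_000
c = 1.0 / np.sqrt(n)
steps = rng.choice([-c, c], size=(sims, n))
Mstar = np.abs(np.cumsum(steps, axis=1)).max(axis=1)

empirical = float(np.mean(Mstar >= r))                 # observed tail frequency
bound = float(np.exp(-r**2 / (2.0 * (1.0 + r * c))))   # analytic bound from (i)
```

For these parameters the empirical frequency falls below the analytic bound even with constant 1, consistent with the statement.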
Proof of Theorem 26.17: (i) Fix any $u > 0$, and conclude from Lemma 26.19 that the process
$$X^u_t = \exp\big\{uM_t - u^2f(uc)[M]_t\big\}, \quad t \ge 0,$$
is a positive supermartingale. Since $[M]_\infty \le 1$ and $X^u_0 = 1$, we get for any $r > 0$
$$P\{\sup\nolimits_tM_t > r\} \le P\big\{\sup\nolimits_tX^u_t > \exp\{ur - u^2f(uc)\}\big\} \le \exp\{-ur + u^2f(uc)\}. \qquad (17)$$
Now define $F(x) = 2xf(x)$, and note that $F$ is continuous and strictly increasing from $[0, 1)$ onto $\mathbb{R}_+$. Also note that $F(x) \le x/(1 - x)$ and hence $F^{-1}(y) \ge y/(1 + y)$. Taking $u = F^{-1}(rc)/c$ in (17), we get
$$P\{\sup\nolimits_tM_t > r\} \le \exp\{-rF^{-1}(rc)/2c\} \le \exp\{-r^2/2(1 + rc)\}.$$
It remains to combine with the same inequality for $-M$.

(ii) Define $G(x) = 2xg(x)$, and note that $G$ is a continuous and strictly increasing mapping onto $\mathbb{R}_+$. Furthermore, $G(x) \le e^x - 1$, and so $G^{-1}(y) \ge \log(1 + y)$. Proceeding as before, we get
$$P\{\sup\nolimits_tM_t > r\} \le \exp\{-rG^{-1}(rc)/2c\} \le \exp\{-(r/2c)\log(1 + rc)\},$$
and the result follows. $\Box$

A quasi-martingale is defined as an integrable, adapted, and right-continuous process $X$ such that
$$\sup_\pi\sum_{k\le n}E\big|X_{t_k} - E[X_{t_{k+1}}|\mathcal{F}_{t_k}]\big| < \infty, \qquad (18)$$
where the supremum extends over all finite partitions $\pi$ of $\mathbb{R}_+$ of the form $0 = t_0 < t_1 < \cdots < t_n < \infty$, and the last term is computed under the conventions $t_{n+1} = \infty$ and $X_\infty = 0$. In particular, we note that (18) holds when $X$ is the sum of an $L^1$-bounded martingale and a process of integrable variation starting at 0. The next result shows that this case is close to the general situation. Here localization is defined in the usual way in terms of a sequence of optional times $\tau_n \uparrow \infty$.

Theorem 26.20 (quasi-martingales, Rao) Any quasi-martingale is a difference of two nonnegative supermartingales. Thus, a process $X$ with $X_0 = 0$ is a local quasi-martingale iff it is a special semimartingale.

Proof: For any $t \ge 0$, let $\mathcal{P}_t$ denote the class of partitions $\pi$ of the interval $[t, \infty)$ of the form $t = t_0 < t_1 < \cdots < t_n$, and define
$$\eta^{\pi\pm}_t = \sum_{k\le n}E\big[\big(X_{t_k} - E[X_{t_{k+1}}|\mathcal{F}_{t_k}]\big)^\pm\,\big|\,\mathcal{F}_t\big], \quad \pi \in \mathcal{P}_t,$$
where $t_{n+1} = \infty$ and $X_\infty = 0$ as before.
We claim that η_t^{π,+} and η_t^{π,−} are a.s. nondecreasing under refinements of π ∈ P_t. To see this, it is clearly
enough to add one more division point u to π, say in the interval (t_k, t_{k+1}). Put α = X_{t_k} − X_u and β = X_u − X_{t_{k+1}}. By subadditivity and Jensen's inequality we get the desired relation

E[E[α + β | F_{t_k}]^± | F_t] ≤ E[E[α | F_{t_k}]^± + E[β | F_{t_k}]^± | F_t]
  ≤ E[E[α | F_{t_k}]^± + E[β | F_u]^± | F_t].

Now fix any t ≥ 0, and conclude from (18) that m_t^± = sup_{π∈P_t} E η_t^{π,±} < ∞. For each n ∈ ℕ we may then choose some π_n ∈ P_t with E η_t^{π_n,±} > m_t^± − n⁻¹. The sequences (η_t^{π_n,±}) are Cauchy in L¹, and so they converge in L¹ toward some limits Y_t^±. Note also that E|η_t^{π,±} − Y_t^±| ≤ n⁻¹ whenever π is a refinement of π_n. Thus, η_t^{π,±} → Y_t^± in L¹ along the directed set P_t.

Next fix any s < t, let π ∈ P_t be arbitrary, and define π′ ∈ P_s by adding the point s to π. Then

Y_s^± ≥ η_s^{π′,±} = (X_s − E[X_t | F_s])^± + E[η_t^{π,±} | F_s] ≥ E[η_t^{π,±} | F_s].

Taking limits along P_t on the right, we get Y_s^± ≥ E[Y_t^± | F_s] a.s., which means that the processes Y^± are supermartingales. By Theorem 7.27 the right-hand limits along the rationals Z_t^± = Y_{t+}^± then exist outside a fixed null set, and the processes Z^± are right-continuous supermartingales. For π ∈ P_t we have X_t = η_t^{π,+} − η_t^{π,−} → Y_t^+ − Y_t^−, and so Z_t^+ − Z_t^− = X_{t+} = X_t a.s. □

The next result shows that semimartingales are the most general processes for which a stochastic integral with reasonable continuity properties can be defined. As before, 𝓔 denotes the class of bounded, predictable step processes with jumps at finitely many fixed points.

Theorem 26.21 (stochastic integrators, Bichteler, Dellacherie) A right-continuous, adapted process X is a semimartingale iff for any V₁, V₂, ... ∈ 𝓔 with ‖V_n‖_∞ → 0 we have (V_n · X)_t → 0 in probability for all t ≥ 0.

The proof is based on three lemmas, the first of which separates the crucial functional-analytic part of the argument.
Lemma 26.22 (convexity and tightness) For any tight, convex set K ⊂ L¹(P), there exists a bounded random variable ρ > 0 with sup_{ξ∈K} E ρξ < ∞.

Proof (Yan): Let B denote the class of bounded, nonnegative random variables, and define C = {γ ∈ B; sup_{ξ∈K} E(ξγ) < ∞}. We claim that, for any γ₁, γ₂, ... ∈ C, there exists some γ ∈ C with {γ > 0} = ⋃_n {γ_n > 0}. Indeed, we may assume that γ_n ≤ 1 and sup_{ξ∈K} E(ξγ_n) ≤ 1, in which case we may choose γ = Σ_n 2⁻ⁿ γ_n. It is then easy to construct a ρ ∈ C such that P{ρ > 0} = sup_{γ∈C} P{γ > 0}. Clearly,

{γ > 0} ⊂ {ρ > 0} a.s., γ ∈ C, (19)

since we could otherwise choose a ρ′ ∈ C with P{ρ′ > 0} > P{ρ > 0}. To show that ρ > 0 a.s., we assume instead that P{ρ = 0} > ε > 0. By the tightness of K we may choose r > 0 so large that P{ξ > r} < ε for
all ξ ∈ K. Then P{ξ − β > r} < ε for all ξ ∈ K and β ∈ B. By Fatou's lemma we obtain P{ζ > r} ≤ ε for all ζ in the L¹-closure Z = (K − B)⁻. In particular, the random variable ζ₀ = 2r 1{ρ = 0} lies outside Z. Now Z is convex and closed, and so, by a version of the Hahn-Banach theorem, there exists some γ ∈ (L¹)* = L^∞ satisfying

sup_{ξ∈K} Eγξ − inf_{β∈B} Eγβ ≤ sup_{ζ∈Z} Eγζ < Eγζ₀ = 2r E[γ; ρ = 0]. (20)

Here γ ≥ 0, since we would otherwise get a contradiction by choosing β = b 1{γ < 0} for large enough b > 0. Hence, (20) reduces to sup_{ξ∈K} Eγξ < 2r E[γ; ρ = 0], which implies γ ∈ C and E[γ; ρ = 0] > 0. But this contradicts (19), and therefore ρ > 0 a.s. □

Two further lemmas are needed for the proof of Theorem 26.21.

Lemma 26.23 (tightness and boundedness) Let T be the class of optional times τ < ∞ taking finitely many values, and consider a right-continuous, adapted process X such that the family {X_τ; τ ∈ T} is tight. Then X* < ∞ a.s.

Proof: By Lemma 7.4 any bounded optional time τ can be approximated from the right by optional times τ_n ∈ T, and by right-continuity we have X_{τ_n} → X_τ. Hence, Fatou's lemma yields P{|X_τ| > r} ≤ liminf_n P{|X_{τ_n}| > r}, and so the hypothesis remains true with T replaced by the class T̂ of all bounded optional times. By Lemma 7.6 the times τ_{t,n} = t ∧ inf{s; |X_s| > n} belong to T̂ for all t > 0 and n ∈ ℕ, and as n → ∞ we get

P{X* > n} = sup_{t>0} P{X_t^* > n} ≤ sup_{τ∈T̂} P{|X_τ| > n} → 0. □

Lemma 26.24 (scaling) For any finite random variable ξ, there exists a bounded random variable ρ > 0 such that E|ρξ| < ∞.

Proof: We may take ρ = (|ξ| ∨ 1)⁻¹. □

Proof of Theorem 26.21: The necessity is clear from Theorem 26.4. Now assume the stated condition. By Lemma 4.9 it is equivalent to assume, for each t > 0, that the family K_t = {(V · X)_t; V ∈ 𝓔₁} is tight, where 𝓔₁ = {V ∈ 𝓔; |V| ≤ 1}.
The latter family is clearly convex, and by the linearity of the integral the convexity carries over to K_t. By Lemma 26.23 we have X* < ∞ a.s., and so by Lemma 26.24 there exists a probability measure Q ∼ P such that E_Q X_t^* = ∫ X_t^* dQ < ∞. In particular, K_t ⊂ L¹(Q), and we note that K_t remains tight with respect to Q. Hence, by Lemma 26.22 there exists a probability measure R ∼ Q with bounded density ρ = dR/dQ such that K_t is bounded in L¹(R). Now consider an arbitrary partition 0 = t₀ < t₁ < ⋯ < t_n = t, and note that

Σ_{k≤n} E_R |X_{t_k} − E_R[X_{t_{k+1}} | F_{t_k}]| = E_R(V · X)_t + E_R|X_t|, (21)
where

V_s = Σ_{k<n} sgn(E_R[X_{t_{k+1}} | F_{t_k}] − X_{t_k}) 1_{(t_k, t_{k+1}]}(s), s ≥ 0.

Since ρ is bounded and V ∈ 𝓔₁, the right-hand side of (21) is bounded by a constant. Hence, the stopped process X^t is a quasi-martingale under R. By Theorem 26.20 it is then an R-semimartingale, and since P ∼ R, Corollary 26.11 shows that X^t is even a P-semimartingale. Since t is arbitrary, it follows that X itself is a P-semimartingale. □

Exercises

1. Construct the quadratic variation [M] of a local L²-martingale M directly as in Theorem 17.5, and prove a corresponding version of the integration-by-parts formula. Use [M] to define the L²-integral of Theorem 26.2.

2. Show that the approximation in Proposition 17.17 remains valid for general semimartingales.

3. Consider a local martingale M starting at 0 and an optional time τ. Use Theorem 26.12 to give conditions for the validity of the relations EM_τ = 0 and EM_τ² = E[M]_τ.

4. Give an example of a sequence of L²-bounded martingales Mⁿ such that (Mⁿ)* → 0 in probability and yet ⟨Mⁿ⟩_∞ → ∞ in probability. (Hint: Consider compensated Poisson processes with large jumps.)

5. Give an example of a sequence of martingales Mⁿ such that [Mⁿ]_∞ → 0 in probability and yet (Mⁿ)* → ∞ in probability. (Hint: See the preceding problem.)

6. Show that ⟨Mⁿ⟩_∞ → 0 in probability implies [Mⁿ]_∞ → 0 in probability.

7. Give an example of a martingale M of bounded variation and a bounded, progressive process V such that V² · ⟨M⟩ = 0 and yet V · M ≠ 0. Conclude that the L²-integral in Theorem 26.2 has no continuous extension to progressive integrands.

8. Show that any general martingale inequality involving the processes M, [M], and ⟨M⟩ remains valid in discrete time. (Hint: Embed M and the associated discrete filtration into a martingale and filtration on ℝ₊.)

9. Show that the a.s. convergence in Theorem 4.23 remains valid in Lᵖ. (Hint: Use Theorem 26.12 to reduce to the case when p ≤ 1. Then truncate.)

10. Let 𝓖 be an extension of the filtration 𝓕.
Show that any 𝓕-adapted 𝓖-semimartingale is also an 𝓕-semimartingale. Also show by an example that the converse implication fails in general. (Hint: Use Theorem 26.21.)

11. Show that if X is a Lévy process in ℝ, then [X] is a subordinator. Express the characteristics of [X] in terms of those of X.
12. For any Lévy process X, show that if X is p-stable, then [X] is strictly p/2-stable. Also prove the converse, in the case when X has positive or symmetric jumps. (Hint: Use Proposition 15.9.)

13. Extend Theorem 26.17 to the case when [M]_∞ ≤ a or ⟨M⟩_∞ ≤ a a.s. for some a > 0. (Hint: Apply the original result to a suitably scaled process.)

14. For any Lévy process X with Lévy measure ν, show that X ∈ M²_loc iff X ∈ M², and also iff ∫ x² ν(dx) < ∞, in which case ⟨X⟩_t = t EX₁². (Hint: Use Corollary 25.25.)

15. Show that if M is a purely discontinuous local martingale with positive jumps, then M − M₀ is a.s. determined by [M]. (Hint: For any such processes M and N with [M] = [N], apply Theorem 26.14 to M − N.)

16. Show that a semimartingale X is quasi-leftcontinuous or has accessible jumps iff [X] has the same property. (Hint: Use Theorem 26.6 (iv).)

17. Show that a semimartingale X with |ΔX| ≤ c < ∞ a.s. is a special semimartingale with canonical decomposition M + A satisfying |ΔA| ≤ c a.s. In particular, X is a continuous semimartingale iff it has a decomposition M + A, where M and A are continuous. (Hint: Use Lemma 26.5, and note that |ΔX| ≤ c a.s. implies |ΔA| ≤ c a.s.)

18. Show that a semimartingale X is quasi-leftcontinuous or has accessible jumps iff it has a decomposition M + A, where M and A have the same property. Also show that, for special semimartingales, we may choose M + A to be the canonical decomposition of X. (Hint: Use Proposition 25.17 and Corollary 26.16, and refer to the preceding exercise.)

19. Show that a semimartingale X is predictable iff it is a special semimartingale with canonical decomposition M + A such that M is continuous. (Hint: Use Proposition 25.16.)
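The construction suggested in the hints to Exercises 4 and 5 can be made concrete with a quick computation. The sketch below (the parameter choices a_n = n³, λ_n = n⁻⁴ are ours, purely for illustration) takes the compensated Poisson martingales Mⁿ_t = a_n(N_t − λ_n t) on [0, 1], where N is a Poisson process of rate λ_n: each Mⁿ is L²-bounded, ⟨Mⁿ⟩₁ = a_n²λ_n = n² → ∞, while with probability e^{−λ_n} → 1 there is no jump and then (Mⁿ)* = a_n λ_n = n⁻¹ → 0.

```python
import math

# M^n_t = a_n (N_t - lam_n t) on [0,1], N a Poisson process of rate lam_n.
# On the event {N_1 = 0} the path is the pure drift -a_n lam_n t, so
#   (M^n)* = a_n lam_n   there, and   P{N_1 >= 1} = 1 - exp(-lam_n).
def poisson_martingale_stats(n):
    a, lam = float(n) ** 3, float(n) ** -4
    drift_sup = a * lam              # sup |M^n| on the no-jump event, = 1/n
    bracket = a * a * lam            # <M^n>_1 = a_n^2 lam_n = n^2
    p_jump = 1.0 - math.exp(-lam)    # probability of seeing any jump at all
    return drift_sup, bracket, p_jump

for n in (2, 10, 100):
    print(n, poisson_martingale_stats(n))
```

So (Mⁿ)* tends to 0 in probability while the predictable brackets blow up, which is exactly the phenomenon Exercise 4 asks for.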
Chapter 27

Large Deviations

Legendre-Fenchel transform; Cramér's and Schilder's theorems; large-deviation principle and rate function; functional form of the LDP; continuous mapping and extension; perturbation of dynamical systems; empirical processes and entropy; Strassen's law of the iterated logarithm

In its simplest setting, large deviation theory provides the exact rate of convergence in the weak law of large numbers. To be precise, consider any i.i.d. random variables ξ₁, ξ₂, ... with mean m and cumulant-generating function Λ(u) = log E e^{uξ₁} < ∞, and write ξ̄_n = n⁻¹ Σ_{k≤n} ξ_k. Then for any x > m, the tail probabilities P{ξ̄_n ≥ x} tend to 0 at an exponential rate I(x), given by the Legendre-Fenchel transform Λ* of Λ. In higher dimensions, it is often convenient to state the result more generally in the form n⁻¹ log P{ξ̄_n ∈ B} → −I(B), where I(B) = inf_{x∈B} I(x) and B is restricted to a suitable class of continuity sets. In this standard format of a large-deviation principle with rate function I, the result extends to an amazing variety of contexts throughout probability theory.

A striking example, of fundamental importance in statistical mechanics, is Sanov's theorem, which provides a similar large-deviation result for the empirical distributions of a sequence of i.i.d. random variables with a common distribution μ. Here the rate function I is defined on the space of probability measures ν on ℝ and agrees with the relative entropy H(ν|μ). Another important example is Schilder's theorem for the family of rescaled Brownian motions in ℝ^d, where the rate function becomes I(x) = ½‖ẋ‖², the squared norm in the Cameron-Martin space considered in Chapter 18. The latter result can be used to derive the Freidlin-Wentzell estimates for randomly perturbed dynamical systems. It also provides a short proof of Strassen's law of the iterated logarithm, a stunning extension of the classical Khinchin law from Chapter 13.
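The exponential decay described in the first paragraph can be seen numerically in the simplest case. The following sketch (the code and the parameter choices p = 0.3, x = 0.5 are ours, not part of the text) computes P{ξ̄_n ≥ x} exactly for i.i.d. Bernoulli(p) variables from binomial tail sums, and compares −n⁻¹ log P{ξ̄_n ≥ x} with the entropy rate Λ*(x) given later in Lemma 27.2 (ii).

```python
import math

def binom_tail(n, p, k0):
    # exact P{S_n >= k0} for S_n binomial(n, p)
    return sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k0, n + 1))

def bernoulli_rate(x, p):
    # Legendre-Fenchel transform of log E e^{u xi}, xi Bernoulli(p)
    return x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p))

p, x = 0.3, 0.5
for n in (50, 200, 800):
    decay = -math.log(binom_tail(n, p, math.ceil(n * x))) / n
    print(n, decay, bernoulli_rate(x, p))
```

The empirical exponents approach Λ*(0.5) ≈ 0.0872 from above, the remaining gap being of order (log n)/n.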
Modern proofs of those and other large deviation results rely on some general extension principles, which also serve to explain the wide applicability of the present ideas. In addition to some rather straightforward and elementary techniques of continuity and approximation, we consider the more sophisticated and extremely powerful methods of inverse continuous mapping and projective limits, both of which play a crucial role in subsequent applications. We may also call attention to the significance of
exponential tightness, and to the essential equivalence between the setwise and functional formulations of the large-deviation principle.

Large deviation theory is arguably one of the most technical branches of modern probability theory. For the nonexpert it then seems essential to avoid getting distracted by topological subtleties or elaborate computations. Many results are therefore stated here under simplifying assumptions. Likewise, we postpone our discussion of general principles until the reader has become acquainted with the basic ideas in a concrete setting. For this reason, important applications appear both at the beginning and at the end of the chapter, separated by a more abstract discussion of some general notions and principles.

Let us now return to the elementary context of i.i.d. random variables ξ, ξ₁, ξ₂, ..., and write S_n = Σ_{k≤n} ξ_k and ξ̄_n = S_n/n. If m = Eξ exists and is finite, then P{ξ̄_n ≥ x} → 0 for all x > m by the weak law of large numbers. Under stronger moment conditions, the rate of convergence turns out to be exponential and can be estimated with great accuracy. This rather elementary but quite technical result lies, along with its multidimensional counterpart, at the core of large-deviation theory and provides both a pattern and a point of departure for more advanced developments.

For motivation, we begin with some simple observations.

Lemma 27.1 (convergence) Let ξ, ξ₁, ξ₂, ... be i.i.d. random variables. Then

(i) n⁻¹ log P{ξ̄_n ≥ x} → sup_n n⁻¹ log P{ξ̄_n ≥ x} = −h(x) for all x;

(ii) h is [0, ∞]-valued, nondecreasing, and convex;

(iii) h(x) < ∞ iff P{ξ ≥ x} > 0.

Proof: (i) Writing p_n = P{ξ̄_n ≥ x}, we get for any m, n ∈ ℕ

p_{m+n} = P{S_{m+n} ≥ (m + n)x} ≥ P{S_m ≥ mx, S_{m+n} − S_m ≥ nx} = p_m p_n.

Taking logarithms, we conclude that the sequence −log p_n is subadditive, and the assertion follows by Lemma 10.21.

(ii) The first two assertions are obvious.
To prove the convexity, let x, y ∈ ℝ be arbitrary, and proceed as before to get

P{S_{2n} ≥ n(x + y)} ≥ P{S_n ≥ nx} P{S_n ≥ ny}.

Taking logarithms, dividing by 2n, and letting n → ∞, we obtain

h(½(x + y)) ≤ ½(h(x) + h(y)), x, y ∈ ℝ.

(iii) If P{ξ ≥ x} = 0, then P{ξ̄_n ≥ x} = 0 for all n, and so h(x) = ∞. Conversely, (i) yields log P{ξ ≥ x} ≤ −h(x), and so h(x) = ∞ implies P{ξ ≥ x} = 0. □

To determine the limit in Lemma 27.1, we need some further notation, which is given here for convenience directly in d dimensions. For any random
vector ξ in ℝ^d, we introduce the function

Λ(u) = Λ_ξ(u) = log E e^{uξ}, u ∈ ℝ^d, (1)

known in statistics as the cumulant-generating function of ξ. Note that Λ is convex, since by Hölder's inequality we have, for any u, v ∈ ℝ^d and p, q ≥ 0 with p + q = 1,

Λ(pu + qv) = log E exp((pu + qv)ξ)
  ≤ log((E e^{uξ})^p (E e^{vξ})^q) = pΛ(u) + qΛ(v).

The surface z = Λ(u) in ℝ^{d+1} is determined by the family of supporting hyperplanes (d-dimensional affine subspaces) with different slopes, and we note that the plane with slope x ∈ ℝ^d (or normal vector (1, −x)) has equation

z + Λ*(x) = xu, u ∈ ℝ^d,

where Λ* denotes the Legendre-Fenchel transform of Λ, given by

Λ*(x) = sup_{u∈ℝ^d} (ux − Λ(u)), x ∈ ℝ^d. (2)

We can often compute Λ* explicitly. Here we list two simple cases that will be needed below. The results are proved by elementary calculus.

Lemma 27.2 (Gaussian and Bernoulli distributions)

(i) If ξ = (ξ₁, ..., ξ_d) is standard Gaussian in ℝ^d, then Λ*(x) = |x|²/2.

(ii) If ξ ∈ {0, 1} with P{ξ = 1} = p ∈ (0, 1), then Λ*(x) = ∞ for x ∉ [0, 1] and

Λ*(x) = x log(x/p) + (1 − x) log((1 − x)/(1 − p)), x ∈ [0, 1].

The function Λ* is again convex, since for any x, y ∈ ℝ^d and for p and q as before,

Λ*(px + qy) = sup_u [p(ux − Λ(u)) + q(uy − Λ(u))]
  ≤ p sup_u (ux − Λ(u)) + q sup_u (uy − Λ(u)) = pΛ*(x) + qΛ*(y).

If Λ < ∞ near the origin, then m = Eξ exists and agrees with the gradient ∇Λ(0). Thus, the surface z = Λ(u) has tangent hyperplane z = mu at 0, and we conclude that Λ*(m) = 0 and Λ*(x) > 0 for x ≠ m. If ξ is also truly d-dimensional, then Λ is strictly convex at 0, and Λ* is finite and continuous near m. For d = 1, we sometimes need the corresponding one-sided statements, which are easily derived by dominated convergence.

The following key result identifies the function h in Lemma 27.1. For simplicity, we assume that m = Eξ exists in [−∞, ∞).
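The supremum in (2) is also easy to evaluate numerically, which gives a quick check of Lemma 27.2. The sketch below (the grid ranges and step counts are ad hoc choices of ours) maximizes ux − Λ(u) over a grid for the standard Gaussian and Bernoulli cumulant-generating functions and recovers the stated closed forms.

```python
import math

def legendre(cgf, x, u_lo=-30.0, u_hi=30.0, steps=60001):
    # crude grid search for Lambda*(x) = sup_u (u x - Lambda(u))
    best = float("-inf")
    for i in range(steps):
        u = u_lo + (u_hi - u_lo) * i / (steps - 1)
        best = max(best, u * x - cgf(u))
    return best

def gauss_cgf(u):          # log E e^{u xi}, xi standard Gaussian
    return u * u / 2.0

def bern_cgf(u, p=0.3):    # log E e^{u xi}, xi Bernoulli(p)
    return math.log(1.0 - p + p * math.exp(u))

print(legendre(gauss_cgf, 1.5), 1.5 ** 2 / 2)   # both about 1.125
x, p = 0.6, 0.3
print(legendre(bern_cgf, x),
      x * math.log(x / p) + (1 - x) * math.log((1 - x) / (1 - p)))
```

The grid search agrees with the closed forms to within the grid resolution, since the maximized functions are smooth and concave in u.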
Theorem 27.3 (rate function, Cramér, Chernoff) Let ξ, ξ₁, ξ₂, ... be i.i.d. random variables with m = Eξ < ∞. Then for any x ≥ m, we have

n⁻¹ log P{ξ̄_n ≥ x} → −Λ*(x). (3)

Proof: Using Chebyshev's inequality and (1), we get for any u ≥ 0

P{ξ̄_n ≥ x} = P{e^{uS_n} ≥ e^{nux}} ≤ e^{−nux} E e^{uS_n} = e^{nΛ(u) − nux},

and so n⁻¹ log P{ξ̄_n ≥ x} ≤ Λ(u) − ux. This remains true for u < 0, since in that case Λ(u) − ux ≥ 0 for x ≥ m. Hence, by (2) we have the upper bound

n⁻¹ log P{ξ̄_n ≥ x} ≤ −Λ*(x), x ≥ m, n ∈ ℕ. (4)

To derive a matching lower bound, we first assume that Λ < ∞ on ℝ₊. Then Λ is smooth on (0, ∞) with Λ′(0+) = m and Λ′(∞) = ess sup ξ =: b, and so for any a ∈ (m, b) we can choose a u > 0 such that Λ′(u) = a. Let η, η₁, η₂, ... be i.i.d. with distribution

P{η ∈ B} = e^{−Λ(u)} E[e^{uξ}; ξ ∈ B], B ∈ 𝓑. (5)

Then Λ_η(r) = Λ(r + u) − Λ(u), and therefore Eη = Λ_η′(0) = Λ′(u) = a. For any ε > 0, we get by (5)

P{|ξ̄_n − a| < ε} = e^{nΛ(u)} E[e^{−nuη̄_n}; |η̄_n − a| < ε]
  ≥ e^{nΛ(u) − nu(a+ε)} P{|η̄_n − a| < ε}. (6)

Here the last probability tends to 1 by the law of large numbers, and so by (2)

liminf_{n→∞} n⁻¹ log P{|ξ̄_n − a| < ε} ≥ Λ(u) − u(a + ε) ≥ −Λ*(a + ε).

Fixing any x ∈ (m, b) and putting a = x + ε, we get for small enough ε > 0

liminf_{n→∞} n⁻¹ log P{ξ̄_n ≥ x} ≥ −Λ*(x + 2ε).

Since Λ* is continuous on (m, b) by convexity, we may let ε → 0 and combine with (4) to obtain (3).

The result for x > b is trivial, since in that case both sides of (3) equal −∞. If instead x = b < ∞, then both sides equal log P{ξ = b}, the left side by a simple computation and the right side by an elementary estimate. Finally, assume that x = m > −∞. Since the statement is trivial when ξ = m a.s., we may assume that b > m. For any y ∈ (m, b), we have

0 ≥ n⁻¹ log P{ξ̄_n ≥ m} ≥ n⁻¹ log P{ξ̄_n ≥ y} → −Λ*(y) ≥ −∞.

Here Λ*(y) → Λ*(m) = 0 by continuity, and (3) follows for x = m. This completes the proof when Λ < ∞ on ℝ₊.
The case when Λ(u) = ∞ for some u > 0 may be handled by truncation. Thus, for any r > m we consider the random variables ξ_k^r = ξ_k ∧ r. Writing Λ_r and Λ_r* for the associated functions Λ and Λ*, we get for x > m ≥ Eξ^r

n⁻¹ log P{ξ̄_n ≥ x} ≥ n⁻¹ log P{ξ̄_n^r ≥ x} → −Λ_r*(x). (7)

Now Λ_r(u) ↑ Λ(u) by monotone convergence as r → ∞, and by Dini's theorem the convergence is uniform on every compact interval where Λ < ∞. Since also Λ′ is unbounded on the set where Λ < ∞, it follows easily that Λ_r*(x) → Λ*(x) for all x > m. The required lower bound is now immediate from (7). □

We may now supplement Lemma 27.1 with a criterion for exponential decline of the tail probabilities P{ξ̄_n ≥ x}.

Corollary 27.4 (exponential rate) Let ξ, ξ₁, ξ₂, ... be i.i.d. with m = Eξ < ∞ and b = ess sup ξ. Then for any x ∈ (m, b), the probabilities P{ξ̄_n ≥ x} decrease exponentially iff Λ(ε) < ∞ for some ε > 0. The exponential decline extends to x = b iff 0 < P{ξ = b} < 1.

Proof: If Λ(ε) < ∞ for some ε > 0, then Λ′(0+) = m by dominated convergence, and so Λ*(x) > 0 for all x > m. If instead Λ = ∞ on (0, ∞), then Λ*(x) = 0 for all x ≥ m. The statement for x = b is trivial. □

The large-deviation estimates in Theorem 27.3 are easily extended from intervals [x, ∞) to arbitrary open or closed sets, which leads to the large-deviation principle for i.i.d. sequences in ℝ. To fulfill the needs of subsequent applications and extensions, we shall derive a version of the same result in ℝ^d. Motivated by the last result, and also to avoid some technical complications, we assume that Λ(u) < ∞ for all u. Write B° and B⁻ for the interior and closure of a set B.

Theorem 27.5 (large deviations in ℝ^d, Varadhan) Let ξ, ξ₁, ξ₂, ... be i.i.d. random vectors in ℝ^d with Λ = Λ_ξ < ∞. Then for any B ∈ 𝓑^d, we have

−inf_{x∈B°} Λ*(x) ≤ liminf_{n→∞} n⁻¹ log P{ξ̄_n ∈ B}
  ≤ limsup_{n→∞} n⁻¹ log P{ξ̄_n ∈ B} ≤ −inf_{x∈B⁻} Λ*(x).

Proof: To derive the upper bound, we fix any ε > 0.
By (2) there exists for every x ∈ ℝ^d some u_x ∈ ℝ^d such that

u_x x − Λ(u_x) > (Λ*(x) − ε) ∧ ε⁻¹,

and by continuity we may choose an open ball B_x around x such that

u_x y > Λ(u_x) + (Λ*(x) − ε) ∧ ε⁻¹, y ∈ B_x.

By Chebyshev's inequality and (1) we get for any n ∈ ℕ

P{ξ̄_n ∈ B_x} ≤ E exp(u_x S_n − n inf{u_x y; y ∈ B_x})
  ≤ exp(−n((Λ*(x) − ε) ∧ ε⁻¹)). (8)
Also note that Λ < ∞ implies Λ*(x) → ∞ as |x| → ∞, at least when d = 1. By Lemma 27.1 and Theorem 27.3 we may then choose r > 0 so large that

n⁻¹ log P{|ξ̄_n| > r} ≤ −1/ε, n ∈ ℕ. (9)

Now let B ⊂ ℝ^d be closed. Then the set {x ∈ B; |x| ≤ r} is compact and may be covered by finitely many balls B_{x₁}, ..., B_{x_m} with centers x_i ∈ B. By (8) and (9) we get for any n ∈ ℕ

P{ξ̄_n ∈ B} ≤ Σ_{i≤m} P{ξ̄_n ∈ B_{x_i}} + P{|ξ̄_n| > r}
  ≤ Σ_{i≤m} exp(−n((Λ*(x_i) − ε) ∧ ε⁻¹)) + e^{−n/ε}
  ≤ (m + 1) exp(−n((Λ*(B) − ε) ∧ ε⁻¹)),

where Λ*(B) = inf_{x∈B} Λ*(x). Hence,

limsup_{n→∞} n⁻¹ log P{ξ̄_n ∈ B} ≤ −(Λ*(B) − ε) ∧ ε⁻¹,

and the upper bound follows since ε was arbitrary.

Turning to the lower bound, we first assume that Λ(u)/|u| → ∞ as |u| → ∞. Fix any open set B ⊂ ℝ^d and a point x ∈ B. By compactness and the smoothness of Λ, there exists a u ∈ ℝ^d such that ∇Λ(u) = x. Let η, η₁, η₂, ... be i.i.d. random vectors with distribution (5), and note as before that Eη = x. For ε > 0 small enough, we get as in (6)

P{ξ̄_n ∈ B} ≥ P{|ξ̄_n − x| < ε}
  ≥ exp(nΛ(u) − nux − nε|u|) P{|η̄_n − x| < ε}.

Hence, by the law of large numbers and (2),

liminf_{n→∞} n⁻¹ log P{ξ̄_n ∈ B} ≥ Λ(u) − ux − ε|u| ≥ −Λ*(x) − ε|u|.

It remains to let ε → 0 and take the supremum over x ∈ B.

To eliminate the growth condition on Λ, let ζ, ζ₁, ζ₂, ... be i.i.d. standard Gaussian random vectors independent of ξ and the ξ_n. Then for any σ > 0 and u ∈ ℝ^d, we have by Lemma 27.2 (i)

Λ_{ξ+σζ}(u) = Λ_ξ(u) + Λ_ζ(σu) = Λ(u) + σ²|u|²/2 ≥ Λ(u),

and in particular Λ*_{ξ+σζ} ≤ Λ*. Since also Λ_{ξ+σζ}(u)/|u| ≥ σ²|u|/2 + Λ(u)/|u| → ∞, we note that the previous bound applies to ξ̄_n + σζ̄_n. Now fix any x ∈ B as before, and choose ε > 0 small enough that B contains a 2ε-ball around x. Then

P{|ξ̄_n + σζ̄_n − x| < ε} ≤ P{ξ̄_n ∈ B} + P{σ|ζ̄_n| ≥ ε}
  ≤ 2(P{ξ̄_n ∈ B} ∨ P{σ|ζ̄_n| ≥ ε}).
Applying the lower bound to the variables ξ̄_n + σζ̄_n and the upper bound to ζ̄_n, we get by Lemma 27.2 (i)

−Λ*(x) ≤ −Λ*_{ξ+σζ}(x) ≤ liminf_{n→∞} n⁻¹ log P{|ξ̄_n + σζ̄_n − x| < ε}
  ≤ liminf_{n→∞} n⁻¹ log (P{ξ̄_n ∈ B} ∨ P{σ|ζ̄_n| ≥ ε})
  ≤ liminf_{n→∞} n⁻¹ log P{ξ̄_n ∈ B} ∨ (−ε²/2σ²).

The desired lower bound now follows, as we let σ → 0 and then take the supremum over all x ∈ B. □

We can also derive large-deviation results in function spaces. Here the following theorem is basic and sets the pattern for more complex results. For convenience, we write C = C([0, 1], ℝ^d) and C₀ = {x ∈ C; x₀ = 0}. We also introduce the Cameron-Martin space H₁, consisting of all absolutely continuous functions x ∈ C₀ admitting a Radon-Nikodym derivative ẋ ∈ L², so that ‖ẋ‖₂² = ∫₀¹ |ẋ_t|² dt < ∞.

Theorem 27.6 (large deviations of Brownian motion, Schilder) Let X be a d-dimensional Brownian motion on [0, 1]. Then for any Borel set B ⊂ C([0, 1], ℝ^d), we have

−inf_{x∈B°} I(x) ≤ liminf_{ε→0} ε² log P{εX ∈ B}
  ≤ limsup_{ε→0} ε² log P{εX ∈ B} ≤ −inf_{x∈B⁻} I(x),

where I(x) = ½‖ẋ‖² for x ∈ H₁ and I(x) = ∞ otherwise.

The proof requires a simple topological fact.

Lemma 27.7 (level sets) For any r ≥ 0, the level set L_r = I⁻¹[0, r] = {x ∈ H₁; ‖ẋ‖₂² ≤ 2r} is compact in C([0, 1], ℝ^d).

Proof: The Cauchy-Buniakovsky inequality yields

|x_t − x_s| ≤ ∫_s^t |ẋ_u| du ≤ (t − s)^{1/2} ‖ẋ‖₂, 0 ≤ s ≤ t ≤ 1, x ∈ H₁.

By the Arzelà-Ascoli Theorem A2.1 it follows that L_r is relatively compact in C. It is also weakly compact in the Hilbert space H₁ with norm ‖x‖ = ‖ẋ‖₂. Thus, every sequence x₁, x₂, ... ∈ L_r has a subsequence that converges in both C and H₁, say with limits x ∈ C and y ∈ L_r, respectively. For every t ∈ [0, 1], the sequence x_n(t) then converges in ℝ^d to both x(t) and y(t), and we get x = y ∈ L_r. □

Proof of Theorem 27.6: To establish the lower bound, we fix any open set B ⊂ C. Since I = ∞ outside H₁, it suffices to prove that

−I(x) ≤ liminf_{ε→0} ε² log P{εX ∈ B}, x ∈ B ∩ H₁.
(10)

Now we note as in Lemma 1.35 that C₀^∞ is dense in H₁, and also that ‖x‖_∞ ≤ ‖ẋ‖₁ ≤ ‖ẋ‖₂ for any x ∈ H₁. Hence, for every x ∈ B ∩ H₁ there
exist some functions x_n ∈ B ∩ C₀^∞ with I(x_n) → I(x), and it suffices to prove (10) for x ∈ B ∩ C₀^∞. Now for small enough h > 0, Theorem 18.22 yields

P{εX ∈ B} ≥ P{‖εX − x‖_∞ ≤ h} = E[𝓔(−(ẋ/ε) · X)₁; ‖εX‖_∞ ≤ h]. (11)

Integrating by parts gives

log 𝓔(−(ẋ/ε) · X)₁ = −ε⁻¹ ∫₀¹ ẋ_t dX_t − ε⁻² I(x)
  = −ε⁻¹ ẋ₁ X₁ + ε⁻¹ ∫₀¹ ẍ_t X_t dt − ε⁻² I(x),

and so by (11)

ε² log P{εX ∈ B} ≥ −I(x) − h|ẋ₁| − h‖ẍ‖₁ + ε² log P{‖εX‖_∞ ≤ h}.

Relation (10) now follows as we let ε → 0 and then h → 0.

Turning to the upper bound, we fix any closed set B ⊂ C and let B^h denote the closed h-neighborhood of B. Letting X_n be the n-segment polygonal approximation of X with X_n(k/n) = X(k/n) for k ≤ n, we note that

P{εX ∈ B} ≤ P{εX_n ∈ B^h} + P{ε‖X − X_n‖ ≥ h}. (12)

Writing I(B^h) = inf{I(x); x ∈ B^h}, we obtain

P{εX_n ∈ B^h} ≤ P{I(εX_n) ≥ I(B^h)}.

Here 2I(X_n) is a sum of nd variables ξ_{ik}², where the ξ_{ik} are i.i.d. N(0, 1), and so by Lemma 27.2 (i) and an interpolated version of Theorem 27.5,

limsup_{ε→0} ε² log P{εX_n ∈ B^h} ≤ −I(B^h). (13)

Next we get by Proposition 13.13 and some elementary estimates

P{ε‖X − X_n‖ ≥ h} ≤ n P{ε‖X‖ ≥ h√n/2} ≤ 2nd P{ε²ξ² ≥ h²n/4d},

where ξ is N(0, 1). Applying Theorem 27.5 and Lemma 27.2 (i) again, we obtain

limsup_{ε→0} ε² log P{ε‖X − X_n‖ ≥ h} ≤ −h²n/8d. (14)

Combining (12), (13), and (14) gives

limsup_{ε→0} ε² log P{εX ∈ B} ≤ −I(B^h) ∧ (h²n/8d),

and as n → ∞ we obtain the upper bound −I(B^h). It remains to show that I(B^h) ↑ I(B) as h → 0. Then fix any r > sup_h I(B^h). For every h > 0 we may choose some x_h ∈ B^h such that I(x_h) ≤ r, and by Lemma 27.7 we may extract a convergent sequence
x_{h_n} → x with h_n → 0 such that even I(x) ≤ r. Since also x ∈ ⋂_h B^h = B, we obtain I(B) ≤ r, as required. □

The last two theorems suggest the following abstraction. Letting ξ_ε, ε > 0, be random elements in some metric space S with Borel σ-field 𝓢, we say that the family (ξ_ε) satisfies the large-deviation principle (LDP) with rate function I: S → [0, ∞] if, for any B ∈ 𝓢, we have

−inf_{x∈B°} I(x) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ B}
  ≤ limsup_{ε→0} ε log P{ξ_ε ∈ B} ≤ −inf_{x∈B⁻} I(x). (15)

For sequences ξ₁, ξ₂, ... we require the same condition with the normalizing factor ε replaced by n⁻¹. It is often convenient to write I(B) = inf_{x∈B} I(x). Letting 𝓢_I denote the class {B ∈ 𝓢; I(B°) = I(B⁻)} of all I-continuity sets, we note that (15) implies the convergence

lim_{ε→0} ε log P{ξ_ε ∈ B} = −I(B), B ∈ 𝓢_I. (16)

If ξ, ξ₁, ξ₂, ... are i.i.d. random vectors in ℝ^d with Λ(u) = log E e^{uξ} < ∞ for all u, then by Theorem 27.5 the averages ξ̄_n satisfy the LDP in ℝ^d with rate function Λ*. If instead X is a d-dimensional Brownian motion on [0, 1], then Theorem 27.6 shows that the processes ε^{1/2}X satisfy the LDP in C([0, 1], ℝ^d) with rate function I(x) = ½‖ẋ‖² for x ∈ H₁ and I(x) = ∞ otherwise.

We show that the rate function I is essentially unique.

Lemma 27.8 (regularization and uniqueness) If (ξ_ε) satisfies the LDP in a metric space S, then the associated rate function I can be chosen to be lower semicontinuous, in which case it is unique.

Proof: Assume that (15) holds for some I. Then the function

J(x) = liminf_{y→x} I(y), x ∈ S,

is clearly lower semicontinuous with J ≤ I. It is also easy to verify that J(G) = I(G) for all open sets G ⊂ S. Thus, (15) remains true with I replaced by J.

To prove the uniqueness, assume that (15) holds for two lower semicontinuous functions I and J, and let I(x) < J(x) for some x ∈ S. By the semicontinuity of J, we may choose a neighborhood G of x such that J(G⁻) > I(x).
Applying (15) to both I and J yields the contradiction

−I(x) ≤ −I(G) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ≤ −J(G⁻) < −I(x). □

Justified by the last result, we may henceforth take the lower semicontinuity to be part of our definition of a rate function. (An arbitrary function I satisfying (15) will then be called a raw rate function.) No regularization is needed in Theorems 27.5 and 27.6, since the associated rate functions Λ*
and I are already lower semicontinuous, the former as the supremum of a family of continuous functions and the latter by Lemma 27.7.

It is sometimes useful to impose a slightly stronger regularity condition on the function I. Thus, we say that I is good if the level sets I⁻¹[0, r] = {x ∈ S; I(x) ≤ r} are compact (rather than just closed). Note that the infimum I(B) = inf_{x∈B} I(x) is then attained for every closed set B ≠ ∅. The rate functions in Theorems 27.5 and 27.6 are clearly both good. A related condition on the family (ξ_ε) is the exponential tightness

inf_K limsup_{ε→0} ε log P{ξ_ε ∉ K} = −∞, (17)

where the infimum extends over all compact sets K ⊂ S. We actually need only the slightly weaker condition of sequential exponential tightness, where (17) is required only along sequences ε_n → 0. To simplify our exposition, we often omit the sequential qualification from our statements and carry out the proofs under the stronger nonsequential hypothesis. We finally say that (ξ_ε) satisfies the weak LDP with rate function I if the lower bound in (15) holds as stated, while the upper bound is required only for compact sets B.

We list some relations between the mentioned properties.

Lemma 27.9 (goodness, exponential tightness, and the weak LDP) Let ξ_ε, ε > 0, be random elements in a metric space S.

(i) The LDP for (ξ_ε) with rate function I implies (16), and the two conditions are equivalent when I is good.

(ii) If the ξ_ε are exponentially tight and satisfy the weak LDP with rate function I, then I is good and (ξ_ε) satisfies the full LDP.

(iii) (Pukhalsky) If S is Polish and (ξ_ε) satisfies the LDP with rate function I, then I is good iff (ξ_ε) is sequentially exponentially tight.

Proof: (i) Let I be good and satisfy (16). Write B^h for the closed h-neighborhood of B ∈ 𝓢. Since I(B^h) is nonincreasing in h, we have B^h ∉ 𝓢_I for at most countably many h > 0.
Hence, (16) yields for almost every h > 0

limsup_{ε→0} ε log P{ξ_ε ∈ B} ≤ lim_{ε→0} ε log P{ξ_ε ∈ B^h} = −I(B^h).

To see that I(B^h) ↑ I(B⁻) as h → 0, assume instead that sup_h I(B^h) < I(B⁻). Since I is good, we may choose for every h > 0 some x_h ∈ B^h with I(x_h) = I(B^h), and then extract a convergent sequence x_{h_n} → x ∈ B⁻ with h_n → 0. By the lower semicontinuity of I we get the contradiction

I(B⁻) ≤ I(x) ≤ liminf_{n→∞} I(x_{h_n}) ≤ sup_{h>0} I(B^h) < I(B⁻),

which proves the upper bound.

Next let x ∈ B° be arbitrary, and conclude from (16) that, for almost all sufficiently small h > 0,

−I(x) ≤ −I({x}^h) = lim_{ε→0} ε log P{ξ_ε ∈ {x}^h} ≤ liminf_{ε→0} ε log P{ξ_ε ∈ B}.
27. Large l)eviations 547 The lower bound now follows as we take the supremum over x E BO. (ii) By (17) we may choose some compact sets Kr satisfying limsup €logP{c f/- Kr} < -r, r > O. (18) eO For any closed set B c S, we have P{c E B} < 2(P{c E B n Kr} V P{€c rf-: Kr}), r > 0, and so, by the weak LDP and (18), limsup € log P{£ E B} < -I(B n Kr) /\ r < -I(B) /\ r. £o The upper bound now follows as we let r  00. Applying the lower bound and (18) to the sets K gives -I(K) < limsup E log P{€e tt Kr} < -r, r > 0, e--+-O and so I-I [0, r] C Kr for all r > 0, which shows that I is good. (iii) The sufficiency follows from (ii), applied to an arbitrary sequence En  O. Now let S be separable and complete, and assume that the rate function I is good. For any kEN we may cover S by some open balls B kI ,B k2 ,... of radius Ilk. Putting U km = UjmBkJ' we have SUPm I(U km ) = 00 since any level set I-I [0, r] is covered by finitely many sets Bkj. Now fix any sequence En ---t 0 and constant r :> O. By the LDP upper bound and the fact that P{en E U km } ---t 0 as m ---t 00 for fixed n and k, we may choose mk E N so large that P{£n E Uk,mk} < exp(-rkIEn), n,k E f. Summing a geometric series, we obtain limsuPn En log p{ En E Uk Uf,m k } < -r. The asserted exponential tightness now follows, since the set nk Uk,mk is totally bounded and hence relatively compact. 0 The analogy with weak convergence theory suggests that we look for a version of (16) for continuous functions. Theorem 27.10 (functional LDP, Varadhan, Bryc) Let E' E > 0, be random elements in a metric space S. (i) If (E) satisfies the LDP with a rate function I and if f: S ---+  is continuous and bounded above, then AI = lim E log Eexp (f(€E)/E) = sup (f(x) - I(x)). E--+-O xES (ii) If the c are exponentially tight and the limit Af in (i) exists for every f E C b , then (c) satisfies the LDP with the good rate function I(x) = sup (f(x) - Af), xES. fECb 
Proof: (i) For every n ∈ ℕ we can choose finitely many closed sets B₁, …, B_m ⊂ S such that f ≤ −n on ∩_j B_j^c and the oscillation of f on each B_j is at most n⁻¹. Then

limsup_{ε→0} ε log E e^{f(ξ_ε)/ε} ≤ max_{j≤m} limsup_{ε→0} ε log E[e^{f(ξ_ε)/ε}; ξ_ε ∈ B_j] ∨ (−n)
≤ max_{j≤m} (sup_{x∈B_j} f(x) − inf_{x∈B_j} I(x)) ∨ (−n)
≤ max_{j≤m} sup_{x∈B_j} (f(x) − I(x) + n⁻¹) ∨ (−n)
≤ sup_{x∈S} (f(x) − I(x) + n⁻¹) ∨ (−n).

The upper bound now follows as we let n → ∞. Next we fix any x ∈ S with a neighborhood G and write

liminf_{ε→0} ε log E e^{f(ξ_ε)/ε} ≥ liminf_{ε→0} ε log E[e^{f(ξ_ε)/ε}; ξ_ε ∈ G]
≥ inf_{y∈G} f(y) − inf_{y∈G} I(y)
≥ inf_{y∈G} f(y) − I(x).

Here the lower bound follows as we let G ↓ {x} and then take the supremum over x ∈ S.

(ii) First we note that I is lower semicontinuous, as the supremum over a family of continuous functions. Since Λ_f = 0 for f ≡ 0, it is also clear that I ≥ 0. By Lemma 27.9 (ii) it remains to show that (ξ_ε) satisfies the weak LDP with rate function I. Then fix any δ > 0. For every x ∈ S, we may choose a function f_x ∈ C_b satisfying f_x(x) − Λ_{f_x} ≥ (I(x) − δ) ∧ δ⁻¹, and by continuity there exists a neighborhood B_x of x such that

f_x(y) ≥ Λ_{f_x} + (I(x) − δ) ∧ δ⁻¹, y ∈ B_x.

By Chebyshev's inequality we get for any ε > 0

P{ξ_ε ∈ B_x} ≤ E exp(ε⁻¹(f_x(ξ_ε) − inf{f_x(y); y ∈ B_x}))
≤ E exp(ε⁻¹(f_x(ξ_ε) − Λ_{f_x} − (I(x) − δ) ∧ δ⁻¹)),

and so by the definition of Λ_{f_x},

limsup_{ε→0} ε log P{ξ_ε ∈ B_x} ≤ lim_{ε→0} ε log E exp(f_x(ξ_ε)/ε) − Λ_{f_x} − (I(x) − δ) ∧ δ⁻¹
= −(I(x) − δ) ∧ δ⁻¹.
Now fix any compact set K ⊂ S, and choose x₁, …, x_m ∈ K such that K ⊂ ∪_i B_{x_i}. Then

limsup_{ε→0} ε log P{ξ_ε ∈ K} ≤ max_{i≤m} limsup_{ε→0} ε log P{ξ_ε ∈ B_{x_i}}
= −min_{i≤m} (I(x_i) − δ) ∧ δ⁻¹
≤ −(I(K) − δ) ∧ δ⁻¹.

The upper bound now follows as we let δ → 0.

Next consider any open set G and element x ∈ G. For any n ∈ ℕ we may choose a continuous function f_n: S → [−n, 0] such that f_n(x) = 0 and f_n ≡ −n on G^c. Then

−I(x) = inf_{f∈C_b} (Λ_f − f(x)) ≤ Λ_{f_n} − f_n(x) = Λ_{f_n}
= lim_{ε→0} ε log E exp(f_n(ξ_ε)/ε)
≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ∨ (−n).

The lower bound now follows as we let n → ∞ and then take the supremum over all x ∈ G. □

Next we note that the LDP is preserved by continuous mappings. The following results are often referred to as the direct and inverse contraction principles. Given any rate function I on S and a function f: S → T, we define the image J = I ∘ f⁻¹ on T as the function

J(y) = I(f⁻¹{y}) = inf{I(x); f(x) = y}, y ∈ T. (19)

Note that the corresponding set functions are related by

J(B) = inf_{y∈B} J(y) = inf{I(x); f(x) ∈ B} = I(f⁻¹B), B ⊂ T.

Theorem 27.11 (continuous mapping) Consider a continuous function f between two metric spaces S and T, and let ξ_ε be random elements in S.

(i) If (ξ_ε) satisfies the LDP in S with rate function I, then the images f(ξ_ε) satisfy the LDP in T with the raw rate function J = I ∘ f⁻¹. Moreover, J is a good rate function on T whenever the function I is good on S.

(ii) (Ioffe) Let (ξ_ε) be exponentially tight in S, let f be injective, and let the images f(ξ_ε) satisfy the weak LDP in T with rate function J. Then (ξ_ε) satisfies the LDP in S with the good rate function I = J ∘ f.

Proof: (i) Since f is continuous, we note that f⁻¹B is open or closed whenever the corresponding property holds for B. Using the LDP for (ξ_ε), we get for any B ⊂ T

−I(f⁻¹B°) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ f⁻¹B°}
≤ limsup_{ε→0} ε log P{ξ_ε ∈ f⁻¹B̄} ≤ −I(f⁻¹B̄),
which proves the LDP for (f(ξ_ε)) with the raw rate function J = I ∘ f⁻¹. When I is good, we claim that

J⁻¹[0, r] = f(I⁻¹[0, r]), r ≥ 0. (20)

To see this, fix any r ≥ 0, and let x ∈ I⁻¹[0, r]. Then

J ∘ f(x) = I ∘ f⁻¹ ∘ f(x) = inf{I(u); f(u) = f(x)} ≤ I(x) ≤ r,

which means that f(x) ∈ J⁻¹[0, r]. Conversely, let y ∈ J⁻¹[0, r]. Since I is good and f is continuous, the infimum in (19) is attained at some x ∈ S, and we get y = f(x) with I(x) ≤ r. Thus, y ∈ f(I⁻¹[0, r]), which completes the proof of (20). Since continuous maps preserve compactness, (20) shows that the goodness of I carries over to J.

(ii) Here I is again a rate function, since the lower semicontinuity of J is preserved by composition with the continuous map f. By Lemma 27.9 (ii) it is then enough to show that (ξ_ε) satisfies the weak LDP in S. To prove the upper bound, fix any compact set K ⊂ S, and note that the image set f(K) is again compact since f is continuous. Hence, the weak LDP for (f(ξ_ε)) yields

limsup_{ε→0} ε log P{ξ_ε ∈ K} = limsup_{ε→0} ε log P{f(ξ_ε) ∈ f(K)}
≤ −J(f(K)) = −I(K).

Next we fix any open set G ⊂ S, and let x ∈ G be arbitrary with I(x) = r < ∞. Since (ξ_ε) is exponentially tight, we may choose a compact set K ⊂ S such that

limsup_{ε→0} ε log P{ξ_ε ∉ K} < −r. (21)

The continuous image f(K) is compact in T, and so by (21) and the weak LDP for (f(ξ_ε))

−I(K^c) = −J(f(K^c)) ≤ −J((f(K))^c)
≤ liminf_{ε→0} ε log P{f(ξ_ε) ∉ f(K)}
≤ limsup_{ε→0} ε log P{ξ_ε ∉ K} < −r.

Since I(x) = r, we conclude that x ∈ K.

As a continuous bijection from the compact set K onto f(K), the function f is in fact a homeomorphism between the two sets with their subset topologies. By Lemma 1.6 we may then choose an open set G′ ⊂ T such that f(x) ∈ f(G ∩ K) = G′ ∩ f(K). Noting that

P{f(ξ_ε) ∈ G′} ≤ P{ξ_ε ∈ G} + P{ξ_ε ∉ K}
and using the weak LDP of (f(ξ_ε)), we get

−r = −I(x) = −J(f(x)) ≤ liminf_{ε→0} ε log P{f(ξ_ε) ∈ G′}
≤ liminf_{ε→0} ε log P{ξ_ε ∈ G} ∨ limsup_{ε→0} ε log P{ξ_ε ∉ K}.

Hence, by (21)

−I(x) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ G}, x ∈ G,

and the lower bound follows as we take the supremum over all x ∈ G. □

We turn to the powerful method of projective limits. The following sequential version is sufficient for our needs and will enable us to extend the LDP to a variety of infinite-dimensional contexts. Some general background on projective limits is provided by Appendix A2.

Theorem 27.12 (random sequences, Dawson and Gärtner) For any metric spaces S₁, S₂, …, let ξ_ε = (ξ_ε¹, ξ_ε², …) be random elements in S = X_k S_k, such that for every n ∈ ℕ the vectors (ξ_ε¹, …, ξ_εⁿ) satisfy the LDP in Sⁿ = X_{k≤n} S_k with a good rate function I_n. Then (ξ_ε) satisfies the LDP in S with the good rate function

I(x) = sup_n I_n(x₁, …, x_n), x = (x₁, x₂, …) ∈ S. (22)

Proof: For any m ≤ n we introduce the natural projections π_n: S → Sⁿ and π_{mn}: Sⁿ → S^m. Since the π_{mn} are continuous and the I_n are good, Theorem 27.11 shows that I_m = I_n ∘ π_{mn}⁻¹ for all m ≤ n, and so π_{mn}(I_n⁻¹[0, r]) ⊂ I_m⁻¹[0, r] for all r ≥ 0 and m ≤ n. Hence, for each r ≥ 0 the level sets I_n⁻¹[0, r] form a projective sequence. Since they are also compact by hypothesis, and in view of (22)

I⁻¹[0, r] = ∩_n π_n⁻¹ I_n⁻¹[0, r], r ≥ 0, (23)

Lemma A2.9 shows that the sets I⁻¹[0, r] are compact. Thus, I is again a good rate function.

Now fix any closed set A ⊂ S and put A_n = π_n A, so that π_{mn} A_n = A_m for all m ≤ n. Since the π_{mn} are continuous, we have also π_{mn} Ā_n ⊂ Ā_m for m ≤ n, which means that the sets Ā_n form a projective sequence. We claim that

A = ∩_n π_n⁻¹ Ā_n. (24)

Here the relation A ⊂ π_n⁻¹ Ā_n is obvious. Next assume that x ∉ A. By the definition of the product topology, we may choose a k ∈ ℕ and an open set U ⊂ S^k such that x ∈ π_k⁻¹ U ⊂ A^c. It follows easily that π_k x ∈ U ⊂ A_k^c. Since U is open, we have even π_k x ∈ (Ā_k)^c.
Thus, x ∉ ∩_n π_n⁻¹ Ā_n, which completes the proof of (24). The projective property carries over to the intersections Ā_n ∩ I_n⁻¹[0, r], and formulas (23) and (24) combine into the
relation

A ∩ I⁻¹[0, r] = ∩_n π_n⁻¹ (Ā_n ∩ I_n⁻¹[0, r]), r ≥ 0. (25)

Now assume that I(A) > r ∈ ℝ. Then A ∩ I⁻¹[0, r] = ∅, and by (25) and Lemma A2.9 we get Ā_n ∩ I_n⁻¹[0, r] = ∅ for some n ∈ ℕ, which implies I_n(Ā_n) > r. Noting that A ⊂ π_n⁻¹ A_n and using the LDP in Sⁿ, we conclude that

limsup_{ε→0} ε log P{ξ_ε ∈ A} ≤ limsup_{ε→0} ε log P{π_n ξ_ε ∈ A_n}
≤ −I_n(Ā_n) < −r.

The upper bound now follows as we let r ↑ I(A). Finally, fix an open set G ⊂ S and let x ∈ G be arbitrary. By the definition of the product topology, we may choose n ∈ ℕ and an open set U ⊂ Sⁿ such that x ∈ π_n⁻¹ U ⊂ G. The LDP in Sⁿ yields

liminf_{ε→0} ε log P{ξ_ε ∈ G} ≥ liminf_{ε→0} ε log P{π_n ξ_ε ∈ U}
≥ −I_n(U) ≥ −I_n ∘ π_n(x) ≥ −I(x),

and the lower bound follows as we take the supremum over all x ∈ G. □

We consider yet another basic method for extending the LDP, namely by suitable approximation. Here the following elementary result is often helpful. Let us say that the random elements ξ_ε and η_ε in a common separable metric space (S, d) are exponentially equivalent if

lim_{ε→0} ε log P{d(ξ_ε, η_ε) > h} = −∞, h > 0. (26)

The separability of S is needed only to ensure measurability of the pairwise distances d(ξ_ε, η_ε). In general, we may replace (26) by a similar condition involving the outer measure.

Lemma 27.13 (approximation) Let ξ_ε and η_ε be exponentially equivalent random elements in a separable metric space S. Then (ξ_ε) satisfies the LDP with a good rate function I iff the same LDP holds for (η_ε).

Proof: Suppose that the LDP holds for (ξ_ε) with rate function I. Fix any closed set B ⊂ S, and let B^h denote the closed h-neighborhood of B. Then

P{η_ε ∈ B} ≤ P{ξ_ε ∈ B^h} + P{d(ξ_ε, η_ε) > h},

and so by (26) and the LDP for (ξ_ε)

limsup_{ε→0} ε log P{η_ε ∈ B}
≤ limsup_{ε→0} ε log P{ξ_ε ∈ B^h} ∨ limsup_{ε→0} ε log P{d(ξ_ε, η_ε) > h}
≤ −I(B^h) ∨ (−∞) = −I(B^h).

Since I is good, we have I(B^h) ↑ I(B) as h → 0, and the required upper bound follows.
Next we fix an open set G ⊂ S and an element x ∈ G. If d(x, G^c) > h > 0, we may choose a neighborhood U of x such that U^h ⊂ G. Noting that

P{ξ_ε ∈ U} ≤ P{η_ε ∈ G} + P{d(ξ_ε, η_ε) > h},

we get by (26) and the LDP for (ξ_ε)

−I(x) ≤ −I(U) ≤ liminf_{ε→0} ε log P{ξ_ε ∈ U}
≤ liminf_{ε→0} ε log P{η_ε ∈ G} ∨ limsup_{ε→0} ε log P{d(ξ_ε, η_ε) > h}
= liminf_{ε→0} ε log P{η_ε ∈ G}.

The required lower bound now follows, as we take the supremum over all x ∈ G. □

We now demonstrate the power of the abstract theory by considering some important applications. First we study perturbations of the ordinary differential equation ẋ = b(x) by a small noise term. More precisely, we consider the unique solution X^ε with X₀^ε = 0 of the d-dimensional SDE

dX_t^ε = ε^{1/2} dB_t + b(X_t^ε) dt, t ≥ 0, (27)

where B is a Brownian motion in ℝ^d and b is a bounded, uniformly Lipschitz continuous mapping on ℝ^d. Let H_∞ denote the set of all absolutely continuous functions x: ℝ₊ → ℝ^d with x₀ = 0 such that ẋ ∈ L².

Theorem 27.14 (perturbed dynamical systems, Freidlin and Wentzell) For any bounded, uniformly Lipschitz continuous function b: ℝ^d → ℝ^d, the solutions X^ε to (27) with X₀^ε = 0 satisfy the LDP in C(ℝ₊, ℝ^d) with the good rate function

I(x) = ½ ∫₀^∞ |ẋ_t − b(x_t)|² dt, x ∈ H_∞. (28)

Here it is understood that I(x) = ∞ when x ∉ H_∞. Note that the result for b ≡ 0 extends Theorem 27.6 to processes on ℝ₊.

Proof: If B¹ is a Brownian motion on [0, 1], then for every r > 0 the process B^r = Φ(B¹) given by B_t^r = r^{1/2} B_{t/r}¹ is a Brownian motion on [0, r]. Noting that Φ is continuous from C([0, 1]) to C([0, r]), we see from Theorems 27.6 and 27.11 (i) together with Lemma 27.7 that the processes ε^{1/2} B^r satisfy the LDP in C([0, r]) with the good rate function I_r = I₁ ∘ Φ⁻¹, where I₁(x) = ½‖ẋ‖² for x ∈ H₁ and I₁(x) = ∞ otherwise. Now Φ maps H₁ onto H_r, and when y = Φ(x) with x ∈ H₁ we have ẋ_t = r^{1/2} ẏ_{rt}.
Hence, by calculus, I_r(y) = ½ ∫₀^r |ẏ_s|² ds = ½‖ẏ‖², which extends Theorem 27.6 to [0, r]. For the further extension to ℝ₊, let π_n x denote the restriction of a function x ∈ C(ℝ₊) to [0, n], and infer from Theorem 27.12 that the processes ε^{1/2} B satisfy the LDP in C(ℝ₊) with the good rate function I_∞(x) = sup_n I_n(π_n x) = ½‖ẋ‖².
By an elementary version of Theorem 21.3, the integral equation

x_t = z_t + ∫₀^t b(x_s) ds, t ≥ 0, (29)

has a unique solution x = F(z) in C = C(ℝ₊) for every z ∈ C. Letting z¹, z² ∈ C be arbitrary and writing a for the Lipschitz constant of b, we note that the corresponding solutions x^i = F(z^i) satisfy

|x_t¹ − x_t²| ≤ ‖z¹ − z²‖ + a ∫₀^t |x_s¹ − x_s²| ds, t ≥ 0.

Hence, Gronwall's Lemma 21.4 yields ‖x¹ − x²‖ ≤ ‖z¹ − z²‖ e^{ar} on the interval [0, r], which shows that F is continuous. Using Schilder's theorem on ℝ₊ along with Theorem 27.11 (i), we conclude that the processes X^ε satisfy the LDP in C(ℝ₊) with the good rate function I = I_∞ ∘ F⁻¹. Now F is clearly bijective, and (29) shows that the functions z and x = F(z) lie simultaneously in H_∞, in which case ż = ẋ − b(x) a.e. Thus, I is indeed given by (28). □

Now consider a random element ξ with distribution μ in an arbitrary metric space S. We introduce the cumulant-generating functional

Λ(f) = log E e^{f(ξ)} = log μe^f, f ∈ C_b(S),

and the associated Legendre–Fenchel transform

Λ*(ν) = sup_{f∈C_b} (νf − Λ(f)), ν ∈ P(S), (30)

where P(S) denotes the class of probability measures on S, endowed with the topology of weak convergence. Note that Λ and Λ* are both convex, by the same argument as for ℝ^d. Given any two measures μ, ν ∈ P(S), we define the relative entropy of ν with respect to μ by

H(ν|μ) = ν log p = μ(p log p) when ν ≪ μ with ν = p · μ, and H(ν|μ) = ∞ when ν ≪̸ μ.

Since x log x is convex, the function H(ν|μ) is convex in ν for fixed μ, and by Jensen's inequality we have

H(ν|μ) ≥ μp log μp = νS log νS = 0, ν ∈ P(S),

with equality iff ν = μ. Now let ξ₁, ξ₂, … be i.i.d. random elements in S. The associated empirical distributions are given by

η_n = n⁻¹ Σ_{k≤n} δ_{ξ_k}, n ∈ ℕ.

They may be regarded as random elements in P(S), and we note that

η_n f = n⁻¹ Σ_{k≤n} f(ξ_k), f ∈ C_b(S), n ∈ ℕ.
In particular, Theorem 27.5 applies to the random vectors (η_n f₁, …, η_n f_m) for fixed f₁, …, f_m ∈ C_b(S). The following result may be regarded as an infinite-dimensional version of Theorem 27.5. It also provides an important connection to statistical mechanics, via the entropy function.

Theorem 27.15 (large deviations of empirical distributions, Sanov) Let ξ₁, ξ₂, … be i.i.d. random elements with distribution μ in a Polish space S, and put Λ(f) = log μe^f. Then the associated empirical distributions η₁, η₂, … satisfy the LDP in P(S) with the good rate function

Λ*(ν) = H(ν|μ), ν ∈ P(S). (31)

A couple of lemmas will be needed for the proof.

Lemma 27.16 (entropy, Donsker and Varadhan) In (30) it is equivalent to take the supremum over all bounded, measurable functions f: S → ℝ. The identity (31) then holds for any probability measures μ and ν on a common measurable space S.

Proof: The first assertion holds by Lemma 1.35 and dominated convergence. If ν ≪̸ μ, then H(ν|μ) = ∞ by definition. Furthermore, we may choose a set B ∈ S with μB = 0 and νB > 0, and take f_n = n1_B to obtain νf_n − log μe^{f_n} = nνB → ∞. Thus, even Λ*(ν) = ∞ in this case, and it remains to prove (31) when ν ≪ μ. Assuming ν = p · μ and writing f = log p, we note that

νf − log μe^f = ν log p − log μp = H(ν|μ).

If f = log p is unbounded, we may approximate by bounded measurable functions f_n satisfying μe^{f_n} → 1 and νf_n → νf, and we get Λ*(ν) ≥ H(ν|μ).

To prove the reverse inequality, we first assume that the σ-field S is finite, generated by a partition B₁, …, B_n of S. Putting μ_k = μB_k, ν_k = νB_k, and p_k = ν_k/μ_k, we may write our claim in the form

g(x) = Σ_k ν_k x_k − log Σ_k μ_k e^{x_k} ≤ Σ_k ν_k log p_k,

where x = (x₁, …, x_n) ∈ ℝⁿ is arbitrary. Here the function g is concave and satisfies ∇g(x) = 0 for x = (log p₁, …, log p_n), interpreted in the limiting sense when p_k = 0 for some k.
Thus, sup_x g(x) = g(log p₁, …, log p_n) = Σ_k ν_k log p_k.

To prove the inequality νf − log μe^f ≤ ν log p in general, we may assume that f is simple. The generated σ-field F ⊂ S is then finite, and we note that ν = μ[p|F] · μ on F. Using the result in the finite case, together with Jensen's inequality for conditional expectations, we obtain

νf − log μe^f ≤ μ(μ[p|F] log μ[p|F]) ≤ μ μ[p log p|F] = ν log p. □
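The finite case of Lemma 27.16 is easy to verify numerically: the concave function g is maximized at x = (log p₁, …, log p_n), where it equals the relative entropy. A minimal sketch, with hypothetical three-point measures μ and ν standing in for a finite partition:

```python
import math, random

def relative_entropy(nu, mu):
    # H(nu|mu) = sum_k nu_k log(nu_k/mu_k), with 0 log 0 = 0; infinite if nu is not << mu
    h = 0.0
    for n, m in zip(nu, mu):
        if n > 0:
            if m == 0:
                return math.inf
            h += n * math.log(n / m)
    return h

def g(x, nu, mu):
    # the concave function from the finite case: nu·x − log(mu e^x)
    return sum(n * xi for n, xi in zip(nu, x)) \
        - math.log(sum(m * math.exp(xi) for m, xi in zip(mu, x)))

mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.3, 0.5]
x_star = [math.log(n / m) for n, m in zip(nu, mu)]   # the maximizer x = log p

print(relative_entropy(nu, mu), g(x_star, nu, mu))   # equal: the supremum is attained
```

Random perturbations of x_star never push g above H(ν|μ), in line with the concavity argument in the proof.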
Lemma 27.17 (exponential tightness) The empirical distributions η_n in Theorem 27.15 are exponentially tight in P(S).

Proof: If B ∈ S with P{ξ ∈ B} = p ∈ (0, 1), then by Theorem 27.3 and Lemmas 27.1 and 27.2 we have, for any x ∈ (p, 1],

sup_n n⁻¹ log P{η_n B > x} ≤ −x log (x/p) − (1 − x) log ((1 − x)/(1 − p)). (32)

In particular, we note that the right-hand side tends to −∞ as p → 0 for fixed x ∈ (0, 1). Now fix any r > 0. By (32) and Theorem 16.3, we may choose some compact sets K₁, K₂, … ⊂ S such that

P{η_n K_k^c > 2^{−k}} ≤ e^{−knr}, k, n ∈ ℕ.

Summing over k gives

limsup_{n→∞} n⁻¹ log P ∪_k {η_n K_k^c > 2^{−k}} ≤ −r,

and it remains to note that the set M = ∩_k {ν ∈ P(S); νK_k^c ≤ 2^{−k}} is compact, by another application of Theorem 16.3. □

Proof of Theorem 27.15: By Theorem A1.1 we can embed S as a Borel subset of a compact metric space K. The function space C_b(K) is separable, and we can choose a dense sequence f₁, f₂, … ∈ C_b(K). For any m ∈ ℕ, the random vector (f₁(ξ), …, f_m(ξ)) has cumulant-generating function

Λ_m(u) = log E exp Σ_{k≤m} u_k f_k(ξ) = Λ(Σ_{k≤m} u_k f_k), u ∈ ℝ^m,

and so by Theorem 27.5 the random vectors (η_n f₁, …, η_n f_m) satisfy the LDP in ℝ^m with the good rate function Λ_m*. By Theorem 27.12 it follows that the infinite sequences (η_n f₁, η_n f₂, …) satisfy the LDP in ℝ^∞ with the good rate function J = sup_m (Λ_m* ∘ π_m), where π_m denotes the natural projection of ℝ^∞ onto ℝ^m. Since P(K) is compact by Theorem 16.3 and the mapping ν ↦ (νf₁, νf₂, …) is a continuous injection of P(K) into ℝ^∞, Theorem 27.11 (ii) shows that the random measures η_n satisfy the LDP in P(K) with the good rate function

I_K(ν) = J(νf₁, νf₂, …) = sup_m Λ_m*(νf₁, …, νf_m)
= sup_m sup_{u∈ℝ^m} (Σ_{k≤m} u_k νf_k − Λ(Σ_{k≤m} u_k f_k))
= sup_{f∈F} (νf − Λ(f)) = sup_{f∈C_b} (νf − Λ(f)), (33)

where F denotes the set of all linear combinations of f₁, f₂, … .
Next we note that the natural embedding P(S) → P(K) is continuous, since for any f ∈ C_b(K) the restriction of f to S belongs to C_b(S). Since it is also trivially injective, we see from Theorem 27.11 (ii) and Lemma 27.17
that the η_n satisfy the LDP even in P(S), with a good rate function I_S that equals the restriction of I_K to P(S). It remains to note that I_S = Λ* by (33) and Lemma 27.16. □

We conclude with a remarkable application of Schilder's Theorem 27.6. Writing B for a standard Brownian motion in ℝ^d, we define for any t > e the scaled process X^t by

X_s^t = B_{st} / √(2t log log t), s ≥ 0. (34)

Theorem 27.18 (functional law of the iterated logarithm, Strassen) Let B be a Brownian motion in ℝ^d, and define the processes X^t by (34). Then the following equivalent statements hold outside a fixed P-null set:

(i) The paths X^t, t > 3, form a relatively compact set in C(ℝ₊, ℝ^d), whose set of limit points as t → ∞ equals K = {x ∈ H_∞; ‖ẋ‖₂ ≤ 1}.

(ii) For any continuous function F: C(ℝ₊, ℝ^d) → ℝ, we have

limsup_{t→∞} F(X^t) = sup_{x∈K} F(x).

In particular, we may recover the classical law of the iterated logarithm in Theorem 13.18 by choosing F(x) = x₁. Using Theorem 14.6, we can easily derive a correspondingly strengthened version for random walks.

Proof: The equivalence of (i) and (ii) being elementary, we need to prove only (i). Noting that X^t =d B/√(2 log log t) and using Theorem 27.6, we get for any measurable set A ⊂ C(ℝ₊, ℝ^d) and constant r > 1

limsup_{n→∞} (log P{X^{rⁿ} ∈ A})/(log n) ≤ limsup_{t→∞} (log P{X^t ∈ A})/(log log t) ≤ −2I(Ā),
liminf_{n→∞} (log P{X^{rⁿ} ∈ A})/(log n) ≥ liminf_{t→∞} (log P{X^t ∈ A})/(log log t) ≥ −2I(A°),

where I(x) = ½‖ẋ‖² when x ∈ H_∞ and I(x) = ∞ otherwise. Hence,

Σ_n P{X^{rⁿ} ∈ A} < ∞ when 2I(Ā) > 1, and = ∞ when 2I(A°) < 1. (35)

Now fix any r > 1, and let G ⊃ K be open. Note that 2I(G^c) > 1 by Lemma 27.7. By the first part of (35) and the Borel–Cantelli lemma we have P{X^{rⁿ} ∉ G i.o.} = 0 or, equivalently, 1_G(X^{rⁿ}) → 1 a.s. Since G was arbitrary, it follows that ρ(X^{rⁿ}, K) → 0 a.s. for any metrization ρ of C(ℝ₊, ℝ^d).
In particular, this holds with any c > 0 for the metric

ρ_c(x, y) = ∫₀^∞ ((x − y)_s* ∧ 1) e^{−cs} ds, x, y ∈ C(ℝ₊, ℝ^d).

To extend the convergence to the entire family {X^t}, fix any path of B such that ρ₁(X^{rⁿ}, K) → 0, and choose some functions y^{rⁿ} ∈ K satisfying ρ₁(X^{rⁿ}, y^{rⁿ}) → 0. For any t ∈ [rⁿ, r^{n+1}), the paths X^{rⁿ} and X^t are related
by

X^t(s) = X^{rⁿ}(t r^{−n} s) ((rⁿ log log rⁿ)/(t log log t))^{1/2}, s ≥ 0.

Defining y^t in the same way in terms of y^{rⁿ}, we note that also y^t ∈ K since I(y^t) ≤ I(y^{rⁿ}). (The two H_∞-norms would agree if the logarithmic factors were omitted.) Furthermore,

ρ_r(X^t, y^t) = ∫₀^∞ ((X^t − y^t)_s* ∧ 1) e^{−rs} ds
≤ ∫₀^∞ ((X^{rⁿ} − y^{rⁿ})_{rs}* ∧ 1) e^{−rs} ds
= r^{−1} ρ₁(X^{rⁿ}, y^{rⁿ}) → 0.

Thus, ρ_r(X^t, K) → 0. Since K is compact, we conclude that {X^t} is relatively compact, with all its limit points as t → ∞ belonging to K.

Now fix any y ∈ K and u > ε > 0. By the established part of the theorem and the Cauchy–Buniakowski inequality, we have a.s.

limsup_{t→∞} (X^t − y)_ε* ≤ sup_{x∈K} (x − y)_ε* ≤ sup_{x∈K} x_ε* + y_ε* ≤ 2ε^{1/2}. (36)

Write x*_{ε,u} = sup_{s∈[ε,u]} |x_s − x_ε|, and choose r > u/ε to ensure independence between the variables (X^{rⁿ} − y)*_{ε,u}. Applying the second part of (35) to the open set A = {x; (x − y)*_{ε,u} < ε} and using the Borel–Cantelli lemma together with (36), we obtain a.s.

liminf_{t→∞} (X^t − y)_u* ≤ limsup_{t→∞} (X^t − y)_ε* + liminf_{n→∞} (X^{rⁿ} − y)*_{ε,u} ≤ 2ε^{1/2} + ε.

Letting ε → 0 gives liminf_t (X^t − y)_u* = 0 a.s., and so liminf_t ρ₁(X^t, y) ≤ e^{−u} a.s. As u → ∞, we obtain liminf_t ρ₁(X^t, y) = 0 a.s. Applying this result to a dense sequence y₁, y₂, … ∈ K, we see that a.s. every element of K is a limit point as t → ∞ of the family {X^t}. □

Exercises

1. For any random vector ξ and constant a in ℝ^d, show that Λ_{ξ−a}(u) = Λ_ξ(u) − ua and Λ*_{ξ−a}(x) = Λ*_ξ(x + a).

2. For any random vector ξ in ℝ^d and nonsingular d × d matrix a, show that Λ_{aξ}(u) = Λ_ξ(ua) and Λ*_{aξ}(x) = Λ*_ξ(a⁻¹x).

3. For any pair of independent random vectors ξ and η, show that Λ_{ξ,η}(u, v) = Λ_ξ(u) + Λ_η(v) and Λ*_{ξ,η}(x, y) = Λ*_ξ(x) + Λ*_η(y).

4. Prove the claims of Lemma 27.2.
5. If ξ is Gaussian in ℝ^d with mean m ∈ ℝ^d and covariance matrix a, show that Λ*_ξ(x) = ½(x − m)′a⁻¹(x − m). Explain the interpretation when a is singular.

6. Let ξ be a standard Gaussian random vector in ℝ^d. Show that the family ε^{1/2}ξ satisfies the LDP in ℝ^d with the good rate function I(x) = ½|x|². (Hint: Deduce the result along the sequence ε_n = n⁻¹ from Theorem 27.5, and extend by monotonicity to general ε > 0.)

7. Use Theorem 27.11 (i) to deduce the preceding result from Schilder's theorem. (Hint: For x ∈ H₁, note that |x₁| ≤ ‖ẋ‖₂ with equality iff x_t ≡ tx₁.)

8. Prove Schilder's theorem on [0, T] by the same argument as for [0, 1].

9. Deduce Schilder's theorem in the space C([0, n], ℝ^d) from the version in C([0, 1], ℝ^{nd}).

10. Let B be a Brownian bridge in ℝ^d. Show that the processes ε^{1/2}B satisfy the LDP in C([0, 1], ℝ^d) with the good rate function I(x) = ½‖ẋ‖² for x ∈ H₁ with x₁ = 0 and I(x) = ∞ otherwise. (Hint: Write B_t = X_t − tX₁, where X is a Brownian motion in ℝ^d, and use Theorem 27.11. Check that ‖ẋ − a‖₂ is minimized for a = x₁.)

11. Show that the property of exponential tightness and its sequential version are preserved by continuous mappings.

12. Prove that if the processes X^ε and Y^ε in C(ℝ₊, ℝ^d) are exponentially tight, then so is any linear combination aX^ε + bY^ε. (Hint: Use the Arzelà–Ascoli theorem.)

13. Show directly from (27) that the processes X^ε in Theorem 27.14 are exponentially tight. (Hint: Use Lemmas 27.7 and 27.9 (iii) together with the Arzelà–Ascoli theorem.) Derive the same result from the stated theorem.

14. Let ξ_ε be random elements in a locally compact metric space S, satisfying the LDP with a good rate function I. Show that the ξ_ε are exponentially tight (even in the nonsequential sense). (Hint: For any r > 0, there exists a compact set K_r ⊂ S such that I⁻¹[0, r] ⊂ K_r°. Now apply the LDP upper bound to the closed sets (K_r°)^c ⊃ K_r^c.)

15.
For any metric space S and lcscH space T, let X_ε be random elements in C(T, S) whose restrictions X_ε^K to an arbitrary compact set K ⊂ T satisfy the LDP in C(K, S) with the good rate function I_K. Show that the X_ε satisfy the LDP in C(T, S) with the good rate function I = sup_K (I_K ∘ π_K), where π_K denotes the restriction map from C(T, S) to C(K, S).

16. Let ξ_{kj} be i.i.d. random vectors in ℝ^d satisfying Λ(u) = E e^{uξ_{kj}} < ∞ for all u ∈ ℝ^d. Show that the sequences ξ̄_n = n⁻¹ Σ_{k≤n} (ξ_{k1}, ξ_{k2}, …) satisfy an LDP in (ℝ^d)^∞ with the good rate function I(x) = Σ_j Λ*(x_j). Also derive an LDP for the associated random walks in ℝ^d.
17. Let ξ = (ξ₁, ξ₂, …) be a sequence of i.i.d. N(0, 1) random variables. Use the preceding result to show that the sequences ε^{1/2}ξ satisfy the LDP in ℝ^∞ with the good rate function I(x) = ½‖x‖² for x ∈ ℓ² and I(x) = ∞ otherwise. Also show how the statement follows from Schilder's theorem.

18. Let ξ₁, ξ₂, … be i.i.d. random probability measures on a Polish space S. Derive an LDP in P(S) for the averages ξ̄_n = n⁻¹ Σ_{k≤n} ξ_k. (Hint: Define Λ(f) = log E e^{ξ_k f}, and proceed as in the proof of Sanov's theorem.)

19. Show how the classical law of the iterated logarithm in Theorem 13.18 follows from Theorem 27.18. Also use the latter result to derive a law of the iterated logarithm for the variables ζ_t = |B_{2t} − B_t|, where B is a Brownian motion in ℝ^d.

20. Use Theorem 27.18 to derive a corresponding law of the iterated logarithm in C([0, 1], ℝ^d).

21. Use Theorems 14.6 and 27.18 to derive a functional law of the iterated logarithm for random walks based on i.i.d. random variables with mean 0 and variance 1. (Hint: To state the result in C(ℝ₊, ℝ), replace the summation process S_{[t]} by its linearly interpolated version, as in the case of Corollary 16.7.)

22. Use Theorems 14.13 and 27.18 to derive a functional law of the iterated logarithm for suitable renewal processes.

23. Let B¹, B², … be independent Brownian motions in ℝ^d. Show that the sequence of paths Xⁿ = (2 log n)^{−1/2} Bⁿ, n ≥ 2, is a.s. relatively compact in C(ℝ₊, ℝ^d) with set of limit points K = {x ∈ H_∞; ‖ẋ‖₂ ≤ 1}.
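Several of these exercises concern Cramér-type exponential rates for i.i.d. averages. As a purely illustrative numeric check (assuming nothing beyond the chapter's setup), the sketch below compares the exact binomial tail P{S_n/n ≥ a} for Bernoulli(p) summands with the predicted rate Λ*(a) = a log(a/p) + (1 − a) log((1 − a)/(1 − p)):

```python
import math

def binom_tail(n, a, p):
    # exact P{S_n >= a*n} for S_n ~ Binomial(n, p)
    k0 = math.ceil(a * n)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))

def cramer_rate(a, p):
    # Legendre transform of the Bernoulli(p) cumulant-generating function
    return a * math.log(a / p) + (1 - a) * math.log((1 - a) / (1 - p))

p, a = 0.5, 0.7
for n in (100, 1000):
    print(n, -math.log(binom_tail(n, a, p)) / n, cramer_rate(a, p))
```

The prelimit rate −n⁻¹ log P{S_n/n ≥ a} dominates Λ*(a) (Chernoff's bound) and approaches it with an O(n⁻¹ log n) correction; here Λ*(0.7) ≈ 0.0823.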
Appendices

Here we list some results that play an important role in this book but whose proofs are too long or technical to contribute in any essential way to the understanding of the subject matter. Proofs are given only for results that are not easily accessible in the literature.

A1. Advanced Measure Theory

The basic facts of measure theory were reviewed in Chapters 1 and 2. In this appendix we list, mostly without proofs, some special or less elementary results that are required in this book. One of the quoted results is used more frequently than the others, namely the Borel nature of Polish spaces in Theorem A1.2. The remaining results are needed only for special purposes.

We begin with a basic embedding theorem. Recall that a topological space is said to be Polish if it is separable with a complete metrization.

Theorem A1.1 (embedding) Any Polish space is homeomorphic to a Borel subset of the compact space [0, 1]^∞.

Proof: See Theorem II.82.5 in Rogers and Williams (1994). □

We say that two measurable spaces S and T are Borel isomorphic if there exists a measurable bijection f: S → T such that f⁻¹ is also measurable. A Borel space is defined as a measurable space that is Borel isomorphic to a Borel subset of [0, 1]. The following result shows that the most commonly occurring spaces are Borel.

Theorem A1.2 (Polish and Borel spaces) Every Borel subset of a Polish space is a Borel space.

Proof: By Theorem A1.1, it is enough to show that [0, 1]^∞ is a Borel space. This may be seen by an elementary argument involving binary expansions, similar to that used in the proof of Lemma 3.21. However, some extra care is needed to ensure that the resulting mapping into [0, 1] is injective and bimeasurable with a measurable range. See, e.g., Theorem A.47 in Breiman (1968) for details. □

If a measurable mapping is invertible, then the measurability of the inverse can sometimes be inferred from the measurability of the range.
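The binary-expansion idea mentioned in the proof of Theorem A1.2 can be made concrete at finite precision: interleaving the binary digits of two numbers gives an injective map [0, 1)² → [0, 1). The sketch below is only a finite-precision illustration; the measure-theoretic care about non-unique expansions and measurable ranges is exactly what the cited proofs supply.

```python
def interleave(x, y, bits=26):
    # map (x, y) in [0,1)^2 to z in [0,1) by interleaving binary digits
    z, w = 0.0, 0.5
    for _ in range(bits):
        x *= 2; bx, x = int(x), x - int(x)
        y *= 2; by, y = int(y), y - int(y)
        z += w * bx; w /= 2
        z += w * by; w /= 2
    return z

def deinterleave(z, bits=26):
    # recover (x, y) from z by splitting its binary digits again
    x = y = 0.0
    wx = 0.5
    for _ in range(bits):
        z *= 2; b, z = int(z), z - int(z)
        x += wx * b
        z *= 2; b, z = int(z), z - int(z)
        y += wx * b
        wx /= 2
    return x, y

print(deinterleave(interleave(0.375, 0.8125)))   # round-trips dyadic inputs exactly
```

On dyadic rationals with short expansions the round trip is exact in floating point, which is enough to see why the map is injective digit by digit.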
Theorem A1.3 (range and inverse, Kuratowski) Let f be a measurable bijection between two Borel spaces S and T. Then the inverse f⁻¹: T → S is again measurable.

Proof: See Parthasarathy (1967), Section 1.3. □

We turn to the basic projection and section theorem, which plays such an important role in the more advanced literature. For any measurable space (Ω, F), the universal completion of F is defined as the σ-field F̄ = ∩_μ F^μ, where F^μ denotes the completion with respect to μ, and the intersection extends over all probability measures μ on F. For any spaces Ω and S, we define the projection πA of a set A ⊂ Ω × S onto Ω as the union ∪_s A^s, where

A^s = {ω ∈ Ω; (ω, s) ∈ A}, s ∈ S.

Theorem A1.4 (projection and sections, Lusin, Choquet, Meyer) Fix a measurable space (Ω, F) and a Borel space (S, S), and consider a set A ∈ F ⊗ S with projection πA onto Ω. Then

(i) πA belongs to the universal completion F̄ of F;

(ii) for any probability measure P on F, there exists a random element ξ in S such that (ω, ξ(ω)) ∈ A holds P-a.s. on πA.

Proof: See Dellacherie and Meyer (1975), Section III.44. □

A2. Some Special Spaces

Here we collect some basic facts about various set, measure, and function spaces of importance in probability theory. Though random processes with paths in C(ℝ₊, ℝ^d) or D(ℝ₊, ℝ^d) and random measures on a variety of spaces are considered throughout the book, most of the topological results mentioned here are not needed until Chapter 16, where they play a fundamental role for the theory of convergence in distribution. Our plan is to begin with the basic function spaces and then move on to some spaces of measures and sets. Whenever appropriate accounts are available in the literature, we omit the proofs.

We begin with a well-known classical result. On any space of functions x: K → S, we introduce the evaluation maps π_t: x ↦ x_t, t ∈ K.
Given some metrics d in K and ρ in S, we define the associated modulus of continuity by

w(x, h) = sup{ρ(x_s, x_t); d(s, t) ≤ h}, h > 0.
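For sampled paths, this modulus is easy to compute on a grid. A small illustrative sketch for real-valued functions on [0, 1], where ρ is the absolute difference (grid-based, so only an approximation to the true supremum):

```python
def modulus(xs, ts, h):
    # w(x, h) = sup{ |x_s - x_t| : |s - t| <= h }, taken over grid points only
    return max(
        abs(xs[i] - xs[j])
        for i in range(len(ts))
        for j in range(i, len(ts))
        if ts[j] - ts[i] <= h
    )

ts = [k / 100 for k in range(101)]   # grid on [0, 1]
xs = [t * t for t in ts]             # x(t) = t^2, for which w(x, h) = 2h - h^2 exactly
print(modulus(xs, ts, 0.1))          # ≈ 0.19 = 2h - h^2 at h = 0.1
```

As expected for a continuous function on a compact interval, the computed modulus shrinks with h; this is the quantity controlled uniformly in the Arzelà–Ascoli criterion below.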
Theorem A2.1 (equicontinuity and compactness, Arzelà, Ascoli) Fix two metric spaces K and S, where K is compact and S is complete, and let D be dense in K. Then a set A ⊂ C(K, S) is relatively compact iff π_t A is relatively compact in S for every t ∈ D and

lim_{h→0} sup_{x∈A} w(x, h) = 0.

In that case, even ∪_{t∈K} π_t A is relatively compact in S.

Proof: See Dudley (1989), Section 2.4. □

Next we fix a separable, complete metric space (S, ρ) and consider the space D(ℝ₊, S) of functions x: ℝ₊ → S that are right-continuous with left-hand limits (rcll). It is easy to see that, for any ε, t > 0, such a function x has at most finitely many jumps of size > ε before time t. In D(ℝ₊, S) we introduce the modified modulus of continuity

w̃(x, t, h) = inf_{(I_k)} max_k sup_{r,s∈I_k} ρ(x_r, x_s), x ∈ D(ℝ₊, S), t, h > 0, (1)

where the infimum extends over all partitions of the interval [0, t) into subintervals I_k = [u, v) such that v − u ≥ h when v < t. Note that w̃(x, t, h) → 0 as h → 0 for fixed x ∈ D(ℝ₊, S) and t > 0. By a time-change on ℝ₊ we mean a monotone bijection λ: ℝ₊ → ℝ₊. Note that λ is continuous and strictly increasing with λ₀ = 0 and λ_∞ = ∞.

Theorem A2.2 (J₁-topology, Skorohod, Prohorov, Kolmogorov) Fix a separable, complete metric space (S, ρ) and a dense set T ⊂ ℝ₊. Then there exists a separable and complete metric d in D(ℝ₊, S) such that d(x_n, x) → 0 iff

sup_{s≤t} |λ_n(s) − s| + sup_{s≤t} ρ(x_n ∘ λ_n(s), x(s)) → 0, t > 0,

for some time-changes λ_n on ℝ₊. Furthermore, B(D(ℝ₊, S)) = σ{π_t; t ∈ T}, and a set A ⊂ D(ℝ₊, S) is relatively compact iff π_t A is relatively compact in S for every t ∈ T and

lim_{h→0} sup_{x∈A} w̃(x, t, h) = 0, t > 0. (2)

In that case, ∪_{s≤t} π_s A is relatively compact in S for every t > 0.

Proof: See Ethier and Kurtz (1986), Sections 3.5 and 3.6, or Jacod and Shiryaev (1987), Section VI.1. □

A suitably modified version of the last result applies to the space D([0, 1], S).
Here we define w(x, h) in terms of partitions of [0, 1) into subintervals of length ≥ h and use time-changes λ that are increasing bijections on [0, 1].
Turning to the case of measure spaces, let S be a locally compact, second-countable Hausdorff (lcscH) space with Borel σ-field S, and let Ŝ denote the class of bounded (i.e., relatively compact) sets in S. The space S is known to be Polish, and the family Ĉ_K of continuous functions f: S → ℝ+ with compact support is separable in the uniform metric. Furthermore,
there exists a sequence of compact sets K_n ↑ S such that K_n ⊂ K°_{n+1} for each n. Let M(S) denote the class of measures on S that are locally finite (i.e., finite on Ŝ), and write π_B and π_f for the mappings μ ↦ μB and μ ↦ μf = ∫ f dμ, respectively, on M(S). The vague topology in M(S) is generated by the maps π_f, f ∈ Ĉ_K, and we write the vague convergence of μ_n toward μ as μ_n →v μ. For any μ ∈ M(S) we define Ŝ_μ = {B ∈ Ŝ; μ∂B = 0}. Here we list some basic facts about the vague topology.

Theorem A2.3 (vague topology) For any lcscH space S, we have
(i) M(S) is Polish in the vague topology;
(ii) a set A ⊂ M(S) is vaguely relatively compact iff sup_{μ∈A} μf < ∞ for all f ∈ Ĉ_K;
(iii) if μ_n →v μ and B ∈ Ŝ with μ∂B = 0, then μ_n B → μB;
(iv) B(M(S)) is generated by the maps π_f, f ∈ Ĉ_K, and also for any m ∈ M(S) by the maps π_B, B ∈ Ŝ_m.

Proof: (i) Let f_1, f_2, ... be dense in Ĉ_K, and define
ρ(μ, ν) = Σ_k 2^{−k} (|μf_k − νf_k| ∧ 1),  μ, ν ∈ M(S).  (3)
It is easily seen that ρ metrizes the vague topology. In particular, M(S) is homeomorphic to a subset of ℝ^∞ and therefore separable. The completeness of ρ will be clear once we have proved (ii).
(ii) The necessity is clear from the continuity of π_f for each f ∈ Ĉ_K. Conversely, assume that sup_{μ∈A} μf < ∞ for all f ∈ Ĉ_K. Choose some compact sets K_n ↑ S with K_n ⊂ K°_{n+1} for each n, and let the functions f_n ∈ Ĉ_K be such that 1_{K_n} ≤ f_n ≤ 1_{K_{n+1}}. For each n the set {f_n · μ; μ ∈ A} is uniformly bounded, and so by Theorem 16.3 it is even sequentially relatively compact. A diagonal argument then shows that A itself is sequentially relatively compact. Since M(S) is metrizable, the desired relative compactness follows.
(iii) The proof is the same as for Theorem 4.25.
(iv) A topological basis in M(S) is formed by all finite intersections of the sets {μ; a < μf < b} with 0 < a < b and f ∈ Ĉ_K.
Furthermore, since M(S) is separable, every vaguely open set is a countable union of basis elements. Thus, B(M(S)) = σ{π_f; f ∈ Ĉ_K}. By a simple approximation and monotone class argument it follows that B(M(S)) = σ{π_B; B ∈ Ŝ}. Now fix any m ∈ M(S), put A = σ{π_B; B ∈ Ŝ_m}, and let D denote the class of all D ∈ S such that π_D is A-measurable. Fixing a metric d in S such that all d-bounded closed sets are compact, we note that only countably many d-spheres around a fixed point have positive m-measure. Thus, Ŝ_m contains a topological basis. We also note that Ŝ_m is closed under finite unions, whereas D is closed under bounded increasing limits. Since S is separable, it follows that D contains every bounded open set G ∈ S. For any such
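Two quick examples (not from the book), both on S = ℝ, illustrate the behavior allowed by the vague topology:

```latex
% (a) Escape of mass: for \mu_n = \delta_n we have
\mu_n f = f(n) \to 0 \qquad (f \in \hat{C}_K),
% so \mu_n \to 0 vaguely, although \mu_n(\mathbb{R}) = 1 for every n.
% (b) No escape: for \mu_n = \delta_{1/n},
\mu_n f = f(1/n) \to f(0) = \delta_0 f \qquad (f \in \hat{C}_K),
% so \mu_n \to \delta_0 vaguely.
```

Both sequences satisfy the compactness criterion (ii) of Theorem A2.3; example (a) shows that a vague limit may nevertheless lose mass, since the test functions f ∈ Ĉ_K have compact support.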
G, the class D ∩ G is a λ-system containing the π-system of all open sets in G, and by a monotone class argument we get D ∩ G = S ∩ G. It remains to let G ↑ S. □

Next we consider the space of all measure-valued rcll functions. Here we may characterize compactness in terms of countably many one-dimensional projections, a result needed for the proof of Theorem 16.27.

Theorem A2.4 (measure-valued functions) For any lcscH space S, there exist some f_1, f_2, ... ∈ Ĉ_K(S) such that a set A ⊂ D(ℝ+, M(S)) is relatively compact iff Af_j = {xf_j; x ∈ A} is relatively compact in D(ℝ+, ℝ+) for every j ∈ ℕ.

Proof: If A is relatively compact, then so is Af for every f ∈ Ĉ_K(S), since the map x ↦ xf is continuous from D(ℝ+, M(S)) to D(ℝ+, ℝ+). To prove the converse, choose a dense collection f_1, f_2, ... ∈ Ĉ_K(S), closed under addition, and assume that Af_j is relatively compact for every j. In particular, sup_{x∈A} x_t f_j < ∞ for all t > 0 and j ∈ ℕ, and so by Theorem A2.3 the set {x_t; x ∈ A} is relatively compact in M(S) for every t > 0. By Theorem A2.2 it remains to verify (2), where w is defined in terms of the complete metric ρ in (3). If (2) fails, then either we may choose some x^n ∈ A and t_n → 0 with lim sup_n ρ(x^n_{t_n}, x^n_0) > 0, or else there exist some x^n ∈ A and some bounded s_n ≤ t_n ≤ u_n with u_n − s_n → 0 such that
lim sup_{n→∞} (ρ(x^n_{s_n}, x^n_{t_n}) ∧ ρ(x^n_{t_n}, x^n_{u_n})) > 0.  (4)
In the former case it is clear from (3) that lim sup_n |x^n_{t_n} f_j − x^n_0 f_j| > 0 for some j ∈ ℕ, which contradicts the relative compactness of Af_j. Next assume (4). By (3) there exist some i, j ∈ ℕ such that
lim sup_{n→∞} (|x^n_{s_n} f_i − x^n_{t_n} f_i| ∧ |x^n_{t_n} f_j − x^n_{u_n} f_j|) > 0.  (5)
Now for any a, a′, b, b′ ∈ ℝ, we have
(1/3)(|a| ∧ |b′|) ≤ (|a| ∧ |a′|) ∨ (|b| ∧ |b′|) ∨ (|a + a′| ∧ |b + b′|).
Since the set {f_k} is closed under addition, (5) then implies the same relation with a common i = j.
But then (2) fails for Af_i, which by Theorem A2.2 contradicts the relative compactness of Af_i. Thus, (2) does hold for A, and so A is relatively compact. □

Given an lcscH space S, we introduce the classes G, F, and K of open, closed, and compact subsets, respectively. Here we may consider F as a space in its own right, endowed with the Fell topology generated by the sets {F ∈ F; F ∩ G ≠ ∅} and {F ∈ F; F ∩ K = ∅} for arbitrary G ∈ G and K ∈ K. To describe the corresponding notion of convergence, we may fix a metrization ρ of the topology in S such that every closed ρ-ball is compact.
Theorem A2.5 (Fell topology) Fix any lcscH space S, and let F be the class of closed sets F ⊂ S, endowed with the Fell topology. Then
(i) F is compact, second-countable, and Hausdorff;
(ii) F_n → F in F iff ρ(s, F_n) → ρ(s, F) for all s ∈ S;
(iii) {F ∈ F; F ∩ B ≠ ∅} is universally Borel measurable for every B ∈ S.

Proof: First we show that the Fell topology is generated by the maps F ↦ ρ(s, F), s ∈ S. To see that those mappings are continuous, put B_{s,r} = {t ∈ S; ρ(s, t) < r}, and note that
{F; ρ(s, F) < r} = {F; F ∩ B_{s,r} ≠ ∅},
{F; ρ(s, F) > r} = {F; F ∩ B̄_{s,r} = ∅}.
Here the sets on the right are open, by the definition of the Fell topology and the choice of ρ. Thus, the Fell topology contains the ρ-topology.
To prove the converse, fix any F ∈ F and a net {F_i} ⊂ F with directed index set (I, ≺) such that F_i → F in the ρ-topology. We need to show that convergence holds even in the Fell topology. Then let G ∈ G be arbitrary with F ∩ G ≠ ∅, and fix any s ∈ F ∩ G. Since ρ(s, F_i) → ρ(s, F) = 0, we may further choose some s_i ∈ F_i with ρ(s, s_i) → 0. Since G is open, there exists some i ∈ I such that s_j ∈ G for all j ≻ i. Then also F_j ∩ G ≠ ∅ for all j ≻ i. Next consider any K ∈ K with F ∩ K = ∅. Define r_s = ρ(s, F)/2 for each s ∈ K and put G_s = B_{s,r_s}. Since K is compact, it is covered by finitely many balls G_{s_k}. For each k we have ρ(s_k, F_i) → ρ(s_k, F) = 2 r_{s_k}, and so there exists some i_k ∈ I such that F_j ∩ G_{s_k} = ∅ for all j ≻ i_k. Letting i ∈ I be such that i ≻ i_k for all k, it is clear that F_j ∩ K = ∅ for all j ≻ i.
Now we fix any countable dense set D ⊂ S, and assume that ρ(s, F_i) → ρ(s, F) for all s ∈ D. For any s, s′ ∈ S we have
|ρ(s, F_j) − ρ(s, F)| ≤ |ρ(s′, F_j) − ρ(s′, F)| + 2ρ(s, s′).
Given any s and ε > 0, we can make the left-hand side < ε by choosing an s′ ∈ D with ρ(s, s′) < ε/3 and then an i ∈ I such that |ρ(s′, F_j) − ρ(s′, F)| < ε/3 for all j ≻ i.
This shows that the Fell topology is also generated by the mappings F ↦ ρ(s, F) with s restricted to D. But then F is homeomorphic to a subset of ℝ̄_+^∞, which is second-countable and metrizable.
To prove that F is compact, it is now enough to show that every sequence (F_n) ⊂ F contains a convergent subsequence. Then choose a subsequence such that ρ(s, F_n) converges in ℝ̄_+ for all s ∈ D, and hence also for all s ∈ S. Since the family of functions ρ(s, F_n) is equicontinuous, even the limit f is continuous, and so the set F = {s ∈ S; f(s) = 0} is closed. To obtain F_n → F, we need to show that whenever F ∩ G ≠ ∅ or F ∩ K = ∅ for some G ∈ G or K ∈ K, the same relation eventually holds even for F_n. In the former case, we may fix any s ∈ F ∩ G and note that ρ(s, F_n) → f(s) = 0. Hence, we may choose some s_n ∈ F_n with s_n → s, and since s_n ∈ G for large n, we get F_n ∩ G ≠ ∅. In the latter case, we assume instead that F_n ∩ K ≠ ∅ along a subsequence. Then there exist some
s_n ∈ F_n ∩ K, and we note that s_n → s ∈ K along a further subsequence. Here 0 = ρ(s_n, F_n) → ρ(s, F), which yields the contradiction s ∈ F ∩ K. This completes the proof of (i).
To prove (iii), we note that the mapping (s, F) ↦ ρ(s, F) is jointly continuous and hence Borel measurable. Now S and F are both separable, and so the Borel σ-field in S × F agrees with the product σ-field S ⊗ B(F). Since s ∈ F iff ρ(s, F) = 0, it follows that {(s, F); s ∈ F} belongs to S ⊗ B(F). Hence, so does {(s, F); s ∈ F ∩ B} for arbitrary B ∈ S. The assertion now follows by Theorem A1.4. □

We say that a class U ⊂ Ŝ is separating if for any K ⊂ G with K ∈ K and G ∈ G there exists some U ∈ U with K ⊂ U ⊂ G. A preseparating class I ⊂ Ŝ is such that the finite unions of I-sets form a separating class. When S is Euclidean, we typically choose I to be a class of intervals or rectangles and U as the corresponding class of finite unions.

Lemma A2.6 (separation) For any monotone function h: Ŝ → ℝ, the class Ŝ_h = {B ∈ Ŝ; h(B°) = h(B̄)} is separating.

Proof: Fix a metric ρ in S such that every closed ρ-ball is compact, and let K ∈ K and G ∈ G with K ⊂ G. For any ε > 0, define K_ε = {s ∈ S; ρ(s, K) < ε}, and note that K̄_ε ⊂ {s ∈ S; ρ(s, K) ≤ ε}. Since K is compact, we have ρ(K, G^c) > 0, and so K ⊂ K_ε ⊂ G for sufficiently small ε > 0. From the monotonicity of h it is further clear that K_ε ∈ Ŝ_h for almost every ε > 0. □

We often need the separating class to be countable.

Lemma A2.7 (countable separation) Every separating class U ⊂ Ŝ contains a countable separating subclass.

Proof: Fix a countable topological base B ⊂ Ŝ, closed under finite unions. Choose for every B ∈ B some compact sets K_{B,n} ↓ B̄ with K°_{B,n} ⊃ B̄, and then for each pair (B, n) ∈ B × ℕ some set U_{B,n} ∈ U with B̄ ⊂ U_{B,n} ⊂ K_{B,n}. The family {U_{B,n}} is clearly separating.
□

The next result, needed for the proof of Theorem 16.29, relates the vague and Fell topologies for integer-valued measures and their supports. Let N(S) denote the class of locally finite, integer-valued measures on S, and write → for convergence in the Fell topology.

Proposition A2.8 (supports of measures) Let μ, μ_1, μ_2, ... ∈ N(S) with supp μ_n → supp μ, where S is lcscH and μ is simple. Then
lim sup_n (μ_n B ∧ 1) ≤ μB ≤ lim inf_n μ_n B,  B ∈ Ŝ_μ.

Proof: To prove the left inequality, we may assume that μB = 0. Since B ∈ Ŝ_μ, we have even μB̄ = 0, and so B̄ ∩ supp μ = ∅. By convergence of
the supports we get B̄ ∩ supp μ_n = ∅ for large enough n, which implies
lim sup_{n→∞} (μ_n B ∧ 1) ≤ lim sup_{n→∞} μ_n B̄ = 0 = μB.
To prove the right inequality, we may assume that μB = m > 0. Since Ŝ_μ is a separating ring, we may choose a partition B_1, ..., B_m ∈ Ŝ_μ of B such that μB_k = 1 for each k. Then also μB°_k = 1 for each k, and so B°_k ∩ supp μ ≠ ∅. By convergence of the supports we get B°_k ∩ supp μ_n ≠ ∅ for large enough n. Hence,
1 ≤ lim inf_{n→∞} μ_n B°_k ≤ lim inf_{n→∞} μ_n B_k,
and so
μB = m ≤ Σ_k lim inf_{n→∞} μ_n B_k ≤ lim inf_{n→∞} Σ_k μ_n B_k = lim inf_{n→∞} μ_n B. □

To state the next result, fix any metric spaces S_1, S_2, ..., and introduce the product spaces S^(n) = S_1 × ... × S_n and S = S_1 × S_2 × ..., endowed with their product topologies. For any m ≤ n ≤ ∞, let π_m and π_{mn} denote the natural projections of S and S^(n) onto S^(m). The sets A_n ⊂ S^(n), n ∈ ℕ, are said to form a projective sequence if π_{mn} A_n ⊂ A_m for all m ≤ n. We may then define their projective limit in S as the set A = ∩_n π_n^{−1} A_n.

Lemma A2.9 (projective limits) For any metric spaces S_1, S_2, ..., consider a projective sequence of nonempty, compact sets K_n ⊂ S_1 × ... × S_n, n ∈ ℕ. Then the projective limit K = ∩_n π_n^{−1} K_n is again nonempty and compact.

Proof: Since the K_n are nonempty, we may choose some sequences x^n = (x^n_k) ∈ π_n^{−1} K_n, n ∈ ℕ. By the projective property of the sets K_m, we have π_m x^n ∈ K_m for all m ≤ n. In particular, the sequence x^1_m, x^2_m, ... is relatively compact in S_m for each m ∈ ℕ, and by a diagonal argument we may choose a subsequence N′ ⊂ ℕ and an element x = (x_m) ∈ S such that x^n → x as n → ∞ along N′. Then also π_m x^n → π_m x along N′ for each m ∈ ℕ, and since the K_m are closed, we conclude that π_m x ∈ K_m for all m. Thus, we have x ∈ K, which shows that K is nonempty. The compactness of K may be proved by the same argument, where we assume that x^1, x^2, ... ∈ K. □
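The pointwise criterion in Theorem A2.5(ii) is easy to test numerically. The following sketch (not from the book; all names are illustrative) checks that F_n = {1/n, n} converges in the Fell topology to F = {0} on S = ℝ: the point 1/n settles at 0, while the point n escapes to infinity and leaves no trace in the limit.

```python
# Fell convergence via Theorem A2.5(ii): F_n -> F iff
# rho(s, F_n) -> rho(s, F) for every fixed s in S.  Here S = R with
# the Euclidean metric, F_n = {1/n, n}, and F = {0}.

def dist(s, F):
    """Distance from the point s to the finite closed set F."""
    return min(abs(s - t) for t in F)

def fell_gap(s, n):
    """|rho(s, F_n) - rho(s, F)| for F_n = {1/n, n}, F = {0}."""
    return abs(dist(s, [1.0 / n, float(n)]) - dist(s, [0.0]))

# For each fixed s the gap shrinks as n grows.
for s in [-1.0, 0.0, 0.3, 2.5]:
    print(s, fell_gap(s, 10), fell_gap(s, 10000))
```

Compare this with example (a) for the vague topology: in both settings, points (or mass) escaping to infinity simply disappear from the limit.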
Historical and Bibliographical Notes

The following notes were prepared with the modest intentions of tracing the origins of some of the basic ideas in each chapter, of giving precise references for the main results cited in the text, and of suggesting some literature for further reading. No completeness is claimed, and knowledgeable readers are likely to notice misinterpretations and omissions, for which I apologize in advance. A comprehensive history of modern probability theory still remains to be written.

1. Measure Theory - Basic Notions

The first author to consider measures in the modern sense was BOREL (1895, 1898), who constructed Lebesgue measure on the Borel σ-field in ℝ. The corresponding integral was introduced by LEBESGUE (1902, 1904), who also established the dominated convergence theorem. The monotone convergence theorem and Fatou's lemma were later obtained by LEVI (1906a) and FATOU (1906), respectively. LEBESGUE also introduced the higher-dimensional Lebesgue measure and proved a first version of Fubini's theorem, subsequently generalized by FUBINI (1907) and TONELLI (1909). The integration theory was extended to general measures and abstract spaces by many authors, including RADON (1913) and FRECHET (1928).
The norm inequalities in Lemma 1.29 were first noted for finite sums by HOLDER (1889) and MINKOWSKI (1907), respectively, and were later extended to integrals by RIESZ (1910). Part (i) for p = 2 goes back to CAUCHY (1821) for finite sums and to BUNIAKOWSKY (1859) for integrals. The Hilbert space projection theorem can be traced back to LEVI (1906b).
The monotone class Theorem 1.1 was first proved, along with related results, already by SIERPINSKI (1928), but the result was not used in probability theory until DYNKIN (1961). More primitive versions had previously been employed by HALMOS (1950) and DOOB (1953). Most results in this chapter are well known and can be found in any textbook on real analysis.
Many probability texts, including LOEVE (1977) and BILLINGSLEY (1995), contain detailed introductions to measure theory. There are also some excellent texts in real analysis adapted to the needs of probabilists, such as DUDLEY (1989) and DOOB (1994). The former author also provides some more detailed historical information. 
2. Measure Theory - Key Results

As we have seen, BOREL (1895, 1898) was the first to prove the existence of one-dimensional Lebesgue measure. However, the modern construction via outer measures is due to CARATHEODORY (1918). Functions of bounded variation were introduced by JORDAN (1881), who proved that any such function is the difference of two nondecreasing functions. The corresponding decomposition of signed measures was obtained by HAHN (1921). Integrals with respect to nondecreasing functions were defined by STIELTJES (1894), but their importance was not recognized until RIESZ (1909b) proved his representation theorem for linear functionals on C[0, 1]. The a.e. differentiability of a function of bounded variation was first proved by LEBESGUE (1904). VITALI (1905) was the first author to see the connection between absolute continuity and the existence of a density. The Radon-Nikodym theorem was then proved in increasing generality by RADON (1913), DANIELL (1920), and NIKODYM (1930). The idea of a combined proof that also establishes the Lebesgue decomposition is due to VON NEUMANN.
Invariant measures on specific groups were early identified through explicit computation by many authors, notably by HURWITZ (1897) for the case of SO(n). HAAR (1933) proved the existence (but not the uniqueness) of invariant measures on an arbitrary lcscH group. The modern treatment originated with WEIL (1940), and excellent expositions can be found in many books on real or harmonic analysis. Invariant measures on more general spaces are usually approached via quotient spaces. Our discussion in Theorem 2.29 is adapted from ROYDEN (1988).

3. Processes, Distributions, and Independence

The use of countably additive probability measures dates back to BOREL (1909), who constructed random variables as measurable functions on the Lebesgue unit interval and proved Theorem 3.18 for independent events.
CANTELLI (1917) noticed that the "easy" part remains true without the independence assumption. Lemma 3.5 was proved by JENSEN (1906) after HOLDER had obtained a special case. The modern framework, with random variables as measurable functions on an abstract probability space (Ω, A, P) and with expected values as P-integrals over Ω, was used implicitly by KOLMOGOROV from (1928) on and was later formalized in KOLMOGOROV (1933). The latter monograph also contains Kolmogorov's zero-one law, discovered long before HEWITT and SAVAGE (1955) obtained theirs.
Early work in probability theory deals with properties depending only on the finite-dimensional distributions. WIENER (1923) was the first author to construct the distribution of a process as a measure on a function space. The general continuity criterion in Theorem 3.23, essentially due to KOLMOGOROV, was first published by SLUTSKY (1937), with minor extensions later added by LOEVE (1978) and CHENTSOV (1956). The general search for regularity properties was initiated by DOOB (1937, 1947). Soon it became clear, especially through the work of LEVY (1934-35, 1954), DOOB (1951, 1953), and KINNEY (1953), that most processes of interest have right-continuous versions with left-hand limits.
More detailed accounts of the material in this chapter appear in many textbooks, such as in BILLINGSLEY (1995), ITO (1984), and WILLIAMS (1991). Further discussions of specific regularity properties appear in LOEVE (1977) and CRAMER and LEADBETTER (1967). Earlier texts tend to give more weight to distribution functions and their densities, less weight to measures and σ-fields.

4. Random Sequences, Series, and Averages

The weak law of large numbers was first obtained by BERNOULLI (1713) for the sequences named after him. More general versions were then established with increasing rigor by BIENAYME (1853), CHEBYSHEV (1867), and MARKOV (1899). A necessary and sufficient condition for the weak law of large numbers was finally obtained by KOLMOGOROV (1928-29).
KHINCHIN and KOLMOGOROV (1925) studied series of independent, discrete random variables and showed that convergence holds under the condition in Lemma 4.16. KOLMOGOROV (1928-29) then obtained his maximum inequality and showed that the three conditions in Theorem 4.18 are necessary and sufficient for a.s. convergence. The equivalence with convergence in distribution was later noted by LEVY (1954).
The strong law of large numbers for Bernoulli sequences was stated by BOREL (1909), but the first rigorous proof is due to FABER (1910). The simple criterion in Corollary 4.22 was obtained in KOLMOGOROV (1930). In (1933) KOLMOGOROV showed that existence of the mean is necessary and sufficient for the strong law of large numbers for general i.i.d. sequences.
The extension to exponents p ≠ 1 is due to MARCINKIEWICZ and ZYGMUND (1937). Proposition 4.24 was proved in stages by GLIVENKO (1933) and CANTELLI (1933).
RIESZ (1909a) introduced the notion of convergence in measure, for probability measures equivalent to convergence in probability, and showed that it implies a.e. convergence along a subsequence. The weak compactness criterion in Lemma 4.13 is due to DUNFORD (1939). The functional representation of Proposition 4.31 appeared in KALLENBERG (1996a), and Corollary 4.32 was given by STRICKER and YOR (1978).
The theory of weak convergence was founded by ALEXANDROV (1940-43), who proved in particular the so-called Portmanteau Theorem 4.25. The continuous mapping Theorem 4.27 was obtained for a single function f by MANN and WALD (1943) and then in the general case by PROHOROV
(1956) and RUBIN. The coupling Theorem 4.30 is due for complete S to SKOROHOD (1956) and in general to DUDLEY (1968).
More detailed accounts of the material in this chapter may be found in many textbooks, such as in LOEVE (1977) and CHOW and TEICHER (1997). Additional results on random series and a.s. convergence appear in STOUT (1974) and KWAPIEN and WOYCZYNSKI (1992).

5. Characteristic Functions and Classical Limit Theorems

The central limit theorem (a name first used by PÓLYA (1920)) has a long and glorious history, beginning with the work of DE MOIVRE (1733-56), who obtained the now-familiar approximation of binomial probabilities in terms of the normal density function. LAPLACE (1774, 1812-20) stated the general result in the modern integrated form, but his proof was incomplete, as was the proof of CHEBYSHEV (1867, 1890).
The first rigorous proof was given by LIAPOUNOV (1901), though under an extra moment condition. Then LINDEBERG (1922a) proved his fundamental Theorem 5.12, which in turn led to the basic Proposition 5.9 in a series of papers by LINDEBERG (1922b) and LEVY (1922a-c). BERNSTEIN (1927) obtained the first extension to higher dimensions. The general problem of normal convergence, regarded for two centuries as the central (indeed the only) theoretical problem in probability, was eventually solved in the form of Theorem 5.15, independently by FELLER (1935) and LEVY (1935a). Slowly varying functions were introduced and studied by KARAMATA (1930).
Though characteristic functions have been used in probability theory ever since LAPLACE (1812-20), their first use in a rigorous proof of a limit theorem had to wait until LIAPOUNOV (1901). The first general continuity theorem was established by LEVY (1922c), who assumed the characteristic functions to converge uniformly in some neighborhood of the origin. The definitive version in Theorem 5.22 is due to BOCHNER (1933).
Our direct approach to Theorem 5.3 may be new, in avoiding the relatively deep HELLY selection theorem (1911-12). The basic Corollary 5.5 was noted by CRAMER and WOLD (1936).
Introductions to characteristic functions and classical limit theorems may be found in many textbooks, notably LOEVE (1977). FELLER (1971) is a rich source of further information on Laplace transforms, characteristic functions, and classical limit theorems. For more detailed or advanced results on characteristic functions, see LUKACS (1970).
6. Conditioning and Disintegration

Though conditional densities have been computed by statisticians ever since LAPLACE (1774), the first general approach to conditioning was devised by KOLMOGOROV (1933), who defined conditional probabilities and expectations as random variables on the basic probability space, using the Radon-Nikodym theorem, which had recently become available. His original notion of conditioning with respect to a random vector was extended by HALMOS (1950) to general random elements and then by DOOB (1953) to abstract sub-σ-fields.
Our present Hilbert space approach to conditioning, essentially due to VON NEUMANN (1940), is more elementary and intuitive and avoids the use of the relatively deep Radon-Nikodym theorem. It has the further advantage of leading to the attractive interpretation of a martingale as a projective family of random variables.
The existence of regular conditional distributions was studied by several authors, beginning with DOOB (1938). It leads immediately to the familiar disintegration of measures on product spaces and to the frequently used but rarely stated disintegration Theorem 6.4.
Measures on infinite product spaces were first considered by DANIELL (1918-19, 1919-20), who proved the extension Theorem 6.14 for countable product spaces. KOLMOGOROV (1933) extended the result to arbitrary index sets. LOMNICKI and ULAM (1934) noted that no topological assumptions are needed for the construction of infinite product measures, a result that was later extended by C.T. IONESCU TULCEA (1949-50) to measures specified by a sequence of conditional distributions.
The interpretation of the simple Markov property in terms of conditional independence was indicated already by MARKOV (1906), and the formal statement of Proposition 6.6 appears in DOOB (1953). Further properties of conditional independence have been listed by DOHLER (1980) and others.
The transfer Theorem 6.10, in the present form quoted from KALLENBERG (1988), may have been first noted by THORISSON. The traditional Radon-Nikodym approach to conditional expectations appears in many textbooks, such as in BILLINGSLEY (1995).

7. Martingales and Optional Times

Martingales were first introduced by BERNSTEIN (1927, 1937) in his efforts to relax the independence assumption in the classical limit theorems. Both BERNSTEIN and LEVY (1935a-b, 1954) extended Kolmogorov's maximum inequality and the central limit theorem to a general martingale context. The term martingale (originally denoting part of a horse's harness and later used for a special gambling system) was introduced in the probabilistic context by VILLE (1939).
The first martingale convergence theorem was obtained by JESSEN (1934) and LEVY (1935b), both of whom proved Theorem 7.23 for filtrations generated by sequences of independent random variables. A submartingale version of the same result appears in SPARRE-ANDERSEN and JESSEN (1948). The independence assumption was removed by LEVY (1954), who also noted the simple martingale proof of Kolmogorov's zero-one law and obtained his conditional version of the Borel-Cantelli lemma.
The general convergence theorem for discrete-time martingales was proved by DOOB (1940), and the basic regularity theorems for continuous-time martingales first appeared in DOOB (1951). The theory was extended to submartingales by SNELL (1952) and DOOB (1953). The latter book is also the original source of such fundamental results as the martingale closure theorem, the optional sampling theorem, and the L^p-inequality.
Though hitting times have long been used informally, general optional times seem to appear for the first time in DOOB (1936). Abstract filtrations were not introduced until DOOB (1953). Progressive processes were introduced by DYNKIN (1961), and the modern definition of the σ-fields F_τ is due to YUSHKEVICH.
Elementary introductions to martingale theory are given by many authors, including WILLIAMS (1991). More information about the discrete-time case is given by NEVEU (1975) and CHOW and TEICHER (1997). For a detailed account of the continuous-time theory and its relations to Markov processes and stochastic calculus, see DELLACHERIE and MEYER (1975-87).

8. Markov Processes and Discrete-Time Chains

Markov chains in discrete time and with finitely many states were introduced by MARKOV (1906), who proved the first ergodic theorem, assuming the transition probabilities to be strictly positive. KOLMOGOROV (1936a-b) extended the theory to countable state spaces and arbitrary transition probabilities.
In particular, he noted the decomposition of the state space into irreducible sets, classified the states with respect to recurrence and periodicity, and described the asymptotic behavior of the n-step transition probabilities. Kolmogorov's original proofs were analytic. The more intuitive coupling approach was introduced by DOEBLIN (1938), long before the strong Markov property had been formalized.
BACHELIER had noted the connection between random walks and diffusions, which inspired KOLMOGOROV (1931a) to give a precise definition of Markov processes in continuous time. His treatment is purely analytic, with the distribution specified by a family of transition kernels satisfying the Chapman-Kolmogorov relation, previously noted in special cases by CHAPMAN (1928) and SMOLUCHOVSKY.
KOLMOGOROV (1931a) makes no reference to sample paths. The transition to probabilistic methods began with the work of LEVY (1934-35) and DOEBLIN (1938). Though the strong Markov property was used informally
by those authors (and indeed already by BACHELIER (1900, 1901)), the result was first stated and proved in a special case by DOOB (1945). General filtrations were introduced in Markov process theory by BLUMENTHAL (1957). The modern setup, with a canonical process X defined on the path space Ω, equipped with a filtration F, a family of shift operators θ_t, and a collection of probability measures P_x, was developed systematically by DYNKIN (1961, 1965). A weaker form of Theorem 8.23 appears in BLUMENTHAL and GETOOR (1968), and the present version is from KALLENBERG (1987, 1998).
Elementary introductions to Markov processes appear in many textbooks, such as ROGERS and WILLIAMS (2000a) and CHUNG (1982). More detailed or advanced accounts are given by DYNKIN (1965), BLUMENTHAL and GETOOR (1968), ETHIER and KURTZ (1986), DELLACHERIE and MEYER (1975-87), and SHARPE (1988). FELLER (1968) gives a masterly introduction to Markov chains, later imitated by many authors. More detailed accounts of the discrete-time theory appear in KEMENY et al. (1966) and FREEDMAN (1971a). The coupling method fell into oblivion after Doeblin's untimely death in 1940 but has recently enjoyed a revival, meticulously documented by LINDVALL (1992) and THORISSON (2000).

9. Random Walks and Renewal Theory

Random walks originally arose in a wide range of applications, such as gambling, queuing, storage, and insurance; their history can be traced back to the origins of probability. The approximation of diffusion processes by random walks dates back to BACHELIER (1900, 1901). A further application was to potential theory, where in the 1920s a method of discrete approximation was devised, admitting a probabilistic interpretation in terms of a simple symmetric random walk. Finally, random walks played an important role in the sequential analysis developed by WALD (1947).
The modern theory began with PÓLYA's (1921) discovery that a simple symmetric random walk on ℤ^d is recurrent for d ≤ 2 and transient otherwise. His result was later extended to Brownian motion by LEVY (1940) and KAKUTANI (1944a). The general recurrence criterion in Theorem 9.4 was derived by CHUNG and FUCHS (1951), and the probabilistic approach to Theorem 9.2 was found by CHUNG and ORNSTEIN (1962). The first condition in Corollary 9.7 is, in fact, even necessary for recurrence, as was noted independently by ORNSTEIN (1969) and C.J. STONE (1969).
The reflection principle was first used by ANDRÉ (1887) in his discussion of the ballot problem. The systematic study of fluctuation and absorption problems for random walks began with the work of POLLACZEK (1930). Ladder times and heights, first introduced by BLACKWELL, were explored in an influential paper by FELLER (1949). The factorizations in Theorem 9.15 were originally derived by the Wiener-Hopf technique, which had been developed by PALEY and WIENER (1934) as a general tool in Fourier analysis.
Theorem 9.16 is due for u = 0 to SPARRE-ANDERSEN (1953-54) and in general to BAXTER (1961). The former author used complicated combinatorial methods, which were later simplified by FELLER and others. Though renewals in Markov chains are implicit already in some early work of KOLMOGOROV and LÉVY, the general renewal process was apparently first introduced by PALM (1943). The first renewal theorem was obtained by ERDŐS et al. (1949) for random walks on ℤ+. In that case, however, CHUNG noted that the result is an easy consequence of KOLMOGOROV'S (1936a-b) ergodic theorem for Markov chains on a countable state space. BLACKWELL (1948, 1953) extended the result to random walks on ℝ+. The ultimate version for transient random walks on ℝ is due to FELLER and OREY (1961). The first coupling proof of Blackwell's theorem was given by LINDVALL (1977). Our proof is a modification of an argument by ATHREYA et al. (1978), which originally did not cover all cases. The method seems to require the existence of a possibly infinite mean. An analytic approach to the general case appears in FELLER (1971). Elementary introductions to random walks are given by many authors, including CHUNG (1974), FELLER (1968, 1971), and LOÈVE (1977). A detailed exposition of random walks on ℤd is given by SPITZER (1976).

10. Stationary Processes and Ergodic Theory

The history of ergodic theory dates back to BOLTZMANN'S (1887) work in statistical mechanics. Boltzmann's ergodic hypothesis, the conjectural equality between time and ensemble averages, was long accepted as a heuristic principle. In probabilistic terms it amounts to the convergence t^{-1} ∫_0^t f(X_s) ds → Ef(X_0), where X_t represents the state of the system (typically the configuration of all molecules in a gas) at time t, and the expected value is computed with respect to a suitably invariant probability measure on a compact submanifold of the state space.
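The time-average convergence above is easy to watch numerically in a deliberately simple ergodic system. The sketch below is our own illustration (the choice of map and function is ours, not the text's): rotation of the circle by the golden ratio is uniquely ergodic for Lebesgue measure, so for f the indicator of [0, 1/2) the time average along any orbit converges to the space average 1/2.

```python
# Time average vs. space average for the ergodic map x -> x + alpha (mod 1),
# alpha irrational. Space average of f = 1_{[0,1/2)} under Lebesgue measure
# is 1/2; by unique ergodicity every orbit's time average converges to it.
alpha = (5 ** 0.5 - 1) / 2   # golden-ratio rotation, irrational
x = 0.1                      # arbitrary starting point
n_steps = 200_000
hits = 0
for _ in range(n_steps):
    hits += x < 0.5          # evaluate f along the orbit
    x = (x + alpha) % 1.0
time_average = hits / n_steps
print(time_average)          # close to the space average 1/2
```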
The ergodic hypothesis was sensationally proved as a mathematical theorem, first in an L²-version by VON NEUMANN (1932), after KOOPMAN (1931) had noted the connection between measure-preserving transformations and unitary operators on a Hilbert space, and shortly afterwards in the pointwise form of BIRKHOFF (1932). The initially quite intricate proof of the latter was simplified in stages: first by YOSIDA and KAKUTANI (1939), who noted how the result follows easily from the maximal ergodic Lemma 10.7, and then by GARSIA (1965), who gave a short proof of the latter result. KHINCHIN (1933, 1934) pioneered a translation of the results of ergodic theory into the probabilistic setting of stationary sequences and processes. The first multivariate ergodic theorem was obtained by WIENER (1939), who proved Theorem 10.14 in the special case of averages over concentric balls. More general versions were established by many authors, including DAY (1942) and PITT (1942). The classical methods were pushed to the
limit in a notable paper by TEMPEL'MAN (1972). NGUYEN and ZESSIN (1979) proved versions of the theorem for finitely additive set functions. The first ergodic theorem for noncommutative transformations was obtained by ZYGMUND (1951). SUCHESTON (1983) noted that the statement follows easily from MAKER'S (1940) result. In Lemma 10.15, part (i) is due to ROGERS and SHEPHARD (1958); part (ii) is elementary. The ergodic theorem for random matrices was proved by FURSTENBERG and KESTEN (1960), long before the subadditive ergodic theorem became available. The latter result was originally proved by KINGMAN (1968) under the stronger hypothesis that the array (X_{m,n}) be jointly stationary in m and n. The present extension and shorter proof are due to LIGGETT (1985). The ergodic decomposition of invariant measures dates back to KRYLOV and BOGOLIOUBOV (1937), though the basic role of the invariant σ-field was not recognized until the work of FARRELL (1962) and VARADARAJAN (1963). The connection between ergodic decompositions and sufficient statistics is explored in an elegant paper by DYNKIN (1978). The traditional approach to the subject is via Choquet theory, as surveyed by DELLACHERIE and MEYER (1975-87). The coupling equivalences in Theorem 10.27 (i) were proved by S. GOLDSTEIN (1979), after GRIFFEATH (1975) had obtained a related result for Markov chains. The shift coupling part of the same theorem was established by BERBEE (1979) and ALDOUS and THORISSON (1993), and the version for abstract groups was then obtained by THORISSON (1996). The latter author surveyed the whole area in (2000). Elementary introductions to stationary processes have been given by many authors, beginning with DOOB (1953) and CRAMÉR and LEADBETTER (1967). LOÈVE (1978) contains a more advanced account of probabilistic ergodic theory. A modern and comprehensive survey of the vast area of general ergodic theorems is given by KRENGEL (1985).

11.
Related Notions of Symmetry and Invariance

Palm distributions are named after the Swedish engineer PALM (1943), who in a pioneering study of intensity fluctuations in telephone traffic considered some basic Palm probabilities associated with simple, stationary point processes on ℝ, using an elementary conditioning approach. Palm also derived some primitive inversion formulas. An extended and more rigorous account of Palm's ideas was given by KHINCHIN (1955), in a monograph on queuing theory. Independently of Palm's work, KAPLAN (1955) first obtained Theorem 11.4 as an extension of some results for renewal processes by DOOB (1948). A partial discrete-time result in this direction had already been noted by KAC (1947). Kaplan's result was rediscovered in the setting of Palm distributions, independently by RYLL-NARDZEWSKI (1961) and SLIVNYAK (1962). In the special case of intervals on the real line, Theorem 11.5 (i) was
first noted by KOROLYUK (as cited by KHINCHIN (1955)), and part (iii) of the same theorem was obtained by RYLL-NARDZEWSKI (1961). The general versions are due to KÖNIG and MATTHES (1963) and MATTHES (1963) for d = 1 and to MATTHES et al. (1978) for d > 1. A more primitive setwise version of Theorem 11.8 (i), due to SLIVNYAK (1962), was strengthened by ZÄHLE (1980) to convergence in total variation. DE FINETTI (1930, 1937) proved that an infinite sequence of exchangeable random variables is mixed i.i.d. The result became a cornerstone in his theory of subjective probability and Bayesian statistics. RYLL-NARDZEWSKI (1957) noted that the theorem remains valid under the weaker hypothesis of spreadability, and BÜHLMANN (1960) extended the result to continuous time. The predictable sampling property in Theorem 11.13 was first noted by DOOB (1936) for i.i.d. random variables and increasing sequences of predictable times. The general result and its continuous-time counterpart appear in KALLENBERG (1988). SPARRE-ANDERSEN'S (1953-54) announcement of his Corollary 11.14 was (according to Feller) "a sensation greeted with incredulity, and the original proof was of an extraordinary intricacy and complexity." A simplified argument (different from ours) appears in FELLER (1971). Lemma 11.9 is quoted from KALLENBERG (1999b). BERTRAND (1887) noted that if two candidates A and B in an election get the proportions p and 1 − p of the votes, then the probability that A will lead throughout the counting of ballots equals (2p − 1) ∨ 0. More general "ballot theorems" and alternative proofs have been discovered by many authors, beginning with ANDRÉ (1887) and BARBIER (1887). TAKÁCS (1967) obtained the version for cyclically stationary processes on a finite interval and gave numerous applications to queuing theory. The present statement is cited from KALLENBERG (1999a).
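Bertrand's formula can be checked by brute force for small vote counts: if A receives a votes and B receives b ≤ a, the probability that A leads strictly throughout the count is (a − b)/(a + b), which equals 2p − 1 for p = a/(a + b). The script below is our own illustration, enumerating every ordering of the ballots.

```python
from fractions import Fraction
from itertools import combinations

def lead_probability(a, b):
    """Probability that A stays strictly ahead during the whole count,
    computed by enumerating all orderings of a A-votes and b B-votes."""
    n = a + b
    good = total = 0
    for a_positions in combinations(range(n), a):
        total += 1
        pos = set(a_positions)
        tally = 0
        for i in range(n):
            tally += 1 if i in pos else -1
            if tally <= 0:       # A not strictly ahead at step i
                break
        else:
            good += 1            # A led at every step of the count
    return Fraction(good, total)

print(lead_probability(3, 2))    # (3-2)/(3+2) = 1/5
```

For a = b the tally returns to zero at the end, so the probability is 0, matching (2p − 1) ∨ 0 at p = 1/2.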
The first version of Theorem 11.18 was obtained by SHANNON (1948), who proved the convergence in probability for stationary and ergodic Markov chains in a finite state space. The Markovian restriction was lifted by McMILLAN (1953), who also strengthened the result to convergence in L¹. CARLESON (1958) extended McMillan's result to countable state spaces. The a.s. convergence is due to BREIMAN (1957-60) and A. IONESCU TULCEA (1960) for finite state spaces and to CHUNG (1961) for the countable case. More information about Palm measures is available in MATTHES et al. (1978), DALEY and VERE-JONES (1988), and THORISSON (2000). Applications to queuing theory and other areas are discussed by many authors, including FRANKEN et al. (1981) and BACCELLI and BRÉMAUD (1994). ALDOUS (1985) gives a comprehensive survey of exchangeability theory. A nice introduction to information theory is given by BILLINGSLEY (1965).
12. Poisson and Pure Jump-Type Markov Processes

The Poisson distribution was introduced by DE MOIVRE (1711-12) and POISSON (1837) as an approximation to the binomial distribution. The associated process arose much later from miscellaneous applications. Thus, it was considered by LUNDBERG (1903) to model streams of insurance claims, by RUTHERFORD and GEIGER (1908) to describe the process of radioactive decay, and by ERLANG (1909) to model the incoming traffic to a telephone exchange. Poisson random measures in higher dimensions appear implicitly in the work of LÉVY (1934-35), whose treatment was later formalized by ITÔ (1942b). The independent-increment characterization of Poisson processes goes back to ERLANG (1909) and LÉVY (1934-35). Cox processes, originally introduced by COX (1955) under the name of doubly stochastic Poisson processes, were thoroughly explored by KINGMAN (1964), KRICKEBERG (1972), and GRANDELL (1976). Thinnings were first considered by RÉNYI (1956). The binomial construction of general Poisson processes was noted independently by KINGMAN (1967) and MECKE (1967). One-dimensional uniqueness criteria were obtained, first in the Poisson case by RÉNYI (1967), and then in general by MÖNCH (1971), KALLENBERG (1973a, 1986), and GRANDELL (1976). The mixed Poisson and binomial processes were studied extensively by MATTHES et al. (1978) and KALLENBERG (1986). Markov chains in continuous time have been studied by many authors, beginning with KOLMOGOROV (1931a). The transition functions of general pure jump-type Markov processes were explored by POSPÍŠIL (1935-36) and FELLER (1936, 1940), and the corresponding sample path properties were examined by DOEBLIN (1939b) and DOOB (1942b). The first continuous-time version of the strong Markov property was obtained by DOOB (1945). KINGMAN (1993) gives an elementary introduction to Poisson processes with numerous applications.
More detailed accounts, set in the context of general random measures and point processes, appear in MATTHES et al. (1978), KALLENBERG (1986), and DALEY and VERE-JONES (1988). Introductions to continuous-time Markov chains are provided by many authors, beginning with FELLER (1968). For a more comprehensive account, see CHUNG (1960). The underlying regenerative structure was examined by KINGMAN (1972).

13. Gaussian Processes and Brownian Motion

The Gaussian density function first appeared in the work of DE MOIVRE (1733-56), and the corresponding distribution became explicit through the work of LAPLACE (1774, 1812-20). The Gaussian law was popularized by GAUSS (1809) in his theory of errors and so became named after him. MAXWELL derived the Gaussian law as the velocity distribution for the molecules in a gas, assuming the hypotheses of Proposition 13.2. Theorem
13.3 was originally stated by SCHOENBERG (1938) as a relation between positive definite and completely monotone functions; the probabilistic interpretation was later noted by FREEDMAN (1962-63). Isonormal Gaussian processes were introduced by SEGAL (1954). The process of Brownian motion was introduced by BACHELIER (1900, 1901) to model fluctuations on the stock market. Bachelier discovered some basic properties of the process, such as the relation M_t =d |B_t|. EINSTEIN (1905, 1906) later introduced the same process as a model for the physical phenomenon of Brownian motion: the irregular movement of microscopic particles suspended in a liquid. The latter phenomenon, first noted by VAN LEEUWENHOEK in the seventeenth century, is named after the botanist BROWN (1828) for his systematic observations of pollen grains. Einstein's theory was forwarded in support of the still-controversial molecular theory of matter. A more refined model for the physical Brownian motion was proposed by LANGEVIN (1909) and ORNSTEIN and UHLENBECK (1930). The mathematical theory of Brownian motion was put on a rigorous basis by WIENER (1923), who constructed the associated distribution as a measure on the space of continuous paths. The significance of Wiener's revolutionary paper was not fully recognized until after the pioneering work of KOLMOGOROV (1931a, 1933), LÉVY (1934-35), and FELLER (1936). Wiener also introduced stochastic integrals of deterministic L²-functions, which were later studied in further detail by PALEY et al. (1933). The spectral representation of stationary processes, originally deduced from BOCHNER'S (1932) theorem by CRAMÉR (1942), was later recognized as equivalent to a general Hilbert space result due to M.H. STONE (1932). The chaos expansion of Brownian functionals was discovered by WIENER (1938), and the theory of multiple integrals with respect to Brownian motion was developed in a seminal paper of ITÔ (1951c).
The law of the iterated logarithm was discovered by KHINCHIN, first (1923, 1924) for Bernoulli sequences, and later (1933) for Brownian motion. A systematic study of the Brownian paths was initiated by LÉVY (1954, 1965), who proved the existence of the quadratic variation in (1940) and the arcsine laws in (1939, 1965). Though many proofs of the latter have since been given, the present deduction from basic symmetry properties may be new. The strong Markov property was used implicitly in the work of Lévy and others, but the result was not carefully stated and proved until HUNT (1956). Many modern probability texts contain detailed introductions to Brownian motion. The books by ITÔ and McKEAN (1965), FREEDMAN (1971b), KARATZAS and SHREVE (1991), and REVUZ and YOR (1999) provide a wealth of further information on the subject. Further information on multiple Wiener-Itô integrals is given by KALLIANPUR (1980), DELLACHERIE et al. (1992), and NUALART (1995). The advanced theory of Gaussian distributions is nicely surveyed by ADLER (1990).
14. Skorohod Embedding and Invariance Principles

The first functional limit theorems were obtained in (1931b, 1933a) by KOLMOGOROV, who considered special functionals of a random walk. ERDŐS and KAC (1946, 1947) conceived the idea of an invariance principle that would allow functional limit theorems to be extended from particular cases to a general setting. They also treated some special functionals of a random walk. The first general functional limit theorems were obtained by DONSKER (1951-52) for random walks and empirical distribution functions, following an idea of DOOB (1949). A general theory based on sophisticated compactness arguments was later developed by PROHOROV (1956) and others. SKOROHOD's (1965) embedding theorem provided a new and probabilistic approach to Donsker's theorem. Extensions to the martingale context were obtained by many authors, beginning with DUBINS (1968). Lemma 14.19 appears in DVORETZKY (1972). Donsker's weak invariance principle was supplemented by a strong version due to STRASSEN (1964), which yields extensions of many a.s. limit theorems for Brownian motion to suitable random walks. In particular, his result yields a simple proof of the HARTMAN and WINTNER (1941) law of the iterated logarithm, which had originally been deduced from some deep results of KOLMOGOROV (1929). BILLINGSLEY (1968) gives many interesting applications and extensions of Donsker's theorem. For a wide range of applications of the martingale embedding theorem, see HALL and HEYDE (1980) and DURRETT (1995). KOMLÓS et al. (1975-76) showed that the approximation rate in the Skorohod embedding can be improved by a more delicate "strong approximation." For an exposition of their work and its numerous applications, see CSÖRGŐ and RÉVÉSZ (1981).

15.
Independent-Increment Processes and Approximation

Until the 1920s, Brownian motion and the Poisson process were essentially the only known processes with independent increments. In (1924, 1925) LÉVY introduced the stable distributions and noted that they too could be associated with suitable "decomposable" processes. DE FINETTI (1929) saw the general connection between processes with independent increments and infinitely divisible distributions and posed the problem of characterizing the latter. A partial solution for distributions with a finite second moment was found by KOLMOGOROV (1932). The complete solution was obtained in a revolutionary paper by LÉVY (1934-35), where the "decomposable" processes are analyzed by a virtuosic blend of analytic and probabilistic methods, leading to an explicit description in terms of a jump and a diffusion component. As a byproduct, Lévy
obtained the general representation for the associated characteristic functions. His analysis was so complete that only improvements in detail have since been possible. In particular, ITÔ (1942b) showed how the jump component can be expressed in terms of Poisson integrals. Analytic derivations of the representation formula for the characteristic function were later given by LÉVY (1954) himself, by FELLER (1937), and by KHINCHIN (1937). The scope of the classical central limit problem was broadened by LÉVY (1925) to a general study of suitably normalized partial sums, obtained from a single sequence of independent random variables. To include the case of the classical Poisson approximation, KOLMOGOROV proposed a further extension to general triangular arrays, subject to the sole condition of uniformly asymptotically negligible elements. In this context, FELLER (1937) and KHINCHIN (1937) proved independently that the limiting distributions are infinitely divisible. It remained to characterize the convergence to specific limits, a problem that had already been solved in the Gaussian case by FELLER (1935) and LÉVY (1935a). The ultimate solution was obtained independently by DOEBLIN (1939) and GNEDENKO (1939), and a comprehensive exposition of the theory was published by GNEDENKO and KOLMOGOROV (1968). The basic convergence Theorem 15.17 for Lévy processes and the associated approximation result for random walks in Corollary 15.20 are essentially due to SKOROHOD (1957), though with rather different statements and proofs. Lemma 15.22 appears in DOEBLIN (1939a). Our approach to the basic representation theorem is a modernized version of Lévy's proof, with simplifications resulting from the use of basic point process and martingale methods. Detailed accounts of the basic limit theory for null arrays are provided by many authors, including LOÈVE (1977) and FELLER (1971). The positive case is treated in KALLENBERG (1986).
A modern introduction to Lévy processes is given by BERTOIN (1996). General independent-increment processes and associated limit theorems are treated in JACOD and SHIRYAEV (1987). Extreme value theory is surveyed by LEADBETTER et al. (1983).

16. Convergence of Random Processes, Measures, and Sets

After DONSKER (1951-52) had proved his functional limit theorems for random walks and empirical distribution functions, a general theory of weak convergence in function spaces was developed by the Russian school, in seminal papers by PROHOROV (1956), SKOROHOD (1956, 1957), and KOLMOGOROV (1956). Thus, PROHOROV (1956) proved his fundamental compactness Theorem 16.3, in the setting of separable and complete metric spaces. The abstract theory was later extended in various directions by
Historical and Bibliographical Notes 583 LE CAM (1957), VARADARAJAN (1958), and DUDLEY (1966, 1967). The elementary inequality of OTTAVIANI is from (1939). Originally SKOROHOD (1956) considered the space D([O,l]) endowed with four different topologies, of which the J1-topology considered here is by far the most important for applications. The theory was later ex- tended to D(R+) by C.J. STONE (1963) and LINDVALL (1973). Tightness was originally verified by means of various product moment conditions, de- veloped by CHENTSOV (1956) and BILLINGSLEY (1968), before the powerful criterion of ALDOUS (1978) became available. KURTZ (1)75) and MITOMA (1983) noted that criteria for tightness in D(IR+, S) can often be expressed in terms of one-dimensional projections, as in Theorem 16.27. The weak convergence theory for random measures and point processes originated with PROHOROV (1961), who noted the equivalence of (i) and (ii) in Theorem 16.16 when S is compact. The development continued with seminal papers by DEBES et al. (1970-71), HARRIS (1971), and JAGERS (1974). The one-dimensional criteria in Proposition 16.17 and Theorems 16.16 and 16.29 are based on results in KALLENBERG (1973a, 1986, 1996b) and a subsequent remark by KURTZ. Random sets had already been stud- ied extensively by many authors, including CHOQUET (1953-54), KENDALL (1974), and MATHERON (1975), when an associated weak convergence theory was developed by NORBERG (1984). The applications considered in this chapter have a long history. Thus, primitive versions of Theorem 16.18 were obtained by PALM (1943), KHIN- CHIN (1955), and OSOSKOV (1956). The present version is due for S =  to GRIGELIONIS (1963) and for more general spaces to GOLDMAN (1967) and JAGERS (1972). Limit theorems under simultaneous thinning and reseal- ing of a given point process were obtained by RENYI (1956), NAWROTZKI (1962), BELYAEV (1963), and GOLDMAN (1967). 
The general version in Theorem 16.19 was proved by KALLENBERG (1986) after MECKE (1968) had obtained his related characterization of Cox processes. Limit theorems for sampling from a finite population and for general exchangeable sequences have been proved in varying generality by many authors, including CHERNOV and TEICHER (1958), HÁJEK (1960), ROSÉN (1964), BILLINGSLEY (1968), and HAGBERG (1973). The results of Theorems 16.23 and 16.21 first appeared in KALLENBERG (1973b). Detailed accounts of weak convergence theory and its applications may be found in several excellent textbooks and monographs, including BILLINGSLEY (1968), POLLARD (1984), ETHIER and KURTZ (1986), and JACOD and SHIRYAEV (1987). More information on limit theorems for random measures and point processes is available in MATTHES et al. (1978) and KALLENBERG (1986). A good general reference for random sets is MATHERON (1975).
17. Stochastic Integrals and Quadratic Variation

The first stochastic integral with a random integrand was defined by ITÔ (1942a, 1944), who used Brownian motion as the integrator and assumed the integrand to be product measurable and adapted. DOOB (1953) noted the connection with martingale theory. A first version of the fundamental substitution rule was proved by ITÔ (1951a). The result was later extended by many authors. The compensated integral in Corollary 17.21 was introduced by FISK, and independently by STRATONOVICH (1966). The existence of the quadratic variation process was originally deduced from the Doob-Meyer decomposition. FISK (1966) showed how the quadratic variation can also be obtained directly from the process, as in Proposition 17.17. The present construction was inspired by ROGERS and WILLIAMS (2000b). The BDG inequalities were originally proved for p > 1 and discrete time by BURKHOLDER (1966). MILLAR (1968) noted the extension to continuous martingales, in which context the further extension to arbitrary p > 0 was obtained independently by BURKHOLDER and GUNDY (1970) and NOVIKOV (1971). KUNITA and WATANABE (1967) introduced the covariation of two martingales and proved the associated characterization of the integral. They further established some general inequalities related to Proposition 17.9. The Itô integral was extended to square-integrable martingales by COURRÈGE (1962-63) and KUNITA and WATANABE (1967) and to continuous semimartingales by DOLÉANS-DADE and MEYER (1970). The idea of localization is due to ITÔ and WATANABE (1965). Theorem 17.24 was obtained by KAZAMAKI (1972) as part of a general theory of random time change. Stochastic integrals depending on a parameter were studied by DOLÉANS (1967b) and STRICKER and YOR (1978), and the functional representation of Proposition 17.26 first appeared in KALLENBERG (1996a).
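The fact that Brownian motion on [0, t] has quadratic variation t is easy to watch in simulation: over a partition with n equal steps, the sum of squared increments has mean t and variance 2t²/n, so it concentrates around t as the mesh shrinks. The sketch below is our own illustration, not the construction of Proposition 17.17.

```python
import random

random.seed(1)

def quadratic_variation(t=1.0, n=100_000):
    """Sum of squared increments of a simulated Brownian path on [0, t]
    over n equal steps; each increment is N(0, t/n), so the sum has
    mean t and variance 2 t^2 / n."""
    dt = t / n
    sd = dt ** 0.5
    return sum(random.gauss(0.0, sd) ** 2 for _ in range(n))

qv = quadratic_variation()
print(qv)   # close to t = 1
```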
Elementary introductions to Itô integration appear in many textbooks, such as CHUNG and WILLIAMS (1983) and ØKSENDAL (1998). For more advanced accounts and for further information, see IKEDA and WATANABE (1989), ROGERS and WILLIAMS (2000b), KARATZAS and SHREVE (1991), and REVUZ and YOR (1999).

18. Continuous Martingales and Brownian Motion

The fundamental characterization of Brownian motion in Theorem 18.3 was proved by LÉVY (1954), who also (1940) noted the conformal invariance up to a time change of complex Brownian motion and stated the polarity of singletons. A rigorous proof of Theorem 18.6 was later provided by KAKUTANI (1944a-b). KUNITA and WATANABE (1967) gave the first modern proof of Lévy's characterization theorem, based on Itô's formula and exponential martingales. The history of the latter can be traced back to the seminal CAMERON and MARTIN (1944) paper, the source of Theorem
18.22, and to WALD'S (1946, 1947) work in sequential analysis, where the identity of Lemma 18.24 first appeared in a version for random walks. The integral representation in Theorem 18.10 is essentially due to ITÔ (1951c), who noted its connection with multiple stochastic integrals and chaos expansions. A one-dimensional version of Theorem 18.12 appears in DOOB (1953). The general time-change Theorem 18.4 was discovered independently by DAMBIS (1965) and DUBINS and SCHWARZ (1965), and a systematic study of isotropic martingales was initiated by GETOOR and SHARPE (1972). The multivariate result in Proposition 18.8 was noted by KNIGHT (1971), and a version of Proposition 18.9 for general exchangeable processes appears in KALLENBERG (1989). The skew-product representation in Corollary 18.7 is due to GALMARINO (1963). The Cameron-Martin theorem was gradually extended to more general settings by many authors, including MARUYAMA (1954, 1955), GIRSANOV (1960), and VAN SCHUPPEN and WONG (1974). The martingale criterion of Theorem 18.23 was obtained by NOVIKOV (1972). The material in this chapter is covered by many texts, including the excellent monographs by KARATZAS and SHREVE (1991) and REVUZ and YOR (1999). A more advanced and amazingly informative text is JACOD (1979).

19. Feller Processes and Semigroups

Semigroup ideas are implicit in KOLMOGOROV'S pioneering (1931a) paper, whose central theme is the search for local characteristics that will determine the transition probabilities through a system of differential equations, the so-called Kolmogorov forward and backward equations. Markov chains and diffusion processes were originally treated separately, but in (1935) KOLMOGOROV proposed a unified framework, with transition kernels regarded as operators (initially operating on measures rather than on functions), and with local characteristics given by an associated generator.
Kolmogorov's ideas were taken up by FELLER (1936), who obtained general existence and uniqueness results for the forward and backward equations. The abstract theory of contraction semigroups on Banach spaces was developed independently by HILLE (1948) and YOSIDA (1948), both of whom recognized its significance for the theory of Markov processes. The power of the semigroup approach became clear through the work of FELLER (1952, 1954), who gave a complete description of the generators of one-dimensional diffusions. In particular, Feller characterizes the boundary behavior of the process in terms of the domain of the generator. The systematic study of Markov semigroups began with the work of DYNKIN (1955a). The standard approach is to postulate strong continuity instead of the weaker and more easily verified condition (F2). The positive maximum principle appears in the work of ITÔ (1957), and the core condition of Proposition 19.9 is due to S. WATANABE (1968).
The first regularity theorem was obtained by DOEBLIN (1939b), who gave conditions for the paths to be step functions. A sufficient condition for continuity was then obtained by FORTET (1943). Finally, KINNEY (1953) showed that any Feller process has a version with rcll paths, after DYNKIN (1952) had obtained the same property under a Hölder condition. The use of martingale methods for the study of Markov processes dates back to KINNEY (1953) and DOOB (1954). The strong Markov property for Feller processes was proved independently by DYNKIN and YUSHKEVICH (1956) and by BLUMENTHAL (1957) after special cases had been considered by DOOB (1945), HUNT (1956), and RAY (1956). BLUMENTHAL'S (1957) paper also contains his zero-one law. DYNKIN (1955a) introduced his "characteristic operator," and a version of Theorem 19.24 appears in DYNKIN (1956). There is a vast literature on approximation results for Markov chains and Markov processes, covering a wide range of applications. The use of semigroup methods to prove limit theorems can be traced back to LINDEBERG'S (1922a) proof of the central limit theorem. The general results in Theorems 19.25 and 19.28 were developed in stages by TROTTER (1958a), SOVA (1967), KURTZ (1969, 1975), and MACKEVICIUS (1974). Our proof of Theorem 19.25 uses ideas from J.A. GOLDSTEIN (1976). A splendid introduction to semigroup theory is given by the relevant chapters in FELLER (1971). In particular, Feller shows how the one-dimensional Lévy-Khinchin formula and associated limit theorems can be derived by semigroup methods. More detailed and advanced accounts of the subject appear in DYNKIN (1965), ETHIER and KURTZ (1986), and DELLACHERIE and MEYER (1975-87).

20. Ergodic Properties of Markov Processes

The first ratio ergodic theorems were obtained by DOEBLIN (1938b), DOOB (1938, 1948a), KAKUTANI (1940), and HUREWICZ (1944).
HOPF (1954) and DUNFORD and SCHWARTZ (1956) extended the pointwise ergodic theorem to general L¹-L∞-contractions, and the ratio ergodic theorem was extended to positive L¹-contractions by CHACON and ORNSTEIN (1960). The present approach to their result is due to AKCOGLU and CHACON (1970). The notion of Harris recurrence goes back to DOEBLIN (1940) and HARRIS (1956). The latter author used the condition to ensure the existence, in discrete time, of a σ-finite invariant measure. A corresponding continuous-time result was obtained by H. WATANABE (1964). The total variation convergence of Markov transition probabilities was obtained for a countable state space by OREY (1959, 1962) and in general by JAMISON and OREY (1967). BLACKWELL and FREEDMAN (1964) noted the equivalence of mixing and tail triviality. The present coupling approach goes back to GRIFFEATH (1975) and S. GOLDSTEIN (1979) for the case of strong ergodicity and to BERBEE (1979) and ALDOUS and THORISSON (1993) for the corresponding weak result. There is an extensive literature on ergodic theorems for Markov processes, mostly dealing with the discrete-time case. General expositions have been given by many authors, beginning with NEVEU (1971) and OREY (1971). Our treatment of Harris recurrent Feller processes is adapted from KUNITA (1990), who in turn follows the discrete-time approach of REVUZ (1984). KRENGEL (1985) gives a comprehensive survey of abstract ergodic theorems. Detailed accounts of the coupling method and its various ramifications appear in LINDVALL (1992) and THORISSON (2000).

21. Stochastic Differential Equations and Martingale Problems

Long before the existence of any general theory for SDEs, LANGEVIN (1908) proposed his equation to model the velocity of a Brownian particle. The solution process was later studied by ORNSTEIN and UHLENBECK (1930) and was thus named after them. A more rigorous discussion appears in DOOB (1942a). The general idea of a stochastic differential equation goes back to BERNSTEIN (1934, 1938), who proposed a pathwise construction of diffusion processes by a discrete approximation, leading in the limit to a formal differential equation driven by a Brownian motion. However, ITÔ (1942a, 1951b) was the first author to develop a rigorous and systematic theory, including a precise definition of the integral, conditions for existence and uniqueness of solutions, and basic properties of the solution process, such as the Markov property and the continuous dependence on initial state. Similar results were obtained, later but independently, by GIHMAN (1947, 1950-51). The notion of a weak solution was introduced by GIRSANOV (1960), and a version of the weak existence Theorem 21.9 appears in SKOROHOD (1965). The ideas behind the transformations in Propositions 21.12 and 21.13 date back to GIRSANOV (1960) and VOLKONSKY (1958), respectively.
The notion of a martingale problem can be traced back to LEVY's martingale characterization of Brownian motion and DYNKIN's theory of the characteristic operator. A comprehensive theory was developed by STROOCK and VARADHAN (1969), who established the equivalence with weak solutions to the associated SDEs, obtained general criteria for uniqueness in law, and deduced conditions for the strong Markov and Feller properties. The measurability part of Theorem 21.10 is a slight extension of an exercise in STROOCK and VARADHAN (1979).

YAMADA and WATANABE (1971) proved that weak existence and pathwise uniqueness imply strong existence and uniqueness in law. Under the same conditions, they further established the existence of a functional
solution, possibly depending on the initial distribution of the process; that dependence was later removed by KALLENBERG (1996a). IKEDA and WATANABE (1989) noted how the notions of pathwise uniqueness and uniqueness in law extend by conditioning from degenerate to arbitrary initial distributions.

The basic theory of SDEs is covered by many excellent textbooks on different levels, including IKEDA and WATANABE (1989), ROGERS and WILLIAMS (1987), and KARATZAS and SHREVE (1991). More information on the martingale problem is available in JACOD (1979), STROOCK and VARADHAN (1979), and ETHIER and KURTZ (1986).

22. Local Time, Excursions, and Additive Functionals

Local time of Brownian motion at a fixed point was discovered and explored by LEVY (1939), who devised several explicit constructions, mostly of the type of Proposition 22.12. Much of Levy's analysis is based on the observation in Corollary 22.3. The elementary Lemma 22.2 is due to SKOROHOD (1961-62). Formula (1), first noted for Brownian motion by TANAKA (1963), was taken by MEYER (1976) as the basis for a general semimartingale approach. The general Itô-Tanaka formula in Theorem 22.5 was obtained independently by MEYER (1976) and WANG (1977). TROTTER (1958b) proved that Brownian local time has a jointly continuous version, and the extension to general continuous semimartingales in Theorem 22.4 was obtained by YOR (1978).

Modern excursion theory originated with the seminal paper of ITO (1972), which was partly inspired by earlier work of LEVY (1939). In particular, Itô proved a version of Theorem 22.11, assuming the existence of local time. HOROWITZ (1972) independently studied regenerative sets and noted their connection with subordinators, equivalent to the existence of a local time. A systematic theory of regenerative processes was developed by MAISONNEUVE (1974).
The remarkable Theorem 22.17 was discovered independently by RAY (1963) and KNIGHT (1963), and the present proof is essentially due to WALSH (1978). Our construction of the excursion process is close in spirit to Levy's original ideas and to those in GREENWOOD and PITMAN (1980).

Elementary additive functionals of integral type had been discussed extensively in the literature when DYNKIN proposed a study of the general case. The existence Theorem 22.23 was obtained by VOLKONSKY (1960), and the construction of local time in Theorem 22.24 dates back to BLUMENTHAL and GETOOR (1964). The integral representation of CAFs in Theorem 22.25 was proved independently by VOLKONSKY (1958, 1960) and McKEAN and TANAKA (1961). The characterization of additive functionals in terms of suitable measures on the state space dates back to MEYER (1962), and the explicit representation of the associated measures was found by REVUZ (1970) after special cases had been considered by HUNT (1957-58).
An excellent introduction to local time appears in KARATZAS and SHREVE (1991). The books by ITO and McKEAN (1965) and REVUZ and YOR (1999) contain an abundance of further information on the subject. The latter text may also serve as a good introduction to additive functionals and excursion theory. For more information on the latter topics, the reader may consult BLUMENTHAL and GETOOR (1968), BLUMENTHAL (1992), and DELLACHERIE et al. (1992).

23. One-Dimensional SDEs and Diffusions

The study of continuous Markov processes and the associated parabolic differential equations, initiated by KOLMOGOROV (1931a) and FELLER (1936), took a new direction with the seminal papers of FELLER (1952, 1954), who studied the generators of one-dimensional diffusions within the framework of the newly developed semigroup theory. In particular, Feller gave a complete description in terms of scale function and speed measure, classified the boundary behavior, and showed how the latter is determined by the domain of the generator. Finally, he identified the cases when explosion occurs, corresponding to the absorption cases in Theorem 23.15.

A more probabilistic approach to these results was developed by DYNKIN (1955b, 1959), who along with RAY (1956) continued Feller's study of the relationship between analytic properties of the generator and sample path properties of the process. The idea of constructing diffusions on a natural scale through a time change of Brownian motion is due to HUNT (1958) and VOLKONSKY (1958), and the full description in Theorem 23.9 was completed by VOLKONSKY (1960) and ITO and McKEAN (1965). The present stochastic calculus approach is based on ideas in MELEARD (1986).

The ratio ergodic Theorem 23.14 was first obtained for Brownian motion by DERMAN (1954), by a method originally devised for discrete-time chains by DOEBLIN (1938). It was later extended to more general diffusions by MOTOO and WATANABE (1958).
The ergodic behavior of recurrent one-dimensional diffusions was analyzed by MARUYAMA and TANAKA (1957).

For one-dimensional SDEs, SKOROHOD (1965) noticed that Itô's original Lipschitz condition for pathwise uniqueness can be replaced by a weaker Hölder condition. He also obtained a corresponding comparison theorem. The improved conditions in Theorems 23.3 and 23.5 are due to YAMADA and WATANABE (1971) and YAMADA (1973), respectively. PERKINS (1982) and LE GALL (1983) noted how the use of semimartingale local time simplifies and unifies the proofs of those and related results. The fundamental weak existence and uniqueness criteria in Theorem 23.1 were discovered by ENGELBERT and SCHMIDT (1984, 1985), whose (1981) zero-one law is implicit in Lemma 23.2.

Elementary introductions to one-dimensional diffusions appear in BREIMAN (1968), FREEDMAN (1971b), and ROGERS and WILLIAMS (2000b). More detailed and advanced accounts are given by DYNKIN (1965) and ITO
and McKEAN (1965). Further information on one-dimensional SDEs may be obtained from the excellent books by KARATZAS and SHREVE (1991) and REVUZ and YOR (1999).

24. Connections with PDEs and Potential Theory

The fundamental solution to the heat equation in terms of the Gaussian kernel was obtained by LAPLACE (1809). A century later BACHELIER (1900, 1901) noted the relationship between Brownian motion and the heat equation. The PDE connections were further explored by many authors, including KOLMOGOROV (1931a), FELLER (1936), KAC (1951), and DOOB (1955). A first version of Theorem 24.1 was obtained by KAC (1949), who was in turn inspired by FEYNMAN's (1948) work on the Schrödinger equation. Theorem 24.2 is due to STROOCK and VARADHAN (1969).

GREEN (1828), in his discussion of the Dirichlet problem, introduced the functions named after him. The Dirichlet, sweeping, and equilibrium problems were all studied by GAUSS (1840) in a pioneering paper on electrostatics. The rigorous developments in potential theory began with POINCARE (1890-99), who solved the Dirichlet problem for domains with a smooth boundary. The equilibrium measure was characterized by GAUSS as the unique measure minimizing a certain energy functional, but the existence of the minimum was not rigorously established until FROSTMAN (1935).

The first probabilistic connections were made by PHILLIPS and WIENER (1923) and COURANT et al. (1928), who solved the Dirichlet problem in the plane by a method of discrete approximation, involving a version of Theorem 24.5 for a simple symmetric random walk. KOLMOGOROV and LEONTOVICH (1933) evaluated a special hitting distribution for two-dimensional Brownian motion and noted that it satisfies the heat equation. KAKUTANI (1944b, 1945) showed how the harmonic measure and sweeping kernel can be expressed in terms of a Brownian motion.
The probabilistic methods were extended and perfected by DOOB (1954, 1955), who noted the profound connections with martingale theory. A general potential theory was later developed by HUNT (1957-58) for broad classes of Markov processes.

The interpretation of Green functions as occupation densities was known to KAC (1951), and a probabilistic approach to Green functions was developed by HUNT (1956). The connection between equilibrium measures and quitting times, implicit already in SPITZER (1964) and ITO and McKEAN (1965), was exploited by CHUNG (1973) to yield the explicit representation of Theorem 24.14.

Time reversal of diffusion processes was first considered by SCHRÖDINGER (1931). KOLMOGOROV (1936b, 1937) computed the transition kernels of the reversed process and gave necessary and sufficient conditions for symmetry. The basic role of time reversal and duality in potential theory was recognized by DOOB (1954) and HUNT (1958). Proposition 24.15 and the related construction in Theorem 24.21 go back to HUNT, but Theorem 24.19 may be new. The measure ν in Theorem 24.21 is related to the "Kuznetsov measures," discussed extensively in GETOOR (1990). The connection between random sets and alternating capacities was established by CHOQUET (1953-54), and a corresponding representation of infinitely divisible random sets was obtained by MATHERON (1975).

Elementary introductions to probabilistic potential theory appear in BASS (1995) and CHUNG (1995), and to other PDE connections in KARATZAS and SHREVE (1991). A detailed exposition of classical probabilistic potential theory is given by PORT and STONE (1978). DOOB (1984) provides a wealth of further information on both the analytic and probabilistic aspects. Introductions to Hunt's work and the subsequent developments are given by CHUNG (1982) and DELLACHERIE and MEYER (1975-87). More advanced treatments appear in BLUMENTHAL and GETOOR (1968) and SHARPE (1988).

25. Predictability, Compensation, and Excessive Functions

The basic connection between superharmonic functions and supermartingales was established by DOOB (1954), who also proved that compositions of excessive functions with Brownian motion are continuous. Doob further recognized the need for a general decomposition theorem for supermartingales, generalizing the elementary Lemma 7.10. Such a result was eventually proved by MEYER (1962, 1963), in the form of Lemma 25.7, after special decompositions in the Markovian context had been obtained by VOLKONSKY (1960) and SHUR (1961). Meyer's original proof was profound and clever. The present more elementary approach, based on DUNFORD's (1939) weak compactness criterion, was devised by RAO (1969a). The extension to general submartingales was accomplished by ITO and WATANABE (1965) through the introduction of local martingales.
Predictable and totally inaccessible times appear implicitly in the work of BLUMENTHAL (1957) and HUNT (1957-58), in the context of quasi-left-continuity. A systematic study of optional times and their associated σ-fields was initiated by CHUNG and DOOB (1965). The basic role of the predictable σ-field became clear after DOLEANS (1967a) had proved the equivalence between naturalness and predictability for increasing processes, thereby establishing the ultimate version of the Doob-Meyer decomposition. The moment inequality in Proposition 25.21 was obtained independently by GARSIA (1973) and NEVEU (1975) after a more special result had been proved by BURKHOLDER et al. (1972). The theory of optional and predictable times and σ-fields was developed by MEYER (1966), DELLACHERIE
(1972), and others into a "general theory of processes," which has in many ways revolutionized modern probability.

Natural compensators of optional times first appeared in reliability theory. More general compensators were later studied in the Markovian context by S. WATANABE (1964) under the name of "Levy systems." GRIGELIONIS (1971) and JACOD (1975) constructed the compensator of a general random measure and introduced the related "local characteristics" of a general semimartingale. WATANABE (1964) proved that a simple point process with a continuous and deterministic compensator is Poisson; a corresponding time-change result was obtained independently by MEYER (1971) and PAPANGELOU (1972). The extension in Theorem 25.24 was given by KALLENBERG (1990), and general versions of Proposition 25.27 appear in ROSINSKI and WOYCZYNSKI (1986) and KALLENBERG (1992).

An authoritative account of the general theory, including an elegant but less elementary projection approach to the Doob-Meyer decomposition due to DOLEANS, is given by DELLACHERIE and MEYER (1975-87). Useful introductions to the theory are contained in ELLIOTT (1982) and ROGERS and WILLIAMS (2000b). Our elementary proof of Lemma 25.10 uses ideas from DOOB (1984). BLUMENTHAL and GETOOR (1968) remains a good general reference on additive functionals and their potentials. A detailed account of random measures and their compensators appears in JACOD and SHIRYAEV (1987). Applications to queuing theory are given by BREMAUD (1981), BACCELLI and BREMAUD (2000), and LAST and BRANDT (1995).

26. Semimartingales and General Stochastic Integration

DOOB (1953) conceived the idea of a stochastic integration theory for general L²-martingales, based on a suitable decomposition of continuous-time submartingales. MEYER's (1962) proof of such a result opened the door to the L²-theory, which was then developed by COURREGE (1962-63) and KUNITA and WATANABE (1967).
The latter paper contains in particular a version of the general substitution rule. The integration theory was later extended in a series of papers by MEYER (1967) and DOLEANS-DADE and MEYER (1970) and reached its final form with the notes of MEYER (1976) and the books by JACOD (1979), METIVIER and PELLAUMAIL (1979), and DELLACHERIE and MEYER (1975-87). The basic role of predictable processes as integrands was recognized by MEYER (1967).

By contrast, semimartingales were originally introduced in an ad hoc manner by DOLEANS-DADE and MEYER (1970), and their basic preservation laws were only gradually recognized. In particular, JACOD (1975) used the general Girsanov theorem of VAN SCHUPPEN and WONG (1974) to show that the semimartingale property is preserved under absolutely continuous changes of the probability measure. The characterization
of general stochastic integrators as semimartingales was obtained independently by BICHTELER (1979) and DELLACHERIE (1980), in both cases with support from analysts.

Quasimartingales were originally introduced by FISK (1965) and OREY (1966). The decomposition of RAO (1969b) extends a result by KRICKEBERG (1956) for L¹-bounded martingales. YOEURP (1976) combined a notion of "stable subspaces" due to KUNITA and WATANABE (1967) with the Hilbert space structure of M² to obtain an orthogonal decomposition of L²-martingales, equivalent to the decompositions in Theorem 26.14 and Proposition 26.16. Elaborating on those ideas, MEYER (1976) showed that the purely discontinuous component admits a representation as a sum of compensated jumps.

SDEs driven by general Levy processes were already considered by ITO (1951b). The study of SDEs driven by general semimartingales was initiated by DOLEANS-DADE (1970), who obtained her exponential process as a solution to the equation in Theorem 26.8. The scope of the theory was later expanded by many authors, and a comprehensive account is given by PROTTER (1990).

The martingale inequalities in Theorems 26.12 and 26.17 have ancient origins. Thus, a version of the latter result for independent random variables was proved by KOLMOGOROV (1929) and, in a sharper form, by PROHOROV (1959). Their result was extended to discrete-time martingales by JOHNSON et al. (1985) and HITCZENKO (1990). The present statements appeared in KALLENBERG and SZTENCEL (1991). Early versions of the inequalities in Theorem 26.12 were proved by KHINCHIN (1923, 1924) for symmetric random walks and by PALEY (1932) for Walsh series. A version for independent random variables was obtained by MARCINKIEWICZ and ZYGMUND (1937, 1938). The extension to discrete-time martingales is due to BURKHOLDER (1966) for p > 1 and to DAVIS (1970) for p = 1. The result was extended to continuous time by BURKHOLDER et al.
(1972), who also noted how the general result can be deduced from the statement for p = 1. The present proof is a continuous-time version of Davis' original argument.

Excellent introductions to semimartingales and stochastic integration are given by DELLACHERIE and MEYER (1975-87) and JACOD and SHIRYAEV (1987). PROTTER (1990) offers an interesting alternative approach, originally suggested by MEYER and by DELLACHERIE (1980). The book by JACOD (1979) remains a rich source of further information on the subject.

27. Large Deviations

Large deviation theory originated with certain refinements of the central limit theorem obtained by many authors, beginning with KHINCHIN (1929). Here the object of study is the ratio of tail probabilities r_n(x) = P{ζ_n > x}/P{ζ > x}, where ζ is N(0,1) and ζ_n = n^{-1/2} Σ_k ξ_k for some i.i.d. random variables ξ_k with mean 0 and variance 1, so that r_n(x) → 1 for fixed x. A precise asymptotic expansion was obtained by CRAMER (1938), in the case when x varies with n at a rate x = o(n^{1/2}). (See PETROV (1995), Theorem 5.23, for details.)

In the same historic paper, CRAMER (1938) obtained the first true large deviation result, in the form of our Theorem 27.3, though under some technical assumptions that were later removed by CHERNOFF (1952) and BAHADUR (1971). VARADHAN (1966) extended the result to higher dimensions and rephrased it in the form of a general large deviation principle. At about the same time, SCHILDER (1966) proved his large deviation result for Brownian motion, using the present change-of-measure approach. Similar methods were used by FREIDLIN and WENTZELL (1970, 1998) to study random perturbations of dynamical systems.

Even earlier, SANOV (1957) had obtained his large deviation result for empirical distributions of i.i.d. random variables. The relative entropy H(ν|μ) appearing in the limit had already been introduced in statistics by KULLBACK and LEIBLER (1951). Its crucial link to the Legendre-Fenchel transform Λ*, long anticipated by physicists, was formalized by DONSKER and VARADHAN (1975-83). The latter authors also developed some profound and far-reaching extensions of Sanov's theorem, in a long series of formidable papers. ELLIS (1985) gives a detailed exposition of those results, along with a discussion of their physical significance.

Much of the formalization of underlying principles and techniques was developed at a later stage. Thus, an abstract version of the projective limit approach was introduced by DAWSON and GARTNER (1987). BRYC (1990) supplemented VARADHAN's (1966) functional version of the LDP with a reverse proposition. Similarly, IOFFE (1991) appended a powerful inverse to the classical "contraction principle."
Finally, PUKHALSKY (1991) established the equivalence, under suitable regularity conditions, of the exponential tightness and the goodness of the rate function.

STRASSEN (1964) established his formidable law of the iterated logarithm by direct estimates. A detailed exposition of the original approach appears in FREEDMAN (1971b). VARADHAN (1984) recognized the result as a corollary to Schilder's theorem, and a complete proof along the suggested lines appears in DEUSCHEL and STROOCK (1989).

Gentle introductions to large deviation theory and its applications are given by VARADHAN (1984) and DEMBO and ZEITOUNI (1998). The more demanding text of DEUSCHEL and STROOCK (1989) provides much additional insight to the persistent reader.

Appendix

Some more advanced aspects of measure theory are covered by ROYDEN (1988), PARTHASARATHY (1967), and DUDLEY (1989). The projection
and section theorems depend on capacity theory, for which we refer to DELLACHERIE (1972) and DELLACHERIE and MEYER (1975-87).

The J1-topology was introduced by SKOROHOD (1956), and detailed expositions may be found in BILLINGSLEY (1968), ETHIER and KURTZ (1986), and JACOD and SHIRYAEV (1987). A discussion of the vague topology on M(S) with S lcscH is given by BAUER (1972). The topology on the space of closed sets, considered here, was introduced in a more general setting by FELL (1962), and a full account appears in MATHERON (1975), including a detailed proof (different from ours) of the basic Theorem A2.5.
Bibliography

This list includes only publications that are explicitly mentioned in the text or notes or are directly related to results cited in the book. Knowledgeable readers will notice that many books and papers of historical significance have been omitted.

ADLER, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics for General Gaussian Processes. Inst. Math. Statist., Hayward, CA.
AKCOGLU, M.A., CHACON, R.V. (1970). Ergodic properties of operators in Lebesgue space. Adv. Appl. Probab. 2, 1-47.
ALDOUS, D.J. (1978). Stopping times and tightness. Ann. Probab. 6, 335-340.
- (1985). Exchangeability and related topics. Lect. Notes in Math. 1117, 1-198. Springer, Berlin.
ALDOUS, D., THORISSON, H. (1993). Shift-coupling. Stoch. Proc. Appl. 44, 1-14.
ALEXANDROV, A.D. (1940-43). Additive set-functions in abstract spaces. Mat. Sb. 8, 307-348; 9, 563-628; 13, 169-238.
ANDRE, D. (1887). Solution directe du probleme resolu par M. Bertrand. C.R. Acad. Sci. Paris 105, 436-437.
ATHREYA, K., McDONALD, D., NEY, P. (1978). Coupling and the renewal theorem. Amer. Math. Monthly 85, 809-814.
BACCELLI, F., BREMAUD, P. (2000). Elements of Queueing [sic] Theory, 2nd ed. Springer, Berlin.
BACHELIER, L. (1900). Theorie de la speculation. Ann. Sci. Ecole Norm. Sup. 17, 21-86.
- (1901). Theorie mathematique du jeu. Ann. Sci. Ecole Norm. Sup. 18, 143-210.
BAHADUR, R.R. (1971). Some Limit Theorems in Statistics. SIAM, Philadelphia.
BARBIER, E. (1887). Generalisation du probleme resolu par M. J. Bertrand. C.R. Acad. Sci. Paris 105, 407, 440.
BASS, R.F. (1995). Probabilistic Techniques in Analysis. Springer, NY.
- (1998). Diffusions and Elliptic Operators. Springer, NY.
BAUER, H. (1972). Probability Theory and Elements of Measure Theory. Engl. trans., Holt, Rinehart & Winston, NY.
BAXTER, G. (1961). An analytic approach to finite fluctuation problems in probability. J. d'Analyse Math. 9, 31-70.
BELYAEV, Y.K. (1963). Limit theorems for dissipative flows. Th. Probab. Appl.
8, 165-173. 
BERBEE, H.C.P. (1979). Random Walks with Stationary Increments and Renewal Theory. Mathematisch Centrum, Amsterdam.
BERNOULLI, J. (1713). Ars Conjectandi. Thurnisiorum, Basel.
BERNSTEIN, S.N. (1927). Sur l'extension du theoreme limite du calcul des probabilites aux sommes de quantites dependantes. Math. Ann. 97, 1-59.
- (1934). Principes de la theorie des equations differentielles stochastiques. Trudy Fiz.-Mat., Steklov Inst., Akad. Nauk. 5, 95-124.
- (1937). On some variations of the Chebyshev inequality (in Russian). Dokl. Akad. Nauk SSSR 17, 275-277.
- (1938). Equations differentielles stochastiques. Act. Sci. Ind. 738, 5-31.
BERTOIN, J. (1996). Levy Processes. Cambridge Univ. Press.
BERTRAND, J. (1887). Solution d'un probleme. C.R. Acad. Sci. Paris 105, 369.
BICHTELER, K. (1979). Stochastic integrators. Bull. Amer. Math. Soc. 1, 761-765.
BIENAYME, J. (1853). Considerations a l'appui de la decouverte de Laplace sur la loi de probabilite dans la methode des moindres carres. C.R. Acad. Sci. Paris 37, 309-324.
BILLINGSLEY, P. (1965). Ergodic Theory and Information. Wiley, NY.
- (1968). Convergence of Probability Measures. Wiley, NY.
- (1995). Probability and Measure, 3rd ed. Wiley, NY.
BIRKHOFF, G.D. (1932). Proof of the ergodic theorem. Proc. Natl. Acad. Sci. USA 17, 656-660.
BLACKWELL, D. (1948). A renewal theorem. Duke Math. J. 15, 145-150.
- (1953). Extension of a renewal theorem. Pacific J. Math. 3, 315-320.
BLACKWELL, D., FREEDMAN, D. (1964). The tail σ-field of a Markov chain and a theorem of Orey. Ann. Math. Statist. 35, 1291-1295.
BLUMENTHAL, R.M. (1957). An extended Markov property. Trans. Amer. Math. Soc. 82, 52-72.
- (1992). Excursions of Markov Processes. Birkhauser, Boston.
BLUMENTHAL, R.M., GETOOR, R.K. (1964). Local times for Markov processes. Z. Wahrsch. verw. Geb. 3, 50-74.
- (1968). Markov Processes and Potential Theory. Academic Press, NY.
BOCHNER, S. (1932). Vorlesungen über Fouriersche Integrale, Akad.
Verlagsges., Leipzig. Repr. Chelsea, NY 1948.
- (1933). Monotone Funktionen, Stieltjessche Integrale und harmonische Analyse. Math. Ann. 108, 378-410.
BOLTZMANN, L. (1887). Über die mechanischen Analogien des zweiten Hauptsatzes der Thermodynamik. J. Reine Angew. Math. 100, 201-212.
BOREL, E. (1895). Sur quelques points de la theorie des fonctions. Ann. Sci. Ecole Norm. Sup. (3) 12, 9-55.
- (1898). Leçons sur la Theorie des Fonctions. Gauthier-Villars, Paris.
- (1909). Les probabilites denombrables et leurs applications arithmetiques. Rend. Circ. Mat. Palermo 27, 247-271.
BREIMAN, L. (1957-60). The individual ergodic theorem of information theory. Ann. Math. Statist. 28, 809-811; 31, 809-810.
- (1968). Probability. Addison-Wesley, Reading, MA. Repr. SIAM, Philadelphia 1992.
BREMAUD, P. (1981). Point Processes and Queues. Springer, NY.
BROWN, R. (1828). A brief description of microscopical observations made in the months of June, July and August 1827, on the particles contained in the pollen of plants; and on the general existence of active molecules in organic and inorganic bodies. Ann. Phys. 14, 294-313.
BRYC, W. (1990). Large deviations by the asymptotic value method. In Diffusion Processes and Related Problems in Analysis (M. Pinsky, ed.), 447-472. Birkhauser, Basel.
BUHLMANN, H. (1960). Austauschbare stochastische Variabeln und ihre Grenzwertsätze. Univ. Calif. Publ. Statist. 3, 1-35.
BUNIAKOWSKY, V.Y. (1859). Sur quelques inegalites concernant les integrales ordinaires et les integrales aux differences finies. Mem. de l'Acad. St.-Petersbourg 1:9.
BURKHOLDER, D.L. (1966). Martingale transforms. Ann. Math. Statist. 37, 1494-1504.
BURKHOLDER, D.L., DAVIS, B.J., GUNDY, R.F. (1972). Integral inequalities for convex functions of operators on martingales. Proc. 6th Berkeley Symp. Math. Statist. Probab. 2, 223-240.
BURKHOLDER, D.L., GUNDY, R.F. (1970). Extrapolation and interpolation of quasi-linear operators on martingales. Acta Math. 124, 249-304.
CAMERON, R.H., MARTIN, W.T. (1944). Transformation of Wiener integrals under translations. Ann. Math. 45, 386-396.
CANTELLI, F.P. (1917). Su due applicazioni di un teorema di G. Boole alla statistica matematica. Rend. Accad. Naz. Lincei 26, 295-302.
- (1933). Sulla determinazione empirica delle leggi di probabilita. Giorn. Ist. Ital. Attuari 4, 421-424.
CARATHEODORY, C. (1927). Vorlesungen über reelle Funktionen, 2nd ed. Teubner, Leipzig (1st ed. 1918). Repr. Chelsea, NY 1946.
CARLESON, L. (1958).
Two remarks on the basic theorems of information theory. Math. Scand. 6, 175-180.
CAUCHY, A.L. (1821). Cours d'analyse de l'Ecole Royale Polytechnique, Paris.
CHACON, R.V., ORNSTEIN, D.S. (1960). A general ergodic theorem. Illinois J. Math. 4, 153-160.
CHAPMAN, S. (1928). On the Brownian displacements and thermal diffusion of grains suspended in a non-uniform fluid. Proc. Roy. Soc. London (A) 119, 34-54.
CHEBYSHEV, P.L. (1867). Des valeurs moyennes. J. Math. Pures Appl. 12, 177-184.
- (1890). Sur deux theoremes relatifs aux probabilites. Acta Math. 14, 305-315.
CHENTSOV, N.N. (1956). Weak convergence of stochastic processes whose trajectories have no discontinuities of the second kind and the "heuristic" approach to the Kolmogorov-Smirnov tests. Th. Probab. Appl. 1, 140-144.
CHERNOFF, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann. Math. Statist. 23, 493-507.
CHERNOFF, H., TEICHER, H. (1958). A central limit theorem for sequences of exchangeable random variables. Ann. Math. Statist. 29, 118-130.
CHOQUET, G. (1953-54). Theory of capacities. Ann. Inst. Fourier Grenoble 5, 131-295.
CHOW, Y.S., TEICHER, H. (1997). Probability Theory: Independence, Interchangeability, Martingales, 3rd ed. Springer, NY.
CHUNG, K.L. (1960). Markov Chains with Stationary Transition Probabilities. Springer, Berlin.
- (1961). A note on the ergodic theorem of information theory. Ann. Math. Statist. 32, 612-614.
- (1973). Probabilistic approach to the equilibrium problem in potential theory. Ann. Inst. Fourier Grenoble 23, 313-322.
- (1974). A Course in Probability Theory, 2nd ed. Academic Press, NY.
- (1982). Lectures from Markov Processes to Brownian Motion. Springer, NY.
- (1995). Green, Brown, and Probability. World Scientific, Singapore.
CHUNG, K.L., DOOB, J.L. (1965). Fields, optionality and measurability. Amer. J. Math. 87, 397-424.
CHUNG, K.L., FUCHS, W.H.J. (1951). On the distribution of values of sums of random variables. Mem. Amer. Math. Soc. 6.
CHUNG, K.L., ORNSTEIN, D.S. (1962). On the recurrence of sums of random variables. Bull. Amer. Math. Soc. 68, 30-32.
CHUNG, K.L., WALSH, J.B. (1974). Meyer's theorem on previsibility. Z. Wahrsch. verw. Geb. 29, 253-256.
CHUNG, K.L., WILLIAMS, R.J. (1990). Introduction to Stochastic Integration, 2nd ed. Birkhauser, Boston.
COURANT, R., FRIEDRICHS, K., LEWY, H. (1928). Über die partiellen Differentialgleichungen der mathematischen Physik. Math. Ann. 100, 32-74.
COURREGE, P. (1962-63).
Integrales stochastiques et martingales de carre integrable. Sem. Brelot-Choquet-Deny 7. Publ. Inst. H. Poincare.
COX, D.R. (1955). Some statistical methods connected with series of events. J. R. Statist. Soc. Ser. B 17, 129-164.
CRAMER, H. (1938). Sur un nouveau theoreme-limite de la theorie des probabilites. Actual. Sci. Indust. 736, 5-23.
- (1942). On harmonic analysis in certain functional spaces. Ark. Mat. Astr. Fys. 28B:12 (17 pp.).
CRAMER, H., LEADBETTER, M.R. (1967). Stationary and Related Stochastic Processes. Wiley, NY.
CRAMER, H., WOLD, H. (1936). Some theorems on distribution functions. J. London Math. Soc. 11, 290-295.
CSORGO, M., REVESZ, P. (1981). Strong Approximations in Probability and Statistics. Academic Press, NY.
DALEY, D.J., VERE-JONES, D. (1988). An Introduction to the Theory of Point Processes. Springer, NY.
DAMBIS, K.E. (1965). On the decomposition of continuous submartingales. Th. Probab. Appl. 10, 401-410.
DANIELL, P.J. (1918-19). Integrals in an infinite number of dimensions. Ann. Math. (2) 20, 281-288.
- (1919-20). Functions of limited variation in an infinite number of dimensions. Ann. Math. (2) 21, 30-38.
- (1920). Stieltjes derivatives. Bull. Amer. Math. Soc. 26, 444-448.
DAVIS, B.J. (1970). On the integrability of the martingale square function. Israel J. Math. 8, 187-190.
DAWSON, D.A., GARTNER, J. (1987). Large deviations from the McKean-Vlasov limit for weakly interacting diffusions. Stochastics 20, 247-308.
DAY, M.M. (1942). Ergodic theorems for Abelian semigroups. Trans. Amer. Math. Soc. 51, 399-412.
DEBES, H., KERSTAN, J., LIEMANT, A., MATTHES, K. (1970-71). Verallgemeinerung eines Satzes von Dobrushin I, III. Math. Nachr. 47, 183-244; 50, 99-139.
DELLACHERIE, C. (1972). Capacites et Processus Stochastiques. Springer, Berlin.
- (1980). Un survol de la theorie de l'integrale stochastique. Stoch. Proc. Appl. 10, 115-144.
DELLACHERIE, C., MAISONNEUVE, B., MEYER, P.A. (1992). Probabilites et Potentiel, V. Hermann, Paris.
DELLACHERIE, C., MEYER, P.A. (1975-87). Probabilites et Potentiel, I-IV. Hermann, Paris. Engl. trans., North-Holland.
DEMBO, A., ZEITOUNI, O. (1998). Large Deviations Techniques and Applications, 2nd ed. Springer, NY.
DERMAN, C. (1954). Ergodic property of the Brownian motion process. Proc. Natl. Acad. Sci. USA 40, 1155-1158.
DEUSCHEL, J.D., STROOCK, D.W. (1989). Large Deviations. Academic Press, Boston.
DOEBLIN, W. (1938a). Expose de la theorie des chaines simples constantes de Markov a un nombre fini d'etats. Rev. Math. Union Interbalkan. 2, 77-105.
- (1938b). Sur deux problemes de M.
Kolmogoroff concernant les chaines denombrables. Bull. Soc. Math. France 66, 210-220.
- (1939a). Sur les sommes d'un grand nombre de variables aleatoires independantes. Bull. Sci. Math. 63, 23-64.
- (1939b). Sur certains mouvements aleatoires discontinus. Skand. Aktuarietidskr. 22, 211-222.
- (1940). Elements d'une theorie generale des chaines simples constantes de Markoff. Ann. Sci. Ecole Norm. Sup. (3) 57, 61-111.
DOHLER, R. (1980). On the conditional independence of random events. Th. Probab. Appl. 25, 628-634.
Bibliography 601
DOLEANS(-DADE), C. (1967a). Processus croissants naturel et processus croissants tres bien mesurable. C.R. Acad. Sci. Paris 264, 874-876.
- (1967b). Integrales stochastiques dependant d'un parametre. Publ. Inst. Stat. Univ. Paris 16, 23-34.
- (1970). Quelques applications de la formule de changement de variables pour les semimartingales. Z. Wahrsch. verw. Geb. 16, 181-194.
DOLEANS-DADE, C., MEYER, P.A. (1970). Integrales stochastiques par rapport aux martingales locales. Lect. Notes in Math. 124, 77-107. Springer, Berlin.
DONSKER, M.D. (1951-52). An invariance principle for certain probability limit theorems. Mem. Amer. Math. Soc. 6.
- (1952). Justification and extension of Doob's heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math. Statist. 23, 277-281.
DONSKER, M.D., VARADHAN, S.R.S. (1975-83). Asymptotic evaluation of certain Markov process expectations for large time, I-IV. Comm. Pure Appl. Math. 28, 1-47, 279-301; 29, 389-461; 36, 183-212.
DOOB, J.L. (1936). Note on probability. Ann. Math. (2) 37, 363-367.
- (1937). Stochastic processes depending on a continuous parameter. Trans. Amer. Math. Soc. 42, 107-140.
- (1938). Stochastic processes with an integral-valued parameter. Trans. Amer. Math. Soc. 44, 87-150.
- (1940). Regularity properties of certain families of chance variables. Trans. Amer. Math. Soc. 47, 455-486.
- (1942a). The Brownian movement and stochastic equations. Ann. Math. 43, 351-369.
- (1942b). Topics in the theory of Markoff chains. Trans. Amer. Math. Soc. 52, 37-64.
- (1945). Markoff chains-denumerable case. Trans. Amer. Math. Soc. 58, 455-473.
- (1947). Probability in function space. Bull. Amer. Math. Soc. 53, 15-30.
- (1948a). Asymptotic properties of Markov transition probabilities. Trans. Amer. Math. Soc. 63, 393-421.
- (1948b). Renewal theory from the point of view of the theory of probability. Trans. Amer. Math. Soc. 63, 422-438.
- (1949). Heuristic approach to the Kolmogorov-Smirnov theorems.
Ann. Math. Statist. 20, 393-403.
- (1951). Continuous parameter martingales. Proc. 2nd Berkeley Symp. Math. Statist. Probab., 269-277.
- (1953). Stochastic Processes. Wiley, NY.
- (1954). Semimartingales and subharmonic functions. Trans. Amer. Math. Soc. 77, 86-121.
- (1955). A probability approach to the heat equation. Trans. Amer. Math. Soc. 80, 216-280.
- (1984). Classical Potential Theory and its Probabilistic Counterpart. Springer, NY.
- (1994). Measure Theory. Springer, NY.
DUBINS, L.E. (1968). On a theorem of Skorohod. Ann. Math. Statist. 39, 2094-2097.
DUBINS, L.E., SCHWARZ, G. (1965). On continuous martingales. Proc. Natl. Acad. Sci. USA 53, 913-916.
DUDLEY, R.M. (1966). Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois J. Math. 10, 109-126.
- (1967). Measures on non-separable metric spaces. Illinois J. Math. 11, 449-453.
- (1968). Distances of probability measures and random variables. Ann. Math. Statist. 39, 1563-1572.
- (1989). Real Analysis and Probability. Wadsworth, Brooks & Cole, Pacific Grove, CA.
DUNFORD, N. (1939). A mean ergodic theorem. Duke Math. J. 5, 635-646.
DUNFORD, N., SCHWARTZ, J.T. (1956). Convergence almost everywhere of operator averages. J. Rat. Mech. Anal. 5, 129-178.
DURRETT, R. (1984). Brownian Motion and Martingales in Analysis. Wadsworth, Belmont, CA.
- (1995). Probability Theory and Examples, 2nd ed. Wadsworth, Brooks & Cole, Pacific Grove, CA.
DVORETZKY, A. (1972). Asymptotic normality for sums of dependent random variables. Proc. 6th Berkeley Symp. Math. Statist. Probab. 2, 513-535.
DYNKIN, E.B. (1952). Criteria of continuity and lack of discontinuities of the second kind for trajectories of a Markov stochastic process (Russian). Izv. Akad. Nauk SSSR, Ser. Mat. 16, 563-572.
- (1955a). Infinitesimal operators of Markov stochastic processes (Russian). Dokl. Akad. Nauk SSSR 105, 206-209.
- (1955b). Continuous one-dimensional Markov processes (Russian). Dokl. Akad. Nauk SSSR 105, 405-408.
- (1956). Markov processes and semigroups of operators. Infinitesimal operators of Markov processes. Th. Probab. Appl. 1, 25-60.
- (1959). One-dimensional continuous strong Markov processes. Th. Probab. Appl. 4, 3-54.
- (1961). Theory of Markov Processes. Engl. trans., Prentice-Hall and Pergamon Press, Englewood Cliffs, NJ, and Oxford. (Russian orig. 1959.)
- (1965). Markov Processes, Vols. 1-2. Engl. trans., Springer, Berlin. (Russian orig. 1963.)
- (1978).
Sufficient statistics and extreme points. Ann. Probab. 6, 705-730.
DYNKIN, E.B., YUSHKEVICH, A.A. (1956). Strong Markov processes. Th. Probab. Appl. 1, 134-139.
EINSTEIN, A. (1905). On the movement of small particles suspended in a stationary liquid demanded by the molecular-kinetic theory of heat. Engl. trans. in Investigations on the Theory of the Brownian Movement. Repr. Dover, NY 1956.
- (1906). On the theory of Brownian motion. Engl. trans. in Investigations on the Theory of the Brownian Movement. Repr. Dover, NY 1956.
ELLIOTT, R.J. (1982). Stochastic Calculus and Applications. Springer, NY.
ELLIS, R.S. (1985). Entropy, Large Deviations, and Statistical Mechanics. Springer, NY.
ENGELBERT, H.J., SCHMIDT, W. (1981). On the behaviour of certain functionals of the Wiener process and applications to stochastic differential equations. Lect. Notes in Control and Inform. Sci. 36, 47-55.
- (1984). On one-dimensional stochastic differential equations with generalized drift. Lect. Notes in Control and Inform. Sci. 69, 143-155. Springer, Berlin.
- (1985). On solutions of stochastic differential equations without drift. Z. Wahrsch. verw. Geb. 68, 287-317.
ERDOS, P., FELLER, W., POLLARD, H. (1949). A theorem on power series. Bull. Amer. Math. Soc. 55, 201-204.
ERDOS, P., KAC, M. (1946). On certain limit theorems in the theory of probability. Bull. Amer. Math. Soc. 52, 292-302.
- (1947). On the number of positive sums of independent random variables. Bull. Amer. Math. Soc. 53, 1011-1020.
ERLANG, A.K. (1909). The theory of probabilities and telephone conversations. Nyt. Tidskr. Mat. B 20, 33-41.
ETHIER, S.N., KURTZ, T.G. (1986). Markov Processes: Characterization and Convergence. Wiley, NY.
FABER, G. (1910). Uber stetige Funktionen, II. Math. Ann. 69, 372-443.
FARRELL, R.H. (1962). Representation of invariant measures. Illinois J. Math. 6, 447-467.
FATOU, P. (1906). Series trigonometriques et series de Taylor. Acta Math. 30, 335-400.
FELL, J.M.G. (1962). A Hausdorff topology for the closed subsets of a locally compact non-Hausdorff space. Proc. Amer. Math. Soc. 13, 472-476.
FELLER, W. (1935-37). Uber den zentralen Grenzwertsatz der Wahrscheinlichkeitstheorie, I-II. Math. Z. 40, 521-559; 42, 301-312.
- (1936). Zur Theorie der stochastischen Prozesse (Existenz- und Eindeutigkeitssatze). Math. Ann. 113, 113-160.
- (1937). On the Kolmogoroff-P. Levy formula for infinitely divisible distribution functions. Proc. Yugoslav Acad. Sci. 82, 95-112.
- (1940). On the integro-differential equations of purely discontinuous Markoff processes. Trans. Amer. Math. Soc. 48, 488-515; 58, 474.
- (1949). Fluctuation theory of recurrent events. Trans.
Amer. Math. Soc. 67, 98-119.
- (1952). The parabolic differential equations and the associated semi-groups of transformations. Ann. Math. 55, 468-519.
- (1954). Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77, 1-31.
- (1968, 1971). An Introduction to Probability Theory and its Applications, 1 (3rd ed.); 2 (2nd ed.). Wiley, NY (1st eds. 1950, 1966).
FELLER, W., OREY, S. (1961). A renewal theorem. J. Math. Mech. 10, 619-624.
FEYNMAN, R.P. (1948). Space-time approach to nonrelativistic quantum mechanics. Rev. Mod. Phys. 20, 367-387.
DE FINETTI, B. (1929). Sulle funzioni ad incremento aleatorio. Rend. Acc. Naz. Lincei 10, 163-168.
- (1930). Funzione caratteristica di un fenomeno aleatorio. Mem. R. Acc. Lincei (6) 4, 86-133.
- (1937). La prevision: ses lois logiques, ses sources subjectives. Ann. Inst. H. Poincare 7, 1-68.
FISK, D.L. (1965). Quasimartingales. Trans. Amer. Math. Soc. 120, 369-389.
- (1966). Sample quadratic variation of continuous, second-order martingales. Z. Wahrsch. verw. Geb. 6, 273-278.
FORTET, R. (1943). Les fonctions aleatoires du type de Markoff associees a certaines equations lineaires aux derivees partielles du type parabolique. J. Math. Pures Appl. 22, 177-243.
FRANKEN, P., KONIG, D., ARNDT, D., SCHMIDT, V. (1981). Queues and Point Processes. Akademie-Verlag, Berlin.
FRECHET, M. (1928). Les Espaces Abstraits. Gauthier-Villars, Paris.
FREEDMAN, D. (1962-63). Invariants under mixing which generalize de Finetti's theorem. Ann. Math. Statist. 33, 916-923; 34, 1194-1216.
- (1971a). Markov Chains. Holden-Day, San Francisco. Repr. Springer, NY 1983.
- (1971b). Brownian Motion and Diffusion. Holden-Day, San Francisco. Repr. Springer, NY 1983.
FREIDLIN, M.I., WENTZEL, A.D. (1970). On small random perturbations of dynamical systems. Russian Math. Surveys 25, 1-55.
- (1998). Random Perturbations of Dynamical Systems. Engl. trans., Springer, NY. (Russian orig. 1979.)
FROSTMAN, O. (1935). Potentiel d'equilibre et capacite des ensembles avec quelques applications a la theorie des fonctions. Medd. Lunds Univ. Mat. Sem. 3, 1-118.
FUBINI, G. (1907). Sugli integrali multipli. Rend. Acc. Naz. Lincei 16, 608-614.
FURSTENBERG, H., KESTEN, H. (1960). Products of random matrices. Ann. Math. Statist. 31, 457-469.
GALMARINO, A.R. (1963). Representation of an isotropic diffusion as a skew product. Z. Wahrsch. verw. Geb. 1, 359-378.
GARSIA, A.M. (1965). A simple proof of E. Hopf's maximal ergodic theorem. J. Math. Mech. 14, 381-382.
- (1973). Martingale Inequalities: Seminar Notes on Recent Progress. Math. Lect. Notes Ser. Benjamin, Reading, MA.
GAUSS, C.F. (1809). Theory of Motion of the Heavenly Bodies. Engl. trans., Dover, NY 1963.
- (1840). Allgemeine Lehrsatze in Beziehung auf die im verkehrten Verhaltnisse des Quadrats der Entfernung wirkenden Anziehungs- und Abstossungs-Krafte. Gauss Werke 5, 197-242. Gottingen 1867.
GETOOR, R.K. (1990). Excessive Measures. Birkhauser, Boston.
GETOOR, R.K., SHARPE, M.J. (1972). Conformal martingales. Invent. Math. 16, 271-308.
GIHMAN, I.I. (1947). On a method of constructing random processes (Russian). Dokl. Akad. Nauk SSSR 58, 961-964.
- (1950-51). On the theory of differential equations for random processes, I-II (Russian). Ukr. Mat. J. 2:4, 37-63; 3:3, 317-339.
GIHMAN, I.I., SKOROHOD, A.V. (1965). Introduction to the Theory of Random Processes. Engl. trans., Saunders, Philadelphia. Repr. Dover, Mineola 1996.
- (1974-79). The Theory of Stochastic Processes, 1-3. Engl. trans., Springer, Berlin.
GIRSANOV, I.V. (1960). On transforming a certain class of stochastic processes by absolutely continuous substitution of measures. Th. Probab. Appl. 5, 285-301.
GLIVENKO, V.I. (1933). Sulla determinazione empirica della leggi di probabilita. Giorn. Ist. Ital. Attuari 4, 92-99.
GNEDENKO, B.V. (1939). On the theory of limit theorems for sums of independent random variables (Russian). Izv. Akad. Nauk SSSR Ser. Mat. 181-232, 643-647.
GNEDENKO, B.V., KOLMOGOROV, A.N. (1968). Limit Distributions for Sums of Independent Random Variables. Engl. trans., 2nd ed., Addison-Wesley, Reading, MA. (Russian orig. 1949.)
GOLDMAN, J.R. (1967). Stochastic point processes: Limit theorems. Ann. Math. Statist. 38, 771-779.
GOLDSTEIN, J.A. (1976). Semigroup-theoretic proofs of the central limit theorem and other theorems of analysis. Semigroup Forum 12, 189-206.
GOLDSTEIN, S. (1979). Maximal coupling. Z. Wahrsch. verw. Geb. 46, 193-204.
GRANDELL, J. (1976). Doubly Stochastic Poisson Processes. Lect. Notes in Math. 529. Springer, Berlin.
GREEN, G. (1828). An essay on the application of mathematical analysis to the theories of electricity and magnetism. Repr. in Mathematical Papers, Chelsea, NY 1970.
GREENWOOD, P., PITMAN, J. (1980). Construction of local time and Poisson point processes from nested arrays. J. London Math. Soc. (2) 22, 182-192.
GRIFFEATH, D. (1975). A maximal coupling for Markov chains. Z. Wahrsch. verw. Geb. 31, 95-106.
GRIGELIONIS, B. (1963). On the convergence of sums of random step processes to a Poisson process. Th. Probab. Appl. 8, 172-182.
- (1971).
On the representation of integer-valued measures by means of stochastic integrals with respect to Poisson measure. Litovsk. Mat. Sb. 11, 93-108.
HAAR, A. (1933). Der Maßbegriff in der Theorie der kontinuierlichen Gruppen. Ann. Math. 34, 147-169.
HAGBERG, J. (1973). Approximation of the summation process obtained by sampling from a finite population. Th. Probab. Appl. 18, 790-803.
HAHN, H. (1921). Theorie der reellen Funktionen. Julius Springer, Berlin.
HAJEK, J. (1960). Limiting distributions in simple random sampling from a finite population. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5, 361-374.
HALL, P., HEYDE, C.C. (1980). Martingale Limit Theory and its Application. Academic Press, NY.
HALMOS, P.R. (1950). Measure Theory. Van Nostrand, Princeton. Repr. Springer, NY 1974.
HARDY, G.H., LITTLEWOOD, J.E. (1930). A maximal theorem with function-theoretic applications. Acta Math. 54, 81-116.
HARRIS, T.E. (1956). The existence of stationary measures for certain Markov processes. Proc. 3rd Berkeley Symp. Math. Statist. Probab. 2, 113-124.
- (1971). Random measures and motions of point processes. Z. Wahrsch. verw. Geb. 18, 85-115.
HARTMAN, P., WINTNER, A. (1941). On the law of the iterated logarithm. Amer. J. Math. 63, 169-176.
HELLY, E. (1911-12). Uber lineare Funktionaloperatoren. Sitzungsber. Nat. Kais. Akad. Wiss. 121, 265-297.
HEWITT, E., SAVAGE, L.J. (1955). Symmetric measures on Cartesian products. Trans. Amer. Math. Soc. 80, 470-501.
HILLE, E. (1948). Functional analysis and semi-groups. Amer. Math. Colloq. Publ. 31, NY.
HITCZENKO, P. (1990). Best constants in martingale version of Rosenthal's inequality. Ann. Probab. 18, 1656-1668.
HOLDER, O. (1889). Uber einen Mittelwertsatz. Nachr. Akad. Wiss. Gottingen, math.-phys. Kl., 38-47.
HOPF, E. (1954). The general temporally discrete Markov process. J. Rat. Mech. Anal. 3, 13-45.
HOROWITZ, J. (1972). Semilinear Markov processes, subordinators and renewal theory. Z. Wahrsch. verw. Geb. 24, 167-193.
HUNT, G.A. (1956). Some theorems concerning Brownian motion. Trans. Amer. Math. Soc. 81, 294-319.
- (1957-58). Markoff processes and potentials, I-III. Illinois J. Math. 1, 44-93, 316-369; 2, 151-213.
HUREWICZ, W. (1944). Ergodic theorem without invariant measure. Ann. Math. 45, 192-206.
HURWITZ, A. (1897). Uber die Erzeugung der Invarianten durch Integration. Nachr. Ges. Gottingen, math.-phys. Kl., 71-90.
IKEDA, N., WATANABE, S. (1989). Stochastic Differential Equations and Diffusion Processes, 2nd ed. North-Holland and Kodansha, Amsterdam and Tokyo.
IOFFE, D. (1991). On some applicable versions of abstract large deviations theorems. Ann. Probab. 19, 1629-1639.
IONESCU TULCEA, A. (1960). Contributions to information theory for abstract alphabets. Ark. Mat. 4, 235-247.
IONESCU TULCEA, C.T. (1949-50). Mesures dans les espaces produits. Atti Accad. Naz. Lincei Rend. 7, 208-211.
ITO, K. (1942a). Differential equations determining Markov processes (Japanese). Zenkoku Shijo Sugaku Danwakai 244:1077, 1352-1400.
- (1942b). On stochastic processes (I) (Infinitely divisible laws of probability). Jap. J. Math. 18, 261-301.
- (1944). Stochastic integral. Proc. Imp. Acad. Tokyo 20, 519-524.
- (1946). On a stochastic integral equation. Proc. Imp. Acad. Tokyo 22, 32-35.
- (1951a). On a formula concerning stochastic differentials. Nagoya Math. J. 3, 55-65.
- (1951b). On stochastic differential equations. Mem. Amer. Math. Soc. 4, 1-51.
- (1951c). Multiple Wiener integral. J. Math. Soc. Japan 3, 157-169.
- (1957). Stochastic Processes (Japanese). Iwanami Shoten, Tokyo.
- (1972). Poisson point processes attached to Markov processes. Proc. 6th Berkeley Symp. Math. Statist. Probab. 3, 225-239.
- (1984). Introduction to Probability Theory. Engl. trans., Cambridge Univ. Press.
ITO, K., McKEAN, H.P. (1965). Diffusion Processes and their Sample Paths. Repr. Springer, Berlin 1996.
ITO, K., WATANABE, S. (1965). Transformation of Markov processes by multiplicative functionals. Ann. Inst. Fourier 15, 15-30.
JACOD, J. (1975). Multivariate point processes: Predictable projection, Radon-Nikodym derivative, representation of martingales. Z. Wahrsch. verw. Geb. 31, 235-253.
- (1979). Calcul Stochastique et Problemes de Martingales. Lect. Notes in Math. 714. Springer, Berlin.
JACOD, J., SHIRYAEV, A.N. (1987). Limit Theorems for Stochastic Processes. Springer, Berlin.
JAGERS, P. (1972). On the weak convergence of superpositions of point processes. Z. Wahrsch. verw. Geb. 22, 1-7.
- (1974). Aspects of random measures and point processes. Adv. Probab. Rel. Topics 3, 179-239. Marcel Dekker, NY.
JAMISON, B., OREY, S. (1967). Markov chains recurrent in the sense of Harris. Z. Wahrsch. verw. Geb. 8, 206-223.
JENSEN, J.L.W.V. (1906). Sur les fonctions convexes et les inegalites entre les valeurs moyennes. Acta Math. 30, 175-193.
JESSEN, B. (1934). The theory of integration in a space of an infinite number of dimensions. Acta Math. 63, 249-323.
JOHNSON, W.B., SCHECHTMAN, G., ZINN, J. (1985). Best constants in moment inequalities for linear combinations of independent and exchangeable random variables. Ann. Probab. 13, 234-253.
JORDAN, C. (1881).
Sur la serie de Fourier. C.R. Acad. Sci. Paris 92, 228-230.
KAC, M. (1947). On the notion of recurrence in discrete stochastic processes. Bull. Amer. Math. Soc. 53, 1002-1010.
- (1949). On distributions of certain Wiener functionals. Trans. Amer. Math. Soc. 65, 1-13.
- (1951). On some connections between probability theory and differential and integral equations. Proc. 2nd Berkeley Symp. Math. Statist. Probab., 189-215. Univ. of California Press, Berkeley.
KAKUTANI, S. (1940). Ergodic theorems and the Markoff process with a stable distribution. Proc. Imp. Acad. Tokyo 16, 49-54.
- (1944a). On Brownian motions in n-space. Proc. Imp. Acad. Tokyo 20, 648-652.
- (1944b). Two-dimensional Brownian motion and harmonic functions. Proc. Imp. Acad. Tokyo 20, 706-714.
- (1945). Markoff process and the Dirichlet problem. Proc. Japan Acad. 21, 227-233.
KALLENBERG, O. (1973a). Characterization and convergence of random measures and point processes. Z. Wahrsch. verw. Geb. 27, 9-21.
- (1973b). Canonical representations and convergence criteria for processes with interchangeable increments. Z. Wahrsch. verw. Geb. 27, 23-36.
- (1986). Random Measures, 4th ed. Akademie-Verlag and Academic Press, Berlin and London (1st ed. 1975).
- (1987). Homogeneity and the strong Markov property. Ann. Probab. 15, 213-240.
- (1988). Spreading and predictable sampling in exchangeable sequences and processes. Ann. Probab. 16, 508-534.
- (1990). Random time change and an integral representation for marked stopping times. Probab. Th. Rel. Fields 86, 167-202.
- (1992). Some time change representations of stable integrals, via predictable transformations of local martingales. Stoch. Proc. Appl. 40, 199-223.
- (1996a). On the existence of universal functional solutions to classical SDEs. Ann. Probab. 24, 196-205.
- (1996b). Improved criteria for distributional convergence of point processes. Stoch. Proc. Appl. 64, 93-102.
- (1999a). Ballot theorems and sojourn laws for stationary processes. Ann. Probab. 27, 2011-2019.
- (1999b). Asymptotically invariant sampling and averaging from stationary-like processes. Stoch. Proc. Appl. 82, 195-204.
KALLENBERG, O., SZTENCEL, R. (1991). Some dimension-free features of vector-valued martingales. Probab. Th. Rel. Fields 88, 215-247.
KALLIANPUR, G. (1980). Stochastic Filtering Theory. Springer, NY.
KAPLAN, E.L. (1955). Transformations of stationary random sequences. Math. Scand. 3, 127-149.
KARAMATA, J. (1930). Sur une mode de croissance reguliere des fonctions. Mathematica (Cluj) 4, 38-53.
KARATZAS, I., SHREVE, S.E. (1991). Brownian Motion and Stochastic Calculus, 2nd ed.
Springer, NY.
KAZAMAKI, N. (1972). Change of time, stochastic integrals and weak martingales. Z. Wahrsch. verw. Geb. 22, 25-32.
KEMENY, J.G., SNELL, J.L., KNAPP, A.W. (1966). Denumerable Markov Chains. Van Nostrand, Princeton.
KENDALL, D.G. (1974). Foundations of a theory of random sets. In Stochastic Geometry (eds. E.F. Harding, D.G. Kendall), pp. 322-376. Wiley, NY.
KHINCHIN, A.Y. (1923). Uber dyadische Bruche. Math. Z. 18, 109-116.
- (1924). Uber einen Satz der Wahrscheinlichkeitsrechnung. Fund. Math. 6, 9-20.
- (1929). Uber einen neuen Grenzwertsatz der Wahrscheinlichkeitsrechnung. Math. Ann. 101, 745-752.
- (1933). Zur mathematischen Begrundung der statistischen Mechanik. Z. Angew. Math. Mech. 13, 101-103.
- (1933). Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer, Berlin. Repr. Chelsea, NY 1948.
- (1934). Korrelationstheorie der stationaren stochastischen Prozesse. Math. Ann. 109, 604-615.
- (1937). Zur Theorie der unbeschrankt teilbaren Verteilungsgesetze. Mat. Sb. 2, 79-119.
- (1938). Limit Laws for Sums of Independent Random Variables (Russian). Moscow.
- (1960). Mathematical Methods in the Theory of Queuing. Engl. trans., Griffin, London. (Russian orig. 1955.)
KHINCHIN, A.Y., KOLMOGOROV, A.N. (1925). Uber Konvergenz von Reihen, deren Glieder durch den Zufall bestimmt werden. Mat. Sb. 32, 668-676.
KINGMAN, J.F.C. (1964). On doubly stochastic Poisson processes. Proc. Cambridge Phil. Soc. 60, 923-930.
- (1967). Completely random measures. Pac. J. Math. 21, 59-78.
- (1968). The ergodic theory of subadditive stochastic processes. J. Roy. Statist. Soc. (B) 30, 499-510.
- (1972). Regenerative Phenomena. Wiley, NY.
- (1993). Poisson Processes. Clarendon Press, Oxford.
KINNEY, J.R. (1953). Continuity properties of Markov processes. Trans. Amer. Math. Soc. 74, 280-302.
KNIGHT, F.B. (1963). Random walks and a sojourn density process of Brownian motion. Trans. Amer. Math. Soc. 107, 56-86.
- (1971). A reduction of continuous, square-integrable martingales to Brownian motion. Lect. Notes in Math. 190, 19-31. Springer, Berlin.
KOLMOGOROV, A.N. (1928-29). Uber die Summen durch den Zufall bestimmter unabhangiger Grossen. Math. Ann. 99, 309-319; 102, 484-488.
- (1929). Uber das Gesetz des iterierten Logarithmus. Math. Ann. 101, 126-135.
- (1930). Sur la loi forte des grands nombres. C.R. Acad. Sci. Paris 191, 910-912.
- (1931a). Uber die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. 104, 415-458.
- (1931b). Eine Verallgemeinerung des Laplace-Liapounoffschen Satzes. Izv. Akad. Nauk USSR, Otdel. Matem. Estestv.
Nauk 1931, 959-962.
- (1932). Sulla forma generale di un processo stocastico omogeneo (un problema di B. de Finetti). Atti Accad. Naz. Lincei Rend. (6) 15, 805-808, 866-869.
- (1933a). Uber die Grenzwertsatze der Wahrscheinlichkeitsrechnung. Izv. Akad. Nauk USSR, Otdel. Matem. Estestv. Nauk 1933, 363-372.
- (1933b). Zur Theorie der stetigen zufalligen Prozesse. Math. Ann. 108, 149-160.
- (1933c). Foundations of the Theory of Probability (German). Springer, Berlin. Engl. trans., Chelsea, NY 1956.
- (1935). Some current developments in probability theory (in Russian). Proc. 2nd All-Union Math. Congr. 1, 349-358. Akad. Nauk SSSR, Leningrad.
- (1936a). Anfangsgrunde der Markoffschen Ketten mit unendlich vielen moglichen Zustanden. Mat. Sb. 1, 607-610.
- (1936b). Zur Theorie der Markoffschen Ketten. Math. Ann. 112, 155-160.
- (1937). Zur Umkehrbarkeit der statistischen Naturgesetze. Math. Ann. 113, 766-772.
- (1956). On Skorohod convergence. Th. Probab. Appl. 1, 213-222.
KOLMOGOROV, A.N., LEONTOVICH, M.A. (1933). Zur Berechnung der mittleren Brownschen Flache. Physik. Z. Sowjetunion 4, 1-13.
KOMLOS, J., MAJOR, P., TUSNADY, G. (1975-76). An approximation of partial sums of independent r.v.'s and the sample d.f., I-II. Z. Wahrsch. verw. Geb. 32, 111-131; 34, 33-58.
KONIG, D., MATTHES, K. (1963). Verallgemeinerung der Erlangschen Formeln, I. Math. Nachr. 26, 45-56.
KOOPMAN, B.O. (1931). Hamiltonian systems and transformations in Hilbert space. Proc. Nat. Acad. Sci. USA 17, 315-318.
KRENGEL, U. (1985). Ergodic Theorems. de Gruyter, Berlin.
KRICKEBERG, K. (1956). Convergence of martingales with a directed index set. Trans. Amer. Math. Soc. 83, 313-357.
- (1972). The Cox process. Symp. Math. 9, 151-167.
KRYLOV, N., BOGOLIOUBOV, N. (1937). La theorie generale de la mesure dans son application a l'etude des systemes de la mecanique non lineaire. Ann. Math. 38, 65-113.
KULLBACK, S., LEIBLER, R.A. (1951). On information and sufficiency. Ann. Math. Statist. 22, 79-86.
KUNITA, H. (1990). Stochastic Flows and Stochastic Differential Equations. Cambridge Univ. Press, Cambridge.
KUNITA, H., WATANABE, S. (1967). On square integrable martingales. Nagoya Math. J. 30, 209-245.
KURTZ, T.G. (1969). Extensions of Trotter's operator semigroup approximation theorems. J. Funct. Anal. 3, 354-375.
- (1975). Semigroups of conditioned shifts and approximation of Markov processes. Ann. Probab. 3, 618-642.
KWAPIEN, S., WOYCZYNSKI, W.A. (1992). Random Series and Stochastic Integrals: Single and Multiple. Birkhauser, Boston.
LANGEVIN, P. (1908). Sur la theorie du mouvement brownien. C.R. Acad. Sci. Paris 146, 530-533.
LAPLACE, P.S. DE (1774). Memoire sur la probabilite des causes par les evenemens.
Engl. trans. in Statistical Science 1, 359-378.
- (1809). Memoire sur divers points d'analyse. Repr. in Oeuvres Completes de Laplace 14, 178-214. Gauthier-Villars, Paris 1886-1912.
- (1812-20). Theorie Analytique des Probabilites, 3rd ed. Repr. in Oeuvres Completes de Laplace 7. Gauthier-Villars, Paris 1886-1912.
LAST, G., BRANDT, A. (1995). Marked Point Processes on the Real Line: The Dynamic Approach. Springer, NY.
LEADBETTER, M.R., LINDGREN, G., ROOTZEN, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer, NY.
LEBESGUE, H. (1902). Integrale, longueur, aire. Ann. Mat. Pura Appl. 7, 231-359.
- (1904). Lecons sur l'Integration et la Recherche des Fonctions Primitives. Paris.
LE CAM, L. (1957). Convergence in distribution of stochastic processes. Univ. California Publ. Statist. 2, 207-236.
LE GALL, J.F. (1983). Applications des temps locaux aux equations differentielles stochastiques unidimensionnelles. Lect. Notes in Math. 986, 15-31.
LEVI, B. (1906a). Sopra l'integrazione delle serie. Rend. Ist. Lombardo Sci. Lett. (2) 39, 775-780.
- (1906b). Sul principio di Dirichlet. Rend. Circ. Mat. Palermo 22, 293-360.
LEVY, P. (1922a). Sur le role de la loi de Gauss dans la theorie des erreurs. C.R. Acad. Sci. Paris 174, 855-857.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 174, 1682-1684.
- (1922c). Sur la determination des lois de probabilite par leurs fonctions caracteristiques. C.R. Acad. Sci. Paris 175, 854-856.
- (1924). Theorie des erreurs. La loi de Gauss et les lois exceptionnelles. Bull. Soc. Math. France 52, 49-85.
- (1925). Calcul des Probabilites. Gauthier-Villars, Paris.
- (1934-35). Sur les integrales dont les elements sont des variables aleatoires independantes. Ann. Scuola Norm. Sup. Pisa (2) 3, 337-366; 4, 217-218.
- (1935a). Proprietes asymptotiques des sommes de variables aleatoires independantes ou enchainees. J. Math. Pures Appl. (8) 14, 347-402.
- (1935b). Proprietes asymptotiques des sommes de variables aleatoires enchainees. Bull. Sci. Math. (2) 59, 84-96, 109-128.
- (1939). Sur certains processus stochastiques homogenes. Comp. Math. 7, 283-339.
- (1940). Le mouvement brownien plan. Amer. J. Math. 62, 487-550.
- (1954). Theorie de l'Addition des Variables Aleatoires, 2nd ed. Gauthier-Villars, Paris (1st ed. 1937).
- (1965). Processus Stochastiques et Mouvement Brownien, 2nd ed. Gauthier-Villars, Paris (1st ed. 1948).
LIAPOUNOV, A.M. (1901). Nouvelle forme du theoreme sur la limite des probabilites. Mem. Acad. Sci. St.
Petersbourg 12, 1-24.
LIGGETT, T.M. (1985). An improved subadditive ergodic theorem. Ann. Probab. 13, 1279-1285.
LINDEBERG, J.W. (1922a). Eine neue Herleitung des Exponentialgesetzes in der Wahrscheinlichkeitsrechnung. Math. Zeitschr. 15, 211-225.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 174, 1400-1402.
LINDVALL, T. (1973). Weak convergence of probability measures and random functions in the function space D[0, ∞). J. Appl. Probab. 10, 109-121.
- (1977). A probabilistic proof of Blackwell's renewal theorem. Ann. Probab. 5, 482-485.
- (1992). Lectures on the Coupling Method. Wiley, NY.
LIPTSER, R.S., SHIRYAEV, A.N. (2000). Statistics of Random Processes, I-II, 2nd ed. Springer, Berlin.
LOEVE, M. (1977-78). Probability Theory 1-2, 4th ed. Springer, NY (1st ed. 1955).
LOMNICKI, Z., ULAM, S. (1934). Sur la theorie de la mesure dans les espaces combinatoires et son application au calcul des probabilites: I. Variables independantes. Fund. Math. 23, 237-278.
LUKACS, E. (1970). Characteristic Functions, 2nd ed. Griffin, London.
LUNDBERG, F. (1903). Approximerad Framstallning av Sannolikhetsfunktionen. Aterforsakring av Kollektivrisker. Thesis, Uppsala.
MACKEVICIUS, V. (1974). On the question of the weak convergence of random processes in the space D[0, ∞). Lithuanian Math. Trans. 14, 620-623.
MAISONNEUVE, B. (1974). Systemes Regeneratifs. Asterisque 15. Soc. Math. de France.
MAKER, P. (1940). The ergodic theorem for a sequence of functions. Duke Math. J. 6, 27-30.
MANN, H.B., WALD, A. (1943). On stochastic limit and order relations. Ann. Math. Statist. 14, 217-226.
MARCINKIEWICZ, J., ZYGMUND, A. (1937). Sur les fonctions independantes. Fund. Math. 29, 60-90.
- (1938). Quelques theoremes sur les fonctions independantes. Studia Math. 7, 104-120.
MARKOV, A.A. (1899). The law of large numbers and the method of least squares (Russian). Izv. Fiz.-Mat. Obshch. Kazan Univ. (2) 8, 110-128.
- (1906). Extension of the law of large numbers to dependent events (Russian). Bull. Soc. Phys. Math. Kazan (2) 15, 135-156.
MARUYAMA, G. (1954). On the transition probability functions of the Markov process. Natl. Sci. Rep. Ochanomizu Univ. 5, 10-20.
- (1955). Continuous Markov processes and stochastic equations. Rend. Circ. Mat. Palermo 4, 48-90.
MARUYAMA, G., TANAKA, H. (1957). Some properties of one-dimensional diffusion processes. Mem. Fac. Sci. Kyushu Univ. 11, 117-141.
MATHERON, G. (1975). Random Sets and Integral Geometry. Wiley, London.
MATTHES, K. (1963). Stationare zufallige Punktfolgen, I. Jahresber. Deutsch. Math.-Verein. 66, 66-79.
MATTHES, K., KERSTAN, J., MECKE, J. (1978). Infinitely Divisible Point Processes.
Wiley, Chichester. (German ed. 1974, Russian ed. 1982.)
McKEAN, H.P. (1969). Stochastic Integrals. Academic Press, NY.
McKEAN, H.P., TANAKA, H. (1961). Additive functionals of the Brownian path. Mem. Coll. Sci. Univ. Kyoto, A 33, 479-506.
McMILLAN, B. (1953). The basic theorems of information theory. Ann. Math. Statist. 24, 196-219.
MECKE, J. (1967). Stationare zufallige Maße auf lokalkompakten Abelschen Gruppen. Z. Wahrsch. verw. Geb. 9, 36-58.
- (1968). Eine charakteristische Eigenschaft der doppelt stochastischen Poissonschen Prozesse. Z. Wahrsch. verw. Geb. 11, 74-81.
MELEARD, S. (1986). Application du calcul stochastique a l'etude des processus de Markov reguliers sur [0,1]. Stochastics 19, 41-82.
METIVIER, M. (1982). Semimartingales: A Course on Stochastic Processes. de Gruyter, Berlin.
METIVIER, M., PELLAUMAIL, J. (1980). Stochastic Integration. Academic Press, NY.
MEYER, P.A. (1962). A decomposition theorem for supermartingales. Illinois J. Math. 6, 193-205.
- (1963). Decomposition of supermartingales: The uniqueness theorem. Illinois J. Math. 7, 1-17.
- (1966). Probability and Potentials. Engl. trans., Blaisdell, Waltham.
- (1967). Integrales stochastiques, I-IV. Lect. Notes in Math. 39, 72-162. Springer, Berlin.
- (1971). Demonstration simplifiee d'un theoreme de Knight. Lect. Notes in Math. 191, 191-195. Springer, Berlin.
- (1976). Un cours sur les integrales stochastiques. Lect. Notes in Math. 511, 245-398. Springer, Berlin.
MILLAR, P.W. (1968). Martingale integrals. Trans. Amer. Math. Soc. 133, 145-166.
MINKOWSKI, H. (1907). Diophantische Approximationen. Teubner, Leipzig.
MITOMA, I. (1983). Tightness of probabilities on C([0,1]; S') and D([0,1]; S'). Ann. Probab. 11, 989-999.
DE MOIVRE, A. (1711-12). On the measurement of chance. Engl. trans., Int. Statist. Rev. 52 (1984), 229-262.
- (1718-56). The Doctrine of Chances; or, a Method of Calculating the Probability of Events in Play, 3rd ed. (post.) Repr. Case and Chelsea, London and NY 1967.
- (1733-56). Approximatio ad Summam Terminorum Binomii (a+b)^n in Seriem Expansi. Translated and edited in The Doctrine of Chances, 2nd and 3rd eds. Repr. Case and Chelsea, London and NY 1967.
MONCH, G. (1971). Verallgemeinerung eines Satzes von A. Renyi. Studia Sci. Math. Hungar. 6, 81-90.
MOTOO, M., WATANABE, H. (1958). Ergodic property of recurrent diffusion process in one dimension. J. Math. Soc. Japan 10, 272-286.
NAWROTZKI, K. (1962). Ein Grenzwertsatz fur homogene zufallige Punktfolgen (Verallgemeinerung eines Satzes von A. Renyi). Math. Nachr.
24, 201-217.
VON NEUMANN, J. (1932). Proof of the quasi-ergodic hypothesis. Proc. Natl. Acad. Sci. USA 18, 70-82.
- (1940). On rings of operators, III. Ann. Math. 41, 94-161.
NEVEU, J. (1971). Mathematical Foundations of the Calculus of Probability. Holden-Day, San Francisco.
- (1975). Discrete-Parameter Martingales. North-Holland, Amsterdam.
NGUYEN, X.X., ZESSIN, H. (1979). Ergodic theorems for spatial processes. Z. Wahrsch. verw. Geb. 48, 133-158.
NIKODYM, O.M. (1930). Sur une generalisation des integrales de M. J. Radon. Fund. Math. 15, 131-179.
614 Foundations of Modern Probability
NORBERG, T. (1984). Convergence and existence of random set distributions. Ann. Probab. 12, 726-732.
NOVIKOV, A.A. (1971). On moment inequalities for stochastic integrals. Th. Probab. Appl. 16, 538-541.
- (1972). On an identity for stochastic integrals. Th. Probab. Appl. 17, 717-720.
NUALART, D. (1995). The Malliavin Calculus and Related Topics. Springer, NY.
ØKSENDAL, B. (1998). Stochastic Differential Equations, 5th ed. Springer, Berlin.
OREY, S. (1959). Recurrent Markov chains. Pacific J. Math. 9, 805-827.
- (1962). An ergodic theorem for Markov chains. Z. Wahrsch. verw. Geb. 1, 174-176.
- (1966). F-processes. Proc. 5th Berkeley Symp. Math. Statist. Probab. 2:1, 301-313.
- (1971). Limit Theorems for Markov Chain Transition Probabilities. Van Nostrand, London.
ORNSTEIN, D.S. (1969). Random walks. Trans. Amer. Math. Soc. 138, 1-60.
ORNSTEIN, L.S., UHLENBECK, G.E. (1930). On the theory of Brownian motion. Phys. Review 36, 823-841.
OSOSKOV, G.A. (1956). A limit theorem for flows of homogeneous events. Th. Probab. Appl. 1, 248-255.
OTTAVIANI, G. (1939). Sulla teoria astratta del calcolo delle probabilita proposta dal Cantelli. Giorn. Ist. Ital. Attuari 10, 10-40.
PALEY, R.E.A.C. (1932). A remarkable series of orthogonal functions I. Proc. London Math. Soc. 34, 241-264.
PALEY, R.E.A.C., WIENER, N. (1934). Fourier transforms in the complex domain. Amer. Math. Soc. Coll. Publ. 19.
PALEY, R.E.A.C., WIENER, N., ZYGMUND, A. (1933). Notes on random functions. Math. Z. 37, 647-668.
PALM, C. (1943). Intensity Variations in Telephone Traffic (German). Ericsson Technics 44, 1-189. Engl. trans., North-Holland Studies in Telecommunication 10, Elsevier 1988.
PAPANGELOU, F. (1972). Integrability of expected increments of point processes and a related random change of scale. Trans. Amer. Math. Soc. 165, 486-506.
PARTHASARATHY, K.R. (1967). Probability Measures on Metric Spaces. Academic Press, NY.
PERKINS, E. (1982).
Local time and pathwise uniqueness for stochastic differential equations. Lect. Notes in Math. 920, 201-208. Springer, Berlin.
PETROV, V.V. (1995). Limit Theorems of Probability Theory. Clarendon Press, Oxford.
PHILLIPS, H.B., WIENER, N. (1923). Nets and the Dirichlet problem. J. Math. Phys. 2, 105-124.
PITT, H.R. (1942). Some generalizations of the ergodic theorem. Proc. Camb. Phil. Soc. 38, 325-343.
POINCARE, H. (1890). Sur les equations aux derivees partielles de la physique mathematique. Amer. J. Math. 12, 211-294.
- (1899). Theorie du Potentiel Newtonien. Gauthier-Villars, Paris.
POISSON, S.D. (1837). Recherches sur la Probabilite des Jugements en Matiere Criminelle et en Matiere Civile, Precedees des Regles Generales du Calcul des Probabilites. Bachelier, Paris.
POLLACZEK, F. (1930). Uber eine Aufgabe der Wahrscheinlichkeitstheorie, I-II. Math. Z. 32, 64-100, 729-750.
POLLARD, D. (1984). Convergence of Stochastic Processes. Springer, NY.
POLYA, G. (1920). Uber den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung und das Momentenproblem. Math. Z. 8, 171-181.
- (1921). Uber eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irrfahrt im Strassennetz. Math. Ann. 84, 149-160.
PORT, S.C., STONE, C.J. (1978). Brownian Motion and Classical Potential Theory. Academic Press, NY.
POSPISIL, B. (1935-36). Sur un probleme de M.M.S. Bernstein et A. Kolmogoroff. Casopis Pest. Mat. Fys. 65, 64-76.
PROHOROV, Y.V. (1956). Convergence of random processes and limit theorems in probability theory. Th. Probab. Appl. 1, 157-214.
- (1959). Some remarks on the strong law of large numbers. Th. Probab. Appl. 4, 204-208.
- (1961). Random measures on a compactum. Soviet Math. Dokl. 2, 539-541.
PROTTER, P. (1990). Stochastic Integration and Differential Equations. Springer, Berlin.
PUKHALSKY, A.A. (1991). On functional principle of large deviations. In New Trends in Probability and Statistics (V. Sazonov and T. Shervashidze, eds.), 198-218. VSP Mokslas, Moscow.
RADON, J. (1913). Theorie und Anwendungen der absolut additiven Mengenfunktionen. Wien Akad. Sitzungsber. 122, 1295-1438.
RAO, K.M. (1969a). On decomposition theorems of Meyer. Math. Scand. 24, 66-78.
- (1969b). Quasimartingales. Math. Scand. 24, 79-92.
RAY, D.B. (1956). Stationary Markov processes with continuous paths. Trans. Amer. Math. Soc. 82, 452-493.
- (1963). Sojourn times of a diffusion process. Illinois J. Math. 7, 615-630.
RENYI, A. (1956). A characterization of Poisson processes. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 1, 519-527.
- (1967).
Remarks on the Poisson process. Studia Sci. Math. Hung. 2, 119-123.
REVUZ, D. (1970). Mesures associees aux fonctionnelles additives de Markov, I-II. Trans. Amer. Math. Soc. 148, 501-531; Z. Wahrsch. verw. Geb. 16, 336-344.
- (1984). Markov Chains, 2nd ed. North-Holland, Amsterdam.
REVUZ, D., YOR, M. (1999). Continuous Martingales and Brownian Motion, 3rd ed. Springer, Berlin.
RIESZ, F. (1909a). Sur les suites de fonctions mesurables. C.R. Acad. Sci. Paris 148, 1303-1305.
- (1909b). Sur les operations fonctionelles lineaires. C.R. Acad. Sci. Paris 149, 974-977.
- (1910). Untersuchungen uber Systeme integrierbarer Funktionen. Math. Ann. 69, 449-497.
- (1926-30). Sur les fonctions subharmoniques et leur rapport a la theorie du potentiel, I-II. Acta Math. 48, 329-343; 54, 321-360.
ROGERS, C.A., SHEPHARD, G.C. (1958). Some extremal problems for convex bodies. Mathematika 5, 93-102.
ROGERS, L.C.G., WILLIAMS, D. (2000a/b). Diffusions, Markov Processes, and Martingales, 1 (2nd ed.); 2. Cambridge Univ. Press.
ROSEN, B. (1964). Limit theorems for sampling from a finite population. Ark. Mat. 5, 383-424.
ROSINSKI, J., WOYCZYNSKI, W.A. (1986). On Ito stochastic integration with respect to p-stable motion: Inner clock, integrability of sample paths, double and multiple integrals. Ann. Probab. 14, 271-286.
ROYDEN, H.L. (1988). Real Analysis, 3rd ed. Macmillan, NY.
RUTHERFORD, E., GEIGER, H. (1908). An electrical method of counting the number of particles from radioactive substances. Proc. Roy. Soc. A 81, 141-161.
RYLL-NARDZEWSKI, C. (1957). On stationary sequences of random variables and the de Finetti's [sic] equivalence. Colloq. Math. 4, 149-156.
- (1961). Remarks on processes of calls. Proc. 4th Berkeley Symp. Math. Statist. Probab. 2, 455-465.
SANOV, I.N. (1957). On the probability of large deviations of random variables (Russian). Engl. trans.: Sel. Trans. Math. Statist. Probab. 1 (1961), 213-244.
SCHILDER, M. (1966). Some asymptotic formulae for Wiener integrals. Trans. Amer. Math. Soc. 125, 63-85.
SCHOENBERG, I.J. (1938). Metric spaces and completely monotone functions. Ann. Math. 39, 811-841.
SCHRODINGER, E. (1931). Uber die Umkehrung der Naturgesetze. Sitzungsber. Preuss. Akad. Wiss. Phys. Math. Kl. 144-153.
VAN SCHUPPEN, J.H., WONG, E. (1974). Transformation of local martingales under a change of law. Ann. Probab. 2, 879-888.
SEGAL, I.E. (1954). Abstract probability spaces and a theorem of Kolmogorov. Amer. J. Math. 76, 721-732.
SHANNON, C.E. (1948).
A mathematical theory of communication. Bell System Tech. J. 27, 379-423, 623-656.
SHARPE, M. (1988). General Theory of Markov Processes. Academic Press, Boston.
SHIRYAEV, A.N. (1995). Probability, 2nd ed. Springer, NY.
SHUR, M.G. (1961). Continuous additive functionals of a Markov process. Dokl. Akad. Nauk SSSR 137, 800-803.
SIERPINSKI, W. (1928). Un theoreme general sur les familles d'ensembles. Fund. Math. 12, 206-210.
SKOROHOD, A.V. (1956). Limit theorems for stochastic processes. Th. Probab. Appl. 1, 261-290.
- (1957). Limit theorems for stochastic processes with independent increments. Th. Probab. Appl. 2, 122-142.
- (1961-62). Stochastic equations for diffusion processes in a bounded region, I-II. Th. Probab. Appl. 6, 264-274; 7, 3-23.
- (1965). Studies in the Theory of Random Processes. Addison-Wesley, Reading, MA. (Russian orig. 1961.)
SLIVNYAK, I.M. (1962). Some properties of stationary flows of homogeneous random events. Th. Probab. Appl. 7, 336-341.
SLUTSKY, E.E. (1937). Qualche proposizione relativa alla teoria delle funzioni aleatorie. Giorn. Ist. Ital. Attuari 8, 183-199.
SNELL, J.L. (1952). Application of martingale system theorems. Trans. Amer. Math. Soc. 73, 293-312.
SOVA, M. (1967). Convergence d'operations lineaires non bornees. Rev. Roumaine Math. Pures Appl. 12, 373-389.
SPARRE-ANDERSEN, E. (1953-54). On the fluctuations of sums of random variables, I-II. Math. Scand. 1, 263-285; 2, 195-223.
SPARRE-ANDERSEN, E., JESSEN, B. (1948). Some limit theorems on set-functions. Danske Vid. Selsk. Mat.-Fys. Medd. 25:5 (8 pp.).
SPITZER, F. (1964). Electrostatic capacity, heat flow, and Brownian motion. Z. Wahrsch. verw. Geb. 3, 110-121.
- (1976). Principles of Random Walk, 2nd ed. Springer, NY.
STIELTJES, T.J. (1894-95). Recherches sur les fractions continues. Ann. Fac. Sci. Toulouse 8, 1-122; 9, 1-47.
STONE, C.J. (1963). Weak convergence of stochastic processes defined on a semi-infinite time interval. Proc. Amer. Math. Soc. 14, 694-696.
- (1969). On the potential operator for one-dimensional recurrent random walks. Trans. Amer. Math. Soc. 136, 427-445.
STONE, M.H. (1932). Linear transformations in Hilbert space and their applications to analysis. Amer. Math. Soc. Coll. Publ. 15.
STOUT, W.F. (1974). Almost Sure Convergence. Academic Press, NY.
STRASSEN, V. (1964). An invariance principle for the law of the iterated logarithm. Z. Wahrsch. verw. Geb. 3, 211-226.
STRATONOVICH, R.L. (1966).
A new representation for stochastic integrals and equations. SIAM J. Control 4, 362-371.
STRICKER, C., YOR, M. (1978). Calcul stochastique dependant d'un parametre. Z. Wahrsch. verw. Geb. 45, 109-133.
STROOCK, D.W. (1993). Probability Theory: An Analytic View. Cambridge Univ. Press.
STROOCK, D.W., VARADHAN, S.R.S. (1969). Diffusion processes with continuous coefficients, I-II. Comm. Pure Appl. Math. 22, 345-400, 479-530.
- (1979). Multidimensional Diffusion Processes. Springer, Berlin.
SUCHESTON, L. (1983). On one-parameter proofs of almost sure convergence of multiparameter processes. Z. Wahrsch. verw. Geb. 63, 43-49.
TAKACS, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes. Wiley, NY.
TANAKA, H. (1963). Note on continuous additive functionals of the 1-dimensional Brownian path. Z. Wahrsch. verw. Geb. 1, 251-257.
TEMPEL'MAN, A.A. (1972). Ergodic theorems for general dynamical systems. Trans. Moscow Math. Soc. 26, 94-132.
THORISSON, H. (1996). Transforming random elements and shifting random fields. Ann. Probab. 24, 2057-2064.
- (2000). Coupling, Stationarity, and Regeneration. Springer, NY.
TONELLI, L. (1909). Sull'integrazione per parti. Rend. Acc. Naz. Lincei (5) 18, 246-253.
TROTTER, H.F. (1958a). Approximation of semi-groups of operators. Pacific J. Math. 8, 887-919.
- (1958b). A property of Brownian motion paths. Illinois J. Math. 2, 425-433.
VARADARAJAN, V.S. (1958). Weak convergence of measures on separable metric spaces; On the convergence of sample probability distributions. Sankhya 19, 15-26.
- (1963). Groups of automorphisms of Borel spaces. Trans. Amer. Math. Soc. 109, 191-220.
VARADHAN, S.R.S. (1966). Asymptotic probabilities and differential equations. Comm. Pure Appl. Math. 19, 261-286.
- (1984). Large Deviations and Applications. SIAM, Philadelphia.
VILLE, J. (1939). Etude Critique de la Notion du Collectif. Gauthier-Villars, Paris.
VITALI, G. (1905). Sulle funzioni integrali. Atti R. Accad. Sci. Torino 40, 753-766.
VOLKONSKY, V.A. (1958). Random time changes in strong Markov processes. Th. Probab. Appl. 3, 310-326.
- (1960). Additive functionals of Markov processes. Trudy Mosk. Mat. Obshc. 9, 143-189.
WALD, A. (1946). Differentiation under the integral sign in the fundamental identity of sequential analysis. Ann. Math. Statist. 17, 493-497.
- (1947). Sequential Analysis. Wiley, NY.
WALSH, J.B. (1978). Excursions and local time. Asterisque 52-53, 159-192.
WANG, A.T. (1977). Generalized Ito's formula and additive functionals of Brownian motion. Z. Wahrsch. verw. Geb. 41, 153-159.
WATANABE, H. (1964).
Potential operator of a recurrent strong Feller process in the strict sense and boundary value problem. J. Math. Soc. Japan 16, 83-95.
WATANABE, S. (1964). On discontinuous additive functionals and Levy measures of a Markov process. Japan. J. Math. 34, 53-79.
- (1968). A limit theorem of branching processes and continuous state branching processes. J. Math. Kyoto Univ. 8, 141-167.
WEIL, A. (1940). L'integration dans les Groupes Topologiques et ses Applications. Hermann et Cie, Paris.
WIENER, N. (1923). Differential space. J. Math. Phys. 2, 131-174.
- (1938). The homogeneous chaos. Amer. J. Math. 60, 897-936.
- (1939). The ergodic theorem. Duke Math. J. 5, 1-18.
WILLIAMS, D. (1991). Probability with Martingales. Cambridge Univ. Press.
YAMADA, T. (1973). On a comparison theorem for solutions of stochastic differential equations and its applications. J. Math. Kyoto Univ. 13, 497-512.
YAMADA, T., WATANABE, S. (1971). On the uniqueness of solutions of stochastic differential equations. J. Math. Kyoto Univ. 11, 155-167.
YOEURP, C. (1976). Decompositions des martingales locales et formules exponentielles. Lect. Notes in Math. 511, 432-480. Springer, Berlin.
YOR, M. (1978). Sur la continuite des temps locaux associes a certaines semimartingales. Asterisque 52-53, 23-36.
YOSIDA, K. (1948). On the differentiability and the representation of one-parameter semigroups of linear operators. J. Math. Soc. Japan 1, 15-21.
YOSIDA, K., KAKUTANI, S. (1939). Birkhoff's ergodic theorem and the maximal ergodic theorem. Proc. Imp. Acad. 15, 165-168.
ZAHLE, M. (1980). Ergodic properties of general Palm measures. Math. Nachr. 95, 93-106.
ZAREMBA, S. (1909). Sur le principe du minimum. Bull. Acad. Sci. Cracovie.
ZYGMUND, A. (1951). An individual ergodic theorem for noncommutative transformations. Acta Sci. Math. (Szeged) 14, 103-110.
Symbol Index C, 378 Co, Co, 369, 374 C k , 340 Ct:, 98, 225 Cb(8), 65 C(K, S), 307 cf, 481 cov(, 1]), cov[; A], 50, 302 FD, Fv, 480 F 0 Q, 2 F v g, V n Fn, 50 FJlQ, FJlgH, 50,109 f, 11 1-1, 3 I:, I:;, 340 I . A, 442 fog, 5 /0g, 262 (f, g), I -l g, 17 f . J-l, 12 I >- U, I -< V, 36 L, 567 , 308 l.pB, 324 A, 499 An, 391 A>', 372, 442 A C , A \ B, ALlB, 1 A X B, 2 A, AIL, 13, 46 IBI, 187 BO B- 541 , , 8, B(8), 2 GD,gD, 477 l'f, 481 Dh, V h , 434 D(R+, S), 313, 563 D([O, 1], S), 319 Ll, \7, 1, 287, 375, 377, 483 8, 8, 150, 187, 473 6 x , 8 :: , 48 d -+, 65 HI, Hoo, 543, 553 H@n 262 , H, 480 ha,b, 456 H(), H(IF), H(vlll), 220,554 E, 48, 225 E , 443 Ex, EJ.l' 145 E[; A], 49 E[IF] == E:F, 104 £, En, 263, 335 £(X), 363, 522 I, I(B), 368, 545 In, 263 I(), I(IF), 220 Tfa, 181, 189 id, Id, 295 K, 324 KD, lC;, 480 Fn, 75 II F II  , 33 F, 120, 324 F , 124 F+, 121 Fr, 120 Fr- , 491 Foo, 132 Lt, Lf, 430, 436, 446 L  , 481 LP, Lfoc' 15, 36; L(X), L(X), 3:6-337, 344, 526 L 2 (M), 517 L 2 ( 1] ) , 266 .c(), 47 A, 24 A, A"', 539, 554 
622 Foundations of Modern Probability M, MJ, 184,391 (M), (M, N), 280, 516 M, Mo, 526 M 2 , M5, Mfoc, 331,515-516 M(S), 19, 225  524 , jj, J-L, 227 J..lt, J-Ls,t, 144 J.L, 481 J..lJ, 10 J..loJ-l, 10 J-L*l/, 15 J..ll/, J.L  l/, 14, 20, 142 J..l 1- l/, J.L « l/, J.L rv l/, 13, 29, 363 J..l V l/, J.L 1\ l/, 29 S, 377 s..n, "Snf, 184, 391, 396 8, 81-" 225, 316, 324, 564 a{.}, 2,5 supp J-L, 9 Tt, T/', 368, 372 TA, TB, 123, 492 Ta, Ta,b, 455 [r], _ 492 ()t, (), 146, 179, 189, 391 U, U Q , Uh, UA, U1, 402,442-443 N(m,a 2 ), 90 N(S), 226 N, 2 (n//k), 187 l/, 290, 435 l/ A , 442 v . X, 128, 336, 517-518, 526 v , 98, 564 var(), var[c;; A], 50, 71 wf, w(J, h), w(J, t, h), 57, 274, 310, 562 w(j,t,h), 563 w , 65 n, w, 46 n T 2 , XC, X d , 527 X r , 128 X., X;, 129 XodY, 342 [X], [X, Y], 280, 332, 519 {, 436  , 503 . , 226 , n, 190, 538 P, 46 P , 509 Px, PJ.l' 145,391 Po-l, 47 P[AIF] = p:F A, 106 P(S), 19 Pa,b, 456 P, Pj, 151, 243 Pt, PP, 475-476 p , 63, 408 1fB, 1fJ, 1ft, 19,47,225-226,316 Z, 432 Z,Z+, 6,59 (, (D, 380, 473 Qx,, Q'x,, 203, 209 Q, Q+, 98, 125 0, 1 [[0,1), 464 1, 58 lA, 1{.}, 5,46 2 8 1 , <, 57 ,--... II . II, II . II p , 15, 152, 369 R).., 370 R, IR+, JR , R +, 2, 5 Tx,y, 149 
Author Index
Abel, N.H., 15, 144, 147, 242
Adler, R.J., 580
Akcoglu, M.A., 586
Aldous, D.J., 197, 314, 577-578, 583, 587
Alexandrov, A.D., 75, 571
Andre, D., 165, 223, 575, 578
Arndt, D., 604
Arzela, C., 307, 310-311, 559, 563
Ascoli, G., 307, 310-311, 559, 563
Athreya, K.B., 576
Baccelli, F., 578, 592
Bachelier, L., 256, 574-575, 580, 590
Bahadur, R.R., 594
Banach, S., 49, 369, 534, 585
Barbier, E., 578
Bass, R.F., 591
Bauer, H., 595
Baxter, G., 159, 169, 576
Bayes, T., 578
Belyaev, Y.K., 583
Berbee, H.C.P., 197, 577, 587
Bernoulli, J., 46, 55-56, 539, 571, 580
Bernstein, S.N., 128, 247, 572-573, 587
Bertoin, J., 582
Bertrand, J., 223, 578
Bessel, F.W., 256
Bichteler, K., 533, 593
Bienayme, J., 63, 69, 571
Billingsley, P., 569, 571, 573, 578, 581, 583, 595
Birkhoff, G.D., 178, 181, 391, 393, 576
Blackwell, D., 172, 575-576, 586
Blumenthal, R.M., 380-381, 446, 501, 575, 586, 588-589, 591-592
Bochner, S., 100, 261, 572, 580
Bogolioubov, N., 196, 577
Bohl, 200
Boltzmann, L., 576
Borel, E., 2, 3, 7, 24-25, 45, 47, 55, 119, 131, 308, 561, 569-571, 574
Brandt, A., 592
Breiman, L., 221, 561, 578, 589
Bremaud, P., 578, 592
Brown, R., 252-253, 439, 580
Bryc, W., 547, 594
Buhlmann, H., 217, 578
Buniakovsky, V.Y., 17, 334, 569
Burkholder, D.L., 333, 524, 584, 591, 593
Cameron, R.H., 364, 537, 543, 584-585
Cantelli, F.P., 45, 47, 55, 75, 119, 131, 570-571, 574
Caratheodory, C., 24, 26, 570
Carleson, L., 578
Cauchy, A.L., 16-17, 65, 238, 304, 334, 470-472, 569
Chacon, R.V., 393, 586
Chapman, S., 140, 142-143, 145, 154, 367-368, 378, 574
Chebyshev, P.L., 63, 69, 571-572
Chentsov, N.N., 57, 313, 571, 583
Chernoff, H., 540, 583, 594
Choquet, G., 483, 486-487, 562, 577, 583, 591
Chow, Y.S., 572, 574
Chung, K.L., 162, 221, 272, 381, 481, 575-576, 578-579, 584, 590-591
Courant, R., 590
Courrege, P., 334, 517, 584, 592
Cox, D., 224, 226-228, 230-233, 246, 317-319, 327-328, 579, 583
Cramer, H., 87, 261, 540, 571-572, 577, 580, 594
Csorgo, M., 581
Daley, D.J., 578-579
Dambis, K.E.,
352, 585
Daniell, P.J., 23, 104, 114, 570, 573
Davis, B.J., 524, 593, 598
Dawson, D.A., 551, 594
Day, M.M., 201, 576
Debes, H., 583
624 Foundations of Modern Probability
Dellacherie, C., 357, 533, 562, 574-575, 577, 580, 586, 589, 591-593, 595
Dembo, A., 594
Derman, C., 464, 589
Deuschel, J.D., 594
Dini, U., 541
Dirac, P., 8
Dirichlet, P.G.L., 470, 474, 590
Doeblin, W., 299, 303, 574-575, 579, 582, 586, 589
Dohler, R., 573
Doleans(-Dade), C., 345, 493, 496-497, 518, 522, 584, 591-593
Donsker, M.D., 275, 312, 319, 555, 581-582, 594
Doob, J.L., 7, 109-110, 124, 126-127, 129-131, 134-136, 138, 237, 358, 474, 490, 493, 495, 507, 509, 569, 571, 573-575, 577-579, 581, 584-587, 590-592
Dubins, L.E., 352, 581, 585
Dudley, R.M., 79, 563, 569, 572, 583, 594
Dunford, N., 69, 392, 571, 586, 591
Durrett, R., 581
Dvoretzky, A., 281, 581
Dynkin, E.B., 380, 382-384, 456, 569, 574-575, 577, 585-589
Egorov, D., 18
Einstein, A., 580
Elliott, R.J., 592
Ellis, R.S., 594
Engelbert, H.J., 450-451, 589
Erdos, P., 276, 576, 581
Erlang, A.K., 234, 579
Ethier, S.N., 563, 575, 583, 586, 588, 595
Faber, G., 571
Farrell, R.H., 195, 577
Fatou, P., 11, 46, 67, 569
Fell, J.M.G., 324, 470, 565-567, 595
Feller, W., 92-93, 96, 165, 172, 302, 367, 369-387, 400, 405-409, 421, 442, 456, 458, 462, 465, 501, 572, 575-576, 578-580, 582, 585-587, 589-590
Fenchel, W., 537, 539, 554, 594
Feynman, R.P., 470-471, 590
Fichtner, K.H., 246
de Finetti, B., 202, 212, 578, 581
Fisk, D.L., 339, 342, 426, 584, 593
Fortet, R., 586
Fourier, J.B.J., 90, 100, 163, 262, 575
Franken, P., 578
Frechet, M., 569
Freedman, D., 251, 575, 580, 586, 589, 594
Freidlin, M.I., 537, 553, 594
Friedrichs, K., 599
Frostman, O., 590
Fubini, G., 14, 52, 108, 569
Fuchs, W.H.J., 162, 575
Furstenberg, H., 193, 577
Galmarino, A.R., 355, 585
Garsia, A.M., 182, 502, 525, 576, 591
Gartner, J., 551, 594
Gauss, C.F., 90-96, 250-254, 260-263, 266, 351, 473, 539, 579, 590
Geiger, H., 579
Getoor, R.K., 446, 575, 585, 588-589, 591-592
Gihman, I.I., 587
Girsanov, I.V., 362, 365, 515, 585, 587, 592
Glivenko, V.I., 75, 571
Gnedenko, B.V., 303, 582
Goldman, J.R., 583
Goldstein, J.A.,
586
Goldstein, S., 197, 577, 586
Grandell, J., 579
Green, G., 458, 470, 475, 477-486, 590
Greenwood, P., 588
Griffeath, D., 577, 586
Grigelionis, B., 503, 583, 592
Gronwall, 415, 455, 554
Gundy, R.F., 333, 524, 584, 598
Haar, A., 23, 39, 41, 198, 570
Hagberg, J., 583
Hahn, H., 28, 33, 35, 49, 534, 570
Hajek, J., 583
Hall, P., 581
Halmos, P.R., 569, 573
Hardy, G.H., 184
Harris, T.E., 400, 405-406, 408, 410, 583, 586-587
Hartman, P., 275, 581
Hausdorff, F., 36, 247, 311, 399, 563
Heine, H.E., 25
Helly, E., 98, 572
Hermite, C., 84, 265-266
Hewitt, E., 45, 53, 161, 570
Heyde, C.C., 581
Hilbert, D., 104, 188, 251, 260, 262-263, 265-266, 331, 351, 515, 543
Hille, E., 367, 375, 585
Hitczenko, P., 592
Holder, O., 15, 49, 57, 109, 252, 268, 313, 426, 448, 569-570, 586, 589
Hopf, E., 159, 168, 392, 586
Horowitz, J., 588
Hunt, G.A., 124, 256, 443, 476, 580, 586, 588-591
Hurewicz, W., 586
Hurwitz, A., 570
Ikeda, N., 584, 588
Ioffe, D., 549, 594
Ionescu Tulcea, A., 221, 578
Ionescu Tulcea, C.T., 104, 116, 573
Ito, K., 263, 265, 287, 336, 339-341, 357-358, 415, 431, 435-436, 458, 520, 571, 579-580, 582, 584-585, 587-591, 593
Jacod, J., 503, 518, 524, 563, 582-583, 585, 588, 592-593, 595
Jagers, P., 583
Jamison, B., 586
Jensen, J.L.W.V., 49, 109, 570
Jessen, B., 132, 574
Johnson, W.B., 593
Jordan, C., 33, 570
Kac, M., 276, 470-471, 577, 581, 590
Kakutani, S., 182, 354, 474, 575-576, 584, 586, 590
Kallenberg, O., 571, 573, 575, 578-579, 582-585, 588, 592-593
Kallianpur, G., 580
Kaplan, E.L., 206, 577
Karamata, J., 96, 572
Karatzas, I., 580, 584-585, 588-591
Kazamaki, N., 344, 584
Kemeny, J.G., 575
Kendall, D., 583
Kerstan, J., 600, 612
Kesten, H., 193, 577
Khinchin, A., 70, 96, 259, 290, 302, 537, 571, 576-578, 580, 582-583, 586, 593
Kingman, J.F.C., 178, 192, 577, 579
Kinney, J.R., 379, 571, 586
Knapp, A.W., 608
Knight, F.B., 355, 428, 440, 585, 588
Koebe, P., 473
Kolmogorov, A.N.,
53, 57, 69-71, 73, 104, 115, 132, 142-143, 145, 152, 154, 242, 291, 313, 368, 371, 471, 563, 570-571, 573-574, 576, 579-582, 585, 589-590, 593
Komlos, J., 581
Konig, D., 207, 578, 604
Koopman, B.O., 576
Korolyuk, V.S., 207, 578
Krengel, U., 577, 587
Krickeberg, K., 579, 593
Kronecker, L., 62, 73
Krylov, N., 196, 577
Kullback, S., 594
Kunita, H., 336, 347, 517, 521, 584, 587, 592-593
Kuratowski, K., 562
Kurtz, T.G., 385, 563, 575, 583, 586, 588, 595
Kuznetsov, S.E., 591
Kwapien, S., 572
Langevin, P., 414, 580, 587
Laplace, P.S. de, 84, 86, 88, 100, 227, 370, 375, 473, 572-573, 579, 590
Last, G., 592
Leadbetter, M.R., 571, 577, 582
Lebesgue, H., 11-12, 14, 24-25, 27, 29, 31, 55, 569-570
Le Cam, L., 583
van Leeuwenhoek, A., 580
Le Gall, J.F., 589
Legendre, A.M., 537, 539, 554, 594
Leibler, R.A., 594
Leontovich, M.A., 590
Levi, B., 11, 569
Levy, P., 71, 86, 90, 93, 96, 100, 128, 131-132, 234, 252, 255, 258, 285-287, 290-292, 294, 298-299, 352-354, 374, 430, 436, 571-576, 579-582, 584, 586-588, 593
Lewy, H., 599
Liapounov, A.M., 572
Liemant, A., 600
Liggett, T., 577
Lindeberg, J.W., 90, 92, 572, 586
Lindgren, G., 610
Lindvall, T., 575-576, 583, 587
Lipschitz, R., 268, 415, 453-455, 553, 589
Liptser, R.S., 611
Littlewood, J.E., 184
Loeve, M., 57, 569, 571-572, 576-577, 582
Lomnicki, Z., 117, 573
Lukacs, E., 572
Lundberg, F., 579
Lusin, N.N., 19, 562
Mackevicius, V., 385, 586
Maisonneuve, B., 588, 600
Major, P., 609
Maker, P., 183, 577
Mann, H.B., 76, 571
Marcinkiewicz, J., 73, 571, 593
Markov, A.A., 63, 140-155, 237-245, 254, 256, 368, 378, 380, 387, 391, 396, 421, 571, 573-574
Martin, W.T., 364, 537, 543, 584-585
Maruyama, G., 465, 585, 589
Matheron, G., 583, 591, 595
Matthes, K., 207, 578-579, 583, 600
Maxwell, J.C., 251, 579
McDonald, D., 596
McKean, H.P., 447, 458, 580, 588-590
McMillan, B., 221, 578
Mecke, J., 319, 579, 583, 612
Meleard, S., 460, 589
Memin, J., 518
Metivier, M., 592
Meyer, P.A., 136, 431, 493-494, 498, 501, 505, 510, 518, 526-527, 562, 574-575, 577, 584, 586, 588, 591-593, 595
Millar, P.W., 333, 584
Minkowski, H., 15-16, 109, 183, 190-191, 263, 569
Mitoma, I., 583
de Moivre, A., 572, 579
Monch, G., 579
de Morgan, A., 1
Motoo, M., 464, 589
Nawrotzski, K., 583
von Neumann, J., 200, 570, 573, 576
Neveu, J., 221, 502, 574, 587, 591
Newton, I., 474, 488
Ney, P., 596
Nguyen, X.X., 190, 577
Nikodym, O.M., 29, 31, 105, 570, 573
Norberg, T., 325, 583
Novikov, A.A., 333, 364, 584-585
Nualart, D., 580
Øksendal, B., 584
Orey, S., 152, 172, 397, 400, 576, 586-587, 593
Ornstein, D.S., 162, 393, 575, 586
Ornstein, L.S., 254, 262, 414, 580, 587
Ososkov, G.A., 583
Ottaviani, G., 312, 583
Paley, R.E.A.C., 63, 268, 575, 580, 593
Palm, C., 203-210, 576-578,
583
Papangelou, F., 505, 592
Parseval, M.A., 162, 262
Parthasarathy, K.R., 562, 594
Pellaumail, J., 592
Perkins, E., 589
Petrov, V.V., 594
Phillips, H.B., 590
Picard, E., 415
Pitman, J.W., 588
Pitt, H.R., 576
Plancherel, M., 262
Poincare, H., 590
Poisson, S.D., 87-88, 226-231, 234-238, 241-242, 288,
297-298, 301, 318, 368, 436, 504-505, 579
Pollaczek, F., 575
Pollard, D., 583
Pollard, H., 603
Polya, G., 572, 575
Port, S.C., 591
Pospisil, B., 579
Prohorov, Y.V., 76, 309, 311, 313, 316, 563, 571, 581-583, 593
Protter, P., 593
Pukhalsky, A.A., 546, 594
Radon, J., 29, 31, 36, 105, 569-570, 573
Rao, K.M., 494, 532, 591, 593
Ray, D.B., 428, 440, 586, 588-589
Renyi, A., 234, 579, 583
Revesz, P., 581
Revuz, D., 442-445, 447, 580, 584-585, 587-590
Riemann, G.F.B., 31, 43, 175
Riesz, F., 23, 36, 43, 378, 490, 511, 569-571
Rogers, C.A., 577
Rogers, L.C.G., 561, 575, 584, 588-589, 592
Rootzen, H., 610
Rosen, B., 583
Rosinski, J., 592
Royden, H.L., 570, 594
Rubin, H., 76, 572
Rutherford, E., 579
Ryll-Nardzewski, C., 207, 212, 577-578
Sanov, I.N., 537, 555, 594
Savage, L.J., 45, 53, 161, 570
Schechtman, G., 607
Schilder, M., 537, 543, 554, 557, 594
Schmidt, V., 604
Schmidt, W., 450-451, 589
Schoenberg, I.J., 251, 580
Schrodinger, E., 590
van Schuppen, J.H., 362, 523, 585, 592
Schwartz, J.T., 392, 586
Schwarz, G., 352, 585
Schwarz, H.A., 17
Segal, I.E., 580
Shannon, C.E., 221, 578
Sharpe, M., 575, 585, 591
Shephard, G.C., 577
Shiryaev, A.N., 563, 582-583, 592-593, 595
Shreve, S.E., 580, 584-585, 588-591
Shur, M.G., 591
Sierpinski, W., 2, 200, 569
Skorohod, A.V., 79, 113, 271, 273, 298, 313, 315, 419, 429, 453-454, 563, 572, 581-583, 587-589, 595
Slivnyak, I.M., 210, 577-578
Slutsky, E., 571
Smoluchovsky, M., 142, 574
Snell, J.L., 130, 574, 608
Sova, M., 385, 586
Sparre-Andersen, E., 166, 169, 216, 276, 574, 576, 578
Spitzer, F., 576, 590
Stieltjes, T.J., 31, 255, 329, 340, 519, 570
Stone, C.J., 575, 583, 591
Stone, M.H., 86, 261, 580
Stout, W.F., 572
Strassen, V., 273, 537, 557, 581, 594
Stratonovich, R.L., 342, 426, 584
Stricker, C., 80, 345, 571, 584
Stroock, D.W., 418, 420-421, 472, 587-588, 590, 594
Sucheston, L., 617
Sztencel, R., 593
Takacs, L., 578
Tanaka, H., 428, 431, 439, 447, 454, 459, 465, 588-589
Taylor, B., 90, 92
Teicher, H., 572, 574, 583
Tempel'man, A.A., 577
Thorisson, H., 197-198, 209, 573, 575, 577-578, 587
Tonelli, L., 14, 569
Trotter, H.F., 385, 430, 586, 588
Tusnady, G., 609
Tychonov, A.N., 40
Uhlenbeck, G.E., 254, 262, 414, 580, 587
Ulam, S., 117, 573
Varadarajan, V.S., 195, 309, 577, 583
Varadhan, S.R.S., 418, 420-421, 472, 541, 547, 555, 587-588, 590, 594
Vere-Jones, D., 578-579
Ville, J., 573
Vitali, G., 570
Volkonsky, V.A., 445, 447, 458, 587-589, 591
Voronoi, G., 204
Wald, A., 76, 364, 571, 575, 585
Walsh, J.B., 381, 440, 588
Wang, A.T., 431, 588
Watanabe, H., 406, 464, 586, 589
Watanabe, S., 336, 347, 374, 424, 453, 505, 517, 521, 584-585, 587-589, 591-593
Weierstrass, K., 86, 341
Weil, A., 39, 570
Wentzell, A.D., 537, 553, 594
Weyl, H., 200
Wiener, N., 168, 184, 187, 252-253, 260, 263-266, 268, 358, 570, 575-576, 580, 590, 614
Williams, D., 561, 571, 574-575, 584, 588-589, 592
Williams, R.J., 584
Wintner, A., 275, 581
Wold, H., 87, 250, 572
Wong, E., 362, 523, 585, 592
Woyczynski, W.A., 572, 592
Yamada, T., 424, 453-454, 587, 589
Yan, J.A., 518, 533
Yoeurp, C., 527, 529, 593
Yor, M., 80, 345, 430, 571, 580, 584-585, 588-590
Yosida, K., 182, 367, 372, 375, 386, 576, 585
Yushkevich, A.A., 380, 574, 586
Zahle, M., 210, 578
Zaremba, S., 474
Zeitouni, O., 594
Zessin, H., 190, 577
Zinn, J., 607
Zorn, M., 43, 197-198
Zygmund, A., 63, 73, 178, 186, 268, 571, 577, 593, 614
Subject Index
absolute: continuity, 13, 29, 35, 261, 360, 432, 523; moment, 49
absorption of: Markov process, 155, 238, 378, 382, 434; diffusion, 461, 465; supermartingale, 136
accessible: set, boundary, 160, 462; time, 492, 500-501; jumps, 499, 529-530
action, left, right, 41
adapted, 120, 503
additive functional, 442
a.e., almost everywhere, 12
allocation sequence, 215
almost: everywhere, 12; invariant, 180
alternating function, 483, 487
analytic function, 342, 353
announcing sequence, 341, 491
aperiodic, 150
approximation of: covariation, 339; empirical distributions, 278; exchangeable sums, 321; local time, 432, 437; Markov chains, 387; martingales, 280; predictable process, 517; progressive process, 343; random walk, 273, 282, 299, 315; renewal process, 277
arcsine laws, 258, 276, 299
Arzela-Ascoli theorem, 310, 563
a.s., almost surely, 47
asymptotic invariance, 211
atom, atomic, 9, 19
augmented filtration, 124
averaging property, 105
backward equation, 242, 372, 471
balayage, sweeping, 474
ballot theorem, 218, 220
BDG inequalities, 333, 524
Bernoulli sequence, 56, 539
Bessel process, 256, 440
bilinear, 50
binary expansion, 56
binomial process, 226-227, 229, 235
Blumenthal's zero-one law, 381
Borel-Cantelli lemma, 47, 55, 131
Borel: isomorphism, space, 7, 561; set, σ-field, 2
boundary behavior, 462, 465, 474
bounded optional time, 126
Brownian: bridge, 253, 278, 319, 356; excursion, 439; motion, 252-260, 271-275, 277-282, 312, 352-360, 364-365, 412-424, 430, 439-440, 443-445, 447, 450-455, 458, 472-486, 507-513, 543, 553, 557; scaling, inversion, 253
CAF, continuous additive functional, 442
Cameron-Martin space, 364, 543, 553, 557
canonical: decomposition, 337, 518; process, space, filtration, 146, 380, 384
capacity, 481-483, 486-487
Cartesian product, 2
Cauchy: sequence, 16, 65; problem, 471-472
Cauchy-Buniakovsky inequality, 17, 334, 516, 520
centering, centered, 72, 126, 250
central limit theorem, 90, 275, 312
chain rule for: conditional independence, 111
conditioning, 105; integration, 12, 338, 517
change of: measure, 360-365, 422, 523; scale, 451, 456; time, 344, 352, 423, 451-453, 458-461, 505-506
chaos expansion, 266, 360
Chapman-Kolmogorov equation, 142-143, 145, 151, 368
characteristic: exponent, 291; function, 84-86, 90, 100, 227; measure, 241; operator, 383
characteristics, 290, 413
Chebyshev's inequality, 63
closed, closure: martingale, 131, 135; operator, 373-374
coding, 113, 145, 204
commuting operators, 186, 370
compactification, 377-378
compactness: vague, 98, 564; weak, 98, 309; weak L1, 69; in C and D, 563
comparison of solutions, 454-455
compensator, 493, 498-500, 503-506, 510-511
complete, completion: filtration, 123; function space, 16, 65; σ-field, 13, 110
completely monotone, alternating, 483, 487
complex-valued process, 260, 341, 351-352
composition, 5
compound: optional time, 146; Poisson process, 242, 297-298, 300-301
condenser theorem, 483
conditional: distribution, 107; entropy, information, 220; expectation, 104-105; independence, 109-113, 141, 212, 217, 228, 424; probability, 106
conductor, 480-481, 483
cone condition, 474
conformal mapping, invariance, 342, 353
conservative semigroup, 369, 377
continuity: set, 75, 545; theorem, 86, 100; for a time-change, 344
continuous: additive functional, 442-447, 451-452, 458, 510-513; in probability, 216, 286, 319; mapping, 64, 76, 549; martingale component, 527-529
contraction: operator, 105, 109, 368, 391-393, 415; principle, 549
convergence in/of: distribution, 65-66, 71-72, 75-79, 86-88, 90-93, 96, 99-100, 275-276, 308-326, 385-387; probability, 63-66, 80; exchangeable processes, 322; infinitely divisible laws, 295-296; Levy processes, 298; Lp, 16, 68; Markov processes, 385; point processes, 317, 326; random measures, 316; random sets, 325
convex, concave: functions, 49, 126, 431, 459, 538-539; sets, 187-190, 196, 533
convolution, 15, 52
core of generator, 373-374, 385, 387
countably additive, subadditive, 8
counting measure, 8
coupling, 152,
172 independent, 152, 466 shift, 197-198, 209 Skorohod, 79, 113, 298-299 covariance, 49-50, 250 covariation, 332, 334-336, 339-342, 516-517, 519-521, 526, 529
Cox process, 226-228, 230-231, 318-319 Cramér-Wold theorem, 87 cumulant-generating function, 539, 554 cycle stationarity, 206 cylinder set, 2, 115 (D), submartingale class, 493 Daniell-Kolmogorov theorem, 114-115 debut, 123 decomposition of: finite-variation function, 33-34 increasing process, 499 martingale, 518, 527, 529 measure, 29 optional time, 493 signed measure, 28 submartingale, 126, 493 degenerate: measure, 9, 19 random element, 51 delay, 170-172 density, 12-13, 29, 31, 133 differentiation theorem, 31 diffuse, nonatomic, 9-10, 19, 230, 233, 299 diffusion, 384, 413, 455-467, 471 equation, 413, 421, 423, 450-455 Dirac measure, 8 Dirichlet problem, 474 discrete time, 143 disintegration, 108 dissipative, 377 distribution, 47 function, 48, 59 Doléans exponential, 522 domain, 473 of attraction, 96 of generator, 370, 372-375, 377 dominated: convergence, 11-12, 337, 518, 526 ergodic theorem, 184 Donsker's theorem, 275, 312 Doob decomposition, 126 Doob-Meyer decomposition, 493 dual predictable projection, 498 duality, 167 Dynkin's formula, 382 effective dimension, 160 Egorov's theorem, 18 elementary: function, 263 additive functional, 442 stochastic integral, 128, 335, 343, 517 elliptic operator, 384, 418, 472 embedded: Markov chain, 239 martingale, 279 random variable, walk, 271-273, 464 empirical distribution, 75, 195, 278, 554-555 entrance boundary, 461-462 entropy, 220-221, 554-555 equicontinuous, 86, 311, 313-314, 563 equilibrium measure, 481-485 ergodicity, 181, 195-196, 397, 399, 465 ergodic decomposition, 196 ergodic theorems: Markovian, 152-154, 244-245, 397, 399, 408-409, 465 multivariate, Palm, 186-187, 190, 209-210 ratio, 393, 396, 464 stationarity, contractions, 181-183, 392 subadditive, matrices, 192-193 evaluation, projection, 47, 225, 562 event, 46 excessive function, 379, 445, 507-511 exchangeable: sequence, 212-215, 320-321 process, 216-218, 235, 319-322 excursion, 150, 433-440 existence of: Brownian motion, 252 Cox process,
randomization, 231 Markov process, 143, 378 random sequence, process, 55, 114-117 solution to SDE, 415, 419, 422-423, 451 exit boundary, 461 expectation, expected value, 48-49, 52 explosion, 240, 417, 462
exponential: distribution, 237-240, 434 equivalence, 552 inequalities, 530 martingale, process, 351, 363, 522, 530 rate, 541 tightness, 546-549, 556 extended real line, 5 extension of: filtration, 124, 352 measure, 26, 114-115, 362 probability space, 111-112 extreme: element, 196 value, 257, 303 factorial measure, 213 fast reflection, 460 Fatou's lemma, 11, 67 Fell topology, 324, 565-566 Feller process, semigroup, 369-387, 399-409, 421, 442, 445-446, 462, 501 Fenchel-Legendre transform, 539, 554 Feynman-Kac formula, 471 filling operator, functional, 394-395 filtration, 120 de Finetti's theorem, 212 finite-dimensional distributions, 48, 142 finite-variation: function, 33-35 process, 330, 337, 497, 518 first: entry, 124 maximum, 166, 216, 258, 276, 299 passage, 166-170, 292 Fisk-Stratonovich integral, 342 fixed jump, 286 flow, 183, 415 fluctuations, 167 forward equation, 372 Fubini theorem, 14, 52, 108, 358 functional: CLT, LIL, 275, 312, 557 LDP, 547 representation, 80, 346 solution, 423 fundamental: identity, 480 theorem, 31-32 Gaussian: convergence, 90-92, 96 measure, process, 90, 250-252, 254, 260-266, 539 generated: σ-field, 2, 5 filtration, 120 generating function, 84 generator, 368, 370-377, 383-387 geometric distribution, 149, 434 Girsanov theorem, 362, 365, 523 Glivenko-Cantelli theorem, 75 goodness of rate function, 546 graph: of operator, 373 of optional time, 492 Green function, potential, 458, 477, 513 Haar measure, 39 Hahn decomposition, 28 harmonic: function, 353, 396, 473 measure, 474 minorant, 511 Harris recurrent, 400, 405-406 heat equation, 472 Helly's selection theorem, 98 Hermite polynomials, 265 Hewitt-Savage zero-one law, 53 Hille-Yosida theorem, 375 hitting: function, 325, 487 kernel, 473, 480, 485 time, 123, 456, 473 Hölder: continuous, 57, 252, 313 inequality, 15, 109 holding time, 238, 434 homogeneous: chaos, 266 kernel, 144, 242 hyper-contraction, 321 hyperplane, 539
i.i.d. sequence, 53-54, 56, 73, 89-90, 95-96, 271-276, 294, 297, 312, 538-541, 555 inaccessible boundary, 460 increasing process, 493 increment of function, measure, 33, 58-59, 226, 234 independent, 50-55 independent increments: processes, 144, 242, 252, 286-287 random measures, 226, 234-235 indicator function, 5, 46 indistinguishable, 57 induced: σ-field, 2-3, 5 filtration, 120, 344 infinitely divisible, 293-297, 302 information, 220-221 initial distribution, 141 inner: content, 37 product, 17 radius, 187 instantaneous state, 434 integrable: function, 11 increasing process, 496 random vector, process, 47 integral representation: invariant distribution, 196 martingale, 357-360 integration by parts, 339, 519, 523 intensity, 189, 203, 225 invariance principle, 277 invariant: distribution, 148-149, 151-152, 243-244, 408-409, 467 function, 180, 392, 396-399 measure, 15, 27, 39-41, 391, 396, 404-407 set, σ-field, 180, 183, 186, 189, 398-399 subspace, 188, 374 inverse: contraction principle, 549 function, 3, 562 local time, 438 maximum process, 292 inversion formulas, 204-205 i.o., infinitely often, 46, 54-55, 131 irreducible, 151, 244 isometry, 260, 263, 351 isonormal, 251, 260, 263-266 isotropic, 352 Itô: correction term, 340 formula, 340-342, 431, 521 integral, 336-337, 343-344 J1-topology, 313, 563 Jensen's inequality, 49, 109 joint stationarity, 203 Jordan decomposition, 33 jump transition kernel, 238 jump-type process, 237 kernel, 20-21, 56, 106, 145, 225, 404, 420 density, 133 hitting, quitting, sweeping, 473, 480-481, 485 transition, rate, 141-145, 238-242 killing, 471, 475 Kolmogorov: extension theorem, 115 maximum inequality, 69 zero-one law, 53, 132-133 Kolmogorov-Chentsov criterion, 57, 313 ladder time, height, 166-167, 169-170 λ-system, 2 Langevin equation, 414 Laplace: operator, equation, 375, 472-473 transform, functional, 84-86, 88, 100, 227, 370 large deviation principle, LDP, 541-555 last: return, zero, 165, 258, 276 exit, 481 law of:
large numbers, 73, 95 the iterated logarithm, 259, 275, 277-278, 557 lcscH space, 225 LDP, 541-555 Lebesgue: decomposition, 29
differentiation theorem, 31 measure, 24, 27 unit interval, 55 Lebesgue-Stieltjes measure, integral, 31, 518 Legendre-Fenchel transform, 539, 554 level set, 254, 543 Lévy: characterization of Brownian motion, 352 measure, 290 process, 290-294, 298-299, 315, 374, 518 Lévy-Khinchin formula, 290-291 Lindeberg condition, 92 linear: SDE, 414, 522 functional, 36, 263 Lipschitz condition, 415, 453-455 L log L-condition, 186 local: characteristics, 413 condition, property, 57, 105 conditioning, hitting, 207 operator, 383-384 martingale, submartingale, 330, 493, 518 measurability, 287 substitution rule, 341 time, 428-432, 436-438, 440-441, 446-447, 452, 454, 458, 512-513 localization, 330 locally: compact, 225, 312, 316, 324, 369, 563 finite, 9, 19, 30, 33-35, 225, 283, 564 Lp-: bounded, 67, 130, 132 contraction, 109, 391-393 convergence, 16, 68, 132, 181-183, 186-187, 190 Lusin's theorem, 19 marked point process, 234, 504-505 Markov: chain, 151-154, 243-245, 387 inequality, 63 process, 141-148, 254, 367-387, 391, 396-409, 421, 455-467 martingale, 125-136, 352-358, 360-364, 382 closure, 131, 135 convergence, 130-132, 135 decomposition, 518, 527, 529 embedding, 279-281 problem, 418-421 transform, 127 maximum, maximal: ergodic lemma, 181 inequality, 69, 128-129, 184, 188, 221, 312, 333, 392, 524, 530 measure, 29 operator, principle, 377, 383 process, 256, 292, 430 mean, 48 continuity, 28 ergodic theorem, 190 recurrence time, 154, 245 mean-value property, 473 measurable: group, 15 function, 4-7 set, space, 2, 23 measure, 8-9 determining, 9, 195 preserving, 179, 235, 356 space, 8 valued function, process, 214, 324, 565 median, 71-72 Minkowski's inequality, 15-16, 109 mixed: binomial, Poisson, 226-227, 229, 235 i.i.d., Lévy, 212, 217 mixing, 397-399 modulus of continuity, 57, 274, 310-311, 453, 562-563 moment, 49 moment inequalities, 129, 184, 333, 502, 524 monotone: class theorem, 2 convergence, 11, 104 ergodic theorem, 187, 190 moving
average, 261 
multiple stochastic integral, 263-266, 358-360 multiplicative functional, 471 multivariate ergodic theorem, 186-190, 209-210 natural: absorption, 461 increasing process, 494, 496-497 scale, 456 nonarithmetic, 172 nonnegative definite, 50, 261 normal, Gaussian, 90, 250 norm inequalities, 15-16, 109, 129, 184, 333, 502, 524 nowhere dense, 433-434 null: array, 88, 91, 93, 300-303, 317-318 recurrent, 152, 245, 408-409, 465 set, 12 occupation: density, 431, 477 times, measure, 149, 160, 171-173, 432 ONB, orthonormal basis, 251, 262 one-dimensional criteria, 233-234, 317-318, 324-326 operator ergodic theorem, 392-393 optional: projection, 381 sampling, 127, 135 skipping, 215 stopping, 128, 338 time, 120-124, 146, 491-493, 498 Ornstein-Uhlenbeck process, 254, 262, 414 orthogonal: functions, spaces, 17, 265-266 martingales, processes, 288, 355 measures, 13, 28-29 projection, 17 outer measure, 23-25, 37-38 Palm distribution, 203-210 parabolic equation, 372, 471-472 parallelogram identity, 17 parameter dependence, 80, 345 partition of unity, 36 path, 47, 486 pathwise uniqueness, 414-415, 423-424, 453 perfect, 433 period, 150-151 permutation, 53, 212 perturbed dynamical system, 553 π-system, 2 Picard iteration, 415 point process, 171-172, 203-207, 226-236, 317-319, 326 Poisson: compound, 242, 297-298, 300-301 convergence, 88, 318 distribution, 88 integrals, 236, 287 mixed, 226, 229, 235 process, 226-227, 234-236, 238, 288, 318, 436, 486, 504-505 pseudo-, 241, 368 polar set, 354, 480 polarization, 516, 519 Polish space, 7, 561 polynomial chaos, 266 Portmanteau theorem, 75 positive: density, 361, 441 functional, operator, 36, 105, 368 maximum principle, 375, 377, 383 operator, 368 random variables, 70, 91, 300 recurrent, 152, 245, 408-409, 465, 467 variation, 33 potential: of additive functional, 442-445, 511 Green, 477-481, 512-513 operator, 370, 379, 402-403 term, 471 predictable: quadratic variation, covariation, 280, 516 process, 491-492, 496-499,
502-504, 506, 517-518, 523, 526 random measure, 503 sampling, 215 sequence, 126 step process, 128, 331, 516, 533 time, 214, 341, 491-493, 498-501, 504, 529
prediction sequence, 214 preseparating class, 317, 326, 567 preservation of: semimartingales, 340, 431, 521, 524 stochastic integrals, 362 probability, 46 generating function, 84 measure, space, 46 product: σ-field, 2, 115 measure, 14-15, 52, 117 progressive, 122, 345, 356, 413 Prohorov's theorem, 309 projection, 17, 562 projective limit, 114-117, 551, 568 proper, 41 pseudo-Poisson, 241, 368 pull-out property, 105 purely: atomic, 10 discontinuous, 499, 527, 529 quadratic variation, 255, 280, 332-334, 337, 519-520 quasi-left-continuous, 499, 501, 504, 529 quasi-martingale, 532 quitting time, kernel, 481, 485 Radon measure, 36 Radon-Nikodym theorem, 29, 105 random: element, variable, process, 47 matrix, 193 measure, 106, 203-204, 209-210, 212, 218, 225-235, 316, 503 sequence, 64, 78, 551 series, 69-73, 319-320 set, 325-326, 486-487 time, 120 walk, 54, 160-172, 271, 273, 275-276, 282, 299, 315 randomization, 113, 145, 272, 321 of point process, 226-228 variable, 112, 352 rate: function, kernel, 238, 545-555 process, 352 ratio ergodic theorem, 393, 464 raw rate function, 545, 549 Ray-Knight theorem, 440 rcll, 134 recurrence, 149, 151-152, 160-164, 244, 400, 405-406 time, 154, 245, 434 reflecting boundary, 460 reflection principle, 165, 257 regenerative set, process, 432 regular: boundary, domain, set, point, 446, 473-474 conditional distribution, 106-107 diffusion, 455-459 measure, outer measure, 18, 37-38 regularization of: local time, 430 Markov process, 379 rate function, 545 stochastic flow, 415 submartingale, 130, 134 relative: compactness, 69, 98-99, 309, 563-565 entropy, 554 renewal: measure, process, 170, 238, 277-278 theorem, 172 equation, 175 resolvent, 370, 379 equation, 370, 402 restriction of: measure, 9 optional time, 492-493, 498 Revuz measure, 442-445, 447 Riemann integrable, 175 Riesz: decomposition, 511-513 representation, 36, 371, 378 right-continuous: filtration, 121, 124 function, 34-35 process, 134, 379
right-invariant, 39 sample, 211 intensity, 190 process, 226 sampling: sequence, 211 without replacement, 213, 319 
scale function, 456 Schwarz's inequality, 17 SDE, stochastic differential equation, 346, 412-424, 450-455, 522, 553 sections, 14, 562 selection, 98, 562 self-adjoint, 105 self-similar, 291 semicontinuous, 507, 545 semigroup, 145, 183, 186-187, 368-378, 397-399 semimartingale, 337-345, 518-524, 527-529, 532-533 semiring, 26 separating class, 317, 486-487, 567 series of measures, 8 shift: coupling, 197-198, 209 operator, 146, 179, 380 σ-field, 1 σ-finite, 9, 225 signed measure, 28, 34 simple: function, measure, 6, 10 point process, 203-207, 226, 230, 233-235, 238, 317, 326 random walk, 165 singular(ity), 13, 28-29, 35, 218, 452 skew-product, 355 Skorohod: coupling, 79, 113, 298-299 embedding, 271 slow: reflection, 460 variation, 96 sojourn, 216, 258, 276 space filling, 187 space-homogeneous, 144, 147 space-time invariant, 397 special semimartingale, 518, 532 spectral measure, representation, 261 speed measure, 458, 465, 467 spreadable, 212, 214, 216-217 stable, 291-292, 506 standard extension, 352, 358, 419, 424 stationary: process, 148, 179, 183, 211, 220-221 random measure,
170, 189-190, 203-210, 218 stochastic: differential equation, 346, 412-424, 450-455, 522, 553 flow, 415 integral, 236, 260-266, 336-346, 517-518, 526 process, 47 Stone-Weierstrass theorem, 86, 341, 521 stopping (optional) time, 120 Stratonovich integral, 342 strict past, 491-492 strong: continuity, 369, 372 ergodicity, 153, 244, 397, 400, 408, 466 existence, 414-415, 423-424 homogeneity, 155 law of large numbers, 73 Markov property, 147, 155, 237, 256, 380, 421 orthogonality, 355 solution, 413, 424 stationarity, 214 subadditive: ergodic theorem, 192 sequence, 191, 538 set function, 8, 37 submartingale, 125-126, 128, 130, 134-135, 493 subordinator, 290-293, 438 subsequence criterion, 63 subspace, 4, 47, 76, 309 substitution rule, 12, 340-342, 431, 521 superharmonic, 507 supermartingale, 125, 136, 379, 510 superposition, 318, 486 support of: additive functional, 446, 513 local time, 429, 441 measure, 9, 326, 429, 567 supporting measure, 399 sweeping, 474, 480 symmetry, symmetric: difference, 1 point process, 235 random variable, 54, 70, 91, 163, 300 set, 53
spherical, 251 symmetrization, 71-72, 163, 263 tail: probabilities, 49, 63, 85 σ-field, 53, 133, 197, 397 Tanaka's formula, 428 Taylor expansion, 90, 92, 340 terminal time, 240, 341, 380, 473 thinning, 226-227, 231, 318-319 three-series criterion, 71 tightness, 66, 86, 99, 309-311, 313-314, 316, 321, 324, 533, 546, 556 time: change, 124, 344, 352, 355, 458, 505-506, 563 homogeneous, 144-145, 147, 237 reversal, 484, 509 topological group, 38 total variation, 33, 152, 255 totally inaccessible, 492, 495, 500 transfer, 58, 112, 424 transient, 149, 160, 164, 244, 354, 405, 408-409 transition: density, 476 function, matrix, 151, 243 kernel, 141, 144-145, 368 operator, semigroup, 241, 368 transitive, 41 translation, 15, 27 trivial σ-field, 51, 53, 181, 381, 397-399 two-sided extension, 180 ultimately, 46 uncorrelated, 50, 250 uniform: distribution, 55 excessivity, 445 integrability, 67-69, 109, 131, 134, 173, 334, 477, 493 laws, 259, 218-220 transience, 405 uniqueness (of): additive functional, 442, 445 distribution, 48, 86-87, 141, 204, 371 pathwise, 414-415, 423-424, 453 rate function, 545 in law, 414, 421-424, 451, 472 universal completion, 423, 562 upcrossings, 129-130 urn sequence, 213 vague topology, 98-99, 172, 316, 564 variance, 50, 52 variation, 33 version of process, 57 Voronoi cell, 204 Wald's identity, 364 weak: compactness, 99, 309 convergence, 65, 86-96, 99-100, 275-276, 308-326, 385-387 ergodicity, 399 existence, 414, 419, 422-423, 451 L1 compactness, 69 law of large numbers, 95 LDP, 546, 549 mixing, 398-399 optionality, 121 solution, 413, 418, 424 weight function, 190 well posed, 418 Wiener: integral, 260-263 process, Brownian motion, 253 Wiener-Hopf factorization, 168 Yosida approximation, 372, 386 zero-one law, 53, 381 zero-infinity law, 203