Probability and its Applications
A Series of the Applied Probability Trust
Editors: J. Gani, C.C. Heyde, T.G. Kurtz
Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Probability and its Applications
Anderson: Continuous-Time Markov Chains.
Azencott/Dacunha-Castelle: Series of Irregular Observations.
Bass: Diffusions and Elliptic Operators.
Bass: Probabilistic Techniques in Analysis.
Choi: ARMA Model Identification.
de la Peña/Giné: Decoupling: From Dependence to Independence.
Galambos/Simonelli: Bonferroni-type Inequalities with Applications.
Gani (Editor): The Craft of Probabilistic Modelling.
Grandell: Aspects of Risk Theory.
Gut: Stopped Random Walks.
Guyon: Random Fields on a Network.
Kallenberg: Foundations of Modern Probability, Second Edition.
Last/Brandt: Marked Point Processes on the Real Line.
Leadbetter/Lindgren/Rootzén: Extremes and Related Properties of Random Sequences
and Processes.
Nualart: The Malliavin Calculus and Related Topics.
Rachev/Rüschendorf: Mass Transportation Problems. Volume I: Theory.
Rachev/Rüschendorf: Mass Transportation Problems. Volume II: Applications.
Resnick: Extreme Values, Regular Variation and Point Processes.
Shedler: Regeneration and Networks of Queues.
Thorisson: Coupling, Stationarity, and Regeneration.
Todorovic: An Introduction to Stochastic Processes and Their Applications.
Olav Kallenberg
Foundations of
Modern Probability
Second Edition
Springer
Olav Kallenberg
Department of Mathematics
Auburn University
Auburn, AL 36849
USA
Series Editors
J. Gani
Stochastic Analysis
Group, CMA
Australian National
University
Canberra, ACT 0200
Australia
C.C. Heyde
Stochastic Analysis
Group, CMA
Australian National
University
Canberra, ACT 0200
Australia
T.G. Kurtz
Department of
Mathematics
University of Wisconsin
480 Lincoln Drive
Madison, WI 53706
USA
Mathematics Subject Classification (2000): 60-01
Library of Congress Cataloging-in-Publication Data
Kallenberg, Olav.
Foundations of modern probability / Olav Kallenberg. - 2nd ed.
p. cm. - (Probability and its applications)
Includes bibliographical references and index.
ISBN 0-387-95313-2 (alk. paper)
1. Probabilities. I. Title. II. Springer series in statistics. Probability and its
applications.
QA273.K285 2001
519.2--dc21 2001032816
Printed on acid-free paper.
© 2002 by the Applied Probability Trust.
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York,
NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use
in connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by
the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Allan Abrams; manufacturing supervised by Jerome Basma.
Photocomposed pages prepared by the Bartlett Press.
Printed and bound by Maple-Vail Book Manufacturing Group, York, PA.
Printed in the United States of America.
98765432
ISBN 0-387-95313-2
Springer-Verlag New York Berlin Heidelberg
A member of BertelsmannSpringer Science+Business Media GmbH
Praise for the First Edition
"It is truly surprising how much material the author has managed to cover
in the book. ... More advanced readers are likely to regard the book as
an ideal reference. Indeed, the monograph has the potential to become a
(possibly even 'the') major reference book on large parts of probability
theory for the next decade or more." -M. Scheutzow (Berlin)
"I am often asked by mathematicians ... for literature on 'a broad
introduction to modern stochastics.' ... Due to this book, my task for answering
is made easier. This is it! A concise, broad overview of the main results and
techniques ... . From the table of contents it is difficult to believe that
behind all these topics a streamlined, readable text is at all possible. It is:
Convince yourself. I have no doubt that this text will become a classic. Its
main feature of keeping the whole area of probability together and
presenting a general overview is a real success. Scores of students ... and indeed
researchers will be most grateful!" -P.A.L. Embrechts (Zürich)
"The theory of probability has grown exponentially during the second
half of the twentieth century, and the idea of writing a single volume that
could serve as a general reference ... seems almost foolhardy. Yet this is
precisely what Professor Kallenberg has attempted ... and he has accomplished
it brilliantly. ... With regard to his primary goal, the author has
been more successful than I would have imagined possible. It is astonishing
that a single volume of just over five hundred pages could contain so much
material presented with complete rigor, and still be at least formally
self-contained. ... As a general reference for a good deal of modern probability
theory [the book] is outstanding. It should have a place in the library of
every probabilist. Professor Kallenberg set himself a very difficult task, and
he should be congratulated for carrying it out so well."
-R.K. Getoor (La Jolla, California)
"This is a superbly written, high-level introduction to contemporary
probability theory. In it, the advanced mathematics student will find basic
information, presented in a uniform terminology and notation, essential to
gaining access to much present-day research. ... I congratulate Professor
Kallenberg on a noteworthy achievement."
-M.F. Neuts (Tucson, Arizona)
"This is a very modern, very ambitious, and very well-written book. The
scope is greater than I would have thought possible in a book of this length.
This is made possible by the extremely efficient treatment, particularly the
proofs ... . [Kallenberg] has succeeded in his mammoth task beyond all
reasonable expectations. I think this book is destined to become a modern
classic." -N.H. Bingham (London)
"Kallenberg has ably achieved [his] goal and presents all the important
results and techniques that every probabilist should know. ... We do not
doubt that the book ... will be widely used as material for advanced
postgraduate courses and seminars on various topics in probability."
-jste, European Math. Soc. Newsletter
"This is a very well written book. ... Much effort must have been put
into simplifying and streamlining proofs, and the results are quite impres-
sive. ... I would highly recommend [the book] to anybody who wants a
good concise reference text on several very important parts of modern prob-
ability theory. For a mathematical sciences library, such a book is a must."
-K. Borovkov (Melbourne)
"[This] is an unusual book about a wide range of probability and
stochastic processes, written by a single excellent mathematician. ... The graduate
student will definitely enjoy reading it, and for the researcher it will become
a useful reference book and necessary tool for his or her work."
-T. Mikosch (Groningen)
"The author has succeeded in writing a text containing, in the spirit
of Loève's Probability Theory, all the essential results that any probabilist
needs to know. Like Loève's classic, this book will become a standard source
of study and reference for students and researchers in probability theory."
- R. Kiesel (London)
"Kallenberg's present book would have to qualify as the assimilation
of probability par excellence. It is a great edifice of material, clearly and
ingeniously presented, without any nonmathematical distractions. Readers
wishing to venture into it may do so with confidence that they are in very
capable hands." -F.B. Knight (Urbana, Illinois)
"The presentation of the material is characterized by a surprising clarity
and precision. The author's overview over the various subfields of probabil-
ity theory and his detailed knowledge are impressive. Through an activity
over many years as a researcher, academic teacher, and editor, he has ac-
quired a deep competence in many areas. Wherever one reads, all chapters
are carefully worked through and brought in streamlined form. One can
imagine what an enormous effort it has cost the author to reach this
final state, though no signs of this are visible. His goal, as set forth in
the preface, of giving clear and economical proofs of the included theorems
has been achieved admirably. ... I can't recall that in recent times I have
held in my hands a mathematics book so thoroughly worked through."
-H. Rost (Heidelberg)
Preface to the Second Edition
For this new edition the entire text has been carefully revised, and some
portions are totally rewritten. More importantly, I have inserted more than
a hundred pages of new material, in chapters on general measure and er-
godic theory, the asymptotics of Markov processes, and large deviations.
The expanded size has made it possible to give a self-contained treatment
of the underlying measure theory and to include topics like multivariate
and ratio ergodic theorems, shift coupling, Palm distributions, entropy and
information, Harris recurrence, invariant measures, strong and weak ergod-
icity, Strassen's law of the iterated logarithm, and the basic large deviation
results of Cramér, Sanov, Schilder, and Freidlin and Ventzel.
Unfortunately, the body of knowledge in probability theory keeps grow-
ing at an ever increasing rate, and I am painfully aware that I will never
catch up in my efforts to survey the entire subject. Many areas are still
totally beyond reach, and a comprehensive treatment of the more recent
developments would require another volume or two. I am asking for the
reader's patience and understanding.
Many colleagues have pointed out errors or provided helpful information.
I am especially grateful for some valuable comments from Wlodzimierz
Kuperberg, Michael Scheutzow, Josef Teichmann, and Hermann Thoris-
son. Some of the new material was presented in our probability seminar
at Auburn, where I benefited from stimulating discussions with Bill Hud-
son, Ming Liao, Lisa Peterson, and Hussain Talibi. My greatest thanks are
due, as always, to my wife Jinsoo, whose constant love and support have
sustained and inspired me throughout many months of hard work.
Olav Kallenberg
March 2001
Preface to the First Edition
Some thirty years ago it was still possible, as Loève so ably demonstrated,
to write a single book in probability theory containing practically every-
thing worth knowing in the subject. The subsequent development has been
explosive, and today a corresponding comprehensive coverage would require
a whole library. Researchers and graduate students alike seem compelled
to a rather extreme degree of specialization. As a result, the subject is
threatened by disintegration into dozens or hundreds of subfields.
At the same time the interaction between the areas is livelier than ever,
and there is a steadily growing core of key results and techniques that every
probabilist needs to know, if only to read the literature in his or her own
field. Thus, it seems essential that we all have at least a general overview of
the whole area, and we should do what we can to keep the subject together.
The present volume is an earnest attempt in that direction.
My original aim was to write a book about "everything." Various space
and time constraints forced me to accept more modest and realistic goals
for the project. Thus, "foundations" had to be understood in the narrower
sense of the early 1970s, and there was no room for some of the more recent
developments. I especially regret the omission of topics such as large devia-
tions, Gibbs and Palm measures, interacting particle systems, stochastic
differential geometry, Malliavin calculus, SPDEs, measure-valued diffu-
sions, and branching and superprocesses. Clearly plenty of fundamental
and intriguing material remains for a possible second volume.
Even with my more limited, revised ambitions, I had to be extremely se-
lective in the choice of material. More importantly, it was necessary to look
for the most economical approach to every result I did decide to include.
In the latter respect, I was surprised to see how much could actually be
done to simplify and streamline proofs, often handed down through gen-
erations of textbook writers. My general preference has been for results
conveying some new idea or relationship, whereas many propositions of a
more technical nature have been omitted. In the same vein, I have avoided
technical or computational proofs that give little insight into the proven
results. This conforms with my conviction that the logical structure is what
matters most in mathematics, even when applications are the ultimate goal.
Though the book is primarily intended as a general reference, it should
also be useful for graduate and seminar courses on different levels, rang-
ing from elementary to advanced. Thus, a first-year graduate course in
measure-theoretic probability could be based on the first ten or so chapters,
while the rest of the book will readily provide material for more advanced
courses on various topics. Though the treatment is formally self-contained,
as far as measure theory and probability are concerned, the text is intended
for a rather sophisticated reader with at least some rudimentary knowledge
of subjects like topology, functional analysis, and complex variables.
My exposition is based on experiences from the numerous graduate and
seminar courses I have been privileged to teach in Sweden and in the United
States, ever since I was a graduate student myself. Over the years I have
developed a personal approach to almost every topic, and even experts
might find something of interest. Thus, many proofs may be new, and every
chapter contains results that are not available in the standard textbook
literature. It is my sincere hope that the book will convey some of the
excitement I still feel for the subject, which is without a doubt (even apart
from its utter usefulness) one of the richest and most beautiful areas of
modern mathematics.
Notes and Acknowledgments: My first thanks are due to my numer-
ous Swedish teachers, and especially to Peter Jagers, whose 1971 seminar
opened my eyes to modern probability. The idea of this book was raised a
few years later when the analysts at Gothenburg asked me to give a short
lecture course on "probability for mathematicians." Although I objected
to the title, the lectures were promptly delivered, and I became convinced
of the project's feasibility. For many years afterward I had a faithful and
enthusiastic audience in numerous courses on stochastic calculus, SDEs,
and Markov processes. I am grateful for that learning opportunity and for
the feedback and encouragement I received from colleagues and graduate
students.
Inevitably I have benefited immensely from the heritage of countless
authors, many of whom are not even listed in the bibliography. I have
further been fortunate to know many prominent probabilists of our time,
who have often inspired me through their scholarship and personal example.
Two people, Klaus Matthes and Gopi Kallianpur, stand out as particularly
important influences in connection with my numerous visits to Berlin and
Chapel Hill, respectively.
The great Kai Lai Chung, my mentor and friend from recent years, of-
fered penetrating comments on all aspects of the work: linguistic, historical,
and mathematical. My colleague Ming Liao, always a stimulating partner
for discussions, was kind enough to check my material on potential theory.
Early versions of the manuscript were tested on several groups of graduate
students, and Kamesh Casukhela, Davorin Dujmovic, and Hussain Talibi
in particular were helpful in spotting misprints. Ulrich Albrecht and Ed
Slaminka offered generous help with software problems. I am further grate-
ful to John Kimmel, Karina Mikhli, and the Springer production team for
their patience with my last-minute revisions and their truly professional
handling of the project.
My greatest thanks go to my family, who is my constant source
of happiness and inspiration. Without their love, encouragement, and
understanding, this work would not have been possible.
Olav Kallenberg
May 1997
Contents
Preface to the Second Edition
Preface to the First Edition
1. Measure Theory - Basic Notions
Measurable sets and functions
measures and integration
monotone and dominated convergence
transformation of integrals
product measures and Fubini's theorem
$L^p$-spaces and projection
approximation
measure spaces and kernels
2. Measure Theory - Key Results
Outer measures and extension
Lebesgue and Lebesgue-Stieltjes measures
Jordan-Hahn and Lebesgue decompositions
Radon-Nikodym theorem
Lebesgue's differentiation theorem
functions of finite variation
Riesz' representation theorem
Haar and invariant measures
3. Processes, Distributions, and Independence
Random elements and processes
distributions and expectation
independence
zero-one laws
Borel-Cantelli lemma
Bernoulli sequences and existence
moments and continuity of paths
4. Random Sequences, Series, and Averages
Convergence in probability and in $L^p$
uniform integrability and tightness
convergence in distribution
convergence of random series
strong laws of large numbers
Portmanteau theorem
continuous mapping and approximation
coupling and measurability
5. Characteristic Functions and Classical Limit Theorems
Uniqueness and continuity theorem
Poisson convergence
positive and symmetric terms
Lindeberg's condition
general Gaussian convergence
weak laws of large numbers
domain of Gaussian attraction
vague and weak compactness
6. Conditioning and Disintegration
Conditional expectations and probabilities
regular conditional distributions
disintegration
conditional independence
transfer and coupling
existence of sequences and processes
extension through conditioning
7. Martingales and Optional Times
Filtrations and optional times
random time-change
martingale property
optional stopping and sampling
maximum and upcrossing inequalities
martingale convergence, regularity, and closure
limits of conditional expectations
regularization of submartingales
8. Markov Processes and Discrete-Time Chains
Markov property and transition kernels
finite-dimensional distributions and existence
space and time homogeneity
strong Markov property and excursions
invariant distributions and stationarity
recurrence and transience
ergodic behavior of irreducible chains
mean recurrence times
9. Random Walks and Renewal Theory
Recurrence and transience
dependence on dimension
general recurrence criteria
symmetry and duality
Wiener-Hopf factorization
ladder time and height distribution
stationary renewal process
renewal theorem
10. Stationary Processes and Ergodic Theory
Stationarity, invariance, and ergodicity
discrete- and continuous-time ergodic theorems
moment and maximum inequalities
multivariate ergodic theorems
sample intensity of a random measure
subadditivity and products of random matrices
conditioning and ergodic decomposition
shift coupling and the invariant $\sigma$-field
11. Special Notions of Symmetry and Invariance
Palm distributions and inversion formulas
stationarity and cycle stationarity
local hitting and conditioning
ergodic properties of Palm measures
exchangeable sequences and processes
strong stationarity and predictable sampling
ballot theorems
entropy and information
12. Poisson and Pure Jump-Type Markov Processes
Random measures and point processes
Cox processes, randomization, and thinning
mixed Poisson and binomial processes
independence and symmetry criteria
Markov transition and rate kernels
embedded Markov chains and explosion
compound and pseudo-Poisson processes
ergodic behavior of irreducible chains
13. Gaussian Processes and Brownian Motion
Symmetries of Gaussian distribution
existence and path properties of Brownian motion
strong Markov and reflection properties
arcsine and uniform laws
law of the iterated logarithm
Wiener integrals and isonormal Gaussian processes
multiple Wiener-Itô integrals
chaos expansion of Brownian functionals
14. Skorohod Embedding and Invariance Principles
Embedding of random variables
approximation of random walks
functional central limit theorem
laws of the iterated logarithm
arcsine laws
approximation of renewal processes
empirical distribution functions
embedding and approximation of martingales
15. Independent Increments and Infinite Divisibility
Regularity and integral representation
Lévy processes and subordinators
stable processes and first-passage times
infinitely divisible distributions
characteristics and convergence criteria
approximation of Lévy processes and random walks
limit theorems for null arrays
convergence of extremes
16. Convergence of Random Processes, Measures, and Sets
Relative compactness and tightness
uniform topology on C(K, S)
Skorohod's $J_1$-topology
equicontinuity and tightness
convergence of random measures
superposition and thinning
exchangeable sequences and processes
simple point processes and random closed sets
17. Stochastic Integrals and Quadratic Variation
Continuous local martingales and semimartingales
quadratic variation and covariation
existence and basic properties of the integral
integration by parts and Itô's formula
Fisk-Stratonovich integral
approximation and uniqueness
random time-change
dependence on parameter
18. Continuous Martingales and Brownian Motion
Real and complex exponential martingales
martingale characterization of Brownian motion
random time-change of martingales
integral representation of martingales
iterated and multiple integrals
change of measure and Girsanov's theorem
Cameron-Martin theorem
Wald's identity and Novikov's condition
19. Feller Processes and Semigroups
Semigroups, resolvents, and generators
closure and core
Hille-Yosida theorem
existence and regularization
strong Markov property
characteristic operator
diffusions and elliptic operators
convergence and approximation
20. Ergodic Properties of Markov Processes
Transition and contraction operators
ratio ergodic theorem
space-time invariance and tail triviality
mixing and convergence in total variation
Harris recurrence and transience
existence and uniqueness of invariant measure
distributional and pathwise limits
21. Stochastic Differential Equations
and Martingale Problems
Linear equations and Ornstein-Uhlenbeck processes
strong existence, uniqueness, and nonexplosion criteria
weak solutions and local martingale problems
well-posedness and measurability
pathwise uniqueness and functional solution
weak existence and continuity
transformation of SDEs
strong Markov and Feller properties
22. Local Time, Excursions, and Additive Functionals
Tanaka's formula and semimartingale local time
occupation density, continuity and approximation
regenerative sets and processes
excursion local time and Poisson process
Ray-Knight theorem
excessive functions and additive functionals
local time at a regular point
additive functionals of Brownian motion
23. One-dimensional SDEs and Diffusions
Weak existence and uniqueness
pathwise uniqueness and comparison
scale function and speed measure
time-change representation
boundary classification
entrance boundaries and Feller properties
ratio ergodic theorem
recurrence and ergodicity
24. Connections with PDEs and Potential Theory
Backward equation and Feynman-Kac formula
uniqueness for SDEs from existence for PDEs
harmonic functions and Dirichlet's problem
Green functions as occupation densities
sweeping and equilibrium problems
dependence on conductor and domain
time reversal
capacities and random sets
25. Predictability, Compensation, and Excessive Functions
Accessible and predictable times
natural and predictable processes
Doob-Meyer decomposition
quasi-left-continuity
compensation of random measures
excessive and superharmonic functions
additive functionals as compensators
Riesz decomposition
26. Semimartingales and General Stochastic Integration
Predictable covariation and $L^2$-integral
semimartingale integral and covariation
general substitution rule
Doléans' exponential and change of measure
norm and exponential inequalities
martingale integral
decomposition of semimartingales
quasi-martingales and stochastic integrators
27. Large Deviations
Legendre-Fenchel transform
Cramér's and Schilder's theorems
large-deviation principle and rate function
functional form of the LDP
continuous mapping and extension
perturbation of dynamical systems
empirical processes and entropy
Strassen's law of the iterated logarithm
Appendices
A1. Advanced Measure Theory
Polish and Borel spaces
measurable inverses
projection and sections
A2. Some Special Spaces
Function spaces
measure spaces
spaces of closed sets
measure-valued functions
projective limits
Historical and Bibliographical Notes
Bibliography
Symbol Index
Author Index
Subject Index
Words of Wisdom and Folly
• "A mathematician who argues from probabilities in geometry is not
worth an ace" - Socrates (on the demands of rigor in mathematics)
• "[We will travel a road] full of interest of its own. It familiarizes us
with the measurement of variability, and with curious laws of chance
that apply to a vast diversity of social subjects" - Francis Galton
(on the wondrous world of probability)
• "God doesn't play dice" [i.e., there is no randomness in the universe]
- Albert Einstein (on quantum mechanics and causality)
• "It might be possible to prove certain theorems, but they would not
be of any interest, since, in practice, one could not verify whether the
assumptions are fulfilled" - Émile Borel (on why bothering with
probability)
• "[The stated result] is a special case of a very general theorem [the
strong Markov property]. The measure [theoretic] ideas involved are
somewhat glossed over in the proof, in order to avoid complexities
out of keeping with the rest of this paper" - Joseph L. Doob (on
why bothering with generality or mathematical rigor)
• "Probability theory [has two hands]: On the right is the rigorous
[technical work]; the left hand ... reduces problems to gambling
situations, coin-tossing, motions of a physical particle" - Leo Breiman
(on probabilistic thinking)
• "There are good taste and bad taste in mathematics just as in music,
literature, or cuisine, and one who dabbles in it must stand judged
thereby" - Kai Lai Chung (on the art of writing mathematics)
• "The traveler often has the choice between climbing a peak or using
a cable car" - William Feller (on the art of reading mathematics)
• "A Catalogue Aria of triumphs is of less benefit [to the student] than
an indication of the techniques by which such results are achieved"
- David Williams (on seduction and the art of discovery)
• "One needs [for stochastic integration] a six months course [to cover
only] the definitions. What is there to do?" - Paul-André Meyer (on
the dilemma of modern math education)
• "There were very many [bones] in the open valley; and lo, they were
very dry. And [God] said unto me, 'Son of man, can these bones live?'
And I answered, 'O Lord, thou knowest.'" - Ezekiel 37:2-3 (on the
ultimate reward of hard studies, as quoted by Chris Rogers and David
Williams)
Chapter 1
Measure Theory - Basic Notions
Measurable sets and functions; measures and integration; mono-
tone and dominated convergence; transformation of integrals;
product measures and Fubini's theorem; $L^p$-spaces and projection;
approximation; measure spaces and kernels
Modern probability theory is technically a branch of measure theory, and
any systematic exposition of the subject must begin with some basic
measure-theoretic facts. In this chapter and its sequel we have collected
some basic ideas and results from measure theory that will be useful
throughout this book. Though most of the quoted propositions may be
found in any textbook in real analysis, our emphasis is often somewhat
different and has been chosen to suit our special needs. Many readers
may prefer to omit these chapters on their first encounter and return for
reference when the need arises.
To fix our notation, we begin with some elementary notions from set
theory. For subsets $A, A_k, B, \ldots$ of some abstract space $\Omega$, recall the definitions
of union $A \cup B$ or $\bigcup_k A_k$, intersection $A \cap B$ or $\bigcap_k A_k$, complement
$A^c$, and difference $A \setminus B = A \cap B^c$. The latter is said to be proper if $A \supset B$.
The symmetric difference of $A$ and $B$ is given by $A \Delta B = (A \setminus B) \cup (B \setminus A)$.
Among basic set relations, we note in particular the distributive laws
$$A \cap \bigcup_k B_k = \bigcup_k (A \cap B_k), \qquad A \cup \bigcap_k B_k = \bigcap_k (A \cup B_k),$$
and de Morgan's laws
$$\Bigl\{\bigcup_k A_k\Bigr\}^c = \bigcap_k A_k^c, \qquad \Bigl\{\bigcap_k A_k\Bigr\}^c = \bigcup_k A_k^c,$$
valid for arbitrary (not necessarily countable) unions and intersections.
The latter formulas allow us to convert any relation involving unions
(intersections) into the dual formula for intersections (unions).
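As a quick sanity check, the identities above can be tested on a small finite example. The following Python sketch is only an illustration: the space `OMEGA`, the set `A`, and the family `FAMILY` are arbitrary choices made here, and a finite check of course proves nothing about arbitrary unions and intersections.

```python
# Illustrative finite example; OMEGA, A, and FAMILY are hypothetical choices.
OMEGA = frozenset(range(8))
A = frozenset({0, 1, 2, 5})
FAMILY = [frozenset({1, 2, 3}), frozenset({2, 5, 6}), frozenset({0, 2, 7})]


def complement(s, omega=OMEGA):
    """Complement of s relative to the ambient space omega."""
    return omega - s


def union_all(sets):
    """Union of a finite family of sets (empty family gives the empty set)."""
    return frozenset().union(*sets)


def intersection_all(sets, omega=OMEGA):
    """Intersection of a finite family (empty family gives omega by convention)."""
    result = omega
    for s in sets:
        result = result & s
    return result


def check_set_laws(a, family, omega=OMEGA):
    """Return truth values of the two distributive laws and the two de Morgan laws."""
    dist_1 = a & union_all(family) == union_all([a & b for b in family])
    dist_2 = a | intersection_all(family, omega) == intersection_all(
        [a | b for b in family], omega
    )
    morgan_1 = complement(union_all(family), omega) == intersection_all(
        [complement(b, omega) for b in family], omega
    )
    morgan_2 = complement(intersection_all(family, omega), omega) == union_all(
        [complement(b, omega) for b in family]
    )
    return dist_1, dist_2, morgan_1, morgan_2


if __name__ == "__main__":
    print(check_set_laws(A, FAMILY))
    # Symmetric difference agrees with its definition (A \ B) U (B \ A).
    B = FAMILY[0]
    print((A ^ B) == (A - B) | (B - A))
```

Python's `frozenset` is used so that sets can later be collected into classes of sets; the operators `&`, `|`, `-`, and `^` correspond to intersection, union, difference, and symmetric difference.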
We define a $\sigma$-algebra or $\sigma$-field in $\Omega$ as a nonempty collection $\mathcal{A}$ of
subsets of $\Omega$ that is closed under countable unions and intersections as well
as under complementation. (For a field, closure is required only under finite
set operations.) Thus, if $A, A_1, A_2, \ldots \in \mathcal{A}$, then also $A^c$, $\bigcup_k A_k$, and $\bigcap_k A_k$
lie in $\mathcal{A}$. In particular, the whole space $\Omega$ and the empty set $\emptyset$ belong to
every $\sigma$-field. In any space $\Omega$ there is a smallest $\sigma$-field $\{\emptyset, \Omega\}$ and a largest
one $2^\Omega$, the class of all subsets of $\Omega$. Note that any $\sigma$-field $\mathcal{A}$ is closed
under monotone limits. Thus, if $A_1, A_2, \ldots \in \mathcal{A}$ with $A_n \uparrow A$ or $A_n \downarrow A$,
then also $A \in \mathcal{A}$. A measurable space is a pair $(\Omega, \mathcal{A})$, where $\Omega$ is a space
and $\mathcal{A}$ is a $\sigma$-field in $\Omega$.
For any class of $\sigma$-fields in $\Omega$, the intersection (but usually not the union)
is again a $\sigma$-field. If $\mathcal{C}$ is an arbitrary class of subsets of $\Omega$, there is a smallest
$\sigma$-field in $\Omega$ containing $\mathcal{C}$, denoted by $\sigma(\mathcal{C})$ and called the $\sigma$-field generated
or induced by $\mathcal{C}$. Note that $\sigma(\mathcal{C})$ can be obtained as the intersection of all
$\sigma$-fields in $\Omega$ that contain $\mathcal{C}$. We endow a metric or topological space $S$ with
its Borel $\sigma$-field $\mathcal{B}(S)$ generated by the topology (class of open subsets) in
$S$, unless a $\sigma$-field is otherwise specified. The elements of $\mathcal{B}(S)$ are called
Borel sets. In the case of the real line $\mathbb{R}$, we often write $\mathcal{B}$ instead of $\mathcal{B}(\mathbb{R})$.
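On a finite space the generated $\sigma$-field can be computed explicitly, which may help to make the definition concrete. The sketch below is an illustration under that finiteness assumption only: it builds the class from the generators by repeatedly adding complements and pairwise unions, which on a finite space yields the same class as the intersection of all $\sigma$-fields containing the generators. The function name `generate_sigma_field` and the example data are hypothetical.

```python
def generate_sigma_field(omega, generators):
    """Smallest sigma-field on a FINITE space omega containing the generators.

    On a finite space, closure under complements and pairwise unions already
    gives closure under countable unions and intersections.
    """
    omega = frozenset(omega)
    sigma = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sigma)
        for a in current:
            candidates = [omega - a] + [a | b for b in current]
            for c in candidates:
                if c not in sigma:
                    sigma.add(c)
                    changed = True
    return sigma


if __name__ == "__main__":
    # Hypothetical example: the generators {1} and {1, 2} in the space {1, 2, 3, 4}
    # cannot separate the points 3 and 4.
    field = generate_sigma_field({1, 2, 3, 4}, [{1}, {1, 2}])
    for s in sorted(field, key=lambda s: (len(s), sorted(s))):
        print(sorted(s))
```

In this example the generated class consists of all unions of the three blocks {1}, {2}, and {3, 4}, so the singleton {3} is not measurable even though {3, 4} is.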
More primitive classes than $\sigma$-fields often arise in applications. A class
$\mathcal{C}$ of subsets of some space $\Omega$ is called a $\pi$-system if it is closed under finite
intersections, so that $A, B \in \mathcal{C}$ implies $A \cap B \in \mathcal{C}$. Furthermore, a class $\mathcal{D}$
is a $\lambda$-system if it contains $\Omega$ and is closed under proper differences and
increasing limits. Thus, we require that $\Omega \in \mathcal{D}$, that $A, B \in \mathcal{D}$ with $A \supset B$
implies $A \setminus B \in \mathcal{D}$, and that $A_1, A_2, \ldots \in \mathcal{D}$ with $A_n \uparrow A$ implies $A \in \mathcal{D}$.
The following monotone-class theorem is often useful to extend an established
property or relation from a class $\mathcal{C}$ to the generated $\sigma$-field $\sigma(\mathcal{C})$. An
application of this result is referred to as a monotone-class argument.
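Theorem 1.1 (monotone classes, Sierpiński) Let $\mathcal{C}$ be a $\pi$-system and $\mathcal{D}$
a $\lambda$-system in some space $\Omega$ such that $\mathcal{C} \subset \mathcal{D}$. Then $\sigma(\mathcal{C}) \subset \mathcal{D}$.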
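Proof: We may clearly assume that $\mathcal{D} = \lambda(\mathcal{C})$, the smallest $\lambda$-system
containing $\mathcal{C}$. It suffices to show that $\mathcal{D}$ is a $\pi$-system, since it is then a
$\sigma$-field containing $\mathcal{C}$ and therefore contains the smallest $\sigma$-field $\sigma(\mathcal{C})$ with
this property. Thus, we need to show that $A \cap B \in \mathcal{D}$ whenever $A, B \in \mathcal{D}$.
The relation $A \cap B \in \mathcal{D}$ is certainly true when $A, B \in \mathcal{C}$, since $\mathcal{C}$ is a
$\pi$-system contained in $\mathcal{D}$. We proceed by extension in two steps. First we
fix any $B \in \mathcal{C}$ and define $\mathcal{A}_B = \{A \subset \Omega;\ A \cap B \in \mathcal{D}\}$. Then $\mathcal{A}_B$ is a
$\lambda$-system containing $\mathcal{C}$, and so it contains the smallest $\lambda$-system $\mathcal{D}$ with this
property. This shows that $A \cap B \in \mathcal{D}$ for any $A \in \mathcal{D}$ and $B \in \mathcal{C}$. Next we
fix any $A \in \mathcal{D}$ and define $\mathcal{B}_A = \{B \subset \Omega;\ A \cap B \in \mathcal{D}\}$. As before, we note
that even $\mathcal{B}_A$ contains $\mathcal{D}$, which yields the desired property. $\Box$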
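For any family of spaces $\Omega_t$, $t \in T$, we define the Cartesian product
$\mathsf{X}_{t \in T}\, \Omega_t$ as the class of all collections $(\omega_t;\ t \in T)$, where $\omega_t \in \Omega_t$ for all $t$.
When $T = \{1, \ldots, n\}$ or $T = \mathbb{N} = \{1, 2, \ldots\}$, we often write the product
space as $\Omega_1 \times \cdots \times \Omega_n$ or $\Omega_1 \times \Omega_2 \times \cdots$, respectively; if $\Omega_t = \Omega$ for all $t$, we
use the notation $\Omega^T$, $\Omega^n$, or $\Omega^\infty$. In case of topological spaces $\Omega_t$, we endow
$\mathsf{X}_t\, \Omega_t$ with the product topology unless a topology is otherwise specified.
Now assume that each space $\Omega_t$ is equipped with a $\sigma$-field $\mathcal{A}_t$. In $\mathsf{X}_t\, \Omega_t$
we may then introduce the product $\sigma$-field $\bigotimes_t \mathcal{A}_t$, generated by all
one-dimensional cylinder sets $A_t \times \mathsf{X}_{s \neq t}\, \Omega_s$, where $t \in T$ and $A_t \in \mathcal{A}_t$. (Note
the analogy with the definition of product topologies.) As before, we write
$\mathcal{A}_1 \otimes \cdots \otimes \mathcal{A}_n$, $\mathcal{A}_1 \otimes \mathcal{A}_2 \otimes \cdots$, $\mathcal{A}^T$, $\mathcal{A}^n$, or $\mathcal{A}^\infty$ in the appropriate special
cases.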
Lemma 1.2 (product and Borel $\sigma$-fields) If $S_1, S_2, \ldots$ are separable
metric spaces, then
$$\mathcal{B}(S_1 \times S_2 \times \cdots) = \mathcal{B}(S_1) \otimes \mathcal{B}(S_2) \otimes \cdots.$$
Thus, for countable products of separable metric spaces, the product
and Borel $\sigma$-fields agree. In particular, $\mathcal{B}(\mathbb{R}^d) = (\mathcal{B}(\mathbb{R}))^d = \mathcal{B}^d$, the
$\sigma$-field generated by all rectangular boxes $I_1 \times \cdots \times I_d$, where $I_1, \ldots, I_d$ are
arbitrary real intervals. This special case can also be proved directly.
Proof: The assertion may be written as $\sigma(\mathcal{C}_1) = \sigma(\mathcal{C}_2)$, and it suffices to
show that $\mathcal{C}_1 \subset \sigma(\mathcal{C}_2)$ and $\mathcal{C}_2 \subset \sigma(\mathcal{C}_1)$. For $\mathcal{C}_2$ we may choose the class of
all cylinder sets $G_k \times \mathsf{X}_{n \neq k}\, S_n$ with $k \in \mathbb{N}$ and $G_k$ open in $S_k$. Those sets
generate the product topology in $S = \mathsf{X}_n\, S_n$, and so they belong to $\mathcal{B}(S)$.
Conversely, we note that $S = \mathsf{X}_n\, S_n$ is again separable. Thus, for any
topological base $\mathcal{C}$ in $S$, the open subsets of $S$ are countable unions of sets
in $\mathcal{C}$. In particular, we may choose $\mathcal{C}$ to consist of all finite intersections of
cylinder sets $G_k \times \mathsf{X}_{n \neq k}\, S_n$ as above. It remains to note that the latter sets
lie in $\bigotimes_n \mathcal{B}(S_n)$. □
Every point mapping $f$ between two spaces $S$ and $T$ induces a set
mapping $f^{-1}$ in the opposite direction, that is, from $2^T$ to $2^S$, given by
$$f^{-1}B = \{s \in S;\ f(s) \in B\}, \quad B \subset T.$$
Note that $f^{-1}$ preserves the basic set operations in the sense that, for any
subsets $B$ and $B_k$ of $T$,
$$f^{-1}B^c = (f^{-1}B)^c, \qquad f^{-1}\bigcup_k B_k = \bigcup_k f^{-1}B_k, \qquad f^{-1}\bigcap_k B_k = \bigcap_k f^{-1}B_k. \tag{1}$$
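The relations (1) are easy to test on finite sets. The sketch below is an illustration only: the helper `preimage` and the particular choice of $S$, $T$, and $f$ are assumptions made for the example, not notation from the text.

```python
def preimage(f, domain, B):
    """Return f^{-1}B = {s in domain; f(s) in B} as a frozenset."""
    return frozenset(s for s in domain if f(s) in B)

# Hypothetical example: f(s) = s^2 maps S = {-3,...,3} into T = {0,1,4,9}.
S = frozenset(range(-3, 4))
T = frozenset({0, 1, 4, 9})
square = lambda s: s * s
B1, B2 = frozenset({0, 1}), frozenset({1, 4})

# complement, union, and intersection are each preserved by f^{-1}
complement_ok = preimage(square, S, T - B1) == S - preimage(square, S, B1)
union_ok = preimage(square, S, B1 | B2) == preimage(square, S, B1) | preimage(square, S, B2)
intersection_ok = preimage(square, S, B1 & B2) == preimage(square, S, B1) & preimage(square, S, B2)
```

Note that the corresponding statements for direct images fail in general (for intersections and complements), which is why measurability is defined through $f^{-1}$.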
The next result shows that $f^{-1}$ also preserves $\sigma$-fields, in both directions.
For convenience, we write
$$f^{-1}\mathcal{C} = \{f^{-1}B;\ B \in \mathcal{C}\}, \quad \mathcal{C} \subset 2^T.$$
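Lemma 1.3 (induced $\sigma$-fields) Let $f$ be a mapping between two measurable
spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$. Then
(i) $\mathcal{S}' = f^{-1}\mathcal{T}$ is a $\sigma$-field in $S$;
(ii) $\mathcal{T}' = \{B \subset T;\ f^{-1}B \in \mathcal{S}\}$ is a $\sigma$-field in $T$.
Proof: (i) Let $A, A_1, A_2, \ldots \in \mathcal{S}'$. Then there exist some sets $B, B_1, B_2,
\ldots \in \mathcal{T}$ with $A = f^{-1}B$ and $A_n = f^{-1}B_n$ for each $n$. Since $\mathcal{T}$ is a $\sigma$-field,
the sets $B^c$, $\bigcup_n B_n$, and $\bigcap_n B_n$ all belong to $\mathcal{T}$, and by (1) we get
$$A^c = (f^{-1}B)^c = f^{-1}B^c \in f^{-1}\mathcal{T} = \mathcal{S}',$$
$$\bigcup_n A_n = \bigcup_n f^{-1}B_n = f^{-1}\bigcup_n B_n \in f^{-1}\mathcal{T} = \mathcal{S}',$$
$$\bigcap_n A_n = \bigcap_n f^{-1}B_n = f^{-1}\bigcap_n B_n \in f^{-1}\mathcal{T} = \mathcal{S}'.$$
(ii) Let $B, B_1, B_2, \ldots \in \mathcal{T}'$, so that $f^{-1}B, f^{-1}B_1, f^{-1}B_2, \ldots \in \mathcal{S}$. Using
(1) and the fact that $\mathcal{S}$ is a $\sigma$-field, we get
$$f^{-1}B^c = (f^{-1}B)^c \in \mathcal{S},$$
$$f^{-1}\bigcup_n B_n = \bigcup_n f^{-1}B_n \in \mathcal{S},$$
$$f^{-1}\bigcap_n B_n = \bigcap_n f^{-1}B_n \in \mathcal{S},$$
which shows that $B^c$, $\bigcup_n B_n$, and $\bigcap_n B_n$ all lie in $\mathcal{T}'$. □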
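Given two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a mapping $f\colon S \to T$
is said to be $\mathcal{S}/\mathcal{T}$-measurable or simply measurable if $f^{-1}\mathcal{T} \subset \mathcal{S}$, that is,
if $f^{-1}B \in \mathcal{S}$ for every $B \in \mathcal{T}$. (Note the analogy with the definition of
continuity in terms of topologies on $S$ and $T$.) By the next result, it is
enough to verify the defining condition for a generating subclass.
Lemma 1.4 (measurable functions) Consider a mapping $f$ between two
measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, and let $\mathcal{C} \subset 2^T$ with $\sigma(\mathcal{C}) = \mathcal{T}$. Then
$f$ is $\mathcal{S}/\mathcal{T}$-measurable iff $f^{-1}\mathcal{C} \subset \mathcal{S}$.
Proof: Let $\mathcal{T}' = \{B \subset T;\ f^{-1}B \in \mathcal{S}\}$. Then $\mathcal{C} \subset \mathcal{T}'$ by hypothesis and
$\mathcal{T}'$ is a $\sigma$-field by Lemma 1.3 (ii). Hence,
$$\mathcal{T}' = \sigma(\mathcal{T}') \supset \sigma(\mathcal{C}) = \mathcal{T},$$
which shows that $f^{-1}B \in \mathcal{S}$ for all $B \in \mathcal{T}$. □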
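Lemma 1.5 (continuity and measurability) Let $f$ be a continuous mapping
between two topological spaces $S$ and $T$ with Borel $\sigma$-fields $\mathcal{S}$ and $\mathcal{T}$.
Then $f$ is $\mathcal{S}/\mathcal{T}$-measurable.
Proof: Let $\mathcal{S}'$ and $\mathcal{T}'$ denote the classes of open sets in $S$ and $T$. Since $f$
is continuous and $\mathcal{S} = \sigma(\mathcal{S}')$, we have
$$f^{-1}\mathcal{T}' \subset \mathcal{S}' \subset \mathcal{S}.$$
By Lemma 1.4 it follows that $f$ is $\mathcal{S}/\sigma(\mathcal{T}')$-measurable. It remains to note
that $\sigma(\mathcal{T}') = \mathcal{T}$. □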
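We insert a result about subspace topologies and $\sigma$-fields that will be
needed in Chapter 16. Given a class $\mathcal{C}$ of subsets of $S$ and a set $A \subset S$, we
define $A \cap \mathcal{C} = \{A \cap C;\ C \in \mathcal{C}\}$.
Lemma 1.6 (subspaces) Fix a metric space $(S, \rho)$ with topology $\mathcal{T}$ and
Borel $\sigma$-field $\mathcal{S}$, and let $A \subset S$. Then $(A, \rho)$ has topology $\mathcal{T}_A = A \cap \mathcal{T}$ and
Borel $\sigma$-field $\mathcal{S}_A = A \cap \mathcal{S}$.
Proof: The natural embedding $I_A\colon A \to S$ is continuous and hence measurable,
and so $A \cap \mathcal{T} = I_A^{-1}\mathcal{T} \subset \mathcal{T}_A$ and $A \cap \mathcal{S} = I_A^{-1}\mathcal{S} \subset \mathcal{S}_A$. Conversely,
given any $B \in \mathcal{T}_A$, we define $G = (B \cup A^c)^\circ$, where the complement and
interior are with respect to $S$, and note that $B = A \cap G$. Hence, $\mathcal{T}_A \subset A \cap \mathcal{T}$,
and therefore
$$\mathcal{S}_A = \sigma(\mathcal{T}_A) \subset \sigma(A \cap \mathcal{T}) \subset \sigma(A \cap \mathcal{S}) = A \cap \mathcal{S},$$
where the operation $\sigma(\cdot)$ refers to the subspace $A$. □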
As with continuity, we note that even measurability is preserved by
composition.
Lemma 1.7 (composition) Fix three measurable spaces $(S, \mathcal{S})$, $(T, \mathcal{T})$,
and $(U, \mathcal{U})$, and consider some measurable mappings $f\colon S \to T$ and
$g\colon T \to U$. Then the composition $h = g \circ f\colon S \to U$ is again measurable.
Proof: Let $C \in \mathcal{U}$, and note that $B \equiv g^{-1}C \in \mathcal{T}$ since $g$ is measurable.
Noting that $(g \circ f)^{-1} = f^{-1} \circ g^{-1}$ and using the fact that even $f$ is
measurable, we get
$$h^{-1}C = (g \circ f)^{-1}C = f^{-1}g^{-1}C = f^{-1}B \in \mathcal{S}.$$ □
To state the next result, we note that any collection of functions $f_t\colon \Omega \to
S_t$, $t \in T$, defines a mapping $f = (f_t)$ from $\Omega$ to $\mathsf{X}_t\, S_t$ given by
$$f(\omega) = (f_t(\omega);\ t \in T), \quad \omega \in \Omega. \tag{2}$$
It is often useful to relate the measurability of $f$ to that of the coordinate
mappings $f_t$.
Lemma 1.8 (collections of functions) Consider any set of functions $f_t\colon
\Omega \to S_t$, $t \in T$, where $(\Omega, \mathcal{A})$ and $(S_t, \mathcal{S}_t)$, $t \in T$, are measurable spaces,
and define $f = (f_t)\colon \Omega \to \mathsf{X}_t\, S_t$. Then $f$ is $\mathcal{A}/\bigotimes_t \mathcal{S}_t$-measurable iff $f_t$ is
$\mathcal{A}/\mathcal{S}_t$-measurable for every $t \in T$.
Proof: We may use Lemma 1.4, with $\mathcal{C}$ equal to the class of cylinder sets
$A_t \times \mathsf{X}_{s \neq t}\, S_s$ for arbitrary $t \in T$ and $A_t \in \mathcal{S}_t$. □
Changing our perspective, assume the $f_t$ in (2) to be mappings into some
measurable spaces $(S_t, \mathcal{S}_t)$. In $\Omega$ we may then introduce the generated or
induced $\sigma$-field $\sigma(f) = \sigma\{f_t;\ t \in T\}$, defined as the smallest $\sigma$-field in $\Omega$
that makes all the $f_t$ measurable. In other words, $\sigma(f)$ is the intersection of
all $\sigma$-fields $\mathcal{A}$ in $\Omega$ such that $f_t$ is $\mathcal{A}/\mathcal{S}_t$-measurable for every $t \in T$. In this
notation, the functions $f_t$ are clearly measurable with respect to a $\sigma$-field
$\mathcal{A}$ in $\Omega$ iff $\sigma(f) \subset \mathcal{A}$. It is further useful to note that $\sigma(f)$ agrees with the
$\sigma$-field in $\Omega$ generated by the collection $\{f_t^{-1}\mathcal{S}_t;\ t \in T\}$.
For functions on or into a Euclidean space $\mathbb{R}^d$, measurability is understood
to be with respect to the Borel $\sigma$-field $\mathcal{B}^d$. Thus, a real-valued function
$f$ on some measurable space $(\Omega, \mathcal{A})$ is measurable iff $\{\omega;\ f(\omega) \le x\} \in \mathcal{A}$
for all $x \in \mathbb{R}$. The same convention applies to functions into the extended
real line $\overline{\mathbb{R}} = [-\infty, \infty]$ or the extended half-line $\overline{\mathbb{R}}_+ = [0, \infty]$,
regarded as compactifications of $\mathbb{R}$ and $\mathbb{R}_+ = [0, \infty)$, respectively. Note
that $\mathcal{B}(\overline{\mathbb{R}}) = \sigma\{\mathcal{B}, \pm\infty\}$ and $\mathcal{B}(\overline{\mathbb{R}}_+) = \sigma\{\mathcal{B}(\mathbb{R}_+), \infty\}$.
For any set $A \subset \Omega$, we define the associated indicator function $1_A\colon \Omega \to \mathbb{R}$
to be equal to 1 on $A$ and to 0 on $A^c$. (The term characteristic function has
a different meaning in probability theory.) For sets $A = \{\omega;\ f(\omega) \in B\}$, it
is often convenient to write $1\{\cdot\}$ instead of $1_{\{\cdot\}}$. Assuming $\mathcal{A}$ to be a $\sigma$-field
in $\Omega$, we note that $1_A$ is $\mathcal{A}$-measurable iff $A \in \mathcal{A}$.
Linear combinations of indicator functions are called simple functions.
Thus, a general simple function $f\colon \Omega \to \mathbb{R}$ has the form
$$f = c_1 1_{A_1} + \cdots + c_n 1_{A_n},$$
where $n \in \mathbb{Z}_+ = \{0, 1, \ldots\}$, $c_1, \ldots, c_n \in \mathbb{R}$, and $A_1, \ldots, A_n \subset \Omega$. Here we
may clearly take $c_1, \ldots, c_n$ to be the distinct nonzero values attained by $f$
and define $A_k = f^{-1}\{c_k\}$, $k = 1, \ldots, n$. With this choice of representation,
we note that $f$ is measurable with respect to a given $\sigma$-field $\mathcal{A}$ in $\Omega$ iff
$A_1, \ldots, A_n \in \mathcal{A}$.
We proceed to show that the class of measurable functions is closed under
the basic finite or countable operations occurring in analysis.
Lemma 1.9 (bounds and limits) Let $f_1, f_2, \ldots$ be measurable functions
from some measurable space $(\Omega, \mathcal{A})$ into $\overline{\mathbb{R}}$. Then $\sup_n f_n$, $\inf_n f_n$,
$\limsup_n f_n$, and $\liminf_n f_n$ are again measurable.
Proof: To see that $\sup_n f_n$ is measurable, write
$$\{\omega;\ \sup_n f_n(\omega) \le t\} = \bigcap_n \{\omega;\ f_n(\omega) \le t\} = \bigcap_n f_n^{-1}[-\infty, t] \in \mathcal{A},$$
and use Lemma 1.4. The measurability of the other three functions follows
easily if we write $\inf_n f_n = -\sup_n(-f_n)$ and note that
$$\limsup_{n \to \infty} f_n = \inf_n \sup_{k \ge n} f_k, \qquad \liminf_{n \to \infty} f_n = \sup_n \inf_{k \ge n} f_k.$$ □
Since $f_n \to f$ iff $\limsup_n f_n = \liminf_n f_n = f$, it follows easily that both
the set of convergence and the possible limit are measurable. The next
result gives an extension to functions with values in more general spaces.
Lemma 1.10 (convergence and limits) Let $f_1, f_2, \ldots$ be measurable functions
from a measurable space $(\Omega, \mathcal{A})$ into some metric space $(S, \rho)$.
Then
(i) $\{\omega;\ f_n(\omega) \text{ converges}\} \in \mathcal{A}$ when $S$ is complete;
(ii) $f_n \to f$ on $\Omega$ implies that $f$ is measurable.
Proof: (i) Since $S$ is complete, the convergence of $f_n$ is equivalent to the
Cauchy convergence
$$\lim_{n \to \infty} \sup_{m \ge n} \rho(f_m, f_n) = 0.$$
Here the left-hand side is measurable by Lemmas 1.5 and 1.9.
(ii) If $f_n \to f$, we have $g \circ f_n \to g \circ f$ for any continuous function $g\colon S \to
\mathbb{R}$, and so $g \circ f$ is measurable by Lemmas 1.5 and 1.9. Fixing any open set
$G \subset S$, we may choose some continuous functions $g_1, g_2, \ldots\colon S \to \mathbb{R}_+$ with
$g_n \uparrow 1_G$ and conclude from Lemma 1.9 that $1_G \circ f$ is measurable. Thus,
$f^{-1}G \in \mathcal{A}$ for all $G$, and so $f$ is measurable by Lemma 1.4. □
Many results in measure theory are proved by a simple approximation,
based on the following observation.
Lemma 1.11 (approximation) For any measurable function $f\colon (\Omega, \mathcal{A}) \to
\overline{\mathbb{R}}_+$, there exist some simple measurable functions $f_1, f_2, \ldots\colon \Omega \to \mathbb{R}_+$ with
$0 \le f_n \uparrow f$.
Proof: We may define
$$f_n(\omega) = 2^{-n}[2^n f(\omega)] \wedge n, \quad \omega \in \Omega,\ n \in \mathbb{N}.$$ □
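The approximating sequence in this proof can be evaluated pointwise. The following sketch reads $[x]$ as the integer part of $x$ and, as a simplifying assumption, handles only finite values $f(\omega) \in [0, \infty)$; the name `dyadic_approx` is hypothetical.

```python
import math

def dyadic_approx(value, n):
    """Return 2^{-n} [2^n value] ^ n for a finite value >= 0.

    [x] is the integer part and ^ the minimum, as in the proof of
    Lemma 1.11; the case value = infinity is not handled here.
    """
    return min(math.floor(2 ** n * value) / 2 ** n, n)

# For value = pi the first terms are 1, 2, 3, 3.125, ...: the cap n is
# active for n <= 3, after which the dyadic truncation takes over.
approximations = [dyadic_approx(math.pi, n) for n in range(1, 21)]
```

Each $f_n$ takes only finitely many values (multiples of $2^{-n}$ up to $n$), which is what makes it simple, while the truncation error is below $2^{-n}$ as soon as $n \ge f(\omega)$.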
To illustrate the method, we may use the last lemma to prove the
measurability of the basic arithmetic operations.
Lemma 1.12 (elementary operations) Fix any measurable functions $f, g\colon
(\Omega, \mathcal{A}) \to \mathbb{R}$ and constants $a, b \in \mathbb{R}$. Then $af + bg$ and $fg$ are again
measurable, and so is $f/g$ when $g \neq 0$ on $\Omega$.
Proof: By Lemma 1.11 applied to $f_\pm = (\pm f) \vee 0$ and $g_\pm = (\pm g) \vee 0$,
we may approximate by simple measurable functions $f_n \to f$ and $g_n \to g$.
Here $af_n + bg_n$ and $f_n g_n$ are again simple measurable functions. Since
they converge to $af + bg$ and $fg$, respectively, even the latter functions are
measurable by Lemma 1.9. The same argument applies to the ratio $f/g$,
provided we choose $g_n \neq 0$.
An alternative argument is to write $af + bg$, $fg$, or $f/g$ as a composition
$\psi \circ \varphi$, where $\varphi = (f, g)\colon \Omega \to \mathbb{R}^2$, and $\psi(x, y)$ is defined as $ax + by$, $xy$,
or $x/y$, respectively. The desired measurability then follows by Lemmas 1.2,
1.5, and 1.8. In the case of ratios, we may use the continuity of the mapping
$(x, y) \mapsto x/y$ on $\mathbb{R} \times (\mathbb{R} \setminus \{0\})$. □
For many statements in measure theory and probability, it is convenient
first to give a proof for the real line and then to extend the result to more
general spaces. In this context, it is useful to identify pairs of measurable
spaces $S$ and $T$ that are Borel isomorphic, in the sense that there exists a
bijection $f\colon S \to T$ such that both $f$ and $f^{-1}$ are measurable. A space $S$
that is Borel isomorphic to a Borel subset of $[0, 1]$ is called a Borel space.
In particular, any Polish space endowed with its Borel $\sigma$-field is known to
be Borel (cf. Theorem A1.2). (Recall that a topological space is said to be
Polish if it admits a separable and complete metrization.)
The next result gives a useful functional representation of measurable
functions. Given any two functions $f$ and $g$ on the same space $\Omega$, we say
that $f$ is $g$-measurable if the induced $\sigma$-fields are related by $\sigma(f) \subset \sigma(g)$.
Lemma 1.13 (functional representation, Doob) Fix two measurable functions
$f$ and $g$ from a space $\Omega$ into some measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$,
where the former is Borel. Then $f$ is $g$-measurable iff there exists some
measurable mapping $h\colon T \to S$ with $f = h \circ g$.
Proof: Since $S$ is Borel, we may assume that $S \in \mathcal{B}([0, 1])$. By a suitable
modification of $h$, we may further reduce to the case when $S = [0, 1]$. If
$f = 1_A$ with a $g$-measurable $A \subset \Omega$, then by Lemma 1.3 there exists some
set $B \in \mathcal{T}$ with $A = g^{-1}B$. In this case $f = 1_A = 1_B \circ g$, and we may
choose $h = 1_B$. The result extends by linearity to any simple $g$-measurable
function $f$. In the general case, there exist by Lemma 1.11 some simple
$g$-measurable functions $f_1, f_2, \ldots$ with $0 \le f_n \uparrow f$, and we may choose
associated $\mathcal{T}$-measurable functions $h_1, h_2, \ldots\colon T \to [0, 1]$ with $f_n = h_n \circ g$.
Then $h = \sup_n h_n$ is again $\mathcal{T}$-measurable by Lemma 1.9, and we note that
$$h \circ g = (\sup_n h_n) \circ g = \sup_n (h_n \circ g) = \sup_n f_n = f.$$ □
Given any measurable space $(\Omega, \mathcal{A})$, a function $\mu\colon \mathcal{A} \to \overline{\mathbb{R}}_+$ is said to be
countably additive if
$$\mu \bigcup_{k \ge 1} A_k = \sum_{k \ge 1} \mu A_k, \quad A_1, A_2, \ldots \in \mathcal{A} \text{ disjoint}. \tag{3}$$
A measure on $(\Omega, \mathcal{A})$ is defined as a function $\mu\colon \mathcal{A} \to \overline{\mathbb{R}}_+$ with $\mu\emptyset = 0$ and
satisfying (3). A triple $(\Omega, \mathcal{A}, \mu)$ as above, where $\mu$ is a measure, is called a
measure space. From (3) we note that any measure is finitely additive and
nondecreasing. This implies in turn the countable subadditivity
$$\mu \bigcup_{k \ge 1} A_k \le \sum_{k \ge 1} \mu A_k, \quad A_1, A_2, \ldots \in \mathcal{A}.$$
We note the following basic continuity properties.
Lemma 1.14 (continuity) Let $\mu$ be a measure on $(\Omega, \mathcal{A})$, and assume that
$A_1, A_2, \ldots \in \mathcal{A}$. Then
(i) $A_n \uparrow A$ implies $\mu A_n \uparrow \mu A$;
(ii) $A_n \downarrow A$ with $\mu A_1 < \infty$ implies $\mu A_n \downarrow \mu A$.
Proof: For (i) we may apply (3) to the differences $D_n = A_n \setminus A_{n-1}$ with
$A_0 = \emptyset$. To get (ii), apply (i) to the sets $B_n = A_1 \setminus A_n$. □
The simplest measures on a measurable space $(\Omega, \mathcal{A})$ are the unit masses
or Dirac measures $\delta_x$, $x \in \Omega$, given by $\delta_x A = 1_A(x)$. For any countable
set $A = \{x_1, x_2, \ldots\}$, we may form the associated counting measure $\mu =
\sum_n \delta_{x_n}$. More generally, we may form countable linear combinations of
arbitrary measures on $\Omega$, as follows.
Proposition 1.15 (series of measures) For any measures $\mu_1, \mu_2, \ldots$
on $(\Omega, \mathcal{A})$ and constants $c_1, c_2, \ldots \ge 0$, the sum $\mu = \sum_n c_n \mu_n$ is again a
measure.
Proof: We need the fact that, for any array of constants $c_{ij} \ge 0$, $i, j \in \mathbb{N}$,
$$\sum_i \sum_j c_{ij} = \sum_j \sum_i c_{ij}. \tag{4}$$
This is trivially true for finite sums. In general, let $m, n \in \mathbb{N}$ and write
$$\sum_i \sum_j c_{ij} \ge \sum_{i \le m} \sum_{j \le n} c_{ij} = \sum_{j \le n} \sum_{i \le m} c_{ij}.$$
Letting $m \to \infty$ and then $n \to \infty$, we obtain (4) with the inequality $\ge$.
The same argument yields the reverse relation, and the equality follows.
Now consider any disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$. Using (4) and the
countable additivity of each $\mu_n$, we get
$$\mu \bigcup_k A_k = \sum_n c_n \mu_n \bigcup_k A_k = \sum_n \sum_k c_n \mu_n A_k = \sum_k \sum_n c_n \mu_n A_k = \sum_k \mu A_k.$$ □
The last result may be restated in terms of monotone sequences.
Corollary 1.16 (monotone limits) Let $\mu_1, \mu_2, \ldots$ be measures on a measurable
space $(\Omega, \mathcal{A})$ such that either $\mu_n \uparrow \mu$, or $\mu_n \downarrow \mu$ with $\mu_1$ bounded.
Then $\mu$ is again a measure on $(\Omega, \mathcal{A})$.
Proof: In the increasing case, we may apply Proposition 1.15 to the sum
$\mu = \sum_n (\mu_n - \mu_{n-1})$, where $\mu_0 = 0$. For decreasing sequences, the previous
case applies to the increasing measures $\mu_1 - \mu_n$. □
For any measure $\mu$ on $(\Omega, \mathcal{A})$ and set $B \in \mathcal{A}$, the function $\nu\colon A \mapsto
\mu(A \cap B)$ is again a measure on $(\Omega, \mathcal{A})$, called the restriction of $\mu$ to $B$.
Given any countable partition of $\Omega$ into disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$, we
note that $\mu = \sum_n \mu_n$, where $\mu_n$ denotes the restriction of $\mu$ to $A_n$. The
measure $\mu$ is said to be $\sigma$-finite if the partition can be chosen such that
$\mu A_n < \infty$ for all $n$. In that case the restrictions $\mu_n$ are clearly bounded.
A measure $\mu$ on some topological space $S$ with Borel $\sigma$-field $\mathcal{S}$ is said to
be locally finite if every point $s \in S$ has a neighborhood where $\mu$ is finite.
A locally finite measure on a $\sigma$-compact space is clearly $\sigma$-finite. It is often
useful to identify simple measure-determining classes $\mathcal{C} \subset \mathcal{S}$ such that a
measure on $S$ is uniquely determined by its values on $\mathcal{C}$. For locally finite
measures on a Euclidean space $\mathbb{R}^d$, we may take $\mathcal{C} = \mathcal{I}^d$, the class of all
bounded rectangles.
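Lemma 1.17 (uniqueness) Let $\mu$ and $\nu$ be bounded measures on some
measurable space $(\Omega, \mathcal{A})$, and let $\mathcal{C}$ be a $\pi$-system in $\Omega$ such that $\Omega \in \mathcal{C}$ and
$\sigma(\mathcal{C}) = \mathcal{A}$. Then $\mu = \nu$ iff $\mu A = \nu A$ for all $A \in \mathcal{C}$.
Proof: Assuming $\mu = \nu$ on $\mathcal{C}$, let $\mathcal{D}$ denote the class of sets $A \in \mathcal{A}$ with
$\mu A = \nu A$. Using the condition $\Omega \in \mathcal{C}$, the finite additivity of $\mu$ and $\nu$, and
Lemma 1.14, we see that $\mathcal{D}$ is a $\lambda$-system. Moreover, $\mathcal{C} \subset \mathcal{D}$ by hypothesis.
Hence, Theorem 1.1 yields $\mathcal{D} \supset \sigma(\mathcal{C}) = \mathcal{A}$, which means that $\mu = \nu$. The
converse assertion is obvious. □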
For any measure $\mu$ on a topological space $S$, the support $\operatorname{supp} \mu$ is defined
as the smallest closed set $F \subset S$ with $\mu F^c = 0$. If $|\operatorname{supp} \mu| \le 1$, then $\mu$ is
said to be degenerate, and we note that $\mu = c\delta_s$ for some $s \in S$ and $c \ge 0$.
More generally, a measure $\mu$ is said to have an atom at $s \in S$ if $\{s\} \in \mathcal{S}$
and $\mu\{s\} > 0$. For any locally finite measure $\mu$ on some $\sigma$-compact metric
space $S$, the set $A = \{s \in S;\ \mu\{s\} > 0\}$ is clearly measurable, and we may
define the atomic and diffuse components $\mu_a$ and $\mu_d$ of $\mu$ as the restrictions
of $\mu$ to $A$ and its complement. We further say that $\mu$ is diffuse if $\mu_a = 0$
and purely atomic if $\mu_d = 0$.
In the important special case when $\mu$ is locally finite and integer valued,
the set $A$ above is clearly locally finite and hence closed. By Lemma 1.14
we further have $\operatorname{supp} \mu \subset A$, and so $\mu$ is purely atomic. Hence, in this case
$\mu = \sum_{s \in A} c_s \delta_s$ for some integers $c_s$. In particular, $\mu$ is said to be simple if
$c_s = 1$ for all $s \in A$. Then clearly $\mu$ agrees with the counting measure on
its support $A$.
Any measurable mapping $f$ between two measurable spaces $(S, \mathcal{S})$ and
$(T, \mathcal{T})$ induces a mapping of measures on $S$ into measures on $T$. More
precisely, given any measure $\mu$ on $(S, \mathcal{S})$, we may define a measure $\mu \circ f^{-1}$
on $(T, \mathcal{T})$ by
$$(\mu \circ f^{-1})B = \mu(f^{-1}B) = \mu\{s \in S;\ f(s) \in B\}, \quad B \in \mathcal{T}.$$
Here the countable additivity of $\mu \circ f^{-1}$ follows from that for $\mu$ together
with the fact that $f^{-1}$ preserves unions and intersections.
Our next aim is to define the integral
$$\mu f = \int f\, d\mu = \int f(\omega)\, \mu(d\omega)$$
of a real-valued, measurable function $f$ on some measure space $(\Omega, \mathcal{A}, \mu)$.
First assume that $f$ is simple and nonnegative, hence of the form $c_1 1_{A_1} +
\cdots + c_n 1_{A_n}$ for some $n \in \mathbb{Z}_+$, $A_1, \ldots, A_n \in \mathcal{A}$, and $c_1, \ldots, c_n \in \mathbb{R}_+$, and
define
$$\mu f = c_1 \mu A_1 + \cdots + c_n \mu A_n.$$
(Throughout measure theory we are following the convention $0 \cdot \infty = 0$.)
Using the finite additivity of $\mu$, it is easy to verify that $\mu f$ is independent
of the choice of representation of $f$. It is further clear that the mapping
$f \mapsto \mu f$ is linear and nondecreasing, in the sense that
$$\mu(af + bg) = a\mu f + b\mu g, \quad a, b \ge 0,$$
$$f \le g \ \Rightarrow\ \mu f \le \mu g.$$
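The independence of $\mu f$ from the chosen representation can be checked on a small example. This is a sketch under the assumption of a finite space whose measure is given by point masses; `measure` and `integrate_simple` are hypothetical helper names, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def measure(mu, A):
    """mu is a dict point -> mass on a finite space; return mu(A)."""
    return sum((mu[x] for x in A), Fraction(0))

def integrate_simple(mu, terms):
    """Integral of f = sum_k c_k 1_{A_k}, with terms = [(c_k, A_k), ...]."""
    return sum((c * measure(mu, A) for c, A in terms), Fraction(0))

mu = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

# Two representations of the same simple function f:
# f = 2*1_{1,2} + 3*1_{2,3} = 2*1_{1} + 5*1_{2} + 3*1_{3}
overlapping = [(2, {1, 2}), (3, {2, 3})]
disjoint = [(2, {1}), (5, {2}), (3, {3})]

integral_overlapping = integrate_simple(mu, overlapping)
integral_disjoint = integrate_simple(mu, disjoint)
```

Both representations give $\mu f = 19/6$, in line with the finite additivity argument mentioned above.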
To extend the integral to any nonnegative measurable function $f$, we
may choose as in Lemma 1.11 some simple measurable functions $f_1, f_2, \ldots$
with $0 \le f_n \uparrow f$, and define $\mu f = \lim_n \mu f_n$. The following result shows that
the limit is independent of the choice of approximating sequence $(f_n)$.
Lemma 1.18 (consistency) Fix any measurable function $f \ge 0$ on some
measure space $(\Omega, \mathcal{A}, \mu)$, and let $f_1, f_2, \ldots$ and $g$ be simple measurable
functions satisfying $0 \le f_n \uparrow f$ and $0 \le g \le f$. Then $\lim_n \mu f_n \ge \mu g$.
Proof: By the linearity of $\mu$, it is enough to consider the case when $g = 1_A$
for some $A \in \mathcal{A}$. Then fix any $\varepsilon > 0$, and define
$$A_n = \{\omega \in A;\ f_n(\omega) \ge 1 - \varepsilon\}, \quad n \in \mathbb{N}.$$
Here $A_n \uparrow A$, and so
$$\mu f_n \ge (1 - \varepsilon)\mu A_n \uparrow (1 - \varepsilon)\mu A = (1 - \varepsilon)\mu g.$$
It remains to let $\varepsilon \to 0$. □
The linearity and monotonicity extend immediately to arbitrary $f \ge 0$,
since if $f_n \uparrow f$ and $g_n \uparrow g$, then $af_n + bg_n \uparrow af + bg$, and if also $f \le g$,
then $f_n \le (f_n \vee g_n) \uparrow g$. We are now ready to prove the basic continuity
property of the integral.
Theorem 1.19 (monotone convergence, Levi) Let $f, f_1, f_2, \ldots$ be measurable
functions on $(\Omega, \mathcal{A}, \mu)$ with $0 \le f_n \uparrow f$. Then $\mu f_n \uparrow \mu f$.
Proof: For each $n$ we may choose some simple measurable functions $g_{nk}$,
with $0 \le g_{nk} \uparrow f_n$ as $k \to \infty$. The functions $h_{nk} = g_{1k} \vee \cdots \vee g_{nk}$ have the
same properties and are further nondecreasing in both indices. Hence,
$$f \ge \lim_{k \to \infty} h_{kk} \ge \lim_{k \to \infty} h_{nk} = f_n \uparrow f,$$
and so $0 \le h_{kk} \uparrow f$. Using the definition and monotonicity of the integral,
we obtain
$$\mu f = \lim_{k \to \infty} \mu h_{kk} \le \lim_{k \to \infty} \mu f_k \le \mu f.$$ □
The last result leads to the following key inequality.
Lemma 1.20 (Fatou) For any measurable functions $f_1, f_2, \ldots \ge 0$ on
$(\Omega, \mathcal{A}, \mu)$, we have
$$\liminf_{n \to \infty} \mu f_n \ge \mu \liminf_{n \to \infty} f_n.$$
Proof: Since $f_m \ge \inf_{k \ge n} f_k$ for all $m \ge n$, we have
$$\inf_{k \ge n} \mu f_k \ge \mu \inf_{k \ge n} f_k, \quad n \in \mathbb{N}.$$
Letting $n \to \infty$, we get by Theorem 1.19
$$\liminf_{k \to \infty} \mu f_k \ge \lim_{n \to \infty} \mu \inf_{k \ge n} f_k = \mu \liminf_{k \to \infty} f_k.$$ □
A measurable function $f$ on $(\Omega, \mathcal{A}, \mu)$ is said to be integrable if $\mu|f| <
\infty$. In that case $f$ may be written as the difference of two nonnegative,
integrable functions $g$ and $h$ (e.g., as $f_+ - f_-$, where $f_\pm = (\pm f) \vee 0$), and
we may define $\mu f = \mu g - \mu h$. It is easy to check that the extended integral is
independent of the choice of representation $f = g - h$ and that $\mu f$ satisfies
the basic linearity and monotonicity properties (the former with arbitrary
real coefficients).
We are now ready to state the basic condition that allows us to take
limits under the integral sign. For $g_n \equiv g$ the result reduces to Lebesgue's
dominated convergence theorem, a key result in analysis.
Theorem 1.21 (dominated convergence, Lebesgue) Let $f, f_1, f_2, \ldots$ and
$g, g_1, g_2, \ldots$ be measurable functions on $(\Omega, \mathcal{A}, \mu)$ with $|f_n| \le g_n$ for all $n$,
and such that $f_n \to f$, $g_n \to g$, and $\mu g_n \to \mu g < \infty$. Then $\mu f_n \to \mu f$.
Proof: Applying Fatou's lemma to the functions $g_n \pm f_n \ge 0$, we get
$$\mu g + \liminf_{n \to \infty} (\pm \mu f_n) = \liminf_{n \to \infty} \mu(g_n \pm f_n) \ge \mu(g \pm f) = \mu g \pm \mu f.$$
Subtracting $\mu g < \infty$ from each side gives
$$\mu f \le \liminf_{n \to \infty} \mu f_n \le \limsup_{n \to \infty} \mu f_n \le \mu f.$$ □
The next result shows how integrals are transformed by measurable
mappings.
Lemma 1.22 (substitution) Consider a measure space $(\Omega, \mathcal{A}, \mu)$, a measurable
space $(S, \mathcal{S})$, and two measurable mappings $f\colon \Omega \to S$ and
$g\colon S \to \mathbb{R}$. Then
$$\mu(g \circ f) = (\mu \circ f^{-1})g \tag{5}$$
whenever either side exists. (In other words, if one side exists, then so does
the other and the two are equal.)
Proof: If $g$ is an indicator function, then (5) reduces to the definition of
$\mu \circ f^{-1}$. From here on we may extend by linearity and monotone convergence
to any measurable function $g \ge 0$. For general $g$ it follows that $\mu|g \circ f| =
(\mu \circ f^{-1})|g|$, and so the integrals in (5) exist at the same time. When they
do, we get (5) by taking differences on both sides. □
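For measures with finitely many point masses, both sides of (5) reduce to finite sums, so the substitution rule can be verified directly. The sketch below assumes such a discrete setting; `pushforward` and `integrate` are hypothetical helper names.

```python
from fractions import Fraction

def pushforward(mu, f):
    """Image measure mu o f^{-1} of a finitely supported measure mu."""
    nu = {}
    for s, mass in mu.items():
        nu[f(s)] = nu.get(f(s), Fraction(0)) + mass
    return nu

def integrate(mu, g):
    """Integral of g with respect to the point masses in mu."""
    return sum((g(x) * mass for x, mass in mu.items()), Fraction(0))

# Hypothetical example: uniform masses on {-2,...,2}, f(w) = w^2, g(t) = t + 1.
mu = {w: Fraction(1, 5) for w in range(-2, 3)}
f = lambda w: w * w
g = lambda t: t + 1

lhs = integrate(mu, lambda w: g(f(w)))   # mu(g o f)
rhs = integrate(pushforward(mu, f), g)   # (mu o f^{-1}) g
```

Here the image measure puts mass $2/5$ on each of the points 1 and 4 and mass $1/5$ on 0, and both sides of (5) equal 3.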
Turning to the other basic transformation of measures and integrals, fix
any measurable function $f \ge 0$ on some measure space $(\Omega, \mathcal{A}, \mu)$, and define
a function $f \cdot \mu$ on $\mathcal{A}$ by
$$(f \cdot \mu)A = \mu(1_A f) = \int_A f\, d\mu, \quad A \in \mathcal{A},$$
where the last relation defines the integral over a set $A$. It is easy to check
that $\nu = f \cdot \mu$ is again a measure on $(\Omega, \mathcal{A})$. Here $f$ is referred to as the
$\mu$-density of $\nu$. The corresponding transformation rule is as follows.
Lemma 1.23 (chain rule) For any measure space $(\Omega, \mathcal{A}, \mu)$ and measurable
functions $f\colon \Omega \to \mathbb{R}_+$ and $g\colon \Omega \to \mathbb{R}$, we have
$$\mu(fg) = (f \cdot \mu)g$$
whenever either side exists.
Proof: As in the last proof, we may begin with the case when $g$ is an
indicator function and then extend in steps to the general case. □
Given a measure space $(\Omega, \mathcal{A}, \mu)$, a set $A \in \mathcal{A}$ is said to be $\mu$-null or
simply null if $\mu A = 0$. A relation between functions on $\Omega$ is said to hold
almost everywhere with respect to $\mu$ (abbreviated as a.e. $\mu$ or $\mu$-a.e.) if it
holds for all $\omega \in \Omega$ outside some $\mu$-null set. The following frequently used
result explains the relevance of null sets.
Lemma 1.24 (null sets and functions) For any measurable function $f \ge
0$ on some measure space $(\Omega, \mathcal{A}, \mu)$, we have $\mu f = 0$ iff $f = 0$ a.e. $\mu$.
Proof: The statement is obvious when $f$ is simple. In the general case,
we may choose some simple measurable functions $f_n$ with $0 \le f_n \uparrow f$, and
note that $f = 0$ a.e. iff $f_n = 0$ a.e. for every $n$, that is, iff $\mu f_n = 0$ for
all $n$. Here the latter integrals converge to $\mu f$, and so the last condition is
equivalent to $\mu f = 0$. □
The last result shows that two integrals agree when the integrands are
a.e. equal. We may then allow integrands that are undefined on some $\mu$-null
set. It is also clear that the conclusions of Theorems 1.19 and 1.21 remain
valid if the hypotheses are only fulfilled outside some null set.
In the other direction, we note that if two $\sigma$-finite measures $\mu$ and $\nu$ are
related by $\nu = f \cdot \mu$ for some density $f$, then the latter is $\mu$-a.e. unique,
which justifies the notation $f = d\nu/d\mu$. It is further clear that any $\mu$-null
set is also a null set for $\nu$. For measures $\mu$ and $\nu$ with the latter property,
we say that $\nu$ is absolutely continuous with respect to $\mu$ and write $\nu \ll \mu$.
The other extreme case is when $\mu$ and $\nu$ are mutually singular or orthogonal
(written as $\mu \perp \nu$), in the sense that $\mu A = 0$ and $\nu A^c = 0$ for some set
$A \in \mathcal{A}$.
Given a measure space $(\Omega, \mathcal{A}, \mu)$ and a $\sigma$-field $\mathcal{F} \subset \mathcal{A}$, we define the
$\mu$-completion of $\mathcal{F}$ in $\mathcal{A}$ as the $\sigma$-field $\mathcal{F}^\mu = \sigma(\mathcal{F}, \mathcal{N}_\mu)$, where $\mathcal{N}_\mu$ denotes
the class of all subsets of arbitrary $\mu$-null sets in $\mathcal{A}$. The description of $\mathcal{F}^\mu$
can be made more explicit, as follows.
Lemma 1.25 (completion) Consider a measure space $(\Omega, \mathcal{A}, \mu)$, a $\sigma$-field
$\mathcal{F} \subset \mathcal{A}$, and a Borel space $(S, \mathcal{S})$. Then a function $f\colon \Omega \to S$ is
$\mathcal{F}^\mu$-measurable iff there exists some $\mathcal{F}$-measurable function $g$ satisfying $f = g$
a.e. $\mu$.
Proof: Beginning with indicator functions, let $\mathcal{G}$ be the class of subsets
$A \subset \Omega$ such that $A \,\Delta\, B \in \mathcal{N}_\mu$ for some $B \in \mathcal{F}$. Then $A \setminus B$ and $B \setminus A$ are
again in $\mathcal{N}_\mu$, which implies $\mathcal{G} \subset \mathcal{F}^\mu$. Conversely, $\mathcal{F}^\mu \subset \mathcal{G}$ since both $\mathcal{F}$ and
$\mathcal{N}_\mu$ are trivially contained in $\mathcal{G}$. Combining the two relations gives $\mathcal{G} = \mathcal{F}^\mu$,
which shows that $A \in \mathcal{F}^\mu$ iff $1_A = 1_B$ a.e. for some $B \in \mathcal{F}$.
In the general case, we may clearly assume that $S = [0, 1]$. For any
$\mathcal{F}^\mu$-measurable function $f$, we may then choose some simple $\mathcal{F}^\mu$-measurable
functions $f_n$ such that $0 \le f_n \uparrow f$. By the result for indicator functions, we
may next choose some simple $\mathcal{F}$-measurable functions $g_n$ such that $f_n = g_n$
a.e. for each $n$. Since a countable union of null sets is again a null set, the
function $g = \limsup_n g_n$ has the desired property. □
Any measure $\mu$ on $(\Omega, \mathcal{A})$ has a unique extension to the $\sigma$-field $\mathcal{A}^\mu$. Indeed,
for any $A \in \mathcal{A}^\mu$ there exist by Lemma 1.25 some sets $A_\pm \in \mathcal{A}$ with
$A_- \subset A \subset A_+$ and $\mu(A_+ \setminus A_-) = 0$, and any extension must satisfy
$\mu A = \mu A_\pm$. With this choice, it is easy to check that $\mu$ remains a measure
on $\mathcal{A}^\mu$.
Our next aims are to construct product measures and to establish the
basic condition for changing the order of integration. This requires a
preliminary technical lemma.
Lemma 1.26 (sections) Fix two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a
measurable function $f\colon S \times T \to \mathbb{R}_+$, and a $\sigma$-finite measure $\mu$ on $S$. Then
(i) $f(s, t)$ is $\mathcal{S}$-measurable in $s \in S$ for each $t \in T$;
(ii) $\int f(s, t)\, \mu(ds)$ is $\mathcal{T}$-measurable in $t \in T$.
Proof: We may assume that $\mu$ is bounded. Both statements are obvious
when $f = 1_A$ with $A = B \times C$ for some $B \in \mathcal{S}$ and $C \in \mathcal{T}$, and they extend
by a monotone class argument to any indicator functions of sets in $\mathcal{S} \otimes \mathcal{T}$.
The general case follows by linearity and monotone convergence. □
We are now ready to state the main result involving product measures,
commonly referred to as Fubini's theorem.
Theorem 1.27 (product measures and iterated integrals, Lebesgue, Fubini,
Tonelli) For any $\sigma$-finite measure spaces $(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$, there
exists a unique measure $\mu \otimes \nu$ on $(S \times T, \mathcal{S} \otimes \mathcal{T})$ satisfying
$$(\mu \otimes \nu)(B \times C) = \mu B \cdot \nu C, \quad B \in \mathcal{S},\ C \in \mathcal{T}. \tag{6}$$
Furthermore, for any measurable function $f\colon S \times T \to \mathbb{R}_+$,
$$(\mu \otimes \nu)f = \int \mu(ds) \int f(s, t)\, \nu(dt) = \int \nu(dt) \int f(s, t)\, \mu(ds). \tag{7}$$
The last relation remains valid for any measurable function $f\colon S \times T \to \mathbb{R}$
with $(\mu \otimes \nu)|f| < \infty$.
Note that the iterated integrals in (7) are well defined by Lemma 1.26,
although the inner integrals $\nu f(s, \cdot)$ and $\mu f(\cdot, t)$ may fail to exist on some
null sets in $S$ and $T$, respectively.
Proof: By Lemma 1.26 we may define
$$(\mu \otimes \nu)A = \int \mu(ds) \int 1_A(s, t)\, \nu(dt), \quad A \in \mathcal{S} \otimes \mathcal{T}, \tag{8}$$
which is clearly a measure on $S \times T$ satisfying (6). By a monotone class
argument there can be at most one such measure. In particular, (8) remains
true with the order of integration reversed, which proves (7) for indicator
functions $f$. The formula extends by linearity and monotone convergence
to arbitrary measurable functions $f \ge 0$.
In the general case, we note that (7) holds with $f$ replaced by $|f|$. If
$(\mu \otimes \nu)|f| < \infty$, it follows that $N_S = \{s \in S;\ \nu|f(s, \cdot)| = \infty\}$ is a $\mu$-null set
in $S$ whereas $N_T = \{t \in T;\ \mu|f(\cdot, t)| = \infty\}$ is a $\nu$-null set in $T$. By Lemma
1.24 we may redefine $f(s, t)$ to be zero when $s \in N_S$ or $t \in N_T$. Then (7)
follows for $f$ by subtraction of the formulas for $f_+$ and $f_-$. □
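On finite spaces, relation (7) is just an interchange of finite sums, which the following sketch checks with exact arithmetic. The discrete setting and all names (`product_measure`, `integrate_product`, and the sample data) are assumptions made for illustration.

```python
from fractions import Fraction

def product_measure(mu, nu):
    """Point masses of mu (x) nu on the finite product space, as in (6)."""
    return {(s, t): ms * nt for s, ms in mu.items() for t, nt in nu.items()}

def integrate_product(mu, nu, f):
    """Integral of f with respect to the product measure."""
    return sum(f(s, t) * m for (s, t), m in product_measure(mu, nu).items())

def integrate_s_outer(mu, nu, f):
    """Inner integral over t with respect to nu, outer over s with respect to mu."""
    return sum(ms * sum(f(s, t) * nt for t, nt in nu.items()) for s, ms in mu.items())

def integrate_t_outer(mu, nu, f):
    """Inner integral over s with respect to mu, outer over t with respect to nu."""
    return sum(nt * sum(f(s, t) * ms for s, ms in mu.items()) for t, nt in nu.items())

# Hypothetical finite measures (not necessarily probability measures).
mu = {0: Fraction(1, 2), 1: Fraction(3, 2)}
nu = {"a": Fraction(1, 3), "b": Fraction(2)}
f = lambda s, t: (s + 1) * (2 if t == "a" else 5) + s

values = (
    integrate_product(mu, nu, f),
    integrate_s_outer(mu, nu, f),
    integrate_t_outer(mu, nu, f),
)
```

All three values coincide (here they equal $245/6$). The $\sigma$-finiteness and integrability conditions of the theorem are what make the same interchange legitimate for infinite spaces.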
The measure $\mu \otimes \nu$ in Theorem 1.27 is called the product measure of $\mu$
and $\nu$. Iterating the construction in finitely many steps, we obtain product
measures $\mu_1 \otimes \cdots \otimes \mu_n = \bigotimes_k \mu_k$ satisfying higher-dimensional versions of
(7). If $\mu_k = \mu$ for all $k$, we often write the product as $\mu^{\otimes n}$ or $\mu^n$.
By a measurable group we mean a group $G$ endowed with a $\sigma$-field $\mathcal{G}$
such that the group operations in $G$ are $\mathcal{G}$-measurable. If $\mu_1, \ldots, \mu_n$ are
$\sigma$-finite measures on $G$, we may define the convolution $\mu_1 * \cdots * \mu_n$ as the
image of the product measure $\mu_1 \otimes \cdots \otimes \mu_n$ on $G^n$ under the iterated group
operation $(x_1, \ldots, x_n) \mapsto x_1 \cdots x_n$. The convolution is said to be associative
if $(\mu_1 * \mu_2) * \mu_3 = \mu_1 * (\mu_2 * \mu_3)$ whenever both $\mu_1 * \mu_2$ and $\mu_2 * \mu_3$ are
$\sigma$-finite and commutative if $\mu_1 * \mu_2 = \mu_2 * \mu_1$.
A measure $\mu$ on $G$ is said to be right or left invariant if $\mu \circ T_g^{-1} = \mu$ for
all $g \in G$, where $T_g$ denotes the right or left shift $x \mapsto xg$ or $x \mapsto gx$. When
$G$ is Abelian, the shift is called a translation. We may also consider spaces
of the form $G \times S$, in which case translations are defined to be mappings
of the form $T_g\colon (x, s) \mapsto (x + g, s)$.
Lemma 1.28 (convolution) The convolution of $\sigma$-finite measures on a
measurable group $(G, \mathcal{G})$ is associative, and for Abelian $G$ it is also
commutative. In the latter case,
$$(\mu * \nu)B = \int \mu(B - s)\, \nu(ds) = \int \nu(B - s)\, \mu(ds), \quad B \in \mathcal{G}.$$
If $\mu = f \cdot \lambda$ and $\nu = g \cdot \lambda$ for some invariant measure $\lambda$, then $\mu * \nu$ has the
$\lambda$-density
$$(f * g)(s) = \int f(s - t)\, g(t)\, \lambda(dt) = \int f(t)\, g(s - t)\, \lambda(dt), \quad s \in G.$$
Proof: Use Fubini's theorem. □
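As a concrete instance, one may take $G$ to be the cyclic group $\mathbb{Z}_n$ under addition modulo $n$, where every measure is a finite collection of point masses. The sketch below makes that assumption; `convolve`, `measure`, and the sample measures are hypothetical names introduced for the example.

```python
from fractions import Fraction

def convolve(mu, nu, n):
    """Convolution on Z_n: image of mu (x) nu under (x, y) -> (x + y) mod n."""
    out = {}
    for x, mx in mu.items():
        for y, ny in nu.items():
            z = (x + y) % n
            out[z] = out.get(z, Fraction(0)) + mx * ny
    return out

def measure(mu, B):
    """mu(B) for a measure given by point masses."""
    return sum((mu.get(x, Fraction(0)) for x in B), Fraction(0))

n = 5
mu = {0: Fraction(1, 2), 1: Fraction(1, 2)}
nu = {1: Fraction(1, 3), 3: Fraction(2, 3)}
lam = {2: Fraction(1)}
B = {0, 1}

commutative = convolve(mu, nu, n) == convolve(nu, mu, n)
associative = (convolve(convolve(mu, nu, n), lam, n)
               == convolve(mu, convolve(nu, lam, n), n))

# (mu * nu)B = sum over s of mu(B - s) nu{s}, with B - s = {(b - s) mod n}
lhs = measure(convolve(mu, nu, n), B)
rhs = sum(measure(mu, {(b - s) % n for b in B}) * ns for s, ns in nu.items())
```

In this example both sides of the displayed formula equal $1/6$, since only the pair $(0, 1)$ lands in $B$.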
Given a measure space (!1, A, J-t) and a p > 0, we write LP = LP(!1, A, /-1)
for the class of all measurable functions f: !1 -+ IR with
IIfli p - (J-tlfI P )l/ p < 00.
Lemma 1.29 (Holder and Minkowski inequalities) For any measurable
functions f andg on some measure space (!1,A,jL), we have
(i) III gilT < IIlllpl/gl/q for all p, q, r > 0 with p-l + q-J == r- l ,
(ii) IIf + gll1\1 < IIfll1\1 + IIglll\l for all p > o.
Proof: (i) It is clearly enough to take $r = 1$ and $\|f\|_p = \|g\|_q = 1$. The relation $p^{-1} + q^{-1} = 1$ implies $(p - 1)(q - 1) = 1$, and so the equations $y = x^{p-1}$ and $x = y^{q-1}$ are equivalent for $x, y \ge 0$. By calculus,
$$|fg| \le \int_0^{|f|} x^{p-1}\,dx + \int_0^{|g|} y^{q-1}\,dy = p^{-1}|f|^p + q^{-1}|g|^q,$$
and so
$$\|fg\|_1 \le p^{-1} \int |f|^p\,d\mu + q^{-1} \int |g|^q\,d\mu = p^{-1} + q^{-1} = 1.$$
(ii) The relation holds for $p \le 1$ by the concavity of $x^p$ on $\mathbb{R}_+$. For $p > 1$, we get by (i) with $q = p/(p - 1)$ and $r = 1$
$$\|f + g\|_p^p \le \int |f|\,|f + g|^{p-1}\,d\mu + \int |g|\,|f + g|^{p-1}\,d\mu \le \|f\|_p \|f + g\|_p^{p-1} + \|g\|_p \|f + g\|_p^{p-1},$$
and it remains to divide by $\|f + g\|_p^{p-1}$. □
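The two inequalities of Lemma 1.29 can be checked numerically on a finite measure space, where $\mu|f|^p$ is just a weighted sum. The sketch below is only a sanity check under assumed random data; the names `norm`, `holder_gap`, and `minkowski_gap` are hypothetical helpers introduced for the illustration.

```python
import random

def norm(f, weights, p):
    """(mu |f|^p)^(1/p) on a finite measure space with the given point weights."""
    return sum(w * abs(x) ** p for x, w in zip(f, weights)) ** (1.0 / p)

random.seed(0)
# Assumed data: 50 points with positive weights and two arbitrary functions.
weights = [random.uniform(0.1, 2.0) for _ in range(50)]
f = [random.uniform(-3, 3) for _ in range(50)]
g = [random.uniform(-3, 3) for _ in range(50)]

def holder_gap(p, q):
    """||f||_p ||g||_q - ||fg||_r with 1/r = 1/p + 1/q; should be nonnegative."""
    r = 1.0 / (1.0 / p + 1.0 / q)
    fg = [a * b for a, b in zip(f, g)]
    return norm(f, weights, p) * norm(g, weights, q) - norm(fg, weights, r)

def minkowski_gap(p):
    """||f||^e + ||g||^e - ||f+g||^e with e = min(p, 1); should be nonnegative."""
    e = min(p, 1.0)
    fplusg = [a + b for a, b in zip(f, g)]
    return (norm(f, weights, p) ** e + norm(g, weights, p) ** e
            - norm(fplusg, weights, p) ** e)
```

Calling `holder_gap(3, 1.5)` or `minkowski_gap(0.5)` returns the slack in the corresponding inequality; the exponent `min(p, 1)` reflects the form $p \wedge 1$ used in part (ii).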
The inequality in (ii) is often needed in the following extended form.
Corollary 1.30 (extended Minkowski inequality) Let $\mu$, $\nu$, and $f$ be such as in Theorem 1.27, and assume that $\mu f(t) = \int f(s, t)\,\mu(ds)$ exists for $t \in T$ a.e. $\nu$. Write $\|f\|_p(s) = (\nu|f(s, \cdot)|^p)^{1/p}$. Then
$$\|\mu f\|_p \le \mu\|f\|_p, \qquad p \ge 1.$$
Proof: Since $|\mu f| \le \mu|f|$, we may assume that $f \ge 0$, and we may also assume that $\|\mu f\|_p \in (0, \infty)$. For $p > 1$, we get by Fubini's theorem and Hölder's inequality
$$\|\mu f\|_p^p = \nu(\mu f)^p = \nu(\mu f\,(\mu f)^{p-1}) = \mu\nu(f(\mu f)^{p-1}) \le \mu\|f\|_p\,\|(\mu f)^{p-1}\|_q = \mu\|f\|_p\,\|\mu f\|_p^{p-1},$$
and it remains to divide by $\|\mu f\|_p^{p-1}$. The proof for $p = 1$ is similar but simpler. □
In particular, Lemma 1.29 shows that $\|\cdot\|_p$ becomes a norm for $p \ge 1$ if we identify functions that agree a.e. For any $p > 0$ and $f, f_1, f_2, \ldots \in L^p$, we write $f_n \to f$ in $L^p$ if $\|f_n - f\|_p \to 0$ and say that $(f_n)$ is Cauchy in $L^p$ if $\|f_m - f_n\|_p \to 0$ as $m, n \to \infty$.
Lemma 1.31 (completeness) Let $(f_n)$ be a Cauchy sequence in $L^p$, where $p > 0$. Then $\|f_n - f\|_p \to 0$ for some $f \in L^p$.
Proof: Choose a subsequence $(n_k) \subset \mathbb{N}$ with $\sum_k \|f_{n_{k+1}} - f_{n_k}\|_p^{p \wedge 1} < \infty$. By Lemma 1.29 and monotone convergence we get $\|\sum_k |f_{n_{k+1}} - f_{n_k}|\|_p^{p \wedge 1} < \infty$, and so $\sum_k |f_{n_{k+1}} - f_{n_k}| < \infty$ a.e. Hence, $(f_{n_k})$ is a.e. Cauchy in $\mathbb{R}$, and so Lemma 1.10 yields $f_{n_k} \to f$ a.e. for some measurable function $f$. By Fatou's lemma,
$$\|f - f_n\|_p \le \liminf_{k \to \infty} \|f_{n_k} - f_n\|_p \le \sup_{m \ge n} \|f_m - f_n\|_p \to 0, \qquad n \to \infty,$$
which shows that $f_n \to f$ in $L^p$. □
The next result gives a useful criterion for convergence in $L^p$.
Lemma 1.32 ($L^p$-convergence) For any $p > 0$, let $f, f_1, f_2, \ldots \in L^p$ with $f_n \to f$ a.e. Then $f_n \to f$ in $L^p$ iff $\|f_n\|_p \to \|f\|_p$.
Proof: If $f_n \to f$ in $L^p$, we get by Lemma 1.29
$$\left| \|f_n\|_p^{p \wedge 1} - \|f\|_p^{p \wedge 1} \right| \le \|f_n - f\|_p^{p \wedge 1} \to 0,$$
and so $\|f_n\|_p \to \|f\|_p$. Now assume instead the latter condition, and define
$$g_n = 2^p(|f_n|^p + |f|^p), \qquad g = 2^{p+1}|f|^p.$$
Then $g_n \to g$ a.e. and $\mu g_n \to \mu g < \infty$ by hypotheses. Since also $|g_n| \ge |f_n - f|^p \to 0$ a.e., Theorem 1.21 yields $\|f_n - f\|_p^p = \mu|f_n - f|^p \to 0$. □
Taking $p = q = 2$ and $r = 1$ in Lemma 1.29 (i), we get the Cauchy-Buniakovsky or Schwarz inequality
$$\|fg\|_1 \le \|f\|_2 \|g\|_2.$$
In particular, we note that, for any $f, g \in L^2$, the inner product $\langle f, g \rangle = \mu(fg)$ exists and satisfies $|\langle f, g \rangle| \le \|f\|_2 \|g\|_2$. From the obvious bilinearity of the inner product, we get the parallelogram identity
$$\|f + g\|^2 + \|f - g\|^2 = 2\|f\|^2 + 2\|g\|^2, \qquad f, g \in L^2. \tag{9}$$
Two functions $f, g \in L^2$ are said to be orthogonal (written as $f \perp g$) if $\langle f, g \rangle = 0$. Orthogonality between two subsets $A, B \subset L^2$ means that $f \perp g$ for all $f \in A$ and $g \in B$. A subspace $M \subset L^2$ is said to be linear if $af + bg \in M$ for any $f, g \in M$ and $a, b \in \mathbb{R}$, and closed if $f \in M$ whenever $f$ is the $L^2$-limit of a sequence in $M$.
Theorem 1.33 (orthogonal projection) Let $M$ be a closed linear subspace of $L^2$. Then any function $f \in L^2$ has an a.e. unique decomposition $f = g + h$ with $g \in M$ and $h \perp M$.
Proof: Fix any $f \in L^2$, and define $d = \inf\{\|f - g\|;\ g \in M\}$. Choose $g_1, g_2, \ldots \in M$ with $\|f - g_n\| \to d$. Using the linearity of $M$, the definition of $d$, and (9), we get as $m, n \to \infty$,
$$4d^2 + \|g_m - g_n\|^2 \le \|2f - g_m - g_n\|^2 + \|g_m - g_n\|^2 = 2\|f - g_m\|^2 + 2\|f - g_n\|^2 \to 4d^2.$$
Thus, $\|g_m - g_n\| \to 0$, and so the sequence $(g_n)$ is Cauchy in $L^2$. By Lemma 1.31 it converges toward some $g \in L^2$, and since $M$ is closed we have $g \in M$. Noting that $h = f - g$ has norm $d$, we get for any $l \in M$,
$$d^2 \le \|h + tl\|^2 = d^2 + 2t\langle h, l \rangle + t^2\|l\|^2, \qquad t \in \mathbb{R},$$
which implies $\langle h, l \rangle = 0$. Hence, $h \perp M$, as required.
To prove the uniqueness, let $g' + h'$ be another decomposition with the stated properties. Then $g - g' \in M$ and also $g - g' = h' - h \perp M$, so $g - g' \perp g - g'$, which implies $\|g - g'\|^2 = \langle g - g', g - g' \rangle = 0$, and hence $g = g'$ a.e. □
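In the finite-dimensional case, with counting measure on four points so that $L^2 = \mathbb{R}^4$, the decomposition $f = g + h$ of Theorem 1.33 can be computed explicitly. The sketch below uses Gram-Schmidt orthogonalization, which is a standard computational substitute for the minimizing-sequence argument of the proof rather than a transcription of it; the vectors and the helper names `inner` and `project` are assumptions for illustration.

```python
def inner(f, g):
    """Inner product <f, g> with respect to counting measure."""
    return sum(a * b for a, b in zip(f, g))

def project(f, basis):
    """Project f onto the linear span of `basis` via Gram-Schmidt (illustrative)."""
    ortho = []
    for b in basis:
        v = list(b)
        for e in ortho:
            c = inner(v, e) / inner(e, e)
            v = [x - c * y for x, y in zip(v, e)]
        if inner(v, v) > 1e-12:          # skip vectors already in the span
            ortho.append(v)
    g = [0.0] * len(f)
    for e in ortho:
        c = inner(f, e) / inner(e, e)
        g = [x + c * y for x, y in zip(g, e)]
    return g

# Hypothetical example: M is spanned by two vectors in R^4.
f = [1.0, 2.0, 3.0, 4.0]
basis = [[1.0, 1.0, 1.0, 1.0], [1.0, -1.0, 1.0, -1.0]]
g = project(f, basis)                     # component in M
h = [x - y for x, y in zip(f, g)]         # component orthogonal to M
```

Here `g` is the element of $M$ closest to `f`, and `h` is orthogonal to every vector in `basis`, in line with the statement of the theorem.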
We proceed with a basic approximation property of sets.
Lemma 1.34 (regularity) Let $\mu$ be a bounded measure on some metric space $S$ with Borel $\sigma$-field $\mathcal{S}$. Then
$$\mu B = \sup_{F \subset B} \mu F = \inf_{G \supset B} \mu G, \qquad B \in \mathcal{S},$$
with $F$ and $G$ restricted to the classes of closed and open subsets of $S$, respectively.
Proof: For any open set $G$ there exist some closed sets $F_n \uparrow G$, and by Lemma 1.14 we get $\mu F_n \uparrow \mu G$. This proves the statement for $B$ belonging to the $\pi$-system $\mathcal{G}$ of all open sets. Letting $\mathcal{D}$ denote the class of all sets $B$ with the stated property, we further note that $\mathcal{D}$ is a $\lambda$-system. Hence, Theorem 1.1 shows that $\mathcal{D} \supset \sigma(\mathcal{G}) = \mathcal{S}$. □
The last result leads to a basic approximation property for functions.
Lemma 1.35 (approximation) Given a metric space $S$ with Borel $\sigma$-field $\mathcal{S}$, a bounded measure $\mu$ on $(S, \mathcal{S})$, and a constant $p > 0$, the set of bounded, continuous functions on $S$ is dense in $L^p(S, \mathcal{S}, \mu)$. Thus, for any $f \in L^p$ there exist some bounded, continuous functions $f_1, f_2, \ldots\colon S \to \mathbb{R}$ with $\|f_n - f\|_p \to 0$.
Proof: If $f = 1_A$ with $A \subset S$ open, we may choose some continuous functions $f_n$ with $0 \le f_n \uparrow f$, and then $\|f_n - f\|_p \to 0$ by dominated convergence. By Lemma 1.34 the result remains true for arbitrary $A \in \mathcal{S}$. The further extension to simple measurable functions is immediate. For general $f \in L^p$ we may choose some simple measurable functions $f_n \to f$ with $|f_n| \le |f|$. Since $|f_n - f|^p \le 2^{p+1}|f|^p$, we get $\|f_n - f\|_p \to 0$ by dominated convergence. □
The next result shows how the pointwise convergence of a sequence of measurable functions is almost uniform.
Lemma 1.36 (near uniformity, Egorov) Let $f, f_1, f_2, \ldots$ be measurable functions on some finite measure space $(\Omega, \mathcal{A}, \mu)$ such that $f_n \to f$ on $\Omega$. Then for any $\varepsilon > 0$ there exists some $A \in \mathcal{A}$ with $\mu A^c < \varepsilon$ such that $f_n \to f$ uniformly on $A$.
Proof: Define
$$A_{m,n} = \bigcap_{k \ge n} \{x \in \Omega;\ |f_k(x) - f(x)| < m^{-1}\}, \qquad m, n \in \mathbb{N}.$$
As $n \to \infty$ for fixed $m$, we have $A_{m,n} \uparrow \Omega$ and hence $\mu A_{m,n}^c \to 0$. Given any $\varepsilon > 0$, we may then choose $n_1, n_2, \ldots \in \mathbb{N}$ so large that $\mu A_{m,n_m}^c < \varepsilon 2^{-m}$ for all $m$. Letting $A = \bigcap_m A_{m,n_m}$, we get
$$\mu A^c \le \mu \bigcup_m A_{m,n_m}^c < \varepsilon \sum_m 2^{-m} = \varepsilon,$$
and we note that $f_n \to f$ uniformly on $A$. □
The last two results may be combined to show that every measurable function is almost continuous.
Lemma 1.37 (near continuity, Lusin) Let $f$ be a measurable function on some compact metric space $S$ with Borel $\sigma$-field $\mathcal{S}$ and a bounded measure $\mu$. Then there exist some continuous functions $f_1, f_2, \ldots$ on $S$ such that $\mu\{x;\ f_n(x) \ne f(x)\} \to 0$.
Proof: We may clearly assume that $f$ is bounded. By Lemma 1.35 we may choose some continuous functions $g_1, g_2, \ldots$ on $S$ such that $\mu|g_k - f| \le 2^{-k}$. By Fubini's theorem, we get
$$\mu \sum_k |g_k - f| = \sum_k \mu|g_k - f| \le \sum_k 2^{-k} = 1,$$
and so $\sum_k |g_k - f| < \infty$ a.e., which implies $g_k \to f$ a.e. By Lemma 1.36, we may next choose $A_1, A_2, \ldots \in \mathcal{S}$ with $\mu A_n^c \to 0$ such that the convergence is uniform on each $A_n$. Since each $g_k$ is uniformly continuous on $S$, we conclude that $f$ is uniformly continuous on each $A_n$. By Tietze's extension theorem, the restriction $f|_{A_n}$ then admits a continuous extension $f_n$ to $S$. □
For any measurable space $(S, \mathcal{S})$, we may introduce the class $\mathcal{M}(S)$ of $\sigma$-finite measures on $S$. The set $\mathcal{M}(S)$ becomes a measurable space in its own right when endowed with the $\sigma$-field induced by the mappings $\pi_B\colon \mu \mapsto \mu B$, $B \in \mathcal{S}$. Note in particular that the class $\mathcal{P}(S)$ of probability measures on $S$ is a measurable subset of $\mathcal{M}(S)$. In the next two lemmas we state some less obvious measurability properties, which will be needed in subsequent chapters.
Lemma 1.38 (measurability of products) For any measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, the mapping $(\mu, \nu) \mapsto \mu \otimes \nu$ is measurable from $\mathcal{P}(S) \times \mathcal{P}(T)$ to $\mathcal{P}(S \times T)$.
Proof: Note that $(\mu \otimes \nu)A$ is measurable whenever $A = B \times C$ with $B \in \mathcal{S}$ and $C \in \mathcal{T}$, and extend by a monotone class argument. □
In the context of separable metric spaces $S$, we assume the measures $\mu \in \mathcal{M}(S)$ to be locally finite, in the sense that $\mu B < \infty$ for any bounded Borel set $B$.
Lemma 1.39 (diffuse and atomic parts) For any separable metric space $S$,
(i) the set $D \subset \mathcal{M}(S)$ of degenerate measures on $S$ is measurable;
(ii) the diffuse and purely atomic components $\mu_d$ and $\mu_a$ are measurable functions of $\mu \in \mathcal{M}(S)$.
Proof: (i) Choose a countable topological base $B_1, B_2, \ldots$ in $S$, and define $J = \{(i, j);\ B_i \cap B_j = \emptyset\}$. Then, clearly,
$$D = \Big\{ \mu \in \mathcal{M}(S);\ \sum_{(i,j) \in J} (\mu B_i)(\mu B_j) = 0 \Big\}.$$
(ii) Choose a nested sequence of countable partitions $\mathcal{B}_n$ of $S$ into Borel sets of diameter less than $n^{-1}$. For any $\varepsilon > 0$ and $n \in \mathbb{N}$ we introduce the sets $U_n^\varepsilon = \bigcup\{B \in \mathcal{B}_n;\ \mu B \ge \varepsilon\}$, $U^\varepsilon = \{s \in S;\ \mu\{s\} \ge \varepsilon\}$, and $U = \{s \in S;\ \mu\{s\} > 0\}$. It is easily seen that $U_n^\varepsilon \downarrow U^\varepsilon$ as $n \to \infty$ and $U^\varepsilon \uparrow U$ as $\varepsilon \to 0$. By dominated convergence, the restrictions $\mu_n^\varepsilon = \mu(U_n^\varepsilon \cap \cdot)$ and $\mu^\varepsilon = \mu(U^\varepsilon \cap \cdot)$ satisfy locally $\mu_n^\varepsilon \downarrow \mu^\varepsilon$ and $\mu^\varepsilon \uparrow \mu_a$. Since $\mu_n^\varepsilon$ is clearly a measurable function of $\mu$, the asserted measurability of $\mu_a$ and $\mu_d$ now follows by Lemma 1.10. □
Given two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a mapping $\mu\colon S \times \mathcal{T} \to \overline{\mathbb{R}}_+$ is called a (probability) kernel from $S$ to $T$ if the function $\mu_s B = \mu(s, B)$ is $\mathcal{S}$-measurable in $s \in S$ for fixed $B \in \mathcal{T}$ and a (probability) measure in $B \in \mathcal{T}$ for fixed $s \in S$. Any kernel $\mu$ determines an associated operator that maps suitable functions $f\colon T \to \mathbb{R}$ into their integrals $\mu f(s) = \int \mu(s, dt) f(t)$. Kernels play an important role in probability theory, where they may appear in the guises of random measures, conditional distributions, Markov transition functions, and potentials.
The following characterizations of the kernel property are often useful. For simplicity we restrict our attention to probability kernels.
Lemma 1.40 (kernels) Fix two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a $\pi$-system $\mathcal{C}$ with $\sigma(\mathcal{C}) = \mathcal{T}$, and a family $\mu = \{\mu_s;\ s \in S\}$ of probability measures on $T$. Then these conditions are equivalent:
(i) $\mu$ is a probability kernel from $S$ to $T$;
(ii) $\mu$ is a measurable mapping from $S$ to $\mathcal{P}(T)$;
(iii) $s \mapsto \mu_s B$ is a measurable mapping from $S$ to $[0, 1]$ for every $B \in \mathcal{C}$.
Proof: Since $\pi_B\colon \mu \mapsto \mu B$ is measurable on $\mathcal{P}(T)$ for every $B \in \mathcal{T}$, condition (ii) implies (iii) by Lemma 1.7. Furthermore, (iii) implies (i) by a straightforward application of Theorem 1.1. Finally, under (i) we have $\mu^{-1} \pi_B^{-1}[0, x] \in \mathcal{S}$ for all $B \in \mathcal{T}$ and $x \ge 0$, and (ii) follows by Lemma 1.4. □
Let us now introduce a third measurable space $(U, \mathcal{U})$, and consider two kernels $\mu$ and $\nu$, one from $S$ to $T$ and the other from $S \times T$ to $U$. Imitating the construction of product measures, we may attempt to combine $\mu$ and $\nu$ into a kernel $\mu \otimes \nu$ from $S$ to $T \times U$ given by
$$(\mu \otimes \nu)(s, B) = \int \mu(s, dt) \int \nu(s, t, du)\,1_B(t, u), \qquad B \in \mathcal{T} \otimes \mathcal{U}.$$
The following lemma justifies the formula and provides some further useful information.
Lemma 1.41 (kernels and functions) Fix three measurable spaces $(S, \mathcal{S})$, $(T, \mathcal{T})$, and $(U, \mathcal{U})$. Let $\mu$ and $\nu$ be probability kernels from $S$ to $T$ and from $S \times T$ to $U$, respectively, and consider two measurable functions $f\colon S \times T \to \mathbb{R}_+$ and $g\colon S \times T \to U$. Then
(i) $\mu_s f(s, \cdot)$ is a measurable function of $s \in S$;
(ii) $\mu_s \circ (g(s, \cdot))^{-1}$ is a kernel from $S$ to $U$;
(iii) $\mu \otimes \nu$ is a kernel from $S$ to $T \times U$.
Proof: Assertion (i) is obvious when $f$ is the indicator function of a set $A = B \times C$ with $B \in \mathcal{S}$ and $C \in \mathcal{T}$. From here on, we may extend to general $A \in \mathcal{S} \otimes \mathcal{T}$ by a monotone class argument and then to arbitrary $f$ by linearity and monotone convergence. The statements in (ii) and (iii) are easy consequences. □
For any measurable function $f \ge 0$ on $T \times U$, we get as in Theorem 1.27
$$(\mu \otimes \nu)_s f = \int \mu(s, dt) \int \nu(s, t, du)\,f(t, u), \qquad s \in S,$$
or simply $(\mu \otimes \nu)f = \mu(\nu f)$. By iteration we may combine any kernels $\mu_k$ from $S_0 \times \cdots \times S_{k-1}$ to $S_k$, $k = 1, \dots, n$, into a kernel $\mu_1 \otimes \cdots \otimes \mu_n$ from $S_0$ to $S_1 \times \cdots \times S_n$, given by
$$(\mu_1 \otimes \cdots \otimes \mu_n)f = \mu_1(\mu_2(\cdots(\mu_n f)\cdots))$$
for any measurable function $f \ge 0$ on $S_1 \times \cdots \times S_n$.
In applications we may often encounter kernels $\mu_k$ from $S_{k-1}$ to $S_k$, $k = 1, \dots, n$, in which case the composition $\mu_1 \cdots \mu_n$ is defined as a kernel from $S_0$ to $S_n$ given for measurable $B \subset S_n$ by
$$(\mu_1 \cdots \mu_n)_s B = (\mu_1 \otimes \cdots \otimes \mu_n)_s(S_1 \times \cdots \times S_{n-1} \times B) = \int \mu_1(s, ds_1) \int \mu_2(s_1, ds_2) \cdots \int \mu_{n-1}(s_{n-2}, ds_{n-1})\,\mu_n(s_{n-1}, B).$$
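On finite state spaces a probability kernel is simply a row-stochastic matrix, the associated operator is a matrix-vector product, and the composition of kernels reduces to matrix multiplication. The following sketch illustrates this special case; the spaces, the two matrices, and the helper names `compose` and `apply` are assumptions made for the example.

```python
from fractions import Fraction as F

def compose(mu, nu):
    """Composition of finite kernels: (mu nu)(s, {u}) = sum_t mu(s, {t}) nu(t, {u})."""
    return [[sum(mu[s][t] * nu[t][u] for t in range(len(nu)))
             for u in range(len(nu[0]))] for s in range(len(mu))]

def apply(mu, f):
    """Associated operator: (mu f)(s) = sum_t mu(s, {t}) f(t)."""
    return [sum(row[t] * f[t] for t in range(len(f))) for row in mu]

# Hypothetical kernels: mu from a 2-point space S_0 to a 2-point space S_1,
# nu from S_1 to a 3-point space S_2. Each row is a probability measure.
mu = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
nu = [[F(1, 3), F(1, 3), F(1, 3)], [F(0), F(1, 2), F(1, 2)]]
f = [F(1), F(2), F(4)]                     # a function on S_2

munu = compose(mu, nu)                     # kernel from S_0 to S_2
lhs = apply(munu, f)                       # (mu nu) f
rhs = apply(mu, apply(nu, f))              # mu (nu f)
```

The equality of `lhs` and `rhs` is the finite counterpart of integrating out the intermediate variable in the displayed formula for $(\mu_1 \cdots \mu_n)_s B$.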
Exercises
1. Prove the triangle inequality $\mu(A \triangle C) \le \mu(A \triangle B) + \mu(B \triangle C)$. (Hint: Note that $1_{A \triangle B} = |1_A - 1_B|$.)
2. Show that Lemma 1.9 is false for uncountable index sets. (Hint: Show that every measurable set depends on countably many coordinates.)
3. For any space $S$, let $\mu A$ denote the cardinality of the set $A \subset S$. Show that $\mu$ is a measure on $(S, 2^S)$.
4. Let $\mathcal{K}$ be the class of compact subsets of some metric space $S$, and let $\mu$ be a bounded measure such that $\inf_{K \in \mathcal{K}} \mu K^c = 0$. Show for any $B \in \mathcal{B}(S)$ that $\mu B = \sup_{K \in \mathcal{K} \cap B} \mu K$.
5. Show that any absolutely convergent series can be written as an integral with respect to counting measure on $\mathbb{N}$. State series versions of Fatou's lemma and the dominated convergence theorem, and give direct elementary proofs.
6. Give an example of integrable functions $f, f_1, f_2, \ldots$ on some probability space $(\Omega, \mathcal{A}, \mu)$ such that $f_n \to f$ but $\mu f_n \not\to \mu f$.
7. Fix two $\sigma$-finite measures $\mu$ and $\nu$ on some measurable space $(\Omega, \mathcal{F})$ with sub-$\sigma$-field $\mathcal{G}$. Show that if $\mu \ll \nu$ holds on $\mathcal{F}$, it is also true on $\mathcal{G}$. Further show by an example that the converse may fail.
8. Fix two measurable spaces $(S, \mathcal{S})$ and $(T, \mathcal{T})$, a measurable function $f\colon S \to T$, and a measure $\mu$ on $S$ with image $\nu = \mu \circ f^{-1}$. Show that $f$ remains measurable w.r.t. the completions $\mathcal{S}^\mu$ and $\mathcal{T}^\nu$.
9. Fix a measure space $(S, \mathcal{S}, \mu)$ and a $\sigma$-field $\mathcal{T} \subset \mathcal{S}$, let $\mathcal{S}^\mu$ denote the $\mu$-completion of $\mathcal{S}$, and let $\mathcal{T}^\mu$ be the $\sigma$-field generated by $\mathcal{T}$ and the $\mu$-null sets of $\mathcal{S}^\mu$. Show that $A \in \mathcal{T}^\mu$ iff there exist some $B \in \mathcal{T}$ and $N \in \mathcal{S}^\mu$ with $A \triangle B \subset N$ and $\mu N = 0$. Also, show by an example that $\mathcal{T}^\mu$ may be strictly greater than the $\mu$-completion of $\mathcal{T}$.
10. State Fubini's theorem for the case where $\mu$ is any $\sigma$-finite measure and $\nu$ is the counting measure on $\mathbb{N}$. Give a direct proof of this result.
11. Let $f_1, f_2, \ldots$ be $\mu$-integrable functions on some measurable space $S$ such that $g = \sum_k f_k$ exists a.e., and put $g_n = \sum_{k \le n} f_k$. Restate the dominated convergence theorem for the integrals $\mu g_n$ in terms of the functions $f_k$, and compare with the result of the preceding exercise.
12. Extend Theorem 1.27 to the product of $n$ measures.
13. Let $\lambda$ denote Lebesgue measure on $\mathbb{R}_+$, and fix any $p > 0$. Show that the class of step functions with bounded support and finitely many jumps is dense in $L^p(\lambda)$. Generalize to $\mathbb{R}_+^d$.
14. Let $M \supset N$ be closed linear subspaces of $L^2$. Show that if $f \in L^2$ has projections $g$ onto $M$ and $h$ onto $N$, then $g$ has projection $h$ onto $N$.
15. Let $M$ be a closed linear subspace of $L^2$, and let $f, g \in L^2$ with $M$-projections $\hat{f}$ and $\hat{g}$. Show that $\langle \hat{f}, g \rangle = \langle f, \hat{g} \rangle = \langle \hat{f}, \hat{g} \rangle$.
16. Let $\mu_1, \mu_2, \ldots$ be kernels between two measurable spaces $S$ and $T$. Show that the function $\mu = \sum_n \mu_n$ is again a kernel.
17. Fix a function $f$ between two measurable spaces $S$ and $T$, and define $\mu(s, B) = 1_B \circ f(s)$. Show that $\mu$ is a kernel iff $f$ is measurable.
18. Show that if $\mu \ll \nu$ and $\nu f = 0$ with $f \ge 0$, then also $\mu f = 0$. (Hint: Use Lemma 1.24.)
19. For any $\sigma$-finite measures $\mu_1 \ll \mu_2$ and $\nu_1 \ll \nu_2$, show that $\mu_1 \otimes \nu_1 \ll \mu_2 \otimes \nu_2$. (Hint: Use Fubini's theorem and Lemma 1.24.)
Chapter 2
Measure Theory - Key Results
Outer measures and extension; Lebesgue and Lebesgue-Stieltjes measures; Jordan-Hahn and Lebesgue decompositions; Radon-Nikodym theorem; Lebesgue's differentiation theorem; functions of finite variation; Riesz' representation theorem; Haar and invariant measures
We continue our introduction to measure theory with a detailed discussion of some basic results of the subject, all of special relevance to probability theory. Again the hurried or impatient reader may skip to the next chapter and return for reference when need arises.
Most important, by far, of the quoted results is the existence of Lebesgue measure, which lies at the heart of most probabilistic constructions, often via a use of the Daniell-Kolmogorov theorem of Chapter 6. A similar role is played by the construction of Haar and other invariant measures, which ensures the existence of uniform distributions or homogeneous Poisson processes on spheres and other manifolds. Other key results include Riesz' representation theorem, which will enable us in Chapter 19 to construct Markov processes with a given generator, via the resolvents and the associated semigroup of transition operators. We may also mention the Radon-Nikodym theorem, of relevance to the theory of conditioning in Chapter 6, Lebesgue's differentiation theorem, instrumental for proving the general ballot theorem in Chapter 11, and various results on functions of bounded variation, important for the theory of predictable processes and general semimartingales in Chapters 25 and 26.
We begin with an ingenious technical result that will play a crucial role for our construction of Lebesgue measure in Theorem 2.2 and for the proof of Riesz' representation Theorem 2.22. By an outer measure on a space $\Omega$ we mean a nondecreasing and countably subadditive set function $\mu\colon 2^\Omega \to \overline{\mathbb{R}}_+$ with $\mu\emptyset = 0$. Given an outer measure $\mu$ on $\Omega$, we say that a set $A \subset \Omega$ is $\mu$-measurable if
$$\mu E = \mu(E \cap A) + \mu(E \cap A^c), \qquad E \subset \Omega. \tag{1}$$
Note that the inequality $\le$ holds automatically by subadditivity. The following result gives the basic construction of measures from outer measures.
Theorem 2.1 (restriction of outer measure, Carathéodory) Let $\mu$ be an outer measure on $\Omega$, and write $\mathcal{A}$ for the class of $\mu$-measurable sets. Then $\mathcal{A}$ is a $\sigma$-field and the restriction of $\mu$ to $\mathcal{A}$ is a measure.
Proof: Since $\mu\emptyset = 0$, we have for any set $E \subset \Omega$
$$\mu(E \cap \emptyset) + \mu(E \cap \Omega) = \mu\emptyset + \mu E = \mu E,$$
which shows that $\emptyset \in \mathcal{A}$. Also note that trivially $A \in \mathcal{A}$ implies $A^c \in \mathcal{A}$. Next assume that $A, B \in \mathcal{A}$. Using (1) for $A$ and $B$ together with the subadditivity of $\mu$, we get for any $E \subset \Omega$
$$\begin{aligned}
\mu E &= \mu(E \cap A) + \mu(E \cap A^c) \\
&= \mu(E \cap A \cap B) + \mu(E \cap A \cap B^c) + \mu(E \cap A^c) \\
&\ge \mu(E \cap (A \cap B)) + \mu(E \cap (A \cap B)^c),
\end{aligned}$$
which shows that even $A \cap B \in \mathcal{A}$. It follows easily that $\mathcal{A}$ is a field. If $A, B \in \mathcal{A}$ are disjoint, we also get by (1) for any $E \subset \Omega$
$$\mu(E \cap (A \cup B)) = \mu(E \cap (A \cup B) \cap A) + \mu(E \cap (A \cup B) \cap A^c) = \mu(E \cap A) + \mu(E \cap B). \tag{2}$$
Finally, consider any disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$, and put $U_n = \bigcup_{k \le n} A_k$ and $U = \bigcup_n U_n$. Using (2) recursively along with the monotonicity of $\mu$, we get
$$\mu(E \cap U) \ge \mu(E \cap U_n) = \sum_{k \le n} \mu(E \cap A_k).$$
Letting $n \to \infty$ and combining with the subadditivity of $\mu$, we obtain
$$\mu(E \cap U) = \sum_k \mu(E \cap A_k). \tag{3}$$
In particular, for $E = \Omega$ we see that $\mu$ is countably additive on $\mathcal{A}$. Noting that $U_n \in \mathcal{A}$ and using (3) twice along with the monotonicity of $\mu$, we also get
$$\begin{aligned}
\mu E &= \mu(E \cap U_n) + \mu(E \cap U_n^c) \\
&\ge \sum_{k \le n} \mu(E \cap A_k) + \mu(E \cap U^c) \\
&\to \mu(E \cap U) + \mu(E \cap U^c),
\end{aligned}$$
which shows that $U \in \mathcal{A}$. Thus, $\mathcal{A}$ is a $\sigma$-field. □
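On a finite set, the Carathéodory criterion (1) can be tested by brute force. The sketch below takes a hypothetical outer measure on $\{0, 1, 2, 3\}$, defined as the least number of blocks $\{0,1\}$, $\{2,3\}$ needed to cover a set, and lists the sets that satisfy (1) for every test set $E$. The blocks and the helper names `outer` and `is_measurable` are assumptions for illustration.

```python
from itertools import combinations

OMEGA = frozenset(range(4))
BLOCKS = [frozenset({0, 1}), frozenset({2, 3})]   # assumed covering sets, weight 1 each

def subsets(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def outer(a):
    """Outer measure of a: the smallest number of blocks needed to cover a."""
    return min(len(cover) for r in range(len(BLOCKS) + 1)
               for cover in combinations(BLOCKS, r)
               if a <= frozenset().union(*cover))

def is_measurable(a):
    """Caratheodory criterion: outer(E) = outer(E & a) + outer(E - a) for all E."""
    return all(outer(e) == outer(e & a) + outer(e - a) for e in subsets(OMEGA))

measurable = [a for a in subsets(OMEGA) if is_measurable(a)]
```

In this example only the empty set, the two blocks, and the whole space pass the test. They form a $\sigma$-field on which `outer` is additive, whereas a singleton such as $\{0\}$ fails the criterion with $E = \{0, 1\}$.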
We are now ready to introduce Lebesgue measure $\lambda$ on $\mathbb{R}$. The length of an interval $I \subset \mathbb{R}$ is denoted by $|I|$.
Theorem 2.2 (Lebesgue measure, Borel) There exists a unique measure $\lambda$ on $(\mathbb{R}, \mathcal{B})$ such that $\lambda I = |I|$ for every interval $I \subset \mathbb{R}$.
As a first step in the proof, we show that the length $|I|$ of intervals $I \subset \mathbb{R}$ admits an extension to an outer measure on $\mathbb{R}$. Then define
$$\lambda A = \inf_{\{I_k\}} \sum_k |I_k|, \qquad A \subset \mathbb{R}, \tag{4}$$
where the infimum extends over all countable covers of $A$ by open intervals $I_1, I_2, \ldots$. We show that (4) provides the desired extension.
Lemma 2.3 (outer Lebesgue measure) The function $\lambda$ in (4) is an outer measure on $\mathbb{R}$. Moreover, $\lambda I = |I|$ for every interval $I$.
Proof: The set function $\lambda$ is clearly nonnegative and nondecreasing with $\lambda\emptyset = 0$. To prove the countable subadditivity, let $A_1, A_2, \ldots \subset \mathbb{R}$ be arbitrary. For any $\varepsilon > 0$ and $n \in \mathbb{N}$, we may choose some open intervals $I_{n1}, I_{n2}, \ldots$ such that
$$A_n \subset \bigcup_k I_{nk}, \qquad \lambda A_n \ge \sum_k |I_{nk}| - \varepsilon 2^{-n}, \qquad n \in \mathbb{N}.$$
Then
$$\bigcup_n A_n \subset \bigcup_n \bigcup_k I_{nk}, \qquad \lambda \bigcup_n A_n \le \sum_n \sum_k |I_{nk}| \le \sum_n \lambda A_n + \varepsilon,$$
and the desired relation follows as we let $\varepsilon \to 0$.
To prove the second assertion, we may assume that $I = [a, b]$ for some finite numbers $a < b$. Since $I \subset (a - \varepsilon, b + \varepsilon)$ for every $\varepsilon > 0$, we get $\lambda I \le |I| + 2\varepsilon$, and so $\lambda I \le |I|$. To obtain the reverse relation, we need to prove that if $I \subset \bigcup_k I_k$ for some open intervals $I_1, I_2, \ldots$, then $|I| \le \sum_k |I_k|$. By the Heine-Borel theorem, $I$ remains covered by finitely many intervals $I_1, \dots, I_n$, and it suffices to show that $|I| \le \sum_{k \le n} |I_k|$. This reduces the assertion to the case of finitely many covering intervals $I_1, \dots, I_n$.
The statement is clearly true for a single covering interval. Proceeding by induction, we assume the assertion to be true for $n - 1$ covering intervals and turn to the case of covering by $I_1, \dots, I_n$. Then $b$ belongs to some $I_k = (a_k, b_k)$, and so the interval $I_k' = I \setminus I_k$ is covered by the remaining intervals $I_j$, $j \ne k$. By the induction hypothesis, we get
$$|I| = b - a \le (b - a_k) + (a_k - a) \le |I_k| + |I_k'| \le |I_k| + \sum_{j \ne k} |I_j| = \sum_j |I_j|,$$
as required. □
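The key inequality in the proof of Lemma 2.3, that finitely many open intervals covering $[a, b]$ have total length at least $b - a$, can be explored with a short sketch. The greedy sweep below is one way to test whether a finite family covers $[a, b]$; it stands in for the induction in the proof rather than reproducing it, and the function names and sample intervals are hypothetical.

```python
def covers(intervals, a, b):
    """Return True if the open intervals (l, r) jointly cover the closed interval [a, b]."""
    point = a
    while True:
        # Among the intervals containing the current point, reach as far right as possible.
        reach = max((r for l, r in intervals if l < point < r), default=None)
        if reach is None:
            return False              # the current point is not covered
        if reach > b:
            return True               # everything up to and including b is covered
        point = reach                 # the endpoint itself still needs to be covered

def total_length(intervals):
    """Sum of the lengths |I_k| of the given intervals."""
    return sum(r - l for l, r in intervals)

# Assumed sample data: one genuine cover of [0, 1] and one family missing the point 0.5.
cover = [(-0.1, 0.4), (0.3, 0.7), (0.6, 1.2)]
not_cover = [(-0.1, 0.5), (0.5, 1.1)]
```

For the family `cover`, `total_length(cover)` exceeds $1 = |[0, 1]|$, as the lemma requires. The family `not_cover` has total length greater than 1 as well, but it is rejected because the open intervals omit the point 0.5.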
The next result ensures that the class of measurable sets in Lemma 2.3 is large enough to contain all Borel sets.
Lemma 2.4 (measurability of intervals) Let $\lambda$ denote the outer measure in Lemma 2.3. Then the interval $(-\infty, a]$ is $\lambda$-measurable for every $a \in \mathbb{R}$.
Proof: For any set $E \subset \mathbb{R}$ and constant $\varepsilon > 0$, we may cover $E$ by some open intervals $I_1, I_2, \ldots$ such that $\lambda E \ge \sum_n |I_n| - \varepsilon$. Writing $I = (-\infty, a]$ and using the subadditivity of $\lambda$ and Lemma 2.3, we get
$$\begin{aligned}
\lambda E + \varepsilon &\ge \sum_n |I_n| = \sum_n |I_n \cap I| + \sum_n |I_n \cap I^c| \\
&= \sum_n \lambda(I_n \cap I) + \sum_n \lambda(I_n \cap I^c) \\
&\ge \lambda(E \cap I) + \lambda(E \cap I^c).
\end{aligned}$$
Since $\varepsilon$ was arbitrary, it follows that $I$ is $\lambda$-measurable. □
Proof of Theorem 2.2: Define $\lambda$ as in (4). Then Lemma 2.3 shows that $\lambda$ is an outer measure such that $\lambda I = |I|$ for every interval $I$. Furthermore, Theorem 2.1 shows that $\lambda$ is a measure on the $\sigma$-field $\mathcal{A}$ of all $\lambda$-measurable sets. Finally, Lemma 2.4 shows that $\mathcal{A}$ contains all intervals $(-\infty, a]$ with $a \in \mathbb{R}$. Since the latter sets generate the Borel $\sigma$-field $\mathcal{B}$, we have $\mathcal{B} \subset \mathcal{A}$.
To prove the uniqueness, consider any measure $\mu$ with the stated properties, and put $I_n = [-n, n]$ for $n \in \mathbb{N}$. Using Lemma 1.17 with $\mathcal{C}$ equal to the set of intervals, we see that
$$\lambda(B \cap I_n) = \mu(B \cap I_n), \qquad B \in \mathcal{B},\ n \in \mathbb{N}.$$
Letting $n \to \infty$ and using Lemma 1.14, we get $\lambda B = \mu B$ for all $B \in \mathcal{B}$, as required. □
Before proceeding to a more detailed study of Lebesgue measure, we state an abstract extension theorem that can be proved by essentially the same arguments. Here a nonempty class $\mathcal{I}$ of subsets of a space $\Omega$ is called a semiring if for any $I, J \in \mathcal{I}$ we have $I \cap J \in \mathcal{I}$ and the set $I \cap J^c$ can be written as a union of finitely many disjoint sets $I_1, \dots, I_n \in \mathcal{I}$.
Theorem 2.5 (extension, Carathéodory) Let $\mu$ be a finitely additive and countably subadditive set function on a semiring $\mathcal{I}$ such that $\mu\emptyset = 0$. Then $\mu$ extends to a measure on $\sigma(\mathcal{I})$.
Proof: Define a set function $\mu^*$ on $2^\Omega$ by
$$\mu^* A = \inf_{\{I_k\}} \sum_k \mu I_k, \qquad A \subset \Omega,$$
where the infimum extends over all covers of $A$ by sets $I_1, I_2, \ldots \in \mathcal{I}$. Let $\mu^* A = \infty$ when no such cover exists. Proceeding as in the proof of Lemma 2.3, we see that $\mu^*$ is an outer measure on $\Omega$. To check that $\mu^*$ extends $\mu$, fix any $I \in \mathcal{I}$, and consider an arbitrary cover $I_1, I_2, \ldots \in \mathcal{I}$ of $I$. Using both the subadditivity and the finite additivity of $\mu$, we get
$$\mu^* I \le \mu I \le \sum_k \mu(I \cap I_k) \le \sum_k \mu I_k,$$
which implies $\mu^* I = \mu I$. By Theorem 2.1, it remains to show that every set $I \in \mathcal{I}$ is $\mu^*$-measurable. Then let $A \subset \Omega$ be covered by some sets $I_1, I_2, \ldots \in \mathcal{I}$ with $\mu^* A \ge \sum_k \mu I_k - \varepsilon$, and proceed as in the proof of Lemma 2.4, noting that $I_n \cap I^c$ is a finite disjoint union of some sets $I_{nj} \in \mathcal{I}$, and therefore $\mu(I_n \cap I^c) = \sum_j \mu I_{nj}$ by the finite additivity of $\mu$. □
Using Theorem 1.27, we may construct the product measure $\lambda^d = \lambda \otimes \cdots \otimes \lambda$ on $\mathbb{R}^d$ for every $d \in \mathbb{N}$. We call $\lambda^d$ the $d$-dimensional Lebesgue measure. Note that $\lambda^d$ generalizes the ordinary notion of area (when $d = 2$) or volume (when $d \ge 3$). The following result shows that $\lambda^d$ is invariant under arbitrary translations (or shifts) and rotations. We shall also see that the shift invariance characterizes $\lambda^d$ up to a constant factor.
Theorem 2.6 (invariance of Lebesgue measure) Fix any measurable space $(S, \mathcal{S})$ and a measure $\mu$ on $\mathbb{R}^d \times S$ with $\sigma$-finite projection $\nu = \mu((0, 1]^d \times \cdot)$ onto $S$. Then $\mu$ is invariant under shifts in $\mathbb{R}^d$ iff $\mu = \lambda^d \otimes \nu$, in which case $\mu$ remains invariant under arbitrary rigid motions of $\mathbb{R}^d$.
Proof: First assume that $\mu$ is invariant under shifts in $\mathbb{R}^d$. Let $\mathcal{I}$ denote the class of intervals $I = (a, b]$ with rational endpoints, and note that for any $I_1, \dots, I_d \in \mathcal{I}$ and $C \in \mathcal{S}$ with $\nu C < \infty$,
$$\mu(I_1 \times \cdots \times I_d \times C) = |I_1| \cdots |I_d|\,\nu C = (\lambda^d \otimes \nu)(I_1 \times \cdots \times I_d \times C).$$
For fixed $I_2, \dots, I_d$ and $C$, the relation extends by monotonicity to arbitrary intervals $I_1$ and then, by the uniqueness in Theorem 2.2, to any $B_1 \in \mathcal{B}$. Proceeding recursively in $d$ steps, we get for arbitrary $B_1, \dots, B_d \in \mathcal{B}$
$$\mu(B_1 \times \cdots \times B_d \times C) = (\lambda^d \otimes \nu)(B_1 \times \cdots \times B_d \times C),$$
and so $\mu = \lambda^d \otimes \nu$ by the uniqueness in Theorem 1.27.
Conversely, let $\mu = \lambda^d \otimes \nu$. For any $h = (h_1, \dots, h_d) \in \mathbb{R}^d$, we define the shift operator $T_h\colon \mathbb{R}^d \to \mathbb{R}^d$ by $T_h x = x + h$ for all $x \in \mathbb{R}^d$. For any intervals $I_1, \dots, I_d$ and sets $C \in \mathcal{S}$, we have
$$\mu(I_1 \times \cdots \times I_d \times C) = |I_1| \cdots |I_d|\,\nu C = \mu \circ T_h^{-1}(I_1 \times \cdots \times I_d \times C),$$
where $T_h(x, s) = (x + h, s)$. As before, it follows that $\mu = \mu \circ T_h^{-1}$.
It remains to show that $\mu$ is invariant under arbitrary orthogonal transformations $P$ on $\mathbb{R}^d$. Then note that, for any $x, h \in \mathbb{R}^d$,
$$T_h P x = Px + h = P(x + P^{-1}h) = P(x + h') = P T_{h'} x,$$
where $h' = P^{-1}h$. Since $\mu$ is shift-invariant, we obtain
$$\mu \circ P^{-1} \circ T_h^{-1} = \mu \circ T_{h'}^{-1} \circ P^{-1} = \mu \circ P^{-1},$$
where $P(x, s) = (Px, s)$. Thus, even $\mu \circ P^{-1}$ is shift-invariant and hence of the form $\lambda^d \otimes \nu'$. Writing $B$ for the unit ball in $\mathbb{R}^d$, we get for any $C \in \mathcal{S}$
$$\lambda^d B \cdot \nu' C = \mu \circ P^{-1}(B \times C) = \mu(P^{-1}B \times C) = \mu(B \times C) = \lambda^d B \cdot \nu C.$$
Dividing by $\lambda^d B$ yields $\nu' C = \nu C$. Hence, $\nu' = \nu$, and so $\mu \circ P^{-1} = \mu$. □
We proceed to show that integrable functions on $\mathbb{R}^d$ are continuous in a specified average sense.
Lemma 2.7 (mean continuity) Let $f$ be a measurable function on $\mathbb{R}^d$ with $\lambda^d|f| < \infty$. Then
$$\lim_{h \to 0} \int |f(x + h) - f(x)|\,dx = 0.$$
Proof: By Lemma 1.35 and a simple truncation, we may choose some continuous functions $f_1, f_2, \ldots$ with bounded supports such that $\lambda^d|f_n - f| \to 0$. By the triangle inequality, we get for $n \in \mathbb{N}$ and $h \in \mathbb{R}^d$
$$\int |f(x + h) - f(x)|\,dx \le \int |f_n(x + h) - f_n(x)|\,dx + 2\lambda^d|f_n - f|.$$
Since the $f_n$ are bounded, the right-hand side tends to 0 by dominated convergence as $h \to 0$ and then $n \to \infty$. □
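Mean continuity holds even for discontinuous integrable functions. As a rough numerical illustration, the sketch below approximates $\int |f(x + h) - f(x)|\,dx$ for the indicator $f = 1_{[0,1]}$ on $\mathbb{R}$, where the exact value is $2|h|$ for $|h| \le 1$. The midpoint Riemann sum, the integration window, and the name `shift_distance` are assumptions chosen for the example.

```python
def f(x):
    """Indicator function of the interval [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def shift_distance(h, lo=-2.0, hi=3.0, n=50000):
    """Midpoint Riemann sum for the integral of |f(x + h) - f(x)| over [lo, hi].

    The window [lo, hi] is assumed wide enough to contain the supports of
    f and of its shift for |h| <= 1.
    """
    dx = (hi - lo) / n
    return sum(abs(f(lo + (i + 0.5) * dx + h) - f(lo + (i + 0.5) * dx))
               for i in range(n)) * dx

# The values shrink roughly like 2|h| as h -> 0, although f is not continuous.
values = [shift_distance(h) for h in (0.5, 0.1, 0.01)]
```

The pointwise difference $|f(x + h) - f(x)|$ does not tend to 0 uniformly here; only its integral does, which is the sense of continuity asserted in the lemma.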
By a bounded signed measure on a measurable space $(\Omega, \mathcal{A})$ we mean a bounded function $\nu\colon \mathcal{A} \to \mathbb{R}$ such that $\nu \bigcup_n A_n = \sum_n \nu A_n$ for any disjoint sets $A_1, A_2, \ldots \in \mathcal{A}$, where the series converges absolutely. We say that two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{A})$ are (mutually) singular or orthogonal and write $\mu \perp \nu$ if there exists some set $A \in \mathcal{A}$ with $\mu A = \nu A^c = 0$. Note that this $A$ may not be unique. The following result gives the basic decomposition of a signed measure into positive components.
Theorem 2.8 (Hahn decomposition) Any bounded signed measure $\nu$ can be written uniquely as a difference of two bounded, nonnegative, and mutually singular measures $\nu_+$ and $\nu_-$.
Proof: Put $c = \sup\{\nu A;\ A \in \mathcal{A}\}$ and note that, if $A, A' \in \mathcal{A}$ with $\nu A \ge c - \varepsilon$ and $\nu A' \ge c - \varepsilon'$, then
$$\nu(A \cup A') = \nu A + \nu A' - \nu(A \cap A') \ge (c - \varepsilon) + (c - \varepsilon') - c = c - \varepsilon - \varepsilon'.$$
Choosing $A_1, A_2, \ldots \in \mathcal{A}$ with $\nu A_n \ge c - 2^{-n}$, we get by iteration and countable additivity
$$\nu \bigcup_{k > n} A_k \ge c - \sum_{k > n} 2^{-k} = c - 2^{-n}, \qquad n \in \mathbb{N}.$$
Define $A_+ = \bigcap_n \bigcup_{k > n} A_k$ and $A_- = A_+^c$. Using the countable additivity again, we get $\nu A_+ = c$. Hence, for sets $B \in \mathcal{A}$,
$$\nu B = \nu A_+ - \nu(A_+ \setminus B) \ge 0, \quad B \subset A_+; \qquad \nu B = \nu(A_+ \cup B) - \nu A_+ \le 0, \quad B \subset A_-.$$
We may then define some measures $\nu_+$ and $\nu_-$ by
$$\nu_+ B = \nu(B \cap A_+), \qquad \nu_- B = -\nu(B \cap A_-), \qquad B \in \mathcal{A}.$$
To prove the uniqueness, assume also that $\nu = \mu_+ - \mu_-$ for some positive measures $\mu_+ \perp \mu_-$. Choose a set $B_+ \in \mathcal{A}$ with $\mu_- B_+ = \mu_+ B_+^c = 0$. Then $\nu$ is both positive and negative on the sets $A_+ \setminus B_+$ and $B_+ \setminus A_+$, and therefore $\nu = 0$ on $A_+ \triangle B_+$. Hence, for any $C \in \mathcal{A}$
$$\mu_+ C = \mu_+(B_+ \cap C) = \nu(B_+ \cap C) = \nu(A_+ \cap C) = \nu_+ C,$$
which shows that $\mu_+ = \nu_+$. Then also
$$\mu_- = \mu_+ - \nu = \nu_+ - \nu = \nu_-.$$
□
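For a signed measure on a finite set, the Hahn decomposition can be read off from the signs of the point masses. The sketch below builds $A_+$, $\nu_+$, and $\nu_-$ for an assumed signed measure and checks by enumeration that $\nu A_+$ attains the supremum $c$ used in the proof; the point masses and helper names are hypothetical.

```python
from itertools import combinations

# Assumed signed measure on a five-point space, given by its point masses.
mass = {"a": 2, "b": -3, "c": 0, "d": 5, "e": -1}

def nu(A):
    """Signed measure of a set A: the sum of its point masses."""
    return sum(mass[x] for x in A)

points = sorted(mass)
all_sets = [frozenset(s) for r in range(len(points) + 1)
            for s in combinations(points, r)]

c = max(nu(A) for A in all_sets)                    # c = sup of nu over all sets
A_plus = frozenset(x for x in points if mass[x] > 0)
A_minus = frozenset(points) - A_plus

def nu_plus(B):
    """Positive part: nu restricted to A_plus."""
    return nu(B & A_plus)

def nu_minus(B):
    """Negative part: minus nu restricted to A_minus."""
    return -nu(B & A_minus)
```

The point `"c"` has mass 0 and could be placed in either $A_+$ or $A_-$ without changing $\nu_+$ or $\nu_-$. The set $A_+$ is thus not unique even though the decomposition $\nu = \nu_+ - \nu_-$ is.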
The last result can be used to construct the maximum $\mu \vee \nu$ and minimum $\mu \wedge \nu$ of two $\sigma$-finite measures $\mu$ and $\nu$.
Corollary 2.9 (maximum and minimum) For any $\sigma$-finite measures $\mu$ and $\nu$ on a common measurable space, there exists a largest measure $\mu \wedge \nu$ bounded by $\mu$ and $\nu$ and a smallest measure $\mu \vee \nu$ bounding $\mu$ and $\nu$. Furthermore,
$$\mu - \mu \wedge \nu \perp \nu - \mu \wedge \nu, \qquad \mu \wedge \nu + \mu \vee \nu = \mu + \nu.$$
Proof: We may assume that $\mu$ and $\nu$ are bounded. Letting $\rho_+ - \rho_-$ be the Hahn decomposition of $\mu - \nu$, we put
$$\mu \wedge \nu = \mu - \rho_+, \qquad \mu \vee \nu = \mu + \rho_-.$$
□
For any two measures $\mu$ and $\nu$ on $(\Omega, \mathcal{A})$, we say that $\nu$ is absolutely continuous with respect to $\mu$ and write $\nu \ll \mu$ if $\mu A = 0$ implies $\nu A = 0$ for all $A \in \mathcal{A}$. The following result gives a fundamental decomposition of a measure into an absolutely continuous and a singular component; at the same time it provides a basic representation of the former part.
Theorem 2.10 (Lebesgue decomposition, Radon-Nikodym theorem) For any $\sigma$-finite measures $\mu$ and $\nu$ on $\Omega$, there exist some unique measures $\nu_a \ll \mu$ and $\nu_s \perp \mu$ such that $\nu = \nu_a + \nu_s$. Furthermore, $\nu_a = f \cdot \mu$ for some $\mu$-a.e. unique measurable function $f \ge 0$ on $\Omega$.
Two lemmas will be needed for the proof.
Lemma 2.11 (closure) Fix two measures $\mu$ and $\nu$ on $\Omega$ and some measurable functions $f_1, f_2, \ldots \ge 0$ on $\Omega$ with $f_n \cdot \mu \le \nu$. Then even $f \cdot \mu \le \nu$, where $f = \sup_n f_n$.
Proof: First assume that $f \cdot \mu \le \nu$ and $g \cdot \mu \le \nu$, and put $h = f \vee g$. Writing $A = \{f \ge g\}$, we get
$$h \cdot \mu = 1_A h \cdot \mu + 1_{A^c} h \cdot \mu = 1_A f \cdot \mu + 1_{A^c} g \cdot \mu \le 1_A \cdot \nu + 1_{A^c} \cdot \nu = \nu.$$
Thus, we may assume that $f_n \uparrow f$. But then $\nu \ge f_n \cdot \mu \uparrow f \cdot \mu$ by monotone convergence, and so $f \cdot \mu \le \nu$. □
Lemma 2.12 (partial density) Let $\mu$ and $\nu$ be bounded measures on $\Omega$ with $\mu \not\perp \nu$. Then there exists a measurable function $f \ge 0$ on $\Omega$ such that $\mu f > 0$ and $f \cdot \mu \le \nu$.
Proof: For each $n \in \mathbb{N}$ we introduce the signed measure $\chi_n = \nu - n^{-1}\mu$. By Theorem 2.8 we may choose some $A_n^+ \in \mathcal{A}$ with complement $A_n^-$ such that $\pm\chi_n \ge 0$ on $A_n^\pm$. Since the $\chi_n$ are nondecreasing, we may assume that $A_1^+ \subset A_2^+ \subset \cdots$. Writing $A = \bigcup_n A_n^+$ and noting that $A^c = \bigcap_n A_n^- \subset A_n^-$, we obtain
$$\nu A^c \le \nu A_n^- = \chi_n A_n^- + n^{-1}\mu A_n^- \le n^{-1}\mu\Omega \to 0,$$
and so $\nu A^c = 0$. Since $\mu \not\perp \nu$, we get $\mu A > 0$. Furthermore, $A_n^+ \uparrow A$ implies $\mu A_n^+ \uparrow \mu A > 0$, and we may choose $n$ so large that $\mu A_n^+ > 0$. Putting $f = n^{-1} 1_{A_n^+}$, we obtain $\mu f = n^{-1}\mu A_n^+ > 0$ and
$$f \cdot \mu = n^{-1} 1_{A_n^+} \cdot \mu = 1_{A_n^+} \cdot \nu - 1_{A_n^+} \cdot \chi_n \le \nu.$$
□
Proof of Theorem 2.10: We may assume that J.L and v are bounded. Let
C denote the class of measurable functions ! > 0 on n with ! . jL < v,
and define c == SUP{M!; I E C}. Choose 11, f2,... E C with Mfn c. Then
f - SUPn In E C by Lemma 2.11 and MI == c by monotone convergence.
Define Va == f . M and V s == v - Va, and note that Va « J-L. If V 8 I- J.L, then
by Lemma 2.12 there exists a measurable function 9 > 0 with J.Lg > 0 and
9 . J-l < v 8 . But then f + 9 E C with M(! + g) > c, which contradicts the
definition of c. Thus, v s -L /.1.
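To prove the uniqueness of $\nu_a$ and $\nu_s$, assume that also
$\nu = \nu_a' + \nu_s'$ for some measures $\nu_a' \ll \mu$ and
$\nu_s' \perp \mu$. Choose $A, B \in \mathcal{A}$ with
$\nu_s A = \mu A^c = \nu_s' B = \mu B^c = 0$. Then clearly
$$\nu_s(A \cap B) = \nu_s'(A \cap B) = \nu_a(A^c \cup B^c) = \nu_a'(A^c \cup B^c) = 0,$$
and so
$$\nu_a = 1_{A \cap B} \cdot \nu_a = 1_{A \cap B} \cdot \nu = 1_{A \cap B} \cdot \nu_a' = \nu_a',$$
$$\nu_s = \nu - \nu_a = \nu - \nu_a' = \nu_s'.$$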
To see that $f$ is a.e. unique, assume that also $\nu_a = g \cdot \mu$ for some
measurable function $g \ge 0$. Writing $h = f - g$ and noting that
$h \cdot \mu = 0$, we get
$$\mu|h| = \int_{\{h > 0\}} h\,d\mu - \int_{\{h < 0\}} h\,d\mu = 0,$$
and so $h = 0$ a.e. by Lemma 1.24. $\Box$
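On a finite set the Lebesgue decomposition can be written down explicitly, which may help to fix ideas. The following Python sketch is only an illustration under that simplifying assumption; the function name `lebesgue_decomposition` and the sample masses are hypothetical and not taken from the text. The density is $f = \nu\{x\}/\mu\{x\}$ where $\mu\{x\} > 0$, and the singular part collects the $\nu$-mass sitting on $\{\mu = 0\}$.

```python
from fractions import Fraction

def lebesgue_decomposition(mu, nu):
    """Split nu = nu_a + nu_s relative to mu on a finite set.

    mu and nu map each point to a nonnegative mass. Returns (f, nu_a, nu_s)
    with nu_a = f . mu absolutely continuous and nu_s carried by {mu = 0}.
    """
    points = sorted(set(mu) | set(nu))
    f, nu_a, nu_s = {}, {}, {}
    for x in points:
        m, n = Fraction(mu.get(x, 0)), Fraction(nu.get(x, 0))
        if m > 0:
            f[x] = n / m          # density of nu_a with respect to mu at x
            nu_a[x], nu_s[x] = n, Fraction(0)
        else:
            f[x] = Fraction(0)    # any value works here, since mu{x} = 0
            nu_a[x], nu_s[x] = Fraction(0), n
    return f, nu_a, nu_s

# Hypothetical example masses on the three-point space {a, b, c}.
mu = {"a": 1, "b": 2, "c": 0}
nu = {"a": 3, "b": 0, "c": 5}
f, nu_a, nu_s = lebesgue_decomposition(mu, nu)
```

Here $\nu_s$ lives on the single point `c`, which is $\mu$-null, while $f$ is determined only up to its value on that point, in line with the a.e. uniqueness.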
We insert a simple corollary that will be useful in Chapter 10.
Corollary 2.13 (splitting) Consider two finite measure spaces
$(S, \mathcal{S}, \mu)$ and $(T, \mathcal{T}, \nu)$ and a measurable map
$f\colon S \to T$ such that $\nu \ll \mu \circ f^{-1}$. Then there exists a
measure $\mu' \ll \mu$ on $S$ such that $\nu = \mu' \circ f^{-1}$.
Proof: Put $\mu' = (g \circ f) \cdot \mu$ with $g = d\nu/d(\mu \circ f^{-1})$,
and use Lemma 1.22. $\Box$
A measure $\mu$ on $\mathbb{R}$ is said to be locally finite if $\mu I < \infty$
for every bounded interval $I$. The following result gives a basic
correspondence between locally finite measures and nondecreasing functions.
Proposition 2.14 (Lebesgue-Stieltjes measures) The relation
$$\mu(a, b] = F(b) - F(a), \qquad -\infty < a < b < \infty, \tag{5}$$
defines a one-to-one correspondence between the locally finite measures $\mu$ on
$\mathbb{R}$ and the right-continuous, nondecreasing functions $F$ on
$\mathbb{R}$ with $F(0) = 0$.
Proof: Given a locally finite measure $\mu$ on $\mathbb{R}$, we define the
function $F$ on $\mathbb{R}$ by
$$F(x) = \begin{cases} \mu(0, x], & x \ge 0, \\ -\mu(x, 0], & x < 0. \end{cases}$$
Then $F$ is right-continuous and nondecreasing with $F(0) = 0$, and it is
clearly the unique such function satisfying (5).
Conversely, given a function $F$ as stated, we define the left-continuous,
generalized inverse $g\colon \mathbb{R} \to \overline{\mathbb{R}}$ by
$$g(t) = \inf\{s \in \mathbb{R};\, F(s) \ge t\}, \qquad t \in \mathbb{R}.$$
Since $g$ is again nondecreasing, the set $g^{-1}(-\infty, s]$ is an extended
interval for each $s \in \mathbb{R}$, and so $g$ is measurable by Lemma 1.4. We
may then define a measure $\mu$ on $\overline{\mathbb{R}}$ by
$\mu = \lambda \circ g^{-1}$, where $\lambda$ denotes Lebesgue measure on
$\mathbb{R}$. Noting that $g(t) \le x$ iff $t \le F(x)$, we get for any $a < b$
$$\mu(a, b] = \lambda\{t;\, g(t) \in (a, b]\} = \lambda(F(a), F(b)] = F(b) - F(a).$$
Thus, the restriction of $\mu$ to $\mathbb{R}$ satisfies (5). The uniqueness of
$\mu$ may be proved in the same way as for $\lambda$ in Theorem 2.2. $\Box$
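The key step in the preceding proof is the equivalence $g(t) \le x$ iff $t \le F(x)$. It can be checked exactly in a small case. The Python sketch below assumes, purely for illustration, a measure with three atoms on $(0, \infty)$; the atom locations, the masses, and the helper names `F`, `g`, and `mass_via_inverse` are hypothetical choices, not part of the text.

```python
from bisect import bisect_left
from fractions import Fraction
from itertools import accumulate

# Hypothetical atomic measure on (0, oo): atoms s_1 < s_2 < s_3 with masses w_j.
atoms = [Fraction(1), Fraction(2), Fraction(7, 2)]
weights = [Fraction(1, 2), Fraction(1, 4), Fraction(2)]
cum = list(accumulate(weights))          # cum[j] = F(s_j) = mu(0, s_j]

def F(x):
    """F(x) = mu(0, x] for x >= 0; right-continuous and nondecreasing."""
    return sum((w for s, w in zip(atoms, weights) if s <= x), Fraction(0))

def g(t):
    """Left-continuous inverse g(t) = inf{s; F(s) >= t}, for 0 < t <= F(oo)."""
    return atoms[bisect_left(cum, t)]

def mass_via_inverse(a, b, step=Fraction(1, 8)):
    """Lebesgue measure of {t; g(t) in (a, b]}, counted on a grid of mesh `step`.

    The count is exact here because every value of F is a multiple of `step`.
    """
    n = int(cum[-1] / step)
    return step * sum(1 for k in range(1, n + 1) if a < g(k * step) <= b)
```

For instance, `mass_via_inverse(Fraction(1, 2), Fraction(2))` agrees with `F(2) - F(Fraction(1, 2))`, which is relation (5) for this particular $\mu$.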
We now specialize Theorem 2.10 to the case when $\mu$ equals Lebesgue
measure and $\nu$ is a locally finite measure on $\mathbb{R}$, defined as in
Proposition 2.14 in terms of some nondecreasing, right-continuous function $F$.
The Lebesgue decomposition and Radon-Nikodym property may be expressed in terms
of $F$ as
$$F = F_a + F_s = \int f + F_s, \tag{6}$$
where $F_a$ and $F_s$ correspond to the absolutely continuous and singular
components of $\nu$, respectively, and we assume that $F_a(0) = 0$. Here
$\int f$ denotes the function $\int_0^x f(t)\,dt$, where the Lebesgue density $f$
is a locally integrable function on $\mathbb{R}$. The following result extends
the fundamental theorem of calculus for Riemann integrals of continuously
differentiable functions, namely the fact that differentiation and integration
are mutually inverse operations.
Theorem 2.15 (differentiation, Lebesgue) Any nondecreasing and
right-continuous function $F = \int f + F_s$ is differentiable a.e. with
derivative $F' = f$.
Thus, the two parts of the fundamental theorem generalize to $(\int f)' = f$
a.e. and $\int F' = F_a$. In other words, the density of an integral can still be
recovered a.e. through differentiation, whereas integration of a derivative
yields only the absolutely continuous component of the underlying function.
In particular, $F$ is absolutely continuous iff $\int F' = F - F(0)$ and singular
iff $F' = 0$ a.e.
The last result extends trivially to any difference $F = F_+ - F_-$ between
two nondecreasing, right-continuous functions $F_+$ and $F_-$. However, it fails
for more general functions, already because the derivative may not exist.
For example, the paths of Brownian motion introduced in Chapter 13 are
a.s. nowhere differentiable.
Two lemmas will be helpful for the proof of the last theorem.
Lemma 2.16 (interval selection) Let $\mathcal{I}$ be a class of open intervals
with union $G$. If $\lambda G < \infty$, there exist some disjoint sets
$I_1, \ldots, I_n \in \mathcal{I}$ with $\sum_k |I_k| \ge \lambda G/4$.
Proof: Choose a compact set $K \subset G$ with $\lambda K \ge 3\lambda G/4$. By
compactness we may cover $K$ by finitely many intervals
$J_1, \ldots, J_m \in \mathcal{I}$. We now define $I_1, I_2, \ldots$
recursively, by letting $I_k$ be the longest interval $J_r$ not yet chosen such
that $J_r \cap I_j = \emptyset$ for all $j < k$. The selection terminates when no
such interval exists.
If an interval $J_r$ is not selected, it must intersect a longer interval $I_k$.
Writing $\hat{I}_k$ for the interval centered at $I_k$ with length $3|I_k|$, we
obtain
$$K \subset \bigcup_r J_r \subset \bigcup_k \hat{I}_k,$$
and so
$$(3/4)\lambda G \le \lambda K \le \lambda \bigcup_k \hat{I}_k \le \sum_k |\hat{I}_k| = 3 \sum_k |I_k|. \qquad \Box$$
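The recursive selection in the proof is a greedy algorithm and can be run directly on a finite family of intervals. The Python sketch below is one way to do so; the function names and the four sample intervals are hypothetical. Scanning the intervals in order of decreasing length and keeping each one that is disjoint from those already kept gives the same result as the recursion in the proof (up to the handling of ties). For a finite family the tripling argument shows that the chosen lengths sum to at least one third of the length of the union.

```python
def select_disjoint(intervals):
    """Greedy selection: repeatedly take the longest interval that is
    disjoint from every interval chosen so far."""
    chosen = []
    for a, b in sorted(intervals, key=lambda iv: iv[1] - iv[0], reverse=True):
        # Open intervals (a, b) and (c, d) are disjoint iff b <= c or d <= a.
        if all(b <= c or d <= a for c, d in chosen):
            chosen.append((a, b))
    return chosen

def union_length(intervals):
    """Lebesgue measure of a finite union of intervals."""
    total, right = 0.0, float("-inf")
    for a, b in sorted(intervals):
        if b > right:
            total += b - max(a, right)
            right = b
    return total

# Hypothetical family of open intervals, given by their endpoints.
family = [(0, 4), (3, 5), (4.5, 9), (8, 10)]
chosen = select_disjoint(family)
chosen_length = sum(b - a for a, b in chosen)
```

In this example the two longest intervals are kept and the two short ones are discarded, since each of the latter meets a longer selected interval.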
Lemma 2.17 (differentiation on null sets) Let $F(x) = \mu(0, x]$ for some
locally finite measure $\mu$ on $\mathbb{R}$, and let $A \in \mathcal{B}$ with
$\mu A = 0$. Then $F' = 0$ a.e. $\lambda$ on $A$.
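Proof: By Lemma 1.34 there exists for every $\delta > 0$ some open set
$G_\delta \supset A$ with $\mu G_\delta < \delta$. Define
$$A_\varepsilon = \Big\{x \in A;\ \limsup_{h \to 0} \frac{\mu(x - h, x + h)}{h} > \varepsilon\Big\}, \qquad \varepsilon > 0,$$
and note that each $A_\varepsilon$ is measurable since the lim sup may be taken
along the rationals. For every $x \in A_\varepsilon$ there exists some interval
$I = (x - h, x + h) \subset G_\delta$ with $2\mu I > \varepsilon|I|$, and we note
that the class $\mathcal{I}_{\varepsilon,\delta}$ of such intervals covers
$A_\varepsilon$. Hence, by Lemma 2.16 we may choose some disjoint sets
$I_1, \ldots, I_n \in \mathcal{I}_{\varepsilon,\delta}$ with
$\sum_k |I_k| \ge \lambda A_\varepsilon/4$. Then
$$\lambda A_\varepsilon \le 4 \sum_k |I_k| \le \frac{8}{\varepsilon} \sum_k \mu I_k \le \frac{8\mu G_\delta}{\varepsilon} < \frac{8\delta}{\varepsilon}.$$
As $\delta \to 0$, we get $\lambda A_\varepsilon = 0$. Thus,
$\limsup \mu(x - h, x + h)/h \le \varepsilon$ a.e. $\lambda$ on $A$, and the
assertion follows since $\varepsilon$ is arbitrary. $\Box$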
Proof of Theorem 2.15: Since $F_s' = 0$ a.e. $\lambda$ by Lemma 2.17, we may
assume that $F = \int f$. Define
$$F^\wedge(x) = \limsup_{h \to 0} h^{-1}(F(x + h) - F(x)),$$
$$F^\vee(x) = \liminf_{h \to 0} h^{-1}(F(x + h) - F(x)),$$
and note that $F^\wedge = 0$ a.e. on the set $\{f = 0\} = \{x;\, f(x) = 0\}$ by
Lemma 2.17. Applying this to the function $F_r = \int (f - r)^+$ for arbitrary
$r \in \mathbb{R}$ and noting that $f \le (f - r)^+ + r$, we get
$F^\wedge \le r$ a.e. on $\{f \le r\}$. Thus, for $r$ restricted to the
rationals,
$$\lambda\{f < F^\wedge\} = \lambda \bigcup_r \{f \le r < F^\wedge\} \le \sum_r \lambda\{f \le r < F^\wedge\} = 0,$$
which shows that $F^\wedge \le f$ a.e. Applying this result to
$-F = \int(-f)$ yields $F^\vee = -(-F)^\wedge \ge f$ a.e. Thus,
$F^\wedge = F^\vee = f$ a.e., and so $F'$ exists a.e. and equals $f$. $\Box$
For any function $F\colon \mathbb{R} \to \mathbb{R}$, we define the total
variation of $F$ on the interval $[a, b]$ as
$$\|F\|_a^b = \sup_{\{t_k\}} \sum_k |F(t_k) - F(t_{k-1})|,$$
where the supremum extends over all finite partitions
$a = t_0 < t_1 < \cdots < t_n = b$. Similarly, the positive and negative
variations of $F$ are defined by the same expression with the absolute value
$|\cdot|$ replaced by the positive and negative parts $(\cdot)^\pm$. Here
$x^\pm = (\pm x) \vee 0$, so that $x = x^+ - x^-$ and $|x| = x^+ + x^-$. We also
write $\Delta_a^b F = F(b) - F(a)$.
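For a single fixed partition, the three sums entering these definitions are easy to compute, and the identities $x = x^+ - x^-$ and $|x| = x^+ + x^-$ carry over term by term. The Python sketch below illustrates this; the function name `variation_sums` and the sample values are hypothetical. It evaluates the sums along one partition only and does not take the supremum over partitions, so in general it gives lower bounds for the variations.

```python
def variation_sums(values):
    """Positive, negative, and total variation sums along one partition.

    `values` lists F(t_0), F(t_1), ..., F(t_n) for a partition t_0 < ... < t_n.
    """
    increments = [b - a for a, b in zip(values, values[1:])]
    pos = sum(max(d, 0) for d in increments)     # sum of (Delta F)^+
    neg = sum(max(-d, 0) for d in increments)    # sum of (Delta F)^-
    return pos, neg, pos + neg

# Hypothetical values of F at the points of a partition.
values = [0, 3, 1, 4, 4, 2]
pos, neg, total = variation_sums(values)
```

The difference `pos - neg` always equals the overall increment `values[-1] - values[0]`, whatever the partition.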
The following result gives a basic decomposition of functions of locally
finite variation, similar to the Hahn decomposition in Theorem 2.8.
Proposition 2.18 (Jordan decomposition) A function $F$ on $\mathbb{R}$ has
locally finite variation iff it is a difference of two nondecreasing functions
$F_+$ and $F_-$. In that case,
$$\|F\|_s^t \le \Delta_s^t F_+ + \Delta_s^t F_-, \qquad s < t, \tag{7}$$
with equality iff the increments $\Delta_s^t F_\pm$ agree with the positive and
negative variations of $F$ on $(s, t]$.
Proof: For any $s < t$ we have
$$(\Delta_s^t F)^+ = (\Delta_s^t F)^- + \Delta_s^t F,$$
$$|\Delta_s^t F| = (\Delta_s^t F)^+ + (\Delta_s^t F)^- = 2(\Delta_s^t F)^- + \Delta_s^t F.$$
Summing over the intervals in an arbitrary partition
$s = t_0 < t_1 < \cdots < t_n = t$ and taking the supremum of each side, we obtain
$$\Delta_s^t F_+ = \Delta_s^t F_- + \Delta_s^t F,$$
$$\|F\|_s^t = 2\Delta_s^t F_- + \Delta_s^t F = \Delta_s^t F_+ + \Delta_s^t F_-,$$
where $F_\pm(x)$ denote the positive and negative variations of $F$ on $[0, x]$
(or minus the variations on $[x, 0]$ when $x < 0$). Thus,
$F = F(0) + F_+ - F_-$, and (7) holds with equality. If also $F = G_+ - G_-$ for
some nondecreasing functions $G_\pm$, then
$(\Delta_s^t F)^\pm \le \Delta_s^t G_\pm$, and so
$\Delta_s^t F_\pm \le \Delta_s^t G_\pm$. Thus,
$\|F\|_s^t \le \Delta_s^t G_+ + \Delta_s^t G_-$, and equality holds iff
$\Delta_s^t F_\pm = \Delta_s^t G_\pm$. $\Box$
Next we give another useful decomposition of finite-variation functions.
Proposition 2.19 (left and right continuity) Any function $F$ of locally
finite variation can be written as $F_r + F_l$, where $F_r$ is right-continuous
with left-hand limits and $F_l$ is left-continuous with right-hand limits. If
$F$ is right-continuous, then so are the minimal components $F_\pm$ in
Proposition 2.18.
Proof: By Proposition 2.18 we may assume that $F$ is nondecreasing. The
right- and left-hand limits $F^\pm(s)$ then exist at every point $s$, and we
note that $F^-(s) \le F(s) \le F^+(s)$. Also note that $F$ has at most countably
many jump discontinuities. For $t > 0$, we define
$$F_l(t) = \sum_{s \in [0, t)} (F^+(s) - F(s)),$$
$$F_r(t) = F(t) - F_l(t);$$
when $t < 0$ we need to take the negative of the corresponding sum on $[t, 0)$.
It is easy to check that $F_l$ is left-continuous and $F_r$ is
right-continuous, and that both functions are nondecreasing.
To prove the last assertion, assume that $F$ is right-continuous at some
point $s$. If $\|F\|_s^t \to c > 0$ as $t \downarrow s$, we may choose $t - s$ so
small that $\|F\|_s^t < 4c/3$. Next we may choose a partition
$s = t_0 < t_1 < \cdots < t_n = t$ of $[s, t]$ such that the corresponding
$F$-increments $\delta_k$ satisfy $\sum_k |\delta_k| > 2c/3$. By the right
continuity of $F$ at $s$, we may assume that $t_1 - s$ is small enough that
$|\delta_1| = |F(t_1) - F(s)| < c/3$. Then $\|F\|_{t_1}^t > c/3$, and so
$$4c/3 > \|F\|_s^t = \|F\|_s^{t_1} + \|F\|_{t_1}^t > c + c/3 = 4c/3,$$
a contradiction. Hence $c = 0$. Assuming $F_\pm$ to be minimal, we obtain
$$\Delta_s^t F_\pm \le \|F\|_s^t \to 0, \qquad t \downarrow s. \qquad \Box$$
Justified by the last theorem, we may assume our finite-variation func-
tions to be right-continuous. In that case, we have the following basic
relation to signed measures. Here we only require the latter to be locally
bounded.
Proposition 2.20 (finite-variation functions and signed measures) For
any right-continuous function $F$ of locally finite variation, there exists a
unique signed measure $\nu$ on $\mathbb{R}$ such that
$\nu(s, t] = \Delta_s^t F$ for all $s < t$. Furthermore, the Hahn decomposition
$\nu = \nu_+ - \nu_-$ and the Jordan decomposition $F = F_+ - F_-$ into minimal
components are related by $\nu_\pm(s, t] = \Delta_s^t F_\pm$.
Proof: The positive and negative variations $F_\pm$ are right-continuous by
Proposition 2.19. Hence, by Proposition 2.14 there exist some locally finite
measures $\mu_\pm$ on $\mathbb{R}$ such that
$\mu_\pm(s, t] = \Delta_s^t F_\pm$, and we may take $\nu = \mu_+ - \mu_-$.
To see that this agrees with the Hahn decomposition $\nu = \nu_+ - \nu_-$,
choose $A \in \mathcal{A}$ such that $\nu_+ A^c = \nu_- A = 0$. For any
$B \in \mathcal{B}$, we get
$$\mu_+ B \ge \mu_+(B \cap A) \ge \nu(B \cap A) = \nu_+(B \cap A) = \nu_+ B,$$
which shows that $\mu_+ \ge \nu_+$. Then also $\mu_- \ge \nu_-$. If the equality
fails on some interval $(s, t]$, then
$$\|F\|_s^t = \mu_+(s, t] + \mu_-(s, t] > \nu_+(s, t] + \nu_-(s, t],$$
which contradicts Proposition 2.18. Hence, $\mu_\pm = \nu_\pm$. $\Box$
A function $F\colon \mathbb{R} \to \mathbb{R}$ is said to be absolutely
continuous if for any $a < b$ and $\varepsilon > 0$ there exists some
$\delta > 0$ such that, for any finite collection of disjoint intervals
$(a_k, b_k] \subset (a, b]$ with $\sum_k |b_k - a_k| < \delta$, we have
$\sum_k |F(b_k) - F(a_k)| < \varepsilon$. In particular, we note that every
absolutely continuous function is continuous and has locally finite variation.
Given a function $F$ of locally finite variation, we say that $F$ is singular if
for any $a < b$ and $\varepsilon > 0$ there exist finitely many disjoint
intervals $(a_k, b_k] \subset (a, b]$ such that
$\sum_k |b_k - a_k| < \varepsilon$ and
$\|F\|_a^b < \sum_k |F(b_k) - F(a_k)| + \varepsilon$.
We say that a locally finite signed measure $\nu$ on $\mathbb{R}$ is absolutely
continuous or singular if the components $\nu_\pm$ of the associated Hahn
decomposition satisfy $\nu_\pm \ll \lambda$ or $\nu_\pm \perp \lambda$,
respectively. The following result relates the notions of absolute continuity
and singularity for functions and measures.
Proposition 2.21 (absolutely continuous and singular functions) Let $F$
be a right-continuous function on $\mathbb{R}$ of locally finite variation, and
let $\nu$ be the associated signed measure on $\mathbb{R}$ with
$\nu(s, t] = \Delta_s^t F$. Then $F$ is absolutely continuous or singular iff
the corresponding property holds for $\nu$.
Proof: If $F$ is absolutely continuous or singular, then the corresponding
property holds for the total variation function $\|F\|_a^x$ with arbitrary $a$
and hence also for the minimal components $F_\pm$ in Proposition 2.20. Thus, we
may assume that $F$ is nondecreasing, so that $\nu$ is a positive and locally
finite measure on $\mathbb{R}$.
First assume that $F$ is absolutely continuous. If $\nu \not\ll \lambda$, there
exists a bounded interval $I = (a, b)$ with a subset $A \in \mathcal{B}$ such
that $\lambda A = 0$ but $\nu A > 0$. Taking $\varepsilon = \nu A/2$, we choose a
corresponding $\delta > 0$ as in the definition of absolute continuity. Since
$A$ is measurable and has outer Lebesgue measure 0, we may next choose an open
set $G$ with $A \subset G \subset I$ such that $\lambda G < \delta$. But then
$\nu A \le \nu G \le \varepsilon = \nu A/2$, a contradiction. This shows that
$\nu \ll \lambda$.
Next assume that $F$ is singular, and fix any bounded interval $I = (a, b]$.
Given any $\varepsilon > 0$, we may choose some Borel sets
$A_1, A_2, \ldots \subset I$ such that $\lambda A_n < \varepsilon 2^{-n}$ and
$\nu A_n \to \nu I$. Then $B = \bigcup_n A_n$ satisfies
$\lambda B < \varepsilon$ and $\nu B = \nu I$. Next we may choose some Borel sets
$B_n \subset I$ with $\lambda B_n \to 0$ and $\nu B_n = \nu I$. Then
$C = \bigcap_n B_n$ satisfies $\lambda C = 0$ and $\nu C = \nu I$, which shows
that $\nu \perp \lambda$ on $I$.
Conversely, assume that $\nu \ll \lambda$, so that $\nu = f \cdot \lambda$ for
some locally integrable function $f \ge 0$. Fix any bounded interval $I$ and put
$A_n = \{x \in I;\, f(x) > n\}$. Fix any $\varepsilon > 0$. Since
$\nu A_n \to 0$ by Lemma 1.14, we may choose $n$ so large that
$\nu A_n < \varepsilon/2$. Put $\delta = \varepsilon/2n$. For any Borel set
$B \subset I$ with $\lambda B < \delta$ we obtain
$$\nu B = \nu(B \cap A_n) + \nu(B \cap A_n^c) \le \nu A_n + n\lambda B < \tfrac{1}{2}\varepsilon + n\delta = \varepsilon.$$
In particular, this applies to any finite union $B$ of intervals
$(a_k, b_k] \subset I$, and so we may conclude that $F$ is absolutely continuous.
Finally, assume that $\nu \perp \lambda$. Fix any finite interval
$I = (a, b]$, and choose a Borel set $A \subset I$ such that $\lambda A = 0$ and
$\nu A = \nu I$. For any $\varepsilon > 0$ we may choose some open set
$G \supset A$ with $\lambda G < \varepsilon$. Letting $(a_n, b_n)$ denote the
connected components of $G$ and writing $I_n = (a_n, b_n]$, we get
$\sum_n |I_n| < \varepsilon$ and $\sum_n \nu(I \cap I_n) = \nu I$. This shows
that $F$ is singular. $\Box$
From now on, we assume the basic space $S$ to be locally compact, second
countable, and Hausdorff (abbreviated lcscH). Let $\mathcal{G}$,
$\mathcal{F}$, and $\mathcal{K}$ denote the classes of open, closed, and compact
sets in $S$, and put
$\hat{\mathcal{G}} = \{G \in \mathcal{G};\, \bar{G} \in \mathcal{K}\}$. Let
$\hat{C}_+ = \hat{C}_+(S)$ denote the class of continuous functions
$f\colon S \to \mathbb{R}_+$ with compact support, where the latter is defined as
the closure of the set $\{x \in S;\, f(x) > 0\}$. Relations such as
$U \prec f \prec V$ mean that $f \in \hat{C}_+$ with $0 \le f \le 1$ and
satisfies $f = 1$ on $U$ and $\operatorname{supp} f \subset V^\circ$.
By a positive linear functional on $\hat{C}_+$ we mean a mapping
$\mu\colon \hat{C}_+ \to \mathbb{R}_+$ such that $\mu(f + g) = \mu f + \mu g$
for all $f, g \in \hat{C}_+$. This clearly implies the homogeneity
$\mu(cf) = c\mu f$ for any $f \in \hat{C}_+$ and $c \in \mathbb{R}_+$. A Radon
measure on $S$ is defined as a measure $\mu$ on the Borel $\sigma$-field
$\mathcal{S} = \mathcal{B}(S)$ such that $\mu K < \infty$ for every
$K \in \mathcal{K}$. The following result gives the basic extension of positive
linear functionals to measures.
Theorem 2.22 (Riesz representation) If $S$ is lcscH, then every positive
linear functional $\mu$ on $\hat{C}_+(S)$ extends uniquely to a Radon measure on
$S$.
Several lemmas will be needed for the proof, and we begin with a simple
topological fact.
Lemma 2.23 (partition of unity) For any open cover $G_1, \ldots, G_n$ of a
compact set $K \subset S$, there exist some functions
$f_1, \ldots, f_n \in \hat{C}_+(S)$ with $f_k \prec G_k$ such that
$\sum_k f_k = 1$ on $K$.
Proof: For any $x \in K$ we may choose some $k \le n$ and
$V \in \hat{\mathcal{G}}$ with $x \in V$ and $\bar{V} \subset G_k$. By
compactness, $K$ is covered by finitely many such sets $V_1, \ldots, V_m$. For
each $k \le n$, let $U_k$ be the union of all sets $V_j$ with
$\bar{V}_j \subset G_k$. Then $\bar{U}_k \subset G_k$, and so we may choose
$g_1, \ldots, g_n \in \hat{C}_+$ with $U_k \prec g_k \prec G_k$. Define
$$f_k = g_k(1 - g_1) \cdots (1 - g_{k-1}), \qquad k = 1, \ldots, n.$$
Then $f_k \prec G_k$ for all $k$, and by induction
$$f_1 + \cdots + f_n = 1 - (1 - g_1) \cdots (1 - g_n).$$
It remains to note that $\prod_k (1 - g_k) = 0$ on $K$ since
$K \subset \bigcup_k U_k$. $\Box$
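The induction step in the proof is a purely algebraic identity, valid pointwise for any numbers $g_k \in [0, 1]$. The Python sketch below checks it with exact rational arithmetic; the function name `partition_terms` and the sample values are hypothetical and stand for the values $g_1(x), \ldots, g_n(x)$ at one fixed point $x$.

```python
from fractions import Fraction
from math import prod

def partition_terms(g):
    """Return f_k = g_k (1 - g_1) ... (1 - g_{k-1}) for values g_k in [0, 1]."""
    terms, remaining = [], Fraction(1)
    for gk in g:
        terms.append(gk * remaining)   # f_k uses the product over j < k
        remaining *= 1 - gk            # update to the product over j <= k
    return terms

# Hypothetical values g_1(x), g_2(x), g_3(x) at a single point x.
g = [Fraction(1, 3), Fraction(1, 2), Fraction(1, 4)]
f = partition_terms(g)
lhs = sum(f)
rhs = 1 - prod(1 - gk for gk in g)
```

If some $g_k$ equals 1, the product vanishes and the terms sum to 1, which is the situation on $K$.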
By an inner content on an lcscH space $S$ we mean a nondecreasing function
$\mu\colon \mathcal{G} \to \overline{\mathbb{R}}_+$, finite on
$\hat{\mathcal{G}}$, such that $\mu$ is both finitely additive and countably
subadditive, and also satisfies the inner continuity
$$\mu G = \sup\{\mu U;\ U \in \hat{\mathcal{G}},\ \bar{U} \subset G\}, \qquad G \in \mathcal{G}. \tag{8}$$
Lemma 2.24 (inner approximation) For any positive linear functional $\mu$
on $\hat{C}_+(S)$, we may define an inner content $\nu$ on $S$ by
$$\nu G = \sup\{\mu f;\ f \prec G\}, \qquad G \in \mathcal{G}.$$
Proof: Note that $\nu$ is nondecreasing with $\nu\emptyset = 0$ and that
$\nu G < \infty$ for bounded $G$. It is also clear that $\nu$ is inner continuous
in the sense of (8).
To show that $\nu$ is countably subadditive, fix any
$G_1, G_2, \ldots \in \mathcal{G}$ and let $f \prec \bigcup_k G_k$. By
compactness, $f \prec \bigcup_{k \le n} G_k$ for some finite $n$, and by Lemma
2.23 we may choose some functions $g_k \prec G_k$ such that $\sum_k g_k = 1$ on
$\operatorname{supp} f$. Then the products $f_k = g_k f$ satisfy
$f_k \prec G_k$ and $\sum_k f_k = f$, and so
$$\mu f = \sum_{k \le n} \mu f_k \le \sum_{k \le n} \nu G_k \le \sum_k \nu G_k.$$
Since $f \prec \bigcup_k G_k$ was arbitrary, we obtain
$\nu \bigcup_k G_k \le \sum_k \nu G_k$, as required.
To show that $\nu$ is finitely additive, fix any disjoint sets
$G, G' \in \mathcal{G}$. If $f \prec G$ and $f' \prec G'$, then
$f + f' \prec G \cup G'$, and so
$$\mu f + \mu f' = \mu(f + f') \le \nu(G \cup G') \le \nu G + \nu G'.$$
Taking the supremum over all $f$ and $f'$ gives
$\nu G + \nu G' = \nu(G \cup G')$, as required. $\Box$
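An outer measure $\mu$ on $S$ is said to be regular if it is finitely additive on
$\mathcal{G}$ and enjoys the outer and inner regularity
$$\mu A = \inf\{\mu G;\ G \in \mathcal{G},\ G \supset A\}, \qquad A \subset S, \tag{9}$$
$$\mu G = \sup\{\mu K;\ K \in \mathcal{K},\ K \subset G\}, \qquad G \in \mathcal{G}. \tag{10}$$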
Lemma 2.25 (outer approximation) Every inner content $\mu$ on $S$ admits
an extension to a regular outer measure.
Proof: We may define the extension by (9), since the right-hand side
equals $\mu A$ when $A \in \mathcal{G}$. By the finite additivity on
$\mathcal{G}$ we have $2\mu\emptyset = \mu\emptyset < \infty$, which implies
$\mu\emptyset = 0$. To prove the countable subadditivity, fix any
$A_1, A_2, \ldots \subset S$. For any $\varepsilon > 0$ we may choose some
$G_1, G_2, \ldots \in \mathcal{G}$ with $G_n \supset A_n$ and
$\mu G_n \le \mu A_n + \varepsilon 2^{-n}$. Since $\mu$ is subadditive on
$\mathcal{G}$, we get
$$\mu \bigcup_n A_n \le \mu \bigcup_n G_n \le \sum_n \mu G_n \le \sum_n \mu A_n + \varepsilon.$$
The desired relation follows since $\varepsilon$ was arbitrary. Thus, the
extension is an outer measure on $S$. Finally, the inner regularity in (10)
follows from (8) and the monotonicity of $\mu$. $\Box$
Lemma 2.26 (measurability) If $\mu$ is a regular outer measure on $S$, then
every Borel set in $S$ is $\mu$-measurable.
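Proof: Fix any $F \in \mathcal{F}$ and $A \subset G \in \mathcal{G}$. By the
inner regularity in (10), we may choose $G_1, G_2, \ldots \in \mathcal{G}$ with
$\bar{G}_n \subset G \setminus F$ and $\mu G_n \to \mu(G \setminus F)$. Since
$\mu$ is nondecreasing and finitely additive on $\mathcal{G}$, we get
$$\mu G \ge \mu(G \setminus \partial G_n) = \mu G_n + \mu(G \setminus \bar{G}_n)$$
$$\ge \mu G_n + \mu(G \cap F)$$
$$\to \mu(G \setminus F) + \mu(G \cap F)$$
$$\ge \mu(A \setminus F) + \mu(A \cap F).$$
Using the outer regularity in (9) gives
$$\mu A \ge \mu(A \setminus F) + \mu(A \cap F), \qquad F \in \mathcal{F},\ A \subset S.$$
Hence, every closed set is measurable, and by Theorem 2.1 the measurability
extends to $\sigma(\mathcal{F}) = \mathcal{B}(S) = \mathcal{S}$. $\Box$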
Proof of Theorem 2.22: Construct an inner content $\nu$ as in Lemma 2.24,
and conclude from Lemma 2.25 that $\nu$ admits an extension to a regular
outer measure on $S$. By Theorem 2.1 and Lemma 2.26, the restriction of
the latter to $\mathcal{S} = \mathcal{B}(S)$ is a Radon measure on $S$, here
still denoted by $\nu$.
To see that $\mu = \nu$ on $\hat{C}_+$, fix any $f \in \hat{C}_+$. For
$n \in \mathbb{N}$ and $k \in \mathbb{Z}_+$, let
$$f_k^n(x) = (nf(x) - k)^+ \wedge 1,$$
$$G_k^n = \{nf > k\} = \{f_k^n > 0\}.$$
Noting that $\bar{G}_{k+1}^n \subset \{f_k^n = 1\}$ and using the definition of
$\nu$ and the outer regularity in (9), we get for appropriate $k$
$$\nu f_{k+1}^n \le \nu G_{k+1}^n \le \mu f_k^n \le \nu \bar{G}_k^n \le \nu f_{k-1}^n.$$
Writing $G_0 = G_0^n = \{f > 0\}$ and noting that $nf = \sum_k f_k^n$, we obtain
$$n\nu f - \nu G_0 \le n\mu f \le n\nu f + \nu \bar{G}_0.$$
Here $\nu \bar{G}_0 < \infty$ since $G_0$ is bounded. Dividing by $n$ and letting
$n \to \infty$ gives $\mu f = \nu f$.
To prove the asserted uniqueness, let $\mu$ and $\nu$ be Radon measures on $S$
with $\mu f = \nu f$ for all $f \in \hat{C}_+$. By an inner approximation, we
have $\mu G = \nu G$ for every $G \in \mathcal{G}$, and a monotone-class
argument yields $\mu = \nu$. $\Box$
By a topological group we mean a group endowed with a topology that
renders the group operations continuous. Thus, the mapping
$(f, g) \mapsto fg$ is continuous from $G^2$ to $G$, whereas the mapping
$g \mapsto g^{-1}$ is continuous from $G$ to $G$. In the former case, $G^2$ is
equipped with the product topology. Introducing the Borel $\sigma$-field
$\mathcal{G} = \mathcal{B}(G)$, we obtain a measurable group
$(G, \mathcal{G})$, and we note that the group operations are measurable when
$G$ is lcscH. A measure $\mu$ on $G$ is said to be left-invariant if
$\mu(gB) = \mu B$ for all $g \in G$ and $B \in \mathcal{G}$, where
$gB = \{gb;\, b \in B\}$, the left translate of $B$ by $g$. This is clearly
equivalent to $\int f(gk)\,\mu(dk) = \mu f$ for any measurable function
$f\colon G \to \mathbb{R}_+$ and element $g \in G$. The definition of
right-invariant measures is similar.
We may now state the basic existence and uniqueness theorem for
invariant measures on groups.
Theorem 2.27 (Haar measure) On every lcscH group $G$ there exists,
uniquely up to a normalization, a left-invariant Radon measure
$\lambda \ne 0$. If $G$ is compact, then $\lambda$ is also right-invariant.
Proof (Weil): For any $f, g \in \hat{C}_+$ we define
$|f|_g = \inf \sum_k c_k$, where the infimum extends over all finite sets of
constants $c_1, \ldots, c_n \ge 0$ such that
$$f(x) \le \sum_{k \le n} c_k\, g(s_k x), \qquad x \in G,$$
for some $s_1, \ldots, s_n \in G$. By compactness, $|f|_g < \infty$ when
$g \ne 0$. We also note that $|f|_g$ is nondecreasing and translation invariant
in $f$, and that it satisfies the subadditivity and homogeneity properties
$$|f + f'|_g \le |f|_g + |f'|_g, \qquad |cf|_g = c|f|_g, \tag{11}$$
as well as the inequalities
$$\frac{\|f\|}{\|g\|} \le |f|_g \le |f|_h |h|_g. \tag{12}$$
We may normalize $|f|_g$ by fixing an $f_0 \in \hat{C}_+ \setminus \{0\}$ and
putting
$$\lambda_g f = |f|_g / |f_0|_g, \qquad f, g \in \hat{C}_+,\ g \ne 0.$$
From (11) and (12) we note that
$$\lambda_g(f + f') \le \lambda_g f + \lambda_g f', \qquad \lambda_g(cf) = c\lambda_g f, \tag{13}$$
$$|f_0|_f^{-1} \le \lambda_g f \le |f|_{f_0}. \tag{14}$$
Conversely, $\lambda_g$ is nearly superadditive in the following sense.
Lemma 2.28 (near superadditivity) For any $f, f' \in \hat{C}_+$ and
$\varepsilon > 0$, there exists an open set $U \ne \emptyset$ such that
$$\lambda_g f + \lambda_g f' \le \lambda_g(f + f') + \varepsilon, \qquad 0 \ne g \prec U.$$
Proof: Fix any $h \in \hat{C}_+$ with $h = 1$ on
$\operatorname{supp}(f + f')$, and define for $\delta > 0$
$$f_\delta = f + f' + \delta h, \qquad h_\delta = f/f_\delta, \qquad h_\delta' = f'/f_\delta,$$
so that $h_\delta, h_\delta' \in \hat{C}_+$. By compactness we may choose a
neighborhood $U$ of the identity element $e \in G$ such that
$$|h_\delta(x) - h_\delta(y)| < \delta, \qquad |h_\delta'(x) - h_\delta'(y)| < \delta, \qquad x^{-1}y \in U. \tag{15}$$
Now assume $0 \ne g \prec U$, and let $f_\delta(x) \le \sum_k c_k g(s_k x)$ for
some $s_1, \ldots, s_n \in G$ and $c_1, \ldots, c_n \ge 0$. Since
$g(s_k x) \ne 0$ implies $s_k x \in U$, we have by (15)
$$f(x) = f_\delta(x) h_\delta(x) \le \sum_k c_k\, g(s_k x)\, h_\delta(x)$$
$$\le \sum_k c_k\, g(s_k x) \{h_\delta(s_k^{-1}) + \delta\},$$
and similarly for $f'$. Noting that $h_\delta + h_\delta' \le 1$, we get
$$|f|_g + |f'|_g \le \sum_k c_k (1 + 2\delta).$$
Taking the infimum over all dominating sums for $f_\delta$ and using (11), we
conclude that
$$|f|_g + |f'|_g \le |f_\delta|_g (1 + 2\delta) \le \{|f + f'|_g + \delta |h|_g\}(1 + 2\delta).$$
Now divide by $|f_0|_g$, and use (14) to obtain
$$\lambda_g f + \lambda_g f' \le \{\lambda_g(f + f') + \delta \lambda_g h\}(1 + 2\delta)$$
$$\le \lambda_g(f + f') + 2\delta |f + f'|_{f_0} + \delta(1 + 2\delta)|h|_{f_0},$$
which tends to $\lambda_g(f + f')$ as $\delta \to 0$. $\Box$
Returning to the proof of Theorem 2.27, we may consider the functionals
$\lambda_g$ as elements of the product space
$\Lambda = \mathbb{R}_+^{\hat{C}_+}$. For any neighborhood $U$ of $e$, let
$\Lambda_U$ denote the closure in $\Lambda$ of the set
$\{\lambda_g;\, 0 \ne g \prec U\}$. Since
$\lambda_g f \le |f|_{f_0} < \infty$ for all $f \in \hat{C}_+$ by (14), the
$\Lambda_U$ are compact by Tychonov's theorem. Furthermore, the family
$\{\Lambda_U;\, e \in U\}$ has the finite intersection property since
$U \subset V$ implies $\Lambda_U \subset \Lambda_V$. We may then choose an
element $\lambda \in \bigcap_U \Lambda_U$, here regarded as a functional on
$\hat{C}_+$. From (14) we note that $\lambda \ne 0$.
To see that $\lambda$ is linear, fix any $f, f' \in \hat{C}_+$ and
$a, b \ge 0$, and choose some $g_1, g_2, \ldots \in \hat{C}_+$ with
$\operatorname{supp} g_n \downarrow \{e\}$ such that
$$\lambda_{g_n} f \to \lambda f, \qquad \lambda_{g_n} f' \to \lambda f', \qquad \lambda_{g_n}(af + bf') \to \lambda(af + bf').$$
By (13) and Lemma 2.28 we obtain
$\lambda(af + bf') = a\lambda f + b\lambda f'$. Thus, $\lambda$ is a nontrivial,
positive linear functional on $\hat{C}_+$, and so by Theorem 2.22 it extends
uniquely to a Radon measure on $S$. The invariance of the functionals
$\lambda_g$ clearly carries over to $\lambda$.
Now consider any left-invariant Radon measure $\lambda \ne 0$ on $G$. Fixing a
right-invariant Radon measure $\mu \ne 0$ and a function
$h \in \hat{C}_+ \setminus \{0\}$, we define
$$p(x) = \int h(y^{-1}x)\,\mu(dy), \qquad x \in G,$$
and we note that $p > 0$ on $G$. Using the invariance of $\lambda$ and $\mu$
together with Fubini's theorem, we get for any $f \in \hat{C}_+$
$$(\lambda h)(\mu f) = \int h(x)\,\lambda(dx) \int f(y)\,\mu(dy)$$
$$= \int h(x)\,\lambda(dx) \int f(yx)\,\mu(dy)$$
$$= \int \mu(dy) \int h(x) f(yx)\,\lambda(dx)$$
$$= \int \mu(dy) \int h(y^{-1}x) f(x)\,\lambda(dx)$$
$$= \int f(x)\,\lambda(dx) \int h(y^{-1}x)\,\mu(dy) = \lambda(fp).$$
Since $f$ was arbitrary, we conclude that $(\lambda h)\mu = p \cdot \lambda$ or,
equivalently, $\lambda/\lambda h = p^{-1} \cdot \mu$. Here the right-hand side
is independent of $\lambda$, and the asserted uniqueness follows. If $S$ is
compact, we may choose $h \equiv 1$ to obtain
$\lambda/\lambda S = \mu/\mu S$. $\Box$
Given a group $G$ and an abstract space $S$, we define a left action of $G$
on $S$ as a mapping $(g, s) \mapsto gs$ from $G \times S$ to $S$ such that
$es = s$ and $(gh)s = g(hs)$ for any $g, h \in G$ and $s \in S$, where $e$
denotes the identity element in $G$. Similarly, a right action is a mapping
$(s, g) \mapsto sg$ such that $se = s$ and $s(gh) = (sg)h$ for all $s, g, h$ as
above. The action is said to be transitive if for any $s, t \in S$ there exists
some $g \in G$ such that $gs = t$ or $sg = t$, respectively. All actions are
henceforth assumed to be from the left.
If $G$ is a topological group and $S$ is a topological space, we assume
the action $(x, s) \mapsto xs$ to be continuous from $G \times S$ to $S$. A
function $h\colon G \to S$ is said to be proper if $h^{-1}K$ is compact in $G$
for any compact set $K \subset S$; if this holds for every mapping
$\pi_s(x) = xs$, $s \in S$, we say that the group action is proper. Finally, a
measure $\mu$ on $S$ is $G$-invariant if $\mu(xB) = \mu B$ for any $x \in G$ and
$B \in \mathcal{S}$. This is clearly equivalent to the relation
$\int f(xs)\,\mu(ds) = \mu f$ for any measurable function
$f\colon S \to \mathbb{R}_+$ and element $x \in G$.
We may now state the basic existence and uniqueness result for invariant
measures on a general lcscH space. The existence of Haar measures in
Theorem 2.27 is a special case.
Theorem 2.29 (invariant measure) Consider an lcscH group $G$ that acts
transitively and properly on an lcscH space $S$. Then there exists, uniquely
up to a normalization, a $G$-invariant Radon measure $\mu \ne 0$ on $S$.
Proof: Fix any $p \in S$, and let $\pi$ denote the mapping $x \mapsto xp$ from
$G$ to $S$. Letting $\lambda$ be a left Haar measure on $G$, we define
$\mu = \lambda \circ \pi^{-1}$. Since $\pi$ is proper, we note that $\mu$ is a
Radon measure on $S$. To see that $\mu$ is $G$-invariant, let
$f \in \hat{C}_+$ be arbitrary, and note that for any $x \in G$
$$\int_S f(xs)\,\mu(ds) = \int_G f(xyp)\,\lambda(dy) = \int_G f(yp)\,\lambda(dy) = \mu f,$$
by the invariance of $\lambda$.
To prove the uniqueness, let $\mu$ be an arbitrary $G$-invariant Radon
measure on $S$. Introduce the subgroup
$$K = \{x \in G;\, xp = p\} = \pi^{-1}\{p\},$$
and note that $K$ is compact since $\pi$ is proper. Let $\nu$ be the normalized
Haar measure on $K$, and define
$$\bar{f}(x) = \int_K f(xk)\,\nu(dk), \qquad x \in G,\ f \in \hat{C}_+(G).$$
If $xp = yp$, we have $y^{-1}xp = p$, and so $y^{-1}x \equiv h \in K$, which
implies $x = yh$. Hence, the left invariance of $\nu$ yields
$$\bar{f}(x) = \bar{f}(yh) = \int_K f(yhk)\,\nu(dk) = \int_K f(yk)\,\nu(dk) = \bar{f}(y).$$
We may then define a mapping $f \mapsto f^*$ by
$$f^*(s) = \bar{f}(x), \qquad s = xp \in S,\ x \in G,\ f \in \hat{C}_+(G).$$
For any subset $B \subset (0, \infty)$, we note that
$$(f^*)^{-1}B = \pi(\bar{f}^{-1}B) \subset \pi[(\operatorname{supp} f) \cdot K].$$
Here the right-hand side is compact since the sets $\operatorname{supp} f$ and
$K$ are compact, and since $\pi$ and the group operation in $G$ are both
continuous. Thus, $f^*$ has bounded support. Furthermore, $\bar{f}$ is continuous
by dominated convergence, and so $\bar{f}^{-1}[t, \infty)$ is closed and hence
compact for every $t > 0$. By the continuity of $\pi$ it follows that even
$(f^*)^{-1}[t, \infty)$ is compact. In particular, $f^*$ is measurable.
We may now define a functional $\lambda$ on $\hat{C}_+(G)$ by
$$\lambda f = \mu f^*, \qquad f \in \hat{C}_+(G).$$
The linearity and positivity of $\lambda$ are clear from the corresponding
properties of the mapping $f \mapsto f^*$ and the measure $\mu$. We also note
that $\lambda$ is finite on $\hat{C}_+(G)$ since $\mu$ is locally finite. By
Theorem 2.22, we may then extend $\lambda$ to a Radon measure on $G$.
To see that $\lambda$ is left-invariant, let $f \in \hat{C}_+(G)$ be arbitrary
and define $f_y(x) = f(yx)$. For any $s = xp \in S$ and $y \in G$ we get
$$f_y^*(s) = \bar{f}_y(x) = \int_K f(yxk)\,\nu(dk) = \bar{f}(yx) = f^*(ys).$$
Hence, by the invariance of $\mu$,
$$\int_G f(yx)\,\lambda(dx) = \lambda f_y = \mu f_y^* = \int_S f^*(ys)\,\mu(ds) = \mu f^* = \lambda f.$$
Now fix any $g \in \hat{C}_+(S)$, and put
$$f(x) = g(xp) = g \circ \pi(x), \qquad x \in G.$$
Then $f \in \hat{C}_+(G)$ because
$\{f > 0\} \subset \pi^{-1} \operatorname{supp} g$, which is compact since
$\pi$ is proper. By the definition of $K$, we have for any $s = xp \in S$
$$f^*(s) = \bar{f}(x) = \int_K f(xk)\,\nu(dk) = \int_K g(xkp)\,\nu(dk) = \int_K g(xp)\,\nu(dk) = g(s),$$
and so
$$\mu g = \mu f^* = \lambda f = \lambda(g \circ \pi) = (\lambda \circ \pi^{-1})g,$$
which shows that $\mu = \lambda \circ \pi^{-1}$. Since $\lambda$ is unique up to
a normalization, the same thing is true for $\mu$. $\Box$
Exercises
1. Show that if $\mu_1 = f_1 \cdot \mu$ and $\mu_2 = f_2 \cdot \mu$, then
$\mu_1 \vee \mu_2 = (f_1 \vee f_2) \cdot \mu$ and
$\mu_1 \wedge \mu_2 = (f_1 \wedge f_2) \cdot \mu$. In particular, we may take
$\mu = \mu_1 + \mu_2$. Extend the result to sequences $\mu_1, \mu_2, \ldots$.
2. Consider an arbitrary family $\mu_i$, $i \in I$, of $\sigma$-finite measures
on some measurable space $S$. Show that there exists a largest measure
$\mu = \bigwedge_i \mu_i$ such that $\mu \le \mu_i$ for all $i \in I$. Show also
that if the $\mu_i$ are bounded by some $\sigma$-finite measure $\nu$, there
exists a smallest measure $\hat{\mu} = \bigvee_i \mu_i$ such that
$\mu_i \le \hat{\mu}$ for all $i$. (Hint: Use Zorn's lemma.)
3. Show that any countably additive set function $\mu \ge 0$ on a field
$\mathcal{A}$ with $\mu\emptyset = 0$ extends to a measure on
$\sigma(\mathcal{A})$. Show also that the extension is unique whenever $\mu$ is
bounded.
4. Extend the first assertion of Theorem 2.6 to the context of general
invariant measures, as in Theorem 2.29.
5. Construct $d$-dimensional Lebesgue measure $\lambda_d$ directly, by the method
of Theorem 2.2. Then show that $\lambda_d = \lambda^d$.
6. Derive the existence of $d$-dimensional Lebesgue measure from Riesz'
representation theorem and the basic properties of the Riemann integral.
7. Extend the mean continuity in Lemma 2.7 to general invariant measures.
8. For any bounded, signed measure $\nu$ on $(\Omega, \mathcal{A})$, show that
there exists a smallest measure $|\nu|$ such that $|\nu A| \le |\nu|A$ for all
$A \in \mathcal{A}$. Show also that $|\nu| = \nu_+ + \nu_-$, where $\nu_\pm$ are
the components in the Hahn decomposition of $\nu$. Finally, for any bounded,
measurable function $f$ on $\Omega$, show that $|\nu f| \le |\nu||f|$.
9. Extend the last result to complex-valued measures $\chi = \mu + i\nu$, where
$\mu$ and $\nu$ are bounded, signed measures on $(\Omega, \mathcal{A})$.
Introducing the complex-valued Radon-Nikodym density
$f = d\chi/d(|\mu| + |\nu|)$, show that $|\chi| = |f| \cdot (|\mu| + |\nu|)$.
10. Show by an example that the uniqueness in Theorem 2.29 may fail if
the group action is not transitive.
Chapter 3
Processes, Distributions,
and Independence
Random elements and processes; distributions and expectation;
independence; zero-one laws; Borel-Cantelli lemma; Bernoulli
sequences and existence; moments and continuity of paths
Armed with the basic notions and results of measure theory from the previ-
ous chapter, we may now embark on our study of probability theory itself.
The dual purpose of this chapter is to introduce the basic terminology and
notation and to prove some fundamental results, many of which are used
throughout the remainder of this book.
In modern probability theory it is customary to relate all objects of study
to a basic probability space $(\Omega, \mathcal{A}, P)$, which is nothing more
than a normalized measure space. Random variables may then be defined as
measurable functions $\xi$ on $\Omega$, and their expected values as the
integrals $E\xi = \int \xi\,dP$. Furthermore, independence between random
quantities reduces to a kind of orthogonality between the induced
sub-$\sigma$-fields. It should be noted, however, that the reference space
$\Omega$ is introduced only for technical convenience, to provide a consistent
mathematical framework. Indeed, the actual choice of $\Omega$ plays no role, and
the interest focuses instead on the various induced distributions
$\mathcal{L}(\xi) = P \circ \xi^{-1}$.
The notion of independence is fundamental for all areas of probability
theory. Despite its simplicity, it has some truly remarkable consequences.
A particularly striking result is Kolmogorov's 0-1 law, which states that
every tail event associated with a sequence of independent random elements
has probability zero or one. As a consequence, any random variable that
depends only on the "tail" of the sequence must be a.s. constant. This result
and the related Hewitt-Savage 0-1 law convey much of the flavor of modern
probability: Although the individual elements of a random sequence are
erratic and unpredictable, the long-term behavior may often conform to
deterministic laws and patterns. Our main objective is to uncover the latter.
Here the classical Borel-Cantelli lemma is a useful tool, among others.
To justify our study, we need to ensure the existence of the random
objects under discussion. For most purposes, it suffices to use the Lebesgue
unit interval $([0,1], \mathcal{B}, \lambda)$ as the basic probability space. In this chapter
the existence will be proved only for independent random variables with
prescribed distributions; we postpone the more general discussion until
Chapter 6. As a key step, we use the binary expansion of real numbers to
construct a so-called Bernoulli sequence, consisting of independent random
digits 0 or 1 with probabilities 1 - p and p, respectively. Such sequences
may be regarded as discrete-time counterparts of the fundamental Poisson
process, to be introduced and studied in Chapter 12.
The distribution of a random process $X$ is determined by the finite-dimensional
distributions, and those are not affected if we change each value
$X_t$ on a null set. It is then natural to look for versions of $X$ with suitable
regularity properties. As another striking result, we shall provide a moment
condition that ensures the existence of a continuous modification of the
process. Regularizations of various kinds are important throughout modern
probability theory, as they may enable us to deal with events depending
on the values of a process at uncountably many times.
To begin our systematic exposition of the theory, we may fix an arbitrary
probability space $(\Omega, \mathcal{A}, P)$, where $P$, the probability measure, has
total mass 1. In the probabilistic context the sets $A \in \mathcal{A}$ are called events,
and $PA = P(A)$ is called the probability of $A$. In addition to results valid
for all measures, there are properties that depend on the boundedness or
normalization of $P$, such as the relation $PA^c = 1 - PA$ and the fact that
$A_n \downarrow A$ implies $PA_n \to PA$.
Some infinite set operations have special probabilistic significance. Thus,
given any sequence of events $A_1, A_2, \ldots \in \mathcal{A}$, we may be interested in the
sets $\{A_n \text{ i.o.}\}$, where $A_n$ happens infinitely often, and $\{A_n \text{ ult.}\}$, where $A_n$
happens ultimately (i.e., for all but finitely many $n$). Those occurrences are
events in their own right, expressible in terms of the $A_n$ as
$$\{A_n \text{ i.o.}\} = \Big\{\sum_n 1_{A_n} = \infty\Big\} = \bigcap_n \bigcup_{k \ge n} A_k, \tag{1}$$
$$\{A_n \text{ ult.}\} = \Big\{\sum_n 1_{A_n^c} < \infty\Big\} = \bigcup_n \bigcap_{k \ge n} A_k. \tag{2}$$
From here on, we omit the argument $\omega$ from our notation when there is no
risk of confusion. For example, the expression $\{\sum_n 1_{A_n} = \infty\}$ is used as
a convenient shorthand form of the unwieldy $\{\omega \in \Omega;\ \sum_n 1_{A_n}(\omega) = \infty\}$.
The indicator functions of the events in (1) and (2) may be expressed as
$$1\{A_n \text{ i.o.}\} = \limsup_{n \to \infty} 1_{A_n}, \qquad 1\{A_n \text{ ult.}\} = \liminf_{n \to \infty} 1_{A_n},$$
where, for typographical convenience, we write $1\{\cdot\}$ instead of $1_{\{\cdot\}}$.
Applying Fatou's lemma to the functions $1_{A_n}$ and $1_{A_n^c}$, we get
$$P\{A_n \text{ i.o.}\} \ge \limsup_{n \to \infty} PA_n, \qquad P\{A_n \text{ ult.}\} \le \liminf_{n \to \infty} PA_n.$$
Using the continuity and subadditivity of $P$, we further see from (1) that
$$P\{A_n \text{ i.o.}\} = \lim_{n \to \infty} P\bigcup_{k \ge n} A_k \le \lim_{n \to \infty} \sum_{k \ge n} PA_k.$$
If $\sum_n PA_n < \infty$, we get zero on the right, and it follows that $P\{A_n \text{ i.o.}\} = 0$.
The resulting implication constitutes the easy part of the Borel-Cantelli
lemma, to be reconsidered in Theorem 3.18.
Any measurable mapping $\xi$ of $\Omega$ into some measurable space $(S, \mathcal{S})$ is
called a random element in $S$. If $B \in \mathcal{S}$, then $\{\xi \in B\} = \xi^{-1}B \in \mathcal{A}$, and
we may consider the associated probabilities
$$P\{\xi \in B\} = P(\xi^{-1}B) = (P \circ \xi^{-1})B, \qquad B \in \mathcal{S}.$$
The set function $\mathcal{L}(\xi) = P \circ \xi^{-1}$ is a probability measure on the range
space $S$ of $\xi$, called the distribution or law of $\xi$. We shall also use the
term distribution as synonymous with probability measure, even when no
generating random element has been introduced.
Random elements are of interest in a wide variety of spaces. A random
element in $S$ is called a random variable when $S = \mathbb{R}$, a random vector
when $S = \mathbb{R}^d$, a random sequence when $S = \mathbb{R}^\infty$, a random or stochastic
process when $S$ is a function space, and a random measure or set when $S$
is a class of measures or sets, respectively. A metric or topological space
$S$ will be endowed with its Borel $\sigma$-field $\mathcal{B}(S)$ unless a $\sigma$-field is otherwise
specified. For any separable metric space $S$, it is clear from Lemma 1.2
that $\xi = (\xi_1, \xi_2, \ldots)$ is a random element in $S^\infty$ iff $\xi_1, \xi_2, \ldots$ are random
elements in $S$.
If $(S, \mathcal{S})$ is a measurable space, then any subset $A \subset S$ becomes a measurable
space in its own right when endowed with the $\sigma$-field $A \cap \mathcal{S} = \{A \cap B;\ B \in \mathcal{S}\}$.
By Lemma 1.6 we note in particular that if $S$ is a metric space
with Borel $\sigma$-field $\mathcal{S}$, then $A \cap \mathcal{S}$ is the Borel $\sigma$-field in $A$. Any random
element in $(A, A \cap \mathcal{S})$ may clearly be regarded, alternatively, as a random
element in $S$. Conversely, if $\xi$ is a random element in $S$ such that $\xi \in A$
a.s. (almost surely or with probability 1) for some $A \in \mathcal{S}$, then $\xi = \eta$ a.s.
for some random element $\eta$ in $A$.
Fixing a measurable space $(S, \mathcal{S})$ and an abstract index set $T$, we shall
write $S^T$ for the class of functions $f: T \to S$, and let $\mathcal{S}^T$ denote the $\sigma$-field
in $S^T$ generated by all evaluation maps $\pi_t: S^T \to S$, $t \in T$, given by
$\pi_t f = f(t)$. If $X: \Omega \to U \subset S^T$, then clearly $X_t = \pi_t \circ X$ maps $\Omega$ into $S$.
Thus, $X$ may also be regarded as a function $X(t, \omega) = X_t(\omega)$ from $T \times \Omega$
to $S$.
Lemma 3.1 (measurability) Fix a measurable space $(S, \mathcal{S})$, an index set
$T$, and a subset $U \subset S^T$. Then a function $X: \Omega \to U$ is $U \cap \mathcal{S}^T$-measurable
iff $X_t: \Omega \to S$ is $\mathcal{S}$-measurable for every $t \in T$.
Proof: Since $X$ is $U$-valued, the $U \cap \mathcal{S}^T$-measurability is equivalent to
measurability with respect to $\mathcal{S}^T$. The result now follows by Lemma 1.4
from the fact that $\mathcal{S}^T$ is generated by the mappings $\pi_t$. □
A mapping $X$ with the properties in Lemma 3.1 is called an $S$-valued
(random) process on $T$ with paths in $U$. By the lemma it is equivalent to
regard $X$ as a collection of random elements $X_t$ in the state space $S$.
For any random elements $\xi$ and $\eta$ in a common measurable space, the
equality $\xi \overset{d}{=} \eta$ means that $\xi$ and $\eta$ have the same distribution, or $\mathcal{L}(\xi) = \mathcal{L}(\eta)$.
If $X$ is a random process on some index set $T$, the associated finite-dimensional
distributions are given by
$$\mu_{t_1, \ldots, t_n} = \mathcal{L}(X_{t_1}, \ldots, X_{t_n}), \qquad t_1, \ldots, t_n \in T,\ n \in \mathbb{N}.$$
The following result shows that the distribution of a process is determined
by the set of finite-dimensional distributions.
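Proposition 3.2 (finite-dimensional distributions) Fix any $S$, $T$, and $U$
as in Lemma 3.1, and let $X$ and $Y$ be processes on $T$ with paths in $U$. Then
$X \overset{d}{=} Y$ iff
$$(X_{t_1}, \ldots, X_{t_n}) \overset{d}{=} (Y_{t_1}, \ldots, Y_{t_n}), \qquad t_1, \ldots, t_n \in T,\ n \in \mathbb{N}. \tag{3}$$
Proof: Assume (3). Let $\mathcal{D}$ denote the class of sets $A \in \mathcal{S}^T$ with $P\{X \in A\} = P\{Y \in A\}$,
and let $\mathcal{C}$ consist of all sets
$$A = \{f \in S^T;\ (f_{t_1}, \ldots, f_{t_n}) \in B\}, \qquad t_1, \ldots, t_n \in T,\ B \in \mathcal{S}^n,\ n \in \mathbb{N}.$$
Then $\mathcal{C}$ is a $\pi$-system and $\mathcal{D}$ a $\lambda$-system, and furthermore $\mathcal{C} \subset \mathcal{D}$ by
hypothesis. Hence, $\mathcal{S}^T = \sigma(\mathcal{C}) \subset \mathcal{D}$ by Theorem 1.1, which means that
$X \overset{d}{=} Y$. □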
For any random vector $\xi = (\xi_1, \ldots, \xi_d)$ in $\mathbb{R}^d$, we define the associated
distribution function $F$ by
$$F(x_1, \ldots, x_d) = P\bigcap_{k \le d} \{\xi_k \le x_k\}, \qquad x_1, \ldots, x_d \in \mathbb{R}.$$
The next result shows that $F$ determines the distribution of $\xi$.
Lemma 3.3 (distribution functions) Let $\xi$ and $\eta$ be random vectors in $\mathbb{R}^d$
with distribution functions $F$ and $G$. Then $\xi \overset{d}{=} \eta$ iff $F = G$.
Proof: Use Theorem 1.1. □
The expected value, expectation, or mean of a random variable $\xi$ is defined
as
$$E\xi = \int_\Omega \xi\, dP = \int_{\mathbb{R}} x\, (P \circ \xi^{-1})(dx) \tag{4}$$
whenever either integral exists. The last equality then holds by Lemma
1.22. By the same result we note that, for any random element $\xi$ in some
measurable space $S$ and for an arbitrary measurable function $f: S \to \mathbb{R}$,
$$Ef(\xi) = \int_\Omega f(\xi)\, dP = \int_S f(s)\, (P \circ \xi^{-1})(ds) = \int_{\mathbb{R}} x\, (P \circ (f \circ \xi)^{-1})(dx), \tag{5}$$
provided that at least one of the three integrals exists. Integrals over a
measurable subset $A \subset \Omega$ are often denoted by
$$E[\xi; A] = E(\xi 1_A) = \int_A \xi\, dP, \qquad A \in \mathcal{A}.$$
For any random variable $\xi$ and constant $p > 0$, the integral $E|\xi|^p = \|\xi\|_p^p$
is called the $p$th absolute moment of $\xi$. By Hölder's inequality (or by
Jensen's inequality in Lemma 3.5) we have $\|\xi\|_p \le \|\xi\|_q$ for $p \le q$, so the
corresponding $L^p$-spaces are nonincreasing in $p$. If $\xi \in L^p$ and either $p \in \mathbb{N}$
or $\xi \ge 0$, we may further define the $p$th moment of $\xi$ as $E\xi^p$.
The following result gives a useful relationship between moments and
tail probabilities.
Lemma 3.4 (moments and tails) For any random variable $\xi \ge 0$,
$$E\xi^p = p\int_0^\infty P\{\xi > t\}\, t^{p-1}\, dt = p\int_0^\infty P\{\xi \ge t\}\, t^{p-1}\, dt, \qquad p > 0.$$
Proof: By calculus and Fubini's theorem,
$$E\xi^p = pE\int_0^\xi t^{p-1}\, dt = pE\int_0^\infty 1\{\xi > t\}\, t^{p-1}\, dt = p\int_0^\infty P\{\xi > t\}\, t^{p-1}\, dt.$$
The proof of the second expression is similar. □
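The identity in Lemma 3.4 can be checked exactly for a nonnegative integer-valued random variable, since the tail probability is then constant on each interval $[k, k+1)$ and the integral reduces to a finite sum. The following is a minimal sketch under that assumption; the function names and the fair-die example are illustrative choices, not part of the text.

```python
from fractions import Fraction

def moment_direct(pmf, p):
    """E xi^p computed directly from the probability mass function."""
    return sum(prob * Fraction(k) ** p for k, prob in pmf.items())

def moment_from_tails(pmf, p):
    """p * integral_0^inf P{xi > t} t^(p-1) dt for integer-valued xi >= 0.

    On each interval [k, k+1) the tail P{xi > t} equals P{xi > k}, and
    the integral of p t^(p-1) over that interval is (k+1)^p - k^p.
    """
    total = Fraction(0)
    for k in range(max(pmf)):
        tail = sum(prob for j, prob in pmf.items() if j > k)
        total += tail * (Fraction(k + 1) ** p - Fraction(k) ** p)
    return total

# Hypothetical example: a fair six-sided die, with integer exponents p.
die = {k: Fraction(1, 6) for k in range(1, 7)}
for p in (1, 2, 3):
    print(p, moment_direct(die, p), moment_from_tails(die, p))
```

Exact rational arithmetic is used so that the two sides agree identically rather than up to rounding.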
A random vector $\xi = (\xi_1, \ldots, \xi_d)$ or process $X = (X_t)$ is said to be
integrable if integrability holds for every component $\xi_k$ or value $X_t$, in
which case we may write $E\xi = (E\xi_1, \ldots, E\xi_d)$ or $EX = (EX_t)$. Recall
that a function $f: \mathbb{R}^d \to \mathbb{R}$ is said to be convex if
$$f(px + (1 - p)y) \le pf(x) + (1 - p)f(y), \qquad x, y \in \mathbb{R}^d,\ p \in [0, 1]. \tag{6}$$
The relation may be written as $f(E\xi) \le Ef(\xi)$, where $\xi$ is a random
vector in $\mathbb{R}^d$ with $P\{\xi = x\} = 1 - P\{\xi = y\} = p$. The following extension
to arbitrary integrable random vectors is known as Jensen's inequality.
Lemma 3.5 (convex maps, Hölder, Jensen) For any integrable random
vector $\xi$ in $\mathbb{R}^d$ and convex function $f: \mathbb{R}^d \to \mathbb{R}$, we have
$$Ef(\xi) \ge f(E\xi).$$
Proof: By a version of the Hahn-Banach theorem, the convexity condition
(6) is equivalent to the existence for every $s \in \mathbb{R}^d$ of a supporting affine
function $h_s(x) = ax + b$ with $f \ge h_s$ and $f(s) = h_s(s)$. Taking $s = E\xi$
gives
$$Ef(\xi) \ge Eh_s(\xi) = h_s(E\xi) = f(E\xi).$$
□
The covariance of two random variables $\xi, \eta \in L^2$ is given by
$$\mathrm{cov}(\xi, \eta) = E(\xi - E\xi)(\eta - E\eta) = E\xi\eta - E\xi \cdot E\eta.$$
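The resulting functional is bilinear, in the sense that
$$\mathrm{cov}\Big(\sum_{j \le m} a_j \xi_j,\ \sum_{k \le n} b_k \eta_k\Big) = \sum_{j \le m} \sum_{k \le n} a_j b_k\, \mathrm{cov}(\xi_j, \eta_k).$$
Taking $\xi = \eta \in L^2$ yields the variance
$$\mathrm{var}(\xi) = \mathrm{cov}(\xi, \xi) = E(\xi - E\xi)^2 = E\xi^2 - (E\xi)^2,$$
and we note that, by the Cauchy-Buniakovsky inequality,
$$|\mathrm{cov}(\xi, \eta)| \le \{\mathrm{var}(\xi)\, \mathrm{var}(\eta)\}^{1/2}.$$
Two random variables $\xi$ and $\eta$ are said to be uncorrelated if $\mathrm{cov}(\xi, \eta) = 0$.
For any collection of random variables $\xi_t \in L^2$, $t \in T$, the associated
covariance function $\rho_{s,t} = \mathrm{cov}(\xi_s, \xi_t)$, $s, t \in T$, is nonnegative definite,
in the sense that $\sum_{i,j} a_i a_j \rho_{t_i, t_j} \ge 0$ for any $n \in \mathbb{N}$, $t_1, \ldots, t_n \in T$, and
$a_1, \ldots, a_n \in \mathbb{R}$. This is clear if we write
$$\sum_{i,j} a_i a_j \rho_{t_i, t_j} = \sum_{i,j} a_i a_j\, \mathrm{cov}(\xi_{t_i}, \xi_{t_j}) = \mathrm{var}\Big\{\sum_i a_i \xi_{t_i}\Big\} \ge 0.$$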
The events $A_t \in \mathcal{A}$, $t \in T$, are said to be (mutually) independent if, for
any distinct indices $t_1, \ldots, t_n \in T$,
$$P\bigcap_{k \le n} A_{t_k} = \prod_{k \le n} PA_{t_k}. \tag{7}$$
More generally, we say that the families $\mathcal{C}_t \subset \mathcal{A}$, $t \in T$, are independent
if independence holds between the events $A_t$ for arbitrary $A_t \in \mathcal{C}_t$, $t \in T$.
Finally, the random elements $\xi_t$, $t \in T$, are independent if independence
holds between the generated $\sigma$-fields $\sigma(\xi_t)$, $t \in T$. Pairwise independence
between two objects $A$ and $B$, $\xi$ and $\eta$, or $\mathcal{B}$ and $\mathcal{C}$ is often denoted by
$A \perp\!\!\!\perp B$, $\xi \perp\!\!\!\perp \eta$, or $\mathcal{B} \perp\!\!\!\perp \mathcal{C}$, respectively.
The following result is often useful to prove extensions of the independence
property.
Lemma 3.6 (extension) If the $\pi$-systems $\mathcal{C}_t$, $t \in T$, are independent, then
so are the generated $\sigma$-fields $\mathcal{F}_t = \sigma(\mathcal{C}_t)$, $t \in T$.
Proof: We may clearly assume that $\mathcal{C}_t \ne \emptyset$ for all $t$. Fix any distinct
indices $t_1, \ldots, t_n \in T$, and note that (7) holds for arbitrary $A_{t_k} \in \mathcal{C}_{t_k}$,
$k = 1, \ldots, n$. For fixed $A_{t_2}, \ldots, A_{t_n}$, we introduce the class $\mathcal{D}$ of sets $A_{t_1} \in \mathcal{A}$
satisfying (7). Then $\mathcal{D}$ is a $\lambda$-system containing $\mathcal{C}_{t_1}$, and so
$\mathcal{D} \supset \sigma(\mathcal{C}_{t_1}) = \mathcal{F}_{t_1}$ by Theorem 1.1. Thus, (7) holds for arbitrary
$A_{t_1} \in \mathcal{F}_{t_1}$ and $A_{t_k} \in \mathcal{C}_{t_k}$, $k = 2, \ldots, n$. Proceeding recursively in $n$ steps,
we obtain the desired extension to arbitrary $A_{t_k} \in \mathcal{F}_{t_k}$, $k = 1, \ldots, n$. □
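As an immediate consequence, we obtain the following basic grouping
property. Here and in the sequel we shall often write $\mathcal{F} \vee \mathcal{G} = \sigma\{\mathcal{F}, \mathcal{G}\}$ and
$\mathcal{F}_S = \bigvee_{t \in S} \mathcal{F}_t = \sigma\{\mathcal{F}_t;\ t \in S\}$.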
Corollary 3.7 (grouping) Let $\mathcal{F}_t$, $t \in T$, be independent $\sigma$-fields, and let
$\mathcal{T}$ be a disjoint partition of $T$. Then the $\sigma$-fields $\mathcal{F}_S = \bigvee_{t \in S} \mathcal{F}_t$, $S \in \mathcal{T}$, are
again independent.
Proof: For any $S \in \mathcal{T}$, let $\mathcal{C}_S$ denote the class of all finite intersections
of sets in $\bigcup_{t \in S} \mathcal{F}_t$. Then the classes $\mathcal{C}_S$ are independent $\pi$-systems, and by
Lemma 3.6 the independence extends to the generated $\sigma$-fields $\mathcal{F}_S$. □
Though independence between more than two $\sigma$-fields is clearly stronger
than pairwise independence, we shall see how the full independence may
be reduced to the pairwise notion in various ways. Given any set $T$, we say
that a class $\mathcal{T} \subset 2^T$ is separating, if for any $s \ne t$ in $T$ there exists some
$S \in \mathcal{T}$ such that exactly one of the elements $s$ and $t$ lies in $S$.
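Lemma 3.8 (pairwise independence)
(i) The $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ are independent iff $\bigvee_{k \le n} \mathcal{F}_k \perp\!\!\!\perp \mathcal{F}_{n+1}$ for all
$n$.
(ii) The $\sigma$-fields $\mathcal{F}_t$, $t \in T$, are independent iff $\mathcal{F}_S \perp\!\!\!\perp \mathcal{F}_{S^c}$ for all sets $S$ in
some separating class $\mathcal{T} \subset 2^T$.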
Proof: The necessity of the two conditions follows from Corollary 3.7.
As for the sufficiency, we consider only part (ii), the proof for (i) being
similar. Under the stated condition, we need to show that, for any finite
subset $S \subset T$, the $\sigma$-fields $\mathcal{F}_s$, $s \in S$, are independent. Let $|S|$ denote
the cardinality of $S$, and assume the statement to be true for $|S| \le n$.
Proceeding to the case when $|S| = n + 1$, we may choose $U \in \mathcal{T}$ such that
$S' = S \cap U$ and $S'' = S \setminus U$ are nonempty. Since $\mathcal{F}_{S'} \perp\!\!\!\perp \mathcal{F}_{S''}$, we get for any
sets $A_s \in \mathcal{F}_s$, $s \in S$,
$$P\bigcap_{s \in S} A_s = \Big(P\bigcap_{s \in S'} A_s\Big)\Big(P\bigcap_{s \in S''} A_s\Big) = \prod_{s \in S} PA_s,$$
where the last relation follows from the induction hypothesis. □
A $\sigma$-field $\mathcal{F}$ is said to be $P$-trivial if $PA = 0$ or 1 for every $A \in \mathcal{F}$. We
further say that a random element is a.s. degenerate if its distribution is a
degenerate probability measure.
Lemma 3.9 (triviality and degeneracy) A $\sigma$-field $\mathcal{F}$ is $P$-trivial iff $\mathcal{F} \perp\!\!\!\perp \mathcal{F}$.
In that case, any $\mathcal{F}$-measurable random element $\xi$ taking values in a
separable metric space is a.s. degenerate.
Proof: If $\mathcal{F} \perp\!\!\!\perp \mathcal{F}$, then for any $A \in \mathcal{F}$ we have $PA = P(A \cap A) = (PA)^2$,
and so $PA = 0$ or 1. Conversely, assume that $\mathcal{F}$ is $P$-trivial. Then for any
two sets $A, B \in \mathcal{F}$ we have $P(A \cap B) = PA \wedge PB = PA \cdot PB$, which means
that $\mathcal{F} \perp\!\!\!\perp \mathcal{F}$.
Now assume that $\mathcal{F}$ is $P$-trivial, and let $\xi$ be as stated. For each $n$ we
may partition $S$ into countably many disjoint Borel sets $B_{nj}$ of diameter
$< n^{-1}$. Since $P\{\xi \in B_{nj}\} = 0$ or 1, we have $\xi \in B_{nj}$ a.s. for exactly one $j$,
say for $j = j_n$. Hence, $\xi \in \bigcap_n B_{n, j_n}$ a.s. The latter set has diameter 0, so
it consists of exactly one point $s$, and we get $\xi = s$ a.s. □
The next result gives the basic relation between independence and
product measures.
Lemma 3.10 (product measures) Let $\xi_1, \ldots, \xi_n$ be random elements in
some measurable spaces $S_1, \ldots, S_n$ with distributions $\mu_1, \ldots, \mu_n$. Then the
$\xi_k$ are independent iff $\xi = (\xi_1, \ldots, \xi_n)$ has distribution $\mu_1 \otimes \cdots \otimes \mu_n$.
Proof: Assuming the independence, we get for any measurable product
set $B = B_1 \times \cdots \times B_n$
$$P\{\xi \in B\} = \prod_{k \le n} P\{\xi_k \in B_k\} = \prod_{k \le n} \mu_k B_k = \bigotimes_{k \le n} \mu_k B.$$
This extends by Theorem 1.1 to arbitrary sets in the product $\sigma$-field. □
In conjunction with Fubini's theorem, the last result leads to a useful
method of computing expected values.
Lemma 3.11 (conditioning) Let $\xi$ and $\eta$ be independent random elements
in some measurable spaces $S$ and $T$, and let the function $f: S \times T \to \mathbb{R}$ be
measurable with $E(E|f(s, \eta)|)_{s = \xi} < \infty$. Then $Ef(\xi, \eta) = E(Ef(s, \eta))_{s = \xi}$.
Proof: Let $\mu$ and $\nu$ denote the distributions of $\xi$ and $\eta$, respectively.
Assuming that $f \ge 0$ and writing $g(s) = Ef(s, \eta)$, we get, by Lemma 1.22
and Fubini's theorem,
$$Ef(\xi, \eta) = \int f(s, t)\, (\mu \otimes \nu)(ds\, dt) = \int \mu(ds) \int f(s, t)\, \nu(dt) = \int g(s)\, \mu(ds) = Eg(\xi).$$
For general $f$, this applies to the function $|f|$, and so $E|f(\xi, \eta)| < \infty$. The
desired relation then follows as before. □
In particular, for any independent random variables $\xi_1, \ldots, \xi_n$, we have
$$E\prod_k \xi_k = \prod_k E\xi_k, \qquad \mathrm{var}\sum_k \xi_k = \sum_k \mathrm{var}\, \xi_k,$$
whenever the expressions on the right exist.
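The product and variance rules can be verified exactly on a small finite example, by building the product law of Lemma 3.10 and computing both sides. This is only an illustrative sketch; the two laws below (a fair die and a biased coin) and all function names are assumptions made for the example.

```python
from fractions import Fraction
from itertools import product

def expect(pmf, g):
    """E g(X) for a finitely supported law given as {value: probability}."""
    return sum(prob * g(x) for x, prob in pmf.items())

def variance(pmf, g):
    """var g(X) = E (g(X) - E g(X))^2."""
    m = expect(pmf, g)
    return expect(pmf, lambda x: (g(x) - m) ** 2)

def product_law(mu, nu):
    """Joint law mu (x) nu of an independent pair, as in Lemma 3.10."""
    return {(x, y): p * q for (x, p), (y, q) in product(mu.items(), nu.items())}

mu = {k: Fraction(1, 6) for k in range(1, 7)}   # hypothetical fair die
nu = {0: Fraction(1, 4), 1: Fraction(3, 4)}     # hypothetical biased coin
joint = product_law(mu, nu)

lhs_mean = expect(joint, lambda z: z[0] * z[1])
rhs_mean = expect(mu, lambda x: x) * expect(nu, lambda y: y)
lhs_var = variance(joint, lambda z: z[0] + z[1])
rhs_var = variance(mu, lambda x: x) + variance(nu, lambda y: y)
print(lhs_mean, rhs_mean, lhs_var, rhs_var)
```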
If $\xi$ and $\eta$ are random elements in a measurable group $G$, then the product
$\xi\eta$ is again a random element in $G$. The following result gives the connection
between independence and the convolutions of Lemma 1.28.
Corollary 3.12 (convolution) Let $\xi$ and $\eta$ be independent random elements
with distributions $\mu$ and $\nu$, respectively, in some measurable group
$G$. Then the product $\xi\eta$ has distribution $\mu * \nu$.
Proof: For any measurable set $B \subset G$, we get by Lemma 3.10 and the
definition of convolution
$$P\{\xi\eta \in B\} = (\mu \otimes \nu)\{(x, y) \in G^2;\ xy \in B\} = (\mu * \nu)B.$$
□
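For a concrete instance of Corollary 3.12, take $G$ to be the integers under addition, so that the group operation is a sum and $\mu * \nu$ is the familiar discrete convolution. The sketch below computes the law of the sum of two independent fair dice; the choice of group, the dice, and the function name are assumptions for illustration only.

```python
from fractions import Fraction
from collections import defaultdict

def convolve(mu, nu):
    """Law of xi + eta for independent xi ~ mu and eta ~ nu on the integers.

    Implements (mu * nu){z} = sum over pairs (x, y) with x + y = z
    of mu{x} nu{y}.
    """
    out = defaultdict(Fraction)          # missing keys start at Fraction(0)
    for x, p in mu.items():
        for y, q in nu.items():
            out[x + y] += p * q
    return dict(out)

die = {k: Fraction(1, 6) for k in range(1, 7)}
two_dice = convolve(die, die)
print(two_dice[7])   # mass at 7, the most likely total
```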
Given any sequence of $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$, we introduce the associated
tail $\sigma$-field
$$\mathcal{T} = \bigcap_n \bigvee_{k > n} \mathcal{F}_k = \bigcap_n \sigma\{\mathcal{F}_k;\ k > n\}.$$
The following remarkable result shows that $\mathcal{T}$ is trivial whenever the $\mathcal{F}_n$
are independent. An extension appears in Corollary 7.25.
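Theorem 3.13 (Kolmogorov's 0-1 law) Let $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be independent
$\sigma$-fields. Then the tail $\sigma$-field $\mathcal{T} = \bigcap_n \bigvee_{k > n} \mathcal{F}_k$ is $P$-trivial.
Proof: For each $n \in \mathbb{N}$, define $\mathcal{T}_n = \bigvee_{k > n} \mathcal{F}_k$, and note that
$\mathcal{F}_1, \ldots, \mathcal{F}_n, \mathcal{T}_n$ are independent by Corollary 3.7. Hence, so are the $\sigma$-fields
$\mathcal{F}_1, \ldots, \mathcal{F}_n, \mathcal{T}$, and then also $\mathcal{F}_1, \mathcal{F}_2, \ldots, \mathcal{T}$. By the same corollary we obtain
$\mathcal{T}_0 \perp\!\!\!\perp \mathcal{T}$, and so $\mathcal{T} \perp\!\!\!\perp \mathcal{T}$. Thus, $\mathcal{T}$ is $P$-trivial by Lemma 3.9. □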
We shall consider some simple illustrations of the last theorem.
Corollary 3.14 (sums and averages) Let $\xi_1, \xi_2, \ldots$ be independent random
variables, and put $S_n = \xi_1 + \cdots + \xi_n$. Then each of the sequences
$(S_n)$ and $(S_n/n)$ is either a.s. convergent or a.s. divergent. For the latter
sequence, the possible limit is a.s. degenerate.
Proof: Define $\mathcal{F}_n = \sigma\{\xi_n\}$, $n \in \mathbb{N}$, and note that the associated tail $\sigma$-field
$\mathcal{T}$ is $P$-trivial by Theorem 3.13. Since the sets of convergence of $(S_n)$
and $(S_n/n)$ are $\mathcal{T}$-measurable by Lemma 1.9, the first assertion follows.
The second assertion is obtained from Lemma 3.9. □
By a finite permutation of $\mathbb{N}$ we mean a bijective map $p: \mathbb{N} \to \mathbb{N}$ such that
$p_n = n$ for all but finitely many $n$. For any space $S$, a finite permutation $p$
of $\mathbb{N}$ induces a permutation $T_p$ on $S^\infty$ given by
$$T_p(s) = s \circ p = (s_{p_1}, s_{p_2}, \ldots), \qquad s = (s_1, s_2, \ldots) \in S^\infty.$$
A set $I \subset S^\infty$ is said to be symmetric (under finite permutations) if
$$T_p^{-1} I = \{s \in S^\infty;\ s \circ p \in I\} = I$$
for every finite permutation $p$ of $\mathbb{N}$. If $(S, \mathcal{S})$ is a measurable space, the
symmetric sets $I \in \mathcal{S}^\infty$ form a sub-$\sigma$-field $\mathcal{I} \subset \mathcal{S}^\infty$, called the permutation
invariant $\sigma$-field in $S^\infty$.
We may now state the other basic 0-1 law, which refers to sequences of
random elements that are independent and identically distributed (often
abbreviated as i.i.d.).
Theorem 3.15 (Hewitt-Savage 0-1 law) Let $\xi$ be an infinite sequence of
i.i.d. random elements in some measurable space $(S, \mathcal{S})$, and let $\mathcal{I}$ denote
the permutation invariant $\sigma$-field in $S^\infty$. Then the $\sigma$-field $\xi^{-1}\mathcal{I}$ is $P$-trivial.
Our proof is based on a simple approximation. Write
$$A \triangle B = (A \setminus B) \cup (B \setminus A),$$
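and note that
$$P(A \triangle B) = P(A^c \triangle B^c) = E|1_A - 1_B|, \qquad A, B \in \mathcal{A}. \tag{8}$$
Lemma 3.16 (approximation) Given any $\sigma$-fields $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \cdots$ and a
set $A \in \bigvee_n \mathcal{F}_n$, there exist some $A_1, A_2, \ldots \in \bigcup_n \mathcal{F}_n$ with $P(A \triangle A_n) \to 0$.
Proof: Define $\mathcal{C} = \bigcup_n \mathcal{F}_n$, and let $\mathcal{D}$ denote the class of sets $A \in \bigvee_n \mathcal{F}_n$
with the stated property. Then $\mathcal{C}$ is a $\pi$-system and $\mathcal{D}$ a $\lambda$-system containing
$\mathcal{C}$. By Theorem 1.1 we get $\bigvee_n \mathcal{F}_n = \sigma(\mathcal{C}) \subset \mathcal{D}$. □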
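Proof of Theorem 3.15: Define $\mu = \mathcal{L}(\xi)$, put $\mathcal{F}_n = \mathcal{S}^n \times S^\infty$, and note
that $\mathcal{I} \subset \mathcal{S}^\infty = \bigvee_n \mathcal{F}_n$. For any $I \in \mathcal{I}$ there exist by Lemma 3.16 some
$B_n \in \mathcal{S}^n$ such that the corresponding cylinder sets $I_n = B_n \times S^\infty$ satisfy
$\mu(I \triangle I_n) \to 0$. Writing $\tilde{I}_n = S^n \times B_n \times S^\infty$, it is clear from the symmetry
of $\mu$ and $I$ that $\mu\tilde{I}_n = \mu I_n \to \mu I$ and $\mu(I \triangle \tilde{I}_n) = \mu(I \triangle I_n) \to 0$. Hence, by
(8),
$$\mu(I \triangle (I_n \cap \tilde{I}_n)) \le \mu(I \triangle I_n) + \mu(I \triangle \tilde{I}_n) \to 0.$$
Since moreover $I_n \perp\!\!\!\perp \tilde{I}_n$ under $\mu$, we get
$$\mu I \leftarrow \mu(I_n \cap \tilde{I}_n) = (\mu I_n)(\mu\tilde{I}_n) \to (\mu I)^2.$$
Thus, $\mu I = (\mu I)^2$, and so $P \circ \xi^{-1} I = \mu I = 0$ or 1. □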
The next result lists some typical applications. Say that a random
variable $\xi$ is symmetric if $\xi \overset{d}{=} -\xi$.
Corollary 3.17 (random walk) Let $\xi_1, \xi_2, \ldots$ be i.i.d., nondegenerate
random variables, and put $S_n = \xi_1 + \cdots + \xi_n$. Then
(i) $P\{S_n \in B \text{ i.o.}\} = 0$ or 1 for any $B \in \mathcal{B}$;
(ii) $\limsup_n S_n = \infty$ a.s. or $-\infty$ a.s.;
(iii) $\limsup_n (\pm S_n) = \infty$ a.s. if the $\xi_n$ are symmetric.
Proof: Statement (i) is immediate from Theorem 3.15, since for any finite
permutation $p$ of $\mathbb{N}$ we have $x_{p_1} + \cdots + x_{p_n} = x_1 + \cdots + x_n$ for all but
finitely many $n$. To prove (ii), conclude from Theorem 3.15 and Lemma 3.9
that $\limsup_n S_n = c$ a.s. for some constant $c \in \overline{\mathbb{R}} = [-\infty, \infty]$. Hence, a.s.,
$$c = \limsup_n S_{n+1} = \limsup_n (S_{n+1} - \xi_1) + \xi_1 = c + \xi_1.$$
If $|c| < \infty$, we get $\xi_1 = 0$ a.s., which contradicts the nondegeneracy of $\xi_1$.
Thus, $|c| = \infty$. In case (iii), we have
$$c = \limsup_n S_n \ge \liminf_n S_n = -\limsup_n (-S_n) = -c,$$
and so $-c \le c \in \{\pm\infty\}$, which implies $c = \infty$. □
Using a suitable zero-one law, one can often rather easily see that a
given event has probability zero or one. Determining which alternative ac-
tually occurs is often harder. The following classical result, known as the
Borel-Cantelli lemma, may then be helpful, especially when the events are
independent. An extension to the general case appears in Corollary 7.20.
Theorem 3.18 (Borel, Cantelli) Let $A_1, A_2, \ldots \in \mathcal{A}$. Then $\sum_n PA_n < \infty$
implies $P\{A_n \text{ i.o.}\} = 0$, and the two conditions are equivalent when the
$A_n$ are independent.
Here the first assertion was proved earlier as an application of Fatou's
lemma. The use of expected values allows a more transparent argument.
Proof: If $\sum_n PA_n < \infty$, we get by monotone convergence
$$E\sum_n 1_{A_n} = \sum_n E1_{A_n} = \sum_n PA_n < \infty.$$
Thus, $\sum_n 1_{A_n} < \infty$ a.s., which means that $P\{A_n \text{ i.o.}\} = 0$.
Next assume that the $A_n$ are independent and satisfy $\sum_n PA_n = \infty$.
Noting that $1 - x \le e^{-x}$ for all $x$, we get
$$P\bigcup_{k \ge n} A_k = 1 - P\bigcap_{k \ge n} A_k^c = 1 - \prod_{k \ge n} PA_k^c = 1 - \prod_{k \ge n} (1 - PA_k)$$
$$\ge 1 - \prod_{k \ge n} \exp(-PA_k) = 1 - \exp\Big\{-\sum_{k \ge n} PA_k\Big\} = 1.$$
Hence, as $n \to \infty$,
$$1 = P\bigcup_{k \ge n} A_k \downarrow P\bigcap_n \bigcup_{k \ge n} A_k = P\{A_n \text{ i.o.}\},$$
and so the probability on the right equals 1. □
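The dichotomy in Theorem 3.18 is visible already in the finite products appearing in the proof. For independent events, $P\bigcup_{n \le k \le N} A_k = 1 - \prod_{n \le k \le N}(1 - PA_k)$, which can be computed exactly. The sketch below compares the hypothetical choices $PA_k = 1/k$ (divergent sum) and $PA_k = 1/k^2$ (convergent sum); the function names are illustrative.

```python
from fractions import Fraction

def prob_union(probs):
    """P(union of independent events) = 1 - prod(1 - p_k)."""
    miss = Fraction(1)
    for p in probs:
        miss *= 1 - p
    return 1 - miss

def tail_union(p_of_k, n, N):
    """P(A_n u ... u A_N) for independent events with P A_k = p_of_k(k)."""
    return prob_union(p_of_k(k) for k in range(n, N + 1))

def harmonic(k):
    return Fraction(1, k)        # sum over k diverges

def squares(k):
    return Fraction(1, k * k)    # sum over k converges

# With n = 10 fixed, the first value tends to 1 as N grows,
# while the second stays below the tail sum of 1/k^2 from k = 10.
print(float(tail_union(harmonic, 10, 1000)))
print(float(tail_union(squares, 10, 1000)))
```

In the divergent case the product telescopes to $(n-1)/N$, so the union probability tends to 1 for every fixed $n$, in line with $P\{A_n \text{ i.o.}\} = 1$; in the convergent case it remains small and decreases to 0 as $n$ grows.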
For many purposes it is sufficient to use the Lebesgue unit interval $([0, 1],
\mathcal{B}[0, 1], \lambda)$ as the basic probability space. In particular, the following result
ensures the existence on $[0, 1]$ of some independent random variables
$\xi_1, \xi_2, \ldots$ with arbitrarily prescribed distributions. The present statement
is only preliminary. Thus, we shall remove the independence assumption in
Theorem 6.14, prove an extension to arbitrary index sets in Theorem 6.16,
and eliminate the restriction on the spaces in Theorem 6.17.
Theorem 3.19 (existence, Borel) For any probability measures $\mu_1, \mu_2, \ldots$
on some Borel spaces $S_1, S_2, \ldots$, there exist some independent random
elements $\xi_1, \xi_2, \ldots$ on $([0, 1], \lambda)$ with distributions $\mu_1, \mu_2, \ldots$.
As a consequence, there exists a probability measure $\mu$ on $S_1 \times S_2 \times \cdots$
satisfying
$$\mu \circ (\pi_1, \ldots, \pi_n)^{-1} = \mu_1 \otimes \cdots \otimes \mu_n, \qquad n \in \mathbb{N}.$$
For the proof, we first consider two special cases of independent interest.
By a Bernoulli sequence with rate $p$ we mean a sequence of i.i.d.
random variables $\xi_1, \xi_2, \ldots$ such that $P\{\xi_n = 1\} = 1 - P\{\xi_n = 0\} = p$.
Furthermore, we say that a random variable $\vartheta$ is uniformly distributed
on $[0, 1]$ (written as $U(0, 1)$) if its distribution $\mathcal{L}(\vartheta)$ equals Lebesgue
measure $\lambda$ on $[0, 1]$. Every number $x \in [0, 1]$ has a binary expansion
$r_1, r_2, \ldots \in \{0, 1\}$ satisfying $x = \sum_n r_n 2^{-n}$, and to ensure uniqueness we
assume that $\sum_n r_n = \infty$ when $x > 0$. The following result provides a simple
construction of a Bernoulli sequence on the Lebesgue unit interval.
Lemma 3.20 (Bernoulli sequence) Let $\vartheta$ be a random variable in $[0, 1]$
with binary expansion $\xi_1, \xi_2, \ldots$. Then $\vartheta$ is $U(0, 1)$ iff the $\xi_n$ form a
Bernoulli sequence with rate $\frac{1}{2}$.
Proof: If $\vartheta$ is $U(0, 1)$, then $P\bigcap_{j \le n} \{\xi_j = k_j\} = 2^{-n}$ for all $k_1, \ldots, k_n \in \{0, 1\}$.
Summing over $k_1, \ldots, k_{n-1}$ gives $P\{\xi_n = k\} = \frac{1}{2}$ for $k = 0$ and 1. A
similar calculation yields the asserted independence.
Now assume instead that the $\xi_n$ form a Bernoulli sequence with rate $\frac{1}{2}$.
Letting $\tilde\vartheta$ be $U(0, 1)$ with binary expansion $\tilde\xi_1, \tilde\xi_2, \ldots$, we get $(\xi_n) \overset{d}{=} (\tilde\xi_n)$.
Thus,
$$\vartheta = \sum_n \xi_n 2^{-n} \overset{d}{=} \sum_n \tilde\xi_n 2^{-n} = \tilde\vartheta.$$
□
The next result shows how a single $U(0, 1)$ random variable can be used
to generate a whole sequence.
Lemma 3.21 (reproduction) There exist some measurable functions $f_1, f_2, \ldots$
on $[0, 1]$ such that whenever $\vartheta$ is $U(0, 1)$, the random variables $\vartheta_n = f_n(\vartheta)$
are i.i.d. $U(0, 1)$.
Proof: For any $x \in [0, 1]$ we introduce the associated binary expansion
$g_1(x), g_2(x), \ldots$ and note that the $g_k$ are measurable. Rearranging the $g_k$
into a two-dimensional array $h_{nj}$, $n, j \in \mathbb{N}$, we define
$$f_n(x) = \sum_j 2^{-j} h_{nj}(x), \qquad x \in [0, 1],\ n \in \mathbb{N}.$$
By Lemma 3.20 the random variables $g_k(\vartheta)$ form a Bernoulli sequence with
rate $\frac{1}{2}$, and the same result shows that the variables $\vartheta_n = f_n(\vartheta)$ are $U(0, 1)$.
The latter are further independent by Corollary 3.7. □
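The digit-rearrangement idea behind Lemmas 3.20 and 3.21 can be sketched in a few lines. The version below is a simplification under stated assumptions: it uses only finitely many streams and a round-robin dealing of digits (rather than an infinite two-dimensional array), truncates each output to a fixed number of digits, and uses the terminating expansion for dyadic rationals, which differs from the convention $\sum_n r_n = \infty$ only on a Lebesgue null set. The function names are hypothetical.

```python
from fractions import Fraction

def binary_digits(x, count):
    """First `count` binary digits g_1(x), g_2(x), ... of x in [0, 1)."""
    digits = []
    for _ in range(count):
        x *= 2
        bit = int(x >= 1)
        digits.append(bit)
        x -= bit
    return digits

def split_uniform(x, streams, depth):
    """Deal the digits of x round-robin into `streams` new numbers.

    Digit number k (0-based) is assigned to stream k % streams, and
    stream n is reassembled as sum_j 2^-j h_nj, truncated to `depth`
    digits. Distinct streams use disjoint sets of digits.
    """
    g = binary_digits(x, streams * depth)
    return [
        sum(Fraction(g[n + streams * j], 2 ** (j + 1)) for j in range(depth))
        for n in range(streams)
    ]

# 45/64 = 0.101101 in binary: the even-position digits 1,1,0 and the
# odd-position digits 0,1,1 form the two output numbers.
print(split_uniform(Fraction(45, 64), streams=2, depth=3))
```

Because the streams are built from disjoint blocks of the Bernoulli digits, their independence is an instance of the grouping property in Corollary 3.7.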
Finally, we need to construct a random element with given distribution
from an arbitrary randomization variable. The required lemma is stated in
a version for kernels, to meet the needs of Chapters 6, 8, and 14.
Lemma 3.22 (kernels and randomization) Let $\mu$ be a probability kernel
from a measurable space $S$ to a Borel space $T$. Then there exists a measurable
function $f: S \times [0, 1] \to T$ such that if $\vartheta$ is $U(0, 1)$, then $f(s, \vartheta)$ has
distribution $\mu(s, \cdot)$ for every $s \in S$.
Proof: We may assume that $T$ is a Borel subset of $[0, 1]$, in which case
we may easily reduce to the case when $T = [0, 1]$. Define
$$f(s, t) = \sup\{x \in [0, 1];\ \mu(s, [0, x]) < t\}, \qquad s \in S,\ t \in [0, 1], \tag{9}$$
and note that $f$ is product measurable on $S \times [0, 1]$, since the set $\{(s, t);\
\mu(s, [0, x]) < t\}$ is measurable for each $x$ by Lemma 1.12, and the supremum
in (9) can be restricted to rational $x$. If $\vartheta$ is $U(0, 1)$, we get
$$P\{f(s, \vartheta) \le x\} = P\{\vartheta \le \mu(s, [0, x])\} = \mu(s, [0, x]), \qquad x \in [0, 1],$$
and so $f(s, \vartheta)$ has distribution $\mu(s, \cdot)$ by Lemma 3.3. □
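For a fixed $s$ and a finitely supported law on $[0, 1]$, the function in (9) is the usual generalized inverse of the distribution function, and it can be evaluated by a scan over the atoms. The following sketch makes those simplifying assumptions (no kernel parameter, finite support); the function name and the example law are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def quantile(atoms, t):
    """sup{x in [0,1]; mu[0,x] < t} for a finitely supported law mu.

    `atoms` maps points of [0,1] to their masses. For t > 0 the supremum
    equals the smallest atom x whose cumulative mass mu[0,x] is >= t.
    For t = 0 the set is empty; the smallest atom is returned by
    convention, which is harmless since P{theta = 0} = 0.
    """
    points = sorted(atoms)
    cumulative = accumulate(atoms[x] for x in points)
    for x, mass_up_to_x in zip(points, cumulative):
        if mass_up_to_x >= t:
            return x
    return points[-1]  # only reached if rounding leaves the total below t

# Hypothetical law with masses 1/2, 1/3, 1/6 at the points 1/4, 1/2, 1.
law = {Fraction(1, 4): Fraction(1, 2),
       Fraction(1, 2): Fraction(1, 3),
       Fraction(1): Fraction(1, 6)}

# Evaluate on an equally spaced grid of t-values standing in for theta.
grid = [Fraction(i, 600) for i in range(1, 601)]
counts = {x: sum(quantile(law, t) == x for t in grid) for x in law}
print(counts)
```

The proportion of grid points mapped to each atom matches its mass, reflecting that the set of $t$ mapped to an atom is an interval whose length is that mass.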
Proof of Theorem 3.19: By Lemma 3.22 there exist some measurable
functions $f_n: [0, 1] \to S_n$ such that $\lambda \circ f_n^{-1} = \mu_n$. Letting $\vartheta$ be the identity
mapping on $[0, 1]$ and choosing $\vartheta_1, \vartheta_2, \ldots$ as in Lemma 3.21, we note that
the functions $\xi_n = f_n(\vartheta_n)$, $n \in \mathbb{N}$, have the desired joint distribution. □
Next we consider the regularization and sample path properties of random
processes. Say that two processes $X$ and $Y$ on the same index set $T$
are versions of each other if $X_t = Y_t$ a.s. for each $t \in T$. In the special
case when $T = \mathbb{R}^d$ or $\mathbb{R}_+$, we note that two continuous or right-continuous
versions $X$ and $Y$ of the same process are indistinguishable, in the sense
that $X = Y$ a.s. In general, the latter notion is clearly stronger.
For any function $f$ between two metric spaces $(S, \rho)$ and $(S', \rho')$, the
associated modulus of continuity $w_f = w(f, \cdot)$ is given by
$$w_f(r) = \sup\{\rho'(f_s, f_t);\ s, t \in S,\ \rho(s, t) \le r\}, \qquad r > 0.$$
Note that $f$ is uniformly continuous iff $w_f(r) \to 0$ as $r \to 0$. Say that $f$ is
Hölder continuous with exponent $c$ if $w_f(r) \lesssim r^c$ as $r \to 0$. The property
is said to hold locally if it is true on every bounded set. (Here and in the
sequel, the relation $f \lesssim g$ between positive functions means that $f \le cg$
for some constant $c < \infty$.)
A simple moment condition ensures the existence of a Hölder-continuous
version of a given process on $\mathbb{R}^d$. Important applications are given in Theorems
13.5, 21.3, and 22.4, and a related tightness criterion appears in
Corollary 16.9.
Theorem 3.23 (moments and continuity, Kolmogorov, Loève, Chentsov)
Let $X$ be a process on $\mathbb{R}^d$ with values in a complete metric space $(S, \rho)$, and
assume for some $a, b > 0$ that
$$E\{\rho(X_s, X_t)\}^a \lesssim |s - t|^{d + b}, \qquad s, t \in \mathbb{R}^d. \tag{10}$$
Then $X$ has a continuous version, and the latter is a.s. locally Hölder
continuous with exponent $c$ for any $c \in (0, b/a)$.
Proof: It is clearly enough to consider the restriction of $X$ to $[0, 1]^d$.
Define
$$D_n = \{(k_1, \ldots, k_d) 2^{-n};\ k_1, \ldots, k_d \in \{1, \ldots, 2^n\}\}, \qquad n \in \mathbb{N},$$
and let
$$\xi_n = \max\{\rho(X_s, X_t);\ s, t \in D_n,\ |s - t| = 2^{-n}\}, \qquad n \in \mathbb{N}.$$
Since
$$|\{(s, t) \in D_n^2;\ |s - t| = 2^{-n}\}| \le d 2^{dn}, \qquad n \in \mathbb{N},$$
we get by (10), for any $c \in (0, b/a)$,
$$E\sum_n (2^{cn}\xi_n)^a = \sum_n 2^{acn} E\xi_n^a \lesssim \sum_n 2^{acn}\, 2^{dn} (2^{-n})^{d + b} = \sum_n 2^{(ac - b)n} < \infty.$$
The sum on the left is then a.s. convergent, and therefore $\xi_n \lesssim 2^{-cn}$ a.s.
Now any two points $s, t \in \bigcup_n D_n$ with $|s - t| \le 2^{-m}$ can be connected by a
piecewise linear path involving, for each $n \ge m$, at most $2d$ steps between
nearest neighbors in $D_n$. Thus, for $r \in [2^{-m-1}, 2^{-m}]$,
$$\sup\Big\{\rho(X_s, X_t);\ s, t \in \bigcup_n D_n,\ |s - t| \le r\Big\} \lesssim \sum_{n \ge m} \xi_n \lesssim \sum_{n \ge m} 2^{-cn} \lesssim 2^{-cm} \lesssim r^c,$$
which shows that $X$ is a.s. Hölder continuous on $\bigcup_n D_n$ with exponent $c$.
In particular, there exists a continuous process $Y$ on $[0, 1]^d$ that agrees
with $X$ a.s. on $\bigcup_n D_n$, and it is easily seen that the Hölder continuity of
$Y$ on $\bigcup_n D_n$ extends with the same exponent $c$ to the entire cube $[0, 1]^d$.
To show that $Y$ is a version of $X$, fix any $t \in [0, 1]^d$ and choose $t_1, t_2, \ldots \in
\bigcup_n D_n$ with $t_n \to t$. Then $X_{t_n} = Y_{t_n}$ a.s. for each $n$. Furthermore, $X_{t_n} \overset{P}{\to} X_t$
by (10) and $Y_{t_n} \to Y_t$ a.s. by continuity, so $X_t = Y_t$ a.s. □
The next result shows how regularity of the paths may sometimes be
established by comparison with a regular process.
Lemma 3.24 (transfer of regularity) Let $X \overset{d}{=} Y$ be random processes
on some index set $T$, taking values in a separable metric space $S$, and
assume that the paths of $Y$ lie in a set $U \subset S^T$ that is Borel for the $\sigma$-field
$\mathcal{U} = (\mathcal{B}(S))^T \cap U$. Then $X$ has a version with paths in $U$.
Proof: For clarity we may write $\tilde{Y}$ for the path of $Y$, regarded as a
random element in $U$. Then $\tilde{Y}$ is $Y$-measurable, and by Lemma 1.13 there
exists a measurable mapping $f: S^T \to U$ such that $\tilde{Y} = f(Y)$ a.s. Define
$\tilde{X} = f(X)$, and note that $(\tilde{X}, X) \overset{d}{=} (\tilde{Y}, Y)$. Since the diagonal in $S^2$ is
measurable, we get in particular
$$P\{\tilde{X}_t = X_t\} = P\{\tilde{Y}_t = Y_t\} = 1, \qquad t \in T.$$
□
We conclude this chapter with a characterization of distribution functions
in $\mathbb{R}^d$, required in Chapter 5. For any vectors $x = (x_1, \ldots, x_d)$ and $y =
(y_1, \ldots, y_d)$, write $x \le y$ for the componentwise inequality $x_k \le y_k$, $k =
1, \ldots, d$, and similarly for $x < y$. In particular, the distribution function $F$
of a probability measure $\mu$ on $\mathbb{R}^d$ is given by $F(x) = \mu\{y;\ y \le x\}$. Similarly,
let $x \vee y$ denote the componentwise maximum. Put $1 = (1, \ldots, 1)$ and
$\infty = (\infty, \ldots, \infty)$.
For any rectangular box $(x, y] = \{u;\ x < u \le y\} = (x_1, y_1] \times \cdots \times
(x_d, y_d]$, we note that $\mu(x, y] = \sum_u s(u) F(u)$, where $s(u) = (-1)^p$ with $p =
\sum_k 1\{u_k = x_k\}$ and the summation extends over all corners $u$ of $(x, y]$, so that
for $d = 1$ the sum reduces to $F(y) - F(x)$. Let
$F(x, y]$ denote the stated sum and say that $F$ has nonnegative increments if
$F(x, y] \ge 0$ for all pairs $x < y$. Let us further say that $F$ is right-continuous
if $F(x_n) \to F(x)$ as $x_n \downarrow x$ and proper if $F(x) \to 1$ or 0 as $\min_k x_k \to \pm\infty$,
respectively.
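The corner sum defining $F(x, y]$ is an inclusion-exclusion formula and is easy to evaluate by enumerating the $2^d$ corners. The sketch below uses the sign convention under which a corner contributes $(-1)^p$ with $p$ the number of coordinates taken from the lower vertex $x$, so that $d = 1$ gives $F(y) - F(x)$. The function names and the uniform test case are assumptions for illustration.

```python
from itertools import product

def box_increment(F, x, y):
    """F(x, y] as the signed sum of F over the corners of the box (x, y].

    Each corner u takes u_k from x or from y in every coordinate; its
    sign is (-1) to the number of coordinates taken from x.
    """
    total = 0
    for choice in product((0, 1), repeat=len(x)):
        u = tuple(y[k] if c else x[k] for k, c in enumerate(choice))
        lower = choice.count(0)              # coordinates taken from x
        total += (-1) ** lower * F(u)
    return total

def uniform_cdf(u):
    """Distribution function of the uniform law on the unit cube."""
    out = 1.0
    for t in u:
        out *= min(max(t, 0.0), 1.0)
    return out

# For the uniform law the increment is the volume of the box.
print(box_increment(uniform_cdf, (0.25, 0.5), (0.75, 1.0)))
```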
The following result characterizes distribution functions in terms of the
mentioned properties.
Theorem 3.25 (distribution functions) A function $F: \mathbb{R}^d \to [0, 1]$ is the
distribution function of some probability measure $\mu$ on $\mathbb{R}^d$ iff it is right-continuous
and proper with nonnegative increments.
Proof: Assume that $F$ has the stated properties, and note that the
associated set function $F(x, y]$ is finitely additive. Since $F$ is proper,
we further have $F(x, y] \to 1$ as $x \to -\infty$ and $y \to \infty$, that is, as
$(x, y] \uparrow (-\infty, \infty) = \mathbb{R}^d$. Hence, for every $n \in \mathbb{N}$ there exists a probability
measure $\mu_n$ on $(2^{-n}\mathbb{Z})^d$ with $\mathbb{Z} = \{\ldots, -1, 0, 1, \ldots\}$ such that
$$\mu_n\{2^{-n}k\} = F(2^{-n}(k - 1), 2^{-n}k], \qquad k \in \mathbb{Z}^d,\ n \in \mathbb{N},$$
and from the finite additivity of $F(x, y]$ we obtain
$$\mu_m(2^{-m}(k - 1, k]) = \mu_n(2^{-m}(k - 1, k]), \qquad k \in \mathbb{Z}^d,\ m < n \text{ in } \mathbb{N}. \tag{11}$$
In view of (11), we may split the Lebesgue unit interval $([0, 1], \mathcal{B}[0, 1], \lambda)$
recursively to construct some random vectors $\xi_1, \xi_2, \ldots$ with distributions
$\mu_1, \mu_2, \ldots$ such that $\xi_m - 2^{-m} < \xi_n \le \xi_m$ for all $m < n$. In particular,
$\xi_1 \ge \xi_2 \ge \cdots > \xi_1 - 1$, and so $\xi_n$ converges pointwise to some random
vector $\xi$. Define $\mu = \lambda \circ \xi^{-1}$.
To see that $\mu$ has distribution function $F$, we note that since $F$ is proper,
$$\lambda\{\xi_n \le 2^{-n}k\} = \mu_n(-\infty, 2^{-n}k] = F(2^{-n}k), \qquad k \in \mathbb{Z}^d,\ n \in \mathbb{N}.$$
Since also $\xi_n \downarrow \xi$ a.s., Fatou's lemma yields for dyadic $x \in \mathbb{R}^d$
$$\lambda\{\xi < x\} = \lambda\{\xi_n < x \text{ ult.}\} \le \liminf_n \lambda\{\xi_n < x\} \le F(x) = \limsup_n \lambda\{\xi_n \le x\} \le \lambda\{\xi_n \le x \text{ i.o.}\} \le \lambda\{\xi \le x\},$$
and so
$$F(x) \le \lambda\{\xi \le x\} \le F(x + 2^{-n}1), \qquad n \in \mathbb{N}.$$
Letting $n \to \infty$ and using the right-continuity of $F$, we get $\lambda\{\xi \le
x\} = F(x)$, which extends to any $x \in \mathbb{R}^d$ by the right-continuity of both
sides. □
The last result has the following version for unbounded measures.
Corollary 3.26 (unbounded measures) Let the function $F$ on $\mathbb{R}^d$ be
right-continuous with nonnegative increments. Then there exists a measure $\mu$ on
$\mathbb{R}^d$ such that $\mu(x, y] = F(x, y]$ for all $x \le y$ in $\mathbb{R}^d$.
Proof: For any $a \in \mathbb{R}^d$, we may apply Theorem 3.25 to suitably normalized
versions of the function $F_a(x) = F(a, a \vee x]$ to obtain a measure $\mu_a$
on $[a, \infty)$ with $\mu_a(a, x] = F(a, x]$ for all $x > a$. Then clearly $\mu_a = \mu_b$ on
$(a \vee b, \infty)$ for any $a$ and $b$, and so the set function $\mu = \sup_a \mu_a$ is a measure
with the required property. □
Exercises
1. Give an example of two processes $X$ and $Y$ with different distributions
such that $X_t \overset{d}{=} Y_t$ for all $t$.
2. Let $X$ and $Y$ be $\{0, 1\}$-valued processes on some index set $T$. Show that
$X \overset{d}{=} Y$ iff $P\{X_{t_1} + \cdots + X_{t_n} > 0\} = P\{Y_{t_1} + \cdots + Y_{t_n} > 0\}$
for all $n \in \mathbb{N}$ and $t_1, \dots, t_n \in T$.
3. Let $F$ be a right-continuous function of bounded variation and with
$F(-\infty) = 0$. Show for any random variable $\xi$ that
$EF(\xi) = \int P\{\xi \ge t\}\, F(dt)$. (Hint: First take $F$ to be the distribution
function of some random variable $\eta \perp\!\!\!\perp \xi$, and use Lemma 3.11.)
4. Consider a random variable $\xi \in L^1$ and a strictly convex function $f$ on
$\mathbb{R}$. Show that $Ef(\xi) = f(E\xi)$ iff $\xi = E\xi$ a.s.
5. Assume that $\xi = \sum_j a_j \xi_j$ and $\eta = \sum_j b_j \eta_j$, where the sums converge in
$L^2$. Show that $\mathrm{cov}(\xi, \eta) = \sum_{i,j} a_i b_j \mathrm{cov}(\xi_i, \eta_j)$, where the double
series on the right is absolutely convergent.
6. Let the $\sigma$-fields $\mathcal{F}_{t,n}$, $t \in T$, $n \in \mathbb{N}$, be nondecreasing in $n$ for
each $t$ and independent in $t$ for each $n$. Show that the independence extends to the
$\sigma$-fields $\mathcal{F}_t = \bigvee_n \mathcal{F}_{t,n}$.
7. For each $t \in T$, let $\xi^t, \xi^t_1, \xi^t_2, \dots$ be random elements in some metric
space $S_t$ with $\xi^t_n \to \xi^t$ a.s., and assume for each $n \in \mathbb{N}$ that the random
elements $\xi^t_n$ are independent. Show that the independence extends to the
limits $\xi^t$. (Hint: First show that $E \prod_{t \in S} f_t(\xi^t) = \prod_{t \in S} E f_t(\xi^t)$
for any bounded, continuous functions $f_t$ on $S_t$ and for finite subsets $S \subset T$.)
8. Give an example of three events that are pairwise independent but not
independent.
9. Give an example of two random variables that are uncorrelated but not
independent.
10. Let $\xi_1, \xi_2, \dots$ be i.i.d. random elements with distribution $\mu$ in some
measurable space $(S, \mathcal{S})$. Fix a set $A \in \mathcal{S}$ with $\mu A > 0$, and put
$\tau = \inf\{k;\ \xi_k \in A\}$. Show that $\xi_\tau$ has distribution
$\mu[\,\cdot\,|A] = \mu(\cdot \cap A)/\mu A$.
11. Let $\xi_1, \xi_2, \dots$ be independent random variables taking values in $[0, 1]$.
Show that $E \prod_n \xi_n = \prod_n E\xi_n$. In particular, show that
$P \bigcap_n A_n = \prod_n P A_n$ for any independent events $A_1, A_2, \dots$.
12. Let $\xi_1, \xi_2, \dots$ be arbitrary random variables. Show that there exist
some constants $c_1, c_2, \dots > 0$ such that the series $\sum_n c_n \xi_n$ converges a.s.
13. Let $\xi_1, \xi_2, \dots$ be random variables with $\xi_n \to 0$ a.s. Show that there
exists some measurable function $f > 0$ with $\sum_n f(\xi_n) < \infty$ a.s. Also show
that the conclusion fails if we only assume $L^1$-convergence.
14. Give an example of events $A_1, A_2, \dots$ such that $P\{A_n \text{ i.o.}\} = 0$ but
$\sum_n P A_n = \infty$.
15. Extend Lemma 3.20 to a correspondence between $U(0, 1)$ random
variables $\vartheta$ and Bernoulli sequences $\xi_1, \xi_2, \dots$ with rate $p \in (0, 1)$.
16. Give an elementary proof of Theorem 3.25 for $d = 1$. (Hint: Define
$\xi = F^{-1}(\vartheta)$, where $\vartheta$ is $U(0, 1)$, and note that $\xi$ has distribution
function $F$.)
17. Let $\xi_1, \xi_2, \dots$ be random variables such that $P\{\xi_n \ne 0 \text{ i.o.}\} = 1$. Show
that there exist some constants $c_n \in \mathbb{R}$ such that $P\{|c_n \xi_n| > 1 \text{ i.o.}\} = 1$.
(Hint: Note that $P\{\sum_{k \le n} |\xi_k| > 0\} \to 1$.)
Chapter 4
Random Sequences, Series,
and Averages
Convergence in probability and in $L^p$; uniform integrability
and tightness; convergence in distribution; convergence of
random series; strong laws of large numbers; Portmanteau
theorem; continuous mapping and approximation; coupling and
measurability
The first goal of this chapter is to introduce and compare the basic modes of
convergence of random quantities. For random elements $\xi$ and $\xi_1, \xi_2, \dots$ in
a metric or topological space $S$, the most commonly used notions are those
of almost sure convergence, $\xi_n \to \xi$ a.s., and convergence in probability,
$\xi_n \xrightarrow{P} \xi$, corresponding to the general notions of convergence a.e. and in
measure, respectively. When $S = \mathbb{R}$, we have the additional concept of
$L^p$-convergence, familiar from Chapter 1. Those three notions are used
throughout this book. For a special purpose in Chapter 25, we shall also
need the notion of weak $L^1$-convergence.
For our second main topic, we shall study the very different concept of
convergence in distribution, $\xi_n \xrightarrow{d} \xi$, defined by the condition
$Ef(\xi_n) \to Ef(\xi)$ for all bounded, continuous functions $f$ on $S$. This is clearly
equivalent to weak convergence of the associated distributions $\mu_n = \mathcal{L}(\xi_n)$ and
$\mu = \mathcal{L}(\xi)$, written as $\mu_n \xrightarrow{w} \mu$ and defined by the condition
$\mu_n f \to \mu f$ for every $f$ as above. In this chapter we shall only establish the most
basic results of weak convergence theory, such as the "Portmanteau" theorem,
the continuous mapping and approximation theorems, and the Skorohod
coupling. Our development of the general theory continues in Chapters 5
and 16, and further distributional limit theorems appear in Chapters 8, 9,
12, 14, 15, 19, and 23.
Our third main theme is to characterize the convergence of series $\sum_k \xi_k$
and averages $n^{-c} \sum_{k \le n} \xi_k$, where $\xi_1, \xi_2, \dots$ are independent random
variables and $c$ is a positive constant. The two problems are related by the
elementary Kronecker lemma, and the main results are the basic three-series
criterion and the strong law of large numbers. The former result is
extended in Chapter 7 to the powerful martingale convergence theorem,
whereas extensions and refinements of the latter result are proved in
Chapters 10 and 14. The mentioned theorems are further related to certain weak
convergence results presented in Chapters 5 and 15.
Before beginning our systematic study of the various notions of
convergence, we consider a couple of elementary but useful inequalities.
Lemma 4.1 (moments and tails, Bienaymé, Chebyshev, Paley and Zygmund)
Let $\xi$ be an $\mathbb{R}_+$-valued random variable with $0 < E\xi < \infty$.
Then
$$(1 - r)_+^2\, \frac{(E\xi)^2}{E\xi^2} \le P\{\xi > rE\xi\} \le \frac{1}{r}, \quad r > 0. \tag{1}$$
The second relation in (1) is often referred to as Chebyshev's or Markov's
inequality. Assuming that $E\xi^2 < \infty$, we get in particular the well-known
estimate
$$P\{|\xi - E\xi| > \varepsilon\} \le \varepsilon^{-2}\, \mathrm{var}(\xi), \quad \varepsilon > 0.$$
Proof of Lemma 4.1: We may clearly assume that $E\xi = 1$. The upper
bound then follows as we take expectations in the inequality $r 1\{\xi > r\} \le \xi$.
To get the lower bound, we note that for any $r, t > 0$
$$t^2 1\{\xi > r\} \ge (\xi - r)(2t + r - \xi) = 2\xi(r + t) - r(2t + r) - \xi^2.$$
Taking expected values, we get for $r \in (0, 1)$
$$t^2 P\{\xi > r\} \ge 2(r + t) - r(2t + r) - E\xi^2 \ge 2t(1 - r) - E\xi^2.$$
Now choose $t = E\xi^2/(1 - r)$. □
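Both bounds in (1) can be checked exactly on a small discrete example. The sketch below is only an illustration: the law of $\xi$ (values 0, 1, 2, 5 with the listed probabilities) is a hypothetical choice, and exact rational arithmetic is used so that no rounding enters the comparison.

```python
from fractions import Fraction

# Hypothetical example law: xi takes the values 0, 1, 2, 5 with these probabilities.
law = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 8), 5: Fraction(1, 8)}

mean = sum(v * p for v, p in law.items())
second_moment = sum(v * v * p for v, p in law.items())

def tail(r):
    """Exact value of P{xi > r * E xi}."""
    return sum(p for v, p in law.items() if v > r * mean)

def lower_bound(r):
    """Paley-Zygmund bound (1 - r)_+^2 (E xi)^2 / E xi^2."""
    return max(1 - r, 0) ** 2 * mean ** 2 / second_moment

checks = []
for r in (Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(2), Fraction(4)):
    checks.append((r, lower_bound(r), tail(r), 1 / r))
    print(r, lower_bound(r), tail(r), 1 / r)
```

Each printed row lists $r$, the lower bound, the exact tail probability, and the upper bound $1/r$. The lower bound is informative only for $r < 1$ and the upper bound only for $r > 1$.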
For random elements $\xi$ and $\xi_1, \xi_2, \dots$ in a metric space $(S, \rho)$, we say
that $\xi_n$ converges in probability to $\xi$ (written as $\xi_n \xrightarrow{P} \xi$) if
$$\lim_{n \to \infty} P\{\rho(\xi_n, \xi) > \varepsilon\} = 0, \quad \varepsilon > 0.$$
By Chebyshev's inequality it is equivalent that $E[\rho(\xi_n, \xi) \wedge 1] \to 0$. This
notion of convergence is related to the a.s. version as follows.
Lemma 4.2 (subsequence criterion) Let $\xi, \xi_1, \xi_2, \dots$ be random elements
in a metric space $(S, \rho)$. Then $\xi_n \xrightarrow{P} \xi$ iff every subsequence
$N' \subset \mathbb{N}$ has a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along
$N''$. In particular, $\xi_n \to \xi$ a.s. implies $\xi_n \xrightarrow{P} \xi$.
This shows in particular that the notion of convergence in probability
depends only on the topology and is independent of the metrization $\rho$.
Proof: Assume that $\xi_n \xrightarrow{P} \xi$, and fix an arbitrary subsequence
$N' \subset \mathbb{N}$. We may then choose a further subsequence $N'' \subset N'$ such that
$$E \sum_{n \in N''} \{\rho(\xi_n, \xi) \wedge 1\} = \sum_{n \in N''} E[\rho(\xi_n, \xi) \wedge 1] < \infty,$$
where the equality holds by monotone convergence. The series on the left
then converges a.s., which implies $\xi_n \to \xi$ a.s. along $N''$.
Now assume instead the stated condition. If $\xi_n \not\xrightarrow{P} \xi$, there exists some
$\varepsilon > 0$ such that $E[\rho(\xi_n, \xi) \wedge 1] > \varepsilon$ along a subsequence $N' \subset \mathbb{N}$.
By hypothesis, $\xi_n \to \xi$ a.s. along a further subsequence $N'' \subset N'$, and by
dominated convergence we get $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ along $N''$, a
contradiction. □
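The gap between the two modes of convergence in Lemma 4.2 can be illustrated by the classical "typewriter" sequence on the Lebesgue unit interval. This example is not taken from the text; it is a minimal sketch in which `typewriter` and `prob_nonzero` are hypothetical helper names. The indicators converge to 0 in probability, fail to converge at any fixed point, and yet converge a.s. along the subsequence $n = 2^k$.

```python
def typewriter(n, omega):
    """Indicator of the n-th dyadic interval, evaluated at omega in [0, 1).

    Writing n = 2**k + j with 0 <= j < 2**k, the n-th interval is
    [j / 2**k, (j + 1) / 2**k), which has Lebesgue measure 2**(-k).
    """
    k = n.bit_length() - 1
    j = n - 2 ** k
    return 1 if j / 2 ** k <= omega < (j + 1) / 2 ** k else 0

def prob_nonzero(n):
    """P{xi_n != 0} under Lebesgue measure on [0, 1)."""
    return 2.0 ** -(n.bit_length() - 1)

omega = 0.3
# Along the full sequence, xi_n(omega) = 1 once in every dyadic block.
hits = [n for n in range(1, 2 ** 10) if typewriter(n, omega) == 1]
# Along the subsequence n = 2**k the interval is [0, 2**(-k)), which
# eventually excludes any fixed omega > 0.
subsequence = [typewriter(2 ** k, omega) for k in range(10)]
print(len(hits), subsequence, prob_nonzero(2 ** 9))
```

The count of hits grows by one with every dyadic block, so $\xi_n(\omega) = 1$ infinitely often, while the subsequence values vanish from some index on.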
For a first application, we shall see how convergence in probability is
preserved by continuous mappings.
Lemma 4.3 (continuous mapping) For any metric spaces $S$ and $T$, let
$\xi, \xi_1, \xi_2, \dots$ be random elements in $S$ with $\xi_n \xrightarrow{P} \xi$, and let the
mapping $f: S \to T$ be measurable and a.s. continuous at $\xi$. Then
$f(\xi_n) \xrightarrow{P} f(\xi)$.
Proof: Fix any subsequence $N' \subset \mathbb{N}$. By Lemma 4.2 we have $\xi_n \to \xi$
a.s. along some further subsequence $N'' \subset N'$, and by continuity we get
$f(\xi_n) \to f(\xi)$ a.s. along $N''$. Hence, $f(\xi_n) \xrightarrow{P} f(\xi)$ by Lemma 4.2. □
Now consider a sequence of metric spaces $(S_k, \rho_k)$, and introduce the
product space $S = \mathsf{X}_k S_k = S_1 \times S_2 \times \cdots$ endowed with the product
topology, a convenient metrization of which is given by
$$\rho(x, y) = \sum_k 2^{-k} \{\rho_k(x_k, y_k) \wedge 1\}, \quad x, y \in \mathsf{X}_k S_k. \tag{2}$$
If each $S_k$ is separable, then $\mathcal{B}(S) = \bigotimes_k \mathcal{B}(S_k)$ by Lemma 1.2, and so a
random element in $S$ is simply a sequence of random elements in $S_k$, $k \in \mathbb{N}$.
Lemma 4.4 (random sequences) For any separable metric spaces $S_1, S_2, \dots$,
let $\xi = (\xi^1, \xi^2, \dots)$ and $\xi_n = (\xi_n^1, \xi_n^2, \dots)$, $n \in \mathbb{N}$, be random elements
in $\mathsf{X}_k S_k$. Then $\xi_n \xrightarrow{P} \xi$ iff $\xi_n^k \xrightarrow{P} \xi^k$ in $S_k$ for each $k$.
Proof: With $\rho$ as in (2), we get for each $n \in \mathbb{N}$
$$E[\rho(\xi_n, \xi) \wedge 1] = E\rho(\xi_n, \xi) = \sum_k 2^{-k} E[\rho_k(\xi_n^k, \xi^k) \wedge 1].$$
Thus, by dominated convergence $E[\rho(\xi_n, \xi) \wedge 1] \to 0$ iff
$E[\rho_k(\xi_n^k, \xi^k) \wedge 1] \to 0$ for all $k$. □
Combining the last two lemmas, it is easy to see how convergence in
probability is preserved by the basic arithmetic operations.
Corollary 4.5 (elementary operations) Let $\xi, \xi_1, \xi_2, \dots$ and $\eta, \eta_1, \eta_2, \dots$
be random variables with $\xi_n \xrightarrow{P} \xi$ and $\eta_n \xrightarrow{P} \eta$. Then
$a\xi_n + b\eta_n \xrightarrow{P} a\xi + b\eta$ for all $a, b \in \mathbb{R}$, and
$\xi_n \eta_n \xrightarrow{P} \xi\eta$. Furthermore, $\xi_n/\eta_n \xrightarrow{P} \xi/\eta$ whenever a.s.
$\eta \ne 0$ and $\eta_n \ne 0$ for all $n$.
Proof: By Lemma 4.4 we have $(\xi_n, \eta_n) \xrightarrow{P} (\xi, \eta)$ in $\mathbb{R}^2$, so the results
for linear combinations and products follow by Lemma 4.3. To prove the last
assertion, we may apply Lemma 4.3 to the function
$f: (x, y) \mapsto (x/y) 1\{y \ne 0\}$, which is clearly a.s. continuous at $(\xi, \eta)$. □
Let us next examine the associated completeness properties. For any
random elements $\xi_1, \xi_2, \dots$ in a metric space $(S, \rho)$, we say that $(\xi_n)$ is
Cauchy (convergent) in probability if $\rho(\xi_m, \xi_n) \xrightarrow{P} 0$ as $m, n \to \infty$, in the
sense that $E[\rho(\xi_m, \xi_n) \wedge 1] \to 0$.
Lemma 4.6 (completeness) Let $\xi_1, \xi_2, \dots$ be random elements in a complete
metric space $(S, \rho)$. Then $(\xi_n)$ is Cauchy in probability or a.s. iff
$\xi_n \xrightarrow{P} \xi$ or $\xi_n \to \xi$ a.s., respectively, for some random element $\xi$ in $S$.
Proof: The a.s. case is immediate from Lemma 1.10. Assuming
$\xi_n \xrightarrow{P} \xi$, we get
$$E[\rho(\xi_m, \xi_n) \wedge 1] \le E[\rho(\xi_m, \xi) \wedge 1] + E[\rho(\xi_n, \xi) \wedge 1] \to 0,$$
which means that $(\xi_n)$ is Cauchy in probability.
Now assume instead the latter condition. Define
$$n_k = \inf\{n \ge k;\ \sup_{m \ge n} E[\rho(\xi_m, \xi_n) \wedge 1] \le 2^{-k}\}, \quad k \in \mathbb{N}.$$
The $n_k$ are finite and satisfy
$$E \sum_k \{\rho(\xi_{n_k}, \xi_{n_{k+1}}) \wedge 1\} \le \sum_k 2^{-k} < \infty,$$
and so $\sum_k \rho(\xi_{n_k}, \xi_{n_{k+1}}) < \infty$ a.s. The sequence $(\xi_{n_k})$ is then a.s. Cauchy
and converges a.s. toward some measurable limit $\xi$. To see that
$\xi_n \xrightarrow{P} \xi$, write
$$E[\rho(\xi_m, \xi) \wedge 1] \le E[\rho(\xi_m, \xi_{n_k}) \wedge 1] + E[\rho(\xi_{n_k}, \xi) \wedge 1],$$
and note that the right-hand side tends to zero as $m, k \to \infty$, by the Cauchy
convergence of $(\xi_n)$ and dominated convergence. □
Next consider any probability measures $\mu$ and $\mu_1, \mu_2, \dots$ on some metric
space $(S, \rho)$ with Borel $\sigma$-field $\mathcal{S}$, and say that $\mu_n$ converges weakly to $\mu$
(written as $\mu_n \xrightarrow{w} \mu$) if $\mu_n f \to \mu f$ for every $f \in C_b(S)$, the class of
bounded, continuous functions $f: S \to \mathbb{R}$. If $\xi$ and $\xi_1, \xi_2, \dots$ are random
elements in $S$, we further say that $\xi_n$ converges in distribution to $\xi$ (written as
$\xi_n \xrightarrow{d} \xi$) if $\mathcal{L}(\xi_n) \xrightarrow{w} \mathcal{L}(\xi)$, that is, if $Ef(\xi_n) \to Ef(\xi)$ for all
$f \in C_b(S)$. Note that the latter mode of convergence depends only on the
distributions and that $\xi$ and the $\xi_n$ need not even be defined on the same
probability space. To motivate the definition, note that $x_n \to x$ in a metric space
$S$ iff $f(x_n) \to f(x)$ for all continuous functions $f: S \to \mathbb{R}$, and also that
$\mathcal{L}(\xi)$ is determined by the integrals $Ef(\xi)$ for all $f \in C_b(S)$.
The following result gives a connection between convergence in proba-
bility and in distribution.
Lemma 4.7 (convergence in probability and in distribution) Let
$\xi, \xi_1, \xi_2, \dots$ be random elements in a metric space $(S, \rho)$. Then
$\xi_n \xrightarrow{P} \xi$ implies $\xi_n \xrightarrow{d} \xi$, and the two conditions are equivalent
when $\xi$ is a.s. constant.
Proof: Assume $\xi_n \xrightarrow{P} \xi$. For any $f \in C_b(S)$ we need to show that
$Ef(\xi_n) \to Ef(\xi)$. If the convergence fails, we may choose some subsequence
$N' \subset \mathbb{N}$ such that $\inf_{n \in N'} |Ef(\xi_n) - Ef(\xi)| > 0$. By Lemma 4.2
there exists a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along
$N''$. By continuity and dominated convergence we get $Ef(\xi_n) \to Ef(\xi)$
along $N''$, a contradiction.
Conversely, assume that $\xi_n \xrightarrow{d} s \in S$. Since $\rho(x, s) \wedge 1$ is a bounded and
continuous function of $x$, we get
$E[\rho(\xi_n, s) \wedge 1] \to E[\rho(s, s) \wedge 1] = 0$, and so $\xi_n \xrightarrow{P} s$. □
A family of random vectors $\xi_t$, $t \in T$, in $\mathbb{R}^d$ is said to be tight if
$$\lim_{r \to \infty} \sup_{t \in T} P\{|\xi_t| > r\} = 0.$$
For sequences $(\xi_n)$ the condition is clearly equivalent to
$$\lim_{r \to \infty} \limsup_{n \to \infty} P\{|\xi_n| > r\} = 0, \tag{3}$$
which is often easier to verify. Tightness plays an important role for the
compactness methods developed in Chapters 5 and 16. For the moment we
note only the following simple connection with weak convergence.
Lemma 4.8 (weak convergence and tightness) Let $\xi, \xi_1, \xi_2, \dots$ be random
vectors in $\mathbb{R}^d$ satisfying $\xi_n \xrightarrow{d} \xi$. Then $(\xi_n)$ is tight.
Proof: Fix any $r > 0$, and define $f(x) = (1 - (r - |x|)_+)_+$. Then
$$\limsup_{n \to \infty} P\{|\xi_n| > r\} \le \lim_{n \to \infty} Ef(\xi_n) = Ef(\xi) \le P\{|\xi| > r - 1\}.$$
Here the right-hand side tends to 0 as $r \to \infty$, and (3) follows. □
We may further note the following simple relationship between tightness
and convergence in probability.
Lemma 4.9 (tightness and convergence in probability) Let $\xi_1, \xi_2, \dots$ be
random vectors in $\mathbb{R}^d$. Then $(\xi_n)$ is tight iff $c_n \xi_n \xrightarrow{P} 0$ for any constants
$c_1, c_2, \dots \ge 0$ with $c_n \to 0$.
Proof: Assume $(\xi_n)$ to be tight, and let $c_n \to 0$. Fixing any $r, \varepsilon > 0$, and
noting that $c_n r \le \varepsilon$ for all but finitely many $n \in \mathbb{N}$, we get
$$\limsup_{n \to \infty} P\{|c_n \xi_n| > \varepsilon\} \le \limsup_{n \to \infty} P\{|\xi_n| > r\}.$$
Here the right-hand side tends to 0 as $r \to \infty$, and so $P\{|c_n \xi_n| > \varepsilon\} \to 0$.
Since $\varepsilon$ was arbitrary, we get $c_n \xi_n \xrightarrow{P} 0$. If instead $(\xi_n)$ is not tight, we
may choose a subsequence $(n_k) \subset \mathbb{N}$ such that $\inf_k P\{|\xi_{n_k}| > k\} > 0$. Letting
$c_n = \sup\{k^{-1};\ n_k \ge n\}$, we note that $c_n \to 0$ and yet
$P\{|c_{n_k} \xi_{n_k}| > 1\} \not\to 0$. Thus, the stated condition fails. □
We turn to a related notion for expected values. A family of random
variables $\xi_t$, $t \in T$, is said to be uniformly integrable if
$$\lim_{r \to \infty} \sup_{t \in T} E[|\xi_t|;\ |\xi_t| > r] = 0. \tag{4}$$
For sequences $(\xi_n)$ in $L^1$, this is clearly equivalent to
$$\lim_{r \to \infty} \limsup_{n \to \infty} E[|\xi_n|;\ |\xi_n| > r] = 0. \tag{5}$$
Condition (4) holds in particular if the $\xi_t$ are $L^p$-bounded for some $p > 1$,
in the sense that $\sup_t E|\xi_t|^p < \infty$. To see this, it suffices to write
$$E[|\xi_t|;\ |\xi_t| > r] \le r^{-p+1} E|\xi_t|^p, \quad r, p > 0.$$
The next result gives a useful characterization of uniform integrability.
For motivation we note that if $\xi$ is an integrable random variable, then
$E[|\xi|; A] \to 0$ as $PA \to 0$, by Lemma 4.2 and dominated convergence. The
latter condition means that $\sup_{A \in \mathcal{A},\, PA < \varepsilon} E[|\xi|; A] \to 0$ as $\varepsilon \to 0$.
Lemma 4.10 (uniform integrability) The random variables $\xi_t$, $t \in T$, are
uniformly integrable iff $\sup_t E|\xi_t| < \infty$ and
$$\lim_{PA \to 0} \sup_{t \in T} E[|\xi_t|; A] = 0. \tag{6}$$
Proof: Assume the $\xi_t$ to be uniformly integrable, and write
$$E[|\xi_t|; A] \le r PA + E[|\xi_t|;\ |\xi_t| > r], \quad r > 0.$$
Here (6) follows as we let $PA \to 0$ and then $r \to \infty$. To get the boundedness
in $L^1$, it suffices to take $A = \Omega$ and choose $r > 0$ large enough.
Conversely, let the $\xi_t$ be $L^1$-bounded and satisfy (6). By Chebyshev's
inequality we get as $r \to \infty$
$$\sup_t P\{|\xi_t| > r\} \le r^{-1} \sup_t E|\xi_t| \to 0,$$
and so (4) follows from (6) with $A = \{|\xi_t| > r\}$. □
The relevance of uniform integrability for the convergence of moments
is clear from the following result, which also contains a weak convergence
version of Fatou's lemma.
Lemma 4.11 (convergence of means) Let $\xi, \xi_1, \xi_2, \dots$ be $\mathbb{R}_+$-valued random
variables with $\xi_n \xrightarrow{d} \xi$. Then $E\xi \le \liminf_n E\xi_n$, and we have
$E\xi_n \to E\xi < \infty$ iff (5) holds.
Proof: For any $r > 0$ the function $x \mapsto x \wedge r$ is bounded and continuous
on $\mathbb{R}_+$. Thus,
$$\liminf_{n \to \infty} E\xi_n \ge \lim_{n \to \infty} E(\xi_n \wedge r) = E(\xi \wedge r),$$
and the first assertion follows as we let $r \to \infty$. Next assume (5), and note
in particular that $E\xi \le \liminf_n E\xi_n < \infty$. For any $r > 0$ we get
$$|E\xi_n - E\xi| \le |E\xi_n - E(\xi_n \wedge r)| + |E(\xi_n \wedge r) - E(\xi \wedge r)| + |E(\xi \wedge r) - E\xi|.$$
Letting $n \to \infty$ and then $r \to \infty$, we obtain $E\xi_n \to E\xi$. Now assume
instead that $E\xi_n \to E\xi < \infty$. Keeping $r > 0$ fixed, we get as $n \to \infty$
$$E[\xi_n;\ \xi_n > r] \le E[\xi_n - \xi_n \wedge (r - \xi_n)_+] \to E[\xi - \xi \wedge (r - \xi)_+].$$
Since $x \wedge (r - x)_+ \uparrow x$ as $r \to \infty$, the right-hand side tends to zero by
dominated convergence, and (5) follows. □
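The role of condition (5) can be seen on a standard pair of examples, which are not taken from the text. On the Lebesgue unit interval let $\xi_n = n\,1\{\vartheta < 1/n\}$ and $\eta_n = \sqrt{n}\,1\{\vartheta < 1/n\}$ with $\vartheta$ uniform on $(0, 1)$. Both sequences tend to 0 in probability, but $E\xi_n = 1$ for all $n$, so the inequality $E\xi \le \liminf_n E\xi_n$ is strict and (5) fails, whereas (5) holds for $(\eta_n)$. The sketch below evaluates the truncated means exactly; the helper names are hypothetical, and a supremum over finitely many $n$ stands in for the $\limsup$.

```python
import math

def truncated_mean(value, prob, r):
    """E[xi; xi > r] for a variable equal to `value` with probability `prob`, else 0."""
    return value * prob if value > r else 0.0

def tail_sup(value_of_n, r, n_max=10_000):
    """sup over n <= n_max of E[xi_n; xi_n > r], where xi_n = value_of_n(n) * 1{U < 1/n}."""
    return max(truncated_mean(value_of_n(n), 1.0 / n, r) for n in range(1, n_max + 1))

spike = lambda n: float(n)          # xi_n = n on an event of probability 1/n
damped = lambda n: math.sqrt(n)     # eta_n = sqrt(n) on the same event

for r in (1.0, 10.0, 50.0):
    print(r, tail_sup(spike, r), tail_sup(damped, r))
```

For the first sequence the truncated mean stays near 1 however large $r$ is taken, while for the second it is bounded by $1/r$.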
We may now examine the relationship between convergence in LP and in
probability.
Proposition 4.12 ($L^p$-convergence) Fix any $p > 0$, and let
$\xi, \xi_1, \xi_2, \dots \in L^p$ with $\xi_n \xrightarrow{P} \xi$. Then these conditions are equivalent:
(i) $\xi_n \to \xi$ in $L^p$;
(ii) $\|\xi_n\|_p \to \|\xi\|_p$;
(iii) the variables $|\xi_n|^p$, $n \in \mathbb{N}$, are uniformly integrable.
Conversely, (i) implies $\xi_n \xrightarrow{P} \xi$.
Proof: First assume that $\xi_n \to \xi$ in $L^p$. Then $\|\xi_n\|_p \to \|\xi\|_p$ by Lemma
1.29, and by Lemma 4.1 we have, for any $\varepsilon > 0$,
$$P\{|\xi_n - \xi| > \varepsilon\} = P\{|\xi_n - \xi|^p > \varepsilon^p\} \le \varepsilon^{-p} \|\xi_n - \xi\|_p^p \to 0.$$
Thus, $\xi_n \xrightarrow{P} \xi$. For the remainder of the proof we may assume that
$\xi_n \xrightarrow{P} \xi$. In particular, $|\xi_n|^p \xrightarrow{d} |\xi|^p$ by Lemmas 4.3 and 4.7, and so
(ii) and (iii) are equivalent by Lemma 4.11. Next assume (ii). If (i) fails, there
exists some subsequence $N' \subset \mathbb{N}$ with $\inf_{n \in N'} \|\xi_n - \xi\|_p > 0$. By Lemma 4.2 we
may choose a further subsequence $N'' \subset N'$ such that $\xi_n \to \xi$ a.s. along $N''$. But
then Lemma 1.32 yields $\|\xi_n - \xi\|_p \to 0$ along $N''$, a contradiction. Thus,
(ii) implies (i), and so all three conditions are equivalent. □
We shall briefly consider yet another notion of convergence of random
variables. Assuming $\xi, \xi_1, \dots \in L^p$ for some $p \in [1, \infty)$, we say that
$\xi_n \to \xi$ weakly in $L^p$ if $E\xi_n \eta \to E\xi\eta$ for every $\eta \in L^q$, where
$p^{-1} + q^{-1} = 1$. Taking $\eta = |\xi|^{p-1} \operatorname{sgn} \xi$ gives
$\|\eta\|_q = \|\xi\|_p^{p-1}$, and so by Hölder's inequality
$$\|\xi\|_p^p = E\xi\eta = \lim_{n \to \infty} E\xi_n \eta \le \|\xi\|_p^{p-1} \liminf_{n \to \infty} \|\xi_n\|_p,$$
which shows that $\|\xi\|_p \le \liminf_n \|\xi_n\|_p$.
Now recall the well-known fact that any $L^2$-bounded sequence has a
subsequence that converges weakly in $L^2$. The following related criterion
for weak compactness in $L^1$ will be needed in Chapter 25.
Lemma 4.13 (weak $L^1$-compactness, Dunford) Every uniformly integrable
sequence of random variables has a subsequence that converges weakly
in $L^1$.
Proof: Let $(\xi_n)$ be uniformly integrable. Define $\xi_n^k = \xi_n 1\{|\xi_n| \le k\}$, and
note that $(\xi_n^k)$ is $L^2$-bounded in $n$ for each $k$. By the compactness in $L^2$ and
a diagonal argument, there exist a subsequence $N' \subset \mathbb{N}$ and some random
variables $\eta_1, \eta_2, \dots$ such that $\xi_n^k \to \eta_k$ holds weakly in $L^2$ and then also in
$L^1$, as $n \to \infty$ along $N'$ for fixed $k$.
Now $\|\eta_k - \eta_l\|_1 \le \liminf_n \|\xi_n^k - \xi_n^l\|_1$, and by uniform integrability the
right-hand side tends to zero as $k, l \to \infty$. Thus, the sequence $(\eta_k)$ is
Cauchy in $L^1$, and so it converges in $L^1$ toward some $\xi$. By approximation
it follows easily that $\xi_n \to \xi$ weakly in $L^1$ along $N'$. □
We now derive criteria for the convergence of random series, beginning
with an important special case.
Proposition 4.14 (series with positive terms) Let $\xi_1, \xi_2, \dots$ be independent
$\mathbb{R}_+$-valued random variables. Then $\sum_n \xi_n < \infty$ a.s. iff
$\sum_n E[\xi_n \wedge 1] < \infty$.
Proof: Assuming the stated condition, we get $E \sum_n (\xi_n \wedge 1) < \infty$ by
Fubini's theorem, so $\sum_n (\xi_n \wedge 1) < \infty$ a.s. In particular,
$\sum_n 1\{\xi_n > 1\} < \infty$ a.s., so the series $\sum_n (\xi_n \wedge 1)$ and $\sum_n \xi_n$ differ by
at most finitely many terms, and we get $\sum_n \xi_n < \infty$ a.s.
Conversely, assume that $\sum_n \xi_n < \infty$ a.s. Then also $\sum_n (\xi_n \wedge 1) < \infty$ a.s.,
so we may assume that $\xi_n \le 1$ for all $n$. Noting that $1 - x \le e^{-x} \le 1 - ax$
for $x \in [0, 1]$ where $a = 1 - e^{-1}$, we get
$$0 < E \exp\{-\textstyle\sum_n \xi_n\} = \prod_n E e^{-\xi_n} \le \prod_n (1 - aE\xi_n) \le \prod_n e^{-aE\xi_n} = \exp\{-a \textstyle\sum_n E\xi_n\},$$
and so $\sum_n E\xi_n < \infty$. □
To handle more general series, we need the following strengthened ver-
sion of the Bienayme-Chebyshev inequality. A further extension appears
as Proposition 7.15.
Lemma 4.15 (maximum inequality, Kolmogorov) Let $\xi_1, \xi_2, \dots$ be independent
random variables with mean zero, and put $S_n = \xi_1 + \cdots + \xi_n$.
Then
$$P\{\sup_n |S_n| > r\} \le r^{-2} \sum_n E\xi_n^2, \quad r > 0.$$
Proof: We may assume that $\sum_n E\xi_n^2 < \infty$. Writing $\tau = \inf\{n;\ |S_n| > r\}$
and noting that $S_k 1\{\tau = k\} \perp\!\!\!\perp (S_n - S_k)$ for $k \le n$, we get
$$\sum_{k \le n} E\xi_k^2 = ES_n^2 \ge \sum_{k \le n} E[S_n^2;\ \tau = k]$$
$$\ge \sum_{k \le n} \{E[S_k^2;\ \tau = k] + 2E[S_k(S_n - S_k);\ \tau = k]\}$$
$$= \sum_{k \le n} E[S_k^2;\ \tau = k] \ge r^2 P\{\tau \le n\}.$$
As $n \to \infty$, we obtain
$$\sum_k E\xi_k^2 \ge r^2 P\{\tau < \infty\} = r^2 P\{\sup_k |S_k| > r\}.$$ □
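For a short symmetric random walk, both sides of the maximum inequality can be computed exactly by enumerating all sign patterns. The sketch below is an illustration only: the choice of fair $\pm 1$ steps, the horizon $n = 12$, the level $r = 5$, and the function name `max_inequality_check` are assumptions made for the example. A finite horizon fits the lemma if the remaining terms are taken to be zero.

```python
from itertools import product

def max_inequality_check(n, r):
    """Compare P{max_k |S_k| > r} with r**-2 * sum_k E xi_k**2 for fair +-1 steps."""
    exceed = 0
    for signs in product((-1, 1), repeat=n):
        partial, running_max = 0, 0
        for s in signs:
            partial += s
            running_max = max(running_max, abs(partial))
        if running_max > r:
            exceed += 1
    probability = exceed / 2 ** n
    bound = n / r ** 2            # each step has mean 0 and variance 1
    return probability, bound

probability, bound = max_inequality_check(n=12, r=5)
print(probability, bound)
```

The enumeration visits all $2^{12}$ paths, so the first printed number is the exact probability rather than a Monte Carlo estimate.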
The last result leads easily to the following sufficient condition for the
a.s. convergence of random series with independent terms. Conditions that
are both necessary and sufficient are given in Theorem 4.18.
Lemma 4.16 (variance criterion for series, Khinchin and Kolmogorov)
Let $\xi_1, \xi_2, \dots$ be independent random variables with mean 0 and
$\sum_n E\xi_n^2 < \infty$. Then $\sum_n \xi_n$ converges a.s.
Proof: Write $S_n = \xi_1 + \cdots + \xi_n$. By Lemma 4.15 we get for any $\varepsilon > 0$
$$P\{\sup_{k \ge n} |S_n - S_k| > \varepsilon\} \le \varepsilon^{-2} \sum_{k \ge n} E\xi_k^2.$$
Hence, $\sup_{k \ge n} |S_n - S_k| \xrightarrow{P} 0$ as $n \to \infty$, and Lemma 4.2 yields
$\sup_{k \ge n} |S_n - S_k| \to 0$ a.s. along a subsequence. Since the last supremum is
nonincreasing in $n$, the a.s. convergence extends to the entire sequence, which
means that $(S_n)$ is a.s. Cauchy convergent. Thus, $S_n$ converges a.s. by Lemma 4.6. □
The next result gives the basic connection between series with positive
and symmetric terms. By $\xi_n \xrightarrow{P} \infty$ we mean that $P\{\xi_n > r\} \to 1$ for every
$r > 0$.
Theorem 4.17 (positive and symmetric terms) Let $\xi_1, \xi_2, \dots$ be independent,
symmetric random variables. Then these conditions are equivalent:
(i) $\sum_n \xi_n$ converges a.s.;
(ii) $\sum_n \xi_n^2 < \infty$ a.s.;
(iii) $\sum_n E(\xi_n^2 \wedge 1) < \infty$.
If the conditions fail, then $|\sum_{k \le n} \xi_k| \xrightarrow{P} \infty$.
Proof: Conditions (ii) and (iii) are equivalent by Proposition 4.14. Next
assume (iii), and conclude from Lemma 4.16 that $\sum_n \xi_n 1\{|\xi_n| \le 1\}$
converges a.s. From (iii) and Fubini's theorem we note that also
$\sum_n 1\{|\xi_n| > 1\} < \infty$ a.s. Hence, the series $\sum_n \xi_n 1\{|\xi_n| \le 1\}$ and
$\sum_n \xi_n$ differ by at most finitely many terms, and so even the latter series
converges a.s. Thus, (iii) implies (i). To see that (i) implies (ii), assume instead
that (ii) fails. Then $\sum_n \xi_n^2 = \infty$ a.s. by Kolmogorov's 0-1 law, and so
$|S_n| \xrightarrow{P} \infty$ where $S_n = \sum_{k \le n} \xi_k$. Since the latter condition implies
$|S_n| \to \infty$ a.s. along some subsequence, we conclude that even (i) fails. This
shows that (i)-(iii) are equivalent.
To prove the final assertion, we introduce an independent sequence of
i.i.d. random variables $\vartheta_n$ with $P\{\vartheta_n = \pm 1\} = \frac{1}{2}$, and note that the
sequences $(\xi_n)$ and $(\vartheta_n |\xi_n|)$ have the same distribution. Letting $\mu$ denote the
distribution of the sequence $(|\xi_n|)$, we get by Lemma 3.11
$$P\{|S_n| > r\} = \int P\{|\textstyle\sum_{k \le n} \vartheta_k x_k| > r\}\, \mu(dx), \quad r > 0,$$
and by dominated convergence it is enough to show that the integrand on
the right tends to 1 for $\mu$-almost every $x = (x_1, x_2, \dots)$. Since
$\sum_n x_n^2 = \infty$ a.e., this reduces the argument to the case of nonrandom
$|\xi_n| = c_n$, $n \in \mathbb{N}$.
First assume that the $c_n$ are unbounded. For any $r > 0$ we may
recursively construct a subsequence $(n_k) \subset \mathbb{N}$ such that $c_{n_1} > r$ and
$c_{n_k} > 4 \sum_{j < k} c_{n_j}$ for each $k$. Then clearly
$P\{\sum_{j \le k} \xi_{n_j} \in I\} \le 2^{-k}$ for every interval $I$ of length $2r$. By convolution
we get $P\{|S_n| \le r\} \le 2^{-k}$ for all $n \ge n_k$, which implies $P\{|S_n| \le r\} \to 0$.
Next assume that $c_n \le c < \infty$ for all $n$. Choosing $a > 0$ so small that
$\cos x \le e^{-ax^2}$ for $|x| \le 1$, we get for $0 < |t| \le c^{-1}$
$$0 \le E e^{itS_n} = \prod_{k \le n} \cos(tc_k) \le \prod_{k \le n} \exp(-at^2 c_k^2) = \exp\{-at^2 \textstyle\sum_{k \le n} c_k^2\} \to 0.$$
Anticipating the elementary Lemma 5.1 of the next chapter, we again get
$P\{|S_n| \le r\} \to 0$ for each $r > 0$. □
The problem of characterizing the convergence, a.s. or in distribution,
of a series of independent random variables is solved completely by the
following result. Here we write $\mathrm{var}[\xi; A] = \mathrm{var}(\xi 1_A)$.
Theorem 4.18 (three-series criterion, Kolmogorov, Lévy) Let $\xi_1, \xi_2, \dots$
be independent random variables. Then $\sum_n \xi_n$ converges a.s. iff it converges
in distribution, and also iff these conditions are fulfilled:
(i) $\sum_n P\{|\xi_n| > 1\} < \infty$;
(ii) $\sum_n E[\xi_n;\ |\xi_n| \le 1]$ converges;
(iii) $\sum_n \mathrm{var}[\xi_n;\ |\xi_n| \le 1] < \infty$.
For the proof we need the following simple symmetrization inequalities.
Say that $m$ is a median of the random variable $\xi$ if
$P\{\xi > m\} \vee P\{\xi < m\} \le \frac{1}{2}$. A symmetrization of $\xi$ is defined as a random
variable of the form $\tilde\xi = \xi - \xi'$ with $\xi' \perp\!\!\!\perp \xi$ and $\xi' \overset{d}{=} \xi$. For
symmetrized versions of the random variables $\xi_1, \xi_2, \dots$, we require the same
properties for the whole sequences $(\xi_n)$ and $(\xi'_n)$.
Lemma 4.19 (symmetrization) Let $\tilde\xi$ be a symmetrization of a random
variable $\xi$ with median $m$. Then
$$\tfrac{1}{2} P\{|\xi - m| > r\} \le P\{|\tilde\xi| > r\} \le 2P\{|\xi| > r/2\}, \quad r \ge 0.$$
Proof: Assume $\tilde\xi = \xi - \xi'$ as above, and write
$$\{\xi - m > r,\ \xi' \le m\} \cup \{\xi - m < -r,\ \xi' \ge m\} \subset \{|\tilde\xi| > r\} \subset \{|\xi| > r/2\} \cup \{|\xi'| > r/2\}.$$ □
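The two symmetrization inequalities can be verified exactly for a small discrete law. In the sketch below the law of $\xi$, the median $m = 1$, and the helper names `prob` and `prob_sym` are hypothetical choices for illustration; the distribution of $\xi - \xi'$ is obtained by enumerating pairs of values of two independent copies.

```python
from fractions import Fraction
from itertools import product

# Hypothetical law of xi; the value 1 is a median, since
# P{xi > 1} = 3/10 <= 1/2 and P{xi < 1} = 2/5 <= 1/2.
law = {-2: Fraction(1, 10), 0: Fraction(3, 10), 1: Fraction(3, 10), 4: Fraction(3, 10)}
median = 1

def prob(event):
    """P{event(xi)} for the law above."""
    return sum(p for v, p in law.items() if event(v))

def prob_sym(event):
    """P{event(xi - xi')} with xi' an independent copy of xi."""
    return sum(p * q for (v, p), (w, q) in product(law.items(), repeat=2) if event(v - w))

rows = []
for r in (0, Fraction(1, 2), 1, 2, 3, 5):
    left = prob(lambda v: abs(v - median) > r) / 2
    middle = prob_sym(lambda z: abs(z) > r)
    right = 2 * prob(lambda v: abs(v) > Fraction(r) / 2)
    rows.append((left, middle, right))
    print(r, left, middle, right)
```

Each row lists the three members of the inequality for one value of $r$, in the order in which they appear in the lemma.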
We also need a simple centering lemma.
Lemma 4.20 (centering) Let the random variables $\xi_1, \xi_2, \dots$ and constants
$c_1, c_2, \dots$ be such that both $\xi_n$ and $\xi_n + c_n$ converge in distribution.
Then even $c_n$ converges.
Proof: Assume that $\xi_n \xrightarrow{d} \xi$. If $c_n \to \pm\infty$ along some subsequence
$N' \subset \mathbb{N}$, then clearly $\xi_n + c_n \xrightarrow{P} \pm\infty$ along $N'$, which contradicts the
tightness of $\xi_n + c_n$. Thus, the $c_n$ are bounded. Now assume that $c_n \to a$ and
$c_n \to b$ along two subsequences $N_1, N_2 \subset \mathbb{N}$. Then $\xi_n + c_n \xrightarrow{d} \xi + a$ along
$N_1$ and $\xi_n + c_n \xrightarrow{d} \xi + b$ along $N_2$, so $\xi + a \overset{d}{=} \xi + b$. Iterating this
relation, we get $\xi + n(b - a) \overset{d}{=} \xi$ for arbitrary $n \in \mathbb{Z}$, which is impossible
unless $a = b$. Thus, all limit points of $(c_n)$ agree, and $c_n$ converges. □
Proof of Theorem 4.18: Assume conditions (i) through (iii), and define
$\xi'_n = \xi_n 1\{|\xi_n| \le 1\}$. By (iii) and Lemma 4.16 the series
$\sum_n (\xi'_n - E\xi'_n)$ converges a.s., so by (ii) the same thing is true for
$\sum_n \xi'_n$. Finally, $P\{\xi_n \ne \xi'_n \text{ i.o.}\} = 0$ by (i) and the Borel-Cantelli lemma,
so $\sum_n (\xi_n - \xi'_n)$ has a.s. finitely many nonzero terms. Hence, even
$\sum_n \xi_n$ converges a.s.
Conversely, assume that $\sum_n \xi_n$ converges in distribution. Then Lemma
4.19 shows that the sequence of symmetrized partial sums $\sum_{k \le n} \tilde\xi_k$ is tight,
and so $\sum_n \tilde\xi_n$ converges a.s. by Theorem 4.17. In particular, $\tilde\xi_n \to 0$ a.s. For
any $\varepsilon > 0$ we obtain $\sum_n P\{|\tilde\xi_n| > \varepsilon\} < \infty$ by the Borel-Cantelli lemma.
Hence, $\sum_n P\{|\xi_n - m_n| > \varepsilon\} < \infty$ by Lemma 4.19, where $m_1, m_2, \dots$
are medians of $\xi_1, \xi_2, \dots$. Using the Borel-Cantelli lemma again, we get
$\xi_n - m_n \to 0$ a.s.
Now let $c_1, c_2, \dots$ be arbitrary with $m_n - c_n \to 0$. Then even $\xi_n - c_n \to 0$
a.s. Putting $\eta_n = \xi_n 1\{|\xi_n - c_n| \le 1\}$, we get a.s. $\xi_n = \eta_n$ for all but
finitely many $n$, and similarly for the symmetrized variables $\tilde\xi_n$ and $\tilde\eta_n$.
Thus, even $\sum_n \tilde\eta_n$ converges a.s. Since the $\tilde\eta_n$ are bounded and symmetric,
Theorem 4.17 yields $\sum_n \mathrm{var}(\eta_n) = \frac{1}{2} \sum_n \mathrm{var}(\tilde\eta_n) < \infty$. Thus,
$\sum_n (\eta_n - E\eta_n)$ converges a.s. by Lemma 4.16, as does the series
$\sum_n (\xi_n - E\eta_n)$. Comparing with the distributional convergence of
$\sum_n \xi_n$, we conclude from Lemma 4.20 that $\sum_n E\eta_n$ converges. In particular,
$E\eta_n \to 0$ and $\eta_n - E\eta_n \to 0$ a.s., so $\eta_n \to 0$ a.s., and then also
$\xi_n \to 0$ a.s. Hence, $m_n \to 0$, so we may take $c_n = 0$ in the previous argument,
and conditions (i)-(iii) follow. □
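As a simple illustration of the three-series criterion, consider the hypothetical family $\xi_n = \vartheta_n n^{-a}$ with independent random signs $\vartheta_n$ and a fixed exponent $a > 0$. Here the first two series vanish identically, and the criterion reduces to the convergence of $\sum_n n^{-2a}$, which holds iff $a > \frac{1}{2}$. The sketch below tabulates partial sums of the variance series; finite partial sums cannot prove convergence or divergence, so the output is only suggestive.

```python
def three_series_terms(n, a):
    """Terms of the three series for xi_n = (random sign) * n**(-a), a > 0.

    Since |xi_n| = n**(-a) <= 1, the truncation at level 1 is inactive:
    P{|xi_n| > 1} = 0, E[xi_n; |xi_n| <= 1] = 0, var[xi_n; |xi_n| <= 1] = n**(-2a).
    """
    return 0.0, 0.0, n ** (-2.0 * a)

def variance_partial_sum(a, n_max):
    """Partial sum of the third series up to n_max."""
    return sum(three_series_terms(n, a)[2] for n in range(1, n_max + 1))

for a in (1.0, 0.5):
    print(a, [round(variance_partial_sum(a, 10 ** k), 3) for k in (2, 3, 4, 5)])
```

For $a = 1$ the partial sums stabilize below $\pi^2/6$, while for $a = \frac{1}{2}$ they grow like $\log n$.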
A sequence of random variables $\xi_1, \xi_2, \dots$ with partial sums $S_n$ is said to
obey the strong law of large numbers if $S_n/n$ converges a.s. to a constant.
The weak law is defined by the corresponding condition with convergence
in probability. The following elementary proposition enables us to convert
convergence results for random series into laws of large numbers.
Lemma 4.21 (series and averages, Kronecker) If $\sum_n n^{-c} a_n$ converges
for some $a_1, a_2, \dots \in \mathbb{R}$ and $c > 0$, then $n^{-c} \sum_{k \le n} a_k \to 0$.
Proof: Put $b_n = n^{-c} a_n$, and assume that $\sum_n b_n = b$. By dominated
convergence as $n \to \infty$,
$$\sum_{k \le n} b_k - n^{-c} \sum_{k \le n} a_k = \sum_{k \le n} (1 - (k/n)^c)\, b_k = c \sum_{k \le n} b_k \int_{k/n}^1 x^{c-1}\, dx$$
$$= c \int_0^1 x^{c-1}\, dx \sum_{k \le nx} b_k \to bc \int_0^1 x^{c-1}\, dx = b,$$
and the assertion follows since the first term on the left tends to $b$. □
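Kronecker's lemma can be illustrated numerically on a deterministic sequence. The sketch below is an assumption-laden example rather than part of the text: it takes $c = 1$ and $b_k = (-1)^{k+1}/\sqrt{k}$, a convergent alternating series, so that $a_k = k^c b_k = (-1)^{k+1}\sqrt{k}$ is unbounded while the normalized sums $n^{-c} \sum_{k \le n} a_k$ still tend to 0.

```python
import math

def kronecker_demo(n_max, c=1.0):
    """Return (partial sum of b_k, n^{-c} * partial sum of a_k) at n = n_max.

    Hypothetical example: b_k = (-1)**(k + 1) / sqrt(k), a convergent
    alternating series, and a_k = k**c * b_k.
    """
    series, weighted = 0.0, 0.0
    for k in range(1, n_max + 1):
        b_k = (-1) ** (k + 1) / math.sqrt(k)
        series += b_k
        weighted += k ** c * b_k
    return series, weighted / n_max ** c

for n in (10, 1_000, 100_000):
    print(n, kronecker_demo(n))
```

The first component settles near the sum of the series, and the second shrinks roughly like $n^{-1/2}$.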
The following simple result illustrates the method.
Corollary 4.22 (variance criterion for averages, Kolmogorov) Let
$\xi_1, \xi_2, \dots$ be independent random variables with zero mean such that
$\sum_n n^{-2c} E\xi_n^2 < \infty$ for some $c > 0$. Then $n^{-c} \sum_{k \le n} \xi_k \to 0$ a.s.
Proof: The series $\sum_n n^{-c} \xi_n$ converges a.s. by Lemma 4.16, and the
assertion follows by Lemma 4.21. □
In particular, we note that if $\xi, \xi_1, \xi_2, \dots$ are i.i.d. with $E\xi = 0$ and
$E\xi^2 < \infty$, then $n^{-c} \sum_{k \le n} \xi_k \to 0$ a.s. for any $c > \frac{1}{2}$. The statement
fails for $c = \frac{1}{2}$, as may be seen by taking $\xi$ to be $N(0, 1)$. The best possible
normalization is given in Corollary 14.8. The next result characterizes the
stated convergence for arbitrary $c > \frac{1}{2}$. For $c = 1$ we recognize the strong
law of large numbers. Corresponding criteria for the weak law are given in
Theorem 5.16.
Theorem 4.23 (strong laws of large numbers, Kolmogorov, Marcinkiewicz
and Zygmund) Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. random variables, and fix any
$p \in (0, 2)$. Then $n^{-1/p} \sum_{k \le n} \xi_k$ converges a.s. iff $E|\xi|^p < \infty$ and either
$p \le 1$ or $E\xi = 0$. In that case the limit equals $E\xi$ for $p = 1$ and is otherwise 0.
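Proof: Assume that $E|\xi|^p < \infty$ and also, for $p \ge 1$, that $E\xi = 0$. Define
$\xi'_n = \xi_n 1\{|\xi_n| \le n^{1/p}\}$, and note that by Lemma 3.4
$$\sum_n P\{\xi'_n \ne \xi_n\} = \sum_n P\{|\xi|^p > n\} \le \int_0^\infty P\{|\xi|^p > t\}\, dt = E|\xi|^p < \infty.$$
By the Borel-Cantelli lemma we get $P\{\xi'_n \ne \xi_n \text{ i.o.}\} = 0$, and so $\xi'_n = \xi_n$
for all but finitely many $n \in \mathbb{N}$ a.s. It is then equivalent to show that
$n^{-1/p} \sum_{k \le n} \xi'_k \to 0$ a.s. By Lemma 4.21 it suffices to prove instead that
$\sum_n n^{-1/p} \xi'_n$ converges a.s.
For $p < 1$, this is clear if we write (with $\lesssim$ denoting inequality up to a
constant factor)
$$E \sum_n n^{-1/p} |\xi'_n| = \sum_n n^{-1/p} E[|\xi|;\ |\xi| \le n^{1/p}] \lesssim \int_0^\infty t^{-1/p} E[|\xi|;\ |\xi| \le t^{1/p}]\, dt$$
$$= E\Big[|\xi| \int_{|\xi|^p}^\infty t^{-1/p}\, dt\Big] \lesssim E|\xi|^p < \infty.$$
If instead $p > 1$, it suffices by Theorem 4.18 to prove that $\sum_n n^{-1/p} E\xi'_n$
converges and $\sum_n n^{-2/p}\, \mathrm{var}(\xi'_n) < \infty$. Since
$E\xi'_n = -E[\xi;\ |\xi| > n^{1/p}]$, we have for the former series
$$\sum_n n^{-1/p} |E\xi'_n| \le \sum_n n^{-1/p} E[|\xi|;\ |\xi| > n^{1/p}] \lesssim \int_0^\infty t^{-1/p} E[|\xi|;\ |\xi| > t^{1/p}]\, dt$$
$$= E\Big[|\xi| \int_0^{|\xi|^p} t^{-1/p}\, dt\Big] \lesssim E|\xi|^p < \infty.$$
As for the latter series, we get
$$\sum_n n^{-2/p}\, \mathrm{var}(\xi'_n) \le \sum_n n^{-2/p} E(\xi'_n)^2 = \sum_n n^{-2/p} E[\xi^2;\ |\xi| \le n^{1/p}]$$
$$\lesssim \int_0^\infty t^{-2/p} E[\xi^2;\ |\xi| \le t^{1/p}]\, dt = E\Big[\xi^2 \int_{|\xi|^p}^\infty t^{-2/p}\, dt\Big] \lesssim E|\xi|^p < \infty.$$
If $p = 1$, then $E\xi'_n = E[\xi;\ |\xi| \le n] \to 0$ by dominated convergence. Thus,
$n^{-1} \sum_{k \le n} E\xi'_k \to 0$, and we may prove instead that
$n^{-1} \sum_{k \le n} \xi''_k \to 0$ a.s., where $\xi''_n = \xi'_n - E\xi'_n$. By Lemma 4.21 and
Theorem 4.18 it is then enough to show that $\sum_n n^{-2}\, \mathrm{var}(\xi'_n) < \infty$, which may
be seen as before.
Conversely, assume that $n^{-1/p} S_n = n^{-1/p} \sum_{k \le n} \xi_k$ converges a.s. Then
$$\frac{\xi_n}{n^{1/p}} = \frac{S_n}{n^{1/p}} - \Big(\frac{n - 1}{n}\Big)^{1/p} \frac{S_{n-1}}{(n - 1)^{1/p}} \to 0 \quad \text{a.s.},$$
and in particular $P\{|\xi_n|^p > n \text{ i.o.}\} = 0$. Hence, by Lemma 3.4 and the
Borel-Cantelli lemma,
$$E|\xi|^p = \int_0^\infty P\{|\xi|^p > t\}\, dt \le 1 + \sum_{n \ge 1} P\{|\xi|^p > n\} < \infty.$$
For $p > 1$, the direct assertion yields $n^{-1/p}(S_n - nE\xi) \to 0$ a.s., and so
$n^{1 - 1/p} E\xi$ converges, which implies $E\xi = 0$. □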
For a simple application of the law of large numbers, consider an arbitrary
sequence of random variables $\xi_1, \xi_2, \dots$, and define the associated empirical
distributions as the random probability measures
$\hat\mu_n = n^{-1} \sum_{k \le n} \delta_{\xi_k}$. The corresponding empirical distribution functions
$\hat F_n$ are given by
$$\hat F_n(x) = \hat\mu_n(-\infty, x] = n^{-1} \sum_{k \le n} 1\{\xi_k \le x\}, \quad x \in \mathbb{R},\ n \in \mathbb{N}.$$
Proposition 4.24 (empirical distribution functions, Glivenko, Cantelli)
Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with distribution function $F$ and
empirical distribution functions $\hat F_1, \hat F_2, \dots$. Then
$$\lim_{n \to \infty} \sup_x |\hat F_n(x) - F(x)| = 0 \quad \text{a.s.} \tag{7}$$
Proof: By the law of large numbers we have Fn(x) -+ l(x) a.s. for every
x E JR. Now fix a finite 1?artition -00 == Xl < X2 < . . . < X m == 00. By the
monotonicity of F and Fn
sup IFn{x) - F(x)1 < max IFn(Xk) - F(Xk)1 + max IF(;rk+l) - F(Xk)l.
x k k
Letting n -+ 00 and refining the partition indefinitely, we get in the limit
limsup sup IFn(x) - F(x)1 < sup F(x) a.s.,
noo x x
which proves (7) when F is continuous.
For general $F$, let $\vartheta_1, \vartheta_2, \dots$ be i.i.d. $U(0,1)$, and define $\eta_n = g(\vartheta_n)$ for
each $n$, where $g(t) = \sup\{x; F(x) < t\}$. Then $\eta_n \le x$ iff $\vartheta_n \le F(x)$, and so
$(\eta_n) \overset{d}{=} (\xi_n)$. We may then assume that $\xi_n \equiv \eta_n$. Writing $\hat G_1, \hat G_2, \dots$ for the
empirical distribution functions of $\vartheta_1, \vartheta_2, \dots$, we see that also $\hat F_n = \hat G_n \circ F$.
Writing $A = F(\mathbb{R})$ and using the result for continuous $F$, we get a.s.
$$\sup_x |\hat F_n(x) - F(x)| = \sup_{t \in A} |\hat G_n(t) - t| \le \sup_{t \in [0,1]} |\hat G_n(t) - t| \to 0. \qquad \Box$$
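As a numerical illustration of (7), the following minimal Python sketch computes the uniform distance between the empirical and the true distribution function for i.i.d. $U(0,1)$ samples, where $F(x) = x$. It is not part of the original text: the function name `sup_distance_uniform`, the choice of the uniform law, and the fixed seed are assumptions made for the example.

```python
import random


def sup_distance_uniform(n, seed=0):
    """Return sup_x |F_n(x) - F(x)| for n i.i.d. U(0,1) samples, where F(x) = x."""
    rng = random.Random(seed)
    xs = sorted(rng.random() for _ in range(n))
    # For continuous F the supremum is attained at a sample point, either
    # just before or just after a jump of the empirical distribution function.
    return max(max((k + 1) / n - x, x - k / n) for k, x in enumerate(xs))


for n in (10, 100, 1000, 10000):
    print(n, sup_distance_uniform(n))
```

The printed distances should shrink as $n$ grows, in line with the a.s. uniform convergence asserted in Proposition 4.24; a single seeded run is of course only an illustration, not a proof.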
We turn to a systematic study of convergence in distribution. Although
we are currently mostly interested in distributions on Euclidean spaces, it
is crucial for future applications that we consider the more general setting
of an abstract metric space. In particular, the theory is applied in Chapter
16 to random elements in various function spaces.
Theorem 4.25 (Portmanteau theorem, Alexandrov) For any random
elements $\xi, \xi_1, \xi_2, \dots$ in a metric space $S$, these conditions are equivalent:
(i) $\xi_n \xrightarrow{d} \xi$;
(ii) $\liminf_n P\{\xi_n \in G\} \ge P\{\xi \in G\}$ for any open set $G \subset S$;
(iii) $\limsup_n P\{\xi_n \in F\} \le P\{\xi \in F\}$ for any closed set $F \subset S$;
(iv) $P\{\xi_n \in B\} \to P\{\xi \in B\}$ for any $B \in \mathcal{B}(S)$ with $\xi \notin \partial B$ a.s.
A set $B \in \mathcal{B}(S)$ with $\xi \notin \partial B$ a.s. is often called a $\xi$-continuity set.
Proof: Assume (i), and fix any open set $G \subset S$. Letting $f$ be continuous
with $0 \le f \le 1_G$, we get $Ef(\xi_n) \le P\{\xi_n \in G\}$, and (ii) follows as we let
$n \to \infty$ and then $f \uparrow 1_G$. The equivalence between (ii) and (iii) is clear
from taking complements. Now assume (ii) and (iii). For any $B \in \mathcal{B}(S)$,
$$P\{\xi \in B^\circ\} \le \liminf_{n \to \infty} P\{\xi_n \in B\} \le \limsup_{n \to \infty} P\{\xi_n \in B\} \le P\{\xi \in \bar B\}.$$
Here the extreme members agree when $\xi \notin \partial B$ a.s., and (iv) follows.
Conversely, assume (iv) and fix any closed set $F \subset S$. Write $F^\varepsilon = \{s \in S;
\rho(s, F) \le \varepsilon\}$. Then the sets $\partial F^\varepsilon \subset \{s; \rho(s, F) = \varepsilon\}$ are disjoint, and so
$\xi \notin \partial F^\varepsilon$ for almost every $\varepsilon > 0$. For such an $\varepsilon$ we may write $P\{\xi_n \in F\} \le
P\{\xi_n \in F^\varepsilon\}$, and (iii) follows as we let $n \to \infty$ and then $\varepsilon \to 0$. Finally,
assume (ii) and let $f \ge 0$ be continuous. By Lemma 3.4 and Fatou's lemma,
$$Ef(\xi) = \int_0^\infty P\{f(\xi) > t\}\,dt \le \int_0^\infty \liminf_{n \to \infty} P\{f(\xi_n) > t\}\,dt \le \liminf_{n \to \infty} \int_0^\infty P\{f(\xi_n) > t\}\,dt = \liminf_{n \to \infty} Ef(\xi_n). \tag{8}$$
Now let $f$ be continuous with $|f| \le c < \infty$. Applying (8) to $c \pm f$ yields
$Ef(\xi_n) \to Ef(\xi)$, which proves (i). □
For an easy application, we insert a simple lemma that is needed in
Chapter 16.
Lemma 4.26 (subspaces) For any metric space $(S, \rho)$ with subspace $A \subset
S$, let $\xi, \xi_1, \xi_2, \dots$ be random elements in $(A, \rho)$. Then $\xi_n \xrightarrow{d} \xi$ in $(A, \rho)$ iff
the same convergence holds in $(S, \rho)$.
Proof: Since $\xi, \xi_1, \xi_2, \dots \in A$, condition (ii) of Theorem 4.25 is equivalent
to
$$\liminf_{n \to \infty} P\{\xi_n \in A \cap G\} \ge P\{\xi \in A \cap G\}, \quad G \subset S \text{ open}.$$
By Lemma 1.6, this is precisely condition (ii) of Theorem 4.25 for the
subspace $A$. □
It is clear directly from the definitions that convergence in distribution is
preserved by continuous mappings. The following more general statement
is a key result of weak convergence theory.
Theorem 4.27 (continuous mapping, Mann and Wald, Prohorov, Rubin)
For any metric spaces $S$ and $T$, let $\xi, \xi_1, \xi_2, \dots$ be random elements in $S$
with $\xi_n \xrightarrow{d} \xi$, and consider some measurable mappings $f, f_1, f_2, \dots : S \to T$
and a measurable set $C \subset S$ with $\xi \in C$ a.s. such that $f_n(s_n) \to f(s)$ as
$s_n \to s \in C$. Then $f_n(\xi_n) \xrightarrow{d} f(\xi)$.
In particular, we note that if $\xi_n \xrightarrow{d} \xi$ in $S$ and $f : S \to T$ is a.s. continuous
at $\xi$, then $f(\xi_n) \xrightarrow{d} f(\xi)$. This frequently used statement is commonly
referred to as the continuous mapping theorem.
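Proof: Fix any open set $G \subset T$, and let $s \in f^{-1}G \cap C$. By hypothesis
there exist an integer $m \in \mathbb{N}$ and some neighborhood $N$ of $s$ such that
$f_k(s') \in G$ for all $k \ge m$ and $s' \in N$. Thus, $N \subset \bigcap_{k \ge m} f_k^{-1}G$, and so
$$f^{-1}G \cap C \subset \bigcup_m \Big\{\bigcap_{k \ge m} f_k^{-1}G\Big\}^\circ.$$
Now let $\mu, \mu_1, \mu_2, \dots$ denote the distributions of $\xi, \xi_1, \xi_2, \dots$. By Theorem
4.25 we get
$$\mu(f^{-1}G) \le \mu \bigcup_m \Big\{\bigcap_{k \ge m} f_k^{-1}G\Big\}^\circ = \sup_m \mu \Big\{\bigcap_{k \ge m} f_k^{-1}G\Big\}^\circ \le \sup_m \liminf_n \mu_n \bigcap_{k \ge m} f_k^{-1}G \le \liminf_n \mu_n(f_n^{-1}G).$$
Using the same theorem again gives $\mu_n \circ f_n^{-1} \xrightarrow{w} \mu \circ f^{-1}$, which means that
$f_n(\xi_n) \xrightarrow{d} f(\xi)$. □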
We will now prove an equally useful approximation theorem. Here the
idea is to prove $\xi_n \xrightarrow{d} \xi$ by choosing approximations $\eta_n$ of $\xi_n$ and $\eta$ of $\xi$
such that $\eta_n \xrightarrow{d} \eta$. The desired convergence will follow if we can ensure that
the approximation errors are uniformly small.
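Theorem 4.28 (approximation) Let $\xi$, $\xi_n$, $\eta^k$, and $\eta_n^k$ be random elements
in a metric space $(S, \rho)$ such that $\eta_n^k \xrightarrow{d} \eta^k$ as $n \to \infty$ for fixed $k$ and also
$\eta^k \xrightarrow{d} \xi$. Then $\xi_n \xrightarrow{d} \xi$ holds under the further condition
$$\lim_{k \to \infty} \limsup_{n \to \infty} E[\rho(\eta_n^k, \xi_n) \wedge 1] = 0. \tag{9}$$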
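Proof: For any closed set $F \subset S$ and constant $\varepsilon > 0$ we have
$$P\{\xi_n \in F\} \le P\{\eta_n^k \in F^\varepsilon\} + P\{\rho(\eta_n^k, \xi_n) > \varepsilon\},$$
where $F^\varepsilon = \{s \in S; \rho(s, F) \le \varepsilon\}$. By Theorem 4.25 we get as $n \to \infty$
$$\limsup_{n \to \infty} P\{\xi_n \in F\} \le P\{\eta^k \in F^\varepsilon\} + \limsup_{n \to \infty} P\{\rho(\eta_n^k, \xi_n) > \varepsilon\}.$$
Now let $k \to \infty$, and conclude from Theorem 4.25 together with (9) that
$$\limsup_{n \to \infty} P\{\xi_n \in F\} \le P\{\xi \in F^\varepsilon\}.$$
As $\varepsilon \to 0$, the right-hand side tends to $P\{\xi \in F\}$. Since $F$ was arbitrary,
we get $\xi_n \xrightarrow{d} \xi$ by Theorem 4.25. □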
Next we consider convergence in distribution on product spaces.
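Theorem 4.29 (random sequences) For any separable metric spaces
$S_1, S_2, \dots$, let $\xi = (\xi^1, \xi^2, \dots)$ and $\xi_n = (\xi_n^1, \xi_n^2, \dots)$, $n \in \mathbb{N}$, be random
elements in $\mathsf{X}_k S_k$. Then $\xi_n \xrightarrow{d} \xi$ iff for any functions $f_k \in C_b(S_k)$,
$$E[f_1(\xi_n^1) \cdots f_m(\xi_n^m)] \to E[f_1(\xi^1) \cdots f_m(\xi^m)], \quad m \in \mathbb{N}. \tag{10}$$
In particular, we note that $\xi_n \xrightarrow{d} \xi$ follows from the finite-dimensional
convergence
$$(\xi_n^1, \dots, \xi_n^m) \xrightarrow{d} (\xi^1, \dots, \xi^m), \quad m \in \mathbb{N}. \tag{11}$$
If $\xi$ and the $\xi_n$ have independent components, it is even sufficient that
$\xi_n^k \xrightarrow{d} \xi^k$ for every $k$.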
Proof: The necessity of the condition is clear from the continuity of the
projections $s \mapsto s_k$. To prove the sufficiency, we first assume that (10)
holds for a fixed $m$. Writing $\mathcal{S}_k = \{B \in \mathcal{B}(S_k);\ \xi^k \notin \partial B \text{ a.s.}\}$ and applying
Theorem 4.25 $m$ times, we obtain
$$P\{(\xi_n^1, \dots, \xi_n^m) \in B\} \to P\{(\xi^1, \dots, \xi^m) \in B\}, \tag{12}$$
for any set $B = B^1 \times \cdots \times B^m$ such that $B^k \in \mathcal{S}_k$ for all $k$. Since the $S_k$ are
separable, we may choose some countable bases $\mathcal{C}_k \subset \mathcal{S}_k$, and we note that
$\mathcal{C}_1 \times \cdots \times \mathcal{C}_m$ is then a countable base in $S_1 \times \cdots \times S_m$. Hence, any open
set $G \subset S_1 \times \cdots \times S_m$ can be written as a countable union of measurable
rectangles $B_j = B_j^1 \times \cdots \times B_j^m$ with $B_j^k \in \mathcal{S}_k$ for all $k$. Since the $\mathcal{S}_k$ are
fields, we may easily reduce to the case when the sets $B_j$ are disjoint. By
Fatou's lemma and (12) we obtain
$$\liminf_{n \to \infty} P\{(\xi_n^1, \dots, \xi_n^m) \in G\} = \liminf_{n \to \infty} \sum_j P\{(\xi_n^1, \dots, \xi_n^m) \in B_j\} \ge \sum_j P\{(\xi^1, \dots, \xi^m) \in B_j\} = P\{(\xi^1, \dots, \xi^m) \in G\},$$
and so (11) holds by Theorem 4.25.
To see that (11) implies $\xi_n \xrightarrow{d} \xi$, fix any $a_k \in S_k$, $k \in \mathbb{N}$, and note that
the mapping $(s_1, \dots, s_m) \mapsto (s_1, \dots, s_m, a_{m+1}, a_{m+2}, \dots)$ is continuous on
$S_1 \times \cdots \times S_m$ for each $m \in \mathbb{N}$. By (11) it follows that
$$(\xi_n^1, \dots, \xi_n^m, a_{m+1}, \dots) \xrightarrow{d} (\xi^1, \dots, \xi^m, a_{m+1}, \dots), \quad m \in \mathbb{N}. \tag{13}$$
Writing $\eta_n^m$ and $\eta^m$ for the sequences in (13) and letting $\rho$ be the metric in
(2), we also note that $\rho(\xi, \eta^m) \le 2^{-m}$ and $\rho(\xi_n, \eta_n^m) \le 2^{-m}$ for all $m$ and
$n$. The convergence $\xi_n \xrightarrow{d} \xi$ now follows by Theorem 4.28. □
In discussions involving distributional convergence of a random sequence
$\xi_1, \xi_2, \dots$, the relationship between the elements $\xi_n$ is often irrelevant. It is
then natural to look for a more convenient representation, which may lead
to simpler and more transparent proofs.
Theorem 4.30 (coupling, Skorohod, Dudley) Let $\xi, \xi_1, \xi_2, \dots$ be random
elements in a separable metric space $(S, \rho)$ such that $\xi_n \xrightarrow{d} \xi$. Then there
exists a probability space with some random elements $\eta \overset{d}{=} \xi$ and $\eta_n \overset{d}{=} \xi_n$,
$n \in \mathbb{N}$, such that $\eta_n \to \eta$ a.s.
In the course of the proof, we need to introduce families of independent
random elements with given distributions. The existence of such families is
ensured, in general, by Corollary 6.18. When S is complete, we may instead
rely on the more elementary Theorem 3.19.
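Proof: First assume that $S = \{1, \dots, m\}$, and put $p_k = P\{\xi = k\}$ and
$p_k^n = P\{\xi_n = k\}$. Assuming $\vartheta$ to be $U(0,1)$ and independent of $\xi$, we may
easily construct some random elements $\tilde\xi_n \overset{d}{=} \xi_n$ such that $\tilde\xi_n = k$ whenever
$\xi = k$ and $\vartheta \le p_k^n / p_k$. Since $p_k^n \to p_k$ for each $k$, we get $\tilde\xi_n \to \xi$ a.s.
For general $S$, fix any $p \in \mathbb{N}$, and choose a partition of $S$ into $\xi$-continuity
sets $B_1, B_2, \dots \in \mathcal{B}(S)$ of diameter $< 2^{-p}$. Next choose $m$ so large that
$P\{\xi \notin \bigcup_{k \le m} B_k\} < 2^{-p}$, and put $B_0 = \bigcap_{k \le m} B_k^c$. For $k = 0, \dots, m$, define
$\kappa = k$ when $\xi \in B_k$ and $\kappa_n = k$ when $\xi_n \in B_k$, $n \in \mathbb{N}$. Then $\kappa_n \xrightarrow{d} \kappa$, and
by the result for finite $S$ we may choose some $\tilde\kappa_n \overset{d}{=} \kappa_n$ with $\tilde\kappa_n \to \kappa$ a.s.
Let us further introduce some independent random elements $\zeta_n^k$ in $S$ with
distributions $P[\xi_n \in \cdot \mid \xi_n \in B_k]$ and define $\tilde\xi_n^p = \sum_k \zeta_n^k 1\{\tilde\kappa_n = k\}$, so that
$\tilde\xi_n^p \overset{d}{=} \xi_n$ for each $n$.
From the construction it is clear that
$$\{\rho(\tilde\xi_n^p, \xi) > 2^{-p}\} \subset \{\tilde\kappa_n \ne \kappa\} \cup \{\xi \in B_0\}, \quad n, p \in \mathbb{N}.$$
Since $\tilde\kappa_n \to \kappa$ a.s. and $P\{\xi \in B_0\} < 2^{-p}$, there exists for every $p$ some
$n_p \in \mathbb{N}$ with
$$P \bigcup_{n \ge n_p} \{\rho(\tilde\xi_n^p, \xi) > 2^{-p}\} < 2^{-p}, \quad p \in \mathbb{N},$$
and we may further assume that $n_1 < n_2 < \cdots$. By the Borel-Cantelli
lemma we get a.s. $\sup_{n \ge n_p} \rho(\tilde\xi_n^p, \xi) \le 2^{-p}$ for all but finitely many $p$. Now
define $\eta_n = \tilde\xi_n^p$ for $n_p \le n < n_{p+1}$, and note that $\xi_n \overset{d}{=} \eta_n \to \xi$ a.s. □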
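For real-valued random variables the coupling can be realized much more simply through quantile functions applied to one common $U(0,1)$ variable (compare the hint to Exercise 28). The following minimal Python sketch illustrates this for exponential laws with rates $1 + 1/n \to 1$. It is only an illustration under these assumptions, not the construction used in the proof; the function name `exp_quantile` and the chosen rates are hypothetical.

```python
import math
import random


def exp_quantile(u, rate):
    """Quantile function F^{-1}(u) = -log(1 - u) / rate of the exponential law."""
    return -math.log(1.0 - u) / rate


rng = random.Random(1)
u = rng.random()                # one U(0,1) draw shared by all n
eta = exp_quantile(u, 1.0)      # eta has the limiting law Exp(1)
for n in (1, 10, 100, 1000):
    eta_n = exp_quantile(u, 1.0 + 1.0 / n)  # eta_n has law Exp(1 + 1/n)
    print(n, abs(eta_n - eta))
```

Each `eta_n` has the prescribed marginal law, and since all of them are built from the same uniform variable, the differences `abs(eta_n - eta)` tend to 0 for every outcome of `u`.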
We conclude this chapter with a result on functional representations of
limits, needed in Chapters 17 and 21. To motivate the problem, recall from
Lemma 4.6 that if $\xi_n \xrightarrow{P} \eta$ for some random elements in a complete metric
space $S$, then $\eta = f(\xi)$ a.s. for some measurable function $f : S^\infty \to S$,
where $\xi = (\xi_n)$. Here $f$ depends on the distribution $\mu$ of $\xi$, so a universal
representation must be of the form $\eta = f(\xi, \mu)$. For certain purposes, it
is crucial to choose a measurable version even of the latter function. To
allow constructions by repeated approximation in probability, we need to
consider the more general case when $\eta_n \xrightarrow{P} \eta$ for some random elements
$\eta_n = f_n(\xi, \mu)$.
For a precise statement of the result, let $\mathcal{P}(S)$ denote the space of probability
measures $\mu$ on $S$, endowed with the $\sigma$-field induced by all evaluation
maps $\mu \mapsto \mu B$, $B \in \mathcal{B}(S)$.
Proposition 4.31 (representation of limits) Fix a complete metric space
$(S, \rho)$, a measurable space $U$, and some measurable functions $f_1, f_2, \dots : U \times
\mathcal{P}(U) \to S$. Then there exist a measurable set $A \subset \mathcal{P}(U)$ and a measurable
function $f : U \times A \to S$ such that, whenever $\xi$ is a random element in $U$
with distribution $\mu$, the sequence $\eta_n = f_n(\xi, \mu)$ converges in probability iff
$\mu \in A$, in which case the limit equals $f(\xi, \mu)$.
Proof: For sequences $s = (s_1, s_2, \dots)$ in $S$, define $l(s) = \lim_k s_k$ when
the limit exists and put $l(s) = s_\infty$ otherwise, where $s_\infty \in S$ is arbitrary.
By Lemma 1.10 we note that $l$ is a measurable mapping from $S^\infty$ to $S$.
Next consider a sequence $\eta = (\eta_1, \eta_2, \dots)$ of random elements in $S$, and
put $\nu = \mathcal{L}(\eta)$. Define $n_1, n_2, \dots$ as in the proof of Lemma 4.6, and note
that each $n_k = n_k(\nu)$ is a measurable function of $\nu$. Let $C$ be the set of
measures $\nu$ such that $n_k(\nu) < \infty$ for all $k$, and note that $\eta_n$ converges in
probability iff $\nu \in C$. Introduce the measurable function
$$g(s, \nu) = l(s_{n_1(\nu)}, s_{n_2(\nu)}, \dots), \quad s = (s_1, s_2, \dots) \in S^\infty,\ \nu \in \mathcal{P}(S^\infty).$$
If $\nu \in C$, we see from the proof of Lemma 4.6 that $\eta_{n_k(\nu)}$ converges a.s.,
and so $\eta_n \xrightarrow{P} g(\eta, \nu)$.
Now assume that $\eta_n = f_n(\xi, \mu)$ for some random element $\xi$ in $U$ with
distribution $\mu$ and some measurable functions $f_n$. It remains to show that
$\nu$ is a measurable function of $\mu$. But this is clear from Lemma 1.41 (ii)
applied to the kernel $K(\mu, \cdot) = \mu$ from $\mathcal{P}(U)$ to $U$ and the function $F =
(f_1, f_2, \dots) : U \times \mathcal{P}(U) \to S^\infty$. □
As a simple consequence, we may consider limits in probability of
measurable processes. The resulting statement will be useful in Chapter
17.
Corollary 4.32 (measurability of limits, Stricker and Yor) For any measurable
space $T$ and complete metric space $S$, let $X^1, X^2, \dots$ be $S$-valued
measurable processes on $T$. Then there exist a measurable set $A \subset T$ and a
measurable process $X$ on $A$ such that $X_t^n$ converges in probability iff $t \in A$,
in which case $X_t^n \xrightarrow{P} X_t$.
Proof: Define $\xi_t = (X_t^1, X_t^2, \dots)$ and $\mu_t = \mathcal{L}(\xi_t)$. By Proposition 4.31
there exist a measurable set $C \subset \mathcal{P}(S^\infty)$ and a measurable function $f :
S^\infty \times C \to S$ such that $X_t^n$ converges in probability iff $\mu_t \in C$, in which case
$X_t^n \xrightarrow{P} f(\xi_t, \mu_t)$. It remains to note that the mapping $t \mapsto \mu_t$ is measurable,
which is clear from Lemmas 1.4 and 1.26. □
Exercises
1. Let $\xi_1, \dots, \xi_n$ be independent symmetric random variables. Show that
$P\{(\sum_k \xi_k)^2 \ge r \sum_k \xi_k^2\} \ge (1 - r)^2/3$ for any $r \in (0,1)$. (Hint: Reduce by
means of Lemma 3.11 to the case of nonrandom $|\xi_k|$, and use Lemma 4.1.)
2. Let $\xi_1, \dots, \xi_n$ be independent symmetric random variables. Show that
$P\{\max_k |\xi_k| > r\} \le 2P\{|S| > r\}$ for all $r > 0$, where $S = \sum_k \xi_k$. (Hint:
Let $\eta$ be the first term $\xi_k$ where $\max_k |\xi_k|$ is attained, and check that
$(\eta, S - \eta) \overset{d}{=} (\eta, \eta - S)$.)
3. Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with $P\{|\xi_n| > t\} > 0$ for all
$t > 0$. Show that there exist some constants $c_1, c_2, \dots$ such that $c_n \xi_n \to 0$
in probability but not a.s.
4. Show that a family of random variables $\xi_t$ is tight iff $\sup_t Ef(|\xi_t|) < \infty$
for some increasing function $f : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(\infty) = \infty$.
5. Consider some random variables $\xi_n$ and $\eta_n$ such that $(\xi_n)$ is tight and
$\eta_n \xrightarrow{P} 0$. Show that even $\xi_n \eta_n \xrightarrow{P} 0$.
6. Show that the random variables $\xi_t$ are uniformly integrable iff $\sup_t
Ef(|\xi_t|) < \infty$ for some increasing function $f : \mathbb{R}_+ \to \mathbb{R}_+$ with $f(x)/x \to \infty$
as $x \to \infty$.
7. Show that the condition $\sup_t E|\xi_t| < \infty$ in Lemma 4.10 can be omitted
if $\mathcal{A}$ is nonatomic.
8. Let $\xi_1, \xi_2, \dots \in L^1$. Show that the $\xi_n$ are uniformly integrable iff the
condition in Lemma 4.10 holds with $\sup_n$ replaced by $\limsup_n$.
9. Deduce the dominated convergence theorem from Lemma 4.11.
10. Show that if $\{|\xi_t|^p\}$ and $\{|\eta_t|^p\}$ are uniformly integrable for some $p > 0$,
then so is $\{|a\xi_t + b\eta_t|^p\}$ for any $a, b \in \mathbb{R}$. (Hint: Use Lemma 4.10.) Use this
fact to deduce Proposition 4.12 from Lemma 4.11.
11. Give examples of random variables $\xi, \xi_1, \xi_2, \dots \in L^2$ such that $\xi_n \to \xi$
holds a.s. but not in $L^2$, in $L^2$ but not a.s., or in $L^1$ but not in $L^2$.
12. Let $\xi_1, \xi_2, \dots$ be independent random variables in $L^2$. Show that $\sum_n \xi_n$
converges in $L^2$ iff $\sum_n E\xi_n$ and $\sum_n \mathrm{var}(\xi_n)$ both converge.
13. Give an example of independent symmetric random variables $\xi_1, \xi_2, \dots$
such that $\sum_n \xi_n$ is a.s. conditionally (nonabsolutely) convergent.
14. Let $\xi_n$ and $\eta_n$ be symmetric random variables with $|\xi_n| \le |\eta_n|$ such that
the pairs $(\xi_n, \eta_n)$ are independent. Show that $\sum_n \xi_n$ converges whenever
$\sum_n \eta_n$ does.
15. Let $\xi_1, \xi_2, \dots$ be independent symmetric random variables. Show that
$E[(\sum_n \xi_n)^2 \wedge 1] \le \sum_n E[\xi_n^2 \wedge 1]$ whenever the latter series converges. (Hint:
Integrate over the sets where $\sup_n |\xi_n| \le 1$ or $> 1$, respectively.)
16. Consider some independent sequences of symmetric random variables
$\xi_k, \eta_k^1, \eta_k^2, \dots$ with $|\eta_k^n| \le |\xi_k|$ such that $\sum_k \xi_k$ converges, and assume $\eta_k^n \xrightarrow{P}
\eta_k$ for each $k$. Show that $\sum_k \eta_k^n \xrightarrow{P} \sum_k \eta_k$. (Hint: Use a truncation based
on the preceding exercise.)
17. Let $\sum_n \xi_n$ be a convergent series of independent random variables.
Show that the sum is a.s. independent of the order of terms iff $\sum_n |E[\xi_n;
|\xi_n| \le 1]| < \infty$.
18. Let the random variables $\xi_{nj}$ be symmetric and independent for each
$n$. Show that $\sum_j \xi_{nj} \xrightarrow{P} 0$ iff $\sum_j E[\xi_{nj}^2 \wedge 1] \to 0$.
19. Let $\xi_n \xrightarrow{d} \xi$ and $a_n \xi_n \xrightarrow{d} \xi$ for some nondegenerate random variable
$\xi$ and some constants $a_n > 0$. Show that $a_n \to 1$. (Hint: Turning to
subsequences, we may assume that $a_n \to a$.)
20. Let $\xi_n \xrightarrow{d} \xi$ and $a_n \xi_n + b_n \xrightarrow{d} \xi$ for some nondegenerate random variable
$\xi$, where $a_n > 0$. Show that $a_n \to 1$ and $b_n \to 0$. (Hint: Symmetrize.)
21. Let $\xi_1, \xi_2, \dots$ be independent random variables such that $a_n \sum_{k \le n} \xi_k$
converges in probability for some constants $a_n \to 0$. Show that the limit is
degenerate.
22. Show that Theorem 4.23 is false for $p = 2$ by taking the $\xi_k$ to be
independent and $N(0,1)$.
23. Let $\xi_1, \xi_2, \dots$ be i.i.d. and such that $n^{-1/p} \sum_{k \le n} \xi_k$ is a.s. bounded for
some $p \in (0,2)$. Show that $E|\xi_1|^p < \infty$. (Hint: Argue as in the proof of
Theorem 4.23.)
24. Show for $p \le 1$ that the a.s. convergence in Theorem 4.23 remains valid
in $L^p$. (Hint: Truncate the $\xi_k$.)
25. Give an elementary proof of the strong law of large numbers when
$E|\xi|^4 < \infty$. (Hint: Assuming $E\xi = 0$, show that $E \sum_n (S_n/n)^4 < \infty$.)
26. Show by examples that Theorem 4.25 is false without the stated
restrictions on the sets $G$, $F$, and $B$.
27. Use Theorem 4.30 to give a simple proof of Theorem 4.27 when $S$ is
separable. Generalize to random elements $\xi$ and $\xi_n$ in Borel sets $C$ and $C_n$,
respectively, assuming only $f_n(x_n) \to f(x)$ for $x_n \in C_n$ and $x \in C$ with
$x_n \to x$. Extend the original proof to that case.
28. Give a short proof of Theorem 4.30 when $S = \mathbb{R}$. (Hint: Note that the
distribution functions $F_n$ and $F$ satisfy $F_n^{-1} \to F^{-1}$ a.e. on $[0,1]$.)
Chapter 5
Characteristic Functions
and Classical Limit Theorems
Uniqueness and continuity theorem; Poisson convergence; pos-
itive and symmetric terms; Lindeberg's condition; general
Gaussian convergence; weak laws of large numbers; domain of
Gaussian attraction; vague and weak compactness
In this chapter we continue the treatment of weak convergence from Chapter
4 with a detailed discussion of probability measures on Euclidean spaces.
Our first aim is to develop the theory of characteristic functions and Laplace
transforms. In particular, the basic uniqueness and continuity theorem will
be established by simple equicontinuity and approximation arguments. The
traditional compactness approach-in higher dimensions a highly nontriv-
ial route--is required only for the case when the limiting function is not
known in advance to be a characteristic function. The compactness theory
also serves as a crucial bridge to the general theory of weak convergence
presented in Chapter 16.
Our second aim is to establish the basic distributional limit theorems in
the case of Poisson or Gaussian limits. We shall then consider triangular
arrays of random variables $\xi_{nj}$, assumed to be independent for each $n$ and
such that $\xi_{nj} \xrightarrow{P} 0$ as $n \to \infty$ uniformly in $j$. In this setting, general criteria
will be obtained for the convergence of $\sum_j \xi_{nj}$ toward a Poisson or Gaussian
distribution. Specializing to the case of suitably centered and normalized
partial sums from a single i.i.d. sequence $\xi_1, \xi_2, \dots$, we may deduce the
ultimate versions of the weak law of large numbers and the central limit
theorem, including a complete description of the domain of attraction of
the Gaussian law.
The mentioned limit theorems lead in Chapters 12 and 13 to some basic
characterizations of Poisson and Gaussian processes, which in turn are
needed to describe the general independent increment processes in Chapter
15. Even the limit theorems themselves are generalized in various ways in
subsequent chapters. Thus, the Gaussian convergence is extended in Chap-
ter 14 to suitable martingales, and the result is strengthened to uniform
approximation of the summation process by the path of a Brownian motion.
Similarly, the Poisson convergence is extended in Chapter 16 to a general
limit theorem for point processes. A complete solution to the general limit
problem for triangular arrays is given in Chapter 15, in connection with
our treatment of Lévy processes.
In view of the crucial role of the independence assumption for the meth-
ods in this chapter, it may come as a surprise that the scope of the method
of characteristic functions and Laplace transforms extends far beyond the
present context. Thus, exponential martingales based on characteristic
functions play a crucial role in Chapters 15 and 18, whereas Laplace functionals
of random measures are used extensively in Chapters 12 and 16.
Even more importantly, Laplace transforms play a key role in Chapters 19
and 22, in the guises of resolvents and potentials for Markov processes and
their additive functionals, and also in connection with the large deviation
theory of Chapter 27.
To begin with the basic definitions, consider a random vector $\xi$ in $\mathbb{R}^d$
with distribution $\mu$. The associated characteristic function $\hat\mu$ is given by
$$\hat\mu(t) = \int e^{itx} \mu(dx) = Ee^{it\xi}, \quad t \in \mathbb{R}^d,$$
where $tx$ denotes the inner product $t_1 x_1 + \cdots + t_d x_d$. For distributions $\mu$ on
$\mathbb{R}_+^d$, it is often more convenient to consider the Laplace transform $\tilde\mu$, given
by
$$\tilde\mu(u) = \int e^{-ux} \mu(dx) = Ee^{-u\xi}, \quad u \in \mathbb{R}_+^d.$$
Finally, for distributions $\mu$ on $\mathbb{Z}_+$, it is often preferable to use the
(probability) generating function $\psi$, given by
$$\psi(s) = \sum_{n \ge 0} s^n P\{\xi = n\} = Es^\xi, \quad s \in [0,1].$$
Formally, $\tilde\mu(u) = \hat\mu(iu)$ and $\hat\mu(t) = \tilde\mu(-it)$, and so the functions $\hat\mu$ and $\tilde\mu$
are essentially the same, apart from domain. Furthermore, the generating
function $\psi$ is related to the Laplace transform $\tilde\mu$ by $\tilde\mu(u) = \psi(e^{-u})$ or
$\psi(s) = \tilde\mu(-\log s)$. Though the characteristic function always exists, it may
not be extendable to an analytic function in the complex plane.
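The relation between the Laplace transform and the generating function can be checked numerically. The minimal Python sketch below does this for a Poisson law with mean $c$, whose generating function is $e^{-c(1-s)}$. The example is an illustration only: the function names, the truncation of the series after 100 terms, and the values of `c` and `u` are assumptions made here and do not come from the text.

```python
import math


def poisson_pmf(n, c):
    """P{xi = n} for a Poisson variable with mean c."""
    return math.exp(-c) * c**n / math.factorial(n)


def generating_function(s, c, terms=100):
    """psi(s) = sum_n s^n P{xi = n}, truncated after `terms` terms."""
    return sum(s**n * poisson_pmf(n, c) for n in range(terms))


def laplace_transform(u, c, terms=100):
    """E exp(-u * xi), truncated after `terms` terms."""
    return sum(math.exp(-u * n) * poisson_pmf(n, c) for n in range(terms))


c, u = 2.0, 0.7
print(laplace_transform(u, c))                  # Laplace transform at u
print(generating_function(math.exp(-u), c))     # psi(e^{-u})
print(math.exp(-c * (1.0 - math.exp(-u))))      # closed form exp(-c(1 - e^{-u}))
```

All three printed values should agree up to rounding and truncation error.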
For any distribution $\mu$ on $\mathbb{R}^d$, we note that the characteristic function
$\varphi = \hat\mu$ is uniformly continuous with $|\varphi(t)| \le \varphi(0) = 1$. It is also seen to be
Hermitian in the sense that $\varphi(-t) = \bar\varphi(t)$, where the bar denotes complex
conjugation. If $\xi$ has characteristic function $\varphi$, then the linear combination
$a\xi = a_1 \xi_1 + \cdots + a_d \xi_d$ has characteristic function $t \mapsto \varphi(ta)$. Also note that if
$\xi$ and $\eta$ are independent random vectors with characteristic functions $\varphi$ and
$\psi$, then the characteristic function of the pair $(\xi, \eta)$ is given by the tensor
product $\varphi \otimes \psi : (s, t) \mapsto \varphi(s)\psi(t)$. In particular, $\xi + \eta$ has characteristic
function $\varphi\psi$, and the characteristic function of the symmetrized variable
$\xi - \xi'$ equals $|\varphi|^2$.
Whenever applicable, the quoted statements carry over to Laplace transforms
and generating functions. The latter functions have the further
advantage of being positive, monotone, convex, and analytic, properties
that simplify many arguments.
The following result contains some elementary but useful estimates in-
volving characteristic functions. The second inequality was used in the proof
of Theorem 4.17, and the remaining relations will be useful in the sequel
to establish tightness.
Lemma 5.1 (tail estimates) For any probability measure $\mu$ on $\mathbb{R}$, we have
$$\mu\{x; |x| \ge r\} \le \frac{r}{2} \int_{-2/r}^{2/r} (1 - \hat\mu_t)\,dt, \quad r > 0, \tag{1}$$
$$\mu[-r, r] \le 2r \int_{-1/r}^{1/r} |\hat\mu_t|\,dt, \quad r > 0. \tag{2}$$
If $\mu$ is supported by $\mathbb{R}_+$, then also
$$\mu[r, \infty) \le 2(1 - \tilde\mu(1/r)), \quad r > 0. \tag{3}$$
Proof: Using Fubini's theorem and noting that $\sin x \le x/2$ for $x \ge 2$, we
get for any $c > 0$
$$\int_{-c}^{c} (1 - \hat\mu_t)\,dt = \int \mu(dx) \int_{-c}^{c} (1 - e^{itx})\,dt = 2c \int \Big\{1 - \frac{\sin cx}{cx}\Big\} \mu(dx) \ge c\,\mu\{x; |cx| \ge 2\},$$
and (1) follows as we take $c = 2/r$. To prove (2), we may write
$$\mu[-r, r] \le 2 \int \frac{1 - \cos(x/r)}{(x/r)^2}\,\mu(dx) = r \int \mu(dx) \int (1 - r|t|)_+ e^{ixt}\,dt = r \int (1 - r|t|)_+ \hat\mu_t\,dt \le r \int_{-1/r}^{1/r} |\hat\mu_t|\,dt.$$
To obtain (3), we note that $e^{-x} \le \frac12$ for $x \ge 1$. Thus, for $t > 0$,
$$1 - \tilde\mu_t = \int (1 - e^{-tx}) \mu(dx) \ge \tfrac12 \mu\{x; tx \ge 1\}. \qquad \Box$$
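Estimate (1) can be checked numerically in a concrete case. The minimal Python sketch below takes $\mu$ to be the standard normal law, with characteristic function $e^{-t^2/2}$, and compares the tail probability with the right-hand side of (1), evaluated by a midpoint rule. The choice of distribution, the function names, and the number of quadrature steps are assumptions made for this illustration.

```python
import math


def normal_tail(r):
    """mu{x; |x| >= r} for the standard normal law."""
    return math.erfc(r / math.sqrt(2.0))


def bound(r, steps=20000):
    """(r/2) * integral over [-2/r, 2/r] of (1 - exp(-t^2/2)) dt, by the midpoint rule."""
    a = 2.0 / r
    h = 2.0 * a / steps
    total = sum(
        1.0 - math.exp(-0.5 * (-a + (i + 0.5) * h) ** 2) for i in range(steps)
    )
    return 0.5 * r * total * h


for r in (0.5, 1.0, 2.0, 4.0):
    print(r, normal_tail(r), bound(r))
```

In each printed row the second number should not exceed the third. The bound is far from sharp for Gaussian tails, which is consistent with its intended use as a tightness criterion rather than as a precise tail estimate.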
Recall that a family of probability measures $\mu_\alpha$ on $\mathbb{R}^d$ is said to be tight
if
$$\lim_{r \to \infty} \sup_\alpha \mu_\alpha\{x; |x| > r\} = 0.$$
The following lemma describes tightness in terms of characteristic functions.
Lemma 5.2 (equicontinuity and tightness) A family $\{\mu_\alpha\}$ of probability
measures on $\mathbb{R}^d$ is tight iff $\{\hat\mu_\alpha\}$ is equicontinuous at 0, and then $\{\hat\mu_\alpha\}$ is
uniformly equicontinuous on $\mathbb{R}^d$. A similar statement holds for the Laplace
transforms of distributions on $\mathbb{R}_+^d$.
Proof: The sufficiency is immediate from Lemma 5.1, applied separately
in each coordinate. To prove the necessity, let $\xi_\alpha$ denote a random vector
with distribution $\mu_\alpha$, and write for any $s, t \in \mathbb{R}^d$
$$|\hat\mu_\alpha(s) - \hat\mu_\alpha(t)| \le E|e^{is\xi_\alpha} - e^{it\xi_\alpha}| = E|1 - e^{i(t-s)\xi_\alpha}| \le 2E[|(t - s)\xi_\alpha| \wedge 1].$$
If $\{\xi_\alpha\}$ is tight, then by Lemma 4.9 the right-hand side tends to 0 as
$t - s \to 0$, uniformly in $\alpha$, and the asserted uniform equicontinuity follows.
The proof for Laplace transforms is similar. □
For any probability measures $\mu, \mu_1, \mu_2, \dots$ on $\mathbb{R}^d$, we recall that the weak
convergence $\mu_n \xrightarrow{w} \mu$ holds by definition iff $\mu_n f \to \mu f$ for any bounded,
continuous function $f$ on $\mathbb{R}^d$, where $\mu f$ denotes the integral $\int f\,d\mu$. The
usefulness of characteristic functions is mainly due to the following basic
result.
Theorem 5.3 (uniqueness and continuity, Lévy) For any probability measures
$\mu, \mu_1, \mu_2, \dots$ on $\mathbb{R}^d$, we have $\mu_n \xrightarrow{w} \mu$ iff $\hat\mu_n(t) \to \hat\mu(t)$ for every
$t \in \mathbb{R}^d$, and then $\hat\mu_n \to \hat\mu$ uniformly on every bounded set. A corresponding
statement holds for the Laplace transforms of distributions on $\mathbb{R}_+^d$.
In particular, we may take $\mu_n \equiv \nu$ and conclude that a probability
measure $\mu$ on $\mathbb{R}^d$ is uniquely determined by its characteristic function $\hat\mu$.
Similarly, a probability measure $\mu$ on $\mathbb{R}_+^d$ is seen to be determined by its
Laplace transform $\tilde\mu$.
For the proof of Theorem 5.3, we need the following simple cases or
consequences of the Stone-Weierstrass approximation theorem. Here $[0, \infty]$
denotes the compactification of $\mathbb{R}_+$.
Lemma 5.4 (approximation) Every continuous function $f : \mathbb{R}^d \to \mathbb{R}$ with
period $2\pi$ in each coordinate admits a uniform approximation by linear combinations
of $\cos kx$ and $\sin kx$, $k \in \mathbb{Z}_+^d$. Similarly, every continuous function
$g : [0, \infty]^d \to \mathbb{R}_+$ can be approximated uniformly by linear combinations of
the functions $e^{-kx}$, $k \in \mathbb{Z}_+^d$.
Proof of Theorem 5.3: We consider only the case of characteristic functions,
the proof for Laplace transforms being similar. If $\mu_n \xrightarrow{w} \mu$, then
$\hat\mu_n(t) \to \hat\mu(t)$ for every $t$, by the definition of weak convergence. By Lemmas
4.8 and 5.2, the latter convergence is uniform on every bounded set.
Conversely, assume that $\hat\mu_n(t) \to \hat\mu(t)$ for every $t$. By Lemma 5.1 and
dominated convergence we get, for any $a \in \mathbb{R}^d$ and $r > 0$,
$$\limsup_{n \to \infty} \mu_n\{x; |ax| > r\} \le \lim_{n \to \infty} \frac{r}{2} \int_{-2/r}^{2/r} (1 - \hat\mu_n(ta))\,dt = \frac{r}{2} \int_{-2/r}^{2/r} (1 - \hat\mu(ta))\,dt.$$
Since $\hat\mu$ is continuous at 0, the right-hand side tends to 0 as $r \to \infty$, which
shows that the sequence $(\mu_n)$ is tight. Given any $\varepsilon > 0$, we may then choose
$r > 0$ so large that $\mu_n\{|x| > r\} \le \varepsilon$ for all $n$ and $\mu\{|x| > r\} \le \varepsilon$.
Now fix any bounded, continuous function $f : \mathbb{R}^d \to \mathbb{R}$, say with $|f| \le
m < \infty$. Let $f_r$ denote the restriction of $f$ to the ball $\{|x| \le r\}$, and extend
$f_r$ to a continuous function $\tilde f$ on $\mathbb{R}^d$ with $|\tilde f| \le m$ and period $2\pi r$ in each
coordinate. By Lemma 5.4 there exists some linear combination $g$ of the
functions $\cos(kx/r)$ and $\sin(kx/r)$, $k \in \mathbb{Z}_+^d$, such that $|\tilde f - g| \le \varepsilon$. Writing
$\|\cdot\|$ for the supremum norm, we get for any $n \in \mathbb{N}$
$$|\mu_n f - \mu_n g| \le \mu_n\{|x| > r\}\,\|f - \tilde f\| + \|\tilde f - g\| \le (2m + 1)\varepsilon,$$
and similarly for $\mu$. Thus,
$$|\mu_n f - \mu f| \le |\mu_n g - \mu g| + 2(2m + 1)\varepsilon, \quad n \in \mathbb{N}.$$
Letting $n \to \infty$ and then $\varepsilon \to 0$, we obtain $\mu_n f \to \mu f$. Since $f$ was
arbitrary, this proves that $\mu_n \xrightarrow{w} \mu$. □
The next result provides a way of reducing the d-dimensional case to
that of one dimension.
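Corollary 5.5 (one-dimensional projections, Cramér and Wold) Let $\xi$
and $\xi_1, \xi_2, \dots$ be random vectors in $\mathbb{R}^d$. Then $\xi_n \xrightarrow{d} \xi$ iff $t\xi_n \xrightarrow{d} t\xi$ for all
$t \in \mathbb{R}^d$. For random vectors in $\mathbb{R}_+^d$, it suffices that $u\xi_n \xrightarrow{d} u\xi$ for all $u \in \mathbb{R}_+^d$.
Proof: If $t\xi_n \xrightarrow{d} t\xi$, then $Ee^{it\xi_n} \to Ee^{it\xi}$ by the definition of weak convergence,
and so $\xi_n \xrightarrow{d} \xi$ by Theorem 5.3. The proof for random vectors in
$\mathbb{R}_+^d$ is similar. □
The last result contains in particular a basic uniqueness result, the fact
that $\xi \overset{d}{=} \eta$ iff $t\xi \overset{d}{=} t\eta$ for all $t \in \mathbb{R}^d$ or $\mathbb{R}_+^d$, respectively. In other words, a
probability measure on $\mathbb{R}^d$ is uniquely determined by its one-dimensional
projections.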
We now apply the continuity theorem to prove some classical limit
theorems, and we begin with the case of Poisson convergence. For an introduction,
consider for each $n \in \mathbb{N}$ some i.i.d. random variables $\xi_{n1}, \dots, \xi_{nn}$
with distribution
$$P\{\xi_{nj} = 1\} = 1 - P\{\xi_{nj} = 0\} = c_n, \quad n \in \mathbb{N},$$
and assume that $nc_n \to c < \infty$. Then the sums $S_n = \xi_{n1} + \cdots + \xi_{nn}$ have
generating functions
$$\psi_n(s) = (1 - (1 - s)c_n)^n \to e^{-c(1-s)} = e^{-c} \sum_{n \ge 0} \frac{c^n s^n}{n!}, \quad s \in [0,1].$$
The limit $\psi(s) = e^{-c(1-s)}$ is the generating function of the Poisson distribution
with parameter $c$, the distribution of a random variable $\eta$ with
probabilities $P\{\eta = n\} = e^{-c} c^n / n!$ for $n \in \mathbb{Z}_+$. Note that the corresponding
expected value equals $E\eta = \psi'(1) = c$. Since $\psi_n \to \psi$, it is clear from
Theorem 5.3 that $S_n \xrightarrow{d} \eta$.
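The convergence of the binomial row sums to the Poisson law can be observed directly on the probabilities. The minimal Python sketch below takes $c_n = c/n$ and sums the absolute differences between the two probability functions over $k \le 60$. The function names, the cutoff `kmax`, and the parameter values are assumptions chosen for this illustration.

```python
import math


def binomial_pmf(k, n, p):
    """P{S_n = k} for a sum of n i.i.d. indicator variables with success probability p."""
    if k > n:
        return 0.0
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def poisson_pmf(k, c):
    """P{eta = k} for a Poisson variable with mean c."""
    return math.exp(-c) * c**k / math.factorial(k)


def l1_distance(n, c, kmax=60):
    """Sum over k <= kmax of |P{S_n = k} - P{eta = k}|, taking c_n = c / n."""
    return sum(
        abs(binomial_pmf(k, n, c / n) - poisson_pmf(k, c)) for k in range(kmax + 1)
    )


for n in (10, 100, 1000):
    print(n, l1_distance(n, 2.0))
```

The printed distances should decrease toward 0 as $n$ grows, in agreement with $S_n \xrightarrow{d} \eta$.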
Before turning to more general cases of Poisson convergence, we need to
introduce the notion of a null array. By this we mean a triangular array of
random variables or vectors $\xi_{nj}$, $1 \le j \le m_n$, $n \in \mathbb{N}$, such that the $\xi_{nj}$ are
independent for each $n$ and satisfy
$$\sup_j E[|\xi_{nj}| \wedge 1] \to 0. \tag{4}$$
The latter condition may be thought of as the convergence $\xi_{nj} \xrightarrow{P} 0$ as
$n \to \infty$, uniformly in $j$. When $\xi_{nj} \ge 0$ for all $n$ and $j$, we may allow the
$m_n$ to be infinite.
The following lemma characterizes null arrays in terms of the associated
characteristic functions or Laplace transforms.
Lemma 5.6 (null arrays) Consider a triangular array of random vectors
$\xi_{nj}$ with characteristic functions $\varphi_{nj}$ or Laplace transforms $\psi_{nj}$. Then (4)
holds iff, respectively,
$$\sup_j |1 - \varphi_{nj}(t)| \to 0, \quad t \in \mathbb{R}^d, \qquad \inf_j \psi_{nj}(u) \to 1, \quad u \in \mathbb{R}_+^d. \tag{5}$$
Proof: Relation (4) holds iff $\xi_{n,j_n} \xrightarrow{P} 0$ for all sequences $(j_n)$. By Theorem
5.3 this is equivalent to $\varphi_{n,j_n}(t) \to 1$ for all $t$ and $(j_n)$, which in turn is
equivalent to (5). The proof for Laplace transforms is similar. □
We now give a general criterion for Poisson convergence of the row sums
in a null array of integer-valued random variables. The result will be ex-
tended in Lemmas 15.15 and 15.24 to more general limiting distributions
and in Theorem 16.18 to the context of point processes.
Theorem 5.7 (Poisson convergence) Let $(\xi_{nj})$ be a null array of $\mathbb{Z}_+$-valued
random variables, and let $\xi$ be Poisson distributed with mean $c$.
Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff these conditions hold:
(i) $\sum_j P\{\xi_{nj} > 1\} \to 0$;
(ii) $\sum_j P\{\xi_{nj} = 1\} \to c$.
Moreover, (i) is equivalent to $\sup_j \xi_{nj} \vee 1 \xrightarrow{P} 1$. If $\sum_j \xi_{nj}$ converges in
distribution, then (i) holds iff the limit is Poisson.
We need the following frequently used lemma.
Lemma 5.8 (sums and products) Consider a null array of constants $c_{nj} \ge
0$, and fix any $c \in [0, \infty]$. Then $\prod_j (1 - c_{nj}) \to e^{-c}$ iff $\sum_j c_{nj} \to c$.
Proof: Since $\sup_j c_{nj} < 1$ for large $n$, the first relation is equivalent
to $\sum_j \log(1 - c_{nj}) \to -c$, and the assertion follows from the fact that
$\log(1 - x) = -x + o(x)$ as $x \to 0$. □
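Lemma 5.8 is easy to test on a concrete null array with unequal entries. The minimal Python sketch below uses the hypothetical choice $c_{nj} = cj/(1 + 2 + \cdots + n)$ for $j \le n$, so that each row sums exactly to $c$ while $\sup_j c_{nj} \to 0$. The array and the function name `row_product` are assumptions made for the example.

```python
import math


def row_product(n, c):
    """prod_j (1 - c_nj) for the array c_nj = c * j / (1 + 2 + ... + n), j = 1..n."""
    total = n * (n + 1) / 2
    return math.prod(1.0 - c * j / total for j in range(1, n + 1))


c = 1.5
for n in (10, 100, 1000, 10000):
    print(n, row_product(n, c), math.exp(-c))
```

The products should approach $e^{-c}$ as $n$ grows, as the lemma predicts for row sums converging to $c$.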
Proof of Theorem 5.7: Let $\psi_{nj}$ denote the generating function of $\xi_{nj}$. By
Theorem 5.3 the convergence $\sum_j \xi_{nj} \xrightarrow{d} \xi$ is equivalent to $\prod_j \psi_{nj}(s) \to
e^{-c(1-s)}$ for arbitrary $s \in [0,1]$, which holds by Lemmas 5.6 and 5.8 iff
$$\sum_j (1 - \psi_{nj}(s)) \to c(1 - s), \quad s \in [0,1]. \tag{6}$$
By an easy computation, the sum on the left equals
$$(1 - s) \sum_j P\{\xi_{nj} > 0\} + \sum_{k > 1} (s - s^k) \sum_j P\{\xi_{nj} = k\} = T_1 + T_2, \tag{7}$$
and we also note that
$$s(1 - s) \sum_j P\{\xi_{nj} > 1\} \le T_2 \le s \sum_j P\{\xi_{nj} > 1\}. \tag{8}$$
Assuming (i) and (ii), it is clear that (6) follows from (7) and (8). Now
assume instead that (6) holds. For $s = 0$ we get $\sum_j P\{\xi_{nj} > 0\} \to c$, and
so in general $T_1 \to c(1 - s)$. But then (6) implies $T_2 \to 0$, and (i) follows
by (8). Finally, (ii) is obtained by subtraction.
To prove that (i) is equivalent to $\sup_j \xi_{nj} \vee 1 \xrightarrow{P} 1$, we note that
$$P\{\sup_j \xi_{nj} \le 1\} = \prod_j P\{\xi_{nj} \le 1\} = \prod_j (1 - P\{\xi_{nj} > 1\}).$$
By Lemma 5.8 the right-hand side tends to 1 iff $\sum_j P\{\xi_{nj} > 1\} \to 0$, which
is the stated equivalence.
To prove the last assertion, put $c_{nj} = P\{\xi_{nj} > 0\}$ and write
$$E\exp\Big\{-\sum_j \xi_{nj}\Big\} - P\{\sup_j \xi_{nj} > 1\} \le E\exp\Big\{-\sum_j (\xi_{nj} \wedge 1)\Big\} = \prod_j E\exp\{-(\xi_{nj} \wedge 1)\} = \prod_j \{1 - (1 - e^{-1})c_{nj}\} \le \prod_j \exp\{-(1 - e^{-1})c_{nj}\} = \exp\Big\{-(1 - e^{-1}) \sum_j c_{nj}\Big\}.$$
If (i) holds and $\sum_j \xi_{nj} \xrightarrow{d} \eta$, then the left-hand side tends to $Ee^{-\eta} > 0$,
and so the sums $c_n = \sum_j c_{nj}$ are bounded. Hence, $c_n$ converges along a
subsequence $N' \subset \mathbb{N}$ toward some constant $c$. But then (i) and (ii) hold
along $N'$, and the first assertion shows that $\eta$ is Poisson with mean $c$. □
Next consider some i.i.d. random variables 1, 2, . .. \\ith P{ k == ::f::1} ==
, and write Sn == 1 +... + n. Then n- 1 / 2 Sn has characteristic function
'Pn(t) = cosn(n-l/2t) == (1 - t2n-l + O(n- 2 )) n -+ e- t2 / 2 == 'P(t).
90 Foundations of Modern Probability
By a classical computation, the function $e^{-x^2/2}$ has Fourier transform
$$\int_{-\infty}^{\infty} e^{itx}e^{-x^2/2}\,dx = (2\pi)^{1/2}e^{-t^2/2}, \qquad t \in \mathbb{R}.$$
Hence, $\varphi$ is the characteristic function of a probability measure on $\mathbb{R}$ with
density $(2\pi)^{-1/2}e^{-x^2/2}$. This is the standard normal or Gaussian distribution
$N(0,1)$, and Theorem 5.3 shows that $n^{-1/2}S_n \xrightarrow{d} \zeta$, where $\zeta$ is $N(0,1)$.
The general Gaussian law $N(m, \sigma^2)$ is defined as the distribution of the random
variable $\eta = m + \sigma\zeta$, and we note that $\eta$ has mean $m$ and variance $\sigma^2$.
From the form of the characteristic functions together with the uniqueness
property, it is clear that any linear combination of independent Gaussian
random variables is again Gaussian.
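The convergence $\cos^n(n^{-1/2}t) \to e^{-t^2/2}$ is easy to check numerically. The sketch below is an illustration only; the function names are hypothetical. It evaluates both characteristic functions on a grid of $t$ values and prints the largest discrepancy for a few values of $n$.

```python
import math

def phi_n(t, n):
    """Characteristic function of n**-0.5 * S_n for i.i.d. symmetric +-1 steps."""
    return math.cos(t / math.sqrt(n)) ** n

def phi(t):
    """Characteristic function of the standard normal law N(0, 1)."""
    return math.exp(-0.5 * t * t)

def max_error(n, t_max=5.0, steps=100):
    """Largest value of |phi_n(t) - phi(t)| over a grid on [-t_max, t_max]."""
    grid = [-t_max + 2.0 * t_max * k / steps for k in range(steps + 1)]
    return max(abs(phi_n(t, n) - phi(t)) for t in grid)

if __name__ == "__main__":
    for n in (10, 100, 10_000):
        print(n, max_error(n))
```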
The convergence to a Gaussian limit generalizes easily to a more general
setting, as in the following classical result. The present statement is only
preliminary, and a more general version is obtained by different methods
in Theorem 5.17.
Proposition 5.9 (central limit theorem, Lindeberg, Lévy) Let $\xi, \xi_1, \xi_2, \dots$
be i.i.d. random variables with $E\xi = 0$ and $E\xi^2 = 1$, and let $\zeta$ be $N(0,1)$.
Then $n^{-1/2}\sum_{k \le n}\xi_k \xrightarrow{d} \zeta$.
The proof may be based on a simple Taylor expansion.
Lemma 5.10 (Taylor expansion) Let $\varphi$ be the characteristic function of
a random variable $\xi$ with $E|\xi|^n < \infty$. Then
$$\varphi(t) = \sum_{k=0}^{n} \frac{(it)^k E\xi^k}{k!} + o(t^n), \qquad t \to 0.$$
Proof: Noting that $|e^{it} - 1| \le |t|$ for all $t \in \mathbb{R}$, we get recursively by
dominated convergence
$$\varphi^{(k)}(t) = E(i\xi)^k e^{it\xi}, \qquad t \in \mathbb{R},\ 0 \le k \le n.$$
In particular, $\varphi^{(k)}(0) = E(i\xi)^k$ for $k \le n$, and the result follows from
Taylor's formula. □
Proof of Proposition 5.9: Let the $\xi_k$ have characteristic function $\varphi$. By
Lemma 5.10, the characteristic function of $n^{-1/2}S_n$ equals
$$\varphi_n(t) = \bigl(\varphi(n^{-1/2}t)\bigr)^n = \bigl(1 - \tfrac12 t^2 n^{-1} + o(n^{-1})\bigr)^n \to e^{-t^2/2},$$
where the convergence holds as $n \to \infty$ for fixed $t$. □
Our next aim is to examine the relationship between null arrays of
symmetric and positive random variables. In this context, we may also
derive criteria for convergence toward Gaussian and degenerate limits,
respectively.
5. Characteristic Functions and Classical Limit Theorems 91
Theorem 5.11 (positive and symmetric terms) Let $(\xi_{nj})$ be a null array
of symmetric random variables, and let $\xi$ be $N(0,c)$ for some $c \ge 0$. Then
$\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff $\sum_j \xi_{nj}^2 \xrightarrow{P} c$, and also iff these conditions hold:
(i) $\sum_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ for all $\varepsilon > 0$;
(ii) $\sum_j E(\xi_{nj}^2 \wedge 1) \to c$.
Moreover, (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$. If $\sum_j \xi_{nj}$ or $\sum_j \xi_{nj}^2$ converges
in distribution, then (i) holds iff the limit is Gaussian or degenerate,
respectively.
Here the necessity of condition (i) is a remarkable fact that plays a cru-
cial role in our proof of the more general Theorem 5.15. It is instructive
to compare the present statement with the corresponding result for ran-
dom series in Theorem 4.17. Note also the extended version appearing in
Proposition 15.23.
Proof: First assume that $\sum_j \xi_{nj} \xrightarrow{d} \xi$. By Theorem 5.3 and Lemmas 5.6
and 5.8 it is equivalent that
$$\sum_j E(1 - \cos t\xi_{nj}) \to \tfrac12 ct^2, \qquad t \in \mathbb{R}, \tag{9}$$
where the convergence is uniform on every bounded interval. Comparing
the integrals of (9) over $[0,1]$ and $[0,2]$, we get $\sum_j Ef(\xi_{nj}) \to 0$, where
$f(0) = 0$ and
$$f(x) = 3 - \frac{4\sin x}{x} + \frac{\sin 2x}{2x}, \qquad x \in \mathbb{R} \setminus \{0\}.$$
Now $f$ is continuous with $f(x) \to 3$ as $|x| \to \infty$, and furthermore $f(x) > 0$
for $x \ne 0$. Indeed, the last relation is equivalent to $8\sin x - \sin 2x < 6x$
for $x > 0$, which is obvious when $x \ge \pi/2$ and follows by differentiation
twice when $x \in (0, \pi/2)$. Writing $g(x) = \inf_{y > x} f(y)$ and letting $\varepsilon > 0$ be
arbitrary, we get
$$\sum_j P\{|\xi_{nj}| > \varepsilon\} \le \sum_j P\{f(\xi_{nj}) \ge g(\varepsilon)\} \le \sum_j Ef(\xi_{nj})/g(\varepsilon) \to 0,$$
which proves (i).
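The comparison of integrals used above can be made concrete. For fixed $x$, one combination that cancels the quadratic right-hand side of (9) is $4\int_0^1 (1 - \cos tx)\,dt - \tfrac12\int_0^2 (1 - \cos tx)\,dt$, and a direct integration shows that it equals $f(x)$. The Python sketch below checks this identity with a midpoint rule and confirms the positivity of $f$ on a grid. It is an illustration under these assumptions; the function names are hypothetical.

```python
import math

def f(x):
    """f(x) = 3 - 4 sin(x)/x + sin(2x)/(2x), with f(0) = 0."""
    if x == 0.0:
        return 0.0
    return 3.0 - 4.0 * math.sin(x) / x + math.sin(2.0 * x) / (2.0 * x)

def integral_one_minus_cos(x, upper, steps=20_000):
    """Midpoint-rule value of the integral of 1 - cos(t x) over t in [0, upper]."""
    h = upper / steps
    return h * sum(1.0 - math.cos((k + 0.5) * h * x) for k in range(steps))

def combined_integrals(x):
    """4 * (integral over [0, 1]) - (1/2) * (integral over [0, 2])."""
    return 4.0 * integral_one_minus_cos(x, 1.0) - 0.5 * integral_one_minus_cos(x, 2.0)

if __name__ == "__main__":
    for x in (0.5, 1.0, 3.0, 10.0):
        print(x, f(x), combined_integrals(x))
```

The same combination applied to $\tfrac12 ct^2$ gives $4 \cdot c/6 - \tfrac12 \cdot 4c/3 = 0$, which is why the limit of $\sum_j Ef(\xi_{nj})$ is zero.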
If instead $\sum_j \xi_{nj}^2 \xrightarrow{P} c$, the corresponding symmetrized variables $\eta_{nj}$ satisfy
$\sum_j \eta_{nj} \xrightarrow{P} 0$, and we get $\sum_j P\{|\eta_{nj}| > \varepsilon\} \to 0$ as before. By Lemma
4.19 it follows that $\sum_j P\{|\xi_{nj}^2 - m_{nj}| > \varepsilon\} \to 0$, where the $m_{nj}$ are medians
of $\xi_{nj}^2$, and since $\sup_j m_{nj} \to 0$, condition (i) follows again. Using Lemma
5.8, we further note that (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$. Thus, we may
henceforth assume that (i) is fulfilled.
Next we note that, for any $t \in \mathbb{R}$ and $\varepsilon > 0$,
$$\sum_j E[1 - \cos t\xi_{nj};\ |\xi_{nj}| \le \varepsilon] = \tfrac12 t^2\bigl(1 - O(t^2\varepsilon^2)\bigr)\sum_j E[\xi_{nj}^2;\ |\xi_{nj}| \le \varepsilon].$$
Assuming (i), the equivalence between (9) and (ii) now follows as we let
$n \to \infty$ and then $\varepsilon \to 0$. To get the corresponding result for the variables
$\xi_{nj}^2$, we may instead write
$$\sum_j E[1 - e^{-t\xi_{nj}^2};\ \xi_{nj}^2 \le \varepsilon] = t\bigl(1 - O(t\varepsilon)\bigr)\sum_j E[\xi_{nj}^2;\ \xi_{nj}^2 \le \varepsilon], \qquad t, \varepsilon > 0,$$
and proceed as before. This completes the proof of the first assertion.
Finally, assume that (i) holds and $\sum_j \xi_{nj} \xrightarrow{d} \eta$. Then the same relation
holds for the truncated variables $\xi_{nj} 1\{|\xi_{nj}| \le 1\}$, and so we may assume
that $|\xi_{nj}| \le 1$ for all $n$ and $j$. Define $c_n = \sum_j E\xi_{nj}^2$. If $c_n \to \infty$ along some
subsequence, then the distribution of $c_n^{-1/2}\sum_j \xi_{nj}$ tends to $N(0,1)$ by the
first assertion, which is impossible by Lemmas 4.8 and 4.9. Thus, $(c_n)$ is
bounded and converges along some subsequence. By the first assertion,
$\sum_j \xi_{nj}$ then tends to some Gaussian limit, so even $\eta$ is Gaussian. □
The following result gives the basic criterion for Gaussian convergence,
under a normalization by second moments.
Theorem 5.12 (Gaussian convergence under classical normalization, Lindeberg,
Feller) Let $(\xi_{nj})$ be a triangular array of rowwise independent
random variables with mean 0 and $\sum_j E\xi_{nj}^2 \to 1$, and let $\zeta$ be $N(0,1)$.
Then these conditions are equivalent:
(i) $\sum_j \xi_{nj} \xrightarrow{d} \zeta$ and $\sup_j E\xi_{nj}^2 \to 0$;
(ii) $\sum_j E[\xi_{nj}^2;\ |\xi_{nj}| > \varepsilon] \to 0$ for all $\varepsilon > 0$.
Here (ii) is the celebrated Lindeberg condition. Our proof is based on two
elementary lemmas.
Lemma 5.13 (comparison of products) For any complex numbers $z_1, \dots, z_n$
and $z_1', \dots, z_n'$ of modulus $\le 1$, we have
$$\Bigl|\prod_k z_k - \prod_k z_k'\Bigr| \le \sum_k |z_k - z_k'|.$$
Proof: For $n = 2$ we get
$$|z_1 z_2 - z_1' z_2'| \le |z_1 z_2 - z_1' z_2| + |z_1' z_2 - z_1' z_2'| \le |z_1 - z_1'| + |z_2 - z_2'|,$$
and the general result follows by induction. □
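A quick randomized check of Lemma 5.13 can be written in a few lines of Python. The sketch below is only an illustration; the helper names and the way the second family is generated (small random rotations of the first, which keeps all moduli at most 1) are assumptions made for the example.

```python
import cmath
import math
import random

def product(zs):
    """Product of a list of complex numbers."""
    out = 1.0 + 0.0j
    for z in zs:
        out *= z
    return out

def random_in_unit_disk(rng):
    """A complex number of modulus at most 1."""
    return cmath.rect(math.sqrt(rng.random()), 2.0 * math.pi * rng.random())

def both_sides(n, spread, rng):
    """Return (|prod z_k - prod z'_k|, sum |z_k - z'_k|) for a random instance.

    Each z'_k is z_k rotated by a random angle in [-spread/2, spread/2],
    so both families stay in the closed unit disk.
    """
    zs = [random_in_unit_disk(rng) for _ in range(n)]
    ws = [z * cmath.rect(1.0, spread * (rng.random() - 0.5)) for z in zs]
    lhs = abs(product(zs) - product(ws))
    rhs = sum(abs(z - w) for z, w in zip(zs, ws))
    return lhs, rhs

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (2, 5, 20):
        print(n, both_sides(n, 0.1, rng))
```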
Lemma 5.14 (Taylor expansion) For any $t \in \mathbb{R}$ and $n \in \mathbb{Z}_+$, we have
$$\Bigl|e^{it} - \sum_{k=0}^{n} \frac{(it)^k}{k!}\Bigr| \le \frac{2|t|^n}{n!} \wedge \frac{|t|^{n+1}}{(n+1)!}.$$
Proof: Letting $h_n(t)$ denote the difference on the left, we get
$$h_n(t) = i\int_0^t h_{n-1}(s)\,ds, \qquad t > 0,\ n \in \mathbb{Z}_+.$$
Starting from the obvious relations $|h_{-1}| \equiv 1$ and $|h_0| \le 2$, it follows by
induction that $|h_{n-1}(t)| \le |t|^n/n!$ and $|h_n(t)| \le 2|t|^n/n!$. □
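The two-sided bound of Lemma 5.14 can also be verified numerically on a grid. This sketch uses hypothetical helper names and is meant only as a sanity check of the stated inequality.

```python
import cmath
import math

def remainder(t, n):
    """h_n(t) = exp(it) - sum_{k=0}^{n} (it)^k / k!."""
    partial = sum((1j * t) ** k / math.factorial(k) for k in range(n + 1))
    return cmath.exp(1j * t) - partial

def bound(t, n):
    """min(2 |t|^n / n!, |t|^(n+1) / (n+1)!)."""
    a = abs(t)
    return min(2.0 * a**n / math.factorial(n), a ** (n + 1) / math.factorial(n + 1))

if __name__ == "__main__":
    for n in (0, 1, 2, 5):
        for t in (0.1, 1.0, 5.0):
            print(n, t, abs(remainder(t, n)), bound(t, n))
```

For small $|t|$ the second term in the minimum is the sharper one, while for large $|t|$ the first term takes over; this is the feature exploited in the proof of Theorem 5.12.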
We return to the proof of Theorem 5.12. At this point we shall prove
only the sufficiency of the Lindeberg condition (ii), which is needed for the
proof of the main Theorem 5.15. To avoid repetition, we postpone the proof
of the necessity part until after the proof of that theorem.
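Proof of Theorem 5.12, (ii) $\Rightarrow$ (i): Write $c_{nj} = E\xi_{nj}^2$ and $c_n = \sum_j c_{nj}$.
First we note that for any $\varepsilon > 0$
$$\sup_j c_{nj} \le \varepsilon^2 + \sup_j E[\xi_{nj}^2;\ |\xi_{nj}| > \varepsilon] \le \varepsilon^2 + \sum_j E[\xi_{nj}^2;\ |\xi_{nj}| > \varepsilon],$$
which tends to 0 under (ii), as $n \to \infty$ and then $\varepsilon \to 0$.
Now introduce some independent random variables $\zeta_{nj}$ with distributions
$N(0, c_{nj})$, and note that $\zeta_n = \sum_j \zeta_{nj}$ is $N(0, c_n)$. Hence, $\zeta_n \xrightarrow{d} \zeta$. Letting
$\varphi_{nj}$ and $\psi_{nj}$ denote the characteristic functions of $\xi_{nj}$ and $\zeta_{nj}$, respectively,
it remains by Theorem 5.3 to show that $\prod_j \varphi_{nj} - \prod_j \psi_{nj} \to 0$. Then
conclude from Lemmas 5.13 and 5.14 that, for fixed $t \in \mathbb{R}$,
$$\Bigl|\prod_j \varphi_{nj}(t) - \prod_j \psi_{nj}(t)\Bigr| \le \sum_j |\varphi_{nj}(t) - \psi_{nj}(t)|$$
$$\le \sum_j \bigl|\varphi_{nj}(t) - 1 + \tfrac12 t^2 c_{nj}\bigr| + \sum_j \bigl|\psi_{nj}(t) - 1 + \tfrac12 t^2 c_{nj}\bigr|$$
$$\lesssim \sum_j E\xi_{nj}^2(1 \wedge |\xi_{nj}|) + \sum_j E\zeta_{nj}^2(1 \wedge |\zeta_{nj}|).$$
For any $\varepsilon > 0$, we have
$$\sum_j E\xi_{nj}^2(1 \wedge |\xi_{nj}|) \le \varepsilon\sum_j c_{nj} + \sum_j E[\xi_{nj}^2;\ |\xi_{nj}| > \varepsilon],$$
which tends to 0 by (ii), as $n \to \infty$ and then $\varepsilon \to 0$. Further note that
$$\sum_j E\zeta_{nj}^2(1 \wedge |\zeta_{nj}|) \le \sum_j E|\zeta_{nj}|^3 = \sum_j c_{nj}^{3/2}E|\zeta|^3 \lesssim c_n \sup_j c_{nj}^{1/2} \to 0$$
by the first part of the proof. □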
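To see what the Lindeberg condition (ii) of Theorem 5.12 measures, the Python sketch below evaluates the Lindeberg sum exactly for two triangular arrays of discrete variables. Both arrays are illustrative assumptions, not taken from the text: rescaled symmetric signs, and rare symmetric jumps of size 1. Each has mean 0 and row variances summing to 1, but only the first satisfies (ii).

```python
def lindeberg_sum(row, eps):
    """Sum over j of E[xi_nj^2; |xi_nj| > eps] for one row of discrete laws.

    `row` is a list of laws, each given as a list of (value, probability) pairs.
    """
    return sum(p * x * x for law in row for x, p in law if abs(x) > eps)

def variance_sum(row):
    """Sum over j of E xi_nj^2 (the laws below all have mean 0)."""
    return sum(p * x * x for law in row for x, p in law)

def scaled_signs_row(n):
    """xi_nj = +-n**-0.5 with probability 1/2 each, for j = 1, ..., n."""
    a = n ** -0.5
    return [[(a, 0.5), (-a, 0.5)]] * n

def rare_jumps_row(n):
    """xi_nj = +-1 with probability 1/(2n) each and 0 otherwise, for j = 1, ..., n."""
    q = 1.0 / (2 * n)
    return [[(1.0, q), (-1.0, q), (0.0, 1.0 - 2.0 * q)]] * n

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n,
              lindeberg_sum(scaled_signs_row(n), 0.1),
              lindeberg_sum(rare_jumps_row(n), 0.5))
```

In the second array $\sup_j E\xi_{nj}^2 = 1/n \to 0$ while the Lindeberg sum stays at 1 for every $\varepsilon < 1$, so by the theorem the row sums cannot converge in distribution to $N(0,1)$.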
The problem of characterizing the convergence to a Gaussian limit is
solved completely by the following result. The reader should notice the
striking resemblance between the present conditions and those of the three-
series criterion in Theorem 4.18. A far-reaching extension of the present
result is obtained by different methods in Chapter 15. As before, $\mathrm{var}[\xi; A] =
\mathrm{var}(\xi 1_A)$.
Theorem 5.15 (Gaussian convergence, Feller, Lévy) Let $(\xi_{nj})$ be a null
array of random variables, and let $\xi$ be $N(b,c)$ for some constants $b$ and $c$.
Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff these conditions hold:
(i) $\sum_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ for all $\varepsilon > 0$;
(ii) $\sum_j E[\xi_{nj};\ |\xi_{nj}| \le 1] \to b$;
(iii) $\sum_j \mathrm{var}[\xi_{nj};\ |\xi_{nj}| \le 1] \to c$.
Moreover, (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$. If $\sum_j \xi_{nj}$ converges in
distribution, then (i) holds iff the limit is Gaussian.
Proof: To see that (i) is equivalent to $\sup_j |\xi_{nj}| \xrightarrow{P} 0$, we note that
$$P\{\sup_j |\xi_{nj}| > \varepsilon\} = 1 - \prod_j \bigl(1 - P\{|\xi_{nj}| > \varepsilon\}\bigr), \qquad \varepsilon > 0.$$
Since $\sup_j P\{|\xi_{nj}| > \varepsilon\} \to 0$ under both conditions, the assertion follows
by Lemma 5.8.
Now assume $\sum_j \xi_{nj} \xrightarrow{d} \xi$. Introduce medians $m_{nj}$ and symmetrizations
$\tilde\xi_{nj}$ of the variables $\xi_{nj}$, and note that $m_n \equiv \sup_j |m_{nj}| \to 0$ and
$\sum_j \tilde\xi_{nj} \xrightarrow{d} \tilde\xi$, where $\tilde\xi$ is $N(0, 2c)$. By Lemma 4.19 and Theorem 5.11, we get
for any $\varepsilon > 0$
$$\sum_j P\{|\xi_{nj}| > \varepsilon\} \le \sum_j P\{|\xi_{nj} - m_{nj}| > \varepsilon - m_n\} \le 2\sum_j P\{|\tilde\xi_{nj}| > \varepsilon - m_n\} \to 0.$$
Thus, we may henceforth assume condition (i) and hence that $\sup_j |\xi_{nj}| \xrightarrow{P} 0$.
But then $\sum_j \xi_{nj} \xrightarrow{d} \eta$ is equivalent to $\sum_j \xi_{nj}' \xrightarrow{d} \eta$, where
$\xi_{nj}' = \xi_{nj} 1\{|\xi_{nj}| \le 1\}$, and so we may further assume that $|\xi_{nj}| \le 1$ a.s. for
all $n$ and $j$. In this case (ii) and (iii) reduce to $b_n \equiv \sum_j E\xi_{nj} \to b$ and
$c_n \equiv \sum_j \mathrm{var}(\xi_{nj}) \to c$, respectively.
Write $b_{nj} = E\xi_{nj}$, and note that $\sup_j |b_{nj}| \to 0$ because of (i). Assuming
(ii) and (iii), we get $\sum_j \xi_{nj} - b_n \xrightarrow{d} \xi - b$ by Theorem 5.12, and so
$\sum_j \xi_{nj} \xrightarrow{d} \xi$. Conversely, $\sum_j \xi_{nj} \xrightarrow{d} \xi$ implies $\sum_j \tilde\xi_{nj} \xrightarrow{d} \tilde\xi$, and (iii) follows
by Theorem 5.11. But then $\sum_j \xi_{nj} - b_n \xrightarrow{d} \xi - b$, so Lemma 4.20 shows that
$b_n$ converges toward some $b'$. Hence, $\sum_j \xi_{nj} \xrightarrow{d} \xi + b' - b$, and so $b' = b$, which
means that even (ii) is fulfilled.
It remains to prove that, under condition (i), any limiting distribution
is Gaussian. Then assume $\sum_j \xi_{nj} \xrightarrow{d} \eta$, and note that $\sum_j \tilde\xi_{nj} \xrightarrow{d} \tilde\eta$, where
$\tilde\eta$ denotes a symmetrization of $\eta$. If $c_n \to \infty$ along some subsequence, then
$c_n^{-1/2}\sum_j \tilde\xi_{nj}$ tends to $N(0,2)$ by the first assertion, which is impossible by
Lemma 4.9. Thus, $(c_n)$ is bounded, and we have convergence $c_n \to c$ along
some subsequence. But then $\sum_j \xi_{nj} - b_n$ tends to $N(0,c)$, again by the
first assertion, and Lemma 4.20 shows that even $b_n$ converges toward some
limit $b$. Hence, $\sum_j \xi_{nj}$ tends to $N(b,c)$, which is then the distribution of
$\eta$. □
Proof of Theorem 5.12, (i) $\Rightarrow$ (ii): The second condition in (i) implies
that $(\xi_{nj})$ is a null array. Furthermore, we have for any $\varepsilon > 0$
$$\sum_j \mathrm{var}[\xi_{nj};\ |\xi_{nj}| \le \varepsilon] \le \sum_j E[\xi_{nj}^2;\ |\xi_{nj}| \le \varepsilon] \le \sum_j E\xi_{nj}^2 \to 1.$$
By Theorem 5.15 even the left-hand side tends to 1, and (ii) follows. □
As a first application of Theorem 5.15, we shall prove the following ul-
timate version of the weak law of large numbers. The result should be
compared with the corresponding strong law established in Theorem 4.23.
Theorem 5.16 (weak laws of large numbers) Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. random
variables, and fix any $p \in (0,2)$ and $c \in \mathbb{R}$. Then $n^{-1/p}\sum_{k \le n}\xi_k \xrightarrow{P} c$
iff the following condition holds as $r \to \infty$, depending on the value of $p$:
$p < 1$: $r^p P\{|\xi| > r\} \to 0$ and $c = 0$;
$p = 1$: $rP\{|\xi| > r\} \to 0$ and $E[\xi;\ |\xi| \le r] \to c$;
$p > 1$: $r^p P\{|\xi| > r\} \to 0$ and $E\xi = c = 0$.
Proof: Applying Theorem 5.15 to the null array of random variables
$\xi_{nj} = n^{-1/p}\xi_j$, $j \le n$, we note that the stated convergence is equivalent to
the three conditions
(i) $nP\{|\xi| > n^{1/p}\varepsilon\} \to 0$ for all $\varepsilon > 0$,
(ii) $n^{1-1/p}E[\xi;\ |\xi| \le n^{1/p}] \to c$,
(iii) $n^{1-2/p}\mathrm{var}[\xi;\ |\xi| \le n^{1/p}] \to 0$.
By the monotonicity of $P\{|\xi| > r^{1/p}\}$, condition (i) is equivalent to
$r^p P\{|\xi| > r\} \to 0$. Furthermore, Lemma 3.4 yields for any $r > 0$
$$r^{p-2}\,\mathrm{var}[\xi;\ |\xi| \le r] \le r^p E[(\xi/r)^2 \wedge 1] = r^p\int_0^1 P\{|\xi| > r\sqrt{t}\}\,dt,$$
$$r^{p-1}\,\bigl|E[\xi;\ |\xi| \le r]\bigr| \le r^p E(|\xi/r| \wedge 1) = r^p\int_0^1 P\{|\xi| > rt\}\,dt.$$
Since $t^{-a}$ is integrable on $[0,1]$ for any $a < 1$, it follows by dominated
convergence that (i) implies (iii) and also that (i) implies (ii) with $c = 0$
when $p < 1$.
If instead $p > 1$, we see from (i) and Lemma 3.4 that
$$E|\xi| = \int_0^\infty P\{|\xi| > r\}\,dr \lesssim \int_0^\infty (1 \wedge r^{-p})\,dr < \infty.$$
Thus, $E[\xi;\ |\xi| \le r] \to E\xi$, and (ii) implies $E\xi = 0$. Moreover, we get
from (i)
$$r^{p-1}E[|\xi|;\ |\xi| > r] = r^p P\{|\xi| > r\} + r^{p-1}\int_r^\infty P\{|\xi| > t\}\,dt \to 0.$$
Under the further assumption that $E\xi = 0$, we obtain (ii) with $c = 0$.
Finally, let $p = 1$, and conclude from (i) that
$$E[|\xi|;\ n < |\xi| \le n+1] \lesssim nP\{|\xi| > n\} \to 0.$$
Hence, under (i), condition (ii) is equivalent to $E[\xi;\ |\xi| \le r] \to c$. □
We next extend the central limit theorem in Proposition 5.9 by charac-
terizing convergence of suitably normalized partial sums from a single i.i.d.
sequence toward a Gaussian limit. Here a nondecreasing function $L \ge 0$ is
said to vary slowly at $\infty$ if $\sup_x L(x) > 0$ and moreover $L(cx) \sim L(x)$ as
$x \to \infty$ for each $c > 0$. This holds in particular when $L$ is bounded, but it
is also true for many unbounded functions, such as $\log(x \vee 1)$.
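The defining relation $L(cx) \sim L(x)$ can be explored numerically. The sketch below, with hypothetical function names, compares the slowly varying function $\log(x \vee 1)$ with the power $x^{0.1}$, chosen here only as a contrasting example that grows slowly but does not vary slowly, since its ratio $L(cx)/L(x)$ stays at $c^{0.1}$.

```python
import math

def log_plus(x):
    """L(x) = log(x v 1), nondecreasing and nonnegative."""
    return math.log(max(x, 1.0))

def small_power(x):
    """L(x) = x ** 0.1, nondecreasing but not slowly varying."""
    return x ** 0.1

def ratio(L, c, x):
    """L(c x) / L(x), which should approach 1 as x grows if L varies slowly."""
    return L(c * x) / L(x)

if __name__ == "__main__":
    for x in (1e2, 1e10, 1e100):
        print(x, ratio(log_plus, 2.0, x), ratio(small_power, 2.0, x))
```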
Theorem 5.17 (domain of Gaussian attraction, Lévy, Feller, Khinchin)
Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. nondegenerate random variables, and let $\zeta$ be
$N(0,1)$. Then $a_n\sum_{k \le n}(\xi_k - m_n) \xrightarrow{d} \zeta$ for some constants $a_n$ and $m_n$ iff the
function $L(x) = E[\xi^2;\ |\xi| \le x]$ varies slowly at $\infty$, in which case we may
take $m_n \equiv E\xi$. In particular, the stated convergence holds with $a_n = n^{-1/2}$
and $m_n \equiv 0$ iff $E\xi = 0$ and $E\xi^2 = 1$.
Even other so-called stable distributions may occur as limits, but the
conditions for convergence are too restrictive to be of much interest for
applications. Our proof of Theorem 5.17 is based on the following result.
Lemma 5.18 (slow variation, Karamata) Let $\xi$ be a nondegenerate random
variable such that $L(x) = E[\xi^2;\ |\xi| \le x]$ varies slowly at $\infty$. Then so
does the function $L_m(x) = E[(\xi - m)^2;\ |\xi - m| \le x]$ for every $m \in \mathbb{R}$, and
moreover
$$\lim_{x \to \infty} x^{2-p}E[|\xi|^p;\ |\xi| > x]/L(x) = 0, \qquad p \in [0,2). \tag{10}$$
Proof: Fix any constant $r \in (1, 2^{2-p})$, and choose $x_0 > 0$ so large that
$L(2x) \le rL(x)$ for all $x \ge x_0$. For such an $x$, we get
$$x^{2-p}E[|\xi|^p;\ |\xi| > x] = x^{2-p}\sum_{n \ge 0} E\bigl[|\xi|^p;\ |\xi|/x \in (2^n, 2^{n+1}]\bigr]$$
$$\le \sum_{n \ge 0} 2^{(p-2)n}E\bigl[\xi^2;\ |\xi|/x \in (2^n, 2^{n+1}]\bigr]$$
$$\le \sum_{n \ge 0} 2^{(p-2)n}(r-1)r^n L(x)$$
$$= (r-1)L(x)/(1 - 2^{p-2}r).$$
Now (10) follows, as we divide by $L(x)$ and let $x \to \infty$ and then $r \to 1$.
In particular, we note that $E|\xi|^p < \infty$ for all $p < 2$. If even $E\xi^2 < \infty$,
then $E(\xi - m)^2 < \infty$, and the first assertion is obvious. If instead $E\xi^2 = \infty$,
we may write
$$L_m(x) = E[\xi^2;\ |\xi - m| \le x] + mE[m - 2\xi;\ |\xi - m| \le x].$$
Here the last term is bounded, and the first term lies between the bounds
$L(x \pm m) \sim L(x)$. Thus, $L_m(x) \sim L(x)$, and the slow variation of $L_m$
follows from that of $L$. □
Proof of Theorem 5.17: Assume that $L$ varies slowly at $\infty$. By Lemma
5.18 this is also true for the function $L_m(x) = E[(\xi - m)^2;\ |\xi - m| \le x]$,
where $m = E\xi$, and so we may assume that $E\xi = 0$. Now define
$$c_n = 1 \vee \sup\{x > 0;\ nL(x) \ge x^2\}, \qquad n \in \mathbb{N},$$
and note that $c_n \uparrow \infty$. From the slow variation of $L$ it is further clear that
$c_n < \infty$ for all $n$ and that, moreover, $nL(c_n) \sim c_n^2$. In particular, $c_n \sim n^{1/2}$
iff $L(c_n) \sim 1$, that is, iff $\mathrm{var}(\xi) = 1$.
We shall verify the conditions of Theorem 5.15 with $b = 0$, $c = 1$, and
$\xi_{nj} = \xi_j/c_n$, $j \le n$. Beginning with (i), let $\varepsilon > 0$ be arbitrary, and conclude
from Lemma 5.18 that
$$nP\{|\xi/c_n| > \varepsilon\} \sim \frac{c_n^2 P\{|\xi| > c_n\varepsilon\}}{L(c_n)} \sim \frac{c_n^2 P\{|\xi| > c_n\varepsilon\}}{L(c_n\varepsilon)} \to 0.$$
Recalling that $E\xi = 0$, we get by the same lemma
$$n\bigl|E[\xi/c_n;\ |\xi/c_n| \le 1]\bigr| \le \frac{n}{c_n}E[|\xi|;\ |\xi| > c_n] \sim \frac{c_n E[|\xi|;\ |\xi| > c_n]}{L(c_n)} \to 0, \tag{11}$$
which proves (ii). To obtain (iii), we note that in view of (11)
$$n\,\mathrm{var}[\xi/c_n;\ |\xi/c_n| \le 1] = \frac{n}{c_n^2}L(c_n) - n\bigl(E[\xi/c_n;\ |\xi| \le c_n]\bigr)^2 \to 1.$$
By Theorem 5.15 the required convergence follows with $a_n = c_n^{-1}$ and
$m_n \equiv 0$.
Now assume instead that the stated convergence holds for suitable constants
$a_n$ and $m_n$. Then a corresponding result holds for the symmetrized
variables $\tilde\xi, \tilde\xi_1, \tilde\xi_2, \dots$ with constants $a_n/\sqrt{2}$ and 0, and so we may assume
that $c_n^{-1}\sum_{k \le n}\tilde\xi_k \xrightarrow{d} \zeta$. Here, clearly, $c_n \to \infty$ and, moreover, $c_{n+1} \sim c_n$,
since even $c_{n+1}^{-1}\sum_{k \le n}\tilde\xi_k \xrightarrow{d} \zeta$ by Theorem 4.28. Now define for $x > 0$
$$\tilde T(x) = P\{|\tilde\xi| > x\}, \qquad \tilde L(x) = E[\tilde\xi^2;\ |\tilde\xi| \le x], \qquad \tilde U(x) = E(\tilde\xi^2 \wedge x^2).$$
By Theorem 5.15 we have $n\tilde T(c_n\varepsilon) \to 0$ for all $\varepsilon > 0$, and also $nc_n^{-2}\tilde L(c_n)
\to 1$. Thus, $c_n^2\tilde T(c_n\varepsilon)/\tilde L(c_n) \to 0$, which extends by monotonicity to
$$\frac{x^2\tilde T(x)}{\tilde U(x)} \le \frac{x^2\tilde T(x)}{\tilde L(x)} \to 0, \qquad x \to \infty.$$
Next define for any $x > 0$
$$T(x) = P\{|\xi| > x\}, \qquad U(x) = E(\xi^2 \wedge x^2).$$
By Lemma 4.19 we have $T(x + |m|) \le 2\tilde T(x)$ for any median $m$ of $\xi$.
Furthermore, by Lemmas 3.4 and 4.19, we get
$$\tilde U(x) = \int_0^{x^2} P\{\tilde\xi^2 > t\}\,dt \le 2\int_0^{x^2} P\{4\xi^2 > t\}\,dt = 8U(x/2).$$
Hence, as $x \to \infty$,
$$\frac{L(2x) - L(x)}{L(x)} \le \frac{4x^2 T(x)}{U(x) - x^2 T(x)} \le \frac{8x^2\tilde T(x - |m|)}{8^{-1}\tilde U(2x) - 2x^2\tilde T(x - |m|)} \to 0,$$
which shows that $L$ is slowly varying.
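Finally, assume that $n^{-1/2}\sum_{k \le n}\xi_k \xrightarrow{d} \zeta$. By the previous argument
with $c_n = n^{1/2}$, we get $\tilde L(n^{1/2}) \to 2$, which implies $E\tilde\xi^2 = 2$ and hence
$\mathrm{var}(\xi) = 1$. But then $n^{-1/2}\sum_{k \le n}(\xi_k - E\xi) \xrightarrow{d} \zeta$, and so by comparison
$E\xi = 0$. □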
We return to the general problem of characterizing the weak convergence
of a sequence of probability measures $\mu_n$ on $\mathbb{R}^d$ in terms of the associated
characteristic functions $\hat\mu_n$ or Laplace transforms $\tilde\mu_n$. Suppose that $\hat\mu_n$ or
$\tilde\mu_n$ converges toward some continuous limit $\varphi$, which is not recognized as a
characteristic function or Laplace transform. To conclude that $\mu_n$ converges
weakly toward some measure $\mu$, we need an extended version of Theorem
5.3, which in turn requires a compactness argument for its proof.
As a preparation, consider the space $\mathcal{M} = \mathcal{M}(\mathbb{R}^d)$ of locally finite measures
on $\mathbb{R}^d$. On $\mathcal{M}$ we may introduce the vague topology, generated by
the mappings $\mu \mapsto \mu f = \int f\,d\mu$ for all $f \in C_K^+$, the class of continuous
functions $f: \mathbb{R}^d \to \mathbb{R}_+$ with compact support. In particular, $\mu_n$ converges
vaguely to $\mu$ (written as $\mu_n \xrightarrow{v} \mu$) iff $\mu_n f \to \mu f$ for all $f \in C_K^+$. If the $\mu_n$
are probability measures, then clearly $\mu\mathbb{R}^d \le 1$. The following version of
Helly's selection theorem shows that the set of probability measures on $\mathbb{R}^d$
is vaguely relatively sequentially compact.
Theorem 5.19 (vague sequential compactness, Helly) Any sequence of
probability measures on $\mathbb{R}^d$ has a vaguely convergent subsequence.
Proof: Fix any probability measures $\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$, and let $F_1, F_2, \dots$
denote the corresponding distribution functions. Write $\mathbb{Q}$ for the set of
rational numbers. By a diagonal argument, the functions $F_n$ converge on
$\mathbb{Q}^d$ toward some limit $G$, along a suitable subsequence $N' \subset \mathbb{N}$, and we
may define
$$F(x) = \inf\{G(r);\ r \in \mathbb{Q}^d,\ r > x\}, \qquad x \in \mathbb{R}^d. \tag{12}$$
Since each $F_n$ has nonnegative increments, the same thing is true for $G$
and hence also for $F$. From (12) and the monotonicity of $G$, it is further
clear that $F$ is right-continuous. Hence, by Corollary 3.26 there exists some
measure $\mu$ on $\mathbb{R}^d$ with $\mu(x,y] = F(x,y]$ for any bounded rectangular box
$(x,y] \subset \mathbb{R}^d$, and it remains to show that $\mu_n \xrightarrow{v} \mu$ along $N'$.
Then note that $F_n(x) \to F(x)$ at every continuity point $x$ of $F$. By the
monotonicity of $F$ there exist some countable sets $D_1, \dots, D_d \subset \mathbb{R}$ such
that $F$ is continuous on $C = D_1^c \times \cdots \times D_d^c$. Then $\mu_n U \to \mu U$ for every
finite union $U$ of rectangular boxes with corners in $C$, and by a simple
approximation we get for any bounded Borel set $B \subset \mathbb{R}^d$
$$\mu B^\circ \le \liminf_{n \to \infty}\mu_n B \le \limsup_{n \to \infty}\mu_n B \le \mu\bar{B}. \tag{13}$$
For any bounded $\mu$-continuity set $B$, we may consider functions $f \in C_K^+$
supported by $B$, and proceed as in the proof of Theorem 4.25 to show that
$\mu_n f \to \mu f$. Thus, $\mu_n \xrightarrow{v} \mu$. □
If $\mu_n \xrightarrow{v} \mu$ for some probability measures $\mu_n$ on $\mathbb{R}^d$, we may still have
$\mu\mathbb{R}^d < 1$, due to an escape of mass to infinity. To exclude this possibility,
we need to assume that $(\mu_n)$ be tight.
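A minimal example of such an escape of mass is $\mu_n = \tfrac12\delta_0 + \tfrac12\delta_n$ on $\mathbb{R}$, which is an illustrative choice and not taken from the text. For every continuous $f$ with compact support we have $\mu_n f \to \tfrac12 f(0)$, so the vague limit $\tfrac12\delta_0$ has total mass $\tfrac12$, and the sequence is not tight. The Python sketch below, with hypothetical helper names, makes this concrete.

```python
def tent(x, radius):
    """Continuous function with values in [0, 1] and support [-radius, radius]."""
    return max(0.0, 1.0 - abs(x) / radius)

def integrate(f, measure):
    """Integral of f against a discrete measure given as (atom, weight) pairs."""
    return sum(weight * f(atom) for atom, weight in measure)

def mu(n):
    """mu_n = (1/2) delta_0 + (1/2) delta_n, a probability measure on R."""
    return [(0.0, 0.5), (float(n), 0.5)]

def mass_in_ball(measure, r):
    """Mass of the open ball {|x| < r}."""
    return sum(weight for atom, weight in measure if abs(atom) < r)

if __name__ == "__main__":
    f = lambda x: tent(x, 5.0)
    for n in (1, 3, 10, 1000):
        print(n, integrate(f, mu(n)), mass_in_ball(mu(n), 100.0))
```

However large the radius, half of the mass of $\mu_n$ eventually lies outside the ball, which is exactly the failure of tightness.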
Lemma 5.20 (vague and weak convergence) For any probability measures
$\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$ with $\mu_n \xrightarrow{v} \mu$ for some measure $\mu$, we have $\mu\mathbb{R}^d = 1$ iff
$(\mu_n)$ is tight, and then $\mu_n \xrightarrow{w} \mu$.
Proof: By a simple approximation, the vague convergence implies (13)
for every bounded Borel set $B$, and in particular for the balls $B_r = \{x \in
\mathbb{R}^d;\ |x| < r\}$, $r > 0$. If $\mu\mathbb{R}^d = 1$, then $\mu B_r \to 1$ as $r \to \infty$, and the
first inequality shows that $(\mu_n)$ is tight. Conversely, if $(\mu_n)$ is tight, then
$\limsup_n \mu_n B_r \to 1$ as $r \to \infty$, and the last inequality yields $\mu\mathbb{R}^d = 1$.
Now assume that $(\mu_n)$ is tight, and fix any bounded continuous function
$f: \mathbb{R}^d \to \mathbb{R}$. For any $r > 0$, we may choose some $g_r \in C_K^+$ with $1_{B_r} \le g_r \le 1$
and note that
$$|\mu_n f - \mu f| \le |\mu_n f - \mu_n f g_r| + |\mu_n f g_r - \mu f g_r| + |\mu f g_r - \mu f|$$
$$\le |\mu_n f g_r - \mu f g_r| + \|f\|\,(\mu_n + \mu)B_r^c.$$
Here the right-hand side tends to zero as $n \to \infty$ and then $r \to \infty$, so
$\mu_n f \to \mu f$. Hence, in this case $\mu_n \xrightarrow{w} \mu$. □
Combining the last two results, we may easily show that the notions of
tightness and weak sequential compactness are equivalent. The result is
extended in Theorem 16.3, which forms a starting point for the theory of
weak convergence on function spaces.
Proposition 5.21 (tightness and weak sequential compactness) A sequence
of probability measures on $\mathbb{R}^d$ is tight iff every subsequence has a
weakly convergent further subsequence.
Proof: Fix any probability measures $\mu_1, \mu_2, \dots$ on $\mathbb{R}^d$. By Theorem 5.19
every subsequence has a vaguely convergent further subsequence. If $(\mu_n)$ is
tight, then by Lemma 5.20 the convergence holds even in the weak sense.
Now assume instead that $(\mu_n)$ has the stated property. If it fails to be
tight, we may choose a sequence $n_k \to \infty$ and some constant $\varepsilon > 0$ such
that $\mu_{n_k}B_k^c > \varepsilon$ for all $k \in \mathbb{N}$. By hypothesis there exists some probability
measure $\mu$ on $\mathbb{R}^d$ such that $\mu_{n_k} \xrightarrow{w} \mu$ along a subsequence $N' \subset \mathbb{N}$. The
sequence $(\mu_{n_k};\ k \in N')$ is then tight by Lemma 4.8, and in particular there
exists some $r > 0$ with $\mu_{n_k}B_r^c \le \varepsilon$ for all $k \in N'$. For $k > r$ this is a
contradiction, and the asserted tightness follows. □
We may now prove the desired extension of Theorem 5.3.
Theorem 5.22 (extended continuity theorem, Lévy, Bochner) Let $\mu_1, \mu_2,
\dots$ be probability measures on $\mathbb{R}^d$ with $\hat\mu_n(t) \to \varphi(t)$ for every $t \in \mathbb{R}^d$,
where the limit $\varphi$ is continuous at 0. Then $\mu_n \xrightarrow{w} \mu$ for some probability
measure $\mu$ on $\mathbb{R}^d$ with $\hat\mu = \varphi$. A corresponding statement holds for the
Laplace transforms of measures on $\mathbb{R}_+^d$.
Proof: Assume that $\hat\mu_n \to \varphi$, where the limit is continuous at 0. As
in the proof of Theorem 5.3, we may conclude that $(\mu_n)$ is tight. Hence,
by Proposition 5.21 there exists some probability measure $\mu$ on $\mathbb{R}^d$ such
that $\mu_n \xrightarrow{w} \mu$ along a subsequence $N' \subset \mathbb{N}$. By continuity we get $\hat\mu_n \to \hat\mu$
along $N'$, and so $\varphi = \hat\mu$. Finally, the convergence $\mu_n \xrightarrow{w} \mu$ extends to $\mathbb{N}$ by
Theorem 5.3. The proof for Laplace transforms is similar. □
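The continuity of the limit at 0 cannot be dropped. As an illustration (the example is an assumption of this note, not part of the text), take $\mu_n = N(0, n)$ on $\mathbb{R}$: the characteristic functions $e^{-nt^2/2}$ converge pointwise to the indicator of $\{0\}$, which is discontinuous at 0, while the mass of any fixed interval $[-r, r]$ tends to 0. The short Python sketch below evaluates both quantities.

```python
import math

def char_fn(t, n):
    """Characteristic function of N(0, n) at t."""
    return math.exp(-0.5 * n * t * t)

def mass_in_interval(r, n):
    """Mass that N(0, n) assigns to [-r, r]."""
    return math.erf(r / math.sqrt(2.0 * n))

if __name__ == "__main__":
    for n in (1, 100, 10_000, 1_000_000):
        print(n, char_fn(0.0, n), char_fn(0.01, n), mass_in_interval(10.0, n))
```

The first column of values stays at 1 while the second tends to 0, and the interval masses shrink, so no probability measure can arise as a weak limit.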
Exercises
1. Show that if $\xi$ and $\eta$ are independent Poisson random variables, then
$\xi + \eta$ is again Poisson. Also show that the Poisson property is preserved
under convergence in distribution.
2. Show that any linear combination of independent Gaussian random variables
is again Gaussian. Also show that the class of Gaussian distributions
is preserved under weak convergence.
3. Show that $\varphi_r(t) = (1 - |t|/r)_+$ is a characteristic function for every $r > 0$.
(Hint: Compute the Fourier transform $\hat\psi_r$ of the function $\psi_r(t) = 1\{|t| \le r\}$,
and note that the Fourier transform $\hat\psi_r^2$ of $\psi_r^{*2}$ is integrable. Now use Fourier
inversion.)
4. Let $\varphi$ be a real, even function that is convex on $\mathbb{R}_+$ and satisfies $\varphi(0) =
1$ and $\varphi(\infty) \in [0,1]$. Show that $\varphi$ is the characteristic function of some
symmetric distribution on $\mathbb{R}$. In particular, $\varphi(t) = e^{-|t|^c}$ is a characteristic
function for every $c \in [0,1]$. (Hint: Approximate by convex combinations
of functions $\varphi_r$ as above, and use Theorem 5.22.)
5. Show that if $\hat\mu$ is integrable, then $\mu$ has a bounded and continuous
density. (Hint: Let $\varphi_r$ be the triangular density above. Then $\hat{\hat\varphi}_r = 2\pi\varphi_r$,
and so $\int e^{-itu}\hat\mu_t\hat\varphi_r(t)\,dt = 2\pi\int \varphi_r(x - u)\,\mu(dx)$. Now let $r \to 0$.)
6. Show that a distribution $\mu$ is supported by some set $a\mathbb{Z} + b$ iff $|\hat\mu_t| = 1$
for some $t \ne 0$.
7. Give an elementary proof of the continuity theorem for generating
functions of distributions on $\mathbb{Z}_+$. (Hint: Note that if $\mu_n \xrightarrow{v} \mu$ for some
distributions on $\mathbb{R}_+$, then $\tilde\mu_n \to \tilde\mu$ on $(0, \infty)$.)
8. The moment-generating function of a distribution $\mu$ on $\mathbb{R}$ is given by
$\tilde\mu_t = \int e^{tx}\,\mu(dx)$. Assuming $\tilde\mu_t < \infty$ for all $t$ in some nondegenerate interval
$I$, show that $\tilde\mu$ is analytic in the strip $\{z \in \mathbb{C};\ \Re z \in I^\circ\}$. (Hint: Approximate
by measures with bounded support.)
9. Let $\mu, \mu_1, \mu_2, \dots$ be distributions on $\mathbb{R}$ with moment-generating functions
$\tilde\mu, \tilde\mu_1, \tilde\mu_2, \dots$ such that $\tilde\mu_n \to \tilde\mu < \infty$ on some nondegenerate interval $I$.
Show that $\mu_n \xrightarrow{w} \mu$. (Hint: If $\mu_n \xrightarrow{v} \nu$ along some subsequence $N'$, then
$\tilde\mu_n \to \tilde\nu$ on $I^\circ$ along $N'$, and so $\tilde\nu = \tilde\mu$ on $I$. By the preceding exercise we
get $\nu\mathbb{R} = 1$ and $\hat\nu = \hat\mu$. Thus, $\nu = \mu$.)
10. Let $\mu$ and $\nu$ be distributions on $\mathbb{R}$ with finite moments $\int x^n\,\mu(dx) =
\int x^n\,\nu(dx) = m_n$, where $\sum_n t^n|m_n|/n! < \infty$ for some $t > 0$. Show that
$\mu = \nu$. (Hint: The absolute moments satisfy the same relation for any
smaller value of $t$, so the moment-generating functions exist and agree on
$(-t, t)$.)
11. For each $n \in \mathbb{N}$, let $\mu_n$ be a distribution on $\mathbb{R}$ with finite moments $m_n^k$,
$k \in \mathbb{N}$, such that $\lim_n m_n^k = a_k$ for some constants $a_k$ with $\sum_k t^k|a_k|/k! <
\infty$ for some $t > 0$. Show that $\mu_n \xrightarrow{w} \mu$ for some distribution $\mu$ with moments
$a_k$. (Hint: Each function $x^k$ is uniformly integrable with respect to the
measures $\mu_n$. In particular, $(\mu_n)$ is tight. If $\mu_n \xrightarrow{w} \nu$ along some subsequence,
then $\nu$ has moments $a_k$.)
12. Given a distribution $\mu$ on $\mathbb{R} \times \mathbb{R}_+$, introduce the mixed transform
$\varphi(s,t) = \int e^{isx - ty}\,\mu(dx\,dy)$, where $s \in \mathbb{R}$ and $t \ge 0$. Prove versions for $\varphi$ of
the continuity Theorems 5.3 and 5.22.
13. Consider a null array of random vectors $\xi_{nj} = (\xi_{nj}^1, \dots, \xi_{nj}^d)$ in $\mathbb{Z}_+^d$,
let $\xi^1, \dots, \xi^d$ be independent Poisson variables with means $c_1, \dots, c_d$, and
put $\xi = (\xi^1, \dots, \xi^d)$. Show that $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff $\sum_j P\{\xi_{nj}^k = 1\} \to c_k$ for
all $k$ and $\sum_j P\{\sum_k \xi_{nj}^k > 1\} \to 0$. (Hint: Introduce independent random
variables $\eta_{nj}^k \stackrel{d}{=} \xi_{nj}^k$, and note that $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff $\sum_j \eta_{nj} \xrightarrow{d} \xi$.)
14. Consider some random variables $\xi \perp\!\!\!\perp \eta$ with finite variance such that
the distribution of $(\xi, \eta)$ is rotationally invariant. Show that $\xi$ is centered
Gaussian. (Hint: Let $\xi_1, \xi_2, \dots$ be i.i.d. and distributed as $\xi$, and note that
$n^{-1/2}\sum_{k \le n}\xi_k$ has the same distribution for all $n$. Now use Proposition
5.9.)
15. Prove a multivariate version of the Taylor expansion in Lemma 5.10.
16. Let $\mu$ have a finite $n$th moment $m_n$. Show that $\hat\mu$ is $n$ times continuously
differentiable and satisfies $\hat\mu_0^{(n)} = i^n m_n$. (Hint: Differentiate $n$ times under
the integral sign.)
17. For $\mu$ and $m_n$ as above, show that $\hat\mu_0^{(2n)}$ exists iff $m_{2n} < \infty$. Also,
characterize the distributions such that $\hat\mu_0^{(2n-1)}$ exists. (Hint: For $\hat\mu_0''$ proceed as
in the proof of Proposition 5.9, and use Theorem 5.17. For $\hat\mu_0'$ use Theorem
5.16. Extend by induction to $n > 1$.)
18. Let $\mu$ be a distribution on $\mathbb{R}_+$ with moments $m_n$. Show that $\tilde\mu_0^{(n)} =
(-1)^n m_n$ whenever either side exists and is finite. (Hint: Prove the
statement for $n = 1$, and extend by induction.)
19. Deduce Proposition 5.9 from Theorem 5.12.
20. Let the random variables $\zeta$ and $\xi_{nj}$ be such as in Theorem 5.12, and
assume that $\sum_j E|\xi_{nj}|^c \to 0$ for some $c > 2$. Show that $\sum_j \xi_{nj} \xrightarrow{d} \zeta$.
21. Extend Theorem 5.12 to random vectors in $\mathbb{R}^d$, with the condition
$\sum_j E\xi_{nj}^2 \to 1$ replaced by $\sum_j \mathrm{cov}(\xi_{nj}) \to a$, with $\zeta$ as $N(0, a)$, and with
$\xi_{nj}^2$ replaced by $|\xi_{nj}|^2$. (Hint: Use Corollary 5.5 to reduce to one dimension.)
22. Show that Theorem 5.15 remains true for random vectors in $\mathbb{R}^d$, with
$\mathrm{var}[\xi_{nj};\ |\xi_{nj}| \le 1]$ replaced by the corresponding covariance matrix. (Hint:
If $a, a_1, a_2, \dots$ are symmetric, nonnegative definite matrices, then $a_n \to a$
iff $u'a_n u \to u'au$ for all $u \in \mathbb{R}^d$. To see this, use a compactness argument.)
23. Show that Theorems 5.7 and 5.15 remain valid for possibly infinite
row-sums $\sum_j \xi_{nj}$. (Hint: Use Theorem 4.17 or 4.18 together with Theorem
4.28.)
24. Let $\xi, \xi_1, \xi_2, \dots$ be i.i.d. random variables. Show that $n^{-1/2}\sum_{k \le n}\xi_k$
converges in probability iff $\xi = 0$ a.s. (Hint: Use condition (iii) in Theorem
5.15.)
25. Let $\xi_1, \xi_2, \dots$ be i.i.d. $\mu$, and fix any $p \in (0,2)$. Find a $\mu$ such that
$n^{-1/p}\sum_{k \le n}\xi_k \to 0$ in probability but not a.s.
26. Let $\xi_1, \xi_2, \dots$ be i.i.d., and let $p > 0$ be such that $n^{-1/p}\sum_{k \le n}\xi_k \to 0$
in probability but not a.s. Show that $\limsup_n n^{-1/p}\bigl|\sum_{k \le n}\xi_k\bigr| = \infty$ a.s.
(Hint: Note that $E|\xi_1|^p = \infty$.)
27. Give an example of a distribution with infinite second moment in
the domain of attraction of the Gaussian law, and find the corresponding
normalization.
Chapter 6
Conditioning and Disintegration
Conditional expectations and probabilities; regular conditional
distributions; disintegration; conditional independence; transfer
and coupling; existence of sequences and processes; extension
through conditioning
Modern probability theory can be said to begin with the notions of con-
ditioning and disintegration. In particular, conditional expectations and
distributions are needed already for the definitions of martingales and
Markov processes, the two basic dependence structures beyond indepen-
dence and stationarity. Even in other areas and throughout probability
theory, conditioning is constantly used as a basic tool to describe and
analyze systems involving randomness. The notion may be thought of in
terms of averaging, projection, and disintegration, viewpoints that are all
essential for a proper understanding.
In all but the most elementary contexts, one defines conditioning with
respect to a $\sigma$-field rather than a single event. In general, the result of the
operation is not a constant but a random variable, measurable with respect
to the given $\sigma$-field. The idea is familiar from elementary constructions of
the conditional expectation $E[\xi|\eta]$, in cases where $(\xi, \eta)$ is a random vector
with a nice density, and the result is obtained as a suitable function of $\eta$.
This corresponds to conditioning on the $\sigma$-field $\mathcal{F} = \sigma(\eta)$.
The simplest and most intuitive general approach to conditioning is via
projection. Here $E[\xi|\mathcal{F}]$ is defined for any $\xi \in L^2$ as the orthogonal Hilbert
space projection of $\xi$ onto the linear subspace of $\mathcal{F}$-measurable random
variables. The $L^2$-version extends immediately, by continuity, to arbitrary
$\xi \in L^1$. From the orthogonality of the projection one gets the relation
$E(\xi - E[\xi|\mathcal{F}])\zeta = 0$ for any bounded, $\mathcal{F}$-measurable random variable $\zeta$.
This leads in particular to the familiar averaging characterization of $E[\xi|\mathcal{F}]$
as a version of the density $d(\xi \cdot P)/dP$ on the $\sigma$-field $\mathcal{F}$, the existence of
which can also be inferred from the Radon-Nikodym theorem.
The conditional expectation is defined only up to a null set, in the sense
that any two versions agree a.s. It is then natural to look for versions
of the conditional probabilities $P[A|\mathcal{F}] = E[1_A|\mathcal{F}]$ that combine into a
random probability measure on $\Omega$. In general, such regular versions exist
only for $A$ restricted to suitable sub-$\sigma$-fields. The basic case is when $\xi$ is
a random element in some Borel space $S$, and the conditional distribution
$P[\xi \in \cdot\,|\mathcal{F}]$ may be constructed as an $\mathcal{F}$-measurable random measure on
$S$. If we further assume that $\mathcal{F} = \sigma(\eta)$ for a random element $\eta$ in some
space $T$, we may write $P[\xi \in B|\eta] = \mu(\eta, B)$ for some probability kernel
$\mu$ from $T$ to $S$. This leads to a decomposition of the distribution of $(\xi, \eta)$
according to the values of $\eta$. The result is formalized in the disintegration
theorem, a powerful extension of Fubini's theorem that is often used in
subsequent chapters, especially in combination with the (strong) Markov
property.
Using conditional distributions, we shall further establish the basic transfer
theorem, which may be used to convert any distributional equivalence
$\xi \stackrel{d}{=} f(\eta)$ into a corresponding a.s. representation $\xi = f(\tilde\eta)$ with a suitable
$\tilde\eta \stackrel{d}{=} \eta$. From the latter result, one easily obtains the fundamental
Daniell-Kolmogorov theorem, which ensures the existence of random sequences
and processes with specified finite-dimensional distributions. A different
approach is required for the more general Ionescu Tulcea extension, where
the measure is specified by a sequence of conditional distributions.
Further topics treated in this chapter include the notion of conditional
independence, which is fundamental for both Markov processes and ex-
changeability and also plays an important role in Chapter 21, in connection
with SDEs. Especially useful in those contexts is the elementary but
powerful chain rule. Let us finally call attention to the local property of con-
ditional expectations, which in particular leads to simple and transparent
proofs of the strong Markov and optional sampling theorems.
Returning to our construction of conditional expectations, let us fix a
probability space $(\Omega, \mathcal{A}, P)$ and consider an arbitrary sub-σ-field $\mathcal{F} \subset \mathcal{A}$.
In $L^2 = L^2(\mathcal{A})$ we may introduce the closed linear subspace $M$, consisting
of all random variables $\eta \in L^2$ that agree a.s. with some element of $L^2(\mathcal{F})$.
By the Hilbert space projection Theorem 1.33, there exists for every $\xi \in L^2$
an a.s. unique random variable $\eta \in M$ with $\xi - \eta \perp M$, and we define
$E^{\mathcal{F}}\xi = E[\xi \mid \mathcal{F}]$ as an arbitrary $\mathcal{F}$-measurable version of $\eta$.
The $L^2$-projection $E^{\mathcal{F}}$ is easily extended to $L^1$, as follows.
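Theorem 6.1 (conditional expectation, Kolmogorov) For any σ-field $\mathcal{F} \subset \mathcal{A}$
there exists an a.s. unique linear operator $E^{\mathcal{F}} : L^1 \to L^1(\mathcal{F})$ such that
(i) $E[E^{\mathcal{F}}\xi; A] = E[\xi; A]$, $\xi \in L^1$, $A \in \mathcal{F}$.
The following additional properties hold whenever the corresponding
expressions exist for the absolute values:
(ii) $\xi \ge 0$ implies $E^{\mathcal{F}}\xi \ge 0$ a.s.;
(iii) $E|E^{\mathcal{F}}\xi| \le E|\xi|$;
(iv) $0 \le \xi_n \uparrow \xi$ implies $E^{\mathcal{F}}\xi_n \uparrow E^{\mathcal{F}}\xi$ a.s.;
(v) $E^{\mathcal{F}}\xi\eta = \xi E^{\mathcal{F}}\eta$ a.s. when $\xi$ is $\mathcal{F}$-measurable;
(vi) $E(\xi E^{\mathcal{F}}\eta) = E(\eta E^{\mathcal{F}}\xi) = E(E^{\mathcal{F}}\xi \cdot E^{\mathcal{F}}\eta)$;
(vii) $E^{\mathcal{F}}E^{\mathcal{G}}\xi = E^{\mathcal{F}}\xi$ a.s. for all $\mathcal{F} \subset \mathcal{G}$.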
In particular, we note that $E^{\mathcal{F}}\xi = \xi$ a.s. iff $\xi$ has an $\mathcal{F}$-measurable version
and that $E^{\mathcal{F}}\xi = E\xi$ a.s. when $\xi \perp\!\!\!\perp \mathcal{F}$. We shall often refer to (i) as the
averaging property, to (ii) as the positivity, to (iii) as the $L^1$-contractivity, to
(iv) as the monotone convergence property, to (v) as the pull-out property,
to (vi) as the self-adjointness, and to (vii) as the chain rule. Since the
operator $E^{\mathcal{F}}$ is both self-adjoint by (vi) and idempotent by (vii), it may be
thought of as a generalized projection on $L^1$.
The existence of $E^{\mathcal{F}}$ is an immediate consequence of the Radon-Nikodym
Theorem 2.10. However, we prefer the following elementary construction
from the $L^2$-version.
Proof of Theorem 6.1: First assume that $\xi \in L^2$, and define $E^{\mathcal{F}}\xi$ by
projection as above. For any $A \in \mathcal{F}$ we get $\xi - E^{\mathcal{F}}\xi \perp 1_A$, and (i) follows.
Taking $A = \{E^{\mathcal{F}}\xi > 0\}$, we get in particular
$$E|E^{\mathcal{F}}\xi| = E[E^{\mathcal{F}}\xi; A] - E[E^{\mathcal{F}}\xi; A^c] = E[\xi; A] - E[\xi; A^c] \le E|\xi|,$$
which proves (iii). Thus, the mapping $E^{\mathcal{F}}$ is uniformly $L^1$-continuous on
$L^2$. Also note that $L^2$ is dense in $L^1$ by Lemma 1.11 and that $L^1$ is complete
by Lemma 1.31. Hence, $E^{\mathcal{F}}$ extends a.s. uniquely to a linear and continuous
mapping on $L^1$.
Properties (i) and (iii) extend by continuity to $L^1$, and from Lemma
1.24 we note that $E^{\mathcal{F}}\xi$ is a.s. determined by (i). If $\xi \ge 0$, we see from (i)
with $A = \{E^{\mathcal{F}}\xi < 0\}$ together with Lemma 1.24 that $E^{\mathcal{F}}\xi \ge 0$ a.s., which
proves (ii). If $0 \le \xi_n \uparrow \xi$, then $\xi_n \to \xi$ in $L^1$ by dominated convergence,
so by (iii) we get $E^{\mathcal{F}}\xi_n \to E^{\mathcal{F}}\xi$ in $L^1$. Now the sequence $(E^{\mathcal{F}}\xi_n)$ is a.s.
nondecreasing by (ii), and so by Lemma 4.2 the convergence remains true
in the a.s. sense. This proves (iv).
Property (vi) is obvious when $\xi, \eta \in L^2$, and it extends to the general
case by means of (iv). To prove (v), we note from the characterization in
(i) that $E^{\mathcal{F}}\xi = \xi$ a.s. when $\xi$ is $\mathcal{F}$-measurable. In the general case we need
to show that
$$E[\xi\eta; A] = E[\xi E^{\mathcal{F}}\eta; A], \quad A \in \mathcal{F},$$
which follows immediately from (vi). Finally, property (vii) is obvious for
$\xi \in L^2$ since $L^2(\mathcal{F}) \subset L^2(\mathcal{G})$, and it extends to the general case by means
of (iv). □
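On a finite sample space, where the conditioning σ-field is generated by a partition into sets of positive probability, the conditional expectation reduces to block averages, and properties (i), (iii), (vi), and (vii) of Theorem 6.1 can be checked by exact arithmetic. The following is a minimal sketch under that assumption; the six-point space, the two partitions, and the names `cond_exp` and `expect` are illustrative choices that do not appear in the text.

```python
from fractions import Fraction


def cond_exp(xi, partition, p):
    """E[xi | F] on a finite space, F generated by `partition` (blocks of positive mass)."""
    out = [None] * len(p)
    for block in partition:
        mass = sum(p[w] for w in block)
        avg = sum(xi[w] * p[w] for w in block) / mass  # E[xi; A_k] / P(A_k)
        for w in block:
            out[w] = avg
    return out


def expect(xi, p, event=None):
    """E[xi; event]; the event defaults to the whole space."""
    idx = range(len(p)) if event is None else event
    return sum(xi[w] * p[w] for w in idx)


p = [Fraction(1, 6)] * 6             # uniform measure on {0, ..., 5} (assumed)
fine = [[0, 1], [2, 3], [4, 5]]      # generates the larger sigma-field G
coarse = [[0, 1, 2, 3], [4, 5]]      # generates F, contained in G
xi = [Fraction(v) for v in (3, -1, 4, 1, -5, 9)]
eta = [Fraction(v) for v in (2, 7, 1, 8, 2, 8)]

ef_xi = cond_exp(xi, coarse, p)
ef_eta = cond_exp(eta, coarse, p)

# (i) averaging property on the generating blocks of F
averaging = all(expect(ef_xi, p, A) == expect(xi, p, A) for A in coarse)
# (iii) L^1-contractivity
contractive = expect([abs(v) for v in ef_xi], p) <= expect([abs(v) for v in xi], p)
# (vi) self-adjointness
self_adjoint = (expect([a * b for a, b in zip(xi, ef_eta)], p)
                == expect([a * b for a, b in zip(eta, ef_xi)], p))
# (vii) chain rule: conditioning on G and then on F equals conditioning on F
chain_rule = cond_exp(cond_exp(xi, fine, p), coarse, p) == ef_xi
```

Exact rational arithmetic is used so that the identities hold as equalities rather than up to rounding error.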
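The next result shows that the conditional expectation $E^{\mathcal{F}}\xi$ is local in
both $\xi$ and $\mathcal{F}$, an observation that simplifies many proofs. Given two σ-fields
$\mathcal{F}$ and $\mathcal{G}$, we say that $\mathcal{F} = \mathcal{G}$ on $A$ if $A \in \mathcal{F} \cap \mathcal{G}$ and $A \cap \mathcal{F} = A \cap \mathcal{G}$.
Lemma 6.2 (local property) Let the σ-fields $\mathcal{F}, \mathcal{G} \subset \mathcal{A}$ and functions
$\xi, \eta \in L^1$ be such that $\mathcal{F} = \mathcal{G}$ and $\xi = \eta$ a.s. on some set $A \in \mathcal{F} \cap \mathcal{G}$.
Then $E^{\mathcal{F}}\xi = E^{\mathcal{G}}\eta$ a.s. on $A$.
Proof: Since $1_A E^{\mathcal{F}}\xi$ and $1_A E^{\mathcal{G}}\eta$ are $\mathcal{F} \cap \mathcal{G}$-measurable, we get
$B \equiv A \cap \{E^{\mathcal{F}}\xi > E^{\mathcal{G}}\eta\} \in \mathcal{F} \cap \mathcal{G}$, and the averaging property yields
$$E[E^{\mathcal{F}}\xi; B] = E[\xi; B] = E[\eta; B] = E[E^{\mathcal{G}}\eta; B].$$
Hence, $E^{\mathcal{F}}\xi \le E^{\mathcal{G}}\eta$ a.s. on $A$ by Lemma 1.24. The opposite inequality is
obtained by interchanging the roles of $(\xi, \mathcal{F})$ and $(\eta, \mathcal{G})$. □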
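The conditional probability of an event $A \in \mathcal{A}$, given a σ-field $\mathcal{F}$, is defined
as
$$P^{\mathcal{F}}A = E^{\mathcal{F}}1_A \quad\text{or}\quad P[A \mid \mathcal{F}] = E[1_A \mid \mathcal{F}], \quad A \in \mathcal{A}.$$
Thus, $P^{\mathcal{F}}A$ is the a.s. unique random variable in $L^1(\mathcal{F})$ satisfying
$$E[P^{\mathcal{F}}A; B] = P(A \cap B), \quad B \in \mathcal{F}.$$
Note that $P^{\mathcal{F}}A = PA$ a.s. iff $A \perp\!\!\!\perp \mathcal{F}$ and that $P^{\mathcal{F}}A = 1_A$ a.s. iff $A$ agrees
a.s. with a set in $\mathcal{F}$. The positivity of $E^{\mathcal{F}}$ implies $0 \le P^{\mathcal{F}}A \le 1$ a.s., and
the monotone convergence property gives
$$P^{\mathcal{F}}\bigcup_n A_n = \sum_n P^{\mathcal{F}}A_n \ \text{a.s.}, \quad A_1, A_2, \ldots \in \mathcal{A} \text{ disjoint}. \tag{1}$$
However, the random set function $P^{\mathcal{F}}$ is not a measure in general since the
exceptional null set in (1) may depend on the sequence $(A_n)$.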
If $\eta$ is a random element in some measurable space $(S, \mathcal{S})$, we define
conditioning on $\eta$ as conditioning with respect to the induced σ-field $\sigma(\eta)$.
Thus,
$$E^{\eta}\xi = E^{\sigma(\eta)}\xi, \qquad P^{\eta}A = P^{\sigma(\eta)}A,$$
or
$$E[\xi \mid \eta] = E[\xi \mid \sigma(\eta)], \qquad P[A \mid \eta] = P[A \mid \sigma(\eta)].$$
By Lemma 1.13, the $\eta$-measurable function $E^{\eta}\xi$ may be represented in the
form $f(\eta)$, where $f$ is a measurable function on $S$, determined a.e. $\mathcal{L}(\eta)$ by
the averaging property
$$E[f(\eta); \eta \in B] = E[\xi; \eta \in B], \quad B \in \mathcal{S}.$$
In particular, the function $f$ depends only on the distribution of $(\xi, \eta)$. The
situation for $P^{\eta}A$ is similar. Conditioning with respect to a σ-field $\mathcal{F}$ is the
special case when $\eta$ is the identity map from $(\Omega, \mathcal{A})$ to $(\Omega, \mathcal{F})$.
Motivated by (1), we proceed to examine the existence of measure-valued
versions of the functions $P^{\mathcal{F}}$ and $P^{\eta}$. Then recall from Chapter 1 that
a kernel between two measurable spaces $(T, \mathcal{T})$ and $(S, \mathcal{S})$ is a function
$\mu : T \times \mathcal{S} \to \overline{\mathbb{R}}_+$ such that $\mu(t, B)$ is $\mathcal{T}$-measurable in $t \in T$ for fixed $B \in \mathcal{S}$
and a measure in $B \in \mathcal{S}$ for fixed $t \in T$. Say that $\mu$ is a probability kernel
if $\mu(t, S) = 1$ for all $t$. Kernels on the basic probability space $\Omega$ are called
random measures.
Now fix a σ-field $\mathcal{F} \subset \mathcal{A}$ and a random element $\xi$ in some measurable
space $(S, \mathcal{S})$. By a regular conditional distribution of $\xi$, given $\mathcal{F}$, we mean
a version of the function $P[\xi \in \cdot \mid \mathcal{F}]$ on $\Omega \times \mathcal{S}$ which is a probability kernel
from $(\Omega, \mathcal{F})$ to $(S, \mathcal{S})$, hence an $\mathcal{F}$-measurable random probability measure
on $S$. More generally, if $\eta$ is another random element in some measurable
space $(T, \mathcal{T})$, a regular conditional distribution of $\xi$, given $\eta$, is defined as
a random measure of the form
$$\mu(\eta, B) = P[\xi \in B \mid \eta] \ \text{a.s.}, \quad B \in \mathcal{S}, \tag{2}$$
where $\mu$ is a probability kernel from $T$ to $S$. In the extreme cases when $\xi$
is $\mathcal{F}$-measurable or independent of $\mathcal{F}$, we note that $P[\xi \in B \mid \mathcal{F}]$ has the
regular version $1\{\xi \in B\}$ or $P\{\xi \in B\}$, respectively. The general case
requires some regularity conditions on the space $S$.
Theorem 6.3 (conditional distribution) For any Borel space $S$ and
measurable space $T$, let $\xi$ and $\eta$ be random elements in $S$ and $T$, respectively.
Then there exists a probability kernel $\mu$ from $T$ to $S$ satisfying
$P[\xi \in \cdot \mid \eta] = \mu(\eta, \cdot)$ a.s., and $\mu$ is unique a.e. $\mathcal{L}(\eta)$.
Proof: We may assume that $S \in \mathcal{B}(\mathbb{R})$. For every $r \in \mathbb{Q}$ we may choose
some measurable function $f_r = f(\cdot, r) : T \to [0, 1]$ such that
$$f(\eta, r) = P[\xi \le r \mid \eta] \ \text{a.s.}, \quad r \in \mathbb{Q}. \tag{3}$$
Let $A$ be the set of all $t \in T$ such that $f(t, r)$ is nondecreasing in $r \in \mathbb{Q}$ with
limits 1 and 0 at $\pm\infty$. Since $A$ is specified by countably many measurable
conditions, each of which holds a.s. at $\eta$, we have $A \in \mathcal{T}$ and $\eta \in A$ a.s.
Now define
$$F(t, x) = 1_A(t) \inf_{r > x} f(t, r) + 1_{A^c}(t)\, 1\{x \ge 0\}, \quad x \in \mathbb{R},\ t \in T,$$
and note that $F(t, \cdot)$ is a distribution function on $\mathbb{R}$ for every $t \in T$. Hence,
by Proposition 2.14 there exist some probability measures $m(t, \cdot)$ on $\mathbb{R}$ with
$$m(t, (-\infty, x]) = F(t, x), \quad x \in \mathbb{R},\ t \in T.$$
The function $F(t, x)$ is clearly measurable in $t$ for each $x$, and by a monotone
class argument it follows that $m$ is a kernel from $T$ to $\mathbb{R}$.
By (3) and the monotone convergence property of $E^{\eta}$, we have
$$m(\eta, (-\infty, x]) = F(\eta, x) = P[\xi \le x \mid \eta] \ \text{a.s.}, \quad x \in \mathbb{R}.$$
Using a monotone class argument based on the a.s. monotone convergence
property, we may extend the last relation to
$$m(\eta, B) = P[\xi \in B \mid \eta] \ \text{a.s.}, \quad B \in \mathcal{B}(\mathbb{R}). \tag{4}$$
In particular, we get $m(\eta, S^c) = 0$ a.s., and so (4) remains true on
$\mathcal{S} = \mathcal{B}(\mathbb{R}) \cap S$ with $m$ replaced by the kernel
$$\mu(t, \cdot) = m(t, \cdot)\, 1\{m(t, S) = 1\} + \delta_s\, 1\{m(t, S) < 1\}, \quad t \in T,$$
where $s \in S$ is arbitrary. If $\mu'$ is another kernel with the stated property,
then
$$\mu(\eta, (-\infty, r]) = P[\xi \le r \mid \eta] = \mu'(\eta, (-\infty, r]) \ \text{a.s.}, \quad r \in \mathbb{Q},$$
and a monotone class argument yields $\mu(\eta, \cdot) = \mu'(\eta, \cdot)$ a.s. □
Our next aim is to extend Fubini's theorem, by showing how ordinary
and conditional expectations can be computed by integration with respect
to suitable conditional distributions. The result may be regarded as a
disintegration of measures on a product space into their one-dimensional
components.
Theorem 6.4 (disintegration) Fix two measurable spaces $S$ and $T$, a
σ-field $\mathcal{F} \subset \mathcal{A}$, and a random element $\xi$ in $S$ such that $P[\xi \in \cdot \mid \mathcal{F}]$ has a
regular version $\nu$. Further consider an $\mathcal{F}$-measurable random element $\eta$ in
$T$ and a measurable function $f$ on $S \times T$ with $E|f(\xi, \eta)| < \infty$. Then
$$E[f(\xi, \eta) \mid \mathcal{F}] = \int \nu(ds)\, f(s, \eta) \ \text{a.s.} \tag{5}$$
The a.s. existence and $\mathcal{F}$-measurability of the integral on the right should
be regarded as part of the assertion. In the special case when $\mathcal{F} = \sigma(\eta)$ and
$P[\xi \in \cdot \mid \eta] = \mu(\eta, \cdot)$ for some probability kernel $\mu$ from $T$ to $S$, (5) becomes
$$E[f(\xi, \eta) \mid \eta] = \int \mu(\eta, ds)\, f(s, \eta) \ \text{a.s.} \tag{6}$$
Integrating (5) and (6), we get the commonly used formulas
$$E f(\xi, \eta) = E \int \nu(ds)\, f(s, \eta) = E \int \mu(\eta, ds)\, f(s, \eta). \tag{7}$$
If $\xi \perp\!\!\!\perp \eta$, we may take $\mu(\eta, \cdot) = \mathcal{L}(\xi)$, and (7) reduces to the relation in
Lemma 3.11.
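When both random elements take finitely many values, the kernel in (6) is just the table of conditional probabilities, and formula (7) becomes an identity between a double sum and an iterated sum. The sketch below checks this on a small joint distribution; the numerical values, the test function `f`, and the helper name `disintegrate` are assumptions made for illustration only.

```python
from fractions import Fraction
from collections import defaultdict

# Assumed joint law of (xi, eta) on S x T with S = {0, 1, 2} and T = {"a", "b"}.
joint = {
    (0, "a"): Fraction(1, 10), (1, "a"): Fraction(2, 10), (2, "a"): Fraction(1, 10),
    (0, "b"): Fraction(3, 10), (1, "b"): Fraction(1, 10), (2, "b"): Fraction(2, 10),
}


def disintegrate(joint):
    """Return (law of eta, kernel mu) with joint[(s, t)] == law_eta[t] * mu[t][s]."""
    law_eta = defaultdict(Fraction)
    for (s, t), w in joint.items():
        law_eta[t] += w
    mu = {t: {} for t in law_eta}
    for (s, t), w in joint.items():
        mu[t][s] = w / law_eta[t]   # conditional probability of xi = s given eta = t
    return dict(law_eta), mu


def f(s, t):
    """An arbitrary test function on S x T."""
    return s * s + (3 if t == "a" else -1)


law_eta, mu = disintegrate(joint)

# Left side of (7): E f(xi, eta) computed from the joint law.
direct = sum(w * f(s, t) for (s, t), w in joint.items())
# Right side of (7): integrate f(., t) against mu(t, .), then average over eta.
iterated = sum(law_eta[t] * sum(mu[t][s] * f(s, t) for s in mu[t]) for t in law_eta)
```

Here `direct` and `iterated` agree exactly, and each `mu[t]` sums to one, so `mu` is a probability kernel from T to S in the sense recalled above.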
Proof of Theorem 6.4: If $B \in \mathcal{S}$ and $C \in \mathcal{T}$, we may use the averaging
property of conditional expectations to get
$$P\{\xi \in B, \eta \in C\} = E[P[\xi \in B \mid \mathcal{F}]; \eta \in C] = E[\nu B; \eta \in C]
= E \int \nu(ds)\, 1\{s \in B, \eta \in C\},$$
which proves the first relation in (7) for $f = 1_{B \times C}$. The formula extends,
along with the measurability of the inner integral on the right, first by a
monotone class argument to all measurable indicator functions, and then
by linearity and monotone convergence to any measurable function $f \ge 0$.
Now fix a measurable function $f : S \times T \to \mathbb{R}_+$ with $E f(\xi, \eta) < \infty$,
and let $A \in \mathcal{F}$ be arbitrary. Regarding $(\eta, 1_A)$ as an $\mathcal{F}$-measurable random
element in $T \times \{0, 1\}$, we may conclude from (7) that
$$E[f(\xi, \eta); A] = E \int \nu(ds)\, f(s, \eta)\, 1_A, \quad A \in \mathcal{F}.$$
This proves (5) for $f \ge 0$, and the general result follows by taking
differences. □
Applying (7) to functions of the form $f(\xi)$, we may extend many
properties of ordinary expectations to a conditional setting. In particular, such
extensions hold for the Jensen, Hölder, and Minkowski inequalities. The
first of those implies the $L^p$-contractivity
$$\|E^{\mathcal{F}}\xi\|_p \le \|\xi\|_p, \quad \xi \in L^p,\ p \ge 1.$$
Considering conditional distributions of entire sequences $(\xi, \xi_1, \xi_2, \ldots)$, we
may further derive conditional versions of the basic continuity properties
of ordinary integrals.
The following result plays an important role in Chapter 7.
Lemma 6.5 (uniform integrability, Doob) For any $\xi \in L^1$, the conditional
expectations $E[\xi \mid \mathcal{F}]$, $\mathcal{F} \subset \mathcal{A}$, are uniformly integrable.
Proof: By Jensen's inequality and the self-adjointness property,
$$E[|E^{\mathcal{F}}\xi|; A] \le E[E^{\mathcal{F}}|\xi|; A] = E[|\xi| P^{\mathcal{F}}A], \quad A \in \mathcal{A},$$
and by Lemma 4.10 we need to show that this tends to zero as $PA \to 0$,
uniformly in $\mathcal{F}$. By dominated convergence along subsequences, it is then
enough to show that $P^{\mathcal{F}_n}A_n \overset{P}{\to} 0$ for any σ-fields $\mathcal{F}_n \subset \mathcal{A}$ and sets $A_n \in \mathcal{A}$
with $PA_n \to 0$. But this is clear, since $E P^{\mathcal{F}_n}A_n = PA_n \to 0$. □
Turning to the topic of conditional independence, consider any
sub-σ-fields $\mathcal{F}_1, \ldots, \mathcal{F}_n, \mathcal{G} \subset \mathcal{A}$. Imitating the definition of ordinary independence,
we say that $\mathcal{F}_1, \ldots, \mathcal{F}_n$ are conditionally independent, given $\mathcal{G}$, if
$$P^{\mathcal{G}}\bigcap_{k \le n} B_k = \prod_{k \le n} P^{\mathcal{G}}B_k \ \text{a.s.}, \quad B_k \in \mathcal{F}_k,\ k = 1, \ldots, n.$$
For infinite collections of σ-fields $\mathcal{F}_t$, $t \in T$, the same property is required for
every finite subcollection $\mathcal{F}_{t_1}, \ldots, \mathcal{F}_{t_n}$ with distinct indices $t_1, \ldots, t_n \in T$.
We use the symbol $\perp\!\!\!\perp_{\mathcal{G}}$ to denote pairwise conditional independence, given
some σ-field $\mathcal{G}$. Conditional independence involving events $A_t$ or random
elements $\xi_t$, $t \in T$, is defined as before in terms of the induced σ-fields
$\sigma(A_t)$ or $\sigma(\xi_t)$, respectively, and the notation involving $\perp\!\!\!\perp$ carries over to
this case.
In particular, we note that any $\mathcal{F}$-measurable random elements $\xi_t$ are
conditionally independent, given $\mathcal{F}$. If the $\xi_t$ are instead independent of
$\mathcal{F}$, then their conditional independence, given $\mathcal{F}$, is equivalent to ordinary
independence between the $\xi_t$. By Theorem 6.3, any general statement or
formula involving independencies between countably many random elements
in some Borel spaces has a conditional counterpart. For example, we see
from Lemma 3.8 that the σ-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ are conditionally independent,
given some $\mathcal{G}$, iff
$$(\mathcal{F}_1, \ldots, \mathcal{F}_n) \perp\!\!\!\perp_{\mathcal{G}} \mathcal{F}_{n+1}, \quad n \in \mathbb{N}.$$
Much more can be said in the conditional case, and we begin with a
fundamental characterization. Here and below, $\mathcal{F}, \mathcal{G}, \ldots$ with or without
subscripts denote sub-σ-fields of $\mathcal{A}$.
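Proposition 6.6 (conditional independence, Doob) For any σ-fields $\mathcal{F}$,
$\mathcal{G}$, and $\mathcal{H}$, we have $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{H}$ iff
$$P[H \mid \mathcal{F}, \mathcal{G}] = P[H \mid \mathcal{G}] \ \text{a.s.}, \quad H \in \mathcal{H}. \tag{8}$$
Proof: Assuming (8) and using the chain and pull-out properties of
conditional expectations, we get for any $F \in \mathcal{F}$ and $H \in \mathcal{H}$
$$P^{\mathcal{G}}(F \cap H) = E^{\mathcal{G}}P^{\mathcal{F} \vee \mathcal{G}}(F \cap H) = E^{\mathcal{G}}[P^{\mathcal{F} \vee \mathcal{G}}H; F]
= E^{\mathcal{G}}[P^{\mathcal{G}}H; F] = (P^{\mathcal{G}}F)(P^{\mathcal{G}}H),$$
which shows that $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{H}$. Conversely, assuming $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{H}$ and using the
chain and pull-out properties, we get for any $F \in \mathcal{F}$, $G \in \mathcal{G}$, and $H \in \mathcal{H}$
$$E[P^{\mathcal{G}}H; F \cap G] = E[(P^{\mathcal{G}}F)(P^{\mathcal{G}}H); G]
= E[P^{\mathcal{G}}(F \cap H); G] = P(F \cap G \cap H).$$
By a monotone class argument, this extends to
$$E[P^{\mathcal{G}}H; A] = P(H \cap A), \quad A \in \mathcal{F} \vee \mathcal{G},$$
and (8) follows by the averaging characterization of $P^{\mathcal{F} \vee \mathcal{G}}H$. □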
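For random elements with finitely many values, both the defining product formula for conditional independence and Doob's criterion (8) can be verified atom by atom. The sketch below does so for three successive states of a two-state Markov chain, where the first and last states should be conditionally independent given the middle one. The initial law `p0`, the transition table `K`, and the function names are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed two-state chain: joint law p(a, b, c) = p0(a) K(a, b) K(b, c).
p0 = {0: Fraction(1, 4), 1: Fraction(3, 4)}
K = {0: {0: Fraction(2, 3), 1: Fraction(1, 3)},
     1: {0: Fraction(1, 5), 1: Fraction(4, 5)}}
joint = {(a, b, c): p0[a] * K[a][b] * K[b][c]
         for a, b, c in product((0, 1), repeat=3)}


def prob(pred):
    """Probability of the event {pred(X0, X1, X2)} under the joint law."""
    return sum(w for k, w in joint.items() if pred(*k))


def doob_criterion_holds():
    """Check (8): P[X2 = c | X0, X1] = P[X2 = c | X1] on every atom."""
    for a, b, c in joint:
        lhs = (prob(lambda x, y, z: (x, y, z) == (a, b, c))
               / prob(lambda x, y, z: (x, y) == (a, b)))
        rhs = (prob(lambda x, y, z: (y, z) == (b, c))
               / prob(lambda x, y, z: y == b))
        if lhs != rhs:
            return False
    return True


def product_formula_holds():
    """Check P[X0 = a, X2 = c | X1] = P[X0 = a | X1] P[X2 = c | X1] on every atom."""
    for a, b, c in joint:
        pb = prob(lambda x, y, z: y == b)
        lhs = prob(lambda x, y, z: (x, y, z) == (a, b, c)) / pb
        rhs = (prob(lambda x, y, z: (x, y) == (a, b)) / pb
               * prob(lambda x, y, z: (y, z) == (b, c)) / pb)
        if lhs != rhs:
            return False
    return True
```

Checking atoms suffices here because, on a finite space, the atoms generate the relevant σ-fields and both sides are additive in each event.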
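From the last result we may easily deduce some further useful
properties. Let $\overline{\mathcal{G}}$ denote the completion of $\mathcal{G}$ with respect to the basic σ-field $\mathcal{A}$,
generated by $\mathcal{G}$ and the family $\mathcal{N} = \{N \subset A;\ A \in \mathcal{A},\ PA = 0\}$.
Corollary 6.7 For any σ-fields $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{H}$, we have
(i) $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{H}$ iff $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} (\mathcal{G}, \mathcal{H})$;
(ii) $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{F}$ iff $\mathcal{F} \subset \overline{\mathcal{G}}$.
Proof: (i) By Proposition 6.6, both relations are equivalent to
$$P[F \mid \mathcal{G}, \mathcal{H}] = P[F \mid \mathcal{G}] \ \text{a.s.}, \quad F \in \mathcal{F}.$$
(ii) If $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{F}$, then by Proposition 6.6
$$1_F = P[F \mid \mathcal{F}, \mathcal{G}] = P[F \mid \mathcal{G}] \ \text{a.s.}, \quad F \in \mathcal{F},$$
which implies $\mathcal{F} \subset \overline{\mathcal{G}}$. Conversely, the latter relation yields
$$P[F \mid \mathcal{G}] = P[F \mid \overline{\mathcal{G}}] = 1_F = P[F \mid \mathcal{F}, \mathcal{G}] \ \text{a.s.}, \quad F \in \mathcal{F},$$
and so $\mathcal{F} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{F}$ by Proposition 6.6. □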
The following result is often applied in both directions.
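Proposition 6.8 (chain rule) For any σ-fields $\mathcal{G}$, $\mathcal{H}$, and $\mathcal{F}_1, \mathcal{F}_2, \ldots$,
these conditions are equivalent:
(i) $\mathcal{H} \perp\!\!\!\perp_{\mathcal{G}} (\mathcal{F}_1, \mathcal{F}_2, \ldots)$;
(ii) $\mathcal{H} \perp\!\!\!\perp_{\mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_n} \mathcal{F}_{n+1}$, $n \ge 0$.
In particular, we have the commonly used equivalence
$$\mathcal{H} \perp\!\!\!\perp_{\mathcal{G}} (\mathcal{F}, \mathcal{F}') \iff \mathcal{H} \perp\!\!\!\perp_{\mathcal{G}} \mathcal{F}, \quad \mathcal{H} \perp\!\!\!\perp_{\mathcal{G}, \mathcal{F}} \mathcal{F}'.$$
Proof: Assuming (i), we get by Proposition 6.6 for any $H \in \mathcal{H}$ and $n \ge 0$
$$P[H \mid \mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_n] = P[H \mid \mathcal{G}] = P[H \mid \mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_{n+1}],$$
and (ii) follows by another application of Proposition 6.6.
Now assume (ii) instead, and conclude by Proposition 6.6 that for any
$H \in \mathcal{H}$
$$P[H \mid \mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_n] = P[H \mid \mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_{n+1}], \quad n \ge 0.$$
Summing over $n < m$ gives
$$P[H \mid \mathcal{G}] = P[H \mid \mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_m], \quad m \ge 1,$$
and so by Proposition 6.6 we have $\mathcal{H} \perp\!\!\!\perp_{\mathcal{G}} (\mathcal{F}_1, \ldots, \mathcal{F}_m)$ for all $m \ge 1$, which
extends to (i) by a monotone class argument. □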
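The last result is even useful for establishing ordinary independence. In
fact, taking $\mathcal{G} = \{\emptyset, \Omega\}$ in Proposition 6.8, we see that $\mathcal{H} \perp\!\!\!\perp (\mathcal{F}_1, \mathcal{F}_2, \ldots)$ iff
$$\mathcal{H} \perp\!\!\!\perp_{\mathcal{F}_1, \ldots, \mathcal{F}_n} \mathcal{F}_{n+1}, \quad n \ge 0.$$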
Our next aim is to show how regular conditional distributions can be used
to construct random elements with desired properties. This may require an
extension of the basic probability space. By an extension of $(\Omega, \mathcal{A}, P)$ we
mean a product space $(\hat\Omega, \hat{\mathcal{A}}) = (\Omega \times S, \mathcal{A} \otimes \mathcal{S})$, equipped with a probability
measure $\hat P$ satisfying $\hat P(\cdot \times S) = P$. Any random element $\xi$ on $\Omega$ may
be regarded as a function on $\hat\Omega$. Thus, we may formally replace $\xi$ by the
random element $\hat\xi(\omega, s) = \xi(\omega)$, which clearly has the same distribution.
For extensions of this type, we may retain our original notation and write
$P$ and $\xi$ instead of $\hat P$ and $\hat\xi$.
We begin with an elementary extension suggested by Theorem 6.4. The
result is needed for various constructions in Chapter 12.
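Lemma 6.9 (extension) Fix a probability kernel $\mu$ between two
measurable spaces $S$ and $T$, and let $\xi$ be a random element in $S$. Then there exists
a random element $\eta$ in $T$, defined on some extension of the original
probability space $\Omega$, such that $P[\eta \in \cdot \mid \xi] = \mu(\xi, \cdot)$ a.s. and also $\eta \perp\!\!\!\perp_{\xi} \zeta$ for every
random element $\zeta$ on $\Omega$.
Proof: Put $(\hat\Omega, \hat{\mathcal{A}}) = (\Omega \times T, \mathcal{A} \otimes \mathcal{T})$, where $\mathcal{T}$ denotes the σ-field in $T$,
and define a probability measure $\hat P$ on $\hat\Omega$ by
$$\hat P A = E \int 1_A(\cdot, t)\, \mu(\xi, dt), \quad A \in \hat{\mathcal{A}}.$$
Then clearly $\hat P(\cdot \times T) = P$, and the random element $\eta(\omega, t) = t$ on $\hat\Omega$ satisfies
$\hat P[\eta \in \cdot \mid \mathcal{A}] = \mu(\xi, \cdot)$ a.s. In particular, we get $\eta \perp\!\!\!\perp_{\xi} \mathcal{A}$ by Proposition 6.6,
and so $\eta \perp\!\!\!\perp_{\xi} \zeta$. □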
For most constructions we need only a single randomization variable.
By this we mean a $U(0, 1)$ random variable $\vartheta$ that is independent of all
previously introduced random elements and σ-fields. The basic probability
space is henceforth assumed to be rich enough to support any randomization
variables we may need. This involves no essential loss of generality,
since we can always get the condition fulfilled by a simple extension of the
original space. In fact, it suffices to take
$$\hat\Omega = \Omega \times [0, 1], \quad \hat{\mathcal{A}} = \mathcal{A} \otimes \mathcal{B}[0, 1], \quad \hat P = P \otimes \lambda,$$
where $\lambda$ denotes Lebesgue measure on $[0, 1]$. Then $\vartheta(\omega, t) = t$ is $U(0, 1)$ on
$\hat\Omega$ and $\vartheta \perp\!\!\!\perp \mathcal{A}$. By Lemma 3.21 we may use $\vartheta$ to produce a whole sequence
of independent randomization variables $\vartheta_1, \vartheta_2, \ldots$ if required.
The following basic result shows how a probabilistic structure can be car-
ried over from one context to another by means of a suitable randomization.
Constructions of this type are frequently employed in the sequel.
Theorem 6.10 (transfer) For any measurable space $S$ and Borel space
$T$, let $\xi \overset{d}{=} \tilde\xi$ and $\eta$ be random elements in $S$ and $T$, respectively. Then
there exists a random element $\tilde\eta$ in $T$ with $(\tilde\xi, \tilde\eta) \overset{d}{=} (\xi, \eta)$. More precisely,
there exists a measurable function $f : S \times [0, 1] \to T$ such that we may take
$\tilde\eta = f(\tilde\xi, \vartheta)$ whenever $\vartheta \perp\!\!\!\perp \tilde\xi$ is $U(0, 1)$.
Proof: By Theorem 6.3 there exists a probability kernel $\mu$ from $S$ to $T$
satisfying
$$\mu(\xi, B) = P[\eta \in B \mid \xi], \quad B \in \mathcal{B}[0, 1],$$
and by Lemma 3.22 we may choose a measurable function $f : S \times [0, 1] \to T$
such that $f(s, \vartheta)$ has distribution $\mu(s, \cdot)$ for every $s \in S$. Define $\tilde\eta = f(\tilde\xi, \vartheta)$.
Using Lemmas 1.22 and 3.11 together with Theorem 6.4, we get for any
measurable function $g : S \times [0, 1] \to \mathbb{R}_+$
$$E g(\tilde\xi, \tilde\eta) = E g(\xi, f(\xi, \vartheta)) = E \int g(\xi, f(\xi, u))\, du
= E \int g(\xi, t)\, \mu(\xi, dt) = E g(\xi, \eta),$$
which shows that $(\tilde\xi, \tilde\eta) \overset{d}{=} (\xi, \eta)$. □
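The function f in the transfer theorem can be made explicit when the target space is a finite ordered set: one may take f(s, u) to be the generalized inverse of the conditional distribution function of the kernel at s. The following sketch assumes such a finite kernel, stored as a dictionary; the name `make_transfer` and the sample kernel are illustrative, and this inverse-CDF choice is one possible construction rather than the one fixed by Lemma 3.22.

```python
from fractions import Fraction


def make_transfer(kernel):
    """Given kernel[s] = {t: mu(s, {t})} on a finite ordered T, return f(s, u)
    such that f(s, U) has law mu(s, .) when U is uniform on [0, 1]."""
    def f(s, u):
        acc = 0
        items = sorted(kernel[s].items())
        for t, w in items:
            acc += w              # acc is the conditional distribution function at t
            if u <= acc:
                return t          # smallest t with F_s(t) >= u
        return items[-1][0]       # guard for floating-point u slightly above the total
    return f


# Assumed kernel from S = {"x", "y"} to T = {0, 1, 2}.
kernel = {
    "x": {0: Fraction(1, 10), 1: Fraction(6, 10), 2: Fraction(3, 10)},
    "y": {0: Fraction(1, 2), 2: Fraction(1, 2)},
}
f = make_transfer(kernel)

# Exact check: on the grid u = k/N, the fraction of points mapped to t equals mu(s, {t}),
# because {u : f(s, u) = t} is an interval of length mu(s, {t}) with grid-aligned endpoints.
N = 10
counts = {s: {t: sum(f(s, Fraction(k, N)) == t for k in range(1, N + 1))
              for t in kernel[s]}
          for s in kernel}
matches = all(Fraction(counts[s][t], N) == kernel[s][t]
              for s in kernel for t in kernel[s])
```

Given any random element with values in S and an independent uniform variable, applying `f` to the pair then produces a random element whose conditional law is the prescribed kernel, which is the mechanism behind the theorem.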
The following version of the last result is often useful to transfer
representations of random objects.
Corollary 6.11 (stochastic equations) Fix two Borel spaces $S$ and $T$, a
measurable mapping $f : T \to S$, and some random elements $\xi$ in $S$ and $\eta$
in $T$ with $\xi \overset{d}{=} f(\eta)$. Then there exists a random element $\tilde\eta \overset{d}{=} \eta$ in $T$ with
$\xi = f(\tilde\eta)$ a.s.
Proof: By Theorem 6.10 there exists some random element $\tilde\eta$ in $T$ with
$(\xi, \tilde\eta) \overset{d}{=} (f(\eta), \eta)$. In particular, $\tilde\eta \overset{d}{=} \eta$ and $(\xi, f(\tilde\eta)) \overset{d}{=} (f(\eta), f(\eta))$. Since
the diagonal in $S^2$ is measurable, we get $P\{\xi = f(\tilde\eta)\} = P\{f(\eta) = f(\eta)\} = 1$,
and so $\xi = f(\tilde\eta)$ a.s. □
The last result leads in particular to a useful extension of Theorem 4.30.
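Corollary 6.12 (extended Skorohod coupling) Let $f, f_1, f_2, \ldots$ be
measurable functions from a Borel space $S$ to a Polish space $T$, and let $\xi, \xi_1, \xi_2, \ldots$
be random elements in $S$ with $f_n(\xi_n) \overset{d}{\to} f(\xi)$. Then there exist some
random elements $\tilde\xi \overset{d}{=} \xi$ and $\tilde\xi_n \overset{d}{=} \xi_n$ such that $f_n(\tilde\xi_n) \to f(\tilde\xi)$ a.s.
Proof: By Theorem 4.30 there exist some $\eta \overset{d}{=} f(\xi)$ and $\eta_n \overset{d}{=} f_n(\xi_n)$
with $\eta_n \to \eta$ a.s. By Corollary 6.11 we may further choose some $\tilde\xi \overset{d}{=} \xi$
and $\tilde\xi_n \overset{d}{=} \xi_n$ such that a.s. $f(\tilde\xi) = \eta$ and $f_n(\tilde\xi_n) = \eta_n$ for all $n$. But then
$f_n(\tilde\xi_n) \to f(\tilde\xi)$ a.s. □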
The next result clarifies the relationship between randomizations and
conditional independence. Important applications appear in Chapters 8,
12, and 21.
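Proposition 6.13 (conditional independence and randomization) Let $\xi$,
$\eta$, and $\zeta$ be random elements in some measurable spaces $S$, $T$, and $U$,
respectively, where $S$ is Borel. Then $\xi \perp\!\!\!\perp_{\eta} \zeta$ iff $\xi = f(\eta, \vartheta)$ a.s. for some
measurable function $f : T \times [0, 1] \to S$ and some $U(0, 1)$ random variable
$\vartheta \perp\!\!\!\perp (\eta, \zeta)$.
Proof: First assume that $\xi = f(\eta, \vartheta)$ a.s., where $f$ is measurable
and $\vartheta \perp\!\!\!\perp (\eta, \zeta)$. Then Proposition 6.8 yields $\vartheta \perp\!\!\!\perp_{\eta} \zeta$, and so $(\eta, \vartheta) \perp\!\!\!\perp_{\eta} \zeta$ by
Corollary 6.7, which implies $\xi \perp\!\!\!\perp_{\eta} \zeta$.
Conversely, assume that $\xi \perp\!\!\!\perp_{\eta} \zeta$, and let $\vartheta \perp\!\!\!\perp (\eta, \zeta)$ be $U(0, 1)$. By Theorem
6.10 there exists some measurable function $f : T \times [0, 1] \to S$ such that the
random element $\tilde\xi = f(\eta, \vartheta)$ satisfies $\tilde\xi \overset{d}{=} \xi$ and $(\tilde\xi, \eta) \overset{d}{=} (\xi, \eta)$. By the
sufficiency part, we further note that $\tilde\xi \perp\!\!\!\perp_{\eta} \zeta$. Hence, by Proposition 6.6,
$$P[\tilde\xi \in \cdot \mid \eta, \zeta] = P[\tilde\xi \in \cdot \mid \eta] = P[\xi \in \cdot \mid \eta] = P[\xi \in \cdot \mid \eta, \zeta],$$
and so $(\tilde\xi, \eta, \zeta) \overset{d}{=} (\xi, \eta, \zeta)$. By Theorem 6.10 we may choose some $\tilde\vartheta \overset{d}{=} \vartheta$
with $(\xi, \eta, \zeta, \tilde\vartheta) \overset{d}{=} (\tilde\xi, \eta, \zeta, \vartheta)$. In particular, $\tilde\vartheta \perp\!\!\!\perp (\eta, \zeta)$ and
$(\xi, f(\eta, \tilde\vartheta)) \overset{d}{=} (\tilde\xi, f(\eta, \vartheta))$. Since $\tilde\xi = f(\eta, \vartheta)$ and the diagonal in $S^2$ is measurable, we get
$\xi = f(\eta, \tilde\vartheta)$ a.s., and so the stated condition holds with $\tilde\vartheta$ in place of $\vartheta$. □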
We may use the transfer theorem to construct random sequences or
processes with given finite-dimensional distributions. Given any measurable
spaces $S_1, S_2, \ldots$, we say that a sequence of probability measures $\mu_n$ on
$S_1 \times \cdots \times S_n$, $n \in \mathbb{N}$, is projective if
$$\mu_{n+1}(\cdot \times S_{n+1}) = \mu_n, \quad n \in \mathbb{N}. \tag{9}$$
Theorem 6.14 (existence of random sequences, Daniell) Given a
projective sequence of probability measures $\mu_n$ on $S_1 \times \cdots \times S_n$, $n \in \mathbb{N}$, where
$S_2, S_3, \ldots$ are Borel, there exist some random elements $\xi_n$ in $S_n$, $n \in \mathbb{N}$,
such that $\mathcal{L}(\xi_1, \ldots, \xi_n) = \mu_n$ for all $n$.
Proof: By Lemmas 3.10 and 3.21 there exist some independent random
variables $\xi_1, \vartheta_2, \vartheta_3, \ldots$ such that $\mathcal{L}(\xi_1) = \mu_1$ and the $\vartheta_n$ are i.i.d. $U(0, 1)$.
We proceed to construct recursively $\xi_2, \xi_3, \ldots$ with the desired properties
such that each $\xi_n$ is a measurable function of $\xi_1, \vartheta_2, \ldots, \vartheta_n$. Assuming
that $\xi_1, \ldots, \xi_n$ have already been constructed, let $\eta_1, \ldots, \eta_{n+1}$ be arbitrary
with joint distribution $\mu_{n+1}$. The projective property yields
$(\xi_1, \ldots, \xi_n) \overset{d}{=} (\eta_1, \ldots, \eta_n)$, and so by Theorem 6.10 we may form $\xi_{n+1}$ as a measurable
function of $\xi_1, \ldots, \xi_n, \vartheta_{n+1}$ such that $(\xi_1, \ldots, \xi_{n+1}) \overset{d}{=} (\eta_1, \ldots, \eta_{n+1})$. This
completes the recursion. □
The last theorem may be used to extend a process from bounded to
unbounded domains. We state the result in an abstract form, designed to
fulfill the needs of Chapters 18 and 24. Let I denote the identity mapping
on any space.
Corollary 6.15 (projective limit) For any Borel spaces $S, S_1, S_2, \ldots$,
consider some measurable mappings $\pi_n : S \to S_n$ and $\pi_k^n : S_n \to S_k$,
$k \le n$, such that
$$\pi_k^n = \pi_k^m \circ \pi_m^n, \quad k \le m \le n. \tag{10}$$
Let $\bar S$ denote the set of sequences $(s_1, s_2, \ldots) \in S_1 \times S_2 \times \cdots$ with $\pi_k^n s_n = s_k$
for all $k \le n$, and suppose there exists a measurable mapping $h : \bar S \to S$
satisfying $(\pi_1, \pi_2, \ldots) \circ h = I$ on $\bar S$. Then for any probability measures $\mu_n$
on $S_n$ with $\mu_n \circ (\pi_k^n)^{-1} = \mu_k$ for all $k \le n$, there exists a probability measure
$\mu$ on $S$ such that $\mu \circ \pi_n^{-1} = \mu_n$ for all $n$.
Proof: Introduce the measures
$$\bar\mu_n = \mu_n \circ (\pi_1^n, \ldots, \pi_n^n)^{-1}, \quad n \in \mathbb{N}, \tag{11}$$
and conclude from (10) and the relation between the $\mu_n$ that
$$\begin{aligned}
\bar\mu_{n+1}(\cdot \times S_{n+1}) &= \mu_{n+1} \circ (\pi_1^{n+1}, \ldots, \pi_n^{n+1})^{-1} \\
&= \mu_{n+1} \circ (\pi_n^{n+1})^{-1} \circ (\pi_1^n, \ldots, \pi_n^n)^{-1} \\
&= \mu_n \circ (\pi_1^n, \ldots, \pi_n^n)^{-1} = \bar\mu_n.
\end{aligned}$$
By Theorem 6.14 there exists some measure $\bar\mu$ on $S_1 \times S_2 \times \cdots$ with
$$\bar\mu \circ (\bar\pi_1, \ldots, \bar\pi_n)^{-1} = \bar\mu_n, \quad n \in \mathbb{N}, \tag{12}$$
where $\bar\pi_1, \bar\pi_2, \ldots$ denote the coordinate projections in $S_1 \times S_2 \times \cdots$. From
(10)-(12) we see that $\bar\mu$ is restricted to $\bar S$, which allows us to define
$\mu = \bar\mu \circ h^{-1}$. It remains to note that
$$\mu \circ \pi_n^{-1} = \bar\mu \circ (\pi_n h)^{-1} = \bar\mu \circ \bar\pi_n^{-1} = \bar\mu_n \circ \bar\pi_n^{-1} = \mu_n \circ (\pi_n^n)^{-1} = \mu_n.$$
□
We often need a version of Theorem 6.14 for processes on an arbitrary
index set $T$. For any collection of spaces $S_t$, $t \in T$, define $S_I = \times_{t \in I} S_t$,
$I \subset T$. Similarly, if each $S_t$ is endowed with a σ-field $\mathcal{S}_t$, let $\mathcal{S}_I$ denote the
product σ-field $\bigotimes_{t \in I} \mathcal{S}_t$. Finally, if each $\xi_t$ is a random element in $S_t$, write
$\xi_I$ for the restriction of the process $(\xi_t)$ to the index set $I$.
Now let $\hat T$ and $\bar T$ denote the classes of finite and countable subsets of $T$,
respectively. A family of probability measures $\mu_I$, $I \in \hat T$ or $\bar T$, is said to be
projective if
$$\mu_J(\cdot \times S_{J \setminus I}) = \mu_I, \quad I \subset J \text{ in } \hat T \text{ or } \bar T. \tag{13}$$
Theorem 6.16 (existence of processes, Kolmogorov) For any set of Borel
spaces $S_t$, $t \in T$, consider a projective family of probability measures $\mu_I$ on
$S_I$, $I \in \hat T$. Then there exist some random elements $X_t$ in $S_t$, $t \in T$, such
that $\mathcal{L}(X_I) = \mu_I$ for all $I \in \hat T$.
Proof: Recall that the product σ-field $\mathcal{S}_T$ in $S_T$ is generated by all
coordinate projections $\pi_t$, $t \in T$, and hence consists of all countable cylinder
sets $B \times S_{T \setminus U}$, $B \in \mathcal{S}_U$, $U \in \bar T$. For each $U \in \bar T$, there exists by Theorem
6.14 some probability measure $\mu_U$ on $S_U$ satisfying
$$\mu_U(\cdot \times S_{U \setminus I}) = \mu_I, \quad I \in \hat U,$$
and by Proposition 3.2 the family $\mu_U$, $U \in \bar T$, is again projective. We may
then define a function $\mu : \mathcal{S}_T \to [0, 1]$ by
$$\mu(\cdot \times S_{T \setminus U}) = \mu_U, \quad U \in \bar T.$$
To check the countable additivity of $\mu$, consider any disjoint sets
$A_1, A_2, \ldots \in \mathcal{S}_T$. For each $n$ we have $A_n = B_n \times S_{T \setminus U_n}$ for some $U_n \in \bar T$
and $B_n \in \mathcal{S}_{U_n}$. Writing $U = \bigcup_n U_n$ and $C_n = B_n \times S_{U \setminus U_n}$, we get
$$\mu \bigcup_n A_n = \mu_U \bigcup_n C_n = \sum_n \mu_U C_n = \sum_n \mu A_n.$$
We may now define the process $X = (X_t)$ as the identity mapping on the
probability space $(S_T, \mathcal{S}_T, \mu)$. □
If the projective sequence in Theorem 6.14 is defined recursively in terms
of a sequence of conditional distributions, then no regularity condition is
needed on the state spaces. For a precise statement, define the product
$\mu \otimes \nu$ of two kernels $\mu$ and $\nu$ as in Chapter 1.
Theorem 6.17 (extension by conditioning, Ionescu Tulcea) For any
measurable spaces $(S_n, \mathcal{S}_n)$ and probability kernels $\mu_n$ from $S_1 \times \cdots \times S_{n-1}$ to
$S_n$, $n \in \mathbb{N}$, there exist some random elements $\xi_n$ in $S_n$, $n \in \mathbb{N}$, such that
$\mathcal{L}(\xi_1, \ldots, \xi_n) = \mu_1 \otimes \cdots \otimes \mu_n$ for all $n$.
Proof: Put $\mathcal{F}_n = \mathcal{S}_1 \otimes \cdots \otimes \mathcal{S}_n$ and $T_n = S_{n+1} \times S_{n+2} \times \cdots$, and note
that the class $\mathcal{C} = \bigcup_n (\mathcal{F}_n \times T_n)$ is a field in $T_0$ generating the σ-field $\mathcal{F}_\infty$.
Define an additive function $\mu$ on $\mathcal{C}$ by
$$\mu(A \times T_n) = (\mu_1 \otimes \cdots \otimes \mu_n) A, \quad A \in \mathcal{F}_n,\ n \in \mathbb{N}, \tag{14}$$
which is clearly independent of the representation $C = A \times T_n$. We need
to extend $\mu$ to a probability measure on $\mathcal{F}_\infty$. By Theorem 2.5, it is then
enough to show that $\mu$ is continuous at $\emptyset$.
For any sequence $C_1, C_2, \ldots \in \mathcal{C}$ with $C_n \downarrow \emptyset$, we need to show that
$\mu C_n \to 0$. Renumbering if necessary, we may assume for each $n$ that
$C_n = A_n \times T_n$ with $A_n \in \mathcal{F}_n$. Now define
$$f_k^n = (\mu_{k+1} \otimes \cdots \otimes \mu_n) 1_{A_n}, \quad k \le n, \tag{15}$$
with the understanding that $f_n^n = 1_{A_n}$ for $k = n$. By Lemma 1.41 (i) and
(iii), each $f_k^n$ is an $\mathcal{F}_k$-measurable function on $S_1 \times \cdots \times S_k$, and from (15)
we note that
$$f_k^n = \mu_{k+1} f_{k+1}^n, \quad 0 \le k < n. \tag{16}$$
Since $C_n \downarrow \emptyset$, the functions $f_k^n$ are nonincreasing in $n$ for fixed $k$, say with
limits $g_k$. By (16) and dominated convergence,
$$g_k = \mu_{k+1} g_{k+1}, \quad k \ge 0. \tag{17}$$
Combining (14) and (15), we get $\mu C_n = f_0^n \downarrow g_0$. If $g_0 > 0$, then by (17)
there exists some $s_1 \in S_1$ with $g_1(s_1) > 0$. Continuing recursively, we may
construct a sequence $\bar s = (s_1, s_2, \ldots) \in T_0$ such that $g_n(s_1, \ldots, s_n) > 0$ for
all $n$. Then
$$1_{C_n}(\bar s) = 1_{A_n}(s_1, \ldots, s_n) = f_n^n(s_1, \ldots, s_n) \ge g_n(s_1, \ldots, s_n) > 0,$$
and so $\bar s \in \bigcap_n C_n$, which contradicts the hypothesis $C_n \downarrow \emptyset$. Thus, $g_0 = 0$,
which means that $\mu C_n \to 0$. □
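For finite state spaces, the product of kernels in the Ionescu Tulcea theorem can be computed directly, and the resulting finite-dimensional laws can be checked to be projective in the sense of (9). The sketch below assumes kernels given as functions from a history tuple to a dictionary of next-step probabilities; the names `compose`, `marginal`, and `step`, and the particular two-state kernel, are hypothetical.

```python
from fractions import Fraction


def compose(kernels):
    """Law of (xi_1, ..., xi_n) under the product of the given kernels,
    returned as {path: probability}. Each kernel maps a history tuple
    (s_1, ..., s_k) to a dict {s_{k+1}: probability}."""
    law = {(): Fraction(1)}
    for kernel in kernels:
        law = {path + (s,): w * q
               for path, w in law.items()
               for s, q in kernel(path).items()}
    return law


def marginal(law, n):
    """Project a law on paths onto the first n coordinates."""
    out = {}
    for path, w in law.items():
        out[path[:n]] = out.get(path[:n], Fraction(0)) + w
    return out


def step(path):
    """Assumed kernel: start uniformly on {0, 1}, then repeat the last state
    with probability 2/3 and switch with probability 1/3."""
    if not path:
        return {0: Fraction(1, 2), 1: Fraction(1, 2)}
    last = path[-1]
    return {last: Fraction(2, 3), 1 - last: Fraction(1, 3)}


mu2 = compose([step] * 2)
mu3 = compose([step] * 3)
projective = marginal(mu3, 2) == mu2   # the finite analogue of relation (9)
```

Because each kernel has total mass one, summing out the last coordinate recovers the shorter product, which is why no regularity of the state spaces is needed for projectivity itself; the theorem's content is the extension to the infinite product.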
As a simple application, we may deduce the existence of independent
random elements with arbitrary distributions. The result extends the
elementary Theorem 3.19.
Corollary 6.18 (infinite product measures, Lomnicki and Ulam) For
any collection of probability spaces $(S_t, \mathcal{S}_t, \mu_t)$, $t \in T$, there exist some
independent random elements $\xi_t$ in $S_t$ with distributions $\mu_t$, $t \in T$.
Proof: For any countable subset $I \subset T$, the associated product measure
$\mu_I = \bigotimes_{t \in I} \mu_t$ exists by Theorem 6.17. Now proceed as in the proof of
Theorem 6.16. □
Exercises
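1. Show that $(\xi, \eta) \overset{d}{=} (\xi', \eta)$ iff $P[\xi \in B \mid \eta] = P[\xi' \in B \mid \eta]$ a.s. for any
measurable set $B$.
2. Show that $E^{\mathcal{F}}\xi = E^{\mathcal{G}}\xi$ a.s. for all $\xi \in L^1$ iff $\overline{\mathcal{F}} = \overline{\mathcal{G}}$.
3. Show that the averaging property implies the other properties of
conditional expectations listed in Theorem 6.1.
4. Let $0 \le \xi_n \uparrow \xi$ and $0 \le \eta \le \xi$, where $\xi_1, \xi_2, \ldots, \eta \in L^1$, and fix a σ-field
$\mathcal{F}$. Show that $E^{\mathcal{F}}\eta \le \sup_n E^{\mathcal{F}}\xi_n$. (Hint: Apply the monotone convergence
property to $E^{\mathcal{F}}(\xi_n \wedge \eta)$.)
5. For any $[0, \infty]$-valued random variable $\xi$, define $E^{\mathcal{F}}\xi = \sup_n E^{\mathcal{F}}(\xi \wedge n)$.
Show that this extension of $E^{\mathcal{F}}$ satisfies the monotone convergence
property. (Hint: Use the preceding result.)
6. Show that the above extension of $E^{\mathcal{F}}$ remains characterized by the
averaging property and that $E^{\mathcal{F}}\xi < \infty$ a.s. iff the measure $\xi \cdot P = E[\xi; \cdot]$ is
σ-finite on $\mathcal{F}$. Extend $E^{\mathcal{F}}\xi$ to any random variable $\xi$ such that the measure
$|\xi| \cdot P$ is σ-finite on $\mathcal{F}$.
7. Let $\xi_1, \xi_2, \ldots$ be $[0, \infty]$-valued random variables, and fix any σ-field $\mathcal{F}$.
Show that $\liminf_n E^{\mathcal{F}}\xi_n \ge E^{\mathcal{F}}\liminf_n \xi_n$ a.s.
8. Fix any σ-field $\mathcal{F}$, and let $\xi, \xi_1, \xi_2, \ldots$ be random variables with $\xi_n \to \xi$
and $E^{\mathcal{F}}\sup_n |\xi_n| < \infty$ a.s. Show that $E^{\mathcal{F}}\xi_n \to E^{\mathcal{F}}\xi$ a.s.
9. Let $\mathcal{F}$ be the σ-field generated by some partition $A_1, A_2, \ldots \in \mathcal{A}$ of
$\Omega$. Show for any $\xi \in L^1$ that $E[\xi \mid \mathcal{F}] = E[\xi \mid A_k] = E[\xi; A_k]/PA_k$ on $A_k$
whenever $PA_k > 0$.
10. For any σ-field $\mathcal{F}$, event $A$, and random variable $\xi \in L^1$, show that
$E[\xi \mid \mathcal{F}, 1_A] = E[\xi; A \mid \mathcal{F}]/P[A \mid \mathcal{F}]$ a.s. on $A$.
11. Let the random variables $\xi_1, \xi_2, \ldots \ge 0$ and σ-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be
such that $E[\xi_n \mid \mathcal{F}_n] \overset{P}{\to} 0$. Show that $\xi_n \overset{P}{\to} 0$. (Hint: Consider the random
variables $\xi_n \wedge 1$.)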
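12. Let $(\xi, \eta) \overset{d}{=} (\tilde\xi, \tilde\eta)$, where $\xi \in L^1$. Show that $E[\xi \mid \eta] \overset{d}{=} E[\tilde\xi \mid \tilde\eta]$. (Hint: If
$E[\xi \mid \eta] = f(\eta)$, then $E[\tilde\xi \mid \tilde\eta] = f(\tilde\eta)$ a.s.)
13. Let $(\xi, \eta)$ be a random vector in $\mathbb{R}^2$ with probability density $f$, put
$F(y) = \int f(x, y)\, dx$, and let $g(x, y) = f(x, y)/F(y)$. Show that
$P[\xi \in B \mid \eta] = \int_B g(x, \eta)\, dx$ a.s.
14. Use conditional distributions to deduce the monotone and dominated
convergence theorems for conditional expectations from the corresponding
unconditional results.
15. Assume that $E^{\mathcal{F}}\xi \overset{d}{=} \xi$ for some $\xi \in L^1$. Show that $\xi$ is a.s.
$\mathcal{F}$-measurable. (Hint: Choose a strictly convex function $f$ with $E f(\xi) < \infty$,
and apply the strict Jensen inequality to the conditional distributions.)
16. Assume that $(\xi, \eta) \overset{d}{=} (\xi, \zeta)$, where $\eta$ is $\zeta$-measurable. Show that $\xi \perp\!\!\!\perp_{\eta} \zeta$.
(Hint: Show as above that $P[\xi \in B \mid \eta] \overset{d}{=} P[\xi \in B \mid \zeta]$, and deduce the
corresponding a.s. equality.)
17. Let $\xi$ be a random element in some separable metric space $S$. Show
that $P[\xi \in \cdot \mid \mathcal{F}]$ is a.s. degenerate iff $\xi$ is a.s. $\mathcal{F}$-measurable. (Hint: Reduce
to the case when $P[\xi \in \cdot \mid \mathcal{F}]$ is degenerate everywhere and hence equal to
$\delta_\eta$ for some $\mathcal{F}$-measurable random element $\eta$ in $S$. Then show that $\xi = \eta$
a.s.)
18. Assuming $\xi \perp\!\!\!\perp_{\eta} \zeta$ and $\gamma \perp\!\!\!\perp (\xi, \eta, \zeta)$, show that $\xi \perp\!\!\!\perp_{\eta, \gamma} \zeta$ and $\xi \perp\!\!\!\perp_{\eta} (\zeta, \gamma)$.
19. Extend Lemma 3.6 to the context of conditional independence. Also
show that Corollary 3.7 and Lemma 3.8 remain valid for the conditional
independence, given some σ-field $\mathcal{H}$.
20. Fix any σ-field $\mathcal{F}$ and random element $\xi$ in some Borel space, and
define $\eta = P[\xi \in \cdot \mid \mathcal{F}]$. Show that $\xi \perp\!\!\!\perp_{\eta} \mathcal{F}$.
21. Let $\xi$ and $\eta$ be random elements in some Borel space $S$. Prove the
existence of a measurable function $f : S \times [0, 1] \to S$ and some $U(0, 1)$
random variable $\gamma \perp\!\!\!\perp \eta$ such that $\xi = f(\eta, \gamma)$ a.s. (Hint: Choose $f$ with
$(f(\eta, \vartheta), \eta) \overset{d}{=} (\xi, \eta)$ for any $U(0, 1)$ random variable $\vartheta \perp\!\!\!\perp (\xi, \eta)$, and then let
$(\gamma, \tilde\eta) \overset{d}{=} (\vartheta, \eta)$ with $(\xi, \eta) = (f(\tilde\eta, \gamma), \tilde\eta)$ a.s.)
22. Let $\xi$ and $\eta$ be random elements in some Borel space $S$. Show that we
may choose a random element $\tilde\eta$ in $S$ with $(\xi, \eta) \overset{d}{=} (\xi, \tilde\eta)$ and $\eta \perp\!\!\!\perp_{\xi} \tilde\eta$.
23. Let the probability measures $P$ and $Q$ on $(\Omega, \mathcal{A})$ be related by $Q = \xi \cdot P$
for some random variable $\xi \ge 0$, and consider any σ-field $\mathcal{F} \subset \mathcal{A}$. Show
that $Q = E_P[\xi \mid \mathcal{F}] \cdot P$ on $\mathcal{F}$.
24. Assume as before that $Q = \xi \cdot P$ on $\mathcal{A}$, and let $\mathcal{F} \subset \mathcal{A}$. Show that
$E_Q[\eta \mid \mathcal{F}] = E_P[\xi\eta \mid \mathcal{F}]/E_P[\xi \mid \mathcal{F}]$ a.s. $Q$ for any random variable $\eta \ge 0$.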
Chapter 7
Martingales and Optional Times
Filtrations and optional times; random time-change; martingale
property; optional stopping and sampling; maximum and
upcrossing inequalities; martingale convergence, regularity, and
closure; limits of conditional expectations; regularization of
submartingales
The importance of martingale methods and ideas can hardly be exag-
gerated. Indeed, martingales and the associated notions of filtrations and
optional times are constantly used in all areas of modern probability; they
appear frequently throughout the remainder of this book.
In discrete time a martingale is simply a sequence of integrable random
variables centered at the successive conditional means, a centering that can
always be achieved by the elementary Doob decomposition. More precisely,
given any discrete filtration $\mathcal{F} = (\mathcal{F}_n)$, that is, an increasing sequence of
$\sigma$-fields in $\Omega$, we say that a sequence $M = (M_n)$ forms a martingale with
respect to $\mathcal{F}$ if $E[M_n|\mathcal{F}_{n-1}] = M_{n-1}$ a.s. for all $n$. A special role is played
by the class of uniformly integrable martingales, which can be represented
in the form $M_n = E[\xi|\mathcal{F}_n]$ for some integrable random variable $\xi$.
Martingale theory owes its usefulness to a number of powerful general
results, such as the optional sampling theorem, the submartingale conver-
gence theorem, and a wide range of maximum inequalities. The applications
discussed in this chapter include extensions of the Borel-Cantelli lemma
and Kolmogorov's 0-1 law. Martingales can also be used to establish the
existence of measurable densities and to give a short proof of the law of
large numbers.
Much of the discrete-time theory extends immediately to continuous
time, thanks to the fundamental regularization theorem, which ensures
that every continuous-time martingale with respect to a right-continuous
filtration has a right-continuous version with left-hand limits. The impli-
cations of this result extend far beyond martingale theory. In particular, it
will enable us in Chapters 15 and 19 to obtain right-continuous versions of
independent-increment and Feller processes.
The theory of continuous-time martingales is continued in Chapters 17,
18, 25, and 26 with studies of quadratic variation, random time-change,
integral representations, removal of drift, additional maximum inequalities,
and various decomposition theorems. Martingales also play a basic role,
especially for the Skorohod embedding in Chapter 14, the stochastic
integration in Chapters 17 and 26, and the theories of Feller processes,
SDEs, and diffusions in Chapters 19, 21, and 23.
As for the closely related notion of optional times, our present treatment
is continued with a more detailed study in Chapter 25. Optional times are
fundamental not only for martingale theory but also for various models
involving Markov processes. In the latter context they appear frequently in
the sequel, especially in Chapters 8, 9, 12, 13, 14, 19, and 22-25.
To begin our systematic exposition of the theory, we may fix an arbitrary
index set $T \subset \mathbb{R}$. A filtration on $T$ is defined as a nondecreasing family of
$\sigma$-fields $\mathcal{F}_t \subset \mathcal{A}$, $t \in T$. We say that a process $X$ on $T$ is adapted to
$\mathcal{F} = (\mathcal{F}_t)$ if $X_t$ is $\mathcal{F}_t$-measurable for every $t \in T$. The smallest filtration
with this property, namely $\mathcal{F}_t = \sigma\{X_s;\ s \le t\}$, $t \in T$, is called the induced
or generated filtration. Here "smallest" is understood in the sense of set
inclusion for every fixed $t$.
By a random time we mean a random element $\tau$ in $\bar{T} = T \cup \{\sup T\}$. We
say that $\tau$ is $\mathcal{F}$-optional or an $\mathcal{F}$-stopping time if $\{\tau \le t\} \in \mathcal{F}_t$ for every
$t \in T$, that is, if the process $X_t = 1\{\tau \le t\}$ is adapted. (Here and in similar
cases, we often omit the prefix $\mathcal{F}$ when there is no risk of confusion.) If $T$
is countable, it is clearly equivalent that $\{\tau = t\} \in \mathcal{F}_t$ for every $t \in T$. For
any optional times $\sigma$ and $\tau$ we note that even $\sigma \vee \tau$ and $\sigma \wedge \tau$ are optional.
With every optional time $\tau$ we may associate a $\sigma$-field
$$\mathcal{F}_\tau = \{A \in \mathcal{A};\ A \cap \{\tau \le t\} \in \mathcal{F}_t,\ t \in T\}.$$
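Some basic properties of optional times and the associated $\sigma$-fields are listed
below.
Lemma 7.1 (optional times) For any optional times $\sigma$ and $\tau$, we have
(i) $\tau$ is $\mathcal{F}_\tau$-measurable;
(ii) $\mathcal{F}_\tau = \mathcal{F}_t$ on $\{\tau = t\}$ for all $t \in T$;
(iii) $\mathcal{F}_\sigma \cap \{\sigma \le \tau\} \subset \mathcal{F}_{\sigma \wedge \tau} = \mathcal{F}_\sigma \cap \mathcal{F}_\tau$.
In particular, we see from (iii) that $\{\sigma \le \tau\} \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$, that $\mathcal{F}_\sigma = \mathcal{F}_\tau$
on $\{\sigma = \tau\}$, and that $\mathcal{F}_\sigma \subset \mathcal{F}_\tau$ whenever $\sigma \le \tau$.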
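Proof: (iii) For any $A \in \mathcal{F}_\sigma$ and $t \in T$, we have
$$A \cap \{\sigma \le \tau\} \cap \{\tau \le t\} = (A \cap \{\sigma \le t\}) \cap \{\tau \le t\} \cap \{\sigma \wedge t \le \tau \wedge t\},$$
which belongs to $\mathcal{F}_t$ since $\sigma \wedge t$ and $\tau \wedge t$ are both $\mathcal{F}_t$-measurable. Hence,
$$\mathcal{F}_\sigma \cap \{\sigma \le \tau\} \subset \mathcal{F}_\tau.$$
The first relation now follows as we replace $\tau$ by $\sigma \wedge \tau$. Replacing $\sigma$ and $\tau$
by the pairs $(\sigma \wedge \tau, \sigma)$ and $(\sigma \wedge \tau, \tau)$, we obtain $\mathcal{F}_{\sigma \wedge \tau} \subset \mathcal{F}_\sigma \cap \mathcal{F}_\tau$. To prove
the reverse relation, we note that for any $A \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$ and $t \in T$
$$A \cap \{\sigma \wedge \tau \le t\} = (A \cap \{\sigma \le t\}) \cup (A \cap \{\tau \le t\}) \in \mathcal{F}_t,$$
whence $A \in \mathcal{F}_{\sigma \wedge \tau}$.
(i) Applying (iii) to the pair $(\tau, t)$ gives $\{\tau \le t\} \in \mathcal{F}_\tau$ for all $t \in T$, which
extends immediately to any $t \in \mathbb{R}$. Now use Lemma 1.4.
(ii) First assume that $\tau \equiv t$. Then $\mathcal{F}_\tau = \mathcal{F}_\tau \cap \{\tau \le t\} \subset \mathcal{F}_t$. Conversely,
assume that $A \in \mathcal{F}_t$ and $s \in T$. If $s \ge t$ we get $A \cap \{\tau \le s\} = A \in \mathcal{F}_t \subset \mathcal{F}_s$,
and for $s < t$ we have $A \cap \{\tau \le s\} = \emptyset \in \mathcal{F}_s$. Thus, $A \in \mathcal{F}_\tau$. This shows
that $\mathcal{F}_\tau = \mathcal{F}_t$ when $\tau \equiv t$. The general case now follows by part (iii). □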
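Given an arbitrary filtration $\mathcal{F}$ on $\mathbb{R}_+$, we may define a new filtration
$\mathcal{F}^+$ by $\mathcal{F}_t^+ = \bigcap_{u>t} \mathcal{F}_u$, $t \ge 0$, and we say that $\mathcal{F}$ is right-continuous if
$\mathcal{F}^+ = \mathcal{F}$. In particular, $\mathcal{F}^+$ is right-continuous for any filtration $\mathcal{F}$. We say
that a random time $\tau$ is weakly $\mathcal{F}$-optional if $\{\tau < t\} \in \mathcal{F}_t$ for every $t > 0$.
In that case $\tau + h$ is clearly $\mathcal{F}$-optional for every $h > 0$, and we may define
$\mathcal{F}_{\tau+} = \bigcap_{h>0} \mathcal{F}_{\tau+h}$. When the index set is $\mathbb{Z}_+$, we take $\mathcal{F}^+ = \mathcal{F}$ and make
no difference between strictly and weakly optional times.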
The following result shows that the notions of optional and weakly
optional times agree when F is right-continuous.
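Lemma 7.2 (weakly optional times) A random time $\tau$ is weakly $\mathcal{F}$-optional
iff it is $\mathcal{F}^+$-optional, in which case
$$\mathcal{F}_{\tau+} = \mathcal{F}^+_\tau = \{A \in \mathcal{A};\ A \cap \{\tau < t\} \in \mathcal{F}_t,\ t > 0\}. \tag{1}$$
Proof: For any $t \ge 0$, we note that
$$\{\tau \le t\} = \bigcap_{r>t} \{\tau < r\}, \qquad \{\tau < t\} = \bigcup_{r<t} \{\tau \le r\}, \tag{2}$$
where $r$ may be restricted to the rationals. If $A \cap \{\tau \le t\} \in \mathcal{F}_{t+}$ for all $t$,
we get by (2) for any $t > 0$
$$A \cap \{\tau < t\} = \bigcup_{r<t} (A \cap \{\tau \le r\}) \in \mathcal{F}_t.$$
Conversely, if $A \cap \{\tau < t\} \in \mathcal{F}_t$ for all $t$, then (2) yields for any $t \ge 0$ and
$h > 0$
$$A \cap \{\tau \le t\} = \bigcap_{r \in (t,t+h)} (A \cap \{\tau < r\}) \in \mathcal{F}_{t+h},$$
and so $A \cap \{\tau \le t\} \in \mathcal{F}_{t+}$. For $A = \Omega$ this proves the first assertion, and
for general $A \in \mathcal{A}$ it proves the second relation in (1).
To prove the first relation, we note that $A \in \mathcal{F}_{\tau+}$ iff $A \in \mathcal{F}_{\tau+h}$ for each
$h > 0$, that is, iff $A \cap \{\tau + h \le t\} \in \mathcal{F}_t$ for all $t \ge 0$ and $h > 0$. But this
is equivalent to $A \cap \{\tau \le t\} \in \mathcal{F}_{t+h}$ for all $t \ge 0$ and $h > 0$, hence to
$A \cap \{\tau \le t\} \in \mathcal{F}_{t+}$ for every $t \ge 0$, which means that $A \in \mathcal{F}^+_\tau$. □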
We have already seen that the maximum and minimum of two optional
times are again optional. The result extends to countable collections as
follows.
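Lemma 7.3 (closure properties) For any random times $\tau_1, \tau_2, \ldots$ and
filtration $\mathcal{F}$ on $\mathbb{R}_+$ or $\mathbb{Z}_+$, we have:
(i) If the $\tau_n$ are $\mathcal{F}$-optional, then so is $\sigma = \sup_n \tau_n$.
(ii) If the $\tau_n$ are weakly $\mathcal{F}$-optional, then so is $\tau = \inf_n \tau_n$, and we have
$\mathcal{F}^+_\tau = \bigcap_n \mathcal{F}^+_{\tau_n}$.
Proof: To prove (i) and the first assertion in (ii), we note that
$$\{\sigma \le t\} = \bigcap_n \{\tau_n \le t\}, \qquad \{\tau < t\} = \bigcup_n \{\tau_n < t\}, \tag{3}$$
where the strict inequalities may be replaced by $\le$ for the index set $T = \mathbb{Z}_+$.
To prove the second assertion in (ii), we note that $\mathcal{F}^+_\tau \subset \bigcap_n \mathcal{F}^+_{\tau_n}$ by Lemma
7.1. Conversely, assuming $A \in \bigcap_n \mathcal{F}^+_{\tau_n}$, we get by (3) for any $t \ge 0$
$$A \cap \{\tau < t\} = A \cap \bigcup_n \{\tau_n < t\} = \bigcup_n (A \cap \{\tau_n < t\}) \in \mathcal{F}_t,$$
with the indicated modification for $T = \mathbb{Z}_+$. Thus, $A \in \mathcal{F}^+_\tau$. □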
Part (ii) of the last result is often useful in connection with the following
approximation of optional times from the right.
Lemma 7.4 (discrete approximation) For any weakly optional time $\tau$ in
$\bar{\mathbb{R}}_+$, there exist some countably valued optional times $\tau_n \downarrow \tau$.
Proof: We may define
$$\tau_n = 2^{-n}[2^n \tau + 1], \qquad n \in \mathbb{N}.$$
Then $\tau_n \in 2^{-n}\bar{\mathbb{N}}$ for all $n$, and $\tau_n \downarrow \tau$. Also note that the $\tau_n$ are optional
since $\{\tau_n \le k2^{-n}\} = \{\tau < k2^{-n}\} \in \mathcal{F}_{k2^{-n}}$. □
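The construction in the proof can be tried numerically. The following Python sketch is only an illustration and not part of the text: the function name dyadic_upper and the sample value tau = 0.3 are assumptions made here. It evaluates $\tau_n = 2^{-n}[2^n\tau + 1]$ for one fixed value of $\tau$ and checks that the values stay strictly above $\tau$ and are nonincreasing in $n$.

```python
import math

def dyadic_upper(tau: float, n: int) -> float:
    """Return tau_n = 2^{-n} * floor(2^n * tau + 1), as in the proof of Lemma 7.4."""
    if math.isinf(tau):
        return math.inf  # an infinite time is left unchanged
    return math.floor(2**n * tau + 1) / 2**n

tau = 0.3  # hypothetical sample value of the random time
approximations = [dyadic_upper(tau, n) for n in range(1, 8)]
print(approximations)

# Each tau_n lies strictly above tau, and the sequence is nonincreasing.
assert all(a > tau for a in approximations)
assert all(a >= b for a, b in zip(approximations, approximations[1:]))
```

Note that the approximation is strictly larger than $\tau$ even at dyadic points, which is what allows the weak condition $\{\tau < k2^{-n}\} \in \mathcal{F}_{k2^{-n}}$ to be used.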
It is now time to relate the optional times to random processes. We say
that a process $X$ on $\mathbb{R}_+$ is progressively measurable or simply progressive
if its restriction to $\Omega \times [0,t]$ is $\mathcal{F}_t \otimes \mathcal{B}[0,t]$-measurable for every $t \ge 0$.
Note that any progressive process is adapted by Lemma 1.26. Conversely,
a simple approximation from the left or right shows that any adapted and
left- or right-continuous process is progressive. A set $A \subset \Omega \times \mathbb{R}_+$ is said to
be progressive if the corresponding indicator function $1_A$ has this property,
and we note that the progressive sets form a $\sigma$-field.
Lemma 7.5 (optional evaluation) Fix a filtration $\mathcal{F}$ on an index set $T$,
let $X$ be a process on $T$ with values in a measurable space $(S, \mathcal{S})$, and let $\tau$
be an optional time in $T$. Then $X_\tau$ is $\mathcal{F}_\tau$-measurable under each of these
conditions:
(i) $T$ is countable and $X$ is adapted;
(ii) $T = \mathbb{R}_+$ and $X$ is progressive.
Proof: In both cases, we need to show that
$$\{X_\tau \in B,\ \tau \le t\} \in \mathcal{F}_t, \qquad t \ge 0,\ B \in \mathcal{S}.$$
This is clear in case (i) if we write
$$\{X_\tau \in B,\ \tau \le t\} = \bigcup_{s \le t} \{X_s \in B,\ \tau = s\} \in \mathcal{F}_t, \qquad B \in \mathcal{S}.$$
In case (ii) it is enough to show that $X_{\tau \wedge t}$ is $\mathcal{F}_t$-measurable for every $t \ge 0$.
We may then assume $\tau \le t$ and prove instead that $X_\tau$ is $\mathcal{F}_t$-measurable.
Writing $X_\tau = X \circ \psi$ where $\psi(\omega) = (\omega, \tau(\omega))$, we note that $\psi$ is measurable
from $\mathcal{F}_t$ to $\mathcal{F}_t \otimes \mathcal{B}[0,t]$ whereas $X$ is measurable on $\Omega \times [0,t]$ from
$\mathcal{F}_t \otimes \mathcal{B}[0,t]$ to $\mathcal{S}$. The required measurability of $X_\tau$ now follows by
Lemma 1.7. □
Given a process $X$ on $\mathbb{R}_+$ or $\mathbb{Z}_+$ and a set $B$ in the range space of $X$, we
introduce the hitting time
$$\tau_B = \inf\{t > 0;\ X_t \in B\}.$$
It is often important to decide whether TB is optional. The following
elementary result covers the most commonly occurring cases.
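Lemma 7.6 (hitting times) Fix a filtration $\mathcal{F}$ on $T = \mathbb{R}_+$ or $\mathbb{Z}_+$, let $X$ be
an $\mathcal{F}$-adapted process on $T$ with values in a measurable space $(S, \mathcal{S})$, and
let $B \in \mathcal{S}$. Then $\tau_B$ is weakly optional under each of these conditions:
(i) $T = \mathbb{Z}_+$;
(ii) $T = \mathbb{R}_+$, $S$ is a metric space, $B$ is closed, and $X$ is continuous;
(iii) $T = \mathbb{R}_+$, $S$ is a topological space, $B$ is open, and $X$ is
right-continuous.
Proof: In case (i) it is enough to write
$$\{\tau_B \le n\} = \bigcup_{k \in [1,n]} \{X_k \in B\} \in \mathcal{F}_n, \qquad n \in \mathbb{N}.$$
In case (ii) we get for any $t > 0$
$$\{\tau_B \le t\} = \bigcup_{h>0} \bigcap_{n \in \mathbb{N}} \bigcup_{r \in \mathbb{Q} \cap [h,t]} \{\rho(X_r, B) \le n^{-1}\} \in \mathcal{F}_t,$$
where $\rho$ denotes the metric in $S$. Finally, in case (iii) we get
$$\{\tau_B < t\} = \bigcup_{r \in \mathbb{Q} \cap (0,t)} \{X_r \in B\} \in \mathcal{F}_t, \qquad t > 0,$$
which suffices by Lemma 7.2. □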
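Case (i) can be made concrete for a simple random walk. The Python sketch below is an illustration under assumptions chosen here (the helper names walk and hitting_time, the target set B = {2}, and the horizon 6 are not from the text). It enumerates all step sequences and confirms that the event $\{\tau_B \le n\}$ is determined by the first $n$ steps alone, which is what membership in $\mathcal{F}_n$ means for the induced filtration.

```python
from itertools import product

def walk(steps):
    """Path X_0 = 0, X_n = sum of the first n steps."""
    path = [0]
    for s in steps:
        path.append(path[-1] + s)
    return path

def hitting_time(path, B):
    """tau_B = inf{n > 0 : X_n in B}, with the infimum of the empty set taken as infinity."""
    for n in range(1, len(path)):
        if path[n] in B:
            return n
    return float("inf")

def event_is_adapted(horizon, n, B):
    """Check that 1{tau_B <= n} takes a single value on each set of paths sharing the first n steps."""
    seen = {}
    for steps in product([-1, 1], repeat=horizon):
        event = hitting_time(walk(steps), B) <= n
        seen.setdefault(steps[:n], set()).add(event)
    return all(len(values) == 1 for values in seen.values())

print(all(event_is_adapted(6, n, {2}) for n in range(1, 7)))
```

The time index starts at 1 in hitting_time because the definition of $\tau_B$ uses $t > 0$, matching the union over $k \in [1,n]$ in the proof.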
For special purposes we need the following more general but much deeper
result, known as the debut theorem. Here and below, a filtration $\mathcal{F}$ is said
to be complete if the basic $\sigma$-field $\mathcal{A}$ is complete and each $\mathcal{F}_t$ contains all
$P$-null sets in $\mathcal{A}$.
Theorem 7.7 (first entry, Doob, Hunt) Let the set $A \subset \mathbb{R}_+ \times \Omega$ be
progressive with respect to some right-continuous and complete filtration $\mathcal{F}$.
Then the time $\tau(\omega) = \inf\{t \ge 0;\ (t,\omega) \in A\}$ is $\mathcal{F}$-optional.
Proof: Since $A$ is progressive, we have $A \cap [0,t) \in \mathcal{F}_t \otimes \mathcal{B}([0,t])$ for every
$t > 0$. Noting that $\{\tau < t\}$ is the projection of $A \cap [0,t)$ onto $\Omega$, we get
$\{\tau < t\} \in \mathcal{F}_t$ by Theorem A1.4, and so $\tau$ is optional by Lemma 7.2. □
In applications of the last result and for other purposes, we may need
to extend a given filtration $\mathcal{F}$ on $\mathbb{R}_+$ to make it both right-continuous and
complete. Writing $\bar{\mathcal{A}}$ for the completion of $\mathcal{A}$, we put
$\mathcal{N} = \{A \in \bar{\mathcal{A}};\ PA = 0\}$ and define $\bar{\mathcal{F}}_t = \sigma\{\mathcal{F}_t, \mathcal{N}\}$. Then $\bar{\mathcal{F}} = (\bar{\mathcal{F}}_t)$ is the
smallest complete extension of $\mathcal{F}$. Similarly, $\mathcal{F}^+ = (\mathcal{F}_{t+})$ is the smallest
right-continuous extension of $\mathcal{F}$. We show that the two operations commute
and can be combined into a smallest right-continuous and complete
extension, known as the (usual) augmentation of $\mathcal{F}$.
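Lemma 7.8 (augmented filtration) Every filtration $\mathcal{F}$ on $\mathbb{R}_+$ has a
smallest right-continuous and complete extension $\mathcal{G}$, given by
$$\mathcal{G}_t = \overline{\mathcal{F}_{t+}} = \bar{\mathcal{F}}_{t+}, \qquad t \ge 0. \tag{4}$$
Proof: First we note that
$$\overline{\mathcal{F}_{t+}} \subset \overline{\bar{\mathcal{F}}_{t+}} \subset \bar{\mathcal{F}}_{t+}, \qquad t \ge 0.$$
Conversely, assume that $A \in \bar{\mathcal{F}}_{t+}$. Then $A \in \bar{\mathcal{F}}_{t+h}$ for every $h > 0$, and so,
as in Lemma 1.25, there exist some sets $A_h \in \mathcal{F}_{t+h}$ with $P(A \,\Delta\, A_h) = 0$.
Now choose $h_n \to 0$, and define $A' = \{A_{h_n} \text{ i.o.}\}$. Then $A' \in \mathcal{F}_{t+}$ and
$P(A \,\Delta\, A') = 0$, so $A \in \overline{\mathcal{F}_{t+}}$. Thus, $\bar{\mathcal{F}}_{t+} \subset \overline{\mathcal{F}_{t+}}$, which proves the second
relation in (4).
In particular, the filtration $\mathcal{G}$ in (4) contains $\mathcal{F}$ and is both
right-continuous and complete. For any filtration $\mathcal{H}$ with those properties, we have
$$\mathcal{G}_t = \bar{\mathcal{F}}_{t+} \subset \bar{\mathcal{H}}_{t+} = \mathcal{H}_{t+} = \mathcal{H}_t, \qquad t \ge 0,$$
which proves the required minimality of $\mathcal{G}$. □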
The next result shows how the $\sigma$-fields $\mathcal{F}_\tau$ arise naturally in connection
with a random time-change.
Proposition 7.9 (random time-change) Let $X \ge 0$ be a nondecreasing,
right-continuous process adapted to some right-continuous filtration
$\mathcal{F}$. Then
$$\tau_s = \inf\{t > 0;\ X_t > s\}, \qquad s \ge 0,$$
is a right-continuous process of optional times, generating a
right-continuous filtration $\mathcal{G}_s = \mathcal{F}_{\tau_s}$, $s \ge 0$. If $X$ is continuous and the time
$\tau$ is $\mathcal{F}$-optional, then $X_\tau$ is $\mathcal{G}$-optional and $\mathcal{F}_\tau \subset \mathcal{G}_{X_\tau}$. If $X$ is further
strictly increasing, then $\mathcal{F}_\tau = \mathcal{G}_{X_\tau}$.
In the latter case, we have in particular $\mathcal{F}_t = \mathcal{G}_{X_t}$ for all $t$, so the processes
$(\tau_s)$ and $(X_t)$ play symmetric roles.
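Proof: The times $\tau_s$ are optional by Lemmas 7.2 and 7.6, and since $(\tau_s)$
is right-continuous, so is $(\mathcal{G}_s)$ by Lemma 7.3. If $X$ is continuous, then by
Lemma 7.1 we get for any $\mathcal{F}$-optional time $\tau > 0$ and set $A \in \mathcal{F}_\tau$
$$A \cap \{X_\tau \le s\} = A \cap \{\tau \le \tau_s\} \in \mathcal{F}_{\tau_s} = \mathcal{G}_s, \qquad s \ge 0.$$
For $A = \Omega$ it follows that $X_\tau$ is $\mathcal{G}$-optional, and for general $A$ we get
$A \in \mathcal{G}_{X_\tau}$. Thus, $\mathcal{F}_\tau \subset \mathcal{G}_{X_\tau}$. Both statements extend by Lemma 7.3 to
arbitrary $\tau$.
Now assume that $X$ is also strictly increasing. For any $A \in \mathcal{G}_{X_t}$ with
$t > 0$ we have
$$A \cap \{t \le \tau_s\} = A \cap \{X_t \le s\} \in \mathcal{G}_s = \mathcal{F}_{\tau_s}, \qquad s \ge 0,$$
and so
$$A \cap \{t \le \tau_s \le u\} \in \mathcal{F}_u, \qquad s \ge 0,\ u > t.$$
Taking the union over all $s \in \mathbb{Q}_+$ (the set of nonnegative rationals) gives
$A \in \mathcal{F}_u$, and as $u \downarrow t$ we get $A \in \mathcal{F}_{t+} = \mathcal{F}_t$. Hence, $\mathcal{F}_t = \mathcal{G}_{X_t}$, which
extends as before to $t = 0$. By Lemma 7.1 we now obtain for any $A \in \mathcal{G}_{X_\tau}$
$$A \cap \{\tau \le t\} = A \cap \{X_\tau \le X_t\} \in \mathcal{G}_{X_t} = \mathcal{F}_t, \qquad t \ge 0,$$
and so $A \in \mathcal{F}_\tau$. Thus, $\mathcal{G}_{X_\tau} \subset \mathcal{F}_\tau$, so the two $\sigma$-fields agree. □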
To motivate the introduction of martingales, we may fix a random
variable $\xi \in L^1$ and a filtration $\mathcal{F}$ on some index set $T$, and put
$$M_t = E[\xi|\mathcal{F}_t], \qquad t \in T.$$
The process $M$ is clearly integrable (for each $t$) and adapted, and by the
chain rule for conditional expectations we note that
$$M_s = E[M_t|\mathcal{F}_s] \text{ a.s.}, \qquad s \le t. \tag{5}$$
Any integrable and adapted process $M$ satisfying (5) is called a martingale
with respect to $\mathcal{F}$, or an $\mathcal{F}$-martingale. When $T = \mathbb{Z}_+$, it suffices to require
(5) for $t = s + 1$, so in that case the condition becomes
$$E[\Delta M_n|\mathcal{F}_{n-1}] = 0 \text{ a.s.}, \qquad n \in \mathbb{N}, \tag{6}$$
where $\Delta M_n = M_n - M_{n-1}$. A process $M = (M^1, \ldots, M^d)$ in $\mathbb{R}^d$ is said to
be a martingale if $M^1, \ldots, M^d$ are one-dimensional martingales.
Replacing the equality in (5) or (6) by an inequality we arrive at the
notions of sub- and supermartingales. Thus, a submartingale is defined as
an integrable and adapted process $X$ with
$$X_s \le E[X_t|\mathcal{F}_s] \text{ a.s.}, \qquad s \le t; \tag{7}$$
reversing the inequality sign yields the notion of a supermartingale. In
particular, the mean is nondecreasing for submartingales and nonincreasing
for supermartingales. (The sign convention is suggested by analogy with
sub- and superharmonic functions.)
Given a filtration $\mathcal{F}$ on $\mathbb{Z}_+$, we say that a random sequence $A = (A_n)$
with $A_0 = 0$ is predictable with respect to $\mathcal{F}$, or $\mathcal{F}$-predictable, if $A_n$
is $\mathcal{F}_{n-1}$-measurable for every $n \in \mathbb{N}$, that is, if the shifted sequence
$\theta A = (A_{n+1})$ is adapted. The following elementary result, known as the
Doob decomposition, is useful to deduce results for submartingales from
the corresponding martingale versions. An extension to continuous time is
proved in Chapter 25.
Lemma 7.10 (centering) Any integrable and $\mathcal{F}$-adapted process $X$ on $\mathbb{Z}_+$
has an a.s. unique decomposition $M + A$, where $M$ is an $\mathcal{F}$-martingale
and $A$ is an $\mathcal{F}$-predictable process with $A_0 = 0$. In particular, $X$ is a
submartingale iff $A$ is a.s. nondecreasing.
Proof: If $X = M + A$ for some processes $M$ and $A$ as stated, then clearly
$\Delta A_n = E[\Delta X_n|\mathcal{F}_{n-1}]$ a.s. for all $n \in \mathbb{N}$, and so
$$A_n = \sum_{k \le n} E[\Delta X_k|\mathcal{F}_{k-1}] \text{ a.s.}, \qquad n \in \mathbb{Z}_+, \tag{8}$$
which proves the required uniqueness. In general, we may define a
predictable process $A$ by (8). Then $M = X - A$ is a martingale, since
$$E[\Delta M_n|\mathcal{F}_{n-1}] = E[\Delta X_n|\mathcal{F}_{n-1}] - \Delta A_n = 0 \text{ a.s.}, \qquad n \in \mathbb{N}.$$
□
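Formula (8) can be evaluated exactly in a simple case. The Python sketch below is an illustration only: it assumes a simple symmetric random walk $S$ (steps $\pm 1$ with probability 1/2 each), takes $X_n = f(S_n)$, and uses hypothetical helper names (compensator, martingale_part, is_martingale) that do not appear in the text. For $f(x) = x^2$ the compensator is $A_n = n$, and for $f(x) = |x|$ it counts the visits of $S$ to 0 before time $n$.

```python
from fractions import Fraction
from itertools import product

def compensator(f, steps):
    """A_n = sum_{k<=n} E[f(S_k) - f(S_{k-1}) | F_{k-1}] along the path given by steps."""
    total, s = Fraction(0), 0
    for step in steps:
        # Given S_{k-1} = s, the next value is s + 1 or s - 1 with probability 1/2 each.
        total += Fraction(f(s + 1) + f(s - 1), 2) - f(s)
        s += step
    return total

def martingale_part(f, steps):
    """M_n = X_n - A_n for X_n = f(S_n)."""
    return f(sum(steps)) - compensator(f, steps)

def is_martingale(f, horizon):
    """Check E[M_{n+1} | F_n] = M_n on every path prefix of length n < horizon."""
    for n in range(horizon):
        for prefix in product([-1, 1], repeat=n):
            up = martingale_part(f, prefix + (1,))
            down = martingale_part(f, prefix + (-1,))
            if (up + down) / 2 != martingale_part(f, prefix):
                return False
    return True

def square(x):
    return x * x

print(compensator(square, (1, 1, -1, 1)))  # A_4 for X = S^2
print(is_martingale(square, 5), is_martingale(abs, 5))
```

The last step of a prefix never enters its own compensator term, which reflects the predictability of $A$: the value $A_n$ depends only on the first $n - 1$ steps.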
We proceed to show how the martingale and submartingale properties
are preserved under various transformations.
Lemma 7.11 (convex maps) Let $M$ be a martingale in $\mathbb{R}^d$, and consider
a convex function $f: \mathbb{R}^d \to \mathbb{R}$ such that $X = f(M)$ is integrable. Then $X$
is a submartingale. The statement remains true for any real submartingale
$M$, provided that $f$ is also nondecreasing.
Proof: In the martingale case, the conditional version of Jensen's
inequality yields
$$f(M_s) = f(E[M_t|\mathcal{F}_s]) \le E[f(M_t)|\mathcal{F}_s] \text{ a.s.}, \qquad s \le t, \tag{9}$$
which shows that $f(M)$ is a submartingale. If instead $M$ is a submartingale
and $f$ is nondecreasing, the first relation in (9) becomes
$f(M_s) \le f(E[M_t|\mathcal{F}_s])$, and the conclusion remains valid. □
The last result is often applied with $f(x) = |x|^p$ for some $p \ge 1$ or, for
$d = 1$, with $f(x) = x^+ = x \vee 0$.
We say that an optional time $\tau$ is bounded if $\tau \le u$ a.s. for some $u \in T$.
This is always true when $T$ has a last element. The following result is an
elementary version of the basic optional sampling theorem. An extension to
continuous-time submartingales appears as Theorem 7.29.
Theorem 7.12 (optional sampling, Doob) Let $M$ be a martingale on
some countable index set $T$ with filtration $\mathcal{F}$, and consider two optional
times $\sigma$ and $\tau$, where $\tau$ is bounded. Then $M_\tau$ is integrable, and
$$M_{\sigma \wedge \tau} = E[M_\tau|\mathcal{F}_\sigma] \text{ a.s.}$$
Proof: By Lemmas 6.2 and 7.1 we get for any $t \le u$ in $T$
$$E[M_u|\mathcal{F}_\tau] = E[M_u|\mathcal{F}_t] = M_t = M_\tau \text{ a.s. on } \{\tau = t\},$$
and so $E[M_u|\mathcal{F}_\tau] = M_\tau$ a.s. whenever $\tau \le u$ a.s. If $\sigma \le \tau \le u$, then
$\mathcal{F}_\sigma \subset \mathcal{F}_\tau$ by Lemma 7.1, and we get
$$E[M_\tau|\mathcal{F}_\sigma] = E[E[M_u|\mathcal{F}_\tau]|\mathcal{F}_\sigma] = E[M_u|\mathcal{F}_\sigma] = M_\sigma \text{ a.s.}$$
On the other hand, clearly $E[M_\tau|\mathcal{F}_\sigma] = M_\tau$ a.s. when $\tau \le \sigma \wedge u$. In the
general case, the previous results combine by means of Lemmas 6.2 and 7.1
into
$$E[M_\tau|\mathcal{F}_\sigma] = E[M_\tau|\mathcal{F}_{\sigma \wedge \tau}] = M_{\sigma \wedge \tau} \text{ a.s. on } \{\sigma \le \tau\},$$
$$E[M_\tau|\mathcal{F}_\sigma] = E[M_{\sigma \wedge \tau}|\mathcal{F}_\sigma] = M_{\sigma \wedge \tau} \text{ a.s. on } \{\sigma > \tau\}.$$ □
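Taking $\sigma = 0$ in the theorem gives $EM_\tau = EM_0$ for bounded optional times $\tau$, which can be checked by exact enumeration. The Python sketch below is an illustration under assumptions made here: a simple symmetric random walk $S$, the bounded time $\tau = N \wedge \inf\{n;\ |S_n| \ge 2\}$, and the hypothetical helper names stopped_value and optional_sampling_means. It evaluates $ES_\tau$ and $E[S_\tau^2 - \tau]$ for the martingales $S_n$ and $S_n^2 - n$.

```python
from fractions import Fraction
from itertools import product

def stopped_value(steps, level):
    """Return (tau, S_tau), where tau = min(first n with |S_n| >= level, len(steps))."""
    s = 0
    for n, step in enumerate(steps, start=1):
        s += step
        if abs(s) >= level:
            return n, s
    return len(steps), s

def optional_sampling_means(horizon, level):
    """Exact values of E[S_tau] and E[S_tau^2 - tau] over all equally likely paths."""
    paths = list(product([-1, 1], repeat=horizon))
    weight = Fraction(1, len(paths))
    mean_s = mean_m = Fraction(0)
    for steps in paths:
        tau, s = stopped_value(steps, level)
        mean_s += weight * s
        mean_m += weight * (s * s - tau)
    return mean_s, mean_m

print(optional_sampling_means(6, 2))
```

Both means equal the initial values $S_0 = 0$ and $S_0^2 - 0 = 0$, as the theorem requires. The bound $\tau \le N$ matters here: without it, the enumeration over finitely many paths would not cover the whole event space.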
In particular, we note that if $M$ is a martingale on an arbitrary time
scale $T$ with filtration $\mathcal{F}$ and $(\tau_s)$ is a nondecreasing family of bounded,
optional times that take countably many values, then the process $(M_{\tau_s})$ is a
martingale with respect to the filtration $(\mathcal{F}_{\tau_s})$. In this sense, the martingale
property is preserved by a random time-change.
From the last theorem we note that every martingale $M$ satisfies
$EM_\sigma = EM_\tau$, for any bounded optional times $\sigma$ and $\tau$ that take only
countably many values. An even weaker property characterizes the class of
martingales.
Lemma 7.13 (martingale criterion) Let $M$ be an integrable, adapted
process on some index set $T$. Then $M$ is a martingale iff $EM_\sigma = EM_\tau$ for
any $T$-valued optional times $\sigma$ and $\tau$ that take at most two values.
Proof: If $s < t$ in $T$ and $A \in \mathcal{F}_s$, then $\tau = s1_A + t1_{A^c}$ is optional, and so
$$0 = EM_t - EM_\tau = EM_t - E[M_s; A] - E[M_t; A^c] = E[M_t - M_s; A].$$
Since $A$ is arbitrary, it follows that $E[M_t - M_s|\mathcal{F}_s] = 0$ a.s. □
The following predictable transformation of martingales is basic for the
theory of stochastic integration.
Corollary 7.14 (martingale transform) Let $M$ be a martingale on some
index set $T$ with filtration $\mathcal{F}$, fix an optional time $\tau$ that takes countably
many values, and let $\eta$ be a bounded, $\mathcal{F}_\tau$-measurable random variable. Then
the process $N_t = \eta(M_t - M_{t \wedge \tau})$ is again a martingale.
Proof: The integrability follows from Theorem 7.12, and the adaptedness
is clear if we replace $\eta$ by $\eta 1\{\tau \le t\}$ in the expression for $N_t$. Now fix any
bounded, optional time $\sigma$ taking countably many values. By Theorem 7.12
and the pull-out property of conditional expectations, we get a.s.
$$E[N_\sigma|\mathcal{F}_\tau] = \eta E[M_\sigma - M_{\sigma \wedge \tau}|\mathcal{F}_\tau] = \eta(M_{\sigma \wedge \tau} - M_{\sigma \wedge \tau}) = 0,$$
and so $EN_\sigma = 0$. Thus, $N$ is a martingale by Lemma 7.13. □
In particular, we note that optional stopping preserves the martingale
property, in the sense that the stopped process $M^\tau_t = M_{\tau \wedge t}$ is a martingale
whenever $M$ is a martingale and $\tau$ is an optional time that takes countably
many values.
More generally, we may consider predictable step processes of the form
$$V_t = \sum_{k \le n} \eta_k 1\{t > \tau_k\}, \qquad t \in T,$$
where $\tau_1 \le \cdots \le \tau_n$ are optional times, and each $\eta_k$ is a bounded,
$\mathcal{F}_{\tau_k}$-measurable random variable. For any process $X$, we may introduce the
associated elementary stochastic integral
$$(V \cdot X)_t \equiv \int_0^t V_s\,dX_s = \sum_{k \le n} \eta_k (X_t - X_{t \wedge \tau_k}), \qquad t \in T.$$
From Corollary 7.14 we note that $V \cdot X$ is a martingale whenever $X$ is a
martingale and each $\tau_k$ takes countably many values. In discrete time we
may clearly allow $V$ to be any bounded, predictable sequence, in which
case
$$(V \cdot X)_n = \sum_{k \le n} V_k \Delta X_k, \qquad n \in \mathbb{Z}_+.$$
The result for martingales extends in an obvious way to submartingales $X$,
provided that the predictable sequence $V$ is nonnegative.
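The discrete-time formula $(V \cdot X)_n = \sum_{k \le n} V_k \Delta X_k$ can be illustrated with a betting interpretation. The Python sketch below is an assumption-laden example, not taken from the text: $X$ is a simple symmetric random walk, and double_after_loss is a hypothetical bounded, predictable stake rule (the stake for round $k$ depends only on the first $k - 1$ steps). Exact enumeration then shows that the transform has mean zero, as it must for a martingale started at 0.

```python
from fractions import Fraction
from itertools import product

def double_after_loss(past):
    """Hypothetical bounded stake rule: double after each loss (capped at 8), reset to 1 after a win."""
    stake = 1
    for step in past:
        stake = min(2 * stake, 8) if step < 0 else 1
    return stake

def transform(steps, strategy):
    """(V . X)_n = sum_{k<=n} V_k * (X_k - X_{k-1}), where V_k = strategy(first k-1 steps)."""
    return sum(strategy(steps[:k]) * steps[k] for k in range(len(steps)))

def mean_transform(horizon, strategy):
    """Exact value of E (V . X)_n over all equally likely step sequences."""
    paths = list(product([-1, 1], repeat=horizon))
    return sum(Fraction(transform(p, strategy), len(paths)) for p in paths)

print(transform((-1, -1, 1), double_after_loss))
print(mean_transform(6, double_after_loss))
```

The cap on the stake keeps $V$ bounded, as required in the text. Predictability is encoded by passing only steps[:k] to the strategy when the increment steps[k] is being weighted.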
Our next aim is to derive some basic martingale inequalities. We begin
with an extension of Kolmogorov's maximum inequality in Lemma 4.15.
Proposition 7.15 (maximum inequalities, Bernstein, Lévy) Let $X$ be a
submartingale on a countable index set $T$. Then for any $r \ge 0$ and $u \in T$,
$$rP\{\sup_{t \le u} X_t \ge r\} \le E[X_u;\ \sup_{t \le u} X_t \ge r] \le EX_u^+, \tag{10}$$
$$rP\{\sup_t |X_t| \ge r\} \le 3 \sup_t E|X_t|. \tag{11}$$
Proof: By dominated convergence it is enough to consider finite index
sets, so we may assume that $T = \mathbb{Z}_+$. Define $\tau = u \wedge \inf\{t;\ X_t \ge r\}$ and
$B = \{\max_{t \le u} X_t \ge r\}$. Then $\tau$ is an optional time bounded by $u$, and we
note that $B \in \mathcal{F}_\tau$ and $X_\tau \ge r$ on $B$. Hence, by Lemma 7.10 and Theorem
7.12,
$$rPB \le E[X_\tau; B] \le E[X_u; B] \le EX_u^+,$$
which proves (10). Letting $M + A$ be the Doob decomposition of $X$ and
applying (10) to $-M$, we further get
$$rP\{\min_{t \le u} X_t \le -r\} \le rP\{\min_{t \le u} M_t \le -r\} \le EM_u^- = EM_u^+ - EM_u \le EX_u^+ - EX_0 \le 2 \max_{t \le u} E|X_t|.$$
Combining this with (10) yields (11). □
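The chain of inequalities (10) can be verified exactly in a small case. The Python sketch below is an illustration under assumptions chosen here: $X = |S|$ for a simple symmetric random walk $S$ (a submartingale by Lemma 7.11), with the hypothetical function name maximum_inequality_terms. It returns the three terms of (10) as exact fractions.

```python
from fractions import Fraction
from itertools import accumulate, product

def maximum_inequality_terms(n, r):
    """Return (r P{max X >= r}, E[X_n; max X >= r], E X_n^+) for X = |S| up to time n."""
    paths = list(product([-1, 1], repeat=n))
    weight = Fraction(1, len(paths))
    left = middle = right = Fraction(0)
    for steps in paths:
        x = [0] + [abs(s) for s in accumulate(steps)]
        hit = max(x) >= r          # the event B = {max_{t<=n} X_t >= r}
        left += weight * r * hit
        middle += weight * x[-1] * hit
        right += weight * x[-1]    # X >= 0, so X_n^+ = X_n
    return left, middle, right

for r in range(1, 5):
    left, middle, right = maximum_inequality_terms(6, r)
    print(r, left, middle, right, left <= middle <= right)
```

For large $r$ the first two terms shrink together, which shows why the middle term of (10) is often more useful than the cruder bound $EX_u^+$.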
We proceed to derive a basic norm inequality. For processes $X$ on some
index set $T$, we define
$$X_t^* = \sup_{s \le t} |X_s|, \qquad X^* = \sup_{t \in T} |X_t|.$$
Proposition 7.16 (norm inequality, Doob) Let $M$ be a martingale on a
countable index set $T$, and fix any $p, q > 1$ with $p^{-1} + q^{-1} = 1$. Then
$$\|M_t^*\|_p \le q\|M_t\|_p, \qquad t \in T.$$
Proof: By monotone convergence we may assume that $T = \mathbb{Z}_+$. If
$\|M_t\|_p < \infty$, then $\|M_s\|_p < \infty$ for all $s \le t$ by Jensen's inequality, and
so we may assume that $0 < \|M_t^*\|_p < \infty$. Applying Proposition 7.15 to the
submartingale $|M|$, we get
$$rP\{M_t^* > r\} \le E[|M_t|;\ M_t^* > r], \qquad r > 0.$$
Hence, by Lemma 3.4, Fubini's theorem, and Hölder's inequality,
$$\|M_t^*\|_p^p = p\int_0^\infty P\{M_t^* > r\}\,r^{p-1}\,dr$$
$$\le p\int_0^\infty E[|M_t|;\ M_t^* > r]\,r^{p-2}\,dr$$
$$= p\,E\,|M_t| \int_0^{M_t^*} r^{p-2}\,dr = q\,E\,|M_t|\,M_t^{*(p-1)}$$
$$\le q\,\|M_t\|_p\,\|M_t^{*(p-1)}\|_q = q\,\|M_t\|_p\,\|M_t^*\|_p^{p-1}.$$
It remains to divide by the last factor on the right. □
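For $p = q = 2$ the norm inequality reads $E(M_t^*)^2 \le 4EM_t^2$. The Python sketch below checks this exactly for a simple symmetric random walk $M = S$, where $ES_n^2 = n$. The walk and the function name doob_l2_terms are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import accumulate, product

def doob_l2_terms(n):
    """Return (E[(M_n^*)^2], 4 * E[M_n^2]) for the simple symmetric walk M = S, n >= 1."""
    paths = list(product([-1, 1], repeat=n))
    weight = Fraction(1, len(paths))
    lhs = sum(weight * max(abs(s) for s in accumulate(steps)) ** 2 for steps in paths)
    rhs = 4 * sum(weight * sum(steps) ** 2 for steps in paths)
    return lhs, rhs

for n in range(1, 9):
    lhs, rhs = doob_l2_terms(n)
    print(n, lhs, rhs, lhs <= rhs)
```

In this example the left side stays well below the bound $4n$, so the constant $q = 2$ is far from attained by the walk over short horizons.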
The next inequality is needed to prove the basic Theorem 7.18. For any
function $f$ on $T$ and constants $a < b$, the number of $[a,b]$-crossings of $f$ up
to time $t$ is defined as the supremum of all $n \in \mathbb{Z}_+$ such that there exist
times $s_1 < t_1 < s_2 < t_2 < \cdots < s_n < t_n \le t$ in $T$ with $f(s_k) \le a$ and
$f(t_k) \ge b$ for all $k$. The supremum may clearly be infinite.
Lemma 7.17 (upcrossing inequality, Doob, Snell) Let $X$ be a
submartingale on a countable index set $T$, and let $N_a^b(t)$ denote the number of
$[a,b]$-crossings of $X$ up to time $t$. Then
$$EN_a^b(t) \le \frac{E(X_t - a)^+}{b - a}, \qquad t \in T,\ a < b \text{ in } \mathbb{R}.$$
Proof: As before, we may assume that $T = \mathbb{Z}_+$. Since $Y = (X - a)^+$ is
again a submartingale by Lemma 7.11 and the $[a,b]$-crossings of $X$
correspond to $[0, b - a]$-crossings of $Y$, we may assume that $X \ge 0$ and $a = 0$.
Now define recursively the optional times $0 = \tau_0 \le \sigma_1 < \tau_1 < \sigma_2 < \cdots$ by
$$\sigma_k = \inf\{n \ge \tau_{k-1};\ X_n = 0\}, \qquad \tau_k = \inf\{n \ge \sigma_k;\ X_n \ge b\}, \qquad k \in \mathbb{N},$$
and introduce the predictable process
$$V_n = \sum_{k \ge 1} 1\{\sigma_k < n \le \tau_k\}, \qquad n \in \mathbb{N}.$$
Then $(1 - V) \cdot X$ is again a submartingale by Corollary 7.14, and so
$$E((1 - V) \cdot X)_t \ge E((1 - V) \cdot X)_0 = 0, \qquad t \ge 0.$$
Since also $(V \cdot X)_t \ge bN_0^b(t)$, we get
$$bEN_0^b(t) \le E(V \cdot X)_t \le E(1 \cdot X)_t = EX_t - EX_0 \le EX_t.$$ □
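The crossing count and the inequality can both be computed directly for finite paths. The Python sketch below is an illustration with assumptions chosen here: the function names upcrossings and upcrossing_bound are hypothetical, and the submartingale is taken to be a simple symmetric random walk $S$ (a martingale, hence also a submartingale). The count follows the definition above: a crossing is completed each time the path, having been at or below $a$, reaches a value at or above $b$.

```python
from fractions import Fraction
from itertools import accumulate, product

def upcrossings(path, a, b):
    """Number of [a, b]-crossings of a finite path: passages from a value <= a to a value >= b."""
    count, below = 0, False
    for x in path:
        if not below and x <= a:
            below = True           # found the next time s_k with f(s_k) <= a
        elif below and x >= b:
            count += 1             # found the matching time t_k with f(t_k) >= b
            below = False
    return count

def upcrossing_bound(n, a, b):
    """Return (E N_a^b(n), E (S_n - a)^+ / (b - a)) for the simple symmetric walk S."""
    paths = list(product([-1, 1], repeat=n))
    weight = Fraction(1, len(paths))
    lhs = rhs = Fraction(0)
    for steps in paths:
        path = [0] + list(accumulate(steps))
        lhs += weight * upcrossings(path, a, b)
        rhs += weight * max(path[-1] - a, 0) / (b - a)
    return lhs, rhs

print(upcrossings([0, -1, 0, 1, 0, -1, 2], -1, 1))
print(upcrossing_bound(8, -1, 1))
```

Because $a < b$, a value that completes a crossing cannot also start the next one, so the strict ordering $s_1 < t_1 < s_2 < \cdots$ in the definition is respected automatically.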
We may now state the fundamental regularity and convergence theorem
for submartingales.
Theorem 7.18 (regularity and convergence, Doob) Let $X$ be an
$L^1$-bounded submartingale on a countable index set $T$. Then $X_t$ converges along
every increasing or decreasing sequence in $T$, outside some fixed $P$-null set
$A$.
Proof: By Proposition 7.15 we have $X^* < \infty$ a.s., and Lemma 7.17
shows that $X$ has a.s. finitely many upcrossings of every interval $[a,b]$ with
rational $a < b$. Outside the null set $A$ where any of these conditions fails,
it is clear that $X$ has the asserted property. □
The following is an interesting and useful application.
Proposition 7.19 (one-sided bounds) Let $M$ be a martingale on $\mathbb{Z}_+$ with
$\Delta M \le c$ a.s. for some constant $c < \infty$. Then a.s.
$$\{M_n \text{ converges}\} = \{\sup_n M_n < \infty\}.$$
Proof: Since $M - M_0$ is again a martingale, we may assume that $M_0 = 0$.
Introduce the optional times
$$\tau_m = \inf\{n;\ M_n \ge m\}, \qquad m \in \mathbb{N}.$$
The processes $M^{\tau_m}$ are again martingales by Corollary 7.14. Since
$M^{\tau_m} \le m + c$ a.s., we have $E|M^{\tau_m}| \le 2(m + c) < \infty$, and so $M^{\tau_m}$ converges a.s.
by Theorem 7.18. Hence, $M$ converges a.s. on
$$\{\sup_n M_n < \infty\} = \bigcup_m \{M \equiv M^{\tau_m}\}.$$
The reverse implication is obvious, since every convergent sequence in $\mathbb{R}$ is
bounded. □
From the last result we may easily derive the following useful extension
of the Borel-Cantelli lemma in Theorem 3.18.
Corollary 7.20 (extended Borel-Cantelli lemma, Lévy) For any filtration
$\mathcal{F}$ on $\mathbb{Z}_+$, let $A_n \in \mathcal{F}_n$, $n \in \mathbb{N}$. Then a.s.
$$\{A_n \text{ i.o.}\} = \{\textstyle\sum_n P[A_n|\mathcal{F}_{n-1}] = \infty\}.$$
Proof: The sequence
$$M_n = \sum_{k \le n} (1_{A_k} - P[A_k|\mathcal{F}_{k-1}]), \qquad n \in \mathbb{Z}_+,$$
is a martingale with $|\Delta M_n| \le 1$, and so by Proposition 7.19
$$P\{M_n \to \infty\} = P\{M_n \to -\infty\} = 0.$$
Hence, a.s.
$$\{A_n \text{ i.o.}\} = \{\textstyle\sum_n 1_{A_n} = \infty\} = \{\textstyle\sum_n P[A_n|\mathcal{F}_{n-1}] = \infty\}.$$ □
A martingale $M$ or submartingale $X$ is said to be closed if $u = \sup T$
belongs to $T$. In the former case, clearly $M_t = E[M_u|\mathcal{F}_t]$ a.s. for all $t \in T$.
If instead $u \notin T$, we say that $M$ is closable if it can be extended to a
martingale on $\bar{T} = T \cup \{u\}$. If $M_t = E[\xi|\mathcal{F}_t]$ for some $\xi \in L^1$, we may clearly
choose $M_u = \xi$. The next result gives general criteria for closability. An
extension to continuous-time submartingales appears as part of Theorem
7.29.
Theorem 7.21 (uniform integrability and closure, Doob) For any
martingale $M$ on an unbounded index set $T$, these conditions are equivalent:
(i) $M$ is uniformly integrable;
(ii) $M$ is closable at $\sup T$;
(iii) $M$ is $L^1$-convergent at $\sup T$.
Under those conditions, $M$ is closable by the limit in (iii).
Proof: First note that (ii) implies (i) by Lemma 6.5. Next (i) implies (iii)
by Theorem 7.18 and Proposition 4.12. Finally, assume that $M_t \to \xi$ in $L^1$
as $t \to u = \sup T$. Using the $L^1$-contractivity of conditional expectations,
we get as $t \to u$ for fixed $s$,
$$M_s = E[M_t|\mathcal{F}_s] \to E[\xi|\mathcal{F}_s] \text{ in } L^1.$$
Thus, $M_s = E[\xi|\mathcal{F}_s]$ a.s., and we may take $M_u = \xi$. This shows that (iii)
implies (ii). □
For comparison, we may examine the case of $L^p$-convergence for $p > 1$.
Corollary 7.22 ($L^p$-convergence) Let $M$ be a martingale on an
unbounded index set $T$, and fix any $p > 1$. Then $M$ converges in $L^p$ iff it is
$L^p$-bounded.
Proof: We may clearly assume that $T$ is countable. If $M$ is $L^p$-bounded,
it converges in $L^1$ by Theorem 7.18. Since $|M|^p$ is also uniformly integrable
by Proposition 7.16, the convergence extends to $L^p$ by Proposition 4.12.
Conversely, if $M$ converges in $L^p$, it is $L^p$-bounded by Lemma 7.11. □
We now consider the convergence of martingales of the special form
$M_t = E[\xi|\mathcal{F}_t]$, as $t$ increases or decreases along some sequence. Without loss of
generality, we may assume that the index set $T$ is unbounded above or
below, and define respectively
$$\mathcal{F}_\infty = \bigvee_{t \in T} \mathcal{F}_t, \qquad \mathcal{F}_{-\infty} = \bigcap_{t \in T} \mathcal{F}_t.$$
Theorem 7.23 (conditioning limits, Jessen, Lévy) Let $\mathcal{F}$ be a filtration
on a countable index set $T \subset \mathbb{R}$ that is unbounded above or below. Then for
any $\xi \in L^1$, we have as $t \to \pm\infty$
$$E[\xi|\mathcal{F}_t] \to E[\xi|\mathcal{F}_{\pm\infty}] \text{ a.s. and in } L^1.$$
Proof: By Theorems 7.18 and 7.21, the martingale $M_t = E[\xi|\mathcal{F}_t]$
converges a.s. and in $L^1$ as $t \to \pm\infty$, and the limit $M_{\pm\infty}$ may clearly be taken
to be $\mathcal{F}_{\pm\infty}$-measurable. To see that $M_{\pm\infty} = E[\xi|\mathcal{F}_{\pm\infty}]$ a.s., we need to
verify the relations
$$E[M_{\pm\infty}; A] = E[\xi; A], \qquad A \in \mathcal{F}_{\pm\infty}. \tag{12}$$
Then note that, by the definition of $M$,
$$E[M_t; A] = E[\xi; A], \qquad A \in \mathcal{F}_s,\ s \le t. \tag{13}$$
This clearly remains true for $s = -\infty$, and as $t \to -\infty$ we get the "minus"
version of (12). To get the "plus" version, let $t \to \infty$ in (13) for fixed $s$,
and extend by a monotone class argument to arbitrary $A \in \mathcal{F}_\infty$. □
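A concrete instance of the "plus" version may help. The Python sketch below is an illustration under assumptions made here, not an example from the text: $U$ is uniform on $[0,1)$, $\xi = U^2$, and $\mathcal{F}_n$ is generated by the first $n$ binary digits of $U$, so that $E[\xi|\mathcal{F}_n]$ is the average of $x^2$ over the dyadic interval of length $2^{-n}$ containing $U$. Since $U$ is $\mathcal{F}_\infty$-measurable, the limit should be $\xi$ itself. The function name dyadic_conditional_mean is hypothetical.

```python
import math

def dyadic_conditional_mean(u, n):
    """E[U^2 | first n binary digits of U] at U = u: the mean of x^2 over [k/2^n, (k+1)/2^n)."""
    k = math.floor(u * 2**n)
    # 2^n * integral of x^2 over the interval equals ((k+1)^3 - k^3) / (3 * 4^n).
    return ((k + 1) ** 3 - k**3) / (3 * 4**n)

u = 0.3  # hypothetical sample value of U
for n in (0, 1, 2, 5, 10, 20):
    print(n, dyadic_conditional_mean(u, n))
print("limit", u * u)
```

At level $n = 0$ the value is the unconditional mean $1/3$, and the martingale property shows up as the fact that the value on each dyadic interval is the average of the values on its two halves.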
In particular, we note the following useful special case.
Corollary 7.24 (Lévy) For any filtration $\mathcal{F}$ on $\mathbb{Z}_+$, we have
$$P[A|\mathcal{F}_n] \to 1_A \text{ a.s.}, \qquad A \in \mathcal{F}_\infty.$$
For a simple application, we consider an extension of Kolmogorov's 0-1
law in Theorem 3.13. Say that two $\sigma$-fields agree a.s. if they have the same
completion with respect to the basic $\sigma$-field.
Corollary 7.25 (tail $\sigma$-field) If $\mathcal{F}_1, \mathcal{F}_2, \ldots$ and $\mathcal{G}$ are independent
$\sigma$-fields, then
$$\bigcap_n \sigma\{\mathcal{F}_n, \mathcal{F}_{n+1}, \ldots;\ \mathcal{G}\} = \mathcal{G} \text{ a.s.}$$
Proof: Let $\mathcal{T}$ denote the $\sigma$-field on the left, and note that
$\mathcal{T} \perp\!\!\!\perp_\mathcal{G} (\mathcal{F}_1 \vee \cdots \vee \mathcal{F}_n)$ by Proposition 6.8. Using Proposition 6.6 and
Corollary 7.24, we get for any $A \in \mathcal{T}$
$$P[A|\mathcal{G}] = P[A|\mathcal{G}, \mathcal{F}_1, \ldots, \mathcal{F}_n] \to 1_A \text{ a.s.},$$
which shows that $\mathcal{T} \subset \mathcal{G}$ a.s. The converse relation is obvious. □
The last theorem can be used to give a short proof of the law of large
numbers. Then let $\xi_1, \xi_2, \ldots$ be i.i.d. random variables in $L^1$, put
$S_n = \xi_1 + \cdots + \xi_n$, and define $\mathcal{F}_{-n} = \sigma\{S_n, S_{n+1}, \ldots\}$. Here $\mathcal{F}_{-\infty}$ is trivial by
Theorem 3.15, and for any $k \le n$ we have $E[\xi_k|\mathcal{F}_{-n}] = E[\xi_1|\mathcal{F}_{-n}]$ a.s.,
since $(\xi_k, S_n, S_{n+1}, \ldots) \overset{d}{=} (\xi_1, S_n, S_{n+1}, \ldots)$. Hence, by Theorem 7.23,
$$n^{-1}S_n = E[n^{-1}S_n|\mathcal{F}_{-n}] = n^{-1}\sum_{k \le n} E[\xi_k|\mathcal{F}_{-n}] = E[\xi_1|\mathcal{F}_{-n}] \to E[\xi_1|\mathcal{F}_{-\infty}] = E\xi_1.$$
As a further application of Theorem 7.23, we consider a kernel version
of the regularization Theorem 6.3. The result is needed in Chapter 21.
Proposition 7.26 (regular densities) For any measurable space $(S, \mathcal{S})$
and Borel spaces $(T, \mathcal{T})$ and $(U, \mathcal{U})$, let $\mu$ be a probability kernel from $S$
to $T \times U$. Then the densities
$$\nu(s,t,B) = \frac{\mu(s, dt \times B)}{\mu(s, dt \times U)}, \qquad s \in S,\ t \in T,\ B \in \mathcal{U}, \tag{14}$$
have versions that form a probability kernel from $S \times T$ to $U$.
Proof: We may assume $T$ and $U$ to be Borel subsets of $\mathbb{R}$, in which case
$\mu$ can be regarded as a probability kernel from $S$ to $\mathbb{R}^2$. Letting $\mathcal{D}_n$ denote
the $\sigma$-field in $\mathbb{R}$ generated by the intervals $I_{nk} = [(k-1)2^{-n}, k2^{-n})$, $k \in \mathbb{Z}$,
we define
$$M_n(s,t,B) = \sum_k \frac{\mu(s, I_{nk} \times B)}{\mu(s, I_{nk} \times U)}\,1\{t \in I_{nk}\}, \qquad s \in S,\ t \in T,\ B \in \mathcal{B},$$
under the convention $0/0 = 0$. Then $M_n(s, \cdot, B)$ is a version of the density
in (14) with respect to $\mathcal{D}_n$, and for fixed $s$ and $B$ it is also a martingale
with respect to $\mu(s, \cdot \times U)$. By Theorem 7.23 we get $M_n(s, \cdot, B) \to \nu(s, \cdot, B)$
a.e. $\mu(s, \cdot \times U)$. Thus, a product-measurable version of $\nu$ is given by
$$\nu(s,t,B) = \limsup_{n \to \infty} M_n(s,t,B), \qquad s \in S,\ t \in T,\ B \in \mathcal{U}.$$
It remains to find a version of $\nu$ that is a probability measure on $U$ for
fixed $s$ and $t$. Then proceed as in the proof of Theorem 6.3, noting that
in each step the exceptional $(s,t)$-set $A$ lies in $\mathcal{S} \otimes \mathcal{T}$ and is such that the
sections $A_s = \{t \in T;\ (s,t) \in A\}$ satisfy $\mu(s, A_s \times U) = 0$ for all $s \in S$. □
In order to extend the previous theory to martingales on $\mathbb{R}_+$, we need to
choose suitably regular versions of the studied processes. The next result
provides two closely related regularizations of a given submartingale. Say
that a process $X$ on $\mathbb{R}_+$ is right-continuous with left-hand limits
(abbreviated as rcll) if $X_t = X_{t+}$ for all $t \ge 0$ and the left-hand limits $X_{t-}$ exist
and are finite for all $t > 0$. For any process $Y$ on $\mathbb{Q}_+$, we write $Y^+$ for the
process of right-hand limits $Y_{t+}$, $t \ge 0$, provided that the latter exist.
Theorem 7.27 (regularization, Doob) For any $\mathcal{F}$-submartingale $X$ on
$\mathbb{R}_+$ with restriction $Y$ to $\mathbb{Q}_+$, we have:
(i) $Y^+$ exists and is rcll outside some fixed $P$-null set $A$, and $Z = 1_{A^c}Y^+$
is a submartingale with respect to the augmented filtration $\overline{\mathcal{F}^+}$.
(ii) If $\mathcal{F}$ is right-continuous, then $X$ has an rcll version iff $EX$ is
right-continuous; this holds in particular when $X$ is a martingale.
The proof requires an extension of Theorem 7.21 to suitable
submartingales.
Lemma 7.28 (uniform integrability) A submartingale $X$ on $\mathbb{Z}_-$ is
uniformly integrable iff $EX$ is bounded.
Proof: Let $EX$ be bounded. Introduce the predictable sequence
$$\alpha_n = E[\Delta X_n | \mathcal{F}_{n-1}] \ge 0, \quad n \le 0,$$
and note that
$$E \sum_{n \le 0} \alpha_n = EX_0 - \inf_{n \le 0} EX_n < \infty.$$
Hence, $\sum_n \alpha_n < \infty$ a.s., and so we may define
$$A_n = \sum_{k \le n} \alpha_k, \quad M_n = X_n - A_n, \quad n \le 0.$$
Since $EA^* < \infty$ and $M$ is a martingale closed at 0, both $A$ and $M$ are
uniformly integrable. □
Proof of Theorem 7.27: (i) By Lemma 7.11 the process $Y \vee 0$ is $L^1$-bounded
on bounded intervals, and so the same thing is true for $Y$. Thus,
by Theorem 7.18, the right- and left-hand limits $Y_{t\pm}$ exist outside some
fixed $P$-null set $A$, and so $Z = 1_{A^c} Y^+$ is rcll. Also note that $Z$ is adapted
to $\overline{\mathcal{F}}^+$.
To prove that $Z$ is an $\overline{\mathcal{F}}^+$-submartingale, fix any times $s < t$, and choose
$s_n \downarrow s$ and $t_n \downarrow t$ in $\mathbb{Q}_+$ with $s_n < t$. Then $Y_{s_m} \le E[Y_{t_n} | \mathcal{F}_{s_m}]$ a.s. for all
$m$ and $n$, and as $m \to \infty$ we get $Z_s \le E[Y_{t_n} | \mathcal{F}_{s+}]$ a.s. by Theorem 7.23.
Since $Y_{t_n} \to Z_t$ in $L^1$ by Lemma 7.28, it follows that $Z_s \le E[Z_t | \mathcal{F}_{s+}] =
E[Z_t | \overline{\mathcal{F}}_{s+}]$ a.s.
(ii) For any $t < t_n \in \mathbb{Q}_+$,
$$(EX)_{t_n} = E(Y_{t_n}), \qquad X_t \le E[Y_{t_n} | \mathcal{F}_t] \ \text{a.s.},$$
and as $t_n \downarrow t$ we get, by Lemma 7.28 and the right-continuity of $\mathcal{F}$,
$$(EX)_{t+} = EZ_t, \qquad X_t \le E[Z_t | \mathcal{F}_t] = Z_t \ \text{a.s.} \tag{15}$$
If $X$ has a right-continuous version, then clearly $Z_t = X_t$ a.s. Hence, (15)
yields $(EX)_{t+} = EX_t$, which shows that $EX$ is right-continuous. If instead
$EX$ is right-continuous, then (15) gives $E|Z_t - X_t| = EZ_t - EX_t = 0$, and
so $Z_t = X_t$ a.s., which means that $Z$ is a version of $X$. □
Justified by the last theorem, we henceforth assume all submartingales
to be rcll, unless otherwise specified, and also that the underlying filtration
is right-continuous and complete. Most of the previously quoted results
for submartingales on a countable index set extend immediately to such a
context. In particular, this is true for the convergence Theorem 7.18 and
the inequalities in Proposition 7.15 and Lemma 7.17. We proceed to show
how Theorems 7.12 and 7.21 extend to submartingales in continuous time.
Theorem 7.29 (optional sampling and closure, Doob) Let $X$ be an $\mathcal{F}$-submartingale
on $\mathbb{R}_+$, where $X$ and $\mathcal{F}$ are right-continuous, and consider
two optional times $\sigma$ and $\tau$, where $\tau$ is bounded. Then $X_\tau$ is integrable, and
$$X_{\sigma \wedge \tau} \le E[X_\tau | \mathcal{F}_\sigma] \ \text{a.s.} \tag{16}$$
The statement extends to unbounded times $\tau$ iff $X^+$ is uniformly integrable.
Proof: Introduce the optional times $\sigma_n = 2^{-n}[2^n \sigma + 1]$ and $\tau_n =
2^{-n}[2^n \tau + 1]$, and conclude from Lemma 7.10 and Theorem 7.12 that
$$X_{\sigma_m \wedge \tau_n} \le E[X_{\tau_n} | \mathcal{F}_{\sigma_m}] \ \text{a.s.}, \quad m, n \in \mathbb{N}.$$
As $m \to \infty$, we get by Lemma 7.3 and Theorem 7.23
$$X_{\sigma \wedge \tau_n} \le E[X_{\tau_n} | \mathcal{F}_\sigma] \ \text{a.s.}, \quad n \in \mathbb{N}. \tag{17}$$
By the result for the index sets $2^{-n}\mathbb{Z}_+$, the random variables $X_0; \ldots,
X_{\tau_2}, X_{\tau_1}$ form a submartingale with bounded mean and are therefore
uniformly integrable by Lemma 7.28. Thus, (16) follows as we let $n \to \infty$ in
(17).
If $X^+$ is uniformly integrable, then $X$ is $L^1$-bounded and hence converges
a.s. toward some $X_\infty \in L^1$. By Proposition 4.12 we get $X_t^+ \to X_\infty^+$ in $L^1$,
and so $E[X_t^+ | \mathcal{F}_s] \to E[X_\infty^+ | \mathcal{F}_s]$ in $L^1$ for each $s$. Letting $t \to \infty$ along a
sequence, we get by Fatou's lemma
$$X_s \le \lim_t E[X_t^+ | \mathcal{F}_s] - \liminf_t E[X_t^- | \mathcal{F}_s]
\le E[X_\infty^+ | \mathcal{F}_s] - E[X_\infty^- | \mathcal{F}_s] = E[X_\infty | \mathcal{F}_s].$$
We may now approximate as before to obtain (16) for arbitrary $\sigma$ and $\tau$.
Conversely, the stated condition implies that there exists some $X_\infty \in L^1$
with $X_s \le E[X_\infty | \mathcal{F}_s]$ a.s. for all $s \ge 0$, and so $X_s^+ \le E[X_\infty^+ | \mathcal{F}_s]$ a.s. by
Lemma 7.11. Hence, $X^+$ is uniformly integrable by Lemma 6.5. □
For a simple application, we consider the hitting probabilities of a con-
tinuous martingale. The result will be useful in Chapters 14, 17, and
23.
Corollary 7.30 (first hit) Let $M$ be a continuous martingale with $M_0 = 0$
and $P\{M^* > 0\} > 0$, and define $\tau_x = \inf\{t > 0;\ M_t = x\}$. Then
$$P[\tau_a < \tau_b \,|\, M^* > 0] \le \frac{b}{b - a} \le P[\tau_a \le \tau_b \,|\, M^* > 0], \quad a < 0 < b.$$
Proof: Since $\tau = \tau_a \wedge \tau_b$ is optional by Lemma 7.6, Theorem 7.29 yields
$EM_{\tau \wedge t} = 0$ for all $t \ge 0$, and so by dominated convergence $EM_\tau = 0$.
Hence,
$$\begin{aligned}
0 &= aP\{\tau_a < \tau_b\} + bP\{\tau_b < \tau_a\} + E[M_\infty;\ \tau = \infty] \\
&\le aP\{\tau_a < \tau_b\} + bP\{\tau_b \le \tau_a,\ M^* > 0\} \\
&= bP\{M^* > 0\} - (b - a)P\{\tau_a < \tau_b\},
\end{aligned}$$
which implies the first inequality. The second one follows by taking
complements. □
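As a discrete sanity check on the bound $b/(b-a)$, the following sketch replaces the continuous martingale by a simple symmetric random walk on the integers. This substitution is an assumption made only for illustration: the walk cannot jump over integer levels and a.s. leaves $(a, b)$, so the same optional sampling argument gives equality. The function name `hit_a_before_b` is hypothetical. It solves the harmonic equations $h(x) = (h(x-1) + h(x+1))/2$ with $h(a) = 1$, $h(b) = 0$ in exact rational arithmetic.

```python
from fractions import Fraction


def hit_a_before_b(a: int, b: int) -> Fraction:
    """P{tau_a < tau_b} for a simple symmetric random walk started at 0.

    Solves h(x) = (h(x-1) + h(x+1)) / 2 on a < x < b with h(a) = 1, h(b) = 0
    by the tridiagonal (Thomas) algorithm, using exact fractions.
    """
    assert a < 0 < b
    n = b - a - 1                      # interior points a+1, ..., b-1
    half = Fraction(1, 2)
    rhs = [Fraction(0)] * n
    rhs[0] += half                     # boundary value h(a) = 1 enters here
    # Forward sweep for -h(x-1)/2 + h(x) - h(x+1)/2 = rhs(x).
    c = [Fraction(0)] * n
    d = [Fraction(0)] * n
    c[0], d[0] = -half, rhs[0]
    for i in range(1, n):
        denom = 1 + half * c[i - 1]
        c[i] = -half / denom
        d[i] = (rhs[i] + half * d[i - 1]) / denom
    # Back substitution.
    h = [Fraction(0)] * n
    h[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        h[i] = d[i] - c[i] * h[i + 1]
    return h[-a - 1]                   # position of the starting point x = 0


print(hit_a_before_b(-2, 3))           # compare with b / (b - a)
```

In this discrete setting the computed value agrees with $b/(b-a)$ exactly, which is the common value of the two bounds in Corollary 7.30 when $M^* > 0$ a.s. and the two hitting times cannot coincide.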
The next result plays a crucial role in Chapter 19.
Lemma 7.31 (absorption) Let $X \ge 0$ be a right-continuous supermartingale,
and put $\tau = \inf\{t \ge 0;\ X_t \wedge X_{t-} = 0\}$. Then $X = 0$ a.s. on
$[\tau, \infty)$.
Proof: By Theorem 7.27 the process $X$ remains a supermartingale with
respect to the right-continuous filtration $\mathcal{F}^+$. The times $\tau_n = \inf\{t \ge
0;\ X_t < n^{-1}\}$ are $\mathcal{F}^+$-optional by Lemma 7.6, and by the right-continuity
of $X$ we have $X_{\tau_n} \le n^{-1}$ on $\{\tau_n < \infty\}$. Hence, by Theorem 7.29,
$$E[X_t;\ \tau_n \le t] \le E[X_{\tau_n};\ \tau_n \le t] \le n^{-1}, \quad t \ge 0,\ n \in \mathbb{N}.$$
Noting that $\tau_n \uparrow \tau$, we get by dominated convergence $E[X_t;\ \tau \le t] = 0$,
and so $X_t = 0$ a.s. on $\{\tau \le t\}$. The assertion now follows, as we apply this
result to all $t \in \mathbb{Q}_+$ and use the right-continuity of $X$. □
We proceed to show how the right-continuity of an increasing sequence of
supermartingales extends to the limit. The result is needed in Chapter 25.
Theorem 7.32 (increasing limits of supermartingales, Meyer) Let $X^1 \le
X^2 \le \cdots$ be right-continuous supermartingales with $\sup_n EX_0^n < \infty$. Then
$X_t = \sup_n X_t^n$, $t \ge 0$, is again an a.s. right-continuous supermartingale.
Proof (Doob): By Theorem 7.27 we may assume the filtration to be right-continuous.
The supermartingale property carries over to $X$ by monotone
convergence. To prove the asserted right-continuity, we may assume that $X^1$
is bounded below by an integrable random variable; otherwise consider the
processes obtained by optional stopping at the times $m \wedge \inf\{t;\ X_t^1 < -m\}$
for arbitrary $m > 0$.
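Now fix any $\varepsilon > 0$, let $\mathcal{T}$ denote the class of optional times $\tau$ with
$$\limsup_{u \downarrow t} |X_u - X_t| \le 2\varepsilon, \quad t < \tau,$$
and put $p = \inf_{\tau \in \mathcal{T}} Ee^{-\tau}$. Choose $\sigma_1, \sigma_2, \ldots \in \mathcal{T}$ with $Ee^{-\sigma_n} \to p$, and
note that $\sigma \equiv \sup_n \sigma_n \in \mathcal{T}$ with $Ee^{-\sigma} = p$. We need to show that $\sigma = \infty$
a.s. Then introduce the optional times
$$\tau_n = \inf\{t > \sigma;\ |X_t^n - X_\sigma| > \varepsilon\}, \quad n \in \mathbb{N},$$
and put $\tau \equiv \limsup_n \tau_n$. Noting that
$$|X_t - X_\sigma| = \liminf_{n \to \infty} |X_t^n - X_\sigma| \le \varepsilon, \quad t \in [\sigma, \tau),$$
we obtain $\tau \in \mathcal{T}$.
By the right-continuity of $X^n$, we note that $|X^n_{\tau_n} - X_\sigma| \ge \varepsilon$ on $\{\tau_n < \infty\}$
for every $n$. Furthermore, on the set $A = \{\sigma = \tau < \infty\}$ we have
$$\liminf_{n \to \infty} X^n_{\tau_n} \ge \sup_k \lim_n X^k_{\tau_n} = \sup_k X^k_\sigma = X_\sigma,$$
and so $\liminf_n X^n_{\tau_n} \ge X_\sigma + \varepsilon$ on $A$. Since $A \in \mathcal{F}_\sigma$ by Lemma 7.1, we get
by Fatou's lemma, optional sampling, and monotone convergence,
$$E[X_\sigma + \varepsilon;\ A] \le E[\liminf_n X^n_{\tau_n};\ A] \le \liminf_n E[X^n_{\tau_n};\ A]
\le \lim_n E[X^n_\sigma;\ A] = E[X_\sigma;\ A].$$
Thus, $PA = 0$, and so $\tau > \sigma$ a.s. on $\{\sigma < \infty\}$. If $p > 0$, we get the
contradiction $Ee^{-\tau} < p$, so $p = 0$. Hence, $\sigma = \infty$ a.s. □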
Exercises
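1. Show for any optional times $\sigma$ and $\tau$ that $\{\sigma = \tau\} \in \mathcal{F}_\sigma \cap \mathcal{F}_\tau$ and
$\mathcal{F}_\sigma = \mathcal{F}_\tau$ on $\{\sigma = \tau\}$. However, $\mathcal{F}_\tau$ and $\mathcal{F}_\infty$ may differ on $\{\tau = \infty\}$.
2. Show that if $\sigma$ and $\tau$ are optional times on the time scale $\mathbb{R}_+$ or $\mathbb{Z}_+$,
then so is $\sigma + \tau$.
3. Give an example of a random time that is weakly optional but not
optional. (Hint: Let $\mathcal{F}$ be the filtration induced by the process $X_t = \vartheta t$
with $P\{\vartheta = \pm 1\} = \frac{1}{2}$, and take $\tau = \inf\{t;\ X_t > 0\}$.)
4. Fix a random time $\tau$ and a random variable $\xi$ in $\mathbb{R} \setminus \{0\}$. Show that
the process $X_t = \xi\, 1\{\tau \le t\}$ is adapted to a given filtration $\mathcal{F}$ iff $\tau$ is
$\mathcal{F}$-optional and $\xi$ is $\mathcal{F}_\tau$-measurable. Give corresponding conditions for the
process $Y_t = \xi\, 1\{\tau < t\}$.
5. Let $\mathcal{P}$ denote the class of sets $A \in \mathbb{R}_+ \times \Omega$ such that the process $1_A$ is
progressive. Show that $\mathcal{P}$ is a $\sigma$-field and that a process $X$ is progressive
iff it is $\mathcal{P}$-measurable.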
6. Let $X$ be a progressive process with induced filtration $\mathcal{F}$, and fix any
optional time $\tau < \infty$. Show that $\sigma\{\tau, X^\tau\} \subset \mathcal{F}_\tau \subset \mathcal{F}_\tau^+ \subset \sigma\{\tau, X^{\tau+h}\}$ for
every $h > 0$. (Hint: The first relation becomes an equality when $\tau$ takes only
countably many values.) Note that the result may fail when $P\{\tau = \infty\} > 0$.
7. Let $M$ be an $\mathcal{F}$-martingale on some countable index set, and fix an
optional time $\tau$. Show that $M - M^\tau$ remains a martingale conditionally
on $\mathcal{F}_\tau$. (Hint: Use Theorem 7.12 and Lemma 7.13.) Extend the result to
continuous time.
8. Show that any submartingale remains a submartingale with respect to
the induced filtration.
9. Let $X^1, X^2, \ldots$ be submartingales such that the process $X = \sup_n X^n$
is integrable. Show that $X$ is again a submartingale. Also show that
$\limsup_n X^n$ is a submartingale when even $\sup_n |X^n|$ is integrable.
10. Show that the Doob decomposition of an integrable random sequence
$X = (X_n)$ depends on the filtration unless $X$ is a.s. $X_0$-measurable.
(Hint: Compare the filtrations induced by $X$ and by the sequence $Y_n =
(X_0, X_{n+1})$.)
11. Fix a random time $\tau$ and a random variable $\xi \in L^1$, and define $M_t =
\xi\, 1\{\tau \le t\}$. Show that $M$ is a martingale with respect to the induced
filtration $\mathcal{F}$ iff $E[\xi;\ \tau \le t \,|\, \tau > s] = 0$ for any $s < t$. (Hint: The set $\{\tau > s\}$
is an atom of $\mathcal{F}_s$.)
12. Let $\mathcal{F}$ and $\mathcal{G}$ be filtrations on a common probability space. Show that
every $\mathcal{F}$-martingale is a $\mathcal{G}$-martingale iff $\mathcal{F}_t \subset \mathcal{G}_t \perp\!\!\!\perp_{\mathcal{F}_t} \mathcal{F}_\infty$ for every $t \ge 0$.
(Hint: For the necessity, consider $\mathcal{F}$-martingales of the form $M_s = E[\xi | \mathcal{F}_s]$
with $\xi \in L^1(\mathcal{F}_t)$.)
13. Show for any rcll supermartingale $X \ge 0$ and constant $r \ge 0$ that
$rP\{\sup_t X_t \ge r\} \le EX_0$.
14. Let $M$ be an $L^2$-bounded martingale on $\mathbb{Z}_+$. Imitate the proof of
Lemma 4.16 to show that $M_n$ converges a.s. and in $L^2$.
15. Give an example of a martingale that is $L^1$-bounded but not uniformly
integrable. (Hint: Every positive martingale is $L^1$-bounded.)
16. Show that if $\mathcal{G} \perp\!\!\!\perp_{\mathcal{F}_n} \mathcal{H}$ for some increasing $\sigma$-fields $\mathcal{F}_n$, then $\mathcal{G} \perp\!\!\!\perp_{\mathcal{F}_\infty} \mathcal{H}$.
17. Let $\xi_n \to \xi$ in $L^1$. Show for any increasing $\sigma$-fields $\mathcal{F}_n$ that $E[\xi_n | \mathcal{F}_n]
\to E[\xi | \mathcal{F}_\infty]$ in $L^1$.
18. Let $\xi, \xi_1, \xi_2, \ldots \in L^1$ with $\xi_n \uparrow \xi$ a.s. Show for any increasing $\sigma$-fields
$\mathcal{F}_n$ that $E[\xi_n | \mathcal{F}_n] \to E[\xi | \mathcal{F}_\infty]$ a.s. (Hint: By Proposition 7.15 we have
$\sup_m E[\xi - \xi_n | \mathcal{F}_m] \xrightarrow{P} 0$. Now use the monotonicity.)
19. Show that any right-continuous submartingale is a.s. rcll.
20. Let $\sigma$ and $\tau$ be optional times with respect to some right-continuous
filtration $\mathcal{F}$. Show that the operators $E^{\mathcal{F}_\sigma}$ and $E^{\mathcal{F}_\tau}$ commute on $L^1$ with
product $E^{\mathcal{F}_{\sigma \wedge \tau}}$. (Hint: For any $\xi \in L^1$, apply the optional sampling theorem
to a right-continuous version of the martingale $M_t = E[\xi | \mathcal{F}_t]$.)
21. Let $X \ge 0$ be a supermartingale on $\mathbb{Z}_+$, and let $\tau_0 \le \tau_1 \le \cdots$ be
optional times. Show that the sequence $(X_{\tau_n})$ is again a supermartingale.
(Hint: Truncate the times $\tau_n$, and use the conditional Fatou lemma.) Show
by an example that the result fails for submartingales.
22. For any random time $\tau \ge 0$ and right-continuous filtration $\mathcal{F} = (\mathcal{F}_t)$,
show that the process $X_t = P[\tau \le t | \mathcal{F}_t]$ has a right-continuous version.
(Hint: Use Theorem 7.27 (ii).)
Chapter 8
Markov Processes
and Discrete-Time Chains
Markov property and transition kernels; finite-dimensional dis-
tributions and existence; space and time homogeneity; strong
Markov property and excursions; invariant distributions and
stationarity; recurrence and transience; ergodic behavior of
irreducible chains; mean recurrence times .
A Markov process may be described informally as a randomized dynamical
system, a description that explains the fundamental role that Markov pro-
cesses play both in theory and in a wide range of applications. Processes of
this type appear more or less explicitly throughout the remainder of this
book.
To make the above description precise, let us fix any Borel space $S$ and
filtration $\mathcal{F}$. An adapted process $X$ in $S$ is said to be Markov if for any
times $s \le t$ we have $X_t = f_{s,t}(X_s, \vartheta_{s,t})$ a.s. for some measurable function
$f_{s,t}$ and some $U(0,1)$ random variable $\vartheta_{s,t} \perp\!\!\!\perp \mathcal{F}_s$. The stated condition
is equivalent to the less transparent conditional independence $X_t \perp\!\!\!\perp_{X_s} \mathcal{F}_s$.
The process is said to be time-homogeneous if we can take $f_{s,t} = f_{0,t-s}$
and space-homogeneous (when $S = \mathbb{R}^d$) if $f_{s,t}(x, \cdot) = f_{s,t}(0, \cdot) + x$. A more
convenient description of the evolution is in terms of the transition kernels
$\mu_{s,t}(x, \cdot) = P\{f_{s,t}(x, \vartheta) \in \cdot\}$, which are easily seen to satisfy an a.s.
version of the Chapman-Kolmogorov relation $\mu_{s,t}\mu_{t,u} = \mu_{s,u}$. In the usual
axiomatic treatment, the latter equation is assumed to hold identically.
This chapter is devoted to some of the most basic and elementary por-
tions of Markov process theory. Thus, the space homogeneity will be shown
to be equivalent to the independence of the increments, which motivates
our discussion of random walks and Levy processes in Chapters 9 and 15.
In the time-homogeneous case we shall establish a primitive form of the
strong Markov property and see how the result simplifies when the pro-
cess is also space-homogeneous. Next we shall see how invariance of the
initial distribution implies stationarity of the process, which motivates our
treatment of stationary processes in Chapter 10. Finally, we shall discuss
the classification of states and examine the ergodic behavior of discrete-
time Markov chains on a countable state space. The analogous but less
elementary theory for continuous-time chains is postponed until Chapter
12.
The general theory of Markov processes is more advanced and is not
continued until Chapter 19, which develops the basic theory of Feller pro-
cesses. In the meantime we shall consider several important subclasses,
such as the pure jump-type processes in Chapter 12, Brownian motion and
related processes in Chapters 13 and 18, and the above-mentioned random
walks and Levy processes in Chapters 9 and 15. A detailed discussion of
diffusion processes appears in Chapters 21 and 23, and additional aspects
of Brownian motion are considered in Chapters 22, 24, and 25.
To begin our systematic study of Markov processes, consider an arbitrary
time scale $T \subset \mathbb{R}$, equipped with a filtration $\mathcal{F} = (\mathcal{F}_t)$, and fix a measurable
space $(S, \mathcal{S})$. An $S$-valued process $X$ on $T$ is said to be a Markov process if
it is adapted to $\mathcal{F}$ and such that
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} X_u, \quad t \le u \text{ in } T. \tag{1}$$
Just as for the martingale property, we note that even the Markov property
depends on the choice of filtration, with the weakest version obtained for the
filtration induced by X. The simple property in (1) may be strengthened
as follows.
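Lemma 8.1 (extended Markov property) If $X$ satisfies (1), then
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} \{X_u;\ u \ge t\}, \quad t \in T. \tag{2}$$
Proof: Fix any $t = t_0 \le t_1 \le \cdots$ in $T$. By (1) we have $\mathcal{F}_{t_n} \perp\!\!\!\perp_{X_{t_n}} X_{t_{n+1}}$
for every $n \ge 0$, and so by Proposition 6.8
$$\mathcal{F}_t \perp\!\!\!\perp_{X_{t_0}, \ldots, X_{t_n}} X_{t_{n+1}}, \quad n \ge 0.$$
By the same proposition, this is equivalent to
$$\mathcal{F}_t \perp\!\!\!\perp_{X_t} (X_{t_1}, X_{t_2}, \ldots),$$
and (2) follows by a monotone class argument. □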
For any times $s \le t$ in $T$, we assume the existence of some regular
conditional distributions
$$\mu_{s,t}(X_s, B) = P[X_t \in B | X_s] = P[X_t \in B | \mathcal{F}_s] \ \text{a.s.}, \quad B \in \mathcal{S}. \tag{3}$$
In particular, we note that the transition kernels $\mu_{s,t}$ exist by Theorem
6.3 when $S$ is Borel. We may further introduce the one-dimensional distributions
$\nu_t = \mathcal{L}(X_t)$, $t \in T$. When $T$ begins at 0, we shall prove that the
distribution of $X$ is uniquely determined by the kernels $\mu_{s,t}$ together with
the initial distribution $\nu_0$.
For a precise statement, it is convenient to use the kernel operations
introduced in Chapter 1. Note in particular that if $\mu$ and $\nu$ are kernels on
$S$, then $\mu \otimes \nu$ and $\mu\nu$ are kernels from $S$ to $S^2$ and $S$, respectively, given
for $s \in S$ by
$$\begin{aligned}
(\mu \otimes \nu)(s, B) &= \int \mu(s, dt) \int \nu(t, du)\, 1_B(t, u), & B &\in \mathcal{S}^2, \\
(\mu\nu)(s, B) &= (\mu \otimes \nu)(s, S \times B) = \int \mu(s, dt)\, \nu(t, B), & B &\in \mathcal{S}.
\end{aligned}$$
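On a finite state space the two kernel operations reduce to elementary matrix manipulations. The sketch below assumes $S = \{0, 1\}$ and two made-up kernels `mu` and `nu` stored as row-stochastic matrices; the helper names `compose` and `tensor` are hypothetical. It illustrates that $\mu\nu$ is the second marginal of $\mu \otimes \nu$.

```python
from fractions import Fraction as F


def compose(mu, nu):
    """(mu nu)(s, {u}) = sum_t mu(s, {t}) nu(t, {u}): the matrix product."""
    return [[sum(mu[s][t] * nu[t][u] for t in range(len(nu)))
             for u in range(len(nu[0]))] for s in range(len(mu))]


def tensor(mu, nu):
    """(mu (x) nu)(s, {(t, u)}) = mu(s, {t}) nu(t, {u}), one dict per s."""
    return [{(t, u): mu[s][t] * nu[t][u]
             for t in range(len(nu)) for u in range(len(nu[0]))}
            for s in range(len(mu))]


# Two illustrative kernels on S = {0, 1} (assumed values).
mu = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
nu = [[F(1, 3), F(2, 3)], [F(1), F(0)]]

joint = tensor(mu, nu)
# Summing out the intermediate coordinate t recovers the composition mu nu.
marginal = [[sum(p for (t, u), p in joint[s].items() if u == v)
             for v in range(2)] for s in range(2)]
print(marginal == compose(mu, nu))
```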
Proposition 8.2 (finite-dimensional distributions) Let $X$ be a Markov
process on $T$ with one-dimensional distributions $\nu_t$ and transition kernels
$\mu_{s,t}$. Then for any $t_0 \le \cdots \le t_n$ in $T$,
$$\mathcal{L}(X_{t_0}, \ldots, X_{t_n}) = \nu_{t_0} \otimes \mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}, \tag{4}$$
$$P[(X_{t_1}, \ldots, X_{t_n}) \in \cdot \,|\, \mathcal{F}_{t_0}] = (\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(X_{t_0}, \cdot). \tag{5}$$
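Proof: Formula (4) is clearly true for $n = 0$. Proceeding by induction,
assume (4) to be true with $n$ replaced by $n - 1$, and fix any bounded
measurable function $f$ on $S^{n+1}$. Noting that $X_{t_0}, \ldots, X_{t_{n-1}}$ are $\mathcal{F}_{t_{n-1}}$-measurable,
we get by Theorem 6.4 and the induction hypothesis
$$\begin{aligned}
Ef(X_{t_0}, \ldots, X_{t_n}) &= E\, E[f(X_{t_0}, \ldots, X_{t_n}) | \mathcal{F}_{t_{n-1}}] \\
&= E \int f(X_{t_0}, \ldots, X_{t_{n-1}}, x_n)\, \mu_{t_{n-1},t_n}(X_{t_{n-1}}, dx_n) \\
&= (\nu_{t_0} \otimes \mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}) f,
\end{aligned}$$
as desired. This completes the proof of (4).
In particular, for any $B \in \mathcal{S}$ and $C \in \mathcal{S}^n$ we get
$$\begin{aligned}
P\{(X_{t_0}, \ldots, X_{t_n}) \in B \times C\} &= \int_B \nu_{t_0}(dx)\, (\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(x, C) \\
&= E[(\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n})(X_{t_0}, C);\ X_{t_0} \in B],
\end{aligned}$$
and (5) follows by Theorem 6.1 and Lemma 8.1. □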
An obvious consistency requirement leads to the following basic so-called
Chapman-Kolmogorov relation between the transition kernels. Here we say
that two kernels $\mu$ and $\mu'$ agree a.s. if $\mu(x, \cdot) = \mu'(x, \cdot)$ for almost every $x$.
Corollary 8.3 (Chapman, Smoluchovsky) For any Markov process in a
Borel space $S$, we have
$$\mu_{s,u} = \mu_{s,t}\,\mu_{t,u} \ \text{a.s. } \nu_s, \quad s \le t \le u.$$
Proof: By Proposition 8.2 we have a.s. for any $B \in \mathcal{S}$
$$\begin{aligned}
\mu_{s,u}(X_s, B) &= P[X_u \in B | \mathcal{F}_s] = P[(X_t, X_u) \in S \times B | \mathcal{F}_s] \\
&= (\mu_{s,t} \otimes \mu_{t,u})(X_s, S \times B) = (\mu_{s,t}\mu_{t,u})(X_s, B).
\end{aligned}$$
Since $S$ is Borel, we may choose a common null set for all $B$. □
We henceforth assume that the Chapman-Kolmogorov relation holds
identically, so that
$$\mu_{s,u} = \mu_{s,t}\,\mu_{t,u}, \quad s \le t \le u. \tag{6}$$
Thus, we define a Markov process by condition (3), in terms of some transition
kernels $\mu_{s,t}$ satisfying (6). In discrete time, when $T = \mathbb{Z}_+$, the latter
relation is no restriction, since we may then start from any versions of the
kernels $\mu_n = \mu_{n-1,n}$, and define $\mu_{m,n} = \mu_{m+1} \cdots \mu_n$ for arbitrary $m < n$.
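The discrete-time remark can be checked directly on a finite state space, where kernels are matrices and (6) is just associativity of the matrix product. The sketch below assumes three arbitrary one-step kernels on $\{0, 1\}$ (the dictionary `one_step` and the function `mu` are hypothetical names) and builds $\mu_{m,n}$ as the product $\mu_{m+1} \cdots \mu_n$.

```python
from fractions import Fraction as F


def matmul(a, b):
    """Product of two kernels on a finite state space."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def identity(n):
    """The identity kernel mu_{m,m}(x, .) = delta_x."""
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]


# Assumed one-step kernels mu_1, mu_2, mu_3 (time-inhomogeneous chain).
one_step = {
    1: [[F(1, 2), F(1, 2)], [F(1, 3), F(2, 3)]],
    2: [[F(1, 4), F(3, 4)], [F(1), F(0)]],
    3: [[F(0), F(1)], [F(2, 5), F(3, 5)]],
}


def mu(m, n):
    """mu_{m,n} = mu_{m+1} ... mu_n for m <= n (identity when m == n)."""
    out = identity(2)
    for k in range(m + 1, n + 1):
        out = matmul(out, one_step[k])
    return out


# Chapman-Kolmogorov (6) then holds identically for all s <= t <= u.
print(all(mu(s, u) == matmul(mu(s, t), mu(t, u))
          for s in range(4) for t in range(s, 4) for u in range(t, 4)))
```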
Given such a family of transition kernels $\mu_{s,t}$ and an arbitrary initial
distribution $\nu$, we need to show that an associated Markov process exists.
This is ensured, under weak restrictions, by the following result.
Theorem 8.4 (existence, Kolmogorov) Fix a time scale $T$ starting at 0, a
Borel space $(S, \mathcal{S})$, a probability measure $\nu$ on $S$, and a family of probability
kernels $\mu_{s,t}$ on $S$, $s \le t$ in $T$, satisfying (6). Then there exists an $S$-valued
Markov process $X$ on $T$ with initial distribution $\nu$ and transition kernels
$\mu_{s,t}$.
Proof: Introduce the probability measures
$$\nu_{t_1, \ldots, t_n} = \nu\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{n-1},t_n}, \quad 0 = t_0 \le t_1 \le \cdots \le t_n,\ n \in \mathbb{N}.$$
To see that the family $(\nu_{t_0, \ldots, t_n})$ is projective, let $B \in \mathcal{S}^{n-1}$ be arbitrary,
and define for any $k \in \{1, \ldots, n\}$ the set
$$B_k = \{(x_1, \ldots, x_n) \in S^n;\ (x_1, \ldots, x_{k-1}, x_{k+1}, \ldots, x_n) \in B\}.$$
Then by (6)
$$\begin{aligned}
\nu_{t_1, \ldots, t_n} B_k &= (\nu\mu_{t_0,t_1} \otimes \cdots \otimes \mu_{t_{k-1},t_{k+1}} \otimes \cdots \otimes \mu_{t_{n-1},t_n}) B \\
&= \nu_{t_1, \ldots, t_{k-1}, t_{k+1}, \ldots, t_n} B,
\end{aligned}$$
as desired. By Theorem 6.16 there exists an $S$-valued process $X$ on $T$ with
$$\mathcal{L}(X_{t_1}, \ldots, X_{t_n}) = \nu_{t_1, \ldots, t_n}, \quad t_1 \le \cdots \le t_n,\ n \in \mathbb{N}, \tag{7}$$
and, in particular, $\mathcal{L}(X_0) = \nu_0 = \nu$.
To see that $X$ is Markov with transition kernels $\mu_{s,t}$, fix any times $s_1 \le
\cdots \le s_n = s \le t$ and sets $B \in \mathcal{S}^n$ and $C \in \mathcal{S}$, and conclude from (7) that
$$\begin{aligned}
P\{(X_{s_1}, \ldots, X_{s_n}, X_t) \in B \times C\} &= \nu_{s_1, \ldots, s_n, t}(B \times C) \\
&= E[\mu_{s,t}(X_s, C);\ (X_{s_1}, \ldots, X_{s_n}) \in B].
\end{aligned}$$
Writing $\mathcal{F}$ for the filtration induced by $X$, we get by a monotone class
argument
$$P[X_t \in C;\ A] = E[\mu_{s,t}(X_s, C);\ A], \quad A \in \mathcal{F}_s,$$
and so $P[X_t \in C | \mathcal{F}_s] = \mu_{s,t}(X_s, C)$ a.s. □
Now assume that $S$ is a measurable Abelian group. A kernel $\mu$ on $S$ is
then said to be homogeneous if
$$\mu(x, B) = \mu(0, B - x), \quad x \in S,\ B \in \mathcal{S}.$$
An $S$-valued Markov process with homogeneous transition kernels $\mu_{s,t}$ is
said to be space-homogeneous. Furthermore, we say that a process $X$ in
$S$ has independent increments if, for any times $t_0 \le \cdots \le t_n$, the increments
$X_{t_k} - X_{t_{k-1}}$ are mutually independent and independent of $X_0$. More
generally, given any filtration $\mathcal{F}$ on $T$, we say that $X$ has $\mathcal{F}$-independent
increments if $X$ is adapted to $\mathcal{F}$ and such that $X_t - X_s \perp\!\!\!\perp \mathcal{F}_s$ for all $s \le t$
in $T$. Note that the elementary notion of independence corresponds to the
case when $\mathcal{F}$ is induced by $X$.
Proposition 8.5 (independent increments and homogeneity) Consider a
measurable Abelian group $S$, a filtration $\mathcal{F}$ on some time scale $T$, and an
$S$-valued and $\mathcal{F}$-adapted process $X$ on $T$. Then $X$ is space-homogeneous $\mathcal{F}$-Markov
iff it has $\mathcal{F}$-independent increments, in which case the transition
kernels are given by
$$\mu_{s,t}(x, B) = P\{X_t - X_s \in B - x\}, \quad x \in S,\ B \in \mathcal{S},\ s \le t \text{ in } T. \tag{8}$$
Proof: First assume that $X$ is Markov with transition kernels
$$\mu_{s,t}(x, B) = \mu_{s,t}(B - x), \quad x \in S,\ B \in \mathcal{S},\ s \le t \text{ in } T. \tag{9}$$
By Theorem 6.4, for any $s \le t$ in $T$ and $B \in \mathcal{S}$ we get
$$P[X_t - X_s \in B | \mathcal{F}_s] = P[X_t \in B + X_s | \mathcal{F}_s] = \mu_{s,t}(X_s, B + X_s) = \mu_{s,t}B.$$
Thus, $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution $\mu_{s,t}$, and (8) follows
by means of (9).
Conversely, assume that $X_t - X_s$ is independent of $\mathcal{F}_s$ with distribution
$\mu_{s,t}$. Defining the associated kernel $\mu_{s,t}$ by (9), we get by Theorem 6.4, for
any $s$, $t$, and $B$ as before,
$$P[X_t \in B | \mathcal{F}_s] = P[X_t - X_s \in B - X_s | \mathcal{F}_s] = \mu_{s,t}(B - X_s) = \mu_{s,t}(X_s, B).$$
Thus, $X$ is Markov with the homogeneous transition kernels in (9). □
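For a concrete instance of a homogeneous kernel, one may take the cyclic group $\mathbb{Z}_5$ and a random walk whose increments have a fixed law. In the sketch below the modulus `M`, the increment law `step`, and the helper names are all assumptions chosen for illustration; the check is that $\mu(x, B) = \mu(0, B - x)$ for every $x$ and every subset $B$.

```python
from fractions import Fraction as F

M = 5                                          # the group Z_5, addition mod 5
step = {1: F(1, 2), 4: F(1, 3), 0: F(1, 6)}    # assumed law of one increment


def kernel(x, B):
    """mu(x, B) = P{x + increment in B} for the random walk on Z_M."""
    return sum(p for inc, p in step.items() if (x + inc) % M in B)


def shift(B, x):
    """The translated set B - x in Z_M."""
    return {(b - x) % M for b in B}


# Homogeneity: the kernel depends on (x, B) only through the set B - x.
subsets = [{i for i in range(M) if mask >> i & 1} for mask in range(2 ** M)]
print(all(kernel(x, B) == kernel(0, shift(B, x))
          for x in range(M) for B in subsets))
```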
We may now specialize to the time-homogeneous case, when $T = \mathbb{R}_+$ or
$\mathbb{Z}_+$ and the transition kernels are of the form $\mu_{s,t} = \mu_{t-s}$, so that
$$P[X_t \in B | \mathcal{F}_s] = \mu_{t-s}(X_s, B) \ \text{a.s.}, \quad B \in \mathcal{S},\ s \le t \text{ in } T.$$
Introducing the initial distribution $\nu = \mathcal{L}(X_0)$, we may write the formulas
of Proposition 8.2 as
$$\begin{aligned}
\mathcal{L}(X_{t_0}, \ldots, X_{t_n}) &= \nu\mu_{t_0} \otimes \mu_{t_1 - t_0} \otimes \cdots \otimes \mu_{t_n - t_{n-1}}, \\
P[(X_{t_1}, \ldots, X_{t_n}) \in \cdot \,|\, \mathcal{F}_{t_0}] &= (\mu_{t_1 - t_0} \otimes \cdots \otimes \mu_{t_n - t_{n-1}})(X_{t_0}, \cdot).
\end{aligned}$$
The Chapman-Kolmogorov relation now becomes
$$\mu_{s+t} = \mu_s\mu_t, \quad s, t \in T,$$
which is again assumed to hold identically. We often refer to the family
$(\mu_t)$ as a semigroup of transition kernels.
The following result justifies the interpretation of a discrete-time Markov
process as a randomized dynamical system.
Proposition 8.6 (recursion) Let $X$ be a process on $\mathbb{Z}_+$ with values in a
Borel space $S$. Then $X$ is Markov iff there exist some measurable functions
$f_1, f_2, \ldots : S \times [0, 1] \to S$ and i.i.d. $U(0,1)$ random variables $\vartheta_1, \vartheta_2, \ldots \perp\!\!\!\perp X_0$
such that $X_n = f_n(X_{n-1}, \vartheta_n)$ a.s. for all $n \in \mathbb{N}$. Here we may choose
$f_1 = f_2 = \cdots = f$ iff $X$ is time-homogeneous.
Proof: Let $X$ have the stated representation and introduce the kernels
$\mu_n(x, \cdot) = P\{f_n(x, \vartheta) \in \cdot\}$, where $\vartheta$ is $U(0,1)$. Writing $\mathcal{F}$ for the filtration
induced by $X$, we get by Theorem 6.4 for any $B \in \mathcal{S}$
$$\begin{aligned}
P[X_n \in B | \mathcal{F}_{n-1}] &= P[f_n(X_{n-1}, \vartheta_n) \in B | \mathcal{F}_{n-1}] \\
&= \lambda\{t;\ f_n(X_{n-1}, t) \in B\} = \mu_n(X_{n-1}, B),
\end{aligned}$$
which shows that $X$ is Markov with transition kernels $\mu_n$.
Now assume instead the latter condition. By Lemma 3.22 we may choose
some associated functions $f_n$ as above. Let $\tilde\vartheta_1, \tilde\vartheta_2, \ldots$ be i.i.d. $U(0,1)$ and
independent of $\tilde X_0 \overset{d}{=} X_0$, and define recursively $\tilde X_n = f_n(\tilde X_{n-1}, \tilde\vartheta_n)$ for
$n \in \mathbb{N}$. As before, $\tilde X$ is Markov with transition kernels $\mu_n$. Hence, $\tilde X \overset{d}{=} X$ by
Proposition 8.2, and so by Theorem 6.10 there exist some random variables
$\vartheta_n$ with $(X, (\vartheta_n)) \overset{d}{=} (\tilde X, (\tilde\vartheta_n))$. Since the diagonal in $S^2$ is measurable,
the desired representation follows. The last assertion is obvious from the
construction. □
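The recursion $X_n = f(X_{n-1}, \vartheta_n)$ is exactly how a chain is simulated in practice. The sketch below assumes a finite state space $\{0, 1, 2\}$ with a made-up transition matrix `P`, and takes for $f$ the inverse-CDF rule, which is one possible choice among many; the names `f` and `simulate` are hypothetical. By construction, the Lebesgue measure of $\{u;\ f(x, u) = y\}$ equals `P[x][y]`.

```python
import random
from fractions import Fraction as F

# Assumed transition matrix on S = {0, 1, 2}.
P = [[F(1, 2), F(1, 4), F(1, 4)],
     [F(1, 3), F(1, 3), F(1, 3)],
     [F(0),    F(1, 2), F(1, 2)]]


def f(x, u):
    """Return the state y with P(x, [0, y)) <= u < P(x, [0, y])."""
    acc = F(0)
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P[x]) - 1               # only reached if u == 1


def simulate(x0, n, rng):
    """X_k = f(X_{k-1}, theta_k) with theta_k i.i.d. U(0, 1) drawn from rng."""
    path = [x0]
    for _ in range(n):
        path.append(f(path[-1], rng.random()))
    return path


path = simulate(0, 10, random.Random(0))
print(path)
```

Using the same function `f` at every step corresponds to the time-homogeneous case of the proposition.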
Now fix a transition semigroup $(\mu_t)$ on some Borel space $S$. For any
probability measure $\nu$ on $S$, there exists by Theorem 8.4 an associated
Markov process $X_\nu$, and by Proposition 3.2 the corresponding distribution
$P_\nu$ is uniquely determined by $\nu$. Note that $P_\nu$ is a probability measure on
the path space $(S^T, \mathcal{S}^T)$. For degenerate initial distributions $\delta_x$, we may
write $P_x$ instead of $P_{\delta_x}$. Integration with respect to $P_\nu$ or $P_x$ is denoted by
$E_\nu$ or $E_x$, respectively.
Lemma 8.7 (mixtures) The measures $P_x$ form a probability kernel from
$S$ to $S^T$, and for any initial distribution $\nu$ we have
$$P_\nu A = \int_S P_x(A)\, \nu(dx), \quad A \in \mathcal{S}^T. \tag{10}$$
Proof: Both the measurability of $P_x A$ and formula (10) are obvious for
cylinder sets of the form $A = (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B$. The general case follows
easily by a monotone class argument. □
Rather than considering one Markov process $X_\nu$ for each initial distribution
$\nu$, it is more convenient to introduce the canonical process $X$, defined
as the identity mapping on the path space $(S^T, \mathcal{S}^T)$, and equip the latter
space with the different probability measures $P_\nu$. Then $X_t$ agrees with the
evaluation map $\pi_t : \omega \mapsto \omega_t$ on $S^T$, which is measurable by the definition of
$\mathcal{S}^T$. For our present purposes, it is sufficient to endow the path space $S^T$
with the canonical filtration $\mathcal{F}$ induced by $X$.
On $S^T$ we may also introduce the shift operators $\theta_t : S^T \to S^T$, $t \in T$,
given by
$$(\theta_t\omega)_s = \omega_{s+t}, \quad s, t \in T,\ \omega \in S^T,$$
and we note that the $\theta_t$ are measurable with respect to $\mathcal{S}^T$. In the canonical
case it is further clear that $\theta_t X = \theta_t = X \circ \theta_t$.
Optional times with respect to a Markov process are often constructed
recursively in terms of shifts on the underlying path space. Thus, for any
pair of optional times $\sigma$ and $\tau$ on the canonical space, we may consider the
random time $\gamma = \sigma + \tau \circ \theta_\sigma$, with the understanding that $\gamma = \infty$ when
$\sigma = \infty$. Under weak restrictions on space and filtration, we show that $\gamma$ is
again optional. Let $C(S)$ and $D(S)$ denote the spaces of continuous or rcll
functions, respectively, from $\mathbb{R}_+$ to $S$.
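In discrete time the compound time $\gamma = \sigma + \tau \circ \theta_\sigma$ can be computed directly on a path. The following sketch assumes a finite observed path stored as a Python list, takes $\sigma$ and $\tau$ to be hitting times of two states, and returns infinity when a state is not reached within the observed horizon; the function names are hypothetical.

```python
import math


def hitting_time(path, target):
    """tau = inf{n >= 1; path[n] == target}, or infinity if never hit."""
    for n in range(1, len(path)):
        if path[n] == target:
            return n
    return math.inf


def shift(path, t):
    """The shifted path theta_t: (theta_t path)[s] = path[s + t]."""
    return path[t:]


def compound(path, a, b):
    """gamma = sigma + tau o theta_sigma, where sigma and tau are the
    hitting times of a and b: the first visit to b after the first visit to a."""
    sigma = hitting_time(path, a)
    if sigma == math.inf:
        return math.inf                # convention: gamma = inf when sigma = inf
    return sigma + hitting_time(shift(path, sigma), b)


path = ["c", "b", "a", "c", "a", "b", "b"]
print(compound(path, "a", "b"))
```

Here $\sigma = 2$ is the first visit to `"a"`, and the shifted path first reaches `"b"` three steps later, so $\gamma = 5$ even though `"b"` was already visited at time 1.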
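Proposition 8.8 (compound optional times) For any metric space $S$, let
$\sigma$ and $\tau$ be optional times on the canonical space $S^\infty$, $C(S)$, or $D(S)$,
endowed with the right-continuous, induced filtration. Then even $\gamma = \sigma +
\tau \circ \theta_\sigma$ is optional.
Proof: Since $\sigma \wedge n + \tau \circ \theta_{\sigma \wedge n} \uparrow \gamma$, we may assume by Lemma 7.3 that $\sigma$
is bounded. Let $X$ denote the canonical process with induced filtration $\mathcal{F}$.
Since $X$ is $\mathcal{F}^+$-progressive, $X_{\sigma+s} = X_s \circ \theta_\sigma$ is $\mathcal{F}^+_{\sigma+s}$-measurable for every
$s \ge 0$ by Lemma 7.5. Fixing any $t \ge 0$, it follows that all sets $A = \{X_s \in B\}$
with $s \le t$ and $B \in \mathcal{S}$ satisfy $\theta_\sigma^{-1} A \in \mathcal{F}^+_{\sigma+t}$. The sets $A$ with the latter
property form a $\sigma$-field, and therefore
$$\theta_\sigma^{-1} \mathcal{F}_t \subset \mathcal{F}^+_{\sigma+t}, \quad t \ge 0. \tag{11}$$
Now fix any $t \ge 0$, and note that
$$\{\gamma < t\} = \bigcup_{r \in \mathbb{Q} \cap (0,t)} \{\sigma < r,\ \tau \circ \theta_\sigma < t - r\}. \tag{12}$$
For every $r \in (0, t)$ we have $\{\tau < t - r\} \in \mathcal{F}_{t-r}$, so $\theta_\sigma^{-1}\{\tau < t - r\} \in \mathcal{F}^+_{\sigma+t-r}$
by (11), and Lemma 7.2 yields
$$\{\sigma < r,\ \tau \circ \theta_\sigma < t - r\} = \{\sigma + t - r < t\} \cap \theta_\sigma^{-1}\{\tau < t - r\} \in \mathcal{F}_t.$$
Thus, $\{\gamma < t\} \in \mathcal{F}_t$ by (12), and so $\gamma$ is $\mathcal{F}^+$-optional by Lemma 7.2. □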
We proceed to show how the elementary Markov property may be
extended to suitable optional times. The present statement is only preliminary,
and stronger versions are obtained under further conditions in
Theorems 12.14, 13.11, and 19.17.
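Proposition 8.9 (strong Markov property) Fix a time-homogeneous Markov
process $X$ on $T = \mathbb{R}_+$ or $\mathbb{Z}_+$, and let $\tau$ be an optional time taking
countably many values. Then
$$P[\theta_\tau X \in A | \mathcal{F}_\tau] = P_{X_\tau} A \ \text{a.s. on } \{\tau < \infty\}, \quad A \in \mathcal{S}^T. \tag{13}$$
If $X$ is canonical, it is equivalent that
$$E_\nu[\xi \circ \theta_\tau | \mathcal{F}_\tau] = E_{X_\tau}\xi, \quad P_\nu\text{-a.s. on } \{\tau < \infty\}, \tag{14}$$
for any distribution $\nu$ on $S$ and bounded or nonnegative random variable
$\xi$.
Since $\{\tau < \infty\} \in \mathcal{F}_\tau$, we note that (13) and (14) make sense by Lemma
6.2, although $\theta_\tau X$ and $P_{X_\tau}$ are defined only for $\tau < \infty$.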
Proof: By Lemmas 6.2 and 7.1 we may assume that $\tau = t$ is finite and
nonrandom. For sets $A$ of the form
$$A = (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B, \quad t_1 \le \cdots \le t_n,\ B \in \mathcal{S}^n,\ n \in \mathbb{N}, \tag{15}$$
Proposition 8.2 yields
$$\begin{aligned}
P[\theta_t X \in A | \mathcal{F}_t] &= P[(X_{t+t_1}, \ldots, X_{t+t_n}) \in B | \mathcal{F}_t] \\
&= (\mu_{t_1} \otimes \mu_{t_2 - t_1} \otimes \cdots \otimes \mu_{t_n - t_{n-1}})(X_t, B) = P_{X_t} A,
\end{aligned}$$
which extends by a monotone class argument to arbitrary $A \in \mathcal{S}^T$.
In the canonical case we note that (13) is equivalent to (14) with $\xi = 1_A$,
since in that case $\xi \circ \theta_\tau = 1\{\theta_\tau X \in A\}$. The result extends by linearity and
monotone convergence to general $\xi$. □
When X is both space- and time-homogeneous, the strong Markov
property can be stated without reference to the family (Px).
Theorem 8.10 (space and time homogeneity) Let $X$ be a space- and time-homogeneous
Markov process in a measurable Abelian group $S$. Then
$$P_x A = P_0(A - x), \quad x \in S,\ A \in \mathcal{S}^T. \tag{16}$$
Furthermore, (13) holds for a given optional time $\tau < \infty$ iff $X_\tau$ is a.s.
$\mathcal{F}_\tau$-measurable and
$$X - X_0 \overset{d}{=} \theta_\tau X - X_\tau \perp\!\!\!\perp \mathcal{F}_\tau. \tag{17}$$
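Proof: By Proposition 8.2 we get for any set $A$ as in (15)
$$\begin{aligned}
P_x A &= P_x \circ (\pi_{t_1}, \ldots, \pi_{t_n})^{-1} B \\
&= (\mu_{t_1} \otimes \mu_{t_2 - t_1} \otimes \cdots \otimes \mu_{t_n - t_{n-1}})(x, B) \\
&= (\mu_{t_1} \otimes \mu_{t_2 - t_1} \otimes \cdots \otimes \mu_{t_n - t_{n-1}})(0, B - x) \\
&= P_0 \circ (\pi_{t_1}, \ldots, \pi_{t_n})^{-1}(B - x) = P_0(A - x),
\end{aligned}$$
which extends to (16) by a monotone class argument.
Next assume (13). Letting $A = \pi_0^{-1} B$ with $B \in \mathcal{S}$, we get
$$1_B(X_\tau) = P_{X_\tau}\{\pi_0 \in B\} = P[X_\tau \in B | \mathcal{F}_\tau] \ \text{a.s.},$$
and so $X_\tau$ is a.s. $\mathcal{F}_\tau$-measurable. By (16) and Theorem 6.4 we have
$$P[\theta_\tau X - X_\tau \in A | \mathcal{F}_\tau] = P_{X_\tau}(A + X_\tau) = P_0 A, \quad A \in \mathcal{S}^T, \tag{18}$$
which shows that $\theta_\tau X - X_\tau$ is independent of $\mathcal{F}_\tau$ with distribution $P_0$. For
$\tau = 0$ we get in particular $\mathcal{L}(X - X_0) = P_0$, and (17) follows.
Next assume (17). To deduce (13), let $A \in \mathcal{S}^T$ be arbitrary, and conclude
from (16) and Theorem 6.4 that
$$P[\theta_\tau X \in A | \mathcal{F}_\tau] = P[\theta_\tau X - X_\tau \in A - X_\tau | \mathcal{F}_\tau] = P_0(A - X_\tau) = P_{X_\tau} A. \qquad □$$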
If a time-homogeneous Markov process $X$ has initial distribution $\nu$, then
the distribution at time $t \in T$ equals $\nu_t = \nu\mu_t$, or
$$\nu_t B = \int \nu(dx)\, \mu_t(x, B), \quad B \in \mathcal{S},\ t \in T.$$
A distribution $\nu$ is said to be invariant for the semigroup $(\mu_t)$ if $\nu_t$ is
independent of $t$, so that $\nu\mu_t = \nu$ for all $t \in T$. We also say that a process
$X$ on $T$ is stationary if $\theta_t X \overset{d}{=} X$ for all $t \in T$. The two notions are related
as follows.
Lemma 8.11 (stationarity and invariance) Let $X$ be a time-homogeneous
Markov process on $T$ with transition kernels $\mu_t$ and initial distribution
$\nu$. Then $X$ is stationary iff $\nu$ is invariant for $(\mu_t)$.
Proof: Assuming $\nu$ to be invariant, we get by Proposition 8.2
$$(X_{t+t_1}, \ldots, X_{t+t_n}) \overset{d}{=} (X_{t_1}, \ldots, X_{t_n}), \quad t, t_1 \le \cdots \le t_n \text{ in } T,$$
and the stationarity of $X$ follows by Proposition 3.2. □
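A two-state chain gives a small check of the lemma. The sketch below assumes jump probabilities `a` and `b` (arbitrary illustrative values), for which $\nu = (b, a)/(a + b)$ is the invariant distribution, and compares the joint law of $(X_t, X_{t+1})$ before and after one time step; the helper names are hypothetical.

```python
from fractions import Fraction as F

# Assumed two-state chain: 0 -> 1 with probability a, 1 -> 0 with probability b.
a, b = F(1, 3), F(1, 4)
P = [[1 - a, a], [b, 1 - b]]
nu = [b / (a + b), a / (a + b)]        # candidate invariant distribution


def step(dist):
    """dist -> dist P, the distribution one time unit later."""
    return [sum(dist[x] * P[x][y] for x in range(2)) for y in range(2)]


def law_of_pair(dist):
    """Joint law of (X_t, X_{t+1}) when X_t has distribution dist."""
    return {(x, y): dist[x] * P[x][y] for x in range(2) for y in range(2)}


print(step(nu) == nu)                                  # invariance
print(law_of_pair(step(nu)) == law_of_pair(nu))        # shift invariance of pairs
start = [F(1), F(0)]
print(law_of_pair(step(start)) == law_of_pair(start))  # fails for a non-invariant start
```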
For processes $X$ in discrete time, we may consider the sequence of successive
visits to a fixed state $y \in S$. Assuming the process to be canonical,
we may introduce the hitting time $\tau_y = \inf\{n \in \mathbb{N};\ X_n = y\}$ and then
define recursively
$$\tau_y^{k+1} = \tau_y^k + \tau_y \circ \theta_{\tau_y^k}, \quad k \in \mathbb{Z}_+,$$
starting from $\tau_y^0 = 0$. Let us further introduce the occupation times
$$\kappa_y = \sup\{k;\ \tau_y^k < \infty\} = \sum_{n \ge 1} 1\{X_n = y\}, \quad y \in S.$$
The next result expresses the distribution of $\kappa_y$ in terms of the hitting
probabilities
$$r_{xy} = P_x\{\tau_y < \infty\} = P_x\{\kappa_y > 0\}, \quad x, y \in S.$$
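Proposition 8.12 (occupation times) For any $x, y \in S$ and $k \in \mathbb{N}$,
$$P_x\{\kappa_y \ge k\} = P_x\{\tau_y^k < \infty\} = r_{xy}\, r_{yy}^{k-1}, \tag{19}$$
$$E_x\kappa_y = \frac{r_{xy}}{1 - r_{yy}}. \tag{20}$$
Proof: By the strong Markov property, we get for any $k \in \mathbb{N}$
$$\begin{aligned}
P_x\{\tau_y^{k+1} < \infty\} &= P_x\{\tau_y^k < \infty,\ \tau_y \circ \theta_{\tau_y^k} < \infty\} \\
&= P_x\{\tau_y^k < \infty\}\, P_y\{\tau_y < \infty\} = r_{yy}\, P_x\{\tau_y^k < \infty\},
\end{aligned}$$
and the second relation in (19) follows by induction on $k$. The first relation
is clear from the fact that $\kappa_y \ge k$ iff $\tau_y^k < \infty$. To deduce (20), conclude
from (19) and Lemma 3.4 that
$$E_x\kappa_y = \sum_{k \ge 1} P_x\{\kappa_y \ge k\} = \sum_{k \ge 1} r_{xy}\, r_{yy}^{k-1} = \frac{r_{xy}}{1 - r_{yy}}. \qquad □$$
For $x = y$ the last result yields
$$P_x\{\kappa_x \ge k\} = P_x\{\tau_x^k < \infty\} = r_{xx}^k, \quad k \in \mathbb{N}.$$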
Thus, under $P_x$, the number of visits to $x$ is either a.s. infinite or geometrically
distributed with mean $E_x\kappa_x + 1 = (1 - r_{xx})^{-1} < \infty$. This leads to a
corresponding classification of the states into recurrent and transient ones.
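Formula (20) can be checked numerically on a small chain with transient states. The sketch below assumes a three-state transition matrix `P` in which state 2 is absorbing, so that states 0 and 1 are transient. It computes $r_{xy}$ from the first-passage recursion and $E_x\kappa_y$ as $\sum_{n \ge 1} p^n_{xy}$, both truncated at an assumed level `N`; the function names are hypothetical.

```python
# Assumed chain on {0, 1, 2}; state 2 is absorbing, so 0 and 1 are transient.
P = [[0.5, 0.25, 0.25],
     [1 / 3, 1 / 3, 1 / 3],
     [0.0, 0.0, 1.0]]
N = 400                                # truncation level for the infinite sums


def hitting_prob(x, y):
    """r_xy = sum_n P_x{tau_y = n}, via the first-passage recursion."""
    f = [P[z][y] for z in range(3)]    # f[z] = P_z{tau_y = 1}
    total = f[x]
    for _ in range(N - 1):
        # P_z{tau_y = n + 1} = sum over w != y of P(z, w) P_w{tau_y = n}.
        f = [sum(P[z][w] * f[w] for w in range(3) if w != y) for z in range(3)]
        total += f[x]
    return total


def expected_visits(x, y):
    """E_x kappa_y = sum_{n >= 1} p^n_{xy}, truncated at N terms."""
    row = [float(z == x) for z in range(3)]   # law of X_0 under P_x
    total = 0.0
    for _ in range(N):
        row = [sum(row[z] * P[z][w] for z in range(3)) for w in range(3)]
        total += row[y]
    return total


for x in (0, 1):
    for y in (0, 1):
        lhs = expected_visits(x, y)
        rhs = hitting_prob(x, y) / (1 - hitting_prob(y, y))
        print(x, y, round(lhs, 6), round(rhs, 6))
```

Only the transient states are compared, since for the absorbing state $r_{22} = 1$ and both sides of (20) are infinite.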
Recurrence can often be deduced from the existence of an invariant
distribution. Here and below we write $p^n_{xy} = \mu_n(x, \{y\})$.
Proposition 8.13 (invariant distributions and recurrence) If an invariant
distribution $\nu$ exists, then any state $x$ with $\nu\{x\} > 0$ is
recurrent.
Proof: By the invariance of $\nu$,
$$0 < \nu\{x\} = \int \nu(dy)\, p^n_{yx}, \quad n \in \mathbb{N}. \tag{21}$$
Thus, by Proposition 8.12 and Fubini's theorem,
$$\infty = \sum_{n \ge 1} \int \nu(dy)\, p^n_{yx} = \int \nu(dy) \sum_{n \ge 1} p^n_{yx} = \int \nu(dy)\, \frac{r_{yx}}{1 - r_{xx}} \le \frac{1}{1 - r_{xx}}.$$
Hence, $r_{xx} = 1$, and so $x$ is recurrent. □
The period $d_x$ of a state $x$ is defined as the greatest common divisor of
the set $\{n \in \mathbb{N};\ p^n_{xx} > 0\}$, and we say that $x$ is aperiodic if $d_x = 1$.
Proposition 8.14 (positivity) If $x \in S$ has period $d < \infty$, then $p^{nd}_{xx} > 0$
for all but finitely many $n$.
Proof: Define $S = \{n \in \mathbb{N};\ p^{nd}_{xx} > 0\}$, and conclude from the Chapman-Kolmogorov
relation that $S$ is closed under addition. Since $S$ has greatest
common divisor 1, the generated additive group equals $\mathbb{Z}$. In particular,
there exist some $n_1, \ldots, n_k \in S$ and $z_1, \ldots, z_k \in \mathbb{Z}$ with $\sum_j z_j n_j = 1$. Writing
$m = n_1 \sum_j |z_j| n_j$, we note that any number $n \ge m$ can be represented,
for suitable $h \in \mathbb{Z}_+$ and $r \in \{0, \ldots, n_1 - 1\}$, as
$$n = m + hn_1 + r = hn_1 + \sum_{j \le k} (n_1|z_j| + rz_j)\, n_j \in S. \qquad □$$
For each xES, the successive excursions of X from x are given by
o
Y n = XTx 0 ()Tn, n E Z+,
x
as long as T: < 00. To allow for infinite excursions, we may introduce an
extraneous element fJ (j. S, and define Y n = "8 = (8,8,. ..) whenever T: = 00.
Conversely, X may be recovered from the Y n through the formulas
Tn = inf{t > 0; Yk(t) = x}, (22)
k<n
Xt = Yn(t - Tn), Tn < t < Tn+l, n E Z+. (23)
The distribution $\nu_x = P_x \circ Y_0^{-1}$ is called the excursion law at $x$. When
$x$ is recurrent and $r_{yx} = 1$, Proposition 8.9 shows that $Y_1, Y_2, \dots$ are i.i.d.
$\nu_x$ under $P_y$. The result extends to the general case, as follows.
Proposition 8.15 (excursions) Consider a discrete-time Markov process
$X$ in a Borel space $S$, and fix any $x \in S$. Then there exist some independent
processes $Y_0, Y_1, \dots$ in $S$, all but $Y_0$ with distribution $\nu_x$, such that $X$ is
a.s. given by (22) and (23).
Proof: Put $\tilde Y_0 \overset{d}{=} Y_0$, and let $\tilde Y_1, \tilde Y_2, \dots$ be independent of $\tilde Y_0$ and i.i.d.
$\nu_x$. Construct associated random times $\tilde\tau_0, \tilde\tau_1, \dots$ as in (22), and define a
process $\tilde X$ as in (23). By Corollary 6.11, it is enough to show that $\tilde X \overset{d}{=} X$.
Writing
$$\kappa = \sup\{n \ge 0;\ \tau_n < \infty\}, \qquad \tilde\kappa = \sup\{n \ge 0;\ \tilde\tau_n < \infty\},$$
it is equivalent to show that
$$(Y_0, \dots, Y_\kappa, \bar\partial, \bar\partial, \dots) \overset{d}{=} (\tilde Y_0, \dots, \tilde Y_{\tilde\kappa}, \bar\partial, \bar\partial, \dots). \tag{24}$$
Using the strong Markov property on the left and the independence of the
$\tilde Y_n$ on the right, it is easy to check that both sides are Markov processes in
$S^{\mathbb{Z}_+} \cup \{\bar\partial\}$ with the same initial distribution and transition kernel. Hence,
(24) holds by Proposition 8.2. □
By a discrete-time Markov chain we mean a Markov process on the time
scale $\mathbb{Z}_+$, taking values in a countable state space $S$. In this case the transition
kernels of $X$ are determined by the $n$-step transition probabilities
$p^n_{ij} = \mu_n(i, \{j\})$, $i, j \in S$, and the Chapman-Kolmogorov relation becomes
$$p^{m+n}_{ik} = \sum_j p^m_{ij}\, p^n_{jk}, \qquad i, k \in S, \quad m, n \in \mathbb{N}, \tag{25}$$
or in matrix notation, $p^{m+n} = p^m p^n$. Thus, $p^n$ is the $n$th power of the matrix
$p = p^1$, which justifies our notation. Regarding the initial distribution
$\nu$ as a row vector $(\nu_i)$, we may write the distribution at time $n$ as $\nu p^n$.
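The matrix form of the Chapman-Kolmogorov relation is easy to check numerically on a finite state space. The following minimal sketch uses a hypothetical three-state transition matrix and plain Python lists; the helper names are assumptions made for the example.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def mat_pow(P, n):
    """n-th power of P, so that entry (i, j) is the n-step probability p^n_ij."""
    size = len(P)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, P)
    return result


def distribution_at(nu, P, n):
    """Row vector nu times P^n: the distribution of X_n under initial distribution nu."""
    Pn = mat_pow(P, n)
    return [sum(nu[i] * Pn[i][j] for i in range(len(nu))) for j in range(len(nu))]


# Hypothetical transition matrix (each row sums to 1).
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

m, n = 3, 4
lhs = mat_pow(P, m + n)
rhs = mat_mul(mat_pow(P, m), mat_pow(P, n))
max_gap = max(abs(lhs[i][k] - rhs[i][k]) for i in range(3) for k in range(3))
print(max_gap)                                   # zero up to rounding error
print(distribution_at([1.0, 0.0, 0.0], P, 5))    # law of X_5 when X_0 = 0
```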
As before, we define $r_{ij} = P_i\{\tau_j < \infty\}$, where $\tau_j = \inf\{n > 0;\ X_n = j\}$.
A Markov chain in $S$ is said to be irreducible if $r_{ij} > 0$ for all $i, j \in S$,
so that every state can be reached from any other state. For irreducible
chains, all states have the same recurrence and periodicity properties.
Proposition 8.16 (irreducible chains) For any irreducible Markov chain,
(i) the states are either all recurrent or all transient;
(ii) all states have the same period;
(iii) if $\nu$ is invariant, then $\nu_i > 0$ for all $i$.
For the proof of (i) we need the following lemma.
Lemma 8.17 (recurrence classes) Let $i \in S$ be recurrent, and define $S_i =
\{j \in S;\ r_{ij} > 0\}$. Then $r_{jk} = 1$ for any $j, k \in S_i$, and all states in $S_i$ are
recurrent.
Proof: By the recurrence of $i$ and the strong Markov property, we get for
any $j \in S_i$
$$0 = P_i\{\tau_j < \infty,\ \tau_i \circ \theta_{\tau_j} = \infty\} = P_i\{\tau_j < \infty\}\, P_j\{\tau_i = \infty\} = r_{ij}(1 - r_{ji}).$$
Since $r_{ij} > 0$ by hypothesis, we obtain $r_{ji} = 1$. Fixing any $m, n \in \mathbb{N}$ with
$p^m_{ij}, p^n_{ji} > 0$, we get by (25)
$$E_j \kappa_j \ge \sum_{s > 0} p^{m+n+s}_{jj} \ge \sum_{s > 0} p^n_{ji}\, p^s_{ii}\, p^m_{ij} = p^n_{ji}\, p^m_{ij}\, E_i \kappa_i = \infty,$$
and so $j$ is recurrent by Proposition 8.12. Reversing the roles of $i$ and $j$
gives $r_{ij} = 1$. Finally, we get for any $j, k \in S_i$
$$r_{jk} \ge P_j\{\tau_i < \infty,\ \tau_k \circ \theta_{\tau_i} < \infty\} = r_{ji}\, r_{ik} = 1.$$
□
Proof of Proposition 8.16: (i) This is clear from Lemma 8.17.
(ii) Fix any $i, j \in S$, and choose $m, n \in \mathbb{N}$ with $p^m_{ij}, p^n_{ji} > 0$. By (25),
$$p^{m+h+n}_{jj} \ge p^n_{ji}\, p^h_{ii}\, p^m_{ij}, \qquad h \ge 0.$$
For $h = 0$ we get $p^{m+n}_{jj} > 0$, and so $d_j \mid (m + n)$ ($d_j$ divides $m + n$). Hence,
in general, $p^h_{ii} > 0$ implies $d_j \mid h$, and we get $d_j \le d_i$. Reversing the roles of
$i$ and $j$ yields the opposite inequality.
(iii) Fix any $i \in S$. Choosing $j \in S$ with $\nu_j > 0$ and then $n \in \mathbb{N}$ with
$p^n_{ji} > 0$, we see from (21) that even $\nu_i > 0$. □
We may now state the basic ergodic theorem for irreducible Markov
chains. Related results will appear in Chapters 12, 19, and 23. For any
signed measure $\mu$ we define $\|\mu\| = \sup_A |\mu A|$.
Theorem 8.18 (ergodic behavior, Markov, Kolmogorov, Orey) For any
irreducible, aperiodic Markov chain in $S$, exactly one of these cases occurs:
(i) There exists a unique invariant distribution $\nu$, the latter satisfies $\nu_i > 0$
for all $i \in S$, and for any distribution $\mu$ on $S$ we have
$$\lim_{n \to \infty} \|P_\mu \circ \theta_n^{-1} - P_\nu\| = 0. \tag{26}$$
(ii) No invariant distribution exists, and we have
$$\lim_{n \to \infty} p^n_{ij} = 0, \qquad i, j \in S. \tag{27}$$
A Markov chain satisfying (i) is clearly recurrent, whereas one that sat-
isfies (ii) may be either recurrent or transient. This leads to the further
classification of the irreducible, aperiodic, and recurrent Markov chains
into positive recurrent and null-recurrent ones, depending on whether (i) or
(ii) applies.
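Case (i) can be observed numerically for a finite chain. The sketch below tracks the distance between the time-$n$ distribution $\mu p^n$ and an approximate invariant vector $\nu$; for a Markov chain this marginal distance coincides with the path-space distance in (26), since both shifted laws share the same transition kernel. The matrix, the power-iteration approximation of $\nu$, and the helper names are assumptions for the illustration.

```python
def step(row, P):
    """One step of the chain on the level of distributions: row vector times P."""
    size = len(P)
    return [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]


def total_variation(a, b):
    """sup over sets A of |a(A) - b(A)|, which is half the l1 distance for probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical irreducible, aperiodic chain on three states.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

# Approximate the invariant distribution by iterating from the uniform vector.
nu = [1 / 3, 1 / 3, 1 / 3]
for _ in range(2000):
    nu = step(nu, P)

# Distance to nu along the orbit started from the point mass at state 0.
row = [1.0, 0.0, 0.0]
distances = []
for n in range(101):
    distances.append(total_variation(row, nu))
    row = step(row, P)

print(nu)                                             # close to [8/15, 4/15, 1/5]
print(distances[0], distances[50], distances[100])    # decreasing towards 0
```

For this birth-death example the invariant vector can also be found by hand from the balance equations $0.1\,\nu_0 = 0.2\,\nu_1$ and $0.3\,\nu_1 = 0.4\,\nu_2$.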
We shall prove Theorem 8.18 by the powerful method of coupling. Here
the general idea is to compare the distributions of two processes X and Y,
by constructing copies $\tilde X \overset{d}{=} X$ and $\tilde Y \overset{d}{=} Y$ on a common probability space.
By a suitable choice of joint distribution, one may sometimes reduce the
original problem to a pathwise comparison. The coupling approach often
leads to simple and transparent proofs; we shall see further applications of
the method in Chapters 9, 14, 15, 16, 20, and 23. For our present needs,
an elementary coupling by independence is sufficient.
Lemma 8.19 (coupling) Let $X$ and $Y$ be independent Markov chains in
$S$ and $T$ with transition matrices $(p_{ii'})$ and $(q_{jj'})$, respectively. Then $(X, Y)$
is a Markov chain in $S \times T$ with transition matrix $r_{ij,i'j'} = p_{ii'}\, q_{jj'}$. If $X$
and $Y$ are irreducible and aperiodic, then so is $(X, Y)$; in that case $(X, Y)$
is recurrent whenever invariant distributions exist for both $X$ and $Y$.
Proof: The first assertion is easily proved by computation of the finite-
dimensional distributions of $(X, Y)$ for an arbitrary initial distribution $\mu \otimes \nu$
on $S \times T$, using Proposition 8.2. Now assume that $X$ and $Y$ are irreducible
and aperiodic. Fixing any $i, i' \in S$ and $j, j' \in T$, we see from Proposition
8.14 that $r^n_{ij,i'j'} = p^n_{ii'}\, q^n_{jj'} > 0$ for all but finitely many $n \in \mathbb{N}$, and so
even $(X, Y)$ has the stated properties. Finally, if $\mu$ and $\nu$ are invariant
distributions for $X$ and $Y$, respectively, then $\mu \otimes \nu$ is invariant for $(X, Y)$,
and the last assertion follows by Proposition 8.13. □
The point of the construction is that, if the coupled processes eventually
meet, their distributions will agree asymptotically.
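The remark above can be checked exactly for a small chain. In the following sketch, two independent copies of a hypothetical three-state chain are started from different states, and the probability that they have not yet met by time $n$ is compared with the total variation distance between their time-$n$ distributions. The matrix, the initial distributions, and the function names are assumptions for the example.

```python
def step(row, P):
    """Row vector times P."""
    size = len(P)
    return [sum(row[i] * P[i][j] for i in range(size)) for j in range(size)]


def total_variation(a, b):
    """Half the l1 distance between two probability vectors."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def coupling_tail(P, mu, nu, n_max):
    """Return [P{tau > n} for n = 0..n_max], where tau is the first time the two
    independent chains occupy the same state."""
    size = len(P)
    # q[i][j] = P{X_n = i, Y_n = j, tau > n}; pairs on the diagonal are removed.
    q = [[mu[i] * nu[j] if i != j else 0.0 for j in range(size)] for i in range(size)]
    tails = [sum(map(sum, q))]
    for _ in range(n_max):
        new = [[0.0] * size for _ in range(size)]
        for i in range(size):
            for j in range(size):
                if q[i][j] == 0.0:
                    continue
                for a in range(size):
                    for b in range(size):
                        if a != b:  # keep only pairs that have still not met
                            new[a][b] += q[i][j] * P[i][a] * P[j][b]
        q = new
        tails.append(sum(map(sum, q)))
    return tails


# Hypothetical irreducible, aperiodic chain and two point-mass initial distributions.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]
mu = [1.0, 0.0, 0.0]
nu = [0.0, 0.0, 1.0]

tails = coupling_tail(P, mu, nu, 40)
a, b = mu, nu
gaps = []
for n in range(41):
    gaps.append(total_variation(a, b))
    a, b = step(a, P), step(b, P)

for n in (0, 10, 20, 40):
    print(n, gaps[n], tails[n])   # the gap never exceeds the tail probability
```

The computation only involves time-$n$ marginals; it illustrates the coupling bound rather than the full path-space statement.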
Lemma 8.20 (strong ergodicity) If the Markov chain in $S^2$ with transition
matrix $p_{ii'}\, p_{jj'}$ is irreducible and recurrent, then for any distributions
$\mu$ and $\nu$ on $S$,
$$\lim_{n \to \infty} \|P_\mu \circ \theta_n^{-1} - P_\nu \circ \theta_n^{-1}\| = 0. \tag{28}$$
Proof (Doeblin): Let $X$ and $Y$ be independent with distributions $P_\mu$ and
$P_\nu$. By Lemma 8.19 the pair $(X, Y)$ is again Markov with respect to the
induced filtration $\mathcal{F}$, and by Proposition 8.9 it satisfies the strong Markov
property at every finite optional time $\tau$. Taking $\tau = \inf\{n \ge 0;\ X_n = Y_n\}$,
we get for any measurable set $A \subset S^\infty$
$$P[\theta_\tau X \in A \mid \mathcal{F}_\tau] = P_{X_\tau} A = P_{Y_\tau} A = P[\theta_\tau Y \in A \mid \mathcal{F}_\tau].$$
In particular, $(\tau, X^\tau, \theta_\tau X) \overset{d}{=} (\tau, X^\tau, \theta_\tau Y)$. Defining $\tilde X_n = X_n$ for $n \le \tau$
and $\tilde X_n = Y_n$ otherwise, we obtain $\tilde X \overset{d}{=} X$, and so for any $A$ as above
$$|P\{\theta_n X \in A\} - P\{\theta_n Y \in A\}| = |P\{\theta_n \tilde X \in A\} - P\{\theta_n Y \in A\}|$$
$$= |P\{\theta_n \tilde X \in A,\ \tau > n\} - P\{\theta_n Y \in A,\ \tau > n\}| \le P\{\tau > n\} \to 0.$$
□
The next result ensures the existence of an invariant distribution. Here
a coupling argument is again useful.
Lemma 8.21 (existence) If (27) fails, there exists an invariant distribu-
tion.
Proof: Assume that (27) fails, so that $\limsup_n p^n_{i_0,j_0} > 0$ for some $i_0, j_0 \in S$.
By a diagonal argument we may choose a subsequence $N' \subset \mathbb{N}$ and some
constants $c_j$ with $c_{j_0} > 0$ such that $p^n_{i_0,j} \to c_j$ along $N'$ for every $j \in S$.
Note that $0 < \sum_j c_j \le 1$ by Fatou's lemma.
To extend the convergence to arbitrary $i$, let $X$ and $Y$ be independent
processes with the given transition matrix $(p_{ij})$, and conclude from Lemma
8.19 that $(X, Y)$ is an irreducible Markov chain on $S^2$ with transition
probabilities $q_{ij,i'j'} = p_{ii'}\, p_{jj'}$. If $(X, Y)$ is transient, then by Proposition
8.12
$$\sum_n (p^n_{ij})^2 = \sum_n q^n_{ii,jj} < \infty, \qquad i, j \in S,$$
and (27) follows. The pair $(X, Y)$ is then recurrent and Lemma 8.20 yields
$p^n_{ij} - p^n_{i_0,j} \to 0$ for all $i, j \in S$. Hence, $p^n_{ij} \to c_j$ along $N'$ for all $i$ and $j$.
Next conclude from the Chapman-Kolmogorov relation that
$$p^{n+1}_{ik} = \sum_j p^n_{ij}\, p_{jk} = \sum_j p_{ij}\, p^n_{jk}, \qquad i, k \in S.$$
Using Fatou's lemma on the left and dominated convergence on the right,
we get as $n \to \infty$ along $N'$
$$\sum_j c_j\, p_{jk} \le \sum_j p_{ij}\, c_k = c_k, \qquad k \in S. \tag{29}$$
Summing over $k$ gives $\sum_j c_j \le 1$ on both sides, and so (29) holds with
equality. Thus, $(c_i)$ is invariant, and we get an invariant distribution $\nu$ by
taking $\nu_i = c_i / \sum_j c_j$. □
Proof of Theorem 8.18: If no invariant distribution exists, then (27) holds
by Lemma 8.21. Now let $\nu$ be an invariant distribution, and note that $\nu_i > 0$
for all $i$ by Proposition 8.16. By Lemma 8.19 the coupled chain in Lemma
8.20 is irreducible and recurrent, so (28) holds for any initial distribution $\mu$,
and (26) follows since $P_\nu \circ \theta_n^{-1} = P_\nu$ by Lemma 8.11. If even $\nu'$ is invariant,
then (26) yields $P_{\nu'} = P_\nu$, and so $\nu' = \nu$. □
The limits in Theorem 8.18 may be expressed in terms of the mean
recurrence times EjTj, as follows.
Theorem 8.22 (mean recurrence times, Kolmogorov) For any Markov
chain in $S$ and states $i, j \in S$ with $j$ aperiodic, we have
$$\lim_{n \to \infty} p^n_{ij} = \frac{P_i\{\tau_j < \infty\}}{E_j \tau_j}. \tag{30}$$
Proof: First take $i = j$. If $j$ is transient, then $p^n_{jj} \to 0$ and $E_j \tau_j = \infty$,
and so (30) is trivially true. If instead $j$ is recurrent, then the restriction of
$X$ to the set $S_j = \{i;\ r_{ji} > 0\}$ is irreducible recurrent by Lemma 8.17 and
aperiodic by Proposition 8.16. Hence, $p^n_{jj}$ converges by Theorem 8.18.
To identify the limit, define
$$L_n = \sup\{k \in \mathbb{Z}_+;\ \tau_j^k \le n\} = \sum_{k=1}^n 1\{X_k = j\}, \qquad n \in \mathbb{N}.$$
The $\tau_j^n$ form a random walk under $P_j$, and so, by the law of large numbers,
$$\frac{L(\tau_j^n)}{\tau_j^n} = \frac{n}{\tau_j^n} \to \frac{1}{E_j \tau_j} \quad \text{a.s. } P_j.$$
By the monotonicity of $L_k$ and $\tau_j^n$ it follows that $L_n / n \to (E_j \tau_j)^{-1}$ a.s.
$P_j$. Noting that $L_n \le n$, we get by dominated convergence
$$\frac{1}{n} \sum_{k=1}^n p^k_{jj} = \frac{E_j L_n}{n} \to \frac{1}{E_j \tau_j},$$
and (30) follows.
Now let $i \ne j$. Using the strong Markov property, the disintegration
theorem, and dominated convergence, we get
$$p^n_{ij} = P_i\{X_n = j\} = P_i\{\tau_j \le n,\ (\theta_{\tau_j} X)_{n - \tau_j} = j\} = E_i\big[p^{\,n - \tau_j}_{jj};\ \tau_j \le n\big] \to \frac{P_i\{\tau_j < \infty\}}{E_j \tau_j}.$$
□
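The case $i = j$ of (30) can be illustrated on a finite chain by computing the first-return distribution directly. The sketch below obtains $P_j\{\tau_j = n\}$ from taboo probabilities, sums $n\, P_j\{\tau_j = n\}$ up to a cutoff, and compares the reciprocal with $p^n_{jj}$ for a large $n$. The matrix, the cutoffs, and the function names are assumptions made for the example.

```python
def first_return_probs(P, j, n_max):
    """Return [P_j{tau_j = n} for n = 1..n_max], using taboo probabilities."""
    size = len(P)
    g = P[j][:]                      # g[x] = P_j{X_1 = x}
    f = [g[j]]
    for _ in range(n_max - 1):
        # Advance one step while forbidding any earlier visit to j.
        g = [sum(g[z] * P[z][x] for z in range(size) if z != j) for x in range(size)]
        f.append(g[j])
    return f


def mean_recurrence_time(P, j, n_max=2000):
    """Truncated version of E_j tau_j = sum over n of n * P_j{tau_j = n}."""
    f = first_return_probs(P, j, n_max)
    return sum(n * fn for n, fn in enumerate(f, start=1))


def n_step_return(P, j, n):
    """The n-step return probability p^n_jj."""
    size = len(P)
    row = [1.0 if z == j else 0.0 for z in range(size)]
    for _ in range(n):
        row = [sum(row[z] * P[z][w] for z in range(size)) for w in range(size)]
    return row[j]


# Hypothetical irreducible, aperiodic chain on three states.
P = [[0.9, 0.1, 0.0],
     [0.2, 0.5, 0.3],
     [0.0, 0.4, 0.6]]

mean_0 = mean_recurrence_time(P, 0)
print(mean_0)                                 # close to 15/8
print(n_step_return(P, 0, 200), 1 / mean_0)   # both close to 8/15
```

A first-step calculation by hand gives the same value: the expected hitting time of 0 from state 1 is 8.75, so $E_0 \tau_0 = 1 + 0.1 \cdot 8.75 = 1.875$.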
We return to continuous time and a general state space, to clarify the
nature of the strong Markov property of a process X at finite optional times
$\tau$. The condition is clearly a combination of the conditional independence
$\theta_\tau X \perp\!\!\!\perp_{X_\tau} \mathcal{F}_\tau$ and the strong homogeneity
$$P[\theta_\tau X \in \cdot \mid X_\tau] = P_{X_\tau} \quad \text{a.s.} \tag{31}$$
Though (31) appears to be weaker than (13), the two properties are in fact
equivalent, under suitable regularity conditions on $X$ and $\mathcal{F}$.
Theorem 8.23 (strong homogeneity) Fix a separable metric space $(S, \rho)$,
a probability kernel $(P_x)$ from $S$ to $D(S)$, and a right-continuous filtration
$\mathcal{F}$ on $\mathbb{R}_+$. Let $X$ be an $\mathcal{F}$-adapted rcll process in $S$ such that (31) holds for
all bounded optional times $\tau$. Then $X$ satisfies the strong Markov property.
Our proof is based on a 0-1 law for absorption probabilities, involving
the sets
$$I = \{w \in D;\ w_t \equiv w_0\}, \qquad A = \{x \in S;\ P_x I = 1\}. \tag{32}$$
Lemma 8.24 (absorption) For $X$ as in Theorem 8.23 and for any
optional time $\tau < \infty$, we have
$$P_{X_\tau} I = 1_I(\theta_\tau X) = 1_A(X_\tau) \quad \text{a.s.} \tag{33}$$
Proof: We may clearly assume that $\tau$ is bounded, say by $n \in \mathbb{N}$. Fix any
$h > 0$, and divide $S$ into disjoint Borel sets $B_1, B_2, \dots$ of diameter $< h$.
For each $k \in \mathbb{N}$, define
$$\tau_k = n \wedge \inf\{t > \tau;\ \rho(X_\tau, X_t) > h\} \quad \text{on } \{X_\tau \in B_k\}, \tag{34}$$
and put $\tau_k = \tau$ otherwise. The times $\tau_k$ are again bounded and optional,
and we note that
$$\{X_{\tau_k} \in B_k\} \subset \{X_\tau \in B_k,\ \sup_{t \in [\tau, n]} \rho(X_\tau, X_t) \le h\}. \tag{35}$$
Using (31) and (35), we get as $n \to \infty$ and $h \to 0$
$$E[P_{X_\tau} I^c;\ \theta_\tau X \in I] = \sum_k E[P_{X_\tau} I^c;\ \theta_\tau X \in I,\ X_\tau \in B_k]$$
$$\le \sum_k E[P_{X_{\tau_k}} I^c;\ X_{\tau_k} \in B_k]$$
$$= \sum_k P\{\theta_{\tau_k} X \notin I,\ X_{\tau_k} \in B_k\}$$
$$\le \sum_k P\{\theta_\tau X \notin I,\ X_\tau \in B_k,\ \sup_{t \in [\tau, n]} \rho(X_\tau, X_t) \le h\}$$
$$\to P\{\theta_\tau X \notin I,\ \sup_{t \ge \tau} \rho(X_\tau, X_t) = 0\} = 0,$$
and so $P_{X_\tau} I = 1$ a.s. on $\{\theta_\tau X \in I\}$. Since also $E P_{X_\tau} I = P\{\theta_\tau X \in I\}$ by
(31), we obtain the first relation in (33). The second relation follows by the
definition of $A$. □
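Proof of Theorem 8.23: Define $I$ and $A$ as in (32). To prove (13) on
$\{X_\tau \in A\}$, fix any times $t_1 < \cdots < t_n$ and Borel sets $B_1, \dots, B_n$, write
$B = \bigcap_k B_k$, and conclude from (31) and Lemma 8.24 that
$$P\big[\textstyle\bigcap_k \{X_{\tau + t_k} \in B_k\} \,\big|\, \mathcal{F}_\tau\big] = P[X_\tau \in B \mid \mathcal{F}_\tau] = 1\{X_\tau \in B\}$$
$$= P[X_\tau \in B \mid X_\tau] = P_{X_\tau}\{w_0 \in B\} = P_{X_\tau} \textstyle\bigcap_k \{w_{t_k} \in B_k\}.$$
This extends to (13) by a monotone class argument.
To prove (13) on $\{X_\tau \notin A\}$, we may assume that $\tau \le n$ a.s., and divide
$A^c$ into disjoint Borel sets $B_k$ of diameter $< h$. Fix any $F \in \mathcal{F}_\tau$ with $F \subset
\{X_\tau \notin A\}$. For each $k \in \mathbb{N}$, define $\tau_k$ as in (34) on the set $F^c \cap \{X_\tau \in B_k\}$,
and let $\tau_k = \tau$ otherwise. Note that (35) remains true on $F^c$. Using (31),
(35), and Lemma 8.24, we get as $n \to \infty$ and $h \to 0$
$$\big|P[\theta_\tau X \in \cdot\,;\ F] - E[P_{X_\tau};\ F]\big| = \Big|\sum_k E[1\{\theta_\tau X \in \cdot\} - P_{X_\tau};\ X_\tau \in B_k,\ F]\Big|$$
$$= \Big|\sum_k E[1\{\theta_{\tau_k} X \in \cdot\} - P_{X_{\tau_k}};\ X_{\tau_k} \in B_k,\ F]\Big|$$
$$= \Big|\sum_k E[1\{\theta_{\tau_k} X \in \cdot\} - P_{X_{\tau_k}};\ X_{\tau_k} \in B_k,\ F^c]\Big|$$
$$\le \sum_k P[X_{\tau_k} \in B_k;\ F^c]$$
$$\le \sum_k P\{X_\tau \in B_k,\ \sup_{t \in [\tau, n]} \rho(X_\tau, X_t) \le h\}$$
$$\to P\{X_\tau \notin A,\ \sup_{t \ge \tau} \rho(X_\tau, X_t) = 0\} = 0.$$
Hence, the left-hand side is zero.
□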
Exercises
1. Let $X$ be a process with $X_s \perp\!\!\!\perp_{X_t} \{X_u,\ u \ge t\}$ for all $s < t$. Show that $X$
is Markov with respect to the induced filtration.
2. Let $X$ be a Markov process in some space $S$, and fix a measurable
function $f$ on $S$. Show by an example that the process $Y_t = f(X_t)$ need
not be Markov. (Hint: Let $X$ be a simple symmetric random walk on $\mathbb{Z}$,
and take $f(x) = [x/2]$.)
3. Let $X$ be a Markov process in $\mathbb{R}$ with transition functions $\mu_t$ satisfying
$\mu_t(x, B) = \mu_t(-x, -B)$. Show that the process $Y_t = |X_t|$ is again Markov.
4. Fix any process $X$ on $\mathbb{R}_+$, and define $Y_t = X^t = \{X_{s \wedge t};\ s \ge 0\}$. Show
that $Y$ is Markov with respect to the induced filtration.
5. Consider a random element $\xi$ in some Borel space and a filtration $\mathcal{F}$
with $\mathcal{F}_\infty \subset \sigma\{\xi\}$. Show that the measure-valued process $X_t = P[\xi \in \cdot \mid \mathcal{F}_t]$
is Markov. (Hint: Note that $\xi \perp\!\!\!\perp_{X_t} \mathcal{F}_t$ for all $t$.)
6. For any Markov process $X$ on $\mathbb{R}_+$ and time $u > 0$, show that the reversed
process $Y_t = X_{u-t}$, $t \in [0, u]$, is Markov with respect to the induced
filtration. Also show by an example that a possible time homogeneity of $X$
need not carry over to $Y$.
7. Let $X$ be a time-homogeneous Markov process in some Borel space
$S$. Show that there exist some measurable functions $f_h : S \times [0,1] \to S$,
$h \ge 0$, and $U(0,1)$ random variables $\vartheta_{t,h} \perp\!\!\!\perp X_t$, $t, h \ge 0$, such that $X_{t+h} =
f_h(X_t, \vartheta_{t,h})$ a.s. for all $t, h \ge 0$.
8. Let $X$ be a time-homogeneous and rcll Markov process in some Polish
space $S$. Show that there exist a measurable function $f : S \times [0,1] \to
D(\mathbb{R}_+, S)$ and some $U(0,1)$ random variables $\vartheta_t \perp\!\!\!\perp X_t$ such that $\theta_t X =
f(X_t, \vartheta_t)$ a.s. Extend the result to optional times taking countably many
values.
9. Let $X$ be a process on $\mathbb{R}_+$ with state space $S$, and define $Y_t = (X_t, t)$,
$t \ge 0$. Show that $X$ and $Y$ are simultaneously Markov, and that $Y$ is then
time-homogeneous. Give a relation between the transition kernels for $X$
and $Y$. Express the strong Markov property of $Y$ at a random time $\tau$ in
terms of the process $X$.
10. Let $X$ be a discrete-time Markov process in $S$ with invariant distribution
$\nu$. Show for any measurable set $B \subset S$ that $P_\nu\{X_n \in B \text{ i.o.}\} \ge \nu B$.
Use the result to give an alternative proof of Proposition 8.13. (Hint: Use
Fatou's lemma.)
11. Fix an irreducible Markov chain in $S$ with period $d$. Show that $S$ has
a unique partition into subsets $S_1, \dots, S_d$ such that $p_{ij} = 0$ unless $i \in S_k$
and $j \in S_{k+1}$ for some $k \in \{1, \dots, d\}$, where the addition is defined modulo
$d$.
12. Let $X$ be an irreducible Markov chain with period $d$, and define
$S_1, \dots, S_d$ as above. Show that the restrictions of $(X_{nd})$ to $S_1, \dots, S_d$ are
irreducible, aperiodic and either all positive recurrent or all null recurrent.
In the former case, show that the original chain has a unique invariant
distribution $\nu$. Further show that (26) holds iff $\mu S_k = 1/d$ for all $k$. (Hint: If
$(X_{nd})$ has an invariant distribution $\nu^k$ in $S_k$, then $\nu^{k+1}_j = \sum_i \nu^k_i p_{ij}$ form
an invariant distribution in $S_{k+1}$.)
13. Given a Markov chain $X$ on $S$, define the classes $C_i$ as in Lemma 8.17.
Show that if $j \in C_i$ but $i \notin C_j$ for some $i, j \in S$, then $i$ is transient.
If instead $i \in C_j$ for every $j \in C_i$, show that $C_i$ is irreducible (i.e., the
restriction of $X$ to $C_i$ is an irreducible Markov chain). Further show that
the irreducible sets are disjoint and that every state outside all irreducible
sets is transient.
14. For an arbitrary Markov chain, show that (26) holds iff $\sum_j |p^n_{ij} - \nu_j| \to 0$
for all $i$.
15. Let $X$ be an irreducible, aperiodic Markov chain in $\mathbb{N}$. Show that $X$ is
transient iff $X_n \to \infty$ a.s. under any initial distribution and is null recurrent
iff the same divergence holds in probability but not a.s.
16. For every irreducible, positive recurrent subset $S_k \subset S$, there exists
a unique invariant distribution $\nu_k$ restricted to $S_k$, and every invariant
distribution is a convex combination $\sum_k c_k \nu_k$.
17. Show that a Markov chain on a finite state space $S$ has at least one
irreducible set and one invariant distribution. (Hint: Starting from any
$i_0 \in S$, choose $i_1 \in C_{i_0}$, $i_2 \in C_{i_1}$, etc. Then $\bigcap_n C_{i_n}$ is irreducible.)
18. Let $X$ and $Y$ be independent Markov processes with transition kernels
$\mu_{s,t}$ and $\nu_{s,t}$. Show that $(X, Y)$ is again Markov with transition kernels
$\mu_{s,t}(x, \cdot) \otimes \nu_{s,t}(y, \cdot)$. (Hint: Compute the finite-dimensional distributions
from Proposition 8.2, or use Proposition 6.8 with no computations.)
19. Let $X$ and $Y$ be independent, irreducible Markov chains with periods
$d_1$ and $d_2$. Show that $Z = (X, Y)$ is irreducible iff $d_1$ and $d_2$ have greatest
common divisor 1 and that $Z$ then has period $d_1 d_2$.
20. State and prove a discrete-time version of Theorem 8.23. Further
simplify the continuous-time proof when S is countable.
Chapter 9
Random Walks and Renewal Theory
Recurrence and transience; dependence on dimension; general
recurrence criteria; symmetry and duality; Wiener-Hopf factorization;
ladder time and height distribution; stationary renewal
process; renewal theorem
A random walk in $\mathbb{R}^d$ is defined as a discrete-time random process $(S_n)$
evolving by i.i.d. steps $\xi_n = \Delta S_n \equiv S_n - S_{n-1}$. For most purposes we
may take $S_0 = 0$, so that $S_n = \xi_1 + \cdots + \xi_n$ for all $n$. Random walks
may be regarded as the simplest of all Markov processes. Indeed, we recall
from Chapter 8 that random walks are precisely the discrete-time Markov
processes in $\mathbb{R}^d$ that are both space- and time-homogeneous. (In continuous
time, a similar role is played by the so-called Levy processes, to be studied
in Chapter 15.) Despite their simplicity, random walks exhibit many basic
features of Markov processes in discrete time and hence may serve as a good
introduction to the general subject. We shall further see how random walks
enter naturally into the discussion of certain continuous-time phenomena.
Some basic facts about random walks were obtained in previous chapters.
Thus, we established some simple 0-1 laws in Chapter 3, and in Chap-
ters 4 and 5 we proved the ultimate versions of the laws of large numbers
and the central limit theorem, both of which deal with the asymptotic
behavior of $n^{-c} S_n$ for suitable constants $c > 0$. More sophisticated limit
theorems of this type will be derived in Chapters 14-16 and 27, often
through approximation by a Brownian motion or some other Levy process.
Random walks in $\mathbb{R}^d$ are either recurrent or transient, and our first major
task is to derive a recurrence criterion in terms of the transition distribution
$\mu$. We proceed with some striking connections between maximum and return
times, anticipating the arcsine laws of Chapters 13, 14, and 15. This is
followed by a detailed study of ladder times and heights for one-dimensional
random walks, culminating with the Wiener-Hopf factorization and Bax-
ter's formula. Finally, we prove a two-sided version of the renewal theorem,
which describes the asymptotic behavior of the occupation measure and
associated intensity for a transient random walk.
In addition to the already mentioned connections to other chapters,
we note the relevance of renewal theory for the study of continuous-time
Markov chains, as considered in Chapter 12. Renewal processes may fur-
ther be regarded as constituting an elementary subclass of the regenerative
sets, to be studied in full generality in Chapter 22 in connection with local
time and excursion theory.
To begin our systematic discussion of random walks, assume as before
that $S_n = \xi_1 + \cdots + \xi_n$ for all $n \in \mathbb{Z}_+$, where the $\xi_n$ are i.i.d. random
vectors in $\mathbb{R}^d$. The distribution of $(S_n)$ is then determined by the common
distribution $\mu = \mathcal{L}(\xi_n)$ of the increments. By the effective dimension of
$(S_n)$ we mean the dimension of the linear subspace spanned by the support
of $\mu$. For most purposes, we may assume that the effective dimension agrees
with the dimension of the underlying space, since we may otherwise restrict
our attention to the generated subspace.
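The occupation measure of $(S_n)$ is defined as the random measure
$$\eta B = \sum_{n \ge 0} 1\{S_n \in B\}, \qquad B \in \mathcal{B}^d.$$
We also need to consider the corresponding intensity measure
$$(E\eta)B = E(\eta B) = \sum_{n \ge 0} P\{S_n \in B\}, \qquad B \in \mathcal{B}^d.$$
Writing $B_x^\varepsilon = \{y;\ |x - y| < \varepsilon\}$, we may introduce the accessible set $A$, the
mean recurrence set $M$, and the recurrence set $R$, given by
$$A = \bigcap_{\varepsilon > 0} \{x \in \mathbb{R}^d;\ E\eta B_x^\varepsilon > 0\},$$
$$M = \bigcap_{\varepsilon > 0} \{x \in \mathbb{R}^d;\ E\eta B_x^\varepsilon = \infty\},$$
$$R = \bigcap_{\varepsilon > 0} \{x \in \mathbb{R}^d;\ \eta B_x^\varepsilon = \infty \text{ a.s.}\}.$$
The following result gives the basic dichotomy for random walks in $\mathbb{R}^d$.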
Theorem 9.1 (recurrence dichotomy) Let $(S_n)$ be a random walk in $\mathbb{R}^d$,
and define $A$, $M$, and $R$ as above. Then exactly one of these conditions
holds:
(i) $R = M = A$, which is then a closed additive subgroup of $\mathbb{R}^d$;
(ii) $R = M = \emptyset$, and $|S_n| \to \infty$ a.s.
A random walk is said to be recurrent if (i) holds and to be transient
otherwise.
Proof: Since trivially $R \subset M \subset A$, the relations in (i) and (ii) are equivalent
to $A \subset R$ and $M = \emptyset$, respectively. Further note that $A$ is a closed
additive semigroup.
First assume $P\{|S_n| \to \infty\} < 1$, so that $P\{|S_n| < r \text{ i.o.}\} > 0$ for some
$r > 0$. Fix any $\varepsilon > 0$, cover the $r$-ball around 0 by finitely many open balls
$B_1, \dots, B_n$ of radius $\varepsilon/2$, and note that $P\{S_n \in B_k \text{ i.o.}\} > 0$ for at least
one $k$. By the Hewitt-Savage 0-1 law, the latter probability equals 1. Thus,
the optional time $\tau = \inf\{n \ge 0;\ S_n \in B_k\}$ is a.s. finite, and the strong
Markov property at $\tau$ yields
$$1 = P\{S_n \in B_k \text{ i.o.}\} \le P\{|S_{\tau+n} - S_\tau| < \varepsilon \text{ i.o.}\} = P\{|S_n| < \varepsilon \text{ i.o.}\}.$$
Hence, $0 \in R$ in this case.
To extend the latter relation to $A \subset R$, fix any $x \in A$ and $\varepsilon > 0$. By the
strong Markov property at $\sigma = \inf\{n \ge 0;\ |S_n - x| < \varepsilon/2\}$,
$$P\{|S_n - x| < \varepsilon \text{ i.o.}\} \ge P\{\sigma < \infty,\ |S_{\sigma+n} - S_\sigma| < \varepsilon/2 \text{ i.o.}\} = P\{\sigma < \infty\}\, P\{|S_n| < \varepsilon/2 \text{ i.o.}\} > 0,$$
and by the Hewitt-Savage 0-1 law the probability on the left equals 1.
Thus, $x \in R$. The asserted group property will follow if we can prove that
even $-x \in A$. This is clear if we write
$$P\{|S_n + x| < \varepsilon \text{ i.o.}\} = P\{|S_{\sigma+n} - S_\sigma + x| < \varepsilon \text{ i.o.}\} \ge P\{|S_n| < \varepsilon/2 \text{ i.o.}\} = 1.$$
Next assume that $|S_n| \to \infty$ a.s. Fix any $m, k \in \mathbb{N}$, and conclude from
the Markov property at $m$ that
$$P\{|S_m| < r,\ \inf_{n \ge k} |S_{m+n}| \ge r\} \ge P\{|S_m| < r,\ \inf_{n \ge k} |S_{m+n} - S_m| \ge 2r\} = P\{|S_m| < r\}\, P\{\inf_{n \ge k} |S_n| \ge 2r\}.$$
Here the event on the left can occur for at most $k$ different values of $m$,
and therefore
$$P\{\inf_{n \ge k} |S_n| \ge 2r\} \sum_m P\{|S_m| < r\} < \infty, \qquad k \in \mathbb{N}.$$
As $k \to \infty$, the probability on the left tends to 1. Hence, the sum converges,
and we get $E\eta B < \infty$ for any bounded set $B$. This shows that $M = \emptyset$. □
The next result gives some easily verified recurrence criteria.
Theorem 9.2 (recurrence for $d = 1, 2$) A random walk $(S_n)$ in $\mathbb{R}^d$ is
recurrent under each of these conditions:
(i) $d = 1$ and $n^{-1} S_n \overset{P}{\to} 0$;
(ii) $d = 2$, $E\xi_1 = 0$, and $E|\xi_1|^2 < \infty$.
In (i) we recognize the weak law of large numbers, which is characterized
in Theorem 5.16. In particular, the condition is fulfilled when $E\xi_1 = 0$.
By contrast, $E\xi_1 \in (0, \infty]$ implies $S_n \to \infty$ a.s. by the strong law of large
numbers, so in that case $(S_n)$ is transient.
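For the simple symmetric walks on $\mathbb{Z}$ and $\mathbb{Z}^2$, recurrence can be seen from the divergence of $\sum_n P\{S_n = 0\}$, which places the origin in the mean recurrence set $M$. The sketch below uses the standard formulas $P\{S_{2n} = 0\} = 2^{-2n}\binom{2n}{n}$ in one dimension and its square in two dimensions; the second identity is a well-known fact about the planar lattice walk and is an assumption relative to this text, as are the function name and the cutoffs.

```python
def return_probabilities(n_max):
    """u_n = P{S_2n = 0} for the simple symmetric walk on Z, for n = 0..n_max.

    Uses the recursion u_n = u_{n-1} * (2n - 1) / (2n), which follows from
    u_n = binom(2n, n) / 4**n.
    """
    u = [1.0]
    for n in range(1, n_max + 1):
        u.append(u[-1] * (2 * n - 1) / (2 * n))
    return u


for n_max in (100, 1000, 10000):
    u = return_probabilities(n_max)
    sum_d1 = sum(u)                    # partial sum of P{S_2n = 0} for d = 1
    sum_d2 = sum(x * x for x in u)     # partial sum for d = 2, where P{S_2n = 0} = u_n**2
    print(n_max, sum_d1, sum_d2)       # both grow without bound, the second very slowly
```

The one-dimensional partial sums grow like a multiple of $\sqrt{n}$ and the two-dimensional ones like a multiple of $\log n$, which is why recurrence in the plane is a borderline phenomenon.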
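Our proof of Theorem 9.2 is based on the following scaling relation. As
before, $a \lesssim b$ means that $a \le cb$ for some constant $c > 0$.
Lemma 9.3 (scaling) For any random walk $(S_n)$ in $\mathbb{R}^d$,
$$\sum_{n \ge 0} P\{|S_n| \le r\varepsilon\} \lesssim r^d \sum_{n \ge 0} P\{|S_n| \le \varepsilon\}, \qquad r \ge 1, \quad \varepsilon > 0.$$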
Proof: Cover the ball $\{x;\ |x| \le r\varepsilon\}$ by balls $B_1, \dots, B_m$ of radius $\varepsilon/2$,
and note that we can make $m \lesssim r^d$. Introduce the optional times $\tau_k =
\inf\{n;\ S_n \in B_k\}$, $k = 1, \dots, m$, and conclude from the strong Markov
property that
$$\sum_n P\{|S_n| \le r\varepsilon\} \le \sum_k \sum_n P\{S_n \in B_k\}$$
$$\le \sum_k \sum_n P\{|S_{\tau_k + n} - S_{\tau_k}| \le \varepsilon;\ \tau_k < \infty\}$$
$$= \sum_k P\{\tau_k < \infty\} \sum_n P\{|S_n| \le \varepsilon\}$$
$$\lesssim r^d \sum_n P\{|S_n| \le \varepsilon\}.$$
□
Proof of Theorem 9.2 (Chung and Ornstein): (i) Fix any $\varepsilon > 0$ and $r \ge 1$,
and conclude from Lemma 9.3 that
$$\sum_n P\{|S_n| \le \varepsilon\} \gtrsim r^{-1} \sum_n P\{|S_n| \le r\varepsilon\} = \int_0^\infty P\{|S_{[rt]}| \le r\varepsilon\}\, dt.$$
Here the integrand on the right tends to 1 as $r \to \infty$, so the integral tends
to $\infty$ by Fatou's lemma, and the recurrence of $(S_n)$ follows by Theorem
9.1.
(ii) We may assume that $(S_n)$ is two-dimensional, since the one-dimensional
case is already covered by part (i). By the central limit theorem we
have $n^{-1/2} S_n \overset{d}{\to} \zeta$, where the random vector $\zeta$ has a nondegenerate normal
distribution. In particular, $P\{|\zeta| \le c\} \gtrsim c^2$ for bounded $c > 0$. Now fix any
$\varepsilon > 0$ and $r \ge 1$, and conclude from Lemma 9.3 that
$$\sum_n P\{|S_n| \le \varepsilon\} \gtrsim r^{-2} \sum_n P\{|S_n| \le r\varepsilon\} = \int_0^\infty P\{|S_{[r^2 t]}| \le r\varepsilon\}\, dt.$$
As $r \to \infty$, we get by Fatou's lemma
$$\sum_n P\{|S_n| \le \varepsilon\} \gtrsim \int_0^\infty P\{|\zeta| \le \varepsilon t^{-1/2}\}\, dt \gtrsim \varepsilon^2 \int_1^\infty t^{-1}\, dt = \infty,$$
and the recurrence follows again by Theorem 9.1.
□
Our next aim is to derive a general recurrence criterion, stated in terms
of the characteristic function $\hat\mu$ of $\mu$. Write $B_\varepsilon = \{x \in \mathbb{R}^d;\ |x| < \varepsilon\}$.
Theorem 9.4 (recurrence criterion, Chung and Fuchs) Let $(S_n)$ be a random
walk in $\mathbb{R}^d$ based on some distribution $\mu$, and fix any $\varepsilon > 0$. Then $(S_n)$
is recurrent iff
$$\sup_{0 < r < 1} \int_{B_\varepsilon} \Re \frac{1}{1 - r\hat\mu_t}\, dt = \infty. \tag{1}$$
The proof is based on an elementary identity.
Lemma 9.5 (Parseval) Let $\mu$ and $\nu$ be probability measures on $\mathbb{R}^d$ with
characteristic functions $\hat\mu$ and $\hat\nu$. Then $\int \hat\mu\, d\nu = \int \hat\nu\, d\mu$.
Proof: Use Fubini's theorem.
□
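Proof of Theorem 9.4: The function $f(s) = (1 - |s|)_+$ has Fourier transform
$\hat f(t) = 2t^{-2}(1 - \cos t)$, so the tensor product $f^{\otimes d}(s) = \prod_{k \le d} f(s_k)$ on
$\mathbb{R}^d$ has Fourier transform $\hat f^{\otimes d}(t) = \prod_{k \le d} \hat f(t_k)$. Writing $\mu^{*n} = \mathcal{L}(S_n)$, we
get by Lemma 9.5 for any $a > 0$ and $n \in \mathbb{Z}_+$
$$\int \hat f^{\otimes d}(x/a)\, \mu^{*n}(dx) = a^d \int f^{\otimes d}(at)\, \hat\mu_t^n\, dt.$$
By Fubini's theorem it follows that, for any $r \in (0, 1)$,
$$\int \hat f^{\otimes d}(x/a) \sum_{n \ge 0} r^n \mu^{*n}(dx) = a^d \int \frac{f^{\otimes d}(at)}{1 - r\hat\mu_t}\, dt. \tag{2}$$
Now assume that (1) is false. Taking $\delta = \varepsilon^{-1} d^{1/2}$, we get by (2)
$$\sum_n P\{|S_n| < \delta\} = \sum_n \mu^{*n}(B_\delta) \lesssim \int \hat f^{\otimes d}(x/\delta) \sum_n \mu^{*n}(dx)$$
$$= \delta^d \sup_{r < 1} \int \Re \frac{f^{\otimes d}(\delta t)}{1 - r\hat\mu_t}\, dt \lesssim \varepsilon^{-d} \sup_{r < 1} \int_{B_\varepsilon} \Re \frac{dt}{1 - r\hat\mu_t} < \infty,$$
and so $(S_n)$ is transient by Theorem 9.1.
To prove the converse, we note that $\hat f^{\otimes d}$ has Fourier transform $(2\pi)^d f^{\otimes d}$.
Hence, (2) remains true with $f$ and $\hat f$ interchanged, apart from a factor
$(2\pi)^d$ on the left. If $(S_n)$ is transient, then for any $\varepsilon > 0$ with $\delta = \varepsilon^{-1} d^{1/2}$
we get
$$\sup_{r < 1} \int_{B_\varepsilon} \Re \frac{dt}{1 - r\hat\mu_t} \lesssim \sup_{r < 1} \int \Re \frac{\hat f^{\otimes d}(t/\varepsilon)}{1 - r\hat\mu_t}\, dt$$
$$\lesssim \varepsilon^d \int f^{\otimes d}(\varepsilon x) \sum_n \mu^{*n}(dx) \le \varepsilon^d \sum_n \mu^{*n}(B_\delta) < \infty.$$
□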
In particular, we note that if $\mu$ is symmetric in the sense that $\xi_1 \overset{d}{=} -\xi_1$,
then $\hat\mu$ is real valued, and the last criterion reduces to
$$\int_{B_\varepsilon} \frac{dt}{1 - \hat\mu_t} = \infty.$$
By a symmetrization of $(S_n)$ we mean a random walk $\tilde S_n = S_n - S_n'$, $n \ge 0$,
where $(S_n')$ is an independent copy of $(S_n)$. The following result relates the
recurrence behavior of $(S_n)$ and $(\tilde S_n)$.
Corollary 9.6 (symmetrization) If a random walk $(S_n)$ is recurrent, then
so is the symmetrized version $(\tilde S_n)$.
Proof: Noting that $(\Re z)(\Re z^{-1}) \le 1$ for any complex number $z \ne 0$, we
get
$$\Re \frac{1}{1 - r\hat\mu^2} \le \frac{1}{1 - r\,\Re\hat\mu^2} \le \frac{1}{1 - r|\hat\mu|^2}.$$
Thus, if $(\tilde S_n)$ is transient, then so is the random walk $(S_{2n})$ by Theorem
9.4. But then $|S_{2n}| \to \infty$ a.s. by Theorem 9.1, and so $|S_{2n+1}| \to \infty$ a.s. By
combination, $|S_n| \to \infty$ a.s., which means that $(S_n)$ is transient. □
The following sufficient conditions for recurrence or transience are often
more convenient for applications.
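Corollary 9.7 (sufficient conditions) Fix any $\varepsilon > 0$. Then $(S_n)$ is
recurrent if
$$\int_{B_\varepsilon} \Re \frac{dt}{1 - \hat\mu_t} = \infty \tag{3}$$
and transient if
$$\int_{B_\varepsilon} \frac{dt}{1 - \Re\hat\mu_t} < \infty. \tag{4}$$
Proof: First assume (3). By Fatou's lemma, we get for any sequence $r_n \uparrow 1$
$$\liminf_{n \to \infty} \int_{B_\varepsilon} \Re \frac{1}{1 - r_n\hat\mu} \ge \int_{B_\varepsilon} \lim_{n \to \infty} \Re \frac{1}{1 - r_n\hat\mu} = \int_{B_\varepsilon} \Re \frac{1}{1 - \hat\mu} = \infty.$$
Thus, (1) holds, and $(S_n)$ is recurrent.
Now assume (4) instead. Decreasing $\varepsilon$ if necessary, we may further assume
that $\Re\hat\mu \ge 0$ on $B_\varepsilon$. As before, we get
$$\int_{B_\varepsilon} \Re \frac{1}{1 - r\hat\mu} \le \int_{B_\varepsilon} \frac{1}{1 - r\,\Re\hat\mu} \le \int_{B_\varepsilon} \frac{1}{1 - \Re\hat\mu} < \infty,$$
and so (1) fails. Thus, $(S_n)$ is transient. □
The last result enables us to supplement Theorem 9.2 with some
conclusive information for $d \ge 3$.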
Theorem 9.8 (transience for $d \ge 3$) Any random walk of effective
dimension $d \ge 3$ is transient.
Proof: We may assume that the symmetrized distribution is again $d$-dimensional,
since $\mu$ is otherwise supported by some hyperplane outside
the origin, and the transience follows by the strong law of large numbers.
By Corollary 9.6, it is enough to prove that the symmetrized random walk
$(\tilde S_n)$ is transient, and so we may assume that $\mu$ is symmetric. Considering
the conditional distributions on $B_r$ and $B_r^c$ for large enough $r > 0$, we may
write $\mu$ as a convex combination $c\mu_1 + (1 - c)\mu_2$, where $\mu_1$ is symmetric and
$d$-dimensional with bounded support. Letting $(r_{ij})$ denote the covariance
matrix of $\mu_1$, we get as in Lemma 5.10
$$\hat\mu_1(t) = 1 - \tfrac{1}{2} \sum_{i,j} r_{ij}\, t_i t_j + o(|t|^2), \qquad t \to 0.$$
Since the matrix $(r_{ij})$ is positive definite, it follows that $1 - \hat\mu_1(t) \gtrsim |t|^2$ for
small enough $|t|$, say for $t \in B_\varepsilon$. A similar relation then holds for $\hat\mu$, and so
$$\int_{B_\varepsilon} \frac{dt}{1 - \hat\mu_t} \lesssim \int_{B_\varepsilon} \frac{dt}{|t|^2} \lesssim \int_0^\varepsilon r^{d-3}\, dr < \infty.$$
Thus, $(S_n)$ is transient by Theorem 9.4. □
We turn to a more detailed study of the one-dimensional random walk
$S_n = \xi_1 + \cdots + \xi_n$, $n \in \mathbb{Z}_+$. Say that $(S_n)$ is simple if $|\xi_1| = 1$ a.s. For a
simple, symmetric random walk $(S_n)$ we note that
$$u_n \equiv P\{S_{2n} = 0\} = 2^{-2n} \binom{2n}{n}, \qquad n \in \mathbb{Z}_+. \tag{5}$$
The following result gives a surprising connection between the probabilities
$u_n$ and the distribution of last return to the origin.
Proposition 9.9 (last return, Feller) Let $(S_n)$ be a simple, symmetric
random walk in $\mathbb{Z}$, put $\sigma_n = \max\{k \le n;\ S_{2k} = 0\}$, and define $u_n$ by (5).
Then
$$P\{\sigma_n = k\} = u_k\, u_{n-k}, \qquad 0 \le k \le n.$$
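For small $n$ the statement of Proposition 9.9 can be confirmed by brute force. The following sketch enumerates all $2^{2n}$ equally likely paths of the simple symmetric walk, records the last return time, and compares the exact frequencies with $u_k u_{n-k}$. The function names are chosen for this example only.

```python
from fractions import Fraction
from itertools import product
from math import comb


def u(n):
    """u_n = P{S_2n = 0} = binom(2n, n) / 4**n, as an exact fraction."""
    return Fraction(comb(2 * n, n), 4 ** n)


def last_return_distribution(n):
    """Exact law of sigma_n = max{k <= n : S_2k = 0} by enumerating all paths."""
    counts = [0] * (n + 1)
    for steps in product((-1, 1), repeat=2 * n):
        position = 0
        last = 0                      # S_0 = 0, so k = 0 always qualifies
        for index, step in enumerate(steps, start=1):
            position += step
            if position == 0:         # can only happen at even times index = 2k
                last = index // 2
        counts[last] += 1
    return [Fraction(c, 4 ** n) for c in counts]


for n in range(1, 6):
    observed = last_return_distribution(n)
    predicted = [u(k) * u(n - k) for k in range(n + 1)]
    print(n, observed == predicted)   # True for every n
```

The symmetry of the law under $k \mapsto n - k$, visible in the product form, is the discrete counterpart of the arcsine law.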
Our proof will be based on a simple symmetry property, which will also
appear in a continuous-time version as Lemma 13.14.
Lemma 9.10 (reflection principle, André) For any symmetric random
walk $(S_n)$ and optional time $\tau$, we have $(\tilde S_n) \overset{d}{=} (S_n)$, where
$$\tilde S_n = S_{n \wedge \tau} - (S_n - S_{n \wedge \tau}), \qquad n \ge 0.$$
Proof: We may clearly assume that $\tau < \infty$ a.s. Writing $S_n' = S_{\tau+n} - S_\tau$,
$n \in \mathbb{Z}_+$, we get by the strong Markov property $S \overset{d}{=} S' \perp\!\!\!\perp (S^\tau, \tau)$, and by
symmetry $-S \overset{d}{=} S$. Hence, by combination $(-S', S^\tau, \tau) \overset{d}{=} (S', S^\tau, \tau)$, and
the assertion follows by suitable assembly. □
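For the simple symmetric walk on a finite horizon, the reflection principle amounts to a bijection between equally likely paths. The sketch below reflects each path after its first visit to level 1 and checks that the set of paths is mapped onto itself, together with the counting identity behind the step $P\{M \ge 1,\ S \le 0\} = P\{S \ge 2\}$ at the final time. The horizon, the choice of level, and the function names are assumptions for the illustration.

```python
from itertools import product


def all_paths(n_steps):
    """All paths (S_0, ..., S_n) of the simple walk started at 0."""
    for steps in product((-1, 1), repeat=n_steps):
        path = [0]
        for step in steps:
            path.append(path[-1] + step)
        yield tuple(path)


def first_hit(path, level):
    """First time the path equals `level`, or None if it never does."""
    for n, s in enumerate(path):
        if s == level:
            return n
    return None


def reflected(path, level=1):
    """Keep the path up to tau = first hit of `level`; afterwards use 2*S_tau - S_n."""
    tau = first_hit(path, level)
    if tau is None:
        return path                   # tau lies beyond the horizon: nothing changes
    return tuple(s if n <= tau else 2 * path[tau] - s for n, s in enumerate(path))


N = 7
paths = list(all_paths(N))
images = {reflected(p) for p in paths}
print(images == set(paths))           # True: reflection permutes the equally likely paths

hit_and_end_low = sum(1 for p in paths if max(p) >= 1 and p[-1] <= 0)
end_high = sum(1 for p in paths if p[-1] >= 2)
print(hit_and_end_low, end_high)      # equal counts
```

Since the reflected path has the same first hitting time of level 1, the map is an involution, which is the finite-horizon form of the equality in distribution.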
Proof of Proposition 9.9: By the Markov property at time $2k$, we get
$$P\{\sigma_n = k\} = P\{S_{2k} = 0\}\, P\{\sigma_{n-k} = 0\}, \qquad 0 \le k \le n,$$
which reduces the proof to the case when $k = 0$. Thus, it remains to show
that
$$P\{S_2 \ne 0, \dots, S_{2n} \ne 0\} = P\{S_{2n} = 0\}, \qquad n \in \mathbb{N}.$$
By the Markov property at time 1, the left-hand side equals
$$\tfrac{1}{2} P\{\min_{k < 2n} S_k = 0\} + \tfrac{1}{2} P\{\max_{k < 2n} S_k = 0\} = P\{M_{2n-1} = 0\},$$
where $M_n = \max_{k \le n} S_k$. Using Lemma 9.10 with $\tau = \inf\{k;\ S_k = 1\}$, we
get
$$1 - P\{M_{2n-1} = 0\} = P\{M_{2n-1} \ge 1\}$$
$$= P\{M_{2n-1} \ge 1,\ S_{2n-1} \ge 1\} + P\{M_{2n-1} \ge 1,\ S_{2n-1} \le 0\}$$
$$= P\{S_{2n-1} \ge 1\} + P\{S_{2n-1} \ge 2\}$$
$$= 1 - P\{S_{2n-1} = 1\} = 1 - P\{S_{2n} = 0\}.$$
□
We continue with an even more striking connection between the max-
imum of a symmetric random walk and the last return probabilities in
Proposition 9.9. Related results for Brownian motion and more general
random walks will appear in Theorems 13.16 and 14.11.
Theorem 9.11 (first maximum, Sparre-Andersen) Let $(S_n)$ be a random
walk based on a symmetric, diffuse distribution, put $M_n = \max_{k \le n} S_k$, and
write $\tau_n = \min\{k \ge 0;\ S_k = M_n\}$. Define $\sigma_n$ as in Proposition 9.9 in terms
of a simple, symmetric random walk. Then $\tau_n \overset{d}{=} \sigma_n$ for every $n \ge 0$.
Here and below, we shall use the relation
\[ (S_1, \ldots, S_n) \overset{d}{=} (S_n - S_{n-1}, \ldots, S_n - S_0), \quad n \in \mathbb{N}, \tag{6} \]
valid for any random walk $(S_n)$. The formula is obvious from the fact that
$(\xi_1, \ldots, \xi_n) \overset{d}{=} (\xi_n, \ldots, \xi_1)$.
Proof of Theorem 9.11: By the symmetry of $(S_n)$ together with (6), we
have
\[ v_k \equiv P\{\tau_k = 0\} = P\{\tau_k = k\}, \quad k \ge 0. \tag{7} \]
Using the Markov property at time $k$, we hence obtain
\[ P\{\tau_n = k\} = P\{\tau_k = k\}\, P\{\tau_{n-k} = 0\} = v_k v_{n-k}, \quad 0 \le k \le n. \tag{8} \]
Clearly $\sigma_0 = \tau_0 = 0$. Proceeding by induction, assume that $\sigma_k \overset{d}{=} \tau_k$ and
hence $u_k = v_k$ for all $k < n$. Comparing (8) with Proposition 9.9, we obtain
$P\{\sigma_n = k\} = P\{\tau_n = k\}$ for $0 < k < n$, and by (7) the equality extends to
$k = 0$ and $n$. Thus, $\sigma_n \overset{d}{=} \tau_n$. □
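Theorem 9.11 can be illustrated numerically. The sketch below is a Monte Carlo experiment, not a proof: it assumes standard normal steps (one convenient symmetric, diffuse choice) and compares the empirical law of the first maximum index $\tau_n$ with the exact law $u_k u_{n-k}$ of $\sigma_n$ from Proposition 9.9. The names `first_max_index` and `compare`, the seed, and the sample size are illustrative assumptions.

```python
import random
from math import comb

def first_max_index(n, rng):
    """tau_n = min{k >= 0 : S_k = M_n} for a walk with N(0, 1) steps and S_0 = 0."""
    s, best, arg = 0.0, 0.0, 0
    for k in range(1, n + 1):
        s += rng.gauss(0.0, 1.0)
        if s > best:          # strict inequality keeps the first maximum
            best, arg = s, k
    return arg

def u(n):
    """u_n = 2^{-2n} * C(2n, n), as in (5)."""
    return comb(2 * n, n) / 4 ** n

def compare(n=4, trials=40000, seed=0):
    """Return (empirical law of tau_n, exact law of sigma_n)."""
    rng = random.Random(seed)
    counts = [0] * (n + 1)
    for _ in range(trials):
        counts[first_max_index(n, rng)] += 1
    empirical = [c / trials for c in counts]
    exact = [u(k) * u(n - k) for k in range(n + 1)]
    return empirical, exact

empirical, exact = compare()
for k, (e, x) in enumerate(zip(empirical, exact)):
    print(k, round(e, 4), round(x, 4))
```

Any other symmetric, diffuse step distribution should give the same limiting frequencies, which is the content of the theorem.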
For a general one-dimensional random walk $(S_n)$, we may introduce the
ascending ladder times $\tau_1, \tau_2, \ldots$, given recursively by
\[ \tau_n = \inf\{k > \tau_{n-1};\ S_k > S_{\tau_{n-1}}\}, \quad n \in \mathbb{N}, \tag{9} \]
starting with $\tau_0 = 0$. The associated ascending ladder heights are defined as
the random variables $S_{\tau_n}$, $n \in \mathbb{N}$, where $S_\infty$ may be interpreted as $\infty$. In a
similar way, we may define the descending ladder times $\tau_n^-$ and heights $S_{\tau_n^-}$,
$n \in \mathbb{N}$. The times $\tau_n$ and $\tau_n^-$ are clearly optional. By the strong Markov
property, we conclude that the pairs $(\tau_n, S_{\tau_n})$ and $(\tau_n^-, S_{\tau_n^-})$ form possibly
terminating random walks in $\bar{\mathbb{R}}^2$.
Replacing the relation $S_k > S_{\tau_{n-1}}$ in (9) by $S_k \ge S_{\tau_{n-1}}$, we obtain the
weak ascending ladder times $\sigma_n$ and heights $S_{\sigma_n}$. Similarly, we may introduce
the weak descending ladder times $\sigma_n^-$ and heights $S_{\sigma_n^-}$. The mentioned
sequences are connected by a pair of simple but powerful duality relations.
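The recursion (9) and its weak variant are easy to trace on a concrete path. The following minimal sketch computes the ascending ladder times of a finite path segment; the function name `ladder_times` and the sample path are illustrative, and only ladder epochs inside the observed horizon are reported.

```python
def ladder_times(path, strict=True):
    """Ascending ladder times of path = [S_0, S_1, ...], starting with time 0.

    strict=True uses S_k > S_{tau_{n-1}} as in (9);
    strict=False uses S_k >= S_{sigma_{n-1}} (weak ladder times).
    """
    times = [0]
    level = path[0]
    for k in range(1, len(path)):
        if path[k] > level or (not strict and path[k] == level):
            times.append(k)
            level = path[k]
    return times

path = [0, -1, 0, 1, 1, 0, 2]
strict_times = ladder_times(path, strict=True)
weak_times = ladder_times(path, strict=False)
print(strict_times, [path[k] for k in strict_times])
print(weak_times, [path[k] for k in weak_times])
```

Every strict ladder time is also a weak one, so the strict sequence is a subsequence of the weak sequence. Descending ladder times can be obtained by applying the same function to the negated path.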
Lemma 9.12 (duality) Let $\eta$, $\eta'$, $\zeta$, and $\zeta'$ denote the occupation measures
of the sequences $(S_{\tau_n})$, $(S_{\sigma_n})$, $(S_n;\ n < \tau_1^-)$, and $(S_n;\ n < \sigma_1^-)$,
respectively. Then $E\eta = E\zeta'$ and $E\eta' = E\zeta$.
Proof: By (6) we have for any $B \in \mathcal{B}(0,\infty)$ and $n \in \mathbb{N}$
\[ \begin{aligned}
P\{S_1 \wedge \cdots \wedge S_{n-1} > 0,\ S_n \in B\} &= P\{S_1 \vee \cdots \vee S_{n-1} < S_n \in B\} \\
&= \sum\nolimits_k P\{\tau_k = n,\ S_{\tau_k} \in B\}. \qquad (10)
\end{aligned} \]
Summing over $n \ge 1$ gives $E\zeta' B = E\eta B$, and the first assertion follows.
The proof of the second assertion is similar. □
The last lemma yields some interesting information. For example, in
a simple symmetric random walk, the expected number of visits to an
arbitrary state $k \ne 0$ before the first return to 0 is constant and equal
to 1. In particular, the mean recurrence time is infinite, and so (Sn) is a
null-recurrent Markov chain.
The following result shows how the asymptotic behavior of a random
walk is related to the expected values of the ladder times.
Proposition 9.13 (fluctuations and mean ladder times) For any nondegenerate
random walk $(S_n)$ in $\mathbb{R}$, exactly one of these cases occurs:
(i) $S_n \to \infty$ a.s. and $E\tau_1 < \infty$;
(ii) $S_n \to -\infty$ a.s. and $E\tau_1^- < \infty$;
(iii) $\limsup_n (\pm S_n) = \infty$ a.s. and $E\sigma_1 = E\sigma_1^- = \infty$.
Proof: By Corollary 3.17 there are only three possibilities: $S_n \to \infty$ a.s.,
$S_n \to -\infty$ a.s., and $\limsup_n (\pm S_n) = \infty$ a.s. In the first case $\sigma_n^- < \infty$ for
finitely many $n$, say for $n < \kappa < \infty$. Here $\kappa$ is geometrically distributed,
and so $E\tau_1 = E\kappa < \infty$ by Lemma 9.12. The proof in case (ii) is similar.
In case (iii) the variables $\tau_n$ and $\tau_n^-$ are all finite, and Lemma 9.12 yields
$E\sigma_1 = E\sigma_1^- = \infty$. □
Next we shall see how the asymptotic behavior of a random walk is
related to the expected values of $\xi_1$ and $S_{\tau_1}$. Here we define $E\xi = E\xi^+ -
E\xi^-$ whenever $E\xi^+ \wedge E\xi^- < \infty$.
Proposition 9.14 (fluctuations and mean ladder heights) If $(S_n)$ is a
nondegenerate random walk in $\mathbb{R}$, then
(i) $E\xi_1 = 0$ implies $\limsup_n (\pm S_n) = \infty$ a.s.;
(ii) $E\xi_1 \in (0,\infty]$ implies $S_n \to \infty$ a.s. and $ES_{\tau_1} = E\tau_1\, E\xi_1$;
(iii) $E\xi_1^+ = E\xi_1^- = \infty$ implies $ES_{\tau_1} = -ES_{\tau_1^-} = \infty$.
The first assertion is an immediate consequence of Theorem 9.2 (i). It
can also be obtained more directly, as follows.
Proof: (i) By symmetry, we may assume that $\limsup_n S_n = \infty$ a.s. If
$E\tau_1 < \infty$, then the law of large numbers applies to each of the three ratios
in the equation
\[ \frac{S_{\tau_n}}{\tau_n} \cdot \frac{\tau_n}{n} = \frac{S_{\tau_n}}{n}, \quad n \in \mathbb{N}, \]
and we get $0 = E\xi_1\, E\tau_1 = ES_{\tau_1} > 0$. The contradiction shows that $E\tau_1 =
\infty$, and so $\liminf_n S_n = -\infty$ by Proposition 9.13.
(ii) In this case $S_n \to \infty$ a.s. by the law of large numbers, and the formula
$ES_{\tau_1} = E\tau_1\, E\xi_1$ follows as before.
(iii) This is clear from the relations $S_{\tau_1} \ge \xi_1^+$ and $S_{\tau_1^-} \le -\xi_1^-$. □
We proceed with a celebrated factorization, which provides some more
detailed information about the distributions of ladder times and heights.
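Here we write $\chi^\pm$ for the possibly defective distributions of the pairs
$(\tau_1, S_{\tau_1})$ and $(\tau_1^-, S_{\tau_1^-})$, respectively, and let $\psi^\pm$ denote the corresponding
distributions of $(\sigma_1, S_{\sigma_1})$ and $(\sigma_1^-, S_{\sigma_1^-})$. Put $\chi_n^\pm = \chi^\pm(\{n\} \times \cdot)$ and
$\psi_n^\pm = \psi^\pm(\{n\} \times \cdot)$. Let us finally introduce the measure $\chi^0$ on $\mathbb{N}$, given by
\[ \begin{aligned}
\chi_n^0 &\equiv P\{S_1 \wedge \cdots \wedge S_{n-1} > 0 = S_n\} \\
&= P\{S_1 \vee \cdots \vee S_{n-1} < 0 = S_n\}, \quad n \in \mathbb{N},
\end{aligned} \]
where the second equality holds by (6).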
Theorem 9.15 (Wiener-Hopf factorization) For any random walk in $\mathbb{R}$
based on some distribution $\mu$, we have
\[ \delta_0 - \delta_1 \otimes \mu = (\delta_0 - \chi^+) * (\delta_0 - \psi^-) = (\delta_0 - \psi^+) * (\delta_0 - \chi^-), \tag{11} \]
\[ \delta_0 - \psi^\pm = (\delta_0 - \chi^\pm) * (\delta_0 - \chi^0). \tag{12} \]
Note that the convolutions in (11) are defined on the space $\mathbb{Z}_+ \times \mathbb{R}$,
whereas those in (12) can be regarded as defined on $\mathbb{Z}_+$. Alternatively, we
may consider $\chi^0$ as a measure on $\mathbb{N} \times \{0\}$, and interpret all convolutions
as defined on $\mathbb{Z}_+ \times \mathbb{R}$.
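Proof: Define the measures $\rho_1, \rho_2, \ldots$ on $(0,\infty)$ by
\[ \begin{aligned}
\rho_n B &\equiv P\{S_1 \wedge \cdots \wedge S_{n-1} > 0,\ S_n \in B\} \\
&= E \sum\nolimits_k 1\{\tau_k = n,\ S_{\tau_k} \in B\}, \quad n \in \mathbb{N},\ B \in \mathcal{B}(0,\infty), \qquad (13)
\end{aligned} \]
where the second equality holds by (10). Put $\rho_0 = \delta_0$, and regard the
sequence $\rho = (\rho_n)$ as a measure on $\mathbb{Z}_+ \times (0,\infty)$. Noting that the corresponding
measures on $\mathbb{R}$ equal $\rho_n + \psi_n^-$ and using the Markov property at
time $n - 1$, we get
\[ \rho_n + \psi_n^- = \rho_{n-1} * \mu = (\rho * (\delta_1 \otimes \mu))_n, \quad n \in \mathbb{N}. \tag{14} \]
Applying the strong Markov property at $\tau_1$ to the second expression in
(13), we see that also
\[ \rho_n = \sum_{k=1}^{n} \chi_k^+ * \rho_{n-k} = (\chi^+ * \rho)_n, \quad n \in \mathbb{N}. \tag{15} \]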
Recalling the values at zero, we get from (14) and (15)
\[ \rho + \psi^- = \delta_0 + \rho * (\delta_1 \otimes \mu), \qquad \rho = \delta_0 + \chi^+ * \rho. \]
Eliminating $\rho$ between the two equations yields the first relation in (11),
and the second relation follows by symmetry.
To prove (12), we note that the restriction of $\psi^+$ to $(0,\infty)$ equals $\psi_n^+ - \chi_n^0$.
Thus, for any $B \in \mathcal{B}(0,\infty)$,
\[ (\chi_n^+ - \psi_n^+ + \chi_n^0) B = P\{\max\nolimits_{k<n} S_k = 0,\ S_n \in B\}. \]
Decomposing the event on the right according to the time of first return to
0, we get
\[ \chi_n^+ - \psi_n^+ + \chi_n^0 = \sum_{k=1}^{n-1} \chi_k^0\, \chi_{n-k}^+ = (\chi^0 * \chi^+)_n, \quad n \in \mathbb{N}, \]
and so $\chi^+ - \psi^+ + \chi^0 = \chi^0 * \chi^+$, which is equivalent to the plus-sign version
of (12). The minus-sign version follows by symmetry. □
The preceding factorization yields in particular an explicit formula for
the joint distribution of the first ladder time and height.
Theorem 9.16 (ladder distributions, Sparre-Andersen, Baxter) If $(S_n)$
is a random walk in $\mathbb{R}$, then for $|s| < 1$ and $u \ge 0$,
\[ E\, s^{\tau_1} \exp(-u S_{\tau_1}) = 1 - \exp\Bigl\{ -\sum_{n=1}^{\infty} \frac{s^n}{n}\, E[e^{-uS_n};\ S_n > 0] \Bigr\}. \tag{16} \]
A similar relation holds for $(\sigma_1, S_{\sigma_1})$ with $S_n > 0$ replaced by $S_n \ge 0$.
Proof: Introduce the mixed generating and characteristic functions
\[ \hat\chi^+_{s,t} = E\, s^{\tau_1} \exp(itS_{\tau_1}), \qquad \hat\psi^-_{s,t} = E\, s^{\sigma_1^-} \exp(itS_{\sigma_1^-}), \]
and note that the first relation in (11) is equivalent to
\[ 1 - s\hat\mu_t = (1 - \hat\chi^+_{s,t})(1 - \hat\psi^-_{s,t}), \quad |s| < 1,\ t \in \mathbb{R}. \]
Taking logarithms and expanding in Taylor series, we obtain
\[ \sum\nolimits_n n^{-1} (s\hat\mu_t)^n = \sum\nolimits_n n^{-1} (\hat\chi^+_{s,t})^n + \sum\nolimits_n n^{-1} (\hat\psi^-_{s,t})^n. \]
For fixed $s \in (-1, 1)$, this equation is of the form $\hat\nu = \hat\nu^+ + \hat\nu^-$, where $\nu$ and
$\nu^\pm$ are bounded signed measures on $\mathbb{R}$, $(0,\infty)$, and $(-\infty, 0]$, respectively. By
the uniqueness theorem for characteristic functions we get $\nu = \nu^+ + \nu^-$. In
particular, $\nu^+$ equals the restriction of $\nu$ to $(0,\infty)$. Thus, the corresponding
Laplace transforms agree, and (16) follows by summation of a Taylor series
for the logarithm. A similar argument yields the formula for $(\sigma_1, S_{\sigma_1})$. □
From the last result we may easily obtain expressions for the probability
that a random walk stays negative or nonpositive, and also deduce criteria
for its divergence to -00.
Corollary 9.17 (negativity and divergence to $-\infty$) For any random walk
$(S_n)$ in $\mathbb{R}$, we have
\[ P\{\tau_1 = \infty\} = (E\sigma_1^-)^{-1} = \exp\Bigl\{ -\sum\nolimits_{n \ge 1} n^{-1} P\{S_n > 0\} \Bigr\}, \tag{17} \]
\[ P\{\sigma_1 = \infty\} = (E\tau_1^-)^{-1} = \exp\Bigl\{ -\sum\nolimits_{n \ge 1} n^{-1} P\{S_n \ge 0\} \Bigr\}. \tag{18} \]
Furthermore, each of these two conditions is equivalent to $S_n \to -\infty$ a.s.:
\[ \sum\nolimits_{n \ge 1} n^{-1} P\{S_n > 0\} < \infty, \qquad \sum\nolimits_{n \ge 1} n^{-1} P\{S_n \ge 0\} < \infty. \]
Proof: The last expression for $P\{\tau_1 = \infty\}$ follows from (16) with $u = 0$
as we let $s \to 1$. Similarly, the formula for $P\{\sigma_1 = \infty\}$ is obtained from
the version of (16) for the pair $(\sigma_1, S_{\sigma_1})$. In particular, $P\{\tau_1 = \infty\} > 0$ iff
the series in (17) converges, and similarly for the condition $P\{\sigma_1 = \infty\} > 0$
in terms of the series in (18). Since both conditions are equivalent to
$S_n \to -\infty$ a.s., the last assertion follows. Finally, the first equalities in (17)
and (18) are obtained most easily from Lemma 9.12, if we note that the
number of strict or weak ladder times $\tau_n < \infty$ or $\sigma_n < \infty$ is geometrically
distributed. □
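Formula (17) can be checked numerically in a case where the left-hand side is known in closed form. The sketch below assumes a walk with steps $+1$ (probability $p < 1/2$) and $-1$ (probability $1 - p$), for which the classical gambler's-ruin computation gives $P\{\tau_1 = \infty\} = 1 - p/(1-p)$; this example and the function names are illustrative and not taken from the text. The series in (17) is truncated after finitely many terms, which is harmless here because $P\{S_n > 0\}$ decays geometrically.

```python
from math import comb, exp

def prob_positive(n, p):
    """P{S_n > 0} for a walk with steps +1 (prob. p) and -1 (prob. 1 - p)."""
    q = 1.0 - p
    # S_n = 2k - n > 0 iff the number k of up-steps exceeds n / 2
    return sum(comb(n, k) * p**k * q**(n - k) for k in range(n // 2 + 1, n + 1))

def escape_probability_series(p, terms=400):
    """Right-hand side of (17), with the series truncated after `terms` summands."""
    return exp(-sum(prob_positive(n, p) / n for n in range(1, terms + 1)))

p = 0.3
closed_form = 1 - p / (1 - p)        # P{the walk never reaches level 1}
from_series = escape_probability_series(p)
print(closed_form, from_series)
```

The two printed values should agree up to floating-point and truncation error.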
We turn to a detailed study of the occupation measure $\eta = \sum_{n \ge 0} \delta_{S_n}$ of
a transient random walk on $\mathbb{R}$, based on transition and initial distributions
$\mu$ and $\nu$. Recall from Theorem 9.1 that the associated intensity measure
$E\eta = \nu * \sum_n \mu^{*n}$ is locally finite. By the strong Markov property, the
sequence $(S_{\tau+n} - S_\tau)$ has the same distribution for every finite optional
time $\tau$. Thus, a similar invariance holds for the occupation measure, and
the associated intensities must agree. A renewal is then said to occur at
time $\tau$, and the whole subject is known as renewal theory. In the special
case when $\mu$ and $\nu$ are supported by $\mathbb{R}_+$, we refer to $\eta$ as a renewal process
based on $\mu$ and $\nu$, and to $E\eta$ as the associated renewal measure. For most
purposes, we may assume that $\nu = \delta_0$; if this is not the case, we say that
$\eta$ is delayed.
The occupation measure $\eta$ is clearly a random measure on $\mathbb{R}$, in the sense
that $\eta B$ is a random variable for every bounded Borel set $B$. From Lemma
12.1 we anticipate the simple fact that the distribution of a random measure
on $\mathbb{R}_+$ is determined by the distributions of the integrals $\eta f = \int f\,d\eta$ for all
$f \in C_K^+(\mathbb{R}_+)$, the space of continuous functions $f: \mathbb{R}_+ \to \mathbb{R}_+$ with bounded
support. For any measure $\mu$ on $\mathbb{R}$ and constant $t \ge 0$, we may introduce
the shifted measure $\theta_t\mu$ on $\mathbb{R}_+$, given by $(\theta_t\mu)B = \mu(B + t)$ for arbitrary
$B \in \mathcal{B}(\mathbb{R}_+)$. A random measure $\eta$ on $\mathbb{R}$ is said to be stationary on $\mathbb{R}_+$ if
$\theta_t\eta \overset{d}{=} \theta_0\eta$, $t \ge 0$.
Given a renewal process $\eta$ based on some distribution $\mu$, we say that the
delayed process $\tilde\eta = \delta_\alpha * \eta$ is a stationary version of $\eta$, if the delay distribution
$\nu = \mathcal{L}(\alpha)$ is such that the random measure $\tilde\eta$ becomes stationary on
$\mathbb{R}_+$. We proceed to show that such a version exists iff $\mu$ has finite mean, in
which case $\nu$ is uniquely determined by $\mu$. Write $\lambda$ for Lebesgue measure
on $\mathbb{R}_+$.
Proposition 9.18 (stationary renewal process) Let $\eta$ be a renewal process
based on some distribution $\mu$ on $\mathbb{R}_+$ with mean $c$. Then $\eta$ has a stationary
version $\tilde\eta$ iff $c \in (0,\infty)$. In that case $E\tilde\eta = c^{-1}\lambda$, and the delay distribution
of $\tilde\eta$ is uniquely given by $\nu = c^{-1}(\delta_0 - \mu) * \lambda$, or
\[ \nu[0,t] = c^{-1} \int_0^t \mu(s,\infty)\,ds, \quad t \ge 0. \tag{19} \]
Proof: By Fubini's theorem,
\[ \begin{aligned}
E\eta &= E\sum\nolimits_{n \ge 0} \delta_{S_n} = \sum\nolimits_{n \ge 0} \mathcal{L}(S_n) = \sum\nolimits_{n \ge 0} \nu * \mu^{*n} \\
&= \nu + \mu * \sum\nolimits_{n \ge 0} \nu * \mu^{*n} = \nu + \mu * E\eta,
\end{aligned} \]
and so $\nu = (\delta_0 - \mu) * E\eta$. If $\eta$ is stationary, then $E\eta$ is shift invariant,
and Theorem 2.6 yields $E\eta = a\lambda$ for some constant $a > 0$. Thus, $\nu =
a(\delta_0 - \mu) * \lambda$, and (19) holds with $c^{-1}$ replaced by $a$. As $t \to \infty$, we get
$1 = ac$ by Lemma 3.4, which implies $c \in (0,\infty)$ and $a = c^{-1}$.
Conversely, assume that $c \in (0,\infty)$, and let $\nu$ be given by (19). Then
\[ \begin{aligned}
E\eta &= \nu * \sum\nolimits_{n \ge 0} \mu^{*n} = c^{-1}(\delta_0 - \mu) * \lambda * \sum\nolimits_{n \ge 0} \mu^{*n} \\
&= c^{-1}\lambda * \Bigl\{ \sum\nolimits_{n \ge 0} \mu^{*n} - \sum\nolimits_{n \ge 1} \mu^{*n} \Bigr\} = c^{-1}\lambda.
\end{aligned} \]
By the strong Markov property, the shifted random measure $\theta_t\eta$ is again a
renewal process based on $\mu$, say with delay distribution $\nu_t$. As before,
\[ \nu_t = (\delta_0 - \mu) * (\theta_t E\eta) = (\delta_0 - \mu) * E\eta = \nu, \]
which implies the asserted stationarity of $\eta$. □
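The computation in the proof has an exact lattice analogue (compare Exercise 17): for a step distribution on $\{1, 2, \ldots\}$ with mean $c$, the delay distribution $\nu\{n\} = c^{-1}\mu(n,\infty)$ should make the renewal intensity $\nu * \sum_k \mu^{*k}$ equal to $c^{-1}$ at every lattice point. The following sketch verifies this for one hypothetical step law using exact fractions; the helper names `convolve` and `renewal_measure` are illustrative.

```python
from fractions import Fraction

def convolve(a, b, n):
    """First n terms of the convolution of two sequences indexed by 0, 1, 2, ..."""
    return [sum(a[i] * b[k - i] for i in range(k + 1) if i < len(a) and k - i < len(b))
            for k in range(n)]

def renewal_measure(mu, n):
    """First n terms of sum_{k>=0} mu^{*k}, via u_0 = 1 and u_k = sum_j mu_j u_{k-j}.

    Assumes mu[0] == 0, so that the recursion is well defined.
    """
    u = [Fraction(1)]
    for k in range(1, n):
        u.append(sum(mu[j] * u[k - j] for j in range(1, min(k, len(mu) - 1) + 1)))
    return u

mu = [Fraction(0), Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # step law on {1, 2, 3}
c = sum(j * m for j, m in enumerate(mu))                              # mean of mu
nu = [sum(mu[n + 1:]) / c for n in range(len(mu))]                    # nu{n} = mu(n, inf) / c
intensity = convolve(nu, renewal_measure(mu, 12), 12)
print(c, nu, intensity)
```

With the undelayed choice $\nu = \delta_0$ the intensity would instead oscillate before settling near $c^{-1}$, which is the behavior described by the renewal theorem.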
From the last result we may deduce a corresponding statement for the
occupation measure of a general random walk.
Proposition 9.19 (stationary occupation measure) Let $\eta$ be the occupation
measure of a random walk in $\mathbb{R}$ based on some distributions $\mu$ and $\nu$,
where $\mu$ has mean $c \in (0,\infty)$ and $\nu$ is defined as in (19) in terms of the
ladder height distribution $\tilde\mu$ and its mean $\tilde c$. Then $\eta$ is stationary on $\mathbb{R}_+$
with intensity $c^{-1}$.
Proof: Since $S_n \to \infty$ a.s., Propositions 9.13 and 9.14 show that the
ladder times $\tau_n$ and heights $H_n = S_{\tau_n}$ have finite mean, and by Proposition
9.18 the renewal process $\zeta = \sum_n \delta_{H_n}$ is stationary for the prescribed choice
of $\nu$. Fixing $t \ge 0$ and putting $\sigma_t = \inf\{n \in \mathbb{Z}_+;\ S_n \ge t\}$, we note in
particular that $S_{\sigma_t} - t$ has distribution $\nu$. By the strong Markov property
at $\sigma_t$, the sequence $S_{\sigma_t + n} - t$, $n \in \mathbb{Z}_+$, has then the same distribution as
$(S_n)$. Since $S_k < t$ for $k < \sigma_t$, we get $\theta_t\eta \overset{d}{=} \eta$ on $\mathbb{R}_+$, which proves the
asserted stationarity.
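To identify the intensity, let $\eta_n$ denote the occupation measure of the
sequence $S_k - H_n$, $\tau_n \le k < \tau_{n+1}$, and note that $H_n \perp\!\!\!\perp \eta_n \overset{d}{=} \eta_0$ for each $n$,
by the strong Markov property. Hence, by Fubini's theorem,
\[ \begin{aligned}
E\eta &= E\sum\nolimits_n \eta_n * \delta_{H_n} = \sum\nolimits_n E(\delta_{H_n} * E\eta_n) \\
&= E\eta_0 * E\sum\nolimits_n \delta_{H_n} = E\eta_0 * E\zeta.
\end{aligned} \]
Noting that $E\zeta = \tilde c^{-1}\lambda$ by Proposition 9.18, that $E\eta_0(0,\infty) = 0$, and that
$\tilde c = c\,E\tau_1$ by Proposition 9.14, we get on $\mathbb{R}_+$
\[ E\eta = \frac{E\eta_0\mathbb{R}_-}{\tilde c}\,\lambda = \frac{E\tau_1}{\tilde c}\,\lambda = c^{-1}\lambda. \]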
The next result describes the asymptotic behavior of the occupation
measure $\eta$ and its intensity $E\eta$. Under weak restrictions on $\mu$, we shall
see how $\theta_t\eta$ approaches the corresponding stationary version $\tilde\eta$, whereas
$E\eta$ is asymptotically proportional to Lebesgue measure. For simplicity, we
assume that the mean of $\mu$ exists in $\bar{\mathbb{R}}$. Thus, if $\xi$ is a random variable
with distribution $\mu$, we assume that $E\xi^+ \wedge E\xi^- < \infty$ and define $E\xi =
E\xi^+ - E\xi^-$.
It is natural to state the result in terms of vague convergence for measures
on $\mathbb{R}_+$, and the corresponding notion of distributional convergence for random
measures. Recall that, for locally finite measures $\nu, \nu_1, \nu_2, \ldots$ on $\mathbb{R}_+$,
the vague convergence $\nu_n \overset{v}{\to} \nu$ means that $\nu_n f \to \nu f$ for all $f \in C_K^+(\mathbb{R}_+)$.
Similarly, if $\eta, \eta_1, \eta_2, \ldots$ are random measures on $\mathbb{R}_+$, we define the distributional
convergence $\eta_n \overset{d}{\to} \eta$ by the condition $\eta_n f \overset{d}{\to} \eta f$ for every
$f \in C_K^+(\mathbb{R}_+)$. (The latter notion of convergence will be studied in detail in
Chapter 16.) A measure $\mu$ on $\mathbb{R}$ is said to be nonarithmetic if the additive
subgroup generated by supp $\mu$ is dense in $\mathbb{R}$.
Theorem 9.20 (two-sided renewal theorem, Blackwell, Feller and Orey)
Let $\eta$ be the occupation measure of a random walk in $\mathbb{R}$ based on some
distributions $\mu$ and $\nu$, where $\mu$ is nonarithmetic with mean $c \in \bar{\mathbb{R}} \setminus \{0\}$. If
$c \in (0,\infty)$, let $\tilde\eta$ be the stationary version in Proposition 9.19; otherwise,
put $\tilde\eta = 0$. Then as $t \to \infty$,
(i) $\theta_t\eta \overset{d}{\to} \tilde\eta$,
(ii) $\theta_t E\eta \overset{v}{\to} E\tilde\eta = (c^{-1} \vee 0)\lambda$.
Our proof is based on two lemmas. First we consider the distribution
$\nu_t$ of the first nonnegative ladder height for the shifted process $(S_n - t)$.
For $c \in (0,\infty)$, the key step is to show that $\nu_t$ converges weakly toward
the corresponding distribution $\tilde\nu$ for the stationary version. This will be
accomplished by a coupling argument.
Lemma 9.21 (asymptotic delay) If $c \in (0,\infty)$, then $\nu_t \overset{w}{\to} \tilde\nu$ as $t \to \infty$.
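Proof: Let $\alpha$ and $\alpha'$ be independent random variables with distributions
$\nu$ and $\tilde\nu$. Choose some i.i.d. sequences $(\xi_k) \perp\!\!\!\perp (\vartheta_k)$ independent of $\alpha$ and $\alpha'$
such that $\mathcal{L}(\xi_k) = \tilde\mu$ and $P\{\vartheta_k = \pm 1\} = \tfrac12$. Then
\[ \tilde S_n = \alpha' - \alpha - \sum\nolimits_{k \le n} \vartheta_k \xi_k, \quad n \in \mathbb{Z}_+, \]
is a random walk based on a nonarithmetic distribution with mean 0, and
so by Theorems 9.1 and 9.2 the set $\{\tilde S_n\}$ is a.s. dense in $\mathbb{R}$. For any $\varepsilon > 0$,
the optional time $\sigma = \inf\{n \ge 0;\ \tilde S_n \in [0,\varepsilon]\}$ is then a.s. finite.
Now define $\vartheta'_k = (-1)^{1\{k \le \sigma\}} \vartheta_k$, $k \in \mathbb{N}$, and note as in Lemma 9.10 that
$\{\alpha', (\xi_k, \vartheta'_k)\} \overset{d}{=} \{\alpha', (\xi_k, \vartheta_k)\}$. Let $\kappa_1 < \kappa_2 < \cdots$ be the values of $k$ with
$\vartheta_k = 1$, and define $\kappa'_1 < \kappa'_2 < \cdots$ similarly in terms of $(\vartheta'_k)$. By a simple
conditioning argument, the sequences
\[ S_n = \alpha + \sum\nolimits_{j \le n} \xi_{\kappa_j}, \qquad S'_n = \alpha' + \sum\nolimits_{j \le n} \xi_{\kappa'_j}, \qquad n \in \mathbb{Z}_+, \]
are random walks based on $\tilde\mu$ and the initial distributions $\nu$ and $\tilde\nu$,
respectively. Writing $\sigma_\pm = \sum_{k \le \sigma} 1\{\vartheta_k = \pm 1\}$, we note that
\[ S'_{\sigma_- + n} - S_{\sigma_+ + n} = \tilde S_\sigma \in [0,\varepsilon], \quad n \in \mathbb{Z}_+. \]
Putting $\gamma = S^*_{\sigma_+} \vee S'^*_{\sigma_-}$, and considering the first entry of $(S_n)$ and $(S'_n)$
into the interval $[t,\infty)$, we obtain
\[ \tilde\nu[\varepsilon, x] - P\{\gamma \ge t\} \le \nu_t[0,x] \le \tilde\nu[0, x + \varepsilon] + P\{\gamma \ge t\}. \]
Letting $t \to \infty$ and then $\varepsilon \to 0$, and noting that $\tilde\nu\{0\} = 0$ by stationarity,
we get $\nu_t[0,x] \to \tilde\nu[0,x]$. □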
The following simple statement will be needed to deduce (ii) from (i) in
the main theorem.
Lemma 9.22 (uniform integrability) Let $\eta$ be the occupation measure of
a transient random walk $(S_n)$ in $\mathbb{R}^d$ with arbitrary initial distribution, and
fix any bounded set $B \in \mathcal{B}^d$. Then the random variables $\eta(B + x)$, $x \in \mathbb{R}^d$,
are uniformly integrable.
Proof: Fix any $x \in \mathbb{R}^d$, and put $\tau = \inf\{n \ge 0;\ S_n \in B + x\}$. Letting $\eta_0$
denote the occupation measure of an independent random walk starting at
0, we get by the strong Markov property
\[ \eta(B + x) \overset{d}{=} \eta_0(B + x - S_\tau)\, 1\{\tau < \infty\} \le \eta_0(B - B). \]
It remains to note that $E\eta_0(B - B) < \infty$ by Theorem 9.1, since $(S_n)$ is
transient. □
Proof of Theorem 9.20 ($c < \infty$): By Lemma 9.22 it is enough to prove
(i). If $c < 0$, then $S_n \to -\infty$ a.s. by the law of large numbers, so $\theta_t\eta = 0$
for sufficiently large $t$, and (i) follows. If instead $c \in (0,\infty)$, then $\nu_t \overset{w}{\to} \tilde\nu$
by Lemma 9.21, and we may choose some random variables $\alpha_t$ and $\alpha$
with distributions $\nu_t$ and $\tilde\nu$, respectively, such that $\alpha_t \to \alpha$ a.s. We may
also introduce the occupation measure $\eta_0$ of an independent random walk
starting at 0.
Now fix any $f \in C_K^+(\mathbb{R}_+)$, and extend $f$ to $\mathbb{R}$ by putting $f(x) = 0$ for
$x < 0$. Since $\tilde\nu \ll \lambda$, we have $\eta_0\{-\alpha\} = 0$ a.s. Hence, by the strong Markov
property and dominated convergence,
\[ (\theta_t\eta) f \overset{d}{=} \int f(\alpha_t + x)\,\eta_0(dx) \to \int f(\alpha + x)\,\eta_0(dx) \overset{d}{=} \tilde\eta f. \]
($c = \infty$): In this case it is clearly enough to prove (ii). Then note that
$E\eta = \nu * E\chi * E\zeta$, where $\chi$ is the occupation measure of the ladder height
sequence of $(S_n - S_0)$, and $\zeta$ is the occupation measure of the same process
prior to the first ladder time. Here $E\zeta\mathbb{R}_- < \infty$ by Proposition 9.13, and
so by dominated convergence it suffices to show that $\theta_t E\chi \overset{v}{\to} 0$. Since the
mean of the ladder height distribution is again infinite by Proposition 9.14,
we may henceforth take $\nu = \delta_0$ and let $\mu$ be an arbitrary distribution on
$\mathbb{R}_+$ with infinite mean.
Put $I = [0,1]$, and note that $E\eta(I + t)$ is bounded by Lemma 9.22.
Define $b = \limsup_t E\eta(I + t)$, and choose some $t_k \to \infty$ with $E\eta(I + t_k) \to b$.
Subtracting the finite measures $\mu^{*j}$ for $j < m$, we get $(\mu^{*m} * E\eta)(I + t_k) \to b$
for all $m \in \mathbb{Z}_+$. Using the reverse Fatou lemma, we obtain for any $B \in
\mathcal{B}(\mathbb{R}_+)$
\[ \begin{aligned}
\liminf_{k \to \infty} E\eta(I - B + t_k)\,\mu^{*m}B
&\ge \liminf_{k \to \infty} \int_B E\eta(I - x + t_k)\,\mu^{*m}(dx) \\
&= b - \limsup_{k \to \infty} \int_{B^c} E\eta(I - x + t_k)\,\mu^{*m}(dx) \\
&\ge b - \int_{B^c} \limsup_{k \to \infty} E\eta(I - x + t_k)\,\mu^{*m}(dx) \ge b\,\mu^{*m}B. \qquad (20)
\end{aligned} \]
Now fix any $h > 0$ with $\mu(0,h] > 0$. Noting that $E\eta[r, r + h] > 0$ for all
$r \ge 0$ and writing $J = [0,a]$ with $a = h + 1$, we get by (20)
\[ \liminf_{k \to \infty} E\eta(J + t_k - r) \ge b, \quad r \ge a. \tag{21} \]
Next conclude from the identity $\delta_0 = (\delta_0 - \mu) * E\eta$ that
\[ 1 = \int_0^{t_k} \mu(t_k - x, \infty)\,E\eta(dx) \ge \sum\nolimits_{n \ge 1} \mu(na, \infty)\,E\eta(J + t_k - na). \]
As $k \to \infty$, we get by (21) and Fatou's lemma $1 \ge b \sum_{n \ge 1} \mu(na, \infty)$. Since
the sum diverges by Lemma 3.4, it follows that $b = 0$. □
We may use the preceding theory to study the renewal equation $F =
f + F * \mu$, which often arises in applications. Here the convolution $F * \mu$ is
defined by
\[ (F * \mu)_t = \int_0^t F(t - s)\,\mu(ds), \quad t \ge 0, \]
whenever the integrals on the right exist. Under suitable regularity conditions,
the renewal equation has the unique solution $F = f * \bar\mu$, where $\bar\mu$
denotes the renewal measure $\sum_{n \ge 0} \mu^{*n}$. Additional conditions ensure the
solution $F$ to converge at $\infty$.
A precise statement requires some further terminology. By a regular step
function we mean a function on $\mathbb{R}_+$ of the form
\[ f_t = \sum\nolimits_{j \ge 1} a_j 1_{[j-1,j)}(t/h), \quad t \ge 0, \tag{22} \]
where $h > 0$ and $a_1, a_2, \ldots \in \mathbb{R}$. A measurable function $f$ on $\mathbb{R}_+$ is said
to be directly Riemann integrable if $\lambda|f| < \infty$ and there exist some regular
step functions $f_n^\pm$ with $f_n^- \le f \le f_n^+$ and $\lambda(f_n^+ - f_n^-) \to 0$.
Corollary 9.23 (renewal equation) Fix a distribution $\mu \ne \delta_0$ on $\mathbb{R}_+$ with
associated renewal measure $\bar\mu$, and let $f$ be a locally bounded and measurable
function on $\mathbb{R}_+$. Then the equation $F = f + F * \mu$ has the unique, locally
bounded solution $F = f * \bar\mu$. If $f$ is also directly Riemann integrable and if
$\mu$ is nonarithmetic with mean $c$, then $F_t \to c^{-1}\lambda f$ as $t \to \infty$.
Proof: Iterating the renewal equation gives
\[ F = \sum\nolimits_{k < n} f * \mu^{*k} + F * \mu^{*n}, \quad n \in \mathbb{N}. \tag{23} \]
Now $\mu^{*n}[0,t] \to 0$ as $n \to \infty$ for fixed $t \ge 0$ by the weak law of large
numbers, and so for a locally bounded $F$ we have $F * \mu^{*n} \to 0$. If even $f$ is
locally bounded, then by (23) and Fubini's theorem,
\[ F = \sum\nolimits_{k \ge 0} f * \mu^{*k} = f * \sum\nolimits_{k \ge 0} \mu^{*k} = f * \bar\mu. \]
Conversely, $f + f * \bar\mu * \mu = f * \bar\mu$, which shows that $F = f * \bar\mu$ solves the
given equation.
Now let $\mu$ be nonarithmetic. If $f$ is a regular step function as in (22),
then by Theorem 9.20 and dominated convergence we get as $t \to \infty$
\[ \begin{aligned}
F_t &= \int_0^t f(t - s)\,\bar\mu(ds) = \sum\nolimits_{j \ge 1} a_j\,\bar\mu((0,h] + t - jh) \\
&\to c^{-1} h \sum\nolimits_{j \ge 1} a_j = c^{-1}\lambda f.
\end{aligned} \]
In the general case, we may introduce some regular step functions $f_n^\pm$ with
$f_n^- \le f \le f_n^+$ and $\lambda(f_n^+ - f_n^-) \to 0$, and note that
\[ (f_n^- * \bar\mu)_t \le F_t \le (f_n^+ * \bar\mu)_t, \quad t \ge 0,\ n \in \mathbb{N}. \]
Letting $t \to \infty$ and then $n \to \infty$, we obtain $F_t \to c^{-1}\lambda f$. □
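The behavior described in Corollary 9.23 can be observed in a lattice setting, where the renewal equation becomes a finite recursion. The sketch below treats the arithmetic counterpart (compare Exercise 23) rather than the nonarithmetic case of the corollary: for an aperiodic step law on $\{1, 2, 3\}$ with mean $c$ and a finitely supported $f$, the solution $F_t$ should approach $c^{-1}\sum_t f_t$. The step law, the function $f$, and the name `solve_renewal_equation` are hypothetical choices made for illustration.

```python
def solve_renewal_equation(f, mu, n):
    """Solve F_t = f_t + sum_{s=1}^{t} F_{t-s} mu_s for t = 0, ..., n - 1.

    Lattice case with mu[0] == 0; f is extended by zero beyond its length.
    """
    F = []
    for t in range(n):
        ft = f[t] if t < len(f) else 0.0
        F.append(ft + sum(mu[s] * F[t - s] for s in range(1, min(t, len(mu) - 1) + 1)))
    return F

mu = [0.0, 0.5, 0.3, 0.2]                    # aperiodic step law on {1, 2, 3}
c = sum(s * m for s, m in enumerate(mu))     # mean of mu
f = [1.0, 0.5, 0.25]                         # finitely supported forcing term
F = solve_renewal_equation(f, mu, 200)
limit = sum(f) / c
print(F[:5], F[-1], limit)
```

Solving forward in $t$ is equivalent to forming $f * \bar\mu$, since each $F_t$ depends only on earlier values; convergence here is geometric because the step law has bounded support.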
Exercises
1. Show that if $(S_n)$ is recurrent, then so is the random walk $(S_{nk})$ for each
$k \in \mathbb{N}$. (Hint: If $(S_{nk})$ is transient, then so is $(S_{nk+j})$ for any $j > 0$.)
2. For any nondegenerate random walk $(S_n)$ in $\mathbb{R}^d$, show that $|S_n| \overset{P}{\to} \infty$.
(Hint: Use Lemma 5.1.)
3. Let $(S_n)$ be a random walk in $\mathbb{R}$ based on a symmetric, nondegenerate
distribution with bounded support. Show that $(S_n)$ is recurrent, using the
fact that $\limsup_n (\pm S_n) = \infty$ a.s.
4. Show that the accessible set $A$ equals the closed semigroup generated
by supp $\mu$. Also show by examples that $A$ may or may not be a group.
5. Let $\nu$ be an invariant measure on the accessible set of a recurrent random
walk in $\mathbb{R}^d$. Show by examples that $E\eta$ may or may not be of the form $\infty \cdot \nu$.
6. Show that a nondegenerate random walk in $\mathbb{R}^d$ has no invariant
distribution. (Hint: If $\nu$ is invariant, then $\mu * \nu = \nu$.)
7. Show by examples that the conditions in Theorem 9.2 are not necessary.
(Hint: For $d = 2$, consider mixtures of $N(0, \sigma^2)$ and use Lemma 5.18.)
8. Consider a random walk $(S_n)$ based on the symmetric $p$-stable distribution
on $\mathbb{R}$ with characteristic function $e^{-|t|^p}$. Show that $(S_n)$ is recurrent
for $p \ge 1$ and transient for $p < 1$.
9. Let $(S_n)$ be a random walk in $\mathbb{R}^2$ based on the distribution $\mu^2$, where $\mu$
is symmetric $p$-stable. Show that $(S_n)$ is recurrent for $p = 2$ and transient
for $p < 2$.
10. Let $\mu = c\mu_1 + (1 - c)\mu_2$, where $\mu_1$ and $\mu_2$ are symmetric distributions
on $\mathbb{R}^d$ and $c$ is a constant in $(0,1)$. Show that a random walk based on $\mu$
is recurrent iff recurrence holds for the random walks based on $\mu_1$ and $\mu_2$.
11. Let $\mu = \mu_1 * \mu_2$, where $\mu_1$ and $\mu_2$ are symmetric distributions on $\mathbb{R}^d$.
Show that if a random walk based on $\mu$ is recurrent, then so are the random
walks based on $\mu_1$ and $\mu_2$. Also show by an example that the converse is
false. (Hint: For the latter part, let $\mu_1$ and $\mu_2$ be supported by orthogonal
subspaces.)
12. For any symmetric, recurrent random walk on $\mathbb{Z}^d$, show that the expected
number of visits to an accessible state $k \ne 0$ before return to the
origin equals 1. (Hint: Compute the distribution, assuming probability $p$
for return before visit to $k$.)
13. Use Proposition 9.13 to show that any nondegenerate random walk in
$\mathbb{Z}^d$ has infinite mean recurrence time. Compare with the preceding problem.
14. Show how part (i) of Proposition 9.14 can be strengthened by means
of Theorems 5.16 and 9.2.
15. For a nondegenerate random walk in $\mathbb{R}$, show that $\limsup_n S_n = \infty$
a.s. iff $\sigma_1 < \infty$ a.s. and that $S_n \to \infty$ a.s. iff $E\sigma_1 < \infty$. In both conditions,
note that $\sigma_1$ can be replaced by $\tau_1$.
16. Let $\eta$ be a renewal process based on some nonarithmetic distribution
on $\mathbb{R}_+$. Show for any $\varepsilon > 0$ that $\sup\{t > 0;\ E\eta[t, t + \varepsilon] = 0\} < \infty$. (Hint:
Imitate the proof of Proposition 8.14.)
17. Let $\mu$ be a distribution on $\mathbb{Z}_+$ such that the group generated by
supp $\mu$ equals $\mathbb{Z}$. Show that Proposition 9.18 remains true with $\nu\{n\} =
c^{-1}\mu(n,\infty)$, $n \ge 0$, and prove a corresponding version of Proposition 9.19.
18. Let $\eta$ be the occupation measure of a random walk on $\mathbb{Z}$ based on some
distribution $\mu$ with mean $c \in \bar{\mathbb{R}} \setminus \{0\}$ such that the group generated by
supp $\mu$ equals $\mathbb{Z}$. Show as in Theorem 9.20 that $E\eta\{n\} \to c^{-1} \vee 0$.
19. Derive the renewal theorem for random walks on $\mathbb{Z}_+$ from the ergodic
theorem for discrete-time Markov chains, and conversely. (Hint: Given a
distribution $\mu$ on $\mathbb{N}$, construct a Markov chain $X$ on $\mathbb{Z}_+$ with $X_{n+1} = X_n + 1$
or 0, and such that the recurrence times at 0 are i.i.d. $\mu$. Note that $X$ is
aperiodic iff $\mathbb{Z}$ is the smallest group containing supp $\mu$.)
20. Fix a distribution $\mu$ on $\mathbb{R}$ with symmetrization $\tilde\mu$. Note that if $\tilde\mu$ is
nonarithmetic, then so is $\mu$. Show by an example that the converse is false.
21. Simplify the proof of Lemma 9.21, in the case when even the symmetrization
$\tilde\mu$ is nonarithmetic. (Hint: Let $\xi_1, \xi_2, \ldots$ and $\xi'_1, \xi'_2, \ldots$ be i.i.d.
$\mu$, and define $\tilde S_n = \alpha' - \alpha + \sum_{k \le n} (\xi'_k - \xi_k)$.)
22. Show that any monotone and Lebesgue integrable function on $\mathbb{R}_+$ is
directly Riemann integrable.
23. State and prove the counterpart of Corollary 9.23 for arithmetic
distributions.
24. Let $(\xi_n)$ and $(\eta_n)$ be independent i.i.d. sequences with distributions $\mu$
and $\nu$, put $S_n = \sum_{k \le n} (\xi_k + \eta_k)$, and define $U = \bigcup_{n \ge 0} [S_n, S_n + \xi_{n+1})$.
Show that $F_t = P\{t \in U\}$ satisfies the renewal equation $F = f + F * \mu * \nu$
with $f_t = \mu(t,\infty)$. Assuming $\mu$ and $\nu$ to have finite means, show also that
$F_t$ converges as $t \to \infty$, and identify the limit.
25. Consider a renewal process $\eta$ based on some nonarithmetic distribution
$\mu$ with mean $c < \infty$, fix an $h > 0$, and define $F_t = P\{\eta[t, t + h] = 0\}$. Show
that $F = f + F * \mu$, where $f_t = \mu(t + h, \infty)$. Also show that $F_t$ converges as
$t \to \infty$, and identify the limit. (Hint: Consider the first point of $\eta$ in $(0,t)$,
if any.)
26. For $\eta$ as above, let $\tau = \inf\{t \ge 0;\ \eta[t, t + h] = 0\}$, and put $F_t =
P\{\tau \le t\}$. Show that $F_t = \mu(h,\infty) + \int_0^{h \wedge t} \mu(ds)\,F_{t-s}$, or $F = f + F * \mu_h$,
where $\mu_h = 1_{[0,h]} \cdot \mu$ and $f \equiv \mu(h,\infty)$.
Chapter 10
Stationary Processes and
Ergodic Theory
Stationarity, invariance, and ergodicity; discrete- and continuous-
time ergodic theorems; moment and maximum inequalities;
multivariate ergodic theorems; sample intensity of a random
measure; subadditivity and products of random matrices; con-
ditioning and ergodic decomposition; shift coupling and the
invariant $\sigma$-field
In this chapter we come to the third important dependence structure
of probability theory, beside those of martingales and Markov processes,
namely stationarity. A stationary process is simply a process whose distri-
bution is invariant under shifts. Stationary processes are important in their
own right, and they also arise under broad conditions as steady-state limits
of various Markov and renewal-type processes, as we have seen in Chapters
8 and 9 and will see again in Chapters 12, 20, and 23. Our present aim is to
present some of the most useful general results for stationary and related
processes.
The key result of stationarity theory is Birkhoff's ergodic theorem, which
may be regarded as a strong law of large numbers for stationary sequences
and processes. After proving the classical ergodic theorems in discrete and
continuous time, we turn to the multivariate versions of Zygmund and
Wiener, the former in a setting for noncommutative mappings and rectan-
gular regions, the latter in the commutative case but with averages over
increasing families of convex sets. Wiener's theorem will also be consid-
ered in a version for random measures that will be useful in Chapter 11
for the theory of Palm distributions. We finally present a version of King-
man's subadditive ergodic theorem, along with an important application
to random matrices.
In all the mentioned results, the limit is a random variable, measurable
with respect to the appropriate invariant $\sigma$-field $\mathcal{I}$. Of special interest then
is the ergodic case, when $\mathcal{I}$ is trivial and the limit reduces to a constant.
For general stationary processes, we consider a decomposition of the distribution
into ergodic components. The chapter concludes with some basic
criteria for coupling and shift coupling of two processes, expressed in terms
of the tail and invariant $\sigma$-fields $\mathcal{T}$ and $\mathcal{I}$, respectively. Those results will
be helpful to prove some ergodic theorems in Chapters 11 and 20.
Our treatment of stationary sequences and processes is continued in
Chapter 11 with some important applications and extensions of the present
theory. In particular, we will then derive ergodic theorems for Palm dis-
tributions, as well as for entropy and information. In Chapter 20 we show
how the basic ergodic theorems admit extensions to suitable contraction
operators, which leads to a profound unification of the present theory with
the ergodic theory for Markov transition operators. Our treatment of the
ratio ergodic theorem is also postponed until then.
Let us now return to the basic notions of stationarity and invariance.
Then fix an arbitrary measurable space $(S, \mathcal{S})$. Given a measure $\mu$ and
a measurable transformation $T$ on $S$, we say that $T$ is $\mu$-preserving or
measure-preserving if $\mu \circ T^{-1} = \mu$. Thus, if $\xi$ is a random element of $S$
with distribution $\mu$, then $T$ is measure-preserving iff $T\xi \overset{d}{=} \xi$. In particular,
consider a random sequence $\xi = (\xi_0, \xi_1, \ldots)$ in some measurable space
$(S', \mathcal{S}')$, and let $\theta$ denote the shift on $S = (S')^\infty$ given by $\theta(x_0, x_1, \ldots) =
(x_1, x_2, \ldots)$. Then $\xi$ is said to be stationary if $\theta\xi \overset{d}{=} \xi$. We show that the
general situation is equivalent to this special case.
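The defining relation $\mu \circ T^{-1} = \mu$ can be checked directly when $S$ is finite. The following minimal sketch, with hypothetical names `pushforward`, `rotation`, and `clamp`, computes the image measure of the uniform law on $\{0, \ldots, 5\}$ under two maps: a cyclic rotation, which preserves it, and a clamped shift, which does not.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu, T):
    """Image measure mu o T^{-1} of a measure mu (dict: point -> mass) under a map T."""
    image = defaultdict(Fraction)
    for x, mass in mu.items():
        image[T(x)] += mass
    return dict(image)

n = 6
uniform = {x: Fraction(1, n) for x in range(n)}

def rotation(x):
    """A bijection of {0, ..., n-1}; it preserves the uniform law."""
    return (x + 1) % n

def clamp(x):
    """Not injective: the masses of n-2 and n-1 both land on n-1."""
    return min(x + 1, n - 1)

print(pushforward(uniform, rotation) == uniform)
print(pushforward(uniform, clamp) == uniform)
```

On a finite set with the uniform law, a map is measure-preserving exactly when it is a bijection, which is why the second map fails.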
Lemma 10.1 (stationarity and invariance) For any random element $\xi$ in
$S$ and measurable transformation $T$ on $S$, we have $T\xi \overset{d}{=} \xi$ iff the sequence
$(T^n\xi)$ is stationary, in which case even $(f \circ T^n\xi)$ is stationary for every
measurable function $f$. Conversely, any stationary random sequence admits
such a representation.
Proof: Assuming $T\xi \overset{d}{=} \xi$, we get
\[ \theta(f \circ T^n\xi) = (f \circ T^{n+1}\xi) = (f \circ T^n T\xi) \overset{d}{=} (f \circ T^n\xi), \]
and so $(f \circ T^n\xi)$ is stationary. Conversely, if $\eta = (\eta_0, \eta_1, \ldots)$ is stationary
we may write $\eta_n = \pi_0(\theta^n\eta)$ with $\pi_0(x_0, x_1, \ldots) = x_0$, and we note that
$\theta\eta \overset{d}{=} \eta$ by the stationarity of $\eta$. □
In particular, we note that if $\xi_0, \xi_1, \ldots$ is a stationary sequence of random
elements in some measurable space $S$, and if $f$ is a measurable mapping of
$S^\infty$ into some measurable space $S'$, then the random sequence
\[ \eta_n = f(\xi_n, \xi_{n+1}, \ldots), \quad n \in \mathbb{Z}_+, \]
is again stationary.
The definition of stationarity extends in the obvious way to random se-
quences indexed by Z. The two-sided versions have the technical advantage
that the associated shift operators form a group, rather than just a semi-
group as in the one-sided context. The following result shows that the two
cases are essentially equivalent. Here we assume the existence of appropriate
randomization variables, as explained in Chapter 6.
Lemma 10.2 (two-sided extension) Any stationary random sequence
$\xi_0, \xi_1, \dots$ in a Borel space admits a stationary extension $\dots, \xi_{-1}, \xi_0, \xi_1, \dots$
to the index set $\mathbb{Z}$.
Proof: Assuming $\vartheta_1, \vartheta_2, \dots$ to be i.i.d. $U(0,1)$ and independent of
$\xi = (\xi_0, \xi_1, \dots)$, we may construct the $\xi_{-n}$ recursively as functions of $\xi$
and $\vartheta_1, \dots, \vartheta_n$ such that $(\xi_{-n}, \xi_{-n+1}, \dots) \overset{d}{=} \xi$ for all $n$. In fact, once
$\xi_{-1}, \dots, \xi_{-n}$ have been chosen, the existence of $\xi_{-n-1}$ is clear from
Theorem 6.10 if we note that $(\xi_{-n}, \xi_{-n+1}, \dots) \overset{d}{=} \theta\xi$. Finally, the extended
sequence is stationary by Proposition 3.2. □
Now fix a measurable transformation $T$ on some measure space $(S, \mathcal{S}, \mu)$,
and let $\mathcal{S}^\mu$ denote the $\mu$-completion of $\mathcal{S}$. We say that a set $I \subset S$ is
invariant if $T^{-1}I = I$ and almost invariant if $T^{-1}I = I$ a.e. $\mu$, in the
sense that $\mu(T^{-1}I \,\triangle\, I) = 0$. Since inverse mappings preserve the basic set
operations, the classes $\mathcal{I}$ and $\mathcal{I}'$ of invariant sets in $\mathcal{S}$ and almost invariant
sets in $\mathcal{S}^\mu$ form $\sigma$-fields in $S$, called the invariant and almost invariant
$\sigma$-fields, respectively.
A measurable function $f$ on $S$ is said to be invariant if $f \circ T \equiv f$ and
almost invariant if $f \circ T = f$ a.e. $\mu$. The following result gives the basic
relationship between invariant or almost invariant sets and functions.
Lemma 10.3 (invariant sets and functions) Fix a measure $\mu$ and a
measurable transformation $T$ on $S$, and let $f$ be a measurable mapping of $S$
into a Borel space $S'$. Then $f$ is invariant or almost invariant iff it is
$\mathcal{I}$-measurable or $\mathcal{I}'$-measurable, respectively.
Proof: We may first apply a Borel isomorphism to reduce to the case
when $S' = \mathbb{R}$. If $f$ is invariant or almost invariant, then so is the set
$I_x = f^{-1}(-\infty, x)$ for any $x \in \mathbb{R}$, and so $I_x \in \mathcal{I}$ or $\mathcal{I}'$, respectively. Conversely, if
$f$ is measurable with respect to $\mathcal{I}$ or $\mathcal{I}'$, then $I_x \in \mathcal{I}$ or $\mathcal{I}'$, respectively, for
every $x \in \mathbb{R}$. Hence, the function $f_n(s) = 2^{-n}[2^n f(s)]$, $s \in S$, is invariant
or almost invariant for every $n \in \mathbb{N}$, and the invariance or almost invariance
carries over to the limit $f$. □
The next result clarifies the relationship between the invariant and almost
invariant $\sigma$-fields. Here we write $\mathcal{I}^\mu$ for the $\mu$-completion of $\mathcal{I}$ in $\mathcal{S}^\mu$, the
$\sigma$-field generated by $\mathcal{I}$ and the $\mu$-null sets in $\mathcal{S}^\mu$.
Lemma 10.4 (almost invariance) For any distribution $\mu$ and $\mu$-preserving
transformation $T$ on $S$, the associated invariant and almost invariant
$\sigma$-fields $\mathcal{I}$ and $\mathcal{I}'$ are related by $\mathcal{I}' = \mathcal{I}^\mu$.
Proof: If $J \in \mathcal{I}^\mu$, there exists some $I \in \mathcal{I}$ with $\mu(I \triangle J) = 0$. Since $T$ is
$\mu$-preserving, we get
$$\begin{aligned}
\mu(T^{-1}J \triangle J) &\le \mu(T^{-1}J \triangle T^{-1}I) + \mu(T^{-1}I \triangle I) + \mu(I \triangle J) \\
&= \mu \circ T^{-1}(J \triangle I) = \mu(J \triangle I) = 0,
\end{aligned}$$
which shows that $J \in \mathcal{I}'$. Conversely, given any $J \in \mathcal{I}'$, we may choose
some $J' \in \mathcal{S}$ with $\mu(J \triangle J') = 0$ and put $I = \bigcap_n \bigcup_{k \ge n} T^{-k}J'$. Then,
clearly, $I \in \mathcal{I}$ and $\mu(I \triangle J) = 0$, and so $J \in \mathcal{I}^\mu$. □
A measure-preserving mapping $T$ on some probability space $(S, \mathcal{S}, \mu)$ is
said to be ergodic for $\mu$ or simply $\mu$-ergodic if the invariant $\sigma$-field $\mathcal{I}$ is
$\mu$-trivial, in the sense that $\mu I = 0$ or $1$ for every $I \in \mathcal{I}$. Depending on
viewpoint, we may prefer to say that $\mu$ is ergodic for $T$, or $T$-ergodic.
The terminology carries over to any random element $\xi$ with distribution
$\mu$, which is said to be ergodic whenever this is true for $T$ or $\mu$. Thus, $\xi$ is
ergodic iff $P\{\xi \in I\} = 0$ or $1$ for any $I \in \mathcal{I}$, that is, if the $\sigma$-field $\mathcal{I}_\xi = \xi^{-1}\mathcal{I}$
in $\Omega$ is $P$-trivial. In particular, a stationary sequence $\xi = (\xi_n)$ is ergodic if
the shift-invariant $\sigma$-field is trivial for the distribution of $\xi$.
The next result shows how the ergodicity of a random element is related
to the ergodicity of the generated stationary sequence.
Lemma 10.5 (ergodicity) Let $\xi$ be a random element in $S$ with distribution
$\mu$, and let $T$ be a $\mu$-preserving mapping on $S$. Then $\xi$ is $T$-ergodic
iff the sequence $(T^n\xi)$ is $\theta$-ergodic, in which case even $\eta = (f \circ T^n\xi)$ is
$\theta$-ergodic for every measurable mapping $f$ on $S$.
Proof: Fix any measurable mapping $f\colon S \to S'$, and define $F = (f \circ T^n;\ n \ge 0)$,
so that $F \circ T = \theta \circ F$. If $I \subset (S')^\infty$ is $\theta$-invariant, then
$T^{-1}F^{-1}I = F^{-1}\theta^{-1}I = F^{-1}I$, and so $F^{-1}I$ is $T$-invariant in $S$. Assuming $\xi$ to be
ergodic, we obtain $P\{\eta \in I\} = P\{\xi \in F^{-1}I\} = 0$ or $1$, which shows that
even $\eta$ is ergodic.
Conversely, let the sequence $(T^n\xi)$ be ergodic, and fix any $T$-invariant set
$I$ in $S$. Put $F = (T^n;\ n \ge 0)$, and define $A = \{s \in S^\infty;\ s_n \in I \text{ i.o.}\}$. Then
$I = F^{-1}A$ and $A$ is $\theta$-invariant. Hence, $P\{\xi \in I\} = P\{(T^n\xi) \in A\} = 0$ or
$1$, which means that even $\xi$ is ergodic. □
We may now state the fundamental a.s. and mean ergodic theorem for
stationary sequences of random variables. Recall that $(S, \mathcal{S})$ denotes an
arbitrary measurable space, and write $\mathcal{I}_\xi = \xi^{-1}\mathcal{I}$ for convenience.
Theorem 10.6 (ergodic theorem, Birkhoff) Let $\xi$ be a random element in
$S$ with distribution $\mu$, and let $T$ be a $\mu$-preserving map on $S$ with invariant
$\sigma$-field $\mathcal{I}$. Then for any measurable function $f \ge 0$ on $S$,
$$n^{-1}\sum_{k<n} f(T^k\xi) \to E[f(\xi) \mid \mathcal{I}_\xi] \quad \text{a.s.} \tag{1}$$
The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.
The proof is based on a simple, but ingenious, inequality.
Lemma 10.7 (maximal ergodic lemma) Let $\xi = (\xi_k)$ be a stationary
sequence of integrable random variables, and put $S_n = \xi_1 + \dots + \xi_n$. Then
$$E[\xi_1;\ \sup_n S_n > 0] \ge 0.$$
Proof (Garsia): Put $M_n = S_1 \vee \dots \vee S_n$. Assuming $\xi$ to be defined on
the canonical space $\mathbb{R}^\infty$, we note that
$$S_k = \xi_1 + S_{k-1} \circ \theta \le \xi_1 + (M_n \circ \theta)_+, \quad k = 1, \dots, n.$$
Taking maxima yields $M_n \le \xi_1 + (M_n \circ \theta)_+$ for all $n \in \mathbb{N}$, and so by
stationarity
$$\begin{aligned}
E[\xi_1;\ M_n > 0] &\ge E[M_n - (M_n \circ \theta)_+;\ M_n > 0] \\
&\ge E[(M_n)_+ - (M_n \circ \theta)_+] = 0.
\end{aligned}$$
Since $M_n \uparrow \sup_n S_n$, the assertion follows by dominated convergence. □
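The pathwise inequality $M_n \le \xi_1 + (M_n \circ \theta)_+$ behind Garsia's argument holds for every real sequence, so it can be checked numerically. The following is a minimal sketch, not part of the original text: the helper names and the uniform test sequences are illustrative assumptions.

```python
import random
from itertools import accumulate


def running_max_of_partial_sums(xs, n):
    """M_n = max(S_1, ..., S_n) for the partial sums of xs[0], xs[1], ..."""
    return max(accumulate(xs[:n]))


def garsia_gap(xs, n):
    """Return xi_1 + (M_n o theta)_+ - M_n, which should be nonnegative.

    xs must contain at least n + 1 terms, since the shifted maximum
    M_n o theta is built from xs[1], ..., xs[n].
    """
    m_n = running_max_of_partial_sums(xs, n)
    m_n_shifted = running_max_of_partial_sums(xs[1:], n)
    return xs[0] + max(m_n_shifted, 0.0) - m_n


# Hypothetical test data: short sequences with uniform entries in [-1, 1].
rng = random.Random(0)
gaps = []
for _ in range(1000):
    n = rng.randint(1, 20)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n + 1)]
    gaps.append(garsia_gap(xs, n))
```

Every entry of `gaps` should be nonnegative up to floating-point rounding. Stationarity is only needed afterwards, to cancel the expectations of $(M_n)_+$ and $(M_n \circ \theta)_+$.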
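Proof of Theorem 10.6 (Yosida and Kakutani): First assume that $f \in L^1$,
and put $\eta_k = f(T^{k-1}\xi)$ for convenience. Since $E[\eta_1 \mid \mathcal{I}_\xi]$ is an invariant
function of $\xi$ by Lemma 10.3, the sequence $\zeta_k = \eta_k - E[\eta_1 \mid \mathcal{I}_\xi]$ is again
stationary. Writing $S_n = \zeta_1 + \dots + \zeta_n$, we define for any $\varepsilon > 0$
$$A_\varepsilon = \{\limsup_n (S_n/n) > \varepsilon\}, \qquad \zeta_n^\varepsilon = (\zeta_n - \varepsilon)1_{A_\varepsilon},$$
and note that the sums $S_n^\varepsilon = \zeta_1^\varepsilon + \dots + \zeta_n^\varepsilon$ satisfy
$$\begin{aligned}
\{\sup_n S_n^\varepsilon > 0\} &= \{\sup_n (S_n^\varepsilon/n) > 0\} \\
&= \{\sup_n (S_n/n) > \varepsilon\} \cap A_\varepsilon = A_\varepsilon.
\end{aligned}$$
Since $A_\varepsilon \in \mathcal{I}_\xi$, the sequence $(\zeta_n^\varepsilon)$ is stationary, and Lemma 10.7 yields
$$\begin{aligned}
0 \le E[\zeta_1^\varepsilon;\ \sup_n S_n^\varepsilon > 0] &= E[\zeta_1 - \varepsilon;\ A_\varepsilon] \\
&= E[E[\zeta_1 \mid \mathcal{I}_\xi];\ A_\varepsilon] - \varepsilon P A_\varepsilon = -\varepsilon P A_\varepsilon,
\end{aligned}$$
which implies $P A_\varepsilon = 0$. Thus, $\limsup_n (S_n/n) \le \varepsilon$ a.s., and $\varepsilon$ being
arbitrary, we obtain $\limsup_n (S_n/n) \le 0$ a.s. Applying the same result to $-S_n$
yields $\liminf_n (S_n/n) \ge 0$ a.s., and so by combination $S_n/n \to 0$ a.s.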
Next assume that $f \in L^p$ for some $p \ge 1$. Using Jensen's inequality and
the stationarity of $T^k\xi$, we get for any $A \in \mathcal{A}$ and $r > 0$
$$\begin{aligned}
E\,1_A \Bigl| n^{-1}\sum_{k<n} f(T^k\xi) \Bigr|^p &\le n^{-1}\sum_{k<n} E[|f(T^k\xi)|^p;\ A] \\
&\le r^p P A + E[|f(\xi)|^p;\ |f(\xi)| > r],
\end{aligned}$$
which tends to $0$ as $PA \to 0$ and then $r \to \infty$. Hence, by Lemma 4.10
the $p$th powers on the left are uniformly integrable, and the asserted
$L^p$-convergence follows by Proposition 4.12.
Finally, let $f \ge 0$ be arbitrary and put $E[f(\xi) \mid \mathcal{I}_\xi] = \bar\eta$. Conditioning
on the event $\{\bar\eta \le r\}$ for arbitrary $r > 0$, we see that (1) holds a.s. on
$\{\bar\eta < \infty\}$. Next we have a.s. for any $r > 0$
$$\begin{aligned}
\liminf_{n\to\infty} n^{-1}\sum_{k<n} f(T^k\xi) &\ge \lim_{n\to\infty} n^{-1}\sum_{k<n} \bigl(f(T^k\xi) \wedge r\bigr) \\
&= E[f(\xi) \wedge r \mid \mathcal{I}_\xi].
\end{aligned}$$
As $r \to \infty$, the right-hand side tends a.s. to $\bar\eta$ by the monotone convergence
property of conditional expectations. In particular, the left-hand side is a.s.
infinite on $\{\bar\eta = \infty\}$, as required. □
Write $\mathcal{I}$ and $\mathcal{T}$ for the shift-invariant and tail $\sigma$-fields, respectively, in
$\mathbb{R}^\infty$ and note that $\mathcal{I} \subset \mathcal{T}$. Thus, for any sequence of random variables
$\xi = (\xi_1, \xi_2, \dots)$, we have $\mathcal{I}_\xi = \xi^{-1}\mathcal{I} \subset \xi^{-1}\mathcal{T}$. By Kolmogorov's 0-1 law,
the latter $\sigma$-field is trivial when the $\xi_n$ are independent. If they are even
i.i.d. and integrable, then Theorem 10.6 yields $n^{-1}(\xi_1 + \dots + \xi_n) \to E\xi_1$
a.s. and in $L^1$, in agreement with Theorem 4.23. Hence, the last theorem
contains the strong law of large numbers.
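As a concrete illustration of Theorem 10.6 outside the i.i.d. setting, the sketch below computes the averages in (1) for a rotation of the unit interval. The angle $\sqrt{2} - 1$, the test function, and all names are assumptions made for this example. The rotation preserves Lebesgue measure and is ergodic for irrational angles, so the time averages should approach the space average $\int_0^1 \cos^2(2\pi x)\,dx = 1/2$.

```python
import math


def birkhoff_average(f, T, x0, n):
    """Return n^{-1} * sum_{k<n} f(T^k x0)."""
    total, x = 0.0, x0
    for _ in range(n):
        total += f(x)
        x = T(x)
    return total / n


ALPHA = math.sqrt(2.0) - 1.0  # irrational rotation angle (illustrative choice)


def rotate(x):
    """Rotation x -> x + ALPHA (mod 1), which preserves Lebesgue measure on [0, 1)."""
    return (x + ALPHA) % 1.0


def f(x):
    """Test function whose integral over [0, 1) equals 1/2."""
    return math.cos(2.0 * math.pi * x) ** 2


# Time averages from three arbitrary starting points.
averages = [birkhoff_average(f, rotate, x0, 100_000) for x0 in (0.0, 0.3, 0.77)]
```

All three entries of `averages` should be close to 0.5. For this particular map the convergence holds from every starting point, which is more than the almost sure statement of the theorem.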
It is often useful to allow the function $f = f_{n,k}$ in Theorem 10.6 to depend
on $n$ or $k$. For later needs, we consider a slightly more general situation.
Corollary 10.8 (approximation, Maker) Let $\xi$ be a random element in $S$
with distribution $\mu$, let $T$ be a $\mu$-preserving map on $S$ with invariant $\sigma$-field
$\mathcal{I}$, and consider some measurable functions $f$ and $f_{m,k}$ on $S$.
(i) If $f_{m,k} \to f$ a.s. and $\sup_{m,k} |f_{m,k}| \in L^1$, then as $m, n \to \infty$,
$$n^{-1}\sum_{k<n} f_{m,k}(T^k\xi) \to E[f(\xi) \mid \mathcal{I}_\xi] \quad \text{a.s.}$$
(ii) If $f_{m,k} \to f$ in $L^p$ for some $p \ge 1$, the same convergence holds in $L^p$.
Proof: (i) By Theorem 10.6 we may assume that $f = 0$. Then put
$g_r = \sup_{m,k>r} |f_{m,k}|$, and conclude from the same result that a.s.
$$\begin{aligned}
\limsup_{m,n\to\infty} \Bigl| n^{-1}\sum_{k<n} f_{m,k}(T^k\xi) \Bigr| &\le \lim_{n\to\infty} n^{-1}\sum_{k<n} g_r(T^k\xi) \\
&= E[g_r(\xi) \mid \mathcal{I}_\xi].
\end{aligned}$$
Here $g_r(\xi) \to 0$ a.s., and so by dominated convergence $E[g_r(\xi) \mid \mathcal{I}_\xi] \to 0$ a.s.
(ii) Assuming $f = 0$, we get by Minkowski's inequality and the invariance
of $\mu$
$$\Bigl\| n^{-1}\sum_{k<n} f_{m,k} \circ T^k \Bigr\|_p \le n^{-1}\sum_{k<n} \|f_{m,k}\|_p \to 0.$$
□
Our next aim is to extend the ergodic theorem to continuous time. We
may then consider a family of transformations $T_t$ on $S$, $t \ge 0$, satisfying
the semigroup property $T_{s+t} = T_s T_t$. The semigroup $(T_t)$ is called a flow if
it is also measurable, in the sense that the mapping $(x, t) \mapsto T_t x$ is product
measurable from $S \times \mathbb{R}_+$ to $S$. The invariant $\sigma$-field $\mathcal{I}$ now consists of all
sets $I \in \mathcal{S}$ such that $T_t^{-1}I = I$ for all $t$. A random element $\xi$ in $S$ is said
to be $(T_t)$-stationary if $T_t\xi \overset{d}{=} \xi$ for all $t \ge 0$.
Corollary 10.9 (continuous-time ergodic theorem) Let $\xi$ be a random
element in $S$ with distribution $\mu$, and let $(T_s)$ be a $\mu$-preserving flow on $S$
with invariant $\sigma$-field $\mathcal{I}$. Then for any measurable function $f \ge 0$ on $S$,
$$\lim_{t\to\infty} t^{-1}\int_0^t f(T_s\xi)\,ds = E[f(\xi) \mid \mathcal{I}_\xi] \quad \text{a.s.} \tag{2}$$
The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.
Proof: In both cases we may assume that $f \ge 0$. Writing $X_s = f(T_s\xi)$,
we get by Jensen's inequality and Fubini's theorem
$$E\Bigl(t^{-1}\int_0^t X_s\,ds\Bigr)^p \le E\,t^{-1}\int_0^t X_s^p\,ds = t^{-1}\int_0^t E X_s^p\,ds = E X_0^p < \infty.$$
The required convergence now follows as we apply Theorem 10.6 to the
function $g(x) = \int_0^1 f(T_s x)\,ds$ and the discrete shift $T = T_1$.
To identify the limit, we first assume that $f \in L^1$ and introduce the
invariant version
$$\tilde f(\xi) = \lim_{r\to\infty} \limsup_{n\to\infty} n^{-1}\int_r^{r+n} f(T_s\xi)\,ds,$$
which is also $\mathcal{I}_\xi$-measurable. By the stationarity of $T_s\xi$ we have
$E^{\mathcal{I}_\xi} f(T_s\xi) = E^{\mathcal{I}_\xi} f(\xi)$ a.s. for all $s \ge 0$. Using Fubini's theorem, the $L^1$-convergence in
(2), and the contraction property of conditional expectations, we get as
$t \to \infty$
$$E^{\mathcal{I}_\xi} f(\xi) = E^{\mathcal{I}_\xi}\, t^{-1}\int_0^t f(T_s\xi)\,ds \to E^{\mathcal{I}_\xi} \tilde f(\xi) = \tilde f(\xi),$$
as required. The result extends as before to arbitrary $f \ge 0$. □
We return to the case when $\xi_1, \xi_2, \dots$ is a stationary sequence of
integrable random variables, and put $S_n = \sum_{k \le n} \xi_k$. Since $S_n/n$ converges
a.s. by Theorem 10.6, we note that the maximum $M = \sup_n (S_n/n)$ is a.s.
finite. The following result, relating the moments of $\xi$ and $M$, is known
as the dominated ergodic theorem. Here we write $\log_+ x = \log(x \vee 1)$ for
convenience.
Proposition 10.10 (moment inequalities, Hardy and Littlewood, Wiener)
Let $\xi = (\xi_k)$ be a stationary sequence of random variables, and put
$S_n = \sum_{k \le n} \xi_k$ and $M = \sup_n (S_n/n)$. Then
(i) $E|M|^p \lesssim E|\xi_1|^p$ for fixed $p > 1$;
(ii) $E|M| \log_+^m |M| \lesssim 1 + E|\xi_1| \log_+^{m+1} |\xi_1|$ for fixed $m \ge 0$.
The proof requires a simple estimate related to Lemma 10.7.
Lemma 10.11 (maximum inequality) If $\xi = (\xi_k)$ is stationary in $L^1$, then
$$r P\{\sup_n (S_n/n) > 2r\} \le E[\xi_1;\ \xi_1 > r], \quad r > 0.$$
Proof: For any $r > 0$, we put $\xi_k^r = \xi_k 1\{\xi_k > r\}$ and note that $\xi_k \le \xi_k^r + r$.
Assuming $\xi$ to be defined on the canonical space $\mathbb{R}^\infty$ and writing
$A_n = S_n/n$, we get
$$A_n - 2r = A_n \circ (\xi - 2r) \le A_n \circ (\xi^r - r),$$
which implies $M - 2r \le M \circ (\xi^r - r)$. Applying Lemma 10.7 to the sequence
$\xi^r - r$, we obtain
$$\begin{aligned}
r P\{M > 2r\} &\le r P\{M \circ (\xi^r - r) > 0\} \\
&\le E[\xi_1^r;\ M \circ (\xi^r - r) > 0] \\
&\le E\xi_1^r = E[\xi_1;\ \xi_1 > r].
\end{aligned}$$
□
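The maximum inequality can be verified exactly on a small stationary model. The sketch below is an illustration under stated assumptions, not a construction from the text: it takes $\Omega = \{0, \dots, N-1\}$ with the uniform distribution and $\xi_k(\omega) = a_{(\omega + k - 1) \bmod N}$ for a fixed list $a$, which is stationary under the cyclic shift. By periodicity, each average $S_n/n$ with $n > N$ is a weighted mean of $S_N/N$ and some $S_r/r$ with $r < N$, so the supremum over $n$ is attained for $n \le N$.

```python
import random


def max_average(a, start):
    """M(omega) = max over 1 <= n <= N of S_n / n, for the cyclic sequence
    xi_k(omega) = a[(omega + k - 1) % N] started at omega = start."""
    n_pts = len(a)
    best, s = float("-inf"), 0.0
    for n in range(1, n_pts + 1):
        s += a[(start + n - 1) % n_pts]
        best = max(best, s / n)
    return best


def maximum_inequality_sides(a, r):
    """Return (lhs, rhs) for  r * P{M > 2r} <= E[xi_1; xi_1 > r]
    under the uniform distribution on the starting point omega."""
    n_pts = len(a)
    lhs = r * sum(max_average(a, w) > 2 * r for w in range(n_pts)) / n_pts
    rhs = sum(x for x in a if x > r) / n_pts
    return lhs, rhs


# Hypothetical data: one random cycle of length 50, several thresholds r.
rng = random.Random(1)
cycle = [rng.uniform(-2.0, 3.0) for _ in range(50)]
sides = [maximum_inequality_sides(cycle, r) for r in (0.1, 0.5, 1.0, 1.4)]
```

In each pair of `sides`, the first entry should not exceed the second.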
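Proof of Proposition 10.10: We may clearly assume that $\xi_1 \ge 0$ a.s.
(i) By Lemma 10.11, Fubini's theorem, and some calculus,
$$\begin{aligned}
E M^p = p E\int_0^M r^{p-1}\,dr &= p\int_0^\infty P\{M > r\}\, r^{p-1}\,dr \\
&\le 2p\int_0^\infty E[\xi_1;\ 2\xi_1 > r]\, r^{p-2}\,dr \\
&= 2p\,E\,\xi_1 \int_0^{2\xi_1} r^{p-2}\,dr \\
&= 2p(p-1)^{-1} E\,\xi_1 (2\xi_1)^{p-1} \lesssim E\xi_1^p.
\end{aligned}$$
(ii) For $m = 0$, we may write
$$\begin{aligned}
EM - 1 \le E(M - 1)_+ &= \int_1^\infty P\{M > r\}\,dr \\
&\le 2\int_1^\infty E[\xi_1;\ 2\xi_1 > r]\, r^{-1}\,dr \\
&= 2E\,\xi_1 \int_1^{2\xi_1 \vee 1} r^{-1}\,dr = 2E\,\xi_1 \log_+ 2\xi_1 \\
&\le e + 2E[\xi_1 \log 2\xi_1;\ 2\xi_1 > e] \\
&\lesssim 1 + E\,\xi_1 \log_+ \xi_1.
\end{aligned}$$
For $m > 0$, we instead write
$$\begin{aligned}
E M \log_+^m M &= \int_0^\infty P\{M \log_+^m M > r\}\,dr \\
&= \int_1^\infty P\{M > t\}\,\bigl(m \log^{m-1} t + \log^m t\bigr)\,dt \\
&\le 2\int_1^\infty E[\xi_1;\ 2\xi_1 > t]\,\bigl(m \log^{m-1} t + \log^m t\bigr)\, t^{-1}\,dt \\
&= 2E\,\xi_1 \int_0^{\log_+ 2\xi_1} \bigl(m x^{m-1} + x^m\bigr)\,dx \\
&= 2E\,\xi_1 \Bigl(\log_+^m 2\xi_1 + \frac{\log_+^{m+1} 2\xi_1}{m+1}\Bigr) \\
&\le 2e + 4E[\xi_1 \log^{m+1} 2\xi_1;\ 2\xi_1 > e] \\
&\lesssim 1 + E\,\xi_1 \log_+^{m+1} \xi_1.
\end{aligned}$$
□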
Given a measure space $(S, \mathcal{S}, \mu)$, we introduce for any $m \ge 0$ the class
$L \log^m L(\mu)$ of measurable functions $f$ on $S$ satisfying $\int |f| \log_+^m |f|\,d\mu < \infty$.
Note in particular that $L \log^0 L = L^1$. Using the maximum inequalities of
Proposition 10.10, we may prove the following multivariate version of
Theorem 10.6 for possibly noncommuting, measure-preserving transformations
$T_1, \dots, T_d$.
Theorem 10.12 (multivariate ergodic theorem, Zygmund) Let $\xi$ be a
random element in $S$ with distribution $\mu$, let $T_1, \dots, T_d$ be $\mu$-preserving maps
on $S$ with invariant $\sigma$-fields $\mathcal{I}_1, \dots, \mathcal{I}_d$, and put $\mathcal{J}_k = \xi^{-1}\mathcal{I}_k$. Then for any
$f \in L \log^{d-1} L(\mu)$, we have as $n_1, \dots, n_d \to \infty$
$$(n_1 \cdots n_d)^{-1} \sum_{k_1 < n_1} \cdots \sum_{k_d < n_d} f(T_1^{k_1} \cdots T_d^{k_d}\xi) \to E^{\mathcal{J}_d} \cdots E^{\mathcal{J}_1} f(\xi) \quad \text{a.s.} \tag{3}$$
The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.
Proof: Since $E[f(\xi) \mid \mathcal{J}_k] = \mu[f \mid \mathcal{I}_k] \circ \xi$ a.s., e.g. by Theorem 10.6, we may
take $\xi$ to be the identity mapping on $S$. For $d = 1$ the result reduces to
Theorem 10.6. Now assume the statement to be true up to dimension $d$.
Proceeding by induction, consider any $\mu$-preserving maps $T_1, \dots, T_{d+1}$ on $S$
and let $f \in L \log^d L$. By the induction hypothesis, the $d$-dimensional version
of (3) holds as stated, and we may write the result in the form $f_m \to \bar f$
a.s., where $m = (n_1, \dots, n_d)$. Iterating Proposition 10.10, we also note that
$\mu \sup_m |f_m| < \infty$. Hence, by Corollary 10.8 (i) we have as $m, n \to \infty$
$$n^{-1}\sum_{k<n} f_m \circ T_{d+1}^k \to \mu[\bar f \mid \mathcal{I}_{d+1}] \quad \text{a.s.},$$
as required. The proof of the $L^p$-version is similar. □
In the commutative case, the last result leads immediately to an
interesting relationship between the associated conditional expectations. Let $L^1(\xi)$
denote the set of all integrable, $\xi$-measurable random variables.
Corollary 10.13 (commuting maps and expectations) Assume in
Theorem 10.12 that $T_1, \dots, T_d$ commute, and put $\mathcal{J} = \bigcap_k \mathcal{J}_k$. Then
$$E^{\mathcal{J}_1} \cdots E^{\mathcal{J}_d} = E^{\mathcal{J}} \quad \text{on } L^1(\xi).$$
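Proof: Since even $T_1^{k_1}, \dots, T_d^{k_d}$ commute for arbitrary $k_1, \dots, k_d \in \mathbb{Z}_+$,
Theorem 10.12 yields
$$E^{\mathcal{J}_1} \cdots E^{\mathcal{J}_d} f(\xi) = E^{\mathcal{J}_{p_1}} \cdots E^{\mathcal{J}_{p_d}} f(\xi) \quad \text{a.s.} \tag{4}$$
for any measurable function $f \ge 0$ on $S$ and permutation $p_1, \dots, p_d$ of
$1, \dots, d$. In particular, the expression in (4) is a.s. $\mathcal{J}_k$-measurable for every
$k$ and therefore $\mathcal{J}$-measurable. It remains to note that
$$E[E^{\mathcal{J}_1} \cdots E^{\mathcal{J}_d} f(\xi);\ A] = E[f(\xi);\ A], \quad A \in \mathcal{J}.$$
□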
For commuting mappings $T_1, \dots, T_d$ on $S$, we note that the compositions
$T^k = T_1^{k_1} \cdots T_d^{k_d}$ form a $d$-dimensional semigroup indexed by $\mathbb{Z}_+^d$. Similarly,
when $(T_1^s), \dots, (T_d^s)$ are commuting flows on $S$, the compositions
$T^s = T_1^{s_1} \cdots T_d^{s_d}$ form a $d$-dimensional measurable semigroup or flow indexed by
$\mathbb{R}_+^d$. In the continuous parameter case, it may be more natural to consider
flows indexed by $\mathbb{R}^d$, corresponding to the case of stationary processes on
$\mathbb{R}^d$. In this context, one may also want to average over more general sets
than rectangles. Here we consider a basic ergodic theorem for increasing
sequences of convex sets. Given such a set $B$, we define the inner radius
$r(B)$ as the radius of the largest open ball contained in $B$. Put $\lambda^d B = |B|$.
Theorem 10.14 (monotone, multivariate ergodic theorem, Wiener) Let
$\xi$ be a random element in $S$ with distribution $\mu$. Consider a flow of
$\mu$-preserving maps $T_s$, $s \in \mathbb{R}^d$, on $S$ with invariant $\sigma$-field $\mathcal{I}$ and fix some
bounded, convex sets $B_1 \subset B_2 \subset \cdots$ in $\mathcal{B}^d$ with $r(B_n) \to \infty$. Then for any
measurable function $f \ge 0$ on $S$,
$$|B_n|^{-1}\int_{B_n} f(T_s\xi)\,ds \to E[f(\xi) \mid \mathcal{I}_\xi] \quad \text{a.s.}$$
The same convergence holds in $L^p$ for some $p \ge 1$ when $f \in L^p(\mu)$.
Several lemmas are needed for the proof. We begin with some
estimates for convex sets, stated here without proof. Let $\partial_\varepsilon B$ denote the
$\varepsilon$-neighborhood of the boundary $\partial B$, and write binomial coefficients as
$(n /\!/ k)$.
Lemma 10.15 (convex sets) If $B \subset \mathbb{R}^d$ is convex and $\varepsilon > 0$, then
(i) $|B - B| \le (2d /\!/ d)\,|B|$;
(ii) $|\partial_\varepsilon B| \le 2\bigl((1 + \varepsilon/r(B))^d - 1\bigr)|B|$.
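Part (i) can be examined for planar polygons, where $(2d /\!/ d) = 6$. The sketch below is an illustrative check with assumed helper names. It uses the fact that, for a convex polygon $B$, the difference set $B - B$ is the convex hull of the pairwise differences of the vertices.

```python
from math import comb


def cross(o, a, b):
    """Cross product of the vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices counterclockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    n = len(vertices)
    twice = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice) / 2


def difference_body_area(vertices):
    """Area of B - B = {x - y : x, y in B} for the convex polygon B."""
    diffs = [(p[0] - q[0], p[1] - q[1]) for p in vertices for q in vertices]
    return polygon_area(convex_hull(diffs))


triangle = [(0, 0), (1, 0), (0, 1)]
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
bound = comb(4, 2)  # (2d // d) with d = 2
```

For the triangle, $B - B$ is a hexagon of area 3, which equals $6|B|$, so the bound is attained. For the unit square, $B - B = [-1, 1]^2$ has area 4, well below the bound of 6.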
We continue with a simple geometric estimate.
Lemma 10.16 (space filling) Fix any bounded, convex sets $B_1 \subset \cdots \subset B_m$
in $\mathcal{B}^d$ with $|B_1| > 0$, a bounded set $K \in \mathcal{B}^d$, and a function
$p\colon K \to \{1, \dots, m\}$. Then there exists a finite subset $H \subset K$ such that the sets
$B_{p(x)} + x$, $x \in H$, are disjoint and satisfy $|K| \le (2d /\!/ d) \sum_{x \in H} |B_{p(x)}|$.
Proof: Put $C_x = B_{p(x)} + x$ and choose $x_1, x_2, \dots \in K$ recursively, as
follows. Once $x_1, \dots, x_{j-1}$ have been selected, we choose $x_j \in K$ with the
largest possible $p(x)$ such that $C_{x_i} \cap C_{x_j} = \emptyset$ for all $i < j$. The construction
terminates when no such $x_j$ exists. Put $H = \{x_i\}$, and note that the sets
$C_x$ with $x \in H$ are disjoint. Now fix any $y \in K$. By the construction of $H$
we have $C_x \cap C_y \ne \emptyset$ for some $x \in H$ with $p(x) \ge p(y)$, and so
$$y \in B_{p(x)} - B_{p(y)} + x \subset B_{p(x)} - B_{p(x)} + x.$$
Hence, $K \subset \bigcup_{x \in H} (B_{p(x)} - B_{p(x)} + x)$, and so by Lemma 10.15 (i)
$$|K| \le \sum_{x \in H} |B_{p(x)} - B_{p(x)}| \le (2d /\!/ d) \sum_{x \in H} |B_{p(x)}|.$$
□
We may now establish a multivariate version of Lemma 10.11, stated
for convenience in terms of random measures (see the detailed discussion
below). For motivation, we note that the set function $\eta B = \int_B f(T_s\xi)\,ds$ in
Theorem 10.14 is a stationary random measure on $\mathbb{R}^d$ and that the intensity
$m$ of $\eta$, defined by the relation $E\eta = m\lambda^d$, is equal to $Ef(\xi)$.
Lemma 10.17 (maximum inequality) Let $\xi$ be a stationary random
measure on $\mathbb{R}^d$ with intensity $m$, and let $B_1 \subset B_2 \subset \cdots$ be bounded, convex
sets in $\mathcal{B}^d$ with $|B_1| > 0$. Then
$$r P\{\sup_k (\xi B_k/|B_k|) > r\} \le m\,(2d /\!/ d), \quad r > 0.$$
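Proof: Fix any $r, a > 0$ and $n \in \mathbb{N}$, and define a process $\nu$ on $\mathbb{R}^d$ and a
random set $K$ in $S_a = \{x \in \mathbb{R}^d;\ |x| \le a\}$ by
$$\begin{aligned}
\nu(x) &= \inf\{k \in \mathbb{N};\ \xi(B_k + x) > r|B_k|\}, \quad x \in \mathbb{R}^d, \\
K &= \{x \in S_a;\ \nu(x) \le n\}.
\end{aligned}$$
By Lemma 10.16 there exists a finite, random subset $H \subset K$ such that the
sets $B_{\nu(x)} + x$, $x \in H$, are disjoint and $|K| \le (2d /\!/ d) \sum_{x \in H} |B_{\nu(x)}|$. Writing
$b = \sup\{|x|;\ x \in B_n\}$, we get
$$\xi S_{a+b} \ge \sum_{x \in H} \xi(B_{\nu(x)} + x) \ge r \sum_{x \in H} |B_{\nu(x)}| \ge r|K| / (2d /\!/ d).$$
Taking expectations and using Fubini's theorem and the stationarity and
measurability of $\nu$, we obtain
$$\begin{aligned}
m\,(2d /\!/ d)\,|S_{a+b}| \ge r E|K| &= r\int_{S_a} P\{\nu(x) \le n\}\,dx \\
&= r|S_a|\, P\{\max_{k \le n} (\xi B_k/|B_k|) > r\}.
\end{aligned}$$
Now divide by $|S_a|$, and then let $a \to \infty$ and $n \to \infty$ in this order. □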
We finally need an elementary Hilbert-space result. Recall that a
contraction on a Hilbert space $H$ is defined as a linear operator $T$ such that
$\|T\xi\| \le \|\xi\|$ for all $\xi \in H$. For any linear subspace $M \subset H$, we write $M^\perp$
for the orthogonal complement and $\bar M$ for the closure of $M$. The adjoint $T^*$
of an operator $T$ is characterized by the identity $\langle \xi, T\eta \rangle = \langle T^*\xi, \eta \rangle$, where
$\langle \cdot, \cdot \rangle$ denotes the inner product in $H$.
Lemma 10.18 (invariant subspace) For any family $\mathcal{T}$ of contractions on
a Hilbert space $H$, let $N$ denote the $\mathcal{T}$-invariant subspace of $H$, and let $R$
be the linear subspace of $H$ spanned by the set $\{\xi - T\xi;\ \xi \in H,\ T \in \mathcal{T}\}$.
Then $N^\perp \subset \bar R$.
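Proof: If $\xi \perp R$, then
$$\langle \xi - T^*\xi, \eta \rangle = \langle \xi, \eta - T\eta \rangle = 0, \quad T \in \mathcal{T},\ \eta \in H,$$
which implies $T^*\xi = \xi$ for every $T \in \mathcal{T}$. Hence, for any $T \in \mathcal{T}$ we have
$\langle T\xi, \xi \rangle = \langle \xi, T^*\xi \rangle = \|\xi\|^2$, and so by the contraction property,
$$\begin{aligned}
0 \le \|T\xi - \xi\|^2 &= \|T\xi\|^2 + \|\xi\|^2 - 2\langle T\xi, \xi \rangle \\
&\le 2\|\xi\|^2 - 2\|\xi\|^2 = 0,
\end{aligned}$$
which implies $T\xi = \xi$. This gives $R^\perp \subset N$, and so $N^\perp \subset (R^\perp)^\perp = \bar R$. □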
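Proof of Theorem 10.14: First assume that $f \in L^1$, and define
$$\tilde T_s f = f \circ T_s, \qquad A_n = |B_n|^{-1}\int_{B_n} \tilde T_s\,ds.$$
For any $\varepsilon > 0$, Lemma 10.18 yields a measurable decomposition
$$f = f^\varepsilon + \sum_{k \le m} \bigl(g_k^\varepsilon - \tilde T_{s_k} g_k^\varepsilon\bigr) + h^\varepsilon,$$
where $f^\varepsilon \in L^2$ is $\tilde T_s$-invariant for all $s \in \mathbb{R}^d$, the functions $g_1^\varepsilon, \dots, g_m^\varepsilon$ are
bounded, and $E|h^\varepsilon(\xi)| < \varepsilon$. Here clearly $A_n f^\varepsilon \equiv f^\varepsilon$. Next, we see from
Lemma 10.15 (ii) that, as $n \to \infty$ for fixed $k \le m$ and $\varepsilon > 0$,
$$\begin{aligned}
\|A_n(g_k^\varepsilon - \tilde T_{s_k} g_k^\varepsilon)\| &\le \bigl(|(B_n + s_k) \triangle B_n| / |B_n|\bigr)\, \|g_k^\varepsilon\| \\
&\le 2\bigl((1 + |s_k|/r(B_n))^d - 1\bigr)\, \|g_k^\varepsilon\| \to 0.
\end{aligned}$$
Finally, Lemma 10.17 yields
$$r P\{\sup_n A_n |h^\varepsilon(\xi)| > r\} \le (2d /\!/ d)\, E|h^\varepsilon(\xi)| \le (2d /\!/ d)\,\varepsilon, \quad r, \varepsilon > 0,$$
which implies $\sup_n A_n |h^\varepsilon(\xi)| \overset{P}{\to} 0$ as $\varepsilon \to 0$. In particular, it follows that
$\liminf_n A_n f(\xi) < \infty$ a.s., which justifies the estimate
$$\begin{aligned}
(\limsup_n - \liminf_n)\, A_n f(\xi) &= (\limsup_n - \liminf_n)\, A_n h^\varepsilon(\xi) \\
&\le 2 \sup_n A_n |h^\varepsilon(\xi)| \overset{P}{\to} 0.
\end{aligned}$$
This shows that the left-hand side vanishes a.s., and the required a.s.
convergence follows.
When $f \in L^p$ for some $p \ge 1$, the asserted $L^p$-convergence follows as
before from the uniform integrability of the powers $|A_n f(\xi)|^p$. We may now
identify the limit, as in the proof of Corollary 10.9, and the a.s. convergence
extends to arbitrary $f \ge 0$, as in the case of Theorem 10.6. □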
We turn to a version of Theorem 10.14 for random measures on $\mathbb{R}^d$.
Recall that a random measure $\xi$ on $\mathbb{R}^d$ is defined as a locally finite kernel
from the basic probability space $(\Omega, \mathcal{A}, P)$ into $\mathbb{R}^d$. In other words, $\xi(\omega, B)$
is required to be a locally finite measure in $B \in \mathcal{B}^d$ for fixed $\omega \in \Omega$ and
a random variable in $\omega \in \Omega$ for every bounded set $B \in \mathcal{B}^d$. Alternatively,
we may regard $\xi$ as a random element in the space $\mathcal{M}(\mathbb{R}^d)$ of locally finite
measures $\mu$ on $\mathbb{R}^d$, endowed with the $\sigma$-field generated by all evaluation
maps $\mu \mapsto \mu B$ with $B \in \mathcal{B}^d$.
We say that $\xi$ is stationary if $\theta_s\xi \overset{d}{=} \xi$ for every $s \in \mathbb{R}^d$, where the
shift operators $\theta_s$ on $\mathcal{M}(\mathbb{R}^d)$ are defined by $(\theta_s\mu)B = \mu(B + s)$ for all
$B \in \mathcal{B}^d$. The invariant $\sigma$-field of $\xi$ is given by $\mathcal{I}_\xi = \xi^{-1}\mathcal{I}$, where $\mathcal{I}$ denotes
the $\sigma$-field of all shift-invariant, measurable sets in $\mathcal{M}(\mathbb{R}^d)$. We may now
define the sample intensity of $\xi$ as the extended-valued random variable
$\bar\xi = E[\xi B \mid \mathcal{I}_\xi]/|B|$, where $B \in \mathcal{B}^d$ is arbitrary with $|B| \in (0, \infty)$. Note that
this expression is independent of $B$, by the stationarity of $\xi$ and Theorem
2.6.
Corollary 10.19 (sample intensity, Nguyen and Zessin) Let $\xi$ be a
stationary random measure on $\mathbb{R}^d$, and fix some bounded, convex sets
$B_1 \subset B_2 \subset \cdots$ in $\mathcal{B}^d$ with $r(B_n) \to \infty$. Then $\xi B_n/|B_n| \to \bar\xi$ a.s., where
$\bar\xi\lambda^d = E[\xi \mid \mathcal{I}_\xi]$. The same convergence holds in $L^p$ for some $p \ge 1$ when
$\xi[0, 1]^d \in L^p$.
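Proof: By Fubini's theorem, we have for any $A, B \in \mathcal{B}^d$
$$\begin{aligned}
\int_B (\theta_s\xi)A\,ds &= \int_B ds \int 1_A(t - s)\,\xi(dt) \\
&= \int \xi(dt) \int_B 1_A(t - s)\,ds = \xi(1_A * 1_B).
\end{aligned}$$
Assuming $|A| = 1$ and $A \subset S_a = \{s;\ |s| < a\}$, and putting $B^+ = B + S_a$
and $B^- = (B^c + S_a)^c$, we note that also $1_A * 1_{B^-} \le 1_B \le 1_A * 1_{B^+}$.
Applying this to the sets $B = B_n$ gives
$$\frac{|B_n^-|}{|B_n|}\, \frac{\xi(1_A * 1_{B_n^-})}{|B_n^-|} \le \frac{\xi B_n}{|B_n|} \le \frac{|B_n^+|}{|B_n|}\, \frac{\xi(1_A * 1_{B_n^+})}{|B_n^+|}.$$
Since $r(B_n) \to \infty$, Lemma 10.15 (ii) yields $|B_n^\pm|/|B_n| \to 1$. Next we may
apply Theorem 10.14 to the function $f(\mu) = \mu A$ and the convex sets $B_n^\pm$
to obtain $\xi(1_A * 1_{B_n^\pm})/|B_n^\pm| \to E[\xi A \mid \mathcal{I}_\xi] = \bar\xi$ in the appropriate sense. □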
The LP-versions of Theorem 10.14 and Corollary 10.19 remain valid un-
der weaker conditions than previously indicated. The following results are
adequate for most purposes.
Here we say that the distributions (probability measures) $\mu_n$ on $\mathbb{R}^d$
are asymptotically invariant if $\|\mu_n - \mu_n * \delta_s\| \to 0$ for every $s \in \mathbb{R}^d$,
where $\|\cdot\|$ denotes the total variation norm. Similarly, the weight functions
(probability densities) $f_n$ on $\mathbb{R}^d$ are said to be asymptotically invariant if
$\lambda^d|f_n - \theta_s f_n| \to 0$ for every $s$. Note that the conclusion of Theorem 10.14
can be written as $\mu_n X \to \bar X$, where $\mu_n = (1_{B_n} \cdot \lambda^d)/|B_n|$, $X_s = f(T_s\xi)$,
and $\bar X = E[f(\xi) \mid \mathcal{I}_\xi]$.
Corollary 10.20 (mean ergodic theorem)
(i) For any $p \ge 1$, consider on $\mathbb{R}^d$ a stationary, measurable, and
$L^p$-valued process $X$ and some asymptotically invariant distributions $\mu_n$.
Then $\mu_n X \to \bar X \equiv E[X_s \mid \mathcal{I}_X]$ in $L^p$.
(ii) Consider on $\mathbb{R}^d$ a stationary random measure $\xi$ with finite intensity
and some asymptotically invariant weight functions $f_n$. Then $\xi f_n \to \bar\xi$
in $L^1$, where $\bar\xi\lambda^d = E[\xi \mid \mathcal{I}_\xi]$.
Proof: (i) By Theorem 10.14 we may choose some distributions $\nu_m$ on $\mathbb{R}^d$
such that $\nu_m X \to \bar X$ in $L^p$. Using Minkowski's inequality and its extension
in Corollary 1.30, along with the stationarity of $X$, the invariance of $\bar X$,
and dominated convergence, we get as $n \to \infty$ and then $m \to \infty$
$$\begin{aligned}
\|\mu_n X - \bar X\|_p &\le \|\mu_n X - (\mu_n * \nu_m)X\|_p + \|(\mu_n * \nu_m)X - \bar X\|_p \\
&\le \|\mu_n - \mu_n * \nu_m\|\, \|X_0\|_p + \int \|(\delta_s * \nu_m)X - \bar X\|_p\, \mu_n(ds) \\
&\le \|X_0\|_p \int \|\mu_n - \mu_n * \delta_t\|\, \nu_m(dt) + \|\nu_m X - \bar X\|_p \to 0.
\end{aligned}$$
(ii) By Corollary 10.19 we may choose some weight functions $g_m$ such
that $\xi g_m \to \bar\xi$ in $L^1$. Using Minkowski's inequality, the stationarity of $\xi$,
the invariance of $\bar\xi$, and dominated convergence, we get as $n \to \infty$ and then
$m \to \infty$
$$\begin{aligned}
\|\xi f_n - \bar\xi\|_1 &\le \|\xi f_n - \xi(f_n * g_m)\|_1 + \|\xi(f_n * g_m) - \bar\xi\|_1 \\
&\le E\xi|f_n - f_n * g_m| + \int \|\xi(\delta_s * g_m) - \bar\xi\|_1\, f_n(s)\,ds \\
&\le E\bar\xi \int \lambda^d|f_n - \theta_t f_n|\, g_m(t)\,dt + \|\xi g_m - \bar\xi\|_1 \to 0.
\end{aligned}$$
□
Additional conditions may be needed to ensure $L^p$-convergence in case
(ii) when $\xi B \in L^p$ for bounded sets $B$. It is certainly enough to require
$f_n \le c\,g_n$ for some weight functions $g_n$ with $\xi g_n \to \bar\xi$ in $L^p$ and some
constant $c > 0$.
Our next aim is to prove a subadditive version of Theorem 10.6. For
motivation and subsequent needs, we begin with a simple result for nonrandom
sequences. Recall that a sequence $c_1, c_2, \dots \in \mathbb{R}$ is said to be subadditive if
$c_{m+n} \le c_m + c_n$ for all $m, n \in \mathbb{N}$.
Lemma 10.21 (subadditivity) For any subadditive sequence $c_1, c_2, \dots \in \mathbb{R}$,
we have
$$\lim_{n\to\infty} \frac{c_n}{n} = \inf_n \frac{c_n}{n} \in [-\infty, \infty).$$
Proof: Iterating the subadditivity relation, we get for any $k, n \in \mathbb{N}$
$$c_n \le [n/k]\,c_k + c_{n - k[n/k]} \le [n/k]\,c_k + c_0 \vee \cdots \vee c_{k-1},$$
where $c_0 = 0$. Noting that $[n/k] \sim n/k$ as $n \to \infty$, we get $\limsup_n (c_n/n) \le c_k/k$
for all $k$, and so
$$\inf_n \frac{c_n}{n} \le \liminf_{n\to\infty} \frac{c_n}{n} \le \limsup_{n\to\infty} \frac{c_n}{n} \le \inf_n \frac{c_n}{n}.$$
□
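A quick numerical illustration of Lemma 10.21, using an example sequence that does not appear in the text: $c_n = an + \sqrt{n}$ is subadditive because $\sqrt{m+n} \le \sqrt{m} + \sqrt{n}$, and $c_n/n = a + n^{-1/2}$ decreases to $a$. The value $a = 0.25$ and the range of $n$ are arbitrary choices.

```python
import math


def c(n, a=0.25):
    """Illustrative subadditive sequence c_n = a*n + sqrt(n)."""
    return a * n + math.sqrt(n)


N = 10_000
ratios = [c(n) / n for n in range(1, N + 1)]

# Spot-check subadditivity c_{m+n} <= c_m + c_n on a finite grid.
subadditive = all(
    c(m + n) <= c(m) + c(n) + 1e-12
    for m in range(1, 60)
    for n in range(1, 60)
)
```

Here `ratios` decreases toward 0.25, which is both the limit and the infimum of $c_n/n$. The infimum is not attained, which the lemma allows.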
We turn to the more general case of a two-dimensional array $c_{jk}$,
$0 \le j < k$, which is said to be subadditive if $c_{0,n} \le c_{0,m} + c_{m,n}$ for all $m < n$.
The present notion reduces to the previous one when $c_{jk} = c_{k-j}$ for some
sequence $c_k$. We also note that subadditivity holds automatically for arrays
of the form $c_{jk} = a_{j+1} + \cdots + a_k$.
We shall now extend the ergodic theorem to subadditive arrays of random
variables $\xi_{jk}$, $0 \le j < k$. For motivation, we recall from Theorem 10.6 that if
$\xi_{jk} = \eta_{j+1} + \cdots + \eta_k$ for some stationary and integrable sequence of random
variables $\eta_k$, then $\xi_{0,n}/n$ converges a.s. and in $L^1$. A similar result holds
for general subadditive arrays $(\xi_{jk})$ that are stationary under simultaneous
shifts in the two indices, so that $(\xi_{j+1,k+1}) \overset{d}{=} (\xi_{j,k})$. To allow for a wider
range of applications, we introduce the slightly weaker assumptions
$$(\xi_{k,2k}, \xi_{2k,3k}, \dots) \overset{d}{=} (\xi_{0,k}, \xi_{k,2k}, \dots), \quad k \in \mathbb{N}, \tag{5}$$
$$(\xi_{k,k+1}, \xi_{k,k+2}, \dots) \overset{d}{=} (\xi_{0,1}, \xi_{0,2}, \dots), \quad k \in \mathbb{N}. \tag{6}$$
For convenience of reference, we also restate the subadditivity requirement:
$$\xi_{0,n} \le \xi_{0,m} + \xi_{m,n}, \quad 0 < m < n. \tag{7}$$
Theorem 10.22 (subadditive ergodic theorem, Kingman) Let $(\xi_{jk})$ be a
subadditive array of random variables satisfying (5) and (6), and assume
that $E\xi_{0,1}^+ < \infty$. Then $\xi_{0,n}/n$ converges a.s. toward a random variable $\bar\xi$ in
$[-\infty, \infty)$ with $E\bar\xi = \inf_n (E\xi_{0,n}/n) \equiv c$. The same convergence holds in $L^1$
when $c > -\infty$. If the sequences in (5) are ergodic, then $\bar\xi$ is a.s. a constant.
Proof (Liggett): Put $\xi_{0,n} = \xi_n$ for convenience. By (6) and (7) we have
$E\xi_n^+ \le nE\xi_1^+ < \infty$. We first assume $c > -\infty$, so that the variables $\xi_{m,n}$
are integrable. Iterating (7) gives
$$\frac{\xi_n}{n} \le \sum_{j=1}^{[n/k]} \xi_{(j-1)k,jk}/n + \sum_{j=k[n/k]+1}^{n} \xi_{j-1,j}/n, \quad n, k \in \mathbb{N}. \tag{8}$$
By (5) the sequence $\xi_{(j-1)k,jk}$, $j \in \mathbb{N}$, is stationary for fixed $k$, and so
by Theorem 10.6 we have $n^{-1}\sum_{j \le n} \xi_{(j-1)k,jk} \to \bar\xi_k$ a.s. and in $L^1$, where
$E\bar\xi_k = E\xi_k$. Hence, the first term in (8) tends a.s. and in $L^1$ toward $\bar\xi_k/k$.
Similarly, $n^{-1}\sum_{j \le n} \xi_{j-1,j} \to \bar\xi_1$ a.s. and in $L^1$, and so the second term in
(8) tends in the same sense to $0$. Thus, the right-hand side converges a.s.
and in $L^1$ toward $\bar\xi_k/k$, and since $k$ is arbitrary, we get
$$\limsup_n (\xi_n/n) \le \inf_n (\bar\xi_n/n) \equiv \bar\xi < \infty \quad \text{a.s.} \tag{9}$$
The variables $\xi_n^+/n$ are uniformly integrable by Proposition 4.12, and
moreover
$$E \limsup_n (\xi_n/n) \le E\bar\xi \le \inf_n (E\bar\xi_n/n) = \inf_n (E\xi_n/n) = c. \tag{10}$$
To derive a lower bound, let $\kappa_n \perp\!\!\!\perp (\xi_{jk})$ be uniformly distributed over
$\{1, \dots, n\}$ for each $n$, and define
$$\zeta_k^n = \xi_{\kappa_n, \kappa_n + k}, \qquad \eta_k^n = \xi_{\kappa_n + k} - \xi_{\kappa_n + k - 1}, \quad k \in \mathbb{N}.$$
By (6) we have
$$(\zeta_1^n, \zeta_2^n, \dots) \overset{d}{=} (\xi_1, \xi_2, \dots), \quad n \in \mathbb{N}. \tag{11}$$
Moreover, $\eta_k^n \le \xi_{\kappa_n + k - 1, \kappa_n + k} \overset{d}{=} \xi_1$ by (6) and (7), and so the
variables $(\eta_k^n)^+$ are uniformly integrable. On the other hand, the sequence
$E\xi_1, E\xi_2, \dots$ is subadditive, and so by Lemma 10.21 we have as $n \to \infty$
$$E\eta_k^n = n^{-1}(E\xi_{n+k} - E\xi_k) \to \inf_n (E\xi_n/n) = c, \quad k \in \mathbb{N}. \tag{12}$$
In particular, $\sup_n E|\eta_k^n| < \infty$, which shows that the sequence $\eta_k^1, \eta_k^2, \dots$ is
tight for each $k$. Hence, by Theorems 4.29, 5.19, and 6.14, there exist some
random variables $\zeta_k$ and $\eta_k$ such that
$$(\zeta_1^n, \zeta_2^n, \dots;\ \eta_1^n, \eta_2^n, \dots) \overset{d}{\to} (\zeta_1, \zeta_2, \dots;\ \eta_1, \eta_2, \dots) \tag{13}$$
along a subsequence. Here $(\zeta_k) \overset{d}{=} (\xi_k)$ by (11), and so by Theorem 6.10 we
may assume that $\zeta_k = \xi_k$ for each $k$.
The sequence $\eta_1, \eta_2, \dots$ is clearly stationary, and by Lemma 4.11 it is
also integrable. From (7) we get
$$\eta_1^n + \cdots + \eta_k^n = \xi_{\kappa_n + k} - \xi_{\kappa_n} \le \xi_{\kappa_n, \kappa_n + k} = \zeta_k^n,$$
and so in the limit $\eta_1 + \cdots + \eta_k \le \xi_k$ a.s. Hence, Theorem 10.6 yields
$$\xi_n/n \ge n^{-1}\sum_{k \le n} \eta_k \to \bar\eta \quad \text{a.s. and in } L^1$$
for some $\bar\eta \in L^1$. In particular, the variables $\xi_n^-/n$ are uniformly integrable,
and so the same thing is true for $\xi_n/n$. Using Lemma 4.11 and the uniform
integrability of the variables $(\eta_k^n)^+$ together with (10) and (12), we get
$$\begin{aligned}
c = \limsup_n E\eta_1^n \le E\eta_1 = E\bar\eta &\le E \liminf_n (\xi_n/n) \\
&\le E \limsup_n (\xi_n/n) \le E\bar\xi \le c.
\end{aligned}$$
Thus, $\xi_n/n$ converges a.s., and by (9) the limit equals $\bar\xi$. Furthermore, by
Lemma 4.11 the convergence holds even in $L^1$, and $E\bar\xi = c$. If the sequences
in (5) are ergodic, then $\bar\xi_n = E\xi_n$ a.s. for each $n$, and we get $\bar\xi = c$ a.s.
Now assume instead that $c = -\infty$. Then for each $r \in \mathbb{Z}$, the truncated
array $\xi_{m,n} \vee r(n - m)$, $0 \le m < n$, satisfies the hypotheses of the theorem
with $c$ replaced by $c^r = \inf_n (E\xi_n^r/n) \ge r$, where $\xi_n^r = \xi_n \vee rn$. Thus,
$\xi_n^r/n = (\xi_n/n) \vee r$ converges a.s. toward some random variable $\bar\xi^r$ with
mean $c^r$, and so $\xi_n/n \to \inf_r \bar\xi^r \equiv \bar\xi$. Finally, $E\bar\xi \le \inf_r c^r = c = -\infty$ by
monotone convergence. □
As an application of the last theorem, we may derive a celebrated ergodic
theorem for products of random matrices.
Theorem 10.23 (random matrices, Furstenberg and Kesten) Consider
a stationary sequence of random $d \times d$ matrices $X^k = (X^k_{ij})$ such that
$X^k_{ij} > 0$ a.s. and $E|\log X^k_{ij}| < \infty$ for all $i$ and $j$. Then $n^{-1}\log(X^1 \cdots X^n)_{ij}$
converges a.s. and in $L^1$ as $n \to \infty$, and the limit is independent of $(i, j)$.
Proof: First let $i = j = 1$, and define
$$\xi_{m,n} = \log(X^{m+1} \cdots X^n)_{11}, \quad 0 \le m < n.$$
The array $(-\xi_{m,n})$ is clearly subadditive and jointly stationary, and we
have $E|\xi_{0,1}| < \infty$ by hypothesis. Further note that
$$(X^1 \cdots X^n)_{11} \le d^{\,n-1} \prod_{k \le n} \max_{i,j} X^k_{ij}.$$
Hence,
$$\xi_{0,n} - (n - 1)\log d \le \sum_{k \le n} \log \max_{i,j} X^k_{ij} \le \sum_{k \le n} \sum_{i,j} |\log X^k_{ij}|,$$
and so
$$n^{-1} E\xi_{0,n} \le \log d + \sum_{i,j} E|\log X^1_{ij}| < \infty.$$
Thus, by Theorem 10.22 and its proof, there exists an invariant random
variable $\bar\xi$ such that $\xi_{0,n}/n \to \bar\xi$ a.s. and in $L^1$.
To extend the convergence to arbitrary $i, j \in \{1, \dots, d\}$, we write for any
$n \in \mathbb{N}$
$$\begin{aligned}
X^2_{i1} (X^3 \cdots X^n)_{11} X^{n+1}_{1j} &\le (X^2 \cdots X^{n+1})_{ij} \\
&\le (X^1_{1i} X^{n+2}_{j1})^{-1} (X^1 \cdots X^{n+2})_{11}.
\end{aligned}$$
Noting that $n^{-1}\log X^n_{ij} \to 0$ a.s. and in $L^1$ by Theorem 10.6, and
using the stationarity of $(X^n)$ and the invariance of $\bar\xi$, we obtain
$n^{-1}\log(X^2 \cdots X^{n+1})_{ij} \to \bar\xi$ a.s. and in $L^1$. The desired convergence now
follows by stationarity. □
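The statement that the limit does not depend on $(i, j)$ can be observed numerically. The sketch below is an illustration under assumptions that are not in the text: the matrices are i.i.d. (hence stationary) with entries uniform on $[1, 2]$, and the running product is renormalized at each step to avoid overflow. For entries confined to $[a, b]$, any two entries of a product of $n \ge 2$ factors differ by a factor of at most $(b/a)^2$, so the normalized logarithms differ by at most $2\log(b/a)/n$.

```python
import math
import random


def random_positive_matrix(rng, d, lo=1.0, hi=2.0):
    """A d x d matrix with independent entries uniform on [lo, hi]."""
    return [[rng.uniform(lo, hi) for _ in range(d)] for _ in range(d)]


def matmul(a, b):
    """Plain product of two square matrices given as nested lists."""
    d = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(d)) for j in range(d)]
        for i in range(d)
    ]


def normalized_log_entries(n, d=2, seed=0):
    """Return the array n^{-1} log (X^1 ... X^n)_{ij} for i.i.d. factors."""
    rng = random.Random(seed)
    prod = random_positive_matrix(rng, d)
    log_scale = 0.0  # accumulated logarithm of the factors divided out
    for _ in range(n - 1):
        prod = matmul(prod, random_positive_matrix(rng, d))
        s = max(max(row) for row in prod)
        prod = [[x / s for x in row] for row in prod]
        log_scale += math.log(s)
    return [[(math.log(x) + log_scale) / n for x in row] for row in prod]


n = 2000
entries = [x for row in normalized_log_entries(n) for x in row]
spread = max(entries) - min(entries)
```

Here `spread` should be at most $2\log 2 / n$, and each entry should lie between roughly $\log 2$ and $\log 4$, since each factor multiplies every entry of the running product by a number between $d \cdot 1 = 2$ and $d \cdot 2 = 4$.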
We turn to the decomposition of an invariant distribution into ergodic
components. For motivation, consider the setting of Theorem 10.6 or 10.14,
and assume that $S$ is Borel, to ensure the existence of regular conditional
distributions. Writing $\eta = P[\xi \in \cdot \mid \mathcal{I}_\xi]$, we get
$$\mathcal{L}(\xi) = E P[\xi \in \cdot \mid \mathcal{I}_\xi] = E\eta = \int m\, P\{\eta \in dm\}. \tag{14}$$
Furthermore,
$$\eta I = P[\xi \in I \mid \mathcal{I}_\xi] = 1\{\xi \in I\} \quad \text{a.s.}, \quad I \in \mathcal{I},$$
and so $\eta I = 0$ or $1$ a.s. for all $I \in \mathcal{I}$. If we can choose the exceptional
null set to be independent of $I$, it follows that $\eta$ is a.s. ergodic, and (14)
gives the desired ergodic decomposition of $\mu = \mathcal{L}(\xi)$. Though the suggested
result is indeed true, the proof requires a different approach.
Proposition 10.24 (ergodicity by conditioning, Farrell, Varadarajan)
Let $\xi$ be a random element with distribution $\mu$ in a Borel space $S$, and
let $T = (T_s;\ s \in \mathbb{R}^d)$ be a measurable group of $\mu$-preserving maps on $S$
with invariant $\sigma$-field $\mathcal{I}$. Then $\eta = P[\xi \in \cdot \mid \mathcal{I}_\xi]$ is a.s. invariant and ergodic
under $T$.
For the proof, we fix an increasing sequence of convex sets $B_n \in \mathcal{B}^d$ with
$r(B_n) \to \infty$ and introduce on $S$ the probability kernels
$$\mu_n(x, A) = |B_n|^{-1} \int_{B_n} 1_A(T_s x)\, ds, \quad x \in S,\ A \in \mathcal{S},$$
and the associated empirical distributions $\eta_n = \mu_n(\xi, \cdot)$. By Theorem 10.14
we note that $\eta_n f \to \eta f$ a.s. for every bounded, measurable function $f$ on
$S$, where $\eta = P[\xi \in \cdot \mid \mathcal{I}_\xi]$. We say that a class $\mathcal{C} \subset \mathcal{S}$ is measure-determining
if every probability measure on $S$ is uniquely determined by its values on
$\mathcal{C}$.
Lemma 10.25 (degenerate limit) Let $A_1, A_2, \ldots \in \mathcal{S}$ be measure-determining
and such that $\eta_n A_k \to P\{\xi \in A_k\}$ a.s. for each $k$. Then $\xi$ is
ergodic.
Proof: By Theorem 10.14 we have $\eta_n A \to \eta A \equiv P[\xi \in A \mid \mathcal{I}_\xi]$ a.s. for
every $A \in \mathcal{S}$, and so by comparison $\eta A_k = P\{\xi \in A_k\}$ a.s. for all $k$. Since
the $A_k$ are measure-determining, it follows that $\eta = \mathcal{L}(\xi)$ a.s. Hence, for
any $I \in \mathcal{I}$ we have a.s.
$$P\{\xi \in I\} = \eta I = P[\xi \in I \mid \mathcal{I}_\xi] = 1_I(\xi) \in \{0, 1\},$$
which implies $P\{\xi \in I\} = 0$ or 1. □
Proof of Proposition 10.24: By the stationarity of $\xi$, we have for any
$A \in \mathcal{S}$ and $s \in \mathbb{R}^d$
$$\eta \circ T_s^{-1} A = P[T_s\xi \in A \mid \mathcal{I}_\xi] = P[\xi \in A \mid \mathcal{I}_\xi] = \eta A \text{ a.s.}$$
Since $S$ is Borel, we obtain $\eta \circ T_s^{-1} = \eta$ a.s. for every $s$. Now put $C = [0,1]^d$,
and define $\bar\eta = \int_C (\eta \circ T_s^{-1})\, ds$. Since $\eta$ is a.s. invariant under shifts in $\mathbb{Z}^d$, the
variable $\bar\eta$ is a.s. invariant under arbitrary shifts. Furthermore, by Fubini's
theorem,
$$\lambda^d\{s \in [0,1]^d;\ \eta \circ T_s^{-1} = \eta\} = 1 \text{ a.s.},$$
and therefore $\bar\eta = \eta$ a.s. This shows that $\eta$ is a.s. $T$-invariant.
Let us now choose a measure-determining sequence $A_1, A_2, \ldots \in \mathcal{S}$, which
is possible since $S$ is Borel. Noting that $\eta_n A_k \to \eta A_k$ a.s. for every $k$ by
Theorem 10.14, we get by Theorem 6.4
$$\eta \bigcap_k \{x \in S;\ \mu_n(x, A_k) \to \eta A_k\} = P\Big[\bigcap_k \{\eta_n A_k \to \eta A_k\} \,\Big|\, \mathcal{I}_\xi\Big] = 1 \text{ a.s.}$$
Since $\eta$ is a.s. a $T$-invariant probability measure on $S$, Lemma 10.25 applies
for every $\omega \in \Omega$ outside a $P$-null set, and we conclude that $\eta$ is a.s.
ergodic. □
We have seen that (14) gives a representation of the distribution $\mu = \mathcal{L}(\xi)$
as a mixture of invariant and ergodic probability measures. The next
result shows that this decomposition is unique and characterizes the ergodic
measures as extreme points in the convex set of invariant measures.
To explain the terminology, recall that a subset $M$ of a linear space is
said to be convex if $c m_1 + (1-c) m_2 \in M$ for all $m_1, m_2 \in M$ and $c \in (0,1)$.
In that case, we say that $m \in M$ is extreme if for any $m_1$, $m_2$, and $c$ as
above, the relation $m = c m_1 + (1-c) m_2$ implies $m_1 = m_2 = m$. With any
set of measures $\mu$ on a measurable space $(S, \mathcal{S})$, we associate the $\sigma$-field
generated by all evaluation maps $\pi_B : \mu \mapsto \mu B$, $B \in \mathcal{S}$.
Theorem 10.26 (ergodic decomposition, Krylov and Bogolioubov) Let
$T = (T_s;\ s \in \mathbb{R}^d)$ be a measurable group of transformations on some Borel
space $S$. Then the $T$-invariant distributions on $S$ form a convex set $M$,
whose extreme points agree with the ergodic measures in $M$. Moreover,
any measure $\mu \in M$ has a unique representation $\mu = \int m\, \nu(dm)$ with $\nu$
restricted to the set of ergodic measures in $M$.
Proof: The set $M$ is clearly convex, and by Proposition 10.24 we have
for every $\mu \in M$ a representation $\mu = \int m\, \nu(dm)$, where $\nu$ is a probability
measure on the set of ergodic measures in $M$. To see that $\nu$ is unique, we
introduce a regular conditional distribution $\eta = \mu[\,\cdot \mid \mathcal{I}]$ a.s. $\mu$ on $S$, and
note that $\mu_n A \to \eta A$ a.s. $\mu$ for all $A \in \mathcal{S}$ by Theorem 10.14. Thus, for any
$A_1, A_2, \ldots \in \mathcal{S}$, we have
$$m \bigcap_k \{x \in S;\ \mu_n(x, A_k) \to \eta(x, A_k)\} = 1 \text{ a.e. } \nu.$$
The same relation holds with $\eta(x, A_k)$ replaced by $mA_k$, since $\nu$ is restricted
to the class of ergodic measures in $M$. Assuming the sets $A_k$ to be
measure-determining, we conclude that $m\{x;\ \eta(x, \cdot) = m\} = 1$ a.e. $\nu$. Hence, for any
measurable set $A \subset M$,
$$\mu\{\eta \in A\} = \int m\{\eta \in A\}\, \nu(dm) = \int 1_A(m)\, \nu(dm) = \nu A,$$
which shows that $\nu = \mu \circ \eta^{-1}$.
To prove the equivalence of ergodicity and extremality, fix any measure
$\mu \in M$ with ergodic decomposition $\int m\, \nu(dm)$. Let us first assume that
$\mu$ is extreme. If it is not ergodic, then $\nu$ is nondegenerate, and we have
$\nu = c\nu_1 + (1-c)\nu_2$ for some $\nu_1 \perp \nu_2$ and $c \in (0,1)$. Since $\mu$ is extreme, we
obtain $\int m\, \nu_1(dm) = \int m\, \nu_2(dm)$, and so $\nu_1 = \nu_2$ by the uniqueness of the
decomposition. The contradiction shows that $\mu$ is ergodic.
Next assume $\mu$ to be ergodic, so that $\nu = \delta_\mu$, and let $\mu = c\mu_1 + (1-c)\mu_2$
with $\mu_1, \mu_2 \in M$ and $c \in (0,1)$. If $\mu_i = \int m\, \nu_i(dm)$ for $i = 1, 2$, then
$\delta_\mu = c\nu_1 + (1-c)\nu_2$ by the uniqueness of the decomposition. Hence, $\nu_1 = \nu_2 = \delta_\mu$,
and so $\mu_1 = \mu_2$, which shows that $\mu$ is extreme. □
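To make the decomposition concrete, here is a toy analog that is not taken from the text: a finite state space with a single permutation (so the group is $\mathbb{Z}$ rather than $\mathbb{R}^d$). In this setting the invariant measures are those constant on each cycle, the ergodic ones are the uniform distributions on single cycles, and the mixing measure $\nu$ puts mass $\mu(\text{cycle})$ on each of them. The helper names `cycles`, `is_invariant`, and `ergodic_decomposition` are illustrative.

```python
from fractions import Fraction

def cycles(perm):
    """Split {0, ..., n-1} into the cycles (orbits) of the permutation perm."""
    seen, out = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cyc, x = [], start
        while x not in seen:
            seen.add(x)
            cyc.append(x)
            x = perm[x]
        out.append(cyc)
    return out

def is_invariant(mu, perm):
    """mu is invariant under the permutation iff mu{x} = mu{perm[x]} for all x."""
    return all(mu[perm[x]] == mu[x] for x in range(len(perm)))

def ergodic_decomposition(mu, perm):
    """Return pairs (weight, m) with mu = sum of weight * m, each m ergodic."""
    if not is_invariant(mu, perm):
        raise ValueError("mu is not invariant under perm")
    parts = []
    for cyc in cycles(perm):
        weight = sum(mu[x] for x in cyc)     # nu{m} = mu(cycle)
        if weight == 0:
            continue
        m = [Fraction(0)] * len(mu)
        for x in cyc:
            m[x] = Fraction(1, len(cyc))     # uniform distribution on the cycle
        parts.append((weight, m))
    return parts

perm = [1, 2, 0, 4, 3, 5]                    # cycles (0 1 2), (3 4), (5)
mu = [Fraction(1, 6)] * 3 + [Fraction(1, 8)] * 2 + [Fraction(1, 4)]
parts = ergodic_decomposition(mu, perm)
mixture = [sum(w * m[x] for w, m in parts) for x in range(len(mu))]
```

Here `mixture` reproduces `mu`, and the weights are forced by the values of `mu` on the cycles, mirroring the uniqueness assertion.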
We conclude the chapter with some powerful coupling results that will
be needed for our discussion of Palm distributions in Chapter 11 and also
for the ergodic theory of Markov processes developed in Chapter 20. First
we consider pairs of measurable processes on $\mathbb{R}_+$ with values in an arbitrary
measurable space $S$. In the associated path space, we introduce the
invariant $\sigma$-field $\mathcal{I}$ and the tail $\sigma$-field $\mathcal{T} = \bigcap_t \mathcal{T}_t$, where $\mathcal{T}_t = \sigma(\theta_t)$, and
we note that $\mathcal{I} \subset \mathcal{T}$. For any signed measure $\nu$, let $\|\nu\|_{\mathcal{A}}$ denote the total
variation of $\nu$ on the $\sigma$-field $\mathcal{A}$.
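Theorem 10.27 (coupling on $\mathbb{R}_+$, Goldstein, Berbee, Aldous and Thorisson)
For any $S$-valued, measurable processes $X$ and $Y$ on $\mathbb{R}_+$, we
have
(i) $X \overset{d}{=} Y$ on $\mathcal{T}$ iff $(\sigma, \theta_\sigma X) \overset{d}{=} (\tau, \theta_\tau Y)$ for some random times $\sigma, \tau \ge 0$,
and also iff $\|\mathcal{L}(\theta_t X) - \mathcal{L}(\theta_t Y)\| \to 0$ as $t \to \infty$;
(ii) $X \overset{d}{=} Y$ on $\mathcal{I}$ iff $\theta_\sigma X \overset{d}{=} \theta_\tau Y$ for some random times $\sigma, \tau \ge 0$, and
also iff $\|\int_0^1 (\mathcal{L}(\theta_{st} X) - \mathcal{L}(\theta_{st} Y))\, ds\| \to 0$ as $t \to \infty$.
If the path space is Borel, we can strengthen the distributional couplings in
(i) and (ii) to the a.s. versions $\theta_\tau X = \theta_\tau \tilde Y$ and $\theta_\sigma X = \theta_\tau \tilde Y$, respectively,
for some $\tilde Y \overset{d}{=} Y$.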
Proof of (i): Let $\mu_1$ and $\mu_2$ be the distributions of $X$ and $Y$, and assume
that $\mu_1 = \mu_2$ on $\mathcal{T}$. Write $U = S^{\mathbb{R}_+}$, and define a mapping $p$ on $\mathbb{R}_+ \times U$ by
$p(s, x) = (s, \theta_s x)$. Let $\mathcal{C}$ denote the class of all pairs $(\nu_1, \nu_2)$ of measures on
$\mathbb{R}_+ \times U$ such that
$$\nu_1 \circ p^{-1} = \nu_2 \circ p^{-1}, \quad \bar\nu_1 \le \mu_1, \quad \bar\nu_2 \le \mu_2, \tag{15}$$
where $\bar\nu_i = \nu_i(\mathbb{R}_+ \times \cdot)$, and regard $\mathcal{C}$ as partially ordered under componentwise
inequality. By Corollary 1.16 we note that every linearly ordered
subset has an upper bound in $\mathcal{C}$. Hence, Zorn's lemma ensures the existence
of a maximal element $(\nu_1, \nu_2)$.
To see that $\bar\nu_1 = \mu_1$ and $\bar\nu_2 = \mu_2$, we define $\mu_i' = \mu_i - \bar\nu_i$, and conclude
from the equality in (15) that
$$\|\mu_1' - \mu_2'\|_{\mathcal{T}} = \|\bar\nu_1 - \bar\nu_2\|_{\mathcal{T}} \le \|\bar\nu_1 - \bar\nu_2\|_{\mathcal{T}_n} \le 2\nu_1((n,\infty) \times U) \to 0, \tag{16}$$
which implies $\mu_1' = \mu_2'$ on $\mathcal{T}$. Next, by Corollary 2.13, there exist some
measures $\mu_i^n \le \mu_i'$ satisfying
$$\mu_1^n = \mu_2^n = \mu_1' \wedge \mu_2' \text{ on } \mathcal{T}_n, \quad n \in \mathbb{N}.$$
Writing $\nu_i^n = \delta_n \otimes \mu_i^n$, we get $\bar\nu_i^n \le \mu_i'$ and $\nu_1^n \circ p^{-1} = \nu_2^n \circ p^{-1}$, and so
$(\nu_1 + \nu_1^n, \nu_2 + \nu_2^n) \in \mathcal{C}$. Since $(\nu_1, \nu_2)$ is maximal, we obtain $\nu_1^n = \nu_2^n = 0$,
and so by Corollary 2.9 we have $\mu_1' \perp \mu_2'$ on $\mathcal{T}_n$ for all $n$. In other words,
$\mu_1' A_n = \mu_2' A_n^c = 0$ for some sets $A_n \in \mathcal{T}_n$. But then also $\mu_1' A = \mu_2' A^c = 0$,
where $A = \limsup_n A_n \in \mathcal{T}$. Since the $\mu_i'$ agree on $\mathcal{T}$, we obtain $\mu_1' =
\mu_2' = 0$, which means that $\bar\nu_i = \mu_i$. Hence, by Theorem 6.10 there exist
some random variables $\sigma, \tau \ge 0$ such that the pairs $(\sigma, X)$ and $(\tau, Y)$ have
distributions $\nu_1$ and $\nu_2$, and the desired coupling follows from the equality
in (15).
The remaining claims are easy. Thus, the relation $(\sigma, \theta_\sigma X) \overset{d}{=} (\tau, \theta_\tau Y)$
implies $\|\mu_1 - \mu_2\|_{\mathcal{T}_n} \to 0$ as in (16), and the latter condition yields $\mu_1 = \mu_2$
on $\mathcal{T}$. When the path space is Borel, the asserted a.s. coupling follows from
the distributional version by Theorem 6.10. □
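The total variation criterion in Theorem 10.27 (i) is easy to watch in a discrete-time analog. The sketch below rests on assumptions outside the text: time is $\mathbb{Z}_+$ instead of $\mathbb{R}_+$, and $X$, $Y$ are Markov chains with a common, hypothetical transition matrix `P` but different initial laws. For such chains the total variation distance between the laws of the shifted paths equals that between the time-$n$ marginals, so it suffices to compare $\mu P^n$ and $\nu P^n$. The function name `tv_after_shift` is illustrative.

```python
import numpy as np

def tv_after_shift(P, mu, nu, n):
    """Total variation norm ||mu P^n - nu P^n|| (sum of absolute differences)."""
    Pn = np.linalg.matrix_power(P, n)
    return np.abs(mu @ Pn - nu @ Pn).sum()

# Hypothetical two-state chain; its second eigenvalue is 0.7.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
mu = np.array([1.0, 0.0])   # X starts in state 0
nu = np.array([0.0, 1.0])   # Y starts in state 1

tvs = [tv_after_shift(P, mu, nu, n) for n in range(0, 50, 10)]
```

For this particular matrix the distance equals $2 \cdot 0.7^n$, so it decays geometrically to 0, which corresponds to agreement of the two laws on the tail $\sigma$-field.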
To avoid repetitions, we postpone the proof of part (ii) until after the
proof of the next theorem, where we consider a closely related result in-
volving groups G of transformations on an arbitrary measurable space
$(S, \mathcal{S})$.
Theorem 10.28 (group coupling, Thorisson) Let the lcscH group $G$ act
measurably on a space $S$, and let $\xi$ and $\eta$ be random elements in $S$ such
that $\xi \overset{d}{=} \eta$ on the $G$-invariant $\sigma$-field $\mathcal{I}$. Then $\gamma\xi \overset{d}{=} \eta$ for some random
element $\gamma$ in $G$.
Proof: Let $\mu_1$ and $\mu_2$ be the distributions of $\xi$ and $\eta$. Define $p: G \times S \to S$
by $p(g, s) = gs$ and let $\mathcal{C}$ denote the class of pairs $(\nu_1, \nu_2)$ of measures on
$G \times S$ satisfying (15) with $\bar\nu_i = \nu_i(G \times \cdot)$. Using Zorn's lemma as before,
we see that $\mathcal{C}$ has a maximal element $(\nu_1, \nu_2)$, and we claim that $\bar\nu_i = \mu_i$
for $i = 1, 2$.
To see this, let $\lambda$ be a right-invariant Haar measure on $G$, which exists
by Theorem 2.27. Since $\lambda$ is $\sigma$-finite, we may choose a probability measure
$\tilde\lambda \sim \lambda$ and define
$$\mu_i' = \mu_i - \bar\nu_i, \quad \chi_i = \tilde\lambda \otimes \mu_i', \quad i = 1, 2.$$
By Corollary 2.13 there exist some measures $\nu_i' \le \chi_i$ satisfying
$$\nu_1' \circ p^{-1} = \nu_2' \circ p^{-1} = \chi_1 \circ p^{-1} \wedge \chi_2 \circ p^{-1}.$$
Then $\bar\nu_i' \le \mu_i'$ for $i = 1, 2$, and so $(\nu_1 + \nu_1', \nu_2 + \nu_2') \in \mathcal{C}$. Since $(\nu_1, \nu_2)$ is
maximal, we have $\nu_1' = \nu_2' = 0$, and so $\chi_1 \circ p^{-1} \perp \chi_2 \circ p^{-1}$ by Corollary 2.9.
In other words, there exists a set $A_1 = A_2^c \in \mathcal{S}$ such that $\chi_i \circ p^{-1} A_i = 0$
for $i = 1, 2$. Since $\lambda \ll \tilde\lambda$, Fubini's theorem gives
$$\int_S \mu_i'(ds) \int_G 1_{A_i}(gs)\, \lambda(dg) = (\lambda \otimes \mu_i') \circ p^{-1} A_i = 0. \tag{17}$$
By the right invariance of $\lambda$, the inner integral on the left is $G$-invariant and
therefore $\mathcal{I}$-measurable in $s \in S$. Since also $\mu_1' = \mu_2'$ on $\mathcal{I}$ by (15), equation
(17) remains true with $A_i$ replaced by $A_i^c$. Adding the two formulas gives
$\lambda \otimes \mu_i' = 0$, and so $\mu_i' = 0$. Thus, $\bar\nu_i = \mu_i$ for $i = 1, 2$.
Since $G$ is Borel, there exist by Theorem 6.10 some random elements $\sigma$
and $\tau$ in $G$ such that $(\sigma, \xi)$ and $(\tau, \eta)$ have distributions $\nu_1$ and $\nu_2$. By (15)
we get $\sigma\xi \overset{d}{=} \tau\eta$, and so the same theorem yields a random element $\tilde\tau$ in
$G$ such that $(\tilde\tau, \sigma\xi) \overset{d}{=} (\tau, \tau\eta)$. But then $\tilde\tau^{-1}\sigma\xi \overset{d}{=} \tau^{-1}\tau\eta = \eta$, which proves
the desired relation with $\gamma = \tilde\tau^{-1}\sigma$. □
Proof of Theorem 10.27 (ii): In the last proof, we replace $S$ by the path
space $U = S^{\mathbb{R}_+}$, $G$ by the semigroup of shifts $\theta_t$, $t \ge 0$, and $\lambda$ by Lebesgue
measure on $\mathbb{R}_+$. Assuming $X \overset{d}{=} Y$ on $\mathcal{I}$, we may proceed as before up to
equation (17), which now takes the form
$$\int_U \mu_i'(dx) \int_0^\infty 1_{A_i}(\theta_t x)\, dt = (\lambda \otimes \mu_i') \circ p^{-1} A_i = 0. \tag{18}$$
Writing $f_i(x)$ for the inner integral on the left, we note that for any $h > 0$
$$f_i(\theta_h x) = \int_h^\infty 1_{A_i}(\theta_t x)\, dt = \int_0^\infty 1_{\theta_h^{-1} A_i}(\theta_t x)\, dt. \tag{19}$$
Hence, (18) remains true with $A_i$ replaced by $\theta_h^{-1} A_i$, and then also for the
$\theta_1$-invariant sets
$$\tilde A_1 = \limsup_{n \to \infty} \theta_n^{-1} A_1, \quad \tilde A_2 = \liminf_{n \to \infty} \theta_n^{-1} A_2,$$
where $n \to \infty$ along $\mathbb{N}$. Since $\tilde A_1 = \tilde A_2^c$, we may henceforth assume the
$A_i$ in (18) to be $\theta_1$-invariant. Then so are the functions $f_i$ in view of (19).
By the monotonicity of $f_i \circ \theta_h$, the $f_i$ are then $\theta_h$-invariant for all $h > 0$
and therefore $\mathcal{I}$-measurable. From this point on, we may argue as before to
show that $\theta_\sigma X \overset{d}{=} \theta_\tau Y$ for some random variables $\sigma, \tau \ge 0$. The remaining
assertions are again routine. □
Exercises
1. State and prove continuous-time, two-sided, and higher-dimensional
versions of Lemma 10.1.
2. Consider a stationary random sequence $\xi = (\xi_1, \xi_2, \ldots)$. Show that the
$\xi_n$ are i.i.d. iff $\xi_1 \perp\!\!\!\perp (\xi_2, \xi_3, \ldots)$.
3. Fix a Borel space $S$, and let $X$ be a stationary array of $S$-valued random
elements, indexed by $\mathbb{N}^d$. Show that there exists a stationary array $Y$
indexed by $\mathbb{Z}^d$ such that $X = Y$ a.s. on $\mathbb{N}^d$.
4. Let $X$ be a stationary process on $\mathbb{R}_+$ with values in some Borel space
$S$. Show that there exists a stationary process $Y$ on $\mathbb{R}$ with $X \overset{d}{=} Y$ on $\mathbb{R}_+$.
Strengthen this to a.s. equality when $S$ is a complete metric space and $X$
is right-continuous.
5. Consider a two-sided, stationary random sequence $\xi$ with restriction
$\eta$ to $\mathbb{N}$. Show that $\xi$ and $\eta$ are simultaneously ergodic. (Hint: For any
measurable, invariant set $I \in \mathcal{S}^{\mathbb{Z}}$, there exists some measurable, invariant
set $I' \in \mathcal{S}^{\mathbb{N}}$ with $I = S^{\mathbb{Z}_-} \times I'$ a.s. $\mathcal{L}(\xi)$.)
6. Establish two-sided and higher-dimensional versions of Lemmas 10.4 and
10.5 as well as of Theorem 10.9.
7. A measure-preserving transformation $T$ on some probability space
$(S, \mathcal{S}, \mu)$ is said to be mixing if $\mu(A \cap T^{-n}B) \to \mu A \cdot \mu B$ for all $A, B \in \mathcal{S}$.
Prove the counterpart of Lemma 10.5 for mixing. Also, show that any mixing
transformation is ergodic. (Hint: For the latter assertion, take $A = B$
to be invariant.)
8. Show that it is enough to verify the mixing property for sets in a gen-
erating 7r-system. Use this fact to prove that any i.i.d. sequence is mixing
under shifts.
9. Fix any $a \in \mathbb{R}$, and define $Ts = s + a \pmod 1$ on $[0,1]$. Show that $T$ fails
to be mixing but is ergodic iff $a \notin \mathbb{Q}$. (Hint: To prove the ergodicity, let
$I \subset [0,1]$ be $T$-invariant. Then so is the measure $1_I \cdot \lambda$, and since the points
$ka$ are dense in $[0,1]$, it follows that $1_I \cdot \lambda$ is invariant. Now use Theorem
2.6.)
10. (Bohl, Sierpinski, Weyl) For any $a \notin \mathbb{Q}$, let $\mu_n = n^{-1} \sum_{k \le n} \delta_{ka}$, where
$ka$ is defined modulo 1 as a number in $[0,1]$. Show that $\mu_n \overset{w}{\to} \lambda$. (Hint:
Apply Theorem 10.6 to the mapping of the previous exercise.)
11. Prove that the transformation Ts == 2s (mod 1) on [0,1] is mixing.
Also show how the mapping of Lemma 3.20 can be generated as in Lemma
10.1 by means of T.
12. Note that Theorem 10.6 remains true for invertible shifts $T$, with averages
taken over increasing index sets $[a_n, b_n]$ with $b_n - a_n \to \infty$. Show by
an example that the a.s. convergence may fail without the assumption of
monotonicity. (Hint: Consider an i.i.d. sequence $(\xi_n)$ and disjoint intervals
$[a_n, b_n]$, and use the Borel-Cantelli lemma.)
13. Consider a one- or two-sided stationary random sequence $(\xi_n)$ in some
measurable space $(S, \mathcal{S})$, and fix any $B \in \mathcal{S}$. Show that a.s. either $\xi_n \in B^c$
for all $n$ or $\xi_n \in B$ i.o. (Hint: Use Theorem 10.6.)
14. (von Neumann) Give a direct proof of the $L^2$-version of Theorem 10.6.
(Hint: Define a unitary operator $U$ on $L^2(S)$ by $Uf = f \circ T$. Let $M$ denote
the $U$-invariant subspace of $L^2$ and put $A = I - U$. Check that $M^\perp = \bar R_A$,
the closed range of $A$. By Theorem 1.33 it is enough to take $f \in M$ or
$f \in R_A$.) Deduce the general $L^p$-version, and extend the argument to
higher dimensions.
15. In the context of Theorem 10.26, show that the ergodic measures form
a measurable subset of M. (Hint: Use Lemma 1.41, Proposition 4.31, and
Theorem 10.14.)
16. Prove a continuous-time version of Theorem 10.26.
17. Deduce Theorem 4.23 for $p < 1$ from Theorem 10.22. (Hint: Take
$X_{m,n} = |S_n - S_m|^p$, and note that $E|S_n|^p = o(n)$ when $p < 1$.)
18. Let $\xi = (\xi_1, \xi_2, \ldots)$ be a stationary sequence of random variables, fix
any $B \in \mathcal{B}(\mathbb{R}^d)$, and let $\kappa_n$ be the number of indices $k \in \{1, \ldots, n-d\}$
with $(\xi_{k+1}, \ldots, \xi_{k+d}) \in B$. Prove from Theorem 10.22 that $\kappa_n/n$ converges
a.s. Deduce the same result from Theorem 10.6, by considering suitable
subsequences.
19. Show that the inequality in Lemma 10.7 can be strengthened to
$E[\xi_1;\ \sup_n (S_n/n) \ge 0] \ge 0$. (Hint: Apply the original result to the variables
$\xi_k + \varepsilon$, and let $\varepsilon \to 0$.)
20. Extend Proposition 10.10 to stationary processes on $\mathbb{Z}^d$.
21. Extend Theorem 10.14 to averages over arbitrary rectangles
$A_n = [0, a_{n1}] \times \cdots \times [0, a_{nd}]$ such that $a_{nj} \to \infty$ and $\sup_n (a_{ni}/a_{nj}) < \infty$ for all
$i \ne j$. (Hint: Note that Lemma 10.17 extends to this case.)
22. Derive a version of Theorem 10.14 for stationary processes $X$ on $\mathbb{Z}^d$.
(Hint: By a suitable randomization, construct an associated stationary process
$\tilde X$ on $\mathbb{R}^d$, apply Theorem 10.14 to $\tilde X$, and estimate the error term as
in Corollary 10.19.)
23. Give an example of a stationary, simple point process $\xi$ on $\mathbb{R}^d$ with a.s.
infinite sample intensity $\bar\xi$.
24. Give an example of two processes $X$ and $Y$ on $\mathbb{R}_+$ such that $X \overset{d}{=} Y$
on $\mathcal{I}$ but not on $\mathcal{T}$.
25. Derive a version of Theorem 10.27 for processes on $\mathbb{Z}_+$. Also prove
versions for processes on $\mathbb{R}^d_+$ and $\mathbb{Z}^d_+$.
26. Show that Theorem 10.27 (ii) implies a corresponding result for processes
on $\mathbb{R}$. (Hint: Apply Theorem 10.27 to the processes $\tilde X_t = \theta_t X$ and
$\tilde Y_t = \theta_t Y$.) Also show how the two-sided statement follows from Theorem
10.28.
27. For processes $X$ on $\mathbb{R}_+$, define $\tilde X_t = (X_t, t)$ and let $\tilde{\mathcal{I}}$ be the associated
invariant $\sigma$-field. Assuming $X$ and $Y$ to be measurable, show that $X \overset{d}{=} Y$
on $\mathcal{T}$ iff $\tilde X \overset{d}{=} \tilde Y$ on $\tilde{\mathcal{I}}$. (Hint: Use Theorem 10.27.)
28. Prove Lemma 10.15 (ii). (Hint (Day): First show that if $S_r \subset B$, then
$B + S_\varepsilon \subset (1 + \varepsilon/r)B$, where $S_r$ denotes an $r$-ball around 0.)
Chapter 11
Special Notions of
Symmetry and Invariance
Palm distributions and inversion formulas; stationarity and
cycle stationarity; local hitting and conditioning; ergodic prop-
erties of Palm measures; exchangeable sequences and processes;
strong stationarity and predictable sampling; ballot theorems;
entropy and information
This chapter is devoted to some loosely connected topics that are all related
to our previous treatment of stationary processes and ergodic theory. We
begin with a discussion of Palm distributions of stationary random measures
and point processes. In the simplest setting, when $\xi$ is a stationary,
simple point process on $\mathbb{R}^d$, we may think of the associated Palm distribution
$Q_\xi$ as the conditional distribution, given that $\xi$ has a point at 0.
A formal definition is possible when $\xi$ has finite and positive intensity, in
which case the mentioned interpretation may be justified by a limit theo-
rem. In the ergodic case, the distributions of the original process and its
Palm version agree up to a random shift, which leads to some useful ergodic
and averaging relations. Finally, the theory of Palm distributions provides
a striking relationship between the notions of stationarity under discrete
and continuous shifts.
Asymptotically invariant sampling from a stationary sequence or process
leads in the limit to an exchangeable sequence. This is the key observa-
tion behind de Finetti's theorem, the fact that exchangeable sequences are
mixed i.i.d. It also implies the further equivalence with the notion of spread-
ability or subsequence invariance, which in turn is equivalent to strong
stationarity or invariance in distribution under optional shifts. In the other
direction, we consider the striking and useful predictable sampling theo-
rem, the fact that an exchangeable distribution remains invariant under
predictable permutations. The latter result will be used in Chapters 13-15
to give simple proofs of the various versions of the arcsine laws.
The chapter concludes with a general so-called ballot theorem for sta-
tionary, singular random measures and with a version of the fundamental
ergodic theorem of information theory. The former result leads, whenever
it applies, to some very precise maximum inequalities, related to those of
the preceding chapter and with important applications to queuing theory
and other areas. The latter result relates ergodic theory to the notion of
entropy, of such basic importance in statistical mechanics.
The material in this chapter is related in many ways to other parts of the
book. In particular, we may point out some links to various applications
and extensions, in Chapters 12, 13, and 16, of results for exchangeable
sequences and processes. Furthermore, the predictable sampling theorem
is related to some results on random time change appearing in Chapters
18 and 25.
A random measure $\xi$ on $\mathbb{R}^d$ is defined as a locally finite kernel from the
basic probability space to $\mathbb{R}^d$. It is called a point process if $\xi B$ is integer-valued
for every bounded Borel set $B$. In the latter case, $\xi$ is said to be
simple if $\xi\{s\} \le 1$ for all $s \in \mathbb{R}^d$ outside a fixed $P$-null set. A more detailed
discussion of random measures is given in Chapter 12. We begin the present
treatment with a basic general property.
Lemma 11.1 (zero-infinity law) If $\xi$ is a stationary random measure on
$\mathbb{R}$ or $\mathbb{Z}$, then $\xi[0,\infty) = \infty$ a.s. on $\{\xi \ne 0\}$.
Proof: We first consider the case of random measures on $\mathbb{R}$. By the
stationarity of $\xi$ and Fatou's lemma, we have for any $t \in \mathbb{R}$ and $h, \varepsilon > 0$
$$P\{\xi[t, t+h) > \varepsilon\} = \limsup_n P\{\xi[(n-1)h, nh) > \varepsilon\} \le P\{\xi[(n-1)h, nh) > \varepsilon \text{ i.o.}\} \le P\{\xi[0,\infty) = \infty\}.$$
Letting $\varepsilon \to 0$, $h \to \infty$, and $t \to -\infty$ in this order, we get $P\{\xi \ne 0\} \le
P\{\xi[0,\infty) = \infty\}$. Since trivially $\xi[0,\infty) = \infty$ implies $\xi \ne 0$, we obtain
$$P\{\xi[0,\infty) < \infty,\ \xi \ne 0\} = P\{\xi \ne 0\} - P\{\xi[0,\infty) = \infty\} \le 0,$$
and the assertion follows. The result for random measures on $\mathbb{Z}$ may be
proved by the same argument with $t$ and $h$ restricted to $\mathbb{Z}$. □
Now consider on $\mathbb{R}^d$ a random measure $\xi$ and a measurable random
process $X$, taking values in an arbitrary measurable space $S$. We say that
$\xi$ and $X$ are jointly stationary if $\theta_t(X, \xi) = (\theta_t X, \theta_t \xi) \overset{d}{=} (X, \xi)$ for every
$t \in \mathbb{R}^d$. By Theorem 2.6 and the stationarity of $\xi$, we have $E\xi = c\lambda^d$ for
some constant $c \in [0,\infty]$, called the intensity of $\xi$, and we note that $c = E\bar\xi$,
where $\bar\xi$ is the sample intensity in Corollary 10.19.
If $X$ and $\xi$ are jointly stationary and $\xi$ has finite and positive intensity,
we define the Palm distribution $Q_{X,\xi}$ of $(X, \xi)$ with respect to $\xi$ by the
formula
$$Q_{X,\xi} f = E \int_B f(\theta_s(X, \xi))\, \xi(ds) \Big/ E\xi B, \tag{1}$$
for any set $B \in \mathcal{B}^d$ with $\lambda^d B \in (0,\infty)$ and for measurable functions
$f \ge 0$ on $S^{\mathbb{R}^d} \times \mathcal{M}(\mathbb{R}^d)$. The following result shows that the definition
is independent of the choice of $B$.
Lemma 11.2 (coding) Consider a stationary pair $(X, \xi)$ on $\mathbb{R}^d$, where $X$
is a measurable process in $S$ and $\xi$ is a random measure. Then for any
measurable function $f \ge 0$, the stationarity carries over to the random
measure
$$\xi_f B = \int_B f(\theta_s(X, \xi))\, \xi(ds), \quad B \in \mathcal{B}^d.$$
Proof: For any $t \in \mathbb{R}^d$ and $B \in \mathcal{B}^d$, a simple computation gives
$$\begin{aligned}
(\theta_t \xi_f)B = \xi_f(B + t) &= \int_{B+t} f(\theta_s(X, \xi))\, \xi(ds) \\
&= \int 1_B(s - t)\, f(\theta_s(X, \xi))\, \xi(ds) \\
&= \int 1_B(u)\, f(\theta_{u+t}(X, \xi))\, \xi(du + t) \\
&= \int_B f(\theta_u \theta_t(X, \xi))\, (\theta_t \xi)(du).
\end{aligned}$$
Writing $\xi_f = F(X, \xi)$ and using the stationarity of $(X, \xi)$, we obtain
$$\theta_t \xi_f = F(\theta_t(X, \xi)) \overset{d}{=} F(X, \xi) = \xi_f, \quad t \in \mathbb{R}^d.$$
□
The mapping in (1) is essentially a one-to-one correspondence, and we
proceed to derive some useful inversion formulas. To state the latter, it is
suggestive to introduce a random pair $(Y, \eta)$ with distribution $Q_{X,\xi}$, where
in view of (1) the process $Y$ can again be chosen to be measurable. When
$\xi$ is a simple point process, then so is $\eta$, and we note that $\eta\{0\} = 1$ a.s.
The result may then be stated in terms of the associated Voronoi cells
$$V_\mu = \{s \in \mathbb{R}^d;\ \mu(S_{|s|} + s) = 0\}, \quad \mu \in \mathcal{N}(\mathbb{R}^d),$$
where $\mathcal{N}(\mathbb{R}^d)$ is the class of locally finite measures on $\mathbb{R}^d$ and $S_r$ denotes the
open ball of radius $r$ around the origin. If also $d = 1$, we may enumerate the
supporting points of $\mu$ in increasing order as $t_n(\mu)$, subject to the convention
$t_0(\mu) \le 0 < t_1(\mu)$. To simplify our statements, we often omit the obvious
requirement that the space $S$ and the functions $f$ and $g$ be measurable.
Proposition 11.3 (uniqueness and inversion) Consider a stationary pair
$(X, \xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random
measure with $E\bar\xi \in (0,\infty)$. Then $P[(X, \xi) \in \cdot \mid \xi \ne 0]$ is uniquely determined
by $\mathcal{L}(Y, \eta) = Q_{X,\xi}$, and the following inversion formulas hold:
(i) For any $f \ge 0$ and $g > 0$ with $\lambda^d g < \infty$,
$$E[f(X, \xi);\ \xi \ne 0] = E\bar\xi \cdot E \int \frac{f(\theta_s(Y, \eta))}{(\theta_s \eta) g}\, g(-s)\, ds.$$
(ii) If $\xi$ is a simple point process, we have for any $f \ge 0$
$$E[f(X, \xi);\ \xi \ne 0] = E\bar\xi \cdot E \int_{V_\eta} f(\theta_s(Y, \eta))\, ds.$$
(iii) If $\xi$ is a simple point process and $d = 1$, we have for any $f \ge 0$
$$E[f(X, \xi);\ \xi \ne 0] = E\bar\xi \cdot E \int_0^{t_1(\eta)} f(\theta_s(Y, \eta))\, ds.$$
To express the conditional distribution $P[(X, \xi) \in \cdot \mid \xi \ne 0]$ in terms of
$\mathcal{L}(Y, \eta)$, it suffices in each case to divide by the corresponding formula for
$f \equiv 1$. The latter equation also expresses $P\{\xi \ne 0\}/E\bar\xi$ in terms of $\mathcal{L}(\eta)$.
In particular, this ratio equals $E|V_\eta|$ in case (ii) and $E t_1(\eta)$ in case (iii).
Proof: (i) Write (1) in the form
$$E\bar\xi \cdot \lambda^d B \cdot Ef(Y, \eta) = E \int_B f(\theta_s(X, \xi))\, \xi(ds), \quad B \in \mathcal{B}^d,$$
and extend by a monotone class argument to
$$E\bar\xi \cdot E \int h(Y, \eta, s)\, ds = E \int h(\theta_s(X, \xi), s)\, \xi(ds),$$
for any measurable function $h \ge 0$ on the appropriate product space. Applying
the latter formula to the function $h(x, \mu, s) = f(\theta_{-s}(x, \mu), s)$ for
measurable $f \ge 0$ and substituting $-s$ for $s$, we get
$$E\bar\xi \cdot E \int f(\theta_s(Y, \eta), -s)\, ds = E \int f(X, \xi, s)\, \xi(ds). \tag{2}$$
In particular, we have for measurable $g, h \ge 0$
$$E\bar\xi \cdot E \int h(\theta_s(Y, \eta))\, g(-s)\, ds = E\, h(X, \xi)\, \xi g.$$
If $g > 0$ with $\lambda^d g < \infty$, then $\xi g < \infty$ a.s., and the desired relation follows
by the further substitution
$$h(x, \mu) = f(x, \mu)\, \frac{1\{\mu g > 0\}}{\mu g}.$$
(ii) Here we may apply (2) to the function
$$h(x, \mu, s) = f(x, \mu)\, 1\{\mu\{s\} = 1,\ \mu S_{|s|} = 0\},$$
and note that $(\theta_s \eta) S_{|-s|} = 0$ iff $s \in V_\eta$.
(iii) In this case, we apply (2) to the function
$$h(x, \mu, s) = f(x, \mu)\, 1\{t_0(\mu) = s\},$$
and note that $t_0(\theta_s \eta) = -s$ iff $s \in [0, t_1(\eta))$. □
Now consider a simple point process $\eta$ on $\mathbb{R}$ and a measurable process
$Y$ on $\mathbb{R}$ with values in an arbitrary measurable space $(S, \mathcal{S})$. We say that
the pair $(Y, \eta)$ is cycle-stationary if $\eta\{0\} = 1$ and $t_1(\eta) < \infty$ a.s., and if
in addition $\theta_{t_1(\eta)}(Y, \eta) \overset{d}{=} (Y, \eta)$. The variables $t_n(\eta)$ are then a.s. finite,
and the successive differences $\Delta t_n(\eta) = t_{n+1}(\eta) - t_n(\eta)$ along with the
shifted processes $Y^n = \theta_{t_n(\eta)} Y$ form a stationary sequence in the space
$(0,\infty) \times S^{\mathbb{R}}$. The following result gives a striking relationship between the
notions of stationarity and cycle stationarity for pairs $(X, \xi)$ and $(Y, \eta)$.
When $d = 1$ and $\xi \ne 0$ a.s., the definition (1) of the Palm distribution
and the inversion formula in Proposition 11.3 (iii) reduce to the nearly
symmetric equations
$$Ef(Y, \eta) = E \int_0^1 f(\theta_s(X, \xi))\, \xi(ds) \Big/ E\bar\xi, \tag{3}$$
$$Ef(X, \xi) = E \int_0^{t_1(\eta)} f(\theta_s(Y, \eta))\, ds \Big/ E t_1(\eta). \tag{4}$$
Theorem 11.4 (cycle stationarity, Kaplan) Equations (3) and (4) provide
a one-to-one correspondence between the distributions of all stationary
pairs $(X, \xi)$ on $\mathbb{R}$ and all cycle-stationary ones $(Y, \eta)$, where $X$ and $Y$ are
measurable processes in $S$, and $\xi$ and $\eta$ are simple point processes with
$\xi \ne 0$ a.s., $E\bar\xi < \infty$, and $E t_1(\eta) < \infty$.
Proof: First assume that $(X, \xi)$ is stationary with $\xi \ne 0$ and $E\bar\xi < \infty$,
put $\sigma_k = t_k(\xi)$, and define $\mathcal{L}(Y, \eta)$ by (3). Then for any $n \in \mathbb{N}$ and for
bounded, measurable $f \ge 0$, we have
$$n E\bar\xi \cdot Ef(Y, \eta) = E \int_0^n f(\theta_s(X, \xi))\, \xi(ds) = E \sum_{\sigma_k \in (0,n)} f(\theta_{\sigma_k}(X, \xi)).$$
Writing $\tau_k = t_k(\eta)$, we get by a suitable substitution
$$n E\bar\xi \cdot Ef(\theta_{\tau_1}(Y, \eta)) = E \sum_{\sigma_k \in (0,n)} f(\theta_{\sigma_{k+1}}(X, \xi)),$$
and so by subtraction,
$$|Ef(\theta_{\tau_1}(Y, \eta)) - Ef(Y, \eta)| \le \frac{2\|f\|}{n E\bar\xi}.$$
As $n \to \infty$, we obtain $Ef(\theta_{\tau_1}(Y, \eta)) = Ef(Y, \eta)$, and therefore $\theta_{\tau_1}(Y, \eta) \overset{d}{=}
(Y, \eta)$, which means that $(Y, \eta)$ is cycle-stationary. Also note that (4) holds
in this case by Proposition 11.3.
Next assume that $(Y, \eta)$ is cycle-stationary with $E t_1(\eta) < \infty$, and define
$\mathcal{L}(X, \xi)$ by (4). Then for $n$ and $f$ as before,
$$n E\tau_1 \cdot Ef(X, \xi) = E \int_0^{\tau_n} f(\theta_s(Y, \eta))\, ds,$$
and so for any $t \in \mathbb{R}$,
$$n E\tau_1 \cdot Ef(\theta_t(X, \xi)) = E \int_0^{\tau_n} f(\theta_{s+t}(Y, \eta))\, ds = E \int_t^{\tau_n + t} f(\theta_s(Y, \eta))\, ds.$$
Hence, by subtraction,
$$|Ef(\theta_t(X, \xi)) - Ef(X, \xi)| \le \frac{2|t|\, \|f\|}{n E\tau_1}.$$
As $n \to \infty$, we get $Ef(\theta_t(X, \xi)) = Ef(X, \xi)$, and so $\theta_t(X, \xi) \overset{d}{=} (X, \xi)$,
which means that $(X, \xi)$ is stationary.
To see that $(X, \xi)$ and $(Y, \eta)$ are related by (3), we introduce a possibly
unbounded measure space with integration operator $\tilde E$ and a random pair
$(\tilde Y, \tilde\eta)$ satisfying
$$\tilde E f(\tilde Y, \tilde\eta) = E \int_0^1 f(\theta_s(X, \xi))\, \xi(ds). \tag{5}$$
Proceeding as in the proof of Proposition 11.3, except that the monotone
class argument requires some extra care since $E\bar\xi$ may be infinite, we obtain
$$\tilde E \int_0^{t_1(\tilde\eta)} f(\theta_s(\tilde Y, \tilde\eta))\, ds = Ef(X, \xi) = E \int_0^{t_1(\eta)} f(\theta_s(Y, \eta))\, ds \Big/ E t_1(\eta).$$
Replacing $f(x, \mu)$ by $f(\theta_{t_0(\mu)}(x, \mu))$ and noting that $t_0(\theta_s \mu) = -s$ when
$\mu\{0\} = 1$ and $s \in [0, t_1(\mu))$, we get
$$\tilde E[t_1(\tilde\eta)\, f(\tilde Y, \tilde\eta)] = E[t_1(\eta)\, f(Y, \eta)] / E t_1(\eta).$$
Hence, by a suitable substitution,
$$\tilde E f(\tilde Y, \tilde\eta) = Ef(Y, \eta) / E t_1(\eta).$$
Inserting this into (5) and dividing by the same formula for $f \equiv 1$, we
obtain the required equation. □
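As a hedged worked example of how equation (4) is used (the choice of $f$ is ours and does not come from the text), take $f(x, \mu) = t_1(\mu) - t_0(\mu)$, the length of the interval between consecutive points of $\mu$ that covers the origin. For $s \in [0, t_1(\eta))$ the shifted measure satisfies $t_0(\theta_s \eta) = -s$ and $t_1(\theta_s \eta) = t_1(\eta) - s$, so the integrand in (4) does not depend on $s$:

```latex
\begin{align*}
E\bigl[t_1(\xi) - t_0(\xi)\bigr]
  &= \frac{1}{E\,t_1(\eta)}\,
     E\int_0^{t_1(\eta)} \bigl(t_1(\theta_s\eta) - t_0(\theta_s\eta)\bigr)\,ds \\
  &= \frac{1}{E\,t_1(\eta)}\,
     E\int_0^{t_1(\eta)} t_1(\eta)\,ds
   = \frac{E\,t_1(\eta)^2}{E\,t_1(\eta)} .
\end{align*}
```

By Jensen's inequality the right-hand side is at least $E t_1(\eta)$, so under the stationary version the interval covering the origin is length-biased (the inspection paradox); both sides may be infinite. For instance, if $t_1(\eta)$ is exponentially distributed with mean $1/c$, the ratio equals $2/c$.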
When $\xi$ is a simple point process on $\mathbb{R}^d$, we may think of the Palm distribution
$Q_{X,\xi}$ as the conditional distribution of $(X, \xi)$, given that $\xi\{0\} = 1$.
The interpretation is justified by the following result, which also provides
an asymptotic formula for the hitting probabilities of small Borel sets. By
$B_n \to 0$ we mean that $\sup\{|s|;\ s \in B_n\} \to 0$, and we write $\|\cdot\|$ for the total
variation norm.
Theorem 11.5 (local hitting and conditioning, Korolyuk, Ryll-Nardzewski,
König, Matthes) Consider a stationary pair $(X, \xi)$ on $\mathbb{R}^d$, where $X$ is a
measurable process in $S$ and $\xi$ is a simple point process with $E\bar\xi \in (0,\infty)$.
Let $B_1, B_2, \ldots \in \mathcal{B}^d$ with $|B_n| > 0$ and $B_n \to 0$, and let $f$ be bounded,
measurable, and shift-continuous. On $\{\xi B_n = 1\}$, let $\sigma_n$ denote the unique
point of $\xi$ in $B_n$. Then
(i) $P\{\xi B_n = 1\} \sim P\{\xi B_n > 0\} \sim E\xi B_n$;
(ii) $\|P[\theta_{\sigma_n}(X, \xi) \in \cdot \mid \xi B_n = 1] - Q_{X,\xi}\| \to 0$;
(iii) $E[f(X, \xi) \mid \xi B_n > 0] \to Q_{X,\xi} f$.
Proof: (i) Since $\eta\{0\} = 1$ a.s., we have $(\theta_s \eta) B_n > 0$ for all $s \in -B_n$.
Hence, Proposition 11.3 (ii) yields
$$\frac{P\{\xi B_n > 0\}}{E\bar\xi} = E \int_{V_\eta} 1\{(\theta_s \eta) B_n > 0\}\, ds \ge E|V_\eta \cap (-B_n)|.$$
Dividing by $|B_n|$ and using Fatou's lemma, we obtain
$$\liminf_{n \to \infty} \frac{P\{\xi B_n > 0\}}{E\xi B_n} \ge \liminf_{n \to \infty} \frac{E|V_\eta \cap (-B_n)|}{|B_n|} \ge E \liminf_{n \to \infty} \frac{|V_\eta \cap (-B_n)|}{|B_n|} = 1,$$
which implies
$$\liminf_{n \to \infty} \frac{P\{\xi B_n = 1\}}{E\xi B_n} \ge 2 \liminf_{n \to \infty} \frac{P\{\xi B_n > 0\}}{E\xi B_n} - 1 \ge 1.$$
The converse relations are obvious since
$$P\{\xi B_n = 1\} \le P\{\xi B_n > 0\} \le E\xi B_n.$$
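(ii) Introduce on $S^{\mathbb{R}^d} \times \mathcal{N}(\mathbb{R}^d)$ the measures
$$\mu_n = E \int_{B_n} 1\{\theta_s(X, \xi) \in \cdot\}\, \xi(ds), \qquad \nu_n = P[\theta_{\sigma_n}(X, \xi) \in \cdot\,;\ \xi B_n = 1],$$
and put $m_n = E\xi B_n$ and $p_n = P\{\xi B_n = 1\}$. By (1) the stated total
variation becomes
$$\Big\|\frac{\nu_n}{p_n} - \frac{\mu_n}{m_n}\Big\| \le \Big\|\frac{\nu_n}{p_n} - \frac{\nu_n}{m_n}\Big\| + \Big\|\frac{\nu_n}{m_n} - \frac{\mu_n}{m_n}\Big\| \le p_n \Big|\frac{1}{p_n} - \frac{1}{m_n}\Big| + \frac{1}{m_n}|p_n - m_n| = 2\Big|1 - \frac{p_n}{m_n}\Big|,$$
which tends to 0 in view of (i).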
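(iii) Here we write
$$\begin{aligned}
|E[f(X, \xi) \mid \xi B_n > 0] - Q_{X,\xi} f|
&\le |E[f(X, \xi) \mid \xi B_n > 0] - E[f(X, \xi) \mid \xi B_n = 1]| \\
&\quad + |E[f(X, \xi) - f(\theta_{\sigma_n}(X, \xi)) \mid \xi B_n = 1]| \\
&\quad + |E[f(\theta_{\sigma_n}(X, \xi)) \mid \xi B_n = 1] - Q_{X,\xi} f|.
\end{aligned}$$
By (i) and (ii) the first and last terms on the right tend to 0 as $n \to \infty$.
To estimate the second term, we introduce on $S^{\mathbb{R}^d} \times \mathcal{N}(\mathbb{R}^d)$ the bounded,
measurable functions
$$g_\varepsilon(x, \mu) = \sup_{|s| < \varepsilon} |f(\theta_s(x, \mu)) - f(x, \mu)|, \quad \varepsilon > 0,$$
and conclude from (ii) that for large enough $n$
$$|E[f(X, \xi) - f(\theta_{\sigma_n}(X, \xi)) \mid \xi B_n = 1]| \le E[g_\varepsilon(\theta_{\sigma_n}(X, \xi)) \mid \xi B_n = 1] \to Q_{X,\xi}\, g_\varepsilon.$$
Since also $Q_{X,\xi}\, g_\varepsilon \to 0$ by dominated convergence as $\varepsilon \to 0$, the desired
convergence follows. □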
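Part (i) of Theorem 11.5 can be verified in closed form for one specific example, chosen here for illustration and not taken from the text: a homogeneous Poisson process with rate `rate`, for which $\xi B$ is Poisson with mean `rate * volume`. The sketch below returns the two ratios $P\{\xi B = 1\}/E\xi B$ and $P\{\xi B > 0\}/E\xi B$, which should both approach 1 as the volume shrinks; the function name `hitting_ratios` is an assumption of this sketch.

```python
import math

def hitting_ratios(rate, volume):
    """For a Poisson process with the given rate, return
    (P{xi B = 1} / E xi B, P{xi B > 0} / E xi B) when |B| = volume."""
    mean = rate * volume                  # E xi B
    p_one = mean * math.exp(-mean)        # P{xi B = 1}
    p_pos = -math.expm1(-mean)            # P{xi B > 0} = 1 - exp(-mean)
    return p_one / mean, p_pos / mean

# Ratios for a sequence of shrinking sets B_n with |B_n| = 10^{-n}.
ratios = [hitting_ratios(2.0, 10.0 ** (-n)) for n in range(1, 6)]
```

Each pair in `ratios` is ordered as in the elementary bound $P\{\xi B_n = 1\} \le P\{\xi B_n > 0\} \le E\xi B_n$ used at the end of the proof of (i).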
We turn to a general ergodic theorem for Palm distributions. Given a
bounded measure $\nu \ne 0$ on $\mathbb{R}^d$ and a positive or bounded, measurable
function $f$ on $S^{\mathbb{R}^d} \times \mathcal{M}(\mathbb{R}^d)$, we introduce the average
$$\bar f_\nu(x, \mu) = \int f(\theta_s(x, \mu))\, \nu(ds) \Big/ \|\nu\|, \quad x \in S^{\mathbb{R}^d},\ \mu \in \mathcal{M}(\mathbb{R}^d),$$
where $x$ is understood to be a measurable function on $\mathbb{R}^d$, to ensure the
existence of the integral. When $\nu = 0$, we take $\bar f_\nu = 0$. Let us say that the
weight functions (probability densities) $g_1, g_2, \ldots$ on $\mathbb{R}^d$ are asymptotically
invariant if the corresponding property holds for the associated measures
$g_n \cdot \lambda^d$. For convenience, we may sometimes write $g \cdot \mu = g\mu$.
Theorem 11.6 (pointwise averages) Consider a stationary and ergodic
pair $(X, \xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random
measure with $\bar\xi \in (0,\infty)$ a.s. Let $\mathcal{L}(Y, \eta) = Q_{X,\xi}$. Then for any
bounded, measurable function $f$ and asymptotically invariant distributions
$\mu_n$ or weight functions $g_n$ on $\mathbb{R}^d$, we have
(i) $\bar f_{\mu_n}(Y, \eta) \overset{P}{\to} Ef(X, \xi)$;
(ii) $\bar f_{g_n \xi}(X, \xi) \overset{P}{\to} Ef(Y, \eta)$.
The same convergence holds a.s. when $\mu_n = 1_{B_n} \cdot \lambda^d$ or $g_n = 1_{B_n}$, respectively,
for some bounded, convex sets $B_1 \subset B_2 \subset \cdots$ in $\mathcal{B}^d$ with
$r(B_n) \to \infty$.
We can give a short and transparent proof by using the general shift
coupling in Theorem 10.28. Since the latter result applies directly only when
the sample intensity $\bar\xi$ is a constant (which holds in particular when $\xi$ is
ergodic), we need to replace the Palm distribution $Q_{X,\xi}$ in (1) by a suitably
modified version $Q'_{X,\xi}$, given for $f \ge 0$ and $B \in \mathcal{B}^d$ with $|B| \in (0,\infty)$ by
$$Q'_{X,\xi} f = E\Big[\bar\xi^{-1} \int_B f(\theta_s(X, \xi))\, \xi(ds)\Big] \Big/ |B|,$$
whenever $\bar\xi \in (0,\infty)$ a.s. If $\xi$ is ergodic, we note that $\bar\xi = E\bar\xi$ a.s., and
therefore $Q'_{X,\xi} = Q_{X,\xi}$. As previously for $Q_{X,\xi}$, it is both suggestive and
convenient to introduce a random pair $(Z, \zeta)$ with distribution $Q'_{X,\xi}$.
Lemma 11.7 (shift coupling, Thorisson) Consider a stationary pair
$(X, \xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is a random
measure with $\bar\xi \in (0,\infty)$ a.s. Let $\mathcal{L}(Z, \zeta) = Q'_{X,\xi}$. Then there exist some
random vectors $\sigma$ and $\tau$ in $\mathbb{R}^d$ such that
$$(X, \xi) \overset{d}{=} \theta_\sigma(Z, \zeta), \qquad (Z, \zeta) \overset{d}{=} \theta_\tau(X, \xi).$$
The result suggests that we think of $Q'_{X,\xi}$ as the distribution of $(X, \xi)$
shifted to a "typical" point of $\xi$. Note that this interpretation fails for $Q_{X,\xi}$
in general.
Proof: Write $\mathcal{I}$ for the shift-invariant $\sigma$-field in the measurable path space
of $(X, \xi)$, and put $\mathcal{I}_{X,\xi} = (X, \xi)^{-1}\mathcal{I}$. Letting $B = (0, 1]^d$ and noting that
$E[\xi B \mid \mathcal{I}_{X,\xi}] = \bar\xi$, we get for any $I \in \mathcal{I}$
$$P\{(Z, \zeta) \in I\} = E \int_B 1_I(\theta_s(X, \xi))\, \xi(ds) / \bar\xi
 = E[\xi B / \bar\xi;\ (X, \xi) \in I] = P\{(X, \xi) \in I\},$$
which shows that $(X, \xi) \overset{d}{=} (Z, \zeta)$ on $\mathcal{I}$. Both assertions now follow from
Theorem 10.28. □
Proof of Theorem 11.6: (i) By Lemma 11.7 we may assume that $(Y, \eta) =
\theta_\tau(X, \xi)$ for some random element $\tau$ in $\mathbb{R}^d$. Using Corollary 10.20 (i) and
the asymptotic invariance of $\mu_n$, we get
$$|\bar f_{\mu_n}(Y, \eta) - E f(X, \xi)|
 \leq \|\mu_n - \theta_\tau \mu_n\|\, \|f\| + |\bar f_{\mu_n}(X, \xi) - E f(X, \xi)| \xrightarrow{P} 0.$$
The a.s. version follows in the same way from Theorem 10.14.
(ii) Let $\xi_f$ be the stationary and ergodic random measure in Lemma 11.2.
Applying Corollary 10.20 (ii) to both $\xi$ and $\xi_f$ and using (1), we obtain
$$\bar f_{g_n \xi}(X, \xi) = \frac{\xi_f g_n}{\xi g_n}
 = \frac{\xi_f g_n / \lambda^d g_n}{\xi g_n / \lambda^d g_n}
 \xrightarrow{P} \frac{\bar\xi_f}{\bar\xi} = \frac{E \xi_f B}{E \xi B} = E f(Y, \eta).$$
For the pointwise version, we may use Corollary 10.19 instead. □
Taking expected values in Theorem 11.6, we get for bounded $f$ the
formulas
$$E \bar f_{\mu_n}(Y, \eta) \to E f(X, \xi),$$
$$E \bar f_{g_n \xi}(X, \xi) \to E f(Y, \eta),$$
which may be interpreted as limit theorems for suitable space averages of
the distributions $\mathcal{L}(X, \xi)$ and $\mathcal{L}(Y, \eta)$. We shall prove the less obvious fact
that both relations hold uniformly for bounded $f$. For a striking formulation,
we may introduce the possibly defective distributions $\mathcal{L}_\mu(X, \xi)$ and
$\mathcal{L}_{g\xi}(X, \xi)$, given for measurable functions $f \geq 0$ by
$$\mathcal{L}_\mu(X, \xi) f = E \bar f_\mu(X, \xi),$$
$$\mathcal{L}_{g\xi}(X, \xi) f = E \bar f_{g\xi}(X, \xi).$$
Theorem 11.8 (distributional averages, Slivnyak, Zähle) Consider a stationary
pair $(X, \xi)$ on $\mathbb{R}^d$, where $X$ is a measurable process in $S$ and $\xi$ is
a random measure with $\bar\xi \in (0, \infty)$ a.s. Let $\mathcal{L}(Z, \zeta) = Q'_{X,\xi}$. Then for any
asymptotically invariant distributions $\mu_n$ or weight functions $g_n$ on $\mathbb{R}^d$,
(i) $\|\mathcal{L}_{\mu_n}(Z, \zeta) - \mathcal{L}(X, \xi)\| \to 0$;
(ii) $\|\mathcal{L}_{g_n \xi}(X, \xi) - \mathcal{L}(Z, \zeta)\| \to 0$.
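Proof: (i) By Lemma 11.7 we may assume that $(Z, \zeta) = \theta_\tau(X, \xi)$. Using
Fubini's theorem and the stationarity of $(X, \xi)$, we get for any measurable
function $f \geq 0$
$$\mathcal{L}_{\mu_n}(X, \xi) f = \int E f(\theta_s(X, \xi))\, \mu_n(ds) = E f(X, \xi) = \mathcal{L}(X, \xi) f.$$
Hence, by Fubini's theorem and dominated convergence,
$$\|\mathcal{L}_{\mu_n}(Z, \zeta) - \mathcal{L}(X, \xi)\|
 = \|\mathcal{L}_{\mu_n}(\theta_\tau(X, \xi)) - \mathcal{L}_{\mu_n}(X, \xi)\|$$
$$\leq E \Big\| \int 1\{\theta_s(X, \xi) \in \cdot\}\, (\mu_n - \theta_\tau \mu_n)(ds) \Big\|
 \leq E \|\mu_n - \theta_\tau \mu_n\| \to 0.$$
(ii) Letting $0 \leq f \leq 1$ and defining $\xi_f$ as in Lemma 11.2, we get
$$\xi_f g_n = \int f(\theta_s(X, \xi))\, g_n(s)\, \xi(ds) \leq \xi g_n.$$
Interpreting $\xi_f g_n / \xi g_n$ as 0 when $\xi g_n = 0$, we obtain
$$|\mathcal{L}_{g_n \xi}(X, \xi) f - \mathcal{L}(Z, \zeta) f|
 = |E \bar f_{g_n \xi}(X, \xi) - E f(Z, \zeta)|
 = \Big| E \frac{\xi_f g_n}{\xi g_n} - E \frac{\xi_f g_n}{\bar\xi} \Big|
 \leq E \Big| 1 - \frac{\xi g_n}{\bar\xi} \Big|.$$
Here $\xi g_n / \bar\xi \xrightarrow{P} 1$ by Corollary 10.20, and moreover
$$E(\xi g_n / \bar\xi) = E(E[\xi g_n \mid \mathcal{I}_{X,\xi}] / \bar\xi) = E(\bar\xi / \bar\xi) = 1.$$
Hence, Proposition 4.12 yields $\xi g_n / \bar\xi \to 1$ in $L^1$, and the assertion
follows. □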
To motivate our next main topic, we consider a simple limit theorem
for multivariate sampling from a stationary process. Here we consider a
measurable process $X$ on some index set $T$, taking values in a space $S$,
and let $\tau = (\tau_1, \tau_2, \ldots)$ be an independent sequence of random elements
in $T$ with joint distribution $\mu$. We may then form the associated sampling
sequence $\xi = X \circ \tau$ in $S^\infty$, given by
$$\xi = (\xi_1, \xi_2, \ldots) = (X_{\tau_1}, X_{\tau_2}, \ldots)$$
and referred to below as a sample from $X$ with distribution $\mu$. The sampling
distributions $\mu_1, \mu_2, \ldots$ on $T^\infty$ are said to be asymptotically invariant
if their projections onto $T^k$ are asymptotically invariant for every $k \in \mathbb{N}$.
Recall that $\mathcal{I}_X$ denotes the invariant $\sigma$-field of $X$, and note that the
conditional distribution $\eta = P[X_0 \in \cdot \mid \mathcal{I}_X]$ exists by Theorem 6.3 when $S$ is
Borel.
Lemma 11.9 (asymptotically invariant sampling) Let $X$ be a stationary
and measurable process on $T = \mathbb{R}$ or $\mathbb{Z}$ with values in a Polish space $S$,
and form $\xi_1, \xi_2, \ldots$ by sampling from $X$ with some asymptotically invariant
distributions $\mu_1, \mu_2, \ldots$ on $T^\infty$. Then $\xi_n \xrightarrow{d} \xi$ in $S^\infty$, where $\mathcal{L}(\xi) = E\eta^\infty$
with $\eta = P[X_0 \in \cdot \mid \mathcal{I}_X]$.
Proof: Write $\xi = (\xi^k)$ and $\xi_n = (\xi_n^k)$. Fix any asymptotically invariant
distributions $\nu_1, \nu_2, \ldots$ on $T$, and let $f_1, \ldots, f_m$ be measurable functions
on $S$ bounded by $\pm 1$. Proceeding as in the proof of Corollary 10.20 (i), we
get
$$\Big| E \prod_k f_k(\xi_n^k) - E \prod_k f_k(\xi^k) \Big|
 \leq E \Big| \mu_n \otimes_k f_k(X) - \prod_k \eta f_k \Big|$$
$$\leq \|\mu_n - \mu_n * \nu_r^m\| + \int E \Big| (\nu_r^m * \delta_t) \otimes_k f_k(X) - \prod_k \eta f_k \Big|\, \mu_n(dt)$$
$$\leq \int \|\mu_n - \mu_n * \delta_t\|\, \nu_r^m(dt) + \sum_k \sup_t E |(\nu_r * \delta_t) f_k(X) - \eta f_k|.$$
Using the asymptotic invariance of $\mu_n$ and $\nu_r$ together with Corollary 10.20
(i) and dominated convergence, we see that the right-hand side tends to
0 as $n \to \infty$ and then $r \to \infty$. The assertion now follows by Theorem
4.29. □
The last result leads immediately to a version of de Finetti's theorem,
the fact that infinite exchangeable sequences are mixed i.i.d. For a precise
statement, consider any finite or infinite random sequence $\xi = (\xi_1, \xi_2, \ldots)$
with index set $I$, and say that $\xi$ is exchangeable if
$$(\xi_{k_1}, \xi_{k_2}, \ldots) \overset{d}{=} (\xi_1, \xi_2, \ldots) \tag{6}$$
for any finite permutation $(k_1, k_2, \ldots)$ of $I$. (Here a permutation is said to
be finite if it affects only finitely many elements.) For infinite sequences $\xi$
we also consider the formally weaker property of spreadability, where (6)
is required for all strictly increasing sequences $k_1 < k_2 < \cdots$. Note that
$\xi$ is then stationary and that any sample from $\xi$ with strictly increasing
sampling times $\tau_1, \tau_2, \ldots$ has the same distribution as $\xi$. By Lemma 11.9 we
conclude that $\mathcal{L}(\xi) = E\eta^\infty$ with $\eta = P[\xi_1 \in \cdot \mid \mathcal{I}_\xi]$. Below we give a slightly
stronger conditional statement. Recall that for any random measure $\eta$ on a
measurable space $(S, \mathcal{S})$, the associated $\sigma$-field is generated by the random
variables $\eta B$ for arbitrary $B \in \mathcal{S}$.
Theorem 11.10 (exchangeable sequences, de Finetti, Ryll-Nardzewski)
For any infinite random sequence $\xi$ in a Borel space $S$, the following
conditions are equivalent:
(i) $\xi$ is exchangeable;
(ii) $\xi$ is spreadable;
(iii) $P[\xi \in \cdot \mid \eta] = \eta^\infty$ a.s. for some random distribution $\eta$ on $S$.
The random measure $\eta$ is then a.s. unique and equals $P[\xi_1 \in \cdot \mid \mathcal{I}_\xi]$.
Since $\eta^\infty$ is a.s. the distribution of an i.i.d. sequence in $S$ based on
the measure $\eta$, we may state condition (iii) in words by saying that $\xi$ is
conditionally i.i.d. Taking expectations of both sides in (iii), we obtain the
seemingly weaker condition $\mathcal{L}(\xi) = E\eta^\infty$, which says that $\xi$ is mixed i.i.d.
Now the latter condition implies that $\xi$ is exchangeable, and so, by the
stated theorem, the two versions of (iii) are in fact equivalent.
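As an illustration of the mixed i.i.d. property, the following sketch checks exactly, with rational arithmetic, that the first few draws of a Pólya urn have the law $E\eta^n$ with $\eta$ a Bernoulli distribution whose parameter is uniform on $(0,1)$. The urn model, the function names, and the choice $n = 5$ are assumptions made for this example and do not appear in the text.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def polya_prob(seq, red=1, black=1):
    """Exact probability that a Polya urn produces the 0/1 sequence `seq`.

    The urn starts with `red` red and `black` black balls; each drawn ball
    is returned together with one extra ball of the same colour (1 = red).
    """
    p = Fraction(1)
    r, b = red, black
    for x in seq:
        if x == 1:
            p *= Fraction(r, r + b)
            r += 1
        else:
            p *= Fraction(b, r + b)
            b += 1
    return p

def mixed_iid_prob(seq):
    """Probability of `seq` under E eta^n, eta = Bernoulli(U), U uniform.

    Uses the Beta integral: int_0^1 u^k (1-u)^(n-k) du = k!(n-k)!/(n+1)!.
    """
    n, k = len(seq), sum(seq)
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

n = 5  # illustrative length
# The two laws agree on every 0/1 string of length n. The common value
# depends only on the number of ones, which is exchangeability.
agree = all(polya_prob(s) == mixed_iid_prob(s)
            for s in product((0, 1), repeat=n))
```

Since the probabilities depend on the sequence only through its number of ones, the finite-dimensional laws are invariant under permutations, in line with the remark that mixed i.i.d. sequences are exchangeable.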
Proof: Since $S$ is Borel, we may assume that $S = [0, 1]$. Letting $\mu_n$ be the
uniform distribution on the product set $\times_k \{(k-1)n + 1, \ldots, kn\}$ and using
the spreadability of $\xi$, we see from Lemma 11.9 that $P\{\xi \in \cdot\} = E\eta^\infty$.
More generally, consider any invariant Borel set $I \subset S^\infty$, and note that (6)
extends to
$$(1_I(\xi), \xi_{k_1}, \xi_{k_2}, \ldots) \overset{d}{=} (1_I(\xi), \xi_1, \xi_2, \ldots), \qquad k_1 < k_2 < \cdots.$$
Applying Lemma 11.9 to the sequence of pairs $(\xi_k, 1_I(\xi))$, we get as before
$P\{\xi \in \cdot \cap I\} = E[\eta^\infty;\ \xi \in I]$, and since $\eta$ is $\mathcal{I}_\xi$-measurable, it follows that
$P[\xi \in \cdot \mid \eta] = \eta^\infty$ a.s.
To see that $\eta$ is unique, we may use the law of large numbers and Theorem
6.4 to obtain
$$n^{-1} \sum_{k \leq n} 1_B(\xi_k) \to \eta B \quad \text{a.s.}, \qquad B \in \mathcal{S}.$$
□
The statement of Theorem 11.10 is clearly false for finite sequences.
To rescue the result in the finite case, we need to replace the inherent
i.i.d. sequences by so-called urn sequences, generated by successive drawing
without replacement from a finite set. For a precise statement, fix any
measurable space $S$, and consider a measure of the form $\mu = \sum_{k \leq n} \delta_{s_k}$
with $s_1, \ldots, s_n \in S$. The associated factorial measure $\mu^{(n)}$ on $S^n$ is defined
by
$$\mu^{(n)} = \sum_p \delta_{s \circ p},$$
where the summation extends over all permutations $p = (p_1, \ldots, p_n)$ of
$1, \ldots, n$, and we write $s \circ p = (s_{p_1}, \ldots, s_{p_n})$. Note that $\mu^{(n)}$ is independent
of the order of $s_1, \ldots, s_n$ and is measurable as a function of $\mu$.
Lemma 11.11 (finite exchangeable sequences) Let $\xi_1, \ldots, \xi_n$ be random
elements in some measurable space, and put $\xi = (\xi_k)$ and $\eta = \sum_k \delta_{\xi_k}$.
Then $\xi$ is exchangeable iff $P[\xi \in \cdot \mid \eta] = \eta^{(n)} / n!$ a.s.
Proof: Since $\eta$ is invariant under permutations of $\xi_1, \ldots, \xi_n$, we note that
$(\xi \circ p, \eta) \overset{d}{=} (\xi, \eta)$ for any permutation $p$ of $1, \ldots, n$. Now introduce an
exchangeable permutation $\pi \perp\!\!\!\perp \xi$ of $1, \ldots, n$. Using Fubini's theorem twice,
we get for any measurable sets $A$ and $B$ in appropriate spaces
$$P\{\xi \in B,\ \eta \in A\} = P\{\xi \circ \pi \in B,\ \eta \in A\}
 = E[P[\xi \circ \pi \in B \mid \xi];\ \eta \in A]
 = E[(n!)^{-1} \eta^{(n)} B;\ \eta \in A].$$
□
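The factorial measure can be made concrete for a small multiset. The sketch below builds $\mu^{(n)}$ by summing unit masses over all index permutations and normalizes by $n!$ to obtain the urn law of Lemma 11.11. The helper names and the sample points are hypothetical choices for this illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial

def factorial_measure(points):
    """Return mu^(n) as a Counter on ordered n-tuples.

    itertools.permutations permutes positions, so repeated points are
    counted with multiplicity, exactly as in the sum over all p.
    """
    return Counter(permutations(points))

def urn_law(points):
    """Normalized factorial measure mu^(n) / n!, i.e. the law of drawing
    all points one by one without replacement."""
    n = len(points)
    return {seq: Fraction(mass, factorial(n))
            for seq, mass in factorial_measure(points).items()}

law = urn_law(("a", "a", "b"))
# The total mass of mu^(n) is n!, so the normalized law sums to 1.
total = sum(law.values())
```

Reordering the input points leaves the result unchanged, which matches the remark that $\mu^{(n)}$ does not depend on the order of $s_1, \ldots, s_n$.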
Just as for the martingale and Markov properties, even the notions of
exchangeability and spreadability may be related to a filtration $\mathcal{F} = (\mathcal{F}_n)$.
Thus, a finite or infinite sequence of random elements $\xi = (\xi_1, \xi_2, \ldots)$ is
said to be $\mathcal{F}$-exchangeable if it is $\mathcal{F}$-adapted and such that, for every $n \geq 0$,
the shifted sequence $\theta_n \xi = (\xi_{n+1}, \xi_{n+2}, \ldots)$ is conditionally exchangeable
given $\mathcal{F}_n$. For infinite sequences $\xi$, the definition of $\mathcal{F}$-spreadability is
similar. (Since both definitions may be stated without reference to regular
conditional distributions, no restrictions are needed on $S$.) When $\mathcal{F}$ is
the filtration induced by $\xi$, the stated properties reduce to the unqualified
versions considered earlier.
For an infinite sequence $\xi$, we define strong stationarity or $\mathcal{F}$-stationarity
by the condition $\theta_\tau \xi \overset{d}{=} \xi$ for every finite optional time $\tau \geq 0$. By the
prediction sequence of $\xi$ we mean the set of conditional distributions
$$\pi_n = P[\theta_n \xi \in \cdot \mid \mathcal{F}_n], \qquad n \in \mathbb{Z}_+. \tag{7}$$
The random probability measures $\pi_0, \pi_1, \ldots$ on $S^\infty$ are said to form a
measure-valued martingale if $(\pi_n B)$ is a real-valued martingale for every
measurable set $B \subset S^\infty$.
The next result shows that strong stationarity is equivalent to
exchangeability; it also exhibits an interesting connection with martingale
theory.
Lemma 11.12 (strong stationarity) Let $\xi$ be an infinite, $\mathcal{F}$-adapted random
sequence in a Borel space $S$, and let $\pi$ denote the prediction sequence
of $\xi$. Then these conditions are equivalent:
(i) $\xi$ is $\mathcal{F}$-exchangeable;
(ii) $\xi$ is $\mathcal{F}$-spreadable;
(iii) $\xi$ is $\mathcal{F}$-stationary;
(iv) $\pi$ is a measure-valued $\mathcal{F}$-martingale.
Proof: Conditions (i) and (ii) are equivalent by Theorem 11.10. Assuming
(ii), we get a.s. for any $B \in \mathcal{S}^\infty$ and $n \in \mathbb{Z}_+$
$$E[\pi_{n+1} B \mid \mathcal{F}_n] = P[\theta_{n+1} \xi \in B \mid \mathcal{F}_n] = P[\theta_n \xi \in B \mid \mathcal{F}_n] = \pi_n B, \tag{8}$$
which proves (iv). Conversely, (ii) follows by iteration from the second
equality in (8), and so (ii) and (iv) are equivalent.
Next we note that (7) extends by Lemma 6.2 to
$$\pi_\tau B = P[\theta_\tau \xi \in B \mid \mathcal{F}_\tau] \quad \text{a.s.}, \qquad B \in \mathcal{S}^\infty,$$
for any finite optional time $\tau$. By Lemma 7.13 it follows that (iv) is
equivalent to
$$P\{\theta_\tau \xi \in B\} = E\pi_\tau B = E\pi_0 B = P\{\xi \in B\}, \qquad B \in \mathcal{S}^\infty,$$
which in turn is equivalent to (iii). □
We next aim to show how the exchangeability property extends to a
wide class of random transformations. For a precise statement, we say that
an integer-valued random variable $\tau$ is predictable with respect to a given
filtration $\mathcal{F}$ if the shifted time $\tau - 1$ is $\mathcal{F}$-optional.
Theorem 11.13 (predictable sampling) Let $\xi = (\xi_1, \xi_2, \ldots)$ be a finite or
infinite, $\mathcal{F}$-exchangeable random sequence, and let $\tau_1, \ldots, \tau_n$ be a.s. distinct
$\mathcal{F}$-predictable times in the index set of $\xi$. Then
$$(\xi_{\tau_1}, \ldots, \xi_{\tau_n}) \overset{d}{=} (\xi_1, \ldots, \xi_n). \tag{9}$$
Of special interest is the case of optional skipping, when $\tau_1 < \tau_2 < \cdots$.
If $\tau_k = \tau + k$ for some optional time $\tau < \infty$, then (9) reduces to the strong
stationarity of Lemma 11.12. In general, we require neither $\xi$ to be infinite
nor the $\tau_k$ to be increasing.
For both applications and proof, it is useful to introduce the associated
allocation sequence
$$\alpha_j = \inf\{k;\ \tau_k = j\}, \qquad j \in I,$$
where $I$ is the index set of $\xi$. Note that any finite value of $\alpha_j$ gives the
position of $j$ in the permuted sequence $(\tau_k)$. The random times $\tau_k$ are
clearly predictable iff the $\alpha_j$ form a predictable sequence in the sense of
Chapter 7.
Proof of Theorem 11.13: First let $\xi$ be indexed by $I = \{1, \ldots, n\}$, so that
$(\tau_1, \ldots, \tau_n)$ and $(\alpha_1, \ldots, \alpha_n)$ are mutually inverse random permutations
of $I$. For each $m \in \{0, \ldots, n\}$, put $\alpha_j^m = \alpha_j$ for all $j \leq m$, and define
recursively
$$\alpha_{j+1}^m = \min(I \setminus \{\alpha_1^m, \ldots, \alpha_j^m\}), \qquad m \leq j < n.$$
Then $(\alpha_1^m, \ldots, \alpha_n^m)$ is a predictable and $\mathcal{F}_{m-1}$-measurable permutation of
$1, \ldots, n$. Since also $\alpha_j^m = \alpha_j^{m-1} = \alpha_j$ whenever $j < m$, Theorem 6.4 yields
for any bounded measurable functions $f_1, \ldots, f_n$ on $S$
$$E \prod_j f_{\alpha_j^m}(\xi_j)
 = E\, E\Big[\prod_j f_{\alpha_j^m}(\xi_j) \,\Big|\, \mathcal{F}_{m-1}\Big]$$
$$= E \prod_{j<m} f_{\alpha_j^m}(\xi_j)\, E\Big[\prod_{j \geq m} f_{\alpha_j^m}(\xi_j) \,\Big|\, \mathcal{F}_{m-1}\Big]$$
$$= E \prod_{j<m} f_{\alpha_j^{m-1}}(\xi_j)\, E\Big[\prod_{j \geq m} f_{\alpha_j^{m-1}}(\xi_j) \,\Big|\, \mathcal{F}_{m-1}\Big]
 = E \prod_j f_{\alpha_j^{m-1}}(\xi_j).$$
Summing over $m \in \{1, \ldots, n\}$ and noting that $\alpha_j^n = \alpha_j$ and $\alpha_j^0 = j$ for all
$j$, we get
$$E \prod_k f_k(\xi_{\tau_k}) = E \prod_j f_{\alpha_j}(\xi_j) = E \prod_k f_k(\xi_k),$$
which extends to (9) by a monotone class argument.
Next assume that $I = \{1, \ldots, m\}$ with $m > n$. We may then extend the
sequence $(\tau_k)$ to $I$ by recursively defining
$$\tau_{k+1} = \min(I \setminus \{\tau_1, \ldots, \tau_k\}), \qquad k \geq n, \tag{10}$$
so that $\tau_1, \ldots, \tau_m$ form a random permutation of $I$. Using (10), we see
by induction that the times $\tau_{n+1}, \ldots, \tau_m$ are again predictable. Hence, the
previous case applies, and (9) follows.
Finally, assume that $I = \mathbb{N}$. For each $m \in \mathbb{N}$, we introduce the predictable
times
$$\tau_k^m = \tau_k 1\{\tau_k \leq m\} + (m + k) 1\{\tau_k > m\}, \qquad k = 1, \ldots, n,$$
and conclude from the previous version of (9) that
$$(\xi_{\tau_1^m}, \ldots, \xi_{\tau_n^m}) \overset{d}{=} (\xi_1, \ldots, \xi_n). \tag{11}$$
As $m \to \infty$, we have $\tau_k^m \to \tau_k$, and (9) follows from (11) by dominated
convergence. □
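A small exhaustive check may help to see what predictability buys. In the sketch below, a uniformly random arrangement of four fixed values plays the role of the exchangeable sequence, and the second sampling time is chosen from $\xi_1$ alone. The specific values, the rule for $\tau_2$, and all names are assumptions made for illustration; a rule that peeks at the sampled value itself is included for contrast.

```python
from collections import Counter
from itertools import permutations

def sample_pair(xi):
    """Apply tau_1 = 1 and tau_2 = 2 if xi_1 == 1 else 3 (1-based times).

    tau_2 is decided from xi_1 alone, so tau_2 - 1 is optional for the
    filtration generated by xi, and the two times are always distinct.
    """
    tau1 = 1
    tau2 = 2 if xi[0] == 1 else 3
    return (xi[tau1 - 1], xi[tau2 - 1])

values = (0, 0, 1, 1)
# All 24 index permutations, equally likely: an exchangeable sequence.
arrangements = list(permutations(values))

sampled = Counter(sample_pair(xi) for xi in arrangements)
initial = Counter((xi[0], xi[1]) for xi in arrangements)

# Contrast: the index of the first 1 depends on the sampled value itself,
# so it is not a predictable time, and the law of the sample changes.
peeked = Counter(xi[xi.index(1)] for xi in arrangements)
first = Counter(xi[0] for xi in arrangements)
```

Here `sampled` and `initial` coincide as multisets of outcomes, as (9) asserts, whereas `peeked` is concentrated on the value 1 and differs from the law of $\xi_1$.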
The last result yields a simple proof of yet another basic property of
random walks in $\mathbb{R}$, a striking relation between the first maximum and
the number of positive values. The latter result will in turn lead to simple
proofs of the arcsine laws in Theorems 13.16 and 14.11.
Corollary 11.14 (sojourns and maxima, Sparre-Andersen) Let $\xi_1, \ldots, \xi_n$
be exchangeable random variables, and put $S_k = \xi_1 + \cdots + \xi_k$. Then
$$\sum_{k \leq n} 1\{S_k > 0\} \overset{d}{=} \min\{k \geq 0;\ S_k = \max_{j \leq n} S_j\}.$$
Proof: Put $\tilde\xi_k = \xi_{n-k+1}$ for $k = 1, \ldots, n$, and note that the $\tilde\xi_k$ remain
exchangeable for the filtration $\mathcal{F}_k = \sigma\{S_n, \tilde\xi_1, \ldots, \tilde\xi_k\}$, $k = 0, \ldots, n$. Write
$\tilde S_k = \tilde\xi_1 + \cdots + \tilde\xi_k$, and introduce the predictable permutation
$$\alpha_k = \sum_{j=0}^{k-1} 1\{\tilde S_j < S_n\} + (n - k + 1)\, 1\{\tilde S_{k-1} \geq S_n\}, \qquad k = 1, \ldots, n.$$
Define $\xi'_k = \sum_j \tilde\xi_j 1\{\alpha_j = k\}$ for $k = 1, \ldots, n$, and conclude from Theorem
11.13 that $(\xi'_k) \overset{d}{=} (\xi_k)$. Writing $S'_k = \xi'_1 + \cdots + \xi'_k$, we further note that
$$\min\{k \geq 0;\ S'_k = \max_j S'_j\} = \sum_{j=0}^{n-1} 1\{\tilde S_j < S_n\} = \sum_{k=1}^{n} 1\{S_k > 0\}.$$
□
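The identity in Corollary 11.14 can be verified by brute force for a uniformly random arrangement of fixed steps, which is one particular exchangeable sequence. The step values below are arbitrary integers chosen for this sketch, and the function names are hypothetical.

```python
from collections import Counter
from itertools import accumulate, permutations

def positive_count(xs):
    """Number of k in 1..n with S_k > 0."""
    return sum(s > 0 for s in accumulate(xs))

def first_max_index(xs):
    """Smallest k in 0..n with S_k = max_j S_j, where S_0 = 0."""
    sums = [0, *accumulate(xs)]
    return sums.index(max(sums))

values = (3, -1, -4, 2, 1)  # illustrative steps (an assumption)
arrangements = list(permutations(values))  # 120 equally likely orders

law_positive = Counter(positive_count(xs) for xs in arrangements)
law_first_max = Counter(first_max_index(xs) for xs in arrangements)
```

The two counters agree, although the two statistics generally differ on individual arrangements; only their distributions coincide.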
Turning to the case of continuous time, we say that a process $X$ in some
topological space is continuous in probability if $X_s \xrightarrow{P} X_t$ as $s \to t$. An
$\mathbb{R}^d$-valued process $X$ on $\mathbb{R}_+$ is said to be exchangeable or spreadable if it is
continuous in probability with $X_0 = 0$ and such that the increments $X_t - X_s$
over any set of disjoint intervals $(s, t]$ of equal length form an exchangeable
or spreadable sequence. Finally, we say that $X$ has conditionally stationary
and independent increments, given some $\sigma$-field $\mathcal{I}$, if the stated property is
conditionally true for any finite collection of intervals.
The following continuous-time version of Theorem 11.10 characterizes the
exchangeable processes on $\mathbb{R}_+$. We postpone the much harder finite-interval
case until Theorem 16.21. The point process case is treated separately by
different methods in Theorem 12.12.
Theorem 11.15 (exchangeable processes on $\mathbb{R}_+$, Bühlmann) Let the process
$X$ on $\mathbb{R}_+$ be $\mathbb{R}^d$-valued and continuous in probability with $X_0 = 0$.
Then $X$ is spreadable iff it has conditionally stationary and independent
increments, given some $\sigma$-field $\mathcal{I}$.
Proof: The sufficiency being obvious, it suffices to show that the stated
condition is necessary. Thus, assume that $X$ is spreadable. Then the
increments $\xi_{nk}$ over the dyadic intervals $I_{nk} = 2^{-n}(k-1, k]$ are spreadable
for fixed $n$, and so by Theorem 11.10 they are conditionally i.i.d. $\eta_n$ for
some random probability measure $\eta_n$ on $\mathbb{R}^d$. Using Corollary 3.12 and the
uniqueness in Theorem 11.10, we obtain
$$\eta_n^{*2^{n-m}} = \eta_m \quad \text{a.s.}, \qquad m < n. \tag{12}$$
Thus, for any $m < n$, the increments $\xi_{mk}$ are conditionally i.i.d. $\eta_m$, given
$\eta_n$. Since the $\sigma$-fields $\sigma(\eta_n)$ are a.s. nondecreasing by (12), Theorem 7.23
shows that the $\xi_{mk}$ remain conditionally i.i.d. $\eta_m$, given $\mathcal{I} = \sigma\{\eta_0, \eta_1, \ldots\}$.
Now fix any disjoint intervals $I_1, \ldots, I_n$ of equal length with associated
increments $\xi_1, \ldots, \xi_n$. Here we may approximate by disjoint intervals
$I_1^m, \ldots, I_n^m$ of equal length with dyadic endpoints. For each $m$, the associated
increments $\xi_k^m$ are conditionally i.i.d., given $\mathcal{I}$. Thus, for any bounded,
continuous functions $f_1, \ldots, f_n$,
$$E^{\mathcal{I}} \prod_{k \leq n} f_k(\xi_k^m) = \prod_{k \leq n} E^{\mathcal{I}} f_k(\xi_k^m) = \prod_{k \leq n} E^{\mathcal{I}} f_k(\xi_1^m). \tag{13}$$
Since $X$ is continuous in probability, we have $\xi_k^m \xrightarrow{P} \xi_k$ for each $k$, so (13)
extends by dominated convergence to the original variables $\xi_k$. By suitable
approximation and monotone class arguments, we may finally extend the
relations to any measurable indicator functions $f_k = 1_{B_k}$. □
We turn to an interesting relationship between the sample intensity $\bar\xi$ of
a stationary random measure $\xi$ on $\mathbb{R}_+$ and the corresponding maximum
over increasing intervals. It is interesting to compare with the more general
but less precise maximum inequalities in Proposition 10.10 and Lemmas
10.11 and 10.17. For the need of certain applications, we also consider the
case of random measures $\xi$ on $[0, 1)$. Here $\bar\xi = \xi[0, 1)$ by definition, and
stationarity is defined as before in terms of the shifts $\theta_t$ on $[0, 1)$, where
$\theta_t s = s + t \pmod 1$ and correspondingly for sets and measures. Recall that
$\xi$ is singular if its absolutely continuous component vanishes. This holds in
particular for purely atomic measures $\xi$.
Theorem 11.16 (ballot theorem) Let $\xi$ be a stationary and a.s. singular
random measure on $\mathbb{R}_+$ or $[0, 1)$. Then there exists a $U(0, 1)$ random
variable $\sigma \perp\!\!\!\perp \mathcal{I}_\xi$ such that
$$\sigma \sup_{t > 0} t^{-1} \xi[0, t] = \bar\xi \quad \text{a.s.} \tag{14}$$
To justify the statement, we note that singularity is a measurable property
of a measure $\mu$. Indeed, by Proposition 2.21, it is equivalent that the
function $F_t = \mu[0, t]$ be singular. Now it is easy to check that the singularity
of $F$ can be described by countably many conditions, each involving the
increments of $F$ over finitely many intervals with rational endpoints.
Proof: If $\xi$ is stationary on $[0, 1)$, then the periodic continuation $\eta =
\sum_{n \leq 0} \theta_n \xi$ is clearly stationary on $\mathbb{R}_+$, and moreover $\mathcal{I}_\eta = \mathcal{I}_\xi$ and $\bar\eta = \bar\xi$.
We may also use the elementary inequality
$$\frac{x_1 + \cdots + x_n}{t_1 + \cdots + t_n} \leq \max_{k \leq n} \frac{x_k}{t_k}, \qquad n \in \mathbb{N},$$
valid for arbitrary $x_1, x_2, \ldots \geq 0$ and $t_1, t_2, \ldots > 0$, to see that
$\sup_t t^{-1} \eta[0, t] = \sup_t t^{-1} \xi[0, t]$. It is then enough to consider random
measures on $\mathbb{R}_+$.
In that case, put $X_t = \xi(0, t]$ and define
$$A_t = \inf_{s \geq t}(s - X_s), \qquad \alpha_t = 1\{A_t = t - X_t\}, \qquad t \geq 0. \tag{15}$$
Noting that $A_t \leq t - X_t$ and using the monotonicity of $X$, we get for any
$s < t$
$$A_s = \inf_{r \in [s, t)}(r - X_r) \wedge A_t \geq (s - X_t) \wedge A_t
 \geq (s - t + A_t) \wedge A_t = s - t + A_t.$$
If $A_0$ is finite, then so is $A_t$ for every $t$, and we obtain by subtraction
$$0 \leq A_t - A_s \leq t - s \quad \text{on } \{A_0 > -\infty\}, \qquad s < t. \tag{16}$$
Thus, $A$ is nondecreasing and absolutely continuous on $\{A_0 > -\infty\}$.
Now fix a singular path of $X$ such that $A_0$ is finite, and let $t > 0$ be such
that $A_t < t - X_t$. Then $A_t + X_{t\pm} < t$ by monotonicity, and so, by the left
and right continuity of $A$ and $X$, there exists some $\varepsilon > 0$ such that
$$A_s + X_s < s - 2\varepsilon, \qquad |s - t| < \varepsilon.$$
Then by (16),
$$s - X_s > A_s + 2\varepsilon > A_t + \varepsilon, \qquad |s - t| < \varepsilon,$$
and by (15) it follows that $A_s = A_t$ for $|s - t| < \varepsilon$. In particular, $A$ has
derivative $A'_t = 0 = \alpha_t$ at $t$.
We turn to the complementary set $D = \{t > 0;\ A_t = t - X_t\}$. By Theorem
2.15 both $A$ and $X$ are differentiable a.e., the latter with derivative 0, and
we form a set $D'$ by excluding the corresponding null sets. We may also
exclude the at most countably many isolated points of $D$. Then for any
$t \in D'$ we may choose some $t_n \to t$ in $D \setminus \{t\}$. By the definition of $D$,
$$\frac{A_{t_n} - A_t}{t_n - t} = 1 - \frac{X_{t_n} - X_t}{t_n - t}, \qquad n \in \mathbb{N},$$
and as $n \to \infty$ we get $A'_t = 1 = \alpha_t$. Combining this with the result in the
previous case gives $A' = \alpha$ a.e., and since $A$ is absolutely continuous, we
conclude from Theorem 2.15 that
$$A_t - A_0 = \int_0^t \alpha_s\, ds \quad \text{on } \{A_0 > -\infty\}, \qquad t \geq 0. \tag{17}$$
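Now recall that $X_t / t \to \bar\xi$ a.s. as $t \to \infty$ by Corollary 10.19. When $\bar\xi < 1$,
we see from (15) that $-\infty < A_t / t \to 1 - \bar\xi$ a.s. Also
$$A_t + X_t - t = \inf_{s \geq t}\big((s - t) - (X_s - X_t)\big) = \inf_{s \geq 0}\big(s - \theta_t \xi(0, s]\big),$$
and hence
$$\alpha_t = 1\{\inf_{s \geq 0}(s - \theta_t \xi(0, s]) = 0\}, \qquad t \geq 0.$$
Dividing (17) by $t$ and using Corollary 10.9, we get a.s. on $\{\bar\xi < 1\}$
$$P[\sup_{t > 0}(X_t / t) \leq 1 \mid \mathcal{I}_\xi] = P[\sup_{t \geq 0}(X_t - t) = 0 \mid \mathcal{I}_\xi]
 = P[A_0 = 0 \mid \mathcal{I}_\xi] = E[\alpha_0 \mid \mathcal{I}_\xi] = 1 - \bar\xi.$$
Replacing $\xi$ by $r\xi$ and taking complements, we obtain more generally
$$P[r \sup_{t > 0}(X_t / t) > 1 \mid \mathcal{I}_\xi] = r\bar\xi \wedge 1 \quad \text{a.s.}, \qquad r \geq 0, \tag{18}$$
where the result for $r\bar\xi \in [1, \infty)$ follows by monotonicity.
When $\bar\xi \in (0, \infty)$, we may simply define $\sigma$ by (14); if instead $\bar\xi = 0$ or
$\infty$, we take $\sigma = \vartheta$, where $\vartheta$ is $U(0, 1)$ and independent of $\xi$. Note that (14)
remains true in the latter case, since $\xi = 0$ a.s. on $\{\bar\xi = 0\}$ and $X_t / t \to \infty$
a.s. on $\{\bar\xi = \infty\}$. To verify the distributional claim, we conclude from (18)
and Theorem 6.4 that, on $\{\bar\xi \in (0, \infty)\}$,
$$P[\sigma < r \mid \mathcal{I}_\xi] = P[r \sup_t (X_t / t) > \bar\xi \mid \mathcal{I}_\xi] = r \wedge 1 \quad \text{a.s.}, \qquad r \geq 0.$$
Since the same relation holds trivially when $\bar\xi = 0$ or $\infty$, we see that
$\sigma$ is conditionally $U(0, 1)$ given $\mathcal{I}_\xi$, which means that $\sigma$ is $U(0, 1)$ and
independent of $\mathcal{I}_\xi$. □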
From the last theorem we may easily deduce a corresponding discrete-time
result. Here (14) holds only with inequality and will be supplemented
by a sharp relation similar to (18). For a stationary sequence $\xi =
(\xi_1, \xi_2, \ldots)$ in $\mathbb{R}_+$ with invariant $\sigma$-field $\mathcal{I}_\xi$, we define $\bar\xi = E[\xi_1 \mid \mathcal{I}_\xi]$ a.s. On
$\{1, \ldots, n\}$ we define stationarity in the obvious way in terms of addition
modulo $n$, and we put $\bar\xi = n^{-1} \sum_k \xi_k$.
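Corollary 11.17 (discrete-time ballot theorem) Let $\xi = (\xi_1, \xi_2, \ldots)$ be a
finite or infinite, stationary sequence of $\mathbb{R}_+$-valued random variables, and
put $S_k = \sum_{j \leq k} \xi_j$. Then there exists a $U(0, 1)$ random variable $\sigma \perp\!\!\!\perp \mathcal{I}_\xi$ such
that
$$\sigma \sup_{k > 0}(S_k / k) \leq \bar\xi \quad \text{a.s.} \tag{19}$$
If the $\xi_k$ are $\mathbb{Z}_+$-valued, we have also
$$P[\sup_{k > 0}(S_k - k) \geq 0 \mid \mathcal{I}_\xi] = \bar\xi \wedge 1 \quad \text{a.s.} \tag{20}$$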
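Proof: Arguing by periodic continuation as before, we may reduce to the
case of infinite sequences $\xi$. Now let $\vartheta$ be $U(0, 1)$ and independent of $\xi$,
and define $X_t = S_{[t + \vartheta]}$. Then $X$ has stationary increments, and we note
that also $\mathcal{I}_X = \mathcal{I}_\xi$ and $\bar X = \bar\xi$. By Theorem 11.16 there exists some $U(0, 1)$
random variable $\sigma \perp\!\!\!\perp \mathcal{I}_X$ such that a.s.
$$\sup_{k > 0}(S_k / k) = \sup_{t > 0}(S_{[t]} / t) \leq \sup_{t > 0}(X_t / t) = \bar\xi / \sigma.$$
If the $\xi_k$ are $\mathbb{Z}_+$-valued, the same result yields a.s.
$$P[\sup_{k > 0}(S_k - k) \geq 0 \mid \mathcal{I}_\xi] = P[\sup_{t \geq 0}(X_t - t) > 0 \mid \mathcal{I}_\xi]
 = P[\sup_{t > 0}(X_t / t) > 1 \mid \mathcal{I}_\xi] = P[\bar\xi > \sigma \mid \mathcal{I}_\xi] = \bar\xi \wedge 1.$$
□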
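For a finite sequence, relation (20) has a purely combinatorial form: a uniformly random rotation of fixed nonnegative integers $x_1, \ldots, x_n$ with sum $m \leq n$ is stationary on $\{1, \ldots, n\}$ with $\bar\xi = m/n$, and the classical cyclic lemma states that exactly $n - m$ rotations satisfy $S_k < k$ for all $k \leq n$. The sketch below checks this count for one illustrative vector; the vector and the function names are assumptions. When $m \leq n$ the periodic continuation cannot create new exceedances beyond the first period, so only $k \leq n$ needs to be inspected.

```python
def stays_below(xs):
    """True if S_k < k for every k = 1..n, i.e. max_k (S_k - k) < 0."""
    total = 0
    for k, x in enumerate(xs, start=1):
        total += x
        if total >= k:
            return False
    return True

def rotations(xs):
    """All n cyclic rotations of the tuple xs."""
    return [xs[i:] + xs[:i] for i in range(len(xs))]

xs = (0, 2, 0, 1, 0, 0)  # illustrative counts: n = 6, S_n = 3
n, m = len(xs), sum(xs)

good = sum(stays_below(r) for r in rotations(xs))
# Fraction of rotations with sup_k (S_k - k) >= 0, to compare with m / n.
exceed_fraction = (n - good) / n
```

With these values three of the six rotations stay strictly below the diagonal, so the exceedance probability equals $1/2 = \bar\xi$, in agreement with (20).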
To state the next result, consider a random element $\xi$ in a countable
space $S$, and put $p_j = P\{\xi = j\}$. Given an arbitrary $\sigma$-field $\mathcal{F}$, we define
the information $I(j)$ and the conditional information $I(j \mid \mathcal{F})$ by
$$I(j) = -\log p_j, \qquad I(j \mid \mathcal{F}) = -\log P[\xi = j \mid \mathcal{F}], \qquad j \in S.$$
For motivation, we note the additivity property
$$I(\xi_1, \ldots, \xi_n) = I(\xi_1) + I(\xi_2 \mid \xi_1) + \cdots + I(\xi_n \mid \xi_1, \ldots, \xi_{n-1}), \tag{21}$$
valid for any random elements $\xi_1, \ldots, \xi_n$ in $S$. Next we form the associated
entropy $H(\xi) = E I(\xi)$ and conditional entropy $H(\xi \mid \mathcal{F}) = E I(\xi \mid \mathcal{F})$, and
note that
$$H(\xi) = E I(\xi) = -\sum_j p_j \log p_j.$$
From (21) we see that even $H$ is additive, in the sense that
$$H(\xi_1, \ldots, \xi_n) = H(\xi_1) + H(\xi_2 \mid \xi_1) + \cdots + H(\xi_n \mid \xi_1, \ldots, \xi_{n-1}). \tag{22}$$
If the $\xi_n$ form a stationary and ergodic sequence such that $H(\xi_0) < \infty$,
we show that the averages of the terms in (21) and (22) converge toward a
common limit.
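The additivity (22) is easy to confirm numerically for $n = 2$. The sketch below uses a made-up joint distribution of $(\xi_1, \xi_2)$ on a six-point space; the probabilities and variable names are assumptions chosen only to illustrate the formula.

```python
from math import isclose, log

# Hypothetical joint law of (xi_1, xi_2) on {0, 1} x {0, 1, 2}.
joint = {(0, 0): 0.10, (0, 1): 0.25, (0, 2): 0.05,
         (1, 0): 0.30, (1, 1): 0.10, (1, 2): 0.20}

def entropy(pmf):
    """H = -sum_j p_j log p_j, skipping atoms of zero mass."""
    return -sum(p * log(p) for p in pmf.values() if p > 0)

# Marginal law of xi_1.
marginal = {}
for (a, _), p in joint.items():
    marginal[a] = marginal.get(a, 0.0) + p

# H(xi_2 | xi_1) = E I(xi_2 | xi_1) = -sum p(a, b) log(p(a, b) / p(a)).
conditional = -sum(p * log(p / marginal[a])
                   for (a, _), p in joint.items() if p > 0)

lhs = entropy(joint)                   # H(xi_1, xi_2)
rhs = entropy(marginal) + conditional  # H(xi_1) + H(xi_2 | xi_1)
```

Up to floating-point rounding, `lhs` and `rhs` agree, which is (22) with $n = 2$.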
Theorem 11.18 (entropy and information, Shannon, McMillan, Breiman,
Ionescu Tulcea) Let $\xi = (\xi_k)$ be a stationary and ergodic sequence in a
countable space $S$ such that $H(\xi_0) < \infty$. Then
$$n^{-1} I(\xi_1, \ldots, \xi_n) \to H(\xi_0 \mid \xi_{-1}, \xi_{-2}, \ldots) \quad \text{a.s. and in } L^1.$$
Note that the condition $H(\xi_0) < \infty$ holds automatically when the state
space is finite. Our proof will be based on a technical estimate.
Lemma 11.19 (maximum inequality, Chung, Neveu) For any countably
valued random variable $\xi$ and discrete filtration $(\mathcal{F}_n)$, we have
$$E \sup_n I(\xi \mid \mathcal{F}_n) \leq H(\xi) + 1.$$
Proof: Write $p_j = P\{\xi = j\}$ and $\eta = \sup_n I(\xi \mid \mathcal{F}_n)$. For fixed $r > 0$, we
introduce the optional times
$$\tau_j = \inf\{n;\ I(j \mid \mathcal{F}_n) > r\} = \inf\{n;\ P[\xi = j \mid \mathcal{F}_n] < e^{-r}\}, \qquad j \in S.$$
By Lemma 6.2,
$$P\{\eta > r,\ \xi = j\} = P\{\tau_j < \infty,\ \xi = j\}
 = E[P[\xi = j \mid \mathcal{F}_{\tau_j}];\ \tau_j < \infty]
 \leq e^{-r} P\{\tau_j < \infty\} \leq e^{-r}.$$
Since the left-hand side is also bounded by $p_j$, Lemma 3.4 yields
$$E\eta = \sum_j E[\eta;\ \xi = j] = \sum_j \int_0^\infty P\{\eta > r,\ \xi = j\}\, dr
 \leq \sum_j \int_0^\infty (e^{-r} \wedge p_j)\, dr
 = \sum_j p_j (1 - \log p_j) = H(\xi) + 1.$$
□
Proof of Theorem 11.18 (Breiman): We may assume $\xi$ to be defined on
the canonical space $S^\infty$. Then introduce the functions
$$g_k(\xi) = I(\xi_0 \mid \xi_{-1}, \ldots, \xi_{-k+1}), \qquad g(\xi) = I(\xi_0 \mid \xi_{-1}, \xi_{-2}, \ldots).$$
By (21) we may write the assertion in the form
$$n^{-1} \sum_{k \leq n} g_k(\theta_k \xi) \to E g(\xi) \quad \text{a.s. and in } L^1. \tag{23}$$
Here $g_k(\xi) \to g(\xi)$ a.s. by martingale convergence and $E \sup_k g_k(\xi) < \infty$
by Lemma 11.19. Hence, (23) follows by Corollary 10.8. □
Exercises
1. Show that Lemma 11.1 can be strengthened to $\liminf_t t^{-1} \xi[0, t] > 0$ a.s.
on $\{\xi \neq 0\}$. (Hint: Use Corollary 10.19.)
2. Let $\Xi$ be a stationary random set in $\mathbb{R}$. Show that $\sup \Xi = \infty$ a.s. on
$\{\Xi \neq \emptyset\}$. (Hint: Use Lemma 11.1, or prove the result by a similar argument.)
3. For $(X, \xi)$ as in Lemma 11.2, define on the appropriate product space
a random measure $\tilde\xi$ by $\tilde\xi(A \times B) = \int_B 1_A(\theta_s(X, \xi))\, \xi(ds)$. Show that $\tilde\xi$ is
again stationary under shifts in $\mathbb{R}^d$.
4. Prove Theorem 11.5 (i) by an elementary argument when the $B_n$ are
intervals in $\mathbb{R}^d$. (Hint: If an interval $I$ is partitioned for each $n$ into subintervals
$I_{nj}$ with $\max_j |I_{nj}| \to 0$, then $\sum_j 1\{\xi I_{nj} = 1\} \to \xi I$ a.s. Now take
expected values and use dominated convergence.)
5. In the context of Theorem 11.8, show that $Q_{X,\xi} = Q'_{X,\xi}$ iff $\bar\xi = E\bar\xi \in
(0, \infty)$ a.s. Also give examples where $Q_{X,\xi}$ exists while $Q'_{X,\xi}$ does not, and
conversely.
6. Let $\mu_c$ be the distribution of the sequence $\tau_1 < \tau_2 < \cdots$, where $\xi =
\sum_j \delta_{\tau_j}$ is a stationary Poisson process on $\mathbb{R}_+$ with rate $c > 0$. Show that
the $\mu_c$ are asymptotically invariant as $c \to 0$.
7. Show by an example that a finite, exchangeable sequence need not be
mixed i.i.d.
8. Let the random sequence $\xi$ be conditionally i.i.d. $\eta$. Show that $\xi$ is
ergodic iff $\eta$ is a.s. nonrandom.
9. Let $\xi$ and $\eta$ be random probability measures on some Borel space such
that $E\xi^\infty = E\eta^\infty$. Show that $\xi \overset{d}{=} \eta$. (Hint: Use the law of large numbers.)
10. Let $\xi_1, \xi_2, \ldots$ be spreadable random elements in some Borel space $S$.
Prove the existence of a measurable function $f: [0, 1]^2 \to S$ and some i.i.d.
$U(0, 1)$ random variables $\vartheta_0, \vartheta_1, \ldots$ such that $\xi_n = f(\vartheta_0, \vartheta_n)$ a.s. for all $n$.
(Hint: Use Lemma 3.22, Proposition 6.13, and Theorems 6.10 and 11.10.)
11. Let $\xi = (\xi_1, \xi_2, \ldots)$ be an $\mathcal{F}$-spreadable random sequence in some Borel
space $S$. Prove the existence of some random measure $\eta$ such that, for each
$n \in \mathbb{Z}_+$, the sequence $\theta_n \xi$ is conditionally i.i.d. $\eta$, given $\mathcal{F}_n$ and $\eta$.
12. Let $\xi_1, \ldots, \xi_n$ be exchangeable random variables, fix a Borel set $B$, and
let $\tau_1 < \cdots < \tau_\nu$ be the indices $k \in \{1, \ldots, n\}$ with $\sum_{j<k} \xi_j \in B$. Construct
a random vector $(\eta_1, \ldots, \eta_n) \overset{d}{=} (\xi_1, \ldots, \xi_n)$ such that $\xi_{\tau_k} = \eta_k$ a.s. for all
$k \leq \nu$. (Hint: Extend the sequence $(\tau_k)$ to $k \in (\nu, n]$, and apply Theorem
11.13.)
13. Prove a version of Corollary 11.14 for the last maximum.
14. State and prove a continuous-time version of Lemma 11.12. (If no
regularity conditions are imposed on the exchangeable processes of Theorem
11.15, we need to consider optional times taking countably many values.)
15. Anticipating the theory of Lévy processes in Chapter 15, show that
any exchangeable process on $\mathbb{R}_+$ as in Theorem 11.15 has a version with
rcll paths.
16. Show by an example that the conclusion of Theorem 11.16 may fail
when $\xi$ is not singular.
17. Give an example where the inequality in Corollary 11.17 is a.s. strict.
(Hint: Examine the proof.)
18. (Bertrand, André) Show that if two candidates A and B in an election
get the proportions $p$ and $1 - p$ of the votes, then the probability that A
will lead throughout the ballot count equals $(2p - 1)_+$. (Hint: Use Corollary
11.17. Alternatively, use a combinatorial argument based on the reflection
principle.)
19. Prove the second claim in Corollary 11.17 by a martingale argument,
in the case where $\xi_1, \ldots, \xi_n$ are $\mathbb{Z}_+$-valued and exchangeable. (Hint: We
may assume that $S_n$ is nonrandom. Then the variables $M_k = S_k / k$ form a
reverse martingale, and the result follows by optional sampling.)
20. Prove that the convergence in Theorem 11.18 holds in $L^p$ for arbitrary
$p > 0$ when $S$ is finite. (Hint: Show as in Lemma 11.19 that
$\|\sup_n I(\xi \mid \mathcal{F}_n)\|_p < \infty$ when $\xi$ is $S$-valued, and use Corollary 10.8 (ii).)
21. Show that $H(\xi, \eta) \leq H(\xi) + H(\eta)$ for any $\xi$ and $\eta$. (Hint: Note that
$H(\eta \mid \xi) \leq H(\eta)$ by Jensen's inequality.)
22. Give an example of a stationary Markov chain $(\xi_n)$ such that $H(\xi_1) > 0$
but $H(\xi_1 \mid \xi_0) = 0$.
23. Give an example of a stationary Markov chain $(\xi_n)$ such that $H(\xi_1) =
\infty$ but $H(\xi_1 \mid \xi_0) < \infty$. (Hint: Choose the state space $\mathbb{Z}_+$, and consider
transition probabilities $p_{ij}$ that equal 0 unless $j = i + 1$ or $j = 0$.)
Chapter 12
Poisson and Pure Jump-Type
Markov Processes
Random measures and point processes; Cox processes, random-
ization, and thinning; mixed Poisson and binomial processes;
independence and symmetry criteria; Markov transition and
rate kernels; embedded Markov chains and explosion; compound
and pseudo-Poisson processes; ergodic behavior of irreducible
chains
Poisson processes and Brownian motion constitute the basic building blocks
of modern probability theory. Our first goal in this chapter is to introduce
the family of Poisson and related processes. In particular, we construct
Poisson processes on bounded sets as mixed binomial processes and derive
a variety of Poisson characterizations in terms of independence, symmetry,
and renewal properties. A randomization of the underlying intensity mea-
sure leads to the richer class of Cox processes. We also consider the related
randomizations of general point processes, obtainable through independent
motions of the individual point masses. In particular, we will see how the
latter type of transformations preserve the Poisson property.
It is usually most convenient to regard Poisson and other point processes
on an abstract space as integer-valued random measures. The relevant parts
of this chapter may then serve at the same time as an introduction to ran-
dom measure theory. In particular, Cox processes and randomizations will
be used to derive some general uniqueness criteria for simple point processes
and diffuse random measures. The notions and results of this chapter form a
basis for the corresponding weak convergence theory developed in Chapter
16, where Poisson and Cox processes appear as limits in important special
cases.
Our second goal is to continue the theory of Markov processes from
Chapter 8 with a detailed study of pure jump-type processes. The evolution
of such a process is governed by a rate kernel $\alpha$, which determines
both the rate at which transitions occur and the associated transition
probabilities. For bounded $\alpha$ one gets a pseudo-Poisson process, which may be
described as a discrete-time Markov chain with transition times given by
an independent, homogeneous Poisson process. Of special interest is the
case of compound Poisson processes, where the underlying Markov chain is
a random walk. In Chapter 19 we shall see how every Feller process can be
approximated in a natural way by pseudo-Poisson processes, recognized in
that context by the boundedness of their generators. A similar compound
Poisson approximation of general Lévy processes is utilized in Chapter 15.
In addition to the already mentioned connections to other topics, we
note the fundamental role of Poisson processes for the theory of Lévy
processes in Chapter 15 and for excursion theory in Chapter 22. In Chapter
25 the independent-increment characterization of Poisson processes is
extended to a criterion in terms of compensators, and we derive some related
time-change results. Finally, the ergodic theory for continuous-time Markov
chains, developed at the end of this chapter, is analogous to the discrete-time
theory of Chapter 8 and will be extended in Chapter 20 to a general
class of Feller processes. A related theory for diffusions appears in Chapter
23.
To introduce the basic notions of random measure theory, consider an
arbitrary measurable space $(S, \mathcal{S})$. By a random measure on $S$ we mean a
$\sigma$-finite kernel $\xi$ from the basic probability space $(\Omega, \mathcal{A}, P)$ into $S$. Here the
$\sigma$-finiteness means that there exists a partition $B_1, B_2, \ldots \in \mathcal{S}$ of $S$ such
that $\xi B_k < \infty$ a.s. for all $k$. It is often convenient to think of $\xi$ as a random
element in the space $\mathcal{M}(S)$ of $\sigma$-finite measures on $S$, endowed with the
$\sigma$-field generated by the projection maps $\pi_B\colon \mu \mapsto \mu B$ for arbitrary $B \in \mathcal{S}$.
Note that $\xi B = \xi(\cdot, B)$ is a random variable in $[0, \infty]$ for every $B \in \mathcal{S}$.
More generally, it is clear by a simple approximation that $\xi f = \int f\,d\xi$ is a
random variable in $[0, \infty]$ for every measurable function $f \ge 0$ on $S$. The
intensity of $\xi$ is defined as the measure $E\xi B = E(\xi B)$, $B \in \mathcal{S}$.
We often encounter the situation when $S$ is a topological space with Borel
$\sigma$-field $\mathcal{S} = \mathcal{B}(S)$. In the special case when $S$ is a locally compact, second
countable Hausdorff space (abbreviated as lcscH), it is understood that $\xi$
is a.s. finite on the ring $\hat{\mathcal{S}}$ of all relatively compact Borel sets. Equivalently,
we assume that $\xi f < \infty$ a.s. for every $f \in C_K^+(S)$, the class of continuous
functions $f \ge 0$ on $S$ with compact support. In this case, the $\sigma$-field in
$\mathcal{M}(S)$ is generated by the projections $\pi_f\colon \mu \mapsto \mu f$ for all $f \in C_K^+(S)$.
The following elementary result provides the basic uniqueness criteria for
random measures. Stronger results are given for simple point processes and
diffuse random measures in Theorem 12.8, and related convergence criteria
appear in Theorem 16.16.
Lemma 12.1 (uniqueness for random measures) Let $\xi$ and $\eta$ be random
measures on $S$. Then $\xi \overset{d}{=} \eta$ under each of these conditions:
(i) $(\xi B_1, \ldots, \xi B_n) \overset{d}{=} (\eta B_1, \ldots, \eta B_n)$ for any $B_1, \ldots, B_n \in \mathcal{S}$, $n \in \mathbb{N}$;
(ii) $\xi f \overset{d}{=} \eta f$ for any measurable function $f \ge 0$ on $S$.
If $S$ is lcscH, it suffices in (ii) to consider functions $f \in C_K^+(S)$.
Proof: The sufficiency of (i) is clear from Proposition 3.2. Next we note
that (i) follows from (ii), as we apply the latter condition to any positive
linear combination $f = \sum_k c_k 1_{B_k}$ and use the Cramér-Wold Corollary 5.5.
Now assume that $S$ is lcscH, and that (ii) holds for all $f \in C_K^+(S)$. Since
$C_K^+(S)$ is closed under positive linear combinations, we see as before that
$$(\xi f_1, \ldots, \xi f_n) \overset{d}{=} (\eta f_1, \ldots, \eta f_n), \qquad f_1, \ldots, f_n \in C_K^+, \; n \in \mathbb{N}.$$
By Theorem 1.1 it follows that $\mathcal{L}(\xi) = \mathcal{L}(\eta)$ on the $\sigma$-field $\mathcal{G} = \sigma\{\pi_f;\, f \in C_K^+\}$,
where $\pi_f\colon \mu \mapsto \mu f$, and it remains to show that $\mathcal{G}$ contains $\mathcal{F} =
\sigma\{\pi_B;\, B \in \mathcal{S}\}$. Then fix any compact set $K \subset S$, and choose some functions
$f_n \in C_K^+$ with $f_n \downarrow 1_K$. Since $\mu f_n \downarrow \mu K$ for every $\mu \in \mathcal{M}(S)$, the mapping
$\pi_K$ is $\mathcal{G}$-measurable by Lemma 1.10. Next apply Theorem 1.1 to the Borel
subsets of an arbitrary compact set, to see that $\pi_B$ is $\mathcal{G}$-measurable for any
$B \in \hat{\mathcal{S}}$. Hence, $\mathcal{F} \subset \mathcal{G}$. □
By a point process on $S$ we mean an integer-valued random measure $\xi$.
In other words, we assume $\xi B$ to be a $\bar{\mathbb{Z}}_+$-valued random variable for every
$B \in \mathcal{S}$. Alternatively, we may think of $\xi$ as a random element in the space
$\mathcal{N}(S) \subset \mathcal{M}(S)$ of all $\sigma$-finite, integer-valued measures on $S$. When $S$ is
Borel, we may write $\xi = \sum_{k \le \kappa} \delta_{\gamma_k}$ for some random elements $\gamma_1, \gamma_2, \ldots$
in $S$ and $\kappa$ in $\bar{\mathbb{Z}}_+$, and we note that $\xi$ is simple iff the $\gamma_k$ with $k \le \kappa$ are
distinct. In general, we may eliminate the possible multiplicities to create
a simple point process $\xi^*$, which agrees with the counting measure on the
support of $\xi$. By construction it is clear that $\xi^*$ is a measurable function
of $\xi$.
A random measure $\xi$ on a measurable space $S$ is said to have independent
increments if the random variables $\xi B_1, \ldots, \xi B_n$ are independent for any
disjoint sets $B_1, \ldots, B_n \in \mathcal{S}$. By a Poisson process on $S$ with intensity
measure $\mu \in \mathcal{M}(S)$ we mean a point process $\xi$ on $S$ with independent
increments such that $\xi B$ is Poisson with mean $\mu B$ whenever $\mu B < \infty$.
By Lemma 12.1 the stated conditions specify the distribution of $\xi$, which
is then determined by the intensity measure $\mu$. More generally, for any
random measure $\eta$ on $S$, we say that a point process $\xi$ is a Cox process
directed by $\eta$ if it is conditionally Poisson, given $\eta$, with $E[\xi \mid \eta] = \eta$ a.s. In
particular, we may take $\eta = \alpha\mu$ for some measure $\mu \in \mathcal{M}(S)$ and random
variable $\alpha \ge 0$ to form a mixed Poisson process based on $\mu$ and $\alpha$.
We next define a $\nu$-randomization $\zeta$ of an arbitrary point process $\xi$ on
$S$, where $\nu$ is a probability kernel from $S$ to some measurable space $T$.
Assuming first that $\xi$ is nonrandom and equal to $\mu = \sum_k \delta_{s_k}$, we may
take $\zeta = \sum_k \delta_{s_k, \gamma_k}$, where the $\gamma_k$ are independent random elements in
$T$ with distributions $\nu(s_k, \cdot)$. Note that the distribution $P_\mu$ of $\zeta$ depends
only on $\mu$. In general, we define a $\nu$-randomization $\zeta$ of $\xi$ by the condition
$P[\zeta \in \cdot \mid \xi] = P_\xi$ a.s. In the special case when $T = \{0, 1\}$ and $\nu(s, \{0\}) \equiv p \in
[0, 1]$, we refer to the point process $\xi_p = \zeta(\cdot \times \{0\})$ on $S$ as a $p$-thinning of $\xi$.
Another special instance is when $S = \{0\}$, $\xi = \kappa\delta_0$, and $\nu = \mu/\mu T$ for some
$\mu \in \mathcal{M}(T)$ with $\mu T \in (0, \infty)$, in which case $\zeta$ is called a mixed binomial
(or sample) process based on $\mu$ and $\kappa$. Note that $\zeta B$ is then binomially
distributed, conditionally on $\kappa$, with parameters $\nu B$ and $\kappa$. If $T$ is Borel,
we can write $\zeta = \sum_{k \le \kappa} \delta_{\gamma_k}$, where the random elements $\gamma_k$ are i.i.d. $\nu$ and
independent of $\kappa$.
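To make the last two definitions concrete, the following Python sketch randomizes a nonrandom counting measure and extracts a p-thinning from it. The function names `randomize` and `thin`, the representation of a counting measure as a list of atoms (repeated according to multiplicity), and the sample configuration are assumptions made for this illustration only.

```python
import random

def randomize(points, kernel, rng):
    """nu-randomization of the counting measure sum_k delta_{s_k}.

    `points` lists the atoms s_k (repeated according to multiplicity) and
    `kernel(s, rng)` draws one element of T from nu(s, .).
    """
    return [(s, kernel(s, rng)) for s in points]

def thin(points, p, rng):
    """p-thinning: randomize with T = {0, 1} and nu(s, {0}) = p, keep mark 0."""
    marked = randomize(points, lambda s, r: 0 if r.random() < p else 1, rng)
    return [s for s, mark in marked if mark == 0]

rng = random.Random(0)               # fixed seed, for reproducibility only
points = [0.1, 0.4, 0.4, 0.9]        # counting measure with a double point at 0.4
kept = thin(points, 0.5, rng)        # each atom survives with probability 1/2
```

Each atom is marked independently of all others, so a multiple point may lose only some of its mass under thinning.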
Our first aim is to examine the relationship between the various point
processes introduced so far. Here we may simplify the computations by using
the Laplace functional $\psi_\xi(f) = Ee^{-\xi f}$ of a random measure $\xi$, defined
for any measurable function $f \ge 0$ on the state space $S$. Note that $\psi_\xi$
determines the distribution $\mathcal{L}(\xi)$ by Lemma 12.1 and the uniqueness theorem
for Laplace transforms. The following lemma lists some useful formulas.
Recall that a kernel $\nu$ between two measurable spaces $S$ and $T$ may be
regarded as an operator between the associated function spaces, given by
$\nu f(s) = \int \nu(s, dt) f(t)$. For convenience, we write $\hat\nu(s, \cdot) = \delta_s \otimes \nu(s, \cdot)$, so
that $\mu \otimes \nu = \mu\hat\nu$.
Lemma 12.2 (Laplace functionals) Let $f, g \ge 0$ be measurable.
(i) If $\xi$ is a Poisson process with $E\xi = \mu$, then
$$Ee^{-\xi f} = \exp\{-\mu(1 - e^{-f})\}.$$
Here we may replace $f$ by $if$ when $f\colon S \to \mathbb{R}$ with $\mu(|f| \wedge 1) < \infty$.
(ii) If $\xi$ is a Cox process directed by $\eta$, then
$$Ee^{-\xi f - \eta g} = E\exp\{-\eta(1 - e^{-f} + g)\}.$$
(iii) If $\zeta$ is a $\nu$-randomization of $\xi$, then
$$Ee^{-\zeta f} = E\exp(\xi \log \hat\nu e^{-f}).$$
(iv) If $\xi_p$ is a $p$-thinning of $\xi$, then
$$Ee^{-\xi_p f - \xi g} = E\exp\{-\xi(g - \log\{1 - p(1 - e^{-f})\})\}.$$
(v) If $\xi$ is a mixed binomial process based on $\mu$ and $\kappa$, then
$$Ee^{-\xi f} = E(\mu e^{-f}/\mu S)^\kappa.$$
Proof: (i) If $\alpha$ is a Poisson random variable with mean $m$, then clearly
$$Ee^{-c\alpha} = e^{-m} \sum_{k \ge 0} (me^{-c})^k / k! = \exp\{-m(1 - e^{-c})\}, \qquad c \in \mathbb{C}.$$
Now let $f = \sum_{k \le m} c_k 1_{B_k}$, where $c_k \in \mathbb{C}$ and the sets $B_k \in \mathcal{S}$ are disjoint
with $\mu B_k < \infty$. Then
$$\begin{aligned}
Ee^{-\xi f} &= E\exp\Big\{-\sum_k c_k \xi B_k\Big\} = \prod_k Ee^{-c_k \xi B_k} \\
&= \prod_k \exp\{-\mu B_k (1 - e^{-c_k})\} \\
&= \exp\Big\{-\sum_k \mu B_k (1 - e^{-c_k})\Big\} \\
&= \exp\{-\mu(1 - e^{-f})\}.
\end{aligned}$$
For general $f \ge 0$, we may choose some simple functions $f_n \ge 0$ with $f_n \uparrow f$
and conclude by monotone convergence that $\xi f_n \to \xi f$ and $\mu(1 - e^{-f_n}) \to
\mu(1 - e^{-f})$. The asserted formula then follows by dominated convergence
from the version for $f_n$.
Now assume that $\mu(|f| \wedge 1) < \infty$. Replacing $f$ by $c|f|$ in the previous
formula and letting $c \downarrow 0$, we get by dominated convergence $P\{\xi|f| < \infty\} =
e^0 = 1$, or $\xi|f| < \infty$ a.s. Next choose some simple functions $f_n \to f$ with
$|f_n| \le |f|$ and $\mu|f_n| < \infty$, and note that $|1 - e^{-if_n}| \le |f| \wedge 2$ by Lemma
5.14. By dominated convergence we obtain $\xi f_n \to \xi f$ and $\mu(1 - e^{-if_n}) \to
\mu(1 - e^{-if})$. The extended formula now follows from the version for $f_n$.
(ii) By (i) we have
$$\begin{aligned}
Ee^{-\xi f - \eta g} &= E e^{-\eta g} E[e^{-\xi f} \mid \eta] \\
&= E e^{-\eta g} \exp\{-\eta(1 - e^{-f})\} \\
&= E\exp\{-\eta(1 - e^{-f} + g)\}.
\end{aligned}$$
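(iii) First assume that $\xi = \sum_k \delta_{s_k}$ is nonrandom. Introducing some
independent random elements $\gamma_k$ in $T$ with distributions $\nu(s_k, \cdot)$, we get
$$\begin{aligned}
Ee^{-\zeta f} &= E\exp\Big\{-\sum_k f(s_k, \gamma_k)\Big\} \\
&= \prod_k Ee^{-f(s_k, \gamma_k)} = \prod_k \hat\nu e^{-f}(s_k) \\
&= \exp \sum_k \log \hat\nu e^{-f}(s_k) = \exp \xi \log \hat\nu e^{-f}.
\end{aligned}$$
Hence, in general,
$$Ee^{-\zeta f} = EE[e^{-\zeta f} \mid \xi] = E\exp \xi \log \hat\nu e^{-f}.$$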
(iv) Apply (iii), or use the same method of proof.
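(v) We may assume that $\xi = \sum_{k \le \kappa} \delta_{\gamma_k}$, where $\gamma_1, \gamma_2, \ldots$ are i.i.d. and
independent of $\kappa$ with distribution $\mu/\mu S$. Using Fubini's theorem, we get
$$\begin{aligned}
Ee^{-\xi f} &= E\exp\Big\{-\sum_{k \le \kappa} f(\gamma_k)\Big\} = E\prod_{k \le \kappa} Ee^{-f(\gamma_k)} \\
&= E\prod_{k \le \kappa} (\mu e^{-f}/\mu S) = E(\mu e^{-f}/\mu S)^\kappa.
\end{aligned}$$
□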
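Formula (i) of Lemma 12.2 can be checked numerically on a finite state space, where a Poisson process reduces to independent Poisson counts at the individual points. The sketch below sums the series from the proof term by term and compares the product over points with the closed form; the dictionaries `mu` and `f` and both function names are assumptions chosen for this example.

```python
import math

def poisson_laplace_series(m, c, terms=200):
    """E exp(-c * N) for N Poisson with mean m, summed term by term."""
    total, term = 0.0, math.exp(-m)          # term = P{N = 0}
    for k in range(terms):
        total += term * math.exp(-c * k)
        term *= m / (k + 1)                  # now term = P{N = k + 1}
    return total

def laplace_functional_closed_form(mu, f):
    """exp{-mu(1 - e^{-f})} for a measure mu and a function f on a finite set."""
    return math.exp(-sum(mu[s] * (1.0 - math.exp(-f[s])) for s in mu))

mu = {"a": 0.5, "b": 2.0, "c": 1.25}         # intensity measure on S = {a, b, c}
f = {"a": 0.3, "b": 1.0, "c": 2.5}           # a nonnegative test function

# By independence of the counts, E exp(-xi f) factorizes over the points of S.
lhs = math.prod(poisson_laplace_series(mu[s], f[s]) for s in mu)
rhs = laplace_functional_closed_form(mu, f)
```

The truncation at 200 terms is harmless for the small means used here; larger intensities would call for more terms.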
It is now easy to prove that the Poisson property is preserved under
randomizations. Here is a more general result.
Proposition 12.3 (preservation laws) For any measurable spaces $S$, $T$,
and $U$, consider some probability kernels $\mu\colon S \to T$ and $\nu\colon S \times T \to U$.
(i) If $\xi$ is a Cox process on $S$ directed by $\eta$ and $\zeta \perp\!\!\!\perp_\xi \eta$ is a
$\mu$-randomization of $\xi$, then $\zeta$ is a Cox process directed by $\eta \otimes \mu$.
(ii) If $\eta$ is a $\mu$-randomization of $\xi$ and $\zeta$ is a $\nu$-randomization of $\eta$, then
$\zeta$ is a $\mu \otimes \nu$-randomization of $\xi$.
Note that the conditional independence in (i) holds automatically when
$\zeta$ is constructed from $\xi$ by independent randomization, as in Lemma 6.9.
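Proof: (i) Using Proposition 6.6 and Lemma 12.2 (ii) and (iii), we get for
any measurable functions $f, g \ge 0$
$$\begin{aligned}
Ee^{-\zeta f - \eta\hat\mu g} &= E e^{-\eta\hat\mu g} E[e^{-\zeta f} \mid \xi, \eta] \\
&= E\exp\{\xi \log \hat\mu e^{-f} - \eta\hat\mu g\} \\
&= E\exp\{-\eta(1 - \hat\mu e^{-f} + \hat\mu g)\} \\
&= E\exp\{-\eta\hat\mu(1 - e^{-f} + g)\}.
\end{aligned}$$
The result now follows by Lemmas 12.1 (ii) and 12.2 (ii).
(ii) By Lemma 12.2 (iii),
$$\begin{aligned}
Ee^{-\zeta f} &= E\exp\{\eta \log \hat\nu e^{-f}\} \\
&= E\exp\{\xi \log \hat\mu\hat\nu e^{-f}\} \\
&= E\exp\{\xi \log (\mu \otimes \nu)^{\wedge} e^{-f}\}.
\end{aligned}$$
□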
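A familiar special case of part (i) is that a p-thinning of a Poisson process with intensity $\mu$ is again Poisson, now with intensity $p\mu$. The Monte Carlo sketch below looks only at the total mass and compares its empirical mean, variance, and zero frequency with the Poisson values. The sampler `sample_poisson` (Knuth's product-of-uniforms method), the seed, and the parameter values are assumptions of this illustration, and agreement of a few statistics is of course no proof.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson variable by multiplying uniforms (Knuth's method)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def thinned_count(mean, p, rng):
    """Total mass of a p-thinning of a Poisson process with total mass `mean`."""
    kappa = sample_poisson(mean, rng)
    return sum(1 for _ in range(kappa) if rng.random() < p)

rng = random.Random(12345)                   # fixed seed for reproducibility
mean, p, runs = 4.0, 0.25, 20000
counts = [thinned_count(mean, p, rng) for _ in range(runs)]
avg = sum(counts) / runs
var = sum((k - avg) ** 2 for k in counts) / (runs - 1)
zero_freq = counts.count(0) / runs
# A Poisson variable with mean p * mean = 1 has variance 1 and P{0} = exp(-1).
```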
We continue with a basic relationship between Poisson and binomial
processes. The result leads to an easy construction of the general Poisson
process in Theorem 12.7. The significance of mixed Poisson and binomial
processes is further clarified by Theorem 12.12 below.
Theorem 12.4 (mixed Poisson and binomial processes) Consider a point
process $\xi$ and a $\sigma$-finite measure $\mu$ on a common space $(S, \mathcal{S})$, and let
$B_1, B_2, \ldots \in \mathcal{S}$ with $B_n \uparrow S$. Then $\xi$ is a mixed Poisson or binomial
process based on $\mu$ iff the same property holds on $B_n$ for every $n \in \mathbb{N}$.
Proof: First assume that $\xi$ is a mixed Poisson process based on $\mu$ and
$\alpha$. Then the same property holds for the restriction to any set $B \in \mathcal{S}$ with
$\mu B = \infty$. If instead $\mu B \in (0, \infty)$, let $\eta$ be a mixed binomial process based
on $1_B \cdot \mu$ and $\kappa$, where $\kappa$ is conditionally Poisson with mean $\alpha\mu B$. By
Lemma 12.2 (ii) and (v) we have for any measurable function $f\colon S \to \mathbb{R}_+$
supported by $B$
$$\begin{aligned}
Ee^{-\eta f} &= E(\mu[e^{-f}; B]/\mu B)^\kappa \\
&= E\exp(-\alpha\mu B(1 - \mu[e^{-f}; B]/\mu B)) \\
&= E\exp(-\alpha\mu[1 - e^{-f}; B]) \\
&= E\exp(-\alpha\mu(1 - e^{-f})) = Ee^{-\xi f}.
\end{aligned}$$
Thus, $\xi \overset{d}{=} \eta$ on $B$, as required.
Next let $\xi$ be a mixed binomial process on $S$ based on $\mu$ and $\kappa$, and fix any
$B \in \mathcal{S}$ with $\mu B > 0$. Let $\kappa_p$ be a $p$-thinning of $\kappa$, where $p = \mu B/\mu S$, and
consider a mixed binomial process $\eta$ based on $1_B \cdot \mu$ and $\kappa_p$. Using Lemma
12.2 (iv) and (v), we get for any measurable function $f \ge 0$ supported
by $B$
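$$\begin{aligned}
Ee^{-\eta f} &= E(\mu[e^{-f}; B]/\mu B)^{\kappa_p} \\
&= E\Big\{1 - p\Big(1 - \frac{\mu[e^{-f}; B]}{\mu B}\Big)\Big\}^\kappa \\
&= E(1 - \mu(1 - e^{-f})/\mu S)^\kappa \\
&= E(\mu e^{-f}/\mu S)^\kappa = Ee^{-\xi f}.
\end{aligned}$$
Again it follows that $\xi \overset{d}{=} \eta$ on $B$.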
To prove the converse assertion, we may clearly assume that $\mu B_n \in
(0, \infty)$ for all $n$, so that $1_{B_n} \cdot \xi$ is a mixed binomial process based on $1_{B_n} \cdot \mu$
and $\xi B_n$. If $f \ge 0$ is supported by $B_m$, then by Lemma 12.2 we have for
$n \ge m$
$$Ee^{-\xi f} = E\Big(\frac{\mu[e^{-f}; B_n]}{\mu B_n}\Big)^{\xi B_n} = E\Big(1 - \frac{\mu(1 - e^{-f})}{\mu B_n}\Big)^{\xi B_n}. \tag{1}$$
If $\mu S < \infty$, then as $n \to \infty$ we get by dominated convergence
$$Ee^{-\xi f} = E\Big(1 - \frac{\mu(1 - e^{-f})}{\mu S}\Big)^{\xi S} = E\Big(\frac{\mu e^{-f}}{\mu S}\Big)^{\xi S}.$$
Taking $f = c1_{B_m}$ and letting $c \to 0$, we see in particular that $\xi S < \infty$ a.s.
The relation extends by dominated convergence to arbitrary $f \ge 0$, and so
by Lemma 12.2 we conclude that $\xi$ is a mixed binomial process based on
$\mu$ and $\xi S$.
If instead $\mu S = \infty$, Theorem 5.19 shows that $\xi B_n/\mu B_n \overset{d}{\to} \alpha$ in $[0, \infty]$
along some subsequence $N' \subset \mathbb{N}$, where $0 \le \alpha \le \infty$. By Theorem 4.30 we
may choose some $\alpha_n \overset{d}{=} \xi B_n/\mu B_n$ such that $\alpha_n \to \alpha$ a.s. along $N'$, and so
by dominated convergence in (1)
$$Ee^{-\xi f} = E\exp(-\alpha\mu(1 - e^{-f})).$$
As before, we see that $\alpha < \infty$ a.s., and by monotone and dominated
convergence we may extend the relation to arbitrary $f \ge 0$. Hence, Lemma
12.2 shows that $\xi$ is a mixed Poisson process based on $\mu$ and $\alpha$. □
The last result leads in particular to a criterion for a Poisson process to
be simple. Recall that a measure $\mu$ on $S$ is said to be diffuse if $\mu\{s\} = 0$
for all $s \in S$.
Corollary 12.5 (simplicity and diffuseness) Let $\xi$ be a Cox process
directed by some random measure $\eta$, both defined on a Borel space $S$. Then $\xi$
is a.s. simple iff $\eta$ is a.s. diffuse.
Proof: It is enough to establish the corresponding property for mixed
binomial processes. Then let $\gamma_1, \gamma_2, \ldots$ be i.i.d. with distribution $\mu$. By
Fubini's theorem
$$P\{\gamma_i = \gamma_j\} = \int \mu\{s\}\,\mu(ds) = \sum_s (\mu\{s\})^2, \qquad i \ne j,$$
and so the $\gamma_j$ are a.s. distinct iff $\mu$ is diffuse. □
The following uniqueness assertion will play a crucial role in a subsequent
proof.
Lemma 12.6 (uniqueness for Cox processes and thinnings) Fix a $p \in
(0, 1)$.
(i) For any Cox processes $\xi$ and $\xi'$ directed by $\eta$ and $\eta'$, we have $\xi \overset{d}{=} \xi'$
iff $\eta \overset{d}{=} \eta'$.
(ii) For any $p$-thinnings $\xi_p$ and $\xi'_p$ of $\xi$ and $\xi'$, we have $\xi_p \overset{d}{=} \xi'_p$ iff $\xi \overset{d}{=} \xi'$.
Proof: We prove only (i), the argument for (ii) being similar. By Lemma
12.2 (ii) we have for any measurable function $g \ge 0$ on $S$
$$Ee^{-\xi g} = E\exp\{-\eta(1 - e^{-g})\} = Ee^{-\eta f},$$
where $f = 1 - e^{-g}$, and similarly for $\xi'$ and $\eta'$. Assuming $\xi \overset{d}{=} \xi'$, we conclude
that $Ee^{-\eta f} = Ee^{-\eta' f}$ for any measurable function $f\colon S \to [0, 1)$. Then also
$$Ee^{-t\eta f} = Ee^{-t\eta' f}, \qquad t \in [0, 1],$$
and since both sides are analytic for $t > 0$, the relation extends to all $t \ge 0$.
Hence, $Ee^{-\eta f} = Ee^{-\eta' f}$ for all bounded, measurable functions $f \ge 0$, and
Lemma 12.1 (ii) yields $\eta \overset{d}{=} \eta'$. □
We proceed to establish the existence of a Poisson process with arbitrary
intensity measure on a general measurable space. More generally, we can
prove the existence of arbitrary Cox processes and randomizations, which
also covers the cases of thinnings and mixed binomial processes.
Theorem 12.7 (existence) Fix any measurable spaces $S$ and $T$, and allow
suitable extensions of the basic probability space.
(i) For any random measure $\eta$ on $S$, there exists a Cox process $\xi$ directed
by $\eta$.
(ii) For any point process $\xi$ on $S$ and probability kernel $\nu\colon S \to T$, there
exists a $\nu$-randomization $\zeta$ of $\xi$.
Proof: (i) First assume that $\eta = \mu$ is nonrandom with $\mu S \in (0, \infty)$. By
Corollary 6.18 we may choose a Poisson distributed random variable $\kappa$ with
$E\kappa = \mu S$ and an independent sequence of i.i.d. random elements $\gamma_1, \gamma_2, \ldots$
in $S$ with distribution $\mu/\mu S$. By Theorem 12.4 the random measure $\xi =
\sum_{j \le \kappa} \delta_{\gamma_j}$ is then Poisson with intensity $\mu$.
Next let $\mu S = \infty$. Since $\mu$ is $\sigma$-finite, we may split $S$ into disjoint subsets
$B_1, B_2, \ldots \in \mathcal{S}$ such that $\mu B_k \in (0, \infty)$ for each $k$. As before, there exists
for every $k$ a Poisson process $\xi_k$ on $S$ with intensity $\mu_k = 1_{B_k} \cdot \mu$, and by
Corollary 6.18 we may choose the $\xi_k$ to be independent. Writing $\xi = \sum_k \xi_k$
and using Lemma 12.2 (i), we get for any measurable function $f \ge 0$ on $S$
$$\begin{aligned}
Ee^{-\xi f} &= \prod_k Ee^{-\xi_k f} = \prod_k \exp\{-\mu_k(1 - e^{-f})\} \\
&= \exp\Big\{-\sum_k \mu_k(1 - e^{-f})\Big\} \\
&= \exp\{-\mu(1 - e^{-f})\}.
\end{aligned}$$
Using Lemmas 12.1 (ii) and 12.2 (i), we conclude that $\xi$ is a Poisson process
with intensity $\mu$.
Now let $\xi_\mu$ be a Poisson process with intensity $\mu$. Then for any numbers
$m_1, \ldots, m_n \in \mathbb{Z}_+$ and disjoint sets $B_1, \ldots, B_n \in \mathcal{S}$, we have
$$P\bigcap_{k \le n} \{\xi_\mu B_k = m_k\} = \prod_{k \le n} e^{-\mu B_k} (\mu B_k)^{m_k}/m_k!,$$
which is a measurable function of $\mu$. (Here the expression on the right
is understood to be 0 when $\mu B_k = \infty$.) The measurability extends to
arbitrary sets $B_k \in \mathcal{S}$, since the general probability on the left is a finite
sum of such products. Now the sets on the left form a $\pi$-system generating
the $\sigma$-field in $\mathcal{N}(S)$, and so by Theorem 1.1 we conclude that $P_\mu = \mathcal{L}(\xi_\mu)$
is a probability kernel from $\mathcal{M}(S)$ to $\mathcal{N}(S)$. But then Lemma 6.9 ensures
the existence, for any random measure $\eta$ on $S$, of a Cox process $\xi$ directed
by $\eta$.
(ii) First let $\mu = \sum_k \delta_{s_k}$ be nonrandom in $\mathcal{N}(S)$. By Corollary 6.18
there exist some independent random elements $\gamma_k$ in $T$ with distributions
$\nu(s_k, \cdot)$, and we note that $\zeta_\mu = \sum_k \delta_{s_k, \gamma_k}$ is a $\nu$-randomization of $\mu$. Letting
$B_1, \ldots, B_n \in \mathcal{S} \otimes \mathcal{T}$ and $s_1, \ldots, s_n \in (0, 1)$, we get by Lemma 12.2 (iii)
$$\begin{aligned}
E\prod_k s_k^{\zeta_\mu B_k} &= E\exp \zeta_\mu \sum_k 1_{B_k} \log s_k \\
&= \exp \mu \log \hat\nu \exp \sum_k 1_{B_k} \log s_k \\
&= \exp \mu \log \hat\nu \prod_k s_k^{1_{B_k}}.
\end{aligned}$$
Using Lemma 1.41 (i) twice, we see that $\hat\nu \prod_k s_k^{1_{B_k}}$ is a measurable function
on $S$ for fixed $s_1, \ldots, s_n$, and hence that the right-hand side is a measurable
function of $\mu$. Differentiating $m_k$ times with respect to $s_k$ for each $k$ and
taking $s_1 = \cdots = s_n = 0$, we conclude that the probability $P\bigcap_k \{\zeta_\mu B_k =
m_k\}$ is a measurable function of $\mu$ for any $m_1, \ldots, m_n \in \mathbb{Z}_+$. As before, it
follows that $P_\mu = \mathcal{L}(\zeta_\mu)$ is a probability kernel from $\mathcal{N}(S)$ to $\mathcal{N}(S \times T)$,
and the general result follows by Lemma 6.9. □
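The construction in part (i) of the proof translates directly into a sampling scheme: draw a Poisson number of points, place them independently according to the normalized intensity, and superpose independent pieces when the space is split into sets of finite mass. The sketch below follows these two steps for a multiple of Lebesgue measure on an interval. All function names, the Knuth-type sampler, and the chosen intensity are assumptions of this example, and only finitely many pieces are handled.

```python
import math
import random

def sample_poisson(mean, rng):
    """Draw a Poisson variable by multiplying uniforms (Knuth's method)."""
    limit, count, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        count += 1
        prod *= rng.random()
    return count

def poisson_process(total_mass, sample_point, rng):
    """Atoms of a Poisson process with finite intensity mu.

    `total_mass` is mu(S) and `sample_point(rng)` draws from mu / mu(S).
    """
    kappa = sample_poisson(total_mass, rng)
    return [sample_point(rng) for _ in range(kappa)]

def superpose(pieces, rng):
    """Sum of independent Poisson processes on disjoint sets B_k.

    `pieces` lists pairs (mu(B_k), sampler for the normalized restriction).
    """
    atoms = []
    for mass, sampler in pieces:
        atoms.extend(poisson_process(mass, sampler, rng))
    return atoms

rng = random.Random(2024)                    # fixed seed for reproducibility
# Intensity 3 * Lebesgue on (0, 2], split into the pieces (0, 1] and (1, 2].
pieces = [(3.0, lambda r: 1.0 - r.random()),
          (3.0, lambda r: 2.0 - r.random())]
runs = 20000
counts = [sum(1 for s in superpose(pieces, rng) if s <= 0.5) for _ in range(runs)]
avg = sum(counts) / runs                     # should be near 3 * 0.5 = 1.5
```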
We may use Cox transformations and thinnings to derive some general
uniqueness criteria for simple point processes and diffuse random measures,
improving the elementary statements in Lemma 12.1. Related convergence
criteria are given in Proposition 16.17 and Theorems 16.28 and 16.29.
Theorem 12.8 (one-dimensional uniqueness criteria) Let $(S, \mathcal{S})$ be Borel.
(i) For any simple point processes $\xi$ and $\eta$ on $S$, we have $\xi \overset{d}{=} \eta$ iff
$P\{\xi B = 0\} = P\{\eta B = 0\}$ for all $B \in \mathcal{S}$.
(ii) Let $\xi$ and $\eta$ be simple point processes or diffuse random measures on
$S$, and fix any $c > 0$. Then $\xi \overset{d}{=} \eta$ iff $Ee^{-c\xi B} = Ee^{-c\eta B}$ for all $B \in \mathcal{S}$.
(iii) Let $\xi$ be a simple point process or diffuse random measure on $S$, and
let $\eta$ be an arbitrary random measure on $S$. Then $\xi \overset{d}{=} \eta$ iff $\xi B \overset{d}{=} \eta B$
for all $B \in \mathcal{S}$.
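Proof: We may clearly assume that $S = (0, 1]$.
(i) Let $\mathcal{C}$ denote the class of sets $\{\mu;\, \mu B = 0\}$ with $B \in \mathcal{S}$, and note that
$\mathcal{C}$ is a $\pi$-system since
$$\{\mu B = 0\} \cap \{\mu C = 0\} = \{\mu(B \cup C) = 0\}, \qquad B, C \in \mathcal{S}.$$
By Theorem 1.1 it follows that $\xi \overset{d}{=} \eta$ on $\sigma(\mathcal{C})$. Furthermore, writing $I_{nj} =
2^{-n}(j - 1, j]$ for $n \in \mathbb{N}$ and $j = 1, \ldots, 2^n$, we have
$$\mu^* B = \lim_{n \to \infty} \sum_j (\mu(B \cap I_{nj}) \wedge 1), \qquad \mu \in \mathcal{N}(S), \; B \in \mathcal{S},$$
which shows that the mapping $\mu \mapsto \mu^*$ is $\sigma(\mathcal{C})$-measurable. Since $\xi$ and $\eta$
are simple, we conclude that $\xi = \xi^* \overset{d}{=} \eta^* = \eta$.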
(ii) First let $\xi$ and $\eta$ be diffuse. By Theorem 12.7 we may choose some
Cox processes $\tilde\xi$ and $\tilde\eta$ directed by $c\xi$ and $c\eta$. Conditioning on $\xi$ or $\eta$,
respectively, we obtain
$$P\{\tilde\xi B = 0\} = Ee^{-c\xi B} = Ee^{-c\eta B} = P\{\tilde\eta B = 0\}, \qquad B \in \mathcal{S}. \tag{2}$$
Since $\tilde\xi$ and $\tilde\eta$ are a.s. simple by Corollary 12.5, assertion (i) yields $\tilde\xi \overset{d}{=} \tilde\eta$,
and so $\xi \overset{d}{=} \eta$ by Lemma 12.6. If $\xi$ and $\eta$ are instead simple point processes,
then (2) holds by Lemma 12.2 (iv) when $\tilde\xi$ and $\tilde\eta$ are $p$-thinnings of $\xi$ and
$\eta$ with $p = 1 - e^{-c}$, and the proof may be completed as before.
(iii) First let $\xi$ be a simple point process. Fix any $B \in \mathcal{S}$ such that
$\eta B < \infty$ a.s. Defining $I_{nj}$ as before, we note that $\eta(B \cap I_{nj}) \in \mathbb{Z}_+$ outside
a fixed null set. It follows easily that $1_B \cdot \eta$ is a.s. integer valued, and so
even $\eta$ is a.s. a point process. Noting that
$$P\{\eta^* B = 0\} = P\{\eta B = 0\} = P\{\xi B = 0\}, \qquad B \in \mathcal{S},$$
we conclude from (i) that $\xi \overset{d}{=} \eta^*$. In particular, $\eta B \overset{d}{=} \xi B \overset{d}{=} \eta^* B$ for all $B$,
and so $\eta^* = \eta$ a.s.
Next assume that $\xi$ is a.s. diffuse. Letting $\tilde\xi$ and $\tilde\eta$ be Cox processes
directed by $\xi$ and $\eta$, we note that $\tilde\xi B \overset{d}{=} \tilde\eta B$ for every $B \in \mathcal{S}$. Since $\tilde\xi$ is a.s.
simple by Corollary 12.5, it follows as before that $\tilde\xi \overset{d}{=} \tilde\eta$, and so $\xi \overset{d}{=} \eta$ by
Lemma 12.6. □
As an easy consequence, we get the following characterization of Pois-
son processes. To simplify the statement, we may allow a Poisson random
variable to have infinite mean, hence to be a.s. infinite.
Corollary 12.9 (one-dimensional Poisson criterion, Rényi) Let $\xi$ be a
random measure on a Borel space $S$ such that $\xi\{s\} = 0$ a.s. for all $s \in S$.
Then $\xi$ is a Poisson process iff $\xi B$ is Poisson for every $B \in \mathcal{S}$, in which
case $E\xi$ is $\sigma$-finite and diffuse.
Proof: Assume the stated condition. Then $\mu = E\xi$ is clearly $\sigma$-finite and
diffuse, and by Theorem 12.7 there exists a Poisson process $\eta$ on $S$ with
intensity $\mu$. Then $\eta B \overset{d}{=} \xi B$ for all $B \in \mathcal{S}$, and since $\eta$ is a.s. simple by
Corollary 12.5, we conclude from Theorem 12.8 that $\xi \overset{d}{=} \eta$. □
Much of the previous theory can be extended to the case of marks. Given
any measurable spaces $(S, \mathcal{S})$ and $(K, \mathcal{K})$, we define a $K$-marked point process
on $S$ as a point process $\xi$ on $S \times K$ in the usual sense satisfying
$\xi(\{s\} \times K) \le 1$ identically and such that the projections $\xi(\cdot \times K_j)$ are
$\sigma$-finite point processes on $S$ for some measurable partition $K_1, K_2, \ldots$
of $K$.
We say that $\xi$ has independent increments if the point processes $\xi(B_1 \times \cdot)$,
$\ldots$, $\xi(B_n \times \cdot)$ on $K$ are independent for any disjoint sets $B_1, \ldots, B_n \in \mathcal{S}$. We
also say that $\xi$ is a Poisson process if $\xi$ is Poisson in the usual sense on the
product space $S \times K$. The following result characterizes Poisson processes
in terms of the independence property. The result plays a crucial role in
Chapters 15 and 22. A related characterization in terms of compensators
is given in Corollary 25.25.
Theorem 12.10 (independence criterion for Poisson, Erlang, Lévy) Let $\xi$
be a $K$-marked point process on a Borel space $S$ such that $\xi(\{s\} \times K) = 0$
a.s. for all $s \in S$. Then $\xi$ is Poisson iff it has independent increments, in
which case $E\xi$ is $\sigma$-finite with diffuse projections onto $S$.
Proof: We may assume that $S = (0, 1]$. Fix any set $B \in \mathcal{S} \otimes \mathcal{K}$ with
$\xi B < \infty$ a.s., and note that the projection $\eta = (1_B \cdot \xi)(\cdot \times K)$ is a simple
point process on $S$ with independent increments such that $\eta\{s\} = 0$ a.s.
for all $s \in S$. Introduce the dyadic intervals $I_{nj} = 2^{-n}(j - 1, j]$, and note
that $\max_j \eta I_{nj} \vee 1 \to 1$ a.s.
Next fix any $\varepsilon > 0$. By dominated convergence, every point $s \in [0, 1]$ has
an open neighborhood $G_s$ such that $P\{\eta G_s > 0\} < \varepsilon$, and by compactness
we may cover $[0, 1]$ by finitely many such sets $G_1, \ldots, G_m$. Choosing $n$ so
large that every interval $I_{nj}$ lies in one of the $G_k$, we get $\max_j P\{\eta I_{nj} >
0\} < \varepsilon$. This shows that the variables $\eta I_{nj}$ form a null array.
Now apply Theorem 5.7 to see that the random variable $\xi B = \eta S =
\sum_j \eta I_{nj}$ is Poisson. Since $B$ was arbitrary, Corollary 12.9 then shows that
$\xi$ is a Poisson process on $S \times K$. The last assertion is now obvious. □
The last theorem yields in particular a representation of random measures
with independent increments. A version for general processes on $\mathbb{R}_+$
will be proved in Theorem 15.4.
Corollary 12.11 (independent increments) Let $\xi$ be a random measure
on a Borel space $S$ such that $\xi\{s\} = 0$ a.s. for all $s$. Then $\xi$ has independent
increments iff a.s.
$$\xi B = \alpha B + \int_0^\infty x\,\eta(B \times dx), \qquad B \in \mathcal{S}, \tag{3}$$
for some nonrandom measure $\alpha$ on $S$ and some Poisson process $\eta$ on $S \times
(0, \infty)$. Furthermore, $\xi B < \infty$ a.s. for some $B \in \mathcal{S}$ iff $\alpha B < \infty$ and
$$\int_0^\infty (x \wedge 1)\,E\eta(B \times dx) < \infty. \tag{4}$$
Proof: Introduce on $S \times (0, \infty)$ the point process $\eta = \sum_s \delta_{s, \xi\{s\}}$, where
the required measurability follows by a simple approximation. Noting that
$\eta$ has independent $S$-increments, and also that
$$\eta(\{s\} \times (0, \infty)) = 1\{\xi\{s\} > 0\} \le 1, \qquad s \in S,$$
we conclude from Theorem 12.10 that $\eta$ is a Poisson process. Subtracting
the atomic part from $\xi$, we get a diffuse random measure $\alpha$ satisfying (3),
and we note that $\alpha$ has again independent increments. Hence, $\alpha$ is a.s.
nonrandom by Theorem 5.11. Next, Lemma 12.2 (i) yields for any $B \in \mathcal{S}$
and $r > 0$
$$-\log E\exp\Big\{-r\int_0^\infty x\,\eta(B \times dx)\Big\} = \int_0^\infty (1 - e^{-rx})\,E\eta(B \times dx).$$
As $r \to 0$, it follows by dominated convergence that $\int_0^\infty x\,\eta(B \times dx) < \infty$
a.s. iff (4) holds. □
We proceed to characterize the mixed Poisson and binomial processes by
a natural symmetry condition. Related results for more general processes
appear in Theorems 11.15 and 16.21. Given a random measure $\xi$ and a
diffuse measure $\mu$ on $S$, we say that $\xi$ is $\mu$-symmetric if $\xi \circ f^{-1} \overset{d}{=} \xi$ for
every $\mu$-preserving mapping $f$ on $S$.
Theorem 12.12 (symmetric point processes) Consider a simple point
process $\xi$ and a diffuse, $\sigma$-finite measure $\mu$ on a Borel space $S$. Then $\xi$
is $\mu$-symmetric iff it is a mixed Poisson or binomial process based on $\mu$.
Proof: By Theorem 12.4 and scaling we may assume that $\mu S = 1$. By
the symmetry of $\xi$ there exists a function $\varphi$ on $[0, 1]$ such that $P\{\xi B =
0\} = \varphi(\mu B)$ for all $B$, and by Theorem 12.8 (i) it is enough to show that
$\varphi$ has the desired form. For notational convenience, we may then assume
that $\mu$ equals Lebesgue measure on $(0, 1]$, the general case being similar.
Then introduce for suitable $j, n \in \mathbb{N}$ the intervals $I_{nj} = n^{-1}(j - 1, j]$, and
put $\xi_{nj} = \xi I_{nj} \wedge 1$. Writing $\kappa_n = \sum_j \xi_{nj}$, we get by symmetry
$$\varphi(k/n) = E\prod_{j=0}^{k-1} \frac{n - \kappa_n - j}{n - j}, \qquad 0 \le k \le n.$$
As $n \to \infty$, we have $\kappa_n \to \kappa = \xi(0, 1]$, and so for $k/n \to t \in (0, 1)$
$$\begin{aligned}
\log \prod_{j=0}^{k-1} \frac{n - \kappa_n - j}{n - j}
&= \sum_{r=n-k+1}^{n} \log\Big(1 - \frac{\kappa_n}{r}\Big) \\
&\sim -\kappa \sum_{r=n-k+1}^{n} r^{-1} \sim -\kappa \int_{n-k}^{n} x^{-1}\,dx \\
&= \kappa \log(1 - k/n) \to \kappa \log(1 - t).
\end{aligned}$$
Hence, the product on the left tends to $(1 - t)^\kappa$, and so by dominated
convergence we get $\varphi(t) = E(1 - t)^\kappa$ for rational $t \in (0, 1)$, which extends
by monotonicity to all real $t \in [0, 1]$. This clearly agrees with the result for
a mixed binomial process on $(0, 1]$ with $\kappa$ points. □
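The limit used in the proof can be observed numerically. For a fixed number `kappa` of occupied cells, the sketch below evaluates the product from the proof and compares it with $(1 - t)^\kappa$ as the number of cells grows. The function name and the particular values of `t` and `kappa` are assumptions chosen for illustration.

```python
import math

def empty_probability(n, k, kappa):
    """prod_{j<k} (n - kappa - j) / (n - j).

    This is the chance that k given cells out of n are all empty when
    kappa of the n cells are occupied and all placements are equally likely.
    """
    prod = 1.0
    for j in range(k):
        prod *= (n - kappa - j) / (n - j)
    return prod

t, kappa = 0.3, 4
limit = (1.0 - t) ** kappa                   # the limiting value (1 - t)^kappa
errors = [abs(empty_probability(n, int(t * n), kappa) - limit)
          for n in (10, 100, 1000, 100000)]
```

The same product equals the ratio of binomial coefficients `math.comb(n - k, kappa) / math.comb(n, kappa)`, which gives an independent check for small `n`.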
Integrals with respect to Poisson processes occur frequently in applications.
The next result gives criteria for the existence of the integrals $\xi f$,
$(\xi - \xi')f$, and $(\xi - \mu)f$, where $\xi$ and $\xi'$ are independent Poisson processes
with a common intensity measure $\mu$. In each case the integral may be
defined as a limit in probability of elementary integrals $\xi f_n$, $(\xi - \xi')f_n$, or
$(\xi - \mu)f_n$, respectively, where the $f_n$ are bounded with compact support and
such that $|f_n| \le |f|$ and $f_n \to f$. We say that the integral of $f$ exists if the
appropriate limit exists and is independent of the choice of approximating
functions $f_n$.
Lemma 12.13 (Poisson integrals) Let $\xi$ and $\xi'$ be independent Poisson
processes on $S$ with the same intensity measure $\mu$. Then for any measurable
function $f$ on $S$, we have
(i) $\xi f$ exists iff $\mu(|f| \wedge 1) < \infty$;
(ii) $(\xi - \xi')f$ exists iff $\mu(f^2 \wedge 1) < \infty$;
(iii) $(\xi - \mu)f$ exists iff $\mu(f^2 \wedge |f|) < \infty$.
In each case, it is also equivalent that the corresponding set of approximating
elementary integrals is tight.
Proof: (i) If $\xi|f| < \infty$ a.s., then $\mu(|f| \wedge 1) < \infty$ by Lemma 12.2. The
converse implication was established in the proof of the same lemma.
(ii) First consider a deterministic counting measure $\nu = \sum_k \delta_{s_k}$, and
define $\tilde\nu = \sum_k \vartheta_k \delta_{s_k}$, where $\vartheta_1, \vartheta_2, \ldots$ are i.i.d. random variables with
$P\{\vartheta_k = \pm 1\} = \tfrac{1}{2}$. By Theorem 4.17, the series $\tilde\nu f$ converges a.s. iff
$\nu f^2 < \infty$, and otherwise $|\tilde\nu f_n| \overset{P}{\to} \infty$ for any bounded approximations
$f_n = 1_{B_n} f$ with $B_n \in \hat{\mathcal{S}}$. The result extends by conditioning to arbitrary
point processes $\nu$ and their symmetric randomizations $\tilde\nu$. Now Proposition
12.3 exhibits $\xi - \xi'$ as such a randomization of the Poisson process $\xi + \xi'$,
and by part (i) we have $(\xi + \xi')f^2 < \infty$ a.s. iff $\mu(f^2 \wedge 1) < \infty$.
(iii) Write $f = g + h$, where $g = f1\{|f| \le 1\}$ and $h = f1\{|f| > 1\}$. First
assume that $\mu g^2 + \mu|h| = \mu(f^2 \wedge |f|) < \infty$. Since clearly $E(\xi f - \mu f)^2 = \mu f^2$,
the integral $(\xi - \mu)g$ exists. Furthermore, $\xi h$ exists by part (i). Hence, even
$(\xi - \mu)f = (\xi - \mu)g + \xi h - \mu h$ exists.
Conversely, assume that $(\xi - \mu)f$ exists. Then so does $(\xi - \xi')f$, and by
part (ii) we get $\mu g^2 + \mu\{h \ne 0\} = \mu(f^2 \wedge 1) < \infty$. The existence of $(\xi - \mu)g$
now follows by the direct assertion, and trivially even $\xi h$ exists. Thus, the
existence of $\mu h = (\xi - \mu)g + \xi h - (\xi - \mu)f$ follows, and so $\mu|h| < \infty$. □
A Poisson process $\xi$ on $\mathbb{R}_+$ is said to be time-homogeneous with rate $c \ge 0$
if $E\xi = c\lambda$. In that case Proposition 8.5 shows that $N_t = \xi[0, t]$, $t \ge 0$, is a
space- and time-homogeneous Markov process. We now introduce a more
general class of Markov processes.
Say that a process $X$ in some measurable space $(S, \mathcal{S})$ is of pure jump
type if its paths are a.s. right-continuous and constant apart from isolated
jumps. In that case we may denote the jump times of $X$ by $\tau_1, \tau_2, \ldots$, with
the understanding that $\tau_n = \infty$ if there are fewer than $n$ jumps. By Lemma
7.3 and a simple approximation, the times $\tau_n$ are optional with respect to
the right-continuous filtration $\mathcal{F} = (\mathcal{F}_t)$ induced by $X$. For convenience we
may choose $X$ to be the identity mapping on the canonical path space $\Omega$.
When $X$ is Markov, the distribution with initial state $x$ is denoted by $P_x$,
and we note that the mapping $x \mapsto P_x$ is a kernel from $(S, \mathcal{S})$ to $(\Omega, \mathcal{F}_\infty)$.
We begin our study of pure jump-type Markov processes by proving an
extension of the elementary strong Markov property in Proposition 8.9. A
further extension appears as Theorem 19.17.
Theorem 12.14 (strong Markov property, Doob) A pure jump-type
Markov process satisfies the strong Markov property at every optional time.
Proof: For any optional time $\tau$, we may choose some optional times $\sigma_n \ge
\tau + 2^{-n}$ taking countably many values such that $\sigma_n \to \tau$ a.s. By Proposition
8.9 we get, for any $A \in \mathcal{F}_\tau \cap \{\tau < \infty\}$ and $B \in \mathcal{F}_\infty$,
$$P[\theta_{\sigma_n} X \in B;\, A] = E[P_{X_{\sigma_n}} B;\, A]. \tag{5}$$
By the right-continuity of $X$, we have $P\{X_{\sigma_n} \ne X_\tau\} \to 0$. If $B$ depends
on finitely many coordinates, it is also clear that
$$P(\{\theta_{\sigma_n} X \in B\} \,\triangle\, \{\theta_\tau X \in B\}) \to 0, \qquad n \to \infty.$$
Hence, (5) remains true for such sets $B$ with $\sigma_n$ replaced by $\tau$, and the
relation extends to the general case by a monotone class argument. □
We shall now see how the homogeneous Poisson processes may be
characterized as special renewal processes. Recall that a random variable $\gamma$ is
said to be exponentially distributed with rate $c > 0$ if $P\{\gamma > t\} = e^{-ct}$ for
all $t \ge 0$. In this case, clearly $E\gamma = c^{-1}$.
Proposition 12.15 (Poisson and renewal processes) Let $\xi$ be a simple
point process on $\mathbb{R}_+$ with atoms at $\tau_1 < \tau_2 < \cdots$, and put $\tau_0 = 0$. Then $\xi$
is homogeneous Poisson with rate $c > 0$ iff the differences $\tau_n - \tau_{n-1}$ are
i.i.d. and exponentially distributed with mean $c^{-1}$.
Proof: First assume that $\xi$ is Poisson with rate $c$. Then $N_t = \xi[0, t]$ is a
space- and time-homogeneous pure jump-type Markov process. By Lemma
7.6 and Theorem 12.14, the strong Markov property holds at each $\tau_n$, and
by Theorem 8.10 we get
$$\tau_1 \overset{d}{=} \tau_{n+1} - \tau_n \perp\!\!\!\perp (\tau_1, \ldots, \tau_n), \qquad n \in \mathbb{N}.$$
Thus, the variables $\tau_n - \tau_{n-1}$ are i.i.d., and it remains to note that
$P\{\tau_1 > t\} = P\{\xi[0, t] = 0\} = e^{-ct}$.
Conversely, assume that $\tau_1, \tau_2, \ldots$ have the stated properties. Consider
a homogeneous Poisson process $\eta$ with rate $c$ and with atoms at $\sigma_1 < \sigma_2 <
\cdots$, and conclude from the necessity part that $(\sigma_n) \overset{d}{=} (\tau_n)$. Hence,
$\xi = \sum_n \delta_{\tau_n} \overset{d}{=} \sum_n \delta_{\sigma_n} = \eta$. □
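The renewal description gives a simple way to simulate a homogeneous Poisson process on a bounded interval: add up i.i.d. exponential gaps until the horizon is passed. The sketch below does this and compares the empirical mean and variance of the number of atoms in $[0, t]$ with the common Poisson value $ct$. The function name, seed, and parameters are assumptions made for this example.

```python
import random

def poisson_atoms(rate, horizon, rng):
    """Atoms tau_1 < tau_2 < ... in [0, horizon] from i.i.d. exponential gaps."""
    atoms, t = [], rng.expovariate(rate)     # each gap has mean 1 / rate
    while t <= horizon:
        atoms.append(t)
        t += rng.expovariate(rate)
    return atoms

rng = random.Random(7)                       # fixed seed for reproducibility
rate, horizon, runs = 2.0, 3.0, 10000
counts = [len(poisson_atoms(rate, horizon, rng)) for _ in range(runs)]
avg = sum(counts) / runs
var = sum((k - avg) ** 2 for k in counts) / (runs - 1)
# For a Poisson count with mean rate * horizon = 6, avg and var are both near 6.
```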
We proceed to examine the structure of a general pure jump-type Markov
process. Here the first and crucial step is to describe the distributions
associated with the first jump. Say that a state $x \in S$ is absorbing if
$P_x\{X \equiv x\} = 1$ or, equivalently, if $P_x\{\tau_1 = \infty\} = 1$.
Lemma 12.16 (first jump) If $x$ is nonabsorbing, then under $P_x$ the time
$\tau_1$ until the first jump is exponentially distributed and independent of $\theta_{\tau_1} X$.
Proof: Put $\tau_1 = \tau$. Using the Markov property at fixed times, we get for
any $s, t \ge 0$
$$P_x\{\tau > s + t\} = P_x\{\tau > s,\; \tau \circ \theta_s > t\} = P_x\{\tau > s\}P_x\{\tau > t\}.$$
The only nonincreasing solutions to this Cauchy equation are of the form
$P_x\{\tau > t\} = e^{-ct}$ with $c \in [0, \infty]$. Since $x$ is nonabsorbing and $\tau > 0$ a.s.,
we have $c \in (0, \infty)$, and so $\tau$ is exponentially distributed with parameter
$c$.
By the Markov property at fixed times, we further get for any $B \in \mathcal{F}_\infty$
$$\begin{aligned}
P_x\{\tau > t,\; \theta_\tau X \in B\} &= P_x\{\tau > t,\; (\theta_\tau X) \circ \theta_t \in B\} \\
&= P_x\{\tau > t\}P_x\{\theta_\tau X \in B\},
\end{aligned}$$
which shows that $\tau \perp\!\!\!\perp \theta_\tau X$. □
Writing $X_\infty = x$ when $X$ is eventually absorbed at $x$, we may define the
rate function $c$ and jump transition kernel $\mu$ by
$$c(x) = (E_x\tau_1)^{-1}, \qquad \mu(x, B) = P_x\{X_{\tau_1} \in B\}, \qquad x \in S, \; B \in \mathcal{S}.$$
It is often convenient to combine $c$ and $\mu$ into a rate kernel $\alpha(x, B) =
c(x)\mu(x, B)$ or $\alpha = c\mu$, where the required measurability is clear from
12. Poisson and Pure Jump- Type Markov Processes 239
that for the kernel (P x ). Note that jj may be reconstrueted from Q, if we
add the requirement that jj(x,.) == 8x when a(x,.) == 0, conforming with
our convention for absorbing states. This ensures that J.L is a measurable
function of ();.
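On a finite state space the reconstruction of $c$ and $\mu$ from $\alpha$ can be made concrete. The following is a minimal sketch, assuming the kernel is given as a matrix of off-diagonal rates `alpha[x][y]`; the function name `split_rate_kernel` and the example matrix are hypothetical.

```python
def split_rate_kernel(alpha):
    """Split off-diagonal rates alpha[x][y] into (c, mu) with alpha = c * mu."""
    n = len(alpha)
    c = [sum(alpha[x][y] for y in range(n) if y != x) for x in range(n)]
    mu = []
    for x in range(n):
        if c[x] == 0:
            # absorbing state: mu(x, .) = delta_x by the convention above
            mu.append([1.0 if y == x else 0.0 for y in range(n)])
        else:
            mu.append([0.0 if y == x else alpha[x][y] / c[x] for y in range(n)])
    return c, mu

alpha = [[0.0, 2.0, 1.0],
         [4.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]   # state 2 has total rate 0, hence is absorbing
c, mu = split_rate_kernel(alpha)
print(c)
print(mu)
```

Each row of `mu` is a probability vector, and the convention for absorbing rows is what makes the decomposition unique.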
The following theorem gives an explicit representation of the process
in terms of a discrete-time Markov chain and a sequence of exponentially
distributed random variables. The result shows in particular that the
distributions $P_x$ are uniquely determined by the rate kernel $\alpha$. As usual, we
assume the existence of required randomization variables.
Theorem 12.17 (embedded Markov chain) Let $X$ be a pure jump-type
Markov process with rate kernel $\alpha = c\mu$. Then there exist a Markov process
$Y$ on $\mathbb{Z}_+$ with transition kernel $\mu$ and an independent sequence of i.i.d.,
exponentially distributed random variables $\gamma_1, \gamma_2, \dots$ with mean 1 such that
a.s.
$$X_t = Y_n, \qquad t \in [\tau_n, \tau_{n+1}),\ n \in \mathbb{Z}_+, \tag{6}$$
where
$$\tau_n = \sum_{k=1}^{n} \frac{\gamma_k}{c(Y_{k-1})}, \qquad n \in \mathbb{Z}_+. \tag{7}$$
Proof: To satisfy (6), put $\tau_0 = 0$, and define $Y_n = X_{\tau_n}$ for $n \in \mathbb{Z}_+$.
Introduce some i.i.d. exponentially distributed random variables $\gamma'_1, \gamma'_2, \dots \perp\!\!\!\perp X$
with mean 1, and define for $n \in \mathbb{N}$
$$\gamma_n = (\tau_n - \tau_{n-1})\, c(Y_{n-1})\, 1\{\tau_{n-1} < \infty\} + \gamma'_n\, 1\{c(Y_{n-1}) = 0\}.$$
By Lemma 12.16, we get for any $t \ge 0$, $B \in \mathcal{S}$, and $x \in S$ with $c(x) > 0$
$$P_x\{\gamma_1 > t,\ Y_1 \in B\} = P_x\{\tau_1 c(x) > t,\ Y_1 \in B\} = e^{-t} \mu(x, B),$$
and this clearly remains true when $c(x) = 0$. By the strong Markov property
we obtain for every $n$, a.s. on $\{\tau_n < \infty\}$,
$$P_x[\gamma_{n+1} > t,\ Y_{n+1} \in B \mid \mathcal{F}_{\tau_n}] = P_{Y_n}\{\gamma_1 > t,\ Y_1 \in B\} = e^{-t} \mu(Y_n, B). \tag{8}$$
The strong Markov property also gives $\tau_{n+1} < \infty$ a.s. on the set $\{\tau_n < \infty,\
c(Y_n) > 0\}$. Arguing recursively, we get $\{c(Y_n) = 0\} = \{\tau_{n+1} = \infty\}$ a.s.,
and (7) follows. Using the same relation, it is also easy to check that (8)
remains a.s. true on $\{\tau_n = \infty\}$, and in both cases we may clearly replace
$\mathcal{F}_{\tau_n}$ by $\mathcal{G}_n = \mathcal{F}_{\tau_n} \vee \sigma\{\gamma'_1, \dots, \gamma'_n\}$. Thus, the pairs $(\gamma_n, Y_n)$ form a
discrete-time Markov process with the desired transition kernel. By Proposition 8.2,
the latter property together with the initial distribution determine uniquely
the joint distribution of $Y$ and $(\gamma_n)$. □
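The representation (6) and (7) is also a simulation recipe. The sketch below, which is an illustration and not part of the original text, evaluates $X_t$ from an embedded chain and unit exponentials for a two-state chain with rates $c = (1, 2)$, and compares the empirical frequency of $\{X_t = 0\}$ under $P_0$ with the classical closed form $\tfrac{2}{3} + \tfrac{1}{3} e^{-3t}$ for this particular chain. The name `state_at`, the seed, and the parameters are assumptions.

```python
import math
import random

def state_at(t, x0, c, mu, rng):
    """Return X_t built from the embedded chain Y and unit exponentials gamma_k, as in (6)-(7)."""
    y, tau = x0, 0.0
    while True:
        if c[y] == 0:
            return y                          # absorbing state: tau_{n+1} is infinite
        tau += rng.expovariate(1.0) / c[y]    # tau_{n+1} = tau_n + gamma_{n+1} / c(Y_n)
        if tau > t:
            return y                          # t lies in [tau_n, tau_{n+1})
        y = rng.choices(range(len(c)), weights=mu[y])[0]   # Y_{n+1} drawn from mu(Y_n, .)

rng = random.Random(1)
c = [1.0, 2.0]
mu = [[0.0, 1.0], [1.0, 0.0]]
t, runs = 0.7, 20000
freq = sum(state_at(t, 0, c, mu, rng) == 0 for _ in range(runs)) / runs
exact = 2.0 / 3.0 + math.exp(-3.0 * t) / 3.0
print(freq, exact)  # the two values should be close
```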
In applications the rate kernel $\alpha$ is normally given, and one needs to know
whether a corresponding Markov process $X$ exists. As before we may write
$\alpha(x, B) = c(x)\mu(x, B)$ for a suitable choice of rate function $c\colon S \to \mathbb{R}_+$ and
transition kernel $\mu$ on $S$, where $\mu(x, \cdot) = \delta_x$ when $c(x) = 0$ and otherwise
$\mu(x, \{x\}) = 0$. If $X$ does exist, it clearly may be constructed as in Theorem
12.17. The construction fails when $\zeta \equiv \sup_n \tau_n < \infty$, in which case an
explosion is said to occur at time $\zeta$.
Theorem 12.18 (synthesis) For any kernel $\alpha = c\mu$ on $S$ with $\alpha(x, \{x\}) \equiv 0$,
consider a Markov chain $Y$ with transition kernel $\mu$ and some i.i.d.,
exponentially distributed random variables $\gamma_1, \gamma_2, \dots \perp\!\!\!\perp Y$ with mean 1. Assume
that $\sum_n \gamma_n / c(Y_{n-1}) = \infty$ a.s. under every initial distribution for $Y$. Then
(6) and (7) define a pure jump-type Markov process with rate kernel $\alpha$.
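Proof: Let $P_x$ be the distribution of the sequences $Y = (Y_n)$ and $\Gamma =
(\gamma_n)$ when $Y_0 = x$. For convenience, we may regard $(Y, \Gamma)$ as the identity
mapping on the canonical space $\Omega = S^\infty \times \mathbb{R}_+^\infty$. Construct $X$ from $(Y, \Gamma)$
as in (6) and (7), with $X_t = s_0$ arbitrary for $t \ge \sup_n \tau_n$, and introduce
the filtrations $\mathcal{G} = (\mathcal{G}_n)$ induced by $(Y, \Gamma)$ and $\mathcal{F} = (\mathcal{F}_t)$ induced by $X$. It
suffices to prove the Markov property $P_x[\theta_t X \in \cdot \mid \mathcal{F}_t] = P_{X_t}\{X \in \cdot\}$, since
the rate kernel may then be identified via Theorem 12.17.
Then fix any $t \ge 0$ and $n \in \mathbb{Z}_+$, and define
$$\kappa = \sup\{k;\ \tau_k \le t\}, \qquad \beta = (t - \tau_n)\, c(Y_n).$$
Put $T^m(Y, \Gamma) = \{(Y_k, \gamma_{k+1});\ k \ge m\}$, $(Y', \Gamma') = T^{n+1}(Y, \Gamma)$, and $\gamma' =
\gamma_{n+1}$. Since clearly
$$\mathcal{F}_t = \mathcal{G}_n \vee \sigma\{\gamma' > \beta\} \quad \text{on } \{\kappa = n\},$$
it is enough by Lemma 6.2 to prove that
$$P_x[(Y', \Gamma') \in \cdot,\ \gamma' - \beta > r \mid \mathcal{G}_n,\ \gamma' > \beta] = P_{Y_n}\{T(Y, \Gamma) \in \cdot,\ \gamma_1 > r\}.$$
Now $(Y', \Gamma') \perp\!\!\!\perp_{\mathcal{G}_n} (\gamma', \beta)$ because $\gamma' \perp\!\!\!\perp (\mathcal{G}_n, Y', \Gamma')$, and so the left-hand side
equals
$$\frac{P_x[(Y', \Gamma') \in \cdot,\ \gamma' - \beta > r \mid \mathcal{G}_n]}{P_x[\gamma' > \beta \mid \mathcal{G}_n]}
= P_x[(Y', \Gamma') \in \cdot \mid \mathcal{G}_n]\, \frac{P_x[\gamma' - \beta > r \mid \mathcal{G}_n]}{P_x[\gamma' > \beta \mid \mathcal{G}_n]}
= (P_{Y_n} \circ T^{-1})\, e^{-r},$$
as required.
□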
To complete the picture, we need a convenient criterion for nonexplosion.
Proposition 12.19 (explosion) For any rate kernel $\alpha$ and initial state $x$,
let $(Y_n)$ and $(\tau_n)$ be such as in Theorem 12.17. Then a.s.
$$\tau_n \to \infty \quad \text{iff} \quad \sum_n \{c(Y_n)\}^{-1} = \infty. \tag{9}$$
In particular, $\tau_n \to \infty$ a.s. when $x$ is recurrent for $(Y_n)$.
Proof: Write $\beta_n = \{c(Y_{n-1})\}^{-1}$. Noting that $E e^{-u\gamma_n} = (1 + u)^{-1}$ for all
$u \ge 0$, we get by (7) and Fubini's theorem
$$E[e^{-u\zeta} \mid Y] = \prod_n (1 + u\beta_n)^{-1} = \exp\Big\{ -\sum_n \log(1 + u\beta_n) \Big\} \quad \text{a.s.} \tag{10}$$
Since $\tfrac{1}{2}(r \wedge 1) \le \log(1 + r) \le r$ for all $r > 0$, the series on the right converges
for every $u > 0$ iff $\sum_n \beta_n < \infty$. Letting $u \to 0$ in (10), we get by dominated
convergence
$$P[\zeta < \infty \mid Y] = 1\Big\{ \sum_n \beta_n < \infty \Big\} \quad \text{a.s.},$$
which implies (9). If $x$ is visited infinitely often, then the series $\sum_n \beta_n$ has
infinitely many terms $c_x^{-1} > 0$, and the last assertion follows. □
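Criterion (9) can be seen at work on a pure birth chain $Y_k = k$ (a hypothetical example, not from the original text). With $c(k) = (k+1)^2$ the series $\sum_k c(k)^{-1}$ converges to $\pi^2/6$, so the jump times $\tau_n$ stay bounded in mean and the process explodes; with $c(k) = k+1$ the series diverges and $\tau_n$ grows like $\log n$. The sketch below estimates $E\tau_n$ in both cases; function names, seed, and sample sizes are assumptions.

```python
import math
import random

def jump_time(n, rate, rng):
    """tau_n = sum_{k=1}^n gamma_k / c(Y_{k-1}) for the pure birth chain Y_k = k."""
    return sum(rng.expovariate(1.0) / rate(k) for k in range(n))

def quadratic(k):
    return (k + 1.0) ** 2     # sum of 1/c converges: explosion

def linear(k):
    return k + 1.0            # sum of 1/c diverges: no explosion

rng = random.Random(2)
n = 1000
mean_quadratic = sum(jump_time(n, quadratic, rng) for _ in range(1000)) / 1000
mean_linear = sum(jump_time(n, linear, rng) for _ in range(200)) / 200
print(mean_quadratic, math.pi ** 2 / 6)   # close: E tau_n approaches pi^2 / 6
print(mean_linear)                        # close to the harmonic number H_n, which is unbounded in n
```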
By a pseudo-Poisson process in some measurable space $S$ we mean a
process of the form $X = Y \circ N$ a.s., where $Y$ is a discrete-time Markov
process in $S$ and $N$ is an independent homogeneous Poisson process. Letting
$\mu$ be the transition kernel of $Y$ and writing $c$ for the constant rate of $N$,
we may construct a kernel
$$\alpha(x, B) = c\,\mu(x, B \setminus \{x\}), \qquad x \in S,\ B \in \mathcal{B}(S), \tag{11}$$
which is measurable since $\mu(x, \{x\})$ is a measurable function of $x$. The next
result characterizes pseudo-Poisson processes in terms of the rate kernel.
Proposition 12.20 (pseudo-Poisson processes) A process $X$ in some
Borel space $S$ is pseudo-Poisson iff it is pure jump-type Markov with a
bounded rate function. Specifically, if $X = Y \circ N$ a.s. for some Markov
chain $Y$ with transition kernel $\mu$ and an independent Poisson process $N$
with constant rate $c$, then $X$ has the rate kernel in (11).
Proof: Assume that $X = Y \circ N$ with $Y$ and $N$ as stated. Letting $\tau_1, \tau_2, \dots$
be the jump times of $N$ and writing $\mathcal{F}$ for the filtration induced by the pair
$(X, N)$, it may be seen as in Theorem 12.18 that $X$ is $\mathcal{F}$-Markov. To identify
the rate kernel $\alpha$, fix any initial state $x$, and note that the first jump of
$X$ occurs at the first time $\tau_n$ when $Y_n$ leaves $x$. For each transition of $Y$,
this happens with probability $p_x = \mu(x, \{x\}^c)$. By Proposition 12.3 the
time until first jump is then exponentially distributed with parameter $c p_x$.
If $p_x > 0$, we further note that the location of $X$ after the first jump has
distribution $\mu(x, \cdot \setminus \{x\}) / p_x$. Thus, $\alpha$ is given by (11).
Conversely, let $X$ be a pure jump-type Markov process with uniformly
bounded rate kernel $\alpha \not\equiv 0$. Put $r_x = \alpha(x, S)$ and $c = \sup_x r_x$, and note
that the kernel
$$\mu(x, \cdot) = c^{-1}\{\alpha(x, \cdot) + (c - r_x)\delta_x\}, \qquad x \in S,$$
satisfies (11). Thus, if $X' = Y' \circ N'$ is a pseudo-Poisson process based on $\mu$
and $c$, then $X'$ is again Markov with rate kernel $\alpha$, and so $X \overset{d}{=} X'$. Hence,
Corollary 6.11 yields $X = Y \circ N$ a.s. for some pair $(Y, N) \overset{d}{=} (Y', N')$. □
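The converse construction in the proof is what is often called uniformization. Since $N_t$ is Poisson with mean $ct$ and independent of $Y$, the law of $X_t = Y_{N_t}$ is the Poisson mixture $\sum_k e^{-ct} (ct)^k \mu^k / k!$. The following sketch (an illustration under the assumption of a finite state space, with hypothetical function names) builds $\mu$ from a rate matrix and evaluates this series for the two-state chain with rates 1 and 2, for which $p^t_{00} = \tfrac{2}{3} + \tfrac{1}{3}e^{-3t}$ is classical.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def uniformize(alpha):
    """Return (c, mu) with mu(x, .) = (alpha(x, .) + (c - r_x) delta_x) / c and c = max_x r_x."""
    n = len(alpha)
    r = [sum(alpha[x][y] for y in range(n) if y != x) for x in range(n)]
    c = max(r)
    mu = [[(c - r[x]) / c if y == x else alpha[x][y] / c for y in range(n)] for x in range(n)]
    return c, mu

def transition_matrix(alpha, t, terms=60):
    """Law of Y at the Poisson time N_t: sum_k e^{-ct} (ct)^k / k! * mu^k, truncated after `terms` terms."""
    c, mu = uniformize(alpha)
    n = len(alpha)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # mu^0
    weight = math.exp(-c * t)                                       # Poisson weight for k = 0
    total = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        total = [[total[i][j] + weight * power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, mu)
        weight *= c * t / (k + 1)
    return total

alpha = [[0.0, 1.0], [2.0, 0.0]]
p = transition_matrix(alpha, 0.7)
print(p[0][0], 2.0 / 3.0 + math.exp(-2.1) / 3.0)   # the two values should agree closely
```

The truncation level `terms` is an arbitrary choice that is ample for small $ct$; larger values of $ct$ would need more terms.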
If the underlying Markov chain $Y$ is a random walk in some measurable
Abelian group $S$, then $X = Y \circ N$ is called a compound Poisson process. In
this case $X - X_0 \perp\!\!\!\perp X_0$, the jump sizes are i.i.d., and the jump times are
given by an independent homogeneous Poisson process. Thus, the distribution
of $X - X_0$ is determined by the characteristic measure $\nu = c\mu$, where $c$
is the rate of the jump time process and $\mu$ is the common distribution of the
jumps. A kernel $\alpha$ on $S$ is said to be homogeneous if $\alpha(x, B) = \alpha(0, B - x)$
for all $x$ and $B$. Let us also say that a process $X$ in $S$ has independent
increments if $X_t - X_s \perp\!\!\!\perp \{X_r;\ r \le s\}$ for any $s < t$.
The next result characterizes compound Poisson processes in two ways,
analytically in terms of the rate kernel and probabilistically in terms of the
increments of the process.
Corollary 12.21 (compound Poisson processes) For any pure jump-
type process X in some measurable Abelian group, these conditions are
equivalent:
(i) X is Markov with homogeneous rate kernel;
(ii) X has independent increments;
(iii) X is compound Poisson.
Proof: If a pure jump-type Markov process is space-homogeneous, then its
rate kernel is clearly homogeneous; the converse follows from the representa-
tion in Theorem 12.17. Thus, (i) and (ii) are equivalent by Proposition 8.5.
Next Theorem 12.17 shows that (i) implies (iii), and the converse follows
by Theorem 12.18. □
Our next aim is to derive a combined differential and integral equation
for the transition kernels $\mu_t$. An abstract version of this result appears in
Theorem 19.6. For any measurable and suitably integrable function $f\colon S \to
\mathbb{R}$, we define
$$T_t f(x) = \int f(y)\, \mu_t(x, dy) = E_x f(X_t), \qquad x \in S,\ t \ge 0.$$
Theorem 12.22 (backward equation, Kolmogorov) Let $\alpha$ be the rate kernel
of a pure jump-type Markov process on $S$, and fix any bounded,
measurable function $f\colon S \to \mathbb{R}$. Then $T_t f(x)$ is continuously differentiable
in $t$ for fixed $x$, and we have
$$\frac{\partial}{\partial t} T_t f(x) = \int \alpha(x, dy)\{T_t f(y) - T_t f(x)\}, \qquad t \ge 0,\ x \in S. \tag{12}$$
Proof: Put $\tau = \tau_1$, and let $x \in S$ and $t \ge 0$. By the strong Markov
property at $\sigma = \tau \wedge t$ and Theorem 6.4,
$$\begin{aligned}
T_t f(x) &= E_x f(X_t) = E_x f((\theta_\sigma X)_{t-\sigma}) = E_x T_{t-\sigma} f(X_\sigma) \\
&= f(x)\, P_x\{\tau > t\} + E_x[T_{t-\tau} f(X_\tau);\ \tau \le t] \\
&= f(x)\, e^{-t c_x} + \int_0^t e^{-s c_x}\, ds \int \alpha(x, dy)\, T_{t-s} f(y),
\end{aligned}$$
and so
$$e^{t c_x}\, T_t f(x) = f(x) + \int_0^t e^{s c_x}\, ds \int \alpha(x, dy)\, T_s f(y). \tag{13}$$
Here the use of the disintegration theorem is justified by the fact that
$X(\omega, t)$ is product measurable on $\Omega \times \mathbb{R}_+$ because of the right-continuity
of the paths.
From (13) we note that $T_t f(x)$ is continuous in $t$ for each $x$, and so by
dominated convergence the inner integral on the right is continuous in $s$.
Hence, $T_t f(x)$ is continuously differentiable in $t$, and (12) follows by an
easy computation. □
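On a finite state space, (12) is a linear system of ordinary differential equations with initial condition $T_0 f = f$. The sketch below (an illustration with hypothetical names, not the text's method of proof) integrates it with the classical fourth-order Runge-Kutta scheme for the two-state chain with rates 1 and 2 and $f = 1_{\{0\}}$, where the solution $T_t f(0) = \tfrac{2}{3} + \tfrac{1}{3}e^{-3t}$, $T_t f(1) = \tfrac{2}{3} - \tfrac{2}{3}e^{-3t}$ is known in closed form.

```python
import math

def backward_rhs(alpha, u):
    """Right-hand side of (12): sum over y of alpha(x, y) * (u(y) - u(x))."""
    n = len(u)
    return [sum(alpha[x][y] * (u[y] - u[x]) for y in range(n) if y != x) for x in range(n)]

def solve_backward(alpha, f, t, steps=1000):
    """Integrate d/dt T_t f = backward_rhs(T_t f) from T_0 f = f by the classical Runge-Kutta scheme."""
    h, u = t / steps, list(f)
    for _ in range(steps):
        k1 = backward_rhs(alpha, u)
        k2 = backward_rhs(alpha, [a + h / 2 * b for a, b in zip(u, k1)])
        k3 = backward_rhs(alpha, [a + h / 2 * b for a, b in zip(u, k2)])
        k4 = backward_rhs(alpha, [a + h * b for a, b in zip(u, k3)])
        u = [a + h / 6 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(u, k1, k2, k3, k4)]
    return u

alpha = [[0.0, 1.0], [2.0, 0.0]]
u = solve_backward(alpha, [1.0, 0.0], 0.7)   # f = indicator of state 0
print(u[0], 2.0 / 3.0 + math.exp(-2.1) / 3.0)
print(u[1], 2.0 / 3.0 - 2.0 * math.exp(-2.1) / 3.0)
```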
The next result relates the invariant distributions of a pure jump-type
Markov process to those of the embedded Markov chain.
Proposition 12.23 (invariance) Let the processes $X$ and $Y$ be related as
in Theorem 12.17, and fix a probability measure $\nu$ on $S$ with $\int c\, d\nu < \infty$.
Then $\nu$ is invariant for $X$ iff $c \cdot \nu$ is invariant for $Y$.
Proof: By Theorem 12.22 and Fubini's theorem, we have for any bounded
measurable function $f\colon S \to \mathbb{R}$
$$E_\nu f(X_t) = \int f(x)\, \nu(dx) + \int_0^t ds \int \nu(dx) \int \alpha(x, dy)\{T_s f(y) - T_s f(x)\}.$$
Thus, $\nu$ is invariant for $X$ iff the second term on the right is identically
zero. Now (12) shows that $T_t f(x)$ is continuous in $t$, and by dominated
convergence this is also true for the integral
$$I_t = \int \nu(dx) \int \alpha(x, dy)\{T_t f(y) - T_t f(x)\}, \qquad t \ge 0.$$
Thus, the condition becomes $I_t \equiv 0$. Since $f$ is arbitrary, it is enough to
take $t = 0$. Our condition then reduces to $(\nu\alpha) f \equiv \nu(c f)$ or $(c \cdot \nu)\mu = c \cdot \nu$,
which means that $c \cdot \nu$ is invariant for $Y$. □
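For a finite, irreducible chain the correspondence in Proposition 12.23 can be checked directly: if $\pi$ is invariant for $Y$, then $\nu \propto \pi / c$ should satisfy the balance condition $\sum_x \nu_x \alpha(x, y) = \nu_y c_y$, which is the condition $(\nu\alpha) f \equiv \nu(cf)$ evaluated on indicator functions. A minimal sketch follows; the rate matrix is a hypothetical example, and power iteration is merely one convenient way to find $\pi$ when the embedded chain is aperiodic.

```python
def stationary(mu, sweeps=2000):
    """Invariant distribution of an irreducible, aperiodic stochastic matrix by power iteration."""
    n = len(mu)
    pi = [1.0 / n] * n
    for _ in range(sweeps):
        pi = [sum(pi[x] * mu[x][y] for x in range(n)) for y in range(n)]
    return pi

alpha = [[0.0, 2.0, 1.0], [4.0, 0.0, 0.0], [1.0, 1.0, 0.0]]   # off-diagonal rates
n = len(alpha)
c = [sum(row) for row in alpha]                                # rate function c(x)
mu = [[alpha[x][y] / c[x] for y in range(n)] for x in range(n)]

pi = stationary(mu)                          # invariant for the embedded chain Y
weights = [pi[x] / c[x] for x in range(n)]
nu = [w / sum(weights) for w in weights]     # candidate invariant distribution for X
# balance condition: sum_x nu_x alpha(x, y) = nu_y c_y for every y
residual = max(abs(sum(nu[x] * alpha[x][y] for x in range(n)) - nu[y] * c[y])
               for y in range(n))
print(nu, residual)   # residual should vanish up to rounding
```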
By a continuous-time Markov chain we mean a pure jump-type Markov
process on a countable state space $S$. Here the kernels $\mu_t$ may be specified
by the set of transition functions $p^t_{ij} = \mu_t(i, \{j\})$. The connectivity
properties are simpler than in discrete time, and the notion of periodicity has
no counterpart in the continuous-time theory.
Lemma 12.24 (positivity) For any $i, j \in S$, we have either $p^t_{ij} > 0$ for
all $t > 0$, or $p^t_{ij} = 0$ for all $t > 0$. In particular, $p^t_{ii} > 0$ for all $t$ and $i$.
Proof: Let $q = (q_{ij})$ be the transition matrix of the embedded Markov
chain $Y$ in Theorem 12.17. If $q^n_{ij} = P_i\{Y_n = j\} = 0$ for all $n \ge 0$, then
clearly $1\{X_t \ne j\} \equiv 1$ a.s. $P_i$, and so $p^t_{ij} = 0$ for all $t > 0$. If instead
$q^n_{ij} > 0$ for some $n \ge 0$, there exist some states $i = i_0, i_1, \dots, i_n = j$ with
$q_{i_{k-1}, i_k} > 0$ for $k = 1, \dots, n$. Noting that the distribution of $(\gamma_1, \dots, \gamma_{n+1})$
has positive density $\prod_{k \le n+1} e^{-x_k} > 0$ on $\mathbb{R}_+^{n+1}$, we obtain for any $t > 0$
$$p^t_{ij} \ge P\Big\{ \sum_{k=1}^{n} \frac{\gamma_k}{c_{i_{k-1}}} \le t < \sum_{k=1}^{n+1} \frac{\gamma_k}{c_{i_{k-1}}} \Big\} \prod_{k=1}^{n} q_{i_{k-1}, i_k} > 0.$$
Since $p^0_{ii} = q^0_{ii} = 1$, we get in particular $p^t_{ii} > 0$ for all $t \ge 0$.
□
A continuous-time Markov chain is said to be irreducible if $p^t_{ij} > 0$ for
all $i, j \in S$ and $t > 0$. Note that this holds iff the associated discrete-time
process $Y$ in Theorem 12.17 is irreducible. In that case clearly $\sup\{t > 0;\
X_t = j\} < \infty$ iff $\sup\{n > 0;\ Y_n = j\} < \infty$. Thus, when $Y$ is recurrent, the
sets $\{t;\ X_t = j\}$ are a.s. unbounded under $P_i$ for all $i \in S$; otherwise, they
are a.s. bounded. The two possibilities are again referred to as recurrence
and transience, respectively.
The basic ergodic Theorem 8.18 for discrete-time Markov chains has an
analogous version in continuous time. Further extensions are considered in
Chapter 20.
Theorem 12.25 (ergodic behavior) For any irreducible, continuous-time
Markov chain in $S$, exactly one of these cases occurs:
(i) There exists a unique invariant distribution $\nu$, the latter satisfies $\nu_i >
0$ for all $i \in S$, and for any distribution $\mu$ on $S$,
$$\lim_{t \to \infty} \|P_\mu \circ \theta_t^{-1} - P_\nu\| = 0. \tag{14}$$
(ii) No invariant distribution exists, and $p^t_{ij} \to 0$ for all $i, j \in S$.
Proof: By Lemma 12.24 the discrete-time chain $X_{nh}$, $n \in \mathbb{Z}_+$, is
irreducible and aperiodic. Assume that $(X_{nh})$ is positive recurrent for some
$h > 0$, say with invariant distribution $\nu$. Then the chain $(X_{nh'})$ is
positive recurrent for every $h'$ of the form $2^{-m} h$, and by the uniqueness in
Theorem 8.18 it has the same invariant distribution. Since the paths are
right-continuous, we may conclude by a simple approximation that $\nu$ is
invariant even for the original process $X$.
For any distribution $\mu$ on $S$ we have
$$\|P_\mu \circ \theta_t^{-1} - P_\nu\| = \Big\| \sum_i \mu_i \sum_j (p^t_{ij} - \nu_j) P_j \Big\| \le \sum_i \mu_i \sum_j |p^t_{ij} - \nu_j|.$$
Thus, (14) follows by dominated convergence if we can show that the inner
sum on the right tends to zero. This is clear if we put $n = [t/h]$ and
$r = t - nh$ and note that by Theorem 8.18
$$\sum_k |p^t_{ik} - \nu_k| \le \sum_j \sum_k |p^{nh}_{ij} - \nu_j|\, p^r_{jk} = \sum_j |p^{nh}_{ij} - \nu_j| \to 0.$$
It remains to consider the case when $(X_{nh})$ is null recurrent or transient
for every $h > 0$. Fixing any $i, k \in S$ and writing $n = [t/h]$ and $r = t - nh$
as before, we get
$$p^t_{ik} = \sum_j p^r_{ij}\, p^{nh}_{jk} \le p^{nh}_{ik} + \sum_{j \ne i} p^r_{ij} = p^{nh}_{ik} + (1 - p^r_{ii}),$$
which tends to zero as $t \to \infty$ and then $h \to 0$, due to Theorem 8.18 and
the continuity of $p^t_{ii}$. □
As in discrete time, we note that condition (ii) of the last theorem holds
for any transient Markov chain, whereas a recurrent chain may satisfy either
condition. Recurrent chains satisfying (i) and (ii) are again referred to as
positive recurrent and null-recurrent, respectively. It is interesting to note
that $X$ may be positive recurrent even when the embedded, discrete-time
chain $Y$ is null-recurrent, and vice versa. On the other hand, $X$ clearly has
the same ergodic properties as the discrete-time processes $(X_{nh})$, $h > 0$.
Let us next introduce the first exit and recurrence times
$$\gamma_j = \inf\{t > 0;\ X_t \ne j\}, \qquad \tau_j = \inf\{t > \gamma_j;\ X_t = j\}.$$
As in Theorem 8.22 for the discrete-time case, we may express the
asymptotic transition probabilities in terms of the mean recurrence times $E_j \tau_j$. To
avoid trivial exceptions, we confine our attention to nonabsorbing states.
Theorem 12.26 (mean recurrence times) For any continuous-time Markov
chain in $S$ and states $i, j \in S$ with $j$ nonabsorbing, we have
$$\lim_{t \to \infty} p^t_{ij} = \frac{P_i\{\tau_j < \infty\}}{c_j E_j \tau_j}. \tag{15}$$
Proof: It is enough to take $i = j$, since the general statement will then
follow as in the proof of Theorem 8.22. If $j$ is transient, then $1\{X_t =
j\} \to 0$ a.s. $P_j$, and so by dominated convergence $p^t_{jj} = P_j\{X_t = j\} \to 0$.
This agrees with (15), since in this case $P_j\{\tau_j = \infty\} > 0$. Turning to the
recurrent case, let $S_j$ denote the class of states $i$ accessible from $j$. Then
$S_j$ is clearly irreducible, and so $p^t_{jj}$ converges by Theorem 12.25.
To identify the limit, define
$$L^j_t = \lambda\{s \le t;\ X_s = j\} = \int_0^t 1\{X_s = j\}\, ds, \qquad t \ge 0,$$
and let $\tau^n_j$ denote the instant of $n$th return to $j$. Letting $m, n \to \infty$ with
$|m - n| \le 1$, and using the strong Markov property and the law of large
numbers, we get a.s. $P_j$
$$\frac{L^j(\tau^n_j)}{\tau^m_j} = \frac{L^j(\tau^n_j)}{n} \cdot \frac{m}{\tau^m_j} \cdot \frac{n}{m} \to \frac{E_j \gamma_j}{E_j \tau_j} = \frac{1}{c_j E_j \tau_j}.$$
By the monotonicity of $L^j$, it follows that $t^{-1} L^j_t \to (c_j E_j \tau_j)^{-1}$ a.s. Hence,
by Fubini's theorem and dominated convergence,
$$\frac{1}{t} \int_0^t p^s_{jj}\, ds = \frac{E_j L^j_t}{t} \to \frac{1}{c_j E_j \tau_j},$$
and (15) follows.
□
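Formula (15) with $i = j$ can be tested by simulation. The sketch below is an illustration only: the three-state chain with rates $c = (3, 4, 2)$ is a hypothetical example, and its invariant probability $\nu_0 = 8/17$ was computed by hand from the balance equations. It samples the recurrence time $\tau_0$ under $P_0$ and compares $1/(c_0 E_0\tau_0)$ with $\nu_0$, which is the limit of $p^t_{00}$ by Theorem 12.25.

```python
import random

def recurrence_time(j, c, mu, rng):
    """Sample tau_j = inf{t > gamma_j; X_t = j} under P_j."""
    y, t = j, 0.0
    while True:
        t += rng.expovariate(c[y])                          # holding time in y, mean 1 / c(y)
        y = rng.choices(range(len(c)), weights=mu[y])[0]    # next state of the embedded chain
        if y == j:
            return t

rng = random.Random(3)
c = [3.0, 4.0, 2.0]
mu = [[0.0, 2.0 / 3.0, 1.0 / 3.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]]
runs = 20000
mean_return = sum(recurrence_time(0, c, mu, rng) for _ in range(runs)) / runs
limit = 1.0 / (c[0] * mean_return)   # estimate of the limit of p_00^t given by (15)
print(limit, 8.0 / 17.0)             # the two values should be close
```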
Exercises
1. Let $\xi$ be a point process on a Borel space $S$. Show that $\xi = \sum_k \delta_{\tau_k}$ for
some random elements $\tau_k$ in $S \cup \{\Delta\}$, where $\Delta \notin S$ is arbitrary. Extend the
result to general random measures. (Hint: We may assume that $S = \mathbb{R}_+$.)
2. Show that two random measures $\xi$ and $\eta$ are independent iff $E e^{-\xi f - \eta g}
= E e^{-\xi f} E e^{-\eta g}$ for all measurable $f, g \ge 0$. Also, in case of simple point
processes, prove the equivalence of $P\{\xi B + \eta C = 0\} = P\{\xi B = 0\} P\{\eta C =
0\}$ for any $B, C \in \mathcal{S}$. (Hint: Regard $(\xi, \eta)$ as a random measure on $2S$.)
3. Let $\xi_1, \xi_2, \dots$ be independent Poisson processes with intensity measures
$\mu_1, \mu_2, \dots$ such that the measure $\mu = \sum_k \mu_k$ is $\sigma$-finite. Show that $\xi =
\sum_k \xi_k$ is again Poisson with intensity measure $\mu$.
4. Show that the classes of mixed Poisson and binomial processes are
preserved under randomization.
5. Let $\xi$ be a Cox process on $S$ directed by some random measure $\eta$, and
let $f$ be a measurable mapping into some space $T$ such that $\eta \circ f^{-1}$ is a.s.
$\sigma$-finite. Prove directly from definitions that $\xi \circ f^{-1}$ is a Cox process on
$T$ directed by $\eta \circ f^{-1}$. Derive a corresponding result for $p$-thinnings. Also
show how the result follows from Proposition 12.3.
6. Consider a $p$-thinning $\eta$ of $\xi$ and a $q$-thinning $\zeta$ of $\eta$ with $\zeta \perp\!\!\!\perp_\eta \xi$. Show
that $\zeta$ is a $pq$-thinning of $\xi$.
7. Let $\xi$ be a Cox process directed by $\eta$ or a $p$-thinning of $\eta$ with $p \in (0,1)$,
and fix two disjoint sets $B, C \in \mathcal{S}$. Show that $\xi B \perp\!\!\!\perp \xi C$ iff $\eta B \perp\!\!\!\perp \eta C$. (Hint:
Compute the Laplace transforms. The if assertions can also be obtained
from Proposition 6.8.)
8. Use Lemma 12.2 to derive expressions for $P\{\xi B = 0\}$ when $\xi$ is a Cox
process directed by $\eta$, a $\mu$-randomization of $\eta$, or a $p$-thinning of $\eta$. (Hint:
Note that $E e^{-t \xi B} \to P\{\xi B = 0\}$ as $t \to \infty$.)
9. Let $\xi$ be a $p$-thinning of $\eta$, where $p \in (0,1)$. Show that $\xi$ and $\eta$ are
simultaneously Cox. (Hint: Use Lemma 12.6.)
10. (Fichtner) For a fixed $p \in (0,1)$, let $\eta$ be a $p$-thinning of a point process
$\xi$ on $S$. Show that $\xi$ is Poisson iff $\eta \perp\!\!\!\perp \xi - \eta$. (Hint: Extend by iteration to
arbitrary $p$. Then a uniform randomization of $\xi$ on $S \times [0,1]$ has independent
increments in the second variable, and the result follows by Theorem 18.3.)
11. Use Theorem 12.8 to give a simplified proof of Theorem 12.4 in the
case when $\xi$ is simple.
12. Derive Theorem 12.4 from Theorem 12.12. (Hint: Note that $\xi$ is
symmetric on $S$ iff it is symmetric on $B_n$ for every $n$. If $\xi$ is simple, the assertion
follows immediately from Theorem 12.12. Otherwise, apply the same result
to a uniform randomization on $S \times [0,1]$.)
13. For $\xi$ as in Theorem 12.12, show that $P\{\xi B = 0\} = \varphi(\mu B)$ for some
completely monotone function $\varphi$. Conclude from the Hausdorff-Bernstein
characterization and Theorem 12.8 that $\xi$ is a mixed Poisson or binomial
process based on $\mu$.
14. Show that the distribution of a simple point process $\xi$ on $\mathbb{R}$ is not
determined, in general, by the distributions of $\xi I$ for all intervals $I$. (Hint: If
$\xi$ is restricted to $\{1, \dots, n\}$, then the distributions of all $\xi I$ give $\sum_{k \le n} k(n -
k + 1) \le n^3$ linear relations between the $2^n - 1$ parameters.)
15. Show that the distribution of a point process is not determined, in
general, by the one-dimensional distributions. (Hint: If $\xi$ is restricted to
$\{0, 1\}$ with $\xi\{0\} \vee \xi\{1\} \le n$, then the one-dimensional distributions give
$4n$ linear relations between the $n(n + 2)$ parameters.)
16. Show that Lemma 12.1 remains valid with $B_1, \dots, B_n$ restricted to an
arbitrary preseparating class $\mathcal{C}$, as defined in Chapter 16 or Appendix A2.
Also show that Theorem 12.8 holds with $B$ restricted to a separating class.
(Hint: Extend to the case when $\mathcal{C} = \{B \in \mathcal{S};\ (\xi + \eta)\partial B = 0 \text{ a.s.}\}$. Then
use monotone class arguments for sets in $S$ and in $\mathcal{M}(S)$.)
17. Show that Theorem 12.10 fails in general without the condition $\xi(\{s\} \times
K) = 0$ a.s. for all $s$.
18. Give an example of a non-Poisson point process $\xi$ on $S$ such that $\xi B$
is Poisson for every $B \in \mathcal{S}$. (Hint: It suffices to take $S = \{0, 1\}$.)
19. Extend Corollary 12.11 to the case when $p_s = P\{\xi\{s\} > 0\}$ may be
positive. (Hint: By Fatou's lemma, $p_s > 0$ for at most countably many $s$.)
20. Prove Theorem 12.13 (i) and (iii) by means of characteristic functions.
21. Let $\xi$ and $\eta$ be independent Poisson processes on $S$ with $E\xi = E\eta = \mu$,
and let $f_1, f_2, \dots\colon S \to \mathbb{R}$ be measurable with $\infty > \mu(f_n^2 \wedge 1) \to \infty$. Show
that $|(\xi - \eta) f_n| \xrightarrow{P} \infty$. (Hint: Consider the symmetrization $\tilde\nu$ of a fixed
measure $\nu \in \mathcal{N}(S)$ with $\nu f_n^2 \to \infty$, and argue along subsequences as in the
proof of Theorem 4.17.)
22. For any pure jump-type Markov process on $S$, show that $P_x\{\tau_2 \le t\}
= o(t)$ for all $x \in S$. Also note that the bound can be sharpened to $O(t^2)$ if
the rate function is bounded, but not in general. (Hint: Use Lemma 12.16
and dominated convergence.)
23. Show that any transient, discrete-time Markov chain Y can be em-
bedded into an exploding (resp., nonexploding) continuous-time chain X.
(Hint: Use Propositions 8.12 and 12.19.)
24. In Corollary 12.21, use the measurability of the mapping $X = Y \circ N$
to deduce the implication (iii) $\Rightarrow$ (i) from its converse. (Hint: Proceed as
in the proof of Proposition 12.15.) Also use Proposition 12.3 to show that
(iii) implies (ii), and prove the converse by means of Theorem 12.10.
25. Consider a pure jump-type Markov process on $(S, \mathcal{S})$ with transition
kernels $\mu_t$ and rate kernel $\alpha$. Show for any $x \in S$ and $B \in \mathcal{S}$ that
$\alpha(x, B) = \dot\mu_0(x, B \setminus \{x\})$. (Hint: Take $f = 1_{B \setminus \{x\}}$ in Theorem 12.22, and
use dominated convergence.)
26. Use Theorem 12.22 to derive a system of differential equations for the
transition functions $p_{ij}(t)$ of a continuous-time Markov chain. (Hint: Take
$f(i) = \delta_{ij}$ for fixed $j$.)
27. Give an example of a positive recurrent, continuous-time Markov chain
such that the embedded discrete-time chain is null-recurrent, and vice versa.
(Hint: Use Proposition 12.23.)
28. Establish Theorem 12.25 by a direct argument, mimicking the proof of
Theorem 8.18.
Chapter 13
Gaussian Processes
and Brownian Motion
Symmetries of Gaussian distribution; existence and path properties
of Brownian motion; strong Markov and reflection
properties; arcsine and uniform laws; law of the iterated logarithm;
Wiener integrals and isonormal Gaussian processes;
multiple Wiener-Itô integrals; chaos expansion of Brownian
functionals
The main purpose of this chapter is to initiate the study of Brownian
motion, arguably the single most important object in modern probability
theory. Indeed, we shall see in Chapters 14 and 16 how the Gaussian limit
theorems of Chapter 5 can be extended to approximations of broad classes
of random walks and discrete-time martingales by a Brownian motion.
In Chapter 18 we show how every continuous local martingale may be
represented in terms of Brownian motion through a suitable random time-
change. Similarly, the results of Chapters 21 and 23 demonstrate how large
classes of diffusion processes may be constructed from Brownian motion
by various pathwise transformations. Finally, a close relationship between
Brownian motion and classical potential theory is uncovered in Chapters
24 and 25.
The easiest construction of Brownian motion is via a so-called isonormal
Gaussian process on $L^2(\mathbb{R}_+)$, whose existence is a consequence of the
characteristic spherical symmetry of the multivariate Gaussian distributions.
Among the many important properties of Brownian motion, this chapter
covers the Hölder continuity and existence of quadratic variation, the strong
Markov and reflection properties, the three arcsine laws, and the law of the
iterated logarithm.
The values of an isonormal Gaussian process on $L^2(\mathbb{R}_+)$ may be identified
with integrals of $L^2$-functions with respect to the associated Brownian
motion. Many processes of interest have representations in terms of such
integrals, and in particular we shall consider spectral and moving average
representations of stationary Gaussian processes. More generally, we shall
introduce the multiple Wiener-Itô integrals $I_n f$ of functions $f \in L^2(\mathbb{R}_+^n)$
and establish the fundamental chaos expansion of Brownian $L^2$-functionals.
The present material is related to practically every other chapter in the
book. Thus, we refer to Chapter 5 for the definition of Gaussian distributions
and the basic Gaussian limit theorem, to Chapter 6 for the transfer
theorem, to Chapter 7 for properties of martingales and optional times, to
Chapter 8 for basic facts about Markov processes, to Chapter 9 for similar-
ities with random walks, to Chapter 11 for some basic symmetry results,
and to Chapter 12 for analogies with the Poisson process.
Our study of Brownian motion per se is continued in Chapter 18 with the
basic recurrence or transience dichotomy, some further invariance proper-
ties, and a representation of Brownian martingales. Brownian local time
and additive functionals are studied in Chapter 22. In Chapter 24 we
consider some basic properties of Brownian hitting distributions, and in
Chapter 25 we examine the relationship between excessive functions and
additive functionals of Brownian motion. A further discussion of multiple
integrals and chaos expansions appears in Chapter 18.
To begin with some basic definitions, we say that a process $X$ on some
parameter space $T$ is Gaussian if the random variable $c_1 X_{t_1} + \cdots + c_n X_{t_n}$
is Gaussian for any choice of $n \in \mathbb{N}$, $t_1, \dots, t_n \in T$, and $c_1, \dots, c_n \in \mathbb{R}$. This
holds in particular if the $X_t$ are independent Gaussian random variables.
A Gaussian process $X$ is said to be centered if $E X_t = 0$ for all $t \in T$. Let
us also say that the processes $X^i$ on $T_i$, $i \in I$, are jointly Gaussian if the
combined process $X = \{X^i_t;\ t \in T_i,\ i \in I\}$ is Gaussian. The latter condition
is certainly fulfilled if the processes $X^i$ are independent and Gaussian.
The following simple facts clarify the fundamental role of the covariance
function. As usual, we assume all distributions to be defined on the $\sigma$-fields
generated by the evaluation maps.
Lemma 13.1 (covariance function)
(i) The distribution of a Gaussian process $X$ on $T$ is determined by the
functions $E X_t$ and $\mathrm{cov}(X_s, X_t)$, $s, t \in T$.
(ii) The jointly Gaussian processes $X^i$ on $T_i$, $i \in I$, are independent iff
$\mathrm{cov}(X^i_s, X^j_t) = 0$ for all $s \in T_i$ and $t \in T_j$, $i \ne j$ in $I$.
Proof: (i) Let $X$ and $Y$ be Gaussian processes on $T$ with the same
means and covariances. Then the random variables $c_1 X_{t_1} + \cdots + c_n X_{t_n}$ and
$c_1 Y_{t_1} + \cdots + c_n Y_{t_n}$ have the same mean and variance for any $c_1, \dots, c_n \in \mathbb{R}$
and $t_1, \dots, t_n \in T$, $n \in \mathbb{N}$, and since both variables are Gaussian, their
distributions must agree. By the Cramér-Wold theorem it follows that
$(X_{t_1}, \dots, X_{t_n}) \overset{d}{=} (Y_{t_1}, \dots, Y_{t_n})$ for any $t_1, \dots, t_n \in T$, $n \in \mathbb{N}$, and so $X \overset{d}{=} Y$
by Proposition 3.2.
(ii) Assume the stated condition. To prove the asserted independence,
we may assume $I$ to be finite. Introduce some independent processes $Y^i$,
$i \in I$, with the same distributions as the $X^i$, and note that the combined
processes $X = (X^i)$ and $Y = (Y^i)$ have the same means and covariances.
Hence, the joint distributions agree by part (i). In particular, the
independence between the processes $Y^i$ implies the corresponding property for the
processes $X^i$. □
The following result characterizes the Gaussian distributions by a simple
symmetry property.
Proposition 13.2 (spherical symmetry, Maxwell) Let $\xi_1, \dots, \xi_d$ be
independent random variables, where $d \ge 2$. Then the distribution of $(\xi_1, \dots, \xi_d)$
is spherically symmetric iff the $\xi_i$ are i.i.d. centered Gaussian.
Proof: Let $\varphi$ denote the common characteristic function of $\xi_1, \dots, \xi_d$, and
assume the stated condition. In particular, $-\xi_1 \overset{d}{=} \xi_1$, and so $\varphi$ is real valued
and symmetric. Noting that $s\xi_1 + t\xi_2 \overset{d}{=} \xi_1 \sqrt{s^2 + t^2}$, we obtain the functional
equation $\varphi(s)\varphi(t) = \varphi(\sqrt{s^2 + t^2})$, and so by iteration $\varphi^n(t) = \varphi(t\sqrt{n})$ for
all $n$. Thus, for rational $t^2$ we have $\varphi(t) = e^{a t^2}$ for some constant $a$, and by
continuity this extends to all $t \in \mathbb{R}$. Finally, we have $a \le 0$ since $|\varphi| \le 1$.
Conversely, let $\xi_1, \dots, \xi_d$ be i.i.d. centered Gaussian, and assume that
$(\eta_1, \dots, \eta_d) = T(\xi_1, \dots, \xi_d)$ for some orthogonal transformation $T$. Then
both random vectors are Gaussian, and we may easily verify that
$\mathrm{cov}(\eta_i, \eta_j) = \mathrm{cov}(\xi_i, \xi_j)$ for all $i$ and $j$. Hence, the two distributions agree
by Lemma 13.1. □
In infinite dimensions, the Gaussian property is essentially a consequence
of the rotational symmetry alone, without any assumption of independence.
Theorem 13.3 (unitary invariance, Schoenberg, Freedman) For any
infinite sequence of random variables $\xi_1, \xi_2, \dots$, the distribution of $(\xi_1, \dots, \xi_n)$
is spherically symmetric for every $n \ge 1$ iff the $\xi_k$ are conditionally i.i.d.
$N(0, \sigma^2)$, given some random variable $\sigma^2 \ge 0$.
Proof: The $\xi_n$ are clearly exchangeable, and so by Theorem 11.10 there
exists a random probability measure $\mu$ such that the $\xi_n$ are conditionally
$\mu$-i.i.d. given $\mu$. By the law of large numbers,
$$\mu B = \lim_{n \to \infty} n^{-1} \sum_{k \le n} 1\{\xi_k \in B\} \quad \text{a.s.}, \qquad B \in \mathcal{B},$$
and in particular $\mu$ is a.s. $\{\xi_3, \xi_4, \dots\}$-measurable. Now the spherical
symmetry implies that, for any orthogonal transformation $T$ on $\mathbb{R}^2$,
$$P[(\xi_1, \xi_2) \in B \mid \xi_3, \dots, \xi_n] = P[T(\xi_1, \xi_2) \in B \mid \xi_3, \dots, \xi_n], \qquad B \in \mathcal{B}(\mathbb{R}^2).$$
As $n \to \infty$, we get $\mu^2 = \mu^2 \circ T^{-1}$ a.s. Considering a countable dense
set of mappings $T$, it is clear that the exceptional null set can be chosen
to be independent of $T$. Thus, $\mu^2$ is a.s. spherically symmetric, and so
$\mu$ is a.s. centered Gaussian by Proposition 13.2. It remains to take $\sigma^2 =
\int x^2 \mu(dx)$. □
Now fix a separable Hilbert space $H$. By an isonormal Gaussian process
on $H$ we mean a centered Gaussian process $\eta h$, $h \in H$, such that
$E(\eta h\, \eta k) = \langle h, k \rangle$, the inner product of $h$ and $k$. To construct such a
process $\eta$, we may introduce an orthonormal basis (ONB) $e_1, e_2, \dots \in H$, and
let $\xi_1, \xi_2, \dots$ be independent $N(0,1)$ random variables. For any element
$h = \sum_i b_i e_i$, we define $\eta h = \sum_i b_i \xi_i$, where the series converges a.s. and in
$L^2$ since $\sum_i b_i^2 < \infty$. The process $\eta$ is clearly centered Gaussian. It is also
linear, in the sense that $\eta(ah + bk) = a\,\eta h + b\,\eta k$ a.s. for all $h, k \in H$ and
$a, b \in \mathbb{R}$. Assuming $k = \sum_i c_i e_i$, we may compute
$$E(\eta h\, \eta k) = \sum_{i,j} b_i c_j E(\xi_i \xi_j) = \sum_i b_i c_i = \langle h, k \rangle.$$
By Lemma 13.1 the stated conditions uniquely determine the distribution
of $\eta$. In particular, the symmetry in Proposition 13.2 extends to a
distributional invariance of $\eta$ under any unitary transformation on $H$.
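The construction $\eta h = \sum_i b_i \xi_i$ is easy to try out in a finite-dimensional space. The sketch below is an illustration under the simplifying assumption $H = \mathbb{R}^3$ with the standard basis, so that the coordinates of $h$ are the coefficients $b_i$; it estimates $E(\eta h\, \eta k)$ by simulation and compares it with $\langle h, k \rangle$. The name `sample_eta`, the vectors, and the seed are hypothetical.

```python
import random

def sample_eta(vectors, rng):
    """One joint sample of eta h for each h in `vectors`, using eta h = sum_i b_i xi_i."""
    d = len(vectors[0])
    xi = [rng.gauss(0.0, 1.0) for _ in range(d)]   # independent N(0, 1) variables xi_1, ..., xi_d
    return [sum(b * x for b, x in zip(h, xi)) for h in vectors]

rng = random.Random(4)
h, k = [1.0, 2.0, 0.0], [0.5, -1.0, 3.0]
runs = 40000
samples = [sample_eta([h, k], rng) for _ in range(runs)]
cov_hk = sum(a * b for a, b in samples) / runs   # empirical E(eta h * eta k)
inner = sum(a * b for a, b in zip(h, k))         # inner product <h, k>
print(cov_hk, inner)                             # the two values should be close
```

The same shared sample of $(\xi_i)$ must be used for $\eta h$ and $\eta k$; independent samples would give covariance zero rather than $\langle h, k \rangle$.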
The following result shows how the Gaussian distribution arises naturally
in the context of processes with independent increments. It is interesting
to compare with the similar Poisson characterization in Theorem 12.10.
Theorem 13.4 (independence and Gaussian property, Lévy) Let $X$ be a
continuous process in $\mathbb{R}^d$ with independent increments and $X_0 = 0$. Then
$X$ is Gaussian, and there exist some continuous functions $b$ in $\mathbb{R}^d$ and $a$
in $\mathbb{R}^{d^2}$, the latter with nonnegative definite increments, such that $X_t - X_s$
is $N(b_t - b_s, a_t - a_s)$ for all $s < t$.
Proof: Fix any $s < t$ in $\mathbb{R}_+$ and $u \in \mathbb{R}^d$. For every $n \in \mathbb{N}$ we may
divide the interval $[s, t]$ into $n$ subintervals of equal length, and we denote
the corresponding increments of $uX$ by $\xi_{n1}, \dots, \xi_{nn}$. By the continuity of $X$
we have $\max_j |\xi_{nj}| \to 0$ a.s., and so Theorem 5.15 shows that $u(X_t - X_s) =
\sum_j \xi_{nj}$ is a Gaussian random variable. Since $X$ has independent increments,
it follows that $X$ is Gaussian. Writing $b_t = E X_t$ and $a_t = \mathrm{cov}(X_t)$, we get
$E(X_t - X_s) = E X_t - E X_s = b_t - b_s$, and so by independence
$$0 \le \mathrm{cov}(X_t - X_s) = \mathrm{cov}(X_t) - \mathrm{cov}(X_s) = a_t - a_s, \qquad s < t.$$
The continuity of $X$ yields $X_s \to X_t$ as $s \to t$, and so $b_s \to b_t$ and $a_s \to a_t$.
Thus, both functions are continuous. □
If the process X in Theorem 13.4 has stationary, independent increments,
then the mean and covariance functions are clearly linear. The simplest
choice in one dimension is to take $b = 0$ and $a_t = t$, so that $X_t - X_s$ is
$N(0, t - s)$ for all $s < t$. The next result shows that the corresponding
process exists; it also gives an estimate of the local modulus of continuity.
More precise rates of continuity are obtained in Theorem 13.18 and Lemma
14.7.
Theorem 13.5 (existence of Brownian motion, Wiener) There exists a
continuous Gaussian process $B$ in $\mathbb{R}$ with stationary independent increments
and $B_0 = 0$ such that $B_t$ is $N(0, t)$ for every $t \ge 0$. Furthermore, $B$
is a.s. locally Hölder continuous with exponent $c$ for any $c \in (0, \tfrac12)$.
Proof: Let $\eta$ be an isonormal Gaussian process on $L^2(\mathbb{R}_+, \lambda)$, and define
$B_t = \eta 1_{[0,t]}$, $t \ge 0$. Since indicator functions of disjoint intervals are
orthogonal, the increments of the process $B$ are uncorrelated and hence
independent. Furthermore, we have $\|1_{(s,t]}\|^2 = t - s$ for any $s \le t$, and so
$B_t - B_s$ is $N(0, t - s)$. For any $s \le t$ we get
$$B_t - B_s \stackrel{d}{=} B_{t-s} \stackrel{d}{=} (t - s)^{1/2} B_1, \tag{1}$$
whence
$$E|B_t - B_s|^c = (t - s)^{c/2}\, E|B_1|^c < \infty, \quad c > 0.$$
The asserted Hölder continuity now follows by Theorem 3.23. □
A process $B$ as in Theorem 13.5 is called a (standard) Brownian motion
or a Wiener process. By a Brownian motion in $\mathbb{R}^d$ we mean a process
$B_t = (B_t^1, \dots, B_t^d)$, where $B^1, \dots, B^d$ are independent, one-dimensional
Brownian motions. From Proposition 13.2 we note that the distribution of
$B$ is invariant under orthogonal transformations of $\mathbb{R}^d$. It is also clear that
any continuous process $X$ in $\mathbb{R}^d$ with stationary independent increments
and $X_0 = 0$ can be written as $X_t = bt + \sigma B_t$ for some vector $b$ and matrix
$\sigma$.
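The defining properties of Brownian motion suggest a direct way to sample it on a finite grid: add up independent $N(0, \Delta t)$ increments. The sketch below assumes an equally spaced grid; the function names `brownian_path` and `sample_variance_at_end` are hypothetical, and the construction reproduces the finite-dimensional distributions on the grid only, not the continuous path.

```python
import math
import random

def brownian_path(n_steps, t_max=1.0, seed=0):
    """Sample B on the grid k * t_max / n_steps, k = 0..n_steps."""
    rng = random.Random(seed)
    dt = t_max / n_steps
    path = [0.0]  # B_0 = 0
    for _ in range(n_steps):
        # Increments over disjoint intervals are independent N(0, dt).
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path

def sample_variance_at_end(n_paths=4000, n_steps=8, t_max=1.0):
    """Empirical variance of B_{t_max} over independently seeded paths."""
    ends = [brownian_path(n_steps, t_max, seed=i)[-1] for i in range(n_paths)]
    mean = sum(ends) / n_paths
    return sum((x - mean) ** 2 for x in ends) / (n_paths - 1)

path = brownian_path(100)
print(len(path), path[0], sample_variance_at_end())
```

Since $B_t$ is $N(0, t)$, the empirical variance at time 1 should be close to 1, up to sampling error.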
From Brownian motion we may construct other important Gaussian processes.
For example, a Brownian bridge may be defined as a process on $[0,1]$
with the same distribution as $X_t = B_t - tB_1$, $t \in [0, 1]$. An easy computation
shows that $X$ has covariance function $r_{s,t} = s(1 - t)$, $0 \le s \le t \le 1$.
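The "easy computation" for the bridge covariance amounts to expanding $\mathrm{cov}(B_s - sB_1,\, B_t - tB_1)$ by bilinearity and using $E B_s B_t = s \wedge t$. A small sketch of that expansion, with hypothetical helper names `bm_cov` and `bridge_cov`:

```python
def bm_cov(s, t):
    """Covariance of standard Brownian motion: E[B_s B_t] = min(s, t)."""
    return min(s, t)

def bridge_cov(s, t):
    """cov(B_s - s B_1, B_t - t B_1), expanded term by term by bilinearity."""
    return (bm_cov(s, t) - t * bm_cov(s, 1.0)
            - s * bm_cov(1.0, t) + s * t * bm_cov(1.0, 1.0))

for s, t in [(0.2, 0.7), (0.5, 0.5), (0.1, 0.9)]:
    # Second and third printed columns should agree: s(1 - t) for s <= t.
    print(s, t, bridge_cov(s, t), s * (1 - t))
```

For general $s, t \in [0,1]$ the same expansion gives $(s \wedge t)(1 - s \vee t)$.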
The Brownian motion and bridge have many nice symmetry properties.
For example, if $B$ is a Brownian motion, then so is $-B$ as well as the
process $c^{-1}B(c^2 t)$ for any $c > 0$. The latter transformation is especially
useful and is often referred to as a Brownian scaling. We also note that,
for each $u > 0$, the processes $B_{u \pm t} - B_u$ are Brownian motions on $\mathbb{R}_+$
and $[0, u]$, respectively. If $B$ is instead a Brownian bridge, then so are the
processes $-B_t$ and $B_{1-t}$.
The following result gives some less obvious invariance properties.
Further, possibly random mappings that preserve the distribution of a
Brownian motion or bridge are exhibited in Theorem 13.11, Lemma 13.14,
and Proposition 18.9.
Lemma 13.6 (scaling and inversion) If $B$ is a Brownian motion, then
so is the process $tB_{1/t}$, whereas $(1 - t)B_{t/(1-t)}$ and $tB_{(1-t)/t}$ are Brownian
bridges. If $B$ is instead a Brownian bridge, then the processes $(1 + t)B_{t/(1+t)}$
and $(1 + t)B_{1/(1+t)}$ are Brownian motions.
Proof: Since all processes are centered Gaussian, it suffices by Lemma
13.1 to verify that they have the desired covariance functions. This is clear
from the expressions $s \wedge t$ and $(s \wedge t)(1 - s \vee t)$ for the covariance functions
of the Brownian motion and bridge. □
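The covariance check behind Lemma 13.6 is mechanical: a process of the form $\sigma(t) B_{c(t)}$ has covariance $\sigma(s)\sigma(t)\, r_{c(s), c(t)}$, where $r$ is the covariance of $B$. The sketch below evaluates this for the five transformations at a few sample points; the helper names and the dictionary layout are illustrative assumptions.

```python
def bm_cov(s, t):
    """Covariance of Brownian motion."""
    return min(s, t)

def bridge_cov(s, t):
    """Covariance of the Brownian bridge."""
    return min(s, t) * (1.0 - max(s, t))

def transformed_cov(base_cov, scale, clock, s, t):
    """Covariance of X_t = scale(t) * B_{clock(t)} given the covariance of B."""
    return scale(s) * scale(t) * base_cov(clock(s), clock(t))

# name: (scale, clock, covariance of the input, expected covariance of the output)
cases = {
    "t B_{1/t}":         (lambda t: t,     lambda t: 1 / t,       bm_cov,     bm_cov),
    "(1-t) B_{t/(1-t)}": (lambda t: 1 - t, lambda t: t / (1 - t), bm_cov,     bridge_cov),
    "t B_{(1-t)/t}":     (lambda t: t,     lambda t: (1 - t) / t, bm_cov,     bridge_cov),
    "(1+t) B_{t/(1+t)}": (lambda t: 1 + t, lambda t: t / (1 + t), bridge_cov, bm_cov),
    "(1+t) B_{1/(1+t)}": (lambda t: 1 + t, lambda t: 1 / (1 + t), bridge_cov, bm_cov),
}

def max_error(name, points):
    """Largest deviation from the target covariance over all pairs of points."""
    scale, clock, base, target = cases[name]
    return max(abs(transformed_cov(base, scale, clock, s, t) - target(s, t))
               for s in points for t in points)

points = [0.1, 0.3, 0.5, 0.8]  # interior of (0, 1), valid for every case
for name in cases:
    print(name, max_error(name, points))
```

The printed deviations should vanish up to floating-point rounding. A check at finitely many points is of course no substitute for the algebra in the proof.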
From Proposition 8.5 together with Theorem 13.4 we note that any space-
and time-homogeneous, continuous Markov process in $\mathbb{R}^d$ has the form
$\sigma B_t + tb + c$, where $B$ is a Brownian motion in $\mathbb{R}^d$, $\sigma$ is a $d \times d$ matrix, and
$b$ and $c$ are vectors in $\mathbb{R}^d$. The next result gives a general characterization
of Gaussian Markov processes. Here we use the convention $0/0 = 0$.
Proposition 13.7 (Gaussian Markov processes) Let $X$ be a Gaussian
process on some index set $T \subset \mathbb{R}$, and define $r_{s,t} = \mathrm{cov}(X_s, X_t)$. Then
$X$ is Markov iff
$$r_{s,u} = r_{s,t}\, r_{t,u} / r_{t,t}, \quad s \le t \le u \text{ in } T. \tag{2}$$
If $X$ is further stationary and defined on $\mathbb{R}$, then $r_{s,t} = a e^{-b|s-t|}$ for some
constants $a \ge 0$ and $b \in [0, \infty]$.
Proof: Subtracting the means if necessary, we may assume that $EX_t \equiv 0$.
Now fix any times $t \le u$ in $T$, and choose $a \in \mathbb{R}$ such that
$X'_u \equiv X_u - aX_t \perp X_t$. Then $a = r_{t,u}/r_{t,t}$ when $r_{t,t} \ne 0$, and if $r_{t,t} = 0$, we may take
$a = 0$. By Lemma 13.1 we get $X'_u \perp\!\!\!\perp X_t$.
First assume that $X$ is Markov, and let $s \le t$ be arbitrary. Then
$X_s \perp\!\!\!\perp_{X_t} X_u$, and so $X_s \perp\!\!\!\perp_{X_t} X'_u$. Since also $X_t \perp\!\!\!\perp X'_u$ by the choice of $a$,
Proposition 6.8 yields $X_s \perp\!\!\!\perp X'_u$. Hence, $r_{s,u} = a\, r_{s,t}$, and (2) follows as we insert
the expression for $a$. Conversely, (2) implies $X_s \perp X'_u$ for all $s \le t$, and so
$\mathcal{F}_t \perp\!\!\!\perp X'_u$ by Lemma 13.1, where $\mathcal{F}_t = \sigma\{X_s;\, s \le t\}$. By Proposition 6.8 it
follows that $\mathcal{F}_t \perp\!\!\!\perp_{X_t} X_u$, which is the required Markov property of $X$ at $t$.
If $X$ is stationary, then $r_{s,t} = r_{|s-t|,0} \equiv r_{|s-t|}$, and (2) reduces to the
Cauchy equation $r_0\, r_{s+t} = r_s r_t$, $s, t \ge 0$, which admits the only bounded
solutions $r_t = a e^{-bt}$. □
A continuous, centered Gaussian process on $\mathbb{R}$ with covariance function
$r_t = \tfrac12 e^{-|t|}$ is called a stationary Ornstein-Uhlenbeck process. Such
a process $Y$ can be expressed in terms of a Brownian motion $B$ as
$Y_t = e^{-t} B(\tfrac12 e^{2t})$, $t \in \mathbb{R}$. The last result shows that the Ornstein-Uhlenbeck
process is essentially the only stationary Gaussian process that is also a
Markov process.
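Criterion (2) of Proposition 13.7 is easy to test numerically for a given covariance function. The sketch below uses the cross-multiplied form $r_{s,u}\, r_{t,t} - r_{s,t}\, r_{t,u}$ to avoid division by zero. The function names are hypothetical, and the squared-exponential covariance is included only as an assumed counterexample of a stationary Gaussian covariance that is not of exponential form.

```python
import math

def markov_defect(cov, s, t, u):
    """r_{s,u} * r_{t,t} - r_{s,t} * r_{t,u}; zero when (2) holds at s <= t <= u."""
    return cov(s, u) * cov(t, t) - cov(s, t) * cov(t, u)

def ou_cov(s, t):
    """Exponential covariance a * exp(-b|s-t|) with a = 1/2, b = 1."""
    return 0.5 * math.exp(-abs(s - t))

def bm_cov(s, t):
    """Covariance of Brownian motion."""
    return min(s, t)

def smooth_cov(s, t):
    """Squared-exponential covariance, not of the form a * exp(-b|s-t|)."""
    return math.exp(-(s - t) ** 2)

triples = [(0.5, 1.0, 2.0), (0.1, 0.4, 0.9), (1.0, 1.5, 3.0)]
for s, t, u in triples:
    print(markov_defect(ou_cov, s, t, u),
          markov_defect(bm_cov, s, t, u),
          markov_defect(smooth_cov, s, t, u))
```

The first two columns should vanish up to rounding, whereas the third should not, in line with the statement that a stationary Gaussian Markov process must have an exponential covariance.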
We will now study some basic sample path properties of Brownian
motion.
Lemma 13.8 (level sets) If $B$ is a Brownian motion or bridge, then
$$\lambda\{t;\, B_t = u\} = 0 \text{ a.s.}, \quad u \in \mathbb{R}.$$
Proof: Introduce the processes $X_t^n = B_{[nt]/n}$, $t \in \mathbb{R}_+$ or $[0,1]$, $n \in \mathbb{N}$,
and note that $X_t^n \to B_t$ for every $t$. Since each process $X^n$ is product
measurable on $\Omega \times \mathbb{R}_+$ or $\Omega \times [0,1]$, the same thing is true for $B$. Now use
Fubini's theorem to conclude that
$$E\lambda\{t;\, B_t = u\} = \int P\{B_t = u\}\,dt = 0, \quad u \in \mathbb{R}. \qquad \Box$$
The next result shows that Brownian motion has locally finite quadratic
variation. An extension to general continuous semimartingales is obtained
in Proposition 17.17.
Theorem 13.9 (quadratic variation, Lévy) Let $B$ be a Brownian motion,
and fix any $t > 0$ and a sequence of partitions $0 = t_{n,0} < t_{n,1} < \cdots <
t_{n,k_n} = t$, $n \in \mathbb{N}$, such that $h_n \equiv \max_k (t_{n,k} - t_{n,k-1}) \to 0$. Then
$$\zeta_n \equiv \sum_k \bigl(B_{t_{n,k}} - B_{t_{n,k-1}}\bigr)^2 \to t \text{ in } L^2. \tag{3}$$
If the partitions are nested, then also $\zeta_n \to t$ a.s.
Proof (Doob): To prove (3), we may use the scaling property
$B_t - B_s \stackrel{d}{=} |t - s|^{1/2} B_1$ to obtain
$$E\zeta_n = \sum_k E\bigl(B_{t_{n,k}} - B_{t_{n,k-1}}\bigr)^2 = \sum_k (t_{n,k} - t_{n,k-1})\, EB_1^2 = t,$$
$$\mathrm{var}(\zeta_n) = \sum_k \mathrm{var}\bigl(B_{t_{n,k}} - B_{t_{n,k-1}}\bigr)^2 = \sum_k (t_{n,k} - t_{n,k-1})^2\, \mathrm{var}(B_1^2) \le h_n t\, EB_1^4 \to 0.$$
For nested partitions we may prove the a.s. convergence by showing that
the sequence $(\zeta_n)$ is a reverse martingale, that is,
$$E[\zeta_{n-1} - \zeta_n \mid \zeta_n, \zeta_{n+1}, \dots] = 0 \text{ a.s.}, \quad n \in \mathbb{N}. \tag{4}$$
Inserting intermediate partitions if necessary, we may assume that $k_n = n$
for all $n$. In that case there exist some numbers $t_1, t_2, \dots \in [0, t]$ such
that the $n$th partition has division points $t_1, \dots, t_n$. To verify (4) for a
fixed $n$, we may further introduce an auxiliary random variable $\vartheta \perp\!\!\!\perp B$ with
$P\{\vartheta = \pm 1\} = \tfrac12$, and replace $B$ by the Brownian motion
$$B'_s = B_{s \wedge t_n} + \vartheta\,(B_s - B_{s \wedge t_n}), \quad s \ge 0.$$
Since $B'$ has the same sums $\zeta_n, \zeta_{n+1}, \dots$ as $B$ whereas $\zeta_{n-1} - \zeta_n$ is replaced
by $\vartheta(\zeta_n - \zeta_{n-1})$, it is enough to show that $E[\vartheta(\zeta_n - \zeta_{n-1}) \mid \zeta_n, \zeta_{n+1}, \dots]
= 0$ a.s. This is clear from the choice of $\vartheta$ if we first condition on
$\zeta_{n-1}, \zeta_n, \dots$. □
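Theorem 13.9 can be illustrated by simulation along the nested dyadic partitions of $[0, t]$. The sketch below samples the increments on the finest grid and then merges adjacent increments to pass to coarser partitions of the same path. The function name `dyadic_quadratic_variations` and the choice of 14 dyadic levels are assumptions made for illustration.

```python
import math
import random

def dyadic_quadratic_variations(t=1.0, max_level=14, seed=0):
    """Return {level: zeta_level} for one simulated path, where zeta_level is
    the sum of squared increments over 2**level equal subintervals of [0, t]."""
    rng = random.Random(seed)
    n = 2 ** max_level
    sd = math.sqrt(t / n)
    incs = [rng.gauss(0.0, sd) for _ in range(n)]  # increments on the finest grid
    result = {}
    for level in range(max_level, -1, -1):
        result[level] = sum(x * x for x in incs)
        # Merge adjacent increments to obtain the next coarser partition.
        incs = [incs[i] + incs[i + 1] for i in range(0, len(incs) - 1, 2)]
    return result

qv = dyadic_quadratic_variations()
for level in (0, 4, 8, 14):
    print(level, qv[level])
```

Since $\mathrm{var}(\zeta_n) = 2t^2 / 2^n$ for the dyadic partition of level $n$, the values at fine levels should cluster tightly around $t$, while those at coarse levels remain widely dispersed.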
The last result implies that $B$ has locally unbounded variation. This
explains why the stochastic integral $\int V\,dB$ cannot be defined as an ordinary
Stieltjes integral and a more sophisticated approach is required in
Chapter 17.
Corollary 13.10 (linear variation) Brownian motion has a.s. unbounded
variation on every interval [s, t] with s < t.
Proof: The quadratic variation vanishes for any continuous function of
bounded variation on $[s, t]$. □
From Proposition 8.5 we note that Brownian motion $B$ is a space-homogeneous
Markov process with respect to its induced filtration. If the Markov
property holds for some more general filtration $\mathcal{F} = (\mathcal{F}_t)$ (that is, if $B$ is
adapted to $\mathcal{F}$ and such that the process $B'_t = B_{s+t} - B_s$ is independent of
$\mathcal{F}_s$ for each $s \ge 0$), we say that $B$ is a Brownian motion with respect to $\mathcal{F}$,
or an $\mathcal{F}$-Brownian motion. In particular, we may take $\mathcal{F}_t = \mathcal{G}_t \vee \mathcal{N}$, $t \ge 0$,
where $\mathcal{G}$ is the filtration induced by $B$ and $\mathcal{N} = \sigma\{N \subset A;\ A \in \mathcal{A},\ PA = 0\}$.
With this construction, $\mathcal{F}$ becomes right-continuous by Corollary 7.25.
The Markov property of B will now be extended to suitable optional
times. A more general version of this result appears in Theorem 19.17. As
in Chapter 7, we write $\mathcal{F}_t^+ = \mathcal{F}_{t+}$.
Theorem 13.11 (strong Markov property, Hunt) For any $\mathcal{F}$-Brownian
motion $B$ in $\mathbb{R}^d$ and a.s. finite $\mathcal{F}^+$-optional time $\tau$, the process
$B'_t = B_{\tau+t} - B_\tau$, $t \ge 0$, is again a Brownian motion independent of $\mathcal{F}_\tau^+$.
Proof: As in Lemma 7.4, we may choose some optional times $\tau_n \to \tau$ that
take countably many values and satisfy $\tau_n \ge \tau + 2^{-n}$. Then $\mathcal{F}_\tau^+ \subset \bigcap_n \mathcal{F}_{\tau_n}$
by Lemmas 7.1 and 7.3, and so by Proposition 8.9 and Theorem 8.10 each
process $B_t^n = B_{\tau_n + t} - B_{\tau_n}$, $t \ge 0$, is a Brownian motion independent of
$\mathcal{F}_\tau^+$. The continuity of $B$ yields $B_t^n \to B'_t$ a.s. for every $t$. By dominated
convergence we then obtain, for any $A \in \mathcal{F}_\tau^+$ and $t_1, \dots, t_k \in \mathbb{R}_+$, $k \in \mathbb{N}$,
and for bounded continuous functions $f\colon \mathbb{R}^k \to \mathbb{R}$,
$$E[f(B'_{t_1}, \dots, B'_{t_k});\, A] = Ef(B_{t_1}, \dots, B_{t_k}) \cdot PA.$$
The general relation $P[B' \in \cdot\,,\, A] = P\{B \in \cdot\} \cdot PA$ now follows by a
straightforward extension argument. □
If $B$ is a Brownian motion in $\mathbb{R}^d$, then a process with the same distribution
as $|B|$ is called a Bessel process of order $d$. More general Bessel
processes may be obtained as solutions to suitable SDEs. The next result
shows that $|B|$ inherits the strong Markov property from $B$.
Corollary 13.12 (Bessel processes) If $B$ is an $\mathcal{F}$-Brownian motion in
$\mathbb{R}^d$, then $|B|$ is a strong $\mathcal{F}^+$-Markov process.
Proof: By Theorem 13.11 it is enough to show that $|B + x| \stackrel{d}{=} |B + y|$
whenever $|x| = |y|$. We may then choose an orthogonal transformation $T$
on $\mathbb{R}^d$ with $Tx = y$, and note that
$$|B + x| = |T(B + x)| = |TB + y| \stackrel{d}{=} |B + y|. \qquad \Box$$
We shall use the strong Markov property to derive the distribution of
the maximum of Brownian motion up to a fixed time. A stronger result is
obtained in Corollary 22.3.
Proposition 13.13 (maximum process, Bachelier) Let $B$ be a Brownian
motion in $\mathbb{R}$, and define $M_t = \sup_{s \le t} B_s$, $t \ge 0$. Then
$$M_t \stackrel{d}{=} M_t - B_t \stackrel{d}{=} |B_t|, \quad t \ge 0.$$
For the proof we need the following continuous-time counterpart to
Lemma 9.10.
Lemma 13.14 (reflection principle) Consider a Brownian motion $B$ and
an associated optional time $\tau$. Then $B$ has the same distribution as the
reflected process
$$\tilde{B}_t = B_{t \wedge \tau} - (B_t - B_{t \wedge \tau}), \quad t \ge 0.$$
Proof: It is enough to compare the distributions up to a fixed time $t$, and
so we may assume that $\tau < \infty$. Define $B_t^\tau = B_{\tau \wedge t}$ and $B'_t = B_{\tau+t} - B_\tau$.
By Theorem 13.11 the process $B'$ is a Brownian motion independent of
$(\tau, B^\tau)$. Since, moreover, $-B' \stackrel{d}{=} B'$, we get $(\tau, B^\tau, B') \stackrel{d}{=} (\tau, B^\tau, -B')$. It
remains to note that
$$B_t = B_t^\tau + B'_{(t-\tau)_+}, \qquad \tilde{B}_t = B_t^\tau - B'_{(t-\tau)_+}, \quad t \ge 0. \qquad \Box$$
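On a discretely sampled path, the reflected process of Lemma 13.14 is obtained by leaving the path unchanged up to the (grid) optional time and flipping it about its value there afterwards. A minimal sketch, with hypothetical function names and a hand-picked toy path rather than a simulated one:

```python
def reflect_after(path, tau_index):
    """Reflected path: B~_t = B_{t ^ tau} - (B_t - B_{t ^ tau}) on the grid."""
    reflected = []
    for i, value in enumerate(path):
        stopped = path[min(i, tau_index)]  # value of the stopped path B_{t ^ tau}
        reflected.append(stopped - (value - stopped))
    return reflected

def first_hit_index(path, level):
    """First grid index where the path is >= level, or None if it never is."""
    for i, value in enumerate(path):
        if value >= level:
            return i
    return None

path = [0.0, 1.0, 2.0, 1.0, 3.0]
tau = first_hit_index(path, 2.0)     # the level 2.0 is first reached at index 2
print(reflect_after(path, tau))      # [0.0, 1.0, 2.0, 3.0, 1.0]
```

Reflecting twice at the same index returns the original path, which mirrors the symmetry $-B' \stackrel{d}{=} B'$ used in the proof.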
Proof of Proposition 13.13: By scaling it suffices to take $t = 1$. Applying
Lemma 13.14 with $\tau = \inf\{t;\, B_t = x\}$ gives
$$P\{M_1 \ge x,\ B_1 \le y\} = P\{\tilde{B}_1 \ge 2x - y\}, \quad x \ge y \vee 0.$$
By differentiation it follows that the pair $(M_1, B_1)$ has probability density
$-2\varphi'(2x - y)$, where $\varphi$ denotes the standard normal density. Changing
variables, we may conclude that $(M_1, M_1 - B_1)$ has density $-2\varphi'(x + y)$,
$x, y \ge 0$. In particular, both $M_1$ and $M_1 - B_1$ have density $2\varphi(x)$, $x \ge 0$. □
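The differentiation and change of variables in the preceding proof can be spelled out as follows. This is a sketch of the intermediate steps, writing $\Phi$ for the standard normal distribution function and $f$ for the joint density of $(M_1, B_1)$ (both symbols introduced here for convenience).

```latex
% Joint tail probability, using that the reflected process is again a Brownian motion:
%   G(x,y) = P\{M_1 \ge x,\ B_1 \le y\} = 1 - \Phi(2x - y), \qquad x \ge y \vee 0.
\begin{align*}
f(x,y) &= -\frac{\partial^2 G}{\partial x\,\partial y}(x,y)
        = -\frac{\partial}{\partial x}\,\varphi(2x - y)
        = -2\varphi'(2x - y), \\
% Substituting y = x - z gives the density of (M_1, M_1 - B_1) at (x, z):
f(x, x - z) &= -2\varphi'(x + z), \qquad x, z \ge 0, \\
% Marginal of M_1 (and, by symmetry of the last formula, of M_1 - B_1):
\int_0^\infty -2\varphi'(x + z)\,dz &= 2\varphi(x), \qquad x \ge 0.
\end{align*}
```

The last line is the density of $|B_1|$, which gives the asserted identities in distribution.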
To prepare for the next main result, we shall derive another elementary
sample path property.
Lemma 13.15 (local extremes) The local maxima and minima of a
Brownian motion or bridge are a.s. distinct.
Proof: Let $B$ be a Brownian motion, and fix any intervals $I = [a, b]$ and
$J = [c, d]$ with $b < c$. Write
$$\sup_{t \in J} B_t - \sup_{t \in I} B_t = \sup_{t \in J}(B_t - B_c) + (B_c - B_b) - \sup_{t \in I}(B_t - B_b).$$
Here the second term on the right has a diffuse distribution, and by
independence the same thing is true for the whole expression. In particular,
the difference on the left is a.s. nonzero. Since $I$ and $J$ are arbitrary, this
proves the result for local maxima. The case of local minima and the mixed
case are similar.
The result for the Brownian bridge $B^\circ$ follows from that for Brownian
motion, since the distributions of the two processes are equivalent (mutually
absolutely continuous) on any interval $[0, t]$ with $t < 1$. To see this,
construct from $B$ and $B^\circ$ the corresponding "bridges"
$$X_s = B_s - \frac{s}{t} B_t, \qquad Y_s = B^\circ_s - \frac{s}{t} B^\circ_t, \quad s \in [0, t],$$
and check that $B_t \perp\!\!\!\perp X \stackrel{d}{=} Y \perp\!\!\!\perp B^\circ_t$. The stated equivalence now follows from
the fact that $N(0, t) \sim N(0, t(1 - t))$ when $t \in [0, 1)$. □
The next result involves the arcsine law, which may be defined as the
distribution of $\xi = \sin^2\alpha$ when $\alpha$ is $U(0, 2\pi)$. The name comes from the
fact that
$$P\{\xi \le t\} = P\{|\sin\alpha| \le \sqrt{t}\} = \frac{2}{\pi}\arcsin\sqrt{t}, \quad t \in [0, 1].$$
Note that the arcsine distribution is symmetric around $\tfrac12$, since
$$\xi = \sin^2\alpha \stackrel{d}{=} \cos^2\alpha = 1 - \sin^2\alpha = 1 - \xi.$$
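The distribution function of the arcsine law and its symmetry around $\tfrac12$ are easy to evaluate directly. A small sketch, where the function name `arcsine_cdf` is an illustrative choice:

```python
import math

def arcsine_cdf(t):
    """P{xi <= t} = (2/pi) * arcsin(sqrt(t)) for 0 <= t <= 1."""
    return (2.0 / math.pi) * math.asin(math.sqrt(t))

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    # Symmetry around 1/2 means F(t) + F(1 - t) = 1.
    print(t, arcsine_cdf(t), arcsine_cdf(t) + arcsine_cdf(1.0 - t))
```

In particular $P\{\xi \le \tfrac14\} = \tfrac13$, so values near the endpoints 0 and 1 are markedly more likely than values near the center.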
The following celebrated result exhibits three interesting functionals of
Brownian motion, all of which are arcsine distributed.
Theorem 13.16 (arcsine laws, Lévy) Let $B$ be a Brownian motion on
$[0,1]$ with maximum $M_1$. Then these random variables are all arcsine
distributed:
$$\tau_1 = \lambda\{t;\, B_t > 0\}, \qquad \tau_2 = \inf\{t;\, B_t = M_1\}, \qquad \tau_3 = \sup\{t;\, B_t = 0\}.$$
It is interesting to compare the relations $\tau_1 \stackrel{d}{=} \tau_2 \stackrel{d}{=} \tau_3$ with the
discrete-time versions obtained in Theorem 9.11 and Corollary 11.14. In
Theorems 14.11 and 15.21, the arcsine laws are extended by approximation
to appropriate random walks and Lévy processes.
Proof: To see that $\tau_1 \stackrel{d}{=} \tau_2$, let $n \in \mathbb{N}$, and note that by Corollary 11.14
$$n^{-1} \sum_{k \le n} 1\{B_{k/n} > 0\} \stackrel{d}{=} n^{-1} \min\{k \ge 0;\ B_{k/n} = \max_{j \le n} B_{j/n}\}.$$
By Lemma 13.15 the right-hand side tends a.s. to $\tau_2$ as $n \to \infty$. To see that
the left-hand side converges to $\tau_1$, we may conclude from Lemma 13.8 that
$$\lambda\{t \in [0,1];\ B_t > 0\} + \lambda\{t \in [0,1];\ B_t < 0\} = 1 \text{ a.s.}$$
It remains to note that, for any open set $G \subset [0,1]$,
$$\liminf_{n \to \infty}\, n^{-1} \sum_{k \le n} 1_G(k/n) \ge \lambda G.$$
In case of $\tau_2$, fix any $t \in [0,1]$, let $\xi$ and $\eta$ be independent $N(0,1)$, and
let $\alpha$ be $U(0, 2\pi)$. Using Proposition 13.13 and the circular symmetry of
the distribution of $(\xi, \eta)$, we get
$$P\{\tau_2 \le t\} = P\{\sup_{s \le t}(B_s - B_t) \ge \sup_{s \ge t}(B_s - B_t)\}$$
$$= P\{|B_t| \ge |B_1 - B_t|\} = P\{t\xi^2 \ge (1 - t)\eta^2\}$$
$$= P\Bigl\{\frac{\eta^2}{\xi^2 + \eta^2} \le t\Bigr\} = P\{\sin^2\alpha \le t\}.$$
In case of $\tau_3$, we may write
$$P\{\tau_3 < t\} = P\{\sup_{s \ge t} B_s < 0\} + P\{\inf_{s \ge t} B_s > 0\}$$
$$= 2P\{\sup_{s \ge t}(B_s - B_t) < -B_t\} = 2P\{|B_1 - B_t| < B_t\}$$
$$= P\{|B_1 - B_t| < |B_t|\} = P\{\tau_2 < t\}. \qquad \Box$$
The first two arcsine laws have the following counterparts for the
Brownian bridge.
Theorem 13.17 (uniform laws) Let $B$ be a Brownian bridge with
maximum $M_1$. Then these random variables are both $U(0, 1)$:
$$\tau_1 = \lambda\{t;\, B_t > 0\}, \qquad \tau_2 = \inf\{t;\, B_t = M_1\}.$$
Proof: The relation $\tau_1 \stackrel{d}{=} \tau_2$ may be proved in the same way as for Brownian
motion. To see that $\tau_2$ is $U(0, 1)$, write $\langle x \rangle = x - [x]$, and consider for
each $u \in [0,1]$ the process $B_t^u = B_{\langle u+t \rangle} - B_u$, $t \in [0,1]$. It is easy to check
that $B^u \stackrel{d}{=} B$ for each $u$, and further that the maximum of $B^u$ occurs at
$\langle \tau_2 - u \rangle$. By Fubini's theorem we hence obtain for any $t \in [0,1]$
$$P\{\tau_2 \le t\} = \int_0^1 P\{\langle \tau_2 - u \rangle \le t\}\,du = E\lambda\{u;\ \langle \tau_2 - u \rangle \le t\} = t. \qquad \Box$$
From Theorem 13.5 we note that $t^{-c} B_t \to 0$ a.s. as $t \to 0$ for any
$c \in [0, \tfrac12)$. The following classical result gives the exact growth rate of
Brownian motion at $0$ and $\infty$. Extensions to random walks and renewal
processes are obtained in Corollaries 14.8 and 14.14. A functional version
appears in Theorem 27.18.
Theorem 13.18 (laws of the iterated logarithm, Khinchin) For a Brownian
motion $B$ in $\mathbb{R}$, we have a.s.
$$\limsup_{t \to 0} \frac{B_t}{\sqrt{2t \log\log(1/t)}} = \limsup_{t \to \infty} \frac{B_t}{\sqrt{2t \log\log t}} = 1.$$
Proof: The Brownian inversion $\tilde{B}_t = tB_{1/t}$ of Lemma 13.6 converts the
two formulas into one another, so it is enough to prove the result for $t \to \infty$.
Then we note that as $u \to \infty$
$$\int_u^\infty e^{-x^2/2}\,dx \sim u^{-1} \int_u^\infty x e^{-x^2/2}\,dx = u^{-1} e^{-u^2/2}.$$
By Proposition 13.13 we hence obtain, uniformly in $t > 0$,
$$P\{M_t > ut^{1/2}\} = 2P\{B_t > ut^{1/2}\} \sim (2/\pi)^{1/2}\, u^{-1} e^{-u^2/2},$$
where $M_t = \sup_{s \le t} B_s$. Writing $h_t = (2t \log\log t)^{1/2}$, we get for any $r > 1$
and $c > 0$
$$P\{M(r^n) > c\,h(r^{n-1})\} \lesssim n^{-c^2/r} (\log n)^{-1/2}, \quad n \in \mathbb{N}.$$
Fixing $c > 1$ and choosing $r < c^2$, it follows by the Borel-Cantelli lemma
that
$$P\{\limsup_{t \to \infty}(B_t/h_t) > c\} \le P\{M(r^n) > c\,h(r^{n-1}) \text{ i.o.}\} = 0,$$
which shows that $\limsup_{t \to \infty}(B_t/h_t) \le 1$ a.s.
To prove the reverse inequality, we may write
$$P\{B(r^n) - B(r^{n-1}) > c\,h(r^n)\} \gtrsim n^{-c^2 r/(r-1)} (\log n)^{-1/2}, \quad n \in \mathbb{N}.$$
Taking $c = \{(r - 1)/r\}^{1/2}$, we get by the Borel-Cantelli lemma
$$\limsup_{t \to \infty} \frac{B_t - B_{t/r}}{h_t} \ge \limsup_{n \to \infty} \frac{B(r^n) - B(r^{n-1})}{h(r^n)} \ge \Bigl(\frac{r - 1}{r}\Bigr)^{1/2} \text{ a.s.}$$
The upper bound obtained earlier yields $\limsup_{t \to \infty}(-B_{t/r}/h_t) \le r^{-1/2}$,
and combining the two estimates gives
$$\limsup_{t \to \infty} \frac{B_t}{h_t} \ge (1 - r^{-1})^{1/2} - r^{-1/2} \text{ a.s.}$$
Here we may finally let $r \to \infty$ to obtain $\limsup_{t \to \infty}(B_t/h_t) \ge 1$ a.s. □
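The Gaussian tail asymptotics used at the start of the preceding proof, $\int_u^\infty e^{-x^2/2}dx \sim u^{-1}e^{-u^2/2}$, can be examined numerically through the identity $\int_u^\infty e^{-x^2/2}dx = \sqrt{\pi/2}\,\mathrm{erfc}(u/\sqrt{2})$. The sketch below relies on `math.erfc` from the Python standard library; the function names are hypothetical.

```python
import math

def gaussian_tail_integral(u):
    """Integral of exp(-x^2/2) over [u, infinity), via the complementary
    error function."""
    return math.sqrt(math.pi / 2.0) * math.erfc(u / math.sqrt(2.0))

def tail_ratio(u):
    """Exact tail integral divided by the approximation exp(-u^2/2) / u."""
    return gaussian_tail_integral(u) / (math.exp(-u * u / 2.0) / u)

for u in (1.0, 2.0, 5.0, 10.0):
    print(u, tail_ratio(u))
```

The ratio stays below 1 and increases toward 1 as $u$ grows, consistent with the asymptotic equivalence. The convergence is only of order $u^{-2}$, so the approximation is poor for moderate $u$.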
In the proof of Theorem 13.5 we constructed a Brownian motion $B$ from
an isonormal Gaussian process $\eta$ on $L^2(\mathbb{R}_+, \lambda)$ such that $B_t = \eta 1_{[0,t]}$ a.s.
for all $t \ge 0$. If instead we are starting from a Brownian motion $B$ on
$\mathbb{R}_+$, the existence of an associated isonormal Gaussian process $\eta$ may be
inferred from Theorem 6.10. Since every function $h \in L^2(\mathbb{R}_+, \lambda)$ can be
approximated by simple step functions, as in the proof of Lemma 1.35, we
note that the random variables $\eta h$ are a.s. unique. We shall see how they
can also be constructed directly from $B$ as suitable Wiener integrals $\int h\,dB$.
As already noted, the latter fail to exist in the pathwise Stieltjes sense, and
so a different approach is needed.
As a first step, we may consider the class $\mathcal{S}$ of simple step functions of
the form
$$h_t = \sum_{j \le n} a_j 1_{(t_{j-1}, t_j]}(t), \quad t \ge 0,$$
where $n \in \mathbb{Z}_+$, $0 = t_0 < \cdots < t_n$, and $a_1, \dots, a_n \in \mathbb{R}$. For such integrands
$h$, we may define the integral in the obvious way as
$$\eta h = \int_0^\infty h_t\,dB_t = Bh = \sum_{j \le n} a_j (B_{t_j} - B_{t_{j-1}}).$$
Here $\eta h$ is clearly centered Gaussian with variance
$$E(\eta h)^2 = \sum_{j \le n} a_j^2 (t_j - t_{j-1}) = \int_0^\infty h_t^2\,dt = \|h\|^2,$$
where $\|h\|$ denotes the norm in $L^2(\mathbb{R}_+, \lambda)$. Thus, the integration $h \mapsto \eta h =
\int h\,dB$ defines a linear isometry from $\mathcal{S} \subset L^2(\mathbb{R}_+, \lambda)$ into $L^2(\Omega, P)$.
Since $\mathcal{S}$ is dense in $L^2(\mathbb{R}_+, \lambda)$, we may extend the integral by continuity
to a linear isometry $h \mapsto \eta h = \int h\,dB$ from $L^2(\lambda)$ to $L^2(P)$. Here $\eta h$ is
again centered Gaussian for every $h \in L^2(\lambda)$, and by linearity the whole
process $h \mapsto \eta h$ is then Gaussian. By a polarization argument it is also
clear that the integration preserves inner products, in the sense that
$$E(\eta h\,\eta k) = \int_0^\infty h_t k_t\,dt = \langle h, k \rangle, \quad h, k \in L^2(\lambda).$$
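For a step function the Wiener integral is a finite sum of scaled Brownian increments, so both the integral and its variance $\|h\|^2$ can be computed directly. The following sketch samples $\eta h$ for one hypothetical step function and compares the empirical second moment with $\sum_j a_j^2 (t_j - t_{j-1})$. The function names, coefficients, and sample size are illustrative assumptions.

```python
import math
import random

def wiener_integral_step(a, t, rng):
    """Sample eta(h) = sum_j a_j (B_{t_j} - B_{t_{j-1}}) for the step function
    h = sum_j a_j 1_{(t_{j-1}, t_j]}, where t = [t_0, ..., t_n] and t_0 = 0."""
    total = 0.0
    for j in range(1, len(t)):
        # The increment B_{t_j} - B_{t_{j-1}} is N(0, t_j - t_{j-1}).
        increment = rng.gauss(0.0, math.sqrt(t[j] - t[j - 1]))
        total += a[j - 1] * increment
    return total

def step_norm_sq(a, t):
    """Squared L^2 norm ||h||^2 = sum_j a_j^2 (t_j - t_{j-1})."""
    return sum(a[j - 1] ** 2 * (t[j] - t[j - 1]) for j in range(1, len(t)))

a = [2.0, -1.0, 0.5]           # values a_1, a_2, a_3
t = [0.0, 0.5, 1.5, 3.0]       # division points t_0 < t_1 < t_2 < t_3
rng = random.Random(0)
samples = [wiener_integral_step(a, t, rng) for _ in range(20000)]
second_moment = sum(x * x for x in samples) / len(samples)
print(step_norm_sq(a, t), second_moment)
```

The two printed numbers should agree up to Monte Carlo error, which is the isometry $E(\eta h)^2 = \|h\|^2$ on the class of step functions.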
We shall consider two general ways of representing stationary Gaussian
processes in terms of Wiener integrals $\eta h$. Here a complex notation is convenient.
By a complex-valued, isonormal Gaussian process on a (real) Hilbert
space $H$ we mean a process $\zeta = \xi + i\eta$ on $H$ such that $\xi$ and $\eta$ are independent,
real-valued, isonormal Gaussian processes on $H$. For any $f = g + ih$
with $g, h \in H$, we define $\zeta f = \xi g - \eta h + i(\xi h + \eta g)$.
Now let $X$ be a stationary, centered Gaussian process on $\mathbb{R}$ with
covariance function $r_t = E X_s X_{s+t}$, $s, t \in \mathbb{R}$. We know that $r$ is nonnegative
definite, and it is further continuous whenever $X$ is continuous
in probability. In that case Bochner's theorem yields a unique spectral
representation
$$r_t = \int_{-\infty}^\infty e^{itx}\,\mu(dx), \quad t \in \mathbb{R},$$
where the spectral measure $\mu$ is a bounded, symmetric measure on $\mathbb{R}$.
The following result gives a similar spectral representation of the process
$X$ itself. By a different argument, the result extends to suitable non-Gaussian
processes. As usual, we assume that the basic probability space
is rich enough to support the required randomization variables.
Proposition 13.19 (spectral representation, Stone, Cramér) Let $X$ be an
$L^2$-continuous, stationary, centered Gaussian process on $\mathbb{R}$ with spectral
measure $\mu$. Then there exists a complex, isonormal Gaussian process $\zeta$ on
$L^2(\mu)$ such that
$$X_t = \Re \int_{-\infty}^\infty e^{itx}\,d\zeta_x \text{ a.s.}, \quad t \in \mathbb{R}. \tag{5}$$
Proof: Denoting the right-hand side of (5) by $Y$, we may compute
$$E Y_s Y_t = E \int (\cos sx\,d\xi_x - \sin sx\,d\eta_x) \int (\cos tx\,d\xi_x - \sin tx\,d\eta_x)$$
$$= \int (\cos sx \cos tx + \sin sx \sin tx)\,\mu(dx)$$
$$= \int \cos(s - t)x\,\mu(dx) = \int e^{i(s-t)x}\,\mu(dx) = r_{s-t}.$$
Since both $X$ and $Y$ are centered Gaussian, Lemma 13.1 shows that $Y \stackrel{d}{=} X$.
Now both $X$ and $\zeta$ are continuous and defined on the separable spaces
$L^2(X)$ and $L^2(\mu)$, and so they may be regarded as random elements in suitable
Polish spaces. The a.s. representation in (5) then follows by Theorem
6.10. □
Another useful representation may be obtained under suitable regularity
conditions on the spectral measure $\mu$.
Proposition 13.20 (moving average representation) Let $X$ be an $L^2$-continuous,
stationary, centered Gaussian process on $\mathbb{R}$ with absolutely
continuous spectral measure $\mu$. Then there exist an isonormal Gaussian
process $\eta$ on $L^2(\mathbb{R}, \lambda)$ and a function $f \in L^2(\lambda)$ such that
$$X_t = \int_{-\infty}^\infty f_{t-s}\,d\eta_s \text{ a.s.}, \quad t \in \mathbb{R}. \tag{6}$$
Proof: Fix a symmetric density $g \ge 0$ of $\mu$, and define $h = g^{1/2}$. Then
$h \in L^2(\lambda)$, and we may introduce the Fourier transform in the sense of
Plancherel,
$$f_s = \hat{h}_s = (2\pi)^{-1/2} \lim_{a \to \infty} \int_{-a}^a e^{isx} h_x\,dx, \quad s \in \mathbb{R}, \tag{7}$$
which is again real valued and square integrable. For each $t \in \mathbb{R}$ the function
$k_x = e^{-itx} h_x$ has Fourier transform $\hat{k}_s = f_{s-t}$, and so by Parseval's relation
$$r_t = \int_{-\infty}^\infty e^{itx} h_x^2\,dx = \int_{-\infty}^\infty h_x \bar{k}_x\,dx = \int_{-\infty}^\infty f_s f_{s-t}\,ds. \tag{8}$$
Now consider any isonormal Gaussian process $\eta$ on $L^2(\lambda)$. For $f$ as in (7),
we may define a process $Y$ on $\mathbb{R}$ by the right-hand side of (6). Using (8),
we get $E Y_s Y_{s+t} = r_t$ for arbitrary $s, t \in \mathbb{R}$, and so $Y \stackrel{d}{=} X$ by Lemma 13.1.
Again an appeal to Theorem 6.10 yields the desired a.s. representation of
$X$. □
For an example, we may consider a moving average representation of
the stationary Ornstein-Uhlenbeck process. Then introduce an isonormal
Gaussian process $\eta$ on $L^2(\mathbb{R}, \lambda)$ and define
$$X_t = \int_{-\infty}^t e^{s-t}\,d\eta_s, \quad t \ge 0.$$
The process $X$ is clearly centered Gaussian, and we get
$$r_{s,t} = E X_s X_t = \int_{-\infty}^{s \wedge t} e^{u-s} e^{u-t}\,du = \tfrac12 e^{-|s-t|}, \quad s, t \in \mathbb{R},$$
as desired. The Markov property of $X$ follows most easily from the fact
that
$$X_t = e^{s-t} X_s + \int_s^t e^{u-t}\,d\eta_u, \quad s \le t.$$
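The covariance integral for the moving average above evaluates in closed form to $\tfrac12 e^{-|s-t|}$, and this can be cross-checked by a simple quadrature. The sketch below uses a midpoint rule with the lower limit truncated at a large negative value; the function name, the truncation point, and the number of subintervals are assumptions chosen for illustration.

```python
import math

def ou_cov_numeric(s, t, lower=-40.0, n=40000):
    """Midpoint-rule approximation of the integral of e^{u-s} e^{u-t} over
    u in (-infinity, min(s, t)], with the lower limit truncated at `lower`."""
    upper = min(s, t)
    du = (upper - lower) / n
    total = 0.0
    for i in range(n):
        u = lower + (i + 0.5) * du   # midpoint of the i-th subinterval
        total += math.exp(u - s) * math.exp(u - t)
    return total * du

for s, t in [(0.0, 0.0), (0.3, 1.2), (2.0, 0.5)]:
    # Numerical value next to the closed form (1/2) * exp(-|s - t|).
    print(s, t, ou_cov_numeric(s, t), 0.5 * math.exp(-abs(s - t)))
```

The truncation is harmless here because the integrand decays like $e^{2u}$ as $u \to -\infty$.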
We proceed to introduce multiple integrals $I_n = \eta^{\otimes n}$ with respect to an
isonormal Gaussian process $\eta$ on a separable (infinite-dimensional) Hilbert
space $H$. Without loss of generality, we may take $H$ to be of the form
$L^2(S, \mu)$. Then $H^{\otimes n}$ can be identified with $L^2(S^n, \mu^{\otimes n})$, where $\mu^{\otimes n}$ denotes
the $n$-fold product measure $\mu \otimes \cdots \otimes \mu$, and the tensor product $\bigotimes_{k \le n} h_k =
h_1 \otimes \cdots \otimes h_n$ of the elements $h_1, \dots, h_n \in H$ is equivalent to the function
$h_1(t_1) \cdots h_n(t_n)$ on $S^n$. Recall that for any ONB $e_1, e_2, \dots$ in $H$, the tensor
products $\bigotimes_{j \le n} e_{k_j}$ with arbitrary $k_1, \dots, k_n \in \mathbb{N}$ form an ONB in $H^{\otimes n}$.
We may now state the basic existence and uniqueness result for the
integrals $I_n$.
Theorem 13.21 (multiple stochastic integrals, Wiener, Itô) Let $\eta$ be an
isonormal Gaussian process on some separable Hilbert space $H$. Then for
every $n \in \mathbb{N}$ there exists a unique continuous linear mapping $I_n\colon H^{\otimes n} \to
L^2(P)$ such that a.s.
$$I_n \bigotimes_{k \le n} h_k = \prod_{k \le n} \eta h_k, \quad h_1, \dots, h_n \in H \text{ orthogonal}.$$
Here the uniqueness means that $I_n h$ is a.s. unique for every $h$, and the
linearity means that $I_n(af + bg) = aI_n f + bI_n g$ a.s. for any $a, b \in \mathbb{R}$ and
$f, g \in H^{\otimes n}$. Note in particular that $I_1 h = \eta h$ a.s. For consistency, we define
$I_0$ as the identity mapping on $\mathbb{R}$.
For the proof we may clearly assume that $H = L^2([0,1], \lambda)$. Let $\mathcal{E}_n$ denote
the class of elementary functions of the form
$$f = \sum_{j \le m} c_j \bigotimes_{k \le n} 1_{A_j^k}, \tag{9}$$
where the sets $A_j^1, \dots, A_j^n \in \mathcal{B}[0,1]$ are disjoint for each $j \in \{1, \dots, m\}$.
The indicator functions $1_{A_j^k}$ are then orthogonal for fixed $j$, and we need
to take
$$I_n f = \sum_{j \le m} c_j \prod_{k \le n} \eta A_j^k, \tag{10}$$
where $\eta A = \eta 1_A$. From the linearity in each factor it is clear that the value
of $I_n f$ is independent of the choice of representation (9) for $f$.
To extend the definition of $I_n$ to the entire space $L^2(\mathbb{R}_+^n, \lambda^{\otimes n})$, we need
two lemmas. For any function $f$ on $\mathbb{R}_+^n$, we introduce the symmetrization
$$\tilde{f}(t_1, \dots, t_n) = (n!)^{-1} \sum_p f(t_{p_1}, \dots, t_{p_n}), \quad t_1, \dots, t_n \in \mathbb{R}_+,$$
where the summation extends over all permutations $p$ of $\{1, \dots, n\}$. The
following result gives the basic $L^2$-structure, which later carries over to the
general integrals.
Lemma 13.22 (isometry) The elementary integrals $I_n f$ in (10) are
orthogonal for different $n$ and satisfy
$$E(I_n f)^2 = n!\,\|\tilde{f}\|^2 \le n!\,\|f\|^2, \quad f \in \mathcal{E}_n. \tag{11}$$
Proof: The second relation in (11) follows from Minkowski's inequality.
To prove the remaining assertions, we may first reduce to the case when all
sets $A_j^k$ are chosen from some fixed collection of disjoint sets $B_1, B_2, \dots$.
For any finite index sets $J \ne K$ in $\mathbb{N}$, we note that
$$E \prod_{j \in J} \eta B_j \prod_{k \in K} \eta B_k = \prod_{j \in J \cap K} E(\eta B_j)^2 \prod_{j \in J \triangle K} E\eta B_j = 0.$$
This proves the asserted orthogonality. Since clearly $\langle f, g \rangle = 0$ when $f$
and $g$ involve different index sets, it also reduces the proof of the isometry
in (11) to the case when all terms in $f$ involve the same sets $B_1, \dots, B_n$,
though in possibly different order. Since $I_n f = I_n \tilde{f}$, we may further assume
that $f = \bigotimes_k 1_{B_k}$. But then
$$E(I_n f)^2 = \prod_k E(\eta B_k)^2 = \prod_k \lambda B_k = \|f\|^2 = n!\,\|\tilde{f}\|^2,$$
where the last relation holds since, in the present case, the permutations
of $f$ are orthogonal. □
To extend the integral, we need to show that the elementary functions
are dense in $L^2(\lambda^{\otimes n})$.
Lemma 13.23 (approximation) The set $\mathcal{E}_n$ is dense in $L^2(\lambda^{\otimes n})$.
Proof: By a standard argument based on monotone convergence and a
monotone class argument, any function $f \in L^2(\lambda^{\otimes n})$ can be approximated
by linear combinations of products $\bigotimes_{k \le n} 1_{A_k}$, and so it is enough to approximate
functions $f$ of the latter type. Then divide $[0, 1]$ for each $m$ into
$2^m$ intervals $B_{mj}$ of length $2^{-m}$, and define
$$f_m = f \sum_{j_1, \dots, j_n} \bigotimes_{k \le n} 1_{B_{m, j_k}}, \tag{12}$$
where the summation extends over all collections of distinct indices
$j_1, \dots, j_n \in \{1, \dots, 2^m\}$. Here $f_m \in \mathcal{E}_n$ for each $m$, and the sum in
(12) tends to 1 a.e. $\lambda^{\otimes n}$. Thus, by dominated convergence $f_m \to f$ in
$L^2(\lambda^{\otimes n})$. □
By the last two lemmas, $I_n$ is defined as a uniformly continuous mapping
on a dense subset of $L^2(\lambda^{\otimes n})$, and so it extends by continuity to all of
$L^2(\lambda^{\otimes n})$, with preservation of both the linearity and the norm relations
in (11). To complete the proof of Theorem 13.21, it remains to show that
$I_n \bigotimes_{k \le n} h_k = \prod_k \eta h_k$ for any orthogonal functions $h_1, \dots, h_n \in L^2(\lambda)$.
This is an immediate consequence of the following lemma, where for any
$f \in L^2(\lambda^{\otimes n})$ and $g \in L^2(\lambda)$ we write
$$(f \otimes_1 g)(t_1, \dots, t_{n-1}) = \int f(t_1, \dots, t_n)\, g(t_n)\,dt_n.$$
Lemma 13.24 (recursion) For any $f \in L^2(\lambda^{\otimes n})$ and $g \in L^2(\lambda)$ with
$n \in \mathbb{N}$, we have
$$I_{n+1}(\tilde{f} \otimes g) = I_n f \cdot \eta g - n I_{n-1}(\tilde{f} \otimes_1 g). \tag{13}$$
Proof: By Fubini's theorem and the Cauchy-Buniakowski inequality,
$$\|\tilde{f} \otimes g\| = \|\tilde{f}\|\,\|g\|, \qquad \|\tilde{f} \otimes_1 g\| \le \|\tilde{f}\|\,\|g\| \le \|f\|\,\|g\|.$$
Hence, the two sides of (13) are continuous in probability in both $f$ and
$g$, and it is enough to prove the formula for $f \in \mathcal{E}_n$ and $g \in \mathcal{E}_1$. By the
linearity of each side we may next reduce to the case when $f = \bigotimes_{k \le n} 1_{A_k}$
and $g = 1_A$, where $A_1, \dots, A_n$ are disjoint and either $A \cap \bigcup_k A_k = \emptyset$ or
$A = A_1$. In the former case we have $\tilde{f} \otimes_1 g = 0$, so (13) is immediate from
the definitions. In the latter case, (13) becomes
$$I_{n+1}(A^2 \times A_2 \times \cdots \times A_n) = \{(\eta A)^2 - \lambda A\}\,\eta A_2 \cdots \eta A_n. \tag{14}$$
Approximating $1_{A^2}$ as in Lemma 13.23 by functions $f_m \in \mathcal{E}_2$ with support
in $A^2$, it is clear that the left-hand side equals $I_2 A^2\,\eta A_2 \cdots \eta A_n$. This reduces
the proof of (14) to the two-dimensional version $I_2 A^2 = (\eta A)^2 - \lambda A$.
To prove the latter, we may divide $A$ for each $m$ into $2^m$ subsets $B_{mj}$ of
measure $\le 2^{-m}$, and note as in Theorem 13.9 and Lemma 13.23 that
$$(\eta A)^2 = \sum_i (\eta B_{mi})^2 + \sum_{i \ne j} \eta B_{mi}\,\eta B_{mj} \to \lambda A + I_2 A^2 \text{ in } L^2. \qquad \Box$$
The last lemma will be used to derive an explicit representation of the
integrals $I_n$ in terms of the Hermite polynomials $p_0, p_1, \dots$. The latter are
defined as orthogonal polynomials of degrees $0, 1, \dots$ with respect to the
standard Gaussian distribution on $\mathbb{R}$. This condition determines each $p_n$ up
to a normalization, which we choose for convenience such that the leading
coefficient becomes 1. The first few polynomials are then
$$p_0(x) = 1, \quad p_1(x) = x, \quad p_2(x) = x^2 - 1, \quad p_3(x) = x^3 - 3x, \quad \dots.$$
Theorem 13.25 (orthogonal representation, Itô) On a separable Hilbert
space $H$, let $\eta$ be an isonormal Gaussian process with associated multiple
Wiener-Itô integrals $I_1, I_2, \dots$. Then for any orthonormal elements
$e_1, \dots, e_m \in H$ and integers $n_1, \dots, n_m \ge 1$ with sum $n$, we have
$$I_n \bigotimes_{j \le m} e_j^{\otimes n_j} = \prod_{j \le m} p_{n_j}(\eta e_j).$$
Using the linearity of $I_n$ and writing $\hat{h} = h/\|h\|$, we see that the stated
formula is equivalent to the factorization
$$I_n \bigotimes_{j \le m} h_j^{\otimes n_j} = \prod_{j \le m} I_{n_j} h_j^{\otimes n_j}, \quad h_1, \dots, h_m \in H \text{ orthogonal}, \tag{15}$$
together with the representation of the individual factors
$$I_n h^{\otimes n} = \|h\|^n\, p_n(\eta \hat{h}), \quad h \in H \setminus \{0\}. \tag{16}$$
Proof: We prove (15) by induction on $n$. Then assume the relation
to hold for all integrals up to order $n$, fix any orthonormal elements
$h, h_1, \dots, h_m \in H$ and integers $k, n_1, \dots, n_m \in \mathbb{N}$ with sum $n + 1$, and
write $f = \bigotimes_{j \le m} h_j^{\otimes n_j}$. By Lemma 13.24 and the induction hypothesis,
$$I_{n+1}(f \otimes h^{\otimes k}) = I_n(f \otimes h^{\otimes(k-1)}) \cdot \eta h - (k - 1)\, I_{n-1}(f \otimes h^{\otimes(k-2)})$$
$$= (I_{n-k+1} f)\,\{I_{k-1} h^{\otimes(k-1)} \cdot \eta h - (k - 1)\, I_{k-2} h^{\otimes(k-2)}\}$$
$$= I_{n-k+1} f \cdot I_k h^{\otimes k}.$$
Using the induction hypothesis again, we obtain the desired extension to
$I_{n+1}$.
It remains to prove (16) for an arbitrary element $h \in H$ with $\|h\| = 1$.
Then conclude from Lemma 13.24 that
$$I_{n+1} h^{\otimes(n+1)} = I_n h^{\otimes n} \cdot \eta h - n I_{n-1} h^{\otimes(n-1)}, \quad n \in \mathbb{N}.$$
Since $I_0 1 = 1$ and $I_1 h = \eta h$, we see by induction that $I_n h^{\otimes n}$ is a polynomial
in $\eta h$ of degree $n$ and with leading coefficient 1. By the definition of Hermite
polynomials, it remains to show that the integrals $I_n h^{\otimes n}$ for different $n$ are
orthogonal, which holds by Lemma 13.22. □
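Combined with (16), the recursion in the last part of the proof says that the Hermite polynomials satisfy $p_{n+1}(x) = x\,p_n(x) - n\,p_{n-1}(x)$. The sketch below builds the coefficients from this recursion and checks orthogonality exactly, using the Gaussian moments $E\xi^{2k} = (2k-1)!!$. The function names and the coefficient-list representation (lowest degree first) are illustrative choices.

```python
def hermite_coefficients(n_max):
    """Coefficient lists (lowest degree first) of p_0, ..., p_{n_max}, built
    from p_{n+1}(x) = x p_n(x) - n p_{n-1}(x) with p_0 = 1, p_1 = x."""
    polys = [[1]]
    if n_max >= 1:
        polys.append([0, 1])
    for n in range(1, n_max):
        shifted = [0] + polys[n]  # coefficients of x * p_n(x)
        lower = polys[n - 1] + [0] * (len(shifted) - len(polys[n - 1]))
        polys.append([c - n * d for c, d in zip(shifted, lower)])
    return polys

def gaussian_moment(k):
    """E[xi^k] for xi ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2 == 1:
        return 0
    result = 1
    for j in range(1, k, 2):
        result *= j
    return result

def gaussian_inner_product(p, q):
    """E[p(xi) q(xi)], computed exactly from the Gaussian moments."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p) for j, b in enumerate(q))

polys = hermite_coefficients(5)
print(polys[2], polys[3])  # [-1, 0, 1] and [0, -3, 0, 1]
print([gaussian_inner_product(polys[n], polys[n]) for n in range(6)])
```

The diagonal inner products come out as $n!$, which matches the isometry $E(I_n h^{\otimes n})^2 = n!\,\|h^{\otimes n}\|^2$ of Lemma 13.22 for $\|h\| = 1$.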
Given an isonormal Gaussian process $\eta$ on some separable Hilbert space
$H$, we introduce the space $L^2(\eta) = L^2(\Omega, \sigma\{\eta\}, P)$ of $\eta$-measurable random
variables $\xi$ with $E\xi^2 < \infty$. The $n$th polynomial chaos $\mathcal{P}_n$ is defined as the
closed linear subspace generated by all polynomials of degree $\le n$ in the
random variables $\eta h$, $h \in H$. We also introduce for every $n \in \mathbb{Z}_+$ the $n$th
homogeneous chaos $\mathcal{H}_n$, consisting of all integrals $I_n f$, $f \in H^{\otimes n}$.
The relationship between the mentioned spaces is clarified by the following
result. As usual, we write $\oplus$ and $\ominus$ for direct sums and orthogonal
complements, respectively.
Theorem 13.26 (chaos expansion, Wiener) On a separable Hilbert space
$H$, let $\eta$ be an isonormal Gaussian process with associated polynomial and
homogeneous chaoses $\mathcal{P}_n$ and $\mathcal{H}_n$, respectively. Then the $\mathcal{H}_n$ are orthogonal,
closed, linear subspaces of $L^2(\eta)$, satisfying
$$\mathcal{P}_n = \bigoplus_{k=0}^n \mathcal{H}_k, \quad n \in \mathbb{Z}_+; \qquad L^2(\eta) = \bigoplus_{n=0}^\infty \mathcal{H}_n. \tag{17}$$
Furthermore, every $\xi \in L^2(\eta)$ has a unique a.s. representation $\xi = \sum_n I_n f_n$
with symmetric elements $f_n \in H^{\otimes n}$, $n \ge 0$.
In particular, we note that $\mathcal{H}_0 = \mathcal{P}_0 = \mathbb{R}$ and
$$\mathcal{H}_n = \mathcal{P}_n \ominus \mathcal{P}_{n-1}, \quad n \in \mathbb{N}.$$
Proof: The properties in Lemma 13.22 extend to arbitrary integrands,
and so the spaces $\mathcal{H}_n$ are mutually orthogonal, closed, linear subspaces of
$L^2(\eta)$. From Lemma 13.23 or Theorem 13.25 we see that also $\mathcal{H}_n \subset \mathcal{P}_n$.
Conversely, let $\xi$ be an nth-degree polynomial in the variables $\eta h$. We may
then choose some orthonormal elements $e_1, \dots, e_m \in H$ such that $\xi$ is an
nth-degree polynomial in $\eta e_1, \dots, \eta e_m$. Since any power $(\eta e_j)^k$ is a linear
combination of the variables $p_0(\eta e_j), \dots, p_k(\eta e_j)$, Theorem 13.25 shows
that $\xi$ is a linear combination of multiple integrals $I_k f$ with $k \le n$, which
means that $\xi \in \bigoplus_{k \le n} \mathcal{H}_k$. This proves the first relation in (17).
To prove the second relation, let $\xi \in L^2(\eta) \ominus \bigoplus_n \mathcal{H}_n$. In particular,
$\xi \perp (\eta h)^n$ for every $h \in H$ and $n \in \mathbb{Z}_+$. Since $\sum_n |\eta h|^n / n! = e^{|\eta h|} \in L^2$,
the series $e^{i\eta h} = \sum_n (i\eta h)^n / n!$ converges in $L^2$, and we get $\xi \perp e^{i\eta h}$ for
every $h \in H$. By the linearity of the integral $\eta h$, we hence obtain for any
$h_1, \dots, h_n \in H$, $n \in \mathbb{N}$,
$$E\Big[\xi \exp \sum_{k \le n} i u_k \eta h_k\Big] = 0, \quad u_1, \dots, u_n \in \mathbb{R}.$$
Applying the uniqueness theorem for characteristic functions to the
distributions of $(\eta h_1, \dots, \eta h_n)$ under the bounded measures $\mu^{\pm} = E[\xi^{\pm}; \cdot\,]$, we
may conclude that
$$E[\xi; (\eta h_1, \dots, \eta h_n) \in B] = 0, \quad B \in \mathcal{B}(\mathbb{R}^n).$$
By a monotone class argument, this extends to $E[\xi; A] = 0$ for arbitrary
$A \in \sigma\{\eta\}$, and since $\xi$ is $\eta$-measurable, it follows that $\xi = E[\xi \mid \eta] = 0$ a.s.
The proof of (17) is then complete.
In particular, any element $\xi \in L^2(\eta)$ has an orthogonal expansion
$$\xi = \sum_{n \ge 0} I_n f_n = \sum_{n \ge 0} I_n \tilde f_n,$$
for some elements $f_n \in H^{\otimes n}$ with symmetric versions $\tilde f_n$, $n \in \mathbb{Z}_+$. Now
assume that also $\xi = \sum_n I_n g_n$. Projecting onto $\mathcal{H}_n$ and using the linearity
of $I_n$, we get $I_n(g_n - \tilde f_n) = 0$. By the isometry in (11) it follows that
$\|g_n - \tilde f_n\| = 0$, and so $g_n = \tilde f_n$. $\Box$
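As a one-dimensional illustration of the orthogonality of the chaoses, take $H = \mathbb{R}$, so that every $\eta h$ is a multiple of a single standard normal variable $Z$. Assuming that the polynomials $p_n$ of Theorem 13.25 are the probabilists' Hermite polynomials $He_n$ (an assumption, since their normalization is not restated in this section), orthogonality of the chaoses reduces to $E\,He_m(Z)He_n(Z) = n!\,\delta_{mn}$. The following Python sketch checks this identity exactly with integer arithmetic; all function names are illustrative.

```python
def hermite(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial He_n."""
    prev, cur = [0], [1]  # He_{-1} := 0, He_0 = 1
    for k in range(n):
        # Recurrence: He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur


def gaussian_moment(j):
    """E[Z^j] for Z ~ N(0, 1): zero for odd j, (j - 1)!! for even j."""
    if j % 2:
        return 0
    result = 1
    for i in range(1, j, 2):
        result *= i
    return result


def inner(p, q):
    """E[p(Z) q(Z)] for polynomials given by coefficient lists."""
    return sum(a * b * gaussian_moment(i + j)
               for i, a in enumerate(p)
               for j, b in enumerate(q))


if __name__ == "__main__":
    for m in range(5):
        print([inner(hermite(m), hermite(n)) for n in range(5)])
```

The printed Gram matrix is diagonal, which is the scalar shadow of the mutual orthogonality of the spaces $\mathcal{H}_n$.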
Exercises
1. Let $\xi_1, \dots, \xi_n$ be i.i.d. $N(m, \sigma^2)$. Show that the random variables $\bar\xi =
n^{-1} \sum_k \xi_k$ and $s^2 = (n-1)^{-1} \sum_k (\xi_k - \bar\xi)^2$ are independent and that
$(n-1)s^2 \overset{d}{=} \sum_{k<n} (\xi_k - m)^2$. (Hint: Use the symmetry in Proposition 13.2, and
no calculations.)
2. For a Brownian motion $B$, put $t_{nk} = k2^{-n}$, and define $\xi_{0,k} = B_k - B_{k-1}$
and $\xi_{nk} = B_{t_{n,2k-1}} - \tfrac12 (B_{t_{n-1,k-1}} + B_{t_{n-1,k}})$, $k, n \ge 1$. Show that the $\xi_{nk}$
are independent Gaussian. Use this fact to construct a Brownian motion
from a sequence of i.i.d. $N(0,1)$ random variables.
3. Let $B$ be a Brownian motion on $[0,1]$, and define $X_t = B_t - tB_1$. Show
that $X \perp\!\!\!\perp B_1$. Use this fact to express the conditional distribution of $B$,
given $B_1$, in terms of a Brownian bridge.
4. Combine the transformations in Lemma 13.6 with the Brownian
scaling $c^{-1}B(c^2 t)$ to construct a family of transformations preserving the
distribution of a Brownian bridge.
5. Show that the Brownian bridge is an inhomogeneous Markov process.
(Hint: Use the transformations in Lemma 13.6 or verify the condition in
Proposition 13.7.)
6. Let $B = (B^1, B^2)$ be a Brownian motion in $\mathbb{R}^2$, and consider some times
$t_{nk}$ as in Theorem 13.9. Show that $\sum_k (B^1_{t_{n,k}} - B^1_{t_{n,k-1}})(B^2_{t_{n,k}} - B^2_{t_{n,k-1}})
\to 0$ in $L^2$ or a.s., respectively. (Hint: Reduce to the case of the quadratic
variation.)
7. Use Theorem 7.27 to construct an rcll version $B$ of Brownian motion.
Then show as in Theorem 13.9 that $B$ has quadratic variation $[B]_t \equiv t$,
and conclude that $B$ is a.s. continuous.
8. For a Brownian motion $B$, show that $\inf\{t > 0;\ B_t > 0\} = 0$ a.s. (Hint:
Conclude from Kolmogorov's 0-1 law that the stated event has probability
0 or 1. Alternatively, use Theorem 13.18.)
9. For a Brownian motion $B$, define $\tau_a = \inf\{t > 0;\ B_t = a\}$. Compute the
density of the distribution of $\tau_a$ for $a \ne 0$, and show that $E\tau_a = \infty$. (Hint:
Use Proposition 13.13.)
10. For a Brownian motion $B$, show that $Z_t = \exp(cB_t - \tfrac12 c^2 t)$ is a
martingale for every $c$. Use optional sampling to compute the Laplace transform
of $\tau_a$ above, and compare with the preceding result.
11. (Paley, Wiener, and Zygmund) Show that Brownian motion $B$ is a.s.
nowhere Lipschitz continuous, and hence nowhere differentiable. (Hint: If $B$
is Lipschitz at $t < 1$, there exist some $K, \delta > 0$ such that $|B_r - B_s| \le 2hK$
for all $r, s \in (t - h, t + h)$ with $h < \delta$. Apply this to three consecutive
n-dyadic intervals $(r, s)$ around $t$.)
12. Refine the preceding argument to show that $B$ is a.s. nowhere Hölder
continuous with exponent $c > \tfrac12$.
13. Show that the local maxima of a Brownian motion are a.s. dense in
$\mathbb{R}$ and that the corresponding times are a.s. dense in $\mathbb{R}_+$. (Hint: Use the
preceding result.)
14. Show by a direct argument that $\limsup_t t^{-1/2} B_t = \infty$ a.s. as $t \to 0$
and $\infty$, where $B$ is a Brownian motion. (Hint: Use Kolmogorov's 0-1 law.)
15. Show that the law of the iterated logarithm for Brownian motion at 0
remains valid for the Brownian bridge.
16. Show for a Brownian motion $B$ in $\mathbb{R}^d$ that the process $|B|$ satisfies the
law of the iterated logarithm at 0 and $\infty$.
17. Let $\xi_1, \xi_2, \dots$ be i.i.d. $N(0,1)$. Show that $\limsup_n (2 \log n)^{-1/2} \xi_n = 1$
a.s.
18. For a Brownian motion $B$, show that $M_t = t^{-1} B_t$ is a reverse
martingale, and conclude that $t^{-1} B_t \to 0$ a.s. and in $L^p$, $p > 0$, as $t \to \infty$. (Hint:
The limit is degenerate by Kolmogorov's 0-1 law.) Deduce the same result
from Theorem 10.9.
19. For a Brownian bridge $B$, show that $M_t = (1 - t)^{-1} B_t$ is a martingale
on $[0,1)$. Check that $M$ is not $L^1$-bounded.
20. Let $I_n$ be the n-fold Wiener-Itô integral w.r.t. Brownian motion $B$ on
$\mathbb{R}_+$. Show that the process $M_t = I_n(1_{[0,t]^n})$ is a martingale. Express $M$ in
terms of $B$, and compute the expression for $n = 1, 2, 3$. (Hint: Use Theorem
13.25.)
21. Let $\eta_1, \dots, \eta_n$ be independent, isonormal Gaussian processes on a
separable Hilbert space $H$. Show that there exists a unique continuous linear
mapping $\bigotimes_k \eta_k$ from $H^{\otimes n}$ to $L^2(P)$ such that $\bigotimes_k \eta_k \bigotimes_k h_k = \prod_k \eta_k h_k$
a.s. for all $h_1, \dots, h_n \in H$. Also show that $\bigotimes_k \eta_k$ is an isometry.
Chapter 14
Skorohod Embedding
and Invariance Principles
Embedding of random variables; approximation of random
walks; functional central limit theorem; laws of the iterated
logarithm; arcsine laws; approximation of renewal processes;
empirical distribution functions; embedding and approximation
of martingales
In Chapter 5 we used analytic methods to derive criteria for a sum of inde-
pendent random variables to be approximately Gaussian. Though this may
remain the easiest approach to the classical limit theorems, the results are
best understood when viewed as consequences of some general approxima-
tion theorems for random processes. The aim of this chapter is to develop
a purely probabilistic technique, the so-called Skorohod embedding, for
deriving such functional limit theorems.
In the simplest setting, we may consider a random walk $(S_n)$ based on
some i.i.d. random variables $\xi_k$ with mean 0 and variance 1. In this case
there exist a Brownian motion $B$ and some optional times $\tau_1 \le \tau_2 \le \cdots$
such that $S_n = B_{\tau_n}$ a.s. for every $n$. For applications it is essential to choose
the $\tau_n$ such that the differences $\Delta\tau_n$ are again i.i.d. with mean one. The
step process $S_{[t]}$ will then be close to the path of $B$, and many results for
Brownian motion carry over, at least approximately, to the random walk.
In particular, the procedure yields versions for random walks of the arcsine
laws and the law of the iterated logarithm.
From the statements for random walks, similar results may be deduced
rather easily for various related processes. In particular, we shall derive a
functional central limit theorem and a law of the iterated logarithm for
renewal processes, and we shall also see how suitably normalized versions
of the empirical distribution functions from an i.i.d. sample can be approxi-
mated by a Brownian bridge. For an extension in another direction, we shall
obtain a version of the Skorohod embedding for general $L^2$-martingales and
show how any suitably time-changed martingale with small jumps can be
approximated by a Brownian motion.
The present exposition depends in many ways on material from previous
chapters. Thus, we rely on the basic theory of Brownian motion, as set
forth in Chapter 13. We also make frequent use of ideas and results from
Chapter 7 on martingales and optional times. Finally, occasional references
are made to Chapter 4 for empirical distributions, to Chapter 6 for the
transfer theorem, to Chapter 9 for random walks and renewal processes,
and to Chapter 12 for the Poisson process.
More general approximations and functional limit theorems are obtained
by different methods in Chapters 15, 16, and 19. We also note the close
relationship between the present approximation result for martingales with
small jumps and the time-change results for continuous local martingales
in Chapter 18.
To clarify the basic ideas, we begin with a detailed discussion of the
classical Skorohod embedding for random walks. The main result in this
context is the following.
Theorem 14.1 (embedding of random walk, Skorohod) Let $\xi_1, \xi_2, \dots$ be
i.i.d. random variables with mean 0, and put $S_n = \xi_1 + \cdots + \xi_n$. Then
there exists a filtered probability space with a Brownian motion $B$ and some
optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that $(B_{\tau_n}) \overset{d}{=} (S_n)$ and the differences
$\Delta\tau_n = \tau_n - \tau_{n-1}$ are i.i.d. with $E\Delta\tau_n = E\xi_1^2$ and $E(\Delta\tau_n)^2 \le 4E\xi_1^4$.
Here the moment requirements on the differences $\Delta\tau_n$ are crucial for
applications. Without those conditions the statement would be trivially
true, since we could then choose $B \perp\!\!\!\perp (\xi_n)$ and define the $\tau_n$ recursively by
$\tau_n = \inf\{t \ge \tau_{n-1};\ B_t = S_n\}$. In that case $E\tau_n = \infty$ unless $\xi_1 = 0$ a.s.
The proof of Theorem 14.1 is based on a sequence of lemmas. First we
exhibit some martingales associated with Brownian motion.
Lemma 14.2 (Brownian martingales) For a Brownian motion $B$, the
processes $B_t$, $B_t^2 - t$, and $B_t^4 - 6tB_t^2 + 3t^2$ are all martingales.
Proof: Note that $EB_t = EB_t^3 = 0$, $EB_t^2 = t$, and $EB_t^4 = 3t^2$. Write $\mathcal{F}$
for the filtration induced by $B$, let $0 \le s \le t$, and recall that the process
$\tilde B_t = B_{s+t} - B_s$ is again a Brownian motion independent of $\mathcal{F}_s$. Hence,
$$E[B_t^2 \mid \mathcal{F}_s] = E[B_s^2 + 2B_s \tilde B_{t-s} + \tilde B_{t-s}^2 \mid \mathcal{F}_s] = B_s^2 + t - s.$$
Moreover,
$$E[B_t^4 \mid \mathcal{F}_s] = E[B_s^4 + 4B_s^3 \tilde B_{t-s} + 6B_s^2 \tilde B_{t-s}^2 + 4B_s \tilde B_{t-s}^3 + \tilde B_{t-s}^4 \mid \mathcal{F}_s] = B_s^4 + 6(t-s)B_s^2 + 3(t-s)^2,$$
and so
$$E[B_t^4 - 6tB_t^2 \mid \mathcal{F}_s] = B_s^4 - 6sB_s^2 + 3(s^2 - t^2). \qquad \Box$$
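The conditional expectations in the proof of Lemma 14.2 can be checked mechanically. Given $B_s = x$, the value $B_t$ is distributed as $x + \sqrt{h}\,Z$ with $h = t - s$ and $Z$ standard normal, so each conditional moment is a finite sum over the Gaussian moments $EZ^j$. The following Python sketch (function names are illustrative, not from the text) evaluates these sums exactly in rational arithmetic and confirms the martingale identities at sample points.

```python
from fractions import Fraction
from math import comb


def gaussian_moment(j):
    """E[Z^j] for Z ~ N(0, 1): zero for odd j, (j - 1)!! for even j."""
    if j % 2:
        return 0
    result = 1
    for i in range(1, j, 2):
        result *= i
    return result


def cond_moment(k, x, h):
    """E[(x + sqrt(h) Z)^k]; only even powers of Z contribute."""
    return sum(comb(k, j) * x ** (k - j) * h ** (j // 2) * gaussian_moment(j)
               for j in range(0, k + 1, 2))


def m2(b, t):
    """Value of the martingale B_t^2 - t at B_t = b."""
    return b ** 2 - t


def m4(b, t):
    """Value of the martingale B_t^4 - 6 t B_t^2 + 3 t^2 at B_t = b."""
    return b ** 4 - 6 * t * b ** 2 + 3 * t ** 2


def cond_m2(x, s, t):
    """E[B_t^2 - t | B_s = x] for s <= t."""
    return cond_moment(2, x, t - s) - t


def cond_m4(x, s, t):
    """E[B_t^4 - 6 t B_t^2 + 3 t^2 | B_s = x] for s <= t."""
    h = t - s
    return cond_moment(4, x, h) - 6 * t * cond_moment(2, x, h) + 3 * t ** 2


if __name__ == "__main__":
    x, s, t = Fraction(3, 2), Fraction(1, 3), Fraction(5, 2)
    print(cond_m2(x, s, t) == m2(x, s), cond_m4(x, s, t) == m4(x, s))
```

A point check of this kind is not a proof, but it guards against sign or coefficient slips in the expansion of $B_t^4$.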
By optional sampling, we may deduce some useful formulas.
Lemma 14.3 (moment relations) Consider a Brownian motion $B$ and an
optional time $\tau$ such that $B^\tau$ is bounded. Then
$$EB_\tau = 0, \qquad E\tau = EB_\tau^2, \qquad E\tau^2 \le 4EB_\tau^4. \tag{1}$$
Proof: By optional stopping and Lemma 14.2, we get for any $t \ge 0$
$$EB_{\tau \wedge t} = 0, \qquad E(\tau \wedge t) = EB_{\tau \wedge t}^2, \tag{2}$$
$$3E(\tau \wedge t)^2 + EB_{\tau \wedge t}^4 = 6E(\tau \wedge t)B_{\tau \wedge t}^2. \tag{3}$$
The first two relations in (1) follow from (2) by dominated and monotone
convergence as $t \to \infty$. In particular, we have $E\tau < \infty$. We may then take
limits even in (3) and conclude by dominated and monotone convergence
together with the Cauchy-Buniakovsky inequality that
$$3E\tau^2 + EB_\tau^4 = 6E\tau B_\tau^2 \le 6(E\tau^2\, EB_\tau^4)^{1/2}.$$
Writing $r = (E\tau^2 / EB_\tau^4)^{1/2}$, we get $3r^2 + 1 \le 6r$. Thus, $3(r-1)^2 \le 2$, and
finally, $r \le 1 + (2/3)^{1/2} < 2$. $\Box$
The next result shows how an arbitrary distribution with mean 0 can
be expressed as a mixture of centered two-point distributions. For any
$a \le 0 \le b$, let $\nu_{a,b}$ denote the unique probability measure on $\{a, b\}$ with
mean 0. Clearly, $\nu_{a,b} = \delta_0$ when $ab = 0$, and otherwise
$$\nu_{a,b} = \frac{b\delta_a - a\delta_b}{b - a}, \quad a < 0 < b.$$
It is easy to verify that $\nu$ is a probability kernel from $\mathbb{R}_- \times \mathbb{R}_+$ to $\mathbb{R}$. For
mappings between two measure spaces, measurability is defined in terms
of the $\sigma$-fields generated by all evaluation maps $\pi_B : \mu \mapsto \mu B$, where $B$ is
an arbitrary set in the underlying $\sigma$-field.
Lemma 14.4 (randomization) For any distribution $\mu$ on $\mathbb{R}$ with mean
zero, there exists a distribution $\tilde\mu$ on $\mathbb{R}_- \times \mathbb{R}_+$ with $\mu = \int \tilde\mu(dx\,dy)\,\nu_{x,y}$,
and we can choose $\tilde\mu$ to be a measurable function of $\mu$.
Proof (Chung): Let $\mu_\pm$ denote the restrictions of $\mu$ to $\mathbb{R}_\pm \setminus \{0\}$, define
$l(x) \equiv x$, and put $c = \int l\,d\mu_+ = -\int l\,d\mu_-$. For any measurable function
$f : \mathbb{R} \to \mathbb{R}_+$ with $f(0) = 0$, we get
$$c \int f\,d\mu = \int l\,d\mu_+ \int f\,d\mu_- - \int l\,d\mu_- \int f\,d\mu_+ = \iint (y - x)\,\mu_-(dx)\,\mu_+(dy) \int f\,d\nu_{x,y},$$
and so we may take
$$\tilde\mu(dx\,dy) = \mu\{0\}\,\delta_{0,0}(dx\,dy) + c^{-1}(y - x)\,\mu_-(dx)\,\mu_+(dy).$$
The measurability of the mapping $\mu \mapsto \tilde\mu$ is clear by a monotone class
argument, once we note that $\tilde\mu(A \times B)$ is a measurable function of $\mu$ for
arbitrary $A, B \in \mathcal{B}(\mathbb{R})$. $\Box$
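For a distribution with finite support, the randomization of Lemma 14.4 can be computed by hand. The following Python sketch builds the mixing measure from the explicit formula in the proof and checks that mixing the two-point laws recovers the original distribution. It uses exact rational arithmetic; the function names and the dictionary representation of measures are choices made for this illustration.

```python
from fractions import Fraction


def two_point(a, b):
    """The centered two-point law on {a, b} for a <= 0 <= b, as a dict point -> mass."""
    a, b = Fraction(a), Fraction(b)
    if a * b == 0:
        return {Fraction(0): Fraction(1)}
    return {a: b / (b - a), b: -a / (b - a)}


def randomize(mu):
    """Mixing measure of Lemma 14.4 for a finitely supported, centered mu.

    mu maps points to masses; the result maps pairs (x, y), x <= 0 <= y, to masses.
    """
    mu = {Fraction(x): Fraction(p) for x, p in mu.items()}
    assert sum(mu.values()) == 1
    assert sum(x * p for x, p in mu.items()) == 0
    neg = {x: p for x, p in mu.items() if x < 0}
    pos = {y: p for y, p in mu.items() if y > 0}
    c = sum(y * p for y, p in pos.items())  # c = integral of l over mu_+
    tilde = {}
    if mu.get(Fraction(0), 0) > 0:
        tilde[(Fraction(0), Fraction(0))] = mu[Fraction(0)]
    for x, p in neg.items():
        for y, q in pos.items():
            tilde[(x, y)] = (y - x) * p * q / c
    return tilde


def mixture(tilde):
    """Integrate the two-point laws against the mixing measure."""
    out = {}
    for (x, y), w in tilde.items():
        for z, p in two_point(x, y).items():
            out[z] = out.get(z, 0) + w * p
    return out


if __name__ == "__main__":
    quarter = Fraction(1, 4)
    mu = {-2: quarter, -1: quarter, 1: quarter, 2: quarter}
    print(randomize(mu))
    print(mixture(randomize(mu)))
```

For the uniform law on $\{-2, -1, 1, 2\}$ the mixing measure puts mass $(y - x)/12$ on each pair $(x, y)$ with $x \in \{-2, -1\}$ and $y \in \{1, 2\}$. The mean of $-xy$ under this measure is $5/2$, the variance of the original law, in agreement with the identity $\int z^2\,\nu_{x,y}(dz) = -xy$.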
The embedding in Theorem 14.1 will now be constructed recursively,
beginning with the first random variable $\xi_1$.
Lemma 14.5 (embedding of random variable) For any probability
measure $\mu$ on $\mathbb{R}$ with mean 0, consider a random pair $(\alpha, \beta)$ with distribution
$\tilde\mu$ as in Lemma 14.4, and let $B$ be an independent Brownian motion.
Then the time $\tau = \inf\{t \ge 0;\ B_t \in \{\alpha, \beta\}\}$ is optional for the filtration
$\mathcal{F}_t = \sigma\{\alpha, \beta;\ B_s,\ s \le t\}$, and we have
$$\mathcal{L}(B_\tau) = \mu, \qquad E\tau = \int x^2 \mu(dx), \qquad E\tau^2 \le 4 \int x^4 \mu(dx).$$
Proof: The process $B$ is clearly an $\mathcal{F}$-Brownian motion, and $\tau$ is
$\mathcal{F}$-optional as in Lemma 7.6 (ii). Using Lemma 14.3 and Fubini's theorem
gives
$$\mathcal{L}(B_\tau) = E\,P[B_\tau \in \cdot \mid \alpha, \beta] = E\nu_{\alpha,\beta} = \mu,$$
$$E\tau = E\,E[\tau \mid \alpha, \beta] = E \int x^2 \nu_{\alpha,\beta}(dx) = \int x^2 \mu(dx),$$
$$E\tau^2 = E\,E[\tau^2 \mid \alpha, \beta] \le 4E \int x^4 \nu_{\alpha,\beta}(dx) = 4 \int x^4 \mu(dx). \qquad \Box$$
Proof of Theorem 14.1: Let $\mu$ be the common distribution of the $\xi_n$.
Introduce a Brownian motion $B$ and some independent i.i.d. pairs $(\alpha_n, \beta_n)$,
$n \in \mathbb{N}$, with the distribution $\tilde\mu$ of Lemma 14.4. Define recursively the
random times $0 = \tau_0 \le \tau_1 \le \cdots$ by
$$\tau_n = \inf\{t \ge \tau_{n-1};\ B_t - B_{\tau_{n-1}} \in \{\alpha_n, \beta_n\}\}, \quad n \in \mathbb{N}.$$
Here each $\tau_n$ is clearly optional for the filtration $\mathcal{F}_t = \sigma\{\alpha_k, \beta_k,\ k \ge 1;\ B^t\}$,
$t \ge 0$, and $B$ is an $\mathcal{F}$-Brownian motion. By the strong Markov property at
$\tau_n$, the process $B^{(n)}_t = B_{\tau_n + t} - B_{\tau_n}$ is then a Brownian motion independent
of $\mathcal{G}_n = \sigma\{\tau_k, B_{\tau_k};\ k \le n\}$. Since moreover $(\alpha_{n+1}, \beta_{n+1}) \perp\!\!\!\perp (B^{(n)}, \mathcal{G}_n)$, we
obtain $(\alpha_{n+1}, \beta_{n+1}, B^{(n)}) \perp\!\!\!\perp \mathcal{G}_n$, and so the pairs $(\Delta\tau_n, \Delta B_{\tau_n})$ are i.i.d. The
remaining assertions now follow by Lemma 14.5. $\Box$
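The construction can be tried out in a discrete analogue, where the Brownian motion is replaced by a simple symmetric random walk and the target law is supported on the integers. This substitution is made only so that exit times can be simulated exactly. For integers $a < 0 < b$ the walk started at 0 exits $(a, b)$ at $b$ with probability $-a/(b-a)$ and has mean exit time $-ab$, matching the two-point law and the first-moment identity of Lemma 14.5, though the fourth-moment bound is specific to Brownian motion. The sketch below embeds one step with the uniform law on $\{-2, -1, 1, 2\}$, whose mixing measure from Lemma 14.4 puts weight $(y - x)/12$ on the pairs listed. The names and the seed are arbitrary.

```python
import random

# Mixing measure for the uniform law on {-2, -1, 1, 2}:
# weight proportional to y - x on each pair (x, y).
PAIRS = [(-2, 1), (-2, 2), (-1, 1), (-1, 2)]
WEIGHTS = [3, 4, 2, 3]


def embed_step(rng):
    """Draw a pair (a, b), then run a simple random walk from 0 until it hits a or b.

    Returns the exit position and the number of steps taken.
    """
    a, b = rng.choices(PAIRS, weights=WEIGHTS, k=1)[0]
    pos, steps = 0, 0
    while pos != a and pos != b:
        pos += 1 if rng.random() < 0.5 else -1
        steps += 1
    return pos, steps


def simulate(n, seed=0):
    """Empirical law of the embedded value and mean embedding time over n runs."""
    rng = random.Random(seed)
    counts, total = {}, 0
    for _ in range(n):
        value, steps = embed_step(rng)
        counts[value] = counts.get(value, 0) + 1
        total += steps
    return {v: c / n for v, c in counts.items()}, total / n


if __name__ == "__main__":
    freqs, mean_time = simulate(20000, seed=1)
    print(sorted(freqs.items()), mean_time)
```

The empirical frequencies should be close to $1/4$ each and the mean embedding time close to the variance $5/2$ of the target law, up to Monte Carlo error.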
The last theorem enables us to approximate the entire random walk by
a Brownian motion. As before, we assume the underlying probability space
to be rich enough to support the required randomization variables.
Theorem 14.6 (approximation of random walk, Skorohod, Strassen) Let
$\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean 0 and variance 1, and write
$S_n = \xi_1 + \cdots + \xi_n$. Then there exists a Brownian motion $B$ such that
$$t^{-1/2} \sup_{s \le t} |S_{[s]} - B_s| \overset{P}{\to} 0, \quad t \to \infty, \tag{4}$$
$$\lim_{t \to \infty} \frac{S_{[t]} - B_t}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.} \tag{5}$$
The proof of (5) requires the following estimate.
Lemma 14.7 (rate of continuity) For a Brownian motion $B$ in $\mathbb{R}$, we
have
$$\lim_{r \downarrow 1} \limsup_{t \to \infty} \sup_{t \le u \le rt} \frac{|B_u - B_t|}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.}$$
Proof: Write $h(t) = (2t \log\log t)^{1/2}$. It is enough to show that
$$\lim_{r \downarrow 1} \limsup_{n \to \infty} \sup_{r^n \le t \le r^{n+1}} \frac{|B_t - B_{r^n}|}{h(r^n)} = 0 \quad \text{a.s.} \tag{6}$$
Proceeding as in the proof of Theorem 13.18, we get as $n \to \infty$ for fixed
$r > 1$ and $c > 0$
$$P\Big\{\sup_{t \in [r^n, r^{n+1}]} |B_t - B_{r^n}| > c\,h(r^n)\Big\} \lesssim P\{B(r^n(r-1)) > c\,h(r^n)\} \lesssim n^{-c^2/(r-1)} (\log n)^{-1/2}.$$
(As before, $a \lesssim b$ means that $a \le cb$ for some constant $c > 0$.) If $c^2 > r - 1$,
it is clear from the Borel-Cantelli lemma that the lim sup in (6) is a.s.
bounded by $c$, and the relation follows as we let $r \to 1$. $\Box$
For the main proof, we need to introduce the modulus of continuity
$$w(f, t, h) = \sup_{r, s \le t,\ |r - s| \le h} |f_r - f_s|, \quad t, h > 0.$$
Proof of Theorem 14.6: By Theorems 6.10 and 14.1 we may choose a
Brownian motion $B$ and some optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that
$S_n = B_{\tau_n}$ a.s. for all $n$, and the differences $\tau_n - \tau_{n-1}$ are i.i.d. with mean
1. Then $\tau_n / n \to 1$ a.s. by the law of large numbers, and so $\tau_{[t]} / t \to 1$ a.s.
Relation (5) now follows by Lemma 14.7.
Next define
$$\delta_t = \sup_{s \le t} |\tau_{[s]} - s|, \quad t \ge 0,$$
and note that the a.s. convergence $\tau_n / n \to 1$ implies $\delta_t / t \to 0$ a.s. Fix any
$t, h, \varepsilon > 0$, and conclude by the scaling property of $B$ that
$$P\Big\{t^{-1/2} \sup_{s \le t} |B_{\tau_{[s]}} - B_s| > \varepsilon\Big\} \le P\{w(B, t + th, th) > \varepsilon t^{1/2}\} + P\{\delta_t > th\} = P\{w(B, 1 + h, h) > \varepsilon\} + P\{t^{-1}\delta_t > h\}.$$
Here the right-hand side tends to zero as $t \to \infty$ and then $h \to 0$, and (4)
follows. $\Box$
As an immediate application of the last theorem, we may extend the law
of the iterated logarithm to suitable random walks.
Corollary 14.8 (law of the iterated logarithm, Hartman and Wintner)
Let $\xi_1, \xi_2, \dots$ be i.i.d. random variables with mean 0 and variance 1, and
define $S_n = \xi_1 + \cdots + \xi_n$. Then
$$\limsup_{n \to \infty} \frac{S_n}{\sqrt{2n \log\log n}} = 1 \quad \text{a.s.}$$
Proof: Combine Theorems 13.18 and 14.6. $\Box$
To derive a weak convergence result, let $D[0,1]$ denote the space of all
functions on $[0,1]$ that are right-continuous with left-hand limits (rcll). For
our present needs, it is convenient to equip $D[0,1]$ with the norm $\|x\| =
\sup_t |x_t|$ and the $\sigma$-field $\mathcal{D}$ generated by all evaluation maps $\pi_t : x \mapsto x_t$.
The norm is clearly $\mathcal{D}$-measurable, and so the same thing is true for the
open balls $B_{x,r} = \{y;\ \|x - y\| < r\}$, $x \in D[0,1]$, $r > 0$. (However, $\mathcal{D}$ is
strictly smaller than the Borel $\sigma$-field induced by the norm.)
Given a process $X$ with paths in $D[0,1]$ and a mapping $f : D[0,1] \to \mathbb{R}$,
we say that $f$ is a.s. continuous at $X$ if $X \notin D_f$ a.s., where $D_f$ is the set of
functions $x \in D[0,1]$ where $f$ is discontinuous. (The measurability of $D_f$
is irrelevant here, provided that we interpret the condition in the sense of
inner measure.)
We may now state a functional version of the classical central limit
theorem.
Theorem 14.9 (functional central limit theorem, Donsker) Let $\xi_1, \xi_2, \dots$
be i.i.d. random variables with mean 0 and variance 1, and define
$$X^n_t = n^{-1/2} \sum_{k \le nt} \xi_k, \quad t \in [0,1],\ n \in \mathbb{N}.$$
Consider a Brownian motion $B$ on $[0,1]$, and let $f : D[0,1] \to \mathbb{R}$ be
measurable and a.s. continuous at $B$. Then $f(X^n) \overset{d}{\to} f(B)$.
The result follows immediately from Theorem 14.6 together with the
following lemma.
Lemma 14.10 (approximation and convergence) Let $X_1, X_2, \dots$ and
$Y_1, Y_2, \dots$ be rcll processes on $[0,1]$ with $Y_n \overset{d}{=} Y_1 \equiv Y$ for all $n$ and
$\|X_n - Y_n\| \overset{P}{\to} 0$, and let $f : D[0,1] \to \mathbb{R}$ be measurable and a.s. continuous
at $Y$. Then $f(X_n) \overset{d}{\to} f(Y)$.
Proof: Put $T = \mathbb{Q} \cap [0,1]$. By Theorem 6.10 there exist some processes
$X'_n$ on $T$ such that $(X'_n, Y) \overset{d}{=} (X_n, Y_n)$ on $T$ for all $n$. Then each $X'_n$ is a.s.
bounded and has finitely many upcrossings of any nondegenerate interval,
and so the process $\tilde X_n(t) = X'_n(t+)$ exists a.s. with paths in $D[0,1]$. From
the right continuity of paths it is also clear that $(\tilde X_n, Y) \overset{d}{=} (X_n, Y_n)$ on
$[0,1]$ for every $n$.
To obtain the desired convergence, we note that $\|\tilde X_n - Y\| \overset{d}{=} \|X_n - Y_n\|
\overset{P}{\to} 0$, and hence $f(X_n) \overset{d}{=} f(\tilde X_n) \overset{P}{\to} f(Y)$ as in Lemma 4.3. $\Box$
In particular, we may recover the central limit theorem in Proposition
5.9 by taking $f(x) = x_1$ in Theorem 14.9. We may also obtain results that
go beyond the classical theory, such as for the choice $f(x) = \sup_t |x_t|$. As a
less obvious application, we shall see how the arcsine laws of Theorem 13.16
can be extended to suitable random walks. Recall that a random variable
$\xi$ is said to be arcsine distributed if $\xi \overset{d}{=} \sin^2 \alpha$, where $\alpha$ is $U(0, 2\pi)$.
Theorem 14.11 (arcsine laws, Erdős and Kac, Sparre-Andersen) Let
$(S_n)$ be a random walk based on some distribution $\mu$ with mean 0 and
variance 1, and define for $n \in \mathbb{N}$
$$\tau^1_n = n^{-1} \sum_{k \le n} 1\{S_k > 0\},$$
$$\tau^2_n = n^{-1} \min\{k \ge 0;\ S_k = \max_{j \le n} S_j\},$$
$$\tau^3_n = n^{-1} \max\{k \le n;\ S_k S_n \le 0\}.$$
Then $\tau^i_n \overset{d}{\to} \tau$ for $i = 1, 2, 3$, where $\tau$ is arcsine distributed. The results for
$i = 1, 2$ remain valid for any nondegenerate, symmetric distribution $\mu$.
For the proof, we consider on $D[0,1]$ the functionals
$$f_1(x) = \lambda\{t \in [0,1];\ x_t > 0\},$$
$$f_2(x) = \inf\{t \in [0,1];\ x_t \vee x_{t-} = \sup_{s \le 1} x_s\},$$
$$f_3(x) = \sup\{t \in [0,1];\ x_t x_1 \le 0\}.$$
The following result is elementary.
Lemma 14.12 (continuity of functionals) The functionals $f_i$ are
measurable. Furthermore, $f_1$ is continuous at $x$ iff $\lambda\{t;\ x_t = 0\} = 0$, $f_2$ is
continuous at $x$ iff $x_t \vee x_{t-}$ has a unique maximum, and $f_3$ is continuous
at $x$ if 0 is not a local extreme of $x_t$ or $x_{t-}$ on $(0,1]$.
Proof of Theorem 14.11: Clearly, $\tau^i_n = f_i(X^n)$ for $n \in \mathbb{N}$ and $i = 1, 2, 3$,
where
$$X^n_t = n^{-1/2} S_{[nt]}, \quad t \in [0,1],\ n \in \mathbb{N}.$$
To prove the first assertion, it suffices by Theorems 13.16 and 14.9 to show
that each $f_i$ is a.s. continuous at $B$. Thus, we need to verify that $B$ a.s.
satisfies the conditions in Lemma 14.12. For $f_1$ this is obvious, since by
Fubini's theorem
$$E\lambda\{t \le 1;\ B_t = 0\} = \int_0^1 P\{B_t = 0\}\,dt = 0.$$
The conditions for $f_2$ and $f_3$ follow easily from Lemma 13.15.
To prove the last assertion, it is enough to consider $\tau^1_n$, since $\tau^2_n$ has
the same distribution by Corollary 11.14. Then introduce an independent
Brownian motion $B$ and define
$$\sigma^\varepsilon_n = n^{-1} \sum_{k \le n} 1\{\varepsilon B_k + (1 - \varepsilon)S_k > 0\}, \quad n \in \mathbb{N},\ \varepsilon \in (0,1].$$
By the first assertion together with Theorem 9.11 and Corollary 11.14, we
have $\sigma^\varepsilon_n \overset{d}{=} \sigma^1_n \overset{d}{\to} \tau$. Since $P\{S_n = 0\} \to 0$, e.g. by Theorem 4.17, we also
note that
$$\limsup_{\varepsilon \to 0} |\sigma^\varepsilon_n - \tau^1_n| \le n^{-1} \sum_{k \le n} 1\{S_k = 0\} \overset{P}{\to} 0.$$
Hence, we may choose some constants $\varepsilon_n \to 0$ with $\sigma^{\varepsilon_n}_n - \tau^1_n \overset{P}{\to} 0$, and by
Theorem 4.28 we get $\tau^1_n \overset{d}{\to} \tau$. $\Box$
Theorem 14.9 is often referred to as an invariance principle, because the
limiting distribution of $f(X^n)$ is the same for all i.i.d. sequences $(\xi_k)$ with
mean 0 and variance 1. This fact is often useful for applications, since a
direct computation may be possible for some special choice of distribution,
such as for $P\{\xi_k = \pm 1\} = \tfrac12$.
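To illustrate such a direct computation, consider the simple symmetric random walk and the last time before $2n$ at which $S_k S_{2n} \le 0$, which for this walk is the last visit to 0. A classical combinatorial identity, quoted here without proof and not taken from this chapter, states that this time equals $2k$ with probability $u_{2k} u_{2n-2k}$, where $u_{2k} = \binom{2k}{k} 4^{-k}$ is the probability of a return to 0 at time $2k$. The Python sketch below evaluates the resulting distribution function and compares it with the arcsine distribution function $x \mapsto (2/\pi) \arcsin \sqrt{x}$; the function names are illustrative.

```python
from math import asin, pi, sqrt


def return_probs(n):
    """u[k] = P{S_{2k} = 0} = C(2k, k) / 4^k for k = 0, ..., n, computed recursively."""
    u = [1.0]
    for k in range(1, n + 1):
        u.append(u[-1] * (2 * k - 1) / (2 * k))
    return u


def last_zero_cdf(n, x):
    """P{L <= 2 n x}, where L is the last zero of the walk up to time 2n."""
    u = return_probs(n)
    return sum(u[k] * u[n - k] for k in range(n + 1) if k <= x * n)


def arcsine_cdf(x):
    """Distribution function of sin^2(alpha) for alpha uniform on (0, 2 pi)."""
    return 2 / pi * asin(sqrt(x))


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, last_zero_cdf(500, x), arcsine_cdf(x))
```

Already for $2n = 1000$ steps the two distribution functions agree closely, as the third arcsine law predicts for any step distribution with mean 0 and variance 1.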
The approximation Theorem 14.6 yields a corresponding result for
renewal processes, regarded here as nondecreasing step processes.
Theorem 14.13 (approximation of renewal processes) Let $N$ be a renewal
process based on some distribution $\mu$ with mean 1 and variance $\sigma^2 \in (0, \infty)$.
Then there exists a Brownian motion $B$ such that
$$t^{-1/2} \sup_{s \le t} |N_s - s - \sigma B_s| \overset{P}{\to} 0, \quad t \to \infty, \tag{7}$$
$$\lim_{t \to \infty} \frac{N_t - t - \sigma B_t}{\sqrt{2t \log\log t}} = 0 \quad \text{a.s.} \tag{8}$$
Proof: Let $\tau_0, \tau_1, \dots$ be the renewal times of $N$, and introduce the random
walk $S_n = n - \tau_n + \tau_0$, $n \in \mathbb{Z}_+$. Choosing a Brownian motion $B$ as in
Theorem 14.6, we get
$$\lim_{n \to \infty} \frac{N_{\tau_n} - \tau_n - \sigma B_n}{\sqrt{2n \log\log n}} = \lim_{n \to \infty} \frac{S_n - \sigma B_n}{\sqrt{2n \log\log n}} = 0 \quad \text{a.s.}$$
Since $\tau_n \sim n$ a.s. by the law of large numbers, we may replace $n$ in the
denominator by $\tau_n$, and by Lemma 14.7 we may further replace $B_n$ by $B_{\tau_n}$.
Hence,
$$\frac{N_t - t - \sigma B_t}{\sqrt{2t \log\log t}} \to 0 \quad \text{a.s. along } (\tau_n).$$
Invoking Lemma 14.7, we see that (8) will follow if we can only show that
$$\frac{\tau_{n+1} - \tau_n}{\sqrt{2\tau_n \log\log \tau_n}} \to 0 \quad \text{a.s.}$$
This may be seen most easily from Theorem 14.6.
From Theorem 14.6 we see that also
$$n^{-1/2} \sup_{k \le n} |N_{\tau_k} - \tau_k - \sigma B_k| = n^{-1/2} \sup_{k \le n} |S_k - \tau_0 - \sigma B_k| \overset{P}{\to} 0,$$
and by Brownian scaling,
$$n^{-1/2} w(B, n, 1) \overset{d}{=} w(B, 1, n^{-1}) \to 0.$$
To get (7), it is then enough to show that
$$n^{-1/2} \sup_{k \le n} |\tau_k - \tau_{k-1} - 1| = n^{-1/2} \sup_{k \le n} |S_k - S_{k-1}| \overset{P}{\to} 0,$$
which is again clear from Theorem 14.6. $\Box$
We may now proceed as in Corollary 14.8 and Theorem 14.9 to deduce
an associated law of the iterated logarithm and a weak convergence result.
Corollary 14.14 (limits of renewal processes) Let $N$ be a renewal process
based on some distribution $\mu$ with mean 1 and variance $\sigma^2 < \infty$. Then
$$\limsup_{t \to \infty} \frac{\pm(N_t - t)}{\sqrt{2t \log\log t}} = \sigma \quad \text{a.s.}$$
If $B$ is a Brownian motion and
$$X^r_t = \frac{N_{rt} - rt}{\sigma r^{1/2}}, \quad t \in [0,1],\ r > 0,$$
then also $f(X^r) \overset{d}{\to} f(B)$ as $r \to \infty$ for any measurable function $f :
D[0,1] \to \mathbb{R}$ that is a.s. continuous at $B$.
The weak convergence part of the last corollary yields a similar result
for the empirical distribution functions associated with a sequence of i.i.d.
random variables. In this case the asymptotic behavior can be expressed in
terms of a Brownian bridge.
Theorem 14.15 (approximation of empirical distribution functions) Let
$\xi_1, \xi_2, \dots$ be i.i.d. random variables with distribution function $F$ and
empirical distribution functions $\hat F_1, \hat F_2, \dots$. Then there exist some Brownian
bridges $B^1, B^2, \dots$ such that
$$\sup_x \big|n^{1/2}(\hat F_n(x) - F(x)) - B^n \circ F(x)\big| \overset{P}{\to} 0, \quad n \to \infty. \tag{9}$$
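The normalized deviation appearing in (9) is easy to evaluate for a concrete sample. In the uniform case $F(t) = t$ on $[0,1]$, the supremum of $|F_n(t) - t|$ over $t$ is attained at a jump of the empirical distribution function, so it suffices to compare $F_n$ with $F$ just before and at each order statistic. The following Python sketch (the function name is illustrative) computes $n^{1/2} \sup_t |F_n(t) - t|$ in this way; by (9), its distribution for large $n$ is close to that of the supremum of the absolute value of a Brownian bridge.

```python
from math import sqrt


def scaled_sup_deviation(sample):
    """Return sqrt(n) * sup_t |F_n(t) - t| for a sample of values in [0, 1]."""
    xs = sorted(sample)
    n = len(xs)
    dev = 0.0
    for i, x in enumerate(xs, start=1):
        # F_n jumps from (i - 1) / n to i / n at the i-th order statistic x,
        # so the largest deviation near x is one of these two gaps.
        dev = max(dev, i / n - x, x - (i - 1) / n)
    return sqrt(n) * dev


if __name__ == "__main__":
    print(scaled_sup_deviation([0.7, 0.1, 0.4]))
```

For the sample $\{0.1, 0.4, 0.7\}$ the largest gap is $1 - 0.7 = 0.3$, attained at the top order statistic, so the function returns $0.3\sqrt{3}$.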
Proof: Arguing as in the proof of Proposition 4.24, we may reduce the
discussion to the case when the $\xi_n$ are $U(0,1)$, and $F(t) \equiv t$ on $[0,1]$. Then
clearly
$$n^{1/2}(\hat F_n(t) - F(t)) = n^{-1/2} \sum_{k \le n} (1\{\xi_k \le t\} - t), \quad t \in [0,1].$$
Now introduce for each $n$ an independent Poisson random variable $\kappa_n$ with
mean $n$, and conclude from Proposition 12.4 that $N^n_t = \sum_{k \le \kappa_n} 1\{\xi_k \le t\}$
is a homogeneous Poisson process on $[0,1]$ with rate $n$. By Theorem 14.13
there exist some Brownian motions $W^n$ on $[0,1]$ with
$$\sup_{t \le 1} \big|n^{-1/2}(N^n_t - nt) - W^n_t\big| \overset{P}{\to} 0.$$
For the associated Brownian bridges $B^n_t = W^n_t - tW^n_1$, we get
$$\sup_{t \le 1} \big|n^{-1/2}(N^n_t - tN^n_1) - B^n_t\big| \overset{P}{\to} 0.$$
To deduce (9), it is enough to show that
$$n^{-1/2} \sup_{t \le 1} \Big|\sum_{k \le |\kappa_n - n|} (1\{\xi_k \le t\} - t)\Big| \overset{P}{\to} 0. \tag{10}$$
Here $|\kappa_n - n| \overset{P}{\to} \infty$, e.g. by Proposition 5.9, and so (10) holds by Proposition
4.24 with $n^{1/2}$ replaced by $|\kappa_n - n|$. It remains to note that $n^{-1/2}|\kappa_n - n|$
is tight, since $E(\kappa_n - n)^2 = n$. $\Box$
Our next aim is to establish martingale versions of the Skorohod
embedding Theorem 14.1 and the associated approximation Theorem 14.6.
Theorem 14.16 (embedding of martingales) Let $(M_n)$ be a martingale
with $M_0 = 0$ and induced filtration $(\mathcal{G}_n)$. Then there exist a Brownian
motion $B$ and some associated optional times $0 = \tau_0 \le \tau_1 \le \cdots$ such that
$M_n = B_{\tau_n}$ a.s. for all $n$ and
$$E[\Delta\tau_n \mid \mathcal{F}_{n-1}] = E[(\Delta M_n)^2 \mid \mathcal{G}_{n-1}], \tag{11}$$
$$E[(\Delta\tau_n)^2 \mid \mathcal{F}_{n-1}] \le 4E[(\Delta M_n)^4 \mid \mathcal{G}_{n-1}], \tag{12}$$
where $(\mathcal{F}_n)$ denotes the filtration induced by the pairs $(M_n, \tau_n)$.
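Proof: Let $\mu_1, \mu_2, \dots$ be probability kernels satisfying
$$P[\Delta M_n \in \cdot \mid \mathcal{G}_{n-1}] = \mu_n(M_1, \dots, M_{n-1};\ \cdot) \quad \text{a.s.}, \quad n \in \mathbb{N}. \tag{13}$$
Since the $M_n$ form a martingale, we may assume that $\mu_n(x;\ \cdot)$ has mean 0
for all $x \in \mathbb{R}^{n-1}$. Define the associated measures $\tilde\mu_n(x;\ \cdot)$ on $\mathbb{R}^2$ as in Lemma
14.4, and conclude from the measurability part of the lemma that $\tilde\mu_n$ is a
probability kernel from $\mathbb{R}^{n-1}$ to $\mathbb{R}^2$. Next choose some measurable functions
$f_n : \mathbb{R}^n \to \mathbb{R}^2$ as in Lemma 3.22 such that $f_n(x, \vartheta)$ has distribution $\tilde\mu_n(x, \cdot)$
when $\vartheta$ is $U(0,1)$.
Now fix any Brownian motion $B'$ and some independent i.i.d. $U(0,1)$
random variables $\vartheta_1, \vartheta_2, \dots$. Take $\tau'_0 = 0$, and recursively define the random
variables $\alpha_n$, $\beta_n$, and $\tau'_n$, $n \in \mathbb{N}$, through the relations
$$(\alpha_n, \beta_n) = f_n(B'_{\tau'_1}, \dots, B'_{\tau'_{n-1}}, \vartheta_n), \tag{14}$$
$$\tau'_n = \inf\{t \ge \tau'_{n-1};\ B'_t - B'_{\tau'_{n-1}} \in \{\alpha_n, \beta_n\}\}. \tag{15}$$
Since $B'$ is a Brownian motion for the filtration $\mathcal{B}_t = \sigma\{(B')^t, (\vartheta_n)\}$, $t \ge 0$,
and each $\tau'_n$ is $\mathcal{B}$-optional, the strong Markov property shows that $B^{(n)}_t =
B'_{\tau'_n + t} - B'_{\tau'_n}$ is again a Brownian motion independent of $\mathcal{F}'_n = \sigma\{\tau'_k, B'_{\tau'_k};\
k \le n\}$. Since also $\vartheta_{n+1} \perp\!\!\!\perp (B^{(n)}, \mathcal{F}'_n)$, we have $(B^{(n)}, \vartheta_{n+1}) \perp\!\!\!\perp \mathcal{F}'_n$. Writing
$\mathcal{G}'_n = \sigma\{B'_{\tau'_k};\ k \le n\}$, it follows easily that
$$(\Delta\tau'_{n+1}, \Delta B'_{\tau'_{n+1}}) \perp\!\!\!\perp_{\mathcal{G}'_n} \mathcal{F}'_n. \tag{16}$$
By (14) and Theorem 6.4 we have
$$P[(\alpha_n, \beta_n) \in \cdot \mid \mathcal{G}'_{n-1}] = \tilde\mu_n(B'_{\tau'_1}, \dots, B'_{\tau'_{n-1}};\ \cdot). \tag{17}$$
Since also $B^{(n-1)} \perp\!\!\!\perp (\alpha_n, \beta_n, \mathcal{G}'_{n-1})$, we have $B^{(n-1)} \perp\!\!\!\perp_{\mathcal{G}'_{n-1}} (\alpha_n, \beta_n)$, and
$B^{(n-1)}$ is conditionally a Brownian motion. Applying Lemma 14.5 to the
conditional distributions given $\mathcal{G}'_{n-1}$, we get by (15), (16), and (17)
$$P[\Delta B'_{\tau'_n} \in \cdot \mid \mathcal{G}'_{n-1}] = \mu_n(B'_{\tau'_1}, \dots, B'_{\tau'_{n-1}};\ \cdot), \tag{18}$$
$$E[\Delta\tau'_n \mid \mathcal{F}'_{n-1}] = E[\Delta\tau'_n \mid \mathcal{G}'_{n-1}] = E[(\Delta B'_{\tau'_n})^2 \mid \mathcal{G}'_{n-1}], \tag{19}$$
$$E[(\Delta\tau'_n)^2 \mid \mathcal{F}'_{n-1}] = E[(\Delta\tau'_n)^2 \mid \mathcal{G}'_{n-1}] \le 4E[(\Delta B'_{\tau'_n})^4 \mid \mathcal{G}'_{n-1}]. \tag{20}$$
Comparing (13) and (18) gives $(B'_{\tau'_n}) \overset{d}{=} (M_n)$. By Theorem 6.10 we may
then choose a Brownian motion $B$ with associated optional times $\tau_1, \tau_2, \dots$
such that
$$\{B, (M_n), (\tau_n)\} \overset{d}{=} \{B', (B'_{\tau'_n}), (\tau'_n)\}.$$
All a.s. relations between the objects on the right, including also their
conditional expectations given any induced $\sigma$-fields, remain valid for the
objects on the left. In particular, $M_n = B_{\tau_n}$ a.s. for all $n$, and relations
(19) and (20) imply the corresponding formulas (11) and (12). $\Box$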
We may use the last theorem to show how martingales with small jumps
can be approximated by a Brownian motion. For martingales $M$ on $\mathbb{Z}_+$,
we then introduce the quadratic variation $[M]$ and predictable quadratic
variation $\langle M \rangle$, given by
$$[M]_n = \sum_{k \le n} (\Delta M_k)^2, \qquad \langle M \rangle_n = \sum_{k \le n} E[(\Delta M_k)^2 \mid \mathcal{F}_{k-1}].$$
Continuous-time versions of those processes are considered in Chapters 17
and 26.
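Theorem 14.17 (approximation of martingales with small jumps) For
each $n \in \mathbb{N}$, let $M^n$ be an $\mathcal{F}^n$-martingale on $\mathbb{Z}_+$ with $M^n_0 = 0$ and $|\Delta M^n_k| \le
1$, and assume that $\sup_k |\Delta M^n_k| \overset{P}{\to} 0$. Define
$$X^n_t = \sum_k \Delta M^n_k\, 1\{[M^n]_k \le t\}, \quad t \in [0,1],\ n \in \mathbb{N},$$
and put $\zeta_n = [M^n]_\infty$. Then $(X^n - B^n)^*_{\zeta_n \wedge 1} \overset{P}{\to} 0$ for some Brownian motions
$B^n$. This remains true with $[M^n]$ replaced by $\langle M^n \rangle$, and we may also replace
the condition $\sup_k |\Delta M^n_k| \overset{P}{\to} 0$ by
$$\sum_k P[|\Delta M^n_k| > \varepsilon \mid \mathcal{F}^n_{k-1}] \overset{P}{\to} 0, \quad \varepsilon > 0. \tag{21}$$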
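The time change in the definition of the processes $X^n$ can be made concrete for a single path. Given the increments of a discrete-time martingale, the quadratic variation is the running sum of squared increments, and the time-changed process at time $t$ adds up exactly those increments whose quadratic variation index has not yet exceeded $t$. The following Python sketch (function names are illustrative) implements both steps for a finite list of increments.

```python
from itertools import accumulate


def quadratic_variation(increments):
    """Running sums [M]_k of the squared increments of M."""
    return list(accumulate(d * d for d in increments))


def time_changed(increments, t):
    """Sum of the increments of M over all k with [M]_k <= t."""
    qv = quadratic_variation(increments)
    return sum(d for d, q in zip(increments, qv) if q <= t)


if __name__ == "__main__":
    inc = [0.5, -0.5, 0.5, 0.5]
    print(quadratic_variation(inc))
    print([time_changed(inc, t) for t in (0.1, 0.5, 0.75, 1.0)])
```

With increments of size $1/2$ the quadratic variation advances in steps of $1/4$, so the time-changed path jumps at the times $1/4, 1/2, 3/4, 1$. This is the clock on which the path is compared with a Brownian motion.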
For the proof, we need to show that the time scales given by the sequences
$(\tau^n_k)$, $[M^n]$, and $\langle M^n \rangle$ are asymptotically equivalent.
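Lemma 14.18 (time-scale comparison) Assume in Theorem 14.17 that
$M^n_k = B^n(\tau^n_k)$ a.s. for some Brownian motions $B^n$ and associated optional
times $\tau^n_k$ as in Theorem 14.16. Put $\kappa^n_t = \inf\{k;\ [M^n]_k > t\}$. Then as
$n \to \infty$ for fixed $t > 0$, we have
$$\sup_{k \le \kappa^n_t} \big(|\tau^n_k - [M^n]_k| \vee |[M^n]_k - \langle M^n \rangle_k|\big) \overset{P}{\to} 0. \tag{22}$$
Proof: By optional stopping, we may assume that $[M^n]$ is uniformly
bounded and take the supremum in (22) over all $k$. To handle the second
difference in (22), we note that $D^n = [M^n] - \langle M^n \rangle$ is a martingale for
each $n$. Using the martingale property, Proposition 7.16, and dominated
convergence, we get
$$E(D^n)^{*2} \lesssim \sup_k E(D^n_k)^2 = \sum_k E(\Delta D^n_k)^2 = \sum_k E\,E[(\Delta D^n_k)^2 \mid \mathcal{F}^n_{k-1}] \le \sum_k E\,E[(\Delta [M^n]_k)^2 \mid \mathcal{F}^n_{k-1}] = E \sum_k (\Delta M^n_k)^4 \lesssim E \sup_k (\Delta M^n_k)^2 \to 0,$$
and so $(D^n)^* \overset{P}{\to} 0$. This clearly remains true if each sequence $\langle M^n \rangle$ is
defined in terms of the filtration $\mathcal{G}^n$ induced by $M^n$.
To complete the proof of (22), it is enough to show, for the latter versions
of $\langle M^n \rangle$, that $(\tau^n - \langle M^n \rangle)^* \overset{P}{\to} 0$. Then let $\mathcal{T}^n$ denote the filtration induced
by the pairs $(M^n_k, \tau^n_k)$, $k \in \mathbb{N}$, and conclude from (11) that
$$\langle M^n \rangle_m = \sum_{k \le m} E[\Delta\tau^n_k \mid \mathcal{T}^n_{k-1}], \quad m, n \in \mathbb{N}.$$
Hence, $\tilde D^n = \tau^n - \langle M^n \rangle$ is a $\mathcal{T}^n$-martingale. Using (11) and (12), we then
get as before
$$E(\tilde D^n)^{*2} \lesssim \sup_k E(\tilde D^n_k)^2 = \sum_k E\,E[(\Delta \tilde D^n_k)^2 \mid \mathcal{T}^n_{k-1}] \le \sum_k E\,E[(\Delta\tau^n_k)^2 \mid \mathcal{T}^n_{k-1}] \lesssim \sum_k E\,E[(\Delta M^n_k)^4 \mid \mathcal{G}^n_{k-1}] = E \sum_k (\Delta M^n_k)^4 \lesssim E \sup_k (\Delta M^n_k)^2 \to 0. \qquad \Box$$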
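The sufficiency of (21) is a consequence of the following simple estimate.
Lemma 14.19 (Dvoretzky) For any filtration $\mathcal{F}$ on $\mathbb{Z}_+$ and sets $A_n \in \mathcal{F}_n$,
$n \in \mathbb{N}$, we have
$$P \bigcup_n A_n \le P\Big\{\sum_n P[A_n \mid \mathcal{F}_{n-1}] > \varepsilon\Big\} + \varepsilon, \quad \varepsilon > 0.$$
Proof: Write $\xi_n = 1_{A_n}$ and $\hat\xi_n = P[A_n \mid \mathcal{F}_{n-1}]$, fix any $\varepsilon > 0$, and define
$\tau = \inf\{n;\ \hat\xi_1 + \cdots + \hat\xi_n > \varepsilon\}$. Then $\{\tau \le n\} \in \mathcal{F}_{n-1}$ for each $n$, and so
$$E \sum_{n < \tau} \xi_n = \sum_n E[\xi_n;\ \tau > n] = \sum_n E[\hat\xi_n;\ \tau > n] = E \sum_{n < \tau} \hat\xi_n \le \varepsilon.$$
Hence,
$$P \bigcup_n A_n \le P\{\tau < \infty\} + E \sum_{n < \tau} \xi_n \le P\Big\{\sum_n \hat\xi_n > \varepsilon\Big\} + \varepsilon. \qquad \Box$$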
Proof of Theorem 14.17: To prove the result for the time scales $[M^n]$, we
may reduce by optional stopping to the case when $[M^n] \le 2$ for all $n$. For
each $n$ we may choose some Brownian motion $B^n$ and associated optional
times $\tau^n_k$ as in Theorem 14.16. Then
$$(X^n - B^n)^*_{\zeta_n \wedge 1} \le w(B^n, 1 + \delta_n, \delta_n), \quad n \in \mathbb{N},$$
where
$$\delta_n = \sup_k \{|\tau^n_k - [M^n]_k| + (\Delta M^n_k)^2\},$$
and so
$$E[(X^n - B^n)^*_{\zeta_n \wedge 1} \wedge 1] \le E[w(B^n, 1 + h, h) \wedge 1] + P\{\delta_n > h\}.$$
Since $\delta_n \overset{P}{\to} 0$ by Lemma 14.18, the right-hand side tends to zero as $n \to \infty$
and then $h \to 0$, and the assertion follows.
In the case of the time scales $\langle M^n \rangle$, define $\kappa_n = \inf\{k;\ [M^n]_k > 2\}$. Then
$[M^n]_{\kappa_n} - \langle M^n \rangle_{\kappa_n} \overset{P}{\to} 0$ by Lemma 14.18, and so $P\{\langle M^n \rangle_{\kappa_n} < 1,\ \kappa_n <
\infty\} \to 0$. We may then reduce by optional stopping to the case when
$[M^n] \le 3$. The proof may now be completed as before. $\Box$
Though the Skorohod embedding has no natural extension to higher di-
mensions, one can still obtain useful multidimensional approximations by
applying the previous results to each component separately. To illustrate
the method, we proceed to show how suitable random walks in $\mathbb{R}^d$ can be
approximated by continuous processes with stationary, independent incre-
ments. Extensions to more general limits are obtained by different methods
in Corollary 15.20 and Theorem 16.14.
Theorem 14.20 (approximation of random walks in $\mathbb{R}^d$) Let $S^1, S^2, \dots$
be random walks in $\mathbb{R}^d$ such that $\mathcal{L}(S^n_{m_n}) \overset{w}{\to} N(0, \sigma\sigma')$ for some $d \times d$
matrix $\sigma$ and integers $m_n \to \infty$. Then there exist some Brownian
motions $B^1, B^2, \dots$ in $\mathbb{R}^d$ such that the processes $X^n_t = S^n_{[m_n t]}$ satisfy
$(X^n - \sigma B^n)^*_t \overset{P}{\to} 0$ for all $t \ge 0$.
Proof: By Theorem 5.15 we have
$$\max_{k \le m_n t} |\Delta S^n_k| \overset{P}{\to} 0, \quad t \ge 0,$$
and so we may assume that $|\Delta S^n_k| \le 1$ for all $n$ and $k$. Subtracting the
means, we may further assume that $ES^n_k \equiv 0$. Applying Theorem 14.17
in each coordinate, we get $w(X^n, t, h) \overset{P}{\to} 0$ as $n \to \infty$ and then $h \to 0$.
Furthermore, $w(\sigma B, t, h) \to 0$ a.s. as $h \to 0$.
Using Theorem 5.15 in both directions gives $X^n_{t_n} \overset{d}{\to} \sigma B_t$ as $t_n \to t$. By
independence it follows that $(X^n_{t_1}, \dots, X^n_{t_m}) \overset{d}{\to} \sigma(B_{t_1}, \dots, B_{t_m})$ for all $m \in
\mathbb{N}$ and $t_1, \dots, t_m \ge 0$, and so $X^n \overset{d}{\to} \sigma B$ on $\mathbb{Q}_+$ by Theorem 4.29. By
Theorem 4.30 or, more conveniently, by Corollary 6.12 and Theorem A2.2,
there exist some rcll processes $Y^n \overset{d}{=} X^n$ with $Y^n_t \to \sigma B_t$ a.s. for all $t \in \mathbb{Q}_+$.
For any $t, h > 0$ we have
$$E[(Y^n - \sigma B)^*_t \wedge 1] \le E\big[\max_{j \le t/h} |Y^n_{jh} - \sigma B_{jh}| \wedge 1\big] + E[w(Y^n, t, h) \wedge 1] + E[w(\sigma B, t, h) \wedge 1].$$
Multiplying by $e^{-t}$, integrating over $t > 0$, and letting $n \to \infty$ and then
$h \to 0$ along $\mathbb{Q}_+$, we get by dominated convergence
$$\int_0^\infty e^{-t} E[(Y^n - \sigma B)^*_t \wedge 1]\,dt \to 0.$$
Hence, by monotonicity, the last integrand tends to zero as $n \to \infty$, and so
$(Y^n - \sigma B)^*_t \overset{P}{\to} 0$ for each $t > 0$. It remains to use Theorem 6.10. $\Box$
Exercises
1. Proceed as in Lemma 14.2 to construct Brownian martingales with
leading terms $B_t^3$ and $B_t^5$. Use multiple Wiener-Itô integrals to give an
alternative proof of the lemma, and find for every $n \in \mathbb{N}$ a martingale with
leading term $B_t^n$. (Hint: Use Theorem 13.25.)
2. Given a Brownian motion $B$ and an associated optional time $\tau < \infty$,
show that $E\tau \ge E B_\tau^2$. (Hint: Truncate $\tau$ and use Fatou's lemma.)
3. For $S_n$ as in Corollary 14.8, show that the sequence of random variables
$(2n \log\log n)^{-1/2} S_n$, $n \ge 3$, is a.s. relatively compact with set of limit points
equal to $[-1,1]$. (Hint: Prove the corresponding property for Brownian
motion, and use Theorem 14.6.)
4. Let $\xi_1, \xi_2, \ldots$ be i.i.d. random vectors in $\mathbb{R}^d$ with mean 0 and covariances
$\delta_{ij}$. Show that the conclusion of Corollary 14.8 holds with $S_n$ replaced by
$|S_n|$. More precisely, show that the sequence $(2n \log\log n)^{-1/2} S_n$, $n \ge 3$, is
relatively compact in $\mathbb{R}^d$, and that the set of limit points is contained in
the closed unit ball. (Hint: Apply Corollary 14.8 to the projections $u \cdot S_n$
for arbitrary $u \in \mathbb{R}^d$ with $|u| = 1$.)
5. In Theorem 13.18, show that for any $c \in (0,1)$ there exists a sequence
$t_n \to \infty$ such that the limsup along $(t_n)$ equals $c$ a.s. Conclude that the
set of limit points in the preceding exercise agrees with the closed unit ball
in $\mathbb{R}^d$.
6. Condition (21) clearly follows from $\sum_k E[|\Delta M^n_k| \wedge 1 \mid \mathcal{F}^n_{k-1}] \xrightarrow{P} 0$. Show
by an example that the latter condition is strictly stronger. (Hint: Consider
a sequence of random walks.)
7. Specialize Lemma 14.18 to random walks, and give a direct proof in this
case.
8. In the special case of random walks, show that condition (21) is also
necessary. (Hint: Use Theorem 5.15.)
9. Specialize Theorem 14.17 to a sequence of random walks in $\mathbb{R}$, and
derive a corresponding extension of Theorem 14.9. Then derive a functional
version of Theorem 5.12.
10. Specialize further to the case of successive renormalizations of a single
random walk $S_n$. Then derive a limit theorem for the values at $t = 1$, and
compare with Proposition 5.9.
11. In the second arcsine law of Theorem 14.11, show that the first maximum
on $[0, 1]$ can be replaced by the last one. Conclude that the associated
times $\sigma_n$ and $\tau_n$ satisfy $\tau_n - \sigma_n \xrightarrow{P} 0$. (Hint: Use the corresponding result
for Brownian motion. Alternatively, use the symmetry of (Sn) and of the
arcsine distribution.)
12. Extend Theorem 14.11 to an arbitrary sequence of symmetric ran-
dom walks satisfying a Lindeberg condition. Also extend the results for T
and T to sequences of random walks based on diffuse, symmetric distribu-
tions. Finally, show that the result for T may fail in the latter case. (Hint:
Consider the $n^{-1}$-increments of a compound Poisson process based on the
uniform distribution on $[-1,1]$, perturbed by a small diffusion term $\varepsilon_n B$,
where B is an independent Brownian motion.)
13. In the context of Theorem 14.20, show that for any Brownian motion
$B$ there exist some processes $Y^n \overset{d}{=} X^n$ such that $(Y^n - \sigma B)^*_t \to 0$ a.s.
for all $t \ge 0$. Prove a corresponding version of Theorem 14.17. (Hint: Use
Theorem 4.30 or Corollary 6.12.)
Chapter 15
Independent Increments
and Infinite Divisibility
Regularity and integral representation; Lévy processes and subordinators;
stable processes and first-passage times; infinitely
divisible distributions; characteristics and convergence criteria;
approximation of Lévy processes and random walks; limit
theorems for null arrays; convergence of extremes
In Chapters 12 and 13 we saw how Poisson processes and Brown-
ian motion arise as special processes with independent increments. Our
present aim is to study more general processes of this type. Under a
mild regularity assumption, we shall derive a general representation of
independent-increment processes in terms of a Gaussian component and
a jump component, where the latter is expressible as a suitably compen-
sated Poisson integral. Of special importance is the time-homogeneous case
of so-called Lévy processes, which admit a description in terms of a characteristic
triple $(a, b, \nu)$, where $a$ is the diffusion rate, $b$ is the drift coefficient,
and $\nu$ is the Lévy measure that determines the rates for jumps of different
sizes.
In the same way that Brownian motion is the basic example of both a
diffusion process and a continuous martingale, the general Lévy processes
constitute the fundamental cases of both Markov processes and general
semimartingales. As a motivation for the general weak convergence theory
of Chapter 16, we shall further see how Lévy processes serve as the natural
approximations to random walks. In particular, such approximations may
be used to extend two of the arcsine laws for Brownian motion to general
symmetric Lévy processes. Increasing Lévy processes, also called subordinators,
play a basic role in Chapter 22, where they appear in representations
of local time and regenerative sets.
The distributions of Levy processes at fixed times coincide with the in-
finitely divisible laws, which also arise as the most general limit laws in the
classical limit theorems for null arrays. The special cases of convergence to-
ward Poisson and Gaussian limits were considered in Chapter 5, and now
we shall be able to characterize the convergence toward an arbitrary in-
finitely divisible law. Though characteristic functions will still be needed
occasionally as a technical tool, the present treatment is more probabilistic
in flavor and involves as crucial steps a centering at truncated means
followed by a compound Poisson approximation.
To resume our discussion of general independent-increment processes, say
that a process $X$ in $\mathbb{R}^d$ is continuous in probability if $X_s \xrightarrow{P} X_t$ whenever
$s \to t$. Let us further say that a function $f$ on $\mathbb{R}_+$ or $[0, 1]$ is right-continuous
with left-hand limits (abbreviated as rcll) if the right- and left-hand limits
$f_{t\pm}$ exist and are finite and if, moreover, $f_{t+} \equiv f_t$. A process $X$ is said to be
rcll if its paths have this property. In that case only jump discontinuities
may occur, and we say that $X$ has a fixed jump at some time $t > 0$ if
$P\{X_t \ne X_{t-}\} > 0$.
The following result gives the basic regularity properties of independent-
increment processes. A similar result for Feller processes is obtained by
different methods in Theorem 19.15.
Theorem 15.1 (regularization, Lévy) If a process $X$ in $\mathbb{R}^d$ is continuous
in probability and has independent increments, then X has an rcll version
without fixed jumps.
For the proof we shall use a martingale argument based on the
characteristic functions
$$\varphi_{s,t}(u) = E \exp\{iu(X_t - X_s)\}, \quad u \in \mathbb{R}^d,\ 0 \le s \le t.$$
Note that $\varphi_{r,s}\varphi_{s,t} = \varphi_{r,t}$ for any $r \le s \le t$, and put $\varphi_{0,t} = \varphi_t$. In order to
construct associated martingales, we need to know that $\varphi_{s,t} \ne 0$.
Lemma 15.2 (zeros) For any $u \in \mathbb{R}^d$ and $s \le t$, we have $\varphi_{s,t}(u) \ne 0$.
Proof: Fix any $u \in \mathbb{R}^d$ and $s \le t$. Since $X$ is continuous in probability,
there exists for any $r \ge 0$ some $h > 0$ such that $\varphi_{r,r'}(u) \ne 0$ whenever
$|r - r'| < h$. By compactness we may then choose finitely many division
points $s = t_0 < t_1 < \cdots < t_n = t$ such that $\varphi_{t_{k-1},t_k}(u) \ne 0$ for all $k$, and
by the independence of the increments we get
$\varphi_{s,t}(u) = \prod_k \varphi_{t_{k-1},t_k}(u) \ne 0$. □
We also need the following deterministic convergence criterion.
Lemma 15.3 (complex exponentials) Fix any $a_1, a_2, \ldots \in \mathbb{R}^d$. Then $a_n$
converges iff $e^{iua_n}$ converges for almost every $u \in \mathbb{R}^d$.
Proof: Assume the stated condition. Fix a nondegenerate Gaussian random
vector $\eta$ in $\mathbb{R}^d$, and note that $\exp\{it\eta(a_m - a_n)\} \to 1$ a.s. as $m, n \to \infty$
for fixed $t \in \mathbb{R}$. By dominated convergence the characteristic function of
$\eta(a_m - a_n)$ tends to 1, and so $\eta(a_m - a_n) \xrightarrow{P} 0$ by Theorem 5.3, which
implies $a_m - a_n \to 0$. Thus, $(a_n)$ is Cauchy and therefore convergent. □
Proof of Theorem 15.1: We may clearly assume that $X_0 = 0$. By Lemma
15.2 we may define
$$M^u_t = \frac{e^{iuX_t}}{\varphi_t(u)}, \quad t \ge 0,\ u \in \mathbb{R}^d,$$
which is clearly a martingale in $t$ for each $u$. Letting $\Omega_u \subset \Omega$ denote the
set where $e^{iuX_t}$ has limits from the left and right along $\mathbb{Q}_+$ at every $t \ge 0$,
we see from Theorem 7.18 that $P\Omega_u = 1$.
Restating the definition of $\Omega_u$ in terms of upcrossings, we note that
the set $A = \{(u, \omega);\ \omega \in \Omega_u\}$ is product measurable in $\mathbb{R}^d \times \Omega$. Writing
$A_\omega = \{u \in \mathbb{R}^d;\ \omega \in \Omega_u\}$, it follows by Fubini's theorem that the set
$\Omega' = \{\omega;\ \lambda^d A_\omega^c = 0\}$ has probability 1. If $\omega \in \Omega'$, we have $u \in A_\omega$ for
almost every $u \in \mathbb{R}^d$, and so Lemma 15.3 shows that $X$ itself has finite
right- and left-hand limits along $\mathbb{Q}_+$. Now define $\tilde X_t = X_{t+}$ on $\Omega'$ and
$\tilde X = 0$ on $\Omega'^c$, and note that $\tilde X$ is rcll everywhere. Further note that $\tilde X$ is
a version of $X$, since $X_{t+h} \xrightarrow{P} X_t$ as $h \to 0$ for fixed $t$ by hypothesis. For
the same reason, $\tilde X$ has no fixed jumps. □
We proceed to state the general representation theorem. Given any Poisson
process $\eta$ with intensity measure $\nu = E\eta$, we recall from Theorem
12.13 that the integral $(\eta - \nu)f = \int f(x)\,(\eta - \nu)(dx)$ exists in the sense of
approximation in probability iff $\nu(f^2 \wedge |f|) < \infty$.
Theorem 15.4 (independent-increment processes, Lévy, Itô) Let $X$ be an
rcll process in $\mathbb{R}^d$ with $X_0 = 0$. Then $X$ has independent increments and
no fixed jumps iff, a.s. for each $t \ge 0$,
$$X_t = m_t + G_t + \int_0^t\!\!\int_{|x| \le 1} x\,(\eta - E\eta)(ds\,dx) + \int_0^t\!\!\int_{|x| > 1} x\,\eta(ds\,dx), \tag{1}$$
for some continuous function $m$ with $m_0 = 0$, some continuous centered
Gaussian process $G$ with independent increments and $G_0 = 0$, and some
independent Poisson process $\eta$ on $(0, \infty) \times (\mathbb{R}^d \setminus \{0\})$ with
$$\int_0^t\!\!\int (|x|^2 \wedge 1)\, E\eta(ds\,dx) < \infty, \quad t > 0. \tag{2}$$
In the special case when $X$ is real and nondecreasing, (1) simplifies to
$$X_t = a_t + \int_0^t\!\!\int_0^\infty x\,\eta(ds\,dx), \quad t \ge 0, \tag{3}$$
for some nondecreasing continuous function $a$ with $a_0 = 0$ and some
Poisson process $\eta$ on $(0, \infty)^2$ with
$$\int_0^t\!\!\int_0^\infty (x \wedge 1)\, E\eta(ds\,dx) < \infty, \quad t > 0. \tag{4}$$
Both representations are a.s. unique, and all functions $m$ or $a$ and processes
$G$ and $\eta$ with the stated properties may occur.
We begin the proof by analyzing the jump structure of X. Let us then
introduce the random measure
$$\eta = \sum_t \delta_{t, \Delta X_t} = \sum_t 1\{(t, \Delta X_t) \in \cdot\}, \tag{5}$$
where the summation extends over all times $t > 0$ with $\Delta X_t \equiv X_t - X_{t-} \ne 0$.
We say that $\eta$ is locally $X$-measurable if, for any $s < t$, the measure
$\eta((s, t] \times \cdot)$ is a measurable function of the process $X_r - X_s$, $r \in [s, t]$.
Lemma 15.5 (Poisson process of jumps) Let $X$ be an rcll process in $\mathbb{R}^d$
with independent increments and no fixed jumps. Then $\eta$ in (5) is a locally
$X$-measurable Poisson process on $(0, \infty) \times (\mathbb{R}^d \setminus \{0\})$ satisfying (2). If $X$ is
further real-valued and nondecreasing, then $\eta$ is supported by $(0, \infty)^2$ and
satisfies (4).
Proof (beginning): Fix any times $s < t$, and consider a sequence of partitions
$s = t_{n,0} < \cdots < t_{n,n} = t$ with $\max_k (t_{n,k} - t_{n,k-1}) \to 0$. For any continuous
function $f$ on $\mathbb{R}^d$ that vanishes in a neighborhood of 0, we have
$$\sum_k f(X_{t_{n,k}} - X_{t_{n,k-1}}) \to \int f(x)\, \eta((s, t] \times dx),$$
which implies the measurability of the integrals on the right. By a simple
approximation we may conclude that $\eta((s, t] \times B)$ is measurable for every
compact set $B \subset \mathbb{R}^d \setminus \{0\}$. The measurability extends by a monotone
class argument to all random variables $\eta A$ with $A$ included in some fixed
bounded rectangle $[0, t] \times B$, and the further extension to arbitrary Borel
sets is immediate.
Since $X$ has independent increments and no fixed jumps, the same properties
hold for $\eta$, which is then Poisson by Theorem 12.10. If $X$ is real-valued
and nondecreasing, then (4) holds by Theorem 12.13. □
The proof of (2) requires a further lemma, which is also needed for the
main proof.
Lemma 15.6 (orthogonality and independence) Let $X$ and $Y$ be rcll processes
in $\mathbb{R}^d$ with $X_0 = Y_0 = 0$ such that $(X, Y)$ has independent increments
and no fixed jumps. Assume also that $Y$ is a.s. a step process and that
$\Delta X \cdot \Delta Y = 0$ a.s. Then $X \perp\!\!\!\perp Y$.
Proof: Define $\eta$ as in (5) in terms of $Y$, and note as before that $\eta$ is locally
$Y$-measurable whereas $Y$ is locally $\eta$-measurable. By a simple transformation
of $\eta$ we may reduce to the case when $Y$ has bounded jumps. Since
$\eta$ is Poisson, $Y$ then has integrable variation on every finite interval. By
Corollary 3.7 we need to show that $(X_{t_1}, \ldots, X_{t_n}) \perp\!\!\!\perp (Y_{t_1}, \ldots, Y_{t_n})$ for any
$t_1 < \cdots < t_n$, and by Lemma 3.8 it suffices to show for all $s < t$ that
$X_t - X_s \perp\!\!\!\perp Y_t - Y_s$. Without loss of generality, we may take $s = 0$ and $t = 1$.
Then fix any $u, v \in \mathbb{R}^d$, and introduce the locally bounded martingales
$$M_t = \frac{e^{iuX_t}}{E e^{iuX_t}}, \qquad N_t = \frac{e^{ivY_t}}{E e^{ivY_t}}, \qquad t \ge 0.$$
Note that $N$ again has integrable variation on $[0, 1]$. For $n \in \mathbb{N}$, we get by
the martingale property and dominated convergence
$$\begin{aligned}
E M_1 N_1 - 1 &= E \sum_{k \le n} (M_{k/n} - M_{(k-1)/n})(N_{k/n} - N_{(k-1)/n}) \\
&= E \int_0^1 (M_{[sn+1-]/n} - M_{[sn-]/n})\, dN_s \\
&\to E \int_0^1 \Delta M_s\, dN_s = E \sum_{s \le 1} \Delta M_s \Delta N_s = 0.
\end{aligned}$$
Thus, $E M_1 N_1 = 1$, and so
$$E e^{iuX_1 + ivY_1} = E e^{iuX_1}\, E e^{ivY_1}, \quad u, v \in \mathbb{R}^d.$$
The asserted independence $X_1 \perp\!\!\!\perp Y_1$ now follows by the uniqueness theorem
for characteristic functions. □
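End of proof of Lemma 15.5: It remains to prove (2). Then define $\eta_t =
\eta([0, t] \times \cdot)$, and note that $\eta_t\{x;\ |x| > \varepsilon\} < \infty$ a.s. for all $t, \varepsilon > 0$ because
$X$ is rcll. Since $\eta$ is Poisson, the same relations hold for the measures $E\eta_t$,
and so it suffices to prove that
$$\int_{|x| \le 1} |x|^2\, E\eta_t(dx) < \infty, \quad t > 0. \tag{6}$$
Then introduce for each $\varepsilon > 0$ the process
$$X^\varepsilon_t = \sum_{s \le t} \Delta X_s 1\{|\Delta X_s| > \varepsilon\} = \int_{|x| > \varepsilon} x\, \eta_t(dx), \quad t \ge 0,$$
and note that $X^\varepsilon \perp\!\!\!\perp X - X^\varepsilon$ by Lemma 15.6. By Lemmas 12.2 (i) and 15.2
we get for any $\varepsilon, t > 0$ and $u \in \mathbb{R}^d \setminus \{0\}$
$$\begin{aligned}
0 < |E e^{iuX_t}| \le |E e^{iuX^\varepsilon_t}| &= \Bigl| E \exp \int_{|x| > \varepsilon} iux\, \eta_t(dx) \Bigr| \\
&= \Bigl| \exp \int_{|x| > \varepsilon} (e^{iux} - 1)\, E\eta_t(dx) \Bigr| = \exp \int_{|x| > \varepsilon} (\cos ux - 1)\, E\eta_t(dx).
\end{aligned}$$
Letting $\varepsilon \to 0$ gives
$$\int_{|ux| \le 1} |ux|^2\, E\eta_t(dx) \lesssim \int (1 - \cos ux)\, E\eta_t(dx) < \infty,$$
and (6) follows since $u$ is arbitrary. □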
Proof of Theorem 15.4: In the nondecreasing case, we may subtract the
jump component to obtain a continuous, nondecreasing process Y with
independent increments, and from Theorem 5.11 it is clear that Y is a.s.
nonrandom. Thus, in this case we get a representation as in (3).
In the general case, introduce for each $\varepsilon \in [0, 1]$ the martingale
$$M^\varepsilon_t = \int_0^t\!\!\int_{|x| \in (\varepsilon, 1]} x\,(\eta - E\eta)(ds\,dx), \quad t \ge 0.$$
Put $M_t = M^0_t$, and let $J_t$ denote the last term in (1). By Proposition
7.16 we have $E(M^\varepsilon - M^0)^{*2}_t \to 0$ as $\varepsilon \to 0$ for each $t$. Thus, $M + J$ has a.s. the
same jumps as $X$, and so the process $Y = X - M - J$ is a.s. continuous.
Since $\eta$ is locally $X$-measurable, the same thing is true for $Y$. Theorem
13.4 then shows that $Y$ is Gaussian with continuous mean and covariance
functions. Subtracting the means $m_t$ yields a continuous, centered Gaussian
process $G$, and by Lemma 15.6 we get $G \perp\!\!\!\perp (M^\varepsilon + J)$ for every $\varepsilon > 0$. The
independence extends to $M$ by Lemma 3.6, and so $G \perp\!\!\!\perp \eta$.
The uniqueness of $\eta$ is clear from (5), and $G$ is then determined by
subtraction. From Theorem 12.13 it is further seen that the integrals in (1)
and (3) exist for any Poisson process $\eta$ with the stated properties, and we
note that the resulting process has independent increments. □
We may now specialize to the time-homogeneous case, when the distribution
of $X_{t+h} - X_t$ depends only on $h$. An rcll process $X$ in $\mathbb{R}^d$ with
stationary independent increments and $X_0 = 0$ is called a Lévy process. If
X is also real and nonnegative, it is often called a subordinator.
Corollary 15.7 (Lévy processes and subordinators) An rcll process $X$ in
$\mathbb{R}^d$ is Lévy iff (1) holds with $m_t = bt$, $G_t = \sigma B_t$, and $E\eta = \lambda \otimes \nu$ for some
$b \in \mathbb{R}^d$, some $d \times d$-matrix $\sigma$, some measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ with
$\int (|x|^2 \wedge 1)\, \nu(dx) < \infty$, and some Brownian motion $B \perp\!\!\!\perp \eta$ in $\mathbb{R}^d$. Furthermore, $X$ is
a subordinator iff (3) holds with $a_t = at$ and $E\eta = \lambda \otimes \nu$ for some $a \ge 0$ and
some measure $\nu$ on $(0, \infty)$ with $\int (x \wedge 1)\, \nu(dx) < \infty$. The triple $(\sigma\sigma', b, \nu)$
or pair $(a, \nu)$ is then determined by $\mathcal{L}(X)$, and any $a$, $b$, $\sigma$, and $\nu$ with the
stated properties may occur.
The measure $\nu$ above is called the Lévy measure of $X$, and the quantities
$\sigma\sigma'$, $b$, and $\nu$ or $a$ and $\nu$ are referred to collectively as the characteristics
of $X$.
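To make the roles of the characteristics concrete, here is a minimal Python sketch that samples a one-dimensional Lévy process on a time grid when the Lévy measure is finite. Everything named below (`simulate_levy_path`, its parameters, the grid) is hypothetical and introduced only for illustration; in particular, `drift` is assumed to be the effective drift after the compensator of the small jumps in (1) has been absorbed, which is possible precisely because the Lévy measure is assumed finite.

```python
import math
import random

def simulate_levy_path(drift, sigma, jump_rate, jump_sampler, t_max, n_steps, rng):
    """Values of X at times k * t_max / n_steps, k = 0..n_steps, for a
    one-dimensional Levy process with Gaussian coefficient sigma and finite
    Levy measure jump_rate * (law of jump_sampler(rng))."""
    dt = t_max / n_steps
    # Jump times of the compound Poisson part: i.i.d. exponential gaps.
    jumps = []
    if jump_rate > 0:
        t = rng.expovariate(jump_rate)
        while t <= t_max:
            jumps.append((t, jump_sampler(rng)))
            t += rng.expovariate(jump_rate)
    path = [0.0]
    j = 0
    for k in range(1, n_steps + 1):
        x = path[-1] + drift * dt
        if sigma != 0.0:
            # Independent Gaussian increment with variance sigma^2 * dt.
            x += sigma * rng.gauss(0.0, math.sqrt(dt))
        # Add every jump whose time falls in the current grid cell.
        while j < len(jumps) and jumps[j][0] <= k * dt:
            x += jumps[j][1]
            j += 1
        path.append(x)
    return path
```

Taking `sigma = 0`, a nonnegative drift, and a sampler returning positive jumps gives a sketch of a subordinator with finite Lévy measure; Lévy measures of infinite mass would require truncating the small jumps and are not covered here.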
Proof: The stationarity of the increments excludes the possibility of fixed
jumps, and so X has a representation as in Theorem 15.4. The stationarity
also implies that $E\eta$ is time invariant. Thus, Theorem 2.6 yields $E\eta = \lambda \otimes \nu$
for some measure $\nu$ on $\mathbb{R}^d \setminus \{0\}$ or $(0, \infty)$. The stated conditions on $\nu$ are
immediate from (2) and (4). Finally, Theorem 13.4 gives the form of the
continuous component. Formula (5) shows that $\eta$ is a measurable function
of $X$, and so $\nu$ is uniquely determined by $\mathcal{L}(X)$. The uniqueness of the
remaining characteristics then follows by subtraction. 0
From the representations in Theorem 15.4 we may easily deduce the
following so-called Levy-Khinchin formulas for the associated characteristic
functions or Laplace transforms. Here we write u' for the transpose of u.
Corollary 15.8 (characteristic exponents, Kolmogorov, Lévy) Let $X$ be
a Lévy process in $\mathbb{R}^d$ with characteristics $(a, b, \nu)$. Then $E e^{iu'X_t} = e^{t\psi_u}$ for
all $t \ge 0$ and $u \in \mathbb{R}^d$, where
$$\psi_u = iu'b - \tfrac12 u'au + \int \bigl(e^{iu'x} - 1 - iu'x\, 1\{|x| \le 1\}\bigr)\, \nu(dx), \quad u \in \mathbb{R}^d. \tag{7}$$
If $X$ is a subordinator with characteristics $(a, \nu)$, then also $E e^{-uX_t} = e^{-t\chi_u}$
for all $t, u \ge 0$, where
$$\chi_u = ua + \int (1 - e^{-ux})\, \nu(dx), \quad u \ge 0. \tag{8}$$
In both cases, the characteristics are determined by $\mathcal{L}(X_1)$.
Proof: Formula (8) follows immediately from (3) and Lemma 12.2 (i).
Similarly, (7) is obtained from (1) by the same lemma when $\nu$ is bounded,
and the general case then follows by dominated convergence.
To prove the last assertion, we note that $\psi$ is the unique continuous
function with $\psi_0 = 0$ satisfying $e^{\psi_u} = E e^{iu'X_1}$. By the uniqueness theorem
for characteristic functions and the independence of the increments, $\psi$
determines all finite-dimensional distributions of $X$, and so the uniqueness of
the characteristics follows from the uniqueness in Corollary 15.7. □
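The Lévy-Khinchin exponent can be evaluated directly when the Lévy measure has finitely many atoms. The Python sketch below does this in dimension one, assuming the truncation at $|x| \le 1$ in the compensating term; the function name `char_exponent` and the representation of the measure as a list of `(x, mass)` pairs are assumptions made for illustration.

```python
import cmath

def char_exponent(u, a, b, atoms):
    """Exponent psi_u of the Levy-Khinchin formula in dimension one, for a
    Levy measure given by finitely many atoms (x, mass) with x != 0."""
    psi = 1j * u * b - 0.5 * a * u * u
    for x, mass in atoms:
        # The linear term is compensated only for jumps of size at most 1.
        compensator = 1j * u * x if abs(x) <= 1 else 0.0
        psi += (cmath.exp(1j * u * x) - 1 - compensator) * mass
    return psi
```

Two sanity checks under these conventions: with no atoms and $a = 1$ the exponent reduces to $-u^2/2$, the Brownian case, and with a single atom of mass $\lambda$ at $x = 1$ and $b = \lambda$ it reduces to $\lambda(e^{iu} - 1)$, the exponent of a Poisson process with rate $\lambda$.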
From Proposition 8.5 we note that a Lévy process $X$ is Markov for the
induced filtration $\mathcal{G} = (\mathcal{G}_t)$ with translation-invariant transition kernels
$\mu_t(x, B) = \mu_t(B - x) = P\{X_t \in B - x\}$. More generally, given any filtration
$\mathcal{F}$, we say that $X$ is Lévy with respect to $\mathcal{F}$, or simply $\mathcal{F}$-Lévy, if $X$ is
adapted to $\mathcal{F}$ and such that $(X_t - X_s) \perp\!\!\!\perp \mathcal{F}_s$ for all $s < t$. In particular, we
may take $\mathcal{F}_t = \mathcal{G}_t \vee \mathcal{N}$, $t \ge 0$, where $\mathcal{N} = \sigma\{N \subset A;\ A \in \mathcal{A},\ PA = 0\}$.
Note that the latter filtration is right-continuous by Corollary 7.25. Just
as for Brownian motion in Theorem 13.11, we further see that any process
$X$ which is $\mathcal{F}$-Lévy for some right-continuous, complete filtration $\mathcal{F}$ is a
strong Markov process, in the sense that the process $X' = \theta_\tau X - X_\tau$ satisfies
$X \overset{d}{=} X' \perp\!\!\!\perp \mathcal{F}_\tau$ for any finite optional time $\tau$.
We turn to a brief discussion of some basic symmetry properties. A process
$X$ on $\mathbb{R}_+$ is said to be self-similar if for any $r > 0$ there exists some
$s = h(r) > 0$ such that the process $X_{rt}$, $t \ge 0$, has the same distribution as
$sX$. Excluding the trivial case when $X_t = 0$ a.s. for all $t > 0$, it is clear that
$h$ satisfies the Cauchy equation $h(xy) = h(x)h(y)$. If $X$ is right-continuous,
then $h$ is continuous, and the only solutions are of the form $h(x) = x^\alpha$ for
some $\alpha \in \mathbb{R}$.
Let us now return to the context of Lévy processes. We say that such
a process $X$ is strictly stable if it is self-similar, and weakly stable if it is
self-similar apart from a centering, so that for each $r > 0$ the process $(X_{rt})$
has the same distribution as $(sX_t + bt)$ for suitable $s$ and $b$. In the latter
case, the corresponding symmetrized process is strictly stable, and so $s$ is
again of the form $r^\alpha$. In both cases it is clear that $\alpha > 0$. We may then
introduce the index $p = \alpha^{-1}$ and say that $X$ is strictly or weakly $p$-stable.
The terminology carries over to random variables or vectors with the same
distribution as $X_1$.
Proposition 15.9 (stable Lévy processes) Let $X$ be a nondegenerate Lévy
process in $\mathbb{R}$ with characteristics $(a, b, \nu)$. Then $X$ is weakly $p$-stable for
some $p > 0$ iff either of these conditions holds:
(i) $p = 2$ and $\nu = 0$;
(ii) $p \in (0, 2)$, $a = 0$, and $\nu(dx) = c_\pm |x|^{-p-1} dx$ on $\mathbb{R}_\pm$ for some $c_\pm \ge 0$.
For subordinators, weak $p$-stability is equivalent to the condition
(iii) $p \in (0, 1)$ and $\nu(dx) = c x^{-p-1} dx$ on $(0, \infty)$ for some $c > 0$.
Proof: Writing $S_r : x \mapsto rx$ for any $r > 0$, we note that the processes
$X(r^p t)$ and $rX$ have characteristics $r^p(a, b, \nu)$ and $(r^2 a, rb, \nu \circ S_r^{-1})$, respectively.
Since the latter are determined by the distributions, it follows that
$X$ is weakly $p$-stable iff $r^p a = r^2 a$ and $r^p \nu = \nu \circ S_r^{-1}$ for all $r > 0$. In particular,
$a = 0$ when $p \ne 2$. Writing $F(x) = \nu[x, \infty)$ or $\nu(-\infty, -x]$, we also
note that $r^p F(rx) = F(x)$ for all $r, x > 0$, and so $F(x) = x^{-p} F(1)$, which
yields the stated form of the density. The condition $\int (x^2 \wedge 1)\, \nu(dx) < \infty$
implies $p \in (0, 2)$ when $\nu \ne 0$. If $X \ge 0$, we have the stronger condition
$\int (x \wedge 1)\, \nu(dx) < \infty$, so in this case $p < 1$. □
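The scaling relation and the integrability restriction on $p$ in the preceding proof can be checked in closed form for the one-sided density $c x^{-p-1}$. The Python sketch below is only an illustration; the function names are hypothetical, and the closed-form integrals are elementary computations rather than formulas quoted from the text.

```python
def tail(x, c, p):
    """F(x) = nu[x, infinity) for the density c * x**(-p - 1) on (0, infinity)."""
    return c * x ** (-p) / p

def truncated_second_moment(c, p):
    """Integral of min(x**2, 1) * c * x**(-p - 1) over (0, infinity).

    The part over (0, 1] equals c / (2 - p) and needs p < 2; the part over
    (1, infinity) equals c / p and needs p > 0."""
    if not 0 < p < 2:
        return float("inf")
    return c / (2 - p) + c / p
```

For any $r, x > 0$ the tail satisfies $r^p F(rx) = F(x)$, and the truncated second moment is finite exactly for $p \in (0, 2)$, matching the range in condition (ii).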
If $X$ is weakly $p$-stable for some $p \ne 1$, it can be made strictly $p$-stable
by a suitable centering. In particular, a weakly p-stable subordinator is
strictly stable iff the drift component vanishes. In the latter case we simply
say that X is stable.
The next result shows how stable subordinators may arise naturally even
in the study of continuous processes. Given a Brownian motion $B$ in $\mathbb{R}$, we
introduce the maximum process $M_t = \sup_{s \le t} B_s$ and its right-continuous
inverse
$$T_r = \inf\{t \ge 0;\ M_t > r\} = \inf\{t \ge 0;\ B_t > r\}, \quad r \ge 0. \tag{9}$$
Theorem 15.10 (first-passage times, Lévy) For a Brownian motion $B$,
the process $T$ in (9) is a $\tfrac12$-stable subordinator with Lévy measure
$$\nu(dx) = (2\pi)^{-1/2} x^{-3/2}\, dx, \quad x > 0.$$
Proof: By Lemma 7.6, the random times $T_r$ are optional with respect
to the right-continuous filtration $\mathcal{F}$ induced by $B$. By the strong Markov
property of $B$, the process $\theta_r T - T_r$ is then independent of $\mathcal{F}_{T_r}$ with the
same distribution as $T$. Since $T$ is further adapted to the filtration $(\mathcal{F}_{T_r})$,
it follows that $T$ has stationary independent increments and hence is a
subordinator.
To see that $T$ is $\tfrac12$-stable, fix any $c > 0$, put $\tilde B_t = c^{-1} B(c^2 t)$, and define
$\tilde T_r = \inf\{t \ge 0;\ \tilde B_t > r\}$. Then
$$T_{cr} = \inf\{t \ge 0;\ B_t > cr\} = c^2 \inf\{t \ge 0;\ \tilde B_t > r\} = c^2 \tilde T_r.$$
By Proposition 15.9 the Lévy measure of $T$ has a density of the form $a x^{-3/2}$,
$x > 0$, and it remains to identify $a$. Then note that the process
$$X_t = \exp(uB_t - \tfrac12 u^2 t), \quad t \ge 0,$$
is a martingale for any $u \in \mathbb{R}$. In particular, $E X_{T_r \wedge t} = 1$ for any $r, t \ge 0$,
and since clearly $B_{T_r} = r$, we get by dominated convergence
$$E \exp(-\tfrac12 u^2 T_r) = e^{-ur}, \quad u, r \ge 0.$$
Taking $u = \sqrt2$ and comparing with Corollary 15.8, we obtain
$$\frac{\sqrt2}{a} = \int_0^\infty (1 - e^{-x})\, x^{-3/2}\, dx = 2 \int_0^\infty e^{-x} x^{-1/2}\, dx = 2\sqrt\pi,$$
which shows that $a = (2\pi)^{-1/2}$. □
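The identification of the constant can be checked numerically. Replacing $\tfrac12 u^2$ by $u$ in the Laplace transform above gives $E e^{-uT_r} = e^{-r\sqrt{2u}}$, so the Laplace exponent of $T$ should equal $\sqrt{2u}$. The Python sketch below integrates $(1 - e^{-ux})(2\pi)^{-1/2} x^{-3/2}$ after the substitution $x = y^2$; the function name, the cutoff, and the step count are assumptions chosen for illustration.

```python
import math

def laplace_exponent(u, upper=200.0, steps=200_000):
    """Approximate the integral of (1 - exp(-u x)) (2 pi)^(-1/2) x^(-3/2) over
    (0, infinity), using x = y**2 and the midpoint rule on (0, upper)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        y = (k + 0.5) * h
        # After substitution the integrand is 2 (1 - exp(-u y^2)) / y^2.
        total += (1.0 - math.exp(-u * y * y)) / (y * y)
    body = 2.0 * h * total
    # Beyond the cutoff exp(-u y^2) is negligible, leaving the integral of 2 / y^2.
    tail = 2.0 / upper
    return (body + tail) / math.sqrt(2.0 * math.pi)
```

Under these assumptions `laplace_exponent(u)` agrees with `math.sqrt(2 * u)` to several decimal places for moderate values of `u`, which is consistent with the value $a = (2\pi)^{-1/2}$.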
If we add a negative drift to a Brownian motion, the associated maximum
process $M$ becomes bounded, and so $T = M^{-1}$ terminates by a jump
to infinity. For such occasions, it is useful to consider subordinators with
possibly infinite jumps. By a generalized subordinator we mean a process of
the form $X_t \equiv Y_t + \infty \cdot 1\{t \ge \zeta\}$ a.s., where $Y$ is an ordinary subordinator
and $\zeta$ is an independent, exponentially distributed random variable. In
this case we say that $X$ is obtained from $Y$ by exponential killing. The
representation in Theorem 15.4 remains valid in the generalized case, except
that $\nu$ may now have positive mass at $\infty$.
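Exponential killing is simple to express on a discrete time grid. The Python sketch below is a minimal illustration, assuming the subordinator has already been sampled at the given times; the function names and the parametrization of the killing time by its mean are hypothetical.

```python
import math
import random

def sample_killing_time(mean, rng):
    """Exponentially distributed killing time zeta with the given mean."""
    return rng.expovariate(1.0 / mean)

def kill(values, times, zeta):
    """Return Y_t + infinity * 1{t >= zeta} on the grid: the path is left
    unchanged before zeta and sent to infinity from zeta onward."""
    return [math.inf if t >= zeta else y for t, y in zip(times, values)]
```

For example, killing the values `[0.0, 1.0, 2.0, 3.0]` observed at times `[0, 1, 2, 3]` at `zeta = 2.0` leaves the first two values unchanged and replaces the rest by infinity.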
The following characterization is needed in Chapter 22.
Lemma 15.11 (generalized subordinators) Let $X$ be a nondecreasing and
right-continuous process in $[0, \infty]$ with $X_0 = 0$, and let $\mathcal{F}$ denote the
filtration induced by $X$. Then $X$ is a generalized subordinator iff
$$P[X_{s+t} - X_s \in \cdot \mid \mathcal{F}_s] = P\{X_t \in \cdot\} \ \text{a.s. on } \{X_s < \infty\}, \quad s, t \ge 0. \tag{10}$$
Proof: Writing $\zeta = \inf\{t;\ X_t = \infty\}$, we get from (10) the Cauchy
equation
$$P\{\zeta > s + t\} = P\{\zeta > s\}\, P\{\zeta > t\}, \quad s, t \ge 0, \tag{11}$$
which shows that $\zeta$ is exponentially distributed with mean $m \in (0, \infty]$.
Next define $\mu_t = P[X_t \in \cdot \mid X_t < \infty]$, $t \ge 0$, and conclude from (10)
and (11) that the $\mu_t$ form a semigroup under convolution. By Theorem
8.4 there exists a corresponding process $Y$ with stationary, independent
increments. From the right-continuity of $X$, it follows that $Y$ is continuous
in probability. Hence, $Y$ has a version that is a subordinator. Now choose
$\tilde\zeta \overset{d}{=} \zeta$ with $\tilde\zeta \perp\!\!\!\perp Y$, and let $\tilde X$ denote the process $Y$ killed at $\tilde\zeta$. Comparing
with (10), we note that $\tilde X \overset{d}{=} X$. By Theorem 6.10 we may assume that
even $\tilde X = X$ a.s., which means that $X$ is a generalized subordinator. The
converse assertion is obvious. □
The next result provides the basic link between Levy processes and
triangular arrays. A random vector $\xi$ or its distribution is said to be infinitely
divisible if for every $n \in \mathbb{N}$ there exist some i.i.d. random vectors
$\xi_{n1}, \ldots, \xi_{nn}$ with $\sum_k \xi_{nk} \overset{d}{=} \xi$. By an i.i.d. array we mean a triangular array
of random vectors $\xi_{nj}$, $j \le m_n$, where the $\xi_{nj}$ are i.i.d. for each $n$ and
$m_n \to \infty$.
Theorem 15.12 (Lévy processes and infinite divisibility) For any random
vector $\xi$ in $\mathbb{R}^d$, these conditions are equivalent:
(i) $\xi$ is infinitely divisible;
(ii) $\sum_j \xi_{nj} \xrightarrow{d} \xi$ for some i.i.d. array $(\xi_{nj})$;
(iii) $\xi \overset{d}{=} X_1$ for some Lévy process $X$ in $\mathbb{R}^d$.
Under those conditions, $\mathcal{L}(X)$ is determined by $\mathcal{L}(\xi) = \mathcal{L}(X_1)$.
A simple lemma is needed for the proof.
Lemma 15.13 (individual terms) If the $\xi_{nj}$ are such as in Theorem 15.12
(ii), then $\xi_{n1} \xrightarrow{P} 0$.
Proof: Let $\mu$ and $\mu_n$ denote the distributions of $\xi$ and $\xi_{nj}$, respectively.
Choose $r > 0$ so small that $\hat\mu \ne 0$ on $[-r, r]$, and write $\hat\mu = e^\psi$ on this
interval, where $\psi : [-r, r] \to \mathbb{C}$ is continuous with $\psi(0) = 0$. Since the
convergence $\hat\mu_n^{m_n} \to \hat\mu$ is uniform on bounded intervals, it follows that
$\hat\mu_n \ne 0$ on $[-r, r]$ for sufficiently large $n$. Thus, we may write $\hat\mu_n(u) = e^{\psi_n(u)}$
for $|u| \le r$, where $m_n \psi_n \to \psi$ on $[-r, r]$. Then $\psi_n \to 0$ on the same interval,
and therefore $\hat\mu_n \to 1$. Now let $\varepsilon \le r^{-1}$, and note as in Lemma 5.1 that
$$\int_{-r}^{r} (1 - \hat\mu_n(u))\, du = 2r \int \Bigl(1 - \frac{\sin rx}{rx}\Bigr) \mu_n(dx) \ge 2r \Bigl(1 - \frac{\sin r\varepsilon}{r\varepsilon}\Bigr) \mu_n\{|x| \ge \varepsilon\}.$$
As $n \to \infty$, the left-hand side tends to 0 by dominated convergence, and
we get $\mu_n \xrightarrow{w} \delta_0$. □
Proof of Theorem 15.12: Trivially (iii) $\Rightarrow$ (i) $\Rightarrow$ (ii). Now let $\xi_{nj}$, $j \le m_n$,
be an i.i.d. array satisfying (ii), put $\mu_n = \mathcal{L}(\xi_{nj})$, and fix any $k \in \mathbb{N}$. By
Lemma 15.13 we may assume that $k$ divides each $m_n$ and write $\sum_j \xi_{nj} =
\eta_{n1} + \cdots + \eta_{nk}$, where the $\eta_{nj}$ are i.i.d. with distribution $\mu_n^{*(m_n/k)}$. For any
$u \in \mathbb{R}^d$ and $r > 0$ we have
$$(P\{u\eta_{n1} > r\})^k = P\{\min_{j \le k} u\eta_{nj} > r\} \le P\bigl\{\textstyle\sum_{j \le k} u\eta_{nj} > kr\bigr\},$$
and so the tightness of $\sum_j \eta_{nj}$ carries over to the sequence $(\eta_{n1})$. By Proposition
5.21 we may extract a weakly convergent subsequence, say with
limiting distribution $\nu_k$. Since $\sum_j \eta_{nj} \xrightarrow{d} \xi$, it follows by Theorem 5.3 that
$\xi$ has distribution $\nu_k^{*k}$. Thus, (ii) $\Rightarrow$ (i).
Next assume (i), so that $\mathcal{L}(\xi) \equiv \mu = \mu_n^{*n}$ for each $n$. By Lemma 15.13
we get $\hat\mu_n \to 1$ uniformly on bounded intervals, and so $\hat\mu \ne 0$. We may
then write $\hat\mu = e^\psi$ and $\hat\mu_n = e^{\psi_n}$ for some continuous functions $\psi$ and $\psi_n$
with $\psi(0) = \psi_n(0) = 0$, and we get $\psi = n\psi_n$ for each $n$. Hence, $e^{t\psi}$ is
a characteristic function for every $t \in \mathbb{Q}_+$, and then also for $t \in \mathbb{R}_+$ by
Theorem 5.22. By Theorem .16 there exists a process $X$ with stationary
independent increments such that $X_t$ has characteristic function $e^{t\psi}$ for
every $t$. Here $X$ is continuous in probability, and so by Theorem 15.1 it has
an rcll version, which is the desired Lévy process. Thus, (i) $\Rightarrow$ (iii). The
last assertion is clear from Corollary 15.8. □
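A standard concrete instance of infinite divisibility is the Poisson law: the Poisson distribution with mean $\lambda$ is the $n$-fold convolution of the Poisson distribution with mean $\lambda/n$. The Python sketch below verifies this numerically on a truncated range; the function names and the truncation size are assumptions made for illustration.

```python
import math

def poisson_pmf(lam, size):
    """First `size` values of the Poisson(lam) probability mass function."""
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(size)]

def convolve(p, q):
    """First len(p) terms of the convolution of two mass functions on {0, 1, ...}.

    The k-th term only involves indices up to k, so truncation loses nothing
    on the retained range (q must be at least as long as p)."""
    return [sum(p[i] * q[k - i] for i in range(k + 1)) for k in range(len(p))]

def convolution_power(p, n):
    """n-fold convolution of the mass function p with itself, n >= 1."""
    result = p
    for _ in range(n - 1):
        result = convolve(result, p)
    return result
```

Comparing `convolution_power(poisson_pmf(lam / n, size), n)` with `poisson_pmf(lam, size)` term by term shows agreement up to rounding error.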
Justified by the one-to-one correspondence between infinitely divisible
distributions $\mu$ and their characteristics $(a, b, \nu)$ or $(a, \nu)$, we may write
$\mu = \mathrm{id}(a, b, \nu)$ or $\mu = \mathrm{id}(a, \nu)$, respectively. The last result shows that the
class of infinitely divisible laws is closed under weak convergence, and we
proceed to derive explicit convergence criteria. Then define for each $h > 0$
$$a^h = a + \int_{|x| \le h} xx'\, \nu(dx), \qquad b^h = b - \int_{h < |x| \le 1} x\, \nu(dx),$$
where $\int_{h < |x| \le 1} = -\int_{1 < |x| \le h}$ when $h > 1$. In the positive case, we define
instead $a^h = a + \int_{x \le h} x\, \nu(dx)$. Let $\overline{\mathbb{R}^d}$ denote the one-point compactification
of $\mathbb{R}^d$.
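Theorem 15.14 (convergence of infinitely divisible distributions)
(i) Let $\mu = \mathrm{id}(a, b, \nu)$ and $\mu_n = \mathrm{id}(a_n, b_n, \nu_n)$ on $\mathbb{R}^d$, and fix any $h > 0$
with $\nu\{|x| = h\} = 0$. Then $\mu_n \xrightarrow{w} \mu$ iff $a_n^h \to a^h$, $b_n^h \to b^h$, and $\nu_n \xrightarrow{v} \nu$
on $\overline{\mathbb{R}^d} \setminus \{0\}$.
(ii) Let $\mu = \mathrm{id}(a, \nu)$ and $\mu_n = \mathrm{id}(a_n, \nu_n)$ on $\mathbb{R}_+$, and fix any $h > 0$ with
$\nu\{h\} = 0$. Then $\mu_n \xrightarrow{w} \mu$ iff $a_n^h \to a^h$ and $\nu_n \xrightarrow{v} \nu$ on $(0, \infty]$.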
For the proof, we consider first the one-dimensional case, which allows
some important simplifications. Thus, (7) may then be written as
$$\psi_u = icu + \int \Bigl(e^{iux} - 1 - \frac{iux}{1 + x^2}\Bigr) \frac{1 + x^2}{x^2}\, \tilde\nu(dx), \tag{12}$$
where
$$\tilde\nu(dx) = \sigma^2 \delta_0(dx) + \frac{x^2}{1 + x^2}\, \nu(dx), \tag{13}$$
$$c = b + \int \Bigl(\frac{x}{1 + x^2} - x\, 1\{|x| \le 1\}\Bigr) \nu(dx), \tag{14}$$
and the integrand in (12) is defined by continuity as $-u^2/2$ when $x = 0$.
For infinitely divisible distributions on $\mathbb{R}_+$, we may instead introduce the
measure
$$\tilde\nu(dx) = a\delta_0 + (1 - e^{-x})\, \nu(dx). \tag{15}$$
The associated distributions $\mu$ are denoted by $\mathrm{Id}(c, \tilde\nu)$ and $\mathrm{Id}(\tilde\nu)$, respectively.
Lemma 15.15 (one-dimensional convergence criteria)
(i) Let $\mu = \mathrm{Id}(c, \tilde\nu)$ and $\mu_n = \mathrm{Id}(c_n, \tilde\nu_n)$ on $\mathbb{R}$. Then $\mu_n \xrightarrow{w} \mu$ iff $c_n \to c$
and $\tilde\nu_n \xrightarrow{w} \tilde\nu$.
(ii) Let $\mu = \mathrm{Id}(\tilde\nu)$ and $\mu_n = \mathrm{Id}(\tilde\nu_n)$ on $\mathbb{R}_+$. Then $\mu_n \xrightarrow{w} \mu$ iff $\tilde\nu_n \xrightarrow{w} \tilde\nu$.
Proof: (i) Defining $\psi$ and $\psi_n$ as in (12), we may write $\hat\mu = e^\psi$ and
$\hat\mu_n = e^{\psi_n}$. If $c_n \to c$ and $\tilde\nu_n \xrightarrow{w} \tilde\nu$, then $\psi_n \to \psi$ by the boundedness
and continuity of the integrand in (12), and so $\hat\mu_n \to \hat\mu$, which implies
$\mu_n \xrightarrow{w} \mu$ by Theorem 5.3. Conversely, $\mu_n \xrightarrow{w} \mu$ implies $\hat\mu_n \to \hat\mu$ uniformly
on bounded intervals, and we get $\psi_n \to \psi$ in the same sense. Now define
$$\chi(u) = \int_{-1}^{1} (\psi(u) - \psi(u + s))\, ds = 2 \int e^{iux} \Bigl(1 - \frac{\sin x}{x}\Bigr) \frac{1 + x^2}{x^2}\, \tilde\nu(dx),$$
and similarly for $\chi_n$, where the interchange of integrations is justified by
Fubini's theorem. Then $\chi_n \to \chi$, and so by Theorem 5.3
$$\Bigl(1 - \frac{\sin x}{x}\Bigr) \frac{1 + x^2}{x^2}\, \tilde\nu_n(dx) \xrightarrow{w} \Bigl(1 - \frac{\sin x}{x}\Bigr) \frac{1 + x^2}{x^2}\, \tilde\nu(dx).$$
Since the integrand is continuous and bounded away from 0, it follows that
$\tilde\nu_n \xrightarrow{w} \tilde\nu$. This implies convergence of the integral in (12), and by subtraction
$c_n \to c$.
(ii) This may be proved directly by the same method, where we note
that the functions in (8) satisfy $\chi(u + 1) - \chi(u) = \int e^{-ux}\, \tilde\nu(dx)$. □
Proof of Theorem 15.14: For any finite measures $m_n$ and $m$ on $\mathbb{R}$ we
note that $m_n \xrightarrow{w} m$ iff $m_n \xrightarrow{v} m$ on $\overline{\mathbb{R}} \setminus \{0\}$ and $m_n(-h, h) \to m(-h, h)$
for some $h > 0$ with $m\{\pm h\} = 0$. Thus, for distributions $\mu$ and $\mu_n$ on $\mathbb{R}$,
we have $\tilde\nu_n \xrightarrow{w} \tilde\nu$ iff $\nu_n \xrightarrow{v} \nu$ on $\overline{\mathbb{R}} \setminus \{0\}$ and $a_n^h \to a^h$ for any $h > 0$ with
$\nu\{\pm h\} = 0$. Similarly, $\tilde\nu_n \xrightarrow{w} \tilde\nu$ holds for distributions $\mu$ and $\mu_n$ on $\mathbb{R}_+$ iff
$\nu_n \xrightarrow{v} \nu$ on $(0, \infty]$ and $a_n^h \to a^h$ for all $h > 0$ with $\nu\{h\} = 0$. Thus, (ii)
follows immediately from Lemma 15.15. To obtain (i) from the same lemma
when $d = 1$, it remains to notice that the conditions $b_n^h \to b^h$ and $c_n \to c$
are equivalent when $\tilde\nu_n \xrightarrow{w} \tilde\nu$ and $\nu\{\pm h\} = 0$, since $|x - x(1 + x^2)^{-1}| \le |x|^3$.
Turning to the proof of (i) when $d > 1$, let us first assume that $\nu_n \xrightarrow{v} \nu$
on $\overline{\mathbb{R}^d} \setminus \{0\}$ and that $a_n^h \to a^h$ and $b_n^h \to b^h$ for some $h > 0$ with $\nu\{|x| = h\}
= 0$. To prove $\mu_n \xrightarrow{w} \mu$, it suffices by Corollary 5.5 to show that, for any
one-dimensional projection $\pi_u : x \mapsto u'x$ with $u \ne 0$, $\mu_n \circ \pi_u^{-1} \xrightarrow{w} \mu \circ \pi_u^{-1}$.
Then fix any $k > 0$ with $\nu\{|u'x| = k\} = 0$, and note that $\mu \circ \pi_u^{-1}$ has the
associated characteristics $\nu^u = \nu \circ \pi_u^{-1}$ and
$$a^{u,k} = u'a^h u + \int (u'x)^2 \{1_{(0,k]}(|u'x|) - 1_{(0,h]}(|x|)\}\, \nu(dx),$$
$$b^{u,k} = u'b^h + \int u'x\, \{1_{(1,k]}(|u'x|) - 1_{(1,h]}(|x|)\}\, \nu(dx).$$
Let $a_n^{u,k}$, $b_n^{u,k}$, and $\nu_n^u$ denote the corresponding characteristics of $\mu_n \circ \pi_u^{-1}$.
Then $\nu_n^u \xrightarrow{v} \nu^u$ on $\overline{\mathbb{R}} \setminus \{0\}$, and furthermore $a_n^{u,k} \to a^{u,k}$ and $b_n^{u,k} \to b^{u,k}$.
The desired convergence now follows from the one-dimensional result.
Conversely, assume that $\mu_n \xrightarrow{w} \mu$. Then $\mu_n \circ \pi_u^{-1} \xrightarrow{w} \mu \circ \pi_u^{-1}$ for every
$u \ne 0$, and the one-dimensional result yields $\nu_n^u \xrightarrow{v} \nu^u$ on $\overline{\mathbb{R}} \setminus \{0\}$ as well
as $a_n^{u,k} \to a^{u,k}$ and $b_n^{u,k} \to b^{u,k}$ for any $k > 0$ with $\nu\{|u'x| = k\} = 0$.
In particular, the sequence $(\nu_n K)$ is bounded for every compact set $K \subset
\overline{\mathbb{R}^d} \setminus \{0\}$, and so the sequences $(u'a_n^h u)$ and $(u'b_n^h)$ are bounded for any
$u \ne 0$ and $h > 0$. It follows easily that $(a_n^h)$ and $(b_n^h)$ are bounded for every
$h > 0$, and therefore all three sequences are relatively compact.
Given any subsequence $N' \subset \mathbb{N}$, we have $\nu_n \xrightarrow{v} \nu'$ along a further subsequence
$N'' \subset N'$ for some measure $\nu'$ satisfying $\int (|x|^2 \wedge 1)\, \nu'(dx) < \infty$.
Fixing any $h > 0$ with $\nu'\{|x| = h\} = 0$, we may choose a still further subsequence
$N'''$ such that even $a_n^h$ and $b_n^h$ converge toward some limits $a'$ and
$b'$. The direct assertion then yields $\mu_n \xrightarrow{w} \mu'$ along $N'''$, where $\mu'$ is infinitely
divisible with characteristics determined by $(a', b', \nu')$. Since $\mu' = \mu$, we get
$\nu' = \nu$, $a' = a^h$, and $b' = b^h$. Thus, the convergence remains valid along
the original sequence. □
By a simple approximation, we may now derive explicit criteria for the
convergence $\sum_j \xi_{nj} \xrightarrow{d} \xi$ in Theorem 15.12. Note that the compound
Poisson distribution with characteristic measure $\mu = \mathcal{L}(\xi)$ is given by
$\tilde\mu = \mathrm{id}(0, b, \mu)$, where $b = E[\xi; |\xi| \le 1]$. For any array of random
vectors $\xi_{nj}$, we may introduce an associated compound Poisson array,
consisting of row-wise independent compound Poisson random vectors
$\tilde\xi_{nj}$ with characteristic measures $\mathcal{L}(\xi_{nj})$.
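Corollary 15.16 (i.i.d. arrays) Consider in $\mathbb{R}^d$ an i.i.d. array $(\xi_{nj})$
and an associated compound Poisson array $(\tilde\xi_{nj})$, and let $\xi$ be
$\mathrm{id}(a, b, \nu)$. Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff $\sum_j \tilde\xi_{nj} \xrightarrow{d} \xi$. For any $h > 0$
with $\nu\{|x| = h\} = 0$, it is also equivalent that
(i) $m_n \mathcal{L}(\xi_{n1}) \xrightarrow{v} \nu$ on $\overline{\mathbb{R}^d} \setminus \{0\}$;
(ii) $m_n E[\xi_{n1} \xi_{n1}'; |\xi_{n1}| \le h] \to a^h$;
(iii) $m_n E[\xi_{n1}; |\xi_{n1}| \le h] \to b^h$.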
Proof: Let $\mu = \mathcal{L}(\xi)$ and write $\hat\mu = e^{\psi}$, where $\psi$ is continuous with
$\psi(0) = 0$. If $\mu_n^{*m_n} \xrightarrow{w} \mu$, then $\hat\mu_n^{m_n} \to \hat\mu$ uniformly on compacts.
Thus, on any bounded set $B$ we may write $\hat\mu_n = e^{\psi_n}$ for large enough
$n$, where the $\psi_n$ are continuous with $m_n \psi_n \to \psi$ uniformly on $B$.
Hence, $m_n(e^{\psi_n} - 1) \to \psi$, and so $\tilde\mu_n^{*m_n} \xrightarrow{w} \mu$. The proof in the
other direction is similar. Since $\tilde\mu_n^{*m_n}$ is $\mathrm{id}(0, b_n, m_n \mu_n)$ with
$b_n = m_n \int_{|x| \le 1} x\, \mu_n(dx)$, the last assertion follows by
Theorem 15.14. □
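As a numerical illustration of Corollary 15.16 (a sketch under assumptions that are not part of the text), take the i.i.d. array of Bernoulli variables with success probability lam/n and m_n = n. The row sums are then Binomial(n, lam/n), while the row sums of the associated compound Poisson array are exactly Poisson(lam), by the thinning property of Poisson variables. The helper names below are hypothetical; the code computes the total variation distance between the two row-sum laws, which should shrink as n grows.

```python
import math


def binomial_pmf(n, p):
    """Return [P{Bin(n, p) = k} for k = 0..n], computed by a ratio recurrence."""
    pmf = [(1.0 - p) ** n]
    for k in range(n):
        pmf.append(pmf[-1] * (n - k) / (k + 1) * p / (1.0 - p))
    return pmf


def poisson_pmf(lam, kmax):
    """Return [P{Poisson(lam) = k} for k = 0..kmax]."""
    pmf = [math.exp(-lam)]
    for k in range(kmax):
        pmf.append(pmf[-1] * lam / (k + 1))
    return pmf


def total_variation(n, lam):
    """Total variation distance between Bin(n, lam/n) and Poisson(lam)."""
    b = binomial_pmf(n, lam / n)
    q = poisson_pmf(lam, n)
    l1 = sum(abs(x - y) for x, y in zip(b, q))
    l1 += max(0.0, 1.0 - sum(q))  # Poisson mass above n, where the binomial has none
    return 0.5 * l1


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, total_variation(n, 2.0))
```

The distances decrease roughly like 1/n, in line with the equivalence of the two modes of convergence in the corollary.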
The weak convergence of infinitely divisible laws extends to a pathwise
approximation property for the corresponding Levy processes.
Theorem 15.17 (approximation of Levy processes, Skorohod) Let
$X, X^1, X^2, \ldots$ be Levy processes in $\mathbb{R}^d$ with $X_1^n \xrightarrow{d} X_1$. Then there
exist some processes $\tilde X^n \overset{d}{=} X^n$ such that $(\tilde X^n - X)_t^* \xrightarrow{P} 0$ for
all $t \ge 0$.
Before proving the general result, we consider two special cases.
Lemma 15.18 (compound Poisson case) The conclusion of Theorem
15.17 holds when $X, X^1, X^2, \ldots$ are compound Poisson with characteristic
measures $\nu, \nu_1, \nu_2, \ldots$ satisfying $\nu_n \xrightarrow{w} \nu$.
Proof: Allowing positive mass at the origin, we may assume that $\nu$ and
the $\nu_n$ have the same total mass, which may then be reduced to 1 through
a suitable scaling. If $\xi_1, \xi_2, \ldots$ and $\xi_1^n, \xi_2^n, \ldots$ are associated i.i.d.
sequences, then $(\xi_1^n, \xi_2^n, \ldots) \xrightarrow{d} (\xi_1, \xi_2, \ldots)$ by Theorem 4.29, and by
Theorem 4.30 we may assume that the convergence holds a.s. Letting $N$ be
an independent unit-rate Poisson process, and defining
$X_t = \sum_{j \le N_t} \xi_j$ and $X_t^n = \sum_{j \le N_t} \xi_j^n$, it follows that
$(X^n - X)_t^* \to 0$ a.s. for each $t \ge 0$. □
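The coupling in this proof can be sketched in a few lines. The example below is only an illustration under assumptions of my own: the jump laws are Exp(1) for X and exponential with rate 1 + 1/n for X^n, and the a.s. convergent jump sequences are produced by a quantile coupling (both jumps are functions of the same uniform variable). Both processes are driven by one shared unit-rate Poisson process, as in the proof.

```python
import math
import random


def coupled_paths(n, t_max, rng):
    """Simulate X and X^n on [0, t_max] with shared jump times.

    Returns (times, x, xn): the jump times of a unit-rate Poisson process
    and the values of the two compound Poisson paths right after each jump.
    """
    times, x, xn = [], [], []
    t, s, sn = 0.0, 0.0, 0.0
    while True:
        t += rng.expovariate(1.0)      # next point of the Poisson process N
        if t > t_max:
            break
        u = 1.0 - rng.random()         # uniform on (0, 1]
        xi = -math.log(u)              # jump of X, Exp(1) distributed
        xi_n = xi / (1.0 + 1.0 / n)    # coupled jump of X^n, rate 1 + 1/n
        s += xi
        sn += xi_n
        times.append(t)
        x.append(s)
        xn.append(sn)
    return times, x, xn


def sup_distance(x, xn):
    """sup_{s <= t_max} |X^n_s - X_s|; both paths are steps with common jump times."""
    return max((abs(a - b) for a, b in zip(x, xn)), default=0.0)


if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        _, x, xn = coupled_paths(n, 5.0, random.Random(0))
        print(n, sup_distance(x, xn))
```

Because the paths share their jump times, the supremum distance is attained at a jump, and it tends to 0 pathwise as n grows.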
Lemma 15.19 (case of small jumps) The conclusion of Theorem 15.17
holds when $EX^n \equiv 0$ and $1 \ge (\Delta X^n)_1^* \xrightarrow{P} 0$.
Proof: Since $(\Delta X^n)_1^* \xrightarrow{P} 0$, we may choose some constants $h_n \to 0$
with $m_n = h_n^{-1} \in \mathbb{N}$ such that $w(X^n, 1, h_n) \xrightarrow{P} 0$. By the stationarity
of the increments, it follows that $w(X^n, t, h_n) \xrightarrow{P} 0$ for all $t \ge 0$.
Next, Theorem 15.14 shows that $X$ is centered Gaussian. Thus, there exist
as in Theorem 14.20 some processes $Y^n \overset{d}{=} (X^n_{[m_n t] h_n})$ with
$(Y^n - X)_t^* \xrightarrow{P} 0$ for all $t \ge 0$. By Corollary 6.11 we may further choose
some processes $\tilde X^n \overset{d}{=} X^n$ with $Y^n \equiv \tilde X^n_{[m_n t] h_n}$ a.s. Then, as
$n \to \infty$ for fixed $t \ge 0$,
$$E[(\tilde X^n - X)_t^* \wedge 1] \le E[(Y^n - X)_t^* \wedge 1] + E[w(X^n, t, h_n) \wedge 1] \to 0.$$ □
Proof of Theorem 15.17: The asserted convergence is clearly equivalent
to $\rho(\tilde X^n, X) \to 0$, where $\rho$ denotes the metric
$$\rho(X, Y) = \int_0^\infty e^{-t} E[(X - Y)_t^* \wedge 1]\, dt.$$
For any $h > 0$ we may write $X = L^h + M^h + J^h$ and
$X^n = L^{n,h} + M^{n,h} + J^{n,h}$ with $L_t^h \equiv b^h t$ and $L_t^{n,h} \equiv b_n^h t$, where
$M^h$ and $M^{n,h}$ are martingales containing the Gaussian components and
all centered jumps of size $\le h$, and the processes $J^h$ and $J^{n,h}$ are
formed by all remaining jumps. Write $B$ for the Gaussian component of
$X$, and note that $\rho(M^h, B) \to 0$ as $h \to 0$ by Proposition 7.16.
For any $h > 0$ with $\nu\{|x| = h\} = 0$, it is clear from Theorem 15.14 that
$b_n^h \to b^h$ and $\nu_n^h \xrightarrow{w} \nu^h$, where $\nu^h$ and $\nu_n^h$ denote the restrictions of
$\nu$ and $\nu_n$, respectively, to the set $\{|x| > h\}$. The same theorem yields
$a_n^h \to a$ as $n \to \infty$ and then $h \to 0$, and so under those conditions
$M_1^{n,h} \xrightarrow{d} B_1$.
Now fix any $\varepsilon > 0$. By Lemma 15.19 there exist some constants
$h, r > 0$ and processes $\tilde M^{n,h} \overset{d}{=} M^{n,h}$ such that $\rho(M^h, B) \le \varepsilon$ and
$\rho(\tilde M^{n,h}, B) \le \varepsilon$ for all $n > r$. Furthermore, if $\nu\{|x| = h\} = 0$, there
exist by Lemma 15.18 some number $r' \ge r$ and processes
$\tilde J^{n,h} \overset{d}{=} J^{n,h}$ independent of $\tilde M^{n,h}$ such that $\rho(J^h, \tilde J^{n,h}) \le \varepsilon$
for all $n > r'$. We may finally choose $r'' \ge r'$ so large that
$\rho(L^h, L^{n,h}) \le \varepsilon$ for all $n > r''$. The processes
$\tilde X^n \equiv L^{n,h} + \tilde M^{n,h} + \tilde J^{n,h} \overset{d}{=} X^n$ then satisfy $\rho(X, \tilde X^n) \le 4\varepsilon$
for all $n > r''$. □
Combining Theorem 15.17 with Corollary 15.16, we get a similar approx-
imation theorem for random walks, which extends the result for Gaussian
limits in Theorem 14.20. A slightly weaker result is obtained by different
methods in Theorem 16.14.
Corollary 15.20 (approximation of random walks) Consider in $\mathbb{R}^d$ a
Levy process $X$ and some random walks $S^1, S^2, \ldots$ such that
$S^n_{m_n} \xrightarrow{d} X_1$ for some integers $m_n \to \infty$, and let $N$ be an independent
unit-rate Poisson process. Then there exist some processes
$X^n \overset{d}{=} (S^n \circ N_{m_n t})$ such that $(X^n - X)_t^* \xrightarrow{P} 0$ for all $t \ge 0$.
In particular, we may use this result to extend the first two arcsine laws
in Theorem 13.16 to symmetric Levy processes.
Theorem 15.21 (arcsine laws) Let $X$ be a symmetric Levy process in $\mathbb{R}$
with $X_1 \ne 0$ a.s. Then these random variables are arcsine distributed:
$$\tau_1 = \lambda\{t \le 1;\ X_t > 0\}, \qquad \tau_2 = \inf\{t \ge 0;\ X_t \vee X_{t-} = \sup\nolimits_{s \le 1} X_s\}. \tag{16}$$
The purpose of the condition $X_1 \ne 0$ a.s. is to exclude the degenerate
case of pure jump-type processes.
Lemma 15.22 (diffuseness, Doeblin) A measure $\mu = \mathrm{id}(a, b, \nu)$ in $\mathbb{R}^d$ is
diffuse iff $a \ne 0$ or $\nu\mathbb{R}^d = \infty$.
Proof: If $a = 0$ and $\nu\mathbb{R}^d < \infty$, then $\mu$ is compound Poisson apart from
a shift, and so it is clearly not diffuse. When either condition fails, then
it does so for at least one coordinate projection, and we may take $d = 1$.
If $a > 0$, the diffuseness is obvious by Lemma 1.28. Next assume that $\nu$
is unbounded, say with $\nu(0, \infty) = \infty$. For each $n \in \mathbb{N}$ we may then write
$\nu = \nu_n + \nu_n'$, where $\nu_n'$ is supported by $(0, n^{-1})$ and has total mass
$\log 2$. For $\mu$ we get a corresponding decomposition $\mu_n * \mu_n'$, where
$\mu_n'$ is compound Poisson with Levy measure $\nu_n'$ and $\mu_n'\{0\} = \tfrac12$. For
any $x \in \mathbb{R}$ and $\varepsilon > 0$ we get
$$\mu\{x\} \le \mu_n\{x\}\, \mu_n'\{0\} + \mu_n[x - \varepsilon, x)\, \mu_n'(0, \varepsilon] + \mu_n'(\varepsilon, \infty)
\le \tfrac12 \mu_n[x - \varepsilon, x] + \mu_n'(\varepsilon, \infty).$$
Letting $n \to \infty$ and then $\varepsilon \to 0$, and noting that $\mu_n' \xrightarrow{w} \delta_0$ and
$\mu_n \xrightarrow{w} \mu$, we get $\mu\{x\} \le \tfrac12 \mu\{x\}$ by Theorem 4.25, and so
$\mu\{x\} = 0$. □
Proof of Theorem 15.21: Introduce the random walk $S_k^n = X_{k/n}$, let $N$
be an independent unit-rate Poisson process, and define
$X_t^n = S^n \circ N_{nt}$. By Corollary 15.20 there exist some processes
$\tilde X^n \overset{d}{=} X^n$ with $(\tilde X^n - X)_1^* \xrightarrow{P} 0$. Define $\tau_1^n$ and $\tau_2^n$ as in (16)
in terms of $\tilde X^n$, and conclude from Lemmas 14.12 and 15.22 that
$\tau_i^n \xrightarrow{P} \tau_i$ for $i = 1, 2$.
Now define
$$\sigma_1^n = N_n^{-1} \sum_{k \le N_n} 1\{S_k^n > 0\}, \qquad \sigma_2^n = N_n^{-1} \min\{k;\ S_k^n = \max\nolimits_{j \le N_n} S_j^n\}.$$
Since $t^{-1} N_t \to 1$ a.s. by the law of large numbers, we have
$\sup_{t \le 1} |n^{-1} N_{nt} - t| \to 0$ a.s., and so $\sigma_2^n - \tau_2^n \to 0$ a.s. Applying the
same law to the sequence of holding times in $N$, we further note that
$\sigma_1^n - \tau_1^n \xrightarrow{P} 0$. Hence, $\sigma_i^n \xrightarrow{P} \tau_i$ for $i = 1, 2$. Now
$\sigma_1^n \overset{d}{=} \sigma_2^n$ by Corollary 11.14, and by Theorem 14.11 we have
$\sigma_2^n \xrightarrow{d} \sin^2 \alpha$, where $\alpha$ is $U(0, 2\pi)$. Hence,
$\tau_1 \overset{d}{=} \tau_2 \overset{d}{=} \sin^2 \alpha$. □
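To make the arcsine limit concrete, here is a small deterministic check. It rests on a classical fact that is not stated in the text: for the simple symmetric random walk, the time spent on the positive side during 2n steps equals 2k with probability u_{2k} u_{2n-2k}, where u_{2k} = C(2k, k) 4^{-k}. The sketch compares this discrete law with the distribution function (2/pi) arcsin(sqrt(x)) of sin^2(alpha); the function names are hypothetical.

```python
import math


def positive_time_pmf(n):
    """Return [u_{2k} * u_{2n-2k} for k = 0..n], with u_{2k} = C(2k, k) / 4**k."""
    u = [1.0]
    for k in range(1, n + 1):
        u.append(u[-1] * (2 * k - 1) / (2 * k))  # u_{2k} = u_{2k-2} (2k-1)/(2k)
    return [u[k] * u[n - k] for k in range(n + 1)]


def arcsine_cdf(x):
    """Distribution function of sin^2(alpha) for alpha uniform on (0, 2*pi)."""
    return 2.0 / math.pi * math.asin(math.sqrt(x))


def discrete_cdf(n, x):
    """P{fraction of time on the positive side <= x} for a walk of 2n steps."""
    pmf = positive_time_pmf(n)
    return sum(pmf[: int(math.floor(n * x)) + 1])


if __name__ == "__main__":
    for x in (0.1, 0.25, 0.5, 0.9):
        print(x, discrete_cdf(500, x), arcsine_cdf(x))
```

The two columns printed by the usage loop agree up to an error that shrinks as n grows, with most of the mass near 0 and 1 rather than near 1/2.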
The preceding results will now be used to complete the classical limit
theory for sums of independent random variables begun in Chapter 5.
Recall that a null array in $\mathbb{R}^d$ is defined as a family of random vectors
$\xi_{nj}$, $j = 1, \ldots, m_n$, $n \in \mathbb{N}$, such that the $\xi_{nj}$ are independent for each
$n$ and satisfy $\sup_j E[|\xi_{nj}| \wedge 1] \to 0$. Our first goal is to extend
Theorem 5.11, by giving the basic connection between sums with positive
and symmetric terms. Here we write $p_2$ for the mapping $x \mapsto x^2$.
Proposition 15.23 (positive and symmetric terms) Let $(\xi_{nj})$ be a null
array of symmetric random variables, and let $\xi$ and $\eta$ be infinitely
divisible with characteristics $(a, 0, \nu)$ and $(a, \nu \circ p_2^{-1})$, respectively,
where $\nu$ is symmetric and $a \ge 0$. Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff
$\sum_j \xi_{nj}^2 \xrightarrow{d} \eta$.
Again the proof may be based on a simple compound Poisson
approximation. Here $\xi_n \overset{d}{\sim} \eta_n$ means that $\xi_n \xrightarrow{d} \xi$ iff $\eta_n \xrightarrow{d} \xi$ for
any $\xi$.
Lemma 15.24 (approximation) Let $(\xi_{nj})$ be a null array of positive or
symmetric random variables, and let $(\tilde\xi_{nj})$ be an associated compound
Poisson array. Then $\sum_j \xi_{nj} \overset{d}{\sim} \sum_j \tilde\xi_{nj}$.
Proof: Write $\mu = \mathcal{L}(\xi)$ and $\mu_{nj} = \mathcal{L}(\xi_{nj})$. In the symmetric case we
need to show that
$$\prod\nolimits_j \hat\mu_{nj} \to \hat\mu \quad \Longleftrightarrow \quad \prod\nolimits_j \exp(\hat\mu_{nj} - 1) \to \hat\mu,$$
which is immediate from Lemmas 5.6 and 5.8. In the nonnegative case, a
similar argument applies to the Laplace transforms. □
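Proof of Proposition 15.23: Define $\mu_{nj} = \mathcal{L}(\xi_{nj})$, and fix any $h > 0$
with $\nu\{|x| = h\} = 0$. By Theorem 15.14 (i) and Lemma 15.24 we have
$\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff
$$\sum\nolimits_j \mu_{nj} \xrightarrow{v} \nu \ \text{ on } \overline{\mathbb{R}} \setminus \{0\},$$
$$\sum\nolimits_j E[\xi_{nj}^2;\ |\xi_{nj}| \le h] \to a + \int_{|x| \le h} x^2\, \nu(dx),$$
whereas $\sum_j \xi_{nj}^2 \xrightarrow{d} \eta$ iff
$$\sum\nolimits_j \mu_{nj} \circ p_2^{-1} \xrightarrow{v} \nu \circ p_2^{-1} \ \text{ on } (0, \infty],$$
$$\sum\nolimits_j E[\xi_{nj}^2;\ \xi_{nj}^2 \le h^2] \to a + \int_{y \le h^2} y\, (\nu \circ p_2^{-1})(dy).$$
The two sets of conditions are equivalent by Lemma 1.22. □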
The limit problem for general null arrays is more delicate, since a
compound Poisson approximation as in Corollary 15.16 or Lemma 15.24
applies only after a careful centering, as specified by the following key
result.
Theorem 15.25 (compound Poisson approximation) Let $(\xi_{nj})$ be a null
array of random vectors in $\mathbb{R}^d$, and fix any $h > 0$. Define
$\eta_{nj} = \xi_{nj} - b_{nj}$, where $b_{nj} = E[\xi_{nj}; |\xi_{nj}| \le h]$, and let $(\tilde\eta_{nj})$ be an
associated compound Poisson array. Then
$$\sum\nolimits_j \xi_{nj} \overset{d}{\sim} \sum\nolimits_j (\tilde\eta_{nj} + b_{nj}). \tag{17}$$
A technical estimate is needed for the proof.
Lemma 15.26 (uniform summability) Let the random vectors
$\eta_{nj} = \xi_{nj} - b_{nj}$ in Theorem 15.25 have characteristic functions $\varphi_{nj}$.
Then either condition in (17) implies
$$\limsup_{n \to \infty} \sum\nolimits_j |1 - \varphi_{nj}(u)| < \infty, \qquad u \in \mathbb{R}^d.$$
Proof: By the definitions of $b_{nj}$, $\eta_{nj}$, and $\varphi_{nj}$, we have
$$1 - \varphi_{nj}(u) = E\bigl[1 - e^{iu'\eta_{nj}} + iu'\eta_{nj} 1\{|\xi_{nj}| \le h\}\bigr] - iu'b_{nj} P\{|\xi_{nj}| > h\}.$$
Putting
$$a_n = \sum\nolimits_j E[\eta_{nj} \eta_{nj}';\ |\xi_{nj}| \le h], \qquad p_n = \sum\nolimits_j P\{|\xi_{nj}| > h\},$$
and using Lemma 5.14, we get
$$\sum\nolimits_j |1 - \varphi_{nj}(u)| \lesssim u'a_n u + (2 + |u|)\, p_n.$$
Hence, it is enough to show that $(u'a_n u)$ and $(p_n)$ are bounded.
Assuming the second condition in (17), the desired boundedness follows
easily from Theorem 15.14, together with the fact that
$\max_j |b_{nj}| \to 0$. If instead $\sum_j \xi_{nj} \xrightarrow{d} \xi$, we may introduce an
independent copy $(\xi_{nj}')$ of the array $(\xi_{nj})$ and apply Theorem 15.14 and
Lemma 15.24 to the symmetric random variables
$\zeta_{nj} = u'\xi_{nj} - u'\xi_{nj}'$. For any $h' > 0$, this gives
$$\limsup_{n \to \infty} \sum\nolimits_j P\{|\zeta_{nj}| > h'\} < \infty, \tag{18}$$
$$\limsup_{n \to \infty} \sum\nolimits_j E[(\zeta_{nj})^2;\ |\zeta_{nj}| \le h'] < \infty. \tag{19}$$
The boundedness of $p_n$ follows from (18) and Lemma 4.19. Next we note
that (19) remains true with the condition $|\zeta_{nj}| \le h'$ replaced by
$|\xi_{nj}| \vee |\xi_{nj}'| \le h$. Furthermore, by the independence of $\xi_{nj}$ and
$\xi_{nj}'$,
$$\tfrac12 \sum\nolimits_j E[(\zeta_{nj})^2;\ |\xi_{nj}| \vee |\xi_{nj}'| \le h]
= \sum\nolimits_j E[(u'\eta_{nj})^2;\ |\xi_{nj}| \le h]\, P\{|\xi_{nj}| \le h\} - \sum\nolimits_j \bigl(E[u'\eta_{nj};\ |\xi_{nj}| \le h]\bigr)^2$$
$$\ge u'a_n u\, \min\nolimits_j P\{|\xi_{nj}| \le h\} - \sum\nolimits_j \bigl(u'b_{nj} P\{|\xi_{nj}| > h\}\bigr)^2.$$
Here the last sum is bounded by $p_n \max_j (u'b_{nj})^2 \to 0$, and the minimum
on the right tends to 1. The boundedness of $(u'a_n u)$ now follows by
(19). □
Proof of Theorem 15.25: By Lemma 5.13 it is enough to show that
$\sum_j |\varphi_{nj}(u) - \exp\{\varphi_{nj}(u) - 1\}| \to 0$, where $\varphi_{nj}$ denotes the
characteristic function of $\eta_{nj}$. This is clear from Taylor's formula,
together with Lemmas 5.6 and 15.26. □
In particular, we may now identify the possible limits.
Corollary 15.27 (limit laws, Feller, Khinchin) Let $(\xi_{nj})$ be a null array
of random vectors in $\mathbb{R}^d$ such that $\sum_j \xi_{nj} \xrightarrow{d} \xi$ for some random
vector $\xi$. Then $\xi$ is infinitely divisible.
Proof: The random vectors $\tilde\eta_{nj}$ in Theorem 15.25 are infinitely
divisible, so the same thing is true for the sums $\sum_j (\tilde\eta_{nj} + b_{nj})$ in
(17). The infinite divisibility of $\xi$ then follows by Theorem 15.12. □
We may further combine Theorems 15.14 and 15.25 to obtain explicit
convergence criteria for general null arrays. The present result generalizes
Theorem 5.15 for Gaussian limits and Corollary 15.16 for i.i.d. arrays. For
convenience, we write $\mathrm{cov}[\xi; A]$ for the covariance matrix of the random
vector $\xi 1_A$.
Theorem 15.28 (convergence criteria for null arrays, Doeblin, Gnedenko)
Let $(\xi_{nj})$ be a null array of random vectors in $\mathbb{R}^d$, let $\xi$ be
$\mathrm{id}(a, b, \nu)$, and fix any $h > 0$ with $\nu\{|x| = h\} = 0$. Then
$\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff these conditions hold:
(i) $\sum_j \mathcal{L}(\xi_{nj}) \xrightarrow{v} \nu$ on $\overline{\mathbb{R}^d} \setminus \{0\}$;
(ii) $\sum_j \mathrm{cov}[\xi_{nj}; |\xi_{nj}| \le h] \to a^h$;
(iii) $\sum_j E[\xi_{nj}; |\xi_{nj}| \le h] \to b^h$.
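The three sums in conditions (i)-(iii) can be evaluated exactly for a simple array. The following sketch uses an example of my own, not one from the text: row n consists of n i.i.d. terms taking the value 1 with probability lam/n and the values +n^{-1/2}, -n^{-1/2} with probability (1 - lam/n)/2 each. For h = 1/2 the sums approach lam (the tail mass of a limiting measure lam times the unit point mass at 1), 1, and 0, respectively.

```python
import math


def row_law(n, lam):
    """Common law of the n terms in row n, as a list of (value, probability)."""
    small = 1.0 / math.sqrt(n)
    q = 0.5 * (1.0 - lam / n)
    return [(1.0, lam / n), (small, q), (-small, q)]


def criteria(n, lam, h, x):
    """Return the sums appearing in conditions (i)-(iii) for the example array.

    tail   : sum_j P{xi_nj > x}               (condition (i), tested on (x, oo))
    var_h  : sum_j var(xi_nj 1{|xi_nj| <= h})  (condition (ii), d = 1)
    mean_h : sum_j E[xi_nj; |xi_nj| <= h]      (condition (iii))
    """
    law = row_law(n, lam)
    tail = n * sum(p for v, p in law if v > x)
    mean_1 = sum(v * p for v, p in law if abs(v) <= h)
    second_1 = sum(v * v * p for v, p in law if abs(v) <= h)
    var_h = n * (second_1 - mean_1 ** 2)
    return tail, var_h, n * mean_1


if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        print(n, criteria(n, 2.0, 0.5, 0.5))
```

In this example the small symmetric terms produce the Gaussian part of the limit, and the rare unit jumps produce the Poisson part.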
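Proof: Define $a_{nj} = \mathrm{cov}[\xi_{nj}; |\xi_{nj}| \le h]$ and
$b_{nj} = E[\xi_{nj}; |\xi_{nj}| \le h]$. By Theorems 15.14 and 15.25 the convergence
$\sum_j \xi_{nj} \xrightarrow{d} \xi$ is equivalent to the conditions
(i') $\sum_j \mathcal{L}(\eta_{nj}) \xrightarrow{v} \nu$ on $\overline{\mathbb{R}^d} \setminus \{0\}$,
(ii') $\sum_j E[\eta_{nj} \eta_{nj}'; |\eta_{nj}| \le h] \to a^h$,
(iii') $\sum_j (b_{nj} + E[\eta_{nj}; |\eta_{nj}| \le h]) \to b^h$.
Here (i) and (i') are equivalent since $\max_j |b_{nj}| \to 0$. Using (i) and
the facts that $\max_j |b_{nj}| \to 0$ and $\nu\{|x| = h\} = 0$, it is further clear
that the sets $\{|\eta_{nj}| \le h\}$ in (ii') and (iii') can be replaced by
$\{|\xi_{nj}| \le h\}$. To prove the equivalence of (ii) and (ii'), it is then
enough to note that, in view of (i),
$$\Bigl\| \sum\nolimits_j \{a_{nj} - E[\eta_{nj} \eta_{nj}';\ |\xi_{nj}| \le h]\} \Bigr\|
\le \Bigl\| \sum\nolimits_j b_{nj} b_{nj}' P\{|\xi_{nj}| > h\} \Bigr\|
\le \max\nolimits_j |b_{nj}|^2 \sum\nolimits_j P\{|\xi_{nj}| > h\} \to 0.$$
Similarly, (iii) and (iii') are equivalent because
$$\Bigl| \sum\nolimits_j E[\eta_{nj};\ |\xi_{nj}| \le h] \Bigr|
= \Bigl| \sum\nolimits_j b_{nj} P\{|\xi_{nj}| > h\} \Bigr|
\le \max\nolimits_j |b_{nj}| \sum\nolimits_j P\{|\xi_{nj}| > h\} \to 0.$$ □
In the one-dimensional case, we give two probabilistic interpretations of
the first condition in Theorem 15.28, one of which involves the row-wise
extremes. For random measures $\eta$ and $\eta_n$ on $\overline{\mathbb{R}} \setminus \{0\}$, the convergence
$\eta_n \xrightarrow{d} \eta$ on $\overline{\mathbb{R}} \setminus \{0\}$ is defined by the condition $\eta_n f \xrightarrow{d} \eta f$ for
all $f \in C_K^+(\overline{\mathbb{R}} \setminus \{0\})$.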
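Theorem 15.29 (sums and extremes) Let $(\xi_{nj})$ be a null array of random
variables with distributions $\mu_{nj}$, and define $\eta_n = \sum_j \delta_{\xi_{nj}}$ and
$\alpha_n^{\pm} = \max_j (\pm \xi_{nj})$, $n \in \mathbb{N}$. Fix a Levy measure $\nu$ on
$\overline{\mathbb{R}} \setminus \{0\}$, let $\eta$ be a Poisson process on $\overline{\mathbb{R}} \setminus \{0\}$ with $E\eta = \nu$, and
put $\alpha^{\pm} = \sup\{x \ge 0;\ \eta\{\pm x\} > 0\}$. Then these conditions are
equivalent:
(i) $\sum_j \mu_{nj} \xrightarrow{v} \nu$ on $\overline{\mathbb{R}} \setminus \{0\}$;
(ii) $\eta_n \xrightarrow{d} \eta$ on $\overline{\mathbb{R}} \setminus \{0\}$;
(iii) $\alpha_n^{\pm} \xrightarrow{d} \alpha^{\pm}$.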
The equivalence of (i) and (ii) is an immediate consequence of Theorem
16.18 in the next chapter. Here we give a direct elementary proof.
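Proof: Condition (i) holds iff
$$\sum\nolimits_j \mu_{nj}(x, \infty) \to \nu(x, \infty), \qquad \sum\nolimits_j \mu_{nj}(-\infty, -x) \to \nu(-\infty, -x), \tag{20}$$
for all $x > 0$ with $\nu\{\pm x\} = 0$. By Lemma 5.8, the first condition in
(20) is equivalent to
$$P\{\alpha_n^+ \le x\} = \prod\nolimits_j (1 - P\{\xi_{nj} > x\}) \to e^{-\nu(x, \infty)} = P\{\alpha^+ \le x\},$$
which holds for all continuity points $x > 0$ iff $\alpha_n^+ \xrightarrow{d} \alpha^+$.
Similarly, the second condition in (20) holds iff $\alpha_n^- \xrightarrow{d} \alpha^-$. Thus,
(i) and (iii) are equivalent.
To show that (i) implies (ii), we may write the latter condition in the
form
$$\sum\nolimits_j f(\xi_{nj}) \xrightarrow{d} \eta f, \qquad f \in C_K^+(\overline{\mathbb{R}} \setminus \{0\}). \tag{21}$$
Here the variables $f(\xi_{nj})$ form a null array with distributions
$\mu_{nj} \circ f^{-1}$, and $\eta f$ is compound Poisson with characteristic measure
$\nu \circ f^{-1}$. Thus, Theorem 15.14 (ii) shows that (21) is equivalent to the
conditions
$$\sum\nolimits_j \mu_{nj} \circ f^{-1} \xrightarrow{v} \nu \circ f^{-1} \ \text{ on } (0, \infty], \tag{22}$$
$$\lim_{\varepsilon \to 0} \limsup_{n \to \infty} \sum\nolimits_j \int_{f(x) \le \varepsilon} f(x)\, \mu_{nj}(dx) = 0. \tag{23}$$
Now (22) follows immediately from (i). To deduce (23), it suffices to note
that the sum on the left is bounded by
$\sum_j \mu_{nj}(f \wedge \varepsilon) \to \nu(f \wedge \varepsilon)$.
Finally, assume (ii). By a simple approximation,
$\eta_n(x, \infty) \xrightarrow{d} \eta(x, \infty)$ for any $x > 0$ with $\nu\{x\} = 0$. In particular, for
such an $x$,
$$P\{\alpha_n^+ \le x\} = P\{\eta_n(x, \infty) = 0\} \to P\{\eta(x, \infty) = 0\} = P\{\alpha^+ \le x\},$$
and so $\alpha_n^+ \xrightarrow{d} \alpha^+$. Similarly, $\alpha_n^- \xrightarrow{d} \alpha^-$, which proves (iii). □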
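The product formula linking (i) and (iii) is easy to check numerically. The sketch below uses an assumed example that does not come from the text: row n consists of n i.i.d. variables 1/(n U) with U uniform on (0, 1), so that P{xi_nj > x} = min(1, 1/(n x)). The tail sums then converge to 1/x, which is the mass of (x, oo) under the measure x^{-2} dx on (0, oo), and the distribution function of the row maximum converges to exp(-1/x).

```python
import math


def tail_sum(n, x):
    """sum_j P{xi_nj > x} for the example array (n identical terms)."""
    return n * min(1.0, 1.0 / (n * x))


def max_cdf(n, x):
    """P{max_j xi_nj <= x} = prod_j (1 - P{xi_nj > x})."""
    return (1.0 - min(1.0, 1.0 / (n * x))) ** n


def limit_cdf(x):
    """exp(-nu(x, oo)) with nu(x, oo) = 1/x."""
    return math.exp(-1.0 / x)


if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        print(n, tail_sum(n, 2.0), max_cdf(n, 2.0), limit_cdf(2.0))
```

The limit is the Frechet distribution function with index 1, which here plays the role of the law of the largest point of the limiting Poisson process.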
Exercises
1. Show that a Levy process $X$ in $\mathbb{R}$ is a subordinator iff $X_1 \ge 0$ a.s.
2. Show that the Cauchy distribution $\mu(dx) = \pi^{-1}(1 + x^2)^{-1} dx$ is
strictly 1-stable, and determine the corresponding Levy measure $\nu$.
(Hint: Check that $\hat\mu(u) = e^{-|u|}$. By symmetry, $\nu(dx) = c x^{-2} dx$ for some
$c > 0$, and it remains to determine $c$.)
3. Let $X$ be a weakly $p$-stable Levy process. If $p \ne 1$, show that the
process $X_t - ct$ is strictly $p$-stable for a suitable constant $c$. Note
that the centering fails for $p = 1$.
4. Extend Proposition 15.23 to null arrays of spherically symmetric random
vectors in $\mathbb{R}^d$.
5. Show by an example that Theorem 15.25 fails without the centering at
truncated means. (Hint: Without the centering, condition (ii) of Theorem
15.28 becomes $\sum_j E[\xi_{nj} \xi_{nj}'; |\xi_{nj}| \le h] \to a^h$.)
6. Deduce Theorems 5.7 and 5.11 from Theorem 15.14 and Lemma 15.24.
7. For a Levy process $X$ of effective dimension $d \ge 3$, show that
$|X_t| \to \infty$ a.s. as $t \to \infty$. (Hint: Define $\tau = \inf\{t;\ |X_t| > 1\}$, and
iterate to form a random walk $(S_n)$. Show that the latter has the same
effective dimension as $X$, and use Theorem 9.8.)
8. Let $X$ be a Levy process in $\mathbb{R}$, and fix any $p \in (0, 2)$. Show that
$t^{-1/p} X_t$ converges a.s. iff $E|X_1|^p < \infty$ and either $p \le 1$ or
$EX_1 = 0$. (Hint: Define a random walk $(S_n)$ as before, show that $S_1$
satisfies the same moment condition as $X_1$, and apply Theorem 4.23.)
9. If $\xi$ is $\mathrm{id}(a, b, \nu)$ and $p > 0$, show that $E|\xi|^p < \infty$ iff
$\int_{|x| > 1} |x|^p\, \nu(dx) < \infty$. (Hint: If $\nu$ has bounded support, then
$E|\xi|^p < \infty$ for all $p$. It is then enough to consider compound Poisson
distributions, for which the result is elementary.)
10. Show by a direct argument that a $\mathbb{Z}_+$-valued random variable $\xi$ is
infinitely divisible (on $\mathbb{Z}_+$) iff $-\log E s^{\xi} = \sum_k (1 - s^k)\, \nu_k$,
$s \in (0, 1]$, for some unique, bounded measure $\nu = (\nu_k)$ on $\mathbb{N}$. (Hint:
Assuming $\mathcal{L}(\xi) = \mu_n^{*n}$, use the inequality $1 - x \le e^{-x}$ to show that
the sequence $(n\mu_n)$ is tight on $\mathbb{N}$. Then $n\mu_n \xrightarrow{w} \nu$ along a subsequence
for some bounded measure $\nu$ on $\mathbb{N}$. Finally note that
$-\log(1 - x) \sim x$ as $x \to 0$. For the uniqueness, take differences and
use the uniqueness theorem for power series.)
11. Show by a direct argument that a random variable $\xi \ge 0$ is infinitely
divisible iff $-\log E e^{-u\xi} = ua + \int (1 - e^{-ux})\, \nu(dx)$, $u \ge 0$, for some
unique constant $a \ge 0$ and measure $\nu$ on $(0, \infty)$ with
$\int (|x| \wedge 1)\, \nu(dx) < \infty$. (Hint: If $\mathcal{L}(\xi) = \mu_n^{*n}$, note that the measures
$\chi_n(dx) = n(1 - e^{-x})\, \mu_n(dx)$ are tight on $\mathbb{R}_+$. Then $\chi_n \xrightarrow{w} \chi$ along
a subsequence, and we may write
$\chi(dx) = a\delta_0(dx) + (1 - e^{-x})\, \nu(dx)$. The desired representation now
follows as before. To get the uniqueness, take differences and use the
uniqueness theorem for Laplace transforms.)
12. Show by a direct argument that a random variable $\xi$ is infinitely
divisible iff $\psi_u = \log E e^{iu\xi}$ exists and is given by (7) for some unique
constants $a \ge 0$ and $b$ and measure $\nu$ on $\mathbb{R} \setminus \{0\}$ with
$\int (x^2 \wedge 1)\, \nu(dx) < \infty$. (Hint: Proceed as in Lemma 15.15.)
13. Given a semigroup of infinitely divisible distributions $\mu_t$, show that
there exists a process $X$ on $\mathbb{R}_+$ with stationary, independent increments
and $\mathcal{L}(X_t) = \mu_t$ for all $t \ge 0$. Starting from a suitable Poisson process
and an independent Brownian motion, construct a Levy process $Y$ with the
same property. Conclude that $X$ has a version with rcll paths and a similar
representation as $Y$. (Hint: Use Lemma 3.24 and Theorems 6.10 and 6.16.)
Chapter 16
Convergence of Random Processes,
Measures, and Sets
Relative compactness and tightness; uniform topology on C(K, S);
Skorohod's J 1 -topology; equicontinuity and tightness; conver-
gence of random measures; superposition and thinning; ex-
changeable sequences and processes; simple point processes and
random closed sets
The basic notions of weak or distributional convergence were introduced in
Chapter 4, and in Chapter 5 we studied the special case of distributions
on Euclidean spaces. The purpose of this chapter is to develop the gen-
eral weak convergence theory into a powerful tool that applies to a wide
range of set, measure, and function spaces. In particular, some functional
limit theorems derived in the last two chapters by cumbersome embedding
and approximation techniques will then be accessible by straightforward
compactness arguments.
The key result is Prohorov's theorem, which gives the basic connection
between tightness and relative distributional compactness. This result will
enable us to convert some classical compactness criteria into convenient
probabilistic versions. In particular, we shall see how the Arzela-Ascoli
theorem yields a corresponding criterion for distributional compactness of
continuous processes. Similarly, an optional equicontinuity condition will
be shown to guarantee the appropriate compactness for processes that are
right-continuous with left-hand limits (rcll). We shall also derive some gen-
eral criteria for convergence in distribution of random measures and sets,
with special attention to the point process case.
The general criteria will be applied to some interesting concrete
situations. In addition to some already familiar results from Chapters 14 and
13, we shall obtain a general functional limit theorem for sampling from
finite populations and derive convergence criteria for superpositions and
thinnings of point processes. Further applications appear in subsequent
chapters, such as a general approximation result for Markov chains in Chap-
ter 19 and a method for constructing weak solutions to SDEs in Chapter
21.
Beginning with the case of continuous processes, let us fix two metric
spaces $(K, d)$ and $(S, \rho)$, where $K$ is compact and $S$ is separable and
complete, and consider the space $C(K, S)$ of continuous functions from $K$
to $S$, endowed with the uniform metric
$\hat\rho(x, y) = \sup_{t \in K} \rho(x_t, y_t)$. For each $t \in K$ we may introduce the
evaluation map $\pi_t : x \mapsto x_t$ from $C(K, S)$ to $S$. The following result shows
that the random elements in $C(K, S)$ are precisely the continuous
$S$-valued processes on $K$.
Lemma 16.1 (Borel sets and evaluations) $\mathcal{B}(C(K, S)) = \sigma\{\pi_t;\ t \in K\}$.
Proof: The maps $\pi_t$ are continuous, hence Borel measurable, and so
the generated $\sigma$-field $\mathcal{C}$ is contained in $\mathcal{B}(C(K, S))$. To prove the
reverse relation, we need to show that any open subset $G \subset C(K, S)$ lies
in $\mathcal{C}$. From the Arzela-Ascoli Theorem A2.1 we note that $C(K, S)$ is
$\sigma$-compact and hence separable. Thus, $G$ is a countable union of open
balls $B_{x,r} = \{y \in C(K, S);\ \hat\rho(x, y) < r\}$, and it suffices to prove that
the latter lie in $\mathcal{C}$. But this is clear, since for any countable dense
set $D \subset K$,
$$\bar B_{x,r} = \bigcap_{t \in D} \{y \in C(K, S);\ \rho(x_t, y_t) \le r\}.$$ □
If $X$ and $X^n$ are random processes on $K$, we write $X^n \xrightarrow{fd} X$ for
convergence of the finite-dimensional distributions, in the sense that
$$(X^n_{t_1}, \ldots, X^n_{t_k}) \xrightarrow{d} (X_{t_1}, \ldots, X_{t_k}), \qquad t_1, \ldots, t_k \in K,\ k \in \mathbb{N}. \tag{1}$$
Though by Proposition 3.2 the distribution of a random process is
determined by the family of finite-dimensional distributions, condition (1)
is insufficient in general for the convergence $X^n \xrightarrow{d} X$ in $C(K, S)$. This
is already clear when the processes are nonrandom, since pointwise
convergence of a sequence of functions need not be uniform. To overcome
this difficulty, we may add a compactness condition. Recall that a sequence
of random elements $\xi_1, \xi_2, \ldots$ is said to be relatively compact in
distribution if every subsequence has a further subsequence that converges
in distribution.
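The nonrandom case mentioned above can be illustrated with a standard example (my choice, not taken from the text): the tent functions x_n(t) = max(0, 1 - |nt - 1|) on [0, 1] converge to 0 at every fixed t, so all finite-dimensional "distributions" converge, yet each x_n has supremum 1, so there is no convergence in the uniform metric.

```python
def tent(n, t):
    """Tent function with peak 1 at t = 1/n, vanishing outside [0, 2/n]."""
    return max(0.0, 1.0 - abs(n * t - 1.0))


def sup_norm(n, grid_size=10_000):
    """Approximate sup_{t in [0, 1]} |x_n(t)| on a uniform grid."""
    return max(tent(n, k / grid_size) for k in range(grid_size + 1))


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # Pointwise values tend to 0, while the uniform distance to 0 stays near 1.
        print(n, tent(n, 0.0), tent(n, 0.3), sup_norm(n))
```

The sequence (x_n) has no uniformly convergent subsequence, which is exactly what a compactness condition is meant to rule out.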
Lemma 16.2 (weak convergence via compactness) Let $X, X_1, X_2, \ldots$ be
random elements in $C(K, S)$. Then $X_n \xrightarrow{d} X$ iff $X_n \xrightarrow{fd} X$ and $(X_n)$ is
relatively compact in distribution.
Proof: If $X_n \xrightarrow{d} X$, then $X_n \xrightarrow{fd} X$ by Theorem 4.27, and $(X_n)$ is
trivially relatively compact in distribution. Now assume instead that
$(X_n)$ satisfies the two conditions. If $X_n \not\xrightarrow{d} X$, we may choose a
bounded continuous function $f : C(K, S) \to \mathbb{R}$ and an $\varepsilon > 0$ such that
$|Ef(X_n) - Ef(X)| > \varepsilon$ along some subsequence $N' \subset \mathbb{N}$. By the relative
compactness we may choose a further subsequence $N''$ and a process $Y$
such that $X_n \xrightarrow{d} Y$ along $N''$. But then $X_n \xrightarrow{fd} Y$ along $N''$, and since
also $X_n \xrightarrow{fd} X$, Proposition 3.2 yields $X \overset{d}{=} Y$. Thus, $X_n \xrightarrow{d} X$ along
$N''$, and so $Ef(X_n) \to Ef(X)$ along the same sequence, a contradiction.
We conclude that $X_n \xrightarrow{d} X$. □
The last result shows the importance of finding tractable conditions for
a random sequence $\xi_1, \xi_2, \ldots$ in a metric space $S$ to be relatively
compact. Generalizing a notion from Chapter 4, we say that $(\xi_n)$ is tight
if
$$\sup\nolimits_K \liminf_{n \to \infty} P\{\xi_n \in K\} = 1, \tag{2}$$
where the supremum extends over all compact subsets $K \subset S$.
We may now state the key result of weak convergence theory, the equiv-
alence between tightness and relative compactness for random elements
in sufficiently regular metric spaces. A version for Euclidean spaces was
obtained in Proposition 5.21.
Theorem 16.3 (tightness and relative compactness, Prohorov) For any
sequence of random elements $\xi_1, \xi_2, \ldots$ in a metric space $S$, tightness
implies relative compactness in distribution, and the two conditions are
equivalent when $S$ is separable and complete.
In particular, we note that when $S$ is separable and complete, a single
random element $\xi$ in $S$ is tight, in the sense that
$\sup_K P\{\xi \in K\} = 1$. In that case we may clearly replace the "lim inf"
in (2) by "inf."
For the proof of Theorem 16.3 we need a simple lemma. Recall from
Lemma 1.6 that a random element in a subspace of a metric space S may
also be regarded as a random element in S.
Lemma 16.4 (preservation of tightness) Tightness is preserved by
continuous mappings. In particular, if $(\xi_n)$ is a tight sequence of random
elements in a subspace $A$ of some metric space $S$, then $(\xi_n)$ remains
tight when regarded as a sequence in $S$.
Proof: Compactness is preserved by continuous mappings. This applies
in particular to the natural embedding $I : A \to S$. □
Proof of Theorem 16.3 (Varadarajan): For $S = \mathbb{R}^d$ the result was proved
in Proposition 5.21. Turning to the case when $S = \mathbb{R}^\infty$, consider a tight
sequence of random elements $\xi^n = (\xi_1^n, \xi_2^n, \ldots)$ in $\mathbb{R}^\infty$. Writing
$\eta_k^n = (\xi_1^n, \ldots, \xi_k^n)$, we conclude from Lemma 16.4 that the sequence
$(\eta_k^n;\ n \in \mathbb{N})$ is tight in $\mathbb{R}^k$ for each $k \in \mathbb{N}$. Given any subsequence
$N' \subset \mathbb{N}$, we may then use a diagonal argument to extract a further
subsequence $N''$ such that $\eta_k^n \xrightarrow{d}$ some $\eta_k$ as $n \to \infty$ along $N''$ for
fixed $k \in \mathbb{N}$. The sequence $(\mathcal{L}(\eta_k))$ is projective by the continuity of
the coordinate projections, and so by Theorem 6.14 there exists a random
sequence $\xi = (\xi_1, \xi_2, \ldots)$ such that $(\xi_1, \ldots, \xi_k) \overset{d}{=} \eta_k$ for each $k$.
But then $\xi^n \xrightarrow{fd} \xi$ along $N''$, and so Theorem 4.29 yields $\xi^n \xrightarrow{d} \xi$
along the same sequence.
Next assume that $S \subset \mathbb{R}^\infty$. If $(\xi_n)$ is tight in $S$, then by Lemma 16.4
it remains tight as a sequence in $\mathbb{R}^\infty$. Hence, for any sequence
$N' \subset \mathbb{N}$ there exist a further subsequence $N''$ and some random element
$\xi$ such that $\xi_n \xrightarrow{d} \xi$ in $\mathbb{R}^\infty$ along $N''$. To show that the convergence
remains valid in $S$, it suffices by Lemma 4.26 to verify that $\xi \in S$ a.s.
Then choose some compact sets $K_m \subset S$ with
$\liminf_n P\{\xi_n \in K_m\} \ge 1 - 2^{-m}$ for each $m \in \mathbb{N}$. Since the $K_m$ remain
closed in $\mathbb{R}^\infty$, Theorem 4.25 yields
$$P\{\xi \in K_m\} \ge \limsup_{n \in N''} P\{\xi_n \in K_m\} \ge \liminf_{n \to \infty} P\{\xi_n \in K_m\} \ge 1 - 2^{-m},$$
and so $\xi \in \bigcup_m K_m \subset S$ a.s.
Now assume that $S$ is $\sigma$-compact. In particular, it is then separable and
therefore homeomorphic to a subset $A \subset \mathbb{R}^\infty$. By Lemma 16.4 the tightness
of $(\xi_n)$ carries over to the image sequence $(\tilde\xi_n)$ in $A$, and by Lemma
4.26 the possible relative compactness of $(\tilde\xi_n)$ implies the same property
for $(\xi_n)$. This reduces the discussion to the previous case.
Now turn to the general case. If $(\xi_n)$ is tight, there exist some compact
sets $K_m \subset S$ with $\liminf_n P\{\xi_n \in K_m\} \ge 1 - 2^{-m}$. In particular,
$P\{\xi_n \in A\} \to 1$, where $A = \bigcup_m K_m$, and so we may choose some random
elements $\eta_n$ in $A$ with $P\{\xi_n = \eta_n\} \to 1$. Here $(\eta_n)$ is again tight,
even as a sequence in $A$, and since $A$ is $\sigma$-compact, the previous
argument shows that $(\eta_n)$ is relatively compact as a sequence in $A$. By
Lemma 4.26 it remains relatively compact in $S$, and by Theorem 4.28 the
relative compactness carries over to $(\xi_n)$.
To prove the converse assertion, let $S$ be separable and complete, and
assume that $(\xi_n)$ is relatively compact. For any $r > 0$ we may cover $S$ by
some open balls $B_1, B_2, \ldots$ of radius $r$. Writing
$G_k = B_1 \cup \cdots \cup B_k$, we claim that
$$\lim_{k \to \infty} \inf\nolimits_n P\{\xi_n \in G_k\} = 1. \tag{3}$$
Indeed, we may otherwise choose some integers $n_k \uparrow \infty$ with
$\sup_k P\{\xi_{n_k} \in G_k\} = c < 1$. By the relative compactness we have
$\xi_{n_k} \xrightarrow{d} \xi$ along a subsequence $N' \subset \mathbb{N}$ for a suitable $\xi$, and so
$$P\{\xi \in G_m\} \le \liminf_{k \in N'} P\{\xi_{n_k} \in G_m\} \le c < 1, \qquad m \in \mathbb{N},$$
which leads as $m \to \infty$ to the absurdity $1 < 1$. Thus, (3) must be true.
Now take $r = m^{-1}$ and write $G_k^m$ for the corresponding sets $G_k$. For any
$\varepsilon > 0$ there exist by (3) some $k_1, k_2, \ldots \in \mathbb{N}$ with
$$\inf\nolimits_n P\{\xi_n \in G_{k_m}^m\} \ge 1 - \varepsilon 2^{-m}, \qquad m \in \mathbb{N}.$$
Writing $A = \bigcap_m G_{k_m}^m$, we get $\inf_n P\{\xi_n \in A\} \ge 1 - \varepsilon$. Also, note
that $\bar A$ is complete and totally bounded, hence compact. Thus, $(\xi_n)$ is
tight. □
In order to apply the last theorem, we need convenient criteria for tight-
ness. Beginning with the space C(K, S), we may convert the classical
Arzela-Ascoli compactness criterion into a condition for tightness. Then
introduce the modulus of continuity
$$w(x, h) = \sup\{\rho(x_s, x_t);\ d(s, t) \le h\}, \qquad x \in C(K, S),\ h > 0.$$
The function w(x, h) is clearly continuous for fixed h > 0 and hence a
measurable function of x.
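For a path sampled on a uniform grid, the modulus of continuity can be approximated directly from its definition. The sketch below assumes K = [0, 1] and S equal to the real line with their usual metrics, and it only takes the supremum over grid points, so it gives a lower bound for the true modulus. The function name and the sampling convention are hypothetical.

```python
def modulus_of_continuity(values, step, h):
    """Grid version of w(x, h) = sup{|x_s - x_t| : |s - t| <= h}.

    values[i] is the path value at time i * step; only pairs of grid points
    at distance at most h are compared.
    """
    lag = int(h / step + 1e-9)  # largest index gap with gap * step <= h
    n = len(values)
    w = 0.0
    for i in range(n):
        for j in range(i + 1, min(n, i + lag + 1)):
            w = max(w, abs(values[i] - values[j]))
    return w


if __name__ == "__main__":
    # Example path x_t = t^2 on [0, 1], for which w(x, h) = 2h - h^2.
    path = [(i / 100) ** 2 for i in range(101)]
    for h in (0.05, 0.1, 0.5):
        print(h, modulus_of_continuity(path, 0.01, h))
```

For the quadratic path the grid value coincides with 2h - h^2 whenever h is a multiple of the grid step.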
Theorem 16.5 (tightness in C(K, S), Prohorov) For any metric spaces
$K$ and $S$, where $K$ is compact and $S$ is separable and complete, let
$X, X_1, X_2, \ldots$ be random elements in $C(K, S)$. Then $X_n \xrightarrow{d} X$ iff
$X_n \xrightarrow{fd} X$ and
$$\lim_{h \to 0} \limsup_{n \to \infty} E[w(X_n, h) \wedge 1] = 0. \tag{4}$$
Proof: Since $C(K, S)$ is separable and complete, Theorem 16.3 shows that
tightness and relative compactness are equivalent for $(X_n)$. By Lemma
16.2 it is then enough to show that, under the condition $X_n \xrightarrow{fd} X$, the
tightness of $(X_n)$ is equivalent to (4).
First let $(X_n)$ be tight. For any $\varepsilon > 0$ we may then choose a compact set
$B \subset C(K, S)$ such that $\limsup_n P\{X_n \in B^c\} < \varepsilon$. By the Arzela-Ascoli
Theorem A2.1 we may next choose $h > 0$ so small that $w(x, h) \le \varepsilon$ for all
$x \in B$. But then $\limsup_n P\{w(X_n, h) > \varepsilon\} < \varepsilon$, and (4) follows since
$\varepsilon$ was arbitrary.
Next assume that (4) holds and $X_n \xrightarrow{fd} X$. Since each $X_n$ is continuous,
$w(X_n, h) \to 0$ a.s. as $h \to 0$ for fixed $n$, so the "lim sup" in (4) may be
replaced by "sup." For any $\varepsilon > 0$ we may then choose
$h_1, h_2, \ldots > 0$ so small that
$$\sup\nolimits_n P\{w(X_n, h_k) > 2^{-k}\} \le 2^{-k-1} \varepsilon, \qquad k \in \mathbb{N}. \tag{5}$$
Letting $t_1, t_2, \ldots$ be dense in $K$, we may further choose some compact
sets $C_1, C_2, \ldots \subset S$ such that
$$\sup\nolimits_n P\{X_n(t_k) \notin C_k\} \le 2^{-k-1} \varepsilon, \qquad k \in \mathbb{N}. \tag{6}$$
Now define
$$B = \bigcap\nolimits_k \{x \in C(K, S);\ x(t_k) \in C_k,\ w(x, h_k) \le 2^{-k}\}.$$
Then $B$ is compact by the Arzela-Ascoli Theorem A2.1, and from (5) and
(6) we get $\sup_n P\{X_n \in B^c\} \le \varepsilon$. Thus, $(X_n)$ is tight. □
One often needs to replace the compact parameter space $K$ by some more
general index set $T$. Here we assume $T$ to be locally compact,
second-countable, and Hausdorff (abbreviated as lcscH) and endow the space
$C(T, S)$ of continuous functions from $T$ to $S$ with the topology of uniform
convergence on compacts. As before, the Borel $\sigma$-field in $C(T, S)$ is
generated by the evaluation maps $\pi_t$, and so the random elements in
$C(T, S)$ are precisely the continuous processes on $T$ taking values in $S$.
The following result characterizes convergence in distribution of such
processes.
Proposition 16.6 (locally compact parameter space) Let $X, X^1, X^2, \ldots$
be random elements in $C(T, S)$, where $S$ is a metric space and $T$ is lcscH.
Then $X^n \xrightarrow{d} X$ iff convergence holds for the restrictions to any compact
subset $K \subset T$.
Proof: The necessity is obvious from Theorem 4.27, since the restriction
map $\pi_K : C(T, S) \to C(K, S)$ is continuous for any compact set $K \subset T$.
To prove the sufficiency, we may choose some compact sets
$K_1 \subset K_2 \subset \cdots \subset T$ with $K_j^\circ \uparrow T$, and let $X_i, X_i^1, X_i^2, \ldots$ denote
the restrictions of the processes $X, X^1, X^2, \ldots$ to $K_i$. By hypothesis we
have $X_i^n \xrightarrow{d} X_i$ for every $i$, and so Theorem 4.29 yields
$(X_1^n, X_2^n, \ldots) \xrightarrow{d} (X_1, X_2, \ldots)$. Now $\pi = (\pi_{K_1}, \pi_{K_2}, \ldots)$ is a
homeomorphism from $C(T, S)$ onto its range in $\times_j C(K_j, S)$, and so
$X^n \xrightarrow{d} X$ by Lemma 4.26 and Theorem 4.27. □
For a simple illustration, we may prove a version of Donsker's Theorem
14.9. Since Theorem 16.5 applies only to processes with continuous paths,
we need to replace the original step processes by their linearly interpolated
versions
$$X^n_t = n^{-1/2}\Big\{\sum_{k \le nt} \xi_k + (nt - [nt])\,\xi_{[nt]+1}\Big\}, \qquad t \ge 0,\ n \in \mathbb{N}. \tag{7}$$
Corollary 16.7 (functional central limit theorem, Donsker) Let $\xi_1, \xi_2, \ldots$
be i.i.d. random variables with mean 0 and variance 1, define $X^1, X^2, \ldots$
by (7), and let $B$ denote a Brownian motion on $\mathbb{R}_+$. Then $X^n \xrightarrow{d} B$ in
$C(\mathbb{R}_+)$.
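To make the interpolation in (7) concrete, the following Python sketch evaluates the rescaled, linearly interpolated walk at a given time. The function name `interpolated_walk` and the restriction to $t \in [0, 1]$ (only $n$ steps are supplied) are assumptions of this illustration and do not come from the text.

```python
import math
import random


def interpolated_walk(steps, t):
    """Value at time t in [0, 1] of the interpolated walk in (7).

    `steps` holds xi_1, ..., xi_n with n = len(steps).
    """
    n = len(steps)
    u = n * t
    k = min(int(math.floor(u)), n)       # [nt], capped at n
    partial = sum(steps[:k])             # sum of xi_k over k <= nt
    frac = u - k                         # nt - [nt]
    nxt = steps[k] if k < n else 0.0     # xi_{[nt]+1}, when it is available
    return (partial + frac * nxt) / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(0)
    xi = [rng.choice((-1.0, 1.0)) for _ in range(1000)]  # mean 0, variance 1
    path = [interpolated_walk(xi, j / 100) for j in range(101)]
    print(path[:5])
```

The path is continuous by construction, which is what allows Theorem 16.5 to be applied.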
The following simple estimate may be used to verify the tightness.
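Lemma 16.8 (maximum inequality, Ottaviani) Let $\xi_1, \xi_2, \ldots$ be i.i.d.
random variables with mean 0 and variance 1, and put $S_n = \sum_{j \le n} \xi_j$
and $S_n^* = \max_{k \le n} |S_k|$.
Then
$$P\{S_n^* > 2r\sqrt{n}\} \le \frac{P\{|S_n| > r\sqrt{n}\}}{1 - r^{-2}}, \qquad r > 1,\ n \in \mathbb{N}.$$
Proof: Put $c = r\sqrt{n}$, and define $\tau = \inf\{k \in \mathbb{N};\ |S_k| > 2c\}$. By the strong
Markov property at $\tau$ and Theorem 6.4,
$$P\{|S_n| > c\} \ge P\{|S_n| > c,\ S_n^* > 2c\} \ge P\{\tau \le n,\ |S_n - S_\tau| \le c\} \ge P\{S_n^* > 2c\}\,\min_{k \le n} P\{|S_k| \le c\},$$
and by Chebyshev's inequality,
$$\min_{k \le n} P\{|S_k| \le c\} \ge \min_{k \le n}(1 - kc^{-2}) \ge 1 - nc^{-2} = 1 - r^{-2}. \qquad \Box$$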
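As a rough numerical sanity check of Lemma 16.8, the sketch below estimates both sides of the inequality by Monte Carlo for a simple symmetric walk with steps $\pm 1$. The helper name `ottaviani_estimates`, the choice of step distribution, and the sample sizes are assumptions made for this illustration; the estimates are subject to sampling error and prove nothing.

```python
import math
import random


def ottaviani_estimates(n, r, trials, seed=0):
    """Monte Carlo estimates of P{S_n^* > 2c} and P{|S_n| > c} / (1 - r^-2),
    where c = r * sqrt(n) and the steps are i.i.d. +-1 (mean 0, variance 1)."""
    rng = random.Random(seed)
    c = r * math.sqrt(n)
    hits_max = 0   # number of runs with max_k |S_k| > 2c
    hits_end = 0   # number of runs with |S_n| > c
    for _ in range(trials):
        s, running_max = 0, 0
        for _ in range(n):
            s += rng.choice((-1, 1))
            running_max = max(running_max, abs(s))
        hits_max += running_max > 2 * c
        hits_end += abs(s) > c
    lhs = hits_max / trials
    rhs = (hits_end / trials) / (1.0 - r ** -2)
    return lhs, rhs


if __name__ == "__main__":
    print(ottaviani_estimates(n=100, r=1.5, trials=2000))
```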
Proof of Corollary 16.7: By Proposition 16.6 it is enough to prove the
convergence on $[0, 1]$. Clearly, $X^n \xrightarrow{fd} X$ by Proposition 5.9 and Corollary
5.5. Combining the former result with Lemma 16.8, we further get the
rough estimate
$$\lim_{r \to \infty} r^2 \limsup_{n \to \infty} P\{S_n^* > r\sqrt{n}\} = 0,$$
which implies
$$\lim_{h \to 0} h^{-1} \limsup_{n \to \infty} \sup_t P\Big\{\sup_{0 \le r \le h} |X^n_{t+r} - X^n_t| > \varepsilon\Big\} = 0.$$
Now (4) follows easily, as we divide $[0, 1]$ into subintervals of length $\le h$. $\Box$
Next we show how the Kolmogorov-Chentsov criterion in Theorem 3.23
may be converted into a sufficient condition for tightness in $C(\mathbb{R}^d, S)$. An
important application appears in Theorem 21.9.
Corollary 16.9 (moments and tightness) Let $X^1, X^2, \ldots$ be continuous
processes on $\mathbb{R}^d$ with values in a separable, complete metric space $(S, \rho)$.
Assume that $(X^n_0)$ is tight in $S$ and that for suitable constants $a, b > 0$,
$$E\{\rho(X^n_s, X^n_t)\}^a \lesssim |s - t|^{d+b}, \qquad s, t \in \mathbb{R}^d,\ n \in \mathbb{N}, \tag{8}$$
uniformly in $n$. Then $(X^n)$ is tight in $C(\mathbb{R}^d, S)$, and for any $c \in (0, b/a)$
the limiting processes are a.s. locally Hölder continuous with exponent $c$.
Proof: For each process $X^n$ we define the associated quantities $\xi_{nk}$ as in
the proof of Theorem 3.23, and we get $E\xi_{nk}^a \lesssim 2^{-kb}$. Hence, Lemma 1.29
yields for $m, n \in \mathbb{N}$
$$\|w(X^n, 2^{-m})\|_a^{a \wedge 1} \lesssim \sum_{k \ge m} \|\xi_{nk}\|_a^{a \wedge 1} \lesssim \sum_{k \ge m} 2^{-kb/(a \vee 1)} \lesssim 2^{-mb/(a \vee 1)},$$
which implies (4). Condition (8) extends by Lemma 4.11 to any limiting
process $X$, and the last assertion then follows by Theorem 3.23. $\Box$
Let us now fix a separable, complete metric space $S$, and consider random
processes with paths in $D(\mathbb{R}_+, S)$, the space of rcll functions $f : \mathbb{R}_+ \to S$.
We endow $D(\mathbb{R}_+, S)$ with the Skorohod $J_1$-topology, whose basic properties
are summarized in Appendix A2. Note in particular that the path space
is again Polish and that compactness may be characterized in terms of a
modified modulus of continuity $\tilde{w}$, as defined in Theorem A2.2.
The following result gives a criterion for weak convergence in $D(\mathbb{R}_+, S)$,
similar to Theorem 16.5 for $C(K, S)$.
Theorem 16.10 (tightness in $D(\mathbb{R}_+, S)$, Skorohod, Prohorov) For any
separable, complete metric space $S$, let $X, X^1, X^2, \ldots$ be random elements
in $D(\mathbb{R}_+, S)$. Then $X^n \xrightarrow{d} X$ iff $X^n \xrightarrow{fd} X$ on some dense subset of
$T = \{t \ge 0;\ \Delta X_t = 0 \text{ a.s.}\}$ and
$$\lim_{h \to 0} \limsup_{n \to \infty} E[\tilde{w}(X^n, t, h) \wedge 1] = 0, \qquad t > 0. \tag{9}$$
Proof: Since $\pi_t$ is continuous at every path $x \in D(\mathbb{R}_+, S)$ with $\Delta x_t = 0$,
$X^n \xrightarrow{d} X$ implies $X^n \xrightarrow{fd} X$ on $T$ by Theorem 4.27. Now use Theorem A2.2
and proceed as in the proof of Theorem 16.5. $\Box$
Tightness in $D(\mathbb{R}_+, S)$ is often verified most easily by means of the following
sufficient condition. Given a process $X$, we say that a random time
is $X$-optional if it is optional with respect to the filtration induced by $X$.
Theorem 16.11 (optional equicontinuity and tightness, Aldous) For any
metric space $(S, \rho)$, let $X^1, X^2, \ldots$ be random elements in $D(\mathbb{R}_+, S)$. Then
(9) holds if for any bounded sequence of $X^n$-optional times $\tau_n$ and any
positive constants $h_n \to 0$,
$$\rho(X^n_{\tau_n}, X^n_{\tau_n + h_n}) \xrightarrow{P} 0, \qquad n \to \infty. \tag{10}$$
The proof will be based on two lemmas, where the first one is a
restatement of condition (10).
Lemma 16.12 The condition in Theorem 16.11 is equivalent to
$$\lim_{h \to 0} \limsup_{n \to \infty} \sup_{\sigma, \tau} E[\rho(X^n_\sigma, X^n_\tau) \wedge 1] = 0, \qquad t > 0, \tag{11}$$
where the supremum extends over all $X^n$-optional times $\sigma, \tau \le t$ with
$\sigma \le \tau \le \sigma + h$.
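Proof: Replacing $\rho$ by $\rho \wedge 1$ if necessary, we may assume that $\rho \le 1$. The
condition in Theorem 16.11 is then equivalent to
$$\lim_{\delta \to 0} \limsup_{n \to \infty} \sup_{\tau \le t} \sup_{h \in [0, \delta]} E\rho(X^n_\tau, X^n_{\tau + h}) = 0, \qquad t > 0,$$
where the first supremum extends over all $X^n$-optional times $\tau \le t$. To
deduce (11), assume that $0 \le \tau - \sigma \le \delta$. Then $[\tau, \tau + \delta] \subset [\sigma, \sigma + 2\delta]$, and
so by the triangle inequality and a simple substitution (writing $X$ for $X^n$),
$$\delta\rho(X_\sigma, X_\tau) \le \int_0^\delta \{\rho(X_\sigma, X_{\tau+h}) + \rho(X_\tau, X_{\tau+h})\}\,dh \le \int_0^{2\delta} \rho(X_\sigma, X_{\sigma+h})\,dh + \int_0^\delta \rho(X_\tau, X_{\tau+h})\,dh.$$
Thus,
$$\sup_{\sigma, \tau} E\rho(X_\sigma, X_\tau) \le 3 \sup_\tau \sup_{h \in [0, 2\delta]} E\rho(X_\tau, X_{\tau+h}),$$
where the suprema extend over all optional times $\tau \le t$ and $\sigma \in [\tau - \delta, \tau]$. $\Box$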
We also need the following elementary estimate.
Lemma 16.13 Let $\xi_1, \ldots, \xi_n \ge 0$ be random variables with sum $S_n$. Then
$$Ee^{-S_n} \le e^{-nc} + \max_{k \le n} P\{\xi_k < c\}, \qquad c > 0.$$
Proof: Let $p$ denote the maximum on the right. By the Hölder and
Chebyshev inequalities we get
$$Ee^{-S_n} = E\prod_k e^{-\xi_k} \le \prod_k (Ee^{-n\xi_k})^{1/n} \le \{(e^{-nc} + p)^{1/n}\}^n = e^{-nc} + p. \qquad \Box$$
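Since Lemma 16.13 makes no independence assumption, it can be checked exactly on any small finite probability space. The sketch below does this for a uniform distribution on a list of outcomes; the function name `lemma_16_13_sides` and the sample outcomes are hypothetical and serve only as an illustration.

```python
import math


def lemma_16_13_sides(outcomes, c):
    """Return (E exp(-S_n), exp(-n c) + max_k P{xi_k < c}) when the vector
    (xi_1, ..., xi_n) is uniformly distributed over `outcomes`, a list of
    equally long tuples of nonnegative numbers."""
    m = len(outcomes)
    n = len(outcomes[0])
    lhs = sum(math.exp(-sum(w)) for w in outcomes) / m
    p = max(sum(1 for w in outcomes if w[k] < c) / m for k in range(n))
    return lhs, math.exp(-n * c) + p


if __name__ == "__main__":
    # Four equally likely outcomes of a dependent triple (xi_1, xi_2, xi_3).
    sample = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0), (2.0, 0.5, 3.0), (0.2, 4.0, 1.0)]
    for c in (0.1, 0.5, 1.0, 2.0):
        print(c, lemma_16_13_sides(sample, c))
```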
Proof of Theorem 16.11: Again we may assume that $\rho \le 1$, and by
suitable approximation we may extend condition (11) to weakly optional
times $\sigma$ and $\tau$. For each $n \in \mathbb{N}$ and $\varepsilon > 0$, we recursively define the weakly
$X^n$-optional times
$$\sigma^n_{k+1} = \inf\{s > \sigma^n_k;\ \rho(X^n_{\sigma^n_k}, X^n_s) > \varepsilon\}, \qquad k \in \mathbb{Z}_+,$$
starting with $\sigma^n_0 = 0$. Note that for $m \in \mathbb{N}$ and $t, h > 0$,
$$\tilde{w}(X^n, t, h) \le 2\varepsilon + \sum_{k < m} 1\{\sigma^n_{k+1} - \sigma^n_k < h,\ \sigma^n_k < t\} + 1\{\sigma^n_m < t\}. \tag{12}$$
Now let $\nu_n(t, h)$ denote the supremum in (11). By Chebyshev's inequality
and a simple truncation,
$$P\{\sigma^n_{k+1} - \sigma^n_k < h,\ \sigma^n_k < t\} \le \varepsilon^{-1}\nu_n(t + h, h), \qquad k \in \mathbb{N},\ t, h > 0, \tag{13}$$
and so by (11) and (12),
$$\lim_{h \to 0} \limsup_{n \to \infty} E\tilde{w}(X^n, t, h) \le 2\varepsilon + \limsup_{n \to \infty} P\{\sigma^n_m < t\}. \tag{14}$$
Next we conclude from (13) and Lemma 16.13 that, for any $c > 0$,
$$P\{\sigma^n_m < t\} \le e^t E[e^{-\sigma^n_m};\ \sigma^n_m < t] \le e^t\{e^{-mc} + \varepsilon^{-1}\nu_n(t + c, c)\}.$$
By (11) the right-hand side tends to 0 as $m, n \to \infty$ and then $c \to 0$.
Hence, the last term in (14) tends to 0 as $m \to \infty$, and (9) follows since $\varepsilon$
is arbitrary. $\Box$
We may illustrate the use of Theorem 16.11 by proving an extension of
Corollary 16.7. A more precise result is obtained by different methods in
Corollary 15.20. An extension to Markov chains appears in Theorem 19.28.
Theorem 16.14 (approximation of random walks, Skorohod) Let $S^1, S^2, \ldots$
be random walks in $\mathbb{R}^d$ such that $S^n_{m_n} \xrightarrow{d} X_1$ for some Lévy process
$X$ and some integers $m_n \to \infty$. Then the processes $X^n_t = S^n_{[m_n t]}$ satisfy
$X^n \xrightarrow{d} X$ in $D(\mathbb{R}_+, \mathbb{R}^d)$.
Proof: By Corollary 15.16 we have $X^n \xrightarrow{fd} X$, and so by Theorem 16.11
it is enough to show that $|X^n_{\tau_n + h_n} - X^n_{\tau_n}| \xrightarrow{P} 0$ for any finite optional
times $\tau_n$ and constants $h_n \to 0$. By the strong Markov property of $S^n$, or
alternatively by Theorem 11.13, we may reduce to the case when $\tau_n = 0$
for all $n$. Thus, it suffices to show that $X^n_{h_n} \xrightarrow{P} 0$ as $h_n \to 0$, which again
may be seen from Corollary 15.16. $\Box$
For the remainder of this chapter, we assume that $S$ is lcscH with Borel
$\sigma$-field $\mathcal{S}$. Write $\hat{\mathcal{S}}$ for the class of relatively compact sets in $\mathcal{S}$. Let $\mathcal{M}(S)$
denote the space of locally finite measures on $S$, endowed with the vague
topology induced by the mappings $\pi_f : \mu \mapsto \mu f = \int f\,d\mu$, $f \in C_K^+$. The
basic properties of this topology are summarized in Theorem A2.3. Note
in particular that $\mathcal{M}(S)$ is Polish and that the random elements in $\mathcal{M}(S)$
are precisely the random measures on $S$. Similarly, the point processes on
$S$ are random elements in the vaguely closed subspace $\mathcal{N}(S)$, consisting of
all integer-valued measures in $\mathcal{M}(S)$.
We begin with the basic tightness criterion.
Lemma 16.15 (tightness of random measures, Prohorov) Let $\xi_1, \xi_2, \ldots$
be random measures on some lcscH space $S$. Then the sequence $(\xi_n)$ is
relatively compact in distribution iff $(\xi_n B)$ is tight in $\mathbb{R}_+$ for every $B \in \hat{\mathcal{S}}$.
Proof: By Theorems 16.3 and A2.3 the notions of relative compactness
and tightness are equivalent for $(\xi_n)$. If $(\xi_n)$ is tight, then so is $(\xi_n f)$ for
every $f \in C_K^+$ by Lemma 16.4, and hence $(\xi_n B)$ is tight for all $B \in \hat{\mathcal{S}}$.
Conversely, assume the latter condition. Choose an open cover $G_1, G_2, \ldots \in \hat{\mathcal{S}}$
of $S$, fix any $\varepsilon > 0$, and let $r_1, r_2, \ldots > 0$ be large enough that
$$\sup_n P\{\xi_n G_k > r_k\} < \varepsilon 2^{-k}, \qquad k \in \mathbb{N}. \tag{15}$$
Then the set $A = \bigcap_k \{\mu;\ \mu G_k \le r_k\}$ is relatively compact by Theorem A2.3
(ii), and (15) yields $\inf_n P\{\xi_n \in A\} > 1 - \varepsilon$. Thus, $(\xi_n)$ is tight. $\Box$
We may now derive some general convergence criteria for random measures,
corresponding to the uniqueness results in Lemma 12.1 and Theorem
12.8. Define $\hat{\mathcal{S}}_\xi = \{B \in \hat{\mathcal{S}};\ \xi\partial B = 0 \text{ a.s.}\}$.
Theorem 16.16 (convergence of random measures) Let $\xi, \xi_1, \xi_2, \ldots$ be
random measures on an lcscH space $S$. Then these conditions are
equivalent:
(i) $\xi_n \xrightarrow{d} \xi$;
(ii) $\xi_n f \xrightarrow{d} \xi f$ for all $f \in C_K^+$;
(iii) $(\xi_n B_1, \ldots, \xi_n B_k) \xrightarrow{d} (\xi B_1, \ldots, \xi B_k)$ for all $B_1, \ldots, B_k \in \hat{\mathcal{S}}_\xi$, $k \in \mathbb{N}$.
If $\xi$ is a simple point process or a diffuse random measure, it is also
equivalent that
(iv) $\xi_n B \xrightarrow{d} \xi B$ for all $B \in \hat{\mathcal{S}}_\xi$.
Proof: By Theorems 4.27 and A2.3 (iii), condition (i) implies both (ii)
and (iii). Conversely, Lemma 16.15 shows that $(\xi_n)$ is relatively compact
in distribution under both (ii) and (iii). Arguing as in the proof of Lemma
16.2, it remains to show for any random measures $\xi$ and $\eta$ on $S$ that $\xi \overset{d}{=} \eta$
if $\xi f \overset{d}{=} \eta f$ for all $f \in C_K^+$, or if
$$(\xi B_1, \ldots, \xi B_k) \overset{d}{=} (\eta B_1, \ldots, \eta B_k), \qquad B_1, \ldots, B_k \in \hat{\mathcal{S}}_{\xi + \eta},\ k \in \mathbb{N}. \tag{16}$$
In the former case, this holds by Lemma 12.1; in the latter case it follows by
a monotone class argument from Theorem A2.3 (iv). The last assertion is
obtained in a similar way from a suitable version of Theorem 12.8 (iii). $\Box$
Weaker conditions are required for convergence to a simple point process,
as suggested by Theorem 12.8. The following conditions are only sufficient,
and a precise criterion is given in Theorem 16.29.
Here a class $\mathcal{U} \subset \hat{\mathcal{S}}$ is said to be separating if, for any compact and
open sets $K$ and $G$ with $K \subset G$, there exists some $U \in \mathcal{U}$ with
$K \subset U \subset G$. Furthermore, we say that $\mathcal{I} \subset \hat{\mathcal{S}}$ is preseparating if the finite
unions of sets in $\mathcal{I}$ form a separating class. Applying Lemma A2.6 to the
function $h(B) = Ee^{-\xi B}$, we note that the class $\hat{\mathcal{S}}_\xi$ is separating for any
random measure $\xi$. For Euclidean spaces $S$, a preseparating class typically
consists of rectangular boxes, whereas the corresponding finite unions form
a separating class.
Proposition 16.17 (convergence of point processes) Let $\xi, \xi_1, \xi_2, \ldots$ be
point processes on an lcscH space $S$, where $\xi$ is simple, and fix a separating
class $\mathcal{U} \subset \hat{\mathcal{S}}_\xi$. Then $\xi_n \xrightarrow{d} \xi$ under these conditions:
(i) $P\{\xi_n U = 0\} \to P\{\xi U = 0\}$ for all $U \in \mathcal{U}$;
(ii) $\limsup_n E\xi_n K \le E\xi K < \infty$ for all compact sets $K \subset S$.
Proof: First note that both (i) and (ii) extend by suitable approximation
to sets in $\hat{\mathcal{S}}_\xi$. By the usual compactness argument together with Lemma
4.11, it is enough to prove that a point process $\eta$ is distributed as $\xi$ whenever
$$P\{\eta B = 0\} = P\{\xi B = 0\}, \qquad E\eta B \le E\xi B, \qquad B \in \hat{\mathcal{S}}_{\xi + \eta}.$$
Here the first relation yields $\eta^* \overset{d}{=} \xi$ as in Theorem 12.8 (i). From the second
relation we then obtain $E\eta B \le E\eta^* B$ for all $B \in \hat{\mathcal{S}}$, which shows that $\eta$
is a.s. simple. $\Box$
We may illustrate the use of Theorem 16.16 by showing how Poisson
and Cox processes may arise as limits under superposition or thinning.
Say that the random measures $\xi_{nj}$, $n, j \in \mathbb{N}$, form a null array if they
are independent for fixed $n$ and such that, for every $B \in \hat{\mathcal{S}}$, the random
variables $\xi_{nj}B$ form a null array in the sense of Chapter 5. The following
result is a point process version of Theorem 5.7.
Theorem 16.18 (convergence of superpositions, Grigelionis) Let $(\xi_{nj})$ be
a null array of point processes on an lcscH space $S$, and consider a Poisson
process $\xi$ on $S$ with $E\xi = \mu$. Then $\sum_j \xi_{nj} \xrightarrow{d} \xi$ iff these conditions hold:
(i) $\sum_j P\{\xi_{nj}B > 0\} \to \mu B$ for all $B \in \hat{\mathcal{S}}_\mu$;
(ii) $\sum_j P\{\xi_{nj}B > 1\} \to 0$ for all $B \in \hat{\mathcal{S}}$.
Proof: If $\sum_j \xi_{nj} \xrightarrow{d} \xi$, then $\sum_j \xi_{nj}B \xrightarrow{d} \xi B$ for all $B \in \hat{\mathcal{S}}_\mu$ by Theorem
16.16, which implies (i) and (ii) by Theorem 5.7. Conversely, assume (i)
and (ii). To prove that $\sum_j \xi_{nj} \xrightarrow{d} \xi$, we may restrict our attention to an
arbitrary compact set $C \in \hat{\mathcal{S}}_\mu$. For notational convenience, we may also
assume that $S$ itself is compact. Now define $\eta_{nj} = \xi_{nj}1\{\xi_{nj}S \le 1\}$, and
note that (i) and (ii) remain true for the array $(\eta_{nj})$. Moreover, $\sum_j \eta_{nj} \xrightarrow{d} \xi$
implies $\sum_j \xi_{nj} \xrightarrow{d} \xi$ by Theorem 4.28. This reduces the discussion to the
case when $\xi_{nj}S \le 1$ for all $n$ and $j$.
Now define $\mu_{nj} = E\xi_{nj}$. By (i) we get
$$\sum_j \mu_{nj}B = \sum_j E\xi_{nj}B = \sum_j P\{\xi_{nj}B > 0\} \to \mu B, \qquad B \in \hat{\mathcal{S}}_\mu,$$
and so $\sum_j \mu_{nj} \xrightarrow{w} \mu$ by Theorem 4.25. Noting that $m(1 - e^{-f}) = 1 - e^{-mf}$
when $m = \delta_x$ or 0 and writing $\xi_n = \sum_j \xi_{nj}$, we get by Lemmas 5.8 and
12.2 (i)
$$Ee^{-\xi_n f} = \prod_j Ee^{-\xi_{nj}f} = \prod_j E\{1 - \xi_{nj}(1 - e^{-f})\} = \prod_j \{1 - \mu_{nj}(1 - e^{-f})\} \sim \exp\Big\{-\sum_j \mu_{nj}(1 - e^{-f})\Big\} \to \exp(-\mu(1 - e^{-f})) = Ee^{-\xi f}. \qquad \Box$$
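The mechanism of Theorem 16.18 can be seen in the simplest possible null array. In the sketch below, which is an assumed toy model and not taken from the text, each of $n$ independent point processes on $[0, 1]$ places a single uniformly distributed point with probability $\lambda / n$ and is empty otherwise. Condition (i) then holds with $\mu = \lambda \cdot$ Lebesgue measure, condition (ii) holds trivially, and the void probabilities of the superposition approach those of the Poisson limit. All function names are hypothetical.

```python
import math


def condition_i_sum(n, lam, length):
    """sum_j P{xi_nj B > 0} for a set B of Lebesgue measure `length`
    (requires lam <= n so that lam / n is a probability)."""
    return n * (lam / n * length)


def superposition_void_probability(n, lam, length):
    """P{sum_j xi_nj B = 0}: none of the n independent processes hits B."""
    return (1.0 - lam * length / n) ** n


def poisson_void_probability(lam, length):
    """P{xi B = 0} for a Poisson process with intensity lam * Lebesgue."""
    return math.exp(-lam * length)


if __name__ == "__main__":
    for n in (10, 1000, 10 ** 6):
        print(n, superposition_void_probability(n, 2.0, 0.3),
              poisson_void_probability(2.0, 0.3))
```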
We may next establish a basic limit theorem for independent thinnings
of point processes.
Theorem 16.19 (convergence of thinnings) For every $n \in \mathbb{N}$, let $\xi_n$ be a
$p_n$-thinning of some point process $\eta_n$ on $S$, where $S$ is lcscH and $p_n \to 0$.
Then $\xi_n \xrightarrow{d}$ some $\xi$ iff $p_n\eta_n \xrightarrow{d}$ some $\eta$, in which case $\xi$ is distributed as a
Cox process directed by $\eta$.
Proof: For any $f \in C_K^+$, we get by Lemma 12.2
$$Ee^{-\xi_n f} = E\exp(\eta_n \log\{1 - p_n(1 - e^{-f})\}).$$
Noting that $px \le -\log(1 - px) \le -x\log(1 - p)$ for any $p, x \in [0, 1)$ and
writing $p_n' = -\log(1 - p_n)$, we obtain
$$E\exp\{-p_n'\eta_n(1 - e^{-f})\} \le Ee^{-\xi_n f} \le E\exp\{-p_n\eta_n(1 - e^{-f})\}. \tag{17}$$
If $p_n\eta_n \xrightarrow{d} \eta$, then even $p_n'\eta_n \xrightarrow{d} \eta$, and so by Lemma 12.2
$$Ee^{-\xi_n f} \to E\exp\{-\eta(1 - e^{-f})\} = Ee^{-\xi f},$$
where $\xi$ is a Cox process directed by $\eta$. Hence, $\xi_n \xrightarrow{d} \xi$.
Conversely, assume that $\xi_n \xrightarrow{d} \xi$. Fix any $g \in C_K^+$ and let $0 < t < \|g\|^{-1}$.
Applying (17) with $f = -\log(1 - tg)$, we get
$$\liminf_{n \to \infty} E\exp\{-tp_n\eta_n g\} \ge E\exp\{\xi\log(1 - tg)\}.$$
Here the right-hand side tends to 1 as $t \to 0$, and so by Lemmas 5.2 and
16.15 the sequence $(p_n\eta_n)$ is tight. For any subsequence $N' \subset \mathbb{N}$, we may
then choose a further subsequence $N''$ such that $p_n\eta_n \xrightarrow{d}$ some $\eta$ along $N''$.
By the direct assertion, $\xi$ is then distributed as a Cox process directed by
$\eta$, which by Lemma 12.6 determines the distribution of $\eta$. Hence, $p_n\eta_n \xrightarrow{d} \eta$
remains true along the original sequence. $\Box$
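The two-sided bound (17) can be verified directly when $\eta_n$ is a nonrandom counting measure with finitely many atoms, since the Laplace functional of a $p$-thinning then factorizes over the atoms. The sketch below computes the three terms of (17) in that special case; the function name `thinning_bounds` and the representation of $f$ by its values at the atoms are assumptions of this illustration.

```python
import math


def thinning_bounds(f_values, p):
    """Return (lower, E exp(-xi f), upper) as in (17), where xi is a p-thinning
    of a nonrandom counting measure eta and `f_values` lists the values of
    f >= 0 at the atoms of eta."""
    x = [1.0 - math.exp(-v) for v in f_values]    # 1 - e^{-f} at each atom
    # Each atom is kept independently with probability p.
    laplace = math.prod(1.0 - p * v for v in x)
    p_prime = -math.log(1.0 - p)                  # p' = -log(1 - p)
    lower = math.exp(-p_prime * sum(x))
    upper = math.exp(-p * sum(x))
    return lower, laplace, upper


if __name__ == "__main__":
    print(thinning_bounds([0.5, 1.0, 2.0, 0.1] * 5, 0.1))
    # Small p with many atoms: the two bounds nearly coincide.
    print(thinning_bounds([1.0] * 10000, 1e-4))
```

As $p \to 0$ the gap between $p$ and $p'$ vanishes, which is the step that turns $p_n\eta_n \xrightarrow{d} \eta$ into convergence of the Laplace functionals.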
The last result leads in particular to an interesting characterization of
Cox processes.
Corollary 16.20 (Cox processes and thinnings, Mecke) Let $\xi$ be a point
process on $S$. Then $\xi$ is Cox iff for every $p \in (0, 1)$ there exists a point
process $\xi_p$ such that $\xi$ is distributed as a $p$-thinning of $\xi_p$.
Proof: If $\xi$ and $\xi_p$ are Cox processes directed by $\eta$ and $\eta/p$, respectively,
then Proposition 12.3 shows that $\xi$ is distributed as a $p$-thinning of $\xi_p$.
Conversely, assuming the stated condition for every $p \in (0, 1)$, we note
that $\xi$ is Cox by Theorem 16.19. $\Box$
The previous theory will now be used to derive a general limit theorem
for sums of exchangeable random variables. The result applies in particu-
lar to sequences obtained by sampling without replacement from a finite
population. It is also general enough to contain a version of Donsker's theorem.
The appropriate function space in this case is $D([0, 1], \mathbb{R}) = D[0, 1]$,
to which the results for $D(\mathbb{R}_+)$ apply with obvious modifications.
For motivation, we begin with a description of the possible limits, which
are precisely the exchangeable processes on $[0, 1]$. Here we say that a process
$X$ on $[0, 1]$ is exchangeable if it is continuous in probability with $X_0 = 0$
and has exchangeable increments over any set of disjoint intervals of equal
length. The following result is a finite-interval version of Theorem 11.15.
Theorem 16.21 (exchangeable processes on $[0, 1]$) A process $X$ on $[0, 1]$
is exchangeable iff it has a version with representation
$$X_t = \alpha t + \sigma B_t + \sum_j \beta_j(1\{\tau_j \le t\} - t), \qquad t \in [0, 1], \tag{18}$$
for some Brownian bridge $B$, some independent i.i.d. $U(0, 1)$ random
variables $\tau_1, \tau_2, \ldots$, and some independent set of coefficients $\alpha$, $\sigma$, and
$\beta_1, \beta_2, \ldots$ such that $\sum_j \beta_j^2 < \infty$ a.s. In that case, the sum in (18) converges
in probability, uniformly on $[0, 1]$, toward an rcll limit.
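A path of the form (18) with finitely many jump coefficients is easy to sample on a grid, which may help in visualizing the three components (drift, Brownian bridge, and centered jumps). The following sketch is an illustration only: the function name `exchangeable_path`, the grid discretization, and the construction of the bridge as $B_t = W_t - tW_1$ from a discretized Brownian motion $W$ are choices made here, and the coefficients are taken nonrandom.

```python
import math
import random


def exchangeable_path(alpha, sigma, betas, grid_size, seed=0):
    """Sample X_t = alpha*t + sigma*B_t + sum_j beta_j (1{tau_j <= t} - t)
    at the grid points t = i / grid_size, i = 0, ..., grid_size."""
    rng = random.Random(seed)
    dt = 1.0 / grid_size
    # Discretized Brownian motion W; the bridge is B_t = W_t - t * W_1.
    w = [0.0]
    for _ in range(grid_size):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    taus = [rng.random() for _ in betas]   # i.i.d. U(0, 1) jump times
    path = []
    for i in range(grid_size + 1):
        t = i * dt
        bridge = w[i] - t * w[-1]
        jumps = sum(b * ((tau <= t) - t) for b, tau in zip(betas, taus))
        path.append(alpha * t + sigma * bridge + jumps)
    return path


if __name__ == "__main__":
    print(exchangeable_path(0.7, 1.3, [1.0, -0.5, 0.25], 8))
```

Both the bridge and the centered jump terms vanish at $t = 0$ and $t = 1$, so the sampled path starts at 0 and ends at $\alpha$.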
In particular, we note that a simple point process on $[0, 1]$ is symmetric
with respect to Lebesgue measure $\lambda$ iff it is a mixed binomial process based
on $\lambda$, in agreement with Theorem 12.12. Combining the present result with
Theorem 11.15, we also see that a continuous process $X$ on $\mathbb{R}_+$ or $[0, 1]$ with
$X_0 = 0$ is exchangeable iff it can be written in the form $X_t = \alpha t + \sigma B_t$,
where $B$ is a Brownian motion or bridge, respectively, and $(\alpha, \sigma)$ is an
independent pair of random variables.
We first examine the convergence of the series in (18).
Lemma 16.22 (convergence of series) For any $t \in (0, 1)$, the series in
(18) converges a.s. iff $\sum_j \beta_j^2 < \infty$ a.s. In that case, it converges in probability
with respect to the uniform metric on $[0, 1]$, and the sum has a version
in $D[0, 1]$.
Proof: For both assertions, we may assume that the coefficients $\beta_j$ are
nonrandom. Then for fixed $t \in (0, 1)$, the terms are independent and
bounded with mean 0 and variance $\beta_j^2 t(1 - t)$, and so by Theorem 4.18
the series converges iff $\sum_j \beta_j^2 < \infty$.
To prove the second assertion, let $X^n$ denote the $n$th partial sum in (18),
and note that the processes $M^n_t = X^n_t/(1 - t)$ are $L^2$-martingales on $[0, 1)$
with respect to the filtration induced by the processes $1\{\tau_j \le t\}$. By Doob's
inequality we have for any $m < n$ and $t \in [0, 1)$
$$E(X^n - X^m)_t^{*2} \le E(M^n - M^m)_t^{*2} \le 4E(M^n_t - M^m_t)^2 = 4(1 - t)^{-2}E(X^n_t - X^m_t)^2 \le 4t(1 - t)^{-1}\sum_{j > m}\beta_j^2,$$
which tends to 0 as $m \to \infty$ for fixed $t$. Hence, $(X^n - X)_t^* \to 0$ a.s. along
a subsequence for some process $X$, and then also $(X^n - X)_t^* \xrightarrow{P} 0$ along
$\mathbb{N}$. By symmetry we have also $(\tilde{X}^n - \tilde{X})_t^* \xrightarrow{P} 0$ for the reflected processes
$\tilde{X}_t = X_{1-t-}$ and $\tilde{X}^n_t = X^n_{1-t-}$, and so by combination $(X^n - X)_1^* \xrightarrow{P} 0$. The
last assertion now follows from the fact that $X^n$ is rcll for every $n$. $\Box$
We plan to prove Theorem 16.21 together with the following approximation
result. Here we consider for every $n \in \mathbb{N}$ some exchangeable random
variables $\xi_{nj}$, $j \le m_n$, where $m_n \to \infty$, and introduce the summation
processes
$$X^n_t = \sum_{j \le m_n t} \xi_{nj}, \qquad t \in [0, 1],\ n \in \mathbb{N}. \tag{19}$$
Our aim is to show that the $X^n$ can be approximated by exchangeable
processes as in (18). The convergence criteria will be stated in terms of the
random variables and measures
$$\alpha_n = \sum_j \xi_{nj}, \qquad \kappa_n = \sum_j \xi_{nj}^2\,\delta_{\xi_{nj}}, \qquad n \in \mathbb{N}, \tag{20}$$
$$\kappa = \sigma^2\delta_0 + \sum_j \beta_j^2\,\delta_{\beta_j}. \tag{21}$$
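For a finite exchangeable sequence, such as the values obtained by sampling without replacement from a finite population, the quantities in (19) and (20) are simple to compute. The sketch below stores the measure $\kappa_n$ as a dictionary from atom locations to masses; the function names and this representation are assumptions of the illustration. Note that $\alpha_n$ and $\kappa_n$ depend only on the multiset of values and not on their order.

```python
import random
from collections import defaultdict


def alpha_kappa(values):
    """alpha_n = sum_j xi_nj and kappa_n = sum_j xi_nj^2 * delta_{xi_nj},
    with kappa_n returned as a dict {atom location: mass}."""
    alpha = sum(values)
    kappa = defaultdict(float)
    for v in values:
        kappa[v] += v * v
    return alpha, dict(kappa)


def summation_process(values, t):
    """X^n_t = sum of xi_nj over j <= m_n * t, as in (19), for t in [0, 1]."""
    return sum(values[: int(len(values) * t)])


if __name__ == "__main__":
    population = [1, 1, -2, 3]
    sample = population[:]
    random.Random(0).shuffle(sample)   # sampling without replacement
    print(alpha_kappa(sample), summation_process(sample, 0.5))
```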
Theorem 16.23 (approximation of exchangeable sums) For every $n \in \mathbb{N}$,
consider some exchangeable random variables $\xi_{nj}$, $j \le m_n$, and define $X^n$,
$\alpha_n$, and $\kappa_n$ by (19) and (20). Assume $m_n \to \infty$. Then $X^n \xrightarrow{d}$ some $X$ in
$D[0, 1]$ iff $(\alpha_n, \kappa_n) \xrightarrow{d}$ some $(\alpha, \kappa)$ in $\mathbb{R} \times \mathcal{M}(\overline{\mathbb{R}})$, in which case $X$ and $(\alpha, \kappa)$
are related by (18) and (21).
Our proof is based on three auxiliary results. We begin with a simple
randomization lemma, which will enable us to reduce the proof to the case
of non-random coefficients. Recall that if $\nu$ is a measure on $S$ and $\mu$ is a
kernel from $S$ to $T$, then $\nu\mu$ denotes the measure $\int \mu(s, \cdot)\,\nu(ds)$ on $T$. For
any measurable function $f : T \to \mathbb{R}_+$, we define the measurable function
$\mu f$ on $S$ by $\mu f(s) = \int \mu(s, dt)f(t)$.
Lemma 16.24 (randomization) For any metric spaces $S$ and $T$, let $\nu, \nu_1, \nu_2, \ldots$
be probability measures on $S$ with $\nu_n \xrightarrow{w} \nu$, and let $\mu, \mu_1, \mu_2, \ldots$ be
probability kernels from $S$ to $T$ such that $s_n \to s$ in $S$ implies
$\mu_n(s_n, \cdot) \xrightarrow{w} \mu(s, \cdot)$. Then $\nu_n\mu_n \xrightarrow{w} \nu\mu$.
Proof: Fix any bounded, continuous function $f$ on $T$. Then $\mu_n f(s_n) \to \mu f(s)$
as $s_n \to s$, and so by Theorem 4.27
$$(\nu_n\mu_n)f = \nu_n(\mu_n f) \to \nu(\mu f) = (\nu\mu)f. \qquad \Box$$
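To establish tightness of the random measures $\kappa_n$, we need the following
conditional hyper-contractivity criterion.
Lemma 16.25 (hyper-contraction and tightness) Let the random variables
$\xi_1, \xi_2, \ldots \ge 0$ and $\sigma$-fields $\mathcal{F}_1, \mathcal{F}_2, \ldots$ be such that, for some
$a > 0$,
$$E[\xi_n^2 \mid \mathcal{F}_n] \le a(E[\xi_n \mid \mathcal{F}_n])^2 < \infty \text{ a.s.}, \qquad n \in \mathbb{N}.$$
Then if $(\xi_n)$ is tight, so is the sequence $\eta_n = E[\xi_n \mid \mathcal{F}_n]$, $n \in \mathbb{N}$.
Proof: By Lemma 4.9 we need to show that $c_n\eta_n \xrightarrow{P} 0$ whenever
$0 \le c_n \to 0$. Then conclude from Lemma 4.1 that, for any $r \in (0, 1)$ and $\varepsilon > 0$,
$$0 < (1 - r)^2a^{-1} \le P[\xi_n > r\eta_n \mid \mathcal{F}_n] \le P[c_n\xi_n > r\varepsilon \mid \mathcal{F}_n] + 1\{c_n\eta_n < \varepsilon\}.$$
Here the first term on the right tends in probability to 0 since $c_n\xi_n \xrightarrow{P} 0$ by
Lemma 4.9. Hence, $1\{c_n\eta_n < \varepsilon\} \xrightarrow{P} 1$, which means that $P\{c_n\eta_n \ge \varepsilon\} \to 0$.
Since $\varepsilon$ is arbitrary, we get $c_n\eta_n \xrightarrow{P} 0$. $\Box$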
Since the summation processes in (19) will be approximated by exchange-
able processes, as in Theorem 16.21, we finally need a convergence criterion
for the latter. This result also has some independent interest.
Proposition 16.26 (convergence of exchangeable processes) Let the processes
$X^n$ and pairs $(\alpha_n, \kappa_n)$ be related as in (18) and (21). Then $X^n \xrightarrow{d}$
some $X$ in $D[0, 1]$ iff $(\alpha_n, \kappa_n) \xrightarrow{d}$ some $(\alpha, \kappa)$ in $\mathbb{R} \times \mathcal{M}(\overline{\mathbb{R}})$, in which case
even $X$ and $(\alpha, \kappa)$ are related by (18) and (21).
Proof: First let $(\alpha_n, \kappa_n) \xrightarrow{d} (\alpha, \kappa)$. To prove $X^n \xrightarrow{d} X$ for the corresponding
processes in (18), it suffices by Lemma 16.24 to assume that all the $\alpha_n$
and $\kappa_n$ are nonrandom. Thus, we may restrict our attention to processes
$X^n$ with constant coefficients $\alpha_n$, $\sigma_n$, and $\beta_{nj}$, $j \in \mathbb{N}$.
To prove that $X^n \xrightarrow{fd} X$, we begin with four special cases. First we
note that if $\alpha_n \to \alpha$, then trivially $\alpha_n t \to \alpha t$ uniformly on $[0, 1]$. Similarly,
$\sigma_n \to \sigma$ implies $\sigma_n B \to \sigma B$ in the same sense. Next we consider the case
when $\alpha_n = \sigma_n = 0$ and $\beta_{n,m+1} = \beta_{n,m+2} = \cdots = 0$ for some fixed $m \in \mathbb{N}$.
Here we may assume that even $\alpha = \sigma = 0$ and $\beta_{m+1} = \beta_{m+2} = \cdots = 0$, and
that moreover $\beta_{nj} \to \beta_j$ for all $j$. The convergence $X^n \to X$ is then obvious.
Finally, we may assume that $\alpha_n = \sigma_n = 0$ and $\alpha = \beta_1 = \beta_2 = \cdots = 0$.
Then $\max_j |\beta_{nj}| \to 0$, and for any $s \le t$ we have
$$E(X^n_s X^n_t) = s(1 - t)\sum_j \beta_{nj}^2 \to s(1 - t)\sigma^2 = E(X_s X_t). \tag{22}$$
In this case, $X^n \xrightarrow{fd} X$ by Theorem 5.12 and Corollary 5.5. By independence
we may combine the four special cases to obtain $X^n \xrightarrow{fd} X$ whenever
$\beta_j = 0$ for all but finitely many $j$. From here on, it is easy to extend to the
general case by means of Theorem 4.28, where the required uniform error
estimate may be obtained as in (22).
To strengthen the convergence to $X^n \xrightarrow{d} X$ in $D[0, 1]$, it is enough to
verify the tightness criterion in Theorem 16.11. Thus, for any $X^n$-optional
times $\tau_n$ and positive constants $h_n \to 0$ with $\tau_n + h_n \le 1$, we need to show
that $X^n_{\tau_n + h_n} - X^n_{\tau_n} \xrightarrow{P} 0$. By Theorem 11.13 and a simple approximation,
it is equivalent that $X^n_{h_n} \xrightarrow{P} 0$, which is clear since
$$E(X^n_{h_n})^2 = h_n^2\alpha_n^2 + h_n(1 - h_n)\,\kappa_n\overline{\mathbb{R}} \to 0.$$
To obtain the reverse implication, we assume that $X^n \xrightarrow{d} X$ in $D[0, 1]$ for
some process $X$. Since $\alpha_n = X^n_1 \xrightarrow{d} X_1$, the sequence $(\alpha_n)$ is tight. Next
define for $n \in \mathbb{N}$
$$\eta_n = 2X^n_{1/2} - X^n_1 = 2\sigma_n B_{1/2} + 2\sum_j \beta_{nj}\big(1\{\tau_j \le \tfrac12\} - \tfrac12\big).$$
Then
$$E[\eta_n^2 \mid \kappa_n] = \sigma_n^2 + \sum_j \beta_{nj}^2 = \kappa_n\overline{\mathbb{R}},$$
$$E[\eta_n^4 \mid \kappa_n] = 3\Big\{\sigma_n^2 + \sum_j \beta_{nj}^2\Big\}^2 - 2\sum_j \beta_{nj}^4 \le 3(\kappa_n\overline{\mathbb{R}})^2.$$
Since $(\eta_n)$ is tight, Lemmas 16.15 and 16.25 show that even $(\kappa_n)$ is tight,
and so the same thing is true for the sequence of pairs $(\alpha_n, \kappa_n)$.
The tightness implies relative compactness in distribution, and so every
subsequence contains a further subsequence that converges in $\mathbb{R} \times \mathcal{M}(\overline{\mathbb{R}})$
toward some random pair $(\alpha, \kappa)$. Since the measures in (21) form a vaguely
closed subset of $\mathcal{M}(\overline{\mathbb{R}})$, the limit $\kappa$ has the same form for suitable $\sigma$ and
$\beta_1, \beta_2, \ldots$. The direct assertion then yields $X^n \xrightarrow{d} Y$ with $Y$ as in (18),
and therefore $X \overset{d}{=} Y$. Now the coefficients in (18) are measurable functions
of $Y$, and so the distribution of $(\alpha, \kappa)$ is uniquely determined by that of
$X$. Thus, the limiting distribution is independent of subsequence, and the
convergence $(\alpha_n, \kappa_n) \xrightarrow{d} (\alpha, \kappa)$ remains valid along $\mathbb{N}$. We may finally use
Corollary 6.11 to transfer the representation (18) to the original process
$X$. $\Box$
Proof of Theorem 16.23: Let $\tau_1, \tau_2, \ldots$ be i.i.d. $U(0, 1)$ and independent
of all $\xi_{nj}$, and define
$$Y^n_t = \sum_j \xi_{nj}1\{\tau_j \le t\} = \alpha_n t + \sum_j \xi_{nj}(1\{\tau_j \le t\} - t), \qquad t \in [0, 1].$$
Writing $\tilde{\xi}_{nk}$ for the $k$th jump from the left of $Y^n$ (including possible 0
jumps when $\xi_{nj} = 0$), we note that $(\tilde{\xi}_{nj}) \overset{d}{=} (\xi_{nj})$ by exchangeability. Thus,
$\tilde{X}^n \overset{d}{=} X^n$, where $\tilde{X}^n_t = \sum_{j \le m_n t} \tilde{\xi}_{nj}$. Furthermore, $d(\tilde{X}^n, Y^n) \to 0$ a.s.
by Proposition 4.24, where $d$ is the metric in Theorem A2.2. Hence, by
Theorem 4.28 it is equivalent to replace $X^n$ by $Y^n$. But then the assertion
follows by Proposition 16.26. $\Box$
Using similar compactness arguments, we may finally prove the main
representation theorem for exchangeable processes on [0, 1].
Proof of Theorem 16.21: The sufficiency part being obvious, it is enough
to prove the necessity. Thus, assume that $X$ has exchangeable increments.
Introduce the step processes
$$X^n_t = X(2^{-n}[2^n t]), \qquad t \in [0, 1],\ n \in \mathbb{N},$$
define $\kappa_n$ as in (20) in terms of the jump sizes of $X^n$, and put $\alpha_n = X_1$. If
the sequence $(\kappa_n)$ is tight, then $(\alpha_n, \kappa_n) \xrightarrow{d} (\alpha, \kappa)$ along some subsequence,
and by Theorem 16.23 we get $X^n \xrightarrow{d} Y$ along the same subsequence, where
$Y$ can be represented as in (18). In particular, $X^n \xrightarrow{fd} Y$, and so the
finite-dimensional distributions of $X$ and $Y$ agree for dyadic times. The
agreement extends to arbitrary times, since both processes are continuous
in probability. By Lemma 3.24 it follows that $X$ has a version in $D[0, 1]$,
and by Corollary 6.11 we obtain the desired representation.
To prove the required tightness of $(\kappa_n)$, denote the increments in $X^n$ by
$\xi_{nj}$, put $\zeta_{nj} = \xi_{nj} - 2^{-n}\alpha_n$, and note that
$$\kappa_n\mathbb{R} = \sum_j \xi_{nj}^2 = \sum_j \zeta_{nj}^2 + 2^{-n}\alpha_n^2. \tag{23}$$
Writing $\eta_n = 2X^n_{1/2} - X^n_1 = 2X_{1/2} - X_1$ and noting that $\sum_j \zeta_{nj} = 0$, we
get the elementary estimates
$$E[\eta_n^4 \mid \kappa_n] \lesssim \sum_j \zeta_{nj}^4 + \sum_{i \ne j} \zeta_{ni}^2\zeta_{nj}^2 = \Big\{\sum_j \zeta_{nj}^2\Big\}^2 \lesssim (E[\eta_n^2 \mid \kappa_n])^2.$$
Since $\eta_n$ is independent of $n$, the sequence of sums $\sum_j \zeta_{nj}^2$ is tight by
Lemma 16.25, and the tightness of $(\kappa_n)$ follows by (23). $\Box$
For measure-valued processes $X^n$ with rcll paths, we show that tightness
can be characterized in terms of the real-valued projections
$X^n_t f = \int f(s)\,X^n_t(ds)$, $f \in C_K^+$.
Theorem 16.27 (measure-valued processes) Let $X^1, X^2, \ldots$ be random
elements in $D(\mathbb{R}_+, \mathcal{M}(S))$, where $S$ is lcscH. Then $(X^n)$ is tight iff $(X^n f)$
is tight in $D(\mathbb{R}_+, \mathbb{R}_+)$ for every $f \in C_K^+(S)$.
Proof: Assume that $(X^n f)$ is tight for every $f \in C_K^+$, and fix any $\varepsilon > 0$.
Let $f_1, f_2, \ldots$ be such as in Theorem A2.4, and choose some compact sets
$B_1, B_2, \ldots \subset D(\mathbb{R}_+, \mathbb{R}_+)$ with
$$P\{X^n f_k \in B_k\} > 1 - \varepsilon 2^{-k}, \qquad k, n \in \mathbb{N}. \tag{24}$$
Then $A = \bigcap_k \{\mu;\ \mu f_k \in B_k\}$ is relatively compact in $D(\mathbb{R}_+, \mathcal{M}(S))$, and
(24) yields $P\{X^n \in A\} > 1 - \varepsilon$. $\Box$
We turn our attention to random sets. Then fix an lcscH space $S$, and
let $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{K}$ denote the classes of closed, open, and compact subsets,
respectively. We endow $\mathcal{F}$ with the so-called Fell topology, generated by
the sets $\{F;\ F \cap G \ne \emptyset\}$ and $\{F;\ F \cap K = \emptyset\}$ for arbitrary $G \in \mathcal{G}$ and
$K \in \mathcal{K}$. Some basic properties of this topology are summarized in Theorem
A2.5. In particular, $\mathcal{F}$ is compact and metrizable, and $\{F;\ F \cap B = \emptyset\}$ is
universally measurable for every $B \in \hat{\mathcal{S}}$.
By a random closed set in $S$ we mean a random element $\varphi$ in $\mathcal{F}$. In this
context we often write $\varphi \cap B = \varphi B$, and we note that the probabilities
$P\{\varphi B = \emptyset\}$ are well defined. For any random closed set $\varphi$, we introduce
the class
$$\hat{\mathcal{S}}_\varphi = \{B \in \hat{\mathcal{S}};\ P\{\varphi B^\circ = \emptyset\} = P\{\varphi\bar{B} = \emptyset\}\},$$
which is separating by Lemma A2.6. We may now state the basic convergence
criterion for random sets. It is interesting to note the formal
agreement with the first condition in Proposition 16.17.
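Theorem 16.28 (convergence of random sets, Norberg) Let $\varphi, \varphi_1, \varphi_2, \ldots$
be random closed sets in an lcscH space $S$. Then $\varphi_n \xrightarrow{d} \varphi$ iff
$$P\{\varphi_n U = \emptyset\} \to P\{\varphi U = \emptyset\}, \qquad U \in \mathcal{U}, \tag{25}$$
for some separating class $\mathcal{U} \subset \hat{\mathcal{S}}$, in which case we may take $\mathcal{U} = \hat{\mathcal{S}}_\varphi$.
Proof: Write $h(B) = P\{\varphi B \ne \emptyset\}$ and $h_n(B) = P\{\varphi_n B \ne \emptyset\}$. If $\varphi_n \xrightarrow{d} \varphi$,
then by Theorem 4.25,
$$h(B^\circ) \le \liminf_{n \to \infty} h_n(B) \le \limsup_{n \to \infty} h_n(B) \le h(\bar{B}), \qquad B \in \hat{\mathcal{S}},$$
and so for any $B \in \hat{\mathcal{S}}_\varphi$ we get $h_n(B) \to h(B)$.
Next assume that (25) holds for some separating class $\mathcal{U}$. Fix any $B \in \hat{\mathcal{S}}_\varphi$,
and conclude from (25) that, for any $U, V \in \mathcal{U}$ with $U \subset B \subset V$,
$$h(U) \le \liminf_{n \to \infty} h_n(B) \le \limsup_{n \to \infty} h_n(B) \le h(V).$$
Since $\mathcal{U}$ is separating, we may let $U \uparrow B^\circ$ to get $\{\varphi U \ne \emptyset\} \uparrow \{\varphi B^\circ \ne \emptyset\}$ and
hence $h(U) \uparrow h(B^\circ) = h(B)$. Next choose some sets $V \in \mathcal{U}$ with $\bar{V} \downarrow \bar{B}$, and
conclude by the finite intersection property that $\{\varphi\bar{V} \ne \emptyset\} \downarrow \{\varphi\bar{B} \ne \emptyset\}$,
which gives $h(V) \downarrow h(\bar{B}) = h(B)$. Thus, $h_n(B) \to h(B)$, and so (25)
remains true for $\mathcal{U} = \hat{\mathcal{S}}_\varphi$.
Since $\mathcal{F}$ is compact, the sequence $\{\varphi_n\}$ is relatively compact by Theorem
16.3. Thus, for any subsequence $N' \subset \mathbb{N}$, we have $\varphi_n \xrightarrow{d} \psi$ along a further
subsequence for some random closed set $\psi$. By the direct statement together
with (25) we get
$$P\{\varphi B = \emptyset\} = P\{\psi B = \emptyset\}, \qquad B \in \hat{\mathcal{S}}_\varphi \cap \hat{\mathcal{S}}_\psi. \tag{26}$$
Since $\hat{\mathcal{S}}_\varphi \cap \hat{\mathcal{S}}_\psi$ is separating by Lemma A2.6, we may approximate as before
to extend (26) to arbitrary compact sets $B$. The class of sets $\{F;\ F \cap K = \emptyset\}$
with $K$ compact is clearly a $\pi$-system, and so a monotone class argument
gives $\varphi \overset{d}{=} \psi$. Since $N'$ is arbitrary, we obtain $\varphi_n \xrightarrow{d} \varphi$ along $\mathbb{N}$. $\Box$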
Simple point processes allow the dual descriptions as integer-valued random
measures or locally finite random sets. The corresponding notions of
convergence are different, and we proceed to clarify their relationship. Since
the mapping $\mu \mapsto \operatorname{supp}\mu$ is continuous on $\mathcal{N}(S)$, we note that $\xi_n \xrightarrow{d} \xi$ implies
$\operatorname{supp}\xi_n \xrightarrow{d} \operatorname{supp}\xi$. Conversely, assuming the intensity measures $E\xi$
and $E\xi_n$ to be locally finite, we see from Proposition 16.17 and Theorem
16.28 that $\xi_n \xrightarrow{d} \xi$ whenever $\operatorname{supp}\xi_n \xrightarrow{d} \operatorname{supp}\xi$ and $E\xi_n \xrightarrow{v} E\xi$. The next
result gives a general criterion.
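Theorem 16.29 (supports of point processes) Let $\xi, \xi_1, \xi_2, \ldots$ be point
processes on an lcscH space $S$, where $\xi$ is simple, and fix a preseparating
class $\mathcal{I} \subset \hat{\mathcal{S}}_\xi$. Then $\xi_n \xrightarrow{d} \xi$ iff $\operatorname{supp}\xi_n \xrightarrow{d} \operatorname{supp}\xi$ and
$$\limsup_{n \to \infty} P\{\xi_n I > 1\} \le P\{\xi I > 1\}, \qquad I \in \mathcal{I}. \tag{27}$$
Proof: By Corollary 6.12 we may assume that $\operatorname{supp}\xi_n \xrightarrow{f} \operatorname{supp}\xi$ a.s., and
since $\xi$ is simple we get by Proposition A2.8
$$\limsup_{n \to \infty}(\xi_n B \wedge 1) \le \xi B \le \liminf_{n \to \infty} \xi_n B \text{ a.s.}, \qquad B \in \hat{\mathcal{S}}_\xi. \tag{28}$$
Next we have for any $a, b \in \mathbb{Z}_+$
$$\{b \le a \le 1\}^c = \{a > 1\} \cup \{a < b \wedge 2\} = \{b > 1\} \cup \{a = 0,\ b = 1\} \cup \{a > 1 \ge b\},$$
where all unions are disjoint. Substituting $a = \xi I$ and $b = \xi_n I$, we get by
(27) and (28)
$$\lim_{n \to \infty} P\{\xi I < \xi_n I \wedge 2\} = 0, \qquad I \in \mathcal{I}. \tag{29}$$
Next let $B \subset I \in \mathcal{I}$ and $B' = I \setminus B$, and note that
$$\{\xi_n B > \xi B\} \subset \{\xi_n I > \xi I\} \cup \{\xi_n B' < \xi B'\} \subset \{\xi_n I \wedge 2 > \xi I\} \cup \{\xi I > 1\} \cup \{\xi_n B' < \xi B'\}. \tag{30}$$
More generally, assume that $B \in \hat{\mathcal{S}}_\xi$ is covered by $I_1, \ldots, I_m \in \mathcal{I}$. It may
then be partitioned into sets $B_k \in \hat{\mathcal{S}}_\xi \cap I_k$, $k = 1, \ldots, m$, and by (28), (29),
and (30) we get
$$\limsup_{n \to \infty} P\{\xi_n B > \xi B\} \le P\bigcup_k \{\xi I_k > 1\}. \tag{31}$$
Now let $B \in \hat{\mathcal{S}}_\xi$ and $K \in \mathcal{K}$ with $\bar{B} \subset K^\circ$. Fix a metric $d$ in $S$ and let
$\varepsilon > 0$. Since $\mathcal{I}$ is preseparating, we may choose some $I_1, \ldots, I_m \in \mathcal{I}$ with
$d$-diameters $< \varepsilon$ such that $B \subset \bigcup_k I_k \subset K$. Letting $\rho_K$ denote the minimum
$d$-distance between points in $K \cap \operatorname{supp}\xi$, it follows that the right-hand side
of (31) is bounded by $P\{\rho_K < \varepsilon\}$. Since $\rho_K > 0$ a.s. and $\varepsilon > 0$ is arbitrary,
we get $P\{\xi_n B > \xi B\} \to 0$. In view of the second relation in (28), we obtain
$\xi_n B \xrightarrow{P} \xi B$. Thus, $\xi_n \xrightarrow{d} \xi$ by Theorem 16.16. $\Box$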
Exercises
1. For any metric space $(S, \rho)$, show that if $x_n \to x$ in $D(\mathbb{R}_+, S)$ with $x$
continuous, then $\sup_{s \le t} \rho(x_{n,s}, x_s) \to 0$ for every $t > 0$. (Hint: Note that $x$
is uniformly continuous on every interval $[0, t]$.)
2. For any separable and complete metric space (8, p), show that if X n X
in D(IR+, S) with X continuous, there exist some processes yn d x n such
that sUPs<t p(Ysn, Xs) -+ 0 a.s. for every t > O. (Hint: lTse the preceding
result together with Theorems 4.30 and A2.2.)
3. Give an example where X n -+ x and Yn ---t Y in D(+, JR.) and yet
(xn, Yn) -1+ (x, y) in D(]R+, }R2).
4. Let X and X n be random elements in D(R+, ]Rd) with X n f d > X and
such that uX n uX in D (IR+ , IR) for every u E IR d. Show that X n X.
(Hint: Proceed as in Theorems 16.27 and A2.4.)
5. Let f be a continuous mapping between two metric spaces Sand T,
where S is separable and complete. Show that if X n X in D(R+, S),
then f(X n ) f(X) in D(+, T). (Hint: By Theorem 4.27 it suffices to
show that X n -t x in D(JR+, S) implies f(xn) ---t f(x) in D(IR+, T). Since
A = {x, Xl, X2,...} is relatively compact in D(JR+, S), Theorem A2.2 shows
that U t = Us<t 1fsA is relatively compact in 8 for every t > o. Hence, f is
uniformly continuous on each U t .)
6. Show by an example that the condition in Theorem 16.11 is not necessary
for tightness. (Hint: Consider nonrandom processes X n. )
7. In Theorem 16.11, show that it is enough to consider optional times
taking finitely many values. (Hint: Approximate from the right and use the
right-continuity of the paths.)
8. Let the process $X$ on $\mathbb{R}_+$ be continuous in probability with values in a
separable and complete metric space $(S, \rho)$. Assume that $\rho(X_{\tau_n}, X_{\tau_n + h_n}) \xrightarrow{P} 0$
for any bounded optional times $\tau_n$ and constants $h_n \to 0$. Show that $X$
has an rcll version. (Hint: Approximate by suitable step processes and use
Theorems 16.10 and 16.11.)
9. Extend Corollary 16.7 to random vectors in $\mathbb{R}^d$.
10. Let $X, X^1, X^2, \ldots$ be Lévy processes in $\mathbb{R}^d$. Show that $X^n \xrightarrow{d} X$ in
$D(\mathbb{R}_+, \mathbb{R}^d)$ iff $X^n_1 \xrightarrow{d} X_1$ in $\mathbb{R}^d$. Compare with Theorem 15.17.
11. Show that conditions (iii) and (iv) of Theorem 16.16 remain sufficient
if we replace $\hat{\mathcal{S}}_\xi$ by an arbitrary separating class. (Hint: Restate the
conditions in terms of Laplace transforms, and extend to $\hat{\mathcal{S}}_\xi$ by a suitable
approximation.)
12. Deduce Theorem 16.18 from Theorem 5.7. (Hint: First assume that $\mu$
is diffuse and use Theorem 16.17. Then extend to the general case by a
suitable randomization.)
13. Strengthen the conclusion in Theorem 16.19 to $(\xi_n, p_n \eta_n) \xrightarrow{d} (\xi, \eta)$,
where $\xi$ is a Cox process directed by $\eta$.
14. For any lcscH space $S$, let $\xi, \xi_1, \xi_2, \ldots$ be Cox processes on $S$ directed
by $\eta, \eta_1, \eta_2, \ldots$. Show that $\xi_n \xrightarrow{d} \xi$ iff $\eta_n \xrightarrow{d} \eta$. Prove the corresponding
result for $p$-thinnings with a fixed $p \in (0, 1)$.
15. Let $\eta, \eta_1, \eta_2, \ldots$ be $\lambda$-randomizations of some point processes $\xi, \xi_1, \xi_2, \ldots$
on an lcscH space $S$. Show that $\xi_n \xrightarrow{d} \xi$ iff $\eta_n \xrightarrow{d} \eta$.
16. Specialize Theorem 16.23 to suitably normalized sequences of i.i.d.
random variables, and compare with Corollary 16.7.
17. Let $X$ be a continuous process on $I = \mathbb{R}_+$ or $[0, 1]$ with $X_0 = 0$. Show
that $X$ is exchangeable iff a.s. $X_t = \alpha t + \sigma B_t$, $t \in I$, for some Brownian
motion or bridge $B$ and some independent pair of random variables $\alpha$ and
$\sigma \ge 0$. Also show that $\alpha$ and $\sigma$ are a.s. unique. (Hint: For the last assertion,
use the laws of large numbers and the iterated logarithm.)
18. Characterize the Levy processes on [0,1] as special exchangeable
processes, in terms of the coefficients in Theorem 16.21.
19. Show that a process $X$ on $\mathbb{R}_+$ is exchangeable iff it has a version that
is conditionally Lévy with random characteristics $(\alpha, \beta, \nu)$. (Hint: Theorem
16.21 shows that $X$ has an rcll version. By Theorem 11.15 it is then
conditionally Lévy, given some $\sigma$-field $\mathcal{I}$. Finally, the characteristics $(\alpha, \beta, \nu)$ of
$X$ are $\mathcal{I}$-measurable by the law of large numbers.)
20. Let $X$ be an rcll, exchangeable process on $\mathbb{R}_+$. Show directly from
Corollary 16.20 and Theorem 16.21 that the point process of jump sizes
on $[0, 1]$ is Cox. Also conclude from Theorem 16.19 and the law of large
numbers that the point process of jump times and sizes is Cox with directing
random measure of the form $\nu \otimes \lambda$.
21. For an lcscH space $S$, let $\mathcal{U} \subset \hat{\mathcal{S}}$ be separating. Show that if $K \subset G$ with
$K$ compact and $G$ open, there exists some $U \in \mathcal{U}$ with $K \subset U^\circ \subset \bar{U} \subset G$.
(Hint: First choose $B, C \in \hat{\mathcal{S}}$ with $K \subset B^\circ \subset \bar{B} \subset C^\circ \subset \bar{C} \subset G$.)
Chapter 17
Stochastic Integrals
and Quadratic Variation
Continuous local martingales and semimartingales; quadratic
variation and covariation; existence and basic properties of
the integral; integration by parts and Itô's formula;
Fisk-Stratonovich integral; approximation and uniqueness; random
time-change; dependence on parameter
This chapter introduces the basic notions of stochastic calculus in the spe-
cial case of continuous integrators. As a first major task, we shall construct
the quadratic variation [M] of a continuous local martingale M, using an
elementary approximation and completeness argument. The processes M
and [M] will be related by some useful continuity and norm relations, most
importantly by the powerful BDG inequalities.
Given the quadratic variation $[M]$, we may next construct the stochastic
integral $\int V\,dM$ for suitable progressive processes $V$, using a simple Hilbert
space argument. Combining with the ordinary Stieltjes integral $\int V\,dA$ for
processes $A$ of locally finite variation, we may finally extend the integral
to arbitrary continuous semimartingales $X = M + A$. The continuity
properties of quadratic variation carry over to the stochastic integral, and in
conjunction with the obvious linearity they characterize the integration.
The key result for applications is Itô's formula, which shows how
semimartingales are transformed under smooth mappings. The present sub-
stitution rule differs from the corresponding result for Stieltjes integrals,
but the two formulas can be brought into agreement by a suitable modi-
fication of the integral. We conclude the chapter with some special topics
of importance for applications, such as the transformation of stochastic
integrals under a random time-change, and the integration of processes
depending on a parameter.
The present material may be regarded as continuing the martingale the-
ory from Chapter 7. Though no results for Brownian motion are used
explicitly in this chapter, the existence of the Brownian quadratic variation
in Chapter 13 may serve as a motivation. We shall also need the represen-
tation and measurability of limits obtained in Chapter 4. The stochastic
calculus developed in this chapter plays an important role throughout the
remainder of this book, especially in Chapters 18, 21, 22, and 23. In Chapter
26 the theory is extended to possibly discontinuous semimartingales.
Throughout the chapter we let $\mathcal{F} = (\mathcal{F}_t)$ be a right-continuous and
complete filtration on $\mathbb{R}_+$. A process $M$ is said to be a local martingale if it is
adapted to $\mathcal{F}$ and such that the stopped and shifted processes $M^{\tau_n} - M_0$ are
true martingales for suitable optional times $\tau_n \uparrow \infty$. By a similar localization
we may define local $L^2$-martingales, locally bounded martingales, locally
integrable processes, and so on. The associated optional times $\tau_n$ are said
to form a localizing sequence.
Any continuous local martingale may clearly be reduced by localiza-
tion to a sequence of bounded, continuous martingales. Conversely, we see
by dominated convergence that every bounded local martingale is a true
martingale. The following useful result may be less obvious.
Lemma 17.1 (localization) Fix any optional times $\tau_n \uparrow \infty$. Then a
process $M$ is a local martingale iff $M^{\tau_n}$ has this property for every $n$.
Proof: If $M$ is a local martingale with localizing sequence $(\sigma_n)$, and if $\tau$ is
an arbitrary optional time, then the processes $(M^\tau)^{\sigma_n} = (M^{\sigma_n})^\tau$ are true
martingales. Thus, $M^\tau$ is again a local martingale with localizing sequence
$(\sigma_n)$.
Conversely, assume that each process $M^{\tau_n}$ is a local martingale with
localizing sequence $(\sigma^n_k)$. Since $\sigma^n_k \to \infty$ a.s. for each $n$, we may choose
some indices $k_n$ with
$$P\{\sigma^n_{k_n} < \tau_n \wedge n\} \le 2^{-n}, \quad n \in \mathbb{N}.$$
Writing $\tau'_n = \tau_n \wedge \sigma^n_{k_n}$, we get $\tau'_n \to \infty$ a.s. by the Borel-Cantelli lemma,
and so the optional times $\tau''_n = \inf_{m \ge n} \tau'_m$ satisfy $\tau''_n \uparrow \infty$ a.s. It remains to
note that the processes $M^{\tau''_n} = (M^{\tau'_n})^{\tau''_n}$ are true martingales. □
The next result shows that every continuous martingale of finite variation
is a.s. constant. An extension appears as Lemma 25.11.
Proposition 17.2 (finite-variation martingales) If $M$ is a continuous
local martingale of locally finite variation, then $M = M_0$ a.s.
Proof: By localization we may reduce to the case when $M_0 = 0$ and $M$
has bounded variation. In fact, let $V_t$ denote the total variation of $M$ on
the interval $[0, t]$, and note that $V$ is continuous and adapted. For each
$n \in \mathbb{N}$ we may then introduce the optional time $\tau_n = \inf\{t \ge 0;\ V_t = n\}$,
and we note that $M^{\tau_n} - M_0$ is a continuous martingale with total variation
bounded by $n$. Note also that $\tau_n \to \infty$ and that if $M^{\tau_n} = M_0$ a.s. for each
$n$, then even $M = M_0$ a.s.
In the reduced case, fix any $t > 0$, write $t_{n,k} = kt/n$, and conclude from
the continuity of $M$ that a.s.
$$\zeta_n \equiv \sum_{k \le n} (M_{t_{n,k}} - M_{t_{n,k-1}})^2 \le V_t \max_{k \le n} |M_{t_{n,k}} - M_{t_{n,k-1}}| \to 0.$$
Since $\zeta_n \le V_t^2$, which is bounded by a constant, it follows by the martingale
property and dominated convergence that $EM_t^2 = E\zeta_n \to 0$, and so $M_t = 0$
a.s. for each $t > 0$. □
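The bound $\zeta_n \le V_t \max_k |M_{t_{n,k}} - M_{t_{n,k-1}}|$ used in the preceding proof is purely pathwise and can be observed on any continuous path of finite variation. The following Python sketch is only an illustration: the deterministic path $\sin$ on $[0, 1]$ and the helper name `sum_sq_increments` are assumptions chosen for the example and do not appear in the text.

```python
import math

def sum_sq_increments(path, t, n):
    """Sum of squared increments of `path` over the grid k*t/n, k = 0..n."""
    pts = [path(k * t / n) for k in range(n + 1)]
    return sum((b - a) ** 2 for a, b in zip(pts, pts[1:]))

# Illustrative finite-variation path: sin is increasing on [0, 1],
# so its total variation on [0, 1] equals sin(1).
for n in (10, 100, 1000):
    print(n, sum_sq_increments(math.sin, 1.0, n))
```

The printed sums shrink roughly like $1/n$, in contrast to the behavior of a nonconstant continuous martingale, whose squared increments do not vanish in the limit.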
Our construction of stochastic integrals depends on the quadratic vari-
ation and covariation processes, which then need to be constructed first.
Here we use a direct approach that has the further advantage of giving
some insight into the nature of the basic integration-by-parts formula of
Theorem 17.16. An alternative but less elementary approach would be to
use the Doob-Meyer decomposition in Chapter 25.
The construction utilizes predictable step processes of the form
$$V_t = \sum_k \xi_k 1\{t > \tau_k\} = \sum_k \eta_k 1_{(\tau_k, \tau_{k+1}]}(t), \quad t \ge 0, \tag{1}$$
where the $\tau_n$ are optional times with $\tau_n \uparrow \infty$ a.s., and the $\xi_k$ and $\eta_k$ are
$\mathcal{F}_{\tau_k}$-measurable random variables for each $k \in \mathbb{N}$. For any process $X$ we
may introduce the elementary integral process $V \cdot X$, given as in Chapter
7 by
$$(V \cdot X)_t \equiv \int_0^t V\,dX = \sum_k \xi_k (X_t - X_t^{\tau_k}) = \sum_k \eta_k (X^t_{\tau_{k+1}} - X^t_{\tau_k}), \tag{2}$$
where the series converge since they have only finitely many nonzero terms.
Note that $(V \cdot X)_0 = 0$ and that $V \cdot X$ inherits the possible continuity
properties of $X$. It is further useful to note that $V \cdot X = V \cdot (X - X_0)$. The
following simple estimate will be needed later.
Lemma 17.3 ($L^2$-bound) For any continuous $L^2$-martingale $M$ with
$M_0 = 0$ and predictable step process $V$ with $|V| \le 1$, the process $V \cdot M$
is again an $L^2$-martingale, and $E(V \cdot M)_t^2 \le EM_t^2$.
Proof: First assume that the sum in (1) has only finitely many nonzero
terms. Then Corollary 7.14 shows that $V \cdot M$ is a martingale, and the
$L^2$-bound follows by the computation
$$E(V \cdot M)_t^2 = E \sum_k \eta_k^2 (M^t_{\tau_{k+1}} - M^t_{\tau_k})^2 \le E \sum_k (M^t_{\tau_{k+1}} - M^t_{\tau_k})^2 = EM_t^2.$$
The estimate extends to the general case by Fatou's lemma, and the
martingale property then extends by uniform integrability. □
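Since the elementary integral in (2) is a finite sum, it can be evaluated directly along a given path. The sketch below is a minimal illustration under simplifying assumptions: the jump times are deterministic (as for the fixed-time step processes considered later), the integrator is the deterministic path $X_s = s^2$, and the names `elementary_integral`, `taus`, and `etas` are hypothetical.

```python
def elementary_integral(taus, etas, path, t):
    """(V.X)_t for V = sum_k etas[k] * 1_(taus[k], taus[k+1]], as in (2)."""
    total = 0.0
    for k, eta in enumerate(etas):
        lo = min(t, taus[k])        # tau_k stopped at t
        hi = min(t, taus[k + 1])    # tau_{k+1} stopped at t
        total += eta * (path(hi) - path(lo))
    return total

taus = [0.0, 1.0, 2.0, 3.0]   # hypothetical deterministic jump times
etas = [1.0, -1.0, 0.5]       # values of V on (0,1], (1,2], (2,3]
path = lambda s: s * s        # illustrative integrator X_s = s^2
print(elementary_integral(taus, etas, path, 2.5))
```

Adding a constant to `path` leaves the result unchanged, in line with the remark that $V \cdot X = V \cdot (X - X_0)$.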
Let us now introduce the space $\mathcal{M}^2$ of all $L^2$-bounded, continuous
martingales $M$ with $M_0 = 0$, and equip $\mathcal{M}^2$ with the norm $\|M\| = \|M_\infty\|_2$.
Recall that $\|M^*\|_2 \le 2\|M\|$ by Proposition 7.16.
Lemma 17.4 (completeness) The space $\mathcal{M}^2$ is a Hilbert space.
Proof: Fix any Cauchy sequence $M^1, M^2, \ldots$ in $\mathcal{M}^2$. The sequence $(M^n_\infty)$
is then Cauchy in $L^2$ and thus converges toward some element $\xi \in L^2$.
Introduce the $L^2$-martingale $M_t = E[\xi \mid \mathcal{F}_t]$, $t \ge 0$, and note that $M_\infty = \xi$
a.s. since $\xi$ is $\mathcal{F}_\infty$-measurable. Hence,
$$\|(M^n - M)^*\|_2 \le 2\|M^n - M\| = 2\|M^n_\infty - M_\infty\|_2 \to 0,$$
and so $\|M^n - M\| \to 0$. Moreover, $(M^n - M)^* \to 0$ a.s. along some
subsequence, which shows that $M$ is a.s. continuous with $M_0 = 0$. □
We are now ready to prove the existence of the quadratic variation and
covariation processes $[M]$ and $[M, N]$. Extensions to possibly discontinuous
processes are considered in Chapter 26.
Theorem 17.5 (covariation) For any continuous local martingales $M$
and $N$, there exists an a.s. unique continuous process $[M, N]$ of locally
finite variation and with $[M, N]_0 = 0$ such that $MN - [M, N]$ is a
local martingale. The form $[M, N]$ is a.s. symmetric and bilinear with
$[M, N] = [M - M_0, N - N_0]$ a.s. Furthermore, $[M] = [M, M]$ is a.s.
nondecreasing, and for any optional time $\tau$,
$$[M^\tau, N] = [M^\tau, N^\tau] = [M, N]^\tau \quad \text{a.s.}$$
Proof: The a.s. uniqueness of $[M, N]$ follows from Proposition 17.2, and
the symmetry and bilinearity are immediate consequences. If $[M, N]$ exists
with the stated properties and $\tau$ is an optional time, then by Lemma 17.1
the process $M^\tau N^\tau - [M, N]^\tau$ is a local martingale, and so is the process
$M^\tau(N - N^\tau)$ by Corollary 7.14. Hence, even $M^\tau N - [M, N]^\tau$ is a local
martingale, and so $[M^\tau, N] = [M^\tau, N^\tau] = [M, N]^\tau$ a.s. Furthermore,
$$MN - (M - M_0)(N - N_0) = M_0 N_0 + M_0(N - N_0) + N_0(M - M_0)$$
is a local martingale, and so $[M - M_0, N - N_0] = [M, N]$ a.s. whenever
either side exists. If both $[M + N]$ and $[M - N]$ exist, then
$$4MN - ([M + N] - [M - N]) = ((M + N)^2 - [M + N]) - ((M - N)^2 - [M - N])$$
is a local martingale, and so we may take $[M, N] = ([M + N] - [M - N])/4$.
It is then enough to prove the existence of $[M]$ when $M_0 = 0$.
First assume that $M$ is bounded. For each $n \in \mathbb{N}$, let $\tau^n_0 = 0$ and define
recursively
$$\tau^n_{k+1} = \inf\{t > \tau^n_k;\ |M_t - M_{\tau^n_k}| = 2^{-n}\}, \quad k \ge 0.$$
Clearly, $\tau^n_k \to \infty$ as $k \to \infty$ for fixed $n$. Introduce the processes
$$V^n_t = \sum_k M_{\tau^n_k} 1\{t \in (\tau^n_k, \tau^n_{k+1}]\}, \qquad Q^n_t = \sum_k (M_{t \wedge \tau^n_k} - M_{t \wedge \tau^n_{k-1}})^2.$$
The $V^n$ are bounded predictable step processes, and we note that
$$M_t^2 = 2(V^n \cdot M)_t + Q^n_t, \quad t \ge 0. \tag{3}$$
By Lemma 17.3 the integrals $V^n \cdot M$ are continuous $L^2$-martingales, and
since $|V^n - M| \le 2^{-n}$ for each $n$, we have
$$\|V^m \cdot M - V^n \cdot M\| = \|(V^m - V^n) \cdot M\| \le 2^{-m+1}\|M\|, \quad m \le n.$$
Hence, by Lemma 17.4 there exists some continuous martingale $N$ such
that $(V^n \cdot M - N)^* \xrightarrow{P} 0$. The process $[M] = M^2 - 2N$ is again continuous,
and by (3) we have
$$(Q^n - [M])^* = 2(N - V^n \cdot M)^* \xrightarrow{P} 0.$$
In particular, $[M]$ is a.s. nondecreasing on the random time set $T =
\{\tau^n_k;\ n, k \in \mathbb{N}\}$, and the monotonicity extends by continuity to the
closure $\bar{T}$. Also note that $[M]$ is constant on each interval in $\bar{T}^c$, since this is
true for $M$ and hence also for every $Q^n$. Thus, $[M]$ is a.s. nondecreasing.
Turning to the unbounded case, we define $\tau_n = \inf\{t > 0;\ |M_t| = n\}$,
$n \in \mathbb{N}$. The processes $[M^{\tau_n}]$ exist as before, and we note that $[M^{\tau_m}]^{\tau_m} =
[M^{\tau_n}]^{\tau_m}$ a.s. for all $m < n$. Hence, $[M^{\tau_m}] = [M^{\tau_n}]$ a.s. on $[0, \tau_m]$, and since
$\tau_n \to \infty$ there exists a nondecreasing, continuous, and adapted process $[M]$
such that $[M] = [M^{\tau_n}]$ a.s. on $[0, \tau_n]$ for each $n$. Here $(M^{\tau_n})^2 - [M]^{\tau_n}$ is
a local martingale for each $n$, and so $M^2 - [M]$ is a local martingale by
Lemma 17.1. □
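Relation (3) in the preceding proof is at heart an algebraic identity for sums: for any finite sequence $m_0 = 0, m_1, \ldots, m_n$ one has $m_n^2 = 2\sum_k m_{k-1}(m_k - m_{k-1}) + \sum_k (m_k - m_{k-1})^2$. The Python sketch below checks this on a simulated path. It is an illustration under stated assumptions: the path is sampled on a fixed time grid rather than at the optional times $\tau^n_k$, Gaussian increments stand in for a generic martingale, and the function names are hypothetical.

```python
import random

def gaussian_walk(n, t=1.0, seed=0):
    """Path with independent N(0, t/n) increments, started at 0 (illustrative)."""
    rng = random.Random(seed)
    sd = (t / n) ** 0.5
    path = [0.0]
    for _ in range(n):
        path.append(path[-1] + rng.gauss(0.0, sd))
    return path

def identity_terms(m):
    """Return (m_n^2, 2 * sum m_{k-1} dm_k, sum dm_k^2) for a path with m[0] == 0."""
    integral = sum(a * (b - a) for a, b in zip(m, m[1:]))   # discrete analogue of V^n . M
    quad_sum = sum((b - a) ** 2 for a, b in zip(m, m[1:]))  # discrete analogue of Q^n
    return m[-1] ** 2, 2.0 * integral, quad_sum

square, twice_integral, quad_sum = identity_terms(gaussian_walk(10_000))
print(square, twice_integral + quad_sum, quad_sum)
```

The first two printed numbers agree up to rounding, while the third is nonnegative and built from squared increments only, which is the source of the monotonicity of $[M]$.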
We proceed to establish a basic continuity property.
Proposition 17.6 (continuity) For any continuous local martingales $M_n$
starting at 0, we have $M_n^* \xrightarrow{P} 0$ iff $[M_n]_\infty \xrightarrow{P} 0$.
Proof: First let $M_n^* \xrightarrow{P} 0$. Fix any $\varepsilon > 0$, and define $\tau_n = \inf\{t \ge 0;\
|M_n(t)| > \varepsilon\}$, $n \in \mathbb{N}$. Write $N_n = M_n^2 - [M_n]$, and note that $N_n^{\tau_n}$ is a true
martingale on $\bar{\mathbb{R}}_+$. In particular, $E[M_n]_{\tau_n} \le \varepsilon^2$, and so by Chebyshev's
inequality
$$P\{[M_n]_\infty > \varepsilon\} \le P\{\tau_n < \infty\} + \varepsilon^{-1} E[M_n]_{\tau_n} \le P\{M_n^* > \varepsilon\} + \varepsilon.$$
Here the right-hand side tends to zero as $n \to \infty$ and then $\varepsilon \to 0$, which
shows that $[M_n]_\infty \xrightarrow{P} 0$.
The proof in the other direction is similar, except that we need to use a
localization argument together with Fatou's lemma to see that a continuous
local martingale $M$ with $M_0 = 0$ and $E[M]_\infty < \infty$ is necessarily
$L^2$-bounded. □
Next we prove a pair of basic norm inequalities involving the quad-
ratic variation, known as the BDG inequalities. Partial extensions to
discontinuous martingales are established in Theorem 26.12.
Theorem 17.7 (norm inequalities, Burkholder, Millar, Gundy, Novikov)
There exist some constants $c_p \in (0, \infty)$, $p > 0$, such that for any continuous
local martingale $M$ with $M_0 = 0$,
$$c_p^{-1} E[M]_\infty^{p/2} \le EM^{*p} \le c_p E[M]_\infty^{p/2}, \quad p > 0.$$
Proof: By optional stopping we may assume that $M$ and $[M]$ are
bounded. Write $M' = M - M^\tau$ with $\tau = \inf\{t;\ M_t^2 = r\}$ and define
$N = (M')^2 - [M']$. By Corollary 7.30 we have for any $r > 0$ and $c \in (0, 2^{-p})$
$$\begin{aligned}
P\{M^{*2} \ge 4r\} - P\{[M]_\infty \ge cr\} &\le P\{M^{*2} \ge 4r,\ [M]_\infty < cr\} \\
&\le P\{N > -cr,\ \sup\nolimits_t N_t > r - cr\} \\
&\le cP\{N^* > 0\} \le cP\{M^{*2} \ge r\}.
\end{aligned}$$
Multiplying by $(p/2) r^{p/2-1}$ and integrating over $\mathbb{R}_+$, we get by Lemma 3.4
$$2^{-p} EM^{*p} - c^{-p/2} E[M]_\infty^{p/2} \le c\,EM^{*p},$$
and the right-hand inequality follows with $c_p = c^{-p/2}/(2^{-p} - c)$.
Next let $N$ be as before with $\tau = \inf\{t;\ [M]_t = r\}$, and write for any
$r > 0$ and $c \in (0, 2^{-p/2-2})$
$$\begin{aligned}
P\{[M]_\infty \ge 2r\} - P\{M^{*2} \ge cr\} &\le P\{[M]_\infty \ge 2r,\ M^{*2} < cr\} \\
&\le P\{N < 4cr,\ \inf\nolimits_t N_t < 4cr - r\} \\
&\le 4cP\{[M]_\infty \ge r\}.
\end{aligned}$$
Integrating as before yields
$$2^{-p/2} E[M]_\infty^{p/2} - c^{-p/2} EM^{*p} \le 4c\,E[M]_\infty^{p/2},$$
and the left-hand inequality follows with $c_p = c^{-p/2}/(2^{-p/2} - 4c)$. □
It is often important to decide whether a local martingale is in fact a
true martingale. The last proposition yields a useful criterion.
Corollary 17.8 (uniform integrability) Let $M$ be a continuous local
martingale satisfying $E(|M_0| + [M]_\infty^{1/2}) < \infty$. Then $M$ is a uniformly integrable
martingale.
Proof: By Theorem 17.7 we have $EM^* < \infty$, and the martingale property
follows by dominated convergence. □
The basic properties of [M, N] suggest that we think of the covariation
process as an inner product. A further justification is given by the following
useful Cauchy-Buniakovsky-type inequalities.
Proposition 17.9 (Cauchy-type inequalities, Courrège) For any
continuous local martingales $M$ and $N$, we have a.s.
$$|[M, N]| \le \int |d[M, N]| \le [M]^{1/2} [N]^{1/2}. \tag{4}$$
More generally, we have a.s. for any measurable processes $U$ and $V$
$$\int_0^t |UV\,d[M, N]| \le (U^2 \cdot [M])_t^{1/2} (V^2 \cdot [N])_t^{1/2}, \quad t \ge 0.$$
Proof: Using the positivity and bilinearity of the covariation, we get a.s.
for any $a, b \in \mathbb{R}$ and $t \ge 0$
$$0 \le [aM + bN]_t = a^2 [M]_t + 2ab[M, N]_t + b^2 [N]_t.$$
By continuity we can choose a common exceptional null set for all $a$ and
$b$, and so $[M, N]_t^2 \le [M]_t [N]_t$ a.s. Applying this inequality to the processes
$M - M^s$ and $N - N^s$ for any $s < t$, we obtain a.s.
$$|[M, N]_t - [M, N]_s| \le ([M]_t - [M]_s)^{1/2} ([N]_t - [N]_s)^{1/2}, \tag{5}$$
and by continuity we may again choose a common null set. Now let $0 =
t_0 < t_1 < \cdots < t_n = t$ be arbitrary, and conclude from (5) and the classical
Cauchy-Buniakovsky inequality that
$$|[M, N]_t| \le \sum_k |[M, N]_{t_k} - [M, N]_{t_{k-1}}| \le [M]_t^{1/2} [N]_t^{1/2}.$$
To get (4), it remains to take the supremum over all partitions of $[0, t]$.
Next write $d\mu = d[M]$, $d\nu = d[N]$, and $d\rho = |d[M, N]|$, and conclude
from (4) that $(\rho I)^2 \le \mu I\,\nu I$ a.s. for every interval $I$. By continuity we may
choose the exceptional null set $A$ to be independent of $I$. Letting $G \subset \mathbb{R}_+$
be open with connected components $I_k$ and using the Cauchy-Buniakovsky
inequality, we get on $A^c$
$$\rho G = \sum_k \rho I_k \le \sum_k (\mu I_k\,\nu I_k)^{1/2} \le \Big\{\sum_j \mu I_j \sum_k \nu I_k\Big\}^{1/2} = (\mu G\,\nu G)^{1/2}.$$
By Lemma 1.34 the last relation extends to any $B \in \mathcal{B}(\mathbb{R}_+)$.
Now fix any simple measurable functions $f = \sum_k a_k 1_{B_k}$ and $g =
\sum_k b_k 1_{B_k}$. Using the Cauchy-Buniakovsky inequality again, we obtain on
$A^c$
$$\rho|fg| \le \sum_k |a_k b_k|\,\rho B_k \le \sum_k |a_k b_k| (\mu B_k\,\nu B_k)^{1/2}
\le \Big\{\sum_j a_j^2 \mu B_j \sum_k b_k^2 \nu B_k\Big\}^{1/2} \le (\mu f^2\,\nu g^2)^{1/2},$$
which extends by monotone convergence to any measurable functions $f$ and
$g$ on $\mathbb{R}_+$. In particular, in view of Lemma 1.33, we may take $f(t) = U_t(\omega)$
and $g(t) = V_t(\omega)$ for fixed $\omega \in A^c$. □
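The outer inequality in (4) has an exact discrete counterpart: for increments over any partition, the sum of products is bounded by the geometric mean of the two sums of squares. The following Python sketch illustrates this on simulated data. The correlated Gaussian increments (with an assumed correlation of 0.6) and all names are hypothetical choices for the example; the sums of products and squares serve only as grid approximations to $[M, N]_t$, $[M]_t$, and $[N]_t$.

```python
import random

def gaussian_increments(n, t, rng):
    """n independent N(0, t/n) increments (illustrative martingale increments)."""
    sd = (t / n) ** 0.5
    return [rng.gauss(0.0, sd) for _ in range(n)]

def covariation_and_bound(dm, dn):
    """Return (sum dm*dn, sqrt(sum dm^2 * sum dn^2))."""
    cov = sum(a * b for a, b in zip(dm, dn))
    bound = (sum(a * a for a in dm) * sum(b * b for b in dn)) ** 0.5
    return cov, bound

rng = random.Random(2)
dm = gaussian_increments(5000, 1.0, rng)
dw = gaussian_increments(5000, 1.0, rng)
dn = [0.6 * a + 0.8 * b for a, b in zip(dm, dw)]   # assumed correlation 0.6 with dm
cov, bound = covariation_and_bound(dm, dn)
print(cov, bound)
```

The discrete inequality $|\sum_k \Delta M_k \Delta N_k| \le (\sum_k \Delta M_k^2 \sum_k \Delta N_k^2)^{1/2}$ holds for every sample, not just on average.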
Let $\mathcal{E}$ denote the class of bounded, predictable step processes with
jumps at finitely many fixed times. To motivate the construction of general
stochastic integrals and for subsequent needs, we shall establish a basic
identity for elementary integrals.
Lemma 17.10 (covariation of elementary integrals) For any continuous
local martingales $M, N$ and processes $U, V \in \mathcal{E}$, the integrals $U \cdot M$ and
$V \cdot N$ are again continuous local martingales, and we have
$$[U \cdot M, V \cdot N] = (UV) \cdot [M, N] \quad \text{a.s.} \tag{6}$$
Proof: We may clearly take $M_0 = N_0 = 0$. The first assertion follows by
localization from Lemma 17.3. To prove (6), let $U_t = \sum_{k \le n} \xi_k 1_{(t_k, t_{k+1}]}(t)$,
where $\xi_k$ is bounded and $\mathcal{F}_{t_k}$-measurable for each $k$. By localization we may
assume $M$, $N$, and $[M, N]$ to be bounded, so that $M$, $N$ and $MN - [M, N]$
are martingales on $\bar{\mathbb{R}}_+$. Then
$$\begin{aligned}
E(U \cdot M)_\infty N_\infty &= E \sum_j \xi_j (M_{t_{j+1}} - M_{t_j}) \sum_k (N_{t_{k+1}} - N_{t_k}) \\
&= E \sum_k \xi_k (M_{t_{k+1}} N_{t_{k+1}} - M_{t_k} N_{t_k}) \\
&= E \sum_k \xi_k ([M, N]_{t_{k+1}} - [M, N]_{t_k}) \\
&= E(U \cdot [M, N])_\infty.
\end{aligned}$$
Replacing $M$ and $N$ by $M^\tau$ and $N^\tau$ for an arbitrary optional time $\tau$, we
get
$$E(U \cdot M)_\tau N_\tau = E(U \cdot M^\tau)_\infty N^\tau_\infty = E(U \cdot [M^\tau, N^\tau])_\infty = E(U \cdot [M, N])_\tau.$$
By Lemma 7.13 the process $(U \cdot M)N - U \cdot [M, N]$ is then a martingale, and
so $[U \cdot M, N] = U \cdot [M, N]$ a.s. The general formula follows by iteration. □
In order to extend the stochastic integral $V \cdot M$ to more general
processes $V$, it is convenient to take (6) as the characteristic property. Given a
continuous local martingale $M$, let $L(M)$ denote the class of all progressive
processes $V$ such that $(V^2 \cdot [M])_t < \infty$ a.s. for every $t > 0$.
Theorem 17.11 (stochastic integral, Itô, Kunita and Watanabe) For any
continuous local martingale $M$ and process $V \in L(M)$, there exists an a.s.
unique continuous local martingale $V \cdot M$ with $(V \cdot M)_0 = 0$ such that
$[V \cdot M, N] = V \cdot [M, N]$ a.s. for every continuous local martingale $N$.
Proof: To prove the uniqueness, let $M'$ and $M''$ be continuous local
martingales with $M'_0 = M''_0 = 0$ such that $[M', N] = [M'', N] = V \cdot
[M, N]$ a.s. for all continuous local martingales $N$. By linearity we get
$[M' - M'', N] = 0$ a.s. Taking $N = M' - M''$ gives $[M' - M''] = 0$ a.s. But
then $(M' - M'')^2$ is a local martingale starting at 0, and it easily follows
that $M' = M''$ a.s.
To prove the existence, we may first assume that $\|V\|_M^2 = E(V^2 \cdot
[M])_\infty < \infty$. Since $V$ is measurable, we get by Proposition 17.9 and the
Cauchy-Buniakovsky inequality
$$|E(V \cdot [M, N])_\infty| \le \|V\|_M \|N\|, \quad N \in \mathcal{M}^2.$$
The mapping $N \mapsto E(V \cdot [M, N])_\infty$ is then a continuous linear functional
on $\mathcal{M}^2$, and so by Lemma 17.4 there exists an element $V \cdot M \in \mathcal{M}^2$ with
$$E(V \cdot [M, N])_\infty = E(V \cdot M)_\infty N_\infty, \quad N \in \mathcal{M}^2.$$
Now replace $N$ by $N^\tau$ for an arbitrary optional time $\tau$. By Theorem 17.5
and optional sampling we get
$$E(V \cdot [M, N])_\tau = E(V \cdot [M, N]^\tau)_\infty = E(V \cdot [M, N^\tau])_\infty
= E(V \cdot M)_\infty N_\tau = E(V \cdot M)_\tau N_\tau.$$
Since $V$ is progressive, it follows by Lemma 7.13 that $V \cdot [M, N] - (V \cdot M)N$
is a martingale, which means that $[V \cdot M, N] = V \cdot [M, N]$ a.s. The last
relation extends by localization to arbitrary continuous local martingales
$N$.
In the general case, define $\tau_n = \inf\{t > 0;\ (V^2 \cdot [M])_t = n\}$. By the
previous argument there exist some continuous local martingales $V \cdot M^{\tau_n}$
such that, for any continuous local martingale $N$,
$$[V \cdot M^{\tau_n}, N] = V \cdot [M^{\tau_n}, N] \quad \text{a.s.}, \quad n \in \mathbb{N}. \tag{7}$$
For $m < n$ it follows that $(V \cdot M^{\tau_n})^{\tau_m}$ satisfies the corresponding relation
with $[M^{\tau_m}, N]$, and so $(V \cdot M^{\tau_n})^{\tau_m} = V \cdot M^{\tau_m}$ a.s. Hence, there exists
a continuous process $V \cdot M$ with $(V \cdot M)^{\tau_n} = V \cdot M^{\tau_n}$ a.s. for all $n$, and
Lemma 17.1 shows that $V \cdot M$ is again a local martingale. Finally, (7) yields
$[V \cdot M, N] = V \cdot [M, N]$ a.s. on $[0, \tau_n]$ for each $n$, and so the same relation
holds on $\mathbb{R}_+$. □
By Lemma 17.10 we note that the stochastic integral V . M of the last
theorem extends the previously defined elementary integral. It is also clear
that V. M is a.s. bilinear in the pair (V, M) and satisfies the following basic
continuity property.
Lemma 17.12 (continuity) For any continuous local martingales $M_n$ and
processes $V_n \in L(M_n)$, we have $(V_n \cdot M_n)^* \xrightarrow{P} 0$ iff $(V_n^2 \cdot [M_n])_\infty \xrightarrow{P} 0$.
Proof: Recall that $[V_n \cdot M_n] = V_n^2 \cdot [M_n]$ and use Proposition 17.6. □
Before continuing the study of stochastic integrals, it is convenient to
extend the definition to a larger class of integrators. A process $X$ is said
to be a continuous semimartingale if it can be written as a sum $M + A$,
where $M$ is a continuous local martingale and $A$ is a continuous, adapted
process of locally finite variation and with $A_0 = 0$. By Proposition 17.2 the
decomposition $X = M + A$ is then a.s. unique, and it is often referred to as
the canonical decomposition of $X$. By a continuous semimartingale in $\mathbb{R}^d$
we mean a process $X = (X^1, \ldots, X^d)$ such that the component processes
$X^k$ are one-dimensional continuous semimartingales.
Let $L(A)$ denote the class of progressive processes $V$ such that the process
$(V \cdot A)_t = \int_0^t V\,dA$ exists in the sense of ordinary Stieltjes integration.
For any continuous semimartingale $X = M + A$ we may write $L(X) =
L(M) \cap L(A)$, and we define the integral of a process $V \in L(X)$ as the sum
$V \cdot X = V \cdot M + V \cdot A$. Note that $V \cdot X$ is again a continuous semimartingale
with canonical decomposition $V \cdot M + V \cdot A$. For progressive processes $V$,
it is further clear that $V \in L(X)$ iff $V^2 \in L([M])$ and $V \in L(A)$.
From Lemma 17.12 we may easily deduce the following stochastic version
of the dominated convergence theorem.
Corollary 17.13 (dominated convergence) For any continuous
semimartingale $X$, let $U, V, V_1, V_2, \ldots \in L(X)$ with $|V_n| \le U$ and $V_n \to V$. Then
$(V_n \cdot X - V \cdot X)_t^* \xrightarrow{P} 0$, $t \ge 0$.
Proof: Assume that $X = M + A$. Since $U \in L(X)$, we have $U^2 \in L([M])$
and $U \in L(A)$. Hence, by dominated convergence for ordinary Stieltjes
integrals, $((V_n - V)^2 \cdot [M])_t \to 0$ and $(V_n \cdot A - V \cdot A)_t^* \to 0$ a.s. By Lemma
17.12 the former convergence implies $(V_n \cdot M - V \cdot M)_t^* \xrightarrow{P} 0$, and the
assertion follows. □
The next result extends the elementary chain rule of Lemma 1.23 to
stochastic integrals.
Proposition 17.14 (chain rule) Consider a continuous semimartingale
$X$ and two progressive processes $U$ and $V$, where $V \in L(X)$. Then $U \in
L(V \cdot X)$ iff $UV \in L(X)$, in which case $U \cdot (V \cdot X) = (UV) \cdot X$ a.s.
Proof: Let $M + A$ be the canonical decomposition of $X$. Then $U \in L(V \cdot X)$
iff $U^2 \in L([V \cdot M])$ and $U \in L(V \cdot A)$, whereas $UV \in L(X)$ iff $(UV)^2 \in
L([M])$ and $UV \in L(A)$. Since $[V \cdot M] = V^2 \cdot [M]$, the two pairs of conditions
are equivalent.
The formula $U \cdot (V \cdot A) = (UV) \cdot A$ is elementary. To see that even
$U \cdot (V \cdot M) = (UV) \cdot M$ a.s., let $N$ be an arbitrary continuous local martingale,
and note that
$$[(UV) \cdot M, N] = (UV) \cdot [M, N] = U \cdot (V \cdot [M, N]) = U \cdot [V \cdot M, N] = [U \cdot (V \cdot M), N].$$
□
The next result shows how the stochastic integral behaves under optional
stopping.
Proposition 17.15 (optional stopping) For any continuous
semimartingale $X$, process $V \in L(X)$, and optional time $\tau$, we have a.s.
$$(V \cdot X)^\tau = V \cdot X^\tau = (V 1_{[0,\tau]}) \cdot X.$$
Proof: The relations being obvious for ordinary Stieltjes integrals, we
may assume that $X = M$ is a continuous local martingale. Then $(V \cdot M)^\tau$
is a continuous local martingale starting at 0, and we have
$$[(V \cdot M)^\tau, N] = [V \cdot M, N^\tau] = V \cdot [M, N^\tau] = V \cdot [M^\tau, N]
= V \cdot [M, N]^\tau = (V 1_{[0,\tau]}) \cdot [M, N].$$
Thus, $(V \cdot M)^\tau$ satisfies the conditions characterizing the integrals $V \cdot M^\tau$
and $(V 1_{[0,\tau]}) \cdot M$. □
We may extend the definitions of quadratic variation and covariation
to arbitrary continuous semimartingales $X$ and $Y$ with canonical
decompositions $M + A$ and $N + B$, respectively, by putting $[X] = [M]$ and
$[X, Y] = [M, N]$. As a key step toward the development of a stochastic
calculus, we show how the covariation process can be expressed in terms
of stochastic integrals. In the martingale case, the result is implicit in the
proof of Theorem 17.5.
Theorem 17.16 (integration by parts) For any continuous
semimartingales $X$ and $Y$, we have a.s.
$$XY = X_0 Y_0 + X \cdot Y + Y \cdot X + [X, Y]. \tag{8}$$
Proof: We may take $X = Y$, since the general result will then follow by
polarization. First let $X = M \in \mathcal{M}^2$, and define $V^n$ and $Q^n$ as in the proof
of Theorem 17.5. Then $V^n \to M$ and $|V^n_t| \le M_t^* < \infty$, and so Corollary
17.13 yields $(V^n \cdot M)_t \xrightarrow{P} (M \cdot M)_t$ for each $t \ge 0$. Thus, (8) follows in this
case as we let $n \to \infty$ in the relation $M^2 = 2V^n \cdot M + Q^n$ from (3), and it extends
by localization to general continuous local martingales $M$ with $M_0 = 0$. If
instead $X = A$, formula (8) reduces to $A^2 = 2A \cdot A$, which holds by Fubini's
theorem.
Turning to the general case, we may assume that $X_0 = 0$, since the
formula for general $X_0$ will then follow by an easy computation from the
result for $X - X_0$. In this case (8) reduces to $X^2 = 2X \cdot X + [M]$. Subtracting
the formulas for $M^2$ and $A^2$, it remains to prove that $AM = A \cdot M + M \cdot A$
a.s. Then fix any $t > 0$, and introduce the processes
$$A^n_s = A_{(k-1)t/n}, \quad M^n_s = M_{kt/n}, \qquad s \in t(k-1, k]/n, \quad k, n \in \mathbb{N},$$
which satisfy
$$A_t M_t = (A^n \cdot M)_t + (M^n \cdot A)_t, \quad n \in \mathbb{N}.$$
Here $(A^n \cdot M)_t \xrightarrow{P} (A \cdot M)_t$ by Corollary 17.13 and $(M^n \cdot A)_t \to (M \cdot A)_t$
by dominated convergence for ordinary Stieltjes integrals. □
The terms quadratic variation and covariation are justified by the
following result, which extends Theorem 13.9 for Brownian motion.
Proposition 17.17 (approximation, Fisk) Let $X$ and $Y$ be continuous
semimartingales, fix any $t > 0$, and consider for every $n \in \mathbb{N}$ a partition
$0 = t_{n,0} < t_{n,1} < \cdots < t_{n,k_n} = t$ such that $\max_k (t_{n,k} - t_{n,k-1}) \to 0$. Then
$$\zeta_n \equiv \sum_k (X_{t_{n,k}} - X_{t_{n,k-1}})(Y_{t_{n,k}} - Y_{t_{n,k-1}}) \xrightarrow{P} [X, Y]_t. \tag{9}$$
Proof: We may clearly assume that $X_0 = Y_0 = 0$. Introduce the
predictable step processes
$$X^n_s = X_{t_{n,k-1}}, \quad Y^n_s = Y_{t_{n,k-1}}, \qquad s \in (t_{n,k-1}, t_{n,k}], \quad k, n \in \mathbb{N},$$
and note that
$$X_t Y_t = (X^n \cdot Y)_t + (Y^n \cdot X)_t + \zeta_n, \quad n \in \mathbb{N}.$$
Since $X^n \to X$ and $Y^n \to Y$, and also $(X^n)_t^* \le X_t^* < \infty$ and $(Y^n)_t^* \le
Y_t^* < \infty$, we get by Corollary 17.13 and Theorem 17.16
$$\zeta_n \xrightarrow{P} X_t Y_t - (X \cdot Y)_t - (Y \cdot X)_t = [X, Y]_t.$$
□
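The approximating sums in (9) also make the convention $[X] = [M]$ visible numerically: adding a finite-variation part to a martingale changes the sum of squared increments only by a term that vanishes as the mesh shrinks. The sketch below is an illustration under assumptions not made in the text: $M$ is approximated by Gaussian increments on a uniform grid (so that $[M]_t = t$), the drift is $A_s = 3s$, and the function name is hypothetical.

```python
import random

def quadratic_sums(n=10_000, t=1.0, drift=3.0, seed=3):
    """Return (sum dB^2, sum dX^2) on a uniform grid, where X = B + A, A_s = drift * s."""
    rng = random.Random(seed)
    h = t / n
    sd = h ** 0.5
    qv_martingale = 0.0
    qv_semimartingale = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, sd)      # martingale increment
        dx = db + drift * h          # semimartingale increment
        qv_martingale += db * db
        qv_semimartingale += dx * dx
    return qv_martingale, qv_semimartingale

qv_b, qv_x = quadratic_sums()
print(qv_b, qv_x)
```

Both sums are close to $t = 1$, and their difference is of the order of the mesh size.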
We proceed to prove a version of Itô's formula, arguably the most
important formula in modern probability. The result shows that the class of
continuous semimartingales is preserved under smooth mappings; it also
exhibits the canonical decomposition of the image process in terms of the
components of the original process. Extended versions appear in Corollaries
17.19 and 17.20 as well as in Theorems 22.5 and 26.7.
Let $C^k = C^k(\mathbb{R}^d)$ denote the class of $k$ times continuously differentiable
functions on $\mathbb{R}^d$. When $f \in C^2$, we write $f'_i$ and $f''_{ij}$ for the first- and second-
order partial derivatives of $f$. Here and below, summation over repeated
indices is understood.
Theorem 17.18 (substitution rule, Itô) For any continuous
semimartingale $X$ in $\mathbb{R}^d$ and function $f \in C^2(\mathbb{R}^d)$, we have a.s.
$$f(X) = f(X_0) + f'_i(X) \cdot X^i + \tfrac{1}{2} f''_{ij}(X) \cdot [X^i, X^j]. \tag{10}$$
The result is often written in differential form as
$$df(X) = f'_i(X)\,dX^i + \tfrac{1}{2} f''_{ij}(X)\,d[X^i, X^j].$$
It is suggestive to think of Itô's formula as a second-order Taylor expansion
$$df(X) = f'_i(X)\,dX^i + \tfrac{1}{2} f''_{ij}(X)\,dX^i\,dX^j,$$
where the second-order differential $dX^i\,dX^j$ is interpreted as $d[X^i, X^j]$.
If $X$ has canonical decomposition $M + A$, we get the corresponding
decomposition of $f(X)$ by substituting $M^i + A^i$ for $X^i$ on the right of (10).
When $M = 0$, the last term vanishes, and (10) reduces to the familiar
substitution rule for ordinary Stieltjes integrals. In general, the appearance of
this Itô correction term shows that the Itô integral does not obey the rules
of ordinary calculus.
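The role of the correction term in (10) can be seen in a discretized setting. The Python sketch below evaluates the two sums corresponding to $f'(X) \cdot X$ and $\frac{1}{2} f''(X) \cdot [X]$ for $f(x) = x^3$ along a scaled simple random walk. This is an illustration only: the random walk with steps $\pm\sqrt{dt}$ is an assumed stand-in for Brownian motion, the choice of $f$ is arbitrary, and the function name is hypothetical. For this walk the residual $f(X_t) - f(X_0)$ minus the two sums equals $dt \cdot X_t$ exactly, so it is small for a fine grid.

```python
import random

def ito_check(n=10_000, t=1.0, seed=1):
    """Discrete Ito sums for f(x) = x^3 along a +/- sqrt(dt) random walk from 0."""
    rng = random.Random(seed)
    dt = t / n
    step = dt ** 0.5
    x = 0.0
    stochastic_integral = 0.0   # sum of f'(X_{k-1}) * dX_k, with f'(x) = 3x^2
    correction = 0.0            # (1/2) * sum of f''(X_{k-1}) * dX_k^2, with f''(x) = 6x
    for _ in range(n):
        dx = step if rng.random() < 0.5 else -step
        stochastic_integral += 3.0 * x * x * dx
        correction += 0.5 * 6.0 * x * dx * dx
        x += dx
    return x ** 3, stochastic_integral, correction

lhs, integral, correction = ito_check()
print(lhs, integral + correction, correction)
```

Dropping `correction` from the right-hand side generally leaves a discrepancy that does not shrink with the grid, which is the discrete counterpart of the statement that the Itô integral does not obey the rules of ordinary calculus.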
Proof of Theorem 17.18: For notational convenience we may assume that
$d = 1$, the general case being similar. Then fix a one-dimensional,
continuous semimartingale $X$, and let $\mathcal{C}$ denote the class of functions $f \in C^2$
satisfying (10), now appearing in the form
$$f(X) = f(X_0) + f'(X) \cdot X + \tfrac{1}{2} f''(X) \cdot [X]. \tag{11}$$
The class $\mathcal{C}$ is clearly a linear subspace of $C^2$ containing the functions
$f(x) = 1$ and $f(x) = x$. We shall prove that $\mathcal{C}$ is closed under multiplication
and hence contains all polynomials.
To see this, assume that (11) holds for both $f$ and $g$. Then $F = f(X)$
and $G = g(X)$ are continuous semimartingales, and so, by the definition of
the integral together with Proposition 17.14 and Theorem 17.16, we have
$$\begin{aligned}
(fg)(X) - (fg)(X_0) &= FG - F_0 G_0 = F \cdot G + G \cdot F + [F, G] \\
&= F \cdot (g'(X) \cdot X + \tfrac{1}{2} g''(X) \cdot [X]) \\
&\quad + G \cdot (f'(X) \cdot X + \tfrac{1}{2} f''(X) \cdot [X]) + [f'(X) \cdot X, g'(X) \cdot X] \\
&= (fg' + f'g)(X) \cdot X + \tfrac{1}{2} (fg'' + 2f'g' + f''g)(X) \cdot [X] \\
&= (fg)'(X) \cdot X + \tfrac{1}{2} (fg)''(X) \cdot [X].
\end{aligned}$$
Now let $f \in C^2$ be arbitrary. By Weierstrass' approximation theorem,
we may choose some polynomials $p_1, p_2, \ldots$ such that $\sup_{|x| \le c} |p_n(x) -
f''(x)| \to 0$ for every $c > 0$. Integrating the $p_n$ twice yields polynomials $f_n$
satisfying
$$\sup_{|x| \le c} \big(|f_n(x) - f(x)| \vee |f_n'(x) - f'(x)| \vee |f_n''(x) - f''(x)|\big) \to 0, \quad c > 0.$$
In particular, $f_n(X_t) \to f(X_t)$ for each $t > 0$. Letting $M + A$ be the
canonical decomposition of $X$ and using dominated convergence for ordinary
Stieltjes integrals, we get for any $t \ge 0$
$$(f_n'(X) \cdot A + \tfrac{1}{2} f_n''(X) \cdot [X])_t \to (f'(X) \cdot A + \tfrac{1}{2} f''(X) \cdot [X])_t.$$
Similarly, $((f_n'(X) - f'(X))^2 \cdot [M])_t \to 0$ for all $t$, and so by Lemma 17.12
$$(f_n'(X) \cdot M)_t \xrightarrow{P} (f'(X) \cdot M)_t, \quad t \ge 0.$$
Thus, equation (11) for the polynomials $f_n$ extends in the limit to the same
formula for $f$. □
We sometimes need a local version of the last theorem, involving stochastic integrals up to the time $\zeta_D$ when X first leaves a given domain $D \subset \mathbb{R}^d$. If X is continuous and adapted, then $\zeta_D$ is clearly predictable, in the sense of being announced by some optional times $\tau_n \uparrow \zeta_D$ such that $\tau_n < \zeta_D$ a.s. on $\{\zeta_D > 0\}$ for all n. In fact, writing $\rho$ for the Euclidean metric in $\mathbb{R}^d$, we may choose
$$\tau_n = \inf\{t \in [0, n];\ \rho(X_t, D^c) \le n^{-1}\}, \quad n \in \mathbb{N}. \tag{12}$$
We say that X is a semimartingale on $[0, \zeta_D)$ if the stopped process $X^{\tau_n}$ is a semimartingale in the usual sense for every $n \in \mathbb{N}$. In that case, we may define the covariation processes $[X^i, X^j]$ on the interval $[0, \zeta_D)$ by requiring $[X^i, X^j]^{\tau_n} = [(X^i)^{\tau_n}, (X^j)^{\tau_n}]$ a.s. for every n. Stochastic integrals with respect to $X^1, \ldots, X^d$ are defined on $[0, \zeta_D)$ in a similar way.
Corollary 17.19 (local Itô formula) For any domain $D \subset \mathbb{R}^d$, let X be a continuous semimartingale on $[0, \zeta_D)$. Then (10) holds a.s. on $[0, \zeta_D)$ for every $f \in C^2(D)$.
Proof: Choose some functions $f_n \in C^2(\mathbb{R}^d)$ with $f_n(x) = f(x)$ when $\rho(x, D^c) \ge n^{-1}$. Applying Theorem 17.18 to $f_n(X^{\tau_n})$ with $\tau_n$ as in (12), we get (10) on $[0, \tau_n]$. Since n was arbitrary, the result extends to $[0, \zeta_D)$. □
By a complex-valued, continuous semimartingale we mean a process of the form Z = X + iY, where X and Y are real continuous semimartingales. The bilinearity of the covariation process suggests that we define the quadratic variation of Z as
$$[Z] = [Z, Z] = [X + iY, X + iY] = [X] + 2i[X, Y] - [Y].$$
Let L(Z) denote the class of processes W = U + iV with $U, V \in L(X) \cap L(Y)$. For such a process W, we define the integral by
$$W \cdot Z = (U + iV) \cdot (X + iY) = U \cdot X - V \cdot Y + i(U \cdot Y + V \cdot X).$$
Corollary 17.20 (conformal mapping) Let f be an analytic function on some domain $D \subset \mathbb{C}$. Then (10) holds for any D-valued, continuous semimartingale Z.
Proof: Writing f(x + iy) = g(x, y) + ih(x, y) for any $x + iy \in D$, we get
$$g'_1 + i h'_1 = f', \qquad g'_2 + i h'_2 = i f',$$
and so by iteration
$$g''_{11} + i h''_{11} = f'', \qquad g''_{12} + i h''_{12} = i f'', \qquad g''_{22} + i h''_{22} = -f''.$$
Equation (10) now follows for Z = X + iY, as we apply Corollary 17.19 to the semimartingale (X, Y) and the functions g and h. □
We also consider a modification of the Itô integral that does obey the rules of ordinary calculus. Assuming both X and Y to be continuous semimartingales, we define the Fisk-Stratonovich integral by
$$\int_0^t X \circ dY = (X \cdot Y)_t + \tfrac12 [X, Y]_t, \quad t \ge 0, \tag{13}$$
or in differential form $X \circ dY = X\,dY + \tfrac12\, d[X, Y]$, where the first term on the right is an ordinary Itô integral.
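As a hedged illustration of (13), the sketch below compares left-point sums (a discrete analogue of the Itô integral) with trapezoidal sums (a discrete analogue of the Fisk-Stratonovich integral) along arbitrary finite sequences. The function names are hypothetical; the identity "trapezoidal sum = left-point sum + half the discrete covariation" is exact for any pair of sequences, and for Y = X the trapezoidal sum telescopes to half the increment of x^2, as ordinary calculus would suggest.

```python
def ito_sum(x, y):
    """Left-point sum of x_{k-1} * (y_k - y_{k-1})."""
    return sum(x[k - 1] * (y[k] - y[k - 1]) for k in range(1, len(y)))


def covariation(x, y):
    """Sum of products of increments: discrete analogue of [X, Y]."""
    return sum((x[k] - x[k - 1]) * (y[k] - y[k - 1]) for k in range(1, len(y)))


def stratonovich_sum(x, y):
    """Trapezoidal sum of 0.5 * (x_k + x_{k-1}) * (y_k - y_{k-1})."""
    return sum(0.5 * (x[k] + x[k - 1]) * (y[k] - y[k - 1]) for k in range(1, len(y)))


x = [0.0, 1.0, 3.0, 2.0]
y = [1.0, 0.0, 2.0, 2.0]
# Discrete form of (13): both sides agree term by term.
print(stratonovich_sum(x, y) == ito_sum(x, y) + 0.5 * covariation(x, y))
# Ordinary chain rule for the trapezoidal sum with Y = X.
print(stratonovich_sum(x, x) == 0.5 * (x[-1] ** 2 - x[0] ** 2))
```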
Corollary 17.21 (modified substitution rule, Fisk, Stratonovich) For any continuous semimartingale X in $\mathbb{R}^d$ and function $f \in C^3(\mathbb{R}^d)$, we have a.s.
$$f(X_t) = f(X_0) + \int_0^t f'_i(X) \circ dX^i, \quad t \ge 0.$$
Proof: By Itô's formula,
$$f'_i(X) = f'_i(X_0) + f''_{ij}(X) \cdot X^j + \tfrac12 f'''_{ijk}(X) \cdot [X^j, X^k].$$
Using Itô's formula again, together with (6) and (13), we get
$$\begin{aligned}
\int_0 f'_i(X) \circ dX^i &= f'_i(X) \cdot X^i + \tfrac12 [f'_i(X), X^i] \\
&= f'_i(X) \cdot X^i + \tfrac12 f''_{ij}(X) \cdot [X^j, X^i] = f(X) - f(X_0).
\end{aligned}$$
□
Unfortunately, the more convenient substitution rule of Corollary 17.21 comes at a high price: The new integral does not preserve the martingale property, and it requires even the integrand to be a continuous semimartingale. It is the latter restriction that forces us to impose stronger regularity conditions on the function f in the substitution rule.
Our next task is to establish a basic uniqueness property, justifying our reference to the process V · M in Theorem 17.11 as an integral.
Theorem 17.22 (uniqueness) The integral V · M in Theorem 17.11 is the a.s. unique linear extension of the elementary stochastic integral such that, for any t > 0, the convergence $(V_n^2 \cdot [M])_t \xrightarrow{P} 0$ implies $(V_n \cdot M)_t^* \xrightarrow{P} 0$.
The statement follows immediately from Lemmas 17.10 and 17.12,
together with the following approximation of progressive processes by
predictable step processes.
Lemma 17.23 (approximation) For any continuous semimartingale X = M + A and process $V \in L(X)$, there exist some processes $V_1, V_2, \ldots \in \mathcal{E}$ such that a.s. $((V_n - V)^2 \cdot [M])_t \to 0$ and $((V_n - V) \cdot A)_t^* \to 0$ for every t > 0.
Proof: It is enough to take t = 1, since we can then combine the processes $V_n$ for disjoint finite intervals to construct an approximating sequence on $\mathbb{R}_+$. Furthermore, it suffices to consider approximations in the sense of convergence in probability, since the a.s. versions will then follow for a suitable subsequence. This allows us to perform the construction in steps, first approximating V by bounded and progressive processes V', next approximating each V' by continuous and adapted processes V'', and finally approximating each V'' by predictable step processes V'''.
Here the first and last steps are elementary, so we may concentrate on the second step. Then let V be bounded. We need to construct some continuous, adapted processes $V_n$ such that $((V_n - V)^2 \cdot [M])_1 \to 0$ and $((V_n - V) \cdot A)_1^* \to 0$ a.s. Since the $V_n$ can be taken to be uniformly bounded, we may replace the former condition by $(|V_n - V| \cdot [M])_1 \to 0$ a.s. Thus, it is enough to establish the approximation $(|V_n - V| \cdot A)_1 \to 0$ in the case when A is a nondecreasing, continuous, adapted process with $A_0 = 0$. Replacing $A_t$ by $A_t + t$ if necessary, we may even assume that A is strictly increasing.
To construct the required approximations, we may introduce the inverse process $\tau_s = \sup\{t \ge 0;\ A_t \le s\}$, and define
$$V^h_t = h^{-1} \int_{\tau(A_t - h)}^t V\,dA = h^{-1} \int_{(A_t - h)_+}^{A_t} V(\tau_s)\,ds, \quad t, h > 0.$$
By Theorem 2.15 we have $V^h \circ \tau \to V \circ \tau$ as $h \to 0$, a.e. on $[0, A_1]$. Thus, by dominated convergence,
$$\int_0^1 |V^h - V|\,dA = \int_0^{A_1} |V^h(\tau_s) - V(\tau_s)|\,ds \to 0.$$
The processes $V^h$ are clearly continuous. To prove that they are also adapted, we note that the process $\tau(A_t - h)$ is adapted for every h > 0 by the definition of $\tau$. Since V is progressive, it is further seen that V · A is adapted and hence progressive. The adaptedness of $(V \cdot A)_{\tau(A_t - h)}$ now follows by composition. □
Though the class L(X) of stochastic integrands is sufficient for most
purposes, it is sometimes useful to allow the integration of slightly more
general processes. Given any continuous semimartingale X = M + A, let $\hat L(X)$ denote the class of product-measurable processes V such that $(V - \tilde V) \cdot [M] = 0$ and $(V - \tilde V) \cdot A = 0$ a.s. for some process $\tilde V \in L(X)$. For $V \in \hat L(X)$ we define $V \cdot X = \tilde V \cdot X$ a.s. The extension clearly enjoys all the previously established properties of stochastic integration.
It is often important to see how semimartingales, covariation processes,
and stochastic integrals are transformed by a random time-change. Let us
then consider a nondecreasing, right-continuous family of finite optional
times $\tau_s$, $s \ge 0$, here referred to as a finite random time-change $\tau$. If even $\mathcal{F}$ is right-continuous, then by Lemma 7.3 the same thing is true for the induced filtration $\mathcal{G}_s = \mathcal{F}_{\tau_s}$, $s \ge 0$. A process X is said to be $\tau$-continuous if it is a.s. continuous on $\mathbb{R}_+$ and constant on every interval $[\tau_{s-}, \tau_s]$, $s \ge 0$, where $\tau_{0-} = X_{0-} = 0$ by convention.
Theorem 17.24 (random time-change, Kazamaki) Let $\tau$ be a finite random time-change with induced filtration $\mathcal{G}$, and let X = M + A be a $\tau$-continuous $\mathcal{F}$-semimartingale. Then $X \circ \tau$ is a continuous $\mathcal{G}$-semimartingale with canonical decomposition $M \circ \tau + A \circ \tau$ and such that $[X \circ \tau] = [X] \circ \tau$ a.s. Furthermore, $V \in L(X)$ implies $V \circ \tau \in \hat L(X \circ \tau)$ and
$$(V \circ \tau) \cdot (X \circ \tau) = (V \cdot X) \circ \tau \quad \text{a.s.} \tag{14}$$
Proof: It is easy to check that the time-change $X \mapsto X \circ \tau$ preserves continuity, adaptedness, monotonicity, and the local martingale property. In particular, $X \circ \tau$ is then a continuous $\mathcal{G}$-semimartingale with canonical decomposition $M \circ \tau + A \circ \tau$. Since $M^2 - [M]$ is a continuous local martingale, the same thing is true for the time-changed process $M^2 \circ \tau - [M] \circ \tau$, and so
$$[X \circ \tau] = [M \circ \tau] = [M] \circ \tau = [X] \circ \tau \quad \text{a.s.}$$
If $V \in L(X)$, we also note that $V \circ \tau$ is product-measurable, since this is true for both V and $\tau$.
Fixing any t > 0 and using the $\tau$-continuity of X, we get
$$(1_{[0,t]} \circ \tau) \cdot (X \circ \tau) = 1_{[0, \tau_t^{-1}]} \cdot (X \circ \tau) = (X \circ \tau)^{\tau_t^{-1}} = (1_{[0,t]} \cdot X) \circ \tau,$$
which proves (14) when $V = 1_{[0,t]}$. If X has locally finite variation, the result extends by a monotone class argument and monotone convergence to arbitrary $V \in L(X)$. In general, Lemma 17.23 yields the existence of some continuous, adapted processes $V_1, V_2, \ldots$ such that $\int (V_n - V)^2\,d[M] \to 0$ and $\int |(V_n - V)\,dA| \to 0$ a.s. By (14) the corresponding properties hold for the time-changed processes, and since the processes $V_n \circ \tau$ are right-continuous and adapted, hence progressive, we obtain $V \circ \tau \in \hat L(X \circ \tau)$.
Now assume instead that the approximating processes $V_1, V_2, \ldots$ are predictable step processes. The previous calculation then shows that (14) holds for each $V_n$, and by Lemma 17.12 the relation extends to V. □
Let us next consider stochastic integrals of processes depending on a
parameter. Given any measurable space $(S, \mathcal{S})$, we say that a process V on $S \times \mathbb{R}_+$ is progressive if its restriction to $S \times [0, t]$ is $\mathcal{S} \otimes \mathcal{B}_t \otimes \mathcal{F}_t$-measurable for every $t \ge 0$, where $\mathcal{B}_t = \mathcal{B}([0, t])$. A simple version of the following result will be useful in Chapter 18.
Theorem 17.25 (dependence on parameter, Doléans, Stricker and Yor) Let X be a continuous semimartingale, fix a measurable space S, and consider a progressive process $V_s(t)$, $s \in S$, $t \ge 0$, such that $V_s \in L(X)$ for every $s \in S$. Then the process $Y_s(t) = (V_s \cdot X)_t$ has a version that is progressive on $S \times \mathbb{R}_+$ and a.s. continuous for each $s \in S$.
Proof: Let M + A be the canonical decomposition of X. Assume the existence of some progressive processes $V^n_s$ on $S \times \mathbb{R}_+$ such that, for any $t \ge 0$ and $s \in S$,
$$((V^n_s - V_s)^2 \cdot [M])_t \xrightarrow{P} 0, \qquad ((V^n_s - V_s) \cdot A)_t^* \xrightarrow{P} 0.$$
Then Lemma 17.12 yields $(V^n_s \cdot X - V_s \cdot X)_t^* \xrightarrow{P} 0$ for every s and t. Proceeding as in the proof of Proposition 4.31, we may choose a subsequence $(n_k(s)) \subset \mathbb{N}$, depending measurably on s, such that the same convergence holds a.s. along $(n_k(s))$ for any s and t. Define $Y_{s,t} = \limsup_k (V^{n_k}_s \cdot X)_t$ whenever this is finite, and put $Y_{s,t} = 0$ otherwise. If we can choose versions of the processes $(V^n_s \cdot X)_t$ that are progressive on $S \times \mathbb{R}_+$ and a.s. continuous for each s, then $Y_{s,t}$ is clearly a version of the process $(V_s \cdot X)_t$ with the same properties. This argument will now be applied in three steps.
First we reduce to the case of bounded and progressive integrands by taking $V^n = V 1\{|V| \le n\}$. Next we apply the transformation in the proof of Lemma 17.23, to reduce to the case of continuous and progressive integrands. In the final step, we approximate any continuous, progressive process V by the predictable step processes $V^n_s(t) = V_s(2^{-n}[2^n t])$. Here the integrals $V^n_s \cdot X$ are elementary, and the desired continuity and measurability are obvious by inspection. □
We turn to the related topic of functional representations. To motivate
the problem, note that the construction of the stochastic integral V . X
depends in a subtle way on the underlying probability measure P and
filtration F. Thus, we cannot expect any universal representation F(V, X)
of the integral process V · X. In view of Proposition 4.31, one might still hope for a modified representation $F(\mu, V, X)$, where $\mu$ denotes the distribution of (V, X). Even this could be too optimistic, however, since the canonical decomposition of X may also depend on $\mathcal{F}$.
Dictated by our needs in Chapter 21, we restrict our attention to a very special situation, which is still general enough to cover most applications of interest. Fixing any progressive functions $\sigma^i_j$ and $b^i$ of suitable dimension, defined on the path space $C(\mathbb{R}_+, \mathbb{R}^d)$, we may consider an arbitrary adapted process X satisfying the stochastic differential equation
$$dX^i_t = \sigma^i_j(t, X)\,dB^j_t + b^i(t, X)\,dt, \tag{15}$$
where B is a Brownian motion in $\mathbb{R}^r$. A detailed discussion of such equations is given in Chapter 21. For the moment, we need only the simple fact from Lemma 21.1 that the coefficients $\sigma^i_j(t, X)$ and $b^i(t, X)$ are again progressive. Write $a^{ij} = \sigma^i_k \sigma^j_k$.
Proposition 17.26 (functional representation) For any progressive functions $\sigma$, b, and f of suitable dimension, there exists a measurable mapping
$$F\colon \mathcal{P}(C(\mathbb{R}_+, \mathbb{R}^d)) \times C(\mathbb{R}_+, \mathbb{R}^d) \to C(\mathbb{R}_+, \mathbb{R}) \tag{16}$$
such that, whenever X is a solution to (15) with $\mathcal{L}(X) = \mu$ and $f^i(X) \in L(X^i)$ for all i, we have $f^i(X) \cdot X^i = F(\mu, X)$ a.s.
Proof: From (15) we note that X is a semimartingale with covariation processes $[X^i, X^j] = a^{ij}(X) \cdot \lambda$ and drift components $b^i(X) \cdot \lambda$. Hence, $f^i(X) \in L(X^i)$ for all i iff the processes $(f^i)^2 a^{ii}(X)$ and $f^i b^i(X)$ are a.s. Lebesgue integrable. Note that this holds in particular when f is bounded. Now assume that $f_1, f_2, \ldots$ are progressive with
$$(f^i_n - f^i)^2 a^{ii}(X) \cdot \lambda \to 0, \qquad |(f^i_n - f^i) b^i(X)| \cdot \lambda \to 0. \tag{17}$$
Then $(f^i_n(X) \cdot X^i - f^i(X) \cdot X^i)_t^* \xrightarrow{P} 0$ for every $t \ge 0$ by Lemma 17.12. Thus, if $f^i_n(X) \cdot X^i = F_n(\mu, X)$ a.s. for some measurable mappings $F_n$ as in (16), then Proposition 4.31 yields a similar representation for the limit $f^i(X) \cdot X^i$.
As in the preceding proof, we may apply this argument in three steps, reducing first to the case when f is bounded, next to the case of continuous f, and finally to the case when f is a predictable step function. Here the first and last steps are again elementary. For the second step, we may now use the simpler approximation
$$f_n(t, x) = n \int_{(t - n^{-1})_+}^t f(s, x)\,ds, \quad t \ge 0,\ n \in \mathbb{N},\ x \in C(\mathbb{R}_+, \mathbb{R}^d).$$
By Theorem 2.15 we have $f_n(t, x) \to f(t, x)$ a.e. in t for each $x \in C(\mathbb{R}_+, \mathbb{R}^d)$, and (17) follows by dominated convergence. □
Exercises
1. Show that if M is a local martingale and $\xi$ is an $\mathcal{F}_0$-measurable random variable, then the process $N_t = \xi M_t$ is again a local martingale.
2. Use Fatou's lemma to show that every local martingale $M \ge 0$ with $EM_0 < \infty$ is a supermartingale. Also show by an example that M may fail to be a martingale. (Hint: Let $M_t = X_{t/(1-t)_+}$, where X is a Brownian motion starting at 1, stopped when it reaches 0.)
3. Fix a continuous local martingale M. Show that M and [M] have a.s. the same intervals of constancy. (Hint: For any $r \in \mathbb{Q}_+$, put $\tau = \inf\{t > r;\ [M]_t > [M]_r\}$. Then $M^\tau$ is a continuous local martingale on $[r, \infty)$ with quadratic variation 0, so $M^\tau$ is a.s. constant on $[r, \tau]$. Use a similar argument in the other direction.)
4. For any continuous local martingales $M_n$ starting at 0 and associated optional times $\tau_n$, show that $(M_n)^*_{\tau_n} \xrightarrow{P} 0$ iff $[M_n]_{\tau_n} \xrightarrow{P} 0$. State the corresponding result for stochastic integrals.
5. Show that there exist some continuous semimartingales $X_1, X_2, \ldots$ such that $X_n^* \xrightarrow{P} 0$ and yet $[X_n]_t \not\xrightarrow{P} 0$ for all t > 0. (Hint: Let B be a Brownian motion stopped at time 1, put $A^n_{k2^{-n}} = B_{(k-1)_+ 2^{-n}}$, and interpolate linearly. Define $X_n = B - A^n$.)
6. Consider a Brownian motion B and an optional time $\tau$. Show that $EB_\tau = 0$ when $E\tau^{1/2} < \infty$ and that $EB_\tau^2 = E\tau$ when $E\tau < \infty$. (Hint: Use optional sampling and Theorem 17.7.)
7. Deduce the first inequality in Proposition 17.9 from Proposition 17.17 and the classical Cauchy-Buniakovsky inequality.
8. Prove for any continuous semimartingales X and Y that $[X + Y]^{1/2} \le [X]^{1/2} + [Y]^{1/2}$ a.s.
9. (Kunita and Watanabe) Let M and N be continuous local martingales, and fix any p, q, r > 0 with $p^{-1} + q^{-1} = r^{-1}$. Show that $\|[M, N]_t\|_{2r}^2 \le \|[M]_t\|_p \|[N]_t\|_q$ for all t > 0.
10. Let M, N be continuous local martingales with $M_0 = N_0 = 0$. Show that $M \perp\!\!\!\perp N$ implies [M, N] = 0 a.s. Also show by an example that the converse is false. (Hint: Let M = U · B and N = V · B for a Brownian motion B and suitable $U, V \in L(B)$.)
11. Fix a continuous semimartingale X, and let $U, V \in L(X)$ with U = V a.s. on some set $A \in \mathcal{F}_0$. Show that U · X = V · X a.s. on A. (Hint: Use Proposition 17.15.)
12. Fix a continuous local martingale M, and let $U, U_1, U_2, \ldots$ and $V, V_1, V_2, \ldots \in L(M)$ with $|U_n| \le V_n$, $U_n \to U$, $V_n \to V$, and $((V_n - V) \cdot M)_t^* \xrightarrow{P} 0$ for all t > 0. Show that $(U_n \cdot M)_t \xrightarrow{P} (U \cdot M)_t$ for all t. (Hint: Write $(U_n - U)^2 \le 2(V_n - V)^2 + 8V^2$, and use Theorem 1.21 and Lemmas 4.2 and 17.12.)
13. Let B be a Brownian bridge. Show that $X_t = B_{t \wedge 1}$ is a semimartingale on $\mathbb{R}_+$ w.r.t. the induced filtration. (Hint: Note that $M_t = (1 - t)^{-1} B_t$ is a martingale on [0, 1), integrate by parts, and check that the compensator has finite variation.)
14. Show by an example that the canonical decomposition of a continuous semimartingale may depend on the filtration. (Hint: Let B be Brownian motion with induced filtration $\mathcal{F}$, put $\mathcal{G}_t = \mathcal{F}_t \vee \sigma(B_1)$, and use the preceding result.)
15. Show by stochastic calculus that $t^{-p} B_t \to 0$ a.s. as $t \to \infty$, where B is a Brownian motion and $p > \tfrac12$. (Hint: Integrate by parts to find the canonical decomposition. Compare with the $L^1$-limit.)
16. Extend Theorem 17.16 to a product of n semimartingales.
17. Consider a Brownian bridge X and a bounded, progressive process V with $\int_0^1 V_t\,dt = 0$ a.s. Show that $E \int_0^1 V\,dX = 0$. (Hint: Integrate by parts to get $\int_0^1 V\,dX = \int_0^1 (V - U)\,dB$, where B is a Brownian motion and $U_t = (1 - t)^{-1} \int_t^1 V_s\,ds$.)
18. Show that Proposition 17.17 remains valid for any finite optional times t and $t_{nk}$ satisfying $\max_k (t_{nk} - t_{n,k-1}) \xrightarrow{P} 0$.
19. Let M be a continuous local martingale. Find the canonical decomposition of $|M|^p$ when $p \ge 2$, and deduce for such a p the second relation in Theorem 17.7. (Hint: Use Theorem 17.18. For the last part, use Hölder's inequality.)
20. Let M be a continuous local martingale with $M_0 = 0$ and $[M]_\infty \le 1$. Show for any $r \ge 0$ that $P\{\sup_t M_t \ge r\} \le e^{-r^2/2}$. (Hint: Consider the supermartingale $Z = \exp(cM - c^2 [M]/2)$ for a suitable c > 0.)
21. Let X and Y be continuous semimartingales. Fix a t > 0 and a sequence of partitions $(t_{nk})$ of [0, t] with $\max_k (t_{nk} - t_{n,k-1}) \to 0$. Show that $\tfrac12 \sum_k (Y_{t_{nk}} + Y_{t_{n,k-1}})(X_{t_{nk}} - X_{t_{n,k-1}}) \xrightarrow{P} (Y \circ X)_t$. (Hint: Use Corollary 17.13 and Proposition 17.17.)
22. Show that the Fisk-Stratonovich integral satisfies the chain rule $U \circ (V \circ X) = (UV) \circ X$. (Hint: Reduce to Itô integrals and use Theorems 17.11 and 17.16 and Proposition 17.14.)
23. A process is predictable if it is measurable with respect to the $\sigma$-field in $\mathbb{R}_+ \times \Omega$ induced by all predictable step processes. Show that every predictable process is progressive. Conversely, given a progressive process X and a constant h > 0, show that the process $Y_t = X_{(t-h)_+}$ is predictable.
24. Given a progressive process V and a nondecreasing, continuous, adapted process A, show that there exists some predictable process $\tilde V$ with $|V - \tilde V| \cdot A = 0$ a.s. (Hint: Use Lemma 17.23.)
25. Given the preceding statement, deduce Lemma 17.23. (Hint: Begin with predictable V, using a monotone class argument.)
26. Construct the stochastic integral V · M by approximation from elementary integrals, using Lemmas 17.10 and 17.23. Show that the resulting integral satisfies the relation in Theorem 17.11. (Hint: First let $M \in \mathcal{M}^2$ and $E(V^2 \cdot [M])_\infty < \infty$, and extend by localization.)
27. Let $(V, B) \overset{d}{=} (\tilde V, \tilde B)$, where B and $\tilde B$ are Brownian motions on possibly different filtered probability spaces and $V \in L(B)$, $\tilde V \in L(\tilde B)$. Show that $(V, B, V \cdot B) \overset{d}{=} (\tilde V, \tilde B, \tilde V \cdot \tilde B)$. (Hint: Argue as in the proof of Proposition 17.26.)
28. Let X be a continuous $\mathcal{F}$-semimartingale. Show that X remains a semimartingale conditionally on $\mathcal{F}_0$, and that the conditional quadratic variation agrees with [X]. Also show that if $V \in L(X)$, where V = a(Y) for some continuous process Y and measurable function a, then V remains conditionally X-integrable, and the conditional integral agrees with V · X. (Hint: Conditioning on $\mathcal{F}_0$ preserves martingales.)
Chapter 18
Continuous Martingales
and Brownian Motion
Real and complex exponential martingales; martingale charac-
terization of Brownian motion; random time-change of mar-
tingales; integral representation of martingales; iterated and
multiple integrals; change of measure and Girsanov's theorem; Cameron-Martin theorem; Wald's identity and Novikov's condition
This chapter deals with a wide range of applications of the stochastic
calculus, the principal tools of which were introduced in the preceding
chapter. A recurrent theme is the notion of exponential martingales, which
appear in both a real and a complex variety. Exploring the latter yields
an effortless approach to Levy's celebrated martingale characterization of
Brownian motion as well as to the basic random time-change reduction of
isotropic continuous local martingales to a Brownian motion. By applying
the latter result to suitable compositions of Brownian motion with har-
monic or analytic functions, we may deduce some important information
about Brownian motion in JRd. Similar methods can be used to analyze a
variety of other transformations that lead to Gaussian processes.
As a further application of the exponential martingales, we shall derive
stochastic integral representations of Brownian functionals and martingales
and examine their relationship to the chaos expansions obtained by differ-
ent methods in Chapter 13. In this context, we show how the previously
introduced multiple Wiener-Itô integrals can be expressed as iterated single
Itô integrals. A similar problem, of crucial importance for Chapter 21, is to
represent a continuous local martingale with absolutely continuous covari-
ation processes in terms of stochastic integrals with respect to a suitable
Brownian motion.
Our last main topic is to examine the transformations induced by an
absolutely continuous change of probability measure. The density process
turns out to be a real exponential martingale, and any continuous local
martingale in the original setting will remain a martingale under the new
measure, apart from an additional drift term. The observation is useful for
applications, where it is often employed to remove the drift from a given
semimartingale. The appropriate change of measure then depends on the
process, and it becomes important to derive effective criteria for a proposed
exponential process to be a true martingale.
Our present exposition may be regarded as a continuation of the dis-
cussion of martingales and Brownian motion from Chapters 7 and 13,
respectively. Changes of time and measure are both important for the theory of stochastic differential equations, as developed in Chapters 21 and 23. The time-change results for continuous martingales have a counterpart for point processes explored in Chapter 25, where general Poisson processes play a role similar to that of the Gaussian processes here. The results about
changes of measure are extended in Chapter 26 to the context of possibly
discontinuous semimartingales.
To elaborate on the new ideas, we begin with an introduction of complex
exponential martingales. It is instructive to compare them with the real
versions appearing in Lemma 18.21.
Lemma 18.1 (complex exponential martingales) Let M be a real continuous local martingale with $M_0 = 0$. Then
$$Z_t = \exp\bigl(iM_t + \tfrac12 [M]_t\bigr), \quad t \ge 0,$$
is a complex local martingale satisfying $Z_t = 1 + i(Z \cdot M)_t$ a.s.
Proof: Applying Corollary 17.20 to the complex-valued semimartingale $X_t = iM_t + \tfrac12 [M]_t$ and the entire function $f(z) = e^z$, we get
$$dZ_t = Z_t\bigl(dX_t + \tfrac12\,d[X]_t\bigr) = Z_t\bigl(i\,dM_t + \tfrac12\,d[M]_t - \tfrac12\,d[M]_t\bigr) = iZ_t\,dM_t.$$
□
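As a consistency check rather than a proof, one may take M = B, a Brownian motion, so that [M]_t = t and Z_t = exp(iB_t + t/2). On a bounded time interval Z is bounded, so the local martingale is a true martingale and E Z_t should equal Z_0 = 1 for every fixed t. The sketch below evaluates this expectation by trapezoidal quadrature against the N(0, t) density; the function name, truncation width, and grid size are assumptions made for illustration.

```python
import cmath
import math


def expected_complex_exponential(t, half_width=12.0, n=20_000):
    """Approximate E[exp(i*B_t + t/2)] for B_t ~ N(0, t) by the trapezoidal rule.

    The integral is truncated to [-half_width*sd, half_width*sd], sd = sqrt(t).
    """
    sd = math.sqrt(t)
    a, b = -half_width * sd, half_width * sd
    h = (b - a) / n
    total = 0j
    for k in range(n + 1):
        x = a + k * h
        density = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * cmath.exp(1j * x + t / 2.0) * density
    return total * h


for t in (0.5, 1.0, 2.0):
    # The factor exp(t/2) compensates the decay exp(-t/2) of E[exp(i*B_t)].
    print(t, abs(expected_complex_exponential(t) - 1.0) < 1e-6)
```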
The next result gives the basic connection between continuous martingales and Gaussian processes. For any subset K of a Hilbert space, we write $\hat K$ for the closed linear subspace generated by K.
Lemma 18.2 (isometries and Gaussian processes) Given a subset K of a Hilbert space H, consider for each $h \in K$ a continuous local $\mathcal{F}$-martingale $M^h$ with $M^h_0 = 0$ such that
$$[M^h, M^k]_\infty = \langle h, k \rangle \ \text{a.s.}, \quad h, k \in K. \tag{1}$$
Then there exists an isonormal Gaussian process $\eta \perp\!\!\!\perp \mathcal{F}_0$ on $\hat K$ such that $M^h_\infty = \eta h$ a.s. for all $h \in K$.
Proof: Fix any linear combination $N_t = u_1 M^{h_1}_t + \cdots + u_n M^{h_n}_t$, and conclude from (1) that
$$[N]_\infty = \sum_{j,k} u_j u_k [M^{h_j}, M^{h_k}]_\infty = \sum_{j,k} u_j u_k \langle h_j, h_k \rangle = \|h\|^2,$$
where $h = u_1 h_1 + \cdots + u_n h_n$. The process $Z = \exp(iN + \tfrac12 [N])$ is a.s. bounded, and so by Lemma 18.1 it is a uniformly integrable martingale. Writing $\xi = N_\infty$, we hence obtain for any $A \in \mathcal{F}_0$
$$PA = E[Z_\infty; A] = E\bigl[\exp\bigl(iN_\infty + \tfrac12 [N]_\infty\bigr); A\bigr] = E[e^{i\xi}; A]\, e^{\|h\|^2/2}.$$
Since $u_1, \ldots, u_n$ were arbitrary, we conclude from the uniqueness theorem for characteristic functions that the random vector $(M^{h_1}_\infty, \ldots, M^{h_n}_\infty)$ is independent of $\mathcal{F}_0$ and centered Gaussian with covariances $\langle h_j, h_k \rangle$. It is now easy to construct a process $\eta$ with the stated properties. □
As a first application, we may establish the following basic martingale
characterization of Brownian motion.
Theorem 18.3 (characterization of Brownian motion, Lévy) Let $B = (B^1, \ldots, B^d)$ be a process in $\mathbb{R}^d$ with $B_0 = 0$. Then B is an $\mathcal{F}$-Brownian motion iff it is a continuous local $\mathcal{F}$-martingale with $[B^i, B^j]_t = \delta_{ij} t$ a.s.
Proof: For fixed s < t, we may apply Lemma 18.2 to the continuous local martingales $M^i_r = B^i_{r \wedge t} - B^i_{r \wedge s}$, $r \ge s$, i = 1, ..., d, to see that the differences $B^i_t - B^i_s$ are i.i.d. N(0, t - s) and independent of $\mathcal{F}_s$. □
The last theorem suggests the possibility of transforming an arbitrary
continuous local martingale M into a Brownian motion through a suitable
random time-change. The proposed result is indeed true and admits a nat-
ural extension to higher dimensions; for convenience, we consider directly
the version in $\mathbb{R}^d$. A continuous local martingale $M = (M^1, \ldots, M^d)$ is said to be isotropic if a.s. $[M^i] = [M^j]$ and $[M^i, M^j] = 0$ for all $i \ne j$. Note in particular that this holds for Brownian motion in $\mathbb{R}^d$. When M is a continuous local martingale in $\mathbb{C}$, the condition is clearly equivalent to [M] = 0 a.s., or $[\Re M] = [\Im M]$ and $[\Re M, \Im M] = 0$ a.s. For isotropic processes M, we refer to $[M^1] = \cdots = [M^d]$ or $[\Re M] = [\Im M]$ as the rate process of M.
The proof is straightforward when $[M]_\infty = \infty$ a.s., but in general it requires a rather subtle extension of the filtered probability space. To simplify our statements, we assume the existence of any requested randomization variables. This can always be achieved, as in the elementary context of Chapter 6, by passing from the original setup $(\Omega, \mathcal{A}, \mathcal{F}, P)$ to the product space $(\hat\Omega, \hat{\mathcal{A}}, \hat{\mathcal{F}}, \hat P)$, where $\hat\Omega = \Omega \times [0, 1]$, $\hat{\mathcal{A}} = \mathcal{A} \otimes \mathcal{B}$, $\hat{\mathcal{F}}_t = \mathcal{F}_t \times [0, 1]$, and $\hat P = P \otimes \lambda$. Given two filtrations $\mathcal{F}$ and $\mathcal{G}$ on $\Omega$, we say that $\mathcal{G}$ is a standard extension of $\mathcal{F}$ if $\mathcal{F}_t \subset \mathcal{G}_t \perp\!\!\!\perp_{\mathcal{F}_t} \mathcal{F}$ for all $t \ge 0$. This is precisely the condition needed to ensure that all adaptedness and conditioning properties will be preserved. The notion is still flexible enough to admit a variety of useful constructions.
Theorem 18.4 (time-change reduction, Dambis, Dubins and Schwarz) Let M be an isotropic continuous local $\mathcal{F}$-martingale in $\mathbb{R}^d$ with $M_0 = 0$, and define
$$\tau_s = \inf\{t \ge 0;\ [M^1]_t > s\}, \qquad \mathcal{G}_s = \mathcal{F}_{\tau_s}, \quad s \ge 0.$$
Then there exists in $\mathbb{R}^d$ a Brownian motion B with respect to a standard extension of $\mathcal{G}$, such that a.s. $B = M \circ \tau$ on $[0, [M^1]_\infty)$ and $M = B \circ [M^1]$.
Proof: We may take d = 1, the proof in higher dimensions being similar. Introduce a Brownian motion $X \perp\!\!\!\perp \mathcal{F}$ with induced filtration $\mathcal{X}$, and put $\hat{\mathcal{G}}_t = \mathcal{G}_t \vee \mathcal{X}_t$. Since $\mathcal{G} \perp\!\!\!\perp \mathcal{X}$, it is clear that $\hat{\mathcal{G}}$ is a standard extension of both $\mathcal{G}$ and $\mathcal{X}$. In particular, X remains a Brownian motion under $\hat{\mathcal{G}}$. Now define
$$B_s = M_{\tau_s} + \int_0^s 1\{\tau_r = \infty\}\,dX_r, \quad s \ge 0. \tag{2}$$
Since M is $\tau$-continuous by Proposition 17.6, Theorem 17.24 shows that the first term $M \circ \tau$ is a continuous $\mathcal{G}$-martingale, hence also a $\hat{\mathcal{G}}$-martingale, with quadratic variation
$$[M \circ \tau]_s = [M]_{\tau_s} = s \wedge [M]_\infty, \quad s \ge 0.$$
The second term in (2) has quadratic variation $s - s \wedge [M]_\infty$, and the covariation vanishes since $M \circ \tau \perp\!\!\!\perp X$. Thus, $[B]_s = s$ a.s., and so Theorem 18.3 shows that B is a $\hat{\mathcal{G}}$-Brownian motion. Finally, $B_s = M_{\tau_s}$ for $s < [M]_\infty$, which implies $M = B \circ [M]$ a.s. by the $\tau$-continuity of M. □
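The time-change in Theorem 18.4 is the right-continuous inverse of the rate process. A minimal sketch of its discretization follows, assuming the quadratic variation is given as nondecreasing values on an increasing time grid; the function name `time_change`, the grid, and the example [M]_t = t^2 are illustrative assumptions, not part of the text.

```python
import bisect


def time_change(times, qv, s):
    """First grid time t with qv(t) > s, or None when qv never exceeds s.

    `times` is an increasing grid and `qv` holds the nondecreasing values of
    the rate process on that grid; this discretizes
    tau_s = inf{t >= 0; [M]_t > s}.
    """
    k = bisect.bisect_right(qv, s)  # first index with qv[k] > s
    return times[k] if k < len(times) else None


# Assumed example: [M]_t = t**2 on [0, 2], so that tau_s = sqrt(s) for s < 4.
times = [k / 1000 for k in range(2001)]
qv = [t * t for t in times]

print(time_change(times, qv, 0.25))  # close to 0.5, up to the grid spacing
print(time_change(times, qv, 5.0))   # None: level 5 is not reached on this grid
```

A return value of `None` corresponds, on the finite grid, to the event where the level s is not reached; in the proof this is the regime $\tau_r = \infty$ in which the independent Brownian motion X in (2) supplies the remaining increments.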
In two dimensions, isotropic martingales arise naturally through the
composition of a complex Brownian motion B with an arbitrary (possi-
bly multi-valued) analytic function f. For a general continuous process X,
we may clearly choose a continuous evolution of f(X), as long as X avoids
the possible singularities of f. Similar results are available for harmonic
functions, which is especially useful in dimensions $d \ge 3$, when no analytic
functions exist.
Theorem 18.5 (harmonic and analytic maps, Lévy)
(i) Let M be an isotropic, continuous local martingale in $\mathbb{R}^d$, and fix an harmonic function f such that M a.s. avoids the singularities of f. Then f(M) is a local martingale with $[f(M)] = |\nabla f(M)|^2 \cdot [M^1]$.
(ii) Let M be a complex, isotropic, continuous local martingale, and fix an analytic function f such that M a.s. avoids the singularities of f. Then f(M) is again an isotropic local martingale, and $[\Re f(M)] = |f'(M)|^2 \cdot [\Re M]$. If B is a Brownian motion and $f' \not\equiv 0$, then $[\Re f(B)]$ is a.s. unbounded and strictly increasing.
Proof: (i) Using the isotropy of M, we get by Corollary 17.19
$$f(M) = f(M_0) + f'_i(M) \cdot M^i + \tfrac12 \Delta f(M) \cdot [M^1].$$
Here the last term vanishes since f is harmonic, and so f(M) is a local martingale. From the isotropy of M we further obtain
$$[f(M)] = \sum_i [f'_i(M) \cdot M^i] = \sum_i (f'_i(M))^2 \cdot [M^1] = |\nabla f(M)|^2 \cdot [M^1].$$
(ii) Since f is analytic, we get by Corollary 17.20
$$f(M) = f(M_0) + f'(M) \cdot M + \tfrac12 f''(M) \cdot [M]. \tag{3}$$
Here the last term vanishes since M is isotropic. The same property also yields
$$[f(M)] = [f'(M) \cdot M] = (f'(M))^2 \cdot [M] = 0,$$
and so f(M) is again isotropic. Finally, writing M = X + iY and f'(M) = U + iV, we get
$$[\Re f(M)] = [U \cdot X - V \cdot Y] = (U^2 + V^2) \cdot [X] = |f'(M)|^2 \cdot [\Re M].$$
If f' is not identically 0, it has at most countably many zeros. Hence, by Fubini's theorem
$$E\lambda\{t \ge 0;\ f'(B_t) = 0\} = \int_0^\infty P\{f'(B_t) = 0\}\,dt = 0,$$
and so $[\Re f(B)] = |f'(B)|^2 \cdot \lambda$ is a.s. strictly increasing. To see that it is also a.s. unbounded, we note that f(B) converges a.s. on the set $\{[\Re f(B)] < \infty\}$. However, f(B) diverges a.s. since f is nonconstant and the random walk $B_0, B_1, \ldots$ is recurrent by Theorem 9.2. □
Combining the last two results, we may derive two basic properties of Brownian motion in $\mathbb{R}^d$, namely the polarity of singleton sets when $d \ge 2$ and the transience when $d \ge 3$. Note that the latter property is a continuous-time counterpart of Theorem 9.8 for random walks. Both properties play important roles for the potential theory developed in Chapter 24. Define $\tau_a = \inf\{t > 0;\ B_t = a\}$.
Theorem 18.6 (point polarity and transience, Lévy, Kakutani) For a Brownian motion B in $\mathbb{R}^d$, we have the following:
(i) If $d \ge 2$, then $\tau_a = \infty$ a.s. for all $a \in \mathbb{R}^d$.
(ii) If $d \ge 3$, then $|B_t| \to \infty$ a.s. as $t \to \infty$.
Proof: (i) Here we may clearly take d = 2, so we may let B be a complex Brownian motion. Applying Theorem 18.5 (ii) to the entire function $e^z$, it is seen that $M = e^B$ is a conformal local martingale with unbounded rate $[\Re M]$. By Theorem 18.4 we have $M - 1 = X \circ [\Re M]$ a.s. for some Brownian motion X, and since $M \ne 0$ it follows that X a.s. avoids -1. Hence, $\tau_{-1} = \infty$ a.s., and by the scaling and rotational symmetries of B we get $\tau_a = \infty$ a.s. for every $a \ne 0$. To extend the result to a = 0, we may conclude from the Markov property at h > 0 that
$$P_0\{\tau_0 \circ \theta_h < \infty\} = E_0 P_{B_h}\{\tau_0 < \infty\} = 0, \quad h > 0.$$
As $h \to 0$, we get $P_0\{\tau_0 < \infty\} = 0$, and so $\tau_0 = \infty$ a.s.
(ii) Here we may take d = 3. For any $a \ne 0$ we have $\tau_a = \infty$ a.s. by claim (i), and so by Theorem 18.5 (i) the process $M = |B - a|^{-1}$ is a continuous local martingale. By Fatou's lemma M is then an $L^1$-bounded supermartingale, and so by Theorem 7.18 it converges a.s. toward some random variable $\xi$. Since $M_t \xrightarrow{P} 0$ we have $\xi = 0$ a.s. □
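Part (ii) of the proof rests on the fact that x ↦ |x - a|^{-1} is harmonic on ℝ^3 away from a, so that Theorem 18.5 (i) applies. This can be checked numerically with a central-difference Laplacian; the sketch below is illustrative only, and the function names, test point, and step size h are assumptions.

```python
import math


def inverse_distance(x, a):
    """f(x) = 1 / |x - a| for points x, a in R^3."""
    return 1.0 / math.dist(x, a)


def laplacian(f, x, h=1e-3):
    """Second-order central-difference approximation of the Laplacian of f at x."""
    total = 0.0
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h
        down[i] -= h
        total += (f(up) - 2.0 * f(x) + f(down)) / (h * h)
    return total


a = (1.0, 0.0, 0.0)
x = [0.3, -0.5, 0.8]
# Approximately zero: 1/|x - a| is harmonic away from the pole a.
print(laplacian(lambda p: inverse_distance(p, a), x))
# For contrast, |x|^2 has Laplacian 2d = 6 in three dimensions.
print(laplacian(lambda p: sum(c * c for c in p), x))
```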
Combining part (i) of the last result with Theorem 19.11, we note that
a complex, isotropic continuous local martingale avoids every fixed point
outside the origin. Thus, Theorem 18.5 (ii) applies to any analytic function
$f$ with only isolated singularities. Since $f$ is allowed to be multi-valued,
the result applies even to functions with essential singularities, such as to
$f(z) = \log(1 + z)$. For a simple application, we may consider the windings
of planar Brownian motion around a fixed point.
Corollary 18.7 (skew-product representation, Galmarino) Let $B$ be a
complex Brownian motion starting at 1, and choose a continuous version
of $V = \arg B$ with $V_0 = 0$. Then $V_t = Y \circ (|B|^{-2} \cdot \lambda)_t$ a.s. for some real
Brownian motion $Y \perp\!\!\!\perp |B|$.
Proof: Applying Theorem 18.5 (ii) with $f(z) = \log(1 + z)$, we note that
$M_t = \log |B_t| + iV_t$ is an isotropic martingale with rate $[\Re M] = |B|^{-2} \cdot \lambda$.
Hence, by Theorem 18.4 there exists some complex Brownian motion
$Z = X + iY$ with $M = Z \circ [\Re M]$ a.s., and the assertion follows. $\Box$
For a nonisotropic continuous local martingale $M$ in $\mathbb{R}^d$, there is no
single random time-change that will reduce the process to a Brownian
motion. However, we may transform each component $M^i$ separately, as
in Theorem 18.4, to obtain a collection of one-dimensional Brownian
motions $B^1, \dots, B^d$. If the latter processes happen to be independent, they
may clearly be combined into a $d$-dimensional Brownian motion
$B = (B^1, \dots, B^d)$. It is remarkable that the required independence arises
automatically whenever the original components $M^i$ are strongly orthogonal,
in the sense that $[M^i, M^j] = 0$ a.s. for all $i \ne j$.
Proposition 18.8 (orthogonality and independence, Knight) Let $M^1, M^2, \dots$
be strongly orthogonal, continuous local martingales starting at 0.
Then there exist some independent Brownian motions $B^1, B^2, \dots$ such that
$M^k = B^k \circ [M^k]$ a.s. for every $k$.
Proof: When $[M^k]_\infty = \infty$ a.s. for all $k$, the result is an easy consequence
of Lemma 18.2. In general, we may introduce a sequence of independent
Brownian motions $X^1, X^2, \dots \perp\!\!\!\perp \mathcal{F}$ with induced filtration $\mathcal{X}$. Define
$$B^k_s = M^k(\tau^k_s) + X^k((s - [M^k]_\infty)_+), \qquad s \ge 0,\ k \in \mathbb{N},$$
write $\psi_t = -\log(1 - t)_+$, and put $\mathcal{G}_t = \mathcal{F}_{\psi_t} \vee \mathcal{X}_{(t-1)_+}$, $t \ge 0$. To check that
$B^1, B^2, \dots$ have the desired joint distribution, we may clearly assume the
$[M^k]$ to be bounded. Then the processes $N^k_t = M^k_{\psi_t} + X^k_{(t-1)_+}$ are strongly
orthogonal, continuous $\mathcal{G}$-martingales with quadratic variations
$[N^k]_t = [M^k]_{\psi_t} + (t - 1)_+$, and we note that $B^k_s = N^k_{\sigma^k_s}$, where
$\sigma^k_s = \inf\{t \ge 0;\ [N^k]_t > s\}$. The assertion now follows from the result for
$[M^k]_\infty = \infty$ a.s. $\Box$
As a further application of Lemma 18.2, we consider a simple
continuous-time version of Theorem 11.13. Given a continuous semimartingale $X$ on
$I = \mathbb{R}_+$ or $[0, 1)$ and a progressive process $T$ on $I$ that takes values in
$\bar{I} = [0, \infty]$ or $[0, 1]$, respectively, we may define
$$(X \circ T^{-1})_t = \int 1\{T_s \le t\}\, dX_s, \qquad t \in I,$$
as long as the integrals on the right exist. For motivation, we note that
if $\xi$ is a random measure on $I$ with "distribution function" $X_t = \xi[0, t]$,
$t \in I$, then $X \circ T^{-1}$ is the distribution function of the transformed measure
$\xi \circ T^{-1}$.
Proposition 18.9 (measure-preserving progressive maps) Let $B$ be a
Brownian motion or bridge on $I = \mathbb{R}_+$ or $[0, 1]$, respectively, and let $T$
be a progressive process on $I$ such that $\lambda \circ T^{-1} = \lambda$ a.s. Then $B \circ T^{-1} \overset{d}{=} B$.
Proof: The result for $I = \mathbb{R}_+$ is an immediate consequence of Lemma 18.2,
and so we may assume that $B$ is a Brownian bridge on $[0, 1]$. Then
$M_t = B_t/(1 - t)$ is a martingale on $[0, 1)$, and therefore $B$ is a semimartingale on
the same interval. Integrating by parts gives
$$dB_t = (1 - t)\, dM_t - M_t\, dt \equiv dX_t - M_t\, dt. \tag{4}$$
Thus, $[X]_t = [B]_t \equiv t$ a.s. for all $t$, and $X$ is a Brownian motion by Theorem
18.3.
Now let $V$ be a bounded, progressive process on $[0, 1]$ such that the
integral $\bar{V} = \int_0^1 V_t\, dt$ is a.s. nonrandom. Integrating by parts, we get for
any $u \in [0, 1)$
$$\int_0^u V_t M_t\, dt = M_u \int_0^u V_t\, dt - \int_0^u dM_t \int_0^t V_s\, ds
= \int_0^u dM_t \int_t^1 V_s\, ds - M_u \int_u^1 V_t\, dt.$$
As $u \to 1$, we have $(1 - u)M_u = B_u \to 0$, and so the last term tends to 0.
Hence, by dominated convergence and (4),
$$\int_0^1 V_t\, dB_t = \int_0^1 V_t\, dX_t - \int_0^1 V_t M_t\, dt = \int_0^1 (V_t - \bar{V}_t)\, dX_t,$$
where $\bar{V}_t = (1 - t)^{-1} \int_t^1 V_s\, ds$. If $U$ is another bounded, progressive process,
we get by a simple calculation
$$\int_0^1 (U_t - \bar{U}_t)(V_t - \bar{V}_t)\, dt = \int_0^1 U_t V_t\, dt - \bar{U}\bar{V}.$$
For $U_r = 1\{T_r \le s\}$ and $V_r = 1\{T_r \le t\}$, the right-hand side becomes
$s \wedge t - st = E(B_s B_t)$, and the assertion follows by Lemma 18.2. $\Box$
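The "simple calculation" at the end of the proof can be checked numerically in the simplest case $T_r = r$, where $U_r = 1\{r \le s\}$ and $V_r = 1\{r \le t\}$ are deterministic. The sketch below is only an illustration under that assumption: it discretizes $[0, 1]$ on a midpoint grid, approximates the tail averages $\bar{U}_t$ and $\bar{V}_t$, and compares both sides of the identity with the bridge covariance $s \wedge t - st$. The helper names and the grid size are hypothetical.

```python
def tail_average(v):
    """Approximate bar_v(t_i) = (1 - t_i)^{-1} * int_{t_i}^1 v_s ds
    on the midpoint grid t_i = (i + 0.5) / n, where n = len(v)."""
    n = len(v)
    out = [0.0] * n
    suffix = 0.0  # sum of v[j] over the cells j > i
    for i in range(n - 1, -1, -1):
        # Half of cell i plus all later cells, divided by (1 - t_i) * n.
        out[i] = (0.5 * v[i] + suffix) / (n - i - 0.5)
        suffix += v[i]
    return out

def both_sides(u, v):
    """Return (lhs, rhs) of
    int (U - bar_U_t)(V - bar_V_t) dt = int U V dt - bar_U * bar_V."""
    n = len(u)
    u_bar = tail_average(u)
    v_bar = tail_average(v)
    lhs = sum((a - ab) * (b - bb)
              for a, ab, b, bb in zip(u, u_bar, v, v_bar)) / n
    rhs = sum(a * b for a, b in zip(u, v)) / n - (sum(u) / n) * (sum(v) / n)
    return lhs, rhs

def indicator(level, n):
    """Grid values of r -> 1{r <= level} at the midpoints (i + 0.5) / n."""
    return [1.0 if (i + 0.5) / n <= level else 0.0 for i in range(n)]

n = 20000
lhs, rhs = both_sides(indicator(0.3, n), indicator(0.7, n))
# Both values should be close to min(0.3, 0.7) - 0.3 * 0.7.
```

For a genuinely random, measure-preserving $T$ the same computation would apply path by path, since only the relation $\lambda \circ T^{-1} = \lambda$ enters the right-hand side.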
We turn to a basic representation of martingales with respect to a
Brownian filtration.
Theorem 18.10 (Brownian martingales) Let $\mathcal{F}$ be the complete filtration
induced by a Brownian motion $B = (B^1, \dots, B^d)$ in $\mathbb{R}^d$. Then any local
$\mathcal{F}$-martingale $M$ is a.s. continuous, and there exist some $(P \times \lambda)$-a.e. unique
processes $V^1, \dots, V^d \in L(B^1)$ such that
$$M = M_0 + \sum_{k \le d} V^k \cdot B^k \quad \text{a.s.} \tag{5}$$
The statement is essentially equivalent to the following representation of
Brownian functionals, which we prove first.
Lemma 18.11 (Brownian functionals, Itô) Let $B = (B^1, \dots, B^d)$ be
a Brownian motion in $\mathbb{R}^d$, and fix any $B$-measurable random variable
$\xi \in L^2$ with $E\xi = 0$. Then there exist some $(P \times \lambda)$-a.e. unique processes
$V^1, \dots, V^d \in L(B^1)$ such that $\xi = \sum_k (V^k \cdot B^k)_\infty$ a.s.
Proof (Dellacherie): Let $H$ denote the Hilbert space of $B$-measurable
random variables $\xi \in L^2$ with $E\xi = 0$, and write $K$ for the subspace of
elements $\xi$ admitting the desired representation $\sum_k (V^k \cdot B^k)_\infty$. For such a
$\xi$ we get $E\xi^2 = E \sum_k ((V^k)^2 \cdot \lambda)_\infty$, which implies the asserted uniqueness.
By the obvious completeness of $L(B^1)$, it is further seen from the same
formula that $K$ is closed. To obtain $K = H$, we need to show that any
$\xi \in H \ominus K$ vanishes a.s.
Then fix any nonrandom functions $u^1, \dots, u^d \in L^2(\mathbb{R}_+)$. Put
$M = \sum_k u^k \cdot B^k$, and define the process $Z$ as in Lemma 18.1. Then
$Z - 1 = iZ \cdot M = i \sum_k (Z u^k) \cdot B^k$ by Proposition 17.14, and so $\xi \perp (Z_\infty - 1)$, or
$E\, \xi \exp\{i \sum_k (u^k \cdot B^k)_\infty\} = 0$. Specializing to step functions $u^k$ and using
the uniqueness theorem for characteristic functions, we get
$$E[\xi;\ (B_{t_1}, \dots, B_{t_n}) \in C] = 0, \qquad t_1, \dots, t_n \in \mathbb{R}_+,\ C \in \mathcal{B}^n,\ n \in \mathbb{N}.$$
By a monotone class argument this extends to $E[\xi; A] = 0$ for arbitrary
$A \in \mathcal{F}_\infty$, and so $\xi = E[\xi \mid \mathcal{F}_\infty] = 0$ a.s. $\Box$
Proof of Theorem 18.10: We may clearly take $M_0 = 0$, and by suitable
localization we may assume that $M$ is uniformly integrable. Then $M_\infty$
exists in $L^1(\mathcal{F}_\infty)$ and may be approximated in $L^1$ by some random variables
$\xi_1, \xi_2, \dots \in L^2(\mathcal{F}_\infty)$. The martingales $M^n_t = E[\xi_n \mid \mathcal{F}_t]$ are a.s. continuous
by Lemma 18.11, and by Proposition 7.15 we get, for any $\varepsilon > 0$,
$$P\{(\Delta M)^* > 2\varepsilon\} \le P\{(M^n - M)^* > \varepsilon\} \le \varepsilon^{-1} E|\xi_n - M_\infty| \to 0.$$
Hence, $(\Delta M)^* = 0$ a.s., and so $M$ is a.s. continuous. The remaining
assertions now follow by localization from Lemma 18.11. $\Box$
Our next theorem deals with the converse problem of finding a Brownian
motion $B$ satisfying (5) when the representing processes $V^k$ are given. The
result plays a crucial role in Chapter 21.
Theorem 18.12 (integral representation, Doob) Let $M$ be a continuous
local $\mathcal{F}$-martingale in $\mathbb{R}^d$ with $M_0 = 0$ such that $[M^i, M^j] = V^i_k V^j_k \cdot \lambda$ a.s.
for some $\mathcal{F}$-progressive processes $V^i_k$, $1 \le i \le d$, $1 \le k \le n$. Then there
exists in $\mathbb{R}^n$ a Brownian motion $B$ with respect to a standard extension of
$\mathcal{F}$ such that $M^i = V^i_k \cdot B^k$ a.s. for all $i$. (Here and below, repeated indices
$k$ are summed over $1 \le k \le n$.)
Proof: For any $t \ge 0$, let $N_t$ and $R_t$ be the null and range spaces of
the matrix $V_t$, and write $N_t^\perp$ and $R_t^\perp$ for their orthogonal complements.
Denote the corresponding orthogonal projections by $\pi_{N_t}$, $\pi_{R_t}$, $\pi_{N_t^\perp}$, and
$\pi_{R_t^\perp}$, respectively. Note that $V_t$ is a bijection from $N_t^\perp$ to $R_t$, and write
$V_t^{-1}$ for the inverse mapping from $R_t$ to $N_t^\perp$. All these mappings are clearly
Borel-measurable functions of $V_t$, and hence again progressive.
Now introduce a Brownian motion $X \perp\!\!\!\perp \mathcal{F}$ in $\mathbb{R}^n$ with induced filtration
$\mathcal{X}$, and note that $\mathcal{G}_t = \mathcal{F}_t \vee \mathcal{X}_t$, $t \ge 0$, is a standard extension of both $\mathcal{F}$
and $\mathcal{X}$. Thus, $V$ remains $\mathcal{G}$-progressive, and the martingale properties of
$M$ and $X$ are still valid for $\mathcal{G}$. Consider in $\mathbb{R}^n$ the local $\mathcal{G}$-martingale
$$B = V^{-1}\pi_R \cdot M + \pi_N \cdot X.$$
The covariation matrix of $B$ has density
$$(V^{-1}\pi_R)\, V V'\, (V^{-1}\pi_R)' + \pi_N \pi_N' = \pi_{N^\perp}\pi_{N^\perp}' + \pi_N \pi_N' = \pi_{N^\perp} + \pi_N = I,$$
and so Theorem 18.3 shows that $B$ is a Brownian motion. Furthermore, the
process $\pi_{R^\perp} \cdot M$ vanishes a.s. since its covariation matrix has density
$\pi_{R^\perp} V V' \pi_{R^\perp}' = 0$. Hence, by Proposition 17.14,
$$V \cdot B = V V^{-1}\pi_R \cdot M + V\pi_N \cdot X = \pi_R \cdot M = (\pi_R + \pi_{R^\perp}) \cdot M = M. \qquad \Box$$
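The matrix identities in the proof can be made concrete in the special case $d = 1$, where $V$ is a single nonzero row vector $v \in \mathbb{R}^n$. Under that assumption (which is ours, chosen to keep the sketch short), the range of $V$ is all of $\mathbb{R}$, so $\pi_R = 1$, the inverse $V^{-1}$ maps $y$ to $y\, v'/|v|^2$, and $\pi_N = I - v'v/|v|^2$. The following Python sketch evaluates the covariation density of $B$ and the product $V\pi_N$ for a fixed vector; all function names are hypothetical.

```python
def null_projection(v):
    """Orthogonal projection pi_N onto the null space of the row vector v."""
    n = len(v)
    norm2 = sum(x * x for x in v)
    if norm2 == 0.0:
        raise ValueError("this sketch assumes that v is nonzero")
    return [[(1.0 if i == j else 0.0) - v[i] * v[j] / norm2
             for j in range(n)] for i in range(n)]

def covariation_density(v):
    """Density (V^{-1} pi_R) V V' (V^{-1} pi_R)' + pi_N pi_N' for d = 1."""
    n = len(v)
    norm2 = sum(x * x for x in v)
    pi_n = null_projection(v)
    # V^{-1} pi_R is the column vector v' / |v|^2, and V V' = |v|^2.
    v_inv = [x / norm2 for x in v]
    first = [[v_inv[i] * norm2 * v_inv[j] for j in range(n)]
             for i in range(n)]
    second = [[sum(pi_n[i][k] * pi_n[j][k] for k in range(n))
               for j in range(n)] for i in range(n)]
    return [[first[i][j] + second[i][j] for j in range(n)]
            for i in range(n)]

def row_times_matrix(v, matrix):
    """Product of the row vector v with a square matrix."""
    n = len(v)
    return [sum(v[i] * matrix[i][j] for i in range(n)) for j in range(n)]

v = [3.0, 4.0, 12.0]
density = covariation_density(v)                    # close to the identity
v_pi_n = row_times_matrix(v, null_projection(v))    # close to the zero vector
```

For general $d$ the operator $V^{-1}\pi_R$ coincides with the Moore-Penrose pseudoinverse of $V$, so a numerical linear algebra library could be used instead of the hand-written formulas.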
We may next prove a Fubini-type theorem, which shows how the multiple
Wiener-Itô integrals defined in Chapter 13 can be expressed in terms of
iterated Itô integrals. Then introduce for each $n \in \mathbb{N}$ the simplex
$$\Delta_n = \{(t_1, \dots, t_n) \in \mathbb{R}_+^n;\ t_1 < \cdots < t_n\}.$$
Given a function $f \in L^2(\mathbb{R}_+^n, \lambda^n)$, we write $\hat{f} = n!\, \tilde{f}\, 1_{\Delta_n}$, where $\tilde{f}$ denotes
the symmetrization of $f$ defined in Chapter 13.
Theorem 18.13 (multiple and iterated integrals) Consider a Brownian
motion $B$ in $\mathbb{R}$ with associated multiple Wiener-Itô integrals $I_n$, and fix
any $f \in L^2(\mathbb{R}_+^n)$. Then
$$I_n f = \int dB_{t_n} \int dB_{t_{n-1}} \cdots \int \hat{f}(t_1, \dots, t_n)\, dB_{t_1} \quad \text{a.s.} \tag{6}$$
Though a formal verification is easy, the existence of the integrals on
the right depends in a subtle way on the possibility of choosing suitable
versions in each step. The existence of such versions is implicitly regarded
as part of the assertion.
Proof: We shall prove by induction that the iterated integral
$$V^k_{t_{k+1}, \dots, t_n} = \int dB_{t_k} \int dB_{t_{k-1}} \cdots \int \hat{f}(t_1, \dots, t_n)\, dB_{t_1}$$
exists for almost all $t_{k+1}, \dots, t_n$, and that $V^k$ has a version supported by
$\Delta_{n-k}$ that is progressive as a process in $t_{k+1}$ with parameters $t_{k+2}, \dots, t_n$.
Furthermore, we shall establish the relation
$$E\big(V^k_{t_{k+1}, \dots, t_n}\big)^2 = \int \cdots \int \{\hat{f}(t_1, \dots, t_n)\}^2\, dt_1 \cdots dt_k. \tag{7}$$
This allows us, in the next step, to define $V^{k+1}_{t_{k+2}, \dots, t_n}$ for almost all
$t_{k+2}, \dots, t_n$.
The integral $V^0 = \hat{f}$ clearly has the stated properties. Now assume that
a version of the integral $V^{k-1}_{t_k, \dots, t_n}$ has been constructed with the desired
properties. For any $t_{k+1}, \dots, t_n$ such that (7) is finite, Theorem 17.25 shows
that the process
$$X^k_{t, t_{k+1}, \dots, t_n} = \int_0^t V^{k-1}_{t_k, \dots, t_n}\, dB_{t_k}, \qquad t \ge 0,$$
has a progressive version that is a.s. continuous in $t$ for fixed $t_{k+1}, \dots, t_n$.
By Proposition 17.15 we obtain
$$V^k_{t_{k+1}, \dots, t_n} = X^k_{t_{k+1}, t_{k+1}, \dots, t_n} \quad \text{a.s.}, \qquad t_{k+1}, \dots, t_n \ge 0,$$
and the progressivity clearly carries over to $V^k$, regarded as a process in
$t_{k+1}$ with parameters $t_{k+2}, \dots, t_n$. Since $V^{k-1}$ is supported by $\Delta_{n-k+1}$, we
may choose $X^k$ to be supported by $\mathbb{R}_+ \times \Delta_{n-k}$, which ensures $V^k$ to be
supported by $\Delta_{n-k}$. Finally, equation (7) for $V^{k-1}$ yields
$$E\big(V^k_{t_{k+1}, \dots, t_n}\big)^2 = E \int \big(V^{k-1}_{t_k, \dots, t_n}\big)^2\, dt_k
= \int \cdots \int \{\hat{f}(t_1, \dots, t_n)\}^2\, dt_1 \cdots dt_k.$$
To prove (6), we note that the right-hand side is linear and $L^2$-continuous
in $f$. Furthermore, the two sides agree for indicator functions of rectangular
boxes in $\Delta_n$. The relation extends by a monotone class argument to
arbitrary indicator functions in $\Delta_n$, and the further extension to $L^2(\Delta_n)$
is immediate. It remains to note that $I_n f = I_n \tilde{f} = I_n \hat{f}$ for any
$f \in L^2(\mathbb{R}_+^n)$. $\Box$
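For a concrete instance of (6), take $n = 2$ and $f = 1_{[0,T]^2}$, so that $\hat{f} = 2 \cdot 1\{t_1 < t_2 \le T\}$ and the right-hand side becomes $2\int_0^T B_t\, dB_t$. Assuming the standard Hermite normalization of $I_2$ from Chapter 13 (not reproduced here), the left-hand side equals $B_T^2 - T$. The Python sketch below is a simulation under these assumptions, with a hypothetical function name, seed, and step count: it forms left-endpoint Itô sums along one discretized path. Algebraically the sum equals $B_T^2 - \sum (\Delta B)^2$, so the agreement with $B_T^2 - T$ is exactly as good as the approximation of $T$ by the discrete quadratic variation.

```python
import random

def iterated_integral_demo(n_steps=20000, horizon=1.0, seed=1):
    """Simulate one Brownian path on [0, horizon] and return
    (ito_sum, closed_form, quad_var), where
      ito_sum     = 2 * sum of B_{t_i} * (B_{t_{i+1}} - B_{t_i}),
      closed_form = B_T^2 - T,
      quad_var    = sum of squared increments."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    sd = dt ** 0.5
    b = 0.0
    ito_sum = 0.0
    quad_var = 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, sd)
        ito_sum += 2.0 * b * db   # integrand evaluated at the left endpoint
        quad_var += db * db
        b += db
    return ito_sum, b * b - horizon, quad_var

ito_sum, closed_form, quad_var = iterated_integral_demo()
# ito_sum - closed_form equals horizon - quad_var up to rounding error.
```

Evaluating the integrand at the left endpoint is essential: a midpoint or right-endpoint rule would converge to a different limit.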
Our previous developments have provided two entirely different
representations of Brownian functionals with zero mean and finite variance, namely
the chaos expansion in Theorem 13.26 and the stochastic integral
representation in Lemma 18.11. We proceed to examine how the two formulas
are related. For any function $f \in L^2(\mathbb{R}_+^n)$, we define
$f_t(t_1, \dots, t_{n-1}) = f(t_1, \dots, t_{n-1}, t)$ and write $I_{n-1}f(t) = I_{n-1}f_t$ when $\|f_t\| < \infty$.
Proposition 18.14 (chaos and integral representations) Fix a Brownian
motion $B$ in $\mathbb{R}$, and let $\xi$ be a $B$-measurable random variable with chaos
expansion $\sum_{n \ge 1} I_n f_n$. Then $\xi = (V \cdot B)_\infty$ a.s., where
$$V_t = \sum_{n \ge 1} I_{n-1}\hat{f}_n(t), \qquad t \ge 0.$$
Proof: For any $m \in \mathbb{N}$ we get, as in the last proof,
$$\int dt \sum_{n \ge m} E\{I_{n-1}\hat{f}_n(t)\}^2 = \sum_{n \ge m} \|\hat{f}_n\|^2 = \sum_{n \ge m} E(I_n f_n)^2 < \infty. \tag{8}$$
Since integrals $I_n f$ with different $n$ are orthogonal, it follows that the series
for $V_t$ converges in $L^2$ for almost every $t \ge 0$. On the exceptional set we
may redefine $V_t$ to be 0. As before, we may choose progressive versions of
the integrals $I_{n-1}\hat{f}_n(t)$, and from the proof of Corollary 4.32 it is clear that
even the sum $V$ can be chosen to be progressive. Applying (8) with $m = 1$,
we then obtain $V \in L(B)$.
Using Theorem 18.13, we get by a formal calculation
$$\xi = \sum_{n \ge 1} I_n f_n = \sum_{n \ge 1} \int I_{n-1}\hat{f}_n(t)\, dB_t = \int dB_t \sum_{n \ge 1} I_{n-1}\hat{f}_n(t) = \int V_t\, dB_t.$$
To justify the interchange of integration and summation, we may use (8)
and conclude as $m \to \infty$ that
$$E\Big\{\int dB_t \sum_{n \ge m} I_{n-1}\hat{f}_n(t)\Big\}^2 = \int dt \sum_{n \ge m} E\{I_{n-1}\hat{f}_n(t)\}^2 = \sum_{n \ge m} E(I_n f_n)^2 \to 0. \qquad \Box$$
Let us now consider two different probability measures $P$ and $Q$ on the
same measurable space $(\Omega, \mathcal{A})$, equipped with a right-continuous and
$P$-complete filtration $(\mathcal{F}_t)$. If $Q \ll P$ on $\mathcal{F}_t$, we denote the corresponding
density by $Z_t$, so that $Q = Z_t \cdot P$ on $\mathcal{F}_t$. Since the martingale property
depends on the choice of probability measure, we need to distinguish
between $P$-martingales and $Q$-martingales. Integration with respect to $P$ is
denoted by $E$ as usual, and we write $\mathcal{F}_\infty = \bigvee_t \mathcal{F}_t$.
Lemma 18.15 (absolute continuity) Let $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$.
Then $Z$ is a $P$-martingale, and it is further uniformly integrable iff $Q \ll P$
on $\mathcal{F}_\infty$. More generally, an adapted process $X$ is a $Q$-martingale iff $XZ$ is
a $P$-martingale.
Proof: For any adapted process $X$, we note that $X_t$ is $Q$-integrable iff
$X_t Z_t$ is $P$-integrable. If this holds for all $t$, we may write the $Q$-martingale
property of $X$ as
$$\int_A X_s\, dQ = \int_A X_t\, dQ, \qquad A \in \mathcal{F}_s,\ s < t.$$
By the definition of $Z$, it is equivalent that
$$E[X_s Z_s; A] = E[X_t Z_t; A], \qquad A \in \mathcal{F}_s,\ s < t,$$
which means that $XZ$ is a $P$-martingale. This proves the last assertion,
and the first statement follows as we take $X_t \equiv 1$.
Next assume that $Z$ is uniformly $P$-integrable, say with $L^1$-limit $Z_\infty$. For
any $t < u$ and $A \in \mathcal{F}_t$ we have $QA = E[Z_u; A]$. As $u \to \infty$, it follows that
$QA = E[Z_\infty; A]$, which extends by a monotone class argument to arbitrary
$A \in \mathcal{F}_\infty$. Thus, $Q = Z_\infty \cdot P$ on $\mathcal{F}_\infty$. Conversely, if $Q = \xi \cdot P$ on $\mathcal{F}_\infty$, then
$E\xi = 1$, and the $P$-martingale $M_t = E[\xi \mid \mathcal{F}_t]$ satisfies $Q = M_t \cdot P$ on $\mathcal{F}_t$ for
each $t$. But then $Z_t = M_t$ a.s. for each $t$, and $Z$ is uniformly $P$-integrable
with limit $\xi$. $\Box$
By the last lemma and Theorem 7.27, we may henceforth assume that
the density process $Z$ is rcll. The basic properties may then be extended
to optional times and local martingales as follows.
Lemma 18.16 (localization) Let $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$. Then for
any optional time $\tau$, we have
$$Q = Z_\tau \cdot P \quad \text{on } \mathcal{F}_\tau \cap \{\tau < \infty\}. \tag{9}$$
Furthermore, an adapted rcll process $X$ is a local $Q$-martingale iff $XZ$ is a
local $P$-martingale.
Proof: By optional sampling,
$$QA = E[Z_{\tau \wedge t}; A], \qquad A \in \mathcal{F}_{\tau \wedge t},\ t \ge 0,$$
and so
$$Q[A;\ \tau \le t] = E[Z_\tau;\ A \cap \{\tau \le t\}], \qquad A \in \mathcal{F}_\tau,\ t \ge 0.$$
Equation (9) now follows by monotone convergence as $t \to \infty$.
To prove the last assertion, it is enough to show for any optional time $\tau$
that $X^\tau$ is a $Q$-martingale iff $(XZ)^\tau$ is a $P$-martingale. This may be seen
as before if we note that $Q = Z^\tau_t \cdot P$ on $\mathcal{F}_{\tau \wedge t}$ for each $t$. $\Box$
We also need the following positivity property.
Lemma 18.17 (positivity) For every $t > 0$ we have $\inf_{s \le t} Z_s > 0$ a.s. $Q$.
Proof: By Lemma 7.31 it is enough to show for each $t > 0$ that $Z_t > 0$ a.s.
$Q$. This is clear from the fact that $Q\{Z_t = 0\} = E[Z_t;\ Z_t = 0] = 0$. $\Box$
In typical applications, the measure $Q$ is not given at the outset but needs
to be constructed from the martingale $Z$. This requires some regularity
conditions on the underlying probability space.
Lemma 18.18 (existence) For any Polish space $S$, let $P$ be a probability
measure on $\Omega = D(\mathbb{R}_+, S)$, endowed with the right-continuous and complete
induced filtration $\mathcal{F}$. Consider an $\mathcal{F}$-martingale $Z \ge 0$ with $Z_0 = 1$. Then
there exists a probability measure $Q$ on $\Omega$ with $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all
$t \ge 0$.
Proof: For each $t \ge 0$, we may introduce the probability measure
$Q_t = Z_t \cdot P$ on $\mathcal{F}_t$, which may be regarded as a measure on $D([0, t], S)$. Since
the spaces $D([0, t], S)$ are Polish for the Skorohod topology, Corollary 6.15
ensures the existence of some probability measure $Q$ on $D(\mathbb{R}_+, S)$ with
projections $Q_t$. It is easy to verify that $Q$ has the stated properties. $\Box$
The following basic result shows how the drift term of a continuous
semimartingale is transformed under a change of measure with a continuous
density $Z$. An extension appears in Theorem 26.9.
Theorem 18.19 (transformation of drift, Girsanov, van Schuppen and
Wong) Let $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for each $t \ge 0$, where $Z$ is a.s. continuous.
Then for any continuous local $P$-martingale $M$, the process
$\tilde{M} = M - Z^{-1} \cdot [M, Z]$ is a local $Q$-martingale.
Proof: First assume that $Z^{-1}$ is bounded on the support of $[M]$. Then
$\tilde{M}$ is a continuous $P$-semimartingale, and we get by Proposition 17.14 and
an integration by parts
$$\tilde{M}Z - (\tilde{M}Z)_0 = \tilde{M} \cdot Z + Z \cdot \tilde{M} + [\tilde{M}, Z]
= \tilde{M} \cdot Z + Z \cdot M - [M, Z] + [\tilde{M}, Z]
= \tilde{M} \cdot Z + Z \cdot M,$$
which shows that $\tilde{M}Z$ is a local $P$-martingale. Hence, $\tilde{M}$ is a local
$Q$-martingale by Lemma 18.16.
For general $M$, we may define $\tau_n = \inf\{t \ge 0;\ Z_t < 1/n\}$ and conclude
as before that $\tilde{M}^{\tau_n}$ is a local $Q$-martingale for each $n \in \mathbb{N}$. Since $\tau_n \to \infty$
a.s. $Q$ by Lemma 18.17, it follows by Lemma 17.1 that $\tilde{M}$ is a local
$Q$-martingale. $\Box$
The next result shows how the basic notions of stochastic calculus are
preserved under a change of measure. Here $[X]_P$ denotes the quadratic
variation of $X$ under the probability measure $P$. We further write $L_P(X)$
for the class of $X$-integrable processes $V$ under $P$, and let $(V \cdot X)_P$ be the
corresponding stochastic integral.
Proposition 18.20 (preservation laws) Let $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for each
$t \ge 0$, where $Z$ is continuous. Then any continuous $P$-semimartingale $X$ is
also a $Q$-semimartingale, and $[X]_P = [X]_Q$ a.s. $Q$. Furthermore,
$L_P(X) \subset L_Q(X)$, and for any $V \in L_P(X)$ we have $(V \cdot X)_P = (V \cdot X)_Q$ a.s. $Q$.
Finally, any continuous local $P$-martingale $M$ satisfies $(V \cdot M)^\sim = V \cdot \tilde{M}$
a.s. $Q$ whenever either side exists.
Proof: Consider a continuous $P$-semimartingale $X = M + A$, where $M$ is a
continuous local $P$-martingale and $A$ is a process of locally finite variation.
Under $Q$ we may write $X = \tilde{M} + Z^{-1} \cdot [M, Z] + A$, where $\tilde{M}$ is the continuous
local $Q$-martingale of Theorem 18.19, and we note that $Z^{-1} \cdot [M, Z]$ has
locally finite variation since $Z > 0$ a.s. $Q$ by Lemma 18.17. Thus, $X$ is also
a $Q$-semimartingale. The statement for $[X]$ is now clear from Proposition
17.17.
Now assume that $V \in L_P(X)$. Then $V^2 \in L_P([X])$ and $V \in L_P(A)$, so
the same relations hold under $Q$, and we get $V \in L_Q(\tilde{M} + A)$. Thus, to
get $V \in L_Q(X)$, it remains to show that $V \in L_Q(Z^{-1}[M, Z])$. Since $Z > 0$
under $Q$, it is equivalent to show that $V \in L_Q([M, Z])$. But this is clear by
Proposition 17.9, since $[M, Z]_Q = [\tilde{M}, Z]_Q$ and $V \in L_Q(\tilde{M})$.
To prove the last assertion, we note as before that $L_Q(M) = L_Q(\tilde{M})$. If
$V$ belongs to either class, then by Proposition 17.14 we get under $Q$ the
a.s. relations
$$(V \cdot M)^\sim = V \cdot M - Z^{-1} \cdot [V \cdot M, Z]
= V \cdot M - V Z^{-1} \cdot [M, Z] = V \cdot \tilde{M}. \qquad \Box$$
In particular, we note that if $B$ is a $P$-Brownian motion in $\mathbb{R}^d$, then $\tilde{B}$ is a
$Q$-Brownian motion by Theorem 18.3, since both processes are continuous
martingales with the same covariation process.
The preceding theory simplifies when $P$ and $Q$ are equivalent on each
$\mathcal{F}_t$, since in that case $Z > 0$ a.s. $P$ by Lemma 18.17. If $Z$ is also continuous,
it may be expressed as an exponential martingale. More general processes
of this type are considered in Theorem 26.8.
Lemma 18.21 (real exponential martingales) A continuous process $Z > 0$
is a local martingale iff it has an a.s. representation
$$Z_t = \mathcal{E}(M)_t \equiv \exp\big(M_t - \tfrac{1}{2}[M]_t\big), \qquad t \ge 0, \tag{10}$$
for some continuous local martingale $M$. In that case $M$ is a.s. unique, and
for any continuous local martingale $N$ we have $[M, N] = Z^{-1} \cdot [Z, N]$.
Proof: If $M$ is a continuous local martingale, then so is $\mathcal{E}(M)$ by Itô's
formula. Conversely, assume that $Z > 0$ is a continuous local martingale.
Then by Corollary 17.19,
$$\log Z - \log Z_0 = Z^{-1} \cdot Z - \tfrac{1}{2} Z^{-2} \cdot [Z] = Z^{-1} \cdot Z - \tfrac{1}{2}[Z^{-1} \cdot Z],$$
and (10) follows with $M = \log Z_0 + Z^{-1} \cdot Z$. The last assertion is clear
from this expression, and the uniqueness of $M$ follows from Proposition
17.2. $\Box$
We shall now see how Theorem 18.19 can be used to eliminate the drift
of a continuous semimartingale, and we begin with the simple case of
Brownian motion $B$ with a deterministic drift. Here we need the fact that $\mathcal{E}(B)$
is a true martingale, as can be seen most easily by a direct computation.
By $P \sim Q$ we mean that $P \ll Q$ and $Q \ll P$. Write $L^2_{\mathrm{loc}}$ for the class of
functions $f : \mathbb{R}_+ \to \mathbb{R}^d$ such that $|f|^2$ is locally Lebesgue integrable. For
any $f \in L^2_{\mathrm{loc}}$ we define $f \cdot \lambda = (f^1 \cdot \lambda, \dots, f^d \cdot \lambda)$, where the components on
the right are ordinary Lebesgue integrals.
Theorem 18.22 (shifted Brownian motion, Cameron and Martin) Let $\mathcal{F}$
be the complete filtration induced by canonical Brownian motion $B$ in $\mathbb{R}^d$,
fix a continuous function $h : \mathbb{R}_+ \to \mathbb{R}^d$ with $h_0 = 0$, and write $P_h$ for the
distribution of $B + h$. Then $P_h \sim P_0$ on $\mathcal{F}_t$ for all $t \ge 0$ iff $h = f \cdot \lambda$ for
some $f \in L^2_{\mathrm{loc}}$, in which case $P_h = \mathcal{E}(f \cdot B)_t \cdot P_0$.
Proof: If $P_h \sim P_0$ on each $\mathcal{F}_t$, then by Lemmas 18.15 and 18.17 there
exists some $P_0$-martingale $Z > 0$ such that $P_h = Z_t \cdot P_0$ on $\mathcal{F}_t$ for all $t \ge 0$.
Theorem 18.10 shows that $Z$ is a.s. continuous, and by Lemma 18.21 it
can then be written as $\mathcal{E}(M)$ for some continuous local $P_0$-martingale $M$.
Using Theorem 18.10 again, we note that $M = V \cdot B = \sum_i V^i \cdot B^i$ a.s. for
some processes $V^i \in L(B^1)$, and in particular $V \in L^2_{\mathrm{loc}}$ a.s.
By Theorem 18.19 the process $\tilde{B} = B - [B, M] = B - V \cdot \lambda$ is a
$P_h$-Brownian motion, and so, under $P_h$, the canonical process $B$ has two
semimartingale decompositions, namely
$$B = \tilde{B} + V \cdot \lambda = (B - h) + h.$$
By Proposition 17.2 the decomposition is a.s. unique, and so $V \cdot \lambda = h$ a.s.
Thus, $h = f \cdot \lambda$ for some nonrandom function $f \in L^2_{\mathrm{loc}}$, and furthermore
$\lambda\{t \ge 0;\ V_t \ne f_t\} = 0$ a.s., which implies $M = V \cdot B = f \cdot B$ a.s.
Conversely, assume that $h = f \cdot \lambda$ for some $f \in L^2_{\mathrm{loc}}$. Since $M = f \cdot B$
is a time-changed Brownian motion under $P_0$, the process $Z = \mathcal{E}(M)$ is a
$P_0$-martingale, and by Lemma 18.18 there exists a probability measure $Q$
on $C(\mathbb{R}_+, \mathbb{R}^d)$ such that $Q = Z_t \cdot P_0$ on $\mathcal{F}_t$ for all $t \ge 0$. Moreover, Theorem
18.19 shows that $\tilde{B} = B - [B, M] = B - h$ is a $Q$-Brownian motion, which
means that $Q = P_h$. In particular, $P_h \sim P_0$ on each $\mathcal{F}_t$. $\Box$
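The density formula $P_h = \mathcal{E}(f \cdot B)_t \cdot P_0$ can be checked on one-dimensional marginals in the simplest case $d = 1$ and $h_t = ct$, that is, $f \equiv c$. For a test function $g$ it then reads $E_0\, g(B_t + ct) = E_0[\exp(cB_t - c^2 t/2)\, g(B_t)]$. The Python sketch below evaluates both sides by a trapezoidal rule against the $N(0, t)$ density; the choice $g = \cos$, for which both sides equal $e^{-t/2}\cos(ct)$, and all function names and grid parameters are illustrative assumptions. It verifies only the law of $B_t$ at a fixed time, not the full statement on $\mathcal{F}_t$.

```python
import math

def gaussian_expectation(func, t, half_width=12.0, n=20000):
    """Trapezoidal approximation of E func(B_t) for B_t ~ N(0, t)."""
    sd = math.sqrt(t)
    left = -half_width * sd
    step = 2.0 * half_width * sd / n
    total = 0.0
    for i in range(n + 1):
        x = left + i * step
        weight = 0.5 if i in (0, n) else 1.0
        density = math.exp(-x * x / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)
        total += weight * func(x) * density
    return total * step

def shifted_side(g, c, t):
    """E_0 g(B_t + c t): expectation under the law of the shifted motion."""
    return gaussian_expectation(lambda x: g(x + c * t), t)

def weighted_side(g, c, t):
    """E_0 [exp(c B_t - c^2 t / 2) g(B_t)]: expectation with the density."""
    return gaussian_expectation(
        lambda x: math.exp(c * x - 0.5 * c * c * t) * g(x), t)

c, t = 0.8, 1.0
left_value = shifted_side(math.cos, c, t)
right_value = weighted_side(math.cos, c, t)
exact = math.exp(-t / 2.0) * math.cos(c * t)
```

The two numerical values agree because multiplying the $N(0, t)$ density by $\exp(cx - c^2 t/2)$ yields the $N(ct, t)$ density, which is the marginal form of the theorem.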
In more general cases, Theorem 18.19 and Lemma 18.21 suggest that
we might try to remove the drift of a semimartingale through a change of
measure of the form $Q = \mathcal{E}(M)_t \cdot P$ on $\mathcal{F}_t$ for each $t \ge 0$, where $M$ is
a continuous local martingale with $M_0 = 0$. By Lemma 18.15 it is then
necessary for $Z = \mathcal{E}(M)$ to be a true martingale. This is ensured by the
following condition.
Theorem 18.23 (uniform integrability, Novikov) Let $M$ be a continuous
local martingale with $M_0 = 0$ such that $E e^{[M]_\infty/2} < \infty$. Then $\mathcal{E}(M)$ is a
uniformly integrable martingale.
The result will first be proved in a special case.
Lemma 18.24 (Wald's identity) If $B$ is a real Brownian motion and $\tau$
is an optional time with $E e^{\tau/2} < \infty$, then $E \exp(B_\tau - \tfrac{1}{2}\tau) = 1$.
Proof: We first consider the special optional times
$$\tau_b = \inf\{t \ge 0;\ B_t = t - b\}, \qquad b > 0.$$
Since the $\tau_b$ remain optional with respect to the right-continuous, induced
filtration, we may assume $B$ to be canonical Brownian motion with
associated distribution $P = P_0$. Defining $h_t \equiv t$ and $Z = \mathcal{E}(B)$, we see from
Theorem 18.22 that $P_h = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$. Since $\tau_b < \infty$ a.s. under
both $P$ and $P_h$, Lemma 18.16 yields
$$E \exp(B_{\tau_b} - \tfrac{1}{2}\tau_b) = E Z_{\tau_b} = E[Z_{\tau_b};\ \tau_b < \infty] = P_h\{\tau_b < \infty\} = 1.$$
In the general case, the stopped process $M_t \equiv Z_{t \wedge \tau_b}$ is a positive
martingale, and Fatou's lemma shows that $M$ is also a supermartingale on $[0, \infty]$.
Since, moreover, $EM_\infty = EZ_{\tau_b} = 1 = EM_0$, it is clear from the Doob
decomposition that $M$ is a true martingale on $[0, \infty]$. Hence, by optional
sampling,
$$1 = EM_\tau = EZ_{\tau \wedge \tau_b} = E[Z_\tau;\ \tau \le \tau_b] + E[Z_{\tau_b};\ \tau > \tau_b]. \tag{11}$$
By the definition of $\tau_b$ and the hypothesis on $\tau$, we get as $b \to \infty$
$$E[Z_{\tau_b};\ \tau > \tau_b] = e^{-b} E[e^{\tau_b/2};\ \tau > \tau_b] \le e^{-b} E e^{\tau/2} \to 0,$$
and so the last term in (11) tends to zero. Since, moreover, $\tau_b \to \infty$, the
first term on the right tends to $EZ_\tau$ by monotone convergence, and the
desired relation $EZ_\tau = 1$ follows. $\Box$
Proof of Theorem 18.23: Since $\mathcal{E}(M)$ is always a supermartingale on
$[0, \infty]$, it is enough, under the stated condition, to show that
$E\,\mathcal{E}(M)_\infty = 1$. We may then use Theorem 18.4 and Proposition 7.9 to reduce to the
statement of Lemma 18.24. $\Box$
In particular, we obtain the following classical result for Brownian
motion.
Corollary 18.25 (removal of drift, Girsanov) Consider in $\mathbb{R}^d$ a Brownian
motion $B$ and a progressive process $V$ with $E \exp\{\tfrac{1}{2}(|V|^2 \cdot \lambda)_\infty\} < \infty$.
Then $Q = \mathcal{E}(V' \cdot B)_\infty \cdot P$ is a probability measure, and $\tilde{B} = B - V \cdot \lambda$ is a
$Q$-Brownian motion.
Proof: Combine Theorems 18.19 and 18.23. $\Box$
Exercises
1. Assume in Theorem 18.4 that $[M]_\infty = \infty$ a.s. Show that $M$ is
$\tau$-continuous in the sense of Theorem 17.24, and use Theorem 18.3 to conclude
that $B = M \circ \tau$ is a Brownian motion. Also show for any $V \in L(M)$ that
$(V \circ \tau) \cdot B = (V \cdot M) \circ \tau$ a.s.
2. If $B$ is a real Brownian motion and $V \in L(B)$, then $X = V \cdot B$ is a
time-changed Brownian motion. Express the required time-change $\tau$ in terms of
$V$, and verify that $X$ is $\tau$-continuous.
3. Let $M$ be a real continuous local martingale. Show that $M$ converges
a.s. on the set $\{\sup_t M_t < \infty\}$. (Hint: Use Theorem 18.4.)
4. Let $\mathcal{F}$ and $\mathcal{G}$ be filtrations on a common probability space $(\Omega, \mathcal{A}, P)$.
Show that $\mathcal{G}$ is a standard extension of $\mathcal{F}$ iff every $\mathcal{F}$-martingale is also
a $\mathcal{G}$-martingale. (Hint: Consider martingales of the form $M_t = E[\xi \mid \mathcal{F}_t]$,
where $\xi \in L^1(\mathcal{F}_\infty)$. Here $M_t$ is $\mathcal{G}_t$-measurable for all $\xi$ iff $\mathcal{F}_t \subset \mathcal{G}_t$, and then
$M_t = E[\xi \mid \mathcal{G}_t]$ a.s. for all $\xi$ iff $\mathcal{F} \perp\!\!\!\perp_{\mathcal{F}_t} \mathcal{G}_t$ by Proposition 6.6.)
5. Let $\mathcal{F}$ and $\mathcal{G}$ be right-continuous filtrations such that $\mathcal{G}$ is a standard
extension of $\mathcal{F}$, and let $\tau$ be an $\mathcal{F}$-optional time. Show that
$\mathcal{F}_\tau \subset \mathcal{G}_\tau \perp\!\!\!\perp_{\mathcal{F}_\tau} \mathcal{F}$.
(Hint: Apply optional sampling to the uniformly integrable martingale
$M_t = E[\xi \mid \mathcal{F}_t]$ for any $\xi \in L^1(\mathcal{F}_\infty)$.)
6. Let $M$ be a nontrivial isotropic continuous local martingale in $\mathbb{R}^d$, and
fix an affine transformation $f$ on $\mathbb{R}^d$. Show that even $f(M)$ is isotropic iff
$f$ is conformal (i.e., the composition of a rigid motion with a change of
scale).
7. Deduce Theorem 18.6 (ii) from Theorem 9.8. (Hint: Define
$\tau = \inf\{t;\ |B_t| = 1\}$, and iterate the construction to form a random walk in
$\mathbb{R}^d$ with steps of size 1.)
8. Deduce Theorem 18.3 for $d = 1$ from Theorem 14.17. (Hint: Proceed
as above to construct a discrete-time martingale with jumps of size $h$. Let
$h \to 0$, and use a version of Proposition 17.17.)
9. Consider a real Brownian motion $B$ and a family of progressive processes
$V^t \in L(B)$, $t \ge 0$. Give necessary and sufficient conditions on the $V^t$ for
the existence of a Brownian motion $B'$, such that $B'_t = (V^t \cdot B)_\infty$ a.s. for
each $t$. Verify the conditions in the case of Proposition 18.9.
10. Extend Proposition 18.9 to any continuous, $\mathcal{F}$-exchangeable process $X$
on $\mathbb{R}_+$ or $[0, 1]$. (Hint: Recall that $X_t = \alpha t + \sigma B_t$ for some Brownian motion
or bridge $B$ and some independent pair of random variables $\alpha$ and $\sigma \ge 0$.
Note that $X$ remains exchangeable for the filtration $\mathcal{G}_t = \mathcal{F}_t \vee \sigma\{\alpha, \sigma\}$.
Hence, so is $B$, and we may apply Proposition 18.9.)
11. Use Proposition 18.9 to give direct proofs of the relation $\tau_1 \overset{d}{=} \tau_2$ in
Theorems 13.16 and 13.17. (Hint: Imitate the proof of Theorem 11.14.)
12. For a Brownian motion $B$ and optional time $\tau < \infty$, show that
$E \exp(B_\tau - \tfrac{1}{2}\tau) \le 1$, where the inequality may be strict. (Hint: Truncate and
use Fatou's lemma. Note that $t - 2B_t \to \infty$ by the law of large numbers.)
Chapter 19
Feller Processes and Semigroups
Semigroups, resolvents, and generators; closure and core; Hille-
Yosida theorem; existence and regularization; strong Markov
property; characteristic operator; diffusions and elliptic oper-
ators; convergence and approximation
Our aim in this chapter is to continue the general discussion of continuous-
time Markov processes initiated in Chapter 8. We have already seen several
important examples of such processes, such as the pure jump-type processes
in Chapter 12, Brownian motion in Chapters 13 and 18, and the general
Lévy processes in Chapter 15. The present treatment will be supplemented
by detailed studies of ergodic properties in Chapter 20, of diffusions in
Chapters 21 and 23, and of excursions and additive functionals in Chapters
22 and 25.
The crucial new idea is to regard the transition kernels as operators $T_t$
on an appropriate function space. The Chapman-Kolmogorov relation then
turns into the semigroup property $T_s T_t = T_{s+t}$, which suggests a formal
representation $T_t = e^{tA}$ in terms of a generator $A$. Under suitable regularity
conditions, the so-called Feller properties, it is indeed possible to define
a generator $A$ that describes the infinitesimal evolution of the underlying
process $X$. Under further hypotheses, $X$ will be shown to have continuous
paths iff $A$ is (an extension of) an elliptic differential operator. In general,
the powerful Hille-Yosida theorem provides the precise conditions for the
existence of a Feller process corresponding to a given operator $A$.
Using the basic regularity theorem for submartingales from Chapter 7, it
will be shown that every Feller process has a version that is right-continuous
with left-hand limits (rcll). Given this fundamental result, it is straightfor-
ward to extend the strong Markov property to arbitrary Feller processes.
We shall also explore some profound connections with martingale theory.
Finally, we shall establish a general continuity theorem for Feller processes
and deduce a corresponding approximation of discrete-time Markov chains
by diffusions and other continuous-time Markov processes. The proofs of
the latter results will require some weak convergence theory from Chapter
16.
To clarify the connection between transition kernels and operators, let
$\mu$ be an arbitrary probability kernel on some measurable space $(S, \mathcal{S})$. We
may then introduce an associated transition operator $T$, given by
$$Tf(x) = (Tf)(x) = \int \mu(x, dy)\, f(y), \qquad x \in S, \tag{1}$$
where $f : S \to \mathbb{R}$ is assumed to be measurable and either bounded or
nonnegative. Approximating $f$ by simple functions, we see that by monotone
convergence $Tf$ is again a measurable function on $S$. It is also clear that
$T$ is a positive contraction operator, in the sense that $0 \le f \le 1$ implies
$0 \le Tf \le 1$. A special role is played by the identity operator $I$, which
corresponds to the kernel $\mu(x, \cdot) = \delta_x$. The importance of transition operators
for the study of Markov processes is due to the following simple fact.
Lemma 19.1 (transition kernels and operators) The probability kernels
$\mu_t$, $t \ge 0$, satisfy the Chapman-Kolmogorov relation iff the corresponding
transition operators $T_t$ have the semigroup property
$$T_{s+t} = T_s T_t, \qquad s, t \ge 0. \tag{2}$$
Proof: For any $B \in \mathcal{S}$ we have $T_{s+t} 1_B(x) = \mu_{s+t}(x, B)$ and
$$(T_s T_t) 1_B(x) = T_s (T_t 1_B)(x) = \int \mu_s(x, dy)\, (T_t 1_B)(y)
= \int \mu_s(x, dy)\, \mu_t(y, B) = (\mu_s \mu_t)(x, B).$$
Thus, the Chapman-Kolmogorov relation is equivalent to
$T_{s+t} 1_B = (T_s T_t) 1_B$ for any $B \in \mathcal{S}$. The latter relation extends to (2) by linearity
and monotone convergence. $\Box$
By analogy with the situation for the Cauchy equation, one might hope to
represent the semigroup in the form $T_t = e^{tA}$, $t \ge 0$, for a suitable generator
$A$. For the formula to make sense, the operator $A$ must be suitably bounded,
so that the exponential function can be defined through a Taylor expansion.
We shall consider a simple case when such a representation exists.
Proposition 19.2 (pseudo-Poisson processes) Let $(T_t)$ be the transition
semigroup of a pure jump-type Markov process in $S$ with bounded rate
kernel $\alpha$. Then $T_t = e^{tA}$ for all $t \ge 0$, where for any bounded measurable
function $f : S \to \mathbb{R}$,
$$Af(x) = \int (f(y) - f(x))\, \alpha(x, dy), \qquad x \in S.$$
Proof: Choose a probability kernel $\mu$ and a constant $c \ge 0$ such that
$\alpha(x, B) = c\,\mu(x, B \setminus \{x\})$. From Proposition 12.20 we see that the process is
pseudo-Poisson of the form $X = Y \circ N$, where $Y$ is a discrete-time Markov
chain with transition kernel $\mu$, and $N$ is an independent Poisson process
with fixed rate $c$. Letting $T$ denote the transition operator associated with
$\mu$, we get for any $t \ge 0$ and $f$ as stated,
$$T_t f(x) = E_x f(X_t) = \sum_{n \ge 0} E_x[f(Y_n);\, N_t = n]
  = \sum_{n \ge 0} P\{N_t = n\}\, E_x f(Y_n)
  = \sum_{n \ge 0} e^{-ct} \frac{(ct)^n}{n!}\, T^n f(x) = e^{ct(T - I)} f(x).$$
Hence, $T_t = e^{tA}$ holds for $t \ge 0$ with
$$Af(x) = c(T - I)f(x) = c \int (f(y) - f(x))\,\mu(x, dy)
  = \int (f(y) - f(x))\,\alpha(x, dy). \qquad \Box$$
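As a numerical sanity check of the last computation, one may take a finite state space, where $T$ is a stochastic matrix, and compare the Poisson mixture of the powers $T^n$ with the matrix exponential of $tA = ct(T - I)$. This is only a sketch: the matrix `T`, the rate `c`, and the time `t` are arbitrary illustrative choices, and the exponential is summed naively from its Taylor series.

```python
import math

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def mat_exp(a, terms=60):
    """exp(a) by its Taylor series; adequate for the small matrices used here."""
    n = len(a)
    result = identity(n)
    power = identity(n)
    for k in range(1, terms):
        power = mat_mul(power, a)
        power = [[v / k for v in row] for row in power]  # now equals a^k / k!
        result = [[result[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return result

def poisson_mixture(T, c, t, terms=60):
    """sum_n e^{-ct} (ct)^n / n! * T^n, i.e. the chain Y run for a Poisson(ct) number of steps."""
    n = len(T)
    result = [[0.0] * n for _ in range(n)]
    power = identity(n)
    weight = math.exp(-c * t)
    for k in range(terms):
        result = [[result[i][j] + weight * power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, T)
        weight *= c * t / (k + 1)
    return result

T = [[0.2, 0.8, 0.0], [0.5, 0.0, 0.5], [0.3, 0.3, 0.4]]  # hypothetical kernel mu of Y
c, t = 2.0, 0.7
A = [[c * (T[i][j] - (1.0 if i == j else 0.0)) for j in range(3)] for i in range(3)]
tA = [[t * v for v in row] for row in A]

lhs = poisson_mixture(T, c, t)  # sum_n P{N_t = n} T^n
rhs = mat_exp(tA)               # e^{tA}
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
row_sums = [sum(row) for row in rhs]
print(max_gap, row_sums)
```

The two matrices agree up to rounding, and the rows of $e^{tA}$ sum to 1, as they should for a transition kernel.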
For the further analysis, we assume $S$ to be a locally compact, separable
metric space, and we write $C_0 = C_0(S)$ for the class of continuous functions
$f\colon S \to \mathbb{R}$ with $f(x) \to 0$ as $x \to \infty$. We can make $C_0$ into a Banach
space by introducing the norm $\|f\| = \sup_x |f(x)|$. A semigroup of positive
contraction operators $T_t$ on $C_0$ is called a Feller semigroup if it has the
additional regularity properties
(F$_1$) $T_t C_0 \subset C_0$, $t \ge 0$,
(F$_2$) $T_t f(x) \to f(x)$ as $t \to 0$, $f \in C_0$, $x \in S$.
In Theorem 19.6 we show that (F$_1$) and (F$_2$) together with the semigroup
property imply the strong continuity
(F$_3$) $T_t f \to f$ as $t \to 0$, $f \in C_0$.
For motivation, we proceed to clarify the probabilistic significance of
those conditions. Then assume for simplicity that $S$ is compact, and also
that $(T_t)$ is conservative in the sense that $T_t 1 = 1$ for all $t$. For every initial
state $x$, we may then introduce an associated Markov process $X^x_t$, $t \ge 0$,
with transition operators $T_t$.
Lemma 19.3 (Feller properties) Let $(T_t)$ be a conservative transition
semigroup on a compact metric space $(S, \rho)$. Then
(F$_1$) holds iff $X^x_t \xrightarrow{d} X^y_t$ as $x \to y$ for fixed $t \ge 0$;
(F$_2$) holds iff $X^x_t \xrightarrow{P} x$ as $t \to 0$ for fixed $x$;
(F$_3$) holds iff $\sup_x E_x[\rho(X_s, X_t) \wedge 1] \to 0$ as $s - t \to 0$.
Proof: The first two statements are obvious, so we shall prove only the
third one. Then choose a dense sequence $f_1, f_2, \dots$ in $C = C(S)$. By the
compactness of $S$ we note that $x_n \to x$ in $S$ iff $f_k(x_n) \to f_k(x)$ for each $k$.
Thus, $\rho$ is topologically equivalent to the metric
$$\rho'(x, y) = \sum_k 2^{-k} \bigl(|f_k(x) - f_k(y)| \wedge 1\bigr), \qquad x, y \in S.$$
Since $S$ is compact, the identity mapping on $S$ is uniformly continuous with
respect to $\rho$ and $\rho'$, and so we may assume that $\rho = \rho'$.
Next we note that, for any $f \in C$, $x \in S$, and $t, h \ge 0$,
$$E_x\bigl(f(X_t) - f(X_{t+h})\bigr)^2 = E_x\bigl(f^2 - 2 f\, T_h f + T_h f^2\bigr)(X_t)
  \le \|f^2 - 2 f\, T_h f + T_h f^2\|
  \le 2\|f\|\,\|f - T_h f\| + \|f^2 - T_h f^2\|.$$
Assuming (F$_3$), we get $\sup_x E_x|f_k(X_s) - f_k(X_t)| \to 0$ as $s - t \to 0$ for fixed
$k$, and so by dominated convergence $\sup_x E_x \rho(X_s, X_t) \to 0$. Conversely,
the latter condition yields $T_h f_k \to f_k$ for each $k$, which implies (F$_3$). $\Box$
Our aim is now to construct the generator of an arbitrary Feller semigroup
$(T_t)$ on $C_0$. In general, there is no bounded linear operator $A$
satisfying $T_t = e^{tA}$, and we need to look for a suitable substitute. For
motivation, we note that if $p$ is a real-valued function on $\mathbb{R}_+$ with
representation $p_t = e^{ta}$, then $a$ can be recovered from $p$ by either differentiation
or integration:
$$t^{-1}(p_t - 1) \to a \quad \text{as } t \to 0;$$
$$\int_0^\infty e^{-\lambda t} p_t\, dt = (\lambda - a)^{-1}, \qquad \lambda > 0.$$
Motivated by the latter formula, we introduce for each $\lambda > 0$ the
associated resolvent or potential $R_\lambda$, defined as the Laplace transform
$$R_\lambda f = \int_0^\infty e^{-\lambda t} (T_t f)\, dt, \qquad f \in C_0.$$
Note that the integral exists, since $T_t f(x)$ is bounded and right-continuous
in $t \ge 0$ for fixed $x \in S$.
Theorem 19.4 (resolvents and generator) Let $(T_t)$ be a Feller semigroup
on $C_0$ with resolvents $R_\lambda$, $\lambda > 0$. Then the operators $\lambda R_\lambda$ are injective
contractions on $C_0$ such that $\lambda R_\lambda \to I$ strongly as $\lambda \to \infty$. Furthermore,
the range $\mathcal{D} = R_\lambda C_0$ is independent of $\lambda$ and dense in $C_0$, and there exists
an operator $A$ on $C_0$ with domain $\mathcal{D}$ such that $R_\lambda^{-1} = \lambda - A$ on $\mathcal{D}$ for every
$\lambda > 0$. Finally, $A$ commutes on $\mathcal{D}$ with every $T_t$.
Proof: If $f \in C_0$, then (F$_1$) yields $T_t f \in C_0$ for every $t$, and so by
dominated convergence we have even $R_\lambda f \in C_0$. To prove the stated contraction
property, we write for any $f \in C_0$
$$\|\lambda R_\lambda f\| \le \lambda \int_0^\infty e^{-\lambda t} \|T_t f\|\, dt
  \le \lambda \|f\| \int_0^\infty e^{-\lambda t}\, dt = \|f\|.$$
A simple computation yields the resolvent equation
$$R_\lambda - R_\mu = (\mu - \lambda) R_\lambda R_\mu, \qquad \lambda, \mu > 0, \tag{3}$$
which shows that the operators $R_\lambda$ commute and have a common range $\mathcal{D}$.
If $f = R_1 g$ with $g \in C_0$, we get by (3) and as $\lambda \to \infty$
$$\|\lambda R_\lambda f - f\| = \|(\lambda R_\lambda - I) R_1 g\| = \|(R_1 - I) R_\lambda g\|
  \le \lambda^{-1} \|R_1 - I\|\, \|g\| \to 0.$$
The convergence extends by a simple approximation to the closure of $\mathcal{D}$.
Now introduce the one-point compactification $\hat S = S \cup \{\Delta\}$ of $S$, and
extend any $f \in C_0$ to $\hat C = C(\hat S)$ by putting $f(\Delta) = 0$. If $\overline{\mathcal{D}} \ne C_0$, then by
the Hahn-Banach theorem there exists a bounded linear functional $\varphi \not\equiv 0$
on $\hat C$ such that $\varphi R_1 f = 0$ for all $f \in C_0$. By Riesz's representation Theorem
2.22 we may extend $\varphi$ to a bounded, signed measure on $\hat S$. Letting $f \in C_0$
and using (F$_2$), we get by dominated convergence as $\lambda \to \infty$
$$0 = \lambda \varphi R_\lambda f = \int \varphi(dx) \int_0^\infty \lambda e^{-\lambda t}\, T_t f(x)\, dt
  = \int \varphi(dx) \int_0^\infty e^{-s}\, T_{s/\lambda} f(x)\, ds \to \varphi f,$$
and so $\varphi = 0$. The contradiction shows that $\mathcal{D}$ is dense in $C_0$.
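To see that the operators $R_\lambda$ are injective, let $f \in C_0$ with $R_{\lambda_0} f = 0$
for some $\lambda_0 > 0$. Then (3) yields $R_\lambda f = 0$ for every $\lambda > 0$, and since
$\lambda R_\lambda f \to f$ as $\lambda \to \infty$, we get $f = 0$. Hence, the inverses $R_\lambda^{-1}$ exist on $\mathcal{D}$.
Multiplying (3) by $R_\lambda^{-1}$ from the left and by $R_\mu^{-1}$ from the right, we get on
$\mathcal{D}$ the relation $R_\mu^{-1} - R_\lambda^{-1} = \mu - \lambda$. Thus, the operator $A = \lambda - R_\lambda^{-1}$ on $\mathcal{D}$
is independent of $\lambda$.
To prove the final assertion, we note that $T_t$ and $R_\lambda$ commute for any
$t, \lambda > 0$, and write
$$T_t(\lambda - A) R_\lambda = T_t = (\lambda - A) R_\lambda T_t = (\lambda - A) T_t R_\lambda. \qquad \Box$$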
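In the simplest finite-state situation the generator is a bounded rate matrix and the resolvent is an ordinary matrix inverse, so the statements of the theorem can be verified directly. The sketch below assumes a hypothetical two-state rate matrix `A` and checks the resolvent equation (3), the contraction property of $\lambda R_\lambda$, and the convergence $\lambda R_\lambda \to I$.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul2(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

A = [[-1.0, 1.0], [2.0, -2.0]]  # hypothetical rate matrix of a two-state chain

def resolvent(lam):
    """R_lambda = (lambda - A)^{-1}."""
    return inv2([[lam - A[0][0], -A[0][1]], [-A[1][0], lam - A[1][1]]])

lam, mu = 0.5, 3.0
R_lam, R_mu = resolvent(lam), resolvent(mu)
prod = mul2(R_lam, R_mu)

# Resolvent equation (3): R_lam - R_mu = (mu - lam) R_lam R_mu
eq3_gap = max(abs(R_lam[i][j] - R_mu[i][j] - (mu - lam) * prod[i][j])
              for i in range(2) for j in range(2))

# lam * R_lam has nonnegative rows summing to 1, hence is a positive contraction.
row_sums = [lam * sum(row) for row in R_lam]

def gap_to_identity(l):
    """Largest entry of |l * R_l - I|."""
    R = resolvent(l)
    return max(abs(l * R[i][j] - (1.0 if i == j else 0.0))
               for i in range(2) for j in range(2))

gaps = [gap_to_identity(l) for l in (1.0, 10.0, 100.0, 1000.0)]
print(eq3_gap, row_sums, gaps)
```

For this matrix the gap to the identity equals $2/(\lambda + 3)$, which makes the rate of convergence in $\lambda R_\lambda \to I$ explicit.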
The operator $A$ in Theorem 19.4 is called the generator of the semigroup
$(T_t)$. If we want to emphasize the role of the domain $\mathcal{D}$, we say that $(T_t)$
has generator $(A, \mathcal{D})$. The term is justified by the following lemma.
Lemma 19.5 (uniqueness) A Feller semigroup is uniquely determined by
its generator.
Proof: The operator $A$ determines $R_\lambda = (\lambda - A)^{-1}$ for all $\lambda > 0$. By the
uniqueness theorem for Laplace transforms, it then determines the measure
$\mu(dt) = T_t f(x)\, dt$ on $\mathbb{R}_+$ for any $f \in C_0$ and $x \in S$. Since the density $T_t f(x)$
is right-continuous in $t$ for fixed $x$, the assertion follows. $\Box$
We now aim to show that any Feller semigroup is strongly continuous
and to derive abstract versions of Kolmogorov's forward and backward
equations.
Theorem 19.6 (strong continuity, forward and backward equations) Let
$(T_t)$ be a Feller semigroup with generator $(A, \mathcal{D})$. Then $(T_t)$ is strongly
continuous and satisfies
$$T_t f - f = \int_0^t T_s A f\, ds, \qquad f \in \mathcal{D},\ t \ge 0. \tag{4}$$
Furthermore, $T_t f$ is differentiable at 0 iff $f \in \mathcal{D}$, in which case
$$\frac{d}{dt}(T_t f) = T_t A f = A T_t f, \qquad t \ge 0. \tag{5}$$
To prove this result, we introduce the so-called Yosida approximation
$$A^\lambda = \lambda A R_\lambda = \lambda(\lambda R_\lambda - I), \qquad \lambda > 0, \tag{6}$$
and the associated semigroup $T^\lambda_t = e^{tA^\lambda}$, $t \ge 0$. The latter is clearly the
transition semigroup of a pseudo-Poisson process with rate $\lambda$ based on the
transition operator $\lambda R_\lambda$.
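For orientation only, the scalar case of the earlier motivating example $p_t = e^{ta}$ with $a \le 0$ shows what the approximation does; this computation is an illustration and is not used in the proofs that follow.

```latex
R_\lambda = \int_0^\infty e^{-\lambda t} e^{ta}\,dt = \frac{1}{\lambda - a},
\qquad
A^\lambda = \lambda(\lambda R_\lambda - 1) = \frac{\lambda a}{\lambda - a},
\qquad
|A^\lambda - a| = \frac{a^2}{\lambda - a} \longrightarrow 0
\quad (\lambda \to \infty).
```

Thus $A^\lambda$ is bounded by $\lambda$ for each fixed $\lambda$, yet recovers $a$ in the limit, which is the scalar shadow of the convergence $A^\lambda f \to Af$ on the domain of $A$.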
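Lemma 19.7 (Yosida approximation) For any $f \in \mathcal{D}$, we have
$$\|T_t f - T^\lambda_t f\| \le t\, \|Af - A^\lambda f\|, \qquad t, \lambda > 0, \tag{7}$$
and $A^\lambda f \to Af$ as $\lambda \to \infty$. Furthermore, $T^\lambda_t f \to T_t f$ as $\lambda \to \infty$ for each
$f \in C_0$, uniformly for bounded $t \ge 0$.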
Proof: By Theorem 19.4 we have $A^\lambda f = \lambda R_\lambda A f \to Af$ for any $f \in \mathcal{D}$.
For fixed $\lambda > 0$ it is further clear that $h^{-1}(T^\lambda_h - I) \to A^\lambda$ in the norm
topology as $h \to 0$. Now for any commuting contraction operators $B$ and $C$,
$$\|B^n f - C^n f\| \le \|B^{n-1} + B^{n-2} C + \cdots + C^{n-1}\|\, \|Bf - Cf\|
  \le n\, \|Bf - Cf\|.$$
Fixing any $f \in C_0$ and $t, \lambda, \mu > 0$, we hence obtain as $h = t/n \to 0$
$$\|T^\lambda_t f - T^\mu_t f\| \le n\, \|T^\lambda_h f - T^\mu_h f\|
  = t\, \Bigl\| \frac{T^\lambda_h f - f}{h} - \frac{T^\mu_h f - f}{h} \Bigr\|
  \to t\, \|A^\lambda f - A^\mu f\|.$$
For $f \in \mathcal{D}$ it follows that $T^\lambda_t f$ is Cauchy convergent as $\lambda \to \infty$ for fixed $t$,
and since $\mathcal{D}$ is dense in $C_0$, the same property holds for arbitrary $f \in C_0$.
Denoting the limit by $\tilde T_t f$, we get in particular
$$\|T^\lambda_t f - \tilde T_t f\| \le t\, \|A^\lambda f - Af\|, \qquad f \in \mathcal{D},\ t \ge 0. \tag{8}$$
Thus, for each $f \in \mathcal{D}$ we have $T^\lambda_t f \to \tilde T_t f$ as $\lambda \to \infty$, uniformly for bounded
$t$, which again extends to all $f \in C_0$.
To identify $\tilde T_t$, we may use the resolvent equation (3) to obtain, for any
$f \in C_0$ and $\lambda, \mu > 0$,
$$\int_0^\infty e^{-\lambda t}\, T^\mu_t\, \mu R_\mu f\, dt = (\lambda - A^\mu)^{-1} \mu R_\mu f
  = \frac{\mu}{\lambda + \mu}\, R_\nu f, \tag{9}$$
where $\nu = \lambda\mu(\lambda + \mu)^{-1}$. As $\mu \to \infty$, we have $\nu \to \lambda$, and so $R_\nu f \to R_\lambda f$.
Furthermore,
$$\|T^\mu_t\, \mu R_\mu f - \tilde T_t f\| \le \|\mu R_\mu f - f\| + \|T^\mu_t f - \tilde T_t f\| \to 0,$$
so from (9) we get by dominated convergence $\int e^{-\lambda t}\, \tilde T_t f\, dt = R_\lambda f$. Hence,
the semigroups $(T_t)$ and $(\tilde T_t)$ have the same resolvent operators $R_\lambda$, and so
they agree by Lemma 19.5. In particular, (7) then follows from (8). $\Box$
Proof of Theorem 19.6: The semigroup $(T^\lambda_t)$ is clearly norm continuous
in $t$ for each $\lambda > 0$, and so the strong continuity of $(T_t)$ follows by Lemma
19.7 as $\lambda \to \infty$. Furthermore, we note that $h^{-1}(T^\lambda_h - I) \to A^\lambda$ as $h \downarrow 0$.
Using the semigroup relation and continuity, we obtain more generally
$$\frac{d}{dt} T^\lambda_t = A^\lambda T^\lambda_t = T^\lambda_t A^\lambda, \qquad t \ge 0,$$
which implies
$$T^\lambda_t f - f = \int_0^t T^\lambda_s A^\lambda f\, ds, \qquad f \in C_0,\ t \ge 0. \tag{10}$$
If $f \in \mathcal{D}$, then by Lemma 19.7 we get as $\lambda \to \infty$
$$\|T^\lambda_s A^\lambda f - T_s A f\| \le \|A^\lambda f - Af\| + \|T^\lambda_s A f - T_s A f\| \to 0,$$
uniformly for bounded $s$, and so (4) follows from (10) as $\lambda \to \infty$. By the
strong continuity of $T_t$ we may differentiate (4) to get the first relation in
(5). The second relation holds by Theorem 19.4.
Conversely, assume that $h^{-1}(T_h f - f) \to g$ for some pair of functions
$f, g \in C_0$. As $h \to 0$, we get
$$A R_\lambda f \leftarrow \frac{T_h - I}{h}\, R_\lambda f = R_\lambda\, \frac{T_h f - f}{h} \to R_\lambda g,$$
and so
$$f = (\lambda - A) R_\lambda f = \lambda R_\lambda f - A R_\lambda f = R_\lambda(\lambda f - g) \in \mathcal{D}. \qquad \Box$$
In applications, the domain of a generator $A$ is often hard to identify or
too large to be convenient for computations. It is then useful to restrict $A$
to a suitable subdomain. An operator $A$ with domain $\mathcal{D}$ on some Banach
space $B$ is said to be closed if its graph $G = \{(f, Af);\ f \in \mathcal{D}\}$ is a closed
subset of $B^2$. In general, we say that $A$ is closable if the closure $\bar G$ is the
graph of a single-valued operator $\bar A$, the so-called closure of $A$. Note that
$A$ is closable iff the conditions $\mathcal{D} \ni f_n \to 0$ and $Af_n \to g$ imply $g = 0$.
When $A$ is closed, a core for $A$ is defined as a linear subspace $D \subset \mathcal{D}$ such
that the restriction $A|_D$ has closure $A$. In this case, $A$ is clearly uniquely
determined by $A|_D$. We shall give some conditions ensuring that $D \subset \mathcal{D}$ is
a core when $A$ is the generator of a Feller semigroup $(T_t)$ on $C_0$.
Lemma 19.8 (closure and cores) The generator $(A, \mathcal{D})$ of a Feller semigroup
is closed, and for any $\lambda > 0$ a subspace $D \subset \mathcal{D}$ is a core for $A$ iff
$(\lambda - A)D$ is dense in $C_0$.
Proof: Assume that $f_1, f_2, \dots \in \mathcal{D}$ with $f_n \to f$ and $Af_n \to g$. Then
$(I - A)f_n \to f - g$, and since $R_1$ is bounded, it follows that $f_n \to R_1(f - g)$.
Hence, $f = R_1(f - g) \in \mathcal{D}$, and we have $(I - A)f = f - g$, or $g = Af$.
Thus, $A$ is closed.
If $D$ is a core for $A$, then for any $g \in C_0$ and $\lambda > 0$ there exist some
$f_1, f_2, \dots \in D$ with $f_n \to R_\lambda g$ and $Af_n \to A R_\lambda g$, and we get $(\lambda - A)f_n \to
(\lambda - A) R_\lambda g = g$. Thus, $(\lambda - A)D$ is dense in $C_0$.
Conversely, assume that $(\lambda - A)D$ is dense in $C_0$. To show that $D$ is a
core, fix any $f \in \mathcal{D}$. By hypothesis we may choose some $f_1, f_2, \dots \in D$ with
$$g_n \equiv (\lambda - A)f_n \to (\lambda - A)f \equiv g.$$
Since $R_\lambda$ is bounded, we obtain $f_n = R_\lambda g_n \to R_\lambda g = f$, and thus
$$Af_n = \lambda f_n - g_n \to \lambda f - g = Af. \qquad \Box$$
A subspace $D \subset C_0$ is said to be invariant under $(T_t)$ if $T_t D \subset D$ for all
$t \ge 0$. In particular, we note that, for any subset $B \subset C_0$, the linear span
of $\bigcup_t T_t B$ is an invariant subspace of $C_0$.
Proposition 19.9 (invariance and cores, Watanabe) If $(A, \mathcal{D})$ is the generator
of a Feller semigroup, then any dense, invariant subspace $D \subset \mathcal{D}$ is
a core for $A$.
Proof: By the strong continuity of $(T_t)$ we note that $R_1$ can be approximated
in the strong topology by some finite linear combinations $L_1, L_2, \dots$
of the operators $T_t$. Now fix any $f \in D$, and define $g_n = L_n f$. Noting that
$A$ and $L_n$ commute on $D$ by Theorem 19.4, we get
$$(I - A)g_n = (I - A)L_n f = L_n(I - A)f \to R_1(I - A)f = f.$$
Since $g_n \in D$ and $D$ is dense in $C_0$, it follows that $(I - A)D$ is dense in $C_0$.
Hence, $D$ is a core by Lemma 19.8. $\Box$
The Lévy processes in $\mathbb{R}^d$ are the archetypes of Feller processes, and we
proceed to identify their generators. Let $C_0^\infty$ denote the class of all infinitely
differentiable functions $f$ on $\mathbb{R}^d$ such that $f$ and all its derivatives belong
to $C_0 = C_0(\mathbb{R}^d)$.
Theorem 19.10 (Lévy processes) Let $T_t$, $t \ge 0$, be the transition operators
of a Lévy process in $\mathbb{R}^d$ with characteristics $(a, b, \nu)$. Then $(T_t)$
is a Feller semigroup, and $C_0^\infty$ is a core for the associated generator $A$.
Moreover, we have for any $f \in C_0^\infty$ and $x \in \mathbb{R}^d$
$$Af(x) = \tfrac{1}{2} \sum_{i,j} a_{ij} f''_{ij}(x) + \sum_i b_i f'_i(x)
  + \int \Bigl\{ f(x + y) - f(x) - \sum_i y_i f'_i(x)\, 1\{|y| \le 1\} \Bigr\}\, \nu(dy). \tag{11}$$
In particular, a standard Brownian motion in $\mathbb{R}^d$ has generator $\tfrac{1}{2}\Delta$, and
the uniform motion with velocity $b \in \mathbb{R}^d$ has generator $b\nabla$, both on the core
$C_0^\infty$. Here $\Delta$ and $\nabla$ denote the Laplace and gradient operators, respectively.
Also note that the generator of the jump component has the same form
as for the pseudo-Poisson processes in Proposition 19.2, apart from the
compensation for small jumps by a linear drift term.
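The Brownian case can be checked numerically in dimension one. For the test function $f(x) = e^{-x^2}$, Gaussian convolution gives the closed form $T_t f(x) = (1 + 2t)^{-1/2} \exp\{-x^2/(1 + 2t)\}$, so the difference quotient $t^{-1}(T_t f - f)$ can be compared with $\tfrac{1}{2} f''$. The choice of test function, the evaluation points, and the small time `t` below are illustrative assumptions.

```python
import math

def f(x):
    return math.exp(-x * x)

def heat_semigroup(t, x):
    """T_t f(x) = E f(x + B_t) for f(x) = exp(-x^2), from Gaussian convolution."""
    return math.exp(-x * x / (1.0 + 2.0 * t)) / math.sqrt(1.0 + 2.0 * t)

def half_laplacian(x):
    """(1/2) f''(x) = (2 x^2 - 1) exp(-x^2)."""
    return (2.0 * x * x - 1.0) * math.exp(-x * x)

t = 1e-6
points = [-1.5, -0.5, 0.0, 0.5, 2.0]
errors = [abs((heat_semigroup(t, x) - f(x)) / t - half_laplacian(x)) for x in points]
max_error = max(errors)
print(max_error)
```

The discrepancy is of order $t$, in line with the convergence $t^{-1}(T_t f - f) \to Af$ for $f$ in the domain.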
Proof of Theorem 19.10: As $t \to 0$, we have $\mu_t^{*[t^{-1}]} \xrightarrow{w} \mu_1$. Thus, Corollary
15.20 yields $\mu_t/t \xrightarrow{v} \nu$ on $\mathbb{R}^d \setminus \{0\}$ and
$$a^h_t \equiv t^{-1} \int_{|x| \le h} x x'\, \mu_t(dx) \to a^h, \qquad
  b^h_t \equiv t^{-1} \int_{|x| \le h} x\, \mu_t(dx) \to b^h, \tag{12}$$
provided that $h > 0$ satisfies $\nu\{|x| = h\} = 0$. Now fix any $f \in C_0^\infty$, and
write
$$t^{-1}(T_t f(x) - f(x)) = t^{-1} \int (f(x + y) - f(x))\, \mu_t(dy)$$
$$= t^{-1} \int_{|y| \le h} \Bigl\{ f(x + y) - f(x) - \sum_i y_i f'_i(x)
   - \tfrac{1}{2} \sum_{i,j} y_i y_j f''_{ij}(x) \Bigr\}\, \mu_t(dy)$$
$$\quad + t^{-1} \int_{|y| > h} (f(x + y) - f(x))\, \mu_t(dy)
   + \sum_i b^h_{t,i} f'_i(x) + \tfrac{1}{2} \sum_{i,j} a^h_{t,ij} f''_{ij}(x).$$
As $t \to 0$, the last three terms approach the expression in (11), though
with $a_{ij}$ replaced by $a^h_{ij}$ and with the integral taken over $\{|x| > h\}$. To
establish the required convergence, it is then enough to show that the first
term on the right tends to zero as $h \to 0$, uniformly for small $t > 0$. But
this is clear from (12), since the integrand is of the order $h|y|^2$ by Taylor's
formula. From the uniform boundedness of the derivatives of $f$, we also see
that the convergence is uniform in $x$. Thus, $C_0^\infty \subset \mathcal{D}$ by Theorem 19.6, and
(11) holds on $C_0^\infty$.
It remains to show that $C_0^\infty$ is a core for $A$. Since $C_0^\infty$ is dense in $C_0$,
it suffices by Proposition 19.9 to show that it is also invariant under $(T_t)$.
Then note that, by dominated convergence, the differentiation operators
commute with each $T_t$, and use condition (F$_1$). $\Box$
We proceed to characterize the linear operators $A$ on $C_0$ whose closures
$\bar A$ are generators of Feller semigroups.
Theorem 19.11 (characterization of generators, Hille, Yosida) Let $A$ be
a linear operator on $C_0$ with domain $\mathcal{D}$. Then $A$ is closable and its closure
$\bar A$ is the generator of a Feller semigroup on $C_0$ iff these conditions hold:
(i) $\mathcal{D}$ is dense in $C_0$;
(ii) the range of $\lambda_0 - A$ is dense in $C_0$ for some $\lambda_0 > 0$;
(iii) if $f \vee 0 \le f(x)$ for some $f \in \mathcal{D}$ and $x \in S$, then $Af(x) \le 0$.
Condition (iii) is known as the positive-maximum principle.
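On a finite state space, condition (iii) is easy to test: any matrix with nonnegative off-diagonal entries and row sums at most 0 satisfies it. The sketch below uses a hypothetical $3 \times 3$ rate matrix `Q` (with killing in the last row) and random test vectors; it also checks the norm estimate $\|(\lambda - A)f\| \ge \lambda \|f\|$ that the proof derives from (iii). The tolerance `1e-12` only guards against rounding.

```python
import random

# Hypothetical rate matrix: off-diagonal entries >= 0, row sums <= 0.
Q = [[-3.0, 2.0, 1.0],
     [0.5, -0.5, 0.0],
     [1.0, 4.0, -6.0]]

def apply(Q, f):
    return [sum(Q[x][y] * f[y] for y in range(len(f))) for x in range(len(Q))]

def sup_norm(f):
    return max(abs(v) for v in f)

random.seed(0)
pmp_violations = 0
dissipativity_violations = 0
for _ in range(1000):
    f = [random.uniform(-1.0, 1.0) for _ in range(3)]
    Af = apply(Q, f)
    x = max(range(3), key=lambda i: f[i])   # a point where f is maximal
    if f[x] >= 0.0 and Af[x] > 1e-12:       # (iii): f v 0 <= f(x) should force Af(x) <= 0
        pmp_violations += 1
    lam = random.uniform(0.1, 5.0)
    lhs = sup_norm([lam * f[i] - Af[i] for i in range(3)])
    if lhs < lam * sup_norm(f) - 1e-12:     # ||(lam - A) f|| >= lam ||f||
        dissipativity_violations += 1

print(pmp_violations, dissipativity_violations)
```

Both counters stay at zero, since at a nonnegative maximum $x$ we have $Qf(x) = \sum_{y \ne x} q_{xy}(f(y) - f(x)) + f(x) \sum_y q_{xy} \le 0$.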
Proof: First assume that $\bar A$ is the generator of a Feller semigroup $(T_t)$.
Then (i) and (ii) hold by Theorem 19.4. To prove (iii), let $f \in \mathcal{D}$ and $x \in S$
with $f^+ = f \vee 0 \le f(x)$. Then
$$T_t f(x) \le T_t f^+(x) \le \|T_t f^+\| \le \|f^+\| = f(x), \qquad t \ge 0,$$
and so $h^{-1}(T_h f - f)(x) \le 0$. As $h \to 0$, we get $Af(x) \le 0$.
Conversely, assume that $A$ satisfies (i), (ii), and (iii). Let $f \in \mathcal{D}$ be
arbitrary, choose $x \in S$ with $|f(x)| = \|f\|$, and put $g = f\, \mathrm{sgn} f(x)$. Then
$g \in \mathcal{D}$ with $g^+ \le g(x)$, and so (iii) yields $Ag(x) \le 0$. Thus, we get for any
$\lambda > 0$
$$\|(\lambda - A)f\| \ge \lambda g(x) - Ag(x) \ge \lambda g(x) = \lambda \|f\|. \tag{13}$$
To show that $A$ is closable, let $f_1, f_2, \dots \in \mathcal{D}$ with $f_n \to 0$ and $Af_n \to g$.
By (i) we may choose $g_1, g_2, \dots \in \mathcal{D}$ with $g_n \to g$, and by (13) we have
$$\|(\lambda - A)(g_m + \lambda f_n)\| \ge \lambda \|g_m + \lambda f_n\|, \qquad m, n \in \mathbb{N},\ \lambda > 0.$$
As $n \to \infty$, we get $\|(\lambda - A)g_m - \lambda g\| \ge \lambda \|g_m\|$. Here we may divide by
$\lambda$ and let $\lambda \to \infty$ to obtain $\|g_m - g\| \ge \|g_m\|$, which yields $\|g\| = 0$ as
$m \to \infty$. Thus, $A$ is closable, and from (13) we note that the closure $\bar A$
satisfies
$$\|(\lambda - \bar A)f\| \ge \lambda \|f\|, \qquad \lambda > 0,\ f \in \mathrm{dom}(\bar A). \tag{14}$$
Now assume that $\lambda_n \to \lambda > 0$ and $(\lambda_n - \bar A)f_n \to g$ for some $f_1, f_2, \dots \in
\mathrm{dom}(\bar A)$. By (14) the sequence $(f_n)$ is then Cauchy, say with limit $f \in C_0$.
By the definition of $\bar A$ we get $(\lambda - \bar A)f = g$, and so $g$ belongs to the range
of $\lambda - \bar A$. Letting $\Lambda$ denote the set of constants $\lambda > 0$ such that $\lambda - \bar A$ has
range $C_0$, it follows in particular that $\Lambda$ is closed. If we can show that $\Lambda$ is
open as well, then by (ii) we have $\Lambda = (0, \infty)$.
Then fix any $\lambda \in \Lambda$, and conclude from (14) that $\lambda - \bar A$ has a bounded
inverse $R_\lambda$ with norm $\|R_\lambda\| \le \lambda^{-1}$. For any $\mu > 0$ with $|\lambda - \mu|\, \|R_\lambda\| < 1$,
we may form the bounded linear operator
$$\tilde R_\mu = \sum_{n \ge 0} (\lambda - \mu)^n R_\lambda^{n+1},$$
and we note that
$$(\mu - \bar A)\tilde R_\mu = (\lambda - \bar A)\tilde R_\mu - (\lambda - \mu)\tilde R_\mu = I.$$
In particular, $\mu \in \Lambda$, which shows that $\lambda \in \Lambda^\circ$.
We may next establish the resolvent equation (3). Then start from the
identity $(\lambda - \bar A)R_\lambda = (\mu - \bar A)R_\mu = I$. By a simple rearrangement,
$$(\lambda - \bar A)(R_\lambda - R_\mu) = (\mu - \lambda)R_\mu,$$
and (3) follows as we multiply from the left by $R_\lambda$. In particular, (3) shows
that the operators $R_\lambda$ and $R_\mu$ commute for any $\lambda, \mu > 0$.
Since $R_\lambda(\lambda - \bar A) = I$ on $\mathrm{dom}(\bar A)$ and $\|R_\lambda\| \le \lambda^{-1}$, we have for any
$f \in \mathrm{dom}(\bar A)$ as $\lambda \to \infty$
$$\|\lambda R_\lambda f - f\| = \|R_\lambda \bar A f\| \le \lambda^{-1} \|\bar A f\| \to 0.$$
From (i) and the contractivity of $\lambda R_\lambda$, it follows easily that $\lambda R_\lambda \to I$ in
the strong topology. Now define $A^\lambda$ as in (6) and let $T^\lambda_t = e^{tA^\lambda}$. As in the
proof of Lemma 19.7, we get $T^\lambda_t f \to T_t f$ for each $f \in C_0$, uniformly for
bounded $t$, where the $T_t$ form a strongly continuous family of contraction
operators on $C_0$ such that $\int e^{-\lambda t}\, T_t\, dt = R_\lambda$ for all $\lambda > 0$. To deduce the
semigroup property, fix any $f \in C_0$ and $s, t \ge 0$, and note that as $\lambda \to \infty$
$$(T_{s+t} - T_s T_t)f = (T_{s+t} - T^\lambda_{s+t})f + T^\lambda_s(T^\lambda_t - T_t)f + (T^\lambda_s - T_s)T_t f \to 0.$$
The positivity of the operators $T_t$ will follow immediately, if we can show
that $R_\lambda$ is positive for each $\lambda > 0$. Then fix any function $g \ge 0$ in $C_0$, and
put $f = R_\lambda g$, so that $g = (\lambda - \bar A)f$. By the definition of $\bar A$, there exist
some $f_1, f_2, \dots \in \mathcal{D}$ with $f_n \to f$ and $Af_n \to \bar A f$. If $\inf_x f(x) < 0$, we have
$\inf_x f_n(x) < 0$ for all sufficiently large $n$, and we may choose some $x_n \in S$
with $f_n(x_n) \le f_n \wedge 0$. By (iii) we have $Af_n(x_n) \ge 0$, and so
$$\inf_x (\lambda - A)f_n(x) \le (\lambda - A)f_n(x_n) \le \lambda f_n(x_n) = \lambda \inf_x f_n(x).$$
As $n \to \infty$, we get the contradiction
$$0 \le \inf_x g(x) = \inf_x (\lambda - \bar A)f(x) \le \lambda \inf_x f(x) < 0.$$
It remains to show that $\bar A$ is the generator of the semigroup $(T_t)$. But this
is clear from the fact that the operators $\lambda - \bar A$ are inverses to the resolvent
operators $R_\lambda$. $\Box$
From the proof we note that any operator $A$ on $C_0$ satisfying the positive-maximum
principle in (iii) must be dissipative, in the sense that $\|(\lambda - A)f\| \ge \lambda \|f\|$
for all $f \in \mathrm{dom}(A)$ and $\lambda > 0$. This leads to the following
simple observation, which will be needed later.
Lemma 19.12 (maximality) Let $(A, \mathcal{D})$ be the generator of a Feller semigroup
on $C_0$, and assume that $A$ extends to a linear operator $(A', \mathcal{D}')$
satisfying the positive-maximum principle. Then $\mathcal{D}' = \mathcal{D}$.
Proof: Fix any $f \in \mathcal{D}'$, and put $g = (I - A')f$. Since $A'$ is dissipative and
$(I - A)R_1 = I$ on $C_0$, we get
$$\|f - R_1 g\| \le \|(I - A')(f - R_1 g)\| = \|g - (I - A)R_1 g\| = 0,$$
and so $f = R_1 g \in \mathcal{D}$. $\Box$
Our next aim is to show how a nice Markov process can be associated
with every Feller semigroup $(T_t)$. In order for the corresponding transition
kernels $\mu_t$ to have total mass 1, we need the operators $T_t$ to be conservative,
in the sense that $\sup_{f \le 1} T_t f(x) = 1$ for all $x \in S$. This can be achieved by
a suitable extension.
Let us then introduce an auxiliary state $\Delta \notin S$ and form the compactified
space $\hat S = S \cup \{\Delta\}$, where $\Delta$ is regarded as the point at infinity when $S$
is noncompact, and otherwise as isolated from $S$. Note that any function
$f \in C_0$ has a continuous extension to $\hat S$, obtained by putting $f(\Delta) = 0$. We
may now extend the original semigroup on $C_0$ to a conservative semigroup
on the space $\hat C = C(\hat S)$.
Lemma 19.13 (compactification) Any Feller semigroup $(T_t)$ on $C_0$ admits
an extension to a conservative Feller semigroup $(\hat T_t)$ on $\hat C$, given
by
$$\hat T_t f = f(\Delta) + T_t\{f - f(\Delta)\}, \qquad t \ge 0,\ f \in \hat C.$$
Proof: It is straightforward to verify that $(\hat T_t)$ is a strongly continuous
semigroup on $\hat C$. To show that the operators $\hat T_t$ are positive, fix any $f \in \hat C$
with $f \ge 0$, and note that $g \equiv f(\Delta) - f \in C_0$ with $g \le f(\Delta)$. Hence,
$$T_t g \le T_t g^+ \le \|T_t g^+\| \le \|g^+\| \le f(\Delta),$$
and so $\hat T_t f = f(\Delta) - T_t g \ge 0$. The contraction and conservation properties
now follow from the fact that $\hat T_t 1 = 1$. $\Box$
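In a finite-state analogue the construction amounts to adding a cemetery state that absorbs the mass lost by a substochastic matrix. The sketch below, with a made-up $2 \times 2$ substochastic matrix `T` standing in for a single operator $T_t$, checks that the formula of the lemma agrees with multiplication by the extended stochastic matrix and that the extension maps the constant 1 to itself.

```python
T = [[0.6, 0.3],
     [0.2, 0.5]]  # hypothetical substochastic matrix: rows sum to 0.9 and 0.7

def extend(T):
    """Add a cemetery state (last index) that absorbs the missing mass."""
    n = len(T)
    rows = [row + [1.0 - sum(row)] for row in T]
    rows.append([0.0] * n + [1.0])  # the cemetery state is absorbing
    return rows

def apply(M, f):
    return [sum(M[x][y] * f[y] for y in range(len(f))) for x in range(len(M))]

T_hat = extend(T)
f_hat = [2.0, -1.0, 0.5]  # a function on S u {Delta}; the last entry is f(Delta)
f_delta = f_hat[-1]

# Formula of the lemma: T_hat f = f(Delta) + T (f - f(Delta)), with value f(Delta) at Delta.
shifted = [v - f_delta for v in f_hat[:-1]]
by_formula = [f_delta + v for v in apply(T, shifted)] + [f_delta]
by_matrix = apply(T_hat, f_hat)
max_gap = max(abs(a - b) for a, b in zip(by_formula, by_matrix))

ones = apply(T_hat, [1.0, 1.0, 1.0])  # conservativeness: T_hat 1 = 1
print(by_formula, by_matrix, ones)
```

Here the shifted function $f - f(\Delta)$ vanishes at $\Delta$, which is what allows the original operator to act on it.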
Our next step is to construct an associated semigroup of Markov
transition kernels $\mu_t$ on $\hat S$, satisfying
$$T_t f(x) = \int f(y)\, \mu_t(x, dy), \qquad f \in C_0. \tag{15}$$
We say that a state $x \in \hat S$ is absorbing for $(\mu_t)$ if $\mu_t(x, \{x\}) = 1$ for each
$t \ge 0$.
Proposition 19.14 (existence) For any Feller semigroup $(T_t)$ on $C_0$,
there exists a unique semigroup of Markov transition kernels $\mu_t$ on $\hat S$
satisfying (15) and such that $\Delta$ is absorbing for $(\mu_t)$.
Proof: For fixed $x \in \hat S$ and $t \ge 0$, the mapping $f \mapsto \hat T_t f(x)$ is a positive
linear functional on $\hat C$ with norm 1, so by Riesz's representation Theorem
2.22 there exist some probability measures $\mu_t(x, \cdot)$ on $\hat S$ satisfying
$$\hat T_t f(x) = \int f(y)\, \mu_t(x, dy), \qquad f \in \hat C,\ x \in \hat S,\ t \ge 0. \tag{16}$$
The measurability of the right-hand side is clear by continuity. By a standard
approximation followed by a monotone class argument, we then obtain
the desired measurability of $\mu_t(x, B)$ for any $t \ge 0$ and Borel set $B \subset \hat S$.
The Chapman-Kolmogorov relation holds on $\hat S$ by Lemma 19.1. Relation
(15) is a special case of (16), and from (16) we further get
$$\int f(y)\, \mu_t(\Delta, dy) = \hat T_t f(\Delta) = f(\Delta) = 0, \qquad f \in C_0,$$
which shows that $\Delta$ is absorbing. The uniqueness of $(\mu_t)$ is a consequence
of the last two properties. $\Box$
For any probability measure $\nu$ on $\hat S$, there exists by Theorem 8.4 a
Markov process $X^\nu$ in $\hat S$ with initial distribution $\nu$ and transition kernels
$\mu_t$. As before, we denote the distribution of $X^\nu$ by $P_\nu$ and write $E_\nu$ for
the corresponding integration operator. When $\nu = \delta_x$, we often prefer the
simpler forms $P_x$ and $E_x$, respectively. We may now extend Theorem 15.1
to a basic regularization theorem for Feller processes. Given a process $X$,
we say that $\Delta$ is absorbing for $X^\pm$ if $X_t = \Delta$ or $X_{t-} = \Delta$ implies $X_u = \Delta$
for all $u \ge t$.
Theorem 19.15 (regularization, Kinney) Let $X$ be a Feller process in $\hat S$
with arbitrary initial distribution $\nu$. Then $X$ has an rcll version $\tilde X$ in $\hat S$
such that $\Delta$ is absorbing for $\tilde X^\pm$. If $(T_t)$ is conservative and $\nu$ is restricted
to $S$, we can choose $\tilde X$ to be rcll even in $S$.
The idea of the proof is to construct a sufficiently rich class of supermartingales,
to which the regularity theorems of Chapter 7 can be applied.
Let $C_0^+$ denote the class of nonnegative functions in $C_0$.
Lemma 19.16 (resolvents and excessive functions) If $f \in C_0^+$, then the
process $Y_t = e^{-t} R_1 f(X_t)$, $t \ge 0$, is a supermartingale under $P_\nu$ for every
$\nu$.
Proof: Writing $(\mathcal{G}_t)$ for the filtration induced by $X$, we get for any $t, h \ge 0$
$$E[Y_{t+h} \mid \mathcal{G}_t] = E[e^{-t-h} R_1 f(X_{t+h}) \mid \mathcal{G}_t] = e^{-t-h}\, T_h R_1 f(X_t)
  = e^{-t-h} \int_0^\infty e^{-s}\, T_{s+h} f(X_t)\, ds
  = e^{-t} \int_h^\infty e^{-s}\, T_s f(X_t)\, ds \le Y_t. \qquad \Box$$
Proof of Theorem 19.15: By Lemma 19.16 and Theorem 7.27, the process
$f(X_t)$ has a.s. right- and left-hand limits along $\mathbb{Q}_+$ for any $f \in \mathcal{D} \equiv
\mathrm{dom}(A)$. Since $\mathcal{D}$ is dense in $C_0$, the stated property holds for every $f \in C_0$.
By the separability of $C_0$ we may choose the exceptional null set $N$ to be
independent of $f$. If $x_1, x_2, \dots \in \hat S$ are such that $f(x_n)$ converges for every
$f \in C_0$, then the compactness of $\hat S$ ensures that $x_n$ converges in the topology
of $\hat S$. Thus, on $N^c$ the process $X$ itself has right- and left-hand limits
$X_{t\pm}$ along $\mathbb{Q}_+$; on $N$ we may redefine $X$ to be 0. Then clearly $\tilde X_t = X_{t+}$
is rcll. It remains to show that $\tilde X$ is a version of $X$ or, equivalently, that
$X_{t+} = X_t$ a.s. for each $t \ge 0$. But this follows from the fact that $X_{t+h} \xrightarrow{P} X_t$
as $h \downarrow 0$ by Lemma 19.3 and dominated convergence.
Now fix any $f \in C_0$ with $f > 0$ on $S$, and note from the strong continuity
of $(T_t)$ that even $R_1 f > 0$ on $S$. Applying Lemma 7.31 to the supermartingale
$Y_t = e^{-t} R_1 f(\tilde X_t)$, we conclude that $\tilde X \equiv \Delta$ a.s. on the interval $[\zeta, \infty)$,
where $\zeta = \inf\{t \ge 0;\ \Delta \in \{\tilde X_t, \tilde X_{t-}\}\}$. Discarding the exceptional null set,
we can make this hold identically. If $(T_t)$ is conservative and $\nu$ is restricted
to $S$, then $X_t \in S$ a.s. for every $t \ge 0$. Thus, $\zeta > t$ a.s. for all $t$, and
hence $\zeta = \infty$ a.s. Again we may assume that this holds identically. Then
$\tilde X_t$ and $\tilde X_{t-}$ take values in $S$, and the stated regularity properties remain
valid in $S$. $\Box$
In view of the last theorem, we may choose $\Omega$ to be the space of all
$\hat S$-valued rcll functions such that the state $\Delta$ is absorbing, and let $X$ be
the canonical process on $\Omega$. Processes with different initial distributions
$\nu$ are then distinguished by their distributions $P_\nu$ on $\Omega$. Thus, under $P_\nu$
the process $X$ is Markov with initial distribution $\nu$ and transition kernels
$\mu_t$, and $X$ has all the regularity properties stated in Theorem 19.15. In
particular, $X \equiv \Delta$ on the interval $[\zeta, \infty)$, where $\zeta$ denotes the terminal
time
$$\zeta = \inf\{t \ge 0;\ X_t = \Delta \text{ or } X_{t-} = \Delta\}.$$
We take $(\mathcal{F}_t)$ to be the right-continuous filtration generated by $X$, and put
$\mathcal{A} = \mathcal{F}_\infty = \bigvee_t \mathcal{F}_t$. The shift operators $\theta_t$ on $\Omega$ are defined as before by
$$(\theta_t \omega)_s = \omega_{s+t}, \qquad s, t \ge 0.$$
The process $X$ with associated distributions $P_\nu$, filtration $\mathcal{F} = (\mathcal{F}_t)$, and
shift operators $\theta_t$ is called the canonical Feller process with semigroup $(T_t)$.
We are now ready to state a general version of the strong Markov property.
The result extends the special versions obtained in Proposition 8.9
and Theorems 12.14 and 13.11. A further instance of this property appears
in Theorem 21.11.
Theorem 19.17 (strong Markov property, Dynkin and Yushkevich, Blumenthal)
For any canonical Feller process $X$, initial distribution $\nu$,
optional time $\tau$, and random variable $\xi \ge 0$, we have
$$E_\nu[\xi \circ \theta_\tau \mid \mathcal{F}_\tau] = E_{X_\tau} \xi \quad \text{a.s. } P_\nu \text{ on } \{\tau < \infty\}.$$
Proof: By Lemmas 6.2 and 7.1 we may assume that $\tau < \infty$. Let $\mathcal{G}$
denote the filtration induced by $X$. Then Lemma 7.4 shows that the times
$\tau_n = 2^{-n}[2^n \tau + 1]$ are $\mathcal{G}$-optional, and by Lemma 7.3 we have $\mathcal{F}_\tau \subset \mathcal{G}_{\tau_n}$ for
all $n$. Thus, Proposition 8.9 yields
$$E_\nu[\xi \circ \theta_{\tau_n};\ A] = E_\nu[E_{X_{\tau_n}} \xi;\ A], \qquad A \in \mathcal{F}_\tau,\ n \in \mathbb{N}. \tag{17}$$
To extend the relation to $\tau$, we first assume that $\xi = \prod_{k \le m} f_k(X_{t_k})$ for
some $f_1, \dots, f_m \in C_0$ and $t_1 < \cdots < t_m$. Then $\xi \circ \theta_{\tau_n} \to \xi \circ \theta_\tau$ by the
right-continuity of $X$ and the continuity of $f_1, \dots, f_m$. Writing $h_k = t_k - t_{k-1}$ with
$t_0 = 0$, it is also clear from the first Feller property and the right-continuity
of $X$ that
$$E_{X_{\tau_n}} \xi = T_{h_1}\bigl(f_1 T_{h_2} \cdots (f_{m-1} T_{h_m} f_m) \cdots\bigr)(X_{\tau_n})
  \to T_{h_1}\bigl(f_1 T_{h_2} \cdots (f_{m-1} T_{h_m} f_m) \cdots\bigr)(X_\tau) = E_{X_\tau} \xi.$$
Thus, (17) extends to $\tau$ by dominated convergence on both sides. We may
finally use standard approximation and monotone class arguments to extend
the result to arbitrary $\xi$. $\Box$
As a simple application, we get the following useful zero-one law.
Corollary 19.18 (Blumenthal's 0-1 law) For any canonical Feller process,
we have
$$P_x A = 0 \text{ or } 1, \qquad x \in S,\ A \in \mathcal{F}_0.$$
Proof: Taking $\tau = 0$ in Theorem 19.17, we get for any $x \in S$ and $A \in \mathcal{F}_0$
$$1_A = P_x[A \mid \mathcal{F}_0] = P_{X_0} A = P_x A \quad \text{a.s. } P_x. \qquad \Box$$
To appreciate the last result, recall that $\mathcal{F}_0 = \mathcal{F}_{0+}$. In particular, we
note that $P_x\{\tau = 0\} = 0$ or 1 for any state $x \in S$ and $\mathcal{F}$-optional time $\tau$.
The strong Markov property is often used in the following extended form.
Corollary 19.19 (optional projection) For any canonical Feller process
$X$, nondecreasing adapted process $Y$, and random variable $\xi \ge 0$, we have
$$E_x \int_0^\infty (E_{X_t} \xi)\, dY_t = E_x \int_0^\infty (\xi \circ \theta_t)\, dY_t, \qquad x \in S.$$
Proof: We may assume that $Y_0 = 0$. Introduce the right-continuous
inverse
$$\tau_s = \inf\{t \ge 0;\ Y_t > s\}, \qquad s \ge 0,$$
and note that the times $\tau_s$ are optional by Lemma 7.6. By Theorem 19.17
we have
$$E_x[E_{X_{\tau_s}} \xi;\ \tau_s < \infty] = E_x\bigl[E_x[\xi \circ \theta_{\tau_s} \mid \mathcal{F}_{\tau_s}];\ \tau_s < \infty\bigr]
  = E_x[\xi \circ \theta_{\tau_s};\ \tau_s < \infty].$$
Since $\tau_s < \infty$ iff $s < Y_\infty$, we get by integration
$$E_x \int_0^{Y_\infty} (E_{X_{\tau_s}} \xi)\, ds = E_x \int_0^{Y_\infty} (\xi \circ \theta_{\tau_s})\, ds,$$
and the asserted formula follows by Lemma 1.22. $\Box$
Our next aim is to show that any martingale on the canonical space of
a Feller process X is a.s. continuous outside the discontinuity set of X.
For Brownian motion, the result was already noted as a consequence of the
integral representation in Theorem 18.10.
Theorem 19.20 (discontinuity sets) Let $X$ be a canonical Feller process
with arbitrary initial distribution $\nu$, and let $M$ be a local $P_\nu$-martingale.
Then
$$\{t > 0;\ \Delta M_t \ne 0\} \subset \{t > 0;\ X_{t-} \ne X_t\} \quad \text{a.s.} \tag{18}$$
Proof (Chung and Walsh): By localization we may reduce to the case
when $M$ is uniformly integrable and hence of the form $M_t = E[\xi \mid \mathcal{F}_t]$ for
some $\xi \in L^1$. Let $\mathcal{C}$ denote the class of random variables $\xi \in L^1$ such that
the corresponding $M$ satisfies (18). Then $\mathcal{C}$ is a linear subspace of $L^1$. It is
further closed, since if $M^n_t = E[\xi_n \mid \mathcal{F}_t]$ with $\|\xi_n\|_1 \to 0$, then
$$P\{\sup_t |M^n_t| > \varepsilon\} \le \varepsilon^{-1} E|\xi_n| \to 0, \qquad \varepsilon > 0,$$
and so $\sup_t |M^n_t| \xrightarrow{P} 0$.
Now let $\xi = \prod_{k \le n} f_k(X_{t_k})$ for some $f_1, \dots, f_n \in C_0$ and $t_1 < \cdots < t_n$.
Writing $h_k = t_k - t_{k-1}$, we note that
$$M_t = \prod_{k \le m} f_k(X_{t_k})\; T_{t_{m+1} - t}\, g_{m+1}(X_t), \qquad t \in [t_m, t_{m+1}], \tag{19}$$
where
$$g_k = f_k\, T_{h_{k+1}}\bigl(f_{k+1} T_{h_{k+2}}(\cdots T_{h_n} f_n) \cdots\bigr), \qquad k = 1, \dots, n,$$
with the obvious conventions for $t < t_1$ and $t > t_n$. Since $T_t g(x)$ is jointly
continuous in $(t, x)$ for each $g \in C_0$, equation (19) defines a right-continuous
version of $M$ satisfying (18), and so $\xi \in \mathcal{C}$. By a simple approximation it
follows that $\mathcal{C}$ contains all indicator functions of sets $\bigcap_{k \le n} \{X_{t_k} \in G_k\}$ with
$G_1, \dots, G_n$ open. The result extends by a monotone class argument to any
$X$-measurable indicator function $\xi$, and a routine argument yields the final
extension to $L^1$. $\Box$
A basic role in the theory is played by the processes
$$M^f_t = f(X_t) - f(X_0) - \int_0^t Af(X_s)\, ds, \qquad t \ge 0,\ f \in \mathcal{D}.$$
Lemma 19.21 (Dynkin's formula) The processes $M^f$ are martingales under
any initial distribution $\nu$ for $X$. In particular, we have for any bounded
optional time $\tau$
$$E_x f(X_\tau) = f(x) + E_x \int_0^\tau Af(X_s)\, ds, \qquad x \in S,\ f \in \mathcal{D}. \tag{20}$$
Proof: For any $t, h \ge 0$, we have
$$M^f_{t+h} - M^f_t = f(X_{t+h}) - f(X_t) - \int_t^{t+h} Af(X_s)\, ds = M^f_h \circ \theta_t,$$
and so by the Markov property at $t$ and Theorem 19.6
$$E_\nu[M^f_{t+h} \mid \mathcal{F}_t] - M^f_t = E_\nu[M^f_h \circ \theta_t \mid \mathcal{F}_t] = E_{X_t} M^f_h = 0.$$
Thus, $M^f$ is a martingale, and (20) follows by optional sampling. $\Box$
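For a deterministic time $\tau = t$, formula (20) reduces to $T_t f(x) - f(x) = \int_0^t T_s Af(x)\, ds$, which can be checked numerically for a finite-state chain. The sketch below assumes a hypothetical three-state rate matrix `A` and test vector `f`, evaluates $T_s = e^{sA}$ by summing the exponential series, and integrates with Simpson's rule.

```python
def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def semigroup(A, t, f, terms=40):
    """T_t f = exp(tA) f, summing the series sum_k (tA)^k f / k!."""
    result = list(f)
    term = list(f)
    for k in range(1, terms):
        term = [t * v / k for v in mat_vec(A, term)]  # (tA)^k f / k!
        result = [r + v for r, v in zip(result, term)]
    return result

A = [[-1.0, 1.0, 0.0], [0.5, -1.5, 1.0], [0.0, 2.0, -2.0]]  # hypothetical rate matrix
f = [1.0, -2.0, 0.5]                                        # hypothetical test function
t = 1.2
Af = mat_vec(A, f)

# Simpson's rule for the integral of T_s (A f) over [0, t].
n = 200  # even number of subintervals
h = t / n
integral = [0.0, 0.0, 0.0]
for k in range(n + 1):
    weight = 1.0 if k in (0, n) else (4.0 if k % 2 == 1 else 2.0)
    values = semigroup(A, k * h, Af)
    integral = [acc + weight * v * h / 3.0 for acc, v in zip(integral, values)]

lhs = [a - b for a, b in zip(semigroup(A, t, f), f)]  # E_x f(X_t) - f(x) for each x
max_gap = max(abs(a - b) for a, b in zip(lhs, integral))
print(lhs, integral, max_gap)
```

The remaining gap is the quadrature error of Simpson's rule, which is far below the size of the terms being compared.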
As a preparation for the next major result, we introduce the optional
times
$$\tau_h = \inf\{t \ge 0;\ \rho(X_t, X_0) > h\}, \qquad h > 0,$$
where $\rho$ denotes the metric in $S$. Note that a state $x$ is absorbing iff $\tau_h = \infty$
a.s. $P_x$ for every $h > 0$.
Lemma 19.22 (escape times) For any nonabsorbing state $x \in S$, we have
$E_x \tau_h < \infty$ for all sufficiently small $h > 0$.
Proof: If $x$ is not absorbing, then $\mu_t(x, B^\varepsilon_x) < p < 1$ for some $t, \varepsilon > 0$,
where $B^\varepsilon_x = \{y;\ \rho(x, y) \le \varepsilon\}$. By Lemma 19.3 and Theorem 4.25 we may
choose $h \in (0, \varepsilon]$ so small that
$$\mu_t(y, B^h_x) \le \mu_t(y, B^\varepsilon_x) \le p, \qquad y \in B^h_x.$$
Then Proposition 8.2 yields
$$P_x\{\tau_h \ge nt\} \le P_x \bigcap_{k \le n} \{X_{kt} \in B^h_x\} \le p^n, \qquad n \in \mathbb{Z}_+,$$
and so by Lemma 3.4
$$E_x \tau_h = \int_0^\infty P\{\tau_h \ge s\}\, ds \le t \sum_{n \ge 0} P\{\tau_h \ge nt\}
  \le t \sum_{n \ge 0} p^n = \frac{t}{1 - p} < \infty. \qquad \Box$$
We turn to a probabilistic description of the generator and its domain.
Say that A is maximal within a class of linear operators if it extends every
member of the class.
Theorem 19.23 (characteristic operator, Dynkin) Let $(A, \mathcal{D})$ be the generator
of a Feller process. Then for any $f \in \mathcal{D}$ we have $Af(x) = 0$ if $x$ is
absorbing, and otherwise
$$Af(x) = \lim_{h \to 0} \frac{E_x f(X_{\tau_h}) - f(x)}{E_x \tau_h}. \tag{21}$$
Furthermore, $A$ is the maximal operator on $C_0$ with those properties.
Proof: Fix any $f \in \mathcal{D}$. If $x$ is absorbing, then $T_t f(x) = f(x)$ for all $t \ge 0$,
and so $Af(x) = 0$. For a nonabsorbing $x$, we get instead by Lemma 19.21
$$E_x f(X_{\tau_h \wedge t}) - f(x) = E_x \int_0^{\tau_h \wedge t} Af(X_s)\, ds, \qquad t, h > 0. \tag{22}$$
By Lemma 19.22 we have $E_x \tau_h < \infty$ for sufficiently small $h > 0$, and so
(22) extends by dominated convergence to $t = \infty$. Relation (21) now follows
from the continuity of $Af$, together with the fact that $\rho(X_s, x) \le h$ for all
$s < \tau_h$. Since the positive-maximum principle holds for any extension of $A$
with the stated properties, the last assertion follows by Lemma 19.12. $\Box$
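As an illustration of (21), consider a standard Brownian motion in $\mathbb{R}$ started at $x$. We use two standard facts that are assumed here rather than derived in this section: by symmetry the exit position $X_{\tau_h}$ equals $x \pm h$ with probability $\tfrac{1}{2}$ each, and $E_x \tau_h = h^2$. For a smooth function $f$, a Taylor expansion then gives

```latex
\frac{E_x f(X_{\tau_h}) - f(x)}{E_x \tau_h}
  = \frac{\tfrac{1}{2}\bigl(f(x+h) + f(x-h)\bigr) - f(x)}{h^2}
  = \tfrac{1}{2} f''(x) + O(h^2)
  \;\longrightarrow\; \tfrac{1}{2} f''(x)
  \qquad (h \to 0).
```

This agrees with the generator $\tfrac{1}{2}\Delta$ of Brownian motion noted after Theorem 19.10.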
In the special case when $S = \mathbb{R}^d$, let $C_K^\infty$ denote the class of infinitely
differentiable functions on $\mathbb{R}^d$ with bounded support. An operator $(A, \mathcal{D})$
with $\mathcal{D} \supset C_K^\infty$ is said to be local on $C_K^\infty$ if $Af(x) = 0$ whenever $f$ vanishes in
some neighborhood of $x$. For any generator with this property, we note that
the positive-maximum principle implies a local positive-maximum principle,
asserting that if $f \in C_K^\infty$ has a local maximum $\ge 0$ at some point $x$, then
$Af(x) \le 0$.
The following result gives the basic connection between diffusion pro-
cesses and elliptic differential operators. This connection is explored further
in Chapters 21 and 24.
Theorem 19.24 (Feller diffusions and elliptic operators, Dynkin) Let
$(A, \mathcal{D})$ be the generator of a Feller process $X$ in $\mathbb{R}^d$, and assume that
$C_K^\infty \subset \mathcal{D}$. Then $X$ is continuous on $[0, \zeta)$, a.s. $P_\nu$ for every $\nu$, iff $A$ is local on
$C_K^\infty$. In that case there exist some functions $a_{ij}, b_i, c \in C(\mathbb{R}^d)$, where $c \ge 0$
and the $a_{ij}$ form a symmetric, nonnegative definite matrix, such that for
any $f \in C_K^\infty$ and $x \in \mathbb{R}^d$,
$$Af(x) = \tfrac12 \sum_{i,j} a_{ij}(x) f''_{ij}(x) + \sum_i b_i(x) f'_i(x) - c(x) f(x). \tag{23}$$
In the situation described by this result, we may choose $\Omega$ to consist
of all paths that are continuous on $[0, \zeta)$. The resulting Markov process is
referred to as a canonical Feller diffusion.
Proof: If $X$ is continuous on $[0, \zeta)$, then $A$ is local by Theorem 19.23.
Conversely, assume that $A$ is local on $C_K^\infty$. Fix any $x \in \mathbb{R}^d$ and $0 < h < m$,
and choose $f \in C_K^\infty$ with $f \ge 0$ and support $\{y;\ h \le |y - x| \le m\}$. Then
$Af(y) = 0$ for all $y \in B_x^h$, and so Lemma 19.21 shows that $f(X_{t \wedge \tau_h})$ is
a martingale under $P_x$. By dominated convergence we get $E_x f(X_{\tau_h}) = 0$,
and since $m$ was arbitrary,
$$P_x\{|X_{\tau_h} - x| \le h \text{ or } X_{\tau_h} = \Delta\} = 1, \qquad x \in \mathbb{R}^d,\ h > 0.$$
Applying the Markov property at fixed times, we obtain for any initial
distribution $\nu$
$$P_\nu \bigcap_{t \in \mathbb{Q}_+} \theta_t^{-1}\{|X_{\tau_h} - X_0| \le h \text{ or } X_{\tau_h} = \Delta\} = 1, \qquad h > 0,$$
which implies
$$P_\nu\Big\{\sup_{t < \zeta} |\Delta X_t| \le h\Big\} = 1, \qquad h > 0.$$
Hence, $X$ is continuous on $[0, \zeta)$ a.s. $P_\nu$.
To show that (23) holds for suitable $a_{ij}$, $b_i$, and $c$, we choose for every
$x \in \mathbb{R}^d$ some functions $f_0^x, f_i^x, f_{ij}^x \in C_K^\infty$ such that, for any $y$ close to $x$,
$$f_0^x(y) = 1, \qquad f_i^x(y) = y_i - x_i, \qquad f_{ij}^x(y) = (y_i - x_i)(y_j - x_j).$$
Putting
$$c(x) = -Af_0^x(x), \qquad b_i(x) = Af_i^x(x), \qquad a_{ij}(x) = Af_{ij}^x(x),$$
we note that (23) holds locally for any function $f \in C_K^\infty$ that agrees near $x$
with a second-degree polynomial. In particular, we may choose $f_0(y) = 1$,
$f_i(y) = y_i$, and $f_{ij}(y) = y_i y_j$ near $x$ to obtain
$$\begin{aligned}
Af_0(x) &= -c(x), \\
Af_i(x) &= b_i(x) - x_i c(x), \\
Af_{ij}(x) &= a_{ij}(x) + x_i b_j(x) + x_j b_i(x) - x_i x_j c(x).
\end{aligned}$$
This shows that $c$, $b_i$, and $a_{ij} = a_{ji}$ are continuous.
Applying the local positive-maximum principle to $f_0^x$ gives $c(x) \ge 0$. By
the same principle applied to the function
$$f = -\Big\{\sum_i u_i f_i^x\Big\}^2 = -\sum_{i,j} u_i u_j f_{ij}^x,$$
we get $\sum_{i,j} u_i u_j a_{ij}(x) \ge 0$, which shows that $(a_{ij})$ is nonnegative definite.
Finally, we consider any function $f \in C_K^\infty$ with a second-order Taylor
expansion $\tilde f$ around $x$. Here each function
$$g_\pm^\varepsilon(y) = \pm(f(y) - \tilde f(y)) - \varepsilon|x - y|^2, \qquad \varepsilon > 0,$$
has a local maximum 0 at $x$, and so
$$Ag_\pm^\varepsilon(x) = \pm(Af(x) - A\tilde f(x)) - \varepsilon \sum_i a_{ii}(x) \le 0, \qquad \varepsilon > 0.$$
Letting $\varepsilon \to 0$ gives $Af(x) = A\tilde f(x)$, which shows that (23) is generally
true. $\Box$
We consider next a basic convergence theorem for Feller processes,
essentially generalizing the result for Lévy processes in Theorem 15.17.
Theorem 19.25 (convergence, Trotter, Sova, Kurtz, Mackevičius) Let $X$
and $X^n$ be Feller processes in $S$ with semigroups $(T_t)$ and $(T_{n,t})$ and
generators $(A, \mathcal{D})$ and $(A_n, \mathcal{D}_n)$, respectively. Fix a core $D$ for $A$. Then these
conditions are equivalent:
(i) If $f \in D$, there exist some $f_n \in \mathcal{D}_n$ with $f_n \to f$ and $A_n f_n \to Af$.
(ii) $T_{n,t} \to T_t$ strongly for each $t > 0$.
(iii) $T_{n,t} f \to T_t f$ for every $f \in C_0$, uniformly for bounded $t > 0$.
(iv) If $X_0^n \xrightarrow{d} X_0$ in $S$, then $X^n \xrightarrow{d} X$ in $D(\mathbb{R}_+, S)$.
For the proof we need two lemmas, the first of which extends Lemma
19.7.
Lemma 19.26 (norm inequality) Let $(T_t)$ and $(T'_t)$ be Feller semigroups
with generators $(A, \mathcal{D})$ and $(A', \mathcal{D}')$, respectively, where $A'$ is bounded. Then
$$\|T_t f - T'_t f\| \le \int_0^t \|(A - A')T_s f\|\,ds, \qquad f \in \mathcal{D},\ t \ge 0. \tag{24}$$
Proof: Fix any $f \in \mathcal{D}$ and $t > 0$. Since $(T'_s)$ is norm continuous, we get
by Theorem 19.6
$$\frac{\partial}{\partial s}\big(T'_{t-s} T_s f\big) = T'_{t-s}(A - A')T_s f, \qquad 0 \le s \le t.$$
Here the right-hand side is continuous in $s$, because of the strong continuity
of $(T_s)$, the boundedness of $A'$, the commutativity of $A$ and $T_s$, and the
norm continuity of $(T'_s)$. Hence,
$$T_t f - T'_t f = \int_0^t \frac{\partial}{\partial s}\big(T'_{t-s} T_s f\big)\,ds = \int_0^t T'_{t-s}(A - A')T_s f\,ds,$$
and (24) follows by the contractivity of $T'_{t-s}$. $\Box$
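Inequality (24) is easy to test numerically on a finite state space, where every generator is bounded. The sketch below assumes two hypothetical two-state chains with generators $Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$ (the rates, the test function and all helper names are illustrative, not from the text), uses the closed form of $e^{tQ}$ for such chains, and approximates the integral in (24) by a midpoint rule.

```python
import math


def semigroup_apply(a, b, t, f):
    """T_t f = exp(tQ) f for Q = [[-a, a], [b, -b]], in closed form."""
    s = a + b
    decay = math.exp(-s * t)
    mean = (b * f[0] + a * f[1]) / s      # stationary average of f
    return [mean + decay * (f[0] - mean), mean + decay * (f[1] - mean)]


def generator_apply(a, b, f):
    """Q f for Q = [[-a, a], [b, -b]]."""
    return [a * (f[1] - f[0]), b * (f[0] - f[1])]


def sup_norm(f):
    return max(abs(v) for v in f)


def both_sides(rates, rates_prime, f, t, steps=2000):
    """Left and right sides of (24); the integral uses a midpoint rule."""
    a, b = rates
    a2, b2 = rates_prime
    lhs = sup_norm([u - v for u, v in zip(semigroup_apply(a, b, t, f),
                                           semigroup_apply(a2, b2, t, f))])
    ds = t / steps
    rhs = 0.0
    for i in range(steps):
        g = semigroup_apply(a, b, (i + 0.5) * ds, f)   # T_s f at the midpoint
        diff = [u - v for u, v in zip(generator_apply(a, b, g),
                                      generator_apply(a2, b2, g))]
        rhs += sup_norm(diff) * ds
    return lhs, rhs


lhs, rhs = both_sides((2.0, 3.0), (1.0, 1.0), [1.0, 0.0], 1.0)
print(lhs, rhs)
```

For these particular rates the integrand equals $2e^{-5s}$, so the right-hand side is $\frac25(1 - e^{-5})$, comfortably above the left-hand side.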
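We may next establish a continuity property for the Yosida approximations
$A^\lambda$ and $A_n^\lambda$ of $A$ and $A_n$, respectively.
Lemma 19.27 (continuity of Yosida approximation) Let $(A, \mathcal{D})$ and
$(A_n, \mathcal{D}_n)$ be the generators of some Feller semigroups satisfying condition
(i) of Theorem 19.25. Then $A_n^\lambda \to A^\lambda$ strongly for every $\lambda > 0$.
Proof: By Lemma 19.8 it suffices to show that $A_n^\lambda f \to A^\lambda f$ for every
$f \in (\lambda - A)D$. Then define $g = R_\lambda f \in D$. By (i) we may choose some $g_n \in \mathcal{D}_n$
with $g_n \to g$ and $A_n g_n \to Ag$. Then $f_n = (\lambda - A_n)g_n \to (\lambda - A)g = f$, and
so
$$\begin{aligned}
\|A_n^\lambda f - A^\lambda f\| &= \lambda^2 \|R_\lambda^n f - R_\lambda f\| \\
&\le \lambda^2 \|R_\lambda^n(f - f_n)\| + \lambda^2 \|R_\lambda^n f_n - R_\lambda f\| \\
&\le \lambda \|f - f_n\| + \lambda^2 \|g_n - g\| \to 0. \qquad \Box
\end{aligned}$$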
Proof of Theorem 19.25: First we show that (i) implies (iii). Since $D$
is dense in $C_0$, it is enough to verify (iii) for $f \in D$. Then choose some
functions $f_n$ as in (i), and conclude by Lemmas 19.7 and 19.26 that, for
any $n \in \mathbb{N}$ and $t, \lambda > 0$,
$$\begin{aligned}
\|T_{n,t} f - T_t f\| &\le \|T_{n,t}(f - f_n)\| + \|(T_{n,t} - T_{n,t}^\lambda) f_n\| + \|T_{n,t}^\lambda(f_n - f)\| \\
&\quad + \|(T_{n,t}^\lambda - T_t^\lambda) f\| + \|(T_t^\lambda - T_t) f\| \\
&\le 2\|f_n - f\| + t\|(A^\lambda - A) f\| + t\|(A_n - A_n^\lambda) f_n\| \\
&\quad + \int_0^t \|(A_n^\lambda - A^\lambda) T_s^\lambda f\|\,ds.
\end{aligned} \tag{25}$$
By Lemma 19.27 and dominated convergence, the last term tends to zero
as $n \to \infty$. For the third term on the right, we get
$$\|(A_n - A_n^\lambda) f_n\| \le \|A_n f_n - Af\| + \|(A - A^\lambda) f\| + \|(A^\lambda - A_n^\lambda) f\| + \|A_n^\lambda(f - f_n)\|,$$
which tends to $\|(A - A^\lambda) f\|$ by the same lemma. Hence, by (25)
$$\limsup_{n \to \infty} \sup_{t \le u} \|T_{n,t} f - T_t f\| \le 2u\|(A^\lambda - A) f\|, \qquad u, \lambda > 0,$$
and the desired convergence follows by Lemma 19.7 as we let $\lambda \to \infty$.
Conversely, (iii) trivially implies (ii), and so the equivalence of (i)-(iii)
will follow if we can show that (ii) implies (i). Then fix any $f \in D$ and
$\lambda > 0$, and define $g = (\lambda - A)f$ and $f_n = R_\lambda^n g$. Assuming (ii), we get by
dominated convergence $f_n \to R_\lambda g = f$. Since $(\lambda - A_n) f_n = g = (\lambda - A) f$,
we also note that $A_n f_n \to Af$. Thus, even (i) holds.
It remains to show that conditions (i)-(iii) are equivalent to (iv). For
convenience, we may then assume that $S$ is compact and the semigroups
$(T_t)$ and $(T_{n,t})$ are conservative. First assume (iv). We may establish (ii)
by showing that, for any $f \in C$ and $t > 0$, we have $T_{n,t} f(x_n) \to T_t f(x)$
whenever $x_n \to x$ in $S$. Then assume that $X_0 = x$ and $X_0^n = x_n$. By Lemma
19.3 the process $X$ is a.s. continuous at $t$. Thus, (iv) yields $X_t^n \xrightarrow{d} X_t$, and
the desired convergence follows.
Conversely, assume conditions (i)-(iii), and let $X_0^n \xrightarrow{d} X_0$. To obtain
$X^n \xrightarrow{fd} X$, it is enough to show that, for any $f_0, \ldots, f_m \in C$ and
$0 = t_0 < t_1 < \cdots < t_m$,
$$\lim_{n \to \infty} E \prod_{k \le m} f_k(X_{t_k}^n) = E \prod_{k \le m} f_k(X_{t_k}). \tag{26}$$
This holds by hypothesis when $m = 0$. Proceeding by induction, we may
use the Markov property to rewrite (26) in the form
$$E \prod_{k < m} f_k(X_{t_k}^n) \cdot T_{n,h_m} f_m(X_{t_{m-1}}^n) \to E \prod_{k < m} f_k(X_{t_k}) \cdot T_{h_m} f_m(X_{t_{m-1}}), \tag{27}$$
where $h_m = t_m - t_{m-1}$. Since (ii) implies $T_{n,h_m} f_m \to T_{h_m} f_m$, it is equivalent
to prove (27) with $T_{n,h_m}$ replaced by $T_{h_m}$. The resulting condition is of the
form (26) with $m$ replaced by $m - 1$. This completes the induction and
shows that $X^n \xrightarrow{fd} X$.
To strengthen the conclusion to $X^n \xrightarrow{d} X$, it suffices by Theorems 16.10
and 16.11 to show that $\rho(X_{\tau_n}^n, X_{\tau_n + h_n}^n) \xrightarrow{P} 0$ for any finite optional times
$\tau_n$ and positive constants $h_n \to 0$. By the strong Markov property we
may prove instead that $\rho(X_0^n, X_{h_n}^n) \xrightarrow{P} 0$ for any initial distributions $\nu_n$. In
view of the compactness of $S$ and Theorem 16.3, we may then assume that
$\nu_n \xrightarrow{w} \nu$ for some probability measure $\nu$. Fixing any $f, g \in C$ and noting
that $T_{n,h_n} g \to g$ by (iii), we get
$$E f(X_0^n) g(X_{h_n}^n) = E f\,T_{n,h_n} g(X_0^n) \to E fg(X_0),$$
where $\mathcal{L}(X_0) = \nu$. Then $(X_0^n, X_{h_n}^n) \xrightarrow{d} (X_0, X_0)$ as before, and in particular
$\rho(X_0^n, X_{h_n}^n) \xrightarrow{d} \rho(X_0, X_0) = 0$. This completes the proof of (iv). $\Box$
From the last theorem and its proof we may easily deduce a similar
approximation property for discrete-time Markov chains. The result ex-
tends the approximations for random walks obtained in Corollary 15.20
and Theorem 16.14.
Theorem 19.28 (approximation of Markov chains) Let $Y^1, Y^2, \ldots$ be
discrete-time Markov chains in $S$ with transition operators $U_1, U_2, \ldots$, and
consider a Feller process $X$ in $S$ with semigroup $(T_t)$ and generator $A$. Fix
a core $D$ for $A$, and assume that $0 < h_n \to 0$. Then conditions (i)-(iv) of
Theorem 19.25 remain equivalent for the operators and processes
$$A_n = h_n^{-1}(U_n - I), \qquad T_{n,t} = U_n^{[t/h_n]}, \qquad X_t^n = Y_{[t/h_n]}^n.$$
Proof: Let $N$ be an independent, unit-rate Poisson process, and note
that the processes $\tilde X_t^n = Y^n \circ N_{t/h_n}$ are pseudo-Poisson with generators
$A_n$. Theorem 19.25 shows that (i) is equivalent to (iv) with $X^n$ replaced
by $\tilde X^n$. By the strong law of large numbers for $N$ together with Theorem
4.28, we also see that (iv) holds simultaneously for the processes $X^n$ and
$\tilde X^n$. Thus, (i) and (iv) are equivalent.
Since $X$ is a.s. continuous at fixed times, condition (iv) yields $X_{t_n}^n \xrightarrow{d} X_t$
whenever $t_n \to t$ and the processes $X^n$ and $X$ start at fixed points $x_n \to x$
in $S$. Hence, $T_{n,t_n} f(x_n) \to T_t f(x)$ for any $f \in \hat C$, and (iii) follows. Since
(iii) trivially implies (ii), it remains to show that (ii) implies (i).
Arguing as in the preceding proof, we then need to show that $\tilde R_\lambda^n g \to R_\lambda g$
for any $\lambda > 0$ and $g \in C_0$, where $\tilde R_\lambda^n = (\lambda - A_n)^{-1}$. Now (ii) yields
$R_\lambda^n g \to R_\lambda g$, where $R_\lambda^n = \int e^{-\lambda t} T_{n,t}\,dt$, and so it suffices to prove that
$(R_\lambda^n - \tilde R_\lambda^n) g \to 0$. Then note that
$$\lambda R_\lambda^n g - \lambda \tilde R_\lambda^n g = E g(Y_{\kappa_n - 1}^n) - E g(Y_{\tilde\kappa_n - 1}^n),$$
where the random variables $\kappa_n$ and $\tilde\kappa_n$ are independent of $Y^n$ and geometrically
distributed with parameters $p_n = 1 - e^{-\lambda h_n}$ and $\tilde p_n = \lambda h_n(1 + \lambda h_n)^{-1}$,
respectively. Since $p_n \sim \tilde p_n$, we have $\|\mathcal{L}(\kappa_n) - \mathcal{L}(\tilde\kappa_n)\| \to 0$, and the desired
convergence follows by Fubini's theorem. $\Box$
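The approximation $T_{n,t} = U_n^{[t/h_n]} \to T_t$ of Theorem 19.28 can be seen concretely on a finite state space. The following is a minimal sketch, assuming a hypothetical two-state process with generator matrix $Q = \begin{pmatrix} -a & a \\ b & -b \end{pmatrix}$ and the chains with transition matrices $U_n = I + h_n Q$, so that $A_n = h_n^{-1}(U_n - I) = Q$ exactly. The rates, step sizes and helper names are illustrative choices, not part of the text.

```python
import math


def mat_mul(m1, m2):
    return [[sum(m1[i][k] * m2[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_pow(m, n):
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def semigroup(a, b, t):
    """T_t = exp(tQ) for Q = [[-a, a], [b, -b]], in closed form."""
    s = a + b
    decay = math.exp(-s * t)
    pi0, pi1 = b / s, a / s               # stationary distribution
    return [[pi0 + pi1 * decay, pi1 - pi1 * decay],
            [pi0 - pi0 * decay, pi1 + pi0 * decay]]


def chain_error(a, b, t, h):
    """Largest entrywise gap between U^{[t/h]} and T_t, where U = I + hQ."""
    u = [[1.0 - h * a, h * a], [h * b, 1.0 - h * b]]   # needs h <= 1/max(a, b)
    steps = int(math.floor(t / h + 1e-12))
    approx = mat_pow(u, steps)
    exact = semigroup(a, b, t)
    return max(abs(approx[i][j] - exact[i][j])
               for i in range(2) for j in range(2))


errors = [chain_error(2.0, 3.0, 1.0, h) for h in (0.1, 0.01, 0.001)]
print(errors)
```

The errors shrink as $h_n \to 0$, which is condition (ii) of Theorem 19.25 in this toy setting.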
Exercises
1. Examine how the proofs of Theorems 19.4 and 19.6 can be simplified if
we assume $(F_3)$ instead of the weaker condition $(F_2)$.
2. Consider a pseudo-Poisson process X on S with rate kernel Q. Give
conditions ensuring X to be Feller.
3. Verify the resolvent equation (3), and conclude that the range of $R_\lambda$ is
independent of $\lambda$.
4. Show that a Feller semigroup $(T_t)$ is uniquely determined by the resolvent
operator $R_\lambda$ for a fixed $\lambda > 0$. Interpret the result probabilistically in terms
of an independent, exponentially distributed random variable with mean
$\lambda^{-1}$. (Hint: Use Theorem 19.4 and Lemma 19.5.)
5. Consider a discrete-time Markov process in $S$ with transition operator
$T$, and let $\tau$ be an independent random variable with a fixed geometric
distribution. Show that $T$ is uniquely determined by $E_x f(X_\tau)$ for arbitrary
$x \in S$ and $f \ge 0$. (Hint: Apply the preceding result to the associated
pseudo-Poisson process.)
6. Give a probabilistic description of the Yosida approximation $T_t^\lambda$ in terms
of the original process $X$ and two independent Poisson processes with rate $\lambda$.
7. Given a Feller diffusion semigroup, write the second differential equation
in Theorem 19.6, for suitable $f$, as a PDE for the function $T_t f(x)$ on
$\mathbb{R}_+ \times \mathbb{R}^d$. Also show that the backward equation of Theorem 12.22 is a special
case of the same equation.
8. Consider a Feller process $X$ and an independent subordinator $T$. Show
that $Y_t = X(T_t)$ is again Markov, and that $Y$ is Lévy whenever this is
true for $X$. If both $T$ and $X$ are stable, then so is $Y$. Find the relationship
between the transition semigroups, respectively between the indices of
stability.
9. Consider a Feller process $X$ and an independent renewal process $\tau_0, \tau_1,
\ldots$. Show that $Y_n = X_{\tau_n}$ is a discrete-time Markov process, and express
its transition kernel in terms of the transition semigroup of $X$. Also show
that $Y_t = X(\tau_{[t]})$ may fail to be Markov, even when $(\tau_n)$ is Poisson.
10. Let $X$ and $Y$ be independent Feller processes in $S$ and $T$ with generators
$A$ and $B$. Show that $(X, Y)$ is a Feller process in $S \times T$ with generator
extending $\tilde A + \tilde B$, where $\tilde A$ and $\tilde B$ denote the natural extensions of $A$ and
$B$ to $C_0(S \times T)$.
11. Consider in S a Feller process with generator A and a pseudo-Poisson
process with generator B. Construct a Markov process with generator A +
B.
12. Use Theorem 19.23 to show that the generator of Brownian motion in
$\mathbb{R}$ extends $A = \frac12\Delta$ on the set $D$ of functions $f \in C_0^2$ with $Af \in C_0$.
13. Let $R_\lambda$ be the $\lambda$-resolvent of Brownian motion in $\mathbb{R}$. For any $f \in C_0$,
put $h = R_\lambda f$, and show by direct computation that $\lambda h - \frac12 h'' = f$. Conclude
by Theorem 19.4 that $\frac12\Delta$ with domain $D$, defined as above, extends the
generator $A$. Thus, $A = \frac12\Delta$ by the preceding exercise or by Lemma 19.12.
14. Show that if A is a bounded generator on Co, then the associated
Markov process is pseudo-Poisson. (Hint: Note as in Theorem 19.11 that
A satisfies the positive-maximum principle. Next use Riesz' representation
theorem to express A in terms of bounded kernels, and show that A has
the form of Proposition 19.2.)
15. Let the processes $X^n$ and $X$ be such as in Theorem 16.14. Show that
$X_t^n \xrightarrow{d} X_t$ for all $t > 0$ implies $X^n \xrightarrow{d} X$ in $D(\mathbb{R}_+, \mathbb{R}^d)$, and compare with
the stated theorem. Also prove a corresponding result for a sequence of Lévy
processes $X^n$. (Hint: Use Theorems 19.28 and 19.25, respectively.)
Chapter 20
Ergodic Properties
of Markov Processes
transition and contraction operators; ratio ergodic theorem;
space-time invariance and tail triviality; mixing and conver-
gence in total variation; Harris recurrence and transience;
existence and uniqueness of invariant measure; distributional
and pathwise limits
In Chapters 8 and 12 we have seen, under suitable regularity conditions,
how the transition probabilities of a discrete- or continuous-time Markov
chain converge in total variation toward a unique invariant distribution.
Here our main purpose is to study the asymptotic behavior of more general
Markov processes and their associated transition kernels. A wide range of
powerful tools will then come into play.
We first extend the basic ergodic theorem of Chapter 10 to suitable
contraction operators on an arbitrary measure space and establish a general
operator version of the ratio ergodic theorem. The relevance of those
results for the study of Markov processes is due to the fact that the transition
operators are positive $L^1$-$L^\infty$-contractions with respect to any invariant
measure $\lambda$ on the state space $S$. The mentioned results cover both the
positive recurrent case, where $\lambda S < \infty$, and the null-recurrent case, where
$\lambda S = \infty$. Even more remarkably, the same ergodic theorems apply to
both the transition probabilities and the sample paths, in each case giving
conclusive information about the asymptotic behavior.
Next we prove for an arbitrary Markov process that a certain strong
ergodicity condition is equivalent to the triviality of the tail $\sigma$-field, the
constancy of all bounded, space-time invariant functions, and a uniform mixing
condition. We also consider a similar result where all four conditions are
replaced by suitably averaged versions. For both sets of equivalences, one
gets very simple and transparent proofs by applying the general coupling
results of Chapter 10.
In order to apply the mentioned theorems to specific Markov processes,
one needs to find regularity conditions ensuring the existence of an invariant
measure or the triviality of the tail $\sigma$-field. Here we consider a general
class of Feller processes which satisfy either a strong recurrence or a uniform
transience condition. In the former case, we prove the existence of an
invariant measure, required for the application of the mentioned ergodic
theorems, and show that the space-time invariant functions are constant,
which implies the mentioned strong ergodicity. Our proofs of the latter
results depend on some potential theoretic tools related to those developed
in Chapter 19.
To begin with the technical developments, we consider a Markov transition
operator $T$ on an arbitrary measurable space $(S, \mathcal{S})$. Note that $T$ is
positive, in the sense that $f \ge 0$ implies $Tf \ge 0$, and also that $T1 = 1$.
As before, we write $P_x$ for the distribution of a Markov process on $\mathbb{Z}_+$
with transition operator $T$ starting at $x \in S$. More generally, we define
$P_\mu = \int_S P_x\,\mu(dx)$ for any measure $\mu$ on $S$. A measure $\lambda$ on $S$ is said to
be invariant if $\lambda Tf = \lambda f$ for any measurable function $f \ge 0$. Writing $\theta$
for the shift on the path space $S^\infty$, we define the associated operator $\tilde\theta$ by
$\tilde\theta f = f \circ \theta$.
For any $p \ge 1$, we say that an operator $T$ on some measure space $(S, \mathcal{S}, \mu)$
is an $L^p$-contraction if $\|Tf\|_p \le \|f\|_p$ for every $f \in L^p$. By an
$L^1$-$L^\infty$-contraction we mean an operator that is an $L^p$-contraction for every
$p \in [1, \infty]$. The following result shows the relevance of the mentioned notions
for the theory of Markov processes.
Lemma 20.1 (Markov processes and contractions) Let $T$ be a Markov
transition operator on $(S, \mathcal{S})$ with invariant measure $\lambda$. Then
(i) $T$ is a positive $L^1$-$L^\infty$-contraction on $(S, \lambda)$;
(ii) $\tilde\theta$ is a positive $L^1$-$L^\infty$-contraction on $(S^\infty, P_\lambda)$.
Proof: (i) Applying Jensen's inequality to the transition kernel $\mu(x, B) =
T1_B(x)$ and using the invariance of $\lambda$, we get for any $p \in [1, \infty)$ and $f \in L^p$
$$\|Tf\|_p^p = \lambda|\mu f|^p \le \lambda\mu|f|^p = \lambda|f|^p = \|f\|_p^p.$$
The result for $p = \infty$ is obvious.
(ii) Proceeding as in Lemma 8.11, we see that $\theta$ is a measure-preserving
transformation on $(S^\infty, P_\lambda)$. Hence, for any measurable function $f \ge 0$ on
$S^\infty$ and constant $p \ge 1$, we have
$$P_\lambda|\tilde\theta f|^p = P_\lambda|f \circ \theta|^p = (P_\lambda \circ \theta^{-1})|f|^p = P_\lambda|f|^p.$$
The contraction property for $p = \infty$ is again obvious. $\Box$
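Part (i) of Lemma 20.1 is easy to check on a finite state space. The sketch below assumes a hypothetical three-state transition matrix `P` with invariant (not normalized) measure `lam`; the matrix, the test function and the helper names are illustrative choices, not from the text. It compares $\|Tf\|_p$ with $\|f\|_p$ in $L^p(\lambda)$ for a few values of $p$.

```python
def apply_operator(p_matrix, f):
    """(Tf)(x) = sum_y p_matrix[x][y] * f(y) for a transition matrix."""
    return [sum(pxy * fy for pxy, fy in zip(row, f)) for row in p_matrix]


def lp_norm(f, weights, p):
    """L^p norm with respect to the measure `weights`; p may be float('inf')."""
    if p == float("inf"):
        return max(abs(v) for v, w in zip(f, weights) if w > 0)
    return sum(w * abs(v) ** p for v, w in zip(f, weights)) ** (1.0 / p)


# Hypothetical 3-state chain; lam is chosen so that lam P = lam.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
lam = [1.0, 2.0, 1.0]

# Invariance check: (lam P)(y) = sum_x lam(x) P(x, y).
lam_P = [sum(lam[x] * P[x][y] for x in range(3)) for y in range(3)]

f = [3.0, -1.0, 2.0]
Tf = apply_operator(P, f)
norms = {p: (lp_norm(Tf, lam, p), lp_norm(f, lam, p))
         for p in (1.0, 2.0, 3.0, float("inf"))}
print(lam_P, norms)
```

In each pair the first entry (the norm of $Tf$) does not exceed the second (the norm of $f$), as the lemma predicts.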
We shall see how some crucial results of Chapter 10 carry over to the
context of positive $L^1$-$L^\infty$-contractions on an arbitrary measure space.
First we consider an operator version of Birkhoff's ergodic theorem. To
simplify our writing, we introduce the operators $S_n = \sum_{k<n} T^k$, $A_n =
S_n/n$, and $Mf = \sup_n A_n f$. Say that $f$ is $T$-invariant if $Tf = f$.
Theorem 20.2 (operator ergodic theorem, Hopf, Dunford and Schwartz)
Let $T$ be a positive $L^1$-$L^\infty$-contraction on a measure space $(S, \mathcal{S}, \mu)$. Then
$A_n f$ converges a.e. for every $f \in L^1$ toward a $T$-invariant function $Af \in L^1$.
For the proof, we need to extend the inequalities in Lemmas 10.7 and
10.11 and in Proposition 10.10 (i) to an operator setting.
Lemma 20.3 (maximum inequalities) For any positive $L^1$-contraction $T$
on a measure space $(S, \mathcal{S}, \mu)$, we have
(i) $\mu[f;\ Mf > 0] \ge 0$, $f \in L^1$.
If $T$ is even an $L^1$-$L^\infty$-contraction, then also
(ii) $r\mu\{Mf > 2r\} \le \mu[f;\ f > r]$, $f \in L^1$, $r > 0$;
(iii) $\|Mf\|_p \lesssim \|f\|_p$, $f \in L^p$, $p > 1$.
Proof: (i) For any $f \in L^1$ we write $M_n f = S_1 f \vee \cdots \vee S_n f$ and conclude
by positivity that
$$S_k f = f + TS_{k-1} f \le f + T(M_n f)_+, \qquad k = 1, \ldots, n.$$
Hence, $M_n f \le f + T(M_n f)_+$ for all $n$, and so by positivity and contractivity
$$\begin{aligned}
\mu[f;\ M_n f > 0] &\ge \mu[M_n f - T(M_n f)_+;\ M_n f > 0] \\
&\ge \mu[(M_n f)_+ - T(M_n f)_+] \\
&= \|(M_n f)_+\|_1 - \|T(M_n f)_+\|_1 \ge 0.
\end{aligned}$$
As before, it remains to let $n \to \infty$.
(ii) Put $f_r = f1\{f > r\}$. By the $L^\infty$-contractivity and positivity of $A_n$,
$$A_n f - 2r \le A_n(f - 2r) \le A_n(f_r - r), \qquad n \in \mathbb{N},$$
which implies $Mf - 2r \le M(f_r - r)$. Hence, by part (i),
$$\begin{aligned}
r\mu\{Mf > 2r\} &\le r\mu\{M(f_r - r) > 0\} \\
&\le \mu[f_r;\ M(f_r - r) > 0] \\
&\le \mu f_r = \mu[f;\ f > r].
\end{aligned}$$
(iii) Here the earlier proof applies with only notational changes. $\Box$
Proof of Theorem 20.2: Fix any $f \in L^1$. By dominated convergence, we
may approximate $f$ in $L^1$ by functions $\tilde f \in L^1 \cap L^\infty \subset L^2$. By Lemma 10.18,
we may next approximate $\tilde f$ in $L^2$ by functions of the form $\hat f + (g - Tg)$,
where $\hat f, g \in L^2$ and $T\hat f = \hat f$. Finally, we may approximate $g$ in $L^2$ by
functions $\tilde g \in L^1 \cap L^\infty$. Since $T$ contracts $L^2$, the functions $\tilde g - T\tilde g$ will then
approximate $g - Tg$ in $L^2$. Combining the three approximations, we have
for any $\varepsilon > 0$
$$f = f_\varepsilon + (g_\varepsilon - Tg_\varepsilon) + h_\varepsilon + k_\varepsilon, \tag{1}$$
where $f_\varepsilon \in L^2$ with $Tf_\varepsilon = f_\varepsilon$, $g_\varepsilon \in L^1 \cap L^\infty$, and $\|h_\varepsilon\|_2 \vee \|k_\varepsilon\|_1 < \varepsilon$.
Since $f_\varepsilon$ is invariant, we have $A_n f_\varepsilon = f_\varepsilon$. Next we note that
$$\|A_n(g_\varepsilon - Tg_\varepsilon)\|_\infty = n^{-1}\|g_\varepsilon - T^n g_\varepsilon\|_\infty \le 2n^{-1}\|g_\varepsilon\|_\infty \to 0. \tag{2}$$
Hence,
$$\limsup_{n \to \infty} A_n f \le f_\varepsilon + Mh_\varepsilon + Mk_\varepsilon < \infty \quad \text{a.e.},$$
and similarly for $\liminf_n A_n f$. Combining the two estimates gives
$$(\limsup_n - \liminf_n) A_n f \le 2M|h_\varepsilon| + 2M|k_\varepsilon|.$$
Now Lemma 20.3 yields for any $\varepsilon, r > 0$
$$\|M|h_\varepsilon|\|_2 \lesssim \|h_\varepsilon\|_2 < \varepsilon, \qquad \mu\{M|k_\varepsilon| > 2r\} \le r^{-1}\|k_\varepsilon\|_1 < \varepsilon/r,$$
and so $M|h_\varepsilon| + M|k_\varepsilon| \to 0$ a.e. as $\varepsilon \to 0$ along a suitable sequence. Thus,
$A_n f$ converges a.e. toward some limit $Af$.
To see that $Af$ is $T$-invariant, we note that by (1) and (2) the a.e. limits
$Ah_\varepsilon$ and $Ak_\varepsilon$ exist and satisfy $TAf - Af = (TA - A)(h_\varepsilon + k_\varepsilon)$. By the
contraction property and Fatou's lemma, the right-hand side tends to 0
a.e. as $\varepsilon \to 0$ along some sequence, and we get $TAf = Af$ a.e. $\Box$
A problem with the last theorem is that the limit $Af$ may be 0, in which
case the a.s. convergence $A_n f \to Af$ gives little information about the
asymptotic behavior of $A_n f$. For example, this happens when $\mu S = \infty$ and
$T$ is the operator induced by a $\mu$-preserving and ergodic transformation $\theta$
on $S$. Then $Af$ is a constant, and the condition $Af \in L^1$ implies $Af = 0$. To
get around this difficulty, we may instead compare the asymptotic behavior
of $S_n f$ with that of $S_n g$ for a suitable reference function $g \in L^1$. This idea
leads to a far-reaching and powerful extension of Birkhoff's theorem.
Theorem 20.4 (ratio ergodic theorem, Chacon and Ornstein) Let $T$ be a
positive $L^1$-contraction on a measure space $(S, \mathcal{S}, \mu)$, and fix any $f \in L^1$
and $g \in L^1_+$. Then $S_n f / S_n g$ converges a.e. on the set $\{S_\infty g > 0\}$.
Our proof will be based on three lemmas.
Lemma 20.5 (individual terms) $T^n f / S_{n+1} g \to 0$ a.e. on $\{S_\infty g > 0\}$.
Proof: We may assume that $f \ge 0$. Fix any $\varepsilon > 0$, and define
$$h_n = T^n f - \varepsilon S_{n+1} g, \qquad A_n = \{h_n > 0\}, \qquad n \ge 0.$$
By positivity,
$$h_n = Th_{n-1} - \varepsilon g \le Th_{n-1}^+ - \varepsilon g, \qquad n \ge 1.$$
Examining the cases $A_n$ and $A_n^c$ separately, we conclude that
$$h_n^+ \le Th_{n-1}^+ - \varepsilon 1_{A_n} g, \qquad n \ge 1,$$
and so by contractivity
$$\varepsilon\mu[g;\ A_n] \le \mu(Th_{n-1}^+) - \mu h_n^+ \le \mu h_{n-1}^+ - \mu h_n^+.$$
Summing over $n$ gives
$$\varepsilon\mu \sum_{n \ge 1} 1_{A_n} g \le \mu h_0^+ = \mu(f - \varepsilon g)^+ \le \mu f < \infty,$$
which implies $\mu[g;\ A_n \text{ i.o.}] = 0$ and hence $\limsup_n (T^n f / S_{n+1} g) \le \varepsilon$ a.e.
on $\{g > 0\}$. Since $\varepsilon$ was arbitrary, we obtain $T^n f / S_{n+1} g \to 0$ a.e. on
$\{g > 0\}$. Applying this result to the functions $T^m f$ and $T^m g$ gives the
same convergence on $\{S_{m-1} g = 0 < S_m g\}$ for arbitrary $m > 1$. $\Box$
To state the next result, we introduce the nonlinear filling operator $U$ on
$L^1$, given by $Uh = Th_+ - h_-$. It is suggestive to think of the sequence $U^n h$
as resulting from successive attempts to fill a hole $h_-$, by mapping in each
step only the matter that has not yet fallen into the hole. We also define
$M_n h = S_1 h \vee \cdots \vee S_n h$.
Lemma 20.6 (filling operator) For any $h \in L^1$ and $n \in \mathbb{N}$, we have
$U^{n-1} h \ge 0$ on $\{M_n h > 0\}$.
Proof: Writing $h_k = h_+ + (Uh)_+ + \cdots + (U^k h)_+$, we claim that
$$h_k \ge S_{k+1} h, \qquad k \ge 0. \tag{3}$$
This holds for $k = 0$ since $h_+ = h + h_- \ge h$. Proceeding by induction, we
assume (3) to be true for $k = m \ge 0$. Using the induction hypothesis and
the definitions of $S_k$, $h_k$, and $U$, we get for $m + 1$
$$\begin{aligned}
S_{m+2} h &= h + TS_{m+1} h \le h + Th_m = h + \sum_{k \le m} T(U^k h)_+ \\
&= h + \sum_{k \le m} \big(U^{k+1} h + (U^k h)_-\big) \\
&= h + \sum_{k \le m} \big((U^{k+1} h)_+ - (U^{k+1} h)_- + (U^k h)_-\big) \\
&= h + h_{m+1} - h_+ + h_- - (U^{m+1} h)_- \le h_{m+1}.
\end{aligned}$$
This completes the proof of (3).
If $M_n h > 0$ at some point in $S$, then $S_k h > 0$ for some $k \le n$, and so by
(3) we have $h_k > 0$ for some $k < n$. But then $(U^k h)_+ > 0$, and therefore
$(U^k h)_- = 0$ for the same $k$. Since $(U^k h)_-$ is nonincreasing, it follows that
$(U^{n-1} h)_- = 0$, and hence $U^{n-1} h \ge 0$. $\Box$
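The filling operator becomes concrete on a finite set with counting measure, where a positive $L^1$-contraction is given by a nonnegative matrix with column sums at most 1. The following sketch assumes such a hypothetical random kernel (seed, size and helper names are illustrative, not from the text) and checks the conclusion of Lemma 20.6, $U^{n-1}h \ge 0$ on $\{M_n h > 0\}$, up to a small floating-point tolerance.

```python
import random


def apply_T(kernel, f):
    """(Tf)(x) = sum_y kernel[x][y] * f(y)."""
    return [sum(k * v for k, v in zip(row, f)) for row in kernel]


def fill(kernel, h):
    """Filling operator U h = T h_+ - h_-."""
    h_plus = [max(v, 0.0) for v in h]
    h_minus = [max(-v, 0.0) for v in h]
    return [t - m for t, m in zip(apply_T(kernel, h_plus), h_minus)]


def check_lemma(kernel, h, n_max, tol=1e-9):
    """Check U^{n-1} h >= 0 on {M_n h > 0} for n = 1, ..., n_max."""
    size = len(h)
    partial = [0.0] * size                  # S_n h = h + T h + ... + T^{n-1} h
    running_max = [float("-inf")] * size    # M_n h = S_1 h v ... v S_n h
    power = list(h)                         # T^{n-1} h
    filled = list(h)                        # U^{n-1} h
    for _ in range(n_max):
        partial = [s + p for s, p in zip(partial, power)]
        running_max = [max(m, s) for m, s in zip(running_max, partial)]
        for x in range(size):
            if running_max[x] > tol and filled[x] < -tol:
                return False
        power = apply_T(kernel, power)
        filled = fill(kernel, filled)
    return True


rng = random.Random(0)
size = 5
all_ok = True
for _ in range(200):
    raw = [[rng.random() for _ in range(size)] for _ in range(size)]
    col_sums = [sum(raw[x][y] for x in range(size)) for y in range(size)]
    # Normalize columns so that T preserves total mass of nonnegative functions.
    kernel = [[raw[x][y] / col_sums[y] for y in range(size)] for x in range(size)]
    h = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    all_ok = all_ok and check_lemma(kernel, h, n_max=30)
print(all_ok)
```

Only positivity and linearity of $T$ enter the proof of the lemma, so the column normalization is not essential to the check; it is kept to match the $L^1$-contraction setting of this section.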
To state our third and crucial lemma, we write $g \in \mathcal{T}_1(f)$ for a given
$f \in L^1_+$ if there exists a decomposition $f = f_1 + f_2$ with $f_1, f_2 \in L^1_+$
such that $g = Tf_1 + f_2$. In particular, we note that $f, g \in L^1_+$ implies
$U(f - g) = f' - g$ for some $f' \in \mathcal{T}_1(f)$. The classes $\mathcal{T}_n(f)$ are defined
recursively by $\mathcal{T}_{n+1}(f) = \mathcal{T}_1(\mathcal{T}_n(f))$, and we put $\mathcal{T}(f) = \bigcup_n \mathcal{T}_n(f)$. We may
now introduce the functionals
$$\psi_B f = \sup\{\mu[g;\ B];\ g \in \mathcal{T}(f)\}, \qquad f \in L^1_+,\ B \in \mathcal{S}.$$
Lemma 20.7 (filling functionals) Let $f, g \in L^1_+$ and $B \in \mathcal{S}$. Then
$$B \subset \{\limsup_n S_n(f - g) > 0\} \implies \psi_B f \ge \psi_B g.$$
Proof: Fix any $g' \in \mathcal{T}(g)$ and $c > 1$. First we show that
$$\{\limsup_n S_n(f - g) > 0\} \subset \{\limsup_n S_n(cf - g') > 0\} \quad \text{a.e.} \tag{4}$$
We may then assume that $g' \in \mathcal{T}_1(g)$, since the general result then follows
by iteration in finitely many steps. Letting $g' = r + Ts$ for some $r, s \in L^1_+$
with $r + s = g$, we obtain
$$S_n(cf - g') = \sum_{k<n} T^k(cf - r - Ts) = S_n(f - g) + (c - 1)S_n f + s - T^n s.$$
Since $T^n s / S_n f = T^{n-1}Ts / S_n f \to 0$ a.e. on $\{S_\infty f > 0\}$ by Lemma 20.5,
we conclude that eventually $S_n(cf - g') \ge S_n(f - g)$ a.e. on the same set,
and (4) follows.
Combining the given hypothesis with (4), we obtain the a.e. relation
$B \subset \{M(cf - g') > 0\}$. Now Lemma 20.6 yields
$$U^{n-1}(cf - g') \ge 0 \text{ on } B_n = B \cap \{M_n(cf - g') > 0\}, \qquad n \in \mathbb{N}.$$
Since $B_n \uparrow B$ a.e. and $U^{n-1}(cf - g') = f' - g'$ for some $f' \in \mathcal{T}(cf)$, we get
$$\begin{aligned}
0 &\le \mu[U^{n-1}(cf - g');\ B_n] \\
&= \mu[f';\ B_n] - \mu[g';\ B_n] \\
&\le \psi_B(cf) - \mu[g';\ B_n] \\
&\to c\psi_B f - \mu[g';\ B],
\end{aligned}$$
and so $c\psi_B f \ge \mu[g';\ B]$. It remains to let $c \to 1$ and take the supremum
over $g' \in \mathcal{T}(g)$. $\Box$
Proof of Theorem 20.4: We may assume that $f \ge 0$. On $\{S_\infty g > 0\}$, put
$$\alpha = \liminf_n (S_n f / S_n g) \le \limsup_n (S_n f / S_n g) = \beta,$$
and define $\alpha = \beta = 0$ otherwise. Since $S_n g$ is nondecreasing, we have for
any $c > 0$
$$\begin{aligned}
\{\beta > c\} &\subset \{\limsup_n (S_n(f - cg)/S_n g) > 0,\ S_\infty g > 0\} \\
&\subset \{\limsup_n S_n(f - cg) > 0\}.
\end{aligned}$$
Writing $B = \{\beta = \infty,\ S_\infty g > 0\}$, we see from Lemma 20.7 that
$$c\psi_B g = \psi_B(cg) \le \psi_B f \le \mu f < \infty,$$
and as $c \to \infty$ we get $\psi_B g = 0$. But then $\mu[T^n g;\ B] = 0$ for all $n \ge 0$, and
therefore $\mu[S_\infty g;\ B] = 0$. Since $S_\infty g > 0$ on $B$, we obtain $\mu B = 0$, which
means that $\beta < \infty$ a.e.
Now define $C = \{\alpha < a < b < \beta\}$ for fixed $b > a > 0$. As before,
$$C \subset \{\limsup_n S_n(f - bg) \wedge \limsup_n S_n(ag - f) > 0\},$$
and so by Lemma 20.7
$$b\psi_C g = \psi_C(bg) \le \psi_C f \le \psi_C(ag) = a\psi_C g < \infty,$$
which implies $\psi_C g = 0$, and therefore $\mu C = 0$. Hence,
$$\mu\{\alpha < \beta\} \le \sum_{a<b} \mu\{\alpha < a < b < \beta\} = 0,$$
where the summation extends over all rational $a < b$, and so $\alpha = \beta$ a.e.,
which proves the asserted convergence. $\Box$
We illustrate the use of the last theorem by considering a striking
application to discrete-time Markov processes. Given such a process $X$ on $S$
and a measurable function $f$ on $S^\infty$, we define $S_n f = \sum_{k<n} f(\theta^k X)$.
Corollary 20.8 (ratio limit theorem) Given a discrete-time Markov process
in $S$ with invariant measure $\lambda$, we have for any $f \in L^1(P_\lambda)$ and
$g \in L^1_+(P_\lambda)$
(i) $S_n f / S_n g$ converges a.e. $P_\lambda$ on $\{y \in S^\infty;\ S_\infty g(y) > 0\}$;
(ii) $E_x S_n f / E_x S_n g$ converges a.e. $\lambda$ on $\{x \in S;\ E_x S_\infty g > 0\}$.
Proof: (i) By Lemma 20.1 (ii) we may apply Theorem 20.4 to the
$L^1$-$L^\infty$-contraction $\tilde\theta$ on $(S^\infty, P_\lambda)$ induced by the shift $\theta$, and the result follows.
(ii) Writing $\tilde f(x) = E_x f(X)$ and using the Markov property at $k$, we get
$$E_x f(\theta^k X) = E_x E_{X_k} f(X) = E_x \tilde f(X_k) = T^k \tilde f(x),$$
and so
$$E_x S_n f = \sum_{k<n} E_x f(\theta^k X) = \sum_{k<n} T^k \tilde f(x) = S_n \tilde f(x),$$
where $S_n = T^0 + \cdots + T^{n-1}$ on the right. We also note that
$$\lambda \tilde f = \int \tilde f(x)\,\lambda(dx) = \int E_x f(X)\,\lambda(dx) = E_\lambda f(X) = P_\lambda f.$$
Now Lemma 20.1 (i) shows that $T$ is a positive $L^1$-$L^\infty$-contraction on
$(S, \lambda)$. By Theorem 20.4 we conclude that $S_n \tilde f(x) / S_n \tilde g(x)$ converges a.e.
$\lambda$ on the set $\{x \in S;\ S_\infty \tilde g(x) > 0\}$, which translates immediately into the
asserted statement. $\Box$
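Part (ii) of the corollary can be illustrated in a null-recurrent case. The sketch below assumes simple symmetric random walk on $\mathbb{Z}$ (an example chosen here, not taken from the text), whose invariant measure is counting measure, and takes $f$ and $g$ to be the indicators of $\{X_0 \in \{0, 1\}\}$ and $\{X_0 = 0\}$. It computes the expected occupation counts exactly by propagating the distribution of the walk; the function names are illustrative.

```python
def step(dist):
    """One step of simple symmetric random walk on Z; dist maps site -> mass."""
    new = {}
    for site, mass in dist.items():
        for nxt in (site - 1, site + 1):
            new[nxt] = new.get(nxt, 0.0) + 0.5 * mass
    return new


def expected_visit_counts(n, start, sites):
    """E_start sum_{k<n} 1{X_k = s}, for every s in sites."""
    dist = {start: 1.0}
    totals = {s: 0.0 for s in sites}
    for _ in range(n):
        for s in sites:
            totals[s] += dist.get(s, 0.0)
        dist = step(dist)
    return totals


counts = expected_visit_counts(1000, 0, (0, 1))
# Ratio E_0 S_n f / E_0 S_n g with f = 1{X_0 in {0, 1}} and g = 1{X_0 = 0}.
ratio = (counts[0] + counts[1]) / counts[0]
print(ratio)
```

Both expected sums grow like $\sqrt{n}$, yet their ratio settles down. In this example it creeps up toward 2, the ratio of the counting-measure masses of $\{0, 1\}$ and $\{0\}$; this identification of the limit is a standard fact about the walk and goes beyond what the corollary itself asserts.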
Now consider a conservative, continuous-time Markov process on an
arbitrary state space $(S, \mathcal{S})$ with distributions $P_x$ and associated expectation
operators $E_x$. On the canonical path space $\Omega = S^{\mathbb{R}_+}$ we introduce the shift
operators $\theta_t$ and filtration $\mathcal{F} = (\mathcal{F}_t)$. A bounded function $f: S \to \mathbb{R}$ is said
to be invariant or harmonic if it is measurable and such that
$$f(x) = T_t f(x) = E_x f(X_t), \qquad x \in S,\ t \ge 0.$$
More generally, we say that a bounded function $f: S \times \mathbb{R}_+ \to \mathbb{R}$ is space-time
invariant or harmonic if it is measurable and satisfies
$$f(x, s) = E_x f(X_t, s + t), \qquad x \in S,\ s, t \ge 0. \tag{5}$$
For motivation, we note that $f$ is then invariant for the associated space-time
process $\tilde X_t = (X_t, s + t)$ in $S \times \mathbb{R}_+$, where the second component
is deterministic apart from the possibly random initial value $s \ge 0$. Note
that $\tilde X$ is again a time-homogeneous Markov process with transition operators
$\tilde T_t f(x, s) = E_x f(X_t, s + t)$. We need the following useful martingale
connection.
Lemma 20.9 (space-time invariance) A bounded, measurable function
$f: S \times \mathbb{R}_+ \to \mathbb{R}$ is space-time invariant iff the process $M_t = f(X_t, s + t)$ is a
$P_\mu$-martingale for any $\mu$ and $s \ge 0$.
Proof: Assume that $f$ is space-time invariant. Letting $s, t, h \ge 0$ and
using the Markov property of $X$, we get
$$\begin{aligned}
E_\mu[M_{t+h}\,|\,\mathcal{F}_t] &= E_\mu[f(X_{t+h}, s + t + h)\,|\,\mathcal{F}_t] \\
&= E_{X_t} f(X_h, s + t + h) \\
&= f(X_t, s + t) = M_t,
\end{aligned}$$
which shows that $M$ is a $P_\mu$-martingale. Conversely, the martingale
property of $M$ for $\mu = \delta_x$ yields
$$E_x f(X_t, s + t) = E_x M_t = E_x M_0 = f(x, s), \qquad x \in S,\ s, t \ge 0,$$
which means that $f$ is space-time invariant. $\Box$
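The tail $\sigma$-field on $\Omega$ is defined as $\mathcal{T} = \bigcap_t \mathcal{T}_t$, where $\mathcal{T}_t = \sigma(\theta_t) =
\sigma\{X_s;\ s \ge t\}$. A $\sigma$-field $\mathcal{G}$ on $\Omega$ is said to be $P_\mu$-trivial if $P_\mu A = 0$ or 1 for
every $A \in \mathcal{G}$. We write $P_\mu^B = P_\mu[\,\cdot\,|B]$ and say that $P_\mu$ is mixing if
$$\lim_{t \to \infty} \|P_\mu \circ \theta_t^{-1} - P_\mu^B \circ \theta_t^{-1}\| = 0, \qquad B \in \mathcal{F}_\infty \text{ with } P_\mu B > 0.$$
The following key result defines the notion of strong ergodicity, as opposed
to the weak ergodicity of Theorem 20.11.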
Theorem 20.10 (strong ergodicity, Orey) For any conservative, discrete-
or continuous-time Markov semigroup with distributions $P_\mu$, these
conditions are equivalent:
(i) the tail $\sigma$-field $\mathcal{T}$ is $P_\mu$-trivial for every $\mu$;
(ii) $P_\mu$ is mixing for every $\mu$;
(iii) every bounded, space-time invariant function is a constant;
(iv) $\|P_\mu \circ \theta_t^{-1} - P_\nu \circ \theta_t^{-1}\| \to 0$ as $t \to \infty$ for any $\mu$ and $\nu$.
First proof: By Theorem 10.27 (i) we note that (ii) and (iv) are equivalent
to the conditions
(ii') $P_\mu = P_\mu^B$ on $\mathcal{T}$ for any $\mu$ and $B$;
(iv') $P_\mu = P_\nu$ on $\mathcal{T}$ for any $\mu$ and $\nu$.
We may then prove that (ii') $\Leftrightarrow$ (i) $\Rightarrow$ (iv') and (iv) $\Rightarrow$ (iii) $\Rightarrow$ (i).
(i) $\Leftrightarrow$ (ii'): If $P_\mu A = 0$ or 1, then clearly also $P_\mu^B A = 0$ or 1, which shows
that (i) $\Rightarrow$ (ii'). Conversely, let $A \in \mathcal{T}$ be arbitrary with $P_\mu A > 0$. Taking
$B = A$ in (ii') gives $P_\mu A = (P_\mu A)^2$, which implies $P_\mu A = 1$.
(i) $\Rightarrow$ (iv'): Applying (i) to the distribution $\frac12(\mu + \nu)$ gives $P_\mu A + P_\nu A = 0$
or 2 for every $A \in \mathcal{T}$, which implies $P_\mu A = P_\nu A = 0$ or 1.
(iv) $\Rightarrow$ (iii): Let $f$ be bounded and space-time invariant. Using (iv) with
$\mu = \delta_x$ and $\nu = \delta_y$ gives
$$\begin{aligned}
|f(x, s) - f(y, s)| &= |E_x f(X_t, s + t) - E_y f(X_t, s + t)| \\
&\le \|f\|\,\|P_x \circ \theta_t^{-1} - P_y \circ \theta_t^{-1}\| \to 0,
\end{aligned}$$
which shows that $f(x, s) = f(s)$ is independent of $x$. But then $f(s) =
f(s + t)$ by (5), and so $f$ is a constant.
(iii) $\Rightarrow$ (i): Fix any $A \in \mathcal{T}$. Since $A \in \mathcal{T}_t = \sigma(\theta_t)$ for every $t \ge 0$, we have
$A = \theta_t^{-1} A_t$ for some sets $A_t \in \mathcal{F}_\infty$, and we note that $A_t$ is unique since $\theta_t$
is surjective. For any $s, t \ge 0$,
$$\theta_t^{-1}\theta_s^{-1} A_{s+t} = \theta_{s+t}^{-1} A_{s+t} = A = \theta_t^{-1} A_t,$$
and so $\theta_s^{-1} A_{s+t} = A_t$. Putting $f(x, t) = P_x A_t$ and using the Markov
property at time $s$, we get
$$E_x f(X_s, s + t) = E_x P_{X_s} A_{s+t} = E_x P_x[\theta_s^{-1} A_{s+t}\,|\,\mathcal{F}_s] = P_x A_t = f(x, t).$$
Thus, $f$ is space-time invariant and therefore equal to a constant $c \in [0, 1]$.
By the Markov property at $t$ and martingale convergence as $t \to \infty$, we
have a.s.
$$c = f(X_t, t) = P_{X_t} A_t = P_\mu[\theta_t^{-1} A_t\,|\,\mathcal{F}_t] = P_\mu[A\,|\,\mathcal{F}_t] \to 1_A,$$
which implies $P_\mu A = c \in \{0, 1\}$. This shows that $\mathcal{T}$ is $P_\mu$-trivial. $\Box$
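Second proof: We can avoid using the rather deep Theorem 10.27 by
giving direct proofs of the implications (i) $\Rightarrow$ (ii) $\Rightarrow$ (iv).
(i) $\Rightarrow$ (ii): Assuming (i), we get by reverse martingale convergence
$$\begin{aligned}
\|P_\mu(\,\cdot \cap B) - P_\mu(\,\cdot\,)P_\mu(B)\|_{\mathcal{T}_t} &= \|E_\mu[P_\mu[B\,|\,\mathcal{T}_t] - P_\mu B;\ \cdot\,]\|_{\mathcal{T}_t} \\
&\le E_\mu|P_\mu[B\,|\,\mathcal{T}_t] - P_\mu B| \to 0.
\end{aligned}$$
(ii) $\Rightarrow$ (iv): Let $\mu' - \nu'$ be the Hahn decomposition of $\mu - \nu$ and choose
$B \in \mathcal{S}$ with $\mu' B^c = \nu' B = 0$. Writing $\chi = \mu' + \nu'$ and $A = \{X_0 \in B\}$, we
get by (ii)
$$\|P_\mu \circ \theta_t^{-1} - P_\nu \circ \theta_t^{-1}\| = \|\mu'\|\,\|P_\chi^A \circ \theta_t^{-1} - P_\chi^{A^c} \circ \theta_t^{-1}\| \to 0. \qquad \Box$$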
The invariant $\sigma$-field $\mathcal{I}$ on $\Omega$ consists of all events $A \subset \Omega$ such that
$\theta_t^{-1} A = A$ for all $t \ge 0$. Note that a random variable $\xi$ on $\Omega$ is $\mathcal{I}$-measurable
iff $\xi \circ \theta_t = \xi$ for all $t \ge 0$. The invariant $\sigma$-field $\mathcal{I}$ is clearly contained in the
tail $\sigma$-field $\mathcal{T}$. We say that $P_\mu$ is weakly mixing if
$$\lim_{t \to \infty} \Big\|\int_0^1 (P_\mu - P_\mu^B) \circ \theta_{st}^{-1}\,ds\Big\| = 0, \qquad B \in \mathcal{F}_\infty \text{ with } P_\mu B > 0,$$
where it is understood that $\theta_s = \theta_{[s]}$ when the time scale is discrete. We
may now state the weak counterpart of Theorem 20.10.
Theorem 20.11 (weak ergodicity) For any measurable, conservative, discrete- or continuous-time Markov semigroup with distributions $P_\mu$, these conditions are equivalent:
(i) the invariant $\sigma$-field $\mathcal{I}$ is $P_\mu$-trivial for every $\mu$;
(ii) $P_\mu$ is weakly mixing for every $\mu$;
(iii) every bounded, invariant function is a constant;
(iv) $\|\int_0^1 (P_\mu - P_\nu) \circ \theta_{st}^{-1}\,ds\| \to 0$ as $t \to \infty$ for any $\mu$ and $\nu$.
Proof: By Theorem 10.27 (ii) we note that (ii) and (iv) are equivalent to the conditions
(ii') $P_\mu = P_\mu^B$ on $\mathcal{I}$ for any $\mu$ and $B$;
(iv') $P_\mu = P_\nu$ on $\mathcal{I}$ for any $\mu$ and $\nu$.
Here the implications (ii') $\Leftrightarrow$ (i) $\Rightarrow$ (iv') may be established as before, and so it is enough to show that (iv) $\Rightarrow$ (iii) $\Rightarrow$ (i).
(iv) $\Rightarrow$ (iii): Let $f$ be bounded and invariant. Then $f(x) = E_x f(X_t) = T_t f(x)$, and therefore $f(x) = \int_0^1 T_{st} f(x)\,ds$. Using (iv) gives
$$|f(x) - f(y)| = \Big| \int_0^1 (T_{st} f(x) - T_{st} f(y))\,ds \Big| \le \|f\|\,\Big\| \int_0^1 (P_x - P_y) \circ \theta_{st}^{-1}\,ds \Big\| \to 0,$$
which shows that $f$ is a constant.
(iii) $\Rightarrow$ (i): Fix any $A \in \mathcal{I}$, and define $f(x) = P_x A$. Using the Markov property at $t$ and the invariance of $A$, we get
$$E_x f(X_t) = E_x P_{X_t} A = E_x P_x[\theta_t^{-1} A \mid \mathcal{F}_t] = P_x \theta_t^{-1} A = P_x A = f(x),$$
which shows that $f$ is invariant. By (iii) it follows that $f$ equals a constant $c \in [0,1]$. Hence, by the Markov property and martingale convergence, we have a.s.
$$c = f(X_t) = P_{X_t} A = P_\mu[\theta_t^{-1} A \mid \mathcal{F}_t] = P_\mu[A \mid \mathcal{F}_t] \to 1_A,$$
which implies $P_\mu A = c \in \{0,1\}$. Thus, $\mathcal{I}$ is $P_\mu$-trivial. □
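The gap between weak and strong ergodicity can be seen in a toy case. The following is a minimal sketch, assuming a deterministic two-state flip chain in discrete time (an illustrative choice, not an example from the text) and looking only at one-time marginals rather than at the full path-space measures used in Theorems 20.10 and 20.11. The plain total variation distance between the laws started at the two states stays at 2, while the Cesàro-averaged distance, the marginal analogue of condition (iv) above, tends to 0. All function names are hypothetical.

```python
def push(dist, P):
    """One step of the chain: the row vector dist times the matrix P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def total_variation(p, q):
    """Total variation norm of p - q (sum of absolute differences)."""
    return sum(abs(a - b) for a, b in zip(p, q))


def compare(P, mu, nu, n_steps):
    """Return plain and Cesaro-averaged distances for n = 1..n_steps.

    plain[n-1]  is the distance between the laws at time n - 1;
    cesaro[n-1] is the distance between the averages of the laws
                at times 0, ..., n - 1.
    """
    plain, cesaro = [], []
    acc_mu = [0.0] * len(mu)
    acc_nu = [0.0] * len(nu)
    for n in range(1, n_steps + 1):
        # add the laws at time n - 1 to the running sums
        acc_mu = [a + m for a, m in zip(acc_mu, mu)]
        acc_nu = [a + v for a, v in zip(acc_nu, nu)]
        plain.append(total_variation(mu, nu))
        cesaro.append(total_variation(acc_mu, acc_nu) / n)
        mu, nu = push(mu, P), push(nu, P)
    return plain, cesaro


# Assumed example: the chain jumps to the other state at every step.
FLIP = [[0.0, 1.0], [1.0, 0.0]]
plain, cesaro = compare(FLIP, [1.0, 0.0], [0.0, 1.0], 50)
```

For this chain every invariant function is constant, whereas a function such as $f(x, s) = (-1)^{x+s}$ is space-time invariant without being constant, which is consistent with the chain being weakly but not strongly ergodic.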
Let us now specialize to the case of conservative Feller processes $X$ with distributions $P_x$, defined on an lcscH (locally compact, second countable Hausdorff) space $S$ with Borel $\sigma$-field $\mathcal{S}$. We say that the process is regular if there exist a locally finite measure $\rho$ on $S$ and a continuous function $(x, y, t) \mapsto p_t(x, y) > 0$ on $S^2 \times (0, \infty)$ such that
$$P_x\{X_t \in B\} = \int_B p_t(x, y)\,\rho(dy), \quad x \in S,\ B \in \mathcal{S},\ t > 0.$$
Note that the supporting measure $\rho$ is then unique, up to an equivalence, and that $\operatorname{supp}(\rho) = S$ by the Feller property. A Feller process is said to be Harris recurrent if it is regular with a supporting measure $\rho$ satisfying
$$\int_0^\infty 1_B(X_t)\,dt = \infty \text{ a.s. } P_x, \quad x \in S,\ B \in \mathcal{S} \text{ with } \rho B > 0. \tag{6}$$
Theorem 20.12 (Harris recurrence and ergodicity, Orey) Any Harris recurrent Feller process is strongly ergodic.
Proof: By Theorem 20.10 it suffices to prove that any bounded, space-time invariant function $f: S \times \mathbb{R}_+ \to \mathbb{R}$ is a constant. First we show for fixed $x \in S$ that $f(x, t)$ is independent of $t$. Then assume instead that $f(x, h) \ne f(x, 0)$ for some $h > 0$, say $f(x, h) > f(x, 0)$. Recall from Lemma 20.9 that $M_t^s = f(X_t, s + t)$ is a $P_y$-martingale for any $y \in S$ and $s \ge 0$. In particular, the limit $M_\infty^s$ exists a.s. along $h\mathbb{Q}$, and we get a.s. $P_x$
$$E_x[M_\infty^h - M_\infty^0 \mid \mathcal{F}_0] = M_0^h - M_0^0 = f(x, h) - f(x, 0) > 0,$$
which implies $P_x\{M_\infty^h > M_\infty^0\} > 0$. We may then choose some constants $a < b$ such that
$$P_x\{M_\infty^0 < a < b < M_\infty^h\} > 0. \tag{7}$$
We also note that
$$M_t^{s+h} \circ \theta_s = f(X_{s+t}, s + t + h) = M_{s+t}^h, \quad s, t, h \ge 0. \tag{8}$$
With $s$ and $t$ restricted to $h\mathbb{Q}_+$, we define
$$g(y, s) = P_y \bigcap_{t \ge 0} \{M_t^s \le a < b \le M_t^{s+h}\}, \quad y \in S,\ s \ge 0.$$
Using the Markov property at $s$ and (8), we get a.s. $P_x$ for any $r \le s$
$$g(X_s, s) = P_{X_s} \bigcap_{t \ge 0} \{M_t^s \le a < b \le M_t^{s+h}\} = P_x^{\mathcal{F}_s} \bigcap_{t \ge 0} \{M_t^s \circ \theta_s \le a < b \le M_t^{s+h} \circ \theta_s\} = P_x^{\mathcal{F}_s} \bigcap_{t \ge s} \{M_t^0 \le a < b \le M_t^h\} \ge P_x^{\mathcal{F}_s} \bigcap_{t \ge r} \{M_t^0 \le a < b \le M_t^h\}.$$
By martingale convergence, we get a.s. as $s \to \infty$ along $h\mathbb{Q}$ and then $r \to \infty$
$$\liminf_{s \to \infty} g(X_s, s) \ge \liminf_{t \to \infty} 1\{M_t^0 \le a < b \le M_t^h\} \ge 1\{M_\infty^0 < a < b < M_\infty^h\},$$
and so by (7)
$$P_x\{g(X_s, s) \to 1\} \ge P_x\{M_\infty^0 < a < b < M_\infty^h\} > 0. \tag{9}$$
Now fix any nonempty, bounded, open set $B \subset S$. Using (6) and the right-continuity of $X$, we note that $\limsup_s 1_B(X_s) = 1$ a.s. $P_x$, and so in view of (9)
$$P_x\{\limsup\nolimits_s 1_B(X_s)\,g(X_s, s) = 1\} > 0. \tag{10}$$
Furthermore, we have by regularity
$$p_h(u, v) \wedge p_{2h}(u, v) \ge \varepsilon > 0, \quad u, v \in B, \tag{11}$$
for some $\varepsilon > 0$. By (10) we may choose some $y \in B$ and $s \ge 0$ such that $g(y, s) > 1 - \tfrac12 \varepsilon \rho B$. Define for $i = 1, 2$
$$B_i = B \setminus \{u \in S;\ f(u, s + ih) \le a < b \le f(u, s + (i+1)h)\}.$$
Using (11), the definitions of $B_i$, $M^s$, and $g$, and the properties of $y$ and $s$, we get
$$\varepsilon \rho B_i \le P_y\{X_{ih} \in B_i\} \le 1 - P_y\{f(X_{ih}, s + ih) \le a < b \le f(X_{ih}, s + (i+1)h)\} = 1 - P_y\{M_{ih}^s \le a < b \le M_{ih}^{s+h}\} \le 1 - g(y, s) < \tfrac12 \varepsilon \rho B.$$
Thus, $\rho B_1 + \rho B_2 < \rho B$, and there exists some $u \in B \setminus (B_1 \cup B_2)$. But this yields the contradiction $a < b \le f(u, s + 2h) \le a$, which shows that $f(x, t) = f(x)$ is indeed independent of $t$.
To see that $f(x)$ is also independent of $x$, we assume that instead
$$\rho\{x;\ f(x) \le a\} \wedge \rho\{x;\ f(x) \ge b\} > 0$$
for some $a < b$. Then by (6) the martingale $M_t = f(X_t)$ satisfies
$$\int_0^\infty 1\{M_t \le a\}\,dt = \int_0^\infty 1\{M_t \ge b\}\,dt = \infty \text{ a.s.} \tag{12}$$
Writing $\tilde M$ for the right-continuous version of $M$, which exists by Theorem 7.27 (ii), we get by Fubini's theorem for any $x \in S$
$$\sup_{u > 0} E_x \Big| \int_0^u 1\{M_t \le a\}\,dt - \int_0^u 1\{\tilde M_t \le a\}\,dt \Big| \le \int_0^\infty E_x |1\{M_t \le a\} - 1\{\tilde M_t \le a\}|\,dt \le \int_0^\infty P_x\{M_t \ne \tilde M_t\}\,dt = 0,$$
and similarly for the events $M_t \ge b$ and $\tilde M_t \ge b$. Thus, the integrals on the left agree a.s. $P_x$ for all $x$, and so (12) remains true with $M$ replaced by $\tilde M$. In particular,
$$\liminf_{t \to \infty} \tilde M_t \le a < b \le \limsup_{t \to \infty} \tilde M_t \text{ a.s. } P_x.$$
But this is impossible, since $\tilde M$ is a bounded, right-continuous martingale and therefore converges a.s. The contradiction shows that $f(x) = c$ a.e. $\rho$ for some constant $c \in \mathbb{R}$. Then for any $t > 0$,
$$f(x) = E_x f(X_t) = \int f(y)\,p_t(x, y)\,\rho(dy) = c, \quad x \in S,$$
and so $f(x)$ is indeed independent of $x$. □
Our further analysis of regular Feller processes requires some potential theory. For any measurable functions $f, h \ge 0$ on $S$, we define the $h$-potential $U_h f$ of $f$ by
$$U_h f(x) = E_x \int_0^\infty e^{-A_t^h} f(X_t)\,dt, \quad x \in S,$$
where $A^h$ denotes the elementary additive functional
$$A_t^h = \int_0^t h(X_s)\,ds, \quad t \ge 0.$$
When $h$ is a constant $a \ge 0$, we note that $U_h = U_a$ agrees with the resolvent operator $R_a$ of the semigroup $(T_t)$, in which case
$$U_a f(x) = E_x \int_0^\infty e^{-at} f(X_t)\,dt = \int_0^\infty e^{-at}\,T_t f(x)\,dt, \quad x \in S.$$
The classical resolvent equation extends to general $h$-potentials as follows.
Lemma 20.13 (resolvent equation) Let $f \ge 0$ and $h \ge k \ge 0$ be measurable functions on $S$, and assume that $h$ is bounded. Then $U_h h \le 1$, and
$$U_k f = U_h f + U_h (h - k) U_k f = U_h f + U_k (h - k) U_h f.$$
Proof: For convenience, we define $F = f(X)$, $H = h(X)$, and $K = k(X)$. By Itô's formula for continuous functions of bounded variation,
$$e^{-A_t^h} = 1 - \int_0^t e^{-A_s^h} H_s\,ds, \quad t \ge 0, \tag{13}$$
which implies $U_h h \le 1$. We may also conclude from the Markov property of $X$ that a.s.
$$U_k f(X_t) = E_{X_t} \int_0^\infty e^{-A_s^k} F_s\,ds = E_x^{\mathcal{F}_t} \int_0^\infty e^{-A_s^k \circ \theta_t} F_{s+t}\,ds = E_x^{\mathcal{F}_t} \int_t^\infty e^{-A_u^k + A_t^k} F_u\,du. \tag{14}$$
Using (13) and (14) together with Fubini's theorem, we get
$$U_h (h - k) U_k f(x) = E_x \int_0^\infty e^{-A_t^h} (H_t - K_t)\,dt \int_t^\infty e^{-A_u^k + A_t^k} F_u\,du$$
$$= E_x \int_0^\infty e^{-A_u^k} F_u\,du \int_0^u e^{-A_t^h + A_t^k} (H_t - K_t)\,dt$$
$$= E_x \int_0^\infty e^{-A_u^k} F_u \big(1 - e^{-A_u^h + A_u^k}\big)\,du$$
$$= E_x \int_0^\infty \big(e^{-A_u^k} - e^{-A_u^h}\big) F_u\,du = U_k f(x) - U_h f(x).$$
A similar calculation gives the same expression for $U_k (h - k) U_h f(x)$. □
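The resolvent equation can be checked numerically in the simplest setting. The sketch below assumes a conservative two-state chain in continuous time with generator matrix $G$ (an illustrative stand-in, not a case treated in the text), and it relies on the standard Feynman-Kac identity $U_h f = (\operatorname{diag}(h) - G)^{-1} f$ for strictly positive $h$, which is taken as an assumption here. The names `potential` and `resolvent_gap` are hypothetical.

```python
def potential(G, h, f):
    """Solve (diag(h) - G) u = f on a two-state space; u stands in for U_h f.

    Uses the explicit inverse of a 2x2 matrix, so no linear algebra
    library is needed.
    """
    m00, m01 = h[0] - G[0][0], -G[0][1]
    m10, m11 = -G[1][0], h[1] - G[1][1]
    det = m00 * m11 - m01 * m10
    return [(m11 * f[0] - m01 * f[1]) / det,
            (m00 * f[1] - m10 * f[0]) / det]


def resolvent_gap(G, h, k, f):
    """Largest deviation from U_k f = U_h f + U_h (h - k) U_k f."""
    u_k = potential(G, k, f)
    u_h = potential(G, h, f)
    weighted = [(h[i] - k[i]) * u_k[i] for i in range(2)]
    correction = potential(G, h, weighted)
    return max(abs(u_k[i] - u_h[i] - correction[i]) for i in range(2))


# Assumed example data: rows of G sum to zero (conservative chain), h >= k > 0.
G = [[-1.0, 1.0], [2.0, -2.0]]
h = [3.0, 2.0]
k = [1.0, 0.5]
f = [1.0, 4.0]
gap = resolvent_gap(G, h, k, f)
u_h_of_h = potential(G, h, h)
```

In this example `gap` vanishes up to rounding, and `u_h_of_h` equals 1 in both states, which matches the bound $U_h h \le 1$ with equality, as one would expect for a recurrent chain.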
For a simple application of the resolvent equation, we show that any bounded potential function $U_h f$ is continuous.
Lemma 20.14 (boundedness and continuity) For any regular Feller process on $S$, let $f, h \ge 0$ be bounded, measurable functions on $S$ such that $U_h f$ is bounded. Then $U_h f$ is continuous.
Proof: Using Fatou's lemma and the continuity of $p_t(\cdot, y)$, we get for any time $t > 0$ and sequence $x_n \to x$ in $S$
$$\liminf_{n \to \infty} T_t f(x_n) = \liminf_{n \to \infty} \int p_t(x_n, y) f(y)\,\rho(dy) \ge \int p_t(x, y) f(y)\,\rho(dy) = T_t f(x).$$
If $f \le c$, the same relation applies to the function $T_t(c - f) = c - T_t f$, and by combination it follows that $T_t f$ is continuous. By dominated convergence, $U_a f$ is then continuous for every $a > 0$.
Now assume that $h \le a$. Applying the previous result to the bounded, measurable function $(a - h) U_h f \ge 0$, we conclude that even $U_a (a - h) U_h f$ is continuous. The continuity of $U_h f$ now follows from Lemma 20.13 with $h$ and $k$ replaced by $a$ and $h$. □
We proceed with some useful estimates.
Lemma 20.15 (lower bounds) For any regular Feller process on $S$, there exist some continuous functions $h, k: S \to (0, 1]$ such that, for every measurable function $f \ge 0$ on $S$,
(i) $U_2 f(x) \ge \rho(kf)\,h(x)$;
(ii) $U_h f(x) \ge \rho(kf)\,U_h h(x)$.
Proof: Fix any compact sets $K \subset S$ and $T \subset (0, \infty)$ with $\rho K > 0$ and $\lambda T > 0$. Define
$$u_a^T(x, y) = \int_T e^{-at} p_t(x, y)\,dt, \quad x, y \in S,$$
and note that for any measurable function $f \ge 0$ on $S$,
$$U_a f(x) \ge \int u_a^T(x, y) f(y)\,\rho(dy), \quad x \in S.$$
Using Lemma 20.13 for the constants 4 and 2 gives
$$U_2 f(x) = U_4 f(x) + 2 U_4 U_2 f(x) \ge 2 \int_K u_4^T(x, y)\,\rho(dy) \int u_2^T(y, z) f(z)\,\rho(dz),$$
and (i) follows with
$$h(x) = 2 \int_K u_4^T(x, y)\,\rho(dy) \wedge 1, \qquad k(x) = \inf_{y \in K} u_2^T(y, x) \wedge 1.$$
To deduce (ii), we may combine (i) with Lemma 20.13 for the functions 2 and $h$ to obtain
$$U_h f(x) \ge U_h (2 - h) U_2 f(x) \ge U_h U_2 f(x) \ge U_h h(x)\,\rho(kf).$$
The continuity of $h$ is clear by dominated convergence. For the same reason, the function $u_2^T$ is jointly continuous on $S^2$. Since $K$ is compact, the functions $u_2^T(y, \cdot)$, $y \in K$, are then equicontinuous on $S$, which yields the required continuity of $k$. Finally, the relation $h > 0$ is obvious, whereas $k > 0$ holds by the compactness of $K$. □
Fixing a function $h$ as in Lemma 20.15, we introduce the kernel
$$Q_x B = U_h (h 1_B)(x), \quad x \in S,\ B \in \mathcal{S}, \tag{15}$$
and note that $Q_x S = U_h h(x) \le 1$ by Lemma 20.13.
Lemma 20.16 (convergence dichotomy) Let $Q$ be given by (15) in terms of some function $h$ as in Lemma 20.15.
(i) If $U_h h \not\equiv 1$, then $\|Q^n S\| \le r^{n-1}$, $n \in \mathbb{N}$, for some $r \in (0, 1)$.
(ii) If $U_h h \equiv 1$, then $\|Q_x^n - \nu\| \to 0$, $x \in S$, for some $Q$-invariant probability measure $\nu \sim \rho$, and every $\sigma$-finite, $Q$-invariant measure on $S$ is proportional to $\nu$.
Proof: (i) Choose $k$ as in Lemma 20.15, fix any $a \in S$ with $U_h h(a) < 1$, and define
$$r = 1 - h(a)\,\rho(hk(1 - U_h h)).$$
Note that $\rho(hk(1 - U_h h)) > 0$ since $U_h h$ is continuous by Lemma 20.14. Using Lemma 20.15 (i), we obtain
$$0 < 1 - r \le h(a)\,\rho k \le U_2 1(a) = \tfrac12.$$
Next we see from Lemma 20.15 (ii) that
$$(1 - r) U_h h = h(a)\,\rho(hk(1 - U_h h))\,U_h h \le U_h h (1 - U_h h).$$
Hence,
$$Q^2 S = U_h h U_h h \le r U_h h = r Q S,$$
and so by iteration
$$Q^n S \le r^{n-1} Q S \le r^{n-1}, \quad n \in \mathbb{N}.$$
(ii) Introduce a measure $\tilde\rho = hk \cdot \rho$ on $S$. Since $U_h h \equiv 1$, we get by Lemma 20.15 (ii)
$$\tilde\rho B = \rho(hk 1_B) \le U_h (h 1_B)(x) = Q_x B, \quad B \in \mathcal{S}. \tag{16}$$
Regarding $\tilde\rho$ as a kernel, we have for any $x, y \in S$ and $m, n \in \mathbb{Z}_+$
$$(Q_x^m - Q_y^n)\,\tilde\rho^k = (Q_x^m S - Q_y^n S)\,\tilde\rho^k = 0. \tag{17}$$
Iterating (17) and using (16), we get as $n \to \infty$
$$\|Q_x^n - Q_y^{n+k}\| = \|(\delta_x - Q_y^k) Q^n\| = \|(\delta_x - Q_y^k)(Q - \tilde\rho)^n\| \le \|\delta_x - Q_y^k\|\,\sup\nolimits_z \|Q_z - \tilde\rho\|^n \le 2(1 - \tilde\rho S)^n \to 0.$$
Hence, $\sup_x \|Q_x^n - \nu\| \to 0$ for some set function $\nu$ on $S$, and it is easy to see that $\nu$ is a $Q$-invariant probability measure. By Fubini's theorem we note that $Q_x \ll \rho$ for all $x$, and so $\nu = \nu Q \ll \rho$. Conversely, Lemma 20.15 (ii) yields
$$\nu B = \nu Q(B) = \nu(U_h (h 1_B)) \ge \rho(hk 1_B),$$
which shows that even $\rho \ll \nu$.
Now consider any $\sigma$-finite, $Q$-invariant measure $\mu$ on $S$. By Fatou's lemma, we get for any $B \in \mathcal{S}$
$$\mu B = \liminf_{n \to \infty} \mu Q^n B \ge \mu \nu B = \mu S\,\nu B.$$
Choosing $B$ such that $\mu B < \infty$ and $\nu B > 0$, we obtain $\mu S < \infty$. We may then conclude by dominated convergence that $\mu = \mu Q^n \to \mu \nu = \mu S\,\nu$, which proves the asserted uniqueness of $\nu$. □
We are now ready to prove the basic recurrence dichotomy for regular Feller processes. Write $U = U_0$ and say that $X$ is uniformly transient if
$$\|U 1_K\| = \sup\nolimits_x E_x \int_0^\infty 1_K(X_t)\,dt < \infty, \quad K \subset S \text{ compact}.$$
Theorem 20.17 (recurrence dichotomy) A regular Feller process is either Harris recurrent or uniformly transient.
Proof: Choose $h$ and $k$ as in Lemma 20.15 and $Q$, $r$, and $\nu$ as in Lemma 20.16. First assume that $U_h h \not\equiv 1$. Letting $a \in (0, \|h\|]$, we note that $ah \le (h \wedge a)\|h\|$, and hence
$$a U_{h \wedge a} h \le U_{h \wedge a}(h \wedge a)\,\|h\| \le \|h\|. \tag{18}$$
Furthermore, Lemma 20.13 yields
$$U_{h \wedge a} h \le U_h h + U_h h U_{h \wedge a} h = Q(1 + U_{h \wedge a} h).$$
Iterating this relation and using Lemma 20.16 (i) and (18), we get
$$U_{h \wedge a} h \le \sum_{m \le n} Q^m 1 + Q^n U_{h \wedge a} h \le \sum_{m \le n} r^{m-1} + r^{n-1} \|U_{h \wedge a} h\| \le (1 - r)^{-1} + r^{n-1} \|h\| / a.$$
Letting $n \to \infty$ and then $a \to 0$, we conclude by dominated and monotone convergence that $Uh \le (1 - r)^{-1}$. Now fix any compact set $K \subset S$. Since $b = \inf_K h > 0$, we get
$$U 1_K(x) \le b^{-1} U h(x) \le b^{-1} (1 - r)^{-1} < \infty, \quad x \in S,$$
which shows that $X$ is uniformly transient.
Now assume instead that $U_h h \equiv 1$. Fix any measurable function $f$ on $S$ with $0 \le f \le h$ and $\rho f > 0$, and put $g = 1 - U_f f$. By Lemma 20.13 we get
$$g = 1 - U_f f = U_h h - U_h f - U_h (h - f) U_f f = U_h (h - f)(1 - U_f f) = U_h (h - f) g \le U_h h g = Q g. \tag{19}$$
Iterating this relation and using Lemma 20.16 (ii), we obtain $g \le Q^n g \to \nu g$, where $\nu \sim \rho$ is the unique $Q$-invariant distribution on $S$. Inserting this into (19) gives $g \le U_h (h - f)\,\nu g$, and so by Lemma 20.15 (ii)
$$\nu g \le \nu(U_h (h - f))\,\nu g \le (1 - \rho(kf))\,\nu g.$$
Since $\rho(kf) > 0$, we obtain $\nu g = 0$, and so $U_f f = 1 - g = 1$ a.e. $\nu \sim \rho$. Recalling that $U_f f$ is continuous by Lemma 20.14 and $\operatorname{supp} \rho = S$, we obtain $U_f f \equiv 1$. Taking expected values in (13), we conclude that $A_\infty^f = \infty$ a.s. $P_x$ for every $x \in S$. Now fix any compact set $K \subset S$ with $\rho K > 0$. Since $b = \inf_K h > 0$, we may choose $f = b 1_K$, and the desired Harris recurrence follows. □
A measure $\lambda$ on $S$ is said to be invariant for the semigroup $(T_t)$ if $\lambda(T_t f) = \lambda f$ for all $t > 0$ and every measurable function $f \ge 0$ on $S$. In the Harris recurrent case, the existence of an invariant measure $\lambda$ can be inferred from Lemma 20.16.
Theorem 20.18 (invariant measure, Harris, Watanabe) Any Harris recurrent Feller process on $S$ with supporting measure $\rho$ has a locally finite, invariant measure $\lambda \sim \rho$, and every $\sigma$-finite, invariant measure agrees with $\lambda$ up to a normalization.
To prepare for the proof, we first express the required invariance in terms of the resolvent operators.
Lemma 20.19 (invariance equivalence) Let $(T_t)$ be a Feller semigroup on $S$ with resolvent $(U_a)$, and fix any locally finite measure $\lambda$ on $S$ and constant $c > 0$. Then $\lambda$ is $(T_t)$-invariant iff it is $aU_a$-invariant for every $a \ge c$.
Proof: If $\lambda$ is $(T_t)$-invariant, then Fubini's theorem yields for any measurable function $f \ge 0$ and constant $a > 0$
$$\lambda(U_a f) = \int_0^\infty e^{-at} \lambda(T_t f)\,dt = \int_0^\infty e^{-at} \lambda f\,dt = \lambda f / a, \tag{20}$$
which shows that $\lambda$ is $aU_a$-invariant.
Conversely, assume that $\lambda$ is $aU_a$-invariant for every $a \ge c$. Then for any measurable function $f \ge 0$ on $S$ with $\lambda f < \infty$, the integrals in (20) agree for all $a \ge c$. Hence, by Theorem 5.3 the measures $\lambda(T_t f) e^{-ct}\,dt$ and $\lambda f\,e^{-ct}\,dt$ agree on $\mathbb{R}_+$, which implies $\lambda(T_t f) = \lambda f$ for almost every $t \ge 0$. By the semigroup property and Fubini's theorem we then obtain for any $t \ge 0$
$$\lambda(T_t f) = c \lambda U_c (T_t f) = c \lambda \int_0^\infty e^{-cs} T_s T_t f\,ds = c \int_0^\infty e^{-cs} \lambda(T_{s+t} f)\,ds = c \int_0^\infty e^{-cs} \lambda f\,ds = \lambda f,$$
which shows that $\lambda$ is $(T_t)$-invariant. □
Proof of Theorem 20.18: Let $h$, $Q$, and $\nu$ be such as in Lemmas 20.15 and 20.16, and put $\lambda = h^{-1} \cdot \nu$. Using the definition of $\lambda$ (twice), the $Q$-invariance of $\nu$ (three times), and Lemma 20.13, we get for any constant $a \ge \|h\|$ and bounded, measurable function $f \ge 0$ on $S$
$$a \lambda U_a f = a \nu(h^{-1} U_a f) = a \nu U_h U_a f = \nu(U_h f - U_a f + U_h h U_a f) = \nu U_h f = \nu(h^{-1} f) = \lambda f,$$
which shows that $\lambda$ is $aU_a$-invariant for every such $a$. By Lemma 20.19 it follows that $\lambda$ is also $(T_t)$-invariant.
To prove the asserted uniqueness, consider any $\sigma$-finite, $(T_t)$-invariant measure $\lambda'$ on $S$. By Lemma 20.19, $\lambda'$ is even $aU_a$-invariant for every $a \ge \|h\|$. Now define $\nu' = h \cdot \lambda'$. Letting $f \ge 0$ be bounded and measurable on $S$ and using Lemma 20.13, we get as before
$$\nu' U_h (hf) = \lambda'(h U_h (hf)) = a \lambda' U_a h U_h (hf) = a \lambda'(U_a (hf) - U_h (hf) + a U_a U_h (hf)) = a \lambda' U_a (hf) = \lambda'(hf) = \nu' f,$$
which shows that $\nu'$ is $Q$-invariant. Hence, the uniqueness part of Lemma 20.16 (ii) yields $\nu' = c\nu$ for some constant $c \ge 0$, which implies $\lambda' = c\lambda$. □
A Harris recurrent Feller process is said to be positive recurrent if the invariant measure $\lambda$ is bounded and null-recurrent otherwise. In the former case, we may assume that $\lambda$ is a probability measure on $S$. For any process $X$ in $S$, the divergence $X_t \to \infty$ a.s. or $X_t \xrightarrow{P} \infty$ means that $1_K(X_t) \to 0$ in the same sense for every compact set $K \subset S$.
Theorem 20.20 (distributional limits) For any regular Feller process $X$ and distribution $\mu$ on $S$, the following holds as $t \to \infty$:
(i) If $X$ is positive recurrent with invariant distribution $\lambda$ and $A \in \mathcal{F}_\infty$ with $P_\mu A > 0$, then $\|P_\mu^A \circ \theta_t^{-1} - P_\lambda\| \to 0$.
(ii) If $X$ is null-recurrent or transient, then $X_t \xrightarrow{P_\mu} \infty$.
Proof: (i) Since $P_\lambda \circ \theta_t^{-1} = P_\lambda$ by Lemma 8.11, the assertion follows from Theorem 20.12 together with properties (ii) and (iv) of Theorem 20.10.
(ii) (null-recurrent case): For any compact set $K \subset S$ and constant $\varepsilon > 0$, we define
$$B_t = \{x \in S;\ T_t 1_K(x) \ge \mu T_t 1_K - \varepsilon\}, \quad t > 0,$$
and note that, for any invariant measure $\lambda$,
$$(\mu T_t 1_K - \varepsilon)\,\lambda B_t \le \lambda(T_t 1_K) = \lambda K < \infty. \tag{21}$$
Since $\mu T_t 1_K - T_t 1_K(x) \to 0$ for all $x \in S$ by Theorem 20.12, we have $\liminf_t B_t = S$, and so $\lambda B_t \to \infty$ by Fatou's lemma. Hence, (21) yields $\limsup_t \mu T_t 1_K \le \varepsilon$, and since $\varepsilon$ was arbitrary, we obtain $P_\mu\{X_t \in K\} = \mu T_t 1_K \to 0$.
(ii) (transient case): Fix any compact set $K \subset S$ with $\rho K > 0$, and conclude from the uniform transience of $X$ that $U 1_K$ is bounded. Hence, by the Markov property at $t$ and dominated convergence,
$$E_\mu U 1_K(X_t) = E_\mu E_{X_t} \int_0^\infty 1_K(X_s)\,ds = E_\mu \int_t^\infty 1_K(X_s)\,ds \to 0,$$
which shows that $U 1_K(X_t) \xrightarrow{P_\mu} 0$. Since $U 1_K$ is strictly positive and also continuous by Lemma 20.14, we conclude that $X_t \xrightarrow{P_\mu} \infty$. □
We complete our discussion of regular Feller processes with a pathwise limit theorem. Recall that "almost surely" means a.s. $P_\mu$ for every initial distribution $\mu$ on $S$.
Theorem 20.21 (pathwise limits) For any regular Feller process $X$ on $S$, the following holds as $t \to \infty$:
(i) If $X$ is positive recurrent with invariant distribution $\lambda$, then
$$t^{-1} \int_0^t f(\theta_s X)\,ds \to E_\lambda f(X) \text{ a.s.}, \quad f \text{ bounded, measurable}.$$
(ii) If $X$ is null-recurrent, then
$$t^{-1} \int_0^t 1_K(X_s)\,ds \to 0 \text{ a.s.}, \quad K \subset S \text{ compact}.$$
(iii) If $X$ is transient, then $X_t \to \infty$ a.s.
Proof: (i) From Lemma 8.11 and Theorems 20.10 (i) and 20.12 we note that $P_\lambda$ is stationary and ergodic, and so the assertion holds a.s. $P_\lambda$ by Corollary 10.9. Since the stated convergence is a tail event and $P_\mu = P_\lambda$ on $\mathcal{T}$ for any $\mu$, the general result follows.
(ii) Since $P_\lambda$ is shift-invariant with $P_\lambda\{X_s \in K\} = \lambda K < \infty$, the left-hand side converges a.e. $P_\lambda$ by Theorem 20.2. From Theorems 20.10 and 20.12 we see that the limit is a.e. a constant $c \ge 0$. Using Fatou's lemma and Fubini's theorem gives
$$E_\lambda c \le \liminf_{t \to \infty} t^{-1} \int_0^t P_\lambda\{X_s \in K\}\,ds = \lambda K < \infty,$$
which implies $c = 0$ since $\|P_\lambda\| = \|\lambda\| = \infty$. The general result follows from the fact that $P_\mu = P_\nu$ on $\mathcal{T}$ for any distributions $\mu$ and $\nu$.
(iii) Fix any compact set $K \subset S$ with $\rho K > 0$, and conclude from the Markov property at $t > 0$ that a.s. $P_\mu$
$$U 1_K(X_t) = E_{X_t} \int_0^\infty 1_K(X_r)\,dr = E_\mu^{\mathcal{F}_t} \int_t^\infty 1_K(X_r)\,dr.$$
Using the chain rule for conditional expectations, we get for any $s \le t$
$$E_\mu[U 1_K(X_t) \mid \mathcal{F}_s] = E_\mu^{\mathcal{F}_s} \int_t^\infty 1_K(X_r)\,dr \le E_\mu^{\mathcal{F}_s} \int_s^\infty 1_K(X_r)\,dr = U 1_K(X_s),$$
which shows that $U 1_K(X_t)$ is a supermartingale. Since it is also nonnegative and right-continuous, it converges a.s. $P_\mu$ as $t \to \infty$, and the limit equals 0 a.s. since $U 1_K(X_t) \xrightarrow{P} 0$ by the preceding proof. Since $U 1_K$ is strictly positive and continuous, it follows that $X_t \to \infty$ a.s. $P_\mu$. □
Exercises
1. Given a measure space $(S, \mathcal{S}, \mu)$, let $T$ be a positive, linear operator on $L^1 \cap L^\infty$. Show that if $T$ is both an $L^1$-contraction and an $L^\infty$-contraction, then it is also an $L^p$-contraction for every $p \in [1, \infty]$. (Hint: Prove a Hölder-type inequality for $T$.)
2. Extend Lemma 10.3 to arbitrary transition operators $T$ on a measurable space $(S, \mathcal{S})$. In other words, letting $\mathcal{I}$ denote the class of sets $B \in \mathcal{S}$ with $T 1_B = 1_B$, show that an $\mathcal{S}$-measurable function $f \ge 0$ is $T$-invariant iff it is $\mathcal{I}$-measurable.
3. Prove a continuous-time version of Theorem 20.2 for measurable semigroups of positive $L^1$-$L^\infty$-contractions. (Hint: Interpolate in the discrete-time result.)
4. Let $(T_t)$ be a measurable, discrete- or continuous-time semigroup of positive $L^1$-$L^\infty$-contractions on $(S, \mathcal{S}, \nu)$, let $\mu_1, \mu_2, \dots$ be asymptotically invariant distributions on $\mathbb{Z}_+$ or $\mathbb{R}_+$, and define $A_n = \int T_t\,\mu_n(dt)$. Show that $A_n f \xrightarrow{\nu} A f$ for any $f \in L^1(\nu)$, where $\xrightarrow{\nu}$ denotes convergence in measure. (Hint: Proceed as in Theorem 20.2, using the contractivity together with Minkowski's and Chebyshev's inequalities to estimate the remainder terms.)
5. Prove a continuous-time version of Theorem 20.4. (Hint: Use Lemma 20.5 to interpolate in the discrete-time result.)
6. Derive Theorem 10.6 from Theorem 20.4. (Hint: Take $g = 1$, and proceed as in Corollary 10.9 to identify the limit.)
7. Show that when $f \ge 0$, the limit in Theorem 20.4 is strictly positive on the set $\{S_\infty f \wedge S_\infty g > 0\}$.
8. Show that the limit in Theorem 20.4 is invariant, at least when $T$ is induced by a measure-preserving map on $S$.
9. Derive Lemma 20.3 (i) from Lemma 20.6. (Hint: Note that if $g \in \mathcal{T}(f)$ with $f \in L^1_+$, then $\mu g \le \mu f$. Conclude that for any $h \in L^1$, $\mu[h;\ M_n h > 0] \ge \mu[U_{n-1} h;\ M_n h > 0] \ge 0$.)
10. Show that Brownian motion $X$ in $\mathbb{R}^d$ is regular and strongly ergodic for every $d \in \mathbb{N}$ with an invariant measure that is unique up to a constant factor. Also show that $X$ is Harris recurrent for $d = 1, 2$, uniformly transient for $d \ge 3$.
11. Let $X$ be a Markov process with associated space-time process $\tilde X$. Show that $X$ is strongly ergodic in the sense of Theorem 20.10 iff $\tilde X$ is weakly ergodic in the sense of Theorem 20.11. (Hint: Note that a function is space-time invariant for $X$ iff it is invariant for $\tilde X$.)
12. For a Harris recurrent process on $\mathbb{R}_+$ or $\mathbb{Z}_+$, every tail event is clearly a.s. invariant. Show by an example that the statement may fail in the transient case.
13. State and prove discrete-time versions of Theorems 20.12, 20.17, and 20.18. (Hint: The continuous-time arguments apply with obvious changes.)
14. Derive discrete-time versions of Theorems 20.17 and 20.18 from the corresponding continuous-time results.
15. Show that a regular Markov process may be weakly but not strongly ergodic. (Hint: For any strongly ergodic process, the associated space-time process has the stated property. For a less trivial example, consider a suitable supercritical branching process.)
16. Give examples of nonregular Markov processes with no invariant measure, with exactly one (up to a normalization), and with more than one.
17. Show that a discrete-time Markov process $X$ and the corresponding pseudo-Poisson process $Y$ have the same invariant measures. Furthermore, regularity of $X$ implies that $Y$ is regular, but not conversely.
Chapter 21
Stochastic Differential Equations
and Martingale Problems
Linear equations and Ornstein-Uhlenbeck processes; strong ex-
istence, uniqueness, and nonexplosion criteria; weak solutions
and local martingale problems; well-posedness and measurabil-
ity; pathwise uniqueness and functional solution; weak existence
and continuity; transformation of SDEs; strong Markov and
Feller properties
In this chapter we shall study classical stochastic differential equations
(SDEs) driven by a Brownian motion and clarify the connection with the
associated local martingale problems. Originally, the mentioned equations
were devised to provide a pathwise construction of diffusions and more gen-
eral continuous semimartingales. They have later turned out to be useful in
a wide range of applications, where they may provide models for a diversity
of dynamical systems with random perturbations. The coefficients deter-
mine a possibly time-dependent elliptic operator $A$ as in Theorem 19.24,
which suggests the associated martingale problem of finding a process $X$
such that the processes $M^f$ in Lemma 19.21 become martingales. This turns
out to be essentially equivalent to $X$ being a weak solution to the given
SDE, as will be seen from the fundamental Theorem 21.7.
The theory of SDEs utilizes the basic notions and ideas of stochastic
calculus, as developed in Chapters 17 and 18. Occasional references will be
made to other chapters, such as to Chapter 6 for conditional independence,
to Chapter 7 for martingale theory, to Chapter 16 for weak convergence,
and to Chapter 19 for Feller processes. Some further aspects of the theory
are displayed at the beginning of Chapter 23 as well as in Theorems 24.2,
26.8, and 27.14.
The SDEs studied in this chapter are typically of the form
$$dX_t^i = \sigma_j^i(t, X)\,dB_t^j + b^i(t, X)\,dt, \tag{1}$$
or more explicitly,
$$X_t^i = X_0^i + \sum_j \int_0^t \sigma_j^i(s, X)\,dB_s^j + \int_0^t b^i(s, X)\,ds, \quad t \ge 0. \tag{2}$$
Here $B = (B^1, \dots, B^r)$ is a Brownian motion in $\mathbb{R}^r$ with respect to some filtration $\mathcal{F}$, and the solution $X = (X^1, \dots, X^d)$ is a continuous $\mathcal{F}$-semimartingale in $\mathbb{R}^d$. Furthermore, the coefficients $\sigma$ and $b$ are progressive functions of suitable dimension, defined on the canonical path space $C(\mathbb{R}_+, \mathbb{R}^d)$ equipped with the induced filtration $\mathcal{G}_t = \sigma\{w_s;\ s \le t\}$, $t \ge 0$. For convenience, we shall often refer to (1) as equation $(\sigma, b)$.
For the integrals in (2) to exist in the sense of Itô and Lebesgue integration, $X$ must fulfill the integrability conditions
$$\int_0^t (|a^{ij}(s, X)| + |b^i(s, X)|)\,ds < \infty \text{ a.s.}, \quad t \ge 0, \tag{3}$$
where $a^{ij} = \sigma_k^i \sigma_k^j$ or $a = \sigma\sigma'$, and the bars denote any norms in the spaces of $d \times d$-matrices and $d$-vectors, respectively. For the existence and adaptedness of the right-hand side, it is also necessary that the integrands in (2) be progressive. This is ensured by the following result.
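Lemma 21.1 (progressive functions) Let the function $f$ on $\mathbb{R}_+ \times C(\mathbb{R}_+, \mathbb{R}^d)$ be progressive for the induced filtration $\mathcal{G}$ on $C(\mathbb{R}_+, \mathbb{R}^d)$, and let $X$ be a continuous, $\mathcal{F}$-adapted process in $\mathbb{R}^d$. Then the process $Y_t = f(t, X)$ is $\mathcal{F}$-progressive.
Proof: Fix any $t \ge 0$. Since $X$ is adapted, we note that $\pi_s(X) = X_s$ is $\mathcal{F}_t$-measurable for every $s \le t$, where $\pi_s(w) = w_s$ on $C(\mathbb{R}_+, \mathbb{R}^d)$. Since $\mathcal{G}_t = \sigma\{\pi_s;\ s \le t\}$, Lemma 1.4 shows that $X$ is $\mathcal{F}_t / \mathcal{G}_t$-measurable. Hence, by Lemma 1.8 the mapping $\varphi(s, \omega) = (s, X(\omega))$ is $\mathcal{B}_t \otimes \mathcal{F}_t / \mathcal{B}_t \otimes \mathcal{G}_t$-measurable from $[0, t] \times \Omega$ to $[0, t] \times C(\mathbb{R}_+, \mathbb{R}^d)$, where $\mathcal{B}_t = \mathcal{B}[0, t]$. Also note that $f$ is $\mathcal{B}_t \otimes \mathcal{G}_t$-measurable on $[0, t] \times C(\mathbb{R}_+, \mathbb{R}^d)$ since $f$ is progressive. By Lemma 1.7 we conclude that $Y = f \circ \varphi$ is $\mathcal{B}_t \otimes \mathcal{F}_t / \mathcal{B}$-measurable on $[0, t] \times \Omega$. □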
Equation (2) exhibits the solution process $X$ as an $\mathbb{R}^d$-valued semimartingale with drift components $b^i(X) \cdot \lambda$ and covariation processes $[X^i, X^j] = a^{ij}(X) \cdot \lambda$, where $a^{ij}(w) = a^{ij}(\cdot, w)$ and $b^i(w) = b^i(\cdot, w)$. It is natural to regard the densities $a(t, X)$ and $b(t, X)$ as local characteristics of $X$ at time $t$. Of special interest is the diffusion case, where $\sigma$ and $b$ have the form
$$\sigma(t, w) = \sigma(w_t), \quad b(t, w) = b(w_t), \quad t \ge 0,\ w \in C(\mathbb{R}_+, \mathbb{R}^d), \tag{4}$$
for some measurable functions on $\mathbb{R}^d$. In that case, the local characteristics at time $t$ depend only on the current position $X_t$ of the process, and the progressivity holds automatically.
We shall distinguish between strong and weak solutions to an SDE $(\sigma, b)$. For the former, the filtered probability space $(\Omega, \mathcal{F}, P)$ is regarded as given, along with an $\mathcal{F}$-Brownian motion $B$ and an $\mathcal{F}_0$-measurable random vector $\xi$. A strong solution is then defined as an adapted process $X$ with $X_0 = \xi$ a.s. satisfying (1). In case of a weak solution, only the initial distribution $\mu$ is given, and the solution consists of the triple $(\Omega, \mathcal{F}, P)$ together with an $\mathcal{F}$-Brownian motion $B$ and an adapted process $X$ with $P \circ X_0^{-1} = \mu$ satisfying (1).
This leads to different notions of existence and uniqueness for a given equation $(\sigma, b)$. Thus, weak existence is said to hold for the initial distribution $\mu$ if there is a corresponding weak solution $(\Omega, \mathcal{F}, P, B, X)$. By contrast, strong existence for the given $\mu$ means that there is a strong solution $X$ for every basic triple $(\mathcal{F}, B, \xi)$ such that $\xi$ has distribution $\mu$. We further say that uniqueness in law holds for the initial distribution $\mu$ if the corresponding weak solutions $X$ have the same distribution. Finally, we say that pathwise uniqueness holds for the initial distribution $\mu$ if, for any two solutions $X$ and $Y$ on a common filtered probability space with a given Brownian motion $B$ such that $X_0 = Y_0$ a.s. with distribution $\mu$, we have $X = Y$ a.s.
One of the simplest SDEs is the Langevin equation
$$dX_t = dB_t - X_t\,dt, \tag{5}$$
which is of great importance for both theory and applications. Integrating by parts, we get from (5) the equation
$$d(e^t X_t) = e^t\,dX_t + e^t X_t\,dt = e^t\,dB_t,$$
which admits the explicit solution
$$X_t = e^{-t} X_0 + \int_0^t e^{-(t-s)}\,dB_s, \quad t \ge 0, \tag{6}$$
recognized as an Ornstein-Uhlenbeck process. Conversely, the process in (6) is easily seen to satisfy (5). We further note that $\theta_t X \xrightarrow{d} Y$ as $t \to \infty$, where $Y$ denotes the stationary version of the process considered in Chapter 13. We can also get the stationary version directly from (6), by choosing $X_0$ to be $N(0, \tfrac12)$ and independent of $B$.
We turn to a more general class of equations that can be solved explicitly. A further extension appears in Theorem 26.8.
Proposition 21.2 (linear equations) Let $U$ and $V$ be continuous semimartingales, and put $Z = \exp(V - V_0 - \tfrac12 [V])$. Then the equation $dX = dU + X\,dV$ has the unique solution
$$X = Z\{X_0 + Z^{-1} \cdot (U - [U, V])\}. \tag{7}$$
Proof: Define $Y = X / Z$. Integrating by parts and noting that $dZ = Z\,dV$, we get
$$dU = dX - X\,dV = Y\,dZ + Z\,dY + d[Y, Z] - X\,dV = Z\,dY + d[Y, Z]. \tag{8}$$
In particular,
$$[U, V] = Z \cdot [Y, V] = [Y, Z]. \tag{9}$$
Substituting (9) into (8) yields $Z\,dY = dU - d[U, V]$, which implies $dY = Z^{-1}\,d(U - [U, V])$. To get (7), it remains to integrate from 0 to $t$ and note that $Y_0 = X_0$. Since all steps are reversible, the same argument shows that (7) is indeed a solution. □
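As a sanity check of formula (7), one may take $U = B$ and $V_t = -t$, so that $[V] = [U, V] = 0$, $Z_t = e^{-t}$, and the equation $dX = dU + X\,dV$ reduces to the Langevin equation $dX_t = dB_t - X_t\,dt$. The sketch below is only an illustration under these assumptions: it compares an Euler scheme for the equation with a left-point Riemann sum for the stochastic integral in (7), both driven by the same simulated Brownian increments. The function name, step count, and seed are arbitrary choices.

```python
import math
import random


def compare_euler_with_formula(x0=1.0, t_max=1.0, n=1000, seed=0):
    """Return (euler, formula), two approximations of X at time t_max.

    Both values are built from the same Brownian increments, so they
    should agree up to discretization error.
    """
    rng = random.Random(seed)
    dt = t_max / n
    euler = x0
    integral = 0.0  # left-point sum for (Z^{-1} . U)_t, the integral of e^s dB_s
    for step in range(n):
        d_b = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over one step
        integral += math.exp(step * dt) * d_b
        euler += d_b - euler * dt  # Euler step for dX = dB - X dt
    formula = math.exp(-t_max) * (x0 + integral)  # Z_t (X_0 + (Z^{-1} . U)_t)
    return euler, formula


euler_value, formula_value = compare_euler_with_formula()
```

With a step size of $10^{-3}$ the two values typically differ in the third or fourth decimal place; the gap shrinks as `n` grows.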
Though most SDEs have no explicit solution, we may still derive general conditions for strong existence, pathwise uniqueness, and continuous dependence on the initial conditions, by imitating the classical Picard iteration for ordinary differential equations. Recall that the relation $\lesssim$ denotes inequality up to a constant factor.
Theorem 21.3 (strong solutions and stochastic flows, Itô) Let $\sigma$ and $b$ be bounded, progressive functions satisfying a Lipschitz condition
$$(\sigma(w) - \sigma(w'))_t^* + (b(w) - b(w'))_t^* \lesssim (w - w')_t^*, \quad t \ge 0, \tag{10}$$
and fix a Brownian motion $B$ in $\mathbb{R}^r$ with associated complete filtration $\mathcal{F}$. Then there exists a jointly continuous process $X = (X_t^x)$ on $\mathbb{R}_+ \times \mathbb{R}^d$ such that, for any $\mathcal{F}_0$-measurable random vector $\xi$ in $\mathbb{R}^d$, equation $(\sigma, b)$ has the a.s. unique solution $X_t^\xi$ starting at $\xi$.
For one-dimensional diffusion equations, a stronger result is established in Theorem 23.3. The solution process $X = (X_t^x)$ on $\mathbb{R}_+ \times \mathbb{R}^d$ is called the stochastic flow generated by $B$. Our proof is based on two lemmas, and we begin with an elementary estimate.
Lemma 21.4 (Gronwall) Let $f$ be a continuous function on $\mathbb{R}_+$ such that
$$f(t) \le a + b \int_0^t f(s)\,ds, \quad t \ge 0, \tag{11}$$
for some $a, b \ge 0$. Then $f(t) \le a e^{bt}$ for all $t \ge 0$.
Proof: We may write (11) as
$$\frac{d}{dt} \Big\{ e^{-bt} \int_0^t f(s)\,ds \Big\} \le a e^{-bt}, \quad t \ge 0.$$
It remains to integrate over $[0, t]$ and combine with (11). □
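A discrete analogue makes the bound in Lemma 21.4 concrete. The sketch below, an illustration rather than part of the proof, replaces the integral in (11) by a left-point Riemann sum and takes equality, which gives $f_k = a(1 + b\,\Delta t)^k$. Since $1 + x \le e^x$, these values stay below $a e^{bt}$ and approach it as the mesh is refined. The function name and the parameter values are hypothetical.

```python
import math


def riemann_extremal(a, b, t_max, n):
    """Iterate f_k = a + b * dt * (f_0 + ... + f_{k-1}) on a grid of n steps.

    Returns a list of pairs (t_k, f_k); this is the equality case of the
    Riemann-sum version of inequality (11).
    """
    dt = t_max / n
    values = []
    running = 0.0  # dt times the sum of the earlier values
    for step in range(n + 1):
        f_k = a + b * running
        values.append((step * dt, f_k))
        running += f_k * dt
    return values


# Assumed example parameters.
A_CONST, B_CONST, T_MAX = 2.0, 1.5, 2.0
grid = riemann_extremal(A_CONST, B_CONST, T_MAX, 10000)
final_ratio = grid[-1][1] / (A_CONST * math.exp(B_CONST * T_MAX))
```

Here `final_ratio` is slightly below 1, reflecting that the Gronwall bound is attained in the continuous limit by $f(t) = a e^{bt}$.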
To state the next result, let $S(X)$ denote the process defined by the right-hand side of (2).
Lemma 21.5 (local contraction) Let $\sigma$ and $b$ be bounded, progressive functions satisfying (10), and fix any $p \ge 2$. Then there exists a nondecreasing function $c \ge 0$ on $\mathbb{R}_+$ such that, for any continuous adapted processes $X$ and $Y$ in $\mathbb{R}^d$,
$$E(S(X) - S(Y))_t^{*p} \le 2E|X_0 - Y_0|^p + c_t \int_0^t E(X - Y)_s^{*p}\,ds, \quad t \ge 0.$$
Proof: By Theorem 17.7, condition (10), and Jensen's inequality,
$$E(S(X) - S(Y))_t^{*p} - 2E|X_0 - Y_0|^p \lesssim E((\sigma(X) - \sigma(Y)) \cdot B)_t^{*p} + E((b(X) - b(Y)) \cdot \lambda)_t^{*p}$$
$$\lesssim E(|\sigma(X) - \sigma(Y)|^2 \cdot \lambda)_t^{p/2} + E(|b(X) - b(Y)| \cdot \lambda)_t^{p}$$
$$\lesssim E \Big| \int_0^t (X - Y)_s^{*2}\,ds \Big|^{p/2} + E \Big| \int_0^t (X - Y)_s^{*}\,ds \Big|^{p}$$
$$\le (t^{p/2 - 1} + t^{p - 1}) \int_0^t E(X - Y)_s^{*p}\,ds. \qquad \Box$$
Proof of Theorem 21.3: To prove the existence, fix any $\mathcal{F}_0$-measurable
random vector $\xi$ in $\mathbb{R}^d$, put $X_t^0 \equiv \xi$, and define recursively $X^n = S(X^{n-1})$
for $n \ge 1$. Since $\sigma$ and $b$ are bounded, we have $E(X^1 - X^0)_t^{*2} < \infty$, and
by Lemma 21.5
$$E(X^{n+1} - X^n)_t^{*2} \le c_t \int_0^t E(X^n - X^{n-1})_s^{*2}\,ds, \quad t \ge 0,\ n \ge 1.$$
Hence, by induction,
$$E(X^{n+1} - X^n)_t^{*2} \le \frac{c_t^n t^n}{n!}\, E(X^1 - \xi)_t^{*2} < \infty, \quad t, n \ge 0.$$
For any $k \in \mathbb{N}$, we get
$$\big\| \sup_{n \ge k} (X^n - X^k)_t^* \big\|_2 \le \sum_{n \ge k} \big\| (X^{n+1} - X^n)_t^* \big\|_2 \le \big\| (X^1 - \xi)_t^* \big\|_2 \sum_{n \ge k} (c_t^n t^n / n!)^{1/2} < \infty.$$
Thus, by Lemma 4.6 there exists a continuous adapted process $X$ with
$X_0 = \xi$ such that $(X^n - X)_t^* \to 0$ a.s. and in $L^2$ for each $t \ge 0$. To see that
$X$ solves equation $(\sigma, b)$, we may use Lemma 21.5 to obtain
$$E(X^n - S(X))_t^{*2} \le c_t \int_0^t E(X^{n-1} - X)_s^{*2}\,ds, \quad t \ge 0.$$
As $n \to \infty$, we get $E(X - S(X))_t^{*2} = 0$ for all $t$, which implies $X = S(X)$
a.s.
Now consider any two solutions $X$ and $Y$ with $|X_0 - Y_0| \le \varepsilon$ a.s. By
Lemma 21.5 we get for any $p \ge 2$
$$E(X - Y)_t^{*p} \le 2\varepsilon^p + c_t \int_0^t E(X - Y)_s^{*p}\,ds, \quad t \ge 0,$$
and by Lemma 21.4 it follows that
$$E(X - Y)_t^{*p} \le 2\varepsilon^p e^{c_t t}, \quad t \ge 0. \qquad (12)$$
If $X_0 = Y_0$ a.s., we may take $\varepsilon = 0$ and conclude that $X = Y$ a.s., which
proves the asserted uniqueness. Letting $X^x$ denote the solution $X$ with
$X_0 = x$ a.s., we get by (12)
$$E|X^x - X^y|_t^{*p} \le 2|x - y|^p e^{c_t t}, \quad t \ge 0.$$
Taking $p > d$ and applying Theorem 3.23 for each $T > 0$ with the metric
$\rho_T(f, g) = (f - g)_T^*$, we conclude that the process $(X_t^x)$ has a jointly
continuous version on $\mathbb{R}_+ \times \mathbb{R}^d$.
From the construction we note that if $X$ and $Y$ are solutions with $X_0 = \xi$
and $Y_0 = \eta$ a.s., then $X = Y$ a.s. on the set $\{\xi = \eta\}$. In particular, $X = X^\xi$
a.s. when $\xi$ takes countably many values. In general, we may approximate $\xi$
uniformly by random vectors $\xi_1, \xi_2, \ldots$ in $\mathbb{Q}^d$, and by (12) we get $X_t^{\xi_n} \to X_t$
in $L^2$ for all $t \ge 0$. Since also $X_t^{\xi_n} \to X_t^\xi$ a.s. by the continuity of the flow,
it follows that $X_t = X_t^\xi$ a.s. $\Box$
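The idea of a stochastic flow, where solutions from different starting points are driven by the same Brownian motion, can be sketched numerically. The example below is only an illustration under stated assumptions: it uses an Euler scheme (not the Picard iteration of the proof), the hypothetical coefficients $\sigma \equiv 1$ and $b(x) = -x$, and an invented helper name `euler_flow`.

```python
import math
import random

def euler_flow(starts, t_max=1.0, n=1000, seed=0):
    """Euler scheme for dX = -X dt + dB, run from several starting points
    that all share one sequence of Brownian increments."""
    rng = random.Random(seed)
    dt = t_max / n
    xs = list(starts)
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))   # one increment, used by every start
        xs = [x + (-x) * dt + dB for x in xs]
    return xs

x_end, y_end = euler_flow([0.0, 0.5])
```

Because the noise is additive and shared, the difference of the two discretized paths is deterministic, namely $(y - x)(1 - dt)^n$, which is a simple instance of the Lipschitz-type dependence on the initial value expressed by (12).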
It is often useful to allow the solutions to explode. As in Chapter 19, we
may then introduce an absorbing state $\Delta$ at infinity, so that the path space
becomes $C(\mathbb{R}_+, \overline{\mathbb{R}^d})$ with $\overline{\mathbb{R}^d} = \mathbb{R}^d \cup \{\Delta\}$. Define $\zeta_n = \inf\{t;\ |X_t| \ge n\}$ for
each $n$, put $\zeta = \sup_n \zeta_n$, and let $X_t = \Delta$ for $t \ge \zeta$. Given a Brownian
motion $B$ in $\mathbb{R}^r$ and an adapted process $X$ in the extended path space, we
say that $X$ or the pair $(X, B)$ solves equation $(\sigma, b)$ on the interval $[0, \zeta)$ if
$$X_{t \wedge \zeta_n} = X_0 + \int_0^{t \wedge \zeta_n} \sigma(s, X)\,dB_s + \int_0^{t \wedge \zeta_n} b(s, X)\,ds, \quad t \ge 0,\ n \in \mathbb{N}. \qquad (13)$$
When $\zeta < \infty$, we have $|X_{\zeta_n}| \to \infty$, and $X$ is said to explode at time $\zeta$.
Conditions for the existence and uniqueness of possibly exploding solu-
tions may be obtained from Theorem 21.3 by suitable localization. The
following result is then useful to decide whether explosion can actually
occur.
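Proposition 21.6 (explosion) The solutions to equation $(\sigma, b)$ are a.s.
nonexploding if
$$\sigma(x)_t^* + b(x)_t^* \lesssim 1 + x_t^*, \quad t \ge 0. \qquad (14)$$
Proof: By Proposition 17.15 we may assume that $X_0$ is bounded. From
(13) and (14) we get for suitable constants $c_t < \infty$
$$E X_{t \wedge \zeta_n}^{*2} \le 2E|X_0|^2 + c_t \int_0^t (1 + E X_{s \wedge \zeta_n}^{*2})\,ds, \quad t \ge 0,\ n \in \mathbb{N},$$
and so by Lemma 21.4
$$1 + E X_{t \wedge \zeta_n}^{*2} \le (1 + 2E|X_0|^2) \exp(c_t t) < \infty, \quad t \ge 0,\ n \in \mathbb{N}.$$
As $n \to \infty$, we obtain $E X_{t \wedge \zeta}^{*2} < \infty$, which implies $\zeta > t$ a.s. $\Box$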
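To see why a growth condition such as (14) matters, consider the degenerate case $\sigma = 0$, where equation $(\sigma, b)$ reduces to an ordinary differential equation and everything is deterministic. The sketch below, with an invented helper `blowup_time` and arbitrarily chosen step size and threshold, compares the drift $b(x) = x^2$, which violates (14) and explodes at time $1/x_0$, with the drift $b(x) = x$, which satisfies (14).

```python
def blowup_time(drift, x0=1.0, dt=1e-4, level=1e6, t_max=5.0):
    """Euler steps for dx = drift(x) dt; return the first grid time at which
    |x| >= level, or None if the level is not reached before t_max."""
    x, t = x0, 0.0
    while t < t_max:
        if abs(x) >= level:
            return t
        x += drift(x) * dt
        t += dt
    return None

t_quadratic = blowup_time(lambda x: x * x)   # superlinear drift: violates (14)
t_linear = blowup_time(lambda x: x)          # linear growth: satisfies (14)
```

With $x_0 = 1$ the quadratic drift reaches the threshold close to the exact explosion time $1$, while the linear drift stays far below it on the whole interval $[0, 5]$.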
Our next aim is to characterize weak solutions to equation $(\sigma, b)$ by a
martingale property that involves only the solution $X$. Then define
$$M_t^f = f(X_t) - f(X_0) - \int_0^t A_s f(X)\,ds, \quad t \ge 0,\ f \in C_K^\infty, \qquad (15)$$
where the operators $A_s$ are given by
$$A_s f(x) = \tfrac{1}{2} a^{ij}(s, x) f''_{ij}(x_s) + b^i(s, x) f'_i(x_s), \quad s \ge 0,\ f \in C_K^\infty. \qquad (16)$$
In the diffusion case we may replace the integrand $A_s f(X)$ in (15) by the
expression $A f(X_s)$, where $A$ denotes the elliptic operator
$$A f(x) = \tfrac{1}{2} a^{ij}(x) f''_{ij}(x) + b^i(x) f'_i(x), \quad f \in C_K^\infty,\ x \in \mathbb{R}^d. \qquad (17)$$
A continuous process $X$ in $\mathbb{R}^d$ or its distribution $P$ is said to solve the
local martingale problem for $(a, b)$ if $M^f$ is a local martingale for every
$f \in C_K^\infty$. When $a$ and $b$ are bounded, it is clearly equivalent for $M^f$ to be a
true martingale, and the original problem turns into a martingale problem.
The (local) martingale problem for $(a, b)$ with initial distribution $\mu$ is said
to be well posed if it has exactly one solution $P_\mu$. For degenerate initial
distributions $\delta_x$, we may write $P_x$ instead of $P_{\delta_x}$. The next result gives the
basic equivalence between weak solutions to an SDE and solutions to the
associated local martingale problem.
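The compensation in (15) has an exact discrete-time counterpart that can be checked by brute force. The sketch below is an analogy, not the continuous setting of the text: it assumes a simple random walk $S$ in place of $X$, replaces the operator in (17) by the one-step generator $Af(x) = \tfrac12 (f(x+1) + f(x-1)) - f(x)$, and verifies $E M_n = 0$ by averaging over all paths. The names `generator` and `expected_M` are invented for the example.

```python
from fractions import Fraction
from itertools import product

def generator(f, x):
    """One-step generator of a simple random walk: mean change of f from x."""
    return Fraction(f(x + 1) + f(x - 1), 2) - f(x)

def expected_M(f, n_steps, x0=0):
    """Exact E[M_n] for M_n = f(S_n) - f(S_0) - sum_{k<n} A f(S_k),
    obtained by averaging over all 2**n_steps equally likely paths."""
    total = Fraction(0)
    for steps in product((-1, 1), repeat=n_steps):
        s, compensator = x0, Fraction(0)
        for step in steps:
            compensator += generator(f, s)
            s += step
        total += f(s) - f(x0) - compensator
    return total / 2 ** n_steps

mean_M = expected_M(lambda x: x ** 3 + x ** 2, 8)
```

Exact rational arithmetic is used so that the mean is exactly zero rather than zero up to rounding; the test function is an integer polynomial chosen arbitrarily.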
Theorem 21.7 (weak solutions and martingale problems, Stroock and
Varadhan) Let $\sigma$ and $b$ be progressive, and fix any probability measure $P$
on $C(\mathbb{R}_+, \mathbb{R}^d)$. Then equation $(\sigma, b)$ has a weak solution with distribution
$P$ iff $P$ solves the local martingale problem for $(\sigma\sigma', b)$.
Proof: Write $a = \sigma\sigma'$. If $(X, B)$ solves equation $(\sigma, b)$, then
$$[X^i, X^j] = [\sigma_k^i(X) \cdot B^k,\ \sigma_l^j(X) \cdot B^l] = \sigma_k^i \sigma_l^j(X) \cdot [B^k, B^l] = a^{ij}(X) \cdot \lambda.$$
By Itô's formula we get for any $f \in C_K^\infty$
$$df(X_t) = f'_i(X_t)\,dX_t^i + \tfrac{1}{2} f''_{ij}(X_t)\,d[X^i, X^j]_t = f'_i(X_t)\,\sigma_j^i(t, X)\,dB_t^j + A_t f(X)\,dt.$$
Hence, $dM_t^f = f'_i(X_t)\,\sigma_j^i(t, X)\,dB_t^j$, and so $M^f$ is a local martingale.
Conversely, assume that $X$ solves the local martingale problem for $(a, b)$.
Considering functions $f_n^i \in C_K^\infty$ with $f_n^i(x) = x^i$ for $|x| \le n$, it is clear by
a localization argument that the processes
$$M_t^i = X_t^i - X_0^i - \int_0^t b^i(s, X)\,ds, \quad t \ge 0, \qquad (18)$$
are continuous local martingales. Similarly, we may choose $f_n^{ij} \in C_K^\infty$ with
$f_n^{ij}(x) = x^i x^j$ for $|x| \le n$, to obtain the local martingales
$$M^{ij} = X^i X^j - X_0^i X_0^j - (X^i \beta^j + X^j \beta^i + \alpha^{ij}) \cdot \lambda,$$
where $\alpha^{ij} = a^{ij}(X)$ and $\beta^i = b^i(X)$. Integrating by parts and using (18),
we get
$$M^{ij} = X^i \cdot X^j + X^j \cdot X^i + [X^i, X^j] - (X^i \beta^j + X^j \beta^i + \alpha^{ij}) \cdot \lambda$$
$$= X^i \cdot M^j + X^j \cdot M^i + [M^i, M^j] - \alpha^{ij} \cdot \lambda.$$
The last two terms on the right then form a local martingale, and so by
Proposition 17.2
$$[M^i, M^j]_t = \int_0^t a^{ij}(s, X)\,ds, \quad t \ge 0.$$
Hence, by Theorem 18.12 there exists a Brownian motion $B$ with respect
to a standard extension of the original filtration such that
$$M_t^i = \int_0^t \sigma_k^i(s, X)\,dB_s^k, \quad t \ge 0.$$
Substituting this into (18) yields (2), which means that the pair $(X, B)$
solves equation $(\sigma, b)$. $\Box$
For subsequent needs, we note that the previous construction can be
made measurable in the following sense.
Lemma 21.8 (functional representation) Let $\sigma$ and $b$ be progressive.
Then there exists a measurable mapping
$$F: \mathcal{P}(C(\mathbb{R}_+, \mathbb{R}^d)) \times C(\mathbb{R}_+, \mathbb{R}^d) \times [0, 1] \to C(\mathbb{R}_+, \mathbb{R}^r),$$
such that, if the local martingale problem for $(\sigma\sigma', b)$ admits a solution
$X$ with distribution $P$ and if $\vartheta \perp\!\!\!\perp X$ is $U(0, 1)$, then $B = F(P, X, \vartheta)$ is a
Brownian motion in $\mathbb{R}^r$ and the pair $(X, B)$ with induced filtration solves
equation $(\sigma, b)$.
Proof: In the previous construction of $B$, the only nonelementary step is
the stochastic integration with respect to $(X, Y)$ in Theorem 18.12, where
$Y$ is an independent Brownian motion, and the integrand is a progressive
function of $X$ obtained by some elementary matrix algebra. Since the pair
$(X, Y)$ is again a solution to a local martingale problem, Proposition 17.26
yields the desired functional representation. $\Box$
Combining the martingale formulation with a compactness argument, we
may deduce some general existence and continuity results.
Theorem 21.9 (weak existence and continuity, Skorohod) Let $a$ and $b$ be
bounded, progressive functions such that, for any fixed $t \ge 0$, the functions
$a(t, \cdot)$ and $b(t, \cdot)$ are continuous on $C(\mathbb{R}_+, \mathbb{R}^d)$. Then the martingale problem
for $(a, b)$ has a solution $P_\mu$ for every initial distribution $\mu$. If those solutions
are unique, then the mapping $\mu \mapsto P_\mu$ is weakly continuous.
Proof: For any $\varepsilon > 0$, $t \ge 0$, and $x \in C(\mathbb{R}_+, \mathbb{R}^d)$, define
$$\sigma_\varepsilon(t, x) = \sigma((t - \varepsilon)_+, x), \qquad b_\varepsilon(t, x) = b((t - \varepsilon)_+, x),$$
and let $a_\varepsilon = \sigma_\varepsilon \sigma_\varepsilon'$. Since $\sigma$ and $b$ are progressive, the processes $\sigma_\varepsilon(s, X)$
and $b_\varepsilon(s, X)$, $s \le t$, are measurable functions of $X$ on $[0, (t - \varepsilon)_+]$. Hence,
a strong solution $X^\varepsilon$ to equation $(\sigma_\varepsilon, b_\varepsilon)$ may be constructed recursively
on the intervals $[(n - 1)\varepsilon, n\varepsilon]$, $n \in \mathbb{N}$, starting from an arbitrary random
vector $\xi \perp\!\!\!\perp B$ in $\mathbb{R}^d$ with distribution $\mu$. Note in particular that $X^\varepsilon$ solves
the martingale problem for the pair $(a_\varepsilon, b_\varepsilon)$.
Applying Theorem 17.7 to equation $(\sigma_\varepsilon, b_\varepsilon)$ and using the boundedness
of $\sigma$ and $b$, we get for any $p > 0$
$$E \sup_{0 \le r \le h} |X_{t+r}^\varepsilon - X_t^\varepsilon|^p \lesssim h^{p/2} + h^p \lesssim h^{p/2}, \quad t, \varepsilon \ge 0,\ h \in [0, 1].$$
For $p > 2d$ it follows by Corollary 16.9 that the family $\{X^\varepsilon\}$ is tight in
$C(\mathbb{R}_+, \mathbb{R}^d)$, and by Theorem 16.3 we may then choose some $\varepsilon_n \to 0$ such
that $X^{\varepsilon_n} \overset{d}{\to} X$ for a suitable $X$.
To see that $X$ solves the martingale problem for $(a, b)$, let $f \in C_K^\infty$
and $s < t$ be arbitrary, and consider any bounded, continuous function
$g: C([0, s], \mathbb{R}^d) \to \mathbb{R}$. We need to show that
$$E\Big\{ f(X_t) - f(X_s) - \int_s^t A_r f(X)\,dr \Big\}\, g(X) = 0.$$
Then note that $X^\varepsilon$ satisfies the corresponding equation for the operators $A_r^\varepsilon$
constructed from the pair $(a_\varepsilon, b_\varepsilon)$. Writing the two conditions as $E\varphi(X) = 0$
and $E\varphi_\varepsilon(X^\varepsilon) = 0$, respectively, it suffices by Theorem 4.27 to show that
$\varphi_\varepsilon(x_\varepsilon) \to \varphi(x)$ whenever $x_\varepsilon \to x$ in $C(\mathbb{R}_+, \mathbb{R}^d)$. This follows easily from
the continuity conditions imposed on $a$ and $b$.
Now assume that the solutions $P_\mu$ are unique, and let $\mu_n \overset{w}{\to} \mu$. Arguing as
before, we see that $(P_{\mu_n})$ is tight, and so by Theorem 16.3 it is also relatively
compact. If $P_{\mu_n} \overset{w}{\to} Q$ along some subsequence, then as before we note that
$Q$ solves the martingale problem for $(a, b)$ with initial distribution $\mu$. Hence
$Q = P_\mu$, and the convergence extends to the original sequence. $\Box$
Our next aim is to show how the well-posedness of the local martingale
problem for (a, b) extends from degenerate to arbitrary initial distributions.
This requires a basic measurability property, which will also be needed
later.
Theorem 21.10 (measurability and mixtures, Stroock and Varadhan) Let
$a$ and $b$ be progressive and such that, for any $x \in \mathbb{R}^d$, the local martingale
problem for $(a, b)$ with initial distribution $\delta_x$ has a unique solution $P_x$. Then
$(P_x)$ is a kernel from $\mathbb{R}^d$ to $C(\mathbb{R}_+, \mathbb{R}^d)$, and for every initial distribution
$\mu$, the associated local martingale problem has the unique solution $P_\mu = \int P_x\,\mu(dx)$.
Proof: According to the proof of Theorem 21.7, it is enough to formulate
the local martingale problem in terms of functions $f$ belonging to
some countable subclass $\mathcal{C} \subset C_K^\infty$, consisting of suitably truncated versions
of the coordinate functions $x^i$ and their products $x^i x^j$. Now define
$\mathcal{P} = \mathcal{P}(C(\mathbb{R}_+, \mathbb{R}^d))$ and $\mathcal{P}_M = \{P_x;\ x \in \mathbb{R}^d\}$, and write $X$ for the canonical
process in $C(\mathbb{R}_+, \mathbb{R}^d)$. Let $\mathcal{D}$ denote the class of measures $P \in \mathcal{P}$
with degenerate projections $P \circ X_0^{-1}$. Next let $\mathcal{I}$ consist of all measures
$P \in \mathcal{P}$ such that $X$ satisfies the integrability condition (3). Finally, put
$\tau_n^f = \inf\{t;\ |M_t^f| \ge n\}$, and let $\mathcal{L}$ be the class of measures $P \in \mathcal{P}$ such that
the processes $M_t^{f,n} = M^f(t \wedge \tau_n^f)$ exist and are martingales under $P$ for all
$f \in \mathcal{C}$ and $n \in \mathbb{N}$. Then clearly $\mathcal{P}_M = \mathcal{D} \cap \mathcal{I} \cap \mathcal{L}$.
To prove the asserted kernel property, it is enough to show that $\mathcal{P}_M$ is
a measurable subset of $\mathcal{P}$, since the desired measurability will then follow
by Theorem A1.3 and Lemma 1.40. The measurability of $\mathcal{D}$ is clear from
Lemma 1.39 (i). Even $\mathcal{I}$ is measurable, since the integrals on the left of (3)
are measurable by Fubini's theorem. Finally, $\mathcal{L} \cap \mathcal{I}$ is a measurable subset
of $\mathcal{I}$, since the defining condition is equivalent to countably many relations
of the form $E[M_t^{f,n} - M_s^{f,n};\ F] = 0$, with $f \in \mathcal{C}$, $n \in \mathbb{N}$, $s < t$ in $\mathbb{Q}_+$, and
$F \in \mathcal{F}_s$.
Now fix any probability measure $\mu$ on $\mathbb{R}^d$. The measure $P_\mu = \int P_x\,\mu(dx)$
clearly has initial distribution $\mu$, and from the previous argument we note
that $P_\mu$ again solves the local martingale problem for $(a, b)$. To prove the uniqueness,
let $P$ be any measure with the stated properties. Then $E[M_t^{f,n} - M_s^{f,n};\ F \mid X_0] = 0$
a.s. for all $f$, $n$, $s < t$, and $F$ as above, and so $P[\,\cdot \mid X_0]$ is a.s.
a solution to the local martingale problem with initial distribution $\delta_{X_0}$.
Thus, $P[\,\cdot \mid X_0] = P_{X_0}$ a.s., and we get $P = E P_{X_0} = \int P_x\,\mu(dx) = P_\mu$. This
extends the well-posedness to arbitrary initial distributions. $\Box$
We return to the basic problem of constructing a Feller diffusion with
given generator A in (17) as the solution to a suitable SDE or the associated
martingale problem. The following result may be regarded as a converse to
Theorem 19.24.
Theorem 21.11 (strong Markov and Feller properties, Stroock and Varadhan)
Let $a$ and $b$ be measurable functions on $\mathbb{R}^d$ such that, for any
$x \in \mathbb{R}^d$, the local martingale problem for $(a, b)$ with initial distribution $\delta_x$
has a unique solution $P_x$. Then the family $(P_x)$ satisfies the strong Markov
property. If $a$ and $b$ are also bounded and continuous, then the equation
$T_t f(x) = E_x f(X_t)$ defines a Feller semigroup on $C_0$, and the operator $A$
in (17) extends uniquely to the associated generator.
Proof: By Theorem 21.10 it remains to prove that, for any state $x \in \mathbb{R}^d$
and bounded optional time $\tau$,
$$P_x[X \circ \theta_\tau \in \cdot \mid \mathcal{F}_\tau] = P_{X_\tau} \quad \text{a.s.}$$
As in the previous proof, this is equivalent to countably many relations of
the form
$$E_x[\{(M_t^{f,n} - M_s^{f,n}) 1_F\} \circ \theta_\tau \mid \mathcal{F}_\tau] = 0 \quad \text{a.s.} \qquad (19)$$
with $s < t$ and $F \in \mathcal{F}_s$, where $M^{f,n}$ denotes the process $M^f$ stopped at
$\tau_n = \inf\{t;\ |M_t^f| \ge n\}$. Now $\theta_\tau^{-1} \mathcal{F}_s \subset \mathcal{F}_{\tau + s}$ by Lemma 7.5, and in the
diffusion case
$$(M_t^{f,n} - M_s^{f,n}) \circ \theta_\tau = M^f_{(\tau + t) \wedge \sigma_n} - M^f_{(\tau + s) \wedge \sigma_n},$$
where $\sigma_n = \tau + \tau_n \circ \theta_\tau$, which is again optional by Proposition 8.8. Thus,
(19) follows by optional sampling from the local martingale property of $M^f$
under $P_x$.
Now assume that $a$ and $b$ are also bounded and continuous, and define
$T_t f(x) = E_x f(X_t)$. By Theorem 21.9 we note that $T_t f$ is continuous for
every $f \in C_0$ and $t > 0$, and from the continuity of the paths it is clear
that $T_t f(x)$ is continuous in $t$ for each $x$. To see that $T_t f \in C_0$, it remains
to show that $|X_t^x| \overset{P}{\to} \infty$ as $|x| \to \infty$, where $X^x$ has distribution $P_x$. But
this follows from the SDE by the boundedness of $\sigma$ and $b$ if for $0 < r < |x|$
we write
$$P\{|X_t^x| < r\} \le P\{|X_t^x - x| > |x| - r\} \le \frac{E|X_t^x - x|^2}{(|x| - r)^2} \lesssim \frac{t + t^2}{(|x| - r)^2},$$
and let $|x| \to \infty$ for fixed $r$ and $t$. The last assertion is obvious from the
uniqueness in law together with Theorem 19.23. $\Box$
It is usually harder to establish uniqueness in law than to prove weak exis-
tence. Some fairly general uniqueness criteria will be obtained in Theorems
23.1 and 24.2. For the moment we shall only exhibit some transformations
that may simplify the problem. The following result, based on a change of
probability measure, is often useful to eliminate the drift term.
Proposition 21.12 (transformation of drift) Let $\sigma$, $b$, and $c$ be progressive
functions of suitable dimension, where $c$ is bounded. Then weak
existence holds simultaneously for equations $(\sigma, b)$ and $(\sigma, b + \sigma c)$. If, moreover,
$c = \sigma' h$ for some progressive function $h$, then even uniqueness in law
holds simultaneously for the two equations.
Proof: Let $X$ be a weak solution to equation $(\sigma, b)$, defined on the canonical
space for $(X, B)$ with induced filtration $\mathcal{F}$ and with probability measure
$P$. Put $V = c(X)$, and note that $(|V|^2 \cdot \lambda)_t$ is bounded for each $t$. By
Lemma 18.18 and Corollary 18.25 there exists a probability measure $Q$
with $Q = \mathcal{E}(V' \cdot B)_t \cdot P$ on $\mathcal{F}_t$ for each $t \ge 0$, and we note that $\tilde{B} = B - V \cdot \lambda$
is a $Q$-Brownian motion. Under $Q$ we further get by Proposition 18.20
$$X - X_0 = \sigma(X) \cdot (\tilde{B} + V \cdot \lambda) + b(X) \cdot \lambda = \sigma(X) \cdot \tilde{B} + (b + \sigma c)(X) \cdot \lambda,$$
which shows that $X$ is a weak solution to the SDE $(\sigma, b + \sigma c)$. Since the
same argument applies to equation $(\sigma, b + \sigma c)$ with $c$ replaced by $-c$, we
conclude that weak existence holds simultaneously for the two equations.
Now let $c = \sigma' h$, and assume that uniqueness in law holds for equation
$(\sigma, b + \sigma c)$. Further assume that $(X, B)$ solves equation $(\sigma, b)$ under both $P$
and $Q$. Choosing $V$ and $\tilde{B}$ as before, it follows that $(X, \tilde{B})$ solves equation
$(\sigma, b + \sigma c)$ under the transformed distributions $\mathcal{E}(V' \cdot B)_t \cdot P$ and $\mathcal{E}(V' \cdot B)_t \cdot Q$
for $(X, B)$. By hypothesis the latter measures then have the same $X$-marginal,
and the stated condition implies that $\mathcal{E}(V' \cdot B)$ is $X$-measurable.
Thus, the $X$-marginals agree even for $P$ and $Q$, which proves the uniqueness
in law for equation $(\sigma, b)$. Again we may reverse the argument to get an
implication in the other direction. $\Box$
Next we examine how an SDE of diffusion type can be transformed by a
random time-change. The method will be used systematically in Chapter
23 to analyze the one-dimensional case.
Proposition 21.13 (scaling) Fix some measurable functions $\sigma$, $b$, and
$c > 0$ on $\mathbb{R}^d$, where $c$ is bounded away from $0$ and $\infty$. Then weak existence
and uniqueness in law hold simultaneously for equations $(\sigma, b)$ and $(c\sigma, c^2 b)$.
Proof: Assume that $X$ solves the local martingale problem for the pair
$(a, b)$, and introduce the process $V = c^{-2}(X) \cdot \lambda$ with inverse $(\tau_s)$, so that
$d\tau_s = c^2(X_{\tau_s})\,ds$. By optional
sampling we note that $M^f_{\tau_s}$, $s \ge 0$, is again a local martingale, and the
process $Y_s = X_{\tau_s}$ satisfies
$$M^f_{\tau_s} = f(Y_s) - f(Y_0) - \int_0^s c^2 A f(Y_r)\,dr.$$
Thus, $Y$ solves the local martingale problem for $(c^2 a, c^2 b)$.
Now let $T$ denote the mapping on $C(\mathbb{R}_+, \mathbb{R}^d)$ leading from $X$ to $Y$,
and write $T'$ for the corresponding mapping based on $c^{-1}$. Then $T$ and
$T'$ are mutual inverses, and so by the previous argument applied to both
mappings, a measure $P \in \mathcal{P}(C(\mathbb{R}_+, \mathbb{R}^d))$ solves the local martingale problem
for $(a, b)$ iff $P \circ T^{-1}$ solves the corresponding problem for $(c^2 a, c^2 b)$.
Thus, both existence and uniqueness hold simultaneously for the two problems.
By Theorem 21.7 the last statement translates immediately into a
corresponding assertion for the SDEs. $\Box$
Our next aim is to examine the connection between weak and strong solu-
tions. Under appropriate conditions, we shall further establish the existence
of a universal functional solution. To explain the subsequent terminology,
let $\mathcal{G}$ be the filtration induced by the identity mapping $(\xi, B)$ on the canonical
space $\Omega = \mathbb{R}^d \times C(\mathbb{R}_+, \mathbb{R}^r)$, so that $\mathcal{G}_t = \sigma\{\xi, B^t\}$, $t \ge 0$, where
$B_s^t = B_{s \wedge t}$. Writing $W^r$ for the $r$-dimensional Wiener measure, we introduce
for any $\mu \in \mathcal{P}(\mathbb{R}^d)$ the $(\mu \otimes W^r)$-completion $\mathcal{G}_t^\mu$ of $\mathcal{G}_t$. The universal
completion $\overline{\mathcal{G}}_t$ is defined as $\bigcap_\mu \mathcal{G}_t^\mu$, and we say that a function
$$F: \mathbb{R}^d \times C(\mathbb{R}_+, \mathbb{R}^r) \to C(\mathbb{R}_+, \mathbb{R}^d) \qquad (20)$$
is universally adapted if it is adapted to the filtration $\overline{\mathcal{G}} = (\overline{\mathcal{G}}_t)$.
Theorem 21.14 (pathwise uniqueness and functional solution) Let $\sigma$ and
$b$ be progressive and such that weak existence and pathwise uniqueness hold
for solutions to equation $(\sigma, b)$ starting at fixed points. Then strong existence
and uniqueness in law hold for any initial distribution, and there exists a
measurable and universally adapted function $F$ as in (20) such that every
solution $(X, B)$ to equation $(\sigma, b)$ satisfies $X = F(X_0, B)$ a.s.
Note in particular that the function $F$ above is independent of the initial
distribution $\mu$. A key step in the proof, accomplished in Lemma 21.17, is
to establish the corresponding result for a fixed $\mu$. Two further lemmas
will be needed, and we begin with a statement that clarifies the connection
between adaptedness, strong existence, and functional solutions.
Lemma 21.15 (transfer of strong solution) Let $(X, B)$ solve equation
$(\sigma, b)$, and assume that $X$ is adapted to the complete filtration induced by
$X_0$ and $B$. Then $X = F(X_0, B)$ a.s. for some Borel-measurable function
$F$ as in (20), and for any basic triple $(\tilde{\mathcal{F}}, \tilde{B}, \tilde{\xi})$ with $\tilde{\xi} \overset{d}{=} X_0$, the process
$\tilde{X} = F(\tilde{\xi}, \tilde{B})$ is $\tilde{\mathcal{F}}$-adapted and such that the pair $(\tilde{X}, \tilde{B})$ solves equation
$(\sigma, b)$.
Proof: By Lemma 1.13 we have $X = F(X_0, B)$ a.s. for some Borel-measurable
function $F$ as stated. By the same result, there exists for every
$t \ge 0$ a further representation of the form $X_t = G_t(X_0, B^t)$ a.s., and so
$F(X_0, B)_t = G_t(X_0, B^t)$ a.s. Hence, $\tilde{X}_t = G_t(\tilde{\xi}, \tilde{B}^t)$ a.s., and so $\tilde{X}$ is
$\tilde{\mathcal{F}}$-adapted. Since also $(\tilde{X}, \tilde{B}) \overset{d}{=} (X, B)$, Proposition 17.26 shows that even
the former pair solves equation $(\sigma, b)$. $\Box$
The following result shows that even weak solutions can be transferred
to any given probability space with a specified Brownian motion.
Lemma 21.16 (transfer of weak solution) Let $(X, B)$ solve equation
$(\sigma, b)$, and fix any basic triple $(\tilde{\mathcal{F}}, \tilde{B}, \tilde{\xi})$ with $\tilde{\xi} \overset{d}{=} X_0$. Then there exists
a process $\tilde{X} \perp\!\!\!\perp_{\tilde{\xi}, \tilde{B}} \tilde{\mathcal{F}}$ with $\tilde{X}_0 = \tilde{\xi}$ a.s. and $(\tilde{X}, \tilde{B}) \overset{d}{=} (X, B)$. Furthermore,
the filtration $\tilde{\mathcal{G}}$ induced by $(\tilde{X}, \tilde{\mathcal{F}})$ is a standard extension of $\tilde{\mathcal{F}}$, and the
pair $(\tilde{X}, \tilde{B})$ with filtration $\tilde{\mathcal{G}}$ solves equation $(\sigma, b)$.
Proof: By Theorem 6.10 and Proposition 6.13 there exists a process
$\tilde{X} \perp\!\!\!\perp_{\tilde{\xi}, \tilde{B}} \tilde{\mathcal{F}}$ satisfying $(\tilde{X}, \tilde{\xi}, \tilde{B}) \overset{d}{=} (X, X_0, B)$, and in particular $\tilde{X}_0 = \tilde{\xi}$
a.s. To see that $\tilde{\mathcal{G}}$ is a standard extension of $\tilde{\mathcal{F}}$, fix any $t \ge 0$ and define
$B' = \tilde{B} - \tilde{B}^t$. Then $(\tilde{X}^t, \tilde{B}^t) \perp\!\!\!\perp B'$ since the corresponding relation
holds for $(X, B)$, and so $\tilde{X}^t \perp\!\!\!\perp_{\tilde{\xi}, \tilde{B}^t} B'$. Since also $\tilde{X}^t \perp\!\!\!\perp_{\tilde{\xi}, \tilde{B}} \tilde{\mathcal{F}}$, Proposition 6.8
yields $\tilde{X}^t \perp\!\!\!\perp_{\tilde{\xi}, \tilde{B}^t} (B', \tilde{\mathcal{F}})$ and hence $\tilde{X}^t \perp\!\!\!\perp_{\tilde{\mathcal{F}}_t} \tilde{\mathcal{F}}$. But then $(\tilde{X}^t, \tilde{\mathcal{F}}_t) \perp\!\!\!\perp_{\tilde{\mathcal{F}}_t} \tilde{\mathcal{F}}$ by
Corollary 6.7, which means that $\tilde{\mathcal{G}}_t \perp\!\!\!\perp_{\tilde{\mathcal{F}}_t} \tilde{\mathcal{F}}$.
Since standard extensions preserve martingales, Theorem 18.3 shows that
$\tilde{B}$ remains a Brownian motion with respect to $\tilde{\mathcal{G}}$. As in Proposition 17.26,
we conclude that the pair $(\tilde{X}, \tilde{B})$ solves equation $(\sigma, b)$. $\Box$
We are now ready to establish the crucial relationship between strong
existence and pathwise uniqueness.
Lemma 21.17 (strong existence and pathwise uniqueness, Yamada and
Watanabe) Assume that weak existence and pathwise uniqueness hold for
solutions to equation $(\sigma, b)$ with initial distribution $\mu$. Then even strong
existence and uniqueness in law hold for such solutions, and there exists a
measurable function $F_\mu$ as in (20) such that any solution $(X, B)$ with initial
distribution $\mu$ satisfies $X = F_\mu(X_0, B)$ a.s.
Proof: Fix any solution $(X, B)$ with initial distribution $\mu$ and associated
filtration $\mathcal{F}$. By Lemma 21.16 there exists some process $Y \perp\!\!\!\perp_{X_0, B} \mathcal{F}$ with
$Y_0 = X_0$ a.s. such that $(Y, B)$ solves equation $(\sigma, b)$ for the filtration $\mathcal{G}$
induced by $(Y, \mathcal{F})$. Since $\mathcal{G}$ is a standard extension of $\mathcal{F}$, the pair $(X, B)$
remains a solution for $\mathcal{G}$, and the pathwise uniqueness yields $X = Y$ a.s.
For each $t \ge 0$ we have $X^t \perp\!\!\!\perp_{X_0, B} X^t$ and $(X^t, B^t) \perp\!\!\!\perp (B - B^t)$, and so
$X^t \perp\!\!\!\perp_{X_0, B^t} X^t$ a.s. by Proposition 6.8. Thus, Corollary 6.7 (ii) shows that
$X$ is adapted to the complete filtration induced by $(X_0, B)$. Hence, by
Lemma 21.15 there exists a measurable function $F_\mu$ with $X = F_\mu(X_0, B)$
a.s. and such that, for any basic triple $(\tilde{\mathcal{F}}, \tilde{B}, \tilde{\xi})$ with $\tilde{\xi} \overset{d}{=} X_0$, the process
$\tilde{X} = F_\mu(\tilde{\xi}, \tilde{B})$ is $\tilde{\mathcal{F}}$-adapted and solves equation $(\sigma, b)$ along with $\tilde{B}$. In
particular, $\tilde{X} \overset{d}{=} X$ since $(\tilde{\xi}, \tilde{B}) \overset{d}{=} (X_0, B)$, and the pathwise uniqueness
shows that $\tilde{X}$ is the a.s. unique solution for the given triple $(\tilde{\mathcal{F}}, \tilde{B}, \tilde{\xi})$. This
proves the uniqueness in law. $\Box$
Proof of Theorem 21.14: By Lemma 21.17 we have uniqueness in law
for solutions starting at fixed points, and Theorem 21.10 shows that the
corresponding distributions $P_x$ form a kernel from $\mathbb{R}^d$ to $C(\mathbb{R}_+, \mathbb{R}^d)$. By
Lemma 21.8 there exists a measurable mapping $G$ such that, whenever $X$
has distribution $P_x$ and $\vartheta \perp\!\!\!\perp X$ is $U(0, 1)$, the process $B = G(P_x, X, \vartheta)$ is a
Brownian motion in $\mathbb{R}^r$ and the pair $(X, B)$ solves equation $(\sigma, b)$. Writing
$Q_x$ for the distribution of $(X, B)$, it is clear from Lemmas 1.38 and 1.41
(ii) that the mapping $x \mapsto Q_x$ is a kernel from $\mathbb{R}^d$ to $C(\mathbb{R}_+, \mathbb{R}^{d+r})$.
Changing the notation, we may write $(X, B)$ for the canonical process
in $C(\mathbb{R}_+, \mathbb{R}^{d+r})$. By Lemma 21.17 we have $X = F_x(x, B) = F_x(B)$ a.s. $Q_x$,
and so
$$Q_x[X \in \cdot \mid B] = \delta_{F_x(B)} \quad \text{a.s.},\ x \in \mathbb{R}^d. \qquad (21)$$
By Proposition 7.26 we may choose versions $\nu_{x,w} = Q_x[X \in \cdot \mid B \in dw]$ that
combine into a probability kernel $\nu$ from $\mathbb{R}^d \times C(\mathbb{R}_+, \mathbb{R}^r)$ to $C(\mathbb{R}_+, \mathbb{R}^d)$.
From (21) we see that $\nu_{x,w}$ is a.s. degenerate for each $x$, and since the set
$D$ of degenerate measures is measurable by Lemma 1.39 (i), we can modify
$\nu$ such that $\nu_{x,w} D \equiv 1$. In that case,
$$\nu_{x,w} = \delta_{F(x,w)}, \quad x \in \mathbb{R}^d,\ w \in C(\mathbb{R}_+, \mathbb{R}^r), \qquad (22)$$
for some function $F$ as in (20), and the kernel property of $\nu$ implies that
$F$ is product measurable. Comparing (21) and (22) gives $F(x, B) = F_x(B)$
a.s. for all $x$.
Now fix any probability measure $\mu$ on $\mathbb{R}^d$, and conclude as in Theorem
21.10 that $P_\mu = \int P_x\,\mu(dx)$ solves the local martingale problem for $(a, b)$
with initial distribution $\mu$. Hence, equation $(\sigma, b)$ has a solution $(X, B)$ with
distribution $\mu$ for $X_0$. Since conditioning on $\mathcal{F}_0$ preserves martingales, the
equation remains conditionally valid given $X_0$. By the pathwise uniqueness
in the degenerate case we get $P[X = F(X_0, B) \mid X_0] = 1$ a.s., and so
$X = F(X_0, B)$ a.s. In particular, the pathwise uniqueness extends to arbitrary
initial distributions $\mu$.
Returning to the canonical setting, we may take $(\xi, B)$ to be the identity
map on the canonical space $\mathbb{R}^d \times C(\mathbb{R}_+, \mathbb{R}^r)$, endowed with the probability
measure $\mu \otimes W^r$ and the induced complete filtration $\mathcal{G}^\mu$. By Lemma 21.17
equation $(\sigma, b)$ has a $\mathcal{G}^\mu$-adapted solution $X = F_\mu(\xi, B)$ with $X_0 = \xi$ a.s.,
and the previous discussion shows that even $X = F(\xi, B)$ a.s. Hence, $F$
is adapted to $\mathcal{G}^\mu$, and since $\mu$ is arbitrary, the adaptedness extends to the
universal completion $\overline{\mathcal{G}}_t = \bigcap_\mu \mathcal{G}_t^\mu$, $t \ge 0$. $\Box$
Exercises
1. Show that for any $c \in (0, 1)$, the stochastic flow $X_t^x$ in Theorem 21.3
is a.s. Hölder continuous in $x$ with exponent $c$, uniformly for bounded $x$
and $t$. (Hint: Apply Theorem 3.23 to the estimate in the proof of Theorem
21.3.)
2. Show that a process $X$ in $\mathbb{R}^d$ is a Brownian motion iff the process
$f(X_t) - \tfrac{1}{2} \int_0^t \Delta f(X_s)\,ds$ is a martingale for every $f \in C_K^\infty$. Compare with
Theorem 18.3 and Lemma 19.21.
3. Show that a Brownian bridge in $\mathbb{R}^d$ satisfies the SDE $dX_t = dB_t - (1 - t)^{-1} X_t\,dt$
on $[0, 1)$ with initial condition $X_0 = 0$. Also show that if $X^x$
denotes the solution starting at $x$, then the process $Y_t^x = X_t^x - (1 - t)x$ is
again a Brownian bridge. (Hint: Note that $M_t = X_t / (1 - t)$ is a martingale
on $[0, 1)$ and that $Y^x$ satisfies the same SDE as $X$.)
4. Solve the preceding SDE, using Proposition 21.2, to express the Brownian
bridge in terms of a Brownian motion. Compare with previously known
formulas.
5. Given two continuous semimartingales $U$ and $V$, show that the Fisk-Stratonovich
SDE $dX = dU + X \circ dV$ has the unique solution $X = Z(X_0 + Z^{-1} \circ U)$,
where $Z = \exp(V - V_0)$. (Hint: Use Corollary 17.21 and the chain
rule for FS-integrals, or derive the result from Proposition 21.2.)
6. Show under suitable conditions how a Fisk-Stratonovich SDE can
be converted into an Itô equation, and conversely. Also give a sufficient
condition for the existence of a strong solution to an FS-equation.
7. Show that weak existence and uniqueness in law hold for the SDE $dX_t = \mathrm{sgn}(X_t+)\,dB_t$
with initial condition $X_0 = 0$, while strong existence and
pathwise uniqueness fail. (Hint: Show that any solution $X$ is a Brownian
motion, and define $B = \int \mathrm{sgn}(X+)\,dX$. Note that both $X$ and $-X$ satisfy
the given SDE.)
8. Show that weak existence holds for the SDE $dX_t = \mathrm{sgn}(X_t)\,dB_t$ with
initial condition $X_0 = 0$, while strong existence and uniqueness in law fail.
(Hint: We may take $X$ to be a Brownian motion or put $X \equiv 0$.)
9. Show that strong existence holds for the SDE $dX_t = 1\{X_t \ne 0\}\,dB_t$ with
initial condition $X_0 = 0$, while uniqueness in law fails. (Hint: Here $X = B$
and $X \equiv 0$ are both solutions.)
10. Show that a given process may satisfy SDEs with different $(\sigma\sigma', b)$.
(Hint: For a trivial example, take $X \equiv 0$, $b \equiv 0$, and $\sigma \equiv 0$ or $\sigma(x) = \mathrm{sgn}\,x$.)
11. Construct a non-Markovian solution $X$ to the SDE $dX_t = \mathrm{sgn}(X_t)\,dB_t$.
(Hint: We may take $X$ to be a Brownian motion, stopped at the first visit
to $0$ after time $1$. Another interesting choice is to take $X$ to be $0$ on $[0, 1]$
and a Brownian motion on $[1, \infty)$.)
12. For $X$ as in Theorem 21.3, construct an SDE in $\mathbb{R}^{md}$ satisfied by
the process $(X_t^{x_1}, \ldots, X_t^{x_m})$ for arbitrary $x_1, \ldots, x_m \in \mathbb{R}^d$. Conclude that
$\mathcal{L}(X)$ is determined by $\mathcal{L}(X^x, X^y)$ for arbitrary $x, y \in \mathbb{R}^d$. (Hint: Note that
$\mathcal{L}(X^x)$ is determined by $(\sigma\sigma', b)$ and $x$, and apply this result to the $m$-point
motion.)
13. Find two SDEs as in Theorem 21.3 with solutions $X$ and $Y$ such that
$X^x \overset{d}{=} Y^x$ for all $x$ but $X \overset{d}{\ne} Y$. (Hint: We may choose $dX = dB$ and
$dY = \mathrm{sgn}(Y+)\,dB$.)
14. For a diffusion equation $(\sigma, b)$ as in Theorem 21.3, show that the distribution
of the associated flow $X$ determines $\sum_j \sigma_j^i(x) \sigma_j^k(y)$ for arbitrary
pairs $i, k \in \{1, \ldots, d\}$ and $x, y \in \mathbb{R}^d$.
15. Show that if weak existence holds for the SDE $(\sigma, b)$, then the pathwise
uniqueness can be strengthened to the corresponding property for solutions
$X$ and $Y$ with respect to possibly different filtrations.
16. Assume that weak existence and the stronger version of pathwise
uniqueness hold for the SDE $(\sigma, b)$. Use Theorem 6.10 and Lemma 21.15
to prove the existence for every $\mu$ of an a.s. unique functional solution
$F(X_0, B)$ with $\mathcal{L}(X_0) = \mu$.
Chapter 22
Local Time, Excursions,
and Additive Functionals
Tanaka's formula and semimartingale local time; occupation
density, continuity and approximation; regenerative sets and
processes; excursion local time and Poisson process; Ray-Knight
theorem; excessive functions and additive functionals; local time
at a regular point; additive functionals of Brownian motion
The central theme of this chapter is the notion of local time, which we
will approach in three different ways, namely via stochastic calculus, via
excursion theory, and via additive functionals. Here the first approach leads
in particular to a useful extension of Itô's formula and to an interpretation
of local time as an occupation density. Excursion theory will be developed
for processes that are regenerative at a fixed state, and we shall prove the
basic Itô representation, involving a Poisson process of excursions on the
local time scale. Among the many applications, we consider a version of the
Ray-Knight theorem about the spatial variation of Brownian local time.
Finally, we shall study continuous additive functionals (CAFs) and their
potentials, prove the existence of local time at a regular point, and show
that any CAF of one-dimensional Brownian motion is a mixture of local
times.
The beginning of this chapter may be regarded as a continuation of the
stochastic calculus developed in Chapter 17. The present excursion theory
continues the elementary discussion for the discrete-time case in Chapter 8.
Though the theory of CAFs is formally developed for Feller processes, few
results from Chapter 19 will be needed beyond the strong Markov property
and its integrated version in Corollary 19.19. Both semimartingale local
time and excursion theory will reappear in Chapter 23 as useful tools for
studying one-dimensional SDEs and diffusions. Our discussion of CAFs of
Brownian motion and their associated potentials is continued at the end of
Chapter 25.
For the stochastic calculus approach to local time, consider an arbitrary
continuous semimartingale $X$ in $\mathbb{R}$. The semimartingale local time $L^0$ of $X$
at $0$ may be defined through Tanaka's formula
$$L_t^0 = |X_t| - |X_0| - \int_0^t \mathrm{sgn}(X_s-)\,dX_s, \quad t \ge 0, \qquad (1)$$
where $\mathrm{sgn}(x-) = 1_{(0, \infty)}(x) - 1_{(-\infty, 0]}(x)$. Note that the stochastic integral
on the right exists since the integrand is bounded and progressive. The
process $L^0$ is clearly continuous and adapted with $L_0^0 = 0$. To motivate the
definition, we note that a formal application of Itô's rule to the function
$f(x) = |x|$ yields (1) with $L_t^0 = \int_{s \le t} \delta(X_s)\,d[X]_s$. The following result gives
the basic properties of local time at a fixed point. Here we say that a
nondecreasing function $f$ is supported by a Borel set $A$ if the associated
measure $\mu$ satisfies $\mu A^c = 0$. The support of $f$ is the smallest closed set
with this property.
Theorem 22.1 (semimartingale local time) Let $L^0$ be the local time at $0$
of a continuous semimartingale $X$. Then $L^0$ is a.s. nondecreasing, continuous,
and supported by the set $Z = \{t \ge 0;\ X_t = 0\}$. Furthermore, we have
a.s.
$$L_t^0 = \Big\{ -|X_0| - \inf_{s \le t} \int_0^s \mathrm{sgn}(X-)\,dX \Big\} \vee 0, \quad t \ge 0. \qquad (2)$$
The proof of the last assertion depends on an elementary observation.
Lemma 22.2 (supporting function, Skorohod) Let $f$ be a continuous
function on $\mathbb{R}_+$ with $f_0 \ge 0$. Then there exists a unique nondecreasing,
continuous function $g$ with $g_0 = 0$ such that $h \equiv f + g \ge 0$ and
$\int 1\{h > 0\}\,dg = 0$, namely
$$g_t = -\inf_{s \le t} (f_s \wedge 0) = \sup_{s \le t} (-f_s \vee 0), \quad t \ge 0. \qquad (3)$$
Proof: The function in (3) clearly has the desired properties. To prove
the uniqueness, assume that both $g$ and $g'$ have the stated properties,
and put $h = f + g$ and $h' = f + g'$. If $g_t < g'_t$ for some $t > 0$, define
$s = \sup\{r < t;\ g_r = g'_r\}$, and note that $h' \ge h' - h = g' - g > 0$ on $(s, t]$.
Hence, $g'_s = g'_t$, and so $0 < g'_t - g_t \le g'_s - g_s = 0$, a contradiction. $\Box$
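Formula (3) translates directly into a running-maximum computation on a sampled path. The sketch below assumes the path is given at finitely many grid points, which is a discretization not present in the lemma, and the name `skorohod_reflect` is invented for the example.

```python
def skorohod_reflect(f):
    """For samples f[0..n] with f[0] >= 0, return (g, h) where
    g[k] = max(0, max_{j<=k} -f[j]) as in (3) and h = f + g."""
    g, running = [], 0.0
    for value in f:
        running = max(running, -value)   # running supremum of (-f) v 0
        g.append(running)
    h = [fv + gv for fv, gv in zip(f, g)]
    return g, h

path = [0.0, 0.5, -0.3, -1.0, -0.4, 0.2, -1.5, -1.2]
g, h = skorohod_reflect(path)
```

On the grid, $g$ starts at $0$, is nondecreasing, keeps $h = f + g$ nonnegative, and increases only at indices where $h$ vanishes, which mirrors the condition $\int 1\{h > 0\}\,dg = 0$.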
Proof of Theorem 22.1: For any $h > 0$, we may choose a convex function
$f_h \in C^2$ such that $f_h(x) = -x$ for $x \le 0$ and $f_h(x) = x - h$ for $x \ge h$. Here
clearly $f_h(x) \to |x|$ and $f_h'(x) \to \mathrm{sgn}(x-)$ as $h \to 0$. By Itô's formula we get,
a.s. for any $t \ge 0$,
$$Y_t^h \equiv f_h(X_t) - f_h(X_0) - \int_0^t f_h'(X_s)\,dX_s = \tfrac{1}{2} \int_0^t f_h''(X_s)\,d[X]_s,$$
and by Corollary 17.13 and dominated convergence we note that $(Y^h - L^0)_t^* \overset{P}{\to} 0$
for each $t > 0$. The first assertion now follows from the fact that the
processes $Y^h$ are nondecreasing and satisfy
$$\int_0^\infty 1\{X_s \notin [0, h]\}\,dY_s^h = 0 \quad \text{a.s.},\ h > 0.$$
The last assertion is a consequence of Lemma 22.2. $\Box$
In particular, we may deduce a basic relationship between a Brownian
motion, its maximum process, and its local time at 0. The result improves
the elementary Proposition 13.13.
Corollary 22.3 (local time and maximum process, Lévy) Let $L^0$ be the
local time at 0 of Brownian motion $B$, and define $M_t = \sup_{s \le t} B_s$. Then
$$(L^0, |B|) \overset{d}{=} (M, M - B).$$
Proof: Define $B'_t = -\int_{s \le t} \operatorname{sgn}(B_s-)\,dB_s$ and $M'_t = \sup_{s \le t} B'_s$, and conclude
from (1) and (2) that $L^0 = M'$ and $|B| = L^0 - B' = M' - B'$. It
remains to note that $B' \overset{d}{=} B$ by Theorem 18.3. □
The local time $L^x$ at an arbitrary point $x \in \mathbb{R}$ is defined as the local
time of the process $X - x$ at 0. Thus,
$$L^x_t = |X_t - x| - |X_0 - x| - \int_0^t \operatorname{sgn}(X_s - x-)\,dX_s, \qquad t \ge 0. \tag{4}$$
The following result shows that the two-parameter process $L = (L^x_t)$ on
$\mathbb{R}_+ \times \mathbb{R}$ has a version that is continuous in $t$ and rcll (right-continuous with
left-hand limits) in $x$. In the martingale case we even have joint continuity.
Theorem 22.4 (regularization, Trotter, Yor) Let $X$ be a continuous
semimartingale with canonical decomposition $M + A$. Then the local time
$L = (L^x_t)$ of $X$ has a version that is rcll in $x$, uniformly for bounded $t$, and
satisfies
$$L^x_t - L^{x-}_t = 2\int_0^t 1\{X_s = x\}\,dA_s, \qquad x \in \mathbb{R},\ t \in \mathbb{R}_+. \tag{5}$$
Proof: By the definition of $L$ we have for any $x \in \mathbb{R}$ and $t \ge 0$
$$L^x_t = |X_t - x| - |X_0 - x| - \int_0^t \operatorname{sgn}(X_s - x-)\,dM_s - \int_0^t \operatorname{sgn}(X_s - x-)\,dA_s. \tag{6}$$
By dominated convergence the last term has the required continuity properties,
and the discontinuities in the space variable are given by the right-hand
side of (5). Since the first two terms are trivially continuous in $(t, x)$, it
remains to show that the first integral in (6), denoted by $I^x_t$ below, has a
jointly continuous version.
By localization we may then assume that the processes $X - X_0$, $[M]^{1/2}$,
and $\int |dA|$ are all bounded by some constant $c$. Fix any $p > 2$. By Theorem
17.7 we get for any $x < y$
$$E(I^x - I^y)^{*p}_t \le 2^p E\big(1_{(x,y]}(X) \cdot M\big)^{*p}_t \lesssim E\big(1_{(x,y]}(X) \cdot [M]\big)^{p/2}_t. \tag{7}$$
To estimate the integral on the right, put $y - x = h$ and choose $f \in C^2$
with $f'' \ge 2 \cdot 1_{(x,y]}$ and $|f'| \le 2h$. By Itô's formula
$$1_{(x,y]}(X) \cdot [M] \le \tfrac{1}{2} f''(X) \cdot [X] = f(X) - f(X_0) - f'(X) \cdot X \le 4ch + |f'(X) \cdot M|, \tag{8}$$
and by another application of Theorem 17.7
$$E\big(f'(X) \cdot M\big)^{*p/2}_t \lesssim E\big((f'(X))^2 \cdot [M]\big)^{p/4}_t \le (2ch)^{p/2}. \tag{9}$$
Combination of (7)-(9) gives $E(I^x - I^y)^{*p}_t \lesssim (ch)^{p/2}$, and the desired
continuity follows by Theorem 3.23. □
By the last result we may henceforth assume the local time $L^x_t$ to be rcll
in $x$. Here the right-continuity is only a convention, consistent with our
choice of a left-continuous sign function in (4). If the occupation measure
of the finite variation component $A$ of $X$ is a.s. diffuse, then (5) shows that
$L$ is a.s. continuous.
We proceed to give a simultaneous extension of Itô's and Tanaka's formulas.
Recall that any convex function $f$ on $\mathbb{R}$ has a nondecreasing and
left-continuous left derivative $f'(x-)$. The same thing is then true when $f$
is the difference between two convex functions. In that case there exists a
unique signed measure $\mu_f$ with $\mu_f[x, y) = f'(y-) - f'(x-)$ for all $x \le y$.
In particular, $\mu_f(dx) = f''(x)\,dx$ when $f \in C^2$.
Theorem 22.5 (occupation density, Meyer, Wang) Let $X$ be a continuous
semimartingale with right-continuous local time $L$. Then outside a fixed
null set we have, for any measurable function $f \ge 0$ on $\mathbb{R}$,
$$\int_0^t f(X_s)\,d[X]_s = \int_{-\infty}^{\infty} f(x) L^x_t\,dx, \qquad t \ge 0. \tag{10}$$
If $f$ is the difference of two convex functions, then also
$$f(X_t) - f(X_0) = \int_0^t f'(X-)\,dX + \tfrac{1}{2}\int_{-\infty}^{\infty} L^x_t\,\mu_f(dx), \qquad t \ge 0. \tag{11}$$
In particular, Theorem 17.18 extends to any function $f \in C^1(\mathbb{R})$ such that
$f'$ is absolutely continuous with Radon-Nikodym derivative $f''$.
Note that (11) remains valid for the left-continuous version of $L$, provided
that $f'(X-)$ is replaced by the right derivative $f'(X+)$.
Proof: For $f(x) = |x - a|$, equation (11) reduces to the definition of $L^a_t$.
Since the formula is also trivially true for affine functions $f(x) = ax + b$,
it extends by linearity to the case when $\mu_f$ is supported by a finite set.
By linearity and a suitable truncation, it remains to prove (11) when $\mu_f$
is positive with bounded support and $f(-\infty) = f'(-\infty) = 0$. Then define
for every $n \in \mathbb{N}$ the functions
$$g_n(x) = f'(2^{-n}[2^n x]-), \qquad f_n(x) = \int_{-\infty}^x g_n(u)\,du, \qquad x \in \mathbb{R},$$
and note that (11) holds for all $f_n$. As $n \to \infty$, we get $f_n'(x-) = g_n(x-) \uparrow f'(x-)$,
and so Corollary 17.13 yields $f_n'(X-) \cdot X \overset{P}{\to} f'(X-) \cdot X$. Also
note that $f_n \to f$ by monotone convergence. It remains to show that
$\int L^x_t\,\mu_{f_n}(dx) \to \int L^x_t\,\mu_f(dx)$. Then let $h$ be any bounded, right-continuous
function on $\mathbb{R}$, and note that $\mu_{f_n} h = \mu_f h_n$ with $h_n(x) = h(2^{-n}[2^n x + 1])$.
Since $h_n \to h$, we get $\mu_f h_n \to \mu_f h$ by dominated convergence.
Comparing (11) with Itô's formula, we note that (10) holds a.s. for any
$t \ge 0$ and $f \in C$. For each $t \ge 0$, the two sides of (10) define random
measures on $\mathbb{R}$, and so by suitable approximation and monotone class arguments
we may choose the exceptional null set $N$ to be independent of $f$.
By the continuity of each side, we may also assume that $N$ is independent
of $t$.
If $f \in C^1$ with $f'$ as stated, then (11) applies with $\mu_f(dx) = f''(x)\,dx$,
and the last assertion follows by (10). □
In particular, we note that the occupation measure at time $t$,
$$\eta_t A = \int_0^t 1_A(X_s)\,d[X]_s, \qquad A \in \mathcal{B}(\mathbb{R}),\ t \ge 0, \tag{12}$$
is a.s. absolutely continuous with density $L_t$. This leads to a simple
construction of $L$.
Corollary 22.6 (right derivative) Outside a fixed P-null set, we have
$$L^x_t = \lim_{h \to 0} \eta_t[x, x + h)/h, \qquad t \ge 0,\ x \in \mathbb{R}.$$
Proof: Use Theorem 22.5 and the right-continuity of $L$. □
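The occupation measure (12) and the difference quotient in Corollary 22.6 have an obvious discrete counterpart, sketched below. The sketch assumes that a sampled path is given as a list and that squared increments stand in for $d[X]_s$; both function names and the sample path are hypothetical, and no convergence claim is made for the discretization.

```python
def occupation_measure(path, a, b):
    """Discrete analogue of eta_t[a, b): squared increments of steps started in [a, b)."""
    return sum((cur - prev) ** 2
               for prev, cur in zip(path, path[1:])
               if a <= prev < b)

def local_time_estimate(path, x, h):
    """Difference quotient eta_t[x, x + h) / h suggested by Corollary 22.6."""
    return occupation_measure(path, x, x + h) / h

# Hypothetical sample path with unit steps.
path = [0, 1, 0, -1, 0]
total = occupation_measure(path, -10, 10)   # all four squared increments
near_zero = local_time_estimate(path, 0, 1)
```

For a finely sampled path one would take `h` small relative to the range of the path but large relative to the step size.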
Our next aim is to show how local time arises naturally in the context
of regenerative processes. Then consider an rcll process $X$ in some Polish
space $S$ such that $X$ is adapted to some right-continuous and complete
filtration $\mathcal{F}$. Fix a state $a \in S$, and assume $X$ to be regenerative at $a$, in
the sense that there exists some distribution $P_a$ on the path space satisfying
$$P[\theta_\tau X \in \cdot \mid \mathcal{F}_\tau] = P_a \quad \text{a.s. on } \{\tau < \infty,\ X_\tau = a\}, \tag{13}$$
for every optional time $\tau$. The relation will often be applied to the hitting
times $\tau_r = \inf\{t \ge r;\ X_t = a\}$, which are optional for all $r \ge 0$ by Theorem
7.7. In fact, when $X$ is continuous, the optionality of $\tau_r$ follows already
from the elementary Lemma 7.6. In particular, we note that $\mathcal{F}_{\tau_0}$ and $\theta_{\tau_0} X$
are conditionally independent, given that $\tau_0 < \infty$. For simplicity we may
henceforth take $X$ to be the canonical process on the path space $D =
D(\mathbb{R}_+, S)$, equipped with the distribution $P = P_a$.
Introducing the regenerative set $Z = \{t \ge 0;\ X_t = a\}$, we may write the
last event in (13) simply as $\{\tau \in Z\}$. From the right-continuity of $X$ it
is clear that $Z \ni t_n \downarrow t$ implies $t \in Z$, which means that every point in
$\bar{Z} \setminus Z$ is isolated from the right. Since $\bar{Z}^c$ is open and hence a countable
union of disjoint open intervals, it follows that $Z^c$ is a countable union of
disjoint intervals of the form $(u, v)$ or $[u, v)$. With every such interval we
may associate an excursion process $Y_t = X_{(t+u) \wedge v}$, $t \ge 0$. Note that $a$ is
absorbing for $Y$, in the sense that $Y_t = a$ for all $t \ge \inf\{s > 0;\ Y_s = a\}$.
The number of excursions may be finite or infinite, and if $Z$ is bounded
there is clearly a last excursion of infinite length.
We begin with a classification according to the local properties of Z.
Proposition 22.7 (local dichotomies) If the set $Z$ is regenerative, then
(i) either $(\bar{Z})^\circ = \emptyset$ a.s. or $\overline{Z^\circ} = \bar{Z}$ a.s.;
(ii) either a.s. all points of $Z$ are isolated, or a.s. none of them is;
(iii) either $\lambda Z = 0$ a.s. or $\operatorname{supp}(Z \cdot \lambda) = \bar{Z}$ a.s.
Recall that the set $Z$ is said to be nowhere dense if $(\bar{Z})^\circ = \emptyset$, and that $Z$
is perfect if $Z$ has no isolated points. If $\overline{Z^\circ} = \bar{Z}$, then clearly $\operatorname{supp}(Z \cdot \lambda) = \bar{Z}$,
and no isolated points exist.
Proof: By the regenerative property, we have for any optional time $\tau$
$$P\{\tau = 0\} = E\big[P[\tau = 0 \mid \mathcal{F}_0];\ \tau = 0\big] = (P\{\tau = 0\})^2,$$
and so $P\{\tau = 0\} = 0$ or 1. If $\sigma$ is another optional time, then $\tau' = \sigma + \tau \circ \theta_\sigma$
is again optional by Proposition 8.8, and we get
$$P\{\tau' - h \le \sigma \in Z\} = P\{\tau \circ \theta_\sigma \le h,\ \sigma \in Z\} = P\{\tau \le h\}\,P\{\sigma \in Z\}.$$
Thus, $P[\tau' - \sigma \in \cdot \mid \sigma \in Z] = P \circ \tau^{-1}$, and in particular $\tau = 0$ a.s. implies
$\tau' = \sigma$ a.s. on $\{\sigma \in Z\}$.
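(i) Here we apply the previous argument to the optional times $\tau = \inf Z^c$
and $\sigma = \tau_r$. If $\tau > 0$ a.s., then $\tau \circ \theta_{\tau_r} > 0$ a.s. on $\{\tau_r < \infty\}$, and so $\tau_r \in \overline{Z^\circ}$
a.s. on the same set. Since the set $\{\tau_r;\ r \in \mathbb{Q}_+\}$ is dense in $\bar{Z}$, it follows
that $\bar{Z} = \overline{Z^\circ}$ a.s. Now assume instead that $\tau = 0$ a.s. Then $\tau \circ \theta_{\tau_r} = 0$ a.s.
on $\{\tau_r < \infty\}$, and so $\tau_r \in \overline{Z^c}$ a.s. on the same set. Hence, $\bar{Z} \subset \overline{Z^c}$ a.s., and
therefore $\overline{Z^c} = \mathbb{R}_+$ a.s. It remains to note that $\overline{Z^c} = \overline{(\bar{Z})^c}$, since $Z^c$ is a
disjoint union of intervals $(u, v)$ or $[u, v)$.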
(ii) In this case, we define $\tau = \inf(Z \setminus \{0\})$. If $\tau = 0$ a.s., then $\tau \circ \theta_{\tau_r} = 0$
a.s. on $\{\tau_r < \infty\}$. Since every isolated point of $Z$ is of the form $\tau_r$ for some
$r \in \mathbb{Q}_+$, it follows that $Z$ has a.s. no isolated points. If instead $\tau > 0$ a.s.,
we may define the optional times $\sigma_n$ recursively by $\sigma_{n+1} = \sigma_n + \tau \circ \theta_{\sigma_n}$,
starting from $\sigma_1 = \tau$. Then $\sigma_n = \sum_{k \le n} \xi_k$, where the $\xi_k$ are i.i.d. and
distributed as $\tau$, and so $\sigma_n \to \infty$ a.s. by the law of large numbers. Thus,
$Z = \{\sigma_n < \infty;\ n \in \mathbb{N}\}$ a.s., and a.s. all points of $Z$ are isolated.
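(iii) Here we may take $\tau = \inf\{t > 0;\ (Z \cdot \lambda)_t > 0\}$. If $\tau = 0$ a.s., then
$\tau \circ \theta_{\tau_r} = 0$ a.s. on $\{\tau_r < \infty\}$, and so $\tau_r \in \operatorname{supp}(Z \cdot \lambda)$ a.s. on the same set.
Hence, $\bar{Z} \subset \operatorname{supp}(Z \cdot \lambda)$ a.s., and the two sets agree a.s. If instead $\tau > 0$
a.s., then $\tau = \tau + \tau \circ \theta_\tau > \tau$ a.s. on $\{\tau < \infty\}$, which implies $\tau = \infty$ a.s.
This yields $\lambda Z = 0$ a.s. □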
To examine the global properties of $Z$, we may introduce the holding time
$\gamma = \inf Z^c = \inf\{t > 0;\ X_t \ne a\}$, which is optional by Lemma 7.6. The
following extension of Lemma 12.16 gives some more detailed information
about dichotomy (i) above.
Lemma 22.8 (holding time) The time $\gamma$ is exponentially distributed with
mean $m \in [0, \infty]$, where $m = 0$ or $\infty$ when $X$ is continuous. Furthermore,
$Z$ is a.s. nowhere dense when $m = 0$, and if $m > 0$ it is a.s. a locally finite
union of intervals $[\sigma, \tau)$. Finally, $\gamma \perp\!\!\!\perp X \circ \theta_\gamma$ when $m < \infty$.
Proof: The first and last assertions may be proved as in Lemma 12.16,
and the statement for $m = 0$ was obtained in Proposition 22.7 (i). Now let
$0 < m < \infty$. Noting that $\gamma \circ \theta_\gamma = 0$ a.s. on $\{\gamma \in Z\}$, we get
$$0 = P\{\gamma \circ \theta_\gamma > 0,\ \gamma \in Z\} = P\{\gamma > 0\}\,P\{\gamma \in Z\} = P\{\gamma \in Z\},$$
so in this case $\gamma \notin Z$ a.s. Put $\sigma_0 = 0$, let $\sigma_1 = \gamma + \tau_0 \circ \theta_\gamma$, and define
recursively $\sigma_{n+1} = \sigma_n + \sigma_1 \circ \theta_{\sigma_n}$. Write $\gamma_n = \sigma_n + \gamma \circ \theta_{\sigma_n}$. Then $\sigma_n \to \infty$
a.s. by the law of large numbers, and so $Z = \bigcup_n [\sigma_n, \gamma_n)$. If $X$ is continuous,
then $Z$ is closed and the last case is excluded. □
The state $a$ is said to be absorbing if $m = \infty$ and instantaneous if $m = 0$.
In the former case clearly $X \equiv a$ and $Z = \mathbb{R}_+$ a.s. Hence, to avoid trivial
exceptions, we may henceforth assume that $m < \infty$. A separate treatment
is sometimes required for the elementary case when the recurrence time
$\gamma + \tau_{0+} \circ \theta_\gamma$ is a.s. strictly positive. This clearly occurs when $Z$ has a.s. only
isolated points or the holding time $\gamma$ is positive.
We proceed to examine the set of excursions. Since there is no first excursion
in general, it is helpful first to focus on excursions of long duration. For
any $h \ge 0$, let $D_h$ denote the set of excursion paths longer than $h$, endowed
with the $\sigma$-field $\mathcal{D}_h$ generated by all evaluation maps $\pi_t$, $t \ge 0$. Note that
$D_0$ is a Borel space and that $D_h \in \mathcal{D}_0$ for all $h$. The number of excursions in
$D_h$ will be denoted by $\kappa_h$. The following result is a continuous-time version
of Proposition 8.15.
Lemma 22.9 (long excursions) Fix any $h > 0$, or $h \ge 0$ when the recurrence
time is positive. Then either $\kappa_h = 0$ a.s., or $\kappa_h$ has a geometric
distribution with mean $m_h \in [1, \infty]$. In the latter case, $X$ has $D_h$-excursions
$Y^j_h$, $j \le \kappa_h$, for some i.i.d. processes $Y^1_h, Y^2_h, \ldots$ in $D_h$, where $Y^{\kappa_h}_h$ is a.s.
infinite when $m_h < \infty$.
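Proof: For $t \in (0, \infty]$, let $\kappa^t_h$ denote the number of $D_h$-excursions completed
at time $t$, and note that $\kappa^{\tau_t}_h > 0$ when $\tau_t = \infty$. Writing
$p_h = P\{\kappa_h > 0\}$, we obtain
$$p_h = P\{\kappa^{\tau_t}_h > 0\} + P\{\kappa^{\tau_t}_h = 0,\ \kappa_h \circ \theta_{\tau_t} > 0\} = P\{\kappa^{\tau_t}_h > 0\} + P\{\kappa^{\tau_t}_h = 0\}\,p_h.$$
Since $\kappa^{\tau_t}_h \to \kappa_h$ as $t \to \infty$, we get $p_h = p_h + (1 - p_h)p_h$, and so $p_h = 0$ or 1.
Now assume that $p_h = 1$. Put $\sigma_0 = 0$, let $\sigma_1$ denote the end of the first
$D_h$-excursion, and recursively define $\sigma_{n+1} = \sigma_n + \sigma_1 \circ \theta_{\sigma_n}$. If all excursions
are finite, then clearly $\sigma_n < \infty$ a.s. for all $n$, and so $\kappa_h = \infty$ a.s. Thus,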
the last $D_h$-excursion is infinite when $\kappa_h < \infty$. We may now proceed as in
the proof of Proposition 8.15 to construct some i.i.d. processes $Y^1_h, Y^2_h, \ldots$
in $D_h$ such that $X$ has $D_h$-excursions $Y^j_h$, $j \le \kappa_h$. Since $\kappa_h$ is the number
of the first infinite excursion, we note in particular that $\kappa_h$ is geometrically
distributed with mean $q_h^{-1}$, where $q_h$ is the probability that $Y^1_h$ is
infinite. □
Now put $\hat{h} = \inf\{h > 0;\ \kappa_h = 0 \text{ a.s.}\}$. For any $h \in (0, \hat{h})$ we have
$\kappa_h \ge 1$ a.s., and we may define $\nu_h$ as the distribution of the first excursion
in $D_h$. The next result shows how the $\nu_h$ can be combined into a single
measure $\nu$ on $D_0$, the so-called excursion law of $X$. For convenience, we
write $\nu[\,\cdot \mid A] = \nu(\cdot \cap A)/\nu A$ whenever $0 < \nu A < \infty$.
Lemma 22.10 (excursion law, Itô) There exists a measure $\nu$ on $D_0$ such
that $\nu D_h \in (0, \infty)$ and $\nu_h = \nu[\,\cdot \mid D_h]$ for every $h \in (0, \hat{h})$. Furthermore, $\nu$
is unique up to a normalization, and it is bounded iff the recurrence time
is a.s. positive.
Proof: Fix any $h \le k$ in $(0, \hat{h})$, and let $Y^1_h, Y^2_h, \ldots$ be such as in Lemma
22.9. Then the first $D_k$-excursion is the first process $Y^j_h$ that belongs to
$D_k$, and since the $Y^j_h$ are i.i.d. $\nu_h$, we have
$$\nu_k = \nu_h[\,\cdot \mid D_k], \qquad 0 < h \le k < \hat{h}. \tag{14}$$
Now fix any $k \in (0, \hat{h})$, and define $\tilde{\nu}_h = \nu_h/\nu_h D_k$, $h \in (0, k]$. Then (14)
yields $\tilde{\nu}_{h'} = \tilde{\nu}_h(\cdot \cap D_{h'})$ for any $h \le h' \le k$, and so $\tilde{\nu}_h$ increases as $h \to 0$
toward a measure $\nu$ with $\nu(\cdot \cap D_h) = \tilde{\nu}_h$ for all $h \le k$. For any $h \in (0, \hat{h})$,
we get
$$\nu[\,\cdot \mid D_h] = \tilde{\nu}_{h \wedge k}[\,\cdot \mid D_h] = \nu_{h \wedge k}[\,\cdot \mid D_h] = \nu_h.$$
If $\nu'$ is another measure with the stated property, then
$$\frac{\nu(\cdot \cap D_h)}{\nu D_k} = \frac{\nu_h}{\nu_h D_k} = \frac{\nu'(\cdot \cap D_h)}{\nu' D_k}.$$
As $h \to 0$ for fixed $k$, we get $\nu = r\nu'$ with $r = \nu D_k/\nu' D_k$.
If the recurrence time is positive, then (14) remains true for $h = 0$, and
we may take $\nu = \nu_0$. Otherwise, let $h \le k$ in $(0, \hat{h})$, and denote by $\kappa_{h,k}$ the
number of $D_h$-excursions up to the first completed excursion in $D_k$. For
fixed $k$ we have $\kappa_{h,k} \to \infty$ a.s. as $h \to 0$, since $Z$ is perfect and nowhere
dense. Now $\kappa_{h,k}$ is geometrically distributed with mean
$$E\kappa_{h,k} = (\nu_h D_k)^{-1} = (\nu[D_k \mid D_h])^{-1} = \nu D_h/\nu D_k, \qquad h \le k < \hat{h},$$
and so $\nu D_h \to \infty$. Thus, $\nu$ is unbounded. □
When the regenerative set Z has a.s. only isolated points, then Lemma
22.9 already gives a complete description of the excursion structure. In the
complementary case when Z is a.s. perfect, we have the following funda-
mental representation in terms of a local time process L and an associated
Poisson point process $\xi$, both of which can be constructed from the array
of holding times and excursions.
Theorem 22.11 (excursion local time and Poisson process, Lévy, Itô) Let
$X$ be regenerative at $a$ and such that the closure of $Z = \{t;\ X_t = a\}$ is a.s.
perfect. Then there exist a nondecreasing, continuous, adapted process $L$
on $\mathbb{R}_+$ with support $\bar{Z}$ a.s., a Poisson process $\xi$ on $\mathbb{R}_+ \times D_0$ with intensity
measure of the form $\lambda \otimes \nu$, and a constant $c \ge 0$, such that $Z \cdot \lambda = cL$ a.s.
and the excursions of $X$ with associated $L$-values are given by the restriction
of $\xi$ to $[0, L_\infty]$. Furthermore, the product $\nu \cdot L$ is a.s. unique.
Proof (beginning): If $E\gamma = c > 0$, we may define $\nu = \nu_0/c$ and introduce
a Poisson process $\xi$ on $\mathbb{R}_+ \times D_0$ with intensity measure $\lambda \otimes \nu$. Let the points
of $\xi$ be $(\sigma_j, \tilde{Y}_j)$, $j \in \mathbb{N}$, and put $\sigma_0 = 0$. By Proposition 12.15 the differences
$\tilde{\gamma}_j = \sigma_j - \sigma_{j-1}$ are independent and exponentially distributed with mean $c$.
Furthermore, by Proposition 12.3 the processes $\tilde{Y}_j$ are independent of the
$\sigma_j$ and i.i.d. $\nu_0$. Letting $\tilde{\kappa}$ be the first index $j$ such that $\tilde{Y}_j$ is infinite, we
see from Lemmas 22.8 and 22.9 that
$$\{\gamma_j, Y_j;\ j \le \kappa\} \overset{d}{=} \{\tilde{\gamma}_j, \tilde{Y}_j;\ j \le \tilde{\kappa}\}, \tag{15}$$
where the quantities on the left are the holding times and subsequent excursions
of $X$. By Theorem 6.10 we may redefine $\xi$ such that (15) holds
a.s. The stated conditions then become fulfilled with $L = Z \cdot \lambda$.
Turning to the case when $E\gamma = 0$, we may define $\nu$ as in Lemma 22.10
and let $\xi$ be Poisson $\lambda \otimes \nu$, as before. For any $h \in (0, \hat{h})$, the points of $\xi$
in $\mathbb{R}_+ \times D_h$ may be enumerated from the left as $(\sigma^j_h, \tilde{Y}^j_h)$, $j \in \mathbb{N}$, and we
define $\tilde{\kappa}_h$ as the first index $j$ such that $\tilde{Y}^j_h$ is infinite. The processes $\tilde{Y}^j_h$ are
clearly i.i.d. $\nu_h$, and so by Lemma 22.9 we have
$$\{Y^j_h;\ j \le \kappa_h\} \overset{d}{=} \{\tilde{Y}^j_h;\ j \le \tilde{\kappa}_h\}, \qquad h \in (0, \hat{h}). \tag{16}$$
Since longer excursions form subarrays, the entire collections in (16) have
the same finite-dimensional distributions, and so by Theorem 6.10 we may
redefine $\xi$ such that all relations hold a.s.
Let $\tau^j_h$ be the right endpoint of the $j$th excursion in $D_h$, and define
$$L_t = \inf\{\sigma^j_h;\ h, j > 0,\ \tau^j_h > t\}, \qquad t \ge 0.$$
We need the obvious facts that, for any $t \ge 0$ and $h, j > 0$,
$$L_t < \sigma^j_h \ \Rightarrow\ t < \tau^j_h \ \Rightarrow\ L_t \le \sigma^j_h. \tag{17}$$
To see that $L$ is a.s. continuous, we may assume that (16) holds identically.
Since $\nu$ is infinite, we may further assume the set $\{\sigma^j_h;\ h, j > 0\}$ to be
dense in the interval $[0, L_\infty]$. If $\Delta L_t > 0$, there exist some $i, j, h > 0$ with
$L_{t-} < \sigma^i_h < \sigma^j_h < L_{t+}$. By (17) we get $t - \varepsilon < \tau^i_h < \tau^j_h < t + \varepsilon$ for every
$\varepsilon > 0$, which is impossible. Thus, $\Delta L_t = 0$ for all $t$.
To prove that $\bar{Z} \subset \operatorname{supp} L$ a.s., we may further assume $\bar{Z}_\omega$ to be perfect
and nowhere dense for each $\omega \in \Omega$. If $t \in \bar{Z}$, then for every $\varepsilon > 0$ there
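exist some $i, j, h > 0$ with $t - \varepsilon < \tau^i_h < \tau^j_h < t + \varepsilon$, and by (17) we
get $L_{t-\varepsilon} \le \sigma^i_h < \sigma^j_h \le L_{t+\varepsilon}$. Thus, $L_{t-\varepsilon} < L_{t+\varepsilon}$ for all $\varepsilon > 0$, and so
$t \in \operatorname{supp} L$. □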
In the perfect case, it remains to establish the a.s. relation $Z \cdot \lambda = cL$ for
a suitable $c$ and to show that $L$ is unique and adapted. To avoid repetition,
we postpone the proof of the former claim until Theorem 22.13. The latter
statements are immediate consequences of the following result, which also
suggests many explicit constructions of $L$. Let $\eta_t A$ denote the number of
excursions in a set $A \in \mathcal{D}_0$, completed at time $t \ge 0$. Note that $\eta$ is an
adapted, measure-valued process on $D_0$.
Proposition 22.12 (approximation) If $A_1, A_2, \ldots \in \mathcal{D}_0$ with $\infty >
\nu A_n \to \infty$, then
$$\sup_{t \le u} \Big| \frac{\eta_t A_n}{\nu A_n} - L_t \Big| \overset{P}{\to} 0, \qquad u \ge 0.$$
The same convergence holds a.s. when the $A_n$ are nested.
In particular, $\eta_t D_h/\nu D_h \to L_t$ a.s. as $h \to 0$ for fixed $t$. Thus, $L$ is a.s.
determined by the regenerative set $Z$.
Proof: Let $\xi$ be such as in Theorem 22.11, and put $\xi_s = \xi([0, s] \times \cdot)$.
First assume that the $A_n$ are nested. For any $s \ge 0$ we note that $(\xi_s A_n) \overset{d}{=}
(N_{s \nu A_n})$, where $N$ is a unit-rate Poisson process on $\mathbb{R}_+$. Since $t^{-1} N_t \to 1$
a.s. by the law of large numbers and the monotonicity of $N$, we get
$$\frac{\xi_s A_n}{\nu A_n} \to s \quad \text{a.s.}, \qquad s \ge 0. \tag{18}$$
As in case of Proposition 4.24, we may strengthen this to
$$\sup_{s \le r} \Big| \frac{\xi_s A_n}{\nu A_n} - s \Big| \to 0 \quad \text{a.s.}, \qquad r \ge 0. \tag{19}$$
Without the nestedness assumption, we may introduce a nested sequence
$A'_1, A'_2, \ldots$ with $\nu A'_n = \nu A_n$ for all $n$. Then (19) holds with $A_n$ replaced
by $A'_n$, and since the distributions on the left are the same for each $n$, the
formula for $A_n$ remains valid with convergence in probability. In both cases
we may clearly replace $r$ by any positive random variable. The asserted
convergence now follows, if we note that $\xi_{L_t-} \le \eta_t \le \xi_{L_t}$ for all $t \ge 0$ and use the
continuity of $L$. □
The excursion local time L is described most conveniently in terms of its
right-continuous inverse
$$T_s = L^{-1}_s = \inf\{t \ge 0;\ L_t > s\}, \qquad s \ge 0.$$
To state the next result, we introduce the subset $Z' \subset \bar{Z}$, obtained from
$\bar{Z}$ by omission of all points that are isolated from the right. Let us further
write $l(u)$ for the length of an excursion path $u \in D_0$.
Theorem 22.13 (inverse local time) Let $L$, $\xi$, $\nu$, and $c$ be such as
in Theorem 22.11. Then $T = L^{-1}$ is a generalized subordinator with
characteristics $(c, \nu \circ l^{-1})$ and a.s. range $Z'$ in $\mathbb{R}_+$, and we have a.s.
$$T_s = cs + \int_0^{s+}\!\!\int l(u)\,\xi(dr\,du), \qquad s \ge 0. \tag{20}$$
Proof: We may clearly discard the null set where $L$ is not continuous
with support $\bar{Z}$. If $T_s < \infty$ for some $s \ge 0$, then $T_s \in \operatorname{supp} L = \bar{Z}$ by
the definition of $T$, and since $L$ is continuous, we get $T_s \notin \bar{Z} \setminus Z'$. Thus,
$T(\mathbb{R}_+) \subset Z' \cup \{\infty\}$ a.s. Conversely, assume that $t \in Z'$. Then for any $\varepsilon > 0$
we have $L_{t+\varepsilon} > L_t$, and so $t \le T \circ L_t \le t + \varepsilon$. As $\varepsilon \to 0$, we get $T \circ L_t = t$.
Thus, $Z' \subset T(\mathbb{R}_+)$ a.s.
For each $s \ge 0$, the time $T_s$ is optional by Lemma 7.6. Furthermore, it is
clear from Proposition 22.12 that, as long as $T_s < \infty$, the process $\theta_s T - T_s$
is obtainable from $X \circ \theta_{T_s}$ by a measurable mapping that is independent
of $s$. By the regenerative property and Lemma 15.11, the process $T$ is then
a generalized subordinator, and in particular it admits a representation
as in Theorem 15.4. Since the jumps of $T$ agree with the lengths of the
excursion intervals, we obtain (20) for a suitable $c \ge 0$. By Lemma 1.22
the double integral in (20) equals $\int x\,(\xi_s \circ l^{-1})(dx)$, which shows that $T$ has
Lévy measure $E(\xi_1 \circ l^{-1}) = \nu \circ l^{-1}$.
Substituting $s = L_t$ into (20), we get a.s. for any $t \in Z'$
$$t = T \circ L_t = cL_t + \int_0^{L_t+}\!\!\int l(u)\,\xi(dr\,du) = cL_t + (Z^c \cdot \lambda)_t.$$
Hence, $cL_t = (Z \cdot \lambda)_t$ a.s., which extends by continuity to arbitrary $t \ge 0$. □
We may justify our terminology by showing that the semimartingale and
excursion local times agree whenever both exist.
Proposition 22.14 (reconciliation) Let the continuous semimartingale
$X$ in $\mathbb{R}$ be regenerative at some $a \in \mathbb{R}$ with $P\{L^a_\infty \ne 0\} > 0$. Then the
set $Z = \{t;\ X_t = a\}$ is a.s. perfect and nowhere dense, and $L^a$ is a version
of the excursion local time at $a$.
Proof: By Theorem 22.1 the state $a$ is nonabsorbing, and so $Z$ is nowhere
dense by Lemma 22.8. Since $P\{L^a_\infty \ne 0\} > 0$ and $L^a$ is a.s. continuous
with support in $Z$, Proposition 22.7 shows that $Z$ is a.s. perfect. Let $L$
be a version of the excursion local time at $a$, and put $T = L^{-1}$. Define
$Y_s = L^a \circ T_s$ for $s < L_\infty$, and let $Y_s = \infty$ otherwise. By the continuity of
$L^a$ we have $Y_{s\pm} = L^a \circ T_{s\pm}$ for every $s < L_\infty$. If $\Delta T_s > 0$, we note that
$L^a \circ T_{s-} = L^a \circ T_s$, since $(T_{s-}, T_s)$ is an excursion interval of $X$ and $L^a$ is
continuous with support in $Z$. Thus, $Y$ is a.s. continuous on $[0, L_\infty)$.
By Corollary 22.6 and Proposition 22.12 the process $\theta_s Y - Y_s$ is obtainable
from $\theta_{T_s} X$ through the same measurable mapping for all $s < L_\infty$. By
the regenerative property and Lemma 15.11 it follows that $Y$ is a generalized
subordinator, and so by Theorem 15.4 and the continuity of $Y$ there
exists some $c \ge 0$ with $Y_s \equiv cs$ a.s. on $[0, L_\infty)$. For $t \in Z'$ we have a.s.
$T \circ L_t = t$, and therefore
$$L^a_t = L^a \circ (T \circ L_t) = (L^a \circ T) \circ L_t = cL_t.$$
This extends to $\mathbb{R}_+$ since both extremes are continuous with support in
$Z$. □
For Brownian motion it is convenient to normalize local time according
to Tanaka's formula, which leads to a corresponding normalization of the
excursion law $\nu$. By the spatial homogeneity of Brownian motion, we may
restrict our attention to excursions from 0. The next result shows that excursions
of different length have the same distribution apart from a scaling.
For a precise statement, we may introduce the scaling operators $S_r$ on $D$,
given by
$$(S_r f)_t = r^{1/2} f_{t/r}, \qquad t \ge 0,\ r > 0,\ f \in D.$$
Theorem 22.15 (Brownian excursion) Let $\nu$ be the normalized excursion
law of Brownian motion. Then there exists a unique distribution $\hat{\nu}$ on the
set of excursions of unit length such that
$$\nu = (2\pi)^{-1/2} \int_0^\infty (\hat{\nu} \circ S_r^{-1})\, r^{-3/2}\,dr. \tag{21}$$
Proof: By Theorem 22.13 the inverse local time $L^{-1}$ is a subordinator
with Lévy measure $\nu \circ l^{-1}$, where $l(u)$ denotes the length of $u$. Furthermore,
$L \overset{d}{=} M$ by Corollary 22.3, where $M_t = \sup_{s \le t} B_s$, and so by Theorem
15.10 the measure $\nu \circ l^{-1}$ has density $(2\pi)^{-1/2} r^{-3/2}$, $r > 0$. As in Theorem
6.3, there exists a probability kernel $(\nu_r)$ from $(0, \infty)$ to $D_0$ such that
$\nu_r \circ l^{-1} \equiv \delta_r$ and
$$\nu = (2\pi)^{-1/2} \int_0^\infty \nu_r\, r^{-3/2}\,dr, \tag{22}$$
and we note that the measures $\nu_r$ are unique a.e. $\lambda$.
For any $r > 0$ the process $\tilde{B} = S_r B$ is again a Brownian motion, and by
Corollary 22.6 the local time of $\tilde{B}$ equals $\tilde{L} = S_r L$. If $B$ has an excursion $u$
ending at time $t$, then the corresponding excursion $S_r u$ of $\tilde{B}$ ends at $rt$, and
the local time for $\tilde{B}$ at the new excursion equals $\tilde{L}_{rt} = r^{1/2} L_t$. Thus, the
excursion process $\tilde{\xi}$ for $\tilde{B}$ is obtained from the process $\xi$ for $B$ through the
mapping $T_r \colon (s, u) \mapsto (r^{1/2} s, S_r u)$. Since $\tilde{\xi} \overset{d}{=} \xi$, each $T_r$ leaves the intensity
measure $\lambda \otimes \nu$ invariant, and we get
$$\nu \circ S_r^{-1} = r^{1/2} \nu, \qquad r > 0. \tag{23}$$
Combining (22) and (23), we get for any $r > 0$
$$\int_0^\infty (\nu_x \circ S_r^{-1})\, x^{-3/2}\,dx = r^{1/2} \int_0^\infty \nu_x\, x^{-3/2}\,dx = \int_0^\infty \nu_{rx}\, x^{-3/2}\,dx,$$
and by the uniqueness in (22) we obtain
$$\nu_x \circ S_r^{-1} = \nu_{rx}, \qquad x > 0 \text{ a.e. } \lambda,\ r > 0.$$
By Fubini's theorem, we may then fix an $x = c > 0$ such that
$$\nu_c \circ S_r^{-1} = \nu_{cr}, \qquad r > 0 \text{ a.e. } \lambda.$$
Define $\hat{\nu} = \nu_c \circ S_{1/c}^{-1}$, and conclude that for almost every $r > 0$
$$\nu_r = \nu_{c(r/c)} = \nu_c \circ S_{r/c}^{-1} = \nu_c \circ S_{1/c}^{-1} \circ S_r^{-1} = \hat{\nu} \circ S_r^{-1}.$$
Substituting this into (22) yields equation (21).
If $\mu$ is another probability measure with the stated properties, then for
almost every $r > 0$ we have $\mu \circ S_r^{-1} = \hat{\nu} \circ S_r^{-1}$, and hence
$$\mu = \mu \circ S_r^{-1} \circ S_{1/r}^{-1} = \hat{\nu} \circ S_r^{-1} \circ S_{1/r}^{-1} = \hat{\nu}.$$
Thus, $\hat{\nu}$ is unique. □
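The density $(2\pi)^{-1/2} r^{-3/2}$ of $\nu \circ l^{-1}$ used in the proof can be checked numerically against the Laplace exponent of the inverse local time. The sketch below relies on one fact not stated in the text: since $L \overset{d}{=} M$, the inverse $T_s$ is distributed as the first-passage time of Brownian motion to level $s$, whose Laplace transform is the standard $e^{-s\sqrt{2u}}$. The integral $\int_0^\infty (1 - e^{-ur})\,(2\pi)^{-1/2} r^{-3/2}\,dr$ should therefore equal $\sqrt{2u}$. The function names, the cutoff `x_max`, and the grid size are illustrative assumptions.

```python
import math

def levy_density(r):
    """Density (2*pi)^(-1/2) * r^(-3/2) of the excursion-length measure."""
    return r ** -1.5 / math.sqrt(2.0 * math.pi)

def laplace_exponent(u, x_max=50.0, n=200_000):
    """Approximate int_0^inf (1 - exp(-u r)) * levy_density(r) dr.

    Substituting r = x^2 gives (2/sqrt(2 pi)) * int_0^inf (1 - exp(-u x^2)) / x^2 dx,
    whose integrand is bounded near 0. Simpson's rule handles [0, x_max];
    beyond x_max the integrand is taken as 1/x^2, which integrates to 1/x_max.
    This tail approximation assumes that u * x_max^2 is large.
    """
    def integrand(x):
        return u if x == 0.0 else -math.expm1(-u * x * x) / (x * x)

    step = x_max / n  # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(x_max)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * step)
    body = total * step / 3.0
    tail = 1.0 / x_max
    return 2.0 * (body + tail) / math.sqrt(2.0 * math.pi)

errors = [abs(laplace_exponent(u) - math.sqrt(2.0 * u)) for u in (0.5, 1.0, 2.0)]
```

The list `errors` then holds the discrepancies between the numerical integral and $\sqrt{2u}$ at three sample values of $u$.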
By continuity of paths, an excursion of Brownian motion is either positive
or negative, and by symmetry the two possibilities have the same probability
under $\hat{\nu}$. This leads to the further decomposition $\hat{\nu} = \tfrac{1}{2}(\hat{\nu}_+ + \hat{\nu}_-)$. A
process with distribution $\hat{\nu}_+$ is called a (normalized) Brownian excursion.
For subsequent needs, we continue with a simple computation.
Lemma 22.16 (height distribution) Let $\nu$ be the excursion law of
Brownian motion. Then
$$\nu\{u \in D_0;\ \sup\nolimits_t u_t > h\} = (2h)^{-1}, \qquad h > 0.$$
Proof: By Tanaka's formula the process $M = 2(B \vee 0) - L^0 = B + |B| - L^0$
is a martingale, and so we get for $\tau = \inf\{t \ge 0;\ B_t = h\}$
$$E L^0_{\tau \wedge t} = 2E(B_{\tau \wedge t} \vee 0), \qquad t \ge 0.$$
Hence, by monotone and dominated convergence $E L^0_\tau = 2E(B_\tau \vee 0) =
2h$. On the other hand, Theorem 22.11 shows that $L^0_\tau$ is exponentially
distributed with mean $(\nu A_h)^{-1}$, where $A_h = \{u;\ \sup_t u_t \ge h\}$. □
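A discrete analogue may help to make the value $(2h)^{-1}$ plausible. The sketch below is an illustration outside the text: it considers excursions of simple random walk from 0 (an assumption, not the Brownian setting of the lemma) and computes exactly, by a gambler's-ruin recursion, the probability that an excursion reaches an integer level $h$. The function name is hypothetical.

```python
from fractions import Fraction

def reach_probability(h):
    """P{an excursion of simple random walk from 0 reaches level h}, for integer h >= 1.

    The first step goes up with probability 1/2. From level 1 the probabilities
    p_k of hitting h before 0 are harmonic: p_k = (p_{k-1} + p_{k+1}) / 2 with
    p_0 = 0 and p_h = 1. We shoot from p_1 = 1 and rescale afterwards.
    """
    prev, cur = Fraction(0), Fraction(1)
    for _ in range(h - 1):
        prev, cur = cur, 2 * cur - prev  # harmonic continuation to the next level
    p_one = 1 / cur                      # rescale so that the value at level h is 1
    return Fraction(1, 2) * p_one

probabilities = {h: reach_probability(h) for h in range(1, 6)}
```

In this discrete model the probability comes out as $1/(2h)$, matching the form of the height distribution in Lemma 22.16.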
The following result gives some remarkably precise information about
the spatial behavior of Brownian local time.
Theorem 22.17 (space dependence, Ray, Knight) For Brownian motion
$B$ with local time $L$, let $\tau = \inf\{t > 0;\ B_t = 1\}$. Then on $[0, 1]$ the process
$S_t = L^{1-t}_\tau$ is a squared Bessel process of order 2.
Several proofs are known. Here we derive the result as an application of
the previously developed excursion theory.
Proof (Walsh): Fix any $u \in [0, 1]$, put $\sigma = L^u_\tau$, and let $\xi^\pm$ denote the
Poisson processes of positive and negative excursions from $u$. Write $Y$ for
the process $B$, stopped when it first hits $u$. Then $Y \perp\!\!\!\perp (\xi^+, \xi^-)$ and $\xi^+ \perp\!\!\!\perp \xi^-$,
so $\xi^+ \perp\!\!\!\perp (\xi^-, Y)$. Since $\sigma$ is $\xi^+$-measurable, we obtain $\xi^+ \perp\!\!\!\perp_\sigma (\xi^-, Y)$ and
hence $\xi^+_\sigma \perp\!\!\!\perp_\sigma (\xi^-_\sigma, Y)$, which implies the Markov property of $L^x_\tau$ at $x = u$.
To derive the corresponding transition kernels, fix any $x \in [0, u)$, and
write $h = u - x$. Put $\tau_0 = 0$, and let $\tau_1, \tau_2, \ldots$ be the right endpoints of
those excursions from $x$ that reach $u$. Next define $\zeta_k = L^x_{\tau_{k+1}} - L^x_{\tau_k}$, $k \ge 0$,
so that $L^x_\tau = \zeta_0 + \cdots + \zeta_\kappa$ with $\kappa = \sup\{k;\ \tau_k < \tau\}$. By Lemma 22.16 the
variables $\zeta_k$ are i.i.d. and exponentially distributed with mean $2h$. Since $\kappa$
agrees with the number of completed $u$-excursions before time $\tau$ that reach
$x$ and since $\sigma \perp\!\!\!\perp \xi^-$, it is further seen that $\kappa$ is conditionally Poisson $\sigma/2h$,
given $\sigma$.
We also need the fact that $(\sigma, \kappa) \perp\!\!\!\perp (\zeta_0, \zeta_1, \ldots)$. To see this, define $\sigma_k =
L^u_{\tau_k}$. Since $\xi^-$ is Poisson, we note that $(\sigma_1, \sigma_2, \ldots) \perp\!\!\!\perp (\zeta_1, \zeta_2, \ldots)$, and so
$(\sigma, \sigma_1, \sigma_2, \ldots) \perp\!\!\!\perp (Y, \zeta_1, \zeta_2, \ldots)$. The desired relation now follows, since $\kappa$ is
a measurable function of $(\sigma, \sigma_1, \sigma_2, \ldots)$ and $\zeta_0$ depends measurably on $Y$.
For any $s \ge 0$, we may now compute
$$E\big[e^{-s L^{u-h}_\tau} \,\big|\, \sigma\big] = E\big[(E e^{-s\zeta_0})^{\kappa+1} \,\big|\, \sigma\big] = E\big[(1 + 2sh)^{-\kappa-1} \,\big|\, \sigma\big] = (1 + 2sh)^{-1} \exp\Big\{\frac{-s\sigma}{1 + 2sh}\Big\}.$$
alent, via the substitutions u == 1- t and 2s == (a - t)-l, to the martingale
property of the process
{ _L1-t }
Mt = (a - t)-1 exp 2(a =- t) ,
for arbitrary a > o.
Now let X be a squared Bessel process of order 2, and note that
L = Xo = 0 by Theorem 22.4. By Corollary 13.12 the process X is
again Markov. To see that X has the same transition kernel as L-t, it
is enough to show for 'an arbitrary a > 0 that the process M in (24) re-
mains a martingale when L-t is replaced by Xt. This is easily verified by
means of Ita's formula, if we note that X is a weak solution to the SDE
dX t = 2Xt1/2 dBt + 2dt. 0
t E [0, a),
(24)
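The transition kernel obtained in the proof, $(1 + 2sh)^{-1} \exp\{-s\sigma/(1 + 2sh)\}$, can be compared numerically with the Laplace transform of a squared Bessel process of order 2. The sketch assumes the usual realization of that process as $|W|^2$ for a planar Brownian motion $W$, started at distance $\sigma^{1/2}$ from the origin; this realization, the function names, and the quadrature parameters are assumptions made for illustration.

```python
import math

def squared_gaussian_laplace(mean, var, s, half_width=12.0, n=4000):
    """Simpson approximation of E exp(-s * (mean + sqrt(var) * Z)^2), Z standard normal."""
    def integrand(z):
        x = mean + math.sqrt(var) * z
        return math.exp(-s * x * x - 0.5 * z * z) / math.sqrt(2.0 * math.pi)

    step = 2.0 * half_width / n  # n must be even for Simpson's rule
    total = integrand(-half_width) + integrand(half_width)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(-half_width + i * step)
    return total * step / 3.0

def besq2_laplace_numeric(sigma, h, s):
    """E exp(-s |W_h|^2) for planar Brownian motion W started at (sqrt(sigma), 0)."""
    return (squared_gaussian_laplace(math.sqrt(sigma), h, s)
            * squared_gaussian_laplace(0.0, h, s))

def ray_knight_kernel(sigma, h, s):
    """Closed form (1 + 2 s h)^(-1) exp(-s sigma / (1 + 2 s h)) from the proof."""
    return math.exp(-s * sigma / (1.0 + 2.0 * s * h)) / (1.0 + 2.0 * s * h)

gap = abs(besq2_laplace_numeric(1.5, 0.3, 0.7) - ray_knight_kernel(1.5, 0.3, 0.7))
```

The two coordinates of $W_h$ are independent Gaussians with variance $h$, so the numerical value is a product of two one-dimensional integrals; `gap` measures its distance from the closed form at one sample triple $(\sigma, h, s)$.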
As an important application of the last result, we may show that the
local time is strictly positive on the range of the process.
Corollary 22.18 (range and support) Let $M$ be a continuous local
martingale with local time $L$. Then outside a fixed P-null set,
$$\{L^x_t > 0\} = \{\inf\nolimits_{s \le t} M_s < x < \sup\nolimits_{s \le t} M_s\}, \qquad x \in \mathbb{R},\ t \ge 0. \tag{25}$$
Proof: By Corollary 22.6 and the continuity of $L$, we have $L^x_t = 0$ for $x$
outside the interval in (25), except on a fixed P-null set. To see that $L^x_t > 0$
otherwise, we may reduce by Theorem 18.3 and Corollary 22.6 to the case
when $M$ is a Brownian motion $B$. Letting $\tau_u = \inf\{t \ge 0;\ B_t = u\}$, we see
from Theorems 18.6 (i) and 18.16 that, outside a fixed P-null set,
$$L^x_{\tau_u} > 0, \qquad 0 \le x < u \in \mathbb{Q}_+. \tag{26}$$
If $0 \le x < \sup_{s \le t} B_s$ for some $t$ and $x$, there exists some $u \in \mathbb{Q}_+$ with
$x < u < \sup_{s \le t} B_s$. But then $\tau_u < t$, and (26) yields $L^x_t \ge L^x_{\tau_u} > 0$. A
similar argument applies to the case when $\inf_{s \le t} B_s < x \le 0$. □
Our third approach to local times is via additive functionals and their potentials.
To introduce those, consider a canonical Feller process $X$ with state
space $S$, associated terminal time $\zeta$, probability measures $P_x$, transition
operators $T_t$, shift operators $\theta_t$, and filtration $\mathcal{F}$. By a continuous additive
functional (CAF) of $X$ we mean a nondecreasing, continuous, adapted
process $A$ with $A_0 = 0$ and $A_{\zeta \vee t} \equiv A_\zeta$, and such that
$$A_{s+t} = A_s + A_t \circ \theta_s \quad \text{a.s.}, \qquad s, t \ge 0, \tag{27}$$
where a.s. without qualification means $P_x$-a.s. for every $x$. By the continuity
of $A$, we may choose the exceptional null set to be independent of $t$. If it
can also be taken to be independent of $s$, then $A$ is said to be perfect.
For a simple example, let $f \ge 0$ be a bounded, measurable function on
$S$, and consider the associated elementary CAF
$$A_t = \int_0^t f(X_s)\,ds, \qquad t \ge 0. \tag{28}$$
More generally, given any CAF $A$ and a function $f$ as above, we may define
a new CAF $f \cdot A$ by $(f \cdot A)_t = \int_{s \le t} f(X_s)\,dA_s$, $t \ge 0$. A less trivial example
is given by the local time of $X$ at a fixed point $x$, whenever it exists in
either sense discussed earlier.
For any CAF $A$ and constant $\alpha \ge 0$, we may introduce the associated
$\alpha$-potential
$$U^\alpha_A(x) = E_x \int_0^\infty e^{-\alpha t}\,dA_t, \qquad x \in S,$$
and put $U^\alpha_A f = U^\alpha_{f \cdot A}$. In the special case when $A_t \equiv t \wedge \zeta$, we shall often
write $U^\alpha f = U^\alpha_A f$. Note in particular that $U^\alpha_A = U^\alpha f = R_\alpha f$ when $A$ is
given by (28). If $\alpha = 0$, we may omit the superscript and write $U = U^0$
and $U_A = U^0_A$. The next result shows that a CAF is determined by its
$\alpha$-potential whenever the latter is finite.
Lemma 22.19 (uniqueness) Let $A$ and $B$ be CAFs of a Feller process $X$
such that $U^\alpha_A = U^\alpha_B < \infty$ for some $\alpha \ge 0$. Then $A = B$ a.s.
Proof: Define $A^\alpha_t = \int_{s \le t} e^{-\alpha s}\,dA_s$, and conclude from (27) and the
Markov property at $t$ that, for any $x \in S$,
$$E_x[A^\alpha_\infty \mid \mathcal{F}_t] - A^\alpha_t = e^{-\alpha t} E_x[A^\alpha_\infty \circ \theta_t \mid \mathcal{F}_t] = e^{-\alpha t} U^\alpha_A(X_t). \tag{29}$$
Comparing with the same relation for $B$, it follows that $A^\alpha - B^\alpha$ is a
continuous $P_x$-martingale of finite variation, and so $A^\alpha = B^\alpha$ a.s. $P_x$ by
Proposition 17.2. Since $x$ was arbitrary, we get $A = B$ a.s. □
Given any CAF $A$ of Brownian motion in $\mathbb{R}^d$, we may introduce the
associated Revuz measure $\nu_A$, given for any measurable function $g \ge 0$ on
22. Local Time, Excursions, and Additive Functionals 443
}Rd by VAg = E (g.A)l' where E == J Exdx. When A is given by (28), \ve get
in particular VAg == (I, g), where (.,.) denotes the inner product in L 2 (JR d ).
In general, we need to prove that v A is a-finite.
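Lemma 22.20 ($\sigma$-finiteness) For any CAF $A$ of Brownian motion $X$ in
$\mathbb{R}^d$, the associated Revuz measure $\nu_A$ is $\sigma$-finite.
Proof: Fix any integrable function $f > 0$ on $\mathbb{R}^d$, and define
\[
g(x) = E_x \int_0^\infty e^{-t - A_t} f(X_t)\,dt, \qquad x \in \mathbb{R}^d.
\]
Using Corollary 19.19, the additivity of $A$, and Fubini's theorem, we get
\begin{align*}
U_A^1 g(x) &= E_x \int_0^\infty e^{-t}\,dA_t\, E_{X_t} \int_0^\infty e^{-s - A_s} f(X_s)\,ds \\
&= E_x \int_0^\infty e^{-t}\,dA_t \int_0^\infty e^{-s - A_s \circ \theta_t} f(X_{s+t})\,ds \\
&= E_x \int_0^\infty e^{A_t}\,dA_t \int_t^\infty e^{-s - A_s} f(X_s)\,ds \\
&= E_x \int_0^\infty e^{-s - A_s} f(X_s)\,ds \int_0^s e^{A_t}\,dA_t \\
&= E_x \int_0^\infty e^{-s}(1 - e^{-A_s}) f(X_s)\,ds \le E_0 \int_0^\infty e^{-s} f(X_s + x)\,ds.
\end{align*}
Hence, by Fubini's theorem,
\begin{align*}
e^{-1} \nu_A g &\le \int U_A^1 g(x)\,dx \le \int dx\, E_0 \int_0^\infty e^{-s} f(X_s + x)\,ds \\
&= E_0 \int_0^\infty e^{-s}\,ds \int f(X_s + x)\,dx = \int f(x)\,dx < \infty.
\end{align*}
The assertion now follows since $g > 0$. □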
Now let $p_t(x)$ denote the transition density $(2\pi t)^{-d/2} e^{-|x|^2/2t}$ of Brownian
motion in $\mathbb{R}^d$, and put $u^\alpha(x) = \int_0^\infty e^{-\alpha t} p_t(x)\,dt$. For any measure $\mu$
on $\mathbb{R}^d$, we may introduce the associated $\alpha$-potential $U^\alpha \mu(x) = \int u^\alpha(x - y)\,\mu(dy)$.
The following result shows that the Revuz measure has the same
potential as the underlying CAF.
Theorem 22.21 ($\alpha$-potentials, Hunt, Revuz) For Brownian motion in
$\mathbb{R}^d$, let $A$ be a CAF with Revuz measure $\nu_A$. Then $U_A^\alpha = U^\alpha \nu_A$ for all
$\alpha \ge 0$.
Proof: By monotone convergence we may assume that $\alpha > 0$. By Lemma
22.20 we may choose some positive functions $f_n \uparrow 1$ such that $\nu_{f_n \cdot A} 1 =
\nu_A f_n < \infty$ for each $n$, and by dominated convergence we have $U_{f_n \cdot A}^\alpha \uparrow U_A^\alpha$
and $U^\alpha \nu_{f_n \cdot A} \uparrow U^\alpha \nu_A$. Thus, we may further assume that $\nu_A$ is bounded.
In that case, clearly $U_A^\alpha < \infty$ a.e.
Now fix any bounded, continuous function $f \ge 0$ on $\mathbb{R}^d$, and note that
by dominated convergence $U^\alpha f$ is again bounded and continuous. Writing
$h = n^{-1}$ for an arbitrary $n \in \mathbb{N}$, we get by dominated convergence and the
additivity of $A$
\[
\nu_A U^\alpha f = \bar{E} \int_0^1 U^\alpha f(X_s)\,dA_s = \lim_{n \to \infty} \bar{E} \sum_{j < n} U^\alpha f(X_{jh})\, A_h \circ \theta_{jh}.
\]
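Noting that the operator $U^\alpha$ is self-adjoint and using the Markov property,
we may write the expression on the right as
\[
\sum_{j < n} \bar{E}\, U^\alpha f(X_{jh})\, E_{X_{jh}} A_h = n \int U^\alpha f(x)\, E_x A_h\,dx = n \langle f, U^\alpha E_\cdot A_h \rangle.
\]
To estimate the function $U^\alpha E_\cdot A_h$ on the right, it is enough to consider
arguments $x$ such that $U_A^\alpha(x) < \infty$. Using the Markov property of $X$ and
the additivity of $A$, we get
\begin{align*}
U^\alpha E_\cdot A_h(x) &= E_x \int_0^\infty e^{-\alpha s} E_{X_s} A_h\,ds = E_x \int_0^\infty e^{-\alpha s} (A_h \circ \theta_s)\,ds \\
&= E_x \int_0^\infty e^{-\alpha s} (A_{s+h} - A_s)\,ds \\
&= (e^{\alpha h} - 1)\, E_x \int_0^\infty e^{-\alpha s} A_s\,ds - e^{\alpha h} E_x \int_0^h e^{-\alpha s} A_s\,ds. \tag{30}
\end{align*}
Integrating by parts gives
\[
E_x \int_0^\infty e^{-\alpha s} A_s\,ds = \alpha^{-1} E_x \int_0^\infty e^{-\alpha t}\,dA_t = \alpha^{-1} U_A^\alpha(x).
\]
Thus, as $n = h^{-1} \to \infty$, the first term on the right of (30) yields in the
limit the contribution $\langle f, U_A^\alpha \rangle$. The second term is negligible since
\[
\langle f, E_\cdot A_h \rangle \lesssim \bar{E} A_h = h\, \nu_A 1 \to 0.
\]
Hence,
\[
\langle U^\alpha \nu_A, f \rangle = \nu_A U^\alpha f = \langle U_A^\alpha, f \rangle,
\]
and since $f$ is arbitrary, we obtain $U_A^\alpha = U^\alpha \nu_A$ a.e.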
To extend this to an identity, fix any $h > 0$ and $x \in \mathbb{R}^d$. Using the
additivity of $A$, the Markov property at $h$, the a.e. relation, Fubini's theorem,
and the Chapman-Kolmogorov relation, we get
\begin{align*}
e^{\alpha h} E_x \int_h^\infty e^{-\alpha s}\,dA_s &= E_x \int_0^\infty e^{-\alpha s}\,dA_s \circ \theta_h \\
&= E_x U_A^\alpha(X_h) = E_x U^\alpha \nu_A(X_h) \\
&= \int \nu_A(dy)\, E_x u^\alpha(X_h - y) \\
&= e^{\alpha h} \int \nu_A(dy) \int_h^\infty e^{-\alpha s} p_s(x - y)\,ds.
\end{align*}
The required relation $U_A^\alpha(x) = U^\alpha \nu_A(x)$ now follows by monotone
convergence as $h \to 0$. □
It is now easy to show that a CAF is determined by its Revuz measure.
Corollary 22.22 (uniqueness) If $A$ and $B$ are CAFs of Brownian motion
in $\mathbb{R}^d$ with $\nu_A = \nu_B$, then $A = B$ a.s.
Proof: By Lemma 22.20 we may assume that $\nu_A$ is bounded, so that
$U_A^\alpha < \infty$ a.e. for all $\alpha > 0$. Now $\nu_A$ determines $U_A^\alpha$ by Theorem 22.21,
and from the proof of Lemma 22.19 we note that $U_A^\alpha$ determines $A$ a.s. $P_x$
whenever $U_A^\alpha(x) < \infty$. Since $P_x \circ X_h^{-1} \ll \lambda^d$ for each $h > 0$, it follows that
$A \circ \theta_h$ is a.s. unique, and it remains to let $h \to 0$. □
We turn to the reverse problem of constructing a CAF associated with a
given potential. To motivate the following definition, we may take expected
values in (29) to get $e^{-\alpha t} T_t U_A^\alpha \le U_A^\alpha$. A function $f$ on $S$ is said to be
uniformly $\alpha$-excessive if it is bounded and measurable with $0 \le e^{-\alpha t} T_t f \le f$
for all $t \ge 0$ and such that $\|T_t f - f\| \to 0$ as $t \to 0$, where $\|\cdot\|$ denotes
the supremum norm.
Theorem 22.23 (excessive functions and CAFs, Volkonsky) For any
Feller process $X$ in $S$ and constant $\alpha > 0$, let $f \ge 0$ be a uniformly
$\alpha$-excessive function on $S$. Then $f = U_A^\alpha$ for some a.s. unique, perfect CAF
$A$ of $X$.
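Proof: For any bounded, measurable function $g$ on $S$, we get by Fubini's
theorem and the Markov property of $X$
\begin{align*}
\tfrac{1}{2} E_x \Bigl| \int_0^\infty e^{-\alpha t} g(X_t)\,dt \Bigr|^2
&= E_x \int_0^\infty e^{-\alpha t} g(X_t)\,dt \int_0^\infty e^{-\alpha(t+h)} g(X_{t+h})\,dh \\
&= E_x \int_0^\infty e^{-2\alpha t} g(X_t)\,dt \int_0^\infty e^{-\alpha h} T_h g(X_t)\,dh \\
&= E_x \int_0^\infty e^{-2\alpha t} g U^\alpha g(X_t)\,dt = \int_0^\infty e^{-2\alpha t} T_t g U^\alpha g(x)\,dt \\
&\le \|U^\alpha g\| \int_0^\infty e^{-\alpha t} T_t |g|(x)\,dt \le \|U^\alpha g\|\, \|U^\alpha |g|\|. \tag{31}
\end{align*}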
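Now introduce for each $h > 0$ the bounded, nonnegative functions
\begin{align*}
g_h &= h^{-1}(f - e^{-\alpha h} T_h f), \\
f_h &= U^\alpha g_h = h^{-1} \int_0^h e^{-\alpha s} T_s f\,ds,
\end{align*}
and define
\[
A_h(t) = \int_0^t g_h(X_s)\,ds, \qquad M_h(t) = A_h^\alpha(t) + e^{-\alpha t} f_h(X_t),
\]
where $A_h^\alpha(t) = \int_0^t e^{-\alpha s}\,dA_h(s)$ as in the proof of Lemma 22.19.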
As in (29), we note that the processes $M_h$ are martingales under $P_x$ for
every $x$. Using the continuity of the $A_h$, we get by Proposition 7.16 and
(31), for any $x \in S$ and as $h, k \to 0$,
\begin{align*}
E_x (A_h^\alpha - A_k^\alpha)^{*2} &\lesssim E_x \sup_{t \in \mathbb{Q}_+} |M_h(t) - M_k(t)|^2 + \|f_h - f_k\|^2 \\
&\lesssim E_x |A_h^\alpha(\infty) - A_k^\alpha(\infty)|^2 + \|f_h - f_k\|^2 \\
&\lesssim \|f_h - f_k\|\, \|f_h + f_k\| + \|f_h - f_k\|^2 \to 0.
\end{align*}
Hence, there exists some continuous process $A^\alpha$ independent of $x$ such that
$E_x (A_h^\alpha - A^\alpha)^{*2} \to 0$ for every $x$.
For a suitable sequence $h_n \to 0$ we have $(A_{h_n}^\alpha - A^\alpha)^* \to 0$ a.s. $P_x$ for all
$x$, and it follows easily that $A$ is a.s. a perfect CAF. Taking limits in the
relation $f_h(x) = E_x A_h^\alpha(\infty)$, we also note that $f(x) = E_x A^\alpha(\infty) = U_A^\alpha(x)$.
Thus, $A$ has $\alpha$-potential $f$. □
We will now use the last result to construct local times. Let us say that a
CAF $A$ is supported by some set $B \subset S$ if its set of increase is a.s. contained
in the closure of the set $\{t \ge 0;\, X_t \in B\}$. In particular, a nonzero and
perfect CAF supported by a singleton set $\{x\}$ is called a local time at $x$.
This terminology is clearly consistent with our earlier definitions of local
time. Writing $\tau_x = \inf\{t > 0;\, X_t = x\}$, we say that $x$ is regular (for itself)
if $\tau_x = 0$ a.s. $P_x$. By Proposition 22.7 this holds iff $P_x$-a.s. the random set
$Z_x = \{t \ge 0;\, X_t = x\}$ has no isolated points.
Theorem 22.24 (additive functional local time, Blumenthal and Getoor)
A Feller process in $S$ has a local time $L$ at a point $a \in S$ iff $a$ is regular.
In that case $L$ is a.s. unique up to a normalization, and
\[
U_L^1(x) = U_L^1(a)\, E_x e^{-\tau_a} < \infty, \qquad x \in S.
\tag{32}
\]
Proof: Let $L$ be a local time at $a$. Comparing with the renewal process
$L_n^{-1}$, $n \in \mathbb{Z}_+$, we see that $\sup_{x,t} E_x(L_{t+h} - L_t) < \infty$ for every $h > 0$, which
implies $U_L^1(x) < \infty$ for all $x$. By the strong Markov property at $\tau = \tau_a$, we
get for any $x \in S$
\begin{align*}
U_L^1(x) &= E_x(L_\infty^1 - L_\tau^1) = E_x e^{-\tau} (L_\infty^1 \circ \theta_\tau) \\
&= E_x e^{-\tau} E_a L_\infty^1 = U_L^1(a)\, E_x e^{-\tau},
\end{align*}
proving (32). The uniqueness assertion now follows by Lemma 22.19.
To prove the existence of $L$, define $f(x) = E_x e^{-\tau}$, and note that $f$ is
bounded and measurable. Since $\tau \le t + \tau \circ \theta_t$, we may also conclude from
the Markov property at $t$ that, for any $x \in S$,
\begin{align*}
f(x) = E_x e^{-\tau} &\ge e^{-t} E_x (e^{-\tau} \circ \theta_t) \\
&= e^{-t} E_x E_{X_t} e^{-\tau} = e^{-t} E_x f(X_t) = e^{-t} T_t f(x).
\end{align*}
Noting that $\sigma_t = t + \tau \circ \theta_t$ is nondecreasing and tends to 0 a.s. $P_a$ as $t \to 0$
by the regularity of $a$, we further obtain
\begin{align*}
0 \le f(x) - e^{-h} T_h f(x) &= E_x (e^{-\tau} - e^{-\sigma_h}) \le E_x (e^{-\tau} - e^{-\tau - \sigma_h \circ \theta_\tau}) \\
&= E_x e^{-\tau} E_a (1 - e^{-\sigma_h}) \le E_a (1 - e^{-\sigma_h}) \to 0.
\end{align*}
Thus, $f$ is uniformly 1-excessive, and so by Theorem 22.23 there exists a
perfect CAF $L$ with $U_L^1 = f$.
To see that $L$ is supported by the singleton $\{a\}$, we may write
\[
E_x (L_\infty^1 - L_\tau^1) = E_x e^{-\tau} E_a L_\infty^1 = E_x e^{-\tau} E_a e^{-\tau} = E_x e^{-\tau} = E_x L_\infty^1,
\]
which implies $L_\tau^1 = 0$ a.s. Hence, $L_\tau = 0$ a.s., and so the Markov property
yields $L_{\sigma_t} = L_t$ a.s. for all rational $t$. This shows that $L$ has a.s. no point
of increase outside the closure of $\{t \ge 0;\, X_t = a\}$. □
The next result shows that every CAF of one-dimensional Brownian
motion is a unique mixture of local times. Recall that $\nu_A$ denotes the Revuz
measure of the CAF $A$.
Theorem 22.25 (integral representation, Volkonsky, McKean and Tanaka)
For Brownian motion $X$ in $\mathbb{R}$ with local time $L$, a process $A$ is a CAF of
$X$ iff it has an a.s. representation
\[
A_t = \int_{-\infty}^\infty L_t^x\, \nu(dx), \qquad t \ge 0,
\tag{33}
\]
for some locally finite measure $\nu$ on $\mathbb{R}$. The latter is then unique and
equals $\nu_A$.
Proof: For any measure $\nu$ we may define an associated process $A$ as in
(33). If $\nu$ is locally finite, it is clear by the continuity of $L$ and dominated
convergence that $A$ is a.s. continuous, hence a CAF. In the opposite case,
we note that $\nu$ is infinite in every neighborhood of some point $a \in \mathbb{R}$. Under
$P_a$ and for any $t > 0$, the process $L_t^x$ is further a.s. continuous and strictly
positive near $x = a$. Hence, $A_t = \infty$ a.s. $P_a$, and $A$ fails to be a CAF.
Next, we conclude from Fubini's theorem and Theorem 22.5 that
\[
\bar{E} L_1^x = \int (E_y L_1^x)\,dy = E_0 \int L_1^{x-y}\,dy = 1.
\]
Since $L^x$ is supported by $\{x\}$, we get for any CAF $A$ as in (33)
\begin{align*}
\nu_A f = \bar{E}(f \cdot A)_1 &= \bar{E} \int \nu(dx) \int_0^1 f(X_t)\,dL_t^x \\
&= \int f(x)\, \nu(dx)\, \bar{E} L_1^x = \nu f,
\end{align*}
which shows that $\nu = \nu_A$.
Now consider an arbitrary CAF $A$. By Lemma 22.20 there exists some
function $f > 0$ with $\nu_A f < \infty$. The process
\[
B_t = \int L_t^x\, \nu_{f \cdot A}(dx) = \int L_t^x f(x)\, \nu_A(dx), \qquad t \ge 0,
\]
is then a CAF with $\nu_B = \nu_{f \cdot A}$, and by Corollary 22.22 we get $B = f \cdot A$
a.s. Thus, $A = f^{-1} \cdot B$ a.s., and (33) follows. □
Exercises
1. Use Lemma 13.15 to show that the set of increase of Brownian local time
at 0 agrees a.s. with the zero set Z. Extend the result to any continuous
local martingale. (Hint: Apply Lemma 13.15 to the process sgn(B-) . B in
Theorem 22.1.)
2. (Levy) Let M be the maximum process of a Brownian motion B. Show
that B can be measurably recovered from M - B. (Hint: Use Corollaries
22.3 and 22.6.)
3. Use Corollary 22.3 to give a simple proof of the relation $\tau_2 \overset{d}{=} \tau_3$ in
Theorem 13.16. (Hint: Recall that the maximum is unique by Lemma 13.15.)
Also use Proposition 18.9 to give a direct proof of the relation $\tau_1 \overset{d}{=} \tau_2$.
(Hint: Integrate separately over the positive and negative excursions of B,
and use Lemma 13.15 to identify the minimum.)
4. Show that for any $c \in (0, \frac{1}{2})$, Brownian local time $L_t^x$ is a.s. Hölder
continuous in $x$ with exponent $c$, uniformly for bounded $t$. Also show that
the bound $c < \frac{1}{2}$ is best possible. (Hint: Apply Theorem 3.23 to the estimate
in the proof of Theorem 22.4. For the last assertion, use Theorem 22.17.)
5. Let $M$ be a continuous local martingale such that $M = B \circ [M]$ a.s. for some
Brownian motion $B$. Show that if $B$ has local time $L_t^x$, then the local time of
$M$ at $x$ equals $L^x \circ [M]$. (Hint: Use Theorem 22.5, and note that $L \circ [M]$
is jointly continuous.)
6. For any continuous semimartingale $X$, show that $\int f(X_s, s)\,d[X]_s =
\int dx \int f(x, s)\,dL_s^x$ outside a fixed null set. (Hint: Extend Theorem 22.5 by
a monotone class argument.)
7. Let Z be the zero set of Brownian motion B. Use Proposition 22.12
and Theorem 22.15 to construct its local time L directly from Z. Also
use Lemma 22.16 to construct L from the heights of the excursions of B.
Finally, use Corollary 22.6 to construct L from the occupation measure of
B.
8. Let $\eta$ be the maximum of a Brownian excursion. Show that $E\eta =
(\pi/2)^{1/2}$. (Hint: Use Theorem 22.15 and Lemmas 22.16 and 3.4.)
9. Let $L$ be the continuous local time of a continuous local martingale
$M$ with $[M]_\infty = \infty$ a.s. Show that a.s. $L_t^x \to \infty$ as $t \to \infty$, uniformly
on compacts. (Hint: Reduce to the case of Brownian motion. Then use
Corollary 22.18, the strong Markov property, and the law of large numbers.)
10. Show that the intersection of two regenerative sets is regenerative.
11. Let $L$ be the local time of a regenerative set and let $\tau$ be an independent,
exponentially distributed time. Show that $L_\tau$ is again exponentially
distributed. (Hint: Prove a Cauchy equation for the function $P\{L_\tau > s\}$.)
12. For any unbounded regenerative set $Z$, show that $\mathcal{L}(Z)$ is a.s.
determined by $Z$. (Hint: Use the law of large numbers.)
13. Let $Z$ be a nontrivial regenerative set. Show that $cZ \overset{d}{=} Z$ for all $c > 0$
iff the inverse local time is strictly stable.
14. Let $X$ be a Feller process in $\mathbb{R}$ and put $M_t = \sup_{s \le t} X_s$. Show that
the points of increase of $M$ form a regenerative set. Also prove the same
statement for the process $X_t^* = \sup_{s \le t} |X_s|$ when $-X \overset{d}{=} X$.
15. Let $X$ be a strictly stable Lévy process, let $Z$ denote the set of increase
of the process $M_t = \sup_{s \le t} X_s$, and write $L$ for the local time of $Z$.
Assuming $Z$ to be nontrivial, show that $L^{-1}$ is strictly stable. Also prove the
corresponding statement for $X^*$ when $X$ is symmetric.
16. Give an explicit construction of the process $X$ in Theorem 22.11, based
on the Poisson process and the constant $c$. (Hint: Use Theorem 22.13 to
construct the time scale.)
17. Show that semimartingale local time is preserved under a change of
measure $Q = Z_t \cdot P$. Use this result to extend Corollary 22.18 to Brownian
motion with a suitable drift. (Hint: Use Proposition 18.20 and Corollary
18.25.)
18. Show that the notion of a continuous additive functional is preserved
under a suitable change of measure $Q = Z_t \cdot P$. Use this result to extend
Theorem 22.25 to a Brownian motion with drift.
Chapter 23
One-Dimensional SDEs
and Diffusions
Weak existence and uniqueness; pathwise uniqueness and com-
parison; scale function and speed measure; time-change rep-
resentation; boundary classification; entrance boundaries and
Feller properties; ratio ergodic theorem; recurrence and ergod-
icity
By a diffusion is usually understood a continuous strong Markov process,
sometimes required to possess additional regularity properties. The basic
example of a diffusion process is Brownian motion, which was first intro-
duced and studied in Chapter 13. More general diffusions, first encountered
in Chapter 19, were studied extensively in Chapter 21 as solutions to suit-
able stochastic differential equations (SDEs). This chapter focuses on the
one-dimensional case, which allows a more detailed analysis. Martingale
methods are used throughout the chapter, and we make essential use of re-
sults on random time-change from Chapters 17 and 18, as well as on local
time, excursions, and additive functionals from Chapter 22.
After considering the Engelbert-Schmidt characterization of weak
existence and uniqueness for the equation $dX_t = \sigma(X_t)\,dB_t$, we turn to a
discussion of various pathwise uniqueness and comparison results for the
corresponding equation with drift. Next we proceed to a systematic study
of regular diffusions, introduce the notions of scale function and speed mea-
sure, and prove the basic representation of a diffusion on a natural scale
as a time-changed Brownian motion. Finally, we characterize the different
types of boundary behavior, establish the Feller properties for a suitable
extension of the process, and examine the recurrence and ergodic properties
in the various cases.
To begin with the SDE approach, consider the general one-dimensional
diffusion equation $(\sigma, b)$, given by
\[
dX_t = \sigma(X_t)\,dB_t + b(X_t)\,dt.
\tag{1}
\]
From Theorem 21.11 we know that if weak existence and uniqueness in law
hold for (1), then the solution process X is a continuous strong Markov
process. It is clearly also a semimartingale.
In Proposition 21.12 we saw how the drift term can sometimes be
eliminated through a suitable change of the underlying probability measure.
Under suitable regularity conditions on the coefficients, we may use the
alternative approach of transforming the state space. Let us then assume
that $X$ solves (1), and put $Y_t = p(X_t)$ for some function $p \in C^1$ possessing
an absolutely continuous derivative $p'$ with density $p''$. By the generalized
Itô formula of Theorem 22.5, we have
\begin{align*}
dY_t &= p'(X_t)\,dX_t + \tfrac{1}{2} p''(X_t)\,d[X]_t \\
&= (\sigma p')(X_t)\,dB_t + (\tfrac{1}{2} \sigma^2 p'' + b p')(X_t)\,dt.
\end{align*}
Here the drift term vanishes iff $p$ solves the ordinary differential equation
\[
\tfrac{1}{2} \sigma^2 p'' + b p' = 0.
\tag{2}
\]
If $b/\sigma^2$ is locally integrable, then (2) has the explicit solutions
\[
p'(x) = c \exp\Bigl\{ -2 \int_0^x (b \sigma^{-2})(u)\,du \Bigr\}, \qquad x \in \mathbb{R},
\]
where $c$ is an arbitrary constant. The desired scale function $p$ is then
determined up to an affine transformation, and for $c > 0$ it is strictly increasing
with a unique inverse $p^{-1}$. The mapping by $p$ reduces (1) to the form
$dY_t = \tilde{\sigma}(Y_t)\,dB_t$, where $\tilde{\sigma} = (\sigma p') \circ p^{-1}$. Since the new equation is equivalent,
it is clear that weak or strong existence or uniqueness hold simultaneously
for the two equations.
Once the drift has been removed we are left with an equation of the
form
\[
dX_t = \sigma(X_t)\,dB_t.
\tag{3}
\]
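As a quick illustration of the removal of drift by a change of scale, consider the following worked case. It is only a sketch under assumed constant coefficients $\sigma \equiv 1$ and $b \equiv \mu \neq 0$, which are chosen here for illustration and do not come from the text.

```latex
% Assumption (illustrative only): sigma = 1 and b = mu, a nonzero constant.
p'(x) = c \exp\Bigl\{ -2 \int_0^x \mu\,du \Bigr\} = c\,e^{-2\mu x},
\qquad
p(x) = \frac{c}{2\mu}\bigl(1 - e^{-2\mu x}\bigr).
% Check of the drift-free condition (1/2) sigma^2 p'' + b p' = 0:
\tfrac{1}{2} p''(x) + \mu p'(x)
  = \tfrac{1}{2}(-2\mu)\,c\,e^{-2\mu x} + \mu\,c\,e^{-2\mu x} = 0.
% New diffusion coefficient for Y = p(X), using e^{-2 mu x} = 1 - 2 mu y / c:
\tilde{\sigma}(y) = p'\bigl(p^{-1}(y)\bigr) = c - 2\mu y .
```

With $c > 0$ the function $p$ is strictly increasing, and in this case $Y = p(X)$ solves a driftless equation of the form $dY_t = (c - 2\mu Y_t)\,dB_t$ on the range of $p$.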
Here exact criteria for weak existence and uniqueness may be given in terms
of the singularity sets
\begin{align*}
S_\sigma &= \Bigl\{ x \in \mathbb{R};\ \int_{x-}^{x+} \sigma^{-2}(y)\,dy = \infty \Bigr\}, \\
N_\sigma &= \{ x \in \mathbb{R};\ \sigma(x) = 0 \}.
\end{align*}
Theorem 23.1 (existence and uniqueness, Engelbert and Schmidt) Weak
existence holds for equation (3) with arbitrary initial distribution iff $S_\sigma \subset
N_\sigma$. In that case, uniqueness in law holds for every initial distribution iff
$S_\sigma = N_\sigma$.
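To see how the two singularity sets interact, it may help to work through a standard family of coefficients. The choice $\sigma(x) = |x|^\gamma$ with $\gamma > 0$ is an assumption made here for illustration; it is not taken from the text.

```latex
% Assumption (illustrative only): sigma(x) = |x|^gamma with gamma > 0.
N_\sigma = \{0\},
\qquad
\int_{-\varepsilon}^{\varepsilon} |y|^{-2\gamma}\,dy = \infty
\iff 2\gamma \ge 1,
\qquad\text{so}\qquad
S_\sigma =
\begin{cases}
\{0\}, & \gamma \ge \tfrac{1}{2}, \\
\emptyset, & \gamma < \tfrac{1}{2}.
\end{cases}
```

In either case $S_\sigma \subset N_\sigma$, so the criterion gives weak existence for every initial distribution. The sets coincide only when $\gamma \ge \frac{1}{2}$, so by the same criterion uniqueness in law holds for $\gamma \ge \frac{1}{2}$ and fails for $\gamma < \frac{1}{2}$.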
Our proof begins with a lemma, which will also be useful later. Given
any measure $\nu$ on $\mathbb{R}$, we may introduce the associated singularity set
\[
S_\nu = \{ x \in \mathbb{R};\ \nu(x-, x+) = \infty \}.
\]
If $B$ is a one-dimensional Brownian motion with associated local time $L$,
we may also introduce the additive functional
\[
A_s = \int L_s^x\, \nu(dx), \qquad s \ge 0.
\tag{4}
\]
Lemma 23.2 (singularity set) Let $L$ be the local time of Brownian motion
$B$ with arbitrary initial distribution, and define $A$ by (4) for some measure
$\nu$ on $\mathbb{R}$. Then a.s.
\[
\inf\{ s \ge 0;\ A_s = \infty \} = \inf\{ s \ge 0;\ B_s \in S_\nu \}.
\]
Proof: Fix any $t > 0$, and let $R$ be the event where $B_s \notin S_\nu$ on $[0, t]$.
Noting that $L_t^x = 0$ a.s. for $x$ outside the range $B[0, t]$, we get a.s. on $R$
\[
A_t = \int_{-\infty}^\infty L_t^x\, \nu(dx) \le \nu(B[0, t]) \sup_x L_t^x < \infty,
\]
since $B[0, t]$ is compact and $L_t^x$ is a.s. continuous, hence bounded.
Conversely, suppose that $B_s \in S_\nu$ for some $s < t$. To show that $A_t = \infty$
a.s. on this event, we may use the strong Markov property to reduce to the
case when $B_0 = a$ is nonrandom in $S_\nu$. But then $L_t^a > 0$ a.s. by Tanaka's
formula, and so by the continuity of $L$ we get for small enough $\varepsilon > 0$
\[
A_t = \int_{-\infty}^\infty L_t^x\, \nu(dx) \ge \nu(a - \varepsilon, a + \varepsilon) \inf_{|x - a| < \varepsilon} L_t^x = \infty. \qquad \Box
\]
Proof of Theorem 23.1: First assume that $S_\sigma \subset N_\sigma$. To prove the
asserted weak existence, let $Y$ be a Brownian motion with arbitrary initial
distribution $\mu$, and define $\zeta = \inf\{ s \ge 0;\ Y_s \in S_\sigma \}$. By Lemma 23.2 the
additive functional
\[
A_s = \int_0^s \sigma^{-2}(Y_r)\,dr, \qquad s \ge 0,
\tag{5}
\]
is continuous and strictly increasing on $[0, \zeta)$, and for $t > \zeta$ we have $A_t = \infty$.
Also note that $A_\zeta = \infty$ when $\zeta = \infty$, whereas $A_\zeta$ may be finite when $\zeta < \infty$.
In the latter case $A$ jumps from $A_\zeta$ to $\infty$ at time $\zeta$.
Now introduce the inverse
\[
\tau_t = \inf\{ s > 0;\ A_s > t \}, \qquad t \ge 0.
\tag{6}
\]
The process $\tau$ is clearly continuous and strictly increasing on $[0, A_\zeta]$, and
for $t \ge A_\zeta$ we have $\tau_t = \zeta$. Also note that $X_t = Y_{\tau_t}$ is a continuous local
martingale and, moreover,
\[
t = A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_r)\,dr = \int_0^t \sigma^{-2}(X_s)\,d\tau_s, \qquad t \le A_\zeta.
\]
Hence, for $t \le A_\zeta$,
\[
[X]_t = \tau_t = \int_0^t \sigma^2(X_s)\,ds.
\tag{7}
\]
Here both sides remain constant after time $A_\zeta$ since $S_\sigma \subset N_\sigma$, and so (7)
remains true for all $t \ge 0$. Hence, Theorem 18.12 yields the existence of a
Brownian motion $B$ satisfying (3), which means that $X$ is a weak solution
with initial distribution $\mu$.
To prove the converse implication, assume that weak existence holds for
any initial distribution. To show that $S_\sigma \subset N_\sigma$, we may fix any $x \in S_\sigma$ and
choose a solution $X$ with $X_0 = x$. Since $X$ is a continuous local martingale,
Theorem 18.4 yields $X_t = Y_{\tau_t}$ for some Brownian motion $Y$ starting at $x$
and some random time-change $\tau$ satisfying (7). For $A$ as in (5) and for
$t \ge 0$ we have
\[
A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_r)\,dr = \int_0^t \sigma^{-2}(X_s)\,d\tau_s = \int_0^t 1\{\sigma(X_s) > 0\}\,ds \le t.
\tag{8}
\]
Since $A_s = \infty$ for $s > 0$ by Lemma 23.2, we get $\tau_t = 0$ a.s., and so $X_t \equiv x$
a.s. But then $x \in N_\sigma$ by (7).
Turning to the uniqueness assertion, assume that $N_\sigma \subset S_\sigma$, and consider
a solution $X$ with initial distribution $\mu$. As before, we may write $X_t = Y_{\tau_t}$
a.s., where $Y$ is a Brownian motion with initial distribution $\mu$ and $\tau$ is a
random time-change satisfying (7). Define $A$ as in (5), put $\chi = \inf\{ t \ge
0;\ X_t \in S_\sigma \}$, and note that $\tau_\chi = \zeta = \inf\{ s \ge 0;\ Y_s \in S_\sigma \}$. Since $N_\sigma \subset S_\sigma$,
we get as in (8)
\[
A_{\tau_t} = \int_0^{\tau_t} \sigma^{-2}(Y_s)\,ds = t, \qquad t \le \chi.
\]
Furthermore, $A_s = \infty$ for $s > \zeta$ by Lemma 23.2, and so (8) implies $\tau_t \le \zeta$
a.s. for all $t$, which means that $\tau$ remains constant after time $\chi$. Thus, $\tau$ and
$A$ are related by (6), which shows that $\tau$ and then also $X$ are measurable
functions of $Y$. Since the distribution of $Y$ depends only on $\mu$, the same
thing is true for $X$, which proves the asserted uniqueness in law.
To prove the converse, assume that $S_\sigma$ is a proper subset of $N_\sigma$, and fix
any $x \in N_\sigma \setminus S_\sigma$. As before, we may construct a solution starting at $x$ by
writing $X_t = Y_{\tau_t}$, where $Y$ is a Brownian motion starting at $x$, and $\tau$ is
defined as in (6) from the process $A$ in (5). Since $x \notin S_\sigma$, Lemma 23.2 gives
$A_{0+} < \infty$ a.s., and so $\tau_t > 0$ a.s. for $t > 0$, which shows that $X$ is a.s.
nonconstant. Since $x \in N_\sigma$, (3) has also the trivial solution $X_t \equiv x$. Thus,
uniqueness in law fails for solutions starting at $x$. □
Proceeding with a study of pathwise uniqueness, we return to equation
(1), and let $w(\sigma, \cdot)$ denote the modulus of continuity of $\sigma$.
Theorem 23.3 (pathwise uniqueness, Skorohod, Yamada and Watanabe)
Let $\sigma$ and $b$ be bounded, measurable functions on $\mathbb{R}$, where
\[
\int_0^\varepsilon (w(\sigma, h))^{-2}\,dh = \infty, \qquad \varepsilon > 0,
\tag{9}
\]
and either $b$ is Lipschitz continuous or $\sigma \neq 0$. Then pathwise uniqueness
holds for equation $(\sigma, b)$.
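Condition (9) is easiest to read on Hölder-type coefficients. The following worked case is a sketch only: the specific coefficient $\sigma(x) = |x|^\gamma \wedge 1$ with $\gamma \in (0, 1]$ is an assumption introduced here for illustration, chosen because its modulus of continuity is explicit for small $h$.

```latex
% Assumption (illustrative only): sigma(x) = min(|x|^gamma, 1), 0 < gamma <= 1,
% for which w(sigma, h) = h^gamma when 0 < h <= 1.
\int_0^\varepsilon \bigl(w(\sigma, h)\bigr)^{-2}\,dh
  = \int_0^\varepsilon h^{-2\gamma}\,dh
  =
  \begin{cases}
    \infty, & \gamma \ge \tfrac{1}{2}, \\[2pt]
    \dfrac{\varepsilon^{1 - 2\gamma}}{1 - 2\gamma} < \infty, & \gamma < \tfrac{1}{2},
  \end{cases}
  \qquad 0 < \varepsilon \le 1 .
```

So in this family the condition holds exactly when the Hölder exponent is at least $\frac{1}{2}$, with $\gamma = \frac{1}{2}$ as the borderline case.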
The significance of condition (9) is clarified by the following lemma,
where for any semimartingale $Y$ we write $L_t^x(Y)$ for the associated local
time.
Lemma 23.4 (local time) For $i = 1, 2$, let $X^i$ solve equation $(\sigma, b_i)$, where
$\sigma$ satisfies (9). Then $L^0(X^1 - X^2) = 0$ a.s.
Proof: Write $Y = X^1 - X^2$, $L_t^x = L_t^x(Y)$, and $w(x) = w(\sigma, |x|)$. Using
(1) and Theorem 22.5, we get for any $t > 0$
\[
\int_{-\infty}^\infty \frac{L_t^x\,dx}{w_x^2} = \int_0^t \frac{d[Y]_s}{(w(Y_s))^2} = \int_0^t \Bigl\{ \frac{\sigma(X_s^1) - \sigma(X_s^2)}{w(X_s^1 - X_s^2)} \Bigr\}^2 ds \le t < \infty.
\]
By (9) and the right-continuity of $L$ it follows that $L_t^0 = 0$ a.s. □
Proof of Theorem 23.3 for $\sigma \neq 0$: By Propositions 21.12 and 21.13
combined with a simple localization argument, we note that uniqueness in law
holds for equation $(\sigma, b)$ when $\sigma \neq 0$. To prove the pathwise uniqueness,
consider any two solutions $X$ and $Y$ with $X_0 = Y_0$ a.s. Using Tanaka's
formula, Lemma 23.4, and equation $(\sigma, b)$, we get
\begin{align*}
d(X_t \vee Y_t) &= dX_t + d(Y_t - X_t)^+ \\
&= dX_t + 1\{Y_t > X_t\}\,d(Y_t - X_t) \\
&= 1\{Y_t \le X_t\}\,dX_t + 1\{Y_t > X_t\}\,dY_t \\
&= \sigma(X_t \vee Y_t)\,dB_t + b(X_t \vee Y_t)\,dt,
\end{align*}
which shows that $X \vee Y$ is again a solution. By the uniqueness in law we
get $X \overset{d}{=} X \vee Y$. Since $X \le X \vee Y$, it follows that $X = X \vee Y$ a.s., which
implies $Y \le X$ a.s. Similarly, $X \le Y$ a.s. □
The assertion for Lipschitz continuous $b$ is a special case of the following
comparison result.
Theorem 23.5 (weak comparison, Skorohod, Yamada) Fix some functions
$\sigma$ and $b_1 \ge b_2$, where $\sigma$ satisfies (9) and either $b_1$ or $b_2$ is Lipschitz
continuous. For $i = 1, 2$, let $X^i$ solve equation $(\sigma, b_i)$, and assume that
$X_0^1 \ge X_0^2$ a.s. Then $X^1 \ge X^2$ a.s.
Proof: By symmetry we may assume that $b_1$ is Lipschitz continuous.
Since $X_0^2 \le X_0^1$ a.s., we get by Tanaka's formula and Lemma 23.4
\begin{align*}
(X_t^2 - X_t^1)^+ &= \int_0^t 1\{X_s^2 > X_s^1\} \bigl( \sigma(X_s^2) - \sigma(X_s^1) \bigr)\,dB_s \\
&\quad + \int_0^t 1\{X_s^2 > X_s^1\} \bigl( b_2(X_s^2) - b_1(X_s^1) \bigr)\,ds.
\end{align*}
Using the martingale property of the first term, the Lipschitz continuity of
$b_1$, and the condition $b_2 \le b_1$, we conclude that
\begin{align*}
E(X_t^2 - X_t^1)^+ &\le E \int_0^t 1\{X_s^2 > X_s^1\} \bigl( b_1(X_s^2) - b_1(X_s^1) \bigr)\,ds \\
&\lesssim E \int_0^t 1\{X_s^2 > X_s^1\}\, |X_s^2 - X_s^1|\,ds \\
&= \int_0^t E(X_s^2 - X_s^1)^+\,ds.
\end{align*}
By Gronwall's lemma $E(X_t^2 - X_t^1)^+ = 0$, and hence $X_t^2 \le X_t^1$ a.s. □
Imposing stronger restrictions on the coefficients, we may strengthen the
last conclusion to a strict inequality.
Theorem 23.6 (strict comparison) Fix a Lipschitz continuous function $\sigma$
and some continuous functions $b_1 > b_2$. For $i = 1, 2$, let $X^i$ solve equation
$(\sigma, b_i)$, and assume that $X_0^1 \ge X_0^2$ a.s. Then $X^1 > X^2$ a.s. on $(0, \infty)$.
Proof: Since the $b_i$ are continuous with $b_1 > b_2$, there exists a locally
Lipschitz continuous function $b$ on $\mathbb{R}$ with $b_1 > b > b_2$. By Theorem 21.3
equation $(\sigma, b)$ has a solution $X$ with $X_0 = X_0^1 \ge X_0^2$ a.s., and it suffices
to show that $X^1 > X > X^2$ a.s. on $(0, \infty)$. This reduces the discussion to
the case when one of the functions $b_i$ is locally Lipschitz. By symmetry we
may take that function to be $b_1$.
By the Lipschitz continuity of $\sigma$ and $b_1$, we may define some continuous
semimartingales $U$ and $V$ by
\begin{align*}
U_t &= \int_0^t \bigl( b_1(X_s^2) - b_2(X_s^2) \bigr)\,ds, \\
V_t &= \int_0^t \frac{\sigma(X_s^1) - \sigma(X_s^2)}{X_s^1 - X_s^2}\,dB_s + \int_0^t \frac{b_1(X_s^1) - b_1(X_s^2)}{X_s^1 - X_s^2}\,ds,
\end{align*}
subject to the convention $0/0 = 0$, and we note that
\[
d(X_t^1 - X_t^2) = dU_t + (X_t^1 - X_t^2)\,dV_t.
\]
Letting $Z = \exp(V - \tfrac{1}{2}[V]) > 0$, we get by Proposition 21.2
\[
X_t^1 - X_t^2 = Z_t (X_0^1 - X_0^2) + Z_t \int_0^t Z_s^{-1} \bigl( b_1(X_s^2) - b_2(X_s^2) \bigr)\,ds,
\]
and the assertion follows since $X_0^1 \ge X_0^2$ a.s. and $b_1 > b_2$. □
We turn to a systematic study of one-dimensional diffusions. By a diffusion
on some interval $I \subset \mathbb{R}$ we mean a continuous strong Markov process
taking values in $I$. Termination will only be allowed at open end-points
of $I$. We define $\tau_y = \inf\{ t \ge 0;\ X_t = y \}$ and say that $X$ is regular if
$P_x\{\tau_y < \infty\} > 0$ for any $x \in I^\circ$ and $y \in I$. Let us further write $\tau_{a,b} = \tau_a \wedge \tau_b$.
Our first aim is to transform the general diffusion process into a continuous
local martingale, using a suitable change of scale. This corresponds
to the removal of drift in the SDE (1).
Theorem 23.7 (scale function, Feller, Dynkin) For any regular diffusion
$X$ on $I$, there exists a continuous and strictly increasing function $p$ on $I$
such that $p(X^{\tau_{a,b}})$ is a $P_x$-martingale for all $a < x < b$ in $I$. Furthermore,
an increasing function $p$ has the stated property iff
\[
P_x\{\tau_b < \tau_a\} = \frac{p_x - p_a}{p_b - p_a}, \qquad x \in [a, b].
\tag{10}
\]
A function $p$ with the stated property is called a scale function for $X$,
and we say that $X$ is on a natural scale if the scale function can be chosen
to be linear. In general, we note that $Y = p(X)$ is a regular diffusion on a
natural scale.
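Formula (10) can be made concrete in two classical cases. The block below is a sketch under assumptions: the process $X_t = B_t + \mu t$ with constant drift $\mu \neq 0$ is chosen here for illustration, and the fact used is that $e^{-2\mu X_t}$ is a martingale for this process, so that $p(x) = -e^{-2\mu x}$ (for $\mu > 0$) may serve as an increasing scale function.

```latex
% Brownian motion (natural scale, p(x) = x):
P_x\{\tau_b < \tau_a\} = \frac{x - a}{b - a}, \qquad x \in [a, b].
% Assumption (illustrative only): X_t = B_t + mu t with mu > 0,
% scale function p(x) = -exp(-2 mu x):
P_x\{\tau_b < \tau_a\}
  = \frac{e^{-2\mu a} - e^{-2\mu x}}{e^{-2\mu a} - e^{-2\mu b}},
  \qquad x \in [a, b].
```

Letting $\mu \to 0$ in the second expression recovers the first, in line with the statement that $p$ is determined only up to an affine transformation.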
Our proof begins with a study of the functions
\[
p_{a,b}(x) = P_x\{\tau_b < \tau_a\}, \qquad h_{a,b}(x) = E_x \tau_{a,b}, \qquad a \le x \le b,
\]
which play a basic role in the subsequent analysis.
Lemma 23.8 (hitting times) For any regular diffusion on $I$ and constants
$a < b$ in $I$, we have
(i) $p_{a,b}$ is continuous and strictly increasing on $[a, b]$;
(ii) $h_{a,b}$ is bounded on $[a, b]$.
In particular, we see from (ii) that $\tau_{a,b} < \infty$ a.s. under $P_x$ for any $a \le
x \le b$.
Proof: (i) First we show that $P_x\{\tau_b < \tau_a\} > 0$ for any $a < x < b$.
To this end, introduce the optional time $\sigma_1 = \tau_a + \tau_x \circ \theta_{\tau_a}$, and define recursively
$\sigma_{n+1} = \sigma_n + \sigma_1 \circ \theta_{\sigma_n}$. By the strong Markov property the $\sigma_n$ form a random
walk in $[0, \infty]$ under each $P_x$. If $P_x\{\tau_b < \tau_a\} = 0$, we get $\tau_b \ge \sigma_n \to \infty$ a.s.
$P_x$, and so $P_x\{\tau_b = \infty\} = 1$, which contradicts the regularity of $X$.
Using the strong Markov property at $\tau_y$, we next obtain
\[
P_x\{\tau_b < \tau_a\} = P_x\{\tau_y < \tau_a\}\, P_y\{\tau_b < \tau_a\}, \qquad a < x < y < b.
\tag{11}
\]
Since $P_x\{\tau_a < \tau_y\} > 0$, we have $P_x\{\tau_y < \tau_a\} < 1$, which shows that
$P_x\{\tau_b < \tau_a\}$ is strictly increasing.
By symmetry it remains to prove that $P_y\{\tau_b < \tau_a\}$ is left-continuous on
$(a, b]$. By (11) it is equivalent to show for each $x \in (a, b)$ that the mapping
$y \mapsto P_x\{\tau_y < \tau_a\}$ is left-continuous on $(x, b]$. Then let $y_n \uparrow y$, and note that
$\tau_{y_n} \uparrow \tau_y$ a.s. $P_x$ by the continuity of $X$. Hence, $\{\tau_{y_n} < \tau_a\} \downarrow \{\tau_y < \tau_a\}$,
which implies convergence of the corresponding probabilities.
(ii) Fix any $c \in (a, b)$. By the regularity of $X$ we may choose $h > 0$ so
large that
\[
P_c\{\tau_a \le h\} \wedge P_c\{\tau_b \le h\} = \delta > 0.
\]
If $x \in (a, c)$, we may use the strong Markov property at $\tau_x$ to get
\begin{align*}
\delta \le P_c\{\tau_a \le h\} &\le P_c\{\tau_x \le h\}\, P_x\{\tau_a \le h\} \\
&\le P_x\{\tau_a \le h\} \le P_x\{\tau_{a,b} \le h\},
\end{align*}
and similarly for $x \in (c, b)$. By the Markov property at $h$ and induction on
$n$ we obtain
\[
P_x\{\tau_{a,b} > nh\} \le (1 - \delta)^n, \qquad x \in [a, b],\ n \in \mathbb{Z}_+,
\]
and Lemma 3.4 yields
\[
E_x \tau_{a,b} = \int_0^\infty P_x\{\tau_{a,b} > t\}\,dt \le h \sum_{n \ge 0} (1 - \delta)^n < \infty. \qquad \Box
\]
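The induction step behind the geometric bound in part (ii) can be spelled out as follows. This is a sketch of the standard argument, written with $\tau = \tau_{a,b}$ for brevity and using only the bound $P_y\{\tau \le h\} \ge \delta$ for $y \in (a, b)$ obtained in the proof.

```latex
% Markov property at time nh; on {tau > nh} the process is still inside (a, b).
P_x\{\tau > (n+1)h\}
  = E_x\bigl[\, 1\{\tau > nh\}\; P_{X_{nh}}\{\tau > h\} \,\bigr]
  \le (1 - \delta)\, P_x\{\tau > nh\}
  \le (1 - \delta)^{n+1},
% and, splitting the integral over the intervals [nh, (n+1)h):
\int_0^\infty P_x\{\tau > t\}\,dt
  \le \sum_{n \ge 0} h\, P_x\{\tau > nh\}
  \le h \sum_{n \ge 0} (1 - \delta)^n = \frac{h}{\delta}.
```

The resulting bound $h/\delta$ does not depend on $x$, which is exactly the boundedness asserted in (ii).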
Proof of Theorem 23.7: Let $p$ be a locally bounded and measurable
function on $I$ such that $M = p(X^{\tau_{a,b}})$ is a martingale under $P_x$ for any
$a < x < b$. Then
\begin{align*}
p_x = E_x M_0 = E_x M_\infty &= E_x p(X_{\tau_{a,b}}) \\
&= p_a P_x\{\tau_a < \tau_b\} + p_b P_x\{\tau_b < \tau_a\} \\
&= p_a + (p_b - p_a) P_x\{\tau_b < \tau_a\},
\end{align*}
and (10) follows, provided that $p_a \neq p_b$.
To construct a function $p$ with the stated properties, fix any points $u < v$
in $I$, and define for arbitrary $a \le u$ and $b \ge v$ in $I$
\[
p(x) = \frac{p_{a,b}(x) - p_{a,b}(u)}{p_{a,b}(v) - p_{a,b}(u)}, \qquad x \in [a, b].
\tag{12}
\]
To see that $p$ is independent of $a$ and $b$, consider any larger interval $[a', b']$ in
$I$, and conclude from the strong Markov property at $\tau_{a,b}$ that, for $x \in [a, b]$,
\[
P_x\{\tau_{b'} < \tau_{a'}\} = P_x\{\tau_a < \tau_b\}\, P_a\{\tau_{b'} < \tau_{a'}\} + P_x\{\tau_b < \tau_a\}\, P_b\{\tau_{b'} < \tau_{a'}\},
\]
or
\[
p_{a',b'}(x) = p_{a,b}(x) \bigl( p_{a',b'}(b) - p_{a',b'}(a) \bigr) + p_{a',b'}(a).
\]
Thus, $p_{a,b}$ and $p_{a',b'}$ agree on $[a, b]$ up to an affine transformation and so
give rise to the same value in (12).
By Lemma 23.8 the constructed function is continuous and strictly
increasing, and it remains to show that $p(X^{\tau_{a,b}})$ is a martingale under $P_x$ for
any $a < b$ in $I$. Since the martingale property is preserved by affine
transformations, it is equivalent to show that $p_{a,b}(X^{\tau_{a,b}})$ is a $P_x$-martingale.
Then fix any optional time $\sigma$, and write $\tau = \sigma \wedge \tau_{a,b}$. By the strong Markov
property at $\tau$ we get
\begin{align*}
E_x p_{a,b}(X_\tau) &= E_x P_{X_\tau}\{\tau_b < \tau_a\} = P_x \theta_\tau^{-1}\{\tau_b < \tau_a\} \\
&= P_x\{\tau_b < \tau_a\} = p_{a,b}(x),
\end{align*}
and the desired martingale property follows by Lemma 7.13. □
To prepare for the next result, consider a Brownian motion $B$ in $\mathbb{R}$ with
associated jointly continuous local time $L$. For any measure $\nu$ on $\mathbb{R}$, we
may introduce as in (4) the associated additive functional $A = \int L^x\, \nu(dx)$
and its right-continuous inverse
\[
\sigma_t = \inf\{ s > 0;\ A_s > t \}, \qquad t \ge 0.
\]
If $\nu \neq 0$, it is clear from the recurrence of $B$ that $A$ is a.s. unbounded.
Hence, $\sigma_t < \infty$ a.s. for all $t$, and we may define $X_t = B_{\sigma_t}$, $t \ge 0$. We shall
refer to $\sigma = (\sigma_t)$ as the random time-change based on $\nu$ and to the process
$X = B \circ \sigma$ as the correspondingly time-changed Brownian motion.
Theorem 23.9 (speed measure and time-change, Feller, Volkonsky, Itô
and McKean) For any regular diffusion on a natural scale in $I$, there exists
a unique measure $\nu$ on $I$ with $\nu[a, b] \in (0, \infty)$ for all $a < b$ in $I^\circ$, such that
$X$ is a time-changed Brownian motion based on some extension of $\nu$ to $\bar{I}$.
Conversely, any such time-change of Brownian motion defines a regular
diffusion on $I$.
The extended version of $\nu$ is called the speed measure of the diffusion.
Contrary to what the term suggests, we note that the process moves slowly
through regions where $\nu$ is large. The speed measure of Brownian motion
itself is clearly equal to Lebesgue measure. More generally, the speed measure
of a regular diffusion solving equation (3) has density $\sigma^{-2}$.
To prove the uniqueness of $\nu$ we need the following lemma, which is also
useful for the subsequent classification of boundary behavior. Here we write
$\sigma_{a,b} = \inf\{ s > 0;\ B_s \notin (a, b) \}$.
Lemma 23.10 (Green function) Let $X$ be a time-changed Brownian motion
based on $\nu$. Then for any measurable function $f \ge 0$ on $I$ and points
$a < b$ in $\bar{I}$,
\[
E_x \int_0^{\tau_{a,b}} f(X_t)\,dt = \int_a^b g_{a,b}(x, y) f(y)\, \nu(dy), \qquad x \in [a, b],
\tag{13}
\]
where
\[
g_{a,b}(x, y) = E_x L_{\sigma_{a,b}}^y = \frac{2 (x \wedge y - a)(b - x \vee y)}{b - a}, \qquad x, y \in [a, b].
\tag{14}
\]
If $X$ is recurrent, this remains true with $a = -\infty$ or $b = \infty$.
Taking $f \equiv 1$ in (13), we get in particular the formula
\[
h_{a,b}(x) = E_x \tau_{a,b} = \int_a^b g_{a,b}(x, y)\, \nu(dy), \qquad x \in [a, b],
\tag{15}
\]
which will be useful later.
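Formula (15) can be sanity-checked numerically in the simplest case. The sketch below assumes that $\nu$ is Lebesgue measure (the speed measure of Brownian motion itself) and compares a midpoint-rule approximation of the integral in (15) with the classical value $(x - a)(b - x)$ of the expected exit time; the function names `green` and `expected_exit_time` and the chosen numbers are hypothetical.

```python
def green(a, b, x, y):
    """Green function from (14): 2 (min(x,y) - a)(b - max(x,y)) / (b - a)."""
    return 2.0 * (min(x, y) - a) * (b - max(x, y)) / (b - a)


def expected_exit_time(a, b, x, n=20000):
    """Midpoint-rule approximation of (15), assuming nu is Lebesgue measure."""
    dy = (b - a) / n
    return sum(green(a, b, x, a + (k + 0.5) * dy) for k in range(n)) * dy


a, b, x = -1.0, 2.0, 0.5
approx = expected_exit_time(a, b, x)
exact = (x - a) * (b - x)  # classical expected exit time of Brownian motion
print(approx, exact)
```

The integrand is piecewise linear in $y$ with a single kink at $y = x$, so a fine midpoint rule reproduces the closed form to high accuracy. For a general speed measure the same sum would be taken against $\nu(dy)$ instead of $dy$.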
Proof: Clearly, $\tau_{a,b} = A(\sigma_{a,b})$ for any $a, b \in I$, and also for
$a = -\infty$ or $b = \infty$ when $X$ is recurrent. Since $L^y$ is supported by
$\{y\}$, it follows by (4) that
$$\int_0^{\tau_{a,b}} f(X_t)\,dt = \int_0^{\sigma_{a,b}} f(B_s)\,dA_s = \int_a^b f(y)\,L^y_{\sigma_{a,b}}\,\nu(dy).$$
Taking expectations gives (13) with $g_{a,b}(x,y) = E_x L^y_{\sigma_{a,b}}$. To prove
(14), we note that by Tanaka's formula and optional sampling
$$E_x L^y_{\sigma_{a,b} \wedge s} = E_x |B_{\sigma_{a,b} \wedge s} - y| - |x - y|, \quad s \ge 0.$$
If $a$ and $b$ are finite, we may let $s \to \infty$ and conclude by monotone and
dominated convergence that
$$g_{a,b}(x,y) = \frac{(y-a)(b-x)}{b-a} + \frac{(b-y)(x-a)}{b-a} - |x - y|,$$
which simplifies to (14). The result for infinite $a$ or $b$ follows immediately
by monotone convergence. □
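As a numerical illustration of (14) and (15), the following minimal sketch integrates the Green function against a speed measure with a density. The function names and the choice of a midpoint rule are assumptions made for this example only. For Lebesgue measure (Brownian motion) the result should equal the classical mean exit time $(x-a)(b-x)$, and for constant $\sigma$ the density $\sigma^{-2}$ simply rescales it.

```python
def green(a, b, x, y):
    """Green function g_{a,b}(x, y) from (14)."""
    return 2.0 * (min(x, y) - a) * (b - max(x, y)) / (b - a)


def midpoint_integral(func, lo, hi, n=2000):
    """Composite midpoint rule for the integral of func over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(func(lo + (i + 0.5) * step) for i in range(n))


def mean_exit_time(a, b, x, density=lambda y: 1.0):
    """h_{a,b}(x) from (15) for nu(dy) = density(y) dy.

    The integral is split at y = x, where g has a kink.
    """
    integrand = lambda y: green(a, b, x, y) * density(y)
    return (midpoint_integral(integrand, a, x)
            + midpoint_integral(integrand, x, b))


# Brownian motion: nu is Lebesgue measure, so E_x tau_{a,b} = (x - a)(b - x).
h_bm = mean_exit_time(0.0, 1.0, 0.3)
# Constant sigma = 2 (illustrative): density sigma^{-2} = 1/4.
h_scaled = mean_exit_time(0.0, 1.0, 0.3, density=lambda y: 0.25)
```

Splitting the integral at the kink keeps the integrand linear on each piece when the density is constant, so the midpoint rule is exact up to rounding in these two cases.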
The next lemma will enable us to construct the speed measure $\nu$ from
the functions $h_{a,b}$ in Lemma 23.8.
Lemma 23.11 (consistency) For any regular diffusion on a natural scale
in $I$, there exists a strictly concave function $h$ on $I^\circ$ such that, for any
$a < b$ in $I$,
$$h_{a,b}(x) = h(x) - \frac{x-a}{b-a}\,h(b) - \frac{b-x}{b-a}\,h(a), \quad x \in [a,b]. \tag{16}$$
Proof: Fix any $u < v$ in $I$, and define for any $a < u$ and $b > v$ in $I$
$$h(x) = h_{a,b}(x) - \frac{x-u}{v-u}\,h_{a,b}(v) - \frac{v-x}{v-u}\,h_{a,b}(u), \quad x \in [a,b]. \tag{17}$$
To see that $h$ is independent of $a$ and $b$, consider any larger interval
$[a', b']$ in $I$, and conclude from the strong Markov property at $\tau_{a,b}$ that,
for $x \in [a,b]$,
$$E_x \tau_{a',b'} = E_x \tau_{a,b} + P_x\{\tau_a < \tau_b\}\,E_a \tau_{a',b'} + P_x\{\tau_b < \tau_a\}\,E_b \tau_{a',b'},$$
or
$$h_{a',b'}(x) = h_{a,b}(x) + \frac{b-x}{b-a}\,h_{a',b'}(a) + \frac{x-a}{b-a}\,h_{a',b'}(b). \tag{18}$$
Thus, $h_{a,b}$ and $h_{a',b'}$ agree on $[a,b]$ up to an affine function and
therefore yield the same value in (17).
If $a < u$ and $b > v$, then (17) shows that $h$ and $h_{a,b}$ agree on $[a,b]$ up
to an affine function, and (16) follows since $h_{a,b}(a) = h_{a,b}(b) = 0$. The
formula extends by means of (18) to arbitrary $a < b$ in $I$. □
Since $h$ is strictly concave, its left derivative $h'_-$ is strictly decreasing
and left-continuous, and so it determines a measure $\nu$ on $I^\circ$ satisfying
$$2\nu[a,b) = h'_-(a) - h'_-(b), \quad a < b \text{ in } I^\circ. \tag{19}$$
For motivation, we note that this expression is consistent with (15).
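The consistency can be made explicit by a short computation. The following is only a sketch, under the added assumption that $\nu$ is finite on $[a,b]$, so that dominated convergence justifies differentiating (15) under the integral; the points $c < d$ are auxiliary notation introduced here.

```latex
% From (14): \partial_x g_{a,b}(x,y) = 2(b-y)/(b-a) for x < y,
%            \partial_x g_{a,b}(x,y) = -2(y-a)/(b-a) for x > y.
% Left derivative of h_{a,b} in (15), for a < x < b:
\[
  (h_{a,b})'_-(x)
  = \frac{2}{b-a}\Bigl\{ \int_{[x,b]} (b-y)\,\nu(dy)
                       - \int_{[a,x)} (y-a)\,\nu(dy) \Bigr\}.
\]
% Hence, for a < c < d < b:
\[
  (h_{a,b})'_-(c) - (h_{a,b})'_-(d)
  = \frac{2}{b-a} \int_{[c,d)} \bigl\{ (b-y) + (y-a) \bigr\}\,\nu(dy)
  = 2\,\nu[c,d).
\]
```

Since $h$ and $h_{a,b}$ differ by an affine function on $[a,b]$ according to (16), the same difference of left derivatives is obtained for $h$, in agreement with (19).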
The proof of Theorem 23.9 requires some understanding of the behavior
of $X$ at the endpoints of $I$. If an endpoint $b$ does not belong to $I$, then by
hypothesis the motion terminates when $X$ reaches $b$. It is clearly equivalent
to attach $b$ to $I$ as an absorbing endpoint. For convenience we may then
assume that $I$ is a compact interval of the form $[a,b]$, where either endpoint
may be inaccessible, in the sense that a.s. it cannot be reached in finite time
from a point in $I^\circ$.
For either endpoint $b$, the set $Z_b = \{t \ge 0;\ X_t = b\}$ is regenerative
under $P_b$ in the sense of Chapter 22. In particular, we see from Lemma 22.8
that $b$ is either absorbing, in the sense that $Z_b = \mathbb{R}_+$ a.s., or
reflecting, in the sense that $Z_b^\circ = \emptyset$ a.s. In the latter case, we say
that the reflection is fast if $\lambda Z_b = 0$ and slow if $\lambda Z_b > 0$. A more
detailed discussion of the boundary behavior will be given after the proof of
the main theorem.
We first establish Theorem 23.9 in a special case. The general result will
then be deduced by a pathwise comparison.
Proof of Theorem 23.9 for absorbing endpoints (Méléard): Let $X$ have
distribution $P_x$, where $x \in I^\circ$, and put $\zeta = \inf\{t > 0;\ X_t \notin I^\circ\}$.
For any $a < b$ in $I^\circ$ with $x \in [a,b]$, the process $X^{\tau_{a,b}}$ is a continuous
martingale, and so by Theorem 22.5
$$h(X_t) = h(x) + \int_0^t h'_-(X)\,dX - \int_I \tilde L^x_t\,\nu(dx), \quad t \in [0,\zeta), \tag{20}$$
where $\tilde L$ denotes the local time of $X$.
Next conclude from Theorem 18.4 that $X = B \circ [X]$ a.s. for some Brownian
motion $B$ starting at $x$. Using Theorem 22.5 twice, we get in particular,
for any nonnegative measurable function $f$,
$$\int_I f(x)\,\tilde L^x_t\,dx = \int_0^t f(X_s)\,d[X]_s = \int_0^{[X]_t} f(B_s)\,ds = \int_I f(x)\,L^x_{[X]_t}\,dx,$$
where $L$ denotes the local time of $B$. Hence, $\tilde L^x_t = L^x_{[X]_t}$ a.s. for
$t < \zeta$, and so the last term in (20) equals $A_{[X]_t}$ a.s.
For any optional time $\sigma$, put $\tau = \sigma \wedge \tau_{a,b}$, and conclude from
the strong Markov property that
$$E_x[\tau + h_{a,b}(X_\tau)] = E_x[\tau + E_{X_\tau}\tau_{a,b}] = E_x[\tau + \tau_{a,b} \circ \theta_\tau] = E_x \tau_{a,b} = h_{a,b}(x).$$
Writing $M_t = h(X_t) + t$, it follows by Lemma 7.13 that $M^{\tau_{a,b}}$ is a
$P_x$-martingale whenever $x \in [a,b] \subset I^\circ$. Comparing with (20) and using
Proposition 17.2, we obtain $A_{[X]_t} = t$ a.s. for all $t \in [0,\zeta)$. Since $A$ is
continuous and strictly increasing on $[0,\zeta)$ with inverse $\sigma$, it follows that
$[X]_t = \sigma_t$ a.s. for $t < \zeta$. The last relation extends to $[\zeta,\infty)$,
provided that $\nu$ is given infinite mass at each endpoint. Then
$X = B \circ \sigma$ a.s. on $\mathbb{R}_+$.
Conversely, it is easily seen that $B \circ \sigma$ is a regular diffusion on $I$
whenever $\sigma$ is a random time-change based on some measure $\nu$ with the
stated properties. To prove the uniqueness of $\nu$, fix any $a < x < b$ in
$I^\circ$, and apply Lemma 23.10 with $f(y) = (g_{a,b}(x,y))^{-1}$ to see that
$\nu(a,b)$ is determined by $P_x$. □
Proof of Theorem 23.9, general case: Define $\nu$ on $I^\circ$ as in (19), and
extend the definition to $\bar I$ by giving infinite mass to absorbing endpoints.
To every reflecting endpoint we attach a finite mass, to be specified later.
Given a Brownian motion $B$, we note as before that the correspondingly
time-changed process $\tilde X = B \circ \sigma$ is a regular diffusion on $I$. Letting
$\zeta = \sup\{t;\ X_t \in I^\circ\}$ and $\tilde\zeta = \sup\{t;\ \tilde X_t \in I^\circ\}$, we
further see from the previous case that $X^\zeta$ and $\tilde X^{\tilde\zeta}$ have the same
distribution for any starting position $x \in I^\circ$.
Now fix any $a < b$ in $I^\circ$, and define recursively
$$\chi_1 = \zeta + \tau_{a,b} \circ \theta_\zeta; \qquad \chi_{n+1} = \chi_n + \chi_1 \circ \theta_{\chi_n}, \quad n \in \mathbb{N}.$$
The processes $Y_n^{a,b} = X^\zeta \circ \theta_{\chi_n}$ then form a Markov chain in the
path space. A similar construction for $\tilde X$ yields some processes
$\tilde Y_n^{a,b}$, and we note that $(Y_n^{a,b}) \overset{d}{=} (\tilde Y_n^{a,b})$ for fixed $a$ and $b$.
Since the processes $Y_n^{a',b'}$ for any smaller interval $[a',b']$ can be
measurably recovered from those for $[a,b]$ and similarly for $\tilde Y_n^{a',b'}$, it
follows that the whole collections $(Y_n^{a,b})$ and $(\tilde Y_n^{a,b})$ have the same
distribution. By Theorem 6.10 we may then assume that the two families agree
a.s.
Now assume that $I = [a,b]$, where $a$ is reflecting. From the properties
of Brownian motion we note that the level sets $Z_a$ and $\tilde Z_a$ for $X$ and
$\tilde X$ are a.s. perfect. Thus, we may introduce the corresponding excursion
point processes $\xi$ and $\tilde\xi$, local times $L$ and $\tilde L$, and inverse local
times $T$ and $\tilde T$. Since the excursions within $[a,b)$ agree a.s. for $X$ and
$\tilde X$, it is clear from the law of large numbers that we may normalize the
excursion laws for the two processes such that the corresponding parts of
$\xi$ and $\tilde\xi$ agree a.s. Then even $T$ and $\tilde T$ agree, possibly apart from the
lengths of excursions that reach $b$ and the drift coefficient $c$ in Theorem
22.13. For $\tilde X$ the latter is proportional to the mass $\nu\{a\}$, which may now
be chosen such that $c$ becomes the same as for $X$. Note that this choice of
$\nu\{a\}$ is independent of starting position $x$ for the processes $X$ and
$\tilde X$.
If the other endpoint $b$ is absorbing, then clearly $X = \tilde X$ a.s., and the
proof is complete. If $b$ is instead reflecting, then the excursions from $b$
agree a.s. for $X$ and $\tilde X$. Repeating the previous argument with the roles of
$a$ and $b$ interchanged, we get $X = \tilde X$ a.s. after a suitable adjustment of
the mass $\nu\{b\}$. □
We proceed to classify the boundary behavior of a regular diffusion on a
natural scale in terms of the speed measure $\nu$. A right endpoint $b$ is called
an entrance boundary for $X$ if $b$ is inaccessible, and yet
$$\lim_{r \to \infty} \inf_{y > x} P_y\{\tau_x \le r\} > 0, \quad x \in I^\circ. \tag{21}$$
By the Markov property at times $nr$, $n \in \mathbb{N}$, the limit in (21) then equals
1. In particular, $P_y\{\tau_x < \infty\} = 1$ for all $x < y$ in $I^\circ$. As we shall see
in Theorem 23.13, an entrance boundary is an endpoint where $X$ may enter
but not exit.
The opposite situation occurs at an exit boundary. By this we mean an
endpoint $b$ that is accessible and yet naturally absorbing, in the sense that
it remains absorbing even when the charge $\nu\{b\}$ is reduced to zero. If $b$
is accessible but not naturally absorbing, we have already seen how the
boundary behavior of $X$ depends on the value of $\nu\{b\}$. Thus, $b$ in this case
is absorbing when $\nu\{b\} = \infty$, slowly reflecting when $\nu\{b\} \in (0,\infty)$,
and fast reflecting when $\nu\{b\} = 0$. For reflecting $b$ it is further clear from
Theorem 23.9 that the set $Z_b = \{t \ge 0;\ X_t = b\}$ is a.s. perfect.
Theorem 23.12 (boundary behavior, Feller) Let $\nu$ be the speed measure
of a regular diffusion on a natural scale in some interval $I = [a,b]$, and fix
any $u \in I^\circ$. Then
(i) $b$ is accessible iff it is finite with $\int_u^b (b-x)\,\nu(dx) < \infty$;
(ii) $b$ is accessible and reflecting iff it is finite with $\nu(u,b] < \infty$;
(iii) $b$ is an entrance boundary iff it is infinite with $\int_u^\infty x\,\nu(dx) < \infty$.
The stated conditions may be translated into corresponding criteria for
arbitrary regular diffusions. In the general case it is clear that exit and other
accessible boundaries may be infinite, whereas entrance boundaries may be
finite. Explosion is said to occur when $X$ reaches an infinite boundary point
in finite time. An interesting example of a regular diffusion on $(0,\infty)$ with
0 as an entrance boundary is given by the Bessel process $X_t = |B_t|$, where
$B$ is a Brownian motion in $\mathbb{R}^d$ with $d \ge 2$.
Proof of Theorem 23.12: (i) Since $\limsup_s (\pm B_s) = \infty$ a.s., Theorem
23.9 shows that $X$ cannot explode, so any accessible endpoint is finite.
Now assume that $a < c < u < b < \infty$. Then Lemma 23.8 shows that $b$ is
accessible iff $h_{c,b}(u) < \infty$, which by (15) is equivalent to
$\int_u^b (b-x)\,\nu(dx) < \infty$.
(ii) In this case $b < \infty$ by (i), and then Lemma 23.2 shows that $b$ is
absorbing iff $\nu(u,b] = \infty$.
(iii) An entrance boundary $b$ is inaccessible by definition, and therefore
$\tau_u = \tau_{u,b}$ a.s. when $a < u < b$. Arguing as in the proof of Lemma 23.8,
we also note that $E_y \tau_u$ is bounded for $y > u$. If $b < \infty$, we obtain the
contradiction $E_y \tau_u = h_{u,b}(y) = \infty$, and so $b$ must be infinite. From (15)
we get by monotone convergence as $y \to \infty$
$$E_y \tau_u = h_{u,\infty}(y) = 2\int_u^\infty (x \wedge y - u)\,\nu(dx) \to 2\int_u^\infty (x - u)\,\nu(dx),$$
which is finite iff $\int_u^\infty x\,\nu(dx) < \infty$. □
We proceed to establish an important regularity property, which also
clarifies the nature of entrance boundaries.
Theorem 23.13 (entrance laws and Feller properties) Given a regular
diffusion on $I$, form an extended interval $\bar I$ by attaching the possible
entrance boundaries to $I$. Then the original diffusion extends to a continuous
Feller process on $\bar I$.
Proof: For any $f \in C_b$, $a, x \in I$, and $r, t \ge 0$, we get by the strong
Markov property at $\tau_x \wedge r$
$$E_a f(X_{\tau_x \wedge r + t}) = E_a T_t f(X_{\tau_x \wedge r}) = T_t f(x)\,P_a\{\tau_x \le r\} + E_a[T_t f(X_r);\ \tau_x > r]. \tag{22}$$
To show that $T_t f$ is left-continuous at some $y \in I$, fix any $a < y$ in
$I^\circ$, and choose $r > 0$ so large that $P_a\{\tau_y \le r\} > 0$. As $x \uparrow y$, we have
$\tau_x \uparrow \tau_y$ and hence $\{\tau_x \le r\} \downarrow \{\tau_y \le r\}$. Thus, the
probabilities and expectations in (22) converge to the corresponding
expressions for $\tau_y$, and we get $T_t f(x) \to T_t f(y)$. The proof of the
right-continuity is similar.
If an endpoint $b$ is inaccessible but not of entrance type, and if $f(x) \to 0$
as $x \to b$, then clearly even $T_t f(x) \to 0$ at $b$ for each $t > 0$. Now assume
that $\infty$ is an entrance boundary, and consider a function $f$ with a finite
limit at $\infty$. We need to show that even $T_t f(x)$ converges as $x \to \infty$ for
fixed $t$. Then conclude from Lemma 23.10 that as $a \to \infty$,
$$\sup_{x \ge a} E_x \tau_a = 2 \sup_{x \ge a} \int_a^\infty (x \wedge r - a)\,\nu(dr) = 2\int_a^\infty (r - a)\,\nu(dr) \to 0. \tag{23}$$
Next we note that, for any $a < x < y$ and $r \ge 0$,
$$P_y\{\tau_a \le r\} \le P_y\{\tau_x \le r,\ \tau_a - \tau_x \le r\} = P_y\{\tau_x \le r\}\,P_x\{\tau_a \le r\} \le P_x\{\tau_a \le r\}.$$
Thus $P_x \circ \tau_a^{-1}$ converges vaguely as $x \to \infty$ for fixed $a$, and in view
of (23) the convergence holds even in the weak sense.
Now fix any $t$ and $f$, and introduce for each $a$ the continuous function
$g_a(s) = E_a f(X_{(t-s)_+})$. By the strong Markov property at $\tau_a \wedge t$ and
Theorem 6.4 we get for any $x, y \ge a$
$$|T_t f(x) - T_t f(y)| \le |E_x g_a(\tau_a) - E_y g_a(\tau_a)| + 2\|f\|\,(P_x + P_y)\{\tau_a > t\}.$$
Here the right-hand side tends to zero as $x, y \to \infty$ and then $a \to \infty$,
because of (23) and the weak convergence of $P_x \circ \tau_a^{-1}$. Thus, $T_t f(x)$ is
Cauchy convergent as $x \to \infty$, and we may denote the limit by $T_t f(\infty)$.
It is now easy to check that the extended operators $T_t$ form a Feller
semigroup on $C_0(\bar I)$. Finally, it is clear from Theorem 19.15 that the
associated process starting at a possible entrance boundary again has a
continuous version, in the topology of $\bar I$. □
We proceed to establish a ratio ergodic theorem for elementary additive
functionals of a recurrent diffusion. It is instructive to compare with the
general ratio limit theorems of Chapter 20.
Theorem 23.14 (ratio ergodic theorem, Derman, Motoo and Watanabe)
Let $X$ be a regular, recurrent diffusion on a natural scale and with speed
measure $\nu$. Then for any measurable functions $f, g \ge 0$ on $I$ with
$\nu f < \infty$ and $\nu g > 0$,
$$\lim_{t \to \infty} \frac{\int_0^t f(X_s)\,ds}{\int_0^t g(X_s)\,ds} = \frac{\nu f}{\nu g} \quad \text{a.s. } P_x, \quad x \in I.$$
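Proof: Fix any $a < b$ in $I$, put $\tau_a^b = \tau_b + \tau_a \circ \theta_{\tau_b}$, and define
recursively some optional times $\sigma_0, \sigma_1, \dots$ by
$$\sigma_{n+1} = \sigma_n + \tau_a^b \circ \theta_{\sigma_n}, \quad n \ge 0,$$
starting with $\sigma_0 = \tau_a$. Write
$$\int_0^{\sigma_n} f(X_s)\,ds = \int_0^{\sigma_0} f(X_s)\,ds + \sum_{k=1}^n \int_{\sigma_{k-1}}^{\sigma_k} f(X_s)\,ds, \tag{24}$$
and note that the terms of the last sum are i.i.d. By the strong Markov
property and Lemma 23.10, we get for any $x \in I$
$$\begin{aligned}
E_x \int_{\sigma_{k-1}}^{\sigma_k} f(X_s)\,ds
&= E_a \int_0^{\tau_b} f(X_s)\,ds + E_b \int_0^{\tau_a} f(X_s)\,ds \\
&= \int f(y)\,\{g_{-\infty,b}(y,a) + g_{a,\infty}(y,b)\}\,\nu(dy) \\
&= 2\int f(y)\,\{(b - y \vee a)_+ + (y \wedge b - a)_+\}\,\nu(dy) \\
&= 2(b-a)\,\nu f.
\end{aligned}$$
From the same lemma, we also see that the first term in (24) is a.s. finite.
Hence, by the law of large numbers
$$\lim_{n \to \infty} n^{-1} \int_0^{\sigma_n} f(X_s)\,ds = 2(b-a)\,\nu f \quad \text{a.s. } P_x, \quad x \in I.$$
Writing $\kappa_t = \sup\{n \ge 0;\ \sigma_n \le t\}$, we get by monotone interpolation
$$\lim_{t \to \infty} \kappa_t^{-1} \int_0^t f(X_s)\,ds = 2(b-a)\,\nu f \quad \text{a.s. } P_x, \quad x \in I. \tag{25}$$
This remains true when $\nu f = \infty$, since we can then apply (25) to some
approximating functions $f_n \uparrow f$ with $\nu f_n < \infty$ and let $n \to \infty$. The
assertion now follows as we apply (25) to both $f$ and $g$. □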
We may finally classify the asymptotic behavior of the process, according
to the boundedness of the speed measure $\nu$ and the nature of the endpoints.
For convenience, we may first apply an affine mapping to transform $I^\circ$
into one of the intervals $(0,1)$, $(0,\infty)$, or $(-\infty,\infty)$. Since finite
endpoints may be either inaccessible, absorbing, or reflecting (represented
below by the brackets (, [, and [[, respectively), we need to distinguish
between ten different cases.
We say that a diffusion is $\nu$-ergodic if it is recurrent and such that
$P_x \circ X_t^{-1} \to \nu/\nu I$ for all $x$. A recurrent diffusion may be either
null-recurrent or positive recurrent, depending on whether
$|X_t| \xrightarrow{P} \infty$ or not. Let us also recall that absorption occurs at an
endpoint $b$ whenever $X_t = b$ for all sufficiently large $t$.
Theorem 23.15 (recurrence and ergodicity, Feller, Maruyama and Tanaka)
For any regular diffusion on a natural scale and with speed measure $\nu$, the
ergodic behavior is the following, depending on initial position $x$ and the
nature of the boundaries:
$(-\infty,\infty)$: $\nu$-ergodic if $\nu$ is bounded, otherwise null-recurrent;
$(0,\infty)$: converges to 0 a.s.;
$[0,\infty)$: absorbed at 0 a.s.;
$[[0,\infty)$: $\nu$-ergodic if $\nu$ is bounded, otherwise null-recurrent;
$(0,1)$: converges to 0 or 1 with probabilities $1-x$ and $x$, respectively;
$[0,1)$: absorbed at 0 or converges to 1 with probabilities $1-x$ and $x$,
respectively;
$[0,1]$: absorbed at 0 or 1 with probabilities $1-x$ and $x$, respectively;
$[[0,1)$: converges to 1 a.s.;
$[[0,1]$: absorbed at 1 a.s.;
$[[0,1]]$: $\nu$-ergodic.
We begin our proof with the relatively elementary recurrence properties,
which distinguish between the possibilities of absorption, convergence, and
recurrence.
Proof of recurrence properties:
$[0,1]$: Relation (10) yields $P_x\{\tau_0 < \infty\} = 1 - x$ and
$P_x\{\tau_1 < \infty\} = x$.
$[0,\infty)$: By (10) we have for any $b > x$
$$P_x\{\tau_0 < \infty\} \ge P_x\{\tau_0 < \tau_b\} = (b - x)/b,$$
which tends to 1 as $b \to \infty$.
$(-\infty,\infty)$: The recurrence follows from the previous case.
$[[0,\infty)$: Since 0 is reflecting, we have $P_0\{\tau_y < \infty\} > 0$ for some
$y > 0$. By the strong Markov property and the regularity of $X$, this extends
to arbitrary $y$. Arguing as in the proof of Lemma 23.8, we may conclude
that $P_0\{\tau_y < \infty\} = 1$ for all $y > 0$. The asserted recurrence now
follows, as we combine with the statement for $[0,\infty)$.
$(0,\infty)$: In this case $X = B \circ [X]$ a.s. for some Brownian motion $B$.
Since $X > 0$, we have $[X]_\infty < \infty$ a.s., and therefore $X$ converges a.s.
Now $P_y\{\tau_{a,b} < \infty\} = 1$ for any $0 < a < y < b$. Applying the Markov
property at an arbitrary time $t > 0$, we conclude that a.s. either
$\liminf_t X_t \le a$ or $\limsup_t X_t \ge b$. Since $a$ and $b$ are arbitrary, it
follows that $X_\infty$ is an endpoint of $(0,\infty)$ and hence equals 0.
$(0,1)$: Arguing as in the previous case, we get a.s. convergence to either 0
or 1. To find the corresponding probabilities, we conclude from (10) that
$$P_x\{\tau_a < \infty\} \ge P_x\{\tau_a < \tau_b\} = \frac{b-x}{b-a}, \quad 0 < a < x < b < 1.$$
Letting $b \to 1$ and then $a \to 0$, we obtain $P_x\{X_\infty = 0\} \ge 1 - x$.
Similarly, $P_x\{X_\infty = 1\} \ge x$, and so equality holds in both relations.
$[0,1)$: Again $X$ converges to either 0 or 1 with probabilities $1-x$ and $x$,
respectively. Furthermore, we note that
$$P_x\{\tau_0 < \infty\} \ge P_x\{\tau_0 < \tau_b\} = (b - x)/b, \quad 0 \le x < b < 1,$$
which tends to $1 - x$ as $b \to 1$. Thus, $X$ gets absorbed when it
approaches 0.
$[[0,1]]$: Arguing as in the previous case, we get $P_0\{\tau_1 < \infty\} = 1$, and
by symmetry we also have $P_1\{\tau_0 < \infty\} = 1$.
$[[0,1]$: Again we get $P_0\{\tau_1 < \infty\} = 1$, so the same relation holds for
$P_x$.
$[[0,1)$: As before, we get $P_0\{\tau_b < \infty\} = 1$ for all $b \in (0,1)$. By the
strong Markov property at $\tau_b$ and the result for $[0,1)$ it follows that
$P_0\{X_t \to 1\} \ge b$. Letting $b \to 1$, we obtain $X_t \to 1$ a.s. under $P_0$.
The result for $P_x$ now follows by the strong Markov property at $\tau_x$,
applied under $P_0$. □
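The exit probabilities $1 - x$ and $x$ used above come from the natural scale. As a rough illustration, and not as part of the proof, the following sketch solves the discrete analogue exactly: a symmetric random walk on $\{0, \dots, n\}$ (an assumption standing in for the diffusion), where the probability of reaching $n$ before 0 from $k$ should be $k/n$. The function name and the elimination scheme are choices made for this example.

```python
from fractions import Fraction


def hit_top_probabilities(n):
    """Solve p_k = (p_{k-1} + p_{k+1}) / 2 for 0 < k < n, p_0 = 0, p_n = 1.

    p_k is the probability that a symmetric random walk on {0, ..., n}
    started at k reaches n before 0.  Exact arithmetic via Fraction.
    """
    half = Fraction(1, 2)
    # Forward sweep: write p_k = c[k] * p_{k+1} + d[k] for k = 0..n-1.
    c = [Fraction(0)] * n
    d = [Fraction(0)] * n  # p_0 = 0 gives c[0] = d[0] = 0
    for k in range(1, n):
        denom = 1 - half * c[k - 1]
        c[k] = half / denom
        d[k] = half * d[k - 1] / denom
    # Back substitution starting from p_n = 1.
    p = [Fraction(0)] * (n + 1)
    p[n] = Fraction(1)
    for k in range(n - 1, -1, -1):
        p[k] = c[k] * p[k + 1] + d[k]
    return p


probs = hit_top_probabilities(10)
```

Here `probs[k]` plays the role of $P_x\{\tau_1 < \tau_0\}$ with $x = k/n$, so the linear dependence on the starting point mirrors the case $[0,1]$ of the theorem.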
The ergodic properties will be proved along the lines of Theorem 8.18,
which requires some additional lemmas.
Lemma 23.16 (coupling) If $X$ and $Y$ are independent Feller processes,
then the pair $(X, Y)$ is again Feller.
Proof: Use Theorem 4.29 and Lemma 19.3. □
The next result is a continuous-time counterpart of Lemma 8.20.
Lemma 23.17 (strong ergodicity) Given a regular, recurrent diffusion,
we have for any initial distributions $\mu_1$ and $\mu_2$
$$\lim_{t \to \infty} \bigl\| P_{\mu_1} \circ \theta_t^{-1} - P_{\mu_2} \circ \theta_t^{-1} \bigr\| = 0.$$
Proof: Let $X$ and $Y$ be independent with distributions $P_{\mu_1}$ and
$P_{\mu_2}$, respectively. By Theorem 23.13 and Lemma 23.16 the pair $(X, Y)$
can be extended to a Feller diffusion, and so by Theorem 19.17 it is again
strong Markov with respect to the induced filtration $\mathcal{G}$. Define
$\tau = \inf\{t \ge 0;\ X_t = Y_t\}$, and note that $\tau$ is $\mathcal{G}$-optional by Lemma
7.6. The assertion now follows as in case of Lemma 8.20, provided we can
show that $\tau < \infty$ a.s.
To see this, assume first that $I = \mathbb{R}$. The processes $X$ and $Y$ are
then continuous local martingales. By independence they remain local
martingales for the extended filtration $\mathcal{G}$, and so even $X - Y$ is a local
$\mathcal{G}$-martingale. Using the independence and recurrence of $X$ and $Y$, we get
$[X - Y]_\infty = [X]_\infty + [Y]_\infty = \infty$ a.s., which shows that even $X - Y$ is
recurrent. In particular, $\tau < \infty$ a.s.
Next let $I = [[0,\infty)$ or $[[0,1]]$, and define $\tau_1 = \inf\{t \ge 0;\ X_t = 0\}$
and $\tau_2 = \inf\{t \ge 0;\ Y_t = 0\}$. By the continuity and recurrence of $X$ and
$Y$, we get $\tau \le \tau_1 \vee \tau_2 < \infty$ a.s. □
Our next result is similar to the discrete-time version in Lemma 8.21.
Lemma 23.18 (existence) Every regular, positive recurrent diffusion has
an invariant distribution.
Proof: By Theorem 23.13 we may regard the transition kernels $\mu_t$
with associated operators $T_t$ as defined on $\bar I$, the interval $I$ with
possible entrance boundaries adjoined. Since $X$ is not null-recurrent, we may
choose a bounded Borel set $B$ and some $x_0 \in I$ and $t_n \to \infty$ such that
$\inf_n \mu_{t_n}(x_0, B) > 0$. By Theorem 5.19 there exists some measure $\mu$ on
$\bar I$ with $\mu \bar I > 0$ such that $\mu_{t_n}(x_0, \cdot) \xrightarrow{v} \mu$ along a
subsequence, in the topology of $\bar I$. The convergence extends by Lemma
23.17 to arbitrary $x \in I$, and so
$$T_{t_n} f(x) \to \mu f, \quad f \in C_0(\bar I), \ x \in I. \tag{26}$$
Now fix any $h > 0$ and $f \in C_0(\bar I)$, and note that even
$T_h f \in C_0(\bar I)$ by Theorem 23.13. Using (26), the semigroup property, and
dominated convergence, we get for any $x \in I$
$$\mu(T_h f) \leftarrow T_{t_n}(T_h f)(x) = T_h(T_{t_n} f)(x) \to \mu f.$$
Thus, $\mu\mu_h = \mu$ for all $h$, which means that $\mu$ is invariant on $\bar I$. In
particular, $\mu(\bar I \setminus I) = 0$ by the nature of entrance boundaries, and so
the normalized measure $\mu/\mu I$ is an invariant distribution on $I$. □
Our final lemma provides the crucial connection between speed measure
and invariant distributions.
Lemma 23.19 (positive recurrence) For a regular, recurrent diffusion on
a natural scale and with speed measure $\nu$, these conditions are equivalent:
(i) $\nu I < \infty$;
(ii) the process is positive recurrent;
(iii) an invariant distribution exists.
The invariant distribution is then unique and equals $\nu/\nu I$.
Proof: If the process is null-recurrent, then clearly no invariant
distribution exists. The converse is also true by Lemma 23.18, and so (ii) and
(iii) are equivalent. Now fix any bounded, measurable function
$f: I \to \mathbb{R}_+$ with bounded support. By Theorem 23.14, Fubini's theorem, and
dominated convergence, we have for any distribution $\mu$ on $I$
$$t^{-1} \int_0^t E_\mu f(X_s)\,ds = E_\mu\, t^{-1} \int_0^t f(X_s)\,ds \to \frac{\nu f}{\nu I}.$$
If $\mu$ is invariant, we get $\mu f = \nu f/\nu I$, and so $\nu I < \infty$. If instead
$X$ is null-recurrent, then $E_\mu f(X_s) \to 0$ as $s \to \infty$, and we get
$\nu f/\nu I = 0$, which implies $\nu I = \infty$. □
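The identification of the invariant distribution with the normalized speed measure can be illustrated on a discrete analogue. The sketch below is an assumption-laden toy model, not the book's construction: it replaces the diffusion $dX = \sigma(X)\,dB$ on $[0,1]$ with reflecting endpoints by a birth-death chain on a grid that jumps to each existing neighbour at rate $\sigma(x)^2/(2h^2)$, and checks that weights proportional to $\sigma^{-2}$ (the speed density) are stationary. The grid size, function name, and the example coefficient are hypothetical.

```python
def stationary_residual(sigma, m=50):
    """Check pi Q = 0 for a birth-death chain approximating dX = sigma(X) dB.

    Grid points are k/m for k = 0..m.  From each point the chain jumps to
    each existing neighbour at rate sigma(x)^2 / (2 h^2).  The candidate
    stationary law pi is proportional to sigma^{-2}.  Returns the largest
    absolute entry of pi Q together with pi.
    """
    h = 1.0 / m
    xs = [k * h for k in range(m + 1)]
    rate = [sigma(x) ** 2 / (2.0 * h * h) for x in xs]
    weight = [sigma(x) ** -2 for x in xs]
    total = sum(weight)
    pi = [w / total for w in weight]
    residual = 0.0
    for j in range(m + 1):
        nbrs = [i for i in (j - 1, j + 1) if 0 <= i <= m]
        inflow = sum(pi[i] * rate[i] for i in nbrs)   # mass arriving at j
        outflow = pi[j] * rate[j] * len(nbrs)          # mass leaving j
        residual = max(residual, abs(inflow - outflow))
    return residual, pi


# Hypothetical diffusion coefficient chosen only for illustration.
res, pi = stationary_residual(lambda x: 1.0 + x)
```

The residual vanishes up to rounding because the chain satisfies detailed balance with respect to weights proportional to $\sigma^{-2}$, which is the discrete counterpart of the statement that the invariant distribution equals $\nu/\nu I$.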
End of proof of Theorem 23.15: It remains to consider the cases when $I$
is either $(-\infty,\infty)$, $[[0,\infty)$, or $[[0,1]]$, since we have otherwise
convergence or absorption at some endpoint. In case of $[[0,1]]$ we note from
Theorem 23.12 (ii) that $\nu$ is bounded. In the remaining cases $\nu$ may be
unbounded, and then $X$ is null-recurrent by Lemma 23.19. If $\nu$ is bounded,
then $\mu = \nu/\nu I$ is invariant by the same lemma, and the asserted
$\nu$-ergodicity follows from Lemma 23.17 with $\mu_1 = \mu$. □
Exercises
1. Prove pathwise uniqueness for the SDE $dX_t = (X_t^+)^{1/2}\,dB_t + c\,dt$ with
$c > 0$. Also show that the solutions $X^x$ with $X_0^x = x$ satisfy
$X_t^x \le X_t^y$ a.s. for $x \le y$ up to the time when $X^x$ reaches 0.
2. Let $X$ be Brownian motion in $\mathbb{R}^d$, absorbed at 0. Show that $Y = |X|^2$ is
a regular diffusion on $(0,\infty)$, describe its boundary behavior for different
$d$, and identify the corresponding case of Theorem 23.15. Verify the conclusion
by computing the associated scale function and speed measure.
3. Show that solutions to equation $dX_t = \sigma(X_t)\,dB_t$ cannot explode. (Hint:
If $X$ explodes at time $\zeta < \infty$, then $[X]_\zeta = \infty$, and the local time of
$X$ tends to $\infty$ as $t \to \zeta$, uniformly on compacts. Now use Theorem 22.5 to
see that $\zeta = \infty$ a.s.)
4. Assume in Theorem 23.1 that $S_\sigma = N_\sigma$. Show that the solutions $X$ to (3)
form a regular diffusion on a natural scale on every connected component
$I$ of $S_\sigma^c$. Also note that the endpoints of $I$ are either absorbing or exit
boundaries for $X$. (Hint: Use Theorems 21.11, 22.4, and 22.5, and show
that the exit time from any compact interval $J \subset I$ is finite.)
5. Assume in Theorem 23.1 that $S_\sigma \subset N_\sigma$, and form $\tilde\sigma$ from $\sigma$ by
taking $\tilde\sigma(x) = 1$ on $A = N_\sigma \setminus S_\sigma$. Show that any solution $X$ to
equation $(\tilde\sigma, 0)$ also solves equation $(\sigma, 0)$, but not conversely unless
$A = \emptyset$. (Hint: Since $\lambda A = 0$, we have
$\int 1_A(X_t)\,dt = \int 1_A(X_t)\,d[X]_t = 0$ a.s. by Theorem 22.5.)
6. Assume in Theorem 23.1 that $S_\sigma \subset N_\sigma$. Show that equation $(\sigma, 0)$ has
solutions that form a regular diffusion on every connected component of
$S_\sigma^c$. Prove the corresponding statement for the connected components of
$N_\sigma^c$ when $N_\sigma$ is closed. (Hint: For $S_\sigma^c$, use the preceding result. For
$N_\sigma^c$, take $X$ to be absorbed when it first reaches $N_\sigma$.)
7. In the setting of Theorem 23.14, show that the stated relation implies
the convergence in Corollary 20.8 (i). Also use the result to prove a law of
large numbers for regular, recurrent diffusions with bounded speed measure
$\nu$. (Hint: Note that $\nu g > 0$ implies $\int g(X_s)\,ds > 0$ a.s.)
Chapter 24
Connections with PDEs
and Potential Theory
Backward equation and Feynman-Kac formula; uniqueness
for SDEs from existence for PDEs; harmonic functions and
Dirichlet's problem; Green functions as occupation densities;
sweeping and equilibrium problems; dependence on conductor
and domain; time reversal; capacities and random sets
In Chapters 19 and 21 we saw how elliptic differential operators arise natu-
rally in probability theory as the generators of nice diffusion processes. This
fact is the ultimate cause of some profound connections between probability
theory and partial differential equations (PDEs). In particular, a suitable
extension of the operator $\frac12\Delta$ appears as the generator of Brownian
motion in $\mathbb{R}^d$, which leads to a close relationship between classical
potential theory
and the theory of Brownian motion. More specifically, many basic problems
in potential theory can be solved by probabilistic methods, and, conversely,
various hitting distributions for Brownian motion can be given a potential
theoretic interpretation.
This chapter explores some of the mentioned connections. First we derive
the celebrated Feynman-Kac formula and show how existence of solutions
to a given Cauchy problem implies uniqueness of solutions to the associated
SDE. We then proceed with a probabilistic construction of Green functions
and potentials and solve the Dirichlet, sweeping, and equilibrium problems
of classical potential theory in terms of Brownian motion. Finally, we show
how Green capacities and alternating set functions can be represented in a
natural way in terms of random sets.
Some stochastic calculus from Chapters 17 and 21 is used at the begin-
ning of the chapter, and we also rely on the theory of Feller processes from
Chapter 19. As for Brownian motion, the present discussion is essentially
self-contained, apart from some elementary facts cited from Chapters 13
and 18. Occasionally we refer to Chapters 4 and 16 for some basic weak
convergence theory. Finally, the results at the end of the chapter require the
existence of Poisson processes from Proposition 12.5, as well as some basic
facts about the Fell topology listed in Theorem A2.5. Potential theoretic
ideas are used in several other chapters, and additional, though essentially
unrelated, results appear especially in Chapters 20, 22, and 25.
To begin with the general PDE connections, we consider an arbitrary
Feller diffusion in $\mathbb{R}^d$ with associated semigroup operators $T_t$ and
generator $(A, \mathcal{D})$. Recall from Theorem 19.6 that, for any $f \in \mathcal{D}$, the
function
$$u(t,x) = T_t f(x) = E_x f(X_t), \quad t \ge 0, \ x \in \mathbb{R}^d,$$
satisfies Kolmogorov's backward equation $\dot u = Au$, where
$\dot u = \partial u/\partial t$. Thus, $u$ provides a probabilistic solution to the Cauchy
problem
$$\dot u = Au, \qquad u(0,x) = f(x). \tag{1}$$
Let us now add a potential term $vu$ to (1), where $v: \mathbb{R}^d \to \mathbb{R}_+$, and
consider the more general problem
$$\dot u = Au - vu, \qquad u(0,x) = f(x). \tag{2}$$
Here the solution may be expressed in terms of the elementary multiplicative
functional $e^{-V}$, where
$$V_t = \int_0^t v(X_s)\,ds, \quad t \ge 0.$$
Let $C^{1,2}$ denote the class of functions $f: \mathbb{R}_+ \times \mathbb{R}^d \to \mathbb{R}$ that are
of class $C^1$ in the time variable and of class $C^2$ in the space variables.
Write $C_b(\mathbb{R}^d)$ and $C_b^+(\mathbb{R}^d)$ for the classes of bounded, continuous
functions from $\mathbb{R}^d$ to $\mathbb{R}$ and $\mathbb{R}_+$, respectively.
Theorem 24.1 (Cauchy problem, Feynman, Kac) Let $(A, \mathcal{D})$ be the
generator of a Feller diffusion in $\mathbb{R}^d$, and fix any $f \in C_b(\mathbb{R}^d)$ and
$v \in C_b^+(\mathbb{R}^d)$. Then any bounded solution $u \in C^{1,2}$ to (2) is given by
$$u(t,x) = E_x e^{-V_t} f(X_t), \quad t \ge 0, \ x \in \mathbb{R}^d. \tag{3}$$
Conversely, (3) solves (2) whenever $f \in \mathcal{D}$.
The expression in (3) has an interesting interpretation in terms of killing.
To see this, we may introduce an exponential random variable
$\gamma \perp\!\!\!\perp X$ with mean 1, and define $\zeta = \inf\{t \ge 0;\ V_t > \gamma\}$. Letting
$\tilde X$ denote the process $X$ killed at time $\zeta$, we may express the
right-hand side of (3) as $E_x f(\tilde X_t)$, with the understanding that
$f(\tilde X_t) = 0$ when $t \ge \zeta$. In other words, $u(t,x) = \tilde T_t f(x)$, where
$\tilde T_t$ is the transition operator of the killed process. It is easy to verify
directly from (3) that the family $(\tilde T_t)$ is again a Feller semigroup.
Proof of Theorem 24.1: Assume that $u \in C^{1,2}$ is bounded and solves (2),
and define for fixed $t > 0$
$$M_s = e^{-V_s} u(t - s, X_s), \quad s \in [0,t].$$
Letting $\overset{m}{=}$ denote equality apart from a continuous local martingale or
its differential, we see from Lemma 19.21, Itô's formula, and (2) that, for any
$s < t$,
$$\begin{aligned}
dM_s &= e^{-V_s}\{du(t-s, X_s) - u(t-s, X_s)\,v(X_s)\,ds\} \\
&\overset{m}{=} e^{-V_s}\{Au(t-s, X_s) - \dot u(t-s, X_s) - u(t-s, X_s)\,v(X_s)\}\,ds = 0.
\end{aligned}$$
Thus, $M$ is a continuous local martingale on $[0,t)$. Since $M$ is bounded,
the martingale property extends to $t$, and we get
$$u(t,x) = E_x M_0 = E_x M_t = E_x e^{-V_t} u(0, X_t) = E_x e^{-V_t} f(X_t).$$
Next let $u$ be given by (3) for some $f \in \mathcal{D}$. Integrating by parts and
using Lemma 19.21, we obtain
$$d\{e^{-V_t} f(X_t)\} = e^{-V_t}\{df(X_t) - (vf)(X_t)\,dt\} \overset{m}{=} e^{-V_t}(Af - vf)(X_t)\,dt.$$
Taking expectations and differentiating at $t = 0$, we conclude that the
generator of the semigroup $\tilde T_t f(x) = E_x f(\tilde X_t) = u(t,x)$ equals
$\tilde A = A - v$ on $\mathcal{D}$. Equation (2) now follows by the last assertion in
Theorem 19.6. □
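A closed-form special case makes (2) and (3) easy to check numerically. The sketch below assumes one-dimensional Brownian motion ($A = \frac12\,d^2/dx^2$), a constant potential $v \equiv c$, and $f(x) = \cos(kx)$; these choices, the parameter values, and the function names are illustrative assumptions. Since $E_x \cos(kX_t) = e^{-k^2 t/2}\cos(kx)$ for Brownian motion, formula (3) gives $u(t,x) = e^{-(c + k^2/2)t}\cos(kx)$, and a central-difference residual of $\dot u - \frac12 u'' + cu$ should be close to zero.

```python
import math


def u(t, x, c=0.7, k=1.3):
    """Formula (3) in closed form for 1-d Brownian motion, constant
    potential v = c and initial function f(x) = cos(k x)."""
    return math.exp(-(c + 0.5 * k * k) * t) * math.cos(k * x)


def pde_residual(t, x, c=0.7, k=1.3, eps=1e-4):
    """Central-difference estimate of u_t - (1/2) u_xx + c u, cf. (2)."""
    u_t = (u(t + eps, x, c, k) - u(t - eps, x, c, k)) / (2.0 * eps)
    u_xx = (u(t, x + eps, c, k) - 2.0 * u(t, x, c, k)
            + u(t, x - eps, c, k)) / eps ** 2
    return u_t - 0.5 * u_xx + c * u(t, x, c, k)


residual = pde_residual(0.5, 0.2)
initial_value = u(0.0, 0.2)  # should agree with f(0.2) = cos(1.3 * 0.2)
```

The factor $e^{-ct}$ is exactly the survival probability $P\{\zeta > t\}$ of the killed process when $v \equiv c$, which matches the killing interpretation of (3).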
The converse part of Theorem 24.1 can often be improved in special
cases. In particular, if $v = 0$ and
$A = \frac12\Delta = \frac12 \sum_i \partial^2/\partial x_i^2$, so that $X$
is a Brownian motion and (2) reduces to the standard heat equation, then
$u(t,x) = E_x f(X_t)$ solves (2) for any bounded, continuous function $f$ on
$\mathbb{R}^d$. To see this, we note that $u \in C^{1,2}$ on $(0,\infty) \times \mathbb{R}^d$ because
of the smoothness of the Brownian transition density. We may then obtain (2)
by applying the backward equation to the function $T_h f(x)$ for a fixed
$h \in (0,t)$.
Let us now consider an SDE in $\mathbb{R}^d$ of the form
$$dX_t^i = \sigma_j^i(X_t)\,dB_t^j + b^i(X_t)\,dt, \tag{4}$$
and introduce the associated elliptic operator
$$Av(x) = \tfrac12 a^{ij}(x)\,v''_{ij}(x) + b^i(x)\,v'_i(x), \quad x \in \mathbb{R}^d, \ v \in C^2,$$
where $a^{ij} = \sigma_k^i \sigma_k^j$. The next result shows how uniqueness in law for
solutions to (4) may be inferred from the existence of solutions to the
associated Cauchy problem (1).
Theorem 24.2 (uniqueness, Stroock and Varadhan) If for every
$f \in C_0^\infty(\mathbb{R}^d)$ the Cauchy problem in (1) has a bounded solution on
$[0,\varepsilon] \times \mathbb{R}^d$ for some $\varepsilon > 0$, then uniqueness in law holds for the
SDE (4).
Proof: Fix any $f \in C_0^\infty$ and $t \in (0,\varepsilon]$, and let $u$ be a bounded
solution to (1) on $[0,t] \times \mathbb{R}^d$. If $X$ solves (4), we note as before that
$M_s = u(t - s, X_s)$ is a martingale on $[0,t]$, and so
$$E f(X_t) = E u(0, X_t) = E M_t = E M_0 = E u(t, X_0).$$
Thus, the one-dimensional distributions of $X$ on $[0,\varepsilon]$ are uniquely
determined by the initial distribution.
Now assume that $X$ and $Y$ are solutions with the same initial
distribution. To prove that their finite-dimensional distributions agree, it is
enough to consider times $0 = t_0 < t_1 < \cdots < t_n$ such that
$t_k - t_{k-1} \le \varepsilon$ for all $k$. Assume that the distributions agree at
$t_0, \dots, t_{n-1} = t$, and fix any set $C = \pi^{-1}_{t_0,\dots,t_{n-1}} B$ with
$B \in \mathcal{B}^{nd}$. By Theorem 21.7, both $\mathcal{L}(X)$ and $\mathcal{L}(Y)$ solve the local
martingale problem for $(a, b)$. If $P\{X \in C\} = P\{Y \in C\} > 0$, we see as in
case of Theorem 21.11 that the same property holds for the conditional
measures $P[\theta_t X \in \cdot \mid X \in C]$ and $P[\theta_t Y \in \cdot \mid Y \in C]$. Since the
corresponding initial distributions agree by hypothesis, the one-dimensional
result yields the extension
$$P\{X \in C,\ X_{t+h} \in \cdot\} = P\{Y \in C,\ Y_{t+h} \in \cdot\}, \quad h \in (0,\varepsilon].$$
In particular, the distributions agree at times $t_0, \dots, t_n$. The general
result now follows by induction. □
Let us now specialize to the case when $X$ is Brownian motion in $\mathbb{R}^d$. For
any closed set $B \subset \mathbb{R}^d$, we introduce the hitting time
$\tau_B = \inf\{t > 0;\ X_t \in B\}$ and associated hitting kernel
$$H_B(x, dy) = P_x\{\tau_B < \infty,\ X_{\tau_B} \in dy\}, \quad x \in \mathbb{R}^d.$$
For suitable functions $f$, we write $H_B f(x) = \int f(y)\,H_B(x, dy)$.
By a domain in $\mathbb{R}^d$ we mean an open, connected subset $D \subset \mathbb{R}^d$. A
function $u: D \to \mathbb{R}$ is said to be harmonic if it belongs to $C^2(D)$ and
satisfies the Laplace equation $\Delta u = 0$. We also say that $u$ has the
mean-value property if it is locally bounded and measurable, and such that for
any ball $B \subset D$ with center $x$, the average of $u$ over the boundary
$\partial B$ equals $u(x)$. The following analytic result is crucial for the
probabilistic developments.
Lemma 24.3 (harmonic functions, Gauss, Koebe) A fu.nction u on a do-
main D C JRd is harmonic iff it has the mean-value property, in which case
U E Coo(D).
Proof: First assume that $u \in C^2(D)$, and fix a ball $B \subset D$ with center
x. Writing $\tau = \tau_{\partial B}$ and noting that $E_x \tau < \infty$, we get by Itô's formula
$$E_x u(X_\tau) - u(x) = \tfrac{1}{2} E_x \int_0^\tau \Delta u(X_s)\, ds.$$
Here the first term on the left equals the average of u over $\partial B$, due to
the spherical symmetry of Brownian motion. If u is harmonic, then the
right-hand side vanishes, and the mean-value property follows. If instead u
is not harmonic, we may choose B such that $\Delta u \neq 0$ on B. But then the
right-hand side is nonzero, and so the mean-value property fails.
It remains to show that every function u with the mean-value property is
infinitely differentiable. Then fix any infinitely differentiable and spherically
symmetric probability density $\varphi$, supported by a ball of radius $\varepsilon > 0$ around
the origin. The mean-value property yields $u = u * \varphi$ on the set where
the right-hand side is defined, and by dominated convergence the infinite
differentiability of $\varphi$ carries over to $u * \varphi = u$. □
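The mean-value property in Lemma 24.3 can be checked numerically in a simple planar case. The following is only a sketch: the two test functions and the circle are illustrative choices, not taken from the text, and the circle average is approximated by an equally spaced sum.

```python
import math

def circle_average(u, cx, cy, r, n=2000):
    """Average of u over the circle of radius r centered at (cx, cy),
    approximated by n equally spaced points."""
    total = 0.0
    for k in range(n):
        theta = 2.0 * math.pi * k / n
        total += u(cx + r * math.cos(theta), cy + r * math.sin(theta))
    return total / n

def harmonic(x, y):
    # u(x, y) = x^2 - y^2 satisfies the Laplace equation
    return x * x - y * y

def not_harmonic(x, y):
    # u(x, y) = x^2 + y^2 has Laplacian 4, so the mean-value property fails
    return x * x + y * y

avg_h = circle_average(harmonic, 0.3, -0.2, 0.5)
avg_n = circle_average(not_harmonic, 0.3, -0.2, 0.5)
```

For the harmonic function the circle average agrees with the value at the center, whereas for the second function it exceeds the center value by $r^2$.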
Before proceeding to the potential theoretic developments, we need to
introduce a regularity condition on the domain D. Writing $\zeta = \zeta_D = \tau_{D^c}$,
we note that $P_x\{\zeta = 0\} = 0$ or 1 for every $x \in \partial D$ by Corollary 19.18.
When this probability is 1, we say that x is regular for $D^c$ or simply regular.
If this holds for every $x \in \partial D$, then the boundary $\partial D$ is said to be regular
and we refer to D as a regular domain.
Regularity is a fairly weak condition. In particular, any domain with a
smooth boundary is regular, and we shall see that even various edges and
corners are allowed, provided they are not too sharp and directed inward.
By a spherical cone in $\mathbb{R}^d$ with vertex v and axis $a \neq 0$ we mean a set of
the form $C = \{x;\ \langle x - v, a \rangle > c\,|x - v|\}$, where $c \in (0, |a|)$.
Lemma 24.4 (cone condition, Zaremba) Given a domain $D \subset \mathbb{R}^d$, let
$x \in \partial D$ be such that $C \cap G \subset D^c$ for some spherical cone C with
vertex x and some neighborhood G of x. Then x is regular for $D^c$.
Proof: By compactness of the unit sphere in $\mathbb{R}^d$, we may cover $\mathbb{R}^d$ by
$C_1 = C$ along with finitely many congruent cones $C_2, \dots, C_n$ with vertex
x. By rotational symmetry
$$1 = P_x\{\min_{k \le n} \tau_{C_k} = 0\} \le \sum_{k \le n} P_x\{\tau_{C_k} = 0\} = n P_x\{\tau_C = 0\},$$
and so $P_x\{\tau_C = 0\} > 0$. Hence, Corollary 19.18 yields $P_x\{\tau_C = 0\} = 1$, and
we get $\zeta_D \le \tau_{C \cap G} = 0$ a.s. $P_x$. □
Now fix a domain $D \subset \mathbb{R}^d$ and a continuous function $f : \partial D \to \mathbb{R}$.
A function u on $\bar D$ is said to solve the Dirichlet problem (D, f), if u is
harmonic on D and continuous on $\bar D$ with $u = f$ on $\partial D$. The solution may
be interpreted as the electrostatic potential in D when the potential on the
boundary is given by f.
Theorem 24.5 (Dirichlet problem, Kakutani, Doob) For any regular domain
$D \subset \mathbb{R}^d$ and function $f \in C_b(\partial D)$, the Dirichlet problem (D, f) is
solved by the function
$$u(x) = E_x[f(X_{\zeta_D});\ \zeta_D < \infty] = H_{D^c} f(x), \quad x \in D.$$ (5)
If $\zeta_D < \infty$ a.s., then this is the only bounded solution; when $d \ge 3$ and
$f \in C_0(\partial D)$, it is the only solution in $C_0(\bar D)$.
Thus, $H_{D^c}$ agrees with the sweeping (balayage) kernel of Newtonian
potential theory, which determines the harmonic measure on $\partial D$. The
following result clarifies the role of the regularity condition on $\partial D$.
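The representation (5) suggests a simple Monte Carlo approximation of the solution. The sketch below is an illustration under stated assumptions rather than a method from the text: it takes D to be the unit disk in the plane, uses the walk-on-spheres scheme (repeatedly jumping to a uniform point on the largest circle around the current position that stays inside D), and stops within a tolerance `eps` of the boundary. The function names, the tolerance, and the boundary data are hypothetical choices.

```python
import math
import random

def walk_on_spheres(x, y, f, rng, eps=1e-3):
    """One approximate sample of f(X_zeta) for planar Brownian motion started
    at (x, y) in the unit disk, stopped within eps of the unit circle."""
    while True:
        r = 1.0 - math.hypot(x, y)      # distance to the boundary circle
        if r < eps:
            break
        theta = rng.uniform(0.0, 2.0 * math.pi)
        # by rotational symmetry, the exit point from the inner circle is uniform
        x += r * math.cos(theta)
        y += r * math.sin(theta)
    norm = math.hypot(x, y)
    return f(x / norm, y / norm)         # project onto the boundary

def dirichlet_estimate(x, y, f, n_paths=4000, seed=0):
    """Monte Carlo average approximating u(x) = E_x f(X_zeta)."""
    rng = random.Random(seed)
    return sum(walk_on_spheres(x, y, f, rng) for _ in range(n_paths)) / n_paths

def boundary_data(bx, by):
    # f(y) = y_1, whose harmonic extension to the disk is u(x) = x_1
    return bx

estimate = dirichlet_estimate(0.4, 0.3, boundary_data)
```

With this boundary function the exact solution at the starting point (0.4, 0.3) is 0.4, so the estimate should be close to that value up to Monte Carlo error.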
Lemma 24.6 (regularity, Doob) A point $b \in \partial D$ is regular for $D^c$ iff, for
any $f \in C_b(\partial D)$, the function u in (5) satisfies $u(x) \to f(b)$ as $D \ni x \to b$.
Proof: First assume that b is regular. For any $t > h > 0$ and $x \in D$, we
get by the Markov property
$$P_x\{\zeta > t\} \le P_x\{\zeta \circ \theta_h > t - h\} = E_x P_{X_h}\{\zeta > t - h\}.$$
Here the right-hand side is continuous in x, by the continuity of the
Gaussian kernel and dominated convergence, and so
$$\limsup_{x \to b} P_x\{\zeta > t\} \le E_b P_{X_h}\{\zeta > t - h\} = P_b\{\zeta \circ \theta_h > t - h\}.$$
As $h \to 0$, the probability on the right tends to $P_b\{\zeta > t\} = 0$, and so
$P_x\{\zeta > t\} \to 0$ as $x \to b$, which means that $P_x \circ \zeta^{-1} \xrightarrow{w} \delta_0$. Since also
$P_x \xrightarrow{w} P_b$ in $C(\mathbb{R}_+, \mathbb{R}^d)$, Theorem 4.28 yields $P_x \circ (X, \zeta)^{-1} \xrightarrow{w} P_b \circ (X, 0)^{-1}$
in $C(\mathbb{R}_+, \mathbb{R}^d) \times [0, \infty]$. By the continuity of the mapping $(x, t) \mapsto x_t$ it
follows that $P_x \circ X_\zeta^{-1} \xrightarrow{w} P_b \circ X_0^{-1} = \delta_b$, and so $u(x) \to f(b)$ by the
continuity of f.
Next assume the stated condition. If $d = 1$, then D is an interval, which
is obviously regular. Now assume that $d \ge 2$. By the Markov property we
get for any $f \in C_b(\partial D)$
$$u(b) = E_b[f(X_\zeta);\ \zeta \le h] + E_b[u(X_h);\ \zeta > h], \quad h > 0.$$
As $h \to 0$, it follows by dominated convergence that $u(b) = f(b)$, and for
$f(x) = e^{-|x - b|}$ we get $P_b\{X_\zeta = b,\ \zeta < \infty\} = 1$. Since a.s. $X_t \neq b$ for all
$t > 0$ by Theorem 18.6 (i), we may conclude that $P_b\{\zeta = 0\} = 1$, and so b
is regular. □
Proof of Theorem 24.5: Let u be given by (5), fix any closed ball in D
with center x and boundary S, and conclude by the strong Markov property
at $\tau = \tau_S$ that
$$u(x) = E_x[f(X_\zeta);\ \zeta < \infty] = E_x E_{X_\tau}[f(X_\zeta);\ \zeta < \infty] = E_x u(X_\tau).$$
This shows that u has the mean-value property, and so by Lemma 24.3 it
is harmonic. From Lemma 24.6 it is further seen that u is continuous on $\bar D$
with $u = f$ on $\partial D$. Thus, u solves the Dirichlet problem (D, f).
Now assume that $d \ge 3$ and $f \in C_0(\partial D)$. For any $\varepsilon > 0$ we have
$$|u(x)| \le \varepsilon + \|f\|\, P_x\{|f(X_\zeta)| > \varepsilon,\ \zeta < \infty\}.$$ (6)
Since X is transient by Theorem 18.6 (ii) and the set $\{y \in \partial D;\ |f(y)| > \varepsilon\}$
is bounded, the right-hand side of (6) tends to 0 as $|x| \to \infty$ and then
$\varepsilon \to 0$, which shows that $u \in C_0(\bar D)$.
To prove the asserted uniqueness, it is clearly enough to assume $f = 0$
and show that any solution u with the stated properties is identically zero.
If $d \ge 3$ and $u \in C_0(\bar D)$, then this is clear by Lemma 24.3, which shows that
harmonic functions can have no local maxima or minima. Next assume that
$\zeta < \infty$ a.s. and $u \in C_b(\bar D)$. By Corollary 17.19 we have $E_x u(X_{\zeta \wedge n}) = u(x)$
for any $x \in D$ and $n \in \mathbb{N}$, and as $n \to \infty$, we get by continuity and
dominated convergence $u(x) = E_x u(X_\zeta) = 0$. □
To prepare for our probabilistic construction of the Green function in
a domain $D \subset \mathbb{R}^d$, we need to study the transition densities of Brownian
motion killed on the boundary $\partial D$. Recall that ordinary Brownian motion
in $\mathbb{R}^d$ has transition densities
$$p_t(x, y) = (2\pi t)^{-d/2} e^{-|x - y|^2 / 2t}, \quad x, y \in \mathbb{R}^d,\ t > 0.$$ (7)
By the strong Markov property and Theorem 6.4, we get for any $t > 0$,
$x \in D$, and $B \in \mathcal{B}(D)$,
$$P_x\{X_t \in B\} = P_x\{X_t \in B,\ t \le \zeta\} + E_x[T_{t-\zeta} 1_B(X_\zeta);\ t > \zeta].$$
Thus, the killed process has transition densities
$$p_t^D(x, y) = p_t(x, y) - E_x[p_{t-\zeta}(X_\zeta, y);\ t > \zeta], \quad x, y \in D,\ t > 0.$$ (8)
The following symmetry and continuity properties of $p_t^D$ play a crucial role
in the sequel.
Theorem 24.7 (transition density, Hunt) For any domain D in $\mathbb{R}^d$ and
time $t > 0$, the function $p_t^D$ is symmetric and continuous on $D^2$. If $b \in \partial D$
is regular, then $p_t^D(x, y) \to 0$ as $x \to b$ for fixed $y \in D$.
Proof: From (7) we note that $p_t(x, y)$ is uniformly continuous in (x, y)
for fixed $t > 0$, as well as in (x, y, t) for $|x - y| > \varepsilon > 0$ and $t > 0$. By (8) it
follows that $p_t^D(x, y)$ is equicontinuous in $y \in D$ for fixed $t > 0$. To prove
the continuity in $x \in D$ for fixed $t > 0$ and $y \in D$, it is then enough to show
that $P_x\{X_t \in B,\ t < \zeta\}$ is continuous in x for fixed $t > 0$ and $B \in \mathcal{B}(D)$.
Letting $h \in (0, t)$, we get by the Markov property
$$P_x\{X_t \in B,\ \zeta > t\} = E_x[P_{X_h}\{X_{t-h} \in B,\ \zeta > t - h\};\ \zeta > h].$$
Thus, for any $x, y \in D$,
$$|(P_x - P_y)\{X_t \in B,\ t < \zeta\}| \le (P_x + P_y)\{\zeta \le h\} + \|P_x \circ X_h^{-1} - P_y \circ X_h^{-1}\|,$$
which tends to 0 as $y \to x$ and then $h \to 0$. Combining the continuity in
x with the equicontinuity in y, we conclude that $p_t^D(x, y)$ is continuous in
$(x, y) \in D^2$ for fixed $t > 0$.
To prove the symmetry in x and y, it is now enough to establish the
integrated version
$$\int_C P_x\{X_t \in B,\ \zeta > t\}\, dx = \int_B P_x\{X_t \in C,\ \zeta > t\}\, dx,$$ (9)
for any bounded sets $B, C \in \mathcal{B}(D)$. Then fix any compact set $F \subset D$.
Letting $n \in \mathbb{N}$ and writing $h = 2^{-n} t$ and $t_k = kh$, we get by Proposition
8.2
$$\int_C P_x\{X_{t_k} \in F,\ k \le 2^n;\ X_t \in B\}\, dx = \int_F \cdots \int_F 1_C(x_0)\, 1_B(x_{2^n}) \prod_{k \le 2^n} p_h(x_{k-1}, x_k)\, dx_0 \cdots dx_{2^n}.$$
Here the right-hand side is symmetric in the pair (B, C), because of the
symmetry of $p_h(x, y)$. By dominated convergence as $n \to \infty$ we obtain
(9) with F instead of D, and the stated version follows by monotone
convergence as $F \uparrow D$.
To prove the last assertion, we recall from the proof of Lemma 24.6 that
$P_x \circ (\zeta, X)^{-1} \xrightarrow{w} P_b \circ (0, X)^{-1}$ as $x \to b$ with $b \in \partial D$ regular. In particular,
$P_x \circ (\zeta, X_\zeta)^{-1} \xrightarrow{w} \delta_{0,b}$, and by the boundedness and continuity of $p_t(x, y)$ for
$|x - y| > \varepsilon > 0$, it is clear from (8) that $p_t^D(x, y) \to 0$. □
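The properties in Theorem 24.7 can be seen explicitly in one classical special case. The sketch below assumes $d = 1$ and $D = (0, \infty)$, where the reflection principle gives the closed form $p_t^D(x, y) = p_t(x, y) - p_t(x, -y)$; this formula is a standard fact brought in for illustration and is not derived in the text. The integration range and grid size are arbitrary choices.

```python
import math

def p(t, x, y):
    """One-dimensional case of the Gaussian density (7)."""
    return math.exp(-(x - y) ** 2 / (2.0 * t)) / math.sqrt(2.0 * math.pi * t)

def p_killed(t, x, y):
    """Transition density of Brownian motion killed at 0, for D = (0, inf),
    assuming the reflection-principle formula."""
    return p(t, x, y) - p(t, x, -y)

def survival(t, x, upper=40.0, n=40000):
    """Midpoint-rule approximation of P_x{zeta > t}, i.e. the integral of
    p_killed(t, x, .) over D, truncated at `upper`."""
    h = upper / n
    return sum(p_killed(t, x, (k + 0.5) * h) for k in range(n)) * h

symmetric_gap = p_killed(0.5, 0.4, 1.1) - p_killed(0.5, 1.1, 0.4)
at_boundary = p_killed(0.5, 0.0, 1.1)
alive = survival(1.0, 0.7)
```

Here `symmetric_gap` and `at_boundary` vanish, in line with the symmetry and boundary behavior stated in the theorem, and `alive` agrees with $P_x\{\zeta > t\} = \operatorname{erf}(x/\sqrt{2t})$.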
A domain $D \subset \mathbb{R}^d$ is said to be Greenian if either $d \ge 3$, or if $d \le 2$ and
$P_x\{\zeta_D < \infty\} = 1$ for all $x \in D$. Since the latter probability is harmonic
in x, it is enough by Lemma 24.3 to verify the stated property for a single
$x \in D$. Given a Greenian domain D, we may introduce the Green function
$$g^D(x, y) = \int_0^\infty p_t^D(x, y)\, dt, \quad x, y \in D.$$
For any measure $\mu$ on D, we may further introduce the associated Green
potential
$$G^D \mu(x) = \int g^D(x, y)\, \mu(dy), \quad x \in D.$$
Writing $G^D \mu = G^D f$ when $\mu(dy) = f(y)\,dy$, we get by Fubini's theorem
$$E_x \int_0^\zeta f(X_t)\, dt = \int g^D(x, y) f(y)\, dy = G^D f(x), \quad x \in D,$$
which identifies $g^D$ as an occupation density for the killed process.
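The occupation density identity can be illustrated in one dimension. The sketch below assumes $D = (0, 1)$ and uses the closed form $g^D(x, y) = 2 \min(x, y)(1 - \max(x, y))$ for standard Brownian motion killed at the endpoints; this expression is a standard fact introduced here as an assumption, not a formula from the text. Taking $f = 1$, the Green potential should reproduce the mean exit time $E_x \zeta = x(1 - x)$.

```python
def green_interval(x, y):
    """Green function of standard Brownian motion killed on leaving (0, 1),
    using the assumed closed form 2 * min(x, y) * (1 - max(x, y))."""
    return 2.0 * min(x, y) * (1.0 - max(x, y))

def green_potential(f, x, n=20000):
    """Midpoint-rule approximation of G^D f(x), the integral of
    g^D(x, y) f(y) over y in (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        y = (k + 0.5) * h
        total += green_interval(x, y) * f(y)
    return total * h

def constant_one(y):
    return 1.0

expected_exit_time = green_potential(constant_one, 0.3)
```

The value `expected_exit_time` is close to $0.3 \cdot 0.7 = 0.21$, and `green_interval` is visibly symmetric and vanishes at the endpoints.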
The next result shows that $g^D$ and $G^D$ agree with the Green function and
Green potential of classical potential theory. Thus, $G^D \mu(x)$ may be interpreted
as the electrostatic potential at x arising from a charge distribution
$\mu$ in D, when the boundary $\partial D$ is grounded.
Theorem 24.8 (Green function) For any Greenian domain $D \subset \mathbb{R}^d$, the
function $g^D$ is symmetric on $D^2$. Furthermore, $g^D(x, y)$ is harmonic in
$x \in D \setminus \{y\}$ for each $y \in D$, and if $b \in \partial D$ is regular, then $g^D(x, y) \to 0$ as
$x \to b$ for fixed $y \in D$.
The proof is straightforward when $d \ge 3$, but for $d \le 2$ we need two
technical lemmas. We begin with a uniform estimate for large t.
Lemma 24.9 (uniform integrability) Consider a domain $D \subset \mathbb{R}^d$,
assumed to be bounded when $d \le 2$. Then
$$\lim_{t \to \infty}\ \sup_{x, y \in D} \int_t^\infty p_s^D(x, y)\, ds = 0.$$
Proof: For $d \ge 3$ we may take $D = \mathbb{R}^d$, in which case the result is obvious
from (7). Next let $d = 2$. By obvious domination and scaling arguments,
we may then assume that $|x| \le 1$, $y = 0$, $D = \{z;\ |z| < 2\}$, and $t > 1$.
Writing $p_t(x) = p_t(x, 0)$, we get by (8)
$$p_t^D(x, 0) \le p_t(x) - E_0[p_{t-\zeta}(1);\ \zeta \le t/2]$$
$$\le p_t(0) - p_t(1)\, P_0\{\zeta \le t/2\}$$
$$\le p_t(0)\, P_0\{\zeta > t/2\} + p_t(0) - p_t(1)$$
$$\lesssim t^{-1} P_0\{\zeta > t/2\} + t^{-2}.$$
As in case of Lemma 23.8 (ii), we have $E_0 \zeta < \infty$, and so by Lemma 3.4
the right-hand side is integrable in $t \in [1, \infty)$. The proof for $d = 1$ is
similar. □
We also need the fact that bounded sets have bounded Green potential.
Lemma 24.10 (boundedness) For any Greenian domain $D \subset \mathbb{R}^d$ and
bounded set $B \in \mathcal{B}(D)$, the function $G^D 1_B$ is bounded.
Proof: By domination and scaling together with the strong Markov property,
it suffices to take $B = \{x;\ |x| \le 1\}$ and to show that $G^D 1_B(0) < \infty$.
For $d \ge 3$ we may further take $D = \mathbb{R}^d$, in which case the result follows by a
simple computation. For $d = 2$ we may assume that $D \supset C = \{x;\ |x| < 2\}$.
Write $\sigma = \zeta_C + \tau_B \circ \theta_{\zeta_C}$ and $\tau_0 = 0$, and recursively define $\tau_{k+1} = \tau_k + \sigma \circ \theta_{\tau_k}$,
$k \ge 0$. Putting $b = (1, 0)$, we get by the strong Markov property at the
times $\tau_k$
$$G^D 1_B(0) = G^C 1_B(0) + G^C 1_B(b) \sum_{k \ge 1} P_0\{\tau_k < \zeta\}.$$
Here $G^C 1_B(0) \vee G^C 1_B(b) < \infty$ by Lemma 24.9. By the strong Markov
property it is further seen that $P_0\{\tau_k < \zeta\} \le p^k$, where $p = \sup_{x \in B} P_x\{\sigma < \zeta\}$.
Finally, note that $p < 1$, since $P_x\{\sigma < \zeta\}$ is harmonic and hence
continuous on B. The proof for $d = 1$ is similar. □
Proof of Theorem 24.8: The symmetry of $g^D$ is clear from Theorem 24.7.
If $d \ge 3$, or if $d = 2$ and D is bounded, it is further seen from Theorem
24.7, Lemma 24.9, and dominated convergence that $g^D(x, y)$ is continuous
in $x \in D \setminus \{y\}$ for each $y \in D$. Next we note that $G^D 1_B$ has the mean-value
property in $D \setminus \bar B$ for bounded $B \in \mathcal{B}(D)$. The property extends by
continuity to the density $g^D(x, y)$, which is then harmonic in $x \in D \setminus \{y\}$
for fixed $y \in D$, by Lemma 24.3.
For $d = 2$ and unbounded D, we define $D_n = \{x \in D;\ |x| < n\}$, and note
as before that $g^{D_n}(x, y)$ has the mean-value property in $x \in D_n \setminus \{y\}$ for
each $y \in D_n$. Since $p_t^{D_n} \uparrow p_t^D$ by dominated convergence, we have $g^{D_n} \uparrow g^D$,
and so the mean-value property extends to the limit. For any $x \neq y$ in D,
choose a circular disk B around y with radius $\varepsilon > 0$ small enough that
$x \notin \bar B \subset D$. Then $\pi \varepsilon^2 g^D(x, y) = G^D 1_B(x) < \infty$ by Lemma 24.10. Thus,
by Lemma 24.3 even $g^D(x, y)$ is harmonic in $x \in D \setminus \{y\}$.
To prove the last assertion, fix any $y \in D$, and assume that $x \to b \in \partial D$.
Choose a Greenian domain $D' \supset \bar D$ with $b \in D'$. Since $p_t^D \le p_t^{D'}$, and
both $p_t^{D'}(\cdot, y)$ and $g^{D'}(\cdot, y)$ are continuous at b whereas $p_t^D(x, y) \to 0$ by
Theorem 24.7, we get $g^D(x, y) \to 0$ by Theorem 1.21. □
We proceed to show that a measure is determined by its Green potential
whenever the latter is finite. An extension appears as part of Theorem
24.12. For convenience, we write
$$p_t^D \mu(x) = \int p_t^D(x, y)\, \mu(dy), \quad x \in D,\ t > 0.$$
Theorem 24.11 (uniqueness) If $\mu$ and $\nu$ are measures on a Greenian
domain $D \subset \mathbb{R}^d$ such that $G^D \mu = G^D \nu < \infty$, then $\mu = \nu$.
Proof: For any $t > 0$ we have
$$\int_0^t (p_s^D \mu)\, ds = G^D \mu - p_t^D G^D \mu = G^D \nu - p_t^D G^D \nu = \int_0^t (p_s^D \nu)\, ds.$$ (10)
By the symmetry of $p^D$, we further get for any measurable function $f : D \to \mathbb{R}_+$
$$\int f(x)\, p_s^D \mu(x)\, dx = \int f(x)\, dx \int p_s^D(x, y)\, \mu(dy) = \int \mu(dy) \int f(x)\, p_s^D(x, y)\, dx = \int p_s^D f(y)\, \mu(dy).$$
Hence,
$$\int f(x)\, dx \int_0^t p_s^D \mu(x)\, ds = \int_0^t ds \int p_s^D f(y)\, \mu(dy) = \int \mu(dy) \int_0^t p_s^D f(y)\, ds,$$
and similarly for $\nu$. By (10) we obtain
$$\int \mu(dy) \int_0^t p_s^D f(y)\, ds = \int \nu(dy) \int_0^t p_s^D f(y)\, ds.$$ (11)
Assuming that $f \in C_K^+(D)$, we get $p_s^D f \to f$ as $s \to 0$, and so $t^{-1} \int_0^t p_s^D f\, ds \to f$.
If we can take limits inside the outer integrations in (11), we obtain
$\mu f = \nu f$, which implies $\mu = \nu$ since f is arbitrary.
To justify the argument, it suffices to show that $\sup_s p_s^D f$ is $\mu$- and $\nu$-integrable.
Then conclude from Theorem 24.7 that $f \lesssim p_s^D(\cdot, y)$ for fixed
$s > 0$ and $y \in D$, and from Theorem 24.8 that $f \lesssim G^D f$. The latter
property yields $p_s^D f \lesssim p_s^D G^D f \le G^D f$, and by the former property we
get for any $y \in D$ and $s > 0$
$$\mu(G^D f) = \int G^D \mu(x) f(x)\, dx \lesssim p_s^D G^D \mu(y) \le G^D \mu(y) < \infty,$$
and similarly for $\nu$. □
Now let $\mathcal{F}_D$ and $\mathcal{K}_D$ denote the classes of closed and compact subsets of
D, and write $\mathcal{F}_D^r$ and $\mathcal{K}_D^r$ for the subclasses of sets with regular boundary.
For any $B \in \mathcal{F}_D$ we may introduce the associated hitting kernel
$$H_B^D(x, dy) = P_x\{\tau_B < \zeta_D,\ X_{\tau_B} \in dy\}, \quad x \in D.$$
Note that if X has initial distribution $\mu$, then the hitting distribution of
$X^\zeta$ in B equals $\mu H_B^D = \int \mu(dx)\, H_B^D(x, \cdot)$.
The next result solves the sweeping problem of classical potential theory.
To avoid technical complications, here and below, we shall only consider
subsets with regular boundary. In general, the irregular part of the bound-
ary can be shown to be polar, in the sense of being a.s. avoided by a
Brownian motion. Given this result, one can easily remove all regularity
restrictions.
Theorem 24.12 (sweeping and hitting) For any Greenian domain $D \subset \mathbb{R}^d$
and subset $B \in \mathcal{F}_D^r$, let $\mu$ be a bounded measure on D with $G^D \mu < \infty$
on B. Then $\mu H_B^D$ is the unique measure $\nu$ on B with $G^D \mu = G^D \nu$ on B.
For an electrostatic interpretation, assume that a grounded conductor
B is inserted into a domain D with grounded boundary and charge
distribution $\mu$. Then a charge distribution $-\mu H_B^D$ arises on B.
A lemma is needed for the proof. Here we define $g^{D \setminus B}(x, y) = 0$ whenever
x or y lies in B.
Lemma 24.13 (fundamental identity) For any Greenian domain $D \subset \mathbb{R}^d$
and subset $B \in \mathcal{F}_D^r$, we have
$$g^D(x, y) = g^{D \setminus B}(x, y) + \int_B H_B^D(x, dz)\, g^D(z, y), \quad x, y \in D.$$
Proof: Write $\zeta = \zeta_D$ and $\tau = \tau_B$. Subtracting relations (8) for the domains
D and $D \setminus B$, and using the strong Markov property at $\tau$ together
with Theorem 6.4, we get
$$p_t^D(x, y) - p_t^{D \setminus B}(x, y)$$
$$= E_x[p_{t-\tau}(X_\tau, y);\ \tau < \zeta \wedge t] - E_x[p_{t-\zeta}(X_\zeta, y);\ \tau < \zeta < t]$$
$$= E_x[p_{t-\tau}(X_\tau, y);\ \tau < \zeta \wedge t] - E_x\big[E_{X_\tau}[p_{t-\tau-\zeta}(X_\zeta, y);\ \zeta < t - \tau];\ \tau < \zeta \wedge t\big]$$
$$= E_x[p_{t-\tau}^D(X_\tau, y);\ \tau < \zeta \wedge t].$$
Now integrate with respect to t to get
$$g^D(x, y) - g^{D \setminus B}(x, y) = E_x[g^D(X_\tau, y);\ \tau < \zeta] = \int H_B^D(x, dz)\, g^D(z, y).$$ □
Proof of Theorem 24.12: Since $\partial B$ is regular, we have $H_B^D(x, \cdot) = \delta_x$ for
all $x \in B$, and so by Lemma 24.13 we get for all $x \in B$ and $z \in D$
$$\int g^D(x, y)\, H_B^D(z, dy) = \int g^D(z, y)\, H_B^D(x, dy) = g^D(z, x).$$
Integrating with respect to $\mu(dz)$ gives $G^D(\mu H_B^D)(x) = G^D \mu(x)$, which
shows that $\nu = \mu H_B^D$ has the stated property.
Now consider any measure $\nu$ on B with $G^D \mu = G^D \nu$ on B. Noting that
$g^{D \setminus B}(x, \cdot) = 0$ on B whereas $H_B^D(x, \cdot)$ is supported by B, we get by Lemma
24.13 for any $x \in D$
$$G^D \nu(x) = \int \nu(dz)\, g^D(z, x) = \int \nu(dz) \int g^D(z, y)\, H_B^D(x, dy) = \int H_B^D(x, dy)\, G^D \nu(y) = \int H_B^D(x, dy)\, G^D \mu(y).$$
Thus, $\mu$ determines $G^D \nu$ on D, and so $\nu$ is unique by Theorem 24.11. □
Let us now turn to the classical equilibrium problem. For any $K \in \mathcal{K}_D$
we introduce the last exit or quitting time
$$\gamma_K^D = \sup\{t < \zeta_D;\ X_t \in K\}$$
and the associated quitting kernel
$$L_K^D(x, dy) = P_x\{\gamma_K^D > 0;\ X(\gamma_K^D) \in dy\}.$$
Theorem 24.14 (equilibrium measure and quitting, Chung) For any
Greenian domain $D \subset \mathbb{R}^d$ and subset $K \in \mathcal{K}_D$, there exists a measure
$\mu_K^D$ on $\partial K$ such that
$$L_K^D(x, dy) = g^D(x, y)\, \mu_K^D(dy), \quad x \in D.$$ (12)
Furthermore, $\mu_K^D$ is diffuse when $d \ge 2$, and if $K \in \mathcal{K}_D^r$, then $\mu_K^D$ is the
unique measure $\mu$ on K satisfying $G^D \mu = 1$ on K.
Here $\mu_K^D$ is called the equilibrium measure of K relative to D, and its total
mass $C_K^D$ is called the capacity of K in D. For an electrostatic interpretation,
assume that a conductor K with potential 1 is inserted into a domain
D with grounded boundary. Then a charge distribution $\mu_K^D$ arises on the
boundary of K.
Proof of Theorem 24.14: Write $\gamma = \gamma_K^D$, and define
$$l_\varepsilon(x) = \varepsilon^{-1} P_x\{0 < \gamma \le \varepsilon\}, \quad \varepsilon > 0.$$
Using Fubini's theorem, the simple Markov property, and dominated
convergence as $\varepsilon \to 0$, we get for any $f \in C_b(D)$ and $x \in D$
$$G^D(f l_\varepsilon)(x) = E_x \int_0^\zeta f(X_t)\, l_\varepsilon(X_t)\, dt$$
$$= \varepsilon^{-1} \int_0^\infty E_x[f(X_t)\, P_{X_t}\{0 < \gamma \le \varepsilon\};\ t < \zeta]\, dt$$
$$= \varepsilon^{-1} \int_0^\infty E_x[f(X_t);\ t < \gamma \le t + \varepsilon]\, dt$$
$$= \varepsilon^{-1} E_x \int_{(\gamma - \varepsilon)_+}^{\gamma} f(X_t)\, dt$$
$$\to E_x[f(X_\gamma);\ \gamma > 0] = L_K^D f(x).$$
If f has compact support, then for each x we may replace f by the
bounded, continuous function $f / g^D(x, \cdot)$ to get as $\varepsilon \to 0$
$$\int f(y)\, l_\varepsilon(y)\, dy \to \int \frac{L_K^D(x, dy)\, f(y)}{g^D(x, y)}.$$ (13)
Since the left-hand side is independent of x, the same thing is true for the
measure
$$\mu_K^D(dy) = \frac{L_K^D(x, dy)}{g^D(x, y)}.$$ (14)
If $d = 1$, we have $g^D(x, x) < \infty$, and (14) is trivially equivalent to (12).
If instead $d \ge 2$, then singletons are polar, and so the measure $L_K^D(x, \cdot)$ is
diffuse, which implies the same property for $\mu_K^D$. Thus, (12) and (14) are
again equivalent. We may further conclude from the continuity of X that
$L_K^D(x, \cdot)$, and then also $\mu_K^D$, is supported by $\partial K$.
Integrating (12) over D yields
$$P_x\{\tau_K < \zeta_D\} = G^D \mu_K^D(x), \quad x \in D,$$
and so for $K \in \mathcal{K}_D^r$ we get $G^D \mu_K^D = 1$ on K. If $\nu$ is another measure on
K with $G^D \nu = 1$ on K, then $\nu = \mu_K^D$ by the uniqueness part of Theorem
24.12. □
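The relation $P_x\{\tau_K < \zeta_D\} = G^D \mu_K^D(x)$ can be checked numerically in a classical special case. The sketch below assumes $D = \mathbb{R}^3$ and takes K to be a closed ball of radius r, with Green function $g(x, y) = 1/(2\pi |x - y|)$, equilibrium measure uniform on the sphere, and capacity $2\pi r$. These closed forms are standard facts used here as assumptions (they depend on the normalization $g = \int_0^\infty p_t\, dt$), and the radius and evaluation points are illustrative.

```python
import math

def newtonian_kernel(dist):
    """Assumed Green function of Brownian motion in R^3 as a function of
    the distance |x - y|."""
    return 1.0 / (2.0 * math.pi * dist)

def shell_potential(a, r, mass, n=20000):
    """Potential, at distance a from the center, of a measure of total mass
    `mass` spread uniformly over the sphere of radius r (midpoint rule in
    the polar angle)."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        theta = (k + 0.5) * h
        dist = math.sqrt(r * r + a * a - 2.0 * a * r * math.cos(theta))
        # the uniform distribution on the sphere has polar density sin(theta) / 2
        total += newtonian_kernel(dist) * 0.5 * math.sin(theta)
    return mass * total * h

r = 1.5
capacity = 2.0 * math.pi * r      # candidate total mass of the equilibrium measure
inside = [shell_potential(a, r, capacity) for a in (0.0, 0.5, 1.2)]
outside = shell_potential(3.0, r, capacity)
```

The potential equals 1 at points of the ball, and at distance 3 from the center it equals $r/3 = 0.5$, the probability that Brownian motion in $\mathbb{R}^3$ ever hits the ball from there.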
The next result relates the equilibrium measures and capacities for
different sets $K \in \mathcal{K}_D^r$.
Proposition 24.15 (consistency) For any Greenian domain $D \subset \mathbb{R}^d$ and
subsets $K \subset B$ in $\mathcal{K}_D^r$, we have
$$\mu_K^D = \mu_B^D H_K^D = \mu_B^D L_K^D,$$ (15)
$$C_K^D = \int_B P_x\{\tau_K < \zeta_D\}\, \mu_B^D(dx).$$ (16)
Proof: By Theorem 24.12 and the defining properties of $\mu_B^D$ and $\mu_K^D$, we
have on K
$$G^D(\mu_B^D H_K^D) = G^D \mu_B^D = 1 = G^D \mu_K^D,$$
and so $\mu_B^D H_K^D = \mu_K^D$ by the same result. To prove the second relation in
(15), we note by Theorem 24.14 that, for any $A \in \mathcal{B}(K)$,
$$\mu_B^D L_K^D(A) = \int \mu_B^D(dx) \int_A g^D(x, y)\, \mu_K^D(dy) = \int_A G^D \mu_B^D(y)\, \mu_K^D(dy) = \mu_K^D(A),$$
since $G^D \mu_B^D = 1$ on $A \subset B$. Finally, (15) implies (16), since $H_K^D(x, K) = P_x\{\tau_K < \zeta_D\}$. □
Some basic properties of capacities and equilibrium measures follow im-
mediately from Proposition 24.15. To explain the terminology, fix any space
S along with a class of subsets $\mathcal{U}$, closed under finite unions. For any
function $h : \mathcal{U} \to \mathbb{R}$ and sets $U, U_1, U_2, \dots \in \mathcal{U}$, we recursively define the
differences
$$\Delta_{U_1} h(U) = h(U \cup U_1) - h(U),$$
$$\Delta_{U_1, \dots, U_n} h(U) = \Delta_{U_n}\{\Delta_{U_1, \dots, U_{n-1}} h(U)\}, \quad n > 1,$$
where the difference $\Delta_{U_n}$ in the last formula is taken with respect to U.
Note that the higher-order differences $\Delta_{U_1, \dots, U_n}$ are invariant under permutations
of $U_1, \dots, U_n$. We say that h is alternating or completely monotone
if
$$(-1)^{n+1} \Delta_{U_1, \dots, U_n} h(U) \ge 0, \quad n \in \mathbb{N},\ U, U_1, U_2, \dots \in \mathcal{U}.$$
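The recursive definition of the differences translates directly into code, which makes the alternating property easy to test on a finite space. The sketch below is purely illustrative: the three-point space, the inclusion probabilities, and the function names are hypothetical, and the random set contains each point independently, so that its hitting function has a product form.

```python
from itertools import combinations

# Hypothetical inclusion probabilities for a random subset of S = {a, b, c}.
POINT_PROBS = {"a": 0.2, "b": 0.5, "c": 0.3}

def hitting(u):
    """h(U) = P{phi meets U} when phi contains each point independently."""
    miss = 1.0
    for s in u:
        miss *= 1.0 - POINT_PROBS[s]
    return 1.0 - miss

def difference(h, us, u):
    """Higher-order difference Delta_{U_1,...,U_n} h(U), computed by the
    recursion Delta_{U_n}{Delta_{U_1,...,U_{n-1}} h(U)}."""
    if not us:
        return h(u)
    *rest, last = us
    return difference(h, rest, u | last) - difference(h, rest, u)

subsets = [frozenset(c) for k in range(4) for c in combinations("abc", k)]

def is_alternating(h, max_order=3):
    """Check (-1)^(n+1) Delta_{U_1,...,U_n} h(U) >= 0 for n <= max_order."""
    for n in range(1, max_order + 1):
        for us in combinations(subsets, n):
            for u in subsets:
                if (-1) ** (n + 1) * difference(h, list(us), u) < -1e-12:
                    return False
    return True

a, b, c = frozenset("a"), frozenset("b"), frozenset("c")
second_order = -difference(hitting, [a, b], c)
```

Here `second_order` equals the probability that the random set misses c but hits both a and b, namely $0.7 \cdot 0.2 \cdot 0.5 = 0.07$. A function such as $U \mapsto |U|^2/9$ fails the test already at order two.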
Corollary 24.16 (dependence on conductor, Choquet) For any Greenian
domain $D \subset \mathbb{R}^d$, the capacity $C_K^D$ is an alternating function of $K \in \mathcal{K}_D^r$.
Furthermore, $\mu_{K_n}^D \xrightarrow{w} \mu_K^D$ as $K_n \uparrow K$ or $K_n \downarrow K$ in $\mathcal{K}_D^r$.
Proof: Let $\psi$ denote the path of $X^\zeta$, regarded as a random closed set in
D. Writing
$$h_x(K) = P_x\{\psi K \neq \emptyset\} = P_x\{\tau_K < \zeta\}, \quad x \in D \setminus K,$$
we get by induction
$$(-1)^{n+1} \Delta_{K_1, \dots, K_n} h_x(K) = P_x\{\psi K = \emptyset,\ \psi K_1 \neq \emptyset,\ \dots,\ \psi K_n \neq \emptyset\} \ge 0,$$
and the first assertion follows by Proposition 24.15 with $K \subset B^\circ$.
To prove the last assertion, we note that trivially $\tau_{K_n} \downarrow \tau_K$ when $K_n \uparrow K$,
and that $\tau_{K_n} \uparrow \tau_K$ when $K_n \downarrow K$ since the $K_n$ are closed. In the
latter case we also note that $\bigcap_n \{\tau_{K_n} < \zeta\} = \{\tau_K < \zeta\}$ by compactness.
Thus, in both cases $H_{K_n}^D(x, \cdot) \xrightarrow{w} H_K^D(x, \cdot)$ for all $x \in D \setminus \bigcup_n K_n$, and
by dominated convergence in Proposition 24.15 with $B^\circ \supset \bigcup_n K_n$ we get
$\mu_{K_n}^D \xrightarrow{w} \mu_K^D$. □
The next result solves an equilibrium problem involving two conductors.
Corollary 24.17 (condenser theorem) For any disjoint sets $B \in \mathcal{F}_D^r$ and
$K \in \mathcal{K}_D^r$, there exists a unique signed measure $\nu$ on $B \cup K$ with $G^D \nu = 0$
on B and $G^D \nu = 1$ on K, namely
$$\nu = \mu_K^{D \setminus B} - \mu_K^{D \setminus B} H_B^D.$$
Proof: Applying Theorem 24.14 to the domain $D \setminus B$ with subset K, we
get $\nu = \mu_K^{D \setminus B}$ on K, and then $\nu = -\mu_K^{D \setminus B} H_B^D$ on B by Theorem 24.12. □
The symmetry between hitting and quitting kernels in Proposition 24.15
may be extended to an invariance under time reversal of the whole process.
More precisely, putting $\gamma = \gamma_K^D$, we may relate the stopped process
$X_t^\gamma = X_{\gamma \wedge t}$ to its reversal $\tilde X_t^\gamma = X_{(\gamma - t)_+}$. For convenience, we write
$P_\mu = \int P_x\, \mu(dx)$ and refer to the induced measures as distributions, even
when $\mu$ is not normalized.
Theorem 24.18 (time reversal) Given a Greenian domain $D \subset \mathbb{R}^d$ and
a set $K \in \mathcal{K}_D^r$, put $\gamma = \gamma_K^D$ and $\mu = \mu_K^D$. Then $X^\gamma \overset{d}{=} \tilde X^\gamma$ under $P_\mu$.
Proof: Let $P_x$ and $E_x$ refer to the process $X^\zeta$. Fix any times $0 = t_0 < t_1 < \cdots < t_n$,
and write $s_k = t_n - t_k$ and $h_k = t_k - t_{k-1}$. For any continuous
functions $f_0, \dots, f_n$ with compact supports in D, we define
$$f^\varepsilon(x) = E_x \prod_{k \ge 0} f_k(X_{s_k})\, l_\varepsilon(X_{t_n}) = E_x \prod_{k \ge 1} f_k(X_{s_k})\, E_{X_{s_1}}(f_0 l_\varepsilon)(X_{h_1}),$$
where the last equality holds by the Markov property at $s_1$. Proceeding as
in the proof of Theorem 24.14, we get
$$\int (f^\varepsilon G^D \mu)(x)\, dx = \int G^D f^\varepsilon(y)\, \mu(dy) \to E_\mu \prod_k f_k(\tilde X_{t_k}^\gamma)\, 1\{\gamma > t_n\}.$$ (17)
On the other hand, (13) shows that the measure $l_\varepsilon(x)\,dx$ tends vaguely
to $\mu$, and so by Theorem 24.7
$$E_x (f_0 l_\varepsilon)(X_{h_1}) = \int p_{h_1}^D(x, y)\, (f_0 l_\varepsilon)(y)\, dy \to \int p_{h_1}^D(x, y)\, f_0(y)\, \mu(dy).$$
Using dominated convergence, Fubini's theorem, Proposition 8.2, Theorem
24.7, and the relation $G^D \mu(x) = P_x\{\gamma > 0\}$, we obtain
$$\int (f^\varepsilon G^D \mu)(x)\, dx \to \int G^D \mu(x)\, dx \int f_0(y)\, \mu(dy)\, E_x \prod_{k > 0} f_k(X_{s_k})\, p_{h_1}^D(X_{s_1}, y)$$
$$= \int f_0(x_0)\, \mu(dx_0) \int \cdots \int G^D \mu(x_n) \prod_{k > 0} p_{h_k}^D(x_{k-1}, x_k)\, f_k(x_k)\, dx_k$$
$$= E_\mu \prod_k f_k(X_{t_k})\, G^D \mu(X_{t_n}) = E_\mu \prod_k f_k(X_{t_k})\, 1\{\gamma > t_n\}.$$
Comparing with (17), we see that $X^\gamma$ and $\tilde X^\gamma$ have the same finite-dimensional
distributions. □
We may now extend Proposition 24.15 to the case of possibly different
Greenian domains $D \subset D'$. Fixing any $K \in \mathcal{K}_D$, we recursively define the
optional times
$$\tau_j = \gamma_{j-1} + \tau_K^{D'} \circ \theta_{\gamma_{j-1}}, \qquad \gamma_j = \tau_j + \gamma_K^D \circ \theta_{\tau_j}, \quad j \ge 1,$$
starting with $\gamma_0 = 0$. In other words, $\tau_k$ and $\gamma_k$ are the times of hitting or
quitting K during the kth excursion in D that reaches K, prior to the exit
time $\zeta_{D'}$. The generalized hitting and quitting kernels are given by
$$H_K^{D,D'}(x, \cdot) = E_x \sum_k \delta_{X(\tau_k)}, \qquad L_K^{D,D'}(x, \cdot) = E_x \sum_k \delta_{X(\gamma_k)},$$
where the summations extend over all $k \in \mathbb{N}$ with $\tau_k < \infty$.
Theorem 24.19 (extended consistency relations) Let $D \subset D'$ be Greenian
domains in $\mathbb{R}^d$ with regular compact subsets $K \subset K'$. Then
$$\mu_K^D = \mu_{K'}^{D'} H_K^{D,D'} = \mu_{K'}^{D'} L_K^{D,D'}.$$ (18)
Proof: Define $l_\varepsilon = \varepsilon^{-1} P_x\{\gamma_K^D \in (0, \varepsilon]\}$. Proceeding as in the proof of
Theorem 24.14, we get for any $x \in D'$ and $f \in C_b(D')$
$$G^{D'}(f l_\varepsilon)(x) = \varepsilon^{-1} E_x \int_0^{\zeta_{D'}} f(X_t)\, 1\{\gamma_K^D \circ \theta_t \in (0, \varepsilon]\}\, dt \to L_K^{D,D'} f(x).$$
If f has compact support in D, we may conclude as before that
$$\int f(y)\, \mu_K^D(dy) \leftarrow \int (f l_\varepsilon)(y)\, dy \to \int \frac{L_K^{D,D'}(x, dy)\, f(y)}{g^{D'}(x, y)},$$
and so
$$L_K^{D,D'}(x, dy) = g^{D'}(x, y)\, \mu_K^D(dy).$$
Integrating with respect to $\mu_{K'}^{D'}$, and noting that $G^{D'} \mu_{K'}^{D'} = 1$ on $K' \supset K$,
we obtain the second expression for $\mu_K^D$ in (18).
To deduce the first expression, we note that $H_K^{D'} H_K^{D,D'} = H_K^{D,D'}$ by the
strong Markov property at $\tau_K$. Combining with the second expression in
(18) and using Theorem 24.18 and Proposition 24.15, we get
$$\mu_K^D = \mu_K^{D'} L_K^{D,D'} = \mu_K^{D'} H_K^{D,D'} = \mu_{K'}^{D'} H_K^{D'} H_K^{D,D'} = \mu_{K'}^{D'} H_K^{D,D'}.$$ □
The last result enables us to study the equilibrium measure $\mu_K^D$ and
capacity $C_K^D$ as functions of both D and K. In particular, we obtain the
following continuity and monotonicity properties.
Corollary 24.20 (dependence on domain) For any regular, compact set
$K \subset \mathbb{R}^d$, the measure $\mu_K^D$ is nonincreasing and continuous from above as a
function of the Greenian domain $D \supset K$.
Proof: The monotonicity is clear from (18) with $K = K'$, since
$H_K^{D,D'}(x, \cdot) \ge \delta_x$ for $x \in K \subset D \subset D'$. It remains to prove that $C_K^D$ is continuous
from above and below in D for fixed K. By dominated convergence
it is then enough to show that $\kappa^{D_n} \to \kappa^D$, where $\kappa^D = \sup\{j;\ \tau_j < \infty\}$ is
the number of D-excursions hitting K.
Assuming $D_n \uparrow D$, we need to show that if $X_s, X_t \in K$ and $X \in D$ on
[s, t], then $X \in D_n$ on [s, t] for sufficiently large n. But this is clear from
the compactness of the path on the interval [s, t]. If instead $D_n \downarrow D$, we
need to show for any $r < s < t$ with $X_r, X_t \in K$ and $X_s \notin D$ that $X_s \notin D_n$
for sufficiently large n. But this is obvious. □
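We proceed to show how Green capacities can be expressed in terms of
random sets. Let $\chi$ denote the identity mapping on $\mathcal{F}_D$. Given any measure
$\nu$ on $\mathcal{F}_D \setminus \{\emptyset\}$ with $\nu\{\chi K \neq \emptyset\} < \infty$ for all $K \in \mathcal{K}_D$, we may introduce
a Poisson process $\eta$ on $\mathcal{F}_D \setminus \{\emptyset\}$ with intensity measure $\nu$ and form the
associated random closed set $\varphi = \bigcup\{F;\ \eta\{F\} > 0\}$ in D. Letting $\pi_\nu$ denote
the distribution of $\varphi$, we note that
$$\pi_\nu\{\chi K = \emptyset\} = P\{\eta\{\chi K \neq \emptyset\} = 0\} = \exp(-\nu\{\chi K \neq \emptyset\}), \quad K \in \mathcal{K}_D.$$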
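The avoidance formula for the Poisson-generated random set can be simulated in a toy discrete setting. In the sketch below, the finite space $\{0, 1, 2\}$, the intensity weights, and all names are hypothetical; since only the events $\{\eta\{F\} > 0\}$ matter, each set F is included independently with probability $1 - e^{-\nu\{F\}}$ instead of sampling the full Poisson counts.

```python
import math
import random

# Hypothetical intensity measure nu on nonempty subsets of S = {0, 1, 2}.
NU = {frozenset({0}): 0.4, frozenset({1, 2}): 0.7, frozenset({0, 2}): 0.2}

def sample_random_set(rng):
    """Union of the sets F charged by a Poisson process eta with intensity NU.
    The events {eta{F} > 0} are independent with probability 1 - exp(-nu{F})."""
    phi = set()
    for f, mass in NU.items():
        if rng.random() < 1.0 - math.exp(-mass):
            phi |= f
    return phi

def avoidance_exact(k):
    """exp(-nu{F; F meets K}), the avoidance probability given by the formula."""
    return math.exp(-sum(mass for f, mass in NU.items() if f & k))

def avoidance_simulated(k, n=20000, seed=1):
    """Empirical frequency of the event that the random set misses K."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n) if not (sample_random_set(rng) & k)) / n

K = frozenset({2})
exact = avoidance_exact(K)
simulated = avoidance_simulated(K)
```

For $K = \{2\}$ the sets meeting K carry total mass 0.9, so both values should be near $e^{-0.9}$.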
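Theorem 24.21 (Green capacities and random sets, Choquet) For any
Greenian domain $D \subset \mathbb{R}^d$, there exists a unique measure $\nu$ on $\mathcal{F}_D \setminus \{\emptyset\}$
such that
$$C_K^D = \nu\{\chi K \neq \emptyset\} = -\log \pi_\nu\{\chi K = \emptyset\}, \quad K \in \mathcal{K}_D^r.$$
Proof: Let $\psi$ denote the path of $X^\zeta$ in D. Choose sets $K_n \uparrow D$ in $\mathcal{K}_D^r$
with $K_n \subset K_{n+1}^\circ$ for all n, and put $\mu_n = \mu_{K_n}^D$, $\psi_n = \psi K_n$, and $\chi_n = \chi K_n$.
Define
$$\nu_n^p = \int P_x\{\psi_p \in \cdot,\ \psi_n \neq \emptyset\}\, \mu_p(dx), \quad n \le p,$$ (19)
and conclude by the strong Markov property and Proposition 24.15 that
$$\nu_n^q\{\chi_p \in \cdot,\ \chi_m \neq \emptyset\} = \nu_m^p, \quad m \le n \le p \le q.$$ (20)
By Corollary 6.15 there exist some measures $\nu_n$ on $\mathcal{F}_D$, $n \in \mathbb{N}$, satisfying
$$\nu_n\{\chi_p \in \cdot\} = \nu_n^p, \quad n \le p,$$ (21)
and from (20) we note that
$$\nu_n\{\cdot,\ \chi_m \neq \emptyset\} = \nu_m, \quad m \le n.$$ (22)
Hence, the measures $\nu_n$ agree on $\{\chi_m \neq \emptyset\}$ for $n \ge m$, and so we may
define $\nu = \sup_n \nu_n$. By (22) we have $\nu\{\cdot,\ \chi_n \neq \emptyset\} = \nu_n$ for all n. Assuming
$K \in \mathcal{K}_D^r$ with $K \subset K_n^\circ$, we conclude from (19), (21), and Proposition 24.15
that
$$\nu\{\chi K \neq \emptyset\} = \nu_n\{\chi K \neq \emptyset\} = \nu_n^n\{\chi K \neq \emptyset\} = \int P_x\{\psi_n K \neq \emptyset\}\, \mu_n(dx) = \int P_x\{\tau_K < \zeta\}\, \mu_n(dx) = C_K^D.$$
The uniqueness of $\nu$ is clear by a monotone class argument. □
The representation of capacities in terms of random sets will now be
extended to the abstract setting of alternating set functions. As in Chapter
16, we may then fix an lcscH space S with Borel $\sigma$-field $\mathcal{S}$, open sets $\mathcal{G}$,
closed sets $\mathcal{F}$, and compacts $\mathcal{K}$. Write $\hat{\mathcal{S}} = \{B \in \mathcal{S};\ \bar B \in \mathcal{K}\}$, and recall
that a class $\mathcal{U} \subset \hat{\mathcal{S}}$ is said to be separating if for any $K \in \mathcal{K}$ and $G \in \mathcal{G}$
with $K \subset G$ there exists some $U \in \mathcal{U}$ with $K \subset U \subset G$.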
For any nondecreasing function h on a separating class $\mathcal{U} \subset \hat{\mathcal{S}}$, we define
the associated inner and outer capacities $h^\circ$ and $\bar h$ by
$$h^\circ(G) = \sup\{h(U);\ U \in \mathcal{U},\ \bar U \subset G\}, \quad G \in \mathcal{G},$$
$$\bar h(K) = \inf\{h(U);\ U \in \mathcal{U},\ U^\circ \supset K\}, \quad K \in \mathcal{K}.$$
Note that the formulas remain valid with $\mathcal{U}$ replaced by any separating
subclass. For any random closed set $\varphi$ in S, the associated hitting function
h is given by $h(B) = P\{\varphi B \neq \emptyset\}$ for all $B \in \hat{\mathcal{S}}$.
Theorem 24.22 (alternating functions and random sets, Choquet) The
hitting function h of a random closed set in S is alternating with $h = \bar h$
on $\mathcal{K}$ and $h = h^\circ$ on $\mathcal{G}$. Conversely, given a separating class $\mathcal{U} \subset \hat{\mathcal{S}}$, closed
under finite unions, and an alternating function $p : \mathcal{U} \to [0, 1]$ with $p(\emptyset) = 0$,
there exists a random closed set with hitting function h such that $h = \bar p$ on
$\mathcal{K}$ and $h = p^\circ$ on $\mathcal{G}$.
The algebraic part of the construction is clarified by the following lemma.
Lemma 24.23 (discrete case) Assume $\mathcal{U} \subset \hat{\mathcal{S}}$ to be finite and closed under
unions, and let $h : \mathcal{U} \to [0, 1]$ be alternating with $h(\emptyset) = 0$. Then there
exists a point process $\xi$ on S such that $P\{\xi U > 0\} = h(U)$ for all $U \in \mathcal{U}$.
Proof: The statement is obvious when $\mathcal{U} = \{\emptyset\}$. Proceeding by induction,
assume the assertion to be true when $\mathcal{U}$ is generated by up to $n - 1$ sets, and
consider a class $\mathcal{U}$ generated by n nonempty sets $B_1, \dots, B_n$. By scaling we
may assume that $h(B_1 \cup \cdots \cup B_n) = 1$.
For each $j \in \{1, \dots, n\}$, let $\mathcal{U}_j$ be the class of unions formed by the sets
$B_i \setminus B_j$, $i \neq j$, and define
$$h_j(U) = \Delta_U h(B_j) = h(B_j \cup U) - h(B_j), \quad U \in \mathcal{U}_j.$$
Then each $h_j$ is again alternating with $h_j(\emptyset) = 0$, and so the induction
hypothesis ensures the existence of some point process $\xi_j$ on $\bigcup_i B_i \setminus B_j$
with hitting function $h_j$. Note that $h_j$ remains the hitting function of $\xi_j$
on all of $\mathcal{U}$. Let us further introduce a point process $\xi_{n+1}$ with
$$P \bigcap_{i \le n} \{\xi_{n+1} B_i > 0\} = (-1)^{n+1} \Delta_{B_1, \dots, B_n} h(\emptyset).$$
For $1 \le j \le n + 1$, let $\nu_j$ denote the restriction of $\mathcal{L}(\xi_j)$ to the set
$A_j = \bigcap_{i < j} \{\mu B_i > 0\}$, and put $\nu = \sum_j \nu_j$. We may take $\xi$ to be the canonical
point process on S with distribution $\nu$.
To see that $\xi$ has hitting function h, we note that for any $U \in \mathcal{U}$ and
$j \le n$,
$$\nu_j\{\mu U > 0\} = P\{\xi_j B_1 > 0,\ \dots,\ \xi_j B_{j-1} > 0,\ \xi_j U > 0\}$$
$$= (-1)^{j+1} \Delta_{B_1, \dots, B_{j-1}, U}\, h_j(\emptyset)$$
$$= (-1)^{j+1} \Delta_{B_1, \dots, B_{j-1}, U}\, h(B_j).$$
It remains to show that, for any $U \in \mathcal{U} \setminus \{\emptyset\}$,
$$\sum_{j \le n} (-1)^{j+1} \Delta_{B_1, \dots, B_{j-1}, U}\, h(B_j) + (-1)^{n+1} \Delta_{B_1, \dots, B_n} h(\emptyset) = h(U).$$
This is clear from the fact that
$$\Delta_{B_1, \dots, B_{j-1}, U}\, h(B_j) = \Delta_{B_1, \dots, B_j, U}\, h(\emptyset) + \Delta_{B_1, \dots, B_{j-1}, U}\, h(\emptyset).$$ □
Proof of Theorem 24.22: The direct assertion can be proved in the same
way as Corollary 24.16. Conversely, let $\mathcal{U}$ and p be as stated. By Lemma
A2.7 we may assume $\mathcal{U}$ to be countable, say $\mathcal{U} = \{U_1, U_2, \dots\}$. For each
n, let $\mathcal{U}_n$ be the class of unions formed from $U_1, \dots, U_n$. By Lemma 24.23
there exist some point processes $\xi_1, \xi_2, \dots$ on S such that
$$P\{\xi_n U > 0\} = p(U), \quad U \in \mathcal{U}_n,\ n \in \mathbb{N}.$$
The space $\mathcal{F}$ is compact by Theorem A2.5, and so by Theorem 16.3
there exists some random closed set $\varphi$ in S such that $\operatorname{supp} \xi_n \xrightarrow{d} \varphi$ along a
subsequence $N' \subset \mathbb{N}$. Writing $h_n$ and h for the associated hitting functions,
we get
$$h(B^\circ) \le \liminf_{n \in N'} h_n(B) \le \limsup_{n \in N'} h_n(B) \le h(\bar B), \quad B \in \hat{\mathcal{S}},$$
and in particular,
$$h(U^\circ) \le p(U) \le h(\bar U), \quad U \in \mathcal{U}.$$
Using the strengthened separation property $K \subset U^\circ \subset \bar U \subset G$, we may
easily conclude that $h = p^\circ$ on $\mathcal{G}$ and $h = \bar p$ on $\mathcal{K}$. □
Exercises
1. For a domain $D \subset \mathbb{R}^2$ and point $x \in \partial D$, assume that $x \in I \subset D^c$
for some line segment I. Show that x is regular for $D^c$. (Hint: Consider
the windings around x of Brownian motion starting at x, using the strong
Markov property and Brownian scaling.)
2. Compute the Newtonian potential kernel $g = g^D$ when $D = \mathbb{R}^d$ with
$d \ge 3$, and check by direct computation that $g(x, y)$ is harmonic in $x \neq y$
for fixed y.
3. For any domain $D \subset \mathbb{R}^d$, show that $p_t(x, y) - p_t^D(x, y) \to 0$ as $t \to 0$,
uniformly for $x \neq y$ in a compact set $K \subset D$. Also prove the same
convergence as $\inf\{|x|;\ x \notin D\} \to \infty$, uniformly for bounded $t > 0$ and
$x \neq y$. (Hint: Note that $p_t(x, y)$ is uniformly bounded for $|x - y| > \varepsilon > 0$,
and use (8).)
4. Given a domain $D \subset \mathbb{R}^d$ with $d \ge 3$, show that $g(x, y) - g^D(x, y)$ is
uniformly bounded for $x \neq y$ in a compact set $K \subset D$. Also show that the
difference tends to 0 as $\inf\{|x|;\ x \notin D\} \to \infty$, uniformly for $x \neq y$ in K.
(Hint: Use Lemma 24.13.)
5. Show that the equilibrium measure $\mu_K^D$ is restricted to the outer boundary
of K and agrees for all sets K with the same outer boundary. (Here the
outer boundary of K consists of all points $x \in \partial K$ that can be connected
to $D^c$ or $\infty$ by a path through $K^c$.) Prove a corresponding statement for
the sweeping measure $\nu$ in Theorem 24.12.
6. For any Greenian domain $D \subset \mathbb{R}^d$, disjoint sets $K_1, \dots, K_n \in \mathcal{K}_D^r$, and
constants $p_1, \dots, p_n \in \mathbb{R}$, show that there exists a unique signed measure
$\nu$ on $\bigcup_j K_j$ with $G^D \nu = p_j$ on $K_j$ for all j. (Hint: Use Corollary 24.17
recursively.)
7. Show that if $\varphi_1$ and $\varphi_2$ are independent random sets with distributions
$\pi_{\nu_1}$ and $\pi_{\nu_2}$, then $\varphi_1 \cup \varphi_2$ has distribution $\pi_{\nu_1 + \nu_2}$.
8. Extend Theorem 24.22 to unbounded functions p. (Hint: Consider the
restrictions to compact sets, and proceed as in Theorem 24.21.)
Chapter 25
Predictability, Compensation,
and Excessive Functions
Accessible and predictable times; natural and predictable processes;
Doob-Meyer decomposition; quasi-left-continuity; compensation of random
measures; excessive and superharmonic functions; additive functionals as
compensators; Riesz decomposition
The purpose of this chapter is to present some fundamental, yet profound,
extensions of the theory of martingales and optional times from Chapter 7.
A basic role in the advanced theory is played by the notions of predictable
times and processes, as well as by various decomposition theorems, the most
important being the celebrated Doob-Meyer decomposition, a continuous-time
counterpart of the elementary Doob decomposition from Lemma 7.10.
Applying the Doob-Meyer decomposition to increasing processes and
their associated random measures leads to the notion of a compensator,
whose role is analogous to that of the quadratic variation for martingales.
In particular, the compensator can be used to transform a fairly general
point process to Poisson, in a similar way that a suitable time-change of
a continuous martingale was shown in Chapter 18 to lead to a Brownian
motion.
The chapter concludes with some applications to classical potential theory.
To explain the main ideas, let $f$ be an excessive function of Brownian
motion $X$ on $\mathbb{R}^d$. Then $f(X)$ is a continuous supermartingale under $P_x$
for every $x$, and so it has a Doob-Meyer decomposition $M - A$. Here $A$
can be chosen to be a continuous additive functional (CAF) of $X$, and we
obtain an associated Riesz decomposition $f = U_A + h$, where $U_A$ denotes
the potential of $A$ and $h$ is the greatest harmonic minorant of $f$.
The present material is related in many ways to topics from earlier
chapters. Apart from the already mentioned connections, we shall occa-
sionally require some knowledge of random measures and point processes
from Chapter 12, of stable Levy processes from Chapter 15, of stochastic
calculus from Chapter 17, of Feller processes from Chapter 19, of additive
functionals and their potentials from Chapter 22, and of Green potentials
from Chapter 24. The notions and results of this chapter play a crucial role
for the analysis of semimartingales and construction of general stochastic
integrals in Chapter 26.
All random objects in this chapter are assumed to be defined on some
given probability space $\Omega$ with a right-continuous and complete filtration
$\mathcal{F}$. In the product space $\Omega \times \mathbb{R}_+$ we may introduce the predictable $\sigma$-field
$\mathcal{P}$, generated by all continuous, adapted processes on $\mathbb{R}_+$. The elements
of $\mathcal{P}$ are called predictable sets, and the $\mathcal{P}$-measurable functions on
$\Omega \times \mathbb{R}_+$ are called predictable processes. Note that every predictable
process is progressive.
The following lemma provides some useful characterizations of the
predictable $\sigma$-field.
Lemma 25.1 (predictable $\sigma$-field) The predictable $\sigma$-field is generated by
each of the following classes of sets or processes:
(i) $\mathcal{F}_0 \times \mathbb{R}_+$ and the sets $A \times (t, \infty)$ with $A \in \mathcal{F}_t$, $t \ge 0$;
(ii) $\mathcal{F}_0 \times \mathbb{R}_+$ and the intervals $(\tau, \infty)$ for optional times $\tau$;
(iii) the left-continuous, adapted processes.
Proof: Let $\mathcal{P}_1$, $\mathcal{P}_2$, and $\mathcal{P}_3$ be the $\sigma$-fields generated by the classes in (i),
(ii), and (iii), respectively. Since continuous functions are left-continuous,
we have trivially $\mathcal{P} \subset \mathcal{P}_3$. To see that $\mathcal{P}_3 \subset \mathcal{P}_1$, it is enough to note that
any left-continuous process $X$ can be approximated by the processes
$$X^n_t = X_0 1_{[0,1]}(nt) + \sum_{k \ge 1} X_{k/n} 1_{(k,k+1]}(nt), \qquad t \ge 0.$$
Next we obtain $\mathcal{P}_1 \subset \mathcal{P}_2$ by noting that the random time $t_A = t \cdot 1_A + \infty \cdot 1_{A^c}$
is optional for any $t \ge 0$ and $A \in \mathcal{F}_t$. Finally, we may prove the relation
$\mathcal{P}_2 \subset \mathcal{P}$ by noting that, for any optional time $\tau$, the process $1_{(\tau,\infty)}$ can be
approximated by the continuous, adapted processes $X^n_t = (n(t - \tau)^+) \wedge 1$,
$t \ge 0$. □
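The last approximation step can be checked numerically on a single path. The following is only a minimal sketch: the realized value `tau = 0.5`, the time grid, and the function names `ramp` and `indicator` are illustrative assumptions, not part of the text. It evaluates $X^n_t = (n(t-\tau)^+) \wedge 1$ and compares it with the indicator of $(\tau, \infty)$; the convergence is pointwise in $t$, not uniform near $\tau$.

```python
def ramp(n: int, t: float, tau: float) -> float:
    """X^n_t = (n * (t - tau)^+) ∧ 1: continuous in t and determined by min(tau, t)."""
    return min(n * max(t - tau, 0.0), 1.0)


def indicator(t: float, tau: float) -> float:
    """Value at time t of the indicator process of the interval (tau, infinity)."""
    return 1.0 if t > tau else 0.0


tau = 0.5                                # hypothetical realized value of the optional time
grid = [k / 100 for k in range(201)]     # time points 0.00, 0.01, ..., 2.00

for n in (1, 10, 100, 1000):
    # Largest discrepancy on the grid; it shrinks once 1/n drops below the grid spacing.
    worst = max(abs(ramp(n, t, tau) - indicator(t, tau)) for t in grid)
    print(n, worst)
```

Each `ramp(n, ., tau)` is continuous in `t`, which is why these processes lie in the class generating $\mathcal{P}$.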
A random variable $\tau$ in $[0, \infty]$ is called a predictable time if it is announced
by some optional times $\tau_n \uparrow \tau$ with $\tau_n < \tau$ a.s. on $\{\tau > 0\}$ for all $n$. With
any optional time $\tau$ we may associate the $\sigma$-field $\mathcal{F}_{\tau-}$ generated by $\mathcal{F}_0$ and
the classes $\mathcal{F}_t \cap \{\tau > t\}$ for arbitrary $t > 0$. The following result gives the
basic properties of the $\sigma$-fields $\mathcal{F}_{\tau-}$. It is interesting to note the similarity
with the results for the $\sigma$-fields $\mathcal{F}_\tau$ in Lemma 7.1.
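Lemma 25.2 (strict past) For any optional times $\sigma$ and $\tau$, we have
(i) $\mathcal{F}_\sigma \cap \{\sigma < \tau\} \subset \mathcal{F}_{\tau-} \subset \mathcal{F}_\tau$;
(ii) if $\tau$ is predictable, then $\{\sigma < \tau\} \in \mathcal{F}_{\sigma-} \cap \mathcal{F}_{\tau-}$;
(iii) if $\tau$ is predictable and announced by $(\tau_n)$, then $\bigvee_n \mathcal{F}_{\tau_n} = \mathcal{F}_{\tau-}$.
Proof: (i) For any $A \in \mathcal{F}_\sigma$ we note that
$$A \cap \{\sigma < \tau\} = \bigcup_{r \in \mathbb{Q}_+} (A \cap \{\sigma \le r\} \cap \{r < \tau\}) \in \mathcal{F}_{\tau-},$$
since the intersections on the right are generators of $\mathcal{F}_{\tau-}$. Hence,
$\mathcal{F}_\sigma \cap \{\sigma < \tau\} \subset \mathcal{F}_{\tau-}$. The second relation holds since each generator of
$\mathcal{F}_{\tau-}$ lies in $\mathcal{F}_\tau$.
(ii) Assuming that $(\tau_n)$ announces $\tau$, we get by (i)
$$\{\tau \le \sigma\} = \{\tau = 0\} \cup \bigcap_n \{\tau_n < \sigma\} \in \mathcal{F}_{\sigma-}.$$
(iii) For any $A \in \mathcal{F}_{\tau_n}$ we get by (i)
$$A = (A \cap \{\tau_n < \tau\}) \cup (A \cap \{\tau_n = \tau = 0\}) \in \mathcal{F}_{\tau-},$$
and so $\bigvee_n \mathcal{F}_{\tau_n} \subset \mathcal{F}_{\tau-}$. Conversely, (i) yields for any $t \ge 0$ and $A \in \mathcal{F}_t$
$$A \cap \{\tau > t\} = \bigcup_n (A \cap \{\tau_n > t\}) \in \bigvee_n \mathcal{F}_{\tau_n-} \subset \bigvee_n \mathcal{F}_{\tau_n},$$
which shows that $\mathcal{F}_{\tau-} \subset \bigvee_n \mathcal{F}_{\tau_n}$. □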
Next we examine the relationship between predictable processes and the
$\sigma$-fields $\mathcal{F}_{\tau-}$. Similar results for progressive processes and the $\sigma$-fields $\mathcal{F}_\tau$
were obtained in Lemma 7.5.
Lemma 25.3 (predictability and strict past)
(i) For any optional time $\tau$ and predictable process $X$, the random
variable $X_\tau 1\{\tau < \infty\}$ is $\mathcal{F}_{\tau-}$-measurable.
(ii) For any predictable time $\tau$ and $\mathcal{F}_{\tau-}$-measurable random variable $\alpha$,
the process $X_t = \alpha 1\{\tau \le t\}$ is predictable.
Proof: (i) If $X = 1_{A \times (t,\infty)}$ for some $t \ge 0$ and $A \in \mathcal{F}_t$, then clearly
$$\{X_\tau 1\{\tau < \infty\} = 1\} = A \cap \{t < \tau < \infty\} \in \mathcal{F}_{\tau-}.$$
We may now extend by a monotone class argument and subsequent
approximation, first to arbitrary predictable indicator functions, and then to
the general case.
(ii) We may clearly assume $\alpha$ to be integrable. Fixing an announcing
sequence $(\tau_n)$ for $\tau$, we define
$$X^n_t = E[\alpha \mid \mathcal{F}_{\tau_n}] \left(1\{0 < \tau_n < t\} + 1\{\tau_n = 0\}\right), \qquad t \ge 0.$$
Then each $X^n$ is left-continuous and adapted, hence predictable. Moreover,
$X^n \to X$ on $\mathbb{R}_+$ a.s. by Theorem 7.23 and Lemma 25.2 (iii). □
By a totally inaccessible time we mean an optional time $\tau$ such that
$P\{\sigma = \tau < \infty\} = 0$ for every predictable time $\sigma$. An accessible time may
then be defined as an optional time $\tau$ such that $P\{\sigma = \tau < \infty\} = 0$ for
every totally inaccessible time $\sigma$. For any random time $\tau$, we introduce the
associated graph
$$[\tau] = \{(t, \omega) \in \mathbb{R}_+ \times \Omega;\ \tau(\omega) = t\},$$
which allows us to express the previous condition on $\sigma$ and $\tau$ as $[\sigma] \cap [\tau] = \emptyset$
a.s. Given any optional time $\tau$ and set $A \in \mathcal{F}_\tau$, the time $\tau_A = \tau 1_A + \infty \cdot 1_{A^c}$
is again optional and is called the restriction of $\tau$ to $A$. We now consider a
basic decomposition of optional times. Related decompositions of increasing
processes and martingales are given in Propositions 25.17 and 26.16.
Proposition 25.4 (decomposition of optional times) For any optional
time $\tau$ there exists an a.s. unique set $A \in \mathcal{F}_\tau \cap \{\tau < \infty\}$ such that $\tau_A$
is accessible and $\tau_{A^c}$ is totally inaccessible. Furthermore, there exist some
predictable times $\tau_1, \tau_2, \dots$ with $[\tau_A] \subset \bigcup_n [\tau_n]$ a.s.
Proof: Define
$$p = \sup P \bigcup_n \{\tau = \tau_n < \infty\}, \tag{1}$$
where the supremum extends over all sequences of predictable times $\tau_n$.
Combining sequences such that the probability in (1) approaches $p$, we
may construct a sequence $(\tau_n)$ for which the supremum is attained. For
such a maximal sequence, we define $A$ as the union in (1).
To see that $\tau_A$ is accessible, let $\sigma$ be totally inaccessible. Then
$[\sigma] \cap [\tau_n] = \emptyset$ a.s. for every $n$, and so $[\sigma] \cap [\tau_A] = \emptyset$ a.s. If $\tau_{A^c}$ is not
totally inaccessible, then $P\{\tau_{A^c} = \tau_0 < \infty\} > 0$ for some predictable time $\tau_0$,
which contradicts the maximality of $\tau_1, \tau_2, \dots$. This shows that $A$ has the
desired property.
To prove that $A$ is a.s. unique, let $B$ be another set with the stated
properties. Then $\tau_{A \setminus B}$ and $\tau_{B \setminus A}$ are both accessible and totally inaccessible,
and so $\tau_{A \setminus B} = \tau_{B \setminus A} = \infty$ a.s., which implies $A = B$ a.s. □
We proceed to establish a version of the celebrated Doob-Meyer
decomposition, a cornerstone in modern probability theory. By an increasing process
we mean a nondecreasing, right-continuous, and adapted process $A$ with
$A_0 = 0$. We say that $A$ is integrable if $EA_\infty < \infty$. Recall that all
submartingales are assumed to be right-continuous. Local submartingales and
locally integrable processes are defined by localization in the usual way.
Theorem 25.5 (decomposition of submartingales, Meyer, Doléans) A
process $X$ is a local submartingale iff it has a decomposition $X = M + A$,
where $M$ is a local martingale and $A$ is a locally integrable, increasing,
predictable process. In that case $M$ and $A$ are a.s. unique.
The process A in the statement is often referred to as the compensator
of X, especially when X is increasing. Several proofs of this result are
known, most of which seem to require the deep section theorems. Here
we give a relatively short and elementary proof, based on Dunford's weak
compactness criterion and an approximation of totally inaccessible times.
For convenience, we divide the proof into several lemmas.
Let (D) denote the class of measurable processes $X$ such that the family
$\{X_\tau\}$ is uniformly integrable, where $\tau$ ranges over the set of all finite
optional times. By the following result it is enough to consider class (D)
submartingales.
Lemma 25.6 (uniform integrability) Any local submartingale $X$ with
$X_0 = 0$ is locally of class (D).
Proof: First reduce to the case when $X$ is a true submartingale. Then
introduce for each $n$ the optional time $\tau = n \wedge \inf\{t > 0;\ |X_t| > n\}$. Here
$|X^\tau| \le n \vee |X_\tau|$, which is integrable by Theorem 7.29, and so $X^\tau$ is of class
(D). □
An increasing process $A$ is said to be natural if it is integrable and such
that $E \int_0^\infty \Delta M_t \, dA_t = 0$ for any bounded martingale $M$. As a crucial step
in the proof of Theorem 25.5, we may establish the following preliminary
decomposition, where the compensator $A$ is shown to be natural rather
than predictable.
Lemma 25.7 (Meyer) Any submartingale X of class (D) has a decom-
position X = M + A, where M is a uniformly integrable martingale and A
is a natural, increasing process.
Proof (Rao): We may assume that $X_0 = 0$. Introduce the $n$-dyadic times
$t^n_k = k 2^{-n}$, $k \in \mathbb{Z}_+$, and define for any process $Y$ the associated differences
$\Delta^n_k Y = Y_{t^n_{k+1}} - Y_{t^n_k}$. Let
$$A^n_t = \sum_{k < 2^n t} E[\Delta^n_k X \mid \mathcal{F}_{t^n_k}], \qquad t \ge 0,\ n \in \mathbb{N},$$
and note that $M^n = X - A^n$ is a martingale on the $n$-dyadic set.
Writing $\tau^n_r = \inf\{t;\ A^n_t > r\}$ for $n \in \mathbb{N}$ and $r > 0$, we get by optional
sampling, for any $n$-dyadic time $t$,
$$\tfrac{1}{2} E[A^n_t;\ A^n_t > 2r] \le E[A^n_t - A^n_t \wedge r] \le E[A^n_t - A^n_{\tau^n_r \wedge t}]
= E[X_t - X_{\tau^n_r \wedge t}] = E[X_t - X_{\tau^n_r \wedge t};\ A^n_t > r]. \tag{2}$$
By the martingale property and uniform integrability, we further obtain
$$r P\{A^n_t > r\} \le E A^n_t = E X_t \lesssim 1,$$
and so the probability on the left tends to zero as $r \to \infty$, uniformly in $t$
and $n$. Since the random variables $X_t - X_{\tau^n_r \wedge t}$ are uniformly integrable by
(D), the same property holds for the variables $A^n_t$ by (2) and Lemma 4.10.
In particular, the sequence $(A^n_\infty)$ is uniformly integrable, and each $M^n$ is
a uniformly integrable martingale.
By Lemma 4.13 there exists some random variable $\alpha \in L^1(\mathcal{F}_\infty)$ such
that $A^n_\infty \to \alpha$ weakly in $L^1$ along some subsequence $N' \subset \mathbb{N}$. Define
$$M_t = E[X_\infty - \alpha \mid \mathcal{F}_t], \qquad A = X - M,$$
and note that $A_\infty = \alpha$ a.s. by Theorem 7.23. For any dyadic $t$ and bounded
random variable $\xi$, we get by the martingale and self-adjointness properties
$$E(A^n_t - A_t)\xi = E(M_t - M^n_t)\xi = E\, E[M_\infty - M^n_\infty \mid \mathcal{F}_t]\, \xi
= E(M_\infty - M^n_\infty) E[\xi \mid \mathcal{F}_t]
= E(A^n_\infty - \alpha) E[\xi \mid \mathcal{F}_t] \to 0,$$
as $n \to \infty$ along $N'$. Thus, $A^n_t \to A_t$ weakly in $L^1$ for dyadic $t$. In particular,
we get for any dyadic $s < t$
$$0 \le E[A^n_t - A^n_s;\ A_t - A_s < 0] \to E[(A_t - A_s) \wedge 0] \le 0.$$
Thus, the last expectation vanishes, and therefore $A_t \ge A_s$ a.s. By
right-continuity it follows that $A$ is a.s. nondecreasing. Also note that $A_0 = 0$
a.s. since $A^n_0 = 0$ for all $n$.
To see that $A$ is natural, consider any bounded martingale $N$, and
conclude by Fubini's theorem and the martingale properties of $N$ and
$A^n - A = M - M^n$ that
$$E N_\infty A^n_\infty = \sum_k E N_\infty \Delta^n_k A^n = \sum_k E N_{t^n_k} \Delta^n_k A^n
= \sum_k E N_{t^n_k} \Delta^n_k A = E \sum_k N_{t^n_k} \Delta^n_k A.$$
Now use weak convergence on the left and dominated convergence on the
right, and combine with Fubini's theorem and the martingale property of
$N$ to get
$$E \int_0^\infty N_{t-} \, dA_t = E N_\infty A_\infty = \sum_k E N_\infty \Delta^n_k A = \sum_k E N_{t^n_{k+1}} \Delta^n_k A
= E \sum_k N_{t^n_{k+1}} \Delta^n_k A \to E \int_0^\infty N_t \, dA_t.$$
Hence, $E \int_0^\infty \Delta N_t \, dA_t = 0$, as required. □
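The approximating processes $A^n$ in this proof are ordinary discrete-time Doob compensators along the dyadic grid. As a minimal sketch of that discrete construction, assume a fair $\pm 1$ random walk $S_k$ and the submartingale $X_k = S_k^2$ (this example and the names `doob_compensator`, `squared_walk`, `martingale_part` are illustrative assumptions, not taken from the text). Conditional expectations are computed exactly by enumerating path prefixes.

```python
from itertools import product


def doob_compensator(x, n_steps):
    """Map each ±1 path prefix of length k <= n_steps to A_k, where
    A_k = sum over j < k of E[x(first j+1 steps) - x(first j steps) | first j steps]
    and the steps are independent fair coin flips."""
    comp = {(): 0.0}
    for k in range(n_steps):
        for prefix in product((-1, 1), repeat=k):
            # Conditional mean increment given the first k steps.
            drift = sum(x(prefix + (e,)) - x(prefix) for e in (-1, 1)) / 2
            for e in (-1, 1):
                # A_{k+1} depends only on the first k steps: it is predictable.
                comp[prefix + (e,)] = comp[prefix] + drift
    return comp


def squared_walk(steps):
    """X_k = S_k^2 for the simple random walk S_k = sum of the first k steps."""
    return float(sum(steps)) ** 2


comp = doob_compensator(squared_walk, 4)


def martingale_part(prefix):
    """M_k = X_k - A_k evaluated along a path prefix."""
    return squared_walk(prefix) - comp[prefix]


print(sorted({(len(p), a) for p, a in comp.items()}))
```

In this example the compensator is deterministic, $A_k = k$, and $M_k = S_k^2 - k$ averages to its current value over the two possible next steps.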
To complete the proof of Theorem 25.5, it remains to show that the
compensator A in the last lemma is predictable. This will be inferred from
the following ingenious approximation of totally inaccessible times.
Lemma 25.8 (uniform approximation, Doob) For any totally inaccessible
time $\tau$, put $\tau_n = 2^{-n}[2^n \tau]$, and let $X^n$ be a right-continuous version of the
process $P[\tau_n \le t \mid \mathcal{F}_t]$. Then
$$\lim_{n \to \infty} \sup_{t \ge 0} \left| X^n_t - 1\{\tau \le t\} \right| = 0 \quad \text{a.s.} \tag{3}$$
Proof: Since $\tau_n \uparrow \tau$, we may assume that $X^1_t \ge X^2_t \ge \cdots \ge 1\{\tau \le t\}$
for all $t \ge 0$. Then $X^n_t = 1$ for $t \in [\tau, \infty)$, and on the set $\{\tau = \infty\}$ we
have $X^1_t \le P[\tau < \infty \mid \mathcal{F}_t] \to 0$ a.s. as $t \to \infty$ by Theorem 7.23. Thus,
$\sup_n |X^n_t - 1\{\tau \le t\}| \to 0$ a.s. as $t \to \infty$. To prove (3), it is then enough to
show for every $\varepsilon > 0$ that the optional times
$$\sigma_n = \inf\{t \ge 0;\ X^n_t - 1\{\tau \le t\} > \varepsilon\}, \qquad n \in \mathbb{N},$$
tend a.s. to infinity. The $\sigma_n$ are clearly nondecreasing, and we denote their
limit by $\sigma$. Note that either $\sigma_n < \tau$ or $\sigma_n = \infty$ for each $n$.
By optional sampling, Theorem 6.4, and Lemma 7.1, we have
$$X^n_\sigma 1\{\sigma < \infty\} = P[\tau_n \le \sigma < \infty \mid \mathcal{F}_\sigma] \to P[\tau \le \sigma < \infty \mid \mathcal{F}_\sigma] = 1\{\tau \le \sigma < \infty\}.$$
Hence, $X^n_\sigma \to 1\{\tau \le \sigma\}$ a.s. on $\{\sigma < \infty\}$, and so by right-continuity we
have on this set $\sigma_n < \sigma$ for large enough $n$. Thus, $\sigma$ is predictable and
announced by the times $\sigma_n \wedge n$.
Next apply the optional sampling and disintegration theorems to the
optional times $\sigma_n$, to obtain
$$\varepsilon P\{\sigma < \infty\} \le \varepsilon P\{\sigma_n < \infty\} \le E[X^n_{\sigma_n};\ \sigma_n < \infty]
= P\{\tau_n \le \sigma_n < \infty\} = P\{\tau_n \le \sigma_n < \tau < \infty\}
\to P\{\tau = \sigma < \infty\} = 0,$$
where the last equality holds since $\tau$ is totally inaccessible. Thus, $\sigma = \infty$
a.s. □
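The dyadic approximation $\tau_n = 2^{-n}[2^n \tau]$ used in Lemma 25.8 rounds $\tau$ down to the $n$-dyadic grid, so $\tau_n \uparrow \tau$ with $0 \le \tau - \tau_n < 2^{-n}$. The sketch below only illustrates this arithmetic for one hypothetical realized value; the name `dyadic_floor` and the value $\pi/4$ are assumptions made for illustration. Since $\tau_n$ looks slightly into the future, it need not be an optional time, which is why the lemma works with the conditional probabilities $P[\tau_n \le t \mid \mathcal{F}_t]$ instead.

```python
import math


def dyadic_floor(tau: float, n: int) -> float:
    """tau_n = 2^{-n} * floor(2^n * tau), the largest n-dyadic number <= tau."""
    return math.floor(tau * 2 ** n) / 2 ** n


tau = math.pi / 4                      # hypothetical realized value of the time
approx = [dyadic_floor(tau, n) for n in range(1, 21)]

for n, tau_n in enumerate(approx, start=1):
    # The gap tau - tau_n is nonnegative and smaller than the mesh 2^{-n}.
    print(n, tau_n, tau - tau_n)
```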
It is now easy to see that A has only accessible jumps.
Lemma 25.9 (accessibility) For any natural increasing process $A$ and
totally inaccessible time $\tau$, we have $\Delta A_\tau = 0$ a.s. on $\{\tau < \infty\}$.
Proof: Rescaling if necessary, we may assume that $A$ is a.s. continuous
at dyadic times. Define $\tau_n = 2^{-n}[2^n \tau]$. Since $A$ is natural, we have
$$E \int_0^\infty P[\tau_n > t \mid \mathcal{F}_t] \, dA_t = E \int_0^\infty P[\tau_n > t \mid \mathcal{F}_{t-}] \, dA_t,$$
and since $\tau$ is totally inaccessible, it follows by Lemma 25.8 that
$$E A_{\tau-} = E \int_0^\infty 1\{\tau > t\} \, dA_t = E \int_0^\infty 1\{\tau \ge t\} \, dA_t = E A_\tau.$$
Hence, $E[\Delta A_\tau;\ \tau < \infty] = 0$, and so $\Delta A_\tau = 0$ a.s. on $\{\tau < \infty\}$. □
Finally, we need to show that A is predictable.
Lemma 25.10 (Doleans) Every natural increasing process is predictable.
Proof: Fix a natural increasing process $A$. Consider a bounded martingale
$M$ and a predictable time $\tau < \infty$ announced by $\sigma_1, \sigma_2, \dots$. Then $M^\tau - M^{\sigma_k}$
is again a bounded martingale, and since $A$ is natural, we get by dominated
convergence $E \Delta M_\tau \Delta A_\tau = 0$. In particular, we may take $M_t = P[B \mid \mathcal{F}_t]$
with $B \in \mathcal{F}_\tau$. By optional sampling we have $M_\tau = 1_B$ and
$$M_{\tau-} \leftarrow M_{\sigma_k} = P[B \mid \mathcal{F}_{\sigma_k}] \to P[B \mid \mathcal{F}_{\tau-}].$$
Thus, $\Delta M_\tau = 1_B - P[B \mid \mathcal{F}_{\tau-}]$, and so
$$E[\Delta A_\tau;\ B] = E \Delta A_\tau P[B \mid \mathcal{F}_{\tau-}] = E[E[\Delta A_\tau \mid \mathcal{F}_{\tau-}];\ B].$$
Since $B$ was arbitrary in $\mathcal{F}_\tau$, we get $\Delta A_\tau = E[\Delta A_\tau \mid \mathcal{F}_{\tau-}]$ a.s., and so the
process $A'_t = \Delta A_\tau 1\{\tau \le t\}$ is predictable by Lemma 25.3 (ii). It is also
natural, since for any bounded martingale $M$
$$E \Delta A_\tau \Delta M_\tau = E \Delta A_\tau E[\Delta M_\tau \mid \mathcal{F}_{\tau-}] = 0.$$
By an elementary construction we have $\{t > 0;\ \Delta A_t > 0\} \subset \bigcup_n [\tau_n]$ a.s.
for some optional times $\tau_n < \infty$, and by Proposition 25.4 and Lemma 25.9
we may assume the latter to be predictable. Taking $\tau = \tau_1$ in the previous
argument, we may conclude that the process $A^1_t = \Delta A_{\tau_1} 1\{\tau_1 \le t\}$ is both
natural and predictable. Repeating the argument for the process $A - A^1$
with $\tau = \tau_2$ and proceeding by induction, we may conclude that the jump
component $A^d$ of $A$ is predictable. Since $A - A^d$ is continuous and hence
predictable, the predictability of $A$ follows. □
For the uniqueness assertion we need the following extension of
Proposition 17.2.
Lemma 25.11 (constancy criterion) A process $M$ is a predictable
martingale of integrable variation iff $M_t \equiv M_0$ a.s.
Proof: On the predictable $\sigma$-field $\mathcal{P}$ we define the signed measure
$$\mu B = E \int_0^\infty 1_B(t) \, dM_t, \qquad B \in \mathcal{P},$$
where the inner integral is an ordinary Lebesgue-Stieltjes integral. The
martingale property implies that $\mu$ vanishes for sets $B$ of the form $F \times (t, \infty)$
with $F \in \mathcal{F}_t$. By Lemma 25.1 and a monotone class argument it follows
that $\mu = 0$ on $\mathcal{P}$.
Since $M$ is predictable, the same thing is true for the process
$\Delta M_t = M_t - M_{t-}$, and then also for the sets $J_\pm = \{t > 0;\ \pm \Delta M_t > 0\}$. Thus,
$\mu J_\pm = 0$, and so $\Delta M = 0$ a.s., which means that $M$ is a.s. continuous. But
then $M_t \equiv M_0$ a.s. by Proposition 17.2. □
Proof of Theorem 25.5: The sufficiency is obvious, and the uniqueness
holds by Lemma 25.11. It remains to prove that any local submartingale X
has the stated decomposition. By Lemmas 25.6 and 25.11 we may assume
that X is of class (D). Then Lemma 25.7 shows that X = M + A for some
uniformly integrable martingale M and some natural increasing process A,
and by Lemma 25.10 the latter process is predictable. 0
The two conditions in Lemma 25.10 are, in fact, equivalent.
Theorem 25.12 (natural and predictable processes, Doleans) An inte-
grable, increasing process is natural iff it is predictable.
Proof: If an integrable, increasing process A is natural, it is also pre-
dictable by Lemma 25.10. Now assume instead that A is predictable. By
Lemma 25.7 we have A = M + B for some uniformly integrable martin-
gale M and some natural increasing process B, and Lemma 25.10 shows
that $B$ is predictable. But then $A = B$ a.s. by Lemma 25.11, and so $A$ is
natural. □
The following useful result is essentially implicit in earlier proofs.
Lemma 25.13 (dual predictable projection) Let $X$ and $Y$ be locally
integrable, increasing processes, and assume that $Y$ is predictable. Then $X$
has compensator $Y$ iff $E \int V \, dX = E \int V \, dY$ for every predictable process
$V \ge 0$.
Proof: First reduce by localization to the case when $X$ and $Y$ are
integrable. Then $Y$ is the compensator of $X$ iff $M = Y - X$ is a martingale
or, equivalently, iff $E M_\tau = 0$ for every optional time $\tau$. This is equivalent
to the stated relation for $V = 1_{[0,\tau]}$, and the general result follows by a
straightforward monotone class argument. □
We may now establish the fundamental connection between predictable
times and processes.
Theorem 25.14 (predictable times and processes, Meyer) For any
optional time $\tau$, these conditions are equivalent:
(i) $\tau$ is predictable;
(ii) the process $1\{\tau \le t\}$ is predictable;
(iii) $E \Delta M_\tau = 0$ for any bounded martingale $M$.
Proof (Chung and Walsh): Since (i) $\Rightarrow$ (ii) by Lemma 25.3 (ii), and (ii)
$\Leftrightarrow$ (iii) by Theorem 25.12, it remains to show that (iii) $\Rightarrow$ (i). We then
introduce the martingale $M_t = E[e^{-\tau} \mid \mathcal{F}_t]$ and the supermartingale
$$X_t = e^{-\tau \wedge t} - M_t = E[e^{-\tau \wedge t} - e^{-\tau} \mid \mathcal{F}_t] \ge 0, \qquad t \ge 0.$$
Here $X_\tau = 0$ a.s. by optional sampling. Letting $\sigma = \inf\{t \ge 0;\ X_{t-} \wedge X_t = 0\}$,
we see from Lemma 7.31 that $\{t \ge 0;\ X_t = 0\} = [\sigma, \infty)$ a.s., and in
particular $\sigma \le \tau$ a.s. Using optional sampling again, we get
$E(e^{-\sigma} - e^{-\tau}) = E X_\sigma = 0$, and so $\sigma = \tau$ a.s. Hence, $X_t \wedge X_{t-} > 0$ a.s. on $[0, \tau)$.
Finally, (iii) yields
$$E X_{\tau-} = E(e^{-\tau} - M_{\tau-}) = E(e^{-\tau} - M_\tau) = E X_\tau = 0,$$
and so $X_{\tau-} = 0$. It is now clear that $\tau$ is announced by the optional times
$\tau_n = \inf\{t;\ X_t < n^{-1}\}$. □
To illustrate the power of the last result, we may give a short proof of
the following useful statement, which can also be proved directly.
Corollary 25.15 (restriction) For any predictable time $\tau$ and set
$A \in \mathcal{F}_{\tau-}$, the restriction $\tau_A$ is again predictable.
Proof: The process $1_A 1\{\tau \le t\} = 1\{\tau_A \le t\}$ is predictable by Lemma
25.3, and so the time $\tau_A$ is predictable by Theorem 25.14. □
We may also use the last theorem to show that predictable martingales
are continuous.
Proposition 25.16 (predictable martingales) A local martingale is
predictable iff it is a.s. continuous.
Proof: The sufficiency is clear by definitions. To prove the necessity, we
note that, for any optional time $\tau$,
$$M^\tau_t = M_t 1_{[0,\tau]}(t) + M_\tau 1_{(\tau,\infty)}(t), \qquad t \ge 0.$$
Thus, predictability is preserved by optional stopping, and so we may
assume that $M$ is a uniformly integrable martingale. Now fix any $\varepsilon > 0$,
and introduce the optional time $\tau = \inf\{t > 0;\ |\Delta M_t| > \varepsilon\}$. Since the
left-continuous version $M_{t-}$ is predictable, so is the process $\Delta M_t$ as well
as the random set $A = \{t > 0;\ |\Delta M_t| > \varepsilon\}$. Hence, the same thing is true
for the random interval $[\tau, \infty) = A \cup (\tau, \infty)$, and therefore $\tau$ is predictable
by Theorem 25.14. Choosing an announcing sequence $(\tau_n)$, we conclude by
optional sampling, martingale convergence, and Lemmas 25.2 (iii) and 25.3
(i) that
$$M_{\tau-} \leftarrow M_{\tau_n} = E[M_\tau \mid \mathcal{F}_{\tau_n}] \to E[M_\tau \mid \mathcal{F}_{\tau-}] = M_\tau.$$
Thus, $\tau = \infty$ a.s. Since $\varepsilon$ was arbitrary, it follows that $M$ is a.s.
continuous. □
The decomposition of optional times in Proposition 25.4 may now be
extended to increasing processes. We say that an rcll process $X$ or a
filtration $\mathcal{F}$ is quasi-leftcontinuous if $X_{\tau-} = X_\tau$ a.s. on $\{\tau < \infty\}$ or $\mathcal{F}_{\tau-} = \mathcal{F}_\tau$,
respectively, for every predictable time $\tau$. We further say that $X$ has
accessible jumps if $X_{\tau-} = X_\tau$ a.s. on $\{\tau < \infty\}$ for every totally inaccessible
time $\tau$.
Proposition 25.17 (decomposition of increasing processes) Any purely
discontinuous, increasing process $A$ has an a.s. unique decomposition into
increasing processes $A^q$ and $A^a$, where $A^q$ is quasi-leftcontinuous and
$A^a$ has accessible jumps. Furthermore, there exist some predictable times
$\tau_1, \tau_2, \dots$ with disjoint graphs such that $\{t > 0;\ \Delta A^a_t > 0\} \subset \bigcup_n [\tau_n]$
a.s. Finally, if $A$ is locally integrable with compensator $\hat A$, then $A^q$ has
compensator $(\hat A)^c$.
Proof: Introduce the locally integrable process $X_t = \sum_{s \le t} (\Delta A_s \wedge 1)$ with
compensator $\hat X$, and define $A^q = A - A^a = 1\{\Delta \hat X = 0\} \cdot A$, or
$$A^q_t = A_t - A^a_t = \int_0^{t+} 1\{\Delta \hat X_s = 0\} \, dA_s, \qquad t \ge 0. \tag{4}$$
For any finite predictable time $\tau$, the graph $[\tau]$ is again predictable by
Theorem 25.14, and so by Lemma 25.13,
$$E(\Delta A^q_\tau \wedge 1) = E[\Delta X_\tau;\ \Delta \hat X_\tau = 0] = E[\Delta \hat X_\tau;\ \Delta \hat X_\tau = 0] = 0,$$
which shows that $A^q$ is quasi-leftcontinuous.
Now let $\tau_{n,0} = 0$, and recursively define the random times
$$\tau_{n,k} = \inf\{t > \tau_{n,k-1};\ \Delta \hat X_t \in (2^{-n}, 2^{-n+1}]\}, \qquad n, k \in \mathbb{N},$$
which are predictable by Theorem 25.14. Also note that $\{t > 0;\ \Delta A^a_t > 0\}
\subset \bigcup_{n,k} [\tau_{n,k}]$ a.s. by the definition of $A^a$. Hence, if $\tau$ is a totally inaccessible
time, then $\Delta A^a_\tau = 0$ a.s. on $\{\tau < \infty\}$, which shows that $A^a$ has accessible
jumps.
To prove the uniqueness, assume that $A$ has two decompositions
$A^q + A^a = B^q + B^a$ with the stated properties. Then $Y = A^q - B^q = B^a - A^a$ is
quasi-leftcontinuous with accessible jumps. Hence, by Proposition 25.4 we
have $\Delta Y_\tau = 0$ a.s. on $\{\tau < \infty\}$ for any optional time $\tau$, which means that
$Y$ is a.s. continuous. Since it is also purely discontinuous, we get $Y = 0$ a.s.
If $A$ is locally integrable, we may replace (4) by $A^q = 1\{\Delta \hat A = 0\} \cdot A$,
and we also note that $(\hat A)^c = 1\{\Delta \hat A = 0\} \cdot \hat A$. Thus, Lemma 25.13 yields
for any predictable process $V \ge 0$
$$E \int V \, dA^q = E \int 1\{\Delta \hat A = 0\} V \, dA
= E \int 1\{\Delta \hat A = 0\} V \, d\hat A = E \int V \, d(\hat A)^c,$$
and the same lemma shows that $A^q$ has compensator $(\hat A)^c$. □
By the compensator of an optional time $\tau$ we mean the compensator of
the associated jump process $X_t = 1\{\tau \le t\}$. The following result
characterizes the special categories of optional times in terms of the associated
compensators.
Corollary 25.18 (compensation of optional times) Let $\tau$ be an optional
time with compensator $A$. Then
(i) $\tau$ is predictable iff $A$ is a.s. constant apart from a possible unit jump;
(ii) $\tau$ is accessible iff $A$ is a.s. purely discontinuous;
(iii) $\tau$ is totally inaccessible iff $A$ is a.s. continuous.
In general, $\tau$ has the accessible part $\tau_D$, where $D = \{\Delta A_\tau > 0,\ \tau < \infty\}$.
Proof: (i) If $\tau$ is predictable, then so is the process $X_t = 1\{\tau \le t\}$ by
Theorem 25.14, and hence $A = X$ a.s. Conversely, if $A_t = 1\{\sigma \le t\}$ for
some optional time $\sigma$, then the latter is predictable by Theorem 25.14, and
Lemma 25.13 yields
$$P\{\sigma = \tau < \infty\} = E[\Delta X_\sigma;\ \sigma < \infty] = E[\Delta A_\sigma;\ \sigma < \infty]
= P\{\sigma < \infty\} = E A_\infty = E X_\infty = P\{\tau < \infty\}.$$
Thus, $\tau = \sigma$ a.s., and so $\tau$ is predictable.
(ii) Clearly, $\tau$ is accessible iff $X$ has accessible jumps, which holds by
Proposition 25.17 iff $A = A^d$ a.s.
(iii) Here we note that $\tau$ is totally inaccessible iff $X$ is quasi-leftcontinuous,
which holds by Proposition 25.17 iff $A = A^c$ a.s.
The last assertion follows easily from (ii) and (iii). □
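A standard illustration of case (iii), stated here as an assumption rather than taken from the text: if $\tau$ is exponentially distributed with rate $\lambda$ and the filtration is the one generated by the jump process $X_t = 1\{\tau \le t\}$, then the compensator is $A_t = \lambda (t \wedge \tau)$, which is continuous, so $\tau$ is totally inaccessible. The sketch below checks only the necessary identity $E A_t = E X_t = P\{\tau \le t\}$ by numerical integration; the function names and the rate $2.0$ are hypothetical.

```python
import math


def expected_compensator(rate: float, t: float, n_cells: int = 100_000) -> float:
    """Midpoint-rule value of E[rate * min(t, tau)] = rate * integral_0^t P(tau > s) ds
    for tau exponential with the given rate, using P(tau > s) = exp(-rate * s)."""
    h = t / n_cells
    return rate * h * sum(math.exp(-rate * (i + 0.5) * h) for i in range(n_cells))


def jump_probability(rate: float, t: float) -> float:
    """E X_t = P(tau <= t) for tau exponential with the given rate."""
    return 1.0 - math.exp(-rate * t)


rate = 2.0                               # hypothetical rate of the exponential time
for t in (0.5, 1.0, 3.0):
    print(t, expected_compensator(rate, t), jump_probability(rate, t))
```

Agreement of the two columns reflects that $X - A$ has constant mean zero, as it must for a martingale started at 0; it does not by itself prove the martingale property.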
The next result characterizes quasi-left-continuity for both filtrations and
martingales.
Proposition 25.19 (quasi-leftcontinuous filtrations, Meyer) For any
filtration $\mathcal{F}$, these conditions are equivalent:
(i) Every accessible time is predictable;
(ii) $\mathcal{F}_{\tau-} = \mathcal{F}_\tau$ on $\{\tau < \infty\}$ for every predictable time $\tau$;
(iii) $\Delta M_\tau = 0$ a.s. on $\{\tau < \infty\}$ for every martingale $M$ and predictable
time $\tau$.
If the basic $\sigma$-field in $\Omega$ is taken to be $\mathcal{F}_\infty$, then $\mathcal{F}_{\tau-} = \mathcal{F}_\tau$ on $\{\tau = \infty\}$
for any optional time $\tau$, and the relation in (ii) extends to all of $\Omega$.
Proof: (i) $\Rightarrow$ (ii): Let $\tau$ be a predictable time, and fix any
$B \in \mathcal{F}_\tau \cap \{\tau < \infty\}$. Then $[\tau_B] \subset [\tau]$, and so $\tau_B$ is accessible, hence by (i) even
predictable. The process $X_t = 1\{\tau_B \le t\}$ is then predictable by Theorem
25.14, and since
$$X_\tau 1\{\tau < \infty\} = 1\{\tau_B \le \tau < \infty\} = 1_B,$$
Lemma 25.3 (i) yields $B \in \mathcal{F}_{\tau-}$.
(ii) $\Rightarrow$ (iii): Fix any martingale $M$, and let $\tau$ be a bounded, predictable
time with announcing sequence $(\tau_n)$. Using (ii) and Lemma 25.2 (iii), we
get as before
$$M_{\tau-} \leftarrow M_{\tau_n} = E[M_\tau \mid \mathcal{F}_{\tau_n}] \to E[M_\tau \mid \mathcal{F}_{\tau-}] = E[M_\tau \mid \mathcal{F}_\tau] = M_\tau,$$
and so $M_{\tau-} = M_\tau$ a.s.
(iii) $\Rightarrow$ (i): If $\tau$ is accessible, then by Proposition 25.4 there exist some
predictable times $\tau_n$ with $[\tau] \subset \bigcup_n [\tau_n]$ a.s. By (iii) we have $\Delta M_{\tau_n} = 0$ a.s.
on $\{\tau_n < \infty\}$ for every martingale $M$ and all $n$, and so $\Delta M_\tau = 0$ a.s. on
$\{\tau < \infty\}$. Hence, $\tau$ is predictable by Theorem 25.14. □
In particular, quasi-left-continuity holds for canonical Feller processes
and their induced filtrations.
Proposition 25.20 (quasi-left-continuity of Feller processes, Blumenthal,
Meyer) Let $X$ be a canonical Feller process with arbitrary initial
distribution, and fix any optional time $\tau$. Then these conditions are
equivalent:
(i) $\tau$ is predictable;
(ii) $\tau$ is accessible;
(iii) $X_{\tau-} = X_\tau$ a.s. on $\{\tau < \infty\}$.
In the special case when $X$ is a.s. continuous, we may conclude that every
optional time is predictable.
Proof: (ii) $\Rightarrow$ (iii): By Proposition 25.4 we may assume that $\tau$ is finite
and predictable. Fix an announcing sequence $(\tau_n)$ and a function $f \in C_0$.
By the strong Markov property, we get for any $h > 0$
$$E\{f(X_{\tau_n}) - f(X_{\tau_n + h})\}^2 = E(f^2 - 2 f T_h f + T_h f^2)(X_{\tau_n})
\le \|f^2 - 2 f T_h f + T_h f^2\|
\le 2 \|f\| \, \|f - T_h f\| + \|f^2 - T_h f^2\|.$$
Letting $n \to \infty$ and then $h \downarrow 0$, it follows by dominated convergence on
the left and by strong continuity on the right that $E\{f(X_{\tau-}) - f(X_\tau)\}^2 = 0$,
which means that $f(X_{\tau-}) = f(X_\tau)$ a.s. Applying this to a sequence
$f_1, f_2, \dots \in C_0$ that separates points, we obtain $X_{\tau-} = X_\tau$ a.s.
(iii) $\Rightarrow$ (i): By (iii) and Theorem 19.20 we have $\Delta M_\tau = 0$ a.s. on $\{\tau < \infty\}$
for every martingale $M$, and so $\tau$ is predictable by Theorem 25.14.
(i) $\Rightarrow$ (ii): This is trivial. □
The following basic inequality will be needed in the proof of Theorem
26.12.
Proposition 25.21 (norm inequality, Garsia, Neveu) Consider a right-
or left-continuous, predictable, increasing process $A$ and a random variable
$\zeta \ge 0$ such that a.s.
$$E[A_\infty - A_t \mid \mathcal{F}_t] \le E[\zeta \mid \mathcal{F}_t], \qquad t \ge 0. \tag{5}$$
Then
$$\|A_\infty\|_p \le p \|\zeta\|_p, \qquad p \ge 1.$$
In the left-continuous case, predictability is clearly equivalent to
adaptedness. The proper interpretation of (5) is to take $E[A_t \mid \mathcal{F}_t] \equiv A_t$ and to
choose right-continuous versions of the martingales $E[A_\infty \mid \mathcal{F}_t]$ and $E[\zeta \mid \mathcal{F}_t]$.
For a right-continuous $A$, we may clearly choose $\zeta = Z^*$, where $Z$ is the
supermartingale on the left of (5). We also note that if $A$ is the compensator
of an increasing process $X$, then (5) holds with $\zeta = X_\infty$.
Proof: We need to consider only the right-continuous case, the case of a
left-continuous process $A$ being similar but simpler. It is enough to assume
that $A$ is bounded, since we may otherwise replace $A$ by the process $A \wedge u$
for arbitrary $u > 0$, and let $u \to \infty$ in the resulting formula. For each $r > 0$,
the random time $\tau_r = \inf\{t;\ A_t \ge r\}$ is predictable by Theorem 25.14. By
optional sampling and Lemma 25.2 we note that (5) remains true with
$t$ replaced by $\tau_r-$. Since $\tau_r$ is $\mathcal{F}_{\tau_r-}$-measurable by the same lemma, we
obtain
$$E[A_\infty - r;\ A_\infty > r] \le E[A_\infty - r;\ \tau_r < \infty]
\le E[A_\infty - A_{\tau_r-};\ \tau_r < \infty]
\le E[\zeta;\ \tau_r < \infty] \le E[\zeta;\ A_\infty \ge r].$$
Writing $A_\infty = \alpha$ and letting $p^{-1} + q^{-1} = 1$, we get by Fubini's theorem,
Hölder's inequality, and some calculus
$$\|\alpha\|_p^p = p^2 q^{-1} E \int_0^\alpha (\alpha - r) r^{p-2} \, dr
= p^2 q^{-1} \int_0^\infty E[\alpha - r;\ \alpha > r] \, r^{p-2} \, dr
\le p^2 q^{-1} \int_0^\infty E[\zeta;\ \alpha \ge r] \, r^{p-2} \, dr
= p^2 q^{-1} E \zeta \int_0^\alpha r^{p-2} \, dr
= p E \zeta \alpha^{p-1} \le p \|\zeta\|_p \|\alpha\|_p^{p-1}.$$
If $\|\alpha\|_p > 0$, we may finally divide both sides by $\|\alpha\|_p^{p-1}$. □
Let us now turn our attention to random measures $\xi$ on $(0, \infty) \times S$,
where $(S, \mathcal{S})$ is a Borel space. We say that $\xi$ is adapted, predictable, or
locally integrable if there exists a subring $\hat{\mathcal{S}} \subset \mathcal{S}$ with $\sigma(\hat{\mathcal{S}}) = \mathcal{S}$ such that
the process $\xi_t B = \xi((0, t] \times B)$ has the corresponding property for every
$B \in \hat{\mathcal{S}}$. In case of adaptedness or predictability, it is clearly equivalent that
the relevant property holds for the measure-valued process $\xi_t$. Let us further
say that a process $V$ on $\mathbb{R}_+ \times S$ is predictable if it is $\mathcal{P} \otimes \mathcal{S}$-measurable,
where $\mathcal{P}$ denotes the predictable $\sigma$-field in $\mathbb{R}_+ \times \Omega$.
Theorem 25.22 (compensation of random measures, Grigelionis, Jacod)
Let $\xi$ be a locally integrable, adapted random measure on some product space
$(0, \infty) \times S$, where $S$ is Borel. Then there exists an a.s. unique predictable
random measure $\hat\xi$ on $(0, \infty) \times S$ such that $E \int V \, d\xi = E \int V \, d\hat\xi$ for every
predictable process $V \ge 0$ on $\mathbb{R}_+ \times S$.
The random measure $\hat\xi$ above is called the compensator of $\xi$. By Lemma
25.13 this extends the notion of compensator for real-valued processes. For
the proof of Theorem 25.22 we need a simple technical lemma, which can
be established by straightforward monotone class arguments.
Lemma 25.23 (predictable random measures)
(i) For any predictable random measure $\xi$ and predictable process $V \ge 0$
on $(0, \infty) \times S$, the process $V \cdot \xi$ is again predictable.
(ii) For any predictable process $V \ge 0$ on $(0, \infty) \times S$ and predictable,
measure-valued process $\rho$ on $S$, the process $Y_t = \int V_{t,s} \, \rho_t(ds)$ is again
predictable.
Proof of Theorem 25.22: Since $\xi$ is locally integrable, we may easily
construct a predictable process $V > 0$ on $\mathbb{R}_+ \times S$ such that $E \int V \, d\xi < \infty$. If
the random measure $\zeta = V \cdot \xi$ has compensator $\hat\zeta$, then by Lemma 25.23
the measure $\hat\xi = V^{-1} \cdot \hat\zeta$ is the compensator of $\xi$. Thus, we may henceforth
assume that $E \xi((0, \infty) \times S) = 1$.
Write $\eta = \xi(\cdot \times S)$. Using the kernel operation $\otimes$ of Chapter 1, we may
introduce the probability measure $\mu = P \otimes \xi$ on $\Omega \times \mathbb{R}_+ \times S$ and its projection
$\nu = P \otimes \eta$ onto $\Omega \times \mathbb{R}_+$. Applying Theorem 6.3 to the restrictions of $\mu$ and
$\nu$ to the $\sigma$-fields $\mathcal{P} \otimes \mathcal{S}$ and $\mathcal{P}$, respectively, we conclude that there exists
some probability kernel $\rho$ from $(\Omega \times \mathbb{R}_+, \mathcal{P})$ to $(S, \mathcal{S})$ satisfying $\mu = \nu \otimes \rho$,
or
$$P \otimes \xi = P \otimes \eta \otimes \rho \quad \text{on } (\Omega \times \mathbb{R}_+ \times S,\ \mathcal{P} \times \mathcal{S}).$$
Letting $\hat\eta$ denote the compensator of $\eta$, we may introduce the random
measure $\hat\xi = \hat\eta \otimes \rho$ on $\mathbb{R}_+ \times S$.
To see that $\hat\xi$ is the compensator of $\xi$, we first note that $\hat\xi$ is predictable by
Lemma 25.23 (i). Next we consider an arbitrary predictable process $V \ge 0$
on $\mathbb{R}_+ \times S$, and note that the process $Y_t = \int V_{t,s} \, \rho_t(ds)$ is again predictable
by Lemma 25.23 (ii). By Theorem 6.4 and Lemma 25.13 we get
$$E \int V \, d\hat\xi = E \int \hat\eta(dt) \int V_{t,s} \, \rho_t(ds)
= E \int \eta(dt) \int V_{t,s} \, \rho_t(ds) = E \int V \, d\xi.$$
It remains to note that $\hat\xi$ is a.s. unique by Lemma 25.13. □
Our next aim is to show, under a weak regularity condition, how a
point process can be transformed to Poisson by means of a suitable pre-
dictable mapping. The result leads to various time-change formulas for
point processes, similar to those for continuous local martingales in Chapter
18.
Recall that an $S$-marked point process on $(0,\infty)$ is defined as an
integer-valued random measure $\xi$ on $(0,\infty) \times S$ such that a.s. $\xi([t] \times S) \le 1$ for
all $t > 0$. The condition implies that $\xi$ is locally integrable, and so the
existence of the associated compensator $\hat\xi$ is automatic. We say that $\xi$ is
quasi-leftcontinuous if $\xi([\tau] \times S) = 0$ a.s. for every predictable time $\tau$.
Theorem 25.24 (predictable mapping to Poisson) Fix a Borel space $S$
and a $\sigma$-finite measure space $(T, \mu)$, let $\xi$ be a quasi-leftcontinuous $S$-marked
point process on $(0,\infty)$ with compensator $\hat\xi$, and let $Y$ be a predictable
mapping from $\mathbb{R}_+ \times S$ to $T$ with $\hat\xi \circ Y^{-1} = \mu$ a.s. Then $\eta = \xi \circ Y^{-1}$ is a
Poisson process on $T$ with $E\eta = \mu$.
Proof: For any disjoint measurable sets $B_1, \dots, B_n$ in $T$ with finite
$\mu$-measure, we need to show that $\eta B_1, \dots, \eta B_n$ are independent Poisson
random variables with means $\mu B_1, \dots, \mu B_n$. Then introduce for each $k \le n$
the processes
$$J_t^k = \int_0^{t+}\!\!\int 1_{B_k}(Y_{s,x})\,\xi(ds\,dx), \qquad \hat J_t^k = \int_0^{t}\!\!\int 1_{B_k}(Y_{s,x})\,\hat\xi(ds\,dx).$$
Here $\hat J_\infty^k = \mu B_k < \infty$ a.s. by hypothesis, and so the $J^k$ are simple and
integrable point processes on $\mathbb{R}_+$ with compensators $\hat J^k$. For fixed
$u_1, \dots, u_n \ge 0$, we define
$$X_t = \sum_{k \le n} \{u_k J_t^k - (1 - e^{-u_k})\hat J_t^k\}, \qquad t \ge 0.$$
The process $M_t = e^{-X_t}$ has bounded variation and finitely many jumps,
and so by an elementary change of variables
$$M_t - 1 = \sum_{s \le t} \Delta e^{-X_s} - \int_0^t e^{-X_s}\,dX_s^c
= \sum_{k \le n} \int_0^{t+} e^{-X_{s-}} (1 - e^{-u_k})\,d(\hat J_s^k - J_s^k).$$
Since the integrands on the right are bounded and predictable, $M$ is a
uniformly integrable martingale, and we get $EM_\infty = 1$. Thus,
$$E\exp\{-\textstyle\sum_k u_k\,\eta B_k\} = \exp\{-\textstyle\sum_k (1 - e^{-u_k})\,\mu B_k\},$$
and the assertion follows by Theorem 5.3. □
The preceding theorem immediately yields a corresponding Poisson char-
acterization, similar to the characterization of Brownian motion in Theorem
18.3. The result may also be considered as an extension of Theorem 12.10.
Corollary 25.25 (Poisson characterization, Watanabe) Fix a Borel space
$S$ and a measure $\mu$ on $(0,\infty) \times S$ with $\mu(\{t\} \times S) = 0$ for all $t > 0$. Let $\xi$
be an $S$-marked, $\mathcal{F}$-adapted point process on $(0,\infty)$ with compensator $\hat\xi$.
Then $\xi$ is $\mathcal{F}$-Poisson with $E\xi = \mu$ iff $\hat\xi = \mu$ a.s.
We may further deduce a basic time-change result, similar to Proposition
18.8 for continuous local martingales.
Corollary 25.26 (time-change to Poisson, Papangelou, Meyer) Let $N^1,
\dots, N^n$ be counting processes on $\mathbb{R}_+$ with a.s. unbounded and continuous
compensators $\hat N^1, \dots, \hat N^n$, and assume that $\sum_k N^k$ is a.s. simple.
Define $\tau_s^k = \inf\{t > 0;\ \hat N_t^k > s\}$ and $Y_s^k = N^k(\tau_s^k)$. Then $Y^1, \dots, Y^n$ are
independent unit-rate Poisson processes.
Proof: We may apply Theorem 25.24 to the random measures $\xi =
(\xi_1, \dots, \xi_n)$ and $\hat\xi = (\hat\xi_1, \dots, \hat\xi_n)$ on $\{1, \dots, n\} \times \mathbb{R}_+$ induced by $(N^1, \dots, N^n)$
and $(\hat N^1, \dots, \hat N^n)$, respectively, and to the predictable mapping $T_{k,t} =
(k, \hat N_t^k)$ on $\{1, \dots, n\} \times \mathbb{R}_+$. It is then enough to verify that, a.s. for fixed
$k$ and $t$,
$$\hat\xi_k\{s \ge 0;\ \hat N_s^k \le t\} = t, \qquad \xi_k\{s \ge 0;\ \hat N_s^k \le t\} = N^k(\tau_t^k),$$
which is clear by the continuity of $\hat N^k$. □
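The time change in Corollary 25.26 can be illustrated numerically in the simplest case $n = 1$. The following Python sketch is an illustration under stated assumptions, not part of the text: it takes a deterministic intensity $2t$ (so that the compensator is $\hat N_t = t^2$), simulates the jump times by thinning, and maps them through the compensator. The function names, the horizon, and the seed are hypothetical choices. If the corollary applies, the transformed gaps should resemble independent standard exponential variables, with mean and variance both close to 1.

```python
import random

def simulate_by_thinning(intensity, intensity_max, horizon, rng):
    """Jump times on [0, horizon] of a Poisson process with the given intensity.

    Thinning: candidate points arrive at rate intensity_max and each is kept
    with probability intensity(t) / intensity_max.
    """
    times, t = [], 0.0
    while True:
        t += rng.expovariate(intensity_max)
        if t > horizon:
            return times
        if rng.random() * intensity_max <= intensity(t):
            times.append(t)

def time_changed_gaps(times, compensator):
    """Gaps between the jump times after mapping them through the compensator."""
    mapped = [compensator(t) for t in times]
    return [b - a for a, b in zip([0.0] + mapped[:-1], mapped)]

# Assumed example: intensity 2t on [0, 30], hence compensator t**2.
rng = random.Random(2024)
horizon = 30.0
jumps = simulate_by_thinning(lambda t: 2.0 * t, 2.0 * horizon, horizon, rng)
gaps = time_changed_gaps(jumps, lambda t: t * t)
mean_gap = sum(gaps) / len(gaps)
var_gap = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
print(f"{len(gaps)} jumps, mean gap {mean_gap:.3f}, variance {var_gap:.3f}")
```

Thinning is used rather than inversion so that the simulation does not already presuppose the time-change result it is meant to illustrate.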
There is a similar result for stochastic integrals with respect to p-stable
Levy processes, as described in Proposition 15.9. For simplicity, we consider
only the case when p < 1.
Proposition 25.27 (time-change of stable integrals) For a $p \in (0,1)$, let
$X$ be a strictly $p$-stable Lévy process, and consider a predictable process
$V \ge 0$ such that the process $A = V^p \cdot \lambda$ is a.s. finite but unbounded. Define
$\tau_s = \inf\{t;\ A_t > s\}$, $s \ge 0$. Then $(V \cdot X) \circ \tau \overset{d}{=} X$.
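Proof: Define a point process $\xi$ on $\mathbb{R}_+ \times (\mathbb{R} \setminus \{0\})$ by $\xi B = \sum_s 1_B(s, \Delta X_s)$,
and recall from Corollary 15.7 and Proposition 15.9 that $\xi$ is Poisson with
intensity measure of the form $\lambda \otimes \nu$, where $\nu(dx) = c_\pm |x|^{-p-1}\,dx$ for $\pm x > 0$.
In particular, $\xi$ has compensator $\hat\xi = \lambda \otimes \nu$. Let the predictable mapping
$T$ on $\mathbb{R}_+ \times \mathbb{R}$ be given by $T_{s,x} = (A_s, xV_s)$. Since $A$ is continuous, we have
$\{A_s \le t\} = \{s \le \tau_t\}$ and $A_{\tau_t} = t$. By Fubini's theorem, we hence obtain
for any $t, u > 0$
$$(\lambda \otimes \nu) \circ T^{-1}([0,t] \times (u,\infty)) = (\lambda \otimes \nu)\{(s,x);\ A_s \le t,\ xV_s > u\}
= \int_0^{\tau_t} \nu\{x;\ xV_s > u\}\,ds
= \nu(u,\infty) \int_0^{\tau_t} V_s^p\,ds = t\,\nu(u,\infty),$$
and similarly for the sets $[0,t] \times (-\infty,-u)$. Thus, $\hat\xi \circ T^{-1} = \hat\xi = \lambda \otimes \nu$ a.s.,
and so Theorem 25.24 yields $\xi \circ T^{-1} \overset{d}{=} \xi$. Finally, we note that
$$(V \cdot X)_{\tau_t} = \int_0^{\tau_t+}\!\!\int xV_s\,\xi(ds\,dx) = \int_0^\infty\!\!\int xV_s 1\{A_s \le t\}\,\xi(ds\,dx)
= \int_0^{t+}\!\!\int y\,(\xi \circ T^{-1})(dr\,dy),$$
where the process on the right has the same distribution as $X$. □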
We turn to an important special case where the compensator can be
computed explicitly. By the natural compensator of a random measure $\xi$
we mean the compensator with respect to the induced filtration.
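Proposition 25.28 (natural compensator) For any Borel space $(S, \mathcal{S})$, let
$(\tau, \zeta)$ be a random element in $(0,\infty] \times S$ with distribution $\mu$. Then $\xi = \delta_{\tau,\zeta}$
has natural compensator
$$\hat\xi_t B = \int_{(0,\,t \wedge \tau]} \frac{\mu(dr \times B)}{\mu([r,\infty] \times S)}, \qquad t \ge 0,\ B \in \mathcal{S}. \qquad (6)$$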
Proof: The process $\eta_t B$ on the right of (6) is clearly predictable for every
$B \in \mathcal{S}$. It remains to show that $M_t = \xi_t B - \eta_t B$ is a martingale, hence
that $E[M_t - M_s;\ A] = 0$ for any $s < t$ and $A \in \mathcal{F}_s$. Since $M_t = M_s$ on
$\{\tau \le s\}$, and the set $\{\tau > s\}$ is a.s. an atom of $\mathcal{F}_s$, it suffices to show that
$E(M_t - M_s) = 0$, or $EM_t \equiv 0$. Then use Fubini's theorem to get
$$E\eta_t B = E\int_{(0,\,t \wedge \tau]} \frac{\mu(dr \times B)}{\mu([r,\infty] \times S)}
= \int_{(0,\infty]} \mu(dx \times S) \int_{(0,\,t \wedge x]} \frac{\mu(dr \times B)}{\mu([r,\infty] \times S)}$$
$$= \int_{(0,t]} \frac{\mu(dr \times B)}{\mu([r,\infty] \times S)} \int_{[r,\infty]} \mu(dx \times S)
= \mu((0,t] \times B) = E\xi_t B.$$
□
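Formula (6) can be checked exactly when $\tau$ has a discrete distribution and the mark space is a single point, since the integral then reduces to a sum of hazard ratios $\mu\{r\}/\mu[r,\infty]$. The following Python sketch is an illustration only; the particular law of $\tau$ and the function names are hypothetical. It verifies the identity $E\hat\xi_t = \mu(0,t]$ used at the end of the proof, in exact rational arithmetic.

```python
from fractions import Fraction

def compensator(t, tau, law):
    """Value at time t of the natural compensator (6) when tau is observed.

    `law` maps each atom r of the distribution of tau to its mass; the
    integral over (0, min(t, tau)] reduces to a sum of hazard ratios.
    """
    total = Fraction(0)
    for r, mass in law.items():
        if r <= min(t, tau):
            tail = sum(m for x, m in law.items() if x >= r)  # mu[r, infinity]
            total += mass / tail
    return total

def expected_compensator(t, law):
    """Expectation of the compensator at time t, averaging over the law of tau."""
    return sum(mass * compensator(t, tau, law) for tau, mass in law.items())

# Hypothetical law of tau: atoms at 1, 2, 3 and mass 1/4 at infinity.
law = {1: Fraction(1, 4), 2: Fraction(1, 3), 3: Fraction(1, 6),
       float("inf"): Fraction(1, 4)}
for t in (1, 2, 3, 10):
    cdf = sum(m for x, m in law.items() if x <= t)
    print(t, expected_compensator(t, law), cdf)
```

The two printed columns agree for every $t$, which is the discrete form of $E\eta_t B = E\xi_t B$.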
We turn to some applications of the previous ideas to classical potential
theory. Then fix a domain $D \subset \mathbb{R}^d$, and let $T_t = T_t^D$ denote the transition
operators of Brownian motion $X$ in $D$, killed at the boundary $\partial D$. A
function $f \ge 0$ on $D$ is said to be excessive if $T_t f \le f$ for all $t > 0$ and $T_t f \to f$
as $t \to 0$. In this case clearly $T_t f \uparrow f$. Note that if $f$ is excessive, then
$f(X)$ is a supermartingale under $P_x$ for every $x \in D$. The basic example
of an excessive function is the Green potential $G^D\nu$ of a measure $\nu$ on a
Greenian domain $D$, provided this potential is finite.
Though excessivity is defined globally in terms of the operators $T_t^D$, it is
in fact a local property. For a precise statement, we say that a measurable
function $f \ge 0$ on $D$ is superharmonic if, for any ball $B$ in $D$ with center $x$,
the average of $f$ over the sphere $\partial B$ is bounded by $f(x)$. As we shall see, it is
enough to consider balls in $D$ of radius less than an arbitrary $\varepsilon > 0$. Recall
that $f$ is lower semicontinuous if $x_n \to x$ implies $\liminf_n f(x_n) \ge f(x)$.
Theorem 25.29 (superharmonic and excessive functions, Doob) Let $f \ge 0$
be a measurable function on a domain $D \subset \mathbb{R}^d$. Then $f$ is excessive iff it
is superharmonic and lower semicontinuous.
For the proof we need two lemmas, the first of which clarifies the relation
between the two continuity properties.
Lemma 25.30 (semicontinuity) Consider a measurable function $f \ge 0$
on a domain $D \subset \mathbb{R}^d$ such that $T_t f \le f$ for all $t > 0$. Then $f$ is excessive
iff it is lower semicontinuous.
Proof: First assume that $f$ is excessive, and let $x_n \to x$ in $D$. By Theorem
24.7 and Fatou's lemma
$$T_t f(x) = \int p_t^D(x,y) f(y)\,dy \le \liminf_{n\to\infty} \int p_t^D(x_n,y) f(y)\,dy
= \liminf_{n\to\infty} T_t f(x_n) \le \liminf_{n\to\infty} f(x_n),$$
and as $t \to 0$, we get $f(x) \le \liminf_n f(x_n)$. Thus, $f$ is lower semicontinuous.
Next assume that $f$ is lower semicontinuous. Using the continuity of $X$
and Fatou's lemma, we get as $t \to 0$ along an arbitrary sequence
$$f(x) = E_x f(X_0) \le E_x \liminf_{t\to 0} f(X_t) \le \liminf_{t\to 0} E_x f(X_t)
= \liminf_{t\to 0} T_t f(x) \le \limsup_{t\to 0} T_t f(x) \le f(x).$$
Thus, $T_t f \to f$, and $f$ is excessive. □
For smooth functions, the superharmonic property is easy to describe.
Lemma 25.31 (smooth functions) A function $f \ge 0$ in $C^2(D)$ is
superharmonic iff $\Delta f \le 0$, in which case $f$ is also excessive.
Proof: By Itô's formula, the process
$$M_t = f(X_t) - \tfrac12 \int_0^t \Delta f(X_s)\,ds, \qquad t \in [0,\zeta), \qquad (7)$$
is a continuous local martingale. Now fix any closed ball $B \subset D$ with center
$x$, and write $\tau = \tau_{\partial B}$. Since $E_x\tau < \infty$, we get by dominated convergence
$$f(x) = E_x f(X_\tau) - \tfrac12 E_x \int_0^\tau \Delta f(X_s)\,ds.$$
Thus, $f$ is superharmonic iff the last expectation is $\le 0$, and the first
assertion follows.
To prove the last statement, we note that the exit time $\zeta = \tau_{\partial D}$ is
predictable, say with announcing sequence $(\tau_n)$. If $\Delta f \le 0$, we get from (7)
by optional sampling
$$E_x[f(X_{t \wedge \tau_n});\ t < \zeta] \le E_x f(X_{t \wedge \tau_n}) \le f(x).$$
Hence, Fatou's lemma yields $T_t f(x) = E_x[f(X_t);\ t < \zeta] \le f(x)$, and so $f$ is
excessive by Lemma 25.30. □
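The first assertion of Lemma 25.31 can be checked numerically for a concrete smooth function. The sketch below is only an illustration under assumptions not in the text: it works in dimension $d = 2$ with the hypothetical test function $f(x,y) = 1 - x^2 - y^2$ on the unit disk, for which $\Delta f = -4 \le 0$. For this $f$ the average over a circle of radius $r$ equals $f(\text{center}) - r^2$, which agrees with the displayed identity in the proof, since the mean exit time of planar Brownian motion from a disk of radius $r$, started at the center, is $r^2/2$.

```python
import math

def sphere_average(f, center, radius, n_points=360):
    """Average of f over the circle of the given radius around center (d = 2)."""
    cx, cy = center
    total = 0.0
    for k in range(n_points):
        angle = 2.0 * math.pi * k / n_points
        total += f(cx + radius * math.cos(angle), cy + radius * math.sin(angle))
    return total / n_points

def f(x, y):
    """Smooth, nonnegative test function on the unit disk with Laplacian -4."""
    return 1.0 - x * x - y * y

# The closed ball below lies inside the unit disk.
center, radius = (0.3, -0.2), 0.4
avg = sphere_average(f, center, radius)
print(avg, f(*center))
```

The equally spaced quadrature is exact up to rounding here, because $f$ restricted to a circle is a trigonometric polynomial of low degree.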
Proof of Theorem 25.29: If $f$ is excessive or superharmonic, then Lemma
25.30 shows that $f \wedge n$ has the same property for every $n > 0$. The converse
statement is also true, by monotone convergence and because the lower
semicontinuity is preserved by increasing limits. Thus, we may henceforth
assume that $f$ is bounded.
Now assume that $f$ is excessive on $D$. By Lemma 25.30 it is then lower
semicontinuous, and it remains to prove that $f$ is superharmonic. Since
the property $T_t f \le f$ is preserved by passing to a subdomain, we may
assume that $D$ is bounded. For each $h > 0$ we define $q_h = h^{-1}(f - T_h f)$
and $f_h = G^D q_h$. Since $f$ and $D$ are bounded, we have $G^D f < \infty$, and so
$f_h = h^{-1}\int_0^h T_s f\,ds \uparrow f$. By the strong Markov property we further see that,
for any optional time $\tau < \zeta$,
$$E_x f_h(X_\tau) = E_x E_{X_\tau} \int_0^\infty q_h(X_s)\,ds = E_x \int_0^\infty q_h(X_{s+\tau})\,ds
= E_x \int_\tau^\infty q_h(X_s)\,ds \le f_h(x).$$
In particular, $f_h$ is superharmonic for each $h$, and so by monotone
convergence the same property holds for $f$.
Conversely, assume that $f$ is superharmonic and lower semicontinuous.
To prove that $f$ is excessive, it is enough by Lemma 25.30 to show that
$T_t f \le f$ for all $t$. Then fix a spherically symmetric probability density $\psi \in
C^\infty(\mathbb{R}^d)$ with support in the unit ball, and put $\psi_h(x) = h^{-d}\psi(x/h)$ for each
$h > 0$. Writing $\rho$ for the Euclidean metric in $\mathbb{R}^d$, we may define $f_h = \psi_h * f$
on the set $D_h = \{x \in D;\ \rho(x, D^c) > h\}$. Note that $f_h \in C^\infty(D_h)$ for all
$h$, that $f_h$ is superharmonic on $D_h$, and that $f_h \uparrow f$. By Lemma 25.31
and monotone convergence we conclude that $f$ is excessive on each set $D_h$.
Letting $\zeta_h$ denote the first exit time from $D_h$, we obtain
$$E_x[f(X_t);\ t < \zeta_h] \le f(x), \qquad h > 0.$$
As $h \to 0$, we have $\zeta_h \uparrow \zeta$, and hence $\{t < \zeta_h\} \uparrow \{t < \zeta\}$. Thus, by
monotone convergence $T_t f(x) \le f(x)$. □
We may now prove the remarkable fact that, although an excessive
function $f$ need not be continuous, the supermartingale $f(X)$ is a.s. continuous
under $P_x$ for every $x$.
Theorem 25.32 (continuity, Doob) Fix an excessive function $f$ on a
domain $D \subset \mathbb{R}^d$, and let $X$ be a Brownian motion killed at $\partial D$. Then the
process $f(X_t)$ is a.s. continuous on $[0,\zeta)$.
The proof is based on the following invariance under time reversal of a
stationary version of Brownian motion. Though no such process exists in
the usual sense, we may consider distributions with respect to the $\sigma$-finite
measure $\bar P = \int P_x\,dx$, where $P_x$ is the distribution of a Brownian motion
in $\mathbb{R}^d$ starting at $x$.
Lemma 25.33 (time reversal, Doob) For any $c > 0$, the processes $Y_t =
X_t$ and $\tilde Y_t = X_{c-t}$ on $[0,c]$ have the same distribution under $\bar P$.
Proof: Introduce the processes
$$B_t = X_t - X_0, \qquad \tilde B_t = X_{c-t} - X_c, \qquad t \in [0,c],$$
and note that $B$ and $\tilde B$ are Brownian motions on $[0,c]$ under each $P_x$. Fix
any measurable function $f \ge 0$ on $C([0,c], \mathbb{R}^d)$. By Fubini's theorem and
the invariance of Lebesgue measure, we get
$$\bar E f(\tilde Y) = \bar E f(X_0 - \tilde B_c + \tilde B) = \int E_x f(x - \tilde B_c + \tilde B)\,dx
= \int E_0 f(x - \tilde B_c + \tilde B)\,dx = E_0 \int f(x - \tilde B_c + \tilde B)\,dx$$
$$= E_0 \int f(x + \tilde B)\,dx = \int E_x f(Y)\,dx = \bar E f(Y).$$
□
Proof of Theorem 25.32: Since $f \wedge n$ is again excessive for each $n > 0$ by
Theorem 25.29, we may assume that $f$ is bounded. As in the proof of the
same theorem, we may then approximate $f$ by smooth excessive functions
$f_h \uparrow f$ on suitable subdomains $D_h \uparrow D$. Since $f_h(X)$ is a continuous
supermartingale up to the exit time $\zeta_h$ from $D_h$, Theorem 7.32 shows that $f(X)$
is a.s. right-continuous on $[0,\zeta)$ under any initial distribution $\mu$. Using the
Markov property at rational times, we may extend the a.s. right-continuity
to the random time set $T = \{t \ge 0;\ X_t \in D\}$.
To strengthen the result to a.s. continuity on $T$, we note that $f(X)$ is
right-continuous on $T$, a.e. $\bar P$. By Lemma 25.33 it follows that $f(X)$ is also
left-continuous on $T$, a.e. $\bar P$. Thus, $f(X)$ is continuous on $T$, a.s. $P_\mu$ for
arbitrary $\mu \ll \lambda^d$. Since $P_\mu \circ X_h^{-1} \ll \lambda^d$ for any $\mu$ and $h > 0$, we may
conclude that $f(X)$ is a.s. continuous on $T \cap [h,\infty)$ for any $h > 0$. This
together with the right-continuity at 0 yields the asserted continuity on
$[0,\zeta)$. □
If $f$ is excessive, then $f(X)$ is a supermartingale under $P_x$ for every $x$,
and so it has a Doob-Meyer decomposition $f(X) = M - A$. It is remarkable
that we can choose $A$ to be a continuous additive functional (CAF) of $X$
independent of $x$. A similar situation was encountered in connection with
Theorem 22.23.
Theorem 25.34 (compensation by additive functional, Meyer) Let $f$ be
an excessive function on a domain $D \subset \mathbb{R}^d$, and let $P_x$ be the distribution
of Brownian motion in $D$, killed at $\partial D$. Then there exists an a.s. unique
CAF $A$ of $X$ such that $M = f(X) + A$ is a continuous, local $P_x$-martingale
on $[0,\zeta)$ for every $x \in D$.
The main difficulty in the proof is to construct a version of the process
$A$ that compensates $-f(X)$ under every measure $P_\mu$. Here the following
lemma is helpful.
Lemma 25.35 (universal compensation) Consider an excessive function
$f$ on a domain $D \subset \mathbb{R}^d$, a distribution $m \sim \lambda^d$ on $D$, and a $P_m$-compensator
$A$ of $-f(X)$ on $[0,\zeta)$. Then for any distribution $\mu$ and constant $h > 0$,
the process $A \circ \theta_h$ is a $P_\mu$-compensator of $-f(X \circ \theta_h)$ on $[0, \zeta \circ \theta_h)$.
In other words, the process $M_t = f(X_t) + A_{t-h} \circ \theta_h$ is a local
$P_\mu$-martingale on $[h,\zeta)$ for every $\mu$ and $h$.
Proof: For any bounded $P_m$-martingale $M$ and initial distribution $\mu \ll
m$, we note that $M$ is also a $P_\mu$-martingale. To see this, write $k = d\mu/dm$,
and note that $P_\mu = k(X_0) \cdot P_m$. It is equivalent to show that $N_t = k(X_0)M_t$
is a $P_m$-martingale, which is clear since $k(X_0)$ is $\mathcal{F}_0$-measurable with mean
1.
Now fix any distribution $\mu$ and a constant $h > 0$. To prove the stated
property of $A$, it is enough to show that, for any bounded $P_m$-martingale
$M$, the process $N_t = M_{t-h} \circ \theta_h$ is a $P_\mu$-martingale on $[h,\infty)$. Then fix any
times $s < t$ and sets $F \in \mathcal{F}_h$ and $G \in \mathcal{F}_s$. Using the Markov property at $h$
and noting that $P_\mu \circ X_h^{-1} \ll m$, we get
$$E_\mu[M_t \circ \theta_h;\ F \cap \theta_h^{-1}G] = E_\mu[E_{X_h}[M_t;\ G];\ F]
= E_\mu[E_{X_h}[M_s;\ G];\ F] = E_\mu[M_s \circ \theta_h;\ F \cap \theta_h^{-1}G].$$
Hence, by a monotone class argument, $E_\mu[M_t \circ \theta_h \mid \mathcal{F}_{h+s}] = M_s \circ \theta_h$ a.s. □
Proof of Theorem 25.34: Let $A^\mu$ denote the $P_\mu$-compensator of $-f(X)$ on
$[0,\zeta)$, and note that $A^\mu$ is a.s. continuous, e.g. by Theorem 18.10. Fix any
distribution $m \sim \lambda^d$ on $D$, and conclude from Lemma 25.35 that $A^m \circ \theta_h$
is a $P_\mu$-compensator of $-f(X \circ \theta_h)$ on $[0, \zeta \circ \theta_h)$ for any $\mu$ and $h > 0$. Since
this is also true for the process $A^\mu_{t+h} - A^\mu_h$, we get for any $\mu$ and $h > 0$
$$A^\mu_t = A^\mu_h + A^m_{t-h} \circ \theta_h, \qquad t \ge h, \quad \text{a.s. } P_\mu. \qquad (8)$$
Restricting $h$ to the positive rationals, we may define
$$A_t = \lim_{h \to 0} A^m_{t-h} \circ \theta_h, \qquad t > 0,$$
whenever the limit exists and is continuous and nondecreasing with $A_0 = 0$,
and put $A = 0$ otherwise. By (8) we have $A = A^\mu$ a.s. $P_\mu$ for every $\mu$, and
so $A$ is a $P_\mu$-compensator of $-f(X)$ on $[0,\zeta)$ for every $\mu$. For each $h > 0$
it follows by Lemma 25.35 that $A \circ \theta_h$ is a $P_\mu$-compensator of $-f(X \circ \theta_h)$
on $[0, \zeta \circ \theta_h)$, and since this is also true for the process $A_{t+h} - A_h$, we get
$A_{t+h} = A_h + A_t \circ \theta_h$ a.s. $P_\mu$. Thus, $A$ is a CAF. □
We may now establish a probabilistic version of the classical Riesz
decomposition. To avoid technical difficulties, we restrict our attention to
locally bounded functions $f$. By the greatest harmonic minorant of $f$ we
mean a harmonic function $h \le f$ that dominates all other such functions.
Recall that the potential $U_A$ of a CAF $A$ of $X$ is given by $U_A(x) = E_x A_\infty$.
Theorem 25.36 (Riesz decomposition) Fix any locally bounded function
$f \ge 0$ on a domain $D \subset \mathbb{R}^d$, and let $X$ be Brownian motion on $D$, killed
at $\partial D$. Then $f$ is excessive iff it has a representation $f = U_A + h$, where
$A$ is a CAF of $X$ and $h$ is harmonic with $h \ge 0$. In that case, $A$ is the
compensator of $-f(X)$ and $h$ is the greatest harmonic minorant of $f$.
A similar result for uniformly $\alpha$-excessive functions of an arbitrary Feller
process was obtained in Theorem 22.23. From the classical Riesz
representation on Greenian domains, we know that $U_A$ may also be written as the
Green potential of a unique measure $\nu_A$, so that $f = G^D\nu_A + h$. In the
special case when $D = \mathbb{R}^d$ with $d \ge 3$, we recall from Theorem 22.21 that
$\nu_A B = \bar E(1_B \cdot A)_1$. A similar representation holds in the general case.
Proof of Theorem 25.36: First assume that $A$ is a CAF with $U_A < \infty$.
By the additivity of $A$ and the Markov property of $X$, we get for any $t > 0$
$$U_A(x) = E_x A_\infty = E_x(A_t + A_\infty \circ \theta_t)
= E_x A_t + E_x E_{X_t} A_\infty = E_x A_t + T_t U_A(x).$$
By dominated convergence $E_x A_t \downarrow 0$ as $t \to 0$, and so $U_A$ is excessive. Even
$U_A + h$ is then excessive for any harmonic function $h \ge 0$.
Conversely, assume that $f$ is excessive and locally bounded. By Theorem
25.34 there exists some CAF $A$ such that $M = f(X) + A$ is a continuous
local martingale on $[0,\zeta)$. For any localizing and announcing sequence $\tau_n \uparrow
\zeta$, we get
$$f(x) = E_x M_0 = E_x M_{\tau_n} = E_x f(X_{\tau_n}) + E_x A_{\tau_n} \ge E_x A_{\tau_n}.$$
As $n \to \infty$, it follows by monotone convergence that $U_A \le f$.
By the additivity of $A$ and the Markov property of $X$,
$$E_x[A_\infty \mid \mathcal{F}_t] = A_t + E_x[A_\infty \circ \theta_t \mid \mathcal{F}_t]
= A_t + E_{X_t} A_\infty = M_t - f(X_t) + U_A(X_t). \qquad (9)$$
Writing $h = f - U_A$, it follows that $h(X)$ is a continuous local martingale.
Since $h$ is locally bounded, we may conclude by optional sampling and
dominated convergence that $h$ has the mean-value property. Thus, $h$ is
harmonic by Lemma 24.3.
To prove the uniqueness of $A$, assume that $f$ also has a representation
$U_B + k$ for some CAF $B$ and some harmonic function $k \ge 0$. Proceeding
as in (9), we get
$$A_t - B_t = E_x[A_\infty - B_\infty \mid \mathcal{F}_t] + h(X_t) - k(X_t), \qquad t \ge 0,$$
which shows that $A - B$ is a continuous local martingale. Hence, Proposition
17.2 yields $A = B$ a.s.
To see that $h$ is the greatest harmonic minorant of $f$, consider any
harmonic minorant $k \ge 0$. Since $f - k$ is again excessive and locally bounded,
it has a representation $U_B + l$ for some CAF $B$ and some harmonic function
$l$. But then $f = U_B + k + l$, and so $A = B$ a.s. and $h = k + l \ge k$. □
For any sufficiently regular measure $\nu$ on $\mathbb{R}^d$, we may now construct an
associated CAF $A$ of Brownian motion $X$ such that $A$ increases only when
$X$ visits the support of $\nu$. This clearly extends the notion of local time. For
convenience we may write $G^D(1_D \cdot \nu) = G^D\nu$.
Proposition 25.37 (additive functionals induced by measures) Fix a
measure $\nu$ on $\mathbb{R}^d$ such that $U(1_D \cdot \nu)$ is bounded for every bounded
domain $D$. Then there exists an a.s. unique CAF $A$ of Brownian motion $X$
such that, for any $D$,
$$E_x A_{\zeta_D} = G^D\nu(x), \qquad x \in D. \qquad (10)$$
Conversely, $\nu$ is uniquely determined by $A$. Furthermore,
$$\operatorname{supp} A \subset \{t \ge 0;\ X_t \in \operatorname{supp}\nu\} \quad \text{a.s.} \qquad (11)$$
The proof is straightforward, given the classical Riesz decomposition,
and we shall indicate the main steps only.
Proof: A simple calculation shows that $G^D\nu$ is excessive for any bounded
domain $D$. Since $G^D\nu \le U(1_D \cdot \nu)$, it is further bounded. Hence, by
Theorem 25.36 there exist a CAF $A_D$ of $X$ on $[0,\zeta_D)$ and a harmonic function
$h_D \ge 0$ such that $G^D\nu = U_{A_D} + h_D$. In fact, $h_D = 0$ by Riesz' theorem.
Now consider another bounded domain $D' \supset D$. We claim that $G^{D'}\nu -
G^D\nu$ is harmonic on $D$. This is clear from the analytic definitions, and it
also follows, under a regularity condition, from Lemma 24.13. Since $A_D$
and $A_{D'}$ are compensators of $-G^D\nu(X)$ and $-G^{D'}\nu(X)$ respectively, we
conclude that $A_D - A_{D'}$ is a martingale on $[0,\zeta_D)$, and so $A_D = A_{D'}$ a.s.
up to time $\zeta_D$. Now choose a sequence of bounded domains $D_n \uparrow \mathbb{R}^d$, and
define $A = \sup_n A_{D_n}$, so that $A = A_D$ a.s. on $[0,\zeta_D)$ for all $D$.
It is easy to see that $A$ is a CAF of $X$, and that (10) holds for any
bounded domain $D$. The uniqueness of $\nu$ is clear from the uniqueness in
the classical Riesz decomposition. Finally, we obtain (11) by noting that
$G^D\nu$ is harmonic on $D \setminus \operatorname{supp}\nu$ for every $D$, so that $G^D\nu(X)$ is a local
martingale on the predictable set $\{t < \zeta_D;\ X_t \notin \operatorname{supp}\nu\}$. □
Exercises
1. Show by an example that the $\sigma$-fields $\mathcal{F}_\tau$ and $\mathcal{F}_{\tau-}$ may differ. (Hint:
Take $\tau$ to be constant.)
2. Give examples of optional times that are predictable; accessible but not
predictable; and totally inaccessible. (Hint: Use Corollary 25.18.)
3. Show by an example that a right-continuous, adapted process need not
be predictable. (Hint: Use Theorem 25.14.)
4. Given a Brownian motion $B$ on $[0,1]$, let $\mathcal{F}$ be the filtration induced by
$X_t = (B_t, B_1)$. Find the Doob-Meyer decomposition $B = M + A$ on $[0,1)$
and show that $A$ has a.s. finite variation on $[0,1]$.
5. For any totally inaccessible time $\tau$, show that $\sup_t |P[\tau \le t + \varepsilon \mid \mathcal{F}_t] -
1\{\tau \le t\}| \to 0$ a.s. as $\varepsilon \to 0$. Derive a corresponding result for the
compensator. (Hint: Use Lemma 25.8.)
6. Let the process $X$ be adapted and rcll. Show that $X$ is predictable iff
it has accessible jumps and $\Delta X_\tau$ is $\mathcal{F}_{\tau-}$-measurable for every predictable
time $\tau < \infty$. (Hint: Use Proposition 25.17 and Lemmas 25.2 and 25.3.)
7. Show that the compensator $A$ of a quasi-leftcontinuous local
submartingale is a.s. continuous. (Hint: Note that $A$ has accessible jumps. Use
optional sampling at an arbitrary predictable time $\tau < \infty$ with announcing
sequence $(\tau_n)$.)
8. Extend Corollary 25.26 to possibly bounded compensators. Show that
the result fails in general when the compensators are not continuous.
9. Show that any general inequality involving an increasing process $A$
and its compensator $\hat A$ remains valid in discrete time. (Hint: Embed the
discrete-time process and filtration into continuous time.)
Chapter 26
Semimartingales and General
Stochastic Integration
Predictable covariation and $L^2$-integral; semimartingale integral
and covariation; general substitution rule; Doléans' exponential
and change of measure; norm and exponential inequalities;
martingale integral; decomposition of semimartingales;
quasi-martingales and stochastic integrators
In this chapter we shall use the previously established Doob-Meyer de-
composition to extend the stochastic integral of Chapter 17 to possibly
discontinuous semimartingales. The construction proceeds in three steps.
First we imitate the definition of the $L^2$-integral $V \cdot M$ from Chapter 17,
using a predictable version $\langle M, N\rangle$ of the covariation process. A suitable
truncation then allows us to extend the integral to arbitrary semimartin-
gales X and bounded, predictable processes V. The ordinary covariation
[X, Y] can now be defined by the integration-by-parts formula, and we may
use some generalized versions of the BDG inequalities from Chapter 17 to
extend the martingale integral V . M to more general integrands V.
Once the stochastic integral is defined, we may develop a stochastic cal-
culus for general semimartingales. In particular, we shall prove an extension
of Itô's formula, solve a basic stochastic differential equation, and establish
a general Girsanov-type theorem for absolutely continuous changes of the
probability measure. The latter material extends the appropriate portions
of Chapters 18 and 21.
The stochastic integral and covariation process, together with the Doob-
Meyer decomposition from the preceding chapter, provide the tools for a
more detailed analysis of semimartingales. Thus, we may now establish two
general decompositions, similar to the decompositions of optional times
and increasing processes in Chapter 25. We shall further derive some
exponential inequalities for martingales with bounded jumps, characterize
local quasi-martingales as special semimartingales, and show that no con-
tinuous extension of the predictable integral exists beyond the context of
semimartingales.
Throughout this chapter, $\mathcal{M}^2$ denotes the class of uniformly
square-integrable martingales. As in Lemma 17.4, we note that $\mathcal{M}^2$ is a Hilbert
space for the norm $\|M\| = (EM_\infty^2)^{1/2}$. We define $\mathcal{M}_0^2$ as the closed
linear subspace of martingales $M \in \mathcal{M}^2$ with $M_0 = 0$. The corresponding
classes $\mathcal{M}^2_{\mathrm{loc}}$ and $\mathcal{M}^2_{0,\mathrm{loc}}$ are defined as the sets of processes $M$ such that
the stopped versions $M^{\tau_n}$ belong to $\mathcal{M}^2$ or $\mathcal{M}_0^2$, respectively, for some
sequence of optional times $\tau_n \to \infty$.
For every $M \in \mathcal{M}^2_{\mathrm{loc}}$ we note that $M^2$ is a local submartingale. The
corresponding compensator, denoted by $\langle M\rangle$, is called the predictable quadratic
variation of $M$. More generally, we may define the predictable covariation
$\langle M, N\rangle$ of two processes $M, N \in \mathcal{M}^2_{\mathrm{loc}}$ as the compensator of $MN$, also
computable by the polarization formula
$$4\langle M, N\rangle = \langle M + N\rangle - \langle M - N\rangle.$$
Note that $\langle M, M\rangle = \langle M\rangle$. If $M$ and $N$ are continuous, then clearly
$\langle M, N\rangle = [M, N]$ a.s. The following result collects some further useful
properties.
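Proposition 26.1 (predictable covariation) For any $M, M^n, N \in \mathcal{M}^2_{\mathrm{loc}}$,
(i) $\langle M, N\rangle = \langle M - M_0, N - N_0\rangle$ a.s.;
(ii) $\langle M\rangle$ is a.s. increasing, and $\langle M, N\rangle$ is a.s. symmetric and bilinear;
(iii) $|\langle M, N\rangle| \le \int |d\langle M, N\rangle| \le \langle M\rangle^{1/2}\langle N\rangle^{1/2}$ a.s.;
(iv) $\langle M, N\rangle^\tau = \langle M^\tau, N\rangle = \langle M^\tau, N^\tau\rangle$ a.s. for any optional time $\tau$;
(v) $\langle M^n\rangle_\infty \overset{P}{\to} 0$ implies $(M^n - M_0^n)^* \overset{P}{\to} 0$.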
Proof: By Lemma 25.11 we note that $\langle M, N\rangle$ is the a.s. unique predictable
process of locally integrable variation and starting at 0 such that $MN -
\langle M, N\rangle$ is a local martingale. The symmetry and bilinearity in (ii) follow
immediately, as does property (i), since $MN_0$, $M_0N$, and $M_0N_0$ are all
local martingales. Property (iii) is proved in the same way as Proposition
17.9, and (iv) is obtained as in Theorem 17.5.
To prove (v), we may assume that $M_0^n = 0$ for all $n$. Let $\langle M^n\rangle_\infty \overset{P}{\to} 0$. Fix
any $\varepsilon > 0$, and define $\tau_n = \inf\{t;\ \langle M^n\rangle_t \ge \varepsilon\}$. Since $\langle M^n\rangle$ is predictable,
even $\tau_n$ is predictable by Theorem 25.14 and is therefore announced by
some sequence $\tau_{nk} \uparrow \tau_n$. The latter may be chosen such that $M^n$ is an
$L^2$-martingale and $(M^n)^2 - \langle M^n\rangle$ a uniformly integrable martingale on $[0, \tau_{nk}]$
for every $k$. By Proposition 7.16
$$E(M^n)^{*2}_{\tau_{nk}} \lesssim E(M^n)^2_{\tau_{nk}} = E\langle M^n\rangle_{\tau_{nk}} \le \varepsilon,$$
and as $k \to \infty$, we get $E(M^n)^{*2}_{\tau_n-} \lesssim \varepsilon$. Now fix any $\delta > 0$, and write
$$P\{(M^n)^{*2} > \delta\} \le P\{\tau_n < \infty\} + \delta^{-1}E(M^n)^{*2}_{\tau_n-}
\lesssim P\{\langle M^n\rangle_\infty \ge \varepsilon\} + \delta^{-1}\varepsilon.$$
Here the right-hand side tends to zero as $n \to \infty$ and then $\varepsilon \to 0$. □
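The defining property of the predictable covariation, and the bound in Proposition 26.1 (iii), can be made concrete in a discrete-time analogue, where the compensator of $M^2$ is the sum of conditional variances of the increments. The Python sketch below is an illustration under assumptions not taken from the text: $N$ is a simple symmetric random walk, $M = V \cdot N$ with the hypothetical predictable weight $V_k = 1 + (\text{number of up-steps before time } k)$, and all names are invented for the example. It enumerates every path and checks exactly that $M^2 - \langle M\rangle$ and $MN - \langle M, N\rangle$ have mean zero.

```python
from fractions import Fraction
from itertools import product

def path_quantities(signs):
    """Return (M_n, N_n, <M>_n, <N>_n, <M,N>_n) along one path of fair +-1 signs.

    N is the simple random walk and M = V . N, where the predictable weight
    V_k = 1 + (number of up-steps before time k) depends only on the past.
    """
    m = n = 0
    qv_m = qv_n = cov = 0
    ups = 0
    for s in signs:
        v = 1 + ups            # known before the k-th step is revealed
        m += v * s
        n += s
        qv_m += v * v          # E[(dM)^2 | past] = v^2
        qv_n += 1              # E[(dN)^2 | past] = 1
        cov += v               # E[dM dN | past] = v
        ups += (s == 1)
    return m, n, qv_m, qv_n, cov

def expectation(func, steps):
    """Exact expectation of func(path quantities) over all 2**steps paths."""
    weight = Fraction(1, 2 ** steps)
    return sum(weight * func(*path_quantities(p))
               for p in product((1, -1), repeat=steps))

steps = 6
print(expectation(lambda m, n, qm, qn, c: m * m - qm, steps))   # E[M^2 - <M>]
print(expectation(lambda m, n, qm, qn, c: m * n - c, steps))    # E[MN - <M,N>]
```

In this example the bound (iii) reduces pathwise to the Cauchy-Schwarz inequality $(\sum_k V_k)^2 \le n \sum_k V_k^2$.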
We may use the predictable quadratic variation to extend the Itô integral
from Chapter 17. As before, let $\mathcal{E}$ denote the class of bounded, predictable
step processes $V$ with jumps at finitely many fixed times. We refer to the
corresponding integral $V \cdot X$ as the elementary predictable integral.
Given any $M \in \mathcal{M}^2_{\mathrm{loc}}$, let $L^2(M)$ be the class of predictable processes $V$
such that $(V^2 \cdot \langle M\rangle)_t < \infty$ a.s. for every $t > 0$. We first consider integrals
$V \cdot M$ with $M \in \mathcal{M}^2_{\mathrm{loc}}$ and $V \in L^2(M)$. Here the integral process belongs
to $\mathcal{M}^2_{0,\mathrm{loc}}$, the class of local $L^2$-martingales starting at 0. In the following
statement, it is understood that $M, N \in \mathcal{M}^2_{\mathrm{loc}}$ and that $U$ and $V$ are
predictable processes such that the stated integrals exist.
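Theorem 26.2 ($L^2$-integral, Courrège, Kunita and Watanabe) The
elementary predictable integral extends a.s. uniquely to a bilinear map of any
$M \in \mathcal{M}^2_{\mathrm{loc}}$ and $V \in L^2(M)$ into $V \cdot M \in \mathcal{M}^2_{0,\mathrm{loc}}$, such that $(V_n^2 \cdot \langle M_n\rangle)_t
\overset{P}{\to} 0$ implies $(V_n \cdot M_n)_t^* \overset{P}{\to} 0$ for every $t > 0$. Furthermore,
(i) $\langle V \cdot M, N\rangle = V \cdot \langle M, N\rangle$ a.s. for all $N \in \mathcal{M}^2_{\mathrm{loc}}$;
(ii) $U \cdot (V \cdot M) = (UV) \cdot M$ a.s.;
(iii) $\Delta(V \cdot M) = V\Delta M$ a.s.;
(iv) $(V \cdot M)^\tau = V \cdot M^\tau = (V1_{[0,\tau]}) \cdot M$ a.s. for any optional time $\tau$;
where property (i) characterizes the integral.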
The proof depends on an elementary approximation property, corre-
sponding to Lemma 17.23 in the continuous case.
Lemma 26.3 (approximation) Let $V$ be a predictable process with $|V|^p \in
L(A)$, where $A$ is increasing and $p \ge 1$. Then there exist some $V_1, V_2, \dots \in \mathcal{E}$
with $(|V_n - V|^p \cdot A)_t \to 0$ a.s. for all $t > 0$.
Proof: It is enough to establish the approximation $(|V_n - V|^p \cdot A)_t \overset{P}{\to}
0$. By Minkowski's inequality we may then approximate in steps, and by
dominated convergence we may first reduce to the case when $V$ is simple.
Each term may then be approximated separately, and so we may next
assume that $V = 1_B$ for some predictable set $B$. Approximating separately
on disjoint intervals, we may finally reduce to the case when $B \subset \Omega \times [0,t]$
for some $t > 0$. The desired approximation is then obtained from Lemma
25.1 by a monotone class argument. □
Proof of Theorem 26.2: As in Theorem 17.11, we may construct the
integral $V \cdot M$ as the a.s. unique element of $\mathcal{M}^2_{0,\mathrm{loc}}$ satisfying (i). The
mapping $(V, M) \mapsto V \cdot M$ is clearly bilinear, and by the analogue of Lemma
17.10 it extends the elementary predictable integral. Properties (ii) and (iv)
may be obtained in the same way as in Propositions 17.14 and 17.15. The
stated continuity property follows immediately from (i) and Proposition
26.1 (v). To get the stated uniqueness, it is then enough to apply Lemma
26.3 with $A = \langle M\rangle$ and $p = 2$.
To prove (iii), we note from Lemma 26.3 with $A_t = \langle M\rangle_t + \sum_{s \le t}(\Delta M_s)^2$
that there exist some processes $V_n \in \mathcal{E}$ satisfying $V_n\Delta M \to V\Delta M$ and
$(V_n \cdot M - V \cdot M)^* \to 0$ a.s. In particular, $\Delta(V_n \cdot M) \to \Delta(V \cdot M)$ a.s., and
so (iii) follows from the corresponding relation for the elementary integrals
$V_n \cdot M$. The argument relies on the fact that $\sum_{s \le t}(\Delta M_s)^2 < \infty$ a.s. To
verify this, we may assume that $M \in \mathcal{M}_0^2$ and define $t_{n,k} = kt2^{-n}$ for
$k \le 2^n$. By Fatou's lemma
$$E\sum_{s \le t}(\Delta M_s)^2 \le E\liminf_{n\to\infty} \sum_k (M_{t_{n,k}} - M_{t_{n,k-1}})^2
\le \liminf_{n\to\infty} E\sum_k (M_{t_{n,k}} - M_{t_{n,k-1}})^2 = EM_t^2 < \infty.$$
□
A semimartingale is defined as a right-continuous, adapted process $X$
admitting a decomposition $M + A$, where $M$ is a local martingale and $A$ is
a process of locally finite variation starting at 0. If the variation of $A$ is even
locally integrable, we can write $X = (M + A - \hat A) + \hat A$, where $\hat A$ denotes the
compensator of $A$. Hence, in this case we can choose $A$ to be predictable.
The decomposition is then a.s. unique by Propositions 17.2 and 25.16, and
$X$ is called a special semimartingale with canonical decomposition $M + A$.
Lévy processes are the basic examples of semimartingales. In particular,
we note that a Lévy process is a special semimartingale iff its Lévy measure
$\nu$ satisfies $\int (x^2 \wedge |x|)\,\nu(dx) < \infty$. From Theorem 25.5 it is further seen that
any local submartingale is a special semimartingale.
The next result extends the stochastic integration to general semimartin-
gales. At this stage we consider only locally bounded integrands, which
covers most applications of interest.
Theorem 26.4 (semimartingale integral, Doléans-Dade and Meyer) The
$L^2$-integral of Theorem 26.2 and the Lebesgue-Stieltjes integral extend a.s.
uniquely to a bilinear map of any semimartingale $X$ and locally bounded,
predictable process $V$ into a semimartingale $V \cdot X$. This integral satisfies
conditions (ii)-(iv) of Theorem 26.2 and is such that, if $V \ge |V_n| \to 0$ for
some locally bounded, predictable processes $V, V_1, V_2, \dots$, then $(V_n \cdot X)_t^* \overset{P}{\to} 0$
for all $t > 0$. Finally, $V \cdot X$ is a local martingale whenever this holds for $X$.
Our proof relies on the following basic decomposition.
Lemma 26.5 (truncation, Doléans-Dade, Jacod and Mémin, Yan) Any
local martingale $M$ has a decomposition into local martingales $M'$ and $M''$,
where $M'$ has locally integrable variation and $|\Delta M''| \le 1$ a.s.
Proof: Define
At ==" Msl{IMsl > }, t > O.
s::;t
By optional sampling, we note that A has locally integrable variation. Let
A denote the compensator of A, and put M' = A - A and Mil == M - MI.
Then M' and Mil are again local martingales, and M' has locally integrable
variation. Furthermore,
ILlM"I < ILlM - AI + IAI < + IA"
and so it suffices to show that IAI < . Since the constructions of A
and A commute with opt,ional stopping, we may then assume that M and
26. Semimartingales and General Stochastic Integration 519
M' are uniformly integrable. Now A is predictable, so the times T = n J\
inf{t; IAI > !} are predictable by Theorem 25.14, and it is enough to
show that IATI < ! a.s. Clearly, E[MTIFT-] = E[MIFT-] = 0 a.s.,
and so by Lemma 25.3
IATI
IE[ArIFr-]1 = IE[Mr; IMrl > !IFr-]1
IE[LlM T ; ILlM.,.1 < IFr-]1 < .
o
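The truncation in Lemma 26.5 can be illustrated in a one-step, discrete setting. The sketch below is only an illustration under stated assumptions: the jump law `law` and the helper `truncate` are hypothetical, and the "compensator" reduces to a plain expectation because there is a single step with no prior information. It splits a mean-zero jump into the compensated large-jump part (the jump of $M'$) and the remainder (the jump of $M''$), and confirms that the remainder is bounded by 1.

```python
from fractions import Fraction as Fr

def truncate(law, level=Fr(1, 2)):
    """Split a mean-zero one-step jump law into jumps of M' and M''.

    law maps each jump value d to its probability (hypothetical example data).
    Returns the compensator increment E[d; |d| > level] and, per outcome,
    the triple (jump of M', jump of M'', probability).
    """
    comp = sum(p * d for d, p in law.items() if abs(d) > level)
    parts = {}
    for d, p in law.items():
        large = d if abs(d) > level else Fr(0)
        m1 = large - comp      # jump of M' = A - (compensator of A)
        m2 = d - m1            # jump of M'' = M - M'
        parts[d] = (m1, m2, p)
    return comp, parts

# Mean-zero law: a frequent small jump and a rare large one.
law = {Fr(-1, 4): Fr(12, 13), Fr(3): Fr(1, 13)}
comp, parts = truncate(law)
for d, (m1, m2, p) in parts.items():
    print(f"jump {d}: M' jump {m1}, M'' jump {m2}, probability {p}")
```

Both parts have mean zero, and the compensator increment is at most 1/2 in absolute value because it equals minus the mean of the small jumps, which is the step used at the end of the proof.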
Proof of Theorem 26.4: By Lemma 26.5 we may write $X = M + A$, where
$M$ is a local martingale with bounded jumps, hence a local $L^2$-martingale,
and $A$ has locally finite variation. For any locally bounded, predictable
process $V$ we may then define $V \cdot X = V \cdot M + V \cdot A$, where the first term is
the integral in Theorem 26.2, and the second term is an ordinary Lebesgue-
Stieltjes integral. If $V \ge |V_n| \to 0$, then $(V_n^2 \cdot \langle M \rangle)_t \to 0$ and $(V_n \cdot A)^*_t \to 0$
by dominated convergence, and so Theorem 26.2 yields $(V_n \cdot X)^*_t \xrightarrow{P} 0$ for
all $t \ge 0$.
To prove the uniqueness, it suffices to prove that if $M = A$ is a local $L^2$-
martingale of locally finite variation, then $V \cdot M = V \cdot A$ a.s. for every locally
bounded, predictable process $V$, where $V \cdot M$ is the integral in Theorem
26.2 and $V \cdot A$ is an elementary Stieltjes integral. The two integrals clearly
agree when $V \in \mathcal{E}$. For general $V$, we may approximate as in Lemma 26.3
by processes $V_n \in \mathcal{E}$ such that $((V_n - V)^2 \cdot \langle M \rangle)^* \to 0$ and $(|V_n - V| \cdot A)^* \to 0$
a.s. But then $(V_n \cdot M)_t \xrightarrow{P} (V \cdot M)_t$ and $(V_n \cdot A)_t \to (V \cdot A)_t$ for every $t \ge 0$,
and the desired equality follows.
To prove the last assertion, we may reduce by means of Lemma 26.5 and
a suitable localization to the case when $V$ is bounded and $X$ has integrable
variation $A$. By Lemma 26.3 we may next choose some uniformly bounded
processes $V_1, V_2, \dots \in \mathcal{E}$ such that $(|V_n - V| \cdot A)_t \to 0$ a.s. for every $t \ge 0$.
Then $(V_n \cdot X)_t \to (V \cdot X)_t$ a.s. for all $t$, and by dominated convergence this
remains true in $L^1$. Thus, the martingale property of $V_n \cdot X$ carries over to
$V \cdot X$. □
For any semimartingales $X$ and $Y$, the left-continuous versions $X_- =
(X_{t-})$ and $Y_- = (Y_{t-})$ are locally bounded and predictable, and so they can
serve as integrands in the general stochastic integral. We may then define
the quadratic variation $[X]$ and covariation $[X, Y]$ by the integration-by-
parts formulas
$$[X] = X^2 - X_0^2 - 2X_- \cdot X,$$
$$[X, Y] = XY - X_0 Y_0 - X_- \cdot Y - Y_- \cdot X = ([X + Y] - [X - Y])/4. \qquad (1)$$
In particular, $[X] = [X, X]$. Here we list some further basic properties of
the covariation.
Theorem 26.6 (covariation) For any semimartingales $X$ and $Y$,
(i) $[X, Y] = [X - X_0, Y - Y_0]$ a.s.;
(ii) $[X]$ is a.s. nondecreasing, and $[X, Y]$ is a.s. symmetric and bilinear;
(iii) $|[X, Y]| \le \int |d[X, Y]| \le [X]^{1/2} [Y]^{1/2}$ a.s.;
(iv) $\Delta[X] = (\Delta X)^2$ and $\Delta[X, Y] = \Delta X\, \Delta Y$ a.s.;
(v) $[V \cdot X, Y] = V \cdot [X, Y]$ a.s. for any locally bounded, predictable $V$;
(vi) $[X^\tau, Y] = [X^\tau, Y^\tau] = [X, Y]^\tau$ a.s. for any optional time $\tau$;
(vii) if $M, N \in \mathcal{M}^2_{\mathrm{loc}}$, then $[M, N]$ has compensator $\langle M, N \rangle$;
(viii) if $A$ has locally finite variation, then $[X, A]_t = \sum_{s \le t} \Delta X_s \Delta A_s$ a.s.
Proof: The symmetry and bilinearity of $[X, Y]$ are obvious from (1), and
to get (i) it remains to check that $[X, Y_0] = 0$.
(ii) We may extend Proposition 17.17 with the same proof to general
semimartingales. In particular, $[X]_s \le [X]_t$ a.s. for any $s \le t$. By right-
continuity the exceptional null set can be chosen to be independent of $s$
and $t$, which means that $[X]$ is a.s. nondecreasing. Relation (iii) may now
be proved as in Proposition 17.9.
(iv) By (1) and Theorem 26.2 (iii),
$$\Delta[X, Y]_t = \Delta(XY)_t - \Delta(X_- \cdot Y)_t - \Delta(Y_- \cdot X)_t
= X_t Y_t - X_{t-} Y_{t-} - X_{t-} \Delta Y_t - Y_{t-} \Delta X_t = \Delta X_t \Delta Y_t.$$
(v) For $V \in \mathcal{E}$ the relation follows most easily from the extended version
of Proposition 17.17. Also note that both sides are a.s. linear in $V$. Now let
$V, V_1, V_2, \dots$ be locally bounded and predictable with $V \ge |V_n| \to 0$. Then
$V_n \cdot [X, Y] \to 0$ by dominated convergence, and by Theorem 26.4 we have
$$[V_n \cdot X, Y] = (V_n \cdot X) Y - (V_n \cdot X)_- \cdot Y - (V_n Y_-) \cdot X \xrightarrow{P} 0.$$
Using a monotone class argument, we may now extend the relation to
arbitrary $V$.
(vi) This follows from (v) with $V = 1_{[0,\tau]}$.
(vii) Since $M_- \cdot N$ and $N_- \cdot M$ are local martingales, the assertion follows
from (1) and the definition of $\langle M, N \rangle$.
(viii) For step processes $A$ the stated relation follows from the extended
version of Proposition 17.17. Now assume instead that $\Delta A \le \varepsilon$, and con-
clude from the same result and property (iii) together with the ordinary
Cauchy-Buniakovsky inequality that
$$[X, A]_t^2 \vee \Bigl|\sum_{s \le t} \Delta X_s \Delta A_s\Bigr|^2 \le [X]_t [A]_t \le \varepsilon\, [X]_t \int_0^t |dA_s|.$$
The assertion now follows by a simple approximation.
□
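For piecewise constant paths, the integration-by-parts definition (1) can be checked by hand or by a few lines of code. The following is a minimal sketch under the assumption that $X$ and $Y$ are step paths given by their successive values, so that $X_- \cdot Y$ reduces to a finite sum; the helper names `integral` and `covariation` are illustrative only. It confirms that (1) then yields the sum of products of jumps, as in Theorem 26.6 (viii), and that the polarization identity holds.

```python
def integral(v, x):
    """Sum of v[k-1] * (x[k] - x[k-1]): the integral V_- . X for step paths."""
    return sum(v[k - 1] * (x[k] - x[k - 1]) for k in range(1, len(x)))

def covariation(x, y):
    """[X, Y] at the final time, computed from formula (1)."""
    return x[-1] * y[-1] - x[0] * y[0] - integral(x, y) - integral(y, x)

def jump_product_sum(x, y):
    """Sum over all jumps of (jump of X) * (jump of Y)."""
    return sum((x[k] - x[k - 1]) * (y[k] - y[k - 1]) for k in range(1, len(x)))

# Successive values of two step paths (hypothetical data).
x = [1, 3, 2, 5]
y = [0, -1, 4, 4]
plus = [a + b for a, b in zip(x, y)]
minus = [a - b for a, b in zip(x, y)]

print(covariation(x, y), jump_product_sum(x, y))
print((covariation(plus, plus) - covariation(minus, minus)) / 4)
```

Integer inputs keep the arithmetic exact, so the two expressions agree exactly rather than up to rounding.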
We may now extend the Itô formula of Theorem 17.18 to a substitution
rule for general semimartingales. By a semimartingale in $\mathbb{R}^d$ we mean a pro-
cess $X = (X^1, \dots, X^d)$ such that each component $X^i$ is a one-dimensional
semimartingale. Let $[X^i, X^j]^c$ denote the continuous components of the
finite-variation processes $[X^i, X^j]$, and write $f'_i$ and $f''_{ij}$ for the first- and
second-order partial derivatives of $f$, respectively. Summation over repeated
indices is understood as before.
Theorem 26.7 (substitution rule, Kunita and Watanabe) For any semi-
martingale $X = (X^1, \dots, X^d)$ in $\mathbb{R}^d$ and function $f \in C^2(\mathbb{R}^d)$, we have
$$f(X_t) = f(X_0) + \int_0^t f'_i(X_{s-})\, dX^i_s + \tfrac12 \int_0^t f''_{ij}(X_{s-})\, d[X^i, X^j]^c_s
+ \sum_{s \le t} \bigl\{ \Delta f(X_s) - f'_i(X_{s-}) \Delta X^i_s \bigr\}. \qquad (2)$$
Proof: Assuming that (2) holds for some function $f \in C^2(\mathbb{R}^d)$, we shall
prove for any $k \in \{1, \dots, d\}$ that (2) remains true for $g(x) = x_k f(x)$. Then
note that by (1)
$$g(X) = g(X_0) + X^k_- \cdot f(X) + f(X_-) \cdot X^k + [X^k, f(X)]. \qquad (3)$$
Writing $\hat f(x, y) = f(x) - f(y) - f'_i(y)(x_i - y_i)$, we get by (2) and property
(ii) of Theorem 26.2
$$X^k_- \cdot f(X) = X^k_- f'_i(X_-) \cdot X^i + \tfrac12 X^k_- f''_{ij}(X_-) \cdot [X^i, X^j]^c
+ \sum_s X^k_{s-} \hat f(X_s, X_{s-}). \qquad (4)$$
Next we note that, by properties (ii), (iv), (v), and (viii) of Theorem 26.6,
$$[X^k, f(X)] = f'_i(X_-) \cdot [X^k, X^i] + \sum_s \Delta X^k_s\, \hat f(X_s, X_{s-})
= f'_i(X_-) \cdot [X^k, X^i]^c + \sum_s \Delta X^k_s\, \Delta f(X_s). \qquad (5)$$
Inserting (4) and (5) into (3), and using the elementary formulas
$$g'_i(x) = \delta_{ik} f(x) + x_k f'_i(x),$$
$$g''_{ij}(x) = \delta_{ik} f'_j(x) + \delta_{jk} f'_i(x) + x_k f''_{ij}(x),$$
$$\hat g(x, y) = (x_k - y_k)(f(x) - f(y)) + y_k \hat f(x, y),$$
we obtain after some simplification the desired expression for $g(X)$.
Equation (2) is trivially true for constant functions, and it extends
by induction and linearity to arbitrary polynomials. Now any function
f E C 2 (JRd) may be approximated by polynomials, in such a way that
all derivatives up to the second order tend uniformly to those of f on every
compact set. To prove (2) for f, it is then enough to show that the right-
hand side tends to zero in probability, as f and its first- and second-order
derivatives tend to zero, uniformly on compact sets.
For the two integrals in (2), this is clear by the dominated convergence
property of Theorem 26.4, and it remains to consider the last term. Writing
$B_t = \{x \in \mathbb{R}^d;\ |x| \le X^*_t\}$ and $\|g\|_B = \sup_B |g|$, we get by Taylor's formula
in $\mathbb{R}^d$
$$\sum_{s \le t} |\hat f(X_s, X_{s-})| \lesssim \sum_{i,j} \|f''_{ij}\|_{B_t} \sum_{s \le t} |\Delta X_s|^2
\le \sum_{i,j} \|f''_{ij}\|_{B_t} \sum_i [X^i]_t \to 0.$$
The same estimate shows that the last term has locally finite varia-
tion. □
To illustrate the use of the general substitution rule, we consider a partial
extension of Proposition 21.2 to general semimartingales.
Theorem 26.8 (Doléans' exponential) For any semimartingale $X$ with
$X_0 = 0$, the equation $Z = 1 + Z_- \cdot X$ has the a.s. unique solution
$$Z_t = \mathcal{E}(X)_t \equiv \exp\bigl(X_t - \tfrac12 [X]^c_t\bigr) \prod_{s \le t} (1 + \Delta X_s) e^{-\Delta X_s}, \quad t \ge 0. \qquad (6)$$
Note that the infinite product in (6) is a.s. absolutely convergent, since
$\sum_{s \le t} (\Delta X_s)^2 \le [X]_t < \infty$. However, we may have $\Delta X_s = -1$ for some
$s > 0$, in which case $Z_t = 0$ for $t \ge s$. The process $\mathcal{E}(X)$ in (6) is called the
Doléans exponential of $X$. When $X$ is continuous, we get $\mathcal{E}(X) = \exp(X -
\tfrac12 [X])$, in agreement with the notation of Lemma 18.21. For processes $A$ of
locally finite variation, formula (6) simplifies to
$$\mathcal{E}(A)_t = \exp(A^c_t) \prod_{s \le t} (1 + \Delta A_s), \quad t \ge 0.$$
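For a pure-jump path $A$ with finitely many jumps (so that $A^c = 0$), the simplified formula is a finite product, and the equation $Z = 1 + Z_- \cdot A$ becomes a recursion that can be checked exactly. The sketch below assumes such a path, given by its list of jumps; the function names are illustrative. It also compares formula (6) with the plain product in floating point, and shows the absorption at 0 after a jump of size $-1$.

```python
import math
from fractions import Fraction as Fr

def doleans_product(jumps):
    """Values of prod_{j<=k} (1 + dA_j), k = 0..n, for a pure-jump path A."""
    z = [Fr(1)]
    for da in jumps:
        z.append(z[-1] * (1 + da))
    return z

def solves_equation(z, jumps):
    """Check Z_k = 1 + sum_{j<=k} Z_{j-1} dA_j for every k."""
    acc = Fr(1)
    for k, da in enumerate(jumps, start=1):
        acc += z[k - 1] * da
        if acc != z[k]:
            return False
    return True

def formula_six(jumps):
    """Formula (6) for a pure-jump path: exp(X_t) * prod (1 + dX) exp(-dX)."""
    total = sum(jumps)
    prod = math.prod((1 + d) * math.exp(-d) for d in jumps)
    return math.exp(total) * prod

jumps = [Fr(1, 2), Fr(-1, 3), Fr(-1), Fr(2)]   # hypothetical jump sizes
z = doleans_product(jumps)
print([str(v) for v in z], solves_equation(z, jumps))
print(formula_six([0.5, -1 / 3, 2.0]))
```

Exact rationals are used for the recursion so that equality can be tested without tolerance; only the comparison with (6), which involves exponentials, is done in floating point.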
Proof of Theorem 26.8: To check that (6) is a solution, we may write
$Z = f(Y, V)$, where $Y = X - \tfrac12 [X]^c$, $V = \prod (1 + \Delta X) e^{-\Delta X}$, and $f(y, v) =
e^y v$. By Theorem 26.7 we get
$$Z - 1 = Z_- \cdot Y + e^{Y_-} \cdot V + \tfrac12 Z_- \cdot [X]^c
+ \sum \bigl\{ \Delta Z - Z_- \Delta X - e^{Y_-} \Delta V \bigr\}. \qquad (7)$$
Now $e^{Y_-} \cdot V = \sum e^{Y_-} \Delta V$ since $V$ is of pure-jump type, and furthermore
$\Delta Z = Z_- \Delta X$. Hence, the right-hand side of (7) simplifies to $Z_- \cdot X$, as
desired.
To prove the uniqueness, let $Z$ be an arbitrary solution, and put $V =
Z e^{-Y}$, where $Y = X - \tfrac12 [X]^c$ as before. By Theorem 26.7 we get
$$V - 1 = e^{-Y_-} \cdot Z - V_- \cdot Y + \tfrac12 V_- \cdot [X]^c - e^{-Y_-} \cdot [X, Z]^c
+ \sum \bigl\{ \Delta V + V_- \Delta Y - e^{-Y_-} \Delta Z \bigr\}$$
$$= V_- \cdot X - V_- \cdot X + \tfrac12 V_- \cdot [X]^c + \tfrac12 V_- \cdot [X]^c - V_- \cdot [X]^c
+ \sum \bigl\{ \Delta V + V_- \Delta X - V_- \Delta X \bigr\}$$
$$= \sum \Delta V.$$
Thus, $V$ is a purely discontinuous process of locally finite variation. We
may further compute
$$\Delta V = Z e^{-Y} - Z_- e^{-Y_-} = (Z_- + \Delta Z) e^{-Y_- - \Delta Y} - Z_- e^{-Y_-}
= V_- \bigl\{ (1 + \Delta X) e^{-\Delta X} - 1 \bigr\},$$
which shows that $V = 1 + V_- \cdot A$ with $A = \sum \{(1 + \Delta X) e^{-\Delta X} - 1\}$.
It remains to show that the homogeneous equation $V = V_- \cdot A$ has the
unique solution $V = 0$. Then define $R_t = \int_{(0,t]} |dA|$, and conclude from
Theorem 26.7 and the convexity of the function $x \mapsto x^n$ that
$$R^n = n R^{n-1}_- \cdot R + \sum \bigl( \Delta R^n - n R^{n-1}_- \Delta R \bigr) \ge n R^{n-1}_- \cdot R. \qquad (8)$$
We may now prove by induction that
$$V^*_t \le V^*_t R^n_t / n!, \quad t \ge 0,\ n \in \mathbb{Z}_+. \qquad (9)$$
This is obvious for $n = 0$, and assuming (9) to be true for $n - 1$, we get
by (8)
$$V^*_t = (V_- \cdot A)^*_t \le \frac{V^*_t (R^{n-1}_- \cdot R)_t}{(n - 1)!} \le \frac{V^*_t R^n_t}{n!},$$
as required. Since $R^n_t / n! \to 0$ as $n \to \infty$, relation (9) yields $V^*_t = 0$ for all
$t > 0$. □
The equation $Z = 1 + Z_- \cdot X$ arises naturally in connection with changes
of probability measure. The following result extends Proposition 18.20 to
general local martingales.
Theorem 26.9 (change of measure, van Schuppen and Wong) Let $Q =
Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$, and consider a local $P$-martingale $M$ such that the
process $[M, Z]$ has locally integrable variation and $P$-compensator $\langle M, Z \rangle$.
Then $\tilde M = M - Z_-^{-1} \cdot \langle M, Z \rangle$ is a local $Q$-martingale.
A lemma will be needed for the proof.
Lemma 26.10 (integration by parts) If $X$ is a semimartingale and $A$ is
a predictable process of locally finite variation, then $AX = A \cdot X + X_- \cdot A$
a.s.
Proof: We need to show that $\Delta A \cdot X = [A, X]$ a.s., which by Theorem
26.6 (viii) is equivalent to
$$\int_{(0,t]} \Delta A_s\, dX_s = \sum_{s \le t} \Delta A_s \Delta X_s, \quad t \ge 0.$$
Noting that the series on the right is absolutely convergent by the Cauchy-
Buniakovsky inequality, we may reduce, by dominated convergence on each
side, to the case when $A$ is constant apart from finitely many jumps. Using
Lemma 25.3 and Theorem 25.14, we may next proceed to the case when $A$
has at most one jump, occurring at some predictable time $\tau$. Introducing
an announcing sequence $(\tau_n)$ and writing $Y = \Delta A \cdot X$, we get by property
(iv) of Theorem 26.2
$$Y_{\tau_n \wedge t} = 0 = Y_t - Y_{t \wedge \tau} \quad \text{a.s.}, \quad t \ge 0,\ n \in \mathbb{N}.$$
Thus, even $Y$ is constant apart from a possible jump at $\tau$. Finally, property
(iii) of Theorem 26.2 yields $\Delta Y_\tau = \Delta A_\tau \Delta X_\tau$ a.s. on $\{\tau < \infty\}$. □
Proof of Theorem 26.9: For each $n \in \mathbb{N}$, let $\tau_n = \inf\{t;\ Z_t < 1/n\}$, and
note that $\tau_n \to \infty$ a.s. $Q$ by Lemma 18.17. Hence, $\tilde M$ is well defined under
$Q$, and it suffices as in Lemma 18.15 to show that $(\tilde M Z)^{\tau_n}$ is a local $P$-
martingale for every $n$. Writing $\overset{m}{\sim}$ for equality up to a local $P$-martingale,
we may conclude from Lemma 26.10 with $X = Z$ and $A = Z_-^{-1} \cdot \langle M, Z \rangle$
that, on every interval $[0, \tau_n]$,
$$MZ \overset{m}{\sim} [M, Z] \overset{m}{\sim} \langle M, Z \rangle = Z_- \cdot A \overset{m}{\sim} AZ.$$
Thus, we get $\tilde M Z = (M - A) Z \overset{m}{\sim} 0$, as required.
□
Using the last theorem, we may easily show that the class of semimartin-
gales is invariant under absolutely continuous changes of the probability
measure. A special case of this result was previously obtained as part of
Proposition 18.20.
Corollary 26.11 (preservation law, Jacod) If $Q \ll P$ on $\mathcal{F}_t$ for all $t \ge 0$,
then every $P$-semimartingale is also a $Q$-semimartingale.
Proof: Assume that $Q = Z_t \cdot P$ on $\mathcal{F}_t$ for all $t \ge 0$. We need to show
that every local $P$-martingale $M$ is a $Q$-semimartingale. By Lemma 26.5 we
may then assume $\Delta M$ to be bounded, so that $[M]$ is locally bounded. By
Theorem 26.9 it suffices to show that $[M, Z]$ has locally integrable variation,
and by Theorem 26.6 (iii) it is then enough to prove that $[Z]^{1/2}$ is locally
integrable. Now Theorem 26.6 (iv) yields
$$[Z]_t^{1/2} \le [Z]_{t-}^{1/2} + |\Delta Z_t| \le [Z]_{t-}^{1/2} + Z^*_{t-} + |Z_t|, \quad t \ge 0,$$
and so the desired integrability follows by optional sampling.
□
Our next aim is to extend the BDG inequalities of Theorem 17.7 to
general local martingales. Such an extension turns out to be possible only
for exponents $p \ge 1$.
Theorem 26.12 (norm inequalities, Burkholder, Davis, Gundy) There
exist some constants $c_p \in (0, \infty)$, $p \ge 1$, such that for any local martingale
$M$ with $M_0 = 0$,
$$c_p^{-1} E[M]_\infty^{p/2} \le E M^{*p} \le c_p E[M]_\infty^{p/2}, \quad p \ge 1. \qquad (10)$$
As in Corollary 17.8, it follows in particular that $M$ is a uniformly
integrable martingale whenever $E[M]_\infty^{1/2} < \infty$.
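Proof for $p = 1$ (Davis): To exploit the symmetry of the argument, we
write $M^\flat$ and $M^\sharp$ for the processes $M^*$ and $[M]^{1/2}$, taken in either order.
Put $J = \Delta M$, and define
$$A_t = \sum_{s \le t} J_s 1\{|J_s| > 2 J^*_{s-}\}, \quad t \ge 0.$$
Since $|\Delta A| \le 2 \Delta J^*$, we have
$$\int_0^\infty |dA_s| = \sum_s |\Delta A_s| \le 2 J^* \le 4 M^\sharp_\infty.$$
Writing $\hat A$ for the compensator of $A$ and putting $D = A - \hat A$, we get
$$E D^\flat_\infty \vee E D^\sharp_\infty \le E \int_0^\infty |dD_s| \lesssim E \int_0^\infty |dA_s| \lesssim E M^\sharp_\infty. \qquad (11)$$
To get a similar estimate for $N = M - D$, we introduce the optional
times
$$\tau_r = \inf\{t;\ N^\sharp_t \vee J^*_t > r\}, \quad r > 0,$$
and note that
$$P\{N^\flat_\infty > r\} \le P\{\tau_r < \infty\} + P\{\tau_r = \infty,\ N^\flat_\infty > r\}
\le P\{N^\sharp_\infty > r\} + P\{J^* > r\} + P\{N^\flat_{\tau_r} > r\}. \qquad (12)$$
Arguing as in the proof of Lemma 26.5, we get $|\Delta N| \le 4 J^*_-$, and so
$$N^\sharp_{\tau_r} \le N^\sharp_\infty \wedge (N^\sharp_{\tau_r -} + 4 J^*_{\tau_r -}) \le N^\sharp_\infty \wedge 5r.$$
Since $N^2 - [N]$ is a local martingale, we get by Chebyshev's inequality or
Proposition 7.15, respectively,
$$r^2 P\{N^\flat_{\tau_r} > r\} \lesssim E (N^\sharp_{\tau_r})^2 \lesssim E (N^\sharp_\infty \wedge r)^2.$$
Hence, by Fubini's theorem and some calculus,
$$\int_0^\infty P\{N^\flat_{\tau_r} > r\}\, dr \lesssim \int_0^\infty E (N^\sharp_\infty \wedge r)^2 r^{-2}\, dr \lesssim E N^\sharp_\infty.$$
Combining this with (11)-(12) and using Lemma 3.4, we get
$$E N^\flat_\infty = \int_0^\infty P\{N^\flat_\infty > r\}\, dr
\le \int_0^\infty \bigl( P\{N^\sharp_\infty > r\} + P\{J^* > r\} + P\{N^\flat_{\tau_r} > r\} \bigr)\, dr
\lesssim E N^\sharp_\infty + E J^* \lesssim E M^\sharp_\infty.$$
It remains to note that $E M^\flat_\infty \le E D^\flat_\infty + E N^\flat_\infty$.
□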
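Extension to $p > 1$ (Garsia): For any $t \ge 0$ and $B \in \mathcal{F}_t$, we may apply
(10) with $p = 1$ to the local martingale $1_B (M - M^t)$ to get a.s.
$$c_1^{-1} E\bigl[ [M - M^t]_\infty^{1/2} \,\big|\, \mathcal{F}_t \bigr] \le E\bigl[ (M - M^t)^*_\infty \,\big|\, \mathcal{F}_t \bigr]
\le c_1 E\bigl[ [M - M^t]_\infty^{1/2} \,\big|\, \mathcal{F}_t \bigr].$$
Since
$$[M]_\infty^{1/2} - [M]_t^{1/2} \le [M - M^t]_\infty^{1/2} \le [M]_\infty^{1/2},$$
$$M^*_\infty - M^*_t \le (M - M^t)^*_\infty \le 2 M^*_\infty,$$
the relation $E[A_\infty - A_t | \mathcal{F}_t] \lesssim E[\zeta | \mathcal{F}_t]$ occurring in Proposition 25.21 holds
with $A_t = [M]_t^{1/2}$ and $\zeta = M^*$, and also with $A_t = M^*_t$ and $\zeta = [M]_\infty^{1/2}$.
Since
$$\Delta M^*_t \le \Delta [M]_t^{1/2} = |\Delta M_t| \le [M]_t^{1/2} \wedge 2 M^*_t,$$
we have in both cases $\Delta A_\tau \lesssim E[\zeta | \mathcal{F}_\tau]$ a.s. for every optional time $\tau$, and
so the cited condition remains fulfilled for the left-continuous version $A_-$.
Hence, Proposition 25.21 yields $\|A_\infty\|_p \lesssim \|\zeta\|_p$ for every $p \ge 1$, and (10)
follows. □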
We may use the last theorem to extend the stochastic integral to a larger
class of integrands. Then write $\mathcal{M}$ for the space of local martingales and
$\mathcal{M}_0$ for the subclass of processes $M$ with $M_0 = 0$. For any $M \in \mathcal{M}$, let
$L(M)$ denote the class of predictable processes $V$ such that $(V^2 \cdot [M])^{1/2}$
is locally integrable.
Theorem 26.13 (martingale integral, Meyer) The elementary predictable
integral extends a.s. uniquely to a bilinear map of any $M \in \mathcal{M}$ and $V \in
L(M)$ into $V \cdot M \in \mathcal{M}_0$, such that if $V, V_1, V_2, \dots \in L(M)$ with $|V_n| \le V$
and $(V_n^2 \cdot [M])_t \xrightarrow{P} 0$ for some $t > 0$, then $(V_n \cdot M)^*_t \xrightarrow{P} 0$. This integral
satisfies properties (ii)-(iv) of Theorem 26.2 and is characterized by the
condition
$$[V \cdot M, N] = V \cdot [M, N] \quad \text{a.s.}, \quad N \in \mathcal{M}. \qquad (13)$$
Proof: For the construction of the integral, we may reduce by localization
to the case when $E(M - M_0)^* < \infty$ and $E(V^2 \cdot [M])_\infty^{1/2} < \infty$. For each
$n \in \mathbb{N}$, define $V_n = V 1\{|V| \le n\}$. Then $V_n \cdot M \in \mathcal{M}_0$ by Theorem 26.4,
and by Theorem 26.12 we have $E(V_n \cdot M)^* < \infty$. Using Theorems 26.6 (v)
and 26.12, Minkowski's inequality, and dominated convergence, we obtain
$$E(V_m \cdot M - V_n \cdot M)^* \lesssim E[(V_m - V_n) \cdot M]_\infty^{1/2}
= E\bigl( (V_m - V_n)^2 \cdot [M] \bigr)_\infty^{1/2} \to 0.$$
Hence, there exists a process $V \cdot M$ with $E(V_n \cdot M - V \cdot M)^* \to 0$, and
clearly $V \cdot M \in \mathcal{M}_0$ and $E(V \cdot M)^* < \infty$.
To prove (13), we note that the relation holds for each $V_n$ by Theorem
26.6 (v). Since $E[V_n \cdot M - V \cdot M]_\infty^{1/2} \to 0$ by Theorem 26.12, we get by
Theorem 26.6 (iii) for any $N \in \mathcal{M}$ and $t \ge 0$
$$\bigl| [V_n \cdot M, N]_t - [V \cdot M, N]_t \bigr| \le [V_n \cdot M - V \cdot M]_t^{1/2} [N]_t^{1/2} \xrightarrow{P} 0. \qquad (14)$$
Next we note that, by Theorem 26.6 (iii) and (v),
$$\int_0^t |V_n\, d[M, N]| = \int_0^t |d[V_n \cdot M, N]| \le [V_n \cdot M]_t^{1/2} [N]_t^{1/2}.$$
As $n \to \infty$, we get by monotone convergence on the left, and Minkowski's
inequality on the right
$$\int_0^t |V\, d[M, N]| \le [V \cdot M]_t^{1/2} [N]_t^{1/2} < \infty.$$
Hence, by dominated convergence $V_n \cdot [M, N] \to V \cdot [M, N]$, and (13) follows
by combination with (14).
To see that (13) determines $V \cdot M$, it remains to note that if $[M] = 0$ a.s.
for some $M \in \mathcal{M}_0$, then $M^* = 0$ a.s. by Theorem 26.12. To prove the stated
continuity property, we may reduce by localization to the case when $E(V^2 \cdot
[M])_t^{1/2} < \infty$. But then $E(V_n^2 \cdot [M])_t^{1/2} \to 0$ by dominated convergence,
and Theorem 26.12 yields $E(V_n \cdot M)^*_t \to 0$. To prove the uniqueness of
the integral, it is enough to consider bounded integrands $V$. We may then
approximate as in Lemma 26.3 by uniformly bounded processes $V_n \in \mathcal{E}$
with $((V_n - V)^2 \cdot [M])_t \xrightarrow{P} 0$, and conclude that $(V_n \cdot M - V \cdot M)^*_t \xrightarrow{P} 0$.
Of the remaining properties in Theorem 26.2, relation (ii) may be proved
as before by means of (13), whereas (iii) and (iv) follow most easily by
truncation from the corresponding statements in Theorem 26.4. □
A semimartingale $X = M + A$ is said to be purely discontinuous if there
exist some local martingales $M^1, M^2, \dots$ of locally finite variation such that
$E(M - M^n)^{*2}_t \to 0$ for every $t > 0$. The property is clearly independent of
the choice of decomposition $X = M + A$. To motivate the terminology, we
note that any martingale $M$ of locally finite variation may be written as
$M = M_0 + A - \hat A$, where $A_t = \sum_{s \le t} \Delta M_s$ and $\hat A$ denotes the compensator
of $A$. Thus, $M - M_0$ is in this case a compensated sum of jumps.
Our present goal is to establish a fundamental decomposition of a general
semimartingale X into a continuous and a purely discontinuous component,
corresponding to the elementary decomposition of the quadratic variation
[X] into a continuous part and a jump part. In this connection the reader
is cautioned that, although any adapted process of locally finite variation is
a purely discontinuous semimartingale, it may not be purely discontinuous
in the sense of real analysis.
Theorem 26.14 (decomposition of semimartingales, Yoeurp, Meyer) Ev-
ery semimartingale $X$ has an a.s. unique decomposition $X = X_0 + X^c + X^d$,
where $X^c$ is a continuous local martingale with $X^c_0 = 0$ and $X^d$ is a purely
discontinuous semimartingale. Furthermore, $[X^c] = [X]^c$ and $[X^d] = [X]^d$
a.s.
Proof: To decompose $X$ it is enough to consider the martingale com-
ponent in any decomposition $X = X_0 + M + A$, and by Lemma 26.5 we
may assume that $M \in \mathcal{M}^2_{0,\mathrm{loc}}$. We may then choose some optional times
$\tau_n \uparrow \infty$, where $\tau_0 = 0$, such that $M^{\tau_n} \in \mathcal{M}^2_0$ for each $n$. It is enough to
construct the desired decomposition for each process $M^{\tau_n} - M^{\tau_{n-1}}$, which
reduces the discussion to the case when $M \in \mathcal{M}^2_0$. Now let $\mathcal{C}$ and $\mathcal{D}$ denote
the classes of continuous and purely discontinuous processes in $\mathcal{M}^2_0$, and
note that both are closed linear subspaces of the Hilbert space $\mathcal{M}^2_0$. The
desired decomposition will follow from Theorem 1.33 if we can show that
$\mathcal{D}^\perp \subset \mathcal{C}$.
Then let $M \in \mathcal{D}^\perp$. To see that $M$ is continuous, fix any $\varepsilon > 0$, and put
$\tau = \inf\{t;\ \Delta M_t > \varepsilon\}$. Define $A_t = 1\{\tau \le t\}$, let $\hat A$ denote the compensator
of $A$, and put $N = A - \hat A$. Integrating by parts and using Lemma 25.13
gives
$$\tfrac12 E \hat A_\tau^2 \le E \int \hat A\, d\hat A = E \int \hat A\, dA = E \hat A_\tau = E A_\tau \le 1.$$
Thus, $N$ is $L^2$-bounded and hence lies in $\mathcal{D}$. For any bounded martingale
$M'$, we get
$$E M'_\infty N_\infty = E \int M'\, dN = E \int \Delta M'\, dN
= E \int \Delta M'\, dA = E[\Delta M'_\tau;\ \tau < \infty],$$
where the first equality is obtained as in the proof of Lemma 25.7, the
second is due to the predictability of $M'_-$, and the third holds since $\hat A$ is
predictable and hence natural. Letting $M' \to M$ in $\mathcal{M}^2$, we obtain
$$0 = E M_\infty N_\infty = E[\Delta M_\tau;\ \tau < \infty] \ge \varepsilon P\{\tau < \infty\}.$$
Thus, $\Delta M \le \varepsilon$ a.s., and therefore $\Delta M \le 0$ a.s. since $\varepsilon$ is arbitrary.
Similarly, $\Delta M \ge 0$ a.s., and the desired continuity follows.
Next assume that $M \in \mathcal{D}$ and $N \in \mathcal{C}$, and choose martingales $M^n \to M$
of locally finite variation. By Theorem 26.6 (vi) and (vii) and optional
sampling, we get for any optional time $\tau$
$$0 = E[M^n, N]_\tau = E M^n_\tau N_\tau \to E M_\tau N_\tau = E[M, N]_\tau,$$
and so $[M, N]$ is a martingale by Lemma 7.13. Since it is also continuous by
(15), Proposition 17.2 yields $[M, N] = 0$ a.s. In particular, $E M_\infty N_\infty = 0$,
which shows that $\mathcal{C} \perp \mathcal{D}$. The uniqueness assertion now follows easily.
To prove the last assertion, we conclude from Theorem 26.6 (iv) that,
for any $M \in \mathcal{M}^2$,
$$[M]_t = [M]^c_t + \sum_{s \le t} (\Delta M_s)^2, \quad t \ge 0. \qquad (15)$$
Letting $M \in \mathcal{D}$, we may choose martingales of locally finite variation $M^n \to
M$. By Theorem 26.6 (vi) and (viii) we have $[M^n]^c = 0$ and $E[M^n - M]_\infty
\to 0$. For any $t \ge 0$, we get by Minkowski's inequality and (15)
$$\Bigl| \Bigl\{ \sum_{s \le t} (\Delta M^n_s)^2 \Bigr\}^{1/2} - \Bigl\{ \sum_{s \le t} (\Delta M_s)^2 \Bigr\}^{1/2} \Bigr|
\le \Bigl\{ \sum_{s \le t} (\Delta M^n_s - \Delta M_s)^2 \Bigr\}^{1/2} \le [M^n - M]_t^{1/2} \xrightarrow{P} 0,$$
$$\bigl| [M^n]_t^{1/2} - [M]_t^{1/2} \bigr| \le [M^n - M]_t^{1/2} \xrightarrow{P} 0.$$
Taking limits in (15) for the martingales $M^n$, we get the same formula for
$M$ without the term $[M]^c_t$, which shows that $[M] = [M]^d$.
Now consider any $M \in \mathcal{M}^2$. Using the strong orthogonality $[M^c, M^d] =
0$, we get a.s.
$$[M]^c + [M]^d = [M] = [M^c + M^d] = [M^c] + [M^d],$$
which shows that even $[M^c] = [M]^c$ a.s. By the same argument com-
bined with Theorem 26.6 (viii) we obtain $[X^d] = [X]^d$ a.s. for any
semimartingale $X$. □
The last result immediately yields an explicit formula for the covariation
of two semimartingales.
Corollary 26.15 (decomposition of covariation) For any semimartingale
$X$, the process $X^c$ is the a.s. unique continuous local martingale $M$ with
$M_0 = 0$ such that $[X - M]$ is purely discontinuous. Furthermore, we have
a.s. for any semimartingales $X$ and $Y$
$$[X, Y]_t = [X^c, Y^c]_t + \sum_{s \le t} \Delta X_s \Delta Y_s, \quad t \ge 0. \qquad (16)$$
In particular, we note that $(V \cdot X)^c = V \cdot X^c$ a.s. for any semimartingale
$X$ and locally bounded, predictable process $V$.
Proof: If $M$ has the stated properties, then $[(X - M)^c] = [X - M]^c = 0$
a.s., and so $(X - M)^c = 0$ a.s. Thus, $X - M$ is purely discontinuous. Formula
(16) holds by Theorem 26.6 (iv) and Theorem 26.14 when $X = Y$, and the
general result follows by polarization. □
The purely discontinuous component of a local martingale has a fur-
ther decomposition, similar to the decompositions of optional times and
increasing processes in Propositions 25.4 and 25.17.
Corollary 26.16 (decomposition of martingales, Yoeurp) Every purely
discontinuous local martingale $M$ has an a.s. unique decomposition $M =
M_0 + M^q + M^a$ with purely discontinuous $M^q, M^a \in \mathcal{M}_0$, where $M^q$ is
quasi-leftcontinuous and $M^a$ has accessible jumps. Furthermore, there exist
some predictable times $\tau_1, \tau_2, \dots$ with disjoint graphs such that $\{t;\ \Delta M^a_t \ne
0\} \subset \bigcup_n [\tau_n]$ a.s. Finally, $[M^q] = [M]^q$ and $[M^a] = [M]^a$ a.s., and also
$\langle M^q \rangle = \langle M \rangle^c$ and $\langle M^a \rangle = \langle M \rangle^d$ a.s. when $M \in \mathcal{M}^2_{\mathrm{loc}}$.
Proof: Introduce the locally integrable process $A_t = \sum_{s \le t} \{(\Delta M_s)^2 \wedge 1\}$
with compensator $\hat A$, and define $M^q = M - M_0 - M^a = 1\{\Delta \hat A_t = 0\} \cdot M$.
By Theorem 26.4 we have $M^q, M^a \in \mathcal{M}_0$ and $\Delta M^q = 1\{\Delta \hat A = 0\} \Delta M$
a.s. Furthermore, $M^q$ and $M^a$ are purely discontinuous by Corollary 26.15.
The proof may now be completed as in the case of Proposition 25.17. □
We may illustrate the use of the previous decompositions by proving two
exponential inequalities for martingales with bounded jumps.
Theorem 26.17 (exponential inequalities) Let $M$ be a local martingale
with $M_0 = 0$ such that $|\Delta M| \le c$ for some constant $c \le 1$.
(i) If $[M]_\infty \le 1$ a.s., then
$$P\{M^* \ge r\} \lesssim \exp\{-\tfrac12 r^2 / (1 + rc)\}, \quad r \ge 0.$$
(ii) If $\langle M \rangle_\infty \le 1$ a.s., then
$$P\{M^* \ge r\} \lesssim \exp\{-\tfrac12 r \log(1 + rc) / c\}, \quad r \ge 0.$$
For continuous martingales both bounds reduce to $e^{-r^2/2}$, which can
also be obtained directly by more elementary methods. For the proof of
Theorem 26.17 we need two lemmas. We begin with a characterization of
certain pure jump-type martingales.
Lemma 26.18 (accessible jump-type martingales) Let $N$ be a pure jump-
type process with integrable variation and accessible jumps. Then $N$ is a
martingale iff $E[\Delta N_\tau | \mathcal{F}_{\tau-}] = 0$ a.s. for every finite predictable time $\tau$.
Proof: By Proposition 25.17 there exist some predictable times $\tau_1, \tau_2, \dots$
with disjoint graphs such that $\{t > 0;\ \Delta N_t \ne 0\} \subset \bigcup_n [\tau_n]$. Assuming
the stated condition, we get by Fubini's theorem and Lemma 25.2 for any
bounded optional time $\tau$
$$E N_\tau = \sum_n E[\Delta N_{\tau_n};\ \tau_n \le \tau]
= \sum_n E\bigl[ E[\Delta N_{\tau_n} | \mathcal{F}_{\tau_n -}];\ \tau_n \le \tau \bigr] = 0,$$
and so $N$ is a martingale by Lemma 7.13. Conversely, given any uni-
formly integrable martingale $N$ and finite predictable time $\tau$, we have a.s.
$E[N_\tau | \mathcal{F}_{\tau-}] = N_{\tau-}$ and hence $E[\Delta N_\tau | \mathcal{F}_{\tau-}] = 0$. □
For general martingales $M$, the process $Z = e^{M - [M]/2}$ in Lemma 18.21
is not necessarily a martingale. For many purposes, however, it can be
replaced by a similar supermartingale.
Lemma 26.19 (exponential supermartingales) Let $M$ be a local martin-
gale with $M_0 = 0$ and $|\Delta M| \le c < \infty$ a.s., and put $a = f(c)$ and $b = g(c)$,
where
$$f(x) = -(x + \log(1 - x)_+)\, x^{-2}, \qquad g(x) = (e^x - 1 - x)\, x^{-2}.$$
Then the processes $X = e^{M - a[M]}$ and $Y = e^{M - b \langle M \rangle}$ are supermartingales.
Proof: In case of $X$ we may clearly assume that $c < 1$. By Theorem 26.7
we get, in an obvious shorthand notation,
$$X_-^{-1} \cdot X = M - (a - \tfrac12)[M]^c + \sum \bigl\{ e^{\Delta M - a(\Delta M)^2} - 1 - \Delta M \bigr\}.$$
Here the first term on the right is a local martingale, and the second term
is nonincreasing since $a \ge \tfrac12$. To see that even the sum is nonincreasing, we
need to show that $\exp(x - ax^2) \le 1 + x$ or $f(-x) \le f(c)$ whenever $|x| \le c$.
But this is clear by a Taylor expansion of each side. Thus, $X_-^{-1} \cdot X$ is a local
supermartingale, and since $X > 0$, the same thing is true for $X_- \cdot (X_-^{-1} \cdot
X) = X$. By Fatou's lemma it follows that $X$ is a true supermartingale.
In the case of $Y$, we may decompose $M$ according to Theorem 26.14 and
Proposition 26.16 as $M = M^c + M^q + M^a$, and conclude by Theorem 26.7
that
$$Y_-^{-1} \cdot Y = M - b \langle M \rangle^c + \tfrac12 [M]^c + \sum \bigl\{ e^{\Delta M - b \Delta \langle M \rangle} - 1 - \Delta M \bigr\}$$
$$= M + b\bigl( [M^q] - \langle M^q \rangle \bigr) - (b - \tfrac12)[M]^c
+ \sum \Bigl\{ e^{\Delta M - b \Delta \langle M \rangle} - \frac{1 + \Delta M + b(\Delta M)^2}{1 + b \Delta \langle M \rangle} \Bigr\}
+ \sum \Bigl\{ \frac{1 + \Delta M^a + b(\Delta M^a)^2}{1 + b \Delta \langle M^a \rangle} - 1 - \Delta M^a \Bigr\}.$$
Here the first two terms on the right are martingales, and the third term
is nonincreasing since $b \ge \tfrac12$. Even the first sum of jumps is nonincreasing
since $e^x - 1 - x \le b x^2$ for $|x| \le c$ and $e^y \ge 1 + y$ for $y \ge 0$.
The last sum clearly defines a purely discontinuous process $N$ of locally
finite variation and with accessible jumps. Fixing any finite predictable
time $\tau$ and writing $\xi = \Delta M_\tau$ and $\eta = \Delta \langle M \rangle_\tau$, we note that
$$E \Bigl| \frac{1 + \xi + b \xi^2}{1 + b \eta} - 1 - \xi \Bigr|
\le E \bigl| 1 + \xi + b \xi^2 - (1 + \xi)(1 + b \eta) \bigr|
= b E \bigl| \xi^2 - (1 + \xi) \eta \bigr| \le b (2 + c) E \xi^2.$$
Since
$$E \sum_t (\Delta M_t)^2 \le E[M]_\infty = E \langle M \rangle_\infty \le 1,$$
we conclude that the total variation of $N$ is integrable. Using Lemmas 25.3
and 26.18, we also note that a.s. $E[\xi | \mathcal{F}_{\tau-}] = 0$ and
$$E[\xi^2 | \mathcal{F}_{\tau-}] = E[\Delta [M]_\tau | \mathcal{F}_{\tau-}] = E[\eta | \mathcal{F}_{\tau-}] = \eta.$$
Thus,
$$E \Bigl[ \frac{1 + \xi + b \xi^2}{1 + b \eta} - 1 - \xi \,\Big|\, \mathcal{F}_{\tau-} \Bigr] = 0,$$
and Lemma 26.18 shows that $N$ is a martingale. The proof may now be
completed as before. □
Proof of Theorem 26.17: (i) Fix any $u > 0$, and conclude from Lemma
26.19 that the process
$$X^u_t = \exp\{u M_t - u^2 f(uc) [M]_t\}, \quad t \ge 0,$$
is a positive supermartingale. Since $[M] \le 1$ and $X^u_0 = 1$, we get for any
$r > 0$
$$P\{\sup_t M_t > r\} \le P\bigl\{ \sup_t X^u_t > \exp\{ur - u^2 f(uc)\} \bigr\}
\le \exp\{-ur + u^2 f(uc)\}. \qquad (17)$$
Now define $F(x) = 2x f(x)$, and note that $F$ is continuous and strictly
increasing from $[0, 1)$ onto $\mathbb{R}_+$. Also note that $F(x) \le x/(1 - x)$ and hence
$F^{-1}(y) \ge y/(1 + y)$. Taking $u = F^{-1}(rc)/c$ in (17), we get
$$P\{\sup_t M_t > r\} \le \exp\{-\tfrac12 r F^{-1}(rc)/c\}
\le \exp\{-\tfrac12 r^2 / (1 + rc)\}.$$
It remains to combine with the same inequality for $-M$.
(ii) Define $G(x) = 2x g(x)$, and note that $G$ is a continuous and strictly
increasing mapping onto $\mathbb{R}_+$. Furthermore, $G(x) \le e^x - 1$, and so $G^{-1}(y) \ge
\log(1 + y)$. Proceeding as before, we get
$$P\{\sup_t M_t > r\} \le \exp\{-\tfrac12 r G^{-1}(rc)/c\}
\le \exp\{-\tfrac12 r \log(1 + rc)/c\},$$
and the result follows.
□
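The elementary inequalities used in this proof, $F(x) \le x/(1 - x)$ on $(0, 1)$ and $G(x) \le e^x - 1$ for $x > 0$, together with $f, g \ge \tfrac12$, are easy to check numerically. The following sketch does so on a grid and evaluates the two exponents of Theorem 26.17 without the implied constant. The function names `bound_i` and `bound_ii` are illustrative, and a grid check is of course only a sanity test, not a proof.

```python
import math

def f(x):
    """f(x) = -(x + log(1 - x)) / x^2 for 0 < x < 1."""
    return -(x + math.log1p(-x)) / x**2

def g(x):
    """g(x) = (e^x - 1 - x) / x^2 for x > 0."""
    return (math.expm1(x) - x) / x**2

def F(x):
    return 2 * x * f(x)

def G(x):
    return 2 * x * g(x)

def bound_i(r, c):
    """Exponential factor in part (i): exp(-r^2 / (2 (1 + r c)))."""
    return math.exp(-0.5 * r * r / (1 + r * c))

def bound_ii(r, c):
    """Exponential factor in part (ii): exp(-r log(1 + r c) / (2 c))."""
    return math.exp(-0.5 * r * math.log1p(r * c) / c)

grid = [k / 100 for k in range(1, 100)]
ok_F = all(F(x) <= x / (1 - x) for x in grid)
ok_G = all(G(x) <= math.expm1(x) for x in grid)
print(ok_F, ok_G, bound_i(2.0, 0.5), bound_ii(2.0, 0.5))
```

As the jump bound $c$ tends to 0, both factors approach the Gaussian value $e^{-r^2/2}$, in line with the remark on continuous martingales following the theorem.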
A quasi-martingale is defined as an integrable, adapted, and right-con-
tinuous process $X$ such that
$$\sup_\pi \sum_{k \le n} E \bigl| X_{t_k} - E[X_{t_{k+1}} | \mathcal{F}_{t_k}] \bigr| < \infty, \qquad (18)$$
where the supremum extends over all finite partitions $\pi$ of $\mathbb{R}_+$ of the form
$0 = t_0 < t_1 < \cdots < t_n < \infty$, and the last term is computed under the
conventions $t_{n+1} = \infty$ and $X_\infty = 0$. In particular, we note that (18) holds
when $X$ is the sum of an $L^1$-bounded martingale and a process of integrable
variation starting at 0. The next result shows that this case is close to the
general situation. Here localization is defined in the usual way in terms of
a sequence of optional times $\tau_n \uparrow \infty$.
Theorem 26.20 (quasi-martingales, Rao) Any quasi-martingale is a dif-
ference of two nonnegative supermartingales. Thus, a process X with
Xo = 0 is a local quasi-martingale iff it is a special semimartingale.
Proof: For any $t \ge 0$, let $\mathcal{P}_t$ denote the class of partitions $\pi$ of the interval
$[t, \infty)$ of the form $t = t_0 < t_1 < \cdots < t_n$, and define
$$\eta^\pm_\pi = \sum_{k \le n} E\bigl[ (X_{t_k} - E[X_{t_{k+1}} | \mathcal{F}_{t_k}])^\pm \,\big|\, \mathcal{F}_t \bigr], \quad \pi \in \mathcal{P}_t,$$
where $t_{n+1} = \infty$ and $X_\infty = 0$ as before. We claim that $\eta^+_\pi$ and $\eta^-_\pi$ are
a.s. nondecreasing under refinements of $\pi \in \mathcal{P}_t$. To see this, it is clearly
enough to add one more division point $u$ to $\pi$, say in the interval $(t_k, t_{k+1})$.
Put $\alpha = X_{t_k} - X_u$ and $\beta = X_u - X_{t_{k+1}}$. By subadditivity and Jensen's
inequality we get the desired relation
$$E\bigl[ E[\alpha + \beta | \mathcal{F}_{t_k}]^\pm \,\big|\, \mathcal{F}_t \bigr] \le E\bigl[ E[\alpha | \mathcal{F}_{t_k}]^\pm + E[\beta | \mathcal{F}_{t_k}]^\pm \,\big|\, \mathcal{F}_t \bigr]
\le E\bigl[ E[\alpha | \mathcal{F}_{t_k}]^\pm + E[\beta | \mathcal{F}_u]^\pm \,\big|\, \mathcal{F}_t \bigr].$$
Now fix any $t \ge 0$, and conclude from (18) that $m^\pm_t \equiv \sup_{\pi \in \mathcal{P}_t} E \eta^\pm_\pi < \infty$.
For each $n \in \mathbb{N}$ we may then choose some $\pi_n \in \mathcal{P}_t$ with $E \eta^\pm_{\pi_n} > m^\pm_t -
n^{-1}$. The sequences $(\eta^\pm_{\pi_n})$ are Cauchy in $L^1$, and so they converge in $L^1$
toward some limits $Y^\pm_t$. Note also that $E|\eta^\pm_\pi - Y^\pm_t| < n^{-1}$ whenever $\pi$ is a
refinement of $\pi_n$. Thus, $\eta^\pm_\pi \to Y^\pm_t$ in $L^1$ along the directed set $\mathcal{P}_t$.
Next fix any $s < t$, let $\pi \in \mathcal{P}_t$ be arbitrary, and define $\pi' \in \mathcal{P}_s$ by adding
the point $s$ to $\pi$. Then
$$Y^\pm_s \ge \eta^\pm_{\pi'} = (X_s - E[X_t | \mathcal{F}_s])^\pm + E[\eta^\pm_\pi | \mathcal{F}_s] \ge E[\eta^\pm_\pi | \mathcal{F}_s].$$
Taking limits along $\mathcal{P}_t$ on the right, we get $Y^\pm_s \ge E[Y^\pm_t | \mathcal{F}_s]$ a.s., which
means that the processes $Y^\pm$ are supermartingales. By Theorem 7.27 the
right-hand limits along the rationals $Z^\pm_t = Y^\pm_{t+}$ then exist outside a fixed
null set, and the processes $Z^\pm$ are right-continuous supermartingales. For
$\pi \in \mathcal{P}_t$ we have $X_t = \eta^+_\pi - \eta^-_\pi \to Y^+_t - Y^-_t$, and so $Z^+_t - Z^-_t = X_{t+} = X_t$
a.s. □
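The monotonicity under refinement used in this proof is easiest to see in the degenerate case of a deterministic path, where every conditional expectation is the identity and the sum in (18) becomes $\sum_{k \le n} |X_{t_k} - X_{t_{k+1}}|$ with the convention $X_\infty = 0$. The sketch below is only an illustration under that assumption; the path and the helper `qm_sum` are hypothetical. It evaluates the sum along successively finer partitions.

```python
def qm_sum(path, partition):
    """Sum in (18) for a deterministic path (conditional expectations are trivial).

    partition lists t_0 < t_1 < ... < t_n; the final term uses X_infinity = 0.
    """
    values = [path(t) for t in partition] + [0]
    return sum(abs(a - b) for a, b in zip(values, values[1:]))

def path(t):
    """A hypothetical deterministic path, used only for illustration."""
    return (t - 2) ** 2

coarse = qm_sum(path, [0, 4])
finer = qm_sum(path, [0, 2, 4])
finest = qm_sum(path, [0, 1, 2, 3, 4])
print(coarse, finer, finest)
```

Adding a division point can only increase the sum, by the triangle inequality, which is the deterministic counterpart of the subadditivity and Jensen step in the proof.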
The next result shows that semimartingales are the most general pro-
cesses for which a stochastic integral with reasonable continuity properties
can be defined. As before, $\mathcal{E}$ denotes the class of bounded, predictable step
processes with jumps at finitely many fixed points.
Theorem 26.21 (stochastic integrators, Bichteler, Dellacherie) A right-
continuous, adapted process $X$ is a semimartingale iff for any $V_1, V_2, \dots \in \mathcal{E}$
with $\|V^*_n\|_\infty \to 0$ we have $(V_n \cdot X)_t \xrightarrow{P} 0$ for all $t \ge 0$.
The proof is based on three lemmas, the first of which separates the
crucial functional-analytic part of the argument.
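Lemma 26.22 (convexity and tightness) For any tight, convex set $\mathcal{K} \subset
L^1(P)$, there exists a bounded random variable $\rho > 0$ with $\sup_{\xi \in \mathcal{K}} E \rho \xi < \infty$.
Proof (Yan): Let $\mathcal{B}$ denote the class of bounded, nonnegative random
variables, and define $\mathcal{C} = \{\gamma \in \mathcal{B};\ \sup_{\xi \in \mathcal{K}} E(\gamma \xi) < \infty\}$. We claim that, for
any $\gamma_1, \gamma_2, \dots \in \mathcal{C}$, there exists some $\gamma \in \mathcal{C}$ with $\{\gamma > 0\} = \bigcup_n \{\gamma_n > 0\}$.
Indeed, we may assume that $\gamma_n \le 1$ and $\sup_{\xi \in \mathcal{K}} E(\gamma_n \xi) \le 1$, in which case
we may choose $\gamma = \sum_n 2^{-n} \gamma_n$. It is then easy to construct a $\rho \in \mathcal{C}$ such
that $P\{\rho > 0\} = \sup_{\gamma \in \mathcal{C}} P\{\gamma > 0\}$. Clearly,
$$\{\gamma > 0\} \subset \{\rho > 0\} \quad \text{a.s.}, \quad \gamma \in \mathcal{C}, \qquad (19)$$
since we could otherwise choose a $\rho' \in \mathcal{C}$ with $P\{\rho' > 0\} > P\{\rho > 0\}$.
To show that $\rho > 0$ a.s., we assume that instead $P\{\rho = 0\} > \varepsilon > 0$. By
the tightness of $\mathcal{K}$ we may choose $r > 0$ so large that $P\{\xi > r\} \le \varepsilon$ for
all $\xi \in \mathcal{K}$. Then $P\{\xi - \beta > r\} \le \varepsilon$ for all $\xi \in \mathcal{K}$ and $\beta \in \mathcal{B}$. By Fatou's
lemma we obtain $P\{\zeta > r\} \le \varepsilon$ for all $\zeta$ in the $L^1$-closure $\mathcal{Z} = \overline{\mathcal{K} - \mathcal{B}}$. In
particular, the random variable $\zeta_0 = 2r 1\{\rho = 0\}$ lies outside $\mathcal{Z}$. Now $\mathcal{Z}$
is convex and closed, and so, by a version of the Hahn-Banach theorem,
there exists some $\gamma \in (L^1)^* = L^\infty$ satisfying
$$\sup_{\xi \in \mathcal{K}} E \gamma \xi - \inf_{\beta \in \mathcal{B}} E \gamma \beta \le \sup_{\zeta \in \mathcal{Z}} E \gamma \zeta < E \gamma \zeta_0 = 2r E[\gamma;\ \rho = 0]. \qquad (20)$$
Here $\gamma \ge 0$, since we would otherwise get a contradiction by choos-
ing $\beta = b 1\{\gamma < 0\}$ for large enough $b > 0$. Hence, (20) reduces to
$\sup_{\xi \in \mathcal{K}} E \gamma \xi < 2r E[\gamma;\ \rho = 0]$, which implies $\gamma \in \mathcal{C}$ and $E[\gamma;\ \rho = 0] > 0$.
But this contradicts (19), and therefore $\rho > 0$ a.s. □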
Two further lemmas are needed for the proof of Theorem 26.21.
Lemma 26.23 (tightness and boundedness) Let $\mathcal{T}$ be the class of optional
times $\tau < \infty$ taking finitely many values, and consider a right-continuous,
adapted process $X$ such that the family $\{X_\tau;\ \tau \in \mathcal{T}\}$ is tight. Then $X^* < \infty$
a.s.
Proof: By Lemma 7.4 any bounded optional time $\tau$ can be approximated
from the right by optional times $\tau_n \in \mathcal{T}$, and by right-continuity
we have $X_{\tau_n} \to X_\tau$. Hence, Fatou's lemma yields
$P\{|X_\tau| > r\} \le \liminf_n P\{|X_{\tau_n}| > r\}$, and so the hypothesis remains true with $\mathcal{T}$ replaced
by the class $\hat{\mathcal{T}}$ of all bounded optional times. By Lemma 7.6 the times
$\tau_{t,n} = t \wedge \inf\{s;\ |X_s| > n\}$ belong to $\hat{\mathcal{T}}$ for all $t > 0$ and $n \in \mathbb{N}$, and as
$n \to \infty$, we get
$$P\{X^* > n\} = \sup_{t > 0} P\{X_t^* > n\} \le \sup_{\tau \in \hat{\mathcal{T}}} P\{|X_\tau| > n\} \to 0. \qquad \Box$$
Lemma 26.24 (scaling) For any finite random variable $\xi$, there exists a
bounded random variable $\rho > 0$ such that $E|\rho\xi| < \infty$.
Proof: We may take $\rho = (|\xi| \vee 1)^{-1}$. □
Proof of Theorem 26.21: The necessity is clear from Theorem 26.4. Now
assume the stated condition. By Lemma 4.9 it is equivalent to assume
for each $t > 0$ that the family $\mathcal{K}_t = \{(V \cdot X)_t;\ V \in \mathcal{E}_1\}$ is tight, where
$\mathcal{E}_1 = \{V \in \mathcal{E};\ |V| \le 1\}$. The latter family is clearly convex, and by the
linearity of the integral the convexity carries over to $\mathcal{K}_t$.
By Lemma 26.23 we have $X^* < \infty$ a.s., and so by Lemma 26.24 there
exists a probability measure $Q \sim P$ such that $E_Q X_t^* = \int X_t^*\,dQ < \infty$. In
particular, $\mathcal{K}_t \subset L^1(Q)$, and we note that $\mathcal{K}_t$ remains tight with respect to
$Q$. Hence, by Lemma 26.22 there exists a probability measure $R \sim Q$ with
bounded density $\rho = dR/dQ$ such that $\mathcal{K}_t$ is bounded in $L^1(R)$.
Now consider an arbitrary partition $0 = t_0 < t_1 < \cdots < t_n = t$, and note
that
$$\sum_{k \le n} E_R \bigl| X_{t_k} - E_R[X_{t_{k+1}} \mid \mathcal{F}_{t_k}] \bigr| = E_R (V \cdot X)_t + E_R |X_t|, \tag{21}$$
where
$$V_s = \sum_{k < n} \operatorname{sgn}\bigl(E_R[X_{t_{k+1}} \mid \mathcal{F}_{t_k}] - X_{t_k}\bigr)\, 1_{(t_k,\, t_{k+1}]}(s), \quad s \ge 0.$$
Since $\rho$ is bounded and $V \in \mathcal{E}_1$, the right-hand side of (21) is bounded by a
constant. Hence, the stopped process $X^t$ is a quasi-martingale under $R$. By
Theorem 26.20 it is then an $R$-semimartingale, and since $P \sim R$, Corollary
26.11 shows that $X^t$ is even a $P$-semimartingale. Since $t$ is arbitrary, it
follows that $X$ itself is a $P$-semimartingale. □
Exercises
1. Construct the quadratic variation $[M]$ of a local $L^2$-martingale $M$
directly as in Theorem 17.5, and prove a corresponding version of
the integration-by-parts formula. Use $[M]$ to define the $L^2$-integral of
Theorem 26.2.
2. Show that the approximation in Proposition 17.17 remains valid for
general semimartingales.
3. Consider a local martingale $M$ starting at 0 and an optional time $\tau$. Use
Theorem 26.12 to give conditions for the validity of the relations $EM_\tau = 0$
and $EM_\tau^2 = E[M]_\tau$.
4. Give an example of a sequence of $L^2$-bounded martingales $M_n$ such
that $M_n^* \xrightarrow{P} 0$ and yet $\langle M_n \rangle_\infty \xrightarrow{P} \infty$. (Hint: Consider compensated Poisson
processes with large jumps.)
5. Give an example of a sequence of martingales $M_n$ such that $[M_n]_\infty \xrightarrow{P} 0$
and yet $M_n^* \xrightarrow{P} \infty$. (Hint: See the preceding problem.)
6. Show that $\langle M_n \rangle_\infty \xrightarrow{P} 0$ implies $[M_n]_\infty \xrightarrow{P} 0$.
7. Give an example of a martingale $M$ of bounded variation and a bounded,
progressive process $V$ such that $V^2 \cdot \langle M \rangle = 0$ and yet $V \cdot M \ne 0$. Conclude
that the $L^2$-integral in Theorem 26.2 has no continuous extension to
progressive integrands.
8. Show that any general martingale inequality involving the processes $M$,
$[M]$, and $\langle M \rangle$ remains valid in discrete time. (Hint: Embed $M$ and the
associated discrete filtration into a martingale and filtration on $\mathbb{R}_+$.)
9. Show that the a.s. convergence in Theorem 4.23 remains valid in $L^p$.
(Hint: Use Theorem 26.12 to reduce to the case when $p < 1$. Then truncate.)
10. Let $\mathcal{G}$ be an extension of the filtration $\mathcal{F}$. Show that any $\mathcal{F}$-adapted
$\mathcal{G}$-semimartingale is also an $\mathcal{F}$-semimartingale. Also show by an example
that the converse implication fails in general. (Hint: Use Theorem 26.21.)
11. Show that if $X$ is a Lévy process in $\mathbb{R}$, then $[X]$ is a subordinator.
Express the characteristics of $[X]$ in terms of those for $X$.
12. For any Lévy process $X$, show that if $X$ is $p$-stable, then $[X]$ is strictly
$p/2$-stable. Also prove the converse, in the case when $X$ has positive or
symmetric jumps. (Hint: Use Proposition 15.9.)
13. Extend Theorem 26.17 to the case when $[M]_\infty \le a$ or $\langle M \rangle_\infty \le a$ a.s. for
some $a \ge 1$. (Hint: Apply the original result to a suitably scaled process.)
14. For any Lévy process $X$ with Lévy measure $\nu$, show that $X \in \mathcal{M}^2_{\mathrm{loc}}$ iff
$X \in \mathcal{M}^2$, and also iff $\int x^2\, \nu(dx) < \infty$, in which case $\langle X \rangle_t = t\,EX_1^2$. (Hint:
Use Corollary 25.25.)
15. Show that if $M$ is a purely discontinuous local martingale with positive
jumps, then $M - M_0$ is a.s. determined by $[M]$. (Hint: For any such processes
$M$ and $N$ with $[M] = [N]$, apply Theorem 26.14 to $M - N$.)
16. Show that a semimartingale $X$ is quasi-leftcontinuous or has accessible
jumps iff $[X]$ has the same property. (Hint: Use Theorem 26.6 (iv).)
17. Show that a semimartingale $X$ with $|\Delta X| \le c < \infty$ a.s. is a special
semimartingale with canonical decomposition $M + A$ satisfying $|\Delta A| \le c$
a.s. In particular, $X$ is a continuous semimartingale iff it has a decomposition
$M + A$, where $M$ and $A$ are continuous. (Hint: Use Lemma 26.5, and
note that $|\Delta A| \le c$ a.s. implies $|\Delta \hat{A}| \le c$ a.s.)
18. Show that a semimartingale $X$ is quasi-leftcontinuous or has accessible
jumps iff it has a decomposition $M + A$, where $M$ and $A$ have the same
property. Also show that, for special semimartingales, we may choose $M + A$
to be the canonical decomposition of $X$. (Hint: Use Proposition 25.17 and
Corollary 26.16, and refer to the preceding exercise.)
19. Show that a semimartingale $X$ is predictable iff it is a special
semimartingale with canonical decomposition $M + A$ such that $M$ is continuous.
(Hint: Use Proposition 25.16.)
Chapter 27
Large Deviations
Legendre-Fenchel transform; Cramer's and Schilder's theorems;
large-deviation principle and rate function; functional form of
the LDP; continuous mapping and extension; perturbation of
dynamical systems; empirical processes and entropy; Strassen's
law of the iterated logarithm
In its simplest setting, large deviation theory provides the exact rate of
convergence in the weak law of large numbers. To be precise, consider any
i.i.d. random variables $\xi_1, \xi_2, \ldots$ with mean $m$ and cumulant-generating
function $\Lambda(u) = \log Ee^{u\xi_i} < \infty$, and write $\bar{\xi}_n = n^{-1} \sum_{k \le n} \xi_k$. Then for
any $x > m$, the tail probabilities $P\{\bar{\xi}_n > x\}$ tend to 0 at an exponential
rate $I(x)$, given by the Legendre-Fenchel transform $\Lambda^*$ of $\Lambda$. In higher
dimensions, it is often convenient to state the result more generally in the
form $n^{-1} \log P\{\bar{\xi}_n \in B\} \to -I(B)$, where $I(B) = \inf_{x \in B} I(x)$ and $B$ is
restricted to a suitable class of continuity sets. In this standard format of
a large-deviation principle with rate function $I$, the result extends to an
amazing variety of contexts throughout probability theory.
A striking example, of fundamental importance in statistical mechanics,
is Sanov's theorem, which provides a similar large deviation result for
the empirical distributions of a sequence of i.i.d. random variables with a
common distribution $\mu$. Here the rate function $I$ is defined on the space of
probability measures $\nu$ on $\mathbb{R}$ and agrees with the relative entropy function
$H(\nu \mid \mu)$. Another important example is Schilder's theorem for the family
of rescaled Brownian motions in $\mathbb{R}^d$, where the rate function becomes
$I(x) = \tfrac{1}{2}\|\dot{x}\|_2^2$, half the squared norm in the Cameron-Martin space considered
in Chapter 18. The latter result can be used to derive the Freidlin-Wentzell
estimates for randomly perturbed dynamical systems. It also provides a
short proof of Strassen's law of the iterated logarithm, a stunning extension
of the classical Khinchin law from Chapter 13.
Modern proofs of those and other large deviation results rely on some
general extension principles, which also serve to explain the wide appli-
cability of the present ideas. In addition to some rather straightforward
and elementary techniques of continuity and approximation, we consider
the more sophisticated and extremely powerful methods of inverse contin-
uous mapping and projective limits, both of which play a crucial role in
subsequent applications. We may also call attention to the significance of
exponential tightness, and to the essential equivalence between the setwise
and functional formulations of the large-deviation principle.
Large deviation theory is arguably one of the most technical branches
of modern probability theory. For the nonexpert it then seems essential
to avoid getting distracted by topological subtleties or elaborate computa-
tions. Many results are therefore stated here under simplifying assumptions.
Likewise, we postpone our discussion of general principles until the reader
has become acquainted with the basic ideas in a concrete setting. For this
reason, important applications appear both at the beginning and at the
end of the chapter, separated by a more abstract discussion of some general
notions and principles.
Let us now return to the elementary context of i.i.d. random variables
$\xi, \xi_1, \xi_2, \ldots$ and write $S_n = \sum_{k \le n} \xi_k$ and $\bar{\xi}_n = S_n / n$. If $m = E\xi$ exists
and is finite, then $P\{\bar{\xi}_n > x\} \to 0$ for all $x > m$ by the weak law of
large numbers. Under stronger moment conditions, the rate of convergence
turns out to be exponential and can be estimated with great accuracy. This
rather elementary but quite technical result lies, along with its multidimensional
counterpart, at the core of large-deviation theory and provides both
a pattern and a point of departure for more advanced developments. For
motivation, we begin with some simple observations.
Lemma 27.1 (convergence) Let $\xi, \xi_1, \xi_2, \ldots$ be i.i.d. random variables.
Then
(i) $n^{-1} \log P\{\bar{\xi}_n \ge x\} \to \sup_n n^{-1} \log P\{\bar{\xi}_n \ge x\} \equiv -h(x)$ for all $x$;
(ii) $h$ is $[0, \infty]$-valued, nondecreasing, and convex;
(iii) $h(x) < \infty$ iff $P\{\xi \ge x\} > 0$.
Proof: (i) Writing $p_n = P\{\bar{\xi}_n \ge x\}$, we get for any $m, n \in \mathbb{N}$
$$p_{m+n} = P\{S_{m+n} \ge (m+n)x\} \ge P\{S_m \ge mx,\ S_{m+n} - S_m \ge nx\} = p_m p_n.$$
Taking logarithms, we conclude that the sequence $-\log p_n$ is subadditive,
and the assertion follows by Lemma 10.21.
(ii) The first two assertions are obvious. To prove the convexity, let $x, y \in \mathbb{R}$
be arbitrary, and proceed as before to get
$$P\{S_{2n} \ge n(x + y)\} \ge P\{S_n \ge nx\}\, P\{S_n \ge ny\}.$$
Taking logarithms, dividing by $2n$, and letting $n \to \infty$, we obtain
$$h\bigl(\tfrac{1}{2}(x + y)\bigr) \le \tfrac{1}{2}\bigl(h(x) + h(y)\bigr), \quad x, y \in \mathbb{R}.$$
(iii) If $P\{\xi \ge x\} = 0$, then $P\{\bar{\xi}_n \ge x\} = 0$ for all $n$, and so $h(x) = \infty$.
Conversely, (i) yields $\log P\{\xi \ge x\} \le -h(x)$, and so $h(x) = \infty$ implies
$P\{\xi \ge x\} = 0$. □
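The supermultiplicative property behind Lemma 27.1 (i) can be checked numerically. The following is a minimal sketch, assuming Bernoulli summands so that $S_n$ is binomial and the tail probabilities are exact finite sums; the parameter values `p = 0.3`, `x = 0.5` and the helper name `tail_prob` are illustrative choices, not part of the text.

```python
from math import ceil, comb, log

def tail_prob(n, x, p):
    """Exact P{S_n >= n*x} when S_n is Binomial(n, p) (assumed example law)."""
    k_min = max(0, ceil(n * x - 1e-12))  # smallest integer k with k >= n*x
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p, x = 0.3, 0.5  # illustrative parameters
p_n = {n: tail_prob(n, x, p) for n in range(1, 41)}

# Supermultiplicativity p_{m+n} >= p_m * p_n, allowing for rounding error.
supermultiplicative = all(
    p_n[m + n] >= p_n[m] * p_n[n] * (1 - 1e-12)
    for m in range(1, 21)
    for n in range(1, 21)
)

# Normalized logarithms n^{-1} log p_n for a few values of n.
rates = {n: log(p_n[n]) / n for n in (5, 10, 20, 40)}
print(supermultiplicative, rates)
```

The dictionary `rates` lists values of $n^{-1} \log p_n$, each of which is bounded above by the limit $-h(x)$ in part (i).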
To determine the limit in Lemma 27.1 we need some further notation,
which is given here for convenience directly in d dimensions. For any random
vector $\xi$ in $\mathbb{R}^d$, we introduce the function
$$\Lambda(u) = \Lambda_\xi(u) = \log Ee^{u\xi}, \quad u \in \mathbb{R}^d, \tag{1}$$
known in statistics as the cumulant-generating function of $\xi$. Note that $\Lambda$ is
convex, since by Hölder's inequality we have for any $u, v \in \mathbb{R}^d$ and $p, q > 0$
with $p + q = 1$
$$\Lambda(pu + qv) = \log E\exp\bigl((pu + qv)\xi\bigr) \le \log\bigl((Ee^{u\xi})^p (Ee^{v\xi})^q\bigr) = p\Lambda(u) + q\Lambda(v).$$
The surface $z = \Lambda(u)$ in $\mathbb{R}^{d+1}$ is determined by the family of supporting
hyperplanes ($d$-dimensional affine subspaces) with different slopes, and we
note that the plane with slope $x \in \mathbb{R}^d$ (or normal vector $(1, -x)$) has
equation
$$z + \Lambda^*(x) = xu, \quad u \in \mathbb{R}^d,$$
where $\Lambda^*$ denotes the Legendre-Fenchel transform of $\Lambda$, given by
$$\Lambda^*(x) = \sup_{u \in \mathbb{R}^d} \bigl(ux - \Lambda(u)\bigr), \quad x \in \mathbb{R}^d. \tag{2}$$
We can often compute A * explicitly. Here we list two simple cases that will
be needed below. The results are proved by elementary calculus.
Lemma 27.2 (Gaussian and Bernoulli distributions)
(i) If $\xi = (\xi_1, \ldots, \xi_d)$ is standard Gaussian in $\mathbb{R}^d$, then $\Lambda_\xi^*(x) \equiv |x|^2/2$.
(ii) If $\xi \in \{0, 1\}$ with $P\{\xi = 1\} = p \in (0, 1)$, then $\Lambda_\xi^*(x) = \infty$ for
$x \notin [0, 1]$ and
$$\Lambda_\xi^*(x) = x \log\frac{x}{p} + (1 - x) \log\frac{1 - x}{1 - p}, \quad x \in [0, 1].$$
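The two closed forms in Lemma 27.2 can be compared with a direct numerical evaluation of the supremum in (2). The sketch below assumes $d = 1$ and replaces the supremum by a maximum over a finite grid of $u$-values, so it only approximates $\Lambda^*$ from below; the function names, the grid bounds, and the test points are illustrative assumptions.

```python
from math import expm1, log, log1p

def legendre_fenchel(cgf, x, lo=-20.0, hi=20.0, steps=40_000):
    """Approximate sup_u (u*x - cgf(u)) by a maximum over a uniform grid."""
    best = float("-inf")
    for k in range(steps + 1):
        u = lo + (hi - lo) * k / steps
        best = max(best, u * x - cgf(u))
    return best

# (i) Standard Gaussian in d = 1: Lambda(u) = u^2 / 2.
def gauss_cgf(u):
    return 0.5 * u * u

# (ii) Bernoulli(p): Lambda(u) = log(1 - p + p*e^u), written stably.
p = 0.3  # illustrative success probability

def bern_cgf(u):
    return log1p(p * expm1(u))

def bern_rate(x):
    """Closed form from Lemma 27.2 (ii), for x strictly inside (0, 1)."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

gauss_numeric = legendre_fenchel(gauss_cgf, 1.3)  # compare with 1.3**2 / 2
bern_numeric = legendre_fenchel(bern_cgf, 0.6)    # compare with bern_rate(0.6)
print(gauss_numeric, bern_numeric, bern_rate(0.6))
```

Since $u \mapsto ux - \Lambda(u)$ is concave, a grid search is adequate here; for $x$ outside $[0, 1]$ in the Bernoulli case the grid maximum would keep growing with the grid bounds, reflecting $\Lambda^*(x) = \infty$.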
The function $\Lambda^*$ is again convex, since for any $x, y \in \mathbb{R}^d$ and for $p$ and $q$
as before
$$\Lambda^*(px + qy) = \sup_u \bigl[p(ux - \Lambda(u)) + q(uy - \Lambda(u))\bigr] \le p \sup_u (ux - \Lambda(u)) + q \sup_u (uy - \Lambda(u)) = p\Lambda^*(x) + q\Lambda^*(y).$$
If $\Lambda < \infty$ near the origin, then $m = E\xi$ exists and agrees with the gradient
$\nabla\Lambda(0)$. Thus, the surface $z = \Lambda(u)$ has tangent hyperplane $z = mu$ at
0, and we conclude that $\Lambda^*(m) = 0$ and $\Lambda^*(x) > 0$ for $x \ne m$. If $\xi$ is
also truly $d$-dimensional, then $\Lambda$ is strictly convex at 0, and $\Lambda^*$ is finite
and continuous near $m$. For $d = 1$, we sometimes need the corresponding
one-sided statements, which are easily derived by dominated convergence.
The following key result identifies the function $h$ in Lemma 27.1. For
simplicity, we assume that $m = E\xi$ exists in $[-\infty, \infty)$.
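Theorem 27.3 (rate function, Cramér, Chernoff) Let $\xi, \xi_1, \xi_2, \ldots$ be
i.i.d. random variables with $m = E\xi < \infty$. Then for any $x \ge m$, we
have
$$n^{-1} \log P\{\bar{\xi}_n \ge x\} \to -\Lambda^*(x). \tag{3}$$
Proof: Using Chebyshev's inequality and (1), we get for any $u > 0$
$$P\{\bar{\xi}_n \ge x\} = P\{e^{uS_n} \ge e^{nux}\} \le e^{-nux} Ee^{uS_n} = e^{n\Lambda(u) - nux},$$
and so
$$n^{-1} \log P\{\bar{\xi}_n \ge x\} \le \Lambda(u) - ux.$$
This remains true for $u \le 0$, since in that case $\Lambda(u) - ux \ge 0$ for $x \ge m$.
Hence, by (2) we have the upper bound
$$n^{-1} \log P\{\bar{\xi}_n \ge x\} \le -\Lambda^*(x), \quad x \ge m,\ n \in \mathbb{N}. \tag{4}$$
To derive a matching lower bound, we first assume that $\Lambda < \infty$ on $\mathbb{R}_+$.
Then $\Lambda$ is smooth on $(0, \infty)$ with $\Lambda'(0+) = m$ and $\Lambda'(\infty) = \operatorname{ess\,sup} \xi \equiv b$,
and so for any $a \in (m, b)$ we can choose a $u > 0$ such that $\Lambda'(u) = a$. Let
$\eta, \eta_1, \eta_2, \ldots$ be i.i.d. with distribution
$$P\{\eta \in B\} = e^{-\Lambda(u)} E[e^{u\xi};\ \xi \in B], \quad B \in \mathcal{B}. \tag{5}$$
Then $\Lambda_\eta(r) = \Lambda(r + u) - \Lambda(u)$, and therefore $E\eta = \Lambda_\eta'(0) = \Lambda'(u) = a$.
For any $\varepsilon > 0$, we get by (5)
$$P\{|\bar{\xi}_n - a| < \varepsilon\} = e^{n\Lambda(u)} E\bigl[e^{-nu\bar{\eta}_n};\ |\bar{\eta}_n - a| < \varepsilon\bigr] \ge e^{n\Lambda(u) - nu(a + \varepsilon)}\, P\{|\bar{\eta}_n - a| < \varepsilon\}. \tag{6}$$
Here the last probability tends to 1 by the law of large numbers, and so by
(2)
$$\liminf_{n \to \infty} n^{-1} \log P\{|\bar{\xi}_n - a| < \varepsilon\} \ge \Lambda(u) - u(a + \varepsilon) \ge -\Lambda^*(a + \varepsilon).$$
Fixing any $x \in (m, b)$ and putting $a = x + \varepsilon$, we get for small enough $\varepsilon > 0$
$$\liminf_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \ge x\} \ge -\Lambda^*(x + 2\varepsilon).$$
Since $\Lambda^*$ is continuous on $(m, b)$ by convexity, we may let $\varepsilon \to 0$ and
combine with (4) to obtain (3).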
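The change of measure in (5) can be made concrete in a simple case. The following sketch assumes Bernoulli variables with $P\{\xi = 1\} = p$, for which the equation $\Lambda'(u) = a$ has an explicit solution; the values of `p` and `a` and all variable names are illustrative assumptions.

```python
from math import exp, log

p, a = 0.3, 0.6  # illustrative: original mean p, target mean a in (p, 1)

def cgf(u):
    """Lambda(u) = log E e^{u xi} for xi ~ Bernoulli(p)."""
    return log(1 - p + p * exp(u))

# Explicit solution of Lambda'(u) = a in the Bernoulli case.
u = log(a * (1 - p) / ((1 - a) * p))

# Tilted law from (5): P{eta = 1} = e^{-Lambda(u)} * E[e^{u xi}; xi = 1].
tilted_p1 = exp(-cgf(u)) * p * exp(u)
tilted_mean = tilted_p1  # eta takes values in {0, 1}

# At this u the supremum in (2) is attained: Lambda*(a) = u*a - Lambda(u).
rate_from_tilt = u * a - cgf(u)
rate_closed_form = a * log(a / p) + (1 - a) * log((1 - a) / (1 - p))
print(tilted_mean, rate_from_tilt, rate_closed_form)
```

Under the tilted law the value $a$ becomes the mean, so the event $\{|\bar{\eta}_n - a| < \varepsilon\}$ is typical, and the exponential cost of the tilt matches the Bernoulli rate from Lemma 27.2 (ii).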
The result for $x > b$ is trivial, since in that case both sides of (3) equal
$-\infty$. If instead $x = b < \infty$, then both sides equal $\log P\{\xi = b\}$, the left
side by a simple computation and the right side by an elementary estimate.
Finally, assume that $x = m > -\infty$. Since the statement is trivial when
$\xi = m$ a.s., we may assume that $b > m$. For any $y \in (m, b)$, we have
$$0 \ge n^{-1} \log P\{\bar{\xi}_n \ge m\} \ge n^{-1} \log P\{\bar{\xi}_n \ge y\} \to -\Lambda^*(y) > -\infty.$$
Here $\Lambda^*(y) \to \Lambda^*(m) = 0$ by continuity, and (3) follows for $x = m$. This
completes the proof when $\Lambda < \infty$ on $\mathbb{R}_+$.
The case when $\Lambda(u) = \infty$ for some $u > 0$ may be handled by truncation.
Thus, for any $r > m$ we consider the random variables $\xi_k^r = \xi_k \wedge r$. Writing
$\Lambda_r$ and $\Lambda_r^*$ for the associated functions $\Lambda$ and $\Lambda^*$, we get for $x \ge m \ge E\xi^r$
$$n^{-1} \log P\{\bar{\xi}_n \ge x\} \ge n^{-1} \log P\{\bar{\xi}_n^r \ge x\} \to -\Lambda_r^*(x). \tag{7}$$
Now $\Lambda_r(u) \uparrow \Lambda(u)$ by monotone convergence as $r \to \infty$, and by Dini's
theorem the convergence is uniform on every compact interval where $\Lambda < \infty$.
Since also $\Lambda'$ is unbounded on the set where $\Lambda < \infty$, it follows easily that
$\Lambda_r^*(x) \to \Lambda^*(x)$ for all $x \ge m$. The required lower bound is now immediate
from (7). □
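The convergence (3) and the nonasymptotic bound (4) can be observed numerically. The sketch below assumes Bernoulli summands, for which the tail of $S_n$ is an exact binomial sum; the terms are evaluated in logarithmic form to avoid underflow. The parameters and the helper names `log_tail` and `bernoulli_rate` are illustrative assumptions.

```python
from math import ceil, exp, lgamma, log

def log_tail(n, x, p):
    """log P{S_n >= n*x} for S_n ~ Binomial(n, p), computed by log-sum-exp."""
    k_min = max(0, ceil(n * x - 1e-9))
    logs = [
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + k * log(p) + (n - k) * log(1 - p)
        for k in range(k_min, n + 1)
    ]
    top = max(logs)
    return top + log(sum(exp(v - top) for v in logs))

def bernoulli_rate(x, p):
    """Lambda*(x) from Lemma 27.2 (ii), for x strictly inside (0, 1)."""
    return x * log(x / p) + (1 - x) * log((1 - x) / (1 - p))

p, x = 0.3, 0.5  # illustrative parameters with x > m = p
limit = -bernoulli_rate(x, p)
rates = {n: log_tail(n, x, p) / n for n in (100, 1000, 4000)}
for n, r in rates.items():
    print(n, r, limit)
```

Every computed value lies below the limit $-\Lambda^*(x)$, as (4) requires, and the gap shrinks as $n$ grows.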
We may now supplement Lemma 27.1 with a criterion for exponential
decline of the tail probabilities $P\{\bar{\xi}_n \ge x\}$.
Corollary 27.4 (exponential rate) Let $\xi, \xi_1, \xi_2, \ldots$ be i.i.d. with $m = E\xi < \infty$
and $b = \operatorname{ess\,sup} \xi$. Then for any $x \in (m, b)$, the probabilities
$P\{\bar{\xi}_n \ge x\}$ decrease exponentially iff $\Lambda(\varepsilon) < \infty$ for some $\varepsilon > 0$. The
exponential decline extends to $x = b$ iff $0 < P\{\xi = b\} < 1$.
Proof: If $\Lambda(\varepsilon) < \infty$ for some $\varepsilon > 0$, then $\Lambda'(0+) = m$ by dominated
convergence, and so $\Lambda^*(x) > 0$ for all $x > m$. If instead $\Lambda = \infty$ on $(0, \infty)$,
then $\Lambda^*(x) = 0$ for all $x \ge m$. The statement for $x = b$ is trivial. □
The large deviation estimates in Theorem 27.3 are easily extended
from intervals $[x, \infty)$ to arbitrary open or closed sets, which leads to the
large-deviation principle for i.i.d. sequences in $\mathbb{R}$. To fulfill the needs of
subsequent applications and extensions, we shall derive a version of the same
result in $\mathbb{R}^d$. Motivated by the last result, and also to avoid some technical
complications, we assume that $\Lambda(u) < \infty$ for all $u$. Write $B^\circ$ and $B^-$ for
the interior and closure of a set $B$.
Theorem 27.5 (large deviations in $\mathbb{R}^d$, Varadhan) Let $\xi, \xi_1, \xi_2, \ldots$ be
i.i.d. random vectors in $\mathbb{R}^d$ with $\Lambda = \Lambda_\xi < \infty$. Then for any $B \in \mathcal{B}^d$,
we have
$$-\inf_{x \in B^\circ} \Lambda^*(x) \le \liminf_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \in B\} \le \limsup_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \in B\} \le -\inf_{x \in B^-} \Lambda^*(x).$$
Proof: To derive the upper bound, we fix any $\varepsilon > 0$. By (2) there exists
for every $x \in \mathbb{R}^d$ some $u_x \in \mathbb{R}^d$ such that
$$u_x x - \Lambda(u_x) > (\Lambda^*(x) - \varepsilon) \wedge \varepsilon^{-1},$$
and by continuity we may choose an open ball $B_x$ around $x$ such that
$$u_x y > \Lambda(u_x) + (\Lambda^*(x) - \varepsilon) \wedge \varepsilon^{-1}, \quad y \in B_x.$$
By Chebyshev's inequality and (1) we get for any $n \in \mathbb{N}$
$$P\{\bar{\xi}_n \in B_x\} \le E\exp\bigl(u_x S_n - n \inf\{u_x y;\ y \in B_x\}\bigr) \le \exp\bigl(-n\bigl((\Lambda^*(x) - \varepsilon) \wedge \varepsilon^{-1}\bigr)\bigr). \tag{8}$$
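Also note that $\Lambda < \infty$ implies $\Lambda^*(x) \to \infty$ as $|x| \to \infty$, at least when $d = 1$.
By Lemma 27.1 and Theorem 27.3 we may then choose $r > 0$ so large that
$$n^{-1} \log P\{|\bar{\xi}_n| > r\} \le -1/\varepsilon, \quad n \in \mathbb{N}. \tag{9}$$
Now let $B \subset \mathbb{R}^d$ be closed. Then the set $\{x \in B;\ |x| \le r\}$ is compact and
may be covered by finitely many balls $B_{x_1}, \ldots, B_{x_m}$ with centers $x_i \in B$.
By (8) and (9) we get for any $n \in \mathbb{N}$
$$P\{\bar{\xi}_n \in B\} \le \sum_{i \le m} P\{\bar{\xi}_n \in B_{x_i}\} + P\{|\bar{\xi}_n| > r\} \le \sum_{i \le m} \exp\bigl(-n\bigl((\Lambda^*(x_i) - \varepsilon) \wedge \varepsilon^{-1}\bigr)\bigr) + e^{-n/\varepsilon} \le (m + 1) \exp\bigl(-n\bigl((\Lambda^*(B) - \varepsilon) \wedge \varepsilon^{-1}\bigr)\bigr),$$
where $\Lambda^*(B) = \inf_{x \in B} \Lambda^*(x)$. Hence,
$$\limsup_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \in B\} \le -\bigl((\Lambda^*(B) - \varepsilon) \wedge \varepsilon^{-1}\bigr),$$
and the upper bound follows since $\varepsilon$ was arbitrary.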
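Turning to the lower bound, we first assume that $\Lambda(u)/|u| \to \infty$ as
$|u| \to \infty$. Fix any open set $B \subset \mathbb{R}^d$ and a point $x \in B$. By compactness
and the smoothness of $\Lambda$, there exists a $u \in \mathbb{R}^d$ such that $\nabla\Lambda(u) = x$.
Let $\eta, \eta_1, \eta_2, \ldots$ be i.i.d. random vectors with distribution (5), and note as
before that $E\eta = x$. For $\varepsilon > 0$ small enough, we get as in (6)
$$P\{\bar{\xi}_n \in B\} \ge P\{|\bar{\xi}_n - x| < \varepsilon\} \ge \exp\bigl(n\Lambda(u) - nux - n\varepsilon|u|\bigr)\, P\{|\bar{\eta}_n - x| < \varepsilon\}.$$
Hence, by the law of large numbers and (2),
$$\liminf_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \in B\} \ge \Lambda(u) - ux - \varepsilon|u| \ge -\Lambda^*(x) - \varepsilon|u|.$$
It remains to let $\varepsilon \to 0$ and take the supremum over $x \in B$.
To eliminate the growth condition on $\Lambda$, let $\zeta, \zeta_1, \zeta_2, \ldots$ be i.i.d. standard
Gaussian random vectors independent of $\xi$ and the $\xi_n$. Then for any $\sigma > 0$
and $u \in \mathbb{R}^d$, we have by Lemma 27.2 (i)
$$\Lambda_{\xi + \sigma\zeta}(u) = \Lambda_\xi(u) + \Lambda_\zeta(\sigma u) = \Lambda_\xi(u) + \tfrac{1}{2}\sigma^2 |u|^2 \ge \Lambda_\xi(u),$$
and in particular $\Lambda_{\xi + \sigma\zeta}^* \le \Lambda_\xi^*$. Since also $\Lambda_{\xi + \sigma\zeta}(u)/|u| \ge \sigma^2 |u|/2 \to \infty$, we
note that the previous bound applies to $\bar{\xi}_n + \sigma\bar{\zeta}_n$.
Now fix any $x \in B$ as before, and choose $\varepsilon > 0$ small enough that $B$
contains a $2\varepsilon$-ball around $x$. Then
$$P\{|\bar{\xi}_n + \sigma\bar{\zeta}_n - x| < \varepsilon\} \le P\{\bar{\xi}_n \in B\} + P\{\sigma|\bar{\zeta}_n| \ge \varepsilon\} \le 2\bigl(P\{\bar{\xi}_n \in B\} \vee P\{\sigma|\bar{\zeta}_n| \ge \varepsilon\}\bigr).$$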
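Applying the lower bound to the variables $\bar{\xi}_n + \sigma\bar{\zeta}_n$ and the upper bound
to $\bar{\zeta}_n$, we get by Lemma 27.2 (i)
$$-\Lambda_\xi^*(x) \le -\Lambda_{\xi + \sigma\zeta}^*(x) \le \liminf_{n \to \infty} n^{-1} \log P\{|\bar{\xi}_n + \sigma\bar{\zeta}_n - x| < \varepsilon\} \le \liminf_{n \to \infty} n^{-1} \log\bigl(P\{\bar{\xi}_n \in B\} \vee P\{\sigma|\bar{\zeta}_n| \ge \varepsilon\}\bigr) \le \liminf_{n \to \infty} n^{-1} \log P\{\bar{\xi}_n \in B\} \vee \bigl(-\varepsilon^2/2\sigma^2\bigr).$$
The desired lower bound now follows, as we let $\sigma \to 0$ and then take the
supremum over all $x \in B$. □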
We can also derive large-deviation results in function spaces. Here the
following theorem is basic and sets the pattern for more complex results. For
convenience, we may write $C = C([0, 1], \mathbb{R}^d)$ and $C_0^k = \{x \in C^k;\ x_0 = 0\}$.
We also introduce the Cameron-Martin space $H_1$, consisting of all absolutely
continuous functions $x \in C_0$ admitting a Radon-Nikodym derivative
$\dot{x} \in L^2$, so that $\|\dot{x}\|_2^2 = \int_0^1 |\dot{x}_t|^2\,dt < \infty$.
Theorem 27.6 (large deviations of Brownian motion, Schilder) Let $X$ be
a $d$-dimensional Brownian motion on $[0, 1]$. Then for any Borel set
$B \subset C([0, 1], \mathbb{R}^d)$, we have
$$-\inf_{x \in B^\circ} I(x) \le \liminf_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon X \in B\} \le \limsup_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon X \in B\} \le -\inf_{x \in B^-} I(x),$$
where $I(x) = \tfrac{1}{2}\|\dot{x}\|_2^2$ for $x \in H_1$ and $I(x) = \infty$ otherwise.
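For a polygonal path the rate function $I(x) = \tfrac{1}{2}\|\dot{x}\|_2^2$ reduces to a finite sum of squared increments. The following sketch assumes a path sampled at the points $k/n$ and interpolated linearly in between; the function name `schilder_rate` and the two test paths are illustrative assumptions.

```python
def schilder_rate(path):
    """I(x) = (1/2) * integral of |x'(t)|^2 dt for the polygonal path on [0, 1]
    through path[k] = x(k/n), k = 0..n, where each entry is a d-tuple."""
    n = len(path) - 1
    total = 0.0
    for prev, cur in zip(path, path[1:]):
        # On each segment the derivative is n * (increment), over a time span 1/n,
        # so the segment contributes n * |increment|^2 to the integral.
        total += sum((c - q) ** 2 for q, c in zip(prev, cur))
    return 0.5 * n * total

n = 1000
line = [(2.0 * k / n,) for k in range(n + 1)]       # x_t = 2t, exact rate 2
parabola = [((k / n) ** 2,) for k in range(n + 1)]  # x_t = t^2, exact rate 2/3
print(schilder_rate(line), schilder_rate(parabola))
```

For the straight line the polygonal value is exact, while for the parabola it approaches $\tfrac{1}{2}\int_0^1 (2t)^2\,dt = 2/3$ from below as $n$ grows.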
The proof requires a simple topological fact.
Lemma 27.7 (level sets) For any $r > 0$, the level set $L_r = I^{-1}[0, r] =
\{x \in H_1;\ \|\dot{x}\|_2^2 \le 2r\}$ is compact in $C([0, 1], \mathbb{R}^d)$.
Proof: The Cauchy-Buniakovsky inequality yields
$$|x_t - x_s| \le \int_s^t |\dot{x}_u|\,du \le (t - s)^{1/2} \|\dot{x}\|_2, \quad 0 \le s < t \le 1,\ x \in H_1.$$
By the Arzelà-Ascoli Theorem A2.1 it follows that $L_r$ is relatively compact
in $C$. It is also weakly compact in the Hilbert space $H_1$ with norm
$\|x\| = \|\dot{x}\|_2$. Thus, every sequence $x_1, x_2, \ldots \in L_r$ has a subsequence that
converges in both $C$ and $H_1$, say with limits $x \in C$ and $y \in L_r$, respectively.
For every $t \in [0, 1]$, the sequence $x_n(t)$ then converges in $\mathbb{R}^d$ to both
$x(t)$ and $y(t)$, and we get $x = y \in L_r$. □
Proof of Theorem 27.6: To establish the lower bound, we fix any open
set $B \subset C$. Since $I = \infty$ outside $H_1$, it suffices to prove that
$$-I(x) \le \liminf_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon X \in B\}, \quad x \in B \cap H_1. \tag{10}$$
Now we note as in Lemma 1.35 that $C_0^2$ is dense in $H_1$, and also that
$\|x\|_\infty \le \|\dot{x}\|_1 \le \|\dot{x}\|_2$ for any $x \in H_1$. Hence, for every $x \in B \cap H_1$ there
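exist some functions $x_n \in B \cap C_0^2$ with $I(x_n) \to I(x)$, and it suffices to
prove (10) for $x \in B \cap C_0^2$.
Now for small enough $h > 0$, Theorem 18.22 yields
$$P\{\varepsilon X \in B\} \ge P\{\|\varepsilon X - x\|_\infty < h\} = E\bigl[\mathcal{E}\bigl(-(\dot{x}/\varepsilon) \cdot X\bigr)_1;\ \|\varepsilon X\|_\infty < h\bigr]. \tag{11}$$
Integrating by parts gives
$$\log \mathcal{E}\bigl(-(\dot{x}/\varepsilon) \cdot X\bigr)_1 = -\varepsilon^{-1} \int_0^1 \dot{x}_t\,dX_t - \varepsilon^{-2} I(x) = -\varepsilon^{-1} \dot{x}_1 X_1 + \varepsilon^{-1} \int_0^1 \ddot{x}_t X_t\,dt - \varepsilon^{-2} I(x),$$
and so by (11)
$$\varepsilon^2 \log P\{\varepsilon X \in B\} \ge -I(x) - h|\dot{x}_1| - h\|\ddot{x}\|_1 + \varepsilon^2 \log P\{\|\varepsilon X\|_\infty < h\}.$$
Relation (10) now follows as we let $\varepsilon \to 0$ and then $h \to 0$.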
Turning to the upper bound, we fix any closed set $B \subset C$ and let $B^h$
denote the closed $h$-neighborhood of $B$. Letting $X_n$ be the $n$-segment,
polygonal approximation of $X$ with $X_n(k/n) = X(k/n)$ for $k \le n$, we
note that
$$P\{\varepsilon X \in B\} \le P\{\varepsilon X_n \in B^h\} + P\{\varepsilon\|X - X_n\| > h\}. \tag{12}$$
Writing $I(B^h) = \inf\{I(x);\ x \in B^h\}$, we obtain
$$P\{\varepsilon X_n \in B^h\} \le P\{I(\varepsilon X_n) \ge I(B^h)\}.$$
Here $2I(X_n)$ is a sum of $nd$ variables $\xi_{ik}^2$, where the $\xi_{ik}$ are i.i.d. $N(0, 1)$,
and so by Lemma 27.2 (i) and an interpolated version of Theorem 27.5,
$$\limsup_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon X_n \in B^h\} \le -I(B^h). \tag{13}$$
Next we get by Proposition 13.13 and some elementary estimates
$$P\{\varepsilon\|X - X_n\| > h\} \le n\,P\{\varepsilon\|X\| > h\sqrt{n}/2\} \le 2nd\,P\{\varepsilon^2 \xi^2 > h^2 n/4d\},$$
where $\xi$ is $N(0, 1)$. Applying Theorem 27.5 and Lemma 27.2 (i) again, we
obtain
$$\limsup_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon\|X - X_n\| > h\} \le -h^2 n/8d. \tag{14}$$
Combining (12), (13), and (14) gives
$$\limsup_{\varepsilon \to 0} \varepsilon^2 \log P\{\varepsilon X \in B\} \le -\bigl(I(B^h) \wedge (h^2 n/8d)\bigr),$$
and as $n \to \infty$ we obtain the upper bound $-I(B^h)$.
It remains to show that $I(B^h) \uparrow I(B)$ as $h \to 0$. Then fix any
$r > \sup_h I(B^h)$. For every $h > 0$ we may choose some $x_h \in B^h$ such that
$I(x_h) \le r$, and by Lemma 27.7 we may extract a convergent sequence
$x_{h_n} \to x$ with $h_n \to 0$ such that even $I(x) \le r$. Since also $x \in \bigcap_h B^h = B$,
we obtain $I(B) \le r$, as required. □
The last two theorems suggest the following abstraction. Letting $\xi_\varepsilon$, $\varepsilon > 0$,
be random elements in some metric space $S$ with Borel $\sigma$-field $\mathcal{S}$, we say
that the family $(\xi_\varepsilon)$ satisfies the large-deviation principle (LDP) with rate
function $I\colon S \to [0, \infty]$, if for any $B \in \mathcal{S}$ we have
$$-\inf_{x \in B^\circ} I(x) \le \liminf_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\} \le \limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\} \le -\inf_{x \in B^-} I(x). \tag{15}$$
For sequences $\xi_1, \xi_2, \ldots$ we require the same condition with the normalizing
factor $\varepsilon$ replaced by $n^{-1}$. It is often convenient to write $I(B) = \inf_{x \in B} I(x)$.
Letting $\mathcal{S}_I$ denote the class $\{B \in \mathcal{S};\ I(B^\circ) = I(B^-)\}$ of all $I$-continuity
sets, we note that (15) implies the convergence
$$\lim_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\} = -I(B), \quad B \in \mathcal{S}_I. \tag{16}$$
If $\xi, \xi_1, \xi_2, \ldots$ are i.i.d. random vectors in $\mathbb{R}^d$ with $\Lambda(u) = \log Ee^{u\xi} < \infty$
for all $u$, then by Theorem 27.5 the averages $\bar{\xi}_n$ satisfy the LDP in $\mathbb{R}^d$
with rate function $\Lambda^*$. If instead $X$ is a $d$-dimensional Brownian motion on
$[0, 1]$, then Theorem 27.6 shows that the processes $\varepsilon^{1/2} X$ satisfy the LDP
in $C([0, 1], \mathbb{R}^d)$ with rate function $I(x) = \tfrac{1}{2}\|\dot{x}\|_2^2$ for $x \in H_1$ and $I(x) = \infty$
otherwise.
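The convergence (16) can be illustrated in the simplest Gaussian case. The sketch below assumes a scalar family $\xi_\varepsilon = \varepsilon^{1/2} Z$ with $Z$ standard normal, which satisfies the LDP with rate function $I(x) = x^2/2$, and takes the $I$-continuity set $B = [a, \infty)$ with $a > 0$, so that $I(B) = a^2/2$. The function name and the chosen values of $\varepsilon$ are illustrative assumptions.

```python
from math import erfc, log, sqrt

def scaled_log_tail(eps, a):
    """eps * log P{sqrt(eps) * Z >= a} for a standard normal Z."""
    tail = 0.5 * erfc(a / sqrt(2.0 * eps))  # exact Gaussian tail probability
    return eps * log(tail)

a = 1.0  # illustrative threshold; the predicted limit is -a**2 / 2
values = {eps: scaled_log_tail(eps, a) for eps in (0.1, 0.01, 0.002)}
for eps, v in values.items():
    print(eps, v, -a * a / 2)
```

The computed values approach $-a^2/2$ as $\varepsilon$ decreases. Much smaller values of $\varepsilon$ would make the tail probability underflow in floating point, so the range is kept moderate.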
We show that the rate function I is essentially unique.
Lemma 27.8 (regularization and uniqueness) If $(\xi_\varepsilon)$ satisfies the LDP in
a metric space $S$, then the associated rate function $I$ can be chosen to be
lower semicontinuous, in which case it is unique.
Proof: Assume that (15) holds for some $I$. Then the function
$$J(x) = \liminf_{y \to x} I(y), \quad x \in S,$$
is clearly lower semicontinuous with $J \le I$. It is also easy to verify that
$J(G) = I(G)$ for all open sets $G \subset S$. Thus, (15) remains true with $I$
replaced by $J$.
To prove the uniqueness, assume that (15) holds for two lower semicontinuous
functions $I$ and $J$, and let $I(x) < J(x)$ for some $x \in S$. By
the semicontinuity of $J$, we may choose a neighborhood $G$ of $x$ such that
$J(G^-) > I(x)$. Applying (15) to both $I$ and $J$ yields the contradiction
$$-I(x) \le -I(G) \le \liminf_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in G\} \le -J(G^-) < -I(x). \qquad \Box$$
Justified by the last result, we may henceforth take the lower semiconti-
nuity to be part of our definition of a rate function. (An arbitrary function
I satisfying (15) will then be called a raw rate function.) No regularization
is needed in Theorems 27.5 and 27.6, since the associated rate functions $\Lambda^*$
and I are already lower semicontinuous, the former as the supremum of a
family of continuous functions and the latter by Lemma 27.7.
It is sometimes useful to impose a slightly stronger regularity condition
on the function $I$. Thus, we say that $I$ is good if the level sets $I^{-1}[0, r] =
\{x \in S;\ I(x) \le r\}$ are compact (rather than just closed). Note that the
infimum $I(B) = \inf_{x \in B} I(x)$ is then attained for every closed set $B \ne \emptyset$.
The rate functions in Theorems 27.5 and 27.6 are clearly both good.
A related condition on the family $(\xi_\varepsilon)$ is the exponential tightness
$$\inf_K \limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \notin K\} = -\infty, \tag{17}$$
where the infimum extends over all compact sets $K \subset S$. We actually need
only the slightly weaker condition of sequential exponential tightness, where
(17) is only required along sequences $\varepsilon_n \to 0$. To simplify our exposition,
we often omit the sequential qualification from our statements and carry
out the proofs under the stronger nonsequential hypothesis.
We finally say that $(\xi_\varepsilon)$ satisfies the weak LDP with rate function $I$ if
the lower bound in (15) holds as stated while the upper bound is only
required for compact sets $B$. We list some relations between the mentioned
properties.
Lemma 27.9 (goodness, exponential tightness, and the weak LDP) Let
$\xi_\varepsilon$, $\varepsilon > 0$, be random elements in a metric space $S$.
(i) The LDP for $(\xi_\varepsilon)$ with rate function $I$ implies (16), and the two
conditions are equivalent when $I$ is good.
(ii) If the $\xi_\varepsilon$ are exponentially tight and satisfy the weak LDP with rate
function $I$, then $I$ is good and $(\xi_\varepsilon)$ satisfies the full LDP.
(iii) (Pukhalsky) If $S$ is Polish and $(\xi_\varepsilon)$ satisfies the LDP with rate
function $I$, then $I$ is good iff $(\xi_\varepsilon)$ is sequentially exponentially tight.
Proof: (i) Let $I$ be good and satisfy (16). Write $B^h$ for the closed
$h$-neighborhood of $B \in \mathcal{S}$. Since $I$ is nonincreasing on $\mathcal{S}$, we have $B^h \notin \mathcal{S}_I$
for at most countably many $h > 0$. Hence, (16) yields for almost every
$h > 0$
$$\limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\} \le \lim_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B^h\} = -I(B^h).$$
To see that $I(B^h) \uparrow I(B^-)$ as $h \to 0$, assume instead that $\sup_h I(B^h) <
I(B^-)$. Since $I$ is good, we may choose for every $h > 0$ some $x_h \in B^h$ with
$I(x_h) = I(B^h)$, and then extract a convergent sequence $x_{h_n} \to x \in B^-$
with $h_n \to 0$. By the lower semicontinuity of $I$ we get the contradiction
$$I(B^-) \le I(x) \le \liminf_{n \to \infty} I(x_{h_n}) \le \sup_{h > 0} I(B^h) < I(B^-),$$
which proves the upper bound. Next let $x \in B^\circ$ be arbitrary, and conclude
from (16) that, for almost all sufficiently small $h > 0$,
$$-I(x) \le -I(\{x\}^h) = \lim_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in \{x\}^h\} \le \liminf_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\}.$$
The lower bound now follows as we take the supremum over $x \in B^\circ$.
(ii) By (17) we may choose some compact sets $K_r$ satisfying
$$\limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \notin K_r\} < -r, \quad r > 0. \tag{18}$$
For any closed set $B \subset S$, we have
$$P\{\xi_\varepsilon \in B\} \le 2\bigl(P\{\xi_\varepsilon \in B \cap K_r\} \vee P\{\xi_\varepsilon \notin K_r\}\bigr), \quad r > 0,$$
and so, by the weak LDP and (18),
$$\limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B\} \le -\bigl(I(B \cap K_r) \wedge r\bigr) \le -\bigl(I(B) \wedge r\bigr).$$
The upper bound now follows as we let $r \to \infty$. Applying the lower bound
and (18) to the sets $K_r^c$ gives
$$-I(K_r^c) \le \limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \notin K_r\} < -r, \quad r > 0,$$
and so $I^{-1}[0, r] \subset K_r$ for all $r > 0$, which shows that $I$ is good.
(iii) The sufficiency follows from (ii), applied to an arbitrary sequence
$\varepsilon_n \to 0$. Now let $S$ be separable and complete, and assume that the
rate function $I$ is good. For any $k \in \mathbb{N}$ we may cover $S$ by some open
balls $B_{k1}, B_{k2}, \ldots$ of radius $1/k$. Putting $U_{km} = \bigcup_{j \le m} B_{kj}$, we have
$\sup_m I(U_{km}^c) = \infty$ since any level set $I^{-1}[0, r]$ is covered by finitely many
sets $B_{kj}$. Now fix any sequence $\varepsilon_n \to 0$ and constant $r > 0$. By the LDP
upper bound and the fact that $P\{\xi_{\varepsilon_n} \in U_{km}^c\} \to 0$ as $m \to \infty$ for fixed $n$
and $k$, we may choose $m_k \in \mathbb{N}$ so large that
$$P\{\xi_{\varepsilon_n} \in U_{k, m_k}^c\} \le \exp(-rk/\varepsilon_n), \quad n, k \in \mathbb{N}.$$
Summing a geometric series, we obtain
$$\limsup_{n \to \infty} \varepsilon_n \log P\bigl\{\xi_{\varepsilon_n} \in \textstyle\bigcup_k U_{k, m_k}^c\bigr\} \le -r.$$
The asserted exponential tightness now follows, since the set $\bigcap_k U_{k, m_k}$ is
totally bounded and hence relatively compact. □
The analogy with weak convergence theory suggests that we look for a
version of (16) for continuous functions.
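Theorem 27.10 (functional LDP, Varadhan, Bryc) Let $\xi_\varepsilon$, $\varepsilon > 0$, be
random elements in a metric space $S$.
(i) If $(\xi_\varepsilon)$ satisfies the LDP with a rate function $I$ and if $f\colon S \to \mathbb{R}$ is
continuous and bounded above, then
$$\Lambda_f \equiv \lim_{\varepsilon \to 0} \varepsilon \log E\exp\bigl(f(\xi_\varepsilon)/\varepsilon\bigr) = \sup_{x \in S} \bigl(f(x) - I(x)\bigr).$$
(ii) If the $\xi_\varepsilon$ are exponentially tight and the limit $\Lambda_f$ in (i) exists for every
$f \in C_b$, then $(\xi_\varepsilon)$ satisfies the LDP with the good rate function
$$I(x) = \sup_{f \in C_b} \bigl(f(x) - \Lambda_f\bigr), \quad x \in S.$$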
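Part (i) can be tested numerically in the sequence setting, where $\varepsilon$ is replaced by $n^{-1}$. The sketch below assumes averages of i.i.d. Bernoulli variables, which satisfy the LDP with the rate function $\Lambda^*$ of Lemma 27.2 (ii), and an arbitrarily chosen continuous function $f$ bounded above; the expectation is an exact binomial sum evaluated in logarithmic form. All names and parameter values are illustrative assumptions.

```python
from math import exp, lgamma, log

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(anything) = 0."""
    return 0.0 if x == 0 else x * log(y)

def bernoulli_rate(x, p):
    """Lambda*(x) on [0, 1] for Bernoulli(p), including the endpoints."""
    return xlogy(x, x / p) + xlogy(1 - x, (1 - x) / (1 - p))

def f(x):
    """An illustrative continuous function that is bounded above."""
    return 1.0 - 4.0 * (x - 0.8) ** 2

def scaled_log_mgf(n, p):
    """n^{-1} log E exp(n * f(S_n / n)) for S_n ~ Binomial(n, p)."""
    logs = [
        lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
        + xlogy(k, p) + xlogy(n - k, 1 - p) + n * f(k / n)
        for k in range(n + 1)
    ]
    top = max(logs)
    return (top + log(sum(exp(v - top) for v in logs))) / n

p = 0.3
grid = [k / 10_000 for k in range(10_001)]
varadhan_limit = max(f(x) - bernoulli_rate(x, p) for x in grid)
approx = {n: scaled_log_mgf(n, p) for n in (50, 500, 2000)}
print(approx, varadhan_limit)
```

The supremum is taken over a grid in $[0, 1]$ only, since $\Lambda^* = \infty$ outside that interval; the computed values approach the grid supremum of $f - \Lambda^*$ as $n$ increases.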
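Proof: (i) For every $n \in \mathbb{N}$ we can choose finitely many closed sets $B_1, \ldots,
B_m \subset S$ such that $f \le -n$ on $\bigcap_j B_j^c$ and the oscillation of $f$ on each $B_j$ is
at most $n^{-1}$. Then
$$\limsup_{\varepsilon \to 0} \varepsilon \log Ee^{f(\xi_\varepsilon)/\varepsilon} \le \max_{j \le m} \limsup_{\varepsilon \to 0} \varepsilon \log E\bigl[e^{f(\xi_\varepsilon)/\varepsilon};\ \xi_\varepsilon \in B_j\bigr] \vee (-n)$$
$$\le \max_{j \le m} \Bigl(\sup_{x \in B_j} f(x) - \inf_{x \in B_j} I(x)\Bigr) \vee (-n)$$
$$\le \max_{j \le m} \sup_{x \in B_j} \bigl(f(x) - I(x) + n^{-1}\bigr) \vee (-n)$$
$$\le \sup_{x \in S} \bigl(f(x) - I(x) + n^{-1}\bigr) \vee (-n).$$
The upper bound now follows as we let $n \to \infty$. Next we fix any $x \in S$
with a neighborhood $G$ and write
$$\liminf_{\varepsilon \to 0} \varepsilon \log Ee^{f(\xi_\varepsilon)/\varepsilon} \ge \liminf_{\varepsilon \to 0} \varepsilon \log E\bigl[e^{f(\xi_\varepsilon)/\varepsilon};\ \xi_\varepsilon \in G\bigr] \ge \inf_{y \in G} f(y) - \inf_{y \in G} I(y) \ge \inf_{y \in G} f(y) - I(x).$$
Here the lower bound follows as we let $G \downarrow \{x\}$ and then take the supremum
over $x \in S$.
(ii) First we note that $I$ is lower semicontinuous, as the supremum over
a family of continuous functions. Since $\Lambda_f = 0$ for $f = 0$, it is also clear
that $I \ge 0$. By Lemma 27.9 (ii) it remains to show that $(\xi_\varepsilon)$ satisfies the
weak LDP with rate function $I$. Then fix any $\delta > 0$. For every $x \in S$, we
may choose a function $f_x \in C_b$ satisfying
$$f_x(x) - \Lambda_{f_x} > (I(x) - \delta) \wedge \delta^{-1},$$
and by continuity there exists a neighborhood $B_x$ of $x$ such that
$$f_x(y) > \Lambda_{f_x} + (I(x) - \delta) \wedge \delta^{-1}, \quad y \in B_x.$$
By Chebyshev's inequality we get for any $\varepsilon > 0$
$$P\{\xi_\varepsilon \in B_x\} \le E\exp\bigl(\varepsilon^{-1}(f_x(\xi_\varepsilon) - \inf\{f_x(y);\ y \in B_x\})\bigr) \le E\exp\bigl(\varepsilon^{-1}(f_x(\xi_\varepsilon) - \Lambda_{f_x} - (I(x) - \delta) \wedge \delta^{-1})\bigr),$$
and so by the definition of $\Lambda_{f_x}$,
$$\limsup_{\varepsilon \to 0} \varepsilon \log P\{\xi_\varepsilon \in B_x\} \le \lim_{\varepsilon \to 0} \varepsilon \log E\exp\bigl(f_x(\xi_\varepsilon)/\varepsilon\bigr) - \Lambda_{f_x} - (I(x) - \delta) \wedge \delta^{-1} = -\bigl((I(x) - \delta) \wedge \delta^{-1}\bigr).$$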
N ow fix any compact set K c S, and choose Xl,' . . , X rn E K such that
K c Ui BXt' Then
limsup clogP{c E K} <
E-+O
<
rp.ax limsup clogP{€€ E Bx}
::;m E-+O
- min (I(Xi) - 8) !\ b- 1
::;m
-(I(K) - 8) 1\ 8- 1 .
<
The upper bound now follows as we let 8 ---t o.
Next consider any open set G and element x E G. For any n E N we
may choose a continuous function in: S -+ [-n,O] such that !n(x) == 0 and
In == -n on CC. Then
-I (x) /fbb (Af - f(x)) < Afn - fn(x) = Afn
lim c log Eexp(in(€c)/c)
c-+o
< liminf ElogP{E E G} V (-n).
E-+O
The lower bound now follows as we let n ---t 00 and then take the supremum
over all x E G. 0
Next we note that the LDP is preserved by continuous mappings. The following results are often referred to as the direct and inverse contraction principles. Given any rate function $I$ on $S$ and a function $f: S \to T$, we define the image $J = I \circ f^{-1}$ on $T$ as the function
$$J(y) = I(f^{-1}\{y\}) = \inf\{I(x);\ f(x) = y\}, \quad y \in T. \tag{19}$$
Note that the corresponding set functions are related by
$$J(B) \equiv \inf_{y\in B} J(y) = \inf\{I(x);\ f(x) \in B\} = I(f^{-1}B), \quad B \subset T.$$
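As a small illustration of (19), the following Python sketch computes an image rate function on a finite toy example. The spaces, the map $f(x) = |x|$, the values $I(x) = x^2/2$, and the helper names `image_rate` and `set_rate` are all assumptions made for the illustration and do not come from the text. The sketch also checks the set-function identity $J(B) = I(f^{-1}B)$ for one particular set $B$.

```python
import math

def image_rate(I, f, targets):
    """J(y) = inf{I(x); f(x) = y} for y in targets (infimum over the empty set is +inf)."""
    J = {y: math.inf for y in targets}
    for x, value in I.items():
        y = f(x)
        if y in J:
            J[y] = min(J[y], value)
    return J

def set_rate(rate, B):
    """Set function associated with a rate function: infimum of rate over B."""
    return min((rate[z] for z in B), default=math.inf)

# Hypothetical toy data: S = {-2,...,2}, T = {0,1,2,3}, f(x) = |x|, I(x) = x^2/2.
S = range(-2, 3)
T = range(0, 4)
I = {x: x * x / 2 for x in S}
f = abs

J = image_rate(I, f, T)
B = {1, 2, 3}
preimage = [x for x in S if f(x) in B]   # the set f^{-1}B
```

Here the point $y = 3$ has an empty preimage, so $J(3) = \infty$, which matches the convention that an infimum over the empty set is infinite.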
Theorem 27.11 (continuous mapping) Consider a continuous function $f$ between two metric spaces $S$ and $T$, and let $\xi_\varepsilon$ be random elements in $S$.
(i) If $(\xi_\varepsilon)$ satisfies the LDP in $S$ with rate function $I$, then the images $f(\xi_\varepsilon)$ satisfy the LDP in $T$ with the raw rate function $J = I \circ f^{-1}$. Moreover, $J$ is a good rate function on $T$ whenever the function $I$ is good on $S$.
(ii) (Ioffe) Let $(\xi_\varepsilon)$ be exponentially tight in $S$, let $f$ be injective, and let the images $f(\xi_\varepsilon)$ satisfy the weak LDP in $T$ with rate function $J$. Then $(\xi_\varepsilon)$ satisfies the LDP in $S$ with the good rate function $I = J \circ f$.
Proof: (i) Since $f$ is continuous, we note that $f^{-1}B$ is open or closed whenever the corresponding property holds for $B$. Using the LDP for $(\xi_\varepsilon)$, we get for any $B \subset T$
$$-I(f^{-1}B^\circ) \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in f^{-1}B^\circ\} \le \limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in f^{-1}B^-\} \le -I(f^{-1}B^-),$$
which proves the LDP for $\{f(\xi_\varepsilon)\}$ with the raw rate function $J = I \circ f^{-1}$.
When $I$ is good, we claim that
$$J^{-1}[0, r] = f(I^{-1}[0, r]), \quad r \ge 0. \tag{20}$$
To see this, fix any $r \ge 0$, and let $x \in I^{-1}[0, r]$. Then
$$J \circ f(x) = I \circ f^{-1} \circ f(x) = \inf\{I(u);\ f(u) = f(x)\} \le I(x) \le r,$$
which means that $f(x) \in J^{-1}[0, r]$. Conversely, let $y \in J^{-1}[0, r]$. Since $I$ is good and $f$ is continuous, the infimum in (19) is attained at some $x \in S$, and we get $y = f(x)$ with $I(x) \le r$. Thus, $y \in f(I^{-1}[0, r])$, which completes the proof of (20). Since continuous maps preserve compactness, (20) shows that the goodness of $I$ carries over to $J$.
(ii) Here $I$ is again a rate function, since the lower semicontinuity of $J$ is preserved by composition with the continuous map $f$. By Lemma 27.9 (ii) it is then enough to show that $(\xi_\varepsilon)$ satisfies the weak LDP in $S$. To prove the upper bound, fix any compact set $K \subset S$, and note that the image set $f(K)$ is again compact since $f$ is continuous. Hence, the weak LDP for $(f(\xi_\varepsilon))$ yields
$$\limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in K\} = \limsup_{\varepsilon\to 0}\varepsilon\log P\{f(\xi_\varepsilon) \in f(K)\} \le -J(f(K)) = -I(K).$$
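Next we fix any open set $G \subset S$, and let $x \in G$ be arbitrary with $I(x) = r < \infty$. Since $(\xi_\varepsilon)$ is exponentially tight, we may choose a compact set $K \subset S$ such that
$$\limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \notin K\} < -r. \tag{21}$$
The continuous image $f(K)$ is compact in $T$, and so by (21) and the weak LDP for $\{f(\xi_\varepsilon)\}$
$$-I(K^c) = -J(f(K^c)) \le -J((f(K))^c) \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{f(\xi_\varepsilon) \notin f(K)\} \le \limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \notin K\} < -r.$$
Since $I(x) = r$, we conclude that $x \in K$.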
As a continuous bijection from the compact set $K$ onto $f(K)$, the function $f$ is in fact a homeomorphism between the two sets with their subset topologies. By Lemma 1.6 we may then choose an open set $G' \subset T$ such that $f(x) \in f(G \cap K) = G' \cap f(K)$. Noting that
$$P\{f(\xi_\varepsilon) \in G'\} \le P\{\xi_\varepsilon \in G\} + P\{\xi_\varepsilon \notin K\}$$
and using the weak LDP of $\{f(\xi_\varepsilon)\}$, we get
$$-r = -I(x) = -J(f(x)) \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{f(\xi_\varepsilon) \in G'\} \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in G\} \vee \limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \notin K\}.$$
Hence, by (21)
$$-I(x) \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in G\}, \quad x \in G,$$
and the lower bound follows as we take the supremum over all $x \in G$. $\Box$
We turn to the powerful method of projective limits. The following se-
quential version is sufficient for our needs and will enable us to extend the
LDP to a variety of infinite-dimensional contexts. Some general background
on projective limits is provided by Appendix A2.
Theorem 27.12 (random sequences, Dawson and Gärtner) For any metric spaces $S_1, S_2, \dots$, let $\xi_\varepsilon = (\xi_\varepsilon^k)$ be random elements in $S = \mathsf{X}_k S_k$, such that for every $n \in \mathbb{N}$ the vectors $(\xi_\varepsilon^1, \dots, \xi_\varepsilon^n)$ satisfy the LDP in $S^n = \mathsf{X}_{k\le n} S_k$ with a good rate function $I_n$. Then $(\xi_\varepsilon)$ satisfies the LDP in $S$ with the good rate function
$$I(x) = \sup\nolimits_n I_n(x_1, \dots, x_n), \quad x = (x_1, x_2, \dots) \in S. \tag{22}$$
Proof: For any $m \le n$ we introduce the natural projections $\pi_n: S \to S^n$ and $\pi_{mn}: S^n \to S^m$. Since the $\pi_{mn}$ are continuous and the $I_n$ are good, Theorem 27.11 shows that $I_m = I_n \circ \pi_{mn}^{-1}$ for all $m \le n$, and so $\pi_{mn}(I_n^{-1}[0, r]) \subset I_m^{-1}[0, r]$ for all $r \ge 0$ and $m \le n$. Hence for each $r \ge 0$ the level sets $I_n^{-1}[0, r]$ form a projective sequence. Since they are also compact by hypothesis, and in view of (22)
$$I^{-1}[0, r] = \bigcap\nolimits_n \pi_n^{-1} I_n^{-1}[0, r], \quad r \ge 0, \tag{23}$$
Lemma A2.9 shows that the sets $I^{-1}[0, r]$ are compact. Thus, $I$ is again a good rate function.
Now fix any closed set $A \subset S$ and put $A_n = \pi_n A$, so that $\pi_{mn} A_n = A_m$ for all $m \le n$. Since the $\pi_{mn}$ are continuous, we have also $\pi_{mn} A_n^- \subset A_m^-$ for $m \le n$, which means that the sets $A_n^-$ form a projective sequence. We claim that
$$A = \bigcap\nolimits_n \pi_n^{-1} A_n^-. \tag{24}$$
Here the relation $A \subset \pi_n^{-1} A_n^-$ is obvious. Next assume that $x \notin A$. By the definition of the product topology, we may choose a $k \in \mathbb{N}$ and an open set $U \subset S^k$ such that $x \in \pi_k^{-1} U \subset A^c$. It follows easily that $\pi_k x \in U \subset A_k^c$. Since $U$ is open, we have even $\pi_k x \in (A_k^-)^c$. Thus, $x \notin \bigcap_n \pi_n^{-1} A_n^-$, which completes the proof of (24). The projective property carries over to the intersections $A_n^- \cap I_n^{-1}[0, r]$, and formulas (23) and (24) combine into the relation
$$A \cap I^{-1}[0, r] = \bigcap\nolimits_n \pi_n^{-1}\bigl(A_n^- \cap I_n^{-1}[0, r]\bigr), \quad r \ge 0. \tag{25}$$
Now assume that $I(A) > r \in \mathbb{R}$. Then $A \cap I^{-1}[0, r] = \emptyset$, and by (25) and Lemma A2.9 we get $A_n^- \cap I_n^{-1}[0, r] = \emptyset$ for some $n \in \mathbb{N}$, which implies $I_n(A_n^-) > r$. Noting that $A \subset \pi_n^{-1} A_n$ and using the LDP in $S^n$, we conclude that
$$\limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in A\} \le \limsup_{\varepsilon\to 0}\varepsilon\log P\{\pi_n\xi_\varepsilon \in A_n\} \le -I_n(A_n^-) < -r.$$
The upper bound now follows as we let $r \uparrow I(A)$.
Finally, fix an open set $G \subset S$ and let $x \in G$ be arbitrary. By the definition of the product topology, we may choose $n \in \mathbb{N}$ and an open set $U \subset S^n$ such that $x \in \pi_n^{-1} U \subset G$. The LDP in $S^n$ yields
$$\liminf_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in G\} \ge \liminf_{\varepsilon\to 0}\varepsilon\log P\{\pi_n\xi_\varepsilon \in U\} \ge -I_n(U) \ge -I_n \circ \pi_n(x) \ge -I(x),$$
and the lower bound follows as we take the supremum over all $x \in G$. $\Box$
We consider yet another basic method for extending the LDP, namely by suitable approximation. Here the following elementary result is often helpful. Let us say that the random elements $\xi_\varepsilon$ and $\eta_\varepsilon$ in a common separable metric space $(S, d)$ are exponentially equivalent if
$$\lim_{\varepsilon\to 0}\varepsilon\log P\{d(\xi_\varepsilon, \eta_\varepsilon) > h\} = -\infty, \quad h > 0. \tag{26}$$
The separability of $S$ is needed only to ensure measurability of the pairwise distances $d(\xi_\varepsilon, \eta_\varepsilon)$. In general, we may replace (26) by a similar condition involving the outer measure.
Lemma 27.13 (approximation) Let $\xi_\varepsilon$ and $\eta_\varepsilon$ be exponentially equivalent random elements in a separable metric space $S$. Then $(\xi_\varepsilon)$ satisfies the LDP with a good rate function $I$ iff the same LDP holds for $(\eta_\varepsilon)$.
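Proof: Suppose that the LDP holds for $(\xi_\varepsilon)$ with rate function $I$. Fix any closed set $B \subset S$, and let $B^h$ denote the closed $h$-neighborhood of $B$. Then
$$P\{\eta_\varepsilon \in B\} \le P\{\xi_\varepsilon \in B^h\} + P\{d(\xi_\varepsilon, \eta_\varepsilon) > h\},$$
and so by (26) and the LDP for $(\xi_\varepsilon)$
$$\limsup_{\varepsilon\to 0}\varepsilon\log P\{\eta_\varepsilon \in B\} \le \limsup_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in B^h\} \vee \limsup_{\varepsilon\to 0}\varepsilon\log P\{d(\xi_\varepsilon, \eta_\varepsilon) > h\} \le -I(B^h) \vee (-\infty) = -I(B^h).$$
Since $I$ is good, we have $I(B^h) \uparrow I(B)$ as $h \to 0$, and the required upper bound follows.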
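Next we fix an open set $G \subset S$ and an element $x \in G$. If $d(x, G^c) > h > 0$, we may choose a neighborhood $U$ of $x$ such that $U^h \subset G$. Noting that
$$P\{\xi_\varepsilon \in U\} \le P\{\eta_\varepsilon \in G\} + P\{d(\xi_\varepsilon, \eta_\varepsilon) > h\},$$
we get by (26) and the LDP for $(\xi_\varepsilon)$
$$-I(x) \le -I(U) \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{\xi_\varepsilon \in U\} \le \liminf_{\varepsilon\to 0}\varepsilon\log P\{\eta_\varepsilon \in G\} \vee \limsup_{\varepsilon\to 0}\varepsilon\log P\{d(\xi_\varepsilon, \eta_\varepsilon) > h\} = \liminf_{\varepsilon\to 0}\varepsilon\log P\{\eta_\varepsilon \in G\}.$$
The required lower bound now follows, as we take the supremum over all $x \in G$. $\Box$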
We now demonstrate the power of the abstract theory by considering some important applications. First we study perturbations of the ordinary differential equation $\dot x = b(x)$ by a small noise term. More precisely, we consider the unique solution $X^\varepsilon$ with $X_0^\varepsilon = 0$ of the $d$-dimensional SDE
$$dX_t = \varepsilon^{1/2}\,dB_t + b(X_t)\,dt, \quad t \ge 0, \tag{27}$$
where $B$ is a Brownian motion in $\mathbb{R}^d$ and $b$ is a bounded and uniformly Lipschitz continuous mapping on $\mathbb{R}^d$. Let $H_\infty$ denote the set of all absolutely continuous functions $x: \mathbb{R}_+ \to \mathbb{R}^d$ with $x_0 = 0$ such that $\dot x \in L^2$.
Theorem 27.14 (perturbed dynamical systems, Freidlin and Wentzell) For any bounded, uniformly Lipschitz continuous function $b: \mathbb{R}^d \to \mathbb{R}^d$, the solutions $X^\varepsilon$ to (27) with $X_0^\varepsilon = 0$ satisfy the LDP in $C(\mathbb{R}_+, \mathbb{R}^d)$ with the good rate function
$$I(x) = \tfrac{1}{2}\int_0^\infty |\dot x_t - b(x_t)|^2\,dt, \quad x \in H_\infty. \tag{28}$$
Here it is understood that $I(x) = \infty$ when $x \notin H_\infty$. Note that the result for $b = 0$ extends Theorem 27.6 to processes on $\mathbb{R}_+$.
Proof: If $B^1$ is a Brownian motion on $[0, 1]$, then for every $r > 0$ the process $B^r = \Phi(B^1)$ given by $B_t^r = r^{1/2} B_{t/r}^1$ is a Brownian motion on $[0, r]$. Noting that $\Phi$ is continuous from $C([0, 1])$ to $C([0, r])$, we see from Theorems 27.6 and 27.11 (i) together with Lemma 27.7 that the processes $\varepsilon^{1/2} B^r$ satisfy the LDP in $C([0, r])$ with the good rate function $I_r = I_1 \circ \Phi^{-1}$, where $I_1(x) = \tfrac12\|\dot x\|_2^2$ for $x \in H_1$ and $I_1(x) = \infty$ otherwise. Now $\Phi$ maps $H_1$ onto $H_r$, and when $y = \Phi(x)$ with $x \in H_1$ we have $\dot x_t = r^{1/2}\dot y_{rt}$. Hence, by calculus $I_r(y) = \tfrac12\int_0^r |\dot y_s|^2\,ds = \tfrac12\|\dot y\|_2^2$, which extends Theorem 27.6 to $[0, r]$. For the further extension to $\mathbb{R}_+$, let $\pi_n x$ denote the restriction of a function $x \in C(\mathbb{R}_+)$ to $[0, n]$, and infer from Theorem 27.12 that the processes $\varepsilon^{1/2} B$ satisfy the LDP in $C(\mathbb{R}_+)$ with the good rate function $I_\infty(x) = \sup_n I_n(\pi_n x) = \tfrac12\|\dot x\|_2^2$.
By an elementary version of Theorem 21.3, the integral equation
$$x_t = z_t + \int_0^t b(x_s)\,ds, \quad t \ge 0, \tag{29}$$
has a unique solution $x = F(z)$ in $C = C(\mathbb{R}_+)$ for every $z \in C$. Letting $z^1, z^2 \in C$ be arbitrary and writing $a$ for the Lipschitz constant of $b$, we note that the corresponding solutions $x^i = F(z^i)$ satisfy
$$|x_t^1 - x_t^2| \le \|z^1 - z^2\| + a\int_0^t |x_s^1 - x_s^2|\,ds, \quad t \ge 0.$$
Hence, Gronwall's Lemma 21.4 yields $\|x^1 - x^2\| \le \|z^1 - z^2\| e^{ar}$ on the interval $[0, r]$, which shows that $F$ is continuous. Using Schilder's theorem on $\mathbb{R}_+$ along with Theorem 27.11 (i), we conclude that the processes $X^\varepsilon$ satisfy the LDP in $C(\mathbb{R}_+)$ with the good rate function $I = I_\infty \circ F^{-1}$. Now $F$ is clearly bijective, and (29) shows that the functions $z$ and $x = F(z)$ lie simultaneously in $H_\infty$, in which case $\dot z = \dot x - b(x)$ a.e. Thus, $I$ is indeed given by (28). $\Box$
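The Gronwall estimate behind the continuity of $F$ can be checked numerically. The sketch below is only an illustration under stated assumptions: it replaces (29) by an explicit Euler scheme with step $h$, takes the hypothetical drift $b = \sin$ (bounded, with Lipschitz constant $a = 1$) in dimension one, and uses two arbitrary driving paths $z^1, z^2$ on $[0, r]$. For the discrete scheme the same argument gives the bound with $(1 + ah)^k \le e^{ar}$, so the inequality $\|x^1 - x^2\| \le \|z^1 - z^2\| e^{ar}$ should hold on the grid.

```python
import math

def solve(z, b, h):
    """Euler scheme for x_t = z_t + int_0^t b(x_s) ds on a grid with step h."""
    x = [z[0]]
    integral = 0.0
    for k in range(1, len(z)):
        integral += b(x[k - 1]) * h      # left-endpoint approximation of the integral
        x.append(z[k] + integral)
    return x

def sup_dist(u, v):
    """Supremum distance between two paths sampled on the same grid."""
    return max(abs(p - q) for p, q in zip(u, v))

# Illustrative assumptions: horizon r, grid size n, drift b = sin with a = 1.
r, n = 2.0, 2000
h = r / n
grid = [k * h for k in range(n + 1)]
b, a = math.sin, 1.0

z1 = [0.3 * math.sin(5 * t) for t in grid]
z2 = [0.3 * math.sin(5 * t) + 0.1 * t * math.cos(3 * t) for t in grid]
x1, x2 = solve(z1, b, h), solve(z2, b, h)

lhs = sup_dist(x1, x2)                      # ||x^1 - x^2|| on [0, r]
rhs = sup_dist(z1, z2) * math.exp(a * r)    # ||z^1 - z^2|| e^{ar}
```

The bound is far from sharp for this particular pair of inputs; its role in the proof is only to give continuity of $F$ in the topology of locally uniform convergence.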
Now consider a random element $\xi$ with distribution $\mu$ in an arbitrary metric space $S$. We introduce the cumulant-generating functional
$$\Lambda(f) = \log E e^{f(\xi)} = \log \mu e^f, \quad f \in C_b(S),$$
and the associated Legendre-Fenchel transform
$$\Lambda^*(\nu) = \sup_{f\in C_b}(\nu f - \Lambda(f)), \quad \nu \in \mathcal{P}(S), \tag{30}$$
where $\mathcal{P}(S)$ denotes the class of probability measures on $S$, endowed with the topology of weak convergence. Note that $\Lambda$ and $\Lambda^*$ are both convex, by the same argument as for $\mathbb{R}^d$.
Given any two measures $\mu, \nu \in \mathcal{P}(S)$, we define the relative entropy of $\nu$ with respect to $\mu$ by
$$H(\nu|\mu) = \begin{cases} \nu\log p = \mu(p\log p), & \nu \ll \mu \text{ with } \nu = p\cdot\mu, \\ \infty, & \nu \not\ll \mu. \end{cases}$$
Since $x\log x$ is convex, the function $H(\nu|\mu)$ is convex in $\nu$ for fixed $\mu$, and by Jensen's inequality we have
$$H(\nu|\mu) \ge \mu p\,\log \mu p = \nu S\,\log \nu S = 0, \quad \nu \in \mathcal{P}(S),$$
with equality iff $\nu = \mu$.
Now let $\xi_1, \xi_2, \dots$ be i.i.d. random elements in $S$. The associated empirical distributions are given by
$$\eta_n = n^{-1}\sum\nolimits_{k\le n}\delta_{\xi_k}, \quad n \in \mathbb{N}.$$
They may be regarded as random elements in $\mathcal{P}(S)$, and we note that
$$\eta_n f = n^{-1}\sum\nolimits_{k\le n} f(\xi_k), \quad f \in C_b(S),\ n \in \mathbb{N}.$$
In particular, Theorem 27.5 applies to the random vectors $(\eta_n f_1, \dots, \eta_n f_m)$ for fixed $f_1, \dots, f_m \in C_b(S)$. The following result may be regarded as an infinite-dimensional version of Theorem 27.5. It also provides an important connection to statistical mechanics, via the entropy function.
Theorem 27.15 (large deviations of empirical distributions, Sanov) Let $\xi_1, \xi_2, \dots$ be i.i.d. random elements with distribution $\mu$ in a Polish space $S$, and put $\Lambda(f) = \log \mu e^f$. Then the associated empirical distributions $\eta_1, \eta_2, \dots$ satisfy the LDP in $\mathcal{P}(S)$ with the good rate function
$$\Lambda^*(\nu) = H(\nu|\mu), \quad \nu \in \mathcal{P}(S). \tag{31}$$
A couple of lemmas will be needed for the proof.
Lemma 27.16 (entropy, Donsker and Varadhan) In (30) it is equivalent to take the supremum over all bounded, measurable functions $f: S \to \mathbb{R}$. The identity (31) then holds for any probability measures $\mu$ and $\nu$ on a common measurable space $S$.
Proof: The first assertion holds by Lemma 1.35 and dominated convergence. If $\nu \not\ll \mu$, then $H(\nu|\mu) = \infty$ by definition. Furthermore, we may choose a set $B \in \mathcal{S}$ with $\mu B = 0$ and $\nu B > 0$, and take $f_n = n 1_B$ to obtain $\nu f_n - \log \mu e^{f_n} = n\nu B \to \infty$. Thus, even $\Lambda^*(\nu) = \infty$ in this case, and it remains to prove (31) when $\nu \ll \mu$. Assuming $\nu = p\cdot\mu$ and writing $f = \log p$, we note that
$$\nu f - \log \mu e^f = \nu\log p - \log \mu p = H(\nu|\mu).$$
If $f = \log p$ is unbounded, we may approximate by bounded measurable functions $f_n$ satisfying $\mu e^{f_n} \to 1$ and $\nu f_n \to \nu f$, and we get $\Lambda^*(\nu) \ge H(\nu|\mu)$.
To prove the reverse inequality, we first assume that $\mathcal{S}$ is finite and generated by a partition $B_1, \dots, B_n$ of $S$. Putting $\mu_k = \mu B_k$, $\nu_k = \nu B_k$, and $p_k = \nu_k/\mu_k$, we may write our claim in the form
$$g(x) = \sum\nolimits_k \nu_k x_k - \log\sum\nolimits_k \mu_k e^{x_k} \le \sum\nolimits_k \nu_k\log p_k,$$
where $x = (x_1, \dots, x_n) \in \mathbb{R}^n$ is arbitrary. Here the function $g$ is concave and satisfies $\nabla g(x) = 0$ for $x = (\log p_1, \dots, \log p_n)$, asymptotically when $p_k = 0$ for some $k$. Thus,
$$\sup\nolimits_x g(x) = g(\log p_1, \dots, \log p_n) = \sum\nolimits_k \nu_k\log p_k.$$
To prove the inequality $\nu f - \log \mu e^f \le \nu\log p$ in general, we may assume that $f$ is simple. The generated $\sigma$-field $\mathcal{F} \subset \mathcal{S}$ is then finite, and we note that $\nu = \mu[p|\mathcal{F}]\cdot\mu$ on $\mathcal{F}$. Using the result in the finite case, together with Jensen's inequality for conditional expectations, we obtain
$$\nu f - \log \mu e^f \le \mu\bigl(\mu[p|\mathcal{F}]\log\mu[p|\mathcal{F}]\bigr) \le \mu\,\mu[p\log p|\mathcal{F}] = \nu\log p. \qquad \Box$$
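The finite case of the preceding proof is easy to test numerically. The following Python sketch uses a hypothetical three-point partition with made-up weights $\mu_k$ and $\nu_k$ (assumptions for the illustration only). It evaluates $g$ at the critical point $x = (\log p_1, \dots, \log p_n)$ and at a batch of pseudo-random points, so that one can compare the values with $H(\nu|\mu) = \sum_k \nu_k\log p_k$.

```python
import math
import random

def g(x, mu, nu):
    """g(x) = sum_k nu_k x_k - log sum_k mu_k exp(x_k)."""
    linear = sum(n_k * x_k for n_k, x_k in zip(nu, x))
    return linear - math.log(sum(m_k * math.exp(x_k) for m_k, x_k in zip(mu, x)))

def relative_entropy(nu, mu):
    """H(nu|mu) = sum_k nu_k log(nu_k / mu_k), with 0 log 0 = 0; assumes nu << mu."""
    return sum(n_k * math.log(n_k / m_k) for n_k, m_k in zip(nu, mu) if n_k > 0)

# Hypothetical weights on a three-set partition (illustrative only).
mu = [0.5, 0.3, 0.2]
nu = [0.2, 0.2, 0.6]

x_star = [math.log(n_k / m_k) for n_k, m_k in zip(nu, mu)]   # x_k = log p_k
H = relative_entropy(nu, mu)

rng = random.Random(0)                                        # fixed seed for reproducibility
trials = [g([rng.uniform(-5, 5) for _ in mu], mu, nu) for _ in range(1000)]
```

Since $g(x + c) = g(x)$ for any constant shift $c$, the maximizer is unique only up to an additive constant, which is consistent with the concavity argument in the proof.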
Lemma 27.17 (exponential tightness) The empirical distributions $\eta_n$ in Theorem 27.15 are exponentially tight in $\mathcal{P}(S)$.
Proof: If $B \in \mathcal{S}$ with $P\{\xi \in B\} = p \in (0, 1)$, then by Theorem 27.3 and Lemmas 27.1 and 27.2 we have for any $x \in (p, 1]$
$$\sup\nolimits_n n^{-1}\log P\{\eta_n B > x\} \le -x\log\frac{x}{p} - (1 - x)\log\frac{1 - x}{1 - p}. \tag{32}$$
In particular, we note that the right-hand side tends to $-\infty$ as $p \to 0$ for fixed $x \in (0, 1)$. Now fix any $r > 0$. By (32) and Theorem 16.3, we may choose some compact sets $K_1, K_2, \dots \subset S$ such that
$$P\{\eta_n K_k^c > 2^{-k}\} \le e^{-knr}, \quad k, n \in \mathbb{N}.$$
Summing over $k$ gives
$$\limsup_{n\to\infty} n^{-1}\log P\bigcup\nolimits_k\{\eta_n K_k^c > 2^{-k}\} \le -r,$$
and it remains to note that the set
$$M = \bigcap\nolimits_k\{\nu \in \mathcal{P}(S);\ \nu K_k^c \le 2^{-k}\}$$
is compact, by another application of Theorem 16.3. $\Box$
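The bound (32) concerns a binomial tail, since $n\eta_n B$ is binomially distributed with parameters $n$ and $p$. As a sanity check, the sketch below compares the exact tail with the entropy bound for a few values of $n$. The particular numbers $p = 0.1$ and $x = 0.3$ and the function names are assumptions chosen for the illustration.

```python
import math

def entropy_rate(x, p):
    """x log(x/p) + (1-x) log((1-x)/(1-p)), with the convention 0 log 0 = 0."""
    total = 0.0
    if x > 0:
        total += x * math.log(x / p)
    if x < 1:
        total += (1 - x) * math.log((1 - x) / (1 - p))
    return total

def log_tail(n, x, p):
    """Exact log P{eta_n B > x}, where n * eta_n B is Binomial(n, p)."""
    tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > n * x)
    return math.log(tail) if tail > 0 else -math.inf

p, x = 0.1, 0.3                      # illustrative values with p < x
for n in (10, 50, 200):
    # normalized log-probability versus the right-hand side of (32)
    print(n, log_tail(n, x, p) / n, -entropy_rate(x, p))
```

Evaluating `entropy_rate(x, p)` for decreasing $p$ also illustrates the remark that the right-hand side of (32) tends to $-\infty$ as $p \to 0$ for fixed $x$.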
Proof of Theorem 27.15: By Theorem A1.1 we can embed $S$ as a Borel subset of a compact metric space $K$. The function space $C_b(K)$ is separable, and we can choose a dense sequence $f_1, f_2, \dots \in C_b(K)$. For any $m \in \mathbb{N}$, the random vector $(f_1(\xi), \dots, f_m(\xi))$ has cumulant-generating function
$$\Lambda_m(u) = \log E\exp\sum\nolimits_{k\le m} u_k f_k(\xi) = \Lambda\circ\sum\nolimits_{k\le m} u_k f_k, \quad u \in \mathbb{R}^m,$$
and so by Theorem 27.5 the random vectors $(\eta_n f_1, \dots, \eta_n f_m)$ satisfy the LDP in $\mathbb{R}^m$ with the good rate function $\Lambda_m^*$. By Theorem 27.12 it follows that the infinite sequences $(\eta_n f_1, \eta_n f_2, \dots)$ satisfy the LDP in $\mathbb{R}^\infty$ with the good rate function $J = \sup_m(\Lambda_m^*\circ\pi_m)$, where $\pi_m$ denotes the natural projection of $\mathbb{R}^\infty$ onto $\mathbb{R}^m$. Since $\mathcal{P}(K)$ is compact by Theorem 16.3 and the mapping $\nu \mapsto (\nu f_1, \nu f_2, \dots)$ is a continuous injection of $\mathcal{P}(K)$ into $\mathbb{R}^\infty$, Theorem 27.11 (ii) shows that the random measures $\eta_n$ satisfy the LDP in $\mathcal{P}(K)$ with the good rate function
$$I_K(\nu) = J(\nu f_1, \nu f_2, \dots) = \sup\nolimits_m \Lambda_m^*(\nu f_1, \dots, \nu f_m) = \sup_m\sup_{u\in\mathbb{R}^m}\Bigl(\sum\nolimits_{k\le m} u_k\,\nu f_k - \Lambda\circ\sum\nolimits_{k\le m} u_k f_k\Bigr) = \sup_{f\in\mathcal{F}}(\nu f - \Lambda(f)) = \sup_{f\in C_b}(\nu f - \Lambda(f)), \tag{33}$$
where $\mathcal{F}$ denotes the set of all linear combinations of $f_1, f_2, \dots$.
Next we note that the natural embedding $\mathcal{P}(S) \to \mathcal{P}(K)$ is continuous, since for any $f \in C_b(K)$ the restriction of $f$ to $S$ belongs to $C_b(S)$. Since it is also trivially injective, we see from Theorem 27.11 (ii) and Lemma 27.17 that the $\eta_n$ satisfy the LDP even in $\mathcal{P}(S)$, with a good rate function $I_S$ that equals the restriction of $I_K$ to $\mathcal{P}(S)$. It remains to note that $I_S = \Lambda^*$ by (33) and Lemma 27.16. $\Box$
We conclude with a remarkable application of Schilder's Theorem 27.6. Writing $B$ for a standard Brownian motion in $\mathbb{R}^d$, we define for any $t > e$ the scaled process $X^t$ by
$$X_s^t = \frac{B_{st}}{\sqrt{2t\log\log t}}, \quad s \ge 0. \tag{34}$$
Theorem 27.18 (functional law of the iterated logarithm, Strassen) Let $B$ be a Brownian motion in $\mathbb{R}^d$, and define the processes $X^t$ by (34). Then the following equivalent statements hold outside a fixed $P$-null set:
(i) The paths $X^t$, $t \ge 3$, form a relatively compact set in $C(\mathbb{R}_+, \mathbb{R}^d)$, whose set of limit points as $t \to \infty$ equals $K = \{x \in H_\infty;\ \|\dot x\|_2 \le 1\}$.
(ii) For any continuous function $F: C(\mathbb{R}_+, \mathbb{R}^d) \to \mathbb{R}$, we have
$$\limsup_{t\to\infty} F(X^t) = \sup_{x\in K} F(x).$$
In particular, we may recover the classical law of the iterated logarithm in Theorem 13.18 by choosing $F(x) = x_1$. Using Theorem 14.6, we can easily derive a correspondingly strengthened version for random walks.
Proof: The equivalence of (i) and (ii) being elementary, we need to prove only (i). Noting that $X^t \stackrel{d}{=} B/\sqrt{2\log\log t}$ and using Theorem 27.6, we get for any measurable set $A \subset C(\mathbb{R}_+, \mathbb{R}^d)$ and constant $r > 1$
$$\limsup_{n\to\infty}\frac{\log P\{X^{r^n} \in A\}}{\log n} \le \limsup_{t\to\infty}\frac{\log P\{X^t \in A\}}{\log\log t} \le -2I(A^-),$$
$$\liminf_{n\to\infty}\frac{\log P\{X^{r^n} \in A\}}{\log n} \ge \liminf_{t\to\infty}\frac{\log P\{X^t \in A\}}{\log\log t} \ge -2I(A^\circ),$$
where $I(x) = \tfrac12\|\dot x\|_2^2$ when $x \in H_\infty$ and $I(x) = \infty$ otherwise. Hence,
$$\sum\nolimits_n P\{X^{r^n} \in A\}\ \begin{cases} < \infty, & 2I(A^-) > 1, \\ = \infty, & 2I(A^\circ) < 1. \end{cases} \tag{35}$$
Now fix any $r > 1$, and let $G \supset K$ be open. Note that $2I(G^c) > 1$ by Lemma 27.7. By the first part of (35) and the Borel-Cantelli lemma we have $P\{X^{r^n} \notin G \text{ i.o.}\} = 0$ or, equivalently, $1_G(X^{r^n}) \to 1$ a.s. Since $G$ was arbitrary, it follows that $\rho(X^{r^n}, K) \to 0$ a.s. for any metrization $\rho$ of $C(\mathbb{R}_+, \mathbb{R}^d)$. In particular, this holds with any $c > 0$ for the metric
$$\rho_c(x, y) = \int_0^\infty\bigl((x - y)_s^* \wedge 1\bigr)e^{-cs}\,ds, \quad x, y \in C(\mathbb{R}_+, \mathbb{R}^d).$$
To extend the convergence to the entire family $\{X^t\}$, fix any path of $B$ such that $\rho_1(X^{r^n}, K) \to 0$, and choose some functions $y^{r^n} \in K$ satisfying $\rho_1(X^{r^n}, y^{r^n}) \to 0$. For any $t \in [r^n, r^{n+1})$, the paths $X^{r^n}$ and $X^t$ are related by
$$X^t(s) = X^{r^n}(tr^{-n}s)\left(\frac{r^n\log\log r^n}{t\log\log t}\right)^{1/2}, \quad s \ge 0.$$
Defining $y^t$ in the same way in terms of $y^{r^n}$, we note that also $y^t \in K$ since $I(y^t) \le I(y^{r^n})$. (The two $H_\infty$-norms would agree if the logarithmic factors were omitted.) Furthermore,
$$\rho_r(X^t, y^t) = \int_0^\infty\bigl((X^t - y^t)_s^* \wedge 1\bigr)e^{-rs}\,ds \le \int_0^\infty\bigl((X^{r^n} - y^{r^n})_{rs}^* \wedge 1\bigr)e^{-rs}\,ds = r^{-1}\rho_1(X^{r^n}, y^{r^n}) \to 0.$$
Thus, $\rho_r(X^t, K) \to 0$. Since $K$ is compact, we conclude that $\{X^t\}$ is relatively compact, with all its limit points as $t \to \infty$ belonging to $K$.
Now fix any $y \in K$ and $u > \varepsilon > 0$. By the established part of the theorem and the Cauchy-Buniakowski inequality, we have a.s.
$$\limsup_{t\to\infty}(X^t - y)_\varepsilon^* \le \sup_{x\in K}(x - y)_\varepsilon^* \le \sup_{x\in K} x_\varepsilon^* + y_\varepsilon^* \le 2\varepsilon^{1/2}. \tag{36}$$
Write $x_{\varepsilon,u}^* = \sup_{s\in[\varepsilon,u]}|x_s - x_\varepsilon|$, and choose $r > u/\varepsilon$ to ensure independence between the variables $(X^{r^n} - y)_{\varepsilon,u}^*$. Applying the second part of (35) to the open set $A = \{x;\ (x - y)_{\varepsilon,u}^* < \varepsilon\}$ and using the Borel-Cantelli lemma together with (36), we obtain a.s.
$$\liminf_{t\to\infty}(X^t - y)_u^* \le \limsup_{t\to\infty}(X^t - y)_\varepsilon^* + \liminf_{n\to\infty}(X^{r^n} - y)_{\varepsilon,u}^* \le 2\varepsilon^{1/2} + \varepsilon.$$
Letting $\varepsilon \to 0$ gives $\liminf_t (X^t - y)_u^* = 0$ a.s., and so $\liminf_t \rho_1(X^t, y) \le e^{-u}$ a.s. As $u \to \infty$, we obtain $\liminf_t \rho_1(X^t, y) = 0$ a.s. Applying this result to a dense sequence $y_1, y_2, \dots \in K$, we see that a.s. every element of $K$ is a limit point as $t \to \infty$ of the family $\{X^t\}$. $\Box$
Exercises
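1. For any random vector $\xi$ and constant $a$ in $\mathbb{R}^d$, show that $\Lambda_{\xi-a}(u) = \Lambda_\xi(u) - ua$ and $\Lambda_{\xi-a}^*(x) = \Lambda_\xi^*(x + a)$.
2. For any random vector $\xi$ in $\mathbb{R}^d$ and nonsingular $d \times d$ matrix $a$, show that $\Lambda_{a\xi}(u) = \Lambda_\xi(ua)$ and $\Lambda_{a\xi}^*(x) = \Lambda_\xi^*(a^{-1}x)$.
3. For any pair of independent random vectors $\xi$ and $\eta$, show that $\Lambda_{\xi,\eta}(u, v) = \Lambda_\xi(u) + \Lambda_\eta(v)$ and $\Lambda_{\xi,\eta}^*(x, y) = \Lambda_\xi^*(x) + \Lambda_\eta^*(y)$.
4. Prove the claims of Lemma 27.2.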
5. If $\xi$ is Gaussian in $\mathbb{R}^d$ with mean $m \in \mathbb{R}^d$ and covariance matrix $a$, show that $\Lambda_\xi^*(x) = \tfrac12(x - m)'a^{-1}(x - m)$. Explain the interpretation when $a$ is singular.
6. Let $\xi$ be a standard Gaussian random vector in $\mathbb{R}^d$. Show that the family $\varepsilon^{1/2}\xi$ satisfies the LDP in $\mathbb{R}^d$ with the good rate function $I(x) = \tfrac12|x|^2$. (Hint: Deduce the result along the sequence $\varepsilon_n = n^{-1}$ from Theorem 27.5, and extend by monotonicity to general $\varepsilon > 0$.)
7. Use Theorem 27.11 (i) to deduce the preceding result from Schilder's theorem. (Hint: For $x \in H_1$, note that $|x_1| \le \|\dot x\|_2$ with equality iff $x_t \equiv tx_1$.)
8. Prove Schilder's theorem on $[0, T]$ by the same argument as for $[0, 1]$.
9. Deduce Schilder's theorem in the space $C([0, n], \mathbb{R}^d)$ from the version in $C([0, 1], \mathbb{R}^{nd})$.
10. Let $B$ be a Brownian bridge in $\mathbb{R}^d$. Show that the processes $\varepsilon^{1/2}B$ satisfy the LDP in $C([0, 1], \mathbb{R}^d)$ with the good rate function $I(x) = \tfrac12\|\dot x\|_2^2$ for $x \in H_1$ with $x_1 = 0$ and $I(x) = \infty$ otherwise. (Hint: Write $B_t = X_t - tX_1$, where $X$ is a Brownian motion in $\mathbb{R}^d$, and use Theorem 27.11. Check that $\|\dot x - a\|_2$ is minimized for $a = x_1$.)
11. Show that the property of exponential tightness and its sequential version are preserved by continuous mappings.
12. Prove that if the processes $X^\varepsilon$ and $Y^\varepsilon$ in $C(\mathbb{R}_+, \mathbb{R}^d)$ are exponentially tight, then so is any linear combination $aX^\varepsilon + bY^\varepsilon$. (Hint: Use the Arzelà-Ascoli theorem.)
13. Show directly from (27) that the processes $X^\varepsilon$ in Theorem 27.14 are exponentially tight. (Hint: Use Lemmas 27.7 and 27.9 (iii) together with the Arzelà-Ascoli theorem.) Derive the same result from the stated theorem.
14. Let $\xi_\varepsilon$ be random elements in a locally compact metric space $S$, satisfying the LDP with a good rate function $I$. Show that the $\xi_\varepsilon$ are exponentially tight (even in the nonsequential sense). (Hint: For any $r > 0$, there exists a compact set $K_r \subset S$ such that $I^{-1}[0, r] \subset K_r^\circ$. Now apply the LDP upper bound to the closed sets $(K_r^\circ)^c \supset K_r^c$.)
15. For any metric space $S$ and lcscH space $T$, let $X^\varepsilon$ be random elements in $C(T, S)$ whose restrictions $X_K^\varepsilon$ to an arbitrary compact set $K \subset T$ satisfy the LDP in $C(K, S)$ with the good rate function $I_K$. Show that the $X^\varepsilon$ satisfy the LDP in $C(T, S)$ with the good rate function $I = \sup_K(I_K \circ \pi_K)$, where $\pi_K$ denotes the restriction map from $C(T, S)$ to $C(K, S)$.
16. Let $\xi_{kj}$ be i.i.d. random vectors in $\mathbb{R}^d$ satisfying $\Lambda(u) = Ee^{u\xi_{kj}} < \infty$ for all $u \in \mathbb{R}^d$. Show that the sequences $\bar\xi_n = n^{-1}\sum_{k\le n}(\xi_{k1}, \xi_{k2}, \dots)$ satisfy an LDP in $(\mathbb{R}^d)^\infty$ with the good rate function $I(x) = \sum_j \Lambda^*(x_j)$. Also derive an LDP for the associated random walks in $\mathbb{R}^d$.
17. Let $\xi$ be a sequence of i.i.d. $N(0, 1)$ random variables. Use the preceding result to show that the sequences $\varepsilon^{1/2}\xi$ satisfy the LDP in $\mathbb{R}^\infty$ with the good rate function $I(x) = \tfrac12\|x\|_2^2$ for $x \in l^2$ and $I(x) = \infty$ otherwise. Also show how the statement follows from Schilder's theorem.
18. Let $\xi_1, \xi_2, \dots$ be i.i.d. random probability measures on a Polish space $S$. Derive an LDP in $\mathcal{P}(S)$ for the averages $\bar\xi_n = n^{-1}\sum_{k\le n}\xi_k$. (Hint: Define $\Lambda(f) = \log Ee^{\xi_k f}$, and proceed as in the proof of Sanov's theorem.)
19. Show how the classical law of the iterated logarithm in Theorem 13.18 follows from Theorem 27.18. Also use the latter result to derive a law of the iterated logarithm for the variables $\xi_t = |B_{2t} - B_t|$, where $B$ is a Brownian motion in $\mathbb{R}^d$.
20. Use Theorem 27.18 to derive a corresponding law of the iterated logarithm in $C([0, 1], \mathbb{R}^d)$.
21. Use Theorems 14.6 and 27.18 to derive a functional law of the iterated logarithm for random walks based on i.i.d. random variables with mean 0 and variance 1. (Hint: To state the result in $C(\mathbb{R}_+, \mathbb{R})$, replace the summation process $S_{[t]}$ by its linearly interpolated version, as in case of Corollary 16.7.)
22. Use Theorems 14.13 and 27.18 to derive a functional law of the iterated logarithm for suitable renewal processes.
23. Let $B^1, B^2, \dots$ be independent Brownian motions in $\mathbb{R}^d$. Show that the sequence of paths $X_t^n = (2\log n)^{-1/2}B_t^n$, $n \ge 2$, is a.s. relatively compact in $C(\mathbb{R}_+, \mathbb{R}^d)$ with set of limit points $K = \{x \in H_\infty;\ \|\dot x\|_2 \le 1\}$.
Appendices
Here we list some results that play an important role in this
book but whose proofs are too long or technical to contribute in
any essential way to the understanding of the subject matter.
Proofs are given only for results that are not easily accessible in
the literature.
A1. Advanced Measure Theory
The basic facts of measure theory were reviewed in Chapters 1 and 2. In this
appendix we list, mostly without proofs, some special or less elementary
results that are required in this book. One of the quoted results is used more
frequently, namely the Borel nature of Polish spaces in Theorem A1.2. The
remaining results are needed only for special purposes.
We begin with a basic embedding theorem. Recall that a topological space is said to be Polish if it is separable with a complete metrization.
Theorem A1.1 (embedding) Any Polish space is homeomorphic to a Borel subset of the compact space $[0, 1]^\infty$.
Proof: See Theorem II.82.5 in Rogers and Williams (1994). $\Box$
We say that two measurable spaces $S$ and $T$ are Borel isomorphic if there exists a measurable bijection $f: S \to T$ such that $f^{-1}$ is also measurable. A Borel space is defined as a measurable space that is Borel isomorphic to a Borel subset of $[0, 1]$. The following result shows that the most commonly occurring spaces are Borel.
Theorem A1.2 (Polish and Borel spaces) Every Borel subset of a Polish space is a Borel space.
Proof: By Theorem A1.1, it is enough to show that $[0, 1]^\infty$ is a Borel space. This may be seen by an elementary argument involving binary expansions, similar to that used in the proof of Lemma 3.21. However, some extra care is needed to ensure that the resulting mapping into $[0, 1]$ is injective and bimeasurable with a measurable range. See, e.g., Theorem A.47 in Breiman (1968) for details. $\Box$
If a measurable mapping is invertible, then the measurability of the
inverse can sometimes be inferred from the measurability of the range.
Theorem A1.3 (range and inverse, Kuratowski) Let $f$ be a measurable bijection between two Borel spaces $S$ and $T$. Then the inverse $f^{-1}: T \to S$ is again measurable.
Proof: See Parthasarathy (1967), Section I.3. $\Box$
We turn to the basic projection and section theorem, which plays such an important role in the more advanced literature. For any measurable space $(\Omega, \mathcal{F})$, the universal completion of $\mathcal{F}$ is defined as the $\sigma$-field $\bar{\mathcal{F}} = \bigcap_\mu \mathcal{F}^\mu$, where $\mathcal{F}^\mu$ denotes the completion with respect to $\mu$, and the intersection extends over all probability measures $\mu$ on $\mathcal{F}$. For any spaces $\Omega$ and $S$, we define the projection $\pi A$ of a set $A \subset \Omega \times S$ onto $\Omega$ as the union $\bigcup_s A_s$, where $A_s = \{\omega \in \Omega;\ (\omega, s) \in A\}$, $s \in S$.
Theorem A1.4 (projection and sections, Lusin, Choquet, Meyer) Fix a measurable space $(\Omega, \mathcal{F})$ and a Borel space $(S, \mathcal{S})$, and consider a set $A \in \mathcal{F} \otimes \mathcal{S}$ with projection $\pi A$ onto $\Omega$. Then
(i) $\pi A$ belongs to the universal completion $\bar{\mathcal{F}}$ of $\mathcal{F}$;
(ii) for any probability measure $P$ on $\mathcal{F}$, there exists a random element $\xi$ in $S$ such that $(\omega, \xi(\omega)) \in A$ holds $P$-a.s. on $\pi A$.
Proof: See Dellacherie and Meyer (1975), Section III.44. $\Box$
A2. Some Special Spaces
Here we collect some basic facts about various set, measure, and function
spaces of importance in probability theory. Though random processes with
paths in C(JR+, ]Rd) or D(JR+, JRd) and random measures on a variety of
spaces are considered throughout the book, most of the topological results
mentioned here are not needed until Chapter 16, where they play a fun-
damental role for the theory of convergence in distribution. Our plan is
to begin with the basic function spaces and then move on to some spaces
of measures and sets. Whenever appropriate accounts are available in the
literature, we omit the proofs.
We begin with a well-known classical result. On any space of functions $x: K \to S$, we introduce the evaluation maps $\pi_t: x \mapsto x_t$, $t \in K$. Given some metrics $d$ in $K$ and $\rho$ in $S$, we define the associated modulus of continuity by
$$w(x, h) = \sup\{\rho(x_s, x_t);\ d(s, t) \le h\}, \quad h > 0.$$
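For a path known only at finitely many sample points, the modulus of continuity can be approximated by restricting the supremum to those points. The sketch below does this under the illustrative assumptions that $K = [0, 1]$ with $d(s, t) = |s - t|$, that the sample times are sorted, and that $\rho$ defaults to the absolute difference on the real line; the function name `modulus` is hypothetical.

```python
def modulus(times, values, h, rho=lambda a, b: abs(a - b)):
    """Approximate w(x, h) = sup{rho(x_s, x_t); |s - t| <= h} over the sample points."""
    tol = 1e-12                      # guards against rounding in the comparison |s - t| <= h
    best = 0.0
    n = len(times)
    for i in range(n):
        for j in range(i, n):
            if times[j] - times[i] > h + tol:
                break                # times are sorted, so later j are even farther away
            best = max(best, rho(values[i], values[j]))
    return best

# Two sample paths on a grid of mesh 0.01 (illustrative data).
times = [k / 100 for k in range(101)]
ramp = list(times)                                   # x_t = t, continuous
step = [0.0 if t < 0.5 else 1.0 for t in times]      # unit jump at t = 0.5
```

For the ramp the computed modulus shrinks with $h$, whereas for the step path it stays at the jump size for every $h$ not smaller than the mesh, which is the behavior that the equicontinuity condition in the next theorem rules out.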
Theorem A2.1 (equicontinuity and compactness, Arzelà, Ascoli) Fix two metric spaces $K$ and $S$, where $K$ is compact and $S$ is complete, and let $D$ be dense in $K$. Then a set $A \subset C(K, S)$ is relatively compact iff $\pi_t A$ is relatively compact in $S$ for every $t \in D$ and
$$\lim_{h\to 0}\sup_{x\in A} w(x, h) = 0.$$
In that case, even $\bigcup_{t\in K}\pi_t A$ is relatively compact in $S$.
Proof: See Dudley (1989), Section 2.4. $\Box$
Next we fix a separable, complete metric space $(S, \rho)$ and consider the space $D(\mathbb{R}_+, S)$ of functions $x: \mathbb{R}_+ \to S$ that are right-continuous with left-hand limits (rcll). It is easy to see that, for any $\varepsilon, t > 0$, such a function $x$ has at most finitely many jumps of size $> \varepsilon$ before time $t$. In $D(\mathbb{R}_+, S)$ we introduce the modified modulus of continuity
$$\tilde w(x, t, h) = \inf_{(I_k)}\max_k\sup_{r,s\in I_k}\rho(x_r, x_s), \quad x \in D(\mathbb{R}_+, S),\ t, h > 0, \tag{1}$$
where the infimum extends over all partitions of the interval $[0, t)$ into subintervals $I_k = [u, v)$ such that $v - u \ge h$ when $v < t$. Note that $\tilde w(x, t, h) \to 0$ as $h \to 0$ for fixed $x \in D(\mathbb{R}_+, S)$ and $t > 0$. By a time-change on $\mathbb{R}_+$ we mean a monotone bijection $\lambda: \mathbb{R}_+ \to \mathbb{R}_+$. Note that $\lambda$ is continuous and strictly increasing with $\lambda_0 = 0$ and $\lambda_\infty = \infty$.
Theorem A2.2 ($J_1$-topology, Skorohod, Prohorov, Kolmogorov) Fix a separable, complete metric space $(S, \rho)$ and a dense set $T \subset \mathbb{R}_+$. Then there exists a separable and complete metric $d$ in $D(\mathbb{R}_+, S)$ such that $d(x_n, x) \to 0$ iff
$$\sup_{s\le t}|\lambda_n(s) - s| + \sup_{s\le t}\rho(x_n\circ\lambda_n(s), x(s)) \to 0, \quad t > 0,$$
for some time-changes $\lambda_n$ on $\mathbb{R}_+$. Furthermore, $\mathcal{B}(D(\mathbb{R}_+, S)) = \sigma\{\pi_t;\ t \in T\}$, and a set $A \subset D(\mathbb{R}_+, S)$ is relatively compact iff $\pi_t A$ is relatively compact in $S$ for every $t \in T$ and
$$\lim_{h\to 0}\sup_{x\in A}\tilde w(x, t, h) = 0, \quad t > 0. \tag{2}$$
In that case, $\bigcup_{s\le t}\pi_s A$ is relatively compact in $S$ for every $t \ge 0$.
Proof: See Ethier and Kurtz (1986), Sections 3.5 and 3.6, or Jacod and Shiryaev (1987), Section VI.1. $\Box$
A suitably modified version of the last result applies to the space $D([0, 1], S)$. Here we define $\tilde w(x, h)$ in terms of partitions of $[0, 1)$ into subintervals of length $\ge h$ and use time-changes $\lambda$ that are increasing bijections on $[0, 1]$.
Turning to the case of measure spaces, let $S$ be a locally compact, second-countable Hausdorff (lcscH) space with Borel $\sigma$-field $\mathcal{S}$, and let $\hat{\mathcal{S}}$ denote the class of bounded (i.e., relatively compact) sets in $\mathcal{S}$. The space $S$ is known to be Polish, and the family $C_K^+$ of continuous functions $f: S \to \mathbb{R}_+$ with compact support is separable in the uniform metric. Furthermore, there exists a sequence of compact sets $K_n \uparrow S$ such that $K_n \subset K_{n+1}^\circ$ for each $n$.
Let $\mathcal{M}(S)$ denote the class of measures on $S$ that are locally finite (i.e., finite on $\hat{\mathcal{S}}$), and write $\pi_B$ and $\pi_f$ for the mappings $\mu \mapsto \mu B$ and $\mu \mapsto \mu f = \int f\,d\mu$, respectively, on $\mathcal{M}(S)$. The vague topology in $\mathcal{M}(S)$ is generated by the maps $\pi_f$, $f \in C_K^+$, and we write the vague convergence of $\mu_n$ toward $\mu$ as $\mu_n \stackrel{v}{\to} \mu$. For any $\mu \in \mathcal{M}(S)$ we define $\hat{\mathcal{S}}_\mu = \{B \in \hat{\mathcal{S}};\ \mu\partial B = 0\}$.
Here we list some basic facts about the vague topology.
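Theorem A2.3 (vague topology) For any lcscH space $S$, we have
(i) $\mathcal{M}(S)$ is Polish in the vague topology;
(ii) a set $A \subset \mathcal{M}(S)$ is vaguely relatively compact iff $\sup_{\mu\in A}\mu f < \infty$ for all $f \in C_K^+$;
(iii) if $\mu_n \stackrel{v}{\to} \mu$ and $B \in \hat{\mathcal{S}}$ with $\mu\partial B = 0$, then $\mu_n B \to \mu B$;
(iv) $\mathcal{B}(\mathcal{M}(S))$ is generated by the maps $\pi_f$, $f \in C_K^+$, and also for any $m \in \mathcal{M}(S)$ by the maps $\pi_B$, $B \in \hat{\mathcal{S}}_m$.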
Proof: (i) Let $f_1, f_2, \dots$ be dense in $C_K^+$, and define
$$\rho(\mu, \nu) = \sum\nolimits_k 2^{-k}\bigl(|\mu f_k - \nu f_k| \wedge 1\bigr), \quad \mu, \nu \in \mathcal{M}(S). \tag{3}$$
It is easily seen that $\rho$ metrizes the vague topology. In particular, $\mathcal{M}(S)$ is homeomorphic to a subset of $\mathbb{R}^\infty$ and therefore separable. The completeness of $\rho$ will be clear once we have proved (ii).
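To get a feeling for the metric in (3), one may evaluate a truncated version of the sum for purely atomic measures on the real line. Everything in the following sketch is an assumption made for illustration: measures are stored as dictionaries mapping atoms to masses, the test functions are a few hypothetical tent functions rather than a dense sequence in $C_K^+$, and the sum is cut off after finitely many terms, so the result is only a pseudo-metric.

```python
def integrate(mu, f):
    """mu f = integral of f with respect to the atomic measure mu = {point: mass}."""
    return sum(mass * f(point) for point, mass in mu.items())

def tent(center, width):
    """Continuous, nonnegative function with compact support [center - width, center + width]."""
    return lambda s: max(0.0, 1.0 - abs(s - center) / width)

def rho(mu, nu, fs):
    """Truncated version of (3): sum_k 2^{-k} (|mu f_k - nu f_k| ^ 1), k = 1, ..., len(fs)."""
    return sum(2.0 ** -(k + 1) * min(abs(integrate(mu, f) - integrate(nu, f)), 1.0)
               for k, f in enumerate(fs))

fs = [tent(c, 1.0) for c in (-1.0, 0.0, 1.0, 2.0)]   # hypothetical test functions
mu = {0.0: 1.0, 1.0: 2.0}
nu = {0.0: 1.0, 1.5: 2.0}
near_origin = {0.0: 1.0}
escaping = {0.0: 1.0, 50.0: 7.0}                     # extra mass far outside all supports
```

With these test functions the measures `near_origin` and `escaping` are at distance zero, which reflects the fact that vague convergence does not detect mass escaping to infinity.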
(ii) The necessity is clear from the continuity of $\pi_f$ for each $f \in C_K^+$. Conversely, assume that $\sup_{\mu\in A}\mu f < \infty$ for all $f \in C_K^+$. Choose some compact sets $K_n \uparrow S$ with $K_n \subset K_{n+1}^\circ$ for each $n$, and let the functions $f_n \in C_K^+$ be such that $1_{K_n} \le f_n \le 1_{K_{n+1}}$. For each $n$ the set $\{f_n\cdot\mu;\ \mu \in A\}$ is uniformly bounded, and so by Theorem 16.3 it is even sequentially relatively compact. A diagonal argument then shows that $A$ itself is sequentially relatively compact. Since $\mathcal{M}(S)$ is metrizable, the desired relative compactness follows.
(iii) The proof is the same as for Theorem 4.25.
(iv) A topological basis in M(S) is formed by all finite intersections of
the sets {/l; a < J-Lf < b} with 0 < a < band f E c1<. Furthermore, since
M (S) is separable, every vaguely open set is a countable union of basis
elements. Thus, B(M(S)) == a{1T"J; f E CI-}. By a simple approximation
and monotone class argument it follows that B(M(S)) = a{1rB; B E S}.
Now fix aI}Y m E S, put A = a{7rB; B E 8m}, and let V denote the class
of all DES such that 7rD is A-measurable. Fixing a metric d in S such
that all d-bounded closed sets are compact, we note that only countably
many d-spheres around a fixed point have positi,:e m-measure. Thus, 8m
contains a topological basis. We also note that Sm is closed under finite
unions, whereas D is closed under bounded increasing limits. Since S is
separable, it follows that V contains every open set G E S. For any such
A2. Some Special Spaces 565
G, the class D n G is a A-system containing the 7r-system", of all open sets
in G, and by a monotone class argument we get V n G == S n G. It remains
to let G t S. 0
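The behavior of the metric $\rho(\mu, \nu) = \sum_k 2^{-k}(|\mu f_k - \nu f_k| \wedge 1)$ in (3) can be explored numerically for finite atomic measures on the real line. The following is only a rough sketch under stated assumptions: the finite family `TEST_FUNCTIONS` of tent functions is a hypothetical stand-in for a dense sequence in $C_K^+$, and all function names are chosen for illustration.

```python
def integrate(f, measure):
    """Return mu f, the integral of f against a measure given as {atom: mass}."""
    return sum(mass * f(s) for s, mass in measure.items())

def tent(center, radius):
    """Continuous function supported on [center - radius, center + radius]."""
    return lambda s: max(0.0, 1.0 - abs(s - center) / radius)

def vague_metric(mu, nu, test_functions):
    """Finite truncation of (3): sum over k of 2^{-k} (|mu f_k - nu f_k| ∧ 1)."""
    return sum(
        2.0 ** -(k + 1) * min(abs(integrate(f, mu) - integrate(f, nu)), 1.0)
        for k, f in enumerate(test_functions)
    )

# Assumption: six tent functions replace the dense sequence f_1, f_2, ...
TEST_FUNCTIONS = [tent(c, r) for r in (1.0, 0.5) for c in (-1.0, 0.0, 1.0)]

delta_0 = {0.0: 1.0}
# Unit masses at 1/n approach the unit mass at 0.
distances = [
    vague_metric({1.0 / n: 1.0}, delta_0, TEST_FUNCTIONS) for n in (1, 10, 100, 1000)
]
# A unit mass far outside every support is indistinguishable from the zero measure.
escaping = vague_metric({1000.0: 1.0}, {}, TEST_FUNCTIONS)
```

With these choices the distances decrease toward 0, while a unit mass placed outside all supports is at distance 0 from the zero measure. This reflects the fact that mass may escape to infinity under vague convergence.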
Next we consider the space of all measure-valued rcll functions. Here we
may characterize compactness in terms of countably many one-dimensional
projections, a result needed for the proof of Theorem 16.27.
Theorem A2.4 (measure-valued functions) For any lcscH space $S$, there
exist some $f_1, f_2, \ldots \in C_K^+(S)$ such that a set $A \subset D(\mathbb{R}_+, \mathcal{M}(S))$ is
relatively compact iff $Af_j = \{xf_j;\ x \in A\}$ is relatively compact in $D(\mathbb{R}_+, \mathbb{R}_+)$
for every $j \in \mathbb{N}$.
Proof: If $A$ is relatively compact, then so is $Af$ for every $f \in C_K^+(S)$,
since the map $x \mapsto xf$ is continuous from $D(\mathbb{R}_+, \mathcal{M}(S))$ to $D(\mathbb{R}_+, \mathbb{R}_+)$.
To prove the converse, choose a dense collection $f_1, f_2, \ldots \in C_K^+(S)$, closed
under addition, and assume that $Af_j$ is relatively compact for every $j$. In
particular, $\sup_{x \in A} x_t f_j < \infty$ for all $t \ge 0$ and $j \in \mathbb{N}$, and so by Theorem
A2.3 the set $\{x_t;\ x \in A\}$ is relatively compact in $\mathcal{M}(S)$ for every $t \ge 0$. By
Theorem A2.2 it remains to verify (2), where $w$ is defined in terms of the
complete metric $\rho$ in (3).
If (2) fails, then either we may choose some $x^n \in A$ and $t_n \to 0$ with
$\limsup_n \rho(x^n_{t_n}, x^n_0) > 0$, or else there exist some $x^n \in A$ and some bounded
$s_n < t_n < u_n$ with $u_n - s_n \to 0$ such that
$$\limsup_{n \to \infty} \left(\rho(x^n_{s_n}, x^n_{t_n}) \wedge \rho(x^n_{t_n}, x^n_{u_n})\right) > 0. \tag{4}$$
In the former case it is clear from (3) that $\limsup_n |x^n_{t_n} f_j - x^n_0 f_j| > 0$ for
some $j \in \mathbb{N}$, which contradicts the relative compactness of $Af_j$.
Next assume (4). By (3) there exist some $i, j \in \mathbb{N}$ such that
$$\limsup_{n \to \infty} \left(|x^n_{s_n} f_i - x^n_{t_n} f_i| \wedge |x^n_{t_n} f_j - x^n_{u_n} f_j|\right) > 0. \tag{5}$$
Now for any $a, a', b, b' \in \mathbb{R}$, we have
$$\tfrac{1}{2}(|a| \wedge |b'|) \le (|a| \wedge |a'|) \vee (|b| \wedge |b'|) \vee (|a + a'| \wedge |b + b'|).$$
Since the set $\{f_k\}$ is closed under addition, (5) then implies the same
relation with a common $i = j$. But then (2) fails for $Af_i$, which by Theorem
A2.2 contradicts the relative compactness of $Af_i$. Thus, (2) does hold for
$A$, and so $A$ is relatively compact. □
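The elementary inequality used in the last step, read with the factor $\tfrac{1}{2}$ on the left as $\tfrac{1}{2}(|a| \wedge |b'|) \le (|a| \wedge |a'|) \vee (|b| \wedge |b'|) \vee (|a + a'| \wedge |b + b'|)$, can be sanity-checked numerically. The sketch below is illustrative only; the sample size, the seed, and the names `lhs` and `rhs` are arbitrary choices. The point `tight` shows that the factor $\tfrac{1}{2}$ cannot be improved.

```python
import random

def lhs(a, a2, b, b2):
    """Left-hand side: (1/2) (|a| ∧ |b'|), with a2 = a' and b2 = b'."""
    return 0.5 * min(abs(a), abs(b2))

def rhs(a, a2, b, b2):
    """Right-hand side: (|a| ∧ |a'|) ∨ (|b| ∧ |b'|) ∨ (|a + a'| ∧ |b + b'|)."""
    return max(
        min(abs(a), abs(a2)),
        min(abs(b), abs(b2)),
        min(abs(a + a2), abs(b + b2)),
    )

rng = random.Random(0)  # fixed seed, so that the check is reproducible
samples = [tuple(rng.uniform(-1, 1) for _ in range(4)) for _ in range(10000)]
violations = [x for x in samples if lhs(*x) > rhs(*x) + 1e-12]

# Equality holds at (a, a', b, b') = (1, -1/2, -1/2, 1).
tight = (1.0, -0.5, -0.5, 1.0)
```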
Given an lcscH space $S$, we introduce the classes $\mathcal{G}$, $\mathcal{F}$, and $\mathcal{K}$ of open,
closed, and compact subsets, respectively. Here we may consider $\mathcal{F}$ as a
space in its own right, endowed with the Fell topology generated by the
sets $\{F \in \mathcal{F};\ F \cap G \ne \emptyset\}$ and $\{F \in \mathcal{F};\ F \cap K = \emptyset\}$ for arbitrary $G \in \mathcal{G}$ and
$K \in \mathcal{K}$. To describe the corresponding notion of convergence, we may fix a
metrization $\rho$ of the topology in $S$ such that every closed $\rho$-ball is compact.
Theorem A2.5 (Fell topology) Fix any lcscH space $S$, and let $\mathcal{F}$ be the
class of closed sets $F \subset S$, endowed with the Fell topology. Then
(i) $\mathcal{F}$ is compact, second-countable, and Hausdorff;
(ii) $F_n \to F$ in $\mathcal{F}$ iff $\rho(s, F_n) \to \rho(s, F)$ for all $s \in S$;
(iii) $\{F \in \mathcal{F};\ F \cap B \ne \emptyset\}$ is universally Borel measurable for every $B \in \mathcal{S}$.
Proof: First we show that the Fell topology is generated by the maps
$F \mapsto \rho(s, F)$, $s \in S$. To see that those mappings are continuous, put
$B_{s,r} = \{t \in S;\ \rho(s, t) < r\}$, and note that
$$\{F;\ \rho(s, F) < r\} = \{F;\ F \cap B_{s,r} \ne \emptyset\},$$
$$\{F;\ \rho(s, F) > r\} = \{F;\ F \cap \bar{B}_{s,r} = \emptyset\}.$$
Here the sets on the right are open, by the definition of the Fell topology
and the choice of $\rho$. Thus, the Fell topology contains the $\rho$-topology.
To prove the converse, fix any $F \in \mathcal{F}$ and a net $\{F_i\} \subset \mathcal{F}$ with directed
index set $(I, \prec)$ such that $F_i \to F$ in the $\rho$-topology. We need to show that
convergence holds even in the Fell topology. Then let $G \in \mathcal{G}$ be arbitrary
with $F \cap G \ne \emptyset$. Fix any $s \in F \cap G$. Since $\rho(s, F_i) \to \rho(s, F) = 0$, we may
further choose some $s_i \in F_i$ with $\rho(s, s_i) \to 0$. Since $G$ is open, there exists
some $i \in I$ such that $s_j \in G$ for all $j \succ i$. Then also $F_j \cap G \ne \emptyset$ for all $j \succ i$.
Next consider any $K \in \mathcal{K}$ with $F \cap K = \emptyset$. Define $r_s = \tfrac{1}{2}\rho(s, F)$ for each
$s \in K$ and put $G_s = B_{s,r_s}$. Since $K$ is compact, it is covered by finitely
many balls $G_{s_k}$. For each $k$ we have $\rho(s_k, F_i) \to \rho(s_k, F)$, and so there
exists some $i_k \in I$ such that $F_j \cap G_{s_k} = \emptyset$ for all $j \succ i_k$. Letting $i \in I$ be
such that $i \succ i_k$ for all $k$, it is clear that $F_j \cap K = \emptyset$ for all $j \succ i$.
Now we fix any countable dense set $D \subset S$, and assume that $\rho(s, F_i) \to
\rho(s, F)$ for all $s \in D$. For any $s, s' \in S$ we have
$$|\rho(s, F_j) - \rho(s, F)| \le |\rho(s', F_j) - \rho(s', F)| + 2\rho(s, s').$$
Given any $s$ and $\varepsilon > 0$, we can make the left-hand side $< \varepsilon$, by choosing an
$s' \in D$ with $\rho(s, s') < \varepsilon/3$ and then an $i \in I$ such that $|\rho(s', F_j) - \rho(s', F)| <
\varepsilon/3$ for all $j \succ i$. This shows that the Fell topology is also generated by the
mappings $F \mapsto \rho(s, F)$ with $s$ restricted to $D$. But then $\mathcal{F}$ is homeomorphic
to a subset of $\overline{\mathbb{R}}_+^\infty$, which is second-countable and metrizable.
To prove that $\mathcal{F}$ is compact, it is now enough to show that every sequence
$(F_n) \subset \mathcal{F}$ contains a convergent subsequence. Then choose a subsequence
such that $\rho(s, F_n)$ converges in $\overline{\mathbb{R}}_+$ for all $s \in D$, and hence also for all
$s \in S$. Since the family of functions $\rho(s, F_n)$ is equicontinuous, even the
limit $f$ is continuous, and so the set $F = \{s \in S;\ f(s) = 0\}$ is closed.
To obtain $F_n \to F$, we need to show that whenever $F \cap G \ne \emptyset$ or
$F \cap K = \emptyset$ for some $G \in \mathcal{G}$ or $K \in \mathcal{K}$, the same relation eventually holds
even for $F_n$. In the former case, we may fix any $s \in F \cap G$ and note that
$\rho(s, F_n) \to f(s) = 0$. Hence, we may choose some $s_n \in F_n$ with $s_n \to s$,
and since $s_n \in G$ for large $n$, we get $F_n \cap G \ne \emptyset$. In the latter case, we
assume that instead $F_n \cap K \ne \emptyset$ along a subsequence. Then there exist some
$s_n \in F_n \cap K$, and we note that $s_n \to s \in K$ along a further subsequence.
Here $0 = \rho(s_n, F_n) \to \rho(s, F)$, which yields the contradiction $s \in F \cap K$.
This completes the proof of (i).
To prove (iii), we note that the mapping $(s, F) \mapsto \rho(s, F)$ is jointly
continuous and hence Borel measurable. Now $S$ and $\mathcal{F}$ are both separable,
and so the Borel $\sigma$-field in $S \times \mathcal{F}$ agrees with the product $\sigma$-field $\mathcal{S} \otimes \mathcal{B}(\mathcal{F})$.
Since $s \in F$ iff $\rho(s, F) = 0$, it follows that $\{(s, F);\ s \in F\}$ belongs to
$\mathcal{S} \otimes \mathcal{B}(\mathcal{F})$. Hence, so does $\{(s, F);\ s \in F \cap B\}$ for arbitrary $B \in \mathcal{S}$. The
assertion now follows by Theorem A1.4. □
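The criterion in (ii), convergence of the distance functions $\rho(s, F_n) \to \rho(s, F)$, is easy to test for finite closed subsets of the real line. The following sketch makes several simplifying assumptions: $\rho$ is the usual distance on $\mathbb{R}$, the finite list `GRID` stands in for a countable dense set $D$, and the names `dist` and `max_gap` are hypothetical. It looks at $F_n = \{1/n, n\}$, where one point approaches 0 and the other escapes to infinity, so that $F_n \to \{0\}$ in the Fell topology.

```python
import math

def dist(s, closed_set):
    """rho(s, F) on the real line for a finite closed set F; rho(s, empty) = inf."""
    return min((abs(s - t) for t in closed_set), default=math.inf)

# Assumption: a finite grid in [-5, 5] replaces a countable dense set D.
GRID = [k / 4 for k in range(-20, 21)]

def max_gap(closed_a, closed_b, grid=GRID):
    """Largest value of |rho(s, A) - rho(s, B)| over the grid points s."""
    return max(abs(dist(s, closed_a) - dist(s, closed_b)) for s in grid)

# F_n = {1/n, n} compared with the candidate limit F = {0}.
gaps = {n: max_gap([1.0 / n, float(n)], [0.0]) for n in (20, 100, 1000)}

# The maps s -> rho(s, F) are 1-Lipschitz, which yields the bound with 2 rho(s, s').
F = [-1.0, 0.5, 3.0]
lipschitz_ok = all(
    abs(dist(s, F) - dist(t, F)) <= abs(s - t) + 1e-12 for s in GRID for t in GRID
)
```

On the grid the gap is of order $1/n$, so the escaping point $n$ has no influence on the limit. The same convention $\rho(s, \emptyset) = \infty$ makes the empty set a legitimate limit of sets that drift off to infinity.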
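We say that a class $\mathcal{U} \subset \hat{\mathcal{S}}$ is separating if for any $K \subset G$ with $K \in \mathcal{K}$
and $G \in \mathcal{G}$ there exists some $U \in \mathcal{U}$ with $K \subset U \subset G$. A preseparating
class $\mathcal{I} \subset \hat{\mathcal{S}}$ is such that the finite unions of $\mathcal{I}$-sets form a separating class.
When $S$ is Euclidean, we typically choose $\mathcal{I}$ to be a class of intervals or
rectangles and $\mathcal{U}$ as the corresponding class of finite unions.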
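Lemma A2.6 (separation) For any monotone function $h : \hat{\mathcal{S}} \to \mathbb{R}$, the
class $\hat{\mathcal{S}}_h = \{B \in \hat{\mathcal{S}};\ h(B^\circ) = h(\bar{B})\}$ is separating.
Proof: Fix a metric $\rho$ in $S$ such that every closed $\rho$-ball is compact, and
let $K \in \mathcal{K}$ and $G \in \mathcal{G}$ with $K \subset G$. For any $\varepsilon > 0$, define $K_\varepsilon = \{s \in S;\
\rho(s, K) < \varepsilon\}$ and note that $\bar{K}_\varepsilon = \{s \in S;\ \rho(s, K) \le \varepsilon\}$. Since $K$ is compact,
we have $\rho(K, G^c) > 0$, and so $K \subset K_\varepsilon \subset G$ for sufficiently small $\varepsilon > 0$.
From the monotonicity of $h$ it is further clear that $K_\varepsilon \in \hat{\mathcal{S}}_h$ for almost
every $\varepsilon > 0$. □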
We often need the separating class to be countable.
Lemma A2.7 (countable separation) Every separating class $\mathcal{U} \subset \hat{\mathcal{S}}$
contains a countable separating subclass.
Proof: Fix a countable topological base $\mathcal{B} \subset \hat{\mathcal{S}}$, closed under finite unions.
Choose for every $B \in \mathcal{B}$ some compact sets $K_{B,n} \downarrow \bar{B}$ with $K_{B,n}^\circ \supset \bar{B}$, and
then for each pair $(B, n) \in \mathcal{B} \times \mathbb{N}$ some set $U_{B,n} \in \mathcal{U}$ with $\bar{B} \subset U_{B,n} \subset K_{B,n}^\circ$.
The family $\{U_{B,n}\}$ is clearly separating. □
The next result, needed for the proof of Theorem 16.29, relates the vague
and Fell topologies for integer-valued measures and their supports. Let
$\mathcal{N}(S)$ denote the class of locally finite, integer-valued measures on $S$, and
write $\xrightarrow{f}$ for convergence in the Fell topology.
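Proposition A2.8 (supports of measures) Let $\mu, \mu_1, \mu_2, \ldots \in \mathcal{N}(S)$ with
$\operatorname{supp} \mu_n \xrightarrow{f} \operatorname{supp} \mu$, where $S$ is lcscH and $\mu$ is simple. Then
$$\limsup_n\, (\mu_n B \wedge 1) \le \mu B \le \liminf_n\, \mu_n B, \quad B \in \hat{\mathcal{S}}_\mu.$$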
Proof: To prove the left inequality, we may assume that $\mu B = 0$. Since
$B \in \hat{\mathcal{S}}_\mu$, we have even $\mu \bar{B} = 0$, and so $\bar{B} \cap \operatorname{supp} \mu = \emptyset$. By convergence of
the supports we get $\bar{B} \cap \operatorname{supp} \mu_n = \emptyset$ for large enough $n$, which implies
$$\limsup_{n \to \infty}\, (\mu_n B \wedge 1) \le \limsup_{n \to \infty} \mu_n B = 0 = \mu B.$$
To prove the right inequality, we may assume that $\mu B = m > 0$. Since
$\hat{\mathcal{S}}_\mu$ is a separating ring, we may choose a partition $B_1, \ldots, B_m \in \hat{\mathcal{S}}_\mu$ of
$B$ such that $\mu B_k = 1$ for each $k$. Then also $\mu B_k^\circ = 1$ for each $k$, and so
$B_k^\circ \cap \operatorname{supp} \mu \ne \emptyset$. By convergence of the supports we get $B_k^\circ \cap \operatorname{supp} \mu_n \ne \emptyset$
for large enough $n$. Hence,
$$1 \le \liminf_{n \to \infty} \mu_n B_k^\circ \le \liminf_{n \to \infty} \mu_n B_k,$$
and so
$$\mu B = m \le \sum_k \liminf_{n \to \infty} \mu_n B_k \le \liminf_{n \to \infty} \sum_k \mu_n B_k = \liminf_{n \to \infty} \mu_n B.$$
□
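A small numerical example may clarify why the truncation $\mu_n B \wedge 1$ appears on the left of Proposition A2.8. In the sketch below, which rests on choices made only for illustration, $\mu_n$ has unit atoms at $\pm 1/n$ and $\mu$ is a unit atom at 0, so that the supports $\{-1/n, 1/n\}$ converge to $\{0\}$ while the total mass near 0 stays equal to 2. Limits are replaced by maxima and minima over a finite range of $n$, which is a crude proxy rather than a computation of the actual $\limsup$ and $\liminf$.

```python
def mass(measure, a, b):
    """Mass of the open interval (a, b) under a measure given as {atom: multiplicity}."""
    return sum(k for s, k in measure.items() if a < s < b)

def mu_n(n):
    """Hypothetical approximating measure with unit atoms at -1/n and 1/n."""
    return {-1.0 / n: 1, 1.0 / n: 1}

mu = {0.0: 1}  # simple limit measure: a single unit atom at 0

def bounds(a, b, tail=range(50, 200)):
    """Finite-range proxies for limsup (mu_n B ∧ 1), mu B, and liminf mu_n B."""
    values = [mass(mu_n(n), a, b) for n in tail]
    return max(min(v, 1) for v in values), mass(mu, a, b), min(values)

inner = bounds(-1.0, 1.0)  # B = (-1, 1) contains the atom of mu
outer = bounds(1.0, 2.0)   # B = (1, 2) is disjoint from supp mu
```

For $B = (-1, 1)$ the three quantities are 1, 1, and 2, so the untruncated inequality $\limsup_n \mu_n B \le \mu B$ would fail in this example.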
To state the next result, fix any metric spaces $S_1, S_2, \ldots$, and introduce
the product spaces $S^n = S_1 \times \cdots \times S_n$ and $S = S_1 \times S_2 \times \cdots$ endowed with
their product topologies. For any $m \le n < \infty$, let $\pi_m$ and $\pi_{mn}$ denote the
natural projections of $S$ and $S^n$ onto $S^m$. The sets $A_n \subset S^n$, $n \in \mathbb{N}$, are
said to form a projective sequence if $\pi_{mn} A_n \subset A_m$ for all $m \le n$. We may
then define their projective limit in $S$ as the set $A = \bigcap_n \pi_n^{-1} A_n$.
Lemma A2.9 (projective limits) For any metric spaces $S_1, S_2, \ldots$, consider
a projective sequence of nonempty, compact sets $K_n \subset S_1 \times \cdots \times S_n$,
$n \in \mathbb{N}$. Then the projective limit $K = \bigcap_n \pi_n^{-1} K_n$ is again nonempty and
compact.
Proof: Since the $K_n$ are nonempty, we may choose some sequences $x^n =
(x^n_m) \in \pi_n^{-1} K_n$, $n \in \mathbb{N}$. By the projective property of the sets $K_m$, we
have $\pi_m x^n \in K_m$ for all $m \le n$. In particular, the sequence $x^1_m, x^2_m, \ldots$
is relatively compact in $S_m$ for each $m \in \mathbb{N}$, and by a diagonal argument
we may choose a subsequence $N' \subset \mathbb{N}$ and an element $x = (x_m) \in S$ such
that $x^n \to x$ as $n \to \infty$ along $N'$. Then also $\pi_m x^n \to \pi_m x$ along $N'$ for
each $m \in \mathbb{N}$, and since the $K_m$ are closed, we conclude that $\pi_m x \in K_m$
for all $m$. Thus, we have $x \in K$, which shows that $K$ is nonempty. The
compactness of $K$ may be proved by the same argument, where we assume
that $x^1, x^2, \ldots \in K$. □
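For finite spaces $S_k$ the projective property and the sets $\pi_{mn} K_n$ can be computed directly. The sketch below takes $S_k = \{0, 1, 2\}$ and a hypothetical family $K_n$ in which the symbol 2 is allowed only in the last coordinate; it checks projectivity up to a fixed depth and lists the elements of $K_3$ that extend to depth 7. This only approximates the projective limit from finitely many levels and makes no claim beyond the chosen depth.

```python
from itertools import product

def K(n):
    """Hypothetical compact set K_n: ternary n-tuples whose adjacent entries
    differ and in which the symbol 2 may occur only in the last coordinate."""
    return {
        x for x in product(range(3), repeat=n)
        if all(x[i] != 2 and x[i] != x[i + 1] for i in range(n - 1))
    }

def project(tuples, m):
    """Natural projection pi_{mn}: keep the first m coordinates."""
    return {x[:m] for x in tuples}

DEPTH = 7  # assumption: levels 1..7 are enough for the illustration

# Projective property: pi_{mn} K_n is contained in K_m for all m <= n.
is_projective = all(
    project(K(n), m) <= K(m) for n in range(1, DEPTH + 1) for m in range(1, n + 1)
)

# Elements of K_3 that still have an extension in K_7.
survivors = project(K(DEPTH), 3)
```

Here $K_3$ has four elements, but only the two alternating tuples survive, in agreement with the fact that the projective limit of this family consists of the two alternating sequences of 0s and 1s.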
Historical and Bibliographical Notes
The following notes were prepared with the modest intentions
of tracing the origins of some of the basic ideas in each chap-
ter, of giving precise references for the main results cited in the
text, and of suggesting some literature for further reading. No
completeness is claimed, and knowledgeable readers are likely
to notice misinterpretations and omissions, for which I apologize
in advance. A comprehensive history of modern probability
theory still remains to be written.
1. Measure Theory - Basic Notions
The first author to consider measures in the modern sense was BOREL
(1895, 1898), who constructed Lebesgue measure on the Borel $\sigma$-field in
$\mathbb{R}$. The corresponding integral was introduced by LEBESGUE (1902, 1904),
who also established the dominated convergence theorem. The monotone
convergence theorem and Fatou's lemma were later obtained by LEVI
(1906a) and FATOU (1906), respectively. LEBESGUE also introduced the
higher-dimensional Lebesgue measure and proved a first version of Fubini's
theorem, subsequently generalized by FUBINI (1907) and TONELLI (1909).
The integration theory was extended to general measures and abstract
spaces by many authors, including RADON (1913) and FRECHET (1928).
The norm inequalities in Lemma 1.29 were first noted for finite sums
by HOLDER (1889) and MINKOWSKI (1907), respectively, and were later
extended to integrals by RIESZ (1910). Part (i) for $p = 2$ goes back to
CAUCHY (1821) for finite sums and to BUNIAKOWSKY (1859) for integrals.
The Hilbert space projection theorem can be traced back to LEVI (1906b).
The monotone class Theorem 1.1 was first proved, along with related
results, already by SIERPINSKI (1928), but the result was not used in prob-
ability theory until DYNKIN (1961). More primitive versions had previously
been employed by HALMOS (1950) and DOOB (1953).
Most results in this chapter are well known and can be found in any
textbook on real analysis. Many probability texts, including LOEVE (1977)
and BILLINGSLEY (1995), contain detailed introductions to measure theory.
There are also some excellent texts in real analysis adapted to the needs of
probabilists, such as DUDLEY (1989) and DOOB (1994). The former author
also provides some more detailed historical information.
2. Measure Theory - Key Results
As we have seen, BOREL (1895, 1898) was the first to prove the existence
of one-dimensional Lebesgue measure. However, the modern construction
via outer measures is due to CARATHEODORY (1918).
Functions of bounded variation were introduced by JORDAN (1881), who
proved that any such function is the difference of two nondecreasing func-
tions. The corresponding decomposition of signed measures was obtained
by HAHN (1921). Integrals with respect to nondecreasing functions were
defined by STIELTJES (1894), but their importance was not recognized un-
til RIESZ (1909b) proved his representation theorem for linear functionals
on $C[0, 1]$. The a.e. differentiability of a function of bounded variation was
first proved by LEBESGUE (1904).
VITALI (1905) was the first author to see the connection between absolute
continuity and the existence of a density. The Radon-Nikodym theorem was
then proved in increasing generality by RADON (1913), DANIELL (1920),
and NIKODYM (1930). The idea of a combined proof that also establishes
the Lebesgue decomposition is due to VON NEUMANN.
Invariant measures on specific groups were early identified through ex-
plicit computation by many authors, notably by HURWITZ (1897) for the
case of SO(n). HAAR (1933) proved the existence (but not the uniqueness)
of invariant measures on an arbitrary lcscH group. The modern treatment
originated with WEIL (1940), and excellent expositions can be found in
many books on real or harmonic analysis. Invariant measures on more gen-
eral spaces are usually approached via quotient spaces. Our discussion in
Theorem 2.29 is adapted from ROYDEN (1988).
3. Processes, Distributions, and Independence
The use of countably additive probability measures dates back to BOREL
(1909), who constructed random variables as measurable functions on the
Lebesgue unit interval and proved Theorem 3.18 for independent events.
CANTELLI (1917) noticed that the "easy" part remains true without the
independence assumption. Lemma 3.5 was proved by JENSEN (1906) after
HOLDER had obtained a special case.
The modern framework, with random variables as measurable functions
on an abstract probability space $(\Omega, \mathcal{A}, P)$ and with expected values as
$P$-integrals over $\Omega$, was used implicitly by KOLMOGOROV from (1928) on and
was later formalized in KOLMOGOROV (1933). The latter monograph also
contains Kolmogorov's zero-one law, discovered long before HEWITT and
SAVAGE (1955) obtained theirs.
Early work in probability theory deals with properties depending only on
the finite-dimensional distributions. WIENER (1923) was the first author to
construct the distribution of a process as a measure on a function space.
The general continuity criterion in Theorem 3.23, essentially due to
KOLMOGOROV, was first published by SLUTSKY (1937), with minor extensions
later added by LOEVE (1978) and CHENTSOV (1956). The general search
for regularity properties was initiated by DOOB (1937, 1947). Soon it be-
came clear, especially through the work of LEVY (1934-35, 1954), DOOB
(1951, 1953), and KINNEY (1953), that most processes of interest have
right-continuous versions with left-hand limits.
More detailed accounts of the material in this chapter appear in many
textbooks, such as in BILLINGSLEY (1995), ITO (1984), and WILLIAMS
(1991). Further discussions of specific regularity properties appear in
LOEVE (1977) and CRAMER and LEADBETTER (1967). Earlier texts tend
to give more weight to distribution functions and their densities, less weight
to measures and $\sigma$-fields.
4. Random Sequences, Series, and Averages
The weak law of large numbers was first obtained by BERNOULLI (1713)
for the sequences named after him. More general versions were then estab-
lished with increasing rigor by BIENAYME (1853), CHEBYSHEV (1867), and
MARKOV (1899). A necessary and sufficient condition for the weak law of
large numbers was finally obtained by KOLMOGOROV (1928-29).
KHINCHIN and KOLMOGOROV (1925) studied series of independent, dis-
crete random variables and showed that convergence holds under the
condition in Lemma 4.16. KOLMOGOROV (1928-29) then obtained his max-
imum inequality and showed that the three conditions in Theorem 4.18
are necessary and sufficient for a.s. convergence. The equivalence with
convergence in distribution was later noted by LEVY (1954).
The strong law of large numbers for Bernoulli sequences was stated by
BOREL (1909), but the first rigorous proof is due to FABER (1910). The
simple criterion in Corollary 4.22 was obtained in KOLMOGOROV (1930).
In (1933) KOLMOGOROV showed that existence of the mean is necessary
and sufficient for the strong law of large numbers for general i.i.d. sequences.
The extension to exponents $p \ne 1$ is due to MARCINKIEWICZ and ZYGMUND
(1937). Proposition 4.24 was proved in stages by GLIVENKO (1933) and
CANTELLI (1933).
RIESZ (1909a) introduced the notion of convergence in measure, which for
probability measures is equivalent to convergence in probability, and showed that
it implies a.e. convergence along a subsequence. The weak compactness
criterion in Lemma 4.13 is due to DUNFORD (1939). The functional rep-
resentation of Proposition 4.31 appeared in KALLENBERG (1996a), and
Corollary 4.32 was given by STRICKER and YOR (1978).
The theory of weak convergence was founded by ALEXANDROV (1940-
43), who proved in particular the so-called Portmanteau Theorem 4.25. The
continuous mapping Theorem 4.27 was obtained for a single function
$f$ by MANN and WALD (1943) and then in the general case by PROHOROV
(1956) and RUBIN. The coupling Theorem 4.30 is due for complete $S$ to
SKOROHOD (1956) and in general to DUDLEY (1968).
More detailed accounts of the material in this chapter may be found in
many textbooks, such as in LOEVE (1977) and CHOW and TEICHER (1997).
Additional results on random series and a.s. convergence appear in STOUT
(1974) and KWAPIEN and WOYCZYNSKI (1992).
5. Characteristic Functions and Classical Limit
Theorems
The central limit theorem (a name first used by PÓLYA (1920)) has a long
and glorious history, beginning with the work of DE MOIVRE (1733-56),
who obtained the now-familiar approximation of binomial probabilities in
terms of the normal density function. LAPLACE (1774, 1812-20) stated the
general result in the modern integrated form, but his proof was incomplete,
as was the proof of CHEBYSHEV (1867, 1890).
The first rigorous proof was given by LIAPOUNOV (1901), though under
an extra moment condition. Then LINDEBERG (1922a) proved his funda-
mental Theorem 5.12, which in turn led to the basic Proposition 5.9 in
a series of papers by LINDEBERG (1922b) and LEVY (1922a-c). BERN-
STEIN (1927) obtained the first extension to higher dimensions. The general
problem of normal convergence, regarded for two centuries as the cen-
tral (indeed the only) theoretical problem in probability, was eventually
solved in the form of Theorem 5.15, independently by FELLER (1935) and
LEVY (1935a). Slowly varying functions were introduced and studied by
KARAMATA (1930).
Though characteristic functions have been used in probability theory
ever since LAPLACE (1812-20), their first use in a rigorous proof of a limit
theorem had to wait until LIAPOUNOV (1901). The first general continuity
theorem was established by LEVY (1922c), who assumed the characteristic
functions to converge uniformly in some neighborhood of the origin. The
definitive version in Theorem 5.22 is due to BOCHNER (1933). Our direct
approach to Theorem 5.3 may be new, in avoiding the relatively deep HELLY
selection theorem (1911-12). The basic Corollary 5.5 was noted by CRAMÉR
and WOLD (1936).
Introductions to characteristic functions and classical limit theorems may
be found in many textbooks, notably LOEVE (1977). FELLER (1971) is
a rich source of further information on Laplace transforms, characteris-
tic functions, and classical limit theorems. For more detailed or advanced
results on characteristic functions, see LUKACS (1970).
6. Conditioning and Disintegration
Though conditional densities have been computed by statisticians ever since
LAPLACE (1774), the first general approach to conditioning was devised
by KOLMOGOROV (1933), who defined conditional probabilities and ex-
pectations as random variables on the basic probability space, using the
Radon-Nikodym theorem, which had recently become available. His orig-
inal notion of conditioning with respect to a random vector was extended
by HALMOS (1950) to general random elements and then by DOOB (1953)
to abstract sub-$\sigma$-fields.
Our present Hilbert space approach to conditioning, essentially due to
VON NEUMANN (1940), is more elementary and intuitive and avoids the
use of the relatively deep Radon-Nikodym theorem. It has the further
advantage of leading to the attractive interpretation of a martingale as
a projective family of random variables.
The existence of regular conditional distributions was studied by several
authors, beginning with DOOB (1938). It leads immediately to the familiar
disintegration of measures on product spaces and to the frequently used
but rarely stated disintegration Theorem 6.4.
Measures on infinite product spaces were first considered by DANIELL
(1918-19, 1919-20), who proved the extension Theorem 6.14 for countable
product spaces. KOLMOGOROV (1933) extended the result to arbitrary
index sets. LOMNICKI and ULAM (1934) noted that no topological assump-
tions are needed for the construction of infinite product measures, a result
that was later extended by C.T. IONESCU TULCEA (1949-50) to measures
specified by a sequence of conditional distributions.
The interpretation of the simple Markov property in terms of conditional
independence was indicated already by MARKOV (1906), and the formal
statement of Proposition 6.6 appears in DOOB (1953). Further properties
of conditional independence have been listed by DÖHLER (1980) and others.
The transfer Theorem 6.10, in the present form quoted from KALLENBERG
(1988), may have been first noted by THORISSON.
The traditional Radon-Nikodym approach to conditional expectations
appears in many textbooks, such as in BILLINGSLEY (1995).
7. Martingales and Optional Times
Martingales were first introduced by BERNSTEIN (1927, 1937) in his efforts
to relax the independence assumption in the classical limit theorems. Both
BERNSTEIN and LEVY (1935a-b, 1954) extended Kolmogorov's maximum
inequality and the central limit theorem to a general martingale context.
The term martingale (originally denoting part of a horse's harness and later
used for a special gambling system) was introduced in the probabilistic
context by VILLE (1939).
The first martingale convergence theorem was obtained by JESSEN (1934)
and LEVY (1935b), both of whom proved Theorem 7.23 for filtrations
generated by sequences of independent random variables. A submartin-
gale version of the same result appears in SPARRE-ANDERSEN and JESSEN
(1948). The independence assumption was removed by LEVY (1954), who
also noted the simple martingale proof of Kolmogorov's zero-one law and
obtained his conditional version of the Borel-Cantelli lemma.
The general convergence theorem for discrete-time martingales was
proved by DOOB (1940), and the basic regularity theorems for continuous-time
martingales first appeared in DOOB (1951). The theory was extended
to submartingales by SNELL (1952) and DOOB (1953). The latter book
is also the original source of such fundamental results as the martingale
closure theorem, the optional sampling theorem, and the $L^p$-inequality.
Though hitting times have long been used informally, general optional
times seem to appear for the first time in DOOB (1936). Abstract filtrations
were not introduced until DOOB (1953). Progressive processes were introduced
by DYNKIN (1961), and the modern definition of the $\sigma$-fields $\mathcal{F}_\tau$ is
due to YUSHKEVICH.
Elementary introductions to martingale theory are given by many
authors, including WILLIAMS (1991). More information about the discrete-
time case is given by NEVEU (1975) and CHOW and TEICHER (1997). For a
detailed account of the continuous-time theory and its relations to Markov
processes and stochastic calculus, see DELLACHERIE and MEYER (1975-87).
8. Markov Processes and Discrete-Time Chains
Markov chains in discrete time and with finitely many states were intro-
duced by MARKOV (1906), who proved the first ergodic theorem, assuming
the transition probabilities to be strictly positive. KOLMOGOROV (1936a-
b) extended the theory to countable state spaces and arbitrary transition
probabilities. In particular, he noted the decomposition of the state space
into irreducible sets, classified the states with respect to recurrence and
periodicity, and described the asymptotic behavior of the n-step transition
probabilities. Kolmogorov's original proofs were analytic. The more intu-
itive coupling approach was introduced by DOEBLIN (1938), long before
the strong Markov property had been formalized.
BACHELIER had noted the connection between random walks and diffusions,
which inspired KOLMOGOROV (1931a) to give a precise definition
of Markov processes in continuous time. His treatment is purely analytic,
with the distribution specified by a family of transition kernels satisfying
the Chapman-Kolmogorov relation, previously noted in special cases by
CHAPMAN (1928) and SMOLUCHOVSKY.
KOLMOGOROV (1931a) makes no reference to sample paths. The transi-
tion to probabilistic methods began with the work of LEVY (1934-35) and
DOEBLIN (1938). Though the strong Markov property was used informally
by those authors (and indeed already by BACHELIER (1900, 1901)), the
result was first stated and proved in a special case by DOOB (1945). Gen-
eral filtrations were introduced in Markov process theory by BLUMENTHAL
(1957). The modern setup, with a canonical process $X$ defined on the path
space $\Omega$, equipped with a filtration $\mathcal{F}$, a family of shift operators $\theta_t$, and
a collection of probability measures $P_x$, was developed systematically by
DYNKIN (1961, 1965). A weaker form of Theorem 8.23 appears in BLUMEN-
THAL and GETOOR (1968), and the present version is from KALLENBERG
(1987, 1998).
Elementary introductions to Markov processes appear in many text-
books, such as ROGERS and WILLIAMS (2000a) and CHUNG (1982). More
detailed or advanced accounts are given by DYNKIN (1965), BLUMEN-
THAL and GETOOR (1968), ETHIER and KURTZ (1986), DELLACHERIE and
MEYER (1975-87), and SHARPE (1988). FELLER (1968) gives a masterly in-
troduction to Markov chains, later imitated by many authors. More detailed
accounts of the discrete-time theory appear in KEMENY et al. (1966) and
FREEDMAN (1971a). The coupling method fell into oblivion after Doeblin's
untimely death in 1940 but has recently enjoyed a revival, meticulously
documented by LINDVALL (1992) and THORISSON (2000).
9. Random Walks and Renewal Theory
Random walks originally arose in a wide range of applications, such as gam-
bling, queuing, storage, and insurance; their history can be traced back to
the origins of probability. The approximation of diffusion processes by random
walks dates back to BACHELIER (1900, 1901). A further application
was to potential theory, where in the 1920s a method of discrete approxi-
mation was devised, admitting a probabilistic interpretation in terms of a
simple symmetric random walk. Finally, random walks played an important
role in the sequential analysis developed by WALD (1947).
The modern theory began with PÓLYA'S (1921) discovery that a simple
symmetric random walk on $\mathbb{Z}^d$ is recurrent for $d \le 2$ and transient otherwise.
His result was later extended to Brownian motion by LEVY (1940)
and KAKUTANI (1944a). The general recurrence criterion in Theorem 9.4
was derived by CHUNG and FUCHS (1951), and the probabilistic approach
to Theorem 9.2 was found by CHUNG and ORNSTEIN (1962). The first con-
dition in Corollary 9.7 is, in fact, even necessary for recurrence, as was
noted independently by ORNSTEIN (1969) and C.J. STONE (1969).
The reflection principle was first used by ANDRE (1887) in his discussion
of the ballot problem. The systematic study of fluctuation and absorption
problems for random walks began with the work of POLLACZEK (1930).
Ladder times and heights, first introduced by BLACKWELL, were explored in
an influential paper by FELLER (1949). The factorizations in Theorem 9.15
were originally derived by the Wiener-Hopf technique, which had been de-
veloped by PALEY and WIENER (1934) as a general tool in Fourier analysis.
Theorem 9.16 is due for u = 0 to SPARRE-ANDERSEN (1953-54) and in gen-
eral to BAXTER (1961). The former author used complicated combinatorial
methods, which were later simplified by FELLER and others.
Though renewals in Markov chains are implicit already in some early
work of KOLMOGOROV and LEVY, the general renewal process was ap-
parently first introduced by PALM (1943). The first renewal theorem was
obtained by ERDOS et al. (1949) for random walks on $\mathbb{Z}_+$. In that case,
however, CHUNG noted that the result is an easy consequence of KOL-
MOGOROV'S (1936a-b) ergodic theorem for Markov chains on a countable
state space. BLACKWELL (1948, 1953) extended the result to random walks
on $\mathbb{R}_+$. The ultimate version for transient random walks on $\mathbb{R}$ is due to
FELLER and OREY (1961). The first coupling proof of Blackwell's theorem
was given by LINDVALL (1977). Our proof is a modification of an argu-
ment by ATHREYA et al. (1978), which originally did not cover all cases.
The method seems to require the existence of a possibly infinite mean. An
analytic approach to the general case appears in FELLER (1971).
Elementary introductions to random walks are given by many authors,
including CHUNG (1974), FELLER (1968, 1971), and LOEVE (1977). A
detailed exposition of random walks on $\mathbb{Z}^d$ is given by SPITZER (1976).
10. Stationary Processes and Ergodic Theory
The history of ergodic theory dates back to BOLTZMANN'S (1887) work
in statistical mechanics. Boltzmann's ergodic hypothesis--the conjectural
equality between time and ensemble averages-was long accepted as a
heuristic principle. In probabilistic terms it amounts to the convergence
$t^{-1} \int_0^t f(X_s)\,ds \to Ef(X_0)$, where $X_t$ represents the state of the system
(typically the configuration of all molecules in a gas) at time t, and the
expected value is computed with respect to a suitably invariant probability
measure on a compact submanifold of the state space.
The ergodic hypothesis was sensationally proved as a mathematical theorem,
first in an $L^2$-version by VON NEUMANN (1932), after KOOPMAN
(1931) had noted the connection between measure-preserving transforma-
tions and unitary operators on a Hilbert space, and shortly afterwards
in the pointwise form of BIRKHOFF (1932). The initially quite intricate
proof of the latter was simplified in stages: first by YOSIDA and KAKUTANI
(1939), who noted how the result follows easily from the maximal ergodic
Lemma 10.7, and then by GARSIA (1965), who gave a short proof of the
latter result. KHINCHIN (1933, 1934) pioneered a translation of the results
of ergodic theory into the probabilistic setting of stationary sequences and
processes.
The first multivariate ergodic theorem was obtained by WIENER (1939),
who proved Theorem 10.14 in the special case of averages over concentric
balls. More general versions were established by many authors, including
DAY (1942) and PITT (1942). The classical methods were pushed to the
limit in a notable paper by TEMPEL'MAN (1972). NGUYEN and ZESSIN
(1979) proved versions of the theorem for finitely additive set functions. The
first ergodic theorem for noncommutative transformations was obtained by
ZYGMUND (1951). SUCHESTON (1983) noted that the statement follows
easily from MAKER'S (1940) result. In Lemma 10.15, part (i) is due to
ROGERS and SHEPHARD (1958); part (ii) is elementary.
The ergodic theorem for random matrices was proved by FURSTENBERG
and KESTEN (1960), long before the subadditive ergodic theorem became
available. The latter result was originally proved by KINGMAN (1968) under
the stronger hypothesis that the array (Xm,n) be jointly stationary in m
and n. The present extension and shorter proof are due to LIGGETT (1985).
The ergodic decomposition of invariant measures dates back to KRYLOV
and BOGOLIOUBOV (1937), though the basic role of the invariant σ-field
was not recognized until the work of FARRELL (1962) and VARADARAJAN
(1963). The connection between ergodic decompositions and sufficient
statistics is explored in an elegant paper by DYNKIN (1978). The traditional
approach to the subject is via Choquet theory, as surveyed by
DELLACHERIE and MEYER (1975-87).
The coupling equivalences in Theorem 10.27 (i) were proved by S. GOLD-
STEIN (1979), after GRIFFEATH (1975) had obtained a related result for
Markov chains. The shift coupling part of the same theorem was estab-
lished by BERBEE (1979) and ALDOUS and THORISSON (1993), and the
version for abstract groups was then obtained by THORISSON (1996). The
latter author surveyed the whole area in (2000).
Elementary introductions to stationary processes have been given by
many authors, beginning with DOOB (1953) and CRAMER and LEADBETTER
(1967). LOEVE (1978) contains a more advanced account of
probabilistic ergodic theory. A modern and comprehensive survey of the
vast area of general ergodic theorems is given by KRENGEL (1985).
11. Related Notions of Symmetry and Invariance
Palm distributions are named after the Swedish engineer PALM (1943), who
in a pioneering study of intensity fluctuations in telephone traffic consid-
ered some basic Palm probabilities associated with simple, stationary point
processes on R, using an elementary conditioning approach. Palm also
derived some primitive inversion formulas. An extended and more rigorous
account of Palm's ideas was given by KHINCHIN (1955), in a monograph
on queuing theory.
Independently of Palm's work, KAPLAN (1955) first obtained Theorem
11.4 as an extension of some results for renewal processes by DOOB (1948).
A partial discrete-time result in this direction had already been noted
by KAC (1947). Kaplan's result was rediscovered in the setting of Palm
distributions, independently by RYLL-NARDZEWSKI (1961) and SLIVNYAK
(1962). In the special case of intervals on the real line, Theorem 11.5 (i) was
first noted by KOROLYUK (as cited by KHINCHIN (1955)), and part (iii) of
the same theorem was obtained by RYLL-NARDZEWSKI (1961). The general
versions are due to KONIG and MATTHES (1963) and MATTHES (1963) for
d = 1 and to MATTHES et al. (1978) for d > 1. A more primitive setwise
version of Theorem 11.8 (i), due to SLIVNYAK (1962), was strengthened by
ZAHLE (1980) to convergence in total variation.
DE FINETTI (1930, 1937) proved that an infinite sequence of exchangeable
random variables is mixed i.i.d. The result became a cornerstone in his the-
ory of subjective probability and Bayesian statistics. RYLL-NARDZEWSKI
(1957) noted that the theorem remains valid under the weaker hypothesis
of spreadability, and BUHLMANN (1960) extended the result to continuous
time. The predictable sampling property in Theorem 11.13 was first noted
by DOOB (1936) for i.i.d. random variables and increasing sequences of
predictable times. The general result and its continuous-time counterpart
appear in KALLENBERG (1988). SPARRE-ANDERSEN'S (1953-54) announce-
ment of his Corollary 11.14 was (according to Feller) "a sensation greeted
with incredulity, and the original proof was of an extraordinary intricacy
and complexity." A simplified argument (different from ours) appears in
FELLER (1971). Lemma 11.9 is quoted from KALLENBERG (1999b).
BERTRAND (1887) noted that if two candidates A and B in an election
get the proportions p and 1 - p of the votes, then the probability that A will
lead throughout the counting of ballots equals (2p - 1) ∨ 0. More general
"ballot theorems" and alternative proofs have been discovered by many au-
thors, beginning with ANDRE (1887) and BARBIER (1887). TAKACS (1967)
obtained the version for cyclically stationary processes on a finite interval
and gave numerous applications to queuing theory. The present statement
is cited from KALLENBERG (1999a).
The first version of Theorem 11.18 was obtained by SHANNON (1948),
who proved the convergence in probability for stationary and ergodic
Markov chains in a finite state space. The Markovian restriction was lifted
by McMILLAN (1953), who also strengthened the result to convergence in
L^1. CARLESON (1958) extended McMillan's result to countable state spaces.
The a.s. convergence is due to BREIMAN (1957-60) and A. IONESCU TUL-
CEA (1960) for finite state spaces and to CHUNG (1961) for the countable
case.
More information about Palm measures is available in MATTHES et al.
(1978), DALEY and VERE-JONES (1988), and THORISSON (2000). Appli-
cations to queuing theory and other areas are discussed by many authors,
including FRANKEN et al. (1981) and BACCELLI and BREMAUD (1994).
ALDOUS (1985) gives a comprehensive survey of exchangeability theory. A
nice introduction to information theory is given by BILLINGSLEY (1965).
12. Poisson and Pure Jump-Type Markov Processes
The Poisson distribution was introduced by DE MOIVRE (1711-12) and
POISSON (1837) as an approximation to the binomial distribution. The as-
sociated process arose much later from miscellaneous applications. Thus, it
was considered by LUNDBERG (1903) to model streams of insurance claims,
by RUTHERFORD and GEIGER (1908) to describe the process of radioactive
decay, and by ERLANG (1909) to model the incoming traffic to a telephone
exchange. Poisson random measures in higher dimensions appear implicitly
in the work of LEVY (1934-35), whose treatment was later formalized by
ITO (1942b).
The independent-increment characterization of Poisson processes goes
back to ERLANG (1909) and LEVY (1934-35). Cox processes, originally
introduced by COX (1955) under the name of doubly stochastic Poisson
processes, were thoroughly explored by KINGMAN (1964), KRICKEBERG
(1972), and GRANDELL (1976). Thinnings were first considered by RENYI
(1956). The binomial construction of general Poisson processes was noted
independently by KINGMAN (1967) and MECKE (1967). One-dimensional
uniqueness criteria were obtained, first in the Poisson case by RENYI (1967),
and then in general by MÖNCH (1971), KALLENBERG (1973a, 1986), and
GRANDELL (1976). The mixed Poisson and binomial processes were studied
extensively by MATTHES et al. (1978) and KALLENBERG (1986).
Markov chains in continuous time have been studied by many authors,
beginning with KOLMOGOROV (1931a). The transition functions of general
pure jump-type Markov processes were explored by POSPISIL (1935-36) and
FELLER (1936, 1940), and the corresponding sample path properties were
examined by DOEBLIN (1939b) and DOOB (1942b). The first continuous-time
version of the strong Markov property was obtained by DOOB (1945).
KINGMAN (1993) gives an elementary introduction to Poisson processes
with numerous applications. More detailed accounts, set in the context of
general random measures and point processes, appear in MATTHES et al.
(1978), KALLENBERG (1986), and DALEY and VERE-JONES (1988). Intro-
ductions to continuous-time Markov chains are provided by many authors,
beginning with FELLER (1968). For a more comprehensive account, see
CHUNG (1960). The underlying regenerative structure was examined by
KINGMAN (1972).
13. Gaussian Processes and Brownian Motion
The Gaussian density function first appeared in the work of DE MOIV-
RE (1733-56), and the corresponding distribution became explicit through
the work of LAPLACE (1774, 1812-20). The Gaussian law was popularized
by GAUSS (1809) in his theory of errors and so became named after him.
MAXWELL derived the Gaussian law as the velocity distribution for the
molecules in a gas, assuming the hypotheses of Proposition 13.2. Theorem
13.3 was originally stated by SCHOENBERG (1938) as a relation between
positive definite and completely monotone functions; the probabilistic in-
terpretation was later noted by FREEDMAN (1962-63). Isonormal Gaussian
processes were introduced by SEGAL (1954).
The process of Brownian motion was introduced by BACHELIER (1900,
1901) to model fluctuations on the stock market. Bachelier discovered some
basic properties of the process, such as the relation $M_t \overset{d}{=} |B_t|$. EINSTEIN
(1905, 1906) later introduced the same process as a model for the physical
phenomenon of Brownian motion, the irregular movement of microscopic
particles suspended in a liquid. The latter phenomenon, first noted by VAN
LEEUWENHOEK in the seventeenth century, is named after the botanist
BROWN (1828) for his systematic observations of pollen grains. Einstein's
theory was forwarded in support of the still-controversial molecular theory
of matter. A more refined model for the physical Brownian motion was
proposed by LANGEVIN (1909) and ORNSTEIN and UHLENBECK (1930).
The mathematical theory of Brownian motion was put on a rigorous
basis by WIENER (1923), who constructed the associated distribution as a
measure on the space of continuous paths. The significance of Wiener's rev-
olutionary paper was not fully recognized until after the pioneering work of
KOLMOGOROV (1931a, 1933), LEVY (1934-35), and FELLER (1936). Wiener
also introduced stochastic integrals of deterministic L^2-functions, which
were later studied in further detail by PALEY et al. (1933). The spectral
representation of stationary processes, originally deduced from BOCHNER'S
(1932) theorem by CRAMER (1942), was later recognized as equivalent to a
general Hilbert space result due to M.H. STONE (1932). The chaos expan-
sion of Brownian functionals was discovered by WIENER (1938), and the
theory of multiple integrals with respect to Brownian motion was developed
in a seminal paper of ITO (1951c).
The law of the iterated logarithm was discovered by KHINCHIN, first
(1923, 1924) for Bernoulli sequences, and later (1933) for Brownian motion.
A systematic study of the Brownian paths was initiated by LEVY (1954,
1965), who proved the existence of the quadratic variation in (1940) and
the arcsine laws in (1939, 1965). Though many proofs of the latter have
since been given, the present deduction from basic symmetry properties
may be new. The strong Markov property was used implicitly in the work
of Levy and others, but the result was not carefully stated and proved until
HUNT (1956).
Many modern probability texts contain detailed introductions to Brown-
ian motion. The books by ITO and McKEAN (1965), FREEDMAN (1971b),
KARATZAS and SHREVE (1991), and REVUZ and YOR (1999) provide a
wealth of further information on the subject. Further information on mul-
tiple Wiener-Ito integrals is given by KALLIANPUR (1980), DELLACHERIE
et al. (1992), and NUALART (1995). The advanced theory of Gaussian
distributions is nicely surveyed by ADLER (1990).
14. Skorohod Embedding and Invariance Principles
The first functional limit theorems were obtained in (1931b, 1933a) by KOL-
MOGOROV, who considered special functionals of a random walk. ERDOS
and KAC (1946, 1947) conceived the idea of an invariance principle that
would allow functional limit theorems to be extended from particular cases
to a general setting. They also treated some special functionals of a ran-
dom walk. The first general functional limit theorems were obtained by
DONSKER (1951-52) for random walks and empirical distribution functions,
following an idea of DOOB (1949). A general theory based on sophisti-
cated compactness arguments was later developed by PROHOROV (1956)
and others.
SKOROHOD's (1965) embedding theorem provided a new and probabilis-
tic approach to Donsker's theorem. Extensions to the martingale context
were obtained by many authors, beginning with DUBINS (1968). Lemma
14.19 appears in DVORETZKY (1972). Donsker's weak invariance princi-
ple was supplemented by a strong version due to STRASSEN (1964), which
yields extensions of many a.s. limit theorems for Brownian motion to suitable
random walks. In particular, his result yields a simple proof of the
HARTMAN and WINTNER (1941) law of the iterated logarithm, which had
originally been deduced from some deep results of KOLMOGOROV (1929).
BILLINGSLEY (1968) gives many interesting applications and extensions
of Donsker's theorem. For a wide range of applications of the martin-
gale embedding theorem, see HALL and HEYDE (1980) and DURRETT
(1995). KOMLOS et al. (1975-76) showed that the approximation rate in the
Skorohod embedding can be improved by a more delicate "strong approx-
imation." For an exposition of their work and its numerous applications,
see CSORGO and REVESZ (1981).
15. Independent-Increment Processes and
Approximation
Until the 1920s, Brownian motion and the Poisson process were essentially
the only known processes with independent increments. In (1924, 1925)
LEVY introduced the stable distributions and noted that they too could be
associated with suitable "decomposable" processes. DE FINETTI (1929) saw
the general connection between processes with independent increments and
infinitely divisible distributions and posed the problem of characterizing the
latter. A partial solution for distributions with a finite second moment was
found by KOLMOGOROV (1932).
The complete solution was obtained in a revolutionary paper by LEVY
(1934-35), where the "decomposable" processes are analyzed by a virtuosic
blend of analytic and probabilistic methods, leading to an explicit descrip-
tion in terms of a jump and a diffusion component. As a byproduct, Levy
obtained the general representation for the associated characteristic func-
tions. His analysis was so complete that only improvements in detail have
since been possible. In particular, ITO (1942b) showed how the jump com-
ponent can be expressed in terms of Poisson integrals. Analytic derivations
of the representation formula for the characteristic function were later given
by LEVY (1954) himself, by FELLER (1937), and by KHINCHIN (1937).
The scope of the classical central limit problem was broadened by LEVY
(1925) to a general study of suitably normalized partial sums, obtained
from a single sequence of independent random variables. To include the
case of the classical Poisson approximation, KOLMOGOROV proposed a fur-
ther extension to general triangular arrays, subject to the sole condition
of uniformly asymptotically negligible elements. In this context, FELLER
(1937) and KHINCHIN (1937) proved independently that the limiting distri-
butions are infinitely divisible. It remained to characterize the convergence
to specific limits, a problem that had already been solved in the Gaussian
case by FELLER (1935) and LEVY (1935a). The ultimate solution was ob-
tained independently by DOEBLIN (1939) and GNEDENKO (1939), and a
comprehensive exposition of the theory was published by GNEDENKO and
KOLMOGOROV (1968).
The basic convergence Theorem 15.17 for Levy processes and the as-
sociated approximation result for random walks in Corollary 15.20 are
essentially due to SKOROHOD (1957), though with rather different state-
ments and proofs. Lemma 15.22 appears in DOEBLIN (1939a). Our approach
to the basic representation theorem is a modernized version of Levy's
proof, with simplifications resulting from the use of basic point process
and martingale methods.
Detailed accounts of the basic limit theory for null arrays are provided by
many authors, including LOEVE (1977) and FELLER (1971). The positive
case is treated in KALLENBERG (1986). A modern introduction to Levy
processes is given by BERTOIN (1996). General independent increment pro-
cesses and associated limit theorems are treated in JACOD and SHIRYAEV
(1987). Extreme value theory is surveyed by LEADBETTER et al. (1983).
16. Convergence of Random Processes, Measures, and
Sets
After DONSKER (1951-52) had proved his functional limit theorems for
random walks and empirical distribution functions, a general theory of
weak convergence in function spaces was developed by the Russian school,
in seminal papers by PROHOROV (1956), SKOROHOD (1956, 1957), and
KOLMOGOROV (1956). Thus, PROHOROV (1956) proved his fundamental
compactness Theorem 16.3, in a setting for separable and complete metric
spaces. The abstract theory was later extended in various directions by
LE CAM (1957), VARADARAJAN (1958), and DUDLEY (1966, 1967). The
elementary inequality of OTTAVIANI is from (1939).
Originally SKOROHOD (1956) considered the space D([O,l]) endowed
with four different topologies, of which the J1-topology considered here
is by far the most important for applications. The theory was later ex-
tended to D(R+) by C.J. STONE (1963) and LINDVALL (1973). Tightness
was originally verified by means of various product moment conditions, de-
veloped by CHENTSOV (1956) and BILLINGSLEY (1968), before the powerful
criterion of ALDOUS (1978) became available. KURTZ (1975) and MITOMA
(1983) noted that criteria for tightness in D(R+, S) can often be expressed
in terms of one-dimensional projections, as in Theorem 16.27.
The weak convergence theory for random measures and point processes
originated with PROHOROV (1961), who noted the equivalence of (i) and
(ii) in Theorem 16.16 when S is compact. The development continued with
seminal papers by DEBES et al. (1970-71), HARRIS (1971), and JAGERS
(1974). The one-dimensional criteria in Proposition 16.17 and Theorems
16.16 and 16.29 are based on results in KALLENBERG (1973a, 1986, 1996b)
and a subsequent remark by KURTZ. Random sets had already been stud-
ied extensively by many authors, including CHOQUET (1953-54), KENDALL
(1974), and MATHERON (1975), when an associated weak convergence
theory was developed by NORBERG (1984).
The applications considered in this chapter have a long history. Thus,
primitive versions of Theorem 16.18 were obtained by PALM (1943), KHINCHIN
(1955), and OSOSKOV (1956). The present version is due for S = R to
GRIGELIONIS (1963) and for more general spaces to GOLDMAN (1967) and
JAGERS (1972). Limit theorems under simultaneous thinning and rescaling
of a given point process were obtained by RENYI (1956), NAWROTZKI
(1962), BELYAEV (1963), and GOLDMAN (1967). The general version in
Theorem 16.19 was proved by KALLENBERG (1986) after MECKE (1968)
had obtained his related characterization of Cox processes. Limit theo-
rems for sampling from a finite population and for general exchangeable
sequences have been proved in varying generality by many authors, including
CHERNOFF and TEICHER (1958), HAJEK (1960), ROSEN (1964),
BILLINGSLEY (1968), and HAGBERG (1973). The results of Theorems 16.23
and 16.21 first appeared in KALLENBERG (1973b).
Detailed accounts of weak convergence theory and its applications may be
found in several excellent textbooks and monographs, including BILLINGS-
LEY (1968), POLLARD (1984), ETHIER and KURTZ (1986), and JACOD and
SHIRYAEV (1987). More information on limit theorems for random measures
and point processes is available in MATTHES et al. (1978) and KALLENBERG
(1986). A good general reference for random sets is MATHERON (1975).
17. Stochastic Integrals and Quadratic Variation
The first stochastic integral with a random integrand was defined by ITO
(1942a, 1944), who used Brownian motion as the integrator and assumed
the integrand to be product measurable and adapted. DOOB (1953) noted
the connection with martingale theory. A first version of the fundamental
substitution rule was proved by ITO (1951a). The result was later
extended by many authors. The compensated integral in Corollary 17.21
was introduced by FISK, and independently by STRATONOVICH (1966).
The existence of the quadratic variation process was originally de-
duced from the Doob-Meyer decomposition. FISK (1966) showed how the
quadratic variation can also be obtained directly from the process, as in
Proposition 17.17. The present construction was inspired by ROGERS and
WILLIAMS (2000b). The BDG inequalities were originally proved for p > 1
and discrete time by BURKHOLDER (1966). MILLAR (1968) noted the ex-
tension to continuous martingales, in which context the further extension to
arbitrary p > 0 was obtained independently by BURKHOLDER and GUNDY
(1970) and NOVIKOV (1971). KUNITA and WATANABE (1967) introduced
the covariation of two martingales and proved the associated characteri-
zation of the integral. They further established some general inequalities
related to Proposition 17.9.
The Itô integral was extended to square-integrable martingales by COURREGE
(1962-63) and KUNITA and WATANABE (1967) and to continuous
semimartingales by DOLEANS-DADE and MEYER (1970). The idea of local-
ization is due to ITO and WATANABE (1965). Theorem 17.24 was obtained
by KAZAMAKI (1972) as part of a general theory of random time change.
Stochastic integrals depending on a parameter were studied by DOLEANS
(1967b) and STRICKER and YOR (1978), and the functional representation
of Proposition 17.26 first appeared in KALLENBERG (1996a).
Elementary introductions to Itô integration appear in many textbooks,
such as CHUNG and WILLIAMS (1983) and ØKSENDAL (1998). For more
advanced accounts and for further information, see IKEDA and WATANABE
(1989), ROGERS and WILLIAMS (2000b), KARATZAS and SHREVE (1991),
and REVUZ and YOR (1999).
18. Continuous Martingales and Brownian Motion
The fundamental characterization of Brownian motion in Theorem 18.3 was
proved by LEVY (1954), who also (1940) noted the conformal invariance
up to a time change of complex Brownian motion and stated the polar-
ity of singletons. A rigorous proof of Theorem 18.6 was later provided by
KAKUTANI (1944a-b). KUNITA and WATANABE (1967) gave the first modern
proof of Levy's characterization theorem, based on Itô's formula and
exponential martingales. The history of the latter can be traced back to
the seminal CAMERON and MARTIN (1944) paper, the source of Theorem
18.22, and to WALD'S (1946, 1947) work in sequential analysis, where the
identity of Lemma 18.24 first appeared in a version for random walks.
The integral representation in Theorem 18.10 is essentially due to ITO
(1951c), who noted its connection with multiple stochastic integrals and
chaos expansions. A one-dimensional version of Theorem 18.12 appears in
DOOB (1953). The general time-change Theorem 18.4 was discovered in-
dependently by DAMBIS (1965) and DUBINS and SCHWARZ (1965), and a
systematic study of isotropic martingales was initiated by GETOOR and
SHARPE (1972). The multivariate result in Proposition 18.8 was noted
by KNIGHT (1971), and a version of Proposition 18.9 for general ex-
changeable processes appears in KALLENBERG (1989). The skew-product
representation in Corollary 18.7 is due to GALMARINO (1963).
The Cameron-Martin theorem was gradually extended to more general
settings by many authors, including MARUYAMA (1954, 1955), GIRSANOV
(1960), and VAN SCHUPPEN and WONG (1974). The martingale criterion
of Theorem 18.23 was obtained by NOVIKOV (1972).
The material in this chapter is covered by many texts, including the
excellent monographs by KARATZAS and SHREVE (1991) and REVUZ and
YOR (1999). A more advanced and amazingly informative text is JACOD
(1979).
19. Feller Processes and Semigroups
Semigroup ideas are implicit in KOLMOGOROV'S pioneering (1931a) pa-
per, whose central theme is the search for local characteristics that will
determine the transition probabilities through a system of differential equa-
tions, the so-called Kolmogorov forward and backward equations. Markov
chains and diffusion processes were originally treated separately, but in
(1935) KOLMOGOROV proposed a unified framework, with transition ker-
nels regarded as operators (initially operating on measures rather than on
functions), and with local characteristics given by an associated generator.
Kolmogorov's ideas were taken up by FELLER (1936), who obtained
general existence and uniqueness results for the forward and backward
equations. The abstract theory of contraction semigroups on Banach spaces
was developed independently by HILLE (1948) and YOSIDA (1948), both
of whom recognized its significance for the theory of Markov processes.
The power of the semigroup approach became clear through the work of
FELLER (1952, 1954), who gave a complete description of the generators of
one-dimensional diffusions. In particular, Feller characterizes the boundary
behavior of the process in terms of the domain of the generator.
The systematic study of Markov semigroups began with the work of
DYNKIN (1955a). The standard approach is to postulate strong continuity
instead of the weaker and more easily verified condition (F_2). The positive
maximum principle appears in the work of ITO (1957), and the core
condition of Proposition 19.9 is due to S. WATANABE (1968).
The first regularity theorem was obtained by DOEBLIN (1939b), who
gave conditions for the paths to be step functions. A sufficient condition for
continuity was then obtained by FORTET (1943). Finally, KINNEY (1953)
showed that any Feller process has a version with rcll paths, after DYNKIN
(1952) had obtained the same property under a Hölder condition. The use
of martingale methods for the study of Markov processes dates back to
KINNEY (1953) and DOOB (1954).
The strong Markov property for Feller processes was proved indepen-
dently by DYNKIN and YUSHKEVICH (1956) and by BLUMENTHAL (1957)
after special cases had been considered by DOOB (1945), HUNT (1956), and
RAY (1956). BLUMENTHAL'S (1957) paper also contains his zero-one law.
DYNKIN (1955a) introduced his "characteristic operator," and a version of
Theorem 19.24 appears in DYNKIN (1956).
There is a vast literature on approximation results for Markov chains
and Markov processes, covering a wide range of applications. The use of
semigroup methods to prove limit theorems can be traced back to LINDE-
BERG'S (1922a) proof of the central limit theorem. The general results in
Theorems 19.25 and 19.28 were developed in stages by TROTTER (1958a),
SOVA (1967), KURTZ (1969, 1975), and MACKEVICIUS (1974). Our proof
of Theorem 19.25 uses ideas from J.A. GOLDSTEIN (1976).
A splendid introduction to semigroup theory is given by the relevant
chapters in FELLER (1971). In particular, Feller shows how the one-
dimensional Levy-Khinchin formula and associated limit theorems can be
derived by semigroup methods. More detailed and advanced accounts of
the subject appear in DYNKIN (1965), ETHIER and KURTZ (1986), and
DELLACHERIE and MEYER (1975-87).
20. Ergodic Properties of Markov Processes
The first ratio ergodic theorems were obtained by DOEBLIN (1938b), DOOB
(1938, 1948a), KAKUTANI (1940), and HUREWICZ (1944). HOPF (1954) and
DUNFORD and SCHWARTZ (1956) extended the pointwise ergodic theorem
to general L^1-L^∞-contractions, and the ratio ergodic theorem was extended
to positive L^1-contractions by CHACON and ORNSTEIN (1960). The present
approach to their result is due to AKCOGLU and CHACON (1970).
The notion of Harris recurrence goes back to DOEBLIN (1940) and HAR-
RIS (1956). The latter author used the condition to ensure the existence,
in discrete time, of a σ-finite invariant measure. A corresponding
continuous-time result was obtained by H. WATANABE (1964). The total variation
convergence of Markov transition probabilities was obtained for a count-
able state space by OREY (1959, 1962) and in general by JAMISON and
OREY (1967). BLACKWELL and FREEDMAN (1964) noted the equivalence
of mixing and tail triviality. The present coupling approach goes back to
GRIFFEATH (1975) and S. GOLDSTEIN (1979) for the case of strong ergod-
icity and to BERBEE (1979) and ALDOUS and THORISSON (1993) for the
corresponding weak result.
There is an extensive literature on ergodic theorems for Markov pro-
cesses, mostly dealing with the discrete-time case. General expositions have
been given by many authors, beginning with NEVEU (1971) and OREY
(1971). Our treatment of Harris recurrent Feller processes is adapted from
KUNITA (1990), who in turn follows the discrete-time approach of REVUZ
(1984). KRENGEL (1985) gives a comprehensive survey of abstract
ergodic theorems. Detailed accounts of the coupling method and its various
ramifications appear in LINDVALL (1992) and THORISSON (2000).
21. Stochastic Differential Equations and Martingale
Problems
Long before the existence of any general theory for SDEs, LANGEVIN (1908)
proposed his equation to model the velocity of a Brownian particle. The
solution process was later studied by ORNSTEIN and UHLENBECK (1930)
and was thus named after them. A more rigorous discussion appears in
DOOB (1942a).
The general idea of a stochastic differential equation goes back to BERN-
STEIN (1934, 1938), who proposed a pathwise construction of diffusion
processes by a discrete approximation, leading in the limit to a formal
differential equation driven by a Brownian motion. However, ITO (1942a,
1951b) was the first author to develop a rigorous and systematic theory,
including a precise definition of the integral, conditions for existence and
uniqueness of solutions, and basic properties of the solution process, such
as the Markov property and the continuous dependence on initial state.
Similar results were obtained, later but independently, by GIHMAN (1947,
1950-51).
The notion of a weak solution was introduced by GIRSANOV (1960), and
a version of the weak existence Theorem 21.9 appears in SKOROHOD (1965).
The ideas behind the transformations in Propositions 21.12 and 21.13 date
back to GIRSANOV (1960) and VOLKONSKY (1958), respectively. The no-
tion of a martingale problem can be traced back to LEVY's martingale
characterization of Brownian motion and DYNKIN's theory of the charac-
teristic operator. A comprehensive theory was developed by STROOCK and
VARADHAN (1969), who established the equivalence with weak solutions
to the associated SDEs, obtained general criteria for uniqueness in law,
and deduced conditions for the strong Markov and Feller properties. The
measurability part of Theorem 21.10 is a slight extension of an exercise in
STROOCK and VARADHAN (1979).
YAMADA and WATANABE (1971) proved that weak existence and path-
wise uniqueness imply strong existence and uniqueness in law. Under the
same conditions, they further established the existence of a functional
solution, possibly depending on the initial distribution of the process;
that dependence was later removed by KALLENBERG (1996a). IKEDA and
WATANABE (1989) noted how the notions of pathwise uniqueness and
uniqueness in law extend by conditioning from degenerate to arbitrary
initial distributions.
The basic theory of SDEs is covered by many excellent textbooks on
different levels, including IKEDA and WATANABE (1989), ROGERS and
WILLIAMS (1987), and KARATZAS and SHREVE (1991). More information
on the martingale problem is available in JACOD (1979), STROOCK and
VARADHAN (1979), and ETHIER and KURTZ (1986).
22. Local Time, Excursions, and Additive Functionals
Local time of Brownian motion at a fixed point was discovered and ex-
plored by LEVY (1939), who devised several explicit constructions, mostly
of the type of Proposition 22.12. Much of Levy's analysis is based on
the observation in Corollary 22.3. The elementary Lemma 22.2 is due to
SKOROHOD (1961-62). Formula (1), first noted for Brownian motion by
TANAKA (1963), was taken by MEYER (1976) as the basis for a general
semimartingale approach. The general Itô-Tanaka formula in Theorem 22.5 was
obtained independently by MEYER (1976) and WANG (1977). TROTTER
(1958b) proved that Brownian local time has a jointly continuous version,
and the extension to general continuous semimartingales in Theorem 22.4
was obtained by YOR (1978).
Modern excursion theory originated with the seminal paper of ITO
(1972), which was partly inspired by earlier work of LEVY (1939). In
particular, Itô proved a version of Theorem 22.11, assuming the existence of
local time. HOROWITZ (1972) independently studied regenerative sets and
noted their connection with subordinators, equivalent to the existence of
a local time. A systematic theory of regenerative processes was developed
by MAISONNEUVE (1974). The remarkable Theorem 22.17 was discovered
independently by RAY (1963) and KNIGHT (1963), and the present proof is
essentially due to WALSH (1978). Our construction of the excursion process
is close in spirit to Levy's original ideas and to those in GREENWOOD and
PITMAN (1980).
Elementary additive functionals of integral type had been discussed ex-
tensively in the literature when DYNKIN proposed a study of the general
case. The existence Theorem 22.23 was obtained by VOLKONSKY (1960),
and the construction of local time in Theorem 22.24 dates back to BLUMEN-
THAL and GETOOR (1964). The integral representation of CAFs in Theorem
22.25 was proved independently by VOLKONSKY (1958, 1960) and McK-
EAN and TANAKA (1961). The characterization of additive functionals in
terms of suitable measures on the state space dates back to MEYER (1962),
and the explicit representation of the associated measures was found by
REVUZ (1970) after special cases had been considered by HUNT (1957-58).
An excellent introduction to local time appears in KARATZAS and
SHREVE (1991). The books by ITO and McKEAN (1965) and REVUZ and
YOR (1999) contain an abundance of further information on the subject.
The latter text may also serve as a good introduction to additive func-
tionals and excursion theory. For more information on the latter topics,
the reader may consult BLUMENTHAL and GETOOR (1968), BLUMENTHAL
(1992), and DELLACHERIE et al. (1992).
23. One-Dimensional SDEs and Diffusions
The study of continuous Markov processes and the associated parabolic dif-
ferential equations, initiated by KOLMOGOROV (1931a) and FELLER (1936),
took a new direction with the seminal papers of FELLER (1952, 1954), who
studied the generators of one-dimensional diffusions within the framework
of the newly developed semigroup theory. In particular, Feller gave a com-
plete description in terms of scale function and speed measure, classified
the boundary behavior, and showed how the latter is determined by the
domain of the generator. Finally, he identified the cases when explosion
occurs, corresponding to the absorption cases in Theorem 23.15.
A more probabilistic approach to these results was developed by DYNKIN
(1955b, 1959), who along with RAY (1956) continued Feller's study of the
relationship between analytic properties of the generator and sample path
properties of the process. The idea of constructing diffusions on a natural
scale through a time change of Brownian motion is due to HUNT (1958)
and VOLKONSKY (1958), and the full description in Theorem 23.9 was com-
pleted by VOLKONSKY (1960) and ITO and McKEAN (1965). The present
stochastic calculus approach is based on ideas in MELEARD (1986).
The ratio ergodic Theorem 23.14 was first obtained for Brownian motion
by DERMAN (1954), by a method originally devised for discrete-time chains
by DOEBLIN (1938). It was later extended to more general diffusions by
MOTOO and WATANABE (1958). The ergodic behavior of recurrent
one-dimensional diffusions was analyzed by MARUYAMA and TANAKA (1957).
For one-dimensional SDEs, SKOROHOD (1965) noticed that Itô's original
Lipschitz condition for pathwise uniqueness can be replaced by a weaker
Holder condition. He also obtained a corresponding comparison theorem.
The improved conditions in Theorems 23.3 and 23.5 are due to YAMADA
and WATANABE (1971) and YAMADA (1973), respectively. PERKINS (1982)
and LE GALL (1983) noted how the use of semimartingale local time sim-
plifies and unifies the proofs of those and related results. The fundamental
weak existence and uniqueness criteria in Theorem 23.1 were discovered
by ENGELBERT and SCHMIDT (1984, 1985), whose (1981) zero-one law is
implicit in Lemma 23.2.
Elementary introductions to one-dimensional diffusions appear in
BREIMAN (1968), FREEDMAN (1971b), and ROGERS and WILLIAMS (2000b).
More detailed and advanced accounts are given by DYNKIN (1965) and ITO
and McKEAN (1965). Further information on one-dimensional SDEs may
be obtained from the excellent books by KARATZAS and SHREVE (1991)
and REVUZ and YOR (1999).
24. Connections with PDEs and Potential Theory
The fundamental solution to the heat equation in terms of the Gaussian
kernel was obtained by LAPLACE (1809). A century later BACHELIER
(1900, 1901) noted the relationship between Brownian motion and the
heat equation. The PDE connections were further explored by many
authors, including KOLMOGOROV (1931a), FELLER (1936), KAC (1951), and
DOOB (1955). A first version of Theorem 24.1 was obtained by KAC (1949),
who was in turn inspired by FEYNMAN'S (1948) work on the Schrodinger
equation. Theorem 24.2 is due to STROOCK and VARADHAN (1969).
GREEN (1828), in his discussion of the Dirichlet problem, introduced
the functions named after him. The Dirichlet, sweeping, and equilibrium
problems were all studied by GAUSS (1840) in a pioneering paper on
electrostatics. The rigorous developments in potential theory began with
POINCARE (1890-99), who solved the Dirichlet problem for domains with
a smooth boundary. The equilibrium measure was characterized by GAUSS
as the unique measure minimizing a certain energy functional, but the ex-
istence of the minimum was not rigorously established until FROSTMAN
(1935).
The first probabilistic connections were made by PHILLIPS and WIENER
(1923) and COURANT et al. (1928), who solved the Dirichlet problem
in the plane by a method of discrete approximation, involving a version
of Theorem 24.5 for a simple symmetric random walk. KOLMOGOROV
and LEONTOVICH (1933) evaluated a special hitting distribution for two-
dimensional Brownian motion and noted that it satisfies the heat equation.
KAKUTANI (1944b, 1945) showed how the harmonic measure and sweeping
kernel can be expressed in terms of a Brownian motion. The probabilistic
methods were extended and perfected by DOOB (1954, 1955), who noted
the profound connections with martingale theory. A general potential the-
ory was later developed by HUNT (1957-58) for broad classes of Markov
processes.
The interpretation of Green functions as occupation densities was known
to KAC (1951), and a probabilistic approach to Green functions was devel-
oped by HUNT (1956). The connection between equilibrium measures and
quitting times, implicit already in SPITZER (1964) and ITO and McKEAN
(1965), was exploited by CHUNG (1973) to yield the explicit representation
of Theorem 24.14.
Time reversal of diffusion processes was first considered by SCHRODINGER
(1931). KOLMOGOROV (1936b, 1937) computed the transition kernels of the
reversed process and gave necessary and sufficient conditions for symmetry.
The basic role of time reversal and duality in potential theory was recog-
nized by DOOB (1954) and HUNT (1958). Proposition 24.15 and the related
construction in Theorem 24.21 go back to HUNT, but Theorem 24.19 may
be new. The measure ν in Theorem 24.21 is related to the "Kuznetsov
measures," discussed extensively in GETOOR (1990). The connection between
random sets and alternating capacities was established by CHOQUET (1953-
54), and a corresponding representation of infinitely divisible random sets
was obtained by MATHERON (1975).
Elementary introductions to probabilistic potential theory appear in
BASS (1995) and CHUNG (1995), and to other PDE connections in
KARATZAS and SHREVE (1991). A detailed exposition of classical prob-
abilistic potential theory is given by PORT and STONE (1978). DOOB
(1984) provides a wealth of further information on both the analytic
and probabilistic aspects. Introductions to Hunt's work and the subse-
quent developments are given by CHUNG (1982) and DELLACHERIE and
MEYER (1975-87). More advanced treatments appear in BLUMENTHAL and
GETOOR (1968) and SHARPE (1988).
25. Predictability, Compensation, and Excessive
Functions
The basic connection between superharmonic functions and supermartin-
gales was established by DOOB (1954), who also proved that compositions
of excessive functions with Brownian motion are continuous. Doob further
recognized the need for a general decomposition theorem for supermartin-
gales, generalizing the elementary Lemma 7.10. Such a result was eventually
proved by MEYER (1962, 1963), in the form of Lemma 25.7, after special
decompositions in the Markovian context had been obtained by VOLKON-
SKY (1960) and SHUR (1961). Meyer's original proof was profound and
clever. The present more elementary approach, based on DUNFORD'S (1939)
weak compactness criterion, was devised by RAO (1969a). The extension to
general submartingales was accomplished by ITO and WATANABE (1965)
through the introduction of local martingales.
Predictable and totally inaccessible times appear implicitly in the work of
BLUMENTHAL (1957) and HUNT (1957-58), in the context of
quasi-left-continuity. A systematic study of optional times and their associated σ-fields
was initiated by CHUNG and DOOB (1965). The basic role of the predictable
σ-field became clear after DOLEANS (1967a) had proved the equivalence
between naturalness and predictability for increasing processes, thereby
establishing the ultimate version of the Doob-Meyer decomposition. The
moment inequality in Proposition 25.21 was obtained independently by
GARSIA (1973) and NEVEU (1975) after a more special result had been
proved by BURKHOLDER et al. (1972). The theory of optional and
predictable times and σ-fields was developed by MEYER (1966), DELLACHERIE
(1972), and others into a "general theory of processes," which has in many
ways revolutionized modern probability.
Natural compensators of optional times first appeared in reliability the-
ory. More general compensators were later studied in the Markovian context
by S. WATANABE (1964) under the name of "Levy systems." GRIGELIO-
NIS (1971) and JACOD (1975) constructed the compensator of a general
random measure and introduced the related "local characteristics" of a
general semimartingale. WATANABE (1964) proved that a simple point
process with a continuous and deterministic compensator is Poisson; a
corresponding time-change result was obtained independently by MEYER
(1971) and PAPANGELOU (1972). The extension in Theorem 25.24 was given
by KALLENBERG (1990), and general versions of Proposition 25.27 appear
in ROSINSKI and WOYCZYNSKI (1986) and KALLENBERG (1992).
An authoritative account of the general theory, including an elegant but
less elementary projection approach to the Doob-Meyer decomposition due
to DOLEANS, is given by DELLACHERIE and MEYER (1975-87). Useful in-
troductions to the theory are contained in ELLIOTT (1982) and ROGERS
and WILLIAMS (2000b). Our elementary proof of Lemma 25.10 uses ideas
from DOOB (1984). BLUMENTHAL and GETOOR (1968) remains a good
general reference on additive functionals and their potentials. A detailed
account of random measures and their compensators appears in JACOD and
SHIRYAEV (1987). Applications to queuing theory are given by BREMAUD
(1981), BACCELLI and BREMAUD (2000), and LAST and BRANDT (1995).
26. Semimartingales and General Stochastic
Integration
DOOB (1953) conceived the idea of a stochastic integration theory for
general $L^2$-martingales, based on a suitable decomposition of continuous-time
submartingales. MEYER'S (1962) proof of such a result opened the door
to the $L^2$-theory, which was then developed by COURREGE (1962-63) and
KUNITA and WATANABE (1967). The latter paper contains in particular a
version of the general substitution rule. The integration theory was later
extended in a series of papers by MEYER (1967) and DOLEANS-DADE and
MEYER (1970) and reached its final form with the notes of MEYER (1976)
and the books by JACOD (1979), METIVIER and PELLAUMAIL (1979), and
DELLACHERIE and MEYER (1975-87).
The basic role of predictable processes as integrands was recognized by
MEYER (1967). By contrast, semimartingales were originally introduced in
an ad hoc manner by DOLEANS-DADE and MEYER (1970), and their ba-
sic preservation laws were only gradually recognized. In particular, JACOD
(1975) used the general Girsanov theorem of VAN SCHUPPEN and WONG
(1974) to show that the semimartingale property is preserved under abso-
lutely continuous changes of the probability measure. The characterization
of general stochastic integrators as semimartingales was obtained indepen-
dently by BICHTELER (1979) and DELLACHERIE (1980), in both cases with
support from analysts.
Quasimartingales were originally introduced by FISK (1965) and OREY
(1966). The decomposition of RAO (1969b) extends a result by
KRICKEBERG (1956) for $L^1$-bounded martingales. YOEURP (1976) combined a
notion of "stable subspaces" due to KUNITA and WATANABE (1967) with
the Hilbert space structure of $\mathcal{M}^2$ to obtain an orthogonal decomposition
of $L^2$-martingales, equivalent to the decompositions in Theorem 26.14 and
Proposition 26.16. Elaborating on those ideas, MEYER (1976) showed that
the purely discontinuous component admits a representation as a sum of
compensated jumps.
SDEs driven by general Levy processes were already considered by ITO
(1951b). The study of SDEs driven by general semimartingales was
initiated by DOLEANS-DADE (1970), who obtained her exponential process as
a solution to the equation in Theorem 26.8. The scope of the theory was
later expanded by many authors, and a comprehensive account is given by
PROTTER (1990).
The martingale inequalities in Theorems 26.12 and 26.17 have ancient
origins. Thus, a version of the latter result for independent random variables
was proved by KOLMOGOROV (1929) and, in a sharper form, by PROHOROV
(1959). Their result was extended to discrete-time martingales by JOHNSON
et al. (1985) and HITCZENKO (1990). The present statements appeared in
KALLENBERG and SZTENCEL (1991).
Early versions of the inequalities in Theorem 26.12 were proved by KHIN-
CHIN (1923, 1924) for symmetric random walks and by PALEY (1932) for
Walsh series. A version for independent random variables was obtained by
MARCINKIEWICZ and ZYGMUND (1937, 1938). The extension to discrete-
time martingales is due to BURKHOLDER (1966) for p > 1 and to DAVIS
(1970) for p = 1. The result was extended to continuous time by BURK-
HOLDER et al. (1972), who also noted how the general result can be deduced
from the statement for p = 1. The present proof is a continuous-time version
of Davis' original argument.
Excellent introductions to semimartingales and stochastic integration are
given by DELLACHERIE and MEYER (1975-87) and JACOD and SHIRYAEV
(1987). PROTTER (1990) offers an interesting alternative approach, orig-
inally suggested by MEYER and by DELLACHERIE (1980). The book by
JACOD (1979) remains a rich source of further information on the subject.
27. Large Deviations
Large deviation theory originated with certain refinements of the central
limit theorem obtained by many authors, beginning with KHINCHIN (1929).
Here the object of study is the ratio of tail probabilities
$r_n(x) = P\{\zeta_n > x\} / P\{\zeta > x\}$, where $\zeta$ is $N(0,1)$ and
$\zeta_n = n^{-1/2} \sum_{k \le n} \xi_k$ for some i.i.d.
random variables $\xi_k$ with mean 0 and variance 1, so that $r_n(x) \to 1$ for
fixed $x$. A precise asymptotic expansion was obtained by CRAMER (1938),
in the case when $x$ varies with $n$ at a rate $x = o(n^{1/2})$. (See PETROV (1995),
Theorem 5.23, for details.)
In the same historic paper, CRAMER (1938) obtained the first true large
deviation result, in the form of our Theorem 27.3, though under some
technical assumptions that were later removed by CHERNOFF (1952) and
BAHADUR (1971). VARADHAN (1966) extended the result to higher dimen-
sions and rephrased it in the form of a general large deviation principle.
At about the same time, SCHILDER (1966) proved his large deviation re-
sult for Brownian motion, using the present change-of-measure approach.
Similar methods were used by FREIDLIN and WENTZELL (1970, 1998) to
study random perturbations of dynamical systems.
Even earlier, SANOV (1957) had obtained his large deviation result for
empirical distributions of i.i.d. random variables. The relative entropy
$H(\nu|\mu)$ appearing in the limit had already been introduced in statistics by
KULLBACK and LEIBLER (1951). Its crucial link to the Legendre-Fenchel
transform $\Lambda^*$, long anticipated by physicists, was formalized by DONSKER
and VARADHAN (1975-83). The latter authors also developed some pro-
found and far-reaching extensions of Sanov's theorem, in a long series of
formidable papers. ELLIS (1985) gives a detailed exposition of those results,
along with a discussion of their physical significance.
Much of the formalization of underlying principles and techniques was
developed at a later stage. Thus, an abstract version of the projective
limit approach was introduced by DAWSON and GARTNER (1987). BRYC
(1990) supplemented VARADHAN'S (1966) functional version of the LDP
with a reverse proposition. Similarly, IOFFE (1991) appended a power-
ful inverse to the classical "contraction principle." Finally, PUKHALSKY
(1991) established the equivalence, under suitable regularity conditions, of
the exponential tightness and the goodness of the rate function.
STRASSEN (1964) established his formidable law of the iterated logarithm
by direct estimates. A detailed exposition of the original approach appears
in FREEDMAN (1971b). VARADHAN (1984) recognized the result as a corol-
lary to Schilder's theorem, and a complete proof along the suggested lines
appears in DEUSCHEL and STROOCK (1989).
Gentle introductions to large deviation theory and its applications
are given by VARADHAN (1984) and DEMBO and ZEITOUNI (1998). The
more demanding text of DEUSCHEL and STROOCK (1989) provides much
additional insight to the persistent reader.
Appendix
Some more advanced aspects of measure theory are covered by
ROYDEN (1988), PARTHASARATHY (1967), and DUDLEY (1989). The projection
and section theorems depend on capacity theory, for which we refer to
DELLACHERIE (1972) and DELLACHERIE and MEYER (1975-87).
The $J_1$-topology was introduced by SKOROHOD (1956), and detailed
expositions may be found in BILLINGSLEY (1968), ETHIER and KURTZ (1986),
and JACOD and SHIRYAEV (1987). A discussion of the vague topology on
$\mathcal{M}(S)$ with $S$ lcscH is given by BAUER (1972). The topology on the space
of closed sets, considered here, was introduced in a more general setting by
FELL (1962), and a full account appears in MATHERON (1975), including
a detailed proof (different from ours) of the basic Theorem A2.5.
Bibliography
This list includes only publications that are explicitly mentioned
in the text or notes or are directly related to results cited in the
book. Knowledgeable readers will notice that many books and
papers of historical significance have been omitted.
ADLER, R.J. (1990). An Introduction to Continuity, Extrema, and Related Topics
for General Gaussian Processes. Inst. Math. Statist., Hayward, CA.
AKCOGLU, M.A., CHACON, R.V. (1970). Ergodic properties of operators in
Lebesgue space. Adv. Appl. Probab. 2, 1-47.
ALDOUS, D.J. (1978). Stopping times and tightness. Ann. Probab. 6, 335-340.
- (1985). Exchangeability and related topics. Lect. Notes in Math. 1117, 1-198.
Springer, Berlin.
ALDOUS, D., THORISSON, H. (1993). Shift-coupling. Stoch. Proc. Appl. 44, 1-14.
ALEXANDROV, A.D. (1940-43). Additive set-functions in abstract spaces. Mat.
Sb. 8, 307-348; 9, 563-628; 13, 169-238.
ANDRE, D. (1887). Solution directe du probleme resolu par M. Bertrand. C.R.
Acad. Sci. Paris 105, 436-437.
ATHREYA, K., McDONALD, D., NEY, P. (1978). Coupling and the renewal
theorem. Amer. Math. Monthly 85, 809-814.
BACCELLI, F., BREMAUD, P. (2000). Elements of Queueing [sic] Theory, 2nd ed.,
Springer, Berlin.
BACHELIER, L. (1900). Theorie de la speculation. Ann. Sci. Ecole Norm. Sup.
17, 21-86.
- (1901). Theorie mathematique du jeu. Ann. Sci. Ecole Norm. Sup. 18, 143-
210.
BAHADUR, R.R. (1971). Some Limit Theorems in Statistics. SIAM, Philadelphia.
BARBIER, E. (1887). Generalisation du probleme resolu par M. J. Bertrand. C.R.
Acad. Sci. Paris 105, 407, 440.
BASS, R.F. (1995). Probabilistic Techniques in Analysis. Springer, NY.
- (1998). Diffusions and Elliptic Operators. Springer, NY.
BAUER, H. (1972). Probability Theory and Elements of Measure Theory. Engl.
trans., Holt, Rinehart & Winston, NY.
BAXTER, G. (1961). An analytic approach to finite fluctuation problems in
probability. J. d'Analyse Math. 9, 31-70.
BELYAEV, Y.K. (1963). Limit theorems for dissipative flows. Th. Probab. Appl.
8, 165-173.
BERBEE, H.C.P. (1979). Random Walks with Stationary Increments and Renewal
Theory. Mathematisch Centrum, Amsterdam.
BERNOULLI, J. (1713). Ars Conjectandi. Thurnisiorum, Basel.
BERNSTEIN, S.N. (1927). Sur l'extension du theoreme limite du calcul des
probabilites aux sommes de quantites dependantes. Math. Ann. 97, 1-59.
- (1934). Principes de la theorie des equations differentielles stochastiques.
Trudy Fiz.-Mat., Steklov Inst., Akad. Nauk. 5, 95-124.
- (1937). On some variations of the Chebyshev inequality (in Russian). Dokl.
Acad. Nauk SSSR 17, 275-277.
- (1938). Equations differentielles stochastiques. Act. Sci. Ind. 738, 5-31.
BERTOIN, J. (1996). Levy Processes. Cambridge Univ. Press.
BERTRAND, J. (1887). Solution d'un probleme. C.R. Acad. Sci. Paris 105, 369.
BICHTELER, K. (1979). Stochastic integrators. Bull. Amer. Math. Soc. 1, 761-
765.
BIENAYME, J. (1853). Considerations a l'appui de la decouverte de Laplace sur la
loi de probabilite dans la methode des moindres carres. C.R. Acad. Sci. Paris
37, 309-324.
BILLINGSLEY, P. (1965). Ergodic Theory and Information. Wiley, NY.
- (1968). Convergence of Probability Measures. Wiley, NY.
- (1995). Probability and Measure, 3rd ed. Wiley, NY.
BIRKHOFF, G.D. (1932). Proof of the ergodic theorem. Proc. Natl. Acad. Sci.
USA 17, 656-660.
BLACKWELL, D. (1948). A renewal theorem. Duke Math. J. 15, 145-150.
- (1953). Extension of a renewal theorem. Pacific J. Math. 3, 315-320.
BLACKWELL, D., FREEDMAN, D. (1964). The tail a-field of a Markov chain and
a theorem of Orey. Ann. Math. Statist. 35, 1291-1295.
BLUMENTHAL, R.M. (1957). An extended Markov property. Trans. Amer. Math.
Soc. 82, 52-72.
- (1992). Excursions of Markov Processes. Birkhauser, Boston.
BLUMENTHAL, R.M., GETOOR, R.K. (1964). Local times for Markov processes.
Z. Wahrsch. verw. Geb. 3, 50-74.
- (1968). Markov Processes and Potential Theory. Academic Press, NY.
BOCHNER, S. (1932). Vorlesungen über Fouriersche Integrale, Akad. Verlagsges.,
Leipzig. Repr. Chelsea, NY 1948.
- (1933). Monotone Funktionen, Stieltjessche Integrale und harmonische
Analyse. Math. Ann. 108, 378-410.
BOLTZMANN, L. (1887). Uber die mechanischen Analogien des zweiten Haupt-
satzes der Thermodynamik. J. Reine Angew. Math. 100, 201-212.
BOREL, E. (1895). Sur quelques points de la theorie des fonctions. Ann. Sci.
Ecole Norm. Sup. (3) 12, 9-55.
- (1898). Leçons sur la Theorie des Fonctions. Gauthier-Villars, Paris.
- (1909). Les probabilites denombrables et leurs applications arithmetiques.
Rend. Circ. Mat. Palermo 27, 247-271.
BREIMAN, L. (1957-60). The individual ergodic theorem of information theory.
Ann. Math. Statist. 28, 809-811; 31, 809-810.
- (1968). Probability. Addison-Wesley, Reading, MA. Repr. SIAM, Philadelphia
1992.
BREMAUD, P. (1981). Point Processes and Queues. Springer, NY.
BROWN, R. (1828). A brief description of microscopical observations made in
the months of June, July and August 1827, on the particles contained in the
pollen of plants; and on the general existence of active molecules in organic
and inorganic bodies. Ann. Phys. 14, 294-313.
BRYC, W. (1990). Large deviations by the asymptotic value method. In Diffu-
sion Processes and Related Problems in Analysis (M. Pinsky, ed.), 447-472.
Birkhauser, Basel.
BUHLMANN, H. (1960). Austauschbare stochastische Variabeln und ihre Grenzw-
ertsatze. Univ. Calif. Publ. Statist. 3, 1-35.
BUNIAKOWSKY, V.Y. (1859). Sur quelques inegalites concernant les integrales or-
dinaires et les integrales aux differences finies. Mem. de l'Acad. St.-Petersbourg
1:9.
BURKHOLDER, D.L. (1966). Martingale transforms. Ann. Math. Statist. 37,
1494-1504.
BURKHOLDER, D.L., DAVIS, B.J., GUNDY, R.F. (1972). Integral inequalities for
convex functions of operators on martingales. Proc. 6th Berkeley Symp. Math.
Statist. Probab. 2, 223-240.
BURKHOLDER, D.L., GUNDY, R.F. (1970). Extrapolation and interpolation of
quasi-linear operators on martingales. Acta Math. 124, 249-304.
CAMERON, R.H., MARTIN, W.T. (1944). Transformation of Wiener integrals
under translations. Ann. Math. 45, 386-396.
CANTELLI, F.P. (1917). Su due applicazioni di un teorema di G. Boole alla
statistica matematica. Rend. Accad. Naz. Lincei 26, 295-302.
- (1933). Sulla determinazione empirica della leggi di probabilita. Giorn. Ist.
Ital. Attuari 4, 421-424.
CARATHEODORY, C. (1927). Vorlesungen über reelle Funktionen, 2nd ed. Teubner,
Leipzig (1st ed. 1918). Repr. Chelsea, NY 1946.
CARLESON, L. (1958). Two remarks on the basic theorems of information theory.
Math. Scand. 6, 175-180.
CAUCHY, A.L. (1821). Cours d'analyse de l'Ecole Royale Polytechnique, Paris.
CHACON, R.V., ORNSTEIN, D.S. (1960). A general ergodic theorem. Illinois J.
Math. 4, 153-160.
CHAPMAN, S. (1928). On the Brownian displacements and thermal diffusion of
grains suspended in a non-uniform fluid. Proc. Roy. Soc. London (A) 119,
34-54.
CHEBYSHEV, P.L. (1867). Des valeurs moyennes. J. Math. Pures Appl. 12, 177-
184.
- (1890). Sur deux theoremes relatifs aux probabilites. Acta Math. 14, 305-315.
CHENTSOV, N.N. (1956). Weak convergence of stochastic processes whose trajec-
tories have no discontinuities of the second kind and the "heuristic" approach
to the Kolmogorov-Smirnov tests. Th. Probab. Appl. 1, 140-144.
CHERNOFF, H. (1952). A measure of asymptotic efficiency for tests of a hypothesis
based on the sum of observations. Ann. Math. Statist. 23, 493-507.
CHERNOFF, H., TEICHER, H. (1958). A central limit theorem for sequences of
exchangeable random variables. Ann. Math. Statist. 29, 118-130.
CHOQUET, G. (1953-54). Theory of capacities. Ann. Inst. Fourier Grenoble 5,
131-295.
CHOW, Y.S., TEICHER, H. (1997). Probability Theory: Independence, Inter-
changeability, Martingales, 3rd ed. Springer, NY.
CHUNG, K.L. (1960). Markov Chains with Stationary Transition Probabilities.
Springer, Berlin.
- (1961). A note on the ergodic theorem of information theory. Ann. Math.
Statist. 32, 612-614.
- (1973). Probabilistic approach to the equilibrium problem in potential theory.
Ann. Inst. Fourier Grenoble 23, 313-322.
- (1974). A Course in Probability Theory, 2nd ed. Academic Press, NY.
- (1982). Lectures from Markov Processes to Brownian Motion. Springer, NY.
- (1995). Green, Brown, and Probability. World Scientific, Singapore.
CHUNG, K.L., DOOB, J.L. (1965). Fields, optionality and measurability. Amer.
J. Math. 87, 397-424.
CHUNG, K.L., FUCHS, W.H.J. (1951). On the distribution of values of sums of
random variables. Mem. Amer. Math. Soc. 6.
CHUNG, K.L., ORNSTEIN, D.S. (1962). On the recurrence of sums of random
variables. Bull. Amer. Math. Soc. 68, 30-32.
CHUNG, K.L., WALSH, J.B. (1974). Meyer's theorem on previsibility. Z. Wahrsch.
verw. Geb. 29, 253-256.
CHUNG, K.L., WILLIAMS, R.J. (1990). Introduction to Stochastic Integration,
2nd ed. Birkhauser, Boston.
COURANT, R., FRIEDRICHS, K., LEWY, H. (1928). Uber die partiellen
Differentialgleichungen der mathematischen Physik. Math. Ann. 100, 32-74.
COURREGE, P. (1962-63). Integrales stochastiques et martingales de carre
integrable. Sem. Brelot-Choquet-Deny 7. Publ. Inst. H. Poincare.
COX, D.R. (1955). Some statistical methods connected with series of events. J.
R. Statist. Soc. Ser. B 17, 129-164.
CRAMER, H. (1938). Sur un nouveau theoreme-limite de la theorie des
probabilites. Actual. Sci. Indust. 736, 5-23.
- (1942). On harmonic analysis in certain functional spaces. Ark. Mat. Astr.
Fys. 28B:12 (17 pp.).
CRAMER, H., LEADBETTER, M.R. (1967). Stationary and Related Stochastic
Processes. Wiley, NY.
CRAMER, H., WOLD, H. (1936). Some theorems on distribution functions. J.
London Math. Soc. 11, 290-295.
CSORGO, M., REVESZ, P. (1981). Strong Approximations in Probability and
Statistics. Academic Press, NY.
DALEY, D.J., VERE-JONES, D. (1988). An Introduction to the Theory of Point
Processes. Springer, NY.
DAMBIS, K.E. (1965). On the decomposition of continuous submartingales. Th.
Probab. Appl. 10, 401-410.
DANIELL, P.J. (1918-19). Integrals in an infinite number of dimensions. Ann.
Math. (2) 20, 281-288.
- (1919-20). Functions of limited variation in an infinite number of dimensions.
Ann. Math. (2) 21, 30-38.
- (1920). Stieltjes derivatives. Bull. Amer. Math. Soc. 26, 444-448.
DAVIS, B.J. (1970). On the integrability of the martingale square function. Israel
J. Math. 8, 187-190.
DAWSON, D.A., GARTNER, J. (1987). Large deviations from the McKean-Vlasov
limit for weakly interacting diffusions. Stochastics 20, 247-308.
DAY, M.M. (1942). Ergodic theorems for Abelian semigroups. Trans. Amer.
Math. Soc. 51, 399-412.
DEBES, H., KERSTAN, J., LIEMANT, A., MATTHES, K. (1970-71). Verallge-
meinerung eines Satzes von Dobrushin I, III. Math. Nachr. 47, 183-244; 50,
99-139.
DELLACHERIE, C. (1972). Capacites et Processus Stochastiques. Springer, Berlin.
- (1980). Un survol de la theorie de l'integrale stochastique. Stoch. Proc. Appl.
10, 115-144.
DELLACHERIE, C., MAISONNEUVE, B., MEYER, P.A. (1992). Probabilites et
Potentiel, V. Hermann, Paris.
DELLACHERIE, C., MEYER, P.A. (1975-87). Probabilites et Potentiel, I-IV.
Hermann, Paris. Engl. trans., North-Holland.
DEMBO, A., ZEITOUNI, O. (1998). Large Deviations Techniques and Applications,
2nd ed. Springer, NY.
DERMAN, C. (1954). Ergodic property of the Brownian motion process. Proc.
Natl. Acad. Sci. USA 40, 1155-1158.
DEUSCHEL, J.D., STROOCK, D.W. (1989). Large Deviations. Academic Press,
Boston.
DOEBLIN, W. (1938a). Expose de la theorie des chaines simples constantes de
Markov a un nombre fini d'etats. Rev. Math. Union Interbalkan. 2, 77-105.
- (1938b). Sur deux problemes de M. Kolmogoroff concernant les chaines
denombrables. Bull. Soc. Math. France 66, 210-220.
- (1939a). Sur les sommes d'un grand nombre de variables aleatoires indepen-
dantes. Bull. Sci. Math. 63, 23-64.
- (1939b). Sur certains mouvements aleatoires discontinus. Skand.
Aktuarietidskr. 22, 211-222.
- (1940). Elements d'une theorie generale des chaines simples constantes de
Markoff. Ann. Sci. Ecole Norm. Sup. (3) 57, 61-111.
DOHLER, R. (1980). On the conditional independence of random events. Th.
Probab. Appl. 25, 628-634.
DOLEANS(-DADE), C. (1967a). Processus croissants naturel et processus
croissants tres bien mesurable. C.R. Acad. Sci. Paris 264, 874-876.
- (1967b). Integrales stochastiques dependant d'un parametre. Publ. Inst. Stat.
Univ. Paris 16, 23-34.
- (1970). Quelques applications de la formule de changement de variables pour
les semimartingales. Z. Wahrsch. verw. Geb. 16, 181-194.
DOLEANS-DADE, C., MEYER, P.A. (1970). Integrales stochastiques par rapport
aux martingales locales. Lect. Notes in Math. 124, 77-107. Springer, Berlin.
DONSKER, M.D. (1951-52). An invariance principle for certain probability limit
theorems. Mem. Amer. Math. Soc. 6.
- (1952). Justification and extension of Doob's heuristic approach to the
Kolmogorov-Smirnov theorems. Ann. Math. Statist. 23, 277-281.
DONSKER, M.D., VARADHAN, S.R.S. (1975-83). Asymptotic evaluation of cer-
tain Markov process expectations for large time, I-IV. Comm. Pure Appl.
Math. 28, 1-47, 279-301; 29, 389-461; 36, 183-212.
DOOB, J.L. (1936). Note on probability. Ann. Math. (2) 37, 363-367.
- (1937). Stochastic processes depending on a continuous parameter. Trans.
Amer. Math. Soc. 42, 107-140.
- (1938). Stochastic processes with an integral-valued parameter. Trans. Amer.
Math. Soc. 44, 87-150.
- (1940). Regularity properties of certain families of chance variables. Trans.
Amer. Math. Soc. 47, 455-486.
- (1942a). The Brownian movement and stochastic equations. Ann. Math. 43,
351-369.
- (1942b). Topics in the theory of Markoff chains. Trans. Amer. Math. Soc. 52,
37-64.
- (1945). Markoff chains-denumerable case. Trans. Amer. Math. Soc. 58,
455-473.
- (1947). Probability in function space. Bull. Amer. Math. Soc. 53, 15-30.
- (1948a). Asymptotic properties of Markov transition probabilities. Trans.
Amer. Math. Soc. 63, 393-421.
- (1948b). Renewal theory from the point of view of the theory of probability.
Trans. Amer. Math. Soc. 63, 422-438.
- (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. Ann. Math.
Statist. 20, 393-403.
- (1951). Continuous parameter martingales. Proc. 2nd Berkeley Symp. Math.
Statist. Probab., 269-277.
- (1953). Stochastic Processes. Wiley, NY.
- (1954). Semimartingales and subharmonic functions. Trans. Amer. Math. Soc.
77, 86-121.
- (1955). A probability approach to the heat equation. Trans. Amer. Math. Soc.
80, 216-280.
- (1984). Classical Potential Theory and its Probabilistic Counterpart. Springer,
NY.
- (1994). Measure Theory. Springer, NY.
DUBINS, L.E. (1968). On a theorem of Skorohod. Ann. Math. Statist. 39, 2094-
2097.
DUBINS, L.E., SCHWARZ, G. (1965). On continuous martingales. Proc. Natl.
Acad. Sci. USA 53, 913-916.
DUDLEY, R.M. (1966). Weak convergence of probabilities on nonseparable metric
spaces and empirical measures on Euclidean spaces. Illinois J. Math. 10, 109-
126.
- (1967). Measures on non-separable metric spaces. Illinois J. Math. 11, 449-
453.
- (1968). Distances of probability measures and random variables. Ann. Math.
Statist. 39, 1563-1572.
- (1989). Real Analysis and Probability. Wadsworth, Brooks & Cole, Pacific
Grove, CA.
DUNFORD, N. (1939). A mean ergodic theorem. Duke Math. J. 5, 635-646.
DUNFORD, N., SCHWARTZ, J. T. (1956). Convergence almost everywhere of
operator averages. J. Rat. Mech. Anal. 5, 129-178.
DURRETT, R. (1984). Brownian Motion and Martingales in Analysis. Wadsworth,
Belmont, CA.
- (1995). Probability Theory and Examples, 2nd ed. Wadsworth, Brooks & Cole,
Pacific Grove, CA.
DVORETZKY, A. (1972). Asymptotic normality for sums of dependent random
variables. Proc. 6th Berkeley Symp. Math. Statist. Probab. 2, 513-535.
DYNKIN, E.B. (1952). Criteria of continuity and lack of discontinuities of the
second kind for trajectories of a Markov stochastic process (Russian). Izv.
Akad. Nauk SSSR, Ser. Mat. 16, 563-572.
- (1955a). Infinitesimal operators of Markov stochastic processes (Russian).
Dokl. Akad. Nauk SSSR 105, 206-209.
- (1955b). Continuous one-dimensional Markov processes (Russian). Dokl.
Akad. Nauk SSSR 105, 405-408.
- (1956). Markov processes and semigroups of operators. Infinitesimal operators
of Markov processes. Th. Probab. Appl. 1, 25-60.
- (1959). One-dimensional continuous strong Markov processes. Th. Probab.
Appl. 4, 3-54.
- (1961). Theory of Markov Processes. Engl. trans., Prentice-Hall and Pergamon
Press, Englewood Cliffs, NJ, and Oxford. (Russian orig. 1959.)
- (1965). Markov Processes, Vols. 1-2. Engl. trans., Springer, Berlin. (Russian
orig. 1963.)
- (1978). Sufficient statistics and extreme points. Ann. Probab. 6, 705-730.
DYNKIN, E.B., YUSHKEVICH, A.A. (1956). Strong Markov processes. Th. Probab.
Appl. 1, 134-139.
EINSTEIN, A. (1905). On the movement of small particles suspended in a sta-
tionary liquid demanded by the molecular-kinetic theory of heat. Engl. trans.
in Investigations on the Theory of the Brownian Movement. Repr. Dover, NY
1956.
- (1906). On the theory of Brownian motion. Engl. trans. in Investigations on
the Theory of the Brownian Movement. Repr. Dover, NY 1956.
ELLIOTT, R.J. (1982). Stochastic Calculus and Applications. Springer, NY.
ELLIS, R.S. (1985). Entropy, Large Deviations, and Statistical Mechanics.
Springer, NY.
ENGELBERT, H.J., SCHMIDT, W. (1981). On the behaviour of certain functionals
of the Wiener process and applications to stochastic differential equations.
Lect. Notes in Control and Inform. Sci. 36, 47-55.
- (1984). On one-dimensional stochastic differential equations with generalized
drift. Lect. Notes in Control and Inform. Sci. 69, 143-155. Springer, Berlin.
- (1985). On solutions of stochastic differential equations without drift. Z.
Wahrsch. verw. Geb. 68, 287-317.
ERDOS, P., FELLER, W., POLLARD, H. (1949). A theorem on power series. Bull.
Amer. Math. Soc. 55, 201-204.
ERDOS, P., KAC, M. (1946). On certain limit theorems in the theory of
probability. Bull. Amer. Math. Soc. 52, 292-302.
- (1947). On the number of positive sums of independent random variables.
Bull. Amer. Math. Soc. 53, 1011-1020.
ERLANG, A.K. (1909). The theory of probabilities and telephone conversations.
Nyt. Tidskr. Mat. B 20, 33-41.
ETHIER, S.N., KURTZ, T.G. (1986). Markov Processes: Characterization and
Convergence. Wiley, NY.
FABER, G. (1910). Uber stetige Funktionen, II. Math. Ann. 69, 372-443.
FARRELL, R.H. (1962). Representation of invariant measures. Illinois J. Math.
6, 447-467.
FATOU, P. (1906). Series trigonometriques et series de Taylor. Acta Math. 30,
335-400.
FELL, J.M.G. (1962). A Hausdorff topology for the closed subsets of a locally
compact non-Hausdorff space. Proc. Amer. Math. Soc. 13, 472-476.
FELLER, W. (1935-37). Uber den zentralen Grenzwertsatz der Wahrschein-
lichkeitstheorie, I-II. Math. Z. 40, 521-559; 42, 301-312.
- (1936). Zur Theorie der stochastischen Prozesse (Existenz und Eindeutigkeits-
satze). Math. Ann. 113, 113-160.
- (1937). On the Kolmogoroff-P. Levy formula for infinitely divisible distribu-
tion functions. Proc. Yugoslav Acad. Sci. 82, 95-112.
- (1940). On the integro-differential equations of purely discontinuous Markoff
processes. Trans. Amer. Math. Soc. 48, 488-515; 58, 474.
- (1949). Fluctuation theory of recurrent events. Trans. Amer. Math. Soc. 67,
98-119.
- (1952). The parabolic differential equations and the associated semi-groups of
transformations. Ann. Math. 55, 468-519.
- (1954). Diffusion processes in one dimension. Trans. Amer. Math. Soc. 77,
1-31.
- (1968, 1971). An Introduction to Probability Theory and its Applications, 1
(3rd ed.); 2 (2nd ed.). Wiley, NY (1st eds. 1950, 1966).
FELLER, W., OREY, S. (1961). A renewal theorem. J. Math. Mech. 10, 619-624.
FEYNMAN, R.P. (1948). Space-time approach to nonrelativistic quantum me-
chanics. Rev. Mod. Phys. 20, 367-387.
DE FINETTI, B. (1929). Sulle funzioni ad incremento aleatorio. Rend. Acc. Naz.
Lincei 10, 163-168.
- (1930). Funzione caratteristica di un fenomeno aleatorio. Mem. R. Acc. Lincei
(6) 4, 86-133.
- (1937). La prevision: ses lois logiques, ses sources subjectives. Ann. Inst. H.
Poincare 7, 1-68.
FISK, D.L. (1965). Quasimartingales. Trans. Amer. Math. Soc. 120, 369-389.
- (1966). Sample quadratic variation of continuous, second-order martingales.
Z. Wahrsch. verw. Geb. 6, 273-278.
FORTET, R. (1943). Les fonctions aleatoires du type de Markoff associees a
certaines equations lineaires aux derivees partielles du type parabolique. J. Math.
Pures Appl. 22, 177-243.
FRANKEN, P., KONIG, D., ARNDT, D., SCHMIDT, V. (1981). Queues and Point
Processes. Akademie-Verlag, Berlin.
FRECHET, M. (1928). Les Espaces Abstraits. Gauthier-Villars, Paris.
FREEDMAN, D. (1962-63). Invariants under mixing which generalize de Finetti's
theorem. Ann. Math. Statist. 33, 916-923; 34, 1194-1216.
- (1971a). Markov Chains. Holden-Day, San Francisco. Repr. Springer, NY 1983.
- (1971b). Brownian Motion and Diffusion. Holden-Day, San Francisco. Repr.
Springer, NY 1983.
FREIDLIN, M.I., WENTZEL, A.D. (1970). On small random perturbations of
dynamical systems. Russian Math. Surveys 25, 1-55.
- (1998). Random Perturbations of Dynamical Systems. Engl. trans., Springer,
NY. (Russian orig. 1979.)
FROSTMAN, O. (1935). Potentiel d'equilibre et capacite des ensembles avec
quelques applications a la theorie des fonctions. Medd. Lunds Univ. Mat. Sem.
3, 1-118.
FUBINI, G. (1907). Sugli integrali multipli. Rend. Acc. Naz. Lincei 16, 608-614.
FURSTENBERG, H., KESTEN, H. (1960). Products of random matrices. Ann.
Math. Statist. 31, 457-469.
GALMARINO, A.R. (1963). Representation of an isotropic diffusion as a skew
product. Z. Wahrsch. verw. Geb. 1, 359-378.
GARSIA, A.M. (1965). A simple proof of E. Hopf's maximal ergodic theorem. J.
Math. Mech. 14, 381-382.
- (1973). Martingale Inequalities: Seminar Notes on Recent Progress. Math.
Lect. Notes Ser. Benjamin, Reading, MA.
GAUSS, C.F. (1809). Theory of Motion of the Heavenly Bodies. Engl. trans.,
Dover, NY 1963.
- (1840). Allgemeine Lehrsatze in Beziehung auf die im verkehrten Verhaltnisse
des Quadrats der Entfernung wirkenden Anziehungs- und Abstossungs-Krafte.
Gauss Werke 5, 197-242. Gottingen 1867.
GETOOR, R.K. (1990). Excessive Measures. Birkhauser, Boston.
GETOOR, R.K., SHARPE, M.J. (1972). Conformal martingales. Invent. Math. 16,
271-308.
GIHMAN, I.I. (1947). On a method of constructing random processes (Russian).
Dokl. Akad. Nauk SSSR 58, 961-964.
- (1950-51). On the theory of differential equations for random processes, I-II
(Russian). Ukr. Mat. J. 2:4, 37-63; 3:3, 317-339.
GIHMAN, I.I., SKOROHOD, A.V. (1965). Introduction to the Theory of Random
Processes. Engl. trans., Saunders, Philadelphia. Repr. Dover, Mineola 1996.
- (1974-79). The Theory of Stochastic Processes, 1-3. Engl. trans., Springer,
Berlin.
GIRSANOV, I.V. (1960). On transforming a certain class of stochastic processes by
absolutely continuous substitution of measures. Th. Probab. Appl. 5, 285-301.
GLIVENKO, V.I. (1933). Sulla determinazione empirica delle leggi di probabilita.
Giorn. Ist. Ital. Attuari 4, 92-99.
GNEDENKO, B.V. (1939). On the theory of limit theorems for sums of independent
random variables (Russian). Izv. Akad. Nauk SSSR Ser. Mat. 181-232,
643-647.
GNEDENKO, B.V., KOLMOGOROV, A.N. (1968). Limit Distributions for Sums
of Independent Random Variables. Engl. trans., 2nd ed., Addison-Wesley,
Reading, MA. (Russian orig. 1949.)
GOLDMAN, J.R. (1967). Stochastic point processes: Limit theorems. Ann. Math.
Statist. 38, 771-779.
GOLDSTEIN, J.A. (1976). Semigroup-theoretic proofs of the central limit theorem
and other theorems of analysis. Semigroup Forum 12, 189-206.
GOLDSTEIN, S. (1979). Maximal coupling. Z. Wahrsch. verw. Geb. 46, 193-204.
GRANDELL, J. (1976). Doubly Stochastic Poisson Processes. Lect. Notes in Math.
529. Springer, Berlin.
GREEN, G. (1828). An essay on the application of mathematical analysis to the
theories of electricity and magnetism. Repr. in Mathematical Papers, Chelsea,
NY 1970.
GREENWOOD, P., PITMAN, J. (1980). Construction of local time and Poisson
point processes from nested arrays. J. London Math. Soc. (2) 22, 182-192.
GRIFFEATH, D. (1975). A maximal coupling for Markov chains. Z. Wahrsch.
verw. Geb. 31, 95-106.
GRIGELIONIS, B. (1963). On the convergence of sums of random step processes
to a Poisson process. Th. Probab. Appl. 8, 172-182.
- (1971). On the representation of integer-valued measures by means of
stochastic integrals with respect to Poisson measure. Litovsk. Mat. Sb. 11,
93-108.
HAAR, A. (1933). Der Maßbegriff in der Theorie der kontinuierlichen Gruppen.
Ann. Math. 34, 147-169.
HAGBERG, J. (1973). Approximation of the summation process obtained by
sampling from a finite population. Th. Probab. Appl. 18, 790-803.
HAHN, H. (1921). Theorie der reellen Funktionen. Julius Springer, Berlin.
HAJEK, J. (1960). Limiting distributions in simple random sampling from a finite
population. Magyar Tud. Akad. Mat. Kutato Int. Kozl. 5, 361-374.
HALL, P., HEYDE, C.C. (1980). Martingale Limit Theory and its Application.
Academic Press, NY.
HALMOS, P.R. (1950). Measure Theory, Van Nostrand, Princeton. Repr.
Springer, NY 1974.
HARDY, G.H., LITTLEWOOD, J.E. (1930). A maximal theorem with function-
theoretic applications. Acta Math. 54, 81-116.
HARRIS, T.E. (1956). The existence of stationary measures for certain Markov
processes. Proc. 3rd Berkeley Symp. Math. Statist. Probab. 2, 113-124.
- (1971). Random measures and motions of point processes. Z. Wahrsch. verw.
Geb. 18, 85-115.
HARTMAN, P., WINTNER, A. (1941). On the law of the iterated logarithm. Amer.
J. Math. 63, 169-176.
HELLY, E. (1911-12). Uber lineare Funktionaloperatoren. Sitzungsber. Nat. Kais.
Akad. Wiss. 121, 265-297.
HEWITT, E., SAVAGE, L.J. (1955). Symmetric measures on Cartesian products.
Trans. Amer. Math. Soc. 80, 470-501.
HILLE, E. (1948). Functional analysis and semi-groups. Amer. Math. Colloq. Publ.
31, NY.
HITCZENKO, P. (1990). Best constants in martingale version of Rosenthal's
inequality. Ann. Probab. 18, 1656-1668.
HOLDER, O. (1889). Uber einen Mittelwertsatz. Nachr. Akad. Wiss. Gottingen,
math.phys. Kl., 38-47.
HOPF, E. (1954). The general temporally discrete Markov process. J. Rat. Mech.
Anal. 3, 13-45.
HOROWITZ, J. (1972). Semilinear Markov processes, subordinators and renewal
theory. Z. Wahrsch. verw. Geb. 24, 167-193.
HUNT, G.A. (1956). Some theorems concerning Brownian motion. Trans. Amer.
Math. Soc. 81, 294-319.
- (1957-58). Markoff processes and potentials, I-III. Illinois J. Math. 1, 44-93,
316-369; 2, 151-213.
HUREWICZ, W. (1944). Ergodic theorem without invariant measure. Ann. Math.
45, 192-206.
HURWITZ, A. (1897). Uber die Erzeugung der Invarianten durch Integration.
Nachr. Ges. Gottingen, math.-phys. Kl., 71-90.
IKEDA, N., WATANABE, S. (1989). Stochastic Differential Equations and Diffusion
Processes, 2nd ed. North-Holland and Kodansha, Amsterdam and Tokyo.
IOFFE, D. (1991). On some applicable versions of abstract large deviations
theorems. Ann. Probab. 19, 1629-1639.
IONESCU TULCEA, A. (1960). Contributions to information theory for abstract
alphabets. Ark. Mat. 4, 235-247.
IONESCU TULCEA, C. T. (1949-50). Mesures dans les espaces produits. Atti Accad.
Naz. Lincei Rend. 7, 208-211.
ITO, K. (1942a). Differential equations determining Markov processes (Japanese).
Zenkoku Shijo Sugaku Danwakai 244:1077, 1352-1400.
- (1942b). On stochastic processes (I) (Infinitely divisible laws of probability).
Jap. J. Math. 18, 261-301.
- (1944). Stochastic integral. Proc. Imp. Acad. Tokyo 20, 519-524.
- (1946). On a stochastic integral equation. Proc. Imp. Acad. Tokyo 22, 32-35.
- (1951a). On a formula concerning stochastic differentials. Nagoya Math. J. 3,
55-65.
- (1951b). On stochastic differential equations. Mem. Amer. Math. Soc. 4, 1-51.
- (1951c). Multiple Wiener integral. J. Math. Soc. Japan 3, 157-169.
- (1957). Stochastic Processes (Japanese). Iwanami Shoten, Tokyo.
- (1972). Poisson point processes attached to Markov processes. Proc. 6th
Berkeley Symp. Math. Statist. Probab. 3, 225-239.
- (1984). Introduction to Probability Theory. Engl. trans., Cambridge Univ.
Press.
ITO, K., McKEAN, H.P. (1965). Diffusion Processes and their Sample Paths.
Repr. Springer, Berlin 1996.
ITO, K., WATANABE, S. (1965). Transformation of Markov processes by
multiplicative functionals. Ann. Inst. Fourier 15, 15-30.
JACOD, J. (1975). Multivariate point processes: Predictable projection, Radon-
Nikodym derivative, representation of martingales. Z. Wahrsch. verw. Geb.
31, 235-253.
- (1979). Calcul Stochastique et Problemes de Martingales. Lect. Notes in Math.
714. Springer, Berlin.
JACOD, J., SHIRYAEV, A.N. (1987). Limit Theorems for Stochastic Processes.
Springer, Berlin.
JAGERS, P. (1972). On the weak convergence of superpositions of point processes.
Z. Wahrsch. verw. Geb. 22, 1-7.
- (1974). Aspects of random measures and point processes. Adv. Probab. Rel.
Topics 3, 179-239. Marcel Dekker, NY.
JAMISON, B., OREY, S. (1967). Markov chains recurrent in the sense of Harris.
Z. Wahrsch. verw. Geb. 8, 206-223.
JENSEN, J.L.W.V. (1906). Sur les fonctions convexes et les inegalites entre les
valeurs moyennes. Acta Math. 30, 175-193.
JESSEN, B. (1934). The theory of integration in a space of an infinite number of
dimensions. Acta Math. 63, 249-323.
JOHNSON, W.B., SCHECHTMAN, G., ZINN, J. (1985). Best constants in moment
inequalities for linear combinations of independent and exchangeable random
variables. Ann. Probab. 13, 234-253.
JORDAN, C. (1881). Sur la serie de Fourier. C.R. Acad. Sci. Paris 92, 228-230.
KAC, M. (1947). On the notion of recurrence in discrete stochastic processes.
Bull. Amer. Math. Soc. 53, 1002-1010.
- (1949). On distributions of certain Wiener functionals. Trans. Amer. Math.
Soc. 65, 1-13.
- (1951). On some connections between probability theory and differential and
integral equations. Proc. 2nd Berkeley Symp. Math. Statist. Probab., 189-215.
Univ. of California Press, Berkeley.
KAKUTANI, S. (1940). Ergodic theorems and the Markoff process with a stable
distribution. Proc. Imp. Acad. Tokyo 16, 49-54.
- (1944a). On Brownian motions in n-space. Proc. Imp. Acad. Tokyo 20, 648-
652.
- (1944b). Two-dimensional Brownian motion and harmonic functions. Proc.
Imp. Acad. Tokyo 20, 706-714.
- (1945). Markoff process and the Dirichlet problem. Proc. Japan Acad. 21,
227-233.
KALLENBERG, O. (1973a). Characterization and convergence of random measures
and point processes. Z. Wahrsch. verw. Geb. 27, 9-21.
- (1973b). Canonical representations and convergence criteria for processes with
interchangeable increments. Z. Wahrsch. verw. Geb. 27, 23-36.
- (1986). Random Measures, 4th ed. Akademie-Verlag and Academic Press,
Berlin and London (1st ed. 1975).
- (1987). Homogeneity and the strong Markov property. Ann. Probab. 15, 213-
240.
- (1988). Spreading and predictable sampling in exchangeable sequences and
processes. Ann. Probab. 16, 508-534.
- (1990). Random time change and an integral representation for marked
stopping times. Probab. Th. Rel. Fields 86, 167-202.
- (1992). Some time change representations of stable integrals, via predictable
transformations of local martingales. Stoch. Proc. Appl. 40, 199-223.
- (1996a). On the existence of universal functional solutions to classical SDEs.
Ann. Probab. 24, 196-205.
- (1996b). Improved criteria for distributional convergence of point processes.
Stoch. Proc. Appl. 64, 93-102.
- (1999a). Ballot theorems and sojourn laws for stationary processes. Ann.
Probab. 27, 2011-2019.
- (1999b). Asymptotically invariant sampling and averaging from stationary-like
processes. Stoch. Proc. Appl. 82, 195-204.
KALLENBERG, O., SZTENCEL, R. (1991). Some dimension-free features of vector-
valued martingales. Probab. Th. Rel. Fields 88, 215-247.
KALLIANPUR, G. (1980). Stochastic Filtering Theory. Springer, NY.
KAPLAN, E.L. (1955). Transformations of stationary random sequences. Math.
Scand. 3, 127-149.
KARAMATA, J. (1930). Sur une mode de croissance reguliere des fonctions.
Mathematica (Cluj) 4, 38-53.
KARATZAS, I., SHREVE, S.E. (1991). Brownian Motion and Stochastic Calculus,
2nd ed. Springer, NY.
KAZAMAKI, N. (1972). Change of time, stochastic integrals and weak martingales.
Z. Wahrsch. verw. Geb. 22, 25-32.
KEMENY, J.G., SNELL, J.L., KNAPP, A.W. (1966). Denumerable Markov Chains.
Van Nostrand, Princeton.
KENDALL, D.G. (1974). Foundations of a theory of random sets. In Stochastic
Geometry (eds. E.F. Harding, D.G. Kendall), pp. 322-376. Wiley, NY.
KHINCHIN, A.Y. (1923). Uber dyadische Brüche. Math. Z. 18, 109-116.
- (1924). Uber einen Satz der Wahrscheinlichkeitsrechnung. Fund. Math. 6, 9-
20.
- (1929). Uber einen neuen Grenzwertsatz der Wahrscheinlichkeitsrechnung.
Math. Ann. 101, 745-752.
- (1933). Zur mathematischen Begründung der statistischen Mechanik. Z.
Angew. Math. Mech. 13, 101-103.
- (1933). Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Springer,
Berlin. Repr. Chelsea, NY 1948.
- (1934). Korrelationstheorie der stationaren stochastischen Prozesse. Math.
Ann. 109, 604-615.
- (1937). Zur Theorie der unbeschrankt teilbaren Verteilungsgesetze. Mat. Sb.
2, 79-119.
- (1938). Limit Laws for Sums of Independent Random Variables (Russian).
Moscow.
- (1960). Mathematical Methods in the Theory of Queuing. Engl. trans., Griffin,
London. (Russian orig. 1955.)
KHINCHIN, A.Y., KOLMOGOROV, A.N. (1925). Uber Konvergenz von Reihen
deren Glieder durch den Zufall bestimmt werden. Mat. Sb. 32, 668-676.
KINGMAN, J.F.C. (1964). On doubly stochastic Poisson processes. Proc.
Cambridge Phil. Soc. 60, 923-930.
- (1967). Completely random measures. Pac. J. Math. 21, 59-78.
- (1968). The ergodic theory of subadditive stochastic processes. J. Roy. Statist.
Soc. (B) 30, 499-510.
- (1972). Regenerative Phenomena. Wiley, NY.
- (1993). Poisson Processes. Clarendon Press, Oxford.
KINNEY, J.R. (1953). Continuity properties of Markov processes. Trans. Amer.
Math. Soc. 74, 280-302.
KNIGHT, F.B. (1963). Random walks and a sojourn density process of Brownian
motion. Trans. Amer. Math. Soc. 107, 56-86.
- (1971). A reduction of continuous, square-integrable martingales to Brownian
motion. Lect. Notes in Math. 190, 19-31. Springer, Berlin.
KOLMOGOROV, A.N. (1928-29). Uber die Summen durch den Zufall bestimmter
unabhangiger Grossen. Math. Ann. 99, 309-319; 102, 484-488.
- (1929). Uber das Gesetz des iterierten Logarithmus. Math. Ann. 101, 126-135.
- (1930). Sur la loi forte des grands nombres. C.R. Acad. Sci. Paris 191,
910-912.
- (1931a). Uber die analytischen Methoden in der Wahrscheinlichkeitsrechnung.
Math. Ann. 104, 415-458.
- (1931b). Eine Verallgemeinerung des Laplace-Liapounoffschen Satzes. Izv.
Akad. Nauk USSR, Otdel. Matem. Yestestv. Nauk 1931, 959-962.
- (1932). Sulla forma generale di un processo stocastico omogeneo (un problema
di B. de Finetti). Atti Accad. Naz. Lincei Rend. (6) 15, 805-808, 866-869.
- (1933a). Uber die Grenzwertsatze der Wahrscheinlichkeitsrechnung. Izv. Akad.
Nauk USSR, Otdel. Matem. Yestestv. Nauk 1933, 363-372.
- (1933b). Zur Theorie der stetigen zufalligen Prozesse. Math. Ann. 108, 149-
160.
- (1933c). Foundations of the Theory of Probability (German), Springer, Berlin.
Engl. trans., Chelsea, NY 1956.
- (1935). Some current developments in probability theory (in Russian). Proc.
2nd All-Union Math. Congr. 1, 349-358. Akad. Nauk SSSR, Leningrad.
- (1936a). Anfangsgründe der Markoffschen Ketten mit unendlich vielen
moglichen Zustanden. Mat. Sb. 1, 607-610.
- (1936b). Zur Theorie der Markoffschen Ketten. Math. Ann. 112, 155-160.
- (1937). Zur Umkehrbarkeit der statistischen Naturgesetze. Math. Ann. 113,
766-772.
- (1956). On Skorohod convergence. Th. Probab. Appl. 1, 213-222.
KOLMOGOROV, A.N., LEONTOVICH, M.A. (1933). Zur Berechnung der mittleren
Brownschen Fläche. Physik. Z. Sowjetunion 4, 1-13.
KOMLOS, J., MAJOR, P., TUSNADY, G. (1975-76). An approximation of partial
sums of independent r.v.'s and the sample d.f., I-II. Z. Wahrsch. verw. Geb.
32, 111-131; 34, 33-58.
KONIG, D., MATTHES, K. (1963). Verallgemeinerung der Erlangschen Formeln,
I. Math. Nachr. 26, 45-56.
KOOPMAN, B.O. (1931). Hamiltonian systems and transformations in Hilbert
space. Proc. Nat. Acad. Sci. USA 17, 315-318.
KRENGEL, U. (1985). Ergodic Theorems. de Gruyter, Berlin.
KRICKEBERG, K. (1956). Convergence of martingales with a directed index set.
Trans. Amer. Math. Soc. 83, 313-357.
- (1972). The Cox process. Symp. Math. 9, 151-167.
KRYLOV, N., BOGOLIOUBOV, N. (1937). La theorie generale de la mesure dans
son application a l'etude des systemes de la mecanique non lineaires. Ann.
Math. 38, 65-113.
KULLBACK, S., LEIBLER, R.A. (1951). On information and sufficiency. Ann.
Math. Statist. 22, 79-86.
KUNITA, H. (1990). Stochastic Flows and Stochastic Differential Equations.
Cambridge Univ. Press, Cambridge.
KUNITA, H., WATANABE, S. (1967). On square integrable martingales. Nagoya
Math. J. 30, 209-245.
KURTZ, T.G. (1969). Extensions of Trotter's operator semi group approximation
theorems. J. Funct. Anal. 3, 354-375.
- (1975). Semigroups of conditioned shifts and approximation of Markov
processes. Ann. Probab. 3, 618-642.
KWAPIEN, S., WOYCZYNSKI, W.A. (1992). Random Series and Stochastic
Integrals: Single and Multiple. Birkhauser, Boston.
LANGEVIN, P. (1908). Sur la theorie du mouvement brownien. C.R. Acad. Sci.
Paris 146, 530-533.
LAPLACE, P.S. DE (1774). Memoire sur la probabilite des causes par les
evenemens. Engl. trans. in Statistical Science 1, 359-378.
- (1809). Memoire sur divers points d'analyse. Repr. in Oeuvres Completes de
Laplace 14, 178-214. Gauthier-Villars, Paris 1886-1912.
- (1812-20). Theorie Analytique des Probabilites, 3rd ed. Repr. in Oeuvres
Completes de Laplace 7. Gauthier-Villars, Paris 1886-1912.
LAST, G., BRANDT, A. (1995). Marked Point Processes on the Real Line: The
Dynamic Approach. Springer, NY.
LEADBETTER, M.R., LINDGREN, G., ROOTZEN, H. (1983). Extremes and Related
Properties of Random Sequences and Processes. Springer, NY.
LEBESGUE, H. (1902). Integrale, longueur, aire. Ann. Mat. Pura Appl. 7, 231-359.
- (1904). Lecons sur l'Integration et la Recherche des Fonctions Primitives.
Paris.
LE CAM, L. (1957). Convergence in distribution of stochastic processes. Univ.
California Publ. Statist. 2, 207-236.
LE GALL, J.F. (1983). Applications des temps locaux aux equations differentielles
stochastiques unidimensionelles. Lect. Notes in Math. 986, 15-31.
LEVI, B. (1906a). Sopra l'integrazione delle serie. Rend. Ist. Lombardo Sci. Lett.
(2) 39, 775-780.
- (1906b). Sul principio di Dirichlet. Rend. Circ. Mat. Palermo 22, 293-360.
LEVY, P. (1922a). Sur le role de la loi de Gauss dans la theorie des erreurs. C.R.
Acad. Sci. Paris 174, 855-857.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 1682-1684.
- (1922c). Sur la determination des lois de probabilite par leurs fonctions
caracteristiques. C.R. Acad. Sci. Paris 175, 854-856.
- (1924). Theorie des erreurs. La loi de Gauss et les lois exceptionelles. Bull.
Soc. Math. France 52, 49-85.
- (1925). Calcul des Probabilites. Gauthier-Villars, Paris.
- (1934-35). Sur les integrales dont les elements sont des variables aleatoires
independantes. Ann. Scuola Norm. Sup. Pisa (2) 3, 337-366; 4, 217-218.
- (1935a). Proprietes asymptotiques des sommes de variables aleatoires inde-
pendantes ou enchainees. J. Math. Pures Appl. (8) 14, 347-402.
- (1935b). Proprietes asymptotiques des sommes de variables aleatoires en-
chainees. Bull. Sci. Math. (2) 59, 84-96, 109-128.
- (1939). Sur certain processus stochastiques homogenes. Comp. Math. 7,
283-339.
- (1940). Le mouvement brownien plan. Amer. J. Math. 62, 487-550.
- (1954). Theorie de l'Addition des Variables Aleatoires, 2nd ed. Gauthier-
Villars, Paris (1st ed. 1937).
- (1965). Processus Stochastiques et Mouvement Brownien, 2nd ed. Gauthier-
Villars, Paris (1st ed. 1948).
LIAPOUNOV, A.M. (1901). Nouvelle forme du theoreme sur la limite des
probabilites. Mem. Acad. Sci. St. Petersbourg 12, 1-24.
LIGGETT, T.M. (1985). An improved subadditive ergodic theorem. Ann. Probab.
13, 1279-1285.
LINDEBERG, J.W. (1922a). Eine neue Herleitung des Exponentialgesetzes in der
Wahrscheinlichkeitsrechnung. Math. Zeitschr. 15, 211-225.
- (1922b). Sur la loi de Gauss. C.R. Acad. Sci. Paris 174, 1400-1402.
LINDVALL, T. (1973). Weak convergence of probability measures and random
functions in the function space D[0, ∞). J. Appl. Probab. 10, 109-121.
- (1977). A probabilistic proof of Blackwell's renewal theorem. Ann. Probab. 5,
482-485.
- (1992). Lectures on the Coupling Method. Wiley, NY.
LIPTSER, R.S., SHIRYAEV, A.N. (2000). Statistics of Random Processes, I-II,
2nd ed., Springer, Berlin.
LOEVE, M. (1977-78). Probability Theory 1-2, 4th ed. Springer, NY (1st ed.
1955).
LOMNICKI, Z., ULAM, S. (1934). Sur la theorie de la mesure dans les
espaces combinatoires et son application au calcul des probabilites: I. Variables
independantes. Fund. Math. 23, 237-278.
LUKACS, E. (1970). Characteristic Functions, 2nd ed. Griffin, London.
LUNDBERG, F. (1903). Approximerad Framställning av Sannolikhetsfunktionen.
Återförsäkring av Kollektivrisker. Thesis, Uppsala.
MACKEVICIUS, V. (1974). On the question of the weak convergence of random
processes in the space D[0, ∞). Lithuanian Math. Trans. 14, 620-623.
MAISONNEUVE, B. (1974). Systemes Regeneratifs. Asterisque 15. Soc. Math. de
France.
MAKER, P. (1940). The ergodic theorem for a sequence of functions. Duke Math.
J. 6, 27-30.
MANN, H.B., WALD, A. (1943). On stochastic limit and order relations. Ann.
Math. Statist. 14, 217-226.
MARCINKIEWICZ, J., ZYGMUND, A. (1937). Sur les fonctions independantes.
Fund. Math. 29, 60-90.
- (1938). Quelques theoremes sur les fonctions independantes. Studia Math. 7,
104-120.
MARKOV, A.A. (1899). The law of large numbers and the method of least squares
(Russian). Izv. Fiz.-Mat. Obshch. Kazan Univ. (2) 8, 110-128.
- (1906). Extension of the law of large numbers to dependent events (Russian).
Bull. Soc. Phys. Math. Kazan (2) 15, 135-156.
MARUYAMA, G. (1954). On the transition probability functions of the Markov
process. Natl. Sci. Rep. Ochanomizu Univ. 5, 10-20.
- (1955). Continuous Markov processes and stochastic equations. Rend. Circ.
Mat. Palermo 4, 48-90.
MARUYAMA, G., TANAKA, H. (1957). Some properties of one-dimensional
diffusion processes. Mem. Fac. Sci. Kyushu Univ. 11, 117-141.
MATHERON, G. (1975). Random Sets and Integral Geometry. Wiley, London.
MATTHES, K. (1963). Stationare zufallige Punktfolgen, I. Jahresber. Deutsch.
Math.-Verein. 66, 66-79.
MATTHES, K., KERSTAN, J., MECKE, J. (1978). Infinitely Divisible Point
Processes. Wiley, Chichester. (German ed. 1974, Russian ed. 1982.)
McKEAN, H.P. (1969). Stochastic Integrals. Academic Press, NY.
McKEAN, H.P., TANAKA, H. (1961). Additive functionals of the Brownian path.
Mem. Coll. Sci. Univ. Kyoto, A 33, 479-506.
McMILLAN, B. (1953). The basic theorems of information theory. Ann. Math.
Statist. 24, 196-219.
MECKE, J. (1967). Stationare zufallige Maße auf lokalkompakten Abelschen
Gruppen. Z. Wahrsch. verw. Geb. 9, 36-58.
- (1968). Eine characteristische Eigenschaft der doppelt stochastischen
Poissonschen Prozesse. Z. Wahrsch. verw. Geb. 11, 74-81.
MELEARD, S. (1986). Application du calcul stochastique a l'etude des processus
de Markov reguliers sur [0,1]. Stochastics 19, 41-82.
METIVIER, M. (1982). Semimartingales: A Course on Stochastic Processes. de
Gruyter, Berlin.
METIVIER, M., PELLAUMAIL, J. (1980). Stochastic Integration. Academic Press,
NY.
MEYER, P.A. (1962). A decomposition theorem for supermartingales. Illinois J.
Math. 6, 193-205.
- (1963). Decomposition of supermartingales: The uniqueness theorem. Illinois
J. Math. 7, 1-17.
- (1966). Probability and Potentials. Engl. trans., Blaisdell, Waltham.
- (1967). Integrales stochastiques, I-IV. Lect. Notes in Math. 39, 72-162.
Springer, Berlin.
- (1971). Demonstration simplifiee d'un theoreme de Knight. Lect. Notes in
Math. 191, 191-195. Springer, Berlin.
- (1976). Un cours sur les integrales stochastiques. Lect. Notes in Math. 511,
245-398. Springer, Berlin.
MILLAR, P.W. (1968). Martingale integrals. Trans. Amer. Math. Soc. 133,
145-166.
MINKOWSKI, H. (1907). Diophantische Approximationen. Teubner, Leipzig.
MITOMA, I. (1983). Tightness of probabilities on C([0, 1]; S') and D([0, 1]; S').
Ann. Probab. 11, 989-999.
DE MOIVRE, A. (1711-12). On the measurement of chance. Engl. trans., Int.
Statist. Rev. 52 (1984), 229-262.
- (1718-56). The Doctrine of Chances; or, a Method of Calculating the Proba-
bility of Events in Play, 3rd ed. (post.) Repr. Case and Chelsea, London and
NY 1967.
- (1733-56). Approximatio ad Summam Terminorum Binomii (a + b)^n in Seriem
Expansi. Translated and edited in The Doctrine of Chances, 2nd and 3rd eds.
Repr. Case and Chelsea, London and NY 1967.
MONCH, G. (1971). Verallgemeinerung eines Satzes von A. Renyi. Studia Sci.
Math. Hung. 6, 81-90.
MOTOO, M., WATANABE, H. (1958). Ergodic property of recurrent diffusion
process in one dimension. J. Math. Soc. Japan 10, 272-286.
NAWROTZKI, K. (1962). Ein Grenzwertsatz fur homogene zufallige Punktfolgen
(Verallgemeinerung eines Satzes von A. Renyi). Math. Nachr. 24, 201-217.
VON NEUMANN, J. (1932). Proof of the quasi-ergodic hypothesis. Proc. Natl.
Acad. Sci. USA 18, 70-82.
- (1940). On rings of operators, III. Ann. Math. 41, 94-161.
NEVEU, J. (1971). Mathematical Foundations of the Calculus of Probability.
Holden-Day, San Francisco.
- (1975). Discrete-Parameter Martingales. North-Holland, Amsterdam.
NGUYEN, X.X., ZESSIN, H. (1979). Ergodic theorems for spatial processes. Z.
Wahrsch. verw. Geb. 48, 133-158.
NIKODYM, O.M. (1930). Sur une generalisation des integrales de M. J. Radon.
Fund. Math. 15, 131-179.
NORBERG, T. (1984). Convergence and existence of random set distributions.
Ann. Probab. 12, 726-732.
NOVIKOV, A.A. (1971). On moment inequalities for stochastic integrals. Th.
Probab. Appl. 16, 538-541.
- (1972). On an identity for stochastic integrals. Th. Probab. Appl. 17, 717-720.
NUALART, D. (1995). The Malliavin Calculus and Related Topics. Springer, NY.
ØKSENDAL, B. (1998). Stochastic Differential Equations, 5th ed. Springer, Berlin.
OREY, S. (1959). Recurrent Markov chains. Pacific J. Math. 9, 805-827.
- (1962). An ergodic theorem for Markov chains. Z. Wahrsch. verw. Geb. 1,
174-176.
- (1966). F-processes. Proc. 5th Berkeley Symp. Math. Statist. Probab. 2:1,
301-313.
- (1971). Limit Theorems for Markov Chain Transition Probabilities. Van
Nostrand, London.
ORNSTEIN, D.S. (1969). Random walks. Trans. Amer. Math. Soc. 138, 1-60.
ORNSTEIN, L.S., UHLENBECK, G.E. (1930). On the theory of Brownian motion.
Phys. Review 36, 823-841.
OSOSKOV, G.A. (1956). A limit theorem for flows of homogeneous events. Th.
Probab. Appl. 1, 248-255.
OTTAVIANI, G. (1939). Sulla teoria astratta del calcolo delle probabilita proposita
dal Cantelli. Giorn. Ist. Ital. Attuari 10, 10-40.
PALEY, R.E.A.C. (1932). A remarkable series of orthogonal functions I. Proc.
London Math. Soc. 34, 241-264.
PALEY, R.E.A.C., WIENER, N. (1934). Fourier transforms in the complex
domain. Amer. Math. Soc. Coll. Publ. 19.
PALEY, R.E.A.C., WIENER, N., ZYGMUND, A. (1933). Notes on random
functions. Math. Z. 37, 647-668.
PALM, C. (1943). Intensity Variations in Telephone Traffic (German). Ericsson
Technics 44, 1-189. Engl. trans., North-Holland Studies in Telecommunication
10, Elsevier 1988.
PAPANGELOU, F. (1972). Integrability of expected increments of point processes
and a related random change of scale. Trans. Amer. Math. Soc. 165, 486-506.
PARTHASARATHY, K.R. (1967). Probability Measures on Metric Spaces. Academic
Press, NY.
PERKINS, E. (1982). Local time and pathwise uniqueness for stochastic
differential equations. Lect. Notes in Math. 920, 201-208. Springer, Berlin.
PETROV, V.V. (1995). Limit Theorems of Probability Theory. Clarendon Press,
Oxford.
PHILLIPS, H.B., WIENER, N. (1923). Nets and Dirichlet problem. J. Math. Phys.
2, 105-124.
PITT, H.R. (1942). Some generalizations of the ergodic theorem. Proc. Camb.
Phil. Soc. 38, 325-343.
POINCARE, H. (1890). Sur les equations aux derivees partielles de la physique
mathematique. Amer. J. Math. 12, 211-294.
- (1899). Theorie du Potentiel Newtonien. Gauthier-Villars, Paris.
POISSON, S.D. (1837). Recherches sur la Probabilite des Jugements en Matiere
Criminelle et en Matiere Civile, Precedees des Regles Generales du Calcul des
Probabilites. Bachelier, Paris.
POLLACZEK, F. (1930). Uber eine Aufgabe der Wahrscheinlichkeitstheorie I-II.
Math. Z. 32, 64-100, 729-750.
POLLARD, D. (1984). Convergence of Stochastic Processes. Springer, NY.
PÓLYA, G. (1920). Uber den zentralen Grenzwertsatz der Wahrscheinlichkeits-
rechnung und das Momentenproblem. Math. Z. 8, 171-181.
- (1921). Uber eine Aufgabe der Wahrscheinlichkeitsrechnung betreffend die Irr-
fahrt im Strassennetz. Math. Ann. 84, 149-160.
PORT, S.C., STONE, C.J. (1978). Brownian Motion and Classical Potential
Theory. Academic Press, NY.
POSPISIL, B. (1935-36). Sur un probleme de M.M.S. Bernstein et A. Kolmogoroff.
Casopis Pest. Mat. Fys. 65, 64-76.
PROHOROV, Y.V. (1956). Convergence of random processes and limit theorems
in probability theory. Th. Probab. Appl. 1, 157-214.
- (1959). Some remarks on the strong law of large numbers. Th. Probab. Appl.
4, 204-208.
- (1961). Random measures on a compactum. Soviet Math. Dokl. 2, 539-541.
PROTTER, P. (1990). Stochastic Integration and Differential Equations. Springer,
Berlin.
PUKHALSKY, A.A. (1991). On functional principle of large deviations. In New
Trends in Probability and Statistics (V. Sazonov and T. Shervashidze, eds.),
198-218. VSP Moks'las, Moscow.
RADON, J. (1913). Theorie und Anwendungen der absolut additiven Mengen-
funktionen. Wien Akad. Sitzungsber. 122, 1295-1438.
RAO, K.M. (1969a). On decomposition theorems of Meyer. Math. Scand. 24,
66-78.
- (1969b). Quasimartingales. Math. Scand. 24, 79-92.
RAY, D.B. (1956). Stationary Markov processes with continuous paths. Trans.
Amer. Math. Soc. 82, 452-493.
- (1963). Sojourn times of a diffusion process. Illinois J. Math. 7, 615-630.
RENYI, A. (1956). A characterization of Poisson processes. Magyar Tud. Akad.
Mat. Kutato Int. Kozl. 1, 519-527.
- (1967). Remarks on the Poisson process. Studia Sci. Math. Hung. 2, 119-123.
REVUZ, D. (1970). Mesures associees aux fonctionnelles additives de Markov, I-II.
Trans. Amer. Math. Soc. 148, 501-531; Z. Wahrsch. verw. Geb. 16, 336-344.
- (1984). Markov Chains, 2nd ed. North-Holland, Amsterdam.
REVUZ, D., YOR, M. (1999). Continuous Martingales and Brownian Motion,
3rd ed. Springer, Berlin.
RIESZ, F. (1909a). Sur les suites de fonctions mesurables. C.R. Acad. Sci. Paris
148, 1303-1305.
- (1909b). Sur les operations fonctionelles lineaires. C.R. Acad. Sci. Paris 149,
974-977.
- (1910). Untersuchungen über Systeme integrierbarer Funktionen. Math. Ann.
69, 449-497.
- (1926-30). Sur les fonctions subharmoniques et leur rapport à la theorie du
potentiel, I-II. Acta Math. 48, 329-343; 54, 321-360.
ROGERS, C.A., SHEPHARD, G.C. (1958). Some extremal problems for convex
bodies. Mathematika 5, 93-102.
ROGERS, L.C.G., WILLIAMS, D. (2000a/b). Diffusions, Markov Processes, and
Martingales, 1 (2nd ed.); 2. Cambridge Univ. Press.
ROSEN, B. (1964). Limit theorems for sampling from a finite population. Ark.
Mat. 5, 383-424.
ROSINSKI, J., WOYCZYNSKI, W.A. (1986). On Ito stochastic integration with
respect to p-stable motion: Inner clock, integrability of sample paths, double
and multiple integrals. Ann. Probab. 14, 271-286.
ROYDEN, H.L. (1988). Real Analysis, 3rd ed. Macmillan, NY.
RUTHERFORD, E., GEIGER, H. (1908). An electrical method of counting the
number of particles from radioactive substances. Proc. Roy. Soc. A 81, 141-
161.
RYLL-NARDZEWSKI, C. (1957). On stationary sequences of random variables and
the de Finetti's [sic] equivalence. Colloq. Math. 4, 149-156.
- (1961). Remarks on processes of calls. Proc. 4th Berkeley Symp. Math. Statist.
Probab. 2, 455-465.
SANOV, I.N. (1957). On the probability of large deviations of random variables
(Russian). Engl. trans.: Sel. Trans. Math. Statist. Probab. 1 (1961), 213-244.
SCHILDER, M. (1966). Some asymptotic formulae for Wiener integrals. Trans.
Amer. Math. Soc. 125, 63-85.
SCHOENBERG, I.J. (1938). Metric spaces and completely monotone functions.
Ann. Math. 39, 811-841.
SCHRODINGER, E. (1931). Uber die Umkehrung der Naturgesetze. Sitzungsber.
Preuss. Akad. Wiss. Phys. Math. Kl. 144-153.
VAN SCHUPPEN, J.H., WONG, E. (1974). Transformation of local martingales
under a change of law. Ann. Probab. 2, 879-888.
SEGAL, I.E. (1954). Abstract probability spaces and a theorem of Kolmogorov.
Amer. J. Math. 76, 721-732.
SHANNON, C.E. (1948). A mathematical theory of communication. Bell System
Tech. J. 27, 379-423, 623-656.
SHARPE, M. (1988). General Theory of Markov Processes. Academic Press,
Boston.
SHIRYAEV, A.N. (1995). Probability, 2nd ed. Springer, NY.
SHUR, M.G. (1961). Continuous additive functionals of a Markov process. Dokl.
Akad. Nauk SSSR 137, 800-803.
SIERPINSKI, W. (1928). Un theoreme general sur les familles d'ensembles. Fund.
Math. 12, 206-210.
SKOROHOD, A.V. (1956). Limit theorems for stochastic processes. Th. Probab.
Appl. 1, 261-290.
- (1957). Limit theorems for stochastic processes with independent increments.
Th. Probab. Appl. 2, 122-142.
- (1961-62). Stochastic equations for diffusion processes in a bounded region,
I-II. Th. Probab. Appl. 6, 264-274; 7, 3-23.
- (1965). Studies in the Theory of Random Processes. Addison-Wesley, Reading,
MA. (Russian orig. 1961.)
SLIVNYAK, I.M. (1962). Some properties of stationary flows of homogeneous
random events. Th. Probab. Appl. 7, 336-341.
SLUTSKY, E.E. (1937). Qualche proposizione relativa alla teoria delle funzioni
aleatorie. Giorn. Ist. Ital. Attuari 8, 183-199.
SNELL, J.L. (1952). Application of martingale system theorems. Trans. Amer.
Math. Soc. 73, 293-312.
SOVA, M. (1967). Convergence d'operations lineaires non bornees. Rev. Roumaine
Math. Pures Appl. 12, 373-389.
SPARRE-ANDERSEN, E. (1953-54). On the fluctuations of sums of random
variables, I-II. Math. Scand. 1, 263-285; 2, 195-223.
SPARRE-ANDERSEN, E., JESSEN, B. (1948). Some limit theorems on set-functions.
Danske Vid. Selsk. Mat.-Fys. Medd. 25:5 (8 pp.).
SPITZER, F. (1964). Electrostatic capacity, heat flow, and Brownian motion. Z.
Wahrsch. verw. Geb. 3, 110-121.
- (1976). Principles of Random Walk, 2nd ed. Springer, NY.
STIELTJES, T.J. (1894-95). Recherches sur les fractions continues. Ann. Fac. Sci.
Toulouse 8, 1-122; 9, 1-47.
STONE, C. J. (1963). Weak convergence of stochastic processes defined on a semi-
infinite time interval. Proc. Amer. Math. Soc. 14, 694-696.
- (1969). On the potential operator for one-dimensional recurrent random walks.
Trans. Amer. Math. Soc. 136, 427-445.
STONE, M.H. (1932). Linear transformations in Hilbert space and their
applications to analysis. Amer. Math. Soc. Coll. Publ. 15.
STOUT, W.F. (1974). Almost Sure Convergence. Academic Press, NY.
STRASSEN, V. (1964). An invariance principle for the law of the iterated
logarithm. Z. Wahrsch. verw. Geb. 3, 211-226.
STRATONOVICH, R.L. (1966). A new representation for stochastic integrals and
equations. SIAM J. Control 4, 362-371.
STRICKER, C., YOR, M. (1978). Calcul stochastique dependant d'un parametre.
Z. Wahrsch. verw. Geb. 45, 109-133.
STROOCK, D.W. (1993). Probability Theory: An Analytic View. Cambridge Univ.
Press.
STROOCK, D.W., VARADHAN, S.R.S. (1969). Diffusion processes with continuous
coefficients, I-II. Comm. Pure Appl. Math. 22, 345-400, 479-530.
- (1979). Multidimensional Diffusion Processes. Springer, Berlin.
SUCHESTON, L. (1983). On one-parameter proofs of almost sure convergence of
multiparameter processes. Z. Wahrsch. verw. Geb. 63, 43-49.
TAKACS, L. (1967). Combinatorial Methods in the Theory of Stochastic Processes.
Wiley, NY.
TANAKA, H. (1963). Note on continuous additive functionals of the 1-dimensional
Brownian path. Z. Wahrsch. verw. Geb. 1, 251-257.
TEMPEL'MAN, A.A. (1972). Ergodic theorems for general dynamical systems.
Trans. Moscow Math. Soc. 26, 94-132.
THORISSON, H. (1996). Transforming random elements and shifting random
fields. Ann. Probab. 24, 2057-2064.
- (2000). Coupling, Stationarity, and Regeneration. Springer, NY.
TONELLI, L. (1909). Sull'integrazione per parti. Rend. Acc. Naz. Lincei (5) 18,
246-253.
TROTTER, H.F. (1958a). Approximation of semi-groups of operators. Pacific J.
Math. 8, 887-919.
- (1958b). A property of Brownian motion paths. Illinois J. Math. 2, 425-433.
VARADARAJAN, V.S. (1958). Weak convergence of measures on separable metric
spaces. On the convergence of probability distributions. Sankhya 19, 15-26.
- (1963). Groups of automorphisms of Borel spaces. Trans. Amer. Math. Soc.
109, 191-220.
VARADHAN, S.R.S. (1966). Asymptotic probabilities and differential equations.
Comm. Pure Appl. Math. 19, 261-286.
- (1984). Large Deviations and Applications. SIAM, Philadelphia.
VILLE, J. (1939). Etude Critique de la Notion du Collectif. Gauthier-Villars,
Paris.
VITALI, G. (1905). Sulle funzioni integrali. Atti R. Accad. Sci. Torino 40, 753-
766.
VOLKONSKY, V.A. (1958). Random time changes in strong Markov processes.
Th. Probab. Appl. 3, 310-326.
- (1960). Additive functionals of Markov processes. Trudy Mosk. Mat. Obshc.
9, 143-189.
WALD, A. (1946). Differentiation under the integral sign in the fundamental
identity of sequential analysis. Ann. Math. Statist. 17, 493-497.
- (1947). Sequential Analysis. Wiley, NY.
WALSH, J.B. (1978). Excursions and local time. Asterisque 52-53, 159-192.
WANG, A.T. (1977). Generalized Ito's formula and additive functionals of
Brownian motion. Z. Wahrsch. verw. Geb. 41, 153-159.
WATANABE, H. (1964). Potential operator of a recurrent strong Feller process in
the strict sense and boundary value problem. J. Math. Soc. Japan 16, 83-95.
WATANABE, S. (1964). On discontinuous additive functionals and Levy measures
of a Markov process. Japan. J. Math. 34, 53-79.
- (1968). A limit theorem of branching processes and continuous state branching
processes. J. Math. Kyoto Univ. 8, 141-167.
WEIL, A. (1940). L'integration dans les Groupes Topologiques et ses Applications.
Hermann et Cie, Paris.
WIENER, N. (1923). Differential space. J. Math. Phys. 2, 131-174.
- (1938). The homogeneous chaos. Amer. J. Math. 60, 897-936.
- (1939). The ergodic theorem. Duke Math. J. 5, 1-18.
WILLIAMS, D. (1991). Probability with Martingales. Cambridge Univ. Press.
YAMADA, T. (1973). On a comparison theorem for solutions of stochastic
differential equations and its applications. J. Math. Kyoto Univ. 13, 497-512.
YAMADA, T., WATANABE, S. (1971). On the uniqueness of solutions of stochastic
differential equations. J. Math. Kyoto Univ. 11, 155-167.
YOEURP, C. (1976). Decompositions des martingales locales et formules expo-
nentielles. Lect. Notes in Math. 511, 432-480. Springer, Berlin.
YOR, M. (1978). Sur la continuite des temps locaux associee a certaines
semimartingales. Asterisque 52-53, 23-36.
YOSIDA, K. (1948). On the differentiability and the representation of one-
parameter semigroups of linear operators. J. Math. Soc. Japan 1, 15-21.
YOSIDA, K., KAKUTANI, S. (1939). Birkhoff's ergodic theorem and the maximal
ergodic theorem. Proc. Imp. Acad. 15, 165-168.
ZAHLE, M. (1980). Ergodic properties of general Palm measures. Math. Nachr.
95, 93-106.
ZAREMBA, S. (1909). Sur le principe du minimum. Bull. Acad. Sci. Cracovie.
ZYGMUND, A. (1951). An individual ergodic theorem for noncommutative
transformations. Acta Sci. Math. (Szeged) 14, 103-110.
Symbol Index
C, 378
Co, Co, 369, 374
C k , 340
Ct:, 98, 225
Cb(8), 65
C(K, S), 307
cf, 481
cov(, 1]), cov[; A], 50, 302
FD, Fv, 480
F 0 Q, 2
F v g, V n Fn, 50
FJlQ, FJlgH, 50,109
f, 11
1-1, 3
I:, I:;, 340
I . A, 442
fog, 5
/0g, 262
(f, g), I -l g, 17
f . J-l, 12
I >- U, I -< V, 36
L, 567
, 308
l.pB, 324
A, 499
An, 391
A>', 372, 442
A C , A \ B, ALlB, 1
A X B, 2
A, AIL, 13, 46
IBI, 187
BO B- 541
, ,
8, B(8), 2
GD,gD, 477
l'f, 481
Dh, V h , 434
D(R+, S), 313, 563
D([O, 1], S), 319
Ll, \7, 1, 287, 375, 377, 483
8, 8, 150, 187, 473
6 x , 8
:: , 48
d
-+, 65
HI, Hoo, 543, 553
H@n 262
,
H, 480
ha,b, 456
H(), H(IF), H(vlll), 220,554
E, 48, 225
E , 443
Ex, EJ.l' 145
E[; A], 49
E[IF] == E:F, 104
£, En, 263, 335
£(X), 363, 522
I, I(B), 368, 545
In, 263
I(), I(IF), 220
Tfa, 181, 189
id, Id, 295
K, 324
KD, lC;, 480
Fn, 75
II F II , 33
F, 120, 324
F , 124
F+, 121
Fr, 120
Fr- , 491
Foo, 132
Lt, Lf, 430, 436, 446
L , 481
LP, Lfoc' 15, 36;
L(X), L(X), 336-337, 344, 526
L 2 (M), 517
L 2 ( 1] ) , 266
.c(), 47
A, 24
A, A"', 539, 554
M, MJ, 184,391
(M), (M, N), 280, 516
M, Mo, 526
M 2 , M5, Mfoc, 331,515-516
M(S), 19, 225
524
,
jj, J-L, 227
J..lt, J-Ls,t, 144
J.L, 481
J..lJ, 10
J..loJ-l, 10
J-L*l/, 15
J..ll/, J.L l/, 14, 20, 142
J..l 1- l/, J.L « l/, J.L rv l/, 13, 29, 363
J..l V l/, J.L 1\ l/, 29
S, 377
s..n, "Snf, 184, 391, 396
8, 81-" 225, 316, 324, 564
a{.}, 2,5
supp J-L, 9
Tt, T/', 368, 372
TA, TB, 123, 492
Ta, Ta,b, 455
[r], _ 492
()t, (), 146, 179, 189, 391
U, U Q , Uh, UA, U1, 402,442-443
N(m,a 2 ), 90
N(S), 226
N, 2
(n//k), 187
l/, 290, 435
l/ A , 442
v . X, 128, 336, 517-518, 526
v
, 98, 564
var(), var[c;; A], 50, 71
wf, w(J, h), w(J, t, h), 57, 274, 310,
562
w(j,t,h), 563
w
, 65
n, w, 46
n T 2
,
XC, X d , 527
X r , 128
X., X;, 129
XodY, 342
[X], [X, Y], 280, 332, 519
{, 436
, 503
. , 226
, n, 190, 538
P, 46
P , 509
Px, PJ.l' 145,391
Po-l, 47
P[AIF] = p:F A, 106
P(S), 19
Pa,b, 456
P, Pj, 151, 243
Pt, PP, 475-476
p
, 63, 408
1fB, 1fJ, 1ft, 19,47,225-226,316
Z, 432
Z,Z+, 6,59
(, (D, 380, 473
Qx,, Q'x,, 203, 209
Q, Q+, 98, 125
0, 1
[[0,1), 464
1, 58
lA, 1{.}, 5,46
2 8 1
,
<, 57
,--...
II . II, II . II p , 15, 152, 369
R).., 370
R, IR+, JR , R +, 2, 5
Tx,y, 149
Author Index
Abel, N.H., 15, 144, 147, 242
Adler, R.J., 580
Akcoglu, M.A., 586
Aldous, D.J., 197, 314, 577-578, 583,
587
Alexandrov, A.D., 75, 571
Andre, D., 165, 223, 575, 578
Arndt, D., 604
Arzela, C., 307, 310-311, 559, 563
Ascoli, G., 307, 310-311, 559, 563
Athreya, K.B., 576
Baccelli, F., 578, 592
Bachelier, L., 256, 574-575, 580, 590
Bahadur, R.R., 594
Banach, S., 49, 369, 534, 585
Barbier, E., 578
Bass, R.F., 591
Bauer, H., 595
Baxter, G., 159, 169, 576
Bayes, T., 578
Belyaev, Y.K., 583
Berbee, H.C.P., 197, 577, 587
Bernoulli, J., 46, 55-56, 539, 571, 580
Bernstein, S.N., 128, 247, 572-573,
587
Bertoin, J., 582
Bertrand, J., 223, 578
Bessel, F. W., 256
Bichteler, K., 533, 593
Bienayme, J., 63, 69, 571
Billingsley, P., 569, 571, 573, 578,
581, 583, 595
Birkhoff, G.D., 178, 181, 391, 393,
576
Blackwell, D., 172, 575-576, 586
Blumenthal, R.M., 380-381, 446, 501,
575, 586, 588-589, 591-592
Bochner, S., 100, 261, 572, 580
Bogolioubov, N., 196, 577
Bohl, 200
Boltzmann, L., 576
Borel, E., 2, 3, 7, 24-25, 45, 47, 55,
119, 131, 308, 561, 569-571,
574
Brandt, A., 592
Breiman, L., 221, 561, 578, 589
Bremaud, P., 578, 592
Brown, R., 252-253, 439, 580
Bryc, W., 547, 594
Bühlmann, H., 217, 578
Buniakovsky, V.Y., 17, 334, 569
Burkholder, D.L., 333, 524, 584, 591,
593
Cameron, R.H., 364, 537, 543,
584-585
Cantelli, F.P., 45, 47, 55, 75, 119,
131, 570-571, 574
Caratheodory, C., 24, 26, 570
Carleson, L., 578
Cauchy, A.L., 16-17, 65, 238, 304,
334, 470-472, 569
Chacon, R.V., 393, 586
Chapman, S., 140, 142-143, 145, 154,
367-368, 378, 574
Chebyshev, P.L., 63, 69, 571-572
Chentsov, N.N., 57, 313, 571, 583
Chernoff, H., 540, 583, 594
Choquet, G., 483, 486-487, 562, 577,
583, 591
Chow, Y.S., 572, 574
Chung, K.L., 162, 221, 272, 381,
481, 575-576, 578-579, 584,
590-591
Courant, R., 590
Courrege, P., 334, 517, 584, 592
Cox, D., 224, 226-228, 230-233, 246,
317-319, 327-328, 579, 583
Cramer, H., 87, 261, 540, 571-572,
577, 580, 594
Csörgő, M., 581
Daley, D.J., 578-579
Dambis, K.E., 352, 585
Daniell, P.J., 23, 104, 114, 570, 573
Davis, B.J., 524, 593, 598
Dawson, D.A., 551, 594
Day, M.M., 201, 576
Debes, H., 583
Dellacherie, C., 357, 533, 562,
574-575, 577, 580, 586, 589,
591-593, 595
Dembo, A., 594
Derman, C., 464, 589
Deuschel, J.D., 594
Dini, D., 541
Dirac, P., 8
Dirichlet, P.G.L., 470, 474, 590
Doeblin, W., 299, 303, 574-575, 579,
582, 586, 589
Dohler, R., 573
Doleans(-Dade), C., 345, 493,
496-497, 518, 522, 584,
591-593
Donsker, M.D., 275, 312, 319, 555,
581-582, 594
Doob, J.L., 7, 109-110, 124, 126-127,
129-131, 134-136, 138, 237,
358, 474, 490, 493, 495,
507, 509, 569, 571, 573-575,
577-579, 581, 584-587,
590-592
Dubins, L.E., 352, 581, 585
Dudley, R.M., 79, 563, 569, 572, 583,
594
Dunford, N., 69, 392, 571, 586, 591
Durrett, R., 581
Dvoretzky, A., 281, 581
Dynkin, E.B., 380, 382-384, 456, 569,
574-575, 577, 585-589
Egorov, D., 18
Einstein, A., 580
Elliott, R.J., 592
Ellis, R.S., 594
Engelbert, H.J., 450-451, 589
Erdos, P., 276, 576, 581
Erlang, A.K., 234, 579
Ethier, S.N., 563, 575, 583, 586, 588,
595
Faber, G., 571
Farrell, R.H., 195, 577
Fatou, P., 11, 46, 67, 569
Fell, J.M.G., 324, 470, 565-567, 595
Feller, W., 92-93, 96, 165, 172, 302,
367, 369-387, 400, 405-409,
421, 442, 456, 458, 462, 465,
501, 572, 575-576, 578-580,
582, 585-587, 589-590
Fenchel, W., 537, 539, 554, 594
Feynman, R.P., 470-471, 590
Fichtner, K.H., 246
de Finetti, B., 202, 212, 578, 581
Fisk, D.L., 339, 342, 426, 584, 593
Fortet, R., 586
Fourier, J.B.J., 90, 100, 163, 262, 575
Franken, P., 578
Frechet, M., 569
Freedman, D., 251, 575, 580, 586,
589, 594
Freidlin, M.I., 537, 553, 594
Friedrichs, K., 599
Frostman, O., 590
Fubini, G., 14, 52, 108, 569
Fuchs, W.H.J., 162, 575
Furstenberg, H., 193, 577
Galmarino, A.R., 355, 585
Garsia, A.M., 182, 502, 525, 576, 591
Gartner, J., 551, 594
Gauss, C.F., 90-96, 250-254, 260-263,
266, 351, 473, 539, 579, 590
Geiger, H., 579
Getoor, R.K., 446, 575, 585, 588-589,
591-592
Gihman, I.I., 587
Girsanov, I.V., 362, 365, 515, 585,
587, 592
Glivenko, V.I., 75, 571
Gnedenko, B.V., 303, 582
Goldman, J.R., 583
Goldstein, J.A., 586
Goldstein, S., 197, 577, 586
Grandell, J., 579
Green, G., 458, 470, 475, 477-486,
590
Greenwood, P., 588
Griffeath, D., 577, 586
Grigelionis, B., 503, 583, 592
Gronwall, 415, 455, 554
Gundy, R.F., 333, 524, 584, 598
Haar, A., 23, 39, 41, 198, 570
Hagberg, J., 583
Hahn, H., 28, 33, 35, 49, 534, 570
Hajek, J., 583
Hall, P., 581
Halmos, P.R., 569, 573
Hardy, G.H., 184
Harris, T.E., 400, 405-406, 408, 410,
583, 586-587
Hartman, P., 275, 581
Hausdorff, F., 36, 247, 311, 399, 563
Heine, H.E., 25
Helly, E., 98, 572
Hermite, C., 84, 265-266
Hewitt, E., 45, 53, 161, 570
Heyde, C.C., 581
Hilbert, D., 104, 188, 251, 260,
262-263, 265-266, 331, 351,
515, 543
Hille, E., 367, 375, 585
Hitczenko, P., 592
Holder, O., 15, 49, 57, 109, 252, 268,
313, 426, 448, 569-570, 586,
589
Hopf, E., 159, 168, 392, 586
Horowitz, J., 588
Hunt, G.A., 124, 256, 443, 476, 580,
586, 588-591
Hurewicz, W., 586
Hurwitz, A., 570
Ikeda, N., 584, 588
Ioffe, D., 549, 594
Ionescu Tulcea, A., 221, 578
Ionescu Tulcea, C.T., 104, 116, 573
Ito, K., 263, 265, 287, 336, 339-341,
357-358, 415, 431, 435-436,
458, 520, 571, 579-580, 582,
584-585, 587-591, 593
Jacod, J., 503, 518, 524, 563, 582-583,
585, 588, 592-593, 595
Jagers, P., 583
Jamison, B., 586
Jensen, J.L.W.V., 49, 109, 570
Jessen, B., 132, 574
Johnson, W.B., 593
Jordan, C., 33, 570
Kac, M., 276, 470-471, 577, 581, 590
Kakutani, S., 182, 354, 474, 575-576,
584, 586, 590
Kallenberg, O., 571, 573, 575,
578-579, 582-585, 588,
592-593
Kallianpur, G., 580
Kaplan, E.L., 206, 577
Karamata, J., 96, 572
Karatzas, I., 580, 584-585, 588-591
Kazamaki, N., 344, 584
Kemeny, J.G., 575
Kendall, D., 583
Kerstan, J., 600, 612
Kesten, H., 193, 577
Khinchin, A., 70, 96, 259, 290,
302, 537, 571, 576-578, 580,
582-583, 586, 593
Kingman, J.F.C., 178, 192, 577, 579
Kinney, J.R., 379, 571, 586
Knapp, A.W., 608
Knight, F.B., 355, 428, 440, 585, 588
Koebe, P., 473
Kolmogorov, A.N., 53, 57, 69-71,
73, 104, I1t), 132, 142-143,
145, 152, 154, 242, 291, 313,
368, 371, 471, 563, 570-571,
573-574, 576, 579-582, 585,
589-590, 593
Komlós, J., 581
Konig, D., 207, 578, 604
Koopman, B.O., 576
Korolyuk, V.S., 207, 578
Krengel, U., 577, 587
Krickeberg, K., 579, 593
Kronecker, L., 62, 73
Krylov, N., 196, 577
Kullback, S., 594
Kunita, H., 336, 347, 517, 521, 584,
587, 592-593
Kuratowski, K., 562
Kurtz, T.G., 385, 563, 575, 583, 586,
588, 595
Kuznetsov, S.E., 591
Kwapien, S., 572
Langevin, P., 414, 580, 587
Laplace, P.S. de, 84, 86, 88, 100, 227,
370, 375, 473, 572-573, 579,
590
Last, G., 592
Leadbetter, M.R., 571, 577, 582
Lebesgue, H., 11-12, 14,24-25,27,
29, 31, 55, 569-570
Le Cam, L., 583
van Leeuwenhoek, A., 580
Le Gall, J.F., 589
Legendre, A.M., 537, 539, 554, 594
Leibler, R.A., 594
Leontovich, M.A., 590
Levi, B., 11, 569
Levy, P., 71, 86, 90, 93, 96, 100,
128, 131-132, 234, 252, 255,
258, 285-287, 290-292, 294,
298-299, 352-354, 374, 430,
436, 571-576, 579-582, 584,
586-588, 593
Lewy, H., 599
Liapounov, A.M., 572
Liemant, A., 600
Liggett, T., 577
Lindeberg, J.W., 90, 92, 572, 586
Lindgren, G., 610
Lindvall, T., 575-576, 583, 587
Lipschitz, R., 268, 415, 453-455, 553,
589
Liptser, R.S., 611
Littlewood, J.E., 184
Loeve, M., 57, 569, 571-572, 576-577,
582
Lomnicki, Z., 117, 573
Lukacs, E., 572
Lundberg, F., 579
Lusin, N.N., 19, 562
Mackevicius, V., 385, 586
Maisonneuve, B., 588, 600
Major, P., 609
Maker, P., 183, 577
Mann, H.B., 76, 571
Marcinkiewicz, J., 73, 571, 593
Markov, A.A., 63, 140-155, 237-245,
254, 256, 368, 378, 380, 387,
391, 396, 421, 571, 573-574
Martin, W.T., 364, 537, 543, 584-585
Maruyama, G., 465, 585, 589
Matheron, G., 583, 591, 595
Matthes, K., 207, 578-579, 583, 600
Maxwell, J.C., 251, 579
McDonald, D., 596
McKean, H.P., 447, 458, 580, 588-590
McMillan, B., 221, 578
Mecke, J., 319, 579, 583, 612
Meleard, S., 460, 589
Memin, J., 518
Metivier, M., 592
Meyer, P.A., 136, 431, 493-494, 498,
501, 505, 510, 518, 526-527,
562, 574-575, 577, 584, 586,
588, 591-593, 595
Millar, P.W., 333, 584
Minkowski, H., 15-16, 109, 183,
190-191, 263, 569
Mitoma, I., 583
de Moivre, A., 572, 579
Monch, G., 579
de Morgan, A., 1
Motoo, M., 464, 589
Nawrotzki, K., 583
von Neumann, J., 200, 570, 573, 576
Neveu, J., 221, 502, 574, 587, 591
Newton, I., 474, 488
Ney, P., 596
Nguyen, X.X., 190, 577
Nikodym, O.M., 29, 31, 105, 570, 573
Norberg, T., 325, 583
Novikov, A.A., 333, 364, 584-585
Nualart, D., 580
Øksendal, B., 584
Orey, S., 152, 172, 397, 400, 576,
586-587, 593
Ornstein, D.S., 162, 393, 575, 586
Ornstein, L.S., 254, 262, 414, 580, 587
Ososkov, G.A., 583
Ottaviani, G., 312, 583
Paley, R.E.A.C., 63, 268, 575, 580,
593
Palm, C., 203-210, 576-578, 583
Papangelou, F., 505, 592
Parseval, M.A., 162, 262
Parthasarathy, K.R., 562, 594
Pellaumail, J., 592
Perkins, E., 589
Petrov, V.V., 594
Phillips, H.B., 590
Picard, E., 415
Pitman, J.W., 588
Pitt, H.R., 576
Plancherel, M., 262
Poincare, H., 590
Poisson, S.D., 87-88, 226-231,
234-238, 241-242, 288,
297-298, 301, 318, 368, 436,
504-505, 579
Pollaczek, F., 575
Pollard, D., 583
Pollard, H., 603
Pólya, G., 572, 575
Port, S.C., 591
Pospisil, B., 579
Prohorov, Y.V., 76, 309, 311, 313,
316, 563, 571, 581-583, 593
Protter, P., 593
Pukhalsky, A.A., 546, 594
Radon, J., 29, 31, 36, 105, 569-570,
573
Rao, K.M., 494, 532, 591, 593
Ray, D.B., 428, 440, 586, 588-589
Renyi, A., 234, 579, 583
Revesz, P., 581
Revuz, D., 442-445, 447, 580,
584-585, 587-590
Riemann, G.F.B., 31, 43, 175
Riesz, F., 23, 36, 43, 378, 490, 511,
569-571
Rogers, C.A., 577
Rogers, L.C.G., 561, 575, 584,
588-589, 592
Rootzen, H., 610
Rosen, B., 583
Rosinski, J., 592
Royden, H.L., 570, 594
Rubin, H., 76, 572
Rutherford, E., 579
Ryll-Nardzewski, C., 207, 212,
577-578
Sanov, I.N., 537, 555, 594
Savage, L.J., 45, 53, 161, 570
Schechtman, G., 607
Schilder, M., 537, 543, 554, 557, 594
Schmidt, V., 604
Schmidt, W., 450-451, 589
Schoenberg, I.J., 251, 580
Schrodinger, E., 590
van Schuppen, J.H., 362, 523, 585,
592
Schwartz, J.T., 392, 586
Schwarz, G., 352, 585
Schwarz, H.A., 17
Segal, I.E., 580
Shannon, C.E., 221, 578
Sharpe, M., 575, 585, 591
Shephard, G.C., 577
Shiryaev, A.N., 563, 582-583,
592-593, 595
Shreve, S.E., 580, 584-585, 588-591
Shur, M.G., 591
Sierpiński, W., 2, 200, 569
Skorohod, A.V., 79, 113, 271, 273,
298, 313, 315, 419, 429,
453-454, 563, 572, 581-583,
587-589, 595
Slivnyak, I.M., 210, 577-578
Slutsky, E., 571
Smoluchovsky, M., 142, 574
Snell, J.L., 130, 574, 608
Sova, M., 385, 586
Sparre-Andersen, E., 166, 169, 216,
276, 574, 576, 578
Spitzer, F., 576, 590
Stieltjes, T.J., 31, 255, 329, 340, 519,
570
Stone, C.J., 575, 583, 591
Stone, M.H., 86, 261, 580
Stout, W.F., 572
Strassen, V., 273, 537, 557, 581, 594
Stratonovich, R.L., 342, 426, 584
Stricker, C., 80, 345, 571, 584
Stroock, D.W., 418, 420-421, 472,
587-588, 590, 594
Sucheston, L., 617
Sztencel, R., 593
Takacs, L., 578
Tanaka, H., 428, 431, 439, 447, 454,
459, 465, 588-589
Taylor, B., 90, 92
Teicher, H., 572, 574, 583
Tempel'man, A.A., 577
Thorisson, H., 197-198, 209, 573, 575,
577-578, 587
Tonelli, L., 14, 569
Trotter, H.F., 385, 430, 586, 588
Tusnády, G., 609
Tychonov, A.N., 40
Uhlenbeck, G.E., 254, 262, 414, 580,
587
Ulam, S., 117, 573
Varadarajan, V.S., 195, 309, 577, 583
Varadhan, S.R.S., 418, 420-421, 472,
541, 547, 555, 587-588, 590,
594
Vere-Jones, D., 578-579
Ville, J., 573
Vitali, G., 570
Volkonsky, V.A., 445, 447, 458,
587-589, 591
Voronoi, G., 204
Wald, A., 76, 364, 571, 575, 585
Walsh, J.B., 381, 440, 588
Wang, A.T., 431, 588
Watanabe, H., 406, 464, 586, 589
Watanabe, S., 336, 347, 374, 424,
453, 505, 517, 521, 584-585,
587-589, 591-593
Weierstrass, K., 86, 341
Weil, A., 39, 570
Wentzell, A.D., 537, 553, 594
Weyl, H., 200
Wiener, N., 168, 184, 187, 252-253,
260, 263-266, 268, 358, 570,
575-576, 580, 590, 614
Williams, D., 561, 571, 574-575, 584,
588-589, 592
Williams, R.J., 584
Wintner, A., 275, 581
Wold, H., 87, 250, 572
Wong, E., 362, 523, 585, 592
Woyczynski, W.A., 572, 592
Yamada, T., 424, 453-454, 587, 589
Yan, J.A., 518, 533
Yoeurp, C., 527, 529, 593
Yor, M., 80, 345, 430, 571, 580,
584-585, 588-590
Yosida, K., 182, 367, 372, 375, 386,
576, 585
Yushkevich, A.A., 380, 574, 586
Zähle, M., 210, 578
Zaremba, S., 474
Zeitouni, O., 594
Zessin, H., 190, 577
Zinn, J., 607
Zorn, M., 43, 197-198
Zygmund, A., 63, 73, 178, 186, 268,
571, 577, 593, 614
Subject Index
absolute:
continuity, 13, 29, 35, 261, 360,
432, 523
moment, 49
absorption of:
Markov process, 155, 238, 378, 382,
434
diffusion, 461, 465
supermartingale, 136
accessible:
set, boundary, 160, 462
time, 492, 500-501
jumps, 499, 529-530
action, left, right, 41
adapted, 120, 503
additive functional, 442
a.e., almost everywhere, 12
allocation sequence, 215
almost:
everywhere, 12
invariant, 180
alternating function, 483, 487
analytic function, 342, 353
announcing sequence, 341, 491
aperiodic, 150
approximation of:
covariation, 339
empirical distributions, 278
exchangeable sums, 321
local time, 432, 437
Markov chains, 387
martingales, 280
predictable process, 517
progressive process, 343
random walk, 273, 282, 299, 315
renewal process, 277
arcsine laws, 258, 276, 299
Arzela-Ascoli theorem, 310, 563
a.s., almost surely, 47
asymptotic invariance, 211
atom, atomic, 9, 19
augmented filtration, 124
averaging property, 105
backward equation, 242, 372, 471
balayage, sweeping, 474
ballot theorem, 218, 220
BDG inequalities, 333, 524
Bernoulli sequence, 56, 539
Bessel process, 256, 440
bilinear, 50
binary expansion, 56
binomial process, 226-227, 229, 235
Blumenthal's zero-one law, 381
Borel-Cantelli lemma, 47, 55, 131
Borel:
isomorphism, space, 7, 561
set, σ-field, 2
boundary behavior, 462, 465, 474
bounded optional time, 126
Brownian:
bridge, 253, 278, 319, 356
excursion, 439
motion, 252-260, 271-275, 277-282,
312, 352-360, 364-365,
412-424, 430, 439-440,
443-445, 447, 450-455, 458,
472-486, 507-513, 543, 553,
557
scaling, inversion, 253
CAF, continuous additive functional,
442
Cameron-Martin space, 364, 543,
553, 557
canonical:
decomposition, 337, 518
process, space, filtration, 146, 380,
384
capacity, 481-483, 486-487
Cartesian product, 2
Cauchy:
sequence, 16, 65
problem, 471-472
Cauchy-Buniakovsky inequality, 17,
334, 516, 520
centering, centered, 72, 126, 250
central limit theorem, 90, 275, 312
chain rule for:
conditional independence, 111
conditioning, 105
integration, 12, 338, 517
change of:
measure, 360-365, 422, 523
scale, 451, 456
time, 344, 352, 423, 451-453,
458-461, 505-506
chaos expansion, 266, 360
Chapman-Kolmogorov equation,
142-143, 145, 151, 368
characteristic:
exponent, 291
function, 84-86, 90, 100, 227
measure, 241
operator, 383
characteristics, 290, 413
Chebyshev's inequality, 63
closed, closure:
martingale, 131, 135
operator, 373-374
coding, 113, 145, 204
commuting operators, 186, 370
compactification, 377-378
compactness:
vague, 98, 564
weak, 98, 309
weak L^1, 69
in C and D, 563
comparison of solutions, 454-455
compensator, 493, 498-500, 503-506,
510-511
complete, completion:
filtration, 123
function space, 16, 65
σ-field, 13, 110
completely monotone, alternating,
483, 487
complex-valued process, 260, 341,
351-352
composition, 5
compound:
optional time, 146
Poisson process, 242, 297-298,
300-301
condenser theorem, 483
conditional:
distribution, 107
entropy, information, 220
expectation, 104-105
independence, 109-113, 141, 212,
217, 228, 424
probability, 106
conductor, 480-481, 483
cone condition, 474
conformal mapping, invariance, 342,
353
conservative semigroup, 369, 377
continuity:
set, 75, 545
theorem, 86, 100
for a time-change, 344
continuous:
additive functional, 442-447,
451-452, 458, 510-513
in probability, 216, 286, 319
mapping, 64, 76, 549
martingale component, 527-529
contraction:
operator, 105, 109, 368, 391-393,
415
principle, 549
convergence in/of:
distribution, 65-66, 71-72, 75-79,
86-88, 90-93, 96, 99-100,
275-276, 308-326, 385-387
probability, 63-66, 80
exchangeable processes, 322
infinitely divisible laws, 295-296
Levy processes, 298
L^p, 16, 68
Markov processes, 385
point processes, 317, 326
random measures, 316
random sets, 325
convex, concave:
functions, 49, 126, 431, 459,
538-539
sets, 187-190, 196, 533
convolution, 15, 52
core of generator, 373-374, 385, 387
countably additive, subadditive, 8
counting measure, 8
coupling, 152, 172
independent, 152, 466
shift, 197-198, 209
Skorohod, 79, 113, 298-299
covariance, 49-50, 250
covariation, 332, 334-336, 339-342,
516-517, 519-521, 526, 529
Cox process, 226-228, 230-231,
318-319
Cramer-Wold theorem, 87
cumulant-generating function, 539,
554
cycle stationarity, 206
cylinder set, 2, 115
(D), submartingale class, 493
Daniell-Kolmogorov theorem,
114-115
debut, 123
decomposition of:
finite-variation function, 33-34
increasing process, 499
martingale, 518, 527, 529
measure, 29
optional time, 493
signed measure, 28
submartingale, 126, 493
degenerate:
measure, 9, 19
random element, 51
delay, 170-172
density, 12-13, 29, 31, 133
differentiation theorem, 31
diffuse, nonatomic, 9-10, 19, 230, 233,
299
diffusion, 384, 413, 455-467, 471
equation, 413, 421, 423, 450-455
Dirac measure, 8
Dirichlet problem, 474
discrete time, 143
disintegration, 108
dissipative, 377
distribution, 47
function, 48, 59
Doleans exponential, 522
domain, 473
of attraction, 96
of generator, 370, 372-375, 377
dominated:
convergence, 11-12, 337, 518, 526
ergodic theorem, 184
Donsker's theorem, 275, 312
Doob decomposition, 126
Doob-Meyer decomposition, 493
dual predictable projection, 498
duality, 167
Dynkin's formula, 382
effective dimension, 160
Egorov's theorem, 18
elementary:
function, 263
additive functional, 442
stochastic integral, 128, 335, 343,
517
elliptic operator, 384, 418, 472
embedded:
Markov chain, 239
martingale, 279
random variable, walk, 271-273,
464
empirical distribution, 75, 195, 278,
554-555
entrance boundary, 461-462
entropy, 220-221, 554-555
equicontinuous, 86, 311, 313-314, 563
equilibrium measure, 481-485
ergodicity, 181, 195-196, 397, 399,
465
ergodic decomposition, 196
ergodic theorems:
Markovian, 152-154, 244-245, 397,
399, 408-409, 465
multivariate, Palm, 186-187, 190,
209-210
ratio, 393, 396, 464
stationarity, contractions, 181-183,
392
subadditive, matrices, 192-193
evaluation, projection, 47, 225, 562
event, 46
excessive function, 379, 445, 507-511
exchangeable:
sequence, 212-215, 320-321
process, 216-218, 235, 319-322
excursion, 150, 433-440
existence of:
Brownian motion, 252
Cox process, randomization, 231
Markov process, 143, 378
random sequence, process, 55,
114-117
solution to SDE, 415, 419, 422-423,
451
exit boundary, 461
expectation, expected value, 48-49,
52
explosion, 240, 417, 462
exponential:
distribution, 237-240, 434
equivalence, 552
inequalities, 530
martingale, process, 351, 363, 522,
530
rate, 541
tightness, 546-549, 556
extended real line, 5
extension of:
filtration, 124, 352
measure, 26, 114-115, 362
probability space, 111-112
extreme:
element, 196
value, 257, 303
factorial measure, 213
fast reflection, 460
Fatou's lemma, 11, 67
Fell topology, 324, 565-566
Feller process, semigroup, 369-387,
399-409, 421, 442, 445-446,
462, 501
Fenchel-Legendre transform, 539, 554
Feynman-Kac formula, 471
filling operator, functional, 394-395
filtration, 120
de Finetti's theorem, 212
finite-dimensional distributions, 48,
142
finite-variation:
function, 33-35
process, 330, 337, 497, 518
first:
entry, 124
maximum, 166, 216, 258, 276, 299
passage, 166-170, 292
Fisk-Stratonovich integral, 342
fixed jump, 286
flow, 183, 415
fluctuations, 167
forward equation, 372
Fubini theorem, 14, 52, 108, 358
functional:
CLT, LIL, 275, 312, 557
LDP, 547
representation, 80, 346
solution, 423
fundamental:
identity, 480
theorem, 31-32
Gaussian:
convergence, 90-92, 96
measure, process, 90, 250-252, 254,
260-266, 539
generated:
σ-field, 2, 5
filtration, 120
generating function, 84
generator, 368, 370-377, 383-387
geometric distribution, 149, 434
Girsanov theorem, 362, 365, 523
Glivenko-Cantelli theorem, 75
goodness of rate function, 546
graph:
of operator, 373
of optional time, 492
Green function, potential, 458, 477,
513
Haar measure, 39
Hahn decomposition, 28
harmonic:
function, 353, 396, 473
measure, 474
minorant, 511
Harris recurrent, 400, 405-406
heat equation, 472
Helly's selection theorem, 98
Hermite polynomials, 265
Hewitt-Savage zero-one law, 53
Hille-Yosida theorem, 375
hitting:
function, 325, 487
kernel, 473, 480, 485
time, 123, 456, 473
Holder:
continuous, 57, 252, 313
inequality, 15, 109
holding time, 238, 434
homogeneous:
chaos, 266
kernel, 144, 242
hyper-contraction, 321
hyperplane, 539
i.i.d. sequence, 53-54, 56, 73, 89-90,
95-96, 271-276, 294, 297, 312,
538-541, 555
inaccessible boundary, 460
increasing process, 493
increment of function, measure, 33,
58-59, 226, 234
independent, 50-55
independent increments:
processes, 144, 242, 252, 286-287
random measures, 226, 234-235
indicator function, 5, 46
indistinguishable, 57
induced:
σ-field, 2-3, 5
filtration, 120, 344
infinitely divisible, 293-297, 302
information, 220-221
initial distribution, 141
inner:
content, 37
product, 17
radius, 187
instantaneous state, 434
integrable:
function, 11
increasing process, 496
random vector, process, 47
integral representation:
invariant distribution, 196
martingale, 357-360
integration by parts, 339, 519, 523
intensity, 189, 203, 225
invariance principle, 277
invariant:
distribution, 148-149, 151-152,
243-244, 408-409, 467
function, 180, 392, 396-399
measure, 15, 27, 39-41, 391, 396,
404-407
set, a-field, 180, 183, 186, 189,
398-399
subspace, 188, 374
inverse:
contraction principle, 549
function, 3, 562
local time, 438
maximum process, 292
inversion formulas, 204-205
i.o., infinitely often, 46, 54-55, 131
irreducible, 151, 244
isometry, 260, 263, 351
isonormal, 251, 260, 263-266
isotropic, 352
Itô:
correction term, 340
formula, 340-342, 431, 521
integral, 336-337, 343-344
J_1-topology, 313, 563
Jensen's inequality, 49, 109
joint stationarity, 203
Jordan decomposition, 33
jump transition kernel, 238
jump-type process, 237
kernel, 20-21, 56, 106, 145, 225, 404,
420
density, 133
hitting, quitting, sweeping, 473,
480-481, 485
transition, rate, 141-145, 238-242
killing, 471, 475
Kolmogorov:
extension theorem, 115
maximum inequality, 69
zero-one law, 53, 132-133
Kolmogorov-Chentsov criterion, 57,
313
ladder time, height, 166-167, 169-170
λ-system, 2
Langevin equation, 414
Laplace:
operator, equation, 375, 472-473
transform, functional, 84-86, 88,
100, 227, 370
large deviation principle, LDP,
541-555
last:
return, zero, 165, 258, 276
exit, 481
law of:
large numbers, 73, 95
the iterated logarithm, 259, 275,
277-278, 557
lcscH space, 225
LDP, 541-555
Lebesgue:
decomposition, 29
differentiation theorem, 31
measure, 24, 27
unit interval, 55
Lebesgue-Stieltjes measure, integral,
31, 518
Legendre-Fenchel transform, 539, 554
level set, 254, 543
Levy:
characterization of Brownian
motion, 352
measure, 290
process, 290-294, 298-299, 315,
374, 518
Levy-Khinchin formula, 290-291
Lindeberg condition, 92
linear:
SDE, 414, 522
functional, 36, 263
Lipschitz condition, 415, 453-455
L log L-condition, 186
local:
characteristics, 413
condition, property, 57, 105
conditioning, hitting, 207
operator, 383-384
martingale, submartingale, 330,
493, 518
measurability, 287
substitution rule, 341
time, 428-432, 436-438, 440-441,
446-447, 452, 454, 458,
512-513
localization, 330
locally:
compact, 225, 312, 316, 324, 369,
563
finite, 9, 19, 30, 33-35, 225, 283,
564
L^p-
bounded, 67, 130, 132
contraction, 109, 391-393
convergence, 16, 68, 132, 181-183,
186-187, 190
Lusin's theorem, 19
marked point process, 234, 504-505
Markov:
chain, 151-154, 243-245, 387
inequality, 63
process, 141-148, 254, 367-387,
391, 396-409, 421, 455-467
martingale, 125-136, 352-358,
360-364, 382
closure, 131, 135
convergence, 130-132, 135
decomposition, 518, 527, 529
embedding, 279-281
problem, 418-421
transform, 127
maximum, maximal:
ergodic lemma, 181
inequality, 69, 128-129, 184, 188,
221, 312, 333, 392, 524, 530
measure, 29
operator, principle, 377, 383
process, 256, 292, 430
mean, 48
continuity, 28
ergodic theorem, 190
recurrence time, 154, 245
mean-value property, 473
measurable:
group, 15
function, 4-7
set, space, 2, 23
measure, 8-9
determining, 9, 195
preserving, 179, 235, 356
space, 8
valued function, process, 214, 324,
565
median, 71-72
Minkowski's inequality, 15-16, 109
mixed:
binomial, Poisson, 226-227, 229,
235
i.i.d., Levy, 212, 217
mixing, 397-399
modulus of continuity, 57, 274,
310-311, 453, 562-563
moment, 49
moment inequalities, 129, 184, 333,
502, 524
monotone:
class theorem, 2
convergence, 11, 104
ergodic theorem, 187, 190
moving average, 261
multiple stochastic integral, 263-266,
358-360
multiplicative functional, 471
multivariate ergodic theorem,
186-190, 209-210
natural:
absorption, 461
increasing process, 494, 496-497
scale, 456
nonarithmetic, 172
nonnegative definite, 50, 261
normal, Gaussian, 90, 250
norm inequalities, 15-16, 109, 129,
184, 333, 502, 524
nowhere dense, 433-434
null:
array, 88, 91, 93, 300-303, 317-318
recurrent, 152, 245, 408-409, 465
set, 12
occupation:
density, 431, 477
times, measure, 149, 160, 171-173,
432
ONB, orthonormal basis, 251, 262
one-dimensional criteria, 233-234,
317-318, 324-326
operator ergodic theorem, 392-393
optional:
projection, 381
sampling, 127, 135
skipping, 215
stopping, 128, 338
time, 120-124, 146, 491-493, 498
Ornstein-Uhlenbeck process, 254,
262, 414
orthogonal:
functions, spaces, 17, 265-266
martingales, processes, 288, 355
measures, 13, 28-29
projection, 17
outer measure, 23-25, 37-38
Palm distribution, 203-210
parabolic equation, 372, 471-472
parallelogram identity, 17
parameter dependence, 80, 345
partition of unity, 36
path, 47, 486
pathwise uniqueness, 414-415,
423-424, 453
perfect, 433
period, 150-151
permutation, 53, 212
perturbed dynamical system, 553
π-system, 2
Picard iteration, 415
point process, 171-172, 203-207,
226-236, 317-319, 326
Poisson:
compound, 242, 297-298, 300-301
convergence, 88, 318
distribution, 88
integrals, 236, 287
mixed, 226, 229, 235
process, 226-227, 234-236, 238,
288, 318, 436, 486, 504-505
pseudo-, 241, 368
polar set, 354, 480
polarization, 516, 519
Polish space, 7, 561
polynomial chaos, 266
Portmanteau theorem, 75
positive:
density, 361, 441
functional, operator, 36, 105, 368
maximum principle, 375, 377, 383
operator, 368
random variables, 70, 91, 300
recurrent, 152, 245, 408-409, 465,
467
variation, 33
potential:
of additive functional, 442-445, 511
Green, 477-481, 512-513
operator, 370, 379, 402-403
term, 471
predictable:
quadratic variation, covariation,
280, 516
process, 491-492, 496-499,
502-504, 506, 517-518, 523,
526
random measure, 503
sampling, 215
sequence, 126
step process, 128, 331, 516, 533
time, 214, 341, 491-493, 498-501,
504, 529
prediction sequence, 214
preseparating class, 317, 326, 567
preservation of:
semimartingales, 340, 431, 521, 524
stochastic integrals, 362
probability, 46
generating function, 84
measure, space, 46
product:
σ-field, 2, 115
measure, 14-15, 52, 117
progressive, 122, 345, 356, 413
Prohorov's theorem, 309
projection, 17, 562
projective limit, 114-117, 551, 568
proper, 41
pseudo-Poisson, 241, 368
pull-out property, 105
purely:
atomic, 10
discontinuous, 499, 527, 529
quadratic variation, 255, 280,
332-334, 337, 519-520
quasi-left continuous, 499, 501, 504,
529
quasi-martingale, 532
quitting time, kernel, 481, 485
Radon measure, 36
Radon-Nikodym theorem, 29, 105
random:
element, variable, process, 47
matrix, 193
measure, 106, 203-204, 209-210,
212, 218, 225-235, 316, 503
sequence, 64, 78, 551
series, 69-73, 319-320
set, 325-326, 486-487
time, 120
walk, 54, 160-172, 271, 273,
275-276, 282, 299, 315
randomization, 113, 145, 272, 321
of point process, 226-228
variable, 112, 352
rate:
function, kernel, 238, 545-555
process, 352
ratio ergodic theorem, 393, 464
raw rate function, 545, 549
Ray-Knight theorem, 440
rcll, 134
recurrence, 149, 151-152, 160-164,
244, 400, 405-406
time, 154, 245, 434
reflecting boundary, 460
reflection principle, 165, 257
regenerative set, process, 432
regular:
boundary, domain, set, point, 446,
473-474
conditional distribution, 106-107
diffusion, 455-459
measure, outer measure, 18, 37-38
regularization of:
local time, 430
Markov process, 379
rate function, 545
stochastic flow, 415
submartingale, 130, 134
relative:
compactness, 69, 98-99, 309,
563-565
entropy, 554
renewal:
measure, process, 170, 238, 277-278
theorem, 172
equation, 175
resolvent, 370, 379
equation, 370, 402
restriction of:
measure, 9
optional time, 492-493, 498
Revuz measure, 442-445, 447
Riemann integrable, 175
Riesz:
decomposition, 511-513
representation, 36, 371, 378
right-continuous:
filtration, 121, 124
function, 34-35
process, 134, 379
right-invariant, 39
sample, 211
intensity, 190
process, 226
sampling:
sequence, 211
without replacement, 213, 319
scale function, 456
Schwarz's inequality, 17
SDE, stochastic differential equation,
346, 412-424, 450-455, 522,
553
sections, 14, 562
selection, 98, 562
self-adjoint, 105
self-similar, 291
semicontinuous, 507, 545
semigroup, 145, 183, 186-187,
368-378, 397-399
semimartingale, 337-345, 518-524,
527-529, 532-533
semiring, 26
separating class, 317, 486-487, 567
series of measures, 8
shift:
coupling, 197-198, 209
operator, 146, 179, 380
σ-field, 1
σ-finite, 9, 225
signed measure, 28, 34
simple:
function, measure, 6, 10
point process, 203-207, 226, 230,
233-235, 238, 317, 326
random walk, 165
singular(ity), 13, 28-29, 35, 218, 452
skew-product, 355
Skorohod:
coupling, 79, 113, 298-299
embedding, 271
slow:
reflection, 460
variation, 96
sojourn, 216, 258, 276
space filling, 187
space-homogeneous, 144, 147
space-time invariant, 397
special semimartingale, 518, 532
spectral measure, representation, 261
speed measure, 458, 465, 467
spreadable, 212, 214, 216-217
stable, 291-292, 506
standard extension, 352, 358, 419, 424
stationary:
process, 148, 179, 183, 211, 220-221
random measure, 170, 189-190,
203-210, 218
stochastic:
differential equation, 346, 412-424,
450-455, 522, 553
flow, 415
integral, 236, 260-266, 336-346,
517-518, 526
process, 47
Stone-Weierstrass theorem, 86, 341,
521
stopping (optional) time, 120
Stratonovich integral, 342
strict past, 491-492
strong:
continuity, 369, 372
ergodicity, 153, 244, 397, 400, 408,
466
existence, 414-415, 423-424
homogeneity, 155
law of large numbers, 73
Markov property, 147, 155, 237,
256, 380, 421
orthogonality, 355
solution, 413, 424
stationarity, 214
subadditive:
ergodic theorem, 192
sequence, 191, 538
set function, 8, 37
submartingale, 125-126, 128, 130,
134-135, 493
subordinator, 290-293, 438
subsequence criterion, 63
subspace, 4, 47, 76, 309
substitution rule, 12, 340-342, 431,
521
superharmonic, 507
supermartingale, 125, 136, 379, 510
superposition, 318, 486
support of:
additive functional, 446, 513
local time, 429, 441
measure, 9, 326, 429, 567
supporting measure, 399
sweeping, 474, 480
symmetry, symmetric:
difference, 1
point process, 235
random variable, 54, 70, 91, 163,
300
set, 53
spherical, 251
symmetrization, 71-72, 163, 263
tail:
probabilities, 49, 63, 85
σ-field, 53, 133, 197, 397
Tanaka's formula, 428
Taylor expansion, 90, 92, 340
terminal time, 240, 341, 380, 473
thinning, 226-227, 231, 318-319
three-series criterion, 71
tightness, 66, 86, 99, 309-311,
313-314, 316, 321, 324, 533,
546, 556
time:
change, 124, 344, 352, 355, 458,
505-506, 563
homogeneous, 144-145, 147, 237
reversal, 484, 509
topological group, 38
total variation, 33, 152, 255
totally inaccessible, 492, 495, 500
transfer, 58, 112, 424
transient, 149, 160, 164, 244, 354,
405, 408-409
transition:
density, 476
function, matrix, 151, 243
kernel, 141, 144-145, 368
operator, semigroup, 241, 368
transitive, 41
translation, 15, 27
trivial σ-field, 51, 53, 181, 381,
397-399
two-sided extension, 180
ultimately, 46
uncorrelated, 50, 250
uniform:
distribution, 55
excessivity, 445
integrability, 67-69, 109, 131, 134,
173, 334, 477, 493
laws, 259, 218-220
transience, 405
uniqueness (of):
additive functional, 442, 445
distribution, 48, 86-87, 141, 204,
371
pathwise, 414-415, 423-424, 453
rate function, 545
in law, 414, 421-424, 451, 472
universal completion, 423, 562
upcrossings, 129-130
urn sequence, 213
vague topology, 98-99, 172, 316, 564
variance, 50, 52
variation, 33
version of process, 57
Voronoi cell, 204
Wald's identity, 364
weak:
compactness, 99, 309
convergence, 65, 86-96, 99-100,
275-276, 308-326, 385-387
ergodicity, 399
existence, 414, 419, 422-423, 451
L^1 compactness, 69
law of large numbers, 95
LDP, 546, 549
mixing, 398-399
optionality, 121
solution, 413, 418, 424
weight function, 190
well posed, 418
Wiener:
integral, 260-263
process, Brownian motion, 253
Wiener-Hopf factorization, 168
Yosida approximation, 372, 386
zero-one law, 53, 381
zero-infinity law, 203