Probability and its Applications
A Series of the Applied Probability Trust
Editors: J. Gani, C.C. Heyde, T.G. Kurtz
Springer
New York
Berlin
Heidelberg
Barcelona
Hong Kong
London
Milan
Paris
Singapore
Tokyo
Probability and its Applications
Anderson: Continuous-Time Markov Chains.
Azencott/Dacunha-Castelle: Series of Irregular Observations.
Bass: Diffusions and Elliptic Operators.
Bass: Probabilistic Techniques in Analysis.
Choi: ARMA Model Identification.
de la Pena/Gine: Decoupling: From Dependence to Independence.
Galambos/Simonelli: Bonferroni-type Inequalities with Applications.
Gani (Editor): The Craft of Probabilistic Modelling.
Grandell: Aspects of Risk Theory.
Gut: Stopped Random Walks.
Guyon: Random Fields on a Network.
Kallenberg: Foundations of Modern Probability.
Last/Brandt: Marked Point Processes on the Real Line.
Leadbetter/Lindgren/Rootzen: Extremes and Related Properties of Random Sequences
and Processes.
Nualart: The Malliavin Calculus and Related Topics.
Rachev/Ruschendorf: Mass Transportation Problems. Volume I: Theory.
Rachev/Ruschendorf: Mass Transportation Problems. Volume II: Applications.
Resnick: Extreme Values, Regular Variation and Point Processes.
Shedler: Regeneration and Networks of Queues.
Thorisson: Coupling, Stationarity, and Regeneration.
Todorovic: An Introduction to Stochastic Processes and Their Applications.
Hermann Thorisson
Coupling, Stationarity,
and Regeneration
With 27 Illustrations
Springer
Hermann Thorisson
Science Institute
University of Iceland
Dunhaga 3
107 Reykjavik
Iceland
E-mail: hermann@hi.is
Homepage: www.hi.is/~hermann
Series Editors
J. Gani
Stochastic Analysis
Group, CMA
Australian National
University
Canberra ACT 0200
Australia
C.C. Heyde
Stochastic Analysis
Group, CMA
Australian National
University
Canberra ACT 0200
Australia
T.G. Kurtz
Department of
Mathematics
University of Wisconsin
480 Lincoln Drive
Madison, WI 53706
USA
Mathematics Subject Classification (1991): 60Gxx, 60Jxx, 60Kxx, 60D05, 60B15, 60A10, 60F99
Library of Congress Cataloging-in-Publication Data
Thorisson, Hermann.
Coupling, stationarity, and regeneration / Hermann Thorisson.
p. cm. — (Probability and its applications)
Includes bibliographical references and index.
ISBN 0-387-98779-7 (hardcover : alk. paper)
1. Random variables. 2. Stochastic processes.
I. Title. II. Series.
QA273.T4395 2000
519.2—dc21 99-40961
Printed on acid-free paper.
© 2000 Springer-Verlag New York, Inc.
All rights reserved. This work may not be translated or copied in whole or in part without the
written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York,
NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use
in connection with any form of information storage and retrieval, electronic adaptation, computer
software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.
The use of general descriptive names, trade names, trademarks, etc., in this publication, even if the
former are not especially identified, is not to be taken as a sign that such names, as understood by
the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.
Production managed by Jenny Wolkowicki; manufacturing supervised by Jeffrey Taub.
Typeset by Thorir Magnusson using the author's TeX files.
Printed and bound by Maple-Vail Book Manufacturing Group, York, PA.
Printed in the United States of America.
987654321
ISBN 0-387-98779-7 Springer-Verlag New York Berlin Heidelberg SPIN 10660218
Tileinkað þér Rannveig
Sólarhjört
leit ek sunnan fara,
hann teymdu tveir saman;
fætr hans
stóðu foldu á,
en tóku horn til himins.
Úr Sólarljóðum
Preface
This is a book on coupling, including self-contained treatments of stationarity
and regeneration. Coupling is the central topic in the first half of the
book, and then enters as a tool in the latter half. The ten chapters are
grouped into four parts as follows:
Chapters 1-2 form an introductory part presenting basic
elementary couplings (Chapter 1 on random variables) and the classical
triumphs of the coupling method (Chapter 2 on Markov chains, random
walks, and renewal theory).
Chapters 3-7 present a general coupling theory highlighting
maximal couplings and convergence characterizations for random
elements, stochastic processes, random fields, and random elements
under the action of a transformation semigroup.
Chapters 8-9 present Palm theory of stationary stochastic processes
associated with a simple point process. Chapter 8 treats the one-
dimensional case and Chapter 9 the higher-dimensional case.
Chapter 10 deals with regeneration, both classical regenerative
processes and three generalizations: wide-sense regeneration (as in Harris
chains); time-inhomogeneous regeneration (as in time-inhomogeneous
recurrent Markov chains); and taboo regeneration (as in transient
Markov chains). It ends with a section on perfect simulation
(coupling from-the-past). This enormous chapter is thrice the size of a
normal chapter, and is really a book within the book.
For more information on the content of the book, see the introductions to
the chapters. Also, the table of contents provides a structural review.
The book should be of interest to students and researchers in probability,
stochastic modelling, and mathematical statistics. It is written with a Ph.D.
student in mind, and the first two chapters can be read at the master's level
and even at an advanced undergraduate level.
The book is mathematically self-contained, relying only on basic measure-
theoretic probability. Measure-theoretic language is suppressed in the first
two chapters, and then enters heavily in Chapter 3 to be used explicitly
for the rest of the book; Ash (1972) is used as reference, but Billingsley
(1986) is also fine. Some prior knowledge of elementary Markov chain
theory would be useful, at least in Chapter 2; Karlin and Taylor (1975) and
Çinlar (1975) are excellent, and so are the compact first two sections of the
first two chapters in Asmussen (1987).
Some Conventions
In order to make clear what results belong to the measure-theoretic
background, the term 'Fact' is used for results stated without proof, while the
terms 'Theorem' and 'Lemma' are reserved for results that are proved here.
Facts of basic importance throughout the book are restricted to Chapter 3
(Sections 3 and 4).
Sections are enumerated within chapters. For instance, the 4th section
in the 3rd chapter is referred to within the chapter only as 'Section 4'; but
in the other chapters as 'Section 4 in Chapter 3' or 'Chapter 3 (Section 4)'.
Subsections are enumerated within chapters and sections: the 5th
subsection of the 4th section in the 3rd chapter is referred to within the chapter
as 'Section 4.5'; but in the other chapters as 'Section 4.5 in Chapter 3'.
The same goes for Theorems, Lemmas, Facts, Remarks, and Figures.
Definitions are stated in the text, and only indicated by writing the
concept being defined in italics (we also use italics for emphasis). Figures
are placed in the text precisely where they should be consulted (mostly),
but the text does not rely on them. We use both parentheses () and brackets
[] for comments that can be skipped. Historical and bibliographic notes are
deferred to a separate section at the end of the book.
The symbol $X$ (and $X'$, $\hat X$, $X_1$, ...) is reserved for real-valued random
variables. The symbol $S_k$ always denotes a sum, $S_k = S_0 + X_1 + \cdots + X_k$.
On the other hand, $S$ is either a sequence of $S_k$ (one-sided sequence in
Chapters 2, 3, and 10; two-sided in Chapter 8), or a d-dimensional random
vector (Chapters 7 and 9). The symbol $U$ is reserved for a variable uniform
on [0,1]. The symbol $Z$ is reserved for processes. The symbol $Y$ often denotes
a random element in a general space; and $P(Y \in \cdot)$ is the distribution of $Y$.
Errors are bound to abound in the book, in spite of all the thinning
attempts. For errata, and even some notes and references, see my homepage
(www.hi.is/~hermann) or Springer's (www.springer-ny.com). If you find
an error or have a comment, please send me an informal note by e-mail
(To: hermann@hi.is; Subject: book).
Acknowledgements
It took four long years to write this book, word for word, from the first
word on the first page to the last word on page four-hundred-seventy-eight.
Previously, the book had been in preparation for five years; and before that,
subconsciously for years.
I would like to thank Torgny Lindvall for introducing me to this field
of study and for grooming me for this task, and Peter Jagers for his
influence and for the focused comments on the book. Also thanks to Søren
Asmussen, Karl Sigman, Peter Glynn, Serguei Foss, and Richard Gill, who
have influenced this work in various ways throughout the years.
Special thanks to Olle Nerman, Olav Kallenberg, David Blackwell, Henry
Berbee, Jakob Yngvason, Peter Donnelly, and Olle Häggström for
illuminating observations; to Rolando Rebolledo, who got me started on this
project by inviting me to Chile to lecture in the fall of 1991; to Vladimir
Kalashnikov for the collaboration on the Petrozavodsk proceedings, which
helped get things into perspective; to Christian Meise for reading and
rereading the book, and for the thicket of detailed comments; to Diemer
Salome for comments on the first four chapters; to Damien White, Remco
van der Hofstad, Erik van Zwet, Karma Dajani, Ronald Meester, and Adam
Shwartz for comments on the first two; to Vladimir Bogachev and Andrew
Nobel for comments on the third; to David Svensson for comments on
the eighth; to my next-door colleague Magnús Halldórsson for reading and
commenting on parts of the book, and for the many discussions; and to my
fellow probabilist Ottó Björnsson for his long-lasting interest and support.
Also thanks to the copyeditor David Kramer for excellent suggestions while
basically accepting my Icelandic English, and for all the commas; to my
colleague Robert Magnus for straightening out many unclear sentences, and
for some British moderation; and to the production editor Jenny Wolkowicki
for the insightful finishing touch. After all this feedback the book should
be in pretty good shape; but whatever mistakes there are, they are all mine.
I would like to thank John Kimmel, Springer's statistics editor, for his
deep understanding of what a work like this is about, and for never rushing
me in spite of the writing going almost two years beyond deadline and the
book becoming more than twice the planned size.
Many thanks to Þórir Magnússon for his expert LaTeX-ing, and for
his support and the patience it must have taken to work along with me
these four years. Also thanks to my son Freyr (Hermannsson) for calmly
FreeHand-ing the figures during these hectic last weeks.
I am grateful to the University of Iceland and its Science Institute for
funding this project, and for providing the freedom necessary for this task.
I am also grateful to the Icelandic Science Foundation for travel grants.
And now, as this book goes to print, I am informed that I have been
awarded the Olafur Danielsson Prize in Mathematics for my research, most
of which can be found in some form in this book. I am deeply moved and will
use the generous sum to recover from this work, and prepare for the next.
Music, from Johann Sebastian Bach to Pär Lindh Project (PLP), played
an important role in getting me through the composition of this book. So,
on another note, thank you Keith Emerson for the High Level Fugue and
the Endless Enigma; evermoving without ever moving. Also thanks to Ian
(Thick as a Brick) Anderson for Twelve Dances with God, and to King
Crimson for both Discipline and Indiscipline, and for completing this trinity
of sons IN THE COURT OF THE CRIMSON KING.
Finally, Rannveig, Freyr, and Nanna, thank you for so lovingly supporting
me through these Ten Dances with Chance.
Reykjavik
October 1999
Hermann Þórisson
Contents
Preface
1 RANDOM VARIABLES
1 Introduction
2 The i.i.d. Coupling - Positive Correlation
3 Quantile Coupling - Stochastic Domination
4 Coupling Event - Maximal Coupling
5 Poisson Approximation - Total Variation
6 Convergence of Discrete Random Variables
7 Continuous Variables - Hitting the Limit
8 Convergence in Distribution and Pointwise
9 Quantile Coupling - Dominated Convergence
10 Impossible Coupling - Quantum Physics
2 MARKOV CHAINS AND RANDOM WALKS
1 Introduction
2 Classical Coupling - Birth and Death Processes
3 Classical Coupling - Recurrent Markov Chains
4 Classical Coupling - Rates and Uniformity
5 Ornstein Coupling - Random Walk on the Integers
6 Ornstein Coupling - Recurrent Markov Chains
7 Epsilon-Coupling - Nonlattice Random Walk
8 Epsilon-Coupling - Blackwell's Renewal Theorem
9 Renewal Processes - Stationarity
10 Renewal Processes - Asymptotic Stationarity
3 RANDOM ELEMENTS
1 Introduction
2 Back to Basics - Definition of Coupling
3 Extension Techniques
4 Conditioning - Transfer
5 Splitting
6 Random Walk with Spread-Out Step-Lengths
7 Coupling Event - Maximal Coupling
8 Maximal Coupling Two Elements - Total Variation
9 Hitting the Limit
10 Convergence in Distribution and Pointwise
4 STOCHASTIC PROCESSES
1 Introduction
2 Preliminaries - What Is a Stochastic Process?
3 Exact Coupling - Distributional Exact Coupling
4 Distributional Coupling
5 Exact Coupling - Inequality and Asymptotics
6 Exact Coupling - Maximality
7 Coupling with Respect to a Sub-σ-Algebra
8 Exact Coupling - Another Proof of Theorem 6.1
9 Exact Coupling - Tail σ-Algebra - Equivalences
5 SHIFT-COUPLING
1 Introduction
2 Shift-Coupling - Distributional Shift-Coupling
3 Shift-Coupling - Inequality and Asymptotics
4 Shift-Coupling - Maximality
5 Shift-Coupling - Invariant σ-Algebra - Equivalences
6 ε-Coupling - Distributional ε-Coupling
7 ε-Coupling - Inequality and Asymptotics
8 ε-Coupling - Maximality
9 ε-Coupling - Smooth Tail σ-Algebra - Equivalences
6 MARKOV PROCESSES
1 Introduction
2 Mixing and Triviality of a Stochastic Process
3 Markov Processes - Preliminaries
4 Exact Coupling
5 Shift-Coupling
6 Epsilon-Coupling
7 Stationary Measure
7 TRANSFORMATION COUPLING
1 Introduction
2 Shift-Coupling Random Fields
3 Transformation Coupling
4 Inequality and Asymptotics
5 Maximality
6 Invariant σ-Algebra and Equivalences
7 Topological Transformation Groups
8 Self-Similarity - Exchangeability - Rotation
9 Exact Transformation Coupling
8 STATIONARITY, THE PALM DUALITIES
1 Introduction
2 Preliminaries - Measure-Free Part of the Dualities
3 Key Stationarity Theorem
4 The Point-at-Zero Duality
5 Interpretation - Point-Conditioning
6 Application - Perfect Simulation
7 The Invariant σ-Algebras I and J
8 The Randomized-Origin Duality
9 Interpretation - Cesaro Limits and Shift-Coupling
10 Comments on the Two Palm Dualities
9 THE PALM DUALITIES IN HIGHER DIMENSIONS
1 Introduction
2 The Point-Stationarity Problem
3 Definition of Point-Stationarity
4 Palm Characterization of Point-Stationarity
5 Point-Stationarity Characterized by Randomization
6 Point-Stationarity and the Invariant σ-Algebras
7 The Point-at-Zero Duality
8 The Randomized-Origin Duality
9 Comments
10 REGENERATION
1 Introduction
2 Preliminaries - Stationarity
3 Classical Regeneration
4 Wide-Sense Regeneration - Harris Chains - GI/GI/k
5 Time-Inhomogeneous Regeneration
6 Classical Coupling
7 The Coupling Time - Rates and Uniformity
8 Asymptotics From-the-Past
9 Taboo Regeneration
10 Taboo Stationarity
11 Perfect Simulation - Coupling From-the-Past
Notes
References
Index
Notation
Chapter 1
RANDOM VARIABLES
1 Introduction
Coupling means the joint construction of two or more random variables (or
processes), usually in order to deduce properties of the individual variables
or gain insight into distributional similarities or relations between them.
In this chapter and the next the method is introduced through a series of
basic elementary examples. The arguments are carried out in full detail at
an undergraduate level, suppressing measure-theoretic language. Advanced
readers should be able to fill in any missing measure-theoretic notation or
find it at the beginning of Chapter 3, where we return to the definition of
coupling.
Let us spend a few lines on terminology before turning to the examples.
A copy or a representation of a random variable $X$ is a random variable $\hat X$
with the same distribution as $X$. Denote this by
$$\hat X \overset{D}{=} X.$$
A coupling of a collection of random variables $X_i$, $i \in \mathbb{I}$, [where $\mathbb{I}$ is some
index set] is a family of random variables $(\hat X_i : i \in \mathbb{I})$ such that
$$\hat X_i \overset{D}{=} X_i, \quad i \in \mathbb{I}.$$
Note that only the individual $\hat X_i$ are copies of the individual $X_i$, while the
whole family $(\hat X_i : i \in \mathbb{I})$ is typically not a copy of the family $(X_i : i \in \mathbb{I})$. In
other words, the joint distribution of the $\hat X_i$ need not be the same as that of
the $X_i$. In fact, the $X_i$ need not even have a (specified) joint distribution.
On the other hand, we write $(\hat X_i : i \in \mathbb{I})$ in parentheses to stress that
the $\hat X_i$ have a joint distribution. A trivial but often useful coupling is the
independence coupling consisting of independent copies of the $X_i$.
Thus a coupling has fixed marginal distributions (the distributions of
the individual Xi), and the trick is to find a dependence structure (joint
distribution) that fits one's purposes.
2 The i.i.d. Coupling - Positive Correlation
A self-coupling of a random variable $X$ is a family $(\hat X_i : i \in \mathbb{I})$ where each
$\hat X_i$ is a copy of $X$. A trivial (and not so useful) self-coupling is the one
with all the $\hat X_i$ identical. Another trivial self-coupling is the i.i.d. coupling
consisting of independent copies of $X$. As an example of an efficient use of
the i.i.d. coupling we shall prove the following result.
For every random variable $X$ and nondecreasing bounded functions $f$
and $g$, the random variables $f(X)$ and $g(X)$ are positively correlated,
that is,
$$\mathrm{Cov}[f(X), g(X)] \ge 0. \tag{2.1}$$
In order to prove this claim let $X'$ be an independent copy of $X$ [thus
$(X, X')$ is an i.i.d. coupling of $X$]. The additivity of covariances yields
$$\mathrm{Cov}[f(X) - f(X'), g(X) - g(X')] = \mathrm{Cov}[f(X), g(X)] - \mathrm{Cov}[f(X), g(X')] - \mathrm{Cov}[f(X'), g(X)] + \mathrm{Cov}[f(X'), g(X')].$$
Since $X$ and $X'$ are independent, the middle terms on the right are zero,
and since $X$ and $X'$ have the same distribution, the remaining terms on
the right are identical. Thus
$$\mathrm{Cov}[f(X), g(X)] = \tfrac{1}{2}\, \mathrm{Cov}[f(X) - f(X'), g(X) - g(X')].$$
Since the mean of both $f(X) - f(X')$ and $g(X) - g(X')$ is zero, we have
$$\mathrm{Cov}[f(X) - f(X'), g(X) - g(X')] = E[(f(X) - f(X'))(g(X) - g(X'))],$$
which is nonnegative, since $f$ and $g$ nondecreasing implies that
$f(x) - f(y)$ and $g(x) - g(y)$ are either both $\ge 0$ or both $\le 0$.
Thus (2.1) holds.
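The identity behind this proof can be checked exactly on a small example. The following is a minimal sketch, assuming a hypothetical three-point distribution `pmf` and two illustrative nondecreasing functions `f` and `g` (none of these appear in the text); it computes the covariance directly and through the i.i.d. coupling formula.

```python
from itertools import product

def cov_direct(pmf, f, g):
    """Cov[f(X), g(X)] computed directly from the pmf {x: P(X = x)}."""
    ef = sum(p * f(x) for x, p in pmf.items())
    eg = sum(p * g(x) for x, p in pmf.items())
    efg = sum(p * f(x) * g(x) for x, p in pmf.items())
    return efg - ef * eg

def cov_via_iid_coupling(pmf, f, g):
    """(1/2) E[(f(X) - f(X'))(g(X) - g(X'))], X' an independent copy of X."""
    total = 0.0
    for (x, px), (y, py) in product(pmf.items(), repeat=2):
        # Each summand is nonnegative because f and g are nondecreasing.
        total += px * py * (f(x) - f(y)) * (g(x) - g(y))
    return 0.5 * total

pmf = {-1: 0.2, 0: 0.5, 2: 0.3}        # hypothetical example distribution
f = lambda x: min(x, 1)                # nondecreasing, bounded on the support
g = lambda x: 1.0 if x > 0 else 0.0    # nondecreasing indicator function

print(cov_direct(pmf, f, g), cov_via_iid_coupling(pmf, f, g))
```

The two printed numbers should agree up to rounding, and the second is a sum of nonnegative terms, which is the content of (2.1).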
3 Quantile Coupling — Stochastic Domination
In this section we produce a coupling that turns so-called stochastic
domination into ordinary (pointwise) domination. Another application can be
found in Section 8. See also Section 9.
3.1 The Coupling
Consider a random variable $X$ with distribution function $F$, that is,
$$P(X \le x) = F(x), \quad x \in \mathbb{R}.$$
Let $F^{-1}$ be the generalized inverse of $F$ (or quantile function) defined by
$$F^{-1}(u) = \inf\{x \in \mathbb{R} : F(x) \ge u\}, \quad u \in [0,1].$$
Note that if $F$ is continuous and strictly increasing, then $F^{-1}$ is the ordinary
inverse of $F$ (see Figure 3.1).
FIGURE 3.1. The generalized inverse $F^{-1}$.
Let $U$ be uniform on [0,1] (this is short for saying that $U$ is a random
variable that is uniformly distributed on [0,1]). Then the random variable
$$\hat X = F^{-1}(U)$$
is a copy of $X$, since [note that $F^{-1}(u) \le x$ if and only if $u \le F(x)$]
$$P(\hat X \le x) = P(F^{-1}(U) \le x) = P(U \le F(x)) = F(x), \quad x \in \mathbb{R}.$$
Thus letting $F$ run over the class of all distribution functions (using the
same $U$) yields a coupling of all differently distributed random variables.
Call it the quantile coupling.
Since $F^{-1}$ is nondecreasing, we have, according to Section 2, that the
quantile coupling consists of positively correlated random variables. We
might even think of this coupling as a maximal dependence coupling
because knowing the value of only one of its variables, namely the value of U
itself, gives us the value of all the others.
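As an illustration of the quantile coupling, here is a minimal sketch for discrete distributions. The helper `quantile` and the two distribution functions `F` and `G` are hypothetical examples, each given as a sorted list of (x, F(x)) pairs; the same uniform value drives both variables.

```python
import random
from collections import Counter

def quantile(cdf_points, u):
    """Generalized inverse F^{-1}(u) = inf{x : F(x) >= u} for a discrete
    distribution given as a sorted list of (x, F(x)) pairs."""
    for x, Fx in cdf_points:
        if Fx >= u:
            return x
    return cdf_points[-1][0]  # guard against rounding when u is close to 1

# Hypothetical distribution functions, listed as (x, F(x)).
F = [(0, 0.5), (1, 0.8), (3, 1.0)]
G = [(1, 0.3), (2, 1.0)]

rng = random.Random(0)
# One uniform value u produces the whole coupled pair.
pairs = [(quantile(F, u), quantile(G, u))
         for u in (rng.random() for _ in range(10000))]

freq_F = Counter(x for x, _ in pairs)
print({x: n / len(pairs) for x, n in sorted(freq_F.items())})
```

The printed relative frequencies should be close to the jumps of `F` (0.5, 0.3, 0.2), which is the statement that $F^{-1}(U)$ is a copy of $X$.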
3.2 Application — Stochastic Domination
Let $X$ and $X'$ be two random variables with distribution functions $F$ and
$G$, respectively. If there is a coupling $(\hat X, \hat X')$ of $X$ and $X'$ such that $\hat X$ is
pointwise dominated by $\hat X'$, that is,
$$\hat X \le \hat X',$$
then $\{\hat X \le x\} \supseteq \{\hat X' \le x\}$, which implies $P(\hat X \le x) \ge P(\hat X' \le x)$ and
thus
$$F(x) \ge G(x), \quad x \in \mathbb{R}. \tag{3.1}$$
If (3.1) holds, then $X$ is said to be stochastically dominated (or dominated
in distribution) by $X'$. Denote this by
$$X \overset{D}{\le} X'.$$
We shall now show that the quantile coupling turns stochastic domination
back into pointwise domination: due to (3.1), $G(x) \ge u$ implies $F(x) \ge u$
and thus
$$\{x \in \mathbb{R} : G(x) \ge u\} \subseteq \{x \in \mathbb{R} : F(x) \ge u\}$$
and thus $F^{-1}(u) \le G^{-1}(u)$, which yields $F^{-1}(U) \le G^{-1}(U)$ [see Figure 3.2].
FIGURE 3.2. Turning stochastic domination into pointwise domination.
We have established the following result.
Theorem 3.1. Let $X$ and $X'$ be random variables. Then
$$X \overset{D}{\le} X'$$
if and only if there is a coupling $(\hat X, \hat X')$ of $X$ and $X'$ such that
$$\hat X \le \hat X'.$$
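The passage from stochastic to pointwise domination can be seen numerically. This is a minimal sketch under an assumed example that is not in the text: $X$ exponential with rate 2 and $X'$ exponential with rate 1, so that $F(x) = 1 - e^{-2x} \ge 1 - e^{-x} = G(x)$. The function name `exp_quantile` is hypothetical.

```python
import math
import random

def exp_quantile(rate, u):
    """F^{-1}(u) for F(x) = 1 - exp(-rate * x), valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate

rng = random.Random(1)
us = [rng.random() for _ in range(10000)]   # random() returns values in [0, 1)

# Quantile coupling: the same u is used for both distribution functions.
coupled = [(exp_quantile(2.0, u), exp_quantile(1.0, u)) for u in us]

# Every coupled pair is ordered, not merely the two distributions.
print(all(x <= x_prime for x, x_prime in coupled))
```

Had the two variables been generated from independent uniforms instead, the marginal distributions would be the same but the pairs would not be ordered in every realization.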
3.3 What For?
The direct usefulness of Theorem 3.1 is mainly due to the fact that it is
easier to carry out arguments using pointwise domination than stochastic
domination. As an illustration of this we shall prove the following result.
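Corollary 3.1. Let $X_1$, $X_2$, $X_1'$, and $X_2'$ be random variables such that
$X_1$ and $X_2$ are independent,
$X_1'$ and $X_2'$ are independent,
$X_1 \overset{D}{\le} X_1'$ and $X_2 \overset{D}{\le} X_2'$.
Then
$$X_1 + X_2 \overset{D}{\le} X_1' + X_2'. \tag{3.2}$$
Proof. Let $(\hat X_1, \hat X_1')$ be a coupling of $X_1$ and $X_1'$ such that $\hat X_1 \le \hat X_1'$ and
let $(\hat X_2, \hat X_2')$ be a coupling of $X_2$ and $X_2'$ such that $\hat X_2 \le \hat X_2'$. Let $(\hat X_1, \hat X_1')$
and $(\hat X_2, \hat X_2')$ be independent. Then $(\hat X_1 + \hat X_2, \hat X_1' + \hat X_2')$ is a coupling of
$X_1 + X_2$ and $X_1' + X_2'$, and
$$\hat X_1 + \hat X_2 \le \hat X_1' + \hat X_2'.$$
This implies (3.2). □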
A more substantial example of obtaining a distributional result through a
pointwise argument by way of coupling can be found in Section 9, where we
use the quantile coupling to obtain the distributional version of dominated
convergence from the standard pointwise version.
3.4 On the General Coupling Idea
Theorem 3.1 and Corollary 3.1 illustrate two general points about coupling.
Firstly, a coupling characterization of a distributional property deepens our
understanding of that property: according to Theorem 3.1, stochastic
domination is simply the distributional form of pointwise domination. Secondly,
the coupling characterization can also be directly useful because it is easier
to argue pointwise (as in the proof of Corollary 3.1) than in distribution.
3.5 Variations on the Quantile Coupling
Let $U$ be uniform on [0,1] and define
$$\hat X = F^{-1}(U), \quad \hat X' = G^{-1}(1 - U).$$
Then $(\hat X, \hat X')$ is still a coupling of $X$ and $X'$, since $1 - U$ is also uniform on
[0,1]. Now $G^{-1}(1 - u)$ is nonincreasing in $u$, and an obvious modification
of the last three lines in Section 2 yields that $\hat X$ and $\hat X'$ are negatively
correlated.
More generally, if we put
$$\hat X = F^{-1}((a + bU) \bmod 1), \quad \hat X' = G^{-1}((c + dU) \bmod 1),$$
where $a, c \in \mathbb{R}$ and $b, d = \pm 1$, and $x \bmod 1$ means the fractional part of $x$,
$$x \bmod 1 = x - [x],$$
then $(\hat X, \hat X')$ is a coupling of $X$ and $X'$. Here we could even allow $a$, $b$, $c$,
and $d$ to be random variables that are independent of $U$.
If we take G = F, these modifications of the quantile coupling yield a
nontrivial self-coupling of X.
The quantile approach is used heavily in simulation to generate random
variables with specified distributions.
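The effect of replacing $U$ by $1 - U$ can be sketched in simulation. The example below assumes $F = G$ is the exponential distribution with rate 1 (an illustrative choice, not from the text) and compares the sample covariance under the antithetic pairing with that under the common-$U$ pairing; the helper names are hypothetical.

```python
import math
import random

def exp_quantile(u):
    """F^{-1}(u) for the rate-1 exponential distribution, 0 <= u < 1."""
    return -math.log(1.0 - u)

def sample_cov(pairs):
    """Plain sample covariance of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    return sum((x - mx) * (y - my) for x, y in pairs) / n

rng = random.Random(2)
eps = 1e-12
# Clamp away from 0 and 1 so that both u and 1 - u are valid arguments.
us = [min(max(rng.random(), eps), 1.0 - eps) for _ in range(20000)]

antithetic = [(exp_quantile(u), exp_quantile(1.0 - u)) for u in us]
common = [(exp_quantile(u), exp_quantile(u)) for u in us]

print(sample_cov(antithetic), sample_cov(common))
```

The first number should come out negative and the second positive, in line with the negative correlation noted above and the positive correlation of the plain quantile coupling.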
3.6 Comment
Suppose $X \overset{D}{\le} X'$ and apply Theorem 3.1 to obtain a coupling $(\hat X, \hat X')$ such
that $\hat X \le \hat X'$. Then for each bounded nondecreasing function $g$ we have
$g(\hat X) \le g(\hat X')$ and thus
$$E[g(X)] \le E[g(X')]. \tag{3.3}$$
Conversely, suppose (3.3) holds for all bounded nondecreasing functions $g$.
Fix an $x \in \mathbb{R}$ and take $g = 1_{(x,\infty)}$ to obtain from (3.3) that
$$P(X > x) \le P(X' > x), \quad x \in \mathbb{R}.$$
Thus $X \overset{D}{\le} X'$ if and only if (3.3) holds for all bounded nondecreasing $g$.
This is taken as the definition of stochastic domination in higher dimensions
and, more generally, in partially ordered spaces. Theorem 3.1 can in fact
be extended to partially ordered Polish spaces; cf. Lindvall (1992), Chapter
IV.1 (Strassen's theorem).
4 Coupling Event - Maximal Coupling
Let $X_i$, $i \in \mathbb{I}$, be a collection of discrete or continuous random variables.
We shall construct a coupling such that the variables coincide maximally.
We first treat the discrete case and start by establishing an upper bound
on the coincidence probability. In Section 5 we give an application of this
coupling.
4.1 The Coupling Event Inequality — Discrete Variables
Suppose $(\hat X_i : i \in \mathbb{I})$ is a coupling of $X_i$, $i \in \mathbb{I}$, and let $C$ be an event such
that if $C$ occurs, then all the $\hat X_i$ coincide, that is,
$$C \subseteq \{\hat X_i = \hat X_j \text{ for all } i, j \in \mathbb{I}\}.$$
Call such an event a coupling event.
Consider first the discrete case: let all the $X_i$ take values in a finite or
countable set $E$ and denote the probability mass functions by $p_i$, that is,
for $x \in E$,
$$P(X_i = x) = p_i(x).$$
For all $i, j \in \mathbb{I}$ and $x \in E$ we have
$$P(\hat X_i = x, C) = P(\hat X_j = x, C) \le p_j(x)$$
and thus for all $i \in \mathbb{I}$ and $x \in E$
$$P(\hat X_i = x, C) \le \inf_{j \in \mathbb{I}} p_j(x).$$
Summing over $x \in E$ yields the following basic coupling event inequality.
Theorem 4.1. If $C$ is a coupling event of a coupling of discrete random
variables $X_i$, $i \in \mathbb{I}$, taking values in a finite or countable set $E$, then
$$P(C) \le \sum_{x \in E} \inf_{i \in \mathbb{I}} p_i(x). \tag{4.1}$$
4.2 Maximal Coupling — Discrete Variables
We shall now construct a coupling with a coupling event $C$ such that (4.1)
holds with identity. Call such a coupling maximal and $C$ a maximal
coupling event. Put
$$c := \sum_{x \in E} \inf_{i \in \mathbb{I}} p_i(x) \quad \text{(the maximal coupling probability)}.$$
If $c = 0$, take the $\hat X_i$ independent and $C = \emptyset$. If $c = 1$, take the $\hat X_i$
identical and $C = \Omega$ = the set of all outcomes. If $0 < c < 1$, let us mix
these couplings as follows. Let
$I$, $V$, and $W_i$, $i \in \mathbb{I}$, be independent random variables
such that
$I$ is 0-1 valued with $P(I = 1) = c$,
$$P(V = x) = \inf_{i \in \mathbb{I}} p_i(x) / c, \quad x \in E,$$
$$P(W_i = x) = (p_i(x) - c\,P(V = x)) / (1 - c), \quad x \in E.$$
Define, for each $i \in \mathbb{I}$,
$$\hat X_i = \begin{cases} V & \text{if } I = 1, \\ W_i & \text{if } I = 0. \end{cases} \tag{4.2}$$
Then
$$P(\hat X_i = x) = P(V = x) P(I = 1) + P(W_i = x) P(I = 0) = P(X_i = x).$$
Moreover, $C = \{I = 1\}$ is a coupling event and $P(C)$ has the desired value $c$.
We have established the following result.
Theorem 4.2. Suppose $X_i$, $i \in \mathbb{I}$, are discrete random variables taking
values in a finite or countable set $E$. Then there exists a maximal coupling,
that is, a coupling with coupling event $C$ such that
$$P(C) = \sum_{x \in E} \inf_{i \in \mathbb{I}} p_i(x).$$
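The mixing construction of this subsection translates directly into code. The following is a minimal sketch for finitely many probability mass functions on a finite set, restricted to the case $0 < c < 1$; the function names `splitting` and `sample` and the example mass functions `p` and `q` are hypothetical.

```python
import random

def splitting(pmfs):
    """Given a list of dicts x -> p_i(x), return (c, v, ws): the maximal
    coupling probability, the pmf of V, and the pmfs of the W_i.
    Only the mixed case 0 < c < 1 is handled."""
    support = sorted(set().union(*pmfs))
    inf_p = {x: min(p.get(x, 0.0) for p in pmfs) for x in support}
    c = sum(inf_p.values())
    if not 0.0 < c < 1.0:
        raise ValueError("this sketch only covers 0 < c < 1")
    v = {x: inf_p[x] / c for x in support}
    ws = [{x: (p.get(x, 0.0) - inf_p[x]) / (1.0 - c) for x in support}
          for p in pmfs]
    return c, v, ws

def sample(c, v, ws, rng):
    """One realization of the coupled family: V for all i on {I = 1},
    independent W_i on {I = 0}."""
    def draw(pmf):
        xs = list(pmf)
        return rng.choices(xs, weights=[pmf[x] for x in xs])[0]
    if rng.random() < c:          # the event {I = 1}
        x = draw(v)
        return [x] * len(ws)
    return [draw(w) for w in ws]

p = {0: 0.5, 1: 0.5}
q = {0: 0.2, 1: 0.3, 2: 0.5}
c, v, ws = splitting([p, q])
rng = random.Random(0)
draws = [sample(c, v, ws, rng) for _ in range(5000)]
print(c, sum(a == b for a, b in draws) / len(draws))
```

For these two mass functions the leftover parts `ws[0]` and `ws[1]` have disjoint supports, so the observed coincidence frequency should be close to `c`.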
4.3 The Coupling Event Inequality — Continuous Variables
Now let the $X_i$ be continuous random variables with densities $f_i$, that is,
for intervals $A$
$$P(X_i \in A) = \int_A f_i \quad \Big(\text{which is short for } \int_A f_i(x)\,dx\Big).$$
It is a little harder to establish the coupling event inequality in this case,
and we shall make the simplifying assumption that the $X_i$ are either finitely
or countably many, that is,
$$\mathbb{I} = \{1, \ldots, n\} \quad \text{or} \quad \mathbb{I} = \{1, 2, \ldots\}.$$
Suppose $(\hat X_i : i \in \mathbb{I})$ is a coupling of $X_i$, $i \in \mathbb{I}$, and $C$ is a coupling event.
Then, for intervals $A$ and $i, j \in \mathbb{I}$,
$$P(\hat X_i \in A, C) = P(\hat X_j \in A, C) \le \int_A f_j. \tag{4.3}$$
Consider first the finite case $\mathbb{I} = \{1, \ldots, n\}$ and define a partition of $E$ by
$$A_1 = \{x \in E : f_1(x) = \inf_{1 \le j \le n} f_j(x)\}$$
and recursively for $1 < k \le n$
$$A_k = \{x \in E : f_k(x) = \inf_{1 \le j \le n} f_j(x)\} \setminus (A_1 \cup \cdots \cup A_{k-1}).$$
Then (4.3) yields the inequality in
$$P(\hat X_i \in A \cap A_k, C) \le \int_{A \cap A_k} f_k = \int_{A \cap A_k} \inf_{1 \le j \le n} f_j, \tag{4.4}$$
while the equality follows from the definition of $A_k$. Sum over $k \in \mathbb{I}$ to
obtain, in the finite case, that
$$P(\hat X_i \in A, C) \le \int_A \inf_{j \in \mathbb{I}} f_j, \quad i \in \mathbb{I}. \tag{4.5}$$
In the countable case $\mathbb{I} = \{1, 2, \ldots\}$ fix $n < \infty$ to obtain that (4.4) still
holds for $i, k \le n$. This yields (4.5) with $\inf_{j \in \mathbb{I}} f_j$ replaced by $\inf_{1 \le j \le n} f_j$.
Sending $n \to \infty$ yields (4.5), since $\inf_{1 \le j \le n} f_j$ decreases to $\inf_{j \in \mathbb{I}} f_j$.
Take $A = E$ in (4.5) to obtain the following coupling event inequality.
Theorem 4.3. If $C$ is a coupling event of a coupling of continuous random
variables with densities $f_1, f_2, \ldots$ (or $f_1, \ldots, f_n$), then
$$P(C) \le \int \inf_{i \in \mathbb{I}} f_i. \tag{4.6}$$
4.4 Maximal Coupling — Continuous Variables
Call a coupling and event achieving identity in (4.6) maximal. The
construction in Section 4.2 extends with an obvious modification to the continuous
case. Put
$$c := \int \inf_{i \in \mathbb{I}} f_i \quad \text{(the maximal coupling probability)}.$$
If $c = 0$, take the $\hat X_i$ independent and $C = \emptyset$. If $c = 1$, take the $\hat X_i$ identical
and $C = \Omega$. If $0 < c < 1$, mix these couplings as follows. Let
$I$, $V$, and $W_i$, $i \in \mathbb{I}$, be independent random variables
such that
$I$ is 0-1 valued with $P(I = 1) = c$,
$V$ has density $\inf_{i \in \mathbb{I}} f_i / c$,
$W_i$ has density $(f_i - \inf_{j \in \mathbb{I}} f_j) / (1 - c)$.
Define $\hat X_i$ by (4.2). Then $(\hat X_i : i \in \mathbb{I})$ is a coupling of the $X_i$, since for
intervals $A$,
$$P(\hat X_i \in A) = P(V \in A) P(I = 1) + P(W_i \in A) P(I = 0) = P(X_i \in A).$$
Moreover, $C = \{I = 1\}$ is a coupling event, and $P(C)$ has the desired
value. We have established the following result.
Theorem 4.4. Suppose $X_1, X_2, \ldots$ (or $X_1, \ldots, X_n$) are continuous
random variables with densities $f_1, f_2, \ldots$ (or $f_1, \ldots, f_n$). Then there exists a
maximal coupling, that is, a coupling with coupling event $C$ such that
$$P(C) = \int \inf_{i \in \mathbb{I}} f_i.$$
4.5 Comments
It is often natural to take
$$C = \{\hat X_i = \hat X_j \text{ for all } i, j \in \mathbb{I}\}.$$
By definition, any coupling event of $(\hat X_i : i \in \mathbb{I})$ is contained in this set,
and thus the maximal couplings in Theorems 4.2 and 4.4 are also maximal
with this choice of $C$.
In particular, for two discrete random variables $X$ and $X'$ there exists a
coupling $(\hat X, \hat X')$ such that, with $\wedge$ denoting minimum,
$$P(\hat X = \hat X') = \sum_x P(X = x) \wedge P(X' = x) \tag{4.7}$$
and for two continuous random variables $X$ and $X'$ with densities $f$ and
$f'$ there exists a coupling $(\hat X, \hat X')$ such that
$$P(\hat X = \hat X') = \int f \wedge f' \quad \text{(see Figure 4.1)}. \tag{4.8}$$
FIGURE 4.1. The maximal coupling probability.
Call these couplings maximal (without a reference to a particular coupling
event).
In simulation a maximal coupling of continuous random variables $X$ and
$X'$ with densities $f$ and $g$ can be generated as follows. Choose a point
uniformly at random under the $f$-curve and let its x-coordinate be a realization
of $\hat X$. If the point happens to be under the $g$-curve, let its x-coordinate also
be a realization of $\hat X'$. If not, choose a new point uniformly at random under
the $g$-curve and above the $f$-curve and let its x-coordinate be a realization
of $\hat X'$.
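The procedure just described can be sketched as follows. The example assumes two unit-variance normal densities with means 0 and 1 (an illustrative choice, not from the text), and it draws the second point by rejection: points are sampled under the $g$-curve until one lies above the $f$-curve. The function names are hypothetical.

```python
import math
import random

def normal_pdf(x, mean):
    """Density of the normal distribution with the given mean, variance 1."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)

def maximal_coupling_sample(rng, mean_f=0.0, mean_g=1.0):
    """Return one realization of the coupled pair."""
    f = lambda x: normal_pdf(x, mean_f)
    g = lambda x: normal_pdf(x, mean_g)
    # A point (x, y) uniform under the f-curve: x has density f and,
    # given x, the height y is uniform on [0, f(x)].
    x = rng.gauss(mean_f, 1.0)
    y = rng.uniform(0.0, f(x))
    if y <= g(x):                 # the point is also under the g-curve
        return x, x
    # Otherwise draw points under the g-curve until one lies above f.
    while True:
        x_prime = rng.gauss(mean_g, 1.0)
        y_prime = rng.uniform(0.0, g(x_prime))
        if y_prime > f(x_prime):
            return x, x_prime

rng = random.Random(3)
samples = [maximal_coupling_sample(rng) for _ in range(20000)]
coincide = sum(a == b for a, b in samples) / len(samples)

# For these two densities the overlap integral equals 2 * Phi(-1/2),
# which can be written with the error function as below.
overlap = 1.0 + math.erf(-0.5 / math.sqrt(2.0))
print(coincide, overlap)
```

The observed coincidence frequency should be close to the overlap integral in (4.8). The rejection loop terminates with probability one here because the overlap is strictly less than 1.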
This simulation procedure extends to a collection $X_1, \ldots, X_n$ of random
variables with densities $f_1, \ldots, f_n$ as follows. Choose a point uniformly at
random under $f_1$, consider the densities under which this point falls, and let
its x-coordinate be the realization of the corresponding variables. Then
pick a point uniformly at random above these densities and under one of
the remaining densities, consider the densities under which the point falls,
and let its x-coordinate be the realization of the corresponding variables.
Repeat this until no density remains. This yields a coupling $(\hat X_1, \ldots, \hat X_n)$
such that all subcollections $(\hat X_{n_1}, \ldots, \hat X_{n_k})$ are maximal couplings. In fact,
repeating this ad infinitum yields a coupling of a countable collection of
continuous random variables such that each subcollection is a maximal
coupling.
We shall refer to the representation (4.2) of $\hat X_i$ as a splitting
representation.
In Chapter 3 (Section 7) we extend the results of this section to arbitrary
collections of random elements.
5 Poisson Approximation - Total Variation
The following well-known approximation
$$\mathrm{Bin}(n, p) \approx \mathrm{Poi}(np) \tag{5.1}$$
can be established and made precise by coupling.
5.1 Approximating a 0-1 Variable
Let $X$ be a 0-1 variable with $P(X = 1) = p$ where $0 \le p \le 1$ and let $X'$
be Poisson with parameter $p$. Let $(\hat X, \hat X')$ be a maximal coupling of $X$ and $X'$. In order to
determine the maximal coupling probability $P(\hat X = \hat X')$, recall that for all
real $x$ it holds that $1 + x \le e^x$, which yields
$$P(X = 0) = 1 - p \le e^{-p} = P(X' = 0),$$
and note that
$$P(X = 1) = p \ge p e^{-p} = P(X' = 1).$$
This and (4.7) yield
$$P(\hat X = \hat X') = P(X = 0) \wedge P(X' = 0) + P(X = 1) \wedge P(X' = 1) = 1 - p + p e^{-p}.$$
Since $e^{-p} \ge 1 - p$, this implies that $P(\hat X = \hat X') \ge 1 - p^2$ and thus
$$P(\hat X \ne \hat X') \le p^2. \tag{5.2}$$
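The bound (5.2) is easy to check numerically. The sketch below computes the agreement probability of a maximal coupling as the sum of the pointwise minima of the two mass functions; the function names are hypothetical and only the points 0 and 1 matter, since the 0-1 variable has no mass elsewhere.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability mass at k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def mismatch_probability(p):
    """P(X_hat != X_hat') under a maximal coupling of a 0-1(p) and a Poisson(p) variable."""
    bernoulli = {0: 1.0 - p, 1: p}
    # Agreement probability = sum over x of the minimum of the two mass functions.
    agree = sum(min(bernoulli[k], poisson_pmf(k, p)) for k in (0, 1))
    return 1.0 - agree
```

For instance, `mismatch_probability(0.1)` equals $0.1\,(1 - e^{-0.1})$, which is slightly below $p^2 = 0.01$.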
5.2 Sums of Independent 0-1 Variables
Let $X_1, \dots, X_n$ be independent 0-1 variables with $P(X_i = 1) = p_i$, where
$0 \le p_i \le 1$. Put
$$X = X_1 + \dots + X_n.$$
Let $X'_1, \dots, X'_n$ be independent Poisson variables, $X'_i$ with parameter $p_i$.
Recall that
$$X' := X'_1 + \dots + X'_n \text{ is Poisson with parameter } p_1 + \dots + p_n.$$
Let $(\hat X_1, \hat X'_1), \dots, (\hat X_n, \hat X'_n)$ be independent pairs such that for each $i$, $(\hat X_i, \hat X'_i)$
is a maximal coupling of $X_i$ and $X'_i$. Put
$$\hat X = \hat X_1 + \dots + \hat X_n \quad\text{and}\quad \hat X' = \hat X'_1 + \dots + \hat X'_n.$$
Then $(\hat X, \hat X')$ is a coupling of $X$ and $X'$, and
$$P(\hat X \ne \hat X') \le P(\hat X_i \ne \hat X'_i \text{ for some } i) \le \sum_{i=1}^n P(\hat X_i \ne \hat X'_i).$$
Applying (5.2) yields
$$P(\hat X \ne \hat X') \le \sum_{i=1}^n p_i^2. \tag{5.3}$$
If we take $p_i = p$, then $X$ is binomial $(n,p)$, and thus we have the following
clear and intuitively appealing random variable formulation of (5.1):
Bin$(n,p)$ differs from Poi$(np)$ with probability at most $np^2$.
In order to use the above coupling to formulate (5.1) in terms of total
variation distance between distributions we take an excursion into that
topic for the next two subsections.
5.3 Total Variation — Definition and Identities
Let $X$ and $X'$ be random variables with distributions $\lambda$ and $\mu$, that is, for
each (Borel) set $A$,
$$\lambda(A) = P(X \in A) \quad\text{and}\quad \mu(A) = P(X' \in A).$$
The total variation distance between $\lambda$ and $\mu$ is simply twice the supremum
distance
$$\|\lambda - \mu\| := 2 \sup_A |\lambda(A) - \mu(A)|. \tag{5.4}$$
The reason for multiplying by 2 and using the phrase 'total variation' is the
following. Suppose $X$ and $X'$ are discrete with probability mass functions
$p$ and $q$, or continuous with densities $f$ and $g$. Then twice the supremum of
$\lambda - \mu$ equals the actual total variation (the total of the variation) of $p - q$,
or $f - g$, namely
$$\|\lambda - \mu\| = \sum_x |p(x) - q(x)| \quad\text{or}\quad \|\lambda - \mu\| = \int |f - g|. \tag{5.5}$$
We shall establish (5.5) and two other useful identities:
Theorem 5.1. If $X$ and $X'$ are discrete with probability mass functions $p$
and $q$, or continuous with densities $f$ and $g$, then (5.5) holds and
$$\|\lambda - \mu\| = 2 \sum_x (p(x) - q(x))^+ \quad\text{or}\quad \|\lambda - \mu\| = 2 \int (f - g)^+, \tag{5.6}$$
$$\|\lambda - \mu\| = 2 - 2 \sum_x p(x) \wedge q(x) \quad\text{or}\quad \|\lambda - \mu\| = 2 - 2 \int f \wedge g. \tag{5.7}$$
Here we have used the following standard notation: for real numbers $a$ and
$b$ let
$a^+ = a \vee 0$, where $a \vee b$ = maximum of $a$ and $b$,
$a^- = -(a \wedge 0)$, where $a \wedge b$ = minimum of $a$ and $b$.
Proof. We shall carry out the proof of Theorem 5.1 in the discrete
case; the continuous case is analogous. It is clear that for sets $A$,
$$\lambda(A) - \mu(A) \le \sum_x (p(x) - q(x))^+$$
and that equality holds if we take $A = \{x : p(x) > q(x)\}$. Thus
$$\sup_A (\lambda(A) - \mu(A)) = \sum_x (p(x) - q(x))^+, \tag{5.8}$$
and similarly,
$$\sup_A (\mu(A) - \lambda(A)) = \sum_x (p(x) - q(x))^-. \tag{5.9}$$
From $\sum_x p(x) = 1 = \sum_x q(x)$ it follows that
$$\sum_x (p(x) - q(x))^+ = \sum_x (p(x) - q(x))^-. \tag{5.10}$$
Combining (5.8), (5.9), and (5.10) yields
$$\sup_A |\lambda(A) - \mu(A)| = \sum_x (p(x) - q(x))^+,$$
and thus (5.6) holds. From
$$|p - q| = (p - q)^+ + (p - q)^-$$
together with (5.6) and (5.10) we obtain (5.5). Finally,
$$(p - q)^+ = p - p \wedge q$$
together with (5.6) and $\sum_x p(x) = 1$ yields (5.7). □
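The identities of Theorem 5.1 can be checked on a small discrete example. The sketch below is illustrative only: mass functions are represented as dictionaries, the function names are hypothetical, and the supremum in (5.4) is found by brute force over all subsets, which is feasible only for tiny supports.

```python
from itertools import combinations

def tv_by_sup(p, q):
    """2 * sup_A |lambda(A) - mu(A)|, by brute force over all subsets A, as in (5.4)."""
    support = sorted(set(p) | set(q))
    best = 0.0
    for size in range(len(support) + 1):
        for subset in combinations(support, size):
            diff = sum(p.get(x, 0.0) - q.get(x, 0.0) for x in subset)
            best = max(best, abs(diff))
    return 2.0 * best

def tv_by_abs(p, q):
    """Right-hand side of (5.5): sum of |p(x) - q(x)|."""
    return sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in set(p) | set(q))

def tv_by_positive_part(p, q):
    """Right-hand side of (5.6): twice the sum of (p(x) - q(x))^+."""
    return 2.0 * sum(max(p.get(x, 0.0) - q.get(x, 0.0), 0.0) for x in set(p) | set(q))

def tv_by_minimum(p, q):
    """Right-hand side of (5.7): 2 - 2 * sum of min(p(x), q(x))."""
    return 2.0 - 2.0 * sum(min(p.get(x, 0.0), q.get(x, 0.0)) for x in set(p) | set(q))
```

With `p = {0: 0.5, 1: 0.3, 2: 0.2}` and `q = {0: 0.2, 1: 0.3, 3: 0.5}` all four functions return 1 up to rounding; the supremum is attained at $A = \{0, 2\}$, the set where $p > q$.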
5.4 Total Variation and Coupling
Let $(\hat X, \hat X')$ be a coupling of two random variables $X$ and $X'$, and let $C$ be
a coupling event. Since $C$ implies that $\hat X = \hat X'$, we have for (Borel) sets $A$,
$$P(\hat X \in A, C) = P(\hat X' \in A, C)$$
and thus
$$P(X \in A) - P(X' \in A) = P(\hat X \in A) - P(\hat X' \in A) = P(\hat X \in A, C^c) - P(\hat X' \in A, C^c) \le P(C^c).$$
Apply (5.4) to obtain the coupling event inequality
$$\|P(X \in \cdot) - P(X' \in \cdot)\| \le 2 P(C^c). \tag{5.11}$$
From (5.7) we see that in the discrete and continuous cases this is just
a total variation formulation of Theorems 4.1 and 4.3 specialized to two
variables. We also see that the coupling is maximal if and only if identity
holds in (5.11). Thus when $(\hat X, \hat X')$ is a maximal coupling, we have
$$\|P(X \in \cdot) - P(X' \in \cdot)\| = 2 P(\hat X \ne \hat X'). \tag{5.12}$$
5.5 Back to the Poisson Approximation
Combining (5.3) and the coupling event inequality in the form (5.11) [and
with $C = \{\hat X = \hat X'\}$] yields
$$\|P(X \in \cdot) - \mathrm{Poi}(p_1 + \dots + p_n)\| \le 2 \sum_{i=1}^n p_i^2.$$
In particular, if $p_i = p$, then $X$ is binomial $(n,p)$, and thus we have the
following precise formulation of (5.1):
$$\|\mathrm{Bin}(n,p) - \mathrm{Poi}(np)\| \le 2np^2.$$
If a Poisson parameter $c$ is given and $n \ge c$, then taking $p = c/n$ yields
$$\|\mathrm{Bin}(n, c/n) - \mathrm{Poi}(c)\| \le 2c^2/n.$$
Sending $n$ to infinity yields in particular, with $\overset{tv}{\to}$ denoting convergence in
total variation,
$$\mathrm{Bin}(n, c/n) \overset{tv}{\to} \mathrm{Poi}(c), \quad n \to \infty, \tag{5.13}$$
which further implies
$$\binom{n}{x} (c/n)^x (1 - c/n)^{n-x} \to e^{-c} \frac{c^x}{x!} \quad\text{as } n \to \infty, \quad x \in \mathbb{Z}_+, \tag{5.14}$$
where $\mathbb{Z}_+$ are the nonnegative integers.
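The bound $2c^2/n$ can be compared with the actual distance. The sketch below evaluates the total variation distance in the convention (5.4) through the sum of absolute differences of the mass functions, as in (5.5); the function name is hypothetical. The Poisson masses are built up recursively to avoid large factorials.

```python
import math

def tv_binomial_poisson(n, c):
    """||Bin(n, c/n) - Poi(c)|| in the convention (5.4), via the sum in (5.5)."""
    p = c / n
    total = 0.0
    poisson = math.exp(-c)          # Poisson mass at x = 0
    poisson_cdf = 0.0
    for x in range(n + 1):
        binom = math.comb(n, x) * p ** x * (1.0 - p) ** (n - x)
        total += abs(binom - poisson)
        poisson_cdf += poisson
        poisson *= c / (x + 1)      # Poisson mass at x + 1
    # Beyond n the binomial has no mass, so the Poisson tail counts in full.
    return total + max(0.0, 1.0 - poisson_cdf)
```

Evaluating this for $c = 2$ and increasing $n$ shows the distance shrinking well inside the bound, which is consistent with the remark in Section 5.6 that the coupling bound is far from sharp.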
5.6 Comment
The above results can be much sharpened and extended; see Barbour, Holst,
and Janson (1992). We just mention here Le Cam's theorem: with $X$ as in
Section 5.2,
$$\|P(X \in \cdot) - \mathrm{Poi}(p_1 + \dots + p_n)\| \le 2 \max_{1 \le i \le n} p_i$$
and in particular,
$$\|\mathrm{Bin}(n,p) - \mathrm{Poi}(np)\| \le 2p.$$
6 Convergence of Discrete Random Variables
Let $X_1, \dots, X_\infty$ be discrete random variables taking values in a finite or
countable set E. We shall first show that convergence in total variation,
like (5.13), is (somewhat surprisingly) equivalent to the apparently weaker
pointwise convergence of probability mass functions, like (5.14). We shall
then show that these distributional modes of convergence can be turned by
coupling into a convergence where the random variables actually hit the
limit and stay there.
6.1 Mass Function Convergence ⇔ Total Variation Convergence
Suppose
$$P(X_n = x) \to P(X_\infty = x) \quad\text{as } n \to \infty \text{ for each } x \in E. \tag{6.1}$$
Then $(P(X_\infty = x) - P(X_n = x))^+ \to 0$, and since
$$(P(X_\infty = x) - P(X_n = x))^+ \le P(X_\infty = x)$$
and
$$\sum_{x \in E} P(X_\infty = x) = 1 < \infty,$$
we have by dominated convergence that
$$\sum_{x \in E} (P(X_\infty = x) - P(X_n = x))^+ \to 0, \quad n \to \infty.$$
Now (5.6) yields convergence in total variation:
$$X_n \overset{tv}{\to} X_\infty, \quad n \to \infty. \tag{6.2}$$
Conversely, it is clear that (6.2) implies (6.1). Thus (6.1) and (6.2) are
equivalent.
6.2 Hitting the Limit
We now show that if (6.1) holds, then there exists a coupling $(\hat X_1, \dots, \hat X_\infty)$
of $X_1, \dots, X_\infty$ and a finite random integer $K$ such that
$$\hat X_n = \hat X_\infty, \quad n \ge K. \tag{6.3}$$
We obtain this by elaborating on the maximal coupling construction in
Section 4.2. Note that (6.1) implies, for all $x \in E$,
$$q_n(x) := \inf_{k \ge n} P(X_k = x) \uparrow P(X_\infty = x) \quad\text{as } n \to \infty. \tag{6.4}$$
Put $q_0 = 0$ and let $K, V_1, V_2, \dots, W_1, W_2, \dots$ be independent random
variables such that for $1 \le n < \infty$ and $x \in E$
$$P(K = n) = \sum_{x \in E} q_n(x) - \sum_{x \in E} q_{n-1}(x),$$
$$P(V_n = x) = \begin{cases} (q_n(x) - q_{n-1}(x))/P(K = n) & \text{if } P(K = n) > 0, \\ \text{arbitrary} & \text{if } P(K = n) = 0, \end{cases}$$
$$P(W_n = x) = \begin{cases} (P(X_n = x) - q_n(x))/P(K > n) & \text{if } P(K > n) > 0, \\ \text{arbitrary} & \text{if } P(K > n) = 0. \end{cases}$$
The random variable $K$ is finite, since by dominated convergence and (6.4)
$$P(K \le n) = \sum_{x \in E} q_n(x) \uparrow \sum_{x \in E} P(X_\infty = x) = 1 \quad\text{as } n \to \infty.$$
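Define, for $1 \le n \le \infty$,
$$\hat X_n = \begin{cases} V_K & \text{if } n \ge K, \\ W_n & \text{if } n < K. \end{cases} \tag{6.5}$$
This is a coupling of $X_1, \dots, X_\infty$, since for $1 \le n < \infty$ and each $x \in E$
$$P(\hat X_n = x) = \sum_{1 \le k \le n} P(V_k = x) P(K = k) + P(W_n = x) P(K > n)$$
$$= \sum_{1 \le k \le n} (q_k(x) - q_{k-1}(x)) + (P(X_n = x) - q_n(x))$$
$$= P(X_n = x),$$
while $\hat X_\infty = V_K$, and thus [due to (6.4)] for each $x \in E$
$$P(\hat X_\infty = x) = \sum_{1 \le k < \infty} (q_k(x) - q_{k-1}(x)) = P(X_\infty = x).$$
Clearly, (6.3) holds.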
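The construction of $K$, $V_n$, and $W_n$ can be sketched in code for a finite example. The sketch below makes a simplifying assumption that is not in the text: the mass functions of $X_n$ are given for $n = 1, \dots, N$ and are taken to coincide with that of $X_\infty$ for $n > N$, so that the infimum in (6.4) is a minimum over finitely many terms. All function names are hypothetical, and $W_n$ is drawn only when it is actually used.

```python
import random
from fractions import Fraction as Fr

def build_tables(pmfs, limit):
    """Return (states, q, pK), where q[n][x] = inf_{k >= n} P(X_k = x), q[0] = 0,
    and pK[n - 1] = P(K = n) for n = 1..N+1.
    Assumption of this sketch: X_n has mass function `limit` for every n > N."""
    N = len(pmfs)
    states = sorted(set(limit).union(*pmfs))
    q = [{x: Fr(0) for x in states}]
    for n in range(1, N + 2):
        tail = pmfs[n - 1:] + [limit]
        q.append({x: min(p.get(x, Fr(0)) for p in tail) for x in states})
    pK = [sum(q[n].values()) - sum(q[n - 1].values()) for n in range(1, N + 2)]
    return states, q, pK

def draw(rng, values, weights):
    """Draw one value with probability proportional to the given weights."""
    return rng.choices(values, weights=[float(w) for w in weights])[0]

def sample_path(rng, pmfs, limit):
    """Return (K, [X_hat_1, ..., X_hat_N, X_hat_infinity]) as in (6.5)."""
    states, q, pK = build_tables(pmfs, limit)
    N = len(pmfs)
    K = draw(rng, list(range(1, N + 2)), pK)
    v = draw(rng, states, [q[K][x] - q[K - 1][x] for x in states])      # V_K
    path = []
    for n in range(1, N + 1):
        if n >= K:
            path.append(v)                                               # limit is hit
        else:
            w = [pmfs[n - 1].get(x, Fr(0)) - q[n][x] for x in states]    # law of W_n
            path.append(draw(rng, states, w))
    path.append(v)                                                       # X_hat_infinity
    return K, path
```

Every sampled path agrees with its last entry from index $K$ onward, which is (6.3), while the empirical frequencies of each coordinate approximate the prescribed mass functions.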
6.3 Converse
Conversely, suppose (6.3) holds. Then $\{K \le n\}$ is a coupling event of the
coupling $(\hat X_n, \hat X_\infty)$ of $X_n$ and $X_\infty$. Applying the coupling event inequality
(5.11) and the finiteness of $K$ yields
$$\|P(X_n \in \cdot) - P(X_\infty \in \cdot)\| \le 2 P(K > n) \to 0, \quad n \to \infty,$$
which implies (6.1).
Since (6.1) in turn implies (6.4), which was in fact the condition under
which we established (6.3), we have established the following equivalences.
Theorem 6.1. Let $X_1, \dots, X_\infty$ be discrete random variables taking values
in a finite or countable set $E$. Then the three claims
$$\lim_{n \to \infty} P(X_n = x) = P(X_\infty = x), \quad x \in E, \qquad \text{[this is (6.1)]}$$
$$X_n \overset{tv}{\to} X_\infty, \quad n \to \infty, \qquad \text{[this is (6.2)]}$$
$$\liminf_{n \to \infty} P(X_n = x) = P(X_\infty = x), \quad x \in E, \qquad \text{[this is (6.4)]}$$
are equivalent and hold if and only if there exists a coupling $(\hat X_1, \dots, \hat X_\infty)$
of $X_1, \dots, X_\infty$ and a finite random integer $K$ such that
$$\hat X_n = \hat X_\infty, \quad n \ge K,$$
[this is (6.3)].
7 Continuous Variables - Hitting the Limit
Let $X_1, \dots, X_\infty$ be continuous random variables with densities $f_1, \dots, f_\infty$.
How should Theorem 6.1 be extended to this case? This section is
structured as the previous one and gives the answer at the end.
7.1 Density Convergence ⇒ Total Variation Convergence
Replacing probability mass functions by densities in the argument in
Section 6.1 yields that the following analogue of (6.1):
the densities $f_1, \dots, f_\infty$ can be chosen so that
$$f_n(x) \to f_\infty(x) \quad\text{as } n \to \infty \text{ for each } x \in \mathbb{R}, \tag{7.1}$$
implies convergence in total variation,
$$X_n \overset{tv}{\to} X_\infty, \quad n \to \infty. \tag{7.2}$$
However, the converse is not as obvious. In fact, it is no longer true, as we
shall see in a while.
7.2 Hitting the Limit
We shall now show that the condition (7.1) is sufficient to hit the limit, that
is, if (7.1) holds, then there exists a coupling $(\hat X_1, \dots, \hat X_\infty)$ of $X_1, \dots, X_\infty$
and a finite random integer $K$ such that
$$\hat X_n = \hat X_\infty, \quad n \ge K. \tag{7.3}$$
This follows by a coupling construction analogous to the one in Section 6.2.
Let us go through the essential part of it again. Put
$$g_0 = 0 \quad\text{and for } n \ge 1 \quad g_n = \inf_{k \ge n} f_k.$$
Let $K, V_1, V_2, \dots, W_1, W_2, \dots$ be independent random variables such that
for $1 \le n < \infty$,
$$K \text{ is integer valued and } P(K = n) = \int g_n - \int g_{n-1},$$
$$V_n \text{ has } \begin{cases} \text{density } (g_n - g_{n-1})/P(K = n) & \text{if } P(K = n) > 0, \\ \text{arbitrary density} & \text{if } P(K = n) = 0, \end{cases}$$
$$W_n \text{ has } \begin{cases} \text{density } (f_n - g_n)/P(K > n) & \text{if } P(K > n) > 0, \\ \text{arbitrary density} & \text{if } P(K > n) = 0. \end{cases}$$
Defining $\hat X_n$ as at (6.5) yields the desired result, since (7.1) implies that
$g_n \uparrow f_\infty$ as $n \to \infty$.
7.3 Converse?
Note that (7.3) was actually established under $g_n \uparrow f_\infty$, that is, under
$$\liminf_{n \to \infty} f_n \text{ is a density of } X_\infty, \tag{7.4}$$
which is weaker than (7.1). We shall now show that (7.4) is the correct
condition, that is, that (7.4) is implied by (7.3).
Suppose there is a coupling and a finite $K$ such that (7.3) holds. Then
$\{K \le n\}$ is a coupling event of the coupling $(\hat X_n, \dots, \hat X_\infty)$ of $X_n, \dots, X_\infty$,
and (4.5) yields for (Borel) sets $A$,
$$P(\hat X_\infty \in A, K \le n) \le \int_A g_n, \quad 1 \le n < \infty.$$
Now, $g_n$ increases to $\liminf_{n \to \infty} f_n$, and thus [by monotone convergence
and since $K < \infty$]
$$P(X_\infty \in A) \le \int_A \liminf_{n \to \infty} f_n. \tag{7.5}$$
Also, since $\int g_n \le \int f_n = 1$, we have $\int \liminf_{n \to \infty} f_n \le 1$. Thus, for each
(Borel) set $A$,
$$1 = P(X_\infty \in A) + P(X_\infty \in A^c) \le \int_A \liminf_{n \to \infty} f_n + \int_{A^c} \liminf_{n \to \infty} f_n \le 1.$$
This cannot hold unless (7.5) holds with identity. Thus (7.4) holds.
7.4 Pointwise Convergence of Densities Is Too Strong
We have established the equivalence of (7.3) and (7.4). For discrete
variables there were two more equivalences, which both break down in the
continuous case. We start with (7.1) and (7.4): certainly, (7.1) implies (7.4),
and the following example shows that (7.1) is, in fact, strictly stronger than
(7.4).
Example 7.1. Let the random variables $X_1, \dots, X_\infty$ be $[0,1)$ valued and
have densities $f_1, \dots, f_\infty$ defined on $[0,1)$ as follows:
$$f_1(x) = f_\infty(x) = 1, \quad x \in [0,1);$$
and for $n = 2^m + k$ where $m \ge 1$ and $0 \le k < 2^m$ [each $n > 1$ can be
written uniquely in this way] put (see Figure 7.1)
$$f_n(x) = \begin{cases} 2, & x \in [k 2^{-m}, (k+1) 2^{-m}), \\ 2 - (1 - 2^{-m})^{-1}, & x \notin [k 2^{-m}, (k+1) 2^{-m}). \end{cases}$$
FIGURE 7.1. The functions $f_n$ when $n = 12$ ($m = 3$ and $k = 4$). Panels: Example 7.1 and Example 7.2.
Then for each $x \in [0,1)$ there are infinitely many $n$ such that $f_n(x) = 2$,
and thus
$$\limsup_{n \to \infty} f_n(x) = 2 \ne 1 = f_\infty(x), \quad x \in [0,1).$$
Hence (7.1) does not hold.
On the other hand, for each $x \in [0,1)$,
$$2 - (1 - 2^{-m})^{-1} \le g_n(x) \le 1.$$
This yields, as $n \to \infty$,
$$g_n(x) \to 1 = f_\infty(x), \quad x \in [0,1),$$
and thus (7.4) holds.
7.5 Total Variation Convergence Is Too Weak
Finally, consider (7.4) and (7.2). In Section 7.2 we showed that (7.4) implies
(7.3) and in Section 7.3 that (7.3) implies (7.2). Thus (7.4) implies (7.2),
and the following example shows that (7.4) is, in fact, strictly stronger than
(7.2).
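Example 7.2. Let the random variables $X_1, \dots, X_\infty$ be $[0,1)$ valued and
have densities $f_1, \dots, f_\infty$ defined on $[0,1)$ as follows:
$$f_\infty(x) = 1, \quad x \in [0,1);$$
and for $n = 2^m + k$ where $m > 0$ and $0 \le k < 2^m$ put (see Figure 7.1)
$$f_n(x) = \begin{cases} 0, & x \in [k 2^{-m}, (k+1) 2^{-m}), \\ (1 - 2^{-m})^{-1}, & x \notin [k 2^{-m}, (k+1) 2^{-m}). \end{cases}$$
Then, due to (5.6),
$$\|P(X_n \in \cdot) - P(X_\infty \in \cdot)\| = 2 \int (f_\infty - f_n)^+ = 2 \cdot 2^{-m} \to 0, \quad n \to \infty.$$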
Hence (7.2) holds. On the other hand, for each $x \in [0,1)$ there are infinitely
many $n$ such that $f_n(x) = 0$, which yields
$$\liminf_{n \to \infty} f_n(x) = 0, \quad x \in [0,1),$$
and thus (7.4) does not hold.
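The moving-block densities of Examples 7.1 and 7.2 can be evaluated exactly with rational arithmetic. The sketch below is an illustration with hypothetical function names; it covers $n \ge 2$ only, where $n = 2^m + k$ with $m \ge 1$.

```python
from fractions import Fraction as Fr

def level(n):
    """Write n = 2**m + k with 0 <= k < 2**m and return (m, k); requires n >= 2."""
    m = n.bit_length() - 1
    return m, n - 2 ** m

def in_block(x, n):
    """True if x lies in [k 2^-m, (k+1) 2^-m)."""
    m, k = level(n)
    return Fr(k, 2 ** m) <= x < Fr(k + 1, 2 ** m)

def outside_71(m):
    """Value of f_n off the block in Example 7.1."""
    return 2 - 1 / (1 - Fr(1, 2 ** m))

def outside_72(m):
    """Value of f_n off the block in Example 7.2."""
    return 1 / (1 - Fr(1, 2 ** m))

def f_71(x, n):
    return Fr(2) if in_block(x, n) else outside_71(level(n)[0])

def f_72(x, n):
    return Fr(0) if in_block(x, n) else outside_72(level(n)[0])

def tv_72(n):
    """2 * integral of (f_inf - f_n)^+ with f_inf = 1 on [0, 1), as in (5.6)."""
    m, _ = level(n)
    length = Fr(1, 2 ** m)
    on_block = max(1 - Fr(0), Fr(0)) * length
    off_block = max(1 - outside_72(m), Fr(0)) * (1 - length)
    return 2 * (on_block + off_block)
```

For a fixed point such as $x = 1/3$, exactly one $n$ at each level $m$ has $x$ in its block, so $f_n(x) = 2$ (respectively $f_n(x) = 0$) recurs forever, while `tv_72(n)` equals $2 \cdot 2^{-m}$ and tends to zero.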
7.6 What Has Been Achieved?
In Sections 7.2-7.5 we have established the following result.
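Theorem 7.1. If $X_1, \dots, X_\infty$ are continuous random variables with
densities $f_1, \dots, f_\infty$, then
$$\lim_{n \to \infty} f_n \text{ is a density of } X_\infty \qquad \text{[this is (7.1)]}$$
is strictly stronger than
$$\liminf_{n \to \infty} f_n \text{ is a density of } X_\infty, \qquad \text{[this is (7.4)]}$$
which is strictly stronger than
$$X_n \overset{tv}{\to} X_\infty, \quad n \to \infty. \qquad \text{[this is (7.2)]}$$
Moreover, (7.4) holds if and only if there exists a coupling $(\hat X_1, \dots, \hat X_\infty)$ of
$X_1, \dots, X_\infty$ and a finite random integer $K$ such that
$$\hat X_n = \hat X_\infty, \quad n \ge K. \qquad \text{[this is (7.3)]}$$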
In Chapter 3 (Section 9) we shall extend this coupling result to general
random elements.
8 Convergence in Distribution and Pointwise
Let $X_1, \dots, X_\infty$ be random variables with distribution functions $F_1, \dots, F_\infty$.
The $X_n$ tend pointwise (or surely, or realizationwise) to $X_\infty$ if
$$X_n \to X_\infty, \quad n \to \infty, \tag{8.1}$$
which is short for
$$X_n(\omega) \to X_\infty(\omega), \quad n \to \infty, \quad\text{for all outcomes } \omega.$$
This means that the $X_n$ close in on the limit without necessarily hitting
it as in (7.3). In order to compare (8.1) to (7.3) note that (8.1) can be
rewritten as follows: for each $\varepsilon > 0$ there is a finite random integer $K_\varepsilon$ such
that
$$|X_n - X_\infty| \le \varepsilon, \quad n \ge K_\varepsilon \quad\text{(see Figure 8.1).} \tag{8.2}$$
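FIGURE 8.1. Comparison of (7.3) and (8.1). First panel: illustration of (7.3), showing $n \mapsto X_n(\omega)$, $X_\infty(\omega)$, and $K(\omega)$. Second panel: illustration of (8.1), that is, of (8.2), showing $n \mapsto X_n(\omega)$, $X_\infty(\omega)$, and $K_\varepsilon(\omega)$.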
In this section we shall dig out the distributional form of pointwise
convergence. The result, once more, is stated at the end of the section.
8.1 Total Variation Convergence Is Too Strong
The distributional condition we are looking for should be implied by pointwise
convergence. This excludes convergence in total variation, as can be
seen by the following example. Put $X_n = 1/n$ and $X_\infty = 0$. Then certainly
(8.1) holds, but $X_n$ does not tend to $X_\infty$ in total variation, since clearly
$$\|P(X_n \in \cdot) - P(X_\infty \in \cdot)\| = 2 \not\to 0.$$
Even the much weaker condition
$$F_n(x) \to F_\infty(x), \quad n \to \infty, \quad x \in \mathbb{R}, \tag{8.3}$$
is too strong, since in our example
$$F_n(0) = 0 \not\to 1 = F_\infty(0).$$
8.2 Pointwise Convergence ⇒ Convergence in Distribution
The distributional form of pointwise convergence turns out to be the
following slight weakening of (8.3):
$$F_n(x) \to F_\infty(x), \quad n \to \infty, \quad\text{for all } x \text{ where } F_\infty \text{ is continuous.} \tag{8.4}$$
This is called convergence in distribution and is denoted by
$$X_n \overset{D}{\to} X_\infty, \quad n \to \infty.$$
In order to see that pointwise convergence implies convergence in
distribution assume that (8.1) holds and apply its equivalent form (8.2) to obtain
that for all $x \in \mathbb{R}$ and $\varepsilon > 0$
$$F_n(x) = P(X_n \le x, K_\varepsilon \le n) + P(X_n \le x, K_\varepsilon > n)$$
$$\le P(X_\infty \le x + \varepsilon) + P(K_\varepsilon > n)$$
$$\to F_\infty(x + \varepsilon), \quad n \to \infty,$$
and
$$F_n(x) \ge P(X_n \le x, K_\varepsilon \le n)$$
$$\ge P(X_\infty \le x - \varepsilon, K_\varepsilon \le n)$$
$$\to F_\infty(x - \varepsilon), \quad n \to \infty.$$
Thus for all $x \in \mathbb{R}$ and $\varepsilon > 0$
$$F_\infty(x - \varepsilon) \le \liminf_{n \to \infty} F_n(x) \le \limsup_{n \to \infty} F_n(x) \le F_\infty(x + \varepsilon),$$
and sending $\varepsilon$ to 0 shows that (8.4) holds.
8.3 Turning Distributional Convergence into Pointwise
We shall now use the quantile coupling (Section 3.1) to reverse the above
implication, that is, turn convergence in distribution into pointwise
convergence (see Figure 8.2).
FIGURE 8.2. Turning convergence in distribution into pointwise convergence.
First we need the following fact.
Lemma 8.1. For a nondecreasing real function f, the set of points where
f is not continuous is either finite or countable.
Proof. That $f$ is not continuous at $u$ is equivalent to the left-hand limit
being less than the right-hand limit, $f(u-) < f(u+)$. To each such $u$ we
can associate a rational number that lies in the interval $[f(u-), f(u+))$.
These intervals are disjoint (because $f$ is nondecreasing), and thus we have
established a one-to-one correspondence between $\{u \in \mathbb{R} : f(u-) < f(u+)\}$
and a subset of the rational numbers. Since the rationals are countable, the
set $\{u \in \mathbb{R} : f(u-) < f(u+)\}$ is either finite or countable. □
Recall that the generalized inverse of a distribution function $F$ is
$$F^{-1}(u) = \inf\{x \in \mathbb{R} : F(x) \ge u\}, \quad u \in [0,1].$$
Clearly, $F^{-1}$ is nondecreasing,
$$F(F^{-1}(u)-) \le u \le F(F^{-1}(u)), \quad u \in [0,1], \tag{8.5}$$
and
$$F^{-1} \text{ is continuous at } u \text{ and } F(x-) \le u \le F(x) \quad\Rightarrow\quad F^{-1}(u) = x. \tag{8.6}$$
Since $F_\infty^{-1}$ is nondecreasing, the set of points where $F_\infty^{-1}$ is not continuous
is finite or countable. Thus there is a random variable $U$ that is uniform
on $[0,1]$ and takes values in the set of points at which $F_\infty^{-1}$ is continuous.
Use this $U$ to define the quantile coupling
$$\hat X_n = F_n^{-1}(U), \quad 1 \le n \le \infty.$$
We shall show that (8.4) implies
$$F_n^{-1}(u) \to F_\infty^{-1}(u), \quad n \to \infty, \quad\text{for all } u \text{ where } F_\infty^{-1} \text{ is continuous,} \tag{8.7}$$
which yields the desired result that $\hat X_n \to \hat X_\infty$ as $n \to \infty$.
8.4 Establishing That (8.4) Implies (8.7)
Fix $u \in [0,1]$ and put
$$x = \liminf_{n \to \infty} F_n^{-1}(u).$$
Since $F_\infty$ is nondecreasing and thus is discontinuous at only finitely or
countably many points, we can fix an arbitrarily small $\varepsilon > 0$ such that
$F_\infty$ is continuous at both $x - \varepsilon$ and $x + \varepsilon$. Let $n_k$, $k \ge 1$, be a sequence of
integers such that
$$x - \varepsilon < F_{n_k}^{-1}(u) \le x + \varepsilon, \quad k \ge 1.$$
Applying (8.5) yields
$$F_{n_k}(x - \varepsilon) \le u \le F_{n_k}(x + \varepsilon), \quad k \ge 1.$$
Send $k$ to infinity and use (8.4) and the choice of $\varepsilon$ to deduce
$$F_\infty(x - \varepsilon) \le u \le F_\infty(x + \varepsilon).$$
Then send $\varepsilon$ to zero to obtain
$$F_\infty(x-) \le u \le F_\infty(x). \tag{8.8}$$
Replace $x$ by
$$y = \limsup_{n \to \infty} F_n^{-1}(u)$$
in the above argument to obtain
$$F_\infty(y-) \le u \le F_\infty(y). \tag{8.9}$$
If $F_\infty^{-1}$ is continuous at $u$, we obtain from (8.6), (8.8), and (8.9) that
$$F_\infty^{-1}(u) = x = \liminf_{n \to \infty} F_n^{-1}(u),$$
$$F_\infty^{-1}(u) = y = \limsup_{n \to \infty} F_n^{-1}(u).$$
Thus (8.7) holds.
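The quantile coupling can be illustrated with distribution functions whose generalized inverse has a closed form. The sketch below rests on an assumption made only for illustration: $X_n$ is exponential with rate $1 + 1/n$ and $X_\infty$ is exponential with rate 1, so that $X_n$ tends to $X_\infty$ in distribution. The function names are hypothetical.

```python
import math

def exp_quantile(u, rate):
    """Generalized inverse of the exponential(rate) distribution function at u in [0, 1)."""
    return -math.log(1.0 - u) / rate

def quantile_coupling(u, n_values):
    """X_hat_n = F_n^{-1}(u) for each n in n_values, all driven by the same u."""
    return [exp_quantile(u, 1.0 + 1.0 / n) for n in n_values]
```

For a single value $u$ of the uniform variable, the path `quantile_coupling(u, [1, 10, 100, 1000])` approaches `exp_quantile(u, 1.0)`: the same $u$ drives every coordinate, and this is what turns convergence in distribution into pointwise convergence.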
8.5 What Has Been Achieved?
In Sections 8.2-8.4 we have established the following result.
Theorem 8.1. Let $X_1, \dots, X_\infty$ be random variables. Then
$$X_n \overset{D}{\to} X_\infty, \quad n \to \infty,$$
if and only if there exists a coupling $(\hat X_1, \dots, \hat X_\infty)$ of $X_1, \dots, X_\infty$ such that
$$\hat X_n \to \hat X_\infty, \quad n \to \infty.$$
We finally mention that the definition (8.4) of convergence in distribution
can be seen to be equivalent to
$$E[f(X_n)] \to E[f(X_\infty)], \quad n \to \infty,$$
for all bounded continuous functions $f$. This is taken to be the definition
of convergence in distribution for random elements in metric spaces. In
Chapter 3 (Section 10) we extend Theorem 8.1 to random elements in a
separable metric space.
9 Quantile Coupling - Dominated Convergence
The pointwise version of the dominated convergence theorem [see Ash (1972)]
states that if $X_1, X_2, \dots, X_\infty, X$ are random variables such that
$$|X_n| \le X, \quad 1 \le n < \infty,$$
$$E[X] < \infty \quad\text{and}\quad X_n \to X_\infty \text{ as } n \to \infty,$$
then
$$E[|X_\infty|] < \infty \quad\text{and}\quad E[X_n] \to E[X_\infty] \text{ as } n \to \infty. \tag{9.1}$$
Using the quantile coupling it is straightforward to extend this result to
the following distributional form.
Theorem 9.1. If $X_1, X_2, \dots, X_\infty, X$ are random variables such that
$$|X_n| \overset{D}{\le} X, \quad 1 \le n < \infty,$$
$$E[X] < \infty \quad\text{and}\quad X_n \overset{D}{\to} X_\infty \text{ as } n \to \infty,$$
then (9.1) holds.
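Proof. Under the assumptions of the theorem we have
$$X_n^+ \overset{D}{\le} X \quad\text{and}\quad X_n^+ \overset{D}{\to} X_\infty^+ \text{ as } n \to \infty,$$
$$X_n^- \overset{D}{\le} X \quad\text{and}\quad X_n^- \overset{D}{\to} X_\infty^- \text{ as } n \to \infty.$$
Apply the quantile coupling in Sections 3 and 8 to turn these distributional
relations into pointwise ones, that is, to obtain copies of $X_n^+$ and $X_n^-$ that
are pointwise dominated by a copy of $X$ (Section 3) and converge pointwise
to copies of $X_\infty^+$ and $X_\infty^-$, respectively (Section 8). By the pointwise version
of dominated convergence, this together with $E[X] < \infty$ implies
$$E[X_\infty^+] < \infty \quad\text{and}\quad E[X_n^+] \to E[X_\infty^+] \text{ as } n \to \infty,$$
$$E[X_\infty^-] < \infty \quad\text{and}\quad E[X_n^-] \to E[X_\infty^-] \text{ as } n \to \infty.$$
Thus $E[|X_\infty|] = E[X_\infty^+] + E[X_\infty^-] < \infty$ and
$$E[X_n] = E[X_n^+] - E[X_n^-] \to E[X_\infty^+] - E[X_\infty^-] = E[X_\infty] \text{ as } n \to \infty,$$
and the proof is complete. □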
In Chapter 2 we shall need the following extension to a continuous index.
Corollary 9.1. If $X_t$, $t \in [0, \infty]$, and $X$ are random variables such that
$$|X_t| \le X, \quad t \in [0, \infty),$$
$$E[X] < \infty \quad\text{and}\quad X_t \to X_\infty \text{ as } t \to \infty,$$
then
$$E[|X_\infty|] < \infty \quad\text{and}\quad E[X_t] \to E[X_\infty] \text{ as } t \to \infty.$$
Proof. A collection of real numbers like $E[X_t]$, $t \in [0, \infty)$, tends to a limit
if and only if it tends to this limit along all subsequences $E[X_{t(n)}]$, $n \ge 1$,
where the $t(n)$ increase to $\infty$ as $n \to \infty$. Apply Theorem 9.1 to $X_{t(n)}$ to
obtain $E[X_{t(n)}] \to E[X_\infty]$ as $n \to \infty$. □
10 Impossible Coupling - Quantum Physics
We end this first chapter on a rather different note: the coupling aspect of
a (the?) problem in quantum physics.
10.1 A Surprising Experimental Result
The following experiment has been carried out. Some material (calcium,
carefully excited by laser) sends off particles (photons) in pairs, one particle
to the left and the other to the right. Measuring devices are placed on each
side of this material with measurements made when particles pass through.
What is being measured is the so-called polarization of the particle, which
can be either 1 or —1 and depends on the angle in the plane orthogonal to
the direction of movement.
0. When the measuring devices are aligned to measure polarization in
the same direction, say 0°, the same measurement is always recorded
on both sides.
1. When the left device is tilted 30° and the right device is kept at the
initial 0° position, then the measurements agree 3/4 of the time.
2. When the left device is rotated back to its 0° position and the right
device is tilted -30° (that is, 30° in the opposite direction), then the
measurements also agree 3/4 of the time.
3. When the left device is again tilted 30° and the right device is kept
at its new -30° position (that is, the total relative rotation is 60°),
then the measurements agree 1/4 of the time.
10.2 Why Surprising?
On the basis of the above empirical facts it is now natural to build the
following model. Consider a particular pair of photons, and set
X = the polarization of the left particle in the 0° direction
= the polarization of the right particle in the 0° direction,
Y = the polarization of the left particle in the 30° direction,
Z = the polarization of the right particle in the -30° direction;
see Figure 10.1.
FIGURE 10.1. The experimental setups. Setup 0: measurements agree all the time. Setup 1: measurements agree 3/4 of the time. Setup 2: measurements agree 3/4 of the time. Setup 3: measurements agree 1/4 of the time.
Interpreting the relative frequencies as probabilities, we have
$$P(Y = X) = P(X = Z) = \tfrac{3}{4}, \tag{10.2}$$
and
$$P(Y = Z) = \tfrac{1}{4}. \tag{10.3}$$
By basic rules of probability,
$$P(Y = Z) \ge P(Y = Z, X = Z)$$
$$= P(Y = X, X = Z)$$
$$= P(Y = X) - P(Y = X, X \ne Z)$$
$$\ge P(Y = X) - P(X \ne Z)$$
$$= P(Y = X) + P(X = Z) - 1,$$
that is,
$$P(Y = Z) \ge P(Y = X) + P(X = Z) - 1. \tag{10.4}$$
Combine this, (10.2), and (10.3) to obtain the following contradiction:
$$\tfrac{1}{4} \ge \tfrac{3}{4} + \tfrac{3}{4} - 1 = \tfrac{1}{2}.$$
This contradiction is derived in an ordinary probabilistic way from
straightforward empirical facts: real life seems to contradict probability theory...
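The inequality (10.4) can also be confirmed by enumeration. Any joint distribution of three variables with values in $\{-1, 1\}$ is a mixture of the eight deterministic triples, so it suffices to check the corresponding inequality between indicators for each triple. The sketch below does this and then measures by how much the observed frequencies violate (10.4); the function names are hypothetical.

```python
from fractions import Fraction as Fr
from itertools import product

def inequality_holds_pointwise():
    """Check 1{y = z} >= 1{y = x} + 1{x = z} - 1 for every triple in {-1, 1}^3."""
    return all(
        int(y == z) >= int(y == x) + int(x == z) - 1
        for x, y, z in product((-1, 1), repeat=3)
    )

def violation(p_yx, p_xz, p_yz):
    """Amount by which given agreement probabilities violate (10.4); positive means violated."""
    return p_yx + p_xz - 1 - p_yz
```

Here `inequality_holds_pointwise()` returns `True`, whereas `violation(Fr(3, 4), Fr(3, 4), Fr(1, 4))` is `Fr(1, 4)`: no distribution on triples reproduces the three observed agreement frequencies.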
10.3 Predicted by Quantum Theory
Because of this apparent contradiction it is all the more annoying (for
probabilists) that the empirical results are in fact predicted by quantum
mechanics, which calculates the probabilities as follows:
$$P(Y = X) = P(X = Z) = \cos^2 30° = 1 - \sin^2 30° = 1 - (\tfrac{1}{2})^2 = \tfrac{3}{4},$$
$$P(Y = Z) = \cos^2 60° = (\tfrac{1}{2})^2 = \tfrac{1}{4}.$$
10.4 No Contradiction at the Level of Observation
Note that X, Y, and Z refer to polarization as intrinsic properties of the
particles, thought of as existing simultaneously without interaction with
the macro world (without being measured). If we instead stay at the level
of observation (measurement), then it turns out that the contradiction
disappears.
It is clear that we are dealing with three experimental setups (leaving
out the one with the measuring devices aligned).
First consider Setup 1: the case when the left device is tilted 30° and the
right device is kept at the initial 0° position. Put
$X_1$ = observed polarization of the right particle in the 0° direction,
$Y_1$ = observed polarization of the left particle in the 30° direction.
In addition to the measurements agreeing 3/4 of the time it has been recorded
that -1 and 1 are observed in equal proportions on both sides. Specify the
complete joint distribution of $X_1$ and $Y_1$ as
$$P(X_1 = -1, Y_1 = -1) = P(X_1 = 1, Y_1 = 1) = \tfrac{3}{8},$$
$$P(X_1 = -1, Y_1 = 1) = P(X_1 = 1, Y_1 = -1) = \tfrac{1}{8}.$$
This is in accordance with the relative frequencies since
$$P(Y_1 = -1) = P(X_1 = -1) = P(X_1 = -1, Y_1 = -1) + P(X_1 = -1, Y_1 = 1) = \tfrac{1}{2},$$
$$P(X_1 = Y_1) = P(X_1 = -1, Y_1 = -1) + P(X_1 = 1, Y_1 = 1) = \tfrac{3}{4}.$$
Now consider Setup 2: the case when the left device is at the 0° position
and the right device is tilted —30°. Put
X2 = observed polarization of the left particle in the 0° direction,
Z2 = observed polarization of the right particle in the —30° direction.
Letting $(X_2, Z_2)$ have the same distribution as $(X_1, Y_1)$ again yields
probabilities in accordance with the relative frequencies,
$$P(Z_2 = -1) = P(X_2 = -1) = \tfrac{1}{2} \quad\text{and}\quad P(X_2 = Z_2) = \tfrac{3}{4}.$$
Finally, consider Setup 3: the case when the left device is tilted 30° and
the right device -30°. Put
Y3 = observed polarization of the left particle in the 30° direction,
Z3 = observed polarization of the right particle in the —30° direction.
The measurements now agree only 1/4 of the time, but it has still been
recorded that -1 and 1 are observed in equal proportions on both sides.
Specify the complete joint distribution of $Y_3$ and $Z_3$ as
$$P(Y_3 = -1, Z_3 = -1) = P(Y_3 = 1, Z_3 = 1) = \tfrac{1}{8},$$
$$P(Y_3 = -1, Z_3 = 1) = P(Y_3 = 1, Z_3 = -1) = \tfrac{3}{8}.$$
This is in accordance with the relative frequencies,
$$P(Y_3 = -1) = P(Z_3 = -1) = \tfrac{1}{2} \quad\text{and}\quad P(Y_3 = Z_3) = \tfrac{1}{4}.$$
We have managed to account for all three experiments, and thus the
contradiction is not at the level of observation. The contradiction appears when
we assume that each particle has a polarization in a direction where we do
not make a measurement.
10.5 What Has This to Do with Coupling?
We have created three pairs $(X_1, Y_1)$, $(X_2, Z_2)$, and $(Y_3, Z_3)$. What we
proved in Section 10.2 is that there is no coupling of these pairs such that
the X-variables agree, the Y-variables agree, and the Z-variables agree.
More precisely, there is no jointly distributed triple $(X, Y, Z)$ such that
$$(X, Y) \overset{D}{=} (X_1, Y_1), \quad (X, Z) \overset{D}{=} (X_2, Z_2), \quad (Y, Z) \overset{D}{=} (Y_3, Z_3).$$
That is, although reality seems to be able to construct a coupling, we can't.
10.6 Does Probability Not Suffice in the Micro World?
It is one of the implications of quantum theory that polarization cannot
be measured simultaneously in all three directions; only one measurement
on each particle is possible. The reason we have measurements in pairs is
that we have two particles. The above contradiction further suggests that
polarization exists in the micro world only through interaction with the
macro world (only by being measured).
Is there then nothing, no reality, behind the observations? Or does
probability not suffice to describe it?
One school of thought claims that classical probability (that is,
Kolmogorov's axioms) is too narrow. It should be replaced by quantum
probability (an axiom system more general than Kolmogorov's) in a similar way
as Newton's theory had to be replaced by Einstein's in physics.
Applying quantum probability there is no longer a contradiction to be derived
from the assumption that polarization exists in all three directions. See
Kummerer and Maassen (1998) and Accardi (1998) for such viewpoints.
Note that there are finitely many possible outcomes in each individual
experiment, so the contradiction does not appear to have to do with
countable additivity. Since Kolmogorov's axioms otherwise reflect properties of
relative frequencies, it is hard to swallow that they should not apply. And
so it is not surprising that there are other attempts to get rid of the
contradiction. See Maudlin (1994) and Gill (1998, 1999) for the following point
of view.
Behind the attempt in Section 10.2 to create a model are several
implicit assumptions. One assumption is that measuring the polarization in a
particular direction does not affect the polarization in the other directions.
In other words, an interplay between the micro and macro worlds is not
allowed. Allowing a local interplay is not a serious crime against physical
ideas, but it turns out that a nonlocal interplay is needed to get rid of
the contradiction. Nonlocal means that the experimental setup on the left,
for instance, affects the polarization of a particle measured on the right.
This is not easy to accept, but for an Einsteinian realist this is easier to
accept than having to discard Kolmogorov's axioms, which is too close to
discarding 2 + 2 = 4.
10.7 What Does This Teach Us About Coupling?
The above excursion into the quantum experience shows that we have to
be careful when assuming existence of couplings. For empirical or
intuitive reasons joint distributions may appear to exist when they do not.
In Chapter 3 (Sections 3 through 5) we shall consider some safe methods
for constructing couplings. The next chapter, however, is devoted to the
classical triumphs of the coupling method.
Chapter 2
MARKOV CHAINS
AND RANDOM WALKS
1 Introduction
We now turn to the coupling of Markov chains in discrete and
continuous time, random walks, and renewal processes, the aim being to establish
asymptotic properties such as asymptotic stationarity. We start with the
earliest example, the classical coupling, which we present first in the
pleasant context of birth and death processes.
2 Classical Coupling - Birth and Death Processes
A continuous-time irreducible nonexplosive birth and death process is a
collection of random variables (stochastic process)
$$Z = (Z_s)_{s \in [0, \infty)}$$
taking values in the state space E = {0,1,...} and developing in time (as
the time parameter s increases) in such a way that Z changes state only
finitely often in finite time intervals (nonexplosion) and whenever Z visits
a state i, it stays there an exponential length of time (sojourn time) with
parameter depending only on i, and then jumps either one step up to i + 1
(a birth) or one step down to i — 1 (a death, this occurs only if i > 0)
with positive probabilities depending only on i. Irreducibility follows from
the positivity of the birth and death probabilities (irreducibility means
that each state is visited with positive probability starting from any other
state). Finally, we let the paths be right-continuous, that is, $Z_t = Z_{t+}$,
where $Z_{t+} = \lim_{s \downarrow t} Z_s$.
2.1 Notation
Let $\lambda$ be the distribution of $Z_0$, the initial distribution. Let $P_\lambda$ indicate this.
Let $P_j$ indicate that $Z$ starts in state $j$, that is, $Z_0 = j$. The semigroup of
transition matrices is
$$P^t = (P^t_{ij} : i, j \in E), \quad t \ge 0, \quad \text{[semigroup because } P^t P^s = P^{t+s}\text{]}$$
where $P^t_{ij}$ denotes the probability of going from $i$ to $j$ in a time interval of
length $t$,
$$P^t_{ij} = P(Z_{s+t} = j \mid Z_s = i), \quad s, t \ge 0, \quad i, j \in E.$$
If we treat the initial distribution $\lambda$ as a row vector, then the row vector
$\lambda P^t$ represents the distribution of $Z_t$,
$$P_\lambda(Z_t = j) = (\lambda P^t)_j, \quad t \ge 0, \quad j \in E.$$
2.2 The Classical Coupling
Let Z' be a differently started independent version of Z, that is, Z' is
independent of Z and has the same semigroup of transition matrices but
another initial distribution $\lambda'$, say. Let $T$ be the time when $Z$ and $Z'$ first
meet,
$$T = \inf\{t \ge 0 : Z_t = Z'_t\} \quad\text{(see Figure 2.1).}$$
Let $Z''$ be the process that follows the path of $Z'$ up to $T$ and then switches
to $Z$,
$$Z''_t = \begin{cases} Z'_t & \text{if } t < T, \\ Z_t & \text{if } t \ge T. \end{cases}$$
At time T the processes Z and Z' are in the same state and will continue
as if they both were starting anew in that state. Therefore, modifying Z'
by switching to Z at time T does not change its distribution, that is, Z" is
a copy of Z'. Thus (Z, Z") is a coupling of Z and Z', the classical coupling.
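The meeting time $T$ can be simulated directly. The following sketch assumes, purely for illustration, constant birth rate 1 and death rate 2 (so that the process is recurrent); these rates and all function names are hypothetical. The two independent processes are advanced jointly: by the memoryless property of the exponential sojourn times, the next jump of the pair occurs after an exponential time with the sum of the two leaving rates, and it belongs to each process with probability proportional to its rate.

```python
import random

BIRTH, DEATH = 1.0, 2.0   # hypothetical rates, chosen so that the process is recurrent

def jump_rate(i):
    """Total rate of leaving state i (no death is possible from state 0)."""
    return BIRTH + (DEATH if i > 0 else 0.0)

def step(rng, i):
    """State right after one jump from state i."""
    if i == 0 or rng.random() < BIRTH / (BIRTH + DEATH):
        return i + 1
    return i - 1

def classical_coupling(rng, z0, z0_prime):
    """Run Z and Z' independently until they have met and both have entered state 0.

    Returns (T, tau, tau_prime): the first meeting time and the first
    entrance times of Z and Z' into state 0.
    """
    t, z, zp = 0.0, z0, z0_prime
    T = 0.0 if z == zp else None
    tau = tau_prime = None
    while T is None or tau is None or tau_prime is None:
        rz, rzp = jump_rate(z), jump_rate(zp)
        t += rng.expovariate(rz + rzp)        # time of the next jump of the pair
        if rng.random() < rz / (rz + rzp):    # the jump belongs to Z ...
            z = step(rng, z)
            if z == 0 and tau is None:
                tau = t
        else:                                 # ... or to Z'
            zp = step(rng, zp)
            if zp == 0 and tau_prime is None:
                tau_prime = t
        if T is None and z == zp:
            T = t
    return T, tau, tau_prime
```

Because only one process jumps at a time and jumps have size one, the two paths cannot cross without meeting; in every simulated run started with $Z$ above $Z'$ the meeting time is therefore no later than the first entrance of $Z$ into state 0.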
2.3 The Coupling Time T - Asymptotic Loss of Memory
The time T when Z and Z" merge is called a coupling time or coupling
epoch. By definition (Section 4.1 in Chapter 1) the event
{T < t} = {Zt = Z't'}
FIGURE 2.1. The classical coupling in the birth and death case.
is a coupling event for the coupling $(Z_t, Z''_t)$ of $Z_t$ and $Z'_t$. The coupling
event inequality (5.11) in Chapter 1 yields the coupling time inequality
$$\|\lambda P^t - \lambda' P^t\| \le 2P(T > t), \quad t \ge 0. \tag{2.1}$$
The coupling is called successful if $P(T < \infty) = 1$. This implies asymptotic
loss of memory
$$P(T < \infty) = 1 \;\Rightarrow\; \|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty. \tag{2.2}$$
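For readers who want the step from the coupling event to the coupling time inequality spelled out, the following derivation is a sketch. It assumes the convention that the norm of a difference of two probability vectors $\mu$, $\nu$ on $E$ is $\sum_j |\mu_j - \nu_j| = 2 \sup_{A \subseteq E} |\mu(A) - \nu(A)|$; this is an assumption about the norm used in Chapter 1, not a quotation from it.

```latex
% For every subset A of E and every t >= 0:
\begin{align*}
P(Z_t \in A) - P(Z'_t \in A)
  &= P(Z_t \in A) - P(Z''_t \in A)
     && \text{($Z''$ is a copy of $Z'$)} \\
  &= P(Z_t \in A,\ T > t) - P(Z''_t \in A,\ T > t)
     && \text{($Z_t = Z''_t$ on $\{T \le t\}$)} \\
  &\le P(T > t).
\end{align*}
% Interchanging the roles of Z and Z'' gives the same bound for the
% absolute value; taking the supremum over A and multiplying by 2 yields
\[
  \|\lambda P^t - \lambda' P^t\| \le 2 P(T > t), \qquad t \ge 0 .
\]
```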
2.4 Recurrence of the Birth and Death Process implies T < oo
The process $Z$ is called recurrent if each state $j \in E$ is recurrent:
$$P_j(\tau_j < \infty) = 1,$$
where $\tau_j$ is the time of first visit to state $j$ (re-entrance if $Z$ starts in $j$):
$$\tau_j = \inf\{t > 0 : Z_{t-} \ne j,\ Z_t = j\}.$$
Recurrence implies, by irreducibility, that
$$P_\lambda(\tau_j < \infty) = 1 \quad \text{for all initial distributions } \lambda \text{ and all states } j,$$
since otherwise there would be states i and j such that Z could go from j
to i and never return to j, contradicting the recurrence of j.
By the birth and death property $Z$ and $Z'$ cannot pass each other without
meeting (since jumps cannot happen simultaneously due to the exponentiality
of the sojourn times in the individual states). Thus if $Z$ starts above
$Z'$, we have $T \le \tau_0$, while if $Z$ starts below $Z'$, we have $T \le \tau'_0$, that is,
$$T \le \tau_0 \vee \tau'_0 \quad \text{(see Figure 2.1)}. \tag{2.3}$$
If $Z$ is recurrent, this implies that $P(T < \infty) = 1$, and (2.2) yields that an
irreducible recurrent birth and death process forgets how it started: for all
initial distributions $\lambda$ and $\lambda'$,
$$Z \text{ recurrent} \;\Rightarrow\; \|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty. \tag{2.4}$$
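The bound (2.3) can be checked on simulated paths. The sketch below is only an illustration under stated assumptions: the birth and death rates are taken to be constant in the state (`birth_rate`, `death_rate`), the paths are cut off at a finite `horizon`, and all function names are hypothetical. It simulates two independent birth and death paths, finds the first time they are in the same state, and finds the first entrance of one path to state 0.

```python
import bisect
import random


def simulate_birth_death(start, birth_rate, death_rate, horizon, rng):
    """Jump times and states of one birth and death path on [0, horizon)."""
    t, state = 0.0, start
    times, states = [0.0], [start]
    while True:
        down = death_rate if state > 0 else 0.0   # no death in state 0
        t += rng.expovariate(birth_rate + down)   # exponential sojourn time
        if t >= horizon:
            return times, states
        # Jump up with probability birth_rate / (birth_rate + down).
        state += 1 if rng.random() * (birth_rate + down) < birth_rate else -1
        times.append(t)
        states.append(state)


def state_at(path, t):
    """State of a right-continuous piecewise constant path at time t."""
    times, states = path
    return states[bisect.bisect_right(times, t) - 1]


def first_meeting_time(path_a, path_b):
    """Classical coupling time T; it can only occur at time 0 or at a jump."""
    for t in sorted(set(path_a[0]) | set(path_b[0])):
        if state_at(path_a, t) == state_at(path_b, t):
            return t
    return None   # the paths did not meet before the horizon


def first_entrance_time(path, j):
    """First jump time at which the path enters state j (tau_j in the text)."""
    times, states = path
    for t, s in zip(times[1:], states[1:]):
        if s == j:
            return t
    return None
```

With $Z$ started above $Z'$, for example `z = simulate_birth_death(5, 1.0, 2.0, 200.0, rng)` and `z_prime = simulate_birth_death(2, 1.0, 2.0, 200.0, rng)`, the value `first_meeting_time(z, z_prime)` should never exceed `first_entrance_time(z, 0)`.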
2.5 Recurrence Implies Existence of a Stationary Vector
Let $Z$ be recurrent. Fix an arbitrary state $k$ and let $\nu_i$ be the expected
amount of time spent in $i$ between two entrances to $k$,
$$\nu_i = E_k\Big[\int_0^{\tau_k} 1_{\{Z_s = i\}}\, ds\Big], \quad i \in E. \tag{2.5}$$
Then the row vector $\nu$ with entries $\nu_i$, $i \in E$, is stationary:
$$\nu P^t = \nu, \quad t \ge 0. \tag{2.6}$$
This can be seen as follows. Note that
$$\nu_i = E_k\Big[\int_0^\infty 1_{\{Z_s = i,\ \tau_k > s\}}\, ds\Big] = \int_0^\infty P_k(Z_s = i,\ \tau_k > s)\, ds.$$
Note also that we can tell whether the event $\{\tau_k > s\}$ happens or not
by observing $Z$ only in the time interval $[0, s]$, and thus, conditionally on
$\{Z_s = i,\ \tau_k > s\}$, the process $Z$ starts anew in state $i$ at time $s$, that is,
$$P^t_{ij} = P_k(Z_{s+t} = j \mid Z_s = i,\ \tau_k > s).$$
These two observations yield
$$\nu_i P^t_{ij} = \int_0^\infty P_k(Z_{s+t} = j,\ Z_s = i,\ \tau_k > s)\, ds.$$
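Sum over $i$ to obtain the first equation in
$$\begin{aligned}
(\nu P^t)_j &= \int_0^\infty P_k(Z_{s+t} = j,\ \tau_k > s)\, ds = E_k\Big[\int_0^{\tau_k} 1_{\{Z_{s+t} = j\}}\, ds\Big] \\
&= E_k\Big[\int_t^{\tau_k + t} 1_{\{Z_s = j\}}\, ds\Big] \\
&= E_k\Big[\int_t^{\tau_k} 1_{\{Z_s = j\}}\, ds\Big] + E_k\Big[\int_{\tau_k}^{\tau_k + t} 1_{\{Z_s = j\}}\, ds\Big].
\end{aligned}$$
Since $Z$ starts anew in state $k$ at time $\tau_k$, the last term can be replaced by
$E_k[\int_0^t 1_{\{Z_s = j\}}\, ds]$. This yields the first equation in
$$(\nu P^t)_j = E_k\Big[\int_t^{\tau_k} 1_{\{Z_s = j\}}\, ds\Big] + E_k\Big[\int_0^t 1_{\{Z_s = j\}}\, ds\Big] = E_k\Big[\int_0^{\tau_k} 1_{\{Z_s = j\}}\, ds\Big] = \nu_j,$$
that is, (2.6) holds.
Note also that
$$\sum_{i \in E} \nu_i = E_k\Big[\int_0^{\tau_k} \sum_{i \in E} 1_{\{Z_s = i\}}\, ds\Big] = E_k\Big[\int_0^{\tau_k} 1\, ds\Big] = E_k[\tau_k]. \tag{2.7}$$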
2.6 Positive Recurrence Implies Asymptotic Stationarity
The process $Z$ is called positive recurrent if each state $j \in E$ is positive
recurrent:
$$m_j := E_j[\tau_j] < \infty.$$
In this case, due to (2.6) and (2.7), the row vector $\pi = \nu/m_k$ is a stationary
distribution for $Z$, that is,
$$\pi P^t = \pi, \quad t \ge 0, \quad \text{and} \quad \sum_{j \in E} \pi_j = 1.$$
Choose $Z'$ stationary to obtain that $Z$ is asymptotically stationary: putting
$\lambda' = \pi$ in (2.4) yields
$$\lambda P^t \xrightarrow{\,tv\,} \pi, \quad t \to \infty.$$
In other words (cf. Theorem 6.1 in Chapter 1), we have established the
following result: for all initial distributions $\lambda$ and all $j \in E$ it holds that
$$Z \text{ positive recurrent} \;\Rightarrow\; P(Z_t = j) \to \pi_j, \quad t \to \infty. \tag{2.8}$$
Remark 2.1. The proof of (2.8) works for all stationary distributions $\pi$.
Thus $\pi$ must be unique. In particular, $\pi$ does not depend on the arbitrary
fixed state $k$. For each $j \in E$, let $h_j$ be the expected sojourn time in $j$,
$$h_j = E_j[\inf\{t > 0 : Z_{t-} = j,\ Z_t \ne j\}], \tag{2.9}$$
and note that $\nu_k = h_k$. This yields (by taking $k = j$) that the unique
stationary distribution $\pi$ is given by
$$\pi_j = h_j/m_j, \quad j \in E.$$
2.7 Null Recurrence Implies Asymptotic 'Nullity'
A recurrent process $Z$ is called null recurrent if each state $j \in E$ is null
recurrent:
$$m_j = E_j[\tau_j] = \infty.$$
In this case (2.6) yields $\nu_i P^t_{ik} \le \nu_k$, and since $\nu_k = h_k < \infty$ ($h_k$ is the
expected value of an exponential random variable and thus $h_k < \infty$) and
$P^t_{ik} > 0$ for all $t > 0$ (some $t > 0$ suffices of course), we obtain
$$\nu_i < \infty, \quad i \in E. \tag{2.10}$$
Take a finite set of states $B$ and put $\lambda' = \nu(\cdot \mid B)$, that is,
$$\lambda'_j = \begin{cases} \nu_j / \sum_{i \in B} \nu_i, & j \in B, \\ 0, & j \notin B. \end{cases}$$
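Then $\lambda' \le \nu / \sum_{i \in B} \nu_i$ [entrywise], which implies $\lambda' P^t \le \nu P^t / \sum_{i \in B} \nu_i$
and thus by (2.6),
$$(\lambda' P^t)_j \le \nu_j \Big/ \sum_{i \in B} \nu_i, \quad j \in E. \tag{2.11}$$
This yields the second inequality in [$Z'$ has initial distribution $\lambda'$]
$$\begin{aligned}
P(Z_t = j) &\le |P(Z_t = j) - P(Z'_t = j)| + P(Z'_t = j) \\
&\le |P(Z_t = j) - P(Z'_t = j)| + \nu_j \Big/ \sum_{i \in B} \nu_i.
\end{aligned}$$
Apply (2.4) to obtain
$$\limsup_{t \to \infty} P(Z_t = j) \le \nu_j \Big/ \sum_{i \in B} \nu_i. \tag{2.12}$$
Sending $B \uparrow E$ yields $\sum_{i \in B} \nu_i \uparrow \sum_{i \in E} \nu_i = m_k = \infty$, and thus we have
established the following result: for all initial distributions $\lambda$ and all $j \in E$,
$$Z \text{ null recurrent} \;\Rightarrow\; P(Z_t = j) \to 0, \quad t \to \infty.$$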
Remark 2.2. In the null recurrent case no stationary distribution exists,
since if a stationary $\pi$ existed, then we could let it be the initial distribution
of $Z$ and obtain $\pi_j = P(Z_t = j) \to 0$ as $t \to \infty$, that is, $\pi_j = 0$ for all $j$,
which contradicts $\sum_{i \in E} \pi_i = 1$.
Remark 2.3. The result in Section 2.6 that $\lim_{t \to \infty} P(Z_t = j)$, $j \in E$, is
a stationary distribution was obtained under the condition $m_k < \infty$ for
some state $k$. The result in this subsection that $\lim_{t \to \infty} P(Z_t = j) = 0$,
$j \in E$, was obtained under the condition $m_k = \infty$ for some state $k$. Since
a stationary distribution cannot be identically 0, it follows that either all
states are positive recurrent or all states are null recurrent. That is, a
recurrent $Z$ is either positive recurrent or null recurrent.
3 Classical Coupling - Recurrent Markov Chains
The argument in the previous section went basically as follows. In order
to deduce asymptotic properties of the process Z, let a differently started
version Z' run independently of Z until it hits Z, at time T, say. At time T
switch from Z' to Z to obtain a copy Z" of Z' that sticks to Z from time
T onward. Establish that T is finite with probability one to deduce that Z
and Z' have the same asymptotic behaviour. In the positive recurrent case,
choose Z' stationary to obtain that Z is asymptotically stationary. In the
null recurrent case, choose $Z'$ such that $P(Z'_t = j)$ is close to 0 for all $t$ to
obtain that $P(Z_t = j)$ is close to 0 asymptotically.
FIGURE 3.1. Classical coupling of two discrete-time Markov chains.
This argument extends immediately to finite and countable state space
Markov chains in both discrete (see Figure 3.1) and continuous time, except
for the proof of the finiteness of T: the inequality (2.3) relies on the birth
and death structure. In fact, we cannot establish that T is finite in the null
recurrent case. We deal with these complications below, first in continuous
and then in discrete time.
3.1 Continuous Time — Preliminaries
A continuous-time stochastic process
$$Z = (Z_s)_{s \in [0, \infty)}$$
with a finite or countable state space $E$ is called a Markov jump process
(or a continuous-time Markov chain) if $Z$ is piecewise constant and right-continuous
(as a function of $s$) and satisfies the Markov property: the future
is independent of the past given the present, that is,
$$P(Z_{s+t} = j \mid Z_h,\ 0 \le h \le s;\ Z_s = i) = P^t_{ij}, \quad s, t \ge 0, \quad i, j \in E,$$
where $P^t_{ij}$ depends only on $t$, $i$, and $j$. The independence of $s$ is called
time-homogeneity. For convenience we assume time-homogeneity here to
be part of the Markov property.
Below we outline some properties of Markov jump processes needed here.
For formal proofs, see, for instance, Asmussen (1987).
A Markov jump process behaves as follows: it stays in a state i an
exponential length of time with parameter depending only on i and then jumps
to a new state with probabilities depending only on i (if the process is
nonexplosive, then this description is equivalent to the Markov property).
Thus a birth and death process is the special case with E the nonnegative
integers and with jumps of size one being the only possible jumps.
In addition to the terminology and facts from the birth and death case in
Section 2 we need the following. Assume that Z is irreducible, that is, for
each $i, j \in E$ there is a $t > 0$ such that $P^t_{ij} > 0$. Then $Z$ can go from $i$ to $j$
through a finite sequence of states $i = i_0, i_1, \dots, i_{n-1}, i_n = j$. Conditionally
on going through these states the sojourn times in $i_0, i_1, \dots, i_n$ are
distributed like independent exponential random variables, say $X_0, \dots, X_n$,
with parameters depending on these states. Since for all $t > 0$ we have
$P(X_0 + \dots + X_{n-1} < t < X_0 + \dots + X_n) > 0$, it follows that $Z$ can enter
the state $j$ before time $t$ and not leave it until after time $t$, that is, $Z$ can
be in $j$ at time $t$. Thus irreducibility implies that
$$P^t_{ij} > 0, \quad t > 0, \quad i, j \in E. \tag{3.1}$$
The process $Z$ is transient if each state $j \in E$ is transient:
$$P_j(\tau_j < \infty) < 1.$$
Transience implies (since each time $Z$ leaves the state $j$ it has the
probability $P_j(\tau_j = \infty) > 0$ of never entering $j$ again) that
$$P_\lambda(\kappa_j < \infty) = 1 \quad \text{for all initial distributions } \lambda \text{ and all } j \in E,$$
where (with $\sup \emptyset = 0$)
$$\kappa_j = \sup\{t \ge 0 : Z_t = j\} = \text{the time of the last exit from } j.$$
An irreducible Markov jump process is either recurrent or transient. This
can be seen as follows. Let Z start in some state i and suppose there exists
a recurrent state j. Then, by irreducibility, Z is sure to visit j (because
otherwise it could go from j to i and never return to j, contradicting
the recurrence of j). Each time Z leaves j it has a positive probability of
visiting i before returning to j (because otherwise it would always return
to j before visiting i and thus could not go from j to i, contradicting the
irreducibility of Z). Since, by irreducibility of Z and by recurrence of j,
Z leaves j infinitely often, it will eventually get back to i with probability
one, that is, i is also recurrent. This yields the desired result that if one
state is recurrent, then all the states are recurrent.
3.2 Continuous Time - The Theorem
We shall now establish the following result.
Theorem 3.1. Let Z be an irreducible recurrent Markov jump process with
a finite or countable state space E. Then, with k a fixed state, the row
vector v with entries defined at (2.5) is stationary and the entries are finite.
Further, Z is either positive recurrent or null recurrent.
If Z is positive recurrent, then the classical coupling (defined in
Section 2.2) is successful, the row vector
$$\pi = (h_j/m_j : j \in E) \qquad [h_j \text{ is the expected sojourn time in } j, \text{ see (2.9)}]$$
is a unique stationary distribution, and for all initial distributions $\lambda$ and
all $j \in E$,
$$P_\lambda(Z_t = j) \to \pi_j, \quad t \to \infty. \tag{3.2}$$
If $Z$ is null recurrent, then no stationary distribution exists, and for all
initial distributions $\lambda$ and all $j \in E$,
$$P_\lambda(Z_t = j) \to 0, \quad t \to \infty. \tag{3.3}$$
Proof. For the stationarity of $\nu$, see Section 2.5, and for the finiteness
of $\nu_i$, see (2.10). In order to establish the rest of the theorem, let $Z'$ be a
differently started independent version of $Z$. Then the bivariate process
$$(Z_s, Z'_s)_{s \in [0, \infty)}$$
is a Markov jump process with state space $E^2$ and transition probabilities
$$P^t_{(i,i')(j,j')} = P^t_{ij} P^t_{i'j'}, \quad t \ge 0, \quad i, i', j, j' \in E.$$
These are strictly positive for $t > 0$ due to (3.1), that is, $(Z_s, Z'_s)_{s \in [0, \infty)}$ is also
irreducible. Thus $(Z_s, Z'_s)_{s \in [0, \infty)}$ is either recurrent or transient. We treat
these two cases separately.
Case 1: suppose $(Z_s, Z'_s)_{s \in [0, \infty)}$ is recurrent. Let $T$ be the classical
coupling time. Instead of (2.3) use that for an arbitrary $j$,
$$T \le \tau_{(j,j)} = \text{the time of the first visit of } (Z_s, Z'_s)_{s \in [0, \infty)} \text{ to } (j, j).$$
Since $(Z_s, Z'_s)_{s \in [0, \infty)}$ is recurrent, this implies $P(T < \infty) = 1$ for all initial
distributions $\lambda$ and $\lambda'$, and the same argument as in Section 2 yields the
desired results.
Case 2: suppose $(Z_s, Z'_s)_{s \in [0, \infty)}$ is transient. Then for an arbitrary $j$,
$$\kappa_{(j,j)} = \text{the time of the last exit of } (Z_s, Z'_s)_{s \in [0, \infty)} \text{ from } (j, j)$$
is finite with probability one. Letting $Z'$ have the same initial distribution
as $Z$ yields
$$P(Z_t = j)^2 = P((Z_t, Z'_t) = (j, j)) \le P(\kappa_{(j,j)} > t),$$
and thus for all initial distributions $\lambda$ and all $j \in E$,
$$P(Z_t = j) \to 0, \quad t \to \infty. \tag{3.4}$$
It follows from this that $Z$ cannot have a positive recurrent state $k$ because
if it had, then we could choose the initial distribution of $Z$ to be the
stationary $\pi = \nu/m_k$ to obtain from (3.4) that $\pi_j = P(Z_t = j) \to 0$ as $t \to \infty$,
that is, $\pi_j = 0$ for all $j$, which contradicts $\sum_{i \in E} \pi_i = 1$. Thus $Z$ is null
recurrent, and (3.4) is the desired limit result. The nonexistence of a
stationary distribution follows from Remark 2.2 at the end of Section 2.7. □
3.3 Discrete Time — Preliminaries
A discrete-time stochastic process
$$Z = (Z_k)_0^\infty$$
with a finite or countable state space $E$ is called a Markov chain in discrete
time if
$$P(Z_{n+1} = j \mid Z_0, \dots, Z_{n-1};\ Z_n = i) = P_{ij}, \quad n \ge 0, \quad i, j \in E,$$
where $P_{ij}$ depends only on $i$ and $j$. As in the continuous-time case we
assume here that time-homogeneity (the fact that $P_{ij}$ does not depend
on $n$) is part of the Markov property. The matrix $P^n$ of $n$-step transition
probabilities is the $n$th power of $P = (P_{ij} : i, j \in E)$.
We use notation and terminology from the continuous-time case with the
following modification (to conform with standard practice even if it is not
formally needed): let $\tau_j$ be the time of first visit to state $j$ (revisit if $Z$
starts in $j$), that is,
$$\tau_j = \inf\{n > 0 : Z_n = j\}.$$
Again the same argument as in Section 2.5 yields $\nu P = \nu$, where $\nu$ is the
row vector with entries ($k$ is a fixed state)
$$\nu_i = E_k\Big[\sum_{n=0}^{\tau_k - 1} 1_{\{Z_n = i\}}\Big] \quad (\text{note that now } \nu_k = 1). \tag{3.5}$$
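The stationarity of the vector in (3.5) can be checked numerically on a small chain. The sketch below assumes a finite irreducible transition matrix given as a list of rows, and it reads (3.5) as the expected number of visits to $i$ strictly before the first return to $k$, counting time 0. It uses the discrete analogue of the identity from Section 2.5, $\nu_i = \sum_{n \ge 0} P_k(Z_n = i, \tau_k > n)$, truncated after `n_terms` terms; the function name and the truncation level are choices of this sketch.

```python
def excursion_occupation(P, k, n_terms=2000):
    """Approximate nu_i = sum over n >= 0 of P_k(Z_n = i, tau_k > n)."""
    n = len(P)
    # taboo[i] = P_k(Z_n = i, tau_k > n); for n = 0 this is the point mass at k.
    taboo = [1.0 if i == k else 0.0 for i in range(n)]
    nu = [0.0] * n
    for _ in range(n_terms):
        nu = [a + b for a, b in zip(nu, taboo)]
        nxt = [sum(taboo[l] * P[l][i] for l in range(n)) for i in range(n)]
        nxt[k] = 0.0   # paths that have returned to k no longer count
        taboo = nxt
    return nu


def times_matrix(v, P):
    """Row vector v times the matrix P."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
```

For the three-state chain `P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]` and `k = 0`, the result `nu = excursion_occupation(P, 0)` can be compared with `times_matrix(nu, P)`; dividing `nu` by `sum(nu)` gives a candidate stationary distribution.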
We can use the same facts as in the continuous-time case except (3.1).
Irreducibility means that $P^n_{ij} > 0$ for some $n > 0$, and this does not imply
that it holds for all $n > 0$. In particular, periodicity can pop up in discrete
time. Therefore, we need the following property. The Markov chain $Z$ is
called aperiodic if each state $j \in E$ is aperiodic:
$$\gcd\{n \ge 1 : P^n_{jj} > 0\} = 1;$$
here gcd denotes greatest common divisor, the largest integer that is a
factor of all the integers in the set.
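For a finite chain the period of a state can be computed directly from this definition. The following sketch takes the gcd of the return times found within a finite horizon. The default horizon of three times the number of states is an assumption of this sketch, chosen so that for an irreducible chain every simple cycle contributes two return times whose difference is its length; the function name is hypothetical.

```python
from math import gcd


def period(P, j, horizon=None):
    """gcd of the return times n <= horizon with positive n-step probability."""
    n_states = len(P)
    if horizon is None:
        horizon = 3 * n_states
    reachable = {j}   # states reachable from j in exactly `step` steps
    g = 0             # gcd of the return times seen so far (gcd(0, n) = n)
    for step in range(1, horizon + 1):
        reachable = {b for a in reachable
                     for b in range(n_states) if P[a][b] > 0}
        if j in reachable:
            g = gcd(g, step)
    return g
```

A state `j` is aperiodic exactly when `period(P, j)` equals 1; a return value of 0 means that no return was found within the horizon.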
3.4 Discrete Time — The Theorem
We shall now establish the following result.
Theorem 3.2. Let Z be an irreducible recurrent Markov chain in discrete
time with a finite or countable state space E. Then, with k a fixed state, the
row vector v with entries defined at (3.5) is stationary and the entries are
finite. Further, Z is either positive recurrent or null recurrent, and either
is aperiodic or has no aperiodic state.
If $Z$ is aperiodic and positive recurrent, then the classical coupling
(defined in Section 2.2) is successful, the row vector
$$\pi = (1/m_j : j \in E)$$
is a unique stationary distribution, and for all initial distributions $\lambda$ and
all $j \in E$,
$$P_\lambda(Z_n = j) \to \pi_j, \quad n \to \infty.$$
If $Z$ is aperiodic and null recurrent, then no stationary distribution exists,
and for all initial distributions $\lambda$ and all $j \in E$,
$$P_\lambda(Z_n = j) \to 0, \quad n \to \infty.$$
Proof. Suppose $Z$ has an aperiodic state $h$. Then, by Lemma 3.1(b)
below, there is an integer $n$ such that $P^n_{hh} > 0$ and $P^{n+1}_{hh} > 0$ [take
$B = \{k \ge 0 : P^k_{hh} > 0\}$]. For all $i \in E$ and all integers $l$, $k$, and $m \ge 0$,
we have $P^{l+k+m}_{ii} \ge P^l_{ih} P^k_{hh} P^m_{hi}$. By irreducibility, we can find $l$ and $m$ such
that $P^l_{ih} > 0$ and $P^m_{hi} > 0$. Thus $P^{l+n+m}_{ii} > 0$ and $P^{l+n+1+m}_{ii} > 0$; since these
two return times are consecutive integers, $i$ is aperiodic. Thus
either all states are aperiodic or no state is aperiodic.
The rest of the proof is the same as in the continuous-time case (see
the proof of Theorem 3.1), except that the irreducibility of the bivariate
Markov chain $(Z_k, Z'_k)_0^\infty$ is harder to establish, since we cannot rely on
(3.1). We have only that $P^n_{ij} > 0$ for some $n > 0$, not necessarily all.
In order to establish that $(Z_k, Z'_k)_0^\infty$ is irreducible, let $h \in E$ be fixed and
$i, j, i', j' \in E$ arbitrary. We must show that there is an $n$ such that both
$P^n_{ij} > 0$ and $P^n_{i'j'} > 0$. We have, for all integers $l, m, l', m' \ge 0$ and for all
$n \ge (l + m) \vee (l' + m')$,
$$P^n_{ij} \ge P^l_{ih} P^{n-l-m}_{hh} P^m_{hj} \quad \text{and} \quad P^n_{i'j'} \ge P^{l'}_{i'h} P^{n-l'-m'}_{hh} P^{m'}_{hj'}.$$
Use the irreducibility of $Z$ to find $l, m, l', m'$ such that
$$P^l_{ih} > 0, \quad P^m_{hj} > 0, \quad P^{l'}_{i'h} > 0, \quad P^{m'}_{hj'} > 0.$$
According to Lemma 3.1(b) below, if we take $n$ large enough, then both
$P^{n-l-m}_{hh} > 0$ and $P^{n-l'-m'}_{hh} > 0$ [take $B = \{k \ge 0 : P^k_{hh} > 0\}$]. □
Lemma 3.1. (a) Let $A$ be a subset of $\mathbb{Z}$ that is aperiodic [$\gcd A = 1$],
additive [$a, b \in A$ implies $a + b \in A$], and closed under negation [$a \in A \Rightarrow
-a \in A$]. Then $A = \mathbb{Z}$.
(b) Let $B$ be a subset of the nonnegative integers that is aperiodic and
additive. Then there is an integer $n_B$ such that $n \in B$ for all $n \ge n_B$.
PROOF. (a) Since $A$ is closed under addition and negation, it follows that
$A$ contains $d\mathbb{Z}$, where $d = \min\{k \ge 1 : k \in A\}$. For each $b \in A$ there is
a $k \in \mathbb{Z}$ such that $0 \le b - kd < d$, and due to the definition of $d$ we have
$b - kd = 0$. Thus $A = d\mathbb{Z}$, and the aperiodicity of $A$ yields $d = 1$.
(b) It is no restriction to assume that $0 \in B$. The set
$$A := \{k \in \mathbb{Z} : \text{there is an } n_k \in B \text{ such that } n_k + k \in B\}$$
is aperiodic [since $B$ is so, take $n_k = 0$ to see that $B$ is a subset of $A$],
additive [take $n_{a+b} = n_a + n_b$], and closed under negation [take $n_{-a} =
n_a + a$]. Thus $A = \mathbb{Z}$, due to (a). Thus $1 \in A$. We shall show that $n_B = n_1^2$
does the trick. Each $n \ge n_1^2$ can be written in the form
$$n = n_1^2 + m n_1 + k, \quad \text{where } m \ge 0 \text{ and } 0 \le k < n_1.$$
Thus $n = (n_1 + m - k) n_1 + k(n_1 + 1)$, which lies in $B$ [since $B$ is additive,
since both $n_1$ and $n_1 + 1$ are in $B$, and since both $n_1 + m - k$ and $k$ are
nonnegative]. □
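Part (b) of the lemma is easy to explore numerically. The sketch below is an illustration only: it builds, up to a finite `limit`, the smallest additive set that contains 0 and a given list of positive integer generators, and then reads off the point from which every integer in the table belongs to the set. The function names and the use of generators are assumptions of this sketch.

```python
def additive_closure(generators, limit):
    """member[n] is True iff n <= limit is a sum of generators (0 included)."""
    member = [False] * (limit + 1)
    member[0] = True
    for n in range(1, limit + 1):
        # n belongs to the set iff n - g does for some generator g.
        member[n] = any(g <= n and member[n - g] for g in generators)
    return member


def threshold(member):
    """Smallest n0 such that every n with n0 <= n <= limit is a member."""
    n0 = len(member)
    while n0 > 0 and member[n0 - 1]:
        n0 -= 1
    return n0


def consecutive_pair(member):
    """Smallest n1 with both n1 and n1 + 1 in the set, or None if not found."""
    for n in range(len(member) - 1):
        if member[n] and member[n + 1]:
            return n
    return None
```

For the generators 3 and 5 the set misses only 1, 2, 4, and 7. The first consecutive pair in the set is 5 and 6, so the bound $n_1^2 = 25$ from the proof is valid but far from sharp; the proof only needs some threshold, not the smallest one.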
3.5 Comment on the Strong Markov Property
A countable state space Markov jump process $Z$ satisfies the strong Markov
property at hitting times: for a hitting time $\tau = \inf\{t \ge 0 : Z_t \in A\}$
of a subset $A$ of $E$ it holds that, conditionally on $\tau < \infty$, the process
$(Z_{\tau+s})_{s \in [0, \infty)}$ is a version of $Z$ and is conditionally independent of $(Z_s)_{s \in [0, \tau]}$
given $Z_\tau$. In other words, at the time $\tau$ the process $Z$ starts anew in state
$Z_\tau$ independently of how it got there. The same comment applies to Markov
chains in discrete time.
Thus, when proving in Section 2.2 that $Z''$ is a copy of $Z'$, we were
actually using the strong Markov property of the bivariate process $(Z_s, Z'_s)_{s \in [0, \infty)}$
at
$$T = \text{hitting time of the diagonal } \{(j, j) : j \in E\} \text{ of } E^2.$$
In fact, the strong Markov property holds at stopping times, that is,
random times $\tau$ such that for each $t \ge 0$ the event $\{\tau \le t\}$ is determined
(measurably) by $(Z_s)_{s \in [0, t]}$ [see, for instance, Theorem 3.1 in Chapter 1 of
Asmussen (1987)]. A Markov process with this property is called a strong
Markov process.
4 Classical Coupling - Rates and Uniformity
Have another look at the coupling time inequality (2.1)
$$\|\lambda P^t - \lambda' P^t\| \le 2P(T > t). \tag{4.1}$$
If we knew not only that T is finite but also how fast P(T > t) goes to
zero, then we would obtain a rate result for the convergence of the Markov
process. Also, if T is stochastically dominated by a finite random variable
with distribution that does not depend on the family $P^t$, $t \ge 0$, as long
as it lies in some fixed class of transition matrices, then we would obtain
uniform convergence over that class.
In this section we shall take a closer look at the classical coupling time
T in two simple cases.
4.1 Birth and Death Processes
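Let $Z$ be the irreducible nonexplosive recurrent birth and death process
considered in Section 2. There we proved for the classical coupling time $T$
that
$$T \le \tau_0 \vee \tau'_0.$$
If we know, for instance, that $\tau_0$ and $\tau'_0$ have finite $\alpha$-moments for some
$\alpha > 0$, that is,
$$E[\tau_0^\alpha] < \infty \quad \text{and} \quad E[(\tau'_0)^\alpha] < \infty,$$
then $E[T^\alpha] \le E[\tau_0^\alpha] + E[(\tau'_0)^\alpha] < \infty$ and thus
$$t^\alpha P(T > t) \le E[T^\alpha 1_{\{T > t\}}] \to 0, \quad t \to \infty,$$
which together with (4.1) yields the following rate result: for $\alpha > 0$,
$$E[\tau_0^\alpha],\ E[(\tau'_0)^\alpha] < \infty \;\Rightarrow\; t^\alpha \|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty.$$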
It is worth noting that for this rate result we needed only recurrence, not
positive recurrence as in the generalization to Markov jump processes
mentioned in Section 4.3 below.
4.2 Finite State Space — Doeblin's Argument
The classical coupling was introduced by Doeblin in 1938 in order to
establish asymptotic stationarity of a regular discrete-time finite-state Markov
chain $Z = (Z_k)_0^\infty$. Regularity means that there is an integer $m$ and an
$\varepsilon > 0$ such that
$$P^m_{ij} \ge \varepsilon, \quad i, j \in E. \tag{4.2}$$
Doeblin argued along the following lines. Let a differently started version
$Z'$ run independently of $Z$ until the two chains meet, at a time $T$, say.
From $T$ onward let the two chains run together. Regularity implies that if
$Z$ has not met $Z'$ up to time $mk$, then it will meet $Z'$ before time $mk + m$
with probability no less than $\varepsilon$. Thus
$$P(T > km) \le (1 - \varepsilon)^k \to 0, \quad k \to \infty. \tag{4.3}$$
Thus the chains eventually coincide and we obtain
$$|P(Z_n = j) - P(Z'_n = j)| \le P(Z_n \ne Z'_n) \to 0, \quad n \to \infty.$$
Add to this the observation that $\max_{i \in E} P^n_{ij}$ is nonincreasing and that
$\min_{i \in E} P^n_{ij}$ is nondecreasing in $n$ to deduce that $P(Z_n = j)$ has a limit.
(Nowadays one usually takes $Z'$ stationary, which makes the last sentence
unnecessary.)
Doeblin's argument in fact yields that $P(Z_n = j)$ goes to the limit $\pi_j$ at
a geometric rate: from (4.3) and (4.1) with $\lambda' = \pi$ (and with $t$ replaced by $n$)
we get
$$\|\lambda P^n - \pi\| \le 2(1 - \varepsilon)^{[n/m]} \quad (\text{here } [x] = \sup\{n \in \mathbb{Z} : n \le x\}),$$
which yields the following geometric rate result:
$$\forall \rho < (1 - \varepsilon)^{-1/m} : \quad \rho^n \|\lambda P^n - \pi\| \to 0, \quad n \to \infty.$$
Note also that if we let $\mathcal{P}_{m,\varepsilon}$ denote the class of transition matrices $P$
satisfying (4.2) with $m$ and $\varepsilon$ fixed, then we obtain the following uniform
convergence result:
$$\sup_{P \in \mathcal{P}_{m,\varepsilon}} \|\lambda P^n - \pi\| \to 0, \quad n \to \infty.$$
The rate result also holds uniformly:
$$\forall \rho < (1 - \varepsilon)^{-1/m} : \quad \rho^n \sup_{P \in \mathcal{P}_{m,\varepsilon}} \|\lambda P^n - \pi\| \to 0, \quad n \to \infty.$$
Remark 4.1. Regularity is equivalent to irreducibility and aperiodicity.
This can be seen as follows. Regularity obviously implies irreducibility and
also that
$$P^m_{jj} > 0 \quad \text{and} \quad P^{m+1}_{jj} = \sum_{i \in E} P_{ji} P^m_{ij} > 0, \quad j \in E,$$
which implies aperiodicity. Conversely, irreducibility and aperiodicity
together with $P^m_{ij} \ge P^l_{ij} P^{m-l}_{jj}$ and Lemma 3.1(b) yield that $P^m_{ij} > 0$ for all
$m$ large enough, which, together with finiteness, implies regularity.
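The geometric bound from Doeblin's argument can be compared with the actual distance to stationarity on a small example. The sketch below assumes a finite regular transition matrix given as a list of rows and takes the norm to be the sum of absolute differences of the entries. It computes $\varepsilon$ as the smallest entry of $P^m$, approximates $\pi$ by iterating the chain from the uniform distribution, and tabulates both sides of the bound; all function names are hypothetical.

```python
def times_matrix(v, P):
    """Row vector v times the matrix P (one step of the chain)."""
    n = len(P)
    return [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]


def matrix_power(P, m):
    """The m-step transition matrix P^m, built row by row."""
    n = len(P)
    rows = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(m):
        rows = [times_matrix(row, P) for row in rows]
    return rows


def doeblin_table(P, m, lam, n_max, burn_in=2000):
    """Return eps, pi and rows (n, distance to pi, Doeblin bound)."""
    eps = min(min(row) for row in matrix_power(P, m))
    pi = [1.0 / len(P)] * len(P)
    for _ in range(burn_in):          # power iteration towards pi
        pi = times_matrix(pi, P)
    table, dist = [], list(lam)
    for n in range(n_max + 1):
        distance = sum(abs(a - b) for a, b in zip(dist, pi))
        table.append((n, distance, 2.0 * (1.0 - eps) ** (n // m)))
        dist = times_matrix(dist, P)
    return eps, pi, table
```

For `P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]` the matrix itself has zero entries, but $P^2$ does not, so `m = 2` is a valid choice in (4.2). In this example the distance decays much faster than the bound, which is expected: the coupling argument gives a guarantee that holds for every matrix in the class, not the sharp rate of a particular one.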
4.3 Comment on the Countable Markov Case
In the irreducible positive recurrent Markov jump case (and discrete-time
aperiodic Markov chain case) the following holds for the classical coupling
time $T$. Let $j$ be an arbitrary fixed state and let $Z$ and $Z'$ have initial
distributions $\lambda$ and $\lambda'$, respectively. If $\alpha > 0$ and
$$E[\tau_j^\alpha] < \infty, \quad E[(\tau'_j)^\alpha] < \infty, \quad E_j[\tau_j^\alpha] < \infty,$$
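then $E[T^\alpha] < \infty$, which yields (as above)
$$t^\alpha \|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty.$$
Moreover, if
$$\exists \rho > 1 : \quad E[\rho^{\tau_j}] < \infty, \quad E[\rho^{\tau'_j}] < \infty, \quad E_j[\rho^{\tau_j}] < \infty,$$
then there is a $\rho > 1$ such that $E[\rho^T] < \infty$, which yields (as above)
$$\rho^t \|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty.$$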
This and more elaborate rate and uniformity results are established in
Chapter 10 (Section 7.5). See also Section 5 in Chapter 4.
4.4 Comment on Diffusions
A diffusion on $[0, \infty)$ is a continuous-time Markov process with state space
$[0, \infty)$ and continuous as a function of time. Two diffusions $Z$, $Z'$ have
to meet before both have hit 0. Thus $T \le \tau_0 \vee \tau'_0$ holds, where $T$ is the
classical coupling time and $\tau_0$, $\tau'_0$ the hitting times of 0. The limit results for
birth and death processes were based on this inequality, and thus analogous
results hold for diffusions.
5 Ornstein Coupling — Random Walk on the Integers
The classical coupling need not be successful in the null recurrent and
transient cases. Integer-valued random walks are either null recurrent or
transient (see Remark 5.1 below). In this section we construct a coupling of
such walks that is always successful, provided that the step-lengths satisfy
a certain aperiodicity condition.
5.1 The Walk
Let $X_1, X_2, \dots$ be i.i.d. finite integer-valued random variables that are
independent of the finite integer-valued random variable $S_0$. Put
$$S_k = S_0 + X_1 + \dots + X_k, \quad 0 \le k < \infty.$$
Then $S = (S_k)_0^\infty$ is called a random walk on the integers with step-lengths
$X_1, X_2, \dots$ and initial position $S_0$. We further assume that the step-lengths
are strongly aperiodic: there is an $h$ such that
$$P(X_1 = h) > 0 \quad \text{and} \quad \gcd\{n \in \mathbb{Z} : P(X_1 - h = n) > 0\} = 1.$$
Note that strong aperiodicity implies that the step-lengths are aperiodic,
that is,
$$\gcd\{n \in \mathbb{Z} : P(X_1 = n) > 0\} = 1.$$
The converse is not true, however. The step-lengths can be aperiodic
without being strongly aperiodic. For instance, step-lengths with distribution
$P(X_1 = 1) = P(X_1 = -1) = \frac{1}{2}$ are aperiodic but not strongly aperiodic.
Note in this example that the difference $X_1 - X'_1$ of two i.i.d. step-lengths
$X_1$ and $X'_1$ will take the values 2, 0, and $-2$, that is, the difference is not
aperiodic. This is the reason we assume strong aperiodicity: we shall need
step-length aperiodicity for the difference of two walks.
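Both conditions depend only on the support of the step-length distribution, so they are easy to test for a finite support. The sketch below is an illustration with hypothetical function names; it takes the support as a collection of integers and checks aperiodicity, strong aperiodicity, and the support of the difference of two i.i.d. steps.

```python
from functools import reduce
from math import gcd


def gcd_of(values):
    """gcd of a collection of integers; gcd(0, n) = |n|, empty gives 0."""
    return reduce(gcd, (abs(v) for v in values), 0)


def is_aperiodic(support):
    """gcd{n : P(X_1 = n) > 0} = 1."""
    return gcd_of(support) == 1


def is_strongly_aperiodic(support):
    """Some h in the support makes the shifted support X_1 - h aperiodic."""
    return any(gcd_of(n - h for n in support) == 1 for h in support)


def difference_support(support):
    """Support of X_1 - X'_1 for two i.i.d. step-lengths."""
    return {a - b for a in support for b in support}
```

For the support `{-1, 1}` of the example above, `is_aperiodic` holds while `is_strongly_aperiodic` fails, and `difference_support` returns `{-2, 0, 2}`. Adding the value 0 to the support (a lazy walk) makes the steps strongly aperiodic.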
Remark 5.1. Clearly, $S$ is a Markov chain with state space $\mathbb{Z}$. When $S$
is irreducible and aperiodic, it will be either transient or null recurrent. It
cannot be positive recurrent because if it were, then Theorem 3.2 would
yield that $\pi = (1/m_j : j \in \mathbb{Z})$ is a stationary distribution, but the expected
recurrence times $m_j$ are obviously all identical, say $m_j = a$, and positive
recurrence would imply $\sum_j \pi_j = \sum_j 1/a = \infty$, which contradicts $\sum_j \pi_j = 1$.
5.2 Ornstein Coupling
Let $S'$ be a differently started independent version of $S$, that is, $S'$ is a
random walk on the integers,
$$S'_k = S'_0 + X'_1 + \dots + X'_k, \quad 0 \le k < \infty,$$
that is independent of $S$ and has the same step-length distribution. Let
$h$ be as above. Since $X_1 - h$ is aperiodic, there is a constant $c$ such that
$X_1 - h$ is aperiodic on $\{|X_1 - h| \le c\}$, that is, we can take $c$ large enough
so that
$$\gcd\{n \in \mathbb{Z} : P(X_1 - h = n,\ |X_1 - h| \le c) > 0\} = 1. \tag{5.1}$$
Put $S''_0 = S'_0$ and for $k \ge 1$,
$$X''_k = \begin{cases} X'_k & \text{if } |X_k - X'_k| \le c, \\ X_k & \text{if } |X_k - X'_k| > c. \end{cases}$$
By symmetry [since $(X_k, X'_k) \overset{D}{=} (X'_k, X_k)$]
$$P(X_k = n,\ |X_k - X'_k| > c) = P(X'_k = n,\ |X_k - X'_k| > c),$$
which yields the second equality in
$$\begin{aligned}
P(X''_k = n) &= P(X'_k = n,\ |X_k - X'_k| \le c) + P(X_k = n,\ |X_k - X'_k| > c) \\
&= P(X'_k = n), \quad n \in \mathbb{Z}.
\end{aligned}$$
Thus the step-length distribution of the random walk $S''$,
$$S''_k = S''_0 + X''_1 + \dots + X''_k, \quad 0 \le k < \infty,$$
is the same as that of $S'$. Since the initial positions are the same, $S''$ is a
copy of $S'$.
The random walk $R = (R_k)_0^\infty$ defined by
$$R_k = S_k - S''_k, \quad 0 \le k < \infty,$$
has step-lengths $X_k - X''_k$, $k \ge 1$, which are symmetric and bounded:
$$X_1 - X''_1 \overset{D}{=} X''_1 - X_1 \quad \text{and} \quad |X_1 - X''_1| \le c.$$
Further, with $h$ as above and $n \in \mathbb{Z}$, we have
$$\begin{aligned}
P(X_1 - X''_1 = n) &\ge P(X_1 - X'_1 = n,\ |X_1 - X'_1| \le c) \\
&\ge P(X_1 - h = n,\ |X_1 - h| \le c,\ X'_1 = h) \\
&= P(X_1 - h = n,\ |X_1 - h| \le c)\, P(X'_1 = h),
\end{aligned}$$
which together with (5.1) and $P(X'_1 = h) > 0$ implies that $R$ has aperiodic
step-lengths, that is,
$$\gcd\{n \in \mathbb{Z} : P(X_1 - X''_1 = n) > 0\} = 1.$$
Such a random walk is irreducible and recurrent (see next subsection), and
thus
$$K = \inf\{k \ge 0 : S_k = S''_k\} = \text{the time of the first visit of } R \text{ to } 0$$
is finite with probability one. Let $S'''$ be the copy of $S'$ that sticks to the
path of $S''$ up to time $K$ and then switches to $S$. Then $(S, S''')$ is a coupling
of $S$ and $S'$ with coupling time $K$. This together with the coupling time
inequality
$$\|P(S_k \in \cdot) - P(S'_k \in \cdot)\| \le 2P(K > k)$$
yields the following result.
Theorem 5.1. Let $S$ be an integer-valued random walk with strongly
aperiodic step-lengths. Let $S'$ be a differently started version of $S$. Then there
exists a successful coupling of $S$ and $S'$, and
$$\|P(S_k \in \cdot) - P(S'_k \in \cdot)\| \to 0, \quad k \to \infty.$$
The above coupling of random walks was introduced by Ornstein in 1968
and is named after its inventor.
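The construction can be sketched in a few lines of code. The example below is an illustration under stated assumptions: the step-length law is given as a finite dictionary from integer values to exact probabilities, and the function names are hypothetical. The first function is the rule defining the modified step, the second computes the exact law of the modified step by enumerating all pairs, so that the claim that it has the same law as an unmodified step can be checked without simulation, and the third simulates the walk $S$ and its modified partner until they meet or a step budget is exhausted.

```python
import random
from collections import defaultdict
from fractions import Fraction


def modified_step(x, x_prime, c):
    """Copy X'_k when the two steps are within c of each other, else copy X_k."""
    return x_prime if abs(x - x_prime) <= c else x


def law_of_modified_step(step_law, c):
    """Exact law of the modified step when X_k, X'_k are i.i.d. with step_law."""
    law = defaultdict(Fraction)
    for x, p in step_law.items():
        for x_prime, p_prime in step_law.items():
            law[modified_step(x, x_prime, c)] += p * p_prime
    return dict(law)


def ornstein_meeting_time(step_law, s0, s0_prime, c, max_steps, rng):
    """First k <= max_steps at which the two walks agree, or None."""
    values = list(step_law)
    weights = [float(p) for p in step_law.values()]
    s, s_modified = s0, s0_prime
    for k in range(max_steps + 1):
        if s == s_modified:
            return k
        x, x_prime = rng.choices(values, weights, k=2)   # independent steps
        s += x
        s_modified += modified_step(x, x_prime, c)
    return None   # no meeting within the step budget
```

A `None` result does not contradict the theorem. The difference walk is recurrent, but its return time to 0 can have a heavy tail, so any finite step budget will occasionally be exhausted.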
5.3 The Difference Walk R is Irreducible and Recurrent
Here is an elementary argument showing that an integer-valued random
walk $R = (R_k)_0^\infty$ with symmetric bounded aperiodic step-lengths is
irreducible and recurrent (as a Markov chain).
Put $R^0 = (R^0_k)_0^\infty$, where $R^0_k = R_k - R_0$. Clearly, the set
$$A = \{n \in \mathbb{Z} : P(R^0_k = n) > 0 \text{ for some } k\}$$
is additive. Due to the step-length symmetry, $A$ is closed under negation.
Due to the step-length aperiodicity, $A$ is aperiodic. Thus $A$ coincides with
the integers [see Lemma 3.1(a)]. Thus the Markov chain $R$ is irreducible.
In order to establish recurrence, we shall first show that
$$P\Big(\sup_{0 \le k < \infty} |R_k| = \infty\Big) = 1. \tag{5.2}$$
Fix an $r > 0$ and an $n > 0$ so large that $p := P(R^0_n \in [-2r, 2r]) < 1$. Then,
for $1 \le k < \infty$,
$$P(R^0_n \in [-r, r],\ R^0_{2n} \in [-r, r],\ \dots,\ R^0_{kn} \in [-r, r]) \le p^k.$$
Send $k \to \infty$ to obtain $P(\sup_{0 < k < \infty} |R^0_{kn}| \le r) = 0$. Since $r$ is arbitrary,
this yields $P(\sup_{0 \le k < \infty} |R^0_k| = \infty) = 1$. This implies (5.2).
Next, note that by step-length symmetry,
$$P\Big(\sup_{0 \le k < \infty} R_k = \infty\Big) = P\Big(\inf_{0 \le k < \infty} R_k = -\infty\Big). \tag{5.3}$$
Put $M_n = \sup_{0 \le k \le n} R_k$ and $M_\infty = \sup_{0 \le k < \infty} R_k$. Since $\{M_\infty = \infty\} =
\{\sup_{0 \le k < \infty} (R_{n+k} - R_n) = \infty\}$, it follows by the step-length independence
that the events $\{M_\infty = \infty\}$ and $\{M_n > x\}$ are independent and thus
$$P(M_n > x,\ M_\infty = \infty) = P(M_n > x)\, P(M_\infty = \infty).$$
Send $n \to \infty$ to obtain $P(M_\infty = \infty) = P(M_\infty > x)\, P(M_\infty = \infty)$ and then
$x \to \infty$ to obtain
$$P(M_\infty = \infty) = P(M_\infty = \infty)^2.$$
Thus $P(\sup_{0 \le k < \infty} R_k = \infty) = 0$ or 1 (example of Kolmogorov's 0-1 law).
This, together with (5.3) and (5.2), shows that
$$P\Big(\sup_{0 \le k < \infty} R_k = \infty\Big) = 1 \quad \text{and} \quad P\Big(\inf_{0 \le k < \infty} R_k = -\infty\Big) = 1.$$
Thus $R$ changes sign infinitely often. Due to the step-length boundedness,
there is a constant $c$ such that $R$ cannot change sign without visiting
$\{0, 1, \dots, c - 1\}$. Thus $R$ visits a finite set of states infinitely often. Thus,
due to irreducibility, the Markov chain $R$ is recurrent.
Remark 5.2. Note that the proof of the infinite number of visits to $[0, c)$
does not depend on $R$ having integer-valued step-lengths. Thus we have
established the following result. Let $R$ be a random walk on $\mathbb{R}$ with symmetric
step-lengths that are nonzero with positive probability and bounded by a
constant $c > 0$ with probability one. Then $R$ visits $[0, c)$ with probability
one.
5.4 Comment on Nonidentically Distributed Step-Lengths
The Ornstein coupling can be modified to apply to step-lengths $X_1, X_2, \dots$
that are independent but not identically distributed and not necessarily
integer valued. Assume there is an integer-valued strongly aperiodic random
variable $V$ and a $p > 0$ such that for each $k \ge 1$,
$$P(X_k = n) \ge p\, P(V = n), \quad n \in \mathbb{Z}.$$
Let (this is no restriction, see Section 5 in Chapter 3) $I_1, I_2, \dots$ be i.i.d. 0-1
variables such that the pairs $(X_1, I_1), (X_2, I_2), \dots$ are independent and, for
each $k \ge 1$,
$$P(X_k = n,\ I_k = 1) = p\, P(V = n), \quad n \in \mathbb{Z}.$$
Let $V_1, V_2, \dots$ be i.i.d. copies of $V$, let $V_1, V_2, \dots$ be independent of $(X_1, I_1)$,
$(X_2, I_2), \dots$, and put, for $k \ge 1$,
$$X'_k = \begin{cases} V_k & \text{if } I_k = 1, \\ X_k & \text{if } I_k = 0. \end{cases}$$
Let $S_0$ and $S'_0$ be independent, integer valued, and independent of $(X_1, I_1)$,
$(X_2, I_2), \dots$, and $V_1, V_2, \dots$. Put, for $k \ge 1$,
$$S_k = S_0 + X_1 + \dots + X_k \quad \text{and} \quad S'_k = S'_0 + X'_1 + \dots + X'_k.$$
It is no restriction to assume that $V$ is bounded. Then
$$R_k = S_k - S'_k, \quad 0 \le k < \infty,$$
forms an integer-valued random walk with symmetric bounded aperiodic
step-lengths. Thus the time $K$ of the first visit of this random walk to 0
is finite, and we have established a successful coupling of the differently
started versions $S$ and $S'$.
Remark 5.3. Another coupling that works for step-lengths that are
independent, integer valued, but not necessarily identically distributed is the
Mineka coupling; see Lindvall (1992), Section 14 in Chapter 2.
6 Ornstein Coupling - Recurrent Markov Chains
The Ornstein argument went as follows: in order to make two random walks
meet, start with independent walks and then change them by letting the
step-lengths coincide when the step-length difference is large. We shall now
apply this trick in the recurrent Markov case to the random walk formed
by times of visits to a fixed state: let the excursions of the process between
two visits coincide when the difference of the excursion-lengths is large.
This yields a coupling that is successful even when the recurrence is null.
Trivial modifications are needed: in continuous time to have the random
walk integer valued, in discrete time to have the random walk strongly
aperiodic.
6.1 Continuous Time
Let $Z = (Z_s)_{s \in [0, \infty)}$ be an irreducible recurrent countable state space
Markov jump process. Fix a state $j$ and let $S_n$ be the time of the $(n + 1)$st
visit to $j$ at integer time, that is,
$$S_0 = \inf\{k \in \mathbb{Z}_+ : Z_k = j\},$$
and recursively for $n \ge 0$,
$$S_{n+1} = \inf\{k \in \mathbb{Z}_+ : k > S_n \text{ and } Z_k = j\}.$$
That $P(S_n < \infty) = 1$, $n \ge 0$, can be seen as follows. By recurrence the state
$j$ is entered infinitely often (with probability one). Since at each entrance to
$j$ the process starts anew, the sojourn times in state $j$ form an i.i.d. sequence
of exponential random variables, and thus (with probability one) there are
infinitely many sojourn times greater than 1. Each sojourn interval that is
greater than 1 contains an integer, and thus (with probability one) there
are infinitely many integers $k$ such that $Z_k = j$. Thus $P(S_n < \infty) = 1$,
$n \ge 0$.
Clearly, $S = (S_n)_0^\infty$ is a random walk [since $Z$ starts anew from the state
$j$ at the times $S_n$] on the integers, and the step-lengths $X_k$, $k \ge 1$, are
strongly aperiodic because, due to (3.1),
$$P(X_1 = 1) = P_j(Z_1 = j) > 0,$$
$$P(X_1 = 2) \ge P_j(Z_1 = i)\, P_i(Z_1 = j) > 0, \quad i \ne j.$$
Let $Z'$ be a differently started independent version of $Z$ and define $S'$ in
the analogous way. Then $S'$ is a differently started independent version
of $S$, and due to the strong aperiodicity, the Ornstein construction in the
previous section yields a successful coupling of $S$ and $S'$. But this does
not suffice; we need a successful coupling of $Z$ and $Z'$. For that purpose
we introduce the concepts of delay and cycles: the increasing sequence of
random times $S$ splits $Z$ into the delay
$$D = (Z_t)_{0 \le t < S_0} \quad \text{(see Figure 6.1)}$$
and the sequence of cycles (excursions, blocks)
$$C_k = (Z_{S_{k-1}+t})_{0 \le t < X_k}, \quad 1 \le k < \infty.$$
FIGURE 6.1. The sequence S splits Z into a delay and cycles.
Since $Z$ starts anew from state $j$ at the times $S_n$, it follows that the cycles
are i.i.d. and independent of the delay. Note that $Z$ is uniquely determined
by the delay and cycles, and in particular, we obtain $S_0$ as the length of
$D$, and $X_k$ as the length of $C_k$. In the same way $S'$ splits $Z'$ into a delay
$D'$ and cycles $C'_k$ that are copies of $C_1$.
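Define $Z''$ by mimicking the definition of $S''$ in the previous section: let
$Z''$ be the process with delay $D'' = D'$ and cycles $C''_n$, $1 \le n < \infty$, defined
by
$$C''_n = \begin{cases} C'_n & \text{if } |X_n - X'_n| \le c, \\ C_n & \text{if } |X_n - X'_n| > c, \end{cases}$$
with $c$ the constant fixed in the Ornstein construction of the previous section.
Again by symmetry $C''_n$ is a copy of $C'_n$ [just as in the previous section $X''_n$
was a copy of $X'_n$]. Since the pairs $(C_n, C'_n)$, $1 \le n < \infty$, form an i.i.d.
sequence that is independent of $D'$, the cycles $C''_n$, $1 \le n < \infty$, of $Z''$ form
an i.i.d. sequence that is independent of $D'' = D'$. It follows that $Z''$ is a
copy of $Z'$.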
Let $S''$ be the sequence of integer times at which $Z''$ is in the fixed state
$j$. Put $K = \inf\{n \ge 0 : S_n = S''_n\}$. Since (as in Section 5.2) $R_n = S_n - S''_n$,
$0 \le n < \infty$, forms an integer-valued random walk with symmetric bounded
aperiodic step-lengths, it follows (see Section 5.3) that $K$ is finite with
probability one, and thus so is
$$T = S_K = S''_K.$$
Note that the pairs $(C_n, C''_n)$, $1 \le n < \infty$, form an i.i.d. sequence that
is independent of the pair $(D, D'')$. For each $k \ge 0$, the event $\{K = k\}$
is determined by $(D, D'', C_1, C''_1, \dots, C_k, C''_k)$ and thus is independent of
$(C_{k+n}, C''_{k+n})$, $1 \le n < \infty$. It follows that the cycles
$$C_{K+1}, C_{K+2}, \dots \quad \text{and} \quad C''_{K+1}, C''_{K+2}, \dots$$
are i.i.d. copies of $C_1$ and independent of
$$(D, C_1, \dots, C_K) \quad \text{and} \quad (D'', C''_1, \dots, C''_K).$$
Thus both $Z$ and $Z''$ start anew at time $T$ from the state $j$ independently
of how they got there. Let $Z'''$ be the process that sticks to the path of $Z''$
up to time $T$ and then switches to $Z$, that is, $Z'''$ has delay $D''' = D''$ and
cycles
$$C'''_n = \begin{cases} C''_n & \text{if } n \le K, \\ C_n & \text{if } n > K. \end{cases}$$
Then $Z'''$ is a copy of $Z''$ (and thus of $Z'$), since both $Z$ and $Z''$ start anew
at time $T$ independently of how they got there. Thus $(Z, Z''')$ is a coupling
of $Z$ and $Z'$ with coupling time $T$, and since $T$ is finite, we have obtained
the following result (apply the coupling time inequality (2.1) for the latter
statement).
Theorem 6.1. There exists a successful coupling of two differently started
versions of an irreducible recurrent continuous-time countable state space
Markov jump process. Moreover, with $P^t$, $t \ge 0$, denoting the semigroup of
transition matrices,
$$\|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty,$$
for all initial distributions $\lambda$ and $\lambda'$.
This result enables us to improve the convergence result in the null
recurrent case (Theorem 3.1).
Corollary 6.1. With $k$ a fixed state let $\nu$ be the row vector with entries
defined at (2.5). Let $c$ be a finite constant. In the null recurrent case,
$$P(Z_t \in A) \to 0, \quad t \to \infty,$$
uniformly in subsets $A$ of $E$ satisfying $\sum_{i \in A} \nu_i \le c$.
Proof. As in Section 2.7 let $\lambda'$ be the stationary vector $\nu$ conditioned on
a finite set of states $B$. Use (2.11) to obtain the second inequality in
$$P(Z_t \in A) \le \sum_{i \in A} (\lambda' P^t)_i + \|\lambda P^t - \lambda' P^t\| \le \sum_{i \in A} \nu_i \Big/ \sum_{i \in B} \nu_i + \|\lambda P^t - \lambda' P^t\|.$$
Thus, with $\sup_{\nu A \le c}$ denoting supremum in $A \subseteq E$ satisfying $\sum_{i \in A} \nu_i \le c$,
$$\sup_{\nu A \le c} P(Z_t \in A) \le c \Big/ \sum_{i \in B} \nu_i + \|\lambda P^t - \lambda' P^t\|$$
$$\to c \Big/ \sum_{i \in B} \nu_i, \quad t \to \infty, \quad \text{(by Theorem 6.1)}$$
$$\to 0, \quad B \uparrow E,$$
since $\sum_{i \in B} \nu_i \to \sum_{i \in E} \nu_i = m_k = \infty$. □
6.2 Discrete Time
Let $Z = (Z_k)_0^\infty$ be an irreducible aperiodic recurrent discrete-time
countable state space Markov chain. If the random walk formed by the visits to a
fixed state $j$ has strongly aperiodic step-lengths, then the argument in the
continuous-time case works as it stands. If this is not the case, we proceed
as follows.
Let $I_0, I_1, \dots$ be independent 0-1 variables such that $P(I_k = 1) = \tfrac{1}{2}$ for
$k \ge 0$. Then $(Z_k, I_k)_0^\infty$ is an irreducible aperiodic recurrent countable state
space Markov chain. Let $j$ be a fixed state and $\tau_j(n)$ be the time of the $n$th
(re)visit of $Z$ to $j$. Clearly, the set
$$B = \{k \ge 1 : P_j(\tau_j(n) = k) > 0 \text{ for some } n\}$$
is aperiodic and additive. Thus by Lemma 3.1(b) there are integers $n$, $n'$
and $h$ such that both $P_j(\tau_j(n) = h) > 0$ and $P_j(\tau_j(n') = h + 1) > 0$. This
and
$$P_{(j,1)}(\tau_{(j,1)} = k) \ge 2^{-n} P_j(\tau_j(n) = k), \quad k \ge 1, \ n \ge 1, \tag{6.1}$$
yields the strong aperiodicity of the time $\tau_{(j,1)}$ between two $(Z_k, I_k)_0^\infty$ visits
to $(j, 1)$. Put
$$S_0 = \inf\{k \in \mathbb{Z}_+ : (Z_k, I_k) = (j, 1)\} = \tau_{(j,1)}(0)$$
and recursively, for $n \ge 0$,
$$S_{n+1} = \inf\{k \in \mathbb{Z}_+ : k > S_n \text{ and } (Z_k, I_k) = (j, 1)\} = \tau_{(j,1)}(n + 1).$$
Then [repeating the argument starting with the second paragraph of the
previous subsection with $Z$ replaced by $(Z_k, I_k)_0^\infty$] there is a successful
coupling $((Z_k, I_k)_0^\infty, (Z'''_k, I'''_k)_0^\infty)$ of $(Z_k, I_k)_0^\infty$ and $(Z'_k, I'_k)_0^\infty$, where $(Z'_k, I'_k)_0^\infty$
is any differently delayed version of $(Z_k, I_k)_0^\infty$. Then $(Z, Z''')$ is a successful
coupling of $Z$ and $Z'$, and we obtain a discrete-time version of Theorem 6.1.
Theorem 6.2. There exists a successful coupling of two differently started
versions of an irreducible aperiodic recurrent discrete-time countable state
space Markov chain. Moreover, with $P$ denoting the one-step transition
matrix,
$$\|\lambda P^n - \lambda' P^n\| \to 0, \quad n \to \infty,$$
for all initial distributions $\lambda$ and $\lambda'$.
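The convergence statement can be watched numerically on a small example. The following is only an illustrative sketch: the 3-state transition matrix is a hypothetical choice (irreducible, and aperiodic because of the self-loop at state 0), and the norm is computed as the sum of absolute differences of the two row vectors.

```python
def step(dist, P):
    """Return the row vector dist * P for a transition matrix P (list of rows)."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def tv_norm(a, b):
    """Norm ||a - b|| computed as the sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


# Hypothetical 3-state chain: 0 -> 1 -> 2 -> 0 is possible (irreducible),
# and P[0][0] > 0 makes it aperiodic.
P = [
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

lam = [1.0, 0.0, 0.0]        # version started in state 0
lam_prime = [0.0, 0.0, 1.0]  # version started in state 2

norms = []                   # norms[n] = ||lam P^n - lam' P^n||
for n in range(61):
    norms.append(tv_norm(lam, lam_prime))
    lam, lam_prime = step(lam, P), step(lam_prime, P)

print(norms[0], norms[10], norms[60])
```

A finite chain like this one is positive recurrent, so it only illustrates the statement of the theorem; the null recurrent case needs an infinite state space.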
As in the continuous-time case this result enables us to improve the
convergence result in the null recurrent case (Theorem 3.2). The proof is the
same as that of Corollary 6.1.
Corollary 6.2. With $k$ a fixed state let $\nu$ be the row vector with entries
defined at (3.5). Let $c$ be a finite constant. In the null recurrent case,
$$P(Z_n \in A) \to 0, \quad n \to \infty,$$
uniformly in subsets $A$ of $E$ satisfying $\sum_{i \in A} \nu_i \le c$.
Remark 6.1. A coupling of discrete-time Markov chains that has not been
mentioned here is the Vasershtein coupling (see Lindvall (1992)). It is
obtained by maximally coupling (as in Section 4 of Chapter 1), at each
step, the transitions of two differently started versions $Z$ and $Z'$ of a
discrete-time Markov chain. This should not be confused with Griffeath's maximal
coupling, which attains equality at all times in the coupling time inequality:
$$\|P(Z_n \in \cdot) - P(Z'_n \in \cdot)\| = 2P(T > n), \quad n \ge 0;$$
see Theorem 6.1(a) in Chapter 3 and the observation at (3.1) in Chapter 6.
6.3 Aperiodicity Versus Strong Aperiodicity — Shift-Coupling
Although the 0-1 variable trick in Section 6.2 results in a successful coupling
of the Markov chains, it does not result in a successful coupling of the
random walks $(\tau_j(n))_{n=0}^\infty$ and $(\tau'''_j(n))_{n=0}^\infty$ formed by successive visits to the
fixed state $j$; the two walks will not merge in the end. We only obtain that
there are two finite random times $M$ and $M'''$ [namely the random integers
$M$ and $M'''$ such that $\tau_j(M) = \tau'''_j(M''') = T = \tau_{(j,1)}(K) = \tau'''_{(j,1)}(K)$] such
that
$$\tau_j(M + k) = \tau'''_j(M''' + k), \quad k \ge 0,$$
that is, the random walks $(\tau_j(n))_{n=0}^\infty$ and $(\tau'''_j(n))_{n=0}^\infty$ merge in the end only
modulo the random time shift $M - M'''$.
Applying the 0-1 variable trick in the Ornstein coupling of integer-valued
random walks $S$ and $S'$ in Section 5 allows us to replace strong aperiodicity
of the step-lengths by the weaker condition that the step-lengths are only
aperiodic. This results in a coupling such that the walks merge only up to
a random time shift (see Section 7.5 for a formal statement).
Such a coupling is called a successful shift-coupling. It does not yield the
limit result $\|P(S_k \in \cdot) - P(S'_k \in \cdot)\| \to 0$ as $k \to \infty$ but only the weaker
time-average (Cesàro) result
$$\Big\| n^{-1} \sum_{k=0}^{n-1} P(S_k \in \cdot) - n^{-1} \sum_{k=0}^{n-1} P(S'_k \in \cdot) \Big\| \to 0, \quad n \to \infty.$$
We establish this and many more results about shift-coupling in Chapters 5
and 6.
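The gap between the two modes of convergence can be seen in a toy case. The sketch below is an assumption-laden illustration, not an example from the text: it uses the deterministic two-state flip chain 0 → 1 → 0 → ... (a periodic Markov chain rather than a random walk). Two versions started in 0 and 1 coincide after a time shift of one step, yet never coincide at the same time.

```python
def dist_at(n, start):
    """Distribution at time n of the deterministic flip chain started in `start`."""
    state = (start + n) % 2
    return [1.0 if s == state else 0.0 for s in (0, 1)]


def tv_norm(a, b):
    """Norm ||a - b|| computed as the sum of absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cesaro(n, start):
    """Average of the distributions at times 0, ..., n-1."""
    avg = [0.0, 0.0]
    for k in range(n):
        d = dist_at(k, start)
        avg = [avg[0] + d[0] / n, avg[1] + d[1] / n]
    return avg


n = 1001
plain = tv_norm(dist_at(n, 0), dist_at(n, 1))   # does not tend to 0
averaged = tv_norm(cesaro(n, 0), cesaro(n, 1))  # of order 1/n

print(plain, averaged)
```

The plain norm stays at its maximal value for every n, while the time-averaged norm is at most 2/n.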
7 Epsilon-Coupling - Nonlattice Random Walk
We shall now apply the Ornstein approach and the 0-1 variable trick from
Section 6.2 to random walks with step-lengths that do not take values
exclusively in the integers or in any other lattice $d\mathbb{Z}$, $d > 0$. We will only be
able to make the random walks come $\varepsilon$-close and stay $\varepsilon$-close from there
on; they will not merge. In fact, they will not even come $\varepsilon$-close at the same
time; they come $\varepsilon$-close only modulo a random time shift (cf. Section 6.3
above and Section 7.7 below). In spite of these limitations we can use
this coupling to prove Blackwell's renewal theorem (Section 8) and other
important renewal results (Section 10).
Although the arguments are now becoming familiar, we once more go
through the details.
7.1 Nonlattice Random Walks
A random variable $X$, and its distribution function $F$, is nonlattice if
$$\forall d > 0 : \ P(X \in d\mathbb{Z}) < 1.$$
Note that a discrete random variable can be nonlattice (for instance, $X$
can have 1 and $\sqrt{2}$ as its only values). Say that $X$ can be close to a point
$x$, and that $x$ is a point of increase of $F$, if
$$\forall \delta > 0 : \ P(X \in [x - \delta, x + \delta]) > 0 \quad [\text{that is, } F(x - \delta) < F(x + \delta)].$$
Thus $X$ is nonlattice if and only if the points of increase of $F$ are not all
contained in the same lattice.
Let S and S' be differently started independent versions of a random
walk (see Section 5.1) with nonlattice step-lengths.
7.2 Merge When Difference of Geometric Sums Is Not e-Small
Let $I_1, I_2, \dots, I'_1, I'_2, \dots$ be i.i.d. 0-1 variables that are independent of $S$
and $S'$ and such that $P(I_1 = 1) = \tfrac{1}{2}$. Put $K_0 = 0$ and recursively for $n \ge 1$
$$K_n = \inf\{k > K_{n-1} : I_k = 1\}.$$
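The $K_n$ split the step-lengths into cycles
$$C_n = (X_{K_{n-1}+1}, \dots, X_{K_n}), \quad n \ge 1,$$
that are i.i.d. and independent of the initial position $S_0$. Let $Y_n$ be the sum
over the cycle $C_n$,
$$Y_n = X_{K_{n-1}+1} + \dots + X_{K_n}, \quad n \ge 1.$$
Define $C'_n$ and $Y'_n$ in the analogous way. Fix an $\varepsilon > 0$ and define a new
sequence of cycles $C''_n$, $n \ge 1$, as follows:
$$C''_n = \begin{cases} C'_n & \text{if } |Y_n - Y'_n| \le \varepsilon, \\ C_n & \text{if } |Y_n - Y'_n| > \varepsilon. \end{cases}$$
By symmetry [since $(C_n, C'_n) \overset{D}{=} (C'_n, C_n)$]
$$P(C_n \in B, |Y_n - Y'_n| \le \varepsilon) = P(C'_n \in B, |Y_n - Y'_n| \le \varepsilon)$$
for sets $B$ of cycle values. This yields the second equality in
$$P(C''_n \in B) = P(C'_n \in B, |Y_n - Y'_n| \le \varepsilon) + P(C_n \in B, |Y_n - Y'_n| > \varepsilon)$$
$$= P(C_n \in B), \quad n \ge 1.$$
Thus
$$C''_n \overset{D}{=} C_n, \quad n \ge 1.$$
Also, note that the pairs of cycles $(C_n, C''_n)$, $n \ge 1$, form an i.i.d. sequence
that is independent of $(S_0, S'_0)$. Let $Y''_n$ be the sum over the cycle $C''_n$, that
is,
$$Y''_n = \begin{cases} Y'_n & \text{if } |Y_n - Y'_n| \le \varepsilon, \\ Y_n & \text{if } |Y_n - Y'_n| > \varepsilon. \end{cases}$$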
7.3 The Difference Walk Has Nontrivial Step-Lengths
Let $R = (R_n)_0^\infty$ be the random walk with initial position $S_0 - S'_0$ and
step-lengths $Y_n - Y''_n$, $n \ge 1$. These are symmetric and bounded by $\varepsilon$:
$$Y_1 - Y''_1 \overset{D}{=} Y''_1 - Y_1 \quad \text{and} \quad |Y_1 - Y''_1| \le \varepsilon.$$
We shall prove that $P(Y_1 - Y''_1 \ne 0) > 0$ [recall that $\varepsilon > 0$ is arbitrary].
Put
$$A = \{x \in \mathbb{R} : Y_1 - Y'_1 \text{ can be close to } x\}.$$
Firstly, $A$ is nonlattice, since $X_1$ is nonlattice and if $X_1$ can be close to $x$,
then $x$ is in $A$: for all $\delta > 0$,
$$P(Y_1 - Y'_1 \in [x - \delta, x + \delta])$$
$$\ge 2^{-3} P(X_1 + X_2 - X'_1 \in [x - \delta, x + \delta])$$
$$\ge 2^{-3} P(X_1 \in [x - \delta/3, x + \delta/3])^3 > 0.$$
Secondly, $A$ is additive, since $Y_1 - Y'_1$ can be close to $x$ if and only if there
are $k$ and $k'$ such that $X_1 + \dots + X_k - X'_1 - \dots - X'_{k'}$ can be close to $x$
and since for all $\delta > 0$,
$$P(X_1 + \dots + X_{k+n} - X'_1 - \dots - X'_{k'+n'} \in [x + y - \delta, x + y + \delta])$$
$$\ge P(X_1 + \dots + X_k - X'_1 - \dots - X'_{k'} \in [x - \delta/2, x + \delta/2])$$
$$\quad \times P(X_1 + \dots + X_n - X'_1 - \dots - X'_{n'} \in [y - \delta/2, y + \delta/2]).$$
Thirdly, if $x \in A$, then $-x \in A$, since $Y_1 - Y'_1$ is symmetric. Fourthly, if
$x_k \in A$ and $x_k \to x$ as $k \to \infty$, then $x \in A$, since for each $\delta > 0$ there is
a $k$ such that $|x_k - x| < \delta/2$ and thus, with $F$ the distribution function of
$Y_1 - Y'_1$,
$$F(x - \delta) \le F(x_k - \delta/2) < F(x_k + \delta/2) \le F(x + \delta).$$
The only subset of $\mathbb{R}$ with these four properties is $\mathbb{R}$ itself (see Lemma 7.1
below). Thus $Y_1 - Y'_1$ can be close to $\varepsilon/2$. Thus $Y_1 - Y''_1$ can be close to $\varepsilon/2$.
Thus $P(Y_1 - Y''_1 \ne 0) > 0$.
7.4 The Epsilon-Coupling
A random walk like $R$ with step-lengths that are symmetric, bounded by $\varepsilon$,
and nonzero with positive probability visits $[0, \varepsilon)$ with probability one (see
Remark 5.2). Thus
$$M = \inf\{n \ge 0 : R_n \in [0, \varepsilon)\}$$
is finite with probability one. Define a new sequence of cycles $C'''_n$, $n \ge 1$,
by switching from the $C''_n$ to the $C_n$ after the $M$th cycle:
$$C'''_n = \begin{cases} C''_n & \text{if } n \le M, \\ C_n & \text{if } n > M. \end{cases}$$
The $C'''_n$, $n \ge 1$, are i.i.d. copies of $C_1$ and independent of $S'_0$ [since both
$C_{M+n}$, $n \ge 1$, and $C''_{M+n}$, $n \ge 1$, are i.i.d. copies of $C_1$ and independent of
$(S'_0, C''_1, \dots, C''_M)$]. For $k \ge 1$, let $X'''_k$ and $I'''_k$ be such that [with $K'''_n$ the
$n$th $k$ such that $I'''_k = 1$]
$$(X'''_{K'''_{n-1}+1}, \dots, X'''_{K'''_n}) = C'''_n, \quad n \ge 1.$$
The random walk $S'''$ with initial position $S'''_0 = S'_0$ and step-lengths $X'''_1$,
$X'''_2, \dots$ is a copy of $S'$, since $S'''$ has the same initial position as $S'$, since
the step-lengths of $S'''$ are obtained in the same way from the cycles $C'''_n$,
$n \ge 1$, as those of $S'$ from the cycles $C'_n$, $n \ge 1$, and since both cycle
sequences are independent of $S'_0$ and both contain i.i.d. copies of $C_1$.
This yields the following result.
Theorem 7.1. For an arbitrary $\varepsilon > 0$, the pair $(S, S''')$ is a coupling of
the differently started nonlattice random walks $S$ and $S'$. Moreover, $K_M$
and $K'''_M$ are finite and
$$S_{K_M + k} - S'''_{K'''_M + k} = R_M \in [0, \varepsilon), \quad k \ge 0.$$
Finally, $K_M$ is a randomized stopping time for $S$ in the presence of $S'''_{K'''_M}$,
that is, for each $n \ge 0$,
the event $\{K_M = n\}$ and the variable $S'''_{K'''_M}$
are conditionally independent of $S$ given $(S_0, \dots, S_n)$;
and $K'''_M$ is a randomized stopping time for $S'''$ in the presence of $S_{K_M}$.
Proof. Only the randomized stopping time claim has not been proved.
The event $\{K_M = n\}$ and the variable $S'''_{K'''_M}$ are determined by $(I_k)_1^\infty$, $(I'_k)_1^\infty$,
$S'$, and $(S_0, \dots, S_n)$, which are independent of $(X_k)_{n+1}^\infty$. Thus $\{K_M = n\}$,
$S'''_{K'''_M}$, and $(S_0, \dots, S_n)$ are independent of $(X_k)_{n+1}^\infty$. Since $S$ is determined
by $(S_0, \dots, S_n)$ and $(X_k)_{n+1}^\infty$, we obtain that $\{K_M = n\}$ and $S'''_{K'''_M}$ are
conditionally independent of $S$ given $(S_0, \dots, S_n)$. In the same way we see
that $K'''_M$ is a randomized stopping time for $S'''$ in the presence of $S_{K_M}$. □
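The construction of Sections 7.2 to 7.4 can be sketched in a short simulation. Everything specific below is an assumption made for illustration: the step-lengths take the values 1 and √2 with equal probability (a nonlattice choice), the starting difference is 2, and the run is capped at `max_cycles` cycles because the hitting time M, although finite with probability one, can be very large. Only the difference walk R is tracked, not the full paths.

```python
import math
import random

STEPS = (1.0, math.sqrt(2.0))  # assumed nonlattice step-length values


def cycle_sum(rng):
    """Sum Y of one cycle: add steps until the first coin flip with I_k = 1."""
    total = 0.0
    while True:
        total += rng.choice(STEPS)
        if rng.random() < 0.5:  # I_k = 1 closes the cycle
            return total


def epsilon_coupling(eps, r0, rng, max_cycles=100_000):
    """Run the difference walk R from r0 until it first lies in [0, eps).

    Returns (M, R_M, steps), where steps lists the increments Y_n - Y''_n,
    or (None, R, steps) if the cap max_cycles is reached first.
    """
    r, steps = r0, []
    for n in range(max_cycles + 1):
        if 0.0 <= r < eps:
            return n, r, steps
        y, y_prime = cycle_sum(rng), cycle_sum(rng)
        # Y''_n = Y'_n when the cycle sums are eps-close, otherwise Y''_n = Y_n.
        y_dd = y_prime if abs(y - y_prime) <= eps else y
        steps.append(y - y_dd)
        r += y - y_dd
    return None, r, steps


m, r_m, steps = epsilon_coupling(eps=0.5, r0=2.0, rng=random.Random(1))
print(m, r_m)
```

Because every increment of R is bounded by ε in absolute value, the walk cannot jump over the interval [0, ε), which is why the first entrance is well defined once R changes sign or drops below ε.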
7.5 Analogous Result for Integer-Valued Random Walks
A straightforward modification of the above argument yields the following
result.
Theorem 7.2. Let $S$ and $S'$ be differently started versions of an
integer-valued random walk with aperiodic step-lengths. Then there exists a coupling
$(\hat{S}, \hat{S}')$ of $S$ and $S'$ and two finite random integers $K$ and $K'$ such that
$$\hat{S}_{K+k} = \hat{S}'_{K'+k}, \quad 0 \le k < \infty,$$
and such that $K$ is a randomized stopping time for $\hat{S}$, that is, for $n \ge 0$,
$\{K = n\}$ is conditionally independent of $\hat{S}$ given $(\hat{S}_0, \dots, \hat{S}_n)$,
and such that $K'$ is a randomized stopping time for $\hat{S}'$.
Proof. In Section 7.2 take $\varepsilon = 1$ to obtain that $Y_1 - Y''_1$ can at most take
the values $-1$, 0, and 1. In Section 7.3 take $\delta = 0$ to obtain that
$$A = \{x \in \mathbb{Z} : P(Y_1 - Y'_1 = x) > 0\}$$
is aperiodic and additive and closed under negation. Thus $A = \mathbb{Z}$ by
Lemma 3.1(a). Thus $Y_1 - Y''_1$ can take all the values $-1$, 0, and 1. In
Section 7.4 take $\varepsilon = 1$ to obtain the desired result. □
7.6 The Reals Have No Proper Closed Nonlattice Subgroup
In Section 7.3 we needed the following result.
Lemma 7.1. Let $A$ be a subset of $\mathbb{R}$ that is nonlattice [there is no $d > 0$
such that $d\mathbb{Z}$ contains $A$], additive [$x, y \in A$ implies $x + y \in A$], closed
under negation [$x \in A$ implies $-x \in A$], and closed [$x_k \in A$ and $x_k \to x$
as $k \to \infty$ implies $x \in A$]. Then $A = \mathbb{R}$.
Proof. Since $A$ is nonlattice and closed under negation, $A \cap (0, \infty)$ is not
empty. Put
$$d = \inf A \cap (0, \infty).$$
Then $d \in A$ [since $A$ is closed], and thus $A$ contains $d\mathbb{Z}$ [since $A$ is closed
under addition and negation]. There is an $x \in A$ such that $x \notin d\mathbb{Z}$ [since
$A$ is nonlattice], and thus $d = 0$ [since if $d > 0$ there would be an integer $k$
such that $0 < x - kd < d$, but $x - kd \in A$, contradicting the definition of $d$].
Take $d_k \in A \cap (0, \infty)$ such that $d_k \downarrow d = 0$. Let $x$ be an arbitrary real
number and let $n_k$ be such that $n_k d_k \le x < n_k d_k + d_k$. Then $n_k d_k \to x$,
and thus $x \in A$ [since $n_k d_k \in A$ and $A$ is closed]. Thus $A = \mathbb{R}$. □
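The key step d = 0 in the proof can be made concrete. As a hedged illustration (the generators 1 and √2 are an assumed example, matching the nonlattice discrete variable mentioned in Section 7.1), the additive group generated by 1 and √2 contains positive elements k√2 − n that become arbitrarily small as k grows. The helper name below is hypothetical.

```python
import math


def smallest_positive_element(max_k):
    """Smallest value of k*sqrt(2) - n in (0, 1) over 1 <= k <= max_k, n integer.

    Returns the triple (value, k, n).
    """
    root2 = math.sqrt(2.0)
    best = None
    for k in range(1, max_k + 1):
        n = math.floor(k * root2)
        value = k * root2 - n  # element of the group lying in (0, 1)
        if best is None or value < best[0]:
            best = (value, k, n)
    return best


d_10 = smallest_positive_element(10)
d_1000 = smallest_positive_element(1000)
print(d_10, d_1000)
```

With k up to 10 the smallest positive element is 5√2 − 7 ≈ 0.071, and with k up to 1000 it is 985√2 − 1393 ≈ 0.00036, in line with the infimum being 0.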
7.7 Comment on Nonlattice Versus Strongly Nonlattice
Call a random variable $X$ with distribution function $F$ strongly nonlattice
if there is a point $x$ such that $X$ can be close to $x$ and $X - x$ is nonlattice.
Suppose $S$ and $S'$ have strongly nonlattice step-lengths (instead of only
nonlattice). Then the Ornstein construction in Section 5 yields a difference
walk $R_k = S_k - S''_k$ having symmetric nonlattice step-lengths bounded by
some (large enough) constant $c$. Such a walk is $\varepsilon$-recurrent for all $\varepsilon > 0$ (in
order to establish this we can use the result from Remark 5.2 that $R$ visits
$[0, c]$ infinitely often: to be able to deduce that $[0, \varepsilon]$ is also visited infinitely
often it only remains to show that the symmetry and nonlatticeness imply
that each time $R$ is in $[0, c]$ it has a conditional probability greater than
some strictly positive $p$ of hitting $[0, \varepsilon]$). Thus we obtain the result that the
coupling $(S, S''')$ of the strongly nonlattice random walks $S$ and $S'$ satisfies
$$S_{K+k} - S'''_{K+k} = R_K \in [0, \varepsilon], \quad k \ge 0,$$
where $K = \inf\{k \ge 0 : R_k \in [0, \varepsilon]\}$. That is, the walks really come (and
stay) $\varepsilon$-close, not only modulo a random time shift.
7.8 Comment on Successful Coupling
Consider a random walk $S$ with nonlattice step-lengths taking only rational
values. Let $S$ have a rational initial position and let $S'$ be a version of $S$
having a nonrational initial position. Then $S_k$ is rational for all $k$, and $S'_k$
is nonrational for all $k$, and thus there can be no random integers $K$ and
$K'$ such that $S_K = S'_{K'}$. In particular, there can be no successful coupling
of $S$ and $S'$ (that is, no $K$ such that $S_K = S'_K$).
There are, however, nonlattice step-lengths that allow successful
coupling, for instance continuous step-lengths. We shall prove this (and a more
general result for 'spread-out' step-lengths) in Chapter 3 when we have
introduced the most useful 'conditioning' and 'splitting' techniques.
8 Epsilon-Coupling - Blackwell's Renewal Theorem
Informally, Blackwell's renewal theorem can be stated as follows. A room is
lit by one light bulb. When it burns out, a new one is installed immediately.
Then, as time passes, the expected number of light bulb installations in a
time interval of length h will tend to h/m, where m is the expected life
length of a light bulb.
This is one of those intuitively obvious facts with no elementary proof.
Or so it seemed for many years until a coupling proof emerged. Even the
coupling argument was not fully elementary at the beginning but has little
by little been refined down to the acceptably elementary form presented in
the previous section.
Below we introduce renewal terminology, state the theorem, and
complete the proof.
8.1 The Renewal Process
A renewal process $S = (S_k)_0^\infty$ is a random walk with nonnegative initial
position and strictly positive step-lengths, that is, $S_0$ is a nonnegative random
variable and
$$S_k = S_0 + X_1 + \dots + X_k, \quad 0 \le k < \infty,$$
where $X_1, X_2, \dots$ are i.i.d. strictly positive random variables that are
independent of $S_0$. It is customary to think of the $S_k$ as times when something
happens. Thus in renewal context we refer to the $k$ in $S_k$ as index, not
as time. Call $S_0$ [the time when the first light bulb is installed] the delay
time and denote its distribution function by $G$. Call $X_1, X_2, \dots$ [the life
lengths of the successive light bulbs] the recurrence times and denote their
common distribution function by $F$. Say that a renewal takes place at time
$S_k$ [the time of the $(k + 1)$th light bulb installation]. Put
$$m = E[X_1] = \text{the mean recurrence time}.$$
Let $N(t, t + h]$ denote the number of renewals in the time interval $(t, t + h]$,
$$N(t, t + h] := \#\{k \ge 0 : t < S_k \le t + h\}, \quad t, h \ge 0.$$
Thus $N(t, t + h] = N_{t+h} - N_t$, where (see Figure 8.1)
$$N_t := \inf\{k \ge 0 : S_k > t\} = \text{the number of renewals in } [0, t].$$
FIGURE 8.1. The renewal process S and some associated random variables.
Take an $x > 0$ such that $F(x) < 1$ and let $L_n$ be the $n$th $k \ge 1$ such
that $X_k > x$. Then $L_n$ is the sum of $n$ geometric variables with parameter
$1 - F(x)$, and thus $E[L_n] = n/(1 - F(x)) < \infty$. Since
$$S_k \ge x 1_{\{X_1 > x\}} + \dots + x 1_{\{X_k > x\}},$$
we have $N_t \le \inf\{k : 1_{\{X_1 > x\}} + \dots + 1_{\{X_k > x\}} > t/x\} = L_{[t/x]+1}$, and thus
$$E[N_t] < \infty, \quad t \ge 0. \tag{8.1}$$
If $S_0 = 0$ identically, then $S$ is zero-delayed. In particular, $S$ has the
zero-delayed version
$$S^0 := (S_k - S_0)_0^\infty = (X_1 + \dots + X_k)_0^\infty.$$
The event $\{N_t = n\}$ is determined by $(S_0, \dots, S_n)$ and thus is independent
of $X_{n+1}, X_{n+2}, \dots$. This implies that $X_{N_t+1}, X_{N_t+2}, \dots$ are i.i.d. copies of
$X_1$. Thus $N[S_{N_t}, S_{N_t} + h]$, the number of renewals in $[S_{N_t}, S_{N_t} + h]$, is a
copy of $N^0_h$, which together with $N(t, t + h] \le N[S_{N_t}, S_{N_t} + h]$ yields
$$N(t, t + h] \overset{D}{\le} N^0_h, \quad t, h \ge 0. \tag{8.2}$$
In order not to have the renewal process stuck in a lattice $d\mathbb{Z}$ we need the
assumption that $F$ is nonlattice [recall the definition: $P(X_1 \in d\mathbb{Z}) < 1$ for
all $d > 0$].
8.2 Blackwell's Renewal Theorem - Idea of Proof
We are now ready to state Blackwell's renewal theorem. [Sharpened versions
of this result can be found in Chapter 3 (Theorem 6.2) and Chapter 10
(Theorem 7.6).]
Theorem 8.1. If $F$ is nonlattice, then
$$\lim_{t \to \infty} E[N(t, t + h]] = h/m \quad [= 0 \text{ if } m = \infty]$$
for all delay time distributions $G$.
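A Monte Carlo sketch can make the statement tangible. The choices below are assumptions for illustration only: recurrence times uniform on (0, 2), so that F is nonlattice and m = 1, a zero-delayed process, and the values t = 50, h = 0.5. The estimate should then be close to h/m = 0.5.

```python
import random


def renewals_in_window(t, h, rng):
    """Count the renewals of a zero-delayed renewal process in (t, t + h]."""
    s, count = 0.0, 0  # S_0 = 0 (zero-delayed)
    while s <= t + h:
        if s > t:
            count += 1
        s += rng.uniform(0.0, 2.0)  # assumed recurrence times, mean m = 1
    return count


def estimate(t, h, runs, seed=0):
    """Monte Carlo estimate of E[N(t, t + h]]."""
    rng = random.Random(seed)
    return sum(renewals_in_window(t, h, rng) for _ in range(runs)) / runs


est = estimate(t=50.0, h=0.5, runs=20_000)
print(est)  # should be near h / m = 0.5
```

The simulation says nothing about the rate of convergence; it only checks that for a moderately large t the expected count in a window of length h is already near h/m.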
The obvious coupling approach to proving this theorem would be
[stationarity part] to look for a version $S'$ with a delay distribution $G'$ making
$$E[N'(t, t + h]] = h/m \quad (\text{close to } 0 \text{ when } m = \infty) \text{ for all } t$$
and then [coupling part] construct a coupling such that the two processes
have common renewals from some random time $T$ onwards. It turns out
that $\varepsilon$-close renewals (Theorem 7.1) suffice; everything works out fine by
sending $\varepsilon \downarrow 0$ in the end.
We carry out the proof in the next four subsections.
8.3 Stationarity Part of the Proof
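The following is the key stationarity result (we use the corollaries).
Theorem 8.2. For the zero-delayed renewal counting process $N^0$ we have
$$\int_0^\infty (1 - F(x)) E[N^0_{t-x}]\, dx = t, \quad 0 \le t < \infty.$$
Proof. Note that $(S^0_{n+1} - X_1)_{n=0}^\infty$ is a copy of $S^0$. This yields the
distributional identity in
$$N^0_{t-x} = \sum_{n=0}^\infty 1_{\{S^0_n \le t - x\}} \overset{D}{=} \sum_{n=1}^\infty 1_{\{S^0_n - X_1 \le t - x\}} = \sum_{n=1}^\infty 1_{\{x + S^0_n - X_1 \le t\}}.$$
Since $X_1$ is independent of $\sum_{n=1}^\infty 1_{\{x + S^0_n - X_1 \le t\}}$, this yields
$$(1 - F(x)) E[N^0_{t-x}] = E\Big[1_{\{X_1 > x\}} \sum_{n=1}^\infty 1_{\{x + S^0_n - X_1 \le t\}}\Big].$$
Integrate over $x$ (and move out sum and expectation) to obtain
$$\int_0^\infty (1 - F(x)) E[N^0_{t-x}]\, dx = \sum_{n=1}^\infty E\Big[\int_0^{X_1} 1_{\{x + S^0_n - X_1 \le t\}}\, dx\Big].$$
Variable substitution yields the first equality in
$$\int_0^\infty (1 - F(x)) E[N^0_{t-x}]\, dx = \sum_{n=1}^\infty E\Big[\int_{S^0_n - X_1}^{S^0_n} 1_{\{y \le t\}}\, dy\Big]$$
$$= \sum_{n=1}^\infty E[S^0_n \wedge t - (S^0_n - X_1) \wedge t]$$
$$= \sum_{n=1}^\infty (E[S^0_n \wedge t] - E[S^0_{n-1} \wedge t]) \quad (\text{since } S^0_n - X_1 \overset{D}{=} S^0_{n-1})$$
$$= \lim_{n \to \infty} E[S^0_n \wedge t] - E[S^0_0 \wedge t] \quad (\text{telescoping sum})$$
$$= t, \quad (S^0_n \wedge t \text{ increases to } t \text{ as } n \to \infty \text{ and } S^0_0 \wedge t = 0)$$
and the theorem is established. □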
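As a quick sanity check of Theorem 8.2 (an illustrative special case, not part of the proof), assume the recurrence times are exponential with rate 1, so that $1 - F(x) = e^{-x}$. The renewals of $S^0$ after $S^0_0 = 0$ then form a Poisson process, which gives $E[N^0_s] = 1 + s$ for $s \ge 0$ and $N^0_s = 0$ for $s < 0$, and the integral can be evaluated directly:

```latex
\begin{align*}
\int_0^\infty (1 - F(x))\, E[N^0_{t-x}]\, dx
  &= \int_0^t e^{-x} (1 + t - x)\, dx \\
  &= (1 + t)\bigl(1 - e^{-t}\bigr) - \bigl(1 - (1 + t) e^{-t}\bigr) \\
  &= t .
\end{align*}
```

The second line uses $\int_0^t e^{-x}\,dx = 1 - e^{-t}$ and $\int_0^t x e^{-x}\,dx = 1 - (1 + t)e^{-t}$.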
When $m < \infty$ define a distribution function $G_\infty$ by
$$G_\infty(x) = E[X_1 \wedge x]/m, \quad 0 \le x < \infty.$$
Corollary 8.1. If $m < \infty$, then $G_\infty$ has density $(1 - F)/m$ and
$$G = G_\infty \ \Rightarrow \ E[N(t, t + h]] = h/m, \quad t, h \ge 0.$$
Proof. For $0 \le x \le \infty$ we have $X_1 \wedge x = \int_0^x 1_{\{X_1 > y\}}\, dy$, which yields the
first equality in
$$E[X_1 \wedge x] = E\Big[\int_0^x 1_{\{X_1 > y\}}\, dy\Big] = \int_0^x P(X_1 > y)\, dy = \int_0^x (1 - F(y))\, dy.$$
This shows that $G_\infty$ has density $(1 - F)/m$. Take $G = G_\infty$ and use this
density to obtain the second equality in
$$E[N_t] = E[N^0_{t - S_0}] = m^{-1} \int_0^\infty (1 - F(x)) E[N^0_{t-x}]\, dx = t/m \quad (\text{by Theorem 8.2})$$
and thus $E[N(t, t + h]] = E[N_{t+h}] - E[N_t] = h/m$ as desired. □
For $0 < a < \infty$, define a distribution function $G_a$ by
$$G_a(x) = \begin{cases} E[X_1 \wedge x]/E[X_1 \wedge a], & 0 \le x \le a, \\ 1, & a \le x < \infty. \end{cases}$$
Corollary 8.2. For $a < \infty$, $G_a$ has density $(1 - F) 1_{[0,a]}/E[X_1 \wedge a]$ and
$$G = G_a \ \Rightarrow \ E[N(t, t + h]] \le h/E[X_1 \wedge a], \quad t, h \ge 0.$$
Proof. The first display in the proof of Corollary 8.1 shows that $G_a$ has
the desired density. Take $G = G_a$ and use this density to obtain the second
equality in
$$E[N(t, t + h]] = E[N^0_{t+h-S_0}] - E[N^0_{t-S_0}]$$
$$= E[X_1 \wedge a]^{-1} \int_0^a (1 - F(x)) (E[N^0_{t+h-x}] - E[N^0_{t-x}])\, dx$$
$$\le E[X_1 \wedge a]^{-1} \int_0^\infty (1 - F(x)) (E[N^0_{t+h-x}] - E[N^0_{t-x}])\, dx$$
$$= E[X_1 \wedge a]^{-1} ((t + h) - t) \quad (\text{due to Theorem 8.2})$$
and thus $E[N(t, t + h]] \le h/E[X_1 \wedge a]$ as desired. □
Note that $h/E[X_1 \wedge a] \downarrow h/m$ as $a \to \infty$, and thus Corollary 8.2 implies that
we can choose the delay time distribution $G'$ of $N'$ so that $E[N'(t, t + h]]$
is close to zero uniformly in $t$ when $m = \infty$.
8.4 Coupling Part of the Proof
The proof of Theorem 8.1 is based on Theorem 7.1. Let $N'''$ be the counting
process associated with $S'''$ and put
$$T = S_{K_M}.$$
In the time interval $[T, \infty)$ the renewals of $S$ stay at distance $R_M$ ahead of
those of $S'''$, and thus
$$t \ge T \ \Rightarrow \ N'''(t - R_M, t + h - R_M] = N(t, t + h].$$
Now, $R_M \in [0, \varepsilon)$, and thus for $\varepsilon < h$,
$$N'''(t, t + h - \varepsilon] 1_{\{T \le t\}} \le N(t, t + h] 1_{\{T \le t\}} \le N'''(t - \varepsilon, t + h].$$
Subtract the term $(N'''(t, t + h] - N'''(t, t + h - \varepsilon]) 1_{\{T > t\}}$ on the left and
add the term $N(t, t + h] 1_{\{T > t\}}$ in the middle and on the right to obtain
(after taking expectations and using $N''' \overset{D}{=} N'$) the coupling inequality
$$E[N'(t, t + h - \varepsilon]] - E[N'''(t, t + h] 1_{\{T > t\}}]$$
$$\le E[N(t, t + h]] \tag{8.3}$$
$$\le E[N'(t - \varepsilon, t + h]] + E[N(t, t + h] 1_{\{T > t\}}].$$
8.5 Completing the Proof When m < oo
When $m < \infty$ put $G' = G_\infty$ and subtract $E[N'(t, t + h]] = h/m$ on all
three sides of (8.3) to obtain
$$|E[N(t, t + h]] - h/m| \le \varepsilon/m + E[N'''(t, t + h] 1_{\{T > t\}}] + E[N(t, t + h] 1_{\{T > t\}}].$$
Both $N'''(t, t + h] 1_{\{T > t\}}$ and $N(t, t + h] 1_{\{T > t\}}$ tend to 0 with probability
one as $t \to \infty$ and both are [see (8.1) and (8.2)] stochastically dominated
by the finite mean random variable $N^0_h$. Thus by dominated convergence
(see Corollary 9.1 in Chapter 1),
$$E[N'''(t, t + h] 1_{\{T > t\}}] \to 0, \quad t \to \infty,$$
$$E[N(t, t + h] 1_{\{T > t\}}] \to 0, \quad t \to \infty, \tag{8.4}$$
which yields
$$\limsup_{t \to \infty} |E[N(t, t + h]] - h/m| \le \varepsilon/m.$$
Sending $\varepsilon \downarrow 0$ yields the desired result when $m < \infty$.
8.6 Completing the Proof When m = oo
When $m = \infty$ put $G' = G_a$ and apply $E[N'(t - \varepsilon, t + h]] \le (h + \varepsilon)/E[X_1 \wedge a]$
to the second inequality in (8.3) to obtain
$$E[N(t, t + h]] \le (h + \varepsilon)/E[X_1 \wedge a] + E[N(t, t + h] 1_{\{T > t\}}].$$
Send $t \to \infty$ and use (8.4) to obtain the inequality in
$$\limsup_{t \to \infty} E[N(t, t + h]] \le (h + \varepsilon)/E[X_1 \wedge a] \to 0 \ \text{ as } a \to \infty,$$
while the limit result is due to monotone convergence. This completes the
proof of Blackwell's renewal theorem.
8.7 Comment on the Two-Sided Case
If we drop the condition that the $X_n$ are strictly positive and only assume
(in addition to the nonlatticeness) that the random walk $S$ has positive
drift, that is,
$$0 < m := E[X_1] < \infty,$$
then Theorem 8.1 still holds. This can be established along the above lines
(since Theorem 7.1 holds for any nonlattice random walk) with the
following modifications.
Note that
$$N_t := \inf\{k \ge 0 : S_k > t\}$$
no longer equals
$$N[0, t] := \#\{k \ge 0 : 0 \le S_k \le t\}.$$
Both are, however, finite with probability one due to the strong law of large
numbers, which says that $P(S_k/k \to m \text{ as } k \to \infty) = 1$ and thus (since
$m > 0$) $P(S_k \to \infty \text{ as } k \to \infty) = 1$. In fact, $E[N_t] < \infty$ and $E[N(-\infty, h]] < \infty$;
see, for instance, Asmussen (1987). Moreover, $(S_{N_t+k} - S_{N_t})_{k=0}^\infty$ is a
zero-delayed version of $S$, and $(t, t + h] \subseteq (-\infty, S_{N_t} + h]$, which yields,
instead of (8.2), that
$$N(t, t + h] \overset{D}{\le} N^0(-\infty, h].$$
This allows us to use dominated convergence in Section 8.5.
Finally, the stationarity results hold if we replace $(1 - F(x)) E[N^0_{t-x}]$ by
$P(S^0_{N^0_0} > x) E[N^0(-x, t - x]]$ in Theorem 8.2 and redefine
$$G_a(x) = E[S^0_{N^0_0} \wedge a \wedge x]/E[S^0_{N^0_0} \wedge a], \quad 0 \le x < \infty,$$
$$G_\infty(x) = E[S^0_{N^0_0} \wedge x]/E[S^0_{N^0_0}], \quad 0 \le x < \infty.$$
9 Renewal Processes — Stationarity
We shall now establish a full-fledged stationarity result for renewal
processes using a method that differs from the one in Section 8.3, where we
only established a minimal stationarity result in order to prove Blackwell's
renewal theorem. The result below will be used in the next section together
with epsilon-coupling to obtain asymptotic stationarity for nonlattice
renewal processes. The constructive approach of this section will be used
in Chapter 8 to develop a general theory (Palm theory) on the relation
between stationary processes and processes consisting of a stationary
sequence of cycles. In Chapter 9 we move on with the same idea to processes
with d-dimensional time parameter (random fields).
9.1 Structure of the Stationary Version - Intuitive Motivation
Let $S$ be a renewal process with $m < \infty$ (see Section 8.1). Let $X_0$ be a
random variable such that $X_0 \ge S_0$ and such that the pair $(X_0, S_0)$ is
independent of $X_1, X_2, \dots$. If we think of $S$ as a sequence of times when a
light bulb burns out and is replaced by a new one, then $S_0$ can be seen as
the residual life of the light bulb in use just before time 0, and $X_0$ as its
total life. Put $S_{-1} = S_0 - X_0$. For $t \ge 0$, put
$A_t = t - S_{N_t - 1}$ = age at time $t$,
$B_t = S_{N_t} - t$ = residual life at time $t$,
$D_t = X_{N_t} = A_t + B_t$ = total life at time $t$,
$U_t = A_t/D_t$ = relative age at time $t$;
see Figure 9.1.
FIGURE 9.1. The age, residual life, total life, and relative age processes.
70 Chapter 2. MARKOV CHAINS AND RANDOM WALKS
In order to get an intuitive feeling for the stationary version of the Markov
process $(A_s, B_s, D_s, U_s)_{s \in [0,\infty)}$, suppose $F$ has a density $f$, consider a
zero-delayed renewal process and make the thought experiment of selecting a
time $r$ uniformly at random in $[0, \infty)$.
Then the Markov process from time $r$ onward should be stationary.
Further, the relative position of $r$ in the renewal interval where it landed
should be uniform. Finally, the probability of $r$ being in an interval of
length $x$ should be proportional to $x f(x)\, dx$ because (due to the law of
large numbers) the relative number of intervals of length $x$ is $f(x)\, dx$ and
the probability of landing in a particular interval of length $x$ is proportional
to $x$ (this length-biasing, the fact that an interval selected in this manner is
longer than an ordinary interval, is called the inspection paradox or waiting
time paradox).
Thus a stationary version should be obtained by placing the origin
uniformly in the initial interval after choosing its length $X_0$ according to the
density $x f(x)/m$, that is, according to the distribution function
$$F_\infty(x) = E[X_1 1_{\{X_1 \le x\}}]/m.$$
Note that if $X_0$ has this distribution, then for nonnegative (Borel
measurable) functions $g$,
$$E[g(X_0)] = E[X_1 g(X_1)]/m. \tag{9.1}$$
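The length-biasing can be observed in a small simulation. This is a sketch under assumed parameters: recurrence times uniform on (0, 2), so that m = 1 and, by (9.1) with g(x) = x, the length-biased mean is E[X_1^2]/m = 4/3; a fixed large inspection time t = 50 stands in for the uniformly selected time of the thought experiment.

```python
import random


def covering_interval_length(t, rng):
    """Length of the renewal interval that contains time t (zero-delayed process)."""
    s = 0.0
    while True:
        x = rng.uniform(0.0, 2.0)  # assumed recurrence times, mean m = 1
        if s + x > t:
            return x  # the interval (s, s + x] covers t
        s += x


rng = random.Random(0)
runs = 20_000
mean_covering = sum(covering_interval_length(50.0, rng) for _ in range(runs)) / runs
print(mean_covering)  # should be near 4/3, above the ordinary mean 1
```

The estimate exceeds the ordinary mean recurrence time because long intervals are more likely to cover the inspection time, which is the inspection paradox described above.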
9.2 The Stationarity Theorem - Proof
We shall now prove that the above guesswork is correct.
Theorem 9.1. Let $S$ be a renewal process with finite mean recurrence
times and delay time $S_0 = (1 - U_0) X_0$, where
$X_0$ has the distribution $F_\infty$ and is independent of $(X_1, X_2, \dots)$
and $U_0$ is uniform on $(0, 1)$ and independent of $(X_0, X_1, X_2, \dots)$.
Then $(A_s, B_s, D_s, U_s)_{s \in [0,\infty)}$ is stationary and
$$S_0 = (1 - U_0) X_0 \quad \text{and} \quad -S_{-1} = U_0 X_0$$
both have the distribution function $G_\infty$ from Section 8.3.
Proof. Since $U_0$ and $1 - U_0$ are both uniform on $[0, 1]$ and both
independent of $X_0$, it holds that $S_0$ and $-S_{-1}$ have the same distribution, and
the common distribution function is $G_\infty$, since
$$P(U_0 X_0 \le x) = P(U_0 \le (X_0 \wedge x)/X_0)$$
$$= E[(X_0 \wedge x)/X_0]$$
$$= E[X_1 \wedge x]/m \quad (\text{due to (9.1)})$$
$$= G_\infty(x). \quad (\text{by definition})$$
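Note that the process
$$Z^\circ = (A^\circ_s, B^\circ_s, D^\circ_s, U^\circ_s)_{s\in[0,\infty)}$$
is determined by $(X_1, X_2, X_3, \dots)$ in the same way as the process
$$Z = (A_{S_{-1}+s}, B_{S_{-1}+s}, D_{S_{-1}+s}, U_{S_{-1}+s})_{s\in[0,\infty)}$$
by $(X_0, X_1, X_2, \dots)$. Since
$$(X_2, X_3, \dots) \overset{D}{=} (X_1, X_2, \dots), \tag{9.2}$$
this yields the equality
$$\mathbf{E}\Big[\frac{1}{X_0}\int_t^{X_0+t} 1_{\{Z_s \le z\}}\,ds \,\Big|\, X_0 = x\Big]
= \mathbf{E}\Big[\frac{1}{X_1}\int_t^{X_1+t} 1_{\{Z^\circ_s \le z\}}\,ds \,\Big|\, X_1 = x\Big],
\quad 0 < x < \infty,\ z \in \mathbb{R}^4.$$
With $z$ fixed call this expression $g(x)$ and apply (9.1) to obtain
$$\mathbf{E}\Big[\frac{1}{X_0}\int_t^{X_0+t} 1_{\{Z_s \le z\}}\,ds\Big]
= \mathbf{E}\Big[\int_t^{X_1+t} 1_{\{Z^\circ_s \le z\}}\,ds\Big]\Big/ m. \tag{9.3}$$
Since $U_0$ is independent of $Z$ and since $(A_t, B_t, D_t, U_t) = Z_{U_0X_0+t}$, we have
$$\mathbf{P}((A_t,B_t,D_t,U_t) \le z) = \mathbf{E}\Big[\frac{1}{X_0}\int_t^{X_0+t} 1_{\{Z_s \le z\}}\,ds\Big].$$
Combining this and (9.3) yields the first identity in
$$\begin{aligned}
m\,\mathbf{P}((A_t,B_t,D_t,U_t) \le z)
&= \mathbf{E}\Big[\int_t^{X_1+t} 1_{\{Z^\circ_s \le z\}}\,ds\Big] \\
&= \mathbf{E}\Big[\int_t^{X_1} 1_{\{Z^\circ_s \le z\}}\,ds\Big] + \mathbf{E}\Big[\int_{X_1}^{X_1+t} 1_{\{Z^\circ_s \le z\}}\,ds\Big] \\
&= \mathbf{E}\Big[\int_t^{X_1} 1_{\{Z^\circ_s \le z\}}\,ds\Big] + \mathbf{E}\Big[\int_0^{t} 1_{\{Z^\circ_s \le z\}}\,ds\Big] && \text{(by (9.2))} \\
&= \mathbf{E}\Big[\int_0^{X_1} 1_{\{Z^\circ_s \le z\}}\,ds\Big].
\end{aligned}$$
Thus the distribution of $(A_t,B_t,D_t,U_t)$ does not depend on $t$. Since the
process $(A_s,B_s,D_s,U_s)_{0\le s<\infty}$ is Markovian, this implies stationarity. □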
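Theorem 9.1 can be illustrated by simulation. The sketch below is an illustration under stated assumptions, not the author's construction in code: it takes a hypothetical recurrence time equal to 1 or 3 with probability 1/2 each, draws the delay $S_0 = (1-U_0)X_0$ as in the theorem, and estimates $\mathbf{P}(B_t \le 1)$ at a few arbitrarily chosen times $t$. Under stationarity every estimate should be close to $G_\infty(1) = \mathbf{E}[X_1 \wedge 1]/m = 1/2$.

```python
import random

def sample_length_biased(values, probs, rng):
    """Draw X_0 with P(X_0 = x) proportional to x * P(X_1 = x)."""
    weights = [x * p for x, p in zip(values, probs)]
    return rng.choices(values, weights=weights)[0]

def residual_life(t, values, probs, rng):
    """Residual life B_t of the stationary renewal process of Theorem 9.1."""
    x0 = sample_length_biased(values, probs, rng)
    s = (1.0 - rng.random()) * x0           # delay S_0 = (1 - U_0) X_0
    while s <= t:                           # add recurrence times until we pass t
        s += rng.choices(values, weights=probs)[0]
    return s - t

def G_inf(x, values, probs):
    """G_infinity(x) = E[min(X_1, x)] / m."""
    m = sum(v * p for v, p in zip(values, probs))
    return sum(min(v, x) * p for v, p in zip(values, probs)) / m

rng = random.Random(0)                      # fixed seed for reproducibility
values, probs = [1.0, 3.0], [0.5, 0.5]      # hypothetical recurrence times
n = 40_000
estimates = {
    t: sum(residual_life(t, values, probs, rng) <= 1.0 for _ in range(n)) / n
    for t in (0.0, 2.5, 7.3)
}
target = G_inf(1.0, values, probs)
```

The dictionary `estimates` holds one Monte Carlo estimate per time point; their agreement with `target`, up to sampling error, reflects that the distribution of the residual life does not depend on $t$.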
10 Renewal Processes - Asymptotic Stationarity
The $\varepsilon$-coupling constructed in Section 7 can be used for more than
Blackwell's renewal theorem. Below we apply it together with Theorem 9.1 to
obtain total variation convergence of the total life and convergence in
distribution of the age, the remaining life and the relative age, in the nonlattice
case. The lattice case is dealt with at the end of the section using the results
for Markov chains from Section 3. We start with a general coupling result.
10.1 Epsilon-Coupling and Piecewise Constant Processes
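Let $Z = (Z_s)_{s\in[0,\infty)}$ and $Z' = (Z'_s)_{s\in[0,\infty)}$ be two continuous-time
stochastic processes. For $\varepsilon > 0$, say that $(\hat Z, \hat Z', T, T')$ is a successful $\varepsilon$-coupling
of $Z$ and $Z'$ if $(\hat Z, \hat Z')$ is a coupling of $Z$ and $Z'$, and $T$ and $T'$ are finite
random times such that
$$\hat Z_{T+s} = \hat Z'_{T'+s}, \quad 0 \le s < \infty,$$
$$|T - T'| \le \varepsilon.$$
A stochastic process $Z$ is stationary if
$$(Z_{t+s})_{s\in[0,\infty)} \overset{D}{=} Z, \quad 0 \le t < \infty.$$
Theorem 10.1. Let $Z$ and $Z'$ be piecewise constant right-continuous
processes with finitely many jumps in finite time intervals. Suppose $Z'$ is
stationary and there is for each $\varepsilon > 0$ a successful $\varepsilon$-coupling of $Z$ and $Z'$.
Then
$$Z_t \overset{tv}{\to} Z'_0, \quad t \to \infty.$$
PROOF. Let $(\hat Z, \hat Z')$ be an $\varepsilon$-coupling ($0 < \varepsilon < 1$) of $Z$ and $Z'$ with finite
times $T$ and $T'$. From the piecewise constancy we deduce that for $t > 1$,
$$\hat Z_t = \hat Z'_t \quad\text{on}\quad C = \{T \vee T' \le t,\ \text{no } \hat Z' \text{ jump in } [t-\varepsilon, t+\varepsilon]\}.$$
Thus (by definition) $C$ is a coupling event of the coupling $(\hat Z_t, \hat Z'_t)$ of $Z_t$
and $Z'_t$, and the coupling event inequality (5.11) in Chapter 1 yields
$$\|\mathbf{P}(Z_t \in \cdot) - \mathbf{P}(Z'_t \in \cdot)\|
\le 2\mathbf{P}(T \vee T' > t) + 2\mathbf{P}(\hat Z' \text{ jumps in } [t-\varepsilon, t+\varepsilon]),$$
since $C^c = \{T \vee T' > t\} \cup \{\hat Z' \text{ jumps in } [t-\varepsilon, t+\varepsilon]\}$. Use the stationarity
of $Z'$ and $\hat Z' \overset{D}{=} Z'$ to obtain
$$\|\mathbf{P}(Z_t \in \cdot) - \mathbf{P}(Z'_0 \in \cdot)\|
\le 2\mathbf{P}(T \vee T' > t) + 2\mathbf{P}(Z' \text{ jumps in } [1-\varepsilon, 1+\varepsilon]).$$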
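Sending $t \to \infty$ yields
$$\limsup_{t\to\infty} \|\mathbf{P}(Z_t \in \cdot) - \mathbf{P}(Z'_0 \in \cdot)\| \le 2\mathbf{P}(Z' \text{ jumps in } [1-\varepsilon, 1+\varepsilon]).$$
The right-hand side tends to $2\mathbf{P}(Z' \text{ jumps at } 1)$ as $\varepsilon \downarrow 0$, and thus
$$\limsup_{t\to\infty} \|\mathbf{P}(Z_t \in \cdot) - \mathbf{P}(Z'_0 \in \cdot)\| \le 2\mathbf{P}(Z' \text{ jumps at } 1).$$
Let $V$ be uniform on $[1,2]$ and independent of $Z'$ and use the stationarity
of $Z'$ to obtain $\mathbf{P}(Z' \text{ jumps at } 1) = \mathbf{P}(Z' \text{ jumps at } V)$. Since there are only
finitely many jumps in finite intervals and $V$ is uniform and independent
of $Z'$, we have $\mathbf{P}(Z' \text{ jumps at } V) = 0$. Thus
$$\limsup_{t\to\infty} \|\mathbf{P}(Z_t \in \cdot) - \mathbf{P}(Z'_0 \in \cdot)\| = 0,$$
and the proof is complete. □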
10.2 Nonlattice Renewal Processes
We now use Theorems 7.1, 9.1, and 10.1 to establish asymptotic stationarity
(see Sections 8.1 and 9.1 for the definitions of $S$, $A_t$, $B_t$, $D_t$, and $U_t$).
Theorem 10.2. Let $S$ be a renewal process with nonlattice finite mean
recurrence times. Let $Y$ and $U$ be independent random variables, $Y$ with
distribution $F_\infty$ and $U$ uniform on $[0,1)$. Then
$$D_t \overset{tv}{\to} Y,$$
$$U_t \overset{D}{\to} U,$$
$$\text{both } A_t \text{ and } B_t \overset{D}{\to} UY,$$
$$\mathbf{P}(U_t \le u, D_t \le x) \to u F_\infty(x), \quad u \in [0,1],\ x \in [0,\infty),$$
$$\mathbf{P}(A_t > x, B_t > y) \to 1 - G_\infty(x+y), \quad x, y \in [0,\infty),$$
as $t \to \infty$.
Proof. Theorem 7.1 yields a successful $\varepsilon$-coupling of the stochastic process
$(A_s,B_s,D_s,U_s)_{s\in[0,\infty)}$ and its stationary version (Theorem 9.1) and thus
of each of the piecewise constant processes
$$(D_s)_{s\in[0,\infty)}, \quad (1_{\{U_s \le u,\, D_s \le x\}})_{s\in[0,\infty)}, \quad (1_{\{A_s > x,\, B_s > y\}})_{s\in[0,\infty)},$$
and their stationary versions. A reference to Theorem 10.1 yields $D_t \overset{tv}{\to} Y$
and
$$\mathbf{P}(U_t \le u, D_t \le x) \to \mathbf{P}(U \le u, Y \le x),$$
$$\mathbf{P}(A_t > x, B_t > y) \to \mathbf{P}(UY > x, (1-U)Y > y).$$
Since $\mathbf{P}(UY > x) = 1 - G_\infty(x)$, it only remains to check that
$$\mathbf{P}(UY > x, (1-U)Y > y) = \mathbf{P}(UY > x + y).$$
This follows from the independence of $U$ and $Y$ and the following [condition
on $Y = z$ and take $a = x/z$ and $b = y/z$]:
$$\begin{aligned}
\mathbf{P}(U > a, (1-U) > b) &= \mathbf{P}(a < U < 1 - b) \\
&= \mathbf{P}(U > a + b), \quad a, b \ge 0,
\end{aligned}$$
and we are through. □
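The last limit in Theorem 10.2 can be explored numerically. The following sketch rests on assumptions that are not in the text: recurrence times are taken to be Uniform(0, 2) (nonlattice, $m = 1$, so that $G_\infty(x) = x - x^2/4$ on $[0,2]$), the process is zero-delayed, and the time $t = 30$ and the levels $x = 0.3$, $y = 0.4$ are arbitrary. It compares a Monte Carlo estimate of $\mathbf{P}(A_t > x, B_t > y)$ with $1 - G_\infty(x+y)$.

```python
import random

def age_and_residual(t, rng):
    """Age A_t and residual life B_t of a zero-delayed renewal process
    with Uniform(0, 2) recurrence times (mean m = 1)."""
    last, nxt = 0.0, rng.uniform(0.0, 2.0)
    while nxt <= t:
        last, nxt = nxt, nxt + rng.uniform(0.0, 2.0)
    return t - last, nxt - t

def G_inf(x):
    """G_infinity(x) = E[min(X_1, x)] / m = x - x^2/4 for 0 <= x <= 2."""
    x = min(max(x, 0.0), 2.0)
    return x - x * x / 4.0

rng = random.Random(1)                      # fixed seed for reproducibility
t, x, y, n = 30.0, 0.3, 0.4, 20_000
hits = 0
for _ in range(n):
    a, b = age_and_residual(t, rng)
    hits += (a > x and b > y)
estimate = hits / n                         # Monte Carlo P(A_t > x, B_t > y)
limit = 1.0 - G_inf(x + y)                  # limiting value from Theorem 10.2
```

The estimate only approximates the limit, both because $t$ is finite and because of sampling error; it is meant as a sanity check, not as evidence of a convergence rate.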
Remark 10.1. If the recurrence times are continuous (or, more generally,
spread out), then the above convergence in distribution can be replaced by
convergence in total variation, since then the successful epsilon coupling can
be replaced by a successful (exact) coupling; see Theorem 6.1 in Chapter 3.
Remark 10.2. Since the events $\{A_t < x\}$, $\{B_t < x\}$, and $\{D_t < x\}$ are
all contained in $\{N[t-x, t+x] > 0\}$ and since
$$\mathbf{P}(N[t-x, t+x] > 0) \le \mathbf{E}[N(t-2x, t+x]],$$
it follows from Blackwell's renewal theorem (Theorem 8.1) that for every
$0 \le x < \infty$, if $m = \infty$, then
$$\mathbf{P}(A_t < x) \to 0, \quad t \to \infty,$$
$$\mathbf{P}(B_t < x) \to 0, \quad t \to \infty,$$
$$\mathbf{P}(D_t < x) \to 0, \quad t \to \infty.$$
But $\mathbf{P}(U_t \le u)$ should still go to $u$, or what? I don't know.
10.3 Integer-Valued Renewal Processes
The lattice version of Theorem 10.2 is easily deduced from Theorem 3.2.
Without loss of generality we can take the span of the lattice $d\mathbb{Z}_+$ to be
one, $d = 1$. The result for general $d$ follows by change of scale, that is, by
replacing $S$ by $(S_k/d)_{k=0}^\infty$.
Theorem 10.3. Let $S$ be an integer-valued renewal process with aperiodic
recurrence times, that is, $\mathbf{P}(X_1 \in n\mathbb{Z}_+) < 1$ for all $n > 1$. If $m < \infty$, then
for all integers $i > j \ge 0$,
$$\mathbf{P}(D_n = i, A_n = j) \to \mathbf{P}(X_1 = i)/m, \quad n \to \infty.$$
If $m = \infty$, then for all integers $i$ and $j$,
$$\mathbf{P}(D_n = i, A_n = j) \to 0, \quad n \to \infty.$$
Proof. Apply Theorem 3.2 to $((D_n, A_n) : n \ge 0)$, which is an irreducible
aperiodic recurrent Markov chain with state space
$$E = \{(i,j) : \mathbf{P}(X_1 = i) > 0,\ i > j \ge 0\}.$$
The time between two $((D_n, A_n) : n \ge 0)$ visits to $(i,j) \in E$ is of the form
$$X_1 + \dots + X_K, \quad\text{where } K \text{ is the first } k \text{ such that } X_k = i.$$
Since $K$ is geometric with parameter $\mathbf{P}(X_1 = i)$, we have $\mathbf{E}[K] = 1/\mathbf{P}(X_1 = i)$.
Since the event $\{K < k\}$ is determined by $X_1, \dots, X_{k-1}$, it is independent
of $X_k$. Thus the expected time between two $(D_n, A_n)_0^\infty$ visits to $(i,j)$ is
$m/\mathbf{P}(X_1 = i)$, due to the following lemma. □
Lemma 10.1. Let $K$ be a nonnegative integer-valued random variable. Then
$$\mathbf{E}[K] = \mathbf{P}(K \ge 1) + \mathbf{P}(K \ge 2) + \cdots.$$
Further, if $X_1, X_2, \dots$ are nonnegative random variables such that for
all $k \ge 1$ it holds that $\mathbf{E}[X_k] = \mathbf{E}[X_1]$ and that the event $\{K < k\}$ is
independent of $X_k$, then
$$\mathbf{E}[X_1 + \dots + X_K] = \mathbf{E}[K]\,\mathbf{E}[X_1] \quad\text{(Wald's identity)}.$$
Proof. The first claim follows from $K = 1_{\{K \ge 1\}} + 1_{\{K \ge 2\}} + \cdots$. The
second claim follows from
$$\begin{aligned}
\mathbf{E}[X_1 + \dots + X_K] &= \mathbf{E}[X_1 1_{\{K \ge 1\}}] + \mathbf{E}[X_2 1_{\{K \ge 2\}}] + \cdots \\
&= \mathbf{E}[X_1]\mathbf{E}[1_{\{K \ge 1\}}] + \mathbf{E}[X_2]\mathbf{E}[1_{\{K \ge 2\}}] + \cdots \\
&= \mathbf{E}[X_1](\mathbf{P}(K \ge 1) + \mathbf{P}(K \ge 2) + \cdots) \\
&= \mathbf{E}[X_1]\mathbf{E}[K],
\end{aligned}$$
where the second equality is due to $\{K \ge k\}$ being the complement of
$\{K < k\}$ and thus independent of $X_k$, and the third equality is due to
$\mathbf{E}[X_k] = \mathbf{E}[X_1]$. □
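Wald's identity, as used in the proof of Theorem 10.3, can be checked by simulation. This is a minimal sketch under assumptions chosen for illustration only: the recurrence time is 1 or 3 with probability 1/2 each and the target length is $i = 3$, so that $\mathbf{E}[K] = 2$ and the expected time between visits should be $m/\mathbf{P}(X_1 = 3) = 4$.

```python
import random

def time_between_visits(i, values, probs, rng):
    """Return (K, X_1 + ... + X_K), where K is the first k with X_k = i."""
    k, total = 0, 0
    while True:
        x = rng.choices(values, weights=probs)[0]
        k, total = k + 1, total + x
        if x == i:
            return k, total

rng = random.Random(2)                      # fixed seed for reproducibility
values, probs = [1, 3], [0.5, 0.5]          # hypothetical recurrence times
m = sum(v * p for v, p in zip(values, probs))
n = 50_000
samples = [time_between_visits(3, values, probs, rng) for _ in range(n)]
mean_K = sum(k for k, _ in samples) / n     # estimates E[K]
mean_sum = sum(s for _, s in samples) / n   # estimates E[X_1 + ... + X_K]
wald = m / 0.5                              # E[K] E[X_1] = m / P(X_1 = 3)
```

Note that $K$ here is not independent of the $X_k$; only the event $\{K < k\}$ is independent of $X_k$, which is exactly the hypothesis of Lemma 10.1.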
10.4 More on Integer-Valued Renewal Processes
The limit result in the finite mean case, m < oo, can be stated as follows.
As time passes the total life becomes length-biased, that is, for integers $i$
such that $\mathbf{P}(X_1 = i) > 0$,
$$\mathbf{P}(D_n = i)/\mathbf{P}(X_1 = i) \to i/m, \quad n \to \infty,$$
and the age becomes uniformly distributed on the total life, that is, for
integers $i > j \ge 0$ such that $\mathbf{P}(X_1 = i) > 0$,
$$\mathbf{P}(A_n = j \mid D_n = i) \to 1/i, \quad n \to \infty.$$
Results for the residual life $B_n$ are easily deduced from Theorem 10.3
because $B_n = D_n - A_n$. In particular, we have the following results.
Corollary 10.1. Let $S$ be an integer-valued renewal process with aperiodic
recurrence times. Then for all integers $j \ge 0$ and $k \ge 1$,
$$\mathbf{P}(A_n = j, B_n = k) \to \mathbf{P}(X_1 = j + k)/m, \quad n \to \infty,$$
$$\text{both } \mathbf{P}(B_n = j + 1) \text{ and } \mathbf{P}(A_n = j) \to \mathbf{P}(X_1 > j)/m, \quad n \to \infty.$$
The lattice version of Blackwell's renewal theorem follows immediately from
the last observation by noting that
$$N\{n\} := \#\{k \ge 0 : S_k = n\} = 1_{\{A_n = 0\}}$$
and thus $\mathbf{E}[N\{n\}] = \mathbf{P}(A_n = 0)$:
Corollary 10.2. If $S$ is an integer-valued renewal process with aperiodic
recurrence times, then
$$\mathbf{E}[N\{n\}] \to 1/m, \quad n \to \infty.$$
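For a zero-delayed process the limits in Theorem 10.3 and Corollary 10.2 can be computed exactly for small examples. The sketch below is an illustration under assumptions not made in the text: the recurrence time is 1 or 3 with probability 1/2 each (aperiodic, $m = 2$), and the renewal probabilities $u_n = \mathbf{P}(A_n = 0)$ are obtained from the standard renewal equation $u_n = \sum_k \mathbf{P}(X_1 = k)\,u_{n-k}$, which is a swapped-in computational device rather than the coupling argument of this section.

```python
from fractions import Fraction

def renewal_sequence(f, n_max):
    """u_n = P(A_n = 0) for a zero-delayed integer-valued renewal process,
    via the renewal equation u_n = sum_k f(k) u_{n-k} with u_0 = 1."""
    u = [Fraction(1)]
    for n in range(1, n_max + 1):
        u.append(sum(p * u[n - k] for k, p in f.items() if k <= n))
    return u

f = {1: Fraction(1, 2), 3: Fraction(1, 2)}    # hypothetical aperiodic pmf
m = sum(k * p for k, p in f.items())          # m = 2
u = renewal_sequence(f, 60)

def joint(n, i, j):
    """P(D_n = i, A_n = j) = u_{n-j} P(X_1 = i) for 0 <= j < i and j <= n."""
    return u[n - j] * f.get(i, Fraction(0))

gap_blackwell = abs(u[60] - 1 / m)            # |E[N{60}] - 1/m|
gap_joint = abs(joint(60, 3, 1) - f[3] / m)   # |P(D_60=3, A_60=1) - P(X_1=3)/m|
```

Both gaps are tiny for this example, consistent with $\mathbf{E}[N\{n\}] \to 1/m$ and $\mathbf{P}(D_n = i, A_n = j) \to \mathbf{P}(X_1 = i)/m$.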
Summary
This chapter started by presenting the classical coupling (Sections 2, 3, 4).
We showed that it is successful for irreducible recurrent birth and death
processes and for irreducible (aperiodic in the discrete-time case) positive
recurrent Markov chains. Then the Ornstein coupling was introduced
(Sections 5 and 6). We showed that it is successful for integer-valued random
walks with strongly aperiodic step-lengths without any recurrence
condition and then applied it to construct a successful coupling of irreducible
(aperiodic in the discrete-time case) null recurrent Markov chains. Finally
(Sections 7 through 10), the Ornstein idea was used to construct successful
epsilon-couplings of random walks with nonlattice step-lengths. When
applied to renewal processes, this rendered Blackwell's renewal theorem and
several other results on asymptotic stationarity.
* * *
This ends the introductory part of the book. The next five chapters present
a general coupling theory:
Let me take you down 'cause I'm going to Strawberry Fields,
Nothing is real
Chapter 3
RANDOM ELEMENTS
1 Introduction
This chapter consists of two parts: Sections 2-5 and Sections 6-10. The
first part introduces general coupling tools, and the second part generalizes
some of the results from Chapters 1 and 2.
After a measure-theoretic review of terminology in Section 2, Section 3
explains what is meant by extending the underlying probability space and
collects some extension techniques. Sections 4 and 5 are devoted to
particularly important extension techniques: conditioning, transfer, and splitting.
These sections may seem rather technical, but the extension methods are
quite probabilistic in nature, and are used frequently throughout the book.
In Section 6, transfer and splitting are used to construct a successful
coupling of random walks with step-lengths that are spread out
(continuous step-lengths are a special case of spread out). In Section 7, splitting is
used to construct a maximal coupling of an arbitrary collection of random
elements. In Section 8, we consider the special case of two random elements
and formulate the maximal coupling result in terms of total variation. In
Section 9, splitting is used to turn lim inf convergence of distributions to a
distribution into a pointwise convergence where the random elements
actually hit the limit and stay there. In Section 10, we use transfer (and
Theorem 6.1 in Chapter 1) to turn convergence in distribution into pointwise
convergence for random elements in a separable metric space (Dudley's
extension of the Skorohod coupling), and then re-prove this result in the case
when the space is also complete (the Skorohod coupling) after generalizing
the quantile coupling.
2 Back to Basics - Definition of Coupling
A random element in a measurable space $(E,\mathcal{E})$ defined on a probability
space $(\Omega,\mathcal{F},\mathbf{P})$ is a measurable mapping $Y$ from $(\Omega,\mathcal{F},\mathbf{P})$ to $(E,\mathcal{E})$, that
is,
$$\{Y \in A\} \in \mathcal{F}, \quad A \in \mathcal{E},$$
where
$$\{Y \in A\} := \{\omega \in \Omega : Y(\omega) \in A\} =: Y^{-1}A.$$
We also say that $Y$ is supported by $(\Omega,\mathcal{F},\mathbf{P})$ and that $Y$ is an $\mathcal{F}/\mathcal{E}$
measurable mapping from $\Omega$ to $E$. Note that if we replace $\mathbf{P}$ by another probability
measure $\mathbf{Q}$, then $Y$ is the same measurable mapping but a different random
element. Also, if we replace $\mathcal{F}$ by a larger $\sigma$-algebra and/or $\mathcal{E}$ by a smaller,
then $Y$ is the same mapping but not the same measurable mapping. If we
replace $\mathcal{F}$ by a smaller $\sigma$-algebra and/or $\mathcal{E}$ by a larger, then $Y$ need not
even be measurable.
A random variable $X$ is a random element in $(\mathbb{R}, \mathcal{B})$, where $\mathbb{R}$ denotes
the set of real numbers (the line) and $\mathcal{B}$ its Borel subsets, that is, $\mathcal{B}$ is the
smallest $\sigma$-algebra on $\mathbb{R}$ containing the open sets (the $\sigma$-algebra generated
or induced by the open sets),
$$\mathcal{B} = \mathcal{B}(\mathbb{R}) = \sigma\{A \subseteq \mathbb{R} : A \text{ open}\}.$$
An extended random variable $X$ is a random element in
$$([-\infty,\infty], \mathcal{B}([-\infty,\infty])).$$
When the line is regarded as time, we often call an extended random
variable $T$ a random time. If $T$ cannot take the values $-\infty$ and $\infty$, then $T$ is
a finite random time.
The distribution of a random element $Y$ [under $\mathbf{P}$] is the probability
measure on $(E,\mathcal{E})$ induced by $Y$, namely $\mathbf{P}Y^{-1}$. Since
$$\mathbf{P}(Y \in A) = \mathbf{P}Y^{-1}A, \quad A \in \mathcal{E},$$
a more probabilistic notation for the distribution of $Y$ is $\mathbf{P}(Y \in \cdot)$.
A random element $Y$ is canonical if $Y$ is the identity mapping, that is, if
$$(\Omega,\mathcal{F}) = (E,\mathcal{E}) \quad\text{and}\quad Y(\omega) = \omega, \quad \omega \in \Omega.$$
Then $\mathbf{P}(Y \in \cdot) = \mathbf{P}$.
A random element $\hat Y$ in $(E,\mathcal{E})$ defined on a probability space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$
is a copy or representation of $Y$ if
$$\hat{\mathbf{P}}(\hat Y \in \cdot) = \mathbf{P}(Y \in \cdot); \quad\text{this is denoted by } \hat Y \overset{D}{=} Y.$$
A random element $Y$ always has a canonical representation, the canonical
random element on $(E,\mathcal{E},\mathbf{P}(Y \in \cdot))$.
2.1 Coupling Random Elements — Definition
For each $i$ in an index set $I$ let $Y_i$ be a random element in a measurable space
$(E_i,\mathcal{E}_i)$ defined on a probability space $(\Omega_i,\mathcal{F}_i,\mathbf{P}_i)$. A family of random
elements $(\hat Y_i : i \in I)$ defined on a common probability space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$ is a
coupling of $Y_i$, $i \in I$, if
$$\hat Y_i \overset{D}{=} Y_i \quad\text{for each } i \in I.$$
Note that the $Y_i$ need not be defined on a common probability space,
in other words need not have a joint distribution. Thus 'coupling' can be
seen to refer to the fact that the copies $\hat Y_i$ are defined on a common
probability space, have a joint distribution, live together. Writing $(\hat Y_i : i \in I)$ in
parentheses indicates that this is the case.
For any collection of random elements $Y_i$, $i \in I$, there is always at least
one coupling, the independence coupling, consisting of independent copies
of the $Y_i$. This follows from the product measure theorem (Fact 3.1 below).
2.2 Coupling Probability Measures — Rephrasing the Definition
In terms of distributions the definition of coupling can be rephrased as
follows. For each $i$ in an index set $I$ let $P_i$ be a probability measure on a
measurable space $(E_i,\mathcal{E}_i)$. Define the product space
$$\bigotimes_{i\in I}(E_i,\mathcal{E}_i) := \Big(\prod_{i\in I} E_i,\ \bigotimes_{i\in I}\mathcal{E}_i\Big),$$
where $\prod_{i\in I} E_i$ is the Cartesian product of the $E_i$,
$$\prod_{i\in I} E_i := \{y = (y_i : i \in I) : y_i \in E_i,\ i \in I\},$$
and $\bigotimes_{i\in I}\mathcal{E}_i$ is the product $\sigma$-algebra, that is, the smallest $\sigma$-algebra on
$\prod_{i\in I} E_i$ making the $i$th projection mapping taking $y$ in $\prod_{i\in I} E_i$ to $y_i$ in
$E_i$ measurable for all $i \in I$ (the $\sigma$-algebra generated or induced by the
projection mappings):
$$\bigotimes_{i\in I}\mathcal{E}_i := \sigma\{\{y : y_i \in A\} : i \in I \text{ and } A \in \mathcal{E}_i\}.$$
A probability measure $\hat P$ on $\bigotimes_{i\in I}(E_i,\mathcal{E}_i)$ is a coupling of $P_i$, $i \in I$, if $P_i$
is the $i$th marginal of $\hat P$, that is, if $P_i$ is induced by the $i$th projection
mapping:
$$\hat P(\{y : y_i \in A\}) = P_i(A), \quad A \in \mathcal{E}_i,\ i \in I.$$
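On finite spaces the marginal condition is easy to verify mechanically. The sketch below is an illustration only: the two distributions on the hypothetical space {"a", "b"} and the particular dependent coupling are made-up examples, and joint distributions are represented as dictionaries from pairs to probabilities.

```python
from fractions import Fraction
from itertools import product

def marginal(joint, axis):
    """Marginal of a joint pmf {(y1, y2): prob} along coordinate `axis`."""
    out = {}
    for y, p in joint.items():
        out[y[axis]] = out.get(y[axis], 0) + p
    return out

def is_coupling(joint, p1, p2):
    """True if p1 and p2 are the first and second marginals of `joint`."""
    return marginal(joint, 0) == p1 and marginal(joint, 1) == p2

half, quarter = Fraction(1, 2), Fraction(1, 4)
p1 = {"a": half, "b": half}                    # hypothetical P_1 on E_1
p2 = {"a": quarter, "b": 3 * quarter}          # hypothetical P_2 on E_2

# Independence coupling: the product measure of p1 and p2.
independent = {(x, y): p1[x] * p2[y] for x, y in product(p1, p2)}

# A dependent coupling with the same marginals but mass 3/4 on the diagonal.
dependent = {("a", "a"): quarter, ("a", "b"): quarter, ("b", "b"): half}
```

Both `independent` and `dependent` pass `is_coupling`, which illustrates that the marginals alone do not determine the coupling.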
2.3 The Relation Between the two Formulations
The latter definition of coupling can be seen as a canonical version of the
former by the following identification: let $(E_i,\mathcal{E}_i,P_i)$ be the probability
spaces supporting the canonical copies of the individual random elements
$Y_i$, that is,
$$P_i = \mathbf{P}_i(Y_i \in \cdot), \quad i \in I,$$
and let $(\prod_{i\in I} E_i, \bigotimes_{i\in I}\mathcal{E}_i, \hat P)$ be the probability space supporting the
canonical copy of the coupling $(\hat Y_i : i \in I)$, that is,
$$\hat P = \hat{\mathbf{P}}((\hat Y_i : i \in I) \in \cdot).$$
Note that here we treat the expression $(\hat Y_i : i \in I)$ not as a collection
of individual random elements in $(E_i,\mathcal{E}_i)$, $i \in I$, but as a single random
element in $\bigotimes_{i\in I}(E_i,\mathcal{E}_i)$ defined by
$$(\hat Y_i : i \in I)(\hat\omega) := (\hat Y_i(\hat\omega) : i \in I), \quad \hat\omega \in \hat\Omega.$$
The distribution of this random element, $\hat{\mathbf{P}}((\hat Y_i : i \in I) \in \cdot)$, is the joint
distribution of the $\hat Y_i$, $i \in I$.
3 Extension Techniques
Finding random elements with particular properties (having a particular
joint distribution with those already introduced) can be an essential task
in constructing couplings. These new random elements are often brought
into existence by extension, by extending the underlying probability space.
In this section, and Sections 4 and 5, we give several ways of doing this.
Let us start by making precise what we mean by extension.
3.1 Extending the Underlying Probability Space - Definition
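A probability space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$ is an extension of another probability space
$(\Omega,\mathcal{F},\mathbf{P})$ if $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$ supports a random element $\xi$ in $(\Omega,\mathcal{F})$ having $\mathbf{P}$ as
distribution. If $Y$ is a random element in $(E,\mathcal{E})$ defined on $(\Omega,\mathcal{F},\mathbf{P})$, then
the random element $\hat Y$ defined on $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$ by
$$\hat Y(\hat\omega) = Y(\xi(\hat\omega)), \quad \hat\omega \in \hat\Omega, \quad\text{(see Figure 3.1)}$$
is a copy of $Y$, since for $A \in \mathcal{E}$,
$$\begin{aligned}
\hat{\mathbf{P}}(\hat Y \in A) &= \hat{\mathbf{P}}(\xi \in Y^{-1}A) \\
&= \mathbf{P}(Y^{-1}A) = \mathbf{P}(Y \in A).
\end{aligned}$$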
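[Diagram relating the spaces $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$, $(\Omega,\mathcal{F},\mathbf{P})$, and $(E,\mathcal{E})$.]
FIGURE 3.1. The original random element $\hat Y$ induced by $Y$.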
Say that $\hat Y$ is induced by $Y$ and call $\hat Y$ an original random element. Thus,
in particular, $\xi$ is the original random element induced by the canonical
random element on $(\Omega,\mathcal{F},\mathbf{P})$. In addition to the original random elements
(which we shall think of as 'old') the probability space $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$ may
support 'new' random elements not induced by random elements already
supported by $(\Omega,\mathcal{F},\mathbf{P})$. These 'new' random elements we shall call external.
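The definition of an extension can be made concrete on a finite space. The following sketch is an illustration under assumptions of our own choosing: the original space is a fair die, $Y$ is the parity of the roll, and the extension adds a coin whose bias depends on the roll, so that the extended measure is deliberately not a product measure.

```python
from fractions import Fraction

def distribution(prob, rv):
    """Distribution of the random element `rv` under the pmf `prob`."""
    out = {}
    for omega, p in prob.items():
        out[rv(omega)] = out.get(rv(omega), 0) + p
    return out

# Original space: one roll of a fair die, Y = parity of the roll.
P = {w: Fraction(1, 6) for w in range(1, 7)}
Y = lambda w: w % 2

# Hypothetical extension: outcomes (w, c) with an extra coin c whose bias
# depends on w, so P_hat is not a product measure.
P_hat = {}
for w in range(1, 7):
    P_hat[(w, "H")] = Fraction(1, 6) * Fraction(w, 7)
    P_hat[(w, "T")] = Fraction(1, 6) * Fraction(7 - w, 7)

xi = lambda w_hat: w_hat[0]                    # xi has distribution P
Y_hat = lambda w_hat: Y(xi(w_hat))             # original element induced by Y
coin = lambda w_hat: w_hat[1]                  # an external ('new') element
```

Here `Y_hat` is a copy of `Y` because `xi` has distribution `P`, while `coin` is external: it is not a function of `xi` and exists only on the extended space.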
Convention of the common probability space. When there is no risk
of confusion, we often extend the underlying probability space $(\Omega,\mathcal{F},\mathbf{P})$
without changing its name: after extending $(\Omega,\mathcal{F},\mathbf{P})$ to obtain $(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}})$
we rename the extension $(\Omega,\mathcal{F},\mathbf{P})$, and the induced $\hat Y$ we rename $Y$. This
identification of $\hat Y$ and $Y$ explains the term 'original' for $\hat Y$. This procedure
enables us to assume, when convenient, that all the random elements to be
considered in a certain context are defined on a common probability space,
which we then denote by $(\Omega,\mathcal{F},\mathbf{P})$. Call this the convention of the
common probability space. Typically, in probability theory, it is not the actual
underlying probability space that matters but the (joint) distributions of
the random elements under consideration.
It is, however, crucial that new random elements be introduced in a
consistent manner and, as the example in Section 10 of Chapter 1 shows, we
must be careful here. For the rest of this section, and in the next two, we
give several safe ways of introducing new random elements. But first we
consider at some length an extension that does not yield any new random
elements.
3.2 Reduction Extension - Deleting a Null Event
Let $(\Omega,\mathcal{F},\mathbf{P})$ be a probability space. An element $\omega \in \Omega$ is an outcome and
a set $A \in \mathcal{F}$ is an event. If $\mathbf{P}(A) = 0$, then $A$ is a null event. If $\mathbf{P}(A) = 1$,
then A is an almost sure (a.s.) event. Any statement that holds on an a.s.
event (that is, for all outcomes in the event) is an a.s. statement.
It is common practice to remove a null event (or a set contained in a null
event, an outer null set) from the underlying probability space, thereby
getting rid of some unpleasant outcomes (turning some a.s. statement into
a pointwise one). Although this is certainly a reduction of the space, it is
in fact an extension in the above sense of the word. This can be seen as
follows.
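Let $\hat\Omega$ be a subset of $\Omega$ of inner probability one, that is, containing an
almost sure event $A$. (Note that $\hat\Omega$ has inner probability one if and only if
its complement, which we are going to delete, is an outer null set.)
Reduction extension. Define $\hat{\mathcal{F}}$, $\hat{\mathbf{P}}$, and $\xi$ by
$$\begin{aligned}
\hat{\mathcal{F}} &:= \mathcal{F} \cap \hat\Omega := \{B \cap \hat\Omega : B \in \mathcal{F}\} \quad [\text{the trace of } \hat\Omega \text{ on } \mathcal{F}], \\
\hat{\mathbf{P}}(B \cap \hat\Omega) &:= \mathbf{P}(B), \quad B \in \mathcal{F}, \\
\xi(\hat\omega) &:= \hat\omega, \quad \hat\omega \in \hat\Omega.
\end{aligned} \tag{3.1}$$
Note that $\hat{\mathbf{P}}$ is well-defined on $\hat{\mathcal{F}}$ because if $B \cap \hat\Omega = B' \cap \hat\Omega$, then $B \cap A =
B' \cap A$, and thus
$$\mathbf{P}(B) = \mathbf{P}(B \cap A) = \mathbf{P}(B' \cap A) = \mathbf{P}(B').$$
The above reduction is an extension, since $\{\xi \in B\} = B \cap \hat\Omega$, and thus
$$\hat{\mathbf{P}}(\xi \in B) = \mathbf{P}(B), \quad B \in \mathcal{F},$$
that is, $\xi$ has distribution $\mathbf{P}$, as desired.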
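The well-definedness check behind (3.1) can be run by brute force on a finite space. This is a sketch under assumptions introduced for illustration: a hypothetical three-point space with the power set as its events and one outcome of probability zero. The helper `reduce_space` is our own name, not notation from the text.

```python
from fractions import Fraction
from itertools import combinations

def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = sorted(omega)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def reduce_space(events, prob, omega_hat):
    """Trace of `events` on omega_hat with P_hat(B & omega_hat) := P(B).
    Raises ValueError if two events with the same trace have different
    probabilities, i.e. if P_hat is not well-defined by (3.1)."""
    p_hat = {}
    for b in events:
        trace = b & omega_hat
        if p_hat.setdefault(trace, prob(b)) != prob(b):
            raise ValueError("P_hat is not well-defined")
    return p_hat

omega = frozenset({1, 2, 3})
weights = {1: Fraction(1, 3), 2: Fraction(2, 3), 3: Fraction(0)}  # {3} is null
prob = lambda b: sum((weights[w] for w in b), Fraction(0))
events = power_set(omega)

p_hat = reduce_space(events, prob, frozenset({1, 2}))  # delete the null event {3}

try:
    reduce_space(events, prob, frozenset({1, 3}))      # try deleting the nonnull {2}
    deleting_nonnull_fails = False
except ValueError:
    deleting_nonnull_fails = True
```

Deleting the null event {3} succeeds, whereas attempting to delete the nonnull event {2} makes two events with the same trace disagree, so `reduce_space` raises.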
We can remove a countable number of null events, since their union is a
null event, but not an uncountable number unless, of course, we know that
their union is a null event. It is also worth noting that deleting a null event
is measure dependent: a null event with respect to P need not be a null
event with respect to another probability measure Q unless, for instance,
Q has density with respect to P (which is the same as saying that Q is
absolutely continuous with respect to P, that is, the null events of P are
also null events of Q).
3.3 Reduction Extension — Deleting an Inner Null Set
We shall now show that the above reduction extension (3.1) can be
generalized to a set $\hat\Omega$ with outer probability one, that is, to a set $\hat\Omega \notin \mathcal{F}$ such
that
$$\mathbf{P}(A) = 1 \quad\text{for all } A \in \mathcal{F} \text{ that contain } \hat\Omega.$$
Note that $\hat\Omega$ has outer probability one if and only if its complement is
an inner null set, that is, if and only if the complement of $\hat\Omega$ has inner
probability zero:
$$\mathbf{P}(A) = 0 \quad\text{for all } A \in \mathcal{F} \text{ contained in the complement of } \hat\Omega.$$
Thus reduction to a set of outer probability one is the same as deleting an
inner null set. In order to show that this is allowed we must check that if
$(\Omega,\mathcal{F},\mathbf{P})$ is a probability space and $\hat\Omega$ is a subset of $\Omega$ with outer probability
one, then $\hat{\mathbf{P}}$ as defined at (3.1) is well-defined on $\hat{\mathcal{F}}$. This follows by noting
that if $B$ and $B'$ are two sets in $\mathcal{F}$ such that $B \cap \hat\Omega = B' \cap \hat\Omega$, then the
events $B \setminus B'$ and $B' \setminus B$ are both in the complement of $\hat\Omega$, and thus both
have probability zero, that is, $\mathbf{P}(B) = \mathbf{P}(B')$. Thus $B \cap \hat\Omega = B' \cap \hat\Omega$ implies
$\hat{\mathbf{P}}(B \cap \hat\Omega) = \hat{\mathbf{P}}(B' \cap \hat\Omega)$, that is, $\hat{\mathbf{P}}$ is well-defined.
The reduction extension cannot be generalized beyond sets $\hat\Omega$ of outer
probability one because the complement of a set that does not have outer
probability one contains a nonnull event, and thus deleting it would result
in loss of mass.
We must be careful when deleting two or more inner null sets because the
union of two inner null sets need not be an inner null set, as the following
trivial example shows. Consider $\mathcal{F} = \{\emptyset, \Omega\}$ where $\Omega$ has more than one
element; let $\mathbf{P}$ be the 0-1 measure. Then any nonempty proper subset $A$ of
$\Omega$ is an inner null set, since it contains only one element of $\mathcal{F}$, namely $\emptyset$,
which is a null event. But $\Omega \setminus A$ is also an inner null set. Thus if we first
delete $A$ and then $\Omega \setminus A$, we have deleted all of $\Omega$. The lesson to be drawn
from this example is that after deleting an inner null set the next subset to
be deleted must be an inner null set of the reduced space (in the example
$\Omega \setminus A$ is not a null set of $\{\emptyset, \Omega \setminus A\}$). This is never the case when the second
set is the complement of the first as in the above example.
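The trivial example above can be replayed in a few lines. This is a minimal sketch: the three-point set and the helper `inner_probability` are hypothetical names chosen for illustration, and the inner probability of a subset is computed as the largest probability of an event contained in it.

```python
def inner_probability(events, prob, subset):
    """Largest probability of an event contained in `subset`."""
    return max(prob[b] for b in events if b <= subset)

omega = frozenset({"a", "b", "c"})
empty = frozenset()
events = [empty, omega]                 # the trivial sigma-algebra
prob = {empty: 0, omega: 1}             # the 0-1 measure

A = frozenset({"a"})
inner_A = inner_probability(events, prob, A)
inner_rest = inner_probability(events, prob, omega - A)
inner_union = inner_probability(events, prob, A | (omega - A))

# After deleting A, the reduced space has events {empty, omega - A} with
# P_hat(B - A) := P(B) as at (3.1); there omega - A is no longer inner null.
reduced_events = [b - A for b in events]
reduced_prob = {b - A: prob[b] for b in events}
inner_rest_reduced = inner_probability(reduced_events, reduced_prob, omega - A)
```

Both `A` and its complement have inner probability zero in the original space, yet their union has inner probability one, and after deleting `A` the complement carries all the mass of the reduced space.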
Also, more care must be shown when deleting a single inner null set than
when deleting a single null event. For instance, if $Y$ is a random element
in $(E,\mathcal{E})$, $A \in \mathcal{E}$, and $\mathbf{P}(Y \in A) = 0$, then the event $\{Y \in A\}$ is a null
event and may be deleted. But what if $A \notin \mathcal{E}$ and $\mathbf{P}(Y \in B) = 0$ for
all $B \in \mathcal{E}$ contained in $A$? Then, in general, $\{Y \in A\}$ cannot be deleted,
because although $A$ is an inner null set with respect to $\mathbf{P}(Y \in \cdot)$, the
set $\{Y \in A\}$ need not have inner measure zero with respect to $\mathbf{P}$, as the
following trivial example shows. Let $\Omega = \{a, b\}$, $\mathcal{F} = \{\emptyset, \{a\}, \{b\}, \{a, b\}\}$,
and $\mathbf{P}(\{a\}) = \mathbf{P}(\{b\}) = \tfrac12$. Let $Y$ be the random element in $(\Omega, \{\emptyset, \Omega\})$
defined by $Y(a) = a$ and $Y(b) = b$. Then $A = \{a\}$ is an inner null set with
respect to the 0-1 measure $\mathbf{P}(Y \in \cdot)$ on $(\Omega, \{\emptyset, \Omega\})$. But $\{Y \in A\} = \{a\}$,
and thus $\{Y \in A\}$ is an event with positive probability $\mathbf{P}(Y \in A) = \tfrac12$,
that is, $\{Y \in A\}$ is not an inner null set with respect to $\mathbf{P}$.
However, although not allowed in general, we shall in the next subsection
consider an important case where $\{Y \in A\}$ can be deleted if $A$ is an inner
null set with respect to $\mathbf{P}(Y \in \cdot)$. This is the case when $(\Omega,\mathcal{F})$ is the product
of two measurable spaces and $Y$ is the projection on the second space. This
is a particularly important case because all our remaining extensions (all
the proper extensions, the product space extensions) yield a new random
element $Y$ satisfying this condition and reduction is often carried out at
the end of such an extension (see Remark 4.1 below).
3.4 Deleting an Inner Null Set of a Product Space Component
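Let $\mathbf{P}$ be a probability measure on the product space
$$(\Omega,\mathcal{F}) = (\Omega_1,\mathcal{F}_1) \otimes (\Omega_2,\mathcal{F}_2)$$
and let $Y$ be the random element in $(\Omega_2,\mathcal{F}_2)$ defined on $(\Omega,\mathcal{F})$ by
$$Y(\omega_1,\omega_2) = \omega_2 \quad [Y \text{ is the projection of } (\Omega,\mathcal{F}) \text{ on } (\Omega_2,\mathcal{F}_2)].$$
Let $G$ be a subset of $\Omega_2$ of outer probability one with respect to $\mathbf{P}(Y \in \cdot)$.
We shall show that the set $\{Y \notin G\} = \Omega_1 \times G^c$ can be deleted from $(\Omega,\mathcal{F})$.
[Since the reduction extension cannot be stretched beyond subsets $\hat\Omega$ of
outer probability one, this of course means implicitly that $\Omega_1 \times G$ has in
fact outer probability one with respect to $\mathbf{P}$.]
Put
$$\hat\Omega := \Omega_1 \times G \quad\text{and define } \hat{\mathcal{F}},\ \hat{\mathbf{P}} \text{ and } \xi \text{ as at (3.1)}.$$
We must prove that $\hat{\mathbf{P}}$ is well-defined on $\hat{\mathcal{F}}$. First consider product sets:
if $B_1$ and $B_1'$ are in $\mathcal{F}_1$, and $B_2$ and $B_2'$ are in $\mathcal{F}_2$, and $B_1 \times (B_2 \cap G) =
B_1' \times (B_2' \cap G)$, then $B_1 = B_1'$ and both $B_2 \setminus B_2'$ and $B_2' \setminus B_2$ are in the
complement of $G$, which implies
$$\mathbf{P}(Y \in B_2 \setminus B_2') = \mathbf{P}(Y \in B_2' \setminus B_2) = 0.$$
Hence
$$\begin{aligned}
|\mathbf{P}(B_1 \times B_2) - \mathbf{P}(B_1' \times B_2')|
&= |\mathbf{P}(B_1 \times (B_2 \setminus B_2')) - \mathbf{P}(B_1 \times (B_2' \setminus B_2))| \\
&\le \mathbf{P}(Y \in B_2 \setminus B_2') + \mathbf{P}(Y \in B_2' \setminus B_2) = 0.
\end{aligned}$$
Thus $B_1 \times (B_2 \cap G) = B_1' \times (B_2' \cap G)$ implies $\mathbf{P}(B_1 \times B_2) = \mathbf{P}(B_1' \times B_2')$, that
is, $\hat{\mathbf{P}}$ is well-defined on $\mathcal{F}_1 \times (\mathcal{F}_2 \cap G)$. Since $\mathbf{P}$ is a probability measure, it
follows from this that $\hat{\mathbf{P}}$ as defined at (3.1) is a well-defined probability
measure on the algebra of all finite unions of disjoint sets in $\mathcal{F}_1 \times (\mathcal{F}_2 \cap G)$. Thus,
by the Carathéodory extension theorem (see Ash (1972), Theorem 1.3.10),
$\hat{\mathbf{P}}$ extends from this algebra to a unique probability measure (denote it by
$\hat{\mathbf{P}}$) on $\hat{\mathcal{F}}$. Now,
$$\begin{aligned}
\hat{\mathbf{P}}(\xi \in B_1 \times B_2) &= \hat{\mathbf{P}}(B_1 \times (B_2 \cap G)) \\
&= \mathbf{P}(B_1 \times B_2), \quad B_1 \in \mathcal{F}_1,\ B_2 \in \mathcal{F}_2,
\end{aligned}$$
and thus $\xi$ has distribution $\mathbf{P}$ under $\hat{\mathbf{P}}$, as desired. This together with
$\{\xi \in A\} = A \cap \hat\Omega$ finally yields that $\hat{\mathbf{P}}$ satisfies $\hat{\mathbf{P}}(A \cap \hat\Omega) = \mathbf{P}(A)$ for all
$A \in \mathcal{F}$, that is, $\hat{\mathbf{P}}$ is well-defined on $\hat{\mathcal{F}}$ by (3.1).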
3.5 Independence Extension
In Chapters 1 and 2 we freely introduced new independent random elements
without changing the name of the previously introduced elements or the
probability measure. Although this could (together with deleting a null
event) be named the 'tool of the unconscious probabilist', it is in line with
the convention of the common probability space and is allowed due to the
following product measure theorem.
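Fact 3.1. Let $I$ be an arbitrary index set. For each $i \in I$ let $P_i$ be a
probability measure on a measurable space $(E_i,\mathcal{E}_i)$. Then there exists a unique
probability measure $\bigotimes_{i\in I} P_i$ on $\bigotimes_{i\in I}(E_i,\mathcal{E}_i)$ such that
$$\Big(\bigotimes_{i\in I} P_i\Big)\Big(\Big\{y \in \prod_{i\in I} E_i : y_{i_1} \in A_{i_1}, \dots, y_{i_n} \in A_{i_n}\Big\}\Big)
= P_{i_1}(A_{i_1}) \cdots P_{i_n}(A_{i_n})$$
for all integers $n > 0$, all $i_1, \dots, i_n \in I$, and all $A_{i_1} \in \mathcal{E}_{i_1}, \dots, A_{i_n} \in \mathcal{E}_{i_n}$.
For a proof, see Halmos (1950), Section 38, Theorem B and comment (2).
(When $I$ is countable, then Fact 3.1 is a consequence of Fact 4.3 below;
comment (2) in Halmos takes care of generalization to an arbitrary $I$.)
The measure $\bigotimes_{i\in I} P_i$ is called the product measure and
$$\bigotimes_{i\in I}(E_i,\mathcal{E}_i,P_i) := \Big(\prod_{i\in I} E_i,\ \bigotimes_{i\in I}\mathcal{E}_i,\ \bigotimes_{i\in I} P_i\Big)$$
the product probability space.
Suppose $(\Omega,\mathcal{F},\mathbf{P})$ is the probability space we wish to extend to support
independent random elements $Y_i$, $i \in I$, that are independent of the random
elements already supported by $(\Omega,\mathcal{F},\mathbf{P})$. If $Y_i$ is to be a random element
in $(E_i,\mathcal{E}_i)$ and to have distribution $P_i$, then this is achieved by putting
$$\begin{aligned}
(\hat\Omega,\hat{\mathcal{F}},\hat{\mathbf{P}}) &:= (\Omega,\mathcal{F},\mathbf{P}) \otimes \bigotimes_{i\in I}(E_i,\mathcal{E}_i,P_i), \\
\xi(\omega,y) &:= \omega, \quad \omega \in \Omega,\ y \in \prod_{i\in I} E_i, \\
Y_i(\omega,y) &:= y_i, \quad i \in I,\ \omega \in \Omega,\ y \in \prod_{j\in I} E_j.
\end{aligned}$$
Call an external random element $Y_i$ obtained in this way independent (it
is independent of the original random elements).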
3.6 Consistency Extension
If we wish to introduce dependent random elements, some restrictions are
needed. A measurable space $(E,\mathcal{E})$ is Polish if there exists a metric on $E$
such that $E$ is complete (each Cauchy sequence converges to a limit in $E$)
and separable ($E$ has a countable dense subset) and such that $\mathcal{E}$ is generated
by the open sets.
If our $(\Omega,\mathcal{F})$ is Polish, then we can introduce any collection of random
elements, provided that they take values in Polish spaces and that the
proposed finite-dimensional distributions are internally consistent and
consistent with $\mathbf{P}$. This is due to the Kolmogorov extension theorem.
Fact 3.2. For each $i$ in an arbitrary index set $I$ let $(E_i,\mathcal{E}_i)$ be Polish.
Assume that for each finite nonempty subset $J$ of $I$ we are given a probability
measure $P_J$ on $\bigotimes_{i\in J}(E_i,\mathcal{E}_i)$. Assume that the $P_J$ are consistent, that is,
if $K$ is a subset of $J$, then
$$P_J\Big(\Big\{(y_i : i \in J) \in \prod_{i\in J} E_i : (y_i : i \in K) \in A\Big\}\Big) = P_K(A), \quad A \in \bigotimes_{i\in K}\mathcal{E}_i.$$
Then there exists a unique probability measure $P$ on $\bigotimes_{i\in I}(E_i,\mathcal{E}_i)$ such that
for all $J$
$$P\Big(\Big\{(y_i : i \in I) \in \prod_{i\in I} E_i : (y_i : i \in J) \in A\Big\}\Big) = P_J(A), \quad A \in \bigotimes_{i\in J}\mathcal{E}_i.$$
For a proof, see Ash (1972), Section 4.4.3.
The extension of $(\Omega,\mathcal{F},\mathbf{P})$ is analogous to the one in the previous
subsection. We shall not use this extension much because the methods in the next
two sections fit our purposes better, in particular by not demanding that
$(\Omega,\mathcal{F})$ be Polish.
4 Conditioning — Transfer
In this section we consider an extension technique that is particularly
useful for coupling purposes. The idea is straightforward: if the value of some
random element is y, then we introduce a new random element with
conditional distribution depending on y. Let us first recall some properties of
conditional distributions before giving the restriction needed for this to be
allowed.
4.1 Conditional Distribution — Regularity — Probability Kernel
Let $(\Omega,\mathcal{F},\mathbf{P})$ be a probability space supporting the random variable $X$. If
$X$ is nonnegative, then the expectation (or expected value, or mean) of $X$
is
$$\mathbf{E}[X] := \int X\,d\mathbf{P} = \int X(\omega)\,\mathbf{P}(d\omega).$$
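If $X$ is not nonnegative, then the expectation of $X$ is
$$\mathbf{E}[X] := \mathbf{E}[X^+] - \mathbf{E}[X^-],$$
provided that either $\mathbf{E}[X^+] < \infty$ or $\mathbf{E}[X^-] < \infty$.
Let $\mathcal{G}$ be a sub-$\sigma$-algebra of $\mathcal{F}$. If $X$ has a well-defined expectation, then
the conditional expectation of $X$ given $\mathcal{G}$ is the a.s. unique $\mathcal{G}/\mathcal{B}([-\infty,\infty])$
measurable function $\mathbf{E}[X \mid \mathcal{G}]$ from $\Omega$ to $[-\infty,\infty]$ satisfying
$$\int_A \mathbf{E}[X \mid \mathcal{G}](\omega)\,\mathbf{P}(d\omega) = \int_A X(\omega)\,\mathbf{P}(d\omega), \quad A \in \mathcal{G},$$
that is,
$$\mathbf{E}[\mathbf{E}[X \mid \mathcal{G}]1_A] = \mathbf{E}[X 1_A], \quad A \in \mathcal{G}.$$
Here a.s. unique means that two functions with this property are a.s.
identical; they are called versions of $\mathbf{E}[X \mid \mathcal{G}]$.
If $Y_1$ is a random element in $(E_1,\mathcal{E}_1)$ supported by $(\Omega,\mathcal{F},\mathbf{P})$, then the
conditional expectation of $X$ given $Y_1$ is the a.s. unique $\sigma(Y_1)/\mathcal{B}([-\infty,\infty])$
measurable function
$$\mathbf{E}[X \mid Y_1] := \mathbf{E}[X \mid \sigma(Y_1)]$$
while the conditional expectation of $X$ given the value of $Y_1$ is the a.s.
unique $\mathcal{E}_1/\mathcal{B}([-\infty,\infty])$ measurable function $\mathbf{E}[X \mid Y_1 = \cdot]$ satisfying
$$\int_B \mathbf{E}[X \mid Y_1 = y]\,\mathbf{P}(Y_1 \in dy) = \mathbf{E}[X 1_{\{Y_1 \in B\}}], \quad B \in \mathcal{E}_1.$$
An $\mathcal{E}_1/\mathcal{B}([-\infty,\infty])$ measurable function $h$ is a version of $\mathbf{E}[X \mid Y_1 = \cdot]$ if
and only if $h(Y_1)$ is a version of $\mathbf{E}[X \mid Y_1]$.
The conditional probability of an event $A \in \mathcal{F}$ given $\mathcal{G}$, given $Y_1$, or given
the value of $Y_1$, is
$$\begin{aligned}
\mathbf{P}(A \mid \mathcal{G}) &:= \mathbf{E}[1_A \mid \mathcal{G}], \\
\mathbf{P}(A \mid Y_1) &:= \mathbf{E}[1_A \mid Y_1], \\
\mathbf{P}(A \mid Y_1 = \cdot) &:= \mathbf{E}[1_A \mid Y_1 = \cdot],
\end{aligned}$$
respectively.
If $Y_2$ is a random element in $(E_2,\mathcal{E}_2)$ supported by $(\Omega,\mathcal{F},\mathbf{P})$ and we
pick a particular version of $\mathbf{P}(Y_2 \in B \mid Y_1 = \cdot)$ for each $B \in \mathcal{E}_2$, then
$\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ is the conditional distribution of $Y_2$ given the value of $Y_1$.
Another pick results in a different version of $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$. The random
element $Y_2$ is conditionally independent of another random element $Y_0$ given
$Y_1$ if $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ is a version of $\mathbf{P}(Y_2 \in \cdot \mid (Y_1, Y_0) = (\cdot,\cdot))$.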
If we consider $\mathbf{P}(Y_2 \in B \mid Y_1 = y)$ as a function of $B$ keeping $y$ fixed,
then it need not be a probability measure for all $y$. If $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = y)$ is
a probability measure for each $y \in E_1$, then we say that the conditional
distribution $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ is regular. Obviously there exists a regular
version of $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ if $Y_1$ is discrete, that is, if $E_1$ is finite or
countable and $\mathcal{E}_1$ the power set of $E_1$ containing all its subsets. A more
useful condition is given by the following theorem, where the condition is
placed on $Y_2$ rather than $Y_1$.
Two measurable spaces $(E,\mathcal{E})$ and $(G,\mathcal{G})$ are Borel equivalent (or Borel
isomorphic) if there exists a bijection (that is, an invertible mapping) $f$
from $E$ to $G$ such that $f$ is $\mathcal{E}/\mathcal{G}$ measurable and its inverse $f^{-1}$ is $\mathcal{G}/\mathcal{E}$
measurable. The bijection $f$ is a Borel equivalence. A measurable space
$(E,\mathcal{E})$ is a standard space if it is Borel equivalent to $(G,\mathcal{G})$, where $G$ is a
Borel subset of $[0,1]$ and $\mathcal{G}$ are the Borel subsets of $G$.
Fact 4.1. There exists a regular version of $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ if $(E_2,\mathcal{E}_2)$ is
a standard space. Any Polish space is a standard space.
For a proof, see Ash (1972), Theorem 6.6.5 and Problem 8 in Section 4.4
(and the solution on pages 442-443).
A function $Q(\cdot,\cdot)$ from $E_1 \times \mathcal{E}_2$ to $[0,1]$ is an $((E_1,\mathcal{E}_1),(E_2,\mathcal{E}_2))$ probability
kernel if
$$Q(\cdot,A) \text{ is } \mathcal{E}_1/\mathcal{B}([0,1]) \text{ measurable for each } A \in \mathcal{E}_2 \text{ and}$$
$$Q(y,\cdot) \text{ is a probability measure on } (E_2,\mathcal{E}_2) \text{ for each } y \in E_1.$$
Thus $\mathbf{P}(Y_2 \in \cdot \mid Y_1 = \cdot)$ is regular if and only if it is a probability kernel.
4.2 Conditioning In
Suppose we need the existence of a random element $Y_2$ having a particular
conditional distribution given the value of another random element $Y_1$.
Then the underlying probability space can be extended to support $Y_2$,
provided that the proposed conditional distribution is regular, that is, provided
that it is a probability kernel. This is due to the following theorem.
Fact 4.2. Let $P_1$ be a probability measure on $(E_1,\mathcal{E}_1)$ and let $Q(\cdot,\cdot)$ be an
$((E_1,\mathcal{E}_1),(E_2,\mathcal{E}_2))$ probability kernel. Then there exists a unique probability
measure $P$ on $(E_1,\mathcal{E}_1) \otimes (E_2,\mathcal{E}_2)$ such that
$$P(A_1 \times A_2) = \int_{A_1} Q(y_1, A_2)\,P_1(dy_1), \qquad A_1 \in \mathcal{E}_1,\ A_2 \in \mathcal{E}_2.$$
For a proof, see Ash (1972), Section 2.6.2.
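On finite spaces the integral in Fact 4.2 is a finite sum, so the measure $P$ can be written down explicitly. The following is a minimal sketch under that assumption; the spaces, the measure `P1`, the kernel `Q`, and the helper names `joint` and `measure` are all hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

# Hypothetical finite example: E1 = {"a", "b"}, E2 = {0, 1, 2}.
P1 = {"a": F(1, 4), "b": F(3, 4)}          # probability measure on E1
Q = {                                       # kernel: Q[y1] is a probability measure on E2
    "a": {0: F(1, 2), 1: F(1, 2), 2: F(0)},
    "b": {0: F(1, 3), 1: F(1, 3), 2: F(1, 3)},
}

def joint(P1, Q):
    """Point masses of P, where P(A1 x A2) = sum over y1 in A1 of Q(y1, A2) P1(y1)."""
    return {(y1, y2): P1[y1] * q for y1 in P1 for y2, q in Q[y1].items()}

def measure(P, A1, A2):
    """P(A1 x A2) for finite sets A1 and A2."""
    return sum(w for (y1, y2), w in P.items() if y1 in A1 and y2 in A2)

P = joint(P1, Q)
total = sum(P.values())                          # total mass of P
first_marginal_a = measure(P, {"a"}, {0, 1, 2})  # recovers P1({"a"})
rect = measure(P, {"a", "b"}, {0})               # P(E1 x {0})
```

Dividing `measure(P, {y1}, A2)` by `P1[y1]` recovers `Q(y1, A2)`, which is the discrete form of the statement that $Q$ is the conditional distribution of the second coordinate given the first.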
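We can now condition in a new random element as follows. Let $Y_1$ be
a random element in $(E_1,\mathcal{E}_1)$ defined on $(\Omega,\mathcal{F},P)$ and let $Q(\cdot,\cdot)$ be an
$((E_1,\mathcal{E}_1),(E_2,\mathcal{E}_2))$ probability kernel. Note that
  $Q(Y_1(\cdot),\cdot)$ is an $((\Omega,\mathcal{F}),(E_2,\mathcal{E}_2))$ probability kernel.
Conditioning extension. Define $(\hat\Omega,\hat{\mathcal{F}})$, $\hat P$, $\xi$, $\hat Y_1$, and $Y_2$ by
$$(\hat\Omega,\hat{\mathcal{F}}) := (\Omega,\mathcal{F}) \otimes (E_2,\mathcal{E}_2),$$
$$\hat P(A \times B) := \int_A Q(Y_1(\omega),B)\,P(d\omega), \qquad A \in \mathcal{F},\ B \in \mathcal{E}_2,$$
$$\xi(\omega,y) := \omega, \qquad \omega \in \Omega,\ y \in E_2,$$
$$\hat Y_1(\omega,y) := Y_1(\omega), \qquad \omega \in \Omega,\ y \in E_2,$$
$$Y_2(\omega,y) := y, \qquad \omega \in \Omega,\ y \in E_2.$$
Theorem 4.1. The conditional distribution of $Y_2$ given $\hat Y_1$ is $Q(\hat Y_1,\cdot)$ or,
equivalently, for $y \in E_1$,
  the conditional distribution of $Y_2$ given $\hat Y_1 = y$ is $Q(y,\cdot)$.
Moreover, if $Y_0$ is a random element defined on $(\Omega,\mathcal{F},P)$ and $\hat Y_0$ its induced
copy, then
  $Y_2$ is conditionally independent of $\hat Y_0$ given $\hat Y_1$.
Proof. With $B \in \mathcal{E}_2$ and $A \in \mathcal{E}_1 \otimes \mathcal{E}_0$,
$$\hat P(Y_2 \in B, (\hat Y_1,\hat Y_0) \in A) = \hat P(\{(Y_1,Y_0) \in A\} \times B) = E[1_{\{(Y_1,Y_0) \in A\}} Q(Y_1,B)].$$
Thus [since $(\hat Y_1,\hat Y_0)$ is a copy of $(Y_1,Y_0)$]
$$\hat P(Y_2 \in B, (\hat Y_1,\hat Y_0) \in A) = \hat E[1_{\{(\hat Y_1,\hat Y_0) \in A\}} Q(\hat Y_1,B)],$$
that is, $Q(\hat Y_1,\cdot)$ is a version of $\hat P(Y_2 \in \cdot \mid \hat Y_1,\hat Y_0)$, which yields the desired
results. □
Call an external random element $Y_2$ obtained by the conditioning extension
conditional.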
4.3 Conditioning in Countably Many Times
It is clear that the conditioning extension can be repeated finitely many
times. It is not as clear, however, that it can be repeated countably many
times, but this is in fact true due to the Ionescu Tulcea theorem.
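Fact 4.3. Let $(E_1,\mathcal{E}_1), (E_2,\mathcal{E}_2), \dots$ be a sequence of measurable spaces. Let
$P_1$ be a probability measure on $(E_1,\mathcal{E}_1)$ and let, for $2 \le n < \infty$,
  $Q_n(\cdot,\cdot)$ be an $\bigl(\bigotimes_{1 \le i < n}(E_i,\mathcal{E}_i),\,(E_n,\mathcal{E}_n)\bigr)$ probability kernel.
Then there exists a unique probability measure $P$ on $\bigotimes_{1 \le i < \infty}(E_i,\mathcal{E}_i)$ such
that
$$P\Bigl(A_1 \times \cdots \times A_n \times \prod_{n < i < \infty} E_i\Bigr)
 = \int_{A_1}\Bigl(\int_{A_2}\Bigl(\cdots\Bigl(\int_{A_n} Q_n((y_1,\dots,y_{n-1}),dy_n)\Bigr)\cdots\Bigr)Q_2(y_1,dy_2)\Bigr)P_1(dy_1)$$
for all finite $n \ge 2$ and all $A_1 \in \mathcal{E}_1, \dots, A_n \in \mathcal{E}_n$.
For a proof, see Ash (1972), Section 2.7.2.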
4.4 On Conditional Independence
Conditional independence comes up naturally when conditioning in a new
random element, and since it can be quite tricky to handle, we shall have
a look here at some of its basic properties.
Let $Y_0, Y_1, \dots$ be random elements in $(E_0,\mathcal{E}_0), (E_1,\mathcal{E}_1), \dots$, respectively,
defined on a probability space $(\Omega,\mathcal{F},P)$. Interpret the statements below
about $P(\cdot \mid Y_0)$ to mean that there is a version of $P(\cdot \mid Y_0)$ such that the
statements hold.
Say that $Y_1$ and $Y_2$ are conditionally independent given $Y_0$ if
$$P(Y_1 \in \cdot, Y_2 \in \cdot \mid Y_0) = P(Y_1 \in \cdot \mid Y_0)\,P(Y_2 \in \cdot \mid Y_0), \tag{4.1}$$
that is, for all bounded $\mathcal{E}_i/\mathcal{B}$ measurable functions $f_i$, $i = 0,1,2$, it holds
that
$$E[f_0(Y_0)f_1(Y_1)f_2(Y_2)] = E[f_0(Y_0)\,E[f_1(Y_1) \mid Y_0]\,E[f_2(Y_2) \mid Y_0]]. \tag{4.2}$$
This is equivalent to
$$P(Y_2 \in \cdot \mid Y_0, Y_1) = P(Y_2 \in \cdot \mid Y_0), \tag{4.3}$$
that is, for all bounded $\mathcal{E}_i/\mathcal{B}$ measurable functions $f_i$, $i = 0,1,2$, it holds
that
$$E[f_0(Y_0)f_1(Y_1)f_2(Y_2)] = E[f_0(Y_0)f_1(Y_1)\,E[f_2(Y_2) \mid Y_0]]. \tag{4.4}$$
In order to establish this equivalence note that the left-hand sides of (4.2)
and (4.4) coincide, and so do the right-hand sides, since
$$E[f_0(Y_0)\,E[f_1(Y_1) \mid Y_0]\,E[f_2(Y_2) \mid Y_0]]
 = E[E[f_0(Y_0)f_1(Y_1)\,E[f_2(Y_2) \mid Y_0] \mid Y_0]]
 = E[f_0(Y_0)f_1(Y_1)\,E[f_2(Y_2) \mid Y_0]].$$
Due to (4.3), we also say (as in Section 4.1) that $Y_2$ is conditionally
independent of $Y_1$ given $Y_0$, or that $Y_2$ depends on $Y_1$ only through $Y_0$. It is
clear from (4.1) that conditional independence is symmetric in $Y_1$ and $Y_2$.
Note that in (4.4) we may replace $f_0(Y_0)f_1(Y_1)$ by $g(Y_0,Y_1)$, where $g$ is any
bounded $\mathcal{E}_0 \otimes \mathcal{E}_1/\mathcal{B}$ measurable function.
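For random elements taking finitely many values, (4.1) reduces to the identity $P(y_0,y_1,y_2)\,P(y_0) = P(y_0,y_1)\,P(y_0,y_2)$ for all values, which can be checked mechanically. The sketch below assumes binary-valued $Y_0, Y_1, Y_2$; the function `cond_indep` and both example distributions are hypothetical illustrations, not part of the text.

```python
from fractions import Fraction as F
from itertools import product

def cond_indep(p, values=(0, 1)):
    """Check (4.1) for a finite joint pmf p[(y0, y1, y2)] via
    P(y0, y1, y2) * P(y0) == P(y0, y1) * P(y0, y2) for all value triples."""
    def marg(keep):
        m = {}
        for y, w in p.items():
            k = tuple(y[i] for i in keep)
            m[k] = m.get(k, 0) + w
        return m
    p0, p01, p02 = marg([0]), marg([0, 1]), marg([0, 2])
    return all(
        p.get((y0, y1, y2), 0) * p0.get((y0,), 0)
        == p01.get((y0, y1), 0) * p02.get((y0, y2), 0)
        for y0, y1, y2 in product(values, repeat=3)
    )

def bern(q):
    """Law of a 0-1 variable with success probability q."""
    return {1: q, 0: 1 - q}

# Y1 and Y2 are drawn independently given Y0: conditionally independent by construction.
q1 = {0: bern(F(1, 4)), 1: bern(F(3, 4))}
q2 = {0: bern(F(1, 3)), 1: bern(F(2, 3))}
indep = {(y0, y1, y2): F(1, 2) * q1[y0][y1] * q2[y0][y2]
         for y0, y1, y2 in product((0, 1), repeat=3)}

# Y1, Y2 independent fair bits and Y0 = Y1 xor Y2: dependent once Y0 is known.
dep = {(y1 ^ y2, y1, y2): F(1, 4) for y1, y2 in product((0, 1), repeat=2)}
```

Here `cond_indep(indep)` holds while `cond_indep(dep)` fails, although in the second example $Y_1$ and $Y_2$ are unconditionally independent.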
Lemma 4.1. The statement
  $Y_3$ depends on $Y_2$ only through $(Y_1,Y_0)$
  and on $Y_1$ only through $Y_0$                                        (4.5)
is equivalent to the statement
  $Y_3$ depends on $(Y_2,Y_1)$ only through $Y_0$.                       (4.6)
Proof. If (4.5) holds then
$$P(Y_3 \in \cdot \mid Y_2, Y_1, Y_0) = P(Y_3 \in \cdot \mid Y_1, Y_0) = P(Y_3 \in \cdot \mid Y_0),$$
that is, (4.6) holds. Conversely, if (4.6) holds then so does the latter part
of (4.5) since
$$P(Y_3 \in \cdot \mid Y_1, Y_0) = E[P(Y_3 \in \cdot \mid Y_2, Y_1, Y_0) \mid Y_1, Y_0] = P(Y_3 \in \cdot \mid Y_0),$$
which in turn yields the second equality in
$$P(Y_3 \in \cdot \mid Y_2, Y_1, Y_0) = P(Y_3 \in \cdot \mid Y_0) = P(Y_3 \in \cdot \mid Y_1, Y_0),$$
that is, the first part of (4.5) holds also. □
A random element $Y_1$ and an event $A$ are conditionally independent given
a random element $Y_0$ if $Y_1$ and $1_A$ are conditionally independent given $Y_0$.
This implies that $Y_1$ and $A^c$ are conditionally independent given $Y_0$.
Two random elements $Y_1$ and $Y_2$ are conditionally independent given an
event $A$, $P(A) > 0$, if
$$P(Y_1 \in \cdot, Y_2 \in \cdot \mid A) = P(Y_1 \in \cdot \mid A)\,P(Y_2 \in \cdot \mid A).$$
Note that this does not imply that $Y_1$ and $Y_2$ are conditionally independent
given $A^c$.
Conditional independence extends to more than two random elements as
follows: $Y_1, \dots, Y_n$ are conditionally independent given $Y_0$ if
$$P(Y_1 \in \cdot, \dots, Y_n \in \cdot \mid Y_0) = P(Y_1 \in \cdot \mid Y_0) \cdots P(Y_n \in \cdot \mid Y_0).$$
This is equivalent to any subfamily $(Y_i : i \in I)$ being conditionally
independent of the rest $(Y_i : i \notin I)$ given $Y_0$. Countably many random elements
$Y_1, Y_2, \dots$ are conditionally independent given $Y_0$ if $Y_1, \dots, Y_n$ are so for
each $n$.
Finally, $Y_1, \dots, Y_n$ (or $Y_1, Y_2, \dots$) are conditionally i.i.d. given $Y_0$ if they
are conditionally independent given $Y_0$ and the conditional distribution of
$Y_i$ given the value of $Y_0$ has a version that does not depend on $i$.
4.5 Typical Application - Transfer
The conditioning extension is often applied in the following situation. Let $Y_1$
be a random element in $(E_1,\mathcal{E}_1)$ defined on $(\Omega,\mathcal{F},P)$ and suppose we have
managed to construct a pair $(Y_1',Y_2')$ on some probability space $(\Omega',\mathcal{F}',P')$
where $Y_2'$ is a random element in some measurable space $(E_2,\mathcal{E}_2)$ and $Y_1'$
is a random element in $(E_1,\mathcal{E}_1)$ such that
$$Y_1' \overset{D}{=} Y_1.$$
Further, suppose there exists a regular version $Q(\cdot,\cdot)$ of the conditional
distribution of $Y_2'$ given $Y_1'$ (according to Fact 4.1 this holds in particular
when $(E_2,\mathcal{E}_2)$ is Polish, or standard, which is the main reason why we
sometimes assume Polishness). Then we can transfer $Y_2'$ to $(\Omega,\mathcal{F},P)$ as
follows.
Theorem 4.2. With $(Y_1',Y_2')$ as above the conditioning extension yields a
random element $Y_2$ such that
$$(Y_1,Y_2) \overset{D}{=} (Y_1',Y_2'),$$
and given $Y_1$ the external random element $Y_2$ is conditionally independent
of any original random element $Y_0$. This transfer procedure can be repeated
countably many times.
Proof. This follows immediately from Theorem 4.1 except the final claim,
which is due to Fact 4.3. □
Thus if we are working with $Y_0$ and $Y_1$ defined on $(\Omega,\mathcal{F},P)$, then by the
common probability space convention we could have taken $(\Omega,\mathcal{F},P)$ large
enough to support a random element $Y_2$ such that $(Y_1,Y_2)$ is a copy of
$(Y_1',Y_2')$ and such that, given $Y_1$, the random element $Y_2$ is conditionally
independent of $Y_0$. Call such an external random element $Y_2$ transferred
($Y_2'$ has been 'transferred' from $(\Omega',\mathcal{F}',P')$ to $(\Omega,\mathcal{F},P)$).
We give an application of this transfer approach in Sections 5, 6, and 10,
and then repeatedly in the next chapters.
Remark 4.1. In the next chapters (see Section 2.12 of Chapter 4) we will
have use for the following immediate consequence of the reduction result
in Section 3.4 above [which is applicable because $Y_2$ is the projection of
$(\Omega,\mathcal{F}) \otimes (E_2,\mathcal{E}_2)$ on $(E_2,\mathcal{E}_2)$]:
  If $Y_2'$ takes values in a subset $G$ of $E_2$ or, more generally,
  if $G$ has outer measure one with respect to $P'(Y_2' \in \cdot)$,
  then we may assume that $Y_2$ takes values in $G$.
This can be a useful observation because we may need a random element
$Y_2$ in a space $(G,\mathcal{G})$, and although there may be no regular version of the
conditional distribution of $Y_2'$ given $Y_1'$ when $Y_2'$ is regarded as a random
element in $(G,\mathcal{G})$, one may exist when $Y_2'$ is regarded as a random element
in a larger space $(E_2,\mathcal{E}_2)$, where $E_2$ is a set containing $G$, and $\mathcal{E}_2$ is such
that $\mathcal{G} = \mathcal{E}_2 \cap G$. Typically, in applications $G$ is not an element of $\mathcal{E}_2$.
5 Splitting
Consider the following example. Let $X$ be a continuous random variable
with density $f$ and suppose
  $f \ge pg$, where $0 < p < 1$ and $g$ is a density.
We would then like to say that $X$ has density $g$ with probability $p$. But
how are we going to tell whether $X$ is governed by $g$ or not?
We can do so if $g$ and $h := (f - pg)/(1 - p)$ have disjoint supports, that
is, if there is a Borel set $B$ such that $g = 0$ outside $B$ and $h = 0$ on $B$.
When $X$ is in $B$ then $X$ has density $g$ (see Figure 5.1).
FIGURE 5.1. If $X$ falls on the left-hand side (in the set $B$), then it is governed by $g$.
More generally, we can tell whether $X$ is governed by the density $g$ or not
whenever $X$ can be split as follows: there are random variables $I$, $V$, and
$W$ such that
$$X = IV + (1 - I)W,$$
where $I$ is a 0-1 variable with $P(I = 1) = p$ and, conditionally on $I = 1$,
the variable $V$ has density $g$. If $I = 1$, then $X = V$, so $X$ has density $g$.
When this is not the case (see Figure 5.2), we cannot
tell whether $X$ is governed by $g$ or not unless we extend the underlying
probability space, bringing into existence such a 0-1 variable $I$. We do this
below for general random elements and then prove a more general splitting
result.
FIGURE 5.2. When is $N(0,1)$ uniform on $[-1,1]$?
5.1 Splitting Indicator
Let $Y$ be a random element in a measurable space $(E,\mathcal{E})$ defined on a
probability space $(\Omega,\mathcal{F},P)$. Let $\nu$ be a subprobability measure on $(E,\mathcal{E})$
with mass $\|\nu\| = \nu(E)$. Suppose $0 < \|\nu\| < 1$ and $\nu$ is a component or part
of the distribution of $Y$, that is,
  $P(Y \in \cdot) \ge \nu$ (short for $P(Y \in A) \ge \nu(A)$ for all $A \in \mathcal{E}$).
Let $I'$, $V'$, $W'$ be independent random elements defined on some probability
space $(\Omega',\mathcal{F}',P')$ with distributions
$$P'(I' = 1) = \|\nu\| \quad\text{and}\quad P'(I' = 0) = 1 - \|\nu\|,$$
$$P'(V' \in \cdot) = \nu/\|\nu\|,$$
$$P'(W' \in \cdot) = (P(Y \in \cdot) - \nu)/(1 - \|\nu\|).$$
Then the random element $Y'$ defined by
$$Y' := \begin{cases} V' & \text{if } I' = 1, \\ W' & \text{if } I' = 0, \end{cases}$$
certainly is a copy of $Y$. Call $Y'$ a splitting representation of $Y$.
Due to Theorem 4.2 and $Y' \overset{D}{=} Y$ we can now transfer $I'$ to $(\Omega,\mathcal{F},P)$ to
obtain a 0-1 variable $I$ such that $(Y,I) \overset{D}{=} (Y',I')$, that is,
$$P(I = 1) = \|\nu\| \quad\text{and}\quad P(Y \in \cdot \mid I = 1) = \nu/\|\nu\|.$$
Call $I$ a splitting indicator.
After this splitting extension we can tell when $Y$ is governed by $\nu/\|\nu\|$:
when $I = 1$, then the conditional distribution of $Y$ is $\nu/\|\nu\|$. Moreover, if
$Y_1$ is another random element that was supported by $(\Omega,\mathcal{F},P)$ before the
splitting, then $Y_1$ is conditionally independent of $I$ given $Y$. In particular,
if $Y_1$ is independent of $Y$, then $Y_1$ is also independent of $(Y,I)$.
Due to Theorem 4.2, splitting can be repeated countably many times.
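When $Y$ is discrete, the transferred indicator can be described explicitly: given $Y = y$, set $I = 1$ with probability $\nu(\{y\})/P(Y = y)$. The sketch below checks the two defining properties exactly for a small example. It is only an illustration under the assumption of a finite state space; the pmf `p`, the component `nu`, and the helper names are hypothetical.

```python
from fractions import Fraction as F
import random

p = {0: F(1, 2), 1: F(1, 3), 2: F(1, 6)}    # distribution of Y (hypothetical)
nu = {0: F(1, 4), 1: F(1, 4), 2: F(0)}      # component: nu(y) <= p(y) for every y
mass = sum(nu.values())                      # ||nu||

def indicator_prob(y):
    """P(I = 1 | Y = y) = nu(y) / p(y), the discrete form of the transfer kernel."""
    return nu[y] / p[y]

def split(y, rng=random):
    """Attach a splitting indicator to an observed value y of Y."""
    return 1 if rng.random() < indicator_prob(y) else 0

# Exact joint law of (Y, I) induced by the rule above.
joint = {(y, i): p[y] * (indicator_prob(y) if i else 1 - indicator_prob(y))
         for y in p for i in (0, 1)}
p_I1 = sum(w for (y, i), w in joint.items() if i == 1)
law_given_I1 = {y: joint[(y, 1)] / p_I1 for y in p}
```

The value `p_I1` equals `mass`, and `law_given_I1` equals `nu` normalized by its mass. The point of the construction is that `split` is applied to $Y$ itself, so $Y$ is kept rather than replaced by a copy.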
5.2 Generalization Beyond a Single Component
Let $\nu_1, \nu_2, \dots$ be subprobability measures on $(E,\mathcal{E})$ and suppose
$$P(Y \in \cdot) = \nu_1 + \nu_2 + \cdots.$$
Then clearly an analogous argument to the one in the previous subsection
yields an extension of $(\Omega,\mathcal{F},P)$ supporting a nonnegative integer-valued
random variable $K$ (a splitting variable) that tells us which of the
components $\nu_i$, $1 \le i < \infty$, governs the distribution of $Y$, that is,
$$P(K = i) = \|\nu_i\|,$$
$$P(Y \in \cdot \mid K = i) = \nu_i/\|\nu_i\| \quad (\text{arbitrary when } \|\nu_i\| = 0).$$
More generally, let $Y_1$ and $Y_2$ be random elements in $(E_1,\mathcal{E}_1)$ and $(E_2,\mathcal{E}_2)$,
respectively, defined on a probability space $(\Omega,\mathcal{F},P)$. Let $\mu$ be a
probability measure on a Polish space $(E_3,\mathcal{E}_3)$, let $\nu(\cdot,\cdot)$ be an $((E_3,\mathcal{E}_3),(E_2,\mathcal{E}_2))$
probability kernel, and suppose
$$P(Y_2 \in A) = \int \nu(y,A)\,\mu(dy), \qquad A \in \mathcal{E}_2.$$
Then $(\Omega,\mathcal{F},P)$ can be extended to support a random element $Y_3$ (a splitting
element) in $(E_3,\mathcal{E}_3)$ having distribution $\mu$ and such that
$$P(Y_2 \in \cdot \mid Y_3 = y) = \nu(y,\cdot), \qquad y \in E_3.$$
Furthermore, $Y_1$ is conditionally independent of $Y_3$ given $Y_2$, and in
particular, if $Y_1$ is independent of $Y_2$, then $Y_1$ is independent of $(Y_2,Y_3)$.
This generalized splitting can be repeated countably many times. It is a
special case of Theorem 5.1 below (take $Y_0$ nonrandom).
5.3 Conditional Splitting
In the next section we shall need to be able to split conditionally on the
value of a random element Y0.
Theorem 5.1. Let $Y_0$, $Y_1$, and $Y_2$ be random elements in $(E_0,\mathcal{E}_0)$, $(E_1,\mathcal{E}_1)$,
and $(E_2,\mathcal{E}_2)$, respectively, supported by a probability space $(\Omega,\mathcal{F},P)$. Let
$(E_3,\mathcal{E}_3)$ be a Polish space and $\mu(\cdot,\cdot)$ be an $((E_0,\mathcal{E}_0),(E_3,\mathcal{E}_3))$ probability
kernel. Let $\nu(\cdot,\cdot)$ be an $((E_0,\mathcal{E}_0) \otimes (E_3,\mathcal{E}_3),(E_2,\mathcal{E}_2))$ probability kernel and
suppose, for $y_0 \in E_0$ and $A \in \mathcal{E}_2$, that
$$P(Y_2 \in A \mid Y_0 = y_0) = \int \nu((y_0,y_3),A)\,\mu(y_0,dy_3). \tag{5.1}$$
Then $(\Omega,\mathcal{F},P)$ can be extended to support a random element $Y_3$ in $(E_3,\mathcal{E}_3)$
such that for $y_0 \in E_0$ and $y_3 \in E_3$,
$$P(Y_3 \in \cdot \mid Y_0 = y_0) = \mu(y_0,\cdot),$$
$$P(Y_2 \in \cdot \mid Y_0 = y_0, Y_3 = y_3) = \nu((y_0,y_3),\cdot). \tag{5.2}$$
Moreover, $Y_1$ is conditionally independent of $Y_3$ given $(Y_0,Y_2)$, and in
particular, if $Y_1$ is independent of $(Y_0,Y_2)$, then $Y_1$ is independent of $(Y_0,Y_2,Y_3)$.
Conditional splitting can be repeated countably many times.
Proof. Due to Fact 4.2, there is a probability space $(\Omega',\mathcal{F}',P')$ supporting
random elements $Y_0'$, $Y_2'$, and $Y_3'$ such that for $A_0 \in \mathcal{E}_0$, $A_2 \in \mathcal{E}_2$, and
$A_3 \in \mathcal{E}_3$,
$$P'(Y_0' \in A_0, Y_2' \in A_2, Y_3' \in A_3)
 = \int_{A_0}\Bigl(\int_{A_3} \nu((y_0,y_3),A_2)\,\mu(y_0,dy_3)\Bigr)P(Y_0 \in dy_0). \tag{5.3}$$
Take $A_3 = E_3$ and compare with (5.1) to see that $(Y_0',Y_2')$ is a copy of
$(Y_0,Y_2)$. Since $(E_3,\mathcal{E}_3)$ is Polish, this allows us to transfer (Theorem 4.2)
$Y_3'$ to $(\Omega,\mathcal{F},P)$ to obtain a random element $Y_3$ such that $(Y_0,Y_2,Y_3)$ is a
copy of $(Y_0',Y_2',Y_3')$. Thus
$$P(Y_2 \in \cdot \mid Y_0 = \cdot, Y_3 = \cdot) = P'(Y_2' \in \cdot \mid Y_0' = \cdot, Y_3' = \cdot).$$
Due to (5.3), the right-hand side equals $\nu((\cdot,\cdot),\cdot)$, and (5.2) follows.
Theorem 4.2 also yields that $Y_1$ is conditionally independent of $Y_3$ given $(Y_0,Y_2)$.
This in turn yields the independence claim. Finally, due to Theorem 4.2,
this type of extension can be carried out countably many times. □
In Chapter 10 (Section 4.5 on Harris chains) we need the following
conditional version of the 0-1 variable splitting in Section 5.1 above.
Corollary 5.1. Let $Y_0$, $Y_1$, and $Y_2$ be random elements in some measurable
spaces $(E_0,\mathcal{E}_0)$, $(E_1,\mathcal{E}_1)$, and $(E_2,\mathcal{E}_2)$, respectively, supported by a
probability space $(\Omega,\mathcal{F},P)$. Suppose $P(Y_2 \in \cdot \mid Y_0 = \cdot)$ has a regular version $Q(\cdot,\cdot)$
and there is a subprobability measure $\nu$ on $(E_2,\mathcal{E}_2)$ such that $0 < \|\nu\| < 1$
and
$$Q(y_0,\cdot) \ge \nu, \qquad y_0 \in E_0. \tag{5.4}$$
Then $(\Omega,\mathcal{F},P)$ can be extended to support a 0-1 variable $I$ such that for
$y_0 \in E_0$ and $y_2 \in E_2$,
$$Y_1 \text{ is conditionally independent of } I \text{ given } (Y_0,Y_2), \tag{5.5a}$$
$$P(I = 1 \mid Y_0 = y_0) = \|\nu\|, \tag{5.5b}$$
$$P(Y_2 \in \cdot \mid Y_0 = y_0, I = 1) = \nu/\|\nu\|, \tag{5.5c}$$
$$P(I = 1 \mid Y_0 = y_0, Y_2 = y_2) = \frac{\nu(dy_2)}{Q(y_0,dy_2)}. \tag{5.5d}$$
This conditional splitting can be repeated countably many times.
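Proof. In Theorem 5.1 take $E_3 = \{0,1\}$ and, for $y_0 \in E_0$,
$$\mu(y_0,\{1\}) = \|\nu\| \quad\text{and}\quad \mu(y_0,\{0\}) = 1 - \|\nu\|,$$
$$\nu((y_0,1),\cdot) = \frac{\nu}{\|\nu\|} \quad\text{and}\quad \nu((y_0,0),\cdot) = \frac{Q(y_0,\cdot) - \nu}{1 - \|\nu\|}$$
to obtain (5.1) from (5.4). This yields the desired results (take $I = Y_3$)
except (5.5d). In order to obtain (5.5d), take $A_0 \in \mathcal{E}_0$ and $A_2 \in \mathcal{E}_2$ and
deduce from (5.5b) and (5.5c) that
$$P(Y_0 \in A_0, Y_2 \in A_2, I = 1) = P(Y_0 \in A_0)\,\nu(A_2).$$
Combine this and
$$\begin{aligned}
\int_{A_0}\int_{A_2} \frac{\nu(dy_2)}{Q(y_0,dy_2)}\,P(Y_0 \in dy_0, Y_2 \in dy_2)
 &= \int_{A_0}\int_{A_2} \frac{\nu(dy_2)}{Q(y_0,dy_2)}\,P(Y_0 \in dy_0)\,Q(y_0,dy_2) \\
 &= \int_{A_0}\int_{A_2} P(Y_0 \in dy_0)\,\nu(dy_2) \\
 &= P(Y_0 \in A_0)\,\nu(A_2)
\end{aligned}$$
to obtain
$$P(Y_0 \in A_0, Y_2 \in A_2, I = 1) = \int_{A_0}\int_{A_2} \frac{\nu(dy_2)}{Q(y_0,dy_2)}\,P(Y_0 \in dy_0, Y_2 \in dy_2).$$
This yields (5.5d). □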
5.4 Review of Splitting - Brownian Motion
As a review of this section, consider the example of a standard Wiener
process (Brownian motion), namely, a real-valued Markov process $(W_s)_{s \in [0,\infty)}$
with continuous paths, $W_0 = 0$, and stationary independent increments.
Then for each $t \in [0,\infty)$, $W_t$ is normal with mean 0 and variance $t$. In
particular, $W_1$ is $N(0,1)$; see Figure 5.2.
According to the first subsection (take $Y = W_1$), we can introduce a
splitting indicator $I$ such that $I = 1$ indicates that $W_1$ is uniform on $[-1,1]$.
The reason why this is more useful than a splitting representation $(Y',I')$,
where $Y'$ is only a copy of $W_1$ and not $W_1$ itself, is that by introducing $I$
we can split without losing the process $(W_s)_{s \in [0,\infty)}$.
This means, for instance, that we can repeatedly split in the same
process: first split $W_1$, then $W_2$, and so on. This yields a sequence of splitting
indicators $I_1, I_2, \dots$ such that for each $n \ge 1$, $I_n = 1$ indicates that the
$N(0,n)$ variable $W_n$ is uniform on $[-1,1]$.
Moreover, suppose we allow $W_0$ to be a random variable taking values
in a bounded interval $[a,b]$. Then the conditional distributions of $W_1$ given
$W_0 = x$, $x \in [a,b]$, have a common uniform component. Thus according to
Corollary 5.1 (take $Y_0 = W_0$ and $Y_2 = W_1$), we can split $W_1$ such that $I$
is independent of $W_0$, and such that given $I = 1$, $W_1$ is uniform on $[-1,1]$
and independent of $W_0$.
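To make the uniform component of $W_1$ concrete, one may take (as an assumption; the text does not fix a particular component) the largest multiple of Lebesgue measure on $[-1,1]$ lying below the $N(0,1)$ density, namely $\nu(dx) = \varphi(1)\,1_{[-1,1]}(x)\,dx$ with $\varphi$ the standard normal density. The sketch below computes its mass and the conditional probability that the splitting indicator equals 1 given $W_1 = x$; the function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

c = phi(1.0)     # the N(0,1) density is at least phi(1) on [-1, 1]
mass = 2 * c     # ||nu|| for nu(dx) = c * 1_[-1,1](x) dx, roughly 0.484

def indicator_prob(x):
    """P(I = 1 | W_1 = x): density of nu divided by the density of W_1."""
    return c / phi(x) if -1.0 <= x <= 1.0 else 0.0
```

Under this choice the indicator is most likely to be 1 near the endpoints of $[-1,1]$, where the normal density is closest to the constant $c$, and it is never 1 outside $[-1,1]$.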
6 Random Walk with Spread-Out Step-Lengths
In this section we apply splitting and transfer to the coupling of random
walks. At the end of the section the coupling result is used to sharpen
Blackwell's renewal theorem.
Let $S = (S_k)_0^\infty$ be a random walk on the line, that is,
$$S_k = S_0 + X_1 + \cdots + X_k, \qquad 0 \le k < \infty,$$
where the step-lengths $X_1, X_2, \dots$ are i.i.d. finite random variables that
are independent of the initial position $S_0$. Let $S'$ be a differently started
version of $S$, that is, let $S'$ be a random walk with the same step-length
distribution as $S$.
In Chapter 2 we showed in the lattice case (for integer-valued walks with
strongly aperiodic step-lengths, Theorem 5.1 of Chapter 2) that there exists
a successful coupling of $S$ and $S'$, that is, a coupling $(\hat S,\hat S')$ of $S$ and $S'$,
and a finite random integer $K$ such that
$$\hat S_n = \hat S_n', \qquad n \ge K.$$
In the nonlattice case (when the step-lengths are strongly nonlattice,
Section 7.7 in Chapter 2) we only managed to obtain that for each $\varepsilon > 0$, there
is a coupling $(\hat S,\hat S')$ of $S$ and $S'$ and a finite random integer $K$ such that
$$|\hat S_n - \hat S_n'| = |\hat S_K - \hat S_K'| \le \varepsilon, \qquad n \ge K.$$
In this section we shall establish that a successful coupling actually exists
in the nonlattice case, provided that we assume that the step-lengths are
spread out, namely, provided that there exists an integer $r \ge 1$ and a
nonnegative Borel measurable function $f$ such that $\int_{\mathbb{R}} f > 0$ and
$$P(X_1 + \cdots + X_r \in A) \ge \int_A f, \qquad A \in \mathcal{B}. \tag{6.1}$$
Theorem 6.1. Let $S$ and $S'$ be differently started versions of a random
walk with spread-out step-lengths. Then there exists a successful coupling
$(\hat S,\hat S')$ of $S$ and $S'$, and
$$\|P(S_n \in \cdot) - P(S_n' \in \cdot)\| \to 0, \qquad n \to \infty,$$
where $\|\cdot\|$ denotes the total variation norm. Moreover, the coupling time
$K$ is a randomized stopping time for both $\hat S$ and $\hat S'$.
We prove this coupling result in the next four subsections. (The limit result
follows in the standard way; see Chapter 2, Section 2.3, or the next chapter,
Theorem 5.1. And the randomized stopping time claim follows in the same
way as in the proof of Theorem 7.1 in Chapter 2.)
6.1 Key Idea of Proof - Uniform Step-Lengths
Suppose (6.1) holds with $r = 1$ and $f = 1_{[0,2]}/2$, that is, the step-lengths
are uniformly distributed on $[0,2]$. Suppose also that $S_0 = 1$ and $S_0' = 0$.
Let $S''$ be the copy of $S'$ with $S_0'' = 0$ and $k$th step-length defined by
$$X_k'' := \begin{cases} X_k + 1 & \text{if } X_k \le 1, \\ X_k - 1 & \text{if } X_k > 1. \end{cases}$$
Then $R_k := S_k - S_k''$, $0 \le k < \infty$, forms an integer-valued random walk
with symmetric aperiodic bounded step-lengths, and thus (see Chapter 2,
Section 5.3)
$$M := \inf\{k \ge 0 : R_k = 0\} = \inf\{k \ge 0 : S_k = S_k''\}$$
is finite with probability one. Define a copy $S'''$ of $S'$ by switching from $S''$
to $S$ at time $M$,
$$S_n''' := \begin{cases} S_n'' & \text{if } n < M, \\ S_n & \text{if } n \ge M, \end{cases}$$
(and delete the null set $\{M = \infty\}$) to obtain that $(S,S''')$ is a successful
coupling of $S$ and $S'$ with time $K = M$.
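The construction above is easy to simulate. The following sketch generates $S$ and $S''$ from the same uniform steps and stops at the first time the integer-valued difference hits 0. It is an illustration only: the function name, the step cap `max_steps`, and the return convention are assumptions, and the cap means that a run may end without a meeting even though $M$ is finite with probability one.

```python
import random

def couple_uniform_walks(rng, max_steps=10_000):
    """Simulate S (started at 1) and S'' (started at 0) with uniform[0,2] steps.

    Returns (M, path_S, path_S2); M is None if the walks have not met
    within max_steps steps.
    """
    s, s2, r = 1.0, 0.0, 1      # r tracks R_k = S_k - S''_k exactly as an integer
    path_s, path_s2 = [s], [s2]
    for k in range(1, max_steps + 1):
        x = rng.uniform(0.0, 2.0)
        x2 = x + 1.0 if x <= 1.0 else x - 1.0   # X''_k is again uniform on [0, 2]
        r += -1 if x <= 1.0 else 1              # X_k - X''_k equals -1 or +1
        s, s2 = s + x, s2 + x2
        path_s.append(s)
        path_s2.append(s2)
        if r == 0:
            return k, path_s, path_s2
    return None, path_s, path_s2
```

After the returned time $M$ one continues $S'''$ with the steps of $S$, so the two paths coincide from then on; before $M$ the path of $S'''$ is that of $S''$.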
6.2 Splitting Part of Proof - Step-Lengths with Uniform Part
Now suppose (6.1) holds with $r = 1$ and $f = c1_{[a,b]}$ for some constants $a < b$
and $c > 0$. Clearly, (6.1) still holds if we replace $[a,b]$ by the subinterval
$[a, a + 2d(s,s')]$, where $s$ and $s'$ are any real numbers and
$$d(s,s') := \sup\{x \in [0,(b-a)/2] : |s - s'|/x \text{ is an integer}\}.$$
By recursive conditional splitting (see Section 5) extend the underlying
probability space to support 0-1 variables $I_1, I_2, \dots$ such that given $(S_0,S_0')$,
the pairs $(X_1,I_1), (X_2,I_2), \dots$ are conditionally i.i.d. and such that for each
$k \ge 1$ and all real $s$ and $s'$,
$$P(I_k = 1 \mid (S_0,S_0') = (s,s')) = 2c\,d(s,s')$$
and, given $(S_0,S_0',I_k) = (s,s',1)$, the conditional distribution of $X_k$ is
uniform on $[a, a + 2d(s,s')]$.
Let $S''$ be the copy of $S'$ with $S_0'' = S_0'$ and $k$th step-length defined by
$$X_k'' := \begin{cases}
X_k & \text{if } I_k = 0, \\
X_k + d(S_0,S_0') & \text{if } I_k = 1 \text{ and } X_k \le a + d(S_0,S_0'), \\
X_k - d(S_0,S_0') & \text{if } I_k = 1 \text{ and } X_k > a + d(S_0,S_0').
\end{cases}$$
Conditionally on $(S_0,S_0'')$,
$$R_k := (S_k - S_k'')/d(S_0,S_0''), \qquad 0 \le k < \infty,$$
forms an integer-valued random walk with symmetric aperiodic bounded
step-lengths. Thus
$$M := \inf\{k \ge 0 : R_k = 0\} = \inf\{k \ge 0 : S_k = S_k''\}$$
is finite with probability one.
Now note that both $X_{M+1}, X_{M+2}, \dots$ and $X_{M+1}'', X_{M+2}'', \dots$ are
sequences of i.i.d. copies of $X_1$ and that both sequences are independent
of $(S_0'', \dots, S_M'')$. Since $S_M = S_M''$, this means that we again obtain a
copy $S'''$ of $S'$ by switching from $S''$ to $S$ at time $K = M$, that is, we have
again established the existence of a successful coupling.
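The distance $d(s,s')$ used in this subsection has a closed form: when $s \ne s'$ it equals $|s - s'|/n$ for the smallest integer $n \ge 1$ with $|s - s'|/n \le (b-a)/2$, and when $s = s'$ every $x$ qualifies, so it equals $(b-a)/2$. The sketch below implements this reading of the definition; the function signature is a hypothetical choice, and exact rational arithmetic is used only to avoid rounding in the ceiling.

```python
import math
from fractions import Fraction as F

def d(s, s_prime, a, b):
    """Largest x in (0, (b - a)/2] such that |s - s'| / x is an integer."""
    half = (b - a) / 2
    gap = abs(s - s_prime)
    if gap == 0:
        return half                       # 0 / x = 0 is an integer for every x
    n = max(1, math.ceil(gap / half))     # smallest integer n with gap / n <= half
    return gap / n

example = d(F(5, 2), F(0), F(0), F(2))    # gap 5/2, half-width 1, so n = 3
```

With this $d$, the initial difference $S_0 - S_0'$ is an integer multiple of $d(S_0,S_0')$, which is what makes the rescaled difference walk $R_k$ integer-valued.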
6.3 Transfer Part of Proof - Uniform Part After r Steps
Now allow $r > 1$ but still assume that $f = c1_{[a,b]}$. Then the random walk
$(S_{kr})_{k=0}^\infty$ with initial position $S_0$ and $k$th step-length
$$L_k := X_{(k-1)r+1} + \cdots + X_{kr}$$
has a uniform component in one step. Proceed as in the previous subsection
to obtain a copy $L_k''$ of $L_k$ such that given $(S_0,S_0')$, $L_k - L_k''$ is symmetric
and takes the values 0 and $\pm d(S_0,S_0')$ with positive probabilities, and
$$(X_{(k-1)r+1}, \dots, X_{kr}, L_k, L_k''), \qquad k \ge 1,$$
is a conditionally i.i.d. sequence. Since $L_k''$ is a copy of $L_k$, we can recursively
apply the transfer extension (Theorem 4.2) to obtain $(X_{(k-1)r+1}'', \dots, X_{kr}'')$
such that $X_{(k-1)r+1}'', \dots, X_{kr}''$ are i.i.d. copies of $X_1$ with sum $L_k''$ and, given
$(S_0,S_0')$,
$$(X_{(k-1)r+1}, \dots, X_{kr}, L_k, L_k'', X_{(k-1)r+1}'', \dots, X_{kr}''), \qquad k \ge 1,$$
is a conditionally i.i.d. sequence. Let $S''$ be the copy of $S'$ with $S_0'' = S_0'$
and step-lengths $X_1'', X_2'', \dots$. Conditionally on $(S_0,S_0'')$,
$$R_k := (S_{kr} - S_{kr}'')/d(S_0,S_0''), \qquad 0 \le k < \infty,$$
forms an integer-valued random walk with symmetric aperiodic bounded
step-lengths. Thus
$$M := \inf\{k \ge 0 : R_k = 0\} = \inf\{k \ge 0 : S_{kr} = S_{kr}''\}$$
is finite with probability one.
Both $X_{Mr+1}, X_{Mr+2}, \dots$ and $X_{Mr+1}'', X_{Mr+2}'', \dots$ are sequences of i.i.d.
copies of $X_1$, and both sequences are independent of $(S_0'', \dots, S_{Mr}'')$.
Since $S_{Mr} = S_{Mr}''$, this means that we obtain a copy $S'''$ of $S'$ by switching
from $S''$ to $S$ at time $K = Mr$, that is, we have once more established the
existence of a successful coupling.
6.4 Final Part of Proof - Always Uniform Part After 2r Steps
We shall now complete the proof of Theorem 6.1 by showing that the
situation dealt with in the previous subsection is always the case (when
the step-lengths are spread-out).
Lemma 6.1. If (6.1) holds, then there are constants $a$, $b$, and $c$ such that
$a < b$, $c > 0$, and
$$P(X_1 + \cdots + X_{2r} \in A) \ge c\int_A 1_{[a,b]}, \qquad A \in \mathcal{B}.$$
Proof. Due to (6.1),
$$P(X_1 + \cdots + X_{2r} \in A) \ge \int_A\Bigl(\int f(x-y)f(y)\,dy\Bigr)dx, \qquad A \in \mathcal{B}. \tag{6.2}$$
It is no restriction to assume that $f \le 1$ and that $f = 0$ outside a finite
interval $[a,b]$ (since if this is not the case, we can replace $f$ by $1_{[a,b]}f \wedge 1$,
where $a$ and $b$ are such that $\int_a^b f > 0$). Let $g_n$, $0 \le n < \infty$, be nonnegative
continuous functions that are 0 outside the interval $[a,b]$, bounded by 1,
and such that (see Ash (1972), Section 2.4.14)
$$\int |f - g_n| \to 0, \qquad n \to \infty.$$
Use $0 \le f \le 1$ to obtain the latter inequality in
$$\begin{aligned}
\Bigl|\int f(x-y)f(y)\,dy - \int f(x'-y)f(y)\,dy\Bigr|
 &\le \Bigl|\int f(x-y)f(y)\,dy - \int g_n(x-y)f(y)\,dy\Bigr| \\
 &\quad + \Bigl|\int g_n(x-y)f(y)\,dy - \int g_n(x'-y)f(y)\,dy\Bigr| \\
 &\quad + \Bigl|\int f(x'-y)f(y)\,dy - \int g_n(x'-y)f(y)\,dy\Bigr| \\
 &\le 2\int |f - g_n| + \Bigl|\int (g_n(x-y) - g_n(x'-y))f(y)\,dy\Bigr|.
\end{aligned}$$
Since $g_n(x-y) - g_n(x'-y)$ is bounded and goes to 0 as $x' \to x$, we obtain
by bounded convergence that
$$\limsup_{x' \to x}\Bigl|\int f(x-y)f(y)\,dy - \int f(x'-y)f(y)\,dy\Bigr| \le 2\int |f - g_n|.$$
The right-hand side goes to 0 as $n \to \infty$, and thus
$$\lim_{x' \to x}\Bigl|\int f(x-y)f(y)\,dy - \int f(x'-y)f(y)\,dy\Bigr| = 0, \qquad x \in \mathbb{R},$$
that is, $\int f(x-y)f(y)\,dy$ is continuous as a function of $x$. Take $d$ such that
$\int f(d-y)f(y)\,dy > 0$ and put
$$c := \tfrac{1}{2}\int f(d-y)f(y)\,dy.$$
Take $a$ and $b$ close enough to $d$ for $\int f(x-y)f(y)\,dy \ge c$ to hold when
$a \le x \le b$. Then $\int f(x-y)f(y)\,dy \ge c1_{[a,b]}(x)$, $x \in \mathbb{R}$, and a reference to
(6.2) completes the proof. □
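The mechanism of the proof can be seen numerically in a simple case. The sketch below takes the hypothetical component density $f = \tfrac12 1_{[0,2]}$ (itself discontinuous), approximates the convolution $\int f(x-y)f(y)\,dy$ by a midpoint rule, and checks that it stays above $c$ on an interval around $d = 2$. The grid sizes, the tolerance, and the interval $[1,3]$ are assumptions chosen for this example.

```python
def f(x):
    """Hypothetical component density: f = (1/2) * 1_[0,2]."""
    return 0.5 if 0.0 <= x <= 2.0 else 0.0

def conv(x, n=2000):
    """Midpoint-rule approximation of the integral of f(x - y) f(y) dy over [0, 2]."""
    h = 2.0 / n
    return sum(f(x - (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n)) * h

d = 2.0
c = conv(d) / 2                                   # c := half the value at d
grid = [1.0 + 2.0 * j / 40 for j in range(41)]    # points of [a, b] = [1, 3]
holds = all(conv(x) >= c - 1e-3 for x in grid)    # uniform part c * 1_[1,3]
```

In this example the convolution is the triangular function $\tfrac14(2 - |x - 2|)$ on $[0,4]$, so the uniform part after two steps is $\tfrac14 1_{[1,3]}$.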
6.5 The Renewal Theorem - Spread-Out Version
Let $S$ be a renewal process, that is, let the $X_k$ be strictly positive and $S_0$
nonnegative. For $B \in \mathcal{B}([0,\infty))$, let $N(B)$ be the number of renewals in $B$,
$$N(B) := \sum_{k=0}^{\infty} 1_{\{S_k \in B\}}.$$
Blackwell's renewal theorem (Theorem 8.1 in Chapter 2) says that if $X_1$ is
nonlattice, then for $h \in [0,\infty)$,
$$E[N(t,t+h]] \to h/E[X_1], \qquad t \to \infty.$$
When $X_1$ is spread out and $E[X_1] < \infty$, Theorem 6.1 can be used to
sharpen this to hold with $(t,t+h]$ replaced by $t + B$, where $B$ is any Borel
subset of $[0,h]$; and the convergence is uniform in $B$.
Theorem 6.2. Let $S$ be a renewal process such that $X_1$ is spread out and
$E[X_1] < \infty$. Then, for each $h \in [0,\infty)$ and with $\lambda$ the Lebesgue measure
on $[0,\infty)$,
$$E[N(t+B)] \to \lambda(B)/E[X_1] \quad\text{uniformly in } B \in \mathcal{B}([0,h]), \tag{6.3}$$
as $t \to \infty$.
COMMENT. Note that (6.3) cannot hold for general nonlattice renewal
processes (with $E[X_1] < \infty$), as the following example shows. Let $B$ be the
set of irrational numbers in $[0,h]$ and note that $\lambda(B) = h$. Let $S$ be zero-delayed
with rational recurrence times. Then for all rational $t$, $E[N(t+B)] = 0$,
which does not tend to $h/E[X_1]$.
PROOF. Let $S'$ have the same recurrence time distribution as $S$ and the
delay time distribution $G_\infty$ from Corollary 8.1 in Chapter 2. According to
that corollary, $E[N'(0,t]] = t/E[X_1]$, $t \in [0,\infty)$. Thus the measure with
mass $E[N'(B)]$ at $B \in \mathcal{B}([0,\infty))$ must coincide with $\lambda/E[X_1]$. Let $\hat S$, $\hat S'$,
and $K$ be as in Theorem 6.1, put $T = \hat S_K$, and take $h \in [0,\infty)$ and
$B \in \mathcal{B}([0,h])$. Then
$$E[N(t+B)] = E[\hat N(t+B)],$$
$$E[N'(t+B)] = E[\hat N'(t+B)] = \lambda(B)/E[X_1],$$
$$\hat N(t+B) = \hat N'(t+B) \quad\text{on } \{T \le t\}.$$
This yields the two equalities in
$$\begin{aligned}
|E[N(t+B)] - \lambda(B)/E[X_1]| &= |E[\hat N(t+B)] - E[\hat N'(t+B)]| \\
 &\le E[|\hat N(t+B) - \hat N'(t+B)|] \\
 &= E[|\hat N(t+B) - \hat N'(t+B)|\,1_{\{T > t\}}].
\end{aligned}$$
Since $|\hat N(t+B) - \hat N'(t+B)| \le \hat N([t,t+h]) + \hat N'([t,t+h])$, we obtain
$$\sup_{B \in \mathcal{B}([0,h])} |E[N(t+B)] - \lambda(B)/E[X_1]|
 \le E[\hat N([t,t+h])\,1_{\{T > t\}}] + E[\hat N'([t,t+h])\,1_{\{T > t\}}]. \tag{6.4}$$
Both $\hat N([t,t+h])\,1_{\{T > t\}}$ and $\hat N'([t,t+h])\,1_{\{T > t\}}$ tend to 0 with probability
one as $t \to \infty$, and both are [see (8.1) and (8.2) in Chapter 2] dominated
in distribution by a finite-mean random variable. Thus by dominated
convergence (see Corollary 9.1 in Chapter 1)
$$E[\hat N([t,t+h])\,1_{\{T > t\}}] \to 0 \quad\text{and}\quad E[\hat N'([t,t+h])\,1_{\{T > t\}}] \to 0$$
as $t \to \infty$. This and (6.4) yield (6.3). □
Remark 6.1. The measure with mass $E[N(B)]$ at $B \in \mathcal{B}([0,\infty))$ is
denoted by $E[N]$ and called the intensity measure of the renewal process $S$.
The uniform rate result (6.3) says that $E[N]$ tends to $\lambda/E[X_1]$ in total
variation on bounded intervals (total variation is defined in Section 8
below). If the condition $E[X_1] < \infty$ is sharpened to $E[X_1^2] < \infty$, then this
result can be improved to hold in total variation on the whole half-line; see
Section 7.5 of Chapter 10.
7 Coupling Event - Maximal Coupling
In Chapter 1 we established the existence of a maximal coupling of a
collection of discrete random variables and of a countable collection of continuous
random variables, that is, a coupling making all the variables coincide with
maximal probability. We now extend this result to general random elements
in an arbitrary space and start by establishing the key measure-theoretic
result.
7.1 Greatest Common Component
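Let $(E,\mathcal{E})$ be a measurable space, $\mathbb{I}$ an arbitrary index set, and $\mu_i$, $i \in \mathbb{I}$,
a collection of measures on $(E,\mathcal{E})$. A measure $\nu$ on $(E,\mathcal{E})$ is a common
component of the $\mu_i$, $i \in \mathbb{I}$, if it is a component of each $\mu_i$, that is, if
$$\nu \le \mu_i, \qquad i \in \mathbb{I}.$$
Moreover, $\nu$ is a greatest common component of the $\mu_i$, $i \in \mathbb{I}$, if all other
common components are components of $\nu$.
Theorem 7.1. Any collection of measures $\mu_i$, $i \in \mathbb{I}$, on an arbitrary space
$(E,\mathcal{E})$ has a unique greatest common component, which we denote by $\bigwedge_{i \in \mathbb{I}} \mu_i$.
It holds that
$$\Bigl(\bigwedge_{i \in \mathbb{I}} \mu_i\Bigr)(A) = \sup\{\nu(A) : \nu \le \mu_i,\ i \in \mathbb{I}\}, \qquad A \in \mathcal{E}. \tag{7.1}$$
Comment. Note that in general,
$$\Bigl(\bigwedge_{i \in \mathbb{I}} \mu_i\Bigr)(A) \ne \inf\{\mu_i(A) : i \in \mathbb{I}\}.$$
In order to see this, consider a collection of probability measures $\mu_i$, $i \in \mathbb{I}$,
and suppose there is a $j$ and a $k$ such that $\mu_j(A^c) = \mu_k(A) = 0$. Then
$$\Bigl(\bigwedge_{i \in \mathbb{I}} \mu_i\Bigr)(E) = \Bigl(\bigwedge_{i \in \mathbb{I}} \mu_i\Bigr)(A^c) + \Bigl(\bigwedge_{i \in \mathbb{I}} \mu_i\Bigr)(A) \le \mu_j(A^c) + \mu_k(A) = 0,$$
while $\inf\{\mu_i(E) : i \in \mathbb{I}\} = 1$.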
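On a finite set with all subsets measurable, the greatest common component of finitely many measures is given by the pointwise minimum of their point masses. The sketch below uses this to illustrate the Comment with two overlapping probability measures; the measures `mu1`, `mu2` and the helper names are hypothetical.

```python
from fractions import Fraction as F

def gcc(*measures):
    """Greatest common component on a finite set: pointwise minimum of point masses."""
    points = set().union(*measures)
    return {x: min(m.get(x, F(0)) for m in measures) for x in points}

def mass_of(m, A):
    """m(A) for a measure m given by its point masses."""
    return sum(w for x, w in m.items() if x in A)

mu1 = {"x": F(3, 4), "y": F(1, 4)}
mu2 = {"x": F(1, 4), "y": F(3, 4)}
E = {"x", "y"}

g = gcc(mu1, mu2)
gcc_mass = mass_of(g, E)                              # 1/4 + 1/4
inf_mass = min(mass_of(mu1, E), mass_of(mu2, E))      # both are probability measures
```

Here the greatest common component has total mass 1/2 while the infimum of the total masses is 1, so the two set functions in the Comment differ at $A = E$.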
PROOF. Uniqueness is obvious: if $\nu$ and $\nu'$ are two greatest common
components, then by definition, $\nu(A) \le \nu'(A)$ and $\nu(A) \ge \nu'(A)$, $A \in \mathcal{E}$, and
thus $\nu = \nu'$.
In order to establish existence define a set function $\mu$ by
$$\mu(A) := \sup\{\nu(A) : \nu \le \mu_i,\ i \in \mathbb{I}\}.$$
Clearly, $\mu \le \mu_j$, $j \in \mathbb{I}$, and $\nu \le \mu$ for all common components $\nu$. Thus the
set function $\mu$ is a greatest common component if it is a measure. Since $\mu$
is nonnegative, it only remains to show that $\mu$ is $\sigma$-additive.
For that purpose, let $A_1, A_2, \dots$ be an arbitrary sequence of disjoint sets
in $\mathcal{E}$. We must establish that
$$\mu(A_1 \cup A_2 \cup \cdots) = \mu(A_1) + \mu(A_2) + \cdots. \tag{7.2}$$
For each $j \ge 1$, let $\nu_{j1}, \nu_{j2}, \dots$ be a sequence of common components of
the $\mu_i$, $i \in \mathbb{I}$, such that
$$\nu_{jk}(A_j) \uparrow \mu(A_j), \qquad k \to \infty.$$
For each $k \ge 1$, define a common component $\nu_k$ of the $\mu_i$, $i \in \mathbb{I}$, by
$$\nu_k := \nu_{1k}(\cdot \cap A_1) + \nu_{2k}(\cdot \cap A_2) + \cdots$$
and note that for each $j$ we have $\nu_k(A_j) = \nu_{jk}(A_j)$. Thus
$$\nu_k(A_j) \uparrow \mu(A_j), \qquad k \to \infty,\ j \ge 1,$$
and
$$\mu(A_1 \cup A_2 \cup \cdots) \ge \nu_k(A_1 \cup A_2 \cup \cdots) = \nu_k(A_1) + \nu_k(A_2) + \cdots.$$
Sending $k \to \infty$ yields, due to monotone convergence,
$$\mu(A_1 \cup A_2 \cup \cdots) \ge \mu(A_1) + \mu(A_2) + \cdots. \tag{7.3}$$
In order to establish the converse inequality, let $\nu_1', \nu_2', \dots$ be a sequence of
common components of the $\mu_i$, $i \in \mathbb{I}$, such that
$$\nu_k'(A_1 \cup A_2 \cup \cdots) \to \mu(A_1 \cup A_2 \cup \cdots), \qquad k \to \infty.$$
Since for $k \ge 1$,
$$\nu_k'(A_1 \cup A_2 \cup \cdots) = \nu_k'(A_1) + \nu_k'(A_2) + \cdots \le \mu(A_1) + \mu(A_2) + \cdots,$$
we obtain, by sending $k \to \infty$,
$$\mu(A_1 \cup A_2 \cup \cdots) \le \mu(A_1) + \mu(A_2) + \cdots.$$
This together with (7.3) yields the desired result, (7.2). □
7.2 Coupling Event Inequality
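Let $Y_i$, $i \in \mathbb{I}$, be a collection of random elements in a general space $(E,\mathcal{E})$.
Call an event $C$ a coupling event of a coupling $(\hat Y_i : i \in \mathbb{I})$ of $Y_i$, $i \in \mathbb{I}$, if
$$\hat Y_i = \hat Y_j \quad\text{on } C, \qquad i, j \in \mathbb{I}.$$
Call $1_C$ a coupling indicator if $C$ is a coupling event.
Theorem 7.2. If $C$ is a coupling event, then
$$\bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot) \ge P(\hat Y_j \in \cdot, C), \qquad j \in \mathbb{I}, \tag{7.4}$$
and, with $\|\cdot\|$ denoting total mass (total variation norm),
$$\Bigl\|\bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)\Bigr\| \ge P(C). \qquad \text{COUPLING EVENT INEQUALITY} \tag{7.5}$$
Proof. Fix a $j \in \mathbb{I}$. For all $i \in \mathbb{I}$,
$$P(Y_i \in \cdot) = P(\hat Y_i \in \cdot) \ge P(\hat Y_i \in \cdot, C) = P(\hat Y_j \in \cdot, C).$$
Thus $P(\hat Y_j \in \cdot, C)$ is a common component of the $P(Y_i \in \cdot)$, $i \in \mathbb{I}$, which
yields (7.4), and (7.5) follows by evaluating (7.4) at $E$. □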
7.3 Maximal Coupling
A coupling $(\hat Y_i : i \in \mathbb{I})$ with coupling event $C$ is maximal if the coupling
event inequality is an equality,
$$\Bigl\|\bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)\Bigr\| = P(C). \tag{7.6}$$
This is equivalent to
$$\bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot) = P(\hat Y_j \in \cdot, C), \qquad j \in \mathbb{I}, \tag{7.7}$$
since (7.6) follows by evaluating (7.7) at $E$, and conversely, if (7.7) does
not hold, then with $\mu := \bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)$, there is a $j$ and an $A$ such that
$\mu(A) > P(\hat Y_j \in A, C)$, by (7.4), which together with $\mu(A^c) \ge P(\hat Y_j \in A^c, C)$
yields $\|\mu\| > P(\hat Y_j \in A, C) + P(\hat Y_j \in A^c, C) = P(C)$, that is, if (7.7) does
not hold, then neither does (7.6).
Theorem 7.3. There exists a maximal coupling of any collection of
random elements $Y_i$, $i \in \mathbb{I}$, in an arbitrary space $(E,\mathcal{E})$.
Proof. Let $I$, $V$, and $W_j$, $j \in \mathbb{I}$, be independent. Let $I$ be a 0-1 variable
with
$$P(I = 1) = \Bigl\|\bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)\Bigr\|.$$
When $P(I = 1) = 0$, let the $\hat Y_j$, $j \in \mathbb{I}$, be independent and take $C = \emptyset$.
When $P(I = 1) = 1$, let the $\hat Y_j$, $j \in \mathbb{I}$, be identical and take $C = \Omega$. When
$0 < P(I = 1) < 1$, let $V$ and $W_j$, $j \in \mathbb{I}$, be independent random elements
in $(E,\mathcal{E})$ that are independent of $I$ and have distributions
$$P(V \in \cdot) = \bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)\Big/P(I = 1),$$
$$P(W_j \in \cdot) = \Bigl(P(Y_j \in \cdot) - \bigwedge_{i \in \mathbb{I}} P(Y_i \in \cdot)\Bigr)\Big/P(I = 0).$$
Put, for $j \in \mathbb{I}$,
$$\hat Y_j := \begin{cases} V & \text{if } I = 1, \\ W_j & \text{if } I = 0, \end{cases}$$
and $C = \{I = 1\}$ to obtain the desired result. □
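For finitely many laws on a finite set, the construction in the proof can be run directly: the common component is the pointwise minimum of the point masses, and $I$, $V$, $W_j$ are sampled as described. The sketch below is an illustration under these finiteness assumptions; the function names and the two example laws are hypothetical.

```python
import random
from fractions import Fraction as F

def draw(law, rng):
    """Sample from a finite law given as {value: probability}."""
    u, acc = rng.random(), 0
    for x, w in law.items():
        acc += w
        if u < acc:
            return x
    return x  # guard against rounding at the upper end

def maximal_coupling(laws, rng):
    """Return (values, coupled), following the construction with I, V, and W_j."""
    common = {x: min(l.get(x, F(0)) for l in laws) for x in laws[0]}
    p = sum(common.values())                  # P(I = 1): mass of the common component
    if rng.random() < p:
        v = draw({x: w / p for x, w in common.items()}, rng)
        return [v] * len(laws), True          # coupling event C = {I = 1}
    return [draw({x: (l[x] - common.get(x, F(0))) / (1 - p) for x in l}, rng)
            for l in laws], False

laws = [{0: F(1, 2), 1: F(1, 2)}, {0: F(1, 4), 1: F(3, 4)}]
```

For these two laws the common component has mass 3/4, so the sampled values agree with probability 3/4; off the coupling event the residual laws are concentrated on 0 and on 1, respectively, so the values necessarily differ there.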
8 Maximal Coupling Two Elements - Total Variation
We shall now consider the case of only two random elements $Y$ and $Y'$ in
an arbitrary space $(E,\mathcal{E})$ and formulate the results of the previous section
in terms of total variation distance between the distributions of $Y$ and $Y'$.
We first establish a decomposition result for differences of measures, like
$P(Y \in \cdot) - P(Y' \in \cdot)$, then have a look at some basic properties of the
total variation norm, and finally present the reformulation.
8.1 Difference of Measures - Mutual Singularity
Two measures $\nu^+$ and $\nu^-$ on $\mathcal{E}$ are mutually singular if they put all their
mass in separate parts of $E$, that is, if there is a set $A^+\in\mathcal{E}$ such that
$$\nu^+(E\setminus A^+) = 0 \quad\text{and}\quad \nu^-(A^+) = 0.$$ (8.1)
Denote this by $\nu^+\perp\nu^-$. According to the next theorem
$$\nu^+\perp\nu^- \iff (\nu^+\wedge\nu^-)(E) = 0;$$ (8.2)
here $\nu^+\wedge\nu^-$ is the greatest common component of $\nu^+$ and $\nu^-$.
Theorem 8.1. Let $\mu$ and $\mu'$ be bounded measures on $\mathcal{E}$ and let $\mu\wedge\mu'$ be
their greatest common component. Then
$$(\mu-\mu')^+ := \mu - \mu\wedge\mu', \qquad (\mu-\mu')^- := \mu' - \mu\wedge\mu',$$ (8.3)
are the unique measures satisfying
$$\mu-\mu' = (\mu-\mu')^+ - (\mu-\mu')^-, \qquad (\mu-\mu')^+ \perp (\mu-\mu')^-.$$ (8.4)
Further, let $f$ and $f'$ be densities of $\mu$ and $\mu'$ with respect to some measure
$\lambda$, for instance with respect to $\lambda=\mu+\mu'$. Then
$f\wedge f'$ is a density of $\mu\wedge\mu'$, (8.5)
$(f-f')^+$ is a density of $(\mu-\mu')^+$, (8.6)
$(f-f')^-$ is a density of $(\mu-\mu')^-$. (8.7)
Proof. Let $f$ and $f'$ be densities of $\mu$ and $\mu'$ with respect to some measure
$\lambda$, and note that
$$f\wedge f' = f-(f-f')^+ = f'-(f-f')^-.$$ (8.8)
Let $\nu$ be the measure with density $f\wedge f'$, and note that $\nu\le\mu\wedge\mu'$. Also, $\mu-\nu$
and $\mu'-\nu$ have densities $(f-f')^+$ and $(f-f')^-$ and are mutually singular
since $(f-f')^+=0$ outside $B^+=\{x : f(x)>f'(x)\}$ and $(f-f')^-=0$
on $B^+$. Now, $\mu\wedge\mu'-\nu$ is a component of both $\mu-\nu$ and $\mu'-\nu$ and thus
has mass zero both outside and inside $B^+$. Thus $\nu=\mu\wedge\mu'$, and we have
established the theorem except for the uniqueness result.
In order to establish uniqueness of the decomposition (8.4), suppose there
is another decomposition $\mu-\mu'=\nu^+-\nu^-$ where $\nu^+\perp\nu^-$. Let $A^+$ be as at
(8.1) and $B^+$ be such that $(\mu-\mu')^+(E\setminus B^+)=0$ and $(\mu-\mu')^-(B^+)=0$.
Then
$$(\mu-\mu')(\cdot\cap A^+) = \nu^+ \ge 0,$$
$$(\mu-\mu')(\cdot\cap B^+) = (\mu-\mu')^+ \ge 0,$$
and thus by additivity $(\mu-\mu')(\cdot\cap(A^+\cup B^+))\ge 0$. In the same way we
obtain $(\mu-\mu')(\cdot\cap(A^+\cap B^+)^c)\le 0$. Therefore,
$$(\mu-\mu')(\cdot\cap(A^+\cup B^+)\cap(A^+\cap B^+)^c) = 0,$$
which implies $(\mu-\mu')(\cdot\cap B^+) = (\mu-\mu')(\cdot\cap A^+)$, that is, $\nu^+=(\mu-\mu')^+$.
Thus also $\nu^-=(\mu-\mu')^-$. □
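On a finite set, with counting measure playing the role of $\lambda$, the densities are the point masses and the decomposition can be computed directly. The sketch below assumes that setting; the function name `decompose` and the dictionary representation are illustrative, not part of the text.

```python
from fractions import Fraction


def decompose(mu, mu2):
    """Greatest common component and the two parts of mu - mu2.

    mu, mu2: dicts mapping points to nonnegative Fractions (bounded measures
    on a finite set, i.e. densities with respect to counting measure).
    Returns (meet, pos, neg): the measures mu ^ mu2, (mu - mu2)^+ and
    (mu - mu2)^-, each as a dict over the union of the supports.
    """
    zero = Fraction(0)
    points = sorted(set(mu) | set(mu2))
    meet = {x: min(mu.get(x, zero), mu2.get(x, zero)) for x in points}  # f ^ f'
    pos = {x: mu.get(x, zero) - meet[x] for x in points}               # (f - f')^+
    neg = {x: mu2.get(x, zero) - meet[x] for x in points}              # (f - f')^-
    return meet, pos, neg
```

The two parts never charge the same point, which is mutual singularity in this discrete setting, and their difference returns $\mu-\mu'$ pointwise.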
Corollary 8.1. Let $\mu_1,\mu_2,\dots$ be a sequence of bounded measures on $\mathcal{E}$.
Then
$$\bigwedge_{1\le i\le n}\mu_i \downarrow \bigwedge_{1\le i<\infty}\mu_i, \quad n\to\infty.$$
Further, if $f_1,f_2,\dots$ are densities of $\mu_1,\mu_2,\dots$ with respect to some
measure $\lambda$, for instance with respect to $\lambda=\sum_{1\le i<\infty}2^{-i}\mu_i$, then
$$\inf_{1\le i<\infty} f_i \text{ is a density of } \bigwedge_{1\le i<\infty}\mu_i.$$
Proof. The measures $\bigwedge_{i\le n}\mu_i$ decrease setwise to a component $\mu$ of all
the $\mu_1,\mu_2,\dots$, and if there were an $A$ such that $\mu(A)<(\bigwedge_{i<\infty}\mu_i)(A)$,
then there would be an $n$ such that $(\bigwedge_{i\le n}\mu_i)(A)<(\bigwedge_{i<\infty}\mu_i)(A)$, which
cannot hold, since $\bigwedge_{i<\infty}\mu_i$ is a component of $\bigwedge_{i\le n}\mu_i$. Thus
$$\mu = \bigwedge_{1\le i<\infty}\mu_i.$$
To establish the second half of the corollary observe that $(\mu_1\wedge\mu_2)\wedge\mu_3 =
\bigwedge_{i\le 3}\mu_i$. Applying this and (8.5) repeatedly yields that $\bigwedge_{i\le n}\mu_i$ has the
density $\inf_{i\le n}f_i$. By monotone convergence $(\bigwedge_{i\le n}\mu_i)(A)=\int_A\inf_{i\le n}f_i$
decreases to $\int_A\inf_{i<\infty}f_i$ for $A\in\mathcal{E}$. Thus
$$\mu(A) = \int_A \inf_{1\le i<\infty} f_i, \quad A\in\mathcal{E},$$
and the proof is complete. □
8.2 Total Variation
The difference $\mu-\mu'$ of two bounded measures is still a bounded $\sigma$-additive
set function but not necessarily nonnegative, therefore 'signed': a real-valued
function $\nu$ defined on $\mathcal{E}$ is a bounded signed measure if it is bounded,
$$\sup_{A\in\mathcal{E}}|\nu(A)| < \infty,$$
and $\sigma$-additive, that is, for each sequence $A_1,A_2,\dots$ of disjoint sets in $\mathcal{E}$,
$$\nu(A_1\cup A_2\cup\cdots) = \nu(A_1)+\nu(A_2)+\cdots.$$
The total variation norm of $\nu$ is
$$\|\nu\| := \sup_{A\in\mathcal{E}}\nu(A) - \inf_{A\in\mathcal{E}}\nu(A).$$ (8.9)
If $\nu$ is a measure, then clearly $\|\nu\|=\nu(E)$ = the mass of $\nu$. For a real-valued
function $g$ defined on $E$ let $g\in\mathcal{E}$ denote that $g$ is $\mathcal{E}/\mathcal{B}$-measurable.
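Theorem 8.2. Let $\mu$ and $\mu'$ be bounded measures on $\mathcal{E}$ with densities $f$ and
$f'$ with respect to some measure $\lambda$, for instance with respect to $\lambda=\mu+\mu'$.
Then there is a set $A^+\in\mathcal{E}$ such that
$$\|\mu-\mu'\| = (\mu-\mu')(A^+) - (\mu-\mu')(E\setminus A^+)$$
$$= \|(\mu-\mu')^+\| + \|(\mu-\mu')^-\|$$
$$= \int (f-f')^+\,d\lambda + \int (f-f')^-\,d\lambda$$
$$= \int |f-f'|\,d\lambda$$
$$= \int f\,d\lambda + \int f'\,d\lambda - 2\int f\wedge f'\,d\lambda$$
$$= \|\mu\| + \|\mu'\| - 2\|\mu\wedge\mu'\|$$
$$= \sup_{g\in\mathcal{E},\,0\le g\le 1}\Bigl(\int g\,d\mu - \int g\,d\mu'\Bigr) - \inf_{g\in\mathcal{E},\,0\le g\le 1}\Bigl(\int g\,d\mu - \int g\,d\mu'\Bigr).$$ (8.10)
In particular, when $\mu$ and $\mu'$ have the same mass, $\|\mu\|=\|\mu'\|$, then
$$\|\mu-\mu'\| = 2(\mu-\mu')(A^+)$$
$$= 2\sup_{A\in\mathcal{E}}(\mu-\mu')(A)$$
$$= 2\sup_{A\in\mathcal{E}}|\mu(A)-\mu'(A)|$$
$$= 2\|(\mu-\mu')^+\|$$
$$= 2\|(\mu-\mu')^-\|$$
$$= 2\int (f-f')^+\,d\lambda$$
$$= 2\int (f-f')^-\,d\lambda$$
$$= 2\sup_{g\in\mathcal{E},\,0\le g\le 1}\Bigl(\int g\,d\mu - \int g\,d\mu'\Bigr),$$ (8.11)
and when $\mu$ and $\mu'$ are probability measures, $\|\mu\|=\|\mu'\|=1$, then
$$\|\mu-\mu'\| = 2 - 2\|\mu\wedge\mu'\| = 2 - 2\int f\wedge f'\,d\lambda.$$ (8.12)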
Proof. From (8.4) we deduce $-\|(\mu-\mu')^-\| \le (\mu-\mu')(A) \le \|(\mu-\mu')^+\|$
and that there is a set $A^+\in\mathcal{E}$ such that
$$(\mu-\mu')(A^+) = \|(\mu-\mu')^+\|,$$
$$(\mu-\mu')(E\setminus A^+) = -\|(\mu-\mu')^-\|.$$
Thus
$$\sup_{A\in\mathcal{E}}(\mu-\mu')(A) = (\mu-\mu')(A^+) = \|(\mu-\mu')^+\|,$$
$$\inf_{A\in\mathcal{E}}(\mu-\mu')(A) = (\mu-\mu')(E\setminus A^+) = -\|(\mu-\mu')^-\|.$$ (8.13)
This yields the first and second equality in (8.10). The third equality in
(8.10) follows from (8.6) and (8.7), the fourth and fifth are immediate, the
sixth follows from (8.5), and the seventh from the third equality and
$$\sup_{A\in\mathcal{E}}(\mu-\mu')(A) \le \sup_{g\in\mathcal{E},\,0\le g\le 1}\Bigl(\int g\,d\mu-\int g\,d\mu'\Bigr) \le \int (f-f')^+\,d\lambda,$$
$$\inf_{A\in\mathcal{E}}(\mu-\mu')(A) \ge \inf_{g\in\mathcal{E},\,0\le g\le 1}\Bigl(\int g\,d\mu-\int g\,d\mu'\Bigr) \ge -\int (f-f')^-\,d\lambda.$$ (8.14)
If $\|\mu\|=\|\mu'\|$, then $(\mu-\mu')(E)=0$ and thus $(\mu-\mu')^+(E)=(\mu-\mu')^-(E)$.
Hence (8.11) follows from the first and second equality in (8.10) together
with (8.13), (8.6), (8.7), and (8.14). Finally, (8.12) follows from the fifth
and sixth equality in (8.10). □
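For two probability vectors on a finite set the equalities in (8.10) to (8.12) can be checked by brute force. The sketch below assumes counting measure as the dominating measure and takes the supremum and infimum over all subsets explicitly; the function name `tv_forms` and the returned keys are illustrative only.

```python
from fractions import Fraction


def tv_forms(p, q):
    """Several expressions for the total variation norm of p - q.

    p, q: equal-length lists of Fractions (densities with respect to
    counting measure on a finite set). Returns a dict of exact values.
    """
    diffs = [a - b for a, b in zip(p, q)]
    n = len(diffs)
    # (mu - mu')(A) for every subset A, encoded as a bitmask.
    subset_values = [
        sum((d for i, d in enumerate(diffs) if (mask >> i) & 1), Fraction(0))
        for mask in range(2 ** n)
    ]
    return {
        "sup_minus_inf": max(subset_values) - min(subset_values),   # (8.9)
        "pos_plus_neg": sum(max(d, 0) for d in diffs)
                        + sum(max(-d, 0) for d in diffs),
        "l1": sum(abs(d) for d in diffs),
        "masses_minus_meet": sum(p) + sum(q)
                             - 2 * sum(min(a, b) for a, b in zip(p, q)),
        "twice_sup": 2 * max(subset_values),   # valid when the masses agree
    }
```

For probability vectors all five entries coincide, which is the content of (8.10) and (8.11) in this discrete special case.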
If $\mu_1,\mu_2,\dots,\mu$ are bounded measures let
$$\mu_n \xrightarrow{tv} \mu, \quad n\to\infty,$$
denote that $\|\mu_n-\mu\|\to 0$ as $n\to\infty$. The following result (Scheffe's
theorem) is an easy consequence of Theorem 8.2.
Corollary 8.2. Let $\mu_1,\mu_2,\dots,\mu$ be probability measures with densities
$f_1,f_2,\dots,f$ with respect to some measure $\lambda$. If $f_n\to f$ as $n\to\infty$ a.e. $\lambda$, then
$$\mu_n \xrightarrow{tv} \mu \quad\text{as } n\to\infty.$$
Proof. According to the sixth equality in (8.11), we have $\|\mu_n-\mu\| =
2\int (f-f_n)^+\,d\lambda$. Now $(f-f_n)^+\le f$ and $\int f\,d\lambda<\infty$, and the desired result
follows by dominated convergence. □
The converse does not hold, see Example 7.2 in Chapter 1.
8.3 The Coupling Event Inequality - Maximal Coupling
We are now ready to reformulate the results of Section 7 in the case of two
random elements $Y$ and $Y'$ in an arbitrary space $(E,\mathcal{E})$. The total variation
distance between the distributions of $Y$ and $Y'$ is [see (8.11)]
$$\|\mathbf{P}(Y\in\cdot)-\mathbf{P}(Y'\in\cdot)\| = 2\sup_{A\in\mathcal{E}}\bigl(\mathbf{P}(Y\in A)-\mathbf{P}(Y'\in A)\bigr).$$ (8.15)
If $(\hat Y,\hat Y')$ is a coupling of $Y$ and $Y'$ with coupling event $C$ [that is, $\hat Y=\hat Y'$
on $C$], then
$$\mathbf{P}(Y\in A)-\mathbf{P}(Y'\in A) = \mathbf{P}(\hat Y\in A)-\mathbf{P}(\hat Y'\in A)$$
$$= \mathbf{P}(\hat Y\in A,\,C^c)-\mathbf{P}(\hat Y'\in A,\,C^c) \le \mathbf{P}(C^c)$$
and applying (8.15) yields the following coupling event inequality,
$$\|\mathbf{P}(Y\in\cdot)-\mathbf{P}(Y'\in\cdot)\| \le 2\mathbf{P}(C^c).$$ (8.16)
This inequality we wrote in Section 7 as
$$\|\mathbf{P}(Y\in\cdot)\wedge\mathbf{P}(Y'\in\cdot)\| \ge \mathbf{P}(C).$$ (8.17)
That (8.16) is a reformulation of (8.17) follows from the first equality in
(8.12).
Equality holds in (8.16) if and only if it holds in (8.17). Thus the coupling
is maximal if and only if
$$\|\mathbf{P}(Y\in\cdot)-\mathbf{P}(Y'\in\cdot)\| = 2\mathbf{P}(C^c).$$ (8.18)
If the set $\{\hat Y=\hat Y'\}$ is an event [is measurable], then by definition $\{\hat Y=\hat Y'\}$
is a coupling event, and since $C$ is a subset of $\{\hat Y=\hat Y'\}$ [and thus
$\mathbf{P}(\hat Y\ne\hat Y')\le\mathbf{P}(C^c)$], we can [due to (8.16)] rewrite (8.18) as
$$\|\mathbf{P}(Y\in\cdot)-\mathbf{P}(Y'\in\cdot)\| = 2\mathbf{P}(\hat Y\ne\hat Y').$$ (8.19)
The set $\{\hat Y=\hat Y'\}$ is measurable when, for instance, $(E,\mathcal{E})$ is Polish (see
(7.5) in Chapter 4).
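For two distributions on a finite set, a maximal coupling can be written down as an explicit joint probability mass function, and (8.19) can then be verified exactly. The sketch below specializes the construction of Theorem 7.3 to two elements, placing the common component on the diagonal and a product of the normalized remainders off it; the function name `maximal_coupling_joint` is an illustrative choice.

```python
from fractions import Fraction


def maximal_coupling_joint(p, q):
    """Joint pmf (as a matrix) of a maximal coupling of two finite pmfs.

    p, q: equal-length lists of Fractions, each summing to 1.
    joint[i][j] is the probability that the first element equals point i
    and the second equals point j.
    """
    n = len(p)
    meet = [min(a, b) for a, b in zip(p, q)]
    c = sum(meet)  # mass of the common component, the probability of C
    joint = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        joint[i][i] = meet[i]                  # on C both elements coincide
    if c < 1:
        for i in range(n):
            for j in range(n):
                # Off C the elements are independent draws from the
                # normalized remainders; the product vanishes when i == j.
                joint[i][j] += (p[i] - meet[i]) * (q[j] - meet[j]) / (1 - c)
    return joint
```

The row sums return `p`, the column sums return `q`, and the mass off the diagonal equals half the total variation distance, as (8.19) requires.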
8.4 Comment on Signed Measures
Although we shall not need this result here, it is in fact true (and not too
hard to prove, see Ash (1972), Theorem 2.1.2) that for any bounded signed
measure $\nu$ on $\mathcal{E}$ there are unique measures $\nu^+$ and $\nu^-$ on $\mathcal{E}$ such that
$$\nu=\nu^+-\nu^- \quad\text{and}\quad \nu^+\perp\nu^- \quad\text{[Jordan-Hahn decomposition].}$$
The measures $\nu^+$ and $\nu^-$ are the positive and negative parts of $\nu$ and
$|\nu| := \nu^++\nu^-$ is called the total variation measure.
Note that in general,
$$\nu^+(A)\ne(\nu(A))^+, \qquad \nu^-(A)\ne(\nu(A))^-, \qquad |\nu|(A)\ne|\nu(A)|.$$
9 Hitting the Limit
In Chapter 1 (Sections 6 and 7) we established that a sequence of discrete
random variables can be coupled in such a way that the variables hit the
limit in finite time and stay there if and only if their probability mass
functions converge pointwise to the probability mass function of the limit
variable, and that a sequence of continuous random variables can be coupled
in this way if and only if the lim inf of their densities is a density of the
limit variable. We shall now extend this to random elements in an arbitrary
space.
9.1 Coupling Index Inequality
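Let $Y_1,\dots,Y_\infty$ be random elements in an arbitrary space $(E,\mathcal{E})$. Call a
random variable $K$ in $\{1,\dots,\infty\}$ a coupling index of a coupling $(\hat Y_1,\dots,\hat Y_\infty)$
of $Y_1,\dots,Y_\infty$ if
$$\hat Y_n=\hat Y_\infty \quad\text{for } n\ge K.$$
Theorem 9.1. If $K$ is a coupling index, then for $0\le n\le\infty$,
$$\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot) \ge \mathbf{P}(\hat Y_\infty\in\cdot,\,K\le n),$$
$$\Bigl\|\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)\Bigr\| \ge \mathbf{P}(K\le n),$$ COUPLING INDEX INEQUALITY
$$\Bigl\|\mathbf{P}(Y_\infty\in\cdot)-\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)\Bigr\| \le \mathbf{P}(K>n).$$ (9.1)
PROOF. This follows from Theorem 7.2, since $\{K\le n\}$ is a coupling event
for the coupling $(\hat Y_n,\dots,\hat Y_\infty)$ of $Y_n,\dots,Y_\infty$. □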
9.2 Hitting the Limit
If $K$ is finite, we obtain from (9.1) that
$$\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)\uparrow\mathbf{P}(Y_\infty\in\cdot), \quad n\to\infty.$$ (9.2)
In fact the following holds.
Theorem 9.2. Let $Y_1,\dots,Y_\infty$ be random elements in an arbitrary space.
There exists a coupling with a finite coupling index if and only if (9.2) holds
and if and only if
$$\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)\uparrow\mathbf{P}(Y_\infty\in\cdot), \quad n\to\infty,$$ (9.3)
and if and only if
$$\liminf_{k\to\infty} f_k \text{ is a density of } Y_\infty,$$ (9.4)
where $f_1,f_2,\dots$ are the densities of $Y_1,Y_2,\dots$ with respect to some measure
$\lambda$, for instance with respect to $\lambda=\sum_{1\le k<\infty}2^{-k}\mathbf{P}(Y_k\in\cdot)$.
Comment. In Chapter 1, Theorems 6.1 and 7.1, we showed that these
equivalent conditions are in general strictly stronger than total variation
convergence [$Y_n\xrightarrow{tv}Y_\infty$ as $n\to\infty$], except in the discrete case [when $E$ is
finite or countable].
PROOF. By Corollary 8.1, $\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)$ has density $\inf_{n\le k<\infty}f_k$.
As $n\to\infty$, this density increases to $\liminf_{k\to\infty}f_k$ and thus by monotone
convergence $\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)$ increases setwise to a measure with density
$\liminf_{k\to\infty}f_k$. Thus (9.4) and (9.3) are equivalent.
From (9.3) we obtain that $\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)\le\mathbf{P}(Y_\infty\in\cdot)$ and
therefore $\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)=\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)$; hence (9.3) implies (9.2).
Conversely, since $\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)\le\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)$, we have that
(9.2) implies that $\lim_{n\to\infty}\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)\ge\mathbf{P}(Y_\infty\in\cdot)$, and since
$\lim_{n\to\infty}\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)$ has mass $\le 1$, it follows that (9.3) holds. Thus
(9.2) and (9.3) are equivalent.
Due to (9.1), the existence of a coupling with a finite coupling index
implies (9.2), and the converse follows from the next theorem. □
9.3 Maximality at Each Index
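Call a coupling $(\hat Y_1,\dots,\hat Y_\infty)$ with coupling index $K$ maximal at each index
if the coupling index inequality is an equality,
$$\Bigl\|\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot)\Bigr\| = \mathbf{P}(K\le n), \quad 0<n<\infty,$$ (9.5)
or equivalently [see (7.6) and (7.7)], if
$$\bigwedge_{n\le k\le\infty}\mathbf{P}(Y_k\in\cdot) = \mathbf{P}(\hat Y_\infty\in\cdot,\,K\le n), \quad 0<n<\infty.$$ (9.6)
Theorem 9.3. For any sequence of random elements in an arbitrary space
there exists a coupling that is maximal at each index.
Proof. Let $\mu_1,\dots,\mu_\infty$ be the distributions of $Y_1,\dots,Y_\infty$. Put
$$\nu_n = \bigwedge_{n\le k\le\infty}\mu_k, \quad 1\le n<\infty,$$
$$\nu_\infty = \lim_{n\to\infty}\bigwedge_{n\le k\le\infty}\mu_k \quad\text{and}\quad \nu_0 = 0.$$
Let $V_1,V_2,\dots,W_1,\dots,W_\infty,K$ be independent random elements. Let $K$ be
$\{1,\dots,\infty\}$ valued with distribution
$$\mathbf{P}(K=n) = \|\nu_n\|-\|\nu_{n-1}\|, \quad 1\le n<\infty,$$
$$\mathbf{P}(K=\infty) = 1-\|\nu_\infty\|,$$
and note that
$$\mathbf{P}(K\le n) = \sum_{1\le k\le n}\bigl(\|\nu_k\|-\|\nu_{k-1}\|\bigr) = \|\nu_n\|, \quad 0<n<\infty.$$ (9.7)
For $1\le n<\infty$, let the distribution of $V_n$ be
$$\mathbf{P}(V_n\in\cdot) = (\nu_n-\nu_{n-1})/\mathbf{P}(K=n) \quad (\text{arbitrary if } \mathbf{P}(K=n)=0).$$
For $1\le n\le\infty$, let the distribution of $W_n$ be [note that $\mathbf{P}(K\ge n+1) =
1-\|\nu_n\|$, even for $n=\infty$]
$$\mathbf{P}(W_n\in\cdot) = (\mu_n-\nu_n)/\mathbf{P}(K\ge n+1) \quad (\text{arbitrary if } \mathbf{P}(K\ge n+1)=0).$$
Put
$$\hat Y_n = \begin{cases} V_K & \text{on } \{K<n+1\},\\ W_n & \text{on } \{K\ge n+1\}.\end{cases}$$
Then $(\hat Y_1,\dots,\hat Y_\infty)$ is a coupling of $Y_1,\dots,Y_\infty$, since
$$\mathbf{P}(\hat Y_n\in\cdot) = \sum_{1\le k<n+1}\mathbf{P}(V_k\in\cdot)\,\mathbf{P}(K=k) + \mathbf{P}(W_n\in\cdot)\,\mathbf{P}(K\ge n+1) = \mu_n, \quad 1\le n\le\infty.$$
Clearly, $K$ is a coupling index, and by (9.7) the coupling is maximal at
each index. □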
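The distribution of the coupling index $K$ in this proof is easy to compute when the state space is finite and the sequence is eventually constant. The sketch below makes exactly that simplifying assumption (the given distributions are followed by the limit distribution forever), so that $\nu_n$ equals the limit distribution beyond the last given index and $K$ is finite. The function name `coupling_index_law` is an illustrative choice.

```python
from fractions import Fraction


def coupling_index_law(mus, mu_inf):
    """Law of K for pmfs mu_1, ..., mu_N followed by mu_inf forever.

    mus: list of N pmfs, mu_inf: the limit pmf, all equal-length lists of
    Fractions. Returns probs with probs[n - 1] = P(K = n) for n = 1..N + 1.
    Assumption of this sketch: mu_k = mu_inf for every k > N.
    """
    tail = list(mu_inf)            # nu_{N+1} = mu_inf under the assumption
    masses = [sum(tail)]
    for mu in reversed(mus):       # nu_n = mu_n ^ nu_{n+1}, working backwards
        tail = [min(a, b) for a, b in zip(tail, mu)]
        masses.append(sum(tail))
    masses.reverse()               # masses[n - 1] = mass of nu_n
    probs, previous = [], Fraction(0)   # nu_0 = 0
    for m in masses:
        probs.append(m - previous)      # P(K = n) = ||nu_n|| - ||nu_{n-1}||
        previous = m
    return probs
```

The partial sums of the returned list are the masses $\|\nu_n\|$, which is the equality (9.7) that makes the coupling maximal at each index.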
9.4 No Final Element
Suppose a sequence $Y_1,Y_2,\dots$ (without a final element $Y_\infty$) is given. Call
$K$ a coupling index if $\{K\le n\}$ is a coupling event for $\hat Y_n,\hat Y_{n+1},\dots$ for all
$n<\infty$. Then
$$\Bigl\|\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)\Bigr\| \ge \mathbf{P}(K\le n), \quad 1\le n<\infty.$$
Let $Y_\infty$ be a random element such that
$$\mathbf{P}(Y_\infty\in\cdot) \ge \lim_{n\to\infty}\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot).$$
Then Theorem 9.3 yields the following: there exists a coupling of $Y_1,Y_2,\dots$
that is maximal at each index, that is, a coupling with coupling index $K$
such that
$$\Bigl\|\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot)\Bigr\| = \mathbf{P}(K\le n), \quad 0<n<\infty.$$
And Theorem 9.2 yields the following: there exists a coupling of $Y_1,Y_2,\dots$
with a finite coupling index if and only if
$$\lim_{n\to\infty}\bigwedge_{n\le k<\infty}\mathbf{P}(Y_k\in\cdot) \text{ is a probability measure}$$
and if and only if $\liminf_{k\to\infty}f_k$ is a probability density.
9.5 Continuous Index
Consider a continuous index family $Y_t$, $t\in(0,\infty]$. Then a random variable
$K$ in $(0,\infty]$ is a coupling index if
$$\hat Y_t=\hat Y_\infty \quad\text{for } t\ge K.$$
Since $\mathbf{P}(K\le t)$ is right-continuous in $t$, while $\|\bigwedge_{t\le s\le\infty}\mathbf{P}(Y_s\in\cdot)\|$ need
not be [and not left-continuous either], it is clear that Theorem 9.3 must be
modified. We also leave out the density part of Theorem 9.2, but the rest
of the above results can be transferred from discrete to continuous index.
Theorem 9.4. Let $Y_t$, $t\in(0,\infty]$, be random elements in an arbitrary
space. If $K$ is a coupling index, then
$$\Bigl\|\mathbf{P}(Y_\infty\in\cdot)-\bigwedge_{t\le s\le\infty}\mathbf{P}(Y_s\in\cdot)\Bigr\| \le \mathbf{P}(K>t), \quad 0<t<\infty,$$
and there exists a coupling with a finite coupling index if and only if
$$\bigwedge_{t\le s\le\infty}\mathbf{P}(Y_s\in\cdot)\uparrow\mathbf{P}(Y_\infty\in\cdot), \quad t\to\infty.$$
Further, let $t_1,t_2,\dots$ be an increasing sequence of positive real numbers
such that $t_n\to\infty$ as $n\to\infty$. There is a coupling that is maximal at each
index in $\{t_1,t_2,\dots\}$, that is,
$$\Bigl\|\bigwedge_{t\le s\le\infty}\mathbf{P}(Y_s\in\cdot)\Bigr\| = \mathbf{P}(K\le t), \quad t\in\{t_1,t_2,\dots\},$$
or equivalently,
$$\bigwedge_{t\le s\le\infty}\mathbf{P}(Y_s\in\cdot) = \mathbf{P}(\hat Y_\infty\in\cdot,\,K\le t), \quad t\in\{t_1,t_2,\dots\}.$$
PROOF. Let $\mu_s$ be the distribution of $Y_s$. Put $t(s)=t_n$ if $t_n\le s<t_{n+1}$.
In the proof of Theorem 9.3 replace $\{1,\dots,\infty\}$ by $\{t_1,t_2,\dots,\infty\}$ and
$W_1,\dots,W_\infty$ by $W_s$, $s\in(0,\infty]$, where $W_s$ is a random element with
distribution $(\mu_s-\nu_{t(s)})/(1-\|\nu_{t(s)}\|)$ if $\|\nu_{t(s)}\|<1$ and otherwise an arbitrary
distribution. This yields a $\{t_1,t_2,\dots\}$ valued coupling index $K$. The rest
of the theorem follows as in the discrete index case. □
10 Convergence in Distribution and Pointwise
In Chapter 1 (Section 8) we turned convergence in distribution for random
variables into pointwise convergence. We shall now extend this result to
random elements in a space $(E,\mathcal{E})$ where $E$ is separable metric and $\mathcal{E}$
its Borel subsets. In Section 1 below we have a look at the definition of
convergence in distribution and its basic properties and then prove the
coupling result in Section 2.
In Chapter 1 the quantile coupling was used to turn convergence in
distribution into pointwise convergence. The quantile coupling represents all
random variables as measurable functions of a uniform random variable.
In Section 3 below we extend this result to random elements in a Polish
space and in Section 4 use the construction to give another proof of the
coupling result in the special case of a Polish space. This latter coupling
result is due to Skorohod (1956), while the extension to a separable space
was established by Dudley (1968).
10.1 Convergence in Distribution
The following result is basic.
Fact 10.1. Let $Y_1,Y_2,\dots,Y$ be random elements in $(E,\mathcal{E})$ where $E$ is a
metric space with metric $d$ and $\mathcal{E}$ are the Borel subsets of $E$. The following
four conditions are equivalent:
(a) for bounded continuous functions $f$ from $E$ to $\mathbb{R}$,
$$\lim_{n\to\infty}\mathbf{E}[f(Y_n)] = \mathbf{E}[f(Y)],$$
(b) for $A\in\mathcal{E}$ with $\mathbf{P}(Y\in\text{boundary of }A)=0$,
$$\lim_{n\to\infty}\mathbf{P}(Y_n\in A) = \mathbf{P}(Y\in A),$$
(c) for open subsets $A$ of $E$,
$$\liminf_{n\to\infty}\mathbf{P}(Y_n\in A) \ge \mathbf{P}(Y\in A),$$
(d) for closed subsets $A$ of $E$,
$$\limsup_{n\to\infty}\mathbf{P}(Y_n\in A) \le \mathbf{P}(Y\in A).$$
Moreover, if $E$ is the real line and $F_1,F_2,\dots,F$ are the distribution
functions of the random variables $Y_1,Y_2,\dots,Y$ then these equivalent conditions
are also equivalent to
(e) for $x\in\mathbb{R}$ such that $F$ is continuous at $x$,
$$\lim_{n\to\infty}F_n(x) = F(x).$$
For a proof, see Ash (1972), Theorem 4.5.1 and Theorem 4.5.4. (The first
of these theorems is the basic result for so-called weak convergence of
measures. We have stated the result here in the special case of probability
measures and in a random element form).
The sequence $Y_1,Y_2,\dots$ is said to converge in distribution to $Y$ if one of
the equivalent conditions in Fact 10.1 holds (and the distributions are
said to converge weakly). Denote this by
$$Y_n\xrightarrow{D}Y, \quad n\to\infty.$$
Note that convergence in distribution is weaker than convergence in total
variation. The latter is defined for $(E,\mathcal{E})$ arbitrary [see Section 8.2] and
means that (a) holds uniformly in $f\in\mathcal{E}$ bounded by one [or any fixed
constant] and without the continuity restriction. Total variation convergence
also means that (b) holds uniformly in $A\in\mathcal{E}$ and without the restriction
that $\mathbf{P}(Y\in\text{boundary of }A)=0$.
Further, note that if the sequence $Y_1,Y_2,\dots$ converges pointwise to $Y$,
that is,
$$Y_n\to Y, \quad n\to\infty, \quad (\text{short for } \lim_{n\to\infty}d(Y(\omega),Y_n(\omega))=0,\ \omega\in\Omega)$$
then $f(Y_n)\to f(Y)$ pointwise as $n\to\infty$ for continuous functions $f$, and
thus by bounded convergence it follows from (a) that pointwise convergence
implies convergence in distribution.
Finally, note that if $Y_n\xrightarrow{D}Y$ and $Y_n\xrightarrow{D}Y'$, then, due to (a), $\mathbf{E}[f(Y)] =
\mathbf{E}[f(Y')]$ for bounded continuous functions $f$, and thus [since bounded
continuous functions are a measure determining class] $Y$ and $Y'$ have the same
distribution, that is, the limit random element is distributionally unique.
10.2 Turning Distributional Convergence into Pointwise
We shall now show that if $Y_1,Y_2,\dots,Y$ are random elements in a space
$(E,\mathcal{E})$, where $E$ is a separable metric space and $\mathcal{E}$ its Borel subsets, then
$$Y_n\xrightarrow{D}Y, \quad n\to\infty,$$
if and only if there is a coupling $(\hat Y_1,\hat Y_2,\dots,\hat Y)$ of $Y_1,Y_2,\dots,Y$ such that
$$\hat Y_n\to\hat Y, \quad n\to\infty.$$
The 'if' part was established in Section 10.1, and the 'only if' part follows
from the next theorem.
Theorem 10.1. Let $E$ be a separable metric space and $\mathcal{E}$ its Borel subsets.
If a sequence of random elements in $(E,\mathcal{E})$ with distributions $P_1,P_2,\dots$
converges in distribution to a random element with distribution $P$ then there
exist random elements $Y^{(1)},Y^{(2)},\dots,Y$ with distributions $P_1,P_2,\dots,P$ such
that $Y^{(n)}\to Y$ pointwise in the metric as $n\to\infty$.
PROOF. Let $d$ be the metric. A set $A$ in $\mathcal{E}$ is called a $P$-continuity set if
$P(\partial A)=0$ where $\partial A$ denotes the boundary of $A$. Due to separability, for
any fixed $k\ge 1$, $E$ can be covered with countably many sets of diameter
less than $1/k$. Further, these sets may be taken to be $P$-continuity sets
since $\partial\{x\in E : d(y,x)<r\}$ is a subset of $\{x\in E : d(y,x)=r\}$ and the
spheres around $y$ are $P$-continuity sets except for countably many radii $r$.
Note also that the sets can be made disjoint because $\partial(A\cap B)$ is a subset
of $\partial A\cup\partial B$.
Let $\{A_{11},A_{12},\dots\}$ be a countable partition of $E$ into $P$-continuity sets
with diameter less than 1. Let $\{A_{2i1},A_{2i2},\dots\}$ be a countable partition
of $A_{1i}$ into $P$-continuity sets with diameter less than $1/2$. Continue this
recursively: let $\{A_{(k+1)i_1\dots i_k1},A_{(k+1)i_1\dots i_k2},\dots\}$ be a countable partition of
$A_{ki_1\dots i_k}$ into $P$-continuity sets with diameter less than $1/(k+1)$. Since countable
unions of countable classes are countable, it follows that for each $k\ge 1$,
$\{A_{ki_1\dots i_k} : (i_1,\dots,i_k)\in\{1,2,\dots\}^k\}$ is a countable partition of the set $E$
into $P$-continuity sets with diameter less than $1/k$.
Let $Y$ be a random element with distribution $P$ and define a sequence
of positive random integers $M_1,M_2,\dots$ by
$$(M_1,\dots,M_k) = (i_1,\dots,i_k) \quad\text{if } Y\in A_{ki_1\dots i_k}.$$ (10.1)
Thus
$$\mathbf{P}((M_1,\dots,M_k)=(i_1,\dots,i_k)) = P(A_{ki_1\dots i_k}).$$
For each $n\ge 1$, let $M_1^{(n)},M_2^{(n)},\dots$ be a sequence of positive random
integers with distribution defined by
$$\mathbf{P}((M_1^{(n)},\dots,M_k^{(n)})=(i_1,\dots,i_k)) = P_n(A_{ki_1\dots i_k}).$$
We shall prove in the next subsection that the $M_1^{(n)},M_2^{(n)},\dots$ can be
chosen in such a way that for each $1\le k<\infty$ there exists a finite random
integer $N_k$ such that
$$(M_1^{(n)},\dots,M_k^{(n)}) = (M_1,\dots,M_k), \quad n\ge N_k.$$ (10.2)
Assume at this point that (10.2) holds.
Let $V^{(n)}_{ki_1\dots i_k}$ be an $A_{ki_1\dots i_k}$ valued random element that is independent
of $(M_1^{(n)},\dots,M_k^{(n)})$ and has distribution $P_n(\cdot\,|\,A_{ki_1\dots i_k})$. Define a random
element $Y_k^{(n)}$ by
$$Y_k^{(n)} = V^{(n)}_{ki_1\dots i_k} \quad\text{if } (M_1^{(n)},\dots,M_k^{(n)})=(i_1,\dots,i_k).$$
Then $Y_k^{(n)}$ has the distribution $P_n$. Since $j\ge k$ implies that $A_{ji_1\dots i_j}$ is a
subset of $A_{ki_1\dots i_k}$, it follows from (10.1) and (10.2) that if $n\ge N_k$ and
$j\ge k$, then $Y$ and $Y_j^{(n)}$ are in the same $A_{ki_1\dots i_k}$ set. Thus
$$d(Y,Y_j^{(n)}) < 1/k \quad\text{if } n\ge N_k \text{ and } j\ge k.$$
Define
$$Y^{(n)} = Y^{(n)}_{k_n} \quad\text{where } k_n = \sup\{k\le n : \mathbf{P}(N_k>n)<1/k\}.$$
Then $Y^{(n)}$ has distribution $P_n$, $k_n\to\infty$ as $n\to\infty$, $k_n\ge k_m$ if $n\ge m$, and
$$d(Y,Y^{(n)}) < 1/k_m \quad\text{if } n\ge N_{k_m} \text{ and } n\ge m.$$
Since the $N_k$ are finite, this yields
$$\limsup_{n\to\infty}d(Y,Y^{(n)}) \le 1/k_m \to 0 \quad\text{as } m\to\infty,$$
that is, $\lim_{n\to\infty}d(Y,Y^{(n)})=0$. Thus it only remains to establish (10.2) to
complete the proof. □
10.3 Proof of (10.2) - Transfer
We now show that the $M_1^{(n)},M_2^{(n)},\dots$ can be chosen so that (10.2) holds. In
the proof we shall use Theorem 6.1 in Chapter 1 and the transfer extension
(Section 4.5 of this chapter) inductively.
Since $P(\partial A_{ki_1\dots i_k})=0$, we have the following convergence of probability
mass functions: for all $(i_1,\dots,i_k)\in\{1,2,\dots\}^k$,
$$\mathbf{P}((M_1^{(n)},\dots,M_k^{(n)})=(i_1,\dots,i_k)) \to \mathbf{P}((M_1,\dots,M_k)=(i_1,\dots,i_k)), \quad n\to\infty.$$
Take $k=1$ and apply Theorem 6.1 in Chapter 1: there is a coupling
$(\hat M_1^{(1)},\hat M_1^{(2)},\dots,\hat M_1)$ of $M_1^{(1)},M_1^{(2)},\dots,M_1$ and a finite random integer
$N_1$ such that
$$\hat M_1^{(n)} = \hat M_1, \quad n\ge N_1.$$ (10.3)
Apply transfer: since $M_1$ is discrete, there exists a regular conditional
distribution of $(\hat M_1^{(1)},\hat M_1^{(2)},\dots,\hat M_1,N_1)$ given $\hat M_1$, which together with
$\hat M_1\overset{D}{=}M_1$ means that we may take $\hat M_1 := M_1$. Then apply transfer countably
many times recursively in $n$: each $M_1^{(n)}$ is discrete, and thus there
exists a regular conditional distribution of $(M_1^{(n)},M_2^{(n)},\dots)$ given $M_1^{(n)}$,
which together with $\hat M_1^{(n)}\overset{D}{=}M_1^{(n)}$ implies that we may take $M_1^{(n)} := \hat M_1^{(n)}$
for all $n$. Due to $\hat M_1=M_1$ and $\hat M_1^{(n)}=M_1^{(n)}$, $1\le n<\infty$, we obtain from
(10.3) that (10.2) holds for $k=1$.
In order to establish (10.2) inductively for finitely many $k$, repeat the
above argument. Fix $k\ge 2$ and suppose (10.2) holds with $k$ replaced by 1
through $k-1$. Apply Theorem 6.1 in Chapter 1 to obtain that there is a
coupling
$$((\hat M_1^{(1)},\dots,\hat M_k^{(1)}),(\hat M_1^{(2)},\dots,\hat M_k^{(2)}),\dots,(\hat M_1,\dots,\hat M_k))$$
of the collection of $k$-dimensional random integer vectors
$$(M_1^{(1)},\dots,M_k^{(1)}),(M_1^{(2)},\dots,M_k^{(2)}),\dots,(M_1,\dots,M_k)$$
and a finite random integer $N_k$ such that
$$(\hat M_1^{(n)},\dots,\hat M_k^{(n)}) = (\hat M_1,\dots,\hat M_k), \quad n\ge N_k.$$ (10.4)
Apply transfer to obtain that we may take
$$(\hat M_1,\dots,\hat M_k) := (M_1,\dots,M_k),$$
$$(\hat M_1^{(n)},\dots,\hat M_{k-1}^{(n)}) := (M_1^{(n)},\dots,M_{k-1}^{(n)}), \quad 1\le n<\infty.$$ (10.5)
Then apply transfer countably many times recursively in $n$ to obtain that
we may take
$$M_k^{(n)} := \hat M_k^{(n)}, \quad 1\le n<\infty.$$ (10.6)
Due to (10.5) and (10.6), we obtain from (10.4) that (10.2) holds for $k$.
In order to establish (10.2) for all $k$ note that in the $k$th step of the
recursive transfer construction of
$$(M_1,M_2,\dots),\ (N_1,N_2,\dots),\ \text{and } (M_1^{(n)},M_2^{(n)},\dots),\ 1\le n<\infty,$$
we did not [see (10.5) and (10.6)] redefine
$$(M_1,M_2,\dots),\ (N_1,\dots,N_{k-1}),\ \text{and } (M_1^{(n)},\dots,M_{k-1}^{(n)}),\ 1\le n<\infty.$$
Thus the recursive transfer construction is consistent and can be repeated
countably many times recursively in $k$. Thus (10.2) holds for all $k$.
10.4 Representation on the Lebesgue Interval
The Lebesgue interval is the probability space $([0,1],\mathcal{B}[0,1],\lambda)$ where $\lambda$
is the Lebesgue measure on $([0,1],\mathcal{B}[0,1])$. We shall now generalize the
quantile coupling (Section 3 in Chapter 1) by showing that all random
elements in Polish spaces can be represented as measurable mappings of a
single uniform variable. This result can be stated as follows.
Theorem 10.2. For each probability measure P on a Polish space (E,£)
there is a random element, defined on the Lebesgue interval, with
distribution P.
Proof. Recall that $(E,\mathcal{E})$ Polish means that there is a metric $d$ on $E$
making $E$ a complete [Cauchy sequences converge] separable [there is a
countable dense subset] metric space with $\mathcal{E}$ the Borel sets.
For each $k\ge 1$, construct a countable [$E$ is separable] partition $\mathcal{A}_k =
\{A_{k1},A_{k2},\dots\}$ of $E$ into sets (in $\mathcal{E}$) of diameter less than $1/k$. Let $\mathcal{A}_{k+1}$
refine $\mathcal{A}_k$, that is, let each set in $\mathcal{A}_k$ be a union of sets in $\mathcal{A}_{k+1}$ [see the second
paragraph of the proof of Theorem 10.1 and note that the $P$-continuity
requirement is not needed in this proof]. Then construct countable partitions
$\mathcal{I}_k=\{I_{k1},I_{k2},\dots\}$ of $[0,1]$ into subintervals whose length is
$$\lambda(I_{ki}) = P(A_{ki})$$
with $\mathcal{I}_{k+1}$ refining $\mathcal{I}_k$. Arrange the indexing so that $A_{ki}\supseteq A_{k+1,j}$ if and
only if $I_{ki}\supseteq I_{k+1,j}$. The construction can be carried out inductively because
if $a_1,a_2,\dots$ are nonnegative numbers adding to the length of an interval,
then the interval can be split into subintervals of lengths $a_1,a_2,\dots$.
Let $y_{ki}$ be some element of $A_{ki}$ and define, for $u\in[0,1]$,
$$Y_k(u) = y_{ki} \quad\text{if } u\in I_{ki}.$$ (10.7)
Since [for fixed $u$] the set $\{Y_k(u),Y_{k+1}(u),\dots\}$ is a subset of a single element
of $\{A_{k1},A_{k2},\dots\}$, its diameter is at most $1/k$. Thus $Y_k(u),Y_{k+1}(u),\dots$ is
a Cauchy sequence for each $u$. Thus the limit $Y(u)$ exists [$E$ is complete]
and
$$d(Y(u),Y_k(u)) \le 1/k.$$ (10.8)
For $B\in\mathcal{E}$ let $B_k$ be the closure of
$$\{y\in E : d(y,x)<1/k \text{ for some } x\in B\} \quad (1/k \text{ neighborhood of } B).$$
With $k$ fixed and $L$ the set of positive integers $i$ such that $B\cap A_{ki}\ne\emptyset$ we
have
$$\lambda(Y_k\in B) \le \lambda\Bigl(Y_k\in\bigcup_{i\in L}A_{ki}\Bigr) = \sum_{i\in L}\lambda(Y_k\in A_{ki}) = \sum_{i\in L}\lambda(I_{ki})$$
$$= \sum_{i\in L}P(A_{ki}) = P\Bigl(\bigcup_{i\in L}A_{ki}\Bigr) = P\Bigl(\bigcup_{i:\,A_{ki}\cap B\ne\emptyset}A_{ki}\Bigr) \le P(B_k).$$
If $B$ is closed, then $B_k$ decreases to $B$ as $k\to\infty$, and it follows that
$$\limsup_{k\to\infty}\lambda(Y_k\in B) \le P(B), \quad\text{for closed subsets } B \text{ of } E.$$
Thus, by definition [see (d) of Fact 10.1], it holds that $Y_k$ converges in
distribution to a random element with distribution $P$. Now $Y_k$ converges to
$Y$ pointwise, and thus in distribution. Since such a limit is distributionally
unique, we deduce that $Y$ has distribution $P$. □
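A one-dimensional instance makes the construction concrete. The sketch below takes $E=[0,1)$ with a distribution given by a continuous, strictly increasing distribution function, uses dyadic cells of width $2^{-k}$ in place of cells of diameter less than $1/k$, and picks the left endpoint of each cell as its representative point. These choices, and the function name `make_Y_k`, are assumptions of the example rather than part of the proof.

```python
from bisect import bisect_right
import math


def make_Y_k(cdf, k):
    """Step-function approximation Y_k on the Lebesgue interval.

    cdf: distribution function of P on [0, 1], assumed continuous and
    strictly increasing with cdf(0) = 0 and cdf(1) = 1.
    The cell A_{ki} = [i / 2^k, (i + 1) / 2^k) is matched with the interval
    I_{ki} = [cdf(i / 2^k), cdf((i + 1) / 2^k)), whose length is P(A_{ki}).
    """
    n = 2 ** k
    ends = [cdf(i / n) for i in range(n + 1)]   # endpoints of the I_{ki}

    def Y_k(u):
        # Index i with ends[i] <= u < ends[i + 1]; clamp for u == 1.
        i = min(bisect_right(ends, u) - 1, n - 1)
        return i / n                             # representative y_{ki}

    return Y_k


def square_cdf(x):
    """Example distribution function F(x) = x^2 on [0, 1]."""
    return x * x
```

With `square_cdf` the limit is $Y(u)=\sqrt{u}$, and `make_Y_k(square_cdf, k)(u)` stays within $2^{-k}$ of it, the analogue of (10.8) for this dyadic choice.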
10.5 The Skorohod Coupling in the Polish Case
We shall now elaborate on the construction in the previous subsection to
reprove Theorem 10.1 in the special case of random elements in a Polish space
(which means that in addition to separability we also have completeness).
The proof is quite different and yields the extra result that the random
elements are all supported by the Lebesgue interval.
Theorem 10.3. If a sequence of random elements in a Polish space $(E,\mathcal{E})$
with distributions $P_1,P_2,\dots$ converges in distribution to a random
element with distribution $P$, then there exist, on the Lebesgue interval,
random elements $Y^{(1)},Y^{(2)},\dots,Y$ with distributions $P_1,P_2,\dots,P$ such that
$Y^{(n)}\to Y$ pointwise in the metric as $n\to\infty$.
PROOF. Construct the countable partitions $\mathcal{A}_k$ of the preceding proof, but
this time require that each $A_{ki}$ be a $P$-continuity set [see the first paragraph
of the proof of Theorem 10.1]. Consider the countable partitions $\mathcal{I}_k$ as
before and, for each $n$, construct successively finer [as $k$ increases] partitions
$\mathcal{I}_k^{(n)}=\{I_{k1}^{(n)},I_{k2}^{(n)},\dots\}$ with
$$\lambda(I_{ki}^{(n)}) = P_n(A_{ki}).$$
Inductively, arrange the indexing so that
$$I_{ki}^{(n)}<I_{kj}^{(n)} \quad\text{if and only if}\quad I_{ki}<I_{kj}$$
[here $I<J$ means that the right endpoint of $I$ does not exceed the
left endpoint of $J$]. In other words, arrange that for each $k$ the families
$\mathcal{I}_k,\mathcal{I}_k^{(1)},\mathcal{I}_k^{(2)},\dots$ are ordered similarly.
Define $Y_k$ by (10.7), as before, where $y_{ki}\in A_{ki}$, and define
$$Y_k^{(n)}(u) = y_{ki} \quad\text{if } u\in I_{ki}^{(n)}.$$
Again $Y_k(u)$ converges to a limit $Y(u)$ satisfying (10.8) and $Y_k^{(n)}(u)$
converges as $k\to\infty$ to a limit $Y^{(n)}(u)$ satisfying
$$d(Y^{(n)}(u),Y_k^{(n)}(u)) \le 1/k.$$ (10.9)
And again $Y$ has distribution $P$ and $Y^{(n)}$ distribution $P_n$.
Since the $A_{ki}$ are $P$-continuity sets, we have by the convergence in
distribution assumption [see (b) of Fact 10.1]
$$\lambda(Y_k^{(n)}=y_{ki}) = P_n(A_{ki}) \to P(A_{ki}) = \lambda(Y_k=y_{ki}), \quad n\to\infty,$$
that is, for each $k$, the probability mass functions of $Y_k^{(n)}$ converge pointwise
to that of $Y_k$ as $n\to\infty$. Theorem 6.1 in Chapter 1 now yields that $Y_k^{(n)}$
tends to $Y_k$ in total variation as $n\to\infty$, and thus [since $\lambda(Y_k^{(n)}=y_{ki}) =
\lambda(I_{ki}^{(n)})$ and $\lambda(Y_k=y_{ki})=\lambda(I_{ki})$] for any set $L$ of nonnegative integers
$$\lambda\Bigl(\bigcup_{i\in L}I_{ki}^{(n)}\Bigr) \to \lambda\Bigl(\bigcup_{i\in L}I_{ki}\Bigr), \quad n\to\infty.$$ (10.10)
Fix $k$ and $j$ and choose
$$L = \{i\ge 1 : I_{ki}<I_{kj}\} = \{i\ge 1 : I_{ki}^{(n)}<I_{kj}^{(n)}\}$$
to obtain from (10.10) that the left endpoint of $I_{kj}^{(n)}$ goes to the left endpoint
of $I_{kj}$ as $n\to\infty$. Similarly, the right endpoint of $I_{kj}^{(n)}$ goes to the right
endpoint of $I_{kj}$.
Hence, if $u$ is in the interior of $I_{kj}$, then for all sufficiently large $n$, $u$ lies
in $I_{kj}^{(n)}$, so that $Y_k^{(n)}(u)=Y_k(u)$ and, by (10.8) and (10.9),
$$d(Y(u),Y^{(n)}(u)) \le d(Y(u),Y_k(u)) + 0 + d(Y^{(n)}(u),Y_k^{(n)}(u)) \le 2/k.$$
Send first $n\to\infty$ and then $k\to\infty$ to obtain:
if $u$ is not an endpoint of any $I_{kj}$, then $Y^{(n)}(u)\to Y(u)$ as $n\to\infty$.
The set of endpoints of the $I_{kj}$ is countable and thus has Lebesgue measure
zero so that if $Y^{(n)}(u)$ is redefined as $Y(u)$ on this set, then $Y^{(n)}$ still has
distribution $P_n$, and there is now convergence for all $u$. □
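On the real line, the quantile coupling of Chapter 1 gives a simple concrete instance of this kind of statement: all variables are functions of one uniform $u$, and convergence in distribution shows up as pointwise convergence. The sketch below assumes exponential distributions with rates $1+1/n$ converging to rate 1; this family and the function names are illustrative choices, not taken from the text.

```python
import math


def exponential_quantile(u, rate):
    """Quantile function of the exponential distribution with the given rate.

    For u uniform on [0, 1), the value -log(1 - u) / rate has that
    exponential distribution.
    """
    return -math.log(1.0 - u) / rate


def Y_n(u, n):
    """Representation of P_n = Exp(1 + 1/n) on the Lebesgue interval."""
    return exponential_quantile(u, 1.0 + 1.0 / n)


def Y(u):
    """Representation of the limit P = Exp(1) on the Lebesgue interval."""
    return exponential_quantile(u, 1.0)
```

For each fixed `u` in `[0, 1)` the gap `Y(u) - Y_n(u, n)` equals `Y(u) / (n + 1)` and hence shrinks to zero as `n` grows, which is pointwise convergence on the Lebesgue interval.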
Chapter 4
STOCHASTIC PROCESSES
1 Introduction
In this chapter we shall be concerned with coupling general stochastic
processes (in one-sided time) in such a way that they ultimately merge. This
was our main concern in the first half of Chapter 2 and in Section 5 of
Chapter 3. Recall, for instance, the classical coupling: two differently started
versions of a Markov chain run independently until they meet, at a time T
say, and then run together from time T onward.
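The classical coupling just recalled can be simulated in a few lines for a finite-state chain. The sketch below is an illustration under assumed inputs (a transition matrix given as a list of rows, a fixed horizon, and a seeded generator); the function names are hypothetical.

```python
import random


def step(state, P, rng):
    """One transition of the chain from `state` using the row P[state]."""
    u, acc = rng.random(), 0.0
    for nxt, w in enumerate(P[state]):
        acc += w
        if u < acc:
            return nxt
    return len(P[state]) - 1   # guard against floating-point round-off


def classical_coupling(P, x0, y0, n_steps, seed=0):
    """Run two versions of the chain independently until they meet.

    Returns (x, y, T): the two paths of length n_steps + 1 and the meeting
    time T (None if the paths have not met within the horizon). From T
    onward the second path copies the first.
    """
    rng = random.Random(seed)
    x, y = [x0], [y0]
    T = 0 if x0 == y0 else None
    for t in range(1, n_steps + 1):
        nx = step(x[-1], P, rng)
        # Before T the second chain moves on its own; after T it follows.
        ny = nx if T is not None else step(y[-1], P, rng)
        x.append(nx)
        y.append(ny)
        if T is None and nx == ny:
            T = t
    return x, y, T
```

Each path on its own is a version of the chain, since after the meeting time the copied step starts from the common state and therefore has the correct transition law.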
For lack of a better term we shall call this kind of coupling exact coupling.
The qualifier 'exact' here refers to the fact that the processes coincide
exactly from T onward as opposed to what was the case in the latter part
of Chapter 2 and will be the case in Chapter 5, where the processes merge
only modulo a random time shift (shift-coupling, epsilon-coupling). Exact
coupling is in fact what some writers still call only 'coupling'. The word
coupling then refers to the merging, and not to joint construction in general
as in this book and as is becoming more and more common.
Section 2 starts with preliminaries establishing notation and discussing
the definition of a stochastic process to make sure that our simple abstract
framework and the reasons why it is chosen are understood.
Section 3 then introduces exact coupling and its distributional version,
which is obtained by replacing pointwise merging by distributional merging.
Distributional exact coupling is not as intuitively appealing as its
nondistributional counterpart but has several merits. It applies (slightly?) more
generally and has the same distributional implications. It is easier to
establish and can serve as a first step in constructing a nondistributional exact
coupling. Section 4 takes a brief look at distributional coupling in general
and establishes the coupling event inequality.
Section 5 presents the coupling time inequality and the resulting limit
theory. Section 6 proves the central maximality result, and Section 8
reformulates the proof after the concept of coupling with respect to a
sub-$\sigma$-algebra has been introduced in Section 7. Section 9 introduces the tail
$\sigma$-algebra $\mathcal{T}$ and proves a result on maximal coupling with respect to $\mathcal{T}$
that leads to a basic set of equivalences between successful exact coupling,
convergence in total variation, and distributional identity on $\mathcal{T}$.
This chapter is followed by Chapter 5, where analogous theory is
established for two generalizations of exact coupling: shift-coupling and epsilon
couplings. Chapter 6 then considers the implications of these three sets
of coupling results in the Markov case, while Chapter 7 extends the view
beyond stochastic processes.
2 Preliminaries - What Is a Stochastic Process?
Before turning to the coupling theory we spend a number of pages (a third
of the chapter!) on some general but simple aspects of stochastic processes
that are basic for our purposes. The impatient reader could skim rapidly
through this section. Much of it is a motivation for the technically
straightforward property of 'shift-measurability', which we often impose in order to
be able to shift our processes randomly, and which is satisfied in the
standard settings such as when the state space is Polish and the paths right
continuous.
2.1 Classical Definition
The classical definition of a stochastic process goes as follows: a stochastic
process with index set $I$ and state space $(E,\mathcal{E})$ is a family $Z = (Z_s)_{s\in I}$,
where the $Z_s$ are random elements defined on a common probability space
$(\Omega,\mathcal{F},P)$ and all taking values in $(E,\mathcal{E})$.
We shall here think of the index set as time and mostly restrict the use
of the term 'stochastic process' to the following four cases:
$I = \mathbb{R}$   two-sided continuous time,
$I = [0,\infty)$   one-sided continuous time,
$I = \mathbb{Z}$   two-sided discrete time,
$I = \{0,1,2,\dots\}$   one-sided discrete time.
In this chapter we shall, in fact, only be concerned with one-sided processes.
But in this section the index set I is kept general for later purposes.
Warning. To avoid misunderstanding it should be stressed that here the
role of the index set $I$ is different from the role it had in Chapter 3. We
are not going to couple the collection of random elements $Z_s$, $s\in I$, that is,
their joint distribution will be fixed. We shall couple $Z$ and another process
$Z'$; the collection to be coupled will be $Z$ and $Z'$.
2.2 Stochastic Process as a Random Mapping in $(E^{I},\mathcal{E}^{I})$
Let $(E^{I},\mathcal{E}^{I})$ denote the product space
$$(E^{I},\mathcal{E}^{I}) := \bigotimes_{s\in I}(E,\mathcal{E}).$$
Rather than regarding $Z$ in the classical way as a family of random elements
in $(E,\mathcal{E})$, we can equivalently regard $Z$ as a random mapping, that is, as a
single random element in $(E^{I},\mathcal{E}^{I})$ defined by
$$Z(\omega) = (Z_s(\omega))_{s\in I},\quad \omega\in\Omega.$$
The two points of view are equivalent because each $Z_s$ is a measurable
mapping from $(\Omega,\mathcal{F})$ to $(E,\mathcal{E})$ if and only if $Z$ is a measurable mapping
from $(\Omega,\mathcal{F})$ to $(E^{I},\mathcal{E}^{I})$.
The distribution of a stochastic process $Z$ is the distribution of $Z$ as a
random element in $(E^{I},\mathcal{E}^{I})$. The distribution of $Z$ is uniquely determined
by the finite-dimensional distributions, that is, by the distributions of the
$(E^{n},\mathcal{E}^{n})$ valued random elements $(Z_{t_1},\dots,Z_{t_n})$, $t_1,\dots,t_n\in I$, $n\ge 1$.
These finite-dimensional distributions are in turn determined by
$$P(Z_{t_1}\in A_1,\dots,Z_{t_n}\in A_n),\quad A_1,\dots,A_n\in\mathcal{E},\ t_1,\dots,t_n\in I,\ n\ge 1.$$
In particular, when $Z$ is real valued, that is, $(E,\mathcal{E}) = (\mathbb{R},\mathcal{B})$, then the
finite-dimensional distributions are determined by the finite-dimensional
distribution functions
$$P(Z_{t_1}\le x_1,\dots,Z_{t_n}\le x_n),\quad x_1,\dots,x_n\in\mathbb{R},\ t_1,\dots,t_n\in I,\ n\ge 1.$$
Finally, for Polish $(E,\mathcal{E})$, Kolmogorov's consistency theorem (Fact 3.2 in
Chapter 3) states that if a consistent collection of finite-dimensional
distributions is given, then there always exists a stochastic process having these
finite-dimensional distributions.
2.3 Path Space $(H,\mathcal{H})$ - Standard Settings
The paths of $Z$ are the realizations $Z(\omega)$, $\omega\in\Omega$, of the random mapping $Z$.
Sometimes restrictions are put on the paths, for instance in the continuous-time
case that they are continuous, or right-continuous, or right-continuous
with left-hand limits, or more generally that they lie in some subset $H$ of
$E^{I}$. In this case it is natural to consider $Z$ as a random element not in
$(E^{I},\mathcal{E}^{I})$ but in $(H,\mathcal{H})$, where $\mathcal{H}$ is the $\sigma$-algebra on $H$ generated by the
projection mappings taking $z$ in $H$ to $z_t$ in $E$, $t\in I$. Note that $\mathcal{H}$ is the
trace of $H$ on $\mathcal{E}^{I}$, that is,
$$\mathcal{H} := \mathcal{E}^{I}\cap H := \{A\cap H : A\in\mathcal{E}^{I}\}.$$
Again the two points of view are equivalent: $Z$ has $H$ valued paths and
each $Z_t$ is a measurable mapping from $(\Omega,\mathcal{F})$ to $(E,\mathcal{E})$ if and only if $Z$ is a
measurable mapping from $(\Omega,\mathcal{F})$ to $(H,\mathcal{H})$. When a particular path set $H$
is given, call $(H,\mathcal{H})$ the path space of $Z$. (One could conceive of allowing
$\mathcal{H}$ to be more general than just the trace of $H$ on $\mathcal{E}^{I}$, but we shall not do
so here.)
Note that $H$ need not be an element of $\mathcal{E}^{I}$. In particular, $H$ is not an
element of $\mathcal{E}^{I}$ in the standard settings in continuous time when $(E,\mathcal{E})$ is
Polish, $I = \mathbb{R}$ or $I = [0,\infty)$, and $H$ is one of the sets
$C_E(I)$ = continuous maps from $I$ to $E$,
$D_E(I)$ = right-continuous maps from $I$ to $E$ with left-hand limits,
$R_E(I)$ = right-continuous maps from $I$ to $E$.
Both $C_E(I)$ and $D_E(I)$ can be metrized such that the Borel $\sigma$-algebras,
$\mathcal{C}_E(I)$ and $\mathcal{D}_E(I)$, are the traces of the respective path sets $C_E(I)$ and $D_E(I)$
on $\mathcal{E}^{I}$ (see Ethier and Kurtz (1986)), but we shall not need this fact here
[except for three isolated results, Theorem 7.4 in Chapter 5, Theorem 5.4
in Chapter 8, and the (v)-part of Theorem 3.4(6) in Chapter 10].
The distribution of a stochastic process $Z$ with path space $(H,\mathcal{H})$ is the
distribution of $Z$ as a random element in $(H,\mathcal{H})$. The distribution is again
uniquely determined by the finite-dimensional distributions.
However, even if $(E,\mathcal{E})$ is Polish and a consistent collection of
finite-dimensional distributions is given, there need not be a stochastic process
with path space $(H,\mathcal{H})$ having these finite-dimensional distributions. This
has to be checked in each individual case. One example where it can be
established is for the Wiener process: there is a one-sided continuous-time
real-valued process with path space $(C_E(I),\mathcal{C}_E(I))$ having independent
stationary increments (see Billingsley (1986) or Kallenberg (1997); we will
only use this fact for an isolated example, Section 8.1 in Chapter 7).
2.4 Observing a Process at a Random Time
We would like to be able to observe a stochastic process at a random time.
There is no complication with this in discrete time, but in continuous time
there is. So let $Z$ be a one-sided continuous-time stochastic process with
state space $(E,\mathcal{E})$ and let $T$ be a random time in $[0,\infty)$. By $Z_T$ we mean
the $E$ valued mapping defined on $\Omega$ in the obvious way:
$$Z_T(\omega) := Z_{T(\omega)}(\omega),\quad \omega\in\Omega.$$
This mapping need not be $\mathcal{F}/\mathcal{E}$ measurable.
A condition sometimes imposed to take care of measurability
complications is the following: a continuous-time one-sided stochastic process $Z$ is
jointly measurable if
the mapping taking $(\omega,t)\in\Omega\times[0,\infty)$ to $Z_t(\omega)\in E$
is $\mathcal{F}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable.
This condition implies that $Z_T$ is measurable, since $Z_T$ is the composition
of two measurable mappings: the first, taking $\omega$ to $(\omega,T(\omega))$, is
$\mathcal{F}/\mathcal{F}\otimes\mathcal{B}[0,\infty)$ measurable, and the second, taking $(\omega,T(\omega))$ onward to $Z_{T(\omega)}(\omega)$,
is $\mathcal{F}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable.
However, $Z_T$ being measurable is not all we need, as is illustrated in the
next subsection.
2.5 Joint Measurability Is Not Enough
Consider the following example.
Example 2.1. Take $(E,\mathcal{E}) = (\mathbb{R},\mathcal{B})$ and let $(\Omega,\mathcal{F},P)$ be the Lebesgue
interval:
$$\Omega = [0,1],\quad \mathcal{F} = \mathcal{B}([0,1]),\quad P = \text{Lebesgue measure}.$$
Put $T(\omega) = \omega$ and let $Z$, $Z'$ be the one-sided continuous-time stochastic
processes defined by
$$Z_s(\omega) = 0,\quad s\in[0,\infty),\ \omega\in\Omega,$$
$$Z'_s(\omega) = 1_{\{0,1,\dots\}}(s-\omega),\quad s\in[0,\infty),\ \omega\in\Omega.$$
Trivially, $Z$ is jointly measurable. Also, $Z'$ is jointly measurable, since as
a mapping from $\Omega\times[0,\infty)$ to $E$, it is a composition of two measurable
mappings: the first,
taking $(\omega,t)$ in $\Omega\times[0,\infty)$ to $t-\omega$ in $\mathbb{R}$,
is $\mathcal{F}\otimes\mathcal{B}[0,\infty)/\mathcal{B}$ measurable, and the second,
taking $t-\omega$ in $\mathbb{R}$ onward to $1_{\{0,1,\dots\}}(t-\omega)$ in $\mathbb{R}$,
is $\mathcal{B}/\mathcal{B}$ measurable.
Now, the finite-dimensional distributions of $Z$ and $Z'$ are identical, and
thus $Z$ and $Z'$ have the same distribution. In fact, for $0\le u\le 1$,
$$P(Z'_{t_1}=0,\dots,Z'_{t_n}=0,\ T\le u) = P(Z_{t_1}=0,\dots,Z_{t_n}=0,\ T\le u) = u.$$
Thus $(Z,T)\overset{D}{=}(Z',T)$. In spite of this, $Z_T$ and $Z'_T$ do not have the same
distribution, since certainly $Z_T = 0$, while $Z'_T = 1$.
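Example 2.1 can be explored numerically. The following is only an illustrative sketch: the function names, the sample size, and the particular fixed times are assumptions made here and are not part of the text. It samples $\omega$ uniformly, evaluates $Z'$ at a few fixed times (where it vanishes except on a null set of $\omega$) and at the random time $T(\omega)=\omega$ (where it always equals 1).

```python
import random

def z_path(s, omega):
    """Z_s(omega) = 0 for every s and omega."""
    return 0

def z_prime_path(s, omega):
    """Z'_s(omega) = 1 if s - omega lies in {0, 1, 2, ...}, else 0."""
    d = s - omega
    return 1 if d >= 0 and d == int(d) else 0

def simulate(n_samples=10_000, fixed_times=(0.25, 0.5, 1.5, 3.0), seed=0):
    """Return (number of nonzero values of Z' at the fixed times,
               number of samples with Z'_T = 1)."""
    rng = random.Random(seed)
    fixed_time_hits = 0
    random_time_ones = 0
    for _ in range(n_samples):
        omega = rng.random()          # omega ~ Uniform[0, 1)
        T = omega                     # the random time T(omega) = omega
        fixed_time_hits += sum(z_prime_path(t, omega) for t in fixed_times)
        random_time_ones += z_prime_path(T, omega)
        assert z_path(T, omega) == 0  # Z_T is identically 0
    return fixed_time_hits, random_time_ones

hits, ones = simulate()
print("nonzero values of Z' at fixed times:", hits)
print("samples with Z'_T = 1:", ones)
```

At every fixed time the two processes look alike in the simulation, yet at the random time $T$ they differ on every sample, which is the phenomenon the example is built to exhibit.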
2.6 Canonical Joint Measurability May Be Needed
Example 2.1 shows the following: knowing that $(Z,T)\overset{D}{=}(Z',T)$ and that
$Z_T$ and $Z'_T$ are random elements does not suffice to deduce that $Z_T$ and
$Z'_T$ have the same distribution. This is typically the sort of conclusion we
would like to be able to draw, so what went wrong?
Not surprisingly, the condition we need is joint measurability of the
canonical versions of the processes: say that a continuous-time one-sided
stochastic process with path space $(H,\mathcal{H})$ is canonically jointly measurable
if
the mapping taking $(z,t)\in H\times[0,\infty)$ to $z_t\in E$
is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable.
This is the condition needed to draw the conclusion, from the fact that
$(Z,T)$ and $(Z',T)$ have the same distribution, that $Z_T$ and $Z'_T$ also have
the same distribution.
Thus in Example 2.1 there is no way to find a path space such that the
processes are canonically jointly measurable. Canonical joint measurability
on the other hand implies joint measurability, since a canonically jointly
measurable $Z$, as a mapping from $\Omega\times[0,\infty)$ to $E$, is the composition
of two measurable mappings: the first, taking $(\omega,t)$ to $(Z(\omega),t)$, is
$\mathcal{F}\otimes\mathcal{B}[0,\infty)/\mathcal{H}\otimes\mathcal{B}[0,\infty)$ measurable, and the second, taking $(Z(\omega),t)$ onward
to $Z_t(\omega)$, is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable. Hence, canonical joint measurability
is a strictly stronger condition than joint measurability.
2.7 In Fact, Shift-Measurability May Be Needed
Rather than observing our process only at a random time we would also
like to be able to observe the whole process from that time onward. Say
that a path set $H$ of a one-sided continuous-time stochastic process $Z$ is
internally shift-invariant if
$$\{(z_{t+s})_{s\in[0,\infty)} : z\in H\} = H,\quad t\in[0,\infty).$$
When this is the case, define the shift-maps $\theta_t$, $t\in[0,\infty)$, from $H$ to $H$
by
$$\theta_t z = (z_{t+s})_{s\in[0,\infty)},\quad z\in H.$$
Say that $Z$ is shift-measurable if $Z$ has a path set $H$ that is internally
shift-invariant and
the mapping taking $(z,t)\in H\times[0,\infty)$ to $\theta_t z\in H$
is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{H}$ measurable.
A stochastic process with an internally shift-invariant path set $H$ is
shift-measurable if and only if it is canonically jointly measurable: canonical
joint measurability is (trivially) equivalent to the mapping taking $(z,t+s)$
to $z_{t+s}$ being $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable for each $s\in[0,\infty)$, which in turn
is equivalent to shift-measurability, since $\mathcal{H}$ is generated by the projection
mappings.
2.8 Shift-Measurability Holds in the Standard Settings
The standard cases where the state space $(E,\mathcal{E})$ is Polish and the path
sets are $C_E[0,\infty)$, $D_E[0,\infty)$, and $R_E[0,\infty)$, respectively, are all covered
by shift-measurability. In fact, we only need $E$ separable metric but not
necessarily complete.
This is a corollary of the next theorem, as can be seen as follows. First
note that $C_E[0,\infty)$, $D_E[0,\infty)$, and $R_E[0,\infty)$ are all internally shift-invariant
and thus it suffices to establish canonical joint measurability. Now recall
that separability of a metric space is equivalent to second countability,
which implies that every open cover has a countable subcover. Thus if $G$ is open we
can cover it by open balls whose closure lies in $G$. And since $G$ is second
countable as a subspace of $E$, we can cover it by countably many of these
balls.
Theorem 2.1. Suppose $E$ is topological, $\mathcal{E}$ is generated by the open sets,
and the paths of $Z$ are right-continuous (an element of $E$ is a limit of a
function if the function eventually stays in any neighbourhood of the
element). If every open $G\subseteq E$ is the union of countably many open sets $G_j$
whose closure $\overline{G_j}$ lies in $G$, then $Z$ is canonically jointly measurable.
PROOF. Let $H$ consist of the right-continuous elements of $E^{[0,\infty)}$. In order
to show that
$g : (z,t)\mapsto z_t$ is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable,
take $d>0$, put $L_d = \{0,d,2d,\dots\}$ and $[t]_d = \sup\{s\in L_d : s\le t\}$, and
define
$$g_d : (z,t)\mapsto z_{[t]_d+d}.$$
Note that
$(z,t)\mapsto(z,[t]_d+d)$ is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{H}\otimes\mathcal{B}(L_d)$ measurable,
$(z,t)\mapsto z_t$ is $\mathcal{H}\otimes\mathcal{B}(L_d)/\mathcal{E}$ measurable,
$g_d$ is a composition of these two mappings.
Thus $g_d$ is $\mathcal{H}\otimes\mathcal{B}[0,\infty)/\mathcal{E}$ measurable. By right-continuity, $g_d\to g$ pointwise
as $d\downarrow 0$, and thus the measurability of $g$ follows from the next lemma. □
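The discretization used in this proof can be tried out on a concrete path. The sketch below is illustrative only: the two step paths and the choice of dyadic mesh sizes are assumptions made here. It evaluates $g_d(z,t) = z_{[t]_d+d}$ with $[t]_d$ the largest grid point not exceeding $t$, and contrasts a right-continuous path (where $g_d\to g$) with a path that is only left-continuous at its jump (where the limit misses the value at the jump).

```python
import math

def grid_floor(t, d):
    """[t]_d = sup{s in L_d : s <= t} for the grid L_d = {0, d, 2d, ...}."""
    return math.floor(t / d) * d

def g_d(z, t, d):
    """g_d(z, t) = z_{[t]_d + d}: evaluate the path at the next grid point
    strictly to the right of t."""
    return z(grid_floor(t, d) + d)

def right_continuous(t):
    """Unit jump at time 1, taking the value from the right at the jump."""
    return 1.0 if t >= 1.0 else 0.0

def left_continuous(t):
    """Same jump, but taking the value from the left at the jump."""
    return 1.0 if t > 1.0 else 0.0

# Dyadic mesh sizes keep the floating-point arithmetic exact.
dyadic = [2.0 ** -k for k in range(1, 30)]
rc_values = [g_d(right_continuous, 1.0, d) for d in dyadic]
lc_values = [g_d(left_continuous, 1.0, d) for d in dyadic]

print("right-continuous path:", set(rc_values), "vs z_1 =", right_continuous(1.0))
print("left-continuous path: ", set(lc_values), "vs z_1 =", left_continuous(1.0))
```

Since $[t]_d + d$ approaches $t$ strictly from the right, the approximation recovers $z_t$ exactly when the path is right-continuous at $t$, which is where the hypothesis of Theorem 2.1 enters.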
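Lemma 2.1. Suppose $E$ is topological, $\mathcal{E}$ is generated by the open sets, $f_n$
are measurable mappings from some measurable space $(K,\mathcal{K})$ into $(E,\mathcal{E})$,
and $f_n\to f$ pointwise as $n\to\infty$. If every open $G\subseteq E$ is the union
of countably many open sets $G_j$ whose closure $\overline{G_j}$ lies in $G$, then $f$ is
measurable.
Proof. We must show that $f^{-1}(G)\in\mathcal{K}$ for open $G\subseteq E$. Now,
$$\begin{aligned}
x\in f^{-1}(G) &\iff f(x)\in G\\
&\iff \exists j : f(x)\in G_j\\
&\iff \exists j,i : f_n(x)\in G_j,\ n\ge i\\
&\iff \exists j,i : x\in f_n^{-1}(G_j),\ n\ge i\\
&\iff x\in\bigcup_{j,i}\bigcap_{n\ge i} f_n^{-1}(G_j).
\end{aligned}$$
Here the third equivalence holds from left to right because $G_j$ is open and
$f_n(x)\to f(x)$, and from right to left because the limit $f(x)$ then lies in
$\overline{G_j}\subseteq G$. The last set is in $\mathcal{K}$, and the proof is complete. □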
2.9 Killing - Birth - Shifting to Infinity
Consider a one-sided continuous time stochastic process with state space
$(E,\mathcal{E})$. (The following applies also to processes in discrete and/or two-sided
time with obvious modifications.)
In order to be able to hide the process from a random time onward
(killing) and/or prior to a random time (birth) we introduce a new state
$\Delta$ (cemetery or censoring state) external to $E$. For $0\le t\le\infty$, define the
killing maps $\kappa_t$ taking $z\in(E\cup\{\Delta\})^{[0,\infty)}$ to $\kappa_t z\in(E\cup\{\Delta\})^{[0,\infty)}$ by
$$(\kappa_t z)_s = \begin{cases} z_s & \text{if } 0\le s<t,\\ \Delta & \text{if } t\le s<\infty,\end{cases}$$
and the birth maps $\beta_t$ taking $z\in(E\cup\{\Delta\})^{[0,\infty)}$ to $\beta_t z\in(E\cup\{\Delta\})^{[0,\infty)}$
by
$$(\beta_t z)_s = \begin{cases} \Delta & \text{if } 0\le s<t,\\ z_s & \text{if } t\le s<\infty.\end{cases}$$
Note that there are no joint measurability complications with killing and
birth: the mapping
taking $(z,t)\in(E\cup\{\Delta\})^{[0,\infty)}\times[0,\infty]$ to $\kappa_t z\in(E\cup\{\Delta\})^{[0,\infty)}$
is $(\sigma(\mathcal{E}\cup\{\{\Delta\}\}))^{[0,\infty)}\otimes\mathcal{B}[0,\infty]/(\sigma(\mathcal{E}\cup\{\{\Delta\}\}))^{[0,\infty)}$ measurable, and so
is the mapping taking $(z,t)$ to $\beta_t z$.
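If the process has a path space $(H,\mathcal{H})$, it is natural to consider $\kappa_t$ and
$\beta_t$ as mappings from the set $H_\Delta$ to $H_\Delta$ where
$$H_\Delta := \bigcup_{0\le s,t\le\infty}\kappa_t\beta_s H.$$
This set is internally shift-invariant, and if the process is shift-measurable,
then so are processes with state space $(E\cup\{\Delta\},\sigma(\mathcal{E}\cup\{\{\Delta\}\}))$ and path
space $(H_\Delta,\mathcal{H}_\Delta)$, where $\mathcal{H}_\Delta$ is (of course) generated by the projection
mappings.
It can be convenient to have the shift maps also defined for $t=\infty$. Let
$\theta_\infty$ be the mapping from $H_\Delta$ to $H_\Delta$ defined by
$$(\theta_\infty z)_s = \Delta,\quad 0\le s<\infty,\ z\in H_\Delta.$$
For $t<\infty$ extend the definition of $\theta_t$ in the obvious way to $H_\Delta$:
$$\theta_t z = (z_{t+s})_{s\in[0,\infty)},\quad z\in H_\Delta.$$
If $Z$ is a one-sided stochastic process and $T$ a nonnegative random time
then $\theta_T Z$, $\beta_T Z$ and $\kappa_T Z$ denote the $H_\Delta$ valued mappings defined on $\Omega$ by
$$(\theta_T Z)(\omega) = \theta_{T(\omega)}Z(\omega),$$
$$(\beta_T Z)(\omega) = \beta_{T(\omega)}Z(\omega),$$
$$(\kappa_T Z)(\omega) = \kappa_{T(\omega)}Z(\omega),$$
for $\omega\in\Omega$.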
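The killing, birth, and shift maps are simple to realize for a single path. The following sketch is only an illustration under assumptions made here: a path is represented as a Python function of time, the cemetery state is a sentinel object called `DELTA`, and the quadratic path and the time 2.0 are arbitrary choices. It also checks the identity $\theta_t z = \theta_t\beta_t z$, which is used later for coupling times.

```python
import math

DELTA = object()  # stand-in for the cemetery state, external to E

def kill(z, t):
    """kappa_t z: equal to z on [0, t) and to DELTA on [t, infinity)."""
    return lambda s: z(s) if s < t else DELTA

def birth(z, t):
    """beta_t z: equal to DELTA on [0, t) and to z on [t, infinity)."""
    return lambda s: DELTA if s < t else z(s)

def shift(z, t):
    """theta_t z = (z_{t+s})_{s >= 0}; theta_infinity z is identically DELTA."""
    if math.isinf(t):
        return lambda s: DELTA
    return lambda s: z(t + s)

z = lambda s: s * s             # a hypothetical path with values in E = R
t = 2.0                         # one realised value of a random time T
grid = [0.0, 0.5, 1.0, 2.0, 3.5]

# Shifting by t forgets everything that the birth map beta_t hides.
same_future = all(shift(z, t)(s) == shift(birth(z, t), t)(s) for s in grid)
print("theta_t z == theta_t beta_t z on the grid:", same_future)
print("kappa_t z at times 1.0 and 2.0:", kill(z, t)(1.0), kill(z, t)(2.0) is DELTA)
```

Using a dedicated sentinel rather than, say, `None` or a number keeps the cemetery state distinguishable from every value the path itself can take.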
2.10 A Countable Product of Polish Spaces Is Polish
Existence of regular conditional distributions is the key to applying the
conditioning extension (Chapter 3, Section 4), which we use quite heavily.
According to the following result, discrete-time stochastic processes Z with
a Polish state space have also a Polish path space and thus (Chapter 3,
Fact 4.1) have regular conditional distributions.
Theorem 2.2. A countable product of Polish spaces is Polish.
Proof. Let $(E_1,\mathcal{E}_1),(E_2,\mathcal{E}_2),\dots$ be a sequence of Polish spaces. Let $d_k$
be a metric making $E_k$ complete and separable and $\mathcal{E}_k$ its Borel subsets.
Define a metric $d$ on $\prod_1^\infty E_k$ by
$$d = \sum_{k=1}^{\infty} 2^{-k}\wedge d_k.$$
In order to show that $d$ is complete let $z^n = (z^n_k)_{k=1}^\infty$, $1\le n<\infty$, be
a Cauchy sequence in $\prod_1^\infty E_k$ with respect to $d$. Then $z^n_k$, $1\le n<\infty$,
is a Cauchy sequence in $E_k$ with respect to $d_k$, for each $k$, and since $E_k$
is complete, this sequence has a limit $z_k$. Put $z = (z_k)_1^\infty$. Since for each
$k$, $d_k(z^n_k,z_k)\to 0$ as $n\to\infty$, it follows by dominated convergence that
$d(z^n,z)\to 0$ as $n\to\infty$. Thus $d$ is complete.
In order to establish separability let, for each $k$, $A_k$ be a dense countable
subset of $E_k$ with respect to $d_k$ and let $a_k$ be a fixed element of $A_k$. Then
the subset
$$A = \bigcup_{k=1}^{\infty} A_1\times\dots\times A_k\times\{a_{k+1}\}\times\{a_{k+2}\}\times\cdots$$
of $\prod_1^\infty E_k$ is countable, since the finite product $A_1\times\dots\times A_k$ is countable and since
a countable union of countable sets is countable (note that $A$ is different
from the uncountable set $\prod_1^\infty A_k$). Fix a $z = (z_k)_1^\infty$ in $\prod_1^\infty E_k$. For each
$\varepsilon>0$ there is an $n$ such that $2^{-n}<2^{-1}\varepsilon$ and for each $k\le n$ a $b_k$ in $A_k$
such that $d_k(z_k,b_k)\le 2^{-k}2^{-1}\varepsilon$. Put $b = (b_1,\dots,b_n,a_{n+1},a_{n+2},\dots)$. Then $b$ is
in $A$ and
$$d(z,b)\le 2^{-1}\varepsilon\sum_{k=1}^{n}2^{-k}+\sum_{k=n+1}^{\infty}2^{-k}\le 2^{-1}\varepsilon+2^{-n}<\varepsilon.$$
Thus the countable set $A$ is dense in $\prod_1^\infty E_k$, that is, $\prod_1^\infty E_k$ is separable.
In order to establish that $\bigotimes_1^\infty\mathcal{E}_k$ is the Borel $\sigma$-algebra $\mathcal{B}(\prod_1^\infty E_k)$, recall
that separability and second countability are equivalent for a metric space,
and thus each $E_k$ has a countable base. The sets
$$A_1\times\dots\times A_n\times E_{n+1}\times E_{n+2}\times\cdots,\quad 1\le n<\infty,$$
where the $A_k$ range over the countable base of $E_k$, form a countable base for
$\prod_1^\infty E_k$. Thus $\mathcal{B}(\prod_1^\infty E_k)$ is contained in $\bigotimes_1^\infty\mathcal{E}_k$. Conversely, for a fixed $n$
let $\mathcal{A}$ be the largest sub-$\sigma$-algebra of $\mathcal{E}_n$ making the $n$th projection mapping
$\mathcal{B}(\prod_1^\infty E_k)/\mathcal{A}$ measurable, that is,
$$\mathcal{A} = \Big\{A\in\mathcal{E}_n : E_1\times\dots\times E_{n-1}\times A\times E_{n+1}\times\cdots\in\mathcal{B}\Big(\prod_1^\infty E_k\Big)\Big\}.$$
Then $\mathcal{A}$ contains the open sets of $E_n$ and thus $\mathcal{A} = \mathcal{E}_n$. Hence, for each $n$,
the $n$th projection mapping is $\mathcal{B}(\prod_1^\infty E_k)/\mathcal{E}_n$ measurable. It follows that
$\bigotimes_1^\infty\mathcal{E}_k$ is contained in $\mathcal{B}(\prod_1^\infty E_k)$, that is, we have proved that
$\bigotimes_1^\infty\mathcal{E}_k = \mathcal{B}(\prod_1^\infty E_k)$. □
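The separability step of this proof can be illustrated numerically in the special case $E_k=\mathbb{R}$ with $d_k(x,y)=|x-y|$. Everything specific below is an assumption made for the illustration: the dense sets $A_k$ are taken to be the dyadic rationals, the fixed elements are $a_k=0$, the coordinates are approximated to within $2^{-k}\varepsilon/2$, and the infinite sum defining $d$ is truncated after 60 terms (the neglected tail is at most $2^{-60}$).

```python
import math

def product_metric(x, y, n_terms=60):
    """Truncated version of d(x, y) = sum_k min(2^-k, |x_k - y_k|), k >= 1.
    x and y map a coordinate index k to a real number."""
    return sum(min(2.0 ** -k, abs(x(k) - y(k))) for k in range(1, n_terms + 1))

def approximant(z, eps):
    """Build b = (b_1, ..., b_n, a_{n+1}, a_{n+2}, ...) with dyadic b_k close
    to z_k for k <= n and a_k = 0 beyond, where 2^-n < eps / 2."""
    n = 1
    while 2.0 ** -n >= eps / 2:
        n += 1

    def b(k):
        if k > n:
            return 0.0                              # the fixed element a_k
        tol = 2.0 ** -k * eps / 2                   # allowed error in coordinate k
        q = 2 ** math.ceil(-math.log2(tol))         # dyadic grid of mesh 1/q <= tol
        return round(z(k) * q) / q

    return b, n

z = lambda k: math.sqrt(k)      # a hypothetical point of the product space
for eps in (0.5, 0.1, 1e-3):
    b, n = approximant(z, eps)
    print(f"eps={eps}: n={n}, d(z, b)={product_metric(z, b):.3e}")
```

The computed distances stay below $\varepsilon$ because the first $n$ coordinates contribute at most $\varepsilon/2$ in total and the remaining ones at most $2^{-n}<\varepsilon/2$, mirroring the two sums in the proof.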
2.11 Weak-Sense-Regular Conditional Distributions
In the standard settings in continuous time we shall not establish the
existence of regular conditional distributions [in fact, $(C_E([0,\infty)),\mathcal{C}_E([0,\infty)))$
and $(D_E([0,\infty)),\mathcal{D}_E([0,\infty)))$ are both Polish when $(E,\mathcal{E})$ is Polish; see
Ethier and Kurtz (1986), but we shall not use this fact here]. We shall
establish a weaker result, which turns out (next subsection) to be all we
need to carry out the much-used transfer extension.
Let $Y_1$ and $Y_2$ be random elements in some measurable spaces $(E_1,\mathcal{E}_1)$
and $(E_2,\mathcal{E}_2)$, respectively, defined on the probability space $(\Omega,\mathcal{F},P)$. Say
that there exists a weak-sense-regular conditional distribution of $Y_2$ given
$Y_1$ if $(E_2,\mathcal{E}_2)$ can be embedded into a larger measurable space where
regular conditional distributions exist, that is, if there is a measurable space
$(E_3,\mathcal{E}_3)$ and a subset $G$ of $E_3$ (typically $G\notin\mathcal{E}_3$) such that
$(E_2,\mathcal{E}_2)$ and $(G,\mathcal{E}_3\cap G)$ are Borel equivalent and there exists a
regular version of the conditional distribution of $Y_3$ given $Y_1$, where $Y_3$ is the random element
in $(E_3,\mathcal{E}_3)$ defined by $Y_3 = f(Y_2)$ with $f$ the Borel equivalence.
Theorem 2.3. Let $Z$ be a one-sided continuous-time stochastic process
with a Polish state space $(E,\mathcal{E})$ and right-continuous paths. Let $Y$ be some
random element defined on the same probability space $(\Omega,\mathcal{F},P)$ as $Z$. Then
there exists a weak-sense-regular conditional distribution of $Z$ given $Y$.
PROOF. Let $\mathbb{Q}_+$ be the nonnegative rationals. In the above definition put
$Y_1 := Y$,
$Y_2 := Z$ and $(E_2,\mathcal{E}_2) := (R_E([0,\infty)),\mathcal{R}_E([0,\infty)))$,
$(E_3,\mathcal{E}_3) := (E^{\mathbb{Q}_+},\mathcal{E}^{\mathbb{Q}_+})$ and $G := \{(z_s)_{s\in\mathbb{Q}_+} : z\in R_E([0,\infty))\}$.
Let $f$ be the bijection from $R_E([0,\infty))$ to $G$ defined by
$$f(z) = (z_s)_{s\in\mathbb{Q}_+},\quad z\in R_E([0,\infty)).$$
Now $Y_3 := (Z_s)_{s\in\mathbb{Q}_+}$ is a random element in $(E^{\mathbb{Q}_+},\mathcal{E}^{\mathbb{Q}_+})$ and $(Z_s)_{s\in\mathbb{Q}_+} = f(Z)$.
By Theorem 2.2, $(E^{\mathbb{Q}_+},\mathcal{E}^{\mathbb{Q}_+})$ is Polish, and thus [Fact 4.1 in Chapter 3]
there exists a regular conditional distribution of $(Z_s)_{s\in\mathbb{Q}_+}$ given $Y$. □
2.12 Transfer Revisited
We shall now show that weak-sense-regularity (and thus Theorem 2.3)
suffices for extension purposes. Consider again the transfer extension in
Section 4.5 of Chapter 3, that is, let $Y_1$ be a random element in $(E_1,\mathcal{E}_1)$
defined on a probability space $(\Omega,\mathcal{F},P)$ and suppose we have managed to
construct on another probability space $(\Omega',\mathcal{F}',P')$ a pair $(Y_1',Y_2')$ where $Y_2'$
is a random element in some measurable space $(E_2,\mathcal{E}_2)$ and $Y_1'$ is a random
element in $(E_1,\mathcal{E}_1)$ such that
$$Y_1'\overset{D}{=}Y_1.$$
This time assume only that there exists a weak-sense-regular version of the
conditional distribution of $Y_2'$ given $Y_1'$.
Theorem 2.4. With $Y_1$ and $(Y_1',Y_2')$ as above, $(\Omega,\mathcal{F},P)$ can be extended
to support a random element $Y_2$ in $(E_2,\mathcal{E}_2)$ such that
$$(Y_1,Y_2)\overset{D}{=}(Y_1',Y_2'). \tag{2.1}$$
Proof. Let $f$ be the bijection (the Borel equivalence) from $E_2$ to the
subset $G$ of $E_3$ and let $Y_3'$ be the random element in $(E_3,\mathcal{E}_3)$ defined by
$Y_3' = f(Y_2')$. Since (by assumption) there exists a regular version of the
conditional distribution of $Y_3'$ given $Y_1'$, we can use the transfer extension
of Section 4.5 in Chapter 3 (see Remark 4.1 at the end of that subsection)
to obtain a $G$ valued random element $Y_3$ in $(E_3,\mathcal{E}_3)$ such that
$$(Y_1,Y_3)\overset{D}{=}(Y_1',Y_3'). \tag{2.2}$$
Since both $Y_3$ and $Y_3'$ are $G$ valued, they can be considered as random
elements in $(G,\mathcal{E}_3\cap G)$ rather than $(E_3,\mathcal{E}_3)$. After this modification, (2.2)
still holds and also $Y_2' = f^{-1}(Y_3')$, and finally we can define $Y_2 = f^{-1}(Y_3)$
to obtain from (2.2) that (2.1) holds, as desired. □
It is also readily checked that given $Y_1$, $Y_2$ is conditionally independent of
any random element $Y_0$ supported by $(\Omega,\mathcal{F},P)$ before the extension.
3 Exact Coupling - Distributional Exact Coupling
After these lengthy preliminaries we now return to coupling. This section
introduces exact coupling and its distributional version.
3.1 Exact Coupling — Definition
Let $Z$ and $Z'$ be one-sided discrete- or continuous-time stochastic processes
with a general state space $(E,\mathcal{E})$ and a general path space $(H,\mathcal{H})$. We shall
use notation in accordance with continuous time, but all we need to switch
to discrete time is to substitute, for instance, $t$ and $s$ by $n$ and $k$.
Recall that $(\hat Z,\hat Z')$ is a coupling of $Z$ and $Z'$ if
$$\hat Z\overset{D}{=}Z\quad\text{and}\quad\hat Z'\overset{D}{=}Z'.$$
A nonnegative random time $T$ (integer-valued in the discrete-time case) is
a coupling time (or coupling epoch) if (see Figure 3.1)
$$\theta_T\hat Z = \theta_T\hat Z'\quad\text{on }\{T<\infty\}. \tag{3.1}$$
The triple $(\hat Z,\hat Z',T)$ is an exact coupling of $Z$ and $Z'$, $P(T<\infty)$ is the
success probability, and $(\hat Z,\hat Z',T)$ is successful if $P(T<\infty) = 1$.
Using the convention that a process is absorbed in the cemetery state
when shifted to infinity, we can rewrite (3.1) simply as
$$\theta_T\hat Z = \theta_T\hat Z'.$$
Also, using the birth maps [see Section 2.9], (3.1) can clearly (if unclear,
consult the end of next subsection) be rewritten as
$$\beta_T\hat Z = \beta_T\hat Z'. \tag{3.2}$$
FIGURE 3.1. Exact coupling: the processes merge from time $T$ onward. (Annotation in the figure: note that $T+1$ is also a coupling time.)
3.2 Distributional Exact Coupling — Definition
We now weaken the requirement that $\hat Z$ and $\hat Z'$ ultimately coincide and
demand rather that $\hat Z$ behave distributionally from a time $T$ onward as
$\hat Z'$ does from a time $T'$ onward: say that $(\hat Z,\hat Z',T,T')$ is a distributional
exact coupling of $Z$ and $Z'$ if $(\hat Z,\hat Z')$ is a coupling of $Z$ and $Z'$ and (see Figure 3.2)
$$\beta_T\hat Z\overset{D}{=}\beta_{T'}\hat Z'. \tag{3.3}$$
FIGURE 3.2. Distributional exact coupling: the processes merge in distribution. (Annotation in the figure: the bold parts have the same distribution.)
Call $T$ and $T'$ distributional coupling times. If $T\equiv T'$, call $T$ a single
distributional coupling time. Note that if $(\hat Z,\hat Z',T)$ is an exact coupling, then
$(\hat Z,\hat Z',T,T)$ is a distributional exact coupling with a single distributional
coupling time $T$. We shall use the word nondistributional to distinguish
an exact coupling from a distributional one. Otherwise, we use the same
terminology in both cases.
From (3.3) it follows that
$$T\overset{D}{=}T'.$$
This can be seen as follows: $T$ is the time when $\beta_T\hat Z$ exits from $\Delta$, and $T$
is recovered in a measurable way from $\beta_T\hat Z$, since it is the pointwise limit
of
$$T_n = \sup\{k\ge 0 : (\beta_T\hat Z)_{k/n} = \Delta\}/n,$$
which are measurable mappings of $\beta_T\hat Z$; in the same measurable way $T'$ is
recovered from $\beta_{T'}\hat Z'$; thus (3.3) yields $T\overset{D}{=}T'$.
Moreover, when $Z$ and $Z'$ are discrete-time processes or continuous-time
shift-measurable, then $(\hat Z,\hat Z',T,T')$ is a distributional exact coupling of $Z$ and
$Z'$ if and only if
$$(\theta_T\hat Z,T)\overset{D}{=}(\theta_{T'}\hat Z',T'). \tag{3.4}$$
This can be seen as follows. From shift-measurability and
$$\theta_T\hat Z = \theta_T\beta_T\hat Z\quad\text{and}\quad\theta_{T'}\hat Z' = \theta_{T'}\beta_{T'}\hat Z'$$
we see that $\theta_T\hat Z$ is the same measurable mapping of $(\beta_T\hat Z,T)$ as $\theta_{T'}\hat Z'$ is of
$(\beta_{T'}\hat Z',T')$. We have just seen that $T$ is the same measurable mapping of
$\beta_T\hat Z$ as $T'$ is of $\beta_{T'}\hat Z'$. Thus $(\theta_T\hat Z,T)$ is the same measurable mapping of $\beta_T\hat Z$
as $(\theta_{T'}\hat Z',T')$ is of $\beta_{T'}\hat Z'$. Thus (3.3) implies (3.4). Conversely, for $0\le s<\infty$,
$$(\beta_T\hat Z)_s = \begin{cases}\Delta & \text{on }\{T>s\},\\ (\theta_T\hat Z)_{s-T} & \text{on }\{T\le s\},\end{cases}$$
and thus by shift-measurability $\beta_T\hat Z$ is a measurable mapping of $(\theta_T\hat Z,T)$.
Since $\beta_{T'}\hat Z'$ is the same measurable mapping of $(\theta_{T'}\hat Z',T')$, we have that
(3.4) implies (3.3).
For an example of a distributional exact coupling consider the classical
coupling. Recall that a nondistributional exact coupling is obtained by
letting two differently started versions of a Markov chain, Z and Z', run
independently until they meet, say at time T, and letting the chains run
together from time T onward. A distributional exact coupling (with a single
time T) is obtained by letting the chains continue to run independently after
meeting at time T (that is, if we allow the chains to stay independent and
do not introduce the chain $Z''$ as in Section 2.2 of Chapter 2).
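The classical coupling just recalled is easy to simulate. The sketch below is illustrative only: the three-state transition matrix, the function names, and the finite horizon are assumptions made here. It runs two differently started copies of the chain independently until they first meet at time $T$ and lets them move together afterwards, so that the paths agree from $T$ onward, as required of a nondistributional exact coupling.

```python
import random

# Hypothetical transition matrix on the state space {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]

def step(rng, state):
    """Draw the next state of the chain from row `state` of P."""
    return rng.choices(range(len(P)), weights=P[state])[0]

def classical_coupling(rng, x0, x1, horizon):
    """Run two chains independently until they meet, then together.
    Returns (Z, Z', T); T is None if they have not met by the horizon."""
    z, z_prime = [x0], [x1]
    T = 0 if x0 == x1 else None
    for n in range(1, horizon + 1):
        a = step(rng, z[-1])
        # Before meeting: independent move. After meeting: copy the move.
        b = a if T is not None else step(rng, z_prime[-1])
        z.append(a)
        z_prime.append(b)
        if T is None and a == b:
            T = n
    return z, z_prime, T

rng = random.Random(0)
z, z_prime, T = classical_coupling(rng, x0=0, x1=2, horizon=50)
print("coupling time T =", T)
if T is not None:
    print("paths agree from T onward:", z[T:] == z_prime[T:])
```

Replacing the line that copies the move by a second independent call to `step` would give the variant in which the chains keep running independently after meeting, which is the distributional exact coupling with a single time $T$ described above.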
3.3 The Hats May Be Dropped in the Distributional Case
We shall now show that if we have a distributional exact coupling of $Z$ and
$Z'$, then we can take $\hat Z$ and $\hat Z'$ to be the original processes $Z$ and $Z'$.
Theorem 3.1. Suppose $(\hat Z,\hat Z',\hat T,\hat T')$ is a distributional exact coupling of
$Z$ and $Z'$. Then the underlying probability space can be extended to support
random times $T$ and $T'$ such that
$$(Z,T)\overset{D}{=}(\hat Z,\hat T)\quad\text{and}\quad(Z',T')\overset{D}{=}(\hat Z',\hat T').$$
In particular,
$$\beta_T Z\overset{D}{=}\beta_{T'}Z'. \tag{3.5}$$
PROOF. This follows from the transfer extension in Section 4.5 of
Chapter 3. In order to obtain $T$ take $Y_1 := Z$ and $(Y_1',Y_2') := (\hat Z,\hat T)$ and define
$T := Y_2$. In order to obtain $T'$ take $Y_1 := Z'$ and $(Y_1',Y_2') := (\hat Z',\hat T')$ and
define $T' := Y_2$. □
This theorem motivates dropping the hats when discussing distributional
coupling, at least when there is no danger of confusion. We shall say that
$T$ and $T'$ are distributional coupling times for $Z$ and $Z'$ if (3.5) holds.
3.4 Turning Distributional into Nondistributional
In the standard settings a distributional exact coupling can always be
turned into a nondistributional one.
Theorem 3.2. Let $(\hat Z,\hat Z',\hat T,\hat T')$ be a distributional exact coupling of $Z$
and $Z'$. Suppose there exists a weak-sense-regular conditional distribution
of $\hat Z'$ given $\beta_{\hat T'}\hat Z'$ [this holds in discrete time when the state space is Polish
and in continuous time when the state space is Polish and the paths are
right-continuous]. Then the underlying probability space $(\Omega,\mathcal{F},P)$ can be
extended to support $T$ and $Z''$ such that
$$(Z,T)\overset{D}{=}(\hat Z,\hat T)\quad\text{and}\quad(Z'',T)\overset{D}{=}(\hat Z',\hat T')$$
and $(Z,Z'',T)$ is a nondistributional exact coupling of $Z$ and $Z'$.
PROOF. Let $T$ be as in Theorem 3.1. Obtain $Z''$ by applying the transfer
extension in Section 2.12 as follows. Take $Y_1 := \beta_T Z$ and $(Y_1',Y_2') :=
(\beta_{\hat T'}\hat Z',\kappa_{\hat T'}\hat Z')$ to obtain $Y_2$ such that
$$(\beta_T Z,Y_2)\overset{D}{=}(\beta_{\hat T'}\hat Z',\kappa_{\hat T'}\hat Z').$$
Define $Z''$ by
$$(\beta_T Z'',\kappa_T Z'') := (\beta_T Z,Y_2).$$
Since $(\beta_T Z'',\kappa_T Z'')$ is a copy of $(\beta_{\hat T'}\hat Z',\kappa_{\hat T'}\hat Z')$, it follows that $(Z'',T)$ is a
copy of $(\hat Z',\hat T')$ because $(Z'',T)$ is determined in the same measurable way
by $(\beta_T Z'',\kappa_T Z'')$ as $(\hat Z',\hat T')$ is by $(\beta_{\hat T'}\hat Z',\kappa_{\hat T'}\hat Z')$. □
4 Distributional Coupling
Distributional coupling concepts will play some role in what follows, and
before continuing with stochastic processes, we shall now devote a whole
section to the simplest of them all, the distributional version of a coupling
event. The section ends with a general comment on distributional coupling.
4.1 Distributional Coupling Events
Let $(\hat Y,\hat Y')$ be a coupling of two random elements $Y$ and $Y'$ in an arbitrary
space $(E,\mathcal{E})$. Two events $C$ and $C'$ are distributional coupling events of
$(\hat Y,\hat Y')$ if $\hat Y$ has the same distribution on $C$ as $\hat Y'$ has on $C'$, that is, if
$$P(\hat Y\in\cdot,\,C) = P(\hat Y'\in\cdot,\,C'). \tag{4.1}$$
Note that $P(C) = P(C')$. If $C = C'$, call $C$ a single distributional coupling
event. If $C$ is a coupling event [$\hat Y = \hat Y'$ on $C$], then clearly $C$ is a single
distributional coupling event. We shall use the word nondistributional to
distinguish a coupling event from a distributional one.
We shall first show that a coupling with distributional coupling events
can always be unhatted, that is, we can take $\hat Y$ and $\hat Y'$ to be the original
$Y$ and $Y'$.
Theorem 4.1. Let $(\hat Y,\hat Y')$ be a coupling of $Y$ and $Y'$ with distributional
coupling events $\hat C$ and $\hat C'$. Then the underlying probability space can be
extended to support events $C$ and $C'$ such that
$$(Y,1_C)\overset{D}{=}(\hat Y,1_{\hat C})\quad\text{and}\quad(Y',1_{C'})\overset{D}{=}(\hat Y',1_{\hat C'}).$$
In particular,
$$P(Y\in\cdot,\,C) = P(Y'\in\cdot,\,C'). \tag{4.2}$$
Proof. Apply the splitting extension in Section 5.1 of Chapter 3. Due to
(4.1), $P(\hat Y\in\cdot,\,\hat C)$ is a component of both $P(Y\in\cdot)$ and $P(Y'\in\cdot)$. Let $I$
and $I'$ be 0-1 variables such that
$$P(Y\in\cdot,\,I=1) = P(Y'\in\cdot,\,I'=1) = P(\hat Y\in\cdot,\,\hat C).$$
Take $C = \{I=1\}$ and $C' = \{I'=1\}$ to obtain the desired result. □
We now show that a coupling with distributional coupling events can always
be turned into a coupling with a nondistributional coupling event.
Theorem 4.2. Let $(\hat Y,\hat Y')$ be a coupling of $Y$ and $Y'$ with distributional
coupling events $\hat C$ and $\hat C'$. Then the underlying probability space can be
extended to support an event $C$ and a $Y''$ such that
$$(Y,1_C)\overset{D}{=}(\hat Y,1_{\hat C})\quad\text{and}\quad(Y'',1_C)\overset{D}{=}(\hat Y',1_{\hat C'})$$
and $(Y,Y'')$ is a coupling of $Y$ and $Y'$ with $C$ a nondistributional coupling
event.
Proof. Let $C$ be as in Theorem 4.1. Let $W$ be a random element in $(E,\mathcal{E})$
that is independent of $C$ and has distribution $P(\hat Y'\in\cdot\mid\hat C'^{c})$. Define $Y''$ by
$Y'' := Y$ on $C$ and $Y'' := W$ on $C^{c}$. □
4.2 The Coupling Event Inequality — Maximality
The distributional coupling event inequality is a corollary to the
nondistributional one. (It can also be established directly in exactly the same way
as in the nondistributional case.)
Theorem 4.3. Let $(\hat Y,\hat Y')$ be a coupling of $Y$ and $Y'$ with distributional
coupling events $C$ and $C'$. Then, with $\|\cdot\|$ the total variation norm,
$$\|P(Y\in\cdot)-P(Y'\in\cdot)\|\le 2P(C^{c}).\qquad\text{COUPLING EVENT INEQUALITY}$$
PROOF. This follows from Theorem 4.2 and the coupling event inequality
in the nondistributional case [Section 8.3 in Chapter 3]. □
A coupling with distributional coupling events such that the coupling event
inequality is an identity is distributionally maximal and its events are
distributionally maximal. Such a coupling always exists, since a nondistributional
maximal coupling (Section 8.3 in Chapter 3) is in particular
distributionally maximal. We now show that there exists an 'unhatted' coupling that
is distributionally maximal.
Theorem 4.4. Let $Y$ and $Y'$ be random elements in an arbitrary space
$(E,\mathcal{E})$. Then the underlying probability space can be extended to support
events $C$ and $C'$ such that (4.2) holds and
$$\|P(Y\in\cdot)-P(Y'\in\cdot)\| = 2P(C^{c}),$$
that is, $(Y,Y',C,C')$ is distributionally maximal.
Proof. This is an immediate consequence of Theorem 4.1 and the
existence of a maximal coupling in the nondistributional case [Section 8.3 in
Chapter 3]. □
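For distributions on a finite set the coupling event inequality and its maximal case can be checked by direct computation. The sketch below is an illustration under assumptions made here: the distributions `p` and `q` are arbitrary examples, the total variation norm is taken as $\sum_x|p(x)-q(x)|$ (so that it ranges over $[0,2]$, in line with the factor 2 in the inequality), and the joint law is one standard way of building a nondistributional maximal coupling, not necessarily the construction of Section 8.3 in Chapter 3.

```python
from fractions import Fraction as F

def tv_norm(p, q):
    """Total variation norm ||p - q|| = sum over x of |p(x) - q(x)|."""
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys)

def maximal_coupling(p, q):
    """Joint law of (Y, Y') with marginals p and q such that
    P(Y = Y') = sum over x of min(p(x), q(x))."""
    keys = sorted(set(p) | set(q))
    common = {x: min(p.get(x, 0), q.get(x, 0)) for x in keys}
    mass = sum(common.values())
    joint = {(x, x): c for x, c in common.items() if c > 0}
    if mass < 1:
        # Spread the leftover masses independently; they have disjoint
        # supports, so nothing is added to the diagonal.
        for x in keys:
            for y in keys:
                rest = (p.get(x, 0) - common[x]) * (q.get(y, 0) - common[y]) / (1 - mass)
                if rest > 0:
                    joint[(x, y)] = joint.get((x, y), 0) + rest
    return joint

p = {"a": F(1, 2), "b": F(1, 2)}
q = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 2)}
joint = maximal_coupling(p, q)
off_diagonal = sum(m for (x, y), m in joint.items() if x != y)  # P(C^c), C = {Y = Y'}
print("||p - q|| =", tv_norm(p, q), " 2 P(C^c) =", 2 * off_diagonal)
```

Exact rational arithmetic is used so that the identity between the norm and $2P(C^{c})$ can be compared without rounding error.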
4.3 Comment on Distributional Coupling
Up to now we have been concerned with using coupling to turn
distributional properties and relations into their pointwise counterparts. The
distributional versions of both exact coupling and coupling events are steps in
the reverse direction: they loosen up pointwise relations, turning them into
distributional relations. In fact, to obtain a distributional exact coupling
we no longer need to couple $Z$ and $Z'$, we only need the times $T$ and $T'$
[Theorem 3.1]. Similarly, to obtain distributional coupling events we need
not couple $Y$ and $Y'$, we only need the events $C$ and $C'$ [Theorem 4.1].
We shall see in the upcoming sections and chapters that this can be quite
convenient.
Distributional exact coupling assumes nothing about the joint
distribution of the pairs $(\hat Z,T)$ and $(\hat Z',T')$. They could in principle be defined on
different probability spaces: the pairs are only linked distributionally, that
is, through the distributional relation $\beta_T\hat Z\overset{D}{=}\beta_{T'}\hat Z'$ rather than through
the pointwise and therefore, necessarily, defined-on-a-common-probability-space
relation $\beta_T\hat Z = \beta_T\hat Z'$. Thus $\hat Z$ and $\hat Z'$ need not be a coupling of $Z$ and
$Z'$, that is, although $\hat Z$ and $\hat Z'$ should have the same distributions as $Z$ and
$Z'$, respectively, they need not really be defined on a common probability
space as in the formal definition of coupling.
A similar comment applies to a coupling $(\hat Y,\hat Y')$ with distributional
coupling events $C$ and $C'$: we assume nothing about the joint distribution of
$(\hat Y,C)$ and $(\hat Y',C')$.
We could formalize this as follows:
Let $Y$ and $Y'$ be random elements in an arbitrary space $(E,\mathcal{E})$ defined
on the probability spaces $(\Omega,\mathcal{F},P)$ and $(\Omega',\mathcal{F}',P')$, respectively. Call
two random elements $\hat Y$ and $\hat Y'$ defined on some probability spaces
$(\hat\Omega,\hat{\mathcal{F}},\hat P)$ and $(\hat\Omega',\hat{\mathcal{F}}',\hat P')$, respectively, a distributional coupling of $Y$
and $Y'$ if $\hat Y$ is a copy of $Y$ and $\hat Y'$ is a copy of $Y'$.
According to this definition we should write $(\hat Z,T)$ and $(\hat Z',T')$ for a
distributional exact coupling rather than $(\hat Z,\hat Z',T,T')$. In fact, writing $(\hat Z,T)$
and $(\hat Z',T')$ indicates nicely that we assume nothing about the joint
distribution of the pairs.
We shall not use this definition, however, due to the convention of the
common probability space (Chapter 3, Section 3.1). This is mentioned here
only because it may be an illuminating observation.
5 Exact Coupling - Inequality and Asymptotics
Section 3 was devoted to the definition of exact coupling and its
distributional version. We shall now go on to the limit implications.
5.1 Coupling Time Inequality
The following inequality (encountered repeatedly in Chapter 2) explains
much of the interest in exact coupling.
Theorem 5.1. Let Z and Z' be one-sided discrete- or continuous-time
stochastic processes with a general state space $(E, \mathcal{E})$ and an
arbitrary path space $(H, \mathcal{H})$. If there is an exact coupling
(nondistributional or distributional) of Z and Z' with time T, then for
$0 \le t < \infty$,
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \le 2P(T > t). \qquad \text{(COUPLING TIME INEQUALITY)}$$
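Proof. In the nondistributional case $\{T \le t\}$ is clearly a coupling
event of the coupling $(\theta_t \hat Z, \theta_t \hat Z')$ of $\theta_t Z$
and $\theta_t Z'$, and the coupling event inequality (Section 8.3 in
Chapter 3) yields the coupling time inequality.
In the distributional case we lean on the distributional version of the
coupling event inequality (Theorem 4.3). Clearly,
$$\theta_t \hat Z = \theta_t \beta_T \hat Z \text{ on } \{T \le t\} \quad\text{and}\quad \theta_t \hat Z' = \theta_t \beta_{T'} \hat Z' \text{ on } \{T' \le t\},$$
and thus we obtain from $\beta_T \hat Z \overset{D}{=} \beta_{T'} \hat Z'$ that
$$P(\theta_t \hat Z \in A, T \le t) = P(\theta_t \hat Z' \in A, T' \le t), \quad A \in \mathcal{H}.$$
In Theorem 4.3 take
$$\hat Y = \theta_t \hat Z, \quad \hat Y' = \theta_t \hat Z', \quad C = \{T \le t\}, \quad C' = \{T' \le t\},$$
to obtain the coupling time inequality. □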
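As a numerical illustration of Theorem 5.1, the following sketch checks the coupling time inequality for a two-state Markov chain. The chain, its parameters `a` and `b`, and all function names are assumptions made for this example; they do not come from the text. The sketch uses the classical coupling (run the two chains independently until they first meet, then let them move together), which is an exact coupling with T the first meeting time, and the fact that for two Markov chains with the same transition probabilities the norm on the left-hand side reduces to the distance between the time-t marginals.

```python
def step(dist, a, b):
    """One step of the two-state chain with P = [[1-a, a], [b, 1-b]]."""
    p0, p1 = dist
    return (p0 * (1 - a) + p1 * b, p0 * a + p1 * (1 - b))


def tv_norm(mu, nu):
    """Total variation norm in the text's convention (values in [0, 2])."""
    return sum(abs(x - y) for x, y in zip(mu, nu))


def coupling_time_check(a, b, horizon):
    """Return rows (t, ||law(Z_t) - law(Z'_t)||, 2 P(T > t)) for t = 0..horizon.

    Z starts in state 0 and Z' in state 1. The two chains run independently
    until they first meet (time T) and move together afterwards.
    """
    mu, nu = (1.0, 0.0), (0.0, 1.0)
    # Probability that two independent copies sitting in different states
    # are still in different states one step later (both stay or both switch).
    stay_apart = (1 - a) * (1 - b) + a * b
    tail = 1.0  # P(T > 0) = 1 because the starting states differ
    rows = []
    for t in range(horizon + 1):
        rows.append((t, tv_norm(mu, nu), 2 * tail))
        mu, nu = step(mu, a, b), step(nu, a, b)
        tail *= stay_apart
    return rows
```

In every row returned by, say, `coupling_time_check(0.3, 0.6, 20)`, the second entry should not exceed the third. The gap between them reflects that the classical coupling is in general not maximal.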
5.2 Finite T — Plain Total Variation Convergence
The coupling time inequality is of basic importance for total variation
asymptotics. We first note that if there exists a successful exact coupling
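(nondistributional or distributional) of Z and Z', then $P(T > t) \to 0$ as
$t \to \infty$, which yields
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty. \tag{5.1}$$
If Z' is stationary, that is,
$$\theta_t Z' \overset{D}{=} Z', \quad t \ge 0,$$
then (5.1) can be rewritten as
$$\theta_t Z \overset{tv}{\to} Z', \quad t \to \infty \quad \text{(asymptotic stationarity)}. \tag{5.2}$$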
5.3 Finite Moments of T — Rates of Convergence
Results on rates of convergence can be obtained from the coupling time
inequality if we know how fast P(T > t) goes to zero (examples were given
in Chapter 2, Section 4).
Let $\varphi$ be a nondecreasing function from $[0, \infty)$ to $[0, \infty)$. If
$$\varphi(t) P(T > t) \to 0, \quad t \to \infty, \tag{5.3}$$
then clearly the total variation convergence is of order $\varphi$, that is,
$$\varphi(t) \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty. \tag{5.4}$$
Common functions to consider are
$\varphi(t) = t^\alpha$ where $\alpha > 0$ (power order $\alpha$),
$\varphi(t) = \rho^t$ where $\rho > 1$ (exponential or geometric order $\rho$).
Also, logarithmic order comes to mind and mixtures of these orders. More
general classes of functions will be considered in Chapter 10 (Section 7),
where the observations in this section are applied to regenerative processes.
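Often a finite $\varphi$-moment of T, $E[\varphi(T)] < \infty$, is what we
have rather than the rate condition (5.3). Note that
$$\varphi(t) P(T > t) \le E[\varphi(T); T > t] \quad \text{for nondecreasing } \varphi$$
$$\to 0 \text{ as } t \to \infty \text{ if } E[\varphi(T)] < \infty$$
by dominated convergence (since $\varphi(T) 1_{\{T > t\}} \le \varphi(T)$ and
goes to zero pointwise as $t \to \infty$). Thus a finite $\varphi$-moment
implies (5.3), and we obtain
$$E[\varphi(T)] < \infty \implies \varphi(t) \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty, \tag{5.5}$$
for nondecreasing functions $\varphi$.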
5.4 Finite Moments of T — Moment Rates of Convergence
Now $E[\varphi(T)] < \infty$ is stronger than the rate condition (5.3) and
should yield a stronger rate result.
Consider first continuous time and suppose $\varphi(0) = 0$ and that
$\varphi$ has a density, that is, there is a nonnegative measurable function
$\phi$ such that
$$\varphi(t) = \int_0^t \phi(s)\,ds, \quad 0 \le t < \infty.$$
Clearly $\int_0^t \phi(s)\,ds = \int_0^\infty \phi(s) 1_{\{t > s\}}\,ds$ and
thus $\varphi(T) = \int_0^\infty \phi(s) 1_{\{T > s\}}\,ds$, which yields
$$E[\varphi(T)] = \int_0^\infty \phi(t) P(T > t)\,dt \quad \text{[by Fubini]}.$$
Combine this and the coupling time inequality to obtain that the total
variation convergence is of moment-order $\phi$, that is,
$$E[\varphi(T)] < \infty \implies \int_0^\infty \phi(t) \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty, \tag{5.6}$$
for nondecreasing $\varphi$ having a density $\phi$. When
$\varphi(t) = t^\alpha$ where $\alpha > 0$, then we have convergence of power
moment-order $\alpha - 1$. When $\varphi(t) = \rho^t$ where $\rho > 1$, then
we have convergence of exponential (or geometric) moment-order $\rho$.
In the discrete-time case let $\Delta\varphi$ denote the difference function
(here $\Delta$ is not the cemetery state!)
$$\Delta\varphi(0) = \varphi(0) \quad\text{and}\quad \Delta\varphi(k) = \varphi(k) - \varphi(k - 1), \quad k \ge 1.$$
Then, instead of (5.6) we clearly have
$$E[\varphi(T)] < \infty \implies \sum_{k=0}^{\infty} \Delta\varphi(k) \|P(\theta_k Z \in \cdot) - P(\theta_k Z' \in \cdot)\| < \infty. \tag{5.7}$$
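The discrete-time implication rests on summation by parts: since $\varphi(T) = \sum_{k \le T} \Delta\varphi(k)$, we get $E[\varphi(T)] = \sum_{k \ge 0} \Delta\varphi(k) P(T \ge k)$, and the coupling time inequality bounds each norm by $2P(T > k) \le 2P(T \ge k)$. The sketch below checks that identity with exact rational arithmetic. The probability mass function and the function names are hypothetical, chosen only for this illustration.

```python
from fractions import Fraction


def phi_moment(pmf, phi):
    """E[phi(T)] for a random time T with P(T = k) = pmf[k], k = 0..len(pmf)-1."""
    return sum(p * phi(k) for k, p in enumerate(pmf))


def summed_by_parts(pmf, phi):
    """sum_k dphi(k) P(T >= k), with dphi(0) = phi(0), dphi(k) = phi(k) - phi(k-1)."""
    total = Fraction(0)
    tail = Fraction(1)  # P(T >= 0)
    for k, p in enumerate(pmf):
        dphi = phi(k) if k == 0 else phi(k) - phi(k - 1)
        total += dphi * tail
        tail -= p  # now P(T >= k + 1)
    return total


# Hypothetical example: T takes the values 0, 1, 2, 3.
example_pmf = [Fraction(1, 10), Fraction(2, 10), Fraction(3, 10), Fraction(4, 10)]
```

Both functions should agree for any nondecreasing `phi`, for instance `lambda k: k * k` (power order) or `lambda k: 2 ** k` (geometric order).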
Remark 5.1. The integrand in (5.6) is measurable because
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| = 2 \sup_{A \in \mathcal{H}} |P(Z \in E^{[0,t)} \times A) - P(Z' \in E^{[0,t)} \times A)|,$$
which clearly is nonincreasing in t (and thus measurable).
5.5 Stochastically Dominated T — Uniform Convergence
Let $\mathcal{Z}$ be a class of discrete- or continuous-time stochastic
processes on a general state space. An example of such a class is the
collection of all differently started Markov chains having the same
transition probabilities.
Suppose there exists a finite random variable $\hat T$ such that for all
pairs of processes $Z, Z' \in \mathcal{Z}$ there is an exact coupling
(distributional or not) of Z and Z' with time T such that
$$T \overset{D}{\le} \hat T \quad [\text{that is, } P(T > t) \le P(\hat T > t) \text{ for } 0 \le t < \infty].$$
Then the coupling time inequality yields that for $0 \le t < \infty$,
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \le 2P(\hat T > t), \quad Z, Z' \in \mathcal{Z},$$
and we obtain uniform convergence over the class $\mathcal{Z}$:
$$\sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty. \tag{5.8}$$
Rates of convergence are obtained in the same way as in the previous
subsection under conditions on $\hat T$ rather than T: for nondecreasing
functions $\varphi$ it holds that
$$E[\varphi(\hat T)] < \infty \implies \varphi(t) \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty; \tag{5.9}$$
in the continuous-time case the stronger result
$$E[\varphi(\hat T)] < \infty \implies \int_0^\infty \phi(t) \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty \tag{5.10}$$
holds, provided that $\varphi$ has a density $\phi$, while in the
discrete-time case
$$E[\varphi(\hat T)] < \infty \implies \sum_{k=0}^{\infty} \Delta\varphi(k) \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_k Z \in \cdot) - P(\theta_k Z' \in \cdot)\| < \infty. \tag{5.11}$$
In Chapter 10 (Section 7) we make a thorough use of the results of this
section in the case of regenerative processes.
6 Exact Coupling - Maximality
We now turn to the task of reversing the implications in the previous
section: we shall show that there is always an exact coupling that is good
enough. In this section we prove this by a direct measure-theoretic
construction, but in Section 8 we shall reformulate the proof after introducing
the concept of maximal coupling with respect to a sub-σ-algebra in
Section 7.
6.1 The Maximality Theorem
Call an exact coupling (distributional or not) of Z and Z' with time T
maximal at time t if the coupling time inequality is an equality at t:
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| = 2P(T > t).$$
Call an exact coupling (distributional or not) of Z and Z' with time T
maximal if it is maximal at all times:
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| = 2P(T > t), \quad 0 \le t < \infty.$$
A maximal exact coupling brings the processes together maximally fast
(has a minimal coupling time).
We shall now show that a maximal distributional exact coupling always
exists in discrete time. In continuous time, however, the left-hand side of
the coupling time inequality need not be right-continuous in t, whereas the
right-hand side is. Thus in continuous time, equality cannot be achieved in
general and we shall content ourselves here with showing that equality can
be achieved at a sequence of times increasing to infinity.
Theorem 6.1. (a) Let Z and Z' be one-sided discrete-time stochastic
processes with a general state space $(E, \mathcal{E})$. Then there exists a
maximal distributional exact coupling $(\hat Z, \hat Z', T, T')$ of Z and Z'.
Moreover, there exists a maximal nondistributional exact coupling of Z and Z'
if there exists a weak-sense-regular conditional distribution of Z given
$\beta_T Z$ [this holds when $(E, \mathcal{E})$ is Polish].
(b) Let Z and Z' be one-sided continuous-time stochastic processes with
a general state space $(E, \mathcal{E})$ and an arbitrary path space
$(H, \mathcal{H})$. Let $t_0 < t_1 < \cdots$ be a sequence of nonnegative
real numbers increasing to infinity. Then there exists a distributional exact
coupling $(\hat Z, \hat Z', T, T')$ of Z and Z' (with
$\{t_0, t_1, \dots, \infty\}$-valued times) which is maximal at the times
$t_n$, that is,
$$\|P(\theta_{t_n} Z \in \cdot) - P(\theta_{t_n} Z' \in \cdot)\| = 2P(T > t_n), \quad 0 \le n < \infty. \tag{6.1}$$
Moreover, there exists a nondistributional exact coupling of Z and Z' with
this property if there exists a weak-sense-regular conditional distribution
of Z given $\beta_T Z$ [this holds when $(E, \mathcal{E})$ is Polish and the
paths are right-continuous].
Remark 6.1. Thus in discrete time, total variation convergence implies
the existence of a successful distributional coupling, and the same holds in
continuous time because $\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|$
is nonincreasing (see Remark 5.1).
6.2 Preparation for the Proof of Theorem 6.1
The following lemma is the key part of the proof of Theorem 6.1.
Lemma 6.1. Let $\mu$ be a bounded measure on a measurable space
$(G, \mathcal{G})$. Let $\mathcal{A}$ be a sub-σ-algebra of $\mathcal{G}$ and
$\lambda$ a component of the restriction $\mu|_{\mathcal{A}}$ of $\mu$ from
$\mathcal{G}$ to $\mathcal{A}$. Then there exists a component $\nu$ of $\mu$
such that
$$\nu|_{\mathcal{A}} = \lambda.$$
Proof. Define a nonnegative set function $\nu$ on $\mathcal{G}$ by
$$\nu(A) = \int \mu(A|\mathcal{A})\,d\lambda, \quad A \in \mathcal{G}, \quad [\text{here } \mu(A|\mathcal{A}) := (\mu/\|\mu\|)(A|\mathcal{A})].$$
For $A \in \mathcal{A}$ we have $\mu(A|\mathcal{A}) = 1_A$ $\mu$-a.e. and thus
$\nu|_{\mathcal{A}} = \lambda$. Since $\lambda \le \mu|_{\mathcal{A}}$, we
have $\nu \le \int \mu(\cdot|\mathcal{A})\,d\mu$, that is, $\nu \le \mu$.
Since the null sets of $\mu|_{\mathcal{A}}$ are null sets of $\lambda$, it
follows that $\nu$ does not depend on the version of
$\mu(\cdot|\mathcal{A})$. Thus for a given sequence of sets in $\mathcal{G}$
we can choose a version that is σ-additive for that particular sequence.
Hence $\nu$ is σ-additive, and the lemma is established. □
6.3 Proof of Theorem 6.1
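We shall prove (a) and (b) simultaneously. To simplify notation in the
continuous case we carry out the proof for $t_n = n$; the proof for general
$t_n$ is analogous (replace 0, 1, 2, ... by $t_0, t_1, t_2, \dots$
throughout). Due to Theorem 3.2 we only need to find a distributional
coupling such that (6.1) holds.
Put
$$\pi := P(Z \in \cdot) \quad\text{and}\quad \pi' := P(Z' \in \cdot).$$
If there are measures $\nu_0, \dots, \nu_\infty$ and
$\nu'_0, \dots, \nu'_\infty$ on $\mathcal{H}$ such that
$$\pi = \nu_0 + \cdots + \nu_\infty \quad\text{and}\quad \pi' = \nu'_0 + \cdots + \nu'_\infty, \tag{6.2}$$
then we can use the splitting extension (Section 5.2 in Chapter 3) to obtain
integer-valued random times T and T' such that
$$P(Z \in \cdot, T = n) = \nu_n \quad\text{and}\quad P(Z' \in \cdot, T' = n) = \nu'_n, \quad 0 \le n \le \infty.$$
Then (Z, Z', T, T') would be a distributional exact coupling of Z and Z' if
$P(\theta_n Z \in A, T = n) = P(\theta_n Z' \in A, T' = n)$,
$0 \le n < \infty$, $A \in \mathcal{H}$, which is equivalent to
$$\nu_n|_{\mathcal{T}_n} = \nu'_n|_{\mathcal{T}_n}, \quad 0 \le n < \infty, \quad\text{where } \mathcal{T}_n = \theta_n^{-1}\mathcal{H}. \tag{6.3}$$
And (Z, Z', T, T') would be maximal at integer times, that is, (6.1) would
hold, if we establish
$\|\pi|_{\mathcal{T}_n} \wedge \pi'|_{\mathcal{T}_n}\| = P(T \le n)$ (see
Section 8 in Chapter 3, in particular display (8.12), and recall that
$\wedge$ denotes greatest common component), which is equivalent to
$$(\nu_0 + \cdots + \nu_n)|_{\mathcal{T}_n} = \pi|_{\mathcal{T}_n} \wedge \pi'|_{\mathcal{T}_n}, \quad 0 \le n < \infty. \tag{6.4}$$
Thus all we have to do is find subprobability measures
$\nu_0, \dots, \nu_\infty$ and $\nu'_0, \dots, \nu'_\infty$ on $\mathcal{H}$
such that (6.2), (6.3), and (6.4) hold.
Since the measure
$\pi|_{\mathcal{T}_{n-1}} \wedge \pi'|_{\mathcal{T}_{n-1}}$ is a component of
both $\pi|_{\mathcal{T}_{n-1}}$ and $\pi'|_{\mathcal{T}_{n-1}}$ and since
$\mathcal{T}_n$ is a sub-σ-algebra of $\mathcal{T}_{n-1}$, it is clear that
the restriction of
$\pi|_{\mathcal{T}_{n-1}} \wedge \pi'|_{\mathcal{T}_{n-1}}$ to
$\mathcal{T}_n$ is a component of both $\pi|_{\mathcal{T}_n}$ and
$\pi'|_{\mathcal{T}_n}$, that is,
$(\pi|_{\mathcal{T}_{n-1}} \wedge \pi'|_{\mathcal{T}_{n-1}})|_{\mathcal{T}_n}$
is a component of $\pi|_{\mathcal{T}_n} \wedge \pi'|_{\mathcal{T}_n}$. Thus
we can define subprobability measures $\lambda_n$ on $\mathcal{T}_n$ by
$$\lambda_0 = \pi \wedge \pi',$$
$$\lambda_n = \pi|_{\mathcal{T}_n} \wedge \pi'|_{\mathcal{T}_n} - (\pi|_{\mathcal{T}_{n-1}} \wedge \pi'|_{\mathcal{T}_{n-1}})|_{\mathcal{T}_n}, \quad 1 \le n < \infty.$$
Make the induction assumption that there are subprobability measures
$\nu_0, \dots, \nu_n$ on $\mathcal{H}$ such that
$$\nu_k|_{\mathcal{T}_k} = \lambda_k, \quad 0 \le k \le n, \quad\text{and}\quad \nu_0 + \cdots + \nu_n \le \pi; \tag{6.5}$$
this certainly holds for n = 0, since $\mathcal{T}_0 = \mathcal{H}$. In
Lemma 6.1 put $\mu = \pi - (\nu_0 + \cdots + \nu_n)$ and
$\lambda = \lambda_{n+1}$ to obtain that there is a subprobability measure
$\nu_{n+1}$ on $\mathcal{H}$ such that
$\nu_{n+1}|_{\mathcal{T}_{n+1}} = \lambda_{n+1}$ and
$\nu_{n+1} \le \pi - (\nu_0 + \cdots + \nu_n)$. Thus (6.5) holds with n
replaced by n + 1.
By induction we have proved that for all $n < \infty$ there are
subprobability measures $\nu_0, \dots, \nu_n$ on $\mathcal{H}$ such that
(6.5) holds, that is, there are subprobability measures
$\nu_0, \nu_1, \dots$ on $\mathcal{H}$ such that
$$\nu_n|_{\mathcal{T}_n} = \lambda_n, \quad 0 \le n < \infty, \quad\text{and}\quad \nu_0 + \nu_1 + \cdots \le \pi.$$
In the same way we obtain subprobability measures $\nu'_0, \nu'_1, \dots$ on
$\mathcal{H}$ such that
$$\nu'_n|_{\mathcal{T}_n} = \lambda_n, \quad 0 \le n < \infty, \quad\text{and}\quad \nu'_0 + \nu'_1 + \cdots \le \pi'.$$
Thus (6.3) and (6.4) hold, and defining $\nu_\infty$ and $\nu'_\infty$ by
$$\nu_\infty = \pi - (\nu_0 + \nu_1 + \cdots) \quad\text{and}\quad \nu'_\infty = \pi' - (\nu'_0 + \nu'_1 + \cdots)$$
yields (6.2) and completes the proof of Theorem 6.1.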
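For a concrete feel for the masses $\|\lambda_n\| = P(T = n)$ in this construction, here is a minimal sketch for two finite-state Markov chains with the same transition matrix. In that special case (an assumption of the example, as are all names below) the greatest common component of the two laws restricted to $\mathcal{T}_n$ has mass $\sum_j \min(\mu_n(j), \mu'_n(j))$, where $\mu_n$ and $\mu'_n$ are the time-n marginals. The sketch only computes the distribution of the maximal coupling time; it does not build the coupled processes.

```python
def step_dist(dist, matrix):
    """Push a distribution one step forward through a transition matrix."""
    size = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(size)) for j in range(size)]


def maximal_coupling_time_pmf(mu, nu, matrix, horizon):
    """Masses ||lambda_n|| = P(T = n), n = 0..horizon, for a maximal coupling.

    mu and nu are the initial distributions of the two chains; both chains
    are assumed to use the same transition matrix.
    """
    pmf = []
    previous_overlap = 0.0
    for _ in range(horizon + 1):
        # Mass of the common component of the two laws on the post-n sigma-algebra.
        overlap = sum(min(x, y) for x, y in zip(mu, nu))
        pmf.append(overlap - previous_overlap)
        previous_overlap = overlap
        mu, nu = step_dist(mu, matrix), step_dist(nu, matrix)
    return pmf
```

For the two-state matrix `[[0.7, 0.3], [0.6, 0.4]]` started from the two point masses, the masses up to time n should sum to $1 - 0.1^n$, so that $2P(T > n)$ matches the total variation norm at time n.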
7 Coupling with Respect to a Sub-σ-Algebra
In this section we extract a concept hidden in the argument of the last
section; it will be used in Section 8 to reformulate the proof of Theorem 6.1.
We have noted earlier that if $(\hat Z, \hat Z', T)$ is an exact coupling,
then $\{T \le t\}$ is a coupling event of
$(\theta_t \hat Z, \theta_t \hat Z')$ for each finite t. On the other hand,
$\{T \le t\}$ is in general not a coupling event of $(\hat Z, \hat Z')$. We
remedy this by extending the concept of a coupling event.
7.1 Coupling Event with Respect to a Sub-σ-Algebra
Let $(\hat Y, \hat Y')$ be a coupling of two random elements Y and Y' in a
measurable space $(E, \mathcal{E})$. Let $\mathcal{A}$ be a sub-σ-algebra of
$\mathcal{E}$. Call an event C an $\mathcal{A}$-coupling event if $\hat Y$
and $\hat Y'$ are $\mathcal{A}$-identical (or $\mathcal{A}$-indistinguishable)
on C, that is, if
$$\{\hat Y \in A\} \cap C = \{\hat Y' \in A\} \cap C, \quad A \in \mathcal{A}.$$
This is equivalent to C being a coupling event for the coupling
$(1_A(\hat Y), 1_A(\hat Y'))$ of $1_A(Y)$ and $1_A(Y')$ for each
$A \in \mathcal{A}$. This in turn is equivalent to C being a coupling event
for the coupling $(f(\hat Y), f(\hat Y'))$ of f(Y) and f(Y') for all
real-valued $\mathcal{A}/\mathcal{B}$ measurable functions f.
Call two events C and C' distributional $\mathcal{A}$-coupling events if
$\hat Y$ and $\hat Y'$ have the same distribution on C and C', respectively,
when considered as random elements in $(E, \mathcal{A})$:
$$P(\hat Y \in A, C) = P(\hat Y' \in A, C'), \quad A \in \mathcal{A}. \tag{7.1}$$
If we regard the $\hat Y$'s as random elements in $(E, \mathcal{A})$ rather
than in $(E, \mathcal{E})$, then $(\hat Y, \hat Y')$ is still a coupling of Y
and Y', and C and C' are ordinary distributional coupling events of this
coupling. The converse of this does not hold: although $\hat Y$ and Y have
the same distribution when regarded as random elements in
$(E, \mathcal{A})$, they need not have the same distribution when regarded as
random elements in $(E, \mathcal{E})$.
We shall first show that a coupling with distributional
$\mathcal{A}$-coupling events can always be unhatted, that is, we can take
$\hat Y$ and $\hat Y'$ to be the original Y and Y'.
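Theorem 7.1. Let $(\hat Y, \hat Y')$ be a coupling of Y and Y' with
distributional $\mathcal{A}$-coupling events $\hat C$ and $\hat C'$. Then the
underlying probability space can be extended to support events C and C' such
that
$$(Y, 1_C) \overset{D}{=} (\hat Y, 1_{\hat C}) \quad\text{and}\quad (Y', 1_{C'}) \overset{D}{=} (\hat Y', 1_{\hat C'}).$$
In particular,
$$P(Y \in \cdot, C)|_{\mathcal{A}} = P(Y' \in \cdot, C')|_{\mathcal{A}}. \tag{7.2}$$
Proof. This follows immediately from Theorem 4.1 by regarding the Y's
as random elements in $(E, \mathcal{A})$ rather than $(E, \mathcal{E})$. □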
Say that a coupling has a single distributional $\mathcal{A}$-coupling event
if (7.1) holds with C' = C. Certainly, a nondistributional
$\mathcal{A}$-coupling event C is a single distributional
$\mathcal{A}$-coupling event. We next show that a coupling with two
distributional $\mathcal{A}$-coupling events can always be turned into a
coupling with a single distributional $\mathcal{A}$-coupling event (the
question of when such a coupling can be made nondistributional is discussed
in Remark 7.1 below).
Theorem 7.2. Let $(\hat Y, \hat Y')$ be a coupling of Y and Y' with
distributional $\mathcal{A}$-coupling events $\hat C$ and $\hat C'$. Then the
underlying probability space can be extended to support an event C and a Y''
such that
$$(Y, 1_C) \overset{D}{=} (\hat Y, 1_{\hat C}) \quad\text{and}\quad (Y'', 1_C) \overset{D}{=} (\hat Y', 1_{\hat C'})$$
and (Y, Y'') is a coupling of Y and Y' with C a single distributional
$\mathcal{A}$-coupling event.
Proof. Let C be as in Theorem 7.1. Let V and W be random elements
in $(E, \mathcal{E})$ that are independent of the event C and have the
distributions $P(\hat Y' \in \cdot \mid \hat C')$ and
$P(\hat Y' \in \cdot \mid \hat C'^c)$, respectively. Define Y'' by Y'' := V
on C and Y'' := W on $C^c$. □
7.2 $\mathcal{A}$-Coupling Event Inequality - Distributional Maximality
The $\mathcal{A}$-coupling event inequality is a corollary to the
distributional coupling event inequality.
Theorem 7.3. If C and C' are $\mathcal{A}$-coupling events (distributional
or not) of a coupling $(\hat Y, \hat Y')$ of Y and Y', then
$$\|P(Y \in \cdot)|_{\mathcal{A}} - P(Y' \in \cdot)|_{\mathcal{A}}\| \le 2P(C^c). \qquad (\mathcal{A}\text{-COUPLING EVENT INEQUALITY})$$
Proof. This follows immediately from Theorem 4.3 by regarding the Y's
as random elements in $(E, \mathcal{A})$ rather than $(E, \mathcal{E})$. □
Call a coupling $(\hat Y, \hat Y')$ maximal with respect to $\mathcal{A}$ if
there is an $\mathcal{A}$-coupling event C such that the
$\mathcal{A}$-coupling event inequality is an equality. Call
$(\hat Y, \hat Y')$ distributionally maximal with respect to $\mathcal{A}$ if
there are distributional $\mathcal{A}$-coupling events C and C' such that the
inequality is an equality. Call the (distributional) $\mathcal{A}$-coupling
event(s) C (and C') maximal if this is the case. A coupling
$(\hat Y, \hat Y')$ with distributional $\mathcal{A}$-coupling events C and
C' is maximal in this sense if and only if [recall that $\perp$ denotes
mutual singularity]
$$P(\hat Y \in \cdot, C^c)|_{\mathcal{A}} \perp P(\hat Y' \in \cdot, C'^c)|_{\mathcal{A}} \tag{7.3}$$
and if and only if
$$\|P(Y \in \cdot)|_{\mathcal{A}} \wedge P(Y' \in \cdot)|_{\mathcal{A}}\| = P(C); \tag{7.4}$$
see Section 8 in Chapter 3.
Theorem 7.4. There always exists a coupling with a single distributionally
maximal $\mathcal{A}$-coupling event.
Proof. Use the splitting extension of Section 5.1 in Chapter 3 as follows.
Regard Y as a random element in $(E, \mathcal{A})$ and take
$$\nu := P(Y \in \cdot)|_{\mathcal{A}} \wedge P(Y' \in \cdot)|_{\mathcal{A}}$$
to obtain a 0-1 variable I such that
$$P(Y \in \cdot, I = 1)|_{\mathcal{A}} = P(Y \in \cdot)|_{\mathcal{A}} \wedge P(Y' \in \cdot)|_{\mathcal{A}}.$$
In the same way we obtain a 0-1 variable I' such that
$$P(Y' \in \cdot, I' = 1)|_{\mathcal{A}} = P(Y \in \cdot)|_{\mathcal{A}} \wedge P(Y' \in \cdot)|_{\mathcal{A}}.$$
Thus C = {I = 1} and C' = {I' = 1} are maximal distributional
$\mathcal{A}$-coupling events of the (unhatted) coupling (Y, Y'). Apply
Theorem 7.2 to complete the proof. □
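To see how much restricting to a sub-σ-algebra can change the maximal coupling probability in (7.4), consider a finite space on which $\mathcal{A}$ is generated by a function g. Restricting a law to $\mathcal{A}$ then amounts to looking at the law of g(Y). The sketch below is only an illustration under that assumption; the two distributions, the parity map, and the helper names are hypothetical.

```python
from collections import defaultdict


def pushforward(pmf, g):
    """Distribution of g(Y) when Y has probability mass function pmf (a dict)."""
    image = defaultdict(float)
    for y, p in pmf.items():
        image[g(y)] += p
    return dict(image)


def common_mass(p, q):
    """Mass of the greatest common component of two finite measures."""
    return sum(min(p.get(k, 0.0), q.get(k, 0.0)) for k in set(p) | set(q))


def parity(y):
    """Generator of the sub-sigma-algebra used in this example."""
    return y % 2


law_y = {0: 0.5, 1: 0.5}        # Y is uniform on {0, 1}
law_y_prime = {2: 0.5, 3: 0.5}  # Y' is uniform on {2, 3}
```

Here `common_mass(law_y, law_y_prime)` is 0, so no ordinary coupling event can have positive probability, whereas `common_mass(pushforward(law_y, parity), pushforward(law_y_prime, parity))` is 1: with respect to the σ-algebra generated by parity, a maximal coupling event has probability one.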
Remark 7.1. Theorem 7.4 claims only the existence of a coupling that
is distributionally maximal with respect to $\mathcal{A}$. The existence of a
coupling that is nondistributionally maximal with respect to $\mathcal{A}$
would follow if we found a way to turn a coupling with distributional
$\mathcal{A}$-coupling events into a coupling with a nondistributional
$\mathcal{A}$-coupling event. When there exists a regular version of the
conditional distribution of Y' given Y' regarded as a random element in
$(E, \mathcal{A})$, then the conditioning extension can be used to turn a
coupling with distributional $\mathcal{A}$-coupling events into a coupling
with an almost sure $\mathcal{A}$-coupling event, that is, with an event C
such that for $A \in \mathcal{A}$,
$$P(\hat Y \in A, \hat Y' \in A, C) = P(\hat Y \in A, C) = P(\hat Y' \in A, C).$$
So the question is when the a.s. can be removed. According to the next
theorem this can be done, in particular, when $\mathcal{A}$ is generated by a
measurable mapping g taking values in a separable metric space equipped with
its Borel subsets.
7.3 Gluing Together on a Function Value
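Note that if E is a separable metric space and $\mathcal{E}$ its Borel
subsets, then
$$\mathcal{E} \otimes \mathcal{E} \text{ contains the diagonal } \{(y, y) : y \in E\}. \tag{7.5}$$
This can be seen as follows. By separability E has a countable dense subset
A. For each $\varepsilon > 0$ and $a \in A$, let $B_\varepsilon(a)$ be the
open $\varepsilon$-ball around a. Since $\mathcal{E}$ is generated by the
open sets, $B_\varepsilon(a) \in \mathcal{E}$, and thus
$B_\varepsilon(a) \times B_\varepsilon(a) \in \mathcal{E} \otimes \mathcal{E}$.
Let $A_\varepsilon$ denote the union of
$B_\varepsilon(a) \times B_\varepsilon(a)$ over $a \in A$. Since A is
countable, $A_\varepsilon \in \mathcal{E} \otimes \mathcal{E}$. Since A is
dense, the diagonal is contained in each $A_\varepsilon$, and thus the
diagonal is the decreasing limit of the $A_\varepsilon$, which yields the
desired result (7.5).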
We shall need the following result in the proof of Theorem 3.2 in
Chapter 7.
Theorem 7.5. Let Y and Y' be random elements in some measurable
spaces $(K, \mathcal{K})$ and $(K', \mathcal{K}')$, respectively. Let g and
g' be measurable mappings from $(K, \mathcal{K})$ and $(K', \mathcal{K}')$,
respectively, to a measurable space $(E, \mathcal{E})$. Suppose there exists
a weak-sense-regular conditional distribution of Y' given g'(Y') [holds when
$(K', \mathcal{K}')$ is Polish] and
$$\mathcal{E} \otimes \mathcal{E} \text{ contains the diagonal } \{(y, y) : y \in E\} \tag{7.6}$$
[holds if $(E, \mathcal{E})$ is separable metric and $\mathcal{E}$ its Borel
subsets]. If
$$g(Y) \overset{D}{=} g'(Y'),$$
then there exists a coupling $(\hat Y, \hat Y')$ of Y and Y' such that
$$g(\hat Y) = g'(\hat Y')$$
and $\hat Y$ can be identified with Y.
Proof. Take $\hat Y := Y$ and obtain $\hat Y'$ by applying the transfer
extension in Section 2.12 as follows. Take $Y_1 := g(Y)$ and
$(Y_1', Y_2') := (g'(Y'), Y')$ and define $\hat Y' := Y_2$ to obtain
$$(g(Y), \hat Y') \overset{D}{=} (g'(Y'), Y').$$
This implies
$$(g(Y), g'(\hat Y')) \overset{D}{=} (g'(Y'), g'(Y')). \tag{7.7}$$
Since $\mathcal{E} \otimes \mathcal{E}$ contains the diagonal
$\{(y, y) : y \in E\}$, the set $\{g(Y) = g'(\hat Y')\}$ is an event, and we
obtain from (7.7) the second equality in
$$P(g(Y) = g'(\hat Y')) = P((g(Y), g'(\hat Y')) \in \{(y, y) : y \in E\})$$
$$= P((g'(Y'), g'(Y')) \in \{(y, y) : y \in E\}).$$
Thus $P(g(Y) = g'(\hat Y')) = 1$, and the desired result follows by deleting
a null event. □
8 Exact Coupling - Another Proof of Theorem 6.1
We now return to stochastic processes and use Theorem 7.4 to rephrase
the proof of Theorem 6.1.
8.1 The Post-t σ-Algebra
Again consider continuous- or discrete-time stochastic processes Z and Z'
having a general state space $(E, \mathcal{E})$ and an arbitrary path space
$(H, \mathcal{H})$. We shall apply the above theory to the post-t σ-algebra,
the sub-σ-algebra of $\mathcal{H}$ defined by
$$\mathcal{T}_t := \theta_t^{-1}\mathcal{H} = \{E^{[0,t)} \times A : A \in \mathcal{H}\}, \quad 0 \le t < \infty.$$
Theorem 8.1. Let $(\hat Z, \hat Z')$ be a coupling of Z and Z' and let
$0 \le t < \infty$. If T is a coupling time, then $\{T \le t\}$ is a
$\mathcal{T}_t$-coupling event. If T and T' are distributional coupling
times, then $\{T \le t\}$ and $\{T' \le t\}$ are distributional
$\mathcal{T}_t$-coupling events.
Proof. Take an arbitrary $A \in \mathcal{T}_t$ and note that
$A = E^{[0,t)} \times \theta_t A$. Thus
$\{\hat Z \in A\} = \{\theta_t \hat Z \in \theta_t A\}$ and
$\{\hat Z' \in A\} = \{\theta_t \hat Z' \in \theta_t A\}$, which yields (with
T' = T in the nondistributional case)
$$\{\hat Z \in A, T \le t\} = \{\theta_t \hat Z \in \theta_t A, T \le t\} = \{\theta_t \beta_T \hat Z \in \theta_t A, T \le t\},$$
$$\{\hat Z' \in A, T' \le t\} = \{\theta_t \hat Z' \in \theta_t A, T' \le t\} = \{\theta_t \beta_{T'} \hat Z' \in \theta_t A, T' \le t\}.$$
In the nondistributional case the right-hand sides are identical, and in the
distributional case they have the same probability. □
From Theorem 8.1 and the $\mathcal{T}_t$-coupling event inequality we obtain
$$\|P(Z \in \cdot)|_{\mathcal{T}_t} - P(Z' \in \cdot)|_{\mathcal{T}_t}\| \le 2P(T > t), \quad 0 \le t < \infty. \tag{8.1}$$
This inequality also follows from the coupling time inequality and the
observation [Remark 5.1] that for $0 \le t < \infty$,
$$\|P(Z \in \cdot)|_{\mathcal{T}_t} - P(Z' \in \cdot)|_{\mathcal{T}_t}\| = \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|. \tag{8.2}$$
8.2 Another Proof of Theorem 6.1
Due to Theorem 3.2, it suffices to establish the distributional part. We
shall prove (a) and (b) simultaneously. To simplify notation in the
continuous case we carry out the proof for $t_n = n$; the proof for general
$t_n$ is analogous (replace 0, 1, 2, ... by $t_0, t_1, t_2, \dots$
throughout).
From Theorem 8.1 and (8.1) and (8.2) we see that a distributional exact
coupling is maximal at t if and only if the distributional
$\mathcal{T}_t$-coupling event $\{T \le t\}$ is maximal. This suggests using
Theorem 7.4 (the existence of a maximal coupling with respect to a
sub-σ-algebra) recursively to obtain a distributional exact coupling maximal
at the integers.
Let $(Z^{(n)}, Z'^{(n)}, C_n)$, $0 \le n < \infty$, be independent triples
with the following properties. Let
$(Z^{(0)}, Z'^{(0)})$ be a coupling of Z and Z' with a single
maximal distributional $\mathcal{T}_0$-coupling event $C_0$.
Recursively, for $0 < n < \infty$, let
$(Z^{(n)}, Z'^{(n)})$ be a coupling of processes with distributions
$P(Z^{(n-1)} \in \cdot \mid C_{n-1}^c)$ and
$P(Z'^{(n-1)} \in \cdot \mid C_{n-1}^c)$ with a single maximal
distributional $\mathcal{T}_n$-coupling event $C_n$. (8.4)
Put
$$T = \inf\{0 \le n < \infty : C_n \text{ occurs}\} \quad [\inf \emptyset := \infty]$$
and note that [due to the independence of the triples
$(Z^{(n)}, Z'^{(n)}, C_n)$]
$$P(T > n) = P(C_0^c) \cdots P(C_n^c), \quad 0 \le n < \infty,$$
and that [since $P(Z^{(n+1)} \in \cdot) = P(Z^{(n)} \in \cdot \mid C_n^c)$],
for $0 \le n < \infty$,
$$P(Z^{(n)} \in \cdot) = P(Z^{(n)} \in \cdot, C_n) + P(Z^{(n+1)} \in \cdot) P(C_n^c).$$
This yields [$P(Z^{(0)} \in \cdot) = P(Z \in \cdot)$ and
$Z^{(0)} = Z^{(T)}$ on $C_0 = \{T \le 0\}$] that the following holds for
n = 0,
$$P(Z \in \cdot) = P(Z^{(T)} \in \cdot, T \le n) + P(Z^{(n+1)} \in \cdot) P(T > n), \tag{8.5}$$
and that if it holds for n, then it holds with n replaced by n + 1, since
$$P(Z^{(n+1)} \in \cdot) P(T > n) = P(Z^{(n+1)} \in \cdot, C_{n+1}) P(T > n) + P(Z^{(n+2)} \in \cdot) P(C_{n+1}^c) P(T > n)$$
$$= P(Z^{(T)} \in \cdot, T = n + 1) + P(Z^{(n+2)} \in \cdot) P(T > n + 1),$$
where we have used the independence of $(Z^{(n+1)}, C_{n+1})$ and
$\{T > n\} = C_0^c \cap \cdots \cap C_n^c$ for the second identity. Thus by
induction (8.5) holds for all $0 \le n < \infty$. Dropping the last term and
sending n to infinity yields
$$P(Z \in \cdot) \ge P(Z^{(T)} \in \cdot, T < \infty).$$
Similarly, $P(Z' \in \cdot) \ge P(Z'^{(T)} \in \cdot, T < \infty)$. Let
$Z^{(\infty)}$ and $Z'^{(\infty)}$ be independent with arbitrary
distributions when P(T = ∞) = 0 and with the following distributions when
P(T = ∞) > 0:
$$P(Z^{(\infty)} \in \cdot) := (P(Z \in \cdot) - P(Z^{(T)} \in \cdot, T < \infty)) / P(T = \infty),$$
$$P(Z'^{(\infty)} \in \cdot) := (P(Z' \in \cdot) - P(Z'^{(T)} \in \cdot, T < \infty)) / P(T = \infty).$$
Then $(Z^{(T)}, Z'^{(T)})$ is a well-defined coupling of Z and Z'. Further,
for $0 \le n < \infty$ and $A \in \mathcal{H}$,
$$P(\theta_T Z^{(T)} \in A, T = n) = P(Z^{(n)} \in \theta_n^{-1} A, C_n) P(T \ge n) \quad [\text{independence}]$$
$$= P(Z'^{(n)} \in \theta_n^{-1} A, C_n) P(T \ge n) \quad [C_n \text{ single } \mathcal{T}_n\text{-event}]$$
$$= P(\theta_T Z'^{(T)} \in A, T = n) \quad [\text{independence}]$$
and thus T is a single distributional coupling time. Finally, due to (8.5),
$$P(Z^{(T)} \in \cdot, T > n) = P(Z^{(n+1)} \in \cdot) P(T > n)$$
and similarly
$$P(Z'^{(T)} \in \cdot, T > n) = P(Z'^{(n+1)} \in \cdot) P(T > n).$$
Since
$P(Z^{(n+1)} \in \cdot)|_{\mathcal{T}_n} \perp P(Z'^{(n+1)} \in \cdot)|_{\mathcal{T}_n}$,
this yields that
$$P(Z^{(T)} \in \cdot, T > n)|_{\mathcal{T}_n} \perp P(Z'^{(T)} \in \cdot, T > n)|_{\mathcal{T}_n},$$
that is, $\{T \le n\}$ is a maximal distributional $\mathcal{T}_n$-coupling
event. This completes our second proof of Theorem 6.1.
Comment. Lemma 6.1 is implicitly re-proved in the second sentence of
the proof of Theorem 7.4.
9 Exact Coupling - Tail σ-Algebra - Equivalences
In this section we introduce the tail σ-algebra, which is intimately linked
to exact coupling, and establish a basic set of equivalences.
9.1 The Tail σ-Algebra
The tail σ-algebra is the decreasing limit of the post-t σ-algebras
$$\mathcal{T} := \bigcap_{0 \le t < \infty} \mathcal{T}_t.$$
For an example of a set in $\mathcal{T}$ let B be some set in $\mathcal{E}$
and put
$$A_1 = \{z \in H : z_k \in B \text{ for infinitely many integers } k\}.$$
More generally, if $B_0, B_1, \dots$ is a sequence of sets in $\mathcal{E}$,
then
$$A_2 = \{z \in H : z_k \in B_k \text{ for infinitely many integers } k \ge 0\}$$
is in $\mathcal{T}$. For real-valued processes we can, for instance, take
$B_k = \{k\}$. Then in discrete time $A_2$ is the set where the space-time
diagonal is visited infinitely often.
Restrict attention to the continuous-time shift-measurable case. Then,
with $B \in \mathcal{E}$,
$$A_3 = \{z \in H : \{s \ge 0 : z_s \in B\} \text{ has infinite Lebesgue measure}\}$$
is in $\mathcal{T}$. And for real-valued processes, for instance,
$$A_4 = \{z \in H : \{s \ge 0 : z_s = s\} \text{ has infinite Lebesgue measure}\}$$
is in $\mathcal{T}$.
9.2 The Inequality
The following theorem explains what the tail σ-algebra has to do with
exact coupling. (Recall that $\mu|_{\mathcal{A}}$ denotes the restriction of
a measure $\mu$ to a sub-σ-algebra $\mathcal{A}$.)
Theorem 9.1. Let $(\hat Z, \hat Z')$ be a coupling of the discrete- or
continuous-time stochastic processes Z and Z' with a general state space
$(E, \mathcal{E})$ and a general path space $(H, \mathcal{H})$. If T is a
coupling time, then $\{T < \infty\}$ is a $\mathcal{T}$-coupling event. If T
and T' are distributional coupling times, then $\{T < \infty\}$ and
$\{T' < \infty\}$ are distributional $\mathcal{T}$-coupling events. In both
cases
$$\|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\| \le 2P(T = \infty). \tag{9.1}$$
Proof. Consider first the nondistributional case. Since $\mathcal{T}$ is
contained in each $\mathcal{T}_t$, we have by Theorem 8.1 that for
$0 \le t < \infty$,
$$\{\hat Z \in B, T \le t\} = \{\hat Z' \in B, T \le t\}, \quad B \in \mathcal{T},$$
and thus sending $t \to \infty$ renders the desired result
$$\{\hat Z \in B, T < \infty\} = \{\hat Z' \in B, T < \infty\}, \quad B \in \mathcal{T}.$$
In the distributional case the first of these identities (with T' instead of
T on the right) holds in distribution due to Theorem 8.1 and thus so does the
second. The inequality follows from Theorem 7.3. □
Note that we cannot expect a coupling with coupling time T to have
coupling event $\{T < \infty\}$, since this would imply
$\|P(Z \in \cdot) - P(Z' \in \cdot)\| \le 2P(T = \infty)$, while it is even
possible that $\|P(Z \in \cdot) - P(Z' \in \cdot)\| = 2$ and
$P(T = \infty) = 0$.
9.3 Maximally Successful Exact Coupling
Clearly, the exact coupling in Theorem 6.1 that is maximal at, for instance,
integer times is also maximally successful, that is, attains the supremum of
the success probabilities over all exact couplings. The converse is not true,
since if T is the time of an exact coupling that is maximal at integer times,
then replacing T by T + 1, for instance, yields an exact coupling that is not
maximal at integer times; however, it is still maximally successful because
P(T + 1 = ∞) = P(T = ∞).
We shall now establish that this maximally successful exact coupling in
fact yields a maximal $\mathcal{T}$-coupling event (attains identity in
(9.1)), which in particular shows that
$$\text{maximal success probability} = \|P(Z \in \cdot)|_{\mathcal{T}} \wedge P(Z' \in \cdot)|_{\mathcal{T}}\|.$$
Theorem 9.2. Let Z and Z' be one-sided discrete- or continuous-time
stochastic processes with a general state space $(E, \mathcal{E})$ and a
general path space $(H, \mathcal{H})$. The distributional exact coupling
$(\hat Z, \hat Z', T, T')$ of Z and Z' in Theorem 6.1, which is maximal at
the integers, is such that $\{T < \infty\}$ and $\{T' < \infty\}$ are maximal
distributional $\mathcal{T}$-coupling events, that is,
$$\|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\| = 2P(T = \infty). \tag{9.2}$$
Proof. Since $(\hat Z, \hat Z', T, T')$ is maximal at the integers, we
obtain from Theorem 8.1 together with (8.2) and (7.3) that
$$P(\hat Z \in \cdot, T > n)|_{\mathcal{T}_n} \perp P(\hat Z' \in \cdot, T' > n)|_{\mathcal{T}_n}.$$
Since $\{T = \infty\} \subseteq \{T > n\}$, this implies that
$P(\hat Z \in \cdot, T = \infty)|_{\mathcal{T}_n}$ and
$P(\hat Z' \in \cdot, T' = \infty)|_{\mathcal{T}_n}$ are also mutually
singular, that is,
$$\exists A_n \in \mathcal{T}_n : \quad P(\hat Z \in A_n, T = \infty) = 0 \quad\text{and}\quad P(\hat Z' \in A_n^c, T' = \infty) = 0.$$
Put
$$A = \limsup_{n \to \infty} A_n := \bigcap_{n=0}^{\infty} \bigcup_{k=n}^{\infty} A_k$$
and note that $A \in \mathcal{T}$ and that
$A^c = \liminf_{n \to \infty} A_n^c$ to obtain
$$\exists A \in \mathcal{T} : \quad P(\hat Z \in A, T = \infty) = 0 \quad\text{and}\quad P(\hat Z' \in A^c, T' = \infty) = 0,$$
that is, $P(\hat Z \in \cdot, T = \infty)|_{\mathcal{T}}$ and
$P(\hat Z' \in \cdot, T' = \infty)|_{\mathcal{T}}$ are mutually singular,
which is equivalent to (9.2). □
9.4 A Total Variation Limit Result
The following theorem explains what the tail σ-algebra has to do with total
variation convergence.
Theorem 9.3. Let Z and Z' be one-sided discrete- or continuous-time
stochastic processes with a general state space $(E, \mathcal{E})$ and a
general path space $(H, \mathcal{H})$. Then as $t \to \infty$,
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to \|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\|.$$
Proof. Let T be as in Theorem 9.2 and send $t \to \infty$ in the coupling
time inequality (Theorem 5.1) to obtain (due to (9.2)) that
$$\limsup_{t \to \infty} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \le \|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\|.$$
Since $\mathcal{T}$ is contained in each $\mathcal{T}_t$, we have
$$\|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\| \le \|P(Z \in \cdot)|_{\mathcal{T}_t} - P(Z' \in \cdot)|_{\mathcal{T}_t}\|,$$
and since the right-hand side equals
$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|$, we have
$$\|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\| \le \liminf_{t \to \infty} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|.$$
The first and last inequality yield the desired result. □
Remark 9.1. By a similar argument we can obtain the inequality (9.1)
directly:
$$\|P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}}\| \le \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \le 2P(T > t) \to 2P(T = \infty), \quad t \to \infty,$$
without the concept of coupling with respect to a σ-algebra.
9.5 Equivalences
We can now tie together exact coupling, total variation convergence, and
the tail σ-algebra as follows.
Theorem 9.4. Let Z and Z' be one-sided discrete- or continuous-time
stochastic processes with a general state space $(E, \mathcal{E})$ and a
general path space $(H, \mathcal{H})$. The following statements are
equivalent.
(a) There exists a successful distributional exact coupling of Z and Z'.
(b) $\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0$ as $t \to \infty$.
(c) $P(Z \in \cdot)|_{\mathcal{T}} = P(Z' \in \cdot)|_{\mathcal{T}}$.
Moreover, these statements are equivalent to the existence of a successful
nondistributional exact coupling if there exists a weak-sense-regular
conditional distribution of Z given $\beta_T Z$ for any random time T [this
holds in discrete time when $(E, \mathcal{E})$ is Polish and in continuous
time when $(E, \mathcal{E})$ is Polish and the paths are right-continuous].
Proof. By the coupling time inequality, (a) implies (b), see (5.1). By
Theorem 9.3, (b) implies (c). By Theorem 9.2, (c) implies (a). The final
claim of the theorem follows from Theorems 3.1 and 3.2. □
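A small numerical contrast may help fix the content of Theorem 9.4. The sketch below compares two hypothetical two-state Markov chains, each started in state 0 and in state 1: a deterministic period-2 chain and a chain that forgets its starting state in one step. It relies on the assumption, valid for Markov chains with a common transition matrix, that the norm in (b) equals the distance between the time-t marginals. The matrices and the function name are illustrative only.

```python
def tv_norms(matrix, i, j, horizon):
    """Norms ||law(Z_t) - law(Z'_t)||, t = 0..horizon, for starting states i and j."""
    size = len(matrix)
    mu = [1.0 if k == i else 0.0 for k in range(size)]
    nu = [1.0 if k == j else 0.0 for k in range(size)]
    norms = []
    for _ in range(horizon + 1):
        norms.append(sum(abs(x - y) for x, y in zip(mu, nu)))
        mu = [sum(mu[r] * matrix[r][c] for r in range(size)) for c in range(size)]
        nu = [sum(nu[r] * matrix[r][c] for r in range(size)) for c in range(size)]
    return norms


flip = [[0.0, 1.0], [1.0, 0.0]]    # deterministic period-2 chain
mixing = [[0.5, 0.5], [0.5, 0.5]]  # forgets its starting state in one step
```

For `flip` the norm stays at 2 for every t, so (b) fails and, by the theorem, no successful distributional exact coupling exists; the two laws are separated by a tail event such as "the path is in state 0 at infinitely many even times". For `mixing` the norm is 0 from time 1 on, and a successful exact coupling exists.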
Chapter 5
SHIFT-COUPLING
1 Introduction
The previous chapter dealt with coupling one-sided stochastic processes in
such a way that their paths eventually merge. This we called exact coupling
to distinguish it from the more general shift-coupling to be considered in
this chapter. Shift-coupling means that the paths eventually do not merge
'exactly' but only modulo a random time shift. In this chapter we shall also
consider an issue that arises only in continuous time: what happens when
the random time shift can be made arbitrarily small, that is, when epsilon
couplings exist.
It turns out that both shift-coupling and epsilon-couplings have a theory
paralleling that of exact coupling: they can be linked to a mode of
convergence (Cesaro and smooth total variation convergence, respectively) and to
a (7-algebra (the invariant and the smooth tail u-algebra, respectively) in
the same way as exact coupling is linked to plain total variation convergence
and to the tail cr-algebra. For both shift-coupling and epsilon-coupling we
introduce inequalities that play the same- key role as the coupling time
inequality in the exact coupling case.
In order to stress the similarities (and the dissimilarities) between these
three types of coupling and to make comparison easier, the treatment of
first shift-coupling (Sections 2 through 5) and then epsilon-couplings
(Sections 6 through 9) is organized in the same way as that of exact coupling
(Sections 3, 5, 6, and 9 in Chapter 4): the sections have analogous titles,
and the subsections and theorems are enumerated in the same way (when
possible). We start with a section defining the concept and its distributional
version, continue with a section presenting the inequality and the
resulting limit theory, then move on to a section discussing the question
of maximality, and finish with a section introducing the $\sigma$-algebra and the
basic set of equivalences between the coupling, the total variation result,
and the $\sigma$-algebra.
Throughout this chapter $U$ is a random variable that is uniform on
$[0, 1]$ and independent of the processes and of the shift-coupling (or the
epsilon-couplings). Note also that in the continuous-time case we now impose the
shift-measurability condition throughout.
2 Shift-Coupling - Distributional Shift-Coupling
This section introduces shift-coupling and its distributional version.
2.1 Shift-Coupling - Definition
Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time shift-measurable
stochastic processes with general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$;
see Section 2 in Chapter 4. We shall use notation in accordance with
continuous time, but all we need to switch to discrete time is to substitute, for
instance, $t$ by $n$ and $s$ by $k$ and to introduce the following convention: in
the discrete-time case extend the definition of the shift-maps to noninteger
times by
$$\theta_t z = \theta_{[t]} z, \qquad t \in [0, \infty), \quad z = (z_0, z_1, \dots) \in E^{\{0,1,\dots\}}.$$
A shift-coupling of $Z$ and $Z'$ is a quadruple $(\hat Z, \hat Z', T, T')$ where $(\hat Z, \hat Z')$ is a
coupling of $Z$ and $Z'$ and $T$ and $T'$ are two random times (integer-valued
in the discrete-time case) such that
$$\theta_T \hat Z = \theta_{T'} \hat Z' \ \text{ on } \{T < \infty\} \quad \text{and} \quad \{T < \infty\} = \{T' < \infty\}. \tag{2.1}$$
Using the convention that a process is absorbed in the cemetery state when
shifted to infinity we can rewrite (2.1) simply as
$$\theta_T \hat Z = \theta_{T'} \hat Z' \quad \text{(see Figure 2.1).}$$
The times $T$ and $T'$ are the shift-coupling times, $P(T < \infty)$ is the success
probability, and the shift-coupling is successful if $P(T < \infty) = 1$. When
$T < \infty$, then $T - T'$ is the shift. There is no shift if $T = T'$, and then the
shift-coupling is an exact coupling.
FIGURE 2.1. Nondistributional shift-coupling: merging modulo a time shift. (In the figure, the bold parts of the two paths are identical.)
2.2 Distributional Shift-Coupling — Definition
Say that $(\hat Z, \hat Z', \hat T, \hat T')$ is a distributional shift-coupling of $Z$ and $Z'$ if $(\hat Z, \hat Z')$
is a coupling of $Z$ and $Z'$, and $\hat T$ and $\hat T'$ are nonnegative random times such
that
$$\theta_{\hat T} \hat Z \overset{D}{=} \theta_{\hat T'} \hat Z'; \tag{2.2}$$
here we again use the convention that the shifted processes are absorbed
in the cemetery state when $\hat T$ and $\hat T'$ are infinite.
Certainly a shift-coupling is also a distributional shift-coupling. We shall
use the word nondistributional to distinguish a shift-coupling from a
distributional one. Otherwise, we use the same terminology in both cases.
For an example of a successful distributional shift-coupling consider two
independent differently started versions, Z and Z', of a countable state
space irreducible recurrent Markov chain and let T and T' be the times
when Z and Z', respectively, first hit a fixed state. Then (Z, Z',T,T') is a
distributional shift-coupling of Z and Z'. A nondistributional shift-coupling
is obtained by letting the chains continue in the same way after hitting the
state.
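The Markov chain example can be made concrete with a short simulation. The following Python sketch is an illustration added here and is not part of the original text: the three-state transition matrix, the start states, the horizon of 200 steps, and all function names are assumptions chosen for the example. It runs two independent copies of the chain from different states, takes the shift-coupling times to be the first hitting times of state 0, and then glues the post-hitting path of the first chain onto the pre-hitting path of the second, which is one way of letting the chains continue in the same way after hitting the state.

```python
import random

# Hypothetical transition matrix of an irreducible chain on {0, 1, 2}.
TRANSITIONS = [
    [0.2, 0.5, 0.3],
    [0.4, 0.1, 0.5],
    [0.3, 0.3, 0.4],
]

def run_chain(start, steps, rng):
    """Simulate the chain for `steps` transitions from `start`."""
    path = [start]
    for _ in range(steps):
        row = TRANSITIONS[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path

def hitting_time(path, state):
    """First index n with path[n] == state; raises if the horizon is too short."""
    for n, x in enumerate(path):
        if x == state:
            return n
    raise ValueError("state not hit within the simulated horizon")

rng = random.Random(0)
z = run_chain(1, 200, rng)          # plays the role of Z-hat
z_prime = run_chain(2, 200, rng)    # plays the role of Z-hat'
T = hitting_time(z, 0)
T_prime = hitting_time(z_prime, 0)

# Distributional version: both shifted paths start afresh from state 0,
# so by the strong Markov property they have the same distribution.
# Nondistributional version: reuse the path of z after time T.
z_glued = z_prime[:T_prime] + z[T:]
```

After the gluing, the path of `z` from time `T` onward and the path of `z_glued` from time `T_prime` onward coincide, which is the merging modulo a time shift required by (2.1) (up to the finite simulation horizon).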
Note that (2.2) implies nothing about $\hat T$ and $\hat T'$ except that $P(\hat T < \infty) =
P(\hat T' < \infty)$. It is an interesting observation, however, that a distributional
shift-coupling of the space-time processes $(Z_s, s)_{s \in [0,\infty)}$ and $(Z'_s, s)_{s \in [0,\infty)}$
is a distributional exact coupling of $Z$ and $Z'$, since
$$\theta_{\hat T}(\hat Z_s, s)_{s \in [0,\infty)} \overset{D}{=} \theta_{\hat T'}(\hat Z'_s, s)_{s \in [0,\infty)}$$
is equivalent to $(\theta_{\hat T}\hat Z, \hat T) \overset{D}{=} (\theta_{\hat T'}\hat Z', \hat T')$. And in the nondistributional case
a shift-coupling of the space-time processes is equivalent to $T = T'$ and
$\theta_T \hat Z = \theta_T \hat Z'$.
2.3 The Hats May Be Dropped in the Distributional Case
If we have a distributional shift-coupling of $Z$ and $Z'$, then we can take $\hat Z$
and $\hat Z'$ to be the original processes $Z$ and $Z'$.
Theorem 2.1. Suppose $(\hat Z, \hat Z', \hat T, \hat T')$ is a distributional shift-coupling of
$Z$ and $Z'$. Then the underlying probability space can be extended to support
random times $T$ and $T'$ such that
$$(Z, T) \overset{D}{=} (\hat Z, \hat T) \quad \text{and} \quad (Z', T') \overset{D}{=} (\hat Z', \hat T').$$
In particular,
$$\theta_T Z \overset{D}{=} \theta_{T'} Z'. \tag{2.3}$$
PROOF. This follows from the transfer extension in Section 4.5 of
Chapter 3. In order to obtain $T$ take $Y_1 := Z$ and $(Y_1', Y_2') := (\hat Z, \hat T)$ and define
$T := Y_2$. Similarly, in order to obtain $T'$ take $Y_1 := Z'$ and $(Y_1', Y_2') :=
(\hat Z', \hat T')$ and define $T' := Y_2$. □
This theorem motivates again dropping the hats when discussing
distributional shift-coupling, when there is no danger of confusion. Say that T and
T' are distributional shift-coupling times of Z and Z' if (2.3) holds.
2.4 Turning Distributional into Nondistributional
In the standard settings a distributional shift-coupling can always be turned
into a nondistributional one.
Theorem 2.2. Let $(\hat Z, \hat Z', \hat T, \hat T')$ be a distributional shift-coupling of $Z$ and
$Z'$. Suppose there exists a weak-sense-regular conditional distribution of
$\hat Z'$ given $\theta_{\hat T'}\hat Z'$ [this holds in discrete time when the state space is Polish
and in continuous time when the state space is Polish and the paths are
right-continuous]. Then the underlying probability space $(\Omega, \mathcal{F}, P)$ can be
extended to support $T$, $Z''$, and $T''$ such that
$$(Z, T) \overset{D}{=} (\hat Z, \hat T) \quad \text{and} \quad (Z'', T'') \overset{D}{=} (\hat Z', \hat T')$$
and $(Z, Z'', T, T'')$ is a nondistributional shift-coupling of $Z$ and $Z'$.
Proof. Let $T$ be as in Theorem 2.1. To obtain $(Z'', T'')$ use the transfer
extension in Section 2.12 of Chapter 4 as follows. Take $Y_1 := \theta_T Z$ and
$(Y_1', Y_2') := (\theta_{\hat T'}\hat Z', \kappa_{\hat T'}\hat Z')$ to obtain $Y_2$ such that [see Section 2.9 of
Chapter 4 for the definition of the killing maps $\kappa_t$]
$$(\theta_T Z, Y_2) \overset{D}{=} (\theta_{\hat T'}\hat Z', \kappa_{\hat T'}\hat Z').$$
Define $(Z'', T'')$ by
$$(\theta_{T''}Z'', \kappa_{T''}Z'') := (\theta_T Z, Y_2) \quad (\text{thus } \theta_T Z = \theta_{T''}Z'').$$
Since $(\theta_{T''}Z'', \kappa_{T''}Z'')$ is a copy of $(\theta_{\hat T'}\hat Z', \kappa_{\hat T'}\hat Z')$, it follows that $(Z'', T'')$
is a copy of $(\hat Z', \hat T')$ because $(Z'', T'')$ is determined in the same measurable
way by $(\theta_{T''}Z'', \kappa_{T''}Z'')$ as $(\hat Z', \hat T')$ is by $(\theta_{\hat T'}\hat Z', \kappa_{\hat T'}\hat Z')$. □
3 Shift-Coupling — Inequality and Asymptotics
The last section was devoted to the definition of shift-coupling and its
distributional version. We shall now go on to the limit implications.
3.1 Shift-Coupling Inequality — And Its Reformulations
Rather than shifting $Z$ to a nonrandom $t$ as in the coupling time inequality
we now shift to a point picked uniformly at random in $[0, t]$.
Theorem 3.1. Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time
shift-measurable stochastic processes with a general state space $(E, \mathcal{E})$ and
path space $(H, \mathcal{H})$. If there is a distributional shift-coupling of $Z$ and $Z'$ with
times $T$ and $T'$, then the underlying probability space can be extended to
support a copy $R$ of $T'$ such that $\{T < \infty\} = \{R < \infty\}$ and, for $0 \le t < \infty$,
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \le 2P(T \vee R > Ut), \qquad \text{(SHIFT-COUPLING INEQUALITY)}$$
where $U$ is uniform on $[0,1]$ and independent of $Z$, $Z'$, $T$, and $R$. In the
nondistributional case we can take $R := T'$ and the shift-coupling inequality
becomes
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \le 2P(T \vee T' > Ut).$$
PROOF. With $U$ independent of the shift-coupling $(Z, Z', T, T')$, note that
the remainder when $T + Ut$ is divided by $t$,
$$(T + Ut) \bmod t := (T/t + U - [T/t + U])\,t,$$
is uniform on $[0,t]$ and independent of $Z$. Therefore, $\theta_{(T+Ut) \bmod t} Z$ is a
copy of $\theta_{Ut} Z$. Similarly, $\theta_{(T'+Ut) \bmod t} Z'$ is a copy of $\theta_{Ut} Z'$. Thus
$$(\theta_{(T+Ut) \bmod t} Z,\ \theta_{(T'+Ut) \bmod t} Z') \tag{3.1}$$
is a coupling of $\theta_{Ut} Z$ and $\theta_{Ut} Z'$.
In the nondistributional case $\theta_T Z = \theta_{T'} Z'$ yields the second identity in
$$\theta_{(T+Ut) \bmod t} Z = \theta_{Ut}\theta_T Z = \theta_{Ut}\theta_{T'} Z' = \theta_{(T'+Ut) \bmod t} Z' \quad \text{on } \{Ut \le t - T \vee T'\},$$
while the other two follow from the fact that 'mod $t$' can be removed on
both sides when $Ut \le t - T \vee T'$. Thus $\{Ut \le t - T \vee T'\}$ is a coupling
event of the coupling at (3.1), and the coupling event inequality [see (8.16)
in Chapter 3] yields the desired result, since
$$P(Ut > t - T \vee T') = P(T \vee T' > (1 - U)t) = P(T \vee T' > Ut).$$
In the distributional case, $\theta_T Z \overset{D}{=} \theta_{T'} Z'$ allows us to apply the conditioning
extension in Section 4.5 of Chapter 3 as follows. First take $(Y_0, Y_1) :=
((Z,T), \theta_T Z)$ and $(Y_1', Y_2') := (\theta_{T'} Z', T')$ and define $R := Y_2$. Then take
$(Y_0, Y_1) := ((Z',T'), \theta_{T'} Z')$ and $(Y_1', Y_2') := (\theta_T Z, T)$ and define $R' := Y_2$.
This yields
$$(\theta_T Z, T, R) \overset{D}{=} (\theta_{T'} Z', R', T'),$$
which in turn yields the second identity in
$$P(\theta_{(T+Ut) \bmod t} Z \in \cdot,\ Ut \le t - T \vee R)$$
$$= P(\theta_{Ut}\theta_T Z \in \cdot,\ Ut \le t - T \vee R)$$
$$= P(\theta_{Ut}\theta_{T'} Z' \in \cdot,\ Ut \le t - T' \vee R')$$
$$= P(\theta_{(T'+Ut) \bmod t} Z' \in \cdot,\ Ut \le t - T' \vee R').$$
Thus $\{Ut \le t - T \vee R\}$ and $\{Ut \le t - T' \vee R'\}$ are distributional
coupling events of the coupling at (3.1), and the distributional coupling event
inequality yields the shift-coupling inequality. □
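As a small sanity check on the first step of the proof (a sketch added for illustration, not part of the original argument), observe that for a fixed value of $T$ the map $u \mapsto (T + ut) \bmod t$ merely rotates $[0,t)$, which is why $(T + Ut) \bmod t$ is again uniform whatever the value of $T$. The Python snippet below checks this exactly, in rational arithmetic, on an evenly spaced grid of $u$-values; the function name `shift_mod` and the numerical values are assumptions made for the example.

```python
from fractions import Fraction
from math import floor

def shift_mod(T, u, t):
    """Return (T + u*t) mod t, computed as (T/t + u - [T/t + u]) * t."""
    x = Fraction(T) / Fraction(t) + Fraction(u)
    return (x - floor(x)) * Fraction(t)

# Hypothetical values chosen only for illustration.
t = Fraction(5)
T = Fraction(17, 3)
n = 8
# Midpoints of n equal cells of [0, 1): a discrete stand-in for the uniform U.
grid = [Fraction(2 * i + 1, 2 * n) for i in range(n)]

images = sorted(shift_mod(T, u, t) for u in grid)
gaps = [b - a for a, b in zip(images, images[1:])]
```

The sorted images are again evenly spaced in $[0,t)$ with spacing $t/n$, so the rotation leaves the uniform grid (and, in the limit, the uniform distribution) unchanged.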
Reformulations. The left-hand side (l.h.s.) of the shift-coupling
inequality can clearly be rewritten in the following Cesaro (time-average) form in
the continuous-time case,
$$\text{l.h.s.} = \Big\| \frac{1}{t}\int_0^t P(\theta_s Z \in \cdot)\,ds - \frac{1}{t}\int_0^t P(\theta_s Z' \in \cdot)\,ds \Big\|, \tag{3.2}$$
and in the discrete-time case (recalling that then $t = n$ and $\theta_{Un} Z = \theta_{[Un]} Z$),
$$\text{l.h.s.} = \Big\| \frac{1}{n}\sum_{k=0}^{n-1} P(\theta_k Z \in \cdot) - \frac{1}{n}\sum_{k=0}^{n-1} P(\theta_k Z' \in \cdot) \Big\|. \tag{3.3}$$
The right-hand side (r.h.s.) can be rewritten in several ways:
$$\text{r.h.s.} = 2P\Big(\frac{T \vee R}{U} > t\Big) = \frac{2}{t}E[(T \vee R) \wedge t] = 2E\Big[\frac{T \vee R}{t} \wedge 1\Big]; \tag{3.4}$$
here the first and last equalities are obvious, and the one in the middle
follows from
$$tP(T \vee R > Ut) = E\Big[\int_0^t 1_{\{T \vee R > s\}}\,ds\Big] = E\Big[\int_0^\infty 1_{\{(T \vee R) \wedge t > s\}}\,ds\Big] = E[(T \vee R) \wedge t].$$
Since $(T \vee R) \wedge t \le T + R$, we have in particular (since $E[R] = E[T']$)
$$\Big\|\int_0^t P(\theta_s Z \in \cdot)\,ds - \int_0^t P(\theta_s Z' \in \cdot)\,ds\Big\| \le 2(E[T] + E[T']). \tag{3.5}$$
The analogy between the coupling time inequality and the shift-coupling
inequality is stressed in a different way by the following reformulation of
the latter: for $0 \le t < \infty$,
$$\Big\|\int_0^t P(\theta_s Z \in \cdot)\,ds - \int_0^t P(\theta_s Z' \in \cdot)\,ds\Big\| \le 2\int_0^t P(T \vee R > s)\,ds.$$
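The middle identity behind (3.4), namely $tP(T \vee R > Ut) = E[(T \vee R) \wedge t]$, can be checked numerically. The sketch below is an added illustration under stated assumptions: $M$ stands for $T \vee R$ and is given a small hypothetical finite distribution, the function names are invented for the example, and the probability on the left is approximated by a midpoint Riemann sum over the values of $U$.

```python
def prob_form(dist, t, grid=10000):
    """Approximate P(M > U*t) for U uniform on [0, 1] by a midpoint Riemann sum.

    `dist` maps each value m of M to its probability.
    """
    total = 0.0
    for i in range(grid):
        u = (i + 0.5) / grid
        total += sum(p for m, p in dist.items() if m > u * t)
    return total / grid

def expectation_form(dist, t):
    """Return E[min(M, t)] / t exactly from the finite distribution."""
    return sum(p * min(m, t) for m, p in dist.items()) / t

# Hypothetical distribution of M = max(T, R), chosen only for illustration.
dist = {1: 0.5, 3: 0.3, 10: 0.2}
t = 4.0
lhs = prob_form(dist, t)
rhs = expectation_form(dist, t)
```

For this choice both quantities equal $0.55$ up to floating-point error, in line with the identity: conditionally on $M = m$, the event $\{M > Ut\}$ has probability $(m/t) \wedge 1$.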
3.2 Finite T — Cesaro Total Variation Convergence
In the same way as the coupling time inequality is basic for plain total
variation asymptotics, the shift-coupling inequality is basic for Cesaro (or
time-average) total variation asymptotics.
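If there exists a successful shift-coupling, then clearly
$$P\Big(\frac{T \vee R}{U} > t\Big) \to 0, \qquad t \to \infty,$$
and thus
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0, \qquad t \to \infty. \tag{3.6}$$
In particular, if $Z'$ is stationary, then $\theta_{Ut} Z'$ has the same distribution as
$Z'$, and (3.6) can be rewritten as
$$\theta_{Ut} Z \overset{tv}{\to} Z', \qquad t \to \infty \quad \text{(Cesaro asymptotic stationarity).}$$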
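3.3 Finite Moments of T and T' - Rates of Convergence
If $\alpha \in (0,1)$ and both $E[T^\alpha]$ and $E[T'^\alpha]$ are finite, then, since $U$ and $(T, R)$
are independent and $(T \vee R)^\alpha = T^\alpha \vee R^\alpha \le T^\alpha + R^\alpha$,
$$E\Big[\Big(\frac{T \vee R}{U}\Big)^\alpha\Big] = E[U^{-\alpha}]\,E[(T \vee R)^\alpha] \le E[U^{-\alpha}]\,E[T^\alpha + R^\alpha],$$
which yields, since $E[U^{-\alpha}] < \infty$ for $\alpha < 1$ and $E[R^\alpha] = E[T'^\alpha]$, that
$$E\Big[\Big(\frac{T \vee R}{U}\Big)^\alpha\Big] < \infty.$$
Thus (see Section 5.3 in Chapter 4) $t^\alpha P\big(\frac{T \vee R}{U} > t\big) \to 0$ as $t \to \infty$, and the
shift-coupling inequality yields convergence of power order $\alpha$:
$$0 < \alpha < 1 \text{ and } E[T^\alpha] \text{ and } E[T'^\alpha] < \infty \implies t^\alpha\,\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0, \quad t \to \infty. \tag{3.7}$$
From the inequality we cannot obtain rates of order $\alpha = 1$ or higher. What
can be deduced from (3.5), however, is the following boundedness result:
$$E[T] \text{ and } E[T'] < \infty \implies \sup_{0 \le t < \infty}\Big\|\int_0^t P(\theta_s Z \in \cdot)\,ds - \int_0^t P(\theta_s Z' \in \cdot)\,ds\Big\| < \infty.$$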
3.4 Finite Moments of T and T' - Moment Rates of Convergence
If $\alpha \in (0,1)$ and both $E[T^\alpha]$ and $E[T'^\alpha]$ are finite, then we have just shown
that $E\big[\big(\frac{T \vee R}{U}\big)^\alpha\big] < \infty$, which yields a stronger result than (3.7), namely a
rate of convergence of power moment-order $\alpha - 1$:
$$0 < \alpha < 1 \text{ and } E[T^\alpha] \text{ and } E[T'^\alpha] < \infty \implies \int_0^\infty t^{\alpha-1}\,\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\|\,dt < \infty;$$
see Section 5.4 in Chapter 4. (In discrete time the integral is replaced by a
sum.)
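3.5 Stochastically Dominated T and T' - Uniform Convergence
Let $\mathcal{Z}$ be a class of discrete- or continuous-time shift-measurable
stochastic processes on a general state space. Suppose there exist finite random
variables $\hat T$ and $\hat T'$ such that for all pairs of processes $Z, Z' \in \mathcal{Z}$ there is a
shift-coupling (distributional or not) of $Z$ and $Z'$ with times $T$ and $T'$ such
that
$$T \overset{D}{\le} \hat T \quad \text{and} \quad T' \overset{D}{\le} \hat T'.$$
Since $P(T \vee R > Ut) \le P(T > Ut) + P(R > Ut)$ and since $P(R > Ut) =
P(T' > Ut)$, we have $P(T \vee R > Ut) \le P(\hat T > Ut) + P(\hat T' > Ut)$. Thus
the shift-coupling inequality yields that for $0 \le t < \infty$ and $Z, Z' \in \mathcal{Z}$,
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \le 2P(\hat T > Ut) + 2P(\hat T' > Ut),$$
and we obtain uniform convergence over the class $\mathcal{Z}$:
$$\sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0, \qquad t \to \infty.$$
Rates of convergence are obtained in the same way as in the previous
subsection under conditions on $\hat T$ and $\hat T'$ rather than on $T$ and $T'$:
$$0 < \alpha < 1 \text{ and } E[\hat T^\alpha] \text{ and } E[\hat T'^\alpha] < \infty \implies t^\alpha \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0, \quad t \to \infty,$$
and in the continuous-time case we have the stronger result
$$0 < \alpha < 1 \text{ and } E[\hat T^\alpha] \text{ and } E[\hat T'^\alpha] < \infty \implies \int_0^\infty t^{\alpha-1} \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\|\,dt < \infty,$$
while in the discrete-time case
$$0 < \alpha < 1 \text{ and } E[\hat T^\alpha] \text{ and } E[\hat T'^\alpha] < \infty \implies \sum_{k=1}^\infty k^{\alpha-1} \sup_{Z, Z' \in \mathcal{Z}} \|P(\theta_{[Uk]} Z \in \cdot) - P(\theta_{[Uk]} Z' \in \cdot)\| < \infty.$$
Also, due to (3.5) and $E[T] \le E[\hat T]$ and $E[T'] \le E[\hat T']$,
$$E[\hat T] \text{ and } E[\hat T'] < \infty \implies \sup_{\substack{0 \le t < \infty \\ Z, Z' \in \mathcal{Z}}} \Big\|\int_0^t P(\theta_s Z \in \cdot)\,ds - \int_0^t P(\theta_s Z' \in \cdot)\,ds\Big\| < \infty.$$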
4 Shift-Coupling - Maximality
In this section we shall not establish a shift-coupling analogue of maximal
exact coupling. We only establish a result that enables us to show in the
next section that there is a shift-coupling that is both maximally successful
and also successful when the Cesaro total variation convergence (3.6) holds.
But we will not be able to reverse the rate results in the previous section.
The maximality question is further discussed in Section 4.5 below.
4.1 The Maximality Theorem
The right-hand side of the shift-coupling inequality is nonincreasing in $t$, but
the left-hand side need not be. For a simple counterexample let $Z$ be a
nonrandom periodic function with period $d > 1$ and put $Z' = \theta_1 Z$; then
$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| = 0$ or $> 0$ according as $t/d$ is an integer
or not. Thus defining maximal shift-coupling by demanding identity at all
times, even at integer times, is not without complications, and we shall not
proceed further along that path.
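The periodic counterexample can be worked out explicitly. The sketch below is an added illustration, not part of the original text, and rests on two assumptions: that $d$ is the minimal period of $Z$, so that the shifted path $\theta_s Z$ is determined by (and determines) the phase $s \bmod d$, and that the norm is the total variation norm equal to the $L^1$ distance between densities. Under these assumptions the left-hand side of the inequality is the $L^1$ distance between the laws of $Ut \bmod d$ and $(Ut + 1) \bmod d$. The function name is invented for the example.

```python
from fractions import Fraction

def phase_gap(t, d, shift=1):
    """L1 distance between the laws of (U*t) mod d and (U*t + shift) mod d.

    With t = q*d + r, the density of (U*t) mod d on [0, d) is (q+1)/t on the
    arc [0, r) and q/t elsewhere; shifting moves that arc to [shift, shift + r).
    """
    t, d = Fraction(t), Fraction(d)
    s = Fraction(shift) % d
    q, r = divmod(t, d)                          # r is the leftover arc length
    overlap = max(Fraction(0), r - s) + max(Fraction(0), r + s - d)
    return 2 * (r - overlap) / t                 # densities differ by 1/t off the overlap

gap_at_multiple = phase_gap(6, 3)                # t/d is an integer
gap_between = phase_gap(7, 3)                    # t/d is not an integer
```

With $d = 3$ the distance is $0$ at $t = 6$ and $2/7$ at $t = 7$, so the left-hand side is indeed not monotone in $t$.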
Recall, however, that for exact coupling, maximality is equivalent to
$P(\theta_t \hat Z \in \cdot, T > t)$ and $P(\theta_t \hat Z' \in \cdot, T' > t)$ being mutually singular for all $t$.
The following shift-coupling analogue of this reformulation is considered by
Greven (1987): in discrete time there exists a distributional shift-coupling
$(\hat Z, \hat Z', T, T')$ such that
$$\sum_{n=0}^\infty P(\theta_n \hat Z \in \cdot, T > n) \perp \sum_{n=0}^\infty P(\theta_n \hat Z' \in \cdot, T' > n). \tag{4.1}$$
This is a strong property, which tells us that for all times $n$ and $n'$ the
processes $\theta_n \hat Z$ and $\theta_{n'} \hat Z'$ stay in separate parts of the path space prior to
merging.
Here we shall content ourselves with the following weaker maximality
property, which says only that for all times $t$ and $t'$ the processes $\theta_t \hat Z$ and
$\theta_{t'} \hat Z'$ stay in separate parts of the path space if they do not merge at all.
This property, however, is all we need for the next section, and the result
has the merit of not being restricted to discrete time.
Theorem 4.1. Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time
shift-measurable stochastic processes with a general state space $(E, \mathcal{E})$
and path space $(H, \mathcal{H})$. Then there exists a distributional shift-coupling
$(\hat Z, \hat Z', T, T')$ of $Z$ and $Z'$ such that
$$\int_0^\infty P(\theta_t \hat Z \in \cdot, T = \infty)\,dt \perp \int_0^\infty P(\theta_t \hat Z' \in \cdot, T' = \infty)\,dt. \tag{4.2}$$
Moreover, there exists a nondistributional shift-coupling of $Z$ and $Z'$ with
this property if there exists a weak-sense-regular conditional distribution
of $Z$ given $\theta_T Z$ [this holds in discrete time when $(E, \mathcal{E})$ is Polish and in
continuous time when $(E, \mathcal{E})$ is Polish and the paths are right-continuous].
We prove this result in the next three subsections.
4.2 First Part of Proof — Construction of a Candidate
Let $V_1, V_2, \dots$ be i.i.d. exponentially distributed random variables that are
independent of the sequence of independent quadruples $(Z^{(k)}, Z'^{(k)}, C_k, C'_k)$,
$1 \le k < \infty$, which have the following properties. Let
$(Z^{(1)}, Z'^{(1)})$ be a coupling of $Z$ and $Z'$ and let $C_1$ and $C'_1$ be
maximal distributional coupling events of $(\theta_{V_1} Z^{(1)}, \theta_{V_1} Z'^{(1)})$.
This is possible because we can first let $(Z^{(1)}, Z'^{(1)})$ be a coupling of $Z$
and $Z'$ and then use Theorem 4.4 in Chapter 4 [with $Y := \theta_{V_1} Z^{(1)}$ and
$Y' := \theta_{V_1} Z'^{(1)}$] to obtain $C_1$ and $C'_1$. In the same way we can recursively,
for $1 < k < \infty$, let
$(Z^{(k)}, Z'^{(k)})$ be a coupling of processes with distributions
$P(Z^{(k-1)} \in \cdot \mid C^c_{k-1})$ and $P(Z'^{(k-1)} \in \cdot \mid C'^c_{k-1})$ and let $C_k$ and $C'_k$
be maximal distributional coupling events of $(\theta_{V_k} Z^{(k)}, \theta_{V_k} Z'^{(k)})$.
Put
$$K = \inf\{1 \le k < \infty : C_k \text{ occurs}\} \qquad [\inf \emptyset := \infty]$$
and note that [by the independence of the quadruples $(Z^{(k)}, Z'^{(k)}, C_k, C'_k)$]
$$P(K > k) = P(C^c_1) \cdots P(C^c_k), \qquad 1 \le k < \infty,$$
and that [since $P(Z^{(k+1)} \in \cdot) = P(Z^{(k)} \in \cdot \mid C^c_k)$]
$$P(Z^{(k)} \in \cdot) = P(Z^{(k)} \in \cdot, C_k) + P(Z^{(k+1)} \in \cdot)P(C^c_k), \qquad 1 \le k < \infty.$$
This yields [$P(Z^{(1)} \in \cdot) = P(Z \in \cdot)$ and $Z^{(1)} = Z^{(K)}$ on $C_1 = \{K \le 1\}$]
that the following holds for $k = 1$:
$$P(Z \in \cdot) = P(Z^{(K)} \in \cdot, K \le k) + P(Z^{(k+1)} \in \cdot)P(K > k), \tag{4.3}$$
and that if it holds for some $k$, then it holds with $k$ replaced by $k + 1$, since
$$P(Z^{(k+1)} \in \cdot)P(K > k)$$
$$= P(Z^{(k+1)} \in \cdot, C_{k+1})P(K > k) + P(Z^{(k+2)} \in \cdot)P(C^c_{k+1})P(K > k)$$
$$= P(Z^{(K)} \in \cdot, K = k + 1) + P(Z^{(k+2)} \in \cdot)P(K > k + 1),$$
where we have used the independence of $(Z^{(k+1)}, C_{k+1})$ and $\{K > k\} =
C^c_1 \cap \cdots \cap C^c_k$ for the second identity. Thus by induction (4.3) holds for all
$1 \le k < \infty$. Drop the last term and send $k \to \infty$ to obtain
$$P(Z \in \cdot) \ge P(Z^{(K)} \in \cdot, K < \infty).$$
Similarly, with
$$K' = \inf\{1 \le k < \infty : C'_k \text{ occurs}\}$$
we obtain
$$P(Z' \in \cdot) \ge P(Z'^{(K')} \in \cdot, K' < \infty).$$
Put $V_\infty = \infty$. Let $Z^{(\infty)}$ and $Z'^{(\infty)}$ be independent of $(V_k, Z^{(k)}, Z'^{(k)}, C_k, C'_k)$,
$1 \le k < \infty$, with arbitrary distributions when $P(K < \infty) = 1$ and with
the following distributions when $P(K < \infty) < 1$:
$$P(Z^{(\infty)} \in \cdot) = \frac{P(Z \in \cdot) - P(Z^{(K)} \in \cdot, K < \infty)}{P(K = \infty)},$$
$$P(Z'^{(\infty)} \in \cdot) = \frac{P(Z' \in \cdot) - P(Z'^{(K')} \in \cdot, K' < \infty)}{P(K' = \infty)}. \tag{4.4}$$
We shall show that
$$(\hat Z, \hat Z', T, T') := (Z^{(K)}, Z'^{(K')}, V_K, V_{K'}) \qquad \text{[the candidate]}$$
is a distributional shift-coupling satisfying (4.2).
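The index $K$ in this construction is simply the first success in a sequence of independent trials with success probabilities $P(C_1), P(C_2), \dots$. As an added illustration (the probabilities below are hypothetical and the function name is invented for the example), the following Python sketch computes $P(K = k)$ and $P(K > n)$ exactly from the product formula above.

```python
from fractions import Fraction

def first_success_law(success_probs):
    """Return ([P(K = 1), ..., P(K = n)], P(K > n)) for independent events.

    `success_probs` lists P(C_1), ..., P(C_n).
    """
    survive = Fraction(1)            # P(K > k) so far, i.e. P(K >= k + 1)
    pmf = []
    for p in success_probs:
        p = Fraction(p)
        pmf.append(survive * p)      # P(K = k) = P(C_k) P(K >= k)
        survive *= 1 - p             # P(K > k) = P(C_1^c) ... P(C_k^c)
    return pmf, survive

# Hypothetical success probabilities, chosen only for illustration.
pmf, tail = first_success_law([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
```

Here `pmf` is `[1/2, 1/6, 1/12]` and `tail` is `1/4`; the four numbers add up to one, as they must. In the construction the success probability is $\lim_n (1 - P(K > n))$.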
4.3 Middle Part of Proof - The Candidate Is a Shift-Coupling
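Since $K$ is independent of $Z^{(\infty)}$ and $K'$ of $Z'^{(\infty)}$, it follows from (4.4) that
$(\hat Z, \hat Z')$ is a coupling of $Z$ and $Z'$. For $1 \le k < \infty$, we have
$$P(\theta_T \hat Z \in \cdot, K = k) = P(\theta_{V_k} Z^{(k)} \in \cdot, C_k)P(K \ge k),$$
$$P(\theta_{T'} \hat Z' \in \cdot, K' = k) = P(\theta_{V_k} Z'^{(k)} \in \cdot, C'_k)P(K' \ge k).$$
Since $C_k$ and $C'_k$ are distributional coupling events of $(\theta_{V_k} Z^{(k)}, \theta_{V_k} Z'^{(k)})$,
and since $P(K \ge k) = P(K' \ge k)$, the right-hand sides are identical,
and summing over $1 \le k < \infty$ yields [since $\{T < \infty\} = \{K < \infty\}$ and
$\{T' < \infty\} = \{K' < \infty\}$]
$$P(\theta_T \hat Z \in \cdot, T < \infty) = P(\theta_{T'} \hat Z' \in \cdot, T' < \infty).$$
Thus $(\hat Z, \hat Z', T, T')$ is a distributional shift-coupling of $Z$ and $Z'$.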
4.4 Final Part of Proof - The Candidate Satisfies (4.2)
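The mutual singularity of $P(\theta_{V_k} Z^{(k)} \in \cdot, C^c_k)$ and $P(\theta_{V_k} Z'^{(k)} \in \cdot, C'^c_k)$
means that there is an $A_k \in \mathcal{H}$ such that
$$P(\theta_{V_k} Z^{(k)} \in A_k) = P(\theta_{V_k} Z^{(k)} \in A_k, C_k),$$
$$P(\theta_{V_k} Z'^{(k)} \in A^c_k) = P(\theta_{V_k} Z'^{(k)} \in A^c_k, C'_k). \tag{4.5}$$
From (4.3) we obtain the equality in
$$P(\hat Z \in \cdot, T = \infty) \le P(Z^{(K)} \in \cdot, K \ge k) = P(Z^{(k)} \in \cdot)P(K \ge k), \qquad 1 \le k < \infty. \tag{4.6}$$
Let $V$ be a copy of $V_k$ and be independent of the shift-coupling. Then
$$P(\theta_V \hat Z \in A_k, T = \infty) \le P(\theta_{V_k} Z^{(k)} \in A_k)P(K \ge k) \qquad \text{[due to (4.6)]}$$
$$= P(\theta_{V_k} Z^{(k)} \in A_k, C_k)P(K \ge k) \qquad \text{[due to (4.5)]}$$
$$\le P(C_k)P(K \ge k) = P(K = k),$$
and thus
$$P\Big(\theta_V \hat Z \in \bigcup_{k=n}^\infty A_k,\ T = \infty\Big) \le P(n \le K < \infty) \to 0 \quad \text{as } n \to \infty.$$
Put $A = \limsup_{k \to \infty} A_k$ to obtain
$$P(\theta_V \hat Z \in A, T = \infty) = 0.$$
Since $V$ has a density with respect to Lebesgue measure that is strictly
positive on $[0, \infty)$ and is independent of $(\hat Z, T)$, we can write this as
$$\int_0^\infty P(\theta_t \hat Z \in A, T = \infty)\,dt = 0.$$
Since $\liminf_{k \to \infty} A^c_k$ is the complement of $A$, we obtain similarly
$$\int_0^\infty P(\theta_t \hat Z' \in A^c, T' = \infty)\,dt = 0.$$
Thus (4.2) holds, and the proof of Theorem 4.1 is complete.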
4.5 Remarks on Maximality
If $T$ and $T'$ are distributional shift-coupling times, then so are $T + Y$ and
$T' + Y$ for any nonnegative (integer-valued in the discrete case) random
variable $Y$ that is independent of the shift-coupling (independence is not
needed in the nondistributional case). Furthermore, if $Y$ is finite, then
$\{T + Y = \infty\} = \{T = \infty\}$, and thus (4.2) holds if and only if it holds
with $T$ and $T'$ replaced by $T + Y$ and $T' + Y$. Thus (4.2) does not tell us
anything about behaviour in finite time.
Greven's maximality property (4.1) is clearly stronger than (4.2), since
the measures in (4.1) contain those in (4.2). But it is not strong enough
to be a full-fledged shift-coupling analogue of maximal exact coupling, as
can be seen from the following example. Let $Z$ be an irreducible recurrent
Markov chain starting in a fixed state $x$. Let $Z'$ be a Markov chain with
the same transition probabilities starting from a different state. Take an
$n \ge 0$ and let $T'_n$ be the time of the first visit of $Z'$ to $x$ after time $n$.
Then $0$ and $T'_n$ are distributional shift-coupling times of $Z$ and $Z'$, and
(4.1) certainly holds (the left-hand side is 0). Since $T'_n \ge n$, where $n$ can be
chosen arbitrarily large, this is not a sufficiently sharp maximality property:
it should have picked $T'_0$.
In particular, we can see from the above example that (4.1) on its own
is not sufficient to reverse the rate results in the last section. We can find
$Z$ and $Z'$ such that $T'_0 = 1$, and Section 3.4 yields Cesaro total variation
convergence with (for instance) moment rate of order $-\frac{1}{2}$. But (4.1) holds
for $T = 0$ and $T' = T'_N$ where $N$ is independent of $Z'$ with $E[N^{1/2}] = \infty$,
which implies $E[(T'_N)^{1/2}] = \infty$.
5 Shift-Coupling - Invariant $\sigma$-Algebra - Equivalences
In this section we introduce the invariant $\sigma$-algebra, which is linked to
shift-coupling in the same way as the tail $\sigma$-algebra is to exact coupling, and
establish a set of equivalences analogous to that in the exact coupling case.
5.1 The Invariant $\sigma$-Algebra
The invariant $\sigma$-algebra consists of path sets in $\mathcal{H}$ that do not depend on
where the time origin is placed. It is defined as follows:
$$\mathcal{I} = \{A \in \mathcal{H} : \theta_t^{-1} A = A,\ 0 \le t < \infty\}.$$
This is a $\sigma$-algebra because if $A$ is the union of sets $A_k$ satisfying $\theta_t^{-1} A_k =
A_k$, then $\theta_t^{-1} A = A$ (certainly $\mathcal{I}$ is closed under complementation and
contains $H$). Since $A \in \mathcal{H}$ and $A = \theta_t^{-1} A$ imply $A \in \theta_t^{-1}\mathcal{H} = \mathcal{T}_t$, we have
$$\mathcal{I} \subseteq \mathcal{T}.$$
For examples of sets in $\mathcal{T}$ and not in $\mathcal{I}$, consider the sets $A_1, A_2, A_3, A_4$ in
Section 9.1 of Chapter 4. In the discrete-time case $A_1 \in \mathcal{I}$ but $A_2 \notin \mathcal{I}$. In
the continuous-time case $A_3 \in \mathcal{I}$ but $A_1$, $A_2$, and $A_4 \notin \mathcal{I}$. Thus in general
the inclusion is strict.
We note at this point the following pleasant result.
Lemma 5.1. Let $Z$ be a discrete-time or continuous-time shift-measurable
stochastic process with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$.
If $T$ is a random time, then
$$\{Z \in A, T < \infty\} = \{\theta_T Z \in A, T < \infty\}, \qquad A \in \mathcal{I}.$$
In particular, if $T$ is finite, then $\{Z \in A\} = \{\theta_T Z \in A\}$ for $A \in \mathcal{I}$.
Proof. With $A \in \mathcal{I}$ and $0 \le t < \infty$ we have
$$\{Z \in A, T = t\} = \{Z \in \theta_t^{-1} A, T = t\} = \{\theta_t Z \in A, T = t\} = \{\theta_T Z \in A, T = t\}.$$
Take the union over $0 \le t < \infty$ to obtain the desired result. (Note that
uncountable union is no problem here because the union itself is
measurable.) □
5.2 The Inequality
The following theorem explains what the invariant $\sigma$-algebra has to do with
shift-coupling.
Theorem 5.1. Let $(\hat Z, \hat Z')$ be a coupling of the discrete-time or continuous-time
shift-measurable stochastic processes $Z$ and $Z'$ with a general state
space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$. If $T$ is a shift-coupling time, then
$\{T < \infty\}$ is an $\mathcal{I}$-coupling event. If $T$ and $T'$ are distributional shift-coupling
times, then $\{T < \infty\}$ and $\{T' < \infty\}$ are distributional $\mathcal{I}$-coupling events.
In both cases
$$\|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\| \le 2P(T = \infty). \tag{5.1}$$
Proof. For $A \in \mathcal{I}$ we have, due to Lemma 5.1,
$$\{\hat Z \in A, T < \infty\} = \{\theta_T \hat Z \in A, T < \infty\},$$
$$\{\hat Z' \in A, T' < \infty\} = \{\theta_{T'} \hat Z' \in A, T' < \infty\}.$$
In the nondistributional case the right-hand sides are identical, and in the
distributional case they have the same probability. And (5.1) is the $\mathcal{I}$-coupling
event inequality (Theorem 7.3 in Chapter 4). □
5.3 Maximally Successful Shift-Coupling
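We shall now show that the coupling in Theorem 4.1 yields equality in
(5.1). Thus there is a maximally successful shift-coupling (attaining the
supremum of the success probabilities over all shift-couplings), and
$$\text{maximal success probability} = \|P(Z \in \cdot)|_{\mathcal{I}} \wedge P(Z' \in \cdot)|_{\mathcal{I}}\|.$$
Theorem 5.2. Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time
shift-measurable stochastic processes with a general state space $(E, \mathcal{E})$ and
path space $(H, \mathcal{H})$. The distributional shift-coupling $(\hat Z, \hat Z', T, T')$ of $Z$ and
$Z'$ in Theorem 4.1 is such that $\{T < \infty\}$ and $\{T' < \infty\}$ are maximal
distributional $\mathcal{I}$-coupling events, that is,
$$\|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\| = 2P(T = \infty). \tag{5.2}$$
Proof. According to Theorem 4.1 there is a set $A \in \mathcal{H}$ such that
$$\int_0^\infty P(\theta_t \hat Z \in A, T = \infty)\,dt = 0,$$
$$\int_0^\infty P(\theta_t \hat Z' \in A^c, T' = \infty)\,dt = 0,$$
which we can rewrite as
$$E\Big[\int_0^\infty 1_{\{\theta_t \hat Z \in A\}}\,dt;\ T = \infty\Big] = 0,$$
$$E\Big[\int_0^\infty 1_{\{\theta_t \hat Z' \in A^c\}}\,dt;\ T' = \infty\Big] = 0. \tag{5.3}$$
Put
$$B = \Big\{z \in H : \int_0^\infty 1_{\{\theta_t z \in A\}}\,dt = \infty\Big\}$$
and note that $\int_0^\infty 1_{\{\theta_t z \in A\}}\,dt < \infty$ implies $\int_0^\infty 1_{\{\theta_t z \in A^c\}}\,dt = \infty$ and thus
$$B^c = \Big\{z \in H : \int_0^\infty 1_{\{\theta_t z \in A\}}\,dt < \infty\Big\} \subseteq \Big\{z \in H : \int_0^\infty 1_{\{\theta_t z \in A^c\}}\,dt = \infty\Big\}.$$
This and (5.3) yield
$$P(\hat Z \in B, T = \infty) = 0 \quad \text{and} \quad P(\hat Z' \in B^c, T' = \infty) = 0. \tag{5.4}$$
Now, for $0 \le t < \infty$,
$$\theta_t^{-1} B = \Big\{z \in H : \int_t^\infty 1_{\{\theta_s z \in A\}}\,ds = \infty\Big\} = B.$$
Thus $B \in \mathcal{I}$, which, together with (5.4), implies that $P(\hat Z \in \cdot, T = \infty)|_{\mathcal{I}}$
and $P(\hat Z' \in \cdot, T' = \infty)|_{\mathcal{I}}$ are mutually singular, that is, (5.2) holds. □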
5.4 A Cesaro Total Variation Limit Result
The following theorem explains what the invariant $\sigma$-algebra has to do with
Cesaro total variation convergence.
Theorem 5.3. Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time
shift-measurable stochastic processes with a general state space $(E, \mathcal{E})$ and
path space $(H, \mathcal{H})$. Then as $t \to \infty$,
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to \|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\|,$$
where $U$ is uniform on $[0,1]$ and independent of $Z$ and $Z'$.
Proof. Let $T$ be as in Theorem 5.2 and send $t \to \infty$ in the shift-coupling
inequality (Theorem 3.1) to obtain (due to (5.2))
$$\limsup_{t \to \infty} \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \le \|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\|.$$
By Lemma 5.1 we have, for $0 \le t < \infty$,
$$\|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\| = \|P(\theta_{Ut} Z \in \cdot)|_{\mathcal{I}} - P(\theta_{Ut} Z' \in \cdot)|_{\mathcal{I}}\|,$$
and since the right-hand side is $\le \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\|$, we have
$$\|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\| \le \liminf_{t \to \infty} \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\|.$$
These two inequalities yield the desired result. □
REMARK 5.1. By a similar argument we can obtain the inequality (5.1)
directly:
$$\|P(Z \in \cdot)|_{\mathcal{I}} - P(Z' \in \cdot)|_{\mathcal{I}}\| \le \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \le 2P(T \vee R > Ut) \to 2P(T = \infty), \quad t \to \infty,$$
without the concept of coupling with respect to a $\sigma$-algebra.
5.5 Equivalences
We can now tie together shift-coupling, Cesaro total variation convergence,
and the invariant $\sigma$-algebra as follows.
Theorem 5.4. Let $Z$ and $Z'$ be one-sided discrete-time or continuous-time
shift-measurable stochastic processes with a general state space $(E, \mathcal{E})$ and
path space $(H, \mathcal{H})$. Let $U$ be uniform on $[0,1]$ and independent of $Z$ and
$Z'$. The following statements are equivalent.
(a) There exists a successful distributional shift-coupling of $Z$ and $Z'$.
(b) $\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0$ as $t \to \infty$.
(c) $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$.
Moreover, these statements are equivalent to the existence of a successful
nondistributional shift-coupling if there exists a weak-sense-regular
conditional distribution of $Z$ given $\theta_T Z$ for any random time $T$ [this holds in
discrete time when $(E, \mathcal{E})$ is Polish and in continuous time when $(E, \mathcal{E})$ is
Polish and the paths are right-continuous].
PROOF. By the shift-coupling inequality, (a) implies (b); see (3.6). By
Theorem 5.3, (b) implies (c). By Theorem 5.2, (c) implies (a). The final claim
follows from Theorems 2.1 and 2.2. □
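Corollary 5.1. Suppose $Z$ and $Z'$ are both stationary. Then $Z$ and $Z'$
have the same distribution if and only if $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$.
Proof. If $P(Z \in \cdot) = P(Z' \in \cdot)$, then in particular $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$,
since $\mathcal{I} \subseteq \mathcal{H}$. Conversely, if $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$, then (b) in Theorem 5.4
holds, which together with the stationarity of $Z$ and $Z'$ yields
$$\|P(Z \in \cdot) - P(Z' \in \cdot)\| = \|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z' \in \cdot)\| \to 0$$
as $t \to \infty$. Thus $P(Z \in \cdot) = P(Z' \in \cdot)$ as desired. □
Corollary 5.2. Suppose for all $A \in \mathcal{H}$,
$$P(\theta_{Ut} Z \in A) \to P(Z' \in A), \qquad t \to \infty. \tag{5.5}$$
Then $\theta_{Ut} Z \overset{tv}{\to} Z'$ as $t \to \infty$.
Proof. Due to Lemma 5.1, we have $P(\theta_{Ut} Z \in A) = P(Z \in A)$ for $A \in \mathcal{I}$.
Thus, (5.5) implies that $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$. Thus (c) in Theorem 5.4
holds and the desired result follows from (b). □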
Chapter 7 will mainly be devoted to extending the above shift-coupling
theory to general random elements under a semigroup of transformations.
We shall show there that the limit result (b) in Theorem 5.4 can be much
generalized. We can, for instance, replace $Ut$ by a random variable that
is uniform on $tB$ for any $B \in \mathcal{B}[0, \infty)$ with positive finite Lebesgue
measure. This follows from a corresponding generalization of the shift-coupling
inequality.
6 $\varepsilon$-Coupling - Distributional $\varepsilon$-Coupling
In the next four sections we shall only be concerned with continuous-time
shift-measurable processes and treat an issue that does not arise in
discrete time: what happens when the shift of a shift-coupling can be made
arbitrarily small? These four sections mimic the pattern from the exact
coupling and shift-coupling cases. This section introduces $\varepsilon$-coupling and
its distributional version.
6.1 $\varepsilon$-Coupling - Definition
Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable stochastic
processes with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$. For $\varepsilon > 0$,
an $\varepsilon$-coupling of $Z$ and $Z'$ is a shift-coupling $(\hat Z, \hat Z', T, T')$ such that
$$|T - T'| \le \varepsilon \quad \text{on } \{T < \infty\}. \tag{6.1}$$
Certainly an exact coupling is an $\varepsilon$-coupling with $T' = T$, for each $\varepsilon > 0$.
We shall refer to a collection $(\hat Z^{(\varepsilon)}, \hat Z'^{(\varepsilon)}, T_\varepsilon, T'_\varepsilon)$, $\varepsilon > 0$, of $\varepsilon$-couplings
simply as epsilon-couplings. Call $\limsup_{\varepsilon \downarrow 0} P(T_\varepsilon < \infty)$ the success
probability. Say that the epsilon-couplings are successful if
$$\limsup_{\varepsilon \downarrow 0} P(T_\varepsilon < \infty) = 1.$$
Remark 6.1. In Chapter 2 we proved Blackwell's renewal theorem by
epsilon-coupling random walks (the discrete-time processes formed by the
renewal times) in space: the random walks got $\varepsilon$-close in the state space
(at different times). In this chapter we use the term epsilon-couplings for
getting close in time, not in space.
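6.2 Distributional $\varepsilon$-Coupling - Definition
The distributional version of $\varepsilon$-coupling is not as obvious as that of
exact coupling and shift-coupling. The definition (which is best motivated
by Theorems 6.2 and 7.1 below) goes as follows. For $\varepsilon > 0$, say that
$(\hat Z, \hat Z', \hat T, \hat T', \hat R, \hat R')$ is a distributional $\varepsilon$-coupling of $Z$ and $Z'$ if $(\hat Z, \hat Z', \hat T, \hat T')$
is a distributional shift-coupling of $Z$ and $Z'$, and $\hat R$ and $\hat R'$ are nonnegative
random times such that
$$\{\hat T < \infty\} = \{\hat R < \infty\} \quad \text{and} \quad \{\hat T' < \infty\} = \{\hat R' < \infty\},$$
$$(\theta_{\hat T} \hat Z, \hat T, \hat R) \overset{D}{=} (\theta_{\hat T'} \hat Z', \hat R', \hat T'), \tag{6.2}$$
$$|\hat T - \hat R| \le \varepsilon \ \text{ on } \{\hat T < \infty\} \quad \text{and} \quad |\hat T' - \hat R'| \le \varepsilon \ \text{ on } \{\hat T' < \infty\}.$$
It can be helpful to think of $\hat R$ as a substitute for $\hat T'$ and of $\hat R'$ as a substitute
for $\hat T$ (such $R$ and $R'$ have already appeared in Theorem 3.1 and its proof).
Note that an $\varepsilon$-coupling $(\hat Z, \hat Z', T, T')$ is a distributional $\varepsilon$-coupling in
the sense that $(\hat Z, \hat Z', T, T', T', T)$ is a distributional $\varepsilon$-coupling. We shall
use the word nondistributional to distinguish an $\varepsilon$-coupling from a
distributional $\varepsilon$-coupling. Otherwise, we use the same terminology in both cases.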
6.3 The Hats May Be Dropped in the Distributional Case
If we have a distributional $\varepsilon$-coupling of $Z$ and $Z'$, then we can take $\hat Z$ and
$\hat Z'$ to be the original processes $Z$ and $Z'$.
Theorem 6.1. Let $\varepsilon > 0$ and suppose $(\hat Z, \hat Z', \hat T, \hat T', \hat R, \hat R')$ is a
distributional $\varepsilon$-coupling of $Z$ and $Z'$. Then the underlying probability space can be
extended to support random times $T$, $T'$, $R$, and $R'$ such that
$$(Z, T, R) \overset{D}{=} (\hat Z, \hat T, \hat R) \quad \text{and} \quad (Z', T', R') \overset{D}{=} (\hat Z', \hat T', \hat R').$$
In particular,
$$(\theta_T Z, T, R) \overset{D}{=} (\theta_{T'} Z', R', T'). \tag{6.3}$$
Proof. This follows from the transfer extension in Section 4.5 of
Chapter 3. In order to obtain $T$ and $R$ take $Y_1 := Z$ and $(Y_1', Y_2') := (\hat Z, (\hat T, \hat R))$
and define $(T, R) := Y_2$. Similarly, in order to obtain $T'$ and $R'$ take
$Y_1 := Z'$ and $(Y_1', Y_2') := (\hat Z', (\hat T', \hat R'))$ and define $(T', R') := Y_2$. □
This theorem motivates once more dropping the hats when discussing a
(single) distributional $\varepsilon$-coupling, when there is no danger of confusion.
Since the transfer extension can be applied countably many times, we can
drop the hats simultaneously when considering, for instance, distributional
$\varepsilon_k$-couplings, $1 \le k < \infty$. In fact, the hats can be dropped simultaneously
even when we have an uncountable collection of $\varepsilon$-couplings, $\varepsilon > 0$: see the
proof of Theorem 2.2 in Chapter 7.
6.4 Turning Distributional into Nondistributional
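In the standard settings a distributional $\varepsilon$-coupling can always be turned
into a nondistributional one.
Theorem 6.2. Let $\varepsilon > 0$ and let $(\hat Z,\hat Z',\hat T,\hat T',\hat R,\hat R')$ be a distributional
$\varepsilon$-coupling of $Z$ and $Z'$. Suppose there exists a weak-sense-regular conditional
distribution of $\hat Z'$ given $\theta_{\hat T'}\hat Z'$ [this holds when the state space is Polish and
the paths are right-continuous]. Then the underlying probability space can
be extended to support $T$, $Z''$, and $T''$ such that
$$(Z, T, T'') \overset{D}{=} (\hat Z, \hat T, \hat R) \quad\text{and}\quad (Z'', T'', T) \overset{D}{=} (\hat Z',\hat T', \hat R')$$
and $(Z,Z'',T,T'')$ is a nondistributional $\varepsilon$-coupling of $Z$ and $Z'$.
Proof. Let $T$ and $R$ be as in Theorem 6.1 and put $T'' := R$. To obtain
$Z''$ use the transfer extension in Section 2.12 of Chapter 4 as follows. Take
$Y_1 := (\theta_T Z,T,T'')$ and $(Y_1',Y_2') := ((\theta_{\hat T'}\hat Z', \hat R',\hat T'),\kappa_{\hat T'}\hat Z')$ to obtain $Y_2$
such that
$$(\theta_T Z,T,T'',Y_2) \overset{D}{=} (\theta_{\hat T'}\hat Z',\hat R',\hat T',\kappa_{\hat T'}\hat Z').$$
Define $Z''$ by
$$(\theta_{T''}Z'',\kappa_{T''}Z'') := (\theta_T Z,Y_2) \quad (\text{thus } \theta_T Z = \theta_{T''}Z'').$$
Now $(\theta_{T''} Z'', T, T'', \kappa_{T''} Z'')$ is a copy of $(\theta_{\hat T'}\hat Z',\hat R',\hat T', \kappa_{\hat T'} \hat Z')$, and it follows
that $(Z'',T'',T)$ is a copy of $(\hat Z',\hat T',\hat R')$ because $(Z'',T'',T)$ is determined
in the same measurable way by $(\theta_{T''}Z'', T, T'', \kappa_{T''}Z'')$ as $(\hat Z', \hat T', \hat R')$ is by
$(\theta_{\hat T'}\hat Z',\hat R',\hat T',\kappa_{\hat T'}\hat Z')$. □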
7 ε-Coupling - Inequality and Asymptotics
The last section was devoted to the definition of $\varepsilon$-coupling and its
distributional version. We shall now go on to the limit implications. This
section differs from Section 3 (and from Section 5 in Chapter 4) in two
ways. Firstly, we will not be able to establish any rate results. Secondly,
we shall consider two types of convergence in addition to the one (smooth
total variation convergence) that turns out (Section 9) to be appropriately
linked to epsilon-couplings.
7.1 ε-Coupling Inequality
Rather than shifting to a point picked at random in $[0,t]$ as in the
shift-coupling case, the appropriate thing to do here is to shift to a $t$ as in the
exact coupling case and then blur $t$ slightly (we can also think of the time
origin of the processes as blurred slightly).
Theorem 7.1. Let $Z$ and $Z'$ be one-sided continuous-time
shift-measurable stochastic processes with a general state space $(E,\mathcal{E})$ and path space
$(H,\mathcal{H})$. Let $\varepsilon > 0$ and suppose $(\hat Z,\hat Z',T,T',R,R')$ is an $\varepsilon$-coupling of $Z$
and $Z'$ (distributional or not). Then, for all $h > 0$ and $t \in [0, \infty)$,
$$\|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \le 2P(T > t) + \frac{2\varepsilon}{h} \qquad (\varepsilon\text{-COUPLING INEQUALITY})$$
where $U$ is uniform on $[0,1]$ and independent of $Z$ and $Z'$.
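Proof. Let $U$ be uniformly distributed on $[0,1]$ and independent of $Z$,
$Z'$, and $(\hat Z, \hat Z', T, T', R, R')$. Clearly $((T - R)^+ + Uh) \bmod h$ [the
remainder when $(T - R)^+ + Uh$ is divided by $h$] is uniform on $[0, h]$ and
independent of $\hat Z$. Therefore $\theta_{((T-R)^+ + Uh) \bmod h}\hat Z$ is a copy of $\theta_{Uh}Z$. Similarly,
$\theta_{((T'-R')^+ + Uh) \bmod h}\hat Z'$ is a copy of $\theta_{Uh}Z'$. Thus
$$(\theta_t\theta_{((T-R)^+ + Uh) \bmod h}\hat Z,\ \theta_t\theta_{((T'-R')^+ + Uh) \bmod h}\hat Z') \tag{7.1}$$
is a coupling of $\theta_{t+Uh}Z$ and $\theta_{t+Uh}Z'$.
Since
$$\theta_t\theta_{((T-R)^+ + Uh) \bmod h}\hat Z = \theta_{t+(T-R)^+ + Uh}\hat Z \quad\text{on } \{Uh \le h - |T - R|\}$$
and (due to $(T - R)^+ = T - T \wedge R$)
$$\theta_{t+(T-R)^+ + Uh}\hat Z = \theta_{t - T\wedge R + Uh}\theta_T\hat Z \quad\text{on } \{T \le t\},$$
we have
$$\theta_t\theta_{((T-R)^+ + Uh) \bmod h}\hat Z = \theta_{t - T\wedge R + Uh}\theta_T\hat Z$$
on
$$C := \{T \le t,\ Uh \le h - |T - R|\}.$$
Similarly,
$$\theta_t\theta_{((T'-R')^+ + Uh) \bmod h}\hat Z' = \theta_{t - T'\wedge R' + Uh}\theta_{T'}\hat Z'$$
on
$$C' := \{R' \le t,\ Uh \le h - |R' - T'|\}.$$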
These two identities together with $(\theta_T\hat Z, T, R, U) \overset{D}{=} (\theta_{T'}\hat Z', R', T', U)$ imply
$$P(\theta_t\theta_{((T-R)^+ + Uh) \bmod h}\hat Z \in \cdot,\ C) = P(\theta_t\theta_{((T'-R')^+ + Uh) \bmod h}\hat Z' \in \cdot,\ C').$$
Thus $C$ and $C'$ are distributional coupling events for the coupling at (7.1),
and the distributional coupling event inequality (Theorem 4.3 in Chapter 4)
yields
$$\|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \le 2P(C^c). \tag{7.2}$$
Observing that
$$P(C^c) \le P(T > t) + P(Uh > h - |T - R|) \tag{7.3}$$
and
$$P(Uh > h - |T - R|) \le P(Uh > h - \varepsilon) = \frac{\varepsilon}{h}$$
completes the proof. □
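The first step of the proof rests on the fact that adding an offset to $Uh$ and reducing modulo $h$ leaves a uniform variable uniform. The following is only a numerical sketch of that fact, not part of the argument: the function names, the offsets, and the grid size are illustrative assumptions. It replaces $U$ by an evenly spaced grid on $[0,1)$ and checks that the grid is still evenly spaced on $[0,h)$ after the shift.

```python
def rotated_grid(offset, h, n):
    """Sorted values of (offset + u*h) mod h for u on a midpoint grid of [0, 1)."""
    grid = [(i + 0.5) / n for i in range(n)]  # deterministic stand-in for uniform U
    return sorted((offset + u * h) % h for u in grid)


def max_gap_error(offset, h, n):
    """Largest deviation of consecutive gaps from the ideal spacing h/n."""
    values = rotated_grid(offset, h, n)
    gaps = [b - a for a, b in zip(values, values[1:])]
    return max(abs(gap - h / n) for gap in gaps)


# The offsets play the role of (T - R)^+; h is the blur width (hypothetical values).
for offset in (0.0, 0.3, 1.7, 12.3456):
    print(offset, max_gap_error(offset, h=2.0, n=1000))
```

Each printed error should be at the level of floating-point rounding, reflecting that the map $u \mapsto (a + uh) \bmod h$ is a rotation of the interval $[0,h)$.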
Remark 7.1. When we have a nondistributional $\varepsilon$-coupling $(\hat Z,\hat Z',T,T')$
then $R = T'$ and $R' = T$, and in the proof $C = C'$, and $C$ is a
nondistributional coupling event of the coupling at (7.1).
Reformulation. The left-hand side (l.h.s.) of the $\varepsilon$-coupling inequality
can be rewritten in the following smooth form:
$$\Big\| \frac{1}{h}\int_t^{t+h} P(\theta_s Z \in \cdot)\,ds - \frac{1}{h}\int_t^{t+h} P(\theta_s Z' \in \cdot)\,ds \Big\|.$$
Thus, in the same way as the coupling time and shift-coupling inequalities
are basic for plain and Cesaro total variation asymptotics, respectively, the
$\varepsilon$-coupling inequality is basic for smooth total variation asymptotics.
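The identity behind this reformulation is a one-line conditioning argument. As a sketch, for a measurable set $A$ of paths (the symbol $A$ is introduced here only for illustration), the independence of $U$ and $Z$ and the substitution $s = t + uh$ give:

```latex
\begin{align*}
P(\theta_{t+Uh}Z \in A)
  &= \int_0^1 P(\theta_{t+uh}Z \in A)\,du \\
  &= \frac{1}{h}\int_t^{t+h} P(\theta_s Z \in A)\,ds .
\end{align*}
```

The same computation applies to $Z'$, and taking the norm of the difference gives the smooth form. Shift-measurability is what makes the integrand measurable in $s$.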
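7.2 Finite $T_\varepsilon$'s - Smooth Total Variation Convergence
Let $(\hat Z^{(\varepsilon)},\hat Z'^{(\varepsilon)},T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$ be $\varepsilon$-couplings. The $\varepsilon$-coupling inequality
yields, for $h > 0$,
$$\limsup_{t\to\infty} \|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \le 2P(T_\varepsilon = \infty) + \frac{2\varepsilon}{h}.$$
Thus if there exist successful epsilon-couplings (distributional or not), then
taking lim inf as $\varepsilon$ goes to 0 yields
$$\forall h > 0:\quad \|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \to 0, \quad t \to \infty. \tag{7.4}$$
If $Z'$ is stationary, then $\theta_{t+Uh}Z'$ has the same distribution as $Z'$, and (7.4)
can be rewritten as
$$\forall h > 0:\quad \theta_{t+Uh}Z \overset{tv}{\to} Z', \quad t \to \infty. \tag{7.5}$$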
7.3 Stochastically Dominated $T_\varepsilon$'s - Uniform Convergence
In order to obtain rate results a more sophisticated $\varepsilon$-coupling inequality
is needed, but uniform convergence follows easily.
Let $\mathcal{Z}$ be a class of continuous-time shift-measurable stochastic processes
on a general state space. Suppose there exists, for each $\varepsilon > 0$, a finite
random variable $\bar T_\varepsilon$ such that for each pair of processes $Z, Z' \in \mathcal{Z}$ there is
an $\varepsilon$-coupling (distributional or not) of $Z$ and $Z'$ with time $T_\varepsilon$ such that
$$T_\varepsilon \overset{D}{\le} \bar T_\varepsilon.$$
Then the $\varepsilon$-coupling inequality yields that for all $h > 0$ and $t \in [0, \infty)$,
$$\sup_{Z,Z' \in \mathcal{Z}} \|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \le 2P(\bar T_\varepsilon > t) + \frac{2\varepsilon}{h}.$$
Send first $t \to \infty$ and then $\varepsilon \downarrow 0$ to obtain the following result on uniform
convergence over the class $\mathcal{Z}$:
$$\sup_{Z,Z' \in \mathcal{Z}} \|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \to 0, \quad t \to \infty.$$
7.4 Total Variation Convergence in the State Space
In Section 7.2 we showed that successful epsilon-couplings of $Z$ and $Z'$,
where $Z'$ is stationary, imply smooth total variation convergence to
stationarity in the path space, namely (7.5). This certainly implies the following
weaker result on smooth total variation convergence to stationarity in the
state space
$$\forall h > 0:\quad Z_{t+Uh} \overset{tv}{\to} Z'_0, \quad t \to \infty.$$
This latter result can be sharpened to total variation convergence if we put
a severe condition on the paths (while in the path space we will still have
only smooth total variation convergence; see, however, Section 7.2 in the
next chapter).
Theorem 7.2. Let $Z$ and $Z'$ be one-sided continuous-time stochastic
processes with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$ where $H$
consists of paths that are piecewise constant and right-continuous (in the
discrete topology) with finitely many jumps in finite time intervals. Suppose
further that $Z'$ is stationary and that there are successful epsilon-couplings
$(\hat Z^{(\varepsilon)},\hat Z'^{(\varepsilon)},T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$, $\varepsilon > 0$, of $Z$ and $Z'$ (distributional or not). Then
$$Z_t \overset{tv}{\to} Z'_0, \quad t \to \infty.$$
Proof. In the nondistributional case this is Theorem 10.1 in Chapter 2.
The following modification of the proof is needed to cover the distributional
case. Note firstly that
$$Z_t = Z_{t+(T-R)}$$
on the event
$$C = \{T \vee R \le t - \varepsilon,\ \text{no } Z \text{ jump in } [(T-R) + t - \varepsilon,\ (T-R) + t + \varepsilon]\},$$
secondly that $Z_{t+(T-R)}$ is determined in the same measurable way by
$(\theta_T Z, T, R)$ on $C$ as $Z'_t$ is by $(\theta_{T'}Z', R', T')$ on the event
$$C' = \{T' \vee R' \le t - \varepsilon,\ \text{no } Z' \text{ jump in } [t-\varepsilon,\ t+\varepsilon]\},$$
and thirdly that $C$ is determined in the same measurable way by $(\theta_T Z, T, R)$
as $C'$ is by $(\theta_{T'}Z',R',T')$. These three observations yield
$$P(Z_t \in \cdot,\ C) = P(Z'_t \in \cdot,\ C'),$$
that is, $C$ and $C'$ are distributional coupling events of the coupling $(Z_t, Z'_t)$
of $Z_t$ and $Z'_0$. The rest of the proof is now the same as in the
nondistributional case (using $2P(C'^c)$ rather than $2P(C^c)$ to bound the total variation
distance). □
7.5 Distributional Convergence in the State Space
We shall now show that successful epsilon-couplings yield convergence in
distribution (see Section 10 in Chapter 3) in the state space, provided that
the state space is metric and the paths are right-continuous.
Theorem 7.3. Let $Z$ and $Z'$ be one-sided continuous-time stochastic
processes with state space $(E,\mathcal{E})$ and path space $(H,\mathcal{H})$, where $E$ is
metric, $\mathcal{E}$ its Borel subsets, and the paths are right-continuous (that is, $H =
R_E([0, \infty))$). Let $Z'$ be stationary and suppose there are successful
epsilon-couplings of $Z$ and $Z'$ (distributional or not). Then
$$Z_t \overset{D}{\to} Z'_0, \quad t \to \infty.$$
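Proof. We must prove that
$$\lim_{t\to\infty} |E[f(Z_t)] - E[f(Z'_0)]| = 0 \tag{7.6}$$
for an arbitrary real-valued bounded continuous function $f$ defined on $E$.
It is no restriction to take $|f| \le 1$. For notational convenience fix an
$\varepsilon > 0$ and let $(Z,Z',T,T',R,R')$ be the unhatted copy of the $\varepsilon$-coupling
$(\hat Z^{(\varepsilon)},\hat Z'^{(\varepsilon)},T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$; see Theorem 6.1. Then
$$\begin{aligned}
f(Z_t) - f(Z'_t) ={}& f(Z_t)\mathbf{1}_{\{T>t\}} - f(Z'_t)\mathbf{1}_{\{R'>t\}} \\
&+ f(Z_t)\mathbf{1}_{\{T\le t\}} - f(Z'_{t+T'-R'})\mathbf{1}_{\{R'\le t\}} \\
&+ f(Z'_{t+T'-R'})\mathbf{1}_{\{R'\le t\}} - f(Z'_t)\mathbf{1}_{\{R'\le t\}}.
\end{aligned}$$
Now, $f(Z_t)\mathbf{1}_{\{T\le t\}}$ is determined in the same measurable way by $(\theta_T Z, T, R)$
as $f(Z'_{t+T'-R'})\mathbf{1}_{\{R'\le t\}}$ is by $(\theta_{T'}Z',R',T')$. Thus, by the definition of
distributional $\varepsilon$-coupling, $f(Z_t)\mathbf{1}_{\{T\le t\}}$ and $f(Z'_{t+T'-R'})\mathbf{1}_{\{R'\le t\}}$ have the same
distribution and thus the same expectation. Hence the mid-part on the
right-hand side cancels when we take expectations, and we obtain (after
taking absolute value)
$$\begin{aligned}
|E[f(Z_t)] - E[f(Z'_t)]| \le{}& |E[f(Z_t)\mathbf{1}_{\{T>t\}}]| + |E[f(Z'_t)\mathbf{1}_{\{R'>t\}}]| \\
&+ |E[f(Z'_{t+T'-R'})\mathbf{1}_{\{R'\le t\}} - f(Z'_t)\mathbf{1}_{\{R'\le t\}}]|.
\end{aligned}$$
Use $|f| \le 1$ and $|R' - T'| \le \varepsilon$, respectively, to dominate the terms on the
right and obtain (recalling that both $T$ and $R'$ are copies of $T_\varepsilon$)
$$|E[f(Z_t)] - E[f(Z'_t)]| \le 2P(T_\varepsilon > t) + E\Big[\sup_{-\varepsilon \le u \le \varepsilon} |f(Z'_{t+u}) - f(Z'_t)|\Big].$$
Applying the stationarity of $Z'$ on both sides yields
$$|E[f(Z_t)] - E[f(Z'_0)]| \le 2P(T_\varepsilon > t) + E\Big[\sup_{0 \le u \le 2\varepsilon} |f(Z'_u) - f(Z'_\varepsilon)|\Big].$$
Send $t$ to infinity to obtain
$$\limsup_{t\to\infty} |E[f(Z_t)] - E[f(Z'_0)]| \le 2P(T_\varepsilon = \infty) + E\Big[\sup_{0 \le u \le 2\varepsilon} |f(Z'_u) - f(Z'_\varepsilon)|\Big].$$
Now,
$$\sup_{0 \le u \le 2\varepsilon} |f(Z'_u) - f(Z'_\varepsilon)| \le |f(Z'_\varepsilon) - f(Z'_0)| + \sup_{0 \le u \le 2\varepsilon} |f(Z'_u) - f(Z'_0)|,$$
and thus, by the continuity of $f$ and the right continuity of the paths,
$\sup_{0 \le u \le 2\varepsilon} |f(Z'_u) - f(Z'_\varepsilon)| \to 0$ pointwise as $\varepsilon$ decreases to zero, and bounded
convergence yields
$$\limsup_{t\to\infty} |E[f(Z_t)] - E[f(Z'_0)]| \le 2\liminf_{\varepsilon\downarrow 0} P(T_\varepsilon = \infty).$$
Since the epsilon-couplings are successful, this yields (7.6). □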
7.6 Distributional Convergence in the Path Space
Under the conditions of Theorem 7.3 we have in fact finite-dimensional
distributional convergence with respect to the product metric: for $n \ge 1$
and $t_1,\dots,t_n \ge 0$,
$$(Z_{t+t_1},\dots, Z_{t+t_n}) \overset{D}{\to} (Z'_{t_1},\dots, Z'_{t_n}), \quad t \to \infty.$$
This follows by applying Theorem 7.3 to the processes
$$(Z_{t+t_1},\dots, Z_{t+t_n})_{t\in[0,\infty)} \quad\text{and}\quad (Z'_{t+t_1},\dots, Z'_{t+t_n})_{t\in[0,\infty)},$$
since they are right-continuous in the product metric on $E^n$ and since an
$\varepsilon$-coupling of $Z$ and $Z'$ yields an $\varepsilon$-coupling of these vector processes.
Now instead of the finite-dimensional vector $(Z_{t+t_1},\dots, Z_{t+t_n})$ consider
the whole process beyond $t$, namely $\theta_t Z$. Then distributional convergence
still holds, provided that we impose more conditions: let $(E,\mathcal{E})$ be Polish
and let the paths not only be right-continuous but also have left-hand limits,
that is, take $H = D_E([0, \infty))$. Then [see Ethier and Kurtz (1986)] for each
path $z$ in $D_E([0, \infty))$ the mapping from $[0, \infty)$ to $D_E([0, \infty))$ taking $t$ to $\theta_t z$
is right-continuous in the so-called Skorohod topology. Moreover, this
topology can be metrized in such a way that $\mathcal{H} = \mathcal{D}_E([0, \infty))$ is generated by the
open subsets of $D_E([0, \infty))$. (This metric makes $(D_E([0,\infty)),\mathcal{D}_E([0,\infty)))$
Polish, but we shall not need this fact here.) By convergence in distribution
in the path space we mean distributional convergence with respect to this
metric.
Theorem 7.4. Let $Z$, $Z'$ be one-sided continuous-time stochastic processes
with a Polish state space $(E,\mathcal{E})$ and path space $(D_E([0, \infty)),\mathcal{D}_E([0, \infty)))$.
Let $Z'$ be stationary and suppose there are successful epsilon-couplings of
$Z$ and $Z'$. Then
$$\theta_t Z \overset{D}{\to} Z', \quad t \to \infty.$$
Proof. This follows by applying Theorem 7.3 to the processes
$$(\theta_t Z)_{t\in[0,\infty)} \quad\text{and}\quad (\theta_t Z')_{t\in[0,\infty)}.$$
These processes have the metric state space $(D_E([0, \infty)),\mathcal{D}_E([0, \infty)))$ and
have right-continuous paths. Moreover, an $\varepsilon$-coupling of $Z$ and $Z'$ yields an
$\varepsilon$-coupling of $(\theta_t Z)_{t\in[0,\infty)}$ and $(\theta_t Z')_{t\in[0,\infty)}$. □
Remark 7.2. Since successful epsilon-couplings imply convergence in the
path space in both distribution and smooth total variation, the question
arises of what the relation is between these two modes of convergence. Due
to Theorem 9.4 below, smooth total variation convergence is equivalent
to the existence of successful epsilon-couplings. The same is not true of
convergence in distribution, as the following simple counterexample shows:
Take $Z_t = 1/t$ and $Z'_t = 0$, $0 \le t < \infty$,
to obtain $\theta_t Z \overset{D}{\to} Z'$ as $t \to \infty$,
while obviously there are no successful epsilon-couplings.
Thus convergence in distribution is in general strictly weaker than smooth
total variation convergence.
8 ε-Coupling - Maximality
In this section we only establish the following straightforward consequence
of the continuous-time maximality result for exact coupling (Theorem 6.1
in Chapter 4). As in the shift-coupling case this result enables us to show in
the next section that there are epsilon-couplings that are both maximally
successful and also successful when the smooth total variation convergence
(7.4) holds. But the question of how to define an appropriate analogue of
maximal exact coupling is otherwise left open.
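Theorem 8.1. Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable
stochastic processes with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$.
For each $\varepsilon > 0$ there is a distributional $\varepsilon$-coupling $(\hat Z^{(\varepsilon)}, \hat Z'^{(\varepsilon)}, T_\varepsilon,T'_\varepsilon, R_\varepsilon,R'_\varepsilon)$
of $Z$ and $Z'$ such that
$$\|P(\theta_{n\varepsilon+U\varepsilon}Z \in \cdot) - P(\theta_{n\varepsilon+U\varepsilon}Z' \in \cdot)\| = 2P(T_\varepsilon > n\varepsilon + \varepsilon), \tag{8.1}$$
for $0 \le n < \infty$, and
$$\|P(\theta_{U\varepsilon}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{U\varepsilon}Z' \in \cdot)|_{\mathcal{T}}\| = 2P(T_\varepsilon = \infty), \tag{8.2}$$
where $U$ is uniform on $[0,1]$ and independent of $Z$ and $Z'$.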
Moreover, there exist nondistributional $\varepsilon$-couplings with this property if
there exist weak-sense-regular conditional distributions of $\hat Z'^{(\varepsilon)}$ given $\theta_{T'_\varepsilon}\hat Z'^{(\varepsilon)}$
[this holds when $(E,\mathcal{E})$ is Polish and the paths are right-continuous].
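Proof. Fix an $\varepsilon > 0$. According to Theorem 6.1 in Chapter 4 there is a
distributional exact coupling of $\theta_{U\varepsilon}Z$ and $\theta_{U\varepsilon}Z'$ maximal at the times $n\varepsilon$,
$n \ge 0$, and having $\{n\varepsilon : 0 \le n \le \infty\}$ valued coupling times. According
to Theorem 3.1 in Chapter 4 this distributional exact coupling can be
unhatted, that is, the underlying probability space can be extended to
support $\{n\varepsilon : 0 \le n \le \infty\}$ valued random times $L_\varepsilon$ and $L'_\varepsilon$ such that
$$(\theta_{L_\varepsilon}\theta_{U\varepsilon}Z, L_\varepsilon) \overset{D}{=} (\theta_{L'_\varepsilon}\theta_{U\varepsilon}Z', L'_\varepsilon)$$
and, for all integers $n \ge 0$,
$$\|P(\theta_{n\varepsilon}\theta_{U\varepsilon}Z \in \cdot) - P(\theta_{n\varepsilon}\theta_{U\varepsilon}Z' \in \cdot)\| = 2P(L_\varepsilon > n\varepsilon). \tag{8.3}$$
Due to Theorem 9.2 in Chapter 4 it holds that
$$\|P(\theta_{U\varepsilon}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{U\varepsilon}Z' \in \cdot)|_{\mathcal{T}}\| = 2P(L_\varepsilon = \infty). \tag{8.4}$$
Apply the conditioning extension in Section 4.5 of Chapter 3 twice to
obtain first a random variable $V$ [take $Y_1 := (\theta_{L_\varepsilon}\theta_{U\varepsilon}Z, L_\varepsilon)$ and $(Y_1',Y_2') :=
((\theta_{L'_\varepsilon}\theta_{U\varepsilon}Z', L'_\varepsilon), U)$ and put $V := Y_2$] such that
$$(\theta_{L_\varepsilon}\theta_{U\varepsilon}Z, L_\varepsilon, V) \overset{D}{=} (\theta_{L'_\varepsilon}\theta_{U\varepsilon}Z', L'_\varepsilon, U)$$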
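and then a random variable $V'$ [take $Y_1 := (\theta_{L'_\varepsilon}\theta_{U\varepsilon}Z', L'_\varepsilon, U)$ and $(Y_1', Y_2') :=
((\theta_{L_\varepsilon}\theta_{U\varepsilon}Z, L_\varepsilon, V), U)$ and put $V' := Y_2$] such that
$$(\theta_{L_\varepsilon}\theta_{U\varepsilon}Z, L_\varepsilon, V, U) \overset{D}{=} (\theta_{L'_\varepsilon}\theta_{U\varepsilon}Z', L'_\varepsilon, U, V'),$$
which implies
$$(\theta_{L_\varepsilon+U\varepsilon}Z,\ L_\varepsilon + U\varepsilon,\ L_\varepsilon + V\varepsilon) \overset{D}{=} (\theta_{L'_\varepsilon+U\varepsilon}Z',\ L'_\varepsilon + V'\varepsilon,\ L'_\varepsilon + U\varepsilon).$$
Since
$$|(L_\varepsilon + U\varepsilon) - (L_\varepsilon + V\varepsilon)| = |U - V|\varepsilon \le \varepsilon \quad\text{on } \{L_\varepsilon < \infty\},$$
$$|(L'_\varepsilon + U\varepsilon) - (L'_\varepsilon + V'\varepsilon)| = |U - V'|\varepsilon \le \varepsilon \quad\text{on } \{L'_\varepsilon < \infty\},$$
it follows that $(Z,Z',L_\varepsilon + U\varepsilon,L'_\varepsilon + U\varepsilon,L_\varepsilon + V\varepsilon,L'_\varepsilon + V'\varepsilon)$ is a distributional
$\varepsilon$-coupling. Since $L_\varepsilon$ is $\{n\varepsilon : 0 \le n \le \infty\}$ valued and $U$ is uniform on $[0,1]$,
it holds that
$$P(L_\varepsilon > n\varepsilon) = P(L_\varepsilon \ge n\varepsilon + \varepsilon) = P(L_\varepsilon + U\varepsilon > n\varepsilon + \varepsilon),$$
and thus (8.3) yields (8.1). Finally, $P(L_\varepsilon = \infty) = P(L_\varepsilon + U\varepsilon = \infty)$, and
thus (8.4) yields (8.2). □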
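The lattice identity used at the end of the proof can be checked mechanically. The sketch below is illustrative only: time is measured in units of $\varepsilon$, the helper name `events_agree` and the ranges of values are assumptions made for the example, and exact rational arithmetic is used so that no rounding enters.

```python
import math
from fractions import Fraction


def events_agree(k, n, u):
    """Compare {L > n*eps} with {L + U*eps > n*eps + eps}, in units of eps, for L = k*eps."""
    return (k > n) == (k + u > n + 1)


lattice_values = list(range(0, 6)) + [math.inf]      # k = 0..5 and the value infinity
u_values = [Fraction(j, 10) for j in range(1, 10)]   # interior points of (0, 1)

all_agree = all(
    events_agree(k, n, u)
    for k in lattice_values
    for n in range(0, 6)
    for u in u_values
)
print(all_agree)
```

The two events differ only at the endpoint $U = 0$ (for example $k = 3$, $n = 2$), which has probability zero for a uniform $U$.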
9 ε-Coupling - Smooth Tail σ-Algebra - Equivalences
It turns out that there is a $\sigma$-algebra playing the same role for
epsilon-couplings as the tail $\sigma$-algebra for exact coupling and the invariant
$\sigma$-algebra for shift-coupling. We shall call it the smooth tail $\sigma$-algebra. In
this section we introduce it, link it to epsilon-couplings, and establish an
analogous set of equivalences as in the exact coupling and shift-coupling
cases.
9.1 The Smooth Tail σ-Algebra
It seems there is no direct way of defining the smooth tail $\sigma$-algebra by
explicitly specifying the sets it contains. We shall define it by specifying a
generating class of functions.
Let $\mathcal{S}^\circ$ be the class of tail functions that are right-continuous in time,
that is,
$$\mathcal{S}^\circ = \{f \in \mathcal{T} : f(\theta_t z) \to f(z) \text{ as } t \downarrow 0,\ z \in H\}.$$
Define the smooth tail $\sigma$-algebra by
$$\mathcal{S} = \sigma\{\mathcal{S}^\circ\}.$$
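We first note that
$$\mathcal{I} \subseteq \mathcal{S} \subseteq \mathcal{T}.$$
Here $\mathcal{S} \subseteq \mathcal{T}$ is obvious, while $\mathcal{I} \subseteq \mathcal{S}$ follows by observing that if $A \in \mathcal{I}$,
then $\mathbf{1}_A(\theta_t z) = \mathbf{1}_A(z)$ and thus trivially $\mathbf{1}_A(\theta_t z) \to \mathbf{1}_A(z)$ as $t \downarrow 0$, that is,
$\mathbf{1}_A \in \mathcal{S}^\circ$ and consequently $A \in \mathcal{S}$.
Secondly, we note that $\mathcal{S}^\circ$ (and thus $\mathcal{S}$) contains smoothed tail functions,
that is, functions $f^{(h)}$, $h > 0$, defined by
$$f^{(h)}(z) = h^{-1}\int_0^h f(\theta_s z)\,ds, \quad z \in H,\ f \in \mathcal{T} \text{ and bounded}. \tag{9.1}$$
That $f^{(h)} \in \mathcal{S}^\circ$ follows from
$$\begin{aligned}
|f^{(h)}(\theta_t z) - f^{(h)}(z)| &= h^{-1}\Big|\int_t^{t+h} f(\theta_s z)\,ds - \int_0^h f(\theta_s z)\,ds\Big| \\
&= h^{-1}\Big|\int_h^{t+h} f(\theta_s z)\,ds - \int_0^t f(\theta_s z)\,ds\Big| \\
&\le 2t\sup|f|/h \to 0, \quad t \downarrow 0.
\end{aligned}$$
This shows in particular that the inclusion $\mathcal{I} \subseteq \mathcal{S}$ is in general strict.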
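The smoothing bound can be illustrated numerically. In the sketch below, which is not taken from the text, the function `g` is a hypothetical stand-in for $s \mapsto f(\theta_s z)$ along one fixed path: a square wave with values $\pm 1$, so that $\sup|f| = 1$. Its running integral is computed exactly, and the code checks $|f^{(h)}(\theta_t z) - f^{(h)}(z)| \le 2t/h$ on a grid of $t$ values.

```python
import math


def g(s):
    """Illustrative bounded function of time: +1 on [0,1), -1 on [1,2), and so on."""
    return 1.0 if math.floor(s) % 2 == 0 else -1.0


def integral_g(x):
    """Exact integral of g over [0, x] for x >= 0."""
    n = math.floor(x)
    whole = 1.0 if n % 2 == 1 else 0.0  # full unit pieces cancel in pairs
    return whole + (x - n) * g(n)


def smoothed(t, h):
    """(1/h) times the integral of g over [t, t+h], i.e. the smoothed value at shift t."""
    return (integral_g(t + h) - integral_g(t)) / h


h = 0.75
worst_slack = min(
    2.0 * t / h - abs(smoothed(t, h) - smoothed(0.0, h))
    for t in [j / 200 for j in range(0, 401)]
)
print(worst_slack >= -1e-12)
```

Although `g` itself jumps, the smoothed version varies continuously in $t$, which is the mechanism that places $f^{(h)}$ in $\mathcal{S}^\circ$.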
Finally, in order to see that the inclusion $\mathcal{S} \subseteq \mathcal{T}$ is in general strict,
consider the following example.
Example 9.1. Consider real-valued nonnegative processes with path space
$H$ consisting of right-continuous piecewise linear paths having slope $-1$ and
finitely many jumps in finite intervals and having left-hand limit 0 at the
jumps and rational lengths between jumps (that is, the jump sizes are
rational). Then the set
$$A_5 = \{z \in H : z_0 \text{ is rational}\}$$
equals the set $\{z \in H : z_t \text{ is rational}\}$ when $t$ is rational and equals its
complement when $t$ is not rational. Thus $A_5 \in \theta_t^{-1}\mathcal{H}$ for arbitrarily large $t$.
Thus $A_5 \in \mathcal{T}$. But the indicator of $A_5$ is not in $\mathcal{S}^\circ$, since $\mathbf{1}_{A_5}(\theta_t z) =
1 - \mathbf{1}_{A_5}(z)$ when $t$ is not rational and thus $\mathbf{1}_{A_5}(\theta_t z)$ cannot go to $\mathbf{1}_{A_5}(z)$
as $t \downarrow 0$. This suggests that $A_5$ is not in $\mathcal{S}$. We shall show indirectly that
$A_5$ is not in $\mathcal{S}$. The set $H$ is the path set of the remaining life process
(see Section 9.1 in Chapter 2) of a nonlattice renewal process with rational
recurrence times. In Chapter 2 (Theorem 7.1) we showed that there are
successful epsilon-couplings of two differently started versions of such a
process. According to Theorem 9.4 below this implies that the distributions
of the two processes agree on $\mathcal{S}$. But if we let one of the processes have a
rational delay and the other have an irrational delay, then the probabilities
of the processes being in $A_5$ are one and zero, respectively. Thus $A_5$ cannot
be in $\mathcal{S}$.
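9.2 The Inequality
The following theorem explains what the smooth tail $\sigma$-algebra has to do
with epsilon-couplings.
Theorem 9.1. Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable
stochastic processes with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$.
For each $\varepsilon > 0$ let $(\hat Z^{(\varepsilon)},\hat Z'^{(\varepsilon)},T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$ be an $\varepsilon$-coupling of $Z$ and $Z'$
(distributional or not). Then
$$\|P(Z \in \cdot)|_{\mathcal{S}} - P(Z' \in \cdot)|_{\mathcal{S}}\| \le 2\liminf_{\varepsilon\downarrow 0} P(T_\varepsilon = \infty).$$
Proof. Apply Theorem 9.3 in Chapter 4 to the processes $\theta_{Uh}Z$ and $\theta_{Uh}Z'$
to obtain
$$\|P(\theta_t\theta_{Uh}Z \in \cdot) - P(\theta_t\theta_{Uh}Z' \in \cdot)\| \to \|P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}\|, \quad t \to \infty.$$
Thus sending $t \to \infty$ in the $\varepsilon$-coupling inequality (Theorem 7.1) yields
$$\|P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}\| \le 2P(T_\varepsilon = \infty) + 2\varepsilon/h.$$
Take lim inf as $\varepsilon \downarrow 0$ to obtain
$$\|P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}\| \le 2\liminf_{\varepsilon\downarrow 0} P(T_\varepsilon = \infty), \quad h > 0.$$
A reference to the following lemma completes the proof. □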
Lemma 9.1. It holds that
$$\|P(Z \in \cdot)|_{\mathcal{S}} - P(Z' \in \cdot)|_{\mathcal{S}}\| = \sup_{h>0} \|P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}\|.$$
Proof. Put
$$\nu = P(Z \in \cdot)|_{\mathcal{T}} - P(Z' \in \cdot)|_{\mathcal{T}},$$
$$\nu^{(h)} = P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}, \quad h > 0.$$
For bounded $f \in \mathcal{T}$ let $f^{(h)}$ be defined by (9.1) and recall that $f^{(h)} \in \mathcal{S}^\circ$.
Note also that if $f \in \mathcal{S}^\circ$, then $f^{(h)} \to f$ pointwise as $h \downarrow 0$.
We must prove
$$\|\nu|_{\mathcal{S}}\| = \sup_{h>0} \|\nu^{(h)}\|.$$
For that purpose take an $A \in \mathcal{S}$ such that $\|\nu|_{\mathcal{S}}\| = 2\nu(A)$ [see Theorem 8.2
in Chapter 3, the first equality in (8.11)] and fix an $\varepsilon > 0$. It is a basic
fact of bounded measures [see Ash (1972), Theorem 1.3.11] that if a
$\sigma$-algebra is generated by an algebra, then each set in the $\sigma$-algebra can
be approximated in measure by a set in the algebra (the measure of the
symmetric difference of the sets can be made arbitrarily small). Now, $\mathcal{S}$ is
generated by the algebra
$$\{\{(f_1,\dots,f_n) \in B\} : f_1,\dots,f_n \in \mathcal{S}^\circ,\ B \in \mathcal{B}(\mathbb{R}^n),\ 1 \le n < \infty\},$$
and thus there is an $n \ge 1$ and a $B \in \mathcal{B}(\mathbb{R}^n)$ and functions $f_1,\dots,f_n$ in $\mathcal{S}^\circ$
such that
$$\int |\mathbf{1}_A - \mathbf{1}_B(f_1,\dots,f_n)|\,d|\nu| \le \varepsilon. \tag{9.2}$$
Moreover, it is a basic fact of bounded measures on $(\mathbb{R}^n,\mathcal{B}(\mathbb{R}^n))$ [see Ash
(1972), Theorem 2.4.14] that the indicator of any set in $\mathcal{B}(\mathbb{R}^n)$ can be
approximated by a $[0,1]$ valued continuous function in such a way that the
integral of the absolute value of their difference can be made arbitrarily
small (approximation in $L_1$). Apply this to the measure $|\nu|(f_1,\dots,f_n)^{-1}$
to find a continuous function $\alpha$ from $\mathbb{R}^n$ to $[0,1]$ such that
$$\int |\mathbf{1}_B(f_1,\dots,f_n) - f|\,d|\nu| \le \varepsilon, \quad\text{where } f = \alpha(f_1,\dots,f_n). \tag{9.3}$$
Since $f_1,\dots,f_n$ are in $\mathcal{S}^\circ$ and $\alpha$ is continuous, it follows that $f$ is in $\mathcal{S}^\circ$,
which implies $f^{(h)} \to f$ pointwise as $h \downarrow 0$. Hence, by bounded convergence,
there is an $h > 0$ such that $\int |f - f^{(h)}|\,d|\nu| \le \varepsilon$, which together with (9.2)
and (9.3) yields
$$\int |\mathbf{1}_A - f^{(h)}|\,d|\nu| \le 3\varepsilon.$$
Combine this,
$$\nu(A) = \int f^{(h)}\,d\nu + \int (\mathbf{1}_A - f^{(h)})\,d\nu,$$
and $\int f^{(h)}\,d\nu = \int f\,d\nu^{(h)}$ to obtain
$$\nu(A) \le \int f\,d\nu^{(h)} + 3\varepsilon.$$
Since $\|\nu|_{\mathcal{S}}\| = 2\nu(A)$ and
$$\|\nu^{(h)}\| = 2\sup_{g \in \mathcal{T},\ 0 \le g \le 1} \int g\,d\nu^{(h)} \quad [\text{see Theorem 8.2 in Chapter 3}],$$
this yields $\|\nu|_{\mathcal{S}}\| \le \sup_{h>0} \|\nu^{(h)}\| + 6\varepsilon$. Since $\varepsilon > 0$ is arbitrary, we obtain
$$\|\nu|_{\mathcal{S}}\| \le \sup_{h>0} \|\nu^{(h)}\|.$$
The converse $\|\nu|_{\mathcal{S}}\| \ge \sup_{h>0} \|\nu^{(h)}\|$ holds, since $g \in \mathcal{T}$ implies $g^{(h)} \in \mathcal{S}$
and thus
$$\|\nu|_{\mathcal{S}}\| \ge 2\sup_{0 \le g \le 1} \int g^{(h)}\,d\nu = \|\nu^{(h)}\|,$$
and the lemma is established. □
9.3 Maximally Successful Epsilon-Couplings
Once more equality can be obtained in the inequality. Thus there exist
maximally successful epsilon-couplings (attaining the supremum of the success
probabilities over all collections of epsilon-couplings) and
$$\text{maximal success probability} = \|P(Z \in \cdot)|_{\mathcal{S}} \wedge P(Z' \in \cdot)|_{\mathcal{S}}\|.$$
Theorem 9.2. Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable
stochastic processes with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$.
The distributional epsilon-couplings $(\hat Z^{(\varepsilon)},\hat Z'^{(\varepsilon)},T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$, $\varepsilon > 0$, of $Z$
and $Z'$ in Theorem 8.1 are such that
$$\|P(Z \in \cdot)|_{\mathcal{S}} - P(Z' \in \cdot)|_{\mathcal{S}}\| = 2\sup_{\varepsilon>0} P(T_\varepsilon = \infty) = 2\lim_{\varepsilon\downarrow 0} P(T_\varepsilon = \infty).$$
Proof. The first equality follows from Theorem 8.1 and Lemma 9.1.
Certainly,
$$\limsup_{\varepsilon\downarrow 0} P(T_\varepsilon = \infty) \le \sup_{\varepsilon>0} P(T_\varepsilon = \infty)$$
and thus the second equality follows from the first and Theorem 9.1. □
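Remark 9.1. There are distributional epsilon-couplings
such that
$$\bigcap_{\varepsilon>0} \{T_\varepsilon < \infty\} \quad\text{and}\quad \bigcap_{\varepsilon>0} \{T'_\varepsilon < \infty\}$$
are maximal distributional $\mathcal{S}$-coupling events of $Z$ and $Z'$. In fact, we may
take the epsilon-couplings such that
the events $C = \{T_\varepsilon < \infty\}$ and $C' = \{T'_\varepsilon < \infty\}$ do not depend on $\varepsilon$;
on $C^c$ the processes $\hat Z^{(\varepsilon)}$, $\varepsilon > 0$, are identical;
on $C'^c$ the processes $\hat Z'^{(\varepsilon)}$, $\varepsilon > 0$, are identical.
In order to establish this let $(\hat Z, \hat Z')$ be a coupling of $Z$ and $Z'$ with maximal
distributional $\mathcal{S}$-coupling events $C$ and $C'$. Then
$$\|P(\hat Z \in \cdot\,|\,C)|_{\mathcal{S}} - P(\hat Z' \in \cdot\,|\,C')|_{\mathcal{S}}\| = 0,$$
and thus according to Theorem 9.2, there is for each $\varepsilon > 0$ a successful
distributional $\varepsilon$-coupling of the processes with distributions $P(\hat Z \in \cdot\,|\,C)$
and $P(\hat Z' \in \cdot\,|\,C')$. We may let this $\varepsilon$-coupling be independent of $C$ and $C'$.
On $C$ take $(\hat Z^{(\varepsilon)},T_\varepsilon,R_\varepsilon)$ from this $\varepsilon$-coupling and on $C^c$ let $(\hat Z^{(\varepsilon)},T_\varepsilon,R_\varepsilon)$
be $(\hat Z,\infty,\infty)$. Similarly, on $C'$ take $(\hat Z'^{(\varepsilon)},T'_\varepsilon,R'_\varepsilon)$ from this $\varepsilon$-coupling and
on $C'^c$ let $(\hat Z'^{(\varepsilon)},T'_\varepsilon,R'_\varepsilon)$ be $(\hat Z',\infty,\infty)$.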
9.4 A Smooth Total Variation Limit Result
The following limit result is quite different from Theorem 9.3 in Chapter 4
and Theorem 5.3 in this chapter.
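Theorem 9.3. Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable
stochastic processes with general state space $(E,\mathcal{E})$ and path space $(H,\mathcal{H})$.
Then as $h \downarrow 0$,
$$\|P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}\| \to \|P(Z \in \cdot)|_{\mathcal{S}} - P(Z' \in \cdot)|_{\mathcal{S}}\|,$$
where $U$ is uniform on $[0,1]$ and independent of $Z$ and $Z'$.
Proof. The distributional epsilon-couplings in Theorem 8.1 satisfy
$$\|P(\theta_{U\varepsilon}Z \in \cdot)|_{\mathcal{T}} - P(\theta_{U\varepsilon}Z' \in \cdot)|_{\mathcal{T}}\| = 2P(T_\varepsilon = \infty), \quad \varepsilon > 0,$$
and a reference to the limit result in Theorem 9.2 completes the proof. □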
9.5 Equivalences
We can now tie together epsilon-couplings, smooth total variation
convergence, and the smooth tail $\sigma$-algebra as follows.
Theorem 9.4. Let $Z$ and $Z'$ be one-sided continuous-time shift-measurable
stochastic processes with general state space $(E,\mathcal{E})$ and path space $(H,\mathcal{H})$.
Let $U$ be uniform on $[0,1]$ and independent of $Z$ and $Z'$. The following
statements are equivalent.
(a) There exist successful distributional epsilon-couplings of $Z$ and $Z'$.
(b) For each $h > 0$, $\|P(\theta_{t+Uh}Z \in \cdot) - P(\theta_{t+Uh}Z' \in \cdot)\| \to 0$ as $t \to \infty$.
(c) $P(Z \in \cdot)|_{\mathcal{S}} = P(Z' \in \cdot)|_{\mathcal{S}}$.
These statements are also equivalent to each of the following claims.
(a') For each $\varepsilon > 0$, there is a successful distributional $\varepsilon$-coupling of $Z$, $Z'$.
(c') For each $h > 0$, $P(\theta_{Uh}Z \in \cdot)|_{\mathcal{T}} = P(\theta_{Uh}Z' \in \cdot)|_{\mathcal{T}}$.
Finally, these statements are equivalent to the existence of a successful
nondistributional $\varepsilon$-coupling of $Z$ and $Z'$ for each $\varepsilon > 0$, if there exists a
weak-sense-regular conditional distribution of $Z$ given $\theta_T Z$ for each random
time $T$ [this holds when $(E, \mathcal{E})$ is Polish and the paths are right-continuous].
Proof. By the $\varepsilon$-coupling inequality, (a) implies (b); see (7.4). By
Theorem 9.4 in Chapter 4, (b) implies (c'). By Theorem 9.3, (c') implies (c). By
Theorem 9.2, (c) implies (a'). Certainly (a') implies (a). The final claim
of the theorem follows from Theorems 6.1 and 6.2. □
Corollary 9.1. Suppose the equivalent statements in Theorem 9.4 hold.
Then, for each $h > 0$, the underlying probability space can be extended to
support finite times $T_h$ and $T'_h$ such that $(\theta_{Uh}Z,\theta_{Uh}Z',T_h,T'_h)$ is a
successful distributional exact coupling of $\theta_{Uh}Z$ and $\theta_{Uh}Z'$. Moreover, if there
exists a weak-sense-regular version of the conditional distribution of $Z$ given
$\theta_T Z$ for each random time $T$, then for each $h > 0$, the underlying probability
space can be further extended to support a copy $(Z^{(h)},U_h)$ of $(Z',U)$ such
that $(\theta_{Uh}Z,\theta_{U_h h}Z^{(h)},T_h)$ is a nondistributional exact coupling of $\theta_{Uh}Z$ and
$\theta_{Uh}Z'$.
Proof. According to Theorem 9.4 in Chapter 4, the statement (c')
implies the existence of a successful exact coupling of $\theta_{Uh}Z$ and $\theta_{Uh}Z'$. This
together with Theorem 3.1 in Chapter 4 yields the distributional claim.
Establish the nondistributional claim by applying the transfer extension
of Section 2.12 in Chapter 4 as follows. Take $Y_1 := (\theta_{T_h}\theta_{Uh}Z,T_h)$ and
$(Y_1',Y_2') := ((\theta_{T'_h}\theta_{Uh}Z',T'_h), (\kappa_{T'_h+Uh}Z',U))$ to obtain a $Y_2$ such that
$$((\theta_{T_h}\theta_{Uh}Z,T_h),Y_2) \overset{D}{=} ((\theta_{T'_h}\theta_{Uh}Z',T'_h),(\kappa_{T'_h+Uh}Z',U)). \tag{9.4}$$
Define $(Z^{(h)},U_h)$ by
$$(\kappa_{T_h+U_h h}Z^{(h)},U_h) := Y_2 \quad\text{and}\quad \theta_{T_h}\theta_{U_h h}Z^{(h)} := \theta_{T_h}\theta_{Uh}Z.$$
Then, due to (9.4), $(Z^{(h)},T_h,U_h)$ is a copy of $(Z',T'_h,U)$. □
Chapter 6
MARKOV PROCESSES
1 Introduction
In this chapter we apply the three sets of coupling equivalences established
in the previous two chapters (Theorem 9.4 in Chapter 4 and Theorems 5.4
and 9.4 in Chapter 5) to Markov processes. To each set of equivalences we
add four more equivalent statements: on triviality, on mixing, on
convergence in the state space, and on the constancy of harmonic functions.
In Section 2 we start by applying the equivalences to a single process
(not necessarily Markovian) adding the triviality and mixing aspects.
Markov processes enter in Section 3, which contains preliminaries. Then
each set of equivalences gets one section (Sections 4, 5, and 6) adding the
two remaining aspects, on convergence in the state space and on harmonic
functions. Section 7 concludes the chapter by considering the implication
of these results in the case when there exists a stationary measure for the
Markov process.
As in the previous two chapters we denote the time parameter by s and
t in accordance with continuous time, but all we need to switch to discrete
time is to restrict s and t to be integer and replace integration (over time)
by summation.
2 Mixing and Triviality of a Stochastic Process
In this non-Markovian section we consider a single one-sided discrete- or
continuous-time stochastic process $Z$ with a general state space $(E, \mathcal{E})$ and
some path space $(H,\mathcal{H})$. We shall apply the three sets of coupling
equivalences to the two processes obtained by conditioning $Z$ on being in two
arbitrary sets of paths $A$ and $B \in \mathcal{H}$. To each set of equivalences we add
a triviality aspect and a mixing aspect: a sub-$\sigma$-algebra $\mathcal{A}$ of $\mathcal{H}$ is trivial
with respect to $Z$, and $Z$ is $\mathcal{A}$-trivial, if
$$P(Z \in A) = 0 \text{ or } 1, \quad A \in \mathcal{A},$$
while mixing properties have to do with asymptotic independence of events
happening early and events happening late in the process.
2.1 Exact Coupling: $\mathcal{T}$-Triviality ⇔ Mixing ⇔ ···
The word 'mixing' is used to indicate some sort of independence between
what happens in a process early on and in the far future. We shall use the
following definition. The process $Z$ is mixing if as $t \to \infty$,
$$\sup_{A \in \mathcal{H}} |P(\theta_t Z \in A, Z \in B) - P(\theta_t Z \in A)P(Z \in B)| \to 0, \tag{2.1}$$
for each $B \in \mathcal{H}$. Equivalently, $Z$ is mixing if and only if (2.1) holds for all
$B$ of the finite-dimensional form
$$B = \{z \in H : z_{t_1} \in A_1,\dots,z_{t_n} \in A_n\}, \tag{2.2}$$
where $n \ge 1$ and $0 \le t_1 < \cdots < t_n$ and $A_1,\dots,A_n \in \mathcal{E}$. This equivalence
follows from Lemma 2.3(b) below [take $Y_t = \theta_t Z$].
Theorem 2.1. Let $Z$ be a one-sided discrete- or continuous-time stochastic
process with a general state space $(E, \mathcal{E})$ and a general path space $(H, \mathcal{H})$.
Then the following statements are equivalent.
(a) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$, there exists a successful
distributional exact coupling of $Z$ and the process with distribution
$P(Z \in \cdot\,|\,Z \in B)$.
(b) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z \in \cdot\,|\,Z \in B)\| \to 0, \quad t \to \infty.$$
(c) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$P(Z \in \cdot)|_{\mathcal{T}} = P(Z \in \cdot\,|\,Z \in B)|_{\mathcal{T}}.$$
(d) The process $Z$ is $\mathcal{T}$-trivial.
(e) The process $Z$ is mixing.
Moreover, in (a) we may replace distributional by nondistributional if there
exists a weak-sense-regular conditional distribution of $Z$ given $\theta_T Z$ for any
random time $T$ [this holds in discrete time when $(E, \mathcal{E})$ is Polish and in
continuous time when $(E,\mathcal{E})$ is Polish and the paths are right-continuous].
Finally, in each of (a), (b), and (c) we may restrict $B \in \mathcal{H}$ to be
finite-dimensional as at (2.2).
Proof. The equivalence of (a), (b), and (c) follows from Theorem 9.4
in Chapter 4 [take $Z'$ with distribution $P(Z \in \cdot\,|\,Z \in B)$], and so does
the nondistributional claim. The equivalence of (d) and (c) follows from
Lemma 2.1 below [take $\mathcal{A} = \mathcal{T}$]. The equivalence of (e) and (b) follows from
Lemma 2.2 below [take $Y_t = \theta_t Z$]. The equivalence of (a), (b), and (c) with
$B$ restricted to be finite-dimensional follows from Theorem 9.4 in Chapter 4
[take $Z'$ with distribution $P(Z \in \cdot\,|\,Z \in B)$]. The equivalence of (c) and (c)
with $B$ restricted to be finite-dimensional follows from Lemma 2.3(a) below
[take $\mathcal{A} = \mathcal{T}$]. □
Lemma 2.1. Let $\mathcal{A}$ be a sub-$\sigma$-algebra of $\mathcal{H}$. The process $Z$ is $\mathcal{A}$-trivial if
and only if
$$P(Z \in \cdot\,|\,Z \in B)|_{\mathcal{A}} = P(Z \in \cdot)|_{\mathcal{A}} \tag{2.3}$$
for all $B \in \mathcal{H}$ such that $P(Z \in B) > 0$.
Proof. To establish the 'only-if' part take $A \in \mathcal{A}$ and $B \in \mathcal{H}$ such that
$P(Z \in B) > 0$ and note that $\mathcal{A}$-triviality implies that $P(Z \in A\,|\,Z \in B) = 0$
or 1 according as $P(Z \in A) = 0$ or 1. To establish the 'if' part take $B \in \mathcal{A}$
such that $P(Z \in B) > 0$ and note that the distributional identity on $\mathcal{A}$
implies that $P(Z \in B\,|\,Z \in B) = P(Z \in B)$, that is, $P(Z \in B) = 1$. □
Lemma 2.2. Let $Y_t$, $0 \le t < \infty$, be random elements in some measurable
space $(G,\mathcal{G})$. Then, for all $B \in \mathcal{H}$,
$$\sup_{A \in \mathcal{G}} |P(Y_t \in A, Z \in B) - P(Y_t \in A)P(Z \in B)| \to 0, \quad t \to \infty, \tag{2.4}$$
if and only if for all $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$\|P(Y_t \in \cdot) - P(Y_t \in \cdot\,|\,Z \in B)\| \to 0, \quad t \to \infty.$$
Proof. The first limit claim trivially holds if $P(Z \in B) = 0$, while if
$P(Z \in B) > 0$, then
$$\sup_{A \in \mathcal{G}} |P(Y_t \in A, Z \in B) - P(Y_t \in A)P(Z \in B)| = 2^{-1}P(Z \in B)\,\|P(Y_t \in \cdot) - P(Y_t \in \cdot\,|\,Z \in B)\|,$$
and thus the two limit claims are equivalent.
□
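The identity in the proof of Lemma 2.2 can be verified by brute force on a finite space. The following sketch uses a made-up joint law of $Y$ and the indicator of $\{Z \in B\}$ (all numbers are hypothetical), and it assumes, as the factor $2^{-1}$ indicates, that $\|\cdot\|$ denotes the total mass of the signed measure, so that on a finite space it is a sum of absolute differences.

```python
from itertools import combinations

# Hypothetical joint law of (Y, indicator of {Z in B}); Y takes values 0..3.
joint = {
    (0, True): 0.10, (1, True): 0.05, (2, True): 0.20, (3, True): 0.05,
    (0, False): 0.15, (1, False): 0.25, (2, False): 0.05, (3, False): 0.15,
}
ys = sorted({y for y, _ in joint})
p_B = sum(p for (_, in_b), p in joint.items() if in_b)
p_Y = {y: joint[(y, True)] + joint[(y, False)] for y in ys}


def dependence(A):
    """|P(Y in A, Z in B) - P(Y in A) P(Z in B)| for a set A of y-values."""
    return abs(sum(joint[(y, True)] for y in A) - sum(p_Y[y] for y in A) * p_B)


# Left-hand side: supremum over all subsets A of the value space of Y.
lhs = max(dependence(A) for r in range(len(ys) + 1) for A in combinations(ys, r))

# Right-hand side: half of P(Z in B) times the total-mass norm of the difference.
tv_norm = sum(abs(p_Y[y] - joint[(y, True)] / p_B) for y in ys)
rhs = 0.5 * p_B * tv_norm
print(lhs, rhs)
```

The two printed numbers should agree up to rounding; the supremum is attained at the set of $y$-values where the conditional probability exceeds the unconditional one.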
Lemma 2.3. (a) Let A be a sub-a-algebra ofH. If (2.3) holds for all B of
the finite-dimensional form (2.2) and such that P(Z 6 B) > 0, then (2.3)
holds for all B Eli such that P(Z e B) > 0.
(b) Let Yt, 0 ^ t < oo, be random elements in some measurable space
(G,Q). If (2.4) holds for all B of the finite-dimensional form (2.2), then
(2.4) holds for all B en.
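Proof. It is a basic fact of bounded measures [see Ash (1972),
Theorem 1.3.11] that if a $\sigma$-algebra is generated by an algebra, then each set
in the $\sigma$-algebra can be approximated in measure by a set in the algebra.
Thus for each $B \in \mathcal{H}$ and $\varepsilon > 0$ there is an $n \geq 1$ and $0 \leq t_1 < \cdots < t_n$
and $A_1, \dots, A_n \in \mathcal{E}$ such that with
$$B_\varepsilon = \{z \in H : z_{t_1} \in A_1, \dots, z_{t_n} \in A_n\}$$
we have
$$P(Z \in B, Z \notin B_\varepsilon) + P(Z \in B_\varepsilon, Z \notin B) < \varepsilon. \tag{2.5}$$
In order to establish (a), suppose (2.3) holds for finite-dimensional sets.
Fix $B \in \mathcal{H}$ such that $P(Z \in B) > 0$ and take $A \in \mathcal{A}$. Then
$$|P(Z \in A, Z \in B) - P(Z \in A)P(Z \in B)|$$
$$\leq |P(Z \in A, Z \in B) - P(Z \in A, Z \in B_\varepsilon)|$$
$$+ |P(Z \in A, Z \in B_\varepsilon) - P(Z \in A)P(Z \in B_\varepsilon)|$$
$$+ |P(Z \in A)P(Z \in B_\varepsilon) - P(Z \in A)P(Z \in B)|.$$
Since (2.3) holds for sets like $B_\varepsilon$, the middle term on the right is zero, and
due to (2.5), the first and last are less than $\varepsilon$. Since $\varepsilon > 0$ is arbitrary, this
means that
$$|P(Z \in A, Z \in B) - P(Z \in A)P(Z \in B)| = 0,$$
as desired.
In order to establish (b), suppose (2.4) holds for finite-dimensional sets. Fix
$B \in \mathcal{H}$ and $A \in \mathcal{G}$. Then
$$|P(Y_t \in A, Z \in B) - P(Y_t \in A)P(Z \in B)|$$
$$\leq |P(Y_t \in A, Z \in B) - P(Y_t \in A, Z \in B_\varepsilon)|$$
$$+ |P(Y_t \in A, Z \in B_\varepsilon) - P(Y_t \in A)P(Z \in B_\varepsilon)|$$
$$+ |P(Y_t \in A)P(Z \in B_\varepsilon) - P(Y_t \in A)P(Z \in B)|,$$
which together with (2.5) yields
$$\sup_{A \in \mathcal{G}} |P(Y_t \in A, Z \in B) - P(Y_t \in A)P(Z \in B)|$$
$$\leq \varepsilon + \sup_{A \in \mathcal{G}} |P(Y_t \in A, Z \in B_\varepsilon) - P(Y_t \in A)P(Z \in B_\varepsilon)| + \varepsilon.$$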
Since (2.4) holds for sets like $B_\varepsilon$, the middle term on the right tends to
zero as $t \to \infty$. Since $\varepsilon > 0$ is arbitrary, this yields (2.4). □
2.2 Shift-Coupling: $\mathcal{I}$-Triviality $\Leftrightarrow$ Cesaro Mixing $\Leftrightarrow \cdots$
Let $U$ be uniform on $[0,1]$ and independent of $Z$. In the continuous-time
case assume now that $Z$ is shift-measurable. Recall that in the discrete-time
case we extend the definition of the shift-maps to noninteger times by
$$\theta_t z = \theta_{[t]} z, \quad t \in [0, \infty).$$
The process $Z$ is Cesaro mixing if as $t \to \infty$,
$$\sup_{A \in \mathcal{H}} |P(\theta_{Ut} Z \in A, Z \in B) - P(\theta_{Ut} Z \in A)P(Z \in B)| \to 0 \tag{2.6}$$
for each $B \in \mathcal{H}$. Equivalently, $Z$ is Cesaro mixing if and only if (2.6) holds
for all $B$ of the finite-dimensional form (2.2). This equivalence follows from
Lemma 2.3(b) above [take $Y_t = \theta_{Ut} Z$].
Theorem 2.2. Let $Z$ be a one-sided discrete-time or continuous-time
shift-measurable stochastic process with a general state space $(E, \mathcal{E})$ and path
space $(H, \mathcal{H})$. Let $U$ be uniform on $[0,1]$ and independent of $Z$. Then the
following statements are equivalent.
(a) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$, there exists a successful
distributional shift-coupling of $Z$ and the process with distribution
$P(Z \in \cdot \mid Z \in B)$.
(b) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$\|P(\theta_{Ut} Z \in \cdot) - P(\theta_{Ut} Z \in \cdot \mid Z \in B)\| \to 0, \quad t \to \infty.$$
(c) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$P(Z \in \cdot)|_{\mathcal{I}} = P(Z \in \cdot \mid Z \in B)|_{\mathcal{I}}.$$
(d) The process $Z$ is $\mathcal{I}$-trivial.
(e) The process $Z$ is Cesaro mixing.
Moreover, in (a) we may replace distributional by nondistributional if there
exists a weak-sense-regular conditional distribution of $Z$ given $\theta_T Z$ for any
random time $T$ [this holds in discrete time when $(E, \mathcal{E})$ is Polish and in
continuous time when $(E, \mathcal{E})$ is Polish and the paths are right-continuous].
Finally, in each of (a), (b), and (c) we may restrict $B \in \mathcal{H}$ to be
finite-dimensional as at (2.2).
PROOF. The equivalence of (a), (b), and (c) follows from Theorem 5.4
in Chapter 5 [take $Z'$ with distribution $P(Z \in \cdot \mid Z \in B)$], and so does
the nondistributional claim. The equivalence of (d) and (c) follows from
Lemma 2.1 above [take $\mathcal{A} = \mathcal{I}$]. The equivalence of (e) and (b) follows from
Lemma 2.2 above [take $Y_t = \theta_{Ut} Z$]. The equivalence of (a), (b), and (c) with
$B$ restricted to be finite-dimensional follows from Theorem 5.4 in Chapter 5
[take $Z'$ with distribution $P(Z \in \cdot \mid Z \in B)$]. The equivalence of (c) and (c)
with $B$ restricted to be finite-dimensional follows from Lemma 2.3(a) above
[take $\mathcal{A} = \mathcal{I}$]. □
2.3 Epsilon Couplings: $\mathcal{S}$-Triviality $\Leftrightarrow$ Smooth Mixing $\Leftrightarrow \cdots$
Finally, assume that $Z$ is a continuous-time shift-measurable process and
let $U$ be uniform on $[0,1]$ and independent of $Z$.
The process $Z$ is smoothly mixing if as $t \to \infty$,
$$\sup_{A \in \mathcal{H}} |P(\theta_{t+Uh} Z \in A, Z \in B) - P(\theta_{t+Uh} Z \in A)P(Z \in B)| \to 0 \tag{2.7}$$
for each $B \in \mathcal{H}$ and $h > 0$. Equivalently, $Z$ is smoothly mixing if and only
if (2.7) holds for all $B$ of the finite-dimensional form (2.2). This equivalence
follows from Lemma 2.3(b) above [take $Y_t = \theta_{t+Uh} Z$].
Theorem 2.3. Let $Z$ be a one-sided continuous-time shift-measurable
stochastic process with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$.
Let $U$ be uniform on $[0,1]$ and independent of $Z$. Then the following
statements are equivalent.
(a) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$, there exist successful
distributional epsilon couplings of $Z$ and the process with distribution
$P(Z \in \cdot \mid Z \in B)$.
(b) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$ and each $h > 0$,
$$\|P(\theta_{t+Uh} Z \in \cdot) - P(\theta_{t+Uh} Z \in \cdot \mid Z \in B)\| \to 0, \quad t \to \infty.$$
(c) For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$,
$$P(Z \in \cdot)|_{\mathcal{S}} = P(Z \in \cdot \mid Z \in B)|_{\mathcal{S}}.$$
(d) The process $Z$ is $\mathcal{S}$-trivial.
(e) The process $Z$ is smoothly mixing.
These statements are also equivalent to each of the following claims.
(a') For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$ and all $\varepsilon > 0$, there
exists a successful distributional $\varepsilon$-coupling of $Z$ and the process with
distribution $P(Z \in \cdot \mid Z \in B)$.
(c') For each $B \in \mathcal{H}$ such that $P(Z \in B) > 0$ and each $h > 0$,
$$P(\theta_{Uh} Z \in \cdot)|_{\mathcal{T}} = P(\theta_{Uh} Z \in \cdot \mid Z \in B)|_{\mathcal{T}}.$$
Moreover, in (a') we may replace distributional by nondistributional if there
exists a weak-sense-regular conditional distribution of $Z$ given $\theta_T Z$ for any
random time $T$ [this holds when $(E, \mathcal{E})$ is Polish and the paths are
right-continuous]. Finally, in each of (a), (b), (c), (a'), and (c') we may restrict
$B \in \mathcal{H}$ to be finite-dimensional as at (2.2).
PROOF. The equivalence of (a), (b), (c), (a'), and (c') follows from
Theorem 9.4 in Chapter 5 [take $Z'$ with distribution $P(Z \in \cdot \mid Z \in B)$], and
so does the nondistributional claim. The equivalence of (d) and (c) follows
from Lemma 2.1 above [take $\mathcal{A} = \mathcal{S}$]. The equivalence of (e) and (b) follows
from Lemma 2.2 above [take $Y_t = \theta_{t+Uh} Z$]. The equivalence of (a), (b),
(c), (a'), and (c') with $B$ restricted to be finite-dimensional follows from
Theorem 9.4 in Chapter 5 [take $Z'$ with distribution $P(Z \in \cdot \mid Z \in B)$]. The
equivalence of (c) and (c) with $B$ restricted to be finite-dimensional follows
from Lemma 2.3(a) above [take $\mathcal{A} = \mathcal{S}$]. □
3 Markov Processes - Preliminaries
In this section we recall some bare-bone basics for Markov processes and
then reduce total variation convergence in the path space to total variation
convergence in the state space.
3.1 Basics
A discrete- or continuous-time stochastic process $Z$ with a general state
space $(E, \mathcal{E})$ and a general path space $(H, \mathcal{H})$ is a Markov process if the
future depends on the past only through the present, that is, if for all $t \geq 0$,
$$\theta_t Z \text{ is conditionally independent of } \kappa_t Z \text{ given } Z_t.$$
A Markov process $Z$ is time-homogeneous if the conditional distribution of
$\theta_t Z$ given the value of $Z_t$ does not depend on $t$.
If $Z$ is a time-homogeneous Markov process and there is, for $0 \leq s, t < \infty$,
a regular version $P^s$ of the conditional distribution of $Z_{t+s}$ given the value
of $Z_t$,
$$P(Z_{t+s} \in A \mid Z_t = x) = P^s(x, A), \quad x \in E, \ A \in \mathcal{E},$$
then the family of probability kernels $P^t$, $0 \leq t < \infty$, is called the semigroup
of transition probabilities, semigroup because $P^{s+t} = P^s P^t$, that is,
$$P^{s+t}(x, A) = \int P^t(y, A)\, P^s(x, dy), \quad 0 \leq s, t < \infty, \ x \in E, \ A \in \mathcal{E}.$$
In the discrete-time case $P^n$ is simply the $n$th power of the one-step
transition probabilities $P = P^1$.
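For a finite state space the semigroup property is just associativity of matrix multiplication, which the following sketch checks directly. The three-state kernel `P` and the times `s`, `t` are hypothetical choices made for illustration only.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, n):
    """n-th power of the one-step kernel p (n = 0 gives the identity)."""
    size = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result

# Hypothetical one-step transition probabilities on the state space {0, 1, 2}.
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

s, t = 2, 3
lhs = mat_pow(P, s + t)                       # P^{s+t}
rhs = mat_mul(mat_pow(P, s), mat_pow(P, t))   # P^s P^t
max_gap = max(abs(lhs[i][j] - rhs[i][j]) for i in range(3) for j in range(3))
row_sums = [sum(row) for row in lhs]          # each row of P^{s+t} is a probability vector
```

Here `max_gap` is zero up to rounding, and every entry of `row_sums` equals one, so $P^{s+t}$ is again a probability kernel.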
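The distribution of a Markov process $Z$ with transition semigroup $P^t$,
$0 \leq t < \infty$, is determined by the semigroup and the initial distribution
$\lambda = P(Z_0 \in \cdot)$: the finite-dimensional distributions are
$$P(Z_{t_0} \in A_0, \dots, Z_{t_k} \in A_k) = \int_{A_0} \cdots \int_{A_k} P^{t_k - t_{k-1}}(x_{k-1}, dx_k) \cdots P^{t_1}(x_0, dx_1)\, \lambda(dx_0),$$
$$0 = t_0 < \cdots < t_k, \quad A_0, \dots, A_k \in \mathcal{E}, \quad 1 \leq k < \infty.$$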
Conversely, the following holds. In discrete time, for each probability kernel
$P$ and each probability measure $\lambda$ on $(E, \mathcal{E})$ there exists a time-homogeneous
Markov process $Z$ with $P$ as one-step transition probabilities and $\lambda$ as
initial distribution (Ionescu Tulcea theorem, Fact 4.3 in Chapter 3). And in
continuous time, if $(E, \mathcal{E})$ is Polish and $H = E^{[0,\infty)}$, then for each
semigroup of probability kernels $P^t$, $0 \leq t < \infty$, and each probability measure
$\lambda$ on $(E, \mathcal{E})$ there exists a time-homogeneous Markov process $Z$ with $P^t$,
$0 \leq t < \infty$, as transition probabilities and $\lambda$ as initial distribution
(Kolmogorov extension theorem, Fact 3.2 in Chapter 3). This need not be the
case when $H \neq E^{[0,\infty)}$.
Another Markov process is a version or a differently started version of
$Z$ if it has the same semigroup of transition probabilities. We shall denote
by $Z^\lambda$ a version with initial distribution $\lambda$. Thus $\lambda P^t$ is the distribution of $Z^\lambda_t$:
$$P(Z^\lambda_t \in A) = \lambda P^t(A) := \int P^t(x, A)\, \lambda(dx), \quad A \in \mathcal{E}.$$
When $Z_0 = x$ we write $Z^x$.
3.2 Total Variation Reduced from Path Space to State Space
For Markov processes total variation convergence in the path space reduces
to total variation convergence in the state space, since
$$\|P(\theta_t Z^\lambda \in \cdot) - P(\theta_t Z^{\lambda'} \in \cdot)\| = \|\lambda P^t - \lambda' P^t\| \tag{3.1}$$
due to the following lemma [take $Y = \theta_t Z^\lambda$, $Y' = \theta_t Z^{\lambda'}$, $g(Y) = Z^\lambda_t$, and
$g(Y') = Z^{\lambda'}_t$].
Lemma 3.1. Let $Y$ and $Y'$ be random elements in a measurable space
$(E, \mathcal{E})$ and let $g$ be a measurable mapping from $(E, \mathcal{E})$ to a measurable space
$(G, \mathcal{G})$. Then
$$\|P(g(Y) \in \cdot) - P(g(Y') \in \cdot)\| \leq \|P(Y \in \cdot) - P(Y' \in \cdot)\|. \tag{3.2}$$
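Moreover, if the conditional distribution of $Y$ given the value of $g(Y)$ is the
same as that of $Y'$ given the value of $g(Y')$, then
$$\|P(g(Y) \in \cdot) - P(g(Y') \in \cdot)\| = \|P(Y \in \cdot) - P(Y' \in \cdot)\|. \tag{3.3}$$
Proof. For $B \in \mathcal{G}$ we have $g^{-1}B \in \mathcal{E}$. Thus
$$|P(g(Y) \in B) - P(g(Y') \in B)| \leq \sup_{A \in \mathcal{E}} |P(Y \in A) - P(Y' \in A)|.$$
Take the supremum in $B \in \mathcal{G}$ and multiply by 2 to obtain (3.2).
In order to establish (3.3), note that by assumption there is, for each
$A \in \mathcal{E}$, a function $Q(\cdot, A)$ from $(G, \mathcal{G})$ to $([0,1], \mathcal{B}[0,1])$ such that
$$P(Y \in A) - P(Y' \in A) = E[Q(g(Y), A)] - E[Q(g(Y'), A)].$$
Thus
$$P(Y \in A) - P(Y' \in A) \leq \sup_{f} \big(E[f(g(Y))] - E[f(g(Y'))]\big),$$
where the supremum is over measurable functions $f$ from $(G, \mathcal{G})$ to $([0,1], \mathcal{B}[0,1])$;
this supremum equals half the left-hand side of (3.2).
Take the supremum in $A \in \mathcal{E}$ and multiply by 2 to obtain the reverse of
(3.2), that is, (3.3) holds. □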
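Both parts of Lemma 3.1 can be seen on a four-point space. In the sketch below the laws, the map `g`, and the target image law are hypothetical choices; `law_y_matched` is constructed so that its conditional distribution given the value of `g` agrees with that of `law_y`, which is the hypothesis of (3.3).

```python
def tv_norm(p, q):
    """Total variation norm of p - q for laws on the same finite set (dicts)."""
    return sum(abs(p[k] - q[k]) for k in p)

def push_forward(p, g):
    """Law of g(Y) when Y has law p."""
    image = {}
    for x, mass in p.items():
        image[g(x)] = image.get(g(x), 0.0) + mass
    return image

def g(x):
    """Hypothetical measurable map from {0,1,2,3} onto {0,1}."""
    return x // 2

law_y = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}
law_y_prime = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}

# Inequality (3.2): mapping through g cannot increase the distance.
dist_full = tv_norm(law_y, law_y_prime)
dist_image = tv_norm(push_forward(law_y, g), push_forward(law_y_prime, g))

# Equality (3.3): same conditional law given g, different law of g itself.
image_y = push_forward(law_y, g)
target_image = {0: 0.5, 1: 0.5}
law_y_matched = {x: target_image[g(x)] * law_y[x] / image_y[g(x)] for x in law_y}
dist_full_matched = tv_norm(law_y, law_y_matched)
dist_image_matched = tv_norm(image_y, push_forward(law_y_matched, g))
```

In the first pair the image distance is strictly smaller than the full distance; in the second pair the two distances coincide.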
3.3 Cesaro and Smooth Total Variation Can Also Be Reduced
In order to reduce Cesaro and smooth total variation convergence from the
path space to the state space we need to be able to replace t in (3.1) by a
random variable that is independent of the processes.
The semigroup of transition probabilities is jointly measurable if for
each $A \in \mathcal{E}$, the mapping taking $(x, t)$ to $P^t(x, A)$ is
$\mathcal{E} \otimes \mathcal{B}[0,\infty)/\mathcal{B}[0,1]$-measurable.
Lemma 3.2. Let $P^t$, $0 \leq t < \infty$, be a discrete-time or continuous-time
jointly measurable semigroup of probability kernels on a measurable space
$(E, \mathcal{E})$ and suppose that for each probability measure $\lambda$ on $(E, \mathcal{E})$ there is
a Markov process $Z^\lambda$ with state space $(E, \mathcal{E})$, transition probabilities $P^t$,
$0 \leq t < \infty$, and initial distribution $\lambda$. Suppose further in the continuous-time
case that these processes $Z^\lambda$ are shift-measurable with a common path
space $(H, \mathcal{H})$. Let $V$ be a nonnegative random variable that is independent of
the processes $Z^\lambda$ and denote its distribution by $F$. Then $\theta_V Z^\lambda$ is a version
of $Z^\lambda$ with initial distribution
$$P(Z^\lambda_V \in A) = \int \lambda P^t(A)\, F(dt), \quad A \in \mathcal{E},$$
and for all initial distributions $\lambda$ and $\lambda'$,
$$\|P(\theta_V Z^\lambda \in \cdot) - P(\theta_V Z^{\lambda'} \in \cdot)\| = \Big\|\int \lambda P^t\, F(dt) - \int \lambda' P^t\, F(dt)\Big\|. \tag{3.4}$$
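Proof. Due to Fact 3.1 below [take $Y = V$, $Y' = Z^\lambda$, and $g(V, Z^\lambda) =
1_A(\theta_V Z^\lambda)$],
$$P(\theta_V Z^\lambda \in A \mid V = t) = P(\theta_t Z^\lambda \in A \mid V = t), \quad 0 \leq t < \infty, \ A \in \mathcal{H}.$$
Due to the independence of $V$ and $Z^\lambda$ this yields
$$P(\theta_V Z^\lambda \in A \mid V = t) = P(\theta_t Z^\lambda \in A), \quad 0 \leq t < \infty, \ A \in \mathcal{H}.$$
Now $P(\theta_t Z^\lambda \in A) = \int P(Z^x \in A)\, \lambda P^t(dx)$, and thus
$$P(\theta_V Z^\lambda \in A) = \int\!\!\int P(Z^x \in A)\, \lambda P^t(dx)\, F(dt), \quad A \in \mathcal{H}.$$
This means that $\theta_V Z^\lambda$ is a version of $Z^\lambda$ with $\int \lambda P^t\, F(dt)$ as initial
distribution, and (3.4) follows from (3.1). □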
3.4 Two Useful Facts
The following fact formalizes the intuitively reasonable idea that when we
condition on the value of a random element Y = y, then we may replace
random elements of the form $g(Y, Y')$ by $g(y, Y')$.
Fact 3.1. Let $Y$ and $Y'$ be random elements in the measurable spaces
$(E, \mathcal{E})$ and $(E', \mathcal{E}')$ and suppose there is a regular conditional
distribution $Q(\cdot, \cdot)$ of $Y'$ given the value of $Y$. If $g$ is a measurable mapping from
$(E, \mathcal{E}) \otimes (E', \mathcal{E}')$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ such that $E[g(Y, Y')]$ exists, then
$$E[g(Y, Y') \mid Y = y] = \int g(y, y')\, Q(y, dy') = E[g(y, Y') \mid Y = y],$$
for $P(Y \in \cdot)$ a.e. $y \in E$.
For a proof, see Ash (1972), Problem 1 in Section 6.6 and the solution on
page 450.
Below we shall need the following fact, the earliest and simplest of the
martingale convergence theorems.
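Fact 3.2. Let $X$ be a bounded random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$ and $\mathcal{F}_n$, $1 \leq n < \infty$, an increasing sequence of
sub-$\sigma$-algebras of $\mathcal{F}$. Let $\mathcal{F}_\infty$ be generated by the $\mathcal{F}_n$. Then
$$E[X \mid \mathcal{F}_n] \to E[X \mid \mathcal{F}_\infty] \quad \text{a.s.}, \quad n \to \infty.$$
For a proof, see Ash (1972), Theorem 7.6.2.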
4 Exact Coupling
In this section we show that the exact coupling equivalences established so
far hold for the whole family of all differently started versions of a Markov
process. We also add two more equivalent statements, one on convergence in
the state space and one on the constancy of space-time harmonic functions.
4.1 Space-Time Harmonic Functions
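A measurable function $f$ from $(E, \mathcal{E}) \otimes ([0,\infty), \mathcal{B}([0,\infty)))$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$
is called space-time harmonic with respect to a semigroup of probability
kernels $P^t$, $0 \leq t < \infty$, if
$$f(x, s) = \int f(y, s + t)\, P^t(x, dy), \quad 0 \leq s, t < \infty, \ x \in E. \tag{4.1}$$
If $Z^x$, $x \in E$, is a family of differently started Markov processes with
transition probabilities $P^t$, $0 \leq t < \infty$, then (4.1) can be written as
$$f(x, s) = E[f(Z^x_t, s + t)], \quad 0 \leq s, t < \infty, \ x \in E.$$
For an example of a space-time harmonic function take $A \in \mathcal{T}$ and put
$$f_A(x, t) := P(Z^x \in \theta_t A), \quad 0 \leq t < \infty, \ x \in E. \tag{4.2}$$
Then $f_A$ is space-time harmonic, since $A \in \mathcal{T}$ implies that $\theta_s A \in \mathcal{T}$, which
implies that $\theta_t^{-1} \theta_t \theta_s A = \theta_s A$, which yields the first equality in
$$f_A(x, s) = P(\theta_t Z^x \in \theta_t \theta_s A) = E[P(\theta_t Z^x \in \theta_t \theta_s A \mid Z^x_t)] = E[f_A(Z^x_t, s + t)].$$
Note that $\{\theta_n Z^x \in \theta_n A\} = \{Z^x \in A\}$ for $A \in \mathcal{T}$. This yields the second
equality in
$$f_A(Z^x_n, n) = P(\theta_n Z^x \in \theta_n A \mid \kappa_n Z^x, Z^x_n) \quad \text{[Markov property]}$$
$$= P(Z^x \in A \mid \kappa_n Z^x, Z^x_n)$$
$$\to P(Z^x \in A \mid Z^x) \quad \text{a.s.}, \quad n \to \infty, \quad \text{[by Fact 3.2]}.$$
Since $P(Z^x \in A \mid Z^x) = 1_A(Z^x)$ a.s., we obtain
$$f_A(Z^x_n, n) \to 1_A(Z^x) \quad \text{a.s.}, \quad n \to \infty, \tag{4.3}$$
for $A \in \mathcal{T}$. This we shall use in the proof of the next theorem.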
4.2 The Equivalences
We are now ready for the exact coupling equivalences.
Theorem 4.1. Let $P^t$, $0 \leq t < \infty$, be a discrete- or continuous-time
semigroup of probability kernels on a measurable space $(E, \mathcal{E})$ and suppose
that for each probability measure $\lambda$ on $(E, \mathcal{E})$ there is a Markov process $Z^\lambda$
with state space $(E, \mathcal{E})$, transition probabilities $P^t$, $0 \leq t < \infty$, and initial
distribution $\lambda$. Let $(H, \mathcal{H})$ be some common path space. Then the following
statements are equivalent.
(a) For all initial distributions $\lambda$ and $\lambda'$, there exists a successful
distributional exact coupling of $Z^\lambda$ and $Z^{\lambda'}$.
(b) For all initial distributions $\lambda$ and $\lambda'$,
$$\|P(\theta_t Z^\lambda \in \cdot) - P(\theta_t Z^{\lambda'} \in \cdot)\| \to 0, \quad t \to \infty.$$
(c) For all initial distributions $\lambda$ and $\lambda'$,
$$P(Z^\lambda \in \cdot)|_{\mathcal{T}} = P(Z^{\lambda'} \in \cdot)|_{\mathcal{T}}.$$
(d) For each initial distribution $\lambda$, $Z^\lambda$ is $\mathcal{T}$-trivial.
(e) For each initial distribution $\lambda$, $Z^\lambda$ is mixing.
(f) For all initial distributions $\lambda$ and $\lambda'$,
$$\|\lambda P^t - \lambda' P^t\| \to 0, \quad t \to \infty.$$
(g) All bounded space-time harmonic functions are constant.
Moreover, these statements are equivalent to the existence of a successful
nondistributional exact coupling if there exists a weak-sense-regular
conditional distribution of $Z^\lambda$ given $\theta_T Z^\lambda$ for any random time $T$ [this holds in
discrete time when $(E, \mathcal{E})$ is Polish and in continuous time when $(E, \mathcal{E})$ is
Polish and the paths are right-continuous].
4.3 Proof of Theorem 4.1
The equivalence of (a), (b), and (c) follows from Theorem 9.4 in Chapter 4,
and so does the final claim. The equivalence of (d) and (e) follows from the
equivalence of (d) and (e) in Theorem 2.1. The equivalence of (f) and (b)
follows from (3.1). Thus the theorem is established if we can show that (d)
implies (c), that (f) implies (g), and that (g) implies (d).
Suppose (d) holds. Fix $\lambda$ and $\lambda'$ and put $\lambda'' = (\lambda + \lambda')/2$. Then, for
$A \in \mathcal{T}$,
$$(P(Z^\lambda \in A) + P(Z^{\lambda'} \in A))/2 = P(Z^{\lambda''} \in A) = 0 \text{ or } 1.$$
Thus $P(Z^\lambda \in A)$ and $P(Z^{\lambda'} \in A)$ are either both 0 or both 1. Thus (d)
implies (c).
Suppose (f) holds. Then, with $f$ a bounded space-time harmonic
function, $x, y \in E$, and $0 \leq s < \infty$,
$$|f(x, s) - f(y, s)| = \Big|\int f(z, s + t)\, P^t(x, dz) - \int f(z, s + t)\, P^t(y, dz)\Big|$$
$$\leq \sup|f|\, \|P^t(x, \cdot) - P^t(y, \cdot)\| \to 0, \quad t \to \infty.$$
Thus $f(x, t)$ does not depend on $x$. This together with (4.1) implies that
$f(x, t)$ does not depend on $t$ either. Thus (f) implies (g).
Suppose (g) holds. Then the function $f_A$ defined at (4.2) is constant.
Thus $f_A(Z^\lambda_n, n) = f_A(x, 0) = P(Z^\lambda \in A)$ for an arbitrary $\lambda$, and from (4.3)
we obtain that $P(Z^\lambda \in A) = 0$ or 1. Thus (g) implies (d), and the proof of
Theorem 4.1 is complete.
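The link between statements (f) and (g) can be illustrated on two-state chains. The sketch below uses two hypothetical kernels: a deterministic `flip` chain, which is periodic, and a `sticky` chain, which is aperiodic. For the flip chain the function `f(x, s)`, the indicator that `x + s` is even, is a bounded nonconstant space-time harmonic function, and correspondingly the distance between the two point-started laws never decreases; for the sticky chain the distance tends to zero.

```python
def step(dist, P):
    """One step of the chain: returns dist P for a row vector dist."""
    n = len(P)
    return [sum(dist[x] * P[x][y] for x in range(n)) for y in range(n)]

def tv_norm(p, q):
    """Total variation norm of p - q (sum of absolute differences)."""
    return sum(abs(a - b) for a, b in zip(p, q))

def tv_after(P, t):
    """||delta_0 P^t - delta_1 P^t|| for a two-state kernel P."""
    a, b = [1.0, 0.0], [0.0, 1.0]
    for _ in range(t):
        a, b = step(a, P), step(b, P)
    return tv_norm(a, b)

flip = [[0.0, 1.0], [1.0, 0.0]]      # hypothetical periodic kernel
sticky = [[0.9, 0.1], [0.2, 0.8]]    # hypothetical aperiodic kernel

def f(x, s):
    """Indicator that x + s is even: space-time harmonic for the flip chain."""
    return 1.0 if (x + s) % 2 == 0 else 0.0

# Largest violation of (4.1) with t = 1 over a range of (x, s); t = 1 suffices
# in discrete time because P^t is the t-th power of P.
harmonic_gap = max(
    abs(f(x, s) - sum(flip[x][y] * f(y, s + 1) for y in range(2)))
    for x in range(2) for s in range(10)
)
```

Here `harmonic_gap` is zero, `tv_after(flip, t)` equals 2 for every `t`, and `tv_after(sticky, t)` decays geometrically.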
5 Shift-Coupling
We now show that the shift-coupling equivalences established so far hold
for the whole family of all differently started versions of a Markov process.
We also add two more equivalent statements, one on Cesaro convergence
in the state space and one on the constancy of harmonic functions.
5.1 Harmonic Functions
A measurable function $f$ from $(E, \mathcal{E})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ is harmonic
with respect to a semigroup of probability kernels $P^t$, $0 \leq t < \infty$, if
$$f(x) = \int f(y)\, P^t(x, dy), \quad 0 \leq t < \infty, \ x \in E. \tag{5.1}$$
A harmonic function can be viewed as a space-time harmonic function that
is constant in the time parameter.
Let $Z^x$, $x \in E$, be a family of differently started Markov processes with
transition probabilities $P^t$, $0 \leq t < \infty$, shift-measurable in the continuous-time
case with a common path space $(H, \mathcal{H})$. Then (5.1) can be written
as
$$f(x) = E[f(Z^x_t)], \quad 0 \leq t < \infty, \ x \in E.$$
For an example of a harmonic function take $A \in \mathcal{I}$ and put
$$f_A(x) := P(Z^x \in A), \quad x \in E. \tag{5.2}$$
Then $f_A$ is harmonic, since
$$f_A(x) = P(\theta_t Z^x \in A) \quad [A \in \mathcal{I} \text{ means } \theta_t^{-1} A = A]$$
$$= E[P(\theta_t Z^x \in A \mid Z^x_t)]$$
$$= E[f_A(Z^x_t)].$$
From (4.3) we obtain
$$f_A(Z^x_n) \to 1_A(Z^x) \quad \text{a.s.}, \quad n \to \infty, \tag{5.3}$$
for $A \in \mathcal{I}$. This we shall use in the proof of the next theorem.
5.2 The Equivalences
We are now ready for the shift-coupling equivalences.
Theorem 5.1. Let $P^t$, $0 \leq t < \infty$, be a discrete-time or continuous-time
jointly measurable semigroup of probability kernels on a measurable space
$(E, \mathcal{E})$ and suppose that for each probability measure $\lambda$ on $(E, \mathcal{E})$ there is
a Markov process $Z^\lambda$ with state space $(E, \mathcal{E})$, transition probabilities $P^t$,
$0 \leq t < \infty$, and initial distribution $\lambda$. Suppose further in the continuous-time
case that these processes $Z^\lambda$ are shift-measurable with a common path
space $(H, \mathcal{H})$. Let $U$ be a random variable that is uniform on $[0,1]$ and
independent of the processes $Z^\lambda$. Then the following statements are equivalent.
(a) For all initial distributions $\lambda$ and $\lambda'$, there exists a successful
distributional shift-coupling of $Z^\lambda$ and $Z^{\lambda'}$.
(b) For all initial distributions $\lambda$ and $\lambda'$,
$$\|P(\theta_{Ut} Z^\lambda \in \cdot) - P(\theta_{Ut} Z^{\lambda'} \in \cdot)\| \to 0, \quad t \to \infty.$$
(c) For all initial distributions $\lambda$ and $\lambda'$,
$$P(Z^\lambda \in \cdot)|_{\mathcal{I}} = P(Z^{\lambda'} \in \cdot)|_{\mathcal{I}}.$$
(d) For each initial distribution $\lambda$, $Z^\lambda$ is $\mathcal{I}$-trivial.
(e) For each initial distribution $\lambda$, $Z^\lambda$ is Cesaro mixing.
(f) For all initial distributions $\lambda$ and $\lambda'$,
$$t^{-1} \Big\|\int_0^t \lambda P^s\, ds - \int_0^t \lambda' P^s\, ds\Big\| \to 0, \quad t \to \infty.$$
(g) All bounded harmonic functions are constant.
Moreover, these statements are equivalent to the existence of a successful
nondistributional shift-coupling if there exists a weak-sense-regular
conditional distribution of $Z^\lambda$ given $\theta_T Z^\lambda$ for any random time $T$ [this holds in
discrete time when $(E, \mathcal{E})$ is Polish and in continuous time when $(E, \mathcal{E})$ is
Polish and the paths are right-continuous].
5.3 Proof of Theorem 5.1
The equivalence of (a), (b), and (c) follows from Theorem 5.4 in Chapter 5,
and so does the final claim. The equivalence of (d) and (e) follows from the
equivalence of (d) and (e) in Theorem 2.2. The equivalence of (f) and (b)
follows from (3.4) with $V = Ut$ and $F$ the uniform distribution on $[0, t]$.
Thus the theorem is established if we can show that (d) implies (c), that
(f) implies (g), and that (g) implies (d).
Suppose (d) holds. Fix $\lambda$ and $\lambda'$ and put $\lambda'' = (\lambda + \lambda')/2$. Then, for
$A \in \mathcal{I}$,
$$(P(Z^\lambda \in A) + P(Z^{\lambda'} \in A))/2 = P(Z^{\lambda''} \in A) = 0 \text{ or } 1.$$
Thus $P(Z^\lambda \in A)$ and $P(Z^{\lambda'} \in A)$ are either both 0 or both 1. Thus (d)
implies (c).
Suppose (f) holds. Then, with $f$ any bounded harmonic function and
$x, y \in E$, we obtain by averaging over the time parameter in (5.1)
$$|f(x) - f(y)| = t^{-1} \Big|\int_0^t\!\!\int f(z)\, P^s(x, dz)\, ds - \int_0^t\!\!\int f(z)\, P^s(y, dz)\, ds\Big|$$
$$\leq \sup|f|\; t^{-1} \Big\|\int_0^t P^s(x, \cdot)\, ds - \int_0^t P^s(y, \cdot)\, ds\Big\| \to 0, \quad t \to \infty.$$
Thus $f(x)$ does not depend on $x$. Thus (f) implies (g).
Suppose (g) holds. Then the function $f_A$ defined at (5.2) is constant. Thus
$f_A(Z^\lambda_n) = P(Z^\lambda \in A)$ for an arbitrary $\lambda$, and from (5.3) we obtain that
$P(Z^\lambda \in A) = 0$ or 1. Thus (g) implies (d), and the proof of Theorem 5.1 is
complete.
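Statement (f) of Theorem 5.1 is weaker than statement (f) of Theorem 4.1, and a periodic chain shows the difference. The sketch below uses a hypothetical two-state deterministic flip chain in discrete time, where for integer $t$ the time average $t^{-1}\int_0^t \lambda P^s\,ds$ is the Cesaro mean $t^{-1}\sum_{k=0}^{t-1} \lambda P^k$ (recall $\theta_s = \theta_{[s]}$). The laws started in the two states stay at distance 2 forever, yet the distance between their Cesaro means is 0 for even $t$ and $2/t$ for odd $t$.

```python
def cesaro_average(dist, P, t):
    """(1/t) * sum_{k=0}^{t-1} dist P^k for an integer t >= 1."""
    n = len(P)
    total = [0.0] * n
    current = list(dist)
    for _ in range(t):
        total = [a + b for a, b in zip(total, current)]
        current = [sum(current[x] * P[x][y] for x in range(n)) for y in range(n)]
    return [a / t for a in total]

flip = [[0.0, 1.0], [1.0, 0.0]]   # hypothetical periodic kernel

def cesaro_gap(t):
    """Total variation norm between the Cesaro means started in states 0 and 1."""
    a = cesaro_average([1.0, 0.0], flip, t)
    b = cesaro_average([0.0, 1.0], flip, t)
    return sum(abs(u - v) for u, v in zip(a, b))
```

For instance, `cesaro_gap(1)` is 2, `cesaro_gap(100)` is 0, and `cesaro_gap(101)` is `2/101`.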
6 Epsilon-Coupling
In this section we show that the epsilon-coupling equivalences established so
far hold for the whole family of all differently started versions of a Markov
process. We also add two more equivalent statements, one on smooth
convergence in the state space and one on the constancy of certain space-time
harmonic functions which we shall call smooth.
6.1 Smooth Space-Time Harmonic Functions
We now restrict attention to the continuous-time case. Call a space-time
harmonic function $f$ smooth if for all $0 \leq t < \infty$ and $x \in E$,
$$\int f(y, t)\, P^s(x, dy) \to f(x, t), \quad s \downarrow 0. \tag{6.1}$$
If $P^t$, $0 \leq t < \infty$, is jointly measurable and $f$ is bounded space-time
harmonic, then the smoothed version $f^{(h)}$ defined, for $h > 0$, by
$$f^{(h)}(x, t) := h^{-1} \int_0^h\!\!\int f(y, t)\, P^u(x, dy)\, du, \quad 0 \leq t < \infty, \ x \in E, \tag{6.2}$$
is smooth space-time harmonic: smoothing (4.1) yields that $f^{(h)}$ is
space-time harmonic, and the smoothness follows from
$$\Big|\int f^{(h)}(y, t)\, P^s(x, dy) - f^{(h)}(x, t)\Big|$$
$$= h^{-1} \Big|\int_s^{s+h}\!\!\int f(z, t)\, P^u(x, dz)\, du - \int_0^h\!\!\int f(y, t)\, P^u(x, dy)\, du\Big|$$
$$= h^{-1} \Big|\int_h^{s+h}\!\!\int f(z, t)\, P^u(x, dz)\, du - \int_0^s\!\!\int f(y, t)\, P^u(x, dy)\, du\Big|$$
$$\leq 2 h^{-1} \sup|f|\, s \to 0, \quad s \downarrow 0.$$
Note that if $f$ is smooth space-time harmonic, then $f^{(h)} \to f$ pointwise as
$h \downarrow 0$.
If $Z^x$, $x \in E$, is a family of differently started Markov processes that are
shift-measurable and have a common path space $(H, \mathcal{H})$ and jointly
measurable transition probabilities $P^t$, $0 \leq t < \infty$, then (6.1) can be written
as
$$E[f(Z^x_s, t)] \to f(x, t), \quad s \downarrow 0,$$
and (6.2) as
$$f^{(h)}(x, t) = E[f(Z^x_{Uh}, t)],$$
where $U$ is uniform on $[0,1]$ and independent of $Z^x$.
6.2 The Equivalences
We are now ready for the epsilon-coupling equivalences.
Theorem 6.1. Let $P^t$, $0 \leq t < \infty$, be a continuous-time jointly
measurable semigroup of probability kernels on a measurable space $(E, \mathcal{E})$ and
suppose that for each probability measure $\lambda$ on $(E, \mathcal{E})$ there is a Markov
process $Z^\lambda$ with state space $(E, \mathcal{E})$, transition probabilities $P^t$, $0 \leq t < \infty$,
and initial distribution $\lambda$. Suppose further that these processes $Z^\lambda$ are
shift-measurable with a common path space $(H, \mathcal{H})$. Let $U$ be a random variable
that is uniform on $[0,1]$ and independent of the processes $Z^\lambda$. Then the
following statements are equivalent.
(a) For all initial distributions $\lambda$ and $\lambda'$, there exist successful
distributional epsilon-couplings of $Z^\lambda$ and $Z^{\lambda'}$.
(b) For all initial distributions $\lambda$ and $\lambda'$ and all $h > 0$,
$$\|P(\theta_{t+Uh} Z^\lambda \in \cdot) - P(\theta_{t+Uh} Z^{\lambda'} \in \cdot)\| \to 0, \quad t \to \infty.$$
(c) For all initial distributions $\lambda$ and $\lambda'$,
$$P(Z^\lambda \in \cdot)|_{\mathcal{S}} = P(Z^{\lambda'} \in \cdot)|_{\mathcal{S}}.$$
(d) For each initial distribution $\lambda$, $Z^\lambda$ is $\mathcal{S}$-trivial.
(e) For each initial distribution $\lambda$, $Z^\lambda$ is smoothly mixing.
(f) For all initial distributions $\lambda$ and $\lambda'$ and each $h > 0$,
$$\Big\|\int_0^h \lambda P^{t+s}\, ds - \int_0^h \lambda' P^{t+s}\, ds\Big\| \to 0, \quad t \to \infty.$$
(g) All bounded smooth space-time harmonic functions are constant.
These statements are also equivalent to each of the following claims.
(a') For all initial distributions $\lambda$ and $\lambda'$ and each $\varepsilon > 0$, there exists a
successful distributional $\varepsilon$-coupling of $Z^\lambda$ and $Z^{\lambda'}$.
(c') For all initial distributions $\lambda$ and $\lambda'$ and each $h > 0$,
$$P(\theta_{Uh} Z^\lambda \in \cdot)|_{\mathcal{T}} = P(\theta_{Uh} Z^{\lambda'} \in \cdot)|_{\mathcal{T}}.$$
Finally, these statements are equivalent to the existence of a successful
nondistributional $\varepsilon$-coupling of $Z^\lambda$ and $Z^{\lambda'}$ for all initial distributions $\lambda$
and $\lambda'$ and each $\varepsilon > 0$ if there exists a weak-sense-regular conditional
distribution of $Z^\lambda$ given $\theta_T Z^\lambda$ for any random time $T$ [this holds when $(E, \mathcal{E})$
is Polish and the paths are right-continuous].
6.3 Proof of Theorem 6.1
The equivalence of (a), (b), (c), (a'), and (c') follows from Theorem 9.4 in
Chapter 5, and so does the final claim. The equivalence of (d) and (e) follows
from the equivalence of (d) and (e) in Theorem 2.3. The equivalence of (f)
and (b) follows from (3.4) with $V = t + Uh$ and $F$ the uniform distribution
on $[t, t + h]$. Thus the theorem is established if we can show that (d) is
equivalent to (c), that (f) implies (g), and that (g) implies (c').
Suppose (d) holds. Fix $\lambda$ and $\lambda'$ and put $\lambda'' = (\lambda + \lambda')/2$. Then, for
$A \in \mathcal{S}$,
$$(P(Z^\lambda \in A) + P(Z^{\lambda'} \in A))/2 = P(Z^{\lambda''} \in A) = 0 \text{ or } 1.$$
Thus $P(Z^\lambda \in A)$ and $P(Z^{\lambda'} \in A)$ are either both 0 or both 1. Thus (d)
implies (c).
Suppose (c) holds. Then, for each time $t$ and initial distribution $\lambda$,
$$P(\theta_t Z^\lambda \in A) = P(\theta_t Z^\lambda \in A \mid \kappa_t Z^\lambda, Z^\lambda_t) \quad \text{a.s.}, \quad A \in \mathcal{S}, \tag{6.3}$$
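since $\theta_t Z^\lambda$ is a version of $Z^\lambda$ (Markov property and time-homogeneity)
and since, conditionally on its initial value and past, $\theta_t Z^\lambda$ is a differently
started version of itself (Markov property). With $A \in \mathcal{S}$, this yields the
second equality in
$$P(Z^\lambda \in A) = P(\theta_n Z^\lambda \in \theta_n A) \quad \text{[holds for any tail set } A]$$
$$= P(\theta_n Z^\lambda \in \theta_n A \mid \kappa_n Z^\lambda, Z^\lambda_n) \quad \text{a.s. [due to (6.3)]}$$
$$= P(Z^\lambda \in A \mid \kappa_n Z^\lambda, Z^\lambda_n) \quad \text{a.s. [holds for any tail set } A]$$
$$\to P(Z^\lambda \in A \mid Z^\lambda) \quad \text{a.s.}, \quad n \to \infty, \quad \text{[due to Fact 3.2]}.$$
Thus $P(Z^\lambda \in A) \to 1_A(Z^\lambda)$ a.s. as $n \to \infty$, that is, $P(Z^\lambda \in A) = 0$ or 1.
Thus (c) implies (d).
Suppose (f) holds. Then, with $f$ a bounded smooth space-time
harmonic function and $f^{(h)}$ the smoothed version defined at (6.2), we have, for
$x, y \in E$ and $0 \leq t < \infty$,
$$|f^{(h)}(x, t) - f^{(h)}(y, t)|$$
$$= \Big|\int f^{(h)}(z, t + s)\, P^s(x, dz) - \int f^{(h)}(z, t + s)\, P^s(y, dz)\Big| \quad \text{[by (4.1) applied to } f^{(h)}]$$
$$= h^{-1} \Big|\int f(z, t + s) \Big(\int_0^h P^{s+u}(x, dz)\, du\Big) - \int f(z, t + s) \Big(\int_0^h P^{s+u}(y, dz)\, du\Big)\Big| \quad \text{[by (6.2)]}$$
$$\leq h^{-1} \sup|f|\, \Big\|\int_0^h P^{u+s}(x, \cdot)\, du - \int_0^h P^{u+s}(y, \cdot)\, du\Big\|$$
$$\to 0, \quad s \to \infty.$$
Thus $f^{(h)}(x, t)$ does not depend on $x$. This together with (4.1) implies that
$f^{(h)}(x, t)$ does not depend on $t$ either. Since $f^{(h)} \to f$ pointwise as $h \downarrow 0$,
$f(x, t)$ depends on neither $x$ nor $t$. Thus (f) implies (g).
Suppose (g) holds. With $A \in \mathcal{T}$ and $f_A$ the bounded space-time harmonic
function defined at (4.2), we have that the smoothed version $f_A^{(h)}$ satisfies
$$f_A^{(h)}(x, t) = P(\theta_{Uh} Z^x \in \theta_t A), \quad h > 0, \ 0 \leq t < \infty, \ x \in E.$$
Since $f_A^{(h)}(x, t)$ is constant in $x$ (and $t$), we obtain that $P(\theta_{Uh} Z^x \in \cdot)|_{\mathcal{T}}$ does
not depend on the initial state $x$. Thus (g) implies (c'), and the proof of
Theorem 6.1 is complete.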
7 Stationary Measure
Consider a discrete- or continuous-time Markov process $Z$ with state space
$(E, \mathcal{E})$ and a semigroup of transition probabilities $P^t$, $0 \leq t < \infty$. A measure
$\pi$ on $(E, \mathcal{E})$ is a stationary measure for $Z$ if
$$\pi P^t = \pi, \quad 0 \leq t < \infty.$$
A stationary measure with mass one is a stationary distribution.
In Chapter 2 (Section 3) we established the existence of a unique
stationary distribution for irreducible aperiodic positive recurrent Markov chains
and the existence of a $\sigma$-finite stationary measure in the null recurrent case.
(See Chapter 10, Theorems 3.1 and 4.1 and Section 4.5, for stationarity in
the case of regenerative Markov processes.)
7.1 Stationary Distribution — Asymptotic Stationarity
If there exists a stationary distribution $\pi$, then the equivalent statements
in Theorems 4.1, 5.1, and 6.1, respectively, are clearly equivalent to each
of the following claims:
$$\forall \lambda: \quad \lambda P^t \xrightarrow{tv} \pi, \quad t \to \infty, \quad \text{[add to Theorem 4.1]}$$
$$\forall \lambda: \quad t^{-1} \int_0^t \lambda P^s\, ds \xrightarrow{tv} \pi, \quad t \to \infty, \quad \text{[add to Theorem 5.1]}$$
$$\forall \lambda, \forall h > 0: \quad h^{-1} \int_0^h \lambda P^{t+s}\, ds \xrightarrow{tv} \pi, \quad t \to \infty, \quad \text{[add to Theorem 6.1]}.$$
When there exists a stationary measure with finite mass, then dividing
by the mass yields the existence of a stationary distribution. Thus if the
equivalent statements in Theorem 5.1 hold, then a finite stationary measure
is unique up to multiplication by a constant.
7.2 Stationary Distribution — Piecewise Constant Paths
Taking a stationary distribution as initial distribution clearly yields (due to
the time-homogeneity) a stationary version of the Markov process. When
further the paths are piecewise constant with finitely many jumps in finite
intervals, then according to Theorem 7.2 in Chapter 5, the existence of
successful epsilon-couplings implies total variation convergence to stationarity
in the state space.
Thus, if there exists a stationary distribution and the paths are piecewise
constant with finitely many jumps in finite intervals, then the equivalent
statements of Theorem 6.1 imply the stronger statements in Theorem 4.1,
that is, the equivalent statements of Theorem 6.1 are equivalent to those
in Theorem 4.1.
In particular, the existence of successful epsilon-couplings implies the
existence of a successful exact coupling. This is surprising, but not too
surprising because the sojourn times in the states visited are exponential.
We leave open the question whether this holds without the assumption that
there exists a stationary distribution.
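7.3 A $\sigma$-Finite Stationary Measure — Uniform Nullity
Suppose there exists a $\sigma$-finite stationary measure $\pi$ with infinite mass,
that is,
$$\pi(E) = \infty,$$
and there is a sequence of sets $A_1, A_2, \dots \in \mathcal{E}$ such that
$$\pi(A_n) < \infty \quad \text{and} \quad A_n \uparrow E \text{ as } n \to \infty.$$
Let $\pi(\cdot \mid A_n)$ be the conditional stationary measure, that is, $\pi(\cdot \mid A_n)$ is the
probability measure on $(E, \mathcal{E})$ defined by
$$\pi(A \mid A_n) = \pi(A \cap A_n)/\pi(A_n), \quad A \in \mathcal{E}.$$
Clearly, $\pi(\cdot \mid A_n) \leq \pi/\pi(A_n)$ and thus
$$\pi(\cdot \mid A_n) P^t(A) \leq \pi P^t(A)/\pi(A_n) = \pi(A)/\pi(A_n).$$
Thus
$$\lambda = \pi(\cdot \mid A_n) \ \Rightarrow \ P(Z^\lambda_t \in \cdot) \leq \pi/\pi(A_n), \quad 0 \leq t < \infty. \tag{7.1}$$
This can be used to deduce the following result on 'uniform nullity'.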
Theorem 7.1. (i) Suppose the equivalent statements of Theorem 4.1
hold. If there exists a $\sigma$-finite stationary measure $\pi$ with infinite mass,
then for each initial distribution $\lambda$ and each $c < \infty$, as $t \to \infty$,
$$\lambda P^t(A) \to 0 \quad \text{uniformly in } A \in \mathcal{E} \text{ such that } \pi(A) \leq c, \tag{7.2}$$
or equivalently, for each initial distribution $\lambda$ and each $\varepsilon > 0$, as
$t \to \infty$,
$$\frac{\lambda P^t(A)}{\varepsilon + \pi(A)} \to 0 \quad \text{uniformly in } A \in \mathcal{E}. \tag{7.3}$$
(ii) Suppose the equivalent statements of Theorem 5.1 hold. If there
exists a $\sigma$-finite stationary measure $\pi$ with infinite mass, then for each
initial distribution $\lambda$ and for each $c < \infty$, as $t \to \infty$,
$$t^{-1} \int_0^t \lambda P^s(A)\, ds \to 0 \quad \text{uniformly in } A \in \mathcal{E} \text{ such that } \pi(A) \leq c,$$
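or equivalently, for each initial distribution $\lambda$ and each $\varepsilon > 0$, as
$t \to \infty$,
$$\frac{t^{-1} \int_0^t \lambda P^s(A)\, ds}{\varepsilon + \pi(A)} \to 0 \quad \text{uniformly in } A \in \mathcal{E}.$$
(iii) Suppose the equivalent statements of Theorem 6.1 hold. If there
exists a $\sigma$-finite stationary measure $\pi$ with infinite mass, then for each
initial distribution $\lambda$, each $h > 0$, and each $c < \infty$, as $t \to \infty$,
$$h^{-1} \int_0^h \lambda P^{t+s}(A)\, ds \to 0 \quad \text{uniformly in } A \in \mathcal{E} \text{ such that } \pi(A) \leq c,$$
or equivalently, for each initial distribution $\lambda$, each $h > 0$, and each
$\varepsilon > 0$, as $t \to \infty$,
$$\frac{h^{-1} \int_0^h \lambda P^{t+s}(A)\, ds}{\varepsilon + \pi(A)} \to 0 \quad \text{uniformly in } A \in \mathcal{E}.$$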
7.4 Proof of Theorem 7.1
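Let $Z$ have an arbitrary initial distribution $\lambda$ and let $Z'$ have the initial
distribution $\pi(\cdot \mid A_n)$. Since
$$P(Z_t \in A) \leq \|P(Z_t \in \cdot) - P(Z'_t \in \cdot)\| + P(Z'_t \in A)$$
and since [due to (7.1)]
$$P(Z'_t \in \cdot) \leq \pi/\pi(A_n), \tag{7.4}$$
we obtain
$$\sup_{A \in \mathcal{E}:\, \pi(A) \leq c} P(Z_t \in A) \leq \|P(Z_t \in \cdot) - P(Z'_t \in \cdot)\| + c/\pi(A_n).$$
Since one of the equivalent statements in Theorem 4.1 claims that
$$\|P(Z_t \in \cdot) - P(Z'_t \in \cdot)\| \to 0, \quad t \to \infty,$$
this yields
$$\limsup_{t \to \infty}\ \sup_{A \in \mathcal{E}:\, \pi(A) \leq c} P(Z_t \in A) \leq c/\pi(A_n).$$
Send $n \to \infty$ to obtain (7.2).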
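In order to see that (7.2) implies (7.3) suppose (7.2) holds. Then
$$\sup_{A \in \mathcal{E}} \frac{\lambda P^t(A)}{\varepsilon + \pi(A)} \leq \frac{\sup_{\pi(A) \leq c} \lambda P^t(A)}{\varepsilon} + \frac{1}{\varepsilon + c}$$
$$\to \frac{1}{\varepsilon + c}, \quad t \to \infty, \quad \text{[due to (7.2)]}$$
$$\to 0, \quad c \to \infty.$$
In order to establish the converse suppose (7.3) holds. Then
$$\sup_{A \in \mathcal{E}:\, \pi(A) \leq c} \lambda P^t(A) \leq (\varepsilon + c) \sup_{A \in \mathcal{E}} \frac{\lambda P^t(A)}{\varepsilon + \pi(A)} \to 0, \quad t \to \infty.$$
Thus (i) is established.
To establish (ii) and (iii), let $U$ be uniform on $[0,1]$ and independent of
$Z$ and $Z'$ and note that (7.1) yields both
$$P(Z'_{Ut} \in \cdot) \leq \pi/\pi(A_n), \quad 0 \leq t < \infty,$$
and
$$P(Z'_{t+Uh} \in \cdot) \leq \pi/\pi(A_n), \quad 0 \leq t < \infty.$$
In the above argument use these inequalities rather than (7.4) and replace
$P(Z_t \in A)$ by $P(Z_{Ut} \in A)$ and $P(Z_{t+Uh} \in A)$, respectively, to obtain (ii)
and (iii). Theorem 7.1 is established.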
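The uniform nullity in (7.2) can be watched numerically for a lazy simple random walk on the integers. This example is a sketch under assumptions that are not taken from the text: the step law (stay with probability 1/2, move left or right with probability 1/4 each) is a hypothetical choice, counting measure is used as the $\sigma$-finite stationary measure $\pi$ (the kernel is doubly stochastic), and the walk is assumed to satisfy the equivalent statements of Theorem 4.1. With $\pi$ the counting measure, the supremum of $\lambda P^t(A)$ over sets with $\pi(A) \leq c$ is the sum of the $c$ largest point masses of $\lambda P^t$.

```python
def lazy_walk_law(t):
    """Law after t steps of a lazy simple random walk on the integers from 0.

    Returns a dict mapping site -> probability. The step law is a
    hypothetical example: stay w.p. 1/2, move left or right w.p. 1/4 each.
    """
    law = {0: 1.0}
    for _ in range(t):
        nxt = {}
        for x, mass in law.items():
            for dx, w in ((-1, 0.25), (0, 0.5), (1, 0.25)):
                nxt[x + dx] = nxt.get(x + dx, 0.0) + mass * w
        law = nxt
    return law

def sup_over_small_sets(t, c):
    """sup of lambda P^t(A) over sets A with counting measure pi(A) <= c."""
    masses = sorted(lazy_walk_law(t).values(), reverse=True)
    return sum(masses[:c])
```

Calling `sup_over_small_sets(t, 5)` for `t = 25, 100, 400` gives a decreasing sequence, in line with (7.2).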
For more on Markov processes, see Section 4.5 in Chapter 10 (in particular,
Remark 4.2).
Chapter 7
TRANSFORMATION
COUPLING
1 Introduction
The last three chapters were concerned with shifting one-sided stochastic
processes, $\theta_t Z = (Z_{t+s})_{0 \leq s < \infty}$. This chapter extends the view to an abstract
setup where general random elements replace stochastic processes and a
semigroup of transformations $G$ replaces the shift-maps $\theta_t$, $0 \leq t < \infty$.
As mental preparation for this leap, we start off in Section 2 by observing
that the shift-coupling theory of Chapter 5 (Sections 2 through 5) extends
from one-sided processes to two-sided processes, $Z = (Z_s)_{-\infty < s < \infty}$. The
two-sided case is even easier to deal with, since, while the one-sided shifts
do not form a group, the two-sided shifts, $\theta_t Z := (Z_{t+s})_{-\infty < s < \infty}$, do: if
we shift the origin to $t$, we have not lost what happened before time $t$ and
can shift back, $\theta_{-t} \theta_t Z = Z$. The same observation applies to random fields
with the index set $\mathbb{R}^d$ (processes in $d$-dimensional time).
The main part of the chapter, Sections 3 through 6, then deals with
transformation coupling: the generalization of shift-coupling. In order to
stress similarities (and dissimilarities), the treatment parallels that of shift-
coupling presented in Sections 2 through 5 of Chapter 5: we use analogous
section titles and enumerate the theorems in the same way. Several proofs
are more or less replicas of the proofs in the shift-coupling case, but we
go through all details again to explicate where the abstract conditions
enter. One of these conditions is the existence of an invariant measure (an
analogue of the Lebesgue measure), which is essential in this theory and
is simply assumed. This semigroup theory applies, for instance, to random
fields with index set $[0,\infty)^d$.
In Section 7 we spell out the implications of transformation coupling in
the special case when G is a locally compact second countable topological
group. This has similarities with the step from one-sided processes to two-
sided, but this step is even more pleasant because it hands us the existence
of an invariant measure, the Haar measure. Section 8 indicates applications:
selfsimilarity, exchangeability, rotational invariance, ...
Section 9 rounds off by considering briefly a possible generalization of
exact coupling, taking random fields as a specific example.
2 Shift-Coupling Random Fields
In this section we consider shift-coupling of random fields in d dimensions,
highlighting aspects that distinguish this case from the case of one-sided
stochastic processes. This is in part a preview of what is to come because
for several claims we refer to the theory of transformation coupling to be
developed in the subsequent sections.
2.1 Preliminaries
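Call a stochastic process (see Section 2 of Chapter 4) with the index set
$\mathbb{R}^d$ ($d \ge 1$) a random field in $d$ dimensions. Thus, in this terminology, a two-
sided continuous-time stochastic process is a random field in one dimension.
Call $\mathbb{R}^d$ the site set and a random element in $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$ a random site.
Let
\[ Z = (Z_s)_{s \in \mathbb{R}^d} \quad \text{and} \quad Z' = (Z'_s)_{s \in \mathbb{R}^d} \]
be two shift-measurable random fields with a general state space $(E, \mathcal{E})$
and path space $(H, \mathcal{H})$. Define the shift-maps $\theta_t$, $t \in \mathbb{R}^d$, by
\[ \theta_t z = (z_{t+s})_{s \in \mathbb{R}^d}, \quad z \in H. \]
Shift-measurability means that $\theta_t H = H$, $t \in \mathbb{R}^d$, and that the mapping
taking $(z,t)$ in $H \times \mathbb{R}^d$ to $\theta_t z$ in $H$ is $\mathcal{H} \otimes \mathcal{B}(\mathbb{R}^d)/\mathcal{H}$ measurable [that is, the
mapping taking $(z,t)$ in $H \times \mathbb{R}^d$ to $z_t$ in $E$ is $\mathcal{H} \otimes \mathcal{B}(\mathbb{R}^d)/\mathcal{E}$ measurable].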
Shift-measurability holds in the standard settings [when (E,£) is Polish
and the processes right-continuous; see Section 2.8 in Chapter 4]. Unlike
what was the case in Chapter 5, we need never impose any restrictions
beyond shift-measurability in this section.
2.2 Shift-Coupling — Distributional Shift-Coupling
Say that $(\hat{Z}, \hat{Z}', T, T', C)$ is a (nondistributional) shift-coupling of $Z$ and
$Z'$ if $(\hat{Z}, \hat{Z}')$ is a coupling of $Z$ and $Z'$, $T$ and $T'$ are random sites, and $C$
is an event such that
\[ \theta_T \hat{Z} = \theta_{T'} \hat{Z}' \quad \text{on } C. \tag{2.1} \]
Say that $(\hat{Z}, \hat{Z}', T, T', C, C')$ is a distributional shift-coupling of $Z$ and $Z'$
if $(\hat{Z}, \hat{Z}')$ is a coupling of $Z$ and $Z'$, $T$ and $T'$ are random sites, and $C$ and
$C'$ are events such that
\[ P(\theta_T \hat{Z} \in \cdot, C) = P(\theta_{T'} \hat{Z}' \in \cdot, C') \quad \text{[thus } P(C) = P(C')\text{]}. \tag{2.2} \]
The shift-coupling (distributional or not) is successful if $P(C) = 1$. In
this case we sometimes leave out the events and only write $(\hat{Z}, \hat{Z}', T)$ or
$(\hat{Z}, \hat{Z}', T, T')$ for the shift-coupling.
Observe that for one-sided processes we obtain the shift-couplings of
Chapter 5 (Section 2) from (2.1) and (2.2) by taking $C = \{T < \infty\}$ and
$C' = \{T' < \infty\}$. In the present case it is no longer natural to use $\{T < \infty\}$
and $\{T' < \infty\}$ for the shift-coupling events $C$ and $C'$.
With these definitions the shift-coupling results in Chapter 5 (Sections 2
through 5) still hold. This can be seen either by repeating the arguments
in Chapter 5 with straightforward modifications, or by referring to the
abstract group theory in Section 7 below. In fact, the present case is easier
to deal with, since [unlike the one-sided shifts in Chapter 5] the shift-maps
now form a group: if we shift the origin to t, we have not lost a part of the
process and can shift back to the initial origin. In particular, this yields
[see Theorem 3.2 below] that now
distributional shift-coupling can always be made nondistributional
without assuming [as we had to do in Theorem 2.2 of Chapter 5] that $(E, \mathcal{E})$
is Polish and the processes right-continuous.
2.3 Shift-Coupled Fields Identical, Only with Different Origins
The group property allows us also to simplify the definition of
nondistributional shift-coupling. With
\[ S := T - T' \]
definition (2.1) can be rewritten as
\[ \theta_S \hat{Z} = \hat{Z}' \quad \text{on } C. \tag{2.3} \]
Thus on $C$ the two random fields are really the same, only with different
origins.
Call $(\hat{Z}, \hat{Z}', S, C)$ a shift-coupling of $Z$ and $Z'$ with shift $S$ if (2.3) holds.
The distributional version of this is as follows: call $(\hat{Z}, \hat{Z}', S, C, C')$ a
distributional shift-coupling of $Z$ and $Z'$ with shift $S$ if
\[ P(\theta_S \hat{Z} \in \cdot, C) = P(\hat{Z}' \in \cdot, C'). \]
In the successful case this becomes
\[ \theta_S \hat{Z} \overset{D}{=} \hat{Z}', \]
and we can immediately turn the distributional shift-coupling $(\hat{Z}, \hat{Z}', S)$
into the nondistributional shift-coupling $(\hat{Z}, \theta_S \hat{Z}, S)$.
2.4 Shift-Coupling Inequality - Følner Averaging Sets
The shift-coupling inequality and the associated Cesàro total variation
convergence over intervals $[0,t]$ in Chapter 5 (Section 3) extend naturally to
the sets $[0,t]^d$, but also to $[-t,0]^d$ and to $[-t,t]^d$. In fact the sets can be
quite general.
Let $\lambda$ be the Lebesgue measure on $\mathcal{B}(\mathbb{R}^d)$ and for all $B \in \mathcal{B}(\mathbb{R}^d)$ such
that $0 < \lambda(B) < \infty$, let $\lambda(\cdot\,|B)$ be the uniform distribution on $B$, that is,
\[ \lambda(A|B) := \lambda(A \cap B)/\lambda(B), \quad A \in \mathcal{B}(\mathbb{R}^d). \]
The following shift-coupling inequality holds (see Section 7.3 below): for
$B \in \mathcal{B}(\mathbb{R}^d)$ such that $0 < \lambda(B) < \infty$,
\[ \|P(\theta_{U_B} Z \in \cdot) - P(\theta_{U_B} Z' \in \cdot)\| \le 2 - 2E[\lambda(S + B\,|B); C], \tag{2.4} \]
where $U_B$ is uniform on $B$ [that is, $U_B$ has the distribution $\lambda(\cdot\,|B)$] and
independent of $Z$ and $Z'$.
Thus the Cesàro total variation convergence extends to the following
general class of averaging sets. Call a family $B_h \in \mathcal{B}(\mathbb{R}^d)$, $0 < h < \infty$,
Følner averaging sets if
\[ 0 < \lambda(B_h) < \infty, \qquad \lambda(t + B_h\,|B_h) \to 1 \ \text{ as } h \to \infty, \quad t \in \mathbb{R}^d. \tag{2.5} \]
When $P(C) = 1$, we obtain from (2.5) [take $B = B_h$ in (2.4)] the Cesàro
total variation convergence
\[ \|P(\theta_{U_{B_h}} Z \in \cdot) - P(\theta_{U_{B_h}} Z' \in \cdot)\| \to 0, \quad h \to \infty. \tag{2.6} \]
This generalization of the Cesàro total variation convergence is not
restricted to random fields with index set $\mathbb{R}^d$. It works for one-sided processes
and for random fields with index set $[0,\infty)^d$; see Section 4 below.
2.5 The Sets hB Are Følner
We shall now give a nice example of Følner averaging sets, which shows
clearly how general they are.
Theorem 2.1. If $B \in \mathcal{B}(\mathbb{R}^d)$ and $0 < \lambda(B) < \infty$, then the family
\[ hB := \{hs \in \mathbb{R}^d : s \in B\}, \quad 0 < h < \infty, \]
are Følner averaging sets.
Proof. Note first that
\[ \lambda(hB \cap (t + hB))/\lambda(hB) = \lambda(B \cap (t/h + B))/\lambda(B), \quad t \in \mathbb{R}^d, \tag{2.7} \]
and that [with $\|\cdot\|_2$ denoting the $L_2$ norm with respect to $\lambda$]
\[ \lambda(B) - \lambda(B \cap (t/h + B)) = 2^{-1}\int (1_B - 1_{t/h+B})^2\,d\lambda = 2^{-1}\big(\|1_B - 1_{t/h+B}\|_2\big)^2. \tag{2.8} \]
Let $f_n$, $n \ge 1$, be a sequence of bounded continuous functions such that
\[ \|1_B - f_n\|_2 \to 0, \quad n \to \infty, \quad \text{[see Ash (1972), Theorem 2.4.14]} \]
to obtain [use $\|1_{t/h+B} - f_n(\cdot - t/h)\|_2 = \|1_B - f_n\|_2$ in the first step]
\[ \|1_B - 1_{t/h+B}\|_2 \le 2\|1_B - f_n\|_2 + \|f_n - f_n(\cdot - t/h)\|_2 \]
\[ \to 2\|1_B - f_n\|_2 \quad \text{as } h \to \infty \]
\[ \to 0 \quad \text{as } n \to \infty. \]
From this and (2.8) we obtain $\lambda(B) - \lambda(B \cap (t/h + B)) \to 0$ as $h \to \infty$,
and a reference to (2.7) completes the proof. □
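The Følner property of the sets hB can be checked by hand in the simplest case. The following is a minimal sketch, assuming B is the unit cube $[0,1]^d$ (an illustrative choice, not taken from the text), for which the overlap fraction $\lambda((t + hB) \cap hB)/\lambda(hB)$ factorizes over the coordinates as $\prod_i \max(0, 1 - |t_i|/h)$. The function name `overlap_fraction` is hypothetical.

```python
def overlap_fraction(t, h):
    """lambda((t + hB) ∩ hB) / lambda(hB) for the unit cube B = [0, 1]^d.

    In each coordinate the intervals [0, h] and [t_i, t_i + h] overlap on a
    segment of length max(0, h - |t_i|), so the ratio is a product.
    """
    frac = 1.0
    for ti in t:
        frac *= max(0.0, 1.0 - abs(ti) / h)
    return frac


# A fixed displacement t in R^2 (assumed for illustration).
t = (3.0, -2.0)
scales = (1, 10, 100, 1000, 10000)
fractions = [overlap_fraction(t, h) for h in scales]
```

For a fixed t the fraction is zero while h is smaller than some |t_i| and then increases towards 1 as h grows, which is the convergence required in (2.5).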
2.6 The Invariant σ-Algebra — Equivalences
Define the invariant $\sigma$-algebra by
\[ \mathcal{I} = \{A \in \mathcal{H} : \theta_t A = A,\ t \in \mathbb{R}^d\}. \]
Note that $\mathcal{I}$ also equals $\{A \in \mathcal{H} : \theta_t^{-1} A = A,\ t \in \mathbb{R}^d\}$, since $\theta_t^{-1} = \theta_{-t}$ for
$t \in \mathbb{R}^d$. The following claims are equivalent [see the end of Section 7 below]:
(a) There exists a successful distributional shift-coupling of $Z$ and $Z'$.
(a') There exists a random site $T$ such that $\theta_T Z \overset{D}{=} Z'$.
(b) For some Følner averaging sets $B_h$, $0 < h < \infty$, (2.6) holds.
(b') For all Følner averaging sets $B_h$, $0 < h < \infty$, (2.6) holds.
(c) $P(Z \in \cdot)|_{\mathcal{I}} = P(Z' \in \cdot)|_{\mathcal{I}}$.
When $Z'$ is stationary [that is, $Z' \overset{D}{=} \theta_t Z'$ for all $t \in \mathbb{R}^d$], then (2.6) becomes
\[ \theta_{U_{B_h}} Z \overset{tv}{\to} Z', \quad h \to \infty. \]
It follows from the equivalence of (c) and (b) that two stationary
random fields agree in distribution on $\mathcal{I}$ if and only if they are identically
distributed.
2.7 Shifting a Single Random Field to Obtain Them All
We shall end this section by showing that all shift-measurable random fields
that agree in distribution on the invariant $\sigma$-algebra can be represented as
a single random field with the origin at different random sites.
Theorem 2.2. Let $Z$ be a shift-measurable random field with site set $\mathbb{R}^d$
defined on a probability space $(\Omega, \mathcal{F}, P)$. All random fields $Z'$ agreeing with
$Z$ in distribution on $\mathcal{I}$ can be represented in the form $\theta_T Z$ on the same
extension of $(\Omega, \mathcal{F}, P)$.
Proof. Extend $(\Omega, \mathcal{F}, P)$ to support independent variables $U_1, \dots, U_d$ that
are uniform on $[0,1]$ and independent of $Z$. For an arbitrary $Z'$
agreeing with $Z$ in distribution on $\mathcal{I}$ let $(\hat{Z}, \hat{Z}', \hat{T})$ be a successful
nondistributional shift-coupling [exists because (c) implies (a')]. Let $\hat{T}_i$ denote the $i$th
component of $\hat{T}$, that is, $\hat{T} = (\hat{T}_1, \dots, \hat{T}_d)$. Let $P_i(\cdot\,|\,\cdot)$ be a regular
version of $P(\hat{T}_i \in \cdot \mid (\hat{Z}, \hat{T}_1, \dots, \hat{T}_{i-1}) = \cdot)$, and let $F_i(\cdot\,|\,\cdot)$ be the associated
distribution function and $F_i^{-1}(\cdot\,|\,\cdot)$ its generalized inverse [see Chapter 1,
Section 3]. Define recursively, for $i = 1, \dots, d$, the components of $T$ by
\[ T_i = F_i^{-1}(U_i \mid (Z, T_1, \dots, T_{i-1})). \]
Then $(Z, T)$ has the same distribution as $(\hat{Z}, \hat{T})$, and thus $\theta_T Z$ has the same
distribution as $\theta_{\hat{T}} \hat{Z} = \hat{Z}'$, that is, as $Z'$. □
3 Transformation Coupling
We now turn to the abstract semigroup setup. This section introduces
transformation coupling and its distributional version.
Let $G$ be a class of measurable mappings (transformations) from a
measurable space $(E, \mathcal{E})$ to itself. With $\gamma \in G$ and $y \in E$ we write $\gamma y$ rather
than $\gamma(y)$ in the same way as we wrote $\theta_t z$ when shifting paths in the
stochastic processes case. With $\gamma$ and $\eta \in G$ let $\gamma\eta$ denote the mapping
taking $y \in E$ to $\gamma\eta y$. Assume that $G$ is a semigroup, that is,
\[ \gamma \text{ and } \eta \in G \ \Rightarrow\ \gamma\eta \in G, \tag{3.1} \]
and say that $G$ is a transformation semigroup acting on $(E, \mathcal{E})$.
Further, let $\mathcal{G}$ be a $\sigma$-algebra of subsets of $G$ and assume that the
measurable semigroup $(G, \mathcal{G})$ is jointly measurable, that is,
the mapping from $G \times G$ to $G$ taking $(\gamma, \eta)$ to $\gamma\eta$
is $\mathcal{G} \otimes \mathcal{G}/\mathcal{G}$ measurable,
and that $(G, \mathcal{G})$ acts jointly measurably on $(E, \mathcal{E})$, that is,
the mapping from $G \times E$ to $E$ taking $(\gamma, y)$ to $\gamma y$
is $\mathcal{G} \otimes \mathcal{E}/\mathcal{E}$ measurable.
The class $G$ is a group if in addition to the semigroup property (3.1), it
holds that
\[ \gamma \in G \ \Rightarrow\ \gamma \text{ has an inverse } \gamma^{-1} \text{ and } \gamma^{-1} \in G. \]
If $G$ is a group, call $G$ inverse-measurable if
the mapping from $G$ to $G$ taking $\gamma$ to $\gamma^{-1}$ is $\mathcal{G}/\mathcal{G}$ measurable.
For an example, consider a random field with state space $(E, \mathcal{E})$, site set
$[0,\infty)^d$, and an internally shift-invariant path set $H \subseteq E^{[0,\infty)^d}$. Then the
shift-maps $G = \{\theta_t : t \in [0,\infty)^d\}$ form a transformation semigroup acting
on the path space $(H, \mathcal{H})$. We can identify $G$ with $[0,\infty)^d$ under addition
and let $\mathcal{G}$ be the Borel subsets of $[0,\infty)^d$. If we replace the site set $[0,\infty)^d$
by $\mathbb{R}^d$, then the shift-maps $G = \{\theta_t : t \in \mathbb{R}^d\}$ form a transformation group
acting on the path space $(H, \mathcal{H})$. In this case we can identify $G$ with $\mathbb{R}^d$
under addition and let $\mathcal{G}$ be the Borel subsets of $\mathbb{R}^d$. In both cases $(G, \mathcal{G})$
is jointly measurable. And in both cases $(G, \mathcal{G})$ acts jointly measurably
if and only if the random field is canonically jointly measurable (that is,
shift-measurable).
More examples of transformation groups are given in Section 8:
permuting the index set of a random field with a countable index set, rescaling
time and space of a real-valued stochastic process, rotating a random field
with site set Rd.
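The difference between a transformation semigroup and a transformation group can be made concrete with one-sided shifts. The sketch below is only an illustration under simplifying assumptions: time is discrete (the nonnegative integers rather than $[0,\infty)^d$), paths are modelled as Python functions, and the names `shift` and `compose` are hypothetical. It checks the semigroup property $\theta_s\theta_t = \theta_{s+t}$ on a finite window and exhibits two different paths with the same shifted path, so a one-sided shift has no inverse.

```python
def shift(t):
    """One-sided shift theta_t: maps a path z to the path s -> z(t + s)."""
    def theta(z):
        return lambda s: z(t + s)
    return theta


def compose(gamma, eta):
    """The product gamma eta, i.e. the mapping y -> gamma(eta(y))."""
    return lambda y: gamma(eta(y))


# Two paths that differ only at time 0 (assumed for illustration).
z = lambda s: s * s
z_alt = lambda s: -1 if s == 0 else s * s

window = range(6)

# Semigroup property: theta_2 theta_3 = theta_5, compared on a finite window.
lhs = [compose(shift(2), shift(3))(z)(s) for s in window]
rhs = [shift(5)(z)(s) for s in window]

# No inverse: shifting by 1 forgets the value at time 0.
same_after_shift = (
    [shift(1)(z)(s) for s in window] == [shift(1)(z_alt)(s) for s in window]
)
```

With two-sided paths (index set the integers or $\mathbb{R}^d$) nothing is forgotten by a shift, which is why those shifts form a group.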
3.1 Transformation Coupling — Definition
Let $Y$ and $Y'$ be random elements in $(E, \mathcal{E})$ defined on a probability space
$(\Omega, \mathcal{F}, P)$. A random transformation is a random element $\Gamma$ in $(G, \mathcal{G})$. The
expression $\Gamma Y$ denotes the random element in $(E, \mathcal{E})$ defined by
\[ (\Gamma Y)(\omega) := \Gamma(\omega)Y(\omega), \quad \omega \in \Omega. \]
Call $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C)$ a transformation coupling of $Y$ and $Y'$ if $(\hat{Y}, \hat{Y}')$ is a
coupling of $Y$ and $Y'$, $\Gamma$ and $\Gamma'$ are random transformations, and $C$ is an
event such that
\[ \Gamma\hat{Y} = \Gamma'\hat{Y}' \quad \text{on } C. \tag{3.2} \]
The transformations $\Gamma$ and $\Gamma'$ are the coupling transformations, $P(C)$ is
the success probability, and $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C)$ is successful if $P(C) = 1$.
3.2 Distributional Transformation Coupling — Definition
Call $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ a distributional transformation coupling of $Y$ and
$Y'$ if $(\hat{Y}, \hat{Y}')$ is a coupling of $Y$ and $Y'$, $\Gamma$ and $\Gamma'$ are random
transformations, and $C$ and $C'$ are events such that
\[ P(\Gamma\hat{Y} \in \cdot, C) = P(\Gamma'\hat{Y}' \in \cdot, C') \quad \text{[thus } P(C) = P(C')\text{]}. \tag{3.3} \]
When $P(C) = 1$, this becomes
\[ \Gamma\hat{Y} \overset{D}{=} \Gamma'\hat{Y}'. \]
A transformation coupling can be seen as a distributional transformation
coupling by identifying $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C)$ with $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C)$. Use the
word nondistributional to distinguish a transformation coupling from a
distributional one.
Let $\Delta$ be a fixed element not in $E$ (the censoring state). Let $\Delta_0$ be the
constant mapping $\Delta_0 y = \Delta$ for all $y \in E$. Let $\Delta_1$ be the identity mapping
$\Delta_1 y = y$ for all $y \in E$. Then we can rewrite (3.3) as
\[ \Delta_{1_C}\Gamma\hat{Y} \overset{D}{=} \Delta_{1_{C'}}\Gamma'\hat{Y}'. \]
3.3 Dropping the Hats in the Distributional Case
Unlike in the shift-coupling case, we now need conditions for dropping the
hats: when we have a distributional transformation coupling of $Y$ and $Y'$,
then we can take $\hat{Y}$ and $\hat{Y}'$ to be the original random elements $Y$ and $Y'$
if, for instance, $(G, \mathcal{G})$ is Polish.
Theorem 3.1. Suppose $(\hat{Y}, \hat{Y}', \hat{\Gamma}, \hat{\Gamma}', \hat{C}, \hat{C}')$ is a distributional
transformation coupling of $Y$ and $Y'$. If
there exist weak-sense-regular versions of $\hat{\Gamma}$ given $\hat{Y}$
and of $\hat{\Gamma}'$ given $\hat{Y}'$ [this holds when $(G, \mathcal{G})$ is Polish],
then the underlying probability space can be extended to support random
transformations $\Gamma$ and $\Gamma'$ and events $C$ and $C'$ such that
\[ (Y, \Gamma, 1_C) \overset{D}{=} (\hat{Y}, \hat{\Gamma}, 1_{\hat{C}}) \quad \text{and} \quad (Y', \Gamma', 1_{C'}) \overset{D}{=} (\hat{Y}', \hat{\Gamma}', 1_{\hat{C}'}). \]
In particular,
\[ P(\Gamma Y \in \cdot, C) = P(\Gamma' Y' \in \cdot, C'). \]
Proof. This follows from the conditioning extension in Section 2.12 of
Chapter 4. In order to obtain $\Gamma$ take $Y_1 := Y$ and $(Y_1', Y_2') := (\hat{Y}, (\hat{\Gamma}, 1_{\hat{C}}))$
and define $(\Gamma, 1_C) := Y_2$. In order to obtain $\Gamma'$ take $Y_1 := Y'$ and $(Y_1', Y_2') :=
(\hat{Y}', (\hat{\Gamma}', 1_{\hat{C}'}))$ and define $(\Gamma', 1_{C'}) := Y_2$. □
3.4 Turning Distributional into Nondistributional
When either $(E, \mathcal{E})$ is Polish or $(G, \mathcal{G})$ is a Polish inverse-measurable group,
then a distributional transformation coupling can be turned into a
nondistributional one.
Theorem 3.2. Let $(\hat{Y}, \hat{Y}', \hat{\Gamma}, \hat{\Gamma}', \hat{C}, \hat{C}')$ be a distributional transformation
coupling of $Y$ and $Y'$. Suppose either
there exists a weak-sense-regular conditional distribution
of $\hat{Y}'$ given $\hat{\Gamma}'\hat{Y}'$ and $\mathcal{E} \otimes \mathcal{E}$ contains the diagonal (3.4)
$\{(y,y) : y \in E\}$ [both conditions hold when $(E, \mathcal{E})$ is Polish]
or $(G, \mathcal{G})$ is an inverse-measurable group and
there exists a weak-sense-regular conditional distribution
of $\hat{\Gamma}'$ given $\hat{\Gamma}'\hat{Y}'$ [this holds when $(G, \mathcal{G})$ is Polish]. (3.5)
Then there is a nondistributional transformation coupling (Y,Y',r,r',C)
of Y and Y' such that
(Y,r,ic) = (Y,r,ic) and (Y',r',ic) = (Y',f',i6,)
and (Y,r,C) can be identified with (Y,r,C).
Proof. If (3.4) holds, apply Theorem 7.5 in Chapter 4 as follows. Let both
g and g' be the mapping taking (7, y, i) in G x E x {0,1} to A^y. Put
V:=(Y,t,ld) and V := (?',/", l&).
Then
g(V) = AletY ^ Ale,t'Y' = g'(V),
and we obtain the desired result [since a(£ U {0, {^\}}) <8> cr(£ U {0, {^}})
contains the diagonal of E U {A} x E U {A}].
If (G, 5) is an inverse-measurable group and (3.5) holds, take (Y, r, lc) '■ =
(Y,r, 1q) and obtain (Y',r') by applying the transfer extension in
Section 2.12 of Chapter 4 as follows. Take Yi := AicTY and (Y{,Y^) : =
(Aic,r'Y', ?') and define T' := Y2 to obtain
(z\lcrf,r') = (z\lc,r'f,f')-
Since G is an inverse-measurable group, this implies that A\cr'~lrY =
A\ Y\ where r"_1 denotes the group inverse of J1'-1 [that is, for each
fixed outcome u, r"_1(o;) is the group inverse of r'(u); thus here T'-1 is
not the inverse of i~" as a mapping of w ]. Let W be a random element
in (£, £) that is independent of C and has the distribution P(Y' € -|C"C).
Define Y' by f' := ^icT'-^y on C and f" := W on Cc. D
4 Inequality and Asymptotics
The last section was devoted to the definition of transformation coupling
and its distributional version. We shall now go on to the associated
inequality and to the limit implications. For this purpose we need an analogue of
the Lebesgue measure. So let us assume now that there exists a finite or
$\sigma$-finite measure $\lambda$ on $(G, \mathcal{G})$ that is right-invariant, that is, $\lambda$ is nontrivial
[$\lambda(G) > 0$] and
\[ A\gamma \in \mathcal{G} \ \text{ and } \ \lambda(A\gamma) = \lambda(A), \quad A \in \mathcal{G},\ \gamma \in G. \tag{4.1} \]
Recall that $\lambda$ is $\sigma$-finite if $\lambda(G) = \infty$ and $G$ is the union of countably many
disjoint sets in $\mathcal{G}$ each with finite mass. For $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, let
$\lambda(\cdot\,|B)$ be the probability measure obtained by conditioning on $B$, that is,
\[ \lambda(A|B) = \lambda(A \cap B)/\lambda(B), \quad A \in \mathcal{G}. \tag{4.2} \]
4.1 Transformation Coupling Inequality
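Here is a generalization of the shift-coupling inequality.
Theorem 4.1. Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$.
Let $(G, \mathcal{G})$ be a jointly measurable semigroup acting jointly measurably on
$(E, \mathcal{E})$. Let $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ be a transformation coupling of $Y$ and $Y'$,
distributional or not. Suppose
there exists a finite or $\sigma$-finite right-invariant measure $\lambda$ on $(G, \mathcal{G})$.
Then, for $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, the TRANSFORMATION COUPLING
INEQUALITY holds:
\[ \|P(U_B Y \in \cdot) - P(U_B Y' \in \cdot)\| \le 2E[1 - \lambda(B\Gamma|B); C] + 2E[1 - \lambda(B\Gamma'|B); C'] + 2P(C^c), \]
where $U_B$ has distribution $\lambda(\cdot\,|B)$ and is independent of $Y$ and $Y'$.
Comment. This inequality can clearly be rewritten in the following form:
\[ \Big\| \int_B P(\gamma Y \in \cdot)\,\lambda(d\gamma) - \int_B P(\gamma Y' \in \cdot)\,\lambda(d\gamma) \Big\| \]
\[ \le 2E[\lambda(B) - \lambda(B\Gamma \cap B); C] + 2E[\lambda(B) - \lambda(B\Gamma' \cap B); C'] + 2\lambda(B)P(C^c). \]
Proof. Since $\lambda$ is right-invariant, we have for $A \in \mathcal{E}$ and $B \in \mathcal{G}$,
\[ \int_B 1_{\{\gamma\hat{Y} \in A\}}\,\lambda(d\gamma) - \int_B 1_{\{\gamma\Gamma\hat{Y} \in A\}}\,\lambda(d\gamma) = \int_B 1_{\{\gamma\hat{Y} \in A\}}\,\lambda(d\gamma) - \int_{B\Gamma} 1_{\{\gamma\hat{Y} \in A\}}\,\lambda(d\gamma) \]
\[ \le \lambda(B) - \lambda(B \cap B\Gamma). \]
Dividing by $\lambda(B)$ and taking expectations over $C$ yields
\[ P(U_B\hat{Y} \in A, C) - P(U_B\Gamma\hat{Y} \in A, C) \le E[1 - \lambda(B\Gamma|B); C]. \]
Similarly,
\[ P(U_B\Gamma'\hat{Y}' \in A, C') - P(U_B\hat{Y}' \in A, C') \le E[1 - \lambda(B\Gamma'|B); C']. \]
Since $P((U_B, \Gamma\hat{Y}) \in \cdot, C) = P((U_B, \Gamma'\hat{Y}') \in \cdot, C')$, we have
\[ P(U_B\Gamma\hat{Y} \in A, C) - P(U_B\Gamma'\hat{Y}' \in A, C') = 0. \]
Summing the last two inequalities and this equality yields
\[ P(U_B\hat{Y} \in A, C) - P(U_B\hat{Y}' \in A, C') \le E[1 - \lambda(B\Gamma|B); C] + E[1 - \lambda(B\Gamma'|B); C']. \]
Certainly, $P(U_B\hat{Y} \in A, C^c) - P(U_B\hat{Y}' \in A, C'^c) \le P(C^c)$, and thus
\[ P(U_B Y \in A) - P(U_B Y' \in A) \le E[1 - \lambda(B\Gamma|B); C] + E[1 - \lambda(B\Gamma'|B); C'] + P(C^c). \]
Taking the supremum in $A$ and multiplying by 2 completes the proof. □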
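A small exact computation can make the inequality tangible. The sketch below rests on assumptions chosen purely for illustration: $G$ is the group of integers acting on $E = \mathbb{Z}$ by addition, $\lambda$ is counting measure, $B = \{0, \dots, h-1\}$, and $Y = 0$, $Y' = 3$ are deterministic, so that $\Gamma = 3$, $\Gamma' = 0$ give a successful coupling ($\Gamma Y = \Gamma' Y'$). The helper names are hypothetical. The norm is computed as $2\sup_A$ of the difference, which for probability mass functions equals the sum of absolute differences.

```python
from fractions import Fraction


def tv_norm(p, q):
    """||p - q|| = 2 sup_A |p(A) - q(A)|, i.e. the sum of |p(k) - q(k)|."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0) - q.get(k, 0)) for k in keys)


def law_of_shifted_uniform(h, y):
    """Law of U_B y = y + U_B with U_B uniform on B = {0, ..., h-1}."""
    return {y + u: Fraction(1, h) for u in range(h)}


def coupling_bound(h, gamma, gamma_prime):
    """2(1 - lambda(B gamma | B)) + 2(1 - lambda(B gamma' | B)), counting measure."""
    B = set(range(h))

    def overlap(g):
        # lambda(B g | B) where B g = B + g in the additive group of integers
        return Fraction(len(B & {b + g for b in B}), h)

    return 2 * (1 - overlap(gamma)) + 2 * (1 - overlap(gamma_prime))


y, y_prime = 0, 3
gamma, gamma_prime = 3, 0  # successful coupling: gamma + y == gamma_prime + y_prime

results = {}
for h in (2, 5, 50, 500):
    lhs = tv_norm(law_of_shifted_uniform(h, y), law_of_shifted_uniform(h, y_prime))
    rhs = coupling_bound(h, gamma, gamma_prime)
    results[h] = (lhs, rhs)
```

In this example the two sides coincide for every h, and both decrease like 6/h for h at least 3, which previews the Cesàro limit obtained from Følner averaging sets.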
4.2 Successful Transformation Coupling — Finite A
Suppose there exists a successful transformation coupling (distributional
or not). Then the transformation coupling inequality simplifies to
\[ \|P(U_B Y \in \cdot) - P(U_B Y' \in \cdot)\| \le 2E[1 - \lambda(B\Gamma|B)] + 2E[1 - \lambda(B\Gamma'|B)]. \tag{4.3} \]
If $\lambda(G) < \infty$, then
\[ \lambda(G \cap G\gamma) = \lambda(G), \quad \gamma \in G, \]
since, due to the right-invariance, $\lambda(G\gamma) = \lambda(G)$ = the full measure, and
the intersection of two sets of full (finite) measure has full measure. Dividing
by $\lambda(G)$ yields
\[ \lambda(G) < \infty \ \Rightarrow\ \lambda(G\gamma|G) = 1, \quad \gamma \in G. \tag{4.4} \]
Take $B = G$ in (4.3) to obtain that the right-hand side is zero and thus
\[ U_G Y \overset{D}{=} U_G Y', \tag{4.5} \]
that is, we need not take a limit to obtain a Cesàro result.
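The finite case can be illustrated with a finite group. The sketch below assumes, for illustration only, that $G$ is the cyclic group of rotations of tuples of length 4 with $\lambda$ the counting measure, that $Y$ is a fixed binary tuple, and that $Y'$ is a random rotation of it, so a successful transformation coupling exists (rotate $Y'$ back). The function names are hypothetical. It computes the laws of $U_G Y$ and $U_G Y'$ exactly and compares them.

```python
from collections import Counter
from fractions import Fraction


def rotate(y, k):
    """Cyclic rotation of the tuple y by k positions."""
    k %= len(y)
    return y[k:] + y[:k]


def law_of_uniform_rotation(law):
    """Law of U_G Y, with U_G uniform on the rotations and independent of Y.

    `law` maps each possible value of Y (a tuple) to its probability.
    """
    out = Counter()
    for y, p in law.items():
        n = len(y)
        for k in range(n):
            out[rotate(y, k)] += p * Fraction(1, n)
    return dict(out)


# Y is deterministic; Y' is one of two rotations of Y with equal probability.
law_Y = {(1, 0, 0, 1): Fraction(1)}
law_Y_prime = {(0, 1, 1, 0): Fraction(1, 2), (1, 1, 0, 0): Fraction(1, 2)}

averaged_Y = law_of_uniform_rotation(law_Y)
averaged_Y_prime = law_of_uniform_rotation(law_Y_prime)
```

Although $Y$ and $Y'$ have different laws, averaging over the whole group removes the difference, in line with (4.5).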
4.3 Successful Transformation Coupling — Følner Sets
If $\lambda$ is not finite but only $\sigma$-finite, then there need not even be a Cesàro
limit result at all. In order to obtain one we must assume the existence of
Følner averaging sets, that is, a family of sets $B_h \in \mathcal{G}$, $0 < h < \infty$, such
that
\[ 0 < \lambda(B_h) < \infty, \]
and, for all $\gamma \in G$,
\[ \lambda(B_h\gamma\,|B_h) \to 1, \quad h \to \infty; \tag{4.6} \]
see Theorem 2.1 for an example of such sets in the random field case.
If there exists a successful transformation coupling, then (4.3), (4.6), and
dominated convergence yield the Cesàro limit result
\[ \|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to 0, \quad h \to \infty. \tag{4.7} \]
In particular, when $Y'$ is distributionally invariant under $G$, that is,
\[ \gamma Y' \overset{D}{=} Y', \quad \gamma \in G, \]
then (4.7) can be rewritten as
\[ U_{B_h} Y \overset{tv}{\to} Y', \quad h \to \infty. \]
Results on rates of convergence and uniform convergence can of course be
obtained if more is known about the behaviour of $E[1 - \lambda(B_h\Gamma|B_h)]$ and
$E[1 - \lambda(B_h\Gamma'|B_h)]$ as functions of $h$.
5 Maximality
In this section we shall establish a transformation coupling analogue of the
maximality result for shift-coupling, Theorem 4.1 in Chapter 5. The proof
is basically the same, but we repeat it in the present general framework for
the sake of completeness.
5.1 The Maximality Theorem
Note that in the following theorem we do not assume the existence of Følner
averaging sets, not even the existence of an invariant measure (although
we shall apply the theorem in the next section with $\mu = \lambda$).
Theorem 5.1. Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$.
Let $(G, \mathcal{G})$ be a jointly measurable semigroup acting jointly measurably on
$(E, \mathcal{E})$. Let $\mu$ be a finite or $\sigma$-finite measure on $(G, \mathcal{G})$. Then there exists
a distributional transformation coupling $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ of $Y$ and $Y'$
such that
\[ \int_G P(\gamma\hat{Y} \in \cdot, C^c)\,\mu(d\gamma) \ \perp\ \int_G P(\gamma\hat{Y}' \in \cdot, C'^c)\,\mu(d\gamma). \tag{5.1} \]
Moreover, there exists a nondistributional transformation coupling of $Y$ and
$Y'$ with this property if either there exists a weak-sense-regular conditional
distribution of $\hat{Y}'$ given $\Gamma'\hat{Y}'$ and $\mathcal{E} \otimes \mathcal{E}$ contains the diagonal $\{(y,y) : y \in E\}$
[this holds when $(E, \mathcal{E})$ is Polish] or $(G, \mathcal{G})$ is an inverse-measurable group
and there exists a weak-sense-regular conditional distribution of $\Gamma'$ given
$\Gamma'\hat{Y}'$ [this holds when $(G, \mathcal{G})$ is Polish].
We prove this result in the next three subsections.
5.2 First Part of Proof- Construction of a Candidate
Since $\mu$ is finite or $\sigma$-finite, there are disjoint sets $B_1, B_2, \dots \in \mathcal{G}$ with
union $G$ and such that $\mu(B_n) < \infty$, $n \ge 1$. Let $\Gamma_1, \Gamma_2, \dots$ be i.i.d. random
transformations in $(G, \mathcal{G})$ with common distribution
1
The important property of this distribution is that it has the same null sets
as $\mu$. Let $\Gamma_1, \Gamma_2, \dots$ be independent of a sequence of independent quadruples
\[ (Y_k, Y_k', C_k, C_k'), \quad 1 \le k < \infty, \]
which have the following properties. Let
$(Y_1, Y_1')$ be a coupling of $Y$ and $Y'$ and let $C_1$ and $C_1'$ be maximal
distributional coupling events of $(\Gamma_1 Y_1, \Gamma_1 Y_1')$.
This is possible because we can first let $(Y_1, Y_1')$ be a coupling of $Y$ and $Y'$
and then use Theorem 4.4 in Chapter 4 to obtain $C_1$ and $C_1'$. In the same
way we can recursively, for $1 < k < \infty$, let
$(Y_k, Y_k')$ be a coupling of random elements with distributions
$P(Y_{k-1} \in \cdot\,|C_{k-1}^c)$ and $P(Y_{k-1}' \in \cdot\,|C_{k-1}'^c)$ and let $C_k$ and $C_k'$
be maximal distributional coupling events of $(\Gamma_k Y_k, \Gamma_k Y_k')$.
Now put
\[ K = \inf\{1 \le k < \infty : C_k \text{ occurs}\} \quad [\inf \emptyset = \infty] \]
and note that [due to the independence of the quadruples $(Y_k, Y_k', C_k, C_k')$]
\[ P(K > k) = P(C_1^c) \cdots P(C_k^c), \quad 1 \le k < \infty, \]
and that [since $P(Y_{k+1} \in \cdot) = P(Y_k \in \cdot\,|C_k^c)$]
\[ P(Y_k \in \cdot) = P(Y_k \in \cdot, C_k) + P(Y_{k+1} \in \cdot)P(C_k^c), \quad 1 \le k < \infty. \]
This implies [since $P(Y_1 \in \cdot) = P(Y \in \cdot)$ and $Y_1 = Y_K$ on $C_1 = \{K \le 1\}$]
that the following holds for $k = 1$,
\[ P(Y \in \cdot) = P(Y_K \in \cdot, K \le k) + P(Y_{k+1} \in \cdot)P(K > k), \tag{5.2} \]
and that if it holds for some $k$, then it holds with $k$ replaced by $k + 1$, since
\[ P(Y_{k+1} \in \cdot)P(K > k) \]
\[ = P(Y_{k+1} \in \cdot, C_{k+1})P(K > k) + P(Y_{k+2} \in \cdot)P(C_{k+1}^c)P(K > k) \]
\[ = P(Y_K \in \cdot, K = k + 1) + P(Y_{k+2} \in \cdot)P(K > k + 1), \]
where we have used the independence of $(Y_{k+1}, C_{k+1})$ and $\{K > k\} =
C_1^c \cap \cdots \cap C_k^c$ for the second identity. Thus by induction (5.2) holds for all
$1 \le k < \infty$. Drop the last term in (5.2) and send $k \to \infty$ to obtain
\[ P(Y \in \cdot) \ge P(Y_K \in \cdot, K < \infty). \tag{5.3} \]
Similarly, with
\[ K' = \inf\{1 \le k < \infty : C_k' \text{ occurs}\} \]
we obtain
\[ P(Y' \in \cdot) \ge P(Y_{K'}' \in \cdot, K' < \infty). \tag{5.4} \]
Note that $K'$ is a copy of $K$ and, in particular, $P(K < \infty) = P(K' < \infty)$.
Let $\Gamma_\infty$ be some fixed element of $G$ and let $Y_\infty$ and $Y_\infty'$ be
independent of $(\Gamma_k, Y_k, Y_k', C_k, C_k')$, $1 \le k < \infty$, with arbitrary distributions when
$P(K < \infty) = 1$ and with the following distributions when $P(K < \infty) < 1$:
\[ P(Y_\infty \in \cdot) = \big(P(Y \in \cdot) - P(Y_K \in \cdot, K < \infty)\big)/P(K = \infty), \]
\[ P(Y_\infty' \in \cdot) = \big(P(Y' \in \cdot) - P(Y_{K'}' \in \cdot, K' < \infty)\big)/P(K' = \infty). \tag{5.5} \]
These distributions are well-defined due to (5.3) and (5.4).
We shall show that the candidate
\[ (\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C') := (Y_K, Y_{K'}', \Gamma_K, \Gamma_{K'}, \{K < \infty\}, \{K' < \infty\}) \]
is a distributional transformation coupling satisfying (5.1).
5.3 Mid-Part of Proof - Candidate Is Transformation Coupling
If $P(K < \infty) = 1$, we obtain from (5.3) and (5.4) that
\[ P(\hat{Y} \in \cdot) = P(Y \in \cdot) \quad \text{and} \quad P(\hat{Y}' \in \cdot) = P(Y' \in \cdot). \]
If $P(K < \infty) < 1$, we obtain this same result from (5.5), since $K$ is
independent of $Y_\infty$ and $K'$ of $Y_\infty'$. Thus $(\hat{Y}, \hat{Y}')$ is a coupling of $Y$ and $Y'$.
For $1 \le k < \infty$, we have
\[ P(\Gamma\hat{Y} \in \cdot, K = k) = P(\Gamma_k Y_k \in \cdot, C_k)P(K \ge k), \]
\[ P(\Gamma'\hat{Y}' \in \cdot, K' = k) = P(\Gamma_k Y_k' \in \cdot, C_k')P(K' \ge k). \]
Since $C_k$ and $C_k'$ are distributional coupling events of $(\Gamma_k Y_k, \Gamma_k Y_k')$ and
since $P(K \ge k) = P(K' \ge k)$, the right-hand sides are identical, and
summing over $1 \le k < \infty$ yields
\[ P(\Gamma\hat{Y} \in \cdot, C) = P(\Gamma'\hat{Y}' \in \cdot, C'). \]
Thus $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ is a distributional transformation coupling of $Y$
and $Y'$.
5.4 Final Part of Proof- The Candidate Satisfies (5.1)
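The maximality of $C_k$ and $C_k'$ means that the subprobability measures
$P(\Gamma_k Y_k \in \cdot, C_k^c)$ and $P(\Gamma_k Y_k' \in \cdot, C_k'^c)$ are mutually singular, that is, there
is an $A_k \in \mathcal{E}$ such that
\[ P(\Gamma_k Y_k \in A_k) = P(\Gamma_k Y_k \in A_k, C_k), \tag{5.6} \]
\[ P(\Gamma_k Y_k' \in A_k^c) = P(\Gamma_k Y_k' \in A_k^c, C_k'). \tag{5.7} \]
From (5.2) we obtain the equality in
\[ P(\hat{Y} \in \cdot, C^c) \le P(Y_K \in \cdot, K \ge k) = P(Y_k \in \cdot)P(K \ge k), \quad 1 \le k < \infty. \tag{5.8} \]
Let $\Gamma_0$ be a copy of $\Gamma_k$ and be independent of $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$. Then
\[ P(\Gamma_0\hat{Y} \in A_k, C^c) \le P(\Gamma_k Y_k \in A_k)P(K \ge k) \quad \text{[due to (5.8)]} \]
\[ = P(\Gamma_k Y_k \in A_k, C_k)P(K \ge k) \quad \text{[due to (5.6)]} \]
\[ \le P(C_k)P(K \ge k) \]
\[ = P(K = k), \]
and thus
\[ P\Big(\Gamma_0\hat{Y} \in \bigcup_{n \le k < \infty} A_k,\ C^c\Big) \le P(n \le K < \infty) \to 0 \quad \text{as } n \to \infty. \]
Put $A = \limsup_{k\to\infty} A_k$ to obtain
\[ P(\Gamma_0\hat{Y} \in A, C^c) = 0. \]
Since $\Gamma_0$ is independent of $(\hat{Y}, C^c)$ and has a distribution that has the same
null sets as $\mu$, we can write this as
\[ \int_G P(\gamma\hat{Y} \in A, C^c)\,\mu(d\gamma) = 0. \]
Since $\liminf_{k\to\infty} A_k^c$ is the complement of $\limsup_{k\to\infty} A_k$, we obtain
similarly [using (5.7) rather than (5.6)] that
\[ \int_G P(\gamma\hat{Y}' \in A^c, C'^c)\,\mu(d\gamma) = 0. \]
Thus (5.1) holds, and the proof of Theorem 5.1 is complete.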
6 Invariant σ-Algebra and Equivalences
In this section we introduce the invariant $\sigma$-algebra and extend the set of
equivalences for shift-coupling (Chapter 5, Section 5) to the transformation
coupling case.
6.1 The Invariant σ-Algebra
The invariant $\sigma$-algebra is defined as follows:
\[ \mathcal{I} = \{A \in \mathcal{E} : \gamma^{-1}A = A,\ \gamma \in G\}. \]
The following observation is useful [note that here we really do not need $\Gamma$
to be measurable, nor the transformation class $G$ to be a semigroup].
Lemma 6.1. Let $Y$ be a random element in $(E, \mathcal{E})$. If $\Gamma$ is a random
transformation, then
\[ \{\Gamma Y \in A\} = \{Y \in A\}, \quad A \in \mathcal{I}. \]
Proof. With $A \in \mathcal{I}$ and $\gamma \in G$ we have
\[ \{\Gamma Y \in A, \Gamma = \gamma\} = \{\gamma Y \in A, \Gamma = \gamma\} = \{Y \in \gamma^{-1}A, \Gamma = \gamma\} = \{Y \in A, \Gamma = \gamma\}. \]
Taking the union over $\gamma \in G$ yields the desired result. □
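Lemma 6.1 can be checked exhaustively in a toy setting. The sketch below assumes, for illustration only, that $E$ is the set of binary tuples of length 4 and $G$ the cyclic rotations; the set of tuples with exactly two ones is then invariant, while the set of tuples starting with a one is not. The rotation amount `gamma(y)` is an arbitrary, hypothetical choice that depends on the outcome, mimicking a random transformation.

```python
from itertools import product


def rotate(y, k):
    """Cyclic rotation of the tuple y by k positions."""
    k %= len(y)
    return y[k:] + y[:k]


def in_invariant_set(y):
    """A = {exactly two ones}; rotations preserve the number of ones."""
    return sum(y) == 2


def in_noninvariant_set(y):
    """{first coordinate is one}; this set is moved by rotations."""
    return y[0] == 1


def gamma(y):
    """An outcome-dependent rotation amount (an arbitrary illustrative rule)."""
    return 2 * y[0] + y[1]


paths = list(product((0, 1), repeat=4))

# Lemma 6.1: for invariant A, {Gamma Y in A} and {Y in A} coincide outcome by outcome.
invariant_ok = all(
    in_invariant_set(rotate(y, gamma(y))) == in_invariant_set(y) for y in paths
)

# For a set that is not invariant the identity can fail.
noninvariant_ok = all(
    in_noninvariant_set(rotate(y, gamma(y))) == in_noninvariant_set(y) for y in paths
)
```

The identity holds for the invariant set whatever rule is used for `gamma`, which reflects the remark that measurability of $\Gamma$ plays no role in the lemma.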
6.2 The Inequality
The following result explains what the invariant $\sigma$-algebra has to do with
transformation coupling.
Theorem 6.1. Let $(\hat{Y}, \hat{Y}')$ be a coupling of random elements $Y$ and $Y'$
in a general space $(E, \mathcal{E})$. Suppose $(G, \mathcal{G})$ is a measurable semigroup acting
jointly measurably on $(E, \mathcal{E})$. If $C$ is a transformation coupling event, then
it is an $\mathcal{I}$-coupling event. If $C$ and $C'$ are distributional transformation
coupling events, then they are distributional $\mathcal{I}$-coupling events. In both cases
\[ \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| \le 2P(C^c). \tag{6.1} \]
Proof. For $A \in \mathcal{I}$ we have, due to Lemma 6.1,
\[ \{\hat{Y} \in A\} \cap C = \{\Gamma\hat{Y} \in A\} \cap C, \]
\[ \{\hat{Y}' \in A\} \cap C' = \{\Gamma'\hat{Y}' \in A\} \cap C'. \]
In the nondistributional case the right-hand sides are identical, and in the
distributional case they have the same probability. And (6.1) is the
$\mathcal{I}$-coupling event inequality (Theorem 7.3 in Chapter 4). □
6.3 Maximally Successful Transformation Coupling
We shall now give conditions under which the coupling in Theorem 5.1
yields equality in (6.1). Thus there is, under these conditions, a maximally
successful transformation coupling (attaining the supremum of the success
probabilities over all transformation couplings), and
\[ \text{maximal success probability} = \|P(Y \in \cdot)|_{\mathcal{I}} \wedge P(Y' \in \cdot)|_{\mathcal{I}}\|. \]
Theorem 6.2. Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$.
Let $(G, \mathcal{G})$ be a jointly measurable semigroup acting jointly measurably on
$(E, \mathcal{E})$. Suppose
there is a finite or $\sigma$-finite right-invariant measure $\lambda$ on $(G, \mathcal{G})$
and suppose one of the following conditions holds:
(i) $G$ is a group,
(ii) $\lambda(G) < \infty$,
(iii) $G$ is normal [that is, $G\gamma = \gamma G$ for all $\gamma \in G$],
(iv) $G\eta\varphi \subseteq G\eta$ for all $\eta$ and $\varphi \in G$.
Then the distributional transformation coupling $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ of $Y$
and $Y'$ in Theorem 5.1 with $\mu = \lambda$ is such that $C$ and $C'$ are maximal
distributional $\mathcal{I}$-coupling events, that is,
\[ \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| = 2P(C^c). \tag{6.2} \]
Comment. Note that (iii) holds when $G$ is a group and when $G$ is Abelian
[that is, $\gamma\eta = \eta\gamma$ for all $\gamma$ and $\eta \in G$]. Also note that (iii) implies (iv) since
[due to the semigroup property] $\varphi G \subseteq G$ and thus $\eta\varphi G \subseteq \eta G$, which
[together with (iii)] yields $G\eta\varphi \subseteq G\eta$.
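Proof. Let $(\hat{Y}, \hat{Y}', \Gamma, \Gamma', C, C')$ be as in Theorem 5.1 with $\mu = \lambda$. Thus
there is a set $A \in \mathcal{E}$ such that
\[ \int_G P(\gamma\hat{Y} \in A, C^c)\,\lambda(d\gamma) = 0 = \int_G P(\gamma\hat{Y}' \in A^c, C'^c)\,\lambda(d\gamma), \]
which we can rewrite as
\[ E\Big[\int_G 1_{\{\gamma\hat{Y} \in A\}}\,\lambda(d\gamma); C^c\Big] = 0 = E\Big[\int_G 1_{\{\gamma\hat{Y}' \in A^c\}}\,\lambda(d\gamma); C'^c\Big]. \tag{6.3} \]
Case (i): suppose $G$ is a group. Put
\[ B := \Big\{y \in E : \int_G 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) > 0\Big\}. \]
Then $B^c \subseteq \{y \in E : \int_G 1_{\{\gamma y \in A^c\}}\,\lambda(d\gamma) > 0\}$, and (6.3) yields
\[ P(\hat{Y} \in B, C^c) = 0 \quad \text{and} \quad P(\hat{Y}' \in B^c, C'^c) = 0. \tag{6.4} \]
Further, for $\varphi \in G$,
\[ y \in \varphi^{-1}B \iff \varphi y \in B \iff \int_G 1_{\{\gamma\varphi y \in A\}}\,\lambda(d\gamma) > 0 \]
\[ \iff \int_{G\varphi} 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) > 0 \quad \text{[since } \lambda \text{ is right-invariant]}. \tag{6.5} \]
Since $G$ is a group, we have $G\varphi = G$, which yields, together with (6.5), the
first equivalence in
\[ y \in \varphi^{-1}B \iff \int_G 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) > 0 \iff y \in B. \tag{6.6} \]
Thus $B \in \mathcal{I}$, which, together with (6.4), implies mutual singularity on $\mathcal{I}$,
that is, (6.2) holds.
Case (ii): suppose $\lambda(G) < \infty$. First note that (6.5) and (6.4) still hold.
Then note that [due to the right-invariance] $\lambda(G\varphi) = \lambda(G)$, and thus [due
to $\lambda(G) < \infty$] we have $\lambda(G \setminus G\varphi) = 0$ [since the difference set of two sets of
full finite measure has measure zero], which together with (6.5) yields the
first equivalence in (6.6). Thus $B \in \mathcal{I}$, which, together with (6.4), implies
mutual singularity of $P(\hat{Y} \in \cdot, C^c)|_{\mathcal{I}}$ and $P(\hat{Y}' \in \cdot, C'^c)|_{\mathcal{I}}$, that is, (6.2)
holds.
Case (iii): suppose $G$ is normal. We observed in the comment
immediately after the theorem that (iii) implies (iv), and thus Case (iii) is a
special case of Case (iv).
Case (iv): suppose $G\eta\varphi \subseteq G\eta$ for all $\eta$ and $\varphi \in G$. Redefine
\[ B := \Big\{y \in E : \int_G 1_{\{\int_{G\eta} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) = 0\Big\}, \]
and observe that $y \in B$ implies that there is at least one $\eta$ such that
$\int_{G\eta} 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) > 0$ and thus [since $G\eta \subseteq G$]
\[ B \subseteq \Big\{y \in E : \int_G 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) > 0\Big\}. \tag{6.7} \]
Note also that $y \in B^c$ implies that there is at least one $\eta$ such that
$\int_{G\eta} 1_{\{\gamma y \in A\}}\,\lambda(d\gamma) = 0$, which in turn implies $\int_{G\eta} 1_{\{\gamma y \in A^c\}}\,\lambda(d\gamma) > 0$ and
thus [since $G\eta \subseteq G$]
\[ B^c \subseteq \Big\{y \in E : \int_G 1_{\{\gamma y \in A^c\}}\,\lambda(d\gamma) > 0\Big\}. \tag{6.8} \]
The two inclusions (6.7) and (6.8) together with (6.3) yield that (6.4) also
holds with this definition of $B$. Hence in order to complete the proof it
only remains to show that $B \in \mathcal{I}$. For that purpose take $\varphi \in G$. Since [by
assumption] $G\eta\varphi \subseteq G\eta$, we have
\[ \int_G 1_{\{\int_{G\eta\varphi} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) \ge \int_G 1_{\{\int_{G\eta} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta). \tag{6.9} \]
Conversely, since $\lambda$ is right-invariant, we have
\[ \int_G 1_{\{\int_{G\eta\varphi} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) = \int_{G\varphi} 1_{\{\int_{G\eta} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta), \]
which together with $G\varphi \subseteq G$ and (6.9) implies
\[ \int_G 1_{\{\int_{G\eta\varphi} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) = \int_G 1_{\{\int_{G\eta} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta). \]
This yields the final equivalence in
\[ y \in \varphi^{-1}B \iff \varphi y \in B \]
\[ \iff \int_G 1_{\{\int_{G\eta} 1_{\{\gamma\varphi y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) = 0 \]
\[ \iff \int_G 1_{\{\int_{G\eta\varphi} 1_{\{\gamma y \in A\}}\lambda(d\gamma) = 0\}}\,\lambda(d\eta) = 0 \quad [\lambda \text{ is right-invariant}] \]
\[ \iff y \in B. \]
Hence $B \in \mathcal{I}$, which, together with (6.4), implies mutual singularity of
$P(\hat{Y} \in \cdot, C^c)|_{\mathcal{I}}$ and $P(\hat{Y}' \in \cdot, C'^c)|_{\mathcal{I}}$, that is, (6.2) holds. □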
6.4 The Cesaro Total Variation Result
In the following theorem we impose a maximal set of conditions (all the
conditions encountered up to now).
Theorem 6.3. Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$.
Let $(G, \mathcal{G})$ be a jointly measurable semigroup acting jointly measurably on
$(E, \mathcal{E})$. Suppose
there exists a finite or σ-finite right-invariant measure $\lambda$ on $(G, \mathcal{G})$
and let, for each $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, $U_B$ have distribution
$\lambda(\cdot|B)$ and be independent of $Y$ and $Y'$.
If $\lambda(G) < \infty$, then
$$\|P(U_G Y \in \cdot) - P(U_G Y' \in \cdot)\| = \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|.$$
If $\lambda(G) = \infty$ and there exist Følner averaging sets $B_h \in \mathcal{G}$, $0 < h < \infty$,
then as $h \to \infty$,
$$\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|,$$
provided that one of the conditions
$G$ is a group,
$G$ is normal [that is, $G\gamma = \gamma G$ for all $\gamma \in G$],
$G\eta\varphi \subseteq G\eta$ for all $\eta$ and $\varphi \in G$,
holds.
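Proof. If $\lambda(G) < \infty$, apply the transformation coupling inequality
(Theorem 4.1) with $B = G$ to the transformation coupling in Theorem 6.2 to
obtain [see (4.4)]
$$\|P(U_G Y \in \cdot) - P(U_G Y' \in \cdot)\| \le \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|.$$
By Lemma 6.1 we have
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| = \|P(U_G Y \in \cdot)|_{\mathcal{I}} - P(U_G Y' \in \cdot)|_{\mathcal{I}}\|,$$
and since the right-hand side is at most $\|P(U_G Y \in \cdot) - P(U_G Y' \in \cdot)\|$, we
obtain the reversed inequality
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| \le \|P(U_G Y \in \cdot) - P(U_G Y' \in \cdot)\|.$$
If $\lambda(G) = \infty$, apply the transformation coupling inequality with $B = B_h$
to the transformation coupling in Theorem 6.2 and send $h \to \infty$ to obtain
$$\limsup_{h \to \infty} \|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \le \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|.$$
By Lemma 6.1 we have, for $0 < h < \infty$,
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| = \|P(U_{B_h} Y \in \cdot)|_{\mathcal{I}} - P(U_{B_h} Y' \in \cdot)|_{\mathcal{I}}\|,$$
and since the right-hand side is at most $\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\|$,
we have
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| \le \liminf_{h \to \infty} \|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\|.$$
These two inequalities yield the desired result. □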
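The finite case of Theorem 6.3 can be checked by hand on a toy example. The sketch below is only an illustration under assumptions that are not in the text: E = {0,...,5}, G = {0, 2, 4} acting by addition modulo 6 (a finite group, so counting measure is a finite right-invariant measure), two arbitrarily chosen distributions p and q, and the total variation norm taken as the sum of absolute differences (so that its maximum is 2, in line with bounds of the form 2P(C^c)). The invariant sets are then the unions of orbits.

```python
from fractions import Fraction

# Hypothetical toy setting (not from the text).
E = range(6)
G = (0, 2, 4)

def act(g, y):
    """Action of the group element g on the point y."""
    return (g + y) % 6

def averaged(p):
    """Distribution of U_G Y when Y has distribution p and U_G is uniform on G."""
    out = {y: Fraction(0) for y in E}
    for y in E:
        for g in G:
            out[act(g, y)] += p[y] / len(G)
    return out

def orbits():
    """Orbits of G in E; the invariant sets are the unions of orbits."""
    return {frozenset(act(g, y) for g in G) for y in E}

def tv_norm(p, q):
    """Total variation norm of p - q (sum of absolute differences)."""
    return sum(abs(p[y] - q[y]) for y in E)

def tv_norm_invariant(p, q):
    """Total variation norm of p - q restricted to the invariant sets."""
    return sum(abs(sum(p[y] - q[y] for y in orb)) for orb in orbits())

# Two arbitrary distributions on E, written in quarters.
p = {y: Fraction(w, 4) for y, w in enumerate([2, 2, 0, 0, 0, 0])}
q = {y: Fraction(w, 4) for y, w in enumerate([0, 1, 0, 0, 3, 0])}

lhs = tv_norm(averaged(p), averaged(q))   # || P(U_G Y in .) - P(U_G Y' in .) ||
rhs = tv_norm_invariant(p, q)             # || P(Y in .)|_I - P(Y' in .)|_I ||
```

In this example both sides equal 1/2, while the unaveraged distance `tv_norm(p, q)` is 3/2: averaging over the group removes exactly the part of the discrepancy that the invariant sets cannot see.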
Remark 6.1. By a similar argument we can obtain the inequality (6.1)
directly:
||P(Y g ok - P(r g OHI < HP^r eO- P(uBhY' g oil
^ 2E[1 - X(Bhr\Bh); C] + 2E[X{Bhr'\Bh); C] + 2P(Ce)
-> 2P(CC), h -> oo,
without the concept of coupling with respect to a σ-algebra, but under
unnecessarily strong conditions.
6.5 Equivalences
We can now tie together transformation coupling, Cesàro total variation
convergence, and the invariant σ-algebra as follows.
Theorem 6.4. Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$.
Let $(G, \mathcal{G})$ be a jointly measurable semigroup acting jointly measurably on
$(E, \mathcal{E})$. Suppose
there exists a finite or σ-finite right-invariant measure $\lambda$ on $(G, \mathcal{G})$
and let, for each $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, $U_B$ have distribution
$\lambda(\cdot|B)$ and be independent of $Y$ and $Y'$. Suppose further that one of the
following conditions holds:
$G$ is a group,
$\lambda(G) < \infty$,
$G$ is normal [that is, $G\gamma = \gamma G$ for all $\gamma \in G$],
$G\eta\varphi \subseteq G\eta$ for all $\eta$ and $\varphi \in G$.
Then the following two statements are equivalent:
(a) There is a successful distributional transformation coupling of $Y$, $Y'$;
(c) $P(Y \in \cdot)|_{\mathcal{I}} = P(Y' \in \cdot)|_{\mathcal{I}}$.
If $\lambda(G) < \infty$, then the equivalent statements (a) and (c) are equivalent to
(b - finite case) $U_G Y \overset{D}{=} U_G Y'$.
If $\lambda(G) = \infty$ and there exist Følner averaging sets $B_h \in \mathcal{G}$, $0 < h < \infty$,
then the equivalent statements (a) and (c) are equivalent to
(b - infinite case) $\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to 0$ as $h \to \infty$.
Moreover, these equivalent statements are equivalent to the existence of a
successful nondistributional transformation coupling if either there exists
a weak-sense-regular conditional distribution of $\hat Y'$ given $\hat\Gamma'\hat Y'$ and $\mathcal{E} \otimes \mathcal{E}$
contains the diagonal $\{(y,y) : y \in E\}$ [this holds when $(E, \mathcal{E})$ is Polish] or
$(G, \mathcal{G})$ is an inverse-measurable group and there exists a weak-sense-regular
conditional distribution of $\hat\Gamma'$ given $\hat\Gamma'\hat Y'$ [this holds when $(G, \mathcal{G})$ is Polish].
Proof. By Theorem 6.1, (a) implies (c). By Theorem 6.2, (c) implies
(a). By the transformation coupling inequality [see (4.6) when $\lambda(G) < \infty$
and (4.7) when $\lambda(G) = \infty$], (a) implies (b). By Theorem 6.3, (b) implies
(c). The nondistributional coupling claim is due to the final statement of
Theorem 6.2. □
The (almost) omnipresent condition that there exists a right-invariant
measure on $(G, \mathcal{G})$ is annoying in its abstractness. It holds, however, when $G$
is a locally compact second countable topological group (and $\mathcal{G}$ its Borel
subsets) or a subsemigroup of such a group. In the next section we spell
out the streamlined theory that results from this fact in the group case.
7 Topological Transformation Groups
This section collects the transformation coupling results of Sections 3-6 in
the special case when G is a locally compact second countable topological
group. In this case all the conditions needed in Sections 3-6 hold, except
the existence of Følner averaging sets.
7.1 Preliminaries
Let $G$ be a class of measurable mappings (transformations) from a
measurable space $(E, \mathcal{E})$ to itself. Assume that $G$ is a topological group, that is, in
addition to the group properties
$\gamma$ and $\eta \in G \Rightarrow \gamma\eta \in G$,
$\gamma \in G \Rightarrow \gamma$ has an inverse $\gamma^{-1} \in G$,
let $G$ have a topology with respect to which
the mapping from $G \times G$ (with the product topology)
to $G$ taking $(\gamma, \eta)$ to $\gamma\eta$ and the mapping from $G$ to $G$
taking $\gamma$ to $\gamma^{-1}$ are both continuous.
Let $\mathcal{G}$ be the Borel subsets of $G$ and note that $(G, \mathcal{G})$ is both jointly
measurable and inverse-measurable.
Assume further that $G$ is locally compact and second countable. Then,
according to the following two theorems, there exists a finite or σ-finite
right-invariant measure $\lambda$ on $(G, \mathcal{G})$ and $(G, \mathcal{G})$ is Polish.
Fact 7.1. A locally compact topological group possesses right- and
left-invariant measures, the right and left Haar measures. Further, a locally
compact second countable topological group is either compact, in which case
the Haar measures are finite, or σ-compact, in which case the Haar
measures are σ-finite.
For a proof, see Halmos (1950), pages 254 and 256.
Fact 7.2. A locally compact first countable topological group has an
invariant metric inducing the topology. Further, a locally compact second
countable metric space is separable and topologically complete (that is, has a
topologically equivalent metric that is complete).
For a proof, see Montgomery and Zippin (1955), page 34; Bourbaki (1948),
page 25; and Bourbaki (1951), page 27. (In fact, any locally compact second
countable topological space has a metric inducing the topology and is either
compact or σ-compact; see Ash (1972).)
Finally, assume that $(G, \mathcal{G})$ acts jointly measurably on $(E, \mathcal{E})$, let $Y$ and $Y'$
be random elements in $(E, \mathcal{E})$, and for $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, let
$U_B$ have distribution $\lambda(\cdot|B)$ and be independent of $Y$ and $Y'$.
7.2 Transforming Y into a Copy of Y' and Vice Versa
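Recall that $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', \hat C)$ is a transformation coupling of $Y$ and $Y'$ if
$$\hat\Gamma \hat Y = \hat\Gamma' \hat Y' \text{ on } \hat C.$$
Since $G$ is a group, this can be written as
$$\hat\Phi \hat Y = \hat Y' \text{ on } \hat C, \qquad (7.1)$$
where
$$\hat\Phi = \hat\Gamma'^{-1} \hat\Gamma \quad [\text{here } \hat\Gamma'^{-1} \text{ denotes the group inverse of } \hat\Gamma'].$$
Also, we can write this as
$$\hat Y = \hat\Phi' \hat Y' \text{ on } \hat C, \text{ where } \hat\Phi' = \hat\Phi^{-1}.$$
Thus transformation coupling means that on $\hat C$ each random element can
be transformed into the other and vice versa.
Recall that $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', \hat C, \hat C')$ is a distributional transformation
coupling of $Y$ and $Y'$ if
$$P(\hat\Gamma \hat Y \in \cdot, \hat C) = P(\hat\Gamma' \hat Y' \in \cdot, \hat C').$$
Since $(G, \mathcal{G})$ is Polish, Theorem 3.1 gives us the existence of an unhatted
version of $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', \hat C, \hat C')$, and Theorem 3.2 gives us the existence of
a nondistributional version.
This means that we can first turn a distributional transformation
coupling into a nondistributional one, then write it in the form (7.1), and finally
unhat (7.1). That is, any transformation coupling $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', \hat C, \hat C')$,
distributional or not, has an unhatted version $(Y, Y', \Phi, C, C')$ where $\Phi$ is a
random transformation and $C$ and $C'$ are events such that
$$(Y, 1_C) \overset{D}{=} (\hat Y, 1_{\hat C}) \text{ and } (Y', 1_{C'}) \overset{D}{=} (\hat Y', 1_{\hat C'})$$
and
$$P(\Phi Y \in \cdot, C) = P(Y' \in \cdot, C').$$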
In particular, any successful transformation coupling $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma')$,
distributional or not, has an unhatted version $(Y, Y', \Phi)$ such that
$$\Phi Y \overset{D}{=} Y'. \qquad (7.2)$$
Thus a successful transformation coupling means that the original $Y$ can
be transformed into a copy of the original $Y'$ and vice versa.
7.3 Transformation Coupling Inequality
A transformation coupling (distributional or not) can always be
transformed to the form (7.2), that is, one of the coupling transformations can
be taken to be the identity mapping. Thus one of the terms in the
transformation coupling inequality (Theorem 4.1) disappears and we obtain the
following simplification (the transformation coupling inequality): for $B \in \mathcal{G}$ such that $0 < \lambda(B) < \infty$, it holds that
$$\|P(U_B Y \in \cdot) - P(U_B Y' \in \cdot)\| \le 2E[1 - \lambda(B\Phi|B); C] + 2P(C^c).$$
If there exists a successful transformation coupling (distributional or not),
then this transformation coupling inequality simplifies further to
$$\|P(U_B Y \in \cdot) - P(U_B Y' \in \cdot)\| \le 2E[1 - \lambda(B\Phi|B)].$$
If $G$ is compact (that is, $\lambda(G) < \infty$), then the inequality yields [take $B = G$
and use $G\Phi = G$ to obtain $E[1 - \lambda(G\Phi|G)] = 0$]
$$U_G Y \overset{D}{=} U_G Y'.$$
If $G$ is σ-compact (that is, $\lambda(G) = \infty$) and there exist Følner averaging
sets $B_h \in \mathcal{G}$, $0 < h < \infty$, then
$$\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to 0, \quad h \to \infty.$$
Groups possessing Følner averaging sets are called amenable.
7.4 Maximality — Invariant σ-Algebra — Equivalences
According to Theorem 5.1, for any finite or σ-finite measure $\mu$ on $(G, \mathcal{G})$,
in particular for $\mu = \lambda$, there exists a nondistributional transformation
coupling $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', C)$ of $Y$ and $Y'$ such that
$$\int_G P(\gamma \hat Y \in \cdot, C^c)\,\mu(d\gamma) \perp \int_G P(\gamma \hat Y' \in \cdot, C^c)\,\mu(d\gamma). \qquad (7.3)$$
According to Theorem 6.1, a (distributional) transformation coupling event
$C$ is a (distributional) $\mathcal{I}$-coupling event, and
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| \le 2P(C^c).$$
According to Theorem 6.2 and Theorem 3.2, there exists a nondistributional
transformation coupling with event $C$ such that
$$\|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\| = 2P(C^c).$$
According to Theorem 6.3, if $G$ is compact (that is, $\lambda(G) < \infty$), then
$$\|P(U_G Y \in \cdot) - P(U_G Y' \in \cdot)\| = \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|,$$
and if $G$ is σ-compact (that is, $\lambda(G) = \infty$) and there exist Følner averaging
sets $B_h \in \mathcal{G}$, $0 < h < \infty$, then as $h \to \infty$,
$$\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to \|P(Y \in \cdot)|_{\mathcal{I}} - P(Y' \in \cdot)|_{\mathcal{I}}\|.$$
According to Theorem 6.4 and Section 7.2, the following statements are
equivalent:
(a) There is a successful distributional transformation coupling of $Y$, $Y'$.
(a') There exists a random transformation $\Phi$ such that $\Phi Y \overset{D}{=} Y'$.
(c) $P(Y \in \cdot)|_{\mathcal{I}} = P(Y' \in \cdot)|_{\mathcal{I}}$.
If $G$ is compact (that is, $\lambda(G) < \infty$), then the equivalent statements (a),
(a'), and (c) are equivalent to
(b - finite case) $U_G Y \overset{D}{=} U_G Y'$.
If $G$ is σ-compact (that is, $\lambda(G) = \infty$) and there exist Følner averaging
sets $B_h \in \mathcal{G}$, $0 < h < \infty$, then the equivalent statements (a), (a'), and (c)
are equivalent to
(b - infinite case) $\|P(U_{B_h} Y \in \cdot) - P(U_{B_h} Y' \in \cdot)\| \to 0$ as $h \to \infty$.
Finally, if the distribution of $Y'$ is invariant under $G$ [$\gamma Y' \overset{D}{=} Y'$ for $\gamma \in G$],
then (b) can be rewritten as
$U_G Y \overset{D}{=} Y'$ when $G$ is compact [$\lambda(G) < \infty$],
$U_{B_h} Y \overset{tv}{\to} Y'$ as $h \to \infty$ when $G$ is σ-compact [$\lambda(G) = \infty$].
It follows from the equivalence of (c) and (b) that two random elements
with distributions that are invariant under G agree in distribution on I if
and only if they have the same distribution.
8 Self-Similarity - Exchangeability - Rotation
The streamlined group theory of the previous section has many potential
applications that are largely unexplored. In this section we shall indicate
applications in a few fields, but these are only stumbling first steps.
8.1 Application to Brownian Motion — Self-Similarity
Let $W = (W_t)_{t \in [0,\infty)}$ be a standard Brownian motion (or standard Wiener
process). This means that $W$ is a one-sided continuous-time real-valued
stochastic process with continuous paths and independent increments [that
is, the increments $W_{t_1} - W_{t_2}, \ldots, W_{t_{n-1}} - W_{t_n}$ are independent for all $n > 0$
and $0 = t_1 < \cdots < t_n$] and
$W_t$ is normal with $E[W_t] = 0$ and $\mathrm{Var}[W_t] = t$, $t \in [0, \infty)$.
Then $W$ is self-similar in the sense that
$$\gamma_r W \overset{D}{=} W, \quad 0 < r < \infty,$$
where $\gamma_r$ is the rescaling defined for a path $z = (z_t)_{t \in [0,\infty)}$ by
$$\gamma_r z = (r^{1/2} z_{t/r})_{t \in [0,\infty)}.$$
Clearly, $\gamma_r \gamma_s = \gamma_{rs}$ and $\gamma_r^{-1} = \gamma_{1/r}$, $0 < r < \infty$, that is,
$$G = \{\gamma_r : 0 < r < \infty\}$$
is a group that can be identified with $(0, \infty)$ under multiplication, which in
turn can be identified with $\mathbb{R}$ under addition. Thus we are in the framework
of the previous section.
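The group relations for the rescalings can be checked numerically on a single path. The following is a minimal sketch, not part of the original text: paths are represented as Python functions of t, and the names `rescale` and `z`, the test path, and the values of r and s are all hypothetical choices made for illustration.

```python
import math

def rescale(r):
    """Return gamma_r, which maps a path z to the path t -> sqrt(r) * z(t / r)."""
    def gamma_r(z):
        return lambda t: math.sqrt(r) * z(t / r)
    return gamma_r

def z(t):
    """A hypothetical deterministic test path on [0, infinity)."""
    return math.sin(t) + 0.5 * t

times = [0.0, 0.3, 1.0, 2.5, 7.0]
r, s = 2.0, 3.5

composed = rescale(r)(rescale(s)(z))            # gamma_r gamma_s z
direct = rescale(r * s)(z)                      # gamma_{rs} z
round_trip = rescale(1.0 / r)(rescale(r)(z))    # gamma_{1/r} gamma_r z

def close(f, g):
    """Compare two paths on the sample times, up to floating-point error."""
    return all(math.isclose(f(t), g(t), rel_tol=1e-9, abs_tol=1e-12) for t in times)
```

Here `close(composed, direct)` and `close(round_trip, z)` both hold, which is the pathwise content of the two relations. The self-similarity of W itself is a distributional statement and is not tested by this deterministic check.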
Let $Z$ be another one-sided continuous-time real-valued process with
continuous paths. According to Section 7.4 [the equivalence of (a') and
(c)], $Z$ and $W$ have the same distribution on measurable sets that are
invariant under the rescalings $\gamma_r$, $0 < r < \infty$, if and only if there exists a
strictly positive finite random variable $R$ such that
$$\gamma_R Z \overset{D}{=} W.$$
Further, according to Section 7.4 [the final claim] and since $W$ is self-similar,
the above equivalent claims hold if and only if
$$\gamma_{R_h} Z \overset{tv}{\to} W, \quad h \to \infty,$$
for $R_h = e^{U_{B_h}}$, where $B_h \in \mathcal{B}$, $0 < h < \infty$, is any family of Følner averaging
sets with respect to the Lebesgue measure $\lambda$ (for instance, $B_h = hB$ where
$B \in \mathcal{B}$ is such that $0 < \lambda(B) < \infty$; see Theorem 2.1) and $U_{B_h}$ is uniform
on $B_h$ and independent of $Z$.
8.2 Application in Exchangeability
Let $Z$ and $Z'$ be one-sided discrete-time stochastic processes on a general
state space. For a path $z = (z_k)_0^\infty$ and with $p$ a finite permutation of
$\{0, 1, \ldots\}$ define the exchange $\pi$ by
$$\pi z = (z_{p(k)})_{k=0}^\infty.$$
The exchangeable σ-algebra consists of measurable sets invariant under
such finite exchanges. A stochastic process is called exchangeable if its
distribution is invariant under finite exchanges [that is, $\pi Z' \overset{D}{=} Z'$ for finite
exchanges $\pi$].
According to Section 7.4 [the equivalence of (a') and (c)], $Z$ and $Z'$ have
the same distribution on the exchangeable σ-algebra if and only if there
exists a finite random exchange $\Pi$ such that
$$\Pi Z \overset{D}{=} Z' \quad \text{(permutation coupling)}.$$
Further, if $Z'$ is exchangeable, then according to Section 7.4 [the final
claim], the above equivalent claims hold if and only if
$$U_n Z \overset{tv}{\to} Z', \quad n \to \infty,$$
where $U_n$ is the random exchange associated with a uniformly distributed
random permutation of $\{0, \ldots, n\}$ that is independent of $Z$.
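To make the exchange maps concrete, here is a small sketch. It is only an illustration: paths are represented as functions k -> z_k, a finite permutation of {0,...,n} is given as a list p, and the helper names `exchange` and `uniform_exchange` and the test path are hypothetical.

```python
import random

def exchange(p):
    """Exchange pi for a permutation p of {0, ..., n}, given as a list.

    (pi z)_k = z_{p(k)} for k <= n; indices above n are left fixed.
    """
    def pi(z):
        return lambda k: z(p[k]) if k < len(p) else z(k)
    return pi

def uniform_exchange(n, rng):
    """Exchange associated with a uniformly distributed permutation of {0, ..., n}."""
    p = list(range(n + 1))
    rng.shuffle(p)
    return exchange(p)

def z(k):
    """A hypothetical deterministic path z_k = 10 k."""
    return 10 * k

pz = exchange([2, 0, 1])(z)                       # p(0)=2, p(1)=0, p(2)=1
uz = uniform_exchange(4, random.Random(0))(z)     # random exchange of {0,...,4}
```

For instance, `[pz(k) for k in range(5)]` is `[20, 0, 10, 30, 40]`: the first three coordinates are permuted and the rest are untouched. The random exchange `uz` only rearranges the first five coordinates, whichever permutation the generator happens to draw.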
It should be possible to extend this result to one-sided continuous-time
real-valued stochastic processes with right-continuous paths having left-
hand limits: replace finite permutations by splitting a finite interval into
finitely many subintervals (open to the right and closed to the left) and
permuting them. A similar comment applies to random fields.
8.3 Rotational Invariance
Let
$$Z = (Z_s)_{s \in \mathbb{R}^d} \text{ and } Z' = (Z'_s)_{s \in \mathbb{R}^d}$$
be random fields with a general state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$
and site set $\mathbb{R}^d$, where $d \ge 1$. Let $\mathbb{S}$ be the rotation group, the group of
orthogonal real $d \times d$ matrices with determinant 1 (if we allow determinant
$-1$, then we get reflections also). This is a compact topological group. Here
let $rt$ denote a rotation of $t \in \mathbb{R}^d$ by $r \in \mathbb{S}$. Define the rotation maps $\rho_r$,
$r \in \mathbb{S}$, by
$$\rho_r z = (z_{rt})_{t \in \mathbb{R}^d}, \quad z \in H.$$
Assume that $Z$ and $Z'$ are rotation measurable, that is, $\rho_r H = H$, $r \in \mathbb{S}$, and
the mapping taking $(z, r)$ in $H \times \mathbb{S}$ to $\rho_r z$ in $H$ is $\mathcal{H} \otimes \mathcal{B}(\mathbb{S})/\mathcal{H}$ measurable.
The rotation maps $G = \{\rho_r : r \in \mathbb{S}\}$ form a group, which can be identified
with $\mathbb{S}$.
According to Section 7.4 [the equivalence of (a') and (c)], $Z$ and $Z'$
have the same distribution on measurable sets that are invariant under the
rotation maps $\rho_r$, $r \in \mathbb{S}$, if and only if there exists a random rotation $R$ in
$(\mathbb{S}, \mathcal{B}(\mathbb{S}))$ such that
$$\rho_R Z \overset{D}{=} Z' \quad \text{(rotation coupling)}.$$
Further, if the distribution of $Z'$ is invariant under rotations [$\rho_r Z' \overset{D}{=} Z'$ for
$r \in \mathbb{S}$], then according to Section 7.4 [the final claim], the above equivalent
claims hold if and only if
$$\rho_U Z \overset{D}{=} Z'$$
where $U$ is uniformly distributed on $(\mathbb{S}, \mathcal{B}(\mathbb{S}))$.
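The rotation maps can be illustrated in the plane. The sketch below assumes d = 2, represents a field as a function of a site t = (t_1, t_2), and parametrizes rotations by an angle; the two test fields and all names are hypothetical and serve only to illustrate how the rotation maps compose and what invariance under them looks like pathwise.

```python
import math

def rotation(angle):
    """Rotation r of the plane by `angle` radians, as a map t -> rt."""
    c, s = math.cos(angle), math.sin(angle)
    return lambda t: (c * t[0] - s * t[1], s * t[0] + c * t[1])

def rotation_map(r):
    """The rotation map rho_r, taking a field z to the field t -> z(rt)."""
    return lambda z: (lambda t: z(r(t)))

def linear_field(t):
    """A hypothetical field that is not rotation invariant."""
    return t[0] + 2.0 * t[1]

def radial_field(t):
    """A hypothetical field that depends on the site only through its norm."""
    return math.hypot(t[0], t[1])

sites = [(1.0, 0.0), (0.5, -2.0), (-3.0, 4.0)]
a, b = 0.7, 1.9

def same(f, g):
    """Compare two fields on the sample sites, up to floating-point error."""
    return all(math.isclose(f(t), g(t), abs_tol=1e-9) for t in sites)

composed = rotation_map(rotation(a))(rotation_map(rotation(b))(linear_field))
direct = rotation_map(rotation(a + b))(linear_field)
rotated_radial = rotation_map(rotation(a))(radial_field)
```

On the sample sites, `composed` agrees with `direct`, so the rotation maps inherit the group structure of the rotations, and `rotated_radial` agrees with `radial_field`, a pathwise instance of rotation invariance.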
More generally, the theory of Section 7 applies to finite combinations of
rotations and shifts, and to Lorentz transformations and Poincaré
transformations (relativity).
9 Exact Transformation Coupling
This chapter has up to now been concerned with the generalization of shift-
coupling. In this final section we shall comment briefly on the possible
generalization of exact coupling without going into much detail.
9.1 Exact Transformation Coupling
In the same way as exact coupling is the special case of shift-coupling
when the times are identical [$T = T'$], we can define exact transformation
coupling to be a transformation coupling with identical transformations. In
this case the framework can be made more general, for instance, we need
not necessarily assume that the transformations form a semigroup nor even
that they take values in the same space as the random elements.
Let $Y$ and $Y'$ be random elements in a general space $(E, \mathcal{E})$. Let $G$
be a class of measurable mappings (transformations) from $(E, \mathcal{E})$ to some
measurable space $(E', \mathcal{E}')$ and let $\mathcal{G}$ be a σ-algebra of subsets of $G$.
Call $(\hat Y, \hat Y', \hat\Gamma, C)$ an exact transformation coupling of $Y$ and $Y'$ if $(\hat Y, \hat Y')$
is a coupling of $Y$ and $Y'$, $\hat\Gamma$ is a random transformation in $(G, \mathcal{G})$, and $C$
is an event such that
$$\hat\Gamma \hat Y = \hat\Gamma \hat Y' \text{ on } C. \qquad (9.1)$$
Call $(\hat Y, \hat Y', \hat\Gamma, \hat\Gamma', C, C')$ a distributional exact transformation coupling of
$Y$ and $Y'$ if $(\hat Y, \hat Y')$ is a coupling of $Y$ and $Y'$, $\hat\Gamma$ and $\hat\Gamma'$ are two random
transformations, and $C$ and $C'$ are two events such that
$$P((\hat\Gamma \hat Y, \hat\Gamma) \in \cdot, C) = P((\hat\Gamma' \hat Y', \hat\Gamma') \in \cdot, C'). \qquad (9.2)$$
Note that if $G$ is a group, then $\hat\Gamma \hat Y = \hat\Gamma \hat Y'$ is equivalent to $\hat Y = \hat Y'$, and
thus (9.1) is only a complicated way of writing $\hat Y = \hat Y'$ on $C$, that is, $C$
is just a coupling event of $Y$ and $Y'$. Similarly, when $G$ is a group acting
jointly measurably on $(E, \mathcal{E})$ then (9.2) means only that $C$ and $C'$ are
distributional coupling events of $(\hat Y, \hat\Gamma)$ and $(\hat Y', \hat\Gamma')$, nothing more, nor
less. Thus in order to obtain a nontrivial theory we must now stay away
from groups.
Rather than attempting to build a general theory around this concept we
content ourselves here with considering two ways of applying it to random
fields in d dimensions.
9.2 Random Fields - Exact Coupling - Tail σ-Algebra
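Let
$$Z = (Z_s)_{s \in [0,\infty)^d} \text{ and } Z' = (Z'_s)_{s \in [0,\infty)^d}$$
be random fields with a general state space $(E, \mathcal{E})$ and a general path
space $(H, \mathcal{H})$ and site set $[0, \infty)^d$ where $d \ge 2$. Define the shift maps $\theta_t$,
$t \in [0, \infty)^d$, by
$$\theta_t z = (z_{t+s})_{s \in [0,\infty)^d}, \quad z \in H.$$
Call $(\hat Z, \hat Z', T)$ an exact coupling of $Z$ and $Z'$ if $(\hat Z, \hat Z')$ is a coupling of $Z$
and $Z'$ and $T$ is a $[0, \infty]^d$ valued random site (the coupling site) such that
$$\theta_T \hat Z = \theta_T \hat Z' \text{ if } T \in [0, \infty)^d. \qquad (9.3)$$
Let $\Delta$ be a state external to $E$ (the censoring or cemetery state or womb).
For $t \in [0, \infty]^d \setminus [0, \infty)^d$ define $\theta_t$ by $\theta_t z = (\Delta)_{s \in [0,\infty)^d}$, $z \in H$. Then (9.3)
can be rewritten as
$$\theta_T \hat Z = \theta_T \hat Z'.$$
For a set of sites $A \subseteq [0, \infty)^d$ and $z \in H$, define $\beta_A z$ ($z$ born or observed
inside $A$) by
$$(\beta_A z)_s = \begin{cases} z_s & \text{if } s \in A, \\ \Delta & \text{if } s \notin A. \end{cases}$$
Then (9.3) can also be rewritten as
$$\beta_{T + [0,\infty)^d} \hat Z = \beta_{T + [0,\infty)^d} \hat Z'.$$
Call $(\hat Z, \hat Z', T, T')$ a distributional exact coupling of $Z$ and $Z'$ if $(\hat Z, \hat Z')$ is
a coupling of $Z$ and $Z'$ and $T$ and $T'$ are $[0, \infty]^d$ valued random sites such
that
$$\beta_{T + [0,\infty)^d} \hat Z \overset{D}{=} \beta_{T' + [0,\infty)^d} \hat Z'. \qquad (9.4)$$
When $Z$ and $Z'$ are shift-measurable, then (9.4) can be rewritten as
$$(\theta_T \hat Z, T) \overset{D}{=} (\theta_{T'} \hat Z', T').$$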
The theory of exact coupling in Chapter 4 is easily redone in this d
dimensional setting. Thus we obtain (Theorem 5.1 in Chapter 4) a coupling site
inequality
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \le 2(1 - P(T \le t)), \quad t \in [0, \infty)^d,$$
and that (Theorem 6.1 in Chapter 4) there always exists a distributional
exact coupling maximal at integer diagonal sites, that is, such that for
integers $n \ge 0$,
$$\|P(\theta_{(n,\ldots,n)} Z \in \cdot) - P(\theta_{(n,\ldots,n)} Z' \in \cdot)\| = 2(1 - P(T \le (n, \ldots, n))).$$
Extend the tail σ-algebra to d-dimensions as follows:
$$\mathcal{T} := \bigcap_{t \in [0,\infty)^d} \theta_t^{-1}\mathcal{H},$$
and let $t \to \infty$ denote that all the coordinates of $t$ go to infinity. Then the
following statements are equivalent:
(a) There exists a successful distributional exact coupling of $Z$ and $Z'$.
(b) $\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0$ as $t \to \infty$.
(c) $P(Z \in \cdot)|_{\mathcal{T}} = P(Z' \in \cdot)|_{\mathcal{T}}$.
It should also be possible to work out in a similar way a theory of epsilon-
coupling for random fields with site set [0, oo)d. (A theory of shift-coupling
for random fields with site set [0, oo)d is already implicit in Sections 3
through 6.)
9.3 Random Fields — Remote Coupling — Remote σ-Algebra
Another way of extending the exact coupling of one-sided stochastic
processes to random fields is to drop the semigroup of shift maps and rather
assume that the fields coincide outside a bounded ball. In some sense this
is a more natural extension. For instance, it may fit better in the case of
Markov random fields [if the Markov property is taken to be conditional
independence of a field outside and inside certain sets (like bounded convex
sets) given its values on the boundary].
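Let $Z$ and $Z'$ be random fields with a general state space $(E, \mathcal{E})$, a general
path space $(H, \mathcal{H})$, and site set $\mathbb{R}^d$ where $d \ge 2$:
$$Z = (Z_s)_{s \in \mathbb{R}^d} \text{ and } Z' = (Z'_s)_{s \in \mathbb{R}^d}.$$
For a set of sites $A \subseteq \mathbb{R}^d$ and $z \in H$, define $\kappa_A z$ ($z$ killed or censored
inside $A$) by
$$(\kappa_A z)_s = \begin{cases} \Delta & \text{if } s \in A, \\ z_s & \text{if } s \notin A. \end{cases}$$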
Let $B$ be the unit ball around the origin. Call $(\hat Z, \hat Z', R)$ a remote coupling
of $Z$ and $Z'$ if $(\hat Z, \hat Z')$ is a coupling of $Z$ and $Z'$ and $R$ is a $[0, \infty]$ valued
random variable (the coupling radius) such that
$$\kappa_{RB} \hat Z = \kappa_{RB} \hat Z'. \qquad (9.5)$$
Call $(\hat Z, \hat Z', R, R')$ a distributional remote coupling of $Z$ and $Z'$ if $(\hat Z, \hat Z')$
is a coupling of $Z$ and $Z'$ and $R$ and $R'$ are $[0, \infty]$ valued random variables
such that
$$\kappa_{RB} \hat Z \overset{D}{=} \kappa_{R'B} \hat Z'. \qquad (9.6)$$
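The killing map and the meaning of a coupling radius can be illustrated pathwise. The sketch below is a hypothetical example, not taken from the text: d = 2, the ball rB is taken to be closed (the text does not say whether the unit ball is open or closed), fields are functions of a site s, and `z_prime` is constructed to differ from `z` only at sites of norm less than 1.5.

```python
import math

DELTA = object()   # stands in for the censoring state, external to E

def kill_ball(r):
    """kappa_{rB}: send a field z to the field killed inside the closed ball of radius r."""
    def kappa(z):
        return lambda s: DELTA if math.hypot(s[0], s[1]) <= r else z(s)
    return kappa

def z(s):
    """A hypothetical field on the plane."""
    return s[0] - s[1]

def z_prime(s):
    """A field that differs from z only at sites of norm less than 1.5."""
    return z(s) + (1.0 if math.hypot(s[0], s[1]) < 1.5 else 0.0)

sites = [(0.0, 0.0), (1.2, 0.0), (0.0, -1.4), (1.5, 0.0), (3.0, 4.0)]

def agree(r):
    """Check kappa_{rB} z = kappa_{rB} z' on the sample sites."""
    kz, kz_prime = kill_ball(r)(z), kill_ball(r)(z_prime)
    return all(kz(s) == kz_prime(s) for s in sites)
```

On these sites `agree(1.5)` and `agree(2.0)` hold while `agree(1.0)` fails, so any radius of at least 1.5 would serve as a (deterministic) coupling radius for this pair, in the sense of (9.5).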
The following theory is easily obtained from the theory of exact coupling
in Chapter 4 by noting that $(\kappa_{rB} Z)_{r \in [0,\infty)}$ is simply a one-sided stochastic
process and that $R$ at (9.5) is then a coupling time. Thus (Theorem 5.1 in
Chapter 4) we obtain a coupling radius inequality
$$\|P(\kappa_{rB} Z \in \cdot) - P(\kappa_{rB} Z' \in \cdot)\| \le 2P(R > r), \quad r \in [0, \infty),$$
and that (Theorem 6.1 in Chapter 4) there always exists a distributional
remote coupling maximal at integer radii, that is, such that
$$\|P(\kappa_{rB} Z \in \cdot) - P(\kappa_{rB} Z' \in \cdot)\| = 2P(R > r), \quad r = 0, 1, 2, \ldots.$$
For a set of sites $A$, let $\mathcal{R}_A$ be the sub-σ-algebra of $\mathcal{H}$ generated by the
projection mappings taking $z$ to $z_t$, $t \notin A$. Define the remote σ-algebra by
$$\mathcal{R} := \bigcap_{0 \le r < \infty} \mathcal{R}_{rB} = \bigcap_{A \text{ bounded}} \mathcal{R}_A.$$
Then the following statements are equivalent:
(a) There exists a successful distributional remote coupling of $Z$ and $Z'$.
(b) $\|P(\kappa_{rB} Z \in \cdot) - P(\kappa_{rB} Z' \in \cdot)\| \to 0$ as $r \to \infty$.
(c) $P(Z \in \cdot)|_{\mathcal{R}} = P(Z' \in \cdot)|_{\mathcal{R}}$.
In the above discussion the site set $\mathbb{R}^d$ can clearly be replaced by $[0, \infty)^d$
or some other subset of $\mathbb{R}^d$.
* * *
This chapter ends our general treatment of coupling. In the second half of
the book (the remaining three chapters) the focus will be on other topics,
first stationarity and then regeneration, with coupling entering only as a
tool. We therefore conclude at this point with some general comments on
coupling.
There are many aspects of coupling that have not been treated here, like
the domination coupling in partially ordered Polish spaces (see Section 3 in
Chapter 1) and the many ingenious coupling tricks that have been devised
in particular models. Also, as the last two subsections have demonstrated,
there is much yet to be done along the above lines, both in applying the
theory to specific problems and in developing new theory.
Finally, the many coupling equivalences encountered in Chapters 1-7
suggest that the following might be a useful guideline:
Working hypothesis. Any meaningful distributional relation should have
a coupling counterpart.
Chapter 8
STATIONARITY,
THE PALM DUALITIES
1 Introduction
In this relatively self-contained chapter we shift the focus from coupling to
stationarity. [There is, however, an obvious link to coupling because in the
coupling inequalities we would like one of the processes to be a stationary
version of the other. And it turns out that there are coupling applications
in the end, a shift-coupling application in this chapter, exact and epsilon-
coupling applications in Chapter 10.]
The aspect of stationarity under consideration here is the relation
between stationarity and cycle-stationarity. A stochastic process is
stationary if it is distributionally invariant under (nonrandom) time shifts and
cycle-stationary if it consists of cycles forming a stationary sequence (is
distributionally invariant under shifts from one cycle to the next).
We have already encountered examples of this relationship in Chapter 2.
A recurrent Markov chain starting from a fixed state is split into cycles
by the times of successive visits to this state. These cycles are i.i.d. and
thus form a stationary sequence. In the positive recurrent case we showed
(Sections 2.5 and 2.6 in Chapter 2) that in addition to this obvious cycle-
stationary version the Markov chain has also a stationary version. A similar
result was established for renewal processes (Section 9 of Chapter 2). A
zero-delayed renewal process consists of i.i.d. intervals and thus is cycle-
stationary. When the interval lengths have finite mean, we showed that in
addition to this trivial cycle-stationary version the renewal process also has
a stationary version.
In this chapter we consider two-sided stochastic processes split into cycles
by a sequence of random times (called points) and use the simple approach
of Section 9 in Chapter 2 to develop from scratch a general theory on
the relation between stationarity and cycle-stationarity. The same ideas
will be applied to processes in d-dimensional time (random fields) in the
next chapter. The intuitive motivation for this approach is explained in the
renewal case in Section 9.1 of Chapter 2.
We establish two dualities between stationarity and cycle-stationarity. In
the first duality the stationary process is obtained from the cycle-stationary
one by placing the origin uniformly at random in a cycle after 'length-
biasing' the cycle-length. Conversely, the cycle-stationary process is
obtained from the stationary one by shifting the origin to the right endpoint
of the cycle straddling the origin after 'length-debiasing' the cycle-length.
This duality has the following point-at-zero interpretation:
The cycle-stationary dual behaves like the stationary process
conditioned on having a point at the origin. (1.1)
The second duality is produced in the same way as the first with the
modification that the length-biasing (length-debiasing) is done under conditioning
on the invariant σ-algebra. This duality has the following randomized-origin
interpretation:
The cycle-stationary dual behaves like the stationary process
with origin shifted to a uniformly chosen point; (1.2)
and conversely:
The stationary dual behaves like the cycle-stationary process
with origin shifted to a time chosen uniformly in $\mathbb{R}$. (1.2°)
This is a version of so-called Palm theory of stationary point-processes,
named after the Swedish engineer Conny Palm, who pioneered this field in
the early forties. Palm theory is used, for instance, in queueing theory to
derive characteristics of a queue observed at particular points (like arrival
or departure instants) from the stationary characteristics, and vice versa.
In Section 2 we establish notation and present the trivial measure-free
part of the dualities (shifting to and from a point), and in Section 3 we prove
the key result for the change-of-measure part (length-biasing and length-
debiasing). In Section 4 we present the point-at-zero duality and then
motivate the point-at-zero interpretation by conditioning and limit results in
Section 5, while Section 6 contains simulation applications. After
introducing the invariant σ-algebra in Section 7, we present the randomized-origin
duality in Section 8 and then motivate the randomized-origin interpretation
by shift-coupling and Cesàro limit results in Section 9. Section 10 concludes
with comments on the two Palm dualities.
2 Preliminaries - Measure-Free Part of the Dualities
In this section we shall establish the measure-free framework of the
chapter. Although we use the words stochastic process and random times, no
probability measure is present in this section.
2.1 Process and Points
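Let $(\Omega, \mathcal{F})$ be a measurable space supporting
$$Z = (Z_s)_{s \in \mathbb{R}} \text{ and } S = (S_k)_{-\infty}^{\infty}$$
where $Z$ is a two-sided continuous-time stochastic process with a general
state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$ and $S$ is a two-sided sequence of
random times satisfying
$$-\infty \leftarrow \cdots < S_{-2} < S_{-1} < S_0 < S_1 < \cdots \to \infty$$
and
$$S_{-1} < 0 \le S_0.$$
Refer to the $S_n$ as points. We shall call nonrandom elements of $\mathbb{R}$ times
(and not points) to distinguish them from these points.
Regard $S$ as a measurable mapping from $(\Omega, \mathcal{F})$ to the sequence space
$(L, \mathcal{L})$ where
$$L = \{(s_k)_{-\infty}^{\infty} \in \mathbb{R}^{\mathbb{Z}} : -\infty \leftarrow \cdots < s_{-1} < 0 \le s_0 < s_1 < \cdots \to \infty\}$$
and $\mathcal{L}$ are the Borel subsets of $L$, that is,
$$\mathcal{L} = L \cap \mathcal{B}^{\mathbb{Z}}.$$
Thus the pair $(Z, S)$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(H \times L, \mathcal{H} \otimes \mathcal{L})$.
Let $\mathcal{H} \otimes \mathcal{L}^+$ denote the class of all measurable functions from $(H \times L, \mathcal{H} \otimes \mathcal{L})$
to $([0, \infty), \mathcal{B}[0, \infty))$.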
2.2 The Two-Sided Joint Shift - Shift-Measurability
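For $t \in \mathbb{R}$, define the (joint) shift-map $\theta_t$ from $H \times L$ to $H \times L$ by
$$\theta_t((z_s)_{s \in \mathbb{R}}, (s_k)_{-\infty}^{\infty}) = ((z_{t+s})_{s \in \mathbb{R}}, (s_{n_{t-}+k} - t)_{-\infty}^{\infty}), \qquad (2.1)$$
where $n_{t-}$ is determined by $(s_k)_{-\infty}^{\infty}$ as follows:
$$n_{t-} = n \text{ if and only if } t \in (s_{n-1}, s_n]. \qquad (2.2)$$
Note that $\theta_t$ is a time shift and shifts the points $(s_k)_{-\infty}^{\infty}$ regarded as a
sequence of times: $\theta_t$ shifts $(s_k)_{-\infty}^{\infty}$ by subtracting $t$ from the times $s_k$ and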
only shifts the index $k$ of $(s_k)_{-\infty}^{\infty}$ to observe the convention that zero (the
time origin) lies between the points indexed by —1 and 0 [in accordance
with this we call k index and not time].
In order to be able to shift at will, assume that Z is shift-measurable,
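that is, let the path set $H$ be invariant under time shifts and the mapping
taking $(z, t) \in H \times \mathbb{R}$ to $z_t \in E$ be $\mathcal{H} \otimes \mathcal{B}/\mathcal{E}$ measurable (which is equivalent
to the mapping taking $(z, t) \in H \times \mathbb{R}$ to $(z_{t+s})_{s \in \mathbb{R}} \in H$ being $\mathcal{H} \otimes \mathcal{B}/\mathcal{H}$
measurable; see Section 2 of Chapter 4). Shift-measurability is all we need
assume about $Z$ in this chapter. It covers, for instance, processes with a
Polish state space (in fact, separable metric suffices) and right-continuous
paths. When $Z$ is shift-measurable, then the mapping
taking $(((z_s)_{s \in \mathbb{R}}, (s_k)_{-\infty}^{\infty}), t) \in H \times L \times \mathbb{R}$
to $\theta_t((z_s)_{s \in \mathbb{R}}, (s_k)_{-\infty}^{\infty}) \in H \times L$
is $\mathcal{H} \otimes \mathcal{L} \otimes \mathcal{B}/\mathcal{H} \otimes \mathcal{L}$ measurable.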
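The joint shift (2.1) and the index rule (2.2) can be sketched on a finite window of points. This is only an illustration under assumptions that are not in the text: the two-sided sequence is truncated to an increasing list `points`, the position `i0` of s_0 in that list is carried explicitly, and the path is a Python function. The function name `shift` and the sample data are hypothetical.

```python
from bisect import bisect_left

def shift(z, points, i0, t):
    """Apply theta_t to (z, points) on a finite window.

    z      : path, a function s -> z_s
    points : increasing list of times, a finite window of (s_k)
    i0     : position in `points` of s_0, so points[i0 - 1] < 0 <= points[i0]
    Returns (shifted path, shifted points, new position of the point indexed 0, n_{t-}).
    """
    j = bisect_left(points, t)      # first position with points[j] >= t
    if j == 0 or j == len(points):
        raise ValueError("t lies outside the window of points")
    shifted_z = lambda s: z(t + s)
    shifted_points = [s - t for s in points]
    # points[j - 1] < t <= points[j], so n_{t-} = j - i0 by (2.2)
    return shifted_z, shifted_points, j, j - i0

def z(s):
    """A hypothetical deterministic path."""
    return s * s

points = [-3.0, -1.0, 2.0, 5.0, 9.0]    # s_{-2}, s_{-1}, s_0, s_1, s_2
zs, ps, j, n = shift(z, points, 2, 3.0)
```

With t = 3 the shifted times are `[-6.0, -4.0, -1.0, 2.0, 6.0]` and n_{t-} = 1: the point formerly indexed 1 becomes the new point indexed 0, so that zero again lies between the points indexed -1 and 0.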
2.3 Cycles and Cycle-Lengths — Relative Position
Think of the points $S$ as splitting $Z$ into cycles
$$C_n := (Z_{S_{n-1}+s})_{s \in [0, X_n)}, \quad n \in \mathbb{Z},$$
where $X_n$ is the $n$th cycle length,
$$X_n := S_n - S_{n-1}.$$
Thus $X_0$ is the length of the cycle $C_0$ straddling the origin:
$$X_0 = S_0 - S_{-1}.$$
For $t \in \mathbb{R}$, put
$$N_t = n \text{ if and only if } t \in [S_{n-1}, S_n).$$
Note that for $s < t$, $N_t - N_s$ is the number of points in $(s, t]$, and that
$$t \ge 0 \Rightarrow N_t = \text{number of points in } [0, t].$$
Denote the relative position of $t$ in $[S_{N_t - 1}, S_{N_t})$ by
$$U_t = (t - S_{N_t - 1})/X_{N_t}.$$
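The counting variable N_t and the relative position U_t can be computed directly from a finite window of points. The sketch below uses the same hypothetical representation as before (an increasing list `points` and the position `i0` of S_0 in it); the function name and the sample values are illustrative only.

```python
from bisect import bisect_right

def count_and_position(points, i0, t):
    """Return (N_t, U_t) for a finite window of points.

    points : increasing list of times, a finite window of (S_k)
    i0     : position in `points` of S_0, so points[i0 - 1] < 0 <= points[i0]
    """
    j = bisect_right(points, t)     # first position with points[j] > t
    if j == 0 or j == len(points):
        raise ValueError("t lies outside the window of points")
    # points[j - 1] <= t < points[j], so N_t = j - i0
    n_t = j - i0
    cycle_length = points[j] - points[j - 1]       # X_{N_t}
    u_t = (t - points[j - 1]) / cycle_length       # relative position in the cycle
    return n_t, u_t

points = [-3.0, -1.0, 2.0, 5.0, 9.0]    # S_{-2}, S_{-1}, S_0, S_1, S_2
```

For these points, `count_and_position(points, 2, 3.5)` gives N = 1 and U = 0.5: there is one point in [0, 3.5], and 3.5 sits halfway through the cycle [2, 5).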
Note that the cycle $C_n$ is a one-sided stochastic process vanishing at the
random time $X_n$. One way of making sense of $C_n$ as a random element is
to place it in the cemetery state $\Delta$ from time $X_n$ onward (see Section 2.9 in
Chapter 4), that is, to identify it with a one-sided stochastic process killed
at time $X_n$,
$$C_n := \kappa_{X_n}(Z_{S_{n-1}+s})_{s \in [0,\infty)}, \quad n \in \mathbb{Z}.$$
The pair $(Z, S)$ is determined measurably by $(S_0, (C_n)_{-\infty}^{\infty})$ and vice versa.
2.4 The Measure-Free Duality Between (Z,S) and ((Z°,S°),U)
Call Z the process associated with S, and S the points associated with Z.
Observe that we do not postulate any functional link between Z and S. In
applications, however, S is often even determined by Z. For instance, in
the Markov chain example, S is formed by the times of the successive visits
of Z to a fixed state.
We shall write $S°$ to indicate a sequence of times with a point at zero,
that is,
$$S_0° = 0.$$
In this case we also write $Z°$ for the associated process although the ° does
not indicate anything about the process except its association with $S°$.
Let $U$ be a $(0, 1]$ valued random variable. Throughout this chapter we
assume that $(Z, S)$ and $((Z°, S°), U)$ are linked functionally as follows.
When $(Z, S)$ is given, define
$(Z°, S°) := \theta_{S_0}(Z, S)$ [thus $S_0° = 0$],
$U := U_{0-} = -S_{-1}/X_0$ [$U$ is the relative position of 0 in $(S_{-1}, S_0]$].
Conversely, when $((Z°, S°), U)$ is given, define
$(Z, S) := \theta_{-(1-U)X_0°}(Z°, S°)$ [thus $X_0 = X_0°$].
Note that $(Z, S)$ and $(Z°, S°)$ have the same cycles,
$C_n = C_n°$, which we can write $\theta_{S_n}(Z, S) = \theta_{S_n°}(Z°, S°)$,
while $S_n = (1 - U)X_0° + S_n°$; see Figure 2.1.
[Figure 2.1: a realization of (Z, S) next to the corresponding (Z°, S°). To make the illustration easier, Z is real-valued with continuous paths and S is the times of visits to 0. The gray axis is at the origin of (Z°, S°).]
FIGURE 2.1. The functional duality between (Z,S) and ((Z°,S°),U).
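For the point sequence alone, the two directions of this functional link amount to a shift of the origin. The sketch below assumes a finite sorted window of points with known position of $S_0$; the function names are hypothetical and the process Z is left out for brevity.

```python
def to_point_at_zero(points, zero_idx):
    """Shift the origin to S_0; return (shifted points, U).

    U = -S_{-1} / X_0 is the relative position of 0 in (S_{-1}, S_0].
    """
    s_prev, s0 = points[zero_idx - 1], points[zero_idx]
    x0 = s0 - s_prev                      # length of the cycle straddling 0
    u = -s_prev / x0
    return [s - s0 for s in points], u

def from_point_at_zero(points0, zero_idx, u):
    """Invert the map: S_n = (1 - U) * X_0 + S_n° for every n."""
    x0 = points0[zero_idx] - points0[zero_idx - 1]
    shift = (1.0 - u) * x0
    return [s + shift for s in points0]

# Hypothetical window: S_{-2}, S_{-1}, S_0, S_1, S_2
window = [-3.0, -1.5, 0.5, 2.0, 4.5]
shifted, u = to_point_at_zero(window, 2)
restored = from_point_at_zero(shifted, 2, u)
```

The round trip returns the original window, which is the measure-free content of the duality: no probability enters at this stage.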
3 Key Stationarity Theorem
The last section was measure-free. We now introduce a probability measure
P on $(\Omega, \mathcal{F})$, that is, assume that (Z,S) is supported by the probability
space $(\Omega, \mathcal{F}, P)$. Call (Z,S) stationary (under P) if it is distributionally
invariant under time shifts: under P,
$\theta_t(Z,S) \overset{D}{=} (Z,S)$, $t \in \mathbb{R}$.
Let P° be another probability measure on $(\Omega, \mathcal{F})$ and regard (Z°,S°) as
supported by $(\Omega, \mathcal{F}, P^\circ)$. Call (Z°,S°) cycle-stationary (under P°) if the
sequence of cycles is stationary: under P°,
$(\ldots, C_{n-1}, C_n, C_{n+1}, \ldots) \overset{D}{=} (\ldots, C_{-1}, C_0, C_1, \ldots)$, $n \in \mathbb{Z}$.
Since $\theta_{S_n}(Z,S)$ is determined measurably by $(\ldots, C_{n-1}, C_n, C_{n+1}, \ldots)$ in
the same way for all n, and vice versa, it follows that (Z°,S°) is
cycle-stationary if and only if
$\theta_{S_n}(Z,S) \overset{D}{=} (Z^\circ,S^\circ)$, $n \in \mathbb{Z}$.
3.1 The Basic Equivalences
The following theorem characterizes stationarity in several ways. The link
to cycle-stationarity is indicated by the last characterization, which is the
key to the Palm dualities to be studied in the subsequent sections.
Theorem 3.1. Let (Z,S) be supported by the probability space $(\Omega, \mathcal{F}, P)$.
The following statements are equivalent:
(a) (Z,S) is stationary under P.
(b) For $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $t \in [0, \infty)$, it holds that
$E\big[\int_{S_0}^{S_{N_t}} f(\theta_s(Z,S))/X_{N_s}\,ds\big] = t\,E[f(Z,S)/X_0]$. (3.1)
(c) The variable U is uniform on (0,1] and independent of (Z°,S°), and
$E\big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S))\big] = t\,E[f(Z^\circ,S^\circ)/X_0]$ (3.2)
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $t \in [0,\infty)$.
(d) The variable U is uniform on (0,1] and independent of (Z°,S°), and
$E[f(\theta_{S_n}(Z,S))/X_0] = E[f(Z^\circ,S^\circ)/X_0]$ (3.3)
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $n \in \mathbb{Z}$.
We prove this result in the next four subsections, but let us first note several
interesting consequences.
Observe first that taking f = 1 in (3.2) yields (since by stationarity
$E[N_{-t}] = -E[N_t]$)
(Z, S) stationary $\Rightarrow$ $E[N_t] = E[1/X_0]\,t$, $t \in \mathbb{R}$. (3.4)
In particular, (3.4) yields [take t = 1] the following result for the intensity
$E[N_1]$ of the stationary point-stream S:
(Z, S) stationary $\Rightarrow$ $E[N_1] = E[1/X_0]$. (3.5)
Also, we see from (3.4) that if $E[N_1] < \infty$ then (since $P(S_0 = 0) \le E[N_0]$)
we have $P(S_0 = 0) = 0$. This is in fact also true when $E[N_1] = \infty$, as
can be seen as follows. Let V be uniform on [0,1] and independent of S.
Then, by stationarity, $P(S_0 = 0) = P(S_n = V \text{ for some } n)$. Since the $S_n$
are countably many, $P(S_n = V \text{ for some } n) = 0$. Thus we obtain that a
stationary point-stream cannot have a point at the origin:
(Z, S) stationary $\Rightarrow$ $P(S_0 = 0) = 0$.
Finally, due to (c), the origin of a stationary point-stream is placed
uniformly at random in the cycle where it lies and independently of the process
seen from one of the endpoints of the cycle:
(Z,S) stationary $\Rightarrow$ U uniform on (0,1] and independent of (Z°,S°).
This beautiful fact has the following intuitive explanation. One can think
of the origin of a stationary (Z, S) as chosen uniformly at random in $\mathbb{R}$.
The relative position of 0 in $(S_{-1}, S_0]$ should therefore be uniform and
independent of (Z°,S°).
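As a concrete check of (3.5) and of the uniformity of U, consider a stationary Poisson point-stream with rate $\lambda$ (an example chosen here for illustration; it is not worked out in the text above). In that case $S_0$ and $-S_{-1}$ are independent and exponential with rate $\lambda$, so $X_0 = S_0 - S_{-1}$ has a Gamma density with shape 2, and the identities can be verified by hand:

```latex
\begin{align*}
P(X_0 \in dx) &= \lambda^2 x\, e^{-\lambda x}\,dx, \qquad 0 < x < \infty, \\
E[1/X_0] &= \int_0^\infty \frac{1}{x}\,\lambda^2 x\, e^{-\lambda x}\,dx = \lambda = E[N_1], \\
U &= \frac{-S_{-1}}{-S_{-1} + S_0} \ \text{ is uniform on } (0,1) \text{ and independent of } X_0 .
\end{align*}
```

Note that $E[X_0] = 2/\lambda$ here, twice the mean distance between points; this is the length-biasing of the cycle straddling the origin.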
3.2 Proof: (a) Implies (b)
Assume that (a) holds. First suppose $f \le a$. Since $N_s = 0$ for $0 \le s < S_0$
and $S_0 \le X_0$, we have
$\int_0^{S_0} f(\theta_s(Z,S))/X_{N_s}\,ds \le (S_0/X_0)\,a \le a$.
Thus the expectation of the left-hand side is finite, which allows us to take
the final step in
$t\,E[f(Z,S)/X_0] = \int_0^t E[f(\theta_s(Z,S))/X_{N_s}]\,ds$ (stationarity)
$= E\big[\int_0^t f(\theta_s(Z,S))/X_{N_s}\,ds\big]$
$= E\big[\int_0^{S_0} f(\theta_s(Z,S))/X_{N_s}\,ds\big] + E\big[\int_{S_0}^{t} f(\theta_s(Z,S))/X_{N_s}\,ds\big]$.
By stationarity,
$E\big[\int_0^{S_0} f(\theta_s(Z,S))/X_{N_s}\,ds\big] = E\big[\int_t^{S_{N_t}} f(\theta_s(Z,S))/X_{N_s}\,ds\big]$,
and thus (3.1) is established for f bounded. In order to remove the
boundedness restriction replace f by $f \wedge a$ in (3.1) and apply monotone convergence
on both sides (once for the expectation on the right-hand side, twice for the
integral inside the expectation on the left-hand side) to obtain that
(a) implies (b).
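3.3 Proof: (b) Implies (c)
Assume that (b) holds. The statement (c) is equivalent to the following: for
all $g \in \mathcal{B}^+$ and $f \in \mathcal{H} \otimes \mathcal{L}^+$ it holds that
$t\,E[g(U) f(Z^\circ,S^\circ)/X_0] = \big(\int_0^1 g(x)\,dx\big)\,E\big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S))\big]$. (3.6)
In order to establish (3.6), apply (b) to obtain the first equality in
$t\,E[g(U) f(Z^\circ,S^\circ)/X_0] = E\big[\int_{S_0}^{S_{N_t}} g(U_s)\, f(\theta_{S_{N_s}}(Z,S))/X_{N_s}\,ds\big]$
$= E\big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S)) \int_{S_{k-1}}^{S_k} g(U_s)/X_k\,ds\big]$
and then note that
$\int_{S_{k-1}}^{S_k} g(U_s)/X_k\,ds = \int_0^{X_k} g(U_{S_{k-1}+s})/X_k\,ds$
$= \int_0^{X_k} g(s/X_k)/X_k\,ds = \int_0^1 g(x)\,dx$.
Thus (3.6) holds, that is, (b) implies (c).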
3.4 Proof: (c) Implies (d)
We obtain that (c) implies (d) if we can show that (3.2) implies (3.3). For
that purpose assume that (3.2) holds and let f be bounded, say $f \le a$.
Apply (3.2) with f replaced by the function taking $((z_s)_{s \in \mathbb{R}}, (s_k)_{-\infty}^{\infty})$ to
$f(\theta_{s_n}((z_s)_{s \in \mathbb{R}}, (s_k)_{-\infty}^{\infty}))$ to obtain the first equality in
$t\,E[f(\theta_{S_n}(Z,S))/X_0] = E\big[\sum_{k=1}^{N_t} f(\theta_{S_{k+n}}(Z,S))\big]$
$= E\big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S))\big] - E\big[\sum_{k=1}^{n} f(\theta_{S_k}(Z,S))\big] + E\big[\sum_{k=N_t+1}^{N_t+n} f(\theta_{S_k}(Z,S))\big]$.
Apply (3.2) and $f \le a$ and divide by t to obtain
$-an/t \le E[f(\theta_{S_n}(Z,S))/X_0] - E[f(Z^\circ,S^\circ)/X_0] \le an/t$.
Send $t \to \infty$ to obtain that (3.3) holds, that is, (c) implies (d).
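3.5 Proof: (d) Implies (a)
Assume that (d) holds. Take an $f \in \mathcal{H} \otimes \mathcal{L}^+$ to obtain
$E[f(\theta_t(Z,S))] = E[f(\theta_{t-(1-U)X_0}(Z^\circ,S^\circ))]$
$= E\big[\int_{t-X_0}^{t} f(\theta_s(Z^\circ,S^\circ))\,ds / X_0\big]$,
where the second step is due to U being uniform on [0,1) and independent
of (Z°,S°). For any $a \le b$ and $x \le y$ it holds that
$[a, b) \cap [x, y) - x = [(a \vee x) \wedge y,\ (b \wedge y) \vee x) - x$ (3.7)
$= [(a - x)^+ \wedge (y - x),\ (b - x)^+ \wedge (y - x))$. (3.8)
Taking $[a, b) = [t - X_0, t)$ and $[x, y) = [X_1 + \cdots + X_k,\ X_1 + \cdots + X_{k+1})$ and
applying (3.8) yields
$\int_{t-X_0}^{t} f(\theta_s(Z^\circ,S^\circ))\,ds = \sum_{k=-\infty}^{\infty} \int_{X_{k+1} \wedge (t - X_0 - \cdots - X_k)^+}^{X_{k+1} \wedge (t - X_1 - \cdots - X_k)^+} f(\theta_s \theta_{S_k}(Z,S))\,ds$.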
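Due to (3.3),
$E\big[\int_{X_{k+1} \wedge (t - X_0 - \cdots - X_k)^+}^{X_{k+1} \wedge (t - X_1 - \cdots - X_k)^+} f(\theta_s \theta_{S_k}(Z,S))\,ds / X_0\big]$
$= E\big[\int_{X_1 \wedge (t - X_{-k} - \cdots - X_0)^+}^{X_1 \wedge (t - X_{-k+1} - \cdots - X_0)^+} f(\theta_s(Z^\circ,S^\circ))\,ds / X_0\big]$.
Applying (3.7) with $[a, b) = [t - X_{-k} - \cdots - X_0,\ t - X_{-k+1} - \cdots - X_0)$ and
$[x, y) = [0, X_1)$ yields the second equality in
$E[f(\theta_t(Z,S))] = \sum_{k=-\infty}^{\infty} E\big[\int_{X_1 \wedge (t - X_{-k} - \cdots - X_0)^+}^{X_1 \wedge (t - X_{-k+1} - \cdots - X_0)^+} f(\theta_s(Z^\circ,S^\circ))\,ds / X_0\big]$
$= E\big[\int_0^{X_1} f(\theta_s(Z^\circ,S^\circ))\,ds / X_0\big]$.
Hence $E[f(\theta_t(Z,S))]$ does not depend on t for any $f \in \mathcal{H} \otimes \mathcal{L}^+$, that is,
(Z, S) is stationary. Thus (d) implies (a), and the proof of Theorem 3.1 is
complete.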
4 The Point-at-Zero Duality
We are now ready for the first Palm duality between stationarity and
cycle-stationarity. This duality has the informal point-at-zero
interpretation stated at (1.1), namely, the cycle-stationary dual behaves like the
stationary process conditioned on having a point at the origin. We
motivate this interpretation in the next section. It is informal because in the
stationary case the probability of having a point at the origin is zero.
In order to see at this point why a duality with this interpretation is
reasonable, consider a stationary recurrent Markov chain in two-sided discrete
time. According to the Markov property, future and past are independent
given the present. Thus if we condition the stationary Markov chain on
being in a particular fixed reference state at time zero (on having a point
at the origin), then both future and past consist of the i.i.d. cycles between
visits to this reference state. That is, the conditioning makes the stationary
Markov chain cycle-stationary.
The duality is obtained in two separate steps, one measure-free (shifting
to and from a point), the other involving only the measure (length-biasing
and length-debiasing the cycle straddling the origin). The order in which
the steps are taken does not matter. The measure-free step was taken in
Section 2.4, and the biasing (change of measure, Radon Nikodym) step we
take now.
4.1 Length-Biasing ↔ Length-Debiasing
Recall that $X_0$ is the length of the cycle straddling the origin. Suppose we
are given a probability measure P on $(\Omega, \mathcal{F})$ satisfying
$E[1/X_0] < \infty$. (4.1)
Then we can define a new probability measure P° on $(\Omega, \mathcal{F})$ by letting it
have the density (Radon-Nikodym derivative) $dP^\circ/dP := 1/(X_0 E[1/X_0])$
with respect to P, that is,
$dP^\circ = \frac{1/X_0}{E[1/X_0]}\,dP$ (length-debiasing P). (4.2)
From (4.2) we obtain
$E^\circ[X_0] = \frac{1}{E[1/X_0]}$. (4.3)
Since $0 < X_0 < \infty$ implies $E[1/X_0] > 0$, we obtain from (4.3) that
$E^\circ[X_0] < \infty$. (4.1°)
Thus (4.2) can be rewritten as
$dP = \frac{X_0}{E^\circ[X_0]}\,dP^\circ$ (length-biasing P°). (4.2°)
Conversely, suppose we are given a probability measure P° on $(\Omega, \mathcal{F})$
satisfying (4.1°). Then we can define a new probability measure P on $(\Omega, \mathcal{F})$
by (4.2°). From (4.2°) we obtain
$E[1/X_0] = \frac{1}{E^\circ[X_0]}$. (4.3°)
Since $X_0 > 0$ implies $E^\circ[X_0] > 0$, we obtain from (4.3°) that (4.1) holds.
Thus (4.2°) can be rewritten as (4.2).
[Note that (4.2°) is a reformulation of (4.2), and (4.3°) is a reformulation
of (4.3), but (4.1°) is not a reformulation of (4.1): (4.1°) follows from
$E[1/X_0] > 0$, not from $E[1/X_0] < \infty$, and (4.1) follows from $E^\circ[X_0] > 0$,
not from $E^\circ[X_0] < \infty$.]
We have established that the length-debiasing at (4.2) is equivalent to the
length-biasing at (4.2°). This yields a duality (one-to-one correspondence)
between probability measures P on $(\Omega, \mathcal{F})$ satisfying (4.1) and probability
measures P° on $(\Omega, \mathcal{F})$ satisfying (4.1°).
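The effect of this change of measure on the law of $X_0$ can be sketched on a finite set of cycle lengths. The following is a minimal illustration, assuming the law of $X_0$ is given as a dictionary mapping lengths to probabilities; the function names and the two-point law are hypothetical. Exact rational arithmetic is used so that the round trip can be compared exactly.

```python
from fractions import Fraction

def length_bias(p0):
    """Law of X_0 under P from its law under P°, following (4.2°)."""
    mean = sum(x * w for x, w in p0.items())         # E°[X_0]
    return {x: x * w / mean for x, w in p0.items()}

def length_debias(p):
    """Law of X_0 under P° from its law under P, following (4.2)."""
    inv_mean = sum(w / x for x, w in p.items())      # E[1/X_0]
    return {x: w / (x * inv_mean) for x, w in p.items()}

# Hypothetical law under P°: lengths 1 and 3, each with probability 1/2.
p0 = {Fraction(1): Fraction(1, 2), Fraction(3): Fraction(1, 2)}
p = length_bias(p0)
inv_mean = sum(w / x for x, w in p.items())          # equals 1 / E°[X_0]
```

Here $E^\circ[X_0] = 2$ and the biased law puts mass 1/4 on length 1 and 3/4 on length 3, so that $E[1/X_0] = 1/2$, in agreement with (4.3) and (4.3°).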
4.2 Stationarity ↔ Cycle-Stationarity
Combining this measure duality between P and P° and the measure-free
duality [Section 2.4] between (Z,S) and ((Z°,S°),U) yields the following
duality between stationarity and cycle-stationarity.
Theorem 4.1. Let $(\Omega, \mathcal{F})$ be a measurable space supporting (Z,S) and
((Z°,S°),U) where Z and Z° are two-sided shift-measurable processes, S
and S° are two-sided sequences of times increasing strictly from $-\infty$ to $\infty$
with $S_{-1} < 0 \le S_0$ and $S_0^\circ = 0$, and U is a (0,1] valued variable. Let (Z, S)
and ((Z°,S°),U) be linked by
$(Z^\circ,S^\circ) = \theta_{S_0}(Z,S)$ and $U = -S_{-1}/X_0$
or, equivalently, by
$(Z,S) = \theta_{-(1-U)X_0^\circ}(Z^\circ,S^\circ)$ [thus $X_0 = X_0^\circ$].
Let P and P° be probability measures on $(\Omega, \mathcal{F})$ satisfying (4.1) and (4.1°)
and linked by (4.2) or, equivalently, by (4.2°). Then
(Z, S) is stationary under P (4.4)
if and only if
(Z°,S°) is cycle-stationary under P° (4.4°)
and U is uniform on (0,1] and independent of (Z°,S°).
Comment. Note that U is uniform on (0,1] and independent of (Z°,S°)
under P if and only if it is so under P° (this follows, for instance, from the
equivalence of (4.5) and (4.5°) below).
Proof. Due to the equivalence of (a) and (d) in Theorem 3.1, (4.4) holds
if and only if for each $g \in \mathcal{B}^+$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $n \in \mathbb{Z}$,
$E[g(U) f(\theta_{S_n}(Z,S))/X_0] = \big(\int_0^1 g(x)\,dx\big)\,E[f(Z^\circ,S^\circ)/X_0]$. (4.5)
Due to (4.2), (4.5) holds if and only if for each $g \in \mathcal{B}^+$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and
$n \in \mathbb{Z}$,
$E^\circ[g(U) f(\theta_{S_n}(Z,S))] = \big(\int_0^1 g(x)\,dx\big)\,E^\circ[f(Z^\circ,S^\circ)]$, (4.5°)
which is a reformulation of (4.4°). Thus (4.4) and (4.4°) are equivalent. □
4.3 Stationary Intensity ↔ Mean Cycle-Stationary Cycle-Length
Suppose the equivalent statements (4.4) and (4.4°) hold. Then, see (3.5),
$E[1/X_0] = E[N_1]$ = intensity of the stationary point-stream.
Thus (4.1) simply means that the intensity of the stationary point-stream
is finite. And (4.3) becomes
$E^\circ[X_0] = \frac{1}{E[N_1]}$. (4.6)
In other words, under the duality established in Theorem 4.1, the intensity
of the stationary point-stream is the reciprocal of the mean cycle-length
of the cycle-stationary point-stream. This relation is familiar from renewal
theory; see Chapter 2, Section 9.
4.4 The Stationary Delay Time
Suppose the equivalent statements (4.4) and (4.4°) hold. Then, under P,
both $-S_{-1}$ and the stationary delay time $S_0$ have the density
$P^\circ(X_0 > s)/E^\circ[X_0]$, $0 < s < \infty$, (4.7)
that is, both have the distribution function $G_\infty$ defined (as in the renewal
case in Chapter 2) by
$G_\infty(x) := \frac{E^\circ[X_0 \wedge x]}{E^\circ[X_0]} = \frac{\int_0^x P^\circ(X_0 > s)\,ds}{E^\circ[X_0]}$, $0 \le x < \infty$.
This can be seen as follows. Under P both U and 1 - U are uniform on
[0,1] and independent of $X_0$, and thus $S_0 = (1 - U)X_0$ and $-S_{-1} = UX_0$
have the same distribution. For $0 \le x < \infty$,
$P(UX_0 \le x) = E[(x/X_0) \wedge 1] = \frac{E^\circ[x \wedge X_0]}{E^\circ[X_0]}$ [by (4.2°)],
and thus the common distribution function is $G_\infty$.
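For a cycle-length law with finitely many values the delay distribution function can be evaluated directly from its definition. The sketch below assumes the law of $X_0$ under P° is given as a dictionary from lengths to probabilities; the name `delay_cdf` and the two-point law are illustrative only.

```python
from fractions import Fraction

def delay_cdf(p0, x):
    """G_inf(x) = E°[min(X_0, x)] / E°[X_0] for a finite cycle-length law p0."""
    mean = sum(length * w for length, w in p0.items())
    return sum(min(length, x) * w for length, w in p0.items()) / mean

# Hypothetical law under P°: lengths 1 and 3, each with probability 1/2.
p0 = {Fraction(1): Fraction(1, 2), Fraction(3): Fraction(1, 2)}
values = [delay_cdf(p0, Fraction(x)) for x in (0, 1, 2, 3)]
```

The resulting function is piecewise linear: it rises with slope $P^\circ(X_0 > s)/E^\circ[X_0]$, which is the density (4.7), and reaches 1 at the largest cycle length.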
4.5 Conditional Distributions Given $X_0$ Are Identical
The change of measure at (4.2) length-debiases the distribution of $X_0$:
$P^\circ(X_0 \in dx) = \frac{1}{x\,E[1/X_0]}\,P(X_0 \in dx)$, $0 \le x < \infty$; (4.8)
and conversely, (4.2°) length-biases the distribution of $X_0$:
$P(X_0 \in dx) = \frac{x}{E^\circ[X_0]}\,P^\circ(X_0 \in dx)$, $0 \le x < \infty$. (4.9)
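On the other hand, conditional distributions given $X_0$ remain the same:
$P(\cdot \mid X_0) = P^\circ(\cdot \mid X_0)$ a.s. P and a.s. P°. (4.10)
In particular, a.s. $P(X_0 \in dx)$ and a.s. $P^\circ(X_0 \in dx)$, it holds that
$P((Z^\circ,S^\circ) \in \cdot \mid X_0 = x) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 = x)$, (4.11)
$P((Z,S) \in \cdot \mid X_0 = x) = P^\circ((Z,S) \in \cdot \mid X_0 = x)$. (4.12)
These claims are direct consequences of the following lemma, in which bold
letters denote measures on $(\Omega, \mathcal{F})$ and italic letters denote distributions on $(E, \mathcal{E})$.
Lemma 4.1. Let $(\Omega, \mathcal{F}, \mathbf{P})$ be a probability space supporting a random
element Y in a measurable space $(E, \mathcal{E})$. Let $P$ be the distribution of Y and
$Q$ be a probability measure on $(E, \mathcal{E})$ having a density g with respect to $P$.
If we define a new probability measure $\mathbf{Q}$ on $(\Omega, \mathcal{F})$ by
$d\mathbf{Q} = g(Y)\,d\mathbf{P}$ (change of measure),
then Y has the distribution $Q$ under $\mathbf{Q}$ and, for $V \in \mathcal{F}^+$,
$E_{\mathbf{Q}}[V \mid Y] = E[V \mid Y]$ a.s. $\mathbf{Q}$, (4.13)
where $E_{\mathbf{Q}}$ denotes expectation under $\mathbf{Q}$, and $E$ expectation under $\mathbf{P}$.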
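PROOF. The random element Y has the distribution $Q$ under $\mathbf{Q}$, since for
$A \in \mathcal{E}$,
$\mathbf{Q}(Y \in A) = E[1_{\{Y \in A\}}\, g(Y)] = \int_A g\,dP = \int_A dQ$.
Further, for $V \in \mathcal{F}^+$ and $f \in \mathcal{E}^+$,
$E_{\mathbf{Q}}[E[V \mid Y] f(Y)] = E[E[V \mid Y] f(Y) g(Y)]$ (by definition of $\mathbf{Q}$)
$= E[V f(Y) g(Y)]$ (by definition of $E[V \mid Y]$)
$= E_{\mathbf{Q}}[V f(Y)]$ (by definition of $\mathbf{Q}$)
and thus, by definition of $E_{\mathbf{Q}}[V \mid Y]$, (4.13) holds. □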
Remark 4.1. The pair (Z°,S°) is regenerative if the cycles are i.i.d., that
is, if in addition to being cycle-stationary, (Z°,S°) also has independent
cycles. The stationary dual (Z,S) of a regenerative (Z°,S°) does not have
i.i.d. cycles, since the cycle $C_0$ straddling zero is length-biased. However,
the cycles of the stationary dual (Z, S) are still independent, and if we
leave out $C_0$, then the remaining cycles $\ldots, C_{-2}, C_{-1}, C_1, C_2, \ldots$ are i.i.d.
copies of the cycles of (Z°,S°). [This follows by noting that (4.10) implies
$P(\cdot \mid C_0) = P^\circ(\cdot \mid C_0)$ and thus, for $n \ge 1$,
$P(C_{-n} \in \cdot, \ldots, C_{-1} \in \cdot, C_1 \in \cdot, \ldots, C_n \in \cdot \mid C_0)$
$= P^\circ(C_{-n} \in \cdot) \cdots P^\circ(C_{-1} \in \cdot)\, P^\circ(C_1 \in \cdot) \cdots P^\circ(C_n \in \cdot)$
if (Z°,S°) is regenerative.]
4.6 The Palm Duality in Terms of Distributions
Theorem 4.1 gives us a one-to-one correspondence between particular copies
of a stationary (Z, S) and a cycle-stationary (Z°,S°). In the cycle-stationary
case the independent uniform (0,1] variable U is redundant, while in the
stationary case (Z, S) is obtained from (Z°, S°) with the aid of U, and if we
replaced U by another independent uniform (0,1] variable, then we would
obtain another stationary dual having the same distribution as (Z, S).
Thus it is in some sense more natural to think of the duality as a one-to-
one correspondence between stationary and cycle-stationary distributions,
say P and P°, rather than between particular copies having these
distributions. A distributional form of the duality can be obtained from
Theorem 4.1 as follows. If a stationary distribution P is given, apply Theorem 4.1
to some (Z, S) with the distribution P to obtain P° as the distribution of
the cycle-stationary dual. Conversely, if a cycle-stationary distribution P°
is given, apply Theorem 4.1 to some ((Z°,S°),U), where (Z°,S°) has the
distribution P° and U is uniform on (0,1] and independent of (Z°,S°), to
obtain P as the distribution of the stationary dual. For more details, see
the next subsection.
A nondistributional way of getting around the pluralism of Theorem 4.1
would be to assume that ((Z°,S°),U) is canonical. Note that then (Z,S)
is not canonical: it is obtained from (Z°,S°) by placing the origin at
random in a cycle. [If we assume conversely that (Z, S) is canonical, then
((Z°,S°),U) is not canonical.]
We shall use neither the distributional approach here nor the canonical
one. Theorem 4.1 is clean-cut, highlights the simple two-step duality
construction, and is easy to apply. We shall stick to the Palm duality in this
form. Keeping $(\Omega, \mathcal{F})$ unspecified also allows us the freedom of adding new
random elements when needed.
4.7 Collapsing the Two-Step Construction into a Single Step
Suppose the equivalent statements (4.4) and (4.4°) hold. Combining (4.2),
(3.2) in Theorem 3.1, and the observation (3.5) yields
$E^\circ[f(Z^\circ,S^\circ)] = \frac{E\big[\sum_{k=1}^{N_1} f(\theta_{S_k}(Z,S))\big]}{E[N_1]}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, (4.14)
thus deriving the distribution of the cycle-stationary (Z°,S°) from the
stationary (Z, S) in a single step. This is the most common definition of
the distribution P° of the so-called Palm version of (Z,S), called
cycle-stationary Palm dual here.
Conversely, combining (4.2°), $(Z,S) = \theta_{-(1-U)X_0}(Z^\circ,S^\circ)$, and the latter
claim at (4.4°) yields
$E[f(Z,S)] = \frac{E^\circ\big[\int_0^{X_0} f(\theta_{-s}(Z^\circ,S^\circ))\,ds\big]}{E^\circ[X_0]}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, (4.14°)
which gives us back, in a single step, the distribution P of the stationary
(Z,S). This formula is known as the inversion formula, indicating that
the Palm version is thought of as derived from the stationary version and
just happens to have the cycle-stationarity property. In our treatment the
stationary and cycle-stationary duals have equal status.
5 Interpretation — Point-Conditioning
The point-at-zero interpretation of the duality established in Theorem 4.1
[stated in words at (1.1)] can now be formulated as follows:
$P((Z,S) \in \cdot \mid S_0 = 0) = P^\circ((Z^\circ,S^\circ) \in \cdot)$. (5.1)
This of course does not have an immediate meaning because $P(S_0 = 0) = 0$.
In this section we present several results motivating (5.1).
5.1 The Basic Point-Conditioning Theorem
The following theorem is the key result on which the rest of this section
relies. It also provides an immediate motivation for the point-at-zero
interpretation: put s = 0 in the third identity to obtain (5.1). Since the identity
holds only for $P(S_0 \in \cdot)$ a.e. s, the motivation is still informal. However,
sending s to 0 provides a formal limit motivation of (5.1).
Theorem 5.1. Suppose the equivalent claims (4.4) and (4.4°) hold. Then
for s > 0,
$P((Z^\circ,S^\circ) \in \cdot \mid S_0 = s) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 > s)$,
$P((Z^\circ,S^\circ) \in \cdot \mid S_{-1} = -s) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 > s)$,
$P((Z,S) \in \cdot \mid S_0 = s) = P^\circ(\theta_{-s}(Z^\circ,S^\circ) \in \cdot \mid X_0 > s)$,
$P((Z,S) \in \cdot \mid S_{-1} = -s) = P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot \mid X_1 > s)$,
in the sense that the right-hand sides are versions of the left-hand sides as
functions of s.
PROOF. We start by proving the following reformulation of the first
identity: for all $h \in \mathcal{B}^+$ and $f \in \mathcal{H} \otimes \mathcal{L}^+$ it holds that
$E[h(S_0)\, g_f(S_0)] = E[h(S_0) f(Z^\circ,S^\circ)]$, (5.2)
where $g_f$ is defined by
$g_f(s) = E^\circ[f(Z^\circ,S^\circ) \mid X_0 > s]$, $s > 0$. (5.3)
Use (5.3) and (4.7) to take the first step in
$E[h(S_0)\, g_f(S_0)]$
$= \int_0^\infty h(s)\, E^\circ[f(Z^\circ,S^\circ) \mid X_0 > s]\, P^\circ(X_0 > s)/E^\circ[X_0]\,ds$
$= \int_0^\infty h(s)\, E^\circ[f(Z^\circ,S^\circ)\, 1_{\{X_0 > s\}}]/E^\circ[X_0]\,ds$
$= E^\circ\big[f(Z^\circ,S^\circ) \int_0^\infty h(s)\, 1_{\{X_0 > s\}}\,ds\big]/E^\circ[X_0]$
$= E\big[f(Z^\circ,S^\circ) \int_0^\infty h(s)\, 1_{\{X_0 > s\}}\,ds / X_0\big]$ [by (4.2)]
$= E\big[f(Z^\circ,S^\circ) \int_0^{X_0} h(s)\,ds / X_0\big]$
$= E[h(S_0) f(Z^\circ,S^\circ)]$,
while the last is due to $S_0 = (1 - U)X_0$, the uniformity of U, and its
independence of (Z°,S°). Thus (5.2) holds, that is, the first identity in
Theorem 5.1 is established.
The second identity in Theorem 5.1 follows from the first, since [due
to $S_0 = (1 - U)X_0$ and $-S_{-1} = UX_0$ and the uniformity of U and its
independence of (Z°,S°)]
$((Z^\circ,S^\circ), S_0) \overset{D}{=} ((Z^\circ,S^\circ), -S_{-1})$.
In order to establish the third identity in Theorem 5.1, use the first to take
the third step in
$P((Z,S) \in \cdot \mid S_0 = s) = P(\theta_{-S_0}(Z^\circ,S^\circ) \in \cdot \mid S_0 = s)$
$= P(\theta_{-s}(Z^\circ,S^\circ) \in \cdot \mid S_0 = s)$ [cf. Fact 3.1 in Chapter 6]
$= P^\circ(\theta_{-s}(Z^\circ,S^\circ) \in \cdot \mid X_0 > s)$ [due to the first identity].
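Finally, in order to establish the fourth identity in Theorem 5.1, use the
second to take the third step in
$P((Z,S) \in \cdot \mid S_{-1} = -s) = P(\theta_{-S_{-1}} \theta_{-X_0}(Z^\circ,S^\circ) \in \cdot \mid S_{-1} = -s)$
$= P(\theta_s \theta_{-X_0}(Z^\circ,S^\circ) \in \cdot \mid S_{-1} = -s)$ [cf. Fact 3.1 in Chapter 6]
$= P^\circ(\theta_s \theta_{-X_0}(Z^\circ,S^\circ) \in \cdot \mid X_0 > s)$ [due to the second identity]
$= P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot \mid X_1 > s)$,
while the last step is due to cycle-stationarity. □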
5.2 Total Variation Motivation of (5.1)
The following theorem gives a strong motivation for the point-at-zero
interpretation (5.1): if the stationary process has a point in a small interval [0, t]
and its origin is moved to that point, then it is close to its cycle-stationary
dual in total variation. In fact, the theorem gives explicit bounds that yield
an even stronger limit result: common component convergence. Recall that
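$\wedge$ denotes greatest common component of measures (or common part; see
Section 7.1 in Chapter 3) and that $\|\cdot\|$ denotes the total variation norm
(see Section 8.2 in Chapter 3).
Theorem 5.2. Suppose the equivalent claims (4.4) and (4.4°) hold. Then
the following bounds hold for t > 0:
$P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le t) \ge P^\circ((Z^\circ,S^\circ) \in \cdot,\ X_0 > t)$, (5.4)
$P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le t) \le \frac{P^\circ((Z^\circ,S^\circ) \in \cdot)}{P^\circ(X_0 > t)}$. (5.5)
This implies that as $t \downarrow 0$,
$\bigwedge_{0 < u \le t} P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le u) \uparrow P^\circ((Z^\circ,S^\circ) \in \cdot)$. (5.6)
Moreover, for t > 0,
$\|P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le t) - P^\circ((Z^\circ,S^\circ) \in \cdot)\| \le 2P^\circ(X_0 \le t)$, (5.7)
which implies
$P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le t) \to P^\circ((Z^\circ,S^\circ) \in \cdot)$ (5.8)
in total variation as $t \downarrow 0$.
Proof. With $A \in \mathcal{H} \otimes \mathcal{L}$ put
$h(s) = P^\circ((Z^\circ,S^\circ) \in A \mid X_0 > s) = \frac{P^\circ((Z^\circ,S^\circ) \in A,\ X_0 > s)}{P^\circ(X_0 > s)}$.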
Applying the first identity in Theorem 5.1 yields $P((Z^\circ,S^\circ) \in A \mid S_0) = h(S_0)$
and thus
$P((Z^\circ,S^\circ) \in A \mid S_0 \le t) = E[h(S_0) \mid S_0 \le t]$. (5.9)
For $0 \le s \le t$, we have
$P^\circ((Z^\circ,S^\circ) \in A,\ X_0 > t) \le h(s) \le \frac{P^\circ((Z^\circ,S^\circ) \in A)}{P^\circ(X_0 > t)}$.
Combining this and (5.9) yields (5.4) and (5.5). The common part result
(5.6) follows by noting that the lower bound increases to $P^\circ((Z^\circ,S^\circ) \in \cdot)$
and the upper bound decreases to $P^\circ((Z^\circ,S^\circ) \in \cdot)$ as $t \downarrow 0$. In order to
obtain (5.7), first deduce from (5.4) that
$P^\circ((Z^\circ,S^\circ) \in A) - P((Z^\circ,S^\circ) \in A \mid S_0 \le t) \le P^\circ(X_0 \le t)$
and then take the supremum in $A \in \mathcal{H} \otimes \mathcal{L}$ and multiply by 2 [see the
second equality at (8.12) in Chapter 3]. The total variation limit result
(5.8) now follows by noting that $P^\circ(X_0 \le t) \to 0$ as $t \downarrow 0$. □
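To see the size of these bounds in a simple case, suppose (as an assumption made here for illustration) that under P° the cycle length $X_0$ is exponential with rate $\lambda$. Then the quantities entering (5.5) and (5.7) are explicit:

```latex
\begin{align*}
P^\circ(X_0 > t) &= e^{-\lambda t}, \\
\frac{1}{P^\circ(X_0 > t)} &= e^{\lambda t} \downarrow 1 \quad \text{as } t \downarrow 0, \\
2\,P^\circ(X_0 \le t) &= 2\,(1 - e^{-\lambda t}) \le 2\lambda t .
\end{align*}
```

So in this case the total variation distance in (5.7) vanishes at least linearly in t.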
5.3 Smooth Total Variation Motivation of (5.1)
The following theorem involves the stationary process in a direct way
(without shifting its origin to a point as in Theorem 5.2): if the stationary process
has a point in a small interval [0,t], then it is close to its cycle-stationary
dual in smooth total variation and also in the stronger sense of smooth
common component convergence.
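Theorem 5.3. Suppose the equivalent claims (4.4) and (4.4°) hold. Then
the following bounds hold for t > 0 and h > 0:
$\int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le t)\,ds \ge \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot,\ X_0 > t)\,ds - \frac{\int_{h-t}^{h} P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds}{P^\circ(X_0 > t)}$ (5.10)
and
$\int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le t)\,ds \le \frac{\int_{-t}^{h} P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds}{P^\circ(X_0 > t)}$. (5.11)
This implies that as $t \downarrow 0$,
$\bigwedge_{0 < u \le t} \int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le u)\,ds \uparrow \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$. (5.12)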
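Moreover, for t > 0,
$\big\| \int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le t)\,ds - \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds \big\| \le 2h\,P^\circ(X_0 \le t) + \frac{2t}{P^\circ(X_0 > t)}$, (5.13)
which implies
$\int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le t)\,ds \to \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$ (5.14)
in total variation as $t \downarrow 0$.
Comment. The above convergence after time-smoothing has the following
randomized-origin formulation: with V uniform on (0,1) and independent
of (Z, S) under both P and P° we have, for h > 0,
$\bigwedge_{0 < s \le t} P(\theta_{Vh}(Z,S) \in \cdot \mid S_0 \le s) \uparrow P^\circ(\theta_{Vh}(Z^\circ,S^\circ) \in \cdot)$, $t \downarrow 0$,
and
$P(\theta_{Vh}(Z,S) \in \cdot \mid S_0 \le t) \to P^\circ(\theta_{Vh}(Z^\circ,S^\circ) \in \cdot)$
in total variation as $t \downarrow 0$.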
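Proof. We obtain the lower bound (5.10) as follows: for $A \in \mathcal{H} \otimes \mathcal{L}$,
$E\big[\int_0^h 1_{\{\theta_s(Z,S) \in A\}}\,ds \,\big|\, S_0 \le t\big]$
$= E\big[\int_{-S_0}^{h} 1_{\{\theta_s(Z^\circ,S^\circ) \in A\}}\,ds \,\big|\, S_0 \le t\big] - E\big[\int_{h-S_0}^{h} 1_{\{\theta_s(Z^\circ,S^\circ) \in A\}}\,ds \,\big|\, S_0 \le t\big]$
$\ge \int_0^h P(\theta_s(Z^\circ,S^\circ) \in A \mid S_0 \le t)\,ds - \int_{h-t}^{h} P(\theta_s(Z^\circ,S^\circ) \in A \mid S_0 \le t)\,ds$,
and applying (5.4) and (5.5) in Theorem 5.2 yields (5.10).
The upper bound (5.11) is obtained in a similar manner. The
common component result (5.12) follows by noting that the lower bound
increases to $\int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$ and the upper bound decreases to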
$\int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$ as $t \downarrow 0$. In order to obtain (5.13), first deduce
from (5.10) that
$\int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in A)\,ds - \int_0^h P(\theta_s(Z,S) \in A \mid S_0 \le t)\,ds \le h\,P^\circ(X_0 \le t) + t/P^\circ(X_0 > t)$
and then take the supremum in $A \in \mathcal{H} \otimes \mathcal{L}$ and multiply by 2 [see the
second equality at (8.12) in Chapter 3]. The total variation limit result
(5.14) now follows by noting that $P^\circ(X_0 \le t) \to 0$ and $P^\circ(X_0 > t) \to 1$ as
$t \downarrow 0$. □
5.4 Weak Convergence Motivation of (5.1)
The following theorem also involves the stationary process in a direct way
(without shifting its origin to a point as in Theorem 5.2): if the
stationary process has a point in a small interval [0, t], then it is close to its
cycle-stationary dual in the sense of weak convergence (convergence in
distribution; see Section 10 of Chapter 3).
For this result we need a metric path space: if $(E, \mathcal{E})$ is Polish and
$D = D_E(\mathbb{R})$ is the set of paths that are right-continuous with left-hand limits,
then the path space $(D, \mathcal{D})$ is Polish; see Ethier and Kurtz (1986). Thus [see
Theorem 2.2 in Chapter 4], $(D, \mathcal{D}) \otimes (\mathbb{R}, \mathcal{B})^{\mathbb{Z}}$ is Polish. By weak convergence
in $(D, \mathcal{D}) \otimes (L, \mathcal{L})$ we mean convergence with respect to the metric that
$(D, \mathcal{D}) \otimes (L, \mathcal{L})$ inherits as a subspace of $(D, \mathcal{D}) \otimes (\mathbb{R}, \mathcal{B})^{\mathbb{Z}}$.
Theorem 5.4. Suppose the equivalent claims (4.4) and (4.4°) hold. If
$(E, \mathcal{E})$ is Polish and the path space is $(D, \mathcal{D})$, then
$P((Z,S) \in \cdot \mid S_0 \le t) \to P^\circ((Z^\circ,S^\circ) \in \cdot)$ (5.15)
weakly as $t \downarrow 0$.
PROOF. Weak convergence means that for all bounded continuous
functions $f \in \mathcal{D} \otimes \mathcal{L}^+$,
$E[f(Z,S) \mid S_0 \le t] \to E^\circ[f(Z^\circ,S^\circ)]$, $t \downarrow 0$.
So let $f \in \mathcal{D} \otimes \mathcal{L}^+$ be bounded and continuous. Due to the total variation
limit result (5.8) in Theorem 5.2 [see the last identity at (8.11) in Chapter 3]
we have
$E[f(Z^\circ,S^\circ) \mid S_0 \le t] \to E^\circ[f(Z^\circ,S^\circ)]$, $t \downarrow 0$,
and thus it only remains to prove
$|E[f(Z^\circ,S^\circ) \mid S_0 \le t] - E[f(Z,S) \mid S_0 \le t]| \to 0$, $t \downarrow 0$. (5.16)
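For that purpose define a bounded function $f_h \in \mathcal{D} \otimes \mathcal{L}^+$ by
$f_h(\cdot) := \sup_{0 \le u \le h} |f(\cdot) - f(\theta_{-u}(\cdot))|$.
For each fixed $(z_u)_{u \in \mathbb{R}} \in D$ the map taking $t \in \mathbb{R}$ to $(z_{t+u})_{u \in \mathbb{R}} \in D$
is continuous, and for each fixed $(s_k)_{-\infty}^{\infty} \in L$ the map taking $t \in \mathbb{R}$ to
$(s_{n_{t-}+k} - t)_{-\infty}^{\infty} \in L$ is left-continuous (recall the definition of $\theta_t$ and $n_{t-}$
at (2.1) and (2.2)). Thus $f_h \downarrow 0$ pointwise as $h \downarrow 0$. This (together with the
boundedness of $f_h$) yields the final step in
$|E[f(Z^\circ,S^\circ) \mid S_0 \le t] - E[f(Z,S) \mid S_0 \le t]|$
$\le E[\,|f(Z^\circ,S^\circ) - f(Z,S)| \mid S_0 \le t]$
$\le E[f_h(Z^\circ,S^\circ) \mid S_0 \le t]$ for $t \le h$ [$(Z,S) = \theta_{-S_0}(Z^\circ,S^\circ)$]
$\to E^\circ[f_h(Z^\circ,S^\circ)]$ as $t \downarrow 0$ [due to (5.8)]
$\to 0$ as $h \downarrow 0$.
Thus (5.16) holds, and the proof is complete. □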
5.5 Coupling Motivations of (5.1)
The convergence modes (5.6), (5.12), and (5.15) have coupling counterparts
established in Sections 9 and 10 of Chapter 3.
Theorem 5.5. Suppose the equivalent claims (4.4) and (4.4°) hold. Then
the following statements hold.
(a) There is a probability space supporting pairs $(Z^{(t)}, S^{(t)})$, $0 \le t < \infty$,
and a strictly positive random variable R such that
$(Z^{(t)}, S^{(t)})$ has distribution $P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le t)$, t > 0,
$(Z^{(0)}, S^{(0)})$ has distribution $P^\circ((Z^\circ,S^\circ) \in \cdot)$,
$(Z^{(t)}, S^{(t)}) = (Z^{(0)}, S^{(0)})$ for $0 < t \le R$.
(b) For h > 0, there is a probability space supporting pairs $(Z^{(h,t)}, S^{(h,t)})$,
$0 \le t < \infty$, and a strictly positive random variable $R^{(h)}$ such that
$(Z^{(h,t)}, S^{(h,t)})$ has distribution $\frac{1}{h} \int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le t)\,ds$, t > 0,
$(Z^{(h,0)}, S^{(h,0)})$ has distribution $\frac{1}{h} \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$,
$(Z^{(h,t)}, S^{(h,t)}) = (Z^{(h,0)}, S^{(h,0)})$ for $0 < t \le R^{(h)}$.
(c) If $(E, \mathcal{E})$ is Polish and the path space is $(D, \mathcal{D})$, then for all strictly
positive sequences $t_n$, $0 \le n < \infty$, decreasing strictly to 0 as $n \to \infty$,
there is a probability space supporting pairs $(Z^{(t_n)}, S^{(t_n)})$, $0 < n < \infty$,
and $(Z^{(0)}, S^{(0)})$ such that
$(Z^{(t_n)}, S^{(t_n)})$ has distribution $P((Z,S) \in \cdot \mid S_0 \le t_n)$, n > 0,
$(Z^{(0)}, S^{(0)})$ has distribution $P^\circ((Z^\circ,S^\circ) \in \cdot)$,
$(Z^{(t_n)}, S^{(t_n)}) \to (Z^{(0)}, S^{(0)})$ pointwise as $n \to \infty$.
Proof. (a) By Theorem 9.4 in Chapter 3 the result (5.6) in Theorem 5.2
above is equivalent to (a). In order to see this let $Y_t$ have distribution
$P((Z^\circ,S^\circ) \in \cdot \mid S_0 \le 1/t)$ and $Y_\infty$ have distribution $P^\circ((Z^\circ,S^\circ) \in \cdot)$ and put
$(Z^{(t)}, S^{(t)}) := Y_{1/t}$, $(Z^{(0)}, S^{(0)}) := Y_\infty$ and $R := 1/K$.
(b) By Theorem 9.4 in Chapter 3 the result (5.12) in Theorem 5.3
above is equivalent to (b). In order to see this let $Y_t$ have the
distribution $h^{-1} \int_0^h P(\theta_s(Z,S) \in \cdot \mid S_0 \le 1/t)\,ds$ and let $Y_\infty$ have the distribution
$h^{-1} \int_0^h P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds$ and put $(Z^{(h,t)}, S^{(h,t)}) := Y_{1/t}$, $(Z^{(h,0)}, S^{(h,0)}) := Y_\infty$
and $R^{(h)} := 1/K$.
(c) The metric that the subspace $(D, \mathcal{D}) \otimes (L, \mathcal{L})$ inherits from the Polish
$(D, \mathcal{D}) \otimes (\mathbb{R}, \mathcal{B})^{\mathbb{Z}}$ is separable. Therefore, according to Theorem 10.1 in
Chapter 3, (5.15) in Theorem 5.4 above is equivalent to (c). In order to see
this take $P_n = P((Z,S) \in \cdot \mid S_0 \le t_n)$ and $P = P^\circ((Z^\circ,S^\circ) \in \cdot)$ and put
$(Z^{(t_n)}, S^{(t_n)}) := Y^{(n)}$ and $(Z^{(0)}, S^{(0)}) := Y$. □
6 Application - Perfect Simulation
In this section we show how the two-step duality construction in
Theorem 4.1 yields perfect solutions of a classical simulation problem, the so-
called initial transient problem. The solutions are based on the so-called
acceptance-rejection algorithm. We start by explaining the problem in
general terms.
6.1 The Initial Transient Problem
Stochastic simulation is concerned with generating realizations of random
variables and stochastic processes, usually in order to estimate properties
that cannot for some reason be calculated mathematically (for instance,
because of their complicated structure or because the input in the stochastic
model under consideration is not completely known).
An old problem in this context is the following: how can we generate a
stationary version of a given process? Suppose, for instance, we know the
transition probabilities of an irreducible positive recurrent Markov chain
but cannot calculate the stationary distribution. Then we could start a
chain in a fixed state, let it run long enough so that it gets close to
stationarity, and then use the realization from that time onward to estimate
the unknown stationary distribution. The obvious problem is, what is long
enough?
If we do not wait long enough, a bias will be introduced. Consider, for
instance, a queueing model starting with an empty system. The system
will take some time to fill up, and if we do not wait long enough, we will
underestimate the stationary queue length or work load. This problem goes
under the name the initial transient problem.
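The bias can be made explicit for a small Markov chain by propagating the initial distribution exactly. The sketch below uses a hypothetical two-state transition matrix chosen only for illustration; its stationary distribution is (2/3, 1/3), and the function names are not from the text.

```python
def step(dist, P):
    """One step of a finite Markov chain: row vector times transition matrix."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def distribution_after(start, P, n_steps):
    """Distribution of the chain after n_steps, started from `start`."""
    dist = list(start)
    for _ in range(n_steps):
        dist = step(dist, P)
    return dist

# Hypothetical chain; solving pi = pi P gives pi = (2/3, 1/3).
P = [[0.9, 0.1],
     [0.2, 0.8]]
stationary = [2 / 3, 1 / 3]

bias_after_5 = distribution_after([1.0, 0.0], P, 5)[0] - stationary[0]
bias_after_50 = distribution_after([1.0, 0.0], P, 50)[0] - stationary[0]
```

For this chain the bias in the probability of state 0 after n steps is $(1/3)\,0.7^n$: it shrinks geometrically but is never exactly zero at any fixed deterministic time, which is the difficulty the perfect simulation approach avoids.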
Our solutions to the initial transient problem are not obtained by
waiting until the process is 'close enough' to stationarity. We shall generate a
process in perfect stationarity. This is called perfect simulation. And we
shall not do this by waiting for a single process to be in perfect stationarity
but by generating several processes until the right one is found: we shall
use acceptance-rejection.
6.2 Acceptance-Rejection
Acceptance-rejection is a method for producing a random element with
a desired distribution Q by selecting it from a sequence of i.i.d. random
elements $Y^{(1)}, Y^{(2)}, \ldots$ from another distribution P. This P must be such
that Q has a bounded density g with respect to P, say $g \le c$. Sequentially,
for each n, accept $Y^{(n)}$ with probability g(y)/c, where y is the realized value
of $Y^{(n)}$. According to the following theorem, an accepted random element
has the desired distribution Q.
Theorem 6.1. Let $(Y^{(1)}, I^{(1)}), (Y^{(2)}, I^{(2)}), \ldots$ be i.i.d. copies of a pair
(Y,I), where Y is a random element in some measurable space $(E, \mathcal{E})$ with
distribution P and I is a 0-1 variable taking the value 1 with probability
p > 0. Let these pairs be defined on a probability space $(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}}, P_{\mathrm{sim}})$.
Then the following claims hold.
(a) Define a geometric random variable M with parameter p and mean
1/p by
$M = \inf\{n \ge 1 : I^{(n)} = 1\}$
= number of acceptance-rejection trials.
Then $Y^{(M)}$ has the distribution $P_{\mathrm{sim}}(Y \in \cdot \mid I = 1)$. This distribution
has a density with respect to P that is bounded by 1/p.
(b) Let Q be a probability measure on $(E, \mathcal{E})$ having a density g with
respect to P and suppose there is a finite constant c such that $g \le c$.
If
$P_{\mathrm{sim}}(I = 1 \mid Y) = g(Y)/c$, (6.1)
then M has parameter p = 1/c and $E_{\mathrm{sim}}[M] = c$ and
$E_{\mathrm{sim}}[f(Y^{(M)})] = E_{\mathrm{sim}}[f(Y) g(Y)] = \int f\,dQ$, $f \in \mathcal{E}^+$, (6.2)
that is, $Y^{(M)}$ has the distribution Q.
(c) Suppose $g \le c$ and $I = 1_{\{U \le g(Y)/c\}}$, where U is independent of Y and
uniformly distributed on (0,1). Then (6.1) holds.
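Proof. (a) Clearly, M has geometric distribution with parameter p, and
thus $E_{\mathrm{sim}}[M] = 1/p$. Further, for $f \in \mathcal{E}^+$,
$E_{\mathrm{sim}}[f(Y^{(M)})] = \sum_{n=1}^{\infty} E_{\mathrm{sim}}[f(Y^{(n)})\, 1_{\{M = n\}}]$
$= \sum_{n=1}^{\infty} E_{\mathrm{sim}}[f(Y^{(n)})\, I^{(n)}\, 1_{\{M \ge n\}}]$
$= \sum_{n=1}^{\infty} E_{\mathrm{sim}}[f(Y^{(n)})\, I^{(n)}]\, P_{\mathrm{sim}}(M \ge n)$ (independence)
$= E_{\mathrm{sim}}[f(Y) I] \sum_{n=1}^{\infty} P_{\mathrm{sim}}(M \ge n)$
$= E_{\mathrm{sim}}[f(Y) I]\, E_{\mathrm{sim}}[M]$ (Lemma 10.1 in Chapter 2)
$= E_{\mathrm{sim}}[f(Y) I]/p = E_{\mathrm{sim}}[f(Y) \mid I = 1]$,
and thus $Y^{(M)}$ has the distribution $P_{\mathrm{sim}}(Y \in \cdot \mid I = 1)$. Since
$P_{\mathrm{sim}}(Y \in \cdot \mid I = 1) \le P_{\mathrm{sim}}(Y \in \cdot)/P_{\mathrm{sim}}(I = 1) = P/p$,
it follows that $P_{\mathrm{sim}}(Y \in \cdot \mid I = 1)$ has a density with respect to P and that
the density is bounded by 1/p.
(b) If (6.1) holds, then
$p = P_{\mathrm{sim}}(I = 1) = E_{\mathrm{sim}}[g(Y)]/c = 1/c$,
and thus $E_{\mathrm{sim}}[M] = c$. Further, due to (a) and p = 1/c, we have
$E_{\mathrm{sim}}[f(Y^{(M)})] = E_{\mathrm{sim}}[f(Y) \mid I = 1]$
$= E_{\mathrm{sim}}[f(Y) I]\, c = E_{\mathrm{sim}}[f(Y)\, E_{\mathrm{sim}}[I \mid Y]]\, c$,
and since $E_{\mathrm{sim}}[I \mid Y] = P_{\mathrm{sim}}(I = 1 \mid Y) = g(Y)/c$, this yields
$E_{\mathrm{sim}}[f(Y^{(M)})] = E_{\mathrm{sim}}[f(Y) g(Y)] = \int f g\,dP = \int f\,dQ$.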
274 Chapter 8. STATIONARITY, THE PALM DUALITIES
(c) Since $U$ is uniform on $(0,1)$ and independent of $Y$, we have
$$P_{\mathrm{sim}}(I = 1 \mid Y) = P_{\mathrm{sim}}(U \le g(Y)/c \mid Y) = g(Y)/c$$
as desired. □
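The statement of Theorem 6.1(b and c) can be tried out numerically. The following is only a minimal sketch under assumed toy choices that are not from the text: $P$ is uniform on $(0,1)$, $Q$ has density $g(y) = 2y$ with respect to $P$, and $c = 2$, so that $g/c$ is the identity. The function names are hypothetical.

```python
import random


def accept_reject(sample_p, g_over_c, rng):
    """Draw Y^(1), Y^(2), ... from P until one is accepted.

    Returns the accepted value Y^(M) and the number of trials M.
    """
    trials = 0
    while True:
        trials += 1
        y = sample_p(rng)
        # I = 1{U <= g(Y)/c} with U uniform and independent of Y, as in part (c)
        if rng.random() <= g_over_c(y):
            return y, trials


def demo_theorem_6_1(n=20000, seed=1):
    """Toy case: P uniform on (0,1), g(y) = 2y, c = 2 (assumed for illustration)."""
    rng = random.Random(seed)
    runs = [accept_reject(lambda r: r.random(), lambda y: y, rng) for _ in range(n)]
    mean_y = sum(y for y, _ in runs) / n   # should be near the Q-mean 2/3
    mean_m = sum(m for _, m in runs) / n   # should be near c = 2
    return mean_y, mean_m
```

Note that `accept_reject` only needs the ratio $g/c$ and a way to draw from $P$, in line with Remark 6.1 below.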
Remark 6.1. An interesting feature of the acceptance-rejection method
is that we need to know neither P nor Q. We only need to know g/c and to
be able to obtain realizations of the i.i.d. $Y^{(1)}, Y^{(2)}, \dots$. These realizations
need not be produced by an explicit use of P, that is, the characteristics
of the input in the stochastic model under consideration need not be
completely known. It can, for instance, be the output of another simulation.
Remark 6.2. Let $Q$ be a probability measure on $(E, \mathcal{E})$ having a
density $g$ with respect to $P$. If we define a new probability measure $Q_{\mathrm{sim}}$ on
$(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}})$ by
$$dQ_{\mathrm{sim}} = g(Y)\,dP_{\mathrm{sim}},$$
then by Lemma 4.1, $Y$ has the distribution $Q$ under $Q_{\mathrm{sim}}$. Thus $Y^{(M)}$
has the same distribution under $P_{\mathrm{sim}}$ as $Y$ has under $Q_{\mathrm{sim}}$, that is, the
acceptance-rejection algorithm has the same effect as a change of measure.
The simulation relevance of Theorem 4.1 should now be getting clearer.
6.3 Generating Stationary Renewals — Bounded Recurrence
Before applying the full strength of Theorem 4.1, let us consider an
elementary special case, namely the problem of generating a stationary
renewal process when it is known how to generate the i.i.d. recurrence times
$X_1, X_2, \dots$ but their distribution function $F$ is not explicitly known.
In Section 9 of Chapter 2 (and Theorem 4.1 above) we showed that if
$F$ has a finite mean $m$, then a stationary renewal process is obtained by
placing the origin uniformly at random in an interval of length $X_0$, where
$X_0$ has the density $x/m$, $0 < x < \infty$, with respect to $F$. Thus the simulation
problem is solved if we can generate such an $X_0$.
This can be done by acceptance-rejection if the recurrence times are
bounded with probability one by a known finite constant, say $X_1 \le a$.
Then
$$g(x) = x/m, \quad 0 < x \le a,$$
is a density of $X_0$ with respect to $F$, and $g \le a/m$. Theorem 6.1(b and c)
thus yields the following procedure for generating $X_0$:
let $U^{(1)}, U^{(2)}, \dots$ be i.i.d., uniform on $(0,1)$ and independent
of the $X_n$ and accept the first $X_n$ satisfying $U^{(n)} \le X_n/a$.
This approach is extended beyond renewal processes in the next subsection.
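The procedure above can be sketched as follows. This is a hedged illustration, not a definitive implementation: the recurrence-time sampler (values 1 or 3, each with probability 1/2, so $a = 3$) is an assumed toy input, and the function names are hypothetical. The sketch uses only the sampler and the bound $a$; neither $F$ nor $m$ is needed.

```python
import random


def stationary_first_interval(sample_x, a, rng):
    """Accept the first X_n with U^(n) <= X_n / a.

    Returns (x0, delay): x0 is the length of the interval straddling the
    origin and delay is the time from the origin to the next renewal.
    """
    while True:
        x = sample_x(rng)
        if rng.random() <= x / a:
            break
    # Place the origin uniformly at random in the accepted interval.
    delay = rng.random() * x
    return x, delay


def fraction_of_long_intervals(n=20000, seed=2):
    """Toy recurrence times: 1 or 3 with probability 1/2 each (assumption)."""
    rng = random.Random(seed)

    def sample_x(r):
        return 1.0 if r.random() < 0.5 else 3.0

    draws = [stationary_first_interval(sample_x, 3.0, rng)[0] for _ in range(n)]
    return sum(1 for x in draws if x == 3.0) / n
```

For this toy input the density $x/m$ with $m = 2$ gives probability $3/4$ to the value 3, which is what `fraction_of_long_intervals` should approximate.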
6.4 Generating the Stationary Dual When $X_0 \le a < \infty$
Now consider the duality established in Theorem 4.1 between a stationary
$(Z,S)$ under $P$ and a cycle-stationary $(Z^\circ,S^\circ)$ under $P^\circ$, where
$$dP/dP^\circ = X_0/E^\circ[X_0] \quad\text{and}\quad (Z,S) = \theta_{-(1-U)X_0}(Z^\circ,S^\circ).$$
Suppose we wish to generate the stationary $(Z,S)$ when it is known how to
generate its cycle-stationary dual. Here is an acceptance-rejection solution
in the bounded cycle-length case, that is, when there is a known constant
$a < \infty$ such that
$$P^\circ(X_0 \le a) = 1. \tag{6.3}$$
Recursively, for $n \ge 1$:
1. Generate $(Z^{(n)}, S^{(n)})$ with distribution $P^\circ((Z^\circ,S^\circ) \in \cdot)$ until $X_0^{(n)}$
has been realized.
2. Generate an independent $U^{(n)}$ uniformly distributed on $(0,1)$.
3. Repeat steps 1 and 2 independently for $n \ge 1$ until $\{U^{(n)} \le X_0^{(n)}/a\}$
occurs and put
$$M = \inf\{n \ge 1 : U^{(n)} \le X_0^{(n)}/a\}.$$
4. Now generate as much of $(Z^{(M)}, S^{(M)})$ as desired.
5. Generate an independent $V$ uniformly distributed on $(0,1)$.
According to the following theorem,
$\theta_{-V X_0^{(M)}}(Z^{(M)}, S^{(M)})$ is a (perfect) copy of the stationary dual,
and the expected number of acceptance-rejection trials is $a/E^\circ[X_0]$.
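Theorem 6.2. Let $(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}}, P_{\mathrm{sim}})$ be the probability space supporting
the random elements generated in steps 1 through 5. If (6.3) holds, then
$$P_{\mathrm{sim}}(\theta_{-V X_0^{(M)}}(Z^{(M)}, S^{(M)}) \in \cdot) = P((Z,S) \in \cdot), \tag{6.4}$$
$$E_{\mathrm{sim}}[M] = a/E^\circ[X_0]. \tag{6.5}$$
Proof. Apply Theorem 6.1(b and c) [see Remark 6.2] with
$$P = P^\circ((Z^\circ,S^\circ) \in \cdot) \quad\text{and}\quad Q = P((Z^\circ,S^\circ) \in \cdot),$$
$$g(Z^\circ,S^\circ) = X_0/E^\circ[X_0] \quad\text{and}\quad c = a/E^\circ[X_0],$$
to obtain $P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot) = P((Z^\circ,S^\circ) \in \cdot)$. Now (6.4) follows
from the fact that both $\theta_{-V X_0^{(M)}}(Z^{(M)}, S^{(M)})$ and $(Z,S)$ are obtained by
placing the origin uniformly at random in the interval straddling zero. By
Theorem 6.1, $E_{\mathrm{sim}}[M] = c$, and (6.5) follows. □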
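Steps 1 through 5 can be sketched for the simplest cycle-stationary input, a sequence of points with i.i.d. cycle lengths and no accompanying process $Z$. This is an assumption made purely to keep the example short; the cycle-length law (1 or 3 with probability 1/2 each, $a = 3$) and all names are hypothetical.

```python
import random


def stationary_points(sample_cycle, a, n_after, rng):
    """Sketch of steps 1-5 for a point sequence with i.i.d. cycle lengths.

    Returns (points, trials), where points = [S_{-1}, S_0, S_1, ...]
    satisfies S_{-1} < 0 <= S_0 and trials is the number M of trials.
    """
    trials = 0
    while True:
        trials += 1
        x0 = sample_cycle(rng)            # step 1: realize X_0^(n)
        if rng.random() <= x0 / a:        # steps 2 and 3: accept if U^(n) <= X_0^(n)/a
            break
    # Step 4: continue the accepted sequence; before shifting, the accepted
    # cycle is [-x0, 0) and the later points are partial sums of new cycles.
    later, s = [], 0.0
    for _ in range(n_after):
        s += sample_cycle(rng)
        later.append(s)
    v = rng.random()                      # step 5
    # Shifting the origin to -v * x0 adds v * x0 to every point.
    offset = v * x0
    points = [-x0 + offset, offset] + [p + offset for p in later]
    return points, trials


def demo_section_6_4(n=20000, seed=3):
    rng = random.Random(seed)

    def cycle(r):
        return 1.0 if r.random() < 0.5 else 3.0

    runs = [stationary_points(cycle, 3.0, 2, rng) for _ in range(n)]
    frac_delay_le_1 = sum(1 for pts, _ in runs if pts[1] <= 1.0) / n
    mean_trials = sum(t for _, t in runs) / n
    return frac_delay_le_1, mean_trials
```

For this toy input $E^\circ[X_0] = 2$, so (6.5) predicts a mean of $3/2$ trials, and the delay $S_0$ should fall in $[0,1]$ with probability $1/2$.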
6.5 Example — The S-s-Inventory System
An example of a process with bounded cycle-lengths is the supply process
in the so-called S-s-inventory system with a deterministic demand
component. This system is as follows. Some material is stored in a storage with
maximal capacity S. Demand is the sum of a linear deterministic
component and a random stationary component. The deterministic demand has
rate d > 0, that is, during a period of length t the quantity demanded is
dt. The random demand is compound Poisson, that is, i.i.d. nonnegative
quantities are demanded at the times of a renewal process with exponential
recurrence times (more generally, the compound Poisson demand can be
replaced by any stationary stochastic process with nonnegative independent
increments). When the supply drops below a minimal level s, the storage
is filled up again to its maximal capacity S.
Call the process formed by the supply in store at time $t$, $0 \le t < \infty$,
the supply process. The times between successive jumps to the maximal
level S split the supply process into an i.i.d. sequence of cycles. Thus, if
it is known how to generate the demand, then we can generate a cycle-
stationary supply process by starting at time 0 with maximal supply S.
The cycle-lengths are bounded by
$$a = (S - s)/d,$$
and thus the procedure in Section 6.4 gives us a way to generate the
stationary version of the supply process.
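The bound on the cycle-lengths can be checked on a simulated cycle. The sketch below makes several assumptions that are not in the text: the random demand sizes are exponential, a cycle ends as soon as the supply reaches the level $s$ (either continuously or by a jump), and the parameter and function names are hypothetical.

```python
import random


def supply_cycle_length(S, s, d, rate, mean_jump, rng):
    """Length of one cycle of the supply process, started at the maximal level S.

    d is the deterministic demand rate, rate is the Poisson rate of the random
    demands, and mean_jump is the mean of the (assumed exponential) demand sizes.
    """
    level, t = S, 0.0
    while True:
        gap = rng.expovariate(rate)          # time until the next random demand
        time_to_s = (level - s) / d          # time for the linear demand alone to reach s
        if gap >= time_to_s:
            return t + time_to_s             # level s is reached before the next jump
        t += gap
        level -= d * gap                     # deterministic demand during the gap
        level -= rng.expovariate(1.0 / mean_jump)   # random demand at the jump time
        if level < s:
            return t                         # the jump took the supply below s
```

Every returned length is at most $(S - s)/d$, because the deterministic demand alone exhausts the margin $S - s$ in that time; random demands can only shorten the cycle.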
6.6 Generating the Cycle-Stationary Dual When $X_0 \ge b > 0$
Consider again the duality established in Theorem 4.1 between a stationary
$(Z,S)$ under $P$ and a cycle-stationary $(Z^\circ,S^\circ)$ under $P^\circ$, where
$$dP^\circ/dP = 1/(E[1/X_0]X_0) \quad\text{and}\quad (Z^\circ,S^\circ) = \theta_{S_0}(Z,S).$$
This time suppose we wish to generate the cycle-stationary $(Z^\circ,S^\circ)$ when
it is known how to generate its stationary dual. Here is an acceptance-rejection
solution in the case when the cycle-lengths are bounded away from
zero, that is, when there is a known constant $b > 0$ such that
$$P(X_0 \ge b) = 1. \tag{6.6}$$
Recursively, for $n \ge 1$:
1. Generate $(Z^{(n)}, S^{(n)})$ with distribution $P((Z^\circ,S^\circ) \in \cdot)$ until $X_0^{(n)}$
has been realized.
2. Generate an independent $U^{(n)}$ uniformly distributed on $(0,1)$.
3. Repeat steps 1 and 2 independently for $n \ge 1$ until $\{U^{(n)} \le b/X_0^{(n)}\}$
occurs and put
$$M = \inf\{n \ge 1 : U^{(n)} \le b/X_0^{(n)}\}.$$
4. Now generate as much of $(Z^{(M)}, S^{(M)})$ as desired.
According to the following theorem,
$(Z^{(M)}, S^{(M)})$ is a (perfect) copy of the cycle-stationary dual,
and the expected number of acceptance-rejection trials is $1/(E[1/X_0]b)$.
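Theorem 6.3. Let $(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}}, P_{\mathrm{sim}})$ be the probability space supporting
the random elements generated in steps 1 through 4. If (6.6) holds, then
$$P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot) = P^\circ((Z^\circ,S^\circ) \in \cdot),$$
$$E_{\mathrm{sim}}[M] = 1/(E[1/X_0]b).$$
Proof. Apply Theorem 6.1(b and c) [see Remark 6.2] with
$$P = P((Z^\circ,S^\circ) \in \cdot) \quad\text{and}\quad Q = P^\circ((Z^\circ,S^\circ) \in \cdot),$$
$$g(Z^\circ,S^\circ) = 1/(E[1/X_0]X_0) \quad\text{and}\quad c = 1/(E[1/X_0]b),$$
to obtain the desired results. □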
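The acceptance step of this procedure can be sketched on the cycle length alone. The stationary law of $X_0$ used here (1 with probability 1/4, 3 with probability 3/4, so $b = 1$ and $E[1/X_0] = 1/2$) is an assumed toy input and the names are hypothetical.

```python
import random


def cycle_stationary_x0(sample_stationary_x0, b, rng):
    """Accept the first stationary X_0^(n) with U^(n) <= b / X_0^(n).

    Returns the accepted cycle length and the number of trials M.
    """
    trials = 0
    while True:
        trials += 1
        x0 = sample_stationary_x0(rng)
        if rng.random() <= b / x0:
            return x0, trials


def demo_section_6_6(n=20000, seed=5):
    rng = random.Random(seed)

    def stationary_x0(r):
        # Toy stationary law of X_0 (assumption): 1 w.p. 1/4, 3 w.p. 3/4.
        return 1.0 if r.random() < 0.25 else 3.0

    runs = [cycle_stationary_x0(stationary_x0, 1.0, rng) for _ in range(n)]
    frac_short = sum(1 for x, _ in runs if x == 1.0) / n
    mean_trials = sum(t for _, t in runs) / n
    return frac_short, mean_trials
```

For this input the length-debiased law puts mass $1/2$ on each value, and Theorem 6.3 predicts $1/(E[1/X_0]b) = 2$ trials on average.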
6.7 Generating the Stationary Dual — Delay Time Given
Once more consider the duality established in Theorem 4.1 between a
stationary $(Z,S)$ under $P$ and a cycle-stationary $(Z^\circ,S^\circ)$ under $P^\circ$. We shall
now show that the problem of generating the stationary $(Z,S)$, when it is
known how to generate its cycle-stationary dual, can be reduced to that of
generating the stationary delay time, that is, a random variable having the
distribution $G_\infty$ with density
$$P^\circ(X_0 > x)/E^\circ[X_0], \quad 0 < x < \infty, \quad [\text{see (4.7)}].$$
We shall use acceptance-rejection and the following result from
Theorem 5.1:
$$P((Z^\circ,S^\circ) \in \cdot \mid S_0 = s) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 > s), \quad s > 0. \tag{6.7}$$
Proceed as follows:
1. Generate $W$ with distribution $G_\infty$ and let $W$ be the stationary delay.
2. Generate independent $(Z^{(n)}, S^{(n)})$ with distribution $P^\circ((Z^\circ,S^\circ) \in \cdot)$
until $X_0^{(n)}$ has been realized.
3. Repeat step 2 independently for $n \ge 1$ until $\{X_0^{(n)} > W\}$ occurs and
put
$$M = \inf\{n \ge 1 : X_0^{(n)} > W\}.$$
According to the following theorem,
$\theta_{-W}(Z^{(M)}, S^{(M)})$ is a (perfect) copy of the stationary dual,
and the expected number of acceptance-rejection trials is infinite!
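Theorem 6.4. Let $(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}}, P_{\mathrm{sim}})$ be the probability space supporting
the random elements generated in steps 1 through 3. Then
$$P_{\mathrm{sim}}(\theta_{-W}(Z^{(M)}, S^{(M)}) \in \cdot) = P((Z,S) \in \cdot), \tag{6.8}$$
$$E_{\mathrm{sim}}[M] = \infty. \tag{6.9}$$
Proof. We shall use Theorem 6.1(a) with $P_{\mathrm{sim}}$ replaced by $P_{\mathrm{sim}}(\cdot \mid W = s)$.
Since $W$ is independent of the $(Z^{(n)}, S^{(n)})$ this is the same as applying
Theorem 6.1(a) with $Y^{(n)} = (Z^{(n)}, S^{(n)})$ and $I^{(n)} = 1_{\{X_0^{(n)} > s\}}$. Thus, for
$s > 0$,
$$P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot \mid W = s) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 > s), \tag{6.10}$$
$$E_{\mathrm{sim}}[M \mid W = s] = 1/P^\circ(X_0 > s). \tag{6.11}$$
Comparing (6.10) and (6.7) yields
$$P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot \mid W = s) = P((Z^\circ,S^\circ) \in \cdot \mid S_0 = s), \quad s > 0.$$
This and the fact that $P_{\mathrm{sim}}(W \in \cdot) = P(S_0 \in \cdot)$ show that
$$P_{\mathrm{sim}}(((Z^{(M)}, S^{(M)}), W) \in \cdot) = P(((Z^\circ,S^\circ), S_0) \in \cdot).$$
Now (6.8) follows from the fact that $\theta_{-W}(Z^{(M)}, S^{(M)})$ is the same
measurable mapping of $((Z^{(M)}, S^{(M)}), W)$ as $(Z,S)$ is of $((Z^\circ,S^\circ), S_0)$.
Further, recall that $W$ has density $P^\circ(X_0 > s)/E^\circ[X_0]$. This yields the
first equality in
$$\begin{aligned}
E_{\mathrm{sim}}[M] &= \int_0^\infty E_{\mathrm{sim}}[M \mid W = s]\, P^\circ(X_0 > s)\,ds / E^\circ[X_0] \\
&= \int_0^\infty ds / E^\circ[X_0] \quad \text{(due to (6.11))},
\end{aligned}$$
and noting that $\int_0^\infty ds = \infty$ yields (6.9). □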
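A small experiment illustrates both conclusions of Theorem 6.4. The sketch assumes exponential cycle lengths with mean 1 (not from the text), for which $G_\infty$ is again exponential with mean 1; the function names are hypothetical.

```python
import random


def x0_given_delay(sample_cycle, w, rng):
    """Steps 2 and 3: draw cycle lengths until one exceeds the delay w.

    Returns the accepted cycle length X_0^(M) and the number of trials M.
    """
    trials = 0
    while True:
        trials += 1
        x = sample_cycle(rng)
        if x > w:
            return x, trials


def mean_accepted_cycle(n=5000, seed=6):
    """Exp(1) cycle lengths (assumption), so the delay W is Exp(1) as well."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        w = rng.expovariate(1.0)                                     # step 1
        x, _ = x0_given_delay(lambda r: r.expovariate(1.0), w, rng)
        total += x
    return total / n
```

The accepted cycle length should have the length-biased law, which for this input has mean 2. The trial counts, by contrast, have conditional mean $e^{W}$ given $W$ and hence infinite mean, so their sample average does not settle down as `n` grows.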
6.8 Imperfect Simulation — Unbounded Cycle-Lengths
Finally, let us see what happens if we apply the method of Section 6.4
without the assumption (6.3) that the cycle-lengths are bounded. So fix an
$a < \infty$, carry out steps 1 through 5 in Section 6.4, and denote the number
of acceptance-rejection trials by $M_a$. According to the following theorem,
$\theta_{-V X_0^{(M_a)}}(Z^{(M_a)}, S^{(M_a)})$ is an imperfect copy of the stationary dual
with perfection probability $G_\infty(a/G_\infty(a))$. Note that
$$G_\infty(a/G_\infty(a)) \ge G_\infty(a) \to 1 \quad\text{as } a \to \infty.$$
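Theorem 6.5. Let $a > 0$ be a finite constant and let $(\Omega_{\mathrm{sim}}, \mathcal{F}_{\mathrm{sim}}, P_{\mathrm{sim}})$ be
the probability space supporting the random elements generated in steps 1
through 5 of Section 6.4. Denote the number of acceptance-rejection trials
by $M_a$. Then
$$\|P_{\mathrm{sim}}(\theta_{-V X_0^{(M_a)}}(Z^{(M_a)}, S^{(M_a)}) \in \cdot) - P((Z,S) \in \cdot)\| = 2(1 - G_\infty(a/G_\infty(a))), \tag{6.12}$$
$$\|P_{\mathrm{sim}}((Z^{(M_a)}, S^{(M_a)}) \in \cdot) - P((Z^\circ,S^\circ) \in \cdot)\| = 2(1 - G_\infty(a/G_\infty(a))), \tag{6.13}$$
$$\|P_{\mathrm{sim}}(\theta_{-V X_0^{(M_a)}}(Z^{(M_a)}, S^{(M_a)}) \in \cdot) \wedge P((Z,S) \in \cdot)\| = G_\infty(a/G_\infty(a)), \tag{6.14}$$
$$\|P_{\mathrm{sim}}((Z^{(M_a)}, S^{(M_a)}) \in \cdot) \wedge P((Z^\circ,S^\circ) \in \cdot)\| = G_\infty(a/G_\infty(a)), \tag{6.15}$$
$$E_{\mathrm{sim}}[M_a] = a/E^\circ[a \wedge X_0]. \tag{6.16}$$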
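Proof. Write $M = M_a$. Apply Theorem 6.1(b and c) [see Remark 6.2]
with
$$P = P^\circ((Z^\circ,S^\circ) \in \cdot),$$
$$g(Z^\circ,S^\circ) = (a \wedge X_0)/E^\circ[a \wedge X_0] \quad\text{and}\quad c = a/E^\circ[a \wedge X_0],$$
to obtain (6.16) and
$$E_{\mathrm{sim}}[f(Z^{(M)}, S^{(M)})] = E^\circ[f(Z^\circ,S^\circ)(a \wedge X_0)]/E^\circ[a \wedge X_0]. \tag{6.17}$$
Due to Lemma 4.1, it follows from (6.17) that
$$P_{\mathrm{sim}}(X_0^{(M)} \in dx) = (a \wedge x)\,P^\circ(X_0 \in dx)/E^\circ[a \wedge X_0], \tag{6.18}$$
$$P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot \mid X_0^{(M)} = x) = P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 = x). \tag{6.19}$$
By (4.11), we have
$$P^\circ((Z^\circ,S^\circ) \in \cdot \mid X_0 = x) = P((Z^\circ,S^\circ) \in \cdot \mid X_0 = x).$$
This and (6.19), together with Lemma 3.1 of Chapter 6, yields the first
equality in
$$\begin{aligned}
\|P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot) - P((Z^\circ,S^\circ) \in \cdot)\|
&= \|P_{\mathrm{sim}}(X_0^{(M)} \in \cdot) - P(X_0 \in \cdot)\| \\
&= 2(1 - \|P_{\mathrm{sim}}(X_0^{(M)} \in \cdot) \wedge P(X_0 \in \cdot)\|),
\end{aligned} \tag{6.20}$$
while the second identity follows from (8.12) of Chapter 3. By (4.9) we have
$$P(X_0 \in dx) = x\,P^\circ(X_0 \in dx)/E^\circ[X_0], \quad 0 < x < \infty.$$
This and (6.18), together with (8.5) of Chapter 3, yield the first equality
in
$$\begin{aligned}
\|P_{\mathrm{sim}}(X_0^{(M)} \in \cdot) \wedge P(X_0 \in \cdot)\|
&= \int \Big(\frac{a \wedge x}{E^\circ[a \wedge X_0]} \wedge \frac{x}{E^\circ[X_0]}\Big) P^\circ(X_0 \in dx) \\
&= \frac{1}{E^\circ[X_0]} \int \Big(\frac{a \wedge x}{E^\circ[a \wedge X_0]/E^\circ[X_0]} \wedge x\Big) P^\circ(X_0 \in dx) \\
&= \frac{1}{E^\circ[X_0]} \int \Big(\frac{a}{G_\infty(a)} \wedge x\Big) P^\circ(X_0 \in dx) \quad [\text{by definition of } G_\infty] \\
&= G_\infty(a/G_\infty(a)) \quad [\text{by definition of } G_\infty].
\end{aligned}$$
This and (6.20) yield (6.13).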
Since $V$ and $(1 - U)$ are identically distributed and independent of
$(Z^{(M)}, S^{(M)})$ and $(Z^\circ, S^\circ)$, respectively, we obtain from (3.3) of Lemma 3.1
in Chapter 6 that
$$\|P_{\mathrm{sim}}(((Z^{(M)}, S^{(M)}), V) \in \cdot) - P(((Z^\circ,S^\circ), (1 - U)) \in \cdot)\| = \|P_{\mathrm{sim}}((Z^{(M)}, S^{(M)}) \in \cdot) - P((Z^\circ,S^\circ) \in \cdot)\|. \tag{6.21}$$
Since, moreover, $\theta_{-V X_0^{(M)}}(Z^{(M)}, S^{(M)})$ is the same measurable mapping of
$((Z^{(M)}, S^{(M)}), V)$ as $(Z,S)$ is of $((Z^\circ,S^\circ), (1 - U))$, and since this mapping
has a measurable inverse, we obtain from (3.2) of Lemma 3.1 in Chapter 6
that the left-hand sides of (6.21) and (6.12) are identical. Thus (6.12)
follows from (6.13).
Finally, due to (8.12) in Theorem 8.2 of Chapter 3, (6.12) and (6.14) are
equivalent and (6.13) and (6.15) are equivalent. □
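The perfection probability and the expected number of trials in Theorem 6.5 are easy to evaluate once $G_\infty$ is known. The sketch below assumes exponential cycle lengths with mean 1, an illustrative choice that is not from the text, so that $G_\infty(x) = 1 - e^{-x}$ and $E^\circ[a \wedge X_0] = 1 - e^{-a}$. The function names are hypothetical.

```python
import math


def g_inf(x):
    """G_inf for Exp(1) cycle lengths: its density is P°(X_0 > x)/E°[X_0] = exp(-x)."""
    return 1.0 - math.exp(-x)


def perfection_probability(a):
    """G_inf(a / G_inf(a)), the common value in (6.14) and (6.15)."""
    return g_inf(a / g_inf(a))


def expected_trials(a):
    """(6.16): a / E°[a ∧ X_0], with E°[a ∧ X_0] = 1 - exp(-a) for Exp(1)."""
    return a / (1.0 - math.exp(-a))
```

Tabulating `perfection_probability(a)` against `expected_trials(a)` for a few values of `a` shows the trade-off: a larger truncation level brings the copy closer to perfect at the cost of more acceptance-rejection trials.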
7 The Invariant $\sigma$-Algebras $\mathcal{I}$ and $\mathcal{J}$
In the previous three sections we have been concerned with the point-at-
zero Palm duality. We now start preparing for the other Palm duality, the
randomized-origin duality, which will be established in the next section.
This latter Palm duality is obtained in the same way as the first, except
that the length-debiasing and length-biasing are done conditionally on the
invariant $\sigma$-algebra $\mathcal{J}$ of the process and points. In this section we show
that stationarity and cycle-stationarity properties are preserved under
conditioning on $\mathcal{J}$.
7.1 Definitions — Observations
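The pair $(Z,S)$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(H \times L, \mathcal{H} \otimes \mathcal{L})$.
Define the invariant $\sigma$-algebra on $(H \times L, \mathcal{H} \otimes \mathcal{L})$ by
$$\mathcal{I} := \{B \in \mathcal{H} \otimes \mathcal{L} : \theta_t^{-1} B = B \text{ for } t \in \mathbb{R}\} \tag{7.1}$$
and the invariant $\sigma$-algebra of $(Z,S)$ by
$$\mathcal{J} := (Z,S)^{-1}\mathcal{I} \quad [\text{that is, } \mathcal{J} = \{\{(Z,S) \in B\} : B \in \mathcal{I}\}]. \tag{7.2}$$
Thus $\mathcal{I}$ is a sub-$\sigma$-algebra of $\mathcal{H} \otimes \mathcal{L}$, while $\mathcal{J}$ is a sub-$\sigma$-algebra of $\mathcal{F}$.
According to the following lemma, $\mathcal{J}$ is also the invariant $\sigma$-algebra
of $\theta_T(Z,S)$ for any finite time $T$ supported by $(\Omega, \mathcal{F})$. Since $(Z^\circ,S^\circ) =
\theta_{S_0}(Z,S)$, this means in particular that $\mathcal{J}$ is the invariant $\sigma$-algebra of
$(Z^\circ,S^\circ)$:
$$\mathcal{J} = (Z^\circ,S^\circ)^{-1}\mathcal{I} \quad [\text{that is, } \mathcal{J} = \{\{(Z^\circ,S^\circ) \in B\} : B \in \mathcal{I}\}]. \tag{7.3}$$
Lemma 7.1. For any finite time $T$ supported by $(\Omega, \mathcal{F})$ it holds that
$$\{\theta_T(Z,S) \in B\} = \{(Z,S) \in B\}, \quad B \in \mathcal{I}. \tag{7.4}$$
Proof. For $B \in \mathcal{I}$ we have
$$\begin{aligned}
\{\theta_T(Z,S) \in B\} &= \bigcup_{t \in \mathbb{R}} \{\theta_t(Z,S) \in B,\ T = t\} \\
&= \bigcup_{t \in \mathbb{R}} \{(Z,S) \in B,\ T = t\} = \{(Z,S) \in B\}
\end{aligned}$$
as desired. □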
7.2 Conditioning on J
We now show that stationarity and cycle-stationarity properties are
preserved under conditioning on J.
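Theorem 7.1. Let $(Z,S)$ and $((Z^\circ,S^\circ),U)$ be linked as in Section 2.4 and
let $P$ and $P^\circ$ be two probability measures on $(\Omega, \mathcal{F})$. Then the following
claims hold.
(a) The pair $(Z,S)$ is stationary under $P$ if and only if it is so
conditionally on $\mathcal{J}$, that is, if and only if
$$E[f(\theta_t(Z,S)) \mid \mathcal{J}] = E[f(Z,S) \mid \mathcal{J}] \tag{7.5}$$
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $t \in \mathbb{R}$.
(b) The pair $(Z^\circ,S^\circ)$ is cycle-stationary under $P^\circ$ if and only if it is so
conditionally on $\mathcal{J}$, that is, if and only if
$$E^\circ[f(\theta_{S_n}(Z,S)) \mid \mathcal{J}] = E^\circ[f(Z^\circ,S^\circ) \mid \mathcal{J}] \tag{7.6}$$
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $n \in \mathbb{Z}$.
(c) The formula (3.3) holds if and only if it holds conditionally on $\mathcal{J}$,
that is, if and only if
$$E[f(\theta_{S_n}(Z,S))/X_0 \mid \mathcal{J}] = E[f(Z^\circ,S^\circ)/X_0 \mid \mathcal{J}] \tag{7.7}$$
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $n \in \mathbb{Z}$.
(d) The formula (3.2) holds if and only if it holds conditionally on $\mathcal{J}$,
that is, if and only if
$$E\Big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S)) \,\Big|\, \mathcal{J}\Big] = t\,E[f(Z^\circ,S^\circ)/X_0 \mid \mathcal{J}] \tag{7.8}$$
for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $t \in \mathbb{R}$.
(e) Suppose $(Z,S)$ is stationary under $P$. Then $U$ is independent of $\mathcal{J}$
and
$$E[N_1 \mid \mathcal{J}] = E[1/X_0 \mid \mathcal{J}] \quad \text{(conditional intensity)}. \tag{7.9}$$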
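Proof. (a) Clearly, the 'if' part holds [take expectations in (7.5)]. In order
to prove the converse, suppose $(Z,S)$ is stationary under $P$. Then for $B \in
\mathcal{H} \otimes \mathcal{L}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $t \in \mathbb{R}$, we have
$$E[f(\theta_t(Z,S)) 1_{\{\theta_t(Z,S) \in B\}}] = E[f(Z,S) 1_{\{(Z,S) \in B\}}].$$
By the definition of $\mathcal{I}$, this yields that for $B \in \mathcal{I}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $t \in \mathbb{R}$,
$$E[f(\theta_t(Z,S)) 1_{\{(Z,S) \in B\}}] = E[f(Z,S) 1_{\{(Z,S) \in B\}}],$$
which is a reformulation of (7.5).
(b) Clearly, the 'if' part holds [take expectations in (7.6)]. In order to
prove the converse, suppose $(Z^\circ,S^\circ)$ is cycle-stationary under $P^\circ$. Then for
$B \in \mathcal{H} \otimes \mathcal{L}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $n \in \mathbb{Z}$, we have
$$E^\circ[f(\theta_{S_n}(Z,S)) 1_{\{\theta_{S_n}(Z,S) \in B\}}] = E^\circ[f(Z^\circ,S^\circ) 1_{\{(Z^\circ,S^\circ) \in B\}}].$$
Apply Lemma 7.1 to obtain that for $B \in \mathcal{I}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $n \in \mathbb{Z}$,
$$E^\circ[f(\theta_{S_n}(Z,S)) 1_{\{(Z,S) \in B\}}] = E^\circ[f(Z^\circ,S^\circ) 1_{\{(Z,S) \in B\}}],$$
which is a reformulation of (7.6).
(c) This follows by replacing $E^\circ[\cdot]$ by $E[\cdot/X_0]$ in the proof of (b).
(d) Clearly, the 'if' part holds [take expectations in (7.8)]. In order to
prove the converse, suppose (3.2) holds. Then for $B \in \mathcal{H} \otimes \mathcal{L}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$,
and $t \in \mathbb{R}$, we have [apply (3.2) with $f$ replaced by $f 1_B$]
$$E\Big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S)) 1_{\{\theta_{S_k}(Z,S) \in B\}}\Big] = t\,E[f(Z^\circ,S^\circ) 1_{\{(Z^\circ,S^\circ) \in B\}}/X_0].$$
Apply Lemma 7.1 to obtain that for $B \in \mathcal{I}$, $f \in \mathcal{H} \otimes \mathcal{L}^+$, and $t \in \mathbb{R}$,
$$E\Big[\sum_{k=1}^{N_t} f(\theta_{S_k}(Z,S)) 1_{\{(Z,S) \in B\}}\Big] = t\,E[f(Z^\circ,S^\circ) 1_{\{(Z,S) \in B\}}/X_0],$$
which is a reformulation of (7.8).
(e) Suppose $(Z,S)$ is stationary. Then, by Theorem 3.1, $U$ is independent
of $(Z^\circ,S^\circ)$. Due to (7.3), $\mathcal{J} \subseteq \sigma\{(Z^\circ,S^\circ)\}$. Thus $U$ is independent of $\mathcal{J}$.
Also, by Theorem 3.1, (3.2) holds, and thus (7.8) holds. Take $t = 1$ and
$f = 1$ in (7.8) to obtain (7.9). □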
7.3 The Point-Shift Invariant $\sigma$-Algebra Coincides with $\mathcal{J}$
We now establish a curious result, namely, that the $\sigma$-algebra invariant
under shifts to the points in fact coincides with $\mathcal{J}$. Since $\mathcal{J}$ is invariant
under all shifts, one would expect it to be strictly smaller.
For $k \in \mathbb{Z}$, define the point-shift $\tau_k$ from $H \times L$ to $H \times L$ by
$$\tau_k(z,s) = \theta_{s_k}(z,s). \tag{7.10}$$
Theorem 7.2. It holds that $B \in \mathcal{I}$ if and only if
$$B = \tau_n^{-1}B, \quad n \in \mathbb{Z}. \tag{7.11}$$
Thus
$$\mathcal{I} = \{B \in \mathcal{H} \otimes \mathcal{L} : \tau_n^{-1}B = B \text{ for } n \in \mathbb{Z}\}.$$
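Proof. In order to show that $B \in \mathcal{I}$ implies (7.11), apply Lemma 7.1 with
the general $(\Omega, \mathcal{F})$ replaced by the canonical $(H \times L, \mathcal{H} \otimes \mathcal{L})$ and $T$ replaced
by $s_n$ and note that we can write (7.11) as
$$B = \{\theta_{s_n}(z,s) \in B\}, \quad n \in \mathbb{Z}.$$
In order to establish the converse, suppose (7.11) holds. Thus
$$\begin{aligned}
\theta_t^{-1}B &= \theta_t^{-1}\tau_0^{-1}B \quad [\text{due to (7.11)}] \\
&= \{\tau_0\theta_t(z,s) \in B\} \\
&= \bigcup_{n \in \mathbb{Z}} \{\tau_n(z,s) \in B,\ s_{n-1} < t \le s_n\} \\
&= \bigcup_{n \in \mathbb{Z}} \{(z,s) \in B,\ s_{n-1} < t \le s_n\} \quad [\text{due to (7.11)}] \\
&= \{(z,s) \in B\} = B.
\end{aligned}$$
Thus (7.11) implies that $\theta_t^{-1}B = B$ for all $t \in \mathbb{R}$, that is, $B \in \mathcal{I}$. □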
8 The Randomized-Origin Duality
We are now ready for the latter Palm duality between stationarity and
cycle-stationarity. This duality has the informal randomized-origin
interpretations stated at (1.2) and (1.2°), namely, the cycle-stationary dual
behaves like the stationary process with origin shifted to a point chosen
uniformly at random among all the points, and conversely, the
stationary dual behaves like the cycle-stationary process with origin shifted to a
time chosen uniformly at random in R. We motivate these interpretations
in the next section. They are informal because there is neither a uniform
distribution on a countable set of points nor on R.
In order to see at this point why a duality with these interpretations
is reasonable, consider a stationary recurrent Markov chain in two-sided
continuous time. If we leave out the cycle straddling the origin, then the
cycles between entrances to a particular fixed reference state are i.i.d. Thus
if we allow ourselves to pick a cycle uniformly at random among all the
cycles, then the one straddling the origin should be lost (should disappear
to plus or minus infinity), and thus the cycles seen from the selected cycle
should form an i.i.d. sequence. Conversely, if we have a process formed by
such i.i.d. cycles, then selecting a new origin uniformly at random in R
should result in a stationary process. In fact, selecting a time uniformly
at random in R should result in ending up in a cycle that is stochastically
longer than a typical cycle because our uniform time is more likely to end
up in a long interval than a short one. The longer the cycle, the likelier it
is to be selected. Thus the length-biasing below. (Why the conditioning on
the invariant $\sigma$-algebra $\mathcal{J}$ is needed might be clarified by the example in
Section 10.2 below.)
Like the point-at-zero duality, the randomized-origin duality is obtained
in two separate steps, one measure-free (shifting to and from a point),
the other involving only the measure (length-biasing and length-debiasing
the cycle straddling the origin, this time conditionally on the invariant
$\sigma$-algebra $\mathcal{J}$). The order in which the steps are taken does not matter.
The measure-free step was taken in Section 2.4, and the biasing (change of
measure, Radon-Nikodym) step we take now.
8.1 Length-Biasing $\Leftrightarrow$ Length-Debiasing
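Recall that $X_0$ is the length of the cycle straddling the origin. Suppose we
are given a probability measure $P$ on $(\Omega, \mathcal{F})$ satisfying
$$E[1/X_0 \mid \mathcal{J}] < \infty. \tag{8.1}$$
Then we can define a new probability measure $P^\circ$ on $(\Omega, \mathcal{F})$ by letting it
have the density $dP^\circ/dP := 1/(X_0 E[1/X_0 \mid \mathcal{J}])$ with respect to $P$, that is,
$$dP^\circ = \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]}\,dP \quad (\text{length-debiasing } P \text{ given } \mathcal{J}). \tag{8.2}$$
Lemma 8.1. If (8.1) and (8.2) hold, then
$$P^\circ = P \text{ on } \mathcal{J}, \tag{8.3a}$$
$$E^\circ[Y \mid \mathcal{J}] = \frac{E[Y/X_0 \mid \mathcal{J}]}{E[1/X_0 \mid \mathcal{J}]}, \quad Y \in \mathcal{F}^+, \tag{8.3b}$$
$$E^\circ[X_0 \mid \mathcal{J}] = \frac{1}{E[1/X_0 \mid \mathcal{J}]}. \tag{8.3c}$$
Proof. We obtain (8.3a) as follows: for $A \in \mathcal{J}$,
$$\begin{aligned}
P^\circ(A) = E^\circ[1_A] &= E\Big[1_A \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]}\Big] \quad \text{(due to (8.2))} \\
&= E\Big[E\Big[1_A \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]} \,\Big|\, \mathcal{J}\Big]\Big] \quad \text{(conditioning on } \mathcal{J}) \\
&= E\Big[1_A \frac{1}{E[1/X_0 \mid \mathcal{J}]} E[1/X_0 \mid \mathcal{J}]\Big] \quad \text{(moving out functions in } \mathcal{J}) \\
&= E[1_A] = P(A).
\end{aligned}$$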
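We obtain (8.3b) as follows: for $Y \in \mathcal{F}^+$ and $A \in \mathcal{J}$,
$$\begin{aligned}
E^\circ[1_A Y] &= E\Big[1_A Y \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]}\Big] \quad \text{(due to (8.2))} \\
&= E\Big[E\Big[1_A Y \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]} \,\Big|\, \mathcal{J}\Big]\Big] \quad \text{(conditioning on } \mathcal{J}) \\
&= E\Big[1_A \frac{1}{E[1/X_0 \mid \mathcal{J}]} E[Y/X_0 \mid \mathcal{J}]\Big] \quad \text{(moving out functions in } \mathcal{J}) \\
&= E^\circ\Big[1_A \frac{E[Y/X_0 \mid \mathcal{J}]}{E[1/X_0 \mid \mathcal{J}]}\Big] \quad \text{(due to (8.3a))}.
\end{aligned}$$
Take $Y = X_0$ in (8.3b) to obtain (8.3c). □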
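Lemma 8.1 can be verified by exact arithmetic on a finite toy model. In the sketch below, which is an assumed illustration and not from the text, the invariant $\sigma$-algebra is generated by a label taking the values "A" and "B", and the measure $P$ is given as a table of probabilities for pairs (label, value of $X_0$). All names and numbers are hypothetical.

```python
from fractions import Fraction as Fr

# P on pairs (j, x): j labels an invariant component, x is the value of X_0.
P = {("A", 1): Fr(1, 4), ("A", 3): Fr(1, 4), ("B", 2): Fr(1, 6), ("B", 6): Fr(1, 3)}


def cond_exp(measure, func, j):
    """Conditional expectation of func(X_0) given the component labelled j."""
    mass = sum(p for (jj, _), p in measure.items() if jj == j)
    return sum(p * func(x) for (jj, x), p in measure.items() if jj == j) / mass


def debias(measure):
    """Length-debiasing given J as in (8.2): density (1/X_0) / E[1/X_0 | J]."""
    return {
        (j, x): p / (x * cond_exp(measure, lambda y: Fr(1, y), j))
        for (j, x), p in measure.items()
    }


P0 = debias(P)
```

Here `P0` gives the same mass as `P` to each component, as in (8.3a), and `cond_exp(P0, lambda x: x, j)` equals the reciprocal of `cond_exp(P, lambda y: Fr(1, y), j)`, as in (8.3c).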
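Since $0 < X_0 < \infty$ implies $E[1/X_0 \mid \mathcal{J}] > 0$, we obtain from (8.3c) that
$$E^\circ[X_0 \mid \mathcal{J}] < \infty. \tag{8.1°}$$
Thus (8.2) can be rewritten as
$$dP = \frac{X_0}{E^\circ[X_0 \mid \mathcal{J}]}\,dP^\circ \quad (\text{length-biasing } P^\circ \text{ given } \mathcal{J}). \tag{8.2°}$$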
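Conversely, suppose we are given a probability measure $P^\circ$ on $(\Omega, \mathcal{F})$
satisfying (8.1°). Then we can define a new probability measure $P$ on $(\Omega, \mathcal{F})$
by (8.2°). From (8.2°) we obtain (mimicking the proof of Lemma 8.1)
$$P = P^\circ \text{ on } \mathcal{J}, \tag{8.3a°}$$
$$E[Y \mid \mathcal{J}] = \frac{E^\circ[Y X_0 \mid \mathcal{J}]}{E^\circ[X_0 \mid \mathcal{J}]}, \quad Y \in \mathcal{F}^+, \tag{8.3b°}$$
$$E[1/X_0 \mid \mathcal{J}] = \frac{1}{E^\circ[X_0 \mid \mathcal{J}]}. \tag{8.3c°}$$
Since $X_0 > 0$ implies $E^\circ[X_0 \mid \mathcal{J}] > 0$, we obtain from (8.3c°) that (8.1)
holds. Thus (8.2°) can be rewritten as (8.2).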
We have established that the length-debiasing at (8.2) is equivalent to the
length-biasing at (8.2°). This yields a duality (one-to-one correspondence)
between probability measures $P$ on $(\Omega, \mathcal{F})$ satisfying (8.1) and probability
measures $P^\circ$ on $(\Omega, \mathcal{F})$ satisfying (8.1°).
8.2 Stationarity $\Leftrightarrow$ Cycle-Stationarity
Combining this measure duality between P and P° and the measure-free
duality [Section 2.4] between (Z,S) and ((Z°,S°),U) yields the following
duality between stationarity and cycle-stationarity.
Theorem 8.1. Let $(\Omega, \mathcal{F})$ be a measurable space supporting $(Z,S)$ and
$((Z^\circ,S^\circ),U)$, where $Z$ and $Z^\circ$ are two-sided shift-measurable processes, $S$
and $S^\circ$ are two-sided sequences of times increasing strictly from $-\infty$ to $\infty$
with $S_{-1} < 0 \le S_0$ and $S_0^\circ = 0$, and $U$ is a $(0,1]$ valued variable. Let $(Z,S)$
and $((Z^\circ,S^\circ),U)$ be linked by
$$(Z^\circ,S^\circ) = \theta_{S_0}(Z,S) \quad\text{and}\quad U = -S_{-1}/X_0$$
or, equivalently, by
$$(Z,S) = \theta_{-(1-U)X_0^\circ}(Z^\circ,S^\circ) \quad [\text{thus } X_0 = X_0^\circ].$$
Let $P$ and $P^\circ$ be probability measures on $(\Omega, \mathcal{F})$ satisfying (8.1) and (8.2),
that is,
$$E[1/X_0 \mid \mathcal{J}] < \infty \quad\text{and}\quad dP^\circ = \frac{1/X_0}{E[1/X_0 \mid \mathcal{J}]}\,dP$$
or, equivalently, satisfying (8.1°) and (8.2°), that is,
$$E^\circ[X_0 \mid \mathcal{J}] < \infty \quad\text{and}\quad dP = \frac{X_0}{E^\circ[X_0 \mid \mathcal{J}]}\,dP^\circ.$$
Then
$(Z,S)$ is stationary under $P$ (8.4)
if and only if
$(Z^\circ,S^\circ)$ is cycle-stationary under $P^\circ$
and $U$ is uniform on $(0,1]$ and independent of $(Z^\circ,S^\circ)$. (8.4°)
Comment. Note that $U$ is uniform on $(0,1]$ and independent of $(Z^\circ,S^\circ)$
under $P$ if and only if it is so under $P^\circ$. This follows from Lemma 4.1: first
take $g(Z^\circ,S^\circ) = 1/(X_0 E[1/X_0 \mid \mathcal{J}])$ and then $g(Z^\circ,S^\circ) = X_0/E^\circ[X_0 \mid \mathcal{J}]$ to
obtain that the conditional distribution of $U$ given $(Z^\circ,S^\circ)$ is uniform on
$(0,1]$ under one of the measures if and only if it is so under the other.
Proof. Due to the equivalence of (a) and (d) in Theorem 3.1 and due
to Theorem 7.1(c), (8.4) holds if and only if $U$ is uniform on $(0,1]$ and
independent of $(Z^\circ,S^\circ)$ under $P$ and, for $f \in \mathcal{H} \otimes \mathcal{L}^+$ and $n \in \mathbb{Z}$,
$$E[f(\theta_{S_n}(Z,S))/X_0 \mid \mathcal{J}] = E[f(Z^\circ,S^\circ)/X_0 \mid \mathcal{J}]. \tag{8.5}$$
Thus [according to the above comment] the equivalence of (8.4) and (8.4°)
follows if we can establish that (8.5) is equivalent to $(Z^\circ,S^\circ)$ being
cycle-stationary under $P^\circ$. For that purpose, divide by $E[1/X_0 \mid \mathcal{J}]$ on both sides
of (8.5) and then apply (8.3b), on the left with $Y = f(\theta_{S_n}(Z,S))$ and on
the right with $Y = f(Z^\circ,S^\circ)$, to obtain that (8.5) is equivalent to
$$E^\circ[f(\theta_{S_n}(Z,S)) \mid \mathcal{J}] = E^\circ[f(Z^\circ,S^\circ) \mid \mathcal{J}], \quad f \in \mathcal{H} \otimes \mathcal{L}^+,\ n \in \mathbb{Z}.$$
Due to Theorem 7.1(b), this holds if and only if $(Z^\circ,S^\circ)$ is cycle-stationary
under $P^\circ$. □
9 Interpretation - Cesàro Limits and Shift-Coupling
The randomized-origin interpretations of the duality established in
Theorem 8.1 [stated in words at (1.2) and (1.2°)] can now be formulated as
follows:
$$P(\theta_{\text{uniform point of } S}(Z,S) \in \cdot) = P^\circ((Z^\circ,S^\circ) \in \cdot), \tag{9.1}$$
$$P^\circ(\theta_{\text{uniform time in } \mathbb{R}}(Z^\circ,S^\circ) \in \cdot) = P((Z,S) \in \cdot). \tag{9.1°}$$
This of course does not have an immediate meaning, because such uniform
random variables do not exist. In this section we present two results
motivating (9.1) and (9.1°). Both results are straightforward consequences of
the coupling equivalences in Section 7.4 of Chapter 7.
9.1 Cesàro Total Variation Motivation of (9.1) and (9.1°)
The following theorem gives a Cesàro total variation meaning to the
randomized-origin interpretations (9.1) and (9.1°).
Theorem 9.1. Suppose the equivalent claims (8.4) and (8.4°) hold. Let $\lambda$
be the Lebesgue measure on $(\mathbb{R}, \mathcal{B})$, $\#$ denote number of elements in a set,
and $\xrightarrow{tv}$ denote convergence in total variation.
Then as $n \to \infty$,
$$\frac{1}{\# B_n} \sum_{k \in B_n} P(\theta_{S_k}(Z,S) \in \cdot) \xrightarrow{tv} P^\circ((Z^\circ,S^\circ) \in \cdot) \tag{9.2}$$
for all integer subsets $B_n \subseteq \mathbb{Z}$, $0 < n < \infty$, satisfying $0 < \# B_n < \infty$ and
such that for all $k \in \mathbb{Z}$,
$$\#((k + B_n) \cap B_n)/\# B_n \to 1, \quad n \to \infty \quad [\text{Følner averaging sets}].$$
Examples of such sets are $B_n = \{-n, \dots, n\}$ and $B_n = \{0, \dots, n\}$.
Conversely, as $h \to \infty$,
$$\frac{1}{\lambda(B_h)} \int_{B_h} P^\circ(\theta_s(Z^\circ,S^\circ) \in \cdot)\,ds \xrightarrow{tv} P((Z,S) \in \cdot) \tag{9.2°}$$
for all Borel sets $B_h \in \mathcal{B}$, $0 < h < \infty$, satisfying $0 < \lambda(B_h) < \infty$ and such
that for all $t \in \mathbb{R}$,
$$\lambda((t + B_h) \cap B_h)/\lambda(B_h) \to 1, \quad h \to \infty \quad [\text{Følner averaging sets}].$$
Examples of such sets are $B_h = [-h, h]$ and $B_h = [0, h]$ and, more generally,
$B_h = hB$ where $B$ is any Borel set such that $0 < \lambda(B) < \infty$.
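Comment. The Cesàro results (9.2) and (9.2°) can be rewritten in the
following randomized-origin form. With $K_n$ uniform on $B_n$ and independent
of $(Z,S)$ under $P$ we have
$$P(\theta_{S_{K_n}}(Z,S) \in \cdot) \xrightarrow{tv} P^\circ((Z^\circ,S^\circ) \in \cdot), \quad n \to \infty. \tag{9.3}$$
With $V_h$ uniform on $B_h$ and independent of $(Z^\circ,S^\circ)$ under $P^\circ$ we have
$$P^\circ(\theta_{V_h}(Z^\circ,S^\circ) \in \cdot) \xrightarrow{tv} P((Z,S) \in \cdot), \quad h \to \infty. \tag{9.3°}$$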
Proof. Note that (8.3a) in Lemma 8.1 can be written as
$$P((Z,S) \in \cdot) = P^\circ((Z^\circ,S^\circ) \in \cdot) \text{ on } \mathcal{I}. \tag{9.4}$$
In order to establish (9.2) let $\tau_k$, $k \in \mathbb{Z}$, be the point shifts defined at (7.10)
and note that according to Theorem 7.2,
$$\mathcal{I} = \{B \in \mathcal{H} \otimes \mathcal{L} : \tau_k^{-1}B = B \text{ for } k \in \mathbb{Z}\}.$$
Apply the results in Section 7.4 of Chapter 7 with
$$Y \text{ having the distribution } P((Z,S) \in \cdot),$$
$$Y' \text{ having the distribution } P^\circ((Z^\circ,S^\circ) \in \cdot), \tag{9.5}$$
$$G = \{\tau_k : k \in \mathbb{Z}\}.$$
Then (9.4) is the condition (c) in Section 7.4 of Chapter 7, and we obtain
(9.2) from the final display in that subsection.
In order to establish (9.2°) apply the results in Section 7.4 of Chapter 7
with
$$Y \text{ having the distribution } P^\circ((Z^\circ,S^\circ) \in \cdot),$$
$$Y' \text{ having the distribution } P((Z,S) \in \cdot), \tag{9.5°}$$
$$G = \{\theta_t : t \in \mathbb{R}\}.$$
Then (9.4) is the condition (c) in Section 7.4 of Chapter 7, and we obtain
(9.2°) from the final display in that subsection. For the fact that $B_h = hB$
are Følner averaging sets, see Theorem 2.1 in Chapter 7. □
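The rate at which the bias disappears in (9.3) can be made explicit in a special case. The sketch below assumes a stationary renewal process, where $X_0$ is length-biased and the other cycle lengths have the typical law $F$ (see Section 9 of Chapter 2), and it looks only at the length of the cycle ending at the selected point $S_{K_n}$ with $K_n$ uniform on $\{0, \dots, n\}$. The two-point law $F$ and all names are hypothetical, and the full statement (9.3) concerns the whole pair, not just this one marginal.

```python
from fractions import Fraction as Fr

F = {1: Fr(1, 2), 3: Fr(1, 2)}                    # typical cycle-length law (toy choice)
m = sum(x * p for x, p in F.items())              # mean cycle length
biased = {x: x * p / m for x, p in F.items()}     # length-biased law of X_0 under P


def tv_norm_to_typical(n):
    """Total variation norm between the law of X_{K_n} and F.

    K_n is uniform on {0, ..., n}; X_0 is length-biased and X_1, ..., X_n have law F,
    so X_{K_n} is a mixture with weight 1/(n+1) on the biased law.
    """
    mixed = {x: (biased[x] + n * F[x]) / (n + 1) for x in F}
    return sum(abs(mixed[x] - F[x]) for x in F)
```

For this toy law the norm equals $1/(2(n+1))$, so the single length-biased cycle is averaged away at rate $1/n$.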
9.2 Shift-Coupling Motivation of (9.1) and (9.1°)
The following shift-coupling result gives a surprisingly strong motivation of
(9.1) and (9.1°): the two processes can be represented as a single process
with different origins.
Theorem 9.2. Suppose the equivalent claims (8.4) and (8.4°) hold. Then
the probability space $(\Omega, \mathcal{F}, P)$ can be extended to support a random integer
$K$ such that
$$P(\theta_{S_K}(Z,S) \in \cdot) = P^\circ((Z^\circ,S^\circ) \in \cdot). \tag{9.6}$$
Conversely, the probability space $(\Omega, \mathcal{F}, P^\circ)$ can be extended to support a
random time $T$ such that
$$P^\circ(\theta_T(Z^\circ,S^\circ) \in \cdot) = P((Z,S) \in \cdot). \tag{9.6°}$$
Proof. In order to establish (9.6) apply the results in Section 7.4 of
Chapter 7 with $Y$, $Y'$ and $G$ as at (9.5). Then (9.4) implies (c) in Section 7.4 of
Chapter 7, and we obtain (9.6) from (a') in that subsection.
In order to establish (9.6°) apply the results in Section 7.4 of Chapter 7
with $Y$, $Y'$ and $G$ as at (9.5°). Then (9.4) implies (c) in Section 7.4 of
Chapter 7, and we obtain (9.6°) from (a') in that subsection. □
10 Comments on the Two Palm Dualities
We end this chapter with a few comments on the relation between the
point-at-zero duality of Theorem 4.1 and the randomized-origin duality of
Theorem 8.1.
10.1 When Do the Two Dualities Coincide?
Under what conditions is it true that standing at the origin of a stationary
point-stream, and happening to find a point there, is equivalent to standing
at a point selected uniformly at random from the point-stream? That is,
when do the two Palm dualities coincide? We shall now specify the exact
condition.
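In order to distinguish between the two dualities let $P_1$ and $P_1^\circ$ be two
probability measures on $(\Omega, \mathcal{F})$ linked as in Theorem 4.1:
$$E_1[1/X_0] < \infty \quad\text{and}\quad dP_1^\circ = \frac{1/X_0}{E_1[1/X_0]}\,dP_1, \tag{10.1}$$
or equivalently,
$$E_1^\circ[X_0] < \infty \quad\text{and}\quad dP_1 = \frac{X_0}{E_1^\circ[X_0]}\,dP_1^\circ; \tag{10.1°}$$
and let $P_2$ and $P_2^\circ$ be two probability measures on $(\Omega, \mathcal{F})$ linked as in
Theorem 8.1:
$$E_2[1/X_0 \mid \mathcal{J}] < \infty \quad\text{and}\quad dP_2^\circ = \frac{1/X_0}{E_2[1/X_0 \mid \mathcal{J}]}\,dP_2, \tag{10.2}$$
or equivalently,
$$E_2^\circ[X_0 \mid \mathcal{J}] < \infty \quad\text{and}\quad dP_2 = \frac{X_0}{E_2^\circ[X_0 \mid \mathcal{J}]}\,dP_2^\circ. \tag{10.2°}$$
The two Palm dualities coincide when
$$P_1 = P_2 \iff P_1^\circ = P_2^\circ.$$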
Consequently, if $(Z,S)$ is stationary under a probability measure $P$ and
$E[1/X_0] < \infty$, then the two cycle-stationary duals coincide if and only if
$$E[1/X_0 \mid \mathcal{J}] = E[1/X_0] \quad \text{a.s. } P. \tag{10.3}$$
Conversely, if $(Z^\circ,S^\circ)$ is cycle-stationary under a probability measure $P^\circ$
and $E^\circ[X_0] < \infty$, then the two stationary duals coincide if and only if
$$E^\circ[X_0 \mid \mathcal{J}] = E^\circ[X_0] \quad \text{a.s. } P^\circ. \tag{10.3°}$$
Note that the dualities coincide in the ergodic case, that is, when
$$P(A) = 0 \text{ or } 1 \text{ for } A \in \mathcal{J} \quad [\mathcal{J} \text{ is trivial under } P]$$
and, equivalently,
$$P^\circ(A) = 0 \text{ or } 1 \text{ for } A \in \mathcal{J} \quad [\mathcal{J} \text{ is trivial under } P^\circ].$$
Note also that the exact coincidence conditions (10.3) and (10.3°) are
weaker than ergodicity, that is, may hold even in a nonergodic case.
10.2 Can the Two Dualities Differ?
Here is a simple example showing that the two dualities can differ. Let
$Y$ be a strictly positive random variable supported by a probability space
$(\Omega, \mathcal{F}, P)$. Define $(Z^\circ,S^\circ)$ by
$$Z_t^\circ = 0,\ t \in \mathbb{R}, \quad\text{and}\quad S_n^\circ = nY,\ n \in \mathbb{Z},$$
that is, $Z^\circ$ is a nonrandom constant, $S^\circ$ is a lattice with a random span $Y$,
and
$$\cdots = X_{-1} = X_0 = X_1 = \cdots = Y.$$
Let $U$ be uniform on $(0,1]$ and independent of $Y$ under $P$. Put
$$(Z,S) = \theta_{-(1-U)X_0}(Z^\circ,S^\circ).$$
Then $(Z,S)$ is stationary under $P$. (Note also that the pair $(Z^\circ,S^\circ)$ is
cycle-stationary under $P$, in fact, under any measure.)
The cycle-stationary dual according to the point-at-zero duality exists if
$E[1/Y] < \infty$. The length-debiasing of $P$ then yields the measure
$$dP^\circ = \frac{1/Y}{E[1/Y]}\,dP.$$
This results in a length-debiasing of the span $Y$ of the random lattice $S^\circ$.
The cycle-stationary dual according to the randomized-origin duality is
even simpler. Note that the random span $Y$ of the lattice $S^\circ$ is the same
measurable function of $\theta_t(Z,S)$ for all $t \in \mathbb{R}$, namely, $Y$ is the length of the
interval straddling $t$. This means that $Y \in \mathcal{J}^+$, and thus
$$E[1/Y \mid \mathcal{J}] = 1/Y.$$
Thus the length-debiasing conditionally on $\mathcal{J}$ does not change $P$. Thus the
cycle-stationary dual according to the randomized-origin duality is simply
$(Z^\circ,S^\circ)$ under $P$ itself. Since $P$ and $P^\circ$ differ, the two dualities differ.
The interpretation of the point-at-zero duality says the following in this
case. If you observe a stationary random lattice from the origin and happen
to find a point there, then the span of the random lattice shrinks. This is
not too strange, because a stationary lattice is more likely to have a point
close to the origin when the random span happens to be short.
The interpretation of the randomized-origin duality, on the other hand,
says the following. If you observe a stationary random lattice from a
uniformly chosen point, then the span of the random lattice remains the same.
This is not strange at all, because, obviously, choosing some point to view
the lattice from will not alter its span.
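The two duals of the random lattice can be compared by exact arithmetic. The two-point law of the span $Y$ used below is an assumed toy choice, and the function names are hypothetical.

```python
from fractions import Fraction as Fr

span_law = {1: Fr(1, 2), 3: Fr(1, 2)}     # law of the span Y under P (toy choice)


def point_at_zero_dual(law):
    """Law of Y under the point-at-zero dual: density (1/Y) / E[1/Y] with respect to P."""
    norm = sum(p / y for y, p in law.items())      # E[1/Y]
    return {y: (p / y) / norm for y, p in law.items()}


def randomized_origin_dual(law):
    """Law of Y under the randomized-origin dual.

    Y is measurable with respect to the invariant sigma-algebra, so the
    conditional density (1/Y) / E[1/Y | J] equals 1 and the law is unchanged.
    """
    return dict(law)
```

With this input the point-at-zero dual moves the mass on the short span from $1/2$ to $3/4$, while the randomized-origin dual leaves the law of the span as it is, in line with the two interpretations above.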
Remark 10.1. If we take $Y$ such that $E[1/Y] = \infty$, then $(Z,S)$ under
$P$ is an example of a stationary pair with infinite intensity and thus no
cycle-stationary point-at-zero dual.
10.3 Random Time Change Hides the Gap Between the Dualities
We can make the two Palm dualities coincide by a simple random time
change. Let $(Z,S)$ be stationary under a probability measure $P$ such that
$E[1/X_0 \mid \mathcal{J}] < \infty$. Let $Z$ be measurable under change of time scale and
change the time scale by
$$R := E[1/X_0 \mid \mathcal{J}]$$
to obtain
$$((Z_{s/R})_{s \in \mathbb{R}}, RS).$$
This new pair $((Z_{s/R})_{s \in \mathbb{R}}, RS)$ is stationary, and the length of the cycle
straddling the origin is $RX_0$. Since $R$ is $\mathcal{J}$ measurable, we have
$$E[1/(RX_0) \mid \mathcal{J}] = E[1/X_0 \mid \mathcal{J}]/R = 1 \quad [\text{thus } E[1/(RX_0)] = 1].$$
The invariant $\sigma$-algebra of $((Z_{s/R})_{s \in \mathbb{R}}, RS)$ is contained in $\mathcal{J}$, and thus
$$E[1/(RX_0) \mid ((Z_{s/R})_{s \in \mathbb{R}}, RS)^{-1}\mathcal{I}] = 1.$$
Since also $E[1/(RX_0)] = 1$, the coincidence condition holds [see (10.3)].
Consequently, the two cycle-stationary Palm duals of $((Z_{s/R})_{s \in \mathbb{R}}, RS)$
coincide, that is, $((Z_{s/R})_{s \in \mathbb{R}}, RS)$ has only one cycle-stationary Palm dual.
Note that $1/(RX_0) = 1/(X_0 E[1/X_0 \mid \mathcal{J}])$, and thus the change of measure
used to obtain this common cycle-stationary dual of $((Z_{s/R})_{s \in \mathbb{R}}, RS)$ is
the same as the change of measure used to obtain the cycle-stationary
randomized-origin dual of (Z,S). Therefore, this procedure preserves the
randomized-origin duality and not the point-at-zero duality.
In fact, we lose the point-at-zero duality by this procedure: the point-at-
zero duality merges with the randomized-origin duality by the time change
and does not reappear when we return to the original time scale after
changing the measure (as the randomized-origin duality does). Thus the
time change is not a way to bridge the gap between the two dualities; it
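only hides it. To bridge the gap we cannot avoid a change of measure: with
$P^\circ_1$ and $P^\circ_2$ the two dual measures of $P$ defined at (10.1) and (10.2) with
$P_1 = P_2 = P$ we have (provided $E[1/X_0] < \infty$)
$$dP^\circ_1 = \frac{E[1/X_0 \mid \mathcal{J}]}{E[1/X_0]}\,dP^\circ_2 \quad \text{and} \quad dP^\circ_2 = \frac{E[1/X_0]}{E[1/X_0 \mid \mathcal{J}]}\,dP^\circ_1.$$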
There is an important distinction between the two Palm dualities. Using
the first when the second is appropriate (for instance when averaging over
the points) can lead to wrong results.
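The relation between the two dual measures can be verified on a finite toy model. The sketch below assumes, for illustration only, that the invariant $\sigma$-algebra $\mathcal{J}$ is generated by a two-valued label `G` and that the joint law of `(G, X0)` under $P$ is the table `joint`; none of these names or numbers come from the text. It takes the point-at-zero density to be $(1/X_0)/E[1/X_0]$ and the randomized-origin density to be $(1/X_0)/E[1/X_0 \mid \mathcal{J}]$, and checks that after the time change by $R$ the conditional debiasing weight is identically 1.

```python
from fractions import Fraction as F

# Assumed toy joint law of (G, X0) under P; G generates the invariant sigma-algebra J.
joint = {("a", 1): F(1, 4), ("a", 2): F(1, 4), ("b", 1): F(1, 8), ("b", 4): F(3, 8)}

def label_mass(joint):
    """P(G = g) for each label g."""
    mass = {}
    for (g, _), p in joint.items():
        mass[g] = mass.get(g, 0) + p
    return mass

def conditional_inverse_mean(joint):
    """R(g) = E[1/X0 | G = g]."""
    mass = label_mass(joint)
    acc = {}
    for (g, x), p in joint.items():
        acc[g] = acc.get(g, 0) + p / x
    return {g: acc[g] / mass[g] for g in mass}

R = conditional_inverse_mean(joint)
m = sum(p / x for (_, x), p in joint.items())  # E[1/X0]

# Probabilities of each outcome under the two cycle-stationary duals.
point_at_zero = {(g, x): p / x / m for (g, x), p in joint.items()}
randomized_origin = {(g, x): p / x / R[g] for (g, x), p in joint.items()}

# After the time change the straddling cycle has length R * X0, and
# E[1/(R X0) | G = g] should equal 1 for every label g.
mass = label_mass(joint)
time_changed = {
    g: sum(p / (R[g] * x) for (gg, x), p in joint.items() if gg == g) / mass[g]
    for g in R
}
```

In this model the ratio of the two dual probabilities of an outcome `(g, x)` is `R[g] / m`, which is the density $E[1/X_0 \mid \mathcal{J}]/E[1/X_0]$ evaluated on that outcome.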
10.4 On Marked Point Processes
The sequence of times $S = (S_k)_{-\infty}^{\infty}$ is sometimes called a simple point
process. If to each point $S_n$ there is associated a random element $Y_n$, then
the joint sequence $(S_k, Y_k)_{-\infty}^{\infty}$ is a marked point process and $Y_n$ is the mark
of the point $S_n$.
In this chapter we have considered $S$ in association with a stochastic
process $Z = (Z_s)_{s \in \mathbb{R}}$. This is equivalent to considering a marked point
process in the following sense. When $(Z, S)$ is given, we could define the
mark of the point $S_n$ to be $Y_n := \theta_{S_n} Z$. Conversely, when a marked point
process $(S_k, Y_k)_{-\infty}^{\infty}$ is given, we could define $Z$ by letting $Z_s$ be the marked
point process with origin shifted to s. A similar comment applies in the
next chapter.
Chapter 9
THE PALM DUALITIES
IN HIGHER DIMENSIONS
1 Introduction
In the previous chapter we considered stochastic processes split into cycles
by a sequence of random times (called points) and established two Palm
dualities between stationary processes and cycle-stationary processes. We
shall now extend this theory to $d > 1$ dimensions: to random fields
'punctuated' by a countable set of isolated points scattered over $\mathbb{R}^d$ in some
random manner (a simple point process).
This extension is basically straightforward using so-called Voronoi cells
instead of intervals, that is, associating to each point the set of sites that are
closer to that point than to any other point. There is, however, one major
complication, namely the apparent lack of a higher-dimensional analogue of
cycle-stationarity. There are no cycles in higher dimensions, so what does
cycle-stationarity mean there?
In one dimension cycle-stationarity means that the cycles of the
process form a stationary sequence. This definition can be rephrased as point-
stationarity: the behaviour relative to a given point is independent of the
point selected as origin; the process looks the same from all the points. Note
that point-stationarity is different from stationarity: stationarity means
that the behaviour of the process relative to any given nonrandom time is
independent of the time selected as origin; the process looks the same from
all nonrandom times.
Point-stationarity, the property that the process looks the same from all
the points, should make sense also in higher dimensions. But what does
it mean, exactly? How should point-stationarity be formally defined when
d > 1? The answer to this question needs some motivation. Since the point-
stationarity problem is what really separates the higher-dimensional case
from the one-dimensional case, we shall highlight it in the structure of the
chapter.
The point-stationarity problem is presented in Section 2 and solved in
Section 3. After defining point-stationarity in Section 3, we further
characterize the concept in Sections 4, 5, and 6. These characterizations are then
used to extend to d > 1 dimensions the theory of the previous chapter: the
point-at-zero duality is presented in Section 7 and the randomized-origin
duality in Section 8. Section 9 concludes with comments on the two Palm
dualities and on possible extensions of the point-stationarity concept, for
instance to the zero set of Brownian motion.
2 The Point-Stationarity Problem
This section explains the point-stationarity problem in full detail,
moving from the obvious one-dimensional case to the not-so-obvious higher-
dimensional case. Necessary notation is introduced along the way. In order
to highlight the problem we consider it first in the context of simple point
processes only, not introducing the associated random field until the next
section.
2.1 The Simple Point Processes N and N°
Intuitively, a simple point process in $d$ dimensions ($d \geq 1$) is a countable
set of isolated points scattered over the $d$-dimensional Euclidean space $\mathbb{R}^d$
in some random manner (like planets scattered over space). In the
one-dimensional case in Chapter 8 this random set of points was written as an
increasing sequence of random times. There is no natural analogue of this
procedure in higher dimensions. Instead, we shall represent the random set
of points (in the standard way) by a collection of random variables
$$N = (N(B) : B \in \mathcal{B}^d),$$
where $\mathcal{B}^d$ are the Borel subsets of $\mathbb{R}^d$ and
$$N(B) = \text{the number of points in the set } B.$$
More precisely, a simple point process in $d$ dimensions is a random
element $N$ in the measurable space $(M, \mathcal{M})$, where $M$ is the set of all simple
counting measures and $\mathcal{M}$ is the product $\sigma$-algebra on $M$, that is,
$M$ = set of integer-valued measures $\mu$ on $(\mathbb{R}^d, \mathcal{B}^d)$ with $\mu(B) < \infty$,
for all bounded $B \in \mathcal{B}^d$, and $\mu(\{t\}) = 0$ or $1$, for all $t \in \mathbb{R}^d$,
and
$\mathcal{M}$ = smallest $\sigma$-algebra such that the projection from $M$ to $[0, \infty]$
taking $\mu$ to $\mu(B)$ is $\mathcal{M}/\mathcal{B}([0, \infty])$ measurable for each $B \in \mathcal{B}^d$.
We shall write $N^\circ$ to indicate that one of the points is placed at $0$ (the
origin of $\mathbb{R}^d$), that is,
$$N^\circ(\{0\}) = 1.$$
We shall regard $N^\circ$ as a random element in $(M^\circ, \mathcal{M}^\circ)$, where $M^\circ$ is the
subset of $M$ containing the simple counting measures having mass one at
the origin,
$$M^\circ = \{\mu \in M : \mu(\{0\}) = 1\},$$
and $\mathcal{M}^\circ$ is the trace of $\mathcal{M}$ on $M^\circ$,
$$\mathcal{M}^\circ = \mathcal{M} \cap M^\circ.$$
Let $(\Omega, \mathcal{F})$ be the measurable space supporting $N$ and $N^\circ$ and let $P$ and
$P^\circ$ be two probability measures on $(\Omega, \mathcal{F})$. In this section, and the next,
we shall not postulate any link between $N$ and $N^\circ$, nor between $P$ and $P^\circ$.
We shall think of $N$ as governed by $P$ and $N^\circ$ as governed by $P^\circ$.
2.2 Sites and Points
Call a nonrandom element $t$ of $\mathbb{R}^d$ a site (and not a point) to distinguish it
from the points of the point process. Call a random element $T$ in $(\mathbb{R}^d, \mathcal{B}^d)$
a random site. Call a random site $\Pi$ a point or a random point of $N$ only
if the point process $N$ has a point at $\Pi$, that is, only if
$$N(\{\Pi\}) = 1 \quad [\text{short for } N(\{\Pi(\omega)\})(\omega) = 1,\ \omega \in \Omega].$$
Similarly, call a random site $\Pi^\circ$ a point or a random point of $N^\circ$ only if
the point process $N^\circ$ has a point at $\Pi^\circ$, that is, only if
$$N^\circ(\{\Pi^\circ\}) = 1.$$
A point process typically does not look the same seen from a nonrandom
observation site as seen from an observation point (the universe does not
look the same seen from space as seen from a planet).
2.3 Site-Shifts
For $\mu \in M$, let $s(\mu)$ denote the set of $\mu$-points (the point pattern):
$$s(\mu) = \{p \in \mathbb{R}^d : \mu(\{p\}) = 1\} = \text{the support of } \mu.$$
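For $t \in \mathbb{R}^d$, define the shift or site-shift $\theta_t$ taking $\mu \in M$ to $\theta_t\mu \in M$ by
$$\theta_t\mu(B) = \mu(t + B), \quad B \in \mathcal{B}^d, \quad [\text{here } t + B = \{t + s : s \in B\}].$$
In order to work with expressions like $N(\{T\})$ and $\theta_T N$ we need the
following joint measurability or shift-measurability result.
Theorem 2.1. For each $B \in \mathcal{B}^d$, the mapping taking $(\mu, t) \in M \times \mathbb{R}^d$
to $\mu(t + B) \in [0, \infty]$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0,\infty])$ measurable. Equivalently, the
mapping taking $(\mu, t) \in M \times \mathbb{R}^d$ to $\theta_t\mu \in M$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{M}$ measurable.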
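Proof. The equivalence follows from the definition of $\mathcal{M}$. We shall prove
the former claim, namely, the $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0,\infty])$ measurability of
$$f_B : (\mu, t) \mapsto \mu(t + B).$$
Consider first $B = [a, b)$ where $a = (a_1, \ldots, a_d)$ and $b = (b_1, \ldots, b_d)$ are in
$\mathbb{R}^d$ and $a_1 < b_1, \ldots, a_d < b_d$ and $[a,b) = [a_1, b_1) \times \cdots \times [a_d, b_d)$. Take a real
number $h > 0$ and put $[t]_h = \sup\{s \in h\mathbb{Z}^d : s \leq t\}$. Define
$$g_h : (\mu, t) \mapsto \mu([t]_h + [a,b))$$
and note that
$$(\mu, t) \mapsto (\mu, [t]_h) \text{ is } \mathcal{M} \otimes \mathcal{B}^d/\mathcal{M} \otimes \mathcal{B}(h\mathbb{Z}^d) \text{ measurable},$$
$$(\mu, r) \mapsto \mu(r + [a, b)) \text{ is } \mathcal{M} \otimes \mathcal{B}(h\mathbb{Z}^d)/\mathcal{B}([0, \infty]) \text{ measurable},$$
and that $g_h$ is the composition of these two mappings.
Thus $g_h$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0, \infty])$ measurable. Now, $g_h$ goes to $f_{[a,b)}$ pointwise
as $h \downarrow 0$. Hence $f_{[a,b)}$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0, \infty])$ measurable. Hence so is $f_B$
for all $B$ in the algebra generated by sets of the form $[a,b)$. Thus the
class of all $B \in \mathcal{B}^d$ such that $f_B$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0, \infty])$ measurable contains
an algebra generating $\mathcal{B}^d$. Moreover, this class is monotone [since if $B_n$
increases/decreases to $B$ as $n \to \infty$, then $f_{B_n}$ increases/decreases to $f_B$
pointwise as $n \to \infty$, and thus if $f_{B_n}$ is $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0,\infty])$ measurable,
then so is $f_B$]. This and the monotone class theorem (see Ash (1972),
Theorem 1.3.9) imply that the class coincides with $\mathcal{B}^d$. Thus $f_B$ is
$\mathcal{M} \otimes \mathcal{B}^d/\mathcal{B}([0, \infty])$ measurable for each $B \in \mathcal{B}^d$. $\Box$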
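The approximation step in the proof, where the site $t$ is rounded down to the grid $h\mathbb{Z}^d$ before counting, can be watched numerically. The following is a minimal one-dimensional sketch with an assumed finite point pattern and exact rational arithmetic; the names `count`, `grid_floor` and `g` are hypothetical, and the grid sizes are restricted to $h = 2^{-n}$ for convenience. It only illustrates the pointwise convergence and is no substitute for the measurability argument.

```python
from fractions import Fraction as F
from math import floor

def count(points, lo, hi):
    """mu([lo, hi)): number of points in the half-open interval."""
    return sum(1 for p in points if lo <= p < hi)

def grid_floor(t, h):
    """[t]_h: the largest element of the grid hZ that is <= t."""
    return floor(t / h) * h

def g(points, t, h, a, b):
    """g_h(mu, t) = mu([t]_h + [a, b))."""
    th = grid_floor(t, h)
    return count(points, th + a, th + b)

# Assumed toy pattern and interval (d = 1).
points = [F(1, 10), F(1, 5), F(1, 3), F(13, 10)]
t, a, b = F(1, 3), F(0), F(1)

f_value = count(points, t + a, t + b)  # f_{[a,b)}(mu, t) = mu(t + [a, b))
approximations = [g(points, t, F(1, 2 ** n), a, b) for n in range(8)]
print(f_value, approximations)
```

For this pattern the approximations are not monotone in $h$, but they settle at `f_value` once the grid is fine enough.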
2.4 The Point-Stationarity Problem for N°
The point process $N$ is stationary if
$$\theta_t N \overset{D}{=} N, \quad t \in \mathbb{R}^d, \quad [\overset{D}{=} \text{ denotes identity in distribution}].$$
Note that $\theta_t$ shifts the point pattern by $-t$, that is, $\theta_t$ shifts the origin (the
observation site) to $t$. Thus stationarity of the point process $N$ means that
it looks the same from all nonrandom observation sites.
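The statement that $\theta_t$ moves the observation site to $t$, so that every point $p$ is seen at $p - t$, can be made concrete for a finite pattern. The sketch below is an illustration under assumed data: a pattern is a Python set of integer tuples and Borel sets are restricted to half-open boxes. The helper names are hypothetical.

```python
def shift(points, t):
    """theta_t: the pattern seen from site t, so each point p appears at p - t."""
    return {tuple(x - y for x, y in zip(p, t)) for p in points}

def count_in_box(points, lo, hi):
    """mu(B) for the half-open box B = [lo_1, hi_1) x ... x [lo_d, hi_d)."""
    return sum(all(l <= x < u for x, l, u in zip(p, lo, hi)) for p in points)

# Assumed toy pattern in the plane and an observation site t.
points = {(0, 0), (2, 1), (5, -3)}
t = (2, 1)
lo, hi = (-1, -1), (1, 1)

shifted = shift(points, t)

# theta_t mu(B) should equal mu(t + B).
lhs = count_in_box(shifted, lo, hi)
rhs = count_in_box(points,
                   tuple(l + s for l, s in zip(lo, t)),
                   tuple(u + s for u, s in zip(hi, t)))
```

Here the point of the original pattern located at `t` becomes the point at the origin of the shifted pattern.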
Similarly, it would be natural to say that the point process $N^\circ$ is
point-stationary if it looks the same from all observation points. What this means
is not clear, except in one dimension. When $d = 1$, point-stationarity means
that $N^\circ$ is interval-stationary, that is, the intervals between the points form
a stationary sequence:
$$(X^\circ_{n+k})_{k=-\infty}^{\infty} \overset{D}{=} (X^\circ_k)_{k=-\infty}^{\infty}, \quad n \in \mathbb{Z};$$
here (as in Chapter 8) $X^\circ_k = S^\circ_k - S^\circ_{k-1}$, where
$$\cdots < S^\circ_{-2} < S^\circ_{-1} < S^\circ_0 = 0 < S^\circ_1 < S^\circ_2 < \cdots$$
are the points of $N^\circ$ written as an increasing sequence. This definition of
point-stationarity when $d = 1$ can be rewritten as
$$\theta_{S^\circ_n} N^\circ \overset{D}{=} N^\circ, \quad n \in \mathbb{Z}.$$
In other words, if the observer moves from the point at the origin to the $n$th
point to the right of the origin (or left of the origin), then the probability
distribution of the point pattern that he sees does not change: the point
process $N^\circ$ looks the same from all observation points.
When d > 1, this definition of point-stationarity does not work, since
then there are no intervals between the points of $N^\circ$ to form a stationary
sequence, and the observer cannot use the simple point selection rule move-
from-the-point-at-the-origin-to-the-nth-point-to-the-right (or -to-the-left),
since there is typically no nth point to the right (left). But is there some
similar way of moving between points in higher dimensions?
2.5 The Point-Stationarity Problem in the Poisson Case
Note that even the Poisson process seems to present a problem. A point
process N is a Poisson process (with constant intensity) if the number
of points in disjoint Borel sets form independent random variables and the
expected number of points in each Borel set is proportional to the Lebesgue
measure of the set. This can be thought of as saying that the points are
scattered around completely at random. If we define $N^\circ$ by adding a point
at the origin to $N$,
$$N^\circ := N + \delta_0, \quad \text{where } \delta_0 \text{ is the measure with mass one at } 0, \qquad (2.1)$$
then it is intuitively reasonable that all the points of $N^\circ$ are equivalent as
observation points (you are standing at one of them, and the others are
scattered around completely at random). When $d = 1$, this is indeed the
case: it is well known that the intervals between points are i.i.d. exponential,
and thus $N^\circ$ is point-stationary. But when $d > 1$ (in the plane, for instance),
how can we shift the origin to another point of $N^\circ$ without spoiling the
distribution of the point pattern that we see?
"Why not shift the origin to the closest point?" is a natural first reaction
to this question. In order to indicate why this does not work let us consider
the following example.
Example 2.1. Consider the case when $d = 1$ and let $N^\circ$ be the Poisson
process with a point at the origin defined at (2.1). We know that $N^\circ$ is
point-stationary, so if shifting to the closest point is to be the way to shift
in higher dimensions, it ought to work in one dimension also, that is, it
should not change the distribution of $N^\circ$.
Shift the origin of $N^\circ$ to the closest point to obtain $\theta_{\Pi^\circ_{\text{closest}}} N^\circ$, where
$$\Pi^\circ_{\text{closest}} = \begin{cases} S^\circ_1 & \text{if } S^\circ_1 \leq -S^\circ_{-1}, \\ S^\circ_{-1} & \text{if } S^\circ_1 > -S^\circ_{-1}. \end{cases}$$
Then we are sure to see, either to the right or to the left of the new origin,
an interval followed by a longer interval. This is definitely not a property
of $N^\circ$, which in both directions from the origin has an exponential interval
followed by an independent exponential interval. Thus
$$\theta_{\Pi^\circ_{\text{closest}}} N^\circ \text{ does not have the same distribution as } N^\circ,$$
that is, shifting the origin to the closest point does not preserve the
distribution of this point-stationary $N^\circ$.
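The effect described in Example 2.1 is easy to see in a simulation. The sketch below assumes a unit-rate Poisson process and generates only the gaps next to the origin as i.i.d. exponential variables; the function name, sample size and seed are arbitrary choices for illustration. It compares how often the second gap exceeds the first one when looking right from the original origin with how often this happens when looking back from the closest point toward the old origin.

```python
import random

def closest_point_shift_experiment(n_samples=20000, seed=0):
    """Monte Carlo check of Example 2.1 for a unit-rate Poisson process (d = 1)."""
    rng = random.Random(seed)
    longer_before = 0   # second gap longer than first, seen from the origin of N°
    longer_after = 0    # same event, seen from the closest point toward the old origin
    first_gap_total = 0.0
    for _ in range(n_samples):
        # Gaps around the origin: left1 | origin | right1, right2
        left1 = rng.expovariate(1.0)
        right1 = rng.expovariate(1.0)
        right2 = rng.expovariate(1.0)
        longer_before += right2 > right1
        if right1 <= left1:
            # New origin is the first point to the right; looking left from it
            # we see the gap right1 followed by the gap left1.
            first, second = right1, left1
        else:
            # New origin is the first point to the left; looking right from it
            # we see the gap left1 followed by the gap right1.
            first, second = left1, right1
        longer_after += second >= first
        first_gap_total += first
    return (longer_before / n_samples,
            longer_after / n_samples,
            first_gap_total / n_samples)

frac_before, frac_after, mean_first_gap = closest_point_shift_experiment()
print(frac_before, frac_after, mean_first_gap)
```

From the closest point, the gap back to the old origin is by construction the shorter of the two gaps, so it is always followed by a longer one; its mean is about one half rather than one, which already shows that the distribution has changed.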
2.6 Point-Maps and Point-Shifts
Example 2.1 illustrates that the selection of a point to shift the origin to is
the key issue in defining point-stationarity. We cannot select points in any
old way. In order to discuss this problem we need the following terminology.
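Call an $\mathcal{M}^\circ/\mathcal{B}^d$ measurable mapping $\pi$ from $M^\circ$ to $\mathbb{R}^d$ an $N^\circ$-point-map
if it selects a point, that is, if
$$\mu(\{\pi(\mu)\}) = 1, \quad \mu \in M^\circ.$$
Call the mapping $\theta_\pi$ from $M^\circ$ to $M^\circ$ defined by
$$\theta_\pi\mu := \theta_{\pi(\mu)}\mu, \quad \mu \in M^\circ,$$
an $N^\circ$-point-shift. Note that $\theta_\pi$ shifts the origin from a point to a point.
Call $\theta_\pi$ the point-shift associated with $\pi$. It follows from Theorem 2.1 that
$\theta_\pi$ is $\mathcal{M}^\circ/\mathcal{M}^\circ$ measurable [since $\theta_\pi$, seen as an $M^\circ$ valued mapping from
$M^\circ$ to $M$, is the composition of the $\mathcal{M}^\circ/\mathcal{M} \otimes \mathcal{B}^d$ measurable mapping
taking $\mu \in M^\circ$ to $(\mu, \pi(\mu)) \in M \times \mathbb{R}^d$ and the $\mathcal{M} \otimes \mathcal{B}^d/\mathcal{M}$ measurable
mapping taking $(\mu, t) \in M \times \mathbb{R}^d$ to $\theta_t\mu \in M$].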
When $d = 1$, examples of $N^\circ$-point-maps are the $\pi_n$ defined for $\mu \in M^\circ$
as follows
$$\pi_n(\mu) = \begin{cases} n\text{th } \mu\text{-point to the right of } 0, & n > 0, \\ 0, & n = 0, \\ (-n)\text{th } \mu\text{-point to the left of } 0, & n < 0. \end{cases}$$
The associated $N^\circ$-point-shifts translate the origin to the $n$th point to the
right or $n$th to the left. Note that the random points $S^\circ_n$ in Section 2.4 can
be written as $S^\circ_n = \pi_n(N^\circ)$.
When $d \geq 1$, an example of an $N^\circ$-point-map is the shift to the closest
point, $\pi_{\text{closest}}$, defined for $\mu \in M^\circ$ by
$\pi_{\text{closest}}(\mu)$ = the $\mu$-point having the lexicographically highest
order among the nonzero $\mu$-points being at
shortest distance from the origin.
The lexicographic rule is just to make sure that $\pi_{\text{closest}}(\mu)$ is uniquely
defined. The random point in Example 2.1 can be written as
$\Pi^\circ_{\text{closest}} = \pi_{\text{closest}}(N^\circ)$.
2.7 What Is Wrong with Shifting to the Closest Point?
In Example 2.1, shifting the origin to the closest point changed the
distribution of an interval-stationary (that is, point-stationary) $N^\circ$. So what is
wrong with this point-shift?
In order to answer this question let us first consider another one: when
$d = 1$, what is so special about shifting the origin of $N^\circ$ to the $n$th point
to the right? The essential property of this point-shift is the following. By
knowing the point-selection rule (select-the-nth-point-to-the-right) and by
looking at the point pattern from the new origin, you can always tell from
what point you came and shift the origin back to the nth point to the left
of the new origin: we first shift the origin of $\mu$ to $\pi_n(\mu)$ to obtain $\theta_{\pi_n}\mu$ and
then the origin of $\theta_{\pi_n}\mu$ to
$$\pi_{-n}(\theta_{\pi_n}\mu) = -\pi_n(\mu)$$
to obtain
$$\theta_{\pi_{-n}}\theta_{\pi_n}\mu = \mu.$$
Thus $\mu$ is the only element of $M^\circ$ that $\theta_{\pi_n}$ shifts to $\mu' := \theta_{\pi_n}\mu$. Also, any
element $\mu'$ of $M^\circ$ can arise from this point-shift, since taking $\mu := \theta_{\pi_{-n}}\mu'$
yields
$$\theta_{\pi_n}\mu = \mu'.$$
Thus, shifting the origin to the $n$th point to the right is a bijective
point-shift. When $d = 1$, it can be shown (see Remark 3.1 below) that an
interval-stationary $N^\circ$ is distributionally invariant under the group of all bijective
$N^\circ$-point-shifts.
We can now guess the answer to the first question: shifting the origin to
the closest point is wrong because this point-shift is not bijective. There
can be more than one point of a point pattern having a particular point p
as the closest point.
2.8 Bijective Point-Shifts?
Bijective point-shifts are natural to apply in defining point-stationarity,
since under a bijective point-shift $\theta_\pi$ all points are equally important, or
get equal attention, in the following sense: with $\mu \in M^\circ$ fixed, the mapping
$$p \mapsto p + \pi(\theta_p\mu)$$
is a bijection from $s(\mu)$ to $s(\mu)$, that is,
under this mapping each point is the image of a unique point.
All this suggests that we should define point-stationarity in higher
dimensions by requiring that $N^\circ$ be distributionally invariant under bijective
$N^\circ$-point-shifts.
For this definition to make sense we must find some bijective point-shifts.
A simple example is the following (devised by Olle Häggström): shift the
origin to the closest point if that point has the point at the origin as its
closest point; otherwise, stay where you are. But as this is being written it is
still not known whether the class of bijective $N^\circ$-point-shifts is rich enough
to characterize point-stationarity appropriately when $d > 1$ (appropriately
in the sense that the Palm dualities hold).
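The difference between the plain closest-point rule and the mutual-closest rule can be seen on a three-point pattern. The sketch below is an illustration on an assumed finite pattern with integer coordinates; it works with the induced map on points (each point is sent to the point selected when the origin is placed there), and the function names are hypothetical. Ties are broken by the lexicographically highest point, which is the same whether positions are compared in absolute or in shifted coordinates.

```python
def closest(p, points):
    """Closest other point to p; ties go to the lexicographically highest point."""
    others = [q for q in points if q != p]

    def dist2(q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    dmin = min(dist2(q) for q in others)
    return max(q for q in others if dist2(q) == dmin)

def closest_map(points):
    """Induced point map of the rule: shift to the closest point."""
    return {p: closest(p, points) for p in points}

def mutual_closest_map(points):
    """Induced point map of the rule: shift to the closest point only if that
    point has the current point as its own closest point; otherwise stay."""
    image = {}
    for p in points:
        q = closest(p, points)
        image[p] = q if closest(q, points) == p else p
    return image

def is_bijection(mapping):
    """True if every point is the image of exactly one point."""
    return sorted(mapping.values()) == sorted(mapping.keys())

# Assumed toy pattern in the plane.
points = [(0, 0), (1, 0), (3, 0)]
plain = closest_map(points)
mutual = mutual_closest_map(points)
print(is_bijection(plain), is_bijection(mutual))
```

In this pattern both `(0, 0)` and `(3, 0)` have `(1, 0)` as closest point, so the plain rule is not injective, while the mutual rule swaps the first two points and leaves the third one fixed.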
It is known, however, that the class of bijective point-shifts can at least
be made rich enough to define point-stationarity when d > 1. The trick is
to consider $N^\circ$ against an independent stationary background. We explain
what this means in the next section.
3 Definition of Point-Stationarity
In order to highlight the point-stationarity problem we have up to now
suppressed that the point process $N^\circ$ will be regarded in association with
a random field $Z^\circ$. Considering $N^\circ$ jointly with $Z^\circ$ does not solve our
problem but is conceptually a step in the right direction.
3.1 The Associated Random Fields Z and Z°
We shall from now on consider a pair $(N, Z)$, where $N$ is a simple counting
process and
$$Z = (Z_s)_{s \in \mathbb{R}^d}$$
is a random field with an arbitrary state space $(E, \mathcal{E})$ and path space
$(H, \mathcal{H})$, where $H$ is a shift-invariant subset of $E^{\mathbb{R}^d}$ and $\mathcal{H}$ is the $\sigma$-algebra
on $H$ generated by the projection mappings taking $z = (z_s)_{s \in \mathbb{R}^d}$ in $H$ to $z_t$
in $E$, $t \in \mathbb{R}^d$.
In order to be able to apply random shifts, we need the minimal regularity
condition [satisfied in the standard settings, for instance when $(E, \mathcal{E})$ is
Polish and the paths right-continuous; see Section 2 in Chapter 4] that $Z$
is canonically jointly measurable, that is, the mapping from $H \times \mathbb{R}^d$ to $E$
taking $(z,t)$ to $z_t$ is $\mathcal{H} \otimes \mathcal{B}^d/\mathcal{E}$ measurable. For $t \in \mathbb{R}^d$, let $\theta_t$ denote the
shift or site-shift from $H$ to $H$ defined by
$$\theta_t z = (z_{t+s})_{s \in \mathbb{R}^d}, \quad z \in H.$$
Canonical joint measurability is equivalent to the mapping from $H \times \mathbb{R}^d$ to
$H$ taking $(z,t)$ to $\theta_t z$ being $\mathcal{H} \otimes \mathcal{B}^d/\mathcal{H}$ measurable (shift-measurability).
We shall not assume any functional connection between N and Z. At
one extreme, N could be determined by Z. For instance, when d = 1, the
points could be the times when a stochastic process enters a given state.
At the other extreme, Z could be identically constant, which boils down to
regarding N alone (not in association with any random field) as we did in
Section 2.
When we consider a random field in association with $N^\circ$ we shall denote
it by $Z^\circ$. Thus the $^\circ$ on $Z^\circ$ is not to indicate a property of $Z^\circ$. It only
indicates that $Z^\circ$ is considered jointly with $N^\circ$ (while as before, the $^\circ$ on
$N^\circ$ is to indicate that $N^\circ$ has a point at the origin).
3.2 Extended Point-Maps and Point-Shifts
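Let $\theta_t$ denote the shift or site-shift from $M \times H$ to $M \times H$ defined, for
$t \in \mathbb{R}^d$, by
$$\theta_t(\mu, z) = (\theta_t\mu, \theta_t z), \quad (\mu, z) \in M \times H.$$
Call an $\mathcal{M}^\circ \otimes \mathcal{H}/\mathcal{B}^d$ measurable mapping $\pi$ from $M^\circ \times H$ to $\mathbb{R}^d$ an
$(N^\circ, Z^\circ)$-point-map if it selects a point, that is, if
$$\mu(\{\pi(\mu, z)\}) = 1, \quad (\mu, z) \in M^\circ \times H.$$
Call the mapping $\theta_\pi$ from $M^\circ \times H$ to $M^\circ \times H$ defined by
$$\theta_\pi(\mu, z) = \theta_{\pi(\mu,z)}(\mu, z), \quad (\mu, z) \in M^\circ \times H,$$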
an $(N^\circ, Z^\circ)$-point-shift. Note that $\theta_\pi$ shifts the origin from a point to a
point. Call $\theta_\pi$ the point-shift associated with $\pi$. The $\mathcal{M}^\circ \otimes \mathcal{H}/\mathcal{M}^\circ \otimes \mathcal{H}$
measurability of $\theta_\pi$ follows from Theorem 2.1 and the shift-measurability
of $Z^\circ$.
In order to distinguish $(N^\circ, Z^\circ)$-point-maps and -shifts from $N^\circ$-point-maps
and -shifts we shall sometimes call them extended point-maps and
extended point-shifts.
3.3 The Point-Stationarity Problem for (N°,Z°)
The pair $(N, Z)$ is stationary if it looks the same from all observation sites,
that is,
$$\theta_t(N, Z) \overset{D}{=} (N, Z), \quad t \in \mathbb{R}^d.$$
Similarly (as in the case of $N^\circ$ alone), it would be natural to say that the
pair $(N^\circ, Z^\circ)$ is point-stationary if it looks the same from all observation
points.
When $d = 1$, point-stationarity means that $(N^\circ, Z^\circ)$ is cycle-stationary,
that is, the points of $N^\circ$ split $Z^\circ$ into a stationary sequence of cycles. This
definition of point-stationarity, when $d = 1$, can be rewritten as
$$\theta_{S^\circ_n}(N^\circ, Z^\circ) \overset{D}{=} (N^\circ, Z^\circ), \quad n \in \mathbb{Z},$$
and it can be shown that this is equivalent to $(N^\circ, Z^\circ)$ being distributionally
invariant under the group of all bijective $(N^\circ, Z^\circ)$-point-shifts (again
see Remark 3.1).
Under a bijective $(N^\circ, Z^\circ)$-point-shift $\theta_\pi$ all points are equally important,
or get equal attention, in the same sense as in the $N^\circ$-case. Namely, with
$(\mu, z) \in M^\circ \times H$ fixed, the mapping
$$p \mapsto p + \pi(\theta_p(\mu, z))$$
is a bijection from $s(\mu)$ to $s(\mu)$, that is, under this mapping each point is
the image of a unique point.
This observation again suggests that we should define point-stationarity
for $d > 1$ by requiring that $(N^\circ, Z^\circ)$ be distributionally invariant under
bijective $(N^\circ, Z^\circ)$-point-shifts. And again, as this is being written, it is not
known whether the class of bijective $(N^\circ, Z^\circ)$-point-shifts is always rich
enough to characterize point-stationarity appropriately (appropriately in
the sense that the Palm dualities hold). But we are now closing in on a
definition of point-stationarity that works at both the intuitive and formal
levels even if the class of bijective $(N^\circ, Z^\circ)$-point-shifts turns out to be too
meager in general to characterize the concept.
3.4 Intuitive Motivation of the Solution
The key trick in our solution of the point-stationarity problem is to
consider $(N^\circ, Z^\circ)$ against any independent stationary background, that is, to
consider $(N^\circ, Z^\circ)$ jointly with an arbitrary independent stationary
(shift-measurable) random field
$$Y^\circ = (Y^\circ_s)_{s \in \mathbb{R}^d}.$$
Let $(L, \mathcal{L})$ be the path space of $Y^\circ$. Note that the $^\circ$ on $Y^\circ$ (like the $^\circ$ on $Z^\circ$)
is only to indicate that $Y^\circ$ is considered in association with $N^\circ$ and does
not imply that $Y^\circ$ and $N^\circ$ are functionally connected (while the $^\circ$ on the
point process $N^\circ$ itself indicates that $N^\circ$ has a point at the origin).
Intuitively, if the triple $(N^\circ, Z^\circ, Y^\circ)$ looks the same from all the points
of $N^\circ$, then so in particular does the pair $(N^\circ, Z^\circ)$. Conversely, if $(N^\circ, Z^\circ)$
looks the same from all the points of $N^\circ$, then so will $(N^\circ, Z^\circ, Y^\circ)$ because
[due to the stationarity of $Y^\circ$] $Y^\circ$ looks the same from all random sites
that are independent of $Y^\circ$, and thus [due to the independence of $Y^\circ$ and
$(N^\circ, Z^\circ)$ and the fact that $(N^\circ, Z^\circ)$ looks the same from all points of $N^\circ$]
the triple $(N^\circ, Z^\circ, Y^\circ)$ should look the same from all points of $N^\circ$.
That is, $(N^\circ, Z^\circ)$ should be point-stationary if and only if $(N^\circ, Z^\circ, Y^\circ)$
is point-stationary.
3.5 Solution of the Point-Stationarity Problem
This suggests that we call the pair $(N^\circ, Z^\circ)$ point-stationary if the triple
$(N^\circ, Z^\circ, Y^\circ)$ is distributionally invariant under all bijective
$(N^\circ, Z^\circ, Y^\circ)$-point-shifts for all shift-measurable random fields $Y^\circ$ that are stationary
and independent of $(N^\circ, Z^\circ)$. The above discussion motivates this definition
intuitively, while the theory established in the upcoming sections motivates
it practically. Here is the definition stated in full detail.
Definition 3.1. Let $N^\circ$ be a simple point process and $Z^\circ$ a random field
defined on a probability space $(\Omega, \mathcal{F}, P^\circ)$. Call $(N^\circ, Z^\circ)$ point-stationary
if for each shift-measurable random field $Y^\circ$ that is stationary and
independent of $(N^\circ, Z^\circ)$ and possibly defined on an extension of $(\Omega, \mathcal{F}, P^\circ)$, it
holds that
$$\theta_{\Pi^\circ}(N^\circ, Z^\circ, Y^\circ) \overset{D}{=} (N^\circ, Z^\circ, Y^\circ) \qquad (3.1)$$
for all random points $\Pi^\circ$ of the form
$$\Pi^\circ = \pi(N^\circ, Z^\circ, Y^\circ), \qquad (3.2)$$
where $\pi$ is any $(N^\circ, Z^\circ, Y^\circ)$-point-map [that is, $\pi$ is an $\mathcal{M}^\circ \otimes \mathcal{H} \otimes \mathcal{L}/\mathcal{B}^d$
measurable mapping from $M^\circ \times H \times L$ to $\mathbb{R}^d$ that selects a point:
$$\mu(\{\pi(\mu, z, y)\}) = 1, \quad (\mu, z, y) \in M^\circ \times H \times L]$$
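such that the associated point-shift [that is, the $\mathcal{M}^\circ \otimes \mathcal{H} \otimes \mathcal{L}/\mathcal{M}^\circ \otimes \mathcal{H} \otimes \mathcal{L}$
measurable mapping $\theta_\pi$ from $M^\circ \times H \times L$ to $M^\circ \times H \times L$ defined by
$$\theta_\pi(\mu, z, y) = \theta_{\pi(\mu,z,y)}(\mu, z, y), \quad (\mu, z, y) \in M^\circ \times H \times L]$$
is a bijection.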
REMARK 3.1. When $d = 1$, this point-stationarity definition is equivalent
to the apparently weaker property of cycle-stationarity, since both
properties are equivalent to (4.1) in Theorem 4.1 below.
REMARK 3.2. Definition 3.1 is equivalent (see Lemma 4.1 below) to the
apparently weaker condition
$$\theta_{\Pi^\circ}(N^\circ, Z^\circ) \overset{D}{=} (N^\circ, Z^\circ) \qquad (3.3)$$
for all random points
$$\Pi^\circ = \kappa_n(N^\circ, Y^\circ),$$
where $Y^\circ$ and $\kappa_n$, $n \in \mathbb{Z}$, are from the family of random fields and
point-shifts (indexed by $h > 0$) defined in the next subsection.
3.6 The Key Example of Extended Bijective Point-Shifts
We shall now construct a random field $Y^\circ$ and a two-sided sequence of
$(N^\circ, Y^\circ)$-point-maps $\kappa_n$, $n \in \mathbb{Z}$, such that the associated
$(N^\circ, Y^\circ)$-point-shifts are bijections. Both $Y^\circ$ and the point-maps will depend on a fixed
constant $h > 0$. Thus we are really defining a family of random fields and
point-shifts indexed by $h > 0$, although we suppress the parameter $h$ in the
notation.
We start by constructing $Y^\circ$, which will simply represent the stationary
point pattern $h\mathbb{Z}^d - hU$, where $U$ is uniformly distributed on $[-\frac{1}{2}, \frac{1}{2})^d$. Let
$b = (b_s)_{s \in \mathbb{R}^d}$ be the fixed function from $\mathbb{R}^d$ to $\mathbb{R}^d$ defined by
$b_s$ = the vector from the site $s$ to the closest element of $h\mathbb{Z}^d$.
If there is more than one such $h\mathbb{Z}^d$-element, let $b_s$ be the vector to the
element of highest lexicographic order (thus $b_s$ is right-continuous in all the
$d$ coordinates of $s$).
Let $Y^\circ$ have state space $(\mathbb{R}^d, \mathcal{B}^d)$ and have paths in
$$L = \{\theta_t b : t \in [-h/2, h/2)^d\} = \{\theta_t b : t \in \mathbb{R}^d\}.$$
Let $U$ be a random site that is uniformly distributed on $[-\frac{1}{2}, \frac{1}{2})^d$ and
independent of $(N^\circ, Z^\circ)$ and define
$$Y^\circ := \theta_{hU} b \quad \text{(see Figure 3.1)}.$$
Clearly, $Y^\circ$ is stationary and independent of $(N^\circ, Z^\circ)$, and shift-measurability
follows from right-continuity of the paths.
Now turn to constructing the bijective $(N^\circ, Y^\circ)$-point-maps $\kappa_n$, $n \in \mathbb{Z}$.
Fix $(\mu, y) \in M^\circ \times L$. Call a site $t \in \mathbb{R}^d$ such that $y_t = 0$ a $y$-center and the
associated set $t + [-h/2, h/2)^d$ a $y$-box. Note that $y_0$ is the center of the
$y$-box containing the origin. Let $k$ be the number of $\mu$-points in that box,
$$k = \mu(y_0 + [-h/2, h/2)^d).$$
Note that $k \geq 1$, since $0 \in s(\mu)$ and $y_0 \in (-h/2, h/2]^d$.
Let $p_0, \ldots, p_{k-1}$ be the $\mu$-points in $s(\mu) \cap (y_0 + [-h/2, h/2)^d)$ ordered
lexicographically. Let $m$ denote the index of the $\mu$-point at the origin, that
is, $p_m = 0$. For $n \in \mathbb{Z}$, put
$$\kappa_n(\mu, y) = p_{(m+n) \bmod k} \quad \text{(see Figure 3.1)}$$
where
$$(m + n) \bmod k = \inf\big((m + n - k\mathbb{Z}) \cap [0, \infty)\big).$$
That is, $\kappa_n(\mu, y)$ is the $n$th point after the point at the origin (if $n < 0$,
this is to be interpreted as the $(-n)$th point before the point at the origin)
in a circular enumeration of the points in the box containing the origin.
Thus the point at the origin is the $n$th point before $\kappa_n(\mu, y)$ in that same
circular enumeration. Thus
$$\theta_{\kappa_{-n}}\theta_{\kappa_n}(\mu, y) = (\mu, y), \quad (\mu, y) \in M^\circ \times L.$$
Thus $(\mu, y)$ is the only element of $M^\circ \times L$ that $\theta_{\kappa_n}$ shifts to $(\mu', y') :=
\theta_{\kappa_n}(\mu, y)$. Also, any element $(\mu', y')$ of $M^\circ \times L$ can arise from $\theta_{\kappa_n}$, since
taking $(\mu, y) := \theta_{\kappa_{-n}}(\mu', y')$ yields $\theta_{\kappa_n}(\mu, y) = (\mu', y')$. Thus $\theta_{\kappa_n}$ is bijective.
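The circular enumeration behind $\kappa_n$ can be sketched for a finite pattern and one fixed realization of the grid of boxes. This is an illustration under assumptions: the pattern, the box side `h` and the grid offset `center` (playing the role of $y_0$) are made-up values, the names are hypothetical, and the code computes the induced map on points (each point is sent to the point that $\kappa_n$ selects when the origin is placed there). Because a common translation does not change lexicographic order, the enumeration within a box is the same from every point of that box.

```python
from fractions import Fraction as F
from math import floor

def box_index(p, center, h):
    """Integer index j of the box center + h*j + [-h/2, h/2)^d containing site p."""
    return tuple(floor((x - c + h / 2) / h) for x, c in zip(p, center))

def kappa_map(points, center, h, n):
    """Send each point to the n-th point after it in the circular, lexicographic
    enumeration of the points that share its box."""
    boxes = {}
    for p in points:
        boxes.setdefault(box_index(p, center, h), []).append(p)
    image = {}
    for members in boxes.values():
        members.sort()  # tuples sort lexicographically
        k = len(members)
        for m, p in enumerate(members):
            image[p] = members[(m + n) % k]  # Python's % already lies in [0, k)
    return image

# Assumed toy data: box side h, center of the box containing the origin, pattern.
h = F(1)
center = (F(1, 10), F(1, 10))
points = [(F(0), F(0)), (F(1, 4), F(1, 4)), (F(-1, 4), F(1, 3)),
          (F(2), F(0)), (F(9, 4), F(1, 5))]

forward = kappa_map(points, center, h, 1)
backward = kappa_map(points, center, h, -1)
```

Composing `forward` with `backward` returns every point to itself, which is the finite-pattern counterpart of the identity $\theta_{\kappa_{-n}}\theta_{\kappa_n}(\mu, y) = (\mu, y)$.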
FIGURE 3.1. Definition of $Y^\circ$ and $\kappa_n$ ($d = 2$ and $k = 5$). Labels in the figure: the box containing the point at the origin; the center of the box containing the site $s$; the origin placed uniformly at random in the box.
4 Palm Characterization of Point-Stationarity
In this section we establish a pleasant characterization of point-stationarity
in terms of nonrandom site shifts. We shall call it Palm characterization
because it is the key to the Palm dualities presented in Sections 7 and 8
below. At the end of the section we state a dual Palm characterization of
stationarity. This is analogous to the equivalence of (a) and (d) in
Theorem 3.1 of Chapter 8.
We start by introducing an important concept, so-called Voronoi cells,
which play in some respect the same role in higher dimensions as the
intervals between points in one dimension.
4.1 Voronoi Cells
Consider a point pattern in $\mathbb{R}^d$ represented by a simple counting measure
$\mu \in M$. Note that the point pattern $s(\mu)$ need not have a point at the origin.
To each point $p \in s(\mu)$ associate the set of sites $t \in \mathbb{R}^d$ that are strictly
closer to $p$ than to any other point. This set is the open Voronoi cell with
point $p$. Define the Voronoi cells themselves by extending the open Voronoi
cells in such a way that a site $t \in \mathbb{R}^d$ that is at equal minimal distance
to two or more points belongs to the cell with point $p$ having the highest
lexicographic order.
Thus to each point $p \in s(\mu)$ there is associated a Voronoi cell. These
Voronoi cells are finitely or countably many, they are disjoint, and their
union is $\mathbb{R}^d$. In other words, the Voronoi cells form a finite or countable
partition of $\mathbb{R}^d$.
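The assignment of sites to Voronoi cells, including the tie-breaking rule, can be written down directly for a finite pattern. The following is a minimal sketch with an assumed three-point pattern and integer coordinates; the function name is hypothetical. Since every site is assigned to exactly one point, the cells automatically form a partition.

```python
def voronoi_point(site, points):
    """Point whose Voronoi cell contains `site`; a site at equal minimal distance
    to several points goes to the lexicographically highest of them."""
    def dist2(q):
        return sum((a - b) ** 2 for a, b in zip(site, q))

    dmin = min(dist2(q) for q in points)
    return max(q for q in points if dist2(q) == dmin)

# Assumed toy pattern in the plane.
points = [(0, 0), (2, 0), (0, 2)]

for site in [(1, 0), (1, 1), (0, 1), (-1, -1)]:
    print(site, "belongs to the cell of", voronoi_point(site, points))
```

The first three sites lie on cell boundaries, so their assignment is decided by the lexicographic rule.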
[Note that when d = 1, the Voronoi cells are intervals with the points in
the interior, while in Chapter 8 we considered intervals with the points at
the ends. This means that in the one-dimensional case we will now arrive
at results in a slightly different way from that of Chapter 8.]
In what follows, the Voronoi cell containing the origin is of key
importance. For $N$ put
$C_0$ = the Voronoi cell of $N$ that contains the origin
and
$\Pi_0$ = the $N$-point of $C_0$.
For $N^\circ$ put
$C^\circ_0$ = the Voronoi cell of $N^\circ$ that contains the point at the origin.
4.2 Shifting the Origin to and from $\Pi_0$
Let $S$ be a $C^\circ_0$ valued random site. From now on, throughout this chapter,
$(N, Z)$ and $((N^\circ, Z^\circ), S)$ will be functionally linked as follows (see Figure 4.1).
When $(N, Z)$ is given, define
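$$(N^\circ, Z^\circ) := \theta_{\Pi_0}(N, Z) \quad [\text{this implies that } \Pi^\circ_0 = 0],$$
$$S := -\Pi_0.$$
Conversely, when $((N^\circ, Z^\circ), S)$ is given, define
$$(N, Z) := \theta_S(N^\circ, Z^\circ) \quad [\text{this implies that } \Pi_0 = -S].$$
Thus
$((N^\circ, Z^\circ), S)$ is $((N, Z), 0)$ seen from $\Pi_0$,
$(N, Z)$ is $(N^\circ, Z^\circ)$ seen from $S$.
Under this one-to-one correspondence between $(N, Z)$ and $((N^\circ, Z^\circ), S)$ we
have
$$C^\circ_0 = C_0 - \Pi_0, \quad \text{or equivalently,} \quad C_0 = C^\circ_0 - S.$$
In particular, the Voronoi cells $C_0$ and $C^\circ_0$ have the same volume:
$$\lambda(C^\circ_0) = \lambda(C_0),$$
where $\lambda$ is Lebesgue measure on $(\mathbb{R}^d, \mathcal{B}^d)$.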
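The one-to-one correspondence between a pattern seen from a site and the same pattern seen from the point of the cell containing that site can be traced on a small example. The sketch below ignores the random field $Z$ and uses an assumed three-point pattern with integer coordinates; all names are hypothetical. It moves the origin to the point of the Voronoi cell containing it, records the old origin as the site `S`, and then moves back.

```python
def voronoi_point(site, points):
    """Point whose Voronoi cell contains `site` (ties: lexicographically highest)."""
    def dist2(q):
        return sum((a - b) ** 2 for a, b in zip(site, q))

    dmin = min(dist2(q) for q in points)
    return max(q for q in points if dist2(q) == dmin)

def shift(points, t):
    """theta_t: the pattern seen from site t (each point p appears at p - t)."""
    return sorted(tuple(a - b for a, b in zip(p, t)) for p in points)

# Assumed toy pattern N without a point at the origin.
N = sorted([(1, 1), (4, 0), (-3, 2)])

Pi0 = voronoi_point((0, 0), N)   # the N-point of the cell C_0 containing the origin
N_circ = shift(N, Pi0)           # N seen from Pi0: it has a point at the origin
S = tuple(-a for a in Pi0)       # the old origin seen from Pi0

N_back = shift(N_circ, S)        # N° seen from S recovers N
```

The site `S` lies in the cell of the point at the origin of `N_circ`, as the correspondence requires.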
FIGURE 4.1. The Voronoi cell containing the origin.
4.3 The Palm Characterization of Point-Stationarity
We shall now show that point-stationarity of the pair $(N^\circ, Z^\circ)$ means that
if we shift the origin to a site selected uniformly at random in $C_0^\circ$, then
the pair will look the same from all observation sites, provided that we
volume-bias by the volume of $C_0^\circ$. This is analogous to the equivalence of
(a) and (d) in Theorem 3.1 in Chapter 8, except that we now highlight
point-stationarity rather than stationarity.
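Theorem 4.1. Let $(N^\circ, Z^\circ)$, $S$, and $(N, Z)$ be supported by a
probability space $(\Omega, \mathcal{F}, \mathbf{P}^\circ)$ and linked as in Section 4.2. Suppose $\lambda(C_0)$ is finite
with probability one. Then $(N^\circ, Z^\circ)$ is point-stationary and the conditional
distribution of $S$ given $(N^\circ, Z^\circ)$ is uniform on $C_0^\circ$ if and only if
\[ \mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z))] = \mathbf{E}^\circ[\lambda(C_0) f(N, Z)] \tag{4.1} \]
for all $f \in \mathcal{M} \otimes \mathcal{H}^+$ and $t \in \mathbb{R}^d$.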
We prove this theorem in the next three subsections.
4.4 First Step in Proof of Theorem 4.1 - The 'Only-If' Part
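Assume that $(N^\circ, Z^\circ)$ is point-stationary and that the conditional
distribution of $S$ given $(N^\circ, Z^\circ)$ is uniform on $C_0^\circ$. We shall show that this implies
that (4.1) holds. Take $t \in \mathbb{R}^d$ and $f \in \mathcal{M} \otimes \mathcal{H}^+$ and use the conditional
uniformity [and $\lambda(C_0) = \lambda(C_0^\circ)$] to obtain the second equality in
\[ \mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z))] = \mathbf{E}^\circ[\lambda(C_0) f(\theta_{t+S}(N^\circ, Z^\circ))] = \mathbf{E}^\circ\Bigl[\int_{t + C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr]. \tag{4.2} \]
Let $\kappa_n$ and $Y^\circ$ be as in Section 3.6 and put
$C_n^\circ$ = the Voronoi cell of $N^\circ$ associated with the point $\kappa_n(N^\circ, Y^\circ)$,
$K = N^\circ(Y_0^\circ + [-h/2, h/2)^d)$
= number of $N^\circ$-points in the $Y^\circ$-box containing the origin,
$A$ = union of $C_n^\circ$ over $0 \le n < K$
= union of cells with points in the $Y^\circ$-box containing the origin.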
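Note that the number of $N^\circ$-points in the box containing the origin remains
the same (since the point-map $\kappa_n$ selects a point in that box) after the shift
by $\theta_{\kappa_n}$, namely $K$. Thus
\[ 1_{\{K=k\}} \int_{(t + C_{-n}^\circ) \cap C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds \quad \text{is obtained from } (N^\circ, Z^\circ, Y^\circ) \]
by the same function in $\mathcal{M}^\circ \otimes \mathcal{H} \otimes \mathcal{L}^+$ as
\[ 1_{\{K=k\}} \int_{(t + C_0^\circ) \cap C_n^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds \quad \text{from } \theta_{\kappa_n}(N^\circ, Z^\circ, Y^\circ). \]
Due to point-stationarity, $(N^\circ, Z^\circ, Y^\circ)$ and $\theta_{\kappa_n}(N^\circ, Z^\circ, Y^\circ)$ have the same
distribution. Thus
\[ \mathbf{E}^\circ\Bigl[1_{\{K=k\}} \int_{(t + C_0^\circ) \cap C_n^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr] = \mathbf{E}^\circ\Bigl[1_{\{K=k\}} \int_{(t + C_{-n}^\circ) \cap C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr]. \]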
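Recall that $A$ is the union of $C_n^\circ$ over $0 \le n < K$ and note that $A$ is also
the union of $C_{-n}^\circ$ over $0 \le n < K$. Thus summing first over $0 \le n < k$ and
then over $1 \le k < \infty$ yields
\[ \mathbf{E}^\circ\Bigl[\int_{(t + C_0^\circ) \cap A} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr] = \mathbf{E}^\circ\Bigl[\int_{(t + A) \cap C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr]. \]
Now, $A$ in fact depends on the parameter $h$ and expands to the union of
all the Voronoi cells as $h \to \infty$. Thus both $A$ and $t + A$ expand to $\mathbb{R}^d$ as
$h \to \infty$. Thus by monotone convergence,
\[ \mathbf{E}^\circ\Bigl[\int_{t + C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr] = \mathbf{E}^\circ\Bigl[\int_{C_0^\circ} f(\theta_s(N^\circ, Z^\circ))\, ds\Bigr]. \]
Combine this with (4.2) to obtain that $\mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z))]$ does not
depend on $t$, that is, (4.1) holds. Thus the proof of the 'only-if' part of
Theorem 4.1 is complete.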
4.5 Mid-Step in Proof of Theorem 4.1 - Preparing for the 'If'
The following result is needed in the proof of the 'if' part of Theorem 4.1.
[By some additional effort this theorem can be strengthened to become an
equivalence result, similar to (b) and (c) in Theorem 3.1 of Chapter 8, but
we shall not do so here, since we do not need it for the Palm dualities.]
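Theorem 4.2. Let $(N^\circ, Z^\circ)$, $S$, and $(N, Z)$ be supported by the probability
space $(\Omega, \mathcal{F}, \mathbf{P}^\circ)$ and linked as in Section 4.2. For $t \in \mathbb{R}^d$, put
$C_t$ = the Voronoi cell of $N$ containing the site $t$,
$\Pi_t$ = the $N$-point of $C_t$.
Let $Y$ be a stationary shift-measurable random field that is independent
of $(N, Z)$ and possibly supported by an extension of $(\Omega, \mathcal{F}, \mathbf{P}^\circ)$. Let $(L, \mathcal{L})$
be the path space of $Y$ and put $Y^\circ = \theta_{-S} Y$. Suppose $\lambda(C_0)$ is finite with
probability one. If (4.1) holds, then for all $f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$,
\[ \mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z, Y))] = \mathbf{E}^\circ[\lambda(C_0) f(N, Z, Y)], \quad t \in \mathbb{R}^d, \tag{4.3} \]
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \int 1_{\{\Pi_t \in [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr] = \mathbf{E}^\circ[f(N, Z, Y)], \tag{4.4} \]
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap [0,1)^d} f(\theta_p(N, Z, Y))\Bigr] = \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ)], \tag{4.5} \]
and the conditional distribution of $S$ given $(N^\circ, Z^\circ, Y^\circ)$ is uniform on $C_0^\circ$.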
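Proof. Assume that (4.1) holds. Since $Y$ is stationary and independent
of $(N, Z)$ we obtain from (4.1) that (4.3) holds for all $f = f_1 f_2$, where
$f_1 \in \mathcal{M} \otimes \mathcal{H}^+$ and $f_2 \in \mathcal{L}^+$, since then
\[ \mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z, Y))] = \mathbf{E}^\circ[\lambda(C_0) f_1(\theta_t(N, Z))]\, \mathbf{E}^\circ[f_2(\theta_t Y)] \]
\[ = \mathbf{E}^\circ[\lambda(C_0) f_1(N, Z)]\, \mathbf{E}^\circ[f_2(Y)] = \mathbf{E}^\circ[\lambda(C_0) f(N, Z, Y)]. \]
Thus (4.3) holds for all $f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$.
To obtain (4.4), apply (4.3) with $f(N, Z, Y)$ replaced by $f(N, Z, Y)/\lambda(C_0)$
and $f(\theta_t(N, Z, Y))$ by $f(\theta_t(N, Z, Y))/\lambda(C_t)$ to get
\[ \mathbf{E}^\circ[\lambda(C_0) f(\theta_t(N, Z, Y))/\lambda(C_t)] = \mathbf{E}^\circ[f(N, Z, Y)]. \]
Integrating over $t \in [0,1)^d$ and interchanging integration and expectation
yields
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \int_{t \in [0,1)^d} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr] = \mathbf{E}^\circ[f(N, Z, Y)]. \tag{4.6} \]
Take $i \in \mathbb{Z}^d$ and note that
\[ \int_{t \in [0,1)^d} 1_{\{\Pi_t \in i + [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt \]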
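is obtained from $(N, Z, Y)$ by the same function in $\mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$ as
\[ \int_{t \in -i + [0,1)^d} 1_{\{\Pi_t \in [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt \]
from $\theta_{-i}(N, Z, Y)$. Apply (4.3) [with $f$ replaced by this function and $t$
replaced by $-i$] to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \int_{t \in [0,1)^d} 1_{\{\Pi_t \in i + [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr] \]
\[ = \mathbf{E}^\circ\Bigl[\lambda(C_0) \int_{t \in -i + [0,1)^d} 1_{\{\Pi_t \in [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr]. \]
Sum over $i \in \mathbb{Z}^d$ to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \int_{t \in [0,1)^d} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr] = \mathbf{E}^\circ\Bigl[\lambda(C_0) \int_{t \in \mathbb{R}^d} 1_{\{\Pi_t \in [0,1)^d\}} f(\theta_t(N, Z, Y))/\lambda(C_t)\, dt\Bigr]. \]
This and (4.6) yield (4.4).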
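In order to establish (4.5), take $g \in \mathcal{B}^d_+$ and $f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$ and note
that
\[ t \in C_p \implies \begin{cases} f(\theta_{\Pi_t}(N, Z, Y))/\lambda(C_t) = f(\theta_p(N, Z, Y))/\lambda(C_p), \\ g(t - \Pi_t) = g(t - p). \end{cases} \]
Apply this and (4.4), with $f(N, Z, Y)$ replaced by $f(N^\circ, Z^\circ, Y^\circ) g(S)$ and
$f(\theta_t(N, Z, Y))$ by $f(\theta_{\Pi_t}(N, Z, Y)) g(t - \Pi_t)$, to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap [0,1)^d} f(\theta_p(N, Z, Y)) \int_{t \in C_p} g(t - p)\, dt/\lambda(C_p)\Bigr] = \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ) g(S)]. \tag{4.7} \]
Taking $g = 1$ yields (4.5).
In order to establish the conditional uniformity, replace $f(\theta_p(N, Z, Y))$
by $f(\theta_p(N, Z, Y)) \int_{t \in C_p} g(t - p)\, dt/\lambda(C_p)$ in (4.5) to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap [0,1)^d} f(\theta_p(N, Z, Y)) \int_{t \in C_p} g(t - p)\, dt/\lambda(C_p)\Bigr] = \mathbf{E}^\circ\Bigl[f(N^\circ, Z^\circ, Y^\circ) \int_{t \in C_0^\circ} g(t)\, dt/\lambda(C_0)\Bigr]. \]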
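Compare this and (4.7) to obtain that for $g \in \mathcal{B}^d_+$ and $f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$,
\[ \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ) g(S)] = \mathbf{E}^\circ\Bigl[f(N^\circ, Z^\circ, Y^\circ) \int_{t \in C_0^\circ} g(t)\, dt/\lambda(C_0)\Bigr]. \]
This means that the conditional distribution of $S$ given $(N^\circ, Z^\circ, Y^\circ)$ is
uniform on $C_0^\circ$ as desired, and the proof of Theorem 4.2 is complete. □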
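4.6 Final Step in Proof of Theorem 4.1 - The 'If' Part
Assume that (4.1) holds. Then, due to Theorem 4.2, the conditional
distribution of $S$ given $(N^\circ, Z^\circ, Y^\circ)$ is uniform on $C_0^\circ$, and thus it only remains
to establish that $(N^\circ, Z^\circ)$ is point-stationary. Let $Y^\circ$ and $\pi$ be as in
Definition 3.1. Assume also that $Y^\circ$ is independent of $(N, Z)$, that is, not only
independent of $(N^\circ, Z^\circ)$ but also of $((N^\circ, Z^\circ), S)$. [This is no restriction,
since if this were not the case, we could replace $Y^\circ$ by a random field that
is independent of $(N, Z)$ and has the same distribution as $Y^\circ$: (3.1) holds
for $Y^\circ$ if and only if it holds with $Y^\circ$ replaced by this copy of $Y^\circ$.] Put
\[ Y = \theta_S Y^\circ \]
and note that since $Y^\circ$ is stationary and independent of $(N, Z)$, so is $Y$.
Thus, due to Theorem 4.2, (4.5) holds. Fix $f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$ and apply
(4.5) with $f$ replaced by $f(\theta_\pi(\cdot))$ to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap [0,1)^d} f(\theta_\pi \theta_p(N, Z, Y))\Bigr] = \mathbf{E}^\circ[f(\theta_\pi(N^\circ, Z^\circ, Y^\circ))]. \tag{4.8} \]
Take $i \in \mathbb{Z}^d$ and note that
\[ \sum_{p \in s(N) \cap [0,1)^d} 1_{\{p + \pi(\theta_p(N, Z, Y)) \in i + [0,1)^d\}} f(\theta_\pi \theta_p(N, Z, Y)) \]
is obtained from $(N, Z, Y)$ by the same function in $\mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+$ as
\[ \sum_{p \in s(N) \cap ([0,1)^d - i)} 1_{\{p + \pi(\theta_p(N, Z, Y)) \in [0,1)^d\}} f(\theta_\pi \theta_p(N, Z, Y)) \]
from $\theta_{-i}(N, Z, Y)$. Applying (4.3) [with $f$ replaced by this function and $t$
by $-i$] yields
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap [0,1)^d} 1_{\{p + \pi(\theta_p(N, Z, Y)) \in i + [0,1)^d\}} f(\theta_\pi \theta_p(N, Z, Y))\Bigr] \]
\[ = \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N) \cap ([0,1)^d - i)} 1_{\{p + \pi(\theta_p(N, Z, Y)) \in [0,1)^d\}} f(\theta_\pi \theta_p(N, Z, Y))\Bigr]. \]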
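Sum over $i \in \mathbb{Z}^d$ and compare with (4.8) to obtain
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{p \in s(N)} 1_{\{p + \pi(\theta_p(N, Z, Y)) \in [0,1)^d\}} f(\theta_\pi \theta_p(N, Z, Y))\Bigr] = \mathbf{E}^\circ[f(\theta_\pi(N^\circ, Z^\circ, Y^\circ))]. \]
Since $\theta_\pi$ is bijective, it holds that for each point $q \in s(N) \cap [0,1)^d$ there is a
unique point $p \in s(N)$ such that $q = p + \pi(\theta_p(N, Z, Y))$. Applying this to the sum
on the left-hand side yields [note $\theta_\pi \theta_p(N, Z, Y) = \theta_{p + \pi(\theta_p(N, Z, Y))}(N, Z, Y)$]
\[ \mathbf{E}^\circ\Bigl[\lambda(C_0) \sum_{q \in s(N) \cap [0,1)^d} f(\theta_q(N, Z, Y))\Bigr] = \mathbf{E}^\circ[f(\theta_\pi(N^\circ, Z^\circ, Y^\circ))]. \]
Compare this and (4.5) to obtain that
\[ \mathbf{E}^\circ[f(\theta_\pi(N^\circ, Z^\circ, Y^\circ))] = \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ)], \quad f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+, \]
holds for $Y^\circ$ and $\pi$ as in Definition 3.1. Thus $(N^\circ, Z^\circ)$ is point-stationary,
and the proof of Theorem 4.1 is complete.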
4.7 The Backgrounds and Point-Maps in Section 3.6 Suffice
The Palm characterization (4.1) was established in Section 4.4 using only
distributional invariance under the point-shifts associated with the family
of background fields $Y^\circ$ and point-maps $\kappa_n$, $n \in \mathbb{Z}$, defined in Section 3.6
(indexed by a parameter h suppressed in the notation). This family thus
suffices to characterize point-stationarity. Below (in the proof of Theorem 5.1)
we shall need the following slightly modified result.
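Lemma 4.1. Let $V$ be a $[0,1)$ valued random variable that is independent
of $(N^\circ, Z^\circ)$ and the family of random fields $Y^\circ$ in Section 3.6. Let $V^\circ$ be
the stationary random field defined by $V_t^\circ = V$, $t \in \mathbb{R}^d$. The pair $(N^\circ, Z^\circ)$
is point-stationary if and only if
\[ \theta_{\Pi_n^\circ}(N^\circ, Z^\circ) \overset{D}{=} (N^\circ, Z^\circ) \tag{4.9} \]
for all random points $\Pi_n^\circ$ of the form
\[ \Pi_n^\circ = \alpha_n(N^\circ, Z^\circ, Y^\circ, V^\circ), \]
where $\alpha_n$, $n \in \mathbb{Z}$, are the $(N^\circ, Z^\circ, Y^\circ, V^\circ)$-point-maps defined by
\[ \alpha_n(\mu, z, y, v) = \kappa_{[v_0 k] + n}(\mu, y) \]
with $k = \mu(y_0 + [-h/2, h/2)^d)$ and $[\cdot]$ denoting the integer part.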
PROOF. The 'only-if' part holds because the condition in Definition 3.1 is
satisfied, namely $(Y^\circ, V^\circ)$ regarded as a bivariate random field $(Y_t^\circ, V_t^\circ)_{t \in \mathbb{R}^d}$ is
stationary and independent of $(N^\circ, Z^\circ)$, and $\theta_{\alpha_n}$ is bijective for the same
reason as $\theta_{\kappa_n}$ is: see the end of Section 3.6 and replace the origin in that
argument by $\alpha_0(\mu, z, y, v)$.
To establish the 'if' part, let $S$ be a random site such that the conditional
distribution of $S$ given $(N^\circ, Z^\circ)$ is uniform on $C_0^\circ$. Replace $Y^\circ$ by $(Y^\circ, V^\circ)$
and $\kappa_n$ by $\alpha_n$ in the argument in Section 4.4 to obtain that (4.1) holds. By
Theorem 4.1 this implies that $(N^\circ, Z^\circ)$ is point-stationary. □
4.8 Palm Characterization of Stationarity
Modifying the proof of Theorem 4.1 in an obvious way yields the following
dual Palm characterization of stationarity (the analogue of the equivalence
of (a) and (d) in Theorem 3.1 of Chapter 8).
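Theorem 4.3. Let $(N, Z)$ and $((N^\circ, Z^\circ), S)$ be supported by the probability
space $(\Omega, \mathcal{F}, \mathbf{P})$ and linked as in Section 4.2. Suppose $\lambda(C_0) < \infty$ with
probability one. Then $(N, Z)$ is stationary if and only if the conditional
distribution of $S$ given $(N^\circ, Z^\circ)$ is uniform on $C_0^\circ$ and
\[ \mathbf{E}[f(\theta_{\Pi^\circ}(N^\circ, Z^\circ))/\lambda(C_0)] = \mathbf{E}[f(N^\circ, Z^\circ)/\lambda(C_0)], \quad f \in \mathcal{M}^\circ \otimes \mathcal{H}^+, \]
for all random points $\Pi^\circ$ as in Definition 3.1.
PROOF. Simply replace $\mathbf{E}^\circ[\,\cdot\,]$ by $\mathbf{E}[\,\cdot\,/\lambda(C_0)]$ throughout the proof of
Theorem 4.1. □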
Thus stationarity of the pair (N, Z) means that if we view it from the
point of the Voronoi cell where the origin lies, then the origin is located
uniformly at random in the cell, and moreover, the pair looks the same
from all observation points, provided that we volume-debias by the volume
of the cell.
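The analogue of Theorem 4.2 is obtained by replacing $\mathbf{E}^\circ[\,\cdot\,]$ by $\mathbf{E}[\,\cdot\,/\lambda(C_0)]$.
Rather than stating that result we let the following suffice.
Lemma 4.2. If $(N, Z)$ is stationary, then for all $f \in \mathcal{M} \otimes \mathcal{H}^+$,
\[ \mathbf{E}\Bigl[\sum_{p \in s(N) \cap [0,1)^d} f(\theta_p(N, Z))\Bigr] = \mathbf{E}[f(N^\circ, Z^\circ)/\lambda(C_0)]. \tag{4.10} \]
In particular,
\[ \mathbf{E}[N([0,1)^d)] = \mathbf{E}[1/\lambda(C_0)]. \tag{4.11} \]
Proof. In the proof of Theorem 4.2 replace $\mathbf{E}^\circ[\,\cdot\,]$ by $\mathbf{E}[\,\cdot\,/\lambda(C_0)]$ and leave
out $Y$ to obtain (4.10) instead of (4.5). Take $f = 1$ to obtain (4.11). □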
The expected number of points in a unit box, $\mathbf{E}[N([0,1)^d)]$, is called the
intensity of the stationary point process $N$. According to (4.11) the intensity
can be calculated by taking the expectation of the reciprocal of the volume
of the Voronoi cell containing the origin.
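Formula (4.11) can be checked numerically in a simple case. The sketch below is an illustration only and assumes a stationary Poisson process on the line ($d = 1$) with a chosen rate; it uses the standard fact that, seen from the origin, the gaps to successive points on either side are independent exponential variables. The function names are hypothetical.

```python
import random

def reciprocal_cell_length(rate, rng):
    """Draw 1 / lambda(C_0) for a stationary Poisson process on the line.
    Assumption: by memorylessness the gaps outward from the origin are
    i.i.d. Exp(rate) on each side."""
    r1 = rng.expovariate(rate)            # first point to the right of 0
    r2 = r1 + rng.expovariate(rate)       # second point to the right
    l1 = -rng.expovariate(rate)           # first point to the left of 0
    l2 = l1 - rng.expovariate(rate)       # second point to the left
    if r1 < -l1:
        # Pi_0 = r1, and C_0 runs between the midpoints (l1+r1)/2 and (r1+r2)/2
        length = (r2 - l1) / 2.0
    else:
        # Pi_0 = l1, and C_0 runs between the midpoints (l2+l1)/2 and (l1+r1)/2
        length = (r1 - l2) / 2.0
    return 1.0 / length

def estimate_intensity(rate, n_samples, seed=0):
    """Monte Carlo estimate of E[1 / lambda(C_0)]."""
    rng = random.Random(seed)
    total = sum(reciprocal_cell_length(rate, rng) for _ in range(n_samples))
    return total / n_samples

estimate = estimate_intensity(rate=2.0, n_samples=20000)
```

With these settings the estimate should lie close to the chosen rate, in line with (4.11); the agreement is statistical, not exact.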
5 Point-Stationarity Characterized by Randomization
In this section we shall show that point-stationarity means distributional
invariance under doubly randomized point-shifts: shift to a uniformly
selected site followed by a shift to a uniformly selected point.
5.1 The Characterization Result
A point-stationary $(N^\circ, Z^\circ)$ turns out to be characterized by the following
property:
If $(N^\circ, Z^\circ)$ is first shifted by an independent site $U$ selected uniformly
at random in any bounded Borel set $B$ of positive volume and the
origin then shifted to a $\theta_{-U}N^\circ$-point $\Pi$ picked uniformly at random
among the points $s(\theta_{-U}N^\circ) \cap B$ that ended up in $B$, then the
distribution of $(N^\circ, Z^\circ)$ does not change.
There is at least one $\theta_{-U}N^\circ$-point in $B$, the one at $U$ which initially was
at the origin:
\[ \theta_{-U}N^\circ(\{U\}) = N^\circ(\{0\}) = 1. \]
This characterization is the key to the interpretation of the randomized-origin
Palm duality in Section 8. Note that the first randomized shift would
not change the distribution of a stationary $(N, Z)$: if $(N, Z)$ is stationary,
then, since $U$ is independent of $(N, Z)$,
\[ \theta_{-U}(N, Z) \overset{D}{=} (N, Z). \]
The first randomized shift does, however, change the distribution of a
point-stationary $(N^\circ, Z^\circ)$, since $\theta_{-U}(N^\circ, Z^\circ)$ has no point at the origin, unlike
$(N^\circ, Z^\circ)$. But when the first randomized shift is followed by the second,
then the distribution is restored: if $(N^\circ, Z^\circ)$ is point-stationary, then
\[ \theta_\Pi \theta_{-U}(N^\circ, Z^\circ) \overset{D}{=} (N^\circ, Z^\circ). \]
Observe that $\Pi^\circ := \Pi - U$ is uniform on $s(N^\circ) \cap (B - U)$, so we can
describe this characterization of a point-stationary $(N^\circ, Z^\circ)$ alternatively
as follows:
If we place any bounded Borel set of positive volume uniformly at
random around the $N^\circ$-point at the origin and shift the origin to an
$N^\circ$-point $\Pi^\circ$ selected uniformly at random among the $N^\circ$-points in
that set, then the distribution of $(N^\circ, Z^\circ)$ does not change.
Here is a more formal statement.
Theorem 5.1. Let $B \in \mathcal{B}^d$ be bounded and such that $\lambda(B) > 0$. Let $U$ be
a random site that is uniform on $B$ and independent of $(N^\circ, Z^\circ)$. Let $\Pi^\circ$
be a random point of $N^\circ$ such that the conditional distribution of $\Pi^\circ$ given
$((N^\circ, Z^\circ), U)$ is uniform on the finite set of points of $N^\circ$ lying in $B - U$,
that is, uniform on
\[ s(N^\circ) \cap (B - U). \]
Then $(N^\circ, Z^\circ)$ is point-stationary if and only if for each such $B$,
\[ (\theta_{\Pi^\circ}(N^\circ, Z^\circ), \Pi^\circ + U) \overset{D}{=} ((N^\circ, Z^\circ), U). \tag{5.1} \]
In fact, $(N^\circ, Z^\circ)$ is point-stationary if (5.1) holds for all $B$ of the form
\[ B = [-h/2, h/2)^d, \quad h > 0. \]
Note that with $\Pi := \Pi^\circ + U$ we have that $U \overset{D}{=} \Pi$ and that the conditional
distribution of $\Pi$ given $((N^\circ, Z^\circ), U)$ is uniform on $s(\theta_{-U}N^\circ) \cap B$. Thus it
follows from Theorem 5.1 that if we have a point-stationary point process
and place a bounded Borel set of positive volume uniformly at random
around the point at the origin, then the point at the origin is located
uniformly at random among the points in the set.
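A consequence of (5.1) is that $\Pi^\circ + U$ is again uniform on $B$. The following sketch checks this by simulation in one dimension. It is an illustration under an assumption that is not part of the text above: the point-stationary process is taken to be a Poisson process with an added point at the origin (the usual Palm version of a Poisson process). The function names and parameter values are hypothetical.

```python
import random

def points_in_window(rate, lo, hi, rng):
    """N°-points in [lo, hi) with lo <= 0 < hi: the point at 0 plus the
    points of a rate-`rate` Poisson process (assumed model)."""
    pts = [0.0]
    x = rng.expovariate(rate)
    while x < hi:                 # Poisson points to the right of the origin
        pts.append(x)
        x += rng.expovariate(rate)
    x = -rng.expovariate(rate)
    while x >= lo:                # Poisson points to the left of the origin
        pts.append(x)
        x -= rng.expovariate(rate)
    return pts

def randomized_point(rate, h, rng):
    """Return Pi° + U for B = [-h/2, h/2)."""
    u = rng.uniform(-h / 2.0, h / 2.0)                            # U uniform on B
    pts = points_in_window(rate, -h / 2.0 - u, h / 2.0 - u, rng)  # s(N°) ∩ (B - U)
    return rng.choice(pts) + u                                    # Pi° uniform on that set

rng = random.Random(1)
h = 2.0
sample = [randomized_point(1.5, h, rng) for _ in range(20000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
```

If $\Pi^\circ + U$ is uniform on $B = [-h/2, h/2)$, the sample mean should be near $0$ and the sample variance near $h^2/12$.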
5.2 First Step in the Proof of Theorem 5.1 — Preparations
Let $Y^\circ$ be as in Section 3.6 and choose $h$ large enough for $B$ to be contained
in $[-h/2, h/2)^d$. Put
\[ B' = [-h/2, h/2)^d \setminus B. \]
Let $\beta_n$, $n \in \mathbb{Z}$, be the following modification of the point-shifts $\kappa_n$ in
Section 3.6: if $0 \in y_0 + B$,
let $\beta_n(\mu, y)$ be the $n$th point after the point at the origin in the
circular lexicographic enumeration of the $\mu$-points in $y_0 + B$,
and if $0 \in y_0 + B'$,
let $\beta_n(\mu, y)$ be the $n$th point after the point at the origin in the
circular lexicographic enumeration of the $\mu$-points in $y_0 + B'$.
Then $\theta_{\beta_n}$ is bijective for the same reason as $\theta_{\kappa_n}$ is bijective (see Section 3.6).
Note that conditionally on $-Y_0^\circ \in B$, the set $Y_0^\circ + B$ is the set $B$ placed
uniformly at random around the origin (like $B - U$). In order to select an
$N^\circ$-point placed uniformly at random among the $N^\circ$-points in $Y_0^\circ + B$ (like
$\Pi^\circ$ among the $N^\circ$-points in $B - U$) proceed as follows. Let $V$ be uniform
on $[0,1)$ and independent of $(N^\circ, Z^\circ, Y^\circ)$, and define a stationary random
field $V^\circ$ by
\[ V_t^\circ = V, \quad t \in \mathbb{R}^d, \]
and for each $n \in \mathbb{Z}$, define $(N^\circ, Z^\circ, Y^\circ, V^\circ)$-point-maps $\alpha_n$ as follows (with
$[\cdot]$ denoting the integer part):
if $0 \in y_0 + B$, let $\alpha_n(\mu, z, y, v) = \beta_{[v_0 k] + n}(\mu, y)$, where $k = \mu(y_0 + B)$;
if $0 \in y_0 + B'$, let $\alpha_n(\mu, z, y, v) = \beta_n(\mu, y)$.
Then $\theta_{\alpha_n}$ is bijective, and $(Y^\circ, V^\circ)$, regarded as a bivariate random field
$(Y_s^\circ, V_s^\circ)_{s \in \mathbb{R}^d}$, is stationary and independent of $(N^\circ, Z^\circ)$.
Since $[Vk]$ is uniform on $\{0, 1, \ldots, k-1\}$ we can now select an $N^\circ$-point
placed uniformly at random among the $N^\circ$-points in $Y_0^\circ + B$ as follows:
\[ \Pi_n^\circ = \alpha_n(N^\circ, Z^\circ, Y^\circ, V^\circ). \]
Thus
\[ \mathbf{P}^\circ(((N^\circ, Z^\circ), \Pi_n^\circ - Y_0^\circ, -Y_0^\circ) \in \cdot \mid -Y_0^\circ \in B) = \mathbf{P}^\circ(((N^\circ, Z^\circ), \Pi^\circ + U, U) \in \cdot). \tag{5.2} \]
5.3 Mid-Step in the Proof of Theorem 5.1 - The 'Only-If' Part
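Suppose $(N^\circ, Z^\circ)$ is point-stationary. Then
\[ \theta_{\Pi_n^\circ}(N^\circ, Z^\circ, Y^\circ, V^\circ) \overset{D}{=} (N^\circ, Z^\circ, Y^\circ, V^\circ), \]
which implies
\[ (\theta_{\Pi_n^\circ}(N^\circ, Z^\circ), -Y^\circ_{\Pi_n^\circ}) \overset{D}{=} ((N^\circ, Z^\circ), -Y_0^\circ). \tag{5.3} \]
Now,
\[ \{-Y_0^\circ \in B\} = \{-Y^\circ_{\Pi_n^\circ} \in B\} \quad \text{and} \quad -Y^\circ_{\Pi_n^\circ} = \Pi_n^\circ - Y_0^\circ, \tag{5.4} \]
which yields the second identity in
\[ \mathbf{P}^\circ((\theta_{\Pi^\circ}(N^\circ, Z^\circ), \Pi^\circ + U) \in \cdot) \]
\[ = \mathbf{P}^\circ((\theta_{\Pi_n^\circ}(N^\circ, Z^\circ), \Pi_n^\circ - Y_0^\circ) \in \cdot \mid -Y_0^\circ \in B) \quad \text{(due to (5.2))} \]
\[ = \mathbf{P}^\circ((\theta_{\Pi_n^\circ}(N^\circ, Z^\circ), -Y^\circ_{\Pi_n^\circ}) \in \cdot \mid -Y^\circ_{\Pi_n^\circ} \in B) \quad \text{(due to (5.4))} \]
\[ = \mathbf{P}^\circ(((N^\circ, Z^\circ), -Y_0^\circ) \in \cdot \mid -Y_0^\circ \in B) \quad \text{(due to (5.3))} \]
\[ = \mathbf{P}^\circ(((N^\circ, Z^\circ), U) \in \cdot) \quad \text{(due to (5.2))}. \]
Thus point-stationarity implies (5.1) for all bounded $B$ of positive Lebesgue
measure.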
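5.4 Final Step in the Proof of Theorem 5.1 - The 'If' Part
Suppose (5.1) holds for $B = [-h/2, h/2)^d$. With this $B$, (5.2) becomes
\[ \mathbf{P}^\circ(((N^\circ, Z^\circ), \Pi_n^\circ - Y_0^\circ, -Y_0^\circ) \in \cdot) = \mathbf{P}^\circ(((N^\circ, Z^\circ), \Pi^\circ + U, U) \in \cdot). \]
Thus $\theta_{\Pi_n^\circ}(N^\circ, Z^\circ) \overset{D}{=} \theta_{\Pi^\circ}(N^\circ, Z^\circ)$, which together with (5.1) yields
\[ \theta_{\Pi_n^\circ}(N^\circ, Z^\circ) \overset{D}{=} (N^\circ, Z^\circ). \]
Thus (4.9) in Lemma 4.1 holds. Also, when $B = [-h/2, h/2)^d$, the $\alpha_n$
satisfy the condition in Lemma 4.1. Thus, by Lemma 4.1, $(N^\circ, Z^\circ)$ is
point-stationary, and the proof of Theorem 5.1 is complete.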
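6 Point-Stationarity and the Invariant $\sigma$-Algebras
In this section we first define the invariant $\sigma$-algebras $\mathcal{I}$ and $\mathcal{J}$, then
extend Theorem 5.1 slightly, and finally show that several properties are
preserved under conditioning on $\mathcal{J}$, the invariant $\sigma$-algebra of both $(N, Z)$
and $(N^\circ, Z^\circ) = \theta_{\Pi_0}(N, Z)$. This we need for the randomized-origin duality
in Section 8.
6.1 The Invariant $\sigma$-Algebras $\mathcal{I}$ and $\mathcal{J}$
The pair $(N, Z)$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(M \times H, \mathcal{M} \otimes \mathcal{H})$.
Define the invariant $\sigma$-algebra on $\mathcal{M} \otimes \mathcal{H}$ by
\[ \mathcal{I} := \{B \in \mathcal{M} \otimes \mathcal{H} : \theta_t^{-1}B = B,\ t \in \mathbb{R}^d\} \tag{6.1} \]
and the invariant $\sigma$-algebra of $(N, Z)$ by
\[ \mathcal{J} := (N, Z)^{-1}\mathcal{I} \quad [\text{that is, } \mathcal{J} = \{\{(N, Z) \in B\} : B \in \mathcal{I}\}]. \tag{6.2} \]
Thus $\mathcal{I}$ is a sub-$\sigma$-algebra of $\mathcal{M} \otimes \mathcal{H}$, while $\mathcal{J}$ is a sub-$\sigma$-algebra of $\mathcal{F}$.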
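Lemma 6.1. (a) For any random site $T$ supported by $(\Omega, \mathcal{F})$ it holds that
\[ \{\theta_T(N, Z) \in B\} = \{(N, Z) \in B\}, \quad B \in \mathcal{I}. \tag{6.3} \]
(b) It holds that $g \in \mathcal{I}$ if and only if $g = g\theta_t$ for all $t \in \mathbb{R}^d$.
Proof. (a) For $B \in \mathcal{I}$ we have
\[ \{\theta_T(N, Z) \in B\} = \bigcup_{t \in \mathbb{R}^d} \{\theta_t(N, Z) \in B,\ T = t\} = \bigcup_{t \in \mathbb{R}^d} \{(N, Z) \in B,\ T = t\} = \{(N, Z) \in B\}. \]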
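(b) If $g = 1_B$ where $B \in \mathcal{I}$, then $g\theta_t = 1_{\theta_t^{-1}B} = 1_B$. It follows that
$g = g\theta_t$ holds for all simple $g \in \mathcal{I}$ and thus for all $g \in \mathcal{I}$. Conversely, if
$g = g\theta_t$ for $t \in \mathbb{R}^d$, then for each $A \in \mathcal{B}$, $\theta_t^{-1}g^{-1}A = (g\theta_t)^{-1}A = g^{-1}A$.
Thus $g^{-1}A \in \mathcal{I}$, that is, $g \in \mathcal{I}$. □
According to Lemma 6.1(a), $\mathcal{J}$ is the invariant $\sigma$-algebra of $\theta_T(N, Z)$ for
any random site $T$ supported by $(\Omega, \mathcal{F})$. Since
\[ (N^\circ, Z^\circ) = \theta_{\Pi_0}(N, Z) \quad [\text{assumed from Section 4.2 onward}], \]
this means, in particular, that $\mathcal{J}$ is also the invariant $\sigma$-algebra of $(N^\circ, Z^\circ)$:
\[ \mathcal{J} = (N^\circ, Z^\circ)^{-1}\mathcal{I} \quad [\text{that is, } \mathcal{J} = \{\{(N^\circ, Z^\circ) \in B\} : B \in \mathcal{I}\}]. \tag{6.4} \]
[Note that although we have chosen to regard $N^\circ$ as a random element in
$(M^\circ, \mathcal{M}^\circ)$, and not in $(M, \mathcal{M})$, the invariant $\sigma$-algebra of $(N^\circ, Z^\circ)$ is still
$\mathcal{J}$, since
$\mathcal{I}^\circ$ = the invariant $\sigma$-algebra on $(M^\circ, \mathcal{M}^\circ)$
= the trace of $M^\circ \times H$ on $\mathcal{I}$,
and thus $\mathcal{J} = (N^\circ, Z^\circ)^{-1}\mathcal{I} = (N^\circ, Z^\circ)^{-1}\mathcal{I}^\circ$.]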
6.2 Extension of Theorem 5.1
We shall now extend Theorem 5.1 by allowing the set B to be expanded
by an invariant random variable. This result will be used in the proof of
Theorem 8.4.
Theorem 6.1. Let $g \in \mathcal{I}$ be a strictly positive and finite function and put
\[ G = g(N^\circ, Z^\circ). \]
Let $B \in \mathcal{B}^d$ be bounded and such that $\lambda(B) > 0$. Let $U$ be a random site that
is uniform on $B$ and independent of $(N^\circ, Z^\circ)$. Let $\Pi^\circ$ be a random point
of $N^\circ$ such that the conditional distribution of $\Pi^\circ$ given $((N^\circ, Z^\circ), U)$ is
uniform on the finite set of points of $N^\circ$ lying in $G(B - U)$, that is, let $\Pi^\circ$
be uniform on
\[ s(N^\circ) \cap G(B - U). \]
Then $(N^\circ, Z^\circ)$ is point-stationary if and only if for each such $B$
\[ (\theta_{\Pi^\circ}(N^\circ, Z^\circ), \Pi^\circ + GU) \overset{D}{=} ((N^\circ, Z^\circ), GU). \tag{6.5} \]
PROOF. First assume that $g$ is bounded and repeat the proof of
Theorem 5.1 with $B$ replaced by $g(\mu, z)B$. By Lemma 6.1(b), $g \in \mathcal{I}$ means that
$g(\theta_t(\mu, z))B = g(\mu, z)B$ for all $t \in \mathbb{R}^d$. This fact [that the set $g(\mu, z)B$ does
not change by shifting $(\mu, z)$] is needed to deduce that $\theta_{\beta_n}$ is still bijective.
Thus (6.5) holds for bounded $g$. If $g$ is not bounded, take a finite constant
$a > 0$ and apply (6.5) with $g$ replaced by $g \wedge a$ to obtain
\[ \mathbf{E}[f(\theta_{\Pi^\circ}(N^\circ, Z^\circ), \Pi^\circ + GU) 1_{\{g(\theta_{\Pi^\circ}(N^\circ, Z^\circ)) < a\}}] = \mathbf{E}[f((N^\circ, Z^\circ), GU) 1_{\{g(N^\circ, Z^\circ) < a\}}], \quad f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{B}^d_+. \]
Send $a$ to infinity to obtain (6.5). □
6.3 Conditioning on J
We now show that all characterizations still hold when we condition on J.
This result will be used in Section 8.
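Theorem 6.2. Let $((N^\circ, Z^\circ), S)$ and $(N, Z)$ be linked as in Section 4.2
and let $\mathbf{P}$ and $\mathbf{P}^\circ$ be two probability measures on $(\Omega, \mathcal{F})$. Then the following
claims hold.
(a) The pair $(N^\circ, Z^\circ)$ is point-stationary under $\mathbf{P}^\circ$ if and only if it is so
conditionally on $\mathcal{J}$.
(b) The pair $(N, Z)$ is stationary under $\mathbf{P}$ if and only if it is so
conditionally on $\mathcal{J}$.
(c) The formula (4.1) holds if and only if it holds with $\mathbf{E}^\circ$ replaced by
$\mathbf{E}^\circ[\,\cdot \mid \mathcal{J}]$.
(d) The formulas (5.1) and (6.5) hold under $\mathbf{P}^\circ$ if and only if they hold
conditionally on $\mathcal{J}$.
(e) If $(N, Z)$ is stationary under $\mathbf{P}$, then
\[ \mathbf{E}[N([0,1)^d) \mid \mathcal{J}] = \mathbf{E}[1/\lambda(C_0) \mid \mathcal{J}] \quad \text{a.s. } \mathbf{P}. \]
PROOF. The formula (3.1) in the definition of point-stationarity is
equivalent to
\[ \mathbf{E}^\circ[f(\theta_{\Pi^\circ}(N^\circ, Z^\circ, Y^\circ)) 1_{\{\theta_{\Pi^\circ}(N^\circ, Z^\circ) \in B\}}] = \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ) 1_{\{(N^\circ, Z^\circ) \in B\}}], \quad B \in \mathcal{I},\ f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+. \]
Due to (6.3), it holds that
\[ 1_{\{\theta_{\Pi^\circ}(N^\circ, Z^\circ) \in B\}} = 1_{\{(N^\circ, Z^\circ) \in B\}}, \quad B \in \mathcal{I}, \]
and thus (3.1) is equivalent to
\[ \mathbf{E}^\circ[f(\theta_{\Pi^\circ}(N^\circ, Z^\circ, Y^\circ)) \mid \mathcal{J}] = \mathbf{E}^\circ[f(N^\circ, Z^\circ, Y^\circ) \mid \mathcal{J}], \quad f \in \mathcal{M} \otimes \mathcal{H} \otimes \mathcal{L}^+, \]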
that is, (a) holds. We obtain (b), (c), and (d) in a similar way (see the proof
of Theorem 7.1 in Chapter 8 if more details are needed).
In order to obtain (e), take $f \in \mathcal{I}$ in Lemma 4.2. Then, by Lemma 6.1(b),
$f(\theta_p(N, Z)) = f(N^\circ, Z^\circ)$, and we obtain from (4.10) that
\[ \mathbf{E}[f(N^\circ, Z^\circ) N([0,1)^d)] = \mathbf{E}[f(N^\circ, Z^\circ)/\lambda(C_0)], \quad f \in \mathcal{I}, \]
that is, (e) holds. □
7 The Point-at-Zero Duality
Up to now this chapter has been concerned with point-stationarity, the
extension to higher dimensions of the one-dimensional concept of cycle-
stationarity. We now turn to the extension of the two Palm dualities
considered in Chapter 8 in the one-dimensional case. This section deals with the
point-at-zero duality between stationarity and point-stationarity and the
next section with the randomized origin duality. In fact, we were ready for
the point-at-zero duality after Section 4; the aspects of point-stationarity
studied in Sections 5 and 6 will be used for the randomized-origin duality.
The point-at-zero duality has (as in Chapter 8) the following informal
interpretation:
The point-stationary dual behaves as the stationary process
conditioned on having a point at the origin. (7.1)
We start by establishing the duality, then motivate the interpretation, and
finally skim through a simulation application. We shall spend minimal effort
on proofs, since they are similar to those in Chapter 8, Sections 4, 5, and
6. Also, many comments from Chapter 8 apply here as well but will not be
repeated.
7.1 Stationarity $\Leftrightarrow$ Point-Stationarity
Let $(N, Z)$ and $((N^\circ, Z^\circ), S)$ be defined on some measurable space $(\Omega, \mathcal{F})$
and linked as in Section 4.2, namely,
\[ (N^\circ, Z^\circ) = \theta_{\Pi_0}(N, Z) \quad \text{and} \quad S = -\Pi_0, \]
or equivalently,
\[ (N, Z) = \theta_S(N^\circ, Z^\circ). \]
Recall that $C_0$ and $C_0^\circ$, respectively, are the Voronoi cells of $(N, Z)$ and
$(N^\circ, Z^\circ)$ containing the origin and that they have the same Lebesgue
measure $\lambda(C_0^\circ) = \lambda(C_0)$.
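Let $\mathbf{P}$ and $\mathbf{P}^\circ$ be two probability measures on $(\Omega, \mathcal{F})$ satisfying
\[ \mathbf{E}[1/\lambda(C_0)] < \infty \]
and
\[ d\mathbf{P}^\circ = \frac{1}{\lambda(C_0)\, \mathbf{E}[1/\lambda(C_0)]}\, d\mathbf{P} \quad \text{(volume-debiasing } \mathbf{P}\text{)} \]
or equivalently
\[ \mathbf{E}^\circ[\lambda(C_0)] < \infty \]
and
\[ d\mathbf{P} = \frac{\lambda(C_0)}{\mathbf{E}^\circ[\lambda(C_0)]}\, d\mathbf{P}^\circ \quad \text{(volume-biasing } \mathbf{P}^\circ\text{)}. \]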
Theorem 7.1. With (N,Z), ((N°,Z°),S), P, and P° as above it holds
that
(N, Z) is stationary under P
if and only if
(N°, Z°) is point-stationary under P°
and $S$ is conditionally uniform on $C_0^\circ$ given $(N^\circ, Z^\circ)$.
Proof. This is an immediate consequence of Theorem 4.1. □
7.2 Intensity - The Distribution of $S$
Let $\mathbf{P}$ and $\mathbf{P}^\circ$ be linked as above. Note that
\[ \mathbf{E}[1/\lambda(C_0)] = \frac{1}{\mathbf{E}^\circ[\lambda(C_0)]}. \]
When $N$ is stationary under $\mathbf{P}$, then these two quantities equal the intensity
$\mathbf{E}[N([0,1)^d)]$ of $N$; see Lemma 4.2.
Also, note that the conditional distribution of $S$ given $(N^\circ, Z^\circ)$ is the
same under $\mathbf{P}$ as under $\mathbf{P}^\circ$ [since $\lambda(C_0)$ is determined by $(N^\circ, Z^\circ)$; see
Lemma 4.1 in Chapter 8].
The distribution of $S$ under $\mathbf{P}$ is given by the following theorem.
Theorem 7.2. Suppose the conditional distribution of $S$ given $(N^\circ, Z^\circ)$ is
uniform on $C_0^\circ$ under $\mathbf{P}$. Then $S$ is continuous under $\mathbf{P}$ and has the density
\[ \frac{\mathbf{P}(S \in ds)}{ds} = \frac{\mathbf{P}^\circ(s \in C_0^\circ)}{\mathbf{E}^\circ[\lambda(C_0)]}, \quad s \in \mathbb{R}^d. \tag{7.2} \]
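Proof. For $B \in \mathcal{B}^d$,
\[ \mathbf{P}(S \in B) = \mathbf{E}[\mathbf{E}[1_{\{S \in B\}} \mid (N^\circ, Z^\circ)]] \]
\[ = \mathbf{E}[\lambda(B \cap C_0^\circ)/\lambda(C_0)] \quad \text{(by conditional uniformity)} \]
\[ = \mathbf{E}^\circ[\lambda(B \cap C_0^\circ)]/\mathbf{E}^\circ[\lambda(C_0)] \quad \text{(by the definition of } \mathbf{P}^\circ\text{)} \]
\[ = \int_B \mathbf{P}^\circ(s \in C_0^\circ)\, ds \,/\, \mathbf{E}^\circ[\lambda(C_0)], \]
as desired. □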
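As a worked illustration of (7.2), which is not part of the original argument, take $d = 1$ and assume that under $\mathbf{P}^\circ$ the process $N^\circ$ consists of a point at the origin together with a Poisson process of intensity $r$ (the usual Palm version of a Poisson process; both this model and the symbol $r$ are assumptions of the example). A site $s$ lies in $C_0^\circ$ exactly when no other point is closer to $s$ than the origin is, that is, when the open interval of length $2|s|$ centred at $s$ contains no Poisson point:

```latex
\begin{align*}
\mathbf{P}^\circ(s \in C_0^\circ)
  &= \mathbf{P}^\circ\bigl(N^\circ \text{ has no point in } (s-|s|,\, s+|s|)\bigr)
   = e^{-2r|s|}, \\
\mathbf{E}^\circ[\lambda(C_0)]
  &= \int_{\mathbb{R}} \mathbf{P}^\circ(s \in C_0^\circ)\, ds
   = \int_{\mathbb{R}} e^{-2r|s|}\, ds = \frac{1}{r}, \\
\frac{\mathbf{P}(S \in ds)}{ds}
  &= r\, e^{-2r|s|}, \qquad s \in \mathbb{R}.
\end{align*}
```

Under these assumptions $|S|$ is exponential with rate $2r$, which agrees with the distance from the origin to the nearest point of a stationary Poisson process on the line.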
7.3 The Point-at-Zero Interpretation — Limit Motivation
The point-at-zero interpretation (7.1) of the duality established in
Theorem 7.1 can now be formulated as follows:
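\[ \mathbf{P}((N, Z) \in \cdot \mid \Pi_0 = 0) = \mathbf{P}^\circ((N^\circ, Z^\circ) \in \cdot). \tag{7.3} \]
This expression is informal because $\mathbf{P}(\Pi_0 = 0) = 0$ when $N$ is stationary,
since
\[ \mathbf{P}(\Pi_0 = 0) \le \mathbf{P}(\Pi_0 \in [-h/2, h/2)^d) \le \mathbf{E}[N([-h/2, h/2)^d)] = h^d\, \mathbf{E}[1/\lambda(C_0)] \downarrow 0, \quad h \downarrow 0. \]
The following theorem yields a strong limit motivation of (7.3): put $\Pi_0 = t = 0$.
Theorem 7.3. Suppose the conditional distribution of $S$ given $(N^\circ, Z^\circ)$ is
uniform on $C_0^\circ$ under $\mathbf{P}$. Then, for each $A \in \mathcal{M} \otimes \mathcal{H}$ and $s \in \mathbb{R}^d$,
\[ \mathbf{P}((N^\circ, Z^\circ) \in A \mid S = s) = \mathbf{P}^\circ((N^\circ, Z^\circ) \in A \mid s \in C_0^\circ), \tag{7.4} \]
and thus there is a version of $\mathbf{P}(\theta_{\Pi_0}(N, Z) \in A \mid \Pi_0 = \cdot)$ such that
\[ \mathbf{P}(\theta_{\Pi_0}(N, Z) \in A \mid \Pi_0 = t) \to \mathbf{P}^\circ((N^\circ, Z^\circ) \in A), \quad |t| \downarrow 0. \]
Proof. With $f \in \mathcal{M} \otimes \mathcal{H}^+$ put
\[ g_f(s) = \mathbf{E}^\circ[f(N^\circ, Z^\circ) \mid s \in C_0^\circ], \quad s \in \mathbb{R}^d. \]
Let $h \in \mathcal{B}^d$ be a nonnegative function and apply (7.2) for the first step in
\[ \mathbf{E}[h(S) g_f(S)] = \int h(s)\, \mathbf{E}^\circ[f(N^\circ, Z^\circ) \mid s \in C_0^\circ]\, \mathbf{P}^\circ(s \in C_0^\circ)/\mathbf{E}^\circ[\lambda(C_0)]\, ds \]
\[ = \mathbf{E}^\circ\Bigl[f(N^\circ, Z^\circ) \int h(s) 1_{\{s \in C_0^\circ\}}\, ds\Bigr] \Big/ \mathbf{E}^\circ[\lambda(C_0)] \]
\[ = \mathbf{E}\Bigl[f(N^\circ, Z^\circ) \int h(s) 1_{\{s \in C_0^\circ\}}\, ds \big/ \lambda(C_0)\Bigr]. \]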
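Apply the conditional uniformity of $S$ to obtain (7.4) in the following form:
\[ \mathbf{E}[h(S) g_f(S)] = \mathbf{E}[h(S) f(N^\circ, Z^\circ)], \quad f \in \mathcal{M} \otimes \mathcal{H}^+,\ h \in \mathcal{B}^d_+. \]
The limit claim now follows by noting that $\Pi_0 = -S$ and that $1_{\{-t \in C_0^\circ\}} \to 1$
pointwise as $|t| \downarrow 0$. □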
7.4 Application — Perfect Simulation
The two-step duality construction in Theorem 7.1 yields perfect solutions
of the problem of simulating a point-stationary (N°,Z°), or a stationary
(N,Z), when it is known how to generate the dual. The arguments are
analogous to those in Section 6 of Chapter 8, and thus we only state the
results.
Suppose (cf. Section 6.3 in Chapter 8) we wish to generate the stationary
(N,Z) when it is known how to generate its point-stationary dual. Here is
a solution when there is a known constant $a < \infty$ such that
\[ \mathbf{P}^\circ(\lambda(C_0) \le a) = 1. \tag{7.5} \]
Recursively, for $n \ge 1$:
1. Generate $(N^{(n)}, Z^{(n)})$ with distribution $\mathbf{P}^\circ((N^\circ, Z^\circ) \in \cdot)$ until $\lambda(C_0^{(n)})$
has been realized.
2. Generate an independent $U^{(n)}$ uniformly distributed on $(0,1)$.
3. Repeat steps 1 and 2 independently for $n \ge 1$ until $\{U^{(n)} \le \lambda(C_0^{(n)})/a\}$
occurs and put
\[ K = \inf\{n \ge 1 : U^{(n)} \le \lambda(C_0^{(n)})/a\}. \]
4. Now generate as much of $(N^{(K)}, Z^{(K)})$ as desired.
5. Generate a random site $V$ uniformly distributed on $C_0^{(K)}$.
Then
$\theta_V(N^{(K)}, Z^{(K)})$ is a perfect copy of the stationary $(N, Z)$,
and the expected number of acceptance-rejection trials is $a/\mathbf{E}^\circ[\lambda(C_0)]$.
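Steps 1 through 5 can be sketched as follows for a toy model on the line. The model is an assumption made only for this illustration: the point-stationary process has independent gaps between neighbouring points that are uniform on $(0.5, 1.5)$, so the cell around the point at the origin has length at most $a = 1.5$ and mean length $1$. Only the two gaps adjacent to the origin are generated, and the function names are hypothetical.

```python
import random

A_BOUND = 1.5   # a in (7.5): valid because both adjacent gaps lie in (0.5, 1.5)

def sample_gaps(rng):
    """Step 1 (toy model): the gaps to the left and right of the point at 0."""
    return rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)

def perfect_stationary_origin(rng):
    """Run the acceptance-rejection scheme and return
    (position of the accepted point seen from the new origin, gaps, trials)."""
    trials = 0
    while True:
        trials += 1
        left_gap, right_gap = sample_gaps(rng)        # step 1
        cell_len = (left_gap + right_gap) / 2.0       # lambda(C_0^(n)) in d = 1
        if rng.random() <= cell_len / A_BOUND:        # steps 2 and 3
            break
    # Step 4 (generating more of the accepted process) is omitted in this sketch.
    # Step 5: V uniform on the accepted cell [-left_gap/2, right_gap/2].
    v = rng.uniform(-left_gap / 2.0, right_gap / 2.0)
    # theta_V moves the origin to V, so the point formerly at 0 is seen at -V.
    return -v, (left_gap, right_gap), trials
```

In this toy model the expected number of trials is $a/\mathbf{E}^\circ[\lambda(C_0)] = 1.5$, which a Monte Carlo run of `perfect_stationary_origin` should reproduce approximately.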
Apply (cf. Section 6.7 in Chapter 8) the above method without the
assumption (7.5). Fix $a < \infty$, carry out steps 1 through 5, and denote the
number of acceptance-rejection trials by $K_a$ and the uniform site in $C_0^{(K_a)}$
by $V_a$. Then
$\theta_{V_a}(N^{(K_a)}, Z^{(K_a)})$ is an imperfect copy of the stationary $(N, Z)$
with perfection probability $G(a/G(a))$, where $G$ is the distribution function
\[ G(x) = \mathbf{E}^\circ[x \wedge \lambda(C_0)]/\mathbf{E}^\circ[\lambda(C_0)], \quad 0 \le x < \infty; \]
that is, $\theta_{V_a}(N^{(K_a)}, Z^{(K_a)})$ coincides with a copy of $(N, Z)$ with probability
$G(a/G(a))$.
Suppose (cf. Section 6.5 in Chapter 8) we wish to generate the point-
stationary (N°,Z°) when it is known how to generate its stationary dual.
Here is a solution in the case when there is a known constant $b > 0$ such
that
\[ \mathbf{P}(\lambda(C_0) \ge b) = 1. \tag{7.6} \]
Recursively, for $n \ge 1$:
1. Generate $(N^{(n)}, Z^{(n)})$ with distribution $\mathbf{P}((N^\circ, Z^\circ) \in \cdot)$ until $\lambda(C_0^{(n)})$
has been realized.
2. Generate an independent $U^{(n)}$ uniformly distributed on $(0,1)$.
3. Repeat steps 1 and 2 independently for $n \ge 1$ until $\{U^{(n)} \le b/\lambda(C_0^{(n)})\}$
occurs and put
\[ K = \inf\{n \ge 1 : U^{(n)} \le b/\lambda(C_0^{(n)})\}. \]
4. Now generate as much of $(N^{(K)}, Z^{(K)})$ as desired.
Then
$(N^{(K)}, Z^{(K)})$ is a perfect copy of the point-stationary $(N^\circ, Z^\circ)$,
and the expected number of acceptance-rejection trials is $1/(\mathbf{E}[1/\lambda(C_0)]\, b)$.
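The debiasing scheme above can be illustrated with a deliberately small toy model, in which only the cell volume is tracked. The model is an assumption of this sketch: under $\mathbf{P}^\circ$ the volume $\lambda(C_0)$ is $1$ or $2$ with probability $1/2$ each, so under the volume-biased $\mathbf{P}$ it is $1$ with probability $1/3$ and $2$ with probability $2/3$, and $b = 1$ satisfies (7.6). The function names are hypothetical.

```python
import random

def sample_stationary_cell_volume(rng):
    """Toy stand-in for step 1: draw lambda(C_0) under the stationary P,
    i.e. volume 1 with probability 1/3 and volume 2 with probability 2/3."""
    return 1.0 if rng.random() < 1.0 / 3.0 else 2.0

def point_stationary_cell_volume(rng, b=1.0):
    """Steps 1-3: accept a stationary draw with probability b / lambda(C_0).
    Returns the accepted volume and the number of trials used."""
    trials = 0
    while True:
        trials += 1
        volume = sample_stationary_cell_volume(rng)
        if rng.random() <= b / volume:
            return volume, trials
```

In this model the accepted volumes should be $1$ and $2$ about equally often (the $\mathbf{P}^\circ$ distribution), and the mean number of trials should be near $1/(\mathbf{E}[1/\lambda(C_0)]\, b) = 1.5$.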
Applying this method without the assumption (7.6) yields an imperfect
solution with perfection probability $R(1/(bR(1/b)))$, where $R$ is the
distribution function defined by
\[ R(x) = \mathbf{E}[x \wedge (1/\lambda(C_0))]/\mathbf{E}[1/\lambda(C_0)], \quad 0 \le x < \infty. \]
Finally (cf. Section 6.6 in Chapter 8), the problem of generating the
stationary (N, Z), when it is known how to generate its point-stationary dual,
can be reduced to that of generating the location of the stationary origin
seen from the closest point, that is, to the problem of generating a random
site W with density
\[ \mathbf{P}^\circ(s \in C_0^\circ)/\mathbf{E}^\circ[\lambda(C_0)], \quad s \in \mathbb{R}^d, \quad [\text{see Theorem 7.2}]. \]
Proceed as follows:
1. Generate $W$ with the density $\mathbf{P}^\circ(s \in C_0^\circ)/\mathbf{E}^\circ[\lambda(C_0)]$, $s \in \mathbb{R}^d$.
2. Generate $(N^{(n)}, Z^{(n)})$ with distribution $\mathbf{P}^\circ((N^\circ, Z^\circ) \in \cdot)$ until $C_0^{(n)}$
has been realized.
3. Repeat step 2 independently for $n \ge 1$ until $\{W \in C_0^{(n)}\}$ occurs and
put
\[ K = \inf\{n \ge 1 : W \in C_0^{(n)}\}. \]
Then
$\theta_W(N^{(K)}, Z^{(K)})$ is a perfect copy of the stationary $(N, Z)$,
and the expected number of acceptance-rejection trials is infinite.
8 The Randomized-Origin Duality
We now extend the randomized-origin duality from d = 1 to d > 1. This
duality is obtained in the same way as the point-at-zero duality except
that we condition on the invariant $\sigma$-algebra before biasing. It has (as in
Chapter 8) the following informal randomized-origin interpretation:
The point-stationary dual behaves like the stationary process
with origin shifted to a uniformly chosen point; (8.1)
and conversely:
The stationary dual behaves like the point-stationary process
with origin shifted to a site chosen uniformly in $\mathbb{R}^d$. (8.1°)
These interpretations are informal because there is neither a uniform
distribution on a countable set of points nor on Rd. We start by establishing
the duality and then motivate the interpretations.
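8.1 Stationarity $\Leftrightarrow$ Point-Stationarity
Again let $(N, Z)$ and $((N^\circ, Z^\circ), S)$ be defined on some measurable space
$(\Omega, \mathcal{F})$ and linked as in Section 4.2, namely,
\[ (N^\circ, Z^\circ) = \theta_{\Pi_0}(N, Z) \quad \text{and} \quad S = -\Pi_0, \]
or equivalently,
\[ (N, Z) = \theta_S(N^\circ, Z^\circ). \]
Recall that $C_0$ and $C_0^\circ$, respectively, are the Voronoi cells of $(N, Z)$ and
$(N^\circ, Z^\circ)$ containing the origin and that they have the same Lebesgue
measure, $\lambda(C_0^\circ) = \lambda(C_0)$. Recall from Section 6.1 that $(N, Z)$ and $(N^\circ, Z^\circ)$
have the same invariant $\sigma$-algebra, namely
\[ \mathcal{J} = (N, Z)^{-1}\mathcal{I} = (N^\circ, Z^\circ)^{-1}\mathcal{I}, \]
where
\[ \mathcal{I} = \{B \in \mathcal{M} \otimes \mathcal{H} : \theta_t^{-1}B = B,\ t \in \mathbb{R}^d\}. \]
Let $\mathbf{P}$ and $\mathbf{P}^\circ$ be two probability measures on $(\Omega, \mathcal{F})$ satisfying
\[ \mathbf{E}[1/\lambda(C_0) \mid \mathcal{J}] < \infty \quad \text{a.s. } \mathbf{P} \]
and
\[ d\mathbf{P}^\circ = \frac{1}{\lambda(C_0)\, \mathbf{E}[1/\lambda(C_0) \mid \mathcal{J}]}\, d\mathbf{P} \quad \text{(volume-debiasing } \mathbf{P} \text{ given } \mathcal{J}\text{)}, \]
or equivalently,
\[ \mathbf{E}^\circ[\lambda(C_0) \mid \mathcal{J}] < \infty \quad \text{a.s. } \mathbf{P}^\circ \]
and
\[ d\mathbf{P} = \frac{\lambda(C_0)}{\mathbf{E}^\circ[\lambda(C_0) \mid \mathcal{J}]}\, d\mathbf{P}^\circ \quad \text{(volume-biasing } \mathbf{P}^\circ \text{ given } \mathcal{J}\text{)}. \]
Note that (see the proof of (8.3a) in Chapter 8)
\[ \mathbf{P} = \mathbf{P}^\circ \quad \text{on } \mathcal{J}, \tag{8.2} \]
which we can write as
\[ \mathbf{P}((N, Z) \in \cdot) = \mathbf{P}^\circ((N^\circ, Z^\circ) \in \cdot) \quad \text{on } \mathcal{I}. \tag{8.3} \]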
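Note also that (see the proof of (8.3b) in Chapter 8), for nonnegative random variables $X$,
\[ \mathbf{E}[X \mid \mathcal{J}] = \mathbf{E}^\circ[X \lambda(C_0) \mid \mathcal{J}] / \mathbf{E}^\circ[\lambda(C_0) \mid \mathcal{J}] \tag{8.4} \]
and, in particular,
\[ \mathbf{E}^\circ[\lambda(C_0) \mid \mathcal{J}] = 1/\mathbf{E}[1/\lambda(C_0) \mid \mathcal{J}]. \tag{8.5} \]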
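Theorem 8.1. With $(N, Z)$, $((N^\circ, Z^\circ), S)$, $\mathbf{P}$, and $\mathbf{P}^\circ$ as above it holds
that
$(N, Z)$ is stationary under $\mathbf{P}$
if and only if
$(N^\circ, Z^\circ)$ is point-stationary under $\mathbf{P}^\circ$
and $S$ is conditionally uniform on $C_0^\circ$ given $(N^\circ, Z^\circ)$.
Proof. By Theorem 6.2(b), $(N, Z)$ is stationary under $\mathbf{P}$ if and only if
\[ \mathbf{E}[f(\theta_t(N, Z)) \mid \mathcal{J}] = \mathbf{E}[f(N, Z) \mid \mathcal{J}], \quad f \in \mathcal{M} \otimes \mathcal{H}^+,\ t \in \mathbb{R}^d. \]
By (8.4), this is equivalent to
\[ \mathbf{E}^\circ[f(\theta_t(N, Z)) \lambda(C_0) \mid \mathcal{J}] = \mathbf{E}^\circ[f(N, Z) \lambda(C_0) \mid \mathcal{J}], \quad f \in \mathcal{M} \otimes \mathcal{H}^+,\ t \in \mathbb{R}^d. \]
By Theorem 6.2(c), this is equivalent to
\[ \mathbf{E}^\circ[f(\theta_t(N, Z)) \lambda(C_0)] = \mathbf{E}^\circ[f(N, Z) \lambda(C_0)], \quad f \in \mathcal{M} \otimes \mathcal{H}^+,\ t \in \mathbb{R}^d. \]
A reference to Theorem 4.1 completes the proof. □
The randomized-origin interpretations (8.1) and (8.1°) of the duality
established in Theorem 8.1 can now be formulated as follows:
\[ \mathbf{P}(\theta_{\text{uniform point of } N}(N, Z) \in \cdot) = \mathbf{P}^\circ((N^\circ, Z^\circ) \in \cdot), \tag{8.6} \]
and conversely,
\[ \mathbf{P}^\circ(\theta_{\text{uniform site in } \mathbb{R}^d}(N^\circ, Z^\circ) \in \cdot) = \mathbf{P}((N, Z) \in \cdot). \tag{8.6°} \]
This does not have an immediate meaning, because such uniform sites and
points do not exist. Below we motivate (8.6) and (8.6°) by shift-coupling
and by Cesàro limit results. These results rely on the coupling equivalences
in Section 7.4 of Chapter 7.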
8.2 Shift-Coupling the Stationary and Point-Stationary Duals
According to the following theorem the stationary and point-stationary
duals in the duality established in Theorem 8.1 are really the same, only
seen from different sites.
Theorem 8.2. Suppose the equivalent statements in Theorem 8.1 hold.
Then the probability space $(\Omega, \mathcal{F}, P)$ can be extended to support a random
point $\Pi$ of $N$ such that
$$P(\theta_\Pi(N, Z) \in \cdot) = P^\circ((N^\circ, Z^\circ) \in \cdot). \tag{8.7}$$
Conversely, the probability space $(\Omega, \mathcal{F}, P^\circ)$ can be extended to support a
random site $T$ such that
$$P^\circ(\theta_T(N^\circ, Z^\circ) \in \cdot) = P((N, Z) \in \cdot). \tag{8.7$^\circ$}$$
Proof. We shall first establish (8.7°). Apply the results in Section 7.4
of Chapter 7 with $Y$ having the distribution $P^\circ((N^\circ, Z^\circ) \in \cdot)$ and $Y'$ the
distribution $P((N, Z) \in \cdot)$ and with $\{\theta_t : t \in \mathbb{R}^d\}$ the transformation group.
Then (8.3) above is the condition (c) in Section 7.4 of Chapter 7, which is
equivalent to (a') in that subsection, which is (8.7°).
In order to establish (8.7) use the transfer extension in Section 4.5 of
Chapter 3: use (8.7°) to extend $(\Omega, \mathcal{F}, P)$ to obtain a random point $\Pi$ of
$N$ such that
$$P(((N, Z), \Pi) \in \cdot) = P^\circ((\theta_T(N^\circ, Z^\circ), -T) \in \cdot).$$
This implies
$$P(\theta_\Pi(N, Z) \in \cdot) = P^\circ(\theta_{-T}\theta_T(N^\circ, Z^\circ) \in \cdot),$$
that is, (8.7) holds. □
8.3 Cesaro Total Variation Motivation of (8.6°)
The next theorem gives a Cesaro total variation meaning to the
randomized-origin interpretation (8.6°).
Theorem 8.3. Suppose the equivalent statements in Theorem 8.1 hold.
Let $B_h \in \mathcal{B}^d$, $0 < h < \infty$, be Følner averaging sets, that is, a family of sets
satisfying $0 < \lambda(B_h) < \infty$ and, for all $t \in \mathbb{R}^d$,
$$\lambda(B_h \cap (t + B_h))/\lambda(B_h) \to 1, \quad h \to \infty$$
(this holds, for instance, when $B_h = hB_1$; see Theorem 2.1 in Chapter 7).
Let $g \in \mathcal{I}$ be strictly positive and finite and put
$$G = g(N^\circ, Z^\circ).$$
Let $U_h$ be uniform on $B_h$ and independent of $(N^\circ, Z^\circ)$ under $P^\circ$. Then
$$P^\circ(\theta_{GU_h}(N^\circ, Z^\circ) \in \cdot) \to P((N, Z) \in \cdot)$$
in total variation as $h \to \infty$.
Proof. Apply the results in Section 7.4 of Chapter 7 with $Y$ having the
distribution $P^\circ((N^\circ, Z^\circ) \in \cdot)$ and $Y'$ the distribution $P((N, Z) \in \cdot)$ and
with $\{\theta_t : t \in \mathbb{R}^d\}$ the transformation group. Then (8.3) above is the
condition (c) in Section 7.4 of Chapter 7, and we obtain the desired limit
result from the final display in that subsection. □
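The Følner condition in Theorem 8.3 can be checked by hand for the simplest family of sets. The following is a minimal Python sketch, assuming the cubes $B_h = [0, h)^d$ (an illustrative choice, not taken from the text; the function name `folner_ratio` is likewise hypothetical). For these cubes the overlap with a translate factorizes over coordinates, so the ratio $\lambda(B_h \cap (t + B_h))/\lambda(B_h)$ has a closed form.

```python
from math import prod


def folner_ratio(h, t):
    """Return lambda(B_h ∩ (t + B_h)) / lambda(B_h) for the cube B_h = [0, h)^d.

    `t` is a tuple of d real coordinates; the cube family is an assumption
    made for illustration only.
    """
    # In coordinate i, the intervals [0, h) and [t_i, t_i + h) overlap
    # in a segment of length max(0, h - |t_i|).
    overlap = prod(max(0.0, h - abs(ti)) for ti in t)
    return overlap / h ** len(t)


if __name__ == "__main__":
    shift = (1.0, -2.0, 0.5)
    for h in (2.0, 10.0, 100.0, 1000.0):
        # The ratio increases towards 1 as h grows, for any fixed shift.
        print(h, folner_ratio(h, shift))
```

For a fixed shift the ratio tends to 1 as $h$ grows, which is the property the theorem requires of the averaging sets.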
8.4 Cesaro Total Variation Motivation of (8.6)
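The next theorem gives a Cesaro total variation meaning to the
randomized-origin interpretation (8.6).
Theorem 8.4. Suppose the equivalent statements in Theorem 8.1 hold.
Let $B_h$, $0 < h < \infty$, and $G$ be as in Theorem 8.3. Let each $B_h$ be bounded.
Let $T_h$ be a random site such that under $P$ the conditional distribution of
$T_h$ given $(N, Z)$ is uniform on $s(N) \cap GB_h$ when $s(N) \cap GB_h \ne \emptyset$ and
$T_h = 0$ when $s(N) \cap GB_h = \emptyset$. Then
$$P(\theta_{T_h}(N, Z) \in \cdot) \to P^\circ((N^\circ, Z^\circ) \in \cdot)$$
in total variation as $h \to \infty$.
Proof. Let $U_h$ be as in Theorem 8.3. Let $\Pi_h^\circ$ be a random point of $N^\circ$
such that the conditional distribution of $\Pi_h^\circ$ given $((N^\circ, Z^\circ), U_h)$ is uniform
on $s(N^\circ) \cap G(B_h - U_h)$ under $P^\circ$. Apply Theorem 6.1 to obtain
$$P^\circ(\theta_{\Pi_h}\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot) = P^\circ((N^\circ, Z^\circ) \in \cdot), \tag{8.8}$$
where
$$\Pi_h = \Pi_h^\circ + GU_h.$$
Note that the conditional distribution of $\Pi_h$ given $((N^\circ, Z^\circ), U_h)$ is uniform
on $s(\theta_{-GU_h}N^\circ) \cap GB_h$ under $P^\circ$. By Lemma 6.1(b), $G = g(\theta_{-GU_h}(N^\circ, Z^\circ))$,
and thus the conditional distribution of $\Pi_h$ given $\theta_{-GU_h}(N^\circ, Z^\circ)$ is uniform
on $s(\theta_{-GU_h}N^\circ) \cap GB_h$ under $P^\circ$. It follows that, $P^\circ(\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot)$
almost surely, we have
$$P(T_h \in \cdot \mid (N, Z) = \cdot) = P^\circ(\Pi_h \in \cdot \mid \theta_{-GU_h}(N^\circ, Z^\circ) = \cdot).$$
This, and (3.3) in Lemma 3.1 of Chapter 6, yields
$$\|P(((N, Z), T_h) \in \cdot) - P^\circ((\theta_{-GU_h}(N^\circ, Z^\circ), \Pi_h) \in \cdot)\| = \|P((N, Z) \in \cdot) - P^\circ(\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot)\|, \tag{8.9}$$
where $\|\cdot\|$ denotes the total variation norm. Due to (3.2) in Lemma 3.1 of
Chapter 6,
$$\|P(\theta_{T_h}(N, Z) \in \cdot) - P^\circ(\theta_{\Pi_h}\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot)\| \le \|P(((N, Z), T_h) \in \cdot) - P^\circ((\theta_{-GU_h}(N^\circ, Z^\circ), \Pi_h) \in \cdot)\|. \tag{8.10}$$
Thus
$$\begin{aligned}
\|P(\theta_{T_h}(N, Z) \in \cdot) - P^\circ((N^\circ, Z^\circ) \in \cdot)\|
&= \|P(\theta_{T_h}(N, Z) \in \cdot) - P^\circ(\theta_{\Pi_h}\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot)\| && \text{[by (8.8)]} \\
&\le \|P(((N, Z), T_h) \in \cdot) - P^\circ((\theta_{-GU_h}(N^\circ, Z^\circ), \Pi_h) \in \cdot)\| && \text{[by (8.10)]} \\
&= \|P((N, Z) \in \cdot) - P^\circ(\theta_{-GU_h}(N^\circ, Z^\circ) \in \cdot)\|. && \text{[by (8.9)]}
\end{aligned}$$
Apply Theorem 8.3 to obtain the desired limit result. □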
8.5 Another Cesaro Total Variation Motivation of (8.6)
The reader may have noted that Theorem 8.4 is not an immediate
counterpart of (9.2) in Theorem 9.1 of Chapter 8: we average over a random
number of points in a set and not over a deterministic number of points.
In order to average over a deterministic number of points we need the
following fact from ergodic theory.
Fact 8.1. Suppose $(N, Z)$ is stationary under $P$ and $E[N([0,1)^d)] < \infty$.
Let $B_n \in \mathcal{B}^d$, $1 \le n < \infty$, be convex and compact and increase to $\mathbb{R}^d$ as
$n \to \infty$. Then
$$N(B_n)/\lambda(B_n) \to E[N([0,1)^d) \mid \mathcal{J}] \quad \text{a.s. } P,\ n \to \infty.$$
For a proof, see Daley and Vere-Jones (1988), Proposition 10.2.II and
Theorem 10.2.IV. The proof relies on the ergodic theory of Tempel'man (1972).
We also need the following natural lemma (natural because a $G \in \mathcal{J}_+$
should behave as a constant when $\mathcal{J}$ is given).
Lemma 8.1. Suppose the equivalent statements in Theorem 8.1 hold. Let
$G \in \mathcal{J}_+$, that is, let $G = g(N, Z)$ for some $g \in \mathcal{I}_+$. Then under $P$ the
point process $N(G\,\cdot)$ is stationary and has conditional intensity
$$E[N(G[0,1)^d) \mid \mathcal{J}] = G^d E[N([0,1)^d) \mid \mathcal{J}] \quad \text{a.s. } P. \tag{8.11}$$
If further $G = E[N([0,1)^d) \mid \mathcal{J}]^{-1/d}$, then $E[N(G[0,1)^d) \mid \mathcal{J}] = 1$ a.s. $P$.
Proof. Stationarity of $N(G\,\cdot)$ follows from the stationarity of $(N, Z)$ and
Lemma 6.1(b). In order to establish (8.11), note that for all $k > 0$, all
$a_1, \ldots, a_k > 0$, and all disjoint $A_1, \ldots, A_k \in \mathcal{F}$ we have
$$N((a_1 1_{A_1} + \cdots + a_k 1_{A_k})[0,1)^d) = 1_{A_1} N(a_1[0,1)^d) + \cdots + 1_{A_k} N(a_k[0,1)^d)$$
and that by Theorem 6.2(b), $N$ is stationary conditionally on $\mathcal{J}$, which
yields
$$E[N(a[0,1)^d) \mid \mathcal{J}] = a^d E[N([0,1)^d) \mid \mathcal{J}] \quad \text{a.s. } P \text{ for all } a > 0.$$
Thus (8.11) holds for all simple functions $G \in \mathcal{J}_+$. Any bounded $G \in \mathcal{J}_+$
can be approximated both from below and above by simple functions in
$\mathcal{J}_+$, and thus, for bounded $G \in \mathcal{J}_+$, we have that (8.11) holds with $=$
replaced by both $\le$ and $\ge$. Thus (8.11) holds for bounded $G \in \mathcal{J}_+$. Now,
for any $G \in \mathcal{J}_+$,
$$N(G[0,1)^d) = \sum_{i=0}^{\infty} N(G 1_{\{i \le G < i+1\}}[0,1)^d),$$
and since (8.11) holds for each $G 1_{\{i \le G < i+1\}}$, it holds for $G$. The final claim
follows from (8.11) by noting that due to Theorem 6.2(e), $E[N([0,1)^d) \mid \mathcal{J}] =
E[1/\lambda(C_0) \mid \mathcal{J}]$ and that by assumption, $E[1/\lambda(C_0) \mid \mathcal{J}] < \infty$. □
Here finally is the other Cesaro total variation motivation of (8.6).
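Theorem 8.5. Suppose the equivalent statements in Theorem 8.1 hold. Let
$B_h$, $0 < h < \infty$, be as in Theorem 8.3. Let the $B_h$ be convex and compact
and increase continuously from $\{0\}$ to $\mathbb{R}^d$ as $h$ increases from 0 to $\infty$. Let
$\Pi_n$, $n \ge 0$, be the points of $N$ enumerated in the order they are hit by $B_h$
as $h$ increases and lexicographically if two or more are hit simultaneously.
Let $U$ be uniform on $[0, 1)$ and independent of $(N, Z)$ under $P$. Then
$$P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) \to P^\circ((N^\circ, Z^\circ) \in \cdot)$$
in total variation as $n \to \infty$.
Proof. Put $G = E[N([0,1)^d) \mid \mathcal{J}]^{-1/d}$. By Lemma 8.1, $N(G\,\cdot)$ is stationary
and
$$E[N(G[0,1)^d) \mid \mathcal{J}] = 1 \quad \text{a.s. } P.$$
Let $h(n)$ be such that $\lambda(B_{h(n)}) = n$ and apply Fact 8.1 to $N(G\,\cdot)$ to obtain
$$N(GB_{h(n)})/n \to 1 \quad \text{a.s. } P,\ n \to \infty. \tag{8.12}$$
Let $T_h$ be as in Theorem 8.4, interpret
$$\frac{1}{0}\sum_{k=1}^{0} 1_{\{\theta_{\Pi_{k-1}}(N, Z) \in \cdot\}} = 1_{\{(N, Z) \in \cdot\}},$$
and note that for $n, m \ge 1$ and $0 \le a_1 \le 1$, $0 \le a_2 \le 1$, ...,
$$\frac{1}{n}\sum_{k=1}^{n} a_k - \frac{1}{m}\sum_{k=1}^{m} a_k \le
\begin{cases}
n\left(\dfrac{1}{n} - \dfrac{1}{m}\right) = 1 - n/m, & n \le m, \\[1ex]
(n - m)\dfrac{1}{n} = 1 - m/n, & m < n,
\end{cases}$$
to obtain
$$P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) - P(\theta_{T_{h(n)}}(N, Z) \in \cdot)
\le E\left[1 - \frac{n \wedge N(GB_{h(n)})}{n \vee N(GB_{h(n)})}\right]
= 1 - E\left[\frac{n \wedge N(GB_{h(n)})}{n \vee N(GB_{h(n)})}\right].$$
Thus (see the second identity in (8.11) of Chapter 3)
$$\|P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) - P(\theta_{T_{h(n)}}(N, Z) \in \cdot)\|
\le 2 - 2E\left[\frac{n \wedge N(GB_{h(n)})}{n \vee N(GB_{h(n)})}\right].$$
By (8.12) and bounded convergence the expectation goes to 1 as $n \to \infty$,
and thus
$$\|P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) - P(\theta_{T_{h(n)}}(N, Z) \in \cdot)\| \to 0, \quad n \to \infty.$$
This, together with Theorem 8.4, yields the final step in
$$\begin{aligned}
\|P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) - P^\circ((N^\circ, Z^\circ) \in \cdot)\|
&\le \|P(\theta_{\Pi_{[Un]}}(N, Z) \in \cdot) - P(\theta_{T_{h(n)}}(N, Z) \in \cdot)\| \\
&\quad + \|P(\theta_{T_{h(n)}}(N, Z) \in \cdot) - P^\circ((N^\circ, Z^\circ) \in \cdot)\| \\
&\to 0, \quad n \to \infty,
\end{aligned}$$
and the proof is complete. □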
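The elementary averaging bound used in the proof above, namely that for numbers $a_k \in [0, 1]$ the difference of the averages over the first $n$ and the first $m$ terms is at most $1 - (n \wedge m)/(n \vee m)$, can be sanity-checked numerically. The following Python sketch does this with exact rational arithmetic on randomly generated sequences; the function names and the random test design are illustrative assumptions, and a passing check is of course not a proof.

```python
import random
from fractions import Fraction


def average_gap(a, n, m):
    """Exact value of (1/n) * sum(a[:n]) - (1/m) * sum(a[:m])."""
    return sum(a[:n], Fraction(0)) / n - sum(a[:m], Fraction(0)) / m


def bound(n, m):
    """The bound 1 - min(n, m) / max(n, m) from the proof."""
    return 1 - Fraction(min(n, m), max(n, m))


def check_bound(trials=1000, max_len=12, seed=0):
    """Return True if the bound holds on `trials` random sequences in [0, 1]."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, max_len)
        m = rng.randint(1, max_len)
        a = [Fraction(rng.randint(0, 10), 10) for _ in range(max_len)]
        if average_gap(a, n, m) > bound(n, m):
            return False
    return True
```

The bound is attained, for instance, when $n \le m$, $a_k = 1$ for $k \le n$, and $a_k = 0$ for $n < k \le m$.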
9 Comments
We conclude this chapter with comments on the two Palm dualities and
on a possible extension of the point-stationarity concept to more general
random phenomena.
9.1 When Do the Two Palm Dualities Coincide?
Under what conditions is it true that standing at the origin of a stationary
point pattern, and happening to find a point there, is equivalent to standing
at a point selected uniformly at random from the point pattern? That
is, when do the two Palm dualities coincide? We now specify the exact
condition.
Let $(N, Z)$ be stationary under a probability measure $P$ with finite
intensity:
$$E[N([0,1)^d)] = E[1/\lambda(C_0)] < \infty.$$
Then the two point-stationary duals of $(N, Z)$ coincide if and only if
$$E[1/\lambda(C_0) \mid \mathcal{J}] = E[1/\lambda(C_0)] \quad \text{a.s. } P.$$
In particular, this holds in the ergodic case, that is, when $P = 0$ or $1$ on $\mathcal{J}$.
Conversely, let $(N^\circ, Z^\circ)$ be point-stationary under a probability measure
$P^\circ$ with finite Voronoi cell volume:
$$E^\circ[\lambda(C_0)] < \infty.$$
Then the two stationary duals of $(N^\circ, Z^\circ)$ coincide if and only if
$$E^\circ[\lambda(C_0) \mid \mathcal{J}] = E^\circ[\lambda(C_0)] \quad \text{a.s. } P^\circ.$$
In particular, this holds in the ergodic case, that is, when $P^\circ = 0$ or $1$ on $\mathcal{J}$.
9.2 Random Site Change Hides the Gap Between the Dualities
As in the one-dimensional case we can make the two Palm dualities
coincide by a simple random site change. Let $(N, Z)$ be stationary under a
probability measure $P$ with $E[1/\lambda(C_0)] < \infty$. Let $Z$ be measurable under
change of site-scale and change the site-scale by
$$G = E[1/\lambda(C_0) \mid \mathcal{J}]^{-1/d} \quad \text{to obtain} \quad (N(G\,\cdot), (Z_{Gs})_{s \in \mathbb{R}^d}).$$
This new pair is stationary, and its two point-stationary Palm duals
coincide. This procedure preserves the randomized-origin duality and not the
point-at-zero duality. In fact, we lose the point-at-zero duality: the
point-at-zero duality merges with the randomized-origin duality by the change of
site-scale and does not reappear when we return to the original site-scale
after change of measure (as the randomized-origin duality does). Thus the
site change is not a way to bridge the gap between the two dualities; it only
hides it. To bridge the gap we cannot avoid a change of measure.
9.3 Extending Point-Stationarity to General Random Sets?
There are other random sets that should look the same from all the points.
An obvious example is the set of times where a Brownian motion takes the
value zero. The solution in the present chapter of the point-stationarity
problem in the case of point processes in d > 1 dimensions (Definition 3.1)
suggests that point-stationarity could be defined in these models as follows.
Proposed definition. A random set is point-stationary if it is
distributionally invariant under bijective point-shifts against any independent
stationary background.
It remains to find such backgrounds and point-shifts.
With this open problem we end the stationarity part of the book.
Chapter 10
REGENERATION
1 Introduction
In this last chapter we finally focus on the third topic of the book,
regeneration. Regenerative processes are generalizations of Markov chains and
renewal processes, which we considered in Chapter 2. We shall look at
several kinds of regeneration, and as in Chapter 2, the aspects we concentrate
on are coupling, stationarity (and its generalizations), and total variation
asymptotics.
In Section 2 we establish notation and then consider briefly the one-sided
counterpart of the two-sided stationarity theory of Chapter 8.
In Section 3 we consider classical regeneration. A stochastic process is
regenerative in the classical sense if there are random times where it starts
anew independently of the past, like a recurrent Markov chain at the times
of visits to a fixed reference state. The regeneration times form a renewal
process and split the stochastic process into a sequence of cycles that are
i.i.d. and independent of a possible initial delay.
In Section 4 we consider wide-sense regeneration. Wide-sense
regeneration allows the future after regeneration to depend on the past as long
as the future is independent of the past regeneration times. This is the
type of regeneration occurring in so-called Harris chains, but for a simple
example consider a recurrent Markov chain and let $l > 0$ be fixed: $l$ time
units after visiting a fixed reference state the chain regenerates in the wide
sense (but typically not in the classical sense). The regeneration times still
form a renewal process, but the cycles are only stationary and need not be
independent.
In Section 5 we move on to consider time-inhomogeneous regeneration.
Time-inhomogeneous regeneration allows the future after regeneration to
depend on the time of regeneration. This is the type of regeneration
occurring in time-inhomogeneous Markov chains with a recurrent state: if such
a Markov chain visits this state at time t, then the future after time t is
independent of the past before time t but has a distribution that depends
on t. In this case the sequence of cycles need no longer be stationary and
the regeneration times need not form a renewal process, they only form an
increasing discrete-time Markov process (note that a renewal process is an
example of an increasing discrete-time Markov process).
Section 6 contains a coupling construction for time-inhomogeneous
regenerative processes. This construction is an elaboration on the classical
coupling (Chapter 2). In Section 7 we investigate the coupling time
thoroughly to obtain results on uniform convergence and rates of convergence.
In Section 8 we introduce asymptotics from-the-past. Ordinary asymptotics
are to-the-future: we start a process at time zero and observe it in the
far future to obtain a stationary limit process. Asymptotics from-the-past is
the reversal of this procedure: we start a process in the remote past and
observe it from any fixed time t onwards to obtain a (typically) nonstationary
limit process. In the time-inhomogeneous case we cannot expect to obtain
a limit process by going to-the-future (unless the time-inhomogeneity
disappears asymptotically), but coming in from-the-past turns out to yield a
limit process, a nonstationary one because of the time-inhomogeneity.
In Section 9 we consider taboo regeneration. Taboo regeneration means
basically that the process regenerates in the classical sense while not
entering a fixed region of the state space (taboo region), like a transient Markov
chain while it stays in a finite irreducible set of states. In this case the
regeneration times form a possibly terminating renewal process. We shall
consider the existence of taboo limits: we start a process at time zero and
observe it in the far future, conditionally on not yet having entered the
taboo region, to obtain a limit process. A taboo regenerative process
becomes a time-inhomogeneous regenerative process under this conditioning,
and thus the limit theory from the time-inhomogeneous case applies.
In Section 10 we consider taboo stationarity, the characterizing property
of a taboo limit process, and work out the structure of the taboo limit
process in the taboo regenerative case. This structure is quite different
from, but analogous to, the structure of the stationary version of a cycle-
stationary process (Chapter 8).
Section 11 rounds off with coupling from-the-past, a perfect simulation
method for generating observations from the stationary, nonstationary,
and taboo stationary (quasi-stationary) limits of finite state space Markov
chains.
As the above description indicates, many themes from the previous
chapters converge in the next two sections, to be further developed and extended
in the remaining sections.
2 Preliminaries — Stationarity
This section lays down the framework to be used in Sections 3 through 6
(and also with certain modifications in Sections 7 through 9) and then
considers briefly the one-sided counterpart of the two-sided stationarity
theory in Chapter 8.
2.1 The One-Sided Process and Points
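Let $(\Omega, \mathcal{F}, P)$ be a probability space supporting
$$Z = (Z_s)_{s \in [0, \infty)} \quad \text{and} \quad S = (S_k)_0^\infty,$$
where $Z$ is a one-sided continuous-time stochastic process with a general
state space $(E, \mathcal{E})$ and path space $(H, \mathcal{H})$ and $S$ is a one-sided sequence of
random times (points) satisfying
$$0 \le S_0 < S_1 < \cdots \to \infty.$$
Regard $S$ as a measurable mapping from $(\Omega, \mathcal{F})$ to the sequence space
$(L, \mathcal{L})$, where
$$L = \{(s_k)_0^\infty \in [0, \infty)^{\{0, 1, \ldots\}} : s_0 < s_1 < \cdots \to \infty\}$$
and $\mathcal{L}$ are the Borel subsets of $L$, that is,
$$\mathcal{L} = L \cap \mathcal{B}^{\{0, 1, \ldots\}}.$$
Thus the pair $(Z, S)$ is a measurable mapping from $(\Omega, \mathcal{F})$ to $(H \times L, \mathcal{H} \otimes \mathcal{L})$.
Let $\mathcal{H} \otimes \mathcal{L}_+$ denote the class of all measurable functions from $(H \times L, \mathcal{H} \otimes \mathcal{L})$
to $([0, \infty), \mathcal{B}[0, \infty))$.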
We shall not assume any functional connection between Z and S. At one
extreme, Z and S could be independent. At another extreme, S could be
determined by Z: for instance, S could be the times when Z enters a given
state or set. At a third extreme, Z could be determined by S, as is the case
if Z is one of the following processes.
Let $S_{-1}$ be a strictly negative random variable and put, for $t \in [0, \infty)$,
$N_t = \inf\{n \ge 0 : S_n > t\}$    number of points in $[0, t]$,
$A_t = t - S_{N_t - 1}$    age at time $t$,
$B_t = S_{N_t} - t$    residual life at time $t$,
$D_t = X_{N_t} = A_t + B_t$    total life at time $t$,
$U_t = A_t/D_t$    relative age at time $t$;
see Figures 8.1 and 9.1 in Chapter 2.
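As a concrete illustration of these five quantities, here is a minimal Python sketch that evaluates them for a finite, strictly increasing list of points. The function name `renewal_quantities`, the list representation of $S$, and the default value used for $S_{-1}$ are assumptions made for the example; the index $N_t$ is computed as the number of points in $[0, t]$.

```python
from bisect import bisect_right


def renewal_quantities(t, points, s_minus_1=-1.0):
    """Return (N_t, A_t, B_t, D_t, U_t) at time t >= 0.

    `points` is the strictly increasing list [S_0, S_1, ...] with S_0 >= 0,
    and `s_minus_1` < 0 plays the role of S_{-1}.
    """
    # Number of points in [0, t], i.e. the first index n with S_n > t.
    n_t = bisect_right(points, t)
    if n_t >= len(points):
        raise ValueError("the list of points must extend beyond time t")
    previous = points[n_t - 1] if n_t >= 1 else s_minus_1
    age = t - previous                 # A_t: time since the last point
    residual = points[n_t] - t         # B_t: time until the next point
    total = age + residual             # D_t: length of the interval covering t
    return n_t, age, residual, total, age / total
```

For example, with points `[0.5, 2.0, 3.5]` and `t = 2.5` the last point at or before `t` is 2.0 and the next one is 3.5, so the age is 0.5, the residual life 1.0, the total life 1.5, and the relative age 1/3.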
2.2 The One-Sided Shift - Shift-Measurability
For $t \in [0, \infty)$, let $\theta_t$ be the shift-map from $H$ to $H$:
$$\theta_t z = (z_{t+s})_{s \in [0, \infty)}.$$
Let $\theta_t$ also denote the joint shift-map from $H \times L$ to $H \times L$:
$$\theta_t(z, (s_k)_0^\infty) = (\theta_t z, (s_{n_{t-}+k} - t)_0^\infty),$$
where $n_{t-} = \inf\{n \ge 0 : s_n \ge t\}$.
Note that $\theta_t$ is a time shift and shifts $(s_k)_0^\infty$ regarded as a sequence of
times, that is, $\theta_t$ shifts $(s_k)_0^\infty$ by subtracting $t$ from the times $s_k$ and only
shifts the index $k$ of $(s_k)_0^\infty$ to observe the convention that the first time is
indexed by 0.
In order to be able to shift at will without measurability
complications, assume that $Z$ is shift-measurable, that is, let the path set $H$ be
invariant under time-shifts and the mapping taking $(z, t) \in H \times [0, \infty)$ to
$z_t \in E$ be $\mathcal{H} \otimes \mathcal{B}[0, \infty)/\mathcal{E}$ measurable. This is equivalent to the mapping
taking $(z, t) \in H \times [0, \infty)$ to $\theta_t z \in H$ being $\mathcal{H} \otimes \mathcal{B}[0, \infty)/\mathcal{H}$ measurable.
Shift-measurability covers, for instance, processes with a Polish state space (in
fact separable metric suffices) and right-continuous paths (left-hand
continuity is not needed). See Section 2 of Chapter 4 for more details.
2.3 Cycles and Cycle-Lengths — Delay and Delay-Length
The random times $S_n$ split $Z$ into a delay
$$D = (Z_s)_{s \in [0, S_0)} \quad \text{(see Figure 2.1)}$$
and a sequence of cycles
$$C_n = (Z_{S_{n-1}+s})_{s \in [0, X_n)}, \quad n \ge 1,$$
where $X_n$ are the cycle-lengths
$$X_n = S_n - S_{n-1}, \quad n \ge 1.$$
[Figure 2.1: a realization of $(Z, S)$ and of $(Z^\circ, S^\circ)$, with the cycles, the points, and the cycle-lengths marked. To make the illustration easier, $Z$ is real-valued with continuous paths and $S$ is the times of visits to 0. The gray axis is at the origin of $(Z^\circ, S^\circ)$.]
FIGURE 2.1. The points $S$ split $Z$ into a delay $D$ and cycles $C_n$.
The delay $D$ and the cycles $C_n$ are stochastic processes vanishing at the
random times $S_0$ and $X_n$, respectively. The easiest way to make sense
of such processes as random elements is to think of them as entering an
absorbing state $\Delta$ when vanishing, where $\Delta$ (the cemetery) is external to
the state space; see Section 2.9 of Chapter 4 for technical details. The
cycle-lengths $X_1, X_2, \ldots$ and the delay-length $S_0$ are all obtained by the same
measurable mapping from their respective cycles $C_1, C_2, \ldots$ and delay $D$.
They are simply the hitting times of the absorbing cemetery state $\Delta$. The
pair $(Z, S)$ is a measurable mapping of the delay and cycles (string them
together), and vice versa.
Say that $(Z, S)$ is zero-delayed if $S_0 = 0$. Define a zero-delayed pair by
$$(Z^\circ, S^\circ) := \theta_{S_0}(Z, S) \quad \text{(see Figure 2.1)}.$$
Thus $S_0^\circ = 0$ and $S_1^\circ = X_1^\circ$, while for $n \ge 1$, $X_n^\circ = X_n$ and $C_n^\circ = C_n$.
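The effect of the shift on the points alone is easy to make concrete. The Python sketch below acts only on a finite list of points (the path $Z$ is left out), taking the index of the first point at or after $t$ as the new index 0; the function names are hypothetical and chosen for this illustration. It shows that passing to the zero-delayed version removes the delay but leaves the cycle-lengths unchanged.

```python
from bisect import bisect_left


def shift_points(t, points):
    """Apply the time shift by t to a strictly increasing list of points.

    Points strictly before t are dropped, t is subtracted from the rest,
    and the first remaining point is re-indexed as index 0.
    """
    first = bisect_left(points, t)     # index of the first point s_n >= t
    return [s - t for s in points[first:]]


def cycle_lengths(points):
    """Return X_n = S_n - S_{n-1}, n >= 1, for a list of points."""
    return [later - earlier for earlier, later in zip(points, points[1:])]


def zero_delayed(points):
    """Shift by the delay-length S_0, so that the first point sits at 0."""
    return shift_points(points[0], points)
```

With `points = [0.5, 2.0, 3.5, 4.0]`, `zero_delayed(points)` is `[0.0, 1.5, 3.0, 3.5]`, and both lists have cycle-lengths `[1.5, 1.5, 0.5]`.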
2.4 Cycle-Stationarity — Stationarity
Call $(Z, S)$ cycle-stationary if the cycles form a stationary sequence, that
is, with $\overset{D}{=}$ denoting identity in distribution:
$$(C_{n+1}, C_{n+2}, \ldots) \overset{D}{=} (C_1, C_2, \ldots), \quad n \ge 0.$$
Cycle-stationarity is equivalent to
$$\theta_{S_n}(Z, S) \overset{D}{=} (Z^\circ, S^\circ), \quad n \ge 0,$$
since $(C_{n+1}, C_{n+2}, \ldots)$ and $\theta_{S_n}(Z, S)$ are measurable mappings of each
other, and since these mappings do not depend on $n$. When $(Z, S)$ is
cycle-stationary, put
$$F(x) = P(X_1 \le x), \quad 0 \le x < \infty,$$
that is, $F$ is the common distribution function of the cycle lengths.
A pair $(Z^*, S^*)$ is stationary if
$$\theta_t(Z^*, S^*) \overset{D}{=} (Z^*, S^*), \quad t \ge 0.$$
We now construct a stationary $(Z^*, S^*)$ from a cycle-stationary $(Z^\circ, S^\circ)$
when $E[X_1] < \infty$. The proof is based on the same idea as in Section 9 of
Chapter 2 and Section 4 of Chapter 8; namely, a stationary version should
be obtained by length-biasing the first cycle of $(Z^\circ, S^\circ)$ and then placing
the time origin at random in that cycle.
Theorem 2.1. Suppose $(Z, S)$ is cycle-stationary with $E[X_1] < \infty$. Let $U$
be uniformly distributed on $[0, 1)$ and independent of $(Z^\circ, S^\circ)$ and let $P^*$
be the probability measure on $(\Omega, \mathcal{F})$ defined by
$$dP^* = \frac{X_1}{E[X_1]}\, dP \quad \text{(length-biasing)}.$$
Let $(Z^*, S^*)$ have the distribution $P^*(\theta_{UX_1}(Z^\circ, S^\circ) \in \cdot)$.
Then $(Z^*, S^*)$ is stationary,
$$E[f(Z^*, S^*)] = E\left[\int_0^{X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right] / E[X_1], \quad f \in \mathcal{H} \otimes \mathcal{L}_+, \tag{2.1}$$
and $S_0^*$ is continuous with distribution function $G_\infty$ defined by
$$G_\infty(x) = \int_0^x P(X_1 > y)\, dy / E[X_1], \quad 0 \le x < \infty,$$
and density $P(X_1 > x)/E[X_1]$, $x \ge 0$.
Comment. In this chapter we return to the convention of the common
probability space (see Section 3.1 in Chapter 3) abandoned for a while
in Chapters 8 and 9. This means that we let all random elements under
consideration [like (Z,S) and (Z*,S*) in the above theorem] be defined
on a single probability space $(\Omega, \mathcal{F}, P)$. However, we sometimes [as in the
above theorem and later in Section 9] establish existence and structure
through a change of measure [by replacing P by P*].
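Proof. The definition of $P^*$ yields the first equality in the following
calculation: for $t \ge 0$ and bounded $f \in \mathcal{H} \otimes \mathcal{L}_+$,
$$\begin{aligned}
E^*[f(\theta_t\theta_{UX_1}(Z^\circ, S^\circ))]
&= E[f(\theta_t\theta_{UX_1}(Z^\circ, S^\circ))X_1]/E[X_1] \\
&= E\left[\int_t^{t+X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right] / E[X_1] \\
&= \left(E\left[\int_t^{X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right] + E\left[\int_{X_1}^{t+X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right]\right) / E[X_1] \\
&= \left(E\left[\int_t^{X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right] + E\left[\int_0^{t} f(\theta_s(Z^\circ, S^\circ))\, ds\right]\right) / E[X_1] \\
&= E\left[\int_0^{X_1} f(\theta_s(Z^\circ, S^\circ))\, ds\right] / E[X_1],
\end{aligned}$$
while the second equality follows from the fact that the conditional
distribution of $t + UX_1$ given $(Z^\circ, S^\circ)$ is uniform on $[t, t + X_1)$, the third
equality holds since $f$ is bounded (note that the integral from $t$ to $X_1$ can
be negative, since $t$ can be greater than $X_1$), and the fourth equality
follows from the fact that $\theta_{X_1}(Z^\circ, S^\circ)$ has the same distribution as $(Z^\circ, S^\circ)$.
The last term does not depend on $t$, and stationarity is established. This
also establishes (2.1) for bounded $f$ and thus, by monotone convergence,
for all $f$. For the distribution of $S_0^*$, see Section 4.4 in Chapter 8. □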
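The construction in Theorem 2.1 (length-bias the first cycle, then place the origin uniformly in it) can be tried out numerically for the delay-length alone. The Python sketch below assumes i.i.d. cycle-lengths with a discrete distribution given by the lists `lengths` and `probs`; these names, the function names, and the example distribution are illustrative assumptions. It compares the distribution function $G_\infty(x) = E[X_1 \wedge x]/E[X_1]$, which equals $\int_0^x P(X_1 > y)\,dy/E[X_1]$, with a Monte Carlo estimate.

```python
import random


def delay_cdf(x, lengths, probs):
    """G_inf(x) = E[min(X_1, x)] / E[X_1] for a discrete cycle-length law."""
    mean = sum(l * p for l, p in zip(lengths, probs))
    return sum(min(l, x) * p for l, p in zip(lengths, probs)) / mean


def sample_stationary_delay(lengths, probs, rng):
    """Draw one delay-length by length-biasing and uniform placement."""
    # Length-biasing: a cycle-length l is chosen with probability
    # proportional to l * P(X_1 = l).
    weights = [l * p for l, p in zip(lengths, probs)]
    x1 = rng.choices(lengths, weights=weights)[0]
    u = rng.random()                   # uniform on [0, 1)
    # The new origin sits at u * x1 inside the cycle; the delay is the
    # remaining time to the end of that cycle (for u > 0).
    return x1 - u * x1


def empirical_delay_cdf(x, lengths, probs, n=100_000, seed=1):
    """Monte Carlo estimate of P(delay <= x) from n independent draws."""
    rng = random.Random(seed)
    hits = sum(sample_stationary_delay(lengths, probs, rng) <= x for _ in range(n))
    return hits / n
```

For cycle-lengths 1 and 3 with probability 1/2 each, `delay_cdf(1.0, [1.0, 3.0], [0.5, 0.5])` is 0.5, and the simulated value should be close to it.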
2.5 Lattice X\ — Periodic Stationarity
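Call $X_1$ lattice with span $d$ if $d > 0$ and
$$P(X_1 \in d\mathbb{Z}) = 1 \quad \text{and} \quad P(X_1 \in a\mathbb{Z}) < 1 \text{ for all } a > d.$$
Call a pair $(Z^{**}, S^{**})$ periodically stationary with period $d$ if $d > 0$ and
$$\theta_{nd}(Z^{**}, S^{**}) \overset{D}{=} (Z^{**}, S^{**}), \quad n \ge 0. \tag{2.2}$$
Recall that $x \bmod d := x - [x/d]d$.
Theorem 2.2. Suppose $(Z, S)$ is cycle-stationary with $E[X_1] < \infty$ and let
$(Z^*, S^*)$ be as in Theorem 2.1. If $X_1$ is lattice with span $d$, then
$$(Z^{**}, S^{**}) := \theta_{S_0^* \bmod d}(Z^*, S^*)$$
is periodically stationary with period $d$,
$$E[f(Z^{**}, S^{**})] = d\, E\left[\sum_{k=1}^{X_1/d} f(\theta_{kd}(Z^\circ, S^\circ))\right] / E[X_1] \tag{2.3}$$
for $f \in \mathcal{H} \otimes \mathcal{L}_+$, and $S_0^{**}$ is $d\mathbb{Z}$ valued with probability mass function
$$P(S_0^{**} = kd) = d\, P(X_1 > kd)/E[X_1], \quad k \ge 0.$$
Proof. Since the cycle-lengths of $(Z^*, S^*)$ are $d\mathbb{Z}$ valued, we have
$$S_0^* \bmod d = B_{nd}^* \bmod d, \quad n \ge 0.$$
Since $(Z^*, S^*)$ is stationary, this implies that
$$(\theta_{nd}(Z^*, S^*), S_0^* \bmod d) \overset{D}{=} ((Z^*, S^*), S_0^* \bmod d), \quad n \ge 0.$$
Since $\theta_{nd}(Z^{**}, S^{**})$ is the same measurable mapping of the left-hand side
as $(Z^{**}, S^{**})$ is of the right-hand side, we obtain (2.2), that is, $(Z^{**}, S^{**})$
is periodically stationary with period $d$.
Apply (2.1), with $f(Z^*, S^*)$ replaced by $f(\theta_{B_0^* \bmod d}(Z^*, S^*))$ and with
$f(\theta_s(Z^\circ, S^\circ))$ replaced by $f(\theta_{B_s^\circ \bmod d}\theta_s(Z^\circ, S^\circ))$, to obtain
$$E[f(\theta_{B_0^* \bmod d}(Z^*, S^*))] = E\left[\int_0^{X_1} f(\theta_{B_s^\circ \bmod d}\theta_s(Z^\circ, S^\circ))\, ds\right] / E[X_1].$$
Since $S_0^* \bmod d = B_0^* \bmod d$, the left-hand side equals the left-hand side of
(2.3), and since $\theta_{B_s^\circ \bmod d}\theta_s(Z^\circ, S^\circ) = \theta_{kd}(Z^\circ, S^\circ)$ for $s \in (kd - d, kd]$, the
right-hand side equals the right-hand side of (2.3). Thus (2.3) holds.
Finally, the density claim in Theorem 2.1 yields the second equality in
$$\begin{aligned}
P(S_0^{**} = kd) &= P(kd \le S_0^* < kd + d) \qquad [S_0^{**} = S_0^* - (S_0^* \bmod d)] \\
&= \int_{kd}^{kd+d} P(X_1 > x)\, dx / E[X_1] = d\, P(X_1 > kd)/E[X_1],
\end{aligned}$$
while the last equality follows from the fact that $X_1$ is $d\mathbb{Z}$ valued. □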
2.6 On Discrete Time
To simplify the presentation we do not treat discrete-time processes
separately in this chapter. This is no restriction, since a discrete-time process
$(Z_k)_0^\infty$ can be embedded into a continuous-time shift-measurable process
$Z = (Z_s)_{s \in [0, \infty)}$ by defining
$$Z_s = Z_{[s]}, \quad s \in [0, \infty).$$
If there is an integer-valued sequence of times $S$ associated with $(Z_k)_0^\infty$,
then the cycle-lengths of $(Z, S)$ will be lattice with an integer span $d \ge 1$.
If $d = 1$, then the cycle lengths of $(Z_k)_0^\infty$ are called aperiodic. In this case
Theorem 2.2 yields a discrete-time pair $((Z_k^{**})_0^\infty, S^{**})$ that is stationary (in
discrete time), that is,
$$\theta_n((Z_k^{**})_0^\infty, S^{**}) \overset{D}{=} ((Z_k^{**})_0^\infty, S^{**}), \quad n \ge 0,$$
and the delay length $S_0^{**}$ has probability mass function
$$P(S_0^{**} = k) = P(X_1 > k)/E[X_1], \quad k \ge 0.$$
If $d > 1$, then $((Z_k^{**})_0^\infty, S^{**})$ is periodically stationary (in discrete time)
with period $d$, and the delay-length $S_0^{**}$ has the probability mass function
in Theorem 2.2.
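The probability mass function of the delay-length in the lattice case, $P(S_0^{**} = kd) = d\,P(X_1 > kd)/E[X_1]$, can be tabulated exactly for a cycle-length with finitely many values. The following Python sketch uses rational arithmetic; the dictionary representation `cycle_pmf` and the function name are assumptions made for this illustration, and the caller is assumed to pass the correct span `d`.

```python
from fractions import Fraction


def periodic_delay_pmf(cycle_pmf, d):
    """Return {k*d: d * P(X_1 > k*d) / E[X_1]} for all k with positive mass.

    `cycle_pmf` maps each possible cycle-length (a positive integer
    multiple of the span d) to its probability, given as a Fraction.
    """
    mean = sum(x * p for x, p in cycle_pmf.items())
    k_max = max(cycle_pmf) // d        # P(X_1 > k*d) = 0 for k >= k_max

    def tail(y):
        # P(X_1 > y)
        return sum(p for x, p in cycle_pmf.items() if x > y)

    return {k * d: d * tail(k * d) / mean for k in range(k_max)}
```

For cycle-lengths 2 and 6 with probability 1/2 each (span 2, mean 4) the result is `{0: 1/2, 2: 1/4, 4: 1/4}`, which sums to 1 as it should; with `d = 1` the same function gives the aperiodic formula $P(X_1 > k)/E[X_1]$.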
2.7 Comparison with the Two-Sided Case in Chapter 8
An important distinction between the above one-sided framework and the
two-sided framework in Section 2 of Chapter 8 is the delay concept. One
could think of the delay as an initial cycle, but it is more appropriate to
separate it from the cycles. For instance, the delay of a one-sided stationary
process (see Theorem 2.1 above) is in fact only the latter part of a full cycle.
For that reason we do not use $X_0$ (here a symbol for a length of a cycle)
for the delay length $S_0$.
The essentials of the two-sided Palm duality theory in Chapter 8 can be
established in the one-sided setting by a modification of the proofs. In fact,
when the state space is Polish and the paths right-continuous with left-hand
limits, then we need not even modify the proofs because then we can carry
results immediately over to the one-sided setting by extending a one-sided
stationary (Z*,S*) and a one-sided cycle-stationary (Z°,S°) to two-sided
pairs [use the Kolmogorov extension theorem, Fact 3.2 in Chapter 3].
Here we shall not go through the details of the one-sided counterpart
of the theory in Chapter 8 but only sketch briefly those results that are
particularly illuminating to contrast with the results in the next section.
Theorem 2.1 above establishes half of the point-at-zero duality between
stationarity and cycle-stationarity (for the two-sided version, see
Theorem 4.1 in Chapter 8). The informal point-at-zero interpretation
$Z^\circ$ behaves as $Z^*$ conditioned on the null event $S_0^* = 0$
can be established formally. For instance, when the state space is Polish
and the paths right-continuous with left-hand limits, then $Z^*$ conditioned
on $\{S_0^* \le t\}$ goes in distribution to $Z^\circ$ as $t$ goes to zero.
In the same way we can establish half of the randomized-origin Palm
duality between stationarity and cycle-stationarity (for the two-sided version,
see Theorem 8.1 in Chapter 8). Namely, suppose $(Z^\circ, S^\circ)$ is cycle-stationary
and that
$$E[X_1^\circ \mid \mathcal{J}^\circ] < \infty, \quad \text{where } \mathcal{J}^\circ \text{ is the invariant } \sigma\text{-algebra of } (Z^\circ, S^\circ).$$
Let $U$ be uniformly distributed on $[0, 1)$ and independent of $(Z^\circ, S^\circ)$ and
define a probability measure $\hat{P}$ on $(\Omega, \mathcal{F})$ as follows:
$$d\hat{P} = \frac{X_1^\circ}{E[X_1^\circ \mid \mathcal{J}^\circ]}\, dP \quad \text{(length-biasing given } \mathcal{J}^\circ\text{)}.$$
Let $(Z, S)$ have the distribution $\hat{P}(\theta_{UX_1}(Z^\circ, S^\circ) \in \cdot)$. Then $(Z, S)$ is
stationary, and the distributions of $Z^\circ$ and $Z$ agree on invariant sets. Thus,
by Theorem 5.4 in Chapter 5, the randomized-origin interpretation holds:
$Z$ behaves as $Z^\circ$ with origin shifted to a uniform time in $[0, \infty)$;
or formally in terms of Cesaro total variation convergence: with $V$ uniform
on $[0, 1)$ and independent of $Z^\circ$, it holds that
$$\theta_{Vt}Z^\circ \xrightarrow{tv} Z, \quad t \to \infty.$$
Also, there exists a successful distributional shift-coupling of $Z^\circ$ and $Z$ (see
Theorems 5.4 and 2.1 in Chapter 5); namely, the probability space $(\Omega, \mathcal{F}, P)$
can be extended to support finite random times $T^\circ$ and $\hat{T}$ such that
$$\theta_{T^\circ}Z^\circ \overset{D}{=} \theta_{\hat{T}}Z.$$
When there exists a weak-sense-regular conditional distribution of $Z$ given
$\theta_{\hat{T}}Z$ (for instance, when the state space is Polish and the paths
right-continuous), then we may choose $Z$ such that the shift-coupling is
nondistributional (see Theorem 2.2 in Chapter 5):
$$\theta_{T^\circ}Z^\circ = \theta_{\hat{T}}Z.$$
This is the shift-coupling (and Cesaro total variation) counterpart of the
stronger exact coupling and epsilon-coupling (and plain and smooth
total variation) results established for classical and wide-sense regenerative
processes in the next two sections.
Remark 2.1. The two stationary versions $(Z^*, S^*)$ and $(Z, S)$ of a
cycle-stationary $(Z^\circ, S^\circ)$ turn out to coincide in the regenerative case. This can
be seen, for instance, by noting that there is a trivial successful
distributional shift-coupling of a regenerative $(Z^\circ, S^\circ)$ and the stationary version
$(Z^*, S^*)$ (see Theorem 3.1 below: the shift-coupling times are $S_0^\circ = 0$ and
$S_0^*$). Since there is also successful distributional shift-coupling of a
regenerative $(Z^\circ, S^\circ)$ and the stationary version $(Z, S)$, the two versions are
both Cesaro total variation limits of $(Z^\circ, S^\circ)$ and thus must have the same
distribution.
Remark 2.2. The reader may wonder why we state the Cesaro total
variation and shift-coupling results only for $Z^\circ$ and $Z$ and not for $(Z^\circ, S^\circ)$ and
$(Z, S)$ as in Chapter 8. This is just to be in accordance with the rest of this
chapter. To simplify notation in this chapter we shall state many results
for the process only and not for the joint process and points. This is no
restriction because we can always embed $S$ in the process by replacing $Z$
by $(Z_s, A_s)_{s \in [0, \infty)}$. Then $S$ is simply formed by the times when the age
process $(A_s)_{s \in [0, \infty)}$ enters the state zero.
3 Classical Regeneration
In this section we shall consider processes regenerative in the sense
commonly associated with that term. In order to distinguish this regeneration
concept from the generalizations studied in Sections 4 through 10 we shall
use the term classical regeneration.
3.1 Definition
Call a one-sided-shift-measurable stochastic process Z classical regenerative
with regeneration times S if
$$\theta_{S_n}(Z, S) \overset{D}{=} (Z^\circ, S^\circ), \quad n \ge 0, \tag{3.1}$$
and
$$\theta_{S_n}(Z, S) \text{ is independent of } ((Z_s)_{s \in [0, S_n)}, S_0, \ldots, S_n), \quad n \ge 0. \tag{3.2}$$
Call the pair $(Z, S)$ classical regenerative if this holds.
This definition can be reformulated as follows: $(Z, S)$ is classical
regenerative if and only if
$$C_1, C_2, \ldots \text{ are i.i.d. and independent of } D. \tag{3.3}$$
In order to establish this equivalence, first note that (3.3) can be
reformulated as the following two claims
$$(C_{n+1}, C_{n+2}, \ldots) \overset{D}{=} (C_1, C_2, \ldots), \quad n \ge 0, \tag{3.4}$$
$$(C_{n+1}, C_{n+2}, \ldots) \text{ is independent of } (D, C_1, \ldots, C_n), \quad n \ge 0; \tag{3.5}$$
then recall that (3.4) is equivalent to (3.1); and finally note that (3.5) is
equivalent to (3.2), since $(C_{n+1}, C_{n+2}, \ldots)$ and $\theta_{S_n}(Z, S)$ are measurable
maps of each other and so are $(D, C_1, \ldots, C_n)$ and $((Z_s)_{s \in [0, S_n)}, S_0, \ldots, S_n)$.
It follows from (3.3) that if $(Z, S)$ is classically regenerative, then the
cycle-lengths $X_1, X_2, \ldots$ are i.i.d. and independent of the delay-length $S_0$,
that is, $S$ is a renewal process. The cycle-lengths are also called
inter-regeneration times and recurrence times. Let the nonnegative random
variable $S_{-1}$ be such that $(S_{-1}, D)$ is independent of $(Z^\circ, S^\circ)$.
3.2 Examples
Here are a few standard examples of processes that are regenerative in the
classical sense.
Let $S$ be a renewal process. Then the age process $(A_s)_{s \in [0, \infty)}$, the residual
life process $(B_s)_{s \in [0, \infty)}$, the total life process $(D_s)_{s \in [0, \infty)}$, and the relative
age process $(U_s)_{s \in [0, \infty)}$ are all classical regenerative with $S$ as regeneration
times. In fact, these processes viewed jointly as $(A_s, B_s, D_s, U_s)_{s \in [0, \infty)}$ form
a four-dimensional classical regenerative process. Also, if $Z$ is classical
regenerative with regeneration times $S$, then so is $(Z_s, A_s, B_s, D_s, U_s)_{s \in [0, \infty)}$.
Let Z be an irreducible recurrent Markov chain (as in Chapter 2). Then Z
is classical regenerative with regeneration times S formed by the successive
entrances to a fixed reference state.
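The Markov chain example can be made concrete with a short Python sketch. It assumes a finite observed path stored as a list and treats every visit to the reference state as a regeneration time; the function names `regeneration_times` and `split_delay_and_cycles` are hypothetical and not from the text.

```python
def regeneration_times(path, ref_state):
    """Indices n with path[n] == ref_state (successive visits to the reference state)."""
    return [n for n, z in enumerate(path) if z == ref_state]


def split_delay_and_cycles(path, times):
    """Split a finite path into the delay D (segment before the first regeneration
    time) and the completed cycles C_1, C_2, ... between successive regeneration
    times. The unfinished segment after the last regeneration time is dropped."""
    if not times:
        return list(path), []
    delay = list(path[:times[0]])
    cycles = [list(path[a:b]) for a, b in zip(times, times[1:])]
    return delay, cycles


# Illustrative path of a chain on {0, 1, 2}; state 0 is the reference state.
path = [2, 1, 0, 1, 2, 0, 0, 1, 0, 2]
S = regeneration_times(path, ref_state=0)
D, C = split_delay_and_cycles(path, S)
cycle_lengths = [len(c) for c in C]
```

For a recurrent chain, the segments in `C` are the objects that (3.3) asserts to be i.i.d. and independent of `D`; the sketch only extracts them from one path and does not test that property.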
Let $Z$ be a general state space shift-measurable Markov process (as in
Chapter 6). Let $A$ be a set of states such that $Z$ enters $A$ infinitely often
and finitely many times in finite intervals, and satisfies the Markov
property at the entrance times. If the transition probabilities are the same from
all states in $A$, then $Z$ is classical regenerative with regeneration times
$S$ formed by the successive entrances to $A$. Conversely, any classical
regenerative $(Z,S)$ can be embedded into a Markov process. For instance,
the process with value $(Z_{t+s})_{s\in[0,B_t)}$ at time $t$ is Markovian, and so is the
process with value $(Z_{t+s})_{s\in[-A_t,0]}$ at time $t$.
Regeneration is usually not a primary assumption in applications. Rather
regeneration is the key property of many processes that come out of
stochastic models, the property that makes the processes amenable to analysis. In
particular, regenerative processes abound in queueing theory. As an
example, let us consider the GI/GI/1 queueing model, namely the single-server
queueing system where customers arrive to a service station at times
forming a renewal process and line up to be served under the first-come-first-
served discipline with i.i.d. service times that are independent of the arrival
process (GI stands for general independent). Let $\alpha$ denote an inter-arrival
time and $\beta$ a service time. Let $Q_t$ denote the queue length at time $t$ and
$R_t$ the remaining service time of the customer being served at time $t$. If
$E[\beta] < E[\alpha]$, then (according to the law of large numbers) the system will
empty infinitely often and the bivariate process $(Q_s,R_s)_{s\in[0,\infty)}$ is classical
regenerative with the times of arrivals to an empty system as regeneration
times. If further $E[\beta] < E[\alpha] < \infty$, then it can be shown that the expected
value of the inter-regeneration times is finite, and thus (according to
Theorem 3.1 below) there exists a stationary version of $(Q_s,R_s)_{s\in[0,\infty)}$.
Moreover, it is readily checked that if the inter-arrival times are nonlattice (or
spread out), then so are the inter-regeneration times and thus (according
to Theorem 3.3 below) the process $(Q_s,R_s)_{s\in[0,\infty)}$ tends to its stationary
version in smooth (or plain) total variation.
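As an illustration of the regeneration times in the GI/GI/1 example, the following Python sketch finds the customers who arrive to an empty system, given explicit arrival and service times. It assumes the system starts empty and counts an arrival at the exact instant of a departure as an arrival to an empty system; the function name `empty_system_arrivals` is hypothetical.

```python
def empty_system_arrivals(arrival_times, service_times):
    """Indices of customers who arrive to an empty single-server FCFS system.

    arrival_times must be nondecreasing; service_times[n] is the service
    time of customer n. The system is assumed to be empty initially.
    """
    regen = []
    last_departure = float("-inf")  # no one in the system before the first arrival
    for n, (a, b) in enumerate(zip(arrival_times, service_times)):
        if a >= last_departure:
            regen.append(n)          # server idle: a new cycle starts here
            start = a
        else:
            start = last_departure   # customer waits for the server
        last_departure = start + b
    return regen


arrivals = [0, 1, 2, 6, 7, 12]
services = [2, 2, 1, 2, 1, 1]
idx = empty_system_arrivals(arrivals, services)
regeneration_times = [arrivals[n] for n in idx]
```

In this example the regeneration times are the arrival instants of customers 0, 3, and 5, and the differences between them are the inter-regeneration times.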
3.3 Stationary Version — Periodically Stationary Version
A pair $(Z',S')$ is a version of a classical regenerative $(Z,S)$ if $(Z',S')$ is
also classical regenerative and
$\theta_{S'_0}(Z',S') \overset{D}{=} (Z^0,S^0)$. (3.6)
Note that a pair $(Z',S')$ is a version of a classical regenerative $(Z,S)$ if and
only if (3.6) holds and the delay $D'$ of $(Z',S')$ is independent of $\theta_{S'_0}(Z',S')$.
In particular, $(Z^0,S^0)$ is a zero-delayed version of $(Z,S)$.
Theorem 3.1. Suppose $(Z,S)$ is classical regenerative with $E[X_1] < \infty$.
Then $(Z^*,S^*)$ in Theorem 2.1 is a stationary version of $(Z,S)$. Further, if
$X_1$ is lattice with span $d$, then $(Z^{**},S^{**})$ in Theorem 2.2 is a periodically
stationary version of $(Z,S)$ with period $d$.
Proof. According to Theorem 2.1, $(Z^*,S^*)$ is stationary. According to
Theorem 2.2, $(Z^{**},S^{**})$ is periodically stationary. Since
$(Z^{**},S^{**}) := \theta_{S_0^* \bmod d}(Z^*,S^*)$,
it follows that $\theta_{S_0^{**}}(Z^{**},S^{**}) = \theta_{S_0^*}(Z^*,S^*)$ and that the delay of $(Z^{**},S^{**})$
is a measurable mapping of the delay of $(Z^*,S^*)$. Thus $(Z^{**},S^{**})$ is a
version of $(Z,S)$ if $(Z^*,S^*)$ is a version of $(Z,S)$. Hence it only remains
to show that $(Z^*,S^*)$ is a version of $(Z,S)$, that is, with $U$ and $P^*$ as in
Theorem 2.1 we must show that
$P^*(\theta_{X_1}(Z^0,S^0)\in\cdot) = P((Z^0,S^0)\in\cdot)$, and under $P^*$ the
delay of $\theta_{UX_1}(Z^0,S^0)$ is independent of $\theta_{X_1}(Z^0,S^0)$. (3.7)
Basically, (3.7) holds because the change of measure $dP^* = (X_1/E[X_1])\,dP$
only affects the first cycle of $(Z^0,S^0)$, which is independent of $\theta_{X_1}(Z^0,S^0)$,
and because the delay of $\theta_{UX_1}(Z^0,S^0)$ is obtained by placing the origin at
random in that cycle.
Here is a more detailed proof of (3.7). The density $X_1/E[X_1]$ is a
measurable function of $C_1$ and (thus) is independent of $\theta_{X_1}(Z^0,S^0)$ under
$P$. This implies that replacing $P$ by $P^*$ does not change the
distribution of $\theta_{X_1}(Z^0,S^0)$ nor the fact that $\theta_{X_1}(Z^0,S^0)$ is independent of $C_1$
[use Lemma 4.1 in Chapter 8 with $Y = C_1$ and with $V$ any nonnegative
measurable function of $\theta_{X_1}(Z^0,S^0)$]. Neither does it change the fact that
$U$ is uniformly distributed on $[0,1)$ and independent of $(Z^0,S^0)$ [apply
Lemma 4.1 in Chapter 8 with $Y = (Z^0,S^0)$ and with $V$ any nonnegative
measurable function of $U$]. Thus
$P^*(\theta_{X_1}(Z^0,S^0)\in\cdot) = P(\theta_{X_1}(Z^0,S^0)\in\cdot)$, and under $P^*$ the
triple $(U,X_1,C_1)$ is independent of $\theta_{X_1}(Z^0,S^0)$.
Combine this and $P(\theta_{X_1}(Z^0,S^0)\in\cdot) = P((Z^0,S^0)\in\cdot)$ and the
observation that $\theta_{UX_1}C_1$ is the delay of $\theta_{UX_1}(Z^0,S^0)$ to obtain (3.7). □
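The change of measure $dP^* = (X_1/E[X_1])\,dP$ used in this proof can be illustrated for a cycle-length with finitely many values. The Python sketch below is only an illustration under that assumption: it computes the length-biased distribution of $X_1$ under $P^*$ exactly, and the mean of the delay-length $(1-U)X_1$ obtained by placing the origin uniformly at random in the length-biased cycle. The function names are hypothetical.

```python
from fractions import Fraction


def length_biased(pmf):
    """Distribution of X_1 under P*: mass x * p(x) / E[X_1] at each value x."""
    mean = sum(x * p for x, p in pmf.items())
    return {x: x * p / mean for x, p in pmf.items()}


def mean_stationary_delay_length(pmf):
    """E*[(1 - U) X_1] with U uniform on [0, 1) and independent of X_1 under P*.

    Since E[1 - U] = 1/2, this equals E*[X_1] / 2 = E[X_1^2] / (2 E[X_1]).
    """
    biased = length_biased(pmf)
    return sum(x * q for x, q in biased.items()) / 2


# Illustrative cycle-length distribution: X_1 is 1 or 3 with probability 1/2 each.
pmf = {1: Fraction(1, 2), 3: Fraction(1, 2)}
biased = length_biased(pmf)
mean_delay = mean_stationary_delay_length(pmf)
```

Here the long cycle receives weight 3/4 under $P^*$ although it has probability 1/2 under $P$, which reflects that a randomly placed origin is more likely to fall in a long cycle.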
3.4 The Key Coupling Result
In order to understand why Theorem 3.2 below is the key coupling result
for classical regenerative processes it may be useful to consider briefly the
exact coupling (Section 3 of Chapter 4) problem in this case.
If two independent classical regenerative processes happen to regenerate
at the same time (say $T = S_K = S'_{K'}$), then we can switch from one to
the other at that time without changing the distributions of the processes,
that is, we would have created an exact coupling. Now, although
simultaneous regeneration may take place if the regeneration times are lattice
valued, it almost surely does not take place when the regeneration times
are continuous, unless we construct dependence between the processes that
forces them to regenerate simultaneously. Such a construction can be quite
cumbersome. It might be easier to create a distributional exact coupling: if
we could stop the two sequences of regeneration times in such a way that
the two stopped regeneration times (say $T = S_K$ and $T' = S'_{K'}$) have
the same distribution, then we would have created a distributional exact
coupling.
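The lattice case of simultaneous regeneration can be sketched in a few lines of Python. The sketch assumes two finite, strictly increasing sequences of integer regeneration times and simply searches for the first time common to both; the names `renewal_times` and `first_common_time` are hypothetical, and the sketch does not construct the distributional coupling discussed above.

```python
from itertools import accumulate


def renewal_times(delay, cycle_lengths):
    """Regeneration times S_0, S_1, ... from a delay-length and cycle-lengths."""
    return list(accumulate(cycle_lengths, initial=delay))


def first_common_time(times_a, times_b):
    """First time at which both strictly increasing sequences regenerate.

    Returns (T, k, k_prime) with T = times_a[k] = times_b[k_prime],
    or None if the finite sequences share no time.
    """
    i = j = 0
    while i < len(times_a) and j < len(times_b):
        if times_a[i] == times_b[j]:
            return times_a[i], i, j
        if times_a[i] < times_b[j]:
            i += 1
        else:
            j += 1
    return None


S = renewal_times(0, [2, 3, 2, 3])       # zero-delayed sequence
S_prime = renewal_times(1, [3, 3, 3])    # delayed sequence
result = first_common_time(S, S_prime)
```

With real-valued continuous cycle-lengths the equality test would almost surely never succeed, which is the obstacle that motivates the distributional coupling.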
For this purpose we shall need the following generalization of the
stopping time concept: a nonnegative integer-valued random variable $K$ is a
randomized stopping time with respect to a sequence of random elements
$(Y_k)_0^\infty$ if for each $n \ge 0$, the event $\{K = n\}$ depends on $(Y_k)_0^\infty$ only through
$(Y_0,\dots,Y_n)$. A stopping time is the special case when for each $n \ge 0$, the
event $\{K = n\}$ is in the $\sigma$-algebra generated by $(Y_0,\dots,Y_n)$. A randomized
stopping time allows the stopping event $\{K = n\}$ to be determined not
only by $(Y_0,\dots,Y_n)$ but also by some additional randomization, as long
as it is conditionally independent of $(Y_{n+1},Y_{n+2},\dots)$ given $(Y_0,\dots,Y_n)$.
More generally, we say that $K$ is a randomized stopping time with respect
to $(Y_k)_0^\infty$ in the presence of a random element $Y$ if for each $n \ge 0$, the
event $\{K = n\}$ and the random element $Y$ depend on $(Y_k)_0^\infty$ only through
$(Y_0,\dots,Y_n)$. For properties of conditional independence, see Section 4.4 of
Chapter 3.
The following theorem reduces the distributional coupling problem for
classical regenerative processes to a distributional coupling problem for
the regeneration times (the random variable R will be needed for epsilon
coupling and can be dropped in the exact coupling case).
Theorem 3.2. Suppose $(Z,S)$ is classical regenerative. Let $(\hat S,\hat K,\hat R)$ be
such that $\hat R$ is a random variable and
$\hat S \overset{D}{=} S$, (3.8a)
$\hat K$ is a randomized stopping time w.r.t. $\hat S$ in the presence of $\hat R$. (3.8b)
Then the probability space $(\Omega,\mathcal{F},P)$ on which $(Z,S)$ is defined can be
extended to support $(K,R)$ such that
$(S,K,R) \overset{D}{=} (\hat S,\hat K,\hat R)$, (3.9a)
$K$ is a randomized stopping time w.r.t. $S$ in the presence of $R$, (3.9b)
$(K,R)$ is conditionally independent of $Z$ given $S$. (3.9c)
Moreover, with $T = S_K$ it holds that
$(T,R) \overset{D}{=} (\hat S_{\hat K},\hat R)$, (3.10a)
$(T,R)$ is independent of $\theta_T Z$, (3.10b)
$\theta_T Z \overset{D}{=} Z^0$. (3.10c)
PROOF. Apply the transfer extension in Section 4.5 of Chapter 3 to
obtain (3.9a) and (3.9c) from (3.8a). From (3.8b) and (3.9a) it follows that
(3.9b) holds. From (3.9a) it follows that (3.10a) holds. In order to
establish (3.10b) and (3.10c), note that [due to (3.9c)] the event $\{K = n\}$
and the random variable $R$ depend on $(Z,S)$ only through $S$ and [due
to (3.9b)] on $S$ only through $(S_0,\dots,S_n)$. Thus $\{K = n\}$ and $R$ depend on
$(Z,S)$ only through $(S_0,\dots,S_n)$. Thus $\{K = n\}$ and $R$ depend on $\theta_{S_n}Z$
only through $(S_0,\dots,S_n)$. Since $\theta_{S_n}Z$ and $(S_0,\dots,S_n)$ are independent,
this means that $\theta_{S_n}Z$ is independent of $(S_n,R)$ and $\{K = n\}$. This and
$P(\theta_{S_n}Z\in\cdot) = P(Z^0\in\cdot)$ yield the second identity in
$P(\theta_{S_K}Z\in\cdot,\ (S_K,R)\in\cdot,\ K = n)$
$= P(\theta_{S_n}Z\in\cdot,\ (S_n,R)\in\cdot,\ K = n)$
$= P(Z^0\in\cdot)\,P((S_n,R)\in\cdot,\ K = n)$
$= P(Z^0\in\cdot)\,P((S_K,R)\in\cdot,\ K = n)$.
Sum over $n$ to obtain (3.10b) and (3.10c). □
This is a convenient place for the following analogue of the strong Markov
property.
Lemma 3.1. Let $(Z,S)$ be classical regenerative. Suppose $K$ is a stopping
time with respect to $S$ or, more generally, $K$ is a randomized stopping time
with respect to $S$ and conditionally independent of $Z$ given $S$. Then
$\theta_{S_K}(Z,S) \overset{D}{=} (Z^0,S^0)$,
$\theta_{S_K}(Z,S)$ is independent of $((Z_s)_{s\in[0,S_K)},S_0,\dots,S_K)$.
Proof. In the above proof replace $R$ by $((Z_s)_{s\in[0,S_K)},S_0,\dots,S_K)$ and $Z$
by $(Z,S)$. □
3.5 Lattice or Spread-Out $X_1$ - Exact Coupling
In this subsection and the next many ideas developed in the previous
chapters converge through Theorem 3.2 above.
Recall that the random variable $X_1$ is spread out if there exist an $n \ge 1$
and an $f \in \mathcal{B}^+$ such that $\int_{\mathbb{R}} f(x)\,dx > 0$ and, with $X_2,\dots,X_n$ i.i.d. copies
of $X_1$,
$P(X_1+\dots+X_n \in B) \ge \int_B f(x)\,dx$, $B \in \mathcal{B}$.
In particular, a continuous $X_1$ is spread out (simply take $n = 1$ and $f$ the
density of $X_1$ with respect to Lebesgue measure). On the other hand, a
discrete $X_1$ is not (since then $X_1+\dots+X_n$ is discrete for each $n \ge 1$).
Theorem 3.3. Let $(Z,S)$ be classical regenerative and $(Z',S')$ be a version
of $(Z,S)$. Suppose either
$X_1$ is spread out,
or
$X_1$ is lattice with span $d$, and $S_0$ and $S'_0$ are both $d\mathbb{Z}$ valued.
Then the following claims hold.
(a) The underlying probability space $(\Omega,\mathcal{F},P)$ can be extended to support
finite random times $T$ and $T'$ such that $(Z,Z',T,T')$ is a successful
distributional exact coupling, that is,
$(\theta_T Z, T) \overset{D}{=} (\theta_{T'}Z', T')$. (3.11)
Moreover, if there exists a weak-sense-regular conditional distribution
of $Z$ given $\theta_T Z$ [this holds when $(E,\mathcal{E})$ is Polish and the paths are
right-continuous], then $(\Omega,\mathcal{F},P)$ can be further extended to support
a copy $Z''$ of $Z'$ such that $(Z,Z'',T)$ is a successful nondistributional
exact coupling of $Z$ and $Z'$, that is,
$\theta_T Z = \theta_T Z''$ where $Z'' \overset{D}{=} Z'$. (3.12)
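(b) With $\|\cdot\|$ denoting total variation (see Section 8.2 of Chapter 3) we
have
$\|P(\theta_t Z\in\cdot) - P(\theta_t Z'\in\cdot)\| \to 0$, $t\to\infty$. (3.13)
Moreover, if $E[X_1] < \infty$ and $X_1$ is spread out, then
$\theta_t Z \overset{tv}{\to} Z^*$, $t\to\infty$, (3.14)
while if $E[X_1] < \infty$ and $X_1$ is lattice with span $d$ and $S_0$ is $d\mathbb{Z}$ valued,
then
$\theta_{nd}Z \overset{tv}{\to} Z^{**}$, $n\to\infty$. (3.15)
(c) With $\mathcal{T}$ the tail $\sigma$-algebra on $\mathcal{H}$ (Section 9.1 of Chapter 4) we have
$P(Z\in\cdot)|_{\mathcal{T}} = P(Z'\in\cdot)|_{\mathcal{T}}$.
(d) The process $Z$ is $\mathcal{T}$-trivial and mixing (Section 2.1 in Chapter 6).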
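Proof. (a) If $X_1$ is lattice with span $d$, and $S_0$ and $S'_0$ are both $d\mathbb{Z}$ valued,
then $S/d$ and $S'/d$ are integer-valued random walks with aperiodic
step-lengths, and Theorem 7.2 in Chapter 2 yields the existence of $(\hat S,\hat S',\hat K,\hat K')$
such that
$\hat S \overset{D}{=} S$ and $\hat K$ is a randomized stopping time w.r.t. $\hat S$, (3.16)
$\hat S' \overset{D}{=} S'$ and $\hat K'$ is a randomized stopping time w.r.t. $\hat S'$, (3.17)
$\hat S_{\hat K} = \hat S'_{\hat K'}$.
If $X_1$ is spread out, then we obtain $(\hat S,\hat S',\hat K,\hat K')$ with these properties
from Theorem 6.1 in Chapter 3. Due to (3.16) and Theorem 3.2 above,
$(\Omega,\mathcal{F},P)$ can be extended to support a $T$ such that
$T \overset{D}{=} \hat S_{\hat K}$, $T$ is independent of $\theta_T Z$, $\theta_T Z \overset{D}{=} Z^0$.
Due to (3.17) and Theorem 3.2, $(\Omega,\mathcal{F},P)$ can be further extended to
support a $T'$ such that
$T' \overset{D}{=} \hat S'_{\hat K'}$, $T'$ is independent of $\theta_{T'}Z'$, $\theta_{T'}Z' \overset{D}{=} Z^0$.
Combine $\hat S_{\hat K} = \hat S'_{\hat K'}$ and $T \overset{D}{=} \hat S_{\hat K}$ and $T' \overset{D}{=} \hat S'_{\hat K'}$ to obtain
$T \overset{D}{=} T'$,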
and combine $\theta_T Z \overset{D}{=} Z^0$ and $\theta_{T'}Z' \overset{D}{=} Z^0$ to obtain
$\theta_T Z \overset{D}{=} \theta_{T'}Z'$.
Since in addition, $T$ is independent of $\theta_T Z$, and $T'$ is independent of $\theta_{T'}Z'$,
this yields (3.11). Apply Theorem 3.2 in Chapter 4 to obtain $Z''$ such that
(3.12) holds.
(b) To obtain (3.13) use (a) and Theorem 9.4 in Chapter 4. To obtain
(3.14) from (3.13) take $Z' = Z^*$ and use $P(\theta_t Z^*\in\cdot) = P(Z^*\in\cdot)$. To
obtain (3.15) from (3.13) take $Z' = Z^{**}$ and use $P(\theta_{nd}Z^{**}\in\cdot) = P(Z^{**}\in\cdot)$.
(c) To obtain (c) use (a) and Theorem 9.4 in Chapter 4.
(d) Due to Theorem 2.1 in Chapter 6, $Z$ is $\mathcal{T}$-trivial and mixing if we
can establish that
$\|P(\theta_t Z\in\cdot \mid Z\in B) - P(\theta_t Z\in\cdot)\| \to 0$, $t\to\infty$, (3.18)
for all $B$ of the form
$B = \{z\in H : z_{t_1}\in A_1,\dots,z_{t_n}\in A_n\}$, (3.19)
where $n \ge 1$, $0 \le t_1 < \dots < t_n$ and $A_1,\dots,A_n \in \mathcal{E}$.
In order to prove (3.18), note that $(Z,(S_{N_{t_n}+k})_0^\infty)$ is a version of $(Z,S)$
[see Lemma 3.1], and that the event $\{Z\in B\}$ is in the $\sigma$-algebra generated
by the delay of $(Z,(S_{N_{t_n}+k})_0^\infty)$ because $S_{N_{t_n}}$ is greater than $t_n$. The delay
of $(Z,(S_{N_{t_n}+k})_0^\infty)$ is independent of the cycles, and thus so is $\{Z\in B\}$. It
follows that a pair $(Z',S')$ with distribution $P((Z,(S_{N_{t_n}+k})_0^\infty)\in\cdot \mid Z\in B)$
is a version of $(Z,S)$. Thus (b) yields (3.18). □
Remark 3.1. Let $S$ be a renewal process and $S^0$ its zero-delayed version.
Consider this special case of (3.13):
$\|P(B_t\in\cdot) - P(B^0_t\in\cdot)\| \to 0$, $t\to\infty$. (3.20)
Theorem 3.3 has the following converse.
If $S_0$ is continuous and (3.20) holds, then
$X_1$ is spread out,
since otherwise $B^0_t$ would be singular with respect to Lebesgue measure
for all $t$, which together with the observation that $B_t$ is continuous for all
$t$ would imply $\|P(B_t\in\cdot) - P(B^0_t\in\cdot)\| = 2$ for all $t$, thus contradicting
(3.20).
If $X_1$ is $d\mathbb{Z}$ valued and $S_0 = d$ and (3.20) holds, then
$X_1$ is lattice with span $d$,
since otherwise there would be a $k > 1$ such that $P(B^0_{nkd}\in kd\mathbb{Z}) = 1$ and
$P(B_{nkd}\in d+kd\mathbb{Z}) = 1$ for all $n$, implying $\|P(B_{nkd}\in\cdot) - P(B^0_{nkd}\in\cdot)\| = 2$
for all $n$ and thus contradicting (3.20).
Remark 3.2. Let $(Z,S)$ be classical regenerative and $(Z',S')$ be a version
of $(Z,S)$. If $X_1$ is lattice with span $d$, then $\theta_{S_0 \bmod d}(Z,S)$ and $\theta_{S'_0 \bmod d}(Z',S')$
are versions of $(Z,S)$ with $d\mathbb{Z}$ valued delay-lengths, and Theorem 3.3(a)
yields the existence of finite times $T$ and $T'$ such that
$(\theta_T\theta_{S_0 \bmod d}Z, T) \overset{D}{=} (\theta_{T'}\theta_{S'_0 \bmod d}Z', T')$.
Thus if $X_1$ is lattice with span $d$, then for all delay-lengths there is a
successful distributional exact coupling modulo a time shift that is strictly
less than $d$.
In particular, when this coupling can be made nondistributional, we have
$\theta_T\theta_{S_0 \bmod d}Z = \theta_T\theta_{S''_0 \bmod d}Z''$, where $(Z'',S''_0) \overset{D}{=} (Z',S'_0)$.
Since $|(T + (S_0 \bmod d)) - (T + (S''_0 \bmod d))| < d$, this means that
$(Z, Z'', T + (S_0 \bmod d), T + (S''_0 \bmod d))$
is a successful nondistributional $d$-coupling of $Z$ and $Z'$.
Remark 3.3. Classical regeneration is a special case of time-inhomogeneous
regeneration, which we shall consider in Sections 5 through 8. Thus the
results on uniform convergence and rates of convergence established there
also apply to classical regenerative processes; see Section 7.5 below.
3.6 Nonlattice $X_1$ - Epsilon-Couplings
Recall that the random variable $X_1$ is nonlattice if
$P(X_1 \in d\mathbb{Z}) < 1$ for all $d > 0$.
Any spread out $X_1$ is nonlattice, and so is, for instance, a discrete $X_1$ taking
the values 1 and $\sqrt{2}$ with strictly positive probabilities.
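For contrast with the nonlattice example, here is a small Python sketch that computes the span of a discrete $X_1$ with finitely many rational values. It assumes the usual convention that the span is the largest $d > 0$ with $P(X_1 \in d\mathbb{Z}) = 1$; the function name `span` is hypothetical. An irrational value such as $\sqrt{2}$ cannot be represented here, and indeed no common $d$ exists for the values 1 and $\sqrt{2}$.

```python
from fractions import Fraction
from math import gcd


def span(values):
    """Largest d > 0 such that every value lies in dZ.

    values: finitely many rational numbers (ints or Fractions), not all zero.
    Returns a Fraction; returns 0 if all values are zero.
    """
    fracs = [Fraction(v) for v in values]
    # Least common multiple of the denominators.
    common_den = 1
    for f in fracs:
        common_den = common_den * f.denominator // gcd(common_den, f.denominator)
    # Greatest common divisor of the numerators over the common denominator.
    g = 0
    for f in fracs:
        g = gcd(g, int(f * common_den))
    return Fraction(g, common_den)


d1 = span([2, 4, 6])
d2 = span([Fraction(1, 2), Fraction(3, 4)])
d3 = span([Fraction(3, 2), 3])
```

Every finite set of rational values is contained in some $d\mathbb{Z}$, so a discrete $X_1$ with rational values is always lattice.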
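Theorem 3.4. Let $(Z,S)$ be classical regenerative and $(Z',S')$ be a version
of $(Z,S)$. Let $U$ be uniform on $[0,1)$ and independent of $Z$ and $Z'$. Suppose
$X_1$ is nonlattice.
Then the following claims hold.
(a) For each $\varepsilon > 0$, the underlying probability space $(\Omega,\mathcal{F},P)$ can be
extended to support finite random times $T_\varepsilon$, $T'_\varepsilon$, $R_\varepsilon$, and $R'_\varepsilon$ such that
$(Z,Z',T_\varepsilon,T'_\varepsilon,R_\varepsilon,R'_\varepsilon)$ is a successful distributional $\varepsilon$-coupling of $Z$ and
$Z'$, that is,
$|T_\varepsilon - R_\varepsilon| < \varepsilon$ and $|T'_\varepsilon - R'_\varepsilon| < \varepsilon$, (3.21)
$(\theta_{T_\varepsilon}Z, T_\varepsilon, R_\varepsilon) \overset{D}{=} (\theta_{T'_\varepsilon}Z', R'_\varepsilon, T'_\varepsilon)$. (3.22)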
Moreover, if there exists a weak-sense-regular conditional
distribution of $Z$ given $\theta_T Z$ for each random time $T$ [this holds when $(E,\mathcal{E})$
is Polish and the paths are right-continuous], then for each $\varepsilon > 0$,
$(\Omega,\mathcal{F},P)$ can be further extended to support a copy $Z^{(\varepsilon)}$ of $Z'$ such
that $(Z,Z^{(\varepsilon)},T_\varepsilon,R_\varepsilon)$ is a successful nondistributional $\varepsilon$-coupling of $Z$
and $Z'$, that is,
$|T_\varepsilon - R_\varepsilon| < \varepsilon$ and $\theta_{T_\varepsilon}Z = \theta_{R_\varepsilon}Z^{(\varepsilon)}$, where $Z^{(\varepsilon)} \overset{D}{=} Z'$. (3.23)
(a') For each $h > 0$, $(\Omega,\mathcal{F},P)$ can be extended to support finite
random times $T_h$ and $T'_h$ such that $(\theta_{Uh}Z,\theta_{Uh}Z',T_h,T'_h)$ is a successful
distributional exact coupling of $\theta_{Uh}Z$ and $\theta_{Uh}Z'$, that is,
$(\theta_{T_h}\theta_{Uh}Z, T_h) \overset{D}{=} (\theta_{T'_h}\theta_{Uh}Z', T'_h)$. (3.24)
Moreover, if there exists a weak-sense-regular conditional distribution
of $Z$ given $\theta_T Z$ for each random time $T$ [this holds when $(E,\mathcal{E})$ is
Polish and the paths are right-continuous], then for each $h > 0$, $(\Omega,\mathcal{F},P)$
can be further extended to support a copy $(Z^{(h)},U_h)$ of $(Z',U)$ such
that $(\theta_{Uh}Z,\theta_{U_h h}Z^{(h)},T_h)$ is a successful nondistributional exact
coupling of $\theta_{Uh}Z$ and $\theta_{Uh}Z'$, that is,
$\theta_{T_h}\theta_{Uh}Z = \theta_{T_h}\theta_{U_h h}Z^{(h)}$, where $(Z^{(h)},U_h) \overset{D}{=} (Z',U)$. (3.25)
(b) For each $h > 0$,
$\|P(\theta_{t+Uh}Z\in\cdot) - P(\theta_{t+Uh}Z'\in\cdot)\| \to 0$, $t\to\infty$. (3.26)
Moreover, if $E[X_1] < \infty$, then the following claims hold.
(i) For each $h > 0$,
$\theta_{t+Uh}Z \overset{tv}{\to} Z^*$, $t\to\infty$. (3.27)
(ii) If the paths are piecewise constant with finitely many jumps in
finite intervals, then
$Z_t \overset{tv}{\to} Z^*_0$, $t\to\infty$.
(iii) With $P^*$ as in Theorem 2.1 we have
$P(C_{N_t}\in\cdot) \overset{tv}{\to} P^*(C_1\in\cdot)$, $t\to\infty$, (3.28)
and
$P(\theta_{S_{N_t-1}}Z\in\cdot) \overset{tv}{\to} P^*(Z^0\in\cdot)$, $t\to\infty$. (3.29)
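(iv) If $E$ is metric, $\mathcal{E}$ its Borel subsets, and the paths right-continuous,
then
$Z_t \overset{D}{\to} Z^*_0$, $t\to\infty$.
(v) If $(E,\mathcal{E})$ is Polish and the paths are right-continuous with
left-hand limits, then
$\theta_t Z \overset{D}{\to} Z^*$, $t\to\infty$,
where $\overset{D}{\to}$ means weak convergence in the Skorohod topology.
(c) With $\mathcal{S}$ the smooth tail $\sigma$-algebra on $\mathcal{H}$ (Section 9.1 in Chapter 3) we
have
$P(Z\in\cdot)|_{\mathcal{S}} = P(Z'\in\cdot)|_{\mathcal{S}}$,
and with $\mathcal{T}$ the tail $\sigma$-algebra on $\mathcal{H}$ we have
$P(\theta_{Uh}Z\in\cdot)|_{\mathcal{T}} = P(\theta_{Uh}Z'\in\cdot)|_{\mathcal{T}}$ for each $h > 0$.
(d) The process $Z$ is $\mathcal{S}$-trivial and smoothly mixing (Section 2.3 in
Chapter 6).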
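PROOF. (a) Fix $\varepsilon > 0$. Theorem 7.1 in Chapter 2 yields the existence of
$(\hat S,\hat S',\hat K,\hat K')$ such that
$\hat S \overset{D}{=} S$ and $\hat K$ is a randomized stopping time
with respect to $\hat S$ in the presence of $\hat S'_{\hat K'}$, (3.30a)
$\hat S' \overset{D}{=} S'$ and $\hat K'$ is a randomized stopping time
with respect to $\hat S'$ in the presence of $\hat S_{\hat K}$, (3.30b)
$|\hat S_{\hat K} - \hat S'_{\hat K'}| < \varepsilon$. (3.30c)
Due to (3.30a) and Theorem 3.2 above, $(\Omega,\mathcal{F},P)$ can be extended to
support $(T_\varepsilon,R_\varepsilon)$ such that
$(T_\varepsilon,R_\varepsilon) \overset{D}{=} (\hat S_{\hat K},\hat S'_{\hat K'})$, (3.31a)
$(T_\varepsilon,R_\varepsilon)$ is independent of $\theta_{T_\varepsilon}Z$, (3.31b)
$\theta_{T_\varepsilon}Z \overset{D}{=} Z^0$. (3.31c)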
Due to (3.30b) and Theorem 3.2, $(\Omega,\mathcal{F},P)$ can be further extended to
support $(T'_\varepsilon,R'_\varepsilon)$ such that
$(T'_\varepsilon,R'_\varepsilon) \overset{D}{=} (\hat S'_{\hat K'},\hat S_{\hat K})$, (3.32a)
$(T'_\varepsilon,R'_\varepsilon)$ is independent of $\theta_{T'_\varepsilon}Z'$, (3.32b)
$\theta_{T'_\varepsilon}Z' \overset{D}{=} Z^0$. (3.32c)
From (3.30c), (3.31a), and (3.32a) we obtain (3.21) and
$(T_\varepsilon,R_\varepsilon) \overset{D}{=} (R'_\varepsilon,T'_\varepsilon)$. (3.33)
From (3.31c) and (3.32c) we find that $\theta_{T_\varepsilon}Z \overset{D}{=} \theta_{T'_\varepsilon}Z'$, which together
with (3.33), (3.31b), and (3.32b) yields (3.22). Apply Theorem 6.2 in
Chapter 5 to obtain $Z^{(\varepsilon)}$ such that (3.23) holds.
(a') To obtain (3.24) and (3.25), use (a) and Corollary 9.1 in Chapter 5.
(b) To obtain (3.26), use (a) and Theorem 9.4 in Chapter 5. To obtain
(3.27) from (3.26), take $Z' = Z^*$ and use $P(\theta_{t+Uh}Z^*\in\cdot) = P(Z^*\in\cdot)$. To
obtain (ii), use (a) and Theorem 7.2 in Chapter 5. To obtain (3.28) from
(ii), note that the process with value $C_{N_t}$ at time $t$ [the value $D$ if $N_t = 0$]
is piecewise constant with finitely many jumps in finite intervals, is classical
regenerative with regeneration times $S$, and has a stationary version having
marginal distribution $P^*(C_1\in\cdot)$ at time zero (see Theorems 2.1 and 3.1).
To obtain (3.29) from (3.28), note that
$\|P(\theta_{S_{N_t-1}}Z\in\cdot) - P^*(Z^0\in\cdot)\|$
$= \|P((C_{N_t},\theta_{S_{N_t}}Z)\in\cdot) - P^*((C_1,\theta_{X_1}Z^0)\in\cdot)\|$
$= \|P(C_{N_t}\in\cdot) - P^*(C_1\in\cdot)\|$,
where the latter equality is due to the facts that $C_{N_t}$ and $\theta_{S_{N_t}}Z$ are
independent under $P$, that $C_1$ and $\theta_{X_1}Z^0$ are independent under $P^*$, and that
$P(\theta_{S_{N_t}}Z\in\cdot) = P^*(\theta_{X_1}Z^0\in\cdot)$ [see (3.3) in Lemma 3.1 of Chapter 6]. To
obtain (iv), use (a) and Theorem 7.3 in Chapter 5. To obtain (v), use (a)
and Theorem 7.4 in Chapter 5.
(c) To obtain (c), use (a) and Theorem 9.4 in Chapter 5.
(d) Due to Theorem 2.3 in Chapter 6, the process $Z$ is $\mathcal{S}$-trivial and
smoothly mixing if for all $B$ as at (3.19) and all $h > 0$,
$\|P(\theta_{t+Uh}Z\in\cdot \mid Z\in B) - P(\theta_{t+Uh}Z\in\cdot)\| \to 0$, $t\to\infty$.
This follows from (b) by repeating the proof of (3.18) with $\theta_t Z$ replaced by
$\theta_{t+Uh}Z$. □
Remark 3.4. Let $S$ be a renewal process and $S^0$ its zero-delayed version.
Consider this special case of (3.26): for all $h > 0$,
$\|P(B_{t+Uh}\in\cdot) - P(B^0_{t+Uh}\in\cdot)\| \to 0$, $t\to\infty$. (3.34)
Theorem 3.4 has the following converse.
If $S_0$ is exponential and (3.34) holds, then
$X_1$ is nonlattice,
since otherwise there is a $d > 0$ such that $P(B^0_{nd+Ud/2}\in(d/2,d]+d\mathbb{Z}) = 1$
and $P(B_{nd+Ud/2}\in(0,d/2]+d\mathbb{Z}) > P(d/2 < S_0 < d)$ for all $n$, implying
$\|P(B_{nd+Ud/2}\in\cdot) - P(B^0_{nd+Ud/2}\in\cdot)\| \ge 2P(d/2 < S_0 < d) > 0$ for all $n$,
and thus contradicting (3.34).
4 Wide-Sense Regeneration - Harris Chains - GI/GI/k
It turns out that all the results from the previous section (except those on
mixing and triviality) still hold if we allow the future after regeneration
to depend on the past, as long as the future is independent of the past
regeneration times. For lack of a better term we shall call this wide-sense
regeneration. If the dependence lasts only over a time interval of length $l$,
then the regeneration is lag-$l$ (in this case the results on mixing and
triviality hold). At the end of this section we show that this kind of regeneration
occurs in so-called Harris chains and in the GI/GI/k queueing system.
4.1 Definitions
Call a one-sided shift-measurable stochastic process $Z$ wide-sense
regenerative with regeneration times $S$ if
$\theta_{S_n}(Z,S) \overset{D}{=} (Z^0,S^0)$, $n \ge 0$, (4.1)
and
$\theta_{S_n}(Z,S)$ is independent of $(S_0,\dots,S_n)$, $n \ge 0$. (4.2)
Call the pair $(Z,S)$ wide-sense regenerative if this holds. If $(Z,S)$ is
wide-sense regenerative, then the cycles are in general not i.i.d. but $S$ is still
a renewal process. Let the nonnegative random variable $S_{-1}$ be such that
$(S_{-1},S_0)$ is independent of $(Z^0,S^0)$.
With $l \ge 0$, call a wide-sense regenerative $(Z,S)$ lag-$l$ regenerative if
(4.2) can be strengthened to: for $n \ge 0$,
$\theta_{S_n}(Z,S)$ is independent of $((Z_s)_{s\in[0,(S_n-l)^+]},S_0,\dots,S_n)$; (4.3)
and lag-$l+$ regenerative if (4.2) can only be strengthened to: for $n \ge 0$,
$\theta_{S_n}(Z,S)$ is independent of $((Z_s)_{s\in[0,(S_n-l)^+)},S_0,\dots,S_n)$.
Thus lag-$0+$ regeneration is the same as classical regeneration, while lag-0
regeneration implies further that $Z_{S_n}$ is nonrandom.
A pair $(Z',S')$ is a version of a wide-sense regenerative $(Z,S)$ if $(Z',S')$
is also wide-sense regenerative and
$\theta_{S'_0}(Z',S') \overset{D}{=} (Z^0,S^0)$. (4.4)
A pair $(Z',S')$ is a version of a lag-$l$ regenerative $(Z,S)$ if $(Z',S')$ is also
lag-$l$ regenerative and (4.4) holds. In both cases $(Z^0,S^0)$ is a zero-delayed
version of $(Z,S)$.
Note that if $(Z,S)$ is classical regenerative, then $(Z,S)$ is in particular
wide-sense regenerative, and that if $(Z',S')$ is a wide-sense version of $(Z,S)$,
then $(Z',S')$ need not be a classical version of $(Z,S)$, since the delay of
$(Z',S')$ need not be independent of the cycles.
4.2 Observations - Examples
Suppose $(Z,S)$ is classical regenerative and $f$ is a measurable mapping
from $(H,\mathcal{H})$ into some measurable space. Then the process $(f(\theta_s Z))_{s\in[0,\infty)}$
is in general not classical regenerative. It is, however, wide-sense
regenerative with regeneration times $S$. In particular, the path process $(\theta_s Z)_{s\in[0,\infty)}$
[which has state space $(H,\mathcal{H})$] is wide-sense regenerative with regeneration
times $S$ but certainly not classical regenerative (unless $Z$ is a nonrandom
constant).
In fact, the following conservation properties hold:
If $Z$ is classical regenerative with regeneration times $S$, then so is
$(f(Z_s))_{s\in[0,\infty)}$ for all measurable mappings $f$ from $(E,\mathcal{E})$ into some
measurable space.
If $Z$ is wide-sense regenerative with regeneration times $S$, then so is
$(f(\theta_s Z))_{s\in[0,\infty)}$ for all measurable mappings $f$ from $(H,\mathcal{H})$ into some
measurable space.
Note that a Markov process does not have these conservation properties:
If $Z$ is a Markov process, then in general $(f(\theta_s Z))_{s\in[0,\infty)}$ is not a
Markov process and neither is $(f(Z_s))_{s\in[0,\infty)}$.
On the other hand, the following holds:
For any stochastic process $Z$ the path process $(\theta_s Z)_{s\in[0,\infty)}$ is always
a Markov process.
This is because the state of the path process at time $t$ (namely $\theta_t Z$)
determines its complete future (since $\theta_{t+s}Z = \theta_s\theta_t Z$ for $s \ge 0$), and therefore
trivially the future depends on the past only through the present.
If $Z$ is any stationary process and $S$ is a renewal process that is
independent of $Z$, then $(Z,S)$ is wide-sense regenerative but in general not classical
regenerative.
If $S$ is a renewal process, then the process $(N_{s+l} - N_s)_{s\in[0,\infty)}$ is lag-$l$
regenerative with regeneration times $S$ for any $l > 0$, but in general not
classical regenerative.
If $Z$ is wide-sense regenerative (or lag-$l$ regenerative) with regeneration
times $S$, then so is $(Z_s,A_s,B_s,D_s,U_s,N_{s+l} - N_s)_{s\in[0,\infty)}$.
Consider a random walk with step-lengths that have a strictly positive
expectation. Let $S$ be the ladder heights renewal process, that is, let $S_0$ be
the first nonnegative state of the random walk, and recursively for $n \ge 1$
let $S_n$ be the first state of the random walk that is greater than $S_{n-1}$. Let
$Z = (Z_s)_{s\in[0,\infty)}$ be the process with $Z_s$ the number of times the random
walk visits the interval $(s,s+l]$, where $l > 0$. Then $(Z,S)$ is wide-sense
regenerative but in general not lag-$l$ regenerative.
More substantial examples are given at the end of this section.
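The ladder heights example above can be sketched in Python for a finite stretch of a random walk. The sketch takes the successive states of the walk as a list (an assumption made here so that no convention about the starting point is needed); the names `ladder_heights` and `visits_in_window` are hypothetical.

```python
def ladder_heights(states):
    """Ladder heights S_0, S_1, ... of a walk given by its successive states.

    S_0 is the first nonnegative state; S_n (n >= 1) is the first state
    strictly greater than S_{n-1}.
    """
    heights = []
    for x in states:
        if not heights:
            if x >= 0:
                heights.append(x)
        elif x > heights[-1]:
            heights.append(x)
    return heights


def visits_in_window(states, s, l):
    """Z_s: number of visits of the walk to the interval (s, s + l]."""
    return sum(1 for x in states if s < x <= s + l)


states = [-2, -1, 1, 0, 3, 2, 3, 5]   # illustrative finite path
S = ladder_heights(states)
z0 = visits_in_window(states, s=0, l=3)
```

Here the visits counted at a ladder height $S_n$ can include states reached after the walk has dropped back below $S_n$, which gives an intuition for why the regeneration is in general only in the wide sense.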
4.3 Extension of the Theorems from the Classical Case
The stationarity result (Theorem 3.1) holds in the wide-sense case:
Theorem 4.1. Suppose $(Z,S)$ is wide-sense regenerative with $E[X_1] < \infty$.
Then $(Z^*,S^*)$ in Theorem 2.1 is a stationary version of $(Z,S)$. Further, if
$X_1$ is lattice with span $d$, then $(Z^{**},S^{**})$ in Theorem 2.2 is a periodically
stationary version of $(Z,S)$ with period $d$.
PROOF. In the proof of Theorem 3.1 replace the delays by the delay lengths
and the cycle $C_1$ by its length $X_1$ to obtain the desired result. □
Distributional coupling is even more useful in the wide-sense case than in
the classical case. Note that if two independent versions of a wide-sense
regenerative process regenerate at the same time, then we can in general
not switch from one to the other without changing the distribution of the
process. This is because the future after regeneration is not independent of
the past. However, after a simultaneous wide-sense regeneration the
processes do continue in the same way distributionally (since the simultaneous
regeneration only has to do with the past regeneration times and thus does
not affect the future), and this is all we need to have a distributional exact
coupling.
More generally, in order to obtain a distributional exact coupling it is
reasonable to expect that (as in the classical case) we only need to be able
to stop the two sequences of regeneration times in such a way that the two
stopped regeneration times have the same distribution.
In fact, the key coupling result (Theorem 3.2) extends to the wide-sense
case:
Theorem 4.2. Theorem 3.2 holds with (Z,S) wide-sense regenerative.
PROOF. The proof of Theorem 3.2 needs no modification to cover the wide-
sense case. □
Also, the analogue of the strong Markov property extends to the wide-sense
case.
Lemma 4.1. Let $(Z,S)$ be wide-sense regenerative. Suppose $K$ is a
stopping time with respect to $S$ or, more generally, $K$ is a randomized stopping
time with respect to $S$ and conditionally independent of $Z$ given $S$. Then
$\theta_{S_K}(Z,S)$ is a copy of $(Z^0,S^0)$ and is independent of $(S_0,\dots,S_K)$.
If further $(Z,S)$ is lag-$l$ regenerative for some $l \ge 0$, then in addition,
$\theta_{S_K}(Z,S)$ is independent of $((Z_s)_{s\in[0,(S_K-l)^+]},S_0,\dots,S_K)$.
Proof. In the proof of Theorem 3.2 replace $Z$ by $(Z,S)$ and replace $R$
first by $(S_0,\dots,S_K)$ to obtain the first statement and then replace $R$ by
$((Z_s)_{s\in[0,(S_K-l)^+]},S_0,\dots,S_K)$ to obtain the second statement. □
The exact coupling theorem (Theorem 3.3) holds in the wide-sense case,
except the mixing and T-triviality result:
Theorem 4.3. Let $(Z,S)$ be wide-sense regenerative and $(Z',S')$ be a
version of $(Z,S)$. Suppose either
$X_1$ is spread out,
or
$X_1$ is lattice with span $d$, and $S_0$ and $S'_0$ are both $d\mathbb{Z}$ valued.
Then the claims (a), (b), and (c) in Theorem 3.3 hold. The claim (d) holds
if $(Z,S)$ is lag-$l$ regenerative for some $l \ge 0$, but not in general.
PROOF. The proof of Theorem 3.3(a, b, c) needs no modification to cover
the wide-sense case. In order to establish (d) in the lag-$l$ case modify the
proof of (3.18) by considering $(Z,(S_{N_{t_n+l}+k})_0^\infty)$ rather than $(Z,(S_{N_{t_n}+k})_0^\infty)$.
As a counterexample to (d) in the general case take the process that is in
state 1 at all times with probability $\frac{1}{2}$ and in state 0 at all times with
probability $\frac{1}{2}$. This process is neither $\mathcal{T}$-trivial nor mixing, but it is
wide-sense regenerative with any independent renewal process as regeneration
times. □
The epsilon-coupling theorem (Theorem 3.4) also holds in the wide-sense
case, except the smooth mixing and $\mathcal{S}$-triviality result:
Theorem 4.4. Let $(Z,S)$ be wide-sense regenerative and $(Z',S')$ be a
version of $(Z,S)$. Let $U$ be uniform on $[0,1)$ and independent of $Z$ and $Z'$.
Suppose
$X_1$ is nonlattice.
Then the claims (a), (b), and (c) in Theorem 3.4 hold. The claim (d) holds
if $(Z,S)$ is lag-$l$ regenerative for some $l \ge 0$, but not in general.
Proof. The proof of Theorem 3.4(a, b, c) needs modification in only one
place to cover the wide-sense case. Rather than deducing
$P(\theta_{S_{N_t-1}}Z\in\cdot) \overset{tv}{\to} P^*(Z^0\in\cdot)$, $t\to\infty$, [this is (3.29)]
from (3.28), it suffices to observe that now this follows [like (3.28)] from
the (ii)-part of (b), since $(\theta_{S_{N_s-1}}Z)_{s\in[0,\infty)}$ is regenerative in the wide sense
with piecewise constant paths having jumps at the regeneration times $S$
and since the stationary version has marginal distribution $P^*(Z^0\in\cdot)$ at
time zero.
In order to establish (d) in the lag-$l$ case repeat the proof of (3.18)
with $\theta_t Z$ replaced by $\theta_{t+Uh}Z$ and $(Z,(S_{N_{t_n}+k})_0^\infty)$ by $(Z,(S_{N_{t_n+l}+k})_0^\infty)$. As
a counterexample to (d) in the general case again take the process that
is in state 1 at all times with probability $\frac{1}{2}$ and in state 0 at all times
with probability $\frac{1}{2}$. This process is neither $\mathcal{S}$-trivial nor smoothly mixing,
but it is wide-sense regenerative with any independent renewal process as
regeneration times. □
4.4 Existence of Regeneration Times
We shall now show that (maybe not surprisingly) we need only two
regeneration times $S_0$ and $S_1$ to have the whole sequence $S$ (see the corollaries).
Theorem 4.5. Let $Z$ be a one-sided shift-measurable stochastic process
and $S_0$ and $S_1$ random times such that $0 \le S_0 < S_1$ and
$\theta_{S_1}Z \overset{D}{=} \theta_{S_0}Z$. (4.5)
Then the underlying probability space $(\Omega,\mathcal{F},P)$ can be extended to support
a sequence of random times $S$ such that
$\theta_{S_n}(Z,S) \overset{D}{=} (Z^0,S^0)$, $n \ge 0$, (4.6)
$(Z,S_0,S_1)$ depends on $(X_2,X_3,\dots)$ only through $\theta_{S_1}Z$. (4.7)
Comment. It can be deduced from (4.6) and (4.7) that for each $n \ge 1$,
$(Z,S_0,\dots,S_n)$ depends on $(X_{n+1},X_{n+2},\dots)$ only through $\theta_{S_n}Z$.
Proof. We shall apply the transfer extension from Section 4.3 of Chapter 3
recursively. Also, the following result from Lemma 4.1 in Chapter 3 will be
used repeatedly: for any random elements $Y_0$, $Y_1$, $Y_2$, and $Y_3$ it holds that
$Y_3$ depends on $Y_2$ only through $(Y_1,Y_0)$
and on $Y_1$ only through $Y_0$
if and only if
$Y_3$ depends on $(Y_2,Y_1)$ only through $Y_0$.
Use (4.5) and the transfer extension to obtain $X_2$ such that $(Z,S_0,S_1)$
depends on $X_2$ only through $\theta_{S_1}Z$ and
$(\theta_{S_1}Z,X_2) \overset{D}{=} (Z^0,X_1)$. (4.8)
Use (4.8) and the transfer extension to obtain $X_3$ such that $(Z,S_0,S_1)$
depends on $X_3$ only through $(\theta_{S_1}Z,X_2)$ and
$(\theta_{S_1}Z,X_2,X_3) \overset{D}{=} (Z^0,X_1,X_2)$.
Repeat this countably many times to obtain a sequence $X_2,X_3,\dots$ such
that for $n \ge 1$,
$(\theta_{S_1}Z,X_2,\dots,X_{n+1}) \overset{D}{=} (Z^0,X_1,\dots,X_n)$, (4.9)
$(Z,S_0,S_1)$ depends on $X_{n+1}$ only through $(\theta_{S_1}Z,X_2,\dots,X_n)$. (4.10)
From (4.9) we obtain $\theta_{S_1}(Z,S) \overset{D}{=} (Z^0,S^0)$, which in turn implies that
$\theta_{S_n}(Z,S) \overset{D}{=} (Z^0,S^0)$ holds for all $n \ge 0$, that is, (4.6) is established.
In order to obtain (4.7), note that due to (4.10) the following claim holds
for $k = 2$:
$(Z,S_0,X_1)$ depends on $(X_2,\dots,X_k)$ only through $\theta_{S_1}Z$. (4.11)
Make the induction assumption that (4.11) holds for some $k \ge 2$.
According to (4.10), $(Z,S_0,X_1)$ depends on $X_{k+1}$ only through $(\theta_{S_1}Z,X_2,\dots,X_k)$
and, according to (4.11), on $(X_2,\dots,X_k)$ only through $\theta_{S_1}Z$. Therefore
$(Z,S_0,X_1)$ depends on $(X_2,\dots,X_{k+1})$ only through $\theta_{S_1}Z$. Thus, by
induction, (4.11) holds for all $k \ge 2$, that is, (4.7) is established. □
Corollary 4.1. In addition to (4.5), suppose $(\theta_{S_0} Z, X_1)$ is independent of
$D$ and $\theta_{S_1} Z$ is independent of $(D, C_1)$. Then $(Z, S)$ is classical regenerative.
Proof. We must show that for all $n \ge 0$,
$(D, C_1, \dots, C_n)$ is independent of $\theta_{S_n}(Z, S)$. (4.12)
Due to (4.7), $(D, C_1)$ depends on $(X_2, X_3, \dots)$ only through $\theta_{S_1} Z$, and by
assumption, $(D, C_1)$ does not depend on $\theta_{S_1} Z$. Thus $(D, C_1)$ is independent
of $(\theta_{S_1} Z, X_2, X_3, \dots)$. Thus (4.12) holds for $n = 1$.
Make the induction assumption that (4.12) holds for some $n \ge 1$. We just
established that $C_1$ is independent of $\theta_{S_1}(Z, S)$. Due to (4.6), this implies
that $C_{n+1}$ is independent of $\theta_{S_{n+1}}(Z, S)$. Due to (4.12), this implies that
$(D, C_1, \dots, C_{n+1})$ is independent of $\theta_{S_{n+1}}(Z, S)$. Thus, by induction, (4.12)
holds for all $n \ge 1$.
It remains to establish (4.12) for $n = 0$. From (4.7) it follows that
$(X_2, X_3, \dots)$ depends on $(D, (\theta_{S_0} Z, X_1))$ only through $\theta_{S_1} Z$ and thus on $D$
only through $((\theta_{S_0} Z, X_1), \theta_{S_1} Z)$. Since $\theta_{S_1} Z = \theta_{X_1} \theta_{S_0} Z$, this means that
$(X_2, X_3, \dots)$ depends on $D$ only through $(\theta_{S_0} Z, X_1)$. Since by
assumption $D$ is independent of $(\theta_{S_0} Z, X_1)$, this implies that $D$ is independent of
$(\theta_{S_0} Z, X_1, X_2, \dots)$, that is, (4.12) holds also for $n = 0$. □
Corollary 4.2. In addition to (4.5), suppose $(\theta_{S_0} Z, X_1)$ is independent of
$S_0$, and $\theta_{S_1} Z$ is independent of $(S_0, S_1)$. Then $(Z, S)$ is wide-sense
regenerative.
Proof. We must show that for all $n \ge 0$,
$(S_0, X_1, \dots, X_n)$ is independent of $\theta_{S_n}(Z, S)$.
This follows by replacing $D, C_1, C_2, \dots$ by $S_0, X_1, X_2, \dots$ in the proof of
Corollary 4.1. □
4.5 Harris Chains — Regeneration Sets — Harris Processes
A discrete-time Markov process $Z = (Z_k)_0^\infty$ with state space $(E, \mathcal{E})$ and
one-step transition probabilities $P$ (see Chapter 6, Section 3.1) is a Harris
chain if it has a regeneration set, that is, if there is a set $A \in \mathcal{E}$ such that
the hitting time of $A$
$$\tau_A = \inf\{t \ge 0 : Z_t \in A\}$$
is finite with probability one for all initial distributions and there is an
$l > 0$, a $p \in (0, 1]$, and a probability measure $\mu$ on $(E, \mathcal{E})$ such that
$$P(Z_l \in \cdot \mid Z_0 = x) = P^l(x, \cdot) \ge p\mu, \quad x \in A. \tag{4.13}$$
Note that if $E$ is finite or countable and $(Z_k)_0^\infty$ is irreducible and recurrent,
then $(Z_k)_0^\infty$ is a Harris chain. To see this, put $A = \{i\}$ with $i \in E$ arbitrary,
take some $l > 0$, and put $p = 1$ and $\mu = P(Z_l \in \cdot \mid Z_0 = i)$. More generally,
we could let $A$ be any finite set of states and take $l > 0$ large enough for
the distributions $P(Z_l \in \cdot \mid Z_0 = i)$, $i \in A$, to have a nontrivial common
component $p\mu$.
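For a finite state space the common component in (4.13) can be computed directly: the entrywise minimum of the rows $P^l(x, \cdot)$, $x \in A$, has total mass $p$ and normalizes to $\mu$. The following Python sketch illustrates this under the assumption of a finite chain; the 3-state matrix and the names `mat_pow` and `common_component` are hypothetical and only serve the illustration.

```python
from fractions import Fraction


def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(p, l):
    """Return the l-step transition matrix P^l for an integer l >= 1."""
    result = p
    for _ in range(l - 1):
        result = mat_mul(result, p)
    return result


def common_component(p, l, a):
    """Return (p_const, mu) with P^l(x, .) >= p_const * mu for all x in a.

    The entrywise minimum over x in a of the rows P^l(x, .) is the largest
    common component; its mass is p_const and its normalization is mu.
    Returns (0, None) if the rows share no mass (l is then too small).
    """
    pl = mat_pow(p, l)
    floor = [min(pl[x][y] for x in a) for y in range(len(p))]
    p_const = sum(floor)
    if p_const == 0:
        return Fraction(0), None
    return p_const, [m / p_const for m in floor]


# Hypothetical 3-state chain (illustrative numbers only).
P = [[Fraction(1, 2), Fraction(1, 2), Fraction(0)],
     [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)],
     [Fraction(0), Fraction(1, 2), Fraction(1, 2)]]

p1, mu1 = common_component(P, 1, [0, 2])   # lag l = 1
p2, mu2 = common_component(P, 2, [0, 2])   # lag l = 2
print(p1, mu1)
print(p2, mu2)
```

In this example a larger lag gives a larger constant $p$ (3/4 instead of 1/2), and for a singleton $A = \{i\}$ the function returns $p = 1$ and $\mu = P^l(i, \cdot)$, in line with the remark above.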
In order to extend regeneration sets to continuous time we need the
strong Markov property (this property automatically holds in the
discrete-time case). A random time $\tau$ is a stopping time with respect to a
continuous-time process $Z = (Z_s)_{s \in [0, \infty)}$ if for all $t \ge 0$, the event $\{\tau \le t\}$ is in the
$\sigma$-algebra generated by $(Z_s)_{s \in [0, t]}$. In particular, a measurable hitting time
is a stopping time. A shift-measurable Markov process $Z = (Z_s)_{s \in [0, \infty)}$
with semigroup of transition probabilities $P^s$, $0 \le s < \infty$, (see Chapter 6,
Section 3.1) is a strong Markov process if the Markov property holds at all
stopping times $\tau$, that is, if
$\theta_\tau Z$ depends on $(Z_s)_{s \in [0, \tau]}$ only through $Z_\tau$ and
$$P(\theta_\tau Z \in \cdot \mid Z_\tau = x) = P(Z \in \cdot \mid Z_0 = x), \quad x \in E.$$
Call a set $A \in \mathcal{E}$ a regeneration set for a strong Markov process
$Z = (Z_s)_{s \in [0, \infty)}$ if the hitting time $\tau_A$ is measurable and finite with probability
one for all initial distributions and $Z_{\tau_A} \in A$, and if there is an $l > 0$, a
$p \in (0, 1]$, and a probability measure $\mu$ on $(E, \mathcal{E})$ such that (4.13) holds.
Intuitively, (4.13) means that whenever $Z$ enters $A$, then it lag-$l$
regenerates $l$ time units later with probability $p$. The problem is that in general
we cannot tell by only observing the process itself whether regeneration
is occurring or not. In order to make the lag-$l$ regeneration observable we
shall use the splitting extension from Section 5 in Chapter 5. Note that
with
$$T_0 = \tau_A \quad \text{and} \quad T_{k+1} = \inf\{t \ge T_k + l : Z_t \in A\} \quad \text{for } k \ge 0,$$
we have [due to the strong Markov property]
$$P(Z_{T_k + l} \in \cdot \mid Z_{T_k} = x) = P^l(x, \cdot) \ge p\mu, \quad x \in A. \tag{4.14}$$
This allows us to extend the underlying probability space by conditional
splitting. Apply Corollary 5.1 in Chapter 3 recursively [in step $k$ take
$(Y_0, Y_1, Y_2) := (Z_{T_k}, (Z, I_0, \dots, I_{k-1}), Z_{T_k + l})$] to obtain a sequence of 0-1
variables $I_0, I_1, \dots$ such that for $k \ge 0$,
$(Z, I_0, \dots, I_{k-1})$ depends on $I_k$ only through $(Z_{T_k}, Z_{T_k + l})$, (4.15a)
$$P(I_k = 1 \mid Z_{T_k} = x) = p, \quad x \in A, \tag{4.15b}$$
$$P(Z_{T_k + l} \in \cdot \mid Z_{T_k} = x, I_k = 1) = \mu, \quad x \in A, \tag{4.15c}$$
$$P(I_k = 1 \mid Z_{T_k} = x, Z_{T_k + l} = y) = p\,\frac{d\mu}{dP^l(x, \cdot)}(y), \quad x \in A,\ y \in E. \tag{4.15d}$$
Put
$$S_n = T_{K_n} + l, \quad n \ge 0, \tag{4.16}$$
where
$K_n$ is the $(n+1)$th index $k$ such that $I_k = 1$.
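The splitting construction can be made concrete for a finite-state chain, where the density in (4.15d) reduces to the ratio $\mu(y)/P^l(x, y)$. The Python sketch below simulates a path, attaches an indicator $I_k$ to each visit time $T_k$, and reads off the times $S_n$ of (4.16). It is only an illustration under stated assumptions: the chain, the set $A$, and the function names are hypothetical, and the indicators are drawn independently given the path, with success probability depending only on $(Z_{T_k}, Z_{T_k + l})$.

```python
import random


def mat_power(P, l):
    """Return the l-step transition matrix P^l (plain nested lists)."""
    n = len(P)
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(l):
        R = [[sum(R[i][m] * P[m][j] for m in range(n)) for j in range(n)]
             for i in range(n)]
    return R


def simulate_path(P, z0, steps, rng):
    """Simulate Z_0, ..., Z_steps of the chain with transition matrix P."""
    path = [z0]
    for _ in range(steps):
        row = P[path[-1]]
        path.append(rng.choices(range(len(row)), weights=row)[0])
    return path


def split(path, P, A, l, p, mu, rng):
    """Attach indicators I_k to the visit times T_k; return (T, I, S)."""
    Pl = mat_power(P, l)
    T, I = [], []
    t = 0
    while t + l < len(path):
        if path[t] in A:
            x, y = path[t], path[t + l]
            # (4.15d) on a countable state space: the density of mu with
            # respect to P^l(x, .) at y is mu(y) / P^l(x, y).
            prob_one = p * mu[y] / Pl[x][y]
            T.append(t)
            I.append(1 if rng.random() < prob_one else 0)
            t += l          # T_{k+1} is searched from T_k + l onwards
        else:
            t += 1
    # S_n = T_{K_n} + l, where K_n runs through the indices with I_k = 1.
    S = [t_k + l for t_k, i_k in zip(T, I) if i_k == 1]
    return T, I, S


# Hypothetical chain with A = {0, 2}, l = 2, p = 3/4, mu = (1/6, 2/3, 1/6),
# for which P^2(x, .) >= p * mu holds for x in A.
P = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
A, l, p, mu = {0, 2}, 2, 0.75, [1 / 6, 2 / 3, 1 / 6]
rng = random.Random(0)
path = simulate_path(P, 0, 60000, rng)
T, I, S = split(path, P, A, l, p, mu, rng)
print(sum(I) / len(I))                            # close to p
print(sum(path[s] == 1 for s in S) / len(S))      # close to mu(1)
```

The empirical frequency of $I_k = 1$ should be near $p$, and the empirical law of $Z_{S_n}$ near $\mu$, which is what (4.15b) and (4.15c) assert.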
We shall show that the sequence $S$ forms lag-$l$ regeneration times for $Z$. To
those who possess the magical Markovian intuition this is rather obvious:
At the randomized stopping time $S_n$ the Markov process $Z$ has
distribution $\mu$ and is independent of its state at time $S_n - l$. Thus its
future after time $S_n$ is a version of $Z$ with initial distribution $\mu$ and
is independent of its past before time $S_n - l$.
In spite of this we shall work through the conditional independence
arguments in full detail because the Markovian intuition is delicate and can
easily go wrong.
Theorem 4.6. Let $Z$ be a discrete- or continuous-time Markov process
with a regeneration set $A$ and let $S$ be obtained by splitting as above. Then,
with $l$, $p$, and $\mu$ as at (4.13),
$I_0, I_1, \dots$ are i.i.d. with $P(I_0 = 1) = p$, (4.17)
$\theta_{S_n} Z$ is a version of $Z$ with initial distribution $\mu$ for $n \ge 0$, (4.18)
and $(Z, S)$ is lag-$l$ regenerative and the distribution of its zero-delayed
version $(Z^0, S^0)$ does not depend on the initial distribution of $Z$.
Comment. Since the cycle-lengths are greater than $l$, this implies that
although the cycles are dependent, they are only one-dependent, that is,
for each $n \ge 0$, $(D, C_1, \dots, C_n)$ and $(C_{n+2}, C_{n+3}, \dots)$ are independent.
PROOF. Fix $k \ge 0$ and put [observe that $I_k$ is left out]
$$W_k = (\theta_{T_k + l} Z, I_{k+1}, I_{k+2}, \dots),$$
$$V_k = ((Z_s)_{s \in [0, T_k]}, I_0, \dots, I_{k-1}).$$
Note that if the $K_n$ are finite, then for each $n \ge 0$, $\theta_{S_n}(Z, S)$ is obtained
from $W_{K_n}$ by the same measurable mapping as $\theta_{S_0}(Z, S)$ from $W_{K_0}$, and
$((Z_s)_{s \in [0, S_n - l]}, S_0, \dots, S_n)$ is obtained by a measurable mapping from $V_{K_n}$.
Thus if we can show that the $K_n$ are finite with probability one and that
$W_{K_n}$ and $V_{K_n}$ are independent and $W_{K_n} \overset{D}{=} W_{K_0}$, (4.19)
then it follows that $(Z, S)$ is lag-$l$ regenerative.
To prove this we shall use the following result from Lemma 4.1 in
Chapter 3 repeatedly: for any random elements $Y_0$, $Y_1$, $Y_2$, and $Y_3$ it holds that
$Y_3$ depends on $Y_2$ only through $(Y_1, Y_0)$
and on $Y_1$ only through $Y_0$
if and only if
$Y_3$ depends on $(Y_2, Y_1)$ only through $Y_0$.
For $0 \le i < k$, $(Z_{T_i}, Z_{T_i + l})$ is a measurable mapping of $(Z_s)_{s \in [0, T_k]}$, and
thus we deduce from (4.15a) that
$\theta_{T_k + l} Z$ depends on $I_i$ only through $((Z_s)_{s \in [0, T_k]}, I_0, \dots, I_{i-1})$.
Thus $\theta_{T_k + l} Z$ depends on $I_{k-1}$ only through $((Z_s)_{s \in [0, T_k]}, I_0, \dots, I_{k-2})$, and
on $I_{k-2}$ only through $((Z_s)_{s \in [0, T_k]}, I_0, \dots, I_{k-3})$, ..., and on $I_0$ only through
$(Z_s)_{s \in [0, T_k]}$, and finally [due to the strong Markov property] on $(Z_s)_{s \in [0, T_k]}$
only through $Z_{T_k}$. Thus
$\theta_{T_k + l} Z$ depends on $V_k$ only through $Z_{T_k}$. (4.20)
Due to (4.15a), $V_k$ depends on $I_k$ only through $(Z_{T_k}, \theta_{T_k + l} Z)$ and, due to
(4.20), on $\theta_{T_k + l} Z$ only through $Z_{T_k}$. Thus
$(\theta_{T_k + l} Z, I_k)$ depends on $V_k$ only through $Z_{T_k}$. (4.21)
Due to (4.15b), $I_k$ is independent of $Z_{T_k}$, and thus (4.21) implies that $I_k$
is independent of $(I_0, \dots, I_{k-1})$. This and (4.15b) yields (4.17), which in
particular implies that the $K_n$ are finite with probability one.
Due to (4.21), $\theta_{T_k + l} Z$ depends on $(V_k, I_k)$ only through $(Z_{T_k}, I_k, Z_{T_k + l})$
and, due to (4.15a), on $I_k$ only through $(Z_{T_k}, Z_{T_k + l})$, and [due to the
strong Markov property] on $Z_{T_k}$ only through $Z_{T_k + l}$. Thus
$\theta_{T_k + l} Z$ depends on $(V_k, I_k)$ only through $Z_{T_k + l}$. (4.22)
From (4.15a) it follows that for each $n \ge 1$, the pair $(V_k, I_k)$ depends on
$I_{k+n}$ only through $(\theta_{T_k + l} Z, I_{k+1}, \dots, I_{k+n-1})$, and on $I_{k+n-1}$ only through
$(\theta_{T_k + l} Z, I_{k+1}, \dots, I_{k+n-2})$, ..., and on $I_{k+1}$ only through $\theta_{T_k + l} Z$, and
finally [due to (4.22)] on $\theta_{T_k + l} Z$ only through $Z_{T_k + l}$. Thus
$W_k$ depends on $(V_k, I_k)$ only through $Z_{T_k + l}$. (4.23)
Due to (4.21), $Z_{T_k + l}$ is conditionally independent of $V_k$ given $(Z_{T_k}, I_k)$,
and due to (4.15c), $Z_{T_k + l}$ is conditionally independent of $(Z_{T_k}, I_k)$ given
the event $\{I_k = 1\}$. Thus $Z_{T_k + l}$ is conditionally independent of $V_k$ given
$\{I_k = 1\}$. Thus, due to (4.23),
$W_k$ is conditionally independent of $V_k$ given $\{I_k = 1\}$. (4.24)
This (and the fact that the event $\{K_n \ge k\}$ is in the $\sigma$-algebra generated
by $V_k$) yields the third identity in
$$\begin{aligned}
P(W_{K_n} \in \cdot,\, V_{K_n} \in \cdot,\, K_n = k)
&= P(W_k \in \cdot,\, V_k \in \cdot,\, K_n \ge k,\, I_k = 1) \\
&= P(W_k \in \cdot,\, V_k \in \cdot,\, K_n \ge k \mid I_k = 1)\, P(I_k = 1) \\
&= P(W_k \in \cdot \mid I_k = 1)\, P(V_k \in \cdot,\, K_n \ge k \mid I_k = 1)\, P(I_k = 1) \\
&= P(W_k \in \cdot \mid I_k = 1)\, P(V_{K_n} \in \cdot,\, K_n = k),
\end{aligned} \tag{4.25}$$
while for the second and fourth we have used $\{K_n = k\} = \{K_n \ge k, I_k = 1\}$.
Due to (4.15c), given the event $\{I_k = 1\}$ the conditional distribution of
$Z_{T_k + l}$ is $\mu$, and due to (4.22), $\theta_{T_k + l} Z$ depends on $I_k$ only through $Z_{T_k + l}$.
This [and the strong Markov property] yields that
given $\{I_k = 1\}$, $\theta_{T_k + l} Z$ is a version of $Z$ with initial distribution $\mu$. (4.26)
For $n \ge 1$, due to (4.15a), $I_{k+n}$ depends on $(I_k, \theta_{T_k + l} Z, I_{k+1}, \dots, I_{k+n-1})$
only through $(Z_{T_{k+n}}, Z_{T_{k+n} + l})$, and due to (4.15d), the conditional
distribution of $I_{k+n}$ given the value of $(Z_{T_{k+n}}, Z_{T_{k+n} + l})$ does not depend on $k$.
This and (4.26) yields
$$P(W_k \in \cdot \mid I_k = 1) = P(W_0 \in \cdot \mid I_0 = 1).$$
Put this into (4.25) and sum over $k \ge 0$ to obtain
$$P(W_{K_n} \in \cdot,\, V_{K_n} \in \cdot) = P(W_0 \in \cdot \mid I_0 = 1)\, P(V_{K_n} \in \cdot), \quad n \ge 0. \tag{4.27}$$
This identity yields (4.19), that is, $(Z, S)$ is lag-$l$ regenerative. Also, we
obtain from (4.27) that
$$P(\theta_{S_n} Z \in \cdot) = P(\theta_{T_0 + l} Z \in \cdot \mid I_0 = 1), \quad n \ge 0,$$
which together with (4.26) yields (4.18). Finally, due to (4.26), the conditional
distribution of $\theta_{T_0 + l} Z$ given $\{I_0 = 1\}$ does not depend on the initial distribution
of $Z$ and, due to (4.15a), neither does the conditional distribution of $(I_1, I_2, \dots)$ given
the value of $\theta_{T_0 + l} Z$ and thus, due to (4.27), neither does the distribution
of $(Z^0, S^0)$; and the proof is complete. □
Remark 4.1. Note that in the discrete-time case, when $l = 1$, then $(Z, S)$
is in fact classical regenerative (since in discrete time lag-1 regeneration
means classical regeneration).
Remark 4.2. Consider the discrete-time case and let the state space $(E, \mathcal{E})$
be Polish. The regeneration of a discrete-time Markov process $Z = (Z_k)_0^\infty$ with a
regeneration set is a generalization of the regeneration appearing in irreducible recurrent
Markov chains. The following concept is a generalization of the recurrence
property of such processes. Let $\varphi$ be a $\sigma$-finite measure on $(E, \mathcal{E})$. A
discrete-time Markov process $Z$ with state space $(E, \mathcal{E})$ is $\varphi$-recurrent if for $B \in \mathcal{E}$
such that $\varphi(B) > 0$, the hitting time $\tau_B$ is finite with probability one for
all initial distributions or, equivalently, the set $B$ is visited infinitely many
times with probability one. Note that a discrete-time irreducible recurrent
Markov chain has this property with $\varphi$ the counting measure.
In fact, $\varphi$-recurrence for some $\varphi$ is the original definition of a Harris
chain: it can be shown (see Orey (1971)) that $\varphi$-recurrence for some $\varphi$ and
the existence of a regeneration set are equivalent properties. Relying on this
deep theorem, we can rather easily add one more equivalence result in the
discrete-time version of Theorem 5.1 in Chapter 6 (see Theorem 4.6 above
and Theorem 2.2 in Chapter 5 for the 'only-if' part and Glynn (1982) for
the 'if' part):
A Markov process $Z = (Z_k)_0^\infty$ possessing a stationary distribution is
a Harris chain if and only if there is a successful shift-coupling of each
pair of differently started versions of $Z$.
A Harris chain is aperiodic if the inter-regeneration times are aperiodic
(this can be shown not to depend on the choice of $A$, $l$, and $p$ at (4.13)).
We can add one more equivalence result in the discrete-time versions of
Theorem 4.1 in Chapter 6 (see Theorem 4.6 and Theorem 3.3(a) for the
'only-if' part and Proposition 3.1.3 in Asmussen (1987) for the 'if' part):
A Markov process $Z = (Z_k)_0^\infty$ possessing a stationary distribution
is an aperiodic Harris chain if and only if there is a successful exact
coupling of each pair of differently started versions of $Z$.
Remark 4.3. Consider a continuous-time strong Markov process
$Z = (Z_s)_{s \in [0, \infty)}$ with a Polish state space and right-continuous paths having
left-hand limits. Then $Z$ is a Harris process if it is $\varphi$-recurrent for some
$\varphi$, that is, if there exists some $\sigma$-finite measure $\varphi$ on $(E, \mathcal{E})$ such that for
$B \in \mathcal{E}$ with $\varphi(B) > 0$, the total time spent by $Z$ in the set $B$ is infinite
with probability one for all initial distributions. Glynn (1994) shows that
if $Z$ has a stationary distribution, then $Z$ is a Harris process if and only if
for all initial distributions and all $A \in \mathcal{H}$,
$$P(\theta_{Ut} Z \in A) \to P(Z^* \in A), \quad t \to \infty,$$
where $Z^*$ is the stationary version of $Z$ and $U$ is uniformly distributed on
$[0, 1]$ and independent of $Z$ and $Z^*$. Thus in Theorem 5.1 of Chapter 6 we
can add that if $Z$ has a stationary distribution, then the equivalent claims
(a) through (<?) hold if and only if Z is a Harris process.
In continuous time the relation between $\varphi$-recurrence for some $\varphi$ (the
Harris process property) and the existence of a regeneration set is not
clear at this point, which is why we content ourselves with proving only
Theorem 4.6 here in the hope that this theory will be more fully developed
in the future.
4.6 The Queue GI/GI/k
We shall end this section by sketching an interesting example of lag-$l$
regeneration in queueing theory.
Consider the multiserver queueing model GI/GI/k, namely the $k$-server
queueing system, $2 \le k \le \infty$, where customers arrive at times $\tau_0 < \tau_1 <
\tau_2 < \dots$ forming a renewal process and line up to be served under the
first-come-first-served discipline with i.i.d. service times $\beta_1, \beta_2, \dots$ that are
independent of the arrival process. Let $\alpha_1, \alpha_2, \dots$ denote the i.i.d.
inter-arrival times, that is, $\alpha_n = \tau_n - \tau_{n-1}$. Let $Q_t$ denote the queue length
(that is, the number of customers in the system) at time $t$ and $R_t$ the
$k$-dimensional vector of remaining service times of the customers being served
at time $t$ ordered in decreasing manner (with zeros when some servers are
idle). Let $(Q_s, R_s)_{s \in [0, \infty)}$ be right-continuous. Let the initial conditions
$(Q_{0-}, R_{0-}, \tau_0)$ be independent of $(\alpha_1, \alpha_2, \dots, \beta_1, \beta_2, \dots)$.
In Section 3.1 above we pointed out that in the single-server case GI/GI/1
under the traffic intensity condition
$$E[\beta_1] < E[\alpha_1] \quad \text{(negative drift)}$$
the process $(Q_s, R_s)_{s \in [0, \infty)}$ is classical regenerative with the times of
arrivals to an idle system [the subsequence of $\tau_n$ such that $Q_{\tau_n -} = 0$] as
regeneration times. This is, in fact, also true in the multiserver case GI/GI/k,
$2 \le k \le \infty$, under the natural extension of the traffic intensity condition
from the single-server case:
$$E[\beta_1] < kE[\alpha_1] \quad \text{(negative drift)},$$
provided that all servers can be idle at the same time, that is, provided that
the following additional condition holds:
$$P(\beta_1 \le \alpha_1) > 0,$$
or equivalently,
there is a $y > 0$ such that $P(\beta_1 \le y) > 0$ and $P(\alpha_1 > y) > 0$.
Note that if $P(\beta_1 \le \alpha_1) = 0$, then whenever the queue length gets down
to one, the next arrival will take place before the service of the remaining
customer is completed, that is, the queue cannot empty. In the single-server
case the possibility of $P(\beta_1 \le \alpha_1) = 0$ is ruled out by $E[\beta_1] < E[\alpha_1]$, but
when only $E[\beta_1] < kE[\alpha_1]$ holds, then $P(\beta_1 \le \alpha_1) = 0$ is a possibility.
Although the arrivals to an idle system are not regeneration times in the
multi-server case when $P(\beta_1 \le \alpha_1) = 0$, it turns out (maybe surprisingly)
that there are other times where lag-$l$ regeneration occurs. Consider the
queue GI/GI/k with $2 \le k \le \infty$ and assume that
$$E[\beta_1] < kE[\alpha_1] \quad \text{and} \quad P(\beta_1 \le \alpha_1) = 0.$$
Note that $P(\beta_1 \le \alpha_1) = 0$ holds if and only if
there is a $y > 0$ such that $P(\beta_1 > y) = 1$ and $P(\alpha_1 < y) = 1$.
Due to $E[\beta_1] < kE[\alpha_1]$, we have $P(\beta_1 < \alpha_1 + \dots + \alpha_k) > 0$. This and
$P(\beta_1 \le \alpha_1) = 0$ imply that there is $1 \le i < k$ such that
$$P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0 \quad \text{and} \quad P(\beta_1 < \alpha_1 + \dots + \alpha_{i+1}) > 0.$$
Due to $P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0$, we can find an $l > 0$ such that
$$P(\alpha_1 + \dots + \alpha_i \le l) = 1. \tag{4.28}$$
Due to $P(\beta_1 < \alpha_1 + \dots + \alpha_{i+1}) > 0$, we can find an $x > 0$ such that
$$P(\beta_1 < x) > 0 \quad \text{and} \quad P(\alpha_1 > x/(i+1)) > 0. \tag{4.29}$$
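When the inter-arrival and service times take only finitely many values, the quantities $i$, $l$, and $x$ of (4.28) and (4.29) can be read off from the largest inter-arrival value and the smallest service value. The Python sketch below is a worked illustration under that finite-support assumption; the function name `lag_parameters` and the numbers are hypothetical, and only the consequence $P(\beta_1 < \alpha_1 + \dots + \alpha_k) > 0$ of the drift condition is checked, since the supports alone do not determine the means.

```python
from fractions import Fraction


def lag_parameters(alpha_support, beta_support, k):
    """Return (i, l, x) as in (4.28)-(4.29) for finitely supported laws.

    alpha_support, beta_support: positive ints or Fractions taken with
    positive probability by alpha_1 and beta_1 (assumed independent).
    """
    a_max = max(alpha_support)
    b_min = min(beta_support)
    # P(beta_1 <= alpha_1) = 0 means every service value exceeds every
    # inter-arrival value.
    if b_min <= a_max:
        raise ValueError("P(beta_1 <= alpha_1) = 0 does not hold")
    # P(beta_1 < alpha_1 + ... + alpha_j) > 0 iff b_min < j * a_max.
    if b_min >= k * a_max:
        raise ValueError("P(beta_1 < alpha_1 + ... + alpha_k) > 0 fails")
    # i is the largest j with P(beta_1 < alpha_1 + ... + alpha_j) = 0.
    i = b_min // a_max
    # Smallest l with P(alpha_1 + ... + alpha_i <= l) = 1, see (4.28).
    l = i * a_max
    # Any x with b_min < x < (i + 1) * a_max satisfies (4.29);
    # take the midpoint of that interval.
    x = Fraction(b_min + (i + 1) * a_max, 2)
    return i, l, x


# Hypothetical example: alpha_1 in {1, 2}, beta_1 in {5, 7}, k = 3 servers.
print(lag_parameters([1, 2], [5, 7], 3))
```

For these numbers the sketch gives $i = 2$, $l = 4$, and $x = 11/2$: two inter-arrival times never exceed the shortest service time 5, three of them can, and $x/(i+1) = 11/6$ lies below the largest inter-arrival value 2.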
It is not hard to show that the negative drift condition $E[\beta_1] < kE[\alpha_1]$
implies that the queue length process $(Q_s)_{s \in [0, \infty)}$ enters the state $i$ infinitely
often. Due to $P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0$, however, it never enters the
set $\{0, \dots, i-1\}$. Let $T_1 < T_2 < \dots$ be the successive entrance times of
$(Q_s)_{s \in [0, \infty)}$ into the state $i$. At time $T_m$ there are $i$ customers in the system,
and since $i < k$, they are all being served. Denote their (total) service times
by $\beta^m_{-i}, \dots, \beta^m_{-1}$. These $i$ customers arrived at the latest $i$ arrival times,
since $P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0$ implies that the latest $i$ arrivals are still
in the system. Denote these arrival times by $\tau^m_{-i} < \dots < \tau^m_{-1}$ and the next
$i + 1$ arrival times by $\tau^m_0 < \dots < \tau^m_i$. Let $\alpha^m_{-i+1}, \dots, \alpha^m_i$ be the associated
inter-arrival times. Clearly, $\alpha^m_1, \dots, \alpha^m_i$ are i.i.d., distributed as $\alpha_1$, and
independent of the past before time $\tau^m_0$. Note that $P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0$
implies that the service times $\beta^m_{-i}, \dots, \beta^m_{-1}$ are not affected by the fact that
$(Q_s)_{s \in [0, \infty)}$ enters the state $i$ at time $T_m$, that is, $\beta^m_{-i}, \dots, \beta^m_{-1}$ are i.i.d.,
distributed as $\beta_1$, and independent of $(\alpha^m_{-i+1}, \dots, \alpha^m_i)$. The inter-arrival
times $\alpha^m_{-i+1}, \dots, \alpha^m_0$ are, however, affected by this fact, but in spite of that
it is again not hard to show that, due to (4.29), the event
$$\{\beta^m_{-i} < x, \dots, \beta^m_{-1} < x,\ \alpha^m_{-i+1} > x/(i+1), \dots, \alpha^m_i > x/(i+1)\} \tag{4.30}$$
occurs for infinitely many $m$ with probability one. Let $M_0$ be the first $m$
such that it occurs and put
$S_0 = \tau^{M_0}_0 + l$, where $l$ is as at (4.28).
Recursively for $n \ge 1$, let $M_n$ be the first integer $m$ such that the event
$\{T_m > S_{n-1}\}$ and the event at (4.30) both occur and put
$$S_n = \tau^{M_n}_0 + l.$$
Immediately before time $\tau^{M_n}_0$ there are $i$ customers being served, and since
$i < k$, the service of the customer arriving at time $\tau^{M_n}_0$ starts without delay.
Since
$$\beta^{M_n}_{-i} < x \quad \text{and} \quad \alpha^{M_n}_{-i+1} + \dots + \alpha^{M_n}_{1} > x,$$
the customer that arrived at time $\tau^{M_n}_{-i}$ will have left before time $\tau^{M_n}_1$, and
thus the service of the customer arriving at time $\tau^{M_n}_1$ starts without delay.
Repeat this argument to obtain that since
$$\beta^{M_n}_{-i+j} < x \quad \text{and} \quad \alpha^{M_n}_{-i+j+1} + \dots + \alpha^{M_n}_{j+1} > x \quad \text{for } 0 \le j < i,$$
the service of the $i$ customers arriving at the times $\tau^{M_n}_0, \dots, \tau^{M_n}_{i-1}$ starts
without delay, and immediately before time $\tau^{M_n}_i$ all customers that arrived
before time $\tau^{M_n}_0$ have left. Thus immediately before time $\tau^{M_n}_i$ there are
[due to $P(\beta_1 < \alpha_1 + \dots + \alpha_i) = 0$] $i$ customers in the system, they all
arrived at or after time $\tau^{M_n}_0$, and their service started without delay. Thus
the future of the system after time $\tau^{M_n}_i$ is not affected by its past before
time $\tau^{M_n}_0$ and behaves distributionally in the same way for all $n \ge 0$. Now
note that due to $P(\alpha_1 + \dots + \alpha_i \le l) = 1$, we have $\tau^{M_n}_i \le \tau^{M_n}_0 + l$. Thus
the future of the system after time $S_n = \tau^{M_n}_0 + l$ is not affected by its past
before time $S_n - l = \tau^{M_n}_0$ and behaves distributionally in the same way
for all $n \ge 0$. Thus $(Q_s, R_s)_{s \in [0, \infty)}$ is lag-$l$ regenerative with regeneration
times $S_n$, $n \ge 0$.
5 Time-Inhomogeneous Regeneration
In this section we extend the classical regeneration concept by allowing the
regeneration to depend on the time when it occurs. This is the kind of
regeneration found in time-inhomogeneous Markov chains (Markov chains with
transition probabilities that depend on the time of transition). When such
a chain visits a recurrent reference state, it only starts anew conditionally
on the time of visit, that is, time-inhomogeneous regeneration takes place.
We shall also extend wide-sense regeneration in the same way by allowing
the future, given the time of regeneration, to be conditionally independent
not necessarily of the full past but only of the past regeneration times.
Time-inhomogeneity allows the environment in which the process
develops to change deterministically with time, as is often the case in real world
situations. For instance, the traffic intensity in actual queueing systems can
vary drastically with time of day. Adapting a periodic model with period
the day is not very helpful, since the relevant time scale is often minutes
rather than days. A time-inhomogeneous model thus seems more
appropriate. In spite of this, the mathematical theory of time-inhomogeneous
models is poorly developed. Hopefully, this section and the next three (though
abstract) will be a small contribution to such a theory.
Note that if two independent versions of a time-inhomogeneous Markov
chain enter a fixed reference state at the same time, then we can let the
two chains run together from that time on without affecting their
distributions, that is, we would have created an exact coupling. Note also that
it is not natural to look for shift-coupling or epsilon-couplings of time-
inhomogeneous Markov chains because the behaviour after regeneration
can differ drastically depending on the time of regeneration.
The same observations apply to time-inhomogeneous regenerative
processes. Therefore, our main task here is to find conditions under which two
versions of such a process can be forced to regenerate simultaneously (or
simultaneously in the distributional sense). We shall carry out the main
part of the construction of a successful (distributional) exact coupling in
the next section, but the resulting analogue of Theorems 3.3 and 4.3 is
stated in this section as Theorem 5.3.
5.1 Definitions
Call a one-sided shift-measurable stochastic process $Z = (Z_s)_{s \in [0, \infty)}$
time-inhomogeneous regenerative with regeneration times $S$ if the future after
regeneration $\theta_{S_n}(Z, S)$ depends on the past $((Z_s)_{s \in [0, S_n)}, S_0, \dots, S_n)$ only
through the time of regeneration $S_n$, and the conditional distribution of
$\theta_{S_n}(Z, S)$ given the value of $S_n$ is regular and does not depend on $n \ge 0$.
In other words, $Z$ is time-inhomogeneous regenerative with regeneration
times $S$ if there is a $(([0, \infty), \mathcal{B}[0, \infty)), (H \times L, \mathcal{H} \otimes \mathcal{L}))$ probability kernel
$p(\cdot \mid \cdot)$ such that for $n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P(\theta_{S_n}(Z, S) \in A \mid (Z_s)_{s \in [0, S_n)}, S_0, \dots, S_n) = p(A \mid S_n) \quad \text{a.s.} \tag{5.1}$$
Call the pair $(Z, S)$ time-inhomogeneous regenerative of type $p(\cdot \mid \cdot)$ if this
holds. Let the negative random variable $S_{-1}$ be such that $(Z^0, S^0)$ depends
on $(D, S_{-1})$ only through $S_0$.
Call a one-sided shift-measurable process $Z$ time-inhomogeneous
wide-sense regenerative with regeneration times $S$ if the future after regeneration
$\theta_{S_n}(Z, S)$ depends on the past regeneration times $(S_0, \dots, S_n)$ only through
the time of regeneration $S_n$, and the conditional distribution of $\theta_{S_n}(Z, S)$
given the value of $S_n$ is regular and does not depend on $n \ge 0$. In other
words, $Z$ is time-inhomogeneous wide-sense regenerative with regeneration
times $S$ if there is a $(([0, \infty), \mathcal{B}[0, \infty)), (H \times L, \mathcal{H} \otimes \mathcal{L}))$ probability kernel
$p(\cdot \mid \cdot)$ such that for $n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P(\theta_{S_n}(Z, S) \in A \mid S_0, \dots, S_n) = p(A \mid S_n) \quad \text{a.s.} \tag{5.2}$$
Call the pair $(Z, S)$ time-inhomogeneous wide-sense regenerative of type
$p(\cdot \mid \cdot)$ if this holds. Let the negative random variable $S_{-1}$ be such that
$(Z^0, S^0)$ depends on $S_{-1}$ only through $S_0$.
With $l \ge 0$, call a time-inhomogeneous wide-sense regenerative $(Z, S)$
time-inhomogeneous lag-$l$ regenerative if (5.2) can be strengthened to: for
$n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P(\theta_{S_n}(Z, S) \in A \mid (Z_s)_{s \in [0, (S_n - l)^+]}, S_0, \dots, S_n) = p(A \mid S_n) \quad \text{a.s.}; \tag{5.3}$$
and time-inhomogeneous lag-$l$+ regenerative if (5.2) can only be
strengthened to: for $n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P(\theta_{S_n}(Z, S) \in A \mid (Z_s)_{s \in [0, (S_n - l)^+)}, S_0, \dots, S_n) = p(A \mid S_n) \quad \text{a.s.}$$
Therefore, time-inhomogeneous lag-0+ regeneration is the same as
time-inhomogeneous regeneration, while lag-0 regeneration implies further that
$Z_{S_n}$ is a measurable mapping of $S_n$.
A pair $(Z', S')$ is a version of a time-inhomogeneous regenerative $(Z, S)$
if $(Z', S')$ is time-inhomogeneous regenerative of the same type as $(Z, S)$.
A pair $(Z', S')$ is a version of a time-inhomogeneous wide-sense (or lag-$l$,
or lag-$l$+) regenerative $(Z, S)$ if $(Z', S')$ is time-inhomogeneous wide-sense
(or lag-$l$, or lag-$l$+) regenerative of the same type as $(Z, S)$. Note that in
these cases the zero-delayed $(Z^0, S^0)$ is in general not a version of $(Z, S)$.
Call a wide-sense time-inhomogeneous regenerative $(Z, S)$ of type $p(\cdot \mid \cdot)$
time-homogeneous if $p(\cdot \mid s)$ does not depend on $s$, that is, if
$$p(\cdot \mid s) = p(\cdot) := P((Z^0, S^0) \in \cdot), \quad s \in [0, \infty).$$
Thus if a time-inhomogeneous regenerative $(Z, S)$ is time-homogeneous,
then it is classical regenerative. And if a time-inhomogeneous wide-sense
(lag-$l$, lag-$l$+) regenerative $(Z, S)$ is time-homogeneous, then it is
wide-sense (lag-$l$, lag-$l$+) regenerative.
5.2 The Regeneration Times S Are Time-Homogeneous Markov
If (Z,S) is time-inhomogeneous regenerative (wide-sense or not), then in
general it is neither true that the cycles are i.i.d. nor that S forms a renewal
process. However, the following holds.
Theorem 5.1. If $(Z, S)$ is time-inhomogeneous wide-sense regenerative,
then the sequence $S$ is a time-homogeneous Markov process.
PROOF. According to (5.2), for each $n \ge 0$, $(S_0, \dots, S_n)$ depends on
$\theta_{S_n}(Z, S)$ only through $S_n$ and thus on $(S_{n+1}, S_{n+2}, \dots)$ only
through $S_n$, that is, $S$ is a Markov process. Also, according to (5.2), the
conditional distribution of $(S_{n+1} - S_n, S_{n+2} - S_n, \dots)$ given the value of
$S_n$ does not depend on $n$, and therefore the conditional distribution of
$(S_{n+1}, S_{n+2}, \dots)$ given the value of $S_n$ does not depend on $n$, that is, $S$ is
time-homogeneous. □
Let $F_s$ be the conditional distribution of $X_{k+1}$ given $S_k = s$, that is, for
$s \in [0, \infty)$ and $A \in \mathcal{B}[0, \infty)$,
$$F_s(A) := p(H \times (A \times [0, \infty)^\infty) \cap L \mid s) = P(X_{k+1} \in A \mid S_k = s).$$
We shall view $S$ as a 'renewal process' that is time-inhomogeneous in the
sense that if a 'renewal' occurs at time $S_k = s$, then the next recurrence
time $X_{k+1}$ is governed by a distribution that may depend on $s$, namely $F_s$.
Call $F_s$ the recurrence distribution at $s$ and define, for $1 \le n < \infty$, the
$n$-step recurrence distribution at $s$ by
$$F_s^n(A) := p(H \times ([0, \infty)^{n-1} \times A \times [0, \infty)^\infty) \cap L \mid s)$$
$$= P(S_{k+n} - S_k \in A \mid S_k = s), \quad s \in [0, \infty),\ A \in \mathcal{B}[0, \infty).$$
Let $F_s$ and $F_s^n$ also denote the conditional distribution functions of $X_{k+1}$
and $X_{k+1} + \dots + X_{k+n}$, respectively, given $S_k = s$, that is, for $s \in [0, \infty)$
and $x \in [0, \infty)$,
$$F_s(x) := F_s([0, x]) = P(X_{k+1} \le x \mid S_k = s),$$
$$F_s^n(x) := F_s^n([0, x]) = P(X_{k+1} + \dots + X_{k+n} \le x \mid S_k = s).$$
The $n$-step transition probabilities of $S$ are
$$P(S_{k+n} \in A \mid S_k = s) = F_s^n(A - s), \quad s \in [0, \infty),\ A \in \mathcal{B}[0, \infty),$$
where $A - s := \{a - s : a \in A\}$.
If $(Z, S)$ is time-homogeneous, then $S$ is a renewal process, and we have
$F_s = F$ independently of $s$, where $F$ is the common distribution of the i.i.d.
recurrence times, and $F_s^n = F^n$, where $F^n$ is the distribution of the sum
$X_1 + \dots + X_n$ ($F^n$ is the $n$th convolution power of $F$).
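For integer-valued recurrence times the $n$-step recurrence distribution can be computed by conditioning on the first recurrence time: $F_s^n$ is the mixture over $y$, with weights $F_s(\{y\})$, of $F_{s+y}^{n-1}$ shifted by $y$. The Python sketch below illustrates this recursion under the assumption of finitely supported integer laws; the function names and the two example families `F_inhom` and `F_hom` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def n_step(F, s, n):
    """Law of S_{k+n} - S_k given S_k = s, as a dict {value: probability}.

    F(s) must return the recurrence distribution at s as a dict.
    """
    if n == 1:
        return dict(F(s))
    out = defaultdict(Fraction)
    for y, q in F(s).items():
        # After a first recurrence time y the renewal sits at s + y.
        for z, r in n_step(F, s + y, n - 1).items():
            out[y + z] += q * r
    return dict(out)


def transition(F, s, n, A):
    """P(S_{k+n} in A | S_k = s): the F_s^n-mass of the v with s + v in A."""
    return sum(q for v, q in n_step(F, s, n).items() if s + v in A)


HALF = Fraction(1, 2)


def F_inhom(s):
    """Hypothetical law: from even s step 1 or 2, from odd s always step 1."""
    return {1: HALF, 2: HALF} if s % 2 == 0 else {1: Fraction(1)}


def F_hom(s):
    """Time-homogeneous case: the same law at every s."""
    return {1: HALF, 2: HALF}


print(n_step(F_inhom, 0, 2))
print(n_step(F_inhom, 1, 2))
print(n_step(F_hom, 0, 2))
```

In the homogeneous case the recursion reproduces the convolution power of $F$, while in the inhomogeneous case the two-step law visibly depends on the starting time $s$.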
5.3 Examples
Let $S = (S_k)_0^\infty$ be a discrete-time Markov process with state space $[0, \infty)$,
increasing strictly to infinity. Put $N_{s-} = \lim_{t \uparrow s} N_t$. Then $(S_{N_{s-}})_{s \in [0, \infty)}$
is time-inhomogeneous lag-0 regenerative with regeneration times $S$, but
certainly $(S_{N_{s-}})_{s \in [0, \infty)}$ is not time-homogeneous, not even when $S$ is a
renewal process. Also, $(S_{N_{s-}}, A_s, B_s, D_s, U_s)_{s \in [0, \infty)}$ is time-inhomogeneous
lag-0+ (but not lag-0) regenerative with regeneration times $S$. Moreover,
for any $l > 0$, the process $(S_{N_{s-}}, A_s, B_s, D_s, U_s, N_{s+l} - N_s)_{s \in [0, \infty)}$ is
time-inhomogeneous lag-$l$ regenerative. If $Z$ is wide-sense regenerative (or lag-$l$
regenerative) with regeneration times $S$, then so is the stochastic process
$(Z_s, S_{N_{s-}}, A_s, B_s, D_s, U_s, N_{s+l} - N_s)_{s \in [0, \infty)}$.
Consider a discrete-time Markov process $Y = (Y_k)_0^\infty$ with state space $\mathbb{R}$
and with $\lim_{k \to \infty} Y_k = \infty$. Put $S_n = Y_{K_n}$, where $K_0 = \inf\{k \ge 0 : Y_k \ge 0\}$
and, recursively for $n \ge 1$, $K_n = \inf\{k \ge 0 : Y_k > S_{n-1}\}$. Take $l > 0$ and
let $Z = (Z_s)_{s \in [0, \infty)}$ be the process with $Z_s$ the number of times the Markov
process $Y$ visits the interval $(s, s + l]$. Then $(Z, S)$ is time-inhomogeneous
wide-sense regenerative, but in general not lag-$l$.
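The record-value construction in the last example is easy to trace on a finite path. The Python sketch below computes the indices $K_n$, the regeneration times $S_n = Y_{K_n}$, and the visit counts $Z_s$ for a given sequence of values; the function names and the sample path are hypothetical and stand in for a realization of the Markov process $Y$.

```python
def ladder_times(Y):
    """Return (K, S) for a finite path Y.

    K_0 is the first k with Y_k >= 0, K_n the first k with Y_k > S_{n-1},
    and S_n = Y_{K_n}. One pass suffices, because every value seen before
    a new record is at most the current record.
    """
    K, S = [], []
    for k, y in enumerate(Y):
        is_record = (y >= 0) if not S else (y > S[-1])
        if is_record:
            K.append(k)
            S.append(y)
    return K, S


def visits(Y, s, l):
    """Z_s: the number of k with Y_k in the interval (s, s + l]."""
    return sum(1 for y in Y if s < y <= s + l)


# Hypothetical path drifting to infinity with occasional down-steps.
Y = [-2, 1, 0, 3, 2, 6, 5, 8]
K, S = ladder_times(Y)
print(K, S)
print([visits(Y, s, 2) for s in S])
```

Here the value 2 is visited after the record 3 has been set but lies in the window $(1, 3]$ attached to the earlier record 1, which illustrates why the past of $Z$ before $S_n - l$ can carry information about the future and lag-$l$ regeneration fails in general.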
A time-inhomogeneous Markov chain $Z = (Z_s)_{s \in [0, \infty)}$ with a
recurrent state is time-inhomogeneous regenerative with regeneration times $S$
formed by the successive entrances to this state. More generally, let $Z$ be a
time-inhomogeneous continuous-time general state space shift-measurable
Markov process, let $A$ be a recurrent set and let the time-homogeneous
space-time process $(Z_s, s)_{s \in [0, \infty)}$ be strong Markov. If the transition
probabilities of $Z$ are the same from all states in $A$, then $Z$ is time-inhomogeneous
regenerative with regeneration times $S$ formed by the successive entrances
to $A$.
Still more generally, Theorem 4.6 has a time-inhomogeneous counterpart.
Let $Z = (Z_s)_{s \in [0, \infty)}$ be a time-inhomogeneous general state space
shift-measurable Markov process such that the space-time process $(Z_s, s)_{s \in [0, \infty)}$
is strong Markov. Suppose $Z$ has a set of states $A$ such that $\tau_A$ (the
hitting time of $A$) is measurable and finite with probability one for all initial
distributions and $Z_{\tau_A} \in A$, and such that for some $l > 0$, $p \in (0, 1]$ and a
probability kernel $\mu(\cdot, \cdot)$,
$$P(Z_{t+l} \in \cdot \mid Z_t = x) \ge p\mu(t, \cdot), \quad x \in A,\ t \in [0, \infty).$$
Then, with $T_0 = \tau_A$ and $T_{k+1} = \inf\{t \ge T_k + l : Z_t \in A\}$ for $k \ge 0$, we
have
$$P(Z_{T_k + l} \in \cdot \mid Z_{T_k} = x, T_k = t) \ge p\mu(t, \cdot), \quad x \in A,\ t \in [0, \infty).$$
This allows us to extend the underlying probability space by conditional
splitting: apply Theorem 5.1 in Chapter 3 recursively to obtain i.i.d. 0-1
variables $I_0, I_1, \dots$ such that for $k \ge 0$,
$(Z, I_0, \dots, I_{k-1})$ depends on $I_k$ only through $(Z_{T_k}, T_k, Z_{T_k + l})$,
$P(I_k = 1 \mid Z_{T_k}, T_k) = p$ and $P(Z_{T_k + l} \in \cdot \mid Z_{T_k}, T_k, I_k = 1) = \mu(T_k, \cdot)$.
Let $K_n$ be the $(n+1)$th index $k$ such that $I_k = 1$. Conditionally on the
randomized stopping time $S_n = T_{K_n} + l$, the time-inhomogeneous Markov
process $Z$ has conditional distribution $\mu(S_n, \cdot)$ at time $S_n$ and is
conditionally independent of its state at time $S_n - l$. Thus, conditionally on $S_n$, the
future of $Z$ after time $S_n$ is independent of the past before time $S_n - l$,
and the conditional distribution of the future given the value of $S_n$ does
not depend on $n$. This argument can be sharpened along the lines of the
proof of Theorem 4.6. Thus $Z$ is time-inhomogeneous lag-$l$ regenerative.
Finally, consider the following time-inhomogeneous version of the GI/GI/k
queueing model, $1 \le k \le \infty$: customers arrive to a $k$-server station at times
forming a Markov sequence with state space $[0, \infty)$ and increasing strictly
to infinity, and line up to be served under the first-come-first-served
discipline with service times that depend on the time of arrival and/or the time
when the service starts. A simple special case is the time-inhomogeneous
version of the M/GI/k queue obtained by allowing the Poisson arrivals
(M stands for memoryless, the Poisson process property) to be
nonstationary. If the system empties infinitely often with probability one, then the
queue length and ordered remaining service times process $(Q_s, R_s)_{s \in [0, \infty)}$
is time-inhomogeneous regenerative with the times of arrivals to an idle
system as regeneration times. If the system cannot empty but $(Q_s)_{s \in [0, \infty)}$
has a recurrent state $i < k$ and the event at (4.30) occurs for infinitely
many $m$, then the argument in Section 4.6 shows that $(Q_s, R_s)_{s \in [0, \infty)}$ is
time-inhomogeneous lag-$l$ regenerative.
5.4 The Key Coupling Result
The key coupling result from the time-homogeneous case (Theorems 3.2
and 4.2) extends as follows to the time-inhomogeneous case. We leave out
the random variable R this time, since we are not going to need it for
epsilon-coupling.
Theorem 5.2. Let $(Z, S)$ be time-inhomogeneous wide-sense regenerative
of type $p(\cdot \mid \cdot)$. Let $(\hat{S}, \hat{K})$ be such that
$$\hat{S} \overset{D}{=} S, \tag{5.4a}$$
$\hat{K}$ is a randomized stopping time with respect to $\hat{S}$. (5.4b)
Then the probability space $(\Omega, \mathcal{F}, P)$ on which $(Z, S)$ is defined can be
extended to support a $K$ such that
$$(S, K) \overset{D}{=} (\hat{S}, \hat{K}), \tag{5.5a}$$
$K$ is a randomized stopping time with respect to $S$, (5.5b)
$K$ is conditionally independent of $Z$ given $S$. (5.5c)
Moreover, with $T = S_K$,
$$T \overset{D}{=} \hat{S}_{\hat{K}}, \tag{5.6a}$$
$$P(\theta_T Z \in \cdot \mid T) = p(\cdot \mid T) \quad \text{a.s.} \tag{5.6b}$$
Proof. Apply the transfer extension in Section 4.5 of Chapter 3 to obtain
(5.5a) and (5.5c) from (5.4a). From (5.4b) and (5.5a) it follows that (5.5b)
holds. From (5.5a) it follows that (5.6a) holds. Finally, (5.6b) holds due to
(5.5b), (5.5c), and the following lemma. □
Lemma 5.1. Let $(Z,S)$ be time-inhomogeneous wide-sense regenerative of
type $p(\cdot|\cdot)$. Suppose $K$ is a stopping time with respect to $S$ or, more
generally, $K$ is a randomized stopping time with respect to $S$ and conditionally
independent of $Z$ given $S$. Then
$$P(\theta_{S_K}(Z,S) \in \cdot \mid S_0, \dots, S_K) = p(\cdot|S_K) \quad \text{a.s.}$$
If further $(Z,S)$ is lag-$l$ regenerative for some $l \ge 0$, then
$$P(\theta_{S_K}(Z,S) \in \cdot \mid (Z_s)_{s\in[0,(S_K-l)^+]}, S_0, \dots, S_K) = p(\cdot|S_K) \quad \text{a.s.}$$
Proof. The event $\{K = n\}$ depends on $(Z,S)$ only through $S$ [since
$K$ is conditionally independent of $Z$ given $S$] and on $S$ only through
$(S_0, \dots, S_n)$ [since $K$ is a randomized stopping time with respect to $S$].
Thus $\{K = n\}$ depends on $(Z,S)$ only through $(S_0, \dots, S_n)$. Thus $\{K = n\}$
depends on $\theta_{S_n}(Z,S)$ only through $(S_0, \dots, S_n)$. Since $(S_0, \dots, S_n)$ depends
on $\theta_{S_n}(Z,S)$ only through $S_n$, this implies that $\{K = n\}$ and $(S_0, \dots, S_n)$
depend on $\theta_{S_n}(Z,S)$ only through $S_n$. This yields the second equality in
the following calculation: with $A \in \mathcal{H} \otimes \mathcal{L}$, $B \in \bigcup_{i=1}^{\infty} \mathcal{B}[0,\infty)^i$ and $n \ge 0$,
$$\begin{aligned}
P(\theta_{S_K}(Z,S) \in A,\ & (S_0, \dots, S_K) \in B,\ K = n) \\
&= P(\theta_{S_n}(Z,S) \in A,\ (S_0, \dots, S_n) \in B,\ K = n) \\
&= E[P(\theta_{S_n}(Z,S) \in A \mid S_n)\, P((S_0, \dots, S_n) \in B, K = n \mid S_n)] \\
&= E[p(A|S_n)\, E[\mathbf{1}_{\{(S_0,\dots,S_n)\in B,\, K=n\}} \mid S_n]] \quad \text{(due to (5.2))} \\
&= E[p(A|S_n)\, \mathbf{1}_{\{(S_0,\dots,S_n)\in B,\, K=n\}}] \\
&= E[p(A|S_K)\, \mathbf{1}_{\{(S_0,\dots,S_K)\in B,\, K=n\}}].
\end{aligned}$$
Sum over $n$ to obtain that for $A \in \mathcal{H} \otimes \mathcal{L}$ and $B \in \bigcup_{i=1}^{\infty} \mathcal{B}[0,\infty)^i$,
$$P(\theta_{S_K}(Z,S) \in A,\ (S_0, \dots, S_K) \in B) = E[p(A|S_K)\, \mathbf{1}_{\{(S_0,\dots,S_K)\in B\}}],$$
that is, the first claim of the lemma holds. In order to obtain the second
claim replace $(S_0, \dots, S_K)$ by $((Z_s)_{s\in[0,(S_K-l)^+]}, S_0, \dots, S_K)$ in the above
argument. □
5.5 Exact Coupling
In the time-homogeneous case we established the existence of a successful
exact coupling (Theorems 3.3 and 4.3) assuming only that the cycle-lengths
are either spread out or lattice with delay lengths supported by that same
lattice. The proof was based on coupling results for renewal processes from
Chapter 3 (Theorem 6.1) and Chapter 2 (Theorem 7.2), which in turn relied
on the Ornstein coupling, that is, on the idea of coupling two versions of a
random walk in such a way that the pairwise difference of their step-lengths
is bounded, symmetric, and 'strongly aperiodic'. The random walk formed
by the difference of the coupled random walks is then recurrent, and hence
the two coupled random walks eventually meet.
The Ornstein coupling method does not apply naturally in the time-
inhomogeneous case: if the two versions of a random walk are replaced
by two versions of a discrete-time Markov process, then a pair of step-
lengths (increments) in general cannot be coupled in such a way that their
difference is bounded, and even if this could be accomplished, it still is far
from implying that the difference process is recurrent.
The classical coupling method, however, works smoothly in the time-
inhomogeneous case: if two independent time-inhomogeneous regenerative
processes (wide-sense or not) regenerate at the same time, then they behave
in the same way distributionally from that time onward.
Now recall that in order to establish that the classical coupling was
successful for irreducible aperiodic Markov chains (Chapter 2, Section 3) we
needed not only aperiodicity but also positive recurrence, that is, we needed
the condition that the cycle-lengths have finite expectation. Analogously for
time-inhomogeneous regenerative processes we shall need not only a
generalization of the conditions in Theorems 3.3 and 4.3 (that the cycle-lengths
are either spread out or lattice with delay times supported by that same
lattice) but also a condition that generalizes this finite mean cycle-length
condition.
The coupling construction will be carried out in the next section, but we
shall state the resulting (partial) extension of Theorems 3.3 and 4.3 already
at this point. More detailed coupling and convergence results are presented
in the next section.
Theorem 5.3. Let $(Z,S)$ be time-inhomogeneous wide-sense regenerative
and $(Z',S')$ be a version of $(Z,S)$. Suppose either there is an $n \ge 1$ and a
subprobability density $f$ on $[0,\infty)$ such that $\int f(x)\,dx > 0$ and
$$\inf_{s\in[0,\infty)} F_s^{n}(B) \ge \int_B f(x)\,dx, \quad B \in \mathcal{B}, \tag{5.6}$$
or there is a $d > 0$ and a subprobability mass function $f$ on $\{d, 2d, 3d, \dots\}$
such that $f$ is aperiodic [that is, $\gcd\{k \ge 1 : f(kd) > 0\} = 1$] and
$$\inf_{s\in[0,\infty)} F_s(\{kd\}) \ge f(kd),\ k \ge 1, \text{ and } S \text{ and } S' \text{ are } d\mathbb{Z} \text{ valued.} \tag{5.7}$$
Further, suppose there is a probability distribution $F$ on $[0,\infty)$ such that
$$\inf_{s\in[0,\infty)} F_s(x) \ge F(x) \text{ for } x \in [0,\infty) \quad \text{and} \quad \int x\,F(dx) < \infty. \tag{5.8}$$
Then the following claims hold.
(a) The underlying probability space $(\Omega, \mathcal{F}, P)$ can be extended to support
finite random times $T$ and $T'$ such that $(Z, Z', T, T')$ is a successful
distributional exact coupling, that is,
$$(\theta_T Z, T) \overset{D}{=} (\theta_{T'} Z', T'). \tag{5.9}$$
Moreover, if there exists a weak-sense-regular conditional distribution
of $Z'$ given $\theta_{T'} Z'$ [this holds when $(E, \mathcal{E})$ is Polish and the paths are
right-continuous], then $(\Omega, \mathcal{F}, P)$ can be further extended to support
a copy $Z''$ of $Z'$ such that $(Z, Z'', T)$ is a successful nondistributional
exact coupling of $Z$ and $Z'$, that is,
$$Z'' \overset{D}{=} Z' \quad \text{and} \quad \theta_T Z = \theta_T Z''. \tag{5.10}$$
(b) With $\|\cdot\|$ denoting total variation we have
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty.$$
(c) With $\mathcal{T}$ the tail $\sigma$-algebra on $H$ we have
$$P(Z \in A) = P(Z' \in A), \quad A \in \mathcal{T}.$$
(d) If $(Z,S)$ is time-inhomogeneous lag-$l$ regenerative for some $l \ge 0$,
then $Z$ is $\mathcal{T}$-trivial and mixing, but not in general.
Comment. The conditions (5.6), (5.7), and (5.8) are only simple sufficient
conditions and far from being necessary. In the time-homogeneous case,
however, (5.6) means that $X_1$ is spread out, and (5.7) means that $S_0$ and
$S'_0$ are both $d\mathbb{Z}$ valued and $X_1$ is lattice with span $d$, that is, the conditions
in Theorems 3.3 and 4.3 hold and these are necessary (see Remark 3.1).
Moreover, (5.8) means in the time-homogeneous case that $E[X_1] < \infty$,
which was the condition for asymptotic stationarity.
Proof. (a) Theorems 6.1 and 6.2 in the next section yield the existence
of $(\hat S, \hat S', \hat K, \hat K')$ such that
$$\hat S \overset{D}{=} S \text{ and } \hat K \text{ is a randomized stopping time w.r.t. } \hat S, \tag{5.11}$$
$$\hat S' \overset{D}{=} S' \text{ and } \hat K' \text{ is a randomized stopping time w.r.t. } \hat S', \tag{5.12}$$
$$\hat S_{\hat K} \overset{D}{=} \hat S'_{\hat K'}.$$
Due to (5.11) and Theorem 5.2 above, $(\Omega, \mathcal{F}, P)$ can be extended to support
a $T$ such that
$$T \overset{D}{=} \hat S_{\hat K} \quad \text{and} \quad P(\theta_T Z \in \cdot \mid T) = p(\cdot|T) \ \text{a.s.}$$
Due to (5.12) and Theorem 5.2, $(\Omega, \mathcal{F}, P)$ can be further extended to
support a $T'$ such that
$$T' \overset{D}{=} \hat S'_{\hat K'} \quad \text{and} \quad P(\theta_{T'} Z' \in \cdot \mid T') = p(\cdot|T') \ \text{a.s.}$$
Combine $\hat S_{\hat K} \overset{D}{=} \hat S'_{\hat K'}$ and $T \overset{D}{=} \hat S_{\hat K}$ and $T' \overset{D}{=} \hat S'_{\hat K'}$ to obtain $T \overset{D}{=} T'$. Combine
this and $P(\theta_T Z \in \cdot \mid T) = p(\cdot|T)$ and $P(\theta_{T'} Z' \in \cdot \mid T') = p(\cdot|T')$ to obtain
(5.9). Apply Theorem 3.2 in Chapter 4 to obtain $Z''$ such that (5.10) holds.
(b) Use (a) and Theorem 9.4 in Chapter 4 to obtain (b).
(c) Use (a) and Theorem 9.4 in Chapter 4 to obtain (c).
(d) Assume that $(Z,S)$ is time-inhomogeneous lag-$l$ regenerative for some
$l \ge 0$. Due to Theorem 2.1 in Chapter 6, $Z$ is $\mathcal{T}$-trivial and mixing if we
can establish that
$$\|P(\theta_t Z \in \cdot \mid Z \in B) - P(\theta_t Z \in \cdot)\| \to 0, \quad t \to \infty, \tag{5.13}$$
for all $B$ of the form
$$B = \{z \in H : z_{t_1} \in A_1, \dots, z_{t_n} \in A_n\},$$
where $n \ge 1$, $0 \le t_1 < \cdots < t_n$, and $A_1, \dots, A_n \in \mathcal{E}$.
In order to prove (5.13), note that $(Z, (S_{N_{t_n+l}+k})_0^\infty)$ is a version of the
lag-$l$ regenerative $(Z,S)$ [see Lemma 5.1] and that the event $\{Z \in B\}$ is
in the $\sigma$-algebra generated by $(Z_s)_{s\in[0,t_n]}$. Since the $m$th regeneration time
$S_{N_{t_n+l}+m}$ is greater than $t_n + l$, it follows from this and the definition of
time-inhomogeneous lag-$l$ regeneration [see (5.3)] that the event $\{Z \in B\}$
and the past of $(Z, (S_{N_{t_n+l}+k})_0^\infty)$ up to time $S_{N_{t_n+l}+m} - l$ depend on the
future of $(Z, (S_{N_{t_n+l}+k})_0^\infty)$ after time $S_{N_{t_n+l}+m}$ only through $S_{N_{t_n+l}+m}$. Thus
a pair $(Z', S')$ with distribution $P((Z, (S_{N_{t_n+l}+k})_0^\infty) \in \cdot \mid Z \in B)$ is a version
of $(Z,S)$. Thus (b) yields (5.13).
Thus $Z$ is $\mathcal{T}$-trivial and mixing in the lag-$l$ case. According to
Theorem 4.3(d), this need not hold in the general wide-sense case. □
5.6 Condition (5.8) Is Stronger Than Uniform Integrability
A family of random variables $Y_s$, $s \in [0,\infty)$, is uniformly integrable if
$$\sup_{s\in[0,\infty)} E[Y_s \mathbf{1}_{\{Y_s > x\}}] \to 0, \quad x \to \infty. \tag{5.14}$$
Recall that $\overset{D}{\le}$ denotes stochastic domination [see Section 3 in Chapter 1].
If $Y_s$ has the distribution $F_s$, then (5.8) can be rewritten as follows: there
is a random variable $Y$ such that
$$Y_s \overset{D}{\le} Y \text{ for } s \in [0,\infty) \quad \text{and} \quad E[Y] < \infty. \tag{5.15}$$
If (5.15) holds, then $Y_s \mathbf{1}_{\{Y_s > x\}} \overset{D}{\le} Y \mathbf{1}_{\{Y > x\}}$ for all $s \in [0,\infty)$, and thus
$$\sup_{s\in[0,\infty)} E[Y_s \mathbf{1}_{\{Y_s > x\}}] \le E[Y \mathbf{1}_{\{Y > x\}}] \to 0, \quad x \to \infty.$$
Thus (5.15) implies uniform integrability. The converse is not true, however,
as the following counterexample shows. Thus the condition (5.8) is strictly
stronger than uniform integrability.
Example 5.1. Let $Y$ be a random variable on $[2,\infty)$ with distribution
function
$$F(x) = 1 - \frac{2\log 2}{x \log x}, \quad 2 \le x < \infty.$$
Note that $\int_0^\infty P(Y > x)\,dx \ge \int_2^\infty \frac{2\log 2}{x\log x}\,dx = \infty$, and thus [see Lemma 5.2
below]
$$E[Y] = \infty.$$
For $s \in [0,\infty)$, define $Y_s$ by
$$Y_s = s \text{ if } Y > s \quad \text{and} \quad Y_s = 2 \text{ if } Y \le s.$$
Then $E[Y_s \mathbf{1}_{\{Y_s > x\}}] = sP(Y > s)$ for $s > x$, and $E[Y_s \mathbf{1}_{\{Y_s > x\}}] = 0$ for $s \le x$,
and thus, for $x \ge 2$,
$$\sup_{s\in[0,\infty)} E[Y_s \mathbf{1}_{\{Y_s > x\}}] = xP(Y > x) = \frac{2\log 2}{\log x} \to 0, \quad x \to \infty.$$
Thus the family $Y_s$, $s \in [0,\infty)$, is uniformly integrable. On the other hand,
$P(Y_s > x) = P(Y > s)$ for $s > x$, and $P(Y_s > x) = 0$ for $s \le x$, and thus
$$\sup_{s\in[0,\infty)} P(Y_s > x) = P(Y > x), \quad x \in [2,\infty).$$
Since $E[Y] = \infty$, this shows that there can be no finite-mean random
variable dominating all the $Y_s$ stochastically, that is, (5.15) cannot hold.
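The two quantities in Example 5.1 can be evaluated in closed form, which makes the contrast easy to see numerically. The following is only an illustrative sketch: the function names `tail_Y`, `ui_sup`, and `truncated_mean` are hypothetical, and the closed form in `truncated_mean` assumes the antiderivative $2\log 2 \cdot \log\log x$ of $2\log 2/(x\log x)$.

```python
import math

def tail_Y(x):
    """P(Y > x) for the distribution of Example 5.1 (equal to 1 for x < 2)."""
    return 1.0 if x < 2 else 2 * math.log(2) / (x * math.log(x))

def ui_sup(x):
    """sup over s of E[Y_s 1{Y_s > x}] = x P(Y > x), for x >= 2."""
    return x * tail_Y(x)

def truncated_mean(upper):
    """Integral of P(Y > x) over [0, upper], for upper >= 2.

    The integrand is 1 on [0, 2]; on [2, upper] the integral equals
    2 log 2 * (log log upper - log log 2).
    """
    loglog = math.log(math.log(upper)) - math.log(math.log(2))
    return 2 + 2 * math.log(2) * loglog

if __name__ == "__main__":
    for x in (10.0, 1e3, 1e6, 1e12):
        # first column decreases to 0, second column keeps growing
        print(x, ui_sup(x), truncated_mean(x))
```

The first quantity decreases like $1/\log x$, while the truncated mean grows like $\log\log x$, so the divergence of $E[Y]$ is very slow but unbounded.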
Lemma 5.2. For any nonnegative random variable $Y$ it holds that
$$E[Y] = \int_0^\infty P(Y > x)\,dx.$$
Proof. We have $Y = \int_0^\infty \mathbf{1}_{\{Y > x\}}\,dx$, and taking expectation [and
interchanging expectation and integration] yields the desired result. □
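For an integer-valued $Y$ the integrand $P(Y > x)$ is constant on each interval $[k, k+1)$, so the integral in Lemma 5.2 reduces to the sum of $P(Y > k)$ over $k \ge 0$. The sketch below checks this with exact rational arithmetic; the function names and the sample mass function are illustrative assumptions.

```python
from fractions import Fraction

def mean_direct(pmf):
    """E[Y] computed as the sum of k * P(Y = k)."""
    return sum(k * p for k, p in pmf.items())

def mean_from_tail(pmf):
    """Integral of P(Y > x) over [0, inf) for a nonnegative integer-valued Y.

    P(Y > x) is constant on [k, k + 1), so the integral is the sum of
    P(Y > k) over k = 0, 1, ..., max(Y) - 1.
    """
    top = max(pmf)
    return sum(sum(p for j, p in pmf.items() if j > k) for k in range(top))

# hypothetical mass function on {0, 1, 3}
pmf = {0: Fraction(1, 4), 1: Fraction(1, 4), 3: Fraction(1, 2)}
```

Both functions return the same rational number for this mass function, as the lemma asserts.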
5.7 Stationarity => Time-Homogeneity
We shall now show that under proper time-inhomogeneity there is no
stationary version.
Theorem 5.4. Let (Z, S) be time-inhomogeneous wide-sense regenerative.
If (Z, S) is stationary, then (Z, S) is time-homogeneous wide-sense
regenerative.
Proof. Take $t \in [0,\infty)$, $n \ge 0$, $A \in \mathcal{H} \otimes \mathcal{L}$, and let $p(\cdot|\cdot)$ be the type of
$(Z,S)$. We must show that $\theta_{S_n}(Z,S)$ does not depend on $S_n$, that is, we
must show that $p(A|S_n)$ is a.s. constant. Since $N_{t-} + n$ is a stopping time
with respect to $S$, we have [see Lemma 5.1]
$$P(\theta_{S_{N_{t-}+n}}(Z,S) \in A \mid S_{N_{t-}+n}) = p(A|S_{N_{t-}+n}) \quad \text{a.s.} \tag{5.16}$$
By stationarity, $(\theta_{S_{N_{t-}+n}}(Z,S),\ S_{N_{t-}+n} - t) \overset{D}{=} (\theta_{S_n}(Z,S),\ S_n)$, and thus
$$P(\theta_{S_{N_{t-}+n}}(Z,S) \in A \mid S_{N_{t-}+n} - t) = p(A|S_{N_{t-}+n} - t) \quad \text{a.s.} \tag{5.17}$$
Since the left-hand sides of (5.16) and (5.17) are a.s. equal, so are the
right-hand sides, and since $S_{N_{t-}+n} - t \overset{D}{=} S_n$, we obtain
$$p(A|t + S_n) = p(A|S_n) \ \text{a.s.}, \quad t \in [0,\infty),\ n \ge 0,\ A \in \mathcal{H} \otimes \mathcal{L}. \tag{5.18}$$
This implies $E[\int_0^\infty |p(A|t + S_n) - p(A|S_n)|\, P(S_n \in dt)] = 0$. Thus we can
find an $s \in [0,\infty)$ such that
$$\int_0^\infty |p(A|t + s) - p(A|s)|\, P(S_n \in dt) = 0.$$
Thus $p(A|s + S_n) = p(A|s)$ a.s. This and (5.18) with $t = s$ yield $p(A|S_n) =
p(A|s)$ a.s., that is, $p(A|S_n)$ is a.s. constant as desired. □
5.8 Asymptotic Stationarity => Asymptotic Time-Homogeneity
A comparison of Theorem 5.3 with Theorems 3.3 and 4.3 shows that the
only difference (apart from the finite moment condition) is that
Theorem 5.3 contains no claim about asymptotic stationarity or about
asymptotic periodic stationarity. In the light of Theorem 5.4 this is rather natural.
We shall now show further that if $(Z,S)$ is asymptotically stationary in
total variation, then it is asymptotically time-homogeneous in the sense that
$p(A|t + \cdot)$ is an $L_1$ constant in the limit as $t \to \infty$ for each $A \in \mathcal{H} \otimes \mathcal{L}$ and
the convergence is uniform in $A$.
Theorem 5.5. Let $(Z,S)$ be time-inhomogeneous wide-sense regenerative
of type $p(\cdot|\cdot)$. Suppose there is a pair $(Z^*, S^*)$ such that
$$\theta_t(Z,S) \xrightarrow{tv} (Z^*, S^*), \quad t \to \infty. \tag{5.19}$$
Then $(Z^*, S^*)$ is stationary and time-homogeneous wide-sense regenerative,
and
$$\sup_{A \in \mathcal{H}\otimes\mathcal{L}} \int_0^h |p(A|t + s) - P(\theta_{S_0^*}(Z^*, S^*) \in A)|\,ds \to 0, \quad t \to \infty, \tag{5.20}$$
for all $h > 0$.
Proof. First note that $(Z^*, S^*)$ is stationary, since (5.19) implies that as
$t \to \infty$,
$$\theta_s(Z^*, S^*) \xleftarrow{tv} \theta_s\theta_t(Z,S) = \theta_{t+s}(Z,S) \xrightarrow{tv} (Z^*, S^*), \quad s \in [0,\infty).$$
Now fix an arbitrary $A \in \mathcal{H} \otimes \mathcal{L}$. Due to (5.19), we have, for each $n \ge 0$,
$$(\theta_{S_{N_{t-}+n}}(Z,S),\ S_{N_{t-}} - t, \dots, S_{N_{t-}+n} - t) \xrightarrow{tv} (\theta_{S_n^*}(Z^*, S^*),\ S_0^*, \dots, S_n^*), \quad t \to \infty.$$
Since $N_{t-} + n$ is a stopping time with respect to $S$, we have [see Lemma 5.1]
$$P(\theta_{S_{N_{t-}+n}}(Z,S) \in A \mid S_{N_{t-}} - t, \dots, S_{N_{t-}+n} - t) = p(A|S_{N_{t-}+n}) \quad \text{a.s.}$$
Applying Lemma 5.3 below yields that as $t \to \infty$,
$$E[|p(A|t + S_n^*) - P(\theta_{S_n^*}(Z^*, S^*) \in A \mid S_0^*, \dots, S_n^*)|] \to 0. \tag{5.21}$$
This implies that $p(A|t + S_n^*)$ converges in probability as $t \to \infty$ for all
$n \ge 0$. Therefore (see Ash (1972), Theorem 2.5.3) there is a sequence
$t_{00} < t_{01} < \cdots \to \infty$ such that $p(A|t_{0k} + S_0^*)$ converges a.s. as $k \to \infty$,
and a subsequence $t_{10} < t_{11} < \cdots$ of $(t_{00}, t_{01}, \dots)$ such that $p(A|t_{1k} + S_1^*)$
converges a.s. as $k \to \infty$, and so on. Thus $p(A|t_{kk} + S_n^*)$ converges a.s. as
$k \to \infty$ for all $n \ge 0$. In other words, for each $n \ge 0$ there is a Borel set
$B_n$ such that $P(S_n^* \in B_n) = 1$ and $p^*(A|s) := \lim_{k\to\infty} p(A|t_{kk} + s)$ exists
for all $s \in B_n$. Put $p^*(A|s) := P((Z^*, S^*) \in A)$ for $s \notin \bigcup_0^\infty B_n$ to obtain
$p(A|t_{kk} + S_n^*) \to p^*(A|S_n^*)$ a.s. as $k \to \infty$ for all $n \ge 0$. This and (5.21)
yield
$$P(\theta_{S_n^*}(Z^*, S^*) \in A \mid S_0^*, \dots, S_n^*) = p^*(A|S_n^*) \quad \text{a.s. for all } n \ge 0.$$
Therefore, $(Z^*, S^*)$ is time-inhomogeneous wide-sense regenerative, and
since $(Z^*, S^*)$ is stationary, Theorem 5.4 yields that $(Z^*, S^*)$ is
time-homogeneous wide-sense regenerative.
To establish (5.20) use Lemma 5.3 below and the time-homogeneity of
$(Z^*, S^*)$ to obtain that as $t \to \infty$,
$$\sup_{A \in \mathcal{H}\otimes\mathcal{L}} \int_0^\infty |p(A|t + s) - P(\theta_{S_0^*}(Z^*, S^*) \in A)|\, P(S_0^* \in ds) \to 0. \tag{5.22}$$
Now, $S_0^*$ is the delay length of the stationary renewal process $S^*$ and thus
has a nonincreasing density (see Theorem 2.1). Thus there are $a$ and $b > 0$
such that
$$\int_0^\infty |p(A|t + s) - P(\theta_{S_0^*}(Z^*, S^*) \in A)|\, P(S_0^* \in ds)
\ge a \int_0^b |p(A|t + s) - P(\theta_{S_0^*}(Z^*, S^*) \in A)|\,ds, \quad A \in \mathcal{H} \otimes \mathcal{L}.$$
This and (5.22) show that (5.20) holds for $h = b$. Thus (5.20) holds for
$h = kb$ for all $k \ge 1$. Since the left-hand side of (5.20) is nondecreasing in
$h$, this implies (5.20) for all $h > 0$. □
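Lemma 5.3. For each $t \in [0,\infty)$, let $(V_t, W_t)$ be a random pair in some
measurable product space $(E_1, \mathcal{E}_1) \otimes (E_2, \mathcal{E}_2)$. Suppose there exists a regular
version $q_t(\cdot|\cdot)$ of $P(V_t \in \cdot \mid W_t = \cdot)$. If there is a pair $(V, W)$ such that
$(V_t, W_t) \xrightarrow{tv} (V, W)$ as $t \to \infty$, then
$$\sup_{A \in \mathcal{E}_1} E[|q_t(A|W) - P(V \in A \mid W)|] \to 0, \quad t \to \infty. \tag{5.23}$$
In particular, if $q_t(\cdot|\cdot)$ does not depend on $t$, $q_t(\cdot|\cdot) = q(\cdot|\cdot)$ say, then $q(\cdot|\cdot)$
is a version of $P(V \in \cdot \mid W = \cdot)$.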
Proof. Take $A \in \mathcal{E}_1$, let $q_\infty(A|\cdot)$ be a version of $P(V \in A \mid W = \cdot)$, and
put $B^+ = \{w \in E_2 : q_\infty(A|w) \ge q_t(A|w)\}$. Then
$$\begin{aligned}
E[\mathbf{1}_{\{W \in B^+\}} |q_\infty(A|W) - q_t(A|W)|]
&= P(V \in A, W \in B^+) - P(V_t \in A, W_t \in B^+) \\
&\quad + \int_{B^+} q_t(A|w)\, P(W_t \in dw) - \int_{B^+} q_t(A|w)\, P(W \in dw).
\end{aligned}$$
The first difference on the right-hand side of this identity is dominated by
$\frac{1}{2}\|P((V,W) \in \cdot) - P((V_t, W_t) \in \cdot)\|$, and the second difference is dominated
by $\frac{1}{2}\|P(W_t \in \cdot) - P(W \in \cdot)\|$. Thus
$$E[\mathbf{1}_{\{W \in B^+\}} |q_\infty(A|W) - q_t(A|W)|] \le \|P((V_t, W_t) \in \cdot) - P((V,W) \in \cdot)\|.$$
With $B^- = \{w \in E_2 : q_\infty(A|w) < q_t(A|w)\}$ we obtain in the same way
$$E[\mathbf{1}_{\{W \in B^-\}} |q_\infty(A|W) - q_t(A|W)|] \le \|P((V_t, W_t) \in \cdot) - P((V,W) \in \cdot)\|.$$
Add these two inequalities and take the supremum in $A \in \mathcal{E}_1$ to get (5.23).
In particular, if $q_t(A|\cdot) = q(A|\cdot)$ for $t < \infty$, then it follows from (5.23)
that $E[|q(A|W) - P(V \in A \mid W)|] = 0$, that is, $P(V \in A \mid W) = q(A|W)$ a.s.
as desired. □
6 Classical Coupling
In this section we shall use the classical coupling idea (Chapter 2) to
construct a successful distributional exact coupling under the conditions of
Theorem 5.3, that is, we shall complete the proof of Theorem 5.3.
Classical coupling in the context of time-inhomogeneous regenerative
processes means simply using the time of first simultaneous regeneration of
two independent versions of the process as a distributional coupling time.
This procedure is successful under the lattice conditions of Theorem 5.3
and can be modified to be successful under the nonlattice conditions.
Throughout this section and the next we shall write $P_s$ to indicate that
$S_0 = s$ with probability one.
6.1 Classical Coupling — The Lattice Case
We start by showing that the classical coupling is successful under the
lattice condition of Theorem 5.3.
Theorem 6.1. Let $S$ and $S'$ be independent nonnegative Markov sequences
increasing strictly to infinity with common recurrence time distributions $F_s$,
$s \in [0,\infty)$. Suppose there is a $d > 0$ and an aperiodic subprobability mass
function $f$ on $\{d, 2d, 3d, \dots\}$ such that $S$ and $S'$ are $d\mathbb{Z}$ valued and $f$ is a
mass function component of $F_s$ for all $s \in [0,\infty)$, that is,
$$F_s(\{kd\}) \ge f(kd), \quad k \ge 1.$$
Suppose further there is a probability distribution function $F$ on $[0,\infty)$ such
that $\int x\,F(dx) < \infty$ and $F$ dominates $F_s$ stochastically for all $s \in [0,\infty)$,
that is,
$$F_s(x) \ge F(x), \quad x \in [0,\infty).$$
Then the time of first simultaneous regeneration is finite with probability
one:
$$T := \inf\{t \ge 0 : B_{t-} = B'_{t-} = 0\} < \infty \quad \text{a.s.}$$
Moreover, with $K$ and $K'$ the indices of $S$ and $S'$ such that $S_K = S'_{K'} = T$,
it holds that $K$ is a randomized stopping time with respect to $S$, and $K'$ is
a randomized stopping time with respect to $S'$.
Proof. Let us start with the randomized stopping time claims. For each
$n \ge 0$, $S$ depends on the event $\{K = n\}$ only through $(S_0, \dots, S_n, S')$
[since $\{K = n\}$ is in the $\sigma$-algebra generated by $(S_0, \dots, S_n, S')$] and on
$(S_0, \dots, S_n, S')$ only through $(S_0, \dots, S_n)$ [since $S$ is independent of $S'$].
Thus $S$ depends on $\{K = n\}$ only through $(S_0, \dots, S_n)$, that is, $K$ is a
randomized stopping time with respect to $S$. In the same way we obtain
that $K'$ is a randomized stopping time with respect to $S'$. Thus it only
remains to establish that $P(T < \infty) = 1$.
For that purpose, define a Markov process $(\tau_k, \tau'_k)_0^\infty$ with state space
$[0,\infty)^2$ as follows: fix an integer $n_0 > 0$ and put (see Figure 6.1 in
Section 6.3 below)
$$(\tau_0, \tau'_0) := (S_0, S'_0)$$
and, for $k \ge 0$,
$$(\tau_{k+1}, \tau'_{k+1}) := (S_{N_{L_k-}}, S'_{N'_{L_k-}}), \quad \text{where } L_k := \tau_k \vee \tau'_k + n_0 d.$$
Put, for $k \ge 0$,
$$(\beta_{k+1}, \beta'_{k+1}) := (\tau_{k+1} - L_k,\ \tau'_{k+1} - L_k).$$
When
$$M := \inf\{k \ge 1 : (\beta_k, \beta'_k) = (0,0)\}$$
is finite, we have $\tau_M = \tau'_M$ and thus
$$T \le \tau_M. \tag{6.1}$$
In order to establish that $P(M < \infty) = 1$, note that conditionally on
$(\tau_k, \tau'_k) = (s, s')$ the random variables $\beta_{k+1}$ and $\beta'_{k+1}$ behave as the residual
lives at time $(s \vee s' + n_0 d)-$ of two independent sequences of regeneration
times with delay lengths $s$ and $s'$, respectively. Thus
$$P((\beta_{k+1}, \beta'_{k+1}) = (0,0) \mid (\tau_k, \tau'_k) = (s, s'))
= P_s(B_{(s \vee s' + n_0 d)-} = 0)\, P_{s'}(B_{(s \vee s' + n_0 d)-} = 0).$$
Due to Lemma 6.1 below, we can choose $n_0$ such that for some $p > 0$,
$$P((\beta_{k+1}, \beta'_{k+1}) = (0,0) \mid (\tau_k, \tau'_k)) \ge p^2 \ \text{a.s.}, \quad k \ge 0. \tag{6.2}$$
For $m \ge 0$, the event
$$\{M > m\} = \{(\beta_1, \beta'_1) \ne (0,0), \dots, (\beta_m, \beta'_m) \ne (0,0)\}$$
is in the $\sigma$-algebra generated by $((\tau_0, \tau'_0), \dots, (\tau_m, \tau'_m))$. Since $(\tau_k, \tau'_k)_0^\infty$ is
Markovian, this means that the event $\{(\beta_{m+1}, \beta'_{m+1}) \ne (0,0)\}$ is
conditionally independent of the event $\{M > m\}$ given $(\tau_m, \tau'_m)$. This and (6.2)
yield the inequality in
$$P(M > m+1 \mid M > m) = P((\beta_{m+1}, \beta'_{m+1}) \ne (0,0) \mid M > m) \le 1 - p^2.$$
Thus
$$P(M > k) \le (1 - p^2)^k \to 0, \quad k \to \infty,$$
that is, $P(M = \infty) = 0$, which implies $P(T < \infty) = 1$ due to (6.1). □
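The classical coupling of Theorem 6.1 can be illustrated by a small simulation. The sketch below is an assumption-laden toy, not the construction used in the proof: the recurrence laws in `WEIGHTS` are hypothetical (with $d = 1$, every $F_s$ puts mass at least 0.2 on 1 and at least 0.3 on 2, so an aperiodic mass function component exists, and all steps are at most 3, so unit mass at 3 dominates every $F_s$ stochastically), and the names `draw_step` and `first_simultaneous_regeneration` are illustrative.

```python
import random

# Hypothetical time-inhomogeneous recurrence laws on {1, 2, 3}:
# the law of the next step depends on the parity of the current time s.
WEIGHTS = {0: (0.5, 0.3, 0.2), 1: (0.2, 0.4, 0.4)}

def draw_step(s, rng):
    """Draw one recurrence time from F_s."""
    return rng.choices((1, 2, 3), weights=WEIGHTS[s % 2])[0]

def first_simultaneous_regeneration(s0, s0_prime, rng, horizon=10_000):
    """Return the first time at which both sequences regenerate.

    The two sequences are advanced independently; the one that is
    behind takes the next step. Because both sequences are strictly
    increasing, the first time the two positions agree is the first
    common regeneration time. Returns None if none occurs by `horizon`.
    """
    s, sp = s0, s0_prime
    while s != sp:
        if max(s, sp) > horizon:
            return None
        if s < sp:
            s += draw_step(s, rng)
        else:
            sp += draw_step(sp, rng)
    return s

if __name__ == "__main__":
    rng = random.Random(0)
    print(first_simultaneous_regeneration(0, 5, rng))
```

Each sequence moves according to its own current time only, so the pair behaves as two independent Markov sequences with common recurrence time distributions, started with delays `s0` and `s0_prime`.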
Lemma 6.1. Under the conditions of Theorem 6.1 there exists an integer
$n_0 \ge 1$ and a $p > 0$ determined by $f$ and $F$ such that for all $s \in [0,\infty)$,
$$P_s(B_{(s+nd)-} = 0) \ge p, \quad n \ge n_0. \tag{6.3}$$
Proof. Put $q = \sum_{k=1}^\infty f(kd)$ and let $R = (R_k)_0^\infty$ be a zero-delayed renewal
process with $d\mathbb{Z}$ valued recurrence times having probability mass function
$f/q$. Then, for $n \ge 1$,
$$P_s(B_{(s+nd)-} = 0) = \sum_{i=1}^\infty F_s^{i}(\{nd\}) \ge \sum_{i=1}^\infty q^i P(R_i = nd) =: b(n), \text{ say.} \tag{6.4}$$
The set $\{n \ge 1 : b(n) > 0\}$ is both additive [since $P(R_i = nd) > 0$ and
$P(R_j = md) > 0$ imply $P(R_{i+j} = (n+m)d) > 0$] and aperiodic [since
$b(n) \ge f(nd)$ and $f$ is aperiodic]. Thus, by Lemma 3.1(b) in Chapter 2,
there is an $n_0 \ge 1$ such that $b(n) > 0$ for $n \ge n_0$. Take $n_1 \ge 1$ such that
$F(n_1 d) > 0$, note that
$$b := \min\{b(n_0), \dots, b(n_0 + n_1 - 1)\} > 0, \tag{6.5}$$
and put
$$a(n) := \begin{cases} b, & n_0 \le n < n_0 + n_1, \\ b\,F(n_1 d) \cdots F((n - n_0)d), & n_0 + n_1 \le n < \infty. \end{cases}$$
We shall show by induction that for $n \ge n_0$,
$$\inf_{s\in[0,\infty)} P_s(B_{(s+nd)-} = 0) \ge a(n). \tag{6.6}$$
Due to (6.4) and (6.5), this is true for each $n \in \{n_0, \dots, n_0 + n_1 - 1\}$.
Suppose (6.6) holds for all $n \in \{n_0, \dots, m\}$, where $m$ is some integer such
that $m \ge n_0 + n_1 - 1$. Since $a(\cdot)$ is nonincreasing, this induction hypothesis
yields
$$\inf_{s\in[0,\infty)} P_s(B_{(s+nd)-} = 0) \ge a(m), \quad n \in \{n_0, \dots, m\}. \tag{6.7}$$
For each $s \in [0,\infty)$,
$$\begin{aligned}
P_s(B_{(s+(m+1)d)-} = 0) &= \sum_{i=1}^{m+1} F_s(\{id\})\, P_{s+id}(B_{(s+(m+1)d)-} = 0) \\
&\ge \sum_{i=1}^{m+1-n_0} F_s(\{id\})\, P_{s+id}(B_{(s+(m+1)d)-} = 0) \\
&\ge \sum_{i=1}^{m+1-n_0} F_s(\{id\})\, a(m) \quad \text{[due to (6.7)]} \\
&\ge a(m)\, F_s((m+1-n_0)d) \\
&\ge a(m)\, F((m+1-n_0)d) \quad \text{[$F$ dominates $F_s$ stochastically]} \\
&= a(m+1).
\end{aligned}$$
Thus by induction (6.6) holds for all $n \ge n_0$.
Since $a(\cdot)$ is nonincreasing, (6.6) yields that (6.3) holds with
$$p := \lim_{n\to\infty} a(n) = b \prod_{i=0}^\infty c_i, \quad \text{where } c_i := F(n_1 d + id).$$
Lemma 5.2 yields $\sum_{i=0}^\infty (1 - c_i) \le d^{-1} \int x\,F(dx)$. Thus $\int x\,F(dx) < \infty$ implies
$\sum_{i=0}^\infty (1 - c_i) < \infty$, which in turn implies $\prod_{i=0}^\infty c_i > 0$ due to Lemma 6.2
below. This and $b > 0$ yield $p > 0$ as desired. □
The following result was needed in the above proof.
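Lemma 6.2. If $c_0, c_1, \dots \in (0,1]$, then
$$\prod_{i=0}^\infty c_i > 0 \quad \text{if and only if} \quad \sum_{i=0}^\infty (1 - c_i) < \infty.$$
Proof. For all $x \in \mathbb{R}$ it holds that $e^x \ge x + 1$. Thus
$$\prod_{i=0}^\infty c_i \le \exp\Big(-\sum_{i=0}^\infty (1 - c_i)\Big),$$
and thus $\prod_{i=0}^\infty c_i > 0$ implies $\sum_{i=0}^\infty (1 - c_i) < \infty$.
In order to establish the converse let $A_0, A_1, \dots$ be independent events
with $P(A_i) = c_i$, $i \ge 0$. For $k \ge 0$,
$$\prod_{i=k}^\infty c_i = P\Big(\bigcap_{i=k}^\infty A_i\Big) = 1 - P\Big(\bigcup_{i=k}^\infty A_i^c\Big) \ge 1 - \sum_{i=k}^\infty (1 - c_i).$$
If $\sum_{i=0}^\infty (1 - c_i) < \infty$, then $\sum_{i=k}^\infty (1 - c_i) < 1$ for some $k \ge 1$, and thus
$\prod_{i=k}^\infty c_i > 0$. Thus $\sum_{i=0}^\infty (1 - c_i) < \infty$ implies $\prod_{i=0}^\infty c_i > 0$. □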
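Lemma 6.2 can be checked numerically on two sequences whose partial products are known in closed form. This is only an illustrative sketch; the function names and the two example sequences are assumptions chosen for the illustration. With $c_i = 1 - 1/(i+2)^2$ the sum of $1 - c_i$ converges and the partial products $(n+2)/(2(n+1))$ tend to $1/2$; with $c_i = 1 - 1/(i+2)$ the sum diverges and the partial products $1/(n+1)$ tend to $0$.

```python
import math

def partial_product(c, n):
    """Product of c(0), ..., c(n - 1)."""
    prod = 1.0
    for i in range(n):
        prod *= c(i)
    return prod

def partial_sum(c, n):
    """Sum of 1 - c(i) for i = 0, ..., n - 1."""
    return sum(1.0 - c(i) for i in range(n))

def summable(i):
    # 1 - c_i = 1/(i + 2)^2, a convergent series
    return 1.0 - 1.0 / (i + 2) ** 2

def not_summable(i):
    # 1 - c_i = 1/(i + 2), a divergent series
    return 1.0 - 1.0 / (i + 2)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, partial_product(summable, n), partial_product(not_summable, n))
```

In the summable case the product also sits between the two bounds used in the proof, $1 - \sum (1 - c_i)$ from below and $\exp(-\sum (1 - c_i))$ from above.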
6.2 The Lemma for the Nonlattice Case
We now turn to preparation for the coupling in the nonlattice case. The
following result is the counterpart of Lemma 6.1 [to see the similarity
replace the uniform distribution $\mu$ in (6.8) by the distribution with unit mass
at zero to obtain (6.3)].
Lemma 6.3. Let $S$ be a nonnegative Markov sequence increasing strictly
to infinity with recurrence distributions $F_s$, $s \in [0,\infty)$. Suppose there is a
subprobability density $f$ on $[0,\infty)$ such that $\int f(x)\,dx > 0$ and $f$ is a density
component of $F_s$ for all $s \in [0,\infty)$, that is,
$$F_s(B) \ge \int_B f(x)\,dx, \quad B \in \mathcal{B}[0,\infty).$$
Suppose further there is a probability distribution function $F$ on $[0,\infty)$ such
that $\int x\,F(dx) < \infty$ and $F$ dominates $F_s$ stochastically for all $s \in [0,\infty)$,
that is,
$$F_s(x) \ge F(x), \quad x \in [0,\infty).$$
Then there is a $t_0 \ge 0$, a $p \in (0,1]$, and a $c > 0$ determined by $f$ and $F$
such that for all $s \in [0,\infty)$,
$$P_s(B_{(s+t)-} \in \cdot) \ge p\mu, \quad t \ge t_0, \tag{6.8}$$
where $\mu$ is the uniform distribution on $[0, c]$.
We prove this lemma in six steps.
First step of proof. We shall start by showing that for each $h > 0$
there is a $t_h$ and a nonincreasing function $b_h(\cdot)$ determined by $f$ such that
$$\inf_{s\in[0,\infty)} E_s[N_{s+t+h} - N_{s+t}] \ge b_h(t) > 0, \quad t \ge t_h. \tag{6.9}$$
For that purpose let $a$ be such that $q := \int_0^a f(x)\,dx > 0$ and let $R = (R_k)_0^\infty$
be a zero-delayed renewal process with recurrence times having density
$f(x)/q$, $0 \le x \le a$. Then, for $t$ and $h \ge 0$,
$$E_s[N_{s+t+h} - N_{s+t}] = \sum_{i=1}^\infty P_s(t < X_1 + \cdots + X_i \le t + h)
\ge \sum_{i=1}^\infty q^i P(t < R_i \le t + h) =: g_h(t), \text{ say.}$$
Note that $g_h(t) > 0$ if and only if $\sum_{i=1}^\infty P(t < R_i \le t + h) > 0$. Since the
recurrence times of $R$ are bounded and nonlattice, Blackwell's renewal
theorem (Theorem 8.1 in Chapter 2) says that the expected number of renewals
in $(t, t+h]$, namely $\sum_{i=1}^\infty P(t < R_i \le t + h)$, has a strictly positive limit
as $t \to \infty$. Thus there is a $t'_h$ such that $g_h(t) > 0$ for $t \ge t'_h$. For each $h > 0$
and $t \ge t'_h$, take $n_h(t)$ such that $[n_h(t)h/2,\ n_h(t)h/2 + h/2) \subseteq [t, t+h)$ and
note that $g_h(t) \ge g_{h/2}(n_h(t)h/2)$. Put $b_h(t) := \inf_{k:\ t'_{h/2} \le kh/2 \le n_h(t)h/2}\ g_{h/2}(kh/2)$
and $t_h := t'_{h/2}$ to obtain (6.9).
Second step of proof. We shall next show that with $a > 0$ such that
$\int_a^\infty f(x)\,dx > 0$ and $b = \int_a^\infty f(x)\,dx$ we have
$$\sup_{s\in[0,\infty)} E_s[N_{s+t+h} - N_{(s+t)-}] \le \frac{1}{b} + \frac{h}{ab}, \quad t \ge 0,\ h \ge 0. \tag{6.10}$$
For that purpose, consider a random walk starting at 0 with $\{0, a\}$
valued step-lengths taking the value $a$ with probability $b$. For $k \ge 0$, let
$M_k$ be the number of times this random walk visits the state $ka$. Then
$M_0, M_1, \dots$ are i.i.d. and geometric with parameter $b$. Now, $N_{s+t+h} - N_{(s+t)-}$ is
stochastically dominated by $M_0 + \cdots + M_{[h/a]}$. Thus $E_s[N_{s+t+h} - N_{(s+t)-}] \le
(1 + h/a)E[M_0]$, and (6.10) follows by noting that $E[M_0] = 1/b$.
Third step of proof. We shall now show that with
$$Y := S_{N_{S_0+1}} - S_0 = 1 + B_{S_0+1}$$
there exists a distribution function $\hat F$ determined by $f$ and $F$ such that
$\int x\,\hat F(dx) < \infty$ and, for all $s \in [0,\infty)$,
$$P_s(Y \le y) \ge \hat F(y), \quad y \in [0,\infty). \tag{6.11}$$
In order to establish this, let $a$ and $b$ be as at (6.10) and put $d := (1 + 1/a)/b$.
Then, for $y > 1$ and $s \in [0,\infty)$,
$$\begin{aligned}
P_s(Y > y) &= \sum_{i=1}^\infty P_s(X_1 + \cdots + X_{i-1} \le 1,\ X_1 + \cdots + X_i > y) \\
&= \sum_{i=1}^\infty \int_{[0,1]} (1 - F_{s+x}(y - x))\, P_s(X_1 + \cdots + X_{i-1} \in dx) \\
&\le (1 - F(y-1)) \sum_{i=1}^\infty \int_{[0,1]} P_s(X_1 + \cdots + X_{i-1} \in dx) \qquad (6.12) \\
&= (1 - F(y-1))\, E_s[N_{s+1} - N_{s-}] \\
&\le d(1 - F(y-1)) \quad \text{[due to (6.10)].}
\end{aligned}$$
Thus (6.11) holds if we take $y_0 > 1$ such that $d(1 - F(y_0 - 1)) \le 1$ and
define $\hat F$ by
$$\hat F(y) := \begin{cases} 0, & 0 \le y < y_0, \\ 1 - d(1 - F(y-1)), & y \ge y_0. \end{cases}$$
We obtain $\int x\,\hat F(dx) < \infty$ from $\int x\,F(dx) < \infty$ and
$$\int x\,\hat F(dx) = \int_0^\infty (1 - \hat F(y))\,dy \le y_0 + d\int_1^\infty (1 - F(y-1))\,dy = y_0 + d\int x\,F(dx),$$
where the two identities are due to Lemma 5.2.
Fourth step of proof. We shall next show that for each $h > 0$ there is
an $a_h > 0$ determined by $f$ and $F$ such that with $t_h$ as at (6.9),
$$\inf_{s\in[0,\infty)} E_s[N_{s+t+h} - N_{s+t}] \ge a_h, \quad t \ge t_h. \tag{6.13}$$
Put $n_0 = [t_h] + 2$ and take $n_1 \ge 1$ such that $\hat F(n_1) > 0$. With $b_h(\cdot)$ as at
(6.9) and $\hat F$ as at (6.11) put
$$a_h(n) := \begin{cases} b_h := b_h(n_0 + n_1 - 1) > 0, & n_0 \le n < n_0 + n_1, \\ b_h\,\hat F(n_1) \cdots \hat F(n - n_0), & n_0 + n_1 \le n < \infty. \end{cases}$$
We shall show by induction that for $t \in [t_h, \infty)$,
$$\inf_{s\in[0,\infty)} E_s[N_{s+t+h} - N_{s+t}] \ge a_h([t] + 1). \tag{6.14}$$
By (6.9) and since $b_h(\cdot)$ is nonincreasing, this is true for all $t \in [t_h, n_0 + n_1 - 1)$.
Suppose (6.14) holds for all $t \in [t_h, m)$ where $m$ is some integer such that
$m \ge n_0 + n_1 - 1$. Since $a_h(\cdot)$ is nonincreasing, this induction hypothesis
yields
$$\inf_{s\in[0,\infty)} E_s[N_{s+t+h} - N_{s+t}] \ge a_h(m), \quad t \in [t_h, m). \tag{6.15}$$
For each $t \in [m, m+1)$, we now have, with $\hat F_s$ denoting the distribution
function of $Y$ under $P_s$,
$$\begin{aligned}
E_s[N_{s+t+h} - N_{s+t}] &= \int_{[0,t+h]} E_{s+x}[N_{s+t+h} - N_{s+t}]\, \hat F_s(dx) \\
&\ge \int_{[1,m+1-n_0]} E_{s+x}[N_{s+t+h} - N_{s+t}]\, \hat F_s(dx) \quad [1 \le m+1-n_0 \le t] \\
&\ge \int_{[1,m+1-n_0]} a_h(m)\, \hat F_s(dx) \quad \text{[due to (6.15)]} \\
&\ge a_h(m)\, \hat F_s(m+1-n_0) \\
&\ge a_h(m)\, \hat F(m+1-n_0) \quad \text{[$\hat F$ dominates $\hat F_s$ stochastically]} \\
&= a_h(m+1), \quad s \in [0,\infty).
\end{aligned}$$
Thus (6.14) holds also for all $t \in [m, m+1)$. Thus by induction (6.14) holds
for all $t \in [t_h, \infty)$.
Since $a_h(\cdot)$ is nonincreasing, (6.14) yields that (6.13) holds with
$$a_h := \lim_{n\to\infty} a_h(n) = b_h \prod_{i=0}^\infty c_i, \quad \text{where } c_i := \hat F(n_1 + i).$$
Lemma 5.2 yields $\sum_{i=0}^\infty (1 - c_i) \le \int x\,\hat F(dx)$. Thus $\int x\,\hat F(dx) < \infty$ implies
$\sum_{i=0}^\infty (1 - c_i) < \infty$, which in turn implies $\prod_{i=0}^\infty c_i > 0$ due to Lemma 6.2
above. This and $b_h > 0$ yield $a_h > 0$ as desired.
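Fifth step of proof. We shall next show that there are constants $a$ and
$b > 0$ determined by $f$ and $F$ only such that
$$\sum_{n=0}^\infty F_s^{n} \ge a\lambda([b,\infty) \cap \cdot), \quad \text{where } \lambda \text{ is Lebesgue measure.} \tag{6.16}$$
Note that $\int_{[0,\cdot]} f(\cdot - x) f(x)\,dx$ is a density component of $F_s^{2}$. Thus, due
to Lemma 6.1 in Chapter 3, there is an interval $[x_0, x_0 + h]$ and a constant
$c_0 > 0$ determined by $f$ only such that for all $s \in [0,\infty)$,
$$F_s^{2} \ge c_0 \lambda([x_0, x_0 + h] \cap \cdot).$$
This yields the last step in the following: for $A \in \mathcal{B}[0,\infty)$ and $s \in [0,\infty)$,
$$\begin{aligned}
\sum_{n=0}^\infty F_s^{n}(A) \ge \sum_{n=2}^\infty F_s^{n}(A)
&= \sum_{n=2}^\infty \int_{[0,\infty)} F_{s+y}^{2}(A - y)\, F_s^{n-2}(dy) \\
&= \int_{[0,\infty)} F_{s+y}^{2}(A - y) \sum_{n=0}^\infty F_s^{n}(dy) \\
&\ge \int_{[0,\infty)} c_0 \lambda([x_0, x_0 + h] \cap (A - y)) \sum_{n=0}^\infty F_s^{n}(dy).
\end{aligned}$$
The last term has a density $g_s(\cdot)$,
$$\begin{aligned}
g_s(x) &= \int_{[0,\infty)} c_0 \mathbf{1}_{[x_0, x_0+h]}(x - y) \sum_{n=0}^\infty F_s^{n}(dy) \\
&= c_0 \int_{[x - x_0 - h,\ x - x_0]} \sum_{n=0}^\infty F_s^{n}(dy) \\
&= c_0 \sum_{n=0}^\infty F_s^{n}([x - x_0 - h,\ x - x_0]) \\
&= c_0 E_s[N_{s+x-x_0} - N_{(s+x-x_0-h)-}] \\
&\ge c_0 a_h \quad \text{for } x \ge t_h + x_0 + h \text{ [due to (6.13)].}
\end{aligned}$$
Put $a = c_0 a_h$ and $b = t_h + x_0 + h$ to obtain (6.16).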
Sixth, and last step of proof. We are now ready to establish (6.8).
For $A \in \mathcal{B}[0,\infty)$ and $t \ge b$, we have
$$\begin{aligned}
P_s(B_{(s+t)-} \in A) &= \sum_{i=1}^\infty P_s(X_1 + \cdots + X_{i-1} < t,\ X_1 + \cdots + X_i - t \in A) \\
&= \sum_{i=1}^\infty \int_{[0,t)} F_{s+y}(A + t - y)\, F_s^{i-1}(dy) \\
&= \int_{[0,t)} F_{s+y}(A + t - y) \sum_{i=1}^\infty F_s^{i-1}(dy),
\end{aligned}$$
and applying (6.16) yields the first inequality in
$$P_s(B_{(s+t)-} \in A) \ge a \int_{[b,t)} F_{s+y}(A + t - y)\,dy \ge a \int_{[b,t)} \int_{A+t-y} f(x)\,dx\,dy.$$
The last term has a density $g_t(\cdot)$,
$$g_t(x) = a \int_{[b,t)} f(x + t - y)\,dy = a \int_{[x,\ x+t-b)} f(u)\,du, \quad x \in [0,\infty).$$
Take $c > 0$ and $t_0 \ge b + c$ such that $\int_{[c,\ t_0-b)} f(u)\,du > 0$ to get
$$g_t(x) \ge a \int_{[c,\ t_0-b)} f(u)\,du > 0, \quad x \in [0, c],\ t \ge t_0.$$
Put $p = ca \int_{[c,\ t_0-b)} f(u)\,du$ to obtain the desired result (6.8). □
6.3 Modified Classical Coupling — Nonlattice Case
We shall now modify the proof of Theorem 6.1 to obtain finite distributional
coupling times under the nonlattice condition of Theorem 5.3.
Theorem 6.2. Let $S$ and $S'$ be independent nonnegative Markov sequences
increasing strictly to infinity with common recurrence time distributions $F_s$,
$s \in [0,\infty)$. Suppose there is a subprobability density $f$ on $[0,\infty)$ such that
$\int f(x)\,dx > 0$ and $f$ is a density component of $F_s$ for all $s \in [0,\infty)$, that
is,
$$F_s(B) \ge \int_B f(x)\,dx, \quad B \in \mathcal{B}[0,\infty).$$
Suppose further there is a probability distribution function $F$ on $[0,\infty)$ such
that $\int x\,F(dx) < \infty$ and $F$ dominates $F_s$ stochastically for all $s \in [0,\infty)$,
that is,
$$F_s(x) \ge F(x), \quad x \in [0,\infty).$$
Then the underlying probability space can be extended to support a
randomized stopping time $K$ with respect to $S$ and a randomized stopping time $K'$
with respect to $S'$ such that
$$S_K \overset{D}{=} S'_{K'}.$$
Proof. Define a Markov process $(\tau_k, \tau'_k)_0^\infty$ with state space $[0,\infty)^2$ as
follows: let $t_0$ be as in Lemma 6.3 above and put (see Figure 6.1)
$$(\tau_0, \tau'_0) := (S_0, S'_0)$$
and, for $k \ge 0$,
$$(\tau_{k+1}, \tau'_{k+1}) := (S_{N_{L_k-}}, S'_{N'_{L_k-}}), \quad \text{where } L_k = \tau_k \vee \tau'_k + t_0.$$
[FIGURE 6.1. The coupling construction.]
Conditionally on $(\tau_k, \tau'_k) = (s, s')$, the random variables
$$\beta_{k+1} := \tau_{k+1} - L_k \quad \text{and} \quad \beta'_{k+1} := \tau'_{k+1} - L_k$$
behave as the residual lives at time $(s \vee s' + t_0)-$ of two independent
sequences of regeneration times with delay lengths $s$ and $s'$, respectively.
Thus, for $A$ and $A' \in \mathcal{B}[0,\infty)$,
$$P(\beta_{k+1} \in A,\ \beta'_{k+1} \in A' \mid (\tau_k, \tau'_k) = (s, s'))
= P_s(B_{(s \vee s' + t_0)-} \in A)\, P_{s'}(B_{(s \vee s' + t_0)-} \in A').$$
Thus, due to Lemma 6.3 above,
$$P((\beta_{k+1}, \beta'_{k+1}) \in \cdot \mid \tau_k, \tau'_k) \ge p^2 \mu \otimes \mu, \quad k \ge 0, \tag{6.17}$$
where $p \in (0,1]$ and $\mu$ is the uniform distribution on $[0, c]$ for some $c > 0$.
Intuitively, this means that with probability $p^2$ the random variables
$\beta_{k+1}$ and $\beta'_{k+1}$ are i.i.d. with distribution $\mu$ and independent of $(\tau_k, \tau'_k)$
and thus of $L_k$. Thus with probability $p^2$ the regeneration times $\tau_{k+1} =
L_k + \beta_{k+1}$ and $\tau'_{k+1} = L_k + \beta'_{k+1}$ are identically distributed. Since $\mu \otimes \mu$
governs the $\beta$-pairs independently of whether it has done so before or not,
this means that we obtain distributional coupling times in a geometric
number of trials.
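The geometric number of trials translates into an explicit tail bound: if each trial succeeds with probability $p^2$ independently of the past, then the probability of more than $k$ trials is $(1 - p^2)^k$. The sketch below, with the hypothetical helper name `trials_for_tolerance`, finds the smallest $k$ that pushes this quantity below a given tolerance; the numerical values of $p$ are illustrative only.

```python
def trials_for_tolerance(p, eps):
    """Smallest k >= 0 with (1 - p**2)**k <= eps.

    Assumes 0 < p <= 1 and 0 < eps; p**2 plays the role of the
    per-trial success probability.
    """
    q = 1.0 - p * p        # probability that a single trial fails
    k, bound = 0, 1.0
    while bound > eps:
        bound *= q
        k += 1
    return k

if __name__ == "__main__":
    for p in (0.9, 0.5, 0.1):
        print(p, trials_for_tolerance(p, 0.01))
```

The loop multiplies the failure probability one trial at a time rather than solving with logarithms, which avoids rounding issues when the bound lands exactly on the tolerance.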
This idea can be made precise by conditional splitting. Apply
Corollary 5.1 in Chapter 3 recursively [in step $k$ take $(Y_0, Y_1, Y_2) := ((\tau_k, \tau'_k),
(S, S', I_0, \dots, I_{k-1}), (\tau_{k+1}, \tau'_{k+1}))$] to obtain 0-1 variables $I_0, I_1, \dots$ such that
for $k \ge 0$,
$$(S, S', I_0, \dots, I_{k-1}) \text{ depends on } I_k \text{ only through } ((\tau_k, \tau'_k), (\tau_{k+1}, \tau'_{k+1})), \tag{6.18a}$$
and
$$P(I_k = 1 \mid \tau_k, \tau'_k) = p^2, \tag{6.18b}$$
$$P((\beta_{k+1}, \beta'_{k+1}) \in \cdot \mid (\tau_k, \tau'_k) = (s, s'),\ I_k = 1) = \mu \otimes \mu. \tag{6.18c}$$
We shall use the following result from Lemma 4.1 in Chapter 3 repeatedly:
for any random elements $Y_0$, $Y_1$, $Y_2$ and $Y_3$ it holds that
    $Y_3$ depends on $Y_2$ only through $(Y_1, Y_0)$
    and on $Y_1$ only through $Y_0$
if and only if
    $Y_3$ depends on $(Y_2, Y_1)$ only through $Y_0$.
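Fix an $m \geq 1$. For $0 \leq i < m$, $((\tau_i, \tau'_i), (\tau_{i+1}, \tau'_{i+1}))$ is a measurable mapping
of $(\tau_k, \tau'_k)_{k=0}^{m}$, and thus we deduce the following from (6.18a):
for $0 \leq i < m$, $(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}$ depends on
$I_i$ only through $((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{i-1})$.
Thus $(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}$ depends on $I_{m-1}$ only through $((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{m-2})$,
and on $I_{m-2}$ only through $((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{m-3})$, ..., and on $I_0$ only
through $(\tau_k, \tau'_k)_{k=0}^{m}$, and [since $(\tau_k, \tau'_k)_{k=0}^{\infty}$ is Markovian] on $(\tau_k, \tau'_k)_{k=0}^{m}$ only
through $(\tau_m, \tau'_m)$. Thus
$$(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty} \text{ depends on } ((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{m-1}) \text{ only through } (\tau_m, \tau'_m). \tag{6.19}$$
Due to (6.18a), $((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{m-1})$ depends on $I_m$ only through
$((\tau_m, \tau'_m), (\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty})$ and, due to (6.19), on $(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}$ only
through $(\tau_m, \tau'_m)$. Thus
$$((\tau_k, \tau'_k)_{k=0}^{m}, I_0, \dots, I_{m-1}) \text{ depends on } (I_m, (\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}) \text{ only through } (\tau_m, \tau'_m). \tag{6.20}$$
Due to (6.18a), for each $n \geq 1$, $(\tau_k, \tau'_k, I_k)_{k=0}^{m}$ depends on $I_{m+n}$ only through
$((\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}, I_{m+1}, \dots, I_{m+n-1})$, depends on $I_{m+n-1}$ only through
$((\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}, I_{m+1}, \dots, I_{m+n-2})$, ..., depends on $I_{m+1}$ only through
$(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}$, and finally [by (6.20)] depends on $(\tau_{m+k}, \tau'_{m+k})_{k=1}^{\infty}$ only
through $(\tau_m, \tau'_m, I_m)$. Thus $(\tau_{m+k}, \tau'_{m+k}, I_{m+k})_{k=1}^{\infty}$ depends on $(\tau_k, \tau'_k, I_k)_{k=0}^{m}$
only through $(\tau_m, \tau'_m, I_m)$, that is,
$$(\tau_k, \tau'_k, I_k)_{k=0}^{\infty} \text{ is a Markov process.} \tag{6.21}$$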
From (6.20) it also follows that $I_m$ depends on $I_1, \dots, I_{m-1}$ only through
$(\tau_m, \tau'_m)$. By (6.18b), $I_m$ is independent of $(\tau_m, \tau'_m)$. Thus $I_m$ does not
depend on $I_1, \dots, I_{m-1}$. Thus the $I_1, I_2, \dots$ are independent, and (6.18b)
yields that
$$\text{the } I_1, I_2, \dots \text{ are i.i.d. and } P(I_1 = 1) = p^2.$$
Thus
$$M = \inf\{k \geq 1 : I_{k-1} = 1\} \text{ is finite with probability one.}$$
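The geometric number of trials can be illustrated numerically. The following is a minimal Python sketch, not part of the proof: it assumes that the indicators are i.i.d. with success probability $p^2$ and estimates $E[M] = 1/p^2$ by simulation. The function names and the value $p = 0.5$ are illustrative assumptions.

```python
import random


def first_success_trial(success_prob, rng):
    """Return M = inf{k >= 1 : I_{k-1} = 1} for i.i.d. Bernoulli(success_prob) indicators."""
    m = 1
    # The event {rng.random() < success_prob} plays the role of {I_{m-1} = 1}.
    while rng.random() >= success_prob:
        m += 1
    return m


def mean_number_of_trials(success_prob, n_runs=20000, seed=0):
    """Monte Carlo estimate of E[M]; the exact value is 1 / success_prob."""
    rng = random.Random(seed)
    total = sum(first_success_trial(success_prob, rng) for _ in range(n_runs))
    return total / n_runs


if __name__ == "__main__":
    p = 0.5  # hypothetical value of the constant p
    print(mean_number_of_trials(p ** 2), 1 / p ** 2)
```

The smaller $p$ is, the longer the wait: the expected number of trials grows like $1/p^2$.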
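Put
$$K := N_{L_{M-1}-} \quad\text{and}\quad K' := N'_{L_{M-1}-}.$$
Since $M - 1$ is a stopping time with respect to $(\tau_k, \tau'_k, I_k)_0^{\infty}$ and since
$I_{M-1} = 1$, we obtain from (6.21), due to the strong Markov property, that
$$P((\beta_M, \beta'_M) \in \cdot \mid (\tau_{M-1}, \tau'_{M-1}) = (s, s')) = P((\beta_1, \beta'_1) \in \cdot \mid (\tau_0, \tau'_0) = (s, s'), I_0 = 1).$$
This and (6.18c) yield $P((\beta_M, \beta'_M) \in \cdot \mid \tau_{M-1}, \tau'_{M-1}) = \mu \otimes \mu$, that is,
$$\beta_M \text{ and } \beta'_M \text{ are i.i.d. with distribution } \mu, \tag{6.22a}$$
$$(\beta_M, \beta'_M) \text{ is independent of } (\tau_{M-1}, \tau'_{M-1}) \text{ and thus of } L_{M-1}. \tag{6.22b}$$
This yields the distributional identity in
$$S_K = \tau_M = L_{M-1} + \beta_M \overset{D}{=} L_{M-1} + \beta'_M = \tau'_M = S'_{K'}.$$
Thus $S_K \overset{D}{=} S'_{K'}$, as desired.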
It remains to establish the randomized stopping time claims. Fix an
$m \geq 0$. For $0 \leq i \leq m$, $((\tau_i, \tau'_i), (\tau_{i+1}, \tau'_{i+1}))$ is a measurable mapping of
$(S', (S_0, \dots, S_{N_{L_m-}}))$, and thus we deduce the following from (6.18a): for
$0 \leq i \leq m$,
$$S \text{ depends on } I_i \text{ only through } (S', (S_0, \dots, S_{N_{L_m-}}), I_0, \dots, I_{i-1}).$$
Thus $S$ depends on $I_m$ only through $(S', (S_0, \dots, S_{N_{L_m-}}), I_0, \dots, I_{m-1})$,
and on $I_{m-1}$ only through $(S', (S_0, \dots, S_{N_{L_m-}}), I_0, \dots, I_{m-2})$, ..., and on
$I_0$ only through $(S', (S_0, \dots, S_{N_{L_m-}}))$. Thus $S$ depends on $I_0, \dots, I_m$ only
through $(S', (S_0, \dots, S_{N_{L_m-}}))$. Thus
$$\{M = m + 1\} \text{ depends on } S \text{ only through } (S', (S_0, \dots, S_{N_{L_m-}})).$$
This yields the third equality in
$$\begin{aligned}
P(K = n, M = m + 1 \mid S, S') &= P(N_{L_m-} = n, M = m + 1 \mid S, S') \\
&= 1_{\{N_{L_m-} = n\}} P(M = m + 1 \mid S, S') \\
&= 1_{\{N_{L_m-} = n\}} P(M = m + 1 \mid (S_0, \dots, S_{N_{L_m-}}), S') \\
&= 1_{\{N_{L_m-} = n\}} P(M = m + 1 \mid (S_0, \dots, S_n), S'),
\end{aligned}$$
while the last equality follows from Lemma 6.4 below (take $J = N_{L_m-}$,
$Y_0 = (S', S_0)$, and $Y_k = S_k$ for $k \geq 1$). Since the event $\{N_{L_m-} = n\}$ is
in the $\sigma$-algebra generated by $(S', (S_0, \dots, S_{N_{L_m-}}))$, we obtain further the
first equality in
$$\begin{aligned}
P(K = n, M = m + 1 \mid S, S') &= P(N_{L_m-} = n, M = m + 1 \mid (S_0, \dots, S_n), S') \\
&= P(K = n, M = m + 1 \mid (S_0, \dots, S_n), S').
\end{aligned}$$
Sum over $m \geq 0$ to obtain $P(K = n \mid S, S') = P(K = n \mid (S_0, \dots, S_n), S')$.
Thus $S$ depends on $\{K = n\}$ only through $((S_0, \dots, S_n), S')$. Since $S$ does
not depend on $S'$, this means that $S$ depends on $\{K = n\}$ only through
$(S_0, \dots, S_n)$, that is, $K$ is a randomized stopping time with respect to $S$.
In the same way we obtain that $K'$ is a randomized stopping time with
respect to $S'$. □
The following well-known stopping time result was used in the above proof.
Lemma 6.4. Let $J$ be a stopping time with respect to $(Y_0, Y_1, \dots)$, where
for each $k \geq 0$, $Y_k$ is a random element in some measurable space $(E_k, \mathcal{E}_k)$.
Then
$$P(A \mid Y_0, \dots, Y_J) = P(A \mid Y_0, \dots, Y_n) \quad\text{on } \{J = n\},$$
for all events $A$ and all $n \geq 0$.
Proof. For $B \in \bigcup_{k=0}^{\infty} \mathcal{E}_0 \otimes \cdots \otimes \mathcal{E}_k$ we have
$$\begin{aligned}
P(A \cap \{(Y_0, \dots, Y_J) \in B\})
&= \sum_{n=0}^{\infty} P(A \cap \{(Y_0, \dots, Y_n) \in B, J = n\}) \\
&= \sum_{n=0}^{\infty} E[1_{\{(Y_0, \dots, Y_n) \in B\}} 1_{\{J = n\}} P(A \mid Y_0, \dots, Y_n)] \\
&= E\Big[1_{\{(Y_0, \dots, Y_J) \in B\}} \sum_{n=0}^{\infty} 1_{\{J = n\}} P(A \mid Y_0, \dots, Y_n)\Big].
\end{aligned}$$
Therefore $\sum_{n=0}^{\infty} 1_{\{J = n\}} P(A \mid Y_0, \dots, Y_n)$ is a version of $P(A \mid Y_0, \dots, Y_J)$, as
desired. □
7 The Coupling Time - Rates and Uniformity
In this section we shall take a closer look at the coupling time $T$ constructed
in the last section in order to obtain results on rates of convergence and
uniform convergence along the lines of Section 6 in Chapter 4. After
establishing a useful lemma we show that $T$ can be stochastically dominated by
a manageable random variable $\hat{T}$. This yields uniform total variation
convergence over a class of processes (Theorem 7.1). We then establish finite
moment results for $\hat{T}$, which yields rate results for the uniform convergence
(Theorem 7.2). Finally, we establish sharper moment results for $T$ itself,
which yields improved (but not uniform) rate results (Theorem 7.3). At
the end of the section we consider some consequences of this for
classical and wide-sense regenerative processes, and improve Blackwell's renewal
theorem in the spread-out case.
Throughout this section we assume that the conditions of Theorem 5.3
hold and write $P_s$ to indicate that $S_0 = s$ with probability one.
7.1 Preliminaries
In the nonlattice case let $(\tau_k, \tau'_k, I_k)_0^{\infty}$, $(\beta_k, \beta'_k)_1^{\infty}$, and $M$ be as in the proof
of Theorem 6.2 above. In order to treat the lattice case and the nonlattice
case simultaneously we shall not base our argument in the lattice case on the
proof of Theorem 6.1 but rather define $(\tau_k, \tau'_k, I_k)_0^{\infty}$, $(\beta_k, \beta'_k)_1^{\infty}$, and $M$ as in
the proof of Theorem 6.2, replacing $t_0$ and $p$ from Lemma 6.3 by $t_0 := n_0 d$
and $p$ from Lemma 6.1 and with $c = 0$ and $\mu$ the distribution having mass
1 at zero, $\mu(\{0\}) = 1$. The proof of Theorem 6.2 works in the lattice case
after this modification. Thus in both cases there are distributional coupling
times $T$ and $T'$ for $Z$ and $Z'$ such that
$$T = \tau_M. \tag{7.1}$$
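Let $(\hat\tau_k, \hat\tau'_k)_0^{\infty}$ be a Markov process with state space $[0, \infty)^2$, initial
distribution $G \otimes G'$, and transition probabilities
$$P((\hat\tau_{k+1}, \hat\tau'_{k+1}) \in \cdot \mid (\hat\tau_k, \hat\tau'_k) = (s, s')) := P((\tau_{k+1}, \tau'_{k+1}) \in \cdot \mid (\tau_k, \tau'_k) = (s, s'), I_k = 0), \quad k \geq 0.$$
Thus, with
$$(\hat\beta_{k+1}, \hat\beta'_{k+1}) := (\hat\tau_{k+1} - \hat{L}_k, \hat\tau'_{k+1} - \hat{L}_k), \quad\text{where } \hat{L}_k := \hat\tau_k \vee \hat\tau'_k + t_0,$$
we have, for $k \geq 0$,
$$\begin{aligned}
P(\hat\beta_{k+1} \in \cdot, \hat\beta'_{k+1} \in \cdot \mid (\hat\tau_k, \hat\tau'_k) = (s, s'))
&= P(\beta_{k+1} \in \cdot, \beta'_{k+1} \in \cdot \mid (\tau_k, \tau'_k) = (s, s'), I_k = 0) \\
&= \frac{P(\beta_{k+1} \in \cdot, \beta'_{k+1} \in \cdot, I_k = 0 \mid (\tau_k, \tau'_k) = (s, s'))}{1 - p^2} \\
&= \frac{P(\beta_{k+1} \in \cdot, \beta'_{k+1} \in \cdot \mid (\tau_k, \tau'_k) = (s, s')) - p^2 \mu \otimes \mu}{1 - p^2},
\end{aligned}$$
where the last identity is due to (6.18b,c). This and (6.17) yield, for $k \geq 0$,
$$P(\hat\beta_{k+1} \in \cdot, \hat\beta'_{k+1} \in \cdot \mid (\hat\tau_k, \hat\tau'_k) = (s, s')) = \frac{P_s(B_{(s \vee s' + t_0)-} \in \cdot) P_{s'}(B_{(s \vee s' + t_0)-} \in \cdot) - p^2 \mu \otimes \mu}{1 - p^2}. \tag{7.2}$$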
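Lemma 7.1. Let $V$ have the distribution $\mu$ and let $V$, $M$, and $(\hat\tau_k, \hat\tau'_k)_0^{\infty}$
be independent. Then
$$\begin{aligned}
T &= \tau_M \\
&\overset{D}{=} \hat{L}_{M-1} + V \\
&= (\hat\tau_0 \vee \hat\tau'_0 + t_0) + (\hat\beta_1 \vee \hat\beta'_1 + t_0) + \cdots + (\hat\beta_{M-1} \vee \hat\beta'_{M-1} + t_0) + V.
\end{aligned}$$
Proof. The nondistributional identities are obvious. We shall prove that
$T \overset{D}{=} \hat{L}_{M-1} + V$. From (7.1) and $\tau_M = L_{M-1} + \beta_M$ we obtain $T = L_{M-1} + \beta_M$,
and according to (6.22a,b), $\beta_M$ has the same distribution as $V$ and is
independent of $L_{M-1}$. Since $V$ is independent of $\hat{L}_{M-1}$, it only remains to
establish that $L_{M-1} \overset{D}{=} \hat{L}_{M-1}$. We shall do so below.
Due to (6.18b) and since $(\tau_k, \tau'_k, I_k)_0^{\infty}$ is Markovian, we have, for $k \geq 0$
and $i \in \{0, 1\}$,
$$P((\tau_0, \tau'_0) \in \cdot, I_0 = i) = P(I_0 = i) P((\hat\tau_0, \hat\tau'_0) \in \cdot)$$
and
$$P((\tau_{k+1}, \tau'_{k+1}) \in \cdot, I_{k+1} = i \mid (\tau_k, \tau'_k) = (s, s'), I_k = 0) = P(I_{k+1} = i) P((\hat\tau_{k+1}, \hat\tau'_{k+1}) \in \cdot \mid (\hat\tau_k, \hat\tau'_k) = (s, s')).$$
This and $\{M = m + 1\} = \{I_0 = 0, \dots, I_{m-1} = 0, I_m = 1\}$ yield
$$P((\tau_0, \tau'_0) \in \cdot, \dots, (\tau_m, \tau'_m) \in \cdot, M = m + 1) = P(M = m + 1) P((\hat\tau_0, \hat\tau'_0) \in \cdot, \dots, (\hat\tau_m, \hat\tau'_m) \in \cdot), \quad m \geq 0.$$
This and the independence of $M$ and $(\hat\tau_k, \hat\tau'_k)_0^{\infty}$ yield
$$P((\tau_{M-1}, \tau'_{M-1}) \in \cdot, M = m + 1) = P((\hat\tau_{M-1}, \hat\tau'_{M-1}) \in \cdot, M = m + 1).$$
Sum over $m \geq 0$ to obtain $P((\tau_{M-1}, \tau'_{M-1}) \in \cdot) = P((\hat\tau_{M-1}, \hat\tau'_{M-1}) \in \cdot)$.
Since $L_{M-1} = \tau_{M-1} \vee \tau'_{M-1} + t_0$ and $\hat{L}_{M-1} = \hat\tau_{M-1} \vee \hat\tau'_{M-1} + t_0$, this yields
$L_{M-1} \overset{D}{=} \hat{L}_{M-1}$ as desired. □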
7.2 Dominating T Stochastically — Uniform Convergence
For a probability distribution function $\hat{F}$ on $[0, \infty)$ with $\int x \hat{F}(dx) < \infty$, let
$G_{\hat{F}}$ denote the probability distribution function on $[0, \infty)$ having density
$(1 - \hat{F}) / \int x \hat{F}(dx)$, that is,
$$G_{\hat{F}}(x) = \frac{\int_0^x (1 - \hat{F}(y))\,dy}{\int y \hat{F}(dy)}, \quad x \in [0, \infty).$$
Note that if $\hat{F}$ is a recurrence distribution for a renewal process, then $G_{\hat{F}}$
is the stationary delay-length distribution.
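As a numerical illustration of this definition, here is a minimal Python sketch that approximates the stationary delay-length distribution function by the midpoint rule. The function name `stationary_delay_cdf`, the step count, and the exponential example are assumptions made for illustration only; for an exponential recurrence distribution the result should agree with the exponential distribution function itself.

```python
import math


def stationary_delay_cdf(survival, mean, x, steps=10000):
    """Approximate G(x) = (1/mean) * integral_0^x (1 - F(y)) dy with the midpoint rule.

    `survival(y)` must return 1 - F(y), and `mean` must equal the integral of y F(dy).
    """
    if x <= 0:
        return 0.0
    h = x / steps
    integral = sum(survival((i + 0.5) * h) for i in range(steps)) * h
    return integral / mean


if __name__ == "__main__":
    rate = 2.0  # illustrative exponential recurrence distribution
    g = stationary_delay_cdf(lambda y: math.exp(-rate * y), 1.0 / rate, 1.0)
    print(g, 1.0 - math.exp(-rate))  # the two values should nearly agree
```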
We shall now establish uniform total variation convergence under the
conditions of Theorem 5.3.
Theorem 7.1. Let $(Z, S)$ be time-inhomogeneous wide-sense regenerative
and $(Z', S')$ be a version of $(Z, S)$. Let $G$ be the distribution of $S_0$, $G'$ the
distribution of $S'_0$, and $F_s$, $s \in [0, \infty)$, the recurrence distributions. Let $\hat{G}$
be a probability distribution on $[0, \infty)$ such that
$$G(x) \geq \hat{G}(x) \quad\text{and}\quad G'(x) \geq \hat{G}(x), \quad x \in [0, \infty). \tag{7.3}$$
Suppose either there is a subprobability density $f$ on $[0, \infty)$ such that $\int f > 0$
and, for $s \in [0, \infty)$,
$$F_s(B) \geq \int_B f, \quad B \in \mathcal{B}[0, \infty), \tag{7.4}$$
or there is a $d > 0$ and an aperiodic subprobability mass function $f$ on
$\{d, 2d, 3d, \dots\}$ such that for $s \in [0, \infty)$,
$$F_s(\{kd\}) \geq f(kd),\ k \geq 1, \quad\text{and $S$ and $S'$ are $d\mathbb{Z}$ valued.} \tag{7.5}$$
Further, suppose there is a probability distribution function $\hat{F}$ on $[0, \infty)$
such that $\int x \hat{F}(dx) < \infty$ and, for $s \in [0, \infty)$,
$$F_s(x) \geq \hat{F}(x), \quad x \in [0, \infty). \tag{7.6}$$
Then there are distributional coupling times $T$ and $T'$ for $Z$ and $Z'$ and
independent random variables $M, Y_0, Y_1, \dots$ such that
  $M$ is geometric with parameter determined by $f$ and $\hat{F}$,
  $Y_0$ and $Y_1$ have the distribution $\hat{G}$,
  $Y_2, Y_3, \dots$ are i.i.d. with distribution determined by $f$ and $\hat{F}$ and
  there are finite $a$, $b$ such that $P(Y_2 > x) \leq a(1 - G_{\hat{F}}(x))$, $x \in [b, \infty)$,
and, with $\overset{D}{\leq}$ denoting stochastic domination,
$$T \overset{D}{\leq} \hat{T} := Y_0 + \cdots + Y_{3M}. \tag{7.7}$$
Moreover, with $\|\cdot\|$ denoting total variation, we have
$$\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \leq 2P(\hat{T} > t), \quad t \geq 0,$$
and thus
$$\sup{}^* \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty,$$
where $\sup^*$ means the supremum over all pairs $(Z, Z')$ of time-inhomogeneous
wide-sense regenerative processes of the same type satisfying the conditions
(7.3) through (7.6) with $\hat{G}$, $f$, and $\hat{F}$ fixed.
The total variation part of this theorem follows from (7.7); see Section 5.4
in Chapter 4. Thus it only remains to establish (7.7). We shall do so in four
steps below.
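To make the shape of the bound in (7.7) concrete, the following minimal Python sketch estimates twice the tail probability of a sum of the form $Y_0 + \cdots + Y_{3M}$ by Monte Carlo. All distributions here are illustrative stand-ins chosen for this sketch (exponential variables for $Y_0$, $Y_1$, shifted exponentials for the remaining terms, success probability 0.25 for $M$); they are not the distributions constructed in the proof.

```python
import random


def sample_t_hat(rng, success_prob):
    """Draw one value of Y_0 + ... + Y_{3M} under purely illustrative distributions."""
    m = 1
    while rng.random() >= success_prob:  # M geometric on {1, 2, ...}
        m += 1
    total = rng.expovariate(1.0) + rng.expovariate(1.0)  # Y_0 and Y_1 (stand-ins)
    for _ in range(3 * m - 1):  # Y_2, ..., Y_{3M}
        total += 1.0 + rng.expovariate(1.0)  # stand-in for the common law of Y_2, Y_3, ...
    return total


def tv_bounds(times, success_prob=0.25, n_runs=5000, seed=0):
    """Estimate 2 * P(sum > t) for each t in `times` from one common sample."""
    rng = random.Random(seed)
    samples = [sample_t_hat(rng, success_prob) for _ in range(n_runs)]
    return [2.0 * sum(s > t for s in samples) / n_runs for t in times]


if __name__ == "__main__":
    print(tv_bounds([0.0, 10.0, 25.0, 50.0, 100.0]))
```

Because the same sample is reused for every time point, the estimated bound is nonincreasing in $t$, mirroring the behavior of the exact tail probability.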
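First step of proof. We shall start by showing that there is a constant
$a_0$ determined by $f$ and $\hat{F}$ such that
$$\sup_{s, t \in [0, \infty)} P_s(B_{(s+t)-} - 1 > x) \leq a_0 (1 - G_{\hat{F}}(x)), \quad x \in [0, \infty). \tag{7.8}$$
To that end, note first that according to (6.10) there is a constant $b_0 < \infty$
determined by $f$ such that
$$\sup_{s, t \in [0, \infty)} E_s[N_{s+t+1} - N_{(s+t)-}] \leq b_0.$$
This yields the next-to-last inequality in the following: for $s, t, x \in [0, \infty)$,
$$\begin{aligned}
P_s(B_{(s+t)-} - 1 > x)
&= \sum_{i=1}^{\infty} P_s(X_1 + \cdots + X_{i-1} < t,\ X_1 + \cdots + X_i > x + 1 + t) \\
&= \sum_{i=1}^{\infty} \int_{[0, t)} (1 - F_{s+y}(x + 1 + t - y))\, P_s(X_1 + \cdots + X_{i-1} \in dy) \\
&\leq \sum_{i=1}^{\infty} \int_{[0, t)} (1 - \hat{F}(x + 1 + t - y))\, P_s(X_1 + \cdots + X_{i-1} \in dy) \\
&\leq \sum_{i=1}^{\infty} \sum_{n=1}^{\infty} (1 - \hat{F}(x + n))\, P_s(t - n < X_1 + \cdots + X_{i-1} \leq t - n + 1) \\
&= \sum_{n=1}^{\infty} (1 - \hat{F}(x + n))\, E_s[N_{s+t-n+1} - N_{s+t-n}] \\
&\leq b_0 \sum_{n=1}^{\infty} (1 - \hat{F}(x + n)) \\
&\leq b_0 \int_x^{\infty} (1 - \hat{F}(y))\,dy.
\end{aligned}$$
Put $a_0 = b_0 \int y \hat{F}(dy)$ to obtain (7.8).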
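Second step of proof. We next show that if $(V_0, V_1)$ and $(W_0, W_1)$ are
pairs of nonnegative random variables such that
$$V_0 \overset{D}{\leq} W_0$$
and, for $s$ and $x \in [0, \infty)$,
$$P(V_1 \leq x \mid V_0 = s) =: G_s(x) \geq H_s(x) := P(W_1 \leq x \mid W_0 = s), \tag{7.9}$$
then
$$V_0 + V_1 \overset{D}{\leq} W_0 + W_1. \tag{7.10}$$
This is an extension of Corollary 3.1 in Chapter 1 and we shall use the
same method of proof, the quantile coupling. Let $U_0$ and $U_1$ be independent
random variables that are uniform on $[0, 1]$, and for $s \in [0, \infty)$, let $G_s^{-1}$ and
$H_s^{-1}$ be the generalized inverses of $G_s$ and $H_s$. Then, for $s \in [0, \infty)$,
$$\begin{aligned}
P(G_s^{-1}(U_1) \in \cdot) &= P(V_1 \in \cdot \mid V_0 = s), \\
P(H_s^{-1}(U_1) \in \cdot) &= P(W_1 \in \cdot \mid W_0 = s), \\
G_s^{-1}(U_1) &\leq H_s^{-1}(U_1) \quad [\text{due to (7.9)}].
\end{aligned}$$
Let $g$ and $h$ be the generalized inverses of the distribution functions of $V_0$
and $W_0$. Then
$$\begin{aligned}
g(U_0) &\overset{D}{=} V_0, \\
h(U_0) &\overset{D}{=} W_0, \\
g(U_0) &\leq h(U_0) \quad [\text{due to } V_0 \overset{D}{\leq} W_0].
\end{aligned}$$
Due to Fact 3.1 in Chapter 6 and since $U_0$ and $U_1$ are independent, we
have, for $s \in [0, \infty)$,
$$\begin{aligned}
P(G_{g(U_0)}^{-1}(U_1) \in \cdot \mid g(U_0) = s) &= P(G_s^{-1}(U_1) \in \cdot), \\
P(H_{h(U_0)}^{-1}(U_1) \in \cdot \mid h(U_0) = s) &= P(H_s^{-1}(U_1) \in \cdot).
\end{aligned}$$
Combining all this implies
$$\begin{aligned}
g(U_0) + G_{g(U_0)}^{-1}(U_1) &\overset{D}{=} V_0 + V_1, \\
h(U_0) + H_{h(U_0)}^{-1}(U_1) &\overset{D}{=} W_0 + W_1, \\
g(U_0) + G_{g(U_0)}^{-1}(U_1) &\leq h(U_0) + H_{h(U_0)}^{-1}(U_1).
\end{aligned}$$
This yields (7.10).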
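The quantile coupling can be tried out on a concrete case. The following minimal Python sketch assumes exponential laws chosen purely for illustration: $V_0 \sim \mathrm{Exp}(2)$, $W_0 \sim \mathrm{Exp}(1)$, $V_1$ given $V_0 = s$ is $\mathrm{Exp}(2 + s)$, and $W_1 \sim \mathrm{Exp}(1)$ independent of $W_0$ (so $H_s$ does not depend on $s$, as in the application in the third step). Both sums are built from the same pair of uniforms, and the domination then holds pointwise.

```python
import math
import random


def exp_quantile(u, rate):
    """Generalized inverse of the Exp(rate) distribution function at u in [0, 1)."""
    return -math.log(1.0 - u) / rate


def coupled_sums(u0, u1):
    """Build (V_0 + V_1, W_0 + W_1) from the same pair of uniforms (u0, u1)."""
    v0 = exp_quantile(u0, 2.0)       # g(U_0): V_0 ~ Exp(2)
    w0 = exp_quantile(u0, 1.0)       # h(U_0): W_0 ~ Exp(1), hence v0 <= w0
    v1 = exp_quantile(u1, 2.0 + v0)  # conditional quantile: V_1 | V_0 = s ~ Exp(2 + s)
    w1 = exp_quantile(u1, 1.0)       # W_1 ~ Exp(1), independent of W_0
    return v0 + v1, w0 + w1


def domination_holds(n_pairs=1000, seed=1):
    """Check the pointwise inequality on n_pairs simulated pairs of uniforms."""
    rng = random.Random(seed)
    pairs = (coupled_sums(rng.random(), rng.random()) for _ in range(n_pairs))
    return all(v <= w for v, w in pairs)


if __name__ == "__main__":
    print(domination_holds())
```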
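Third step of proof. Let $Y_0, Y_1, \dots$ be independent and independent
of $M$. Let $Y_0$ and $Y_1$ have the distribution function $\hat{G}$. Let $Y_2, Y_3, \dots$ be
i.i.d. with distribution defined as follows: with $a_0$ as at (7.8) put $a :=
a_0 (1 - p^2)^{-1/2}$ and $b := c \vee (t_0 + 1)$ and let
$$P(Y_2 > x) = \begin{cases} 1, & x < b, \\ (a(1 - G_{\hat{F}}(x))) \wedge 1, & x \geq b. \end{cases}$$
We shall in this third step show by induction that for all $n \geq 0$,
$$\hat{L}_n \overset{D}{\leq} (Y_0 \vee Y_1 + Y_2) + \cdots + (Y_{3n} \vee Y_{3n+1} + Y_{3n+2}). \tag{7.11}$$
First note that (7.11) holds for $n = 0$, since $\hat{L}_0 = \hat\tau_0 \vee \hat\tau'_0 + t_0$, since $\hat{G}\hat{G}$ (the
distribution function of $Y_0 \vee Y_1$) dominates $GG'$ (the distribution function
of $\hat\tau_0 \vee \hat\tau'_0$) stochastically (that is, $GG' \geq \hat{G}\hat{G}$), and since $Y_2 \geq t_0$. Now
suppose (7.11) holds for some $n \geq 0$. Observe that
$$\hat{L}_{n+1} = \hat{L}_n + (\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1) + (t_0 + 1). \tag{7.12}$$
From (7.2) we obtain that for $x \in [0, \infty)$,
$$\begin{aligned}
P((\hat\beta_{n+1} - 1) &\vee (\hat\beta'_{n+1} - 1) \leq x \mid (\hat\tau_n, \hat\tau'_n) = (s, s')) \\
&= \frac{(1 - p)^2}{1 - p^2} P_s(B_{(s \vee s' + t_0)-} - 1 \leq x) P_{s'}(B_{(s \vee s' + t_0)-} - 1 \leq x) \\
&\quad + \frac{(1 - p)p}{1 - p^2} P_s(B_{(s \vee s' + t_0)-} - 1 \leq x)\, \mu([0, 1 + x]) \\
&\quad + \frac{(1 - p)p}{1 - p^2} P_{s'}(B_{(s \vee s' + t_0)-} - 1 \leq x)\, \mu([0, 1 + x]).
\end{aligned}$$
Due to (7.8), $P_s(B_{(s \vee s' + t_0)-} - 1 \leq x)$ and $P_{s'}(B_{(s \vee s' + t_0)-} - 1 \leq x)$ are both
greater than or equal to $P(Y_2 \leq x)$, and so is $\mu([0, 1 + x])$ since $Y_2 \geq c$.
Thus, for $x \in [0, \infty)$,
$$\begin{aligned}
P((\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1) \leq x \mid (\hat\tau_n, \hat\tau'_n) = (s, s'))
&\geq P(Y_{3(n+1)} \leq x) P(Y_{3(n+1)+1} \leq x) \\
&= P(Y_{3(n+1)} \vee Y_{3(n+1)+1} \leq x).
\end{aligned}$$
Since $(\hat\tau_k, \hat\tau'_k)_0^{\infty}$ is Markov, we have that $(\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1)$ depends
on $\hat{L}_n$ only through $(\hat\tau_n, \hat\tau'_n)$. Thus, for $x \in [0, \infty)$,
$$P((\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1) \leq x \mid \hat{L}_n) \geq P(Y_{3(n+1)} \vee Y_{3(n+1)+1} \leq x).$$
Due to this and the induction hypothesis (7.11), (7.9) holds with
$$\begin{aligned}
V_0 &= \hat{L}_n, \\
V_1 &= (\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1), \\
W_0 &= (Y_0 \vee Y_1 + Y_2) + \cdots + (Y_{3n} \vee Y_{3n+1} + Y_{3n+2}), \\
W_1 &= Y_{3(n+1)} \vee Y_{3(n+1)+1};
\end{aligned}$$
and (7.10) yields
$$\hat{L}_n + (\hat\beta_{n+1} - 1) \vee (\hat\beta'_{n+1} - 1) \overset{D}{\leq} (Y_0 \vee Y_1 + Y_2) + \cdots + (Y_{3n} \vee Y_{3n+1} + Y_{3n+2}) + Y_{3(n+1)} \vee Y_{3(n+1)+1}.$$
This, (7.12), and $Y_{3(n+1)+2} \geq t_0 + 1$ yield that (7.11) holds with $n$ replaced
by $n + 1$. Thus by induction, (7.11) holds for all $n \geq 0$.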
Fourth, and final step of proof. Since $M$ is independent of $Y_0, Y_1, \dots$
and also of $\hat{L}_0, \hat{L}_1, \dots$, we obtain from (7.11) that
$$\hat{L}_{M-1} \overset{D}{\leq} (Y_0 \vee Y_1 + Y_2) + \cdots + (Y_{3(M-1)} \vee Y_{3(M-1)+1} + Y_{3(M-1)+2}).$$
By Lemma 7.1 we can take $T = \hat{L}_{M-1} + V$ and since $V \leq c \leq Y_{3M}$, this
yields
$$T \overset{D}{\leq} (Y_0 \vee Y_1 + Y_2) + \cdots + (Y_{3(M-1)} \vee Y_{3(M-1)+1} + Y_{3(M-1)+2}) + Y_{3M}.$$
The right-hand side is dominated by the right-hand side of (7.7). Thus (7.7)
holds, and a reference to Section 5.4 in Chapter 4 yields the total variation
claims. □
7.3 Moment Results for $\hat{T}$ - Uniform Rates of Convergence
Let $X$ be a nonnegative random variable with distribution $F$. Say that $X$
and $F$ have a finite geometric moment (or finite exponential moment) if
there is a $\rho > 1$ such that $E[\rho^X] < \infty$.
Let $\varphi$ be a nondecreasing function from $[0, \infty)$ to $[0, \infty)$. Say that $X$ and
$F$ have a finite $\varphi$ moment if
$$E[\varphi(X)] < \infty.$$
Define a function $\Phi$ from $[0, \infty)$ to $[0, \infty)$ by
$$\Phi(x) = \int_0^x \varphi(y)\,dy, \quad x \in [0, \infty).$$
If $\varphi$ has a density with respect to Lebesgue measure, denote it by $\varphi'$, that
is,
$$\varphi(x) = \varphi(0) + \int_0^x \varphi'(y)\,dy, \quad x \in [0, \infty).$$
Let $\Lambda$ be the class of nondecreasing functions $\varphi$ from $[0, \infty)$ to $[0, \infty)$ having
a density $\varphi'$ and satisfying
$$\frac{\log \varphi(x)}{x} \text{ is nonincreasing in } x \text{ and goes to } 0 \text{ as } x \to \infty.$$
If $\varphi \in \Lambda$, then $\varphi$ increases more slowly than any increasing geometric
function (that is, for all $\rho > 1$ it holds that $\varphi(x)/\rho^x \to 0$ as $x \to \infty$). Also,
if $\varphi(x) = \rho^x$ for some $\rho > 1$, then $\varphi \notin \Lambda$.
Otherwise, $\Lambda$ is quite general. For instance,
  if $\varphi(x) = \rho^{x^\beta}$ for some $\rho > 1$ and $0 < \beta < 1$, then $\varphi \in \Lambda$;
  if $\varphi(x) = e^{\alpha} \vee x^{\alpha}$ for some $\alpha > 0$, then $\varphi \in \Lambda$;
  if $\varphi(x) = e \vee \log x$, then $\varphi \in \Lambda$;
  if $\varphi \in \Lambda$ and $\alpha > 0$, then $\varphi^{\alpha} \in \Lambda$;
  if $\varphi \in \Lambda$ and $\psi \in \Lambda$, then $\varphi\psi \in \Lambda$.
The stochastic domination result (7.7) yields the following powerful rate
results for the uniform convergence in Theorem 7.1.
Theorem 7.2. The following claims hold with $\sup^*$ denoting the supremum
over all pairs $(Z, Z')$ of time-inhomogeneous wide-sense regenerative
processes of the same type satisfying the conditions (7.3) through (7.6) in
Theorem 7.1 with $\hat{G}$, $f$, and $\hat{F}$ fixed.
(a) If $\hat{G}$ and $\hat{F}$ have finite geometric moments, then so has $\hat{T}$, and thus
the uniform convergence is of geometric order: there is a $\rho > 1$ such
that
$$\rho^t \sup{}^* \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty.$$
(b) Let $\varphi \in \Lambda$ and suppose $\hat{G}$ has finite $\varphi$ moment and $\hat{F}$ finite $\Phi$ moment.
Then $\hat{T}$ has finite $\varphi$ moment, and thus the uniform convergence is of
order $\varphi$:
$$\varphi(t) \sup{}^* \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty;$$
and of moment-order $\varphi'$ (the density of $\varphi$):
$$\int_0^{\infty} \varphi'(t) \sup{}^* \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty.$$
Proof. Let $M, Y_0, Y_1, \dots, \hat{T}$ be as in Theorem 7.1. We shall need the
following observation: if $\rho > 1$, $\varphi$ is a nondecreasing function from $[0, \infty)$ to
$[0, \infty)$ with density $\varphi'$, and $F$ is a distribution function on $[0, \infty)$, then
$$\int \rho^x F(dx) < \infty \iff \int_0^{\infty} \rho^x (1 - F(x))\,dx < \infty, \tag{7.13a}$$
$$\int \varphi(x) F(dx) < \infty \iff \int_0^{\infty} \varphi'(x)(1 - F(x))\,dx < \infty. \tag{7.13b}$$
See Section 5.4 in Chapter 4 for (7.13b) and note that (7.13a) follows from
(7.13b), since if $\varphi(x) = \rho^x$, then $\varphi'(x) = \rho^x \log \rho$.
To establish (a), suppose $\hat{G}$ and $\hat{F}$ have finite geometric moments. Then
$\int \rho^x \hat{F}(dx) < \infty$ for some $\rho > 1$, and (7.13a) yields $\int_0^{\infty} \rho^x (1 - \hat{F}(x))\,dx < \infty$.
Thus $\int \rho^x G_{\hat{F}}(dx) < \infty$, and (7.13a) yields $\int_0^{\infty} \rho^x (1 - G_{\hat{F}}(x))\,dx < \infty$. Since
$P(Y_2 > x) \leq a(1 - G_{\hat{F}}(x))$ for $x \geq b$, this yields $\int_0^{\infty} \rho^x P(Y_2 > x)\,dx < \infty$,
and (7.13a) yields $E[\rho^{Y_2}] < \infty$. Since $\rho^{Y_2}$ decreases to 1 as $\rho$ decreases to
1, this and dominated convergence yield that $E[\rho^{Y_2}]$ decreases to 1 as $\rho$
decreases to 1. Since $M$ has a finite geometric moment, so has $3M - 1$.
Thus we may take $\rho$ close enough to 1 for $E[E[\rho^{Y_2}]^{3M-1}] < \infty$. Since
$Y_0, Y_1, \dots, M$ are independent and $Y_2, Y_3, \dots$ are i.i.d., we have
$$E[\rho^{\hat{T}}] = E[\rho^{Y_0}]\, E[\rho^{Y_1}]\, E[E[\rho^{Y_2}]^{3M-1}].$$
Take $\rho$ close enough to 1 for the three factors on the right-hand side to be
finite. Thus $E[\rho^{\hat{T}}] < \infty$ for some $\rho > 1$. This and a reference to Section 6
in Chapter 4 completes the proof of (a).
In order to establish (b), take $\varphi \in \Lambda$ and suppose $\hat{G}$ has finite $\varphi$ moment
and $\hat{F}$ finite $\Phi$ moment. Then (7.13b) yields $\int_0^{\infty} \varphi(x)(1 - \hat{F}(x))\,dx < \infty$.
Thus $\int \varphi(x) G_{\hat{F}}(dx) < \infty$, and (7.13b) yields $\int_0^{\infty} \varphi'(x)(1 - G_{\hat{F}}(x))\,dx < \infty$.
Since $P(Y_2 > x) \leq a(1 - G_{\hat{F}}(x))$ for $x \geq b$, we obtain from this that
$\int_0^{\infty} \varphi'(x) P(Y_2 > x)\,dx < \infty$, and (7.13b) yields $E[\varphi(Y_2)] < \infty$. Due to
Lemma 7.2(a) below,
$$E[\varphi(\hat{T})] \leq E[\varphi(Y_0)]\, E[\varphi(Y_1)]\, E[\varphi(Y_2 + \cdots + Y_{3M})].$$
The first two factors on the right-hand side are finite by assumption and,
due to $E[\varphi(Y_2)] < \infty$ and Lemma 7.2(b) below, so is the third. Thus
$E[\varphi(\hat{T})] < \infty$. This and a reference to Section 6 in Chapter 4 completes
the proof of (b). □
In the above proof we needed the following key properties of the class $\Lambda$.
Lemma 7.2. (a) If $\varphi \in \Lambda$, then
$$\varphi(x + y) \leq \varphi(x)\varphi(y), \quad x, y \in [0, \infty).$$
(b) Let $W_1, W_2, \dots$ be i.i.d. nonnegative random variables that are
independent of a nonnegative integer-valued random variable $K$ with
finite geometric moment. If $\varphi \in \Lambda$ and $E[\varphi(W_1)] < \infty$, then
$$E[\varphi(W_1 + \cdots + W_K)] < \infty.$$
Proof. Take $\varphi \in \Lambda$ and recall that
$$a(x) := \frac{\log \varphi(x)}{x} \text{ is nonincreasing in } x \text{ and goes to } 0 \text{ as } x \to \infty.$$
We obtain (a) as follows: for $x, y \in [0, \infty)$,
$$\varphi(x + y) = e^{a(x+y)x} e^{a(x+y)y} \leq e^{a(x)x} e^{a(y)y} = \varphi(x)\varphi(y).$$
In order to establish (b), note that for $x \in [0, \infty)$,
$$\begin{aligned}
E[\varphi(W_1 + \cdots + W_K)] &\leq E[\varphi(x + W_1 + \cdots + W_K)] \\
&= E[e^{a(x + W_1 + \cdots + W_K)(x + W_1 + \cdots + W_K)}] \\
&\leq e^{a(x)x} E[e^{a(x + W_1 + \cdots + W_K)(W_1 + \cdots + W_K)}] && (a(\cdot) \text{ nonincreasing}) \\
&= e^{a(x)x} E[e^{a(x + W_1 + \cdots + W_K)W_1} \cdots e^{a(x + W_1 + \cdots + W_K)W_K}] \\
&\leq e^{a(x)x} E[e^{a(x + W_1)W_1} \cdots e^{a(x + W_K)W_K}] && (a(\cdot) \text{ nonincreasing}) \\
&= e^{a(x)x} E[E[e^{a(x + W_1)W_1}]^K] && (K \text{ independent of the i.i.d. } W_k).
\end{aligned}$$
Since $a(\cdot)$ decreases to 0, we have that $e^{a(x + W_1)W_1} \to 1$ as $x \to \infty$ and also
that $e^{a(x + W_1)W_1} \leq \varphi(W_1)$. Thus, by dominated convergence,
$$E[\varphi(W_1)] < \infty \text{ implies } E[e^{a(x + W_1)W_1}] \to 1 \text{ as } x \to \infty.$$
Thus if $K$ has a finite geometric moment, then we can take $x$ large enough for
$E[e^{a(x + W_1)W_1}]$ to be close enough to 1 for $E[E[e^{a(x + W_1)W_1}]^K] < \infty$. Thus
(b) holds. □
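Part (a) of the lemma can be checked numerically for a concrete member of the class. The following minimal Python sketch assumes the example $\varphi(x) = e^{x^\beta}$ with $\beta = 1/2$ (the case $\rho = e$ of the first example listed for the class); the function names, the grid, and the rounding tolerance are illustrative choices. It verifies on a grid that $\log\varphi(x)/x$ is nonincreasing and that $\varphi(x + y) \leq \varphi(x)\varphi(y)$.

```python
import math


def phi(x, beta=0.5):
    """phi(x) = exp(x ** beta), an example with 0 < beta < 1."""
    return math.exp(x ** beta)


def log_ratio(x, beta=0.5):
    """a(x) = log(phi(x)) / x for x > 0."""
    return math.log(phi(x, beta)) / x


def is_submultiplicative_on_grid(points, beta=0.5, tol=1e-12):
    """Check phi(x + y) <= phi(x) * phi(y) for all x, y in `points` (up to rounding)."""
    return all(
        phi(x + y, beta) <= phi(x, beta) * phi(y, beta) * (1.0 + tol)
        for x in points
        for y in points
    )


if __name__ == "__main__":
    grid = [0.25 * k for k in range(81)]  # the points 0, 0.25, ..., 20
    print(is_submultiplicative_on_grid(grid))
    print([round(log_ratio(x), 4) for x in (1.0, 4.0, 16.0, 100.0)])
```

A grid check is of course no proof; it only illustrates the inequality that the argument above establishes for every member of the class.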
7.4 Sharper Moment Results for T - Nonuniform Rates
In Theorem 7.2 we needed a finite $\Phi$ moment of $\hat{F}$ to obtain a finite $\varphi$
moment of $\hat{T}$ for $\varphi \in \Lambda$. Recall that the functions $\varphi \in \Lambda$ increase more slowly
than any increasing geometric function. We shall now consider increasing
power functions $\varphi(x) = x^{\alpha}$ and, more generally, certain functions $\varphi$ that
increase more slowly than some power function, such as $\varphi(x) = x^{\alpha}(e \vee \log x)$,
for instance. For these functions we relax the finite $\Phi$ moment condition
for $\hat{F}$ to a bounded $\varphi$ moment condition directly on the recurrence
distributions $F_s$, $s \in [0, \infty)$, themselves (not on the stochastically dominating
$\hat{F}$) to obtain a finite $\varphi$ moment of the coupling time $T$ itself (but not of
the stochastically dominating $\hat{T}$). This yields sharper rate results for these
functions at the expense of losing the uniformity.
Let $\alpha > 0$ and let $X$ be a nonnegative random variable with distribution
$F$. Say that $X$ and $F$ have a finite $\alpha$ moment if
$$E[X^{\alpha}] < \infty.$$
Theorem 7.3. Under the conditions of Theorem 7.1 the following claims
hold.
(a) Let $\alpha > 0$ and suppose $S_0$ and $S'_0$ have finite $\alpha$ moments and
$$\sup_{s \in [0, \infty)} \int x^{\alpha} F_s(dx) < \infty \quad\text{(bounded $\alpha$ moments).}$$
Then $T$ has a finite $\alpha$ moment, and thus the convergence is of power
order $\alpha$:
$$t^{\alpha} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty;$$
and of power moment-order $\alpha - 1$:
$$\int_0^{\infty} t^{\alpha - 1} \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty.$$
(b) Let $\varphi$ be a nondecreasing function from $[0, \infty)$ to $[0, \infty)$, with $\varphi(0) = 0$
and $\lim_{x \to \infty} \varphi(x) = \infty$, and having density $\varphi'$ with respect to Lebesgue
measure. Suppose either
$$\varphi \text{ is concave, } E[\varphi(S_0)] < \infty, \text{ and } E[\varphi(S'_0)] < \infty \tag{7.14}$$
or
$$\begin{aligned}
&\varphi \text{ is convex,} \\
&\varphi' \text{ is strictly increasing, } \varphi'(0) = 0, \text{ and } \lim_{x \to \infty} \varphi'(x) = \infty, \\
&\text{there is a } c < \infty \text{ such that } \varphi(2x) \leq c\varphi(x) \text{ for } x \in [0, \infty), \\
&E[\varphi(S_0)] < \infty,\ E[\varphi(S'_0)] < \infty, \text{ and } \sup_{s \in [0, \infty)} \int \varphi\,dF_s < \infty.
\end{aligned} \tag{7.15}$$
Then $E[\varphi(T)] < \infty$, and thus the convergence is of order $\varphi$:
$$\varphi(t) \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty;$$
and of moment-order $\varphi'$:
$$\int_0^{\infty} \varphi'(t) \|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty.$$
Comment. Since (a) is a special case of (b), it suffices to establish (b).
We shall do so in six steps below. The proof relies on certain facts about
the convex functions at (7.15) not proved here. Readers not content with
that, or who only want (a) and do not wish to go into the generality of
(b), can obtain (a) by working through the proof below with $\varphi(x) = x^{\alpha}$
and with the so-called Orlicz norm $\|\cdot\|_{\varphi}$ replaced by the better-known $L^{\alpha}$
norm $\|\cdot\|_{\alpha}$.
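First step of proof. We shall first show that if $\varphi$ is a nondecreasing
function and $\sup_{s \in [0, \infty)} \int \varphi\,dF_s < \infty$, then there are finite constants $a_1$ and
$b_1$ such that
$$\sup_{s \in [0, \infty)} E_s[\varphi(B_{(s+t)-})] \leq a_1 + b_1 t, \quad t \in [0, \infty). \tag{7.16}$$
For that purpose, fix $s, t \in [0, \infty)$ and note that
$$B_{(s+t)-} \leq X_{N_{(s+t)-}} \leq \max\{X_1, \dots, X_{N_{(s+t)-}}\},$$
and thus (since $\varphi$ is nondecreasing)
$$\varphi(B_{(s+t)-}) \leq \max\{\varphi(X_1), \dots, \varphi(X_{N_{(s+t)-}})\} \leq \varphi(X_1) + \cdots + \varphi(X_{N_{(s+t)-}}).$$
The sum on the right-hand side equals $\sum_{k=1}^{\infty} \varphi(X_k) 1_{\{N_{(s+t)-} \geq k\}}$, and thus
taking expectations and interchanging sum and expectation yields
$$E_s[\varphi(B_{(s+t)-})] \leq \sum_{k=1}^{\infty} E_s[\varphi(X_k) 1_{\{N_{(s+t)-} \geq k\}}]. \tag{7.17}$$
Now, $1_{\{N_{(s+t)-} \geq k\}} = 1_{\{S_{k-1} < s+t\}}$, which yields the equality in
$$\begin{aligned}
E_s[\varphi(X_k) 1_{\{N_{(s+t)-} \geq k\}} \mid S_{k-1}] &= E_s[\varphi(X_k) \mid S_{k-1}]\, 1_{\{N_{(s+t)-} \geq k\}} \\
&\leq c_1 1_{\{N_{(s+t)-} \geq k\}}, \quad\text{where } c_1 := \sup_{r \in [0, \infty)} \int \varphi\,dF_r.
\end{aligned}$$
Take expectation to obtain from this and (7.17) the first inequality in
$$\begin{aligned}
E_s[\varphi(B_{(s+t)-})] &\leq c_1 \sum_{k=1}^{\infty} E_s[1_{\{N_{(s+t)-} \geq k\}}] \\
&= c_1 E_s\Big[\sum_{k=1}^{\infty} 1_{\{N_{(s+t)-} \geq k\}}\Big] = c_1 E_s[N_{(s+t)-}]
\end{aligned}$$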
^ ci ( - H—- ), where a and b are as at (6.10).
\b ab)
This yields (7.16).
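Second step of proof. We next show that if $\varphi$ is a nondecreasing
function and $\sup_{s \in [0, \infty)} \int \varphi\,dF_s < \infty$, then there are finite constants $a_2$ and $b_2$
such that
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] \leq a_2 + b_2 (\hat\beta_{n-1} \vee \hat\beta'_{n-1}), \quad n \geq 1, \tag{7.18}$$
where $(\hat\beta_k, \hat\beta'_k)_1^{\infty}$ and $(\hat\tau_k, \hat\tau'_k)_0^{\infty}$ are from Lemma 7.1 and $(\hat\beta_0, \hat\beta'_0) := (\hat\tau_0, \hat\tau'_0)$.
In order to establish (7.18), fix $n \geq 1$ and note that
$$\varphi(x \vee y) = \varphi(x) \vee \varphi(y) \leq \varphi(x) + \varphi(y), \quad x, y \in [0, \infty), \tag{7.19}$$
to obtain
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] \leq E[\varphi(\hat\beta_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] + E[\varphi(\hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}]. \tag{7.20}$$
Due to (7.2), we have
$$P(\hat\beta_n \in \cdot \mid (\hat\tau_{n-1}, \hat\tau'_{n-1}) = (s, s')) \leq P_s(B_{(s \vee s' + t_0)-} \in \cdot) / (1 - p^2). \tag{7.21}$$
Combine this and (7.16) with $t$ replaced by $s \vee s' + t_0 - s$ [and note that
$s \vee s' + t_0 - s = (s' - s)^+ + t_0$] to obtain
$$E[\varphi(\hat\beta_n) \mid (\hat\tau_{n-1}, \hat\tau'_{n-1}) = (s, s')] \leq (a_1 + b_1((s' - s)^+ + t_0)) / (1 - p^2).$$
In the same way we get
$$E[\varphi(\hat\beta'_n) \mid (\hat\tau_{n-1}, \hat\tau'_{n-1}) = (s, s')] \leq (a_1 + b_1((s - s')^+ + t_0)) / (1 - p^2).$$
Add these two inequalities [and note that $(s' - s)^+ + (s - s')^+ = |s - s'|$]
to obtain, due to (7.20), that
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] \leq (2a_1 + 2b_1 t_0 + b_1 |\hat\tau_{n-1} - \hat\tau'_{n-1}|) / (1 - p^2).$$
Now, $|\hat\tau_{n-1} - \hat\tau'_{n-1}| = |\hat\beta_{n-1} - \hat\beta'_{n-1}| \leq \hat\beta_{n-1} \vee \hat\beta'_{n-1}$, and (7.18) follows.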
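Third step of proof. We now show that if $\varphi$ is a nondecreasing function
and
$$\lim_{x \to \infty} \varphi(x)/x = \infty \quad\text{and}\quad \sup_{s \in [0, \infty)} \int \varphi\,dF_s < \infty, \tag{7.22}$$
then there are finite constants $a_3$ and $b_3$ such that
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_0, \hat\tau'_0] \leq a_3 + b_3 (\hat\tau_0 \vee \hat\tau'_0), \quad n \geq 1. \tag{7.23}$$
To this end, fix $n \geq 1$ and note that for each $\varepsilon > 0$ there is a $c_\varepsilon$ such that
$x \leq c_\varepsilon + \varepsilon\varphi(x)$ for all $x \in [0, \infty)$. This and (7.18) yield
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] \leq a_2 + b_2 (c_\varepsilon + \varepsilon\varphi(\hat\beta_{n-1} \vee \hat\beta'_{n-1})).$$
Put $\varepsilon := (2b_2)^{-1}$ and $d_3 := a_2 + b_2 c_\varepsilon$ to obtain
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_{n-1}, \hat\tau'_{n-1}] \leq d_3 + 2^{-1} \varphi(\hat\beta_{n-1} \vee \hat\beta'_{n-1}).$$
Take conditional expectations with respect to $(\hat\tau_{n-2}, \hat\tau'_{n-2}), \dots, (\hat\tau_1, \hat\tau'_1)$,
recursively, to obtain
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_1, \hat\tau'_1] \leq d_3 (1 + 2^{-1} + \cdots + 2^{-(n-2)}) + 2^{-(n-1)} \varphi(\hat\beta_1 \vee \hat\beta'_1).$$
Applying (7.18) and $1 + 2^{-1} + \cdots + 2^{-(n-2)} \leq 2$ and $2^{-(n-1)} \leq 1$ yields
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_0, \hat\tau'_0] \leq 2d_3 + a_2 + b_2 (\hat\tau_0 \vee \hat\tau'_0).$$
Thus (7.23) holds.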
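Fourth step of proof. We now show that if $\varphi$ is a nondecreasing
function and (7.22) holds and also $E[\varphi(S_0)] < \infty$ and $E[\varphi(S'_0)] < \infty$, then there
is a constant $a_4$ such that
$$E[\varphi(\hat\tau_0 \vee \hat\tau'_0)] \leq a_4 \quad\text{and}\quad E[\varphi(\hat\beta_n \vee \hat\beta'_n)] \leq a_4 \text{ for } n \geq 1. \tag{7.24}$$
For this purpose, note that $E[\varphi(S_0)] < \infty$ and $E[\varphi(S'_0)] < \infty$ implies
$E[S_0] < \infty$ and $E[S'_0] < \infty$ due to $\lim_{x \to \infty} \varphi(x)/x = \infty$. Put
$$a_4 = (E[\varphi(S_0)] + E[\varphi(S'_0)]) \vee (a_3 + b_3 E[S_0] + b_3 E[S'_0])$$
to obtain from (7.19) that
$$E[\varphi(\hat\tau_0 \vee \hat\tau'_0)] \leq E[\varphi(S_0)] + E[\varphi(S'_0)] \leq a_4$$
and from (7.23) that
$$E[\varphi(\hat\beta_n \vee \hat\beta'_n)] \leq a_3 + b_3 E[S_0] + b_3 E[S'_0] \leq a_4,$$
that is, (7.24) holds.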
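Fifth step of proof. We shall now establish (b) in the convex case
(7.15). So assume (7.15). See Garsia (1973), Krasnoselskii and Rutickii
(1961), and the appendix of Neveu (1972) for the following facts about
convex functions $\varphi$ as at (7.15):
$$\varphi(x/a) \leq \varphi(x)/a, \quad a \in [1, \infty),\ x \in [0, \infty), \tag{7.25a}$$
$$\alpha := \sup_{x \in [0, \infty)} x\varphi'(x)/\varphi(x) \in [1, \infty), \tag{7.25b}$$
$$\varphi(x/a) \geq \varphi(x)/a^{\alpha}, \quad a \in [1, \infty),\ x \in [0, \infty), \tag{7.25c}$$
and the Orlicz norm $\|\cdot\|_{\varphi}$ (an extension of the $L^{\alpha}$ norm $\|\cdot\|_{\alpha}$) is defined
by
$$\|X\|_{\varphi} := \inf\{a > 0 : E[\varphi(X/a)] \leq 1\}$$
for nonnegative random variables $X$ such that the set on the right-hand side
is nonempty.
From (7.25a) we obtain
$$\|X\|_{\varphi} \leq 1 \vee E[\varphi(X)], \tag{7.26}$$
and from (7.25c) we obtain
$$E[\varphi(X)] \leq 1 \vee (\|X\|_{\varphi})^{\alpha}. \tag{7.27}$$
Due to Lemma 7.1,
$$E[\varphi(T)] = \sum_{n=1}^{\infty} E[\varphi(\hat{L}_{n-1} + V)]\, P(M = n). \tag{7.28}$$
From (7.24) and (7.26) we obtain, with $a_5 := 1 \vee a_4 \vee \|t_0\|_{\varphi} \vee \|V\|_{\varphi}$,
$$\max\{\|\hat\tau_0 \vee \hat\tau'_0\|_{\varphi},\ \|t_0\|_{\varphi},\ \|\hat\beta_1 \vee \hat\beta'_1\|_{\varphi},\ \|\hat\beta_2 \vee \hat\beta'_2\|_{\varphi},\ \dots,\ \|V\|_{\varphi}\} \leq a_5. \tag{7.29}$$
Now (using $\hat{L}_{n-1} = \hat\tau_0 \vee \hat\tau'_0 + n t_0 + \hat\beta_1 \vee \hat\beta'_1 + \cdots + \hat\beta_{n-1} \vee \hat\beta'_{n-1}$ for the second
inequality),
$$\begin{aligned}
E[\varphi(\hat{L}_{n-1} + V)] &\leq 1 \vee (\|\hat{L}_{n-1} + V\|_{\varphi})^{\alpha} && [\text{due to (7.27)}] \\
&\leq 1 \vee \Big(\|\hat\tau_0 \vee \hat\tau'_0\|_{\varphi} + n\|t_0\|_{\varphi} + \sum_{k=1}^{n-1} \|\hat\beta_k \vee \hat\beta'_k\|_{\varphi} + \|V\|_{\varphi}\Big)^{\alpha} \\
&\leq (a_5)^{\alpha} (2n + 1)^{\alpha} && [\text{due to (7.29)}].
\end{aligned}$$
From (7.28) we now obtain
$$E[\varphi(T)] \leq (a_5)^{\alpha} \sum_{n=1}^{\infty} (2n + 1)^{\alpha} P(M = n),$$
which is finite since $M$ is geometric. This and a reference to Section 6 in
Chapter 4 completes the proof of (b) in the convex case.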
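Sixth, and last step of proof. We now establish (b) in the concave
case (7.14). So let $\varphi$ be concave and assume that $E[\varphi(S_0)] < \infty$ and
$E[\varphi(S'_0)] < \infty$. Due to Lemma 7.3 below, it follows from the condition
$\int x \hat{F}(dx) < \infty$ that there is an increasing function $\psi$ with $\lim_{x \to \infty} \psi(x)/x =
\infty$ such that $\int \psi\,d\hat{F} < \infty$. Since (7.22) is satisfied with $\varphi$ replaced by $\psi$,
there are constants $a_6$ and $b_6$ such that
$$E[\psi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_0, \hat\tau'_0] \leq a_6 + b_6 (\hat\tau_0 \vee \hat\tau'_0), \quad n \geq 1. \tag{7.30}$$
It is no restriction to assume that $\psi(x) \geq x$ for all $x \in [0, \infty)$. This yields
the second inequality in
$$\begin{aligned}
E[\varphi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_0, \hat\tau'_0] &\leq \varphi(E[\hat\beta_n \vee \hat\beta'_n \mid \hat\tau_0, \hat\tau'_0]) && (\text{Jensen's inequality}) \\
&\leq \varphi(E[\psi(\hat\beta_n \vee \hat\beta'_n) \mid \hat\tau_0, \hat\tau'_0]) && (\psi(x) \geq x \text{ for all } x \in [0, \infty)) \\
&\leq \varphi(a_6 + b_6 (\hat\tau_0 \vee \hat\tau'_0)) && (\text{due to (7.30)}) \\
&\leq \varphi(a_6) + b_6 \varphi(\hat\tau_0 \vee \hat\tau'_0) && (\text{since } \varphi \text{ is concave}).
\end{aligned}$$
Take expectation to obtain $E[\varphi(\hat\beta_n \vee \hat\beta'_n)] \leq \varphi(a_6) + b_6 E[\varphi(\hat\tau_0 \vee \hat\tau'_0)]$ for
$n \geq 1$. Due to (7.19), $E[\varphi(\hat\tau_0 \vee \hat\tau'_0)] \leq E[\varphi(S_0)] + E[\varphi(S'_0)]$, which is finite
by assumption. Thus there is a finite constant $c_6$ such that
$$E[\varphi(\hat\tau_0 \vee \hat\tau'_0)] \leq c_6 \quad\text{and}\quad E[\varphi(\hat\beta_n \vee \hat\beta'_n)] \leq c_6 \text{ for } n \geq 1. \tag{7.31}$$
Thus (using $\hat{L}_{n-1} = \hat\tau_0 \vee \hat\tau'_0 + n t_0 + \hat\beta_1 \vee \hat\beta'_1 + \cdots + \hat\beta_{n-1} \vee \hat\beta'_{n-1}$ and the
concavity of $\varphi$ for the first inequality)
$$\begin{aligned}
E[\varphi(\hat{L}_{n-1} + V)] &\leq E[\varphi(\hat\tau_0 \vee \hat\tau'_0)] + n\varphi(t_0) + \sum_{k=1}^{n-1} E[\varphi(\hat\beta_k \vee \hat\beta'_k)] + E[\varphi(V)] \\
&\leq n(\varphi(t_0) + c_6) + \varphi(c) && (\text{due to (7.31) and } V \leq c).
\end{aligned}$$
This and (7.28) yield
$$E[\varphi(T)] \leq \sum_{n=1}^{\infty} \big(n(\varphi(t_0) + c_6) + \varphi(c)\big) P(M = n),$$
which is finite since $M$ is geometric. This and a reference to Section 6 in
Chapter 4 completes the proof of (b) in the concave case; and Theorem 7.3
is established. □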
The following lemma was used in the last step of the above proof.
Lemma 7.3. If $X$ is a nonnegative random variable and $E[X] < \infty$, then
there exists an increasing function $\psi$ with $\lim_{x \to \infty} \psi(x)/x = \infty$ and such
that $E[\psi(X)] < \infty$.
Proof. The result is trivial if there is an $x < \infty$ such that $P(X > x) = 0$.
So suppose $P(X > x) > 0$ for all $x \in [0, \infty)$ and put
$$t_n = \inf\{x \geq 0 : E[X 1_{\{X > x\}}] \leq 1/2^n\}, \quad n \geq 0.$$
By dominated convergence $E[X 1_{\{X > x\}}] \to 0$ as $x \to \infty$, and thus $t_n$
increases strictly to infinity as $n \to \infty$. Define $\psi$ by $\psi(0) = 0$ and, for $x > 0$,
$$\psi(x) = n_x x, \quad\text{where } n_x \text{ is such that } t_{n_x} < x \leq t_{n_x + 1}.$$
Then $\psi$ is increasing and $\psi(x)/x = n_x \to \infty$ as $x \to \infty$. Moreover, $\psi(X) \leq
\sum_{n=0}^{\infty} n X 1_{\{X > t_n\}}$, and thus
$$E[\psi(X)] \leq \sum_{n=0}^{\infty} n E[X 1_{\{X > t_n\}}] \leq \sum_{n=0}^{\infty} n/2^n < \infty$$
as desired. □
7.5 The Time-Homogeneous Case
We shall end this overlong section by considering briefly some consequences
of the above theory in the time-homogeneous case, that is, in the special
case of classical regenerative processes (Section 3) and wide-sense
regenerative processes (Section 4). There are two aspects of the time-homogeneous
case that make this worthwhile. Firstly, the conditions simplify, since the
recurrence distribution F does not depend on the time of regeneration.
Secondly, there exists a stationary version and the results on asymptotic
stationarity (Theorem 3.3 and Theorem 4.3) are improved by the above
theory.
Theorem 7.4. Let $(Z^*, S^*)$ be stationary and classical regenerative, or
stationary and wide-sense regenerative. Let $F$ be the distribution of $X_1^*$
and let $\hat{G}$ be a fixed probability distribution function on $[0, \infty)$. Then the
following claims hold.
(a) Suppose $X_1^*$ is spread out. Then
$$\theta_t Z \xrightarrow{tv} Z^*, \quad t \to \infty,$$
uniformly in versions $(Z, S)$ of $(Z^*, S^*)$ with delay length $S_0$ satisfying
$$P(S_0 \leq x) \geq \hat{G}(x), \quad x \in [0, \infty). \tag{7.32}$$
Further, this uniform convergence holds uniformly in $(Z^*, S^*)$
satisfying
$$P(X_1^* \leq x) \geq \hat{F}(x), \quad x \in [0, \infty), \tag{7.33}$$
$$P(X_1^* + \cdots + X_n^* \in B) \geq \int_B f(x)\,dx, \quad B \in \mathcal{B}[0, \infty), \tag{7.34}$$
with $\hat{F}$, $f$, and $n$ fixed and such that $\int x \hat{F}(dx) < \infty$ and $\int_0^{\infty} f > 0$.
(b) Suppose $X_1^*$ is lattice with span $d$ and let $(Z^{**}, S^{**})$ be the periodically
stationary version of $(Z^*, S^*)$. Then
$$\theta_{nd} Z \xrightarrow{tv} Z^{**}, \quad n \to \infty,$$
uniformly in versions $(Z, S)$ of $(Z^*, S^*)$ with $d\mathbb{Z}$ valued delay length
$S_0$ satisfying (7.32). Further, this uniform convergence holds
uniformly in $(Z^*, S^*)$ satisfying (7.33) and
$$P(X_1^* = kd) \geq f(kd), \quad k \geq 1, \tag{7.35}$$
with $\hat{F}$ and $f$ fixed and such that $\int x \hat{F}(dx) < \infty$ and $f$ is aperiodic.
In both cases [(a) and (b)] the following rate results hold:
  If $\hat{G}$ and $\hat{F}$ have finite geometric moments,
  then the uniform convergence is of geometric order.
  If $\varphi \in \Lambda$ and $\hat{G}$ has finite $\varphi$ moment and $\hat{F}$ has finite $\Phi$ moment,
  then the uniform convergence is of order $\varphi$ and of moment-order $\varphi'$ (the density of $\varphi$).
Proof. To obtain (a) from Theorem 7.1, take $(Z', S') := (Z^*, (S^*_{kn})_{k=0}^{\infty})$
and note that (7.33) implies (see Section 3.3 in Chapter 1)
$$P(X_1^* + \cdots + X_n^* \leq x) \geq \hat{F}^{*n}(x), \quad x \in [0, \infty),$$
where $\hat{F}^{*n}$ is the distribution of the sum of $n$ independent random variables
with distribution $\hat{F}$ (the $n$th convolution power of $\hat{F}$). Thus the
condition (7.6) in Theorem 7.1 holds with $\hat{F}$ replaced by $\hat{F}^{*n}$. The condition
(7.4) follows from (7.34). Further, with $a := 1/\int_0^{\infty} y f(y)\,dy$ we have [using
$F(y) \geq \hat{F}(y)$ and $1/\int y F(dy) \leq a$ for the inequality] that for $x \in [0, \infty)$,
$$\begin{aligned}
P(S_0^* \leq x) &= 1 - \int_x^{\infty} (1 - F(y))\,dy \Big/ \int y F(dy) \\
&\geq \Big(1 - a \int_x^{\infty} (1 - \hat{F}(y))\,dy\Big)^+ \\
&= \Big(1 - a (1 - G_{\hat{F}}(x)) \int y \hat{F}(dy)\Big)^+ =: \tilde{G}(x) \quad\text{(say)}.
\end{aligned} \tag{7.36}$$
Since $\tilde{G}(x) \leq 1$ and $\hat{G}(x) \leq 1$, this yields $P(S_0^* \leq x) \geq \hat{G}(x)\tilde{G}(x)$, while
(7.32) yields $P(S_0 \leq x) \geq \hat{G}(x)\tilde{G}(x)$ for $x \in [0, \infty)$. Thus the condition
(7.3) in Theorem 7.1 holds with the distribution function $\hat{G}$ replaced by
the distribution function $\hat{G}\tilde{G}$. Thus we obtain (a) from Theorem 7.1.
In order to obtain (b) from Theorem 7.1, take $(Z', S') := (Z^{**}, S^{**})$
and note that the condition (7.6) in Theorem 7.1 follows from (7.33) and
that the condition (7.5) in Theorem 7.1 follows from (7.35). Furthermore,
$P(S_0^{**} \leq x) \geq P(S_0^* \leq x)$ for $x \in [0, \infty)$, and thus (7.36) [this time with
$a := 1/\sum_{k=1}^{\infty} kd\, f(kd)$] again yields that the condition (7.3) in Theorem 7.1
holds with the distribution function $\hat{G}$ replaced by the distribution function
$\hat{G}\tilde{G}$. Thus we obtain (b) from Theorem 7.1.
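In order to obtain the rate results from Theorem 7.2, we shall first show
that if $\varphi \in \Lambda$ and $X$ and $Y$ are nonnegative random variables, then
$$E[\Phi(X)] < \infty \text{ and } E[\Phi(Y)] < \infty \text{ implies } E[\Phi(X + Y)] < \infty. \tag{7.37}$$
To this end, note that for $x \in [1,\infty)$ and $y \in [0,\infty)$,
$$\begin{aligned}
\Phi(x + y) &= \Phi(x) + \int_0^y \varphi(x + s)\,ds \\
&\le \Phi(x) + \varphi(x)\Phi(y) && \text{[due to Lemma 7.2(a)]} \\
&\le \Phi(x) + \varphi(1)\varphi(x - 1)\Phi(y) && \text{[due to Lemma 7.2(a)]} \\
&\le \Phi(x)\big(1 + \varphi(1)\Phi(y)\big) && \Big[\varphi(x - 1) \le \int_{x-1}^{x} \varphi(s)\,ds \le \Phi(x)\Big].
\end{aligned}$$
This yields the second step in
$$\begin{aligned}
E[\Phi(X + Y)] &\le E[\Phi((1 \vee X) + Y)] \\
&\le E[\Phi(1 \vee X)]\big(1 + \varphi(1)E[\Phi(Y)]\big) && \text{[$X$ and $Y$ independent]} \\
&\le \big(\Phi(1) + E[\Phi(X)]\big)\big(1 + \varphi(1)E[\Phi(Y)]\big).
\end{aligned}$$
This yields (7.37).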
From (7.37) it follows (recursively) that if $\varphi \in \Lambda$ and $\hat F$ has finite $\Phi$
moment, then so has the $n$th convolution power $\hat F^{*n}$. Note also that if $\rho > 1$ and $\int \rho^y\,\hat F(dy) < \infty$
then $\int \rho^y\,\hat F^{*n}(dy) = \big(\int \rho^y\,\hat F(dy)\big)^n < \infty$. Thus in the spread-out case (a) the
moment conditions on $\hat F$ transfer to $\hat F^{*n}$. Thus in both cases (a) and (b) the
moment conditions in Theorem 7.2 on the distribution function dominating
the recurrence distributions stochastically are satisfied.
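The product identity for geometric moments of a convolution power can be checked numerically. The following is a minimal sketch on an assumed lattice example: the distribution `pmf`, the value `rho`, and the helper names are hypothetical illustrations, not objects from the text.

```python
def convolve(p, q):
    """pmf of the sum of two independent lattice variables with pmfs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def convolution_power(p, n):
    """n-th convolution power of the pmf p, for n >= 1."""
    out = p
    for _ in range(n - 1):
        out = convolve(out, p)
    return out


def geometric_moment(p, rho):
    """Sum over k of rho**k * p[k], the lattice analogue of the integral of rho^y."""
    return sum(rho ** k * pk for k, pk in enumerate(p))


pmf = [0.0, 0.5, 0.3, 0.2]  # hypothetical distribution on {0, 1, 2, 3}
rho, n = 1.2, 4

lhs = geometric_moment(convolution_power(pmf, n), rho)  # moment of the n-fold sum
rhs = geometric_moment(pmf, rho) ** n                   # n-th power of the moment
print(lhs, rhs)
```

The two printed numbers agree up to rounding, which is the independence identity used to transfer the geometric moment condition to the convolution power.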
Thus it only remains to establish that this is also true for the distribution
function dominating the delay-length distributions stochastically. In order
to establish this, note first that $\hat G\tilde G$ [with $\tilde G$ as defined at (7.36)] is the distribution function of the
maximum of two independent random variables with distribution functions
$\hat G$ and $\tilde G$, respectively, and thus
$$\int \varphi\,d(\hat G\tilde G) \le \int \varphi\,d\hat G + \int \varphi\,d\tilde G \quad \text{for nonnegative } \varphi. \tag{7.38}$$
Note next [see (7.13a)] that if $\hat F$ has a finite geometric moment, then so
has $G_{\hat F}$, and thus so has $\tilde G$. Note finally [see (7.13b)] that if $\varphi \in \Lambda$ and $\hat F$
has a finite $\Phi$ moment, then $G_{\hat F}$ has a finite $\varphi$ moment, and thus so has $\tilde G$.
This and (7.38) yield that if $\hat G$ and $\hat F$ have finite geometric moments, then
so has $\hat G\tilde G$, and if $\varphi \in \Lambda$ and $\hat G$ has finite $\varphi$ moment and $\hat F$ has finite $\Phi$
moment, then $\hat G\tilde G$ has finite $\varphi$ moment. Thus Theorem 7.2 yields the last
claim of the theorem. □
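The step at (7.38) rests on two facts: a product of two distribution functions is the distribution function of the maximum of two independent random variables, and a nonnegative function evaluated at the maximum is bounded by the sum of its values at the two variables. The sketch below checks both on an assumed discrete example; the two laws and the test function `psi` are hypothetical.

```python
from itertools import product


def max_law(pmf_u, pmf_v):
    """Law of max(U, V) for independent U and V given as {value: probability}."""
    out = {}
    for (u, pu), (v, pv) in product(pmf_u.items(), pmf_v.items()):
        m = max(u, v)
        out[m] = out.get(m, 0.0) + pu * pv
    return out


def cdf(pmf, x):
    """Distribution function of a discrete law at x."""
    return sum(p for k, p in pmf.items() if k <= x)


def expect(pmf, psi):
    """Expectation of psi under a discrete law."""
    return sum(p * psi(k) for k, p in pmf.items())


pmf_u = {0: 0.2, 1: 0.5, 4: 0.3}  # hypothetical law of U
pmf_v = {0: 0.6, 2: 0.3, 7: 0.1}  # hypothetical law of V
pmf_max = max_law(pmf_u, pmf_v)


def psi(x):
    """A nonnegative test function (here a geometric one)."""
    return 1.5 ** x


print(expect(pmf_max, psi), expect(pmf_u, psi) + expect(pmf_v, psi))
```

The first printed value does not exceed the second, and the distribution function of `pmf_max` is the product of the two distribution functions at every point.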
Two processes, Z and Z', both converging in total variation to the same
stationary limit process Z*, can converge to each other at a faster rate. The
following theorem is an example of this: the $\alpha$ and $\varphi$ moment conditions on
$X_1$ show up directly as convergence of power order $\alpha$ and of order $\varphi$. This
was also the order of the convergence to stationarity in Theorem 7.4, but
the moment conditions on $X_1$ were one order higher, namely $\alpha + 1$ and $\Phi$.
Theorem 7.5. Let $(Z,S)$ be classical regenerative or wide-sense
regenerative. Let $(Z',S')$ be a version of $(Z,S)$. Suppose either $X_1$ is spread out,
or $X_1$ is lattice with span $d$, and $S_0$ and $S_0'$ are $d\mathbb{Z}$ valued. Suppose further
$E[X_1] < \infty$.
Let $\alpha > 0$ and suppose $S_0$, $S_0'$, and $X_1$ all have finite $\alpha$ moments. Then
there are distributional coupling times for $Z$ and $Z'$ with finite $\alpha$ moments,
and thus the convergence is of power order $\alpha$:
$$t^{\alpha}\,\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\| \to 0, \quad t \to \infty;$$
and of power moment-order $\alpha - 1$:
$$\int_0^\infty t^{\alpha - 1}\,\|P(\theta_t Z \in \cdot) - P(\theta_t Z' \in \cdot)\|\,dt < \infty.$$
More generally, let <p be as in Theorem 7.3(b) and suppose So, S0, and X\
all have finite ip moments. Then there are distributional coupling times for
Z and Z' with finite <p moments, and thus the convergence is of order <p :
<p(t)\\P(0tz e ■) - PtftZ' e -)II -»• o, t -+ oo;
and of moment-order ip:
oo
<f>(t)\\P(0tze-)-P{0tz'e-)\\dt <oo.
Proof. This theorem is an immediate corollary of Theorem 7.3. □
7.6 The Renewal Theorem - Spread-Out Case
Blackwell's renewal theorem was established in Chapter 2 (Theorem 8.1)
using epsilon-couplings, and improved in Chapter 3 (Theorem 6.2) using an
exact coupling based on the Ornstein idea. The improved version states that
when the recurrence times are spread out and have a finite first moment,
then the limit result holds in total variation on bounded intervals. The
coupling results of this section enable us to sharpen this further to hold in
total variation on the whole half-line, provided that the recurrence times
have a finite second moment and the delay time has a finite first moment.
Theorem 7.6. Let $S$ be a renewal process. For $B \in \mathcal{B}([0,\infty))$, let $N(B)$
be the number of renewals in $B$, that is,
$$N(B) = \sum_{k=0}^{\infty} 1_{\{S_k \in B\}}.$$
Let $E[N]$ be the intensity measure, that is, the measure with mass $E[N(B)]$
at $B \in \mathcal{B}([0,\infty))$. Let $E[N(t + \cdot)]$ be the measure on $[0,\infty)$ with mass
$E[N(t + B)]$ at $B \in \mathcal{B}([0,\infty))$. Let $\lambda$ be the Lebesgue measure on $[0,\infty)$.
If $X_1$ is spread out, $E[X_1^2] < \infty$, and $E[S_0] < \infty$, then the signed measure
$E[N] - \lambda/E[X_1]$ is bounded and
$$\|E[N(t + \cdot)] - \lambda/E[X_1]\| \to 0, \quad t \to \infty. \tag{7.39}$$
Proof. Let $S'$ have the same recurrence time distribution as $S$ and the
delay time distribution $G_\infty$ from Corollary 8.1 in Chapter 2 (note that
$G_\infty = G_F$). Then $E[N'] = \lambda/E[X_1]$ (see the proof of Theorem 6.2 in
Chapter 3). With $t \in [0,\infty)$ and $B \in \mathcal{B}([0,t])$, this yields the first step in
$$\begin{aligned}
|E[N(B)] - \lambda(B)/E[X_1]| &= |E[N(B)] - E[N'(B)]| \\
&\le \sum_{n=0}^{\infty} |E[N(B_n)] - E[N'(B_n)]|
\end{aligned} \tag{7.40}$$
where
$$B_n := B \cap [n, n+1).$$
Due to Theorem 6.2, there are randomized stopping times $K$ and $K'$ with
respect to $S$ and $S'$, respectively, such that with $T = S_K$ and $T' = S'_{K'}$,
$$(N(T + \cdot), T) \overset{D}{=} (N'(T' + \cdot), T'),$$
which implies $N(B_n)1_{\{T \le n\}} \overset{D}{=} N'(B_n)1_{\{T' \le n\}}$. Apply this to (7.40) to
obtain
$$|E[N(B)] - \lambda(B)/E[X_1]| \le \sum_{n=0}^{\infty} |E[N(B_n)1_{\{T > n\}}] - E[N'(B_n)1_{\{T' > n\}}]|. \tag{7.41}$$
Now $N(B_n) \le N([S_{N_{n-}}, S_{N_{n-}} + 1])$, and thus
$$E[N(B_n)1_{\{T > n\}}] \le E[N([S_{N_{n-}}, S_{N_{n-}} + 1])1_{\{T > n\}}].$$
The right-hand side equals $E[N_1^0]P(T > n)$ because the event $\{T > n\}$ is
independent of $N([S_{N_{n-}}, S_{N_{n-}} + 1])$, which is a copy of $N_1^0$ [since $K$ is a
randomized stopping time]. Thus
$$E[N(B_n)1_{\{T > n\}}] \le E[N_1^0]P(T > n).$$
Similarly,
$$E[N'(B_n)1_{\{T' > n\}}] \le E[N_1^0]P(T' > n).$$
Since $P(T' > n) = P(T > n)$, this and (7.41) yield
$$|E[N(B)] - \lambda(B)/E[X_1]| \le E[N_1^0] \sum_{n=0}^{\infty} P(T > n).$$
Take the supremum in $B \in \mathcal{B}([0,t])$ and $t \in [0,\infty)$, multiply by 2, and
use [see Lemma 5.2] $\sum_{n=0}^{\infty} P(T > n) \le E[T + 1]$ to obtain the following
coupling inequality
$$\|E[N] - \lambda/E[X_1]\| \le 2E[N_1^0]E[T + 1]. \tag{7.42}$$
Since $E[X_1^2] < \infty$, $G_\infty$ has a finite first moment (see (7.13b)). Thus $S_0$ and
$S_0'$ are both dominated stochastically by the distribution function $GG_\infty$,
where $G$ is the distribution function of $S_0$. Since both $G$ and $G_\infty$ have
finite first moments, so has $GG_\infty$. This and $E[X_1^2] < \infty$ imply, in view of
Theorem 7.2(b), that $E[T] < \infty$. This and (7.42) yield that $E[N] - \lambda/E[X_1]$
is bounded, which in turn implies (7.39). □
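A concrete case may help to see what (7.39) asserts. The sketch below assumes a zero-delayed renewal process whose recurrence times are Erlang distributed with two phases of rate 1 (an illustrative choice, not taken from the text). For this case the renewal density is the standard closed form $u(s) = (1 - e^{-2s})/2$ and $E[X_1] = 2$, so for $t > 0$ the norm in (7.39) is the integral of $|u(s) - 1/2|$ over $[t,\infty)$, which equals $e^{-2t}/4$. The function names and the integration parameters are hypothetical.

```python
import math


def renewal_density(s):
    """Renewal density for Erlang(2, rate 1) recurrence times and zero delay."""
    return 0.5 * (1.0 - math.exp(-2.0 * s))


def distance_to_limit(t, horizon=40.0, steps=200_000):
    """Trapezoid approximation of the integral of |u(s) - 1/E[X_1]| over [t, t + horizon]."""
    inv_mean = 0.5  # 1 / E[X_1] with E[X_1] = 2
    h = horizon / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * abs(renewal_density(t + i * h) - inv_mean)
    return total * h


for t in (0.5, 1.0, 2.0):
    # numerical value next to the closed form exp(-2t)/4
    print(t, distance_to_limit(t), math.exp(-2.0 * t) / 4.0)
```

In this example the distance decays geometrically in $t$, in line with the remark that finite geometric moments give convergence of geometric order.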
The following results on uniform convergence and rates of convergence can
also be established along the lines of the proof of Theorem 7.4 Section 6.
(This can be extended to regenerative random measures; see Thorisson
(1983).)
Let $G$ be the delay time distribution of $S$ and $F$ the recurrence time
distribution. Let $\hat G$ be a distribution function with a finite first moment, $\hat F$
a distribution function with a finite second moment, $n$ a positive integer,
and $f$ a nontrivial subprobability density. Let $\sup^*$ denote the supremum
over all $G$ that are stochastically dominated by $\hat G$, and over all $F$ that are
stochastically dominated by $\hat F$ and such that the $n$th convolution power of
$F$ has a density component $f$. Then
$$\sup{}^* \|E[N] - \lambda/E[X_1]\| < \infty \tag{7.43}$$
and
$$\sup{}^* \|E[N(t + \cdot)] - \lambda/E[X_1]\| \to 0, \quad t \to \infty. \tag{7.44}$$
Moreover, the same argument as the one leading to (7.42) yields
$$\|E[N(t + \cdot)] - \lambda/E[X_1]\| \le 2E[N_1^0]E[(T - t)^+ + 1].$$
From this it follows easily that if G and F have finite geometric moments
then the uniform convergence at (7.44) is of geometric order. And if ip G A
has a density ip which in turn has a density (p, and G has finite ip moment
and F has finite # moment, then the uniform convergence at (7.44) is of
order ip and of moment order (p.
8 Asymptotics From-the-Past
How are things now (and from now on) if they started long ago?
The traditional probabilistic way to answer this loosely formulated
question is to start a stochastic process at time 0, consider its distribution in a
time interval $[t,\infty)$, and check whether it stabilizes as $t \to \infty$ (asymptotic
stationarity); see Figure 8.1.
FIGURE 8.1. Realization of a process starting at time 0 (panel label: asymptotics to-the-future).
This we have done repeatedly up to now, obtaining asymptotic stationarity
for Markov chains in Chapter 2 and for classical regenerative and wide-sense
regenerative processes in this chapter. This approach did not work,
however, for (truly) time-inhomogeneous regenerative processes: according to
Theorem 5.5, asymptotic stationarity forces a time-inhomogeneous
regenerative process to be time-homogeneous in the long run.
In this section we shall reverse this taking limits to-the-future approach
as follows. We start a stochastic process at an arbitrary time $r$, consider its
distribution in a fixed time interval $[t,\infty)$, and check whether it stabilizes
as the starting time $r$ goes backward to $-\infty$, $r \downarrow -\infty$; see Figure 8.2.
FIGURE 8.2. Realization of a process starting at time $r$, with $t$ fixed and $r$ going to $-\infty$ (panel label: asymptotics from-the-past).
As an answer to the above question, taking limits from-the-past in this
way is even more natural than taking limits to-the-future. Of course, for
time-homogeneous processes the two approaches are equivalent. The point
is that unlike taking limits to-the-future, this taking limits from-the-past
approach also works for time-inhomogeneous processes and thus widely
extends the class of processes admitting a limit law.
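The difference between the two kinds of limit can be seen on a toy example. The sketch below is not the regenerative setting of this section: it assumes a two-state Markov chain on integer times whose transition probabilities depend on the current time (the functions and constants are hypothetical choices). Starting the chain further and further back, from either state, gives laws at a fixed time $t$ that agree up to a vanishing error, while the limits at different times $t$ differ.

```python
import math


def step_matrix(s):
    """Transition matrix used for the step from time s to time s + 1 (time-dependent)."""
    a = 0.3 + 0.2 * math.sin(s)  # probability of moving 0 -> 1
    b = 0.4 + 0.2 * math.cos(s)  # probability of moving 1 -> 0
    return ((1.0 - a, a), (b, 1.0 - b))


def law_at(t, r, start):
    """Law at integer time t of the chain started in state `start` at integer time r <= t."""
    law = [0.0, 0.0]
    law[start] = 1.0
    for s in range(r, t):
        p = step_matrix(s)
        law = [law[0] * p[0][0] + law[1] * p[1][0],
               law[0] * p[0][1] + law[1] * p[1][1]]
    return law


def tv_norm(mu, nu):
    """Total variation norm of mu - nu (twice the largest difference over sets)."""
    return sum(abs(x - y) for x, y in zip(mu, nu))


# Two different starts far in the past give nearly the same law at the fixed time 0.
print(tv_norm(law_at(0, -30, 0), law_at(0, -60, 1)))
# The limit law depends on the fixed time: times 0 and 1 give different laws.
print(tv_norm(law_at(0, -60, 0), law_at(1, -60, 0)))
```

The first printed number is negligible and the second is not, so this family has a limit from-the-past at each fixed time without being asymptotically stationary.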
In this section we establish that there is a limit from-the-past of time-
inhomogeneous regenerative processes (wide-sense or not) satisfying the
conditions from Theorem 5.3. The proof is based on the stochastic
domination result in Theorem 7.1. At the end of the section we extend this result to
processes that are time-inhomogeneous regenerative only up-to-time-zero.
8.1 Preliminaries
In order to take limits from-the-past we must consider processes with time
set $[r,\infty)$. So fix an arbitrary $r \in (-\infty,0]$ and add the following to the
framework from Section 2. Let
$$Z^{(r)} = (Z^{(r)}_s)_{s\in[r,\infty)}$$
be a one-sided stochastic process with time set $[r,\infty)$, state space $(E,\mathcal{E})$,
and path set $H^{(r)}$ obtained from the internally shift-invariant subset $H$ of
$E^{[0,\infty)}$ by
$$H^{(r)} := \{(z_{s-r})_{s\in[r,\infty)} : (z_s)_{s\in[0,\infty)} \in H\}.$$
Let $\mathcal{H}^{(r)}$ be the trace of $H^{(r)}$ on $\mathcal{E}^{[r,\infty)}$. For $t \in [r,\infty)$, define the shift map
$\theta_t$ on $H^{(r)}$ to be the map taking $z = (z_s)_{s\in[r,\infty)} \in H^{(r)}$ to
$$\theta_t z := (z_{t+s})_{s\in[0,\infty)} \in H.$$
Note that the process $\theta_r Z^{(r)}$ has time set $[0,\infty)$ and that the process $Z^{(r)}$
is shift-measurable if and only if $\theta_r Z^{(r)}$ is shift-measurable.
Let $S^{(r)} = (S^{(r)}_k)_0^\infty$ be a one-sided sequence of random times satisfying
$$r \le S^{(r)}_0 < S^{(r)}_1 < \dots \to \infty.$$
Regard $S^{(r)}$ as a measurable mapping from $(\Omega,\mathcal{F})$ to the sequence space
$(L^{(r)},\mathcal{L}^{(r)})$, where
$$L^{(r)} = \{(s_k)_0^\infty \in [r,\infty)^{\{0,1,\dots\}} : s_0 < s_1 < \dots \to \infty\} = r + L,$$
$$\mathcal{L}^{(r)} = L^{(r)} \cap \mathcal{B}^{\{0,1,\dots\}} = \text{the Borel subsets of } L^{(r)}.$$
For $t \in [r,\infty)$, define the joint shift-map $\theta_t$ on $H^{(r)} \times L^{(r)}$ to be the map
taking $(z,(s_k)_0^\infty) \in H^{(r)} \times L^{(r)}$ to
$$\theta_t(z,(s_k)_0^\infty) := (\theta_t z, (s_{n_{t-}+k} - t)_0^\infty) \in H \times L,$$
where $n_{t-} = \inf\{n \ge 0 : s_n \ge t\}$.
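The joint shift-map can be sketched on finite data. The snippet below is only an illustration under stated assumptions: a path is represented by a Python function of time, only a finite initial stretch of the time sequence is stored, and $n_{t-}$ is taken to be the index (counted from 0) of the first time at or after $t$. The names `shift_path` and `shift_times` and the sample values are hypothetical.

```python
def shift_path(z, t):
    """One-sided shift of a path: the returned function maps s >= 0 to z(t + s)."""
    return lambda s: z(t + s)


def shift_times(times, t):
    """Return (n, shifted), where n is the first index k with times[k] >= t and
    shifted lists the times from index n onward, measured from t."""
    n = next(k for k, s in enumerate(times) if s >= t)
    return n, [s - t for s in times[n:]]


# A finite stretch of a strictly increasing time sequence started at r = -3.
times = [-2.5, -1.0, 0.5, 2.0, 3.5]
path = lambda s: s * s  # stand-in for a path on [r, infinity)

n, shifted = shift_times(times, 0.0)
shifted_path = shift_path(path, 0.5)
print(n, shifted, shifted_path(1.0))
```

After the shift the first entry of the shifted sequence is the delay length seen from time $t$, which is the quantity that the domination conditions of this section control.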
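For $t \in \mathbb{R}$, define the shift-map $\theta_t$ on $E^{\mathbb{R}}$ to be the map taking $z =
(z_s)_{s\in\mathbb{R}} \in E^{\mathbb{R}}$ to
$$\theta_t z := (z_{t+s})_{s\in[0,\infty)} \in E^{[0,\infty)}$$
and note that although $z = (z_s)_{s\in\mathbb{R}} \in E^{\mathbb{R}}$ is two-sided, the shift $\theta_t$ is
one-sided.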
8.2 Time-inhomogeneous Regeneration in $[r,\infty)$
The definition of time-inhomogeneous regeneration for a process with time
set [r, oo) is analogous to the definition for a process with time set [0, oo).
Here we shall only focus on the essentials needed for the asymptotics from-
the-past.
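Let $Z^{(r)}$ and $S^{(r)}$ be as above and $Z^{(r)}$ be shift-measurable. Call the
family $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, time-inhomogeneous regenerative of type
$p(\cdot|\cdot)$ if $p(\cdot|\cdot)$ is a $((\mathbb{R},\mathcal{B}(\mathbb{R})),(H \times L,\mathcal{H} \otimes \mathcal{L}))$ probability kernel and, for
$r \in (-\infty,0]$, $n \ge 0$, and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P\big(\theta_{S^{(r)}_n}(Z^{(r)},S^{(r)}) \in A \,\big|\, (Z^{(r)}_s)_{s\in[r,S^{(r)}_n)}, S^{(r)}_0,\dots,S^{(r)}_n\big) = p(A|S^{(r)}_n) \quad \text{a.s.} \tag{8.1}$$
Call the family $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, time-inhomogeneous wide-sense
regenerative of type $p(\cdot|\cdot)$ if instead of (8.1) it holds only that
$$P\big(\theta_{S^{(r)}_n}(Z^{(r)},S^{(r)}) \in A \,\big|\, S^{(r)}_0,\dots,S^{(r)}_n\big) = p(A|S^{(r)}_n) \quad \text{a.s.} \tag{8.2}$$
In both cases $S^{(r)}$ is a discrete-time strictly increasing Markov process
with state space $([r,\infty),\mathcal{B}([r,\infty)))$. Let
$$X^{(r)}_n := S^{(r)}_n - S^{(r)}_{n-1}, \quad n \ge 1,$$
be the recurrence times and let $F_s$, $s \in [r,\infty)$, be the recurrence
distributions: for all $r \in (-\infty,0]$ and $n \ge 1$,
$$F_s(A) = P(X^{(r)}_n \in A \,|\, S^{(r)}_{n-1} = s), \quad s \in [r,\infty),\ A \in \mathcal{B}([r,\infty)).$$
For examples of processes that are time-inhomogeneous regenerative in
$[r,\infty)$, replace $[0,\infty)$ by $[r,\infty)$ in Section 5.3.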
8.3 Total Variation Convergence From-the-Past
We are now ready to establish the asymptotics from-the-past under the
familiar conditions from Theorem 5.3.
Theorem 8.1. Let $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, be a time-inhomogeneous
wide-sense regenerative family of type $p(\cdot|\cdot)$. Let $G^{(r)}$ be the distribution
of $S^{(r)}_0 - r$ and $F_s$, $s \in \mathbb{R}$, the recurrence distributions. Firstly, suppose
there is a distribution function $\hat G$ on $[0,\infty)$ such that for $r \in (-\infty,0]$,
$$G^{(r)}(x) \ge \hat G(x), \quad x \in [0,\infty). \tag{8.3}$$
Secondly, suppose either there is a subprobability density $f$ on $[0,\infty)$ such
that $\int f > 0$ and, for $s \in \mathbb{R}$,
$$F_s(B) \ge \int_B f, \quad B \in \mathcal{B}[0,\infty), \tag{8.4}$$
or there is a $d > 0$ and an aperiodic subprobability mass function $f$ on
$\{d, 2d, 3d, \dots\}$ such that for $s \in \mathbb{R}$,
$$F_s(\{kd\}) \ge f(kd),\ k \ge 1, \text{ and } S \text{ and } S' \text{ are } d\mathbb{Z} \text{ valued}. \tag{8.5}$$
Thirdly, suppose there is a distribution function $\hat F$ on $[0,\infty)$ such that
$\int x\,\hat F(dx) < \infty$ and, for $s \in \mathbb{R}$,
$$F_s(x) \ge \hat F(x), \quad x \in [0,\infty). \tag{8.6}$$
Then there exists, for each $t \in \mathbb{R}$, a stochastic process $Z^{(*,t)} = (Z^{(*,t)}_s)_{s\in[t,\infty)}$
such that
$$\theta_t Z^{(r)} \xrightarrow{tv} \theta_t Z^{(*,t)} \text{ as } r \downarrow -\infty, \text{ when (8.4) holds}, \tag{8.7a}$$
$$\theta_t Z^{(md)} \xrightarrow{tv} \theta_t Z^{(*,t)} \text{ as } m \downarrow -\infty, \text{ when (8.5) holds}, \tag{8.7b}$$
and the distribution of $Z^{(*,t)}$ is determined by the type $p(\cdot|\cdot)$.
Proof. Take $r' < r < t \wedge 0$. In the lattice case [(8.5)] take further $r, r' \in d\mathbb{Z}$.
We are going to apply Theorem 7.1 to $\theta_r(Z^{(r)},S^{(r)})$ and $\theta_r(Z^{(r')},S^{(r')})$.
Note that $\theta_r(Z^{(r)},S^{(r)})$ and $\theta_r(Z^{(r')},S^{(r')})$ are both time-inhomogeneous
wide-sense regenerative of type $p(\cdot|r + \cdot)$. The recurrence distributions are
$F_{r+s}$, $s \in [0,\infty)$. Due to (8.4), (8.5), and (8.6), the conditions (7.4), (7.5),
and (7.6) in Theorem 7.1 are satisfied.
In order to establish also the condition (7.3) in Theorem 7.1, note first
that the delay length of $\theta_r(Z^{(r')},S^{(r')})$ is
$$B^{(r')}_{r-} = \text{the residual life immediately before time } r.$$
Now, $\theta_{r'}(Z^{(r')},S^{(r')})$ is time-inhomogeneous wide-sense regenerative of type
$p(\cdot|r' + \cdot)$ with recurrence distributions $F_{r'+s}$, $s \in [0,\infty)$, satisfying [again
due to (8.4), (8.5), and (8.6)] the conditions (7.4), (7.5), and (7.6) in
Theorem 7.1. Thus the result at (7.8) yields the existence of a finite constant
$a_0$ determined by $f$ and $\hat F$ such that for $x \in [0,\infty)$,
$$\sup_{s\in[r',r)} P\big(B^{(r')}_{r-} - 1 > x \,\big|\, S^{(r')}_0 = s\big) \le a_0(1 - G_{\hat F}(x)). \tag{8.8}$$
On the event $\{S^{(r')}_0 \ge r\}$ we have $B^{(r')}_{r-} = S^{(r')}_0 - r < S^{(r')}_0 - r'$. This and
(8.8) yield the inequality in
$$\begin{aligned}
P(B^{(r')}_{r-} > x) &= P(B^{(r')}_{r-} > x, S^{(r')}_0 \ge r) + E\big[P(B^{(r')}_{r-} > x \,|\, S^{(r')}_0)1_{\{S^{(r')}_0 < r\}}\big] \\
&\le P(S^{(r')}_0 - r' > x) + a_0(1 - G_{\hat F}(x - 1)), \quad x \in [0,\infty).
\end{aligned}$$
Define a distribution function $\tilde G$ on $[0,\infty)$ by
$$\tilde G(x) = 1 - \big(1 - \hat G(x) + a_0(1 - G_{\hat F}(x - 1))\big) \wedge 1, \quad x \in [0,\infty), \tag{8.9}$$
to obtain from this and (8.3) that
$$P(B^{(r')}_{r-} \le x) \ge \tilde G(x), \quad x \in [0,\infty). \tag{8.10}$$
The delay-length of $\theta_r(Z^{(r)},S^{(r)})$ is $S^{(r)}_0 - r$, and since $\tilde G(x) \le \hat G(x)$ for
$x \in [0,\infty)$, it follows from (8.3) that
$$P(S^{(r)}_0 - r \le x) \ge \tilde G(x), \quad x \in [0,\infty).$$
Thus the distribution functions of the delay lengths of both $\theta_r(Z^{(r)},S^{(r)})$
and $\theta_r(Z^{(r')},S^{(r')})$ satisfy the condition (7.3) in Theorem 7.1 with $\hat G$
replaced by $\tilde G$.
Since $\theta_r(Z^{(r)},S^{(r)})$ and $\theta_r(Z^{(r')},S^{(r')})$ satisfy the conditions of
Theorem 7.1, it follows from Theorem 7.1 that there exists a finite random
variable $T$ with distribution determined by $\tilde G$, $f$, and $\hat F$ and such that
$$\|P(\theta_t Z^{(r)} \in \cdot) - P(\theta_t Z^{(r')} \in \cdot)\| \le P(T > t - r).$$
For each $\varepsilon > 0$ there is an $r_\varepsilon \in (-\infty,t]$ such that $P(T > t - r_\varepsilon) \le \varepsilon$, and
thus
$$\|P(\theta_t Z^{(r)} \in \cdot) - P(\theta_t Z^{(r')} \in \cdot)\| \le \varepsilon, \quad r' < r < r_\varepsilon.$$
Due to Lemma 8.1 below (applied to the family of probability measures
$P(\theta_t Z^{(r)} \in \cdot)$, $r \in (-\infty,0]$, with $t$ fixed), there is a probability measure
$\mu^{(t)}$ on $(H,\mathcal{H})$ such that
$$P(\theta_t Z^{(r)} \in \cdot) \xrightarrow{tv} \mu^{(t)}, \quad r \downarrow -\infty. \tag{8.11}$$
Let $W = (W_s)_{s\in[0,\infty)}$ be a stochastic process with the distribution $\mu^{(t)}$ and
define $Z^{(*,t)} := (W_{s-t})_{s\in[t,\infty)}$. Then $\theta_t Z^{(*,t)} = W$, and therefore we have
$P(\theta_t Z^{(*,t)} \in \cdot) = \mu^{(t)}$, and (8.7a,b) follows from (8.11).
In order to establish that the distribution of $Z^{(*,t)}$ is determined by the
type $p(\cdot|\cdot)$, let $(Z'^{(r)},S'^{(r)})$, $r \in (-\infty,0]$, be another family of the type
$p(\cdot|\cdot)$ and having uniformly dominated delay lengths [that is, satisfying a
condition like (8.3)]. Let $Z'^{(*,t)}$ denote the total variation limit of $\theta_t Z'^{(r)}$
as $r \downarrow -\infty$. Now $\theta_r(Z^{(r)},S^{(r)})$ and $\theta_r(Z'^{(r)},S'^{(r)})$ satisfy the conditions of
Theorem 7.1, and it follows from Theorem 7.1 that [since $t - r \to \infty$ as
$r \downarrow -\infty$]
$$\|P(\theta_t Z^{(r)} \in \cdot) - P(\theta_t Z'^{(r)} \in \cdot)\| \to 0, \quad r \downarrow -\infty.$$
It follows that the two limit processes $Z^{(*,t)}$ and $Z'^{(*,t)}$ must have the
same distribution, that is, the distribution of $Z^{(*,t)}$ is determined by the
type. □
The following result was used in the above proof. (It implies that the space
of all probability measures on a given measurable space is complete with
respect to total variation.)
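Lemma 8.1. Let $\mu_t$, $t \in [0,\infty)$, be a family of probability measures on
some measurable space $(G,\mathcal{G})$. Suppose for each $\varepsilon > 0$ there is a $t_\varepsilon$ such
that
$$\|\mu_t - \mu_{t'}\| \le \varepsilon, \quad t' > t > t_\varepsilon \quad \text{(the $\mu_t$ are Cauchy convergent)}.$$
Then there exists a probability measure $\mu$ on $(G,\mathcal{G})$ such that
$$\mu_t \xrightarrow{tv} \mu, \quad t \to \infty. \tag{8.12}$$
Proof. For each $A \in \mathcal{G}$ and $\varepsilon > 0$, we have
$$|\mu_t(A) - \mu_{t'}(A)| \le \varepsilon, \quad t' > t > t_\varepsilon. \tag{8.13}$$
Thus, for each $A \in \mathcal{G}$, there is a number $\mu(A) \in [0,1]$ such that
$$\mu_t(A) \to \mu(A), \quad t \to \infty. \tag{8.14}$$
Send $t' \to \infty$ in (8.13) to obtain that for each $A \in \mathcal{G}$ and $\varepsilon > 0$, we have
$$|\mu_t(A) - \mu(A)| \le \varepsilon, \quad t > t_\varepsilon. \tag{8.15}$$
Let $\mu$ be the set function taking $A \in \mathcal{G}$ to $\mu(A) \in [0,1]$. Since $\mu_t(G) = 1$, it
follows from (8.15) that $\mu(G) = 1$. In order to establish that $\mu$ is additive,
take disjoint $A$ and $B \in \mathcal{G}$ to obtain
$$\begin{aligned}
\mu(A \cup B) &= \lim_{t\to\infty} \mu_t(A \cup B) && \text{[due to (8.14)]} \\
&= \lim_{t\to\infty} (\mu_t(A) + \mu_t(B)) && \text{[$\mu_t$ is additive]} \\
&= \lim_{t\to\infty} \mu_t(A) + \lim_{t\to\infty} \mu_t(B) \\
&= \mu(A) + \mu(B) && \text{[due to (8.14)]}.
\end{aligned}$$
In order to establish that $\mu$ is continuous at $\emptyset$, let $A_1, A_2, \dots \in \mathcal{G}$ be a
sequence of sets decreasing to $\emptyset$ to obtain, with $\varepsilon > 0$ and $t > t_\varepsilon$,
$$\begin{aligned}
\mu(A_n) &\le \mu_t(A_n) + \varepsilon && \text{[due to (8.15)]} \\
&\to \varepsilon \text{ as } n \to \infty && \text{[$\mu_t$ is continuous at $\emptyset$]}.
\end{aligned}$$
Thus $\limsup_{n\to\infty} \mu(A_n) \le \varepsilon$ for all $\varepsilon > 0$, that is, $\lim_{n\to\infty} \mu(A_n) = 0$. Thus
$\mu$ is an additive set function with $\mu(G) = 1$ and continuous at $\emptyset$, that is,
$\mu$ is a probability measure. Take the supremum in (8.15) over $A \in \mathcal{G}$ and
multiply by 2 to obtain $\|\mu_t - \mu\| \le 2\varepsilon$ for all $\varepsilon > 0$ and $t > t_\varepsilon$, that is, (8.12)
holds. □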
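The last step of the proof uses the convention that the total variation norm of a difference of probability measures is twice the supremum of the set-wise differences. On a finite space this can be checked by brute force. The sketch below assumes a four-point space and two hypothetical probability vectors; the function names are illustrative only.

```python
from itertools import combinations


def sup_over_sets(mu, nu):
    """Largest value of |mu(A) - nu(A)| over all subsets A of a finite space."""
    points = range(len(mu))
    best = 0.0
    for size in range(len(mu) + 1):
        for subset in combinations(points, size):
            best = max(best, abs(sum(mu[i] - nu[i] for i in subset)))
    return best


def tv_norm(mu, nu):
    """Total variation norm of mu - nu: the total mass of |mu - nu|."""
    return sum(abs(m - n) for m, n in zip(mu, nu))


mu = [0.1, 0.4, 0.3, 0.2]      # hypothetical probability vector
nu = [0.25, 0.25, 0.25, 0.25]  # hypothetical probability vector

print(sup_over_sets(mu, nu), tv_norm(mu, nu))
```

The second printed value is twice the first, which is the factor 2 that appears when the supremum over sets is converted into the norm.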
8.4 Coupling From-the-Past
We shall now show that the processes coming in from-the-past can be made
to merge with the limit process (in the distributional sense, unless (E,£)
is Polish and the paths right-continuous).
Theorem 8.2. Suppose the conditions of Theorem 8.1 hold. Then, for each
$t \in \mathbb{R}$, there is a sequence of random times $S^{(*,t)}$ such that the family
$(Z^{(*,t)},S^{(*,t)})$, $t \in \mathbb{R}$, is time-inhomogeneous wide-sense regenerative of
type $p(\cdot|\cdot)$. Moreover, for each $t \in \mathbb{R}$, there exists a distributional exact
coupling of $\theta_t Z^{(r)}$ and $\theta_t Z^{(*,t)}$ with finite times $T^{(t)}$ and $T^{(*,t)}$ such that
$$T^{(t)} \overset{D}{\le} \hat T := Y_0 + Y_1 + Y_2 + \dots + Y_{3M},$$
where $Y_0$ and $Y_1$ are independent random variables with distribution
function $\tilde G$ defined at (8.9) and independent of the independent random
variables $M, Y_2, Y_3, \dots$ from Theorem 1.1. Finally, if $(E,\mathcal{E})$ is Polish and the
paths right-continuous, then there is a nondistributional exact coupling of
$\theta_t Z^{(r)}$ and $\theta_t Z^{(*,t)}$ with a finite time $T^{(t)}$ such that $T^{(t)} \le \hat T$.
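PROOF. Fix $t \in \mathbb{R}$. Apply Theorem 8.1 to the family $(Z^{(r)}_s, B^{(r)}_s)_{s\in[r,\infty)}$,
$r \in (-\infty,0]$, to obtain the existence of a process $(Z^{(*,t)}_s, B^{(*,t)}_s)_{s\in[t,\infty)}$ such
that [with $r = md$ when (8.5) holds]
$$\theta_t(Z^{(r)}_s, B^{(r)}_s)_{s\in[r,\infty)} \xrightarrow{tv} \theta_t(Z^{(*,t)}_s, B^{(*,t)}_s)_{s\in[t,\infty)}, \quad r \downarrow -\infty.$$
Let $S^{(r,t)}$ be the sequence of times $s \in [t,\infty)$ such that $B^{(r)}_{s-} = 0$, and $S^{(*,t)}$
the sequence of times $s \in [t,\infty)$ such that $B^{(*,t)}_{s-} = 0$, to obtain from this
that [see (3.2) of Lemma 3.1 in Chapter 6] for $n \ge 0$,
$$\big(\theta_{S^{(r,t)}_n}(Z^{(r)},S^{(r)}), (S^{(r,t)}_0,\dots,S^{(r,t)}_n)\big) \xrightarrow{tv} \big(\theta_{S^{(*,t)}_n}(Z^{(*,t)},S^{(*,t)}), (S^{(*,t)}_0,\dots,S^{(*,t)}_n)\big), \quad r \downarrow -\infty.$$
Now, for $r \in (-\infty,0]$, $n \ge 0$, and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P\big(\theta_{S^{(r,t)}_n}(Z^{(r)},S^{(r)}) \in A \,\big|\, S^{(r,t)}_0,\dots,S^{(r,t)}_n\big) = p(A|S^{(r,t)}_n) \quad \text{a.s.},$$
and thus [see the final statement of Lemma 5.3] the same holds for the
limit, namely for $n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}$,
$$P\big(\theta_{S^{(*,t)}_n}(Z^{(*,t)},S^{(*,t)}) \in A \,\big|\, S^{(*,t)}_0,\dots,S^{(*,t)}_n\big) = p(A|S^{(*,t)}_n) \quad \text{a.s.}$$
Thus the family $(Z^{(*,t)},S^{(*,t)})$, $t \in \mathbb{R}$, is time-inhomogeneous wide-sense
regenerative of type $p(\cdot|\cdot)$.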
To establish the coupling claims, note first that due to (8.10) we have,
for $r \in (-\infty,0]$, that $P(S^{(r,t)}_0 - t \le x) \ge \tilde G(x)$, $x \in [0,\infty)$, where $\tilde G$ is the distribution function defined at (8.9). Sending $r \downarrow -\infty$
we see that the same holds for the limit, namely $P(S^{(*,t)}_0 - t \le x) \ge \tilde G(x)$
for $x \in [0,\infty)$. Since $S^{(r,t)}_0 - t$ is the delay-length of $\theta_t Z^{(r)}$ and $S^{(*,t)}_0 - t$
is the delay-length of $\theta_t Z^{(*,t)}$, and since both $\theta_t Z^{(r)}$ and $\theta_t Z^{(*,t)}$ are
time-inhomogeneous wide-sense regenerative of type $p(\cdot|t + \cdot)$ with recurrence
distributions $F_{t+s}$, $s \in [0,\infty)$, it follows that the conditions (7.3) through
(7.6) in Theorem 7.1 are satisfied with $\hat G$ replaced by $\tilde G$. This yields the
distributional coupling claim. The nondistributional coupling claim now
follows from Theorem 3.2 in Chapter 4. □
8.5 Two-Sided Limit Process
Theorem 8.1 yields a limit process $Z^{(*,t)}$ with time set $[t,\infty)$ for each $t \in \mathbb{R}$.
We shall now show that if $(E,\mathcal{E})$ is Polish, then this family of processes
can be obtained by restricting a single two-sided process $Z^* = (Z^*_s)_{s\in\mathbb{R}}$ to
one-sided time sets $[t,\infty)$. Note that this two-sided process $Z^*$ need not be
stationary.
Theorem 8.3. Suppose the conditions of Theorem 8.1 hold. If $(E,\mathcal{E})$ is
Polish, then there exists a two-sided stochastic process $Z^* = (Z^*_s)_{s\in\mathbb{R}}$, with
path space $(E^{\mathbb{R}},\mathcal{E}^{\mathbb{R}})$, such that for each $t \in \mathbb{R}$,
$$\theta_t Z^{(r)} \xrightarrow{tv} \theta_t Z^* \text{ as } r \downarrow -\infty, \text{ when (8.4) holds}, \tag{8.16a}$$
$$\theta_t Z^{(md)} \xrightarrow{tv} \theta_t Z^* \text{ as } m \downarrow -\infty, \text{ when (8.5) holds}. \tag{8.16b}$$
If further the paths of $Z^{(r)}$, $r \in [t,\infty)$, are right-continuous with left-hand
limits, then so are the paths of $Z^*$, and for each $t \in \mathbb{R}$, there exists a
nondistributional exact coupling of $\theta_t Z^{(r)}$ and $\theta_t Z^*$ with a finite coupling
time $T^{(t)}$ such that $T^{(t)} \le \hat T$, where $\hat T$ is from Theorem 8.2.
Comment. If the paths are right-continuous with left-hand limits (that is,
if the path set $H$ consists of such paths), then we may take $\theta_t Z^{(*,t)} := \theta_t Z^*$
for all $t \in \mathbb{R}$. On the other hand, if the paths are not right-continuous with
left-hand limits, then $Z^*$ has only path space $(E^{\mathbb{R}},\mathcal{E}^{\mathbb{R}})$. Thus $\theta_t Z^*$ has path
space $(E^{[0,\infty)},\mathcal{E}^{[0,\infty)})$ and not the path space $(H,\mathcal{H})$ of $\theta_t Z^{(*,t)}$ (and of the
$\theta_t Z^{(r)}$). However, since $\theta_t Z^*$ has the same distribution as $\theta_t Z^{(*,t)}$ regarded
as an $H$ valued random element in $(E^{[0,\infty)},\mathcal{E}^{[0,\infty)})$, the total variation
convergence at (8.16a) and (8.16b) makes formal sense by interpreting $\xrightarrow{tv}$
to mean that total variation convergence holds with the distribution of
$\theta_t Z^*$ restricted to $(H,\mathcal{H})$.
Proof. Put
$$\mu(E^{(-\infty,t)} \times A) := P(Z^{(*,t)} \in A), \quad t \in \mathbb{R},\ A \in \mathcal{E}^{[t,\infty)}.$$
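Take $t' < t$ (and $r \in d\mathbb{Z}$ when (8.5) holds). Due to (8.7a) and (8.7b), we
have [with $r = md$ when (8.5) holds], as $r \downarrow -\infty$,
$$\theta_t Z^{(*,t)} \xleftarrow{tv} \theta_t Z^{(r)} = \theta_{t-t'}(\theta_{t'} Z^{(r)}) \xrightarrow{tv} \theta_{t-t'}(\theta_{t'} Z^{(*,t')}) = \theta_t Z^{(*,t')}.$$
Thus $\theta_t Z^{(*,t)}$ and $\theta_t Z^{(*,t')}$ have the same distribution, that is,
$$P(Z^{(*,t)} \in A) = P(Z^{(*,t')} \in E^{[t',t)} \times A), \quad t' < t,\ A \in \mathcal{E}^{[t,\infty)}.$$
From this we see that the set function $\mu$ is well-defined on the subalgebra
$\{E^{(-\infty,t)} \times A : t \in \mathbb{R},\ A \in \mathcal{E}^{[t,\infty)}\}$ of $\mathcal{E}^{\mathbb{R}}$. Due to the Kolmogorov extension
theorem [see Fact 3.2 in Chapter 3], $\mu$ extends uniquely to a probability
measure on $\mathcal{E}^{\mathbb{R}}$. Let $Z^*$ be a process with the distribution $\mu$ to obtain
(8.16a) and (8.16b) from (8.7a) and (8.7b).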
In order to obtain that the right continuity and left-hand limits transfer
to the paths of $Z^*$, note that for all $m \in \mathbb{Z}$ the state space of the
sequences $((Z^{(*,m)}_{k+s})_{s\in[0,1]})_{k=m}^{\infty}$ is the Polish space $(D_E[0,1],\mathcal{D}_E[0,1])$, namely
the set of all right-continuous mappings from $[0,1]$ to $E$ having left-hand
limits, equipped with its Borel subsets. Thus the state space of the
finite-dimensional distributions of the sequence $((Z^*_{k+s})_{s\in[0,1]})_{k=-\infty}^{\infty}$ can be
restricted to $(D_E[0,1],\mathcal{D}_E[0,1])$. Apply the Kolmogorov extension theorem
to obtain a sequence $(W_k)_{k=-\infty}^{\infty}$ with state space $(D_E[0,1],\mathcal{D}_E[0,1])$ and
having these finite-dimensional distributions. Remove the null event that there
is a $k$ such that the right endpoint of $W_k$ does not agree with the left
endpoint of $W_{k+1}$. Now redefine $Z^*$ by putting $(Z^*_{k+s})_{s\in[0,1)} := W_k$ for each
$k \in \mathbb{Z}$, to obtain a process with right-continuous paths having left-hand
limits.
The nondistributional coupling claim follows from Theorem 8.2, since
(E, £) is Polish and the paths right-continuous. □
Remark 8.1. It follows from Theorem 8.2 that $Z^*$ is time-inhomogeneous
wide-sense regenerative in the sense that the one-sided $(Z^*_s)_{s\in[t,\infty)}$
restricted to the path-space $(H^{(t)},\mathcal{H}^{(t)})$ is so for all $t \in \mathbb{R}$. It can be shown [at
least when the paths of $Z^*$ are right-continuous with left-hand limits; see
Thorisson (1988)] that $Z^*$ is time-inhomogeneous wide-sense regenerative
in a proper two-sided sense.
8.6 Moments of T — Convergence Rates — Uniform Convergence
From Theorems 8.2 and 7.2 we obtain the following result. (The function
class A is defined just before Theorem 7.2.)
Theorem 8.4. Suppose the conditions of Theorem 8.1 hold. Then the
convergence at (8.7a) and (8.7b) holds uniformly over families $(Z^{(r)},S^{(r)})$,
$r \in (-\infty,0]$, satisfying (8.3) through (8.6) with $\hat G$, $f$, and $\hat F$ fixed. If $\hat G$
and $\hat F$ have finite geometric moments, then so has $T$, and the uniform
convergence is of geometric order. If $\varphi \in \Lambda$ and $\hat G$ has finite $\varphi$ moment and
F has finite # moment, then T has finite <p moment, and the uniform
convergence is of order <p and of moment order tp. If (E, £) is Polish, then the
same holds for the convergence at (8.16a) and (8.166).
Proof. Take $r < t \wedge 0$ (with $r \in d\mathbb{Z}$ if (8.5) holds). From Theorem 8.2
and the coupling time inequality (Theorem 6.1 in Chapter 4) we obtain
$$\|P(\theta_t Z^{(r)} \in \cdot) - P(\theta_t Z^{(*,t)} \in \cdot)\| \le 2P(T > t - r).$$
Let $\sup^*$ denote the supremum over families $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$,
satisfying (8.3) through (8.6) to obtain [since the distribution of $T$ is determined
by $\hat G$, $f$, and $\hat F$]
$$\sup{}^* \|P(\theta_t Z^{(r)} \in \cdot) - P(\theta_t Z^{(*,t)} \in \cdot)\| \le 2P(T > t - r). \tag{8.17}$$
This yields the uniformity of the convergence at (8.7a) and (8.7b).
The moment claims for $T$ follow from Theorem 7.2 if we can establish
the same claims for the distribution function $\tilde G$ defined at (8.9). From (8.9)
we obtain
$$1 - \tilde G(x) \le (1 - \hat G(x)) + a_0(1 - G_{\hat F}(x - 1)), \quad x \in [0,\infty). \tag{8.18}$$
Suppose $\hat G$ and $\hat F$ have finite geometric moments. Since $\hat F$ has a finite
geometric moment, so [see (7.13a)] has $G_{\hat F}$, and thus [see (7.13a)] there is a
$\rho > 1$ such that $\int_0^\infty \rho^y(1 - G_{\hat F}(y))\,dy < \infty$. Since $\int_1^\infty \rho^x(1 - G_{\hat F}(x - 1))\,dx =
\rho\int_0^\infty \rho^y(1 - G_{\hat F}(y))\,dy$, this yields
$$\int_0^\infty \rho^x a_0(1 - G_{\hat F}(x - 1))\,dx < \infty. \tag{8.19}$$
Since $\hat G$ has a finite geometric moment, we can [see (7.13a)] take $\rho$ close
enough to 1 for $\int_0^\infty \rho^x(1 - \hat G(x))\,dx$ to be finite. This, together with (8.19)
and (8.18), yields $\int_0^\infty \rho^x(1 - \tilde G(x))\,dx < \infty$, and thus [see (7.13a)] $\tilde G$ has
a finite geometric moment. Thus [due to Theorem 7.2] so has $T$, and a
reference to Section 6 in Chapter 4 yields that the uniform convergence is
of geometric order.
Take <p G A and suppose G has finite <p moment and F has finite #
moment. Then [see (7.136)] Gp has finite <p moment, and thus Gp(- - 1)
has finite ip(- — 1) moment. Since [see Lemma 7.2(a)] ip ^ tp(l)tp(- — 1), this
means that Gp(- — 1) has finite ip moment, and thus [see (7.136)]
/•OO
/ (p{x)a0{l-Gp(x-l))dx < oo. (8.20)
Since G has finite <p moment, we have [see (7.136)] f^°ip(x)(l — G(x))dx <oo.
This, together with (8.20) and (8.18), yields /0°°<p(i)(l - G{x))dx < oo,
and thus [see (7.136)] G has finite ip moment. Thus [due to Theorem 7.2]
so has T, and a reference to Section 6 in Chapter 4 yields that the uniform
convergence is of order <p and of moment order <p.
The final claim of the theorem follows from the fact that $\theta_t Z^*$ [restricted
to $(H,\mathcal{H})$] has the same distribution as $\theta_t Z^{(*,t)}$ for $t \in \mathbb{R}$. □
8.7 Time-inhomogeneous Regeneration in [r, 0]
The limit result in Theorem 8.1 should only depend on the behaviour of
the regeneration times in the far past. It is reasonable to expect that it
holds for processes regenerating only up to some fixed time. Without loss
of generality we can take this time to be zero.
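The following modification of the framework in Section 2 and in
Section 8.1 is needed for this purpose. For $r \in (-\infty,0]$, let $S^{(r)} = (S^{(r)}_k)_0^\infty$
be a one-sided nondecreasing sequence of $[r,\infty]$ valued random times such
that for each $n \ge 0$,
$$r \le S^{(r)}_0 < S^{(r)}_1 < \dots < S^{(r)}_n \quad \text{on } \{S^{(r)}_n < \infty\}.$$
Thus $S^{(r)}$ is strictly increasing as long as it stays in $[r,\infty)$, and is absorbed
in $\infty$ when leaving $[r,\infty)$.
Regard $S^{(r)}$ as a measurable mapping from $(\Omega,\mathcal{F})$ to the sequence space
$(L^{(r)}_\infty,\mathcal{L}^{(r)}_\infty)$, where
$$L^{(r)}_\infty = \{(s_k)_0^\infty \in [r,\infty]^{\{0,1,\dots\}} : s_{k-1} < s_k < \infty \text{ or } s_k = s_{k+1} = \infty\},$$
$$\mathcal{L}^{(r)}_\infty = L^{(r)}_\infty \cap \mathcal{B}[r,\infty]^{\{0,1,\dots\}} \quad \text{(the Borel subsets of } L^{(r)}_\infty\text{)}.$$
Put $(L_\infty,\mathcal{L}_\infty) := (L^{(0)}_\infty,\mathcal{L}^{(0)}_\infty)$. Note that $L^{(r)}_\infty = r + L_\infty$.
For $t \in [r,\infty)$, define the joint shift-map $\theta_t$ on $H^{(r)} \times L^{(r)}_\infty$ to be the map
taking $(z,(s_k)_0^\infty) \in H^{(r)} \times L^{(r)}_\infty$ to
$$\theta_t(z,(s_k)_0^\infty) := (\theta_t z, (s_{n_{t-}+k} - t)_0^\infty) \in H \times L_\infty,$$
where $n_{t-} = \inf\{k \ge 0 : s_k \ge t\}$.
For $t = \infty$, define $\theta_t(z,(s_k)_0^\infty) := \Delta$, where $\Delta$ is an external nonrandom
constant (see Section 2.9 in Chapter 4).
Let $Z^{(r)} = (Z^{(r)}_s)_{s\in[r,\infty)}$ be as in Section 8.1 and let it be shift-measurable.
Call the family $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, time-inhomogeneous regenerative
up to time zero and of type $p(\cdot|\cdot)$ if
$p(\cdot|\cdot)$ is a $(((-\infty,0],\mathcal{B}(-\infty,0]),(H \times L_\infty,\mathcal{H} \otimes \mathcal{L}_\infty))$ probability kernel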
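and, for $r \in (-\infty,0]$, $n \ge 0$, and $A \in \mathcal{H} \otimes \mathcal{L}_\infty$, the following holds a.s. on
$\{S^{(r)}_n \le 0\}$:
$$P\big(\theta_{S^{(r)}_n}(Z^{(r)},S^{(r)}) \in A \,\big|\, (Z^{(r)}_s)_{s\in[r,S^{(r)}_n)}, S^{(r)}_0,\dots,S^{(r)}_n\big) = p(A|S^{(r)}_n) \quad \text{a.s. on } \{S^{(r)}_n \le 0\}. \tag{8.21}$$
Call the family $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, time-inhomogeneous wide-sense
regenerative up to time zero and of type $p(\cdot|\cdot)$ if instead of (8.21) it holds
only that
$$P\big(\theta_{S^{(r)}_n}(Z^{(r)},S^{(r)}) \in A \,\big|\, S^{(r)}_0,\dots,S^{(r)}_n\big) = p(A|S^{(r)}_n) \quad \text{a.s. on } \{S^{(r)}_n \le 0\}. \tag{8.22}$$
In both cases $S^{(r)}$ is a strictly increasing discrete-time Markov process as
long as it stays in $[r,0]$. For $r \in (-\infty,0]$ and $n \ge 1$, let
$$X^{(r)}_n = \begin{cases} S^{(r)}_n - S^{(r)}_{n-1} & \text{if } S^{(r)}_{n-1} < \infty, \\ \infty & \text{if } S^{(r)}_{n-1} = \infty, \end{cases}$$
be the recurrence times and let $F_s$ be the recurrence distribution at $s$, that
is, for all $r \in (-\infty,0]$ and $n \ge 1$,
$$F_s(A) = P(X^{(r)}_n \in A \,|\, S^{(r)}_{n-1} = s), \quad s \in [r,0],\ A \in \mathcal{B}([r,\infty)).$$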
8.8 Asymptotics From-the-Past in the [r, 0] Case
We shall use Theorems 8.1 through 8.4 to establish the following
generalization. This result will find an application in the next section.
Theorem 8.5. Let $(Z^{(r)},S^{(r)})$, $r \in (-\infty,0]$, be time-inhomogeneous
wide-sense regenerative up to time zero of type $p(\cdot|\cdot)$. Let $G^{(r)}$ be the distribution
function of $S^{(r)}_0 - r$ and $F_s$, $s \in (-\infty,0]$, be the recurrence distributions.
Firstly, suppose there is a distribution function $\hat G$ on $[0,\infty)$ such that for
$r \in (-\infty,0]$,
$$G^{(r)}(x) \ge \hat G(x), \quad x \in [0,-r). \tag{8.23}$$
Secondly, suppose that either there is a $c > 0$ and a subprobability density
$f$ on $[0,c]$ such that $\int f > 0$ and, for $s \in (-\infty,-c]$,
$$F_s(B) \ge \int_B f, \quad B \in \mathcal{B}[0,\infty), \tag{8.24}$$
or there is an integer $m > 0$, a $d > 0$, and an aperiodic subprobability mass
function $f$ on $\{d, 2d, \dots, md\}$ such that for $s \in (-\infty,-md]$,
$$F_s(\{kd\}) \ge f(kd),\ 1 \le k \le m, \text{ and } S \text{ and } S' \text{ are } d\mathbb{Z} \text{ valued}. \tag{8.25}$$
Thirdly, suppose there is a distribution function F on [0, oo) such that
J xF(dx) < oo and, for s € ( — oo,0],
F,{x)^F(x), i€[0,-j). (8.26)
Then there exists, for each t € M., a stochastic process .£(**') = [Zl*' )se[t,oo)
that is time-inhomogeneous wide-sense regenerative up to time zero, of type
p(-|-), with distribution determined by the type, and such that
0tZ(r) % OtZ^M asrl -oo, when (8.24) holds, (8.27a)
fftZ(md) tv fftZ(*,t) asm± _00) when (8.25) holds. (8.276)
For each t € (—oo, —c], there are (not necessarily finite) distributional exact
coupling times jW and TW) for 6tZ^ and GtZ(*V:
(9noZ(t),r(J)) = (^T(.,()^(*'t),T(*'t)), (8.28a)
such that
T^ A (-t) ^ f, where f is as in Theorem 8.2. (8.286)
If (E, £) is Polish and the paths right-continuous, then there exists a nondis-
tributional exact coupling ofOtZ^ andOtZ^*'1^ with (not necessarily finite)
coupling time T^> satisfying (8.286).
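Moreover, if $(E, \mathcal{E})$ is Polish, then there exists a two-sided stochastic process $Z^* = (Z^*_s)_{s \in \mathbb{R}}$, with path space $(E^{\mathbb{R}}, \mathcal{E}^{\mathbb{R}})$, such that for each $t \in \mathbb{R}$,
$$\theta_t Z^{(r)} \overset{tv}{\longrightarrow} \theta_t Z^* \quad \text{as } r \downarrow -\infty, \text{ when (8.24) holds,} \tag{8.29a}$$
$$\theta_t Z^{(md)} \overset{tv}{\longrightarrow} \theta_t Z^* \quad \text{as } m \downarrow -\infty, \text{ when (8.25) holds.} \tag{8.29b}$$
If the paths of $Z^{(r)}$, $r \in [t, \infty)$, are right-continuous with left-hand limits, then so are the paths of $Z^*$, and for each $t \in \mathbb{R}$, there exists a nondistributional exact coupling of $\theta_t Z^{(r)}$ and $\theta_t Z^*$ with (not necessarily finite) coupling time $T^{(r)}$ satisfying (8.28b).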
Finally, the convergence at (8.27a) and (8.27b), and at (8.28a) and (8.28b), holds uniformly over families $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, satisfying (8.23) through (8.26) with $G$, $f$, and $F$ fixed. If $G$ and $F$ have finite geometric moments, then so has $T$, and the uniform convergence is of geometric order. If $\varphi \in \Lambda$ and $G$ has finite $\varphi$ moment and $F$ has finite $\Phi$ moment, then $T$ has finite $\varphi$ moment, and the uniform convergence is of order $\varphi$ and of moment order $\varphi$.
Comment. The comment to Theorem 8.3 also applies here.
Proof. Put $c = md$ when (8.5) holds. For $r \in (-\infty, -c]$, let $\hat{G}^{(r)}$ be the distribution function of $(S_0^{(r)} - r) \wedge (-r)$, that is,
$$\hat{G}^{(r)}(x) = G^{(r)}(x) \text{ for } x \in [0, -r) \quad \text{and} \quad \hat{G}^{(r)}(x) = 1 \text{ for } x \in [-r, \infty).$$
For $s \in (-\infty, -c]$, let $\hat{F}_s$ be the conditional distribution function of the random variable $X_n \wedge (-s)$ given $S_{n-1} = s$, that is,
$$\hat{F}_s(x) = F_s(x) \text{ for } x \in [0, -s) \quad \text{and} \quad \hat{F}_s(x) = 1 \text{ for } x \in [-s, \infty).$$
For $s \in [-c, \infty)$ put
$$\hat{F}_s = \hat{F}_{-c}.$$
Note that due to (8.23) through (8.26), the conditions (8.3) through (8.6) in Theorem 8.1 are satisfied with $G^{(r)}$ and $F_s$, $s \in [r, \infty)$, replaced by $\hat{G}^{(r)}$ and $\hat{F}_s$, $s \in [r, \infty)$.
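We shall apply Theorem 8.1 to the time-inhomogeneous wide-sense regenerative family $(W^{(r)}, R^{(r)})$, $r \in (-\infty, 0]$, defined as follows. Let $\Delta$ be some nonrandom constant and $V_1, V_2, \dots$ i.i.d. random variables with the distribution $\hat{F}_{-c}$ and independent of the family $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$. For each $r \in (-\infty, 0]$, put
$$W_s^{(r)} := \theta_s Z^{(r)} \text{ for } s \in [r, -c] \quad \text{and} \quad W_s^{(r)} := \Delta \text{ for } s \in (-c, \infty)$$
and (with $S_{-1}^{(r)} := -\infty$), for $k \ge 0$,
$$R_k^{(r)} := \begin{cases} S_k^{(r)} \wedge 0, & \text{if } S_{k-1}^{(r)} \le -c, \\ R_{k-1}^{(r)} + V_k, & \text{if } -c < S_{k-1}^{(r)}. \end{cases}$$
Then $R^{(r)}$ is Markovian with state space $([r, \infty), \mathcal{B}[r, \infty))$, increases strictly to $\infty$, and has recurrence distributions $\hat{F}_s$, $s \in [r, \infty)$; and $R_0^{(r)} - r$ has the distribution function $\hat{G}^{(r)}$.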
Since the family $(W^{(r)}, R^{(r)})$, $r \in (-\infty, 0]$, satisfies the conditions of Theorem 8.1, it follows from Theorem 8.1 that for each $t \in \mathbb{R}$, there exists a stochastic process $W^{(*,t)} = (W_s^{(*,t)})_{s \in [t, \infty)}$ such that
$$\theta_t W^{(r)} \overset{tv}{\longrightarrow} \theta_t W^{(*,t)} \quad \text{as } r \downarrow -\infty, \text{ when (8.24) holds,} \tag{8.30a}$$
$$\theta_t W^{(md)} \overset{tv}{\longrightarrow} \theta_t W^{(*,t)} \quad \text{as } m \downarrow -\infty, \text{ when (8.25) holds.} \tag{8.30b}$$
For $r \in (-\infty, -c]$, define $Z^{(*,r)}$ by $\theta_r Z^{(*,r)} := W_r^{(*,r)}$ and recall that $W_r^{(r)} := \theta_r Z^{(r)}$. Thus [see (3.2) of Lemma 3.1 in Chapter 6] (8.30a) and (8.30b) yield that (8.27a) and (8.27b) hold for $t \in (-\infty, -c]$. For $t \in (-c, \infty)$, define $Z^{(*,t)}$ by $\theta_t Z^{(*,t)} := \theta_t Z^{(*,-c)}$ to obtain (8.27a) and (8.27b) from the observation that when $t \in (-c, \infty)$, the left-hand side, $\theta_t Z^{(r)}$, is obtained from $\theta_{-c} Z^{(r)}$ by the same shift as the right-hand side, $\theta_t Z^{(*,t)}$, from $\theta_{-c} Z^{(*,-c)}$ [see (3.2) of Lemma 3.1 in Chapter 6].
In order to obtain the regeneration claim, proceed as in the first part of the proof of Theorem 8.2 to obtain the existence of a sequence of random times $S^{(*,t)}$ such that for $n \ge 0$ and $A \in \mathcal{H} \otimes \mathcal{L}_\infty$, the following holds a.s. on $\{S_n^{(*,t)} \le 0\}$:
$$P(\theta_{S_n^{(*,t)}}(Z^{(*,t)}, S^{(*,t)}) \in A \mid S_0^{(*,t)}, \dots, S_n^{(*,t)}) = p(A \mid S_n^{(*,t)}).$$
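In order to obtain the distributional coupling claim, apply Theorem 8.2 to the family $(W^{(r)}, R^{(r)})$, $r \in (-\infty, 0]$, to obtain, for each $t \in (-\infty, -c]$, finite random times $\tau^{(r)}$ and $\tau^{(*,t)}$ such that
$$(\theta_{\tau^{(r)}} \theta_t W^{(r)}, \tau^{(r)}) \overset{D}{=} (\theta_{\tau^{(*,t)}} \theta_t W^{(*,t)}, \tau^{(*,t)}) \quad \text{and} \quad \tau^{(r)} \overset{D}{\le} T. \tag{8.31}$$
Define
$$T^{(r)} := \tau^{(r)} \text{ if } \tau^{(r)} \le -t - c \quad \text{and} \quad T^{(r)} := \infty \text{ if } \tau^{(r)} > -t - c,$$
$$T^{(*,t)} := \tau^{(*,t)} \text{ if } \tau^{(*,t)} \le -t - c \quad \text{and} \quad T^{(*,t)} := \infty \text{ if } \tau^{(*,t)} > -t - c,$$
and recall that $W_t^{(r)} = \theta_t Z^{(r)}$ and $W_t^{(*,t)} = \theta_t Z^{(*,t)}$ for $t \in (-\infty, -c]$ to obtain from (8.31) that (8.28a) and (8.28b) hold. The nondistributional coupling claim now follows from Theorem 3.2 in Chapter 4.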
If $(E, \mathcal{E})$ is Polish, repeat the proof of Theorem 8.3 to obtain from (8.27a) and (8.27b) that (8.28a) and (8.28b) hold for a two-sided $Z^*$, and that right continuity and left-hand limits of the paths transfer to $Z^*$. The nondistributional coupling claim follows from the fact that $(E, \mathcal{E})$ is Polish and the paths right-continuous.
Theorem 8.4 yields the moment results for $T$ and the rate and uniformity results for the convergence at (8.30a) and (8.30b). This yields the rate and uniformity results for the convergence at (8.27a) and (8.27b) [since the left- and right-hand sides at (8.27a) and (8.27b) are measurable mappings of the left- and right-hand sides at (8.30a) and (8.30b)]. The rate and uniformity results for the convergence at (8.28a) and (8.28b) follow immediately from this. □
9 Taboo Regeneration
Suppose we are studying a fish population that has lived a long time in
an isolated lake. This fish population will eventually become extinct, but
suppose it is still there at the time of observation. Then it is not
appropriate to use asymptotic stationarity to motivate a stationary process as a
model for the present state of the population. We should rather consider
the asymptotic behaviour of the population under an extinction taboo, that
is, conditionally on the observed fact that the population is still nonextinct
at the time of observation. We should look for a taboo limit.
In this section we shall introduce taboo regenerative processes, processes
that regenerate as long as some specific event (like extinction) has not
occurred. This is the generalization of regeneration appropriate for obtaining
a taboo limit. A key ingredient in our analyses will be the fact
(Theorem 9.1 below) that the taboo conditioning turns taboo regeneration into
time-inhomogeneous regeneration up-to-time-zero (up to the observation
time). Therefore the asymptotics from-the-past in the previous section
apply to yield a taboo limit (Theorem 9.4 below).
9.1 Preliminaries
For taboo purposes we need to modify the framework in Section 2 by allowing the sequence of times $S$ to be terminating (to be absorbed at infinity). Let $S = (S_k)_0^\infty$ be a nondecreasing sequence of random times that is strictly increasing as long as it is finite, that is, for each $n \ge 0$,
$$0 \le S_0 < S_1 < \dots < S_n \quad \text{on } \{S_n < \infty\}.$$
Regard $S$ as a measurable mapping from $(\Omega, \mathcal{F})$ to the sequence space $(L_\infty, \mathcal{L}_\infty)$, where (with $s_{-1} = -\infty$)
$$L_\infty = \{(s_k)_0^\infty \in [0, \infty]^{\{0,1,\dots\}} : s_{k-1} < s_k < \infty \text{ or } s_k = s_{k+1} = \infty\},$$
$$\mathcal{L}_\infty = L_\infty \cap \mathcal{B}[0, \infty]^{\{0,1,\dots\}} \quad \text{(the Borel subsets of $L_\infty$).}$$
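Let $Z = (Z_s)_{s \in [0, \infty)}$ be as in Section 2 and let $\Gamma$ be a finite nonnegative random time. Let $(\Omega, \mathcal{F}, P)$ be the probability space supporting $(Z, S, \Gamma)$ and assume that
$$P(\Gamma \ge S_n) > 0, \quad n \ge 0.$$
The triple $(Z, S, \Gamma)$ is a measurable mapping from the measurable space $(\Omega, \mathcal{F})$ to $(H \times L_\infty \times [0, \infty), \mathcal{H} \otimes \mathcal{L}_\infty \otimes \mathcal{B}[0, \infty))$. As in Section 2, for $t \in [0, \infty)$, let $\theta_t$ be the shift-map from $H$ to $H$
$$\theta_t z := (z_{t+s})_{s \in [0, \infty)}$$
and also the joint shift-map from $H \times L_\infty$ to $H \times L_\infty$:
$$\theta_t(z, (s_k)_0^\infty) := (\theta_t z, (s_{n_{t-}+k} - t)_0^\infty),$$
where $n_{t-} = \inf\{n \ge 1 : s_n \ge t\}$.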
Further, for $t \in [0, \infty)$, let $\theta_t$ be the joint shift-map from $H \times L_\infty \times [0, \infty)$ to $H \times L_\infty \times [0, \infty)$ defined as follows:
$$\theta_t(z, (s_k)_0^\infty, x) := (\theta_t(z, (s_k)_0^\infty), (x - t)^+).$$
In order to be able to shift with $t$ replaced by a $[0, \infty]$ valued random time, let $\Delta$ be a fixed nonrandom constant (the cemetery; see Section 2.9 in Chapter 4) and define
$$\theta_\infty z := \theta_\infty(z, (s_k)_0^\infty) := \theta_\infty(z, (s_k)_0^\infty, x) := (\Delta)_{s \in [0, \infty)}.$$
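The random times $S_n$ split $Z$ into a delay
$$D := (Z_s)_{s \in [0, S_0)}$$
and a (this time possibly terminating) sequence of cycles: for $n \ge 1$,
$$C_n := (Z_{S_{n-1}+s})_{s \in [0, X_n)} \quad \text{on } \{S_{n-1} < \infty\},$$
where $X_n$ are the cycle-lengths
$$X_n := S_n - S_{n-1} \quad \text{on } \{S_{n-1} < \infty\}.$$
In order to have nonterminating sequences of cycles and cycle-lengths put, for $n \ge 1$,
$$C_n := (\Delta)_{s \in [0, \infty)} \quad \text{and} \quad X_n := \infty \quad \text{on } \{S_{n-1} = \infty\}.$$
Put
$$(Z^\circ, S^\circ, \Gamma^\circ) := \theta_{S_0}(Z, S, \Gamma)$$
and regard $(Z^\circ, S^\circ, \Gamma^\circ)$ as supported by the probability space $(\Omega, \mathcal{F}, P^\circ)$, where
$$P^\circ := P(\cdot \mid \Gamma \ge S_0).$$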
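The splitting of a path into a delay and a possibly terminating sequence of cycles can be sketched in code. The following is only an illustration under simplifying assumptions that are not part of the text: time is discrete, the path is a finite list, the function name `split_cycles` is hypothetical, and `None` stands in for the cemetery path $(\Delta)_{s \in [0, \infty)}$.

```python
import math

def split_cycles(path, times):
    """Split a discrete-time path into a delay and cycles.

    path  : list of states, path[s] is the state at integer time s
    times : nondecreasing regeneration times; math.inf marks termination
    Returns (delay, cycles, lengths).
    """
    finite = [s for s in times if s != math.inf]
    # Delay: the path strictly before S_0 (the whole path if S_0 is infinite).
    delay = path[:finite[0]] if finite else list(path)
    cycles, lengths = [], []
    for prev, nxt in zip(times, times[1:]):
        if prev == math.inf:
            # S_{n-1} is infinite: cemetery cycle of infinite length.
            cycles.append(None)
            lengths.append(math.inf)
        elif nxt == math.inf:
            # Last, unfinished cycle: runs to the end of the observed path.
            cycles.append(path[prev:])
            lengths.append(math.inf)
        else:
            cycles.append(path[prev:nxt])
            lengths.append(nxt - prev)
    return delay, cycles, lengths

path = list("abcabbcaab")
delay, cycles, lengths = split_cycles(path, [2, 5, 7, math.inf, math.inf])
```

Here the sequence of times terminates after $S_2 = 7$, so the third cycle has infinite length and every later cycle is the cemetery placeholder.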
9.2 Taboo Regeneration — Definition
Call a one-sided shift-measurable stochastic process $Z$ taboo regenerative with regeneration times $S$ and taboo time $\Gamma$ if for all $n \ge 0$,
$$P(\theta_{S_n}(Z, S, \Gamma) \in \cdot \mid \Gamma \ge S_n) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot) \tag{9.1}$$
and $\theta_{S_n}(Z, S, \Gamma)$, given the event $\{\Gamma \ge S_n\}$, is conditionally independent of $((Z_s)_{s \in [0, S_n)}, S_0, \dots, S_n)$. These two conditions can be written as a single condition:
$$P(\theta_{S_n}(Z, S, \Gamma) \in \cdot \mid (Z_s)_{s \in [0, S_n)}, S_0, \dots, S_n; \Gamma \ge S_n) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot), \quad n \ge 0. \tag{9.2}$$
Call the triple $(Z, S, \Gamma)$ taboo regenerative if this holds.
This definition can be reformulated as follows: $(Z, S, \Gamma)$ is taboo regenerative if and only if for each $n \ge 1$,
given $\{\Gamma \ge S_n\}$, $D, C_1, \dots, C_{n-1}, \theta_{S_n}(Z, S, \Gamma)$ are conditionally independent and $C_1, \dots, C_{n-1}$ are i.i.d.
We shall refer to the conditioning on the events $\{\Gamma \ge S_n\}$ by saying under taboo. Thus taboo regeneration means, loosely speaking, that under taboo the future is independent of the past, and the past cycles are i.i.d.
Call a triple $(Z', S', \Gamma')$ a version of a taboo regenerative $(Z, S, \Gamma)$ if $(Z', S', \Gamma')$ is also taboo regenerative and
$$P(\theta_{S'_0}(Z', S', \Gamma') \in \cdot \mid \Gamma' \ge S'_0) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot). \tag{9.3}$$
In particular, $(Z^\circ, S^\circ, \Gamma^\circ)$ under $P^\circ$ is a zero-delayed version of a taboo regenerative $(Z, S, \Gamma)$.
9.3 Taboo Wide-Sense Regeneration — Definition
Call a one-sided shift-measurable stochastic process $Z$ taboo wide-sense regenerative with regeneration times $S$ and taboo time $\Gamma$ if instead of (9.2) we have only:
$$P(\theta_{S_n}(Z, S, \Gamma) \in \cdot \mid S_0, \dots, S_n; \Gamma \ge S_n) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot), \quad n \ge 0. \tag{9.4}$$
Call the triple $(Z, S, \Gamma)$ taboo wide-sense regenerative if this holds.
Taboo wide-sense regeneration differs from taboo regeneration in that under the taboo $\{\Gamma \ge S_n\}$, the future $\theta_{S_n}(Z, S, \Gamma)$ is no longer independent of the full past but only of the past regeneration times $(S_0, \dots, S_n)$. However, (9.1) still holds.
Call a triple $(Z', S', \Gamma')$ a version of a taboo wide-sense regenerative $(Z, S, \Gamma)$ if $(Z', S', \Gamma')$ is also taboo wide-sense regenerative and (9.3) holds. In particular, $(Z^\circ, S^\circ, \Gamma^\circ)$ under $P^\circ$ is a zero-delayed version of a taboo wide-sense regenerative $(Z, S, \Gamma)$.
Taboo lag-$l$ regeneration is defined analogously (see Section 4.1), but we shall refrain from discussing the lag-$l$ case here. The discussion is sufficiently inflated without it.
9.4 Examples
Suppose $(Z, S)$ is classical regenerative (Section 3) and $\Gamma$ is a first exit time,
$$\Gamma = \inf\{t \ge 0 : Z_t \notin B\} \quad \text{(for some specific $B \in \mathcal{E}$)}$$
that is finite, measurable, and such that $P(\Gamma \ge S_n) > 0$ for $n \ge 0$. Then $(Z, S, \Gamma)$ is taboo regenerative.
On the other hand, if $(Z, S)$ is only wide-sense regenerative (Section 4) and $\Gamma$ is a finite measurable first exit time such that $P(\Gamma \ge S_n) > 0$ for $n \ge 0$, then $(Z, S, \Gamma)$ need not be taboo wide-sense regenerative. The dependence between future and past at the times of regeneration may, under the taboo, destroy the independence between the future and the past regeneration times, since the taboo event $\{\Gamma \ge S_n\}$ carries information about the past.
For an example of a process that is not classical regenerative but taboo regenerative, consider a transient Markov chain $Z$. Let $B$ be a transient set of states that is irreducible, that is, the chain can go from any state $i \in B$ to any other state $j \in B$ through a sequence of states in $B$. If $\Gamma$ is the first exit time from $B$, and $S$ the times of successive entrances into a fixed reference state $j \in B$, then $(Z, S, \Gamma)$ is taboo regenerative.
For an example of a process that is not wide-sense regenerative but taboo wide-sense regenerative, modify Section 4.5 as follows. Let $Z$ be a strong Markov process and $\Gamma$ a measurable first exit time from a transient set of states $B$. Say that a subset $A$ of $B$ is a taboo regeneration set if $A$ can be revisited without leaving $B$, if the first exit time from $B$ is measurable, and if (4.13) holds with $\mu(B) = 1$. Note that a taboo regeneration set $A$ is not a regeneration set, since a part of our definition of a regeneration set was that $A$ be recurrent, but $A$ is a subset of the transient set $B$ and thus is transient itself. An argument analogous to the one in Section 4.5 yields a sequence of taboo lag-$l$ regeneration times for $Z$.
Finally, consider the GI/GI/k queueing system (see Sections 3.2 and 4.2) in the transient case, that is, with the mean inter-arrival time less than the mean service time divided by $k$. Let $\Gamma$ be the first time that the queue length exceeds some fixed level. If the system can empty, then the successive entrances to an idle system form taboo regeneration times for the queue length and the ordered remaining service times. If the system cannot empty, then a modification of the argument in Section 4.3 yields taboo wide-sense regeneration times for these processes.
9.5 Time-Inhomogeneous Regeneration Under Taboo in [0, t]
Consider a Markov chain $Z$ forbidden to leave an irreducible finite set of states $B$ up to time $t$. If the chain is further conditioned on visiting a fixed reference state $j \in B$ at a time $s$ where $s \le t$, then its behaviour from time $s$ onward is, firstly, independent of its past before time $s$ and, secondly, like the behaviour of a chain starting in state $j$ and forbidden to leave $B$ in a time interval of length $t - s$. That is, $Z$ regenerates at time $s$ but the regeneration is time-inhomogeneous; the future after time $s$ depends on $s$.
This shows that under taboo in $[0, t]$, the Markov chain $Z$ is time-inhomogeneous regenerative in $[0, t]$. Now note that the distribution of the future after regeneration at time $s$ does not only depend on $s$ but also on $t$. However, the dependence on $t$ and $s$ is only through $t - s$, the length of the period from regeneration to the end of the taboo interval. Thus if we instead of starting $Z$ at time $0$ start it at the time $-t$ and forbid it to leave $B$ up to time $0$, then the distribution of the future after regeneration at a time $s \le 0$ depends only on $s$ and not on $t$. The chain becomes time-inhomogeneous regenerative in $[-t, 0]$, and the type is the same for all $t$.
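The dependence on the remaining taboo time only can be made concrete for a discrete-time chain. The sketch below is an illustration under assumptions that are not in the text: time is discrete, `Q` is a hypothetical substochastic transition matrix restricted to $B = \{0, 1\}$, and the function names are made up for the example. Conditioning on staying in $B$ for `remaining` more steps turns the one-step transition probabilities into `Q[i][k] * u_next[k] / u_now[i]`, where `u_m(i)` is the probability of staying in $B$ for `m` steps from state `i`.

```python
def survival(Q, m):
    """u_m(i) = P_i(chain stays in B for m steps) = (Q^m 1)(i)."""
    u = [1.0] * len(Q)
    for _ in range(m):
        u = [sum(q * uk for q, uk in zip(row, u)) for row in Q]
    return u

def taboo_kernel(Q, remaining):
    """One-step transition matrix given survival for `remaining` >= 1 more steps."""
    u_now = survival(Q, remaining)
    u_next = survival(Q, remaining - 1)
    size = len(Q)
    return [[Q[i][k] * u_next[k] / u_now[i] for k in range(size)]
            for i in range(size)]

# Hypothetical substochastic matrix on B = {0, 1}; the missing row mass is
# the probability of leaving B in one step.
Q = [[0.5, 0.3], [0.2, 0.4]]
near_end = taboo_kernel(Q, 1)       # one step before the taboo interval ends
far_from_end = taboo_kernel(Q, 5)   # five steps before it ends
```

The two kernels are proper transition matrices but differ from each other, which is the time-inhomogeneity described above: the conditioned dynamics depend on how much of the taboo interval remains, not on the calendar time.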
The above example suggests that if $Z$ is taboo regenerative and if we start $Z$ at time $-t$ instead of at time $0$, then taboo in $[-t, 0]$ yields a process that is time-inhomogeneous regenerative up to time zero and of a type that does not depend on $t$.
We shall now show that this is indeed the case. (This result makes Theorem 8.5 available to establish taboo limits in Theorem 9.4 below.)
Theorem 9.1. For each $t \in [0, \infty)$, let $(Z^{(-t)}, S^{(-t)})$ be a pair with distribution
$$P((Z^{(-t)}, S^{(-t)}) \in \cdot) := P(((Z_{s+t})_{s \in [-t, \infty)}, (S_k - t)_0^\infty) \in \cdot \mid \Gamma > t)$$
and define a probability kernel $p(\cdot|\cdot)$ by
$$p(A|s) := P^\circ((Z^\circ, S^\circ) \in A \mid \Gamma^\circ > -s), \quad A \in \mathcal{H} \otimes \mathcal{L}_\infty, \ s \in (-\infty, 0].$$
If $(Z, S, \Gamma)$ is taboo regenerative, then the family $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, is time-inhomogeneous regenerative up to time zero and of type $p(\cdot|\cdot)$.
If $(Z, S, \Gamma)$ is taboo wide-sense regenerative, then $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, is time-inhomogeneous wide-sense regenerative up to time zero and of type $p(\cdot|\cdot)$.
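Proof. Consider first the case when $(Z, S, \Gamma)$ is taboo wide-sense regenerative. Fix arbitrary $t \in [0, \infty)$, $x \in [0, t]$, and $n \ge 0$. Apply (9.4) to obtain
$$P(\theta_{S_n}(Z, S) \in \cdot, \Gamma - S_n > t - x \mid S_0, \dots, S_n; \Gamma \ge S_n) = P^\circ((Z^\circ, S^\circ) \in \cdot, \Gamma^\circ > t - x).$$
But [use Fact 3.1 in Chapter 6]
$$P(\theta_{S_n}(Z, S) \in \cdot, \Gamma - S_n > t - x \mid (S_0, \dots, S_{n-1}) = \cdot, S_n = x; \Gamma \ge S_n) = P(\theta_{S_n}(Z, S) \in \cdot, \Gamma > t \mid (S_0, \dots, S_{n-1}) = \cdot, S_n = x; \Gamma \ge S_n),$$
and thus
$$P(\theta_{S_n}(Z, S) \in \cdot, \Gamma > t \mid (S_0, \dots, S_{n-1}) = \cdot, S_n = x; \Gamma \ge S_n) = P^\circ((Z^\circ, S^\circ) \in \cdot, \Gamma^\circ > t - x). \tag{9.5}$$
In particular,
$$P(\Gamma > t \mid (S_0, \dots, S_{n-1}) = \cdot, S_n = x; \Gamma \ge S_n) = P^\circ(\Gamma^\circ > t - x).$$
Divide (9.5) by this to obtain
$$P(\theta_{S_n}(Z, S) \in \cdot \mid (S_0, \dots, S_{n-1}) = \cdot, S_n = x, \Gamma > t) = P^\circ((Z^\circ, S^\circ) \in \cdot \mid \Gamma^\circ > t - x).$$
Thus, with $s = x - t$,
$$P(\theta_{S_n^{(-t)}}(Z^{(-t)}, S^{(-t)}) \in \cdot \mid (S_0^{(-t)}, \dots, S_{n-1}^{(-t)}) = \cdot, S_n^{(-t)} = s) = p(\cdot \mid s).$$
Since $s \in [-t, 0]$ is arbitrary, this means that the pair $(Z^{(-t)}, S^{(-t)})$ is time-inhomogeneous wide-sense regenerative up to time zero and of type $p(\cdot|\cdot)$.
When the triple $(Z, S, \Gamma)$ is taboo regenerative, replace $(S_0, \dots, S_{n-1})$ by $((Z_s)_{s \in [0, S_n)}, S_0, \dots, S_{n-1})$ in the above argument to obtain the desired result. □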
9.6 Change of Measure - Exponential Biasing
We shall now make an exponential change of measure that is hard to
motivate intuitively. It is, however, motivated (mathematically at least) by its
use in the proof of Theorem 9.2 below and will be further motivated by
its use in the next section. It turns out to be the taboo counterpart of the
length-biasing of cycle-stationary processes in Chapter 8.
Note that if $(Z, S, \Gamma)$ is taboo regenerative (in the wide-sense or not), then
$$P((X_{n+1}, X_{n+2}, \dots) \in \cdot \mid S_0, \dots, S_n; \Gamma \ge S_n) = P^\circ((X_1, X_2, \dots) \in \cdot), \quad n \ge 0.$$
If this holds, call $S$ a taboo renewal process with taboo time $\Gamma$ and the pair $(S, \Gamma)$ taboo regenerative. Under the taboo $\{\Gamma \ge S_n\}$, the recurrence times $X_1, \dots, X_{n-1}$ are i.i.d. and independent of the delay length $S_0$.
Make the following basic assumption:
There is an $\alpha > 0$ such that $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$
and define a probability measure $P^\circ_{\mathrm{taboo}}$ on $(\Omega, \mathcal{F})$ by
$$dP^\circ_{\mathrm{taboo}} := e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}} \, dP^\circ. \tag{9.6}$$
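To see what the basic assumption asks for, it may help to solve it numerically in a toy model. The sketch below rests on assumptions that are not in the text: the cycle lengths are exponential with a hypothetical rate `LAM`, and the taboo time is an independent exponential killing time with hypothetical rate `KAPPA`, so that by memorylessness $P^\circ(\Gamma^\circ \ge x) = e^{-\kappa x}$ independently of $X_1$. In that model the equation $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$ has the root $\alpha = \kappa$, and under $P^\circ_{\mathrm{taboo}}$ the cycle length is again exponential with rate `LAM`.

```python
import math

LAM, KAPPA = 2.0, 0.5   # hypothetical cycle rate and killing rate

def integrate(g, upper=60.0, steps=20_000):
    """Composite midpoint rule for the integral of g over [0, upper]."""
    h = upper / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

def tilted_mass(alpha):
    """E°[exp(alpha X1) 1{Gamma° >= X1}] in the toy model:
    integral of exp(alpha x) * exp(-KAPPA x) * LAM exp(-LAM x) dx."""
    return integrate(lambda x: LAM * math.exp((alpha - KAPPA - LAM) * x))

def solve_alpha(lo=0.0, hi=2.0, iters=40):
    """Bisection; tilted_mass is increasing, below 1 at lo and above 1 at hi."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if tilted_mass(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = solve_alpha()
# Mean of X1 under the exponentially biased measure of (9.6).
tilted_mean = integrate(
    lambda x: x * LAM * math.exp((alpha - KAPPA - LAM) * x))
```

The quadrature and the bracketing interval `[0, 2]` are chosen for this toy model only; other cycle-length distributions would need their own integrand and bracket.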
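Further, note that for $n \ge 0$,
$$e^{\alpha \Gamma} 1_{\{S_n \le \Gamma < S_{n+1}\}} = (e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}})(e^{\alpha X_1} 1_{\{\Gamma - S_0 \ge X_1\}}) \cdots (e^{\alpha X_n} 1_{\{\Gamma - S_{n-1} \ge X_n\}})(e^{\alpha(\Gamma - S_n)} 1_{\{\Gamma - S_n < X_{n+1}\}}). \tag{9.7}$$
Take conditional expectations $E[\cdot \mid S_0, \dots, S_n; \Gamma \ge S_n], \dots, E[\cdot \mid S_0; \Gamma \ge S_0]$ and $E[\cdot]$ recursively and apply taboo regeneration to obtain
$$E[e^{\alpha \Gamma} 1_{\{S_n \le \Gamma < S_{n+1}\}}] = E[e^{\alpha S_0} 1_{\{S_0 \le \Gamma\}}] \, E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}]^n \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
Since $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$, this yields
$$E[e^{\alpha \Gamma} 1_{\{S_n \le \Gamma < S_{n+1}\}}] = E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
Hence if we assume that
$$E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] < \infty \quad \text{and} \quad E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] < \infty,$$
then we can define probability measures $P_0, P_1, \dots$ on $(\Omega, \mathcal{F})$ by
$$dP_n := \frac{e^{\alpha \Gamma} 1_{\{S_n \le \Gamma < S_{n+1}\}}}{E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]} \, dP, \quad n \ge 0. \tag{9.8}$$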
The following lemma is the key to the proof of the next theorem.
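Lemma 9.1. Let $R = (R_k)_0^\infty$ be a renewal process with recurrence times $Y_n = R_n - R_{n-1}$ having distribution
$$P(Y_n \in \cdot) = P^\circ_{\mathrm{taboo}}(X_1 \in \cdot), \quad n \ge 1,$$
and delay time $R_0$ having distribution
$$P(R_0 \in \cdot) = P_0(\Gamma \in \cdot).$$
Then
$$P(R_n \in \cdot) = P_n(\Gamma \in \cdot), \quad n \ge 0. \tag{9.9}$$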
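Proof. Fix $n \ge 0$ and let $f_0, f_1, \dots, f_n, g \in \mathcal{B}$ be bounded. Due to (9.7) and taboo regeneration [take conditional expectations recursively], we have
$$E[f_0(S_0) f_1(X_1) \cdots f_n(X_n) g(\Gamma - S_n) e^{\alpha \Gamma} 1_{\{S_n \le \Gamma < S_{n+1}\}}] = E[f_0(S_0) e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, E^\circ[f_1(X_1) e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] \cdots E^\circ[f_n(X_1) e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] \, E^\circ[g(\Gamma^\circ) e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
The special case $n = 0$ yields that the product of the first and last factors on the right equals $E[f_0(S_0) g(\Gamma - S_0) e^{\alpha \Gamma} 1_{\{S_0 \le \Gamma < S_1\}}]$. Thus dividing by $E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]$ on both sides yields (due to the definition of $P_n$, $P_0$, and $P^\circ_{\mathrm{taboo}}$)
$$E_n[f_0(S_0) f_1(X_1) \cdots f_n(X_n) g(\Gamma - S_n)] = E_0[f_0(S_0) g(\Gamma - S_0)] \, E^\circ_{\mathrm{taboo}}[f_1(X_1)] \cdots E^\circ_{\mathrm{taboo}}[f_n(X_1)].$$
Thus
$$\text{under $P_n$, $X_1, \dots, X_n$ are i.i.d. and independent of $(S_0, \Gamma - S_n)$,} \tag{9.10a}$$
$$P_n(X_k \in \cdot) = P^\circ_{\mathrm{taboo}}(X_1 \in \cdot) = P(Y_k \in \cdot), \quad 1 \le k \le n, \tag{9.10b}$$
$$P_n((S_0, \Gamma - S_n) \in \cdot) = P_0((S_0, \Gamma - S_0) \in \cdot). \tag{9.10c}$$
From (9.10c) we obtain $P_n(S_0 + \Gamma - S_n \in \cdot) = P_0(\Gamma \in \cdot) = P(R_0 \in \cdot)$. Since
$$\Gamma = (S_0 + \Gamma - S_n) + X_1 + \dots + X_n,$$
this, together with (9.10a) and (9.10b), yields (9.9). □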
9.7 Exponential Taboo Asymptotics for r
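It is natural to start the study of taboo asymptotics by considering the taboo time $\Gamma$ itself.
Theorem 9.2. Let $(S, \Gamma)$ be taboo regenerative. Suppose
$$\text{there is an $\alpha > 0$ such that } E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1, \tag{9.11a}$$
$$E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] < \infty, \tag{9.11b}$$
$$E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] < \infty, \tag{9.11c}$$
$$e^{\alpha t} P(S_0 > \Gamma > t) \to 0 \text{ as } t \to \infty, \tag{9.11d}$$
$$E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] < \infty. \tag{9.11e}$$
If $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is nonlattice, then as $t \to \infty$,
$$e^{\alpha t} P(\Gamma > t) \to E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, \frac{E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]}{\alpha \, E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}]}.$$
If $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is lattice with span $d$ and
$$P^\circ(S_0 \in d\mathbb{Z}) = 1 \quad \text{and} \quad P^\circ(\Gamma^\circ \in d\mathbb{Z} \mid \Gamma^\circ < X_1) = 1,$$
then as $n \to \infty$,
$$e^{\alpha nd} P(\Gamma > nd) \to E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, \frac{d \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]}{(e^{\alpha d} - 1) \, E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}]}.$$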
Comment. In the nonlattice case the theorem implies that conditionally on $\Gamma > t$, the remaining taboo time $\Gamma - t$ is asymptotically exponential with parameter $\alpha$, namely, for $x \in [0, \infty)$,
$$P(\Gamma - t > x \mid \Gamma > t) \to e^{-\alpha x}, \quad t \to \infty.$$
In the lattice case the theorem implies that conditionally on $\Gamma > nd$, the random variable $(\Gamma - nd)/d$ is asymptotically geometric with parameter $p = 1 - e^{-\alpha d}$, namely, for $k \ge 1$,
$$P((\Gamma - nd)/d > k \mid \Gamma > nd) \to (1 - p)^k, \quad n \to \infty.$$
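The lattice statement can be checked on a small Markov chain. The following sketch is an illustration under assumptions that are not in the text: time is discrete (span $d = 1$), `Q` is a hypothetical substochastic matrix on $B = \{0, 1\}$, the reference state is `J = 0`, $\Gamma$ is the first exit time from $B$, and the bisection bracket is chosen by hand for this matrix. The code solves $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$ from the taboo first-return probabilities and compares $e^{-\alpha}$ with the one-step decay ratio of $P(\Gamma > n)$.

```python
import math

Q = [[0.5, 0.3], [0.2, 0.6]]   # hypothetical substochastic matrix on B = {0, 1}
J = 0                          # reference state

def taboo_return_probs(Q, j, n_max):
    """f[n] = P_j(first return to j at step n without leaving B), n = 1..n_max."""
    size = len(Q)
    f = [0.0] * (n_max + 1)
    # mass[k]: probability of sitting in k != j after the current number of
    # steps, having neither left B nor revisited j.
    mass = [Q[j][k] if k != j else 0.0 for k in range(size)]
    f[1] = Q[j][j]
    for n in range(2, n_max + 1):
        f[n] = sum(mass[i] * Q[i][j] for i in range(size))
        mass = [sum(mass[i] * Q[i][k] for i in range(size)) if k != j else 0.0
                for k in range(size)]
    return f

def solve_alpha(f, lo=0.0, hi=0.5, iters=60):
    """Bisection for sum_n exp(alpha n) f[n] = 1 (bracket chosen for this Q)."""
    def mass_at(a):
        return sum(fn * math.exp(a * n) for n, fn in enumerate(f))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mass_at(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def survival_from(Q, j, n):
    """P_j(Gamma > n): probability of staying in B for n steps."""
    u = [1.0] * len(Q)
    for _ in range(n):
        u = [sum(q * uk for q, uk in zip(row, u)) for row in Q]
    return u[j]

f = taboo_return_probs(Q, J, 400)
alpha = solve_alpha(f)
ratio = survival_from(Q, J, 51) / survival_from(Q, J, 50)
```

Both rows of this `Q` sum to 0.8, so the survival probability decays exactly geometrically and the ratio agrees with $e^{-\alpha} = 1 - p$; for a general matrix the agreement would hold only in the limit.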
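Proof. Consider first the nonlattice case. Put
$$c = E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] \, E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
Let $R, Y_1, Y_2, \dots$ be as in Lemma 9.1 and note that
$$E[Y_1] = E^\circ_{\mathrm{taboo}}[X_1] = E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}]. \tag{9.12}$$
Due to this and (9.11d), we must prove that
$$e^{\alpha t} P(\Gamma > t, \Gamma \ge S_0) \to \frac{c}{\alpha E[Y_1]}, \quad t \to \infty. \tag{9.13}$$
In order to establish (9.13), note that
$$\begin{aligned}
e^{\alpha t} P(\Gamma > t, \Gamma \ge S_0) &= E[e^{\alpha t} 1_{\{\Gamma > t\}} 1_{\{S_0 \le \Gamma\}}] \\
&= \sum_{n=0}^\infty E[e^{\alpha t} 1_{\{\Gamma > t\}} 1_{\{S_n \le \Gamma < S_{n+1}\}}] \\
&= c \sum_{n=0}^\infty E_n[e^{-\alpha(\Gamma - t)} 1_{\{\Gamma > t\}}] \quad \text{[by (9.8)]} \\
&= c \sum_{n=0}^\infty E[e^{-\alpha(R_n - t)} 1_{\{R_n > t\}}] \quad \text{[by (9.9)].}
\end{aligned}$$
Let $M_t$ be the first $n$ such that $R_n > t$ and let $V_t = R_{M_t} - t$ be the residual life at time $t$. Then
$$\begin{aligned}
e^{\alpha t} P(\Gamma > t, \Gamma \ge S_0) &= c E\Big[\sum_{n=0}^\infty e^{-\alpha(R_n - t)} 1_{\{R_n > t\}}\Big] \\
&= c E\Big[\sum_{k=0}^\infty e^{-\alpha(V_t + Y_{M_t+1} + \dots + Y_{M_t+k})}\Big] \\
&= c \sum_{k=0}^\infty E[e^{-\alpha(V_t + Y_{M_t+1} + \dots + Y_{M_t+k})}] \\
&= c \sum_{k=0}^\infty E[e^{-\alpha V_t}] \, E[e^{-\alpha Y_{M_t+1}}] \cdots E[e^{-\alpha Y_{M_t+k}}] \\
&= c E[e^{-\alpha V_t}] \sum_{k=0}^\infty E[e^{-\alpha Y_1}]^k.
\end{aligned}$$
Since $E[e^{-\alpha Y_1}] < 1$, this yields
$$e^{\alpha t} P(\Gamma > t, \Gamma \ge S_0) = c E[e^{-\alpha V_t}] / (1 - E[e^{-\alpha Y_1}]). \tag{9.14}$$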
Since $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is nonlattice, so is $P^\circ_{\mathrm{taboo}}(X_1 \in \cdot)$, that is, $Y_1$ is nonlattice. Due to (9.12) and (9.11c) we have $E[Y_1] < \infty$. Due to Theorem 10.1 in Chapter 2, the residual life $V_t$ tends in distribution to a continuous random variable $W$ with density $P(Y_1 > x)/E[Y_1]$, $x \ge 0$. Thus
$$\begin{aligned}
\lim_{t \to \infty} E[e^{-\alpha V_t}] = E[e^{-\alpha W}] &= \int_0^\infty e^{-\alpha x} P(Y_1 > x) \, dx / E[Y_1] \\
&= E\Big[\int_0^\infty e^{-\alpha x} 1_{\{Y_1 > x\}} \, dx\Big] / E[Y_1] \\
&= E\Big[\int_0^{Y_1} e^{-\alpha x} \, dx\Big] / E[Y_1] \\
&= (1 - E[e^{-\alpha Y_1}]) / (\alpha E[Y_1]).
\end{aligned} \tag{9.15}$$
This together with (9.14) yields (9.13), and the theorem is established in the nonlattice case.
In the lattice case carry out the above argument with $t = nd$ and with the following modification. Use Theorem 10.2 in Chapter 2 (instead of Theorem 10.1 in that chapter) to obtain a lattice-valued limit variable $W$ with probability mass function $d P(Y_1 \ge kd)/E[Y_1]$, $k \ge 1$. In (9.15) replace the integral by a sum and the density by this probability mass function to obtain
$$\lim E[e^{-\alpha V_t}] = d(1 - E[e^{-\alpha Y_1}]) / ((e^{\alpha d} - 1) E[Y_1]).$$
This together with (9.14) yields (9.13) with $\alpha$ replaced by $(e^{\alpha d} - 1)/d$, and a reference to (9.12) and (9.11d) completes the proof in the lattice case. □
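The lattice version of (9.15) is a finite-sum identity and can be verified directly. The sketch below is only a numerical check under assumptions chosen for illustration: $Y_1$ takes finitely many values $kd$, given by a hypothetical dictionary `pmf`, and the function names are made up for the example. It compares the transform of the limit variable $W$, computed from its probability mass function $d P(Y_1 \ge kd)/E[Y_1]$, with the closed form $d(1 - E[e^{-\alpha Y_1}])/((e^{\alpha d} - 1)E[Y_1])$.

```python
import math

def lattice_residual_transform(pmf, alpha, d=1.0):
    """E[exp(-alpha W)] where P(W = k d) = d P(Y >= k d) / E[Y], k >= 1.

    pmf maps the integer k to P(Y = k d).
    """
    mean = sum(k * d * p for k, p in pmf.items())
    total = 0.0
    for k in range(1, max(pmf) + 1):
        tail = sum(p for j, p in pmf.items() if j >= k)   # P(Y >= k d)
        total += math.exp(-alpha * k * d) * d * tail / mean
    return total

def closed_form(pmf, alpha, d=1.0):
    """d (1 - E[exp(-alpha Y)]) / ((exp(alpha d) - 1) E[Y])."""
    mean = sum(k * d * p for k, p in pmf.items())
    laplace = sum(p * math.exp(-alpha * k * d) for k, p in pmf.items())
    return d * (1.0 - laplace) / ((math.exp(alpha * d) - 1.0) * mean)

pmf = {1: 0.2, 2: 0.5, 3: 0.3}   # hypothetical distribution of Y_1 / d
lhs = lattice_residual_transform(pmf, alpha=0.4)
rhs = closed_form(pmf, alpha=0.4)
```

The two values agree up to rounding, because the sum over $k$ of $e^{-\alpha k d} P(Y_1 \ge kd)$ is a geometric sum inside the expectation.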
9.8 Stochastic Domination
We shall now use Theorem 9.2 to establish the stochastic domination result
needed to apply Theorem 8.5 for taboo limit purposes.
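Theorem 9.3. Suppose the conditions in Theorem 9.2 hold. Then there are finite constants $a$ and $b$ such that for $t \in [0, \infty)$ and $x \in [0, t]$,
$$P(S_0 > x \mid \Gamma > t) \le a E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0 > x\}}] + a e^{\alpha t} P(S_0 > \Gamma > t), \tag{9.16a}$$
and
$$P^\circ(X_1 > x \mid \Gamma^\circ > t) \le b E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1 > x\}}] + b e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t). \tag{9.16b}$$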
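Proof. By Theorem 9.2, there are $t_0 \ge 0$, $a_0 > 0$, and $a_1 < \infty$ such that for $t \ge t_0$, we have $e^{\alpha t} P(\Gamma > t) \ge a_0$ and $e^{\alpha t} P^\circ(\Gamma^\circ > t) \le a_1$. For $0 \le t \le t_0$, we have $e^{\alpha t} P(\Gamma > t) \ge P(\Gamma > t_0) > 0$ and $e^{\alpha t} P^\circ(\Gamma^\circ > t) \le e^{\alpha t_0} < \infty$. Thus we may take $a_0 > 0$ and $a_1 < \infty$ such that
$$e^{\alpha t} P(\Gamma > t) \ge a_0 \quad \text{and} \quad e^{\alpha t} P^\circ(\Gamma^\circ > t) \le a_1 \quad \text{for } t \in [0, \infty). \tag{9.17}$$
For $x \in [0, t]$ we have (using taboo regeneration for the third identity)
$$\begin{aligned}
P(S_0 > x, \Gamma > t) &= P(x < S_0 \le t, \Gamma > t) + P(S_0 > t, \Gamma > t) \\
&= \int_{(x,t]} P(S_0 \in dy, \Gamma > t) + P(S_0 > t, \Gamma > t) \\
&= \int_{(x,t]} P^\circ(\Gamma^\circ > t - y) P(S_0 \in dy, \Gamma \ge S_0) + P(S_0 > t, \Gamma > t) \\
&\le a_1 e^{-\alpha t} \int_{(x,t]} e^{\alpha y} P(S_0 \in dy, \Gamma \ge S_0) + P(S_0 > t, \Gamma > t),
\end{aligned}$$
where the inequality follows from (9.17). Divide by $P(\Gamma > t)$ and apply (9.17) again to obtain, with $a$ the maximum of $a_1/a_0$ and $1/a_0$,
$$P(S_0 > x \mid \Gamma > t) \le a \int_{(x,t]} e^{\alpha y} P(S_0 \in dy, \Gamma \ge S_0) + a e^{\alpha t} P(S_0 > t, \Gamma > t).$$
Now
$$\int_{(x,t]} e^{\alpha y} P(S_0 \in dy, \Gamma \ge S_0) = E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}} 1_{\{x < S_0 \le t\}}]$$
and
$$a e^{\alpha t} P(\Gamma \ge S_0 > t) \le a E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}} 1_{\{t < S_0\}}],$$
and thus
$$P(S_0 > x \mid \Gamma > t) \le a E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}} 1_{\{x < S_0 \le t\}}] + a E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}} 1_{\{t < S_0\}}] + a e^{\alpha t} P(S_0 > \Gamma > t).$$
Add the first two terms on the right to obtain (9.16a).
In order to obtain (9.16b), repeat the above argument with $S_0$, $\Gamma$, $P$, and $E$ replaced by $X_1$, $\Gamma^\circ$, $P^\circ$, and $E^\circ$. □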
9.9 The Taboo Limit Theorem
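We are now ready to give conditions under which a taboo regenerative process considered in a time interval $[t - h, \infty)$ tends, under taboo in $[t, \infty)$, to a limit process with time set $[-h, \infty)$ as $t \to \infty$.
Theorem 9.4. Let $(Z, S, \Gamma)$ be taboo regenerative (in the wide sense or not). Suppose
$$\text{there is an $\alpha > 0$ such that } E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1, \tag{9.18a}$$
$$E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0\}}] < \infty, \tag{9.18b}$$
$$E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] < \infty, \tag{9.18c}$$
$$e^{\alpha t} P(S_0 > \Gamma > t) \to 0 \text{ as } t \to \infty, \tag{9.18d}$$
$$\int_0^\infty \sup_{t \in [x, \infty)} e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t) \, dx < \infty. \tag{9.18e}$$
If $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is spread out, then for each $h \in [0, \infty)$, there exists a stochastic process $Z^{(*,-h)} = (Z_s^{(*,-h)})_{s \in [-h, \infty)}$ such that
$$P(\theta_{t-h} Z \in \cdot \mid \Gamma > t) \overset{tv}{\longrightarrow} P(\theta_{-h} Z^{(*,-h)} \in \cdot), \quad t \to \infty.$$
If $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is lattice with span $d$ and
$$P^\circ(S_0 \in d\mathbb{Z}) = 1 \quad \text{and} \quad P^\circ(\Gamma^\circ \in d\mathbb{Z} \mid \Gamma^\circ < X_1) = 1, \tag{9.19}$$
then for each $h \in [0, \infty)$, there exists a process $Z^{(*,-h)} = (Z_s^{(*,-h)})_{s \in [-h, \infty)}$ such that
$$P(\theta_{nd-h} Z \in \cdot \mid \Gamma > nd) \overset{tv}{\longrightarrow} P(\theta_{-h} Z^{(*,-h)} \in \cdot), \quad n \to \infty.$$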
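Comment. The conditions (9.18a) through (9.18d) are the same as the conditions (9.11a) through (9.11d) in Theorem 9.2, but (9.18e) is stronger than (9.11e). On the other hand, (9.18e) is weaker than a first-moment version of (9.11e): we have
$$E^\circ[\Gamma^\circ e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] < \infty \ \Rightarrow \ \text{(9.18e)} \ \Rightarrow \ E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] < \infty.$$
In order to establish the former implication, note that
$$e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t) \le E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}} 1_{\{\Gamma^\circ > t\}}] \le E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}} 1_{\{\Gamma^\circ > x\}}], \quad t \ge x,$$
take the supremum over $t \ge x$, and integrate over $x$ to obtain
$$\int_0^\infty \sup_{t \in [x, \infty)} e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t) \, dx \le E^\circ\Big[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}} \int_0^\infty 1_{\{\Gamma^\circ > x\}} \, dx\Big] = E^\circ[\Gamma^\circ e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
The latter implication follows from
$$\int_0^\infty e^{\alpha x} P^\circ(X_1 > \Gamma^\circ > x) \, dx = E^\circ\Big[1_{\{\Gamma^\circ < X_1\}} \int_0^{\Gamma^\circ} e^{\alpha x} \, dx\Big] \ge (E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] - 1)/\alpha.$$
Here we need (9.18e) to be able to apply Theorem 8.5, but let us state as a conjecture that Theorem 9.4 holds with (9.18e) replaced by (9.11e). This is suggested by Theorem 9.2 and also by the structure of the limit process in the next section, which relies only on (9.11e).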
Proof. Theorem 9.4 follows from Theorem 8.5 if we can establish that the family $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, in Theorem 9.1 satisfies the conditions (8.23) through (8.26) in Theorem 8.5. We shall establish this in the lattice case and with a modification in the spread-out case.
According to (9.16a) in Theorem 9.3, we have
$$P(S_0 \le x \mid \Gamma > t) \ge G(x), \quad t \in [0, \infty), \ x \in [0, t], \tag{9.20}$$
where $G$ is the nondecreasing right-continuous function defined at its continuity points $x \in [0, \infty)$ by
$$G(x) = 1 - \Big(a E[e^{\alpha S_0} 1_{\{\Gamma \ge S_0 > x\}}] + a \sup_{t \in [x, \infty)} e^{\alpha t} P(S_0 > \Gamma > t)\Big) \wedge 1.$$
Due to the conditions (9.18b) and (9.18d), $G(x) \to 1$ as $x \to \infty$, and thus $G$ is a distribution function. Since the distribution function of $S_0^{(-t)} + t$ is $P(S_0 \le \cdot \mid \Gamma > t)$, we obtain the condition (8.23) from (9.20).
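According to (9.16b) in Theorem 9.3, we have
$$P^\circ(X_1 \le x \mid \Gamma^\circ > t) \ge F(x), \quad t \in [0, \infty), \ x \in [0, t], \tag{9.21}$$
where $F$ is the nondecreasing right-continuous function defined at its continuity points $x \in [0, \infty)$ by
$$F(x) = 1 - \Big(b E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1 > x\}}] + b \sup_{t \in [x, \infty)} e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t)\Big) \wedge 1.$$
Since
$$\int_0^\infty E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1 > x\}}] \, dx = E^\circ\Big[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}} \int_0^\infty 1_{\{X_1 > x\}} \, dx\Big] = E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}],$$
we obtain
$$\int x F(dx) = \int_0^\infty (1 - F(x)) \, dx \quad \text{[see Lemma 5.2]}$$
$$\le b E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] + b \int_0^\infty \sup_{t \in [x, \infty)} e^{\alpha t} P^\circ(X_1 > \Gamma^\circ > t) \, dx.$$
Thus, due to the conditions (9.18c) and (9.18e), $\int x F(dx) < \infty$. For $t \in [0, \infty)$, the recurrence distribution of the family $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, at the time $-t$ is $P^\circ(X_1 \le \cdot \mid \Gamma^\circ > t)$, and thus we obtain the condition (8.26) from (9.21).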
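Now suppose $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is lattice with span $d$ and (9.19) holds. Let $m$ be such that $P^\circ(X_1 \in \cdot \cap [0, md], \Gamma^\circ \ge X_1)$ is aperiodic. Take $n \ge m$ and $1 \le k \le m$, and use taboo regeneration to obtain the second equality in
$$\begin{aligned}
P^\circ(X_1 = kd \mid \Gamma^\circ > nd) &= P^\circ(X_1 = kd, \Gamma^\circ > nd) / P^\circ(\Gamma^\circ > nd) \\
&= P^\circ(X_1 = kd, \Gamma^\circ \ge X_1) P^\circ(\Gamma^\circ > (n - k)d) / P^\circ(\Gamma^\circ > nd) \\
&\ge P^\circ(X_1 = kd, \Gamma^\circ \ge X_1), \quad k = 1, \dots, m.
\end{aligned}$$
This yields the condition (8.25), and the theorem is established in the lattice case.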
In the spread-out case there is an integer $n$ and a subprobability density $f$ on $[0, \infty)$ such that $\int f > 0$ and
$$P^\circ(S_n^\circ \in B, \Gamma^\circ \ge S_n^\circ) \ge \int_B f, \quad B \in \mathcal{B}[0, \infty).$$
Let $c$ be such that $\int_0^c f > 0$. Take $t \ge c$ and $B \in \mathcal{B}[0, c]$. Use taboo regeneration and $t - S_n^\circ \le t$ to obtain
$$P^\circ(S_n^\circ \in B, \Gamma^\circ \ge S_n^\circ, \Gamma^\circ > t) \ge P^\circ(S_n^\circ \in B, \Gamma^\circ \ge S_n^\circ) P^\circ(\Gamma^\circ > t).$$
Dividing by $P^\circ(\Gamma^\circ > t)$ yields
$$P^\circ(S_n^\circ \in B, \Gamma^\circ \ge S_n^\circ \mid \Gamma^\circ > t) \ge \int_B f, \quad B \in \mathcal{B}[0, c].$$
Thus condition (8.24) holds with $X_1^{(r)}$ replaced by $X_1^{(r)} + \dots + X_n^{(r)}$. Since condition (8.26) holds, it also holds with $X_1^{(r)}$ replaced by $X_1^{(r)} + \dots + X_n^{(r)}$ and $F$ replaced by the $n$th convolution power of $F$ (see the second step of the proof of Theorem 7.1). Thus, in the spread-out case, the conditions of Theorem 8.5 hold with the family $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, replaced by the family $(Z^{(r)}, (S_{nk}^{(r)})_{k=0}^\infty)$, $r \in (-\infty, 0]$, which is also taboo regenerative up to time zero. The limit result now follows from Theorem 8.5. □
9.10 Comments on Uniformity, Rates, and Coupling
The reader may have noted that Theorem 9.4 does not use the full power of
Theorem 8.5. The uniformity results and the associated rate and coupling
results are left out. We leave this out to focus better on the taboo limit
phenomenon itself. Here are only a few comments.
Rate and coupling results follow easily from Theorem 8.5 using the
distribution functions G and F introduced in the proof of Theorem 9.4. We
obtain the moment conditions for G and F by placing moment conditions
on the ingredients in their definition.
For the uniformity results the common density and mass function part
is easy. However, some care must be taken with the constants a and b in
Theorem 9.3. They must be traced through the proof of Theorem 9.2 (using
the uniform convergence in Theorem 3.3 rather than using Theorems 10.1
and 10.2 from Chapter 2).
The reader may also have noted that we left out the statements on
existence of a two-sided limit process. We did this because the issue will be
considered in great detail in the next section, where we shall establish the
explicit structure of the two-sided limit process.
10 Taboo Stationarity
In this section we shall consider the taboo counterpart of stationarity.
Stationarity means that the distribution of a process does not change by nonrandom
time shifts. This is the characterizing property of any two-sided
limit process obtained by shifting the time origin of a one-sided process to
the far future. Similarly, taboo stationarity means that the distribution of a
process does not change by nonrandom time shifts under taboo. This is the
characterizing property of any two-sided limit process obtained by shifting
the origin of a one-sided process to the far future under taboo up to the
new time origin.
We begin by defining taboo stationarity for general two-sided processes
and show that it is the characterizing property of a taboo limit
(Theorem 10.1). We then establish a basic but amazingly simple structural
characterization of taboo stationary processes (independent-exponential-shift-
to-the-past, Theorem 10.2).
After this we return to taboo regeneration and explicitly construct a
taboo stationary version of a taboo regenerative process (Theorems 10.3
through 10.6). This is the taboo counterpart of the construction of a
stationary process in Chapter 8. We finally show (Theorem 10.7) that this
taboo stationary version is indeed the limit process in Theorem 9.4 above.
10.1 Taboo Stationary Stochastic Processes — Definition
Consider a pair $(Z^*, \Gamma^*)$, where $\Gamma^*$ is a nonnegative finite random time and
$$Z^* = (Z^*_s)_{s \in \mathbb{R}}$$
is a two-sided shift-measurable stochastic process. Recall that $\theta_t$ in this chapter (see Section 2 and Section 9.1 for formal details) denotes the one-sided shift-map:
$$\theta_t Z^* = (Z^*_{t+s})_{s \in [0, \infty)}, \quad t \in \mathbb{R}, \quad \text{(one-sided shift).}$$
Let $\vartheta_t$ denote the two-sided shift-map:
$$\vartheta_t Z^* = (Z^*_{t+s})_{s \in \mathbb{R}}, \quad t \in \mathbb{R}, \quad \text{(two-sided shift).}$$
Call $Z^*$ taboo stationary with taboo time $\Gamma^*$ if shift under taboo does not change the distribution of the pair $(Z^*, \Gamma^*)$, that is, if
$$P((\vartheta_t Z^*, \Gamma^* - t) \in \cdot \mid \Gamma^* > t) = P((Z^*, \Gamma^*) \in \cdot), \quad t \in [0, \infty). \tag{10.1}$$
Call $(Z^*, \Gamma^*)$ taboo stationary if this holds. Note that if we shift the origin back, then (10.1) yields $P((Z^*, \Gamma^*) \in \cdot \mid \Gamma^* > t) = P((\vartheta_{-t} Z^*, \Gamma^* + t) \in \cdot)$ for $t \in [0, \infty)$. Thus (10.1) is equivalent to the following condition:
$$P((\vartheta_t Z^*, \Gamma^* - t) \in \cdot \mid \Gamma^* > t) = P((Z^*, \Gamma^*) \in \cdot \mid \Gamma^* > -t), \quad t \in \mathbb{R}.$$
We shall now show that taboo stationarity is the characterizing property
of a total variation taboo limit.
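Theorem 10.1. A pair $(Z^*, \Gamma^*)$ is taboo stationary if and only if there is a pair $(Z, \Gamma)$, where $Z = (Z_s)_{s \in [0, \infty)}$ is a one-sided shift-measurable process and $\Gamma$ is a nonnegative finite random time, such that
$$P((\theta_{t-h} Z, \Gamma - t) \in \cdot \mid \Gamma > t) \overset{tv}{\longrightarrow} P((\theta_{-h} Z^*, \Gamma^*) \in \cdot), \quad t \to \infty, \tag{10.2}$$
for all $h \in [0, \infty)$.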
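Proof. If (10.1) holds, then so does (10.2) with $(Z, \Gamma) := (\theta_0 Z^*, \Gamma^*)$. In
order to establish the converse [that (10.2) implies (10.1)], assume that
(10.2) holds. Take $x \in [0,\infty)$ and $h \in [x,\infty)$ and note that (10.2) implies
[with $h$ replaced by $h - x$] that
$$P((\theta_{t-(h-x)} Z, \Gamma - t - x) \in \cdot,\ \Gamma - t > x \mid \Gamma > t) \xrightarrow{tv} P((\theta_{-(h-x)} Z^*, \Gamma^* - x) \in \cdot,\ \Gamma^* > x), \quad t \to \infty.$$
Divide by $P(\Gamma - t > x \mid \Gamma > t)$ on the left and by the limit $P(\Gamma^* > x)$ on the
right [and note that $\theta_{t-(h-x)} Z = \theta_{(t+x)-h} Z$ and $\theta_{-(h-x)} Z^* = \theta_x \theta_{-h} Z^*$] to
obtain that as $t \to \infty$,
$$P((\theta_{(t+x)-h} Z, \Gamma - t - x) \in \cdot \mid \Gamma > t + x) \xrightarrow{tv} P((\theta_x \theta_{-h} Z^*, \Gamma^* - x) \in \cdot \mid \Gamma^* > x).$$
According to (10.2), the left-hand side tends also to $P((\theta_{-h} Z^*, \Gamma^*) \in \cdot)$.
Since the two limits must be identical, we have [replace $x$ by $t$] that
$$P((\theta_t \theta_{-h} Z^*, \Gamma^* - t) \in \cdot \mid \Gamma^* > t) = P((\theta_{-h} Z^*, \Gamma^*) \in \cdot), \quad 0 \le t \le h.$$
Since $h$ is arbitrary, this yields (10.1). □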
10.2 Basic Structural Characterization
According to the following theorem, taboo stationarity is characterized by
an independent-exponential-shift-to-the-past. That is, if $Z'$ is some two-sided
shift-measurable process and $V$ is exponential and independent of
$Z'$, then
$$(\theta_{-V} Z', V) \text{ is always taboo stationary;}$$
and conversely, all taboo stationary processes are of this form.
Theorem 10.2. The pair $(Z^*, \Gamma^*)$ is taboo stationary if and only if $\Gamma^*$ is
exponential and independent of $\theta_{\Gamma^*} Z^*$.
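Proof. Suppose $(Z^*, \Gamma^*)$ is taboo stationary. From (10.1) we obtain
$$P(\Gamma^* - t \in \cdot \mid \Gamma^* > t) = P(\Gamma^* \in \cdot), \quad t \in [0,\infty),$$
which is the standard characterization of exponentiality. Moreover,
$$\theta_{\Gamma^*} Z^* = \theta_{\Gamma^* - t} \theta_t Z^*, \quad t \in [0,\infty),$$
that is, $\theta_{\Gamma^*} Z^*$ is the same measurable mapping of $(\theta_t Z^*, \Gamma^* - t)$ for all
$t \in [0,\infty)$. This together with (10.1) yields
$$P(\theta_{\Gamma^*} Z^* \in \cdot \mid \Gamma^* > t) = P(\theta_{\Gamma^*} Z^* \in \cdot), \quad t \in [0,\infty).$$
Multiply by $P(\Gamma^* > t)$ to obtain
$$P(\theta_{\Gamma^*} Z^* \in \cdot,\ \Gamma^* > t) = P(\theta_{\Gamma^*} Z^* \in \cdot)\, P(\Gamma^* > t), \quad t \in [0,\infty),$$
that is, $\theta_{\Gamma^*} Z^*$ and $\Gamma^*$ are independent.
Conversely, suppose $\Gamma^*$ is exponential and independent of $\theta_{\Gamma^*} Z^*$. Since
$\Gamma^*$ is exponential, we have, for all $z$ in the path set $H$,
$$P((\theta_{t-\Gamma^*} z, \Gamma^* - t) \in \cdot \mid \Gamma^* > t) = P((\theta_{-\Gamma^*} z, \Gamma^*) \in \cdot), \quad t \in [0,\infty).$$
Since $\Gamma^*$ and $\theta_{\Gamma^*} Z^*$ are independent, we may replace $z$ by $\theta_{\Gamma^*} Z^*$ to obtain
[since $\theta_{t-\Gamma^*} \theta_{\Gamma^*} Z^* = \theta_t Z^*$ and $\theta_{-\Gamma^*} \theta_{\Gamma^*} Z^* = Z^*$]
$$P((\theta_t Z^*, \Gamma^* - t) \in \cdot \mid \Gamma^* > t) = P((Z^*, \Gamma^*) \in \cdot), \quad t \in [0,\infty),$$
that is, $(Z^*, \Gamma^*)$ is taboo stationary. □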
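The "if" direction of Theorem 10.2 can be checked numerically. The following is a minimal Monte Carlo sketch, not part of the original text: it assumes a hypothetical deterministic path $Z'_s = \sin s$ (trivially independent of $V$), an illustrative rate `alpha`, and an illustrative shift `t`. It compares the time-zero value and the taboo time of $(\theta_{-V} Z', V)$ with those of the pair shifted by $t$ under the taboo event $\{V > t\}$.

```python
import math
import random


def taboo_shift_means(alpha=1.5, t=0.7, n=200_000, seed=0):
    """Compare (Z*_0, Gamma*) with the pair shifted by t under taboo {Gamma* > t}."""
    rng = random.Random(seed)
    z_prime = math.sin  # hypothetical deterministic two-sided path Z'_s = sin(s)
    plain, shifted = [], []
    for _ in range(n):
        v = rng.expovariate(alpha)          # V ~ exponential(alpha), independent of Z'
        plain.append((z_prime(-v), v))      # time-0 value of theta_{-V} Z', and V
        if v > t:                           # taboo not yet broken at time t
            shifted.append((z_prime(t - v), v - t))

    def mean(pairs, i):
        return sum(p[i] for p in pairs) / len(pairs)

    return {
        "z0": (mean(plain, 0), mean(shifted, 0)),
        "gamma": (mean(plain, 1), mean(shifted, 1)),
    }


result = taboo_shift_means()
```

Under these assumptions both entries of `result["z0"]` should be close to $-\alpha/(1+\alpha^2)$ and both entries of `result["gamma"]` close to $1/\alpha$; agreement of two moments is of course only a sanity check, not a proof of equality in distribution.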
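Remark 10.1. In the above we could consider $Z^*$ jointly with a nondecreasing
sequence of $(-\infty, \infty]$ valued random times $S^* = (S^*_k)_{-\infty}^{\infty}$
satisfying, for all $n \ge 0$,
$$-\infty \leftarrow \cdots < S^*_{-2} < S^*_{-1} < 0 \le S^*_0 < \cdots < S^*_n \quad \text{on } \{S^*_n < \infty\}.$$
For $t \in \mathbb{R}$, let $\theta_t$ denote the following two-sided joint shift-maps:
$$\theta_t(Z^*, S^*) = (\theta_t Z^*, (S^*_{N^*_{t-}+k} - t)_{-\infty}^{\infty}) \quad \text{where } N^*_{t-} = \inf\{k : S^*_k \ge t\},$$
$$\theta_t(Z^*, S^*, \Gamma^*) = (\theta_t(Z^*, S^*), \Gamma^* - t).$$
The triple $(Z^*, S^*, \Gamma^*)$ is taboo stationary if for $t \in [0,\infty)$,
$$P(\theta_t(Z^*, S^*, \Gamma^*) \in \cdot \mid \Gamma^* > t) = P((Z^*, S^*, \Gamma^*) \in \cdot).$$
Both Theorem 10.1 and Theorem 10.2 hold with $Z^*$ replaced by $(Z^*, S^*)$.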
10.3 Back to Taboo Regeneration — Intuitive Motivation
We now return to the topic of the last section, taboo regeneration. The
task for the rest of this section is to construct a taboo stationary version
$(Z^*, S^*, \Gamma^*)$ of a taboo regenerative $(Z, S, \Gamma)$. Here is an attempt at
motivating this construction intuitively in the proper taboo regenerative case,
that is, not in the wide-sense case but in the case when $(Z, S, \Gamma)$ satisfies
(9.1); see Figure 10.1.
Think of $(Z^*, S^*, \Gamma^*)$ as a taboo limit of $(Z, S, \Gamma)$. Then the following
guesses seem reasonable. The cycles of $(Z^*, S^*)$ coming in from the past,
$\ldots, C^*_{-2}, C^*_{-1}$, should be i.i.d. and independent of the cycle $C^*_0$ straddling
zero. Moreover, conditionally on $\{S^*_0 < \infty\}$, the cycles $\ldots, C^*_{-2}, C^*_{-1}, C^*_0$
should be independent of the future $\theta_{S^*_0}(Z^*, S^*, \Gamma^*)$, which should behave
as the zero-delayed version of $(Z, S, \Gamma)$.
In order to have a complete description of $(Z^*, S^*, \Gamma^*)$ there are still
three guesses missing: the distribution of $C^*_{-1}$, the distribution of $C^*_0$, and
the position of zero in the cycle $C^*_0$. But we shall not proceed further along
this path [for the complete description of $(Z^*, S^*, \Gamma^*)$, see the comment
following Theorem 10.4] because it turns out to be easier to consider another
triple $(\hat Z, \hat S, \hat\Gamma)$ defined as follows:
$$(\hat Z, \hat S, \hat\Gamma) = \theta_{R^*}(Z^*, S^*, \Gamma^*) \quad \text{(see Figure 10.1)},$$
where $R^*$ is the last $S^*_k$ in $(-\infty, \Gamma^*]$, that is,
$$R^* := \sup\{S^*_k : k \in \mathbb{Z} \text{ and } S^*_k \le \Gamma^*\}.$$
In the light of Theorem 10.2, $\Gamma^*$ should be exponential and independent
of $\theta_{\Gamma^*}(Z^*, S^*)$. But $\theta_{\Gamma^*}(Z^*, S^*) = \theta_{\hat\Gamma}(\hat Z, \hat S)$, and $-\hat\Gamma$ is the initial point
of the $\theta_{\Gamma^*}(Z^*, S^*)$-cycle straddling zero. Thus $(\hat Z, \hat S, \hat\Gamma)$ is a measurable
mapping of $\theta_{\Gamma^*}(Z^*, S^*)$ and should therefore be independent of $\Gamma^*$. Thus
we should obtain $(Z^*, S^*)$ as follows:
$$(Z^*, S^*) = \theta_{-\Gamma^*} \theta_{\hat\Gamma}(\hat Z, \hat S)$$
where
$$\Gamma^* \text{ is exponential and independent of } (\hat Z, \hat S, \hat\Gamma).$$
Now let us guess at the structure of $(\hat Z, \hat S, \hat\Gamma)$. We have already guessed that
the cycles of $(Z^*, S^*)$ are i.i.d. before time zero. Since $(Z^*, S^*)$ is obtained
by shifting the origin of $\theta_{\hat\Gamma}(\hat Z, \hat S)$ independently to the past, this suggests
that the same should hold for $\theta_{\hat\Gamma}(\hat Z, \hat S)$, that is,
$$\ldots, \hat C_{-1}, \hat C_0 \text{ should be i.i.d. copies of } C^*_{-1}.$$
The naive guess would be that $C^*_{-1}$ is like the first cycle $C_1$ of the zero-delayed
version, conditioned on the event $\{\Gamma^\circ \ge X_1\}$. But this is not the
case. So now let us give up guessing and simply state the upcoming results
(in the proper taboo regenerative case).
FIGURE 10.1. The structure of the taboo stationary version $(Z^*, S^*, \Gamma^*)$. [Panels: a realization of a classical regenerative $(Z, S)$ under taboo in $[0, \Gamma]$; a realization of $(\hat Z, \hat S, \hat\Gamma)$; the taboo stationary $(Z^*, S^*, \Gamma^*)$, where $\Gamma^*$ is exponential $\alpha$ and independent of $(\hat Z, \hat S, \hat\Gamma)$.]
It turns out that an exponential biasing of the cycle-length $X_1$ (and
not conditioning on $\{\Gamma^\circ \ge X_1\}$) is the appropriate way to change the
subprobability distribution $P^\circ(C_1 \in \cdot,\ \Gamma^\circ \ge X_1)$ into the probability
distribution of the i.i.d. cycles $\ldots, \hat C_{-1}, \hat C_0$. Further, it turns out that these
cycles should be independent of the future, $\theta_0(\hat Z, \hat S, \hat\Gamma)$. Finally, it turns out
that an exponential biasing of the taboo time $\Gamma^\circ$ is the appropriate way to
change the subprobability distribution $P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot,\ \Gamma^\circ < X_1)$ into
the probability distribution of $\theta_0(\hat Z, \hat S, \hat\Gamma)$.
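10.4 Construction of $(\hat Z, \hat S, \hat\Gamma)$
We shall now turn to the construction of $(\hat Z, \hat S, \hat\Gamma)$ motivated intuitively
above.
Theorem 10.3. (a) Let $(Z, S, \Gamma)$ be taboo regenerative, that is, let it
satisfy (9.1). Suppose
$$\text{there is an } \alpha > 0 \text{ such that } E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1 \text{ and } E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] < \infty. \tag{10.3}$$
Define probability measures $P^\circ_0, P^\circ_1, \ldots$ on $(\Omega, \mathcal{F})$ by
$$dP^\circ_n := \frac{e^{\alpha \Gamma^\circ} 1_{\{S^\circ_n \le \Gamma^\circ < S^\circ_{n+1}\}}}{E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]}\, dP^\circ, \quad n \ge 0.$$
Then there exists a two-sided process $\hat Z = (\hat Z_s)_{s \in \mathbb{R}}$, a nondecreasing
two-sided sequence of $(-\infty, \infty]$ valued random times $\hat S = (\hat S_k)_{-\infty}^{\infty}$
satisfying, for each $n \ge 1$,
$$-\infty \leftarrow \cdots < \hat S_{-2} < \hat S_{-1} < 0 = \hat S_0 < \cdots < \hat S_n \quad \text{on } \{\hat S_n < \infty\},$$
and a finite nonnegative random time $\hat\Gamma$ such that
$$P(\theta_{\hat S_{-n}}(\hat Z, \hat S, \hat\Gamma) \in \cdot) = P^\circ_n((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot), \quad n \ge 0. \tag{10.4}$$
Moreover, the cycles of $(\hat Z, \hat S)$ up to time zero,
$$\hat C_k := (\hat Z_{\hat S_{k-1}+s})_{s \in [0, \hat X_k)}, \quad k \le 0, \quad (\text{here } \hat X_k = \hat S_k - \hat S_{k-1})$$
are i.i.d. with distribution $P^\circ_{\mathrm{taboo}}(C_1 \in \cdot)$, where $P^\circ_{\mathrm{taboo}}$ is the
probability measure on $(\Omega, \mathcal{F})$ defined (as in Section 9.6) by
$$dP^\circ_{\mathrm{taboo}} := e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}\, dP^\circ.$$
Finally, these cycles are independent of $\theta_0(\hat Z, \hat S, \hat\Gamma)$, which has the
distribution $P^\circ_0((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot)$.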
(b) Let $(Z, S, \Gamma)$ be taboo wide-sense regenerative. Suppose (10.3) holds.
If $Z$ has a Polish state space and right-continuous paths with left-hand
limits, then there exists a triple $(\hat Z, \hat S, \hat\Gamma)$ with the above properties
except that the cycles $\ldots, \hat C_{-1}, \hat C_0$ need neither be i.i.d. nor independent
of $\theta_0(\hat Z, \hat S, \hat\Gamma)$. However, the cycle-lengths $\ldots, \hat X_{-1}, \hat X_0$ are still i.i.d.
and independent of $\theta_0(\hat Z, \hat S, \hat\Gamma)$. Their distribution is $P^\circ_{\mathrm{taboo}}(X_1 \in \cdot)$.
Comment. The triple $(\hat Z, \hat S, \hat\Gamma)$ is obtained as follows: first string out
tabooed cycles in $(-\infty, 0)$ and break the taboo in the first cycle in $[0, \infty)$;
then bias exponentially the cycle-lengths in $(-\infty, 0)$ and the taboo time in
$[0, \infty)$. This can be seen (informally) as an exponential biasing of the total
(infinite) taboo time from $-\infty$ to $\hat\Gamma$.
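To make condition (10.3) concrete, here is a small sketch (not from the original text) that solves $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$ by bisection. It assumes a hypothetical toy model in which $X_1$ is exponential with rate `lam` and $\Gamma^\circ$ is an independent exponential with rate `mu`; that this toy model fits the taboo regenerative framework of (9.1) is itself an assumption. In that model the two moments in (10.3) have the closed forms $\lambda/(\lambda+\mu-a)$ and $\mu/(\lambda+\mu-a)$ for $a < \lambda+\mu$, so the root is $\alpha = \mu$.

```python
def solve_alpha(moment, lo, hi, tol=1e-12):
    """Bisection for moment(alpha) = 1, assuming moment is increasing on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if moment(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


lam, mu = 2.0, 0.5  # hypothetical rates of X_1 and of the taboo time


def cycle_moment(a):
    """E[exp(a X_1) 1{Gamma >= X_1}] in the toy model, valid for a < lam + mu."""
    return lam / (lam + mu - a)


def taboo_moment(a):
    """E[exp(a Gamma) 1{Gamma < X_1}] in the toy model, valid for a < lam + mu."""
    return mu / (lam + mu - a)


alpha = solve_alpha(cycle_moment, 0.0, lam + mu - 1e-9)
normalizer = taboo_moment(alpha)  # finite, as the second part of (10.3) requires
```

In a model without closed forms, `cycle_moment` could be replaced by an empirical average over simulated cycles; the bisection itself is unchanged.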
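Proof. (a) The $P^\circ_n$ are a special case of the probability measures $P_n$
from Section 9.6 [with $P$ replaced by $P^\circ$ and $(Z, S, \Gamma)$ by $(Z^\circ, S^\circ, \Gamma^\circ)$]. Let
$(Y, R, \hat\Gamma)$ be a triple with distribution $P^\circ_0((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot)$. According
to Fact 3.1 in Chapter 3, we may assume the existence of i.i.d. cycles
$\ldots, \hat C_{-1}, \hat C_0$ that are independent of $(Y, R, \hat\Gamma)$ and have the distribution
$P^\circ_{\mathrm{taboo}}(C_1 \in \cdot)$. Define $(\hat Z, \hat S)$ by putting
$$\theta_0(\hat Z, \hat S) := (Y, R) \quad [\text{thus } \hat S_0 = 0]$$
and by letting $\ldots, \hat C_{-1}, \hat C_0$ be the cycles of $(\hat Z, \hat S)$ up to time $\hat S_0 = 0$.
In order to complete the proof of (a), it only remains to establish (10.4).
The same argument as in the proof of Lemma 9.1 [leave out $S_0$, and replace
the cycle lengths $X_k$ by the cycles $C_k$, $\Gamma - S_n$ by $\theta_{S_n}(Z, S, \Gamma)$, and $\Gamma - S_0$
by $(Z^\circ, S^\circ, \Gamma^\circ)$] shows that taboo regeneration implies [instead of (9.10a),
(9.10b), and (9.10c)] that for $n \ge 1$,
$$\text{under } P^\circ_n,\ C_1, \ldots, C_n \text{ are i.i.d. and independent of } \theta_{S_n}(Z, S, \Gamma),$$
$$P^\circ_n(C_k \in \cdot) = P^\circ_{\mathrm{taboo}}(C_1 \in \cdot), \quad 1 \le k \le n,$$
$$P^\circ_n(\theta_{S_n}(Z, S, \Gamma) \in \cdot) = P^\circ_0((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot),$$
that is,
$$P^\circ_n(C_1 \in \cdot, \ldots, C_n \in \cdot,\ \theta_{S_n}(Z, S, \Gamma) \in \cdot) = P^\circ_{\mathrm{taboo}}(C_1 \in \cdot) \cdots P^\circ_{\mathrm{taboo}}(C_1 \in \cdot)\, P^\circ_0((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot).$$
Due to the definition of $(\hat Z, \hat S, \hat\Gamma)$, this implies that
$$P(\hat C_{-n+1} \in \cdot, \ldots, \hat C_0 \in \cdot,\ \theta_0(\hat Z, \hat S, \hat\Gamma) \in \cdot) = P^\circ_n(C_1 \in \cdot, \ldots, C_n \in \cdot,\ \theta_{S_n}(Z, S, \Gamma) \in \cdot), \quad n \ge 0,$$
which is a reformulation of (10.4).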
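(b) Again the $P^\circ_n$ are a special case of the probability measures $P_n$ from
Section 9.6 [with $P$ replaced by $P^\circ$ and $(Z, S, \Gamma)$ by $(Z^\circ, S^\circ, \Gamma^\circ)$]. For
bounded $f \in \mathcal{H} \otimes \mathcal{L} \otimes \mathcal{B}[0, \infty)$ and $n \ge 0$, we have
$$e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n+1} \le \Gamma^\circ < S^\circ_{n+2}\}} f(\theta_{X_1}(Z^\circ, S^\circ, \Gamma^\circ)) = \big(e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}\big)\big(e^{\alpha(\Gamma^\circ - X_1)} 1_{\{S^\circ_{n+1} \le \Gamma^\circ < S^\circ_{n+2}\}} f(\theta_{X_1}(Z^\circ, S^\circ, \Gamma^\circ))\big),$$
and thus, due to taboo wide-sense regeneration,
$$E^\circ\big[e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n+1} \le \Gamma^\circ < S^\circ_{n+2}\}} f(\theta_{X_1}(Z^\circ, S^\circ, \Gamma^\circ)) \,\big|\, e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}\big] = e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}\, E^\circ\big[e^{\alpha \Gamma^\circ} 1_{\{S^\circ_n \le \Gamma^\circ < S^\circ_{n+1}\}} f(Z^\circ, S^\circ, \Gamma^\circ)\big].$$
Take expectation and use $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$ and the definition of $P^\circ_{n+1}$
and $P^\circ_n$ to obtain
$$P^\circ_{n+1}(\theta_{X_1}(Z^\circ, S^\circ, \Gamma^\circ) \in \cdot) = P^\circ_n((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot), \quad n \ge 0. \tag{10.5}$$
The same argument as in the proof of Lemma 9.1 [leave out $S_0$, and replace
$\Gamma - S_n$ by $\theta_{S_n}(Z, S, \Gamma)$ and $\Gamma - S_0$ by $(Z^\circ, S^\circ, \Gamma^\circ)$] shows that taboo
wide-sense regeneration implies [instead of (9.10a), (9.10b), and (9.10c)] that for
$n \ge 1$,
$$\text{under } P^\circ_n,\ X_1, \ldots, X_n \text{ are i.i.d. and independent of } \theta_{S_n}(Z, S, \Gamma), \tag{10.6a}$$
$$P^\circ_n(X_k \in \cdot) = P^\circ_{\mathrm{taboo}}(X_1 \in \cdot), \quad 1 \le k \le n, \tag{10.6b}$$
$$P^\circ_n(\theta_{S_n}(Z, S, \Gamma) \in \cdot) = P^\circ_0((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot). \tag{10.6c}$$
Define the distribution of $(\hat Z, \hat S, \hat\Gamma)$ recursively by (10.4). Due to (10.5),
this definition is consistent, and thus the Kolmogorov extension theorem
(Fact 3.2 in Chapter 3) yields the existence of such a random element
$(\hat Z, \hat S, \hat\Gamma)$. In order to see that the infinite sequence of cycles $\ldots, \hat C_{-1}, \hat C_0$
stretches backward to $-\infty$, note that according to (10.6a), the cycle-lengths
$\ldots, \hat X_{-1}, \hat X_0$ are i.i.d., and thus their sum is infinite. Thus there exists a
triple $(\hat Z, \hat S, \hat\Gamma)$ satisfying (10.4). Now (10.6a), (10.6b), and (10.6c) yield
what remains of (b). □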
10.5 The Taboo Stationary $(Z^*, S^*, \Gamma^*)$
Let $(\hat Z, \hat S, \hat\Gamma)$ be as in Theorem 10.3,
let $\Gamma^*$ be exponential with parameter $\alpha$,
let $\Gamma^*$ be independent of $(\hat Z, \hat S, \hat\Gamma)$, (10.7a)
and put
$$(Z^*, S^*) := \theta_{-\Gamma^*} \theta_{\hat\Gamma}(\hat Z, \hat S). \tag{10.7b}$$
Then $(Z^*, S^*, \Gamma^*)$ is taboo stationary according to Theorem 10.2 (see
Remark 10.1).
In the next subsection we shall motivate calling $(Z^*, S^*, \Gamma^*)$ a version of
$(Z, S, \Gamma)$. But first we establish the following structure of $(Z^*, S^*, \Gamma^*)$.
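Theorem 10.4. Suppose the conditions of Theorem 10.3 hold. Let the
triple $(Z^*, S^*, \Gamma^*)$ be as above and, in addition, let the exponential $\Gamma^*$ be
independent of $(Z, S, \Gamma)$. Then, for $n \ge 1$,
$$P(\theta_{S^*_{-n}}(Z^*, S^*, \Gamma^*) \in \cdot,\ S^*_0 \wedge \Gamma^* \in \cdot) = P^*_n((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot,\ (\Gamma^* \bmod X_n \wedge (\Gamma^\circ - S^\circ_{n-1})) \in \cdot),$$
where $P^*_n$ is the probability measure on $(\Omega, \mathcal{F})$ defined by
$$dP^*_n := \frac{e^{\alpha S^\circ_{n-1}} \big(e^{\alpha (X_n \wedge (\Gamma^\circ - S^\circ_{n-1}))} - 1\big) 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}}{E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]}\, dP^\circ.$$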
Comment. From this we can read that the independent-exponential-shift-backward-from-$\hat\Gamma$
works as acceptance-rejection (see Section 6 in
Chapter 8): it gives the distribution $P^*_1(C_1 \in \cdot)$ to the 'accepted' cycle $C^*_0$ that
straddles the new origin, and it unbiases the 'rejected' cycles of $(\hat Z, \hat S)$, the
cycles that end up in $[0, \infty)$ and do not straddle the new origin. It also
unbiases the taboo time $\hat\Gamma$ or the part of $\hat\Gamma$ that becomes positive by the
shift. Finally, this shift places the origin at a truncated exponential
distance from the right endpoint of the cycle where it happens to fall, or to
the right of the taboo time $\hat\Gamma$ if $\hat\Gamma$ happens to be in that cycle.
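Proof. Fix $n \ge 1$. We shall first show that $P^*_n$ is a probability measure.
Due to taboo wide-sense regeneration,
$$E^\circ\big[e^{\alpha S^\circ_{n-1}} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}} \,\big|\, e^{\alpha S^\circ_{n-2}} 1_{\{S^\circ_{n-2} \le \Gamma^\circ\}}\big] = e^{\alpha S^\circ_{n-2}} 1_{\{S^\circ_{n-2} \le \Gamma^\circ\}}\, E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}].$$
By assumption, $E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1$, and thus, recursively, we obtain
$$E^\circ[e^{\alpha S^\circ_{n-1}} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}] = 1.$$
Again due to taboo wide-sense regeneration,
$$E^\circ\big[e^{\alpha S^\circ_{n-1}} \big(e^{\alpha((S^\circ_n \wedge \Gamma^\circ) - S^\circ_{n-1})} - 1\big) 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}} \,\big|\, e^{\alpha S^\circ_{n-1}} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}\big] = e^{\alpha S^\circ_{n-1}} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}\, E^\circ[e^{\alpha(X_1 \wedge \Gamma^\circ)} - 1].$$
Take expectation and apply $E^\circ[e^{\alpha S^\circ_{n-1}} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}] = 1$ to obtain the first
identity in
$$E^\circ\big[e^{\alpha S^\circ_{n-1}} \big(e^{\alpha((S^\circ_n \wedge \Gamma^\circ) - S^\circ_{n-1})} - 1\big) 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}\big] = E^\circ[e^{\alpha(X_1 \wedge \Gamma^\circ)} - 1] = E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}] \quad (\text{since } E^\circ[e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] = 1).$$
Thus $P^*_n$ is a probability measure.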
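Now take bounded $f \in \mathcal{H} \otimes \mathcal{L} \otimes \mathcal{B}[0,\infty)$ and $g \in \mathcal{B}[0, \infty)$. Then, due to (10.7b)
and the fact that $\hat\Gamma < \hat S_1$, we have
$$E[f(\theta_{S^*_{-n}}(Z^*, S^*, \Gamma^*))\, g(S^*_0 \wedge \Gamma^*)] = \sum_{k=-1}^{\infty} E\big[f(\theta_{\hat S_{-k-n}}(\hat Z, \hat S, \hat\Gamma))\, g(\Gamma^* - (\hat\Gamma - \hat S_{-k})^+)\, 1_{\{\hat S_{-k-1} < \hat\Gamma - \Gamma^* \le \hat S_{-k}\}}\big]. \tag{10.8}$$
Due to (10.4) in Theorem 10.3 [and since $\Gamma^*$ is exponential $\alpha$ and
independent of both $(\hat Z, \hat S, \hat\Gamma)$ and $(Z, S, \Gamma)$ under both $P$ and $P^\circ_{k+n}$; see Lemma 4.1
in Chapter 8] we have, for $k \ge -1$,
$$E\big[f(\theta_{\hat S_{-k-n}}(\hat Z, \hat S, \hat\Gamma))\, g(\Gamma^* - (\hat\Gamma - \hat S_{-k})^+)\, 1_{\{\hat S_{-k-1} < \hat\Gamma - \Gamma^* \le \hat S_{-k}\}}\big] = E^\circ_{k+n}\big[f(Z^\circ, S^\circ, \Gamma^\circ)\, g(\Gamma^* - (\Gamma^\circ - S^\circ_n)^+)\, 1_{\{S^\circ_{n-1} < \Gamma^\circ - \Gamma^* \le S^\circ_n\}}\big]. \tag{10.9}$$
Put
$$c := 1/E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}].$$
By the definition of $P^\circ_{k+n}$ in Theorem 10.3, for all bounded $W \in \mathcal{F}$,
$$E^\circ_{k+n}[W] = c\, E^\circ[W e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n+k} \le \Gamma^\circ < S^\circ_{n+k+1}\}}]$$
and thus
$$\sum_{k=-1}^{\infty} E^\circ_{k+n}[W] = c\, E^\circ[W e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}].$$
Combine this, (10.8), and (10.9) to obtain
$$E[f(\theta_{S^*_{-n}}(Z^*, S^*, \Gamma^*))\, g(S^*_0 \wedge \Gamma^*)] = c\, E^\circ\big[f(Z^\circ, S^\circ, \Gamma^\circ)\, g(\Gamma^* - (\Gamma^\circ - S^\circ_n)^+)\, 1_{\{S^\circ_{n-1} < \Gamma^\circ - \Gamma^* \le S^\circ_n\}}\, e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}\big]. \tag{10.10}$$
Since $\Gamma^*$ is exponential $\alpha$ and independent of $(Z^\circ, S^\circ, \Gamma^\circ)$ under $P^\circ$, we
have
$$E^\circ\big[g(\Gamma^* - (\Gamma^\circ - S^\circ_n)^+)\, 1_{\{\Gamma^\circ - S^\circ_n \le \Gamma^* < \Gamma^\circ - S^\circ_{n-1}\}} \,\big|\, (Z^\circ, S^\circ, \Gamma^\circ)\big] = E^\circ\big[g(\Gamma^* \bmod ((\Gamma^\circ - S^\circ_{n-1})^+ - (\Gamma^\circ - S^\circ_n)^+)) \,\big|\, (Z^\circ, S^\circ, \Gamma^\circ)\big] \big(e^{-\alpha(\Gamma^\circ - S^\circ_n)^+} - e^{-\alpha(\Gamma^\circ - S^\circ_{n-1})^+}\big). \tag{10.11}$$
Now
$$1_{\{\Gamma^\circ - S^\circ_n \le \Gamma^* < \Gamma^\circ - S^\circ_{n-1}\}} = 1_{\{S^\circ_{n-1} < \Gamma^\circ - \Gamma^* \le S^\circ_n\}},$$
$$(\Gamma^\circ - S^\circ_{n-1})^+ - (\Gamma^\circ - S^\circ_n)^+ = X_n \wedge (\Gamma^\circ - S^\circ_{n-1}),$$
$$e^{-\alpha(\Gamma^\circ - S^\circ_n)^+} - e^{-\alpha(\Gamma^\circ - S^\circ_{n-1})^+} = e^{\alpha S^\circ_{n-1}} \big(e^{\alpha(X_n \wedge (\Gamma^\circ - S^\circ_{n-1})^+)} - 1\big) e^{-\alpha \Gamma^\circ}.$$
Combine this and (10.11) to obtain
$$E^\circ\big[g(\Gamma^* - (\Gamma^\circ - S^\circ_n)^+)\, 1_{\{S^\circ_{n-1} < \Gamma^\circ - \Gamma^* \le S^\circ_n\}} \,\big|\, (Z^\circ, S^\circ, \Gamma^\circ)\big] = E^\circ\big[g(\Gamma^* \bmod X_n \wedge (\Gamma^\circ - S^\circ_{n-1})) \,\big|\, (Z^\circ, S^\circ, \Gamma^\circ)\big]\, e^{\alpha S^\circ_{n-1}} \big(e^{\alpha(X_n \wedge (\Gamma^\circ - S^\circ_{n-1})^+)} - 1\big) e^{-\alpha \Gamma^\circ}.$$
Multiply by $f(Z^\circ, S^\circ, \Gamma^\circ)$ and by $e^{\alpha \Gamma^\circ} 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}$, take expectation, and
compare with (10.10) to obtain
$$E[f(\theta_{S^*_{-n}}(Z^*, S^*, \Gamma^*))\, g(S^*_0 \wedge \Gamma^*)] = c\, E^\circ\big[f(Z^\circ, S^\circ, \Gamma^\circ)\, g(\Gamma^* \bmod X_n \wedge (\Gamma^\circ - S^\circ_{n-1}))\, e^{\alpha S^\circ_{n-1}} \big(e^{\alpha(X_n \wedge (\Gamma^\circ - S^\circ_{n-1})^+)} - 1\big) 1_{\{S^\circ_{n-1} \le \Gamma^\circ\}}\big].$$
By the definition of $P^*_n$, this yields
$$E[f(\theta_{S^*_{-n}}(Z^*, S^*, \Gamma^*))\, g(S^*_0 \wedge \Gamma^*)] = E^*_n[f(Z^\circ, S^\circ, \Gamma^\circ)\, g(\Gamma^* \bmod X_n \wedge (\Gamma^\circ - S^\circ_{n-1}))],$$
which is the desired result. □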
10.6 The Taboo Stationary $(Z^*, S^*, \Gamma^*)$ Is a Version of $(Z, S, \Gamma)$
We shall now establish two theorems that motivate calling the taboo
stationary $(Z^*, S^*, \Gamma^*)$ a version of the taboo regenerative $(Z, S, \Gamma)$, a taboo
stationary version. Theorem 10.5 deals with the behaviour of $(Z^*, S^*, \Gamma^*)$
after time zero and Theorem 10.6 with the behaviour before time zero.
We shall now show that $(Z^*, S^*, \Gamma^*)$ taboo (wide-sense) regenerates
at $S^*_0$ and continues after the regeneration like the zero-delayed version
$(Z^\circ, S^\circ, \Gamma^\circ)$.
Theorem 10.5. If $(Z, S, \Gamma)$ is taboo regenerative, then the following holds
under the conditions of Theorem 10.3(a) and with $(Z^*, S^*, \Gamma^*)$ as at (10.7b):
$$P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot \mid (Z^*_s)_{s \in (-\infty, S^*_0)}, \ldots, S^*_{-1}, S^*_0;\ \Gamma^* \ge S^*_0) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot). \tag{10.12a}$$
If $(Z, S, \Gamma)$ is taboo wide-sense regenerative, then the following holds under
the conditions of Theorem 10.3(b) and with $(Z^*, S^*, \Gamma^*)$ as at (10.7b):
$$P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot \mid \ldots, S^*_{-1}, S^*_0;\ \Gamma^* \ge S^*_0) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot). \tag{10.12b}$$
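Proof. We shall first consider the latter claim. Assume the conditions of
Theorem 10.3(b). Take $n \ge 1$ and bounded $f \in \mathcal{H} \otimes \mathcal{L} \otimes \mathcal{B}[0,\infty)$ and $g \in \mathcal{B}^{n+1}$. Let
$P^*_n$ be as in Theorem 10.4 and put $c = 1/E^\circ[e^{\alpha \Gamma^\circ} 1_{\{\Gamma^\circ < X_1\}}]$. The density
of $P^*_n$ with respect to $P^\circ$ is $c\, e^{\alpha S^\circ_{n-1}}(e^{\alpha X_n} - 1)$ on $\{\Gamma^\circ \ge S^\circ_n\}$, and thus, by
taboo wide-sense regeneration and since we may let $\Gamma^*$ be independent of
$(Z^\circ, S^\circ, \Gamma^\circ)$ under $P^\circ$ and thus under $P^*_n$,
$$E^*_n[f(\theta_{S^\circ_n}(Z^\circ, S^\circ, \Gamma^\circ))\, g(X_1, \ldots, X_n, (\Gamma^* \bmod X_n))\, 1_{\{\Gamma^\circ \ge S^\circ_n\}}] = E^\circ[f(Z^\circ, S^\circ, \Gamma^\circ)]\, E^*_n[g(X_1, \ldots, X_n, (\Gamma^* \bmod X_n))\, 1_{\{\Gamma^\circ \ge S^\circ_n\}}].$$
Now
$$\Gamma^* \bmod X_n = \Gamma^* \bmod X_n \wedge (\Gamma^\circ - S^\circ_{n-1}) \quad \text{on } \{\Gamma^\circ \ge S^\circ_n\},$$
and thus, due to Theorem 10.4, this identity can be rewritten as
$$E[f(\theta_{S^*_0}(Z^*, S^*, \Gamma^*))\, g(X^*_{-n+1}, \ldots, X^*_0, S^*_0)\, 1_{\{\Gamma^* \ge S^*_0\}}] = E^\circ[f(Z^\circ, S^\circ, \Gamma^\circ)]\, E[g(X^*_{-n+1}, \ldots, X^*_0, S^*_0)\, 1_{\{\Gamma^* \ge S^*_0\}}].$$
Since $n$, $f$, and $g$ are arbitrary, this yields (10.12b).
The claim (10.12a) is established in the same way with $(X_1, \ldots, X_n)$
replaced by $((Z^\circ_{S^\circ_n + s})_{s \in [-S^\circ_n, 0)}, X_1, \ldots, X_n)$ and $(X^*_{-n+1}, \ldots, X^*_0)$
replaced by $((Z^*_{S^*_0 + s})_{s \in [S^*_{-n} - S^*_0, 0)}, X^*_{-n+1}, \ldots, X^*_0)$. □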
Theorem 10.5 shows that $(Z^*, S^*, \Gamma^*)$ behaves in $[0, \infty)$ as a version of
$(Z, S, \Gamma)$. The next theorem completes the motivation for calling $(Z^*, S^*, \Gamma^*)$
a version of $(Z, S, \Gamma)$ by showing that $(Z^*, S^*, \Gamma^*)$ behaves in the taboo
period $(-\infty, 0]$ as $(Z, S, \Gamma)$ does under taboo in any nonrandom interval $[0, t]$.
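Theorem 10.6. If $(Z, S, \Gamma)$ is taboo regenerative, then the following holds
under the conditions of Theorem 10.3(a) and with $(Z^*, S^*, \Gamma^*)$ as at (10.7b):
for $t \in [0, \infty)$ and $s \in [0, t]$,
$$P(\theta_{S^*_{N^*_{-t-}}}(Z^*, S^*, \Gamma^*) \in \cdot \mid (Z^*_u)_{u < S^*_{N^*_{-t-}}}, (S^*_{N^*_{-t-}+k})_{-\infty}^{-1};\ S^*_{N^*_{-t-}} = -s) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot \mid \Gamma^\circ > s). \tag{10.13a}$$
If $(Z, S, \Gamma)$ is taboo wide-sense regenerative, then the following holds under
the conditions of Theorem 10.3(b) and with $(Z^*, S^*, \Gamma^*)$ as at (10.7b): for
$t \in [0, \infty)$ and $s \in [0, t]$,
$$P(\theta_{S^*_{N^*_{-t-}}}(Z^*, S^*, \Gamma^*) \in \cdot \mid (S^*_{N^*_{-t-}+k})_{-\infty}^{-1};\ S^*_{N^*_{-t-}} = -s) = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot \mid \Gamma^\circ > s). \tag{10.13b}$$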
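Proof. We shall first consider the latter claim (10.13b). Assume the
conditions of Theorem 10.3(b) and fix $t \in [0, \infty)$ and $s \in [0, t]$. Since $(Z^*, S^*, \Gamma^*)$
is taboo stationary, we have
$$P(\theta_{S^*_{N^*_{-t-}}}(Z^*, S^*, \Gamma^*) \in \cdot,\ ((X^*_{N^*_{-t-}+k})_{-\infty}^{0}, S^*_{N^*_{-t-}}) \in \cdot) = P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ ((X^*_k)_{-\infty}^{0}, S^*_0 - t) \in \cdot \mid \Gamma^* > t)$$
and thus
$$P(\theta_{S^*_{N^*_{-t-}}}(Z^*, S^*, \Gamma^*) \in \cdot \mid (X^*_{N^*_{-t-}+k})_{-\infty}^{0};\ S^*_{N^*_{-t-}} = -s) = \frac{P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)}{P(\Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)}. \tag{10.14}$$
On $\{S^*_0 - t = -s\}$ we can rewrite $\Gamma^* > t$ as $\Gamma^* - S^*_0 > s$, and thus [see
Fact 3.1 in Chapter 6] we have
$$P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s) = P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* - S^*_0 > s \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s).$$
Since $\Gamma^* - S^*_0 > s$ implies $\Gamma^* \ge S^*_0$, this yields
$$\frac{P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)}{P(\Gamma^* \ge S^*_0 \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)} = P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* - S^*_0 > s \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s,\ \Gamma^* \ge S^*_0).$$
By Theorem 10.5, the right-hand side equals $P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot,\ \Gamma^\circ > s)$.
Thus
$$\frac{P(\theta_{S^*_0}(Z^*, S^*, \Gamma^*) \in \cdot,\ \Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)}{P(\Gamma^* \ge S^*_0 \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)} = P^\circ((Z^\circ, S^\circ, \Gamma^\circ) \in \cdot,\ \Gamma^\circ > s). \tag{10.15}$$
In particular,
$$\frac{P(\Gamma^* > t \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)}{P(\Gamma^* \ge S^*_0 \mid (X^*_k)_{-\infty}^{0};\ S^*_0 - t = -s)} = P^\circ(\Gamma^\circ > s).$$
Divide (10.15) by this and compare with (10.14) to obtain (10.13b).
The claim (10.13a) is established in the same way with $(X^*_{N^*_{-t-}+k})_{-\infty}^{0}$ replaced
by $((Z^*_u)_{u < S^*_{N^*_{-t-}}}, (X^*_{N^*_{-t-}+k})_{-\infty}^{0})$. □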
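As in Theorem 9.1, define a probability kernel $p(\cdot \mid \cdot)$ by
$$p(\cdot \mid s) = P^\circ((Z^\circ, S^\circ) \in \cdot \mid \Gamma^\circ > -s), \quad s \in (-\infty, 0].$$
We shall now show that $(Z^*, S^*)$ is time-inhomogeneous (wide-sense)
regenerative up to time zero of type $p(\cdot \mid \cdot)$ in the following two-sided sense.
Corollary 10.1. If $(Z, S, \Gamma)$ is taboo regenerative, then for $t \in [0, \infty)$ and
$n \ge 0$,
$$P(\theta_{S^*_{N^*_{-t-}+n}}(Z^*, S^*) \in \cdot \mid (Z^*_s)_{s < S^*_{N^*_{-t-}+n}}, (S^*_{N^*_{-t-}+k})_{-\infty}^{n}) = p(\cdot \mid S^*_{N^*_{-t-}+n}) \quad \text{on } \{S^*_{N^*_{-t-}+n} < 0\}. \tag{10.16a}$$
If $(Z, S, \Gamma)$ is taboo wide-sense regenerative, then for $t \in [0, \infty)$ and $n \ge 0$,
$$P(\theta_{S^*_{N^*_{-t-}+n}}(Z^*, S^*) \in \cdot \mid (S^*_{N^*_{-t-}+k})_{-\infty}^{n}) = p(\cdot \mid S^*_{N^*_{-t-}+n}) \quad \text{on } \{S^*_{N^*_{-t-}+n} < 0\}. \tag{10.16b}$$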
Proof. First consider the latter claim (10.166). Assume that (Z,S,T) is
taboo wide-sense regenerative and fix t G [0, oo) and n ^ 0. According to
Theorem 9.1 there is a pair (Z^~^, 5(~^) such that for so,... ,sn G [0, t],
P(es(-„(Z(-t))S<-i))G-|S(-t) = -So,...,5^t) = -Sn)
(10.17)
= P(-\ ~Sn).
According to (10.136) in Theorem 10.6 and Theorem 9.1, for Sq G [0,t],
p(es-Nit_jz*,s*) g -,(Shlt_+k)i e ■\(s*Nlt„+k)-1oo;S*Nlt_ = so)
= P{9si-t)(Z^\S^) G -.(S^)? G .|S(-4> = -So).
This implies that for so, ■.., sn G [0, t],
P(^, ^+n(Z*,S*) G •|(5'^.(_+j.)lJ<); Sjv-(_ = -so,- ■ ■, 5'^(_+n= -sn)
= P(0S,_„ (Z(-*), S(-')) G • \Sil) = -5o,. • •, S^-') = -sn).
This and (10.17) yield (10.166).
The former claim (10.16a) is established in the same way with
replaced by ((Z*)S<S*N, AS*N-_t_
(S*Nlt_+k)? replaced by (C*Nli_+k,S*N._t_+k)^,
and (S^_t))r replaced by (C^_t),S^-0)?. D
10.7 The Taboo Stationary $(Z^*, S^*, \Gamma^*)$ Is the Taboo Limit
We shall now show that in the spread-out case $Z^*$ is indeed a two-sided
extension of the family $Z^{(*,-h)}$, $h \in [0, \infty)$, of one-sided taboo limit processes
in Theorem 9.4.
Theorem 10.7. Let $(Z, S, \Gamma)$ be taboo regenerative or taboo wide-sense
regenerative. In the wide-sense case, let $Z$ have Polish state space and
right-continuous paths with left-hand limits. Suppose the conditions (9.18a)
through (9.18e) in Theorem 9.4 hold and let $(Z^*, S^*, \Gamma^*)$ be as at (10.7a).
If $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is spread out, then for each $h \in [0, \infty)$,
$$P(\theta_{t-h} Z \in \cdot \mid \Gamma > t) \xrightarrow{tv} P(\theta_{-h} Z^* \in \cdot), \quad t \to \infty. \tag{10.18}$$
In fact, for each $h \in [0, \infty)$,
$$P(\theta_{t-h}(Z, S, \Gamma) \in \cdot \mid \Gamma > t) \xrightarrow{tv} P(\theta_{-h}(Z^*, S^*, \Gamma^*) \in \cdot), \quad t \to \infty. \tag{10.19}$$
Comment. The taboo stationary $(Z^*, S^*, \Gamma^*)$ exists without the first
moment condition (9.18c), that is, without $E^\circ[X_1 e^{\alpha X_1} 1_{\{\Gamma^\circ \ge X_1\}}] < \infty$. Let
us state as a conjecture that this condition should not be needed for
Theorem 10.7. See also the comment to Theorem 9.4.
Proof. Fix $h \in [0, \infty)$. We obtain (10.18) from Theorem 9.4 if we can
establish that $(Z^*_s)_{s \in [-h, \infty)}$ is a copy of the limit process $Z^{(*,-h)}$ in (8.28a).
For that purpose let $(Z^{(r)}, S^{(r)})$, $r \in (-\infty, 0]$, be as in Theorem 9.1 and
let $n \ge 1$ be such that $P^\circ(S^\circ_n \in \cdot \mid \Gamma^\circ \ge S^\circ_n)$ has a density component.
Recall from the proof of Theorem 9.4 that $Z^{(*,-h)}$ is the limit from-the-past
obtained by applying Theorem 8.5 to the family $(Z^{(r)}, (S^{(r)}_{kn})_{k=0}^{\infty})$,
$r \in (-\infty, 0]$. Due to Corollary 10.1, the family $((Z^*_s)_{s \in [r, \infty)}, (S^*_{N^*_{r-}+kn})_{k=0}^{\infty})$,
$r \in (-\infty, 0]$, is time-inhomogeneous (wide-sense) regenerative up to time
zero and of the same type as $(Z^{(r)}, (S^{(r)}_{kn})_{k=0}^{\infty})$, $r \in (-\infty, 0]$. According to
Theorem 8.5, the limit $Z^{(*,-h)}$ is determined by the type, and thus, if
$$((Z^*_s)_{s \in [r, \infty)}, (S^*_{N^*_{r-}+kn})_{k=0}^{\infty}),\ r \in (-\infty, 0], \text{ satisfies the conditions (8.23), (8.24), and (8.26) in Theorem 8.5,} \tag{10.20}$$
then Theorem 8.5 yields that $(Z^*_s)_{s \in [r, \infty)}$, $r \in (-\infty, 0]$, has also the limit
$Z^{(*,-h)}$. Since $(Z^*_s)_{s \in [r, \infty)}$, $r \in (-\infty, 0]$, trivially has the limit $(Z^*_s)_{s \in [-h, \infty)}$,
it follows that $(Z^*_s)_{s \in [-h, \infty)}$ is a copy of $Z^{(*,-h)}$ as desired.
Thus (10.18) follows if we can establish (10.20). Now, in the proof of
Theorem 9.4 we showed that $(Z^{(r)}, (S^{(r)}_{kn})_{k=0}^{\infty})$, $r \in (-\infty, 0]$, satisfies (8.23),
(8.24) and (8.26). Since the conditions (8.24) and (8.26) only have to do
with the type, it follows that $((Z^*_s)_{s \in [r, \infty)}, (S^*_{N^*_{r-}+kn})_{k=0}^{\infty})$, $r \in (-\infty, 0]$,
also satisfies (8.24) and (8.26). Thus it only remains to establish (8.23),
namely, that there is a distribution function $G$ on $[0, \infty)$ such that
$$P(S^*_{N^*_{r-}} - r \le x) \ge G(x), \quad r \in (-\infty, 0], \quad x \in [0, -r]. \tag{10.21}$$
By Theorem 10.3, $\ldots, \hat X_{-1}, \hat X_0$ are i.i.d. with distribution $P^\circ_{\mathrm{taboo}}(X_1 \in \cdot)$
and independent of $\hat\Gamma$. Thus [due to (10.7a)] $\ldots, X^*_{-2}, X^*_{-1}$ are i.i.d. with
distribution $P^\circ_{\mathrm{taboo}}(X_1 \in \cdot)$ and independent of $S^*_{-1}$. By the assumption
(9.18c), $E^\circ_{\mathrm{taboo}}[X_1] < \infty$. Thus $(-S^*_{-1-k})_0^{\infty}$ is a renewal process with finite
mean recurrence time. Note that for $r \in (-\infty, 0]$ and $x \in [0, -r]$,
$$P(S^*_{N^*_{r-}} - r \le x) = P(Y_{(-r-x)-} \le x), \tag{10.22}$$
where $Y_{(-r-x)-}$ is the residual life of $(-S^*_{-1-k})_0^{\infty}$ immediately before time
$-r - x$. Due to the domination result at (7.8), there is a nonincreasing
right-continuous function $g$ such that $g(x) \to 0$ as $x \to \infty$ and
$$P(Y_{(-r-x)-} > x \mid -S^*_{-1} = s) \le g(x)$$
for $r \in (-\infty, 0]$, $x \in [0, -r]$, and $s \in [0, -r - x]$. This and
$$P(Y_{(-r-x)-} > x,\ -S^*_{-1} > -r) \le P(-S^*_{-1} > x)$$
yield the inequality in
$$P(Y_{(-r-x)-} > x) = P(Y_{(-r-x)-} > x,\ -S^*_{-1} \le -r - x) + P(Y_{(-r-x)-} > x,\ -S^*_{-1} > -r) \le g(x) + P(-S^*_{-1} > x), \quad r \in (-\infty, 0],\ x \in [0, -r].$$
Put $G(x) = 1 - (g(x) + P(-S^*_{-1} > x)) \wedge 1$ to obtain (10.21) from this and
(10.22). Thus (10.18) is established.
In order to obtain (10.19), apply the above argument to the taboo wide-sense
regenerative triple $((\theta_s(Z^*, S^*, \Gamma^*))_{s \in \mathbb{R}}, S^*, \Gamma^*)$. □
10.8 The Lattice Case — Periodic Taboo Stationarity
We shall end this section by looking briefly at the taboo counterpart of
periodic stationarity. Note that this discussion also covers discrete time
(see Section 2.6).
Consider a pair $(Z^{**}, \Gamma^{**})$, where $\Gamma^{**}$ is a nonnegative finite random
time and
$$Z^{**} = (Z^{**}_s)_{s \in \mathbb{R}}$$
is a two-sided shift-measurable stochastic process (possibly the extension
to continuous time of a discrete-time process). Call $(Z^{**}, \Gamma^{**})$ periodically
taboo stationary with period $d$ if $d > 0$, $\Gamma^{**}$ is $d\mathbb{Z}$ valued, and
$$P((\theta_{nd} Z^{**}, \Gamma^{**} - nd) \in \cdot \mid \Gamma^{**} > nd) = P((Z^{**}, \Gamma^{**}) \in \cdot), \quad n \ge 0.$$
It is readily checked that the analogue of Theorem 10.1 holds: $(Z^{**}, \Gamma^{**})$
is periodically taboo stationary with period $d$ if and only if there is a pair
$(Z, \Gamma)$ such that
$$P((\theta_{nd-h} Z, \Gamma - nd) \in \cdot \mid \Gamma > nd) \xrightarrow{tv} P((\theta_{-h} Z^{**}, \Gamma^{**}) \in \cdot), \quad n \to \infty.$$
Also, it is readily checked that the analogue of Theorem 10.2 holds:
$(Z^{**}, \Gamma^{**})$ is periodically taboo stationary with period $d$ if and only if $\Gamma^{**}/d$
is geometric and independent of $\theta_{\Gamma^{**}} Z^{**}$. In the above we can replace the
pair $(Z^{**}, \Gamma^{**})$ by a triple $(Z^{**}, S^{**}, \Gamma^{**})$; see Remark 10.1.
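Let $(Z, S, \Gamma)$ be taboo regenerative (in the wide sense or not) and assume
that $P^\circ(X_1 \in \cdot \mid \Gamma^\circ \ge X_1)$ is lattice with span $d$ and that
$$P^\circ(S_0 \in d\mathbb{Z}) = 1 \quad \text{and} \quad P^\circ(\Gamma^\circ \in d\mathbb{Z} \mid \Gamma^\circ < X_1) = 1.$$
Let $(\hat Z, \hat S, \hat\Gamma)$ be as in Theorem 10.3 and let $(Z^*, S^*, \Gamma^*)$ be as at (10.7a).
Put
$$\Gamma^{**} = d + \lfloor \Gamma^*/d \rfloor d \quad \text{and} \quad (Z^{**}, S^{**}) := \theta_{-\Gamma^{**}} \theta_{\hat\Gamma}(\hat Z, \hat S).$$
Then $\Gamma^{**}/d$ is geometric and independent of $(\hat Z, \hat S, \hat\Gamma)$, and thus we have
that $(Z^{**}, S^{**}, \Gamma^{**})$ is periodically taboo stationary with period $d$.
Observe that $\Gamma^{**} - \Gamma^*$ is $[0, d)$ valued and independent of $(Z^{**}, S^{**}, \Gamma^{**})$
and that
$$(Z^{**}, S^{**}, \Gamma^{**}) = \theta_{\Gamma^* - \Gamma^{**}}(Z^*, S^*, \Gamma^*).$$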
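The claim that $\Gamma^{**}/d$ is geometric can be illustrated numerically. The sketch below is an illustration only, with hypothetical values for `alpha` and `d`: it discretizes an exponential $\Gamma^*$ as in the display above and estimates the tail $P(\Gamma^{**}/d > n)$, which for a geometric variable on $\{1, 2, \ldots\}$ obtained this way should be close to $e^{-\alpha d n}$.

```python
import math
import random


def lattice_taboo_time(gamma_star, d):
    """Gamma** = d + floor(Gamma*/d) * d, a d-lattice-valued time exceeding Gamma*."""
    return d + math.floor(gamma_star / d) * d


def empirical_tail(alpha, d, n_periods, samples=100_000, seed=1):
    """Estimate P(Gamma**/d > n_periods) when Gamma* is exponential(alpha)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        gamma_star = rng.expovariate(alpha)
        if lattice_taboo_time(gamma_star, d) > n_periods * d:
            hits += 1
    return hits / samples


alpha, d = 0.8, 0.5  # hypothetical rate and span
tails = [empirical_tail(alpha, d, n) for n in (1, 2, 3)]
```

The span `d = 0.5` is chosen as a power of two so that the lattice arithmetic is exact in floating point.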
From this, the lattice assumption, and Theorems 10.5 and 10.6 we
obtain easily that Theorems 10.5 and 10.6 hold with $(Z^*, S^*, \Gamma^*)$ replaced by
$(Z^{**}, S^{**}, \Gamma^{**})$, that is, $(Z^{**}, S^{**}, \Gamma^{**})$ is a periodically taboo stationary
version of $(Z, S, \Gamma)$. Now the proof of Theorem 10.7 (with obvious
modifications) yields the following result: if the conditions (9.18a) through (9.18e)
in Theorem 9.4 hold, then for $h \in [0, \infty)$,
$$P(\theta_{nd-h}(Z, S, \Gamma) \in \cdot \mid \Gamma > nd) \xrightarrow{tv} P(\theta_{-h}(Z^{**}, S^{**}, \Gamma^{**}) \in \cdot)$$
as $n \to \infty$.
11 Perfect Simulation - Coupling From-the-Past
We shall end this final chapter by considering the simulation aspects of
the above theory. Coupling from-the-past (Section 8) can be applied to
finite state space Markov chains to generate the stationary version of a
time-homogeneous chain, the two-sided version of a time-inhomogeneous
chain, and the taboo stationary version of a time-homogeneous chain (in
particular, the so-called quasi-stationary distribution).
In Section 6.1 of Chapter 8 we discussed briefly the general problem
of generating a stationary version of a given stochastic process, and the
same discussion applies with obvious modification to the two-sided time-
inhomogeneous case and the taboo case. We then gave a solution for Palm
duals with bounded cycle-lengths using the acceptance-rejection algorithm.
At the end of this section this algorithm is applied together with the
structural results of Section 10 to generate the taboo stationary version of a
taboo regenerative process when the minimum of the cycle-length and the
taboo time is bounded and the exponential parameter $\alpha$ is known.
An important distinction between the acceptance-rejection algorithm
and the coupling from-the-past algorithm is that the coupling algorithm
works without knowledge of $\alpha$. An interesting common feature of the
algorithms is that the transition probabilities (or cycle distribution) of the
processes need not be known. The processes could, for instance, be the
output of another simulation.
11.1 Generating a Stationary Finite-State Markov Chain
Consider a Markov chain in discrete or continuous time
$$Z = (Z_k)_0^{\infty} \quad \text{or} \quad Z = (Z_s)_{s \in [0, \infty)}$$
with a finite state space $E$. Assume that $Z$ is irreducible and, in the
discrete-time case, that $Z$ is aperiodic. Suppose it is known how to generate $Z$
starting from any given initial state $i$. Note that the problem of generating
the stationary version $Z^*$ of $Z$ can be reduced to that of generating the
stationary initial state $Z^*_0$. The stationary initial state $Z^*_0$ can be generated
by coupling from-the-past as follows (see Figure 11.1).
Initial step. Start a family of independent versions of Z in all states
at time —1 and run them up to time 0 (that is, one transition in the
discrete-time case).
Recursive steps. For each n ^ 2, start a new family of independent
versions of Z in all states at time —n and run them up to time —n + 1.
From time — n + 1 let the chains continue up to time 0 as follows: if
the chain starting in state i at time — n is in state j at time — n + 1,
let it continue up to time 0 along the path of the chain that starts at
time — n + 1 in state j.
Termination condition. Terminate the recursion at the first n ^ 1
such that all the chains that start at time $-n$ are in the same state at
time 0. This state is a realization of the stationary state $Z^*_0$, according
to the following theorem.
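The three steps above can be sketched in code for the discrete-time case. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `coupling_from_the_past`, the list-of-rows transition matrix `P`, and the example matrix are hypothetical. The list `to_zero` records, for each state at the current start time $-n$, the state reached at time 0 along the already generated paths; each recursion prepends one independent transition from every state.

```python
import random


def coupling_from_the_past(P, rng):
    """Return (state at time 0, M) for a finite chain with transition matrix P."""
    states = list(range(len(P)))
    to_zero = states[:]  # at time 0 itself, every state maps to itself
    n = 0
    while True:
        n += 1
        # Independent one-step transitions from all states over [-n, -n + 1].
        step = [rng.choices(states, weights=P[i])[0] for i in states]
        # From time -n + 1 on, follow the path already generated from that state.
        to_zero = [to_zero[step[i]] for i in states]
        if len(set(to_zero)) == 1:  # all chains started at -n agree at time 0
            return to_zero[0], n


# Hypothetical irreducible aperiodic example; its stationary law is (1/4, 1/2, 1/4).
P_example = [
    [0.50, 0.50, 0.00],
    [0.25, 0.50, 0.25],
    [0.00, 0.50, 0.50],
]
rng = random.Random(2)
draws = [coupling_from_the_past(P_example, rng)[0] for _ in range(20_000)]
frequencies = [draws.count(j) / len(draws) for j in range(3)]
```

Note that older transitions are kept and only new ones are added at the far end; regenerating the whole path at each recursion would bias the output.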
FIGURE 11.1. Coupling from-the-past. [Time axis: $-M, \ldots, -4, -3, -2, -1, 0$.]
Theorem 11.1. Let $M$ be the first $n \ge 1$ such that all the chains that
start at time $-n$ are in the same state at time 0. Call this state $Y$. Then
$P(M < \infty) = 1$ and $Y$ is a copy of $Z^*_0$.
Proof. Let $P_i$ indicate that $Z$ starts in the state $i$, that is, $P_i(Z_0 = i) = 1$.
According to Theorems 3.1 and 3.2 in Chapter 2, $\lim_{n \to \infty} P_i(Z_n = j) =
P(Z^*_0 = j) > 0$ for all states $i$ and $j$. Thus there are $j_0$, $n_0$, and $p > 0$ such
that
$$P_i(Z_{n_0} = j_0) > p, \quad i \in E.$$
For each $n \ge n_0$ the probability that all the chains starting at time $-n$
are in the same state at time $-n + n_0$ is no less than the probability that
independent chains starting from all states at time $-n$ are all in the state
$j_0$ at time $-n + n_0$. Thus
$$P(M > (k+1) n_0 \mid M > k n_0) \le 1 - p^{\#E}, \quad k \ge 0.$$
This implies
$$P(M > k n_0) \le (1 - p^{\#E})^k \to 0 \quad \text{as } k \to \infty,$$
that is, $P(M < \infty) = 1$.
Now fix an $i \in E$. Let $Z_0^{(-n,i)}$ be the state at time 0 of the chain that
starts in $i$ at time $-n$ and note that $Z_0^{(-n,i)} = Y$ on $\{M \le n\}$. Thus, for
each $j \in E$,
$$P(Z_0^{(-n,i)} = j) = P(Z_0^{(-n,i)} = j,\ M > n) + P(Y = j,\ M \le n) \to P(Y = j), \quad n \to \infty.$$
But for each $j \in E$,
$$P(Z_0^{(-n,i)} = j) = P_i(Z_n = j) \to P(Z^*_0 = j) \quad \text{as } n \to \infty,$$
and thus $P(Y = j) = P(Z^*_0 = j)$ as desired. □
Remark 11.1. Why use coupling from-the-past and not ordinary coupling
to-the-future? Simply because the latter does not work. At the coupling time
$T$, when the chains starting from all states at time zero merge, their common
state is also the state of the stationary chain. But $T$ is random, and thus the stationary
chain need not have the stationary distribution at time $T$. (In order to see
this, consider a chain with state space $E = \{1, 2, 3\}$ which goes from 1
to 2 with probability $\tfrac{1}{2}$, from 1 to 3 with probability $\tfrac{1}{2}$, from 2 to 3 with
probability 1, and from 3 to 1 with probability 1. The versions starting
from all states at time zero merge in state 3. The nonrandom state 3 is not
the stationary state.)
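The counterexample in Remark 11.1 is easy to simulate. The sketch below (function and variable names are illustrative) runs coupling to-the-future for the three-state chain with independent transitions from each occupied state and records the state in which the versions merge. Since state 3 is the only state with two distinct predecessors, the merge always happens there, whereas solving the balance equations gives the stationary distribution $(0.4, 0.2, 0.4)$ on $\{1, 2, 3\}$.

```python
import random

# Transition law of the chain in the remark: state -> list of (target, probability).
MOVES = {
    1: [(2, 0.5), (3, 0.5)],
    2: [(3, 1.0)],
    3: [(1, 1.0)],
}


def forward_merge_state(rng):
    """Run the versions from all states forward until they merge; return that state."""
    occupied = {1, 2, 3}
    while len(occupied) > 1:
        step = {}
        for i in occupied:  # one independent transition per occupied state
            targets = [j for j, _ in MOVES[i]]
            weights = [w for _, w in MOVES[i]]
            step[i] = rng.choices(targets, weights=weights)[0]
        occupied = {step[i] for i in occupied}
    return next(iter(occupied))


rng = random.Random(4)
merge_states = {forward_merge_state(rng) for _ in range(2_000)}
```

Here `merge_states` contains only the state 3, in contrast to the stationary law, which puts mass 0.4 on it.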
11.2 More Efficient Algorithm for Birth and Death Chains
The above algorithm shows that perfect simulation is theoretically possible,
but the algorithm is not very efficient. Typically, in practical simulations
the number of states is several thousands or millions, even trillions or more,
and thus having to generate independent transitions from all states is
astronomically expensive and time-consuming. However, in special cases there
are efficient versions of the above algorithm.
For instance, if $Z$ is a birth and death chain in continuous time, then
we can (recursively for $n \ge 1$) generate two chains starting at time $-n$,
one starting from top (from the highest state) and the other from bottom
(from zero), and run each of them until time zero or until it merges with
a chain starting at time $-k$ for some $k < n$. Repeat this until the first $n$
such that the two chains starting at time $-n$ are in the same state at time
zero. This common state is a realization of the stationary state because
all chains coming in from the past (in particular the stationary chain) are
captured by these chains and have to merge with them.
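The top-and-bottom idea can be sketched as follows. Note that this is a swapped-in variant, not the continuous-time scheme described above: it assumes a hypothetical discrete-time birth and death chain on $\{0, \ldots, \texttt{top}\}$ with up-probability `p` and down-probability `q` ($p + q \le 1$), and it drives the top and bottom chains by shared random numbers through a monotone update rule, in the style of monotone coupling from the past. The shared randomness replaces the merging of independent paths, which in discrete time could jump over each other.

```python
import random


def update(x, u, p, q, top):
    """Monotone update: the same u moves every state in the same direction."""
    if u < p:
        return min(x + 1, top)
    if u > 1.0 - q:
        return max(x - 1, 0)
    return x


def monotone_cftp(p, q, top, rng):
    """Return a draw from the stationary law of the birth and death chain."""
    us = []  # us[k] drives the step from time -(k + 1) to time -k
    while True:
        us.append(rng.random())  # extend one step into the past, keep older draws
        hi, lo = top, 0
        for u in reversed(us):   # run both chains from time -n up to time 0
            hi = update(hi, u, p, q, top)
            lo = update(lo, u, p, q, top)
        if hi == lo:             # every start state is sandwiched between lo and hi
            return hi


rng = random.Random(5)
samples = [monotone_cftp(0.3, 0.3, 3, rng) for _ in range(8_000)]
freqs = [samples.count(j) / len(samples) for j in range(4)]
```

With `p == q` detailed balance makes the stationary law uniform on the four states, so each entry of `freqs` should be near 0.25. Only two chains are simulated regardless of the size of the state space.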
The same trick can be used for more complicated monotone chains, that
is, chains having a partially ordered state space and transition probabilities
that preserve this partial ordering. An example is the (finite) Ising model,
a Markov chain with state space $\{-1, 1\}^{\{0,\dots,k\}^d}$ (a state is a configuration
of minus ones and ones indexed by a location in $\{0,\dots,k\}^d$) and with the
property that a state changes either by switching a single $-1$ to $1$ or a
single $1$ to $-1$. The chains starting from top (with ones at all locations)
and the chains starting from bottom (with minus ones at all locations) will then
capture all chains that come in from the past.
11.3 Generating a Two-Sided Time-Inhomogeneous Chain
The above coupling algorithm also works for time-inhomogeneous chains.
To simplify the presentation we shall only consider the discrete-time case.
Analogous results hold in the continuous-time case.
Let E be a finite or countable set and, for each $k \in \mathbb{Z}$, let
$$P_k = (P_{k,ij} : i, j \in E)$$
be transition probabilities on E. For $n \ge 0$, consider a time-inhomogeneous
Markov chain in discrete time
$$Z^{(-n)} = (Z_k^{(-n)})_{k=-n}^{\infty}$$
starting at time $-n$ with transition probabilities $P_k$, $k \ge -n$, that is,
$$P(Z_{k+1}^{(-n)} = j \mid Z_k^{(-n)} = i) = P_{k,ij}, \quad i, j \in E, \ k \ge -n.$$
Assume that there is a finite subset B of E and $j_0 \in B$, $n_1 > n_0 \ge 1$, and
$p > 0$ such that
$$P_{k,ij} = 0, \quad k < 0, \ i \in B, \ j \notin B,$$
$$P(Z_{-n+n_0}^{(-n)} = j_0 \mid Z_{-n}^{(-n)} = i) \ge p, \quad n \ge n_1, \ i \in B. \qquad (11.1)$$
Suppose it is known how to generate, for each $n \ge 0$, versions of $Z^{(-n)}$
starting from any given initial state $i \in B$ and suppose we wish to generate
a two-sided version,
$$Z^* = (Z_k^*)_{k=-\infty}^{\infty}.$$
Then the key task is to generate, for each fixed integer $m \le 0$, a realization
of $Z_m^*$. If this can be done, we can, for any $k \ge 1$, run a version of $Z^{(m)}$ for
k steps starting from the realized value of $Z_m^*$ and obtain a realization of
$(Z_m^*, \dots, Z_{m+k}^*)$. So fix $m \le 0$ and apply coupling from-the-past as follows
to obtain a realization of $Z_m^*$.
Initial step. For each $i \in B$, generate an independent copy $Z_m^{(m-1,i)}$
of $Z_m^{(m-1)}$ starting at time $m - 1$ in state i.
Recursive steps. With $n \ge 2$ generate, for each $i \in B$, an
independent copy $Z_{m-n+1}^{(m-n,i)}$ of $Z_{m-n+1}^{(m-n)}$ starting at time $m - n$ in state
i. From time $m - n + 1$ define the chain $Z^{(m-n,i)}$ up to time m as
follows:
$$\text{if } Z_{m-n+1}^{(m-n,i)} = j, \text{ put } Z_k^{(m-n,i)} = Z_k^{(m-n+1,j)}, \quad m - n + 1 < k \le m.$$
Termination condition. Terminate the recursion at the first $n \ge 1$
such that all the chains that start at time $m - n$ are in the same state
at time m. This common state is a realization of $Z_m^*$, according to
the following theorem.
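The initial, recursive, and termination steps can be sketched as follows. This is only an illustration under stated assumptions: `step(k, i, rng)` is a hypothetical user-supplied sampler that returns the state at time k + 1 given state i at time k, and it is assumed to keep chains started in B inside B for k < 0, as in the first part of (11.1). Because each new chain takes one independent step and then follows an already generated chain, only the map "state at the current starting time to state at time m" needs to be stored.

```python
import random

def cftp_inhomogeneous(m, B, step, rng):
    """Coupling from-the-past for a time-inhomogeneous chain.

    Returns (state at time m, number n of steps back that were needed)."""
    state_at_m = {i: i for i in B}  # n = 0: the chain started at time m in i
    n = 0
    while len(set(state_at_m.values())) > 1:
        n += 1
        start = m - n
        # one independent first transition for each initial state i in B
        first = {i: step(start, i, rng) for i in B}
        # from time start + 1 onward, follow the chain already generated from j
        state_at_m = {i: state_at_m[first[i]] for i in B}
    return next(iter(state_at_m.values())), n
```

A time-homogeneous kernel is the special case in which `step` ignores its first argument.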
Theorem 11.2. Let $P_k$, $k \in \mathbb{Z}$, be a family of transition probabilities
satisfying (11.1). Then there exists a two-sided time-inhomogeneous Markov
chain $Z^* = (Z_k^*)_{k=-\infty}^{\infty}$ with one-step transition probabilities $P_k$, $k \in \mathbb{Z}$.
Moreover, for all families $Z^{(-n)}$ of time-inhomogeneous Markov chains starting
at time $-n$ in the set B with one-step transition probabilities $P_k$, $k \ge -n$,
it holds that as $n \to \infty$,
$$P(Z_m^{(-n)} = j) \to P(Z_m^* = j) \quad \text{(limit from-the-past)}, \qquad (11.2)$$
for all $m \in \mathbb{Z}$ and $j \in E$.
Finally, take $m \le 0$, fix an arbitrary state $i_0 \in B$, and in the above
algorithm put
$$M = \inf\{n \ge 1 : Z_m^{(m-n,i)} = Z_m^{(m-n,i_0)} \text{ for all } i \in B\}.$$
Then $P(M < \infty) = 1$, and $Z_m^{(m-M,i_0)}$ is a copy of $Z_m^*$.
PROOF. Due to (11.1), for each $n \ge n_1$, the probability that the chains
$Z^{(m-n,i)}$, $i \in B$, starting at time $m - n$ are all in the same state at time
$m - n + n_0$ is no less than the probability that independent chains starting
from all states at time $m - n$ are all in the state $j_0$ at time $m - n + n_0$.
Thus
$$P(M > (k+1)n_0 + k_0 \mid M > k n_0 + k_0) \le 1 - p^{\#B}, \quad k \ge 0, \qquad (11.3)$$
that is, $P(M < \infty) = 1$. Note that for $i, j \in B$,
$$P(Z_m^{(m-n,i)} = j) = P(Z_m^{(m-n,i)} = j, M > n) + P(Z_m^{(m-M,i_0)} = j, M \le n) \qquad (11.4)$$
$$\to P(Z_m^{(m-M,i_0)} = j), \quad n \to \infty.$$
In order to establish the (mathematical) existence of $Z^*$ put, for $k \le m$,
$$M_k = \inf\{n \ge 1 : Z_k^{(m-n,i)} = Z_k^{(m-n,i_0)} \text{ for all } i \in B\}.$$
Then (11.3) holds with M replaced by $M_k$, and thus $P(M_k < \infty) = 1$.
Define $Z^*$ up to time m by
$$Z_k^* := Z_k^{(m-M_k,i_0)}, \quad k \le m.$$
From time m onward, let $Z^*$ run according to the transition probabilities
$P_k$, $k \ge m$. Then by definition, $Z_m^{(m-M,i_0)}$ is a copy of $Z_m^*$, and (11.2) follows
from
$$P(Z_m^{(-n)} = j) = \sum_{i \in B} P(Z_m^{(-n,i)} = j)\, P(Z_{-n}^{(-n)} = i) \to P(Z_m^* = j) \quad \text{as } n \to \infty$$
[due to (11.4) and $\#B < \infty$].
It only remains to establish that $Z^*$ is Markov with transition probabilities
$P_k$, $k \in \mathbb{Z}$. Take $k \le m$ and $j_k, \dots, j_m \in B$. A calculation like (11.4) yields
$$P(Z_k^{(m-n,i_0)} = j_k, \dots, Z_m^{(m-n,i_0)} = j_m) \to P(Z_k^* = j_k, \dots, Z_m^* = j_m)$$
and
$$P(Z_k^{(m-n,i_0)} = j_k) \to P(Z_k^* = j_k) \quad \text{as } n \to \infty.$$
Combine this and
$$P(Z_k^{(m-n,i_0)} = j_k, \dots, Z_m^{(m-n,i_0)} = j_m) = P(Z_k^{(m-n,i_0)} = j_k)\, P(Z_{k+1}^{(k,j_k)} = j_{k+1}, \dots, Z_m^{(k,j_k)} = j_m)$$
to obtain
$$P(Z_k^* = j_k, \dots, Z_m^* = j_m) = P(Z_k^* = j_k)\, P(Z_{k+1}^{(k,j_k)} = j_{k+1}, \dots, Z_m^{(k,j_k)} = j_m),$$
that is, $Z^*$ is Markov with transition probabilities $P_k$, $k \in \mathbb{Z}$, up to time
m. From time m onward, $Z^*$ is Markov with transition probabilities $P_k$,
$k \in \mathbb{Z}$, by definition. □
11.4 Generating a Taboo-Stationary Markov Chain
Consider a Markov chain in discrete time
$$Z = (Z_k)_{k=0}^{\infty}$$
with state space E. Let B be a finite subset of E and let $\Gamma$ be the first exit
time out of B,
$$\Gamma = \inf\{k \ge 1 : Z_k \notin B\}.$$
Assume that $\Gamma$ is a.s. finite for all initial states and that B is irreducible:
$$\forall i, j \in B \ \exists k \ge 1 : P(Z_k = j, \Gamma > k \mid Z_0 = i) > 0;$$
and aperiodic:
$$\gcd\{k \ge 1 : P(Z_k = i, \Gamma > k \mid Z_0 = i) > 0\} = 1, \quad i \in B.$$
Assume that it is known how to generate Z starting from any given initial
state i and suppose we wish to generate the taboo limit of Z, that is, a
two-sided taboo stationary chain $Z^* = (Z_k^*)_{k=-\infty}^{\infty}$ such that for all integers h,
$$P(Z_{n-h} = j \mid \Gamma > n) \to P(Z_{-h}^* = j), \quad n \to \infty. \qquad (11.5)$$
In the special case when h = 0, the distribution of $Z_0^*$ is called the
quasi-stationary distribution.
The chain Z conditioned on $\{\Gamma > n\}$ is a time-inhomogeneous Markov
chain, and the simulation algorithm from the previous subsection can be
applied to generate $Z^*$. We shall show how to generate the taboo segment
$(Z_{-h}^*, \dots, Z_0^*)$ of $Z^*$ ending at time zero. In order to generate $(Z_{-h}^*, \dots, Z_m^*)$
for an $m > 0$ we can then continue from time 0 up to time m with a
version of Z starting from the realized value of $Z_0^*$. Fix an integer $h \ge 0$
and generate a realization of $(Z_{-h}^*, \dots, Z_0^*)$ as follows.
Recursive steps. For $n \ge 1$, fix $i \in B$ and generate i.i.d. versions
$Z^{(-h-n,i,1)}, Z^{(-h-n,i,2)}, \dots$ of Z starting at time $-h - n$ in state i and
ending at time 0. Let $\Gamma^{(-h-n,i,1)}, \Gamma^{(-h-n,i,2)}, \dots$ be the first exit
times out of B and continue generating until
$$K^{(-h-n,i)} = \inf\{k \ge 1 : \Gamma^{(-h-n,i,k)} > 0\}.$$
Define a chain $Z^{(-h-n,i)}$ starting at time $-h - n$ in state i, ending at
time 0, and tabooed in the time set $\{-h - n, \dots, 0\}$, by
$$Z^{(-h-n,i)} = (i, Z^{(-h-n+1,j)}) \quad \text{if } Z_{-h-n+1}^{(-h-n,i,K^{(-h-n,i)})} = j.$$
Do this for each $i \in B$.
Termination condition. Terminate the recursion at the first $n \ge 1$
such that all the chains $Z^{(-h-n,i)}$, $i \in B$, that start at time $-h - n$
are in the same state at time $-h$. These chains will then run together
from this common state up to time 0, and this common segment is a
realization of $(Z_{-h}^*, \dots, Z_0^*)$, according to the following theorem.
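A small sketch may help make the acceptance-rejection and merging steps concrete. It rests on assumptions not in the text: `step(i, rng)` is a hypothetical one-step sampler for Z, the base case n = 0 (tabooed paths of length h from each state of B) is generated explicitly, and only the first step of each accepted path is retained, since the rest of the chain is taken from the chain generated one step later. The rejection loop can be slow when survival in B over many steps is unlikely.

```python
import random

def taboo_cftp(h, B, step, rng):
    """Return a list [z_{-h}, ..., z_0]: a sketch of the taboo segment."""
    B = list(B)
    in_B = set(B)

    def accepted_path(i, length):
        """Rejection sampling: a path from i that stays in B for `length` steps."""
        while True:
            path = [i]
            for _ in range(length):
                path.append(step(path[-1], rng))
                if path[-1] not in in_B:
                    break            # taboo broken: reject and start over
            else:
                return path          # taboo obeyed throughout: accept

    # n = 0: tabooed chains from time -h to time 0, one per initial state
    segment = {j: accepted_path(j, h) for j in B}
    state_at_minus_h = {i: i for i in B}
    n = 0
    while len(set(state_at_minus_h.values())) > 1:
        n += 1
        # first step of a chain started at time -h-n, conditioned to stay in B
        first = {i: accepted_path(i, h + n)[1] for i in B}
        state_at_minus_h = {i: state_at_minus_h[first[i]] for i in B}
    common = next(iter(state_at_minus_h.values()))
    return segment[common]
```

With h = 0 the returned list holds a single state, which by (11.5) should follow the quasi-stationary distribution. For a finite B this is the normalized left eigenvector of the transition matrix restricted to B, belonging to its largest eigenvalue, which makes small examples easy to check.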
Theorem 11.3. Let $Z = (Z_k)_{k=0}^{\infty}$ be a Markov chain in discrete time with
state space E. Let B be a finite irreducible aperiodic subset of E with an a.s.
finite first exit time $\Gamma$. Then there exists a two-sided time-inhomogeneous
Markov chain $Z^* = (Z_k^*)_{k=-\infty}^{\infty}$ such that for all $h \ge 0$,
$$P((Z_{n-h}, \dots, Z_n) = \cdot \mid \Gamma > n) \to P((Z_{-h}^*, \dots, Z_0^*) = \cdot), \quad n \to \infty.$$
Further, take $h \ge 0$, fix an arbitrary state $i_0$, and in the above algorithm
put
$$M = \inf\{n \ge 1 : Z_{-h}^{(-h-n,i)} = Z_{-h}^{(-h-n,i_0)} \text{ for all } i \in B\}.$$
Then $P(M < \infty) = 1$ and $(Z_{-h}^{(-h-M,i_0)}, \dots, Z_0^{(-h-M,i_0)})$ is a copy of
$(Z_{-h}^*, \dots, Z_0^*)$.
Proof. Let the chains $Z^{(-h-n,i)}$ run from time $-h - n$ up to infinity
(rather than end at time zero). The acceptance-rejection used to obtain
these chains renders (see Theorem 6.1(a) in Chapter 8)
$$P(Z^{(-h-n,i)} \in \cdot) = P((Z_{k+n+h})_{k=-h-n}^{\infty} \in \cdot \mid \Gamma > n + h).$$
This in turn implies that the chains $Z^{(-h-n,i)}$ form a family of
time-inhomogeneous Markov chains with common transition probabilities. Thus
Theorem 11.2 applies to yield the desired results if we can establish (11.1),
that is, if we can show that there are $j_0 \in B$, $n_1 > n_0 \ge 1$, and $p > 0$ such
that
$$P_i(Z_{n_0} = j_0 \mid \Gamma > n) \ge p, \quad n \ge n_1, \ i \in B, \qquad (11.6)$$
where $P_i$ indicates that Z starts in the state i.
In order to establish (11.6), we shall begin by applying Lemma 3.1(b) in
Chapter 2, which states that an aperiodic additive set A of nonnegative
integers contains all integers from some $k_0$ onward. Fix a state $j_0 \in B$ and
put
$$A = \{k \ge 1 : P_{j_0}(Z_k = j_0, \Gamma > k) > 0\}.$$
By assumption, A is aperiodic, and A is additive, since for all k and $k' \ge 1$,
$$P_{j_0}(Z_{k+k'} = j_0, \Gamma > k + k') \ge P_{j_0}(Z_k = j_0, Z_{k+k'} = j_0, \Gamma > k + k')$$
$$= P_{j_0}(Z_k = j_0, \Gamma > k)\, P_{j_0}(Z_{k'} = j_0, \Gamma > k').$$
Thus Lemma 3.1(b) in Chapter 2 yields the existence of an integer $k_0$ such
that
$$P_{j_0}(Z_k = j_0, \Gamma > k) > 0, \quad k \ge k_0. \qquad (11.7)$$
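Since B is irreducible, there is, for each $i \in B$, an integer $m_i$ such that
$$P_i(Z_{m_i} = j_0, \Gamma > m_i) > 0. \qquad (11.8)$$
Put $n_0 = k_0 + \max_{i \in B} m_i$. In (11.7) take $k = n_0 - m_i$ and multiply by
(11.8) to obtain
$$P_i(Z_{n_0} = j_0, \Gamma > n_0) > 0, \quad i \in B. \qquad (11.9)$$
Since B is irreducible there is, for each $i \in B$, an integer $k_i$ such that
$$P_{j_0}(Z_{k_i} = i, \Gamma > k_i) > 0.$$
Multiply (11.9) by this to obtain
$$P_i(Z_{n_0} = j_0, Z_{n_0+k_i} = i, \Gamma > n_0 + k_i) > 0, \quad i \in B. \qquad (11.10)$$
Put $n_1 = n_0 + \max_{i \in B} k_i$. For $i \in B$ and $n \ge n_1$,
$$P_i(Z_{n_0} = j_0, \Gamma > n) \ge P_i(Z_{n_0} = j_0, Z_{n_0+k_i} = i, \Gamma > n)$$
$$= P_i(Z_{n_0} = j_0, Z_{n_0+k_i} = i, \Gamma > n_0 + k_i)\, P_i(\Gamma > n - (n_0 + k_i))$$
$$\ge P_i(Z_{n_0} = j_0, Z_{n_0+k_i} = i, \Gamma > n_0 + k_i)\, P_i(\Gamma > n).$$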
This and
$$p := \inf_{i \in B} P_i(Z_{n_0} = j_0, Z_{n_0+k_i} = i, \Gamma > n_0 + k_i) > 0 \quad \text{[due to (11.10)]}$$
yield
$$P_i(Z_{n_0} = j_0, \Gamma > n) \ge p\, P_i(\Gamma > n).$$
Divide by $P_i(\Gamma > n)$ to obtain (11.6) and complete the proof. □
11.5 Generating a Taboo Stationary Regenerative Process
Finally, let us consider how to generate the taboo stationary version of a
taboo regenerative process by acceptance-rejection (Section 6 in Chapter 8)
using the structural results of Section 10. Let $(Z, S, \Gamma)$ be properly taboo
regenerative, that is, let $(Z, S, \Gamma)$ satisfy (9.1). Assume that there are known
finite constants a and b such that
$$P^\circ(X_1 \le a \mid \Gamma \ge X_1) = 1 \quad \text{and} \quad P^\circ(\Gamma \le b \mid \Gamma < X_1) = 1,$$
and that there is a known $\alpha > 0$ such that
$$E^\circ[e^{\alpha X_1} 1_{\{\Gamma \ge X_1\}}] = 1.$$
Recall from Section 10 that the taboo stationary version $(Z^*, S^*, \Gamma^*)$ of
$(Z, S, \Gamma)$ has the following structure: $\Gamma^*$ is exponential with parameter $\alpha$
and
$$(Z^*, S^*) = \theta_{-\Gamma^*}\theta_{\hat\Gamma}(\hat Z, \hat S), \qquad (11.11)$$
where $(\hat Z, \hat S, \hat\Gamma)$ is as in Theorem 10.3 and independent of $\Gamma^*$.
Suppose it is known how to generate the zero-delayed version of $(Z, S, \Gamma)$.
Then the taboo stationary version $(Z^*, S^*, \Gamma^*)$ can be generated as follows.
1. Generate i.i.d. copies of the cycle $C_1$. Let $C^{(1)}, C^{(2)}, \dots$ be the cycles
that obey the taboo. Let $(R^{(1)}, \Gamma^{(1)}), (R^{(2)}, \Gamma^{(2)}), \dots$ be the cycles
and the taboo time of the cycles that break the taboo. Generate as
many of these cycles as needed for the remaining steps.
2. Recursively for $n \ge 1$, generate an independent $U^{(n)}$ uniformly
distributed on (0,1). Accept the cycle $C^{(n)}$ if $\{U^{(n)} \le e^{\alpha(X_1^{(n)} - a)}\}$
occurs. Let $\hat C_0, \hat C_{-1}, \hat C_{-2}, \dots$ be the subsequence of accepted cycles and
generate as many of them as desired.
3. Recursively for $n \ge 1$, generate an independent $W^{(n)}$ with uniform
distribution on (0,1). Accept the first $(R^{(n)}, \Gamma^{(n)})$ such that the event
occurs. Let $(\hat C_1, \hat\Gamma)$ be the accepted pair.
4. Generate independent i.i.d. copies $C_2, C_3, \dots$ of the cycle $C_1$, as
many as desired, and let $(\hat Z, \hat S)$ be the pair obtained by stringing
out the cycles $\hat C_1, C_2, \dots$ forward from time zero and the cycles
$\hat C_0, \hat C_{-1}, \hat C_{-2}, \dots$ backward from time zero.
5. Generate an independent exponential $\Gamma^*$ with parameter $\alpha$ and put
$$(Z^*, S^*) := \theta_{-\Gamma^*}\theta_{\hat\Gamma}(\hat Z, \hat S).$$
By acceptance-rejection [Theorem 6.1 in Chapter 8], $(\hat Z, \hat S, \hat\Gamma)$ is as in
Theorem 10.3(a), and thus
$(Z^*, S^*, \Gamma^*)$ is the taboo stationary version of $(Z, S, \Gamma)$.
A modification of the above algorithm works in the wide-sense case but is
less efficient, since all the generated cycles of (Z,S) must be obtained in
the right order from a single process to preserve the dependence structure.
11.6 Remarks
The boundedness condition in the above acceptance-rejection algorithm is
automatically satisfied in the case when the taboo time is the initial point
of the first cycle with length that exceeds a fixed level (or the time when a
cycle-length exceeds that level). But in general the boundedness is a severe
restriction. However, carrying out the above algorithm with some fixed a
and b yields an imperfect simulation as in Section 6.8 of Chapter 8. A more
serious drawback of the acceptance-rejection algorithm is that we must
know $\alpha$.
The coupling from-the-past method does not have these drawbacks. It
seems to be a method with much potential. It can even be extended beyond
finite state space as in the following domination example.
Consider a discrete-time birth and death process with all birth
probabilities $p_i$, $i \ge 0$, less than or equal to some known constant $p < 1/2$. The
stationary distribution is well known but hard to calculate, so proceed as
follows. Dominate the birth and death process by a random walk with
$\{-1, 1\}$ valued step-lengths, reflection at 0, and taking the upward step
1 with probability p. Run the (known) stationary version of this random
walk backward from time zero until it hits the state 0. Now generate the
birth and death process (coming in from the past) forward from this time
as follows: let it always take the step $-1$ when the random walk takes the
step $-1$; let it also take the step $-1$ with probability $1 - p_i/p$ when the
random walk takes the step 1; let it take the step 1 with probability $p_i/p$
when the random walk takes the step 1. The state at time zero of this
process is a realization of the desired stationary state.
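The domination example can be sketched as follows, under assumptions that go slightly beyond the text. The reflected walk is taken to stay at 0 when it attempts a downward step there, so its stationary distribution is geometric with ratio p/(1 - p). Since the walk is a birth and death chain it is reversible, so the backward run uses the same transition rule as the forward walk. The birth and death process is likewise taken to stay at 0 on a downward step from 0. The names `dominated_cftp` and `birth_prob` are hypothetical.

```python
import random

def dominated_cftp(birth_prob, p, rng):
    """Sketch of the domination example; birth_prob(i) <= p < 1/2 is assumed."""
    rho = p / (1.0 - p)
    # stationary state of the reflected walk at time zero (geometric law)
    x = 0
    while rng.random() < rho:
        x += 1
    # run the stationary walk backward until it hits 0; path[k] is its state
    # at time -k (reversibility: same kernel backward as forward)
    path = [x]
    while path[-1] != 0:
        cur = path[-1]
        path.append(cur + 1 if rng.random() < p else max(cur - 1, 0))
    # the dominated process sits at 0 whenever the walk does; run it forward
    z = 0
    for k in range(len(path) - 1, 0, -1):
        walk_went_up = path[k - 1] == path[k] + 1
        if walk_went_up and rng.random() < birth_prob(z) / p:
            z += 1                # step 1 with probability p_i / p
        else:
            z = max(z - 1, 0)     # step -1 otherwise (held at 0 by reflection)
    return z
```

With a constant birth probability q below p, the target process is itself a reflected walk, so its stationary probability of state 0 is 1 - q/(1 - q), which gives a quick consistency check.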
An observation by King Crimson:
Said the straight man to the late man
Where have you been
I've been here and I've been there
And I've been in between.
Notes
The following notes reflect the author's desire to get this book into print
without further delay. Where nothing is known, nothing is claimed.
Chapter 1 RANDOM VARIABLES
The quantile coupling can be traced back to the fifties, and had probably
been around for awhile. It is used in Hodges and Rosenblatt (1953), Harris
(1955), Skorohod (1956), and Lehmann (1959). The natural term 'quantile
coupling' was suggested to me by Richard Gill.
The generalization of Theorem 3.1 to partially ordered Polish spaces is
called Strassen's theorem. It is a special case of Theorem 11 in Strassen
(1965). For outlines of proof, see Liggett (1985) and Lindvall (1992b). See
also Kamae, Krengel, and O'Brien (1977). Fill and Machida (1999) consider
how Strassen's theorem can (and cannot) be extended.
For coupling, Poisson approximation, and the so-called Stein's method,
see Barbour, Holst, and Janson (1992). Erhardsson (1999) combines the
coupling version of Stein's method with properties of regenerative
processes.
Theorem 7.1 is from Thorisson (1995b). Theorem 8.1 is the elementary
one-dimensional version of the Skorohod coupling; see Skorohod (1956) and
also Skorohod (1965).
The problem described in Section 10 is considered by many scientists to
be the problem in modern physics. It is often referred to by the key phrases
nonlocality, Bell inequality, and EPR (Einstein, Podolsky, and Rosen). The
term 'impossible coupling' seemed appropriate in the context of this book.
The key historical papers are the following: Einstein, Podolsky, and Rosen
(1935) pinpoint that quantum physics implies nonlocality; Bell (1964,1966)
establishes the Bell inequality, an inequality like (10.4) [Boole's inequality
is in fact a Bell-type inequality]; and Aspect, Dalibard, and Roger (1982)
report the results of an actual experiment like the one described in Section
10, confirming the predictions of quantum physics. In addition to the
non-Kolmogorovian views of Kümmerer and Maassen (1998) and Accardi (1984,
1995, 1998) and the nonlocality view of Maudlin (1994) and Gill (1998,
1999) mentioned in Section 10.6, there is, for instance, the view of Pitowsky
(1989) that the problem has to do with measurability. For a short and
excellent common-sense survey, see Mermin (1985).
Chapter 2 MARKOV CHAINS AND RANDOM WALKS
It is generally agreed that the coupling idea dates back to Doeblin (1938),
where the classical coupling is presented in the context of regular finite-
state Markov chains. For a survey of Doeblin's life and work, see Lindvall
(1991). The classical coupling appears in Harris (1955), but otherwise seems
to have disappeared for a long time. It finally surfaced in the elementary
books Breiman (1969) and Hoel, Port, and Stone (1972). In Pitman (1974)
the classical coupling is used to establish rates of convergence for irreducible
aperiodic positive recurrent Markov chains. This idea is further explored
in the context of discrete-time renewal process in Kalashnikov (1977) and
Lindvall (1979a). Lindvall (1979b) considered the classical coupling of birth
and death processes.
The Ornstein coupling was introduced in Ornstein (1969), and epsilon-
coupling in Lindvall (1977).
Blackwell's renewal theorem (Theorem 8.1) was first proved by Blackwell
(1948) although special cases had been treated by Täcklind (1945)
and Doob (1948). Several proofs have been proposed since then, the least
complicated analytic proof probably being the one based on Choquet's
theorem, see Feller (1971). The first probabilistic proof is presented in
Lindvall (1977). It covered the finite-mean case, m < oo, and was based on
an epsilon-coupling version of the classical coupling relying on the Hewitt-
Savage 0-1 law to establish a successful epsilon-coupling. Athreya,
McDonald, and Ney (1978) considered the two-sided case [which was first treated
in Blackwell (1953)] and proposed establishing successful epsilon-coupling
by applying the epsilon-recurrence of zero-mean nonlattice random walks
to the difference walk of two independent nonlattice random walks; the
problem is that this difference walk need not be nonlattice. Berbee (1979)
added a geometric number of 0 step-lengths at each step to make the
difference walk nonlattice. Thorisson (1987a) extended Lindvall's approach
to the infinite-mean case, m = oo, and also removed the reliance on the 0-1
law (at the cost of having to treat an epsilon-transient case). This paper
also suggested as an alternative approach the Ornstein-type construction
to obtain epsilon-recurrence of the difference walk (which is quite hard to
establish in the unbounded finite-mean case, and need not even hold when
m = oo). Lindvall and Rogers (1996) gave the proof an elegant finishing
touch by introducing the geometric-sum idea, which allows the use of not
only bounded but actually epsilon-bounded step-lengths.
A relatively simple analytic consequence of Blackwell's renewal theorem
is the so-called key renewal theorem [see, for instance, Feller (1971)], which
is commonly used to derive the results in Theorem 10.2 on convergence in
distribution. It does not, however, yield the total variation result for the
total life. This result is from Thorisson (1997a), and so is Theorem 10.1.
The first complete extension (in the one-sided case) of Blackwell's
theorem to Markov renewal processes is in Shurenkov (1984); it is based on
Fourier analysis. Alsmeyer (1994b, 1997) presents two probabilistic proofs
with coupling as an ingredient (and covers the two-sided case).
Comment on Strong Stationary Times
Consider a deck of cards. Take the card at the top and put it into the deck
uniformly at random, possibly on top again and possibly below the card at
the bottom. Repeat this until the card that was originally at the bottom
is at the top. When this card is put into the deck uniformly at random,
then the deck is uniform (is at stationarity) and independent of how many
rounds (say T) this took.
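The shuffle just described is easy to simulate. The following is a minimal sketch with hypothetical names, in which the returned count includes the final round, the one in which the original bottom card is itself reinserted.

```python
import random

def top_to_random_until_uniform(n_cards, rng):
    """Top-to-random shuffle stopped right after the original bottom card,
    having reached the top, is put back uniformly at random.

    Returns (deck, rounds); deck[0] is the top card."""
    deck = list(range(n_cards))
    original_bottom = deck[-1]
    rounds = 0
    while True:
        bottom_card_on_top = deck[0] == original_bottom
        card = deck.pop(0)
        # n_cards possible slots: on top again, in between, or below the bottom
        deck.insert(rng.randrange(n_cards), card)
        rounds += 1
        if bottom_card_on_top:
            return deck, rounds
```

For a small deck, one way to check the claim empirically is to tabulate the returned orderings over many runs and compare their frequencies with the uniform distribution over permutations.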
This example is due to Aldous and Diaconis; see Aldous (1983), Aldous
and Diaconis (1986, 1987), and Diaconis (1988). They call a time like T
a strong uniform time, and later a strong stationary time. The definition
is as follows: T is a strong stationary time for a Markov chain Z if the
future of the chain from time T onward is stationary and independent of T.
Such times are, for instance, used to establish 'nonasymptotics' for certain
random walks on finite groups, in particular sudden transitions in real time
from nonstationarity to stationarity (so-called threshold phenomena). For
strong stationary times in simulation, see Fill (1998).
In Thorisson (1988b) a time T is called future-independent if it is
independent of the future of the chain Z from time T onward [observe that this
is exactly the defining property of regeneration times in the wide sense;
see Chapter 10 (Section 4.1)]. Thorisson (1988b) calls a strong stationary
time a future-independent stationarity time and notes the following
relation between this concept and coupling: since the future of Z after time T
is stationary and independent of T, we can run a time-reversed stationary
version of Z from the state $Z_T$ backward to time zero to obtain a stationary
chain Z' such that Z and Z' coincide from time T onward. Thus, in fact,
strong stationary times are coupling times.
The converse is not true. Neither is this true for future-independent times
in general.
Chapter 3 RANDOM ELEMENTS
Extension techniques are just used, but rarely explained. Extension, as
defined in Section 3.1, is in Berbee (1979). The special case of a product
space extension is more common; see Kallenberg (1997).
The transfer idea must have been around in some form for a long time,
but as far as I am aware it was first presented explicitly in Section 5.1 of
Thorisson (1981); see also Construction 1.1 in Thorisson (1983). The perfect
term 'transfer' is from Kallenberg (1997). A version of transfer [basically
Theorem 7.5 in Chapter 4 of this book] was presented in Kallenberg (1988).
Transfer under weak-sense-regularity (Theorem 2.4 of Chapter 4) seems to
be new. This rather obvious observation can be quite useful, since weak-
sense-regularity does not imply regularity [as one might think; see Bogachev
(1998) for measure theory relevant in general settings].
The splitting idea is from Nummelin (1978) and Athreya and Ney (1978),
the eye-opening papers on regeneration in Harris chains. Theorem 5.1 (the
extension to general conditional splitting) seems to be new.
Berbee (1979) presents an exact coupling of random walks with spread-
out step-lengths. It is based on the Ornstein idea, as the coupling in Section
6. Ney (1981) and Lindvall (1982) elaborate on the classical coupling idea
to obtain rate results. See also Kalashnikov (1980) and Silvestrov (1979,
1994). Renewal theory in the spread-out case is in Smith (1954); see also
Stone (1966) and Arjas, Nummelin, and Tweedie (1978). In particular,
Theorem 6.2 has an associated key renewal theorem that does not require
direct Riemann integrability (as does the one based on Blackwell's renewal
theorem) and holds uniformly over a class of functions. Markov renewal
theory in the spread-out case is treated in Niemi and Nummelin (1986).
The material in Section 9 is from Thorisson (1994a). Theorem 10.1 is
from Dudley (1968); the proof is new. Theorem 10.3 is from Skorohod
(1956); the proof follows Billingsley (1971).
Chapter 4 STOCHASTIC PROCESSES
The term 'coupling' was earlier (and still is occasionally) used for what
we have called exact coupling (following Lindvall (1992)). Distributional
exact coupling was introduced (under the name 'distributional coupling',
or just 'coupling') in Thorisson (1981, 1983), inspired by the approach in
Ney (1981). It is called 'weak coupling' in Lindvall (1992).
Maximal exact coupling of discrete-time Markov chains is presented
in Griffeath (1975); see also Pitman (1976). Griffeath (1978) extended
the result to discrete-time Markov processes on a Polish state space, and
Goldstein (1979) to general discrete-time stochastic processes on a
Polish state space. Berbee (1979) proved Goldstein's result by applying
Griffeath's result to the path process (which is Markovian with a Polish state
space). Thorisson (1986b) established a maximal distributional exact
coupling without any restriction on the state space, and obtained the maximal
nondistributional result in the Polish case as a corollary (by transfer). See
also Greven (1987) and Harison and Smirnov (1990). For maximal exact
coupling in continuous time, see Sverchkov and Smirnov (1990).
The equivalences in Theorem 9.4 are from Goldstein (1979). Goldstein's
proof contained a Hilbert space argument, which is not needed here because
of the use of coupling with respect to a sub-σ-algebra. This concept was
introduced in Aldous and Thorisson (1993).
Michail Sverchkov pointed out to me the need to consider canonical joint
measurability rather than joint measurability. Lemma 2.1 is due to Walter
Rudin (personal communication).
Chapter 5 SHIFT-COUPLING
Shift-coupling dates back to the amazing monograph by Berbee (1979),
where the link to Cesaro total variation convergence is established (see his
Theorem 4.3.3). Greven (1987) considers distributional shift-coupling in
the context of discrete-time Markov processes, and introduces the
maximality property (4.1). Aldous and Thorisson (1993) provide the link to the
invariant σ-algebra. The term 'shift-coupling' was introduced in that
paper (coined by David Aldous). Thorisson (1994b) considers shift-coupling
in continuous-time and presents the shift-coupling inequality. The proof of
the maximality result (Theorem 4.1) is a shift-coupling version of the proof
of Theorem 1 in Thorisson (1996).
The epsilon-coupling material in Sections 6-9 is mostly from Thorisson
(1994b). Theorem 7.2 is from Thorisson (1997a), and Theorem 7.3 is from
Asmussen (1992).
Chapter 6 MARKOV PROCESSES
The exact coupling equivalences have been around for awhile. For instance,
the equivalence of mixing and triviality in Theorem 2.1 is Theorem 4.1
in Orey (1971). The whole set of exact coupling equivalences follows
basically from the maximal exact couplings in Griffeath (1978), Goldstein
(1979), and Berbee (1979). The shift-coupling equivalences have also been
around for awhile; the whole set of them follows basically from Berbee
(1979), Aldous and Thorisson (1993), and Thorisson (1994b). The epsilon-
coupling material follows basically from Thorisson (1994b). The smooth tail
σ-algebra is introduced there, and smooth space-time harmonic functions
in Thorisson (1997b).
The total variation limit claim in Theorem 4.1(b) was established for
aperiodic positive recurrent Harris chains in Orey (1959), and the extension
to the null recurrent case in Jamison and Orey (1967). The limit claim
(7.3) was established for null recurrent Harris chains in Jain (1966); the
alternative formulation (7.2) is from Thorisson (1995b).
Chapter 7 TRANSFORMATION COUPLING
The group material in Section 7 is from Thorisson (1996). The
generalization to semigroups is from Thorisson (2000). Georgii (1997) uses the term
'orbit coupling' for transformation coupling. He considers a semigroup
acting measurably on a standard space and assumes that either the semigroup
is countable normal, or a compact metric group, or composed of finitely
many such building blocks. A permutation coupling (as in Section 8.2) is
in Aldous and Pitman (1979).
Further References on Coupling
For coupling and chains with infinite connections, see Harris (1955); see
also Berbee (1987b). For applications in interacting particle systems, see
Liggett (1985). Decoupling is used to analyze problems involving
dependent random variables as if they were (conditionally) independent; see de
la Pena and Gine (1999). For the use of coupling in improving the efficiency
of simulations, see Glynn and Wong (1996), and Glynn, Iglehart and Wong
(1999). For shift-coupling and convergence rates in simulation, see Roberts
and Rosenthal (1997). For coupling from-the-past, see the notes on
perfect simulation below. For card shuffling couplings, see Aldous and
Diaconis (1987) and Pemantle (1989). For coupling of recursive sequences, see
Borovkov and Foss (1994). For coupling in branching, see Jagers (1997). For
some aspects of coupling, see Schweizer and Sklar (1983), Scarsini (1989),
and Cuesta and Matran (1994). The TES process approach to modelling
and forecasting yields a self-coupling of the histogram of the empirical
time series, see Melamed (1993). For application of coupling to the
estimation of the spectral gap, see Chen (1998). For coupling and harmonic
functions on manifolds, see Cranston (1991, 1993). For coupling and the
strong law of large numbers for a Brownian polymer, see Cranston and
Mountford (1996). Cranston and Wang (1999) establishes the equivalence
of coupling and shift-coupling for a certain class of Markov processes. A
surprising behaviour of random walks on groups is established by coupling
in Lyons, Pemantle, and Peres (1996). For coupling of Markov and Gibbs
fields, see Häggström (1998) and Georgii, Häggström, and Maes (1999). For
the canonical coupling of percolation processes, see Häggström and Peres
(1999) and Häggström, Peres, and Schonmann (1999). In Evans, Kenyon,
Peres, and Schulman (1999) a noisy network is coupled to a simpler network
to obtain sharp bounds in a reconstruction problem that arose
independently in computer science, mathematical biology, and statistical physics.
In Dembo, Peres, Rosen, and Zeitouni (1999) coupling of random walks
and Brownian motion is used to prove a conjecture of Erdős and
Taylor from 1960 on simple random walk. Coupling is used in the textbooks
Grimmett and Stirzaker (1992) and Durrett (1991). For applications of
coupling in various fields, see the collection of articles edited by Kalashnikov
and Thorisson (1994). Lindvall (1992b) has many topics and references not
mentioned here; see also his papers in the list of references. The wealth of
topics in the monograph by Berbee (1979) never ceases to surprise.
Chapter 8 STATIONARITY, THE PALM DUALITIES
Palm theory dates back to the early forties. It was pioneered by the Swedish
engineer Conny Palm, who worked at the Royal Institute of Technology and
at the Swedish Telephone Company.
The point-at-zero duality in Theorem 4.1 is a process version of the
standard Palm duality; see Palm (1943); Kinchine (1955); Ryll-Nardzewski
(1961); Neveu (1976); Kallenberg (1986); Matthes, Kerstan, and Mecke
(1978); Franken, König, Arndt, and Schmidt (1981); Rolski (1981); Daley
and Vere-Jones (1988); Brandt, Franken and Lisek (1990); Baccelli and
Brémaud (1994); and Brandt and Last (1995). This duality is usually obtained
in one step through a single formula (see Section 4.7). The present two-step
approach (Theorem 4.1) goes back to Thorisson (1981); see also Thorisson
(1992, 1995a). For a similar two-step approach in a canonical point process
setting, see Nieuwenhuis (1989a, 1989b, 1994). Cycle-stationary processes
are also called 'synchronous'. The length-biasing is known as the inspection
(or waiting time) paradox. The uniform position of the origin in its interval
under stationarity (Theorem 3.1) has been known [for an early reference,
see McFadden (1962)], but has not been highlighted. Here it is (together
with the length-biasing) the basic fact on which the theory is built.
The randomized-origin duality in Theorem 8.1 is not as well known;
see Nawrotzki (1978); Glynn and Sigman (1992); and Nieuwenhuis (1994).
The path of the author from the point-at-zero duality to the randomized-
origin duality went through work on shift-coupling. The resulting two-step
approach (Theorem 8.1), and the Cesaro limit result (Theorem 9.1) and
shift-coupling result (Theorem 9.2), are presented in Thorisson (1995a).
The Cesaro limit result is from Glynn and Sigman (1992).
In the past there has been some confusion regarding these two dualities
and their interpretations. It seems that the distinction between them was
first noted by Nawrotzki (1978). When Palm introduced his theory in 1943
he was aiming at the randomized-origin interpretation, see his Chapter 2.
However, the point-at-zero interpretation became dominant, which is
natural in a way since it is the correct interpretation of the standard Palm
duality. But most work in this field has been done assuming ergodicity, in
which case the two dualities coincide (see Section 10.1). For more discussion
on history and applications, see Sigman (1995), where the shift-coupling
result is used regularly throughout the text.
This chapter (Chapter 8) is based on Thorisson (1995a), except the
perfect simulation, which is from Asmussen, Glynn, and Thorisson (1992).
Acceptance-rejection dates back to von Neumann (1951).
Chapter 9 PALM DUALITIES IN HIGHER DIMENSIONS
The material in this chapter is from Thorisson (1999). For more
information on Voronoi cells, see Okabe, Boots, and Sugihara (1992). For the
point-at-zero Palm version of a stationary point process when d > 1, see
Matthes, Kerstan, and Mecke (1978), Daley and Vere-Jones (1988), and
Stoyan, Kendall, and Mecke (1987). I am not aware of any references on the
randomized-origin Palm version for d > 1. Neither am I aware of references
on full-fledged Palm dualities when d > 1. The one-to-one correspondence
between a stationary point process and its Palm version is a rather
unsatisfactory duality because the Palm version is derived from the stationary
point process, that is, the Palm version does not have a defining property
without reference to its stationary dual. Here (Theorems 7.1 and 8.1) we
present Palm dualities between two classes of processes of equal status:
stationary and point-stationary.
The only references that the author has found where the point-stationarity
problem is mentioned are Mandelbrot (1983) and a paper by Kagan and
Vere-Jones (1988). The following is taken from that paper:
Mandelbrot (1983) [suggests] that point process models for self-similar
behaviour should be sought not within the class of homogeneous
processes [called stationary in this book] but within the class of
processes for which the behaviour relative to a given point of the process
is independent of the point selected as origin. ... What hampers
us in pursuing this discussion, is that we are not aware of a well
established theory for such Palm-stationary point processes [called
point-stationary in this book].
Chapter 9 presents such a theory. Kagan and Vere-Jones (1988) continue:
Further examples of Palm-stationary processes suggested by
Mandelbrot, such as the Levy dust model and zeros of Brownian motion,
have a very complex point set structure, including finite
accumulation points, and cannot be modelled within the standard point process
framework.
The proposed definition at the end of Chapter 9 suggests a solution to this
general point-stationarity problem.
Chapter 10 (Sections 2-4) REGENERATION
Regenerative processes (called classical regenerative here) were introduced
in Smith (1955). The i.i.d. cycle characterization is from Smith (1958). He
uses the term 'tour' for cycle.
Regeneration in the wide sense was also introduced in Smith (1955); he
uses the term 'equilibrium process'. Classical regeneration became heavily
used, but wide-sense regeneration passed more or less unnoticed, to be
rediscovered only at the end of the seventies independently by Asmussen
and the author. Thorisson (1981, 1983) uses the term 'time-homogeneous
regeneration' [treating wide-sense regeneration as a special case of
time-inhomogeneous wide-sense regeneration, which is called simply 'regeneration'
in Thorisson (1981, 1983)]. Asmussen (1987) uses the term 'regeneration'
for wide-sense regeneration. Wide-sense regeneration turned out to be the
appropriate characterization of the regeneration that Nummelin (1978) and
Athreya and Ney (1978) had found in Harris chains. The clumsy term
'wide-sense regenerative' is simply the phrase that is commonly used to
indicate this type of regeneration. The excellent term 'lag-l regenerative'
was suggested to me by Peter Glynn.
The related phenomenon of a renovating event was discovered in the
multi-server queue by Akhmarov and Leont'eva (1976), and the general
concept for recursive sequences was formulated in Borovkov (1978). See
also Borovkov (1984), Borovkov and Foss (1992, 1994), and Baccelli and
Bremaud (1994).
A stationary version of a classical regenerative process with finite-mean
cycle-lengths was constructed in Miller (1972); his method is basically the
simulation algorithm in Section 6.7 of Chapter 8. Here we use the length-
biasing and uniform-shifting from Thorisson (1981): Theorems 2.1, 2.2, 3.1,
and 4.1 are Propositions 2.1 and 3.1 in that work. The key coupling results,
Theorems 3.2, 4.2 and 5.2, are from Thorisson (1981, 1983). Theorems
2.3, 2.4, 3.4, and 4.4 have been around for years scattered throughout the
literature; the limit parts often established by an application of the key
renewal theorems. The result (3.27) on smooth asymptotic stationarity in
Theorems 3.3 and 4.3 was established in Glynn and Iglehart (1989). The
result on Cesaro asymptotic stationarity in Section 2.7 was established in
Glynn and Sigman (1992). The material in Section 4.4 is from Sigman,
Thorisson, and Wolff (1994).
Theorem 4.6 is an elaboration on the approach in Nummelin (1978)
and Athreya and Ney (1978). These papers considered discrete-time Harris
chains with l = 1 and established classical regeneration. The extension to
l ≥ 2 is not trivial, and was not well understood for some time. However,
the one-dependence of the cycles is in Glynn (1982), called one-dependent
regeneration; and the wide-sense regeneration is in Asmussen (1987), the
lag-l regeneration is also noted there; see also Asmussen and Thorisson
(1987) and Sigman (1990).
Glynn (1982) considers simulation of Harris chains. This work also uses
the one-dependence of the cycles to establish the strong law of large
numbers and the central limit theorem in the Harris chain context. Glynn (1994)
extends these ideas to continuous-time Harris processes, and Andradottir,
Calvin, and Glynn (1995) shows how to use the splitting idea to increase
the frequency of regeneration points in a given simulated realization.
The fact that the renovation in multi-server queues is actually lag-l
regeneration for the continuous-time process (Section 4.6) was realized in a
discussion with Søren Asmussen, Serguei Foss, and Vladimir Kalashnikov at
the Second Symposium on Queueing Theory and Related Topics in Poland,
January 1990. This resulted in two papers, Foss and Kalashnikov (1991),
and Asmussen and Foss (1993).
Asmussen (1987) treats regenerative processes and Harris chains,
highlighting them as basic mathematical tools in applied probability. See Sig-
man and Wolff (1993) for a review of regenerative processes with many
references. See Kalashnikov (1994) for topics in regenerative processes. See
Nummelin (1984) and Meyn and Tweedie (1993) for more aspects of
general Markov chain theory. For other regenerative phenomena such as
regenerative random sets, see Kingman (1972) and Kallenberg (1997) and
the references therein. These phenomena are more general than classical
regeneration, and orthogonal to wide-sense regeneration and renovation.
However, all these concepts are still time-homogeneous.
Chapter 10 (Sections 5-10) REGENERATION
Time-inhomogeneous regeneration was introduced in Thorisson (1981, 1983)
and the material in Sections 5-7 is (mostly) from there. The coupling
approach (Section 6) and the study of the coupling time (Section 7) were
inspired by the treatment of renewal processes in Ney (1981) and Lind-
vall (1982). The function class A is in Stone and Wainger (1967) and was
considered in Ney (1981). The concave functions and the Orlicz space
functions (growing slower than some power function) are generalizations of the
power functions considered in Lindvall (1982). The convergence results in
Section 7 were applied in queueing theory in Thorisson (1985a, b, c).
Conditions related to the stochastic domination condition (5.8) can be found
in Alsmeyer (1991, 1994a, 1995).
The asymptotics from-the-past in Section 8 [and the material in Sections
5.7 and 5.8] are from Thorisson (1988a); see also Thorisson (1985d, 1986a,
1990). Coupling from-the-past was introduced in these papers (under the
name 'backward-successful couplings') to establish the existence of a two-
sided (possibly nonstationary) limit process coming in from-the-past (called
'backward limit'). The idea of asymptotics from-the-past seemed to be the
natural way to obtain limit results in a time-inhomogeneous context.
Asymptotics from-the-past date back to (surprise?) Kolmogorov (1936).
Kolmogorov studied finite-state time-inhomogeneous Markov chains with
transition matrices $p^{(m,n)}$ from time m to time n and showed (by simply
sending m backward to minus infinity along a subsequence) that there is a
probability vector solution $(\pi^{(n)};\ -\infty < n < \infty)$ to the equations
$$\pi^{(n)} = \pi^{(m)} p^{(m,n)}, \qquad -\infty < m < n < \infty;$$
thus there always exists a two-sided version of a time-inhomogeneous finite-
state Markov chain. Kolmogorov's results were elaborated on by Black-
well (1945); see also Cohn (1974, 1982), Seneta (1981), Sonin (1987), and
Brandt, Lisek, and Nerman (1990). Asymptotics from-the-past showed up
in a queueing context in Loynes (1962). Coupling from-the-past can be
found in the renovation literature; see Borovkov and Foss (1994). The
beautiful coupling from-the-past simulation algorithm (Section 11) was invented
by Propp and Wilson (1996).
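Kolmogorov's equations can be checked numerically in a simple case. The following is a minimal sketch under assumptions that are not in the text: a two-state chain whose one-step matrices (the hypothetical `step_matrix`) have all entries bounded away from 0 and 1, so that the influence of the starting vector dies out geometrically and the backward limit exists without passing to a subsequence. The names `push` and `backward_limit` and the truncation depth of 200 steps are likewise choices made for this illustration.

```python
import math


def step_matrix(k):
    """One-step transition matrix from time k to time k + 1 (hypothetical example)."""
    a = 0.3 + 0.2 * math.sin(k)  # probability of moving from state 0 to state 1
    b = 0.4 + 0.2 * math.cos(k)  # probability of moving from state 1 to state 0
    return [[1.0 - a, a], [b, 1.0 - b]]


def push(vec, m, n):
    """Return the row vector vec * p(m, n), i.e. propagate vec from time m to time n."""
    for k in range(m, n):
        P = step_matrix(k)
        vec = [vec[0] * P[0][0] + vec[1] * P[1][0],
               vec[0] * P[0][1] + vec[1] * P[1][1]]
    return vec


def backward_limit(n, depth=200):
    """Approximate pi(n) by starting far in the past, at time n - depth."""
    return push([1.0, 0.0], n - depth, n)


# Compare pi(n) with pi(m) * p(m, n) for m = -3 and n = 5.
pi_m = backward_limit(-3)
pi_n = backward_limit(5)
gap = max(abs(x - y) for x, y in zip(push(pi_m, -3, 5), pi_n))
```

In this sketch `gap` is zero up to floating-point error, and replacing the starting vector `[1.0, 0.0]` by any other probability vector gives the same `pi_n` to the same accuracy. In general only subsequential limits are available, as in Kolmogorov's argument.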
Taboo regeneration (Section 9) and taboo stationarity (Section 10) are
from Glynn and Thorisson (1999b), where renewal theory is used to
derive the limit results; see also Glynn and Thorisson (1999a) on Markov
processes. The approach in this book (based on time-inhomogeneous
regeneration and asymptotics from-the-past) has not been published
elsewhere. Taboo stationarity generalizes so-called quasi-stationarity of Markov
chains. This topic was introduced in the sixties; see Seneta and Vere-
Jones (1966); Tweedie (1974); Arjas and Nummelin (1976); Nummelin
and Tweedie (1978); Nummelin (1984); and Ferrari, Kesten, and Martinez
(1996). The excellent descriptive term 'taboo' is in Chung (1967).
Chapter 10 (Section 11) PERFECT SIMULATION
The now accepted term 'perfect simulation' was coined by Kendall (1998);
'exact sampling' is also used. The earliest positive result on perfect
simulation seems to be the method presented in Chapter 8 (Section 6) and
Chapter 9 (Section 7), which is from Asmussen, Glynn, and Thorisson (1992)
[see also the preliminary abstract Thorisson (1987b)]. This paper
considered the general question of when perfect simulation (called stationarity
detection) is possible and when not. It also presents a perfect simulation
algorithm for finite-state Markov chains; but this algorithm was hampered
by tremendous use of computer time.
The powerful idea of using coupling from-the-past for perfect
simulation was introduced in the highly influential paper by Propp and
Wilson (1996). The concluding example in Section 11.6 was communicated
by Serguei Foss; see Foss and Tweedie (1999). A related domination idea
was presented in Kendall (1998) in the context of spatial point processes.
For more of the subsequent work, see Murdoch and Green (1998),
Haggstrom, Lieshout, and Møller (1999), Haggstrom and Nelander (1999),
Wilson (1998, 1999), and Fill (1998). For further references, see David Wilson's
homepage (dimacs.rutgers.edu/~dbwilson/exact).
The example in the second paragraph of the introduction to Section 5
was suggested to me by Peter Glynn as a situation where coupling from-
the-past might be useful for simulation in a time-inhomogeneous context.
* * *
The poem at the front of the book is from Solarljod, the ethereal Sun
Poems of 13th century Iceland. It has been set to music by Jon Nordal in
Ottusongvar a vori (Matins in Spring, 1993). The following is an English
interpretation by Alan Boucher:
From the South I saw
the Sun-Hart step;
leading him, two together;
light his hooves
on the hills below,
his horns reached high to the heavens.
The words at the end of Chapter 2 are the opening lines of Strawberry
Fields Forever by the Beatles (1967).
The words at the end of Chapter 10 are the opening lines of I Talk
to the Wind from the album IN THE COURT OF THE CRIMSON KING, AN
OBSERVATION BY KING CRIMSON (1969). They are followed by:
I talk to the wind
My words are all carried away
I talk to the wind
The wind does not hear
The wind cannot hear.
References
ACCARDI, L.
[1984] Some trends and problems in quantum probability. Springer
Lecture Notes in Mathematics 1055, 1-19.
[1995] Can mathematics help solving the interpretational problems of
quantum theory? Il Nuovo Cimento 110 B, 685-721.
[1998] On the EPR paradox and the Bell inequality. Preprint 352,
Centro V. Volterra, Università degli Studi di Roma Tor Vergata.
AKHMAROV, I. and LEONT'EVA, N.
[1976] Conditions for the convergence to the limit laws and law of large
numbers for queueing systems. Theo. Probab. Appl. 21, 559-570.
ALDOUS, D.
[1983] Random walks on finite groups and rapidly mixing Markov chains.
Springer Lecture Notes in Mathematics 986.
ALDOUS, D. and DIACONIS, P.
[1986] Shuffling cards and stopping times. Amer. Math. Monthly 93,
333-348.
[1987] Strong uniform times and finite random walks. Adv. Appl. Math.
8, 69-97.
ALDOUS, D. and PITMAN, J.W.
[1979] On the zero-one law for exchangeable events. Ann. Probab. 7,
704-723.
ALDOUS, D. and THORISSON, H.
[1993] Shift-coupling. Stoch. Proc. Appl. 44, 1-14.
ALSMEYER, G.
[1991] Random walks with stochastically bounded increments:
Foundations and characterization results. Result. der Math. 19, 22-45.
[1994a] Random walks with stochastically bounded increments: Renewal
theory via Fourier Analysis. Yokohama Math. J. 42, 1-21.
[1994b] On the Markov renewal theorem. Stoch. Proc. Appl. 50, 37-56.
[1995] Random walks with stochastically bounded increments: Renewal
theory. Math. Nachr. 175, 13-31.
[1997] The Markov renewal theorem and related results. Markov Proc.
Rel. Fields 3, 103-127.
ANDRADOTTIR, S., CALVIN, J.M., and GLYNN, P.W.
[1995] Accelerated regeneration for Markov chain simulations. Probab.
Eng. Inf. Sc. 9, 497-523.
ARJAS, E. and NUMMELIN, E.
[1976] A direct construction of the R-invariant measure for a Markov
chain on a general state space. Ann. Probab. 4, 674-679.
ARJAS, E., NUMMELIN, E. and TWEEDIE, R.L.
[1978] Uniform limit theorems for non-singular renewal and Markov
renewal processes. J. Appl. Probab. 15, 112-125.
ASH, R.B.
[1972] Real Analysis and Probability. Academic Press, New York.
ASMUSSEN, S.
[1987] Applied Probability and Queues. Wiley, New York.
[1992] On coupling and weak convergence to stationarity. Ann. Appl.
Probab. 2, 739-751.
ASMUSSEN, S. and FOSS, S.G.
[1993] Renovation, regeneration, and coupling in multiple-server queues
in continuous time. Frontiers in Pure and Appl. Probab. 1, 1-6.
ASMUSSEN, S., GLYNN, P.W. and THORISSON, H.
[1992] Stationarity detection in the initial transient problem. ACM
Trans. Modelling Comput. Simulation 2, 130-157.
ASMUSSEN, S. and THORISSON, H.
[1987] A Markov chain approach to periodic queues. J. Appl. Probab.
24, 215-225.
ASPECT, A., DALIBARD, J., and ROGER, G.
[1982] Experimental test of Bell's inequalities using time-varying
analyzers. Physical Review Letters 49, 1804-1807.
ATHREYA, K.B. and NEY, P.
[1978] A new approach to the limit theory of recurrent Markov chains.
Trans. Amer. Math. Soc. 245, 493-501.
ATHREYA, K.B., MCDONALD, D., and NEY, P.
[1978] Coupling and the renewal theorem. Amer. Math. Monthly 85,
809-814.
BACCELLI, F. and BREMAUD, P.
[1994] Elements of Queueing Theory. Springer, Berlin.
BARBOUR, A., HOLST, L., and JANSON, S.
[1992] Poisson Approximation. Oxford University Press, Oxford.
BARBOUR, A., LINDVALL, T., and ROGERS, L.C.G.
[1991] Stochastic ordering of order statistics. J. Appl. Prob. 28, 278-286.
BELL, J.S.
[1964] On the Einstein Podolsky Rosen Paradox. Physics 1:3, 195-200.
[1966] On the problem of hidden variables in quantum mechanics.
Reviews of Modern Physics 38:3, 447-453.
BERBEE, H.C.P.
[1979] Random walks with stationary increments and renewal theory.
Math. Centre Tract 112 (Mathematisch Centrum, Amsterdam).
[1986] Periodicity and absolute regularity. Israel J. Math. 55, 289-304.
[1987a] Convergence rates in the strong law for bounded mixing
sequences. Probab. Th. Rel. Fields 74, 255-270.
[1987b] Chains with infinite connections: uniqueness and Markov
representation. Probab. Th. Rel. Fields 76, 243-253.
BILLINGSLEY, P.
[1971] Weak Convergence of Measures: Applications in Probability.
SIAM, Philadelphia.
[1986] Probability and Measure. 2nd ed. Wiley, New York.
BLACKWELL, D.
[1945] Finite non-homogeneous chains. Ann. Math. 46, 594-599.
[1948] A renewal theorem. Duke Math. J. 15, 145-150.
[1953] Extension of a renewal theorem. Pacific J. Math. 3, 315-320.
BOGACHEV, V.
[1998] Measures on topological spaces. J. Math. Sci. (New York) 91,
3033-3156.
BOROVKOV, A.A.
[1978] Theorems of ergodicity and stability for one class of stochastic
equations. Theo. Probab. Appl. 23, 241-262.
[1984] Asymptotic Methods in Queueing Theory. Wiley, New York.
BOROVKOV, A.A. and FOSS, S.G.
[1992] Stochastically recursive sequences and their generalizations.
Siberian Advances in Mathematics 2, 16-81.
[1994] Two ergodicity criteria for stochastically recursive sequences.
Acta Applicandae Mathematicae 34, 125-134.
BOURBAKI, N.
[1948] Elements de Mathematiques. Topologie Generale. Chapitre IX.
Hermann, Paris.
[1951] Elements de Mathematiques. Topologie Generale. Chapitre III-IV.
Hermann, Paris.
BRANDT, A., FRANKEN, P. and LISEK, B.
[1990] Stationary Stochastic Models. Wiley.
BRANDT, A. and LAST, G.
[1995] Marked Point Processes on the Real Line. Springer, Berlin.
BRANDT, A., LISEK, B., and NERMAN, O.
[1990] On stationary Markov chains and independent random variables.
Stoch. Proc. Appl. 34, 19-24.
BREIMAN, L.
[1969] Probability and Stochastic Processes. Houghton Mifflin, Boston.
CHEN, M.F.
[1997] Trilogy of couplings - new variational formula of spectral gap.
Probability Towards 2000, Springer Lecture Notes in Statistics
128, 123-136.
CHEN, M.F. and LI, S.F.
[1989] Coupling methods for multidimensional diffusion processes. Ann.
Probab. 17, 151-177.
CHUNG, K.L.
[1967] Markov Chains with Stationary Transition Probabilities. 2nd ed.
Springer, Berlin.
CINLAR, E.
[1975] Introduction to Stochastic Processes. Prentice-Hall, Englewood
Cliffs, New Jersey.
COHN, H.
[1974] On the tail events of a Markov chain. Z. Wahrscheinlichkeitsth.
29, 65-72.
[1982] On a class of non-homogeneous Markov chains. Math. Proc.
Camb. Phil. Soc. 92, 527-534.
CRANSTON, M.
[1991] Gradient estimates on manifolds using coupling. J. Functional
Analysis 99, 110-124.
[1993] A probabilistic approach to Martin boundaries for manifolds
with ends. Prob. Theo. Rel. Fields 96, 319-334.
CRANSTON, M. and MOUNTFORD, T.S.
[1996] The Strong Law of Large Numbers for a Brownian polymer. Ann.
Probab. 24, 1300-1323.
CRANSTON, M. and WANG, F.
[1999] Equivalence of coupling and shift coupling. (Submitted).
CUESTA, J.A. and MATRAN, C.
[1994] Stochastic convergence through Skorohod representation
theorems and Wasserstein distances. First Intern. Conf. on Stoch.
Geometry, Convex Bodies and Empirical Measures. Suppl. Ren-
diconti del Circolo Matematico Palermo, Serie II, 35, 89-113.
DALEY, D.J. and VERE-JONES, D.
[1988] An Introduction to the Theory of Point Processes. Springer, New
York.
DE LA PENA, V.H. and GINE, E.
[1999] Decoupling. Springer, New York.
DEMBO, A., PERES, Y., ROSEN, J., and ZEITOUNI, O.
[1999] Thick points for planar Brownian motion and the Erdos-Taylor
conjecture on random walk. (Preprint)
DIACONIS, P.
[1988] Group Representation in Probability and Statistics. IMS Lecture
Notes - Monograph Series 11, IMS, Hayward.
DOEBLIN, W.
[1938] Expose de la theorie des chaines simple constantes de Markov a
un nombre fini d'etats. Rev. Math. Union Interbalkan. 2, 77-105.
DOOB, J.L.
[1948] Renewal theory from the point of view of the theory of
probability. Trans. Amer. Math. Soc. 63, 422-438.
DUDLEY, R.M.
[1968] Distances of probability measures and random variables. Ann.
Math. Statist. 39, 1563-1572.
DURRETT, R.
[1991] Probability: Theory and Examples. Brooks/Cole, Pacific Grove,
California.
EINSTEIN, A., PODOLSKY, B., and ROSEN, N.
[1935] Can quantum-mechanical description of physical reality be
considered complete? Physical Review 47, 777-780.
ERHARDSSON, T.
[1999] Compound Poisson approximation for Markov chains using
Stein's method. Ann. Probab. 27, 565-596.
ETHIER, S.N. and KURTZ, T.G.
[1986] Markov Processes. Wiley, New York.
EVANS, W., KENYON, C., PERES, Y., and SCHULMAN, L.
[1999] Broadcasting on trees and the Ising model. Ann. Appl. Probab.
(to appear).
FELLER, W.
[1971] An Introduction to Probability Theory and Its Applications,
Vol. 2. 2nd ed. Wiley, New York.
FERRARI, P., KESTEN, H., and MARTINEZ, S.
[1996] R-positivity, quasi-stationary distributions, and ratio limit
theorems for a class of probabilistic automata. Ann. Appl. Probab.
6, 577-616.
FILL, J.A.
[1998] An interruptible algorithm for perfect sampling via Markov
chains. Ann. Appl. Probab. 8, 131-162.
FILL, J.A. and MACHIDA, M.
[1999] Stochastic monotonicity and realizable monotonicity. Ann. Appl.
Probab. (to appear).
FOSS, S.G. and KALASHNIKOV, V.
[1991] Regeneration and renovation in queues. Queueing Systems
Theory Appl. 8, 211-224.
FOSS, S.G. and TWEEDIE, R.
[1999] Unifying approaches to backward coupling. (In preparation).
FRANKEN, P., KONIG, D., ARNDT, U. and SCHMIDT, V.
[1981] Queues and Point Processes. Akademie-Verlag.
GARSIA, A.M.
[1973] On a convex function inequality for martingales. Ann. Probab.
1, 171-174.
GEORGII, H.-O.
[1997] Orbit coupling. Ann. Inst. H. Poincare, Prob. et Statist. 33,
253-268.
GEORGII, H.-O., HAGGSTROM, O., and MAES, C.
[1999] The random geometry of equilibrium phases. Phase Transitions
and Critical Phenomena (C. Domb and J.L. Lebowitz, editors),
Academic Press, London (to appear).
GILL, R.D.
[1998] Critique of 'Elements of Quantum Probability'. Quantum
Probability Communications 10, 351-361.
[1999] Quantum probability and the impossible coupling (in preparation).
www.math.uu.nl/people/gill/Preprints/impossible.ps.gz
GLYNN, P.W.
[1982] Simulation Output Analysis for General State Space Markov
Chains. Dissertation. Department of Operations Research,
Stanford University.
[1994] Some topics in regenerative steady-state simulation. Acta Appli-
candae Mathematicae 34, 225-236.
GLYNN, P.W. and IGLEHART, D.L.
[1989] Smoothed limit theorems for equilibrium processes. Probability,
Statistics and Mathematics. Papers in Honor of S. Karlin.
Academic Press, New York, 89-102.
GLYNN, P.W., IGLEHART, D. L., and WONG, E.W.
[1999] Transient simulation via empirically based coupling. Probab. Eng.
Inf. Sc. 13, 147-167.
GLYNN, P.W. and SIGMAN, K.
[1992] Uniform Cesaro limit theorems for synchronous processes with
applications to queues. Stoch. Proc. Appl. 40, 29-43.
GLYNN, P.W. and THORISSON, H.
[1999a] Two-sided taboo limits for Markov processes and associated
perfect simulation. (Submitted)
[1999b] Taboo stationarity and limit theory for taboo regenerative
processes. (Preprint)
GLYNN, P.W. and WONG, E.W.
[1996] Efficient simulation via coupling. Probab. Eng. Inf. Sc. 10, 165-
186.
GOLDSTEIN, S.
[1979] Maximal coupling. Z. Wahrscheinlichkeitsth. 46, 193-204.
GREVEN, A.
[1987] Coupling of Markov chains and randomized stopping times. Part
I and II. Probab. Th. Rel. Fields 75, 195-212 and 431-458.
GRIFFEATH, D.
[1975] A maximal coupling for Markov chains. Z. Wahrscheinlichkeit-
sth. 31, 95-106.
[1978] Coupling methods for Markov processes. Studies in Probability
and Ergodic Theory. Advances in Mathematics. Supplementary
Studies 2.
GRIMMETT, G.R. and STIRZAKER, D.R.
[1992] Probability and Random Processes. 2nd ed. Oxford University
Press, Oxford.
HAGGSTROM, O.
[1998] Random-cluster representations in the study of phase
transitions. Markov Proc. Rel. Fields 4, 275-321.
HAGGSTROM, O., VAN LIESHOUT, M.N.M., and MØLLER, J.
[1999] Characterization results and Markov chain Monte Carlo
algorithms including exact simulation for some spatial point
processes. Bernoulli 5, 641-658.
HAGGSTROM, O. and NELANDER, K.
[1999] On exact simulation of Markov random fields using coupling from
the past. Scand. J. Stat. 26, 395-411.
HAGGSTROM, O. and PERES, Y.
[1999] Monotonicity of uniqueness for percolation on Cayley graphs:
all infinite clusters are born simultaneously. Probab. Theo. Rel.
Fields 113, 273-285.
HAGGSTROM, O., PERES, Y., and SCHONMANN, R.
[1999] Percolation on transitive graphs as a coalescent process:
relentless merging followed by simultaneous uniqueness. Perplexing
Probability Problems: Papers in Honor of Harry Kesten, Birk-
hauser, 69-90.
HALMOS, P.R.
[1950] Measure Theory. Van Nostrand, New York.
HARISON, V. and SMIRNOV, S.N.
[1990] Jonction maximale en distribution dans le cas markovien. Probab.
Th. Rel. Fields 84, 491-503.
HARRIS, T.E.
[1955] On chains of infinite order. Pacific J. Math. 5, 713-724.
HODGES, J.L. and ROSENBLATT, M.
[1953] Recurrence-time moments in random walks. Pacific J. Math. 3,
127-136.
HOEL, P.G., PORT, S.C., and STONE, C.J.
[1972] Introduction to Stochastic Processes. Houghton Mifflin, Boston.
JAGERS, P.
[1997] Coupling and Population Dependence in Branching Processes.
Ann. Appl. Probab. 7, 281-298.
JAIN, N.C.
[1966] Some limit theorems for the general Markov Process. Z. Wahr-
scheinlichkeitsth. verw. Geb. 6, 206-223.
JAMISON, B. and OREY, S.
[1967] Markov chains recurrent in the sense of Harris. Z. Wahrschein-
lichkeitsth. verw. Geb. 8, 41-48.
KAGAN, Y.Y. and VERE-JONES, D.
[1988] Statistical Models of Earthquake Occurrence. Springer Lecture
Notes in Statistics 114, 398-425.
KALASHNIKOV, V.
[1977] A uniform estimate of the rate of convergence in the discrete
time renewal theorem. Theo. Probab. Appl. 22, 399-403.
[1980] Estimation of duration of transition regime for complex stochastic
systems. Trans. Seminar, VNIISI, Moscow, 63-71 (in Russian).
[1994] Topics on Regenerative Processes. CRC Press, Boca Raton.
KALASHNIKOV, V. and THORISSON, H. (editors)
[1994] Applications of Coupling and Regeneration. Acta Applicandae
Mathematicae 34 (Special Issue).
KALLENBERG, O.
[1986] Random Measures. 4th ed. Akademie-Verlag and Academic Press,
Berlin and London.
[1988] Spreading and predictable sampling in exchangeable sequences
and processes. Ann. Probab. 16, 508-534.
[1997] Foundations of Modern Probability. Springer, New York.
KAMAE, T., KRENGEL, U., and O'BRIEN, G.L.
[1977] Stochastic inequalities on partially ordered spaces. Ann. Probab.
5, 899-912.
KARLIN, S. and TAYLOR, H.M.
[1975] A First Course in Stochastic Processes. 2nd ed. Academic Press,
New York.
KENDALL, W.S.
[1998] Perfect simulation for the area-interaction point process.
Probability Towards 2000, Springer Lecture Notes in Statistics 128,
218-234.
KINCHINE, Y.A.
[1960] Mathematical Methods in the Theory of Queues. Griffin, London.
(Russian ed. 1955.)
KINGMAN, J.F.C.
[1972] Regenerative Phenomena. Wiley, New York.
KOLMOGOROV, A.
[1936] Zur Theorie der Markoffschen Ketten. Math. Ann. 112, 155-160.
KRASNOSELSKII, M.A. and RUTICKII, Ya.B.
[1961] Convex Functions and Orlicz Spaces. Noordhoff, Groningen.
KUMMERER, B. and MAASSEN, H.
[1998] Elements of quantum probability. Quantum Probability
Communications 10, 73-100.
LEHMANN, E.L.
[1959] Testing Statistical Hypotheses. Wiley, New York.
LIGGETT, T.M.
[1985] Interacting Particle Systems. Springer, New York.
LINDVALL, T.
[1977] A probabilistic proof of Blackwell's renewal theorem. Ann. Prob.
5, 57-70.
[1979a] On coupling of discrete renewal processes. Z. Wahrscheinlichkeit-
sth. 48, 57-70.
[1979b] A note on coupling of birth and death processes. J. Appl. Probab.
16, 505-512.
[1982] On coupling of continuous-time renewal processes. J. Appl. Prob.
19, 82-89.
[1983] On coupling of diffusion processes. J. Appl. Probab. 20, 82-93.
[1986] On coupling of renewal processes with use of failure rates. Stoch.
Proc. Appl. 22, 1-15.
[1988] Ergodicity and inequalities in a class of point processes. Stoch.
Proc. Appl. 30, 121-131.
[1991] W. Doeblin 1915-1940. Ann. Probab. 19, 929-934.
[1992a] A simple coupling of renewal processes. Adv. Appl. Probab. 24,
1010-1011.
[1992b] Lectures on the Coupling Method. Wiley, New York.
[1997] Stochastic monotonicities in Jackson queueing networks. Probab.
Eng. Inf. Sc. 11, 1-9.
[1999] On Strassen's theorem on stochastic domination. Elect. Comm.
in Probab. 4, 51-59.
[2000] On simulation of stochastically ordered life length variables. Prob.
Eng. Inf. Sc. (to appear).
LINDVALL, T. and ROGERS, L.C.G.
[1986] Coupling of multidimensional diffusions by reflection. Ann. Prob.
14, 860-872.
[1996] On coupling of random walks and renewal processes. J. Appl.
Probab. 33, 122-126.
LOYNES, R.M.
[1962] The stability of a queue with non-independent inter-arrival and
service time. Proc. Camb. Phil. Soc. 58, 494-520.
LYONS, R., PEMANTLE, R., and PERES, Y.
[1996] Random walks on the lamplighter group. Ann. Probab. 24, 1993-
2006.
MANDELBROT, B.B.
[1983] The Fractal Geometry of Nature. W. H. Freeman, San Francisco.
MATTHES, K., KERSTAN, J., and MECKE, J.
[1978] Infinitely Divisible Point Processes. Wiley, Chichester.
MAUDLIN, T.
[1994] Quantum Non-locality and Relativity. Blackwell, Oxford.
MCFADDEN, J.A.
[1962] On the lengths of intervals in a stationary point process. Journ.
Royal Stat. Soc. Ser. B 24, 364-382.
MELAMED, B.
[1993] An Overview of TES Processes and Modeling Methodology.
Performance Evaluation of Computer and Communications
Systems, Springer Lecture Notes in Computer Science, 359-393.
MERMIN, N.D.
[1985] Is the moon there when nobody looks? Reality and quantum
theory. Physics Today, 38-47.
MEYN, S.P. and TWEEDIE, R.L.
[1993] Markov Chains and Stochastic Stability. Springer, New York.
MILLER, D.R.
[1972] Existence of limits in regenerative processes. Ann. Math. Statist.
43, 1275-1282.
MONTGOMERY, D. and ZIPPIN, L.
[1955] Topological Transformation Groups. Wiley (Interscience), New
York.
MURDOCH, D.J. and GREEN, P.J.
[1998] Exact sampling from a continuous state space. Scand. J. Stat.
25, 483-502.
NAWROTZKI, K.
[1978] Einige Bemerkung zur Verwendung der Palmschen Verteilung in
der Bedienungstheorie. Mathematische Operationsforschung und
Statistik, Series Optimization 9, 241-253.
NEVEU, J.
[1975] Discrete-Parameter Martingales. North-Holland, Amsterdam.
[1976] Processus ponctuels. Springer Lecture Notes in Mathematics
598, 249-445.
NEY, P.
[1981] A refinement of the coupling method in renewal theory. Stoch.
Proc. Appl. 11, 11-26.
NIEMI, S. and NUMMELIN, E.
[1986] On non-singular renewal kernels with an application to a
semigroup of transition kernels. Stoch. Proc. Appl. 22, 177-202.
NIEUWENHUIS, G.
[1989a] Asymptotics for Point Processes and General Linear Processes.
Thesis. Catholic University of Nijmegen.
[1989b] Equivalence of functional limit theorems for stationary point
processes and their Palm distributions. Probab. Th. Rel. Fields 81,
593-608.
[1994] Bridging the gap between a stationary point process and its Palm
distribution. Statistica Neerlandica 48, 37-62.
NUMMELIN, E.
[1978] A splitting technique for Harris recurrent Markov chains. Z.
Wahrscheinlichkeitsth. 43, 309-318.
[1984] General Irreducible Markov Chains and Non-Negative Operators.
Cambridge University Press, Cambridge.
NUMMELIN, E. and TWEEDIE, R.L.
[1978] Geometric ergodicity and R-positivity for general Markov chains.
Ann. Probab. 6, 404-420.
OKABE, A., BOOTS, B., and SUGIHARA, K.
[1992] Spatial Tessellations - Concepts and Applications of Voronoi
Diagrams. Wiley, New York.
OREY, S.
[1959] Recurrent Markov Chains. Pacific J. Math. 9, 805-827.
[1971] Limit Theorems for Markov Chain Transition Probabilities. Van
Nostrand, London.
ORNSTEIN, D.
[1969] Random walks I, II. Trans. Am. Math. Soc. 138, 1-42 and 45-60.
PALM, C.
[1943] Intensitätsschwankungen im Fernsprechverkehr. Ericsson
Technics 44, 1-189. (English translation: (1988), Intensity Variations
in Telephone Traffic. North-Holland Studies in
Telecommunication 10. Elsevier.)
PEMANTLE, R.
[1989] Randomization time for the overhand shuffle. Journ. Theo. Prob.
2, No. 1.
PITMAN, J.W.
[1974] Uniform rates of convergence for Markov chain transition
probabilities. Z. Wahrscheinlichkeitsth. 29, 193-227.
[1976] On coupling of Markov chains. Z. Wahrscheinlichkeitsth. 35,
315-322.
PITOWSKI, I.
[1989] Quantum probability, quantum logic. Springer Lecture Notes in
Physics.
PROPP, J.G. and WILSON, D.B.
[1996] Exact sampling with coupled Markov chains and applications
to statistical mechanics. Random Structures and Algorithms 9,
223-252.
ROBERTS, G.O. and ROSENTHAL, J.S.
[1997] Shift-coupling and convergence rates of ergodic averages. Stoch.
Models 13, 147-165.
ROLSKI, T.
[1981] Stationary Random Processes Associated with Point Processes.
Lecture Notes in Statistics 5. Springer.
RYLL-NARDZEWSKI, C.
[1961] Remarks on processes of calls. Proc. 4th Berkeley Symp. Math.
Stat. Probab. 2, 455-465.
SCARSINI, M.
[1989] Copulae of probability measures on product spaces. Journal of
Multivariate Analysis 31, 201-219.
SCHWEIZER, B. and SKLAR, A.
[1983] Probabilistic Metric Spaces. Elsevier, New York.
SENETA, E.
[1981] Non-Negative Matrices and Markov Chains. Springer, New York.
SENETA, E. and VERE-JONES, D.
[1966] On quasi-stationary distributions in discrete-time Markov chains
with a denumerable infinity of states. J. Appl. Prob. 3, 403-434.
SHURENKOV, V.M.
[1984] On the theory of Markov renewal. Theo. Prob. Appl. 29, 247-265.
SIGMAN, K.
[1990] One-dependent regenerative processes and queues in continuous
time. Math. Oper. Res. 15, 175-189.
[1995] Stationary Marked Point Processes: An Intuitive Approach.
Chapman and Hall, New York.
SIGMAN, K., THORISSON, H., and WOLFF, R.W.
[1994] A note on the existence of regeneration times. J. Appl. Probab.
31, 1116-1122.
SIGMAN, K. and WOLFF, R.W.
[1993] A review of regenerative processes. SIAM Review 35, 269-288.
SILVESTROV, D.S.
[1979] The method of a single probability space in renewal theory. Theo.
Probab. Appl. 24, 655-656.
[1994] Coupling for Markov renewal processes and the rate of
convergence in ergodic theorems for processes with semi-Markov
switchings. Acta Appl. Math. 34, 109-124.
SKOROHOD, A.V.
[1956] Limit theorems for stochastic processes. Theo. Probab. Appl. 1,
261-290.
[1965] Studies in the Theory of Random Processes. Addison-Wesley,
Reading, Mass.
SMITH, W.L.
[1954] Asymptotic renewal theorems. Proc. Roy. Soc. Edinburgh Ser. A
64, 9-48.
[1955] Regenerative stochastic processes. Proc. Roy. Soc. London Ser.
A 232, 6-31.
[1958] Renewal theory and its ramifications. J. Roy. Statist. Soc. Ser.
B 20, 243-302.
SONIN, I.M.
[1987] A theorem on separation of jets and some properties of random
sequences. Stochastics 21, 231-250.
STONE, C.R.
[1966] On absolutely continuous components and renewal theory. Ann.
Math. Statist. 37, 271-275.
STONE, C.R. and WAINGER, S.
[1967] One-sided error estimates in renewal theory. Journal d'Analyse
Mathématique XX, 325-352.
STOYAN, D., KENDALL, W.S., and MECKE, J.
[1987] Stochastic Geometry and its Applications. Wiley, New York.
STRASSEN, V.
[1965] The existence of probability measures with given marginals. Ann.
Math. Statist. 36, 423-439.
SVERCHKOV, M.Yu. and SMIRNOV, S.N.
[1990] Maximal coupling of D-valued processes. Soviet Math. Dokl. 41,
352-354.
TACKLIND, S.
[1945] Fourieranalytische Behandlung vom Erneuerungsproblem.
Skandinavisk Aktuarietidskrift 28, 68-105.
TEMPEL'MAN, A.A.
[1972] Ergodic theorems for general dynamical systems. Trudy Moskov
Mat. Obsc. 26, 95-132. [Translation in Trans. Moscow Math.
Soc. 26, 94-132.]
[1992] Ergodic Theorems for Group Actions. Kluwer, Dordrecht.
THORISSON, H.
[1981] The Coupling of Regenerative Processes. Thesis. Department of
Mathematics, University of Goteborg.
[1983] The coupling of regenerative processes. Adv. Appl. Probab. 15,
531-561.
[1985a] The queue GI/G/1: Finite moments of the cycle variables and
uniform rates of convergence. Stoch. Proc. Appl. 19, 85-99.
[1985b] The queue GI/GI/k: Finite moments of the cycle variables and
uniform rates of convergence. Stoch. Models 1, 221-238.
[1985c] On regenerative and ergodic properties of the k-server queue with
nonstationary Poisson arrivals. J. Appl. Probab. 22, 893-902.
[1985d] Backward limits of non-time-homogeneous Markov transition
probabilities. Stoch. Proc. Appl. 19, 20
[1986a] On non-time-homogeneity. Semi-Markov Models: Theory and
Applications, Plenum Press, New York and London, 351-368.
[1986b] On maximal and distributional coupling. Ann. Probab. 14, 873-
876.
[1987a] A complete coupling proof of Blackwell's renewal theorem. Stoch.
Proc. Appl. 26, 87-97.
[1987b] Construction of a stationary regenerative process with a view
towards simulation. Stoch. Proc. Appl. 26, 191.
[1988a] Backward limits. Ann. Probab. 16, 914-924.
[1988b] Future independent times and Markov chains. Probab. Theo. Rel.
Fields 78, 143-148.
[1990] Backward limits and inhomogeneous regeneration. Probability
Theory and Mathematical Statistics. Proceedings of the 5th
International Vilnius Conference, 474-481.
[1992] Construction of a stationary regenerative process. Stoch. Proc.
Appl. 42, 237-253.
[1994a] Coupling and convergence of random elements, processes and
regenerative processes. Acta Appl. Math. 34, 85-107.
[1994b] Shift-coupling in continuous time. Prob. Theo. Rel. Fields 99,
477-483.
[1995a] On time- and cycle-stationarity. Stoch. Proc. Appl. 55, 183-209.
[1995b] Coupling methods in probability theory. Scand. J. Stat. 22, 159-
182.
[1996] Transforming random elements and shifting random fields. Ann.
Probab. 24, 2057-2064.
[1997a] On ε-coupling and piecewise constant processes. Stoch. Models
13, 27-38.
[1997b] Markov processes and coupling. Theo. Stoch. Proc. 3, 424-438.
[1998] Coupling. Probability Towards 2000, Springer Lecture Notes in
Statistics 128, 319-339.
[1999] Point-stationarity in d dimensions and Palm theory. Bernoulli
5, 797-831.
[2000] Transformation coupling. To appear in Proceedings of the
International Conference on Stochastic Processes and Their
Applications, (A. Krishnamoorthy and P.V. Ushakumari, editors).
Springer.
TWEEDIE, R.L.
[1974] Quasi-stationary distributions for Markov chains on a general
state space. J. Appl. Probab. 11, 726-741.
VON NEUMANN, J.
[1951] Various techniques in connection with random digits. NBS Appl.
Math. Ser. 12, 36-38.
WILSON, D.B.
[1998] Annotated bibliography of perfectly random sampling with
Markov chains. Microsurveys in Discrete Probability, DIMACS
Series in Discrete Mathematics and Theoretical Computer
Science 41, 209-220. American Mathematical Society. Updated
versions to appear at dimacs.rutgers.edu/~dbwilson/exact.
[1999] How to couple from the past using a read-once source of
randomness. Preprint.
Index
A-coupling event, 149
A-identical, 149
ε-coupling, 178, see Epsilon-couplings
distributional, 179
inequality, 181
maximality, 187
nondistributional, 179
successful, 72
σ-algebra
exchangeable, 243
generated by, 78, 79
induced by, 78, 79
invariant, 161, 174, 221, 232
post-t, 153
remote, 247
smooth tail, 161, 188
tail, 156, 161, 246
trivial, 196
σ-finite, 214
Absolutely continuous, 82
Age process, 69, 339
Almost sure (a.s.), 81, 87
Aperiodic, 42, 47, 344
strongly, 47
Bell inequality, 27, 479
Biasing
exponential-biasing, 442
length-biasing, 70, 250, 259,
284, 341, 345
length-debiasing, 250, 259, 284
volume-biasing, 310, 323
volume-debiasing, 316
Birth and death process, 33
Borel
equivalence, 88
equivalent, isomorphic, 88
set, 78
Brownian motion, 97, 128, 242,
336
Can be close to, 57
Canonical, 78
Cartesian product, 79
Cemetery, 132
Censoring state, 132
Classical coupling, 34, 385, 399
Common probability space, 81
Complete metric space, 86
Component, 94
common component, 104
Condition in, 88
Conditional
-ly i.i.d., 91
distribution, 87
expectation, 87
independence, 87, 90, 91
probability, 87
Conditional distribution, 87
regular, 88
version of, 87
weak-sense-regular, 135
Convergence, 35
Cesaro total variation, 161,
167, 288, 331, 345
dominated, 26
from-the-past, 338, 422
geometric rate, 46
in distribution, 23, 118, 184,
185, 269
in the path space, 183-185
in the state space, 183-185,
195
plain total variation, 143, 266,
351, 361, 378
pointwise, 21, 118
smooth total variation, 161,
182, 267, 354, 362
taboo limit, 436
time-average total variation,
167
to-the-future, 338, 422
uniform, 46, 146, 168, 183,
401
weakly, 118
Convergence rate, 409
exponential order p, 144
geometric order p, 144
moment-order φ, 145
order φ, 144
power moment-order α − 1,
168
power order α, 144, 168
uniform, 406
Copy, 1, 78
Coupling, 1, 79, 125, 161, see ε-coupling,
see Epsilon-couplings, see Exact coupling,
see Shift-coupling, see Transformation coupling
canonical version, 80
classical, 34, 338
decoupling, 484
distributional, 140, 142
epoch, 34, 137
event, 7, 106
event inequality, 7, 9, 14, 112,
141
from-the-past, 338, 428
i.i.d., 2
impossible, 27, 479
independence, 2, 79
index, 113, 115, 116
index inequality, 113, 115, 116
indicator, 106
maximal, 7, 9, 10, 106, 141
maximal w.r.t. A, 151
Ornstein, 48
permutation, 243
quantile, 3
radius inequality, 247
regenerative processes, 349, 376
remote, 247
rotation, 243
self-, 2
site inequality, 246
strong stationary time, 481
successful, 35
time, 34, 137
time inequality, 35
Cycle, 53, 252, 340, 438
one-dependent cycles, 366
Cycle-length, 252, 340, 438
Cycle-stationary, 249, 250, 254, 259,
284, 295, 341
Defined on, 78
Delay, 53, 340, 438
Delay time, 62
Delay-length, 341
Deleting
a null event, 82
an inner null set, 82, 84
Depends on ... only through, 91
Diagonal, 44
Distribution, 78
initial, 34
joint, 80
marginal, 79
stationary, 37
Distributionally unique, 118
Domination
in distribution, 4
pointwise, 4
stochastic, 4, 146, 168, 183,
401
Drift, 67
EPR, 27, 479
Epsilon-couplings, 125, 161, 178
maximally successful, 192
regeneration, 354, 362
success probability, 178
successful, 178
Event, 81
coupling, 7
coupling event, 106
null, 81
shift-coupling, 219
Exact coupling, 125, 137, 161
coupling epoch, 137
coupling time, 137
coupling time inequality, 35
distributional, 125, 137
maximal, 147
maximal at time t, 146
maximally successful, 157
random fields, 245
regeneration, 351, 361
success probability, 137
successful, 137
time inequality, 143
time-inhomogeneous reg., 378
Exact transformation coupling, 244
Exchangeability, 243
Expectation, mean, 86
conditional, 87
Extension, 80
conditioning, 89
consistency, 85
independence, 85
product space, 83
proper, 83
reduction, 82, 84
splitting, 94
transfer, 92, 135
Følner averaging sets, 220, 227,
240
Future-independent time, 481
Generalized inverse, 3
Greatest common divisor, 42
Harmonic function, 195
harmonic, 207
smooth space-time, 209
smoothed version, 209
space-time harmonic, 205
Harris chain, 364
i.i.d. coupling, 2
Independence coupling, 2, 79
Independent stationary background,
302, 305
Induced by, 78
Initial position, 47
Inspection paradox, 70
Invariant σ-algebra, 161, 174, 221,
232, 281, 320
Ionescu Tulcea theorem, 89
Irreducible, 33, 39
Ising model, 470
Jordan Hahn decomposition, 112
Kolmogorov extension theorem, 86
Lattice, 57, 343, 351, 361
Lebesgue interval, 122
Lorentz transformations, 244
Markov chain
continuous-time, 39
coupling from-the-past, 467
discrete-time, 42
regular, 45
simulation, 467
time-homogeneous, 39
Markov jump process, 39
Markov process, 201
conditional stationary, 214
initial distribution, 202
regeneration set, 364, 365
space-time process, 375
stationary distribution, 213
stationary measure, 213
strong, 44, 365
taboo regenerative, 440
time-homogeneous, 201
transition probabilities, 201
uniform nullity, 214
version, 202
Markov property, 39
strong, 44, 364
Maximal
coupling, 7, 9, 10
coupling event, 7, 9
Measure dependent, 82
Measure-free, 251
Mixing, 195
Cesaro mixing, 199
mixing, 196, 351, 361, 378
smoothly mixing, 200, 354, 362
Moment
α-moment, 45, 167
φ-moment, 144
classical coupling time, 406,
409
Mutually singular, 107
Nonexplosion, 33
Nonlattice, 57, 354, 362
strongly, 61
Nonlocality, 27, 479
Ornstein coupling, 48
Outcome, 81
Palm theory, 250
associated Z and S, 253
coincidence of duals, 290, 335
cycle-stationary, 254, 259, 284
duals, 263
ergodic case, 291
inversion formula, 264
length-biasing, 250, 284
length-debiasing, 250, 284
Palm characterization, 308, 316
point process, 293, 296
point-at-zero duality, 250, 259,
323
point-stationary, 295, 299, 304,
305, 323
point/site, 297
point/time, 251
process and points, 251
randomization, 317
randomized-origin duality, 250,
284
regenerative, 262
sequence space (L, ℒ), 251
stationary, 254, 259, 284, 295,
298, 304, 323, 337
version, 263
volume-biasing, 310, 323
volume-debiasing, 316, 323
Voronoi cells, 308
Part, 94
Poincare transformations, 244
Point of increase, 57
Point-map, 300
extended, 303
Point-shift, 283, 300
bijective, 302, 304
extended, 304
Point-stationary, 295, 299, 304, 305,
323
Palm characterization, 308
randomization, 317
Poisson process, 299, 376
Polish space, 85
Post-t σ-algebra, 153
Power set, 88
Probability
conditional, 87
kernel, 88
mass function, 7
Product σ-algebra, 79
Product probability space, 85
Product space, 79
Projection mapping, 79
Quantile
coupling, 3
function, 3
Quantum physics, 27, 479
Quasi-stationary, 467
Queue GI/GI/1, 347
Queue GI/GI/k, 369, 376
Queue M/GI/k, 376
Random
point, 251, 297
site, 218, 297
time, 78, 251
transformation, 223
variable, 78
Random element, 78
conditional, 89
external, 81
independent, 85
original, 81
transferred, 92
Random field, 218
exact coupling, 245
random site, 218
rotation invariant, 244
shift-coupling, 218, 219
shift-map, 218
site set, 218
Random walk, 47
ladder heights, 360
Randomized stopping time, 349
Recurrence time, 62, 347
Recurrent, 35
null, 37
positive, 37
Regenerative process, see Taboo
regenerative, see Time-
inhomogeneous
regenerative
classical, 337, 346
discrete time, 344
epsilon-couplings, 354, 362
equilibrium process, 486
exact coupling, 349, 351, 361,
376
inter-regeneration times, 347
lag-l, 358
lag-l+, 359
regeneration times, 346, 358
renovating event, 487
sequence space, 339
stationary, 348
two-sided, 262
version, 348, 359
wide-sense, 337, 358
zero-delayed, 341, 348
Relative age process, 69, 339
Relative position U, 252, 253
Relativity, 244
Renewal process, 62, 347
epsilon-couplings, 64
exact coupling, 103, 419
intensity measure, 104
taboo, 442
Renewal theorem
Blackwell's, 64
total variation on [0, h], 103
total variation on [0, ∞), 419
Representation, 1, 78
splitting, 11
Residual life process, 69, 339
Scheffe's theorem, 111
Self-coupling, 2
Self-similarity, 242
Separable metric space, 86
Set
inner probability one, 82
outer null, 81
outer probability one, 82
Shift-coupling, 57, 125, 161, 162,
345
distributional, 163
events, 219
in Palm theory, 289, 330
inequality, 165, 220
maximality, 170
maximally successful, 175
nondistributional, 163
random fields, 218, 219
success probability, 162
successful, 162
times, 162
Shift-map, 130, 218, 251, 340
joint, 340
Signed measure, 109
Simulation, 272, 326
acceptance-rejection, 272,326
birth and death process, 477
coupling from-the-past, 338,
467
imperfect, 279, 326
initial transient problem, 272,
326
perfect, 272, 326, 338, 467
perfection probability, 279
quasi-stationarity, 467
S-s-inventory system, 276
stationarity detection, 489
Site-shift, 298, 303
Smooth tail σ-algebra, 161, 188
Span of a lattice, 74, 343
Splitting, 99, 365
element, 95
extension, 94
indicator, 94
representation, 11, 94
variable, 95
Spread out, 98, 351
Standard space, 88
Starts anew, 44
State space, 33
Stationary, 70, 72, 249, 250, 254,
259, 284, 295, 298, 304,
323, 337, 341, 348, see
Cycle-stationary, see Point-
stationary, see Taboo
stationary
asymptotically, 37, 383
conditional, 214
distribution, 37, 213
intensity, 255, 317
measure, 213
nonstationary, 429
Palm characterization, 316
periodically, 343
simulation, 467
stochastic process, 143
vector, 36
Stein's method, 479
Step-lengths, 47
Stochastic process, 33, 126
birth, 132
canonically jointly measurable,
130
cemetery, 132
censoring state, 132
Cesaro mixing, 199
distribution, 127, 128
exchangeable, 243
index set, 126
jointly measurable, 129
killing, 132
mixing, 196
path process, 359
path set, 128
path space, 128
paths, 127
real valued, 127
self-similar, 242
shift-map, 130
shift-measurable, 130, 252
smoothly mixing, 200
space-time process, 163
standard settings, 128
state space, 126
stationary, 143
trivial, 196
Strassen's theorem, 479
Strong stationary time, 481
Strong uniform time, 481
Successful coupling, 35
Supported by, 78
Taboo regenerative, 338, 436, 438
acceptance-rejection, 468
lag-l, 439
regeneration times, 438, 439
simulation, 468
taboo limits, 338
taboo region, 338
taboo stationary, 338, 454
taboo time, 438, 439
time-inhomogeneous reg., 441
under taboo, 439
version, 439
wide-sense, 439
zero-delayed, 439
Taboo stationary, 338, 451, 452
characterization, 453
coupling from-the-past, 467
periodically, 466
simulation, 467
taboo regenerative, 454
taboo time, 452
under taboo, 451
version, 459, 461
Tail σ-algebra, 156, 161, 246
Time
coupling, 34
parameter, 33
random, 78
randomized stopping, 349
shift-coupling times, 162
sojourn, 33
stopping, 44, 349
Time-inhomogeneous regenerative,
337, 373
classical coupling, 338, 385,
399
exact coupling, 378
lag-l, 373
lag-l+, 373
nonstationary, 338
of type p(·|·), 373
recurrence distribution Fs, 374
regeneration times, 373
taboo renewal process, 442
time-homogeneous, 374, 383
up to time zero, 432, 433, 441
version, 373
wide-sense, 373
Topological transformation group,
238
amenable, 240
Haar measure, 238
locally compact, 238
second countable, 238
Total life process, 69, 339
Total variation, 12, 109
measure, 112
Trace, 82, 128
Transfer, 92, 100, 120, 135
Transformation coupling, 223, 239
coupling transformations, 223
distributional, 223, 239
exact, 244
inequality, 226, 240
maximality, 228
maximally successful, 232
nondistributional, 224, 239
success probability, 223
successful, 223, 240
Transformation semigroup, 222
acts jointly measurably, 222
Følner averaging sets, 227
group, 222
invariant measure, 225
inverse-measurable group, 223
jointly measurable, 222
measurable, 222
random transformation, 223
Transient, 40
Transition
jointly measurable, 203
matrix, 34
probabilities, 201
semigroup, 34
Triviality, 195, 196
Unconscious probabilist, 85
Uniformly integrable, 381
Version, 63, 202
differently started, 34, 202
independent, 34
Voronoi cells, 308
Waiting time paradox, 70
Wiener process, 97, 128, 242
Zero-delayed, 63, 341
Notation
$\mathbb{R} = (-\infty, \infty)$ and $\mathbb{R}_+ = [0, \infty)$
$\mathbb{Z} = \{\dots, -1, 0, 1, \dots\}$ and $\mathbb{Z}_+ = \{0, 1, \dots\}$
$X \overset{D}{=} Y$ = X and Y have the same distribution
$a := b$ = a is defined to be b
$f \equiv g$ = for all x it holds that $f(x) = g(x)$
$1_A$ = the indicator function of the set A
$\#A$ = the number of elements in the set A
$a \vee b$, $a \wedge b$ = maximum and minimum of a and b
$a^+$, $a^-$ = $a \vee 0$ and $-(a \wedge 0)$
$[x] = \sup\{n \in \mathbb{Z} : n \le x\}$ = integer part of x
$x \bmod h = x - [x/h]h$ = the remainder when x is divided by h
$f(x-) = \lim_{y \uparrow x} f(y)$ = left-hand limit
$\mu \wedge \lambda$ = the greatest common component of the measures $\mu$ and $\lambda$
$(\mu - \lambda)^+$, $(\mu - \lambda)^-$ = positive and negative parts of $\mu - \lambda$
$\mu \perp \lambda$ = the measures $\mu$ and $\lambda$ are mutually singular
$\mu|_{\mathcal{A}}$ = restriction of the measure $\mu$ to the sub-σ-algebra $\mathcal{A}$
$\overset{D}{\to}$ = convergence in distribution
$\overset{tv}{\to}$ = convergence in total variation
$\| \cdot \|$ = the total variation norm
dot-convention: $f(\cdot) = f$ = the function with value $f(x)$ at x, for all x
$\sigma(\cdot)$ = the σ-algebra generated by $\cdot$
i.i.d. = independent and identically distributed