IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 56, NO. 9, SEPTEMBER 2010
Thinning, Entropy, and the Law of Thin Numbers
Peter Harremoës, Member, IEEE, Oliver Johnson, and Ioannis Kontoyiannis, Senior Member, IEEE
Abstract—Rényi's thinning operation on a discrete random variable is a natural discrete analog of the scaling operation for continuous random variables. The properties of thinning are investigated in an information-theoretic context, especially in connection with information-theoretic inequalities related to Poisson approximation results. The classical Binomial-to-Poisson convergence (sometimes referred to as the "law of small numbers") is seen to be a special case of a thinning limit theorem for convolutions of discrete distributions. A rate of convergence is provided for this limit, and nonasymptotic bounds are also established. This development parallels, in part, the development of Gaussian inequalities leading to the information-theoretic version of the central limit theorem. In particular, a "thinning Markov chain" is introduced, and it is shown to play a role analogous to that of the Ornstein-Uhlenbeck process in connection with the entropy power inequality.

Index Terms—Binomial distribution, compound Poisson distribution, entropy, information divergence, law of small numbers, law of thin numbers, Poisson distribution, Poisson-Charlier polynomials.

I. INTRODUCTION

APPROXIMATING the distribution of a sum of weakly dependent discrete random variables by a Poisson distribution is an important and well-studied problem in probability; see [2] and the references therein for an extensive account. Strong connections between these results and information-theoretic techniques were established in [18], [28]. In particular, for the special case of approximating a binomial distribution by a Poisson, some of the sharpest results to date are established using a combination of the techniques of [18], [28] and Pinsker's inequality [10], [13], [22]. Earlier work on information-theoretic bounds for Poisson approximation is reported in [42], [25], [34].

The thinning operation, which we define next, was introduced by Rényi in [35], who used it to provide an alternative characterization of Poisson measures.

Definition 1.1: Given α ∈ [0,1] and a discrete random variable X with distribution P on Z₊ = {0, 1, 2, …}, the α-thinning of P is the distribution T_α(P) of the sum
$$\sum_{i=1}^{X} B_i \tag{1}$$
where the random variables B₁, B₂, … are independent and identically distributed (i.i.d.), each with a Bernoulli distribution with parameter α, denoted Bern(α), and also independent of X. [As usual, we take the empty sum to be equal to zero.] An explicit representation of T_α(P) is
$$T_\alpha(P)(k) = \sum_{n \ge k} P(n) \binom{n}{k} \alpha^k (1-\alpha)^{n-k}, \qquad k = 0, 1, 2, \ldots \tag{2}$$
When it causes no ambiguity, the thinned distribution T_α(P) is written simply as T_α P.

For any random variable X with distribution P on Z₊, we write P^{*n} for the n-fold convolution of P with itself, i.e., the distribution of the sum of n i.i.d. copies of X. For example, if P is the Bern(p) distribution, then P^{*n} is the binomial distribution Bin(n, p) with parameters n and p. It is easy to see that its (1/n)-thinning is the Bin(n, p/n) distribution; see Example 2.2. Therefore, the classical Binomial-to-Poisson convergence result—sometimes referred to as the "law of small numbers"—can be phrased as saying that, if P = Bern(p), then
$$T_{1/n}(P^{*n}) \to \mathrm{Po}(p), \qquad \text{as } n \to \infty \tag{3}$$
where Po(λ) denotes the Poisson distribution with parameter λ.

One of the main points of this paper is to show that this result holds for a very wide class of distributions P, and to provide conditions under which several stronger and more general versions of (3) can be obtained. We refer to results of the form (3) as laws of thin numbers.

Section II contains numerous examples that illustrate how particular families of random variables behave on thinning, and it also introduces some of the particular classes of random variables that will be considered in the rest of the paper. In Sections III and IV several versions of the law of thin numbers are formulated; first for i.i.d. random variables in Section III, and then for general classes of (not necessarily independent or identically distributed) random variables in Section IV. For example, in the simplest case where the summands are i.i.d. with distribution P and with mean λ, so that the distribution of their sum is P^{*n}, the strong version of the law of thin numbers (Theorem 3.3) states that
$$D\big(T_{1/n}(P^{*n}) \,\big\|\, \mathrm{Po}(\lambda)\big) \to 0, \qquad \text{as } n \to \infty \tag{4}$$
where, as usual, D(P‖Q) denotes the information divergence, or relative entropy, from P to Q.¹
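As a concrete illustration of Definition 1.1, here is a short Python sketch (not from the paper; the helper name `thin` and the truncation level are our own choices) that computes the α-thinned pmf directly from the explicit representation (2) and checks it against a Monte Carlo simulation of the random sum (1).

```python
import numpy as np
from math import comb

def thin(pmf, alpha):
    """alpha-thinning T_alpha(P) of a pmf given as an array pmf[n] = P(n), via (2)."""
    nmax = len(pmf) - 1
    return np.array([sum(pmf[n] * comb(n, k) * alpha**k * (1 - alpha)**(n - k)
                         for n in range(k, nmax + 1)) for k in range(nmax + 1)])

# Example: thin a Binomial(10, 0.3) pmf with alpha = 0.5
n, p, alpha = 10, 0.3, 0.5
pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)])
thinned = thin(pmf, alpha)

# Monte Carlo check of the defining sum (1): draw X ~ Bin(n, p), keep each unit w.p. alpha
rng = np.random.default_rng(0)
X = rng.binomial(n, p, size=200_000)
Y = rng.binomial(X, alpha)                  # sum of X i.i.d. Bern(alpha) variables
empirical = np.bincount(Y, minlength=n + 1) / len(Y)
print(np.max(np.abs(thinned - empirical)))  # small, typically of order 1e-3
```

In this example the thinned distribution is again binomial, Bin(10, 0.15), as anticipated by Example 2.2 below.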
Manuscript received June 03, 2009; revised April 22, 2010. Date of current version August 18, 2010. The work of P. Harremoës was supported by a grant from the Danish Natural Science Research Council and by the European Pascal Network of Excellence. The work of I. Kontoyiannis was supported in part by a Marie Curie International Outgoing Fellowship, PIOF-GA-2009-235837. The material in this paper was presented in part at the 2007 IEEE International Symposium on Information Theory, Nice, France, June 2007.
P. Harremoës is with Copenhagen Business College, 1175 Copenhagen, Denmark (e-mail: [email protected]).
O. Johnson is with the Department of Mathematics, University of Bristol, Bristol, BS8 1TW, United Kingdom (e-mail: [email protected]).
I. Kontoyiannis is with the Department of Informatics, Athens University of Economics and Business, Athens 10434, Greece (e-mail: [email protected]).
Communicated by E. Ordentlich, Associate Editor for Source Coding.
Digital Object Identifier 10.1109/TIT.2010.2053893
¹Throughout the paper, log denotes the natural logarithm to base e, and we adopt the usual convention that 0 log 0 = 0.
Note that, unlike most classical Poisson convergence results, the law of thin numbers in (4) proves a Poisson limit theorem for the sum of a single sequence of random variables, rather than for a triangular array.

It may be illuminating to compare the result (4) with the information-theoretic version of the central limit theorem (CLT); see, e.g., [3] and [23]. Suppose X₁, X₂, … are i.i.d. continuous random variables with density f, and with zero mean and unit variance. Then the density of their sum X₁ + X₂ + ⋯ + Xₙ is the n-fold convolution f^{*n} of f with itself. In the CLT regime the relevant operation is scaling: if a continuous random variable X has density f, the scaled random variable X/√n has a correspondingly scaled density, and, in particular, applying this scaling to f^{*n} gives the density of the standardized sum (X₁ + ⋯ + Xₙ)/√n. The information-theoretic CLT states that, if the summands have zero mean and finite (unit) variance, the divergence of this standardized density from the standard Normal density tends to zero. Note the close analogy between the statements of the law of thin numbers in (4) and the information-theoretic CLT: convolution followed by thinning plays the role that convolution followed by scaling plays in the Gaussian setting.

Before describing the rest of our results, we mention that there is a significant thread in the literature on thinning limit theorems and associated results for point processes. Convergence theorems of the "law of thin numbers" type, as in (3) and (4), were first examined in the context of queueing theory by Palm [33] and Khinchin [26], while more general results were established by Grigelionis [17]. See the discussion in the text [12, pp. 146–166] for details and historical remarks; also see the comments following Theorem 4.1 in Section IV. More specifically, this line of work considered asymptotic results, primarily in the sense of weak convergence, for the distribution of a superposition of the sample paths of independent (or appropriately weakly dependent) point processes. Here we take a different direction and, instead of considering the full infinite-dimensional distribution of a point process, we focus on finer results—e.g., convergence in information divergence and nonasymptotic bounds—for the one-dimensional distribution of the thinned sum of integer-valued random variables.

With these goals in mind, before examining the finite-n behavior of the thinned convolutions T_{1/n}(P^{*n}), in Section V we study a simpler but related problem, on the convergence of a continuous-time "thinning" Markov chain on Z₊, which acts as the discrete analog of the Ornstein-Uhlenbeck process. It is well known that this Markov chain has the Poisson law as its unique invariant measure (see, for example, [31] or [32]). In Theorem 5.1 we characterize precisely the rate at which it converges to the Poisson law in terms of the χ² distance, which leads to an upper bound on its convergence in information divergence. A new characterization of the Poisson distribution in terms of thinning is also obtained. The main technical tool used here is based on an examination of the L² properties of the Poisson-Charlier polynomials in the thinning context. In the present context, as described by Chafaï [7], this Markov chain plays a role parallel to that of the Ornstein-Uhlenbeck process in the context of Gaussian convergence and the entropy power inequality [37], [38], [29].

In Section VI we give both asymptotic and finite-n upper bounds on the rate of convergence for the law of thin numbers. Specifically, we employ the scaled Fisher information functional introduced in [28] to give precise, explicit bounds on the divergence D(T_{1/n}(P^{*n}) ‖ Po(λ)). An example of the type of result we prove is the following: Suppose X is an ultra bounded (see Section II, Definition 2.1) random variable, with distribution P, mean λ, and finite variance; then the divergence D(T_{1/n}(P^{*n}) ‖ Po(λ)) is asymptotically of order 1/n², for a nonzero constant we explicitly identify; cf. Corollary 6.1. Similarly, in Section VIII we give both finite-n and asymptotic bounds on the law of small numbers in terms of the total variation distance between T_{1/n}(P^{*n}) and the Po(λ) distribution; in particular, Theorem 8.1 states an explicit finite-n bound. [Corresponding lower bounds will be presented in the companion paper [21].]

A closer examination of the monotonicity properties of the scaled Fisher information in relation to the thinning operation is described in Section VII. Finally, Section IX shows how the idea of thinning can be extended to compound Poisson distributions. The Appendix contains the proofs of some of the more technical results.

Finally we mention that, after the announcement of (weaker and somewhat more specialized versions of) the present results in [20], Yu [41] also obtained some interesting, related results. In particular, he showed that the conditions of the strong and thermodynamic versions of the law of thin numbers (see Theorems 3.3 and 3.2) can be weakened, and he also provided conditions under which the convergence in these limit theorems is monotonic in n.

II. EXAMPLES OF THINNING AND DISTRIBUTION CLASSES

This section contains several examples of the thinning operation, statements of its more basic properties, and the definitions of some important classes of distributions that will play a central role in the rest of this paper. The proofs of all the lemmas and propositions of this section are given in the Appendix.

Note, first, two important properties of thinning that are immediate from its definition:
1. The thinning of a sum of independent random variables is the convolution of the corresponding thinnings.
2. Successive thinnings compose: for any α, β ∈ [0,1] and any distribution P on Z₊, T_α(T_β(P)) = T_{αβ}(P).
Example 2.1: Thinning preserves the Poisson family of laws, in that T_α(Po(λ)) = Po(αλ). This follows from (2), since
$$T_\alpha(\mathrm{Po}(\lambda))(k) = \sum_{n \ge k} e^{-\lambda}\frac{\lambda^n}{n!}\binom{n}{k}\alpha^k(1-\alpha)^{n-k} = e^{-\lambda}\frac{(\alpha\lambda)^k}{k!}\sum_{m \ge 0}\frac{\big((1-\alpha)\lambda\big)^m}{m!} = e^{-\alpha\lambda}\frac{(\alpha\lambda)^k}{k!}$$
where, as usual, Po(λ; k) = e^{−λ}λ^k/k!, for k ≥ 0, denotes the Poisson mass function.

As it turns out, the factorial moments of a thinned distribution are easier to work with than ordinary moments. Recall that the kth factorial moment of a random variable X is E[X^{(k)}], where x^{(k)} = x(x−1)⋯(x−k+1) denotes the kth falling factorial. The factorial moments of an α-thinning are easy to compute, as given by the following result.

Lemma 2.1: For any α ∈ [0,1] and any random variable X with distribution P on Z₊,
$$E\big[(T_\alpha X)^{(k)}\big] = \alpha^k\, E\big[X^{(k)}\big], \qquad k \ge 1$$
where T_α X denotes a random variable with distribution T_α(P).

That is, thinning scales factorial moments in the same way as ordinary multiplication scales ordinary moments.

We will use the following result, which is a multinomial version of Vandermonde's identity and is easily proved by induction. The details are omitted.

Lemma 2.2: The falling factorial satisfies the multinomial expansion; that is, for any positive integer k,
$$\Big(\sum_{i=1}^m x_i\Big)^{(k)} = \sum_{k_1 + \cdots + k_m = k} \binom{k}{k_1, \ldots, k_m}\, x_1^{(k_1)} \cdots x_m^{(k_m)}.$$
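The following short numerical sketch (not from the paper; the truncation level and the test distributions are our own choices) illustrates Example 2.1 and Lemma 2.1: thinning a Poisson pmf gives a Poisson pmf with scaled mean, and the falling-factorial moments of a thinned distribution are exactly α^r times those of the original.

```python
import numpy as np
from math import comb, exp, factorial

def thin(pmf, alpha):
    """alpha-thinning of a pmf (array indexed by k), via the representation (2)."""
    nmax = len(pmf) - 1
    return np.array([sum(pmf[n] * comb(n, k) * alpha**k * (1 - alpha)**(n - k)
                         for n in range(k, nmax + 1)) for k in range(nmax + 1)])

def poisson_pmf(lam, nmax):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(nmax + 1)])

def falling_factorial_moment(pmf, r):
    """E[X(X-1)...(X-r+1)] for a pmf supported on 0..len(pmf)-1."""
    return sum(p * np.prod([x - j for j in range(r)]) for x, p in enumerate(pmf))

lam, alpha, nmax = 4.0, 0.3, 60          # truncation level is an arbitrary choice
P = poisson_pmf(lam, nmax)

# Example 2.1: thinning preserves the Poisson family, T_alpha(Po(lam)) = Po(alpha*lam)
print(np.max(np.abs(thin(P, alpha) - poisson_pmf(alpha * lam, nmax))))   # ~1e-12 (truncation error only)

# Lemma 2.1: thinning scales factorial moments, E[(T_alpha X)^(r)] = alpha^r E[X^(r)]
Q = np.array([comb(8, k) * 0.4**k * 0.6**(8 - k) if k <= 8 else 0.0 for k in range(nmax + 1)])  # Bin(8, 0.4)
for r in range(1, 5):
    lhs = falling_factorial_moment(thin(Q, alpha), r)
    rhs = alpha**r * falling_factorial_moment(Q, r)
    print(r, lhs, rhs)                   # the two columns agree
```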
Proposition 2.1 below records a basic regularity property of the thinning operation; like the other results of this section, it is proved in the Appendix.

Example 2.2: Thinning preserves the class of Bernoulli sums. That is, the thinned version of the distribution of a finite sum of independent Bernoulli random variables (with possibly different parameters) is also such a sum. This follows from property 1 stated in the beginning of this section, combined with the observation that the α-thinned Bern(p) distribution is the Bern(αp) distribution. In particular, thinning preserves the binomial family of laws.

Example 2.3: Thinning by α transforms a geometric distribution with mean λ into a geometric distribution with mean αλ. Recalling that the sum of i.i.d. geometrics has a negative binomial distribution, in view of this example and property 1 stated in the beginning of this section, the thinning of a negative binomial distribution is also negative binomial.

Partly motivated by these examples, we describe certain classes of random variables (some of which are new). These appear as natural technical assumptions in the subsequent development of our results. The reader may prefer to skip the remainder of this section and only refer back to the definitions when necessary.

Definition 2.1:
1) A Bernoulli sum is a distribution that can be obtained from the sum of finitely many independent Bernoulli random variables with possibly different parameters; we also consider the class of Bernoulli sums with a given mean λ.
2) A distribution P on Z₊ is said to be ultra log-concave (ULC); cf. [24], if
$$k\,P(k)^2 \ge (k+1)\,P(k+1)\,P(k-1), \qquad k \ge 1.$$
We also consider the set of ultra log-concave distributions with a given mean λ.
3) The distribution of a random variable X will be said to be ultra bounded (UB) with ratio c if its factorial moments satisfy E[X^{(k+1)}] ≤ c·E[X^{(k)}] for all k ≥ 0.
4) The distribution of a random variable X will be said to be Poisson bounded (PB) with ratio c if E[X^{(k)}] ≤ c^k for all k ≥ 1.
5) A random variable will be said to be ULC, UB, or PB, if its distribution is ULC, UB, or PB, respectively.
First we mention some simple relationships between these classes. Walkup [39] showed that the convolution of two ultra log-concave distributions is again ultra log-concave. In [24] it was shown that a distribution is Poisson bounded if and only if its α-thinning is Poisson bounded, for some (equivalently, for every) α ∈ (0,1]; the same holds for ultra boundedness.

Proposition 2.2: In the notation of Definition 2.1, the class of ultra log-concave distributions with mean λ is contained in the class of ultra bounded distributions with ratio λ. That is, if the distribution of X is ULC with mean λ, then E[X^{(k+1)}] ≤ λ E[X^{(k)}] for all k ≥ 0.
The next result states that the PB and UB properties are preserved on summing and thinning.

Proposition 2.3: (a) If X and Y are independent with Poisson bounded distributions with ratios c and c′, then X + Y is Poisson bounded with ratio c + c′, and, for any α ∈ [0,1], T_α X is Poisson bounded with ratio αc. (b) The same statements hold with "Poisson bounded" replaced by "ultra bounded."

Formally, the above discussion can be summarized by the chain of inclusions between these classes—ULC distributions with mean λ are ultra bounded with ratio λ, which in turn are Poisson bounded with ratio λ—together with the closure properties of Proposition 2.3.

Finally, we note that each of these classes of distributions is "thinning-convex," i.e., if P and Q are elements of a given class, then so is T_α(P) * T_{1−α}(Q), for any α ∈ [0,1]. In particular, thinning maps each of these sets into itself, since δ₀, the point mass at zero, has all its factorial moments beyond the zeroth equal to zero and therefore belongs to each class, and T_α(P) = T_α(P) * T_{1−α}(δ₀).
III. LAWS OF THIN NUMBERS: THE I.I.D. CASE

In this section we state and prove three versions of the law of thin numbers, under appropriate conditions; recall the relevant discussion in the Introduction. Theorem 3.1 proves convergence in total variation, Theorem 3.2 in entropy, and Theorem 3.3 in information divergence.

Recall the definitions of the total variation distance ‖P − Q‖ between two discrete distributions P and Q, and of the entropy H(P) = −Σ_k P(k) log P(k) of a probability distribution P on Z₊. Recall also that the entropy of the Poisson distribution cannot be expressed in closed form, although useful bounds do exist; see, e.g., [1] and the references therein.

Theorem 3.1 (Weak Version): For any distribution P on Z₊ with mean λ,
$$\big\|T_{1/n}(P^{*n}) - \mathrm{Po}(\lambda)\big\| \to 0, \qquad \text{as } n \to \infty.$$

Proof: In view of Scheffé's lemma, pointwise convergence of discrete distributions is equivalent to convergence in total variation, so it suffices to show that, for each fixed k, T_{1/n}(P^{*n})(k) → Po(λ; k). Applying the representation (2) to the convolution P^{*n}, and using Jensen's inequality to obtain elementary bounds on the resulting terms, one finds for each fixed k a lower bound which, as n tends to infinity, converges to Po(λ; k) by monotone convergence; since the approximating quantities are probability mass functions and so is the Poisson law, the pointwise limit is necessarily Po(λ; k), completing the proof.

Theorem 3.2 (Thermodynamic Version): If P is Poisson bounded with mean λ, then H(T_{1/n}(P^{*n})) → H(Po(λ)) as n → ∞.

Proof: The distribution T_{1/n}(P^{*n}) converges pointwise to the Poisson distribution, so, by dominated convergence, it is sufficient to prove that the summands in the entropy are dominated by a summable function. This easily follows from the simple bound in the following lemma.

Lemma 3.1: Suppose P is Poisson bounded with ratio c. Then P(k) ≤ c^k/k! for all k ≥ 0, and the same bound holds for the thinned convolutions T_{1/n}(P^{*n}).

Proof: Note that, for all k, k!·P(k) ≤ E[X^{(k)}] ≤ c^k, so that, in particular, P(k) ≤ c^k/k!. By Proposition 2.3 the thinned convolutions T_{1/n}(P^{*n}) are Poisson bounded with the same ratio c, so the same bound applies.

From the proof of [24, Theorem 2.5] we know that, when P is ultra log-concave, the thinned convolution T_{1/n}(P^{*n}) is also ultra log-concave, and that the Poisson law maximizes entropy among ultra log-concave distributions with a given mean; so for such distributions the theorem states that the entropy converges to its maximum.
For ultra log-concave distributions the thermodynamic version also implies convergence in information divergence. This also holds for Poisson bounded distributions, which is easily proved using dominated convergence. As shown in the next theorem, convergence in information divergence can be established under quite general conditions.

Theorem 3.3 (Strong Version): For any distribution P on Z₊ with mean λ and D(P ‖ Po(λ)) < ∞,
$$D\big(T_{1/n}(P^{*n}) \,\big\|\, \mathrm{Po}(\lambda)\big) \to 0, \qquad \text{as } n \to \infty.$$

The proof of Theorem 3.3 is given in the Appendix; it is based on a straightforward but somewhat technical application of the following general bound.

Proposition 3.1: Let Y be a random variable with distribution P on Z₊ and with finite mean λ. Then, for all α ∈ (0,1), the divergence D(T_α(P) ‖ Po(αλ)) satisfies the explicit upper bound (10), where the right-hand side of (10) is always finite.

Proof: First note that, since Y has finite mean, its entropy is bounded by the entropy of a geometric with the same mean, which is finite, so H(P) is finite. Therefore, the divergence can be expanded as a sum of terms, which can be controlled by the Stirling bound on the factorials appearing in the Poisson mass function; since the mean is finite, the bound (11) implies that the whole expression is finite. [Recall the convention that 0 log 0 = 0.] Also note that the representation (2) expresses T_α(P) as a mixture of binomial distributions. Using this and the joint convexity of information divergence in its two arguments (see, e.g., [9, Th. 2.7.2]), the divergence of interest can be bounded by the corresponding mixture of divergences, as in (12). Since the Poisson measures form an exponential family, they satisfy a Pythagorean identity [11] which, together with the standard bound on the divergence between a binomial distribution and the Poisson distribution with the same mean (see, e.g., [22] or [28]), gives an explicit estimate for each term in the mixture, with the first term handled separately. Since the final bound clearly remains valid in all cases, substituting it into (12) gives (10).
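The following Python sketch (not from the paper; the choice P = Geometric with mean 1 and the truncation level N are ours) numerically illustrates the strong law of thin numbers of Theorem 3.3: the divergence D(T_{1/n}(P^{*n}) ‖ Po(λ)) is computed for increasing n and visibly decreases toward zero.

```python
import numpy as np
from math import lgamma, log, exp

def thin(pmf, alpha):
    """alpha-thinning via the mixture-of-binomials representation (2), 0 < alpha < 1."""
    N = len(pmf) - 1
    def b(k, n):  # Binomial(n, alpha) pmf at k, via log-gamma for numerical stability
        return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(alpha) + (n - k) * log(1 - alpha))
    return np.array([sum(pmf[n] * b(k, n) for n in range(k, N + 1)) for k in range(N + 1)])

def poisson_pmf(lam, N):
    return np.array([exp(-lam + k * log(lam) - lgamma(k + 1)) for k in range(N + 1)])

def kl(p, q):
    """Information divergence D(p || q) in nats, with the convention 0 log 0 = 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# P = Geometric with mean 1: P(k) = 2^{-(k+1)}, so D(P || Po(1)) is finite and Theorem 3.3 applies
lam = 1.0
for n in [2, 4, 8, 16, 32]:
    N = 8 * n + 40                          # truncation level, an ad hoc choice
    P = np.array([0.5 ** (k + 1) for k in range(N + 1)])
    Pn = np.array([1.0])
    for _ in range(n):                      # n-fold convolution P^{*n}
        Pn = np.convolve(Pn, P)[:N + 1]
    print(n, kl(thin(Pn, 1.0 / n), poisson_pmf(lam, N)))   # decreases toward 0
```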
IV. LAWS OF THIN NUMBERS: THE NON-I.I.D. CASE

In this section we state and prove more general versions of the law of thin numbers, for sequences of random variables that are not necessarily independent or identically distributed. Although some of the results in this section are strict generalizations of Theorems 3.1 and 3.3, their proofs are different.

We begin by showing that, using a general proof technique introduced in [28], the weak law of thin numbers can be established under weaker conditions than those in Theorem 3.1. The main idea is to use the data-processing inequality on the total variation distance between an appropriate pair of distributions.

Theorem 4.1 (Weak Version, Non-I.I.D.): Let P₁, P₂, … be an arbitrary sequence of distributions on Z₊, with λᵢ denoting the mean of Pᵢ, and write Qₙ for the convolution of the first n of them. Then T_{1/n}(Qₙ) converges to a Poisson law in total variation, as long as three conditions are satisfied as n → ∞; roughly, the conditions require that the average of the means converge and that no individual Pᵢ dominate the sum.

Note that Theorem 4.1 can be viewed as a one-dimensional version of Grigelionis' Theorem 1 in [17]; recall the relevant comments in the Introduction. Recently, Schuhmacher [36] established nonasymptotic, quantitative versions of this result, in terms of the Barbour-Brown distance, which metrizes weak convergence in the space of probability measures of point processes. As the information divergence is a finer functional than the Barbour-Brown distance, Schuhmacher's results are not directly comparable with the finite-n bounds we obtain in Propositions 3.1 and 4.1 and Corollary 6.1.

Before giving the proof of the theorem, we state a simple lemma on a well-known bound for the total variation distance between a Bernoulli distribution and a Poisson distribution; its proof is included for completeness.

Lemma 4.1: For any p ∈ [0,1], the total variation distance between the Bern(p) and Po(p) distributions is at most p(1 − e^{−p}), which in turn is at most p².
Proof: The first inequality follows from the coupling inequality [30], applied to two random variables defined on a common probability space with the appropriate Bernoulli and Poisson distributions. The second inequality in the lemma is trivial.

Proof of Theorem 4.1: First we introduce some convenient notation: let X₁, X₂, … be independent random variables with Xᵢ distributed according to Pᵢ for all i; for each n, construct the corresponding independent thinned variables, together with an auxiliary Bernoulli sum and its Poisson approximation. With these definitions in place, we approximate the total variation distance of interest by splitting it into two terms, as in (14), where, by Lemma 4.1, the second term is bounded by a sum of squared parameters, which vanishes as n → ∞. Therefore, it suffices to show that the first term in (14) goes to zero. For that term, three bounds are applied in turn: the first inequality follows from the fact that, being an f-divergence, the total variation distance satisfies the data-processing inequality [11]; the second inequality comes from the well-known bound on the total variation distance between two product measures as the sum of the distances between their respective marginals; and the third bound is simply the triangle inequality. Finally, recalling the simple bound of Lemma 4.1 once more, the resulting expression converges to zero as n → ∞ by the three assumptions of the theorem, completing the proof.

Recall that, in the i.i.d. case, the weak law of thin numbers only required the first moment of the summands to be finite, while the strong version also required that the divergence from the Poisson distribution be finite. For a sum of independent, non-identically distributed random variables with finite second moments, Proposition 3.1 can be used as in the proof of Theorem 3.3 to prove the following result. Note that the precise conditions required are somewhat analogous to those in Theorem 4.1.

Theorem 4.2 (Strong Version, Non-I.I.D.): Let P₁, P₂, … be an arbitrary sequence of distributions on Z₊, each with finite mean and finite variance, and write Qₙ for the convolution of the first n of them. Then D(T_{1/n}(Qₙ) ‖ Po(λ)) → 0 as n → ∞, as long as two conditions, expressed in terms of the means and variances of the Pᵢ, are satisfied.

The proof of Theorem 4.2 is given in the Appendix, and it is based on Proposition 3.1. It turns out that under the additional condition of finite second moments, the proof of Proposition 3.1 can be refined to produce a stronger upper bound on the divergence.

Proposition 4.1: If P is a distribution on Z₊ with mean λ and finite variance, then, for all α ∈ (0,1), the divergence D(T_α(P) ‖ Po(αλ)) satisfies the refined bound (15).

Proof: Recall the expansion of the divergence established in the proof of Proposition 3.1.
There, in the last step, the simple bound (17) was used; refining that step under the finite-variance assumption and substituting (17) into (16) yields the stronger estimate (15), completing the proof.

Using the bound (15) instead of Proposition 3.1, the following more general version of the law of thin numbers can be established.

Theorem 4.3 (Strong Version, Non-I.I.D.): Let X₁, X₂, … be a sequence of (not necessarily independent or identically distributed) random variables on Z₊, and let Qₙ denote the distribution of the partial sum X₁ + ⋯ + Xₙ. Assume that the Xᵢ have finite means and variances, and that:
a) they are "uniformly ultra bounded," in that each Xᵢ is ultra bounded with a common ratio for all i;
b) their means satisfy an averaging condition, with the average mean converging to λ;
c) their covariances satisfy a vanishing-correlation condition.
Then D(T_{1/n}(Qₙ) ‖ Po(λ)) → 0 as n → ∞.

Proof: Obviously it suffices to prove the general statement. Proposition 4.1, applied to the distributions of the partial sums, bounds the divergence by three terms: the first and third terms tend to zero by assumptions (b) and (c), respectively, and, using assumption (a), the second term is bounded by a quantity which also tends to zero by assumption (b).

V. THE THINNING MARKOV CHAIN

Before examining the rate of convergence in the law of thin numbers, we consider a related and somewhat simpler problem for a Markov chain. Several of the results in this section may be of independent interest. The Markov chain we will discuss represents the evolution of the M/M/∞ queue. Its properties were used by Chafaï [7] to develop a family of inequalities which extends the logarithmic Sobolev inequalities of Bobkov and Ledoux [5], and other authors. Chafaï argues in [7, Sec. 1.2, 1.3] that this process is a natural discrete analog of the Ornstein-Uhlenbeck process associated with the Gaussian distribution.

Definition 5.1: Given λ > 0 and a distribution P on Z₊, define, for each t ≥ 0, the distribution
$$P_t = T_{e^{-t}}(P) * \mathrm{Po}\big(\lambda(1 - e^{-t})\big).$$
When the parameter λ is clear from the context, the distribution at time t is often written simply as P_t.

Note that each such map takes probability distributions to probability distributions. Therefore, for a fixed λ, the resulting family of linear operators on the space of probability measures on Z₊ defines a Markov transition semigroup; specifically, the associated transition probabilities define a continuous-time Markov chain. It is intuitively clear that, as t → ∞ (or, equivalently, as the thinning parameter e^{−t} tends to zero), the distribution P_t should converge to the Po(λ) law. Indeed, the following two well-known results (see, for example, [31] or [32]) state that this chain is ergodic, with unique invariant measure Po(λ). Theorem 5.1 gives the rate at which it converges in terms of the moments of the underlying distribution. This complements results such as [7, Th. 3.1], which gives a bound in terms of the tightest value of the log-Sobolev constant.
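The sketch below illustrates the thinning Markov chain numerically, assuming the standard M/M/∞ form of the semigroup described in Definition 5.1 (thin the current state with probability e^{−t} and add an independent Po(λ(1 − e^{−t})) number of arrivals); the function names and the starting distribution are our own. The total variation distance to Po(λ) decreases with t, as in Proposition 5.1.

```python
import numpy as np
from math import comb, exp, factorial

def thin(pmf, a):
    N = len(pmf) - 1
    return np.array([sum(pmf[n] * comb(n, k) * a**k * (1 - a)**(n - k)
                         for n in range(k, N + 1)) for k in range(N + 1)])

def poisson_pmf(lam, N):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(N + 1)])

def semigroup(pmf, lam, t):
    """Distribution at time t of the thinning chain started from pmf:
    thin the initial state with probability exp(-t) and add an independent
    Po(lam*(1 - exp(-t))) number of new arrivals (M/M/infinity dynamics)."""
    a = exp(-t)
    N = len(pmf) - 1
    return np.convolve(thin(pmf, a), poisson_pmf(lam * (1 - a), N))[:N + 1]

lam, N = 3.0, 60
P0 = np.zeros(N + 1); P0[7] = 1.0          # start from the point mass at 7
target = poisson_pmf(lam, N)
for t in [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]:
    Pt = semigroup(P0, lam, t)
    print(t, 0.5 * np.sum(np.abs(Pt - target)))   # total variation distance, decreasing in t
```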
Proposition 5.1: For any distribution P on Z₊, the distribution P_t converges in total variation to Po(λ) as t → ∞.

Proof: From the definition of P_t, and since Po(λ) = Po(λe^{−t}) * Po(λ(1 − e^{−t})), we have ‖P_t − Po(λ)‖ ≤ ‖T_{e^{−t}}(P) − Po(λe^{−t})‖ ≤ ‖T_{e^{−t}}(P) − δ₀‖ + ‖δ₀ − Po(λe^{−t})‖, where the first inequality (18) follows from the fact that convolution with any distribution is a contraction with respect to the total variation norm, the second (19) follows from the triangle inequality, and the resulting expression (20) converges to zero because of the bound (9).

Using this, we can give a characterization of the Poisson distribution in terms of thinning.

Corollary 5.1: Let P denote a discrete distribution with mean λ. Then P is invariant for the Markov chain above—equivalently, T_α(P) * Po((1 − α)λ) = P for some (hence all) α ∈ (0,1)—if and only if P = Po(λ); in particular, Po(λ) is the unique invariant measure of the Markov chain.

Proof: Assume that P is invariant. Then, by Proposition 5.1, P = P_t must be arbitrarily close in total variation to Po(λ) when t is large, so P = Po(λ); the converse is immediate from Example 2.1 and the semigroup property. The strengthened convergence of P_t to Po(λ) in information divergence can be proved using standard arguments along the lines of the corresponding discrete-time results in [15], [4], and [19].

Next we shall study the rate of convergence of P_t to the Poisson distribution. It is easy to check that the Markov chain is in fact reversible with respect to its invariant measure Po(λ). Therefore, the natural setting for the study of its convergence is the L²(Po(λ)) space of functions f : Z₊ → R with Σ_k f(k)² Po(λ; k) < ∞, endowed with the usual inner product. The reversibility of the chain implies that each of the semigroup operators is a self-adjoint linear operator on L²(Po(λ)); therefore, its eigenvectors are orthogonal functions. In this context, we introduce the Poisson-Charlier family of orthogonal polynomials; see [8] for a broad introduction.

Definition 5.2: For given λ, the Poisson-Charlier polynomial of order k, denoted C_k, is the degree-k member of the family of polynomials orthogonal with respect to the Po(λ) distribution, with a fixed choice of normalization.

Some well-known properties of the Poisson-Charlier polynomials are listed in the following lemma without proof. Note that their exact form depends on the chosen normalization; other authors present similar results, but with different normalizations.

Lemma 5.1: For any λ, the Poisson-Charlier polynomials satisfy the standard explicit and recursive identities, including the orthogonality relation (22).

Observe that, since the Poisson-Charlier polynomials form an orthonormal set, any function f ∈ L²(Po(λ)) can be expanded as a series in the C_k. It will be convenient to be able to translate between factorial moments and the "Poisson-Charlier moments" E[C_k(X)]; the expansion in (21) shows how. More generally, the following proposition shows that the role of the Poisson-Charlier moments with respect to the Markov chain is analogous to the role played by the factorial moments with respect to the pure thinning operation; cf. Lemma 2.1. Its proof, given in the Appendix, is similar to that of Lemma 2.1.

Proposition 5.2: Let X be a random variable with distribution P on Z₊ whose Poisson-Charlier moments are finite, and let X_t denote a random variable with distribution P_t. Then, for each k,
$$E\big[C_k(X_t)\big] = e^{-kt}\, E\big[C_k(X)\big];$$
that is, if the chain has initial distribution P, Proposition 5.2 states that the semigroup scales the Poisson-Charlier moments by the factors e^{−kt}, just as thinning scales factorial moments by α^k.
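Since the paper's normalization of the Poisson-Charlier polynomials is fixed only up to convention (as Lemma 5.1 notes), the following sketch avoids committing to a particular closed form: it constructs one orthonormal version of the family numerically, by Gram-Schmidt orthogonalization of the monomials in L²(Po(λ)) on a truncated support, and verifies orthonormality and the vanishing of the Poisson means of C_k for k ≥ 1. All names and the truncation level are our own.

```python
import numpy as np
from math import exp, factorial

lam, N, K = 2.0, 80, 5
w = np.array([exp(-lam) * lam**k / factorial(k) for k in range(N + 1)])   # Po(lam) weights
x = np.arange(N + 1, dtype=float)

# Gram-Schmidt on the monomials 1, x, x^2, ... in L^2(Po(lam)):
# the result is (one normalization of) the Poisson-Charlier family, evaluated on the grid.
basis = []
for k in range(K + 1):
    v = x ** k
    for u in basis:
        v = v - np.sum(w * v * u) * u
    basis.append(v / np.sqrt(np.sum(w * v * v)))

G = np.array([[np.sum(w * u * v) for v in basis] for u in basis])
print(np.max(np.abs(G - np.eye(K + 1))))      # ~1e-12: orthonormal w.r.t. Po(lam)
print([np.sum(w * c) for c in basis[1:]])     # ~0: each C_k, k >= 1, has zero mean under Po(lam)
```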
That is, the Poisson-Charlier moments of P_t are obtained from those of the initial distribution by multiplication by the factors e^{−kt}. Similarly, expanding any function f ∈ L²(Po(λ)) in terms of Poisson-Charlier polynomials shows that the semigroup multiplies the kth coefficient by e^{−kt}. Thus, the rate of convergence of P_t to Po(λ) will be dominated by the term corresponding to the smallest k for which the corresponding Poisson-Charlier moment of the initial distribution is nonzero.

The following proposition (proved in the Appendix) will be used in the proof of Theorem 5.1, which shows that this is indeed the right rate in terms of the χ² distance. Note that there is no restriction on the mean of P in the proposition.

Proposition 5.3: If P is Poisson bounded, then the likelihood ratio of P with respect to Po(λ) can be expanded as a series in the Poisson-Charlier polynomials, with coefficients given by the Poisson-Charlier moments of P; the series converges in L²(Po(λ)).

This sum can be viewed as a discrete analog of the well-known Edgeworth expansion for the distribution of a continuous random variable. A technical disadvantage of both this and the standard Edgeworth expansion is that, although the sum converges in L², truncating it to a finite number of terms in general produces an expression which may take negative values. By a more detailed analysis we shall see in the following two sections how to get around this problem.

For now, we determine the rate of convergence of P_t to the Poisson law; recall the definition of the χ² distance between two probability distributions P and Q on Z₊,
$$\chi^2(P, Q) = \sum_{k} \frac{\big(P(k) - Q(k)\big)^2}{Q(k)}.$$

Theorem 5.1: If P is Poisson bounded, then the distance χ²(P_t, Po(λ)) is finite for all t > 0 and, as t → ∞, it decays like a constant multiple of e^{−2k₀t}, where k₀ denotes the smallest integer k ≥ 1 for which the kth Poisson-Charlier moment of P is nonzero.

Proof: The proof is based on a Hilbert space argument, using the fact that the Poisson-Charlier polynomials are orthogonal. Using Proposition 5.3 to expand the likelihood ratio of P_t with respect to Po(λ), and using Proposition 5.2 to identify the coefficients, the distance χ²(P_t, Po(λ)) becomes a sum of squared Poisson-Charlier moments of P multiplied by the factors e^{−2kt}, where the last step follows from the orthogonality relation (22). Combining Propositions 5.2 and 5.3 we see that this sum is finite, which implies the finiteness claim. Moreover, the expansion has the term corresponding to k₀ as its dominant term, implying the stated limit.

Theorem 5.1 readily leads to upper bounds on the rate of convergence in terms of information divergence via the standard bound
$$D(P \,\|\, Q) \le \log\big(1 + \chi^2(P, Q)\big)$$
which follows from direct applications of Jensen's inequality. Furthermore, replacing this bound by the well-known approximation of the divergence by half the χ² distance gives the estimate D(P_t ‖ Po(λ)) ≈ ½ χ²(P_t, Po(λ)). We shall later prove that, in certain cases, this approximation can indeed be rigorously justified.

VI. THE RATE OF CONVERGENCE IN THE STRONG LAW OF THIN NUMBERS

In Theorem 3.3 we showed that, if P has mean λ and D(P ‖ Po(λ)) < ∞, then D(T_{1/n}(P^{*n}) ‖ Po(λ)) → 0 as n → ∞ (27). If P also has finite variance, then Proposition 4.1 implies an explicit upper bound on this divergence of the form (28),
suggesting a convergence rate of order 1/n. In this section, we prove more precise upper bounds on the rate of convergence in the strong law of thin numbers (27). For example, if P is the distribution of an ultra bounded random variable with mean λ and finite variance, then we show that D(T_{1/n}(P^{*n}) ‖ Po(λ)) is of order 1/n². This follows from the more general result of Corollary 6.1; its proof is based on a detailed analysis of the scaled Fisher information introduced in [28]. We begin by briefly reviewing some properties of the scaled Fisher information.

Definition 6.1: The scaled Fisher information of a random variable X with distribution P on Z₊ and mean λ > 0 is defined by
$$K(X) = \lambda\, E\big[\rho_X(X)^2\big]$$
where ρ_X denotes the scaled score function
$$\rho_X(x) = \frac{(x+1)\,P(x+1)}{\lambda\,P(x)} - 1.$$

In [28, Prop. 2] it was shown, using a logarithmic Sobolev inequality of Bobkov and Ledoux [5], that for any X with mean λ,
$$D(P \,\|\, \mathrm{Po}(\lambda)) \le K(X) \tag{29}$$
under mild conditions on the support of P. Also, by [28, Prop. 3], K(X) satisfies a subadditivity property: for independent random variables X₁, …, Xₙ, the scaled Fisher information of their sum is bounded by a corresponding convex combination of the individual K(Xᵢ) (30). In particular, recalling that the thinning of a convolution is the convolution of the corresponding thinnings, if X₁, X₂, … are i.i.d. random variables with distribution P and mean λ, then the bounds in (29) and (30) imply the finite-n bound (31) on D(T_{1/n}(P^{*n}) ‖ Po(λ)) in terms of the scaled Fisher information of a single thinned variable.
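The sketch below computes the scaled Fisher information of Definition 6.1 numerically (function names and the test distributions are ours). It checks that K vanishes for the Poisson law, whose scaled score is identically zero, and that for a simple ultra log-concave example the divergence from the Poisson law with the same mean is indeed dominated by K, in the spirit of the bound (29) from [28].

```python
import numpy as np
from math import comb, exp, factorial

def poisson_pmf(lam, N):
    return np.array([exp(-lam) * lam**k / factorial(k) for k in range(N + 1)])

def scaled_fisher_info(pmf):
    """K(X) = lam * E[rho(X)^2], rho(x) = (x+1)P(x+1)/(lam*P(x)) - 1 (Definition 6.1)."""
    x = np.arange(len(pmf))
    lam = float(np.sum(x * pmf))
    K = 0.0
    for k in range(len(pmf) - 1):
        if pmf[k] > 0:
            rho = (k + 1) * pmf[k + 1] / (lam * pmf[k]) - 1.0
            K += pmf[k] * rho ** 2
    return lam * K

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

N = 60
print(scaled_fisher_info(poisson_pmf(3.0, N)))      # ~0: the Poisson law has zero scaled score

# Bin(20, 0.2), a ULC distribution with mean 4: its divergence from Po(4) is below K
B = np.array([comb(20, k) * 0.2**k * 0.8**(20 - k) if k <= 20 else 0.0 for k in range(N + 1)])
print(kl(B, poisson_pmf(4.0, N)), scaled_fisher_info(B))
```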
Therefore, our next goal is to determine the rate at which K(T_α(X)) tends to zero as α tends to 0. We begin with the following proposition; its proof is given in the Appendix.

Proposition 6.1: If X is Poisson bounded, then the probability mass function of T_α(X) admits a convergent power-series representation in α, with coefficients determined by the factorial moments of X. Moreover, the truncated sum of the series is an upper bound when the number of retained terms is even, and a lower bound when it is odd.

An important consequence of this proposition is that the probability T_α(P)(x) tends to zero at an explicit polynomial rate as α → 0, with the leading term arising as the lowest-order case of the expansion. This leads to the following asymptotic result for the scaled Fisher information, also proved in the Appendix.

Theorem 6.1: Suppose X is ultra bounded with ratio c, and let k₀ denote the smallest relevant integer in the expansion of Proposition 6.1 with a nonvanishing coefficient. Then K(T_α(X)) tends to zero at an explicit polynomial rate in α as α → 0, with a limit constant determined by that coefficient.

Combining Theorem 6.1 with (31) immediately yields the following.

Corollary 6.1: Suppose X is ultra bounded with ratio c. Then, with k₀ as in Theorem 6.1, the divergence D(T_{1/n}(P^{*n}) ‖ Po(λ)) decays at an explicit polynomial rate, of order 1/n² in the generic case, with a nonzero constant that is identified explicitly.

VII. MONOTONICITY RESULTS FOR THE SCALED FISHER INFORMATION

In this section we establish a finer result for the behavior of the scaled Fisher information upon thinning, and use that to deduce a stronger finite-n upper bound for the strong law of thin numbers. Specifically, if X is ULC with mean λ, and X_α denotes a random variable with distribution T_α(P), then K(X_α) ≤ α²K(X). This implies that, for all ULC random variables, we have the following finite-n version of the strong law of thin numbers: D(T_{1/n}(P^{*n}) ‖ Po(λ)) ≤ K(X)/n². Note that, unlike the more general result in (28), which gives a bound of order 1/n, the above bound is of order 1/n².

The key observation for these results is in the following lemma.

Lemma 7.1: Suppose X is a ULC random variable with distribution P, and let X_α denote a random variable with distribution T_α(P). Then the derivative of K(X_α) with respect to α admits an explicit representation as an expectation involving the mass function of X_α, its scaled score function (defined, for a random variable with a given mass function and mean, as in Definition 6.1), and an associated finite difference.

Proof: This result follows on using the expression for the derivative of the thinned mass function T_α(P)(x) with respect to α.
Using this, for each x we deduce that the derivative of T_α(P)(x) with respect to α can be expressed as the sum of two families of terms. The result follows (with the term-by-term differentiation of the infinite sum justified) if the sum of these terms is absolutely convergent. The first terms are positive, and their sum is absolutely convergent by assumption. The second terms form a collapsing sum, which is absolutely convergent assuming the corresponding factorial-moment series converges. Note that, for any ULC distribution P, by definition the ratios (x+1)P(x+1)/P(x) are non-increasing in x, so the above sum is bounded above by a series which is finite by Proposition 2.2.

Consider the finite difference operator Δ, defined for functions g on Z₊ by Δg(x) = g(x+1) − g(x). We require a result suggested by relevant results in [6], [27]. Its proof is given in the Appendix.

Lemma 7.2: Let X be a ULC random variable with distribution P. Then, for any function g, the covariance-type quantities defined in terms of Δg satisfy the inequalities (42)–(45) used below.

We now deduce the following theorem, which parallels [41, Th. 8], where a corresponding result is proved for the information divergence.

Theorem 7.1: Let X be a ULC random variable with mean λ, and for α ∈ (0,1] let X_α denote a random variable with distribution T_α(P). Then: (i) the ratio K(X_α)/α² is monotone in α, so that, in particular, K(X_α) ≤ α²K(X) for all α ∈ (0,1]; and (ii) combining (i) with the earlier bound (31), for every ULC distribution P, D(T_{1/n}(P^{*n}) ‖ Po(λ)) ≤ K(X)/n².

Proof: The first part follows from the observation that K(X_α)/α² equals K(X) at α = 1: by Lemma 7.1 its derivative admits an explicit representation, and using the decomposition in the more technical Lemma 7.2 we deduce that the derivative has the required sign, and this proves (i). Then (ii) immediately follows from (i) combined with the earlier bound (31), upon recalling that thinning preserves the ULC property [24].

VIII. BOUNDS IN TOTAL VARIATION

In this section, we show that a modified version of the argument used in the proof of Proposition 4.1 gives an upper bound to the rate of convergence in the weak law of small numbers. If P has mean λ and finite variance, then combining the bound (15) of Proposition 4.1 with Pinsker's inequality we obtain an upper bound on ‖T_{1/n}(P^{*n}) − Po(λ)‖ of order 1/√n. From the asymptotic upper bound on information divergence, Corollary 6.1, we know that one should be able to obtain upper bounds of order 1/n. Here we derive an upper bound on total variation using the same technique used in the proof of Proposition 4.1.

Theorem 8.1: Let P be a distribution on Z₊ with mean λ and finite variance. Then ‖T_{1/n}(P^{*n}) − Po(λ)‖ is bounded above by an explicit quantity of order 1/n.

The proof uses the following simple bound, which follows easily from a result of Yannaros [40, Theorem 2.3]; the details are omitted.

Lemma 8.1: For any n and any α ∈ (0,1), the total variation distance between the binomial distribution Bin(n, α) and the Poisson distribution with the same mean is bounded above by a constant multiple of α.

Proof of Theorem 8.1: The first inequality in the proof of Proposition 4.1 remains valid due to the convexity of the total variation norm (since it is an f-divergence). The next equality becomes an inequality, justified by the triangle inequality, and we obtain the corresponding decomposition into individual binomial-Poisson terms.
And using Lemma 8.1 on each of these terms leads to an explicit estimate; the result follows by an application of Hölder's inequality.
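The following sketch (not from the paper; the choice P = Bin(3, 0.4) and the truncation level are ours) illustrates the total variation behavior discussed in this section: the distance ‖T_{1/n}(P^{*n}) − Po(λ)‖ is computed for increasing n, and n times the distance stays roughly constant, consistent with an O(1/n) rate as in Theorem 8.1.

```python
import numpy as np
from math import comb, exp, lgamma, log

def thin(pmf, a):
    """a-thinning via the mixture-of-binomials representation (2), 0 < a < 1."""
    N = len(pmf) - 1
    def b(k, n):  # Binomial(n, a) pmf at k, via log-gamma for numerical stability
        return exp(lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
                   + k * log(a) + (n - k) * log(1 - a))
    return np.array([sum(pmf[n] * b(k, n) for n in range(k, N + 1)) for k in range(N + 1)])

def poisson_pmf(lam, N):
    return np.array([exp(-lam + k * log(lam) - lgamma(k + 1)) for k in range(N + 1)])

# P = Bin(3, 0.4), a ULC distribution with mean lam = 1.2
lam = 1.2
for n in [2, 4, 8, 16, 32]:
    N = 6 * n + 30
    P = np.array([comb(3, k) * 0.4**k * 0.6**(3 - k) if k <= 3 else 0.0 for k in range(N + 1)])
    Pn = np.array([1.0])
    for _ in range(n):
        Pn = np.convolve(Pn, P)[:N + 1]
    tv = 0.5 * np.sum(np.abs(thin(Pn, 1.0 / n) - poisson_pmf(lam, N)))
    print(n, tv, n * tv)    # n * tv is roughly constant
```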
IX. COMPOUND THINNING

There is a natural generalization of the thinning operation, via a process which closely parallels the generalization of the Poisson distribution to the compound Poisson. Starting with a random variable X, its α-thinning is obtained by writing X as a sum of X 1s and then keeping each of these 1s with probability α, independently of all the others; cf. (1) above.

More generally, we choose and fix a "compounding" distribution Q. If Q is an arbitrary probability measure on Z₊, then the compound α-thinned version of X, with respect to Q, is the random variable which results from first thinning X as above and then replacing each of the 1s that are kept by an independent random sample from Q, as in (35), where all the random variables involved are independent. We write the compound α-thinned version of P for the distribution of this random variable; it can be expressed as a mixture of "compound binomials" in the same way as T_α(P) is a mixture of binomials. The compound binomial distribution with parameters n, α, and Q is the distribution of the sum of n i.i.d. random variables, each of which is the product of a Bern(α) random variable and an independent random variable with distribution Q; in other words, it is the compound α-thinned version of the point mass at n, i.e., the distribution of (35) with X = n w.p.1. The compound α-thinned version of P is then the corresponding mixture over n.

The following two observations are immediate from the definitions:
1) Compound thinning maps a Bernoulli sum into a compound Bernoulli sum: if P is the distribution of a finite sum of independent Bernoulli random variables, then its compound α-thinned version is the distribution of the "compound Bernoulli sum" obtained by multiplying each Bernoulli summand, after thinning, by an independent variable with distribution Q, where these multipliers are i.i.d. with distribution Q and independent of the Bernoulli variables.
2) Compound thinning maps the Poisson to the compound Poisson distribution; that is, the compound α-thinned version of Po(λ) is the compound Poisson distribution with rate αλ and compounding distribution Q, defined as the distribution of the sum of N i.i.d. random variables Y₁, Y₂, … with distribution Q, where N is a Po(αλ) random variable that is independent of the Yᵢ.

Perhaps the most natural way in which the compound Poisson distribution arises is as the limit of compound binomials: the compound binomial with parameters n, λ/n, and Q converges to the compound Poisson with rate λ and compounding distribution Q, as n → ∞. As with the strong law of thin numbers, this result remains true for general distributions P, and the convergence can be established in the sense of information divergence.

Theorem 9.1: Let P be a distribution on Z₊ with mean λ and finite variance. Then, for any probability measure Q, the information divergence between the compound (1/n)-thinned version of P^{*n} and the compound Poisson distribution with rate λ and compounding distribution Q tends to zero as n → ∞.

The proof is very similar to that of Theorem 3.3 and thus omitted. In fact, the same argument as that proof works for non-integer-valued compounding: if Q is an arbitrary probability measure on the real line, then compound thinning as in (35) again gives a probability measure, and the same convergence holds. It is somewhat remarkable that the statement and proof of most of our results concerning the information divergence remain essentially unchanged in this case. For example, we easily obtain the following analog of Proposition 4.1.

Proposition 9.1: If P is a distribution on Z₊ with mean λ and finite variance, then, for any probability measure Q and any α ∈ (0,1), the divergence between the compound α-thinned version of P and the corresponding compound Poisson distribution satisfies the analog of the bound (15). The details of the argument are straightforward extensions of the corresponding proof of Proposition 4.1.
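The following Monte Carlo sketch (our own construction; the compounding law Q, the sample size, and the histogram range are arbitrary choices) illustrates observation 2 above: compound α-thinning a Poisson sample yields, up to simulation noise, a sample from the compound Poisson law with rate αλ and compounding distribution Q.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, alpha = 5.0, 0.4
Q_support, Q_probs = np.array([1, 2, 3]), np.array([0.5, 0.3, 0.2])   # compounding law Q
M = 100_000

def compound_thin(X):
    """Compound alpha-thinning: keep each of the X units w.p. alpha and
    replace every kept unit by an independent draw from Q (cf. (35))."""
    out = np.zeros_like(X)
    for i, x in enumerate(X):
        kept = rng.random(x) < alpha
        out[i] = rng.choice(Q_support, size=x, p=Q_probs)[kept].sum()
    return out

# Observation 2: compound thinning maps Po(lam) to the compound Poisson CPo(alpha*lam, Q)
Y = compound_thin(rng.poisson(lam, M))
K = rng.poisson(alpha * lam, M)                                    # number of compound summands
Z = np.array([rng.choice(Q_support, size=k, p=Q_probs).sum() for k in K])
hist = lambda v: np.bincount(v, minlength=40)[:40] / len(v)
print(0.5 * np.sum(np.abs(hist(Y) - hist(Z))))                     # small, of order 1e-2 (Monte Carlo noise)
```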
APPENDIX

Proof of Lemma 2.1: Simply apply Lemma 2.2 to the representation (1) of Definition 1.1, using the fact that the sequence of factorial moments of the Bern(α) distribution is α, 0, 0, …: conditioning on X = n, only the terms of the multinomial expansion in which each retained index appears exactly once survive, giving E[(T_α X)^{(k)} | X = n] = n^{(k)} α^k; taking expectations over X completes the proof.
Proof of Proposition 2.1: The first property follows from the representation (2), recalling the property stated in (6), which is only possible under the stated conditions. The second property is easily checked using Lemma 2.1.

Proof of Proposition 2.2: Note that the expectation E[X^{(k+1)}] can be written as E[r(X) X^{(k)}], where r(x) = (x+1)P(x+1)/P(x); for a ULC distribution r is non-increasing while x^{(k)} is non-decreasing, so by the Chebyshev rearrangement lemma—since it is the covariance between an increasing and a decreasing function—E[r(X) X^{(k)}] ≤ E[r(X)]·E[X^{(k)}] = λ E[X^{(k)}]. Rearranging this inequality gives the claim, as required.

Proof of Proposition 2.3: For part (a), using Lemma 2.2, it is straightforward to check, using Lemma 2.1, that the factorial moments of the convolution and of the thinning satisfy the required bounds. To prove part (b), using Lemma 2.2, Pascal's identity, and relabelling yields the corresponding bound for the ultra bounded class.

Proof of Theorem 3.3: In order to apply Proposition 3.1, we need to check that the relevant quantities are finite, which is only possible if the divergence D(P ‖ Po(λ)) is finite. Let Sₙ denote the sum of n i.i.d. variables with distribution P, so that P^{*n} is the distribution of Sₙ. Therefore, using the data-processing inequality [11] as in [28] implies that the divergence of the thinned convolution is bounded by a corresponding divergence involving the individual summands, which is finite by assumption. Proposition 3.1 then gives an explicit bound. The law of large numbers implies that the empirical averages appearing in the bound converge almost surely; to complete the proof it suffices to show that they also converge in expectation, or, equivalently, that the sequence is uniformly integrable. We will actually show that the nonnegative random variables in question are bounded above by a different uniformly integrable sequence. Indeed, by the log-sum inequality each term is dominated by the corresponding average, and arguing as in the beginning of the proof of Proposition 3.1 shows that the mean is finite, so the law of large numbers implies that the averages in (36) converge to their expectation. Hence, they form a uniformly integrable sequence; this implies that the original random variables are also uniformly integrable, completing the proof.

Proof of Theorem 4.2: The proof is similar to that of Theorem 3.3, so some details are omitted. For each n, write the thinned convolution as the sum of independent random variables, where the random summands are independent, with each distributed according to the corresponding thinning. First, to see that the relevant divergence is finite, applying the data-processing inequality [11] as in [28] gives a bound by a sum of individual divergences, and it is easy to check that each of these terms is finite because all the Pᵢ have finite second moments. As before, Proposition 3.1 gives the required bound.
Moreover, for each n, the independent random variables appearing in this bound have zero mean and variances which are finite by assumption (b). Then, by the general version of the law of large numbers in [14, p. 239], the corresponding averages converge to zero a.s., and hence, by assumption (a), the quantities of interest also converge a.s. Moreover, since the relevant moments are uniformly bounded over n by our assumptions for every integer k, the sequence is uniformly integrable, and therefore it converges in expectation. Finally, recalling once more that the Poisson measures form an exponential family, they satisfy a Pythagorean identity [11], so that the divergence of interest splits into two terms, where the first term was just shown to go to zero as n → ∞ and the second term is actually equal to a quantity which also vanishes as n → ∞ by assumption (a), as required.

Proof of Proposition 5.2: Let Y and Z be independent random variables with distributions T_{e^{−t}}(P) and Po(λ(1 − e^{−t})), respectively, so that Y + Z has distribution P_t. Then, from the definitions, and using Lemmas 2.2 and 2.1, the Poisson-Charlier moment E[C_k(Y + Z)] can be expanded in terms of the factorial moments of Y and Z, where we have used the fact that the factorial moments of a Poisson distribution with mean μ are μ^k. Simplifying and interchanging the two sums yields E[C_k(Y + Z)] = e^{−kt} E[C_k(X)], as claimed.

Proof of Proposition 5.3: First we have to prove that the expansion converges when P is Poisson bounded with ratio c, say. Using the bound in Lemma 3.1, the relevant series of squared Poisson-Charlier moments is dominated by a convergent series, which is finite. Now, recalling the general expansion (25), it suffices to show that the coefficients in that expansion coincide with the Poisson-Charlier moments of P, which follows from the orthogonality relation, as required.

Proof of Proposition 6.1: We need the following simple lemma; for a proof see, e.g., [16].

Lemma A.1: The partial sums of the binomial series alternately overestimate and underestimate its value (an extension of Bernoulli's inequality).

Turning to the proof of Proposition 6.1, assume X is Poisson bounded with ratio c. Then the series in the statement converges, since its terms are dominated by those of an exponential series with parameter c.
Applying Lemma A.1 to the truncated sums then gives the alternating upper and lower bounds, completing the proof of Proposition 6.1.

Proof of Theorem 6.1: Let X_α have distribution T_α(P). Using Lemma 2.1, Proposition 6.1, and the fact that X is Poisson bounded, we obtain series expansions for the numerator and denominator in the expression for the score function of X_α. Since, by assumption, X is ultra bounded, for small enough α the first terms in the series in the numerator vanish; therefore, for each x, the numerator and denominator are both bounded functions of α, and the denominator is bounded away from zero (because of the term corresponding to the leading index), so the score function of X_α is well controlled. The scaled Fisher information K(X_α) is then decomposed into three terms as in (39). Using this and (38), for small enough α the first term in (39) is bounded above by an explicit quantity which, divided by the appropriate power of α, tends to zero as α → 0; since the corresponding lower bound is obvious, it follows that this term does not contribute. Thus only the second term in (39) contributes, and for this term we similarly obtain, using the more accurate expression for the score function given by the full expansion of Proposition 6.1 together with Lemma 2.1, that the limit exists and equals the stated constant. For the third term note that, applying Markov's inequality to a function which increases on the integers, we obtain a bound which, again, when divided by the appropriate power of α, tends to zero as α → 0; a similar argument holds for the complementary range of the summation variable.
This means that (with the reversal of the order of summation justified by Fubini's theorem, since all the terms have the same sign) the two expressions for the limit agree. Finally, combining the above limits with (39) yields the statement of the theorem.

Proof of Lemma 7.2: The key is to observe that, since one of the two relevant sequences is decreasing in x and the other is increasing in x, there exists an integer at which they cross, and the result holds. Note that the inequality in (44) follows by (42) and (43), the inequality in (45) by the discussion above, and the remaining cross term is controlled by the Cauchy-Schwarz inequality.

ACKNOWLEDGMENT

The authors wish to thank E. Telatar and C. Vignat for hosting a small workshop in January 2006, during which some of these ideas developed. J. Swart also provided us with useful comments.
REFERENCES

[1] J. A. Adell, A. Lekuona, and Y. Yu, "Sharp bounds on the entropy of the Poisson law and related quantities," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2299–2306, 2010.
[2] A. D. Barbour, L. Holst, and S. Janson, Poisson Approximation. Oxford, U.K.: Clarendon, 1992.
[3] A. R. Barron, "Entropy and the central limit theorem," Ann. Probab., vol. 14, no. 1, pp. 336–342, 1986.
[4] A. R. Barron, "Limits of information, Markov chains, and projections," in Proc. 2000 Int. Symp. Inf. Theory, 2000, p. 25.
[5] S. G. Bobkov and M. Ledoux, "On modified logarithmic Sobolev inequalities for Bernoulli and Poisson measures," J. Funct. Anal., vol. 156, no. 2, pp. 347–365, 1998.
[6] A. A. Borovkov and S. A. Utev, "An inequality and a characterization of the normal distribution connected with it," Teor. Veroyatnost. i Primenen., vol. 28, no. 2, pp. 209–218, 1983.
[7] D. Chafaï, "Binomial-Poisson entropic inequalities and the M/M/∞ queue," ESAIM Probab. Statist., vol. 10, pp. 317–339, 2006.
[8] T. S. Chihara, An Introduction to Orthogonal Polynomials. New York: Gordon and Breach, 1978.
[9] T. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[10] I. Csiszár, "Information-type measures of difference of probability distributions and indirect observations," Studia Sci. Math. Hungar., vol. 2, pp. 299–318, 1967.
[11] I. Csiszár and P. Shields, "Information theory and statistics: A tutorial," Found. Trends Commun. Inf. Theory, vol. 1, pp. 1–111, 2004.
[12] D. J. Daley and D. Vere-Jones, An Introduction to the Theory of Point Processes, 2nd ed., vol. II. New York: Springer, 2008.
[13] A. Fedotov, P. Harremoës, and F. Topsøe, "Refinements of Pinsker's inequality," IEEE Trans. Inf. Theory, vol. 49, pp. 1491–1498, Jun. 2003.
[14] W. Feller, An Introduction to Probability Theory and Its Applications, vol. II. New York: Wiley, 1971.
[15] J. Fritz, "An information-theoretical proof of limit theorems for reversible Markov processes," in Proc. 6th Prague Conf. Inf. Theory, Statist. Decision Functions, Random Processes, Prague, Sep. 1971.
[16] L. Gerber, "An extension of Bernoulli's inequality," Amer. Math. Monthly, vol. 75, pp. 875–876, 1968.
[17] B. Grigelionis, "The convergence of stepwise random processes to a Poisson process," Teor. Verojatnost. i Primenen., vol. 8, pp. 189–194, 1963.
[18] P. Harremoës, "Binomial and Poisson distributions as maximum entropy distributions," IEEE Trans. Inf. Theory, vol. 47, pp. 2039–2041, Jul. 2001.
[19] P. Harremoës and K. K. Holst, "Convergence of Markov chains in information divergence," J. Theoret. Probab., vol. 22, no. 1, pp. 186–202, 2009.
[20] O. Johnson and I. Kontoyiannis, "Thinning and the law of small numbers," in Proc. IEEE Int. Symp. Inf. Theory, Jun. 2007, pp. 1491–1495.
[21] P. Harremoës, O. Johnson, and I. Kontoyiannis, "Thinning and information projections," in preparation, 2010.
[22] P. Harremoës and P. Ruzankin, "Rate of convergence to Poisson law in terms of information divergence," IEEE Trans. Inf. Theory, vol. 50, pp. 2145–2149, 2004.
[23] O. Johnson, Information Theory and the Central Limit Theorem. London, U.K.: Imperial College Press, 2004.
[24] O. Johnson, "Log-concavity and the maximum entropy property of the Poisson distribution," Stochastic Process. Appl., vol. 117, no. 6, pp. 791–802, 2007.
[25] I. M. Johnstone and B. MacGibbon, "Une mesure d'information caractérisant la loi de Poisson," Séminaire de Probabilités, XXI, pp. 563–573, 1987.
[26] A. Y. Khintchine, Mathematical Methods in the Theory of Queueing. New York: Hafner, 1960.
[27] C. A. J. Klaassen, "On an inequality of Chernoff," Ann. Probab., vol. 13, no. 3, pp. 966–974, 1985.
[28] I. Kontoyiannis, P. Harremoës, and O. Johnson, "Entropy and the law of small numbers," IEEE Trans. Inf. Theory, vol. 51, pp. 466–472, 2005.
[29] E. H. Lieb, "Proof of an entropy conjecture of Wehrl," Commun. Math. Phys., vol. 62, pp. 35–41, 1978.
[30] T. Lindvall, Lectures on the Coupling Method. New York: Wiley, 1992.
[31] D. Mejzler, "A note on Erlang's formulas," Israel J. Math., vol. 3, pp. 157–162, 1965.
[32] G. F. Newell, "The M/G/∞ queue," SIAM J. Appl. Math., vol. 14, pp. 86–88, 1966.
[33] C. Palm, "Intensitätsschwankungen im Fernsprechverkehr," Ericsson Technics, no. 44, pp. 1–189, 1943.
[34] R.-D. Reiss, Approximate Distributions of Order Statistics, ser. Springer Series in Statistics. New York: Springer-Verlag, 1989.
[35] A. Rényi, "A characterization of Poisson processes," Magyar Tud. Akad. Mat. Kutató Int. Közl., vol. 1, pp. 519–527, 1956.
[36] D. Schuhmacher, "Distance estimates for dependent superpositions of point processes," Stochastic Process. Appl., vol. 115, no. 11, pp. 1819–1837, 2005.
[37] C. E. Shannon, "A mathematical theory of communication," Bell Syst. Tech. J., vol. 27, pp. 379–423, 623–656, 1948.
[38] A. J. Stam, "Some inequalities satisfied by the quantities of information of Fisher and Shannon," Inf. Contr., vol. 2, pp. 101–112, Jun. 1959.
[39] D. W. Walkup, "Pólya sequences, binomial convolution and the union of random sets," J. Appl. Probab., vol. 13, no. 1, pp. 76–85, 1976.
[40] N. Yannaros, "Poisson approximation for random sums of Bernoulli random variables," Statist. Probab. Lett., vol. 11, no. 2, pp. 161–165, 1991.
[41] Y. Yu, "Monotonic convergence in an information-theoretic law of small numbers," IEEE Trans. Inf. Theory, vol. 55, pp. 5412–5422, 2009.
[42] V. M. Zolotarev, "Probability metrics," Teor. Veroyatnost. i Primenen., vol. 28, no. 2, pp. 264–287, 1983.

Peter Harremoës (M'00) received the B.Sc. degree in mathematics in 1984, the Exam. Art. degree in archaeology in 1985, and the M.Sc. degree in mathematics in 1988, all from the University of Copenhagen, Denmark. He received the Ph.D. degree in natural sciences in 1993 from Roskilde University, Denmark.
From 1993 to 1998, he worked as a mountaineer. From 1998 to 2000, he held various teaching positions in mathematics. From 2001 to 2006, he was a Postdoctoral Fellow with the University of Copenhagen, with a longer visit to Zentrum für Interdisziplinäre Forschung, Bielefeld, Germany, 2003. From 2006 to 2009, he was affiliated with Centrum Wiskunde en Informatica, Amsterdam, The Netherlands, under the European Pascal Network of Excellence. Since then he has been affiliated with Niels Brock, Copenhagen Business College, Denmark.
Dr. Harremoës has been Editor-in-Chief of the journal Entropy since 2007.

Oliver Johnson received the B.A. degree in 1995, Part III Mathematics in 1996, and the Ph.D. degree in 2000, all from the University of Cambridge.
He was a Clayton Research Fellow of Christ's College and Max Newman Research Fellow of Cambridge University until 2006, during which time he published the book Information Theory and the Central Limit Theorem in 2004. Since 2006, he has been Lecturer in Statistics at Bristol University, U.K.

Ioannis Kontoyiannis (S'94–M'98–SM'08) was born in Athens, Greece, in 1972. He received the B.Sc. degree in mathematics in 1992 from Imperial College (University of London), U.K., and in 1993 he obtained a distinction in Part III of the Cambridge University Pure Mathematics Tripos. He received the M.S. degree in statistics and the Ph.D. degree in electrical engineering, both from Stanford University, Stanford, CA, in 1997 and 1998, respectively.
Between June and December 1995, he was with IBM Research, working on a NASA-IBM satellite image processing and compression project. From June 1998 to August 2001, he was an Assistant Professor with the Department of Statistics, Purdue University, West Lafayette, IN (and also, by courtesy, with the Department of Mathematics, and the School of Electrical and Computer Engineering). From August 2000 until July 2005, he was an Assistant, then Associate Professor, with the Division of Applied Mathematics and with the Department of Computer Science, Brown University, Providence, RI. Since March 2005, he has been an Associate Professor with the Department of Informatics, Athens University of Economics and Business.
Dr. Kontoyiannis was awarded the Manning endowed Assistant Professorship in 2002, and was awarded an honorary Master of Arts degree Ad Eundem, in 2005, both by Brown University. In 2004, he was awarded a Sloan Foundation Research Fellowship. Currently he serves as an Associate Editor for the IEEE TRANSACTIONS ON INFORMATION THEORY. His research interests include data compression, applied probability, information theory, statistics, and simulation.