Areas of Countries and Benford’s Law


Areas of Countries and Benford’s Law: Model of a Dynamical System Proposed by V. Arnold

Alex Janke (1), Xiangyu Wang (2)

(1) University of Michigan, Ann Arbor, MI

(2) Peking University, Beijing, China

August 12, 2011

A. Janke, X. Wang Areas of Countries and Benford’s Law



Benford’s Law

Numbers from many real-life data sets, particularly those dominated by exponential processes, have leading digits distributed in a non-uniform way.

Let d ∈ {1,2,3,4,5,6,7,8,9} be the leading digit of a number. A set of numbers (or random variable) satisfies Benford’s law if d occurs with frequency (probability) given by the following:

\[ P(d) = \log_{10}(d + 1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right) \]
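As a quick numerical check of the formula above, the following minimal Python sketch tabulates P(d) for each leading digit. The function name `benford_probability` is an illustrative choice and does not come from the slides.

```python
import math


def benford_probability(d: int) -> float:
    """Benford probability log10(1 + 1/d) for a leading digit d in 1..9."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


# Print the predicted frequency of each leading digit.
for d in range(1, 10):
    print(d, round(benford_probability(d), 4))
```

The nine probabilities sum to one because the terms log10(d + 1) − log10(d) telescope to log10(10).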




Benford’s Law

The following shows the frequency of leading digits predicted by Benford’s law.




Benford’s Law

The following shows the leading digit for the populations of 237 countries. The dots denote the true Benford’s law.




Rationalizing Benford’s Law

The set {α^n | n ∈ Z}, with log10(α) ∉ Q, satisfies Benford’s law. This is a consequence of the equidistribution theorem.

A continuous random variable X for which the fractional part of log10(X) is uniformly distributed on [0,1) will satisfy Benford’s law.

A continuous random variable X following a lognormal distribution will satisfy Benford’s law increasingly well as its second moment approaches infinity.
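The first statement can be illustrated numerically. The sketch below, in which `leading_digit_frequencies` is a hypothetical helper name, tallies the leading digits of α^n for α = 2 (log10(2) is irrational) over n = 1, ..., 1000 using exact integer arithmetic, and prints them next to the Benford probabilities. The choice of α and of the range of n is an assumption made only for illustration.

```python
import math
from collections import Counter


def leading_digit_frequencies(base: int, count: int) -> dict:
    """Relative frequency of each leading digit among base**1 .. base**count."""
    tally = Counter(int(str(base ** n)[0]) for n in range(1, count + 1))
    return {d: tally[d] / count for d in range(1, 10)}


observed = leading_digit_frequencies(2, 1000)
for d in range(1, 10):
    # Columns: digit, observed frequency, Benford prediction.
    print(d, round(observed[d], 3), round(math.log10(1 + 1 / d), 3))
```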




The Dynamical System Proposed by V. Arnold

Consider N countries with areas A_1, ..., A_N drawn from some distribution such that

\[ \sum_{i=1}^{N} A_i = 1. \]

At each iteration 1, ..., n, randomly select two countries to merge together and one country to split into two equal parts.

Experimental evidence suggests that for large N and n the areas of countries satisfy the first digit law, irrespective of the initial distribution.
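The iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the names `simulate` and `leading_digit`, the fixed seed, and the choice to start from equal areas 1/N are all illustrative.

```python
import random
from collections import Counter


def simulate(num_countries: int, num_steps: int, seed: int = 0) -> list:
    """Run the merge-and-split iteration starting from equal areas 1/N."""
    rng = random.Random(seed)
    areas = [1.0 / num_countries] * num_countries
    for _ in range(num_steps):
        # Pick three distinct countries: i absorbs j, and k is split in half.
        i, j, k = rng.sample(range(num_countries), 3)
        areas[i] += areas[j]
        half = areas[k] / 2.0
        areas[j] = half  # slot j is reused for one half of country k
        areas[k] = half
    return areas


def leading_digit(x: float) -> int:
    """First significant digit of a positive float, read off scientific notation."""
    return int(f"{x:.15e}"[0])


areas = simulate(1000, 10000)
counts = Counter(leading_digit(a) for a in areas)
for d in range(1, 10):
    print(d, counts[d] / len(areas))
```

Each step keeps both the number of countries and the total area fixed, so the printed frequencies can be compared directly with the Benford probabilities.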




Experimental Evidence

Let N = 1000 and let the initial entries of A all equal 1/N. We generate the entries of A after n = 10000 iterations.

[Figure: fractional part of log10 of country area (vertical axis, 0 to 1) against country index (horizontal axis, 0 to 1000).]




Experimental Evidence

Let N = 1000 and let the initial entries of A all be drawn from an exponential distribution with λ = 1000, normalized so that the entries sum to one. We generate the entries of A after n = 10000 iterations.

[Figure: log10 of country area (vertical axis, −11 to −2) against country index (horizontal axis, 0 to 1000).]

[Figure: fractional part of log10 of country area (vertical axis, 0 to 1) against country index (horizontal axis, 0 to 1000).]




Formalizing the Model

Let A ∈ R^N be a vector such that \sum_{i=1}^{N} A_i = 1. At each iteration, three distinct entries (A_i, A_j, A_k) from A are randomly chosen to form V = (A_i, A_j, A_k)^T. This vector is multiplied on the left by the following matrix:

\[ M = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & \frac{1}{2} \\ 0 & 0 & \frac{1}{2} \end{bmatrix} \]

We are interested in the evolution of the distribution function for the coordinates of A. Brackets on the matrix are in honor of Bif.
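As a worked illustration, applying M to a generic triple gives the merge and the split explicitly and shows that the total area is unchanged. The numerical values in the second line are arbitrary and chosen only for this example.

```latex
M \begin{bmatrix} A_i \\ A_j \\ A_k \end{bmatrix}
  = \begin{bmatrix} A_i + A_j \\ A_k/2 \\ A_k/2 \end{bmatrix},
\qquad
(A_i + A_j) + \frac{A_k}{2} + \frac{A_k}{2} = A_i + A_j + A_k .

% Example with arbitrary values:
M \begin{bmatrix} 0.2 \\ 0.3 \\ 0.1 \end{bmatrix}
  = \begin{bmatrix} 0.5 \\ 0.05 \\ 0.05 \end{bmatrix}.
```

Each column of M sums to one, which is why the constraint that the areas sum to one survives every iteration.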




Density Functions

Suppose A = (x_1, ..., x_N)^T initially. Then the initial density function is given by the following formula:

\[ f_0(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i) \]

Suppose that coordinates x_{j_1}, x_{j_2}, x_{j_3} are chosen. The new coordinates are given by x'_{j_i} = \sum_{k=1}^{3} m_{ik} x_{j_k}. Then the density function after a single iteration is given by the following formula:

\[ f_1(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i) + \frac{1}{N} \sum_{i=1}^{3} \delta(t - x'_{j_i}) - \frac{1}{N} \sum_{i=1}^{3} \delta(t - x_{j_i}) \]




Expectation of Discrete Transitions

By linearity of the Laplace transform, the expectation of the moment generating function is just the moment generating function of the expected distribution.

\[ L\{f_0(t)\} = \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s} \]

\[ E[L\{f_1(t)\}] = \frac{1}{N^3} \sum_{j_1, j_2, j_3} \left( \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s} + \frac{1}{N} \sum_{i=1}^{3} \left( e^{-\sum_{k=1}^{3} m_{ik} x_{j_k} s} - e^{-x_{j_i} s} \right) \right) \]

We then consider the expected discrete transition as the difference E[L{f_1(t)}] − L{f_0(t)}. We can identify a stable distribution for this by setting the difference equal to zero.




Stable Solution

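Let L{f_0(t)} = G_0(s). Some algebraic magic reveals:

\[ E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \sum_{i=1}^{3} \left( \prod_{j=1}^{3} G_0(m_{ij} s) - G_0(s) \right) \qquad [1.1] \]

Here we will substitute in the values from our matrix.

\[ E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \left( G_0^2(s) + 2G_0\!\left(\frac{s}{2}\right) - 3G_0(s) \right) \qquad [1.2] \]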
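Equation [1.2] can be checked by brute force on a small example. The sketch below averages the change in the Laplace transform over all N^3 index triples (drawn with repetition, as in the 1/N^3 sum of the expectation) and compares it with the right-hand side of [1.2]. The function names and the six sample areas are illustrative assumptions, not taken from the slides.

```python
import math
from itertools import product


def laplace(xs, s):
    """Laplace transform of the empirical density (1/N) * sum of delta(t - x_i)."""
    return sum(math.exp(-x * s) for x in xs) / len(xs)


def expected_change_bruteforce(xs, s):
    """Average of L{f_1} - L{f_0} over all N^3 index triples (j1, j2, j3)."""
    n = len(xs)
    total = 0.0
    for a, b, c in product(xs, repeat=3):
        # New coordinates are a + b, c / 2 and c / 2; old ones are a, b, c.
        gained = math.exp(-(a + b) * s) + 2.0 * math.exp(-c * s / 2.0)
        lost = math.exp(-a * s) + math.exp(-b * s) + math.exp(-c * s)
        total += (gained - lost) / n
    return total / n ** 3


def expected_change_formula(xs, s):
    """Right-hand side of [1.2]."""
    g, g_half = laplace(xs, s), laplace(xs, s / 2.0)
    return (g * g + 2.0 * g_half - 3.0 * g) / len(xs)


xs = [0.05, 0.10, 0.15, 0.20, 0.22, 0.28]  # illustrative areas summing to one
for s in (0.5, 1.0, 3.0):
    print(s, expected_change_bruteforce(xs, s), expected_change_formula(xs, s))
```

The model itself selects three distinct entries, so for finite N the formula is an approximation whose relative error shrinks as N grows.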




Stable Solution

Suppose at each iteration we follow the conditional expectation, setting E[G_1(s) | G_0(s)] = G_1(s). Then the stable solution to [1.2] is:

\[ G_\infty(s) = \sum_{i=0}^{\infty} a_i s^i, \qquad a_0 = 1, \quad a_1 = -1, \quad a_n = \frac{\sum_{i=1}^{n-1} a_i a_{n-i}}{1 - 2^{1-n}} \qquad [1.3] \]

Theorem 1: G_∞(s) has a positive radius of convergence.

Proof: Coefficients of this power series grow no faster than exponentially. The coefficients grow as the Catalan numbers do. Thus the power series has a positive radius of convergence.
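The recurrence in [1.3] is easy to evaluate exactly. The following sketch, with hypothetical helper names `stable_coefficients` and `residual`, computes the first coefficients as rational numbers, verifies that they make the coefficient of each power of s in G^2(s) + 2G(s/2) − 3G(s) vanish, and prints the ratios of consecutive coefficients so that their growth can be inspected.

```python
from fractions import Fraction


def stable_coefficients(order: int) -> list:
    """Coefficients a_0 .. a_order from the recurrence in [1.3]."""
    a = [Fraction(1), Fraction(-1)]
    for n in range(2, order + 1):
        conv = sum(a[i] * a[n - i] for i in range(1, n))
        a.append(conv / (1 - Fraction(2) ** (1 - n)))
    return a[: order + 1]


def residual(a: list, n: int) -> Fraction:
    """Coefficient of s^n in G^2 + 2 G(s/2) - 3 G for G = sum a_i s^i."""
    square = sum(a[i] * a[n - i] for i in range(n + 1))
    return square + 2 * a[n] / Fraction(2) ** n - 3 * a[n]


coeffs = stable_coefficients(10)
print([str(c) for c in coeffs[:5]])
# Magnitude ratios |a_{n+1} / a_n| of consecutive coefficients.
print([float(abs(coeffs[n + 1] / coeffs[n])) for n in range(1, 10)])
```

Exact rational arithmetic is used here so that the vanishing of the residuals is not obscured by rounding.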




Continuous Approximation

We define a continuous approximation for these discrete transitions. Define G(s, t) by G(s, 0) = G_0(s) and t = n/N. Let N → ∞ and n → ∞ to define the evolution equation by the following:

\[ \frac{\partial}{\partial t} G(s, t) = G^2(s, t) + 2G\!\left(\frac{s}{2}, t\right) - 3G(s, t) \qquad [1.4] \]

Theorem 2: ‖G_n(s) − G(s, t)‖ < C_t / N for some C_t > 0, where G_n(s) is the discrete iterate after n = tN steps.

Proof: The difference is bounded like the error of the Riemann approximation for the integral of the evolution equation.




Convergence to Stable Solution

Theorem 3: The stable solution G_∞(s) [1.3] is an attractor of the evolution equation [1.4] with a basin containing all analytic G_0(s).

Proof: G(s, t) = \sum_{n=0}^{\infty} (a_n − b_n(t)) s^n, where a_n is defined as in the stable solution. It can be shown that b_n(t) decreases exponentially with t. This follows from the contractive properties of this mapping.
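The decay of b_n(t) can be observed numerically on the first few power series coefficients. The sketch below is an illustration under stated assumptions rather than the proof: it integrates [1.4] coefficient by coefficient with a forward-Euler step, starting from G_0(s) = exp(−s) (all areas equal, rescaled to mean one so that a_1 = −1), and prints the largest gap to the stable coefficients of [1.3] at several times. The step size, the truncation order, and all function names are illustrative choices.

```python
import math


def stable_coefficients(order: int) -> list:
    """Coefficients a_0 .. a_order of the stable solution [1.3]."""
    a = [1.0, -1.0]
    for n in range(2, order + 1):
        conv = sum(a[i] * a[n - i] for i in range(1, n))
        a.append(conv / (1.0 - 2.0 ** (1 - n)))
    return a[: order + 1]


def rhs(g: list) -> list:
    """Coefficient of s^n in G^2 + 2 G(s/2) - 3 G, for n below len(g)."""
    out = []
    for n in range(len(g)):
        square = sum(g[i] * g[n - i] for i in range(n + 1))
        out.append(square + 2.0 * g[n] / 2.0 ** n - 3.0 * g[n])
    return out


def evolve(g0: list, t_end: float, dt: float = 0.01) -> list:
    """Forward-Euler integration of [1.4] on the truncated coefficients."""
    g = list(g0)
    for _ in range(int(round(t_end / dt))):
        g = [gn + dt * dn for gn, dn in zip(g, rhs(g))]
    return g


order = 5
g0 = [(-1.0) ** n / math.factorial(n) for n in range(order + 1)]  # exp(-s)
a = stable_coefficients(order)
for t in (0.0, 5.0, 20.0, 60.0):
    g = evolve(g0, t)
    print(t, max(abs(gn - an) for gn, an in zip(g, a)))
```

Truncating the series loses nothing at the retained orders, because the equation for the coefficient of s^n involves only coefficients of order n and lower.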




Random Discrete Transitions

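Now let’s assume that at each step the areas of countries change randomly, instead of along the expectation. As in [1.2],

\[ E[G_n(s) \mid G_{n-1}(s)] = G_{n-1}(s) + \frac{1}{N} \left( G_{n-1}(s)^2 + 2G_{n-1}\!\left(\frac{s}{2}\right) - 3G_{n-1}(s) \right) \qquad [2.1] \]

If we fix the value of s, {G_n(s)} is a sequence of random variables. Note that E(E[X|Y]) = E(X) and E(X^2) = E(X)^2 + Var(X). Taking expectations in [2.1] gives

\[ E(G_n(s)) = E(G_{n-1}(s)) + \frac{1}{N} \left( E(G_{n-1}(s))^2 + \mathrm{Var}(G_{n-1}(s)) + 2E\!\left(G_{n-1}\!\left(\frac{s}{2}\right)\right) - 3E(G_{n-1}(s)) \right) \qquad [2.2] \]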




Random Discrete Transitions

If this procedure really converges to a unique distribution F with Laplace transform L_F(s), then

\[ G_n(s) \to L_F(s) \ \text{a.s.}, \quad \text{and hence} \quad E(G_n(s)) \to L_F(s). \]

Since L_F(s) is a constant for fixed s, the Continuous Mapping theorem gives

\[ \mathrm{Var}(G_n(s)) \to 0 \qquad [2.3] \]

Equation [2.2] then reduces to the discrete equation [1.2], whose stable solution is [1.3], so

\[ E(G_n(s)) \to G_\infty(s), \quad \text{and therefore} \quad L_F(s) = G_\infty(s). \]

In other words, if this random procedure converges, it must converge to a distribution whose Laplace transform is identical to the stable solution obtained earlier.




Convergence

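Let X^{(i)}_{N,n} represent the area of country i at the nth step, where we have N countries total. Via more algebraic magic, we have:

\[ E\left[X^{(1)}_{N,n+1} \mid F_{N,n}\right] = \left(1 - \frac{2}{N}\right) X^{(1)}_{N,n} + \frac{2}{N} \qquad [2.4] \]

Let F_{N,n} be a filtration defined by:

\[ F_{N,n} = \sigma\left(F_{N,n-1}, X^{(1)}_{N,n}, \ldots, X^{(N)}_{N,n}\right) \]

Then by the following transformation:

\[ Z^{(i)}_{N,n} = \frac{X^{(i)}_{N,n} - 1}{\left(1 - \frac{2}{N}\right)^{n}} \]

We have by [2.4] that:

\[ E\left[Z^{(1)}_{N,n+1} \mid F_{N,n}\right] = Z^{(1)}_{N,n} \qquad [2.5] \]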
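For readers who want the intermediate step, the martingale property [2.5] follows from [2.4] by subtracting one from both sides and dividing by the normalizing factor. The derivation below uses only the definitions given on this slide.

```latex
E\left[Z^{(1)}_{N,n+1} \mid F_{N,n}\right]
  = \frac{E\left[X^{(1)}_{N,n+1} \mid F_{N,n}\right] - 1}{\left(1 - \frac{2}{N}\right)^{n+1}}
  = \frac{\left(1 - \frac{2}{N}\right) X^{(1)}_{N,n} + \frac{2}{N} - 1}{\left(1 - \frac{2}{N}\right)^{n+1}}
  = \frac{\left(1 - \frac{2}{N}\right)\left(X^{(1)}_{N,n} - 1\right)}{\left(1 - \frac{2}{N}\right)^{n+1}}
  = Z^{(1)}_{N,n}.
```

With n = tN, the normalizing factor (1 − 2/N)^n tends to e^{−2t} as N → ∞.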




Convergence

Fix the ratio t = n/N and let n → ∞, N → ∞. This implies:

\[ E\left( \left(Z^{(1)}_{N,n}\right)^{+} \right) < +\infty \quad \text{if } n = t \times N \]

Then by the Martingale Convergence theorem, we know

\[ Z^{(1)}_{N,n} \xrightarrow{P} Z_t \quad \text{and} \quad X^{(1)}_{N,n} \xrightarrow{P} 1 + \frac{Z_t}{e^{2t}} \qquad (n \to +\infty,\ N \to +\infty,\ n = Nt) \]

where Z_t is a random variable with finite mean. We can then establish that the family {Z_t / e^{2t}} will converge in probability to a unique distribution Z_∞ as t → ∞.




Convergence

If we can show that, as t → ∞, the areas become pairwise independent or only weakly dependent, then we can consider the empirical distribution function

\[ F_N(x) = \sum_{i=1}^{N} \frac{1}{N} I_{\{X^{(i)}_{N,n} < x\}}(x) \]

and, by application of the Law of Large Numbers, show that:

\[ F_N(x) \xrightarrow{d} Z_\infty \]

This implies that the areas of countries will converge to some distribution.




Conjecture: Final Distribution is Exponential to a Power

Our conjecture is that the final distribution function is:

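\[ F(x) = 1 - e^{-\lambda x^{1/b}} \]

Where the density function is given by:

\[ f(x) = \frac{\lambda}{b} e^{-\lambda x^{1/b}} x^{\frac{1}{b} - 1} \]

The coefficients of the moment generating function are:

\[ a_n = (-1)^n \frac{\Gamma(nb + 1)}{\lambda^{nb} \, n!} \]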

The numerically approximated value for b is 1.64677.
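The slides do not say how b was approximated. One plausible route, offered here only as an assumption, is to read the coefficients as a_n = (−1)^n Γ(nb + 1) / (λ^{nb} n!) and match them to the first two nontrivial coefficients of the stable solution [1.3]: a_1 = −1 gives λ^b = Γ(b + 1), and a_2 = 2 then gives Γ(2b + 1) = 4 Γ(b + 1)^2. The sketch below solves that last equation by bisection; the function names and the bracketing interval [1, 3] are illustrative choices.

```python
import math


def second_moment_gap(b: float) -> float:
    """log(Gamma(2b+1) / (4 Gamma(b+1)^2)); zero when a_2 = 2 given a_1 = -1."""
    return math.lgamma(2 * b + 1) - 2 * math.lgamma(b + 1) - math.log(4)


def solve_b(lo: float = 1.0, hi: float = 3.0, tol: float = 1e-12) -> float:
    """Bisection for the root of second_moment_gap on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if second_moment_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


b = solve_b()
lam = math.gamma(b + 1) ** (1 / b)  # from a_1 = -1, i.e. lambda**b = Gamma(b + 1)
print(b, lam)
```

Under these assumptions the root lands close to the value quoted above; matching higher coefficients would be a further test of the conjecture and is not attempted here.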



Conjecture: Final Distribution is Exponential to a Power

The red line is composed of points drawn from our conjectured distribution. The blue line shows the country areas from an experiment with n = 500000 and N = 10000, starting from a uniform initial distribution.




Outlook

Can we formalize our procedure to establish weak pairwise dependence between countries for t → ∞?

Can it be shown that this limiting distribution is in fact exponential with the parameter we estimated numerically?

