Areas of Countries and Benford’s Law
Model of a Dynamical System Proposed by V. Arnold

Alex Janke (1), Xiangyu Wang (2)
(1) University of Michigan, Ann Arbor, MI
(2) Peking University, Beijing, China

August 12, 2011

Outline: Introduction, Evolution of Density Functions, Convergence to Distribution, Outlook


Benford’s Law

Numbers from many real-life data sets, particularly those dominated by exponential processes, have leading digits distributed in a non-uniform way.

Let $d \in \{1,2,3,4,5,6,7,8,9\}$ be the leading digit of a number. A set of numbers (or random variable) satisfies Benford’s law if $d$ occurs with frequency (probability) given by the following:

\[ P(d) = \log_{10}(d+1) - \log_{10}(d) = \log_{10}\left(1 + \frac{1}{d}\right) \]
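As a quick numerical check of the formula above, the following sketch tabulates $P(d)$ for each leading digit. The function name `benford_probability` is illustrative and does not come from the slides.

```python
import math

def benford_probability(d):
    """Probability that the leading digit equals d under Benford's law."""
    if d not in range(1, 10):
        raise ValueError("leading digit must be one of 1..9")
    return math.log10(1 + 1 / d)

# Print the predicted frequency of each leading digit.
for d in range(1, 10):
    print(d, round(benford_probability(d), 4))
```

The nine probabilities telescope to $\log_{10}(10) = 1$, so they form a valid distribution.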


Benford’s Law

The following shows the frequency of leading digits predicted by Benford’s law.


Benford’s Law

The following shows the leading-digit frequencies for the populations of 237 countries. The dots denote the frequencies predicted by Benford’s law.


Rationalizing Benford’s Law

The set $\{\alpha^n \mid n \in \mathbb{Z} \text{ and } \log_{10}(\alpha) \notin \mathbb{Q}\}$ satisfies Benford’s law. This is a consequence of the equidistribution theorem.

A continuous random variable $X$ for which the fractional part of $\log_{10} X$ is uniformly distributed on $[0,1)$ will satisfy Benford’s law.

A continuous random variable $X$ with a lognormal distribution will satisfy Benford’s law increasingly well as its second moment approaches infinity.
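The first statement can be illustrated with $\alpha = 2$, since $\log_{10} 2$ is irrational. The sketch below counts leading digits of the first 1000 powers of 2 using exact integers and prints them next to the Benford frequencies. The helper name `leading_digit_frequencies` is an assumption made for this example.

```python
import math
from collections import Counter

def leading_digit_frequencies(base, count):
    """Leading-digit frequencies of base**1, ..., base**count (exact integers)."""
    digits = Counter(int(str(base ** n)[0]) for n in range(1, count + 1))
    return {d: digits[d] / count for d in range(1, 10)}

freqs = leading_digit_frequencies(2, 1000)
for d in range(1, 10):
    # observed frequency, then the Benford prediction log10(1 + 1/d)
    print(d, f"{freqs[d]:.3f}", f"{math.log10(1 + 1 / d):.3f}")
```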


The Dynamical System Proposed by V. Arnold

Consider $N$ countries with areas $A_1, \dots, A_N$ drawn from some distribution such that $\sum_{i=1}^{N} A_i = 1$.

At each iteration $1, \dots, n$, randomly select two countries to merge together and one country to split into two equal parts.

Experimental evidence suggests that for large $N$ and $n$ the areas of countries satisfy the first digit law, irrespective of the initial distribution.
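The iteration described above can be sketched as follows. This is a minimal simulation under stated assumptions, not the authors' code: the three countries are drawn uniformly and without replacement, the merged country is stored in the first slot, and the two halves of the split country occupy the other two slots so that the number of countries stays at $N$. The names `merge_split_step` and `simulate` are hypothetical.

```python
import random

def merge_split_step(areas, rng):
    """One iteration: merge two random countries, split a third into equal halves."""
    i, j, k = rng.sample(range(len(areas)), 3)  # three distinct indices
    merged = areas[i] + areas[j]
    half = areas[k] / 2.0
    # The merge frees one slot, which the second half of the split country fills.
    areas[i], areas[j], areas[k] = merged, half, half

def simulate(initial_areas, n_iterations, seed=0):
    """Run n_iterations merge/split steps and return the final list of areas."""
    rng = random.Random(seed)
    areas = list(initial_areas)
    for _ in range(n_iterations):
        merge_split_step(areas, rng)
    return areas

N = 1000
areas = simulate([1.0 / N] * N, n_iterations=10_000)
print(len(areas), sum(areas))  # the count and the total area are both conserved
```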


Experimental Evidence

Let $N = 1000$ and the initial entries of $A$ are all $\frac{1}{N}$. We generate the entries of $A$ after $n = 10000$ iterations.

[Figure: fractional part of $\log_{10}$ of country area (0 to 1) versus country index (0 to 1000).]


Experimental Evidence

Let $N = 1000$ and the initial entries of $A$ are all drawn from an exponential distribution with $\lambda = 1000$ such that the sum of the entries is normalized to one. We generate the entries of $A$ after $n = 10000$ iterations.

[Figure, two panels: $\log_{10}$ of country area (−11 to −2) versus country index (0 to 1000); fractional part of $\log_{10}$ of country area (0 to 1) versus country index (0 to 1000).]
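The quantity plotted in these experiments, the fractional part of $\log_{10}$ of the area, determines the leading digit: the digit is $d$ exactly when the fractional part lies in $[\log_{10} d, \log_{10}(d+1))$. A small sketch of that bookkeeping follows. The function names are illustrative, and the `min(9, ...)` guard is an added safeguard against floating-point rounding.

```python
import math
from collections import Counter

def log10_fractional_part(x):
    """Fractional part of log10(x), in [0, 1), also valid for 0 < x < 1."""
    y = math.log10(x)
    return y - math.floor(y)

def leading_digit(x):
    """Leading digit of a positive number, read off the fractional part of log10."""
    return min(9, int(10 ** log10_fractional_part(x)))

def leading_digit_histogram(values):
    """Relative frequency of each leading digit 1..9 in a list of positive numbers."""
    counts = Counter(leading_digit(v) for v in values)
    return {d: counts[d] / len(values) for d in range(1, 10)}

print(leading_digit_histogram([0.00234, 7.5e-5, 0.21, 250.0]))
```

Applied to the areas produced by a simulation, `leading_digit_histogram` can be compared directly with $\log_{10}(1 + 1/d)$.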


Formalizing the Model

Let $A \in \mathbb{R}^N$ be a vector such that $\sum_{i=1}^{N} A_i = 1$. At each iteration, three distinct entries $(A_i, A_j, A_k)$ from $A$ are randomly chosen to form $V = (A_i, A_j, A_k)^T$. This vector is multiplied on the left by the following matrix:

\[ M = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & \tfrac{1}{2} \\ 0 & 0 & \tfrac{1}{2} \end{bmatrix} \]

We are interested in the evolution of the distribution function for the coordinates of $A$. Brackets on the matrix are in honor of Bif.
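Carrying out the product explicitly shows how $M$ encodes one merge and one split, and that the total area of the three chosen countries is preserved:

```latex
\[
M V =
\begin{bmatrix} 1 & 1 & 0 \\ 0 & 0 & \tfrac12 \\ 0 & 0 & \tfrac12 \end{bmatrix}
\begin{bmatrix} A_i \\ A_j \\ A_k \end{bmatrix}
=
\begin{bmatrix} A_i + A_j \\ \tfrac12 A_k \\ \tfrac12 A_k \end{bmatrix},
\qquad
(A_i + A_j) + \tfrac12 A_k + \tfrac12 A_k = A_i + A_j + A_k .
\]
```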


Density Functions

Suppose $A = (x_1, \dots, x_N)^T$ initially. Then the initial density function is given by the following formula:

\[ f_0(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i) \]

Suppose that coordinates $x_{j_1}, x_{j_2}, x_{j_3}$ are chosen. The new coordinates are given by $x'_{j_i} = \sum_{k=1}^{3} m_{ik} x_{j_k}$. Then the density function after a single iteration is given by the following formula:

\[ f_1(t) = \frac{1}{N} \sum_{i=1}^{N} \delta(t - x_i) + \frac{1}{N} \sum_{i=1}^{3} \delta(t - x'_{j_i}) - \frac{1}{N} \sum_{i=1}^{3} \delta(t - x_{j_i}) \]


Expectation of Discrete Transitions

By linearity of the Laplace transform, the expectation of the moment generating function is just the moment generating function of the expected distribution.

\[ \mathcal{L}\{f_0(t)\} = \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s} \]

\[ E[\mathcal{L}\{f_1(t)\}] = \frac{1}{N^3} \sum_{j_1, j_2, j_3} \left( \frac{1}{N} \sum_{i=1}^{N} e^{-x_i s} + \frac{1}{N} \sum_{i=1}^{3} \left( e^{-\sum_{k=1}^{3} m_{ik} x_{j_k} s} - e^{-x_{j_i} s} \right) \right) \]

We then consider the expected discrete transition as the difference $E[\mathcal{L}\{f_1(t)\}] - \mathcal{L}\{f_0(t)\}$. We can identify a stable distribution for this by setting the difference equal to zero.


Stable Solution

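Let $\mathcal{L}\{f_0(t)\} = G_0(s)$. Some algebraic magic reveals:

\[ E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \sum_{i=1}^{3} \left( \prod_{j=1}^{3} G_0(m_{ij} s) - G_0(s) \right) \qquad [1.1] \]

Here we will substitute in the values from our matrix. Since $G_0(0) = 1$, the first row of $M$ contributes $G_0^2(s)$ and each of the other two rows contributes $G_0\left(\frac{s}{2}\right)$.

\[ E[G_1(s) \mid G_0(s)] - G_0(s) = \frac{1}{N} \left( G_0^2(s) + 2 G_0\left(\frac{s}{2}\right) - 3 G_0(s) \right) \qquad [1.2] \]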
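Equation [1.2] can be checked by brute force on a small example. The sketch below averages the change in the Laplace transform over all $N^3$ index triples, following the $\frac{1}{N^3}\sum_{j_1,j_2,j_3}$ average on the earlier slide (which lets indices repeat), and compares the result with the closed form. The area values and the function names are made up for illustration.

```python
import itertools
import math

# Rows of the merge/split matrix M from the slides.
M = [(1.0, 1.0, 0.0), (0.0, 0.0, 0.5), (0.0, 0.0, 0.5)]

def laplace_of_points(xs, s):
    """G0(s): Laplace transform of (1/N) * sum_i delta(t - x_i)."""
    return sum(math.exp(-x * s) for x in xs) / len(xs)

def expected_transition(xs, s):
    """Average of L{f1} - L{f0} over all N^3 index triples (j1, j2, j3)."""
    n = len(xs)
    total = 0.0
    for triple in itertools.product(range(n), repeat=3):
        chosen = [xs[j] for j in triple]
        for row, old in zip(M, chosen):
            new = sum(m * x for m, x in zip(row, chosen))
            total += (math.exp(-new * s) - math.exp(-old * s)) / n
    return total / n ** 3

def closed_form(xs, s):
    """Right-hand side of [1.2]."""
    g = laplace_of_points(xs, s)
    return (g * g + 2.0 * laplace_of_points(xs, s / 2.0) - 3.0 * g) / len(xs)

xs = [0.1, 0.4, 0.2, 0.3]
print(expected_transition(xs, 1.7), closed_form(xs, 1.7))
```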


Stable Solution

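Suppose at each iteration we follow the conditional expectation, with $E[G_1(s) \mid G_0(s)] = G_1(s)$. Then the stable solution to [1.2] is:

\[ G_\infty(s) = \sum_{i=0}^{\infty} a_i s^i, \quad a_0 = 1, \quad a_1 = -1, \quad a_n = \frac{\sum_{i=1}^{n-1} a_i a_{n-i}}{1 - 2^{1-n}} \ (n \ge 2) \qquad [1.3] \]

Theorem 1: $G_\infty(s)$ has a positive radius of convergence.

Proof: The coefficients satisfy a Catalan-type recurrence. Since $1/(1 - 2^{1-n}) \le 2$ for $n \ge 2$, we have $|a_n| \le 2 \sum_{i=1}^{n-1} |a_i| |a_{n-i}|$, and hence $|a_n| \le 2^{n-1} C_{n-1} \le 8^{n-1}$, where $C_m$ is the $m$th Catalan number. The coefficients therefore grow no faster than exponentially, and the power series has a positive radius of convergence.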
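The recursion in [1.3] is easy to evaluate exactly. The sketch below uses rational arithmetic so that no rounding enters; `stable_coefficients` is an illustrative name.

```python
from fractions import Fraction

def stable_coefficients(order):
    """Coefficients a_0..a_order of the stable solution [1.3], as exact fractions."""
    a = [Fraction(1), Fraction(-1)]
    for n in range(2, order + 1):
        convolution = sum(a[i] * a[n - i] for i in range(1, n))
        a.append(convolution / (1 - Fraction(2) ** (1 - n)))
    return a[: order + 1]

coeffs = stable_coefficients(8)
for n, a_n in enumerate(coeffs):
    print(n, a_n, float(a_n))
```

The first few values are $a_2 = 2$, $a_3 = -16/3$ and $a_4 = 352/21$, and each coefficient makes the $s^n$ term of $G^2(s) + 2G(s/2) - 3G(s)$ vanish.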


Continuous Approximation

We define a continuous approximation for these discrete transitions. Define $G(s, t)$ by $G(s, 0) = G_0(s)$ and $t = \frac{n}{N}$. Let $N \to \infty$ and $n \to \infty$ to define the evolution equation by the following:

\[ \partial_t G(s, t) = G^2(s, t) + 2 G\left(\frac{s}{2}, t\right) - 3 G(s, t) \qquad [1.4] \]

Theorem 2: For fixed $t = n/N$, the discrete iterate $G_n(s)$ and the continuous solution $G(s, t)$ satisfy $\| G_n(s) - G(s, t) \| < \frac{C_t}{N}$ for some $C_t > 0$.

Proof: The difference is bounded like the error of the Riemann approximation for the integral of the evolution equation.
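One way to read the scaling $t = n/N$: if every iteration follows its conditional expectation, the discrete update [1.2] is a forward Euler step of [1.4] with step size $\Delta t = 1/N$. Here $G_n$ is shorthand for the transform after $n$ iterations.

```latex
\[
G_{n+1}(s) = G_n(s) + \frac{1}{N}\left( G_n^2(s) + 2 G_n\!\left(\tfrac{s}{2}\right) - 3 G_n(s) \right)
\quad\Longleftrightarrow\quad
\frac{G_{n+1}(s) - G_n(s)}{\Delta t} = G_n^2(s) + 2 G_n\!\left(\tfrac{s}{2}\right) - 3 G_n(s),
\qquad \Delta t = \frac{1}{N}.
\]
```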


Convergence to Stable Solution

Theorem 3: The stable solution $G_\infty(s)$ [1.3] is an attractor of the evolution equation [1.4] with a basin containing all analytic $G_0(s)$.

Proof: Write $G(s, t) = \sum_{n=0}^{\infty} (a_n - b_n(t)) s^n$, where $a_n$ is defined as in the stable solution. It can be shown that $b_n(t)$ decreases exponentially with $t$. This follows from the contractive properties of this mapping.
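The decay of $b_n(t)$ can be observed numerically. Matching powers of $s$ in [1.4] gives one ordinary differential equation per coefficient, and the equation for the $n$th coefficient involves only coefficients of index at most $n$, so truncating the series introduces no error in the coefficients kept. The sketch below integrates these equations with forward Euler, starting from $G_0(s) = e^{-s}$ (all areas equal, rescaled to mean 1, which is an assumption of this example), and compares the late-time coefficients with [1.3]. All function names are illustrative.

```python
from math import factorial

def stable_coeffs(order):
    """Coefficients of the stable solution [1.3] in floating point."""
    a = [1.0, -1.0]
    for n in range(2, order + 1):
        conv = sum(a[i] * a[n - i] for i in range(1, n))
        a.append(conv / (1.0 - 2.0 ** (1 - n)))
    return a[: order + 1]

def rhs(c):
    """Coefficient of s^n in G^2(s) + 2 G(s/2) - 3 G(s), for each n."""
    out = []
    for n in range(len(c)):
        conv = sum(c[i] * c[n - i] for i in range(n + 1))
        out.append(conv + 2.0 ** (1 - n) * c[n] - 3.0 * c[n])
    return out

def evolve(c0, t_end, dt):
    """Forward Euler integration of the coefficient equations up to time t_end."""
    c = list(c0)
    for _ in range(int(round(t_end / dt))):
        c = [ci + dt * di for ci, di in zip(c, rhs(c))]
    return c

order = 6
c0 = [(-1.0) ** n / factorial(n) for n in range(order + 1)]  # G0(s) = exp(-s)
late = evolve(c0, t_end=60.0, dt=0.01)
gap = max(abs(x - y) for x, y in zip(late, stable_coeffs(order)))
print(gap)  # distance between the evolved and the stable coefficients
```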


Random Discrete Transitions

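Now assume that at each step the areas of countries change randomly, instead of along the expectation. Applying [1.2] at step $n$ gives

\[ E[G_n(s) \mid G_{n-1}(s)] = G_{n-1}(s) + \frac{1}{N} \left( G_{n-1}(s)^2 + 2 G_{n-1}\left(\frac{s}{2}\right) - 3 G_{n-1}(s) \right) \qquad [2.1] \]

If we fix the value of $s$, $\{G_n(s)\}$ is a sequence of random variables. Note that $E(E[X \mid Y]) = E(X)$ and $E(X^2) = E(X)^2 + \mathrm{Var}(X)$. Taking expectations of [2.1]:

\[ E(G_n(s)) = E(G_{n-1}(s)) + \frac{1}{N} \left( E(G_{n-1}(s))^2 + \mathrm{Var}(G_{n-1}(s)) + 2 E\left(G_{n-1}\left(\frac{s}{2}\right)\right) - 3 E(G_{n-1}(s)) \right) \qquad [2.2] \]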


Random Discrete Transitions

If this procedure really converges to a unique distribution $F$ that has Laplace transform $\mathcal{L}_F(s)$, then

\[ G_n(s) \to \mathcal{L}_F(s) \ \text{a.s., and hence} \ E(G_n(s)) \to \mathcal{L}_F(s). \]

Since $\mathcal{L}_F(s)$ is a constant for fixed $s$, the Continuous Mapping theorem gives

\[ \mathrm{Var}(G_n(s)) \to 0 \qquad [2.3] \]

Equation [2.2] then becomes identical to the expected discrete transition [1.2], whose stable solution is [1.3]:

\[ E(G_n(s)) \to G_\infty(s), \ \text{and therefore} \ \mathcal{L}_F(s) = G_\infty(s). \]

In other words, if this random procedure converges, it must converge to a distribution whose Laplace transform is identical to the stable solution obtained above.


Convergence

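Let $X^{(i)}_{N,n}$ represent the area of country $i$ at the $n$th step, where we have $N$ countries total. Via more algebraic magic, we have:

\[ E[X^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n}] = \left(1 - \frac{2}{N}\right) X^{(1)}_{N,n} + \frac{2}{N} \qquad [2.4] \]

Let $\mathcal{F}_{N,n}$ be a filtration defined by:

\[ \mathcal{F}_{N,n} = \sigma\left(\mathcal{F}_{N,n-1}, X^{(1)}_{N,n}, \dots, X^{(N)}_{N,n}\right) \]

Then by the following transformation:

\[ Z^{(i)}_{N,n} = \frac{X^{(i)}_{N,n} - 1}{\left(1 - \frac{2}{N}\right)^n} \]

We have by [2.4] that:

\[ E[Z^{(1)}_{N,n+1} \mid \mathcal{F}_{N,n}] = Z^{(1)}_{N,n} \qquad [2.5] \]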
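The step from [2.4] to [2.5] can be spelled out by subtracting 1 from both sides of [2.4] and dividing by $(1 - 2/N)^{n+1}$. The fixed point $X = 1$ of [2.4] suggests that areas are rescaled to have mean 1 on this slide; that reading is an assumption.

```latex
\[
E\left[X^{(1)}_{N,n+1} - 1 \,\middle|\, \mathcal{F}_{N,n}\right]
= \left(1 - \tfrac{2}{N}\right) X^{(1)}_{N,n} + \tfrac{2}{N} - 1
= \left(1 - \tfrac{2}{N}\right)\left(X^{(1)}_{N,n} - 1\right)
\]
\[
E\left[Z^{(1)}_{N,n+1} \,\middle|\, \mathcal{F}_{N,n}\right]
= \frac{\left(1 - \tfrac{2}{N}\right)\left(X^{(1)}_{N,n} - 1\right)}{\left(1 - \tfrac{2}{N}\right)^{n+1}}
= Z^{(1)}_{N,n}
\]
```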


Convergence

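Fix the ratio $t = n/N$ and let $n \to \infty$, $N \to \infty$. This implies:

\[ E\left(Z^{(1)+}_{N,n}\right) < +\infty \quad \text{if } n = t \times N \]

Then by the Martingale Convergence theorem, we know

\[ Z^{(1)}_{N,n} \xrightarrow{P} Z_t \quad \text{and} \quad X^{(1)}_{N,n} \xrightarrow{P} 1 + \frac{Z_t}{e^{2t}}, \qquad n \to +\infty, \ N \to +\infty, \ n = Nt \]

where $Z_t$ is a random variable with finite mean. We can then establish that the family $\left\{ \frac{Z_t}{e^{2t}} \right\}$ will converge in probability to a unique distribution $Z_\infty$ as $t \to \infty$.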
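The factor $e^{2t}$ comes from the normalization $(1 - 2/N)^n$ with $n = tN$, which tends to $e^{-2t}$ as $N$ grows. A short numerical check, with `decay_factor` as an illustrative name:

```python
import math

def decay_factor(N, t):
    """(1 - 2/N)^n with n = t*N, the factor multiplying Z in X = 1 + Z * factor."""
    n = int(round(t * N))
    return (1.0 - 2.0 / N) ** n

for N in (10, 100, 10_000, 1_000_000):
    # second column approaches the third as N grows
    print(N, decay_factor(N, 1.0), math.exp(-2.0))
```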


Convergence

If we can show that, as $t \to \infty$, the areas become pairwise independent or only weakly dependent, then we can work with the empirical distribution given by:

\[ F_N(x) = \sum_{i=1}^{N} \frac{1}{N} I_{\{X^{(i)}_{N,n} < x\}}(x) \]

Then, by application of the Law of Large Numbers, we can show that:

\[ F_N(x) \xrightarrow{d} Z_\infty \]

This implies that the areas of countries will converge to some distribution.
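The empirical distribution $F_N$ is straightforward to compute from a list of areas. A minimal sketch, keeping the strict inequality from the definition; `empirical_cdf` is an illustrative name.

```python
from bisect import bisect_left

def empirical_cdf(samples):
    """Return F_N, where F_N(x) = (number of samples strictly below x) / N."""
    ordered = sorted(samples)
    n = len(ordered)

    def F(x):
        # bisect_left counts the entries of the sorted list that are < x
        return bisect_left(ordered, x) / n

    return F

F = empirical_cdf([0.4, 0.1, 0.3, 0.2])
print(F(0.25))
```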


Conjecture: Final Distribution is Exponential to a Power

Our conjecture is that the final distribution function is:

\[ F(x) = 1 - e^{-\lambda x^{1/b}} \]

where the density function is given by:

\[ f(x) = \frac{\lambda}{b} \, e^{-\lambda x^{1/b}} \, x^{\frac{1}{b} - 1} \]

The coefficients of the moment generating function are:

\[ a_n = \frac{(-1)^n \, \Gamma(nb + 1)}{\lambda^{nb} \, n!} \]

The numerically approximated value for $b$ is 1.64677.
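The slides do not say how $b$ was approximated. One plausible route, offered here purely as an assumption, is to match the first two nontrivial coefficients of the conjectured distribution to the stable solution [1.3], namely $a_1 = -1$ and $a_2 = 2$. With $a_n = (-1)^n \Gamma(nb+1) / (\lambda^{nb} n!)$, the first condition gives $\lambda^b = \Gamma(b+1)$ and the second reduces to $\Gamma(2b+1) / \Gamma(b+1)^2 = 4$, which the sketch below solves by bisection.

```python
import math

def moment_ratio_gap(b):
    """log(Gamma(2b+1) / Gamma(b+1)^2) - log(4); zero when a_1 = -1 and a_2 = 2."""
    return math.lgamma(2 * b + 1) - 2 * math.lgamma(b + 1) - math.log(4.0)

def solve_b(lo=1.0, hi=2.0, tol=1e-12):
    """Bisection for the root of moment_ratio_gap on [lo, hi]."""
    assert moment_ratio_gap(lo) < 0 < moment_ratio_gap(hi)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_ratio_gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

b = solve_b()
lam = math.gamma(b + 1) ** (1.0 / b)  # makes a_1 = -1, i.e. mean area 1
print(b, lam)
```

Under these assumptions the root lands close to the value quoted on the slide.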


Conjecture: Final Distribution is Exponential to a Power

The red line is composed of points drawn from our conjectured distribution. The blue line shows the country areas from an experiment with $n = 500000$ and $N = 10000$ and a uniform initial distribution.
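Points from the conjectured distribution can be drawn by inverting $F(x) = 1 - e^{-\lambda x^{1/b}}$, which gives $x = (-\ln(1-u)/\lambda)^b$ for uniform $u$. The sketch below does this with the quoted $b$. The slides give no value for $\lambda$, so it is chosen here so that the mean area is 1; that choice and all names are assumptions of this example.

```python
import math
import random

B = 1.64677                           # value quoted on the slides
LAM = math.gamma(B + 1) ** (1.0 / B)  # assumption: fixes the mean area at 1

def cdf(x, lam=LAM, b=B):
    """Conjectured distribution function F(x) = 1 - exp(-lam * x^(1/b))."""
    return 1.0 - math.exp(-lam * x ** (1.0 / b))

def inverse_cdf(u, lam=LAM, b=B):
    """Inverse of the conjectured distribution function, for 0 <= u < 1."""
    return (-math.log(1.0 - u) / lam) ** b

def sample_areas(count, seed=0):
    """Inverse-transform samples from the conjectured distribution."""
    rng = random.Random(seed)
    return [inverse_cdf(rng.random()) for _ in range(count)]

areas = sorted(sample_areas(10_000))
print(areas[0], areas[len(areas) // 2], areas[-1])  # min, median, max
```

Sorting the samples and plotting them against their rank gives a curve that can be set beside sorted simulated areas, rescaled to the same mean.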


Outlook

Can we formalize our procedure to establish weak pairwise dependence between countries for $t \to \infty$?

Can it be shown that this limiting distribution is in fact exponential with the parameter we estimated numerically?
