The concept of Dispersiondocencia.ac.upc.edu/master/CMARC/ConceptOfVariance.pdf · 2 The concept • Law of large numbers tells us that the average result will -> E(X) • What can

1

The concept of Dispersion

2

The concept

• Law of large numbers tells us that the average result will -> E(X)

• What can dispersion can we expect arround E(X)

• Does it have a sense to use the concept of dispersion?

3

The concept

• Given a set of data or a functional distribution we would like to have a number for comparing dispersions.

Margin that embraces 50 % of all points

Extreme points

X XX X

4

The concept


Measure of curvature: Second degree polinomial

Mean distance to the center of mass of the distribution

5

The measures

• Standard deviation

• Mean absolute distance

( )( ) ( )∑

∑

∈

∈

=−=−=

===

Xx

Xx

xXPxXEX

xXxPXEX

)()(

)()()(

222 µµσ

µ

µ

σ

( ) ∑

∑

∈

∈

=−=−=

===

Xx

Xx

xXPXXEXMAD

xXxPXEX

)()(

)()()(

µµ

µ

6

The measures

• Standard deviation• Justification:

– Samples far from the mean increase quadraticallythe value of the standard deviation.

– Is the value of the curvature of the log gausian.

( )( ) ( )∑∈

=−=−=Xx

xXPxXEX )()( 222 µµσ

7

The measures• Standard deviation• Meaning of the curvature of a plane curve:

– Radius of a circle that aproximates locally the curve.

( )2

2

21

)( σµ

σπ

−−

=x

exp( ) )

21log())(log( 2

2

σπσµ +−−= xxp

2

))(log(1

σ==

dxxpd

Radius

8

The measures• Standard deviation• Meaning of the curvature of a plane curve:

– Certainty=low dispersion->Small radius->small variance

– Uncertainty=high dispersion->big radius->big variance

( )2

2

21)( σ

µ

σπ

−−

=x

exp( )

)21

log())(log( 2

2

σπσµ

+−

−=x

xp

2

))(log(1

σ==

dxxpd

Radius

9

The measures• Standard deviation

– Certainty=low dispersion->Small radius->small variance

– Uncertainty=high dispersion->big radius->big variance

65.0

))(log(1 2

==

==

σσ

σ

dxxpd

Radius

Cramer-Rao and statistics

10

Information about Gausian distributions.

• % of the cases for a given dispersion

%952 →± σµ%68→±σµ

%993 →± σµ

Mensa

11

The measures

• Mean absolute distance

( ) ∑

∑

∈

∈

=−=−=

===

Xx

Xx

xXPXXEXMAD

xXxPXEX

)()(

)()()(

µµ

µ

Samples far from the mean increase linearly the value of the standard deviation.Note: outliers do not count as much as in the standard deviation.

12

The measures• Mean absolute distance

– Uses: measure of dispersion for distributions with longer tail than gausians.

( ) ∑∈

=−=−=Xx

xXPXXEXMAD )()( µµ

Laplace vs. GausianNote: That high values are much more probable in the case of an exponential random variable

13

The measures• Mean absolute distance

– Outliers do not change so much the value of the estimation.

( ) ∑∈

=−=−=Xx

xXPXXEXMAD )()( µµ

14

The concept


Margin that embraces 50 % of all points

Extreme points

X XX X

15

The Box Plot• Gives information about the dispersion

summarizing the information about:– The interquantile size. x25-x75

– Median– Sample nearest to the 1.5 times the interquantile margin – The outliers (points >1.5 IQR)

• Examples:100 points

Gausian vs. geometric

16

Description of a random variable

• Gausian( )

2

2

21

)( σµ

σπ

−−

=x

exp

17


• Geometric L1,2,3,ifor )1()( 1 =−== − ppiXP i

18


• Negative binomial

19


• Poisson

20

Some intuitions

• The flaw of averages

http://www.stanford.edu/%7Esavage/flaw/Article.htm

Markowitz’s Idea:•Introduce variability when assesing the value of an asset.

•Maximize mean, minimizing variance.

What is better?:A-Mean benefit of 500 plus minus 400B-Mean benefit of 200 plus minus 50

21

The flaw of averages• An investment problem

– Suppose you want your $200,000 retirement fund invested in the Standard & Poor's 500 index to last 20 years. How much can you withdraw per year?

– The return of the S&P has varied over the years but has averaged about 14 percent per year since 1952.

– If you do this you will be pleased to find that you can withdraw $32,000 per year.

1)1()1(

0)1()1(

20

20

19

0

20

−++

=

=+−+ ∑=

rrrA

x

rxrAk

k

%14000.200

==

rA

22

The flaw of averages

• Model of the return:– r % with probability p– Histogram of the return– Average value r, but can fluctuate.

• Sometimes gives benefits• Sometimes losses

– Note that each month a fixed quantity is substracted independently of r

23

The flaw of averages• Simulations on real data:

Start: 1976 Avg. Return 15.3% Tanks in 10 yrs

Start: 1975 Avg. Return 15.4% Tanks in 13 yrs.

Start:1974 Avg Return 15.4% Goes the distance.

Start: 1973 Avg. Return 14% Tanks in 8 yrs.

The Flaw of AveragesBY SAM SAVAGE

Published Sunday, October 8, 2000, in the San Jose Mercury News

24

The flaw of averages• Model of the return:

– r % with probability p– (1+f)r % with probability (1-p)/2– (1-f)r % with probability (1-p)/2

• Simulation.– 4.000.000 runs.

Distribution of the invested capital

0

0.05

0.1

0.15

0.2

0.25

0.3

0.35

0.4

00-1

010

-2020

-3030

-4040

-5050

-6060

-7070

-8080

-90

90-10

0>1

00

Capital (1000s)

frac

tion

p=0.5,f=0.2 p=0.8,f=0.1

Taken fromTijms, Understanding probability

25

Tchebychev Inequality

• A bound on the upper probability.

{ } 2

2

)(c

cXEXPσ

≤>−

26


• Geometrical meaning of { } 2

2

)(c

cXEXPσ

≤>−

2

1)(

ccf ∝

27


• Geometrical meaning of

• What happens with?

{ } 2

2

)(c

cXEXPσ

≤>−

2

1)(

ccf ∝

( )211

)(x

xXP+

==π

αxxXP

1)( ==

Only valid on distributions that have finite variance!

28


• For a given probability distribution of a random variable X, with finite variance we have:

• for any k>0 or equivalently

{ } 2

2

)(c

cXEXPσ

≤>−

{ } 2

1)(

kkXEXP ≤>− σ

29

Tchebychev Inequality• Proof

– Given an ordered set– We define the subset

– then( ) ( )

{ }cXEXPc

pcpXExpXExAk

kkAk

kkk

k

>−≥

≥−≥−= ∑∑∑∈∈

)(

)()(

22

2222

σ

σ

{ }L,,,,, 54321 xxxxx

{ }cXExkA k >−= )(

cXExk >− )(

30


• Note that the inequality can be rough, and highly inexact for high values of c

• Uses:– Information theory, bounds on probabilities

and events highly unprobable.

{ }{ }

{ }

{ }

( )

( )

( )

( )

P HHHH HHHH

P FHHH HHHH

P HFFH FFHH

P FFFF FFFF

L

L

M

L

M

L

Source of

Binary data

Observed Sequences

{ } { }( ) ( ) 1/2nP HFF HFF P FHF FHH= = =L L L

{ }

{ }

{ }

{ }

12

2

/22

12

n

n

n

n

HHHH HHHH

nFHHH HHHH

n

nHFFH FFHH

FFFF FFFF

→

→

→

→

L

L

M

L

M

L

Observed symbol frequency

?

?

31

Random variables without variance

• Family known as – Pareto Stable or Mandelbrot Levy

• Models:– Internet traffic– Processes in unix systems– Speculative prices/Pluviometric Data

32

Speculative Prices

• Mandelbrot’s paper on long tail densities– An interesting result

http://classes.yale.edu/fractals/Panorama/ManuFractals/Internet/Internet4.html

33

Syndrom of infinite variance

• Mandelbrot’s paper on long tail densities

( )( ) ( )∑

∑

∈

∈

=−=−=

===

Xx

Xx

xXPxXEX

xXxPXEX

)()(

)()()(

222 µµσ

µ

αxxXP

1)( ==

Note that for alfa<2 the sum diverges

34

( )2

2

21

)( σ

µ

σπ

−−

=x

exp( ) )

21log())(log(

2

2

σπσµ +−−= xxp

Documents

The concept of Dispersiondocencia.ac.upc.edu/master/CMARC/ConceptOfVariance.pdf · 2 The concept • Law of large numbers tells us that the average result will -> E(X) • What can