13
Statistics with Excel Examples

Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

  • Upload
    lytuong

  • View
    228

  • Download
    2

Embed Size (px)

Citation preview

Page 1: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Statistics with Excel Examples

Page 2: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Questions

• What is a probability density function (pdf)?

• What is a cumulative density function (cdf)?

• What is an inverse cdf?

• Examples of distributions:

Uniform, Normal, Beta, Gamma, ChiSquare

• Random Numbers

What is a uniform distribution?

The Excel worksheet function rand()

• Synthesis of distributions.

• What is an “Order Statistic”?

Role of Beta distribution.

Synthesis of order statistic distributions.

January 31, 2012 Statistics with Excel Examples, G. Shirley 2

Page 3: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Synthesis of Distributions

• Consider a cdf, F.

Probability as a function of some distributed variable.

Examples of F: BETADIST, GAMMADIST,..

• We want a collection of x’s, x1, x2, x3,…xi,… distributed

according to F.

• How to do it?

Generate a random number from the uniform distribution on

[0,1] and plug into the inverse cdf.

Examples

January 31, 2012 Statistics with Excel Examples, G. Shirley 3

( )P F x

1

i ix F U

x=BETAINV(rand(),Alpha,Beta)

x=NORMSINV(rand()) Distributed normally, with mean 0, and sd = 1.

Page 4: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Order Statistics

• Sample n numbers from a distribution, F.

• Pick the kth smallest

k=1 is the smallest.

k=n is the largest.

• Do this many times.

• How is the k:n distributed?

It is the k:n (“k of n”) order statistic of F.

The 1:1 order statistic is F itself.

January 31, 2012 Statistics with Excel Examples, G. Shirley 4

Page 5: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Synthesis of Order Statistics

• Mental furniture (just know it):

The k:n order statistic of the uniform distribution is the Beta

distribution with Alpha = k, Beta = n+k-1.

• To generate numbers distributed according to the k:n

order statistic of the uniform distribution:

• And for any distribution, generate its k:n order statistic by

plugging Uk:n into the “probability” argument of its inverse

cdf, for example:

January 31, 2012 Statistics with Excel Examples, G. Shirley 5

x = BETAINV(rand(), k, n+1-k)

Alpha Beta U1:1 Uk:n

xk:n=NORMSINV(Uk:n) Isn’t that nice!

Page 6: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Normal Distribution

• To synthesize instances distributed according to a dist’n,

plug uniformly dist’d random numbers into the probability

argument of the inverse cdf of the dist’n.

• Synthesis of normally distributed data

Mean = m, Variance = V (standard error = V)

January 31, 2012 Statistics with Excel Examples, G. Shirley 6

1

1

Probability is called the "Probit"

NORMSINV rand()

x m V z z

x m V u

x m V

Uniformly distributed

Normally distributed with mean, m, and standard deviation V.

Inverse of standard normal dist’n. Mean = 0, variance = 1.

Page 7: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Synthesis of a Multi-Normal Dist’n

• For each sample, instead of generating one random

number, generate one vector of random numbers.

• And make the numbers in each vector correlated.

• To do this, generalize

to

January 31, 2012 Statistics with Excel Examples, G. Shirley 7

x m V z

x m R z

1 11 12 131 1

2 21 22 232 2

3 31 32 333 3

.. ..

.. .... ..

.. .... ..

i i

i i

i i

m R R Rx z

m R R Rx z

m R R Rx z

Vector of independently sampled Probits. Instances are i = 1, 2, 3, ..

Matrix which is the “root” of the covariance matrix.

Vector of means.

Desired sample vector i.

Square root of variance!

Page 8: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Covariance Matrix, and Its Root

• The two dimensional covariance matrix is...

• Since V is real-symmetric and +ve definite, V can be

factorized such that

• So, since..

• ..we have

January 31, 2012 Statistics with Excel Examples, G. Shirley 8

2

1 1 2

2

1 2 2

V

22 2

1 1 1

22 2

2 2 2

1 2 1 2 1 2

x x

x x

x x x x

V R RTranspose of R.

1 1 21 1 1 1

2 2 2 22 2 2 2 2 2 2

1 0 1 00 0 0 01

0 0 0 01 1 0 1 1 0 1V

2

22

1

1

0

R The upper (or lower) triangular

root is the “Cholesky root”.

Page 9: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Covariance Matrix, and Its Root

• Other roots differing from the Cholesky root by a rotation

work too..

• So

January 31, 2012 Statistics with Excel Examples, G. Shirley 9

1 1

2 2

1 1 1 12 2 2 2

1 1

1 1 1 12 2

2 2 2 2

1 11 12 2

1 12 22 2

0 01

0 01

1 1 1 1 1 1 1 10 0

0 01 1 1 1 1 1 1 1

1 1 1 1

1 1 1 1

V

1 11 22 2

1 11 22 2

1 1 1 1

1 1 1 1

1111

1111

21

221

2

21

121

1R

Page 10: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

2-D Example

• What happens when = 1, = -1, = 0?

• U1 and U2 are independently sampled from the uniform

distribution on [0,1].

January 31, 2012 Statistics with Excel Examples, G. Shirley 10

11 1 1

22 2 22 2

1 1 1 1

2

2 2 2 1 2 2

1

1 1

1

2 2

0

1

1

NORMSINV(rand())

NORMSINV(rand())

x m R z

x m z

x m z

x m z

x m z z

z U

z U

Page 11: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Gaussian Copula

• Set m1 = m2 = 0, 1 = 2 = 1

• x1 and x2 are normally distributed with mean = 0, var = 1

• c1 and c2 are uniform on [0,1], so simulate any marginal

distn’s

January 31, 2012 Statistics with Excel Examples, G. Shirley 11

1 1

22 2

1

1 1

1

2 2

1 0

1

NORMSINV(rand())

NORMSINV(rand())

x z

x z

z U

z U

1 1

2 2

c x

c x

1

11

12 2

A xd

d B x

A, B are arbitrary cdfs.

inverse!

Page 12: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Gaussian Copula

• Infamously implicated in financial disaster:

April 20, 2011 Miscorrelation in Manufacturing Test 12

Recipe for Disaster:

The Formula that Killed Wall Street.

Wired Mag. February 2009

Page 13: Statistics with Excel Examples - Computer Action Teamweb.cecs.pdx.edu/~cgshirl/Documents/Demonstrations...distribution on [0,1]. Statistics with Excel Examples, G. Shirley January

Extension to N Dimensions

• Gaussian copulas are easily extended to N dimensions.

• If all marginal distributions have m = 0 and = 1, then

• Calculation of the Cholesky root of V.

Analytically messy for N > 2.

But algorithms are easily available.

January 31, 2012 Statistics with Excel Examples, G. Shirley 13

11 12 13

21 22 23

31 32 33

..

..

..

.. .. .. ..

ij ji i j i j

V V V

V V VV V V x x x x

V V V

12 13

21 23

31 32

1 ..

1 ..

1 ..

.. .. .. ..

V

ij = ji is the correlation coefficient

between variables i and j.