
Lecture II-2: Probability Review


Page 1: Lecture II-2: Probability Review

Lecture II-2: Probability Review

Lecture Outline:

• Random variables and probability distributions
• Functions of a random variable, moments
• Multivariate probability
• Marginal and conditional probabilities and moments
• Multivariate normal distributions
• Application of probabilistic concepts to data assimilation

Page 2: Lecture II-2: Probability Review

Random Variables and Probability Density Functions

A random variable is a variable whose possible values are distributed throughout a specified range. The variable’s probability density function (PDF) describes how these values are distributed (i.e. it gives the probability that the variable value falls within a particular interval).

[Figure: three example PDFs f_y(y). Continuous PDFs: an exponential distribution (e.g. event rainfall), where the smallest values are most likely, and a uniform distribution (e.g. soil texture), where all values between 0 and 1 are equally likely. A discrete PDF: a discrete distribution (e.g. number of severe storms), where only discrete values (integers) are possible and each bar height gives the probability of that value (e.g. the probability that y = 2).]

Page 3: Lecture II-2: Probability Review

Interval Probabilities

Probability that y falls in the interval (y1, y2]:

Continuous PDF:

$$\text{Prob}[\,y_1 < y \le y_2\,] = \int_{y_1}^{y_2} f_y(y)\, dy$$

Discrete PDF:

$$\text{Prob}[\,y_1 < y \le y_2\,] = \sum_{y_i \in (y_1,\, y_2]} f_y(y_i)$$

[Figure: a continuous PDF f_y(y) with the area over (y1, y2] shaded, and a discrete PDF with the bars in (y1, y2] highlighted]

The probability that y takes on some value in the range (-∞, +∞) is 1.0:

$$\text{Prob}[\,-\infty < y < +\infty\,] = 1$$

That is, the area under the PDF must equal 1.

Page 4: Lecture II-2: Probability Review

Example: Calculating Interval Probabilities from a Continuous PDF

Historical data indicate that average rainfall intensity y during a particular storm follows an exponential distribution:

$$f_y(y) = a \exp(-a y)\;,\; y \ge 0 \qquad\qquad f_y(y) = 0\;,\; \text{otherwise}$$

What is the probability that a given storm will produce greater than 10 mm of rainfall if a = 0.1 mm⁻¹?

$$\text{Prob}[\,y > 10\,] = \int_{10}^{\infty} (0.1)\exp(-0.1\, y)\, dy = e^{-1} \approx 0.37$$

[Figure: the exponential PDF f_y(y) with a = 0.1 mm⁻¹, plotted for y from 0 to 80 mm]
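Interval probabilities like this one are also easy to check by Monte Carlo sampling. A minimal Python sketch (sample size and variable names are illustrative, not from the lecture):

```python
import numpy as np

# Estimate Prob[y > 10] for an exponential PDF with a = 0.1 mm^-1 by sampling.
rng = np.random.default_rng(0)
a = 0.1                                        # rate parameter (mm^-1)
y = rng.exponential(scale=1.0 / a, size=1_000_000)
print(np.mean(y > 10.0))                       # ~0.368 = exp(-1)
```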

Page 5: Lecture II-2: Probability Review

Cumulative Distribution Functions

Cumulative distribution function (CDF) of y (the probability that y is less than or equal to ξ):

Continuous PDF:

$$F_y(\xi) = \text{Prob}[\,y \le \xi\,] = \int_{-\infty}^{\xi} f_y(y)\, dy$$

Discrete PDF:

$$F_y(\xi) = \text{Prob}[\,y \le \xi\,] = \sum_{y_i \le \xi} f_y(y_i)$$

[Figure: continuous and discrete PDFs f_y(y) with the area (or bar sum) up to ξ shaded, and the corresponding CDFs F_y(ξ) rising from 0 to 1]

Note that F_y(∞) = 1.0!

Page 6: Lecture II-2: Probability Review

Constructing PDFs and CDFs From Data

How are these 50 monthly streamflows distributed over the range of observed values?

Rank the data from smallest to largest value and divide into bins (sample PDF, or histogram), or plot normalized rank (rank/50) vs. value (sample CDF).

[Figure: time series of 50 monthly streamflows y(t); histogram (sample PDF); sample CDF]

The sample CDF may be fit with a standard function (e.g. Gaussian).
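A minimal Python sketch of this procedure (synthetic stand-in data; nothing here comes from the lecture's streamflow record):

```python
import numpy as np

rng = np.random.default_rng(1)
flows = rng.normal(size=50)          # stand-in for 50 observed monthly streamflows

# Sample PDF: histogram with bar areas normalized to integrate to 1.
counts, edges = np.histogram(flows, bins=8, density=True)

# Sample CDF: normalized rank (rank/50) plotted against the sorted values.
ranks = np.arange(1, flows.size + 1) / flows.size
for value, p in zip(np.sort(flows)[:5], ranks[:5]):
    print(f"y = {value:6.2f}   F(y) ~ {p:.2f}")
```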

Page 7: Lecture II-2: Probability Review

Expectation of a Random Variable

The expectation of a function z = g(y) of the random variable y is defined as:

Continuous:

$$E[z] = \int_{-\infty}^{+\infty} z\, f_z(z)\, dz \quad\text{or}\quad E[g(y)] = \int_{-\infty}^{+\infty} g(y)\, f_y(y)\, dy$$

Discrete:

$$E[z] = \sum_i z_i\, f_z(z_i) \quad\text{or}\quad E[g(y)] = \sum_i g(y_i)\, f_y(y_i)$$

Expectation is a linear operator:

$$E[a y_1 + b y_2] = a\, E[y_1] + b\, E[y_2]$$

Note that the expectation of y is not a random variable but a property of the PDF f_y(y).

Page 8: Lecture II-2: Probability Review

Moments and Other Properties of Random Variables

Non-central Moments of y:

Mean:

$$\bar{y} = E[y] = \int_{-\infty}^{+\infty} y\, f_y(y)\, dy$$

Second moment:

$$E[y^2] = \int_{-\infty}^{+\infty} y^2\, f_y(y)\, dy$$

Central Moments of y:

Variance:

$$\sigma_y^2 = E[(y - \bar{y})^2] = \int_{-\infty}^{+\infty} (y - \bar{y})^2\, f_y(y)\, dy = E[y^2] - \bar{y}^2$$

Standard deviation:

$$\sigma_y = \sqrt{\sigma_y^2}$$

Integrals are replaced by sums when the PDF is discrete.

[Figure: a skewed PDF annotated with the mode (peak), the median (Prob(y > median) = Prob(y ≤ median) = 0.5), the mean, one standard deviation, and the upper 5% point y95 (Prob(y > y95) = 0.05)]

Page 9: Lecture II-2: Probability Review

Expectation Example

The mean and variance of a random variable distributed uniformly between 0 and 1 are:

Mean:

$$\bar{y} = \int_{-\infty}^{+\infty} y\, f_y(y)\, dy = \int_0^1 y\,(1)\, dy = \frac{1}{2}$$

Variance:

$$\sigma_y^2 = \int_{-\infty}^{+\infty} (y - \bar{y})^2\, f_y(y)\, dy = \int_0^1 y^2\,(1)\, dy - \bar{y}^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12}$$

Standard deviation:

$$\sigma_y = \sqrt{\tfrac{1}{12}} = \frac{\sqrt{3}}{6} \approx 0.29$$

[Figure: uniform PDF on (0, 1); the mean defines the "center" of the distribution and the standard deviation measures its spread]
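A quick Monte Carlo check of these results (sample size arbitrary):

```python
import numpy as np

# Verify the uniform(0, 1) moments by sampling.
y = np.random.default_rng(2).uniform(size=1_000_000)
print(y.mean(), y.var(), y.std())   # ~0.5, ~1/12 = 0.0833, ~0.29
```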

Page 10: Lecture II-2: Probability Review

Multiple (Jointly Distributed) Random Variables

We frequently work with groups of related random variables.

Discrete example:
y1 = number of storms in June (0, 1, or 2)
y2 = number of storms in July (0, 1, or 2)

Table of joint (multivariate) probabilities:

                      July (y2)
                     0      1      2
               0    0.05   0.10   0.15
   June (y1)   1    0.10   0.15   0.20
               2    0.15   0.05   0.05

(For example, the entry in the first row and last column is f_y1y2(0, 2) = 0.15.)

Assemble multiple random variables in vectors: y = [y1, y2, …, yn]

Shorthand: f_y(y) = f_y1y2...yn(y1, y2, ..., yn)

[Figure: the table plotted as a discrete joint PDF f_y1y2(y1, y2) over the two independent variables y1 and y2]

Page 11: Lecture II-2: Probability Review

Interval Probabilities for Multivariate Random Variables

In multivariate problems interval probabilities are replaced by the probability that the n random variables fall in a specified region R of the n-dimensional space with coordinates (y1, y2, …, yn).

Bivariate case: the probability that the pair of variables (y1, y2) lies in a region R of the y1-y2 plane is:

Continuous PDF:

$$\text{Prob}[\,(y_1, y_2) \in R\,] = \iint_R f_{y_1 y_2}(y_1, y_2)\, dy_1\, dy_2$$

Discrete PDF:

$$\text{Prob}[\,(y_1, y_2) \in R\,] = \sum_{(y_1, y_2) \in R} f_{y_1 y_2}(y_1, y_2)$$

[Figure: a region R outlined on a contour plot of a continuous joint PDF and on a discrete joint PDF]

Page 12: Lecture II-2: Probability Review

General Multivariate Moments

The mean of a vector of n random variables y = [y1, y2, …, yn] is an n-vector:

$$\bar{y} = E[y] = [\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n]$$

The second moment of y about its mean is an n by n matrix, called the covariance matrix:

$$\text{Cov}(y) = C_{yy} = E[(y - \bar{y})(y - \bar{y})^T], \qquad C_{y_i y_k} = \sigma_{ik} = E[(y_i - \bar{y}_i)(y_k - \bar{y}_k)]$$

The diagonal entries of C_yy are the variances of the individual elements of y.

The correlation coefficient between any two scalar random variables (e.g. two elements of the vector y) is:

$$\rho_{ik} = \frac{C_{y_i y_k}}{\sigma_{y_i}\, \sigma_{y_k}} = \frac{\sigma_{ik}}{\sigma_{y_i}\, \sigma_{y_k}}$$

If C_yiyk = σik = 0 then yi and yk are uncorrelated.
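These quantities are straightforward to estimate from samples. A short NumPy sketch with made-up numbers:

```python
import numpy as np

# Estimate the mean vector, covariance matrix, and correlation coefficients
# of an n = 3 random vector from samples (all numbers illustrative).
rng = np.random.default_rng(3)
samples = rng.multivariate_normal(
    mean=[0.0, 1.0, -1.0],
    cov=[[1.0, 0.6, 0.0],
         [0.6, 2.0, 0.3],
         [0.0, 0.3, 0.5]],
    size=100_000,
)

y_bar = samples.mean(axis=0)                # n-vector of means
C_yy = np.cov(samples, rowvar=False)        # n x n covariance matrix
rho = np.corrcoef(samples, rowvar=False)    # rho_ik = C_ik / (sigma_i * sigma_k)
print(y_bar, C_yy, rho, sep="\n")           # near-zero rho -> uncorrelated pair
```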

Page 13: Lecture II-2: Probability Review

Marginal and Conditional PDFs

The marginal PDF of any one of a set of jointly distributed random variables is obtained by integrating joint density over all possible values of the other variables. In the bivariate case marginal density of y1 is:

Continuous PDF:

$$f_{y_1}(y_1) = \int_{-\infty}^{+\infty} f_{y_1 y_2}(y_1, y_2)\, dy_2$$

Discrete PDF:

$$f_{y_1}(y_1) = \sum_{\text{all } y_2} f_{y_1 y_2}(y_1, y_2)$$

The conditional PDF of a random variable yi for a given value of some other random variable yk is defined as:

$$f_{y_i | y_k}(y_i \mid y_k) = \frac{f_{y_i y_k}(y_i, y_k)}{f_{y_k}(y_k)}$$

The conditional density of yi given yk is a valid probability density function (e.g. the area under this function must = 1).

Page 14: Lecture II-2: Probability Review

Discrete Marginal and Conditional Probability Example

For the discrete example described earlier the marginal probabilities are obtained by summing over columns [to get f_y1(y1)] or over rows [to get f_y2(y2)]:

                      July (y2)
                     0      1      2     f(y1)
               0    0.05   0.10   0.15   0.30
   June (y1)   1    0.10   0.15   0.20   0.45
               2    0.15   0.05   0.05   0.25
       f(y2)        0.30   0.30   0.40   1.00

Marginal densities are shown in the last row and last column:

   y1    f_y1(y1)            y2    f_y2(y2)
   0     0.30                0     0.30
   1     0.45                1     0.30
   2     0.25                2     0.40
   TOTAL 1.00                TOTAL 1.00

The conditional density of y1 (June storms) given that y2 = 1 (one storm in July) is obtained by dividing the entries in the y2 = 1 column by f_y2(y2 = 1) = 0.3:

$$f_{y_1 | y_2}(y_1 \mid y_2 = 1) = \frac{f_{y_1 y_2}(y_1, y_2 = 1)}{f_{y_2}(y_2 = 1)}$$

   y1    f_y1|y2(y1 | y2 = 1)
   0     0.10/0.3 = 1/3
   1     0.15/0.3 = 1/2
   2     0.05/0.3 = 1/6
   TOTAL 1.00
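The same computation takes a few lines of Python (the joint table transcribed as an array):

```python
import numpy as np

# Joint probability table: rows index y1 (June storms), columns y2 (July storms).
f_joint = np.array([[0.05, 0.10, 0.15],
                    [0.10, 0.15, 0.20],
                    [0.15, 0.05, 0.05]])

f_y1 = f_joint.sum(axis=1)        # marginal of y1: [0.30, 0.45, 0.25]
f_y2 = f_joint.sum(axis=0)        # marginal of y2: [0.30, 0.30, 0.40]

# Conditional PDF of y1 given y2 = 1: divide the y2 = 1 column by f_y2(1) = 0.3.
print(f_joint[:, 1] / f_y2[1])    # [1/3, 1/2, 1/6]
```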

Page 15: Lecture II-2: Probability Review

Conditional Moments

Conditional moments are defined in the same way as regular moments, except that the unconditional density [e.g. f_y1(y1)] is replaced by the conditional density [e.g. f_y1|y2(y1 | y2 = 1)] in the appropriate definitions.

For the discrete example, the unconditional mean and variance of y1 may be computed directly from the f_y1(y1) table:

$$E(y_1) = (0)(0.3) + (1)(0.45) + (2)(0.25) = 0.95$$

$$\text{Var}(y_1) = (0)^2(0.3) + (1)^2(0.45) + (2)^2(0.25) - (0.95)^2 \approx 0.55$$

The conditional mean and variance of y1 given that y2 = 1 may be computed directly from the f_y1|y2(y1 | y2 = 1) table:

$$E(y_1 \mid y_2 = 1) = (0)(1/3) + (1)(1/2) + (2)(1/6) = 5/6 \approx 0.83$$

$$\text{Var}(y_1 \mid y_2 = 1) = (0)^2(1/3) + (1)^2(1/2) + (2)^2(1/6) - (5/6)^2 = 17/36 \approx 0.47$$

Note that the conditional variance (uncertainty) of y1 is smaller than the unconditional variance. This reflects the decrease in uncertainty we gain by knowing that y2 = 1.
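The same moments, computed numerically from the tables above:

```python
import numpy as np

y1 = np.array([0.0, 1.0, 2.0])
f_y1 = np.array([0.30, 0.45, 0.25])            # unconditional PDF of y1
f_cond = np.array([1/3, 1/2, 1/6])             # f_y1|y2(y1 | y2 = 1)

mean = y1 @ f_y1                               # 0.95
var = (y1**2) @ f_y1 - mean**2                 # ~0.55
cmean = y1 @ f_cond                            # 5/6 ~ 0.83
cvar = (y1**2) @ f_cond - cmean**2             # 17/36 ~ 0.47 (smaller than var)
print(mean, var, cmean, cvar)
```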

Page 16: Lecture II-2: Probability Review

Independent Random Variables

Two random vectors y and z are independent if any of the following equivalent expressions holds:

$$f_{y|z}(y \mid z) = f_y(y)$$

$$f_{z|y}(z \mid y) = f_z(z)$$

$$f_{yz}(y, z) = f_y(y)\, f_z(z)$$

Independent variables are also uncorrelated, although the converse may not be true.

In the discrete example described above, the two random variables y1 and y2 are not independent because:

$$f_{y_1 y_2}(y_1, y_2) \ne f_{y_1}(y_1)\, f_{y_2}(y_2)$$

For example, for the combination (y1 = 0, y2 = 0) we have:

$$f_{y_1 y_2}(0, 0) = 0.05 \ne f_{y_1}(0)\, f_{y_2}(0) = (0.3)(0.3) = 0.09$$

Page 17: Lecture II-2: Probability Review

Functions of a Random Variable

A function z = g(y) of a random variable is also a random variable, with its own PDF f_z(z).

[Figure: a normal PDF f_y(y) transformed through z = g(y) = e^y into a lognormal PDF f_z(z); the range of possible y values maps to the corresponding range of z values]

The basic concept also applies to multivariate problems, where y and z are random vectors and z = g(y) is a vector transformation.

Page 18: Lecture II-2: Probability Review

Derived Distributions

The PDF f_z(z) of the random variable z = g(y) may sometimes be derived in closed form from g(y) and f_y(y). When this is not possible Monte Carlo (stochastic simulation) methods may be used.

If y and z are scalars and z = g(y) has a unique solution y = g⁻¹(z) for all permissible y, then:

$$f_z(z) = \frac{1}{|g'(g^{-1}(z))|}\, f_y[g^{-1}(z)] \qquad \text{where:} \quad g'(y) = \frac{dg(y)}{dy}\bigg|_{y = g^{-1}(z)}$$

An important example for data assimilation purposes is the simple scalar linear transformation z = g(ε) = a + ε, where ε is a random variable with PDF f_ε(ε) and a is a constant. Then g⁻¹(z) = z - a and the PDF of the random variable z is:

$$f_z(z) = \frac{1}{|1|}\, f_\epsilon[z - a] = f_\epsilon[z - a]$$

If z = g(y) has multiple solutions the right-hand side term is replaced by a sum of terms evaluated at the different solutions. This result extends to vectors of random variables and a vector transformation z = g(y) if the derivative g’ is replaced by the Jacobian of g(y).
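When no closed form is available, the Monte Carlo route mentioned above is straightforward. A sketch for the earlier example z = g(y) = e^y with y standard normal (sample size and bins arbitrary):

```python
import numpy as np

# Push normal samples through z = exp(y) and compare the sample histogram
# with the closed-form derived PDF f_z(z) = f_y(ln z) / z (a lognormal).
rng = np.random.default_rng(4)
z = np.exp(rng.normal(size=500_000))

def f_z(z):
    # g^-1(z) = ln z and |g'(g^-1(z))| = z
    return np.exp(-0.5 * np.log(z) ** 2) / (np.sqrt(2.0 * np.pi) * z)

hist, edges = np.histogram(z, bins=np.linspace(0.1, 4.0, 40), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
print(np.abs(hist - f_z(centers)).max())   # small: histogram tracks f_z
```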

Page 19: Lecture II-2: Probability Review

Bayes Theorem

The definition of the conditional PDF may be applied twice to obtain Bayes Theorem, which is very important in data assimilation. To illustrate, suppose that we seek the PDF of a state vector y given that a measurement vector has the value z. This conditional PDF may be computed as follows:

$$f_{y|z}(y \mid z) = \frac{f_{yz}(y, z)}{f_z(z)} = \frac{f_{z|y}(z \mid y)\, f_y(y)}{f_z(z)} = \frac{f_{z|y}(z \mid y)\, f_y(y)}{\int f_{z|y}(z \mid y)\, f_y(y)\, dy}$$

This expression is useful because it may be easier to determine f_z|y(z|y) and then compute f_y|z(y|z) from Bayes Theorem than to derive f_y|z(y|z) directly. For example, suppose that:

$$z = y + \epsilon$$

Then if y is given (not random), f_z|y(z|y) = f_ε(z - y). If the unconditional PDFs f_ε(ε) and f_y(y) are specified they can be substituted into Bayes Theorem to give the desired PDF f_y|z(y|z). The specified PDFs can be viewed as prior information about the uncertain measurement error and state.
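For a scalar state this computation can be carried out numerically on a grid. A minimal sketch, assuming (for illustration only) Gaussian prior and measurement-error PDFs:

```python
import numpy as np

# Bayes update on a grid for z = y + eps, with assumed Gaussian PDFs.
y = np.linspace(-5.0, 5.0, 2001)

def gauss(x, mean, std):
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (np.sqrt(2.0 * np.pi) * std)

prior = gauss(y, 0.0, 1.0)                  # f_y(y): prior state PDF
z_obs = 1.5
likelihood = gauss(z_obs - y, 0.0, 0.5)     # f_z|y(z|y) = f_eps(z - y)

posterior = likelihood * prior              # numerator of Bayes Theorem
posterior /= np.trapz(posterior, y)         # divide by f_z(z) = integral over y

print(np.trapz(y * posterior, y))           # posterior mean, pulled toward z_obs
```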

Page 20: Lecture II-2: Probability Review

Multivariate Normal (Gaussian) PDFs

The only widely used continuous joint PDF is the multivariate normal (or Gaussian). The multivariate normal PDF of the n-vector y = [y1, y2, …, yn] is completely determined by the mean ȳ and covariance C_yy of y:

$$f_y(y) = (2\pi)^{-n/2}\, |C_{yy}|^{-1/2} \exp\left[-\frac{1}{2}\,(y - \bar{y})^T\, C_{yy}^{-1}\, (y - \bar{y})\right]$$

where |C_yy| represents the determinant of C_yy and C_yy⁻¹ represents the inverse of C_yy.

[Figure: bivariate normal PDF f_y1y2(y1, y2). The mean of a normal PDF is at the peak value; contours of equal PDF form ellipses.]
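The formula is simple to evaluate numerically. The sketch below (arbitrary numbers) checks a direct implementation against scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

y_bar = np.array([0.0, 1.0])                # mean vector (n = 2)
C_yy = np.array([[1.0, 0.5],
                 [0.5, 2.0]])               # covariance matrix
y = np.array([0.5, 0.5])                    # evaluation point

n = y_bar.size
d = y - y_bar
quad = d @ np.linalg.solve(C_yy, d)         # (y - ybar)^T C_yy^-1 (y - ybar)
pdf = (2 * np.pi) ** (-n / 2) * np.linalg.det(C_yy) ** (-0.5) * np.exp(-0.5 * quad)

print(pdf, multivariate_normal(y_bar, C_yy).pdf(y))   # the two values agree
```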

Page 21: Lecture II-2: Probability Review

Important Properties of Multivariate Normal Random Variables

The following properties of multivariate normal random variables are frequently used in data assimilation:

• A linear combination z = a1 y1 + a2 y2 + … + an yn = aᵀy of jointly normal random variables y = [y1, y2, …, yn]ᵀ is also a normal random variable. The mean and variance of z are:

$$\bar{z} = a^T \bar{y} \qquad\qquad \sigma_z^2 = a^T C_{yy}\, a$$

• If y and z are multivariate normal random vectors with a joint PDF f_yz(y, z), the marginal PDFs f_y(y) and f_z(z) and the conditional PDFs f_y|z(y|z) and f_z|y(z|y) are also multivariate normal.

• Linear combinations of independent random variables become normally distributed as the number of variables approaches infinity (this is the Central Limit Theorem).

In practice, many other functions of multiple independent random variables also have nearly normal PDFs, even when the number of variables is relatively small (e.g. 10-100). For this reason environmental variables are often observed to be normally distributed.
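The linear-combination property in the first bullet above is easy to confirm by sampling; the numbers below are arbitrary:

```python
import numpy as np

# Check the mean/variance formulas for z = a^T y with jointly normal y.
rng = np.random.default_rng(5)
y_bar = np.array([1.0, -2.0, 0.5])
C_yy = np.array([[2.0, 0.3, 0.0],
                 [0.3, 1.0, 0.2],
                 [0.0, 0.2, 0.5]])
a = np.array([0.5, 1.0, -1.0])

z = rng.multivariate_normal(y_bar, C_yy, size=200_000) @ a
print(z.mean(), a @ y_bar)      # sample mean     vs  a^T y_bar
print(z.var(), a @ C_yy @ a)    # sample variance vs  a^T C_yy a
```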

Page 22: Lecture II-2: Probability Review

Conditional Multivariate Normal PDFs and Moments

Consider two vectors of random variables which are all jointly normal:

y = [y1, y2, …, yn] (e.g. a vector of n states)
z = [z1, z2, …, zm] (e.g. a vector of m measurements)

The conditional PDF of y given z is:

$$f_{y|z}(y \mid z) = \frac{f_{yz}(y, z)}{f_z(z)} = K \exp\left\{-\frac{1}{2}\,[y - E(y \mid z)]^T\, C_{yy|z}^{-1}\, [y - E(y \mid z)]\right\}$$

where:

$$E(y \mid z) = \bar{y} + C_{yz} C_{zz}^{-1} (z - \bar{z}) \qquad \text{(conditional mean)}$$

$$C_{yy|z} = \text{Cov}(y \mid z) = C_{yy} - C_{yz} C_{zz}^{-1} C_{zy} \qquad \text{(conditional covariance)}$$

$$C_{yz} = E[(y - \bar{y})(z - \bar{z})^T] = C_{zy}^T \qquad \text{(y, z cross-covariance)}$$

$$K = [(2\pi)^n\, |C_{yy|z}|]^{-1/2} \qquad \text{(normalization constant)}$$

The conditional covariance is "smaller" than the unconditional y covariance (since the difference matrix [C_yy - C_yy|z] is positive definite). This decrease in uncertainty about y reflects the additional information provided by z.
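These update formulas are the core of many assimilation schemes. A NumPy sketch with illustrative numbers (n = 2 states, m = 1 measurement):

```python
import numpy as np

y_bar = np.array([1.0, 2.0])                   # prior state mean
z_bar = np.array([1.5])                        # measurement mean
C_yy = np.array([[1.0, 0.4],
                 [0.4, 0.8]])                  # state covariance
C_yz = np.array([[0.6],
                 [0.2]])                       # state/measurement cross-covariance
C_zz = np.array([[0.9]])                       # measurement covariance
z = np.array([2.0])                            # observed measurement value

gain = C_yz @ np.linalg.inv(C_zz)              # C_yz C_zz^-1
E_y_given_z = y_bar + gain @ (z - z_bar)       # conditional mean
C_yy_given_z = C_yy - gain @ C_yz.T            # conditional covariance (C_zy = C_yz^T)

print(E_y_given_z)
print(C_yy_given_z)                            # "smaller" than C_yy
```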

Page 23: Lecture II-2: Probability Review

Application of Probabilistic Concepts to Data Assimilation

• Our knowledge of the state after we include measurements is characterized by the conditional PDF f y|z (y| z). This density can be derived from Bayes Theorem. When y and z are multivariate normal f y|z (y| z) can be readily obtained from the multivariate normal expressions presented earlier. In other cases approximations must be made.

• Suppose we use a model and a postulated unconditional PDF f u ( u) for the input u to derive an unconditional PDF f y ( y ) for the state y . f y ( y ) characterizes our knowledge of the state before we include any measurements.

• Now suppose that we want to include information contained in the measurement vector z . This measurement is also a random vector because it depends on the random state y and the random measurement error . The measurement PDF is f z ( z ).

• Data assimilation seeks to characterize the true but unknown state of an environmental system. Physically-based models help to define a reasonable range of possible states but uncertainties remain because the model structure may be incorrect and the model’s inputs may be imperfect. These uncertainties can be accounted for in an approximate way if we assume that the models inputs and states are random vectors.

• The estimates (or analyses) provided by most data assimilation methods are based in some way on the conditional density f y|z (y| z) .