Expectation for Multivariate Distributions
Definition

Let $X_1, X_2, \dots, X_n$ denote $n$ jointly distributed random variables with joint density function $f(x_1, x_2, \dots, x_n)$. Then

$$E\big[g(X_1,\dots,X_n)\big] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(x_1,\dots,x_n)\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$
Example

Let $X$, $Y$, $Z$ denote three jointly distributed random variables with joint density function

$$f(x,y,z) = \begin{cases} \dfrac{12}{7}\left(x^2 + yz\right) & 0 \le x \le 1,\ 0 \le y \le 1,\ 0 \le z \le 1 \\[1ex] 0 & \text{otherwise} \end{cases}$$

Determine $E[XYZ]$.

Solution:

$$E[XYZ] = \int_0^1\!\!\int_0^1\!\!\int_0^1 xyz\cdot\tfrac{12}{7}\left(x^2+yz\right) dx\, dy\, dz$$
$$= \frac{12}{7}\int_0^1\!\!\int_0^1\!\!\int_0^1 \left(x^3 yz + x y^2 z^2\right) dx\, dy\, dz$$

$$= \frac{12}{7}\int_0^1\!\!\int_0^1 \left[\frac{x^4}{4}\, yz + \frac{x^2}{2}\, y^2 z^2\right]_{x=0}^{x=1} dy\, dz = \frac{12}{7}\int_0^1\!\!\int_0^1 \left(\frac{yz}{4} + \frac{y^2 z^2}{2}\right) dy\, dz$$

$$= \frac{3}{7}\int_0^1 \left[\frac{y^2}{2}\, z + \frac{2y^3}{3}\, z^2\right]_{y=0}^{y=1} dz = \frac{3}{7}\int_0^1 \left(\frac{z}{2} + \frac{2z^2}{3}\right) dz$$

$$= \frac{3}{7}\left[\frac{z^2}{4} + \frac{2z^3}{9}\right]_0^1 = \frac{3}{7}\left(\frac{1}{4} + \frac{2}{9}\right) = \frac{3}{7}\cdot\frac{17}{36} = \frac{17}{84}$$
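As a quick numerical sanity check (an added sketch, not part of the original notes): since the uniform density on the unit cube equals 1, $E[XYZ] = E_U\big[xyz\, f(x,y,z)\big]$ for $(x,y,z)$ uniform on $[0,1]^3$, so the integral can be approximated by Monte Carlo. Assuming NumPy is available:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x, y, z = rng.random((3, n))          # uniform points on the unit cube

# E[XYZ] = integral of xyz * f(x,y,z) over [0,1]^3
#        = mean of xyz * f(x,y,z) under the uniform distribution
f = (12 / 7) * (x**2 + y * z)         # the joint density above
estimate = np.mean(x * y * z * f)

print(estimate, 17 / 84)              # both approximately 0.2024
```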
Some Rules for Expectation

1. $\displaystyle E[X_i] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i$

Thus you can calculate $E[X_i]$ either from the joint distribution of $X_1,\dots,X_n$ or from the marginal distribution of $X_i$.

Proof:

$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} x_i\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$
$$= \int_{-\infty}^{\infty} x_i \left[\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1,\dots,x_n)\, dx_1\cdots dx_{i-1}\, dx_{i+1}\cdots dx_n\right] dx_i = \int_{-\infty}^{\infty} x_i\, f_i(x_i)\, dx_i$$
2. $E[a_1X_1 + \cdots + a_nX_n] = a_1E[X_1] + \cdots + a_nE[X_n]$ (the linearity property)

Proof:

$$E[a_1X_1+\cdots+a_nX_n] = \int\!\cdots\!\int (a_1x_1+\cdots+a_nx_n)\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$
$$= a_1\int\!\cdots\!\int x_1\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n + \cdots + a_n\int\!\cdots\!\int x_n\, f(x_1,\dots,x_n)\, dx_1\cdots dx_n$$
$$= a_1E[X_1] + \cdots + a_nE[X_n]$$
3. (The multiplicative property) Suppose $X_1,\dots,X_q$ are independent of $X_{q+1},\dots,X_k$. Then

$$E\big[g(X_1,\dots,X_q)\, h(X_{q+1},\dots,X_k)\big] = E\big[g(X_1,\dots,X_q)\big]\, E\big[h(X_{q+1},\dots,X_k)\big]$$

In the simple case when $k = 2$:

$$E[XY] = E[X]\,E[Y] \quad \text{if } X \text{ and } Y \text{ are independent.}$$

Proof: Independence gives $f(x_1,\dots,x_k) = f_1(x_1,\dots,x_q)\, f_2(x_{q+1},\dots,x_k)$, so

$$E\big[g(X_1,\dots,X_q)\, h(X_{q+1},\dots,X_k)\big] = \int\!\cdots\!\int g(x_1,\dots,x_q)\, h(x_{q+1},\dots,x_k)\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k$$
$$= \int\!\cdots\!\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k) \left[\int\!\cdots\!\int g(x_1,\dots,x_q)\, f_1(x_1,\dots,x_q)\, dx_1\cdots dx_q\right] dx_{q+1}\cdots dx_k$$
$$= E\big[g(X_1,\dots,X_q)\big] \int\!\cdots\!\int h(x_{q+1},\dots,x_k)\, f_2(x_{q+1},\dots,x_k)\, dx_{q+1}\cdots dx_k$$
$$= E\big[g(X_1,\dots,X_q)\big]\, E\big[h(X_{q+1},\dots,X_k)\big]$$
Some Rules for Variance

$$\operatorname{Var}(X) = \sigma_X^2 = E\big[(X-\mu_X)^2\big] = E[X^2] - \mu_X^2$$

1. $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y) + 2\operatorname{Cov}(X,Y)$

where $\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big]$.

Proof: $\operatorname{Var}(X+Y) = E\big[(X+Y-\mu_{X+Y})^2\big]$, where $\mu_{X+Y} = E[X+Y] = \mu_X + \mu_Y$. Thus

$$\operatorname{Var}(X+Y) = E\big[\big((X-\mu_X)+(Y-\mu_Y)\big)^2\big] = E\big[(X-\mu_X)^2\big] + 2E\big[(X-\mu_X)(Y-\mu_Y)\big] + E\big[(Y-\mu_Y)^2\big]$$
$$= \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y)$$

Note: If $X$ and $Y$ are independent, then

$$\operatorname{Cov}(X,Y) = E\big[(X-\mu_X)(Y-\mu_Y)\big] = E[X-\mu_X]\,E[Y-\mu_Y] = 0$$

and $\operatorname{Var}(X+Y) = \operatorname{Var}(X) + \operatorname{Var}(Y)$.

Definition: For any two random variables $X$ and $Y$, define the correlation coefficient $\rho_{XY}$ to be:

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

Thus $\operatorname{Cov}(X,Y) = \rho_{XY}\,\sigma_X\,\sigma_Y$, and

$$\operatorname{Var}(X+Y) = \sigma_X^2 + \sigma_Y^2 + 2\rho_{XY}\,\sigma_X\,\sigma_Y,$$

which reduces to $\sigma_X^2 + \sigma_Y^2$ if $X$ and $Y$ are independent.
Properties of the correlation coefficient $\rho_{XY}$

$$\rho_{XY} = \frac{\operatorname{Cov}(X,Y)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Y)}} = \frac{\operatorname{Cov}(X,Y)}{\sigma_X\,\sigma_Y}$$

If $X$ and $Y$ are independent then $\rho_{XY} = 0$. (Reason: $\operatorname{Cov}(X,Y) = 0$.)

The converse is not necessarily true; i.e. $\rho_{XY} = 0$ does not imply that $X$ and $Y$ are independent.
More properties of the correlation coefficient $\rho_{XY}$

$-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that

$$P[Y = bX + a] = 1,$$

where $\rho_{XY} = +1$ if $b > 0$ and $\rho_{XY} = -1$ if $b < 0$.

Proof: Let $U = X - \mu_X$ and $V = Y - \mu_Y$, and let

$$g(b) = E\big[(V - bU)^2\big] \ge 0 \quad \text{for all } b.$$

Consider choosing $b$ to minimize $g(b)$:

$$g(b) = E\big[V^2 - 2bVU + b^2U^2\big] = E[V^2] - 2bE[VU] + b^2E[U^2]$$
$$g'(b) = -2E[VU] + 2bE[U^2] = 0 \quad \Rightarrow \quad b_{\min} = \frac{E[VU]}{E[U^2]}$$

Since $g(b) \ge 0$ for all $b$, in particular $g(b_{\min}) \ge 0$:

$$g(b_{\min}) = E[V^2] - 2b_{\min}E[VU] + b_{\min}^2E[U^2] = E[V^2] - 2\frac{E[VU]^2}{E[U^2]} + \frac{E[VU]^2}{E[U^2]} = E[V^2] - \frac{E[VU]^2}{E[U^2]} \ge 0$$

Hence

$$\frac{E[VU]^2}{E[U^2]\,E[V^2]} \le 1, \quad \text{i.e.} \quad \rho_{XY}^2 = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]^2}{E\big[(X-\mu_X)^2\big]\,E\big[(Y-\mu_Y)^2\big]} \le 1.$$

Note: $\rho_{XY}^2 = 1$ if and only if $g(b_{\min}) = E\big[(V - b_{\min}U)^2\big] = 0$, which is true exactly when $P[V = b_{\min}U] = 1$, i.e.

$$P\big[Y - \mu_Y = b_{\min}(X - \mu_X)\big] = 1, \quad \text{i.e.} \quad P[Y = b_{\min}X + a] = 1 \text{ where } a = \mu_Y - b_{\min}\mu_X.$$

Summary: $-1 \le \rho_{XY} \le 1$, and $\rho_{XY} = \pm 1$ if and only if there exist $a$ and $b$ such that $P[Y = bX + a] = 1$, where

$$b = b_{\min} = \frac{E\big[(X-\mu_X)(Y-\mu_Y)\big]}{E\big[(X-\mu_X)^2\big]} = \frac{\operatorname{Cov}(X,Y)}{\operatorname{Var}(X)} = \frac{\rho_{XY}\,\sigma_X\sigma_Y}{\sigma_X^2} = \rho_{XY}\frac{\sigma_Y}{\sigma_X} \quad \text{and} \quad a = \mu_Y - b_{\min}\mu_X,$$

so the sign of $\rho_{XY}$ matches the sign of $b$.
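The boundary case can be checked numerically (an added illustration, not from the notes): for an exact linear relation $Y = bX + a$ the sample correlation is $\pm 1$ with the sign of $b$, and $\operatorname{Cov}(X,Y)/\operatorname{Var}(X)$ recovers $b$. A minimal sketch, assuming NumPy (the constants are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

for b in (1.5, -0.7):
    y = b * x + 4.0                       # exact linear relation Y = bX + a
    rho = np.corrcoef(x, y)[0, 1]
    print(b, round(rho, 6))               # +1.0 for b > 0, -1.0 for b < 0

# Cov(X,Y)/Var(X) recovers b, matching b_min = Cov(X,Y)/Var(X)
y = 1.5 * x + 4.0
print(np.cov(x, y)[0, 1] / np.var(x, ddof=1))   # approximately 1.5
```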
2. $\operatorname{Var}(aX + bY) = a^2\operatorname{Var}(X) + b^2\operatorname{Var}(Y) + 2ab\operatorname{Cov}(X,Y)$

Proof: $\operatorname{Var}(aX+bY) = E\big[(aX+bY-\mu_{aX+bY})^2\big]$, with $\mu_{aX+bY} = E[aX+bY] = a\mu_X + b\mu_Y$. Thus

$$\operatorname{Var}(aX+bY) = E\big[\big(a(X-\mu_X)+b(Y-\mu_Y)\big)^2\big]$$
$$= E\big[a^2(X-\mu_X)^2 + 2ab(X-\mu_X)(Y-\mu_Y) + b^2(Y-\mu_Y)^2\big]$$
$$= a^2\operatorname{Var}(X) + 2ab\operatorname{Cov}(X,Y) + b^2\operatorname{Var}(Y)$$
3. $$\operatorname{Var}(a_1X_1 + \cdots + a_nX_n) = \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i) + 2\sum_{i<j} a_i a_j\operatorname{Cov}(X_i, X_j)$$

Expanding the cross terms:

$$\operatorname{Var}(a_1X_1+\cdots+a_nX_n) = a_1^2\operatorname{Var}(X_1) + \cdots + a_n^2\operatorname{Var}(X_n)$$
$$\quad + 2a_1a_2\operatorname{Cov}(X_1,X_2) + \cdots + 2a_1a_n\operatorname{Cov}(X_1,X_n)$$
$$\quad + 2a_2a_3\operatorname{Cov}(X_2,X_3) + \cdots + 2a_2a_n\operatorname{Cov}(X_2,X_n)$$
$$\quad + \cdots + 2a_{n-1}a_n\operatorname{Cov}(X_{n-1},X_n)$$

If $X_1,\dots,X_n$ are mutually independent, this reduces to

$$\operatorname{Var}\left(\sum_{i=1}^n a_iX_i\right) = \sum_{i=1}^{n} a_i^2\operatorname{Var}(X_i).$$
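A quick numerical check of rule 3 (an added illustration, not from the notes): draw correlated variables with a known covariance matrix and compare the sample variance of a linear combination against $\sum_i a_i^2 \operatorname{Var}(X_i) + 2\sum_{i<j} a_i a_j\operatorname{Cov}(X_i,X_j)$, which in matrix form is $a'\Sigma a$. Assuming NumPy (the matrix and weights are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
cov = np.array([[4.0, 1.0, 0.5],
                [1.0, 9.0, 2.0],
                [0.5, 2.0, 1.0]])      # known covariance matrix
a = np.array([1.0, -2.0, 3.0])

x = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=500_000)
combo = x @ a                          # a1*X1 + a2*X2 + a3*X3

print(np.var(combo))                   # empirical variance
print(a @ cov @ a)                     # rule 3 in matrix form: a' Cov a = 24
```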
Some Applications (Rules of Expectation & Variance)

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$ (variance $\sigma^2$). Let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i = \frac{1}{n}X_1 + \cdots + \frac{1}{n}X_n,$$

i.e. $\bar{X} = a_1X_1 + \cdots + a_nX_n$ with each $a_i = \frac{1}{n}$. Then

$$\mu_{\bar{X}} = E[\bar{X}] = \frac{1}{n}E[X_1] + \cdots + \frac{1}{n}E[X_n] = \frac{1}{n}\mu + \cdots + \frac{1}{n}\mu = \mu$$

Also

$$\sigma_{\bar{X}}^2 = \operatorname{Var}(\bar{X}) = \frac{1}{n^2}\operatorname{Var}(X_1) + \cdots + \frac{1}{n^2}\operatorname{Var}(X_n) = \frac{1}{n^2}\sigma^2 + \cdots + \frac{1}{n^2}\sigma^2 = n\cdot\frac{\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Thus $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$.

Hence the distribution of $\bar{X}$ is centered at $\mu$ and becomes more and more compact about $\mu$ as $n$ increases.
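A small simulation (an added sketch, assuming NumPy; the values of $\mu$, $\sigma$ and the replication count are arbitrary) makes the $\sigma/\sqrt{n}$ scaling visible:

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 10.0, 2.0

for n in (4, 16, 64, 256):
    # 20,000 replications of the mean of n independent observations
    xbar = rng.normal(mu, sigma, size=(20_000, n)).mean(axis=1)
    # empirical mean stays at mu; empirical sd tracks sigma/sqrt(n)
    print(n, xbar.mean().round(3), xbar.std().round(3), sigma / np.sqrt(n))
```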
Tchebychev's Inequality

Let $X$ denote a random variable with mean $\mu = E(X)$ and variance $\operatorname{Var}(X) = E[(X-\mu)^2] = \sigma^2$. Then

$$P\big[|X-\mu| \ge k\sigma\big] \le \frac{1}{k^2} \quad \text{or equivalently} \quad P\big[|X-\mu| < k\sigma\big] \ge 1 - \frac{1}{k^2}$$

Note: $\sigma = \sqrt{\operatorname{Var}(X)}$ is called the standard deviation of $X$.
Proof:

$$\operatorname{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\, dx$$
$$\ge \int_{-\infty}^{\mu-k\sigma} (x-\mu)^2 f(x)\, dx + \int_{\mu+k\sigma}^{\infty} (x-\mu)^2 f(x)\, dx$$
$$\ge \int_{-\infty}^{\mu-k\sigma} k^2\sigma^2 f(x)\, dx + \int_{\mu+k\sigma}^{\infty} k^2\sigma^2 f(x)\, dx \qquad \text{(since } (x-\mu)^2 \ge k^2\sigma^2 \text{ on both regions)}$$
$$= k^2\sigma^2\Big(P[X \le \mu - k\sigma] + P[X \ge \mu + k\sigma]\Big) = k^2\sigma^2\, P\big[|X-\mu| \ge k\sigma\big]$$

Thus $\sigma^2 \ge k^2\sigma^2\, P\big[|X-\mu| \ge k\sigma\big]$, so

$$P\big[|X-\mu| \ge k\sigma\big] \le \frac{1}{k^2} \quad \text{and} \quad P\big[|X-\mu| < k\sigma\big] \ge 1 - \frac{1}{k^2}.$$
Tchebychev's inequality is very conservative. Since $P\big[\mu - k\sigma < X < \mu + k\sigma\big] \ge 1 - \frac{1}{k^2}$:

• $k = 1$: $P\big[\mu - \sigma < X < \mu + \sigma\big] \ge 1 - \frac{1}{1^2} = 0$

• $k = 2$: $P\big[\mu - 2\sigma < X < \mu + 2\sigma\big] \ge 1 - \frac{1}{2^2} = \frac{3}{4}$

• $k = 3$: $P\big[\mu - 3\sigma < X < \mu + 3\sigma\big] \ge 1 - \frac{1}{3^2} = \frac{8}{9}$
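For comparison (an added aside, not in the original notes): if $X$ happens to be normal, the exact probability is $P[|X-\mu| < k\sigma] = \operatorname{erf}(k/\sqrt{2})$, far above the Tchebychev bound, which must hold for every distribution. A sketch using only the Python standard library:

```python
from math import erf, sqrt

for k in (1, 2, 3):
    tcheby = 1 - 1 / k**2                 # bound valid for any distribution
    normal = erf(k / sqrt(2))             # exact value if X is normal
    print(k, round(tcheby, 4), round(normal, 4))
# k=1: 0.0    vs 0.6827
# k=2: 0.75   vs 0.9545
# k=3: 0.8889 vs 0.9973
```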
The Law of Large Numbers

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having mean $\mu$ and standard deviation $\sigma$, and let

$$\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i.$$

Then for any $\varepsilon > 0$ (no matter how small)

$$P\big[|\bar{X}-\mu| < \varepsilon\big] = P\big[\mu - \varepsilon < \bar{X} < \mu + \varepsilon\big] \to 1 \quad \text{as } n \to \infty.$$

Proof: We will use Tchebychev's inequality, which states for any random variable $X$:

$$P\big[\mu_X - k\sigma_X < X < \mu_X + k\sigma_X\big] \ge 1 - \frac{1}{k^2}.$$

Now $\mu_{\bar{X}} = \mu$ and $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}}$. Choose $k$ so that

$$\varepsilon = k\sigma_{\bar{X}} = \frac{k\sigma}{\sqrt{n}}, \quad \text{i.e.} \quad k = \frac{\varepsilon\sqrt{n}}{\sigma}.$$

Then

$$P\big[\mu - \varepsilon < \bar{X} < \mu + \varepsilon\big] = P\big[\mu_{\bar{X}} - k\sigma_{\bar{X}} < \bar{X} < \mu_{\bar{X}} + k\sigma_{\bar{X}}\big] \ge 1 - \frac{1}{k^2} = 1 - \frac{\sigma^2}{n\varepsilon^2} \to 1 \quad \text{as } n \to \infty.$$

Thus $P\big[|\bar{X}-\mu| < \varepsilon\big] \to 1$ as $n \to \infty$.
A Special Case

Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$:

$$X_i = \begin{cases} 1 & \text{if repetition } i \text{ is a success (S), with probability } p \\ 0 & \text{if repetition } i \text{ is a failure (F), with probability } q = 1-p \end{cases}$$

so that $E[X_i] = p$ and

$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \hat{p} = \text{the proportion of successes}.$$

Thus the Law of Large Numbers states that

$$P\big[\,|\hat{p} - p| < \varepsilon\,\big] \to 1 \quad \text{as } n \to \infty,$$

i.e. $\hat{p}$, the proportion of successes, converges to the probability of success $p$.

Some people misinterpret this to mean that if the proportion of successes is currently lower than $p$, then the proportion of successes in the future will have to be larger than $p$ to counter this and ensure that the Law of Large Numbers holds true. Of course, if in the infinite future the proportion of successes is $p$, then that alone is enough to ensure that the Law of Large Numbers holds true; no compensation is required.
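A running-proportion simulation (an added illustration, assuming NumPy; $p$ and the trial count are arbitrary) shows $\hat{p}$ settling toward $p$ without any compensating mechanism:

```python
import numpy as np

rng = np.random.default_rng(4)
p = 0.3
trials = rng.random(1_000_000) < p        # Bernoulli(p) outcomes

# proportion of successes after each trial
phat = np.cumsum(trials) / np.arange(1, trials.size + 1)
for n in (10, 100, 10_000, 1_000_000):
    print(n, phat[n - 1])                  # approaches 0.3 as n grows
```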
Some more applications: rules of expectation and rules of variance

The mean and variance of a Binomial random variable

We have already computed these by other methods:

1. Using the probability function $p(x)$.
2. Using the moment generating function $m_X(t)$.

Suppose that we have observed $n$ independent repetitions of a Bernoulli trial. Let $X_1,\dots,X_n$ be $n$ mutually independent random variables, each having a Bernoulli distribution with parameter $p$ and defined by

$$X_i = \begin{cases} 1 & \text{if repetition } i \text{ is a success (S), with probability } p \\ 0 & \text{if repetition } i \text{ is a failure (F), with probability } q \end{cases}$$

Now $X = X_1 + \cdots + X_n$ has a Binomial distribution with parameters $n$ and $p$; $X$ is the total number of successes in the $n$ repetitions. Then

$$E[X_i] = 1\cdot p + 0\cdot q = p, \qquad \mu_X = E[X] = E[X_1] + \cdots + E[X_n] = p + \cdots + p = np,$$
$$\operatorname{Var}(X_i) = \big(1^2 p + 0^2 q\big) - p^2 = p(1-p) = pq,$$
$$\operatorname{Var}(X) = \operatorname{Var}(X_1) + \cdots + \operatorname{Var}(X_n) = pq + \cdots + pq = npq.$$
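A brief check of $np$ and $npq$ by simulating the Bernoulli decomposition (an added sketch, assuming NumPy; $n$ and $p$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 40, 0.25

# X = X_1 + ... + X_n, each X_i ~ Bernoulli(p)
x = (rng.random((200_000, n)) < p).sum(axis=1)
print(x.mean(), n * p)               # ~10.0
print(x.var(), n * p * (1 - p))      # ~7.5
```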
The mean and variance of a Hypergeometric distribution

The hypergeometric distribution arises when we sample without replacement $n$ objects from a population of $N = a + b$ objects. The population is divided into two groups (group A and group B). Group A contains $a$ objects while group B contains $b$ objects.

Let $X$ denote the number of objects in the sample of $n$ that come from group A. The probability function of $X$ is:

$$p(x) = \frac{\dbinom{a}{x}\dbinom{b}{n-x}}{\dbinom{a+b}{n}}$$

Let $X_1,\dots,X_n$ be $n$ random variables defined by

$$X_i = \begin{cases} 1 & \text{if the } i^{\text{th}} \text{ object selected comes from group A} \\ 0 & \text{if the } i^{\text{th}} \text{ object selected comes from group B} \end{cases}$$

Then $X = X_1 + \cdots + X_n$, and

$$P[X_i = 1] = \frac{a}{a+b} \quad \text{and} \quad P[X_i = 0] = \frac{b}{a+b}.$$
Proof: Counting ordered samples, there are $a\cdot\frac{(a+b-1)!}{(a+b-n)!}$ samples with a group-A object in position $i$, out of $\frac{(a+b)!}{(a+b-n)!}$ in total, so

$$P[X_i = 1] = \frac{a\cdot\dfrac{(a+b-1)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a\,(a+b-1)!}{(a+b)!} = \frac{a}{a+b}.$$
Therefore

$$E[X_i] = 1\cdot P[X_i=1] + 0\cdot P[X_i=0] = \frac{a}{a+b}$$

and

$$E[X_i^2] = 1^2\cdot P[X_i=1] + 0^2\cdot P[X_i=0] = \frac{a}{a+b},$$

so

$$\operatorname{Var}(X_i) = E[X_i^2] - \big(E[X_i]\big)^2 = \frac{a}{a+b} - \left(\frac{a}{a+b}\right)^2 = \frac{a}{a+b}\left(1 - \frac{a}{a+b}\right) = \frac{a}{a+b}\cdot\frac{b}{a+b}.$$

Thus

$$E[X] = E[X_1 + \cdots + X_n] = \sum_{i=1}^n E[X_i] = \frac{na}{a+b}.$$
Also

$$\operatorname{Var}(X) = \operatorname{Var}(X_1 + \cdots + X_n) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j),$$

with $\operatorname{Var}(X_i) = \dfrac{a}{a+b}\cdot\dfrac{b}{a+b}$, so we also need to calculate $\operatorname{Cov}(X_i, X_j)$.

Note:

$$\operatorname{Cov}(U,V) = E\big[(U-\mu_U)(V-\mu_V)\big] = E\big[UV - \mu_V U - \mu_U V + \mu_U\mu_V\big]$$
$$= E[UV] - \mu_V\mu_U - \mu_U\mu_V + \mu_U\mu_V = E[UV] - \mu_U\mu_V = E[UV] - E[U]E[V].$$
Since $E[X_i] = \dfrac{a}{a+b}$,

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j].$$

Note:

$$E[X_iX_j] = 1\cdot P[X_iX_j = 1] + 0\cdot P[X_iX_j = 0] = P[X_i = 1, X_j = 1],$$

and, again counting ordered samples,

$$P[X_i = 1, X_j = 1] = \frac{a(a-1)\cdot\dfrac{(a+b-2)!}{(a+b-n)!}}{\dfrac{(a+b)!}{(a+b-n)!}} = \frac{a(a-1)}{(a+b)(a+b-1)}.$$

Thus

$$\operatorname{Cov}(X_i, X_j) = E[X_iX_j] - E[X_i]E[X_j] = \frac{a(a-1)}{(a+b)(a+b-1)} - \left(\frac{a}{a+b}\right)^2$$
$$= \frac{a}{a+b}\left[\frac{a-1}{a+b-1} - \frac{a}{a+b}\right] = \frac{a}{a+b}\cdot\frac{(a-1)(a+b) - a(a+b-1)}{(a+b-1)(a+b)}$$
$$= \frac{a}{a+b}\cdot\frac{-b}{(a+b-1)(a+b)} = \frac{-ab}{(a+b)^2(a+b-1)}.$$
Thus, with

$$\operatorname{Var}(X_i) = \frac{a}{a+b}\cdot\frac{b}{a+b} = \frac{ab}{(a+b)^2} \quad \text{and} \quad \operatorname{Cov}(X_i,X_j) = \frac{-ab}{(a+b)^2(a+b-1)},$$

$$\operatorname{Var}(X) = \operatorname{Var}(X_1+\cdots+X_n) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{i<j}\operatorname{Cov}(X_i,X_j)$$
$$= n\,\frac{ab}{(a+b)^2} - 2\binom{n}{2}\frac{ab}{(a+b)^2(a+b-1)} = n\,\frac{ab}{(a+b)^2} - n(n-1)\,\frac{ab}{(a+b)^2(a+b-1)}$$
$$= n\,\frac{ab}{(a+b)^2}\left[1 - \frac{n-1}{a+b-1}\right] = n\,p_A\,p_B\,(1-f)$$

where

$$p_A = \frac{a}{a+b}, \quad p_B = \frac{b}{a+b}, \quad \text{and} \quad f = \frac{n-1}{a+b-1} = \frac{n-1}{N-1}.$$

Thus if $X$ has a hypergeometric distribution with parameters $a$, $b$ and $n$, then

$$E[X] = n\,\frac{a}{a+b} = n\,p_A \quad \text{and} \quad \operatorname{Var}(X) = n\,p_A\,p_B\,(1-f).$$
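These formulas are easy to verify empirically (an added sketch, assuming NumPy, whose `hypergeometric(ngood, nbad, nsample)` sampler matches the $a$, $b$, $n$ parametrization here; the specific values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 12, 18, 8
N = a + b

x = rng.hypergeometric(ngood=a, nbad=b, nsample=n, size=500_000)

pA, pB = a / N, b / N
f = (n - 1) / (N - 1)                       # finite-population correction
print(x.mean(), n * pA)                     # ~3.2
print(x.var(), n * pA * pB * (1 - f))       # ~1.457
```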
The mean and variance of a Negative Binomial distribution

The Negative Binomial distribution arises when we repeat a Bernoulli trial until $k$ successes (S) occur. Then $X$ = the trial on which the $k^{\text{th}}$ success occurred. The probability function of $X$ is:

$$p(x) = \binom{x-1}{k-1} p^k q^{x-k}, \quad x = k,\ k+1,\ k+2,\ \dots$$

Let $X_1$ = the number of the trial on which the 1st success occurred, and $X_i$ = the number of trials after the $(i-1)^{\text{st}}$ success up to and including the trial on which the $i^{\text{th}}$ success occurred ($i \ge 2$). Then $X = X_1 + \cdots + X_k$, where $X_1,\dots,X_k$ are mutually independent and each has a geometric distribution with parameter $p$, so that

$$E[X_i] = \frac{1}{p} \quad \text{and} \quad \operatorname{Var}(X_i) = \frac{q}{p^2}.$$

Hence

$$E[X] = \sum_{i=1}^k E[X_i] = \frac{k}{p} \quad \text{and} \quad \operatorname{Var}(X) = \sum_{i=1}^k \operatorname{Var}(X_i) = \frac{kq}{p^2}.$$

Thus if $X$ has a negative binomial distribution with parameters $k$ and $p$, then

$$E[X] = \frac{k}{p} \quad \text{and} \quad \operatorname{Var}(X) = \frac{kq}{p^2}.$$
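Again a quick empirical check (an added sketch, assuming NumPy; $k$ and $p$ are arbitrary): summing $k$ independent geometric trial counts reproduces $k/p$ and $kq/p^2$. Note that NumPy's `geometric` counts trials up to and including the first success, matching the convention here:

```python
import numpy as np

rng = np.random.default_rng(7)
k, p = 5, 0.3
q = 1 - p

# X = X_1 + ... + X_k, each X_i ~ Geometric(p) (number of trials)
x = rng.geometric(p, size=(300_000, k)).sum(axis=1)

print(x.mean(), k / p)           # ~16.67
print(x.var(), k * q / p**2)     # ~38.89
```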
Distribution Functions, Moments, and Moment Generating Functions in the Multivariate Case
The distribution function F(x)

This is defined for any random variable $X$:

$$F(x) = P[X \le x]$$

Properties

1. $F(-\infty) = 0$ and $F(\infty) = 1$.

2. $F(x)$ is non-decreasing (i.e. if $x_1 < x_2$ then $F(x_1) \le F(x_2)$).

3. $F(b) - F(a) = P[a < X \le b]$.

4. Discrete random variables: $F(x)$ is a non-decreasing step function with

$$F(x) = P[X \le x] = \sum_{u \le x} p(u), \qquad p(x) = F(x) - F(x^-) = \text{the jump in } F(x) \text{ at } x,$$

and $F(-\infty) = 0$, $F(\infty) = 1$.
[Figure: step-function graph of F(x) for a discrete random variable, with the jumps p(x) marked, plotted for x from -1 to 4]
5. Continuous random variables: $F(x)$ is a non-decreasing continuous function with

$$F(x) = P[X \le x] = \int_{-\infty}^{x} f(u)\, du, \qquad f(x) = F'(x),$$

and $F(-\infty) = 0$, $F(\infty) = 1$.

[Figure: graph of a continuous F(x); the slope of F at x is f(x)]

To find the probability density function $f(x)$, one first finds $F(x)$ and then computes $f(x) = F'(x)$.
The joint distribution function F(x1, x2, …, xk)

is defined for $k$ random variables $X_1, X_2, \dots, X_k$:

$$F(x_1, x_2, \dots, x_k) = P[X_1 \le x_1,\ X_2 \le x_2,\ \dots,\ X_k \le x_k]$$

For $k = 2$:

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2]$$

[Figure: the point (x1, x2) in the plane; F(x1, x2) is the probability of the region below and to the left of it]
Properties

1. $F(x_1, -\infty) = F(-\infty, x_2) = F(-\infty, -\infty) = 0$.

2. $F(x_1, \infty) = P[X_1 \le x_1, X_2 \le \infty] = P[X_1 \le x_1] = F_1(x_1)$, the marginal cumulative distribution function of $X_1$;
$F(\infty, x_2) = P[X_1 \le \infty, X_2 \le x_2] = P[X_2 \le x_2] = F_2(x_2)$, the marginal cumulative distribution function of $X_2$;
$F(\infty, \infty) = P[X_1 \le \infty, X_2 \le \infty] = 1$.

3. $F(x_1, x_2)$ is non-decreasing in both the $x_1$ direction and the $x_2$ direction; i.e. if $a_1 < b_1$ and $a_2 < b_2$ then

i. $F(a_1, x_2) \le F(b_1, x_2)$
ii. $F(x_1, a_2) \le F(x_1, b_2)$
iii. $F(a_1, a_2) \le F(b_1, b_2)$
4. $P[a < X_1 \le b,\ c < X_2 \le d] = F(b,d) - F(a,d) - F(b,c) + F(a,c)$.

[Figure: the rectangle with corners (a, c), (b, c), (a, d), (b, d)]
5. Discrete random variables: $F(x_1, x_2)$ is a step surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \sum_{u_1 \le x_1}\sum_{u_2 \le x_2} p(u_1, u_2),$$

and $p(x_1, x_2)$ = the jump in $F(x_1, x_2)$ at $(x_1, x_2)$.
6. Continuous random variables: $F(x_1, x_2)$ is a surface with

$$F(x_1, x_2) = P[X_1 \le x_1,\ X_2 \le x_2] = \int_{-\infty}^{x_1}\!\!\int_{-\infty}^{x_2} f(u_1, u_2)\, du_2\, du_1,$$

$$f(x_1, x_2) = \frac{\partial^2 F(x_1, x_2)}{\partial x_1\, \partial x_2} = \frac{\partial^2 F(x_1, x_2)}{\partial x_2\, \partial x_1}.$$
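A symbolic illustration (added here, assuming SymPy): for two independent Uniform(0,1) variables, $F(x_1,x_2) = x_1 x_2$ on the unit square, and differentiating recovers the constant joint density:

```python
import sympy as sp

x1, x2 = sp.symbols('x1 x2')

# Joint CDF of two independent Uniform(0,1) variables on the unit square
F = x1 * x2

f = sp.diff(F, x1, x2)        # f(x1, x2) = d^2 F / dx1 dx2
print(f)                      # 1, the uniform joint density
```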
Multivariate Moments: Non-central and Central

Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be:

$$\mu_{k_1,k_2} = E\big[X_1^{k_1} X_2^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} x_1^{k_1} x_2^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} x_1^{k_1} x_2^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$
Definition: Let $X_1$ and $X_2$ be jointly distributed random variables (discrete or continuous). Then for any pair of positive integers $(k_1, k_2)$ the joint central moment of $(X_1, X_2)$ of order $(k_1, k_2)$ is defined to be:

$$\mu^{0}_{k_1,k_2} = E\big[(X_1-\mu_1)^{k_1} (X_2-\mu_2)^{k_2}\big] = \begin{cases} \displaystyle\sum_{x_1}\sum_{x_2} (x_1-\mu_1)^{k_1} (x_2-\mu_2)^{k_2}\, p(x_1, x_2) & \text{if } X_1, X_2 \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} (x_1-\mu_1)^{k_1} (x_2-\mu_2)^{k_2}\, f(x_1, x_2)\, dx_1\, dx_2 & \text{if } X_1, X_2 \text{ are continuous} \end{cases}$$

where $\mu_1 = E[X_1]$ and $\mu_2 = E[X_2]$.

Note:

$$\mu^{0}_{1,1} = E\big[(X_1-\mu_1)(X_2-\mu_2)\big] = \operatorname{Cov}(X_1, X_2) = \text{the covariance of } X_1 \text{ and } X_2.$$
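A small discrete example (added illustration, plain Python; the joint probability table is hypothetical, chosen only so the sums are easy to check): computing $\mu_{1,1}$ and the central moment $\mu^{0}_{1,1} = \operatorname{Cov}(X_1,X_2)$ directly from the definitions.

```python
# Hypothetical joint probability table p(x1, x2); entries sum to 1
p = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

mu1 = sum(x1 * pr for (x1, x2), pr in p.items())          # E[X1] = 0.5
mu2 = sum(x2 * pr for (x1, x2), pr in p.items())          # E[X2] = 0.7
m11 = sum(x1 * x2 * pr for (x1, x2), pr in p.items())     # mu_{1,1} = E[X1 X2]

cov = sum((x1 - mu1) * (x2 - mu2) * pr                    # mu0_{1,1}
          for (x1, x2), pr in p.items())

print(m11, cov, m11 - mu1 * mu2)   # cov equals E[X1 X2] - E[X1] E[X2] = 0.05
```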
Multivariate Moment Generating Functions

Recall the moment generating function:

$$m_X(t) = E\big[e^{tX}\big] = \begin{cases} \displaystyle\sum_{x} e^{tx}\, p(x) & \text{if } X \text{ is discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty} e^{tx}\, f(x)\, dx & \text{if } X \text{ is continuous} \end{cases}$$
Definition: Let $X_1, X_2, \dots, X_k$ be jointly distributed random variables (discrete or continuous). Then the joint moment generating function is defined to be:

$$m_{X_1,\dots,X_k}(t_1,\dots,t_k) = E\big[e^{t_1X_1 + \cdots + t_kX_k}\big] = \begin{cases} \displaystyle\sum_{x_1}\cdots\sum_{x_k} e^{t_1x_1+\cdots+t_kx_k}\, p(x_1,\dots,x_k) & \text{if } X_1,\dots,X_k \text{ are discrete} \\[2ex] \displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\!\int_{-\infty}^{\infty} e^{t_1x_1+\cdots+t_kx_k}\, f(x_1,\dots,x_k)\, dx_1\cdots dx_k & \text{if } X_1,\dots,X_k \text{ are continuous} \end{cases}$$
Note:

$$m_{X_1,\dots,X_k}(0,\dots,0) = 1 \quad \text{and} \quad m_{X_1,\dots,X_k}(0,\dots,0,t_i,0,\dots,0) = m_{X_i}(t_i);$$

setting every argument except $t_i$ to zero recovers the marginal moment generating function of $X_i$.
Power series expansion of the joint moment generating function (k = 2)

$$m_{X,Y}(t,s) = E\big[e^{tX+sY}\big] = E\big[e^{tX}e^{sY}\big]$$

Using $e^u = 1 + u + \dfrac{u^2}{2!} + \dfrac{u^3}{3!} + \dfrac{u^4}{4!} + \cdots$:

$$m_{X,Y}(t,s) = E\left[1 + (tX+sY) + \frac{(tX+sY)^2}{2!} + \cdots\right]$$
$$= 1 + E[X]\,t + E[Y]\,s + E[X^2]\frac{t^2}{2!} + E[XY]\,ts + E[Y^2]\frac{s^2}{2!} + \cdots + E[X^kY^m]\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$
$$= 1 + \mu_{1,0}\,t + \mu_{0,1}\,s + \mu_{2,0}\frac{t^2}{2!} + \mu_{1,1}\,ts + \mu_{0,2}\frac{s^2}{2!} + \cdots + \mu_{k,m}\frac{t^k}{k!}\frac{s^m}{m!} + \cdots$$

So the coefficient of $\dfrac{t^k}{k!}\dfrac{s^m}{m!}$ in the expansion is the joint moment $\mu_{k,m} = E[X^kY^m]$.
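As an added illustration of reading moments off the expansion (assuming SymPy; the bivariate normal MGF used here is a standard fact, not derived in these notes, and the parameter values are arbitrary): differentiating $m_{X,Y}(t,s)$ $k$ times in $t$ and $m$ times in $s$, then evaluating at $(0,0)$, yields $\mu_{k,m}$.

```python
import sympy as sp

t, s = sp.symbols('t s')
mu1, mu2, s1, s2, rho = 1, 2, 3, 4, sp.Rational(1, 2)

# Joint MGF of a bivariate normal with means mu1, mu2, sds s1, s2, corr rho
m = sp.exp(mu1*t + mu2*s + (s1**2*t**2 + 2*rho*s1*s2*t*s + s2**2*s**2)/2)

# mu_{1,1} = E[XY]: differentiate once in t and once in s, evaluate at (0,0)
m11 = sp.diff(m, t, s).subs({t: 0, s: 0})
print(m11)                     # mu1*mu2 + rho*s1*s2 = 2 + 6 = 8
```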