

Lecture 1 Probability & Statistics: A brief overview

Shiwei Lan
School of Mathematical and Statistical Sciences, Arizona State University
STP427 Mathematical Statistics, Fall 2019


Table of Contents

1 Basic Concepts

2 Conditional Probability

3 Random Variables

4 Mathematical Expectation

5 Multivariate Distributions

6 Common Probability Distributions


Probability vs Statistics



Terminology

• random experiment: an experiment whose outcome cannot be predicted with certainty.
• outcome c: a specific result of the experiment.
• sample space 𝒞: the collection of all possible outcomes.
• event C: a collection of outcomes, i.e. a subset of the sample space.
• We have c ∈ C ⊂ 𝒞.


Set Theory

Figure: Venn diagrams

• commutativity: C1 ∪ C2 = C2 ∪ C1, C1 ∩ C2 = C2 ∩ C1.
• associativity: (C1 ∪ C2) ∪ C3 = C1 ∪ (C2 ∪ C3), (C1 ∩ C2) ∩ C3 = C1 ∩ (C2 ∩ C3).
• distributive laws: C1 ∪ (C2 ∩ C3) = (C1 ∪ C2) ∩ (C1 ∪ C3), C1 ∩ (C2 ∪ C3) = (C1 ∩ C2) ∪ (C1 ∩ C3).
• De Morgan's laws: (C1 ∪ C2)^c = C1^c ∩ C2^c, (C1 ∩ C2)^c = C1^c ∪ C2^c.


Probability

Definition (Probability)

Let 𝒞 be a sample space and let B be the set of events. Let P be a real-valued function defined on B. Then P is a probability set function if P satisfies the following three conditions:

1 P(C) ≥ 0 for all C ∈ B.
2 P(𝒞) = 1.
3 If {Cn} is a sequence of events in B and Cm ∩ Cn = ∅ for all m ≠ n, then P(∪_{n=1}^∞ Cn) = Σ_{n=1}^∞ P(Cn).


Theoretical Properties

• For each C ∈ B, P(C) = 1 − P(C^c).
• P(∅) = 0.
• If C1 ⊂ C2, then P(C1) ≤ P(C2).
• For each C ∈ B, 0 ≤ P(C) ≤ 1.
• For C1, C2 ∈ B, P(C1 ∪ C2) = P(C1) + P(C2) − P(C1 ∩ C2).
• Let pk = Σ_{1≤i1<···<ik≤n} P(Ci1 ∩ ··· ∩ Cik); then P(∪_{k=1}^n Ck) = Σ_{k=1}^n (−1)^{k+1} pk (inclusion-exclusion).
• In general p1 ≥ p2 ≥ ··· ≥ pn. In particular, we have
  • Boole's inequality: Σ_{k=1}^n P(Ck) ≥ P(∪_{k=1}^n Ck), which also holds as n → ∞.
  • Bonferroni's inequality: P(C1 ∩ C2) ≥ P(C1) + P(C2) − 1.
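These properties are easy to check empirically. A minimal Monte Carlo sketch (assuming NumPy; the two die-roll events are a hypothetical illustration) of the addition rule, Boole's inequality, and Bonferroni's inequality:

import numpy as np

rng = np.random.default_rng(0)
rolls = rng.integers(1, 7, size=100_000)     # fair six-sided die

C1 = rolls % 2 == 0                          # event C1: roll is even
C2 = rolls <= 3                              # event C2: roll is at most 3

p1, p2 = C1.mean(), C2.mean()
p_union = (C1 | C2).mean()
p_inter = (C1 & C2).mean()

print(p_union, p1 + p2 - p_inter)            # addition rule: the two agree up to noise
print(p1 + p2 >= p_union)                    # Boole's inequality: True
print(p_inter >= p1 + p2 - 1)                # Bonferroni's inequality: True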


Counting

• Frequentist statisticians define probability using (relative) frequency: the number of outcomes in an event divided by the total number of outcomes.
• We need counting rules such as the multiplication rule. Moreover, we have the following counting formulae, depending on whether the random draw is with replacement and whether the results are ordered.

Select k objects out of n | With replacement | Without replacement (k ≤ n)
ordered                   | n^k              | P(n, k) = n!/(n − k)!
unordered                 | C(n − 1 + k, k)  | C(n, k)

Table: Counting Formulae
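A quick numerical check of the four formulae; a minimal sketch assuming Python 3.8+ (for math.comb and math.perm):

from math import comb, perm

n, k = 5, 3
print(n ** k)                # ordered, with replacement
print(perm(n, k))            # ordered, without replacement: n!/(n-k)!
print(comb(n - 1 + k, k))    # unordered, with replacement (multisets)
print(comb(n, k))            # unordered, without replacement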


Conditional Probability

Definition (Conditional Probability)

If P(C1) > 0, then the conditional probability of the event C2 given the event C1 is defined as

P(C2|C1) = P(C1 ∩ C2) / P(C1)

This definition satisfies the requirements of a probability set function:
1 P(C2|C1) ≥ 0.
2 P(C1|C1) = 1.
3 P(∪_{n=2}^∞ Cn | C1) = Σ_{n=2}^∞ P(Cn | C1) for {Cn}n≥2 mutually exclusive.

We immediately have the following:
• multiplication rule: P(C1 ∩ C2) = P(C1)P(C2|C1).
• law of total probability: if C1, . . . , Ck form a partition of 𝒞, then P(C) = Σ_{i=1}^k P(Ci)P(C|Ci).


Bayes’ Theorem

Theorem (Bayes’ Theorem)

If C1, . . . , Ck form a partition of 𝒞 and P(C) > 0, then

P(Cj|C) = P(C ∩ Cj) / P(C) = P(Cj)P(C|Cj) / [Σ_{i=1}^k P(Ci)P(C|Ci)]

• P(Cj)’s are called prior probabilities.

• P(Cj |C )’s are called posterior probabilities.

• The theorem enables us to update our prior belief (P(Cj)) with data (P(C|Cj)) to obtain new knowledge (P(Cj|C)), which is the foundation of Bayesian statistics.
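As a concrete illustration, a minimal sketch of the prior-to-posterior update (the three-event partition and all numbers below are hypothetical):

# hypothetical partition {C1, C2, C3} with prior P(Cj) and likelihood P(C|Cj)
prior = [0.5, 0.3, 0.2]
likelihood = [0.10, 0.40, 0.70]

# law of total probability gives the denominator P(C)
evidence = sum(p * l for p, l in zip(prior, likelihood))
posterior = [p * l / evidence for p, l in zip(prior, likelihood)]

print(evidence)      # P(C)
print(posterior)     # P(Cj|C); the posterior probabilities sum to 1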


Monty Hall Problem

The Monty Hall problem is a brain teaser, in the form of a probability puzzle, loosely based on the American television game show Let's Make a Deal and named after its original host, Monty Hall.

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?"

Is it to your advantage to switch your choice?
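A minimal simulation sketch (assuming only Python's standard random module) that estimates the win probability of the "stay" and "switch" strategies:

import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randrange(3)
        pick = random.randrange(3)
        # the host opens a goat door that is neither the pick nor the car
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(play(switch=False))    # stay:   ~1/3
print(play(switch=True))     # switch: ~2/3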


Independence

Definition (Independence)

Events C1 and C2 are independent if

P(C1 ∩ C2) = P(C1)P(C2)

It immediately implies that P(C2|C1) = P(C2) if P(C1) > 0, or P(C1|C2) = P(C1) if P(C2) > 0. For multiple events, we have

Definition (Independence among multiple events)

Events C1, . . . , Cn are pairwise independent if

P(Ci ∩ Cj) = P(Ci)P(Cj), 1 ≤ i ≠ j ≤ n

They are mutually independent if for every 2 ≤ k ≤ n and all distinct indices d1, . . . , dk in {1, . . . , n},

P(Cd1 ∩ ··· ∩ Cdk) = Π_{j=1}^k P(Cdj)


Independence

• mutual independence ⇒ pairwise independence?
• mutual independence ⇐ pairwise independence?
• counter-example?
• Hint: consider an urn with balls numbered 1, 2, 3, 4; how would you construct events A, B, C? (See the sketch below.)
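Following the hint, a minimal sketch (standard library only) of one common construction; the specific events A = {1, 2}, B = {1, 3}, C = {1, 4} are an illustrative choice, not necessarily the intended answer:

from itertools import combinations
from fractions import Fraction

omega = {1, 2, 3, 4}                       # one ball drawn uniformly at random

def P(event):
    return Fraction(len(event), len(omega))

A, B, C = {1, 2}, {1, 3}, {1, 4}

# pairwise independence: P(E ∩ F) = P(E)P(F) for every pair
print(all(P(E & F) == P(E) * P(F) for E, F in combinations([A, B, C], 2)))   # True

# mutual independence would also require the triple product to match
print(P(A & B & C) == P(A) * P(B) * P(C))                                    # False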


Random Variable

Definition (Random Variable)

Consider the probability space (𝒞, B, P). A random variable is a function that assigns to each element c ∈ 𝒞 one and only one real number X(c) = x. The space or range of X is the set of real numbers 𝒟 = {x : x = X(c), c ∈ 𝒞}. Depending on whether 𝒟 is a countable set or an interval of real numbers, we call X a discrete or a continuous random variable.


Probability Distribution

Note the random variable X induces a probability PX on 𝒟 ⊂ R:

PX(D) = P[c ∈ 𝒞 : X(c) ∈ D], for all D ⊂ 𝒟


Probability Distribution

Definition (Probability Mass (Density) Function)

If 𝒟 = {di}, then the probability mass function (pmf) of the random variable X is

pX(di) = P[c ∈ 𝒞 : X(c) = di]

If there exists a nonnegative function fX(x) such that, for (a, b) ∈ σ(𝒟),

PX[(a, b)] = P[c ∈ 𝒞 : a < X(c) < b] = ∫_a^b fX(x) dx

then we call fX the probability density function (pdf) of X.


Probability Distribution

Definition (Cumulative Distribution Function (CDF))

Let X be a random variable. Then its cumulative distribution function (cdf) is defined by

FX(x) = PX((−∞, x]) = P[c ∈ 𝒞 : X(c) ≤ x]


Probability Distribution

• The CDF is a nondecreasing, right-continuous function, bounded between 0 and 1.
• If X is discrete, FX(x) = Σ_{x′≤x} pX(x′) and pX(x) = FX(x) − FX(x−).
• If X is continuous, FX(x) = ∫_{−∞}^x fX(x′) dx′ and fX(x) = (d/dx)FX(x) if fX is continuous. fX(x) = ?
• If X is continuous, P(a < X ≤ b) = P(a ≤ X ≤ b) = P(a ≤ X < b) = P(a < X < b). Is it true if X is discrete?


Transformation

Consider the transformation Y = g(X), where g is one-to-one on the support SX of X.

• If X is discrete, then

pY(y) = P[Y = y] = P[g(X) = y] = P[X = g⁻¹(y)] = pX(g⁻¹(y))

• If X is continuous and we further assume g is differentiable, then

fY(y) = fX(g⁻¹(y)) |dx/dy|, for y ∈ SY

where the support of Y is the set SY = {y = g(x) : x ∈ SX}.
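A minimal numerical sketch (assuming NumPy) of the continuous change-of-variable formula; the choice Y = X² with X ~ Exp(1), which is monotone on the support x > 0, is an illustrative assumption:

import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)    # X ~ Exp(1): f_X(x) = e^{-x}, x > 0
y = x ** 2                                        # Y = g(X) = X^2, monotone on x > 0

# change-of-variable formula: f_Y(y) = f_X(g^{-1}(y)) |dx/dy| = e^{-sqrt(y)} / (2 sqrt(y))
def f_Y(y):
    return np.exp(-np.sqrt(y)) / (2.0 * np.sqrt(y))

# compare the formula with a histogram estimate of the density of the simulated Y
hist, edges = np.histogram(y, bins=np.linspace(0.5, 2.5, 5), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.round(hist, 3))        # empirical density on each bin
print(np.round(f_Y(mid), 3))    # formula at the bin midpoints (roughly agrees)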


Mathematical Expectation

Definition (Expectation)

If X is a continuous random variable with pdf f(x) and ∫_{−∞}^∞ |x| f(x) dx < ∞, then the expectation of X is

E(X) = ∫_{−∞}^∞ x f(x) dx

If X is a discrete random variable with pmf p(x) and Σ_x |x| p(x) < ∞, then the expectation of X is

E(X) = Σ_x x p(x)

In general, the expectation of Y = g(X) can be calculated by substituting the integrand (summand) with g(x)fX(x) (resp. g(x)pX(x)), as long as it is absolutely integrable (summable).
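A minimal sketch (assuming NumPy and SciPy) computing E[X] and E[g(X)] both by numerical integration of the definition and by Monte Carlo; X ~ Exp(1) and g(x) = x² are illustrative choices:

import numpy as np
from scipy.integrate import quad

f = lambda x: np.exp(-x)       # pdf of X ~ Exp(1) on (0, inf)
g = lambda x: x ** 2

EX, _ = quad(lambda x: x * f(x), 0, np.inf)        # E[X]    (exact value: 1)
EgX, _ = quad(lambda x: g(x) * f(x), 0, np.inf)    # E[g(X)] (exact value: 2)

rng = np.random.default_rng(0)
x = rng.exponential(size=1_000_000)
print(EX, x.mean())            # quadrature vs Monte Carlo, ~1
print(EgX, (x ** 2).mean())    # quadrature vs Monte Carlo, ~2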


Variance and Moments of higher order

Definition (Variance)

If X is a random variable with finite mean µ = E[X] and E[(X − µ)²] is finite, then the variance of X is defined to be E[(X − µ)²], usually denoted by σ² or Var(X). Note

Var(X) = E[X²] − (E[X])²

It is conventional to call σ (the square root of the variance) the standard deviation of X. In general, the k-th moment of X is defined as µk := E[X^k] if it exists.

Definition (Moment Generating Function (mgf))

Let X be a random variable such that, for some h > 0, the expectation of e^{tX} exists for −h < t < h. The moment generating function (mgf) of X is defined to be MX(t) = E[e^{tX}]. We have

µk = (d^k/dt^k) M(t) |_{t=0}
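A minimal sketch (assuming SymPy) recovering moments by differentiating an mgf at t = 0; the mgf used is that of N(µ, σ²), listed in the table at the end of the lecture:

import sympy as sp

t, mu, sigma = sp.symbols('t mu sigma', real=True)
M = sp.exp(mu * t + sp.Rational(1, 2) * sigma**2 * t**2)   # mgf of N(mu, sigma^2)

m1 = sp.diff(M, t, 1).subs(t, 0)        # first moment:  mu
m2 = sp.diff(M, t, 2).subs(t, 0)        # second moment: mu^2 + sigma^2
print(sp.simplify(m1), sp.simplify(m2))
print(sp.simplify(m2 - m1**2))          # second central moment, i.e. the variance: sigma^2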


Random Vector

• Random vector: a function (X1, X2) : 𝒞 → R², with space 𝒟 = {(x1, x2) : x1 = X1(c), x2 = X2(c), c ∈ 𝒞}.
• Joint cdf: FX1,X2(x1, x2) = P[{X1 ≤ x1} ∩ {X2 ≤ x2}] = P[X1 ≤ x1, X2 ≤ x2].
• Joint pmf: pX1,X2(x1, x2) = P[X1 = x1, X2 = x2]; joint pdf: fX1,X2(x1, x2) = ∂²FX1,X2(x1, x2)/∂x1∂x2.
• Marginal cdf: FX1(x1) = P[X1 ≤ x1, −∞ < X2 < ∞] = lim_{x2↑∞} FX1,X2(x1, x2).
• Expectation of Y = g(X1, X2) for g : R² → R is E[Y] = ∫∫ g(x1, x2) fX1,X2(x1, x2) dx1 dx2 or E[Y] = Σ_{x1} Σ_{x2} g(x1, x2) pX1,X2(x1, x2). Expectation is a linear operator.
• Mgf of X = (X1, X2)′: MX(t) = E[e^{t′X}] = E[e^{t1X1 + t2X2}].
• Transformation: Y = [g1(X), g2(X)]′ := G(X); then fY(y) = fX(G⁻¹(y)) |∂x/∂y|, where |∂x/∂y| is the Jacobian determinant of the inverse transformation.


Random Vector

Example

Let Y1 = (1/2)(X1 − X2), where X1 and X2 have the joint pdf

fX1,X2(x1, x2) = (1/4) exp(−(x1 + x2)/2), for 0 < x1 < ∞, 0 < x2 < ∞,

and 0 elsewhere. What is the distribution of Y1?
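Before deriving the answer, a minimal Monte Carlo sketch (assuming NumPy) for getting a feel for the shape of Y1; note that the joint pdf factorizes, so X1 and X2 are independent exponentials with mean 2:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.exponential(scale=2.0, size=n)   # f(x) = (1/2) e^{-x/2}, x > 0
x2 = rng.exponential(scale=2.0, size=n)
y1 = 0.5 * (x1 - x2)

# a crude histogram of Y1: symmetric about 0 with a sharp peak and heavier-than-normal tails
hist, edges = np.histogram(y1, bins=np.arange(-4.0, 4.5, 0.5), density=True)
for lo, hi, h in zip(edges[:-1], edges[1:], hist):
    print(f"[{lo:+.1f}, {hi:+.1f}): {h:.3f}")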


Conditional Distributions and Expectations

• Conditional pmf: pX2|X1(x2|x1) = pX1,X2(x1, x2) / pX1(x1) for given x1 with pX1(x1) > 0; conditional pdf: fX2|X1(x2|x1) = fX1,X2(x1, x2) / fX1(x1) for given x1 with fX1(x1) > 0.
• Conditional cdf: FX2|X1(x2|x1) can be calculated using the conditional pmf or pdf.
• Conditional expectation: E[u(X2)|x1] = ∫ u(x2) fX2|X1(x2|x1) dx2; conditional variance: Var(X2|x1) = E[X2²|x1] − (E[X2|x1])².

Theorem

Let (X1, X2) be a random vector such that the variance of X2 is finite. Then
1 E[E[X2|X1]] = E(X2).
2 Var[E[X2|X1]] ≤ Var(X2).
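A minimal sketch (assuming NumPy; the hierarchical model X1 ~ Exp(1), X2 | X1 ~ N(X1, 1) is a hypothetical illustration) checking both statements by simulation:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
x1 = rng.exponential(scale=1.0, size=n)   # any X1 with finite variance (here Exp(1))
x2 = rng.normal(loc=x1, scale=1.0)        # X2 | X1 ~ N(X1, 1), so E[X2|X1] = X1

print(x2.mean(), x1.mean())               # E[X2] = E[E[X2|X1]]: both ~1
print(np.var(x1), np.var(x2))             # Var(E[X2|X1]) <= Var(X2): ~1 vs ~2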


Correlation Coefficient

Definition (Covariance and Correlation Coefficient)

The covariance between random variables X and Y, denoted cov(X, Y), is defined to be E[(X − µX)(Y − µY)] = E[XY] − µXµY. If each of σ1 and σ2 (the standard deviations of X and Y) is finite, the number

ρ = E[(X − µX)(Y − µY)] / (σ1σ2) = cov(X, Y) / (σ1σ2)

is called the correlation coefficient of X and Y.


Independent Random Variables

Definition (Independence)

Let the random variables X1 and X2 have the joint pdf f(x1, x2) (joint pmf p(x1, x2)) and the marginal pdfs f1(x1), f2(x2) (marginal pmfs p1(x1), p2(x2)), respectively. X1 and X2 are independent if and only if f(x1, x2) ≡ f1(x1)f2(x2) (p(x1, x2) ≡ p1(x1)p2(x2)). Otherwise they are said to be dependent.

• independence ⇒ uncorrelatedness?
• independence ⇐ uncorrelatedness?

• counter-example?
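A minimal sketch (assuming NumPy) of one classic construction, given here as a hint rather than as the intended answer: X ~ N(0, 1) and Y = X² are uncorrelated yet clearly dependent:

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)
y = x ** 2                                    # Y is completely determined by X

print(np.cov(x, y)[0, 1])                     # ~0: cov(X, X^2) = E[X^3] = 0, so uncorrelated
print(y.mean(), y[np.abs(x) > 2].mean())      # yet knowing |X| > 2 changes E[Y]: dependent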


Independent Random Variables

Criteria for judging independence of X1 and X2:
• X1 and X2 have separate supports S1 and S2 (the joint support is the product S1 × S2) and the joint pdf factorizes: f(x1, x2) ≡ g(x1)h(x2).
• The joint cdf factorizes: F(x1, x2) = F1(x1)F2(x2).
• The joint probability factorizes: P(a < X1 ≤ b, c < X2 ≤ d) = P(a < X1 ≤ b)P(c < X2 ≤ d).
• The joint mgf factorizes: M(t1, t2) = M(t1, 0)M(0, t2).

When X1 and X2 are independent,

E[u(X1)v(X2)] = E[u(X1)]E[v(X2)]

provided all of these expectations exist.


Conditional Distributions

Example

Suppose X1 and X2 are jointly Gaussian random variables such that X = (X1, X2)′ ~ N(µ, Σ) with joint density function

fX1,X2(x1, x2) = (1 / (2π √|Σ|)) exp( −(1/2)(x − µ)′ Σ⁻¹ (x − µ) )

where µ = (µ1, µ2)′ and Σ = [σ1² σ12; σ21 σ2²] with σ12 = σ21. What is the distribution of X2|X1?
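A minimal Monte Carlo sketch (assuming NumPy; the particular µ and Σ are hypothetical) for exploring the question empirically: sample from the joint distribution and look at the X2 values in a thin slice where X1 is close to a fixed x1:

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])                       # hypothetical mean vector and covariance
x = rng.multivariate_normal(mu, Sigma, size=2_000_000)

x1_0 = 2.0                                           # condition on X1 being near this value
slab = x[np.abs(x[:, 0] - x1_0) < 0.01, 1]           # X2 values whose X1 is within 0.01 of x1_0

print(len(slab))                                     # number of samples in the slice
print(slab.mean(), slab.var())                       # empirical conditional mean and variance of X2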


Linear Combination of Random Variables

T = Σ_{i=1}^n ai Xi = a′X

Example (Sample Mean)

Let X1, . . . , Xn be iid random variables with common mean µ and variance σ². The sample mean is defined by X̄ = n⁻¹ Σ_{i=1}^n Xi. What are its mean and variance?

Example (Sample Variance)

Now we define the sample variance as follows

SX² = (n − 1)⁻¹ Σ_{i=1}^n (Xi − X̄)² = (n/(n − 1)) ( n⁻¹ Σ_{i=1}^n Xi² − X̄² )

What are its mean and variance?
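A minimal simulation sketch (assuming NumPy) one can use to conjecture the answers: repeat the experiment many times and inspect the mean and variance of X̄ and S² across repetitions:

import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))   # reps independent samples of size n

xbar = x.mean(axis=1)                       # sample mean of each sample
s2 = x.var(axis=1, ddof=1)                  # sample variance, (n-1)^{-1} sum (X_i - Xbar)^2

print(xbar.mean(), xbar.var())              # compare with mu and sigma^2 / n
print(s2.mean())                            # compare with sigma^2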


Common Distributions

List of Common Discrete Distributions

Bernoulli(p) (3.1.1)
  0 < p < 1
  p(x) = p^x (1 − p)^{1−x}, x = 0, 1
  µ = p, σ² = p(1 − p)
  m(t) = (1 − p) + p e^t, −∞ < t < ∞

Binomial(n, p) (3.1.2)
  0 < p < 1, n = 1, 2, . . .
  p(x) = C(n, x) p^x (1 − p)^{n−x}, x = 0, 1, 2, . . . , n
  µ = np, σ² = np(1 − p)
  m(t) = [(1 − p) + p e^t]^n, −∞ < t < ∞

Geometric(p) (3.1.5)
  0 < p < 1
  p(x) = p(1 − p)^x, x = 0, 1, 2, . . .
  µ = (1 − p)/p, σ² = (1 − p)/p²
  m(t) = p[1 − (1 − p)e^t]⁻¹, t < −log(1 − p)

Hypergeometric(N, D, n) (3.1.7)
  n = 1, 2, . . . , min{N, D}
  p(x) = C(D, x) C(N − D, n − x) / C(N, n), x = 0, 1, 2, . . . , n
  µ = nD/N, σ² = n (D/N) ((N − D)/N) ((N − n)/(N − 1))
  The above pmf is the probability of obtaining x D's in a sample of size n drawn without replacement.

Negative Binomial(r, p) (3.1.4)
  0 < p < 1, r = 1, 2, . . .
  p(x) = C(x + r − 1, r − 1) p^r (1 − p)^x, x = 0, 1, 2, . . .
  µ = r(1 − p)/p, σ² = r(1 − p)/p²
  m(t) = p^r [1 − (1 − p)e^t]^{−r}, t < −log(1 − p)

Poisson(m) (3.2.1)
  m > 0
  p(x) = e^{−m} m^x / x!, x = 0, 1, 2, . . .
  µ = m, σ² = m
  m(t) = exp{m(e^t − 1)}, −∞ < t < ∞


List of Common Continuous Distributions

Beta(α, β) (3.3.5)
  α > 0, β > 0
  f(x) = [Γ(α + β)/(Γ(α)Γ(β))] x^{α−1} (1 − x)^{β−1}, 0 < x < 1
  µ = α/(α + β), σ² = αβ / [(α + β + 1)(α + β)²]
  m(t) = 1 + Σ_{k=1}^∞ ( Π_{j=0}^{k−1} (α + j)/(α + β + j) ) t^k/k!, −∞ < t < ∞

Cauchy (1.9.1)
  f(x) = (1/π) · 1/(x² + 1), −∞ < x < ∞
  Neither the mean nor the variance exists. The mgf does not exist.

Chi-squared, χ²(r) (3.3.3)
  r > 0
  f(x) = [1/(Γ(r/2) 2^{r/2})] x^{r/2 − 1} e^{−x/2}, x > 0
  µ = r, σ² = 2r
  m(t) = (1 − 2t)^{−r/2}, t < 1/2
  χ²(r) ⇔ Γ(r/2, 2); r is called the degrees of freedom.

Exponential(λ) (3.3.2)
  λ > 0
  f(x) = λ e^{−λx}, x > 0
  µ = 1/λ, σ² = 1/λ²
  m(t) = [1 − (t/λ)]⁻¹, t < λ
  Exponential(λ) ⇔ Γ(1, 1/λ)

F, F(r1, r2) (3.6.6)
  r1 > 0, r2 > 0
  f(x) = [Γ((r1 + r2)/2) (r1/r2)^{r1/2} / (Γ(r1/2)Γ(r2/2))] · x^{r1/2 − 1} / (1 + r1x/r2)^{(r1+r2)/2}, x > 0
  If r2 > 2, µ = r2/(r2 − 2). If r2 > 4, σ² = 2 (r2/(r2 − 2))² (r1 + r2 − 2) / (r1(r2 − 4)).
  The mgf does not exist. r1 is called the numerator degrees of freedom; r2 is called the denominator degrees of freedom.

Gamma, Γ(α, β) (3.3.1)
  α > 0, β > 0
  f(x) = [1/(Γ(α)β^α)] x^{α−1} e^{−x/β}, x > 0
  µ = αβ, σ² = αβ²
  m(t) = (1 − βt)^{−α}, t < 1/β


Continuous Distributions, Continued

Laplace(θ) (2.2.1)
  −∞ < θ < ∞
  f(x) = (1/2) e^{−|x−θ|}, −∞ < x < ∞
  µ = θ, σ² = 2
  m(t) = e^{tθ}/(1 − t²), −1 < t < 1

Logistic(θ) (6.1.8)
  −∞ < θ < ∞
  f(x) = exp{−(x − θ)} / (1 + exp{−(x − θ)})², −∞ < x < ∞
  µ = θ, σ² = π²/3
  m(t) = e^{tθ} Γ(1 − t)Γ(1 + t), −1 < t < 1

Normal, N(µ, σ²) (3.4.6)
  −∞ < µ < ∞, σ > 0
  f(x) = (1/(√(2π)σ)) exp{−(1/2)((x − µ)/σ)²}, −∞ < x < ∞
  mean µ, variance σ²
  m(t) = exp{µt + (1/2)σ²t²}, −∞ < t < ∞

t, t(r) (3.6.1)
  r > 0
  f(x) = [Γ((r + 1)/2) / (√(πr) Γ(r/2))] · 1/(1 + x²/r)^{(r+1)/2}, −∞ < x < ∞
  If r > 1, µ = 0. If r > 2, σ² = r/(r − 2).
  The mgf does not exist. The parameter r is called the degrees of freedom.

Uniform(a, b) (1.7.4)
  −∞ < a < b < ∞
  f(x) = 1/(b − a), a < x < b
  µ = (a + b)/2, σ² = (b − a)²/12
  m(t) = (e^{bt} − e^{at}) / ((b − a)t), −∞ < t < ∞
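A minimal sketch (assuming SciPy) cross-checking a few of the tabulated means and variances against scipy.stats:

from scipy import stats

# Gamma(alpha, beta): table says mu = alpha*beta, sigma^2 = alpha*beta^2
print(stats.gamma(a=3.0, scale=2.0).stats(moments='mv'))     # (6.0, 12.0)

# Binomial(n, p): table says mu = n*p, sigma^2 = n*p*(1-p)
print(stats.binom(n=10, p=0.3).stats(moments='mv'))          # (3.0, 2.1)

# Logistic with location theta and unit scale: mu = theta, sigma^2 = pi^2/3
print(stats.logistic(loc=1.5).stats(moments='mv'))           # (1.5, ~3.29)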