Probability, STAT 416, Spring 2007
2 Discrete Distributions
1. Introduction
2. Mean and Variance
3. Binomial Distribution
4. Poisson Distribution
5. Other Discrete Distributions
2.1 Introduction
Example: Fair die. Possible observations: 1, 2, 3, 4, 5, 6. Each observation has probability p = 1/6:
P(1) = 1/6, P(2) = 1/6, . . .
We observe realizations of a random variable
Random variable: a map from a (suitable) probability space into the real numbers, X : Ω → R
Examples:
Ω = {1, 2, 3, 4, 5, 6}, P(i) = 1/6, i = 1, . . . , 6
X(i) = i
Example continued
Two fair dice, sum of the observations X = X1 + X2
X1 and X2 are both random variables as before (independent)
Ω = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}, P(2) = P(12) = 1/36
P (3) = P (11) = 2/36
P (4) = P (10) = 3/36
P (5) = P (9) = 4/36
P (6) = P (8) = 5/36
P (7) = 6/36
X : Ω → R, X(i) = i
Discrete random variable
Sample space Ω with a finite or countable number of elements, i.e. index set N: Ω = {x1, x2, x3, . . .}
It is always possible to identify the sample space Ω with the set of all possible observations of the random variable
The random variable X then has the form X : Ω → R, X(xi) = xi
It is fully described by its probability function:
P : Ω → [0, 1], P(xi) = pi
The probabilities of the elementary events fully describe the distribution of a discrete random variable
Cumulative distribution function (cdf)
F : R → [0, 1], F(x) = P(X ≤ x)
Example: Fair die
[Figure: CDF F(x) = P(X ≤ x) of a fair die for x ∈ [−2, 8]: a step function rising from 0 to 1 in jumps of 1/6]
Uniform distribution
n possible events with equal probability
Ω = {1, . . . , n}, P(i) = 1/n
Cumulative distribution function:
F(x) = 0 for x < 1,
F(x) = i/n for i ≤ x < i + 1, i = 1, . . . , n − 1,
F(x) = 1 for x ≥ n
At each x ∈ Ω the CDF has a jump of size 1/n
⇒ connection between CDF and probability function:
P(i) = F(i) − F(i − 1) for i ∈ Ω
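The jump relation above can be checked numerically; a minimal sketch using exact fractions (the helper name `uniform_cdf` is mine, not from the slides):

```python
from fractions import Fraction

def uniform_cdf(x, n):
    """CDF of the uniform distribution on {1, ..., n}."""
    if x < 1:
        return Fraction(0)
    if x >= n:
        return Fraction(1)
    return Fraction(int(x), n)  # equals i/n for i <= x < i + 1

n = 6  # fair die
# recover the probability function from the jumps of the CDF: P(i) = F(i) - F(i-1)
pmf = {i: uniform_cdf(i, n) - uniform_cdf(i - 1, n) for i in range(1, n + 1)}
print(pmf)  # every jump equals 1/6
```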
Properties of the CDF
Specifically for discrete random variables:
The CDF is a monotonically increasing step function with jumps at the events of positive probability
In general, every CDF satisfies:
• P(x) = F(x) − F(x−), where F(x−) = lim_{h→x, h<x} F(h), due to the definition F(x) = P(X ≤ x)
• P(a < X ≤ b) = F(b) − F(a)
• lim_{a→−∞} F(a) = 0, lim_{b→∞} F(b) = 1
• F(x) is monotonically increasing
Exercise
The CDF of a random variable X is given by
F(x) = 0 for x < 1,
F(x) = 1 − 2^(−k) for k ≤ x < k + 1, k = 1, 2, . . .
1. Draw the CDF on the range x ∈ [0, 5]
2. Determine the probability function of X
3. Compute the probability that X > 5
2.2 Mean and Variance
Essential properties of a distribution
Important for practical purposes
⇒ reduction of the information in the data
The mean is a measure of central tendency, also called the expected value; it corresponds to the arithmetic mean of a sample
The variance is a measure of dispersion; it corresponds to the average squared deviation from the mean of a sample
Both figures are based on the moments of the distribution; they are of major importance specifically for the normal distribution
Mean
Discrete random variable X with probability space (Ω, P)
Definition of the mean:
E(X) = Σ_{x∈Ω} x P(x)
Weighted sum of the values of Ω; the weights are the corresponding probabilities of the events
Usual notation: µ = E(X)
Example, fair die:
E(X) = 1 · 1/6 + 2 · 1/6 + · · · + 6 · 1/6 = (1 + 2 + 3 + 4 + 5 + 6)/6 = 21/6 = 3.5
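The weighted sum can be evaluated exactly; a small sketch for the fair die:

```python
from fractions import Fraction

omega = range(1, 7)                      # fair die
p = {x: Fraction(1, 6) for x in omega}   # P(x) = 1/6 for every x

mean = sum(x * p[x] for x in omega)      # E(X) = sum of x * P(x)
print(mean)  # 7/2, i.e. 3.5
```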
Transformation of random variables
Discrete random variable X with probability space (Ω, P)
Specifically, for all x ∈ Ω: P(x) = px
Additionally given f : Ω → R with image set f(Ω)
Definition: f(X) is the random variable Y : f(Ω) → R with
Y(y) = y and P(y) = Σ_{x∈Ω: f(x)=y} px
I.e. the values of the events x ∈ Ω are transformed into f(x); the probabilities are added over all x with the same image f(x)
Examples for transformation
1) Fair die, f(x) = x^2, Y = X^2:
Y(y) = y with y ∈ Ω_Y := {1, 4, 9, 16, 25, 36}
P(1) = P(4) = P(9) = P(16) = P(25) = P(36) = 1/6
2) Fair die, g(x) = (x − 3.5)^2, Z = (X − 3.5)^2:
Z(z) = z with z ∈ Ω_Z := {2.5^2, 1.5^2, 0.5^2} = {6.25, 2.25, 0.25}
P(6.25) = p1 + p6 = 1/3, P(2.25) = p2 + p5 = 1/3, P(0.25) = p3 + p4 = 1/3
Exercise: Ω = {−1, 0, 1}, P(X = −1) = P(X = 1) = 1/4, P(X = 0) = 1/2
Compute the distributions of Y = X^2 and Z = X^3
Expectation of functions
Example, fair die, continued:
1) E(f(X)) = E(Y) = 1 · 1/6 + 4 · 1/6 + · · · + 36 · 1/6 = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6 = 15.1667
2) E(g(X)) = E(Z) = 6.25/3 + 2.25/3 + 0.25/3 = 2.9167
In general, the expectation of f(X) is computed as
E(f(X)) = Σ_{x∈Ω} f(x) P(x)
Weighted sum of the values of f(Ω)
Note: Σ_{x∈Ω} f(x) P(x) = Σ_{y∈f(Ω)} y P_Y(y), grouping the terms with equal image f(x) = y
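Both worked values can be reproduced with the general formula E(f(X)) = Σ f(x)P(x); a sketch (the helper name `expect` is mine):

```python
from fractions import Fraction

omega = range(1, 7)                      # fair die
p = {x: Fraction(1, 6) for x in omega}

def expect(f):
    """E(f(X)) = sum of f(x) * P(x) over x in Omega."""
    return sum(f(x) * p[x] for x in omega)

e_y = expect(lambda x: x * x)                      # E(X^2)
e_z = expect(lambda x: (x - Fraction(7, 2)) ** 2)  # E((X - 3.5)^2)
print(e_y)  # 91/6
print(e_z)  # 35/12, i.e. 2.9167 rounded
```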
Linear Transformation
For arbitrary a, b ∈ R:
E(aX + b) = a E(X) + b
Proof:
E(aX + b) = Σ_{x∈Ω} (ax + b) P(x) = a Σ_{x∈Ω} x P(x) + b Σ_{x∈Ω} P(x) = a E(X) + b
Specifically: E(X − µ) = E(X − E(X)) = 0
Variance
Definition:
Var(X) := E((X − µ)^2)
Usual notation: σ^2 = Var(X)
σ . . . standard deviation: SD(X) = √Var(X)
It holds that Var(X) = E(X^2) − µ^2:
E((X − µ)^2) = Σ_{x∈Ω} (x − µ)^2 P(x) = Σ_{x∈Ω} (x^2 − 2µx + µ^2) P(x)
= Σ_{x∈Ω} x^2 P(x) − 2µ Σ_{x∈Ω} x P(x) + µ^2 Σ_{x∈Ω} P(x)
= E(X^2) − 2µ^2 + µ^2 = E(X^2) − µ^2
Example for variance
Three random variables X1, X2, X3
X1 = 0 with probability 1
X2 uniformly distributed on {−1, 0, 1}
X3 uniformly distributed on {−50, −25, 0, 25, 50}
All three random variables have mean 0
Var(X1) = 0^2 · P(0) = 0
Var(X2) = (−1)^2 · 1/3 + 1^2 · 1/3 = 2/3
Var(X3) = (−50)^2 · 1/5 + (−25)^2 · 1/5 + 25^2 · 1/5 + 50^2 · 1/5 = 1250
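The three variances follow directly from the definition Var(X) = Σ (x − µ)^2 P(x); a sketch (the helper name `variance` is mine):

```python
from fractions import Fraction

def variance(values):
    """Variance of a uniform distribution on the given values."""
    p = Fraction(1, len(values))
    mu = sum(v * p for v in values)              # mean
    return sum((v - mu) ** 2 * p for v in values)

print(variance([0]))                    # 0
print(variance([-1, 0, 1]))             # 2/3
print(variance([-50, -25, 0, 25, 50]))  # 1250
```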
Variance gives additional information on the distribution
Properties of variance
For arbitrary a, b ∈ R:
Var(aX + b) = a^2 Var(X)
Proof:
Var(aX + b) = E((aX + b − aµ − b)^2) = a^2 E((X − µ)^2) = a^2 Var(X)
Specifically: Var(−X) = Var(X) and Var(X + b) = Var(X)
At times E(X^2) − µ^2 is easier to compute than E((X − µ)^2)
Exercise: Compute the variance of a fair die with both formulas
Moments of a distribution
k-th moment of a random variable: m_k := E(X^k)
k-th central moment: z_k := E((X − µ)^k)
m_1 . . . mean
z_2 = m_2 − m_1^2 . . . variance
The third and fourth moments are also of practical importance
Skewness: ν(X) := z_3/σ^3 = E(X∗^3), where X∗ := (X − µ)/σ
• ν(X) = 0 . . . symmetric distribution
• ν(X) < 0 . . . left skewed
• ν(X) > 0 . . . right skewed
Kurtosis: z_4/σ^4 = E(X∗^4) (related to the curvature → normal distribution)
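The standardized moments can be computed directly from a probability function; a sketch (the function name `moments` is mine) for a symmetric example, where the skewness comes out 0:

```python
def moments(dist):
    """Mean, variance, skewness, kurtosis of a discrete distribution {x: P(x)}."""
    mu = sum(x * p for x, p in dist.items())
    var = sum((x - mu) ** 2 * p for x, p in dist.items())
    sd = var ** 0.5
    skew = sum(((x - mu) / sd) ** 3 * p for x, p in dist.items())  # E(X*^3)
    kurt = sum(((x - mu) / sd) ** 4 * p for x, p in dist.items())  # E(X*^4)
    return mu, var, skew, kurt

# symmetric distribution: mean 0, variance 0.5, skewness 0, kurtosis 2
stats = moments({-1: 0.25, 0: 0.5, 1: 0.25})
print(stats)
```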
Exercise: Skewness
Random variable X has the following distribution:
P (1) = 0.05, P (2) = 0.1, P (3) = 0.3, P (4) = 0.5, P (5) = 0.05
Draw the probability function and the CDF
Compute the skewness!
Compute the skewness for the slightly changed distribution
P (1) = 0.05, P (2) = 0.3, P (3) = 0.3, P (4) = 0.3, P (5) = 0.05
2.3 Binomial distribution
Bernoulli trial: Two possible outcomes (0 or 1)
P (X = 1) = p, P (X = 0) = q where q = 1− p
E.g. fair coin: p = 1/2
Example: Throw an unfair coin twice, with P(head) = p = 0.7. Compute the probability distribution of Z, the number of heads!
Sample space Ω_Z = {0, 1, 2}; the two throws are independent!
P(Z = 0) = P(X1 = 0, X2 = 0) = P(X1 = 0) P(X2 = 0) = 0.3^2 = 0.09
P(Z = 1) = P(X1 = 0, X2 = 1) + P(X1 = 1, X2 = 0) = 2 · P(X1 = 0) P(X2 = 1) = 2 · 0.3 · 0.7 = 0.42
P(Z = 2) = P(X1 = 1, X2 = 1) = P(X1 = 1) P(X2 = 1) = 0.7^2 = 0.49
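The same distribution can be obtained by enumerating all outcomes of the two independent throws; a minimal sketch:

```python
from itertools import product

p = 0.7  # P(head)
dist = {}
for x1, x2 in product([0, 1], repeat=2):                 # all four outcomes
    prob = (p if x1 else 1 - p) * (p if x2 else 1 - p)   # independence
    dist[x1 + x2] = dist.get(x1 + x2, 0) + prob          # Z = number of heads
print(dist)  # close to {0: 0.09, 1: 0.42, 2: 0.49}
```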
Binomial distribution
n independent Bernoulli trials, P (X = 1) = p
Y . . . number of successes (trials with outcome 1); Y is binomially distributed:
P(Y = k) = (n choose k) p^k q^(n−k)
Proof: Independence ⇒ the probability of each single sequence with k successes (1) and n − k failures (0) is p^k (1 − p)^(n−k)
Number of such sequences: the number of k-combinations without replacement, i.e. (n choose k)
Notation: Y ∼ B(n, p)
Exercise: Throw five fair coins independently
Compute the distribution of the number of heads!
Example binomial distribution
Exam with a failure rate of 20%
Distribution of the number of successes among 10 students?
P(X = 7) = (10 choose 7) · 0.8^7 · 0.2^3 = 0.2013
[Figure: probability function of the number of successes, B(10, 0.8), k = 0, . . . , 10]
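The exam example can be reproduced with the binomial probability function; a sketch using Python's `math.comb`:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# exam example: n = 10 students, success probability 0.8
print(round(binom_pmf(7, 10, 0.8), 4))  # 0.2013
```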
Examples binomial distribution: n = 10
[Figure: probability functions of B(10, p) for p = 0.1, 0.2, 0.3 and 0.5, k = 0, . . . , 10]
Exercise: S.R. Example 6f
Communication system: n components, each functioning independently with probability p
The total system operates if at least one half of its components work
1. For which values of p is a 5-component system more likely to work than a 3-component system?
2. Generalize: for which values of p is a (2k + 1)-component system more likely to work than a (2k − 1)-component system?
Application: Drawing with replacement
• Population of N objects
• M of the N objects have some property E
• Draw n objects with replacement
The number X of drawn objects with property E is binomially distributed:
X ∼ B(n, M/N)
Exercise: Bowl with 3 black and 9 white balls; draw 5 balls with replacement, X . . . number of drawn balls that are black
• Probability function of X?
• Expected value of X?
Mean of binomial distribution
X ∼ B(n, p) ⇒ E(X) = np
Using k (n choose k) = n (n−1 choose k−1) we obtain
E(X) = Σ_{k=1}^{n} k (n choose k) p^k q^(n−k) = np Σ_{k=1}^{n} (n−1 choose k−1) p^(k−1) q^(n−k)
= np Σ_{i=0}^{n−1} (n−1 choose i) p^i q^(n−1−i)
and due to the binomial theorem
Σ_{i=0}^{n−1} (n−1 choose i) p^i q^(n−1−i) = (p + q)^(n−1) = 1
Alternative proof: differentiate (p + q)^n = 1 with respect to p
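The formula E(X) = np, and likewise Var(X) = npq, can be checked numerically against the defining sums; a minimal sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_mean(n, p):
    """E(X) from the definition: sum of k * P(X = k)."""
    return sum(k * binom_pmf(k, n, p) for k in range(n + 1))

def binom_var(n, p):
    """Var(X) = E(X^2) - E(X)^2, both from the definition."""
    ex2 = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1))
    return ex2 - binom_mean(n, p) ** 2

n, p = 10, 0.3
print(binom_mean(n, p))  # close to n*p = 3.0
print(binom_var(n, p))   # close to n*p*(1-p) = 2.1
```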
Variance of binomial distribution
X ∼ B(n, p) ⇒ Var (X) = npq
Again using k (n choose k) = n (n−1 choose k−1) we obtain
E(X^2) = Σ_{k=1}^{n} k^2 (n choose k) p^k q^(n−k) = np Σ_{k=1}^{n} k (n−1 choose k−1) p^(k−1) q^(n−k)
= np Σ_{i=0}^{n−1} (i + 1) (n−1 choose i) p^i q^(n−1−i) = np [(n − 1)p + 1]
and thus
Var(X) = E(X^2) − µ^2 = np [(n − 1)p + 1] − (np)^2 = np(1 − p)
Alternative proof: differentiate (p + q)^n = 1 twice with respect to p
2.4 Poisson distribution
Definition: Ω = N0 = {0, 1, 2, . . .}
P(X = k) = λ^k/k! · e^(−λ), λ > 0
Notation: X ∼ P(λ)
A Poisson-distributed random variable can in principle take arbitrarily large values, though with very small probability
Example: λ = 2
P(X ≤ 1) = (2^0/0!) e^(−2) + (2^1/1!) e^(−2) = (1 + 2) e^(−2) = 0.4060
P(X > 4) = 1 − P(X ≤ 4) = 1 − (1 + 2 + 4/2 + 8/6 + 16/24) e^(−2) = 1 − 0.9473 = 0.0527
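The two probabilities for λ = 2 can be verified directly from the probability function; a sketch:

```python
from math import exp, factorial

def pois_pmf(k, lam):
    """P(X = k) for X ~ P(lam)."""
    return lam**k / factorial(k) * exp(-lam)

lam = 2
p_le_1 = sum(pois_pmf(k, lam) for k in range(2))      # P(X <= 1)
p_gt_4 = 1 - sum(pois_pmf(k, lam) for k in range(5))  # P(X > 4)
print(round(p_le_1, 4), round(p_gt_4, 4))  # 0.406 0.0527
```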
Examples Poisson distribution
[Figure: probability functions of P(λ) for λ = 1, 1.5, 3 and 5, k = 0, . . . , 12]
Application
To model rare events
Examples
• Number of clients within a certain time frame
• Radioactive decay
• Number of errors per slide
• Number of people older than 100 years (per 1 000 000)
• number of false alarms per day
• etc.
Connection between Poisson-distributed events and the time between two events ⇒ exponential distribution
Assumptions
The numbers of events occurring in time are Poisson-distributed under the following assumptions:
• The probability that exactly 1 event occurs within a given time interval of length h is approximately λh
• The probability that 2 or more events occur within a given time interval of length h is very small compared to h
• For two non-overlapping time intervals, the number of events in one interval is independent of the number of events in the other interval
For each time interval [t1, t2] the number of occurring events is then Poisson-distributed with parameter λ(t2 − t1).
Example
Suppose that the number of earthquakes per week is Poisson-distributed with parameter λ = 2
1. What is the probability of at least 3 earthquakes during the next week?
2. What is the probability of at least 3 earthquakes during the next two weeks?
Solution: 1) P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − (1 + 2 + 4/2) e^(−2) = 0.3233
2) Now we have a time interval of 2 weeks, therefore we get a Poisson distribution with parameter 2λ = 4:
P(X ≥ 3) = 1 − P(X ≤ 2) = 1 − (1 + 4 + 16/2) e^(−4) = 0.7619
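The scaling of the parameter with the interval length can be reproduced directly; a small sketch:

```python
from math import exp, factorial

def pois_cdf(k, lam):
    """P(X <= k) for X ~ P(lam)."""
    return sum(lam**j / factorial(j) * exp(-lam) for j in range(k + 1))

rate = 2                                    # earthquakes per week
print(round(1 - pois_cdf(2, rate * 1), 4))  # one week:  0.3233
print(round(1 - pois_cdf(2, rate * 2), 4))  # two weeks: 0.7619
```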
Mean and variance
X ∼ P(λ) ⇒ E(X) = λ
Proof:
E(X) = Σ_{k=0}^{∞} k λ^k/k! e^(−λ) = e^(−λ) Σ_{k=1}^{∞} λ^k/(k − 1)! = λ e^(−λ) Σ_{j=0}^{∞} λ^j/j! = λ
X ∼ P(λ) ⇒ Var(X) = λ
Proof:
E(X^2) = Σ_{k=0}^{∞} k^2 λ^k/k! e^(−λ) = e^(−λ) Σ_{k=1}^{∞} k λ^k/(k − 1)! = λ e^(−λ) Σ_{j=0}^{∞} (j + 1) λ^j/j! = λ(λ + 1)
Var(X) = E(X^2) − E(X)^2 = λ(λ + 1) − λ^2 = λ
Exercise
Suppose that a book has on average one typo on every third page.
1. What is the probability that you find at least two errors on the page that you are reading right now?
2. What is the probability that you find at least two errors within 10 pages?
3. What is the probability that you find at least two errors on any of 10 pages?
Approximation of binomial distribution
X ∼ B(n, p), where n is large and p small (e.g. n > 10 and p < 0.05)
⇒ X ≈ P(np), i.e. X is approximately Poisson-distributed with parameter λ = np
Motivation: Let λ := np
P(X = k) = n!/(k! (n − k)!) p^k q^(n−k) = [n(n − 1) · · · (n − k + 1)/k!] · (λ^k/n^k) · (1 − λ/n)^n / (1 − λ/n)^k
For n large and moderate λ (i.e. p small) we have
n(n − 1) · · · (n − k + 1)/n^k ≈ 1, (1 − λ/n)^k ≈ 1, (1 − λ/n)^n ≈ e^(−λ)
and thus P(X = k) ≈ λ^k/k! e^(−λ)
Example Poisson approximation
Comparison of the Poisson approximation (λ = 0.5) with the exact CDF of the binomial distribution (n = 10, p = 0.05)
[Figure: the two CDFs for k = 0, . . . , 6; blue: B(10, 0.05), red: P(0.5)]
Binomial:
P(X ≤ 3) = 0.95^10 + 10 · 0.05 · 0.95^9 + 45 · 0.05^2 · 0.95^8 + 120 · 0.05^3 · 0.95^7 = 0.99897150206211
Poisson approximation:
P(X ≤ 3) ≈ (1 + 0.5 + 0.5^2/2 + 0.5^3/6) e^(−0.5) = 0.99824837744371
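The two CDF values can be recomputed to full floating-point precision; a sketch comparing the exact binomial with the Poisson approximation:

```python
from math import comb, exp, factorial

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def pois_cdf(k, lam):
    """P(X <= k) for X ~ P(lam)."""
    return sum(lam**j / factorial(j) * exp(-lam) for j in range(k + 1))

# n = 10, p = 0.05, so lam = n*p = 0.5
print(binom_cdf(3, 10, 0.05))  # ~ 0.99897150206211
print(pois_cdf(3, 0.5))        # ~ 0.99824837744371
```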
2.5 Other discrete distributions
We will discuss
• Geometric
• Hypergeometric
Apart from that
• Negative binomial (more general: Panjer)
• Generalized Poisson
• Zeta distribution
• etc.
Wikipedia very helpful
Geometric distribution
Independent Bernoulli trials with success probability p
X . . . number of trials until the first success
Therefore P(X = k) = q^(k−1) p
(k − 1 failures, each with probability q = 1 − p)
Exercise: Bowl with N white and M black balls
Drawing with replacement
a) Probability that it takes exactly k trials until one draws a black ball
b) Probability that it takes at most k trials until one draws a black ball
Geometric distribution
Compare the shape of this distribution later with the density of the exponential distribution
[Figure: probability function of a geometric distribution, k = 0, . . . , 10]
Memorylessness
Mean and variance
Note that Σ_{j=0}^{∞} q^j = 1/(1 − q), and thus Σ_{k=1}^{∞} q^(k−1) p = p/(1 − q) = p/p = 1
Differentiate: Σ_{k=1}^{∞} k q^(k−1) = d/dq Σ_{k=0}^{∞} q^k = 1/(1 − q)^2
E(X) = Σ_{k=1}^{∞} k q^(k−1) p = p/(1 − q)^2 = 1/p
Differentiate again: Σ_{k=1}^{∞} k(k − 1) q^(k−2) = d^2/dq^2 Σ_{k=0}^{∞} q^k = 2/(1 − q)^3
E(X^2) = Σ_{k=1}^{∞} k^2 q^(k−1) p = pq Σ_{k=1}^{∞} k(k − 1) q^(k−2) + p Σ_{k=1}^{∞} k q^(k−1) = 2pq/p^3 + 1/p
And thus: Var(X) = E(X^2) − E(X)^2 = 2q/p^2 + 1/p − 1/p^2 = (1 − p)/p^2
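E(X) = 1/p and Var(X) = (1 − p)/p^2 can be checked by truncating the infinite sums; a numerical sketch (the truncation length is my choice):

```python
def geom_mean_var(p, terms=10_000):
    """E(X) and Var(X) of the geometric distribution via truncated sums."""
    q = 1 - p
    ex = sum(k * q ** (k - 1) * p for k in range(1, terms))        # E(X)
    ex2 = sum(k * k * q ** (k - 1) * p for k in range(1, terms))   # E(X^2)
    return ex, ex2 - ex * ex

m, v = geom_mean_var(0.25)
print(m, v)  # close to 1/p = 4 and (1 - p)/p**2 = 12
```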
Hypergeometric distribution
Binomial distribution: Drawing with replacement
Exercise: Bowl with 3 black balls and 5 white balls; draw 4 balls with and without replacement respectively.
Compute for both cases the distribution of the number of drawn black balls!
[Figure: distributions of the number of drawn black balls, with replacement (left) and without replacement (right), k = 0, . . . , 4]
Hypergeometric distribution
N objects, of which M have some property E. Draw n objects without replacement; X . . . number of drawn objects with property E.
P(X = k) = (M choose k)(N−M choose n−k) / (N choose n)
We use the convention (a choose b) = 0 whenever b > a
Clearly P(X = k) = 0 if M < k:
I cannot draw more black balls than there are in the bowl
Also clearly P(X = k) = 0 if N − M < n − k:
I cannot draw more white balls than there are in the bowl
Thus: Ω = {k : max(0, n − N + M) ≤ k ≤ min(n, M)}
Mean and variance
Without proof (easy but slightly tedious computations):
E(X) = nM/N, Var(X) = n (M/N)(1 − M/N)(N − n)/(N − 1)
Define p := M/N and compare with the binomial distribution:
E(X) = np . . . the same formula as for the binomial
Var(X) = np(1 − p)(N − n)/(N − 1) . . . asymptotically like the binomial, because lim_{N→∞} (N − n)/(N − 1) = 1
If N and M are very large compared to n, then approximately X ∼ B(n, M/N) (without proof)
Example hypergeometric distribution
Quality control: Delivery of 30 boxes of eggs; 10 boxes contain at least one broken egg. Take a sample of size 6.
• Compute the probability that two boxes within the sample contain broken eggs
N = 30, M = 10, n = 6
P(X = 2) = (10 choose 2)(20 choose 4) / (30 choose 6) = 0.3672
• Mean and variance of the number of boxes within the sample that contain broken eggs?
E(X) = 6 · 10/30 = 2; Var(X) = 6 · (1/3) · (2/3) · (24/29) = 1.1034
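The egg-box numbers can be reproduced from the hypergeometric probability function; a sketch:

```python
from math import comb

def hyper_pmf(k, N, M, n):
    """P(X = k): k objects with property E among n drawn without replacement."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

N, M, n = 30, 10, 6  # egg-box example
print(round(hyper_pmf(2, N, M, n), 4))  # 0.3672
mean = sum(k * hyper_pmf(k, N, M, n) for k in range(n + 1))
print(mean)  # close to n*M/N = 2.0
```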
Exercise: Approximation by binomial distribution
Lottery with 1000 lots, 200 of which win. Assume you buy 5 lots
1. Compute the probability that at least one lot will win
Solution: 0.6731
2. Compute the same probability using the binomial approximation
Solution: 0.6723
Summary discrete distributions
• Uniform: Ω = {x1, . . . , xn}, P(X = xk) = 1/n
• Binomial: X ∼ B(n, p), P(X = k) = (n choose k) p^k q^(n−k)
We have E(X) = np, Var(X) = npq; Ω = {0, . . . , n}
• Poisson: X ∼ P(λ), P(X = k) = λ^k/k! e^(−λ)
We have E(X) = λ, Var(X) = λ; Ω = {0, 1, 2, . . .}
• Geometric: P(X = k) = p q^(k−1)
We have E(X) = 1/p, Var(X) = q/p^2; Ω = {1, 2, . . .}
• Hypergeometric: P(X = k) = (M choose k)(N−M choose n−k) / (N choose n)
We have E(X) = np, Var(X) = np(1 − p)(N − n)/(N − 1), where p = M/N