ENGG2450 Probability and Statistics for Engineers (tlzhao/teaching/engg2450a/lecture...)

1 Introduction
2 Organization and description of data
3 Probability
4 Probability distributions
5 Probability densities
6 Sampling distributions
7 Inferences concerning a mean
8 Comparing two treatments
9 Inferences concerning variances
A Random processes

6 Sampling distributions

6.2 The sampling distribution of the mean (σ known)
6.3 The sampling distribution of the mean (σ unknown)
6.4 The sampling distribution of the variance

(revision: 2.1 Populations and samples)

Random Samples (finite population)

A set of observations X1, X2, …, Xn constitutes a random sample of size n from a finite population of size N if its values are chosen so that each subset of n of the N elements of the population has the same probability of being selected.

e.g. N = 100, n = 4:

X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, …, X99, X100

Upper-case letters denote the random variables before they are observed.




e.g. N = 100, n = 4:

x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, …, x99, x100

We may also apply the term random sample to x1, x2, …, xn, the set of observed values of the random variables X1, X2, …, Xn.

Random Samples (infinite population)

A set of observations X1, X2, …, Xn constitutes a random sample of size n from the infinite population f(x) if
1. each Xi is a random variable whose distribution is given by f(x), and
2. these n random variables are independent.

X1, X2, X3, X4, X5, X6, X7, X8, X9, X10, X11, X12, X13, X14, X15, X16, …, X99, X100, …, X1001, X1002, X1003, X1004, …

Upper-case letters denote the random variables before they are observed.

We may also apply the term random sample to the set of observed values x1, x2, …, xn of the random variables.

e.g. x1, x2, x3, x4, x5, x6, x7, x8, x9, x10, x11, x12, x13, x14, x15, x16, x17, x18, x19, x20, x21, x22, x23, x24.

e.g. (a) How many different samples of size n = 2 can be chosen from a finite population of size N = 7?
(b) Repeat (a) with N = 24.
(c) What is the probability of each sample in part (a) if the samples are to be random?
(d) Repeat (c) with N = 24.

(Recall: a set of observations X1, X2, …, Xn constitutes a random sample of size n from a finite population of size N if its values are chosen so that each subset of n of the N elements of the population has the same probability of being selected.)

Sln. (a) The number of possible samples = $C_{7,2} = \frac{7 \times 6}{2} = 21$.
(b) The number of possible samples = $C_{24,2} = \frac{24 \times 23}{2} = 276$.
(c) The probability of each sample in part (a) is 1/21.
(d) The probability of each sample in part (b) is 1/276.
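The counts above are binomial coefficients, and with equally likely subsets each sample has probability one over the count. A minimal Python sketch (the function name `sample_count_and_probability` is hypothetical, chosen only for illustration) reproduces parts (a) to (d) with `math.comb` and exact fractions:

```python
from fractions import Fraction
from math import comb

def sample_count_and_probability(N: int, n: int) -> tuple[int, Fraction]:
    """Number of distinct samples of size n from a finite population of size N,
    and the probability of each one when every subset is equally likely."""
    count = comb(N, n)  # C(N, n) = N! / (n! (N - n)!)
    return count, Fraction(1, count)

for N in (7, 24):
    count, prob = sample_count_and_probability(N, 2)
    print(f"N = {N}: {count} samples, each with probability {prob}")
```

Using `Fraction` keeps the probabilities exact (1/21, 1/276) instead of rounded decimals.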

6.2 The sampling distribution of the mean (σ known)

A set of observations X1, X2, …, Xn constitutes a random sample of size n from the infinite population f(x) if each Xi is a random variable whose distribution is given by f(x) and these n random variables are independent.

A random sample of n (say 10) observations is taken from some population. The mean of the sample is computed to estimate the mean of the population.

x1, x2, x3, …, x15, x16, …, x21, …, x99, x100, x101, …, x105, …

Suppose 50 random samples of size n = 10 are taken from a population having the discrete uniform distribution f(x) = 0.1 for x = 0, 1, 2, …, 9 and f(x) = 0 for other values of x. Sampling is with replacement, so we are sampling from an infinite population.


Proceeding in this way, we get 50 samples whose means are

4.4 3.2 5.0 3.5 4.1 4.4 3.6 6.5 5.3 4.4
3.1 5.3 3.8 4.3 3.3 5.0 4.9 4.8 3.1 5.3
3.0 3.0 4.6 5.8 4.6 4.0 3.7 5.2 3.7 3.8
5.3 5.5 4.8 6.4 4.9 6.5 3.5 4.5 4.9 5.3
3.6 2.7 4.0 5.0 2.6 4.2 4.4 5.6 4.7 4.3

means          Frequency
[2.0, 3.0)     2
[3.0, 4.0)     14
[4.0, 5.0)     19
[5.0, 6.0)     12
[6.0, 7.0)     3
Total          50
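As a check on the frequency table, the following Python sketch bins the 50 listed means into the half-open classes [2.0, 3.0), …, [6.0, 7.0). The helper `frequency_table` is a hypothetical name introduced here; the data are exactly the 50 means listed above.

```python
# The 50 sample means (each from n = 10 draws of the discrete uniform on 0..9).
means = [
    4.4, 3.2, 5.0, 3.5, 4.1, 4.4, 3.6, 6.5, 5.3, 4.4,
    3.1, 5.3, 3.8, 4.3, 3.3, 5.0, 4.9, 4.8, 3.1, 5.3,
    3.0, 3.0, 4.6, 5.8, 4.6, 4.0, 3.7, 5.2, 3.7, 3.8,
    5.3, 5.5, 4.8, 6.4, 4.9, 6.5, 3.5, 4.5, 4.9, 5.3,
    3.6, 2.7, 4.0, 5.0, 2.6, 4.2, 4.4, 5.6, 4.7, 4.3,
]

def frequency_table(values, lower=2, upper=7):
    """Count values in the unit classes [k, k + 1) for k = lower, ..., upper - 1."""
    table = {}
    for k in range(lower, upper):
        table[(k, k + 1)] = sum(1 for v in values if k <= v < k + 1)
    return table

for (a, b), freq in frequency_table(means).items():
    print(f"[{a:.1f}, {b:.1f})  {freq}")
print("Total", len(means))
```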

The population has the discrete uniform distribution, but the means of the 50 random samples have a bell-shaped distribution. Why?

To answer this kind of question, we need to investigate the theoretical sampling distribution of the sample mean
$$\bar X = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

Formulas for $\mu_{\bar X}$ and $\sigma^2_{\bar X}$

Theorem 1: If a random sample of size n is taken from a population having the mean μ and the variance σ², then
(a) $\bar X$ is a random variable whose distribution has the mean μ;
(b) for samples from infinite populations, the variance of this distribution is
$$\sigma^2_{\bar X} = \frac{\sigma^2}{n};$$
(c) for samples from finite populations, the variance of this distribution is
$$\sigma^2_{\bar X} = \frac{\sigma^2}{n} \cdot \frac{N - n}{N - 1},$$
where $\frac{N - n}{N - 1}$ is the finite population correction factor.
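Part (c) can be checked by brute force on a small finite population. The sketch below assumes a hypothetical population {1, 3, 4, 7, 10} (not from the notes), enumerates every subset of size n = 2 (each equally likely, as the definition of a random sample from a finite population requires), and compares the exact variance of the sample means with $\frac{\sigma^2}{n}\cdot\frac{N-n}{N-1}$, where σ² is the population variance with divisor N.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical small finite population; any list of numbers would do.
population = [1, 3, 4, 7, 10]
N, n = len(population), 2

def mean(xs):
    """Exact arithmetic mean of a list of ints or Fractions."""
    return Fraction(sum(xs), len(xs))

mu = mean(population)
sigma2 = sum((x - mu) ** 2 for x in population) / N  # population variance (divisor N)

# Every subset of size n is equally likely, so enumerate them all.
sample_means = [mean(s) for s in combinations(population, n)]
mean_of_means = mean(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

print("mean of X-bar:", mean_of_means, " mu:", mu)
print("var of X-bar:", var_of_means,
      " sigma^2/n * (N-n)/(N-1):", sigma2 / n * Fraction(N - n, N - 1))
```

Without the correction factor, σ²/n = 5 would overstate the true variance of the sample means (15/4) for this population.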

note: $\bar x$ is an outcome of the random variable $\bar X = \frac{X_1 + \cdots + X_n}{n}$, which represents the sample mean.

Theorem 1(a): If a random sample of size n is taken from a population having the mean μ and the variance σ², then $\bar X$ is a random variable which has the mean μ.

note: the random variables X1, …, Xn have joint pdf f(x1, …, xn); x1, …, xn are dummy variables representing outcomes of X1, X2, …, Xn.

Pf: The mean of the sample mean is
$$\mu_{\bar X} = \int \cdots \int \frac{1}{n}\sum_{i=1}^{n} x_i \, f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n = \frac{1}{n}\int \cdots \int (x_1 + x_2 + \cdots + x_n)\, f(x_1) f(x_2) \cdots f(x_n)\, dx_1\, dx_2 \cdots dx_n,$$
since independence gives $f(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$. Splitting the sum into n integrals,
$$\mu_{\bar X} = \frac{1}{n}\left[\int x_1 f(x_1)\,dx_1 \int f(x_2)\,dx_2 \cdots \int f(x_n)\,dx_n + \cdots + \int f(x_1)\,dx_1 \cdots \int f(x_{n-1})\,dx_{n-1} \int x_n f(x_n)\,dx_n\right].$$
Each factor $\int f(x_j)\,dx_j$ equals 1 and each $\int x_i f(x_i)\,dx_i$ equals μ, so
$$\mu_{\bar X} = \frac{1}{n}(\mu + \mu + \cdots + \mu) = \frac{n\mu}{n} = \mu,$$
the population mean.
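The same argument can be carried out numerically for a discrete population, with sums in place of integrals. This sketch assumes a hypothetical three-point pmf (deliberately non-uniform, not from the notes) and n = 3; it sums $\bar x$ times the joint probability $f(x_1) f(x_2) f(x_3)$ over every possible outcome, using exact fractions.

```python
from fractions import Fraction
from itertools import product

# Hypothetical non-uniform discrete population f(x); probabilities sum to 1.
f = {0: Fraction(1, 5), 1: Fraction(1, 2), 5: Fraction(3, 10)}
n = 3

mu = sum(x * p for x, p in f.items())  # population mean

# Independence: joint pmf f(x1, ..., xn) = f(x1) f(x2) ... f(xn).
mean_of_xbar = Fraction(0)
for xs in product(f, repeat=n):        # every n-tuple of outcomes
    joint = Fraction(1)
    for x in xs:
        joint *= f[x]
    mean_of_xbar += Fraction(sum(xs), n) * joint

print(mu, mean_of_xbar)
```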

Theorem 1(b): If a random sample of size n is taken from an infinite population having the mean μ and the variance σ², then $\bar X$ is a random variable with the variance
$$\sigma^2_{\bar X} = \frac{\sigma^2}{n}.$$

Pf: Without loss of generality, we assume μ = 0, so that $\mu_{\bar X} = 0$ by Theorem 1(a) and
$$\sigma^2_{\bar X} = E[\bar X^2] = \int \cdots \int \bar x^2 \, f(x_1, x_2, \ldots, x_n)\, dx_1\, dx_2 \cdots dx_n,$$
where
$$\bar x^2 = \frac{1}{n^2}(x_1 + \cdots + x_n)(x_1 + \cdots + x_n) = \frac{1}{n^2}\sum_{i=1}^{n} x_i^2 + \frac{1}{n^2}\sum_{i \ne j} x_i x_j.$$
Hence, using $f(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n)$,
$$\sigma^2_{\bar X} = \frac{1}{n^2}\sum_{i=1}^{n}\int \cdots \int x_i^2 f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n + \frac{1}{n^2}\sum_{i \ne j}\int \cdots \int x_i x_j f(x_1) \cdots f(x_n)\, dx_1 \cdots dx_n$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\int x_i^2 f(x_i)\, dx_i + \frac{1}{n^2}\sum_{i \ne j}\int x_i f(x_i)\, dx_i \int x_j f(x_j)\, dx_j.$$
Since μ = 0, $\int x^2 f(x)\,dx = E[(X - \mu)^2] = \sigma^2$ (the variance) and $\int x f(x)\,dx = \mu = 0$, so
$$\sigma^2_{\bar X} = \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 + 0 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}.$$
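For the discrete uniform population of the worked example (f(x) = 0.1 for x = 0, …, 9), Theorem 1(b) can be verified exactly by enumeration. The sketch uses n = 4, an assumption made only to keep the enumeration at 10⁴ tuples, and sums $(\bar x - \mu)^2$ times the joint probability; it should reproduce σ²/n = 8.25/4.

```python
from fractions import Fraction
from itertools import product

# Discrete uniform population on 0, 1, ..., 9 (the example in these notes).
f = {x: Fraction(1, 10) for x in range(10)}
n = 4  # small n keeps the exact enumeration to 10**4 tuples

mu = sum(x * p for x, p in f.items())                  # 9/2
sigma2 = sum((x - mu) ** 2 * p for x, p in f.items())  # 33/4 = 8.25

# Var(X-bar) = E[(X-bar - mu)^2], summed over all n-tuples with
# joint probability f(x1) ... f(xn) (independence).
var_xbar = Fraction(0)
for xs in product(f, repeat=n):
    joint = Fraction(1)
    for x in xs:
        joint *= f[x]
    var_xbar += (Fraction(sum(xs), n) - mu) ** 2 * joint

print(mu, sigma2, var_xbar, sigma2 / n)
```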


By Theorem 1, for samples from an infinite population, $\bar X = \frac{X_1 + \cdots + X_n}{n}$ has mean μ and variance σ²/n, i.e. standard deviation $\sigma/\sqrt{n}$.

Chebyshev's Theorem applied to $\bar X$: for any k > 0,
$$P\left(|\bar X - \mu| \ge k\frac{\sigma}{\sqrt{n}}\right) \le \frac{1}{k^2}, \quad \text{equivalently} \quad P\left(|\bar X - \mu| < k\frac{\sigma}{\sqrt{n}}\right) \ge 1 - \frac{1}{k^2}.$$
Writing $\varepsilon = k\sigma/\sqrt{n}$, so that $1/k^2 = \sigma^2/(n\varepsilon^2)$, this becomes
$$P(|\bar X - \mu| < \varepsilon) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}.$$
For any given ε > 0, the probability $P(|\bar X - \mu| < \varepsilon)$ can be made arbitrarily close to 1 by choosing n sufficiently large.
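The Chebyshev bound also tells us how large n must be. The sketch below is a minimal illustration, assuming σ² = 8.25 (the discrete uniform population on 0, …, 9) together with a hypothetical tolerance ε = 0.5 and target probability 0.95; the function names are illustrative only. It evaluates $1 - \sigma^2/(n\varepsilon^2)$ and solves $1 - \sigma^2/(n\varepsilon^2) \ge 0.95$ for the smallest integer n, using exact rational arithmetic so that rounding cannot shift the ceiling.

```python
import math
from fractions import Fraction

def chebyshev_lower_bound(sigma2, n, eps):
    """Lower bound on P(|X-bar - mu| < eps): 1 - sigma^2 / (n eps^2),
    clipped at 0 because a probability cannot be negative."""
    return max(Fraction(0), 1 - Fraction(sigma2) / (n * Fraction(eps) ** 2))

def smallest_n(sigma2, eps, prob):
    """Smallest n for which the Chebyshev lower bound is at least `prob`."""
    return math.ceil(Fraction(sigma2) / (Fraction(eps) ** 2 * (1 - Fraction(prob))))

sigma2 = Fraction(33, 4)  # 8.25
eps = Fraction(1, 2)      # hypothetical tolerance
for n in (10, 100, 1000):
    print(n, float(chebyshev_lower_bound(sigma2, n, eps)))
print(smallest_n(sigma2, eps, Fraction(95, 100)))
```

The bound holds for any population with variance σ², so it is conservative: the actual probability for a particular f(x) is usually considerably higher at the same n.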


Law of large numbers

Theorem 2: Let X1, X2, …, Xn be independent random variables each having the same mean μ and variance σ². Then
$$P(|\bar X - \mu| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$
As the sample size n increases unboundedly, the probability that the sample mean $\bar X$ differs from the population mean μ by more than an arbitrary amount ε > 0 converges to zero.

This follows from Chebyshev's Theorem: $P(|\bar X - \mu| < \varepsilon) \ge 1 - \frac{\sigma^2}{n\varepsilon^2}$, so $P(|\bar X - \mu| > \varepsilon) \le \frac{\sigma^2}{n\varepsilon^2} \to 0$. For any given ε > 0, $P(|\bar X - \mu| < \varepsilon)$ can be made arbitrarily close to 1 by choosing n sufficiently large.
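A small simulation illustrates Theorem 2. The sketch assumes the discrete uniform population on 0, …, 9 (μ = 4.5), a hypothetical ε = 0.5, and a fixed seed for reproducibility; it estimates $P(|\bar X - \mu| > \varepsilon)$ as the fraction of simulated samples whose mean falls more than ε from μ, for increasing n.

```python
import random

def deviation_frequency(n, eps=0.5, reps=500, seed=1):
    """Estimate P(|X-bar - mu| > eps) by simulation: draw `reps` samples of
    size n from the discrete uniform population on 0..9 (mu = 4.5) and
    return the fraction whose sample mean is more than eps away from mu."""
    rng = random.Random(seed)
    mu = 4.5
    far = 0
    for _ in range(reps):
        xbar = sum(rng.randint(0, 9) for _ in range(n)) / n
        if abs(xbar - mu) > eps:
            far += 1
    return far / reps

for n in (10, 100, 1000):
    print(n, deviation_frequency(n))
```

The estimated probability drops sharply as n grows, as the theorem predicts; the exact values depend on the seed.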


X1, X2, …, Xn are random variables, and $\bar X = (X_1 + \cdots + X_n)/n$, called the sample mean, is also a random variable.

e.g. Consider an experiment where a specified event A has probability p of occurring. Suppose that, when the experiment is repeated n times, outcomes from different trials are independent. Show that

relative frequency of A = (number of times A occurs in n trials) / n

becomes arbitrarily close to p, with arbitrarily high probability, as the number of times the experiment is repeated grows unboundedly.

Sln. We can define n random variables X1, X2, …, Xn where Xi = 1 if A occurs on the i th trial and Xi = 0 otherwise. Then X1 + X2 + … + Xn is the number of times that event A occurs in n trials of the experiment, and $\bar X = (X_1 + X_2 + \cdots + X_n)/n$, the sample mean, is the relative frequency of A.

$$E[X_i] = \sum_{\text{all } x_k} x_k f(x_k) = 1 \cdot p + 0 \cdot (1 - p) = p,$$
$$E[X_i^2] = 1^2 \cdot p + 0^2 \cdot (1 - p) = p.$$
The Xi are independent and identically distributed with mean μ = p and variance
$$\sigma^2 = E[X_i^2] - \mu^2 = p - p^2 = p(1 - p).$$

By Theorem 2 (law of large numbers), as the sample size n increases unboundedly, the probability that the sample mean $\bar X$ (= relative frequency of A in n trials) differs from the population mean μ (= p) by more than an arbitrary amount ε converges to zero, i.e.
$$P(|\bar X - p| > \varepsilon) \to 0 \quad \text{as } n \to \infty.$$
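For this Bernoulli case the probability in Theorem 2 can be computed exactly, because $n\bar X$ (the number of occurrences of A in n trials) has the binomial distribution with parameters n and p. The sketch assumes illustrative values p = 0.3 and ε = 0.05 (not from the notes); the function name `prob_far_from_p` is hypothetical.

```python
from fractions import Fraction
from math import comb

def prob_far_from_p(n, p, eps):
    """P(|X-bar - p| > eps) where n * X-bar ~ Binomial(n, p).
    p and eps are Fractions so the event |k/n - p| > eps is decided exactly;
    the binomial probabilities themselves are summed in floating point."""
    pf = float(p)
    return sum(
        comb(n, k) * pf**k * (1 - pf) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) > eps
    )

p, eps = Fraction(3, 10), Fraction(1, 20)  # illustrative values
for n in (10, 100, 1000):
    print(n, round(prob_far_from_p(n, p, eps), 6))
```

The probability decreases toward 0 as n grows, which is exactly the statement that the relative frequency of A converges to p in probability.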


Theorem 1(b): If a random sample of size n is taken from an infinite population having the mean μ and the variance σ², then the sample mean $\bar X$ is a random variable with the variance
$$\sigma^2_{\bar X} = \frac{\sigma^2}{n}.$$

The reliability of the sample mean as an estimate of the population mean is often measured by the standard deviation of the mean,
$$\sigma_{\bar X} = \frac{\sigma}{\sqrt{n}},$$
which is also called the standard error of the mean.

e.g. Suppose 50 random samples of size n = 10 are taken from a population having the discrete uniform distribution f(x) = 0.1 for x = 0, 1, 2, …, 9 and f(x) = 0 for other values of x. The 50 sample means are

4.4 3.2 5.0 3.5 4.1 4.4 3.6 6.5 5.3 4.4
3.1 5.3 3.8 4.3 3.3 5.0 4.9 4.8 3.1 5.3
3.0 3.0 4.6 5.8 4.6 4.0 3.7 5.2 3.7 3.8
5.3 5.5 4.8 6.4 4.9 6.5 3.5 4.5 4.9 5.3
3.6 2.7 4.0 5.0 2.6 4.2 4.4 5.6 4.7 4.3

Writing $x_i$ for the i th of these sample means,
$$\bar x = \frac{1}{50}\sum_{i=1}^{50} x_i = 4.428, \qquad s^2 = \frac{1}{49}\sum_{i=1}^{50}(x_i - \bar x)^2 = 0.9298.$$

For the population,
$$\mu = \sum_{x=0}^{9} x \cdot \frac{1}{10} = 4.5, \qquad \sigma^2 = \sum_{x=0}^{9}(x - \mu)^2 \cdot \frac{1}{10} = 8.25.$$

By Theorem 1 (infinite population), the mean and variance of the sample mean $\bar X$ are respectively
$$\mu_{\bar X} = \mu = 4.5, \qquad \sigma^2_{\bar X} = \frac{\sigma^2}{n} = \frac{8.25}{10} = 0.825.$$
These theoretical values are close to those computed from the 50 samples (4.428 and 0.9298).
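The numbers in this example can be reproduced with Python's standard `statistics` module. `statistics.variance` uses the divisor n − 1 (here 49), matching s² above; the last line also prints the standard error σ/√n for n = 10.

```python
import math
import statistics

# The 50 sample means from the example.
means = [
    4.4, 3.2, 5.0, 3.5, 4.1, 4.4, 3.6, 6.5, 5.3, 4.4,
    3.1, 5.3, 3.8, 4.3, 3.3, 5.0, 4.9, 4.8, 3.1, 5.3,
    3.0, 3.0, 4.6, 5.8, 4.6, 4.0, 3.7, 5.2, 3.7, 3.8,
    5.3, 5.5, 4.8, 6.4, 4.9, 6.5, 3.5, 4.5, 4.9, 5.3,
    3.6, 2.7, 4.0, 5.0, 2.6, 4.2, 4.4, 5.6, 4.7, 4.3,
]

xbar = statistics.fmean(means)   # mean of the 50 sample means
s2 = statistics.variance(means)  # sample variance, divisor 50 - 1 = 49

# Population: discrete uniform on 0..9; each sample has size n = 10.
values = range(10)
mu = sum(values) / 10
sigma2 = sum((x - mu) ** 2 for x in values) / 10
n = 10

print(f"x-bar = {xbar:.3f}   vs  mu_Xbar = {mu}")
print(f"s^2   = {s2:.4f}  vs  sigma^2/n = {sigma2 / n}")
print(f"standard error sigma/sqrt(n) = {math.sqrt(sigma2 / n):.4f}")
```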

Central Limit Theorem

Theorem 3: If $\bar X$ is the mean of a random sample of size n taken from a population having the mean μ and the variance σ², then
$$Z = \frac{\bar X - \mu}{\sigma/\sqrt{n}}$$
is a random variable whose distribution function approaches that of the standard normal distribution as n → ∞.
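A Monte Carlo sketch of Theorem 3, assuming the discrete uniform population on 0, …, 9 (μ = 4.5, σ² = 8.25), n = 10 and a fixed seed: it estimates P(Z ≤ z) from simulated samples and compares it with the standard normal distribution function Φ(z), computed from `math.erf`. The function names are illustrative.

```python
import math
import random

def simulated_cdf_of_z(z, n=10, reps=10000, seed=2):
    """Estimate P(Z <= z), Z = (X-bar - mu) / (sigma / sqrt(n)), for samples of
    size n from the discrete uniform population on 0..9 (mu = 4.5, sigma^2 = 8.25)."""
    rng = random.Random(seed)
    mu, sigma = 4.5, math.sqrt(8.25)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.randint(0, 9) for _ in range(n)) / n
        if (xbar - mu) / (sigma / math.sqrt(n)) <= z:
            hits += 1
    return hits / reps

def standard_normal_cdf(z):
    """Phi(z) expressed through the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for z in (-1.0, 0.0, 1.0):
    print(z, simulated_cdf_of_z(z), round(standard_normal_cdf(z), 4))
```

At n = 10 the agreement is already close; the small remaining differences come partly from simulation noise and partly from the fact that $\bar X$ can only take values that are multiples of 0.1 for this population.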

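X1, X2, …, Xn are independent random variables with p.d.f.s $p_{X_1}, p_{X_2}, \ldots, p_{X_n}$ respectively. For $Y = X_1 + X_2 + \cdots + X_n$, the p.d.f. of Y is
$$p_Y(y) = (p_{X_1} * p_{X_2} * \cdots * p_{X_n})(y),$$
where * denotes convolution.

Central Limit Theorem

Theorem: If n is very large, then for all $p_{X_i}$ (under mild regularity conditions) the p.d.f. of Y approaches a normal density:
$$p_Y(y) \approx \frac{1}{\sqrt{2\pi}\,\sigma_Y}\, e^{-(y - \mu_Y)^2 / (2\sigma_Y^2)},$$
where $\mu_Y = \mu_1 + \mu_2 + \cdots + \mu_n$ and $\sigma_Y^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2$.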
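The convolution form can be seen directly for discrete variables, where the convolution integral becomes a sum. The sketch assumes n = 10 independent copies of the discrete uniform pmf on 0, …, 9, builds $p_Y$ by repeated discrete convolution, and compares it with the normal density having $\mu_Y = 10 \times 4.5$ and $\sigma_Y^2 = 10 \times 8.25$. The helper names are hypothetical.

```python
import math

def convolve(p, q):
    """Discrete convolution of two pmfs stored as lists indexed by value 0, 1, 2, ..."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def pmf_of_sum(p, n):
    """pmf of Y = X1 + ... + Xn for i.i.d. Xi with pmf p, i.e. p * p * ... * p."""
    result = [1.0]  # pmf of the constant 0, the identity for convolution
    for _ in range(n):
        result = convolve(result, p)
    return result

p = [0.1] * 10                  # discrete uniform on 0..9: mean 4.5, variance 8.25
n = 10
py = pmf_of_sum(p, n)           # values y = 0, 1, ..., 90
mu_y, var_y = n * 4.5, n * 8.25  # means and variances add for independent Xi

def normal_density(y):
    return math.exp(-(y - mu_y) ** 2 / (2 * var_y)) / math.sqrt(2 * math.pi * var_y)

for y in (30, 40, 45, 50, 60):
    print(y, round(py[y], 5), round(normal_density(y), 5))
```

Because Y here takes integer values one unit apart, $p_Y(y)$ is directly comparable with the normal density at y.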
