Sampling distributions chapter 7 ST210 Nutan S. Mishra Department of Mathematics and Statistics University of South Alabama

Sampling distributions, chapter 7, ST210

Nutan S. Mishra

Department of Mathematics and Statistics

University of South Alabama

Useful links

• http://oak.cats.ohiou.edu/~wallacd1/ssample.html

• http://garnet.acns.fsu.edu/~jnosari/05.PDF

• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/

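Sampling distribution

In chapter 2 we defined a population parameter as a function of all the population values. If the population consists of N observations, then the population mean and the population standard deviation are parameters:

$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}, \qquad \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

The population standard deviation σ is the square root of σ².

For a given population, the parameters are fixed values.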

Sampling distribution

On the other hand, if we draw a sample of size n from a population of size N, then a function of the sample values is called a statistic.

For example, the sample mean and the sample standard deviation are sample statistics:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

Since we can draw a large number of different samples from the population, the value of a sample statistic varies from sample to sample.

Sampling distribution

Since the value of a sample statistic varies from sample to sample, the statistic itself is a random variable and has a probability distribution.

For example, the sample mean x̄ is a random variable and has a probability distribution.

Example: start with a toy example.

Let the population consist of 5 students who took a math quiz worth 5 points.

The names of the students and their scores are as follows:

Name of the student   A   B   C   D   E
Score                 2   3   4   4   5

For this population, the mean is µ = 3.6 and the standard deviation is σ = 1.02.

Sampling distribution

Now we repeatedly draw samples of size three from the population of size 5. There are (5 choose 3) = 10 possible samples, listed below.

The population parameters are µ = 3.6 and σ = 1.02.

Sample   Students   Sample values   x̄      s
1        A,B,C      2,3,4           3      1
2        A,B,D      2,3,4           3      1
3        A,B,E      2,3,5           3.33   1.53
4        A,C,D      2,4,4           3.33   1.15
5        A,C,E      2,4,5           3.67   1.53
6        A,D,E      2,4,5           3.67   1.53
7        B,C,D      3,4,4           3.67   .58
8        B,C,E      3,4,5           4      1
9        B,D,E      3,4,5           4      1
10       C,D,E      4,4,5           4.33   .58
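The enumeration of all possible samples can be reproduced with a short Python sketch. It is only an illustration using the standard library; the variable names (`scores`, `rows`) are not part of the slides.

```python
from itertools import combinations
from statistics import mean, stdev

# Population from the toy example: student name -> quiz score
scores = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}

# Enumerate every sample of size 3 drawn without replacement
rows = []
for names in combinations(scores, 3):
    values = [scores[name] for name in names]
    # stdev() divides by n - 1, matching the sample formula for s
    rows.append((names, values, mean(values), stdev(values)))

for number, (names, values, x_bar, s) in enumerate(rows, start=1):
    print(number, ",".join(names), values, round(x_bar, 2), round(s, 2))
```

Each printed row corresponds to one line of the table of samples, in the same order.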

Sampling distribution

X = score of a student in the math quiz

Population distribution:

x      f    P(x)
2      1    .2
3      1    .2
4      2    .4
5      1    .2

Sampling distribution of the sample mean:

x̄      f    P(x̄)
3      2    .2
3.33   2    .2
3.67   3    .3
4      2    .2
4.33   1    .1

Thus we see that the sample mean is a new random variable and has a probability distribution.

Question: What is the mean of this random variable, and what is its variance?

Exercise 7.8

Here are some guidelines for solving it:

1. X = teaching experience of a faculty member

2. Write the two columns x and p(x).

3. The total number of samples of size 4 from a population of size 5 is (5 choose 4) = 5.

4. List all 5 samples and compute their sample means.

5. Compute the quantities in parts b and c.

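Sampling distribution

Let N be the size of the population and n be the size of the sample.

If n/N > .05, the mean and standard deviation of the sample mean are

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$$

And if n/N ≤ .05, the mean and standard deviation of the sample mean are

$$\mu_{\bar{x}} = \mu \quad \text{and} \quad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$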
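As a rough illustration of the two cases, the sketch below applies the finite population correction only when n/N > .05. The function name `sd_of_sample_mean` is hypothetical, and the inputs are the toy population of five quiz scores with samples of size 3.

```python
from math import sqrt
from statistics import pstdev

def sd_of_sample_mean(sigma, n, N=None):
    """Standard deviation of the sample mean.

    The finite population correction is applied only when N is given
    and n/N > 0.05; otherwise the result is sigma / sqrt(n).
    """
    sd = sigma / sqrt(n)
    if N is not None and n / N > 0.05:
        sd *= sqrt((N - n) / (N - 1))
    return sd

# Toy population of quiz scores; pstdev() divides by N (population formula)
population = [2, 3, 4, 4, 5]
sigma = pstdev(population)

corrected = sd_of_sample_mean(sigma, n=3, N=len(population))  # n/N = 0.6
uncorrected = sd_of_sample_mean(sigma, n=3)                   # no N given

print(round(corrected, 3), round(uncorrected, 3))
```

Because n/N = 0.6 here, the corrected value is the appropriate one for the toy example; the uncorrected value is shown only for comparison.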

Sampling distribution of sample mean

Theorem

Let X be a random variable with population mean µ and population standard deviation σ. If we draw samples of size n (with n/N ≤ .05), then the new random variable, the sample mean x̄, has mean µ and standard deviation σ/√n.

We denote them as follows:

$$\text{mean of } \bar{x}: \ \mu_{\bar{x}} = \mu, \qquad \text{standard deviation of } \bar{x}: \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

Sampling distribution of sample mean

$$\text{mean of } \bar{x}: \ \mu_{\bar{x}} = \mu, \qquad \text{standard deviation of } \bar{x}: \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

The standard deviation of the sample mean decreases as the sample size increases.

The mean of the sample mean is unaffected by a change in sample size.

The sample mean is called an estimator of the population mean, because whenever the population mean is unknown we use the sample mean in its place.

Exercise 7.13

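X has a large population with µ = 60 and σ = 10.

Assuming n/N ≤ .05, the parameters of the sample mean are as follows.

When n = 18:

$$\text{mean of } \bar{x}: \ \mu_{\bar{x}} = \mu = 60, \qquad \text{standard deviation of } \bar{x}: \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{18}} = 2.36$$

When n = 90:

$$\text{mean of } \bar{x}: \ \mu_{\bar{x}} = \mu = 60, \qquad \text{standard deviation of } \bar{x}: \ \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{90}} = 1.05$$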
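The arithmetic in this exercise can be checked with a few lines of Python. This is only a sketch; the dictionary name `sd_by_n` is an illustrative choice.

```python
from math import sqrt

mu, sigma = 60, 10  # population parameters given in the exercise

# n/N <= .05 is assumed, so no finite population correction is applied
sd_by_n = {n: sigma / sqrt(n) for n in (18, 90)}

for n, sd_x_bar in sd_by_n.items():
    print(f"n = {n}: mean of x-bar = {mu}, sd of x-bar = {sd_x_bar:.2f}")
```

The larger sample gives the smaller standard deviation, while the mean of x̄ stays at 60 in both cases.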

Sampling distribution of sample mean

x̄      P(x̄)
3      .2
3.33   .2
3.67   .3
4      .2
4.33   .1

When we compute the mean and variance of x̄ from the above table, they are (complete this with the help of the chapter 5 slides).
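One way to check a hand computation is the sketch below, which applies the usual formulas for the mean and variance of a discrete random variable. It assumes that the rounded table values 3.33, 3.67 and 4.33 stand for the exact fractions 10/3, 11/3 and 13/3; the variable names are illustrative.

```python
from fractions import Fraction
from math import sqrt

# Sampling distribution of the sample mean: value of x-bar -> probability.
# Exact fractions are used in place of the rounded table entries.
distribution = {
    Fraction(3): Fraction(2, 10),
    Fraction(10, 3): Fraction(2, 10),
    Fraction(11, 3): Fraction(3, 10),
    Fraction(4): Fraction(2, 10),
    Fraction(13, 3): Fraction(1, 10),
}

# Mean: sum of x * P(x); variance: sum of (x - mean)^2 * P(x)
mean_x_bar = sum(x * p for x, p in distribution.items())
var_x_bar = sum((x - mean_x_bar) ** 2 * p for x, p in distribution.items())

print(float(mean_x_bar))
print(float(var_x_bar))
print(sqrt(var_x_bar))
```

The mean printed here can be compared with the population mean µ of the toy example.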

Sampling distribution of sample mean

We have seen that the distribution of the sample mean x̄ is derived from the distribution of x.

Thus the distribution of x is called the parent distribution.

The next question is: what is the relationship between the parent distribution and the sampling distribution of x̄?

Sampling distribution of sample mean

Saying that the distribution of x is normal with mean µ and standard deviation σ is equivalent to saying that the parent population is normal with mean µ and standard deviation σ.

If we draw a sample of size n from such a population, then:

• The mean of x̄ is equal to the mean of the population, µ.

• The standard deviation of x̄ is equal to σ/√n.

• The shape of the distribution of x̄ is normal, whatever the value of n.

Sampling distribution of sample mean

If X ~ N(µ, σ), then

x̄ ~ N(µ, σ/√n),

where n is the size of the sample drawn from the population.

Central Limit Theorem

For a large sample size, the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution.

What sample size is considered to be large?

As a rule of thumb, a sample of size n ≥ 30 is considered to be large.

Useful link:

http://www.austin.cc.tx.us/mparker/1342/cltdemos.htm
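A small simulation can make the Central Limit Theorem concrete. The sketch below is an illustration only: it assumes a hypothetical right-skewed population (exponential with mean 1 and standard deviation 1), a sample size of n = 30, and 5000 repeated samples. None of these choices come from the slides.

```python
import random
from math import sqrt
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

def draw_sample_mean(n):
    """Mean of one sample of size n from an exponential(1) population."""
    return mean([random.expovariate(1.0) for _ in range(n)])

n = 30
sample_means = [draw_sample_mean(n) for _ in range(5000)]

center = mean(sample_means)
spread = pstdev(sample_means)

# The theory predicts a mean near 1 and a standard deviation near 1/sqrt(30)
print(round(center, 3), round(spread, 3), round(1 / sqrt(n), 3))

# Crude shape check: a normal distribution has about 68% of its values
# within one standard deviation of the mean
within_one_sd = sum(abs(m - center) <= spread for m in sample_means) / len(sample_means)
print(round(within_one_sd, 3))
```

Even though the individual observations are strongly skewed, the simulated sample means cluster around the population mean with a spread close to σ/√n.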

Exercise 7.28

Given that the population distribution is skewed to the left; that is, X is not normally distributed.

a. When n = 400 (i.e. when we repeatedly draw samples of size 400 from the population and compute the sample mean for each such sample), what is the distribution of x̄?

Answer: Since the sample size is large, by the Central Limit Theorem the distribution of x̄ is approximately normal, that is,

x̄ ~ N(µ, σ/√400)

Sampling distribution of sample mean

If the random sample comes from a normal population, the sampling distribution of the sample mean is normal regardless of the size of the sample.

If the shape of the parent population is not known or not normal, then the distribution of the sample mean is approximately normal whenever n is large (≥ 30). (This is the central limit theorem.)

If the shape of the parent population is not known or not normal and the sample size is small, then we cannot readily say anything about the shape of the sampling distribution.

Estimators

• The sample mean x̄ is an estimator of the population mean µ.

• By this we mean that whenever the value of µ is not available, we use x̄ in its place.

• The sample mean x̄ is an unbiased estimator of the population mean µ.

• Unbiased means that, in the long run, the average value of x̄ over repeated samples equals the true value of µ. In other words, the expected value of x̄ is equal to µ.

Sampling error

• Recall that for a given population the value of µ is fixed, and x̄ is a variable whose value varies from sample to sample.

• When we use x̄ in place of µ, some error is inevitable.

• The difference between x̄ and µ is called the sampling error:

Sampling error = x̄ - µ

• The sampling error occurs purely due to chance: the chance of a specific sample being selected.

• Other types of errors may occur in the estimation, for example an error in recording a value, or a missing value. Such errors are called non-sampling errors.

Example of sampling error

• Now we repeatedly draw samples of size three from the population of size 5. There are 10 possible samples, listed below.

• The population parameters are µ = 3.6 and σ = 1.02.

Sample   Students   Sample values   x̄      Sampling error = x̄ - µ
1        A,B,C      2,3,4           3      -.6
2        A,B,D      2,3,4           3      -.6
3        A,B,E      2,3,5           3.33   -.27
4        A,C,D      2,4,4           3.33   -.27
5        A,C,E      2,4,5           3.67   .07
6        A,D,E      2,4,5           3.67   .07
7        B,C,D      3,4,4           3.67   .07
8        B,C,E      3,4,5           4      .4
9        B,D,E      3,4,5           4      .4
10       C,D,E      4,4,5           4.33   .73

Example of sampling error

The last column in the table above gives the error in estimation. That is, if while drawing a sample of size 3 from the given population we get, say, sample number 3, and use the corresponding value of x̄ to estimate the population mean µ, then the error in estimation is -.27 units.

Exercise 7.4

The population consists of six numbers: 15, 13, 8, 17, 9, 12.

a. Population mean = 12.33

b. Liza selected a sample with n = 4 and values 13, 8, 9, 12. Sample mean = 10.5, so sampling error = 10.5 - 12.33 = -1.83.

c. While calculating the sample mean, Liza mistakenly entered a 6 in place of 9 in the above sample. That is, she entered 13, 8, 6, 12, so a non-sampling error has occurred, and the sample mean becomes 9.75.

Total error = sampling error + non-sampling error.

Total error = 9.75 - 12.33 = -2.58, of which -1.83 is the sampling error. Thus non-sampling error = -2.58 - (-1.83) = -.75
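The decomposition of the total error can be verified with the sketch below. It uses only the numbers given in the exercise; the variable names are illustrative.

```python
from statistics import mean

population = [15, 13, 8, 17, 9, 12]
correct_sample = [13, 8, 9, 12]
entered_sample = [13, 8, 6, 12]  # the 9 was mistakenly entered as 6

mu = mean(population)

# Sampling error: correct sample mean minus population mean
sampling_error = mean(correct_sample) - mu

# Total error: the mean actually computed minus the population mean
total_error = mean(entered_sample) - mu

# What remains after removing the sampling error is the non-sampling error
non_sampling_error = total_error - sampling_error

print(round(mu, 2), round(sampling_error, 2),
      round(total_error, 2), round(non_sampling_error, 2))
```

Note that the non-sampling error equals the difference between the two sample means, 9.75 - 10.5, and does not depend on µ.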

Exercise 7.49

• X = GPA of a student enrolled at a large university

• X ~ N(3.02, .29) (this X represents the characteristic of the whole population of students)

• That is, the average GPA of all the students in the population is 3.02 and the standard deviation is .29.

• We draw a sample of size n = 20 from this population and compute the sample mean x̄.

• To find: P(x̄ > 3.10) (as asked in part a)

• To compute such a probability we must know the distribution of x̄.

• Since the sample is small but the parent population is normal, x̄ ~ N(3.02, .29/√20).

• At this point we convert the probability statement into a probability statement in z, using the transformation formula z = (x̄ - µ)/(σ/√n):

$$P(\bar{x} > 3.10) = P\left(z > \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}\right) = P\left(z > \frac{3.10 - 3.02}{.29 / \sqrt{20}}\right) \approx P(z > 1.23)$$
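The probability in part a can be evaluated numerically as a check on the z-table lookup. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.02, 0.29, 20

# Sampling distribution of the sample mean: N(mu, sigma / sqrt(n))
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

# z-value for x-bar = 3.10, and the upper-tail probability P(x-bar > 3.10)
z = (3.10 - mu) / (sigma / sqrt(n))
p_above = 1 - x_bar_dist.cdf(3.10)

print(round(z, 2), round(p_above, 4))
```

A z-table lookup uses z rounded to two decimals, so the table answer may differ slightly from the value printed here.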

Exercise 7.52

X = time spent studying per week by a college student

X ~ right skewed (8.4, 2.7)

That is, over the population of all college students, the average time spent studying is 8.4 hrs/week with a standard deviation of 2.7 hrs, and the distribution is right skewed (i.e. not normal).

If we draw a sample of n = 45 students from this population and compute the sample mean x̄, then we are asked to find P(8 < x̄ < 9).

To find such a probability we must know the distribution of x̄.

Though the parent distribution is right skewed, the sample size is large, so we apply the CLT to conclude that

x̄ ~ N(8.4, 2.7/√45)

$$P(8 < \bar{x} < 9) = P\left(\frac{8 - 8.4}{2.7 / \sqrt{45}} < z < \frac{9 - 8.4}{2.7 / \sqrt{45}}\right) \approx P(-0.99 < z < 1.49)$$
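As a numerical check, the two z-values and the resulting probability can be computed with the standard library. This is a sketch under the CLT approximation used in the exercise; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.4, 2.7, 45
se = sigma / sqrt(n)  # standard deviation of the sample mean

# Convert both bounds on x-bar to z-values
z_low = (8 - mu) / se
z_high = (9 - mu) / se

# P(z_low < z < z_high) under the standard normal distribution
standard_normal = NormalDist()
probability = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(z_low, 2), round(z_high, 2), round(probability, 4))
```

Because the parent distribution is skewed, this probability is an approximation that relies on n = 45 being large enough for the CLT to apply.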

Population and sample proportions

Consider a categorical variable with just two categories.

Let the population size be N, out of which X fall in category I.

Then the population proportion of category I = X/N (denoted by p).

Thus population proportion p = X/N

If we draw a sample of size n from this population and observe that x out of n fall in category I, then the sample proportion of category I = x/n (denoted by p̂).

Thus sample proportion p̂ = x/n

Population and sample proportions

A population consists of 9000 families in a small town. Out of these, 3600 families have their houses insured.

Then population proportion of house insured families = p = 3600/9000 = .4

Suppose we draw a sample of size 100 from the above population and observe that 42 families out of 100 have house insurance. Then the sample proportion of house-insured families is

p̂ = 42/100 = .42

Sampling error = p̂ - p = .42 - .40 = .02

Sampling distribution of p̂

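When we draw multiple samples from the population, we get different values for p̂. Thus p̂ is a random variable and has a sampling distribution.

$$\text{mean of } \hat{p}: \ \mu_{\hat{p}} = p \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}, \quad \text{where } q = 1 - p$$

Central Limit Theorem for sample proportion:

If the sample size n is considerably large, then

$$\hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$$

Here n is considered to be large if np > 5 and nq > 5.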

Exercise 7.60

N = 1000, X = 640

Then population proportion p = 640/1000 = .64

n = 40, x = 24

Then sample proportion p̂ = 24/40 = .60

Exercise 7.70

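Given population proportion p = .63 and n/N ≤ .05, to find the mean and standard deviation of p̂ when a sample of size 100 is drawn.

n = 100

Then

$$\mu_{\hat{p}} = p = .63 \quad \text{and} \quad \sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.63 \times .37}{100}} = .0483$$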
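The computation for the sample proportion can be sketched in Python as follows. The function name `p_hat_parameters` is hypothetical, n/N ≤ .05 is assumed as in the exercise, and the large-sample check uses the rule np > 5 and nq > 5.

```python
from math import sqrt

def p_hat_parameters(p, n):
    """Mean and standard deviation of the sample proportion (assumes n/N <= .05)."""
    q = 1 - p
    return p, sqrt(p * q / n)

p, n = 0.63, 100
mean_p_hat, sd_p_hat = p_hat_parameters(p, n)

# Large-sample condition for the normal approximation: np > 5 and nq > 5
is_large = n * p > 5 and n * (1 - p) > 5

print(mean_p_hat, round(sd_p_hat, 4), is_large)
```

Since both np and nq exceed 5 here, the normal approximation for p̂ is considered reasonable for this sample size.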