Sampling distributions, Chapter 7, ST210
Nutan S. Mishra
Department of Mathematics and Statistics
University of South Alabama
Useful links
• http://oak.cats.ohiou.edu/~wallacd1/ssample.html
• http://garnet.acns.fsu.edu/~jnosari/05.PDF
• http://www.ruf.rice.edu/~lane/stat_sim/sampling_dist/
Sampling distribution
In chapter 2 we defined a population parameter as a function of all the population values. If the population consists of N observations, then the population mean and population standard deviation are parameters:
$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
For a given population, the parameters are fixed values.
Sampling distribution
On the other hand, if we draw a sample of size n from a population of size N, then a function of the sample values is called a statistic.
For example, the sample mean and sample standard deviation are sample statistics:
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
Since we can draw a large number of different samples from the population, the value of a sample statistic varies from sample to sample.
Sampling distribution
Since the value of a sample statistic varies from sample to sample, the statistic itself is a random variable and has a probability distribution.
For example, the sample mean x̄ is a random variable and has a probability distribution.
Example: start with a toy example.
Let the population consist of 5 students who took a math quiz worth 5 points.
The names of the students and their scores are as follows:
Student  A  B  C  D  E
Score    2  3  4  4  5
For this population, the mean is µ = 3.6 and the standard deviation is σ = 1.02.
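The two population parameters of the toy example can be checked with a minimal Python sketch. The variable names are illustrative and not part of the original slides; `statistics.pstdev` divides by N, which matches the population formula.

```python
import statistics

scores = [2, 3, 4, 4, 5]  # quiz scores of students A to E

mu = statistics.fmean(scores)      # population mean
sigma = statistics.pstdev(scores)  # population standard deviation (divides by N)

print(round(mu, 2), round(sigma, 2))
```

Using `statistics.stdev` here instead would divide by N - 1 and give the sample standard deviation, which is not what the slide reports.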
Sampling distribution
Now we repeatedly draw samples of size three from the population of size 5. There are (5 choose 3) = 10 possible samples, listed below.
The population parameters are µ = 3.6 and s.d. σ = 1.02.
Sample  Members  Values  x̄     s
1       A,B,C    2,3,4   3     1
2       A,B,D    2,3,4   3     1
3       A,B,E    2,3,5   3.33  1.53
4       A,C,D    2,4,4   3.33  1.15
5       A,C,E    2,4,5   3.67  1.53
6       A,D,E    2,4,5   3.67  1.53
7       B,C,D    3,4,4   3.67  .58
8       B,C,E    3,4,5   4     1
9       B,D,E    3,4,5   4     1
10      C,D,E    4,4,5   4.33  .58
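The list of all possible samples can be generated rather than written by hand. The sketch below is one way to do it, assuming sampling without replacement where order does not matter; the names `population` and `rows` are illustrative.

```python
from itertools import combinations
import statistics

population = {"A": 2, "B": 3, "C": 4, "D": 4, "E": 5}

rows = []
for members in combinations(population, 3):  # every 3-student sample
    values = [population[name] for name in members]
    x_bar = statistics.fmean(values)  # sample mean
    s = statistics.stdev(values)      # sample standard deviation (divides by n - 1)
    rows.append((members, values, x_bar, s))

for number, (members, values, x_bar, s) in enumerate(rows, start=1):
    print(number, ",".join(members), values, round(x_bar, 2), round(s, 2))
```

`combinations` visits the students in the order they appear in the dictionary, so the samples come out in the same order as the table.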
Sampling distribution
X = score of a student in the math quiz
Population distribution
x     f  P(x)
2     1  .2
3     1  .2
4     2  .4
5     1  .2
Sampling distribution of sample mean
x̄     f  P(x̄)
3     2  .2
3.33  2  .2
3.67  3  .3
4     2  .2
4.33  1  .1
Thus we see that the sample mean is a new random variable and has a probability distribution.
Question: What is the mean of this random variable and what is its variance?
Exercise 7.8
Here are some guidelines for solving it:
1. X = teaching experience of a faculty member
2. Write the two columns x and p(x)
3. Total number of samples of size 4 from a population of size 5 is (5 choose 4) = 5
4. List all 5 samples and compute their sample means.
5. Compute the quantities in parts b and c.
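Sampling distribution
Let N be the size of the population and n be the size of the sample.
If n/N > .05:
mean of sample mean: $\mu_{\bar{x}} = \mu$
and standard deviation of sample mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N - n}{N - 1}}$
And if n/N ≤ .05:
mean of sample mean: $\mu_{\bar{x}} = \mu$
and standard deviation of sample mean: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$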
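In the toy quiz example n/N = 3/5 = .6, which is greater than .05, so the first formula applies. The sketch below compares that formula with the standard deviation computed directly from the 10 possible sample means; the variable names are illustrative.

```python
from itertools import combinations
from math import sqrt
import statistics

scores = [2, 3, 4, 4, 5]
N, n = len(scores), 3

sigma = statistics.pstdev(scores)  # population standard deviation
sample_means = [statistics.fmean(c) for c in combinations(scores, n)]

# Direct route: the 10 samples are equally likely, so treat their means as a population
mean_direct = statistics.fmean(sample_means)
sd_direct = statistics.pstdev(sample_means)

# Formula route for n/N > .05, with the finite population correction factor
sd_formula = sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Without the correction factor (appropriate only when n/N <= .05)
sd_uncorrected = sigma / sqrt(n)

print(round(mean_direct, 2), round(sd_direct, 3), round(sd_formula, 3), round(sd_uncorrected, 3))
```

The direct and formula values agree, while the uncorrected σ/√n overstates the spread for a sample this large relative to the population.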
Sampling distribution of sample mean
Theorem
Let X be a random variable with population mean µ and population standard deviation σ. If we draw samples of size n (with n/N ≤ .05), then the sample mean x̄ is a new random variable with mean µ and standard deviation σ/√n.
We can denote them as follows:
$\mu_{\bar{x}} = \mu$ (mean of x̄)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (standard deviation of x̄)
Sampling distribution of sample mean
$\mu_{\bar{x}} = \mu$ (mean of x̄)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ (standard deviation of x̄)
The standard deviation of the sample mean decreases as the sample size increases.
The mean of the sample mean is unaffected by a change in sample size.
The sample mean is called an estimator of the population mean, because whenever the population mean is unknown we use the sample mean in its place.
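Exercise 7.13
X has a large population with µ = 60 and σ = 10.
Assuming n/N ≤ .05, the parameters of the sample mean are:
When n = 18:
$\mu_{\bar{x}} = \mu = 60$ (mean of x̄)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{18}} = 2.36$ (standard deviation of x̄)
When n = 90:
$\mu_{\bar{x}} = \mu = 60$ (mean of x̄)
$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{90}} = 1.05$ (standard deviation of x̄)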
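The same arithmetic can be sketched in Python. The helper `sample_mean_parameters` is a hypothetical name introduced here for illustration, and it assumes n/N ≤ .05 so that no correction factor is needed.

```python
from math import sqrt

def sample_mean_parameters(mu, sigma, n):
    """Return (mean, standard deviation) of the sample mean, assuming n/N <= .05."""
    return mu, sigma / sqrt(n)

for n in (18, 90):
    mean, sd = sample_mean_parameters(60, 10, n)
    print(n, mean, round(sd, 2))
```

Increasing n from 18 to 90 leaves the mean at 60 and shrinks the standard deviation, as the preceding slide states.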
Sampling distribution of sample mean
x̄     P(x̄)
3     .2
3.33  .2
3.67  .3
4     .2
4.33  .1
Compute the mean and variance of x̄ from the above table (complete this with the help of the chapter 5 slides).
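One way to check a hand calculation is the sketch below, which applies the discrete random variable formulas mean = Σ x̄ P(x̄) and variance = Σ x̄² P(x̄) - mean². It assumes the exact fractions behind the rounded table entries (3.33 = 10/3, 3.67 = 11/3, 4.33 = 13/3); the name `dist` is illustrative.

```python
from fractions import Fraction as F
from math import sqrt

# Sampling distribution of the sample mean: value -> probability
dist = {
    F(3): F(2, 10),
    F(10, 3): F(2, 10),
    F(11, 3): F(3, 10),
    F(4): F(2, 10),
    F(13, 3): F(1, 10),
}

mean = sum(x * p for x, p in dist.items())
variance = sum(x**2 * p for x, p in dist.items()) - mean**2
sd = sqrt(variance)

print(float(mean), round(float(variance), 4), round(sd, 3))
```

Working in fractions avoids the small discrepancies that come from using the rounded values 3.33, 3.67 and 4.33.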
Sampling distribution of sample mean
We have seen that the distribution of the sample mean x̄ is derived from the distribution of x.
Thus the distribution of x is called the parent distribution.
The next question is what relationship exists between the parent distribution and the sampling distribution of x̄.
Sampling distribution of sample mean
Saying that the distribution of x is normal with mean µ and standard deviation σ is equivalent to saying that the parent population is normal with mean µ and standard deviation σ.
If we draw a sample of size n from such a population, then
• The mean of x̄ is equal to the population mean µ.
• The standard deviation of x̄ is equal to σ/√n.
• The shape of the distribution of x̄ is normal, whatever the value of n.
Sampling distribution of sample mean
If X ~ N(µ, σ) then
x̄ ~ N(µ, σ/√n)
where n is the size of the sample drawn from the population.
Central Limit Theorem
For a large sample size, the sampling distribution of x̄ is approximately normal, irrespective of the shape of the population distribution.
What sample size is considered large?
A sample of size n ≥ 30 is considered large.
Useful link:
http://www.austin.cc.tx.us/mparker/1342/cltdemos.htm
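The theorem can be illustrated with a small simulation. The sketch below is only a demonstration under stated assumptions: the population is taken to be exponential with rate 1 (right skewed, mean 1, standard deviation 1), the number of repeated samples is an arbitrary choice, and the seed is fixed so the run is repeatable.

```python
import random
import statistics
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

n = 30              # sample size, the "large" threshold from the slide
num_samples = 2000  # number of repeated samples (arbitrary choice)

# Draw repeated samples from a right-skewed exponential population and keep each mean
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_sd = 1.0 / sqrt(n)  # sigma / sqrt(n) with sigma = 1

# For a normal distribution about 68% of values lie within one standard deviation of the mean
share_within_one_sd = sum(abs(m - 1.0) <= expected_sd for m in sample_means) / num_samples

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3), round(share_within_one_sd, 3))
```

Even though the parent population is far from normal, the sample means should centre near µ = 1 with spread near σ/√n, and roughly two thirds of them should fall within one standard deviation of µ.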
Exercise 7.28
Given that the population distribution is skewed to the left, that is, X is not normally distributed.
a. When n = 400 (i.e. when we repeatedly draw samples of size 400 from the population and compute the sample mean for each such sample), what is the distribution of x̄?
Answer: Since the sample size is large, by the Central Limit Theorem the distribution of x̄ is approximately normal, that is
x̄ ~ N(µ, σ/√400)
Sampling distribution of sample mean
If the random sample comes from a normal population, the sampling distribution of the sample mean is normal regardless of the sample size.
If the shape of the parent population is unknown or not normal, then the distribution of the sample mean is approximately normal whenever n is large (≥ 30). (This is the central limit theorem.)
If the shape of the parent population is unknown or not normal and the sample size is small, then we cannot readily say anything about the shape of the sampling distribution.
Estimators
• The sample mean x̄ is an estimator of the population mean µ.
• By this we mean that whenever the value of µ is not available we use x̄ in its place.
• The sample mean x̄ is an unbiased estimator of the population mean µ.
• Unbiased means that, over repeated sampling, the values of x̄ average out to the true value of µ. In other words, the expected value of x̄ is equal to µ.
Sampling error
• Recall that for a given population the value of µ is fixed, while x̄ is a variable whose value varies from sample to sample.
• When we use x̄ in place of µ, some error is inevitable.
• The difference between x̄ and µ is called the sampling error:
Sampling error = x̄ - µ
• The sampling error occurs purely due to chance, namely the chance of a specific sample being selected.
• Other types of errors may occur in the estimation, for example an error in recording a value or a missing value. Such errors are called non-sampling errors.
Example of sampling error
• Now we repeatedly draw samples of size three from the population of size 5. There are 10 possible samples, listed below.
• The population parameters are µ = 3.6 and s.d. σ = 1.02.
Sample  Members  Values  x̄     Sampling error = x̄ - µ
1       A,B,C    2,3,4   3     -.6
2       A,B,D    2,3,4   3     -.6
3       A,B,E    2,3,5   3.33  -.27
4       A,C,D    2,4,4   3.33  -.27
5       A,C,E    2,4,5   3.67  .07
6       A,D,E    2,4,5   3.67  .07
7       B,C,D    3,4,4   3.67  .07
8       B,C,E    3,4,5   4     .4
9       B,D,E    3,4,5   4     .4
10      C,D,E    4,4,5   4.33  .73
The last column in the above table gives the error in estimation. That is, when drawing a sample of size 3 from the given population, if we get, say, sample number 3 and use the corresponding value of x̄ to estimate the population mean µ, then the error in estimation is -.27 units.
Exercise 7.4
The population consists of six numbers: 15, 13, 8, 17, 9, 12.
a. Population mean = 12.33
b. Liza selected a sample with n = 4 and values 13, 8, 9, 12. Sample mean = 10.5. Then sampling error = 10.5 - 12.33 = -1.83.
c. While calculating the sample mean, Liza mistakenly entered a 6 in place of 9 in the above sample. That is, she entered 13, 8, 6, 12, so a non-sampling error has occurred, and the sample mean is 9.75.
Total error = sampling error + non-sampling error.
Total error = 9.75 - 12.33 = -2.58, out of which -1.83 is the sampling error. Thus non-sampling error = -2.58 - (-1.83) = -.75.
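The decomposition of the total error can be reproduced with a short sketch; the variable names are illustrative and not from the exercise.

```python
import statistics

population = [15, 13, 8, 17, 9, 12]
sample = [13, 8, 9, 12]   # the sample Liza actually selected
entered = [13, 8, 6, 12]  # the values she typed, with 6 in place of 9

mu = statistics.fmean(population)

sampling_error = statistics.fmean(sample) - mu
total_error = statistics.fmean(entered) - mu
non_sampling_error = total_error - sampling_error

print(round(mu, 2), round(sampling_error, 2), round(total_error, 2), round(non_sampling_error, 2))
```

The non-sampling error equals the difference between the mistyped and correct sample means, 9.75 - 10.5, so it does not depend on µ at all.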
Exercise 7.49
• X = GPA of a student enrolled at a large university
• X ~ N(3.02, .29) (this X represents the characteristic of the whole population of students)
• That is, the average GPA of all the students in the population is 3.02 and the standard deviation is .29.
• We draw a sample of size n = 20 from this population and compute the sample mean x̄.
• To find P(x̄ > 3.10) (as asked in part a).
• To compute such a probability we must know the distribution of x̄.
• Since the sample is small but the parent population is normal, x̄ ~ N(3.02, .29/√20).
• At this point we convert the probability statement about x̄ into a probability statement about z using the transformation formula $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
• $P(\bar{x} > 3.10) = P\left(z > \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}\right) = P\left(z > \frac{3.10 - 3.02}{.29/\sqrt{20}}\right) \approx P(z > 1.23)$
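The probability can be evaluated numerically instead of from a z table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 3.02, 0.29, 20

# Sampling distribution of the sample mean: normal because the parent population is normal
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

z = (3.10 - mu) / x_bar_dist.stdev  # standardized value of 3.10
prob = 1 - x_bar_dist.cdf(3.10)     # P(x-bar > 3.10)

print(round(z, 2), round(prob, 4))
```

A z table lookup rounds z to two decimals first, so the table answer can differ from this value in the third or fourth decimal place.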
Exercise 7.52
X = time spent studying per week by a college student
X ~ right skewed (8.4, 2.7)
That is, in the population of all college students, a student spends 8.4 hrs/week studying on average, with a standard deviation of 2.7 hrs. The distribution is right skewed (i.e. not normal).
If we draw a sample of n = 45 students from this population and compute the sample mean x̄, then we are asked to find P(8 < x̄ < 9).
To find such a probability we must know the distribution of x̄.
Though the parent distribution is right skewed, the sample size is large, so we apply the CLT to conclude that approximately
x̄ ~ N(8.4, 2.7/√45)
$P(8 < \bar{x} < 9) = P\left(\frac{8 - 8.4}{2.7/\sqrt{45}} < z < \frac{9 - 8.4}{2.7/\sqrt{45}}\right) \approx P(-0.99 < z < 1.49)$
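As a numerical check, the sketch below evaluates the interval probability under the normal approximation that the CLT justifies for n = 45. It uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 8.4, 2.7, 45

# Approximate sampling distribution of the sample mean (CLT, since n >= 30)
x_bar_dist = NormalDist(mu, sigma / sqrt(n))

z_low = (8 - mu) / x_bar_dist.stdev
z_high = (9 - mu) / x_bar_dist.stdev
prob = x_bar_dist.cdf(9) - x_bar_dist.cdf(8)  # P(8 < x-bar < 9)

print(round(z_low, 2), round(z_high, 2), round(prob, 4))
```

Because the parent distribution is skewed, this probability is an approximation whose quality improves as n grows.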
Population and sample proportions
Consider a categorical variable with just two categories.
Let the population size be N, out of which X elements fall in category I.
Then the population proportion of category I is p = X/N.
If we draw a sample of size n from this population and observe that x out of n fall in category I, then the sample proportion of category I is p̂ = x/n.
Population and sample proportions
A population consists of 9000 families in a small town. Out of these, 3600 families have their houses insured.
Then the population proportion of families with house insurance is p = 3600/9000 = .4
Suppose we draw a sample of size 100 from the above population and observe that 42 families out of 100 have house insurance. Then the sample proportion of families with house insurance is
p̂ = 42/100 = .42
Sampling error = p̂ - p = .42 - .40 = .02
Sampling distribution of p̂
When we draw multiple samples from the population we get different values for p̂. Thus p̂ is a random variable and has a sampling distribution.
Mean of p̂: $\mu_{\hat{p}} = p$
Standard deviation of p̂: $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}}$, where q = 1 - p
Central Limit Theorem for sample proportion:
If the sample size n is considerably large, then
$\hat{p} \sim N\left(p, \sqrt{\frac{pq}{n}}\right)$
Here n is considered to be large if np > 5 and nq > 5.
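Applied to the house insurance example (p = .4, n = 100), these formulas can be sketched as follows. The helper `p_hat_parameters` is a hypothetical name introduced for illustration, and the standard deviation formula assumes n/N ≤ .05, which holds here since 100/9000 is about .011.

```python
from math import sqrt

def p_hat_parameters(p, n):
    """Return (mean, standard deviation, large-sample flag) for the sample proportion."""
    q = 1 - p
    mean = p
    sd = sqrt(p * q / n)
    is_large = n * p > 5 and n * q > 5  # condition for the normal approximation
    return mean, sd, is_large

mean, sd, is_large = p_hat_parameters(0.4, 100)
print(mean, round(sd, 4), is_large)
```

With np = 40 and nq = 60 both above 5, the sample is large enough for p̂ to be treated as approximately normal.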
Exercise 7.60
N = 1000, X = 640
Then population proportion p = 640/1000 = .64
n = 40, x = 24
Then sample proportion p̂ = 24/40 = .60
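Exercise 7.70
Given population proportion p = .63 and n/N ≤ .05, find $\mu_{\hat{p}}$ and $\sigma_{\hat{p}}$ when a sample of size 100 is drawn.
n = 100 and q = 1 - p = .37
Then $\mu_{\hat{p}} = p = .63$
and $\sigma_{\hat{p}} = \sqrt{\frac{pq}{n}} = \sqrt{\frac{.63 \times .37}{100}} = .0483$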