SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS


Page 1: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Page 2: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

SAMPLING AND SAMPLING VARIATION

Sample
Knowledge of students
No. of red blood cells in a person
Length of the life of electric bulbs

Population
Population census: whole population

Page 3: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

If we repeat the same study under exactly similar conditions, we will not necessarily get identical results.

Example: In a clinical trial of 200 patients we find that the efficacy of a particular drug is 75%.

If we repeat the study using the same drug in another group of 200 similar patients, we will not necessarily get the same efficacy of 75%. It could be 78% or 71%.

"Different results from different trials, though all of them are conducted under the same conditions."
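The point can be illustrated with a small simulation. This is only a sketch: it assumes the drug's true efficacy is exactly 75% and that each patient responds independently, neither of which the slide states, and `run_trial` is a hypothetical helper name.

```python
import random

def run_trial(n_patients, true_efficacy, rng):
    """Return the observed cure proportion in one simulated trial."""
    cured = sum(rng.random() < true_efficacy for _ in range(n_patients))
    return cured / n_patients

rng = random.Random(1)  # fixed seed so the run is repeatable

# Assumption: the true efficacy is 75% and every trial enrolls 200 patients.
observed = [run_trial(200, 0.75, rng) for _ in range(5)]
for i, eff in enumerate(observed, start=1):
    print(f"trial {i}: observed efficacy = {eff:.1%}")
```

Each simulated trial is run under identical conditions, yet the observed efficacies differ from one another and from 75%.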

Page 4: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Example: If two drugs have the same efficacy, then the difference between the cure rates with these two drugs should be zero.

But in practice we may not get a difference of zero.

If we find the difference is small, say 2%, 3%, or 5%, we may accept the hypothesis that the two drugs are equally effective.

On the other hand, if we find the difference to be large, say 25%, we would conclude that the drugs are not equally effective.
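To get a feel for which differences are "small" and which are "large", one can simulate many comparisons of two drugs that are truly equally effective. The sketch below assumes a common true cure rate of 75% and 200 patients per drug; both numbers are illustrative choices, not values given for this example.

```python
import random

rng = random.Random(2)  # fixed seed so the run is repeatable

def cure_rate(n_patients, efficacy):
    """Observed cure proportion among n_patients for a given true efficacy."""
    return sum(rng.random() < efficacy for _ in range(n_patients)) / n_patients

# Assumptions: both drugs truly cure 75% of patients, 200 patients per drug.
diffs = [abs(cure_rate(200, 0.75) - cure_rate(200, 0.75)) for _ in range(2000)]

share_small = sum(d <= 0.05 for d in diffs) / len(diffs)
largest = max(diffs)
print(f"differences of 5 points or less: {share_small:.0%} of comparisons")
print(f"largest difference seen: {largest:.1%}")
```

Under these assumptions most comparisons differ by a few percentage points, while a gap as large as 25 points essentially never arises by sampling variation alone.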

Page 5: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Example: Suppose we are testing the claim of a pharmaceutical company that the efficacy of a particular drug is 80%.

We may accept the company's claim if we observe the efficacy in the trial to be 78%, 81%, 83% or 77%.

But if the efficacy in the trial happens to be 50%, we would have good cause to feel that the true efficacy cannot be 80%.

If the true efficacy were 80%, the chance of such a result would be very low. We then tend to dismiss the claim that the efficacy of the drug is 80%.
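How low is "very low"? A rough check is to compute the binomial probability directly. The sketch below assumes a trial of 200 patients (the slide gives no trial size for this example) and independent outcomes; `binom_cdf` is a hypothetical helper.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 200          # assumed trial size, not given in the text
claimed = 0.80   # the company's claimed efficacy

p_50_or_less = binom_cdf(100, n, claimed)  # 50% of 200 = 100 cures
p_78_or_less = binom_cdf(156, n, claimed)  # 78% of 200 = 156 cures

print(f"P(observed efficacy <= 50% | true 80%) = {p_50_or_less:.3g}")
print(f"P(observed efficacy <= 78% | true 80%) = {p_78_or_less:.3g}")
```

Under these assumptions an observed 78% is unremarkable if the claim is true, whereas an observed 50% would be extraordinarily unlikely.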

Page 6: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

THEREFORE

"WHILE TAKING DECISIONS BASED ON EXPERIMENTAL DATA WE MUST GIVE SOME ALLOWANCE FOR SAMPLING VARIATION."

"VARIATION BETWEEN ONE SAMPLE AND ANOTHER SAMPLE IS KNOWN AS SAMPLING VARIATION."

Page 7: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Inference: extension of results obtained from an experiment (sample) to the general population
Use of sample data to draw conclusions about the entire population

Parameter: number that describes a population
Value is not usually known
We are unable to examine the whole population

Statistic: number computed from sample data
Computed to estimate unknown parameters
Mean, standard deviation, variability, etc.

Notation
$\mu$ = population mean
$\bar{X}$ = sample mean

Page 8: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

$\mu$ = population mean

The sample mean $\bar{X}$ is a random variable.

If the sample was randomly drawn, then any difference between the obtained sample mean and the true population mean is due to sampling error.

Any difference between $\bar{X}$ and $\mu$ is due to the fact that different people show up in different samples.

If $\bar{X}$ is not equal to $\mu$, the difference is due to sampling error.

"Sampling error" is normal: it is the to-be-expected variability of samples.

Page 9: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

How can experimental results be trusted? If $\bar{X}$ is rarely exactly right and varies from sample to sample, why is it still a reasonable estimate of the population mean $\mu$?

How can we describe the behavior of statistics from different samples? E.g. the mean value $\bar{x}$.

Page 10: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Very rarely do sample values coincide with the population value (parameter).

The discrepancy between the sample value and the parameter is known as sampling error, when this discrepancy is the result of random sampling.

Fortunately, these errors behave systematically and have a characteristic distribution.

Page 11: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

SAMPLING DISTRIBUTION

The sampling distribution of the mean is the distribution of all possible sample means that could be drawn from the population.

Page 12: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

SAMPLING DISTRIBUTIONS

What would happen if we took many samples of 10 subjects from the population?

Steps:

1. Take a large number of samples of size 10 from the population

2. Calculate the sample mean for each sample

3. Make a histogram of the mean values

4. Examine the distribution displayed in the histogram for shape, center, and spread, as well as outliers and other deviations
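The four steps can be sketched in a few lines. Everything numeric here is an assumption made for illustration: the population is a hypothetical set of 5,000 skewed values, 1,000 samples are drawn, and the histogram is a crude text plot with bins of width 10.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the run is repeatable

# Hypothetical population: 5,000 skewed values with mean near 50.
population = [rng.expovariate(1 / 50) for _ in range(5000)]

# Step 1: take many samples of size 10 (1,000 here, an arbitrary choice).
samples = [rng.sample(population, 10) for _ in range(1000)]

# Step 2: calculate the mean of each sample.
sample_means = [statistics.mean(s) for s in samples]

# Step 3: make a text histogram of the mean values (bins of width 10).
bins = {}
for m in sample_means:
    left_edge = int(m // 10) * 10
    bins[left_edge] = bins.get(left_edge, 0) + 1
for left_edge in sorted(bins):
    print(f"{left_edge:3d}-{left_edge + 10:3d} | " + "#" * (bins[left_edge] // 5))

# Step 4: examine center and spread.
print("population mean:     ", round(statistics.mean(population), 1))
print("mean of sample means:", round(statistics.mean(sample_means), 1))
print("SD of sample means:  ", round(statistics.stdev(sample_means), 1))
```

The histogram of sample means should be centered near the population mean and be much narrower than the population itself.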

Page 13: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS
Page 14: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Properties of sampling distributions

Page 15: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Take a sample of 3 students from a class (a population of 6 students) and measure each student's GPA.

Student   GPA
Susan     2.1
Karen     2.6
Bill      2.3
Calvin    1.2
Rose      3.0
David     2.4

Page 16: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Draw each possible sample from this 'population':

Susan 2.1
Karen 2.6
Bill 2.3
Calvin 1.2
Rose 3.0
David 2.4

Page 17: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

With samples of n = 3 from this population of N = 6 there are 20 different sample possibilities:

$$\binom{N}{n} = \frac{N!}{n!\,(N-n)!} = \frac{6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(3 \cdot 2 \cdot 1)} = \frac{720}{36} = 20$$
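The count of 20 can be confirmed both from the factorial formula and by listing the samples. The sketch below uses the six student names from the example; the variable names are illustrative.

```python
from itertools import combinations
from math import comb, factorial

students = ["Susan", "Karen", "Bill", "Calvin", "Rose", "David"]
N, n = len(students), 3

# N! / (n! (N - n)!) written out with factorials.
by_formula = factorial(N) // (factorial(n) * factorial(N - n))

# The same count obtained by listing every distinct group of three.
by_listing = list(combinations(students, n))

print(by_formula, comb(N, n), len(by_listing))  # prints: 20 20 20
```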

Page 18: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Note that every different sample would produce a different mean and s.d.

ONE SAMPLE: $\bar{X}$ = (Susan + Karen + Bill) / 3

= (2.1 + 2.6 + 2.3) / 3

= 7.0 / 3 = 2.3

Standard Deviation:

$(2.1 - 2.3)^2 = 0.2^2 = 0.04$

$(2.6 - 2.3)^2 = 0.3^2 = 0.09$

$(2.3 - 2.3)^2 = 0^2 = 0$

$s^2 = 0.13 / 3 = 0.043$ and $s = \sqrt{0.043} = 0.21$

So this one sample of 3 has a mean of 2.3 and an SD of 0.21.
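The same arithmetic can be reproduced with Python's standard library. Note the assumption about the divisor: the slide divides the sum of squared deviations by n = 3, which corresponds to `statistics.pstdev`, not `statistics.stdev` (which divides by n - 1).

```python
import statistics

sample = [2.1, 2.6, 2.3]  # Susan, Karen, Bill

mean = statistics.mean(sample)

# pstdev divides the squared deviations by n, matching s^2 = .13/3 above.
sd = statistics.pstdev(sample)

print(round(mean, 2), round(sd, 2))
```

Using the unrounded mean (7.0/3) instead of 2.3 changes the intermediate sums slightly, but the SD still rounds to 0.21.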

Page 19: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

What about other samples?

A SECOND SAMPLE: $\bar{X}$ = (Susan + Karen + Calvin) / 3 = (2.1 + 2.6 + 1.2) / 3 = 1.97

SD = 0.58

20th SAMPLE: $\bar{X}$ = (Karen + Rose + David) / 3 = (2.6 + 3.0 + 2.4) / 3 = 2.67

SD = 0.25

Page 20: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Assume the true mean of the population is known; in this simple case of 6 people it can be calculated as $\mu$ = 13.6/6 = 2.27.

The mean of the sampling distribution (i.e., the mean of all 20 sample means) is also 2.27.
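A short enumeration can check these numbers. The sketch below uses the six GPAs listed earlier and averages the unrounded means of all 20 possible samples of three; the result equals the population mean 13.6/6, about 2.27, which is the unbiasedness property in miniature.

```python
from itertools import combinations
from statistics import mean

gpa = {"Susan": 2.1, "Karen": 2.6, "Bill": 2.3,
       "Calvin": 1.2, "Rose": 3.0, "David": 2.4}
values = list(gpa.values())

population_mean = mean(values)

# Every possible sample of n = 3 and its (unrounded) mean.
sample_means = [mean(trio) for trio in combinations(values, 3)]
mean_of_sample_means = mean(sample_means)

print(len(sample_means), round(population_mean, 2), round(mean_of_sample_means, 2))
```

Rounding each sample mean before averaging can shift the last digit, so the comparison is best made on unrounded values.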

Page 21: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

What is a Sampling Distribution?

A distribution of a statistic (such as the mean) computed from every conceivable sample drawn from a population.

A sampling distribution is almost always a hypothetical distribution because typically you do not have and cannot calculate every conceivable sample mean.

The sample mean is an unbiased estimator of the population mean: the mean of its sampling distribution equals the population mean, and its standard deviation is computable.

Page 22: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

LAW OF LARGE NUMBERS

1) If we keep taking larger and larger samples, the statistic tends to get closer and closer to the parameter value.
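A quick simulation shows the tendency. As an assumption for illustration, the statistic is a sample proportion of cured patients and the true efficacy is 75%, borrowed from the earlier drug example.

```python
import random

rng = random.Random(4)  # fixed seed so the run is repeatable
true_p = 0.75           # assumed true efficacy (the parameter)

errors = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    cured = sum(rng.random() < true_p for _ in range(n))
    p_hat = cured / n                      # the statistic for this sample
    errors[n] = abs(p_hat - true_p)
    print(f"n = {n:>7,}: sample proportion = {p_hat:.4f}, error = {errors[n]:.4f}")
```

The error need not shrink at every single step of one run; the law describes the overall tendency as n grows.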

Page 23: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS
Page 24: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

[Figure: sampling distributions of the mean for samples of size N = 1, N = 2, N = 10 and N = 25]

Page 25: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Central Limit Theorem

If all possible random samples, each the size of your sample, were taken from any population, then the sampling distribution of sample means will have:

a mean equal to the population mean $\mu$
a standard deviation equal to $\sigma / \sqrt{n}$

The sampling distribution will be (approximately) normally distributed IF EITHER:

the parent population from which you are sampling is normally distributed, OR

the sample size is large (a common rule of thumb is n greater than 30).
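These claims can be checked numerically on a population that is clearly not normal. The sketch below assumes an exponential population with mean 1 and standard deviation 1 and uses 5,000 simulated samples of size 30 as a stand-in for "all possible samples"; these choices are illustrative only.

```python
import random
import statistics
from math import sqrt

rng = random.Random(5)   # fixed seed so the run is repeatable
n = 30                   # sample size
mu, sigma = 1.0, 1.0     # mean and SD of the assumed exponential(1) population

# 5,000 simulated sample means stand in for the full sampling distribution.
means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(5000)]

se_theory = sigma / sqrt(n)              # sigma / sqrt(n)
se_observed = statistics.stdev(means)    # SD of the simulated sample means

# For a normal curve about 95% of values lie within 1.96 SDs of the center.
within = sum(abs(m - mu) <= 1.96 * se_theory for m in means) / len(means)

print(f"mean of sample means: {statistics.mean(means):.3f} (population mean {mu})")
print(f"SD of sample means:   {se_observed:.3f} (sigma/sqrt(n) = {se_theory:.3f})")
print(f"share within 1.96 SE: {within:.1%}")
```

Even though the parent population is strongly skewed, the simulated sample means should center on the population mean, have a spread close to sigma/sqrt(n), and fall within 1.96 standard errors roughly 95% of the time.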

Page 26: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

ILLUSTRATION OF SAMPLING DISTRIBUTIONS

Draw 500 different SRSs.

What happens to the shape of the sampling distribution as the size of the sample increases?

Page 27: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

500 Samples of n = 2

Page 28: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

500 Samples of n = 4

Page 29: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

500 Samples of n = 6

Page 30: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

500 Samples of n = 10

Page 31: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

500 Samples of n = 20

Page 32: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Key Observations

As the sample size increases, the mean of the sampling distribution (here, of the 500 sample means) comes to more closely approximate the true population mean, here known to be $\mu$ = 3.5.

AND, critically, the standard error (that is, the standard deviation of the sampling distribution) gets systematically smaller.
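The illustration can be re-created in code. The slides do not say which population has mean 3.5; the sketch below assumes rolls of a fair six-sided die, which has exactly that mean, and draws 500 samples at each of the sample sizes shown.

```python
import random
import statistics

rng = random.Random(6)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n rolls of a fair die (assumed population, mean 3.5)."""
    return statistics.mean(rng.randint(1, 6) for _ in range(n))

results = {}
for n in (2, 4, 6, 10, 20):
    means = [sample_mean(n) for _ in range(500)]   # 500 samples of size n
    results[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n = {n:2d}: mean of sample means = {results[n][0]:.2f}, "
          f"SD of sample means = {results[n][1]:.2f}")
```

The center stays near 3.5 at every sample size, while the spread of the sample means shrinks steadily as n grows.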

Page 33: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS
Page 34: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Three main points about sampling distributions

Probabilistically, as the sample size gets bigger the sampling distribution better approximates a normal distribution.

The mean of the sampling distribution will more closely estimate the population parameter as the sample size increases.

The standard error (SE) gets smaller and smaller as the sample size increases. Thus, we will be able to make more precise estimates of the whereabouts of the unknown population mean.

Page 35: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

ESTIMATING THE POPULATION MEAN

We are unlikely ever to see a sampling distribution, because it is often impossible to draw every conceivable sample from a population, and so we never know the actual mean or the actual standard deviation of the sampling distribution. But here is the good news:

We can estimate the whereabouts of the population mean from the sample mean and use the sample's standard deviation to calculate the standard error. The formula for computing the standard error changes depending on the statistic you are using, but essentially you divide the sample's standard deviation by the square root of the sample size.

Page 36: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Don't confuse the standard deviation of your sample, computed by:

$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$$

with the standard error (the s.d. of the sampling distribution), which is:

$$SE = \frac{\sigma}{\sqrt{n}}$$

Page 37: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Note that we rarely know the standard deviation of the population or the standard deviation of the sampling distribution.

The standard error must therefore be estimated by taking the standard deviation of your sample and dividing by the square root of N - 1.

Page 38: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

The Standard Error For Samples:

$$SE = \frac{\sqrt{\sum (X - \bar{X})^2 / N}}{\sqrt{N - 1}}$$

or, same thing,

$$SE = \frac{s}{\sqrt{N - 1}}$$

What we are trying to do is locate the unknown whereabouts of the population mean. Probabilistically speaking, $\mu$ is at or somewhere on either side of the sample mean.

Page 39: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Standard deviation versus standard error

The standard deviation (s) describes variability between individuals in a sample.

The standard error describes variation of a sample statistic.

The standard deviation describes how individuals differ.

The standard error of the mean describes the precision with which we can make inference about the true mean.

Page 40: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Standard error of the mean

Standard error of the mean (sem):

$$\text{sem} = s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Comments:
n = sample size
even for large s, if n is large, we can get good precision for sem
always smaller than the standard deviation (s) when n > 1
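The slides divide the sample standard deviation by the square root of N - 1 in one place and by the square root of n in another. A small check suggests the two agree once the divisor used for s is taken into account. This reading is an assumption: s is taken to use divisor n in the first case (as in the worked GPA example) and divisor n - 1 in the second (the usual software default).

```python
import statistics
from math import sqrt

sample = [2.1, 2.6, 2.3]   # the Susan, Karen, Bill sample
n = len(sample)

s_n = statistics.pstdev(sample)   # SD with divisor n
s_n1 = statistics.stdev(sample)   # SD with divisor n - 1

sem_a = s_n / sqrt(n - 1)   # SE = s / sqrt(N - 1), using the divisor-n SD
sem_b = s_n1 / sqrt(n)      # sem = s / sqrt(n), using the divisor-(n - 1) SD

print(round(sem_a, 4), round(sem_b, 4))
```

Both routes give the same standard error, and in either case it is smaller than the sample standard deviation.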

Page 41: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Proportions

A proportion or percentage is a mean: it is the mean of a variable that takes on the values 0 and 1. The event of interest is coded 1.

The CLT then applies to proportions as it does to means. For a 0/1 variable, the population is necessarily not normally distributed, but the CLT says that for a proportion calculated from a large sample the sampling distribution will be approximately normally distributed.

Page 42: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Notation

p = population proportion

$\hat{p}$ = sample proportion

n = sample size

CLT suggests:

$\mu_{\hat{p}}$ = mean of the sampling distribution of the proportion = p

$\sigma_{\hat{p}}$ = standard deviation of the sampling distribution of the proportion

Page 43: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

For a 0/1 variable, the standard deviation simplifies to a simple function of the proportion of ones in the population:

$$\sigma = \sqrt{p(1-p)}$$

The standard deviation of the sampling distribution then simplifies as follows:

$$\sigma_{\hat{p}} = \frac{\sigma}{\sqrt{n}} = \frac{\sqrt{p(1-p)}}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}$$
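Both simplifications are easy to verify numerically. The values p = 0.75 and n = 200 are assumptions borrowed from the earlier drug-efficacy example, and the 0/1 population of 100 values is constructed purely for the check.

```python
import statistics
from math import sqrt

p, n = 0.75, 200   # assumed proportion and sample size

# A 0/1 population with 75% ones: its SD should equal sqrt(p(1 - p)).
population = [1] * 75 + [0] * 25
sd_direct = statistics.pstdev(population)
sd_formula = sqrt(p * (1 - p))

# Standard deviation of the sampling distribution of the sample proportion.
se_p_hat = sqrt(p * (1 - p) / n)

print(round(sd_direct, 4), round(sd_formula, 4), round(se_p_hat, 4))
```

With these numbers the standard error is about 0.031, so observed efficacies a few percentage points either side of 75% are to be expected in trials of 200 patients.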

Page 44: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS
Page 45: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Normality of Sampling Distributions

In small samples, the sampling distribution of a proportion will not be normally shaped, because the population is not normal.

Rule of thumb: the sampling distribution is close enough to normal to use the normal table if

np ≥ 10 and n(1 - p) ≥ 10

Otherwise, we cannot do the problem with the normal table.
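The rule of thumb is simple to encode, reading it as np ≥ 10 and n(1 - p) ≥ 10. The function name `normal_table_ok` and the example inputs are hypothetical.

```python
def normal_table_ok(n, p, threshold=10):
    """Rule of thumb: both expected counts must reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(normal_table_ok(200, 0.75))  # 150 and 50 expected: True
print(normal_table_ok(20, 0.10))   # only 2 expected successes: False
```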

Page 46: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

SLOGAN TO REMEMBER

Sample Mean + Sampling Error = The Population Mean

Some Sample Characteristic + Sampling Error = The Population Characteristic

Page 47: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

Two Steps in Statistical Inferencing Process

1. Calculation of “confidence intervals” from the sample mean and sample standard deviation within which we can place the unknown population mean with some degree of probabilistic confidence

2. Compute “test of statistical significance” (Risk Statements) which is designed to assess the probabilistic chance that the true but unknown population mean lies within the confidence interval that you just computed from the sample mean.
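As a rough sketch of step 1, a 95% confidence interval can be computed as the sample mean plus or minus about 1.96 standard errors. The data below are hypothetical, and the normal multiplier is a simplifying assumption: with a sample this small, a t multiplier would normally be used instead.

```python
import statistics
from math import sqrt

# Hypothetical sample of 12 GPA-like measurements (not from the slides).
sample = [2.1, 2.6, 2.3, 1.2, 3.0, 2.4, 2.8, 2.2, 1.9, 2.7, 2.5, 3.1]
n = len(sample)

x_bar = statistics.mean(sample)
se = statistics.stdev(sample) / sqrt(n)   # estimated standard error of the mean

# Normal multiplier for 95% confidence (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = x_bar - z * se, x_bar + z * se

print(f"sample mean = {x_bar:.2f}, SE = {se:.3f}")
print(f"approximate 95% CI: ({lower:.2f}, {upper:.2f})")
```

A larger sample would shrink the standard error and therefore narrow the interval around the sample mean.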

Page 48: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

So, first we calculate confidence limits and then test for statistical significance, which is the probability of mu being within the CIs we computed.

Both these steps are required when making inferences about the whereabouts of the unknown population mean. Both the calculation of confidence intervals and the subsequent calculation of a measure of statistical likelihood are based on the probabilistic patterns of a sampling distribution.

Together, the confidence limits and statistical test tell us the probability of what would happen IF we sampled the population not once but an infinite number of times. That is, we are sampling from a sampling distribution. This kind of inferencing is the hallmark of statistics.

Page 49: SAMPLING DISTRIBUTION OF MEANS & PROPORTIONS

What we want to do now is to take the next step: to learn how to substantiate our conclusions, that is, how to back up our conclusions with analyses that reflect how much confidence we should have that our estimate of, say, the mean of the population (which is being estimated from our sample) is at or close to the true population mean.