Sampling Distributions & Probabilitybme2.aut.ac.ir/~towhidkhah/MI/Resources/... · 4 Sampling Distributions sampling is an imprecise process estimate will never be exactly the same

1

Sampling Distributions &Probability

McCall Chapter 3

measures of central tendency mean

deviations @ the mean minimum variability of scores about the mean

median mode

McCall Chapter 3

measures of variability range variance standard deviation

2

McCall Chapter 3

populations and samples estimators of population parameters based on sample mean, variance

McCall Chapter 7

sampling sampling distributions sampling error probability & hypothesis testing estimation

Methods of Sampling

simple random sampling all elements of the population have an equal

probability of being selected for the sample representative samples of all aspects of

population (for large samples)

3

Methods of Sampling

proportional stratified random sample mainly used for small samples random sampling within groups but not

between e.g. political polls

random sampling within each province (but not between provinces) total # samples for each province predetermined

by overall population

Random Sampling

each subject is selected independentlyof other subjects

selection of one element of the populationdoes not alter likelihood of selecting anyother element of the population

Sampling in Practice

elements of population available to besampled is often biased e.g. willingness of subjects to participate certain subjects sign up for certain experiments Psych 020 subject pool - is it representative of

the general population?

4

Sampling Distributions

sampling is an imprecise process estimate will never be exactly the same as

population parameter a set of multiple estimates based on

multiple samples is called an empiricalsampling distribution

Empirical Sampling Distribution

Sampling Distribution

The distribution of a statistic determinedon separate independent samples of sizeN drawn from a given population is calleda sampling distribution

mean, standard deviation and variance inraw score distributions vs samplingdistributions:

5

Sampling Distributions

Estimates of population statistics by using the mean of a sample of raw scores we

can estimate both: mean of sampling distribution of means mean of population raw scores

we can estimate the standard deviation of thesampling distribution of means using: standard deviation of raw scores in sample divided by

the square root of the size of the sample

Standard error of the mean all that’s required to esimate it is

standard deviation of sample of raw scores N (# of scores in sample)

it represents an estimate of the amount ofvariability (or sampling error) in means from allpossible samples of size N from the populationof raw scores

not necessary to select several samples toestimate the population sampling error of themean

6

Standard error of the mean we divide by the square root of N thus standard error of the mean is always smaller than the

standard deviation of the raw scores in sample variability of means from sample to sample will always be

smaller than the variability of raw scores within a sample as N becomes larger, standard error mean becomes

smaller For large samples the mean will be less variable from

sample to sample thus a more accurate estimate of the true mean of the

population

Normality Given random sampling, the sampling distribution

of the mean is a normal distribution if the population distribution of

the raw scores is normal approaches a normal distribution as the size of the

sample increases even if the population distribution ofraw scores is not normal

Central limit theorem the sum of a large number of independent observations from

the same distribution has, under certain general conditions,an approximate normal distribution

the approximation steadily improves as the number ofobservations increases

Normality why do we care about whether populations or

samples are normally distributed??? parametric statistical tests are based on the

assumption of normality t-Test F-test

given a mean and a variance: we can look up in a table (or compute) the proportion

of population scores that fall above (or below) a givenvalue

based on assuming entire shape of distribution basedon mean and variance

7

Normality What if assumption of normality is violated? perform “non-parametric” statistical tests

we will see some of these later on in course not based on assuming any particular shape of

distribution determine “how serious” the violation is

monte-carlo simulations given 2 known non-normal distributions whose means do

NOT differ, extract 1000 random samples of size N from each perform statistical tests with given alpha value (e.g. .05) was # of type-I errors greater than alpha?

Hypothesis testing

standard normal distribution (z);t distribution

A single case

suppose it is known: for population of subjects asked to learn &

remember 15 nouns mean # correct nouns recalled after 80 min =

7 & standard deviation = 2

!

µ = 7

"x

= 2

8

A single case

does taking a new drug improve memory? test a single person after taking the drug they score 11 nouns recalled what can we conclude?

!

µ = 7

"x

= 2

A single case 11 nouns recalled after taking drug what are the chances that someone randomly

sampled from the population (without the drug)would have scored 11 or higher?

this probability equals the area under the curve:

!

µ = 7

"x

= 2

A single case to determine probability:

convert raw score (11) to a z-score: Z = (11-7)/2 = 2.00 lookup probability in z-table (table A, appendix 2,

McCall) p = 0.0228

!

µ = 7

"x

= 2

!

zi=Xi"µ

x

#x

9

A single case p=0.0228; but what is our alpha level? let’s say 5%

but we didn’t know in advance whether drug shouldlower or raise memory score

we have to split 5% into top 2.5% and bottom 2.5%

!

µ = 7

"x

= 2

2.5% 2.5%

A single case p=0.0228 alpha (two-tailed) = 0.0250 thus p < alpha; reject null hypothesis conclude drug may have an effect

!

µ = 7

"x

= 2

2.5% 2.5%

A single group let’s say the experimenter samples 20

people randomly from the population andtests all of them

average score across 20 subjects = 8.4 same idea as before - ask what is the

probability that group mean of sample size20 could be 8.4 or above given onlyrandom sampling?

i.e. assuming no effect of the drug

10

A single group sample mean (n=20) = 8.4 let’s transform into a z-score mean is mean of sampling distribution of

means standard deviation is sd of sampling

distribution of means (not of raw scores) thus

z=(8.4-7.0)/(2.0/sqrt(20)) = 3.13

!

zi=Xi"µ

x

#x

!

z =X "µ

x

#x

N

A single group

z = 3.13; lookup in table, p = 0.0009 so reject null hyp; drug may have an effect notice that now z-score and hence

probability depend on sample size the higher the sample size the higher the z-

score, & the lower the probability

!

z =X "µ

x

#x

N

Decisions

11

A single group

in these examples, mean and standarddeviation of population was known

typically we don’t know real s.d. but wehave to estimate it from sample data

!

z =X "µ

x

#x

N

!

zi=Xi"µ

x

#x

Tests based on estimates we can use the standard error of the sampling

distribution of the mean to estimate standarddeviation of population

accuracy depends on sample size, N for large samples N>50, N>100, it’s fairly accurate for small samples it is not another theoretical sampling distribution exists

that is appropriate for these situations

!

z =X "µ

#x

!

sx

= sx/ N

The t distribution

similar to z distribution however: there is a different distribution for

each sample size N df = N-1

!

t =X "µ

sx

=X "µ

sx/ N

12

The t distribution

same example as before select 20 subjects at random assume s.d. of population is not available assume population is normal in form with a

mean of 7.0 test the hypothesis that:

the observed sample mean is drawn from apopulation with mean=7.0

The t distribution compute the t statistic: tobs = (8.4-7.0)/(2.3/sqrt(20)) = 2.72 lookup “critical value” of t for alpha = 0.05 and

df=N-1 = 20-1 = 19 tcrit = 2.093 (two-tailed test) since tobs > tcrit, we reject the null hypothesis reject the hypothesis that sample was drawn from

a population with mean=7.0 conclude drug may have had an effect

!

t =X "µ

sx/ N

Confidence Intervals for the mean

our sample mean is not equal to thepopulation mean

it is an estimate using standard error of the mean and the t-

statistic we can compute a confidenceinterval for the true population mean

!

X ± t" (sx )

13

Confidence intervals for the mean Xbar = the sample mean alpha = 1 - level of confidence desired talpha = critical value of t corresponding to a nondirectional test

at significance level alpha sxbar = the estimated standard error of the mean Xbar = 8.4 alpha = 1-.95 = 0.05 talpha = t.05 = 2.093 (from the t-tables, df=19) sxbar = (2.3/sqrt(20)) = 0.51 confidence limits are 8.4 +/- 1.07 7.33 9.47 the probability is 95% that the interval (7.33,9.47) contains the

true population mean

!

X ± t" (sx )

t-tests for difference between means

assume we have two random samples we want to test whether these two

samples have been drawn from h0: the same population (with the same mean) h1: 2 populations with 2 different means

compute t-statistic:

!

t =(X

1" X

2) " (µ

1"µ

2)

sx 1"x 2

t-tests for difference between means under the null hypothesis, mean1=mean2

denominator again can be estimated from thesample data

standard error of the difference between meansactually varies depending on whether scores inthe two samples are correlated or independent

!

t =(X

1" X

2) " (µ

1"µ

2)

sx 1"x 2

=(X

1" X

2) " 0

sx 1"x 2

=(X

1" X

2)

sx 1"x 2

14

independent groups t-test

formula for computing t:

!

t =X 1" X

2

sx 1"x 2

=X 1" X

2

(N1"1)s

1

2+ (N

2"1)s

2

2

N1

+ N2" 2

#

$ %

&

' ( 1

N1

1

N2

#

$ %

&

' (

!

df = (N1"1) + (N

2"1) = N

1+ N

2" 2

correlated groups t-test

assume the difference between twomeans equals the mean differencebetween pairs of scores

formula for computing t:

!

t =D "µ

D

sD

=D

sD/ N

!

df = N "1

!

t =Di"

N Di

2# ( D

i)2""

N #1

Interpretation of significance scientific significance and statistical significance

are NOT the same if N is very large you might find a “statistically

significant” difference between samples that is infact tiny and is not scientifically significant

if N is very small you might falsely conclude theabsence of a scientifically significant differencethat is in fact present because the observeddifference in your samples is not “statisticallysignificant”

15

SPSS & t-tests

independent samples t-test two groups: A: (3,1,5,4,6,2,1,8,3,4) B: (5,7,8,9,7,6,8,7,5,9)

Documents

Sampling Distributions & Probabilitybme2.aut.ac.ir/~towhidkhah/MI/Resources/... · 4 Sampling Distributions sampling is an imprecise process estimate will never be exactly the same