Sampling Distributions & Probability
McCall Chapter 3
  measures of central tendency
    mean
      deviations about the mean (they sum to zero)
      minimum variability of scores about the mean (the sum of squared deviations is smallest about the mean)
    median
    mode
  measures of variability
    range
    variance
    standard deviation
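The measures listed above can be illustrated with a short sketch. The scores are hypothetical (not taken from McCall), and the sample variance uses the N - 1 denominator, which is an assumption about the convention intended here.

```python
import statistics

scores = [2, 4, 4, 5, 7, 8]  # hypothetical raw scores

# Measures of central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Measures of variability
score_range = max(scores) - min(scores)
variance = statistics.variance(scores)  # sample variance (N - 1 denominator)
sd = statistics.stdev(scores)           # sample standard deviation

# Deviations about the mean sum to zero
deviations_sum = sum(x - mean for x in scores)

def ss_about(center):
    """Sum of squared deviations of the scores about a chosen value."""
    return sum((x - center) ** 2 for x in scores)

# Squared deviations are smaller about the mean than about the median
print(mean, median, mode, score_range, variance, round(sd, 2))
print(deviations_sum, ss_about(mean), ss_about(median))
```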
McCall Chapter 3
  populations and samples
    estimators of population parameters based on sample mean, variance
McCall Chapter 7
  sampling
  sampling distributions
  sampling error
  probability & hypothesis testing
  estimation
Methods of Sampling
  simple random sampling
    all elements of the population have an equal probability of being selected for the sample
    yields samples that are representative of all aspects of the population (for large samples)
Methods of Sampling
  proportional stratified random sample
    mainly used for small samples
    random sampling within groups but not between
    e.g. political polls
      random sampling within each province (but not between provinces)
      the number sampled from each province is predetermined by its share of the overall population
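A minimal sketch of proportional stratified random sampling follows. The function name, the province labels, and the population sizes are all hypothetical; quotas are simply rounded, so in general they may not add up exactly to the requested total.

```python
import random

def proportional_stratified_sample(strata, total_n, seed=None):
    """Draw a random sample within each stratum, sized by its population share."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Quota is fixed in advance by the stratum's share of the population
        quota = round(total_n * len(members) / population_size)
        # Simple random sampling (without replacement) inside the stratum only
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical provinces holding 60%, 30% and 10% of the population
strata = {
    "province_a": [f"a{i}" for i in range(600)],
    "province_b": [f"b{i}" for i in range(300)],
    "province_c": [f"c{i}" for i in range(100)],
}
sample = proportional_stratified_sample(strata, total_n=50, seed=1)
quotas = {name: len(members) for name, members in sample.items()}
print(quotas)
```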
Random Sampling
  each subject is selected independently of other subjects
  selection of one element of the population does not alter the likelihood of selecting any other element of the population
Sampling in Practice
  the set of population elements available to be sampled is often biased
    e.g. willingness of subjects to participate
    certain subjects sign up for certain experiments
    Psych 020 subject pool: is it representative of the general population?
Sampling Distributions
  sampling is an imprecise process
  an estimate will almost never be exactly the same as the population parameter
  a set of multiple estimates based on multiple samples is called an empirical sampling distribution
Empirical Sampling Distribution
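An empirical sampling distribution can be approximated by simulation. The sketch below assumes a hypothetical, roughly normal population (mean 7, s.d. 2), draws 1000 samples of size 20, and collects the sample means; none of these numbers come from a real data set.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of raw scores
population = [rng.gauss(7, 2) for _ in range(10_000)]

n = 20  # size of each sample
# Each sample mean is one estimate; the collection of estimates is the
# empirical sampling distribution of the mean
sample_means = [statistics.mean(rng.sample(population, n)) for _ in range(1000)]

pop_mean = statistics.mean(population)
pop_sd = statistics.pstdev(population)
mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(round(pop_mean, 2), round(mean_of_means, 2))
print(round(pop_sd / n ** 0.5, 2), round(sd_of_means, 2))
```

The mean of the sample means should sit close to the population mean, and their standard deviation close to the population s.d. divided by the square root of the sample size.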
Sampling Distribution
The distribution of a statistic determinedon separate independent samples of sizeN drawn from a given population is calleda sampling distribution
mean, standard deviation and variance inraw score distributions vs samplingdistributions:
Sampling Distributions
  Estimates of population statistics
    by using the mean of a sample of raw scores we can estimate both:
      the mean of the sampling distribution of means
      the mean of the population of raw scores
    we can estimate the standard deviation of the sampling distribution of means using:
      the standard deviation of the raw scores in the sample divided by the square root of the size of the sample
  Standard error of the mean
    all that’s required to estimate it is
      the standard deviation of the sample of raw scores
      N (the number of scores in the sample)
    it represents an estimate of the amount of variability (or sampling error) in means from all possible samples of size N from the population of raw scores
    it is not necessary to select several samples to estimate the population sampling error of the mean
Standard error of the mean
  we divide by the square root of N, thus
    the standard error of the mean is always smaller than the standard deviation of the raw scores in the sample (for N > 1)
    the variability of means from sample to sample will always be smaller than the variability of raw scores within a sample
    as N becomes larger, the standard error of the mean becomes smaller
  for large samples the mean will be less variable from sample to sample, and thus a more accurate estimate of the true mean of the population
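The standard error of the mean needs only one sample. The sketch below uses a hypothetical sample of ten scores and then shows, for an assumed raw-score s.d. of 2.0, how the standard error shrinks as N grows.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimate the standard error of the mean from one sample of raw scores."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

sample = [6, 9, 7, 10, 8, 5, 9, 11, 7, 8]  # hypothetical raw scores
sd = statistics.stdev(sample)
sem = standard_error_of_mean(sample)

# For a fixed raw-score s.d. (assumed 2.0), the standard error falls with N
sem_by_n = {n: 2.0 / math.sqrt(n) for n in (4, 16, 64)}

print(round(sd, 3), round(sem, 3))
print(sem_by_n)
```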
Normality
  Given random sampling, the sampling distribution of the mean
    is a normal distribution if the population distribution of the raw scores is normal
    approaches a normal distribution as the size of the sample increases, even if the population distribution of raw scores is not normal
  Central limit theorem
    the sum of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution
    the approximation steadily improves as the number of observations increases
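One way to see the central limit theorem at work is to sample from a clearly non-normal population and watch the skewness of the sample means fall toward zero (the value for a normal distribution) as the sample size grows. The exponential population, the sample sizes, and the use of skewness as a rough normality check are all choices made for this sketch.

```python
import random
import statistics

rng = random.Random(42)

def skewness(values):
    """Population skewness: mean cubed z-score (0 for a symmetric distribution)."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def sample_means(n, reps=2000):
    """Means of `reps` independent samples of size n from an exponential population."""
    return [statistics.mean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

# Skewness of the sampling distribution of the mean for growing sample sizes
skew_by_n = {n: skewness(sample_means(n)) for n in (1, 5, 50)}
for n, skew in skew_by_n.items():
    print(n, round(skew, 2))
```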
Normality
  why do we care about whether populations or samples are normally distributed?
    parametric statistical tests are based on the assumption of normality
      t-test
      F-test
    given a mean and a variance:
      we can look up in a table (or compute) the proportion of population scores that fall above (or below) a given value
      this relies on assuming the entire shape of the distribution based on the mean and variance
Normality
  What if the assumption of normality is violated?
    perform “non-parametric” statistical tests
      we will see some of these later on in the course
      not based on assuming any particular shape of distribution
    determine “how serious” the violation is
      Monte Carlo simulations
        given 2 known non-normal distributions whose means do NOT differ, extract 1000 random samples of size N from each
        perform statistical tests with a given alpha value (e.g. .05)
        was the proportion of type-I errors greater than alpha?
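The Monte Carlo procedure described above might be sketched as follows. The two populations (exponential and uniform, both with mean 1.0), the sample size N = 10, and the pooled-variance t-test are assumptions chosen for illustration; the critical value 2.101 is the standard two-tailed table value for alpha = .05 and df = 18.

```python
import math
import random
import statistics

T_CRIT = 2.101  # two-tailed critical t for alpha = .05, df = 10 + 10 - 2 = 18

def pooled_t(a, b):
    """Independent-groups t statistic with a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        pooled_var * (1 / n1 + 1 / n2))

def type_one_error_rate(n=10, reps=1000, seed=0):
    """Proportion of false rejections when the population means do not differ."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        a = [rng.expovariate(1.0) for _ in range(n)]   # exponential, mean 1.0
        b = [rng.uniform(0.0, 2.0) for _ in range(n)]  # uniform, mean 1.0
        if abs(pooled_t(a, b)) > T_CRIT:
            rejections += 1  # a type-I error: the null hypothesis is true
    return rejections / reps

rate = type_one_error_rate()
print(rate)  # compare against the nominal alpha of .05
```

If the observed rate is well above .05, the violation of normality is serious for this test at this sample size.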
Hypothesis testing
  standard normal distribution (z); t distribution
A single case
  suppose it is known:
    for a population of subjects asked to learn & remember 15 nouns
    mean number of nouns correctly recalled after 80 min = 7 & standard deviation = 2
  $\mu = 7$, $\sigma_x = 2$
A single case
  does taking a new drug improve memory?
    test a single person after taking the drug
    they score 11 nouns recalled
    what can we conclude?
A single case
  11 nouns recalled after taking drug
  what are the chances that someone randomly sampled from the population (without the drug) would have scored 11 or higher?
  this probability equals the area under the normal curve at or above a score of 11
A single case
  to determine probability:
    convert raw score (11) to a z-score: z = (11 - 7)/2 = 2.00
    look up probability in z-table (Table A, Appendix 2, McCall)
    p = 0.0228
  $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$
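The table lookup can also be computed directly. This sketch uses Python's standard-library `NormalDist` in place of the printed z-table; the variable names are illustrative.

```python
from statistics import NormalDist

mu, sigma = 7, 2   # known population mean and standard deviation
score = 11         # nouns recalled by the single person tested

# Convert the raw score to a z-score
z = (score - mu) / sigma

# Area under the standard normal curve at or above z (upper tail)
p_upper = 1 - NormalDist().cdf(z)

print(z, round(p_upper, 4))
```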
A single case
  p = 0.0228; but what is our alpha level?
    let’s say 5%
    but we didn’t know in advance whether the drug should lower or raise the memory score
    we have to split 5% into top 2.5% and bottom 2.5%
  (figure: normal curve with $\mu = 7$, $\sigma_x = 2$, and 2.5% in each tail)
A single case
  p = 0.0228
  alpha (two-tailed) = 0.0250 in each tail
  thus p < alpha; reject null hypothesis
  conclude drug may have an effect
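The two-tailed decision can equivalently be made by comparing the observed z-score with a critical z-score. In this sketch the critical value is computed with the standard-library `NormalDist` rather than read from a table.

```python
from statistics import NormalDist

alpha = 0.05

# Two-tailed test: alpha is split into 2.5% in each tail, so the critical
# z-score cuts off the top 2.5% of the standard normal distribution
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

z_obs = (11 - 7) / 2  # z-score of the single case

# Reject the null hypothesis if z_obs falls in either tail
reject_null = abs(z_obs) > z_crit

print(round(z_crit, 2), z_obs, reject_null)
```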
A single group
  let’s say the experimenter samples 20 people randomly from the population and tests all of them
  average score across 20 subjects = 8.4
  same idea as before: ask what is the probability that the group mean of a sample of size 20 could be 8.4 or above given only random sampling?
    i.e. assuming no effect of the drug
A single group
  sample mean (N = 20) = 8.4
  let’s transform into a z-score
    the mean is the mean of the sampling distribution of means
    the standard deviation is the s.d. of the sampling distribution of means (not of raw scores)
  thus z = (8.4 - 7.0)/(2.0/sqrt(20)) = 3.13
  single score: $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$
  sample mean: $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$
A single group
  z = 3.13; look up in table, p = 0.0009
  so reject null hypothesis; drug may have an effect
  notice that now the z-score and hence the probability depend on sample size
    for a given difference between sample mean and population mean, the larger the sample size, the higher the z-score & the lower the probability
  $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$
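The group calculation, and its dependence on sample size, can be checked with a small sketch. The function name is hypothetical, and the extra sample sizes (5 and 80) are added only to show the trend.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(sample_mean, mu, sigma, n):
    """z-score of a sample mean within the sampling distribution of means."""
    standard_error = sigma / math.sqrt(n)
    return (sample_mean - mu) / standard_error

z = z_for_sample_mean(8.4, 7.0, 2.0, 20)
p_upper = 1 - NormalDist().cdf(z)  # chance of a mean of 8.4 or above

# Same mean difference, different sample sizes: z grows with N
z_by_n = {n: z_for_sample_mean(8.4, 7.0, 2.0, n) for n in (5, 20, 80)}

print(round(z, 2), round(p_upper, 4))
print({n: round(v, 2) for n, v in z_by_n.items()})
```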
Decisions
A single group
  in these examples, the mean and standard deviation of the population were known
  typically we don’t know the real s.d. and have to estimate it from sample data
  single score: $z_i = \dfrac{X_i - \mu_x}{\sigma_x}$
  sample mean: $z = \dfrac{\bar{X} - \mu_x}{\sigma_x / \sqrt{N}}$
Tests based on estimates
  we can use the sample standard deviation to estimate the standard deviation of the population, and from it the standard error of the sampling distribution of the mean
  accuracy depends on sample size, N
    for large samples (N > 50, N > 100), it’s fairly accurate
    for small samples it is not
    another theoretical sampling distribution exists that is appropriate for these situations
  $z = \dfrac{\bar{X} - \mu}{\sigma_{\bar{x}}}$
  $s_{\bar{x}} = s_x / \sqrt{N}$
The t distribution
  similar to the z distribution, however:
    there is a different distribution for each sample size N
    df = N - 1
  $t = \dfrac{\bar{X} - \mu}{s_{\bar{x}}} = \dfrac{\bar{X} - \mu}{s_x / \sqrt{N}}$
The t distribution
  same example as before
    select 20 subjects at random
    assume s.d. of population is not available
    assume population is normal in form with a mean of 7.0
  test the hypothesis that:
    the observed sample mean is drawn from a population with mean = 7.0
The t distribution
  compute the t statistic: t_obs = (8.4 - 7.0)/(2.3/sqrt(20)) = 2.72, where 2.3 is the sample standard deviation s_x
  look up the “critical value” of t for alpha = 0.05 and df = N - 1 = 20 - 1 = 19
    t_crit = 2.093 (two-tailed test)
  since t_obs > t_crit, we reject the null hypothesis
    reject the hypothesis that the sample was drawn from a population with mean = 7.0
    conclude drug may have had an effect
  $t = \dfrac{\bar{X} - \mu}{s_x / \sqrt{N}}$
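The same t-test can be sketched in a few lines. The function name is hypothetical, and the critical value is copied from the t-table figure quoted above rather than computed, since the standard library has no t distribution.

```python
import math

T_CRIT_DF19 = 2.093  # two-tailed critical t, alpha = .05, df = 19 (from the t-table)

def one_sample_t(sample_mean, mu, s, n):
    """t statistic for a sample mean against a hypothesized population mean."""
    estimated_standard_error = s / math.sqrt(n)
    return (sample_mean - mu) / estimated_standard_error

t_obs = one_sample_t(sample_mean=8.4, mu=7.0, s=2.3, n=20)
df = 20 - 1
reject_null = abs(t_obs) > T_CRIT_DF19

print(round(t_obs, 2), df, reject_null)
```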
Confidence Intervals for the mean
  our sample mean is not equal to the population mean
    it is an estimate
  using the standard error of the mean and the t-statistic we can compute a confidence interval for the true population mean
  $\bar{X} \pm t_{\alpha}\,(s_{\bar{x}})$
Confidence intervals for the mean
  Xbar = the sample mean
  alpha = 1 - level of confidence desired
  t_alpha = critical value of t corresponding to a nondirectional test at significance level alpha
  s_xbar = the estimated standard error of the mean
  example:
    Xbar = 8.4
    alpha = 1 - .95 = 0.05
    t_alpha = t_.05 = 2.093 (from the t-tables, df = 19)
    s_xbar = (2.3/sqrt(20)) = 0.51
    confidence limits are 8.4 +/- 1.07, i.e. 7.33 and 9.47
    we can be 95% confident that the interval (7.33, 9.47) contains the true population mean (intervals built this way capture it in 95% of samples)
  $\bar{X} \pm t_{\alpha}\,(s_{\bar{x}})$
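A short sketch of the confidence interval calculation follows. The function name is hypothetical and the critical t is taken from the table value given above. Because the standard error is not rounded to 0.51 before multiplying, the limits come out as roughly 7.32 and 9.48 instead of 7.33 and 9.47.

```python
import math

def confidence_interval(sample_mean, s, n, t_crit):
    """Confidence limits for the population mean: mean +/- t_crit * standard error."""
    standard_error = s / math.sqrt(n)
    margin = t_crit * standard_error
    return sample_mean - margin, sample_mean + margin

# 95% interval with t_crit = 2.093 (two-tailed, alpha = .05, df = 19)
lower, upper = confidence_interval(sample_mean=8.4, s=2.3, n=20, t_crit=2.093)

print(round(lower, 2), round(upper, 2))
```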
t-tests for difference between means
  assume we have two random samples
  we want to test whether these two samples have been drawn from
    H0: the same population (with the same mean)
    H1: 2 populations with 2 different means
  compute t-statistic:
  $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}}$
t-tests for difference between means
  under the null hypothesis, mean1 = mean2
  the denominator again can be estimated from the sample data
  the standard error of the difference between means actually varies depending on whether scores in the two samples are correlated or independent
  $t = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(\bar{X}_1 - \bar{X}_2) - 0}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{x}_1 - \bar{x}_2}}$
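independent groups t-test
  formula for computing t:
  $t = \dfrac{\bar{X}_1 - \bar{X}_2}{s_{\bar{x}_1 - \bar{x}_2}} = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\left[\dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}\right]\left[\dfrac{1}{N_1} + \dfrac{1}{N_2}\right]}}$
  $df = (N_1 - 1) + (N_2 - 1) = N_1 + N_2 - 2$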
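The independent groups formula can be applied directly to summary statistics. In this sketch the function name and the two sets of means, variances and group sizes are hypothetical.

```python
import math

def independent_t(mean1, var1, n1, mean2, var2, n2):
    """Independent-groups t from summary statistics, using a pooled variance."""
    # Pooled variance: each sample variance weighted by its degrees of freedom
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    return (mean1 - mean2) / se_diff, df

# Hypothetical groups: means 10 and 8, variances 4 and 6, 8 subjects each
t, df = independent_t(10.0, 4.0, 8, 8.0, 6.0, 8)

print(round(t, 3), df)
```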
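correlated groups t-test
  assume the difference between two means equals the mean difference between pairs of scores
  formula for computing t:
  $t = \dfrac{\bar{D} - \mu_D}{s_{\bar{D}}} = \dfrac{\bar{D}}{s_D / \sqrt{N}}$
  $df = N - 1$
  computational form:
  $t = \dfrac{\sum D_i}{\sqrt{\dfrac{N \sum D_i^2 - \left(\sum D_i\right)^2}{N - 1}}}$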
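The two forms of the correlated groups formula give the same value, which the following sketch checks on hypothetical paired scores. The function names and the data are illustrative only.

```python
import math
import statistics

def paired_t_definitional(x, y):
    """t = mean difference / (s.d. of differences / sqrt(N))."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(n)), n - 1

def paired_t_computational(x, y):
    """t = sum(D) / sqrt((N * sum(D^2) - (sum D)^2) / (N - 1))."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    sum_d = sum(d)
    sum_d2 = sum(v * v for v in d)
    return sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1)), n - 1

# Hypothetical paired scores for the same six subjects
before = [5, 7, 6, 8, 4, 6]
after = [7, 8, 8, 9, 7, 6]

t_def, df = paired_t_definitional(after, before)
t_comp, _ = paired_t_computational(after, before)

print(round(t_def, 3), round(t_comp, 3), df)
```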
Interpretation of significance
  scientific significance and statistical significance are NOT the same
    if N is very large you might find a “statistically significant” difference between samples that is in fact tiny and is not scientifically significant
    if N is very small you might falsely conclude the absence of a scientifically significant difference that is in fact present because the observed difference in your samples is not “statistically significant”
SPSS & t-tests
  independent samples t-test
    two groups:
      A: (3, 1, 5, 4, 6, 2, 1, 8, 3, 4)
      B: (5, 7, 8, 9, 7, 6, 8, 7, 5, 9)
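As a cross-check on SPSS output, the independent samples t statistic for groups A and B can be computed by hand. This is a sketch in plain Python, not SPSS syntax; it assumes equal variances (pooled estimate), and the critical value 2.101 is the standard two-tailed table value for alpha = .05 and df = 18.

```python
import math
import statistics

group_a = [3, 1, 5, 4, 6, 2, 1, 8, 3, 4]
group_b = [5, 7, 8, 9, 7, 6, 8, 7, 5, 9]

n1, n2 = len(group_a), len(group_b)
mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)

# Pooled variance from the two sample variances (N - 1 denominators)
pooled_var = ((n1 - 1) * statistics.variance(group_a)
              + (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2)

t = (mean_a - mean_b) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

T_CRIT_DF18 = 2.101  # two-tailed critical t, alpha = .05, df = 18
reject_null = abs(t) > T_CRIT_DF18

print(mean_a, mean_b, round(t, 2), df, reject_null)
```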