Power
Winnifred Louis
15 July 2009
Overview of Workshop
• Review of the concept of power
• Review of antecedents of power
• Review of power analyses and effect size calculations
• DL and discussion of write-up guide
• Intro to G*Power 3
• Examples of G*Power 3 usage
Power
• Comes down to a "limitation" of the null hypothesis testing approach and its concern with decision errors
• Recall: significant differences are defined with reference to a criterion (a controlled/acceptable rate) for committing type-1 errors, typically .05
  • the type-1 error: finding a significant difference in the sample when it actually doesn't exist in the population
  • type-1 error rate denoted α
• However, relatively little attention has been paid to the type-2 error
  • the type-2 error: finding no significant difference in the sample when there is a difference in the population
  • type-2 error rate denoted β
Reality vs Statistical Decisions

Statistical decision   Reality: H0 true                         Reality: H1 true
Reject H0              "False alarm": α (aka Type 1 error)      Hit (correct decision): 1 - β = power
Retain H0              Hit (correct decision): 1 - α            "Miss": β (aka Type 2 error)
Power is:
• the probability of correctly rejecting a false null hypothesis
• the probability that the study will yield significant results if the research hypothesis is true
• the probability of correctly identifying a true alternative hypothesis

Power: sampling distributions
• the distribution of a statistic that we would expect if we drew an infinite number of samples (of a given size) from the population
• sampling distributions have means and SDs
• we can have a sampling distribution for any statistic, but the most common is the sampling distribution of the mean
Recall: estimating pop means from sample means

Here, the null hypothesis is true (H0: μ1 = μ2)
[Figure: sampling distribution of the difference between means under H0, with α/2 = .025 shaded in each tail]
So if our test tells us that our sample difference between means falls into the shaded areas, we reject the null hypothesis. But 5% of the time, we will do so incorrectly (type I error).

Here, the null hypothesis is false (H1: μ1 ≠ μ2)
[Figure: sampling distributions under H0 and H1, with α/2 = .025 in each tail of the H0 distribution. To the right of the critical line we reject the null hypothesis; the area of the H1 distribution in that region is POWER: 1 - β]
[Figure: decision regions. Reject H0: type 1 error (α) under H0; correct decision (rejection of H0), 1 - β = POWER, under H1. Don't reject H0: correct decision (acceptance of H0), 1 - α, under H0; type 2 error (β) under H1]
Factors that influence power
1. α level
• remember, the α level defines the probability of making a Type I error
• the α level is typically .05, but it might change depending on how worried the experimenter is about Type I and Type II errors
• the bigger the α, the more powerful the test (but the greater the risk of erroneously saying there's an effect when there's not: Type I error)
• e.g., use a one-tailed test
[Figure: factors that influence power: α level. Sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, first with α/2 = .025 in each tail (type I error), then with α = .05 in one tail; the region of the H1 distribution beyond the critical value is POWER]
Factors that influence power
2. the size of the effect (d)
• the effect size is not something the experimenter can (usually) control; it represents how big the effect is in reality (the size of the relationship between the IV and the DV)
• independent of N (population level)
• it stands to reason that with big effects you're going to have more power than with small, subtle effects
[Figure, two panels: factors that influence power: d. Sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, with α/2 = .025 in each tail]
Factors that influence power
3. sample size (N)
• the bigger your sample size, the more power you have
• a large sample size allows small effects to emerge; or, big samples can act as a magnifying glass that detects small effects
• you can see this when you look closely at the formulas
• the standard error of the mean tells us how much, on average, we'd expect a sample mean to differ from a population mean just by chance. The bigger the N, the smaller the standard error, and smaller standard errors = bigger z scores

$z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$, where the standard error is $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$
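The link between N, the standard error, and z can be checked numerically. The sketch below is only an illustration: the sample mean (105), population mean (100), and SD (15) are made-up values, not figures from the workshop.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean: sigma / sqrt(N)."""
    return sigma / math.sqrt(n)


def z_score(sample_mean: float, mu: float, sigma: float, n: int) -> float:
    """z = (sample mean - population mean) / standard error."""
    return (sample_mean - mu) / standard_error(sigma, n)


# Hypothetical values: the same 5-point difference, tested at three sample sizes.
for n in (9, 25, 100):
    se = standard_error(15, n)
    z = z_score(105, 100, 15, n)
    print(f"N = {n:3d}  standard error = {se:.2f}  z = {z:.2f}")
```

The same raw difference yields a larger z as N grows, which is the mechanism by which larger samples buy power.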
Factors that influence power
4. smaller variance of scores in the population (σ²)
• small standard errors lead to more power. N is one thing that affects your standard error
• the other thing is the variance of the population (σ²)
• basically, the smaller the variance (spread) in scores, the smaller your standard error is going to be
[Figure, two panels: factors that influence power: N & σ². Sampling distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2, with α/2 = .025 in each tail]
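The four factors can be compared side by side with a rough calculation. This is a minimal sketch that assumes a normal (z) approximation for the difference between two independent means with a known, common σ; G*Power uses the noncentral t distribution instead, so its values will differ slightly. The function name and the baseline values are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution


def approx_power(mean_diff, sigma, n_per_group, alpha=0.05, two_tailed=True):
    """Approximate power for comparing two independent means (z approximation)."""
    se_diff = sigma * sqrt(2.0 / n_per_group)      # standard error of the difference
    expected_z = mean_diff / se_diff               # where the H1 distribution is centred
    tail_alpha = alpha / 2 if two_tailed else alpha
    z_crit = STD_NORMAL.inv_cdf(1 - tail_alpha)    # critical value under H0
    power = 1 - STD_NORMAL.cdf(z_crit - expected_z)
    if two_tailed:
        power += STD_NORMAL.cdf(-z_crit - expected_z)  # (tiny) opposite-tail area
    return power


baseline = approx_power(mean_diff=5, sigma=10, n_per_group=25)
scenarios = {
    "baseline (diff 5, sigma 10, n 25, alpha .05)": baseline,
    "larger alpha (.10)": approx_power(5, 10, 25, alpha=0.10),
    "one-tailed test": approx_power(5, 10, 25, two_tailed=False),
    "larger effect (diff 8)": approx_power(8, 10, 25),
    "larger sample (n 64)": approx_power(5, 10, 64),
    "smaller variance (sigma 7)": approx_power(5, 7, 25),
}
for label, value in scenarios.items():
    print(f"{label:45s} power = {value:.2f}")
```

Each change relative to the baseline moves power in the direction the slides describe.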
Outcomes of interest
• power determination
• N determination
• α, effect size, N, and power are related: fixing any three determines the fourth
Effect sizes
• Measures of group differences
  • Cohen's d (t-test)
  • Cohen's f (ANOVA)
• Measures of association
  • Partial eta-squared (η²p)
  • Eta-squared (η²)
  • Omega-squared (ω²)
  • R-squared (R²)
• Classic 1988 text: in the library
Measures of difference - d
• When there are only two groups, d is the standardised difference between the two groups
• to calculate an effect size (d), you need to calculate the difference you expect to find between means and divide it by the expected standard deviation of the population
• conceptually, this tells us how many SDs apart we expect the populations (null and alternative) to be

$d = \frac{\mu_1 - \mu_0}{\sigma}$, estimated from sample data as $\hat{d} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{MS_{error}}}$
Cohen's conventions for d

Effect size   d     % overlap
Small         .20   85
Medium        .50   67
Large         .80   53

[Figure: overlap of distributions under H0: μ1 = μ2 and H1: μ1 ≠ μ2 for small, medium, and large effects]
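The % overlap column can be reproduced from d. The sketch below assumes the column is 100 minus Cohen's U1 non-overlap measure for two normal distributions with equal SDs; that reading is an inference from the numbers in the table, not something the slides state.

```python
from statistics import NormalDist


def percent_overlap(d: float) -> float:
    """Percent overlap of two equal-SD normal distributions d SDs apart.

    Assumes overlap = 100 * (1 - U1), with Cohen's U1 = (2*U2 - 1) / U2
    and U2 = Phi(d / 2).
    """
    u2 = NormalDist().cdf(abs(d) / 2)
    u1 = (2 * u2 - 1) / u2
    return 100 * (1 - u1)


for label, d in (("Small", 0.20), ("Medium", 0.50), ("Large", 0.80)):
    print(f"{label:6s} d = {d:.2f}  overlap = {percent_overlap(d):.0f}%")
```

Under that assumption the function returns 85, 67, and 53 after rounding, matching the table.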
Measures of association - Eta-squared
• Eta-squared is the proportion of the total variance in the DV that is attributed to an effect.
• Partial eta-squared is the proportion of the leftover variance in the DV (after all other IVs are accounted for) that is attributable to the effect
  • This is what SPSS gives you, but it is dodgy (it overestimates the effect)

$\eta^2 = \frac{SS_{treatment}}{SS_{total}}$

$\eta_p^2 = \frac{SS_{treatment}}{SS_{treatment} + SS_{error}}$
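To see how the two measures diverge once there is more than one effect in the design, here is a small sketch. The sums of squares are hypothetical values for a two-factor design, chosen only to make the arithmetic easy to follow.

```python
def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Proportion of total variance attributed to the effect."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Proportion of (effect + error) variance attributed to the effect."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical two-factor ANOVA: factor A, factor B, A x B interaction, error.
ss_a, ss_b, ss_ab, ss_error = 40.0, 30.0, 10.0, 120.0
ss_total = ss_a + ss_b + ss_ab + ss_error

print("eta squared for A:        ", eta_squared(ss_a, ss_total))
print("partial eta squared for A:", partial_eta_squared(ss_a, ss_error))
```

Because the partial version drops the other effects from the denominator, it is never smaller than eta-squared; in a one-way design the two coincide.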
Measures of association - Omega-squared
• Omega-squared is an estimate of the dependent variable population variability accounted for by the independent variable.
• For a one-way between-groups design:

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np}$

• p = number of levels of the treatment variable, F = the F value, and n = the number of participants per treatment level
• In sums-of-squares form:

$\omega^2 = \frac{SS_{effect} - (df_{effect})\,MS_{error}}{SS_{total} + MS_{error}}$
Measures of difference - f
• Cohen's (1988) f for the one-way between-groups analysis of variance can be calculated as follows:

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}}$

• Or can use eta-squared instead of omega-squared
• It is an averaged standardised difference between the 3 or more levels of the IV (even though the above formula doesn't look like that)
• Small effect: f = 0.10; Medium effect: f = 0.25; Large effect: f = 0.40
Measures of association - R-squared
• R² is the proportion of variance explained by the model
• In general, R² is given by

$R^2 = \frac{SS_{model}}{SS_{total}}$

• Can be converted to effect size f²: $f^2 = \frac{R^2}{1 - R^2}$
• Small effect: f² = 0.02; Medium effect: f² = 0.15; Large effect: f² = 0.35
Summary of effect conventions
• From G*Power: http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/user_manual/user_manual_02.html#input_val
Estimating effect
• prior literature
• assessment of how great a difference is important
  • e.g., an effect on reading ability is only worth the trouble if it increases ability by at least half an SD
• special conventions
Side issues…
• recall the logic of calculating estimates of effect size (i.e., criticisms of significance testing)
  • the tradition of significance testing is based upon an arbitrary rule leading to a yes/no decision
• power illustrates further some of the caveats with significance testing
  • with a high N you will have enough power to detect a very small effect
  • if you cannot keep error variance low, a large effect may still be non-significant
Side issues…
• on the other hand…
  • sometimes very small effects are important
  • by employing strategies to increase power you have a better chance at detecting these small effects
Power
Common constraints:
• Cell size too small
  • because the sample is difficult to recruit, or there is too little time / money
• Small effects are often a focus of theoretical interest (especially in social / clinical / org)
  • DV is subject to multiple influences, so each IV has a small impact
  • "Error" or residual variance is large, because many IVs unmeasured in the experiment / survey are influencing the DV
  • Interactions are of interest, and interactions draw on smaller cell sizes (and thus lower power) than tests of main effects [Cell means for an interaction are based on n observations, while main effects are based on n x # of levels of the other factors collapsed across]
Determining power
• sometimes, for practical reasons, it's useful to try to calculate the power of your experiment before conducting it
• if the power is very low, then there's no point in conducting the experiment
• basically, you want to make sure you have a reasonable shot at getting an effect (if one exists!)
• which is why grant reviewers want power calculations
Post hoc power calculations
• Generally useless / difficult to interpret from the point of view of stats
• Mandated within some fields
• Examples of post hoc power write-ups online at http://www.psy.uq.edu.au/~wlouis
G*POWER
• G*POWER is a FREE program that can make the calculations a lot easier
• http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/
• Faul, F., Erdfelder, E., Lang, A.-G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39, 175-191.
• G*Power computes:
  • power values for given sample sizes, effect sizes, and alpha levels (post hoc power analyses)
  • sample sizes for given effect sizes, alpha levels, and power values (a priori power analyses)
  • suitable for most fundamental statistical methods
• Note: some tests assume equal variance across groups and assume the population SD is known (in practice it is likely to be estimated from the sample)
OK, let's do it: BS (between-subjects) t-test
G*POWER power calculations: example
• two random samples of n = 25
• expect difference between means of 5
• two-tailed test, α = .05
  • μ1 = 5
  • μ2 = 10
  • σ = 10

$d = \frac{\mu_2 - \mu_1}{\sigma} = \frac{10 - 5}{10} = .50$

• So, with that expected effect size and n, we get power = ~.41
• If the null hypothesis is false, we will correctly reject it about 41% of the time
• Is this good enough?
  • convention dictates that researchers should be entering into an experiment with no less than an 80% chance of getting an effect (presuming it exists), i.e., power of at least .80
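The G*Power result can be sanity-checked by hand. The sketch below is not G*Power's algorithm: it assumes a normal (z) approximation rather than the noncentral t distribution G*Power uses, so it lands slightly above the ~.41 reported above. The function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def approx_power_from_d(d: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Two-tailed power for a two-group comparison, normal approximation."""
    std_normal = NormalDist()
    expected_z = d * sqrt(n_per_group / 2)       # d scaled by the sample size
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = .05
    upper = 1 - std_normal.cdf(z_crit - expected_z)
    lower = std_normal.cdf(-z_crit - expected_z)
    return upper + lower


mu_1, mu_2, sigma = 5, 10, 10                    # values from the example above
d = (mu_2 - mu_1) / sigma
power = approx_power_from_d(d, n_per_group=25)
print(f"d = {d:.2f}, approximate power = {power:.2f}")  # roughly .42
```

Either way the conclusion is the same: with 25 per group the study falls well short of the conventional .80.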
Determining N
• Determine n:
  • Calculate effect size
  • Use power of .80 (convention)
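The two steps above can be sketched as a closed-form calculation. This assumes the common normal-approximation formula n = 2((z for 1 - α/2) + (z for power))² / d² per group for a two-tailed, two-group comparison; G*Power's exact t-based answer tends to be slightly larger, so treat the result as a lower bound.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d: float, power: float = 0.80, alpha: float = 0.05) -> int:
    """Approximate n per group for a two-tailed, two-sample comparison."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value
    z_power = std_normal.inv_cdf(power)          # quantile for the desired power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


for label, d in (("small", 0.2), ("medium", 0.5), ("large", 0.8)):
    print(f"{label:6s} d = {d}: about {n_per_group(d)} per group")
```

For the medium effect in the earlier example (d = .50), this gives roughly 63 per group, far more than the 25 originally planned.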
WS t-test
• Within-subjects (WS) designs are more powerful than between-subjects designs (they control for individual differences)
• The WS t-test is not very difficult in G*Power, but it becomes trickier in ANOVA
• Need to know the correlation between timepoints (luckily, SPSS paired t gives this)
• Or can use the mean and SD of "difference" scores (also in SPSS output)
Method 1: Difference scores
dz = Mean diff / SD diff = .0167/.0718 = .233
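Both routes to dz mentioned above (difference scores, or the correlation between timepoints) can be written out as follows. The first call uses the figures from the slide; the means, SDs, and correlation in the second call are hypothetical, since the SPSS output itself is not reproduced here.

```python
from math import sqrt


def dz_from_difference_scores(mean_diff: float, sd_diff: float) -> float:
    """dz = mean of the difference scores / SD of the difference scores."""
    return mean_diff / sd_diff


def dz_from_correlation(m1: float, m2: float, sd1: float, sd2: float, r: float) -> float:
    """dz when only the two means, two SDs, and their correlation are known."""
    sd_diff = sqrt(sd1 ** 2 + sd2 ** 2 - 2 * r * sd1 * sd2)
    return (m1 - m2) / sd_diff


# Values from the slide.
print(round(dz_from_difference_scores(0.0167, 0.0718), 3))

# Hypothetical timepoint summaries: a higher correlation shrinks SD diff
# and so raises dz for the same mean difference.
print(round(dz_from_correlation(10.5, 10.0, 2.0, 2.0, r=0.5), 3))
print(round(dz_from_correlation(10.5, 10.0, 2.0, 2.0, r=0.8), 3))
```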
WS t-test
• I said before that WS designs are more powerful than the equivalent BS version
• Let's test this by using the same means and SDs in the Independent Samples t-test calculator in G*Power

Between subjects: power = .18
Within subjects: power = .07
Extension to 1-way ANOVA…
• In PSYC3010 you used phi prime as the ANOVA equivalent of d, which is the same as Cohen's f
• G*Power uses Cohen's f
• Numerous methods:
  1) Calculate omega-squared, then use the formula for f and enter it directly
  2) Calculate omega-squared or eta-squared and enter it into "Direct" under "Effect size from variances"
  3) Use means and use "Effect size from means"

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np}$

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}}$
ANOVA: PTSD Severity

                 SS        df   Mean Square   F       Sig.
Between Groups   507.84    3    169.28        3.269   0.030
Within Groups    2278.74   44   51.7895
Total            2786.58   47
Calculating omega & f

Given the above analysis (p = 4 groups, n = 12 per group):

$\hat{\omega}^2 = \frac{(p-1)(F-1)}{(p-1)(F-1) + np} = \frac{(4-1)(3.269-1)}{(4-1)(3.269-1) + (12)(4)} = 0.124$

So

$\hat{f} = \sqrt{\frac{\hat{\omega}^2}{1 - \hat{\omega}^2}} = \sqrt{\frac{0.124}{1 - 0.124}} = 0.376$

Not sure if this works with SPSS partial eta-squared: have had problems before, and omega is more conservative anyway.
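The worked example can be re-run from the PTSD Severity ANOVA table. This sketch computes omega-squared both from F and from the sums of squares, then converts to Cohen's f; the function names are illustrative, and small differences in the third decimal place come from rounding F and omega-squared at intermediate steps.

```python
from math import sqrt


def omega_sq_from_f(f_value: float, p: int, n: int) -> float:
    """Omega-squared for a one-way between-groups design, from F."""
    numerator = (p - 1) * (f_value - 1)
    return numerator / (numerator + n * p)


def omega_sq_from_ss(ss_effect: float, df_effect: int, ms_error: float, ss_total: float) -> float:
    """Omega-squared from the sums of squares in the ANOVA table."""
    return (ss_effect - df_effect * ms_error) / (ss_total + ms_error)


def cohens_f(proportion_of_variance: float) -> float:
    """Cohen's f from omega-squared (or eta-squared)."""
    return sqrt(proportion_of_variance / (1 - proportion_of_variance))


# Values from the PTSD Severity table: 4 groups, 12 participants per group.
w_from_f = omega_sq_from_f(3.269, p=4, n=12)
w_from_ss = omega_sq_from_ss(507.84, 3, 51.7895, 2786.58)
eta_sq = 507.84 / 2786.58

print(f"omega squared (from F):  {w_from_f:.3f}")
print(f"omega squared (from SS): {w_from_ss:.3f}")
print(f"f from omega squared:    {cohens_f(w_from_f):.3f}")
print(f"f from eta squared:      {cohens_f(eta_sq):.3f}")
```

The f based on eta-squared comes out larger than the one based on omega-squared, which is the sense in which omega is the more conservative choice.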
Alternatively
• Alternatively, if you have means (note: this is a different data set)

               Mean DV score   n
Coffee         63.75           16
Energy Drink   64.69           16
Water          46.56           16

MSerror = 125.21, grand mean = 58.33, N = 48

• Use the square root of MSE as the "SD within each group" entry in G*Power
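For readers who want to see what an effect size from means amounts to, here is a sketch. It assumes the usual definition of Cohen's f as the SD of the group means around the grand mean (weighted by group size) divided by the within-group SD, with the within-group SD taken as the square root of MSerror; this is my reading of that G*Power option rather than a transcription of its code.

```python
from math import sqrt


def f_from_means(means, group_sizes, ms_error):
    """Return (Cohen's f, grand mean) from group means, group sizes, and MSerror."""
    total_n = sum(group_sizes)
    grand_mean = sum(m * n for m, n in zip(means, group_sizes)) / total_n
    # Variance of the group means around the grand mean, weighted by group size.
    var_means = sum(n * (m - grand_mean) ** 2 for m, n in zip(means, group_sizes)) / total_n
    sd_within = sqrt(ms_error)  # the value entered as "SD within each group"
    return sqrt(var_means) / sd_within, grand_mean


# Coffee, Energy Drink, Water: values from the table above.
f_value, grand_mean = f_from_means([63.75, 64.69, 46.56], [16, 16, 16], 125.21)
print(f"grand mean = {grand_mean:.2f}, SD within = {sqrt(125.21):.2f}, f = {f_value:.2f}")
```

With these data f comes out well above the .40 convention for a large effect.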
How about 2-way factorial ANOVA?
• Need to test for 3 effects to estimate the power:
  • Main effect IV 1
  • Main effect IV 2
  • Interaction effect (usually less power than main effects due to smaller n in each cell)
• See http://www.psycho.uni-duesseldorf.de/aap/projects/gpower/reference/reference_manual_07.html
Within subjects ANOVA
• Not only need to know effect size but also correlation across time/vars
  • Use a convention for estimating effect size (G*Power uses either Lambda or Cohen's f)
  • Calculate the adjusted f² using number of levels, effect convention, correlation (e.g., test-retest)
  • Calculate Lambda (adjusted f² * N)
  • Use Generic F test

Within Example
• 3 levels over time (m)
• 64 participants (n)
• Look for small effect (f² = .01, i.e., f = .10)
• Test-retest corr = .79 (p)
• Calc adjusted f² = (m*f²)/(1-p) = (3*.01)/(1-.79) = .143
• Calc Lambda = adjusted f² * n = .143*64 = 9.152
• DF 1 = m - 1 = 2
• DF 2 = n*(m-1) = 128

Note. Can't do a priori. If you need to estimate upfront, play with the denominator DF (based on N).
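The arithmetic in the within example can be laid out step by step. This sketch follows the slide's recipe and treats the small-effect value of .01 as f² (the earlier conventions give f = .10 as small, and .10² = .01); the degrees of freedom follow the rule stated on the slide. Variable and function names are illustrative.

```python
def within_noncentrality(f_squared: float, levels: int, correlation: float, n: int):
    """Return (adjusted f squared, lambda) for the Generic F test."""
    adjusted_f_squared = levels * f_squared / (1 - correlation)
    lam = adjusted_f_squared * n
    return adjusted_f_squared, lam


m, n, rho = 3, 64, 0.79              # levels, participants, test-retest correlation
adjusted, lam = within_noncentrality(0.01, m, rho, n)
df1 = m - 1                          # numerator df
df2 = n * (m - 1)                    # denominator df, as given on the slide

print(f"adjusted f squared = {adjusted:.3f}")
print(f"lambda = {lam:.3f}")
print(f"df1 = {df1}, df2 = {df2}")
```

Carrying full precision gives lambda of about 9.14; the slide's 9.152 comes from rounding the adjusted value to .143 before multiplying by 64.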
Within Example
• Refer to Karl Wuensch's website for more details re: RM
  • http://core.ecu.edu/psyc/wuenschk/StatsLessons.htm
• And G*Power manuals online, e.g.:
  • http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/user-guide-type_of_power_analysis
Regression analyses
• Effect size associated with R²: f² = R²/(1 - R²)
• For semipartial: f² = sr²/(1 - R²full)
• f² = .02 (small); f² = .15 (medium); f² = .35 (large)
• Convert to variance accounted for: f²/(1 + f²)
R²
• 3 predictor variables
• R² for full model = .22
• f² = .22/(1 - .22) = .282
• N = 110
Change R² (HMR)
• 2 steps, 2 predictors in step 1, 3 in step 2
• R² for full model = .10
• Change R² for step 2 = .04
• f² = R²change/(1 - R²full)
• f² = .04/(1 - .1) = .0444
• N = 95
• DF numerator for Step 2 = 3
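Both regression effect sizes above follow the same pattern, which a short sketch makes explicit. The function names are illustrative; the inputs are the values from the two examples.

```python
def f2_from_r2(r2_full: float) -> float:
    """f squared for the full model: R2 / (1 - R2)."""
    return r2_full / (1 - r2_full)


def f2_from_r2_change(r2_change: float, r2_full: float) -> float:
    """f squared for a step in hierarchical regression: R2 change / (1 - R2 full)."""
    return r2_change / (1 - r2_full)


def r2_from_f2(f2: float) -> float:
    """Convert f squared back to variance accounted for: f2 / (1 + f2)."""
    return f2 / (1 + f2)


full_model = f2_from_r2(0.22)                # 3 predictors, N = 110
step_two = f2_from_r2_change(0.04, 0.10)     # hierarchical example, N = 95

print(f"full model f squared: {full_model:.3f}")
print(f"R2 change f squared:  {step_two:.4f}")
print(f"back to R2:           {r2_from_f2(full_model):.2f}")
```

Against the conventions above, the full-model effect sits between medium and large, while the step 2 change is a small effect.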
Complex analyses
• G*POWER is useful for basic analyses
• For complex analyses (e.g., SEM, MLM), researchers usually look to Monte Carlo studies
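The logic of a Monte Carlo power study is the same whatever the model: simulate many data sets under an assumed effect, run the test on each, and count how often it comes out significant. The sketch below shows that logic on a deliberately simple case, a two-group comparison with a z test and known σ. That simplification is an assumption made for brevity; SEM or MLM would need dedicated software. The seed and number of simulations are arbitrary.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_power(mean_diff, sigma, n_per_group, alpha=0.05, n_sims=4000, seed=1):
    """Proportion of simulated studies that reject H0 (two-tailed z test)."""
    rng = random.Random(seed)                       # fixed seed for repeatability
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se_diff = sigma * sqrt(2 / n_per_group)         # known-sigma standard error
    rejections = 0
    for _ in range(n_sims):
        group_1 = [rng.gauss(0, sigma) for _ in range(n_per_group)]
        group_2 = [rng.gauss(mean_diff, sigma) for _ in range(n_per_group)]
        observed_diff = sum(group_2) / n_per_group - sum(group_1) / n_per_group
        if abs(observed_diff / se_diff) > z_crit:
            rejections += 1
    return rejections / n_sims


type_1_rate = simulate_power(mean_diff=0, sigma=10, n_per_group=25)  # H0 true
power = simulate_power(mean_diff=5, sigma=10, n_per_group=25)        # H0 false

print(f"rejection rate when H0 is true (type 1 error rate): {type_1_rate:.3f}")
print(f"rejection rate when H0 is false (power):            {power:.3f}")
```

With no true difference, the rejection rate should hover near α; with the 5-point difference from the earlier example, it should land near the analytic power for that design.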
Additional Resources
• http://www.danielsoper.com/statcalc/
  • Some other statistical calculators, including for power