
Effect Size and Statistical Power Analysis in Behavioral and Educational Research

• Effect size 1 (P. Onghena) 09.00-10.30 a.m.

• Effect size 2 (W. Van den Noortgate) 10.45-12.15 a.m.

• Power 1 (I. Van Mechelen) 02.00-03.30 p.m.

• Power 2 (P. Onghena) 03.45-04.30 (A-N) / 04.30-05.15 (O-Z)

SIGNIFICANCE TESTING CRISIS

• Carver, R. P. (1993). The case against statistical significance testing

• Cohen, J. (1994). The earth is round (p < .05).

• Falk, R., & Greenbaum, C. W. (1995). Significance tests die hard: The amazing persistence of a probabilistic misconception.

• Hunter, J. E. (1997). Needed: A ban on the significance test.

CHILDHOOD TRAUMATA

• Furious parental conflicts

• Karl Pearson versus Ronald Fisher

• Ronald Fisher versus Jerzy Neyman (Egon Pearson) – see Box (1978), Gigerenzer et al. (1990), Oakes (1986)

• Morrison, D. R., & Henkel, R. E. (Eds.). (1970). The significance test controversy: A reader.

POSSIBILITY FOR GROWTH

• APA Task Force on Statistical Inference

• 1999 American Psychologist article: Wilkinson & the Task Force

• 2001 Publication Manual (5th ed.)

• Editorial boards of flagship journals: Journal of Consulting & Clinical Psychology, Journal of Counseling and Development, Exceptional Children, Journal of Learning Disabilities,…

GUIDELINES

• Power and sample size. Provide information on sample size and the process that led to sample size decisions. Document the effect sizes, sampling and measurement assumptions, as well as analytic procedures used in power calculations.

Because power computations are most meaningful when done before data are collected and examined, it is important to show how effect-size estimates have been derived from previous research and theory in order to dispel suspicions that they might have been taken from data used in the study or, even worse, constructed to justify a particular sample size. Once the study is analyzed, confidence intervals replace calculated power in describing results.

GUIDELINES

• Hypothesis tests. It is hard to imagine a situation in which a dichotomous accept-reject decision is better than reporting an actual p value or, better still, a confidence interval. Never use the unfortunate expression "accept the null hypothesis." Always provide some effect-size estimate when reporting a p value.

GUIDELINES

• Effect sizes. Always present effect sizes for primary outcomes. If the units of measurement are meaningful on a practical level (e.g., number of cigarettes smoked per day), then we usually prefer an unstandardized measure (regression coefficient or mean difference) to a standardized measure (r or d). It helps to add brief comments that place these effect sizes in a practical and theoretical context.

For a simple, general purpose display of the practical meaning of an effect size, see Rosenthal and Rubin (1982). Consult Rosenthal and Rubin (1994) for information on the use of “counternull intervals” for effect sizes, as alternatives to confidence intervals.

GUIDELINES

• Interval estimates. Interval estimates should be given for any effect sizes involving principal outcomes. Provide intervals for correlations and other coefficients of association or variation whenever possible.
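One common way to attach an interval to a correlation is the Fisher z transformation. The sketch below is only an illustration under a large-sample normal approximation; the helper name `correlation_ci` and the inputs r = .32 and n = 200 are hypothetical and do not come from the guidelines.

```python
import math
from statistics import NormalDist


def correlation_ci(r, n, level=0.95):
    """Approximate confidence interval for a Pearson r via Fisher's z.

    Assumes roughly bivariate-normal data; hypothetical helper, not an
    official procedure from the Task Force guidelines.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be larger than 3")
    z = math.atanh(r)                       # Fisher z of the observed r
    se = 1.0 / math.sqrt(n - 3)             # approximate standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    # Build the interval on the z scale, then transform back to r.
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Illustrative inputs only (not data from the slides).
lower, upper = correlation_ci(0.32, 200)
print(f"95% CI for r = .32, n = 200: [{lower:.3f}, {upper:.3f}]")
```

Because the interval is built on the z scale, it is not symmetric around r after the back-transformation.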

EFFECT SIZE: IMPORTANCE

• For power analysis (Cohen, 1969)

• For meta-analysis (Glass, 1976)

• For descriptive statistics

Test of Significance = Size of Effect × Size of Study (Rosenthal, 1991)
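The identity can be made concrete with a few standard textbook instances. In the forms below, the symbols $n_A$, $n_B$, $N$, and $df$ are assumed notation for the two group sizes, the total sample size, and the degrees of freedom; $d_s$ is the sample standardized mean difference, $r$ a correlation, and $\phi$ the correlation in a 2 × 2 table. In each case the first factor is the size of the effect and the second the size of the study.

```latex
t = d_s \times \sqrt{\frac{n_A n_B}{n_A + n_B}}
\qquad
t = \frac{r}{\sqrt{1 - r^2}} \times \sqrt{df}
\qquad
\chi^2(1) = \phi^2 \times N
```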

EFFECT SIZE: WHAT THE HELL…?

• Cohen (1969): “By the above route, it can now readily be made clear that when the null hypothesis is false, it is false to some degree, i.e., the effect size (ES) is some specific nonzero value in the population.” (p. 10)

$$d = \frac{\mu_A - \mu_B}{\sigma}$$

EFFECT SIZE: WHAT THE HELL…?

• Use of the tables for significance testing

• Cohen (1969): “Accordingly, we refine our ES index, d, so that its elements are sample results, rather than population parameters, and call it ds.” (p. 64)

$$d_s = \frac{\bar{X}_A - \bar{X}_B}{S_p}$$

EFFECT SIZE: WHAT THE HELL…?

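$$t = d_s \sqrt{\frac{n_A n_B}{n_A + n_B}}$$

$$d_s = \frac{\bar{X}_A - \bar{X}_B}{S_p}$$

$$S_p = \sqrt{\frac{\sum (X_A - \bar{X}_A)^2 + \sum (X_B - \bar{X}_B)^2}{n_A + n_B - 2}}$$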

EFFECT SIZE: WHAT THE HELL…?

• Glass (1976): uses ds in meta-analysis but only uses S of the control group in the denominator.

• Hedges (1981), Hedges and Olkin (1985): ds is called g (with reference to Gene Glass), i.e., Hedges’s g

• Hedges (1981), Hedges and Olkin (1985): confusion, because an approximately unbiased estimator is called... d!?

EFFECT SIZE: SUMMARY
COMPARISON OF TWO MEANS

• Cohen’s d: population value (if you use the sample as your population, then use the sample size in the denominator)

• Hedges’s g: sample estimator (use the degrees of freedom in the denominator)

• Hedges’s unbiased estimator is rarely used outside meta-analytic contexts

• point biserial correlation coefficient (Rosenthal, 1991)

EFFECT SIZE: EXAMPLE

          Experimental   Control
          7              4
          7              4
          6              3
          5              2
          5              2
Sum       30             15
Mean      6              3
S (σ)     1 (0.894)      1 (0.894)

EFFECT SIZE: EXAMPLE

• Cohen’s d = (6 – 3) / .894 = 3.35

• Hedges’s g = (6 – 3) / 1 = 3

• Point biserial correlation coefficient:

Score   7 7 6 5 5 4 4 3 2 2
Group   1 1 1 1 1 0 0 0 0 0

r = .86

• All kinds of transformations possible: t ↔ d ↔ g ↔ r
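The three indices for this example can be checked with a short script. This is a sketch using only the ten scores from the example; the variable names are illustrative, and the route from g to r through t assumes the usual two-group relations (t = g × √(n_A n_B / (n_A + n_B)) and r = t / √(t² + df)).

```python
import math

experimental = [7, 7, 6, 5, 5]
control = [4, 4, 3, 2, 2]


def mean(xs):
    return sum(xs) / len(xs)


def sum_of_squares(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs)


n_a, n_b = len(experimental), len(control)
df = n_a + n_b - 2
diff = mean(experimental) - mean(control)
ss_within = sum_of_squares(experimental) + sum_of_squares(control)

# Cohen's d: pooled SD with the sample size in the denominator.
cohens_d = diff / math.sqrt(ss_within / (n_a + n_b))
# Hedges's g: pooled SD with the degrees of freedom in the denominator.
hedges_g = diff / math.sqrt(ss_within / df)

# Transformations: g -> t -> r.
t = hedges_g * math.sqrt(n_a * n_b / (n_a + n_b))
r_from_t = t / math.sqrt(t ** 2 + df)

# Direct check: Pearson r between the scores and a 1/0 group code.
scores = experimental + control
group = [1] * n_a + [0] * n_b
mx, mg = mean(scores), mean(group)
cov = sum((x - mx) * (g - mg) for x, g in zip(scores, group))
r_direct = cov / math.sqrt(sum_of_squares(scores) * sum_of_squares(group))

print(f"Cohen's d  = {cohens_d:.2f}")   # 3.35
print(f"Hedges's g = {hedges_g:.2f}")   # 3.00
print(f"r (via t)  = {r_from_t:.2f}")   # 0.86
print(f"r (direct) = {r_direct:.2f}")   # 0.86
```

Both routes to r agree, which illustrates that the indices are re-expressions of the same information once the group sizes are known.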

COUNTERNULL VALUE OF AN ES

• Tackle the misconceptions

– that failure to reject the null hypothesis implies ES = 0

– that finding a statistically significant p value implies an ES of important magnitude

• The counternull value is the nonnull magnitude of ES that is supported by exactly the same amount of evidence as is the null value of the ES.

• If the counternull value were taken as H0, then the resulting p value would be the same as the obtained p for the actual H0

COUNTERNULL VALUE OF AN ES

• For symmetric reference distributions: $ES_{counternull} = 2\,ES_{obtained} - ES_{null}$

• For asymmetric reference distributions

– transform the ES so as to have a symmetric reference distribution

– calculate the counternull on the symmetric scale

– transform back to obtain the counternull on the original scale

• Example of its use: RRR (2000)
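The two cases above can be sketched in a few lines. The function name `counternull` and its parameters are hypothetical. Using Fisher's z (`atanh`/`tanh`) as the symmetric scale for a correlation is an assumption made here for illustration; the slides do not say which transformation to use, and a different choice gives a different counternull.

```python
import math


def counternull(es_obtained, es_null=0.0, to_symmetric=None, from_symmetric=None):
    """Counternull value of an effect size (hypothetical helper).

    Without transformation functions the ES is treated as already having a
    symmetric reference distribution. With them, the counternull is computed
    on the transformed scale and then transformed back.
    """
    if to_symmetric is None or from_symmetric is None:
        return 2.0 * es_obtained - es_null
    on_symmetric_scale = 2.0 * to_symmetric(es_obtained) - to_symmetric(es_null)
    return from_symmetric(on_symmetric_scale)


# Symmetric case: a standardized mean difference of 3 against a null of 0.
d_counternull = counternull(3.0)

# Asymmetric case: r = .32, using Fisher's z as an ASSUMED symmetric scale.
r_counternull = counternull(0.32, to_symmetric=math.atanh, from_symmetric=math.tanh)

print(f"counternull of d = 3.00: {d_counternull:.2f}")
print(f"counternull of r = .32 (Fisher z scale): {r_counternull:.3f}")
```

With a null value of zero, the counternull on a symmetric scale is simply twice the obtained effect, which is why a non-significant result can be as compatible with a sizeable effect as with no effect at all.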

INTERPRETING EFFECT SIZES

• Cohen’s heuristic values

• small: d = 0.20, the size of the difference in height between 15- and 16-year-old girls

• medium: d = 0.50, visible to the naked eye (the difference between 14- and 18-year-old girls)

• large: d = 0.80, grossly perceptible (the difference between 13- and 18-year-old girls)

INTERPRETING EFFECT SIZES

• Comparison with other measures

• small: d = 0.20, r = .10, r² = .01

• medium: d = 0.50, r = .243, r² = .059

• large: d = 0.80, r = .371, r² = .138
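The r values listed for each benchmark are consistent with the conversion r = d / √(d² + 4), which holds for two groups of equal size. The sketch below reproduces them; the function name `d_to_r` is illustrative, and the equal-group-size condition is an assumption of the formula.

```python
import math


def d_to_r(d):
    """Convert a standardized mean difference to r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4.0)


for label, d in [("small", 0.20), ("medium", 0.50), ("large", 0.80)]:
    r = d_to_r(d)
    print(f"{label:<6} d = {d:.2f}   r = {r:.3f}   r^2 = {r * r:.3f}")
```

Note how quickly the proportion of variance shrinks: even a "large" d of 0.80 corresponds to less than 14% of variance explained.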

BINOMIAL EFFECT SIZE DISPLAY

Treatment outcome (r = .32)

Condition       Improved   Not improved   Totals
Psychotherapy   66         34             100
Control         34         66             100
Totals          100        100            200

BINOMIAL EFFECT SIZE DISPLAY

• What is the effect on the success rate of the implementation of a certain treatment?

• Psychotherapy success rate: .50 + r/2 = .66

• Control success rate: .50 – r/2 = .34

• Notice: .66 – .34 = .32

• “standardized” percentages in order for all margins to be equal
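The display can be generated from r alone. The sketch below builds the 2 × 2 table for r = .32 with 100 cases per row; the function name `besd` and its return layout are illustrative choices, not part of the original presentation.

```python
def besd(r, n_per_group=100):
    """Binomial effect size display for a correlation r (hypothetical helper).

    Returns the two success rates and a 2 x 2 table of counts with equal
    margins, as (success, failure) pairs per condition.
    """
    treatment_rate = 0.50 + r / 2.0
    control_rate = 0.50 - r / 2.0
    table = {
        "treatment": (round(treatment_rate * n_per_group),
                      round((1.0 - treatment_rate) * n_per_group)),
        "control": (round(control_rate * n_per_group),
                    round((1.0 - control_rate) * n_per_group)),
    }
    return treatment_rate, control_rate, table


treatment_rate, control_rate, table = besd(0.32)
print(f"Psychotherapy success rate: {treatment_rate:.2f}")
print(f"Control success rate:       {control_rate:.2f}")
print(f"Difference in rates:        {treatment_rate - control_rate:.2f}")
print(table)
```

The difference between the two success rates equals r by construction, which is what makes the display a direct reading of the correlation.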

ASPIRIN’S EFFECT ON HEART ATTACK

Condition   Heart attack   No heart attack   Total
Aspirin     104            10933             11037
Placebo     189            10845             11034
Totals      293            21778             22071

$$\chi^2(1, N = 22071) = 25.01, \quad p < .000001$$

ASPIRIN’S EFFECT ON HEART ATTACK: BESD

Condition   Heart attack   No heart attack   Total
Aspirin     48.3           51.7              100
Placebo     51.7           48.3              100
Totals      100            100               200

$$r = .034$$
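The step from the raw aspirin counts to r = .034 and the standardized display can be retraced as follows. This is a sketch using the standard Pearson chi-square for a 2 × 2 table (without continuity correction) and r = √(χ² / N); the variable names are illustrative.

```python
import math

# Raw counts from the aspirin table.
aspirin_attack, aspirin_none = 104, 10933
placebo_attack, placebo_none = 189, 10845

n = aspirin_attack + aspirin_none + placebo_attack + placebo_none

# Pearson chi-square for a 2 x 2 table, no continuity correction.
cross = aspirin_attack * placebo_none - aspirin_none * placebo_attack
chi2 = n * cross ** 2 / (
    (aspirin_attack + aspirin_none)
    * (placebo_attack + placebo_none)
    * (aspirin_attack + placebo_attack)
    * (aspirin_none + placebo_none)
)

# With 1 degree of freedom, the upper-tail p value is erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2.0))

# Effect size: r (phi) = sqrt(chi2 / N).
r = math.sqrt(chi2 / n)

# BESD heart-attack rates per 100 cases in each condition.
aspirin_rate = 100.0 * (0.50 - r / 2.0)
placebo_rate = 100.0 * (0.50 + r / 2.0)

print(f"chi2(1, N = {n}) = {chi2:.2f}, p = {p_value:.1e}")
print(f"r = {r:.3f}")
print(f"BESD heart attack rate: aspirin {aspirin_rate:.1f}, placebo {placebo_rate:.1f}")
```

The test statistic is large because N is large, while r stays small: the same contrast between size of effect and size of study that the significance-test identity describes.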

SMALL EFFECTS MAY BE IMPRESSIVE

• and vice versa (Prentice & Miller, 1992)

• consider the amount of variation in the independent variable

• consider the importance / the assumed stability of the dependent variable

WHAT EFFECT SIZE HAS PRACTICAL SIGNIFICANCE?

• assess practical significance closely related to the particular problems, populations, and measures relevant to the treatment under investigation

• Example: community mental health study, inpatient versus outpatient therapy

• Example: effects of school characteristics on reading achievement, fifth grade pupils versus sixth grade pupils