

Robust Confidence Intervals for Effect Sizes: A Comparative Study of Cohen’s d and Cliff’s Delta Under Non-normality and Heterogeneous Variances

Melinda R. Hess

Jeffrey D. Kromrey

University of South Florida

Paper presented at the annual meeting of the American Educational Research Association, San Diego, April 12–16, 2004



Educational research continues to come under fire for a perceived lack of rigor, quality, and credibility (see, for example, Gall, Borg, & Gall, 1996; Tuckman, 1990; Keselman, 1998). As calls for accountability in education grow, as evidenced by legislation such as the No Child Left Behind Act of 2001, it would be naïve or arrogant for educational researchers to believe that we are exempt from increased scrutiny. It is therefore imperative that researchers not only examine the worth of their topics and the rigor of their designs, but also ensure that they use appropriate and thorough reporting practices. When reporting the results of empirical inquiry, the information must be both clear and comprehensive. Clarity can be achieved by communicating the results through carefully designed words, pictures, and tables. Comprehensiveness depends on the tools used to analyze the findings and the methods chosen to report them.

This study focuses on one aspect of both clarity and comprehensiveness in reporting. The last two editions of the American Psychological Association’s (APA) Publication Manual (1994, 2001), as well as the 1999 report by Wilkinson and the APA Task Force on Statistical Inference, recommend reporting effect sizes and confidence intervals (CIs). Accordingly, the primary purpose of this research is to examine effective and statistically sound methods of constructing CIs around effect sizes.

Purpose of the Study

Previous research (Hess & Kromrey, 2003; Kromrey & Hess, 2002; Hogarty & Kromrey, 2001) has suggested that the sensitivity of traditional effect size indices such as Cohen’s d to variance heterogeneity and non-normality precludes their valid interpretation under those conditions. However, alternative effect size indices have shown notably less bias under such conditions (Hogarty & Kromrey, 2001). Of the effect size indices examined by these researchers, a nonparametric index (δ) proposed by Cliff (1993, 1996) showed the least bias and the most consistent standard errors across the conditions examined. However, Hogarty and Kromrey (2001) considered only point estimates of effect size. The purpose of this study was to extend this line of inquiry by investigating the accuracy and precision of interval estimates of Cliff’s δ.

Effect Sizes

A variety of effect size indices is currently available (e.g., Cohen’s d, Hedges’ g, and the trimmed d; Hogarty & Kromrey, 1999), and research into robust and reliable effect size computation is ongoing.

Recent research into constructing CIs for differences between two groups has focused primarily on Cohen’s d (Hess & Kromrey, 2002, 2003), which is given by

$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}} \qquad (1)$$

where $\bar{X}_i$, $S_i^2$, and $n_i$ are the sample mean, variance, and size of group i.
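As a concrete check of Equation 1, the following minimal Python sketch computes Cohen’s d with the pooled standard deviation and applies it to the classroom means in Table 1. It is an illustration only, not the SAS/IML code used in the study; the function name `cohens_d` is our own.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(x1, x2):
    """Standardized mean difference with a pooled SD (Equation 1)."""
    n1, n2 = len(x1), len(x2)
    # statistics.variance uses the n - 1 denominator, matching S_i^2
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(pooled_var)


# Table 1 classroom means
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
print(round(cohens_d(treatment, control), 3))  # -0.422
```

The value agrees with the d of -0.422 reported for these data later in the paper.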

The results of those studies indicated that accurate construction of CIs is problematic when groups differ by more than a minimal amount, as indicated by a Cohen’s d greater than 0.2 (Cohen, 1988). Variance heterogeneity further diminishes coverage, especially when the ratio of variances between groups exceeds 1:2. Sample size also influenced coverage, although the effects of total sample size and of sample balance on coverage varied. These findings may be due, at least in part, to the bias present in standardized mean difference indices resulting from the parametric characteristics of these statistics. Therefore, investigation of other effect size indices was deemed necessary.

A non-parametric approach was expected to alleviate some of the bias introduced by parametric indices such as Cohen’s d. Cliff (1996) suggested a straightforward alternative to using means as the point of comparison between two groups: an approach that examines the probability that individual observations in one group are greater than the observations in the other group. That is, the population parameter that such an effect size estimates is the probability that a randomly selected member of one population has a higher response than a randomly selected member of the second population, minus the reverse probability:

$$\delta = \Pr(x_{i1} > x_{j2}) - \Pr(x_{i1} < x_{j2}) \qquad (2)$$

where $x_{i1}$ is a member of population one and $x_{j2}$ is a member of population two.

Essentially, this approach considers the ordinal, rather than the interval, properties of the data. The sample estimate of this parameter, Cliff’s δ̂, is obtained by comparing each score in one group to each score in the other:

$$\hat{\delta} = \frac{\#(x_1 > x_2) - \#(x_1 < x_2)}{n_1 n_2} \qquad (3)$$

where $x_1$ and $x_2$ are scores within group 1 and group 2, respectively, # denotes the number of pairs for which the stated inequality holds, and $n_1$ and $n_2$ are the group sample sizes.
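Equation 3 translates directly into a count over all $n_1 n_2$ pairs. The sketch below is a minimal illustration under that reading; `cliffs_delta` is a hypothetical helper name, and the brute-force double loop is chosen for clarity rather than speed.

```python
def cliffs_delta(x1, x2):
    """Sample Cliff's delta (Equation 3): (#(x1 > x2) - #(x1 < x2)) / (n1 * n2)."""
    greater = sum(1 for a in x1 for b in x2 if a > b)
    less = sum(1 for a in x1 for b in x2 if a < b)
    return (greater - less) / (len(x1) * len(x2))


# Table 1 classroom means
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
print(cliffs_delta(treatment, control))  # -0.25
```

Tied pairs count toward neither term, so they pull δ̂ toward zero.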

The non-parametric nature of Cliff’s δ reduces the influence of characteristics such as distribution shape, differences in dispersion, and extreme values. The statistic relies on what Cliff refers to as dominance analysis, which concerns the degree to which one sample overlaps another: the greater the overlap (i.e., the lower the dominance), the smaller the difference between the groups. Unlike Cohen’s d, Cliff’s effect size is bounded. An effect size of 1.0 or -1.0 indicates no overlap between the two groups, whereas a value of 0.0 indicates that neither group dominates the other, as when the two group distributions are equivalent.

Interval Estimates of Effect Sizes

A variety of methods for constructing confidence bands around Cohen’s d has been investigated in previous studies under various degrees of group difference, variance heterogeneity, sample size, and distribution shape (Hess & Kromrey, 2003). Three techniques emerged as providing the most accurate coverage across the widest range of conditions.

The first technique relies on the asymptotic normality of the sampling distribution of d (Hedges & Olkin, 1985). Confidence bands are constructed using percentiles from the standard normal distribution and the asymptotic variance of the standardized mean difference. That is, the upper and lower endpoints of the band are given by

$$\Delta_{U,L} = d \pm Z_{\alpha/2}\,\hat{\sigma}_d \qquad (4)$$

where $Z_{\alpha/2}$ is the normal deviate corresponding to the $(1 - \alpha/2)$th percentile of the normal distribution and d is the sample value of Cohen’s d with estimated variance $\hat{\sigma}_d^2$.
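Equation 4 leaves the variance estimate implicit. The sketch below assumes the Hedges and Olkin (1985) large-sample variance, $\hat{\sigma}_d^2 = (n_1 + n_2)/(n_1 n_2) + d^2/[2(n_1 + n_2)]$; this is our assumption, though it reproduces the interval of [-1.44, 0.60] reported later for the Table 1 data. The function name `z_interval_d` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_d(d, n1, n2, alpha=0.05):
    """Normal-theory band for Cohen's d (Equation 4).

    Assumes the Hedges & Olkin (1985) large-sample variance of d.
    """
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}; about 1.96 for alpha = .05
    half_width = z * sqrt(var_d)
    return d - half_width, d + half_width


lower, upper = z_interval_d(-0.4221, 10, 6)
print(f"[{lower:.2f}, {upper:.2f}]")  # [-1.44, 0.60]
```

Because $d^2$ enters the variance, this band widens as the observed effect grows.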

Second, an interval inversion approach to confidence interval construction has recently been explicated by Steiger and Fouladi (1982, 1997), complementing and extending earlier work on this type of approach (see, for example, Venables, 1975; Serlin & Lapsley, 1985). This approach has been shown to provide accurate confidence intervals in a variety of conditions and has shown promise in similar applications of confidence interval estimation (Kromrey & Hess, 2001; Hess & Kromrey, 2002). When applied to the standardized mean difference, the method evaluates the noncentral t distribution and identifies the values of the noncentrality parameter under which the observed t statistic falls at (for example) the 97.5th and 2.5th percentiles. These noncentrality values are then transformed to provide the endpoints of a 95% confidence band around the sample value of d.
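The inversion can be sketched with SciPy’s noncentral t distribution and a root finder. This is a minimal illustration under stated assumptions, not Steiger and Fouladi’s own program: it converts d to $t = d\sqrt{n_1 n_2/(n_1 + n_2)}$ on $n_1 + n_2 - 2$ degrees of freedom, solves for the two noncentrality values, and rescales them to the d metric. The bracket of ±10 around the observed t is an illustrative choice, and `noncentral_t_interval_d` is a hypothetical name.

```python
from math import sqrt

from scipy.optimize import brentq
from scipy.stats import nct


def noncentral_t_interval_d(d, n1, n2, alpha=0.05):
    """CI for d by inverting the noncentral t distribution."""
    df = n1 + n2 - 2
    scale = sqrt(n1 * n2 / (n1 + n2))
    t_obs = d * scale
    # nct.cdf(t_obs, df, nc) decreases as nc increases, so each root is unique.
    # Lower bound: t_obs sits at the (1 - alpha/2) quantile.
    lo_nc = brentq(lambda nc: nct.cdf(t_obs, df, nc) - (1 - alpha / 2),
                   t_obs - 10, t_obs + 10)
    # Upper bound: t_obs sits at the alpha/2 quantile.
    hi_nc = brentq(lambda nc: nct.cdf(t_obs, df, nc) - alpha / 2,
                   t_obs - 10, t_obs + 10)
    # Transform noncentrality bounds back to the d metric
    return lo_nc / scale, hi_nc / scale


lower, upper = noncentral_t_interval_d(-0.4221, 10, 6)
print(f"[{lower:.2f}, {upper:.2f}]")
```

Unlike Equation 4, this band need not be symmetric around d.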

The third technique for constructing confidence intervals around effect sizes is the bootstrap, commonly recognized as an efficient method for obtaining, among other things, confidence intervals and standard errors (Efron & Gong, 1983; Efron & Tibshirani, 1986; Stine, 1990). The basic bootstrap (called the percentile method by Carpenter & Bithell, 2000) repeatedly draws samples of size n with replacement (e.g., 1,000 or 5,000 times) from a single sample of n observations. Each bootstrap sample provides an estimate of the parameter of interest, and the set of estimates provides an empirical sampling distribution for the statistic. Percentile points of this empirical sampling distribution (e.g., the 2.5th and 97.5th percentiles) provide the endpoints of the confidence interval. The percentile method is relatively simple to compute and has the advantage of not requiring an estimate of the standard error. Unfortunately, this simplicity comes at a cost: the method tends to be less successful for non-normal distributions. Alternative bootstrap approaches, such as the pivotal, bias corrected, and bias corrected-accelerated bootstraps, provided superior CI coverage in many conditions in previous studies (Carpenter & Bithell, 2000; Hess & Kromrey, 2003) and were therefore included in this analysis.
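A minimal sketch of the percentile bootstrap for a two-group statistic follows. For two groups we assume each group is resampled with replacement at its own size; the paper does not spell out this detail. The quantile rule (taking the sorted value at rank ⌊p(B − 1)⌋) is one of several conventions, and `percentile_bootstrap` is a hypothetical name.

```python
import random


def cliffs_delta(x1, x2):
    # Equation 3: (#(x1 > x2) - #(x1 < x2)) / (n1 * n2)
    greater = sum(1 for a in x1 for b in x2 if a > b)
    less = sum(1 for a in x1 for b in x2 if a < b)
    return (greater - less) / (len(x1) * len(x2))


def percentile_bootstrap(x1, x2, stat, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI; each group is resampled within itself."""
    rng = random.Random(seed)
    estimates = sorted(
        stat(rng.choices(x1, k=len(x1)), rng.choices(x2, k=len(x2)))
        for _ in range(n_boot)
    )
    # Empirical percentiles of the bootstrap distribution are the endpoints
    lower = estimates[int(alpha / 2 * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
print(percentile_bootstrap(treatment, control, cliffs_delta))
```

No standard error is needed, which is the method’s main practical appeal.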

In the Non-Studentized Pivotal (PV) bootstrap, samples are drawn with replacement, as with the percentile method, but the sampling distribution constructed and evaluated to produce the confidence interval is that of $(d^* - d)$, where $d^*$ is the bootstrap estimate of Cohen’s d and d is the observed sample value. The relevant percentiles of this empirical sampling distribution are calculated and back-transformed to produce the endpoints of the confidence interval, $d - q_{1-\alpha/2}$ and $d - q_{\alpha/2}$, where $q_p$ denotes the pth percentile of $(d^* - d)$. In the Studentized Pivotal (SPV) bootstrap, the sampling distribution constructed is that of

$$\frac{d^* - d}{\hat{\sigma}_{d^*}}$$

where $\hat{\sigma}_{d^*}$ is the estimated standard error of the bootstrap estimate of Cohen’s d. Thus, the SPV is a more computationally intensive method that requires an estimate of the standard error of the statistic of interest, and estimating this value reliably is not always straightforward. If an analytic formula for the standard error is not available, the value can be estimated using the jackknife.
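The PV and SPV bootstraps differ only in the pivot they collect. The sketch below computes both for Cohen’s d. As assumptions, it uses the Hedges and Olkin (1985) large-sample formula as the analytic standard error and scales the studentized percentiles by the standard error of the original sample; the helper names are hypothetical.

```python
import random
from math import sqrt
from statistics import mean, variance


def cohens_d(x1, x2):
    n1, n2 = len(x1), len(x2)
    pooled = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    return (mean(x1) - mean(x2)) / sqrt(pooled)


def se_d(d, n1, n2):
    # Assumed analytic SE: Hedges & Olkin large-sample variance of d
    return sqrt((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2)))


def quantile(sorted_vals, p):
    # Simple empirical quantile: sorted value at rank floor(p * (B - 1))
    return sorted_vals[int(p * (len(sorted_vals) - 1))]


def pivotal_intervals(x1, x2, n_boot=2000, alpha=0.05, seed=1):
    rng = random.Random(seed)
    n1, n2 = len(x1), len(x2)
    d = cohens_d(x1, x2)
    diffs, t_stats = [], []
    for _ in range(n_boot):
        d_star = cohens_d(rng.choices(x1, k=n1), rng.choices(x2, k=n2))
        diffs.append(d_star - d)                              # PV pivot
        t_stats.append((d_star - d) / se_d(d_star, n1, n2))   # SPV pivot
    diffs.sort()
    t_stats.sort()
    # Back-transform: subtract the upper percentile for the lower endpoint
    pv = (d - quantile(diffs, 1 - alpha / 2), d - quantile(diffs, alpha / 2))
    se = se_d(d, n1, n2)
    spv = (d - quantile(t_stats, 1 - alpha / 2) * se,
           d - quantile(t_stats, alpha / 2) * se)
    return pv, spv


treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
pv, spv = pivotal_intervals(treatment, control)
print("PV:", pv, "SPV:", spv)
```

The sketch assumes no bootstrap resample has zero pooled variance, which is vanishingly unlikely for these data.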

The Bias Corrected (BC) bootstrap adjusts for asymmetry in the empirical sampling distribution. This method computes the proportion of bootstrap estimates that fall below the observed sample estimate as an index of asymmetry and incorporates this index into the endpoints of the confidence interval. An extension of this is the Bias Corrected and Accelerated (BCA) method. Its appeal lies in its perceived ability not only to adjust for asymmetric sampling distributions, but also to adjust for changes in the distribution along the range of bootstrap values (that is, if the shape of the sampling distribution changes with the value of d*). In addition, it is reported to yield smaller coverage error than both the percentile and BC methods. However, it is thought to have stability issues for small Type I error rates (α < .025) and can be computationally complex.
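The sketch below implements the standard Efron-style BC and BCA adjustments for a two-group statistic, shown with Cliff’s δ̂. The bias correction $z_0$ comes from the share of bootstrap estimates below the observed value. As an assumption, the acceleration constant is taken from a pooled leave-one-out jackknife over both groups, which is one of several ways to handle two samples. The function name and the clamping rule that keeps $z_0$ finite are our own choices.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cliffs_delta(x1, x2):
    # Equation 3: (#(x1 > x2) - #(x1 < x2)) / (n1 * n2)
    greater = sum(1 for a in x1 for b in x2 if a > b)
    less = sum(1 for a in x1 for b in x2 if a < b)
    return (greater - less) / (len(x1) * len(x2))


def bc_bca_interval(x1, x2, stat, n_boot=2000, alpha=0.05, accelerate=True, seed=1):
    """BC (accelerate=False) or BCa (accelerate=True) bootstrap interval."""
    rng = random.Random(seed)
    theta = stat(x1, x2)
    boots = sorted(
        stat(rng.choices(x1, k=len(x1)), rng.choices(x2, k=len(x2)))
        for _ in range(n_boot)
    )
    # Bias correction from the share of estimates below the observed value,
    # clamped away from 0 and 1 so inv_cdf stays finite
    p_below = sum(b < theta for b in boots) / n_boot
    p_below = min(max(p_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = STD_NORMAL.inv_cdf(p_below)
    a = 0.0
    if accelerate:
        # Acceleration from a pooled leave-one-out jackknife (an assumption)
        jack = [stat(x1[:i] + x1[i + 1:], x2) for i in range(len(x1))]
        jack += [stat(x1, x2[:j] + x2[j + 1:]) for j in range(len(x2))]
        jbar = sum(jack) / len(jack)
        num = sum((jbar - v) ** 3 for v in jack)
        den = 6 * sum((jbar - v) ** 2 for v in jack) ** 1.5
        a = num / den if den > 0 else 0.0

    def adjusted_level(p):
        # Shift the nominal percentile p using z0 and a
        z = STD_NORMAL.inv_cdf(p)
        return STD_NORMAL.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(p):
        return boots[min(n_boot - 1, int(p * (n_boot - 1)))]

    return pick(adjusted_level(alpha / 2)), pick(adjusted_level(1 - alpha / 2))


treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
bc = bc_bca_interval(treatment, control, cliffs_delta, accelerate=False)
bca = bc_bca_interval(treatment, control, cliffs_delta)
print("BC:", bc, "BCa:", bca)
```

Setting `accelerate=False` fixes a = 0, which reduces BCA to BC.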

Interval Estimates for Cliff’s δ

The δ statistic and its associated inferential methods are readily addressed by arranging the data from two groups in a dominance matrix. This n1 × n2 matrix has elements taking the value 1 if the row response is larger than the column response, -1 if the row response is less than the column response, and 0 if the two responses are identical. The sample estimate of δ is simply the average value of the elements in the dominance matrix.


Consider a hypothetical example in which the data displayed in Table 1 represent two sets of classroom means on a math achievement test, obtained from ten treatment and six control classrooms. The research question is whether the two sampled populations differ in their mean math achievement.

Table 1

Sample of Two Groups of Classroom Mean Achievement Scores

Treatment Classrooms    Control Classrooms
10                      10
10                      20
20                      30
20                      40
20                      40
30                      50
30
30
40
50

Table 2 displays these data in a 10 × 6 dominance matrix. Each element takes the value 1 if the row (Treatment Classroom) mean is larger than the column (Control Classroom) mean, 0 if the two values are the same, and -1 if the row mean is less than the column mean. The matrix contains 17 elements equal to 1 and 32 equal to -1, so these data yield a Cliff’s δ̂ (from Equation 3) of (17 - 32)/60 = -0.25.

When used as an effect size index, Cliff’s δ̂ reflects the degree of overlap between the two distributions of scores. It ranges from -1 (if all observations in group 1 are smaller than all observations in group 2) to +1 (if all observations in group 1 are larger than all observations in group 2) and takes the value of zero if the two distributions are identical.


Table 2

Dominance Matrix (rows: Treatment classroom means; columns: Control classroom means)

       10   20   30   40   40   50
10      0   -1   -1   -1   -1   -1
10      0   -1   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
30      1    1    0   -1   -1   -1
30      1    1    0   -1   -1   -1
30      1    1    0   -1   -1   -1
40      1    1    1    0    0   -1
50      1    1    1    1    1    0
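The dominance matrix in Table 2 can be reproduced with a short Python sketch, with δ̂ taken as the mean of its elements. The helper name `dominance_matrix` is our own; the sign expression `(a > b) - (a < b)` yields 1, 0, or -1 for each pair.

```python
def dominance_matrix(x1, x2):
    """n1 x n2 matrix: 1 if row > column, -1 if row < column, 0 if tied."""
    return [[(a > b) - (a < b) for b in x2] for a in x1]


treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
dom = dominance_matrix(treatment, control)
for score, row in zip(treatment, dom):
    print(score, row)

# The sample delta is the average element of the matrix
delta_hat = sum(map(sum, dom)) / (len(treatment) * len(control))
print(delta_hat)  # -0.25
```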

Cliff (1996b) suggested a variety of methods for inference about δ, but the current study focused on the construction of confidence intervals. A consistent estimate of the variance of δ̂ is given by

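$$\hat{\sigma}^2_{dc} = \frac{(n_2 - 1)S^2_{d_{i\cdot}} + (n_1 - 1)S^2_{d_{\cdot j}} + S^2_{d_{ij}}}{n_1 n_2} \qquad (5)$$

where $d_{i\cdot}$ is the marginal (mean) value of row i, $d_{\cdot j}$ is the marginal (mean) value of column j, $d_{ij}$ is the value of element ij in the matrix,

$$S^2_{d_{i\cdot}} = \frac{\sum_i (d_{i\cdot} - \hat{\delta})^2}{n_1 - 1}, \qquad S^2_{d_{\cdot j}} = \frac{\sum_j (d_{\cdot j} - \hat{\delta})^2}{n_2 - 1},$$

and

$$S^2_{d_{ij}} = \frac{\sum_i \sum_j (d_{ij} - \hat{\delta})^2}{(n_1 - 1)(n_2 - 1)}.$$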
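A minimal sketch of Equation 5 follows. As an assumption, it treats the row and column “marginal values” as row and column means of the dominance matrix. Under that reading, it reproduces the variance of 0.106 reported for the Table 1 data. The function name `cliff_variance` is hypothetical.

```python
def cliff_variance(x1, x2):
    """Return (delta_hat, consistent variance estimate) per Equation 5."""
    n1, n2 = len(x1), len(x2)
    dom = [[(a > b) - (a < b) for b in x2] for a in x1]
    delta = sum(map(sum, dom)) / (n1 * n2)
    row_means = [sum(row) / n2 for row in dom]                               # d_i.
    col_means = [sum(dom[i][j] for i in range(n1)) / n1 for j in range(n2)]  # d_.j
    s2_row = sum((r - delta) ** 2 for r in row_means) / (n1 - 1)
    s2_col = sum((c - delta) ** 2 for c in col_means) / (n2 - 1)
    s2_cell = sum((v - delta) ** 2 for row in dom for v in row) / ((n1 - 1) * (n2 - 1))
    var = ((n2 - 1) * s2_row + (n1 - 1) * s2_col + s2_cell) / (n1 * n2)
    return delta, var


treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
delta_hat, var_hat = cliff_variance(treatment, control)
print(delta_hat, round(var_hat, 3))  # -0.25 0.106
```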

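The limits of an asymmetric confidence interval are given by

$$\frac{\hat{\delta} - \hat{\delta}^3 \pm Z_{\alpha/2}\,\hat{\sigma}_{dc}\sqrt{(1 - \hat{\delta}^2)^2 + Z_{\alpha/2}^2\,\hat{\sigma}^2_{dc}}}{1 - \hat{\delta}^2 + Z_{\alpha/2}^2\,\hat{\sigma}^2_{dc}} \qquad (6)$$

where $Z_{\alpha/2}$ is the normal deviate corresponding to the $(1 - \alpha/2)$th percentile of the normal distribution.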

The variance of δ̂ for the data in Table 1 (using Equation 5) is 0.106, and the 95% CI (using Equation 6) is [-0.71, 0.36]. In contrast, Cohen’s d for these data (using Equation 1) is -0.422, and the 95% CI from Equation 4 is [-1.44, 0.60].
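Equation 6 can be checked against the example with a few lines of Python. The sketch takes δ̂ and its estimated variance as inputs; `cliff_asymmetric_ci` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def cliff_asymmetric_ci(delta, var, alpha=0.05):
    """Asymmetric interval for Cliff's delta (Equation 6)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    center = delta - delta ** 3
    spread = z * sqrt(var) * sqrt((1 - delta ** 2) ** 2 + z ** 2 * var)
    denom = 1 - delta ** 2 + z ** 2 * var
    return (center - spread) / denom, (center + spread) / denom


lower, upper = cliff_asymmetric_ci(-0.25, 0.106)
print(f"[{lower:.2f}, {upper:.2f}]")  # [-0.71, 0.36]
```

The interval is not centered on δ̂. Its endpoints are pulled toward zero relative to a symmetric band, which keeps them inside the [-1, 1] range of δ.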

In addition to this confidence interval suggested by Cliff, the five bootstrap methods described above (percentile, PV, SPV, BC, and BCA) were applied to the δ statistic.

Method

Monte Carlo methods were used to compare confidence band estimates using random samples generated from populations under known and controlled conditions. Confidence intervals of 90%, 95%, and 99% were investigated for each sample under each condition for each effect size of interest. The four design factors in this study were (a) sample size (ranging from 5 to 200, including balanced and unbalanced designs), (b) population effect size (Δ = 0.00, 0.20, 0.50, and 0.80, corresponding to a null condition and Cohen’s small, medium, and large effects), (c) population distribution shape (population skewness and kurtosis of 0 and 0, or 2 and 6), and (d) the ratio of the variances in the two populations (σ₁²:σ₂² = 1:1, 1:2, 1:4, and 1:8).

The research was conducted using SAS/IML version 8.2 and run under both Windows and Unix platforms. Normally distributed random variables were generated using the SAS RANNOR random number generator, with a different seed value for each generation. The program code was verified through a baseline hand-check of benchmark datasets. Conditions requiring non-normal population distributions were produced by transforming the normal random variates obtained from RANNOR using the technique described by Fleishman (1978).
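Fleishman’s (1978) power method writes a non-normal variate as $Y = a + bZ + cZ^2 + dZ^3$ with $a = -c$. The coefficients are chosen so that Y has unit variance and the target skewness and (excess) kurtosis. The Python sketch below solves the standard power-method moment equations with SciPy and then transforms normal draws. It assumes that the paper’s kurtosis of 6 means excess kurtosis, and it uses Python’s `random.gauss` in place of SAS RANNOR. The function names are hypothetical.

```python
import random

from scipy.optimize import fsolve


def fleishman_coefficients(skew, ex_kurt, start=(0.8, 0.3, 0.02)):
    """Solve the power-method equations for b, c, d; return (a, b, c, d)."""
    def equations(p):
        b, c, d = p
        return [
            b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1,                # unit variance
            2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew,          # skewness
            24*(b*d + c**2*(1 + b**2 + 28*b*d)
                + d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - ex_kurt,  # kurtosis
        ]
    b, c, d = (float(v) for v in fsolve(equations, start))
    return -c, b, c, d


def fleishman_sample(n, coefs, rng):
    """Transform standard normal draws into the target non-normal shape."""
    a, b, c, d = coefs
    out = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # stands in for SAS RANNOR
        out.append(a + b*z + c*z**2 + d*z**3)
    return out


coefs = fleishman_coefficients(2.0, 6.0)
sample = fleishman_sample(5, coefs, random.Random(1))
print(coefs)
print(sample)
```

For skewness 2 and kurtosis 6, the solver should land near b ≈ 0.826, c ≈ 0.314, and d ≈ 0.023. Rescaling and shifting the output gives any desired population variance and mean.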


Trials of 5,000 samples were conducted for each condition, ensuring sufficient precision for an adequate initial investigation of the sampling behavior of these confidence bands. According to Robey and Barcikowski (1992), the use of 5,000 replications provides a maximum 95% confidence interval width around an observed proportion of ±.0098.
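A single cell of such a design can be sketched as follows. The code estimates the coverage of Cliff’s interval (Equations 3, 5, and 6) under the null condition. When both normal populations share a mean, δ = 0 for any variance ratio, so the true value is known. The replication count is reduced for illustration (the study used 5,000), and the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def cliff_z_band(x1, x2, alpha=0.05):
    """Cliff's asymmetric interval (Equations 3, 5, and 6) for one pair of samples."""
    n1, n2 = len(x1), len(x2)
    dom = [[(a > b) - (a < b) for b in x2] for a in x1]
    delta = sum(map(sum, dom)) / (n1 * n2)
    rows = [sum(r) / n2 for r in dom]
    cols = [sum(dom[i][j] for i in range(n1)) / n1 for j in range(n2)]
    s2_row = sum((r - delta) ** 2 for r in rows) / (n1 - 1)
    s2_col = sum((c - delta) ** 2 for c in cols) / (n2 - 1)
    s2_cell = sum((v - delta) ** 2 for r in dom for v in r) / ((n1 - 1) * (n2 - 1))
    var = ((n2 - 1) * s2_row + (n1 - 1) * s2_col + s2_cell) / (n1 * n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    spread = z * sqrt(var) * sqrt((1 - delta ** 2) ** 2 + z ** 2 * var)
    denom = 1 - delta ** 2 + z ** 2 * var
    return (delta - delta ** 3 - spread) / denom, (delta - delta ** 3 + spread) / denom


def null_coverage(n1, n2, var_ratio, reps=1000, alpha=0.05, seed=1):
    """Share of intervals containing delta = 0 when both normal populations
    have mean 0 (the true delta is 0 for any variance ratio)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        x2 = [rng.gauss(0.0, sqrt(var_ratio)) for _ in range(n2)]
        lower, upper = cliff_z_band(x1, x2, alpha)
        hits += lower <= 0.0 <= upper
    return hits / reps


coverage = null_coverage(n1=20, n2=20, var_ratio=8, reps=500)
print(coverage)
```

Looping this function over the design factors (sizes, variance ratios, shapes, and effect sizes) would give a full coverage grid. Non-null effect sizes require computing the population δ for each condition.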

Results and Conclusions

The results were analyzed in terms of confidence band coverage probabilities and confidence band widths for each of the effect size estimates under consideration. Interpretation of each method’s coverage was guided by the criteria of Bradley (1978) regarding acceptable departures from nominal coverage. In general, the results support the use of Cliff’s δ as a robust effect size estimate: confidence band coverage for δ was superior to that of Cohen’s d under heterogeneous variances and non-normal distributions. Results are presented for the conditions run with a Type I error rate of .05 (i.e., 95% confidence intervals). Findings were similar for α = .01 and α = .10 and are available from the authors upon request.

Coverage Probabilities

Across all conditions, every method of CI construction except the pivotal bootstrap provided better coverage probabilities for Cliff’s δ than for Cohen’s d (see Figure 1). The normal theory Z method appeared to provide exceptional coverage for Cliff’s δ. The studentized pivotal bootstrap provided higher (i.e., more conservative) coverage for Cliff’s δ than the other methods.

Effect Size. Across conditions, coverage probabilities for Cohen’s d decreased markedly for all methods as effect size increased (see Table 1). For Cliff’s δ, however, the normal theory Z bands consistently provided the nominal coverage probability. Regardless of whether no effect (Δ = 0) or a large effect (Δ = 0.8) was present, the Z bands consistently provided coverage of .95. The bias corrected-accelerated bootstrap performed almost as well, maintaining coverage of approximately .95 for all but the largest effect size, for which coverage fell to .94. In general, coverage probabilities for Cliff’s δ tended to be better than those for Cohen’s d across techniques, with the exception of the pivotal bootstrap (mean coverage for δ dropped as low as .89).

Variance Between Groups. When varying degrees of variance heterogeneity were examined (Table 2), the normal theory Z bands for δ maintained the nominal coverage probability (.95) regardless of the degree of heterogeneity. Again, the least effective method for Cliff’s δ was the pivotal bootstrap, which provided only .90 coverage, even under homogeneous conditions. The studentized pivotal bootstrap was slightly conservative, with coverage probabilities of approximately .97 at every degree of heterogeneity.

Distribution Shape. The two distribution shapes examined (Table 3) represent two points on the continuum of possible distributions. For normal population distributions, the pivotal bootstrap performed best for Cohen’s d, whereas both the normal theory Z bands and the bias corrected-accelerated bootstrap bands for Cliff’s δ provided nominal coverage of .95. The studentized pivotal bootstrap once again showed slight overcoverage (.97) under normal distributions. Under a highly skewed (2.0) and kurtotic (6.0) distribution, all methods for both Cohen’s d and Cliff’s δ, with the exception of the Z bands for Cliff’s δ, deteriorated in their ability to provide adequate coverage.

Sample Size. Sample size affected coverage in two evident ways: through the total size and through the balance of group sizes (Table 4). In conditions in which n2 > n1 (a positive pairing of sample sizes with population variances), coverage tended to be enhanced and, in some cases, exceeded the nominal level. Slight overcoverage occurred for Cohen’s d with the normal theory Z bands (.97) when small samples were positively paired with variances. Similar results occurred for Cliff’s δ, not only for the normal theory Z bands (.96) but also for the bias corrected-accelerated and studentized pivotal methods (.96 and .99, respectively). As sample size increased, this issue tended to resolve and coverage was much closer to nominal. When unbalanced sample sizes were negatively paired with population variances (the smaller sample with the larger variance), coverage probabilities were noticeably reduced. In general, the bands constructed around Cliff’s δ using the Z method were robust to changes in sample size as well as in balance.

Specific Conditions. Confidence band coverage estimates for selected conditions are provided in Tables 5–8. For normal distributions with homogeneous variances (Table 5), the coverage estimates for Cohen’s d were near nominal levels across conditions for the normal theory Z bands, the pivotal and studentized pivotal bootstraps, and the Steiger and Fouladi interval inversion bands. The percentile, bias corrected, and bias corrected-accelerated bootstraps showed less than nominal coverage in small samples (n1 + n2 = 20) but adequate coverage in large samples (n1 + n2 = 200). For Cliff’s δ, the normal theory Z bands and the bias corrected and bias corrected-accelerated bootstrap bands provided adequate coverage across these conditions, while the percentile and pivotal bootstraps provided lower coverage in the small sample conditions. Conversely, the studentized pivotal bootstrap provided overly conservative coverage with small samples (coverage probability estimates typically at .99 or above).


With normal distributions and heterogeneous variances (1:8 variance ratio), the deleterious impact of unequal sample sizes on Cohen’s d is evident (Table 6). With the normal theory Z bands, coverage estimates fell as low as .75 (with n1 = 150, n2 = 50, and Δ = 0.8). Similar declines in coverage were evident for all of the bootstrap approaches, as well as for the Steiger and Fouladi approach. The confidence intervals for Cliff’s δ were less affected by population heterogeneity. The normal theory Z bands provided adequate coverage for all conditions except those with n1 = 15, n2 = 5, and Δ > 0.2 (and even in these conditions, coverage exceeded .91). These small, unbalanced samples also reduced coverage for the bias corrected and bias corrected-accelerated bootstrap bands (with coverage dipping below .88). Finally, the studentized pivotal bootstrap bands remained conservative with small samples but provided coverage near nominal levels with large samples.

Under non-normal population distributions with homogeneous variances (Table 7), the normal theory Z bands and the Steiger and Fouladi bands for Cohen’s d showed declining coverage as Δ increased, but coverage fell only as low as .91 (n1 = n2 = 10). All of the bootstrap approaches appeared to be more adversely affected by non-normality, especially in small sample conditions. For example, coverage estimates fell as low as .77 with the percentile bootstrap when n1 = 15, n2 = 5, and Δ = 0.8. For Cliff’s δ, the normal theory Z bands maintained coverage near the nominal level across these conditions, while the bootstrap bands maintained near-nominal coverage with large samples. With small samples, the bootstraps showed reduced coverage (falling as low as .72 with the pivotal bootstrap when n1 = 15, n2 = 5, and Δ = 0.8). As in the previous conditions, the studentized pivotal bootstrap showed conservative coverage with small samples from non-normal homogeneous populations.

Finally, under non-normal distributions and heterogeneous variances (1:8 variance ratio), the impact on the confidence intervals for Cohen’s d was greatest (Table 8). With the normal theory Z bands, coverage estimates fell as low as .64 (with n1 = 150, n2 = 50, and Δ = 0.8), and similarly poor coverage was evident for the bootstrap intervals and the Steiger and Fouladi approach. Conversely, the normal theory Z bands for Cliff’s δ provided adequate coverage for all conditions presented in this table. As in previous conditions, the bootstrap intervals for Cliff’s δ showed reduced coverage with small samples but adequate coverage with large samples. Also consistent with previous conditions, the studentized pivotal bootstrap bands showed conservative coverage with small samples but near-nominal coverage with large samples.


Bandwidth

The mean widths of the confidence intervals across conditions are presented in Figure 2. Because Cohen’s d and Cliff’s δ are on different scales, interval widths are not directly comparable across the two statistics. Within each statistic, however, typical bandwidths were comparable across methods of CI construction. The exception was the SPV method applied to Cliff’s δ, which produced notably wider intervals.

Effect Size. Bandwidths for Cliff’s δ tended to decrease as the magnitude of the effect increased, whereas bandwidths for Cohen’s d increased with effect size (Table 9). For example, normal theory Z bands around Cohen’s d widened from 1.15 when Δ = 0 to 1.19 when Δ = 0.8. Conversely, bands constructed with the same technique around Cliff’s δ narrowed from 0.63 to 0.57 over the same range.

Variance Between Groups. When varying degrees of variance heterogeneity were examined (Table 10), the confidence bands around Cliff’s δ and those around Cohen’s d behaved similarly: for both indices, interval width increased as the degree of variance heterogeneity increased.

Distribution Shape. The shape of the distribution, whether normal or highly skewed and kurtotic, seemed to have minimal impact on confidence interval widths, regardless of the method used or the parameter being estimated (see Table 11). The largest changes occurred with the studentized pivotal method for both Cohen’s d and Cliff’s δ, with widths going from 1.20 to 1.28 and from 1.5 to 2.0, respectively. The remaining changes in bandwidth were quite small, and some widths for Cliff’s δ actually decreased under the non-normal distribution.

Sample Size. As expected, average confidence interval widths decreased as sample size increased (see Table 12). For the smallest samples examined (n1 + n2 = 20), the confidence intervals for both Cohen’s d and Cliff’s δ were wide enough to be virtually uninformative about the location of the parameter.

Specific Conditions. Confidence bandwidth estimates for selected conditions are provided in Tables 13–16. Table 13 provides bandwidth estimates for normal distributions with equal variances. Most notable in this table is the exceptionally low precision (i.e., extremely wide confidence intervals) in all of the small sample conditions. For both small and large samples, precision was slightly better when sample sizes were equal and somewhat lower with larger values of Δ. Few differences were evident across interval estimation methods for either Cohen’s d or Cliff’s δ, with the exception of the studentized pivotal bootstrap bands for δ, which were notably wider. Table 14 presents the conditions in which normal distributions are coupled with heterogeneous (1:8) population variances. In these conditions, confidence bandwidths depended on the pairing of sample size with population variance: precision was greater with positive pairing and lower with negative pairing. Further, differences in precision between the bootstrap bands and the normal theory Z bands were evident for Cohen’s d (the bootstrap approaches yielded narrower bands with positive pairing and wider bands with negative pairing), but these differences were not apparent for Cliff’s δ. Table 15 provides estimates for conditions coupling non-normality with homogeneous population variances; these show patterns similar to those noted for normal distributions with homogeneous variances. Finally, Table 16 presents estimates for conditions with non-normality and heterogeneous (1:8) variances.

Conclusions

It is imperative that we continue to examine the methods used in both theoretical and applied research, because the results of those methods can have far-reaching and in-depth impact on educational researchers and practitioners alike. Interval estimation of the effects of instructional strategies continues to show promise as educational practices develop and adapt under growing scrutiny. However, it is critical that the appropriateness and effectiveness of confidence band construction be investigated relative not only to the type of parameter being estimated, but also to different measures of that parameter. This study clearly illustrates that while the normal theory Z band approach may not have been the most effective for constructing confidence bands around Cohen’s d, it provided very impressive coverage probabilities for Cliff’s δ. Decisions about the viability and appropriateness of using one effect size estimate rather than another are, of course, up to the researcher. Once that decision is made, the next decision should concern the best approach for constructing CIs. Further investigation into when to use different CI techniques as a function of data characteristics, parameter characteristics, and computational sophistication is critical. CI construction cannot, and should not, be thought of as a ‘one size fits all’ issue.


References

American Psychological Association. (2001). Publication manual of the American Psychological Association (5th ed.). Washington, DC: Author.

Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-151.

Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological Bulletin, 114, 494-509.

Cliff, N. (1996a). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate Behavioral Research, 31, 331-350.

Cliff, N. (1996b). Ordinal methods for behavioral data analysis. New Jersey: Lawrence Erlbaum Associates.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic Press.

Cooper, H., & Hedges, L. V. (Eds.). (1994). The handbook of research synthesis. New York: Russell Sage Foundation.

Efron, B., & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician, 37(1), 36-49.

Efron, B., & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, 1(1), 54-77.

Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4), 521-532.

Grissom, R. J., & Kim, J. J. (2001). Review of assumptions and problems in the appropriate conceptualization of effect size. Psychological Methods, 6(2), 135-146.

Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis. New York: Academic Press.

Hess, M. R., & Kromrey, J. D. (2002, April). Confidence intervals for the standardized mean difference: An empirical comparison of methods for interval estimation of effect sizes. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Hess, M. R., & Kromrey, J. D. (2003, February). Confidence bands for standardized mean differences: A comparison of nine techniques under non-normality and variance heterogeneity. Paper presented at the annual meeting of the Eastern Educational Research Association, Hilton Head, SC.

Hogarty, K. Y., & Kromrey, J. D. (1999, August). Traditional and robust effect size estimates: Power and Type I error control in meta-analytic tests of homogeneity. Paper presented at the Joint Statistical Meetings, Baltimore, MD.

Hogarty, K. Y., & Kromrey, J. D. (2001, April). We’ve been reporting some effect sizes: Can you guess what they mean? Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Kromrey, J. D., & Hess, M. R. (2001, April). Interval estimates of R2: An empirical comparison of accuracy and precision under violations of the normality assumption. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.

Kromrey, J. D., & Hogarty, K. Y. (1999, April). Traditional and robust effect size estimates: An empirical comparison in meta-analytic tests of homogeneity. Paper presented at the annual meeting of the American Educational Research Association, Montreal.

McMillan, J. H., Snyder, A., & Lewis, K. L. (2002, April). Reporting effect size: The road less traveled. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Nix, T. W., & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null hypothesis significance testing. Research in the Schools, 5(2), 3-14.

Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.

Serlin, R. C., & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73-83.

Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculation, and hypothesis testing for the squared multiple correlation. Behavior Research Methods, Instruments, and Computers, 24, 581-582.

Stine, R. (1990). An introduction to bootstrap methods. Sociological Methods and Research, 18(2&3), 243-291.

Thompson, B. (1998). Statistical significance and effect size reporting: Portrait of a possible future. Research in the Schools, 5(2), 33-38.

Venables, W. (1975). Calculation of confidence intervals for noncentrality parameters. Journal of the Royal Statistical Society, Series B, 37, 406-412.

Wilkinson, L., & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594-604.


Table 1 Estimated confidence band coverage by effect size, across conditions

delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
0.0 | 0.929 0.920 0.932 0.942 0.949 0.936 0.931 | 0.952 0.937 0.944 0.951 0.912 0.971
0.2 | 0.923 0.916 0.928 0.938 0.943 0.929 0.925 | 0.953 0.936 0.944 0.952 0.910 0.971
0.5 | 0.909 0.903 0.915 0.924 0.931 0.916 0.911 | 0.952 0.931 0.942 0.948 0.904 0.967
0.8 | 0.885 0.884 0.896 0.904 0.912 0.896 0.887 | 0.952 0.924 0.937 0.942 0.891 0.960
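
The Cliff’s Delta columns summarize intervals built around Cliff’s (1993) ordinal dominance statistic. As a minimal sketch, assuming the usual sample definition (the proportion of (x, y) pairs with x > y minus the proportion with x < y), the Python code below computes Cliff’s delta and a percentile bootstrap interval of the kind presumably summarized in the Pctl column. The function names, the example data, the 2,000 bootstrap samples, and the seed are hypothetical illustration choices, not the settings used in this study.

```python
import random


def cliffs_delta(x, y):
    """Sample Cliff's delta: P(X > Y) - P(X < Y), estimated over all n1 * n2 pairs."""
    greater = sum(1 for xi in x for yj in y if xi > yj)
    less = sum(1 for xi in x for yj in y if xi < yj)
    return (greater - less) / (len(x) * len(y))


def percentile_ci(x, y, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a two-sample statistic (illustrative sketch).

    Each group is resampled with replacement separately, so the group sizes
    n1 and n2 stay fixed across bootstrap samples.
    """
    rng = random.Random(seed)
    boots = sorted(
        stat(rng.choices(x, k=len(x)), rng.choices(y, k=len(y)))
        for _ in range(n_boot)
    )
    k = round(alpha / 2 * n_boot)  # number of estimates trimmed from each tail
    return boots[k], boots[n_boot - 1 - k]


# Hypothetical scores for two groups, for illustration only.
treatment = [2.1, 3.4, 1.9, 4.2, 3.3, 2.8, 3.9, 4.4]
control = [1.2, 2.0, 1.5, 2.9, 1.1, 2.4, 1.8, 2.2]
estimate = cliffs_delta(treatment, control)
lower, upper = percentile_ci(treatment, control, cliffs_delta)
print(f"Cliff's delta = {estimate:.3f}, 95% percentile CI = ({lower:.3f}, {upper:.3f})")
```

Because the statistic is bounded by -1 and 1, the percentile endpoints are bounded as well, which is one reason an ordinal effect size can behave differently from Cohen’s d under the same resampling scheme.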

Table 2 Estimated confidence band coverage by degree of heterogeneity, across conditions

Variance Ratio | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
1:1 | 0.945 0.917 0.931 0.937 0.945 0.933 0.947 | 0.952 0.933 0.945 0.953 0.906 0.969
1:2 | 0.928 0.912 0.925 0.933 0.939 0.926 0.930 | 0.953 0.933 0.944 0.950 0.907 0.969
1:4 | 0.899 0.902 0.914 0.923 0.930 0.914 0.901 | 0.951 0.931 0.940 0.946 0.903 0.966
1:8 | 0.874 0.892 0.903 0.914 0.920 0.904 0.876 | 0.953 0.930 0.938 0.944 0.901 0.965

Table 3 Estimated confidence band coverage by distribution shape, across conditions

Skewness, Kurtosis | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
0.0, 0.0 | 0.927 0.917 0.926 0.935 0.950 0.937 0.929 | 0.952 0.936 0.944 0.953 0.910 0.971
2.0, 6.0 | 0.896 0.895 0.910 0.919 0.918 0.901 0.898 | 0.953 0.928 0.939 0.944 0.898 0.964


Table 4 Estimated confidence band coverage by sample size, across conditions

n1 n2 | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 | 0.966 0.897 0.917 0.927 0.936 0.919 0.968 | 0.962 0.927 0.949 0.963 0.878 0.993
10 10 | 0.921 0.886 0.916 0.938 0.966 0.932 0.926 | 0.961 0.926 0.949 0.966 0.865 0.992
15 5 | 0.840 0.817 0.851 0.876 0.942 0.894 0.848 | 0.941 0.868 0.893 0.893 0.788 0.955
25 75 | 0.966 0.926 0.927 0.925 0.911 0.908 0.967 | 0.952 0.946 0.949 0.953 0.935 0.962
50 50 | 0.931 0.937 0.943 0.948 0.943 0.937 0.931 | 0.952 0.945 0.948 0.953 0.933 0.961
75 25 | 0.849 0.911 0.922 0.936 0.936 0.924 0.850 | 0.949 0.939 0.944 0.955 0.921 0.974
50 150 | 0.955 0.914 0.913 0.909 0.892 0.892 0.955 | 0.951 0.947 0.949 0.950 0.941 0.955
100 100 | 0.927 0.940 0.942 0.944 0.939 0.936 0.927 | 0.950 0.945 0.946 0.948 0.940 0.954
150 50 | 0.848 0.923 0.931 0.939 0.939 0.931 0.849 | 0.951 0.944 0.948 0.952 0.935 0.962


Table 5 Estimated confidence band coverage with normal, homogeneous populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 0.945 0.893 0.915 0.925 0.971 0.945 0.948 | 0.947 0.916 0.933 0.948 0.863 0.998
5 15 0.2 | 0.947 0.887 0.912 0.926 0.971 0.945 0.949 | 0.949 0.912 0.935 0.954 0.849 0.999
5 15 0.5 | 0.945 0.887 0.913 0.926 0.970 0.944 0.950 | 0.949 0.909 0.938 0.961 0.845 0.995
5 15 0.8 | 0.944 0.881 0.909 0.920 0.965 0.940 0.948 | 0.951 0.896 0.935 0.954 0.824 0.979
10 10 0.0 | 0.944 0.920 0.944 0.961 0.979 0.958 0.950 | 0.964 0.940 0.954 0.973 0.896 0.999
10 10 0.2 | 0.953 0.924 0.950 0.965 0.985 0.962 0.957 | 0.967 0.947 0.959 0.973 0.901 0.999
10 10 0.5 | 0.950 0.920 0.946 0.961 0.981 0.960 0.953 | 0.967 0.938 0.959 0.973 0.887 0.998
10 10 0.8 | 0.948 0.905 0.936 0.953 0.979 0.955 0.951 | 0.963 0.932 0.958 0.973 0.860 0.993
15 5 0.0 | 0.951 0.893 0.908 0.921 0.969 0.939 0.951 | 0.957 0.911 0.931 0.949 0.863 1.000
15 5 0.2 | 0.951 0.900 0.921 0.937 0.979 0.949 0.956 | 0.948 0.919 0.943 0.957 0.858 0.998
15 5 0.5 | 0.947 0.880 0.904 0.924 0.975 0.949 0.952 | 0.944 0.895 0.933 0.952 0.836 0.996
15 5 0.8 | 0.936 0.862 0.893 0.904 0.962 0.930 0.938 | 0.936 0.880 0.917 0.941 0.807 0.986
50 150 0.0 | 0.960 0.950 0.957 0.959 0.958 0.953 0.960 | 0.961 0.954 0.958 0.960 0.953 0.967
50 150 0.2 | 0.942 0.929 0.931 0.932 0.938 0.936 0.942 | 0.938 0.935 0.936 0.939 0.928 0.947
50 150 0.5 | 0.956 0.945 0.946 0.952 0.956 0.951 0.957 | 0.951 0.947 0.949 0.950 0.940 0.960
50 150 0.8 | 0.938 0.936 0.938 0.942 0.943 0.942 0.939 | 0.946 0.944 0.947 0.946 0.929 0.955
100 100 0.0 | 0.940 0.936 0.936 0.937 0.945 0.944 0.940 | 0.941 0.938 0.937 0.938 0.929 0.941
100 100 0.2 | 0.952 0.950 0.955 0.957 0.960 0.958 0.952 | 0.964 0.955 0.953 0.955 0.944 0.960
100 100 0.5 | 0.959 0.950 0.953 0.952 0.960 0.959 0.959 | 0.958 0.955 0.957 0.958 0.952 0.962
100 100 0.8 | 0.954 0.951 0.954 0.952 0.954 0.957 0.953 | 0.957 0.954 0.956 0.957 0.952 0.964
150 50 0.0 | 0.963 0.959 0.959 0.959 0.962 0.960 0.963 | 0.957 0.959 0.962 0.964 0.955 0.966
150 50 0.2 | 0.949 0.941 0.942 0.944 0.949 0.945 0.949 | 0.958 0.956 0.956 0.958 0.945 0.957
150 50 0.5 | 0.949 0.949 0.950 0.952 0.950 0.949 0.950 | 0.952 0.952 0.952 0.953 0.945 0.956
150 50 0.8 | 0.946 0.927 0.934 0.939 0.945 0.940 0.945 | 0.943 0.936 0.941 0.945 0.926 0.953
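
In a Monte Carlo study of this kind, each coverage entry is the proportion of simulated samples whose interval contains the population parameter, and each width entry is the average distance between the interval endpoints. The Python sketch below illustrates that bookkeeping for the simplest condition (normal populations with equal variances, as in Table 5), using a normal-theory interval for Cohen’s d built from the large-sample variance of d given by Hedges and Olkin (1985). Treating this interval as equivalent to the Z column, and the replication count and seed, are assumptions made for illustration; the function names are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(x), len(y)
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    return (statistics.mean(x) - statistics.mean(y)) / pooled_var ** 0.5


def z_interval(x, y, alpha=0.05):
    """Normal-theory interval d +/- z * SE, using the large-sample variance of d."""
    n1, n2 = len(x), len(y)
    d = cohens_d(x, y)
    se = ((n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return d - z * se, d + z * se


def estimate_coverage_and_width(delta, n1, n2, reps=1000, seed=2004):
    """Simulate two unit-variance normal populations whose means differ by delta.

    Returns (proportion of intervals containing delta, mean interval width).
    """
    rng = random.Random(seed)
    hits = 0
    widths = []
    for _ in range(reps):
        x = [rng.gauss(delta, 1.0) for _ in range(n1)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        lower, upper = z_interval(x, y)
        if lower <= delta <= upper:
            hits += 1
        widths.append(upper - lower)
    return hits / reps, statistics.mean(widths)


coverage, width = estimate_coverage_and_width(delta=0.5, n1=100, n2=100)
print(f"coverage = {coverage:.3f}, mean width = {width:.3f}")
```

Swapping `z_interval` for another interval function leaves the bookkeeping unchanged. The heterogeneous and non-normal conditions would additionally require a choice of which standard deviation defines the population effect size, which this sketch does not address.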


Table 6 Estimated confidence band coverage with normal, heterogeneous (1:8) populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 0.995 0.907 0.929 0.949 0.978 0.973 0.996 | 0.970 0.933 0.949 0.967 0.898 1.000
5 15 0.2 | 0.991 0.909 0.926 0.946 0.975 0.963 0.991 | 0.974 0.939 0.952 0.972 0.901 0.999
5 15 0.5 | 0.989 0.914 0.928 0.939 0.961 0.952 0.989 | 0.973 0.941 0.957 0.970 0.903 0.997
5 15 0.8 | 0.991 0.903 0.915 0.912 0.923 0.910 0.991 | 0.973 0.934 0.959 0.965 0.884 0.989
10 10 0.0 | 0.935 0.905 0.928 0.962 0.983 0.958 0.939 | 0.967 0.930 0.944 0.975 0.887 0.999
10 10 0.2 | 0.941 0.916 0.934 0.962 0.984 0.963 0.946 | 0.966 0.937 0.953 0.979 0.881 0.999
10 10 0.5 | 0.952 0.931 0.956 0.971 0.988 0.972 0.954 | 0.975 0.951 0.969 0.983 0.902 0.999
10 10 0.8 | 0.925 0.889 0.917 0.945 0.979 0.956 0.930 | 0.954 0.912 0.943 0.964 0.839 0.989
15 5 0.0 | 0.771 0.841 0.874 0.911 0.963 0.908 0.777 | 0.935 0.901 0.904 0.913 0.772 0.981
15 5 0.2 | 0.765 0.852 0.891 0.921 0.965 0.912 0.772 | 0.941 0.901 0.913 0.924 0.774 0.984
15 5 0.5 | 0.763 0.818 0.850 0.890 0.956 0.897 0.768 | 0.911 0.852 0.867 0.878 0.762 0.959
15 5 0.8 | 0.782 0.820 0.857 0.888 0.957 0.901 0.794 | 0.915 0.844 0.863 0.879 0.777 0.934
50 150 0.0 | 0.999 0.934 0.941 0.944 0.942 0.942 0.999 | 0.941 0.939 0.940 0.944 0.935 0.946
50 150 0.2 | 0.995 0.931 0.924 0.925 0.936 0.935 0.995 | 0.959 0.955 0.955 0.958 0.952 0.959
50 150 0.5 | 0.962 0.830 0.825 0.819 0.825 0.827 0.962 | 0.940 0.934 0.940 0.941 0.935 0.945
50 150 0.8 | 0.876 0.661 0.641 0.626 0.592 0.601 0.875 | 0.954 0.947 0.949 0.951 0.942 0.957
100 100 0.0 | 0.940 0.943 0.947 0.952 0.951 0.948 0.940 | 0.944 0.942 0.945 0.945 0.933 0.949
100 100 0.2 | 0.950 0.946 0.951 0.953 0.959 0.956 0.950 | 0.959 0.954 0.953 0.955 0.951 0.964
100 100 0.5 | 0.931 0.932 0.929 0.933 0.930 0.929 0.931 | 0.943 0.938 0.941 0.943 0.932 0.949
100 100 0.8 | 0.901 0.916 0.903 0.903 0.907 0.906 0.902 | 0.947 0.939 0.946 0.954 0.939 0.953
150 50 0.0 | 0.814 0.942 0.943 0.949 0.952 0.949 0.814 | 0.948 0.943 0.942 0.945 0.928 0.955
150 50 0.2 | 0.796 0.941 0.945 0.953 0.954 0.946 0.796 | 0.953 0.944 0.946 0.958 0.935 0.968
150 50 0.5 | 0.777 0.914 0.918 0.932 0.942 0.929 0.779 | 0.942 0.923 0.933 0.944 0.916 0.962
150 50 0.8 | 0.747 0.900 0.914 0.929 0.939 0.917 0.752 | 0.959 0.949 0.953 0.956 0.936 0.974


Table 7 Estimated confidence band coverage with non-normal, homogeneous populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 0.949 0.853 0.892 0.895 0.899 0.879 0.951 | 0.942 0.902 0.927 0.943 0.848 0.997
5 15 0.2 | 0.945 0.878 0.905 0.914 0.886 0.874 0.951 | 0.956 0.923 0.938 0.955 0.865 0.996
5 15 0.5 | 0.943 0.881 0.901 0.910 0.897 0.876 0.948 | 0.954 0.926 0.941 0.962 0.867 0.991
5 15 0.8 | 0.924 0.875 0.902 0.915 0.915 0.892 0.931 | 0.970 0.914 0.959 0.968 0.849 0.977
10 10 0.0 | 0.946 0.904 0.934 0.946 0.959 0.917 0.949 | 0.962 0.933 0.954 0.971 0.891 1.000
10 10 0.2 | 0.960 0.909 0.946 0.957 0.968 0.923 0.963 | 0.972 0.940 0.970 0.987 0.895 1.000
10 10 0.5 | 0.937 0.865 0.916 0.929 0.961 0.921 0.941 | 0.958 0.923 0.951 0.969 0.859 0.993
10 10 0.8 | 0.913 0.849 0.889 0.908 0.958 0.912 0.919 | 0.958 0.908 0.947 0.960 0.831 0.976
15 5 0.0 | 0.953 0.867 0.905 0.916 0.908 0.888 0.955 | 0.955 0.912 0.933 0.946 0.860 0.999
15 5 0.2 | 0.957 0.844 0.888 0.903 0.920 0.895 0.963 | 0.946 0.908 0.936 0.950 0.822 0.993
15 5 0.5 | 0.947 0.803 0.857 0.872 0.944 0.911 0.953 | 0.955 0.875 0.924 0.926 0.790 0.967
15 5 0.8 | 0.935 0.776 0.823 0.847 0.957 0.913 0.945 | 0.951 0.827 0.865 0.847 0.721 0.886
50 150 0.0 | 0.946 0.935 0.941 0.945 0.922 0.918 0.946 | 0.953 0.952 0.954 0.957 0.943 0.958
50 150 0.2 | 0.961 0.960 0.962 0.960 0.948 0.944 0.961 | 0.964 0.959 0.959 0.963 0.946 0.966
50 150 0.5 | 0.945 0.939 0.943 0.946 0.933 0.929 0.945 | 0.955 0.948 0.949 0.952 0.941 0.957
50 150 0.8 | 0.931 0.948 0.951 0.948 0.933 0.930 0.932 | 0.953 0.949 0.954 0.955 0.939 0.955
100 100 0.0 | 0.948 0.943 0.946 0.949 0.937 0.933 0.948 | 0.942 0.942 0.942 0.945 0.934 0.946
100 100 0.2 | 0.953 0.947 0.951 0.953 0.941 0.939 0.953 | 0.938 0.932 0.933 0.934 0.929 0.945
100 100 0.5 | 0.937 0.945 0.952 0.953 0.940 0.939 0.937 | 0.940 0.934 0.937 0.941 0.937 0.949
100 100 0.8 | 0.920 0.926 0.929 0.936 0.934 0.927 0.919 | 0.944 0.938 0.941 0.944 0.930 0.950
150 50 0.0 | 0.946 0.940 0.943 0.945 0.933 0.930 0.947 | 0.937 0.929 0.929 0.933 0.923 0.943
150 50 0.2 | 0.963 0.946 0.954 0.953 0.947 0.945 0.964 | 0.963 0.955 0.959 0.964 0.945 0.969
150 50 0.5 | 0.948 0.928 0.937 0.940 0.942 0.936 0.948 | 0.955 0.953 0.953 0.958 0.942 0.966
150 50 0.8 | 0.927 0.914 0.920 0.913 0.925 0.921 0.927 | 0.952 0.945 0.956 0.957 0.929 0.966


Table 8 Estimated confidence band coverage with non-normal, heterogeneous (1:8) populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 0.970 0.906 0.929 0.944 0.980 0.964 0.971 | 0.968 0.932 0.950 0.961 0.900 1.000
5 15 0.2 | 0.979 0.913 0.933 0.950 0.941 0.928 0.981 | 0.962 0.941 0.953 0.969 0.908 0.994
5 15 0.5 | 0.971 0.898 0.912 0.918 0.891 0.874 0.973 | 0.965 0.938 0.955 0.962 0.890 0.990
5 15 0.8 | 0.958 0.892 0.902 0.903 0.836 0.826 0.961 | 0.971 0.911 0.959 0.966 0.853 0.987
10 10 0.0 | 0.904 0.879 0.908 0.932 0.971 0.940 0.908 | 0.962 0.926 0.952 0.967 0.878 0.995
10 10 0.2 | 0.867 0.840 0.867 0.905 0.946 0.896 0.874 | 0.961 0.912 0.944 0.960 0.865 0.983
10 10 0.5 | 0.842 0.835 0.861 0.895 0.929 0.883 0.850 | 0.951 0.910 0.923 0.941 0.827 0.972
10 10 0.8 | 0.814 0.799 0.833 0.870 0.920 0.856 0.830 | 0.969 0.909 0.927 0.927 0.777 0.955
15 5 0.0 | 0.738 0.784 0.818 0.855 0.907 0.856 0.742 | 0.958 0.877 0.881 0.865 0.770 0.953
15 5 0.2 | 0.725 0.780 0.816 0.853 0.924 0.863 0.733 | 0.951 0.859 0.871 0.870 0.774 0.938
15 5 0.5 | 0.660 0.729 0.766 0.806 0.918 0.841 0.672 | 0.945 0.817 0.833 0.784 0.750 0.883
15 5 0.8 | 0.652 0.700 0.726 0.762 0.931 0.837 0.668 | 0.967 0.766 0.788 0.720 0.724 0.811
50 150 0.0 | 0.996 0.952 0.957 0.962 0.948 0.945 0.996 | 0.955 0.952 0.948 0.951 0.941 0.950
50 150 0.2 | 0.992 0.935 0.936 0.930 0.897 0.896 0.992 | 0.950 0.946 0.946 0.947 0.943 0.955
50 150 0.5 | 0.940 0.894 0.885 0.862 0.800 0.799 0.940 | 0.955 0.952 0.954 0.959 0.948 0.957
50 150 0.8 | 0.821 0.813 0.797 0.756 0.671 0.675 0.820 | 0.944 0.940 0.940 0.944 0.932 0.953
100 100 0.0 | 0.942 0.941 0.946 0.953 0.955 0.952 0.943 | 0.949 0.945 0.948 0.952 0.941 0.953
100 100 0.2 | 0.910 0.931 0.937 0.934 0.919 0.914 0.911 | 0.939 0.933 0.934 0.938 0.929 0.942
100 100 0.5 | 0.891 0.951 0.947 0.942 0.923 0.920 0.890 | 0.966 0.960 0.958 0.959 0.946 0.971
100 100 0.8 | 0.811 0.935 0.933 0.925 0.896 0.894 0.812 | 0.944 0.946 0.947 0.950 0.940 0.951
150 50 0.0 | 0.806 0.936 0.943 0.950 0.945 0.939 0.806 | 0.946 0.941 0.948 0.955 0.931 0.960
150 50 0.2 | 0.756 0.912 0.921 0.938 0.933 0.924 0.756 | 0.949 0.939 0.940 0.945 0.929 0.965
150 50 0.5 | 0.682 0.894 0.912 0.928 0.931 0.919 0.683 | 0.960 0.951 0.955 0.960 0.939 0.972
150 50 0.8 | 0.643 0.874 0.892 0.923 0.924 0.902 0.642 | 0.941 0.935 0.938 0.943 0.922 0.966


Table 9 Estimated bandwidth by effect size, across conditions

delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
0.0 | 1.149 1.257 1.247 1.260 1.257 1.187 1.150 | 0.628 0.652 0.658 0.664 0.652 1.542
0.2 | 1.153 1.289 1.273 1.286 1.289 1.210 1.154 | 0.620 0.640 0.649 0.656 0.640 1.652
0.5 | 1.168 1.354 1.325 1.336 1.354 1.254 1.170 | 0.598 0.604 0.619 0.629 0.604 1.818
0.8 | 1.194 1.437 1.389 1.398 1.437 1.311 1.198 | 0.569 0.555 0.575 0.590 0.555 1.936

Table 10 Estimated bandwidth by degree of heterogeneity, across conditions

Variance Ratio | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
1:1 | 1.162 1.212 1.197 1.206 1.212 1.154 1.164 | 0.592 0.604 0.614 0.623 0.604 1.283
1:2 | 1.164 1.269 1.249 1.260 1.269 1.198 1.166 | 0.597 0.606 0.619 0.628 0.606 1.496
1:4 | 1.167 1.369 1.340 1.352 1.369 1.268 1.169 | 0.607 0.615 0.628 0.638 0.615 1.896
1:8 | 1.171 1.489 1.448 1.462 1.489 1.342 1.174 | 0.621 0.626 0.640 0.650 0.626 2.273

Table 11 Estimated bandwidth by distribution shape, across conditions

Skewness, Kurtosis | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
0.0, 0.0 | 1.162 1.271 1.257 1.265 1.271 1.198 1.164 | 0.608 0.627 0.636 0.645 0.627 1.493
2.0, 6.0 | 1.170 1.398 1.360 1.375 1.398 1.283 1.172 | 0.600 0.599 0.615 0.625 0.599 1.981


Table 12 Estimated bandwidth by sample size, across conditions

n1 n2 | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 | 2.068 1.952 1.891 1.867 1.952 1.805 2.072 | 0.920 0.943 0.967 0.988 0.943 1.992
10 10 | 1.814 2.173 2.102 2.085 2.173 1.954 1.820 | 0.933 0.970 0.995 1.023 0.970 2.393
15 5 | 2.118 2.997 2.915 3.052 2.997 2.585 2.126 | 1.142 1.136 1.192 1.217 1.136 8.656
25 75 | 0.915 0.791 0.787 0.785 0.791 0.782 0.915 | 0.424 0.427 0.428 0.430 0.427 0.444
50 50 | 0.797 0.860 0.854 0.853 0.860 0.846 0.798 | 0.438 0.443 0.444 0.446 0.443 0.461
75 25 | 0.922 1.220 1.214 1.223 1.220 1.192 0.922 | 0.562 0.573 0.575 0.580 0.573 0.636
50 150 | 0.646 0.555 0.553 0.553 0.555 0.552 0.646 | 0.301 0.302 0.303 0.303 0.302 0.307
100 100 | 0.563 0.603 0.601 0.600 0.603 0.598 0.563 | 0.312 0.314 0.314 0.314 0.314 0.319
150 50 | 0.650 0.860 0.858 0.862 0.860 0.850 0.650 | 0.405 0.408 0.409 0.411 0.408 0.426


Table 13 Estimated confidence band width with normal, homogeneous populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 2.052 2.148 2.131 2.150 2.148 2.014 2.054 | 1.057 1.122 1.132 1.149 1.122 2.790
5 15 0.2 | 2.055 2.150 2.129 2.149 2.150 2.014 2.058 | 1.047 1.109 1.126 1.147 1.109 2.850
5 15 0.5 | 2.077 2.181 2.147 2.164 2.181 2.032 2.082 | 1.015 1.058 1.088 1.120 1.058 3.388
5 15 0.8 | 2.118 2.232 2.172 2.185 2.232 2.062 2.127 | 0.958 0.962 1.013 1.061 0.962 3.819
10 10 0.0 | 1.778 1.953 1.934 1.929 1.953 1.833 1.780 | 0.957 1.016 1.022 1.031 1.016 1.359
10 10 0.2 | 1.781 1.957 1.936 1.933 1.957 1.835 1.783 | 0.952 1.010 1.019 1.029 1.010 1.361
10 10 0.5 | 1.805 1.991 1.957 1.955 1.991 1.853 1.810 | 0.922 0.967 0.986 1.004 0.967 1.589
10 10 0.8 | 1.852 2.046 1.985 1.985 2.046 1.883 1.861 | 0.867 0.887 0.922 0.956 0.887 1.933
15 5 0.0 | 2.050 2.129 2.114 2.130 2.129 2.000 2.052 | 1.054 1.118 1.129 1.146 1.118 2.466
15 5 0.2 | 2.056 2.136 2.114 2.134 2.136 2.002 2.059 | 1.041 1.101 1.119 1.138 1.101 2.772
15 5 0.5 | 2.078 2.180 2.142 2.162 2.180 2.029 2.083 | 1.012 1.053 1.084 1.114 1.053 3.380
15 5 0.8 | 2.117 2.224 2.166 2.181 2.224 2.057 2.125 | 0.960 0.969 1.018 1.053 0.969 3.683
50 150 0.0 | 0.641 0.642 0.642 0.642 0.642 0.639 0.641 | 0.366 0.369 0.369 0.369 0.369 0.378
50 150 0.2 | 0.642 0.644 0.644 0.644 0.644 0.640 0.642 | 0.363 0.365 0.365 0.365 0.365 0.374
50 150 0.5 | 0.649 0.648 0.648 0.648 0.648 0.645 0.649 | 0.348 0.349 0.349 0.350 0.349 0.359
50 150 0.8 | 0.660 0.660 0.660 0.660 0.660 0.656 0.660 | 0.322 0.322 0.323 0.324 0.322 0.334
100 100 0.0 | 0.555 0.559 0.559 0.558 0.559 0.556 0.555 | 0.318 0.321 0.321 0.321 0.321 0.325
100 100 0.2 | 0.556 0.560 0.560 0.560 0.560 0.557 0.556 | 0.316 0.318 0.318 0.318 0.318 0.322
100 100 0.5 | 0.564 0.567 0.567 0.568 0.567 0.564 0.564 | 0.303 0.305 0.305 0.306 0.305 0.309
100 100 0.8 | 0.577 0.580 0.579 0.580 0.580 0.577 0.577 | 0.281 0.282 0.282 0.283 0.282 0.287
150 50 0.0 | 0.641 0.643 0.643 0.643 0.643 0.640 0.641 | 0.366 0.368 0.369 0.369 0.368 0.378
150 50 0.2 | 0.642 0.644 0.645 0.645 0.644 0.641 0.642 | 0.364 0.367 0.367 0.368 0.367 0.377
150 50 0.5 | 0.648 0.651 0.651 0.651 0.651 0.647 0.648 | 0.349 0.351 0.352 0.352 0.351 0.361
150 50 0.8 | 0.660 0.661 0.660 0.660 0.661 0.657 0.660 | 0.323 0.323 0.324 0.325 0.323 0.335


Table 14 Estimated confidence band width with normal, heterogeneous (1:8) populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 2.037 1.480 1.465 1.458 1.480 1.433 2.038 | 0.928 0.966 0.971 0.977 0.966 1.248
5 15 0.2 | 2.040 1.478 1.463 1.456 1.478 1.431 2.041 | 0.927 0.963 0.971 0.978 0.963 1.236
5 15 0.5 | 2.050 1.503 1.479 1.472 1.503 1.447 2.053 | 0.906 0.935 0.949 0.963 0.935 1.377
5 15 0.8 | 2.074 1.564 1.522 1.510 1.564 1.491 2.079 | 0.868 0.880 0.904 0.932 0.880 1.819
10 10 0.0 | 1.778 2.073 2.041 2.030 2.073 1.916 1.780 | 0.991 1.066 1.070 1.084 1.066 1.754
10 10 0.2 | 1.783 2.093 2.057 2.049 2.093 1.928 1.785 | 0.984 1.055 1.064 1.083 1.055 1.801
10 10 0.5 | 1.800 2.095 2.052 2.045 2.095 1.924 1.804 | 0.967 1.030 1.047 1.074 1.030 2.237
10 10 0.8 | 1.847 2.186 2.107 2.103 2.186 1.966 1.856 | 0.915 0.945 0.978 1.025 0.945 2.941
15 5 0.0 | 2.104 3.761 3.694 3.855 3.761 3.141 2.111 | 1.202 1.295 1.327 1.364 1.295 11.427
15 5 0.2 | 2.103 3.806 3.738 3.862 3.806 3.167 2.110 | 1.201 1.304 1.340 1.386 1.304 11.178
15 5 0.5 | 2.151 3.844 3.737 3.889 3.844 3.167 2.162 | 1.168 1.215 1.281 1.321 1.215 11.672
15 5 0.8 | 2.197 3.921 3.775 3.936 3.921 3.198 2.212 | 1.160 1.150 1.237 1.274 1.150 11.897
50 150 0.0 | 0.640 0.427 0.427 0.427 0.427 0.426 0.640 | 0.304 0.305 0.306 0.306 0.305 0.309
50 150 0.2 | 0.641 0.429 0.429 0.430 0.429 0.428 0.641 | 0.302 0.304 0.304 0.304 0.304 0.307
50 150 0.5 | 0.645 0.436 0.436 0.436 0.436 0.435 0.645 | 0.293 0.294 0.294 0.295 0.294 0.297
50 150 0.8 | 0.652 0.449 0.448 0.448 0.449 0.447 0.652 | 0.277 0.278 0.278 0.279 0.278 0.282
100 100 0.0 | 0.555 0.561 0.562 0.561 0.561 0.558 0.555 | 0.340 0.343 0.343 0.343 0.343 0.348
100 100 0.2 | 0.556 0.562 0.562 0.563 0.562 0.559 0.556 | 0.338 0.341 0.341 0.341 0.341 0.347
100 100 0.5 | 0.562 0.572 0.571 0.571 0.572 0.568 0.562 | 0.328 0.330 0.330 0.330 0.330 0.336
100 100 0.8 | 0.573 0.589 0.588 0.588 0.589 0.585 0.573 | 0.311 0.312 0.313 0.314 0.312 0.319
150 50 0.0 | 0.642 0.979 0.979 0.979 0.979 0.968 0.642 | 0.459 0.465 0.466 0.466 0.465 0.483
150 50 0.2 | 0.644 0.980 0.979 0.980 0.980 0.969 0.644 | 0.458 0.464 0.465 0.465 0.464 0.482
150 50 0.5 | 0.652 0.992 0.991 0.992 0.992 0.980 0.652 | 0.444 0.449 0.450 0.451 0.449 0.469
150 50 0.8 | 0.668 1.015 1.013 1.015 1.015 1.001 0.668 | 0.421 0.426 0.427 0.429 0.426 0.449


Table 15 Estimated confidence band width with non-normal, homogeneous populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 2.053 2.088 2.082 2.137 2.088 1.948 2.055 | 1.048 1.112 1.121 1.140 1.112 2.691
5 15 0.2 | 2.061 2.184 2.141 2.147 2.184 2.013 2.064 | 0.984 1.022 1.041 1.055 1.022 2.094
5 15 0.5 | 2.086 2.322 2.225 2.201 2.322 2.091 2.091 | 0.895 0.906 0.939 0.959 0.906 2.032
5 15 0.8 | 2.140 2.560 2.379 2.316 2.560 2.232 2.150 | 0.812 0.780 0.831 0.866 0.780 2.292
10 10 0.0 | 1.777 1.890 1.876 1.873 1.890 1.784 1.780 | 0.957 1.018 1.025 1.033 1.018 1.348
10 10 0.2 | 1.781 1.905 1.888 1.885 1.905 1.791 1.784 | 0.943 0.997 1.014 1.023 0.997 1.461
10 10 0.5 | 1.813 2.007 1.951 1.956 2.007 1.845 1.819 | 0.892 0.918 0.951 0.982 0.918 2.022
10 10 0.8 | 1.873 2.156 2.038 2.047 2.156 1.924 1.883 | 0.838 0.815 0.868 0.922 0.815 3.017
15 5 0.0 | 2.052 2.139 2.126 2.180 2.139 1.983 2.055 | 1.057 1.123 1.134 1.150 1.123 2.932
15 5 0.2 | 2.056 2.080 2.078 2.164 2.080 1.941 2.059 | 1.091 1.155 1.190 1.213 1.155 5.011
15 5 0.5 | 2.085 2.095 2.071 2.179 2.095 1.945 2.090 | 1.086 1.075 1.141 1.182 1.075 7.705
15 5 0.8 | 2.132 2.246 2.164 2.291 2.246 2.044 2.141 | 1.119 0.947 1.038 1.075 0.947 8.340
50 150 0.0 | 0.641 0.630 0.631 0.635 0.630 0.627 0.641 | 0.365 0.367 0.367 0.367 0.367 0.376
50 150 0.2 | 0.642 0.640 0.640 0.642 0.640 0.636 0.642 | 0.328 0.331 0.331 0.331 0.331 0.337
50 150 0.5 | 0.649 0.664 0.662 0.662 0.664 0.659 0.649 | 0.287 0.287 0.288 0.288 0.287 0.292
50 150 0.8 | 0.660 0.702 0.698 0.697 0.702 0.695 0.660 | 0.249 0.249 0.250 0.250 0.249 0.254
100 100 0.0 | 0.555 0.553 0.553 0.553 0.553 0.550 0.555 | 0.318 0.320 0.320 0.320 0.320 0.324
100 100 0.2 | 0.556 0.558 0.558 0.559 0.558 0.555 0.556 | 0.313 0.315 0.316 0.316 0.315 0.320
100 100 0.5 | 0.564 0.581 0.580 0.584 0.581 0.578 0.564 | 0.294 0.295 0.296 0.296 0.295 0.300
100 100 0.8 | 0.578 0.623 0.621 0.625 0.623 0.618 0.578 | 0.267 0.267 0.268 0.270 0.267 0.275
150 50 0.0 | 0.641 0.637 0.638 0.644 0.637 0.634 0.641 | 0.366 0.368 0.368 0.368 0.368 0.377
150 50 0.2 | 0.642 0.634 0.637 0.644 0.634 0.631 0.642 | 0.389 0.393 0.394 0.393 0.393 0.405
150 50 0.5 | 0.649 0.651 0.653 0.665 0.651 0.648 0.649 | 0.382 0.384 0.385 0.387 0.384 0.400
150 50 0.8 | 0.661 0.685 0.685 0.699 0.685 0.680 0.661 | 0.356 0.355 0.357 0.361 0.355 0.378


Table 16 Estimated confidence band width with non-normal, heterogeneous (1:8) populations.

n1 n2 delta | Cohen’s d: Z, Pctl, BC, BCA, Pivotal, Std Pivotal, S & F | Cliff’s Delta: Z, Pctl, BC, BCA, Pivotal, Std Pivotal
5 15 0.0 | 2.041 1.706 1.656 1.600 1.706 1.602 2.042 | 0.901 0.933 0.946 0.958 0.933 1.340
5 15 0.2 | 2.047 1.850 1.764 1.684 1.850 1.703 2.049 | 0.877 0.898 0.918 0.938 0.898 1.453
5 15 0.5 | 2.070 2.079 1.950 1.840 2.079 1.850 2.074 | 0.844 0.846 0.876 0.906 0.846 2.060
5 15 0.8 | 2.104 2.333 2.152 2.021 2.333 2.005 2.111 | 0.797 0.779 0.826 0.869 0.779 2.464
10 10 0.0 | 1.788 2.302 2.226 2.182 2.302 2.047 1.791 | 0.982 1.047 1.066 1.088 1.047 2.726
10 10 0.2 | 1.814 2.487 2.365 2.306 2.487 2.140 1.819 | 0.962 0.999 1.027 1.061 0.999 3.649
10 10 0.5 | 1.851 2.809 2.621 2.544 2.809 2.291 1.859 | 0.948 0.941 0.984 1.034 0.941 4.754
10 10 0.8 | 1.916 3.086 2.828 2.706 3.086 2.413 1.930 | 0.922 0.862 0.918 0.975 0.862 5.436
15 5 0.0 | 2.114 3.710 3.606 3.948 3.710 3.072 2.122 | 1.246 1.262 1.319 1.334 1.262 14.675
15 5 0.2 | 2.134 3.911 3.769 4.095 3.911 3.171 2.143 | 1.252 1.226 1.303 1.325 1.226 15.054
15 5 0.5 | 2.219 4.228 4.013 4.292 4.228 3.280 2.235 | 1.266 1.097 1.203 1.185 1.097 14.270
15 5 0.8 | 2.298 4.570 4.241 4.506 4.570 3.384 2.321 | 1.311 0.989 1.109 1.071 0.989 13.606
50 150 0.0 | 0.641 0.431 0.430 0.429 0.431 0.430 0.640 | 0.294 0.295 0.296 0.296 0.295 0.299
50 150 0.2 | 0.641 0.478 0.475 0.472 0.478 0.476 0.641 | 0.284 0.286 0.286 0.286 0.286 0.289
50 150 0.5 | 0.645 0.543 0.537 0.534 0.543 0.539 0.645 | 0.268 0.268 0.269 0.269 0.268 0.273
50 150 0.8 | 0.652 0.614 0.606 0.602 0.614 0.608 0.653 | 0.250 0.250 0.251 0.252 0.250 0.255
100 100 0.0 | 0.555 0.573 0.570 0.567 0.573 0.569 0.555 | 0.339 0.341 0.342 0.342 0.341 0.348
100 100 0.2 | 0.557 0.627 0.623 0.618 0.627 0.621 0.557 | 0.331 0.333 0.334 0.334 0.333 0.340
100 100 0.5 | 0.563 0.722 0.712 0.708 0.722 0.712 0.563 | 0.315 0.317 0.318 0.319 0.317 0.325
100 100 0.8 | 0.575 0.823 0.809 0.804 0.823 0.806 0.575 | 0.297 0.298 0.298 0.301 0.298 0.308
150 50 0.0 | 0.642 0.989 0.983 0.983 0.989 0.977 0.642 | 0.463 0.469 0.470 0.471 0.469 0.489
150 50 0.2 | 0.644 1.061 1.052 1.053 1.061 1.043 0.644 | 0.455 0.460 0.461 0.463 0.460 0.482
150 50 0.5 | 0.655 1.189 1.173 1.176 1.189 1.160 0.655 | 0.434 0.438 0.439 0.443 0.438 0.464
150 50 0.8 | 0.671 1.329 1.307 1.309 1.329 1.285 0.671 | 0.411 0.413 0.415 0.421 0.413 0.445


[Figure 1 image not reproduced. Vertical axis: Estimated Band Coverage (0.0 to 1.0). Horizontal axis: Method, grouped under Cohen d (Z, Pctl, BC, BCA, PV, SPV, S & F) and Cliff d (Z, Pctl, BC, BCA, PV, SPV).]

Figure 1. Distributions of Confidence Band Coverage Estimates


[Figure 2 image not reproduced. Vertical axis: Estimated Band Width (0.0 to 2.0). Horizontal axis: Method, grouped under Cohen d (Z, Pctl, BC, BCA, PV, SPV, S & F) and Cliff d (Z, Pctl, BC, BCA, PV, SPV).]

Figure 2. Distributions of Confidence Band Width Estimates