Robust Confidence Intervals 1
Robust Confidence Intervals for Effect Sizes: A Comparative Study of
Cohen’s d and Cliff’s Delta Under Non-normality and Heterogeneous Variances
Melinda R. Hess
Jeffrey D. Kromrey
University of South Florida
Paper presented at the annual meeting of the American Educational Research Association, San Diego,
April 12 – 16, 2004
Educational research continues to come under fire for the perceived lack of rigor, quality and
credibility (see, for example, Gall, Borg, & Gall, 1996; Tuckman, 1990; Keselman, 1998). As the cry for
accountability within education continues to increase, evidenced by such legislation as the 2001 No Child
Left Behind Act, it would be naïve or arrogant for educational researchers to believe that we are exempt
from increased scrutiny. As such, it is imperative that researchers not only examine the worth of their
topics and the rigor of their designs, but also ensure that they are using appropriate and thorough reporting
practices. When reporting the results of empirical inquiry, the information must be both clear and
comprehensive. Clarity can be addressed by communicating the results through carefully chosen and
well-designed words, pictures, and tables. Comprehensiveness can be addressed through the tools
used to analyze the findings and the methods chosen to report them.
This study focuses on one aspect of both the clarity and the comprehensiveness of reporting. The last
two editions of the American Psychological Association’s (APA) Publication Manual (1994, 2001), as
well as the 1999 report by Wilkinson and the APA Task Force on Statistical Inference, recommend
and encourage the reporting of effect sizes together with confidence intervals (CIs). Accordingly, the primary
purpose of this research is to examine effective and statistically sound methods of constructing CIs around
effect sizes.
Purpose of the Study
Previous research (Hess & Kromrey, 2003; Kromrey & Hess, 2002; Hogarty & Kromrey, 2001)
has suggested that the sensitivity of traditional indices of effect size, such as Cohen’s d, precludes their
valid interpretation under variance heterogeneity and non-normality. However, alternative indices of
effect have evidenced notably lower levels of bias under such conditions (Hogarty & Kromrey, 2001). Of
the variety of effect size indices examined by these researchers, a nonparametric index (δ) proposed by
Cliff (1993, 1996) provided the least bias and the most consistent standard errors across the conditions
examined. However, Hogarty and Kromrey (2001) considered only point estimates of effect size. The
purpose of this study was to extend this line of inquiry to investigate the accuracy and precision of
interval estimates of Cliff’s δ.
Effect Sizes
A variety of effect sizes are currently available, e.g., Cohen’s d, Hedges’ g, and the trimmed d
(Hogarty & Kromrey, 1999), and research into robust and reliable effect size computation is ongoing.
Recent research into constructing CIs for differences between two groups has primarily focused on using
Cohen’s d (Hess & Kromrey, 2002, 2003), given by:
$$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}} \qquad (1)$$
where $\bar{X}_i$, $S_i^2$, and $n_i$ are the sample mean, variance, and size of group i.
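Equation 1 can be sketched in a few lines of Python. This is an illustration only: the study itself used SAS/IML, the function name `cohens_d` is hypothetical, and the scores are the hypothetical classroom means from the paper's Table 1 example.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference with a pooled standard deviation (Equation 1)."""
    n1, n2 = len(group1), len(group2)
    # statistics.variance is the sample variance (n - 1 in the denominator)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Hypothetical classroom means from the paper's Table 1 example
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
print(round(cohens_d(treatment, control), 3))  # -0.422
```

The printed value agrees with the Cohen's d of –0.422 that the paper reports for these data.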
The results of those studies have indicated that accurate construction of CIs is problematic for
groups that possess more than a minimal level of difference, as indicated by a Cohen’s d greater than 0.2
(Cohen, 1988). Additionally, the presence of variance heterogeneity further diminishes coverage,
especially when the ratio of variances between groups exceeds 1:2. Sample size also
influenced coverage, although the effects of sample size and sample balance varied across
conditions. These findings may be due, at least in part, to the bias present in standardized
mean difference indices resulting from the parametric characteristics of these statistics. Therefore,
investigation of other effect indices was deemed necessary.
Using a non-parametric approach was expected to help alleviate some of the bias introduced by
using parametric methods such as Cohen’s d. Cliff (1996) suggested a straightforward alternative to
using means as the comparison point for two groups. He proposed an approach that examines the
probability that individual observations within one group are greater than the observations in
the other group. That is, the population parameter for which such an effect size is intended is the
probability that a randomly selected member of one population has a higher response than a randomly
selected member of the second population, minus the reverse probability:
$$\delta = \Pr(x_{i1} > x_{j2}) - \Pr(x_{i1} < x_{j2}) \qquad (2)$$
where $x_{i1}$ is a member of population one and $x_{j2}$ is a member of population two.
Essentially, this approach considers the ordinal, rather than the interval, properties of the data.
The sample estimate of this statistic, Cliff’s δ̂ , is obtained by comparing each of the scores in one group
to each of the scores in the other. The calculation of this sample statistic is given by:
$$\hat{\delta} = \frac{\#(x_1 > x_2) - \#(x_1 < x_2)}{n_1 n_2} \qquad (3)$$
where $x_1$ and $x_2$ are scores within group 1 and group 2, and $n_1$ and $n_2$ are the group sample sizes.
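A minimal Python sketch of Equation 3 follows. It counts the pairs in which the group 1 score is larger and the pairs in which it is smaller. The function name and the small data sets are hypothetical and chosen only for illustration.

```python
def cliffs_delta(group1, group2):
    """Sample estimate of Cliff's delta from all n1 * n2 pairwise comparisons (Equation 3)."""
    greater = sum(1 for x1 in group1 for x2 in group2 if x1 > x2)
    less = sum(1 for x1 in group1 for x2 in group2 if x1 < x2)
    return (greater - less) / (len(group1) * len(group2))

# Hypothetical scores: 1 pair favors group 1, 6 pairs favor group 2, 2 pairs are tied
print(cliffs_delta([1, 2, 3], [2, 3, 4]))  # (1 - 6) / 9
# Complete separation: every group 1 score exceeds every group 2 score
print(cliffs_delta([5, 6], [1, 2]))  # 1.0
```

Tied pairs contribute to neither count, which is why the estimate depends only on the ordinal relations among the scores.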
The non-parametric nature of Cliff’s δ reduces the influence of such characteristics as
distribution shape, differences in dispersion and extreme values. The statistic relies on what Cliff refers
to as a dominance analysis, a concept referring to the degree to which one sample overlaps another: the
greater the overlap (i.e., the lower the dominance), the less difference between the groups. Unlike
Cohen’s d, Cliff’s effect size is bounded. An effect size of 1.0 or -1.0 indicates the absence of overlap
between the two groups, whereas 0.0 indicates complete overlap, that is, the group distributions are equivalent.
Interval Estimates of Effect Sizes
A variety of methods for constructing confidence bands around Cohen’s d have been investigated
in previous studies under various degrees of group differences, heterogeneity, sample size, and
distribution shape (Hess & Kromrey, 2003). Three techniques emerged as providing the most accurate
coverage under the greatest range of conditions.
The first technique relies on the asymptotic normality of the sampling distribution of d (Hedges &
Olkin, 1985). Confidence bands are constructed using percentiles from the standard normal distribution,
and the asymptotic variance of the standardized mean difference. That is, the upper and lower endpoints
of the band are given by
$$(\Delta_U, \Delta_L) = d \pm Z_{\alpha/2}\,\hat{\sigma}_d \qquad (4)$$
where $Z_{\alpha/2}$ is the normal deviate corresponding to the $(1 - \alpha/2)$th percentile of the normal
distribution and d is the sample value of Cohen’s d with estimated variance $\hat{\sigma}_d^2$.
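The normal theory band in Equation 4 can be sketched as below. The paper does not print the variance expression, so the sketch assumes the common large-sample approximation (n1 + n2)/(n1 n2) + d²/(2(n1 + n2)); that choice, like the function names, is an assumption rather than the authors' code. The data are the hypothetical classroom means from the paper's Table 1 example.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(pooled_var)

def z_band_for_d(g1, g2, alpha=0.05):
    """Normal theory confidence band around Cohen's d (Equation 4)."""
    n1, n2 = len(g1), len(g2)
    d = cohens_d(g1, g2)
    # ASSUMPTION: large-sample variance of d; not stated in the paper
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a 95% band
    half_width = z * sqrt(var_d)
    return d - half_width, d + half_width

treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
lower, upper = z_band_for_d(treatment, control)
print(round(lower, 2), round(upper, 2))
```

Under this assumed variance the band comes out at roughly -1.44 to 0.60, in line with the interval the paper reports for the same data.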
Secondly, an interval inversion approach to confidence interval construction has most recently
undergone explication by Steiger and Fouladi (1982, 1997), complementing and adding to earlier work on
this type of approach (see, for example, Venables, 1975; Serlin & Lapsley, 1985). Such an approach has
been shown to provide accurate confidence intervals in a variety of conditions and has shown promise in
similar applications of confidence interval estimation (Kromrey & Hess, 2001; Hess & Kromrey, 2002).
When applied to the standardized mean difference, this method evaluates the noncentral t distribution and
identifies values of noncentrality for which the observed sample noncentrality is expected to occur (for
example) 2.5% of the time and 97.5% of the time. These values of noncentrality are then transformed to
provide the endpoints of a 95% confidence band around the sample value of d.
The third technique for constructing confidence intervals around effect sizes is that of the
bootstrap, a technique commonly recognized as an efficient method for providing estimates for, among
other things, confidence intervals and standard errors (Efron & Gong, 1983; Efron & Tibshirani, 1986;
Stine, 1990). The basic method of bootstrapping (called the percentile method by Carpenter and Bithell,
2000) consists of repeatedly drawing samples of size n with replacement (e.g., 1000 times or 5000 times)
from a single sample of n observations. Each bootstrap sample provides an estimate of the parameter of
interest and the set of estimates provides an empirical sampling distribution for the statistic. Percentile
points in this empirical sampling distribution (e.g., the 2.5th percentile and the 97.5th percentile) provide
the endpoints of the confidence interval. The percentile method is a relatively simple calculation and has
the advantage of not requiring an estimate of the standard error. Unfortunately, the simplicity that
makes this method appealing comes at a cost: it tends to be less successful for non-normal
distributions. Alternative approaches to bootstrapping, such as the pivotal, bias corrected, and
bias corrected-accelerated bootstraps, provided superior CI coverage in many conditions in previous studies
(Carpenter & Bithell, 2000; Hess & Kromrey, 2003) and were thus included in this analysis.
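The percentile method described above can be sketched as follows. The sketch assumes that, for a two-group statistic, each group is resampled independently at its own size; the paper does not spell out this detail. The function names, the seed, and the index convention for the percentiles are illustrative choices.

```python
import random

def cliffs_delta(g1, g2):
    greater = sum(1 for a in g1 for b in g2 if a > b)
    less = sum(1 for a in g1 for b in g2 if a < b)
    return (greater - less) / (len(g1) * len(g2))

def percentile_bootstrap_ci(g1, g2, stat, alpha=0.05, n_boot=1000, seed=0):
    """Percentile bootstrap interval for a two-group statistic."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # ASSUMPTION: resample each group separately, with replacement
        b1 = rng.choices(g1, k=len(g1))
        b2 = rng.choices(g2, k=len(g2))
        estimates.append(stat(b1, b2))
    estimates.sort()
    # Simple index convention for the alpha/2 and 1 - alpha/2 percentile points
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return estimates[lo_idx], estimates[hi_idx]

# Hypothetical classroom means from the paper's Table 1 example
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
lo, hi = percentile_bootstrap_ci(treatment, control, cliffs_delta)
print(lo, hi)
```

Because the endpoints are order statistics of the bootstrap estimates, an interval for Cliff's δ built this way cannot fall outside the range from -1 to +1.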
In the Non-Studentized Pivotal method (PV) of bootstrapping, samples are drawn with
replacement, as with the percentile method, but the sampling distribution constructed and evaluated to
produce the confidence interval is the sampling distribution of (d* − d), where d* is the bootstrap estimate
of Cohen’s d, and d is the observed sample value. The relevant percentiles of this empirical sampling
distribution are calculated and back-transformed to produce the endpoints of the confidence interval. In
the Studentized-Pivotal (SPV) bootstrap, the sampling distribution constructed is that of
$$\frac{d^* - d}{\hat{\sigma}_{d^*}}$$
where $\hat{\sigma}_{d^*}$ is the estimated standard error of the bootstrap estimate of Cohen’s d.
Thus, the SPV is a more computationally intensive method that requires an estimate of the standard error of the
statistic of interest. Reliably estimating this value is not always straightforward. If an analytic
formula for the standard error is not available, the value can be estimated using the jackknife.
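The non-studentized pivotal method can be sketched as below. Percentiles of the bootstrap differences (d* − d) are subtracted from the observed d, which is one common way to carry out the back-transformation; the paper does not give this step explicitly, so treat it as an assumption. Independent resampling within each group and all names are also illustrative.

```python
import random
from math import sqrt
from statistics import mean, variance

def cohens_d(g1, g2):
    n1, n2 = len(g1), len(g2)
    pooled_var = ((n1 - 1) * variance(g1) + (n2 - 1) * variance(g2)) / (n1 + n2 - 2)
    return (mean(g1) - mean(g2)) / sqrt(pooled_var)

def pivotal_bootstrap_ci(g1, g2, stat, alpha=0.05, n_boot=1000, seed=0):
    """Non-studentized pivotal bootstrap interval for a two-group statistic."""
    rng = random.Random(seed)
    observed = stat(g1, g2)
    # Empirical sampling distribution of (bootstrap estimate - observed estimate)
    diffs = sorted(
        stat(rng.choices(g1, k=len(g1)), rng.choices(g2, k=len(g2))) - observed
        for _ in range(n_boot)
    )
    lo_q = diffs[int((alpha / 2) * n_boot)]
    hi_q = diffs[min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)]
    # ASSUMPTION: back-transform by reflecting the percentiles around the observed value
    return observed - hi_q, observed - lo_q

# Hypothetical classroom means from the paper's Table 1 example
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
lo, hi = pivotal_bootstrap_ci(treatment, control, cohens_d)
print(lo, hi)
```

A studentized version would divide each difference by a standard error estimated within the bootstrap sample, which is where the extra computation mentioned above comes from.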
The Bias Corrected method (BC) of bootstrapping adjusts for asymmetry in the empirical
sampling distribution that is constructed. This method computes the proportion of the sampling
distribution that is less than the mean as an estimate of asymmetry and incorporates this estimate into the
endpoints of the confidence interval. An extension of this is the Bias Corrected and Accelerated (BCA)
method. The appeal of this method is its perceived ability not only to adjust effectively for asymmetric
sampling distributions, but also to adjust for changes in the distribution along the range of bootstrap values (that is, if
the shape of the sampling distribution changes with the value of d*). In addition, it is reported to result in
smaller coverage error than both the percentile and BC methods. However, it is thought to have stability
issues for small Type I error rates (α < .025) and can be computationally demanding.
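A rough sketch of the bias corrected (BC) interval is given below. It follows the usual formulation in which the bias-correction constant is the normal quantile of the proportion of bootstrap estimates falling below the observed sample estimate. The text above refers to the mean as the reference point, so this choice is an assumption, as are the function names and the clamping step.

```python
import random
from statistics import NormalDist

def cliffs_delta(g1, g2):
    greater = sum(1 for a in g1 for b in g2 if a > b)
    less = sum(1 for a in g1 for b in g2 if a < b)
    return (greater - less) / (len(g1) * len(g2))

def bc_bootstrap_ci(g1, g2, stat, alpha=0.05, n_boot=2000, seed=0):
    """Bias corrected (BC) bootstrap interval for a two-group statistic."""
    rng = random.Random(seed)
    nd = NormalDist()
    observed = stat(g1, g2)
    boots = sorted(
        stat(rng.choices(g1, k=len(g1)), rng.choices(g2, k=len(g2)))
        for _ in range(n_boot)
    )
    # ASSUMPTION: asymmetry is measured relative to the observed estimate
    below = sum(b < observed for b in boots) / n_boot
    below = min(max(below, 1 / n_boot), 1 - 1 / n_boot)  # keeps inv_cdf finite
    z0 = nd.inv_cdf(below)
    z = nd.inv_cdf(1 - alpha / 2)
    # Shifted percentile points; they reduce to alpha/2 and 1 - alpha/2 when z0 = 0
    p_lo = nd.cdf(2 * z0 - z)
    p_hi = nd.cdf(2 * z0 + z)

    def at(p):
        return boots[min(n_boot - 1, max(0, int(p * n_boot)))]

    return at(p_lo), at(p_hi)

# Hypothetical classroom means from the paper's Table 1 example
treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
lo, hi = bc_bootstrap_ci(treatment, control, cliffs_delta)
print(lo, hi)
```

The BCA method adds an acceleration constant to the same percentile adjustment; it is omitted here to keep the sketch short.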
Interval Estimates for Cliff’s δ
The δ statistic and inferential methods associated with it are readily addressed by considering the
data from two groups in an arrangement called a dominance matrix. This n1 by n2 matrix has elements
taking the value of 1 if the row response is larger than the column response, -1 if the row response is less
than the column response, and 0 if the two responses are identical. The sample estimate of δ is simply
the average value of the elements in the dominance matrix.
Consider a hypothetical example in which the data displayed in Table 1 represent two sets of
classroom means obtained on a math achievement test. For this example, responses were obtained from
ten treatment and six control classrooms. The research question seeks to address whether the two
populations sampled were different with regard to their mean math achievement.
Table 1
Sample of Two Groups of Classroom Mean Achievement Scores.
Treatment Classrooms    Control Classrooms
10                      10
10                      20
20                      30
20                      40
20                      40
30                      50
30
30
40
50
Table 2 exhibits these data in a 10 x 6 dominance matrix. The elements of the matrix take the
value of 1 if the row (Treatment Classroom) mean is larger than the column (Control Classroom) mean.
The value 0 is assigned if the value for the two groups is the same and the value –1 is given if the row
mean is less than the column mean. These data result in a value for Cliff’s δ̂ (from Equation 3) of –0.25.
When used as an effect size index, Cliff’s δ̂ represents the degree of overlap between the two
distributions of scores. It ranges from –1 (if all observations in group 1 are smaller than all observations in
group 2) to +1 (if all observations in group 1 are larger than all observations in group 2) and takes the
value of zero if the two distributions are identical.
Table 2
Dominance Matrix.
(Rows are treatment classroom means; columns are control classroom means.)
       10   20   30   40   40   50
10      0   -1   -1   -1   -1   -1
10      0   -1   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
20      1    0   -1   -1   -1   -1
30      1    1    0   -1   -1   -1
30      1    1    0   -1   -1   -1
30      1    1    0   -1   -1   -1
40      1    1    1    0    0   -1
50      1    1    1    1    1    0
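The dominance matrix for the Table 1 data can be rebuilt with a short Python sketch. The function name is hypothetical; the sign rule is the one stated in the text (1 if the row value is larger, -1 if it is smaller, 0 if the two are equal).

```python
def dominance_matrix(rows, cols):
    """n1 x n2 matrix of 1, 0, -1 comparing each row score with each column score."""
    # (r > c) - (r < c) evaluates to 1, 0, or -1
    return [[(r > c) - (r < c) for c in cols] for r in rows]

treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]  # rows
control = [10, 20, 30, 40, 40, 50]                    # columns
dom = dominance_matrix(treatment, control)

for score, row in zip(treatment, dom):
    print(score, row)

# Cliff's delta estimate is the mean of all matrix elements
delta_hat = sum(map(sum, dom)) / (len(treatment) * len(control))
print(delta_hat)  # -0.25
```

The elements sum to -15 over 60 cells, which gives the value of –0.25 reported in the text.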
Cliff (1996b) suggested a variety of methods for inference about δ , but the current study focused
on the construction of confidence intervals. A consistent estimate of the variance of δ̂ is given by
$$\hat{\sigma}_{dc}^2 = \frac{(n_2 - 1)S_{d_{i\cdot}}^2 + (n_1 - 1)S_{d_{\cdot j}}^2 + S_{d_{ij}}^2}{n_1 n_2} \qquad (5)$$
where $d_{i\cdot}$ is the marginal value of row i,
$d_{\cdot j}$ is the column marginal of column j,
$d_{ij}$ is the value of element ij in the matrix,
$$S_{d_{i\cdot}}^2 = \frac{\sum_i (d_{i\cdot} - \hat{\delta})^2}{n_1 - 1},$$
$$S_{d_{\cdot j}}^2 = \frac{\sum_j (d_{\cdot j} - \hat{\delta})^2}{n_2 - 1}, \text{ and}$$
$$S_{d_{ij}}^2 = \frac{\sum_i \sum_j (d_{ij} - \hat{\delta})^2}{(n_1 - 1)(n_2 - 1)}.$$
The limits of an asymmetric confidence interval are given by
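$$\frac{\hat{\delta} - \hat{\delta}^3 \pm Z_{\alpha/2}\,\hat{\sigma}_{dc}\sqrt{(1 - \hat{\delta}^2)^2 + Z_{\alpha/2}^2\,\hat{\sigma}_{dc}^2}}{1 - \hat{\delta}^2 + Z_{\alpha/2}^2\,\hat{\sigma}_{dc}^2} \qquad (6)$$
where $Z_{\alpha/2}$ is the normal deviate corresponding to the $(1 - \alpha/2)$th percentile of the normal
distribution.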
The variance of δ̂ for the data in Table 1 (using Equation 5) is 0.106 and the 95% CI (using
Equation 6) is [-0.71, 0.36]. In contrast, Cohen’s d for these data (using Equation 1) is –0.422 and the
95% CI from Equation 4 is [-1.44, 0.60].
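The worked values for Cliff's δ can be checked with the Python sketch below. It reads the row and column marginals in Equation 5 as row and column means of the dominance matrix, and it divides the element-level sum of squares by (n1 − 1)(n2 − 1). Both readings are assumptions, chosen because they reproduce the variance and interval reported above. All function and variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def cliff_interval(g1, g2, alpha=0.05):
    """Cliff's delta, its variance estimate (Eq. 5), and the asymmetric CI (Eq. 6)."""
    n1, n2 = len(g1), len(g2)
    dom = [[(a > b) - (a < b) for b in g2] for a in g1]
    delta = sum(map(sum, dom)) / (n1 * n2)
    # ASSUMPTION: marginals are taken as row and column means
    row_means = [sum(row) / n2 for row in dom]
    col_means = [sum(dom[i][j] for i in range(n1)) / n1 for j in range(n2)]
    s2_rows = sum((r - delta) ** 2 for r in row_means) / (n1 - 1)
    s2_cols = sum((c - delta) ** 2 for c in col_means) / (n2 - 1)
    s2_cells = sum((v - delta) ** 2 for row in dom for v in row) / ((n1 - 1) * (n2 - 1))
    var = ((n2 - 1) * s2_rows + (n1 - 1) * s2_cols + s2_cells) / (n1 * n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(var) * sqrt((1 - delta ** 2) ** 2 + z ** 2 * var)
    denom = 1 - delta ** 2 + z ** 2 * var
    center = delta - delta ** 3
    return delta, var, ((center - half) / denom, (center + half) / denom)

treatment = [10, 10, 20, 20, 20, 30, 30, 30, 40, 50]
control = [10, 20, 30, 40, 40, 50]
delta, var, (lo, hi) = cliff_interval(treatment, control)
print(delta, round(var, 3), round(lo, 2), round(hi, 2))  # -0.25 0.106 -0.71 0.36
```

The sketch does not guard against complete separation of the groups (δ̂ of exactly ±1 with zero estimated variance), where the denominator of Equation 6 is zero.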
In addition to this confidence interval suggested by Cliff, the five bootstrap methods described
above were applied to the δ statistic.
Method
Monte Carlo methods were used to compare confidence band estimates using random samples
generated from populations under known and controlled conditions. Confidence intervals of 90%, 95%
and 99% were investigated for each sample under each condition for each effect size of interest. The four
design factors in this study included (a) sample size (ranging from 5 to 200, including balanced and
unbalanced designs), (b) population effect size (with Δ = 0.00, 0.20, 0.50 and 0.80, corresponding to
a null condition and Cohen’s small, medium and large effects), (c) population distribution shape
(population skewness and kurtosis of 0, 0 and 2, 6), and (d) variance in the two populations (with $\sigma_1^2 : \sigma_2^2$ =
1:1, 1:2, 1:4, and 1:8).
The research was conducted using SAS/IML version 8.2 and run under both Windows and Unix
platforms. Normally distributed random variables were generated using the RANNOR random number
generator in SAS using a different seed value for each generation. The program code was verified
through a baseline hand-check of benchmark datasets. Conditions requiring non-normal population
distributions were produced by transforming the normal random variates obtained from RANNOR using
the technique described by Fleishman (1978).
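Fleishman's technique transforms a standard normal variate Z into a + bZ + cZ² + dZ³ with a = −c. The Python sketch below illustrates it for the non-normal condition (skewness 2, kurtosis 6). The coefficient values are not given in the paper; they are assumed here and checked against Fleishman's moment equations, which are coded in `implied_moments`. The study itself used SAS/IML and RANNOR, not this code.

```python
import random

# ASSUMPTION: power-method coefficients for skewness 2 and kurtosis 6,
# verified below rather than taken from the paper
B, C, D = 0.826324, 0.313749, 0.022707
A = -C  # forces the transformed variable to have mean zero

def fleishman(z):
    """Transform a standard normal deviate into a non-normal deviate."""
    return A + B * z + C * z ** 2 + D * z ** 3

def implied_moments(b, c, d):
    """Variance, skewness, and excess kurtosis implied by the coefficients."""
    var = b ** 2 + 6 * b * d + 2 * c ** 2 + 15 * d ** 2
    skew = 2 * c * (b ** 2 + 24 * b * d + 105 * d ** 2 + 2)
    kurt = 24 * (b * d + c ** 2 * (1 + b ** 2 + 28 * b * d)
                 + d ** 2 * (12 + 48 * b * d + 141 * c ** 2 + 225 * d ** 2))
    return var, skew, kurt

print(implied_moments(B, C, D))  # close to (1, 2, 6)

rng = random.Random(1)  # arbitrary seed for the illustration
sample = [fleishman(rng.gauss(0.0, 1.0)) for _ in range(5)]
print(sample)
```

Shifting and rescaling the transformed values then gives populations with the desired means and variance ratios.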
Trials consisting of 5000 samples for each test condition were conducted, assuring sufficient
precision for an adequate initial investigation into the sampling behavior of these confidence bands.
According to Robey and Barcikowski (1992), the use of 5000 replications provides a maximum 95%
confidence interval width around an observed proportion that is ± .0098.
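The logic of one simulation cell can be sketched in Python as follows. This is a scaled-down illustration, not the study's SAS/IML program. It draws two normal samples with unit variances, builds Cliff's asymmetric interval, and records how often the interval contains the population δ. For normal populations with equal unit variances, the population value is 2Φ(Δ/√2) − 1. The replication count, seed, and function names are assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist

def cliff_ci(g1, g2, alpha=0.05):
    n1, n2 = len(g1), len(g2)
    dom = [[(a > b) - (a < b) for b in g2] for a in g1]
    delta = sum(map(sum, dom)) / (n1 * n2)
    rows = [sum(r) / n2 for r in dom]
    cols = [sum(dom[i][j] for i in range(n1)) / n1 for j in range(n2)]
    s2r = sum((r - delta) ** 2 for r in rows) / (n1 - 1)
    s2c = sum((c - delta) ** 2 for c in cols) / (n2 - 1)
    s2e = sum((v - delta) ** 2 for r in dom for v in r) / ((n1 - 1) * (n2 - 1))
    var = ((n2 - 1) * s2r + (n1 - 1) * s2c + s2e) / (n1 * n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    denom = 1 - delta ** 2 + z ** 2 * var
    if denom == 0:  # complete separation: return a degenerate interval
        return delta, delta
    half = z * sqrt(var) * sqrt((1 - delta ** 2) ** 2 + z ** 2 * var)
    return (delta - delta ** 3 - half) / denom, (delta - delta ** 3 + half) / denom

def coverage(n1, n2, effect, reps=300, alpha=0.05, seed=0):
    """Proportion of replications whose interval contains the population delta."""
    rng = random.Random(seed)
    # Population delta for N(effect, 1) versus N(0, 1)
    true_delta = 2 * NormalDist().cdf(effect / sqrt(2)) - 1
    hits = 0
    for _ in range(reps):
        g1 = [rng.gauss(effect, 1.0) for _ in range(n1)]
        g2 = [rng.gauss(0.0, 1.0) for _ in range(n2)]
        lo, hi = cliff_ci(g1, g2, alpha)
        hits += lo <= true_delta <= hi
    return hits / reps

cov = coverage(n1=20, n2=20, effect=0.5)
print(cov)
```

With only 300 replications the estimate is much noisier than the 5000-replication trials used in the study, so it should be read as a demonstration of the procedure rather than a result.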
Results and Conclusions
The results were analyzed in terms of confidence band coverage probabilities and confidence
band widths for each of the effect size estimates under consideration. The adequacy
of the coverage provided by each method was judged using the criteria provided by Bradley
(1978) regarding the degree of departure from nominal coverage. In general the results of the research
support the use of Cliff’s δ as a robust effect size estimate. Confidence band coverage for δ was
superior to that of Cohen’s d under heterogeneous variances and non-normal distributions. The results are
presented for the conditions run using a Type I error rate of .05. Findings were similar for alpha of .01
and .10 and are available from the authors upon request.
Coverage Probabilities
All of the methods of CI construction, when examined across all of the conditions, provided
better coverage probabilities for Cliff’s δ as compared to Cohen’s d, with the exception of the Pivotal
Bootstrap method (see Figure 1). The normal theory Z method seemed to provide exceptional coverage
for Cliff’s δ. The studentized pivotal bootstrap appeared to provide more conservative coverage (that is, coverage above the nominal level) for Cliff’s δ
as compared to the other methods.
Effect Size. Coverage probabilities as a function of effect size, across conditions, show a marked
decrease in proportional coverage for all methods for Cohen’s d (see Table 1). However, for Cliff’s δ ,
the normal theory Z bands consistently provided the nominal coverage probabilities desired. Regardless
of whether no effect is present (∆= 0) or a large effect is present (∆= 0.8) the Z bands provided coverage
of 0.95 consistently. The bias corrected-accelerated bootstrap method performed almost as well,
maintaining a coverage probability of approximately 0.95 for all but the largest effect size, for which
coverage fell to 94%. In general, coverage probabilities for Cliff’s δ tended to be better than those of
Cohen’s d across techniques with the exception of the pivotal bootstrap (with mean coverage for δ
dropping as low as .89).
Variance Between Groups. When varying degrees of variance heterogeneity were examined
(Table 2), the normal theory Z bands for δ continued to maintain the nominal coverage probability (0.95)
regardless of degree of heterogeneity. Again, the least effective method for Cliff’s δ was the pivotal
bootstrap, providing only a 90% probability of coverage, even under homogeneous conditions. The
studentized pivotal exhibited slightly conservative coverage, with
coverage probabilities of approximately 97% across all degrees of heterogeneity.
Distribution Shape. The two distribution shapes examined (Table 3) represent two points
on the continuum of possible distributions. For normal population distributions, the pivotal
bootstrap performed best for Cohen’s d whereas both the normal theory Z band and the bias corrected-
accelerated bootstrap bands for Cliff’s δ provided nominal coverage of 0.95. The studentized pivotal
once again produced results with slight over coverage (97%) in conditions with a normal distribution.
When conditions were considered with a highly skewed (2.0) and kurtotic (6.0) distribution, all of the
methods for both Cohen’s d and Cliff’s δ deteriorated in their ability to provide adequate coverage, with
the exception of the Z bands for Cliff’s δ .
Sample Size. Sample size had two obvious effects on coverage, as a function of both total size and
balance of sizes (Table 4). For those conditions in which n2 > n1, a positive pairing with population
variances, coverage tended to be enhanced, and, in some cases, coverage probabilities exceeded
the nominal level. Slight over coverage occurred with positive pairings of small sample
sizes and variances for Cohen’s d using the normal theory Z bands (.97). Similar results occurred for
Cliff’s δ for not only the normal theory Z band method (.96), but also the bias corrected-accelerated and
studentized pivotal methods (.96 and .99 respectively). When sample size increased, this issue tended to
be resolved and coverage was much closer to the nominal coverage desired. When unbalanced sample
sizes were negatively paired with population variances (smaller size with larger variance), coverage
probabilities were reduced noticeably. Typically, the bands constructed around Cliff’s δ using the Z
method were robust to changes in sample sizes as well as balance shifts.
Specific Conditions. Confidence band coverage estimates for selected conditions are provided in
Tables 5 – 8. For normal distributions with homogeneous variances (Table 5), the coverage estimates for
Cohen’s d were near nominal levels across conditions for the normal theory Z bands, the pivotal and
studentized pivotal bootstraps, and the Steiger and Fouladi interval inversion bands. The percentile
bootstrap, bias corrected and bias corrected accelerated bootstraps evidenced less than nominal coverage
in small samples (n1 + n2 = 20), but provided adequate coverage with large samples (n1 + n2 = 200). For
Cliff’s δ , the normal theory Z bands and the bias corrected and bias corrected accelerated bootstrap
bands provided adequate coverage across these conditions, while the percentile and pivotal bootstraps
provided lower coverage in the small sample conditions. Conversely, the studentized pivotal bootstrap
provided overly conservative coverage with small samples (with coverage probability estimates typically
at .99 or above).
With normal distributions and heterogeneous variances (1:8 variance ratio), the deleterious
impact on Cohen’s d of unequal sample sizes is evident (Table 6). With the normal theory Z bands,
confidence interval coverage estimates reached as low as .75 (with n1 = 150, n2 = 50 and ∆= 0.8). Similar
declines in confidence band coverage were evident for all of the bootstrap approaches, as well as the
Steiger and Fouladi approach. The confidence intervals for Cliff’s δ were less affected by the
heterogeneity in the populations. The normal theory Z bands provided adequate coverage for all
conditions except those with n1 = 15, n2 = 5 and ∆> 0.2 (and in these conditions, the band coverage still
exceeded .91). These small sample, unbalanced conditions also led to reduced coverage for the bias
corrected and bias corrected accelerated bootstrap bands (with coverage dipping below .88). Finally, the
studentized pivotal bootstrap bands retained their conservative coverage with small samples, but coverage
near nominal levels with large samples.
Under conditions of non-normal population distributions and homogeneous variances (Table 7),
the normal theory Z bands and the Steiger and Fouladi bands for Cohen’s d provided declining coverage
as ∆ increased, but the coverage reached only as low as .91 (n1 = n2 = 10). All of the bootstrap approaches
appeared to be more adversely affected by the non-normality, especially in small sample conditions. For
example, confidence interval coverage estimates reached as low as .77 with the percentile bootstrap when
n1 = 15, n2 = 5 and ∆= 0.8. For Cliff’s δ , the normal theory Z bands maintained coverage near the
nominal level across these conditions while the bootstrap bands maintained near-nominal coverage under
the large samples. For small samples, the bootstraps evidenced reduced coverage (reaching as low as .72
with the pivotal bootstrap when n1 = 15, n2 = 5 and ∆= 0.8). As with the previous conditions examined,
the studentized pivotal bootstrap showed conservative coverage with small samples under non-normal
homogeneous populations.
Finally, under non-normal distributions and heterogeneous variances (1:8 variance ratio), the
greatest impact on the confidence intervals for Cohen’s d was evident (Table 8). With the normal theory Z
bands, confidence interval coverage estimates reached as low as .64 (with n1 = 150, n2 = 50 and ∆= 0.8)
and similar poor coverage was evident for the bootstrap intervals and the Steiger and Fouladi approach.
Conversely, the normal theory Z bands for Cliff’s δ provided adequate coverage for all conditions
presented in this table. As with previous conditions, the bootstrap intervals for Cliff’s δ evidenced
reduced coverage with small samples, but adequate coverage when samples were large. Also consistent
with previous conditions, the studentized pivotal bootstrap bands showed conservative coverage with
small samples, but coverage near nominal levels with large samples.
Bandwidth
The mean widths of the confidence intervals across conditions are presented in Figure 2. Because
Cohen’s d and Cliff’s δ represent different scales, the interval widths across statistics are not directly
comparable. Within statistics, however, the typical bandwidths were comparable across the methods of CI
construction, with the exception of the SPV method applied to Cliff’s δ, which resulted in
notably wider intervals.
Effect Size. Bandwidths for Cliff’s δ tended to decrease as the magnitude of the effect increased,
whereas bandwidths around Cohen’s d increased as effect size increased (Table 9). For example, the mean width of bands
constructed by the normal theory Z technique around Cohen’s d went from 1.15 when Δ = 0 to 1.19 when
Δ = 0.8. Conversely, the width of confidence bands constructed using the same technique around Cliff’s δ
decreased from 0.63 to 0.57.
Variance Between Groups. When varying degrees of variance heterogeneity were examined
(Table 10), the confidence bands around Cliff’s δ and those constructed around Cohen’s d behaved
consistently. For both effect size indices, the confidence intervals increased in width as the degree of
variance heterogeneity increased.
Distribution Shape. The shape of the distribution, whether normal or highly skewed and kurtotic,
seemed to have minimal impact on confidence interval widths, regardless of the method used or the
parameter being estimated (see Table 11). The largest changes occurred with the
studentized pivotal method for both Cohen’s d and Cliff’s δ, going from 1.20 to 1.28 and from 1.5 to 2.0,
respectively. The rest of the changes in bandwidth tended to be quite small, and, in some cases,
bandwidths decreased for Cliff’s δ under the non-normal distribution.
Sample Size. As expected, average confidence interval widths were reduced as sample size
increased (see Table 12). For the smallest samples examined (n1 + n2 = 20), the confidence intervals for
both Cohen’s d and Cliff’s δ were wide enough to be virtually uninformative about the parameter
location.
Specific Conditions. Confidence bandwidth estimates for selected conditions are provided in
Tables 13 – 16. Table 13 provides bandwidth estimates for those conditions that are normal with equal
variances. Most notable in this table is the exceptionally low precision (i.e., extremely wide confidence
intervals) for all of the small sample conditions. For both small and large sample sizes, slightly better
precision was evident when sample sizes were equal and somewhat less precision was seen with larger
values of Δ. Across the interval estimation methods, few differences were evidenced for either Cohen’s d
or Cliff’s δ , with the exception of the Studentized Pivotal Bootstrap bands for δ (an approach that
yielded notably larger confidence intervals). Table 14 presents the conditions in which normal
distributions are coupled with heterogeneous (1:8) population variances. For these conditions, the
confidence bandwidths were related to the pairing of sample size with population variance. More
precision was evident with positive pairing and less precision with negative pairing. Further, differences
in precision between the bootstrap bands and the normal theory Z bands were evident for Cohen’s d (with
the bootstrap approaches yielding smaller bands with positive pairing and less precision with negative
pairing), but these differences were not apparent in the comparison of bands for Cliff’s δ . Table 15
provides estimates for conditions coupling non-normality with homogeneous population variances,
conditions that show patterns similar to those noted with the normal distributions under
homogeneous variances. Finally, Table 16 presents estimates for conditions with non-normality and
heterogeneity (1:8) of variances.
Conclusions
It is imperative that we continue to explore the methods used in both theoretical and applied
research, as the results of those methods have the potential for far-reaching and in-depth impact on
educational researchers and practitioners alike. The ability to use an interval approach for estimating the
effects of instructional strategies continues to show promise as educational practices continue to develop
and adapt under growing scrutiny. However, it is critical that the appropriateness and effectiveness of
confidence band construction be investigated relative to not only the type of parameter being estimated,
but also different measures of that parameter. This study clearly illustrates that while the normal theory Z
band approach may not have been the most effective for constructing confidence bands around Cohen’s d,
it did provide very impressive coverage probabilities for Cliff’s δ. Decisions about the viability and
appropriateness of using one estimate of effect size as compared to another are, obviously, up to the
researcher. Once that decision is made, the next should concern the best approach for
constructing CIs. Further investigation into when to use different CI techniques as a function of data
characteristics, parameter characteristics, and computational sophistication is critical. CI construction
cannot, and should not, be thought of as a ‘one size fits all’ issue.
References
American Psychological Association (2001). Publication manual of the American Psychological
Association (5th ed.). Washington, DC: Author.
Bradley, J.V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144-
151.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological
Bulletin, 114, 494-509.
Cliff, N. (1996a). Answering ordinal questions with ordinal data using ordinal statistics. Multivariate
Behavioral Research, 31, 331-350.
Cliff, N. (1996b). Ordinal Methods for Behavioral Data Analysis. New Jersey: Lawrence Erlbaum
Associates.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). New York: Academic
Press.
Cooper, H. & Hedges, L. (1994). The Handbook of Research Synthesis. New York: Russell Sage
Foundation.
Efron, B. & Gong, G. (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The
American Statistician, 37(1), 36-49.
Efron, B. & Tibshirani, R. (1986). Bootstrap methods for standard errors, confidence intervals, and other
measures of statistical accuracy. Statistical Science, 1(1), 54-77.
Fleishman, A. I. (1978). A method for simulating non-normal distributions. Psychometrika, 43(4),
521-532.
Grissom, R. J. & Kim, J. J. (2001). Review of assumptions and problems in the appropriate
conceptualization of effect size. Psychological Methods, 6(2), 135-146.
Hedges, L. V. & Olkin, I. (1985). Statistical Methods for Meta-Analysis. New York: Academic Press.
Hess, M. R. & Kromrey, J. D. (2003, February). Confidence Bands for Standardized Mean Differences: A
Comparison of Nine Techniques Under Non-normality and Variance Heterogeneity. Paper
presented at the Eastern Educational Research Association, Hilton Head, SC.
Hess, M.R. & Kromrey, J.D. (2002, April). Confidence intervals for the standardized mean difference:
An empirical comparison of methods for interval estimation of effect sizes. Paper presented at
the American Educational Research Association, New Orleans, LA
Hogarty, K. Y. & Kromrey, J. D. (1999, August). Traditional and robust effect size estimates: Power and
Type I error control in meta-analytic tests of homogeneity. Paper presented at the Joint
Statistical Meetings, Baltimore.
Hogarty, K. Y. & Kromrey, J. D. (2001, April). We’ve Been Reporting Some Effect Sizes: Can You
Guess What They Mean? Paper presented at the American Educational Research Association,
Seattle.
Kromrey, J. D. & Hess, M. R. (2001, April). Interval Estimates of R²: An empirical comparison of
accuracy and precision under violations of the normality assumption. Paper presented at the
annual meeting of the American Educational Research Association, Seattle, WA.
Kromrey, J. D. & Hogarty, K. Y. (1999, April). Traditional and robust effect size estimates: An empirical
comparison in meta-analytic tests of homogeneity. Paper presented at the annual meeting of the
American Educational Research Association, Montreal.
McMillan, J. H., Snyder, A., & Lewis, K. L. (2002, April). Reporting Effect Size: The Road Less Traveled.
Paper presented at the annual meeting of the American Educational Research Association, New
Orleans, LA.
Nix, T. W. & Barnette, J. J. (1998). The data analysis dilemma: Ban or abandon. A review of null
hypothesis significance testing. Research in the Schools, 5(2), 3-14.
Robey, R. R. & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies
of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.
Serlin, R. C. & Lapsley, D. K. (1985). Rationality in psychological research: The good-enough principle. American Psychologist, 40, 73-83.
Steiger, J. H. & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power
calculation, and hypothesis testing for the squared multiple correlation. Behavior Research
Methods, Instruments, and Computers, 4, 581-582.
Stine, R. (1990). An introduction to bootstrap methods. Sociological Methods and Research, 18(2-3),
243-291.
Thompson, B. (1998). Statistical significance and effect size reporting: Portrait of a possible future.
Research in the Schools, 5(2), 33-38.
Venables, W. (1975). Calculation of confidence intervals for noncentrality parameters. Journal of the
Royal Statistical Society, Series B, 37, 406-412.
Wilkinson, L. & APA Task Force on Statistical Inference. (1999). Statistical methods in psychology
journals: Guidelines and explanations. American Psychologist, 54, 594-604.
Table 1 Estimated confidence band coverage by effect size, across conditions
         Cohen’s d                                                  Cliff’s Delta
delta    Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
0.0      0.929  0.920  0.932  0.942  0.949    0.936        0.931    0.952  0.937  0.944  0.951  0.912    0.971
0.2      0.923  0.916  0.928  0.938  0.943    0.929        0.925    0.953  0.936  0.944  0.952  0.910    0.971
0.5      0.909  0.903  0.915  0.924  0.931    0.916        0.911    0.952  0.931  0.942  0.948  0.904    0.967
0.8      0.885  0.884  0.896  0.904  0.912    0.896        0.887    0.952  0.924  0.937  0.942  0.891    0.960
Table 2 Estimated confidence band coverage by degree of heterogeneity, across conditions
                Cohen’s d                                                  Cliff’s Delta
Variance Ratio  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
1:1             0.945  0.917  0.931  0.937  0.945    0.933        0.947    0.952  0.933  0.945  0.953  0.906    0.969
1:2             0.928  0.912  0.925  0.933  0.939    0.926        0.930    0.953  0.933  0.944  0.950  0.907    0.969
1:4             0.899  0.902  0.914  0.923  0.930    0.914        0.901    0.951  0.931  0.940  0.946  0.903    0.966
1:8             0.874  0.892  0.903  0.914  0.920    0.904        0.876    0.953  0.930  0.938  0.944  0.901    0.965
Table 3 Estimated confidence band coverage by distribution shape, across conditions
                    Cohen’s d                                                  Cliff’s Delta
Skewness, Kurtosis  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
0.0, 0.0            0.927  0.917  0.926  0.935  0.950    0.937        0.929    0.952  0.936  0.944  0.953  0.910    0.971
2.0, 6.0            0.896  0.895  0.910  0.919  0.918    0.901        0.898    0.953  0.928  0.939  0.944  0.898    0.964
Table 4 Estimated confidence band coverage by sample size, across conditions
          Cohen’s d                                                  Cliff’s Delta
n1   n2   Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.966  0.897  0.917  0.927  0.936    0.919        0.968    0.962  0.927  0.949  0.963  0.878    0.993
10   10   0.921  0.886  0.916  0.938  0.966    0.932        0.926    0.961  0.926  0.949  0.966  0.865    0.992
15   5    0.840  0.817  0.851  0.876  0.942    0.894        0.848    0.941  0.868  0.893  0.893  0.788    0.955
25   75   0.966  0.926  0.927  0.925  0.911    0.908        0.967    0.952  0.946  0.949  0.953  0.935    0.962
50   50   0.931  0.937  0.943  0.948  0.943    0.937        0.931    0.952  0.945  0.948  0.953  0.933    0.961
75   25   0.849  0.911  0.922  0.936  0.936    0.924        0.850    0.949  0.939  0.944  0.955  0.921    0.974
50   150  0.955  0.914  0.913  0.909  0.892    0.892        0.955    0.951  0.947  0.949  0.950  0.941    0.955
100  100  0.927  0.940  0.942  0.944  0.939    0.936        0.927    0.950  0.945  0.946  0.948  0.940    0.954
150  50   0.848  0.923  0.931  0.939  0.939    0.931        0.849    0.951  0.944  0.948  0.952  0.935    0.962
Table 5 Estimated confidence band coverage with normal, homogeneous populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    0.945  0.893  0.915  0.925  0.971    0.945        0.948    0.947  0.916  0.933  0.948  0.863    0.998
          0.2    0.947  0.887  0.912  0.926  0.971    0.945        0.949    0.949  0.912  0.935  0.954  0.849    0.999
          0.5    0.945  0.887  0.913  0.926  0.970    0.944        0.950    0.949  0.909  0.938  0.961  0.845    0.995
          0.8    0.944  0.881  0.909  0.920  0.965    0.940        0.948    0.951  0.896  0.935  0.954  0.824    0.979
10   10   0.0    0.944  0.920  0.944  0.961  0.979    0.958        0.950    0.964  0.940  0.954  0.973  0.896    0.999
          0.2    0.953  0.924  0.950  0.965  0.985    0.962        0.957    0.967  0.947  0.959  0.973  0.901    0.999
          0.5    0.950  0.920  0.946  0.961  0.981    0.960        0.953    0.967  0.938  0.959  0.973  0.887    0.998
          0.8    0.948  0.905  0.936  0.953  0.979    0.955        0.951    0.963  0.932  0.958  0.973  0.860    0.993
15   5    0.0    0.951  0.893  0.908  0.921  0.969    0.939        0.951    0.957  0.911  0.931  0.949  0.863    1.000
          0.2    0.951  0.900  0.921  0.937  0.979    0.949        0.956    0.948  0.919  0.943  0.957  0.858    0.998
          0.5    0.947  0.880  0.904  0.924  0.975    0.949        0.952    0.944  0.895  0.933  0.952  0.836    0.996
          0.8    0.936  0.862  0.893  0.904  0.962    0.930        0.938    0.936  0.880  0.917  0.941  0.807    0.986
50   150  0.0    0.960  0.950  0.957  0.959  0.958    0.953        0.960    0.961  0.954  0.958  0.960  0.953    0.967
          0.2    0.942  0.929  0.931  0.932  0.938    0.936        0.942    0.938  0.935  0.936  0.939  0.928    0.947
          0.5    0.956  0.945  0.946  0.952  0.956    0.951        0.957    0.951  0.947  0.949  0.950  0.940    0.960
          0.8    0.938  0.936  0.938  0.942  0.943    0.942        0.939    0.946  0.944  0.947  0.946  0.929    0.955
100  100  0.0    0.940  0.936  0.936  0.937  0.945    0.944        0.940    0.941  0.938  0.937  0.938  0.929    0.941
          0.2    0.952  0.950  0.955  0.957  0.960    0.958        0.952    0.964  0.955  0.953  0.955  0.944    0.960
          0.5    0.959  0.950  0.953  0.952  0.960    0.959        0.959    0.958  0.955  0.957  0.958  0.952    0.962
          0.8    0.954  0.951  0.954  0.952  0.954    0.957        0.953    0.957  0.954  0.956  0.957  0.952    0.964
150  50   0.0    0.963  0.959  0.959  0.959  0.962    0.960        0.963    0.957  0.959  0.962  0.964  0.955    0.966
          0.2    0.949  0.941  0.942  0.944  0.949    0.945        0.949    0.958  0.956  0.956  0.958  0.945    0.957
          0.5    0.949  0.949  0.950  0.952  0.950    0.949        0.950    0.952  0.952  0.952  0.953  0.945    0.956
          0.8    0.946  0.927  0.934  0.939  0.945    0.940        0.945    0.943  0.936  0.941  0.945  0.926    0.953
Table 6 Estimated confidence band coverage with normal, heterogeneous (1:8) populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    0.995  0.907  0.929  0.949  0.978    0.973        0.996    0.970  0.933  0.949  0.967  0.898    1.000
          0.2    0.991  0.909  0.926  0.946  0.975    0.963        0.991    0.974  0.939  0.952  0.972  0.901    0.999
          0.5    0.989  0.914  0.928  0.939  0.961    0.952        0.989    0.973  0.941  0.957  0.970  0.903    0.997
          0.8    0.991  0.903  0.915  0.912  0.923    0.910        0.991    0.973  0.934  0.959  0.965  0.884    0.989
10   10   0.0    0.935  0.905  0.928  0.962  0.983    0.958        0.939    0.967  0.930  0.944  0.975  0.887    0.999
          0.2    0.941  0.916  0.934  0.962  0.984    0.963        0.946    0.966  0.937  0.953  0.979  0.881    0.999
          0.5    0.952  0.931  0.956  0.971  0.988    0.972        0.954    0.975  0.951  0.969  0.983  0.902    0.999
          0.8    0.925  0.889  0.917  0.945  0.979    0.956        0.930    0.954  0.912  0.943  0.964  0.839    0.989
15   5    0.0    0.771  0.841  0.874  0.911  0.963    0.908        0.777    0.935  0.901  0.904  0.913  0.772    0.981
          0.2    0.765  0.852  0.891  0.921  0.965    0.912        0.772    0.941  0.901  0.913  0.924  0.774    0.984
          0.5    0.763  0.818  0.850  0.890  0.956    0.897        0.768    0.911  0.852  0.867  0.878  0.762    0.959
          0.8    0.782  0.820  0.857  0.888  0.957    0.901        0.794    0.915  0.844  0.863  0.879  0.777    0.934
50   150  0.0    0.999  0.934  0.941  0.944  0.942    0.942        0.999    0.941  0.939  0.940  0.944  0.935    0.946
          0.2    0.995  0.931  0.924  0.925  0.936    0.935        0.995    0.959  0.955  0.955  0.958  0.952    0.959
          0.5    0.962  0.830  0.825  0.819  0.825    0.827        0.962    0.940  0.934  0.940  0.941  0.935    0.945
          0.8    0.876  0.661  0.641  0.626  0.592    0.601        0.875    0.954  0.947  0.949  0.951  0.942    0.957
100  100  0.0    0.940  0.943  0.947  0.952  0.951    0.948        0.940    0.944  0.942  0.945  0.945  0.933    0.949
          0.2    0.950  0.946  0.951  0.953  0.959    0.956        0.950    0.959  0.954  0.953  0.955  0.951    0.964
          0.5    0.931  0.932  0.929  0.933  0.930    0.929        0.931    0.943  0.938  0.941  0.943  0.932    0.949
          0.8    0.901  0.916  0.903  0.903  0.907    0.906        0.902    0.947  0.939  0.946  0.954  0.939    0.953
150  50   0.0    0.814  0.942  0.943  0.949  0.952    0.949        0.814    0.948  0.943  0.942  0.945  0.928    0.955
          0.2    0.796  0.941  0.945  0.953  0.954    0.946        0.796    0.953  0.944  0.946  0.958  0.935    0.968
          0.5    0.777  0.914  0.918  0.932  0.942    0.929        0.779    0.942  0.923  0.933  0.944  0.916    0.962
          0.8    0.747  0.900  0.914  0.929  0.939    0.917        0.752    0.959  0.949  0.953  0.956  0.936    0.974
Table 7 Estimated confidence band coverage with non-normal, homogeneous populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    0.949  0.853  0.892  0.895  0.899    0.879        0.951    0.942  0.902  0.927  0.943  0.848    0.997
          0.2    0.945  0.878  0.905  0.914  0.886    0.874        0.951    0.956  0.923  0.938  0.955  0.865    0.996
          0.5    0.943  0.881  0.901  0.910  0.897    0.876        0.948    0.954  0.926  0.941  0.962  0.867    0.991
          0.8    0.924  0.875  0.902  0.915  0.915    0.892        0.931    0.970  0.914  0.959  0.968  0.849    0.977
10   10   0.0    0.946  0.904  0.934  0.946  0.959    0.917        0.949    0.962  0.933  0.954  0.971  0.891    1.000
          0.2    0.960  0.909  0.946  0.957  0.968    0.923        0.963    0.972  0.940  0.970  0.987  0.895    1.000
          0.5    0.937  0.865  0.916  0.929  0.961    0.921        0.941    0.958  0.923  0.951  0.969  0.859    0.993
          0.8    0.913  0.849  0.889  0.908  0.958    0.912        0.919    0.958  0.908  0.947  0.960  0.831    0.976
15   5    0.0    0.953  0.867  0.905  0.916  0.908    0.888        0.955    0.955  0.912  0.933  0.946  0.860    0.999
          0.2    0.957  0.844  0.888  0.903  0.920    0.895        0.963    0.946  0.908  0.936  0.950  0.822    0.993
          0.5    0.947  0.803  0.857  0.872  0.944    0.911        0.953    0.955  0.875  0.924  0.926  0.790    0.967
          0.8    0.935  0.776  0.823  0.847  0.957    0.913        0.945    0.951  0.827  0.865  0.847  0.721    0.886
50   150  0.0    0.946  0.935  0.941  0.945  0.922    0.918        0.946    0.953  0.952  0.954  0.957  0.943    0.958
          0.2    0.961  0.960  0.962  0.960  0.948    0.944        0.961    0.964  0.959  0.959  0.963  0.946    0.966
          0.5    0.945  0.939  0.943  0.946  0.933    0.929        0.945    0.955  0.948  0.949  0.952  0.941    0.957
          0.8    0.931  0.948  0.951  0.948  0.933    0.930        0.932    0.953  0.949  0.954  0.955  0.939    0.955
100  100  0.0    0.948  0.943  0.946  0.949  0.937    0.933        0.948    0.942  0.942  0.942  0.945  0.934    0.946
          0.2    0.953  0.947  0.951  0.953  0.941    0.939        0.953    0.938  0.932  0.933  0.934  0.929    0.945
          0.5    0.937  0.945  0.952  0.953  0.940    0.939        0.937    0.940  0.934  0.937  0.941  0.937    0.949
          0.8    0.920  0.926  0.929  0.936  0.934    0.927        0.919    0.944  0.938  0.941  0.944  0.930    0.950
150  50   0.0    0.946  0.940  0.943  0.945  0.933    0.930        0.947    0.937  0.929  0.929  0.933  0.923    0.943
          0.2    0.963  0.946  0.954  0.953  0.947    0.945        0.964    0.963  0.955  0.959  0.964  0.945    0.969
          0.5    0.948  0.928  0.937  0.940  0.942    0.936        0.948    0.955  0.953  0.953  0.958  0.942    0.966
          0.8    0.927  0.914  0.920  0.913  0.925    0.921        0.927    0.952  0.945  0.956  0.957  0.929    0.966
Table 8 Estimated confidence band coverage with non-normal, heterogeneous (1:8) populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    0.970  0.906  0.929  0.944  0.980    0.964        0.971    0.968  0.932  0.950  0.961  0.900    1.000
          0.2    0.979  0.913  0.933  0.950  0.941    0.928        0.981    0.962  0.941  0.953  0.969  0.908    0.994
          0.5    0.971  0.898  0.912  0.918  0.891    0.874        0.973    0.965  0.938  0.955  0.962  0.890    0.990
          0.8    0.958  0.892  0.902  0.903  0.836    0.826        0.961    0.971  0.911  0.959  0.966  0.853    0.987
10   10   0.0    0.904  0.879  0.908  0.932  0.971    0.940        0.908    0.962  0.926  0.952  0.967  0.878    0.995
          0.2    0.867  0.840  0.867  0.905  0.946    0.896        0.874    0.961  0.912  0.944  0.960  0.865    0.983
          0.5    0.842  0.835  0.861  0.895  0.929    0.883        0.850    0.951  0.910  0.923  0.941  0.827    0.972
          0.8    0.814  0.799  0.833  0.870  0.920    0.856        0.830    0.969  0.909  0.927  0.927  0.777    0.955
15   5    0.0    0.738  0.784  0.818  0.855  0.907    0.856        0.742    0.958  0.877  0.881  0.865  0.770    0.953
          0.2    0.725  0.780  0.816  0.853  0.924    0.863        0.733    0.951  0.859  0.871  0.870  0.774    0.938
          0.5    0.660  0.729  0.766  0.806  0.918    0.841        0.672    0.945  0.817  0.833  0.784  0.750    0.883
          0.8    0.652  0.700  0.726  0.762  0.931    0.837        0.668    0.967  0.766  0.788  0.720  0.724    0.811
50   150  0.0    0.996  0.952  0.957  0.962  0.948    0.945        0.996    0.955  0.952  0.948  0.951  0.941    0.950
          0.2    0.992  0.935  0.936  0.930  0.897    0.896        0.992    0.950  0.946  0.946  0.947  0.943    0.955
          0.5    0.940  0.894  0.885  0.862  0.800    0.799        0.940    0.955  0.952  0.954  0.959  0.948    0.957
          0.8    0.821  0.813  0.797  0.756  0.671    0.675        0.820    0.944  0.940  0.940  0.944  0.932    0.953
100  100  0.0    0.942  0.941  0.946  0.953  0.955    0.952        0.943    0.949  0.945  0.948  0.952  0.941    0.953
          0.2    0.910  0.931  0.937  0.934  0.919    0.914        0.911    0.939  0.933  0.934  0.938  0.929    0.942
          0.5    0.891  0.951  0.947  0.942  0.923    0.920        0.890    0.966  0.960  0.958  0.959  0.946    0.971
          0.8    0.811  0.935  0.933  0.925  0.896    0.894        0.812    0.944  0.946  0.947  0.950  0.940    0.951
150  50   0.0    0.806  0.936  0.943  0.950  0.945    0.939        0.806    0.946  0.941  0.948  0.955  0.931    0.960
          0.2    0.756  0.912  0.921  0.938  0.933    0.924        0.756    0.949  0.939  0.940  0.945  0.929    0.965
          0.5    0.682  0.894  0.912  0.928  0.931    0.919        0.683    0.960  0.951  0.955  0.960  0.939    0.972
          0.8    0.643  0.874  0.892  0.923  0.924    0.902        0.642    0.941  0.935  0.938  0.943  0.922    0.966
Table 9 Estimated bandwidth by effect size, across conditions
         Cohen’s d                                                  Cliff’s Delta
delta    Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
0.0      1.149  1.257  1.247  1.260  1.257    1.187        1.150    0.628  0.652  0.658  0.664  0.652    1.542
0.2      1.153  1.289  1.273  1.286  1.289    1.210        1.154    0.620  0.640  0.649  0.656  0.640    1.652
0.5      1.168  1.354  1.325  1.336  1.354    1.254        1.170    0.598  0.604  0.619  0.629  0.604    1.818
0.8      1.194  1.437  1.389  1.398  1.437    1.311        1.198    0.569  0.555  0.575  0.590  0.555    1.936
Table 10 Estimated bandwidth by degree of heterogeneity, across conditions
                Cohen’s d                                                  Cliff’s Delta
Variance Ratio  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
1:1             1.162  1.212  1.197  1.206  1.212    1.154        1.164    0.592  0.604  0.614  0.623  0.604    1.283
1:2             1.164  1.269  1.249  1.260  1.269    1.198        1.166    0.597  0.606  0.619  0.628  0.606    1.496
1:4             1.167  1.369  1.340  1.352  1.369    1.268        1.169    0.607  0.615  0.628  0.638  0.615    1.896
1:8             1.171  1.489  1.448  1.462  1.489    1.342        1.174    0.621  0.626  0.640  0.650  0.626    2.273
Table 11 Estimated bandwidth by distribution shape, across conditions
                    Cohen’s d                                                  Cliff’s Delta
Skewness, Kurtosis  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
0.0, 0.0            1.162  1.271  1.257  1.265  1.271    1.198        1.164    0.608  0.627  0.636  0.645  0.627    1.493
2.0, 6.0            1.170  1.398  1.360  1.375  1.398    1.283        1.172    0.600  0.599  0.615  0.625  0.599    1.981
Table 12 Estimated bandwidth by sample size, across conditions
          Cohen’s d                                                  Cliff’s Delta
n1   n2   Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   2.068  1.952  1.891  1.867  1.952    1.805        2.072    0.920  0.943  0.967  0.988  0.943    1.992
10   10   1.814  2.173  2.102  2.085  2.173    1.954        1.820    0.933  0.970  0.995  1.023  0.970    2.393
15   5    2.118  2.997  2.915  3.052  2.997    2.585        2.126    1.142  1.136  1.192  1.217  1.136    8.656
25   75   0.915  0.791  0.787  0.785  0.791    0.782        0.915    0.424  0.427  0.428  0.430  0.427    0.444
50   50   0.797  0.860  0.854  0.853  0.860    0.846        0.798    0.438  0.443  0.444  0.446  0.443    0.461
75   25   0.922  1.220  1.214  1.223  1.220    1.192        0.922    0.562  0.573  0.575  0.580  0.573    0.636
50   150  0.646  0.555  0.553  0.553  0.555    0.552        0.646    0.301  0.302  0.303  0.303  0.302    0.307
100  100  0.563  0.603  0.601  0.600  0.603    0.598        0.563    0.312  0.314  0.314  0.314  0.314    0.319
150  50   0.650  0.860  0.858  0.862  0.860    0.850        0.650    0.405  0.408  0.409  0.411  0.408    0.426
Table 13 Estimated confidence band width with normal, homogeneous populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    2.052  2.148  2.131  2.150  2.148    2.014        2.054    1.057  1.122  1.132  1.149  1.122    2.790
          0.2    2.055  2.150  2.129  2.149  2.150    2.014        2.058    1.047  1.109  1.126  1.147  1.109    2.850
          0.5    2.077  2.181  2.147  2.164  2.181    2.032        2.082    1.015  1.058  1.088  1.120  1.058    3.388
          0.8    2.118  2.232  2.172  2.185  2.232    2.062        2.127    0.958  0.962  1.013  1.061  0.962    3.819
10   10   0.0    1.778  1.953  1.934  1.929  1.953    1.833        1.780    0.957  1.016  1.022  1.031  1.016    1.359
          0.2    1.781  1.957  1.936  1.933  1.957    1.835        1.783    0.952  1.010  1.019  1.029  1.010    1.361
          0.5    1.805  1.991  1.957  1.955  1.991    1.853        1.810    0.922  0.967  0.986  1.004  0.967    1.589
          0.8    1.852  2.046  1.985  1.985  2.046    1.883        1.861    0.867  0.887  0.922  0.956  0.887    1.933
15   5    0.0    2.050  2.129  2.114  2.130  2.129    2.000        2.052    1.054  1.118  1.129  1.146  1.118    2.466
          0.2    2.056  2.136  2.114  2.134  2.136    2.002        2.059    1.041  1.101  1.119  1.138  1.101    2.772
          0.5    2.078  2.180  2.142  2.162  2.180    2.029        2.083    1.012  1.053  1.084  1.114  1.053    3.380
          0.8    2.117  2.224  2.166  2.181  2.224    2.057        2.125    0.960  0.969  1.018  1.053  0.969    3.683
50   150  0.0    0.641  0.642  0.642  0.642  0.642    0.639        0.641    0.366  0.369  0.369  0.369  0.369    0.378
          0.2    0.642  0.644  0.644  0.644  0.644    0.640        0.642    0.363  0.365  0.365  0.365  0.365    0.374
          0.5    0.649  0.648  0.648  0.648  0.648    0.645        0.649    0.348  0.349  0.349  0.350  0.349    0.359
          0.8    0.660  0.660  0.660  0.660  0.660    0.656        0.660    0.322  0.322  0.323  0.324  0.322    0.334
100  100  0.0    0.555  0.559  0.559  0.558  0.559    0.556        0.555    0.318  0.321  0.321  0.321  0.321    0.325
          0.2    0.556  0.560  0.560  0.560  0.560    0.557        0.556    0.316  0.318  0.318  0.318  0.318    0.322
          0.5    0.564  0.567  0.567  0.568  0.567    0.564        0.564    0.303  0.305  0.305  0.306  0.305    0.309
          0.8    0.577  0.580  0.579  0.580  0.580    0.577        0.577    0.281  0.282  0.282  0.283  0.282    0.287
150  50   0.0    0.641  0.643  0.643  0.643  0.643    0.640        0.641    0.366  0.368  0.369  0.369  0.368    0.378
          0.2    0.642  0.644  0.645  0.645  0.644    0.641        0.642    0.364  0.367  0.367  0.368  0.367    0.377
          0.5    0.648  0.651  0.651  0.651  0.651    0.647        0.648    0.349  0.351  0.352  0.352  0.351    0.361
          0.8    0.660  0.661  0.660  0.660  0.661    0.657        0.660    0.323  0.323  0.324  0.325  0.323    0.335
Table 14 Estimated confidence band width with normal, heterogeneous (1:8) populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    2.037  1.480  1.465  1.458  1.480    1.433        2.038    0.928  0.966  0.971  0.977  0.966    1.248
          0.2    2.040  1.478  1.463  1.456  1.478    1.431        2.041    0.927  0.963  0.971  0.978  0.963    1.236
          0.5    2.050  1.503  1.479  1.472  1.503    1.447        2.053    0.906  0.935  0.949  0.963  0.935    1.377
          0.8    2.074  1.564  1.522  1.510  1.564    1.491        2.079    0.868  0.880  0.904  0.932  0.880    1.819
10   10   0.0    1.778  2.073  2.041  2.030  2.073    1.916        1.780    0.991  1.066  1.070  1.084  1.066    1.754
          0.2    1.783  2.093  2.057  2.049  2.093    1.928        1.785    0.984  1.055  1.064  1.083  1.055    1.801
          0.5    1.800  2.095  2.052  2.045  2.095    1.924        1.804    0.967  1.030  1.047  1.074  1.030    2.237
          0.8    1.847  2.186  2.107  2.103  2.186    1.966        1.856    0.915  0.945  0.978  1.025  0.945    2.941
15   5    0.0    2.104  3.761  3.694  3.855  3.761    3.141        2.111    1.202  1.295  1.327  1.364  1.295    11.427
          0.2    2.103  3.806  3.738  3.862  3.806    3.167        2.110    1.201  1.304  1.340  1.386  1.304    11.178
          0.5    2.151  3.844  3.737  3.889  3.844    3.167        2.162    1.168  1.215  1.281  1.321  1.215    11.672
          0.8    2.197  3.921  3.775  3.936  3.921    3.198        2.212    1.160  1.150  1.237  1.274  1.150    11.897
50   150  0.0    0.640  0.427  0.427  0.427  0.427    0.426        0.640    0.304  0.305  0.306  0.306  0.305    0.309
          0.2    0.641  0.429  0.429  0.430  0.429    0.428        0.641    0.302  0.304  0.304  0.304  0.304    0.307
          0.5    0.645  0.436  0.436  0.436  0.436    0.435        0.645    0.293  0.294  0.294  0.295  0.294    0.297
          0.8    0.652  0.449  0.448  0.448  0.449    0.447        0.652    0.277  0.278  0.278  0.279  0.278    0.282
100  100  0.0    0.555  0.561  0.562  0.561  0.561    0.558        0.555    0.340  0.343  0.343  0.343  0.343    0.348
          0.2    0.556  0.562  0.562  0.563  0.562    0.559        0.556    0.338  0.341  0.341  0.341  0.341    0.347
          0.5    0.562  0.572  0.571  0.571  0.572    0.568        0.562    0.328  0.330  0.330  0.330  0.330    0.336
          0.8    0.573  0.589  0.588  0.588  0.589    0.585        0.573    0.311  0.312  0.313  0.314  0.312    0.319
150  50   0.0    0.642  0.979  0.979  0.979  0.979    0.968        0.642    0.459  0.465  0.466  0.466  0.465    0.483
          0.2    0.644  0.980  0.979  0.980  0.980    0.969        0.644    0.458  0.464  0.465  0.465  0.464    0.482
          0.5    0.652  0.992  0.991  0.992  0.992    0.980        0.652    0.444  0.449  0.450  0.451  0.449    0.469
          0.8    0.668  1.015  1.013  1.015  1.015    1.001        0.668    0.421  0.426  0.427  0.429  0.426    0.449
Table 15 Estimated confidence band width with non-normal, homogeneous populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    2.053  2.088  2.082  2.137  2.088    1.948        2.055    1.048  1.112  1.121  1.140  1.112    2.691
          0.2    2.061  2.184  2.141  2.147  2.184    2.013        2.064    0.984  1.022  1.041  1.055  1.022    2.094
          0.5    2.086  2.322  2.225  2.201  2.322    2.091        2.091    0.895  0.906  0.939  0.959  0.906    2.032
          0.8    2.140  2.560  2.379  2.316  2.560    2.232        2.150    0.812  0.780  0.831  0.866  0.780    2.292
10   10   0.0    1.777  1.890  1.876  1.873  1.890    1.784        1.780    0.957  1.018  1.025  1.033  1.018    1.348
          0.2    1.781  1.905  1.888  1.885  1.905    1.791        1.784    0.943  0.997  1.014  1.023  0.997    1.461
          0.5    1.813  2.007  1.951  1.956  2.007    1.845        1.819    0.892  0.918  0.951  0.982  0.918    2.022
          0.8    1.873  2.156  2.038  2.047  2.156    1.924        1.883    0.838  0.815  0.868  0.922  0.815    3.017
15   5    0.0    2.052  2.139  2.126  2.180  2.139    1.983        2.055    1.057  1.123  1.134  1.150  1.123    2.932
          0.2    2.056  2.080  2.078  2.164  2.080    1.941        2.059    1.091  1.155  1.190  1.213  1.155    5.011
          0.5    2.085  2.095  2.071  2.179  2.095    1.945        2.090    1.086  1.075  1.141  1.182  1.075    7.705
          0.8    2.132  2.246  2.164  2.291  2.246    2.044        2.141    1.119  0.947  1.038  1.075  0.947    8.340
50   150  0.0    0.641  0.630  0.631  0.635  0.630    0.627        0.641    0.365  0.367  0.367  0.367  0.367    0.376
          0.2    0.642  0.640  0.640  0.642  0.640    0.636        0.642    0.328  0.331  0.331  0.331  0.331    0.337
          0.5    0.649  0.664  0.662  0.662  0.664    0.659        0.649    0.287  0.287  0.288  0.288  0.287    0.292
          0.8    0.660  0.702  0.698  0.697  0.702    0.695        0.660    0.249  0.249  0.250  0.250  0.249    0.254
100  100  0.0    0.555  0.553  0.553  0.553  0.553    0.550        0.555    0.318  0.320  0.320  0.320  0.320    0.324
          0.2    0.556  0.558  0.558  0.559  0.558    0.555        0.556    0.313  0.315  0.316  0.316  0.315    0.320
          0.5    0.564  0.581  0.580  0.584  0.581    0.578        0.564    0.294  0.295  0.296  0.296  0.295    0.300
          0.8    0.578  0.623  0.621  0.625  0.623    0.618        0.578    0.267  0.267  0.268  0.270  0.267    0.275
150  50   0.0    0.641  0.637  0.638  0.644  0.637    0.634        0.641    0.366  0.368  0.368  0.368  0.368    0.377
          0.2    0.642  0.634  0.637  0.644  0.634    0.631        0.642    0.389  0.393  0.394  0.393  0.393    0.405
          0.5    0.649  0.651  0.653  0.665  0.651    0.648        0.649    0.382  0.384  0.385  0.387  0.384    0.400
          0.8    0.661  0.685  0.685  0.699  0.685    0.680        0.661    0.356  0.355  0.357  0.361  0.355    0.378
Table 16 Estimated confidence band width with non-normal, heterogeneous (1:8) populations.
                 Cohen’s d                                                  Cliff’s Delta
n1   n2   delta  Z      Pctl   BC     BCA    Pivotal  Std Pivotal  S & F    Z      Pctl   BC     BCA    Pivotal  Std Pivotal
5    15   0.0    2.041  1.706  1.656  1.600  1.706    1.602        2.042    0.901  0.933  0.946  0.958  0.933    1.340
          0.2    2.047  1.850  1.764  1.684  1.850    1.703        2.049    0.877  0.898  0.918  0.938  0.898    1.453
          0.5    2.070  2.079  1.950  1.840  2.079    1.850        2.074    0.844  0.846  0.876  0.906  0.846    2.060
          0.8    2.104  2.333  2.152  2.021  2.333    2.005        2.111    0.797  0.779  0.826  0.869  0.779    2.464
10   10   0.0    1.788  2.302  2.226  2.182  2.302    2.047        1.791    0.982  1.047  1.066  1.088  1.047    2.726
          0.2    1.814  2.487  2.365  2.306  2.487    2.140        1.819    0.962  0.999  1.027  1.061  0.999    3.649
          0.5    1.851  2.809  2.621  2.544  2.809    2.291        1.859    0.948  0.941  0.984  1.034  0.941    4.754
          0.8    1.916  3.086  2.828  2.706  3.086    2.413        1.930    0.922  0.862  0.918  0.975  0.862    5.436
15   5    0.0    2.114  3.710  3.606  3.948  3.710    3.072        2.122    1.246  1.262  1.319  1.334  1.262    14.675
          0.2    2.134  3.911  3.769  4.095  3.911    3.171        2.143    1.252  1.226  1.303  1.325  1.226    15.054
          0.5    2.219  4.228  4.013  4.292  4.228    3.280        2.235    1.266  1.097  1.203  1.185  1.097    14.270
          0.8    2.298  4.570  4.241  4.506  4.570    3.384        2.321    1.311  0.989  1.109  1.071  0.989    13.606
50   150  0.0    0.641  0.431  0.430  0.429  0.431    0.430        0.640    0.294  0.295  0.296  0.296  0.295    0.299
          0.2    0.641  0.478  0.475  0.472  0.478    0.476        0.641    0.284  0.286  0.286  0.286  0.286    0.289
          0.5    0.645  0.543  0.537  0.534  0.543    0.539        0.645    0.268  0.268  0.269  0.269  0.268    0.273
          0.8    0.652  0.614  0.606  0.602  0.614    0.608        0.653    0.250  0.250  0.251  0.252  0.250    0.255
100  100  0.0    0.555  0.573  0.570  0.567  0.573    0.569        0.555    0.339  0.341  0.342  0.342  0.341    0.348
          0.2    0.557  0.627  0.623  0.618  0.627    0.621        0.557    0.331  0.333  0.334  0.334  0.333    0.340
          0.5    0.563  0.722  0.712  0.708  0.722    0.712        0.563    0.315  0.317  0.318  0.319  0.317    0.325
          0.8    0.575  0.823  0.809  0.804  0.823    0.806        0.575    0.297  0.298  0.298  0.301  0.298    0.308
150  50   0.0    0.642  0.989  0.983  0.983  0.989    0.977        0.642    0.463  0.469  0.470  0.471  0.469    0.489
          0.2    0.644  1.061  1.052  1.053  1.061    1.043        0.644    0.455  0.460  0.461  0.463  0.460    0.482
          0.5    0.655  1.189  1.173  1.176  1.189    1.160        0.655    0.434  0.438  0.439  0.443  0.438    0.464
          0.8    0.671  1.329  1.307  1.309  1.329    1.285        0.671    0.411  0.413  0.415  0.421  0.413    0.445
[Figure: distributions of estimated band coverage (vertical axis, 0.0 to 1.0) by method. Cohen d panel: Z, Pctl, BC, BCA, PV, SPV, S & F. Cliff d panel: Z, Pctl, BC, BCA, PV, SPV.]
Figure 1. Distributions of Confidence Band Coverage Estimates
[Figure: distributions of estimated band width (vertical axis, 0.0 to 2.0) by method. Cohen d panel: Z, Pctl, BC, BCA, PV, SPV, S & F. Cliff d panel: Z, Pctl, BC, BCA, PV, SPV.]
Figure 2. Distributions of Confidence Band Coverage Widths