Upload
others
View
14
Download
0
Embed Size (px)
Citation preview
Author's Accepted Manuscript
Symptom screening scales for detecting majordepressive disorder in children and adoles-cents: A systematic review and meta-analysisof reliability, validity and diagnostic utility
Emily Stockings, Louisa Degenhardt, Yong YiLee, Cathrine Mihalopoulos, Angus Liu, MeganHobbs, George Patton
PII: S0165-0327(14)00785-XDOI: http://dx.doi.org/10.1016/j.jad.2014.11.061Reference: JAD7152
To appear in: Journal of Affective Disorders
Received date: 26 August 2014Revised date: 28 November 2014Accepted date: 29 November 2014
Cite this article as: Emily Stockings, Louisa Degenhardt, Yong Yi Lee, CathrineMihalopoulos, Angus Liu, Megan Hobbs, George Patton, Symptom screeningscales for detecting major depressive disorder in children and adolescents: Asystematic review and meta-analysis of reliability, validity and diagnosticutility, Journal of Affective Disorders, http://dx.doi.org/10.1016/j.jad.2014.11.061
This is a PDF file of an unedited manuscript that has been accepted forpublication. As a service to our customers we are providing this early version ofthe manuscript. The manuscript will undergo copyediting, typesetting, andreview of the resulting galley proof before it is published in its final citable form.Please note that during the production process errors may be discovered whichcould affect the content, and all legal disclaimers that apply to the journalpertain.
www.elsevier.com/locate/jad
1
Symptom screening scales for detecting major depressive disorder in children and
adolescents: A systematic review and meta-analysis of reliability, validity and diagnostic
utility Emily Stockings*1 PhD, Louisa Degenhardt1 PhD, Yong Yi Lee2 MHEcon, Cathrine Mihalopoulos3 PhD, Angus Liu1 MPH, Megan Hobbs4 PhD, George Patton5 MD. 1National Drug and Alcohol Research Centre, University of New South Wales, 22-32 King Street, Randwick, New South Wales, Australia 2031.
2School of Population Health, University of Queensland, Herston Road, Herston, Queensland, Australia, 4006. 3Population Health Strategic Research Centre, Deakin University, 211 Burwood Highway, Burwood, Victoria, Australia, 3125. 4Clinical Research Unit for Anxiety and Depression (CRUfAD), University of Newcastle, St Vincent’s Hospital, 390 Victoria Street, Darlinghurst, 2010. 5Centre for Adolescent Health, The Royal Children’s Hospital, 50 Flemington Road, Parkville, Victoria, 3025. *Corresponding author: Emily Stockings, Email: [email protected] Phone: (02)9385 1062 Address: National Drug and Alcohol Research Centre (NDARC), Building R3, 22-32 King Street, Randwick, New South Wales, Australia, 2031
Abstract
Background: Depression symptom screening scales are often used to determine a clinical
diagnosis of major depressive disorder (MDD) in prevention research. The aim of this review
is to systematically examine the reliability, validity and diagnostic utility of commonly used
screening scales in depression prevention research among children and adolescents.
Methods: We conducted a systematic review of the electronic databases PsycINFO,
PsycEXTRA and Medline examining the reliability, validity and diagnostic utility of four
commonly used depression symptom rating scales among children and adolescents: the
Children's Depression Inventory (CDI), Beck Depression Inventory (BDI), Center for
Epidemiologic Studies - Depression Scale (CES-D) and the Reynolds Adolescent Depression
Scale (RADS). We used univariate and bivariate random effects models to pool data and
conducted metaregression to identify and explain causes of heterogeneity.
Results: We identified 54 studies (66 data points, 34,542 participants). Across the four
scales, internal reliability was 'good' (pooled estimate: 0.89, 95% Confidence Interval (CI):
2
0.86 to 0.92). Sensitivity and specificity were 'moderate' (sensitivity: 0.80, 95% CI: 0.76 to
0.84; specificity: 0.78, 95% CI: 0.74 to 0.83). For studies that used a diagnostic interview to
determine a diagnosis of MDD, positive predictive power for identifying true cases was
mostly poor. Psychometric properties did not differ on the basis of study quality, sample type
(clinical vs. nonclinical) or sample age (child vs. adolescent).
Limitations: Some analyses may have been underpowered to identify conditions in which
test performance may vary, due to low numbers of studies with adequate data.
Conclusions: Commonly used depression symptom rating scales are reliable measures of
depressive symptoms among adolescents; however, using cutoff scores to indicate clinical
levels of depression may result in many false positives.
Keywords: Depression, Children, & Adolescents, Psychometrics, Validity, Psychiatric
Symptom Rating Scales, Prevention.
Abbreviations:
ADIS-C: Anxiety Disorders Interview Schedule for Children AUC: Area under the curve BDI: Beck Depression Inventory BSI: Brief Symptom Inventory BYI: Beck Youth Inventories CBCL: the Child Behaviour Checklist CDI: Children’s Depression Inventory CDRS-R: Children’s Depression Rating Scale – Revised CES-D: Center for Epidemiologic Studies- Depression Scale CES-DC: Center for Epidemiologic Studies – Depression Scale for Children CIDI: Composite International Diagnostic Interview DAWBA: The Development and Wellbeing Assessment DEPS-10: Depression Scale - Version 10 DICA-IV: Diagnostic Interview for Children and Adolescents- Version 4 DISC: The National Institute of Mental Health Diagnostic Interview Schedule for Children DSM: Diagnostic and Statistical Manual DSRS: Depression Symptom Rating Scale (DSRS) HADS: Hospital Anxiety and Depression Scale ICD: International Classification of Diseases KID-SCID: Structured Clinical Interview for DSM-IV disorders – Child version
3
Kinder DIPS: Diagnostisches Interview bei psychischen Störungen im Kindes- und Jugendalter [German] K-SADS: Kiddie-Schedule for Affective Disorders and Schizophrenia MDD: Major depressive disorder MDI-C: Multi-score Depression Inventory- Children MINI: Mini-International Neuropsychiatric Interview MINI-KID: The Mini-International Neuropsychiatric Interview for Children NPV: Negative predictive value PPV: Positive predictive value PRIME-MD: The Primary Care Evaluation of Mental Disorders RADS: Reynolds Adolescent Depression Inventory RCDAS: Revised Child Anxiety and Depression Scale ROC: Receiver operator characteristic SBB-DES: The Self-Report Questionnaire—Depression [German] SCAN: Schedules for Clinical Assessment in Neuropsychiatry SCID: Structured Clinical Interview for Depression for DSM disorders SCID-1: Structured Clinical Interview for DSM-IV Axis I disorders SDQ: Strengths and Difficulties Questionnaire SMFQ: Short Mood and Feelings Questionnaire YSR: Youth Self-Report
4
Introduction
Major depressive disorder (MDD) is the leading global cause of disability among
young people aged 10-24 years, accounting for 8.2% of the global non-fatal disease burden
(Gore et al., 2011). Approximately 3% of children and 6% of adolescents suffer current or
recent depression (Costello et al., 2006). MDD in young people is associated with poor
academic performance, substance abuse, attempted and completed suicide and an increased
risk of suffering depression during adulthood (Birmaher et al., 1996, Brent et al., 1986).
Despite the significant health burden associated with MDD, studies have suggested that less
than 50% of youths seek mental health treatment for the condition (Reavley et al., 2010, Leaf
et al., 1996). Valid and accurate screening tools for depression may assist clinicians in
identifying MDD in youths, and may subsequently increase the rates of appropriate treatment
and referral (Hosman et al., 2005, Andrews et al., 2002).
Given the significant health burden associated with MDD, there has been growing
recognition of the need to develop programs aiming to prevent the onset of MDD during
childhood and adolescence (Hosman et al., 2005, Andrews et al., 2002). Promisingly, the
number of studies that have examined the efficacy of preventative interventions for MDD
among children and adolescents more than doubled between 2004 and 2010 (Merry et al.,
2004, Merry et al., 2011). Such interventions have typically been delivered in the school
setting during regular classes by teachers or trained external facilitators (Merry et al., 2011).
Given that routinely administering structured or semi-structured diagnostic interviews in
schools can be costly and time consuming, many trials have used categorical thresholds on
MDD symptom screening scales, such as the Center for Epidemiologic Studies – Depression
Scale (CES-D) (Radloff, 1977) as a proxy for a diagnosis of MDD (Merry et al., 2011).
Evaluating the efficacy of preventative interventions using symptom screening scales is
problematic in two ways. Firstly, symptom screening scales that impose a categorical
5
threshold over a larger number of symptoms that are included in the DSM and the ICD can
increase the number of cases that are identified compared to the cases where DSM or ICD
diagnoses were applied (Ferrari et al., 2013).
Secondly, while the reliability, validity and utility of identifying cases of MDD using
the CES-D and other symptom screening scales have been established in adult populations
(Radloff, 1977), the same clinical thresholds have been applied among childhood and
adolescent samples. No review has systematically examined whether the clinical thresholds
identified in adult samples are reliable, valid or useful for identifying cases of MDD among
children and adolescents. There is also no review that has quantitatively synthesized the
traditional psychometric properties of symptom screeners for MDD among children and
adolescents. Indeed, previous reviews examining the traditional psychometric characteristics
of depression symptom screening scales among children and adolescents have been non-
systematic and qualitative in nature, the most comprehensive of which were conducted more
than a decade ago (Myers and Winters, 2002b, Brooks and Kutcher, 2001). The absence of
robust data in support of applying the adult-derived thresholds for MDD when assessing for
MDD among children and adolescents means that it is difficult to interpret the efficacy,
effectiveness and efficiency of early-life preventative interventions for MDD meaningfully.
The purpose of this review is to systematically: 1) identify symptom screening scales
that are commonly used in childhood and adolescent preventative interventions for MDD;
and 2) identify evidence of reliability, validity and diagnostic utility of these symptom
screening scales.
6
Methods
Search 1: Identify symptom screening scales that are commonly used in childhood and
adolescent preventative interventions for MDD.
a) Search strategy
Given the large number of randomized controlled trials examining the efficacy of preventive
interventions for MDD among children and adolescents, and the existence of multiple
reviews synthesizing these findings (e.g. (Merry et al., 2004, Merry et al., 2011)), we
conducted a systematic review of reviews, and updated recent reviews with any studies which
may have been published subsequent to their publication date. The review of reviews was
conducted in August, 2013 using the online databases PubMed, Medline, PSYCINFO and the
Cochrane Library of Systematic Reviews, in consultation with a librarian and in accordance
with the Preferred Reporting Items for Systematic Reviews and Meta Analyses (PRISMA)
statement (Liberati et al., 2009). Databases were searched using a combination of MeSH
terms and text words pertaining to depression and dysthymia (Depressive Disorder, Major
Depression, Dysthymic Disorder, “dysthymia.mp.”), prevention (Primary Prevention,
Preventative Psychiatry, “prevention.mp.”) and intervention trials (Intervention Studies,
“intervention.mp.”). An additional search of empirical studies dated from August 2010-
present was conducted in January 2014 to identify recently published randomised controlled
trials not included in the existing reviews. This additional search was conducted in the
electronic databases PubMed, Medline and PSYCINFO using the search string ((((depress*
OR dysthymi*)) AND (child* OR adolescen*)) AND (prevent* OR early intervention* OR
risk OR at-risk OR vulnerab*)) AND (randomised controlled trial OR controlled trial
[Publication Type]).
7
b) Inclusion criteria
i) Reviews: Reviews were eligible for inclusion if: 1) they were published
between 2003 and August 2013 in the English language; 2) the authors
employed systematic methods of reviewing the literature; and 3) the data were
reported in a usable form, and:
ii) Empirical studies: Individual studies identified within the reviews, and the
updated search of empirical studies were eligible for inclusion if: 4) the
participants were aged between 5 and 18 years; 5) participants were randomly
assigned to an intervention or a control group; 6) the control group received no
intervention, placebo or usual care; and 7) the intervention aimed to prevent of
the onset of depression or dysthymia.
c) Data extraction
Data were extracted from each empirical study by co-author ES and were cleaned and double
checked by co-author YL and included: 1) sociodemographics of the sample; 2) details of the
intervention and delivery; 3) timing of follow-up assessments; 4) the methods used to
measure the prevalence, or symptoms of depression or dysthymia, including depression
symptom screening scales, or structured diagnostic interviews; and 5) outcome data.
Symptom screening scales were defined as any type of measure that provided an assessment
of MDD, producing a numerical score that has guidelines for interpretation, whether
completed by the person of interest (self report) or someone else (e.g. parent, guardian,
teacher), with any type of response format (Myers and Winters, 2002a). For the purposes of
this review, depression symptom screening scales were defined as ‘common’ if they were
used in more than two studies. Scales used in less than two studies are mentioned only briefly
here.
8
Search 2: Identify evidence of reliability, validity and diagnostic utility of commonly used
depression symptom screening scales used in childhood and adolescent preventative
interventions for MDD.
a) Search strategy
For each symptom screening scale for MDD that was identified as commonly used in Search
1, evidence for each scale’s reliability, validity and diagnostic utility in any child or
adolescent sample (not limited to samples where preventive interventions were applied) was
obtained through a series of additional systematic searches of the literature using PsycINFO,
PsycEXTRA and Medline and a combination of MeSH terms and text words pertaining to
depression (exp Major Depression), children (children OR adolesc~) symptom screening
scales (exp Rating scales/exp Screening Tests/exp Psychological Tests/exp Psychiatric Status
Rating scales), and psychometric properties (exp Psychometrics/exp Test Validity/exp Test
Reliability/exp Diagnostic Utility), and the name of each individual scale as text words (e.g
“children’s depression inventory”, “center for epidemiologic studies depression scale”, “beck
depression inventory”,). Other sources of literature included existing reviews of depression
scales (Brooks and Kutcher, 2001, Brooks and Kutcher, 2003, Pavuluri and Birmaher, 2004,
Suzuki, 2011, Collett et al., 2003), and hand searches of articles identified as relevant.
b) Eligibility criteria
Articles reporting on the scales’ reliability, validity and diagnostic utility were eligible for
inclusion in the review if the study was: 1) published between 1980 (following publication of
the initial validation studies) and August 2013 in the English language; 2) the participants
were aged between 5 and 18 years; 3) the scale was identified in Search 1; and 4) the study
reported the traditional psychometric properties of the scale including measures of reliability
9
and/or validity. We excluded any revisions or short versions of the scales if they were not
analogous to the original scales (i.e. differed in terms of number of items, scoring method,
etc.). Studies attempting to translate an existing scale into another language without reporting
psychometric properties of the scale itself were excluded.
c) Data extraction
Data from each included study were extracted by co-author AL and were cleaned and double
checked by co-author ES and included: 1) characteristics of the sample (including sample
size, age, gender, country, language and setting); 2) the names of the scales assessed and the
number of items; 3) the test-retest and inter-rater reliability of the scale; 4) the internal
validity of the scale (e.g. Cronbach’s alpha); 5) details about the external validity of the scale
including the categorical thresholds used to indicate cases of MDD, the type of clinical
interview used to determine cases of MDD (i.e. the ‘criterion standard’), and the associated
sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and
area under the curve (AUC) results derived from receiver operating characteristics (ROC)
analyses (described below). Sensitivity, specificity, PPV and NPV were calculated manually
when data were partially reported, and could be imputed based on the presentation of other
data.
Primary outcomes: Traditional psychometric properties and diagnostic utility
a) Data synthesis
Data were synthesised via meta analysis using the statistical software program Stata/SE
version 13.1 (StataCorp, 2013). We used univariate random effects models to pool data
10
across all studies for the primary outcomes internal reliability and AUC. Bivariate random
effects models (using the ‘mvmeta’ command) were used to pool data for the primary
outcomes sensitivity and specificity in order to take into account the interdependency of these
values. Values for PPV and NPV were not meta analysed and were synthesised in a
qualitative manner only, as these values are highly dependent on the prevalence of depression
in each sample, which was likely to vary substantially between each study included in this
review (Leeflang et al., 2012). In order to identify and explain any causes of heterogeneity,
random effects metaregression (using the ‘metareg’ command) was performed on the primary
outcomes (with the exception of PPV and NPV) for each of the four scales, and across the
four scales overall. The effects of the sample setting (clinical vs. nonclinical), age (children
vs. adolescents), risk of bias quality score, the cutoff score used, and the scale (CDI, BDI,
CES-D, RADS) were evaluated by individually adding them as covariates in the regression
models. Due to the small number of studies, these analyses were primarily exploratory.
Additionally, the I2 index was employed to quantify any heterogeneity in the pooled
estimates, and was described as low, moderate or high according to an I2 value of 25, 50 and
75%, respectively (Higgins et al., 2003). Statistical significance for all analyses was set at p <
.05.
b) Reliability
The internal consistency of each scale was assessed using Cronbach’s alpha [α], and test
retest and interrater reliability were assessed using the Kappa [κ] statistic. These reliability
coefficients were classified as ‘excellent’ if α ≥ .9, ‘good’ if .85 ≤ α < .9, ‘moderate’ if .80 ≤
α < .85, ‘fair’ if .75 ≤ α < .80, or ‘unsatisfactory’ if < .75 (Ponterotto and Ruckdeschel, 2007).
11
c) Validity
To examine the validity of each scale, we assessed five key measures of discriminative
validity: 1) sensitivity (the proportion of true cases correctly identified, or “true positive
rate”); 2) specificity (proportion of non-cases correctly identified, or “true negative rate”); 3)
PPV (probability that subjects identified as cases are true cases, or “precision”); and 4) NPV
(probability that the subjects identified as not being a case are not cases) (Murphy and
Davidshofer, 1994). For inclusion of studies in the meta analysis, we assigned a minimum
quality assessment for the criterion standard as diagnoses made using either a standardised
(e.g. the Composite International Diagnostic Interview (CIDI)) or structured (e.g. Kiddie-
Schedule for Affective Disorders and Schizophrenia (K-SADS)) diagnostic interview
schedule based on DSM or ICD-10 diagnostic criteria. Clinician rated diagnoses without
reference to a specific diagnostic instrument were excluded from the analyses. Based on
expert statistical recommendation, we assigned a value of ‘excellent’ for the scales’
sensitivity, specificity, PPV and NPV if the pooled value of coefficient across studies was ≥
.9, a value of ‘good’ if .8 ≤ coefficient < .9, a value of ‘moderate’ if 0.6 ≤ coefficient < 0.8,
and a value of ‘low’ coefficient < .6 (Andrews et al., 1993). We additionally assessed the
scale’s diagnostic accuracy using the area under the curve (AUC) receiver operating
characteristic (ROC) analysis. ROC curves plot the tradeoff between sensitivity and
specificity values for all possible cutoff scores, and the AUC is a measure of the scales’
overall ability to correctly classify individuals as cases or noncases (Brooks and Kutcher,
2001). This is calculated as the sum of true positives and the sum of true negatives divided by
the total population (not taking into account false positive or negatives). AUC values range
from 0.5 (random classification) to 1 (perfect classification). Henderson (Henderson, 1993)
12
recommended the following interpretation with respect to the scales’ diagnostic accuracy:
AUC > 9 ‘high’; 0.7 < AUC ≤ 0.9 ‘moderate’; AUC ≤ 7 ‘low’.
Risk of bias
Risk of bias in the included studies was examined using the risk assessment tool developed
by the Cochrane Collaboration Diagnostic Test Accuracy Working Group (Reitsma et al.,
2009), derived from the Revised Quality Assessment of Diagnostic Accuracy Studies
(QUADAS-2) tool (Whiting et al., 2011b). Risk was determined across four domains: patient
selection, index test, reference standard, and flow and timing, and was categorised and
quantified as either low (3), unclear (2) or high (1) in order to derive a total quality score for
use in the metaregression analyses.
Results
Search 1: Identify commonly used depression symptom screening scales in childhood and
adolescent preventative interventions for MDD.
Search 1 identified 17 systematic reviews of childhood and adolescent preventative
interventions for MDD. These 17 reviews contained a total of 394 individual trials, of which
288 were duplicates (in instances where the same trial appeared in more than one review) and
106 were unique.
Of the 106 unique trials, 103 assessed depressive symptomatology as an outcome using
depression symptom screening scales, and 32 assessed the prevalence of MDD using either
cutoff scores on depression symptom screening scales (n = 15) or by administering structured
or semistructured clinical interviews (n = 17). Across the 103 trials, a total of 17 different
depression symptom screening scales, and five structured or semistructured clinical
interviews were used. The most commonly used screening scale for assessing depressive
13
symptomatology was the Children’s Depression Inventory (CDI; 45 studies), followed by the
Beck Depression Inventory (BDI; 16 studies), the Center for Epidemiologic Studies –
Depression Scale (CES-D; 15 studies), and the Reynolds Adolescent Depression Scale
(RADS; 5 studies). The following scales were used as outcome measures in two studies or
fewer: the Brief Symptom Inventory (BSI), the Child Behaviour Checklist (CBCL), the
Children’s Depression Rating Scale – Revised (CDRS-R), the Depression Symptom Rating
Scale (DSRS), Beck Youth Inventories (BYI), the Depression Scale version 10 [Finnish]
(DEPS-10), the Hospital Anxiety and Depression Scale (HADS), the Multi-score Depression
Inventory – Children (MDI-C), the Revised Child Anxiety and Depression Scale (RCADS),
Youth Self-Report (YSR), and The Self-Report Questionnaire—Depression [German]
(Selbstbeurteilungsbogen—Depressive Störungen; SBB–DES). Two studies were found to
have used undefined composite constructs. The most commonly used measure to determine
the presence of MDD was the Kiddie-Schedule for Affective Disorders and Schizophrenia
(K-SADS; 7 studies), followed by the Anxiety Disorders Interview Schedule for Children
(ADIS-C; 5 studies), the Diagnostic Interview for Children and Adolescents version 4,
(DICA-IV; 1 study), and the Structured Clinical Interview for Depression for DSM disorders
(SCID; 1 study).
Search 2: Identify evidence of reliability, validity and diagnostic utility of commonly used
depression symptom screening scales used in childhood and adolescent preventative
interventions for MDD.
Figure 1 shows the study selection process. The search for the reliability, validity and
diagnostic utility of depression symptom screening scales for children and adolescents
yielded a total of 2017 results, of which 1798 were unique. Of these, 1626 were excluded as
they were not relevant to the topic. Of the remaining 172 publications, 77 were excluded as
14
the age of the sample was outside the specified age range (5-18), 30 conducted unrelated
psychometric tests, 10 assessed the validity of translated versions only and 3 assessed short or
alternate versions of the target scale. We identified the remaining 52 papers (comprising 54
studies) as relevant, of which 21 assessed the reliability, validity and diagnostic utility of the
CDI, 17 examined the BDI (one study additionally examined the CES-D, and study
additionally examined the RADS), 10 examined the CES-D and 6 examined the RADS
(Figure 1). The 54 studies comprised a total of 66 individual data points and 34,542
participants. Studies were conducted predominantly in the United States, with a minority
conducted in Europe (Spain, Germany, Belgium, Denmark, the Netherlands, Greece, Sweden
and Switzerland), Africa (Nigeria and Rwanda), India and Taiwan. Half of the articles
included in this review (n = 26 articles, 32 studies; 50%) were published since the last major
review of symptom screening scales for MDD among children and adolescents in 2002
(Myers and Winters, 2002b).
Risk of bias in included studies
Of the 52 individual papers (based on 54 studies) included in the review, risk of bias was
mostly unable to be determined, or determined to be high (Figure 1a, online supplement).
Only 10 studies employed representative samples. Four studies pre-specified the cutoff scores
on the scales to determine ‘caseness’, with most studies calculating these values post-hoc (n =
39). Of the 33 studies that used a diagnostic interview to determine the discriminative validity
of the symptom screening scale, 24 used structured or semistructured clinical interviews
based on ICD-10 or DSM criteria, however only 14 of these stated that the interviewers were
blinded to the scale scores when administering the interview. The 9 studies that used
undefined clinical interviews that were not based on ICD-10 or DSM diagnostic criteria were
excluded from the analyses. Appropriate lag time between the index test (symptom screening
15
scale) and reference test (diagnostic interview) was deemed to be within one week (Whiting
et al., 2011a), and 11 studies fulfilled this criterion.
Commonly used depression scales for children and adolescents
Children’s Depression Inventory (CDI)
The CDI is the child version of Beck Depression Inventory (BDI), designed for use with
children aged 7-18 years (Kovacs, 1984). The scale is self report, comprises 27 items with
three-point answers and takes approximately 10-20 minutes to complete and score (Kovacs,
1984). The period of assessment is the prior two weeks, and the range of possible scores is 0
to 54. The CDI was specifically designed for and tested among children and adolescents and
includes a corresponding parent version. It has a good track record in depression research as
it is one of the most widely used and studied scale of youth depression, with normative data
available for children and adolescents (Kovacs, 1984). The CDI purports to possess a five
factor structure, however one of the factors relates to externalising disorders (‘acting out’)
and one to anxiety, resulting in uncertainty around the validity of depression construct (Myers
and Winters, 2002b).
Reliability, validity and diagnostic utility coefficients for the CDI among clinical and
nonclinical samples nonclinicalof children and adolescents identified in our review are
summarised in Table 1, and Figure 2. The 21 identified studies comprised a total of 25 data
points (n = 8371). Of these, 14 studies (18 data points) provided details of the scales’ internal
reliability (Allgaier et al., 2012, Carey et al., 1987, Crowley and Emerson, 1996, Crowley et
al., 1994, Figueras Masip et al., 2010, Fundudis et al., 1991, Giannakopoulos et al., 2009,
Nelson III et al., 1987, Reynolds et al., 1985, Saylor et al., 1984, Smucker et al., 1986,
16
Sorensen et al., 2005, Thompson et al., 2012, Weiss et al., 1991). The pooled estimate for
internal reliability was 0.86 (18 data points, n = 7372, 95% confidence interval (CI): 0.84 to
0.88) and was classified as ‘good’ (Figure 2), however heterogeneity was high (I2 = 76%).
Metaregression showed that sample type (clinical vs. nonclinical; t(1, 15) = 0.50, p = 0.62,
adjusted R2 <0), sample age (child vs adolescent; t(1,15) = 0.31, p = 0.76, R2 <0) and risk of
bias quality score (t(1, 15) = 1.61, p = 0.13, R2 = 67.38%) did not significantly affect the
internal reliability values on the CDI.
Nine studies (10 data points) examined the discriminative validity of the CDI (Allgaier et al.,
2012, Figueras Masip et al., 2010, Fristad et al., 1988, Fundudis et al., 1991, Roelofs et al.,
2010, Shemesh et al., 2005, Timbremont et al., 2004, Sorensen et al., 2005, Craighead et al.,
1995) with cutoff scores ranging from 11 to 19 (M = 14.5, SD = 2.67). Of these, seven used
structured or semi structured clinical interviews to determine presence of MDD. All seven
studies examined the specificity and sensitivity of the CDI. Bivariate meta analyses revealed
that sensitivity was ‘good’ overall (7 data points, n = 1432, pooled estimate: 0.83, 95% CI:
0.77 to 0.89), as was specificity (7 data points, n = 1432, pooled estimate: 0.84, 95% CI: 0.77
to 0.92), however heterogeneity was high (I2 = 91% and 96% respectively). Metaregression
showed that the sample type (clinical vs. nonclinical; t(1, 6) = 0.59, p = 0.58, R2 < 0), sample
age (child vs adolescent; t(1, 6)= 1.05, p = 0.33, R2 < 0), cutpoint used (t (1, 6) = 2.21, p =
0.06, R2 = 38.68%) and risk of bias quality score (t (1, 6) = 0.56, p = 0.59, R2 < 0) did not
significantly affect sensitivity values on the CDI. Six studies examined the PPV of the CDI,
all of which were conducted among clinical samples, and most were classified as ‘low’, with
values of 0.28 (Allgaier et al., 2012), 0.35 (Roelofs et al., 2010), 0.38 (Shemesh et al., 2005),
0.21 (Sorensen et al., 2005), 0.63 (Timbremont et al., 2004) and 0.90 (Fristad et al., 1988).
Five studies examined the NPV of the CDI, and as per the PPV values, all were conducted
among clinical samples. NPV values were mostly classified as ‘excellent’, with values of 1.0
17
(Roelofs et al., 2010), 0.98 (Allgaier et al., 2012), , 0.98 (Timbremont et al., 2004), 0.94
(Shemesh et al., 2005) and 0.63 (Sorensen et al., 2005). Four studies examined the diagnostic
accuracy of the CDI using AUC analyses, all of which were conducted among clinical
samples (Allgaier et al., 2012, Roelofs et al., 2010, Sorensen et al., 2005, Timbremont et al.,
2004), and overall accuracy was classified as ‘high’ (4 data points, n = 1146, pooled estimate:
0.90, 95% CI: 0.79 to 0.98; Figure 2), however heterogeneity was high I2=95%. There was
insufficient data to conduct a metaregression on AUC values for the CDI.
Beck Depression Inventory (BDI)
The BDI was originally developed in 1961 as a depression symptom rating scale for the adult
population (Beck and Alford, 2009). The original inventory comprises 21 items, with four (or
five) statement responses representing how the respondent has been feeling during the past
week and current day. The statements are presented in order of increasing severity and are
scored from 0 to 3 (alternative statements sharing the same score are sometimes provided).
Six of the items refer to vegetative symptoms; the remaining 15 items refer to either affective
or cognitive symptoms. The range of possible scores is 0 to 63 (Beck et al., 1961). The scale
is widely used among both adults and adolescents and has a strong track record in depression
research (Myers and Winters, 2002b). The scale lacks items pertaining to school and does not
offer parallel parent or teacher rating forms, and as such, its use among younger children may
not be suitable (Kovacs and Beck, 1977). There are several alternate versions of the BDI. The
BDI-1A is a revision of the original BDI (Beck et al., 1996), also containing 21 items but
with refined response formats and a defined assessment period of the prior two weeks. The 21
item BDI-II is a further revision of the BDI, developed in 1996 to align with the DSM-IV
criteria for depression (Beck et al., 1996).
18
Reliability, validity and diagnostic utility coefficients for the BDI among clinical and
nonclinical samples of children and adolescents are summarised in Table 2 and Figure 3. The
17 identified studies comprised a total of 20 data points (n = 5464). Of these, 11 studies (12
data points) examined the scale’s internal reliability (Adewuya et al., 2007, Ambrosini et al.,
1991, Barrera and Garrison-Jones, 1988, Dolle et al., 2012, Jolly et al., 1994, Krefetz et al.,
2002, Kumar et al., 2002, Roberts et al., 1991, Strober et al., 1981, Teri, 1982, Kashani et al.,
1990). The pooled estimate for internal reliability was 0.86 (12 data points, n = 4152, 95%
CI: 0.81 to 0.90) and was classified as ‘good’, and heterogeneity was moderate (I2 = 72%)
(Figure 3). Metaregression showed that sample type (clinical vs. nonclinical; t (1, 9) = 1.86, p
= 0.09, R2 = 32.17%), sample age (child vs adolescent; t (1, 9) = 0.88, p = 0.40, R2 = 73.52%)
and risk of bias quality score (t (1, 9) = 0.70, p = 0.50, R2 < 0) did not significantly affect the
internal reliability values on the BDI.
Fifteen studies (18 data points) examined the discriminative validity of the BDI (Adewuya et
al., 2007, Ambrosini et al., 1991, Barrera and Garrison-Jones, 1988, Bennett et al., 1997,
Blom et al., 2010, Canals et al., 2001, Dolle et al., 2012, Kashani et al., 1990, Krefetz et al.,
2002, Kumar et al., 2002, Marton et al., 1991, Roberts et al., 1991, Russell et al., 2012,
Strober et al., 1981, Whitaker et al., 1990), with cutoff scores ranging from 11 to 24. Of the
13 studies that used a structured or semi-structured clinical interview, 12 studies (13 data
points) examined the scales’ sensitivity and specificity (Adewuya et al., 2007, Ambrosini et
al., 1991, Barrera and Garrison-Jones, 1988, Bennett et al., 1997, Canals et al., 2001, Dolle et
al., 2012, Kashani et al., 1990, Marton et al., 1991, Roberts et al., 1991, Russell et al., 2012,
Strober et al., 1981, Whitaker et al., 1990). Bivariate meta analyses revealed that sensitivity
was ‘good’ overall (13 data points, n = 4597, pooled estimate: 0.81, 95% CI: 0.74 to 0.87), as
was specificity (13 data points, n = 4597, pooled estimate: 0.81, 95% CI: 0.75 to 0.88; Figure
3), however heterogeneity was high (I2 = 94% and 97% respectively). Metaregression showed
19
that the sample type (clinical vs. nonclinical; t(1, 12) = 0.73, p = 0.48, R2 < 0), sample age
(child vs adolescent; t(1, 12) = 0.85, p = 0.41, R2 < 0), cutpoint used (t(1, 12) = 0.13, p =
0.90, R2 < 0) and risk of bias quality score (t(1, 12) = 1.32, p = 0.37, R2 = 28.33%) did not
significantly affect sensitivity values on the BDI. Similar results were found for specificity,
with no impact of sample type (t(1, 12) = 0.86, p = 0.40, R2 < 0), age (t(1, 12) = 0.83, p =
0.42, R2 < 0), cutpoint used (t(1, 12) = 0.94, p = 0.37, R2 < 0) or risk of bias quality score (t(1,
12) = 0.39, p = 0.70, R2 < 0).
Eight studies examined the PPV of the BDI, half of which were conducted with clinical
samples and half in non-clinical samples, with significant heterogeneity between studies.
PPV values were higher overall in clinical samples, with values of 0.79 (Marton et al., 1991),
0.81 (Bennett et al., 1997), 0.81 (Dolle et al., 2012), and 0.93 (Ambrosini et al., 1991) and
were lower, however more varied in nonclinical samples, with values of 0.10 (Roberts et al.,
1991), 0.14 (Russell et al., 2012), 0.47 (Canals et al., 2001) and 0.88 (Adewuya et al., 2007)
(Figure 3). Five studies examined the NPV of the BDI, with values of 0.80 (Bennett et al.,
1997), 0.95 (Dolle et al., 2012), 0.97 (Russell et al., 2012), 0.98 (Adewuya et al., 2007) 0.99
(Canals et al., 2001) and 0.99 (Roberts et al., 1991) (Figure 3). Six studies (7 data points)
used AUC analyses to examine the diagnostic accuracy of the BDI (Adewuya et al., 2007,
Blom et al., 2010, Dolle et al., 2012, Kashani et al., 1990, Roberts et al., 1991, Russell et al.,
2012), and overall accuracy was classified as ‘high’ (7 data points, n = 3249, pooled
estimate: 0.92, 95% CI: 0.81 to 1.00). Metaregression showed that the sample type (clinical
vs. nonclinical; t(1, 6) = 0.39, p = 0.71, R2 < 0), sample age (child vs adolescent; t(1, 6) =
0.09, p = 0.93, R2 <0), cutpoint used (t(1, 6) = 0.17, p = 0.87, R2 < 0) and risk of bias quality
score (t(1, 6) = 1.44, p = 0.19, R2 = 14.16%) did not significantly affect AUC values on the
BDI.
20
Center for Epidemiologic Studies Depression Scale (CES-D)
The CES-D is a 20-item self-report questionnaire designed to detect depression in the general
population. The items address six symptom areas of depression including depressed mood,
feelings of guilt/worthlessness, helplessness, psychomotor retardation, loss of appetite and
sleep disturbance (Radloff, 1977). Each of the 20 items is rated on a scale from 0 to 3, with
total scores ranging from 0-60. The period of assessment is the past week. The CES-DC is the
child version of the original CES-D and retains the same number of items and response
format; however language has been adjusted to suit a child reading level (Weissman et al.,
1980).
Reliability, validity and diagnostic utility coefficients for the CES-D among clinical and
nonclinical samples of children and adolescents are summarised in Table 3 and Figure 4.
Nine studies (10 data points) examined the internal reliability of the original full scale CES-D
(Aebi et al., 2009, Betancourt et al., 2012, Cuijpers et al., 2008, Fendrich et al., 1990,
Garrison et al., 1991, Logsdon and Myers, 2010, Roberts et al., 1991, Thrane et al., 2004).
The pooled estimate for internal reliability was 0.88 (10 data points, n = 9006, 95% CI: 0.84
to 0.92, Figure 4) was classified as ‘good’, however heterogeneity was high (I2 = 92%).
Metaregression showed that sample type (clinical vs. nonclinical; t(1, 7) = 0.87, p = 0.41, R2
< 0), sample age (child vs adolescent; t(1, 7) = 0.79, p = 0.45, R2 = 2.75%) and risk of bias
quality score (t(1, 7) = 0.44, p = 0.67, R2 < 0) did not significantly affect the internal
reliability values on the CES-D.
Nine studies (12 data points) examined the discriminative ability of the CES-D (Aebi et al.,
2009, Betancourt et al., 2012, Cuijpers et al., 2008, Fendrich et al., 1990, Garrison et al.,
1991, Logsdon and Myers, 2010, Prescott et al., 1998, Roberts et al., 1991, Yang et al.,
21
2004), with cutoff scores ranging from 12 to 24. Of these, eight studies used structured or
semi structured clinical interviews to determine a diagnosis of depression. Bivariate meta
analysis revealed that sensitivity coefficients were ‘moderate’ overall (9 data points, n =
9209, pooled estimate: 0.76, 95% CI: 0.67 to 0.84) as was specificity (9 data points, n =
9209, pooled estimate: 0.71, 95% CI: 0.63 to 0.79). Metaregression showed that sample type
(clinical vs. nonclinical; t(1, 7) = 0.08, p = 0.93, R2 < 0), sample age (child vs adolescent; t(1,
7) = 0.80, p = 0.45, R2 < 0) risk of bias quality score (t(1, 7) = 1.71, p = 0.13, R2 = 18.04%)
and cutpoint used (t (1, 7) = 0.70, p = 0.51, R2 < 0) did not significantly affect the sensitivity
values on the CES-D. Similar results were found for specificity, with no impact of sample
type (t(1, 7) = 0.12, p = 0.91, R2 < 0), age (t(1, 7) = 1.36, p = 0.21, R2 = 12.69%), risk of bias
quality score (t(1, 7) = 0.77, p = 0.46, R2 < 0) or cutpoint used (t(1, 7) = 2.12, p = 0.07, R2 =
32.62%).
Five studies (six data points) examined the PPV and NPV of the CES-D, with only one study
conducted among a clinical sample (Logsdon and Myers, 2010), and the remainder in
nonclinical samples (Fendrich et al., 1990, Garrison et al., 1991, Logsdon and Myers, 2010,
Prescott et al., 1998, Roberts et al., 1991). PPV was low overall, with values of 0.08 (Roberts
et al., 1991), 0.15 (Fendrich et al., 1990), 0.16 (Garrison et al., 1991), 0.24 (Prescott et al.,
1998), 0.25 (Logsdon and Myers, 2010) and 0.32 (Garrison et al., 1991). NPV was good
overall (with the exception of the one study conducted among a clinical sample where NPV =
0.12; (Logsdon and Myers, 2010)), and values were 0.96 (Fendrich et al., 1990, Prescott et
al., 1998), 0.98 (Garrison et al., 1991) and 0.99 (Roberts et al., 1991). Seven studies (nine
data points) examined diagnostic utility of the CES-D using AUC analyses (Betancourt et al.,
2012, Cuijpers et al., 2008, Garrison et al., 1991, Logsdon and Myers, 2010, Prescott et al.,
1998, Roberts et al., 1991, Yang et al., 2004), which was classified as ‘moderate’ overall (9
data points, n = 8989, pooled estimate: 0.83, 95% CI: 0.74 to 0.90, I2 = 99%). Metaregression
22
revealed that studies using higher cutpoints on the CES-D had higher diagnostic accuracy
(t(1, 8) = 5.1, p = 0.01, β = 1.01, R2 = 53.22%), with no significant effect found for sample
type (clinical vs. nonclinical; t(1, 8) = 1.54, p = 0.17, R2 = 12.88% ), sample age (child vs.
adolescent; t(1, 8) = 1.12, p = 0.30, R2 = 7.93%), and the risk of bias quality score (t(1, 8) =
0.69, p = 0.51, R2 < 0).
Reynolds Adolescent Depression Scale (RADS)
The Reynolds Adolescent Depression Scale is a 30-item self report instrument developed in
1986 to assess depression symptom severity in adolescents (13-18 years) (Reynolds, 1986).
Responses are made on a four point scale indicating frequency (almost never, hardly ever,
sometimes, most of the time) of symptoms characteristics of depression. The scale assesses
cognitive, somatic, psychomotor, and interpersonal symptoms derived from the DSM-III,
with an assessment period of the past two weeks. The RADS takes approximately 10 minutes
to complete, and scores range from 30 to 120 (Walker et al., 2005a). The RADS-2 is an
updated version of the original RADS. It also contains 30 items, and addresses several
psychometric issues in the original RADS such as re-examining estimates of internal
consistency, criterion related validity, and known groups validity (Osman et al., 2010).
Scores across the reliability, validity and diagnostic utility for the RADS among clinical and
nonclinical samples of children and adolescents are summarised in Table 4. Six studies (7
data points) examined the internal reliability of the RADS (Boyd and Gullone, 1997, Krefetz
et al., 2002, Osman et al., 2010, Reynolds and Miller, 1985, Walker et al., 2005b, Weber and
Terhorst, 2010), which was classified as ‘excellent’ overall (7 data points, n = 11095, pooled
estimate: 0.93, 95% CI: 0.86 to 0.99), however heterogeneity was high (I2 = 92%).
Metaregression showed that sample type (clinical vs. nonclinical; t(1, 4) = 0.52, p = 0.62, R2
23
< 0), sample age (child vs adolescent; t(1, 4) = 1.10, p = 0.32, R2 = 5.56% ) and risk of bias
quality score (t(1, 4) = 0.66, p = 0.54, R2 < 0) did not significantly affect the internal
reliability values on the RADS.
Only two studies examined the discriminative validity of the full scale RADS (Krefetz et al.,
2002, Osman et al., 2010) using cutoff scores of 67 and 70, however neither study used a
structured or semi-structured clinical interview based on DSM or ICD-10 criteria to
determine a diagnosis of MDD, and thus meta analyses and metaregression for sensitivity,
specificity and AUC values were not conducted.
Overall judgement
Comparisons across scales using metaregression revealed no differences in the four key
measures of reliability and validity. Internal reliability was classified as ‘good’ overall (47
data points, n = 31593, pooled estimate: 0.89, 95% CI: 0.86 to 0.92, I2 = 93%), with
metaregression revealing no impact of scale type (CDI, BDI, CES-D, RADS; t(1, 44) = 2.54,
p = 0.07, R2 = 15.31%).sample (clinical vs. nonclinical; t(1, 44) = 0.21, p = 0.84, R2 < 0), age
(child vs. adolescent; t(1, 44) = 1.30, p = 0.20, R2 = 3.33%) or risk of bias quality score (t(1,
44) = 0.05, p = 0.96, R2 < 0). Sensitivity and specificity coefficients were ‘moderate’
(sensitivity: 29 data points, n = 15238, pooled estimate: 0.80, 95% CI: 0.76 to 0.84;
specificity: 29 data points, n = 15238, pooled estimate: 0.78, 95% CI: 0.74 to 0.83), with
metaregression revealing no impact of scale type (t(1, 29) = 1.36, p = 0.18, R2 = 3.06%),
sample type (t(1, 29 = 1.24, p = 0.22, R2 = 1.53%) age (t(1, 29) = 1.25, p = 0.22, R2 = 1.60%)
cutpoint (t(1, 29) = 0.55, p = 0.59, R2 < 0) or risk of bias quality score (t(1, 29) = 1.79, p =
0.08, R2 = 6.54% for sensitivity coefficients. Similar results were found for specificity, with
no difference on the basis of scale type (t(1, 29) = 1.65, p = 0.12, R2 = 16.12%) sample (t(1,
24
29) = 0.32, p = 0.75, R2 < 0), age (t(1, 29) = 1.23, p = 0.23, R2 = 1.41%), cutpoint (t(1, 29) =
0.99. p = 0.33, R2 < 0) or risk of bias quality score (t (1, 29) = 0.25, p = 0.80, R2 < 0). Overall
diagnostic accuracy was ‘moderate’ (20 data points, n = 13384, pooled estimate: 0.86, 95%
CI: 0.79 to 0.92, I2 = 98%), with metaregression revealing no impact of scale type (t(1, 20) =
1.52, p = 0.14, R2 =6.19%) , sample (t(1, 20) = 0.84, p = 0.81, R2 < 0) , age (t(1, 20) = 0.41, p
= 0.68, R2 < 0) cutpoint (t(1, 20) = 0.25, p = 0.80, R2 < 0) or risk of bias quality score (t (1,
20) = 0.97, p = 0.34, R2 < 0).
Discussion
This is the first systematic review and meta analysis to provide an objective judgement of the
reliability, validity and diagnostic utility of symptom screening scales used in prevention
research for MDD among children and adolescents. The review identified 54 studies
(published in 52 papers), of which 32 studies were published since the last major narrative
review was conducted in 2002 (Myers and Winters, 2002b). The outcomes of this review
indicate that commonly used symptom screening scales for MDD including the CDI, BDI,
CES-D and the RADS have good internal consistency when used among children and
adolescents; and metaregression revealed that these properties did not differ when a range of
factors were considered, including sample type (clinical vs. nonclinical), age (child vs.
adolescent) and the risk of bias quality score. While the ability of each scale to correctly
identify positive and negative cases (i.e. diagnostic utility) was moderate overall; positive
predictive power was poor across most scales, suggesting that using cutoff scores on these
scales to determine clinical levels of MDD may result in high misclassification rates,
particularly when used in nonclinical settings (such as schools). Importantly, cutoff scores to
determine ‘caseness’ varied widely, and no single score was identified to be applicable in
both clinical and community settings, for any scale. However, metaregression revealed that
25
studies using higher cutoff scores on the CES-D had higher diagnostic accuracy (AUC) than
those with lower scores. Regardless, if researchers or clinicians choose to employ cutoff
scores on commonly used symptom screening scales for MDD to determine clinical
‘caseness’, they will most likely need to be adjusted to suit the sample under investigation.
Limitations
There are several limitations to consider when interpreting the outcomes of this review.
Firstly, risk of bias for the included studies was largely unable to be determined, or
determined to be high due to insufficient reporting. Secondly, many studies adjusted the
cutoff score to determine clinical caseness post hoc in order to maximise sensitivity and
specificity values, which may have artificially increased the diagnostic accuracy results using
AUC analyses, as these values are based on a trade off of sensitivity and specificity.
Consequently, the results regarding diagnostic utility identified in this review may differ in
real world settings where specific cutpoints are typically chosen a priori to classify
participants as having clinically significant depressive symptoms. Further, such cutpoints are
likely to vary substantially on a study by study basis and depending on the context in which
the scale is used (e.g. schools versus clinics). Thirdly, positive and negative predictive values
are dependent on the prevalence of the disorder in the sample (Trikalinos et al., 2012), and as
such, the low positive predictive values identified in this review may reflect a low prevalence
of depression among some samples; however the small number of studies and significant
heterogeneity between them precluded the use of metaregression to determine factors which
may have influenced predictive power in this review. Where metaregression was conducted,
it is possible that some analyses were underpowered due to low numbers of studies with
adequate data, as evidenced by low adjusted R2 values for some outcomes. A greater number
26
of high quality studies examining the psychometric properties of these scales across a range
of settings and ages is needed in in order to identify circumstances in which performance of
these scales is likely to differ. Further, while our review considered internal reliability of the
scales, other measures of reliability that may impact scale utility, including test retest and
inter rater reliability were less consistently reported, precluding the conduct of meta analyses
and metaregression on these outcomes. Finally, it is also important to consider that while the
criterion standards employed throughout this review (i.e. diagnostic interview schedules
based on DSM or ICD-10 diagnostic criteria such as the K-SADs) are currently the gold
standard in detecting the presence of a depressive disorder (Carlisle and McClellan, 2009),
they are not perfect measures, and are also open to the same issues of reliability and validity
as the screening scales themselves (Hodges, 1993).
There has been recent criticism of binary classification systems such as the DSM, which take
the assumption that people should be classified as either having or not having a particular
psychopathological disorder (Hankin et al., 2005), with suggestions that depression may
occur on a continuum (Solomon et al., 2001), particularly so in youth (Hankin et al., 2005). In
light of this, other outcomes in depression treatment that take into account the disability
caused by the disorder have been considered, such as functional impairment (McKnight and
Kashdan, 2009). Recent research suggests that indicators of disability or functional
impairment, in the absence of a formal diagnosis of a mental disorder may be a useful
measure of need for mental health services and treatment. An analysis of the 2007 Australian
Survey of Mental Health and Wellbeing found that a significant proportion of those who had
used mental health services in the past 12 months did not have a formally diagnosed mental
disorder, but had other indicators of possible need, including maintenance treatment
following a previous episode or significant psychological distress in the absence of a
diagnosis (Harris et al., 2014). Thus, in addition to using classification systems such as ICD
27
and DSM to determine mental disorder diagnoses and symptom screening scales to measure
symptom severity, researchers examining the efficacy of depression prevention and
treatment interventions should also consider including items assessing functional impairment,
as this may be a useful indicator of disability and need for treatment, even if a formal mental
diagnosis is not given (Üstün and Kennedy, 2009).
While there has been some suggestion that routine screening of MDD symptoms may
improve the detection of, and initiate timely treatment for MDD among children and
adolescents at a population level (Thombs et al., 2012), the impact of routine screening for
MDD on health outcomes for children and adolescents is unclear (Williams et al., 2009), and
there is some concern that screening may have the potential to cause harm to those who may
be diagnosed and treated inappropriately (MacMillan et al., 2005). Reviews conducted by the
Canadian Task Force on Preventive Health Care, and the United Kingdom’s National
Institute of Health and Clinical Excellence both concluded that there was insufficient
evidence regarding health outcomes to support the routine screening of MDD among children
and adolescents in primary healthcare settings (MacMillan et al., 2005, National
Collaborating Centre for Mental Health, 2005). Unfortunately there have been few large
prospective cohort studies that have examined the ability of MDD screening scales in
predicting the onset of depression among adolescents in order to inform their routine use in
healthcare settings (van Lang et al., 2007). Interestingly such studies have indicated that
recurrent screening of depressive symptoms does not improve detection of MDD, but that
cognitive and physical symptoms (such as sleeping problems) may be better predictors.
(McKenzie et al., 2011, van Lang et al., 2007). For example, McKenzie et al, 2011 identified
that particular items of the Short Mood and Feelings Questionnaire (SMFQ) that are not
included in formal diagnostic criteria for MDD, such as “I hated myself” were most
predictive of high depressive symptoms 12 months later. Such single items may have utility
28
for being incorporated into short screening measures, or existing population level surveys -
many of which exclude mental disorder symptomatology (Baxter et al., 2013), however
further research using large cohort studies is needed to support this.
Conclusions
Symptom screening scales for MDD are reliable measures of MDD symptomatology among
adolescent samples. While we found that the psychometric properties of the scales did not
differ when used in clinical and nonclinical settings, across a range of subject ages and with
variable study quality, limited data was available, and further high quality studies are needed
to identify conditions in which test performance may vary. Cutoff scores on symptom
screening scales for MDD may be useful for identifying ‘risk status’, however, the cutoffs
used will likely vary depending on the context in which the scale is applied, and
misclassification is likely to be high, particularly in nonclinical samples where disorder
prevalence is low (e.g. school settings).Further research examining the predictive ability of
depression screening scales in cohort studies of children and adolescents are needed.
29
References
Adewuya, A. O., Ola, B. A. & Aloba, O. O. 2007. Prevalence of major depressive disorders
and a validation of the Beck Depression Inventory among Nigerian adolescents. European Child & Adolescent Psychiatry, 16, 287-292.
Aebi, M., Metzke, C. W. & Steinhausen, H. C. 2009. Prediction of major affective disorders in adolescents by self-report measures. J Affect Disord, 115, 140-9.
Allgaier, A. K., Fruhe, B., Pietsch, K., Saravo, B., Baethmann, M. & Schulte-Korne, G. 2012. Is the Children's Depression Inventory Short version a valid screening tool in pediatric care? A comparison to its full-length version. Journal of psychosomatic research, 73, 369-74.
Ambrosini, P. J., Metz, C., Bianchi, M. D., Rabinovich, H. & Undie, A. 1991. Concurrent validity and psychometric properties of the Beck Depression Inventory in outpatient adolescents. J Am Acad Child Adolesc Psychiatry, 30, 51-7.
Andrews, G., Szabo, M. & Burns, J. 2002. Preventing major depression in young people. Brit J Psychiat, 181, 460-462.
Andrews, J. A., Lewinsohn, P. M., Hops, H. & Roberts, R. E. 1993. Psychometric properties of scales for the measurement of psychosocial variables associated with depression in adolescence. Psychol Rep, 73, 1019-1046.
Barrera, M., Jr. & Garrison-Jones, C. V. 1988. Properties of the Beck Depression Inventory as a screening instrument for adolescent depression. Journal of abnormal child psychology, 16, 263-73.
Baxter, A. J., Patton, G., Scott, K. M., Degenhardt, L. & Whiteford, H. A. 2013. Global epidemiology of mental disorders: what are we missing? PLoS One, 8, e65514.
Beck, A. T. & Alford, B. A. 2009. Depression: Causes and treatment, University of Pennsylvania Press.
Beck, A. T., Steer, R. A., Ball, R. & Ranieri, W. F. 1996. Comparison of Beck Depression Inventories-IA and-II in psychiatric outpatients. J Pers Assess, 67, 588-597.
Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. & Erbaugh, J. 1961. An inventory for measuring depression. Arch Gen Psychiatry, 4, 561.
Bennett, D. S., Ambrosini, P. J., Bianchi, M., Barnett, D., Metz, C. & Rabinovich, H. 1997. Relationship of Beck Depression Inventory factors to depression among adolescents. J Affect Disord, 45, 127-34.
Betancourt, T., Scorza, P., Meyers-Ohki, S., Mushashi, C., Kayiteshonga, Y., Binagwaho, A., Stulac, S. & Beardslee, W. R. 2012. Validating the center for epidemiological studies depression scale for children in Rwanda. Journal of the American Academy of Child & Adolescent Psychiatry, 51, 1284-1292.
Birmaher, B., Ryan, N. D., Williamson, D. E., Brent, D. A., Kaufman, J., Dahl, R. E., Perel, J. & Nelson, B. 1996. Childhood and adolescent depression: a review of the past 10 years. Part I. J Am Acad Child Adolesc Psychiatry, 35, 1427-39.
Blom, E. H., Larsson, J.-O., Serlachius, E. & Ingvar, M. 2010. The differentiation between depressive and anxious adolescent females and controls by behavioural self-rating scales. J Affect Dis, 122, 232-240.
Boyd, C. P. & Gullone, E. 1997. An investigation of negative affectivity in Australian adolescents. Journal of Clinical Child Psychology, 26, 190-197.
Brent, D. A., Kalas, R., Edelbrock, C., Costello, A. J., Dulcan, M. K. & Conover, N. 1986. Psychopathology and its relationship to suicidal ideation in childhood and adolescence. J Am Acad Child Psychiatry, 25, 666-73.
30
Brooks, S. J. & Kutcher, S. 2001. Diagnosis and measurement of adolescent depression: a review of commonly utilized instruments. J Child Adolesc Psychopharmacol, 11, 341-76.
Brooks, S. J. & Kutcher, S. 2003. Diagnosis and measurement of anxiety disorder in adolescents: a review of commonly used instruments. J Child Adolesc Psychopharmacol, 13, 351-400.
Canals, J., Bladé, J., Carbajo, G. & Domènech-Llabería, E. 2001. The Beck Depression Inventory: Psychometric characteristics and usefulness in nonclinical adolescents. European Journal of Psychological Assessment, 17, 63-68.
Carey, M. P., Faulstich, M. E., Gresham, F. M., Ruggiero, L. & Enyart, P. 1987. Children's Depression Inventory: Construct and discriminant validity across clinical and nonreferred (control) populations. J Consult Clin Psychol, 55, 755.
Carlisle, L. & McClellan, J. 2009. Diagnostic interviews. In: DULCAN, M. K. (ed.) Textbook
of Child and Adolescent Psychiatry. Washington, DC: American Psychiatric Publishing.
Collett, B. R., Ohan, J. L. & Myers, K. M. 2003. Ten-Year Review of Rating Scales. V: Scales Assessing Attention-Deficit/Hyperactivity Disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 42, 1015-1037.
Costello, J. E., Erkanli, A. & Angold, A. 2006. Is there an epidemic of child or adolescent depression? J Child Psychol Psychiatry, 47, 1263-71.
Craighead, W. E., Curry, J. F. & Ilardi, S. S. 1995. Relationship of Children's Depression Inventory factors to major depression among adolescents. Psychol Assess, 7, 171.
Crowley, S. L. & Emerson, E. N. 1996. Discriminant validity of self-reported anxiety and depression in children: Negative affectivity or independent constructs? Journal of Clinical Child Psychology, 25, 139-146.
Crowley, S. L., Thompson, B. & Worchel, F. 1994. Validity Studies the Children'S Depression Inventory: A Comparison of Generalizability and Classical Test Theory Analyses. Educational and Psychological Measurement, 54, 705-713.
Cuijpers, P., Boluijt, P. & van Straten, A. 2008. Screening of depression in adolescents through the Internet : sensitivity and specificity of two screening questionnaires. European child & adolescent psychiatry, 17, 32-8.
Dolle, K., Schulte-Korne, G., O'Leary, A. M., von Hofacker, N., Izat, Y. & Allgaier, A. K. 2012. The Beck depression inventory-II in adolescent mental health patients: cut-off scores for detecting depression and rating severity. Psychiatry Res, 200, 843-8.
Fendrich, M., Weissman, M. M. & Warner, V. 1990. Screening for depressive disorder in children and adolescents: validating the Center for Epidemiologic Studies Depression Scale for Children. American journal of epidemiology, 131, 538-51.
Ferrari, A. J., Somerville, A. J., Baxter, A. J., Norman, R., Patten, S. B., Vos, T. & Whiteford, H. A. 2013. Global variation in the prevalence and incidence of major depressive disorder: a systematic review of the epidemiological literature. Psychol Med, 43, 471-81.
Figueras Masip, A., Amador-Campos, J. A., Gomez-Benito, J. & del Barrio Gandara, V. 2010. Psychometric properties of the Children's Depression Inventory in community and clinical sample. The Spanish journal of psychology, 13, 990-9.
Fristad, M. A., Weller, E. B., Weller, R. A., Teare, M. & Preskorn, S. H. 1988. Self-report vs. biological markers in assessment of childhood depression. J Affect Disord, 15, 339-45.
Fundudis, T., Berney, T. P., Kolvin, I., Famuyiwa, O. O., Barrett, L., Bhate, S. & Tyrer, S. P. 1991. Reliability and validity of two self-rating scales in the assessment of childhood depression. The British journal of psychiatry. Supplement, 36-40.
31
Garrison, C. Z., Addy, C. L., Jackson, K. L., McKeown, R. E. & Waller, J. L. 1991. The CES-D as a screen for depression and other psychiatric disorders in adolescents. J Am Acad Child Adolesc Psychiatry, 30, 636-41.
Giannakopoulos, G., Kazantzi, M., Dimitrakaki, C., Tsiantis, J., Kolaitis, G. & Tountas, Y. 2009. Screening for children's depression symptoms in Greece: the use of the Children's Depression Inventory in a nation-wide school-based sample. European Child & Adolescent Psychiatry, 18, 485-92.
Gore, F. M., Bloem, P. J. N., Patton, G. C., Ferguson, J., Joseph, V., Coffey, C., Sawyer, S. M. & Mathers, C. D. 2011. Global burden of disease in young people aged 10–24 years: a systematic analysis. Lancet, 377, 2093-2102.
Hankin, B. L., Fraley, R. C., Lahey, B. B. & Waldman, I. D. 2005. Is depression best viewed as a continuum or discrete category? A taxometric analysis of childhood and adolescent depression in a population-based sample. Journal of abnormal psychology, 114, 96-110.
Harris, M. G., Diminic, S., Burgess, P. M., Carstensen, G., Stewart, G., Pirkis, J. & Whiteford, H. A. 2014. Understanding service demand for mental health among Australians aged 16 to 64 years according to their possible need for treatment. Aust NZ J Psychiat.
Henderson, A. R. 1993. Assessing test accuracy and its clinical consequences: a primer for receiver operating characteristic curve analysis. Ann Clin Biochem, 30 ( Pt 6), 521-39.
Higgins, J., Thompson, S., Deeks, J. & Altman, D. 2003. Measuring inconsistency in meta-analyses. Brit Med J, 327, 557-560.
Hodges, K. 1993. Structured interviews for assessing children. J Child Psychol Psychiatry, 34, 49-68.
Hosman, C., Jane-Llopis, E. & Saxena, S. 2005. Prevention of mental disorders: effective interventions and policy options Oxford, Oxford University Press.
Jolly, J. B., Wiesner, D. C., Wherry, J. N., Jolly, J. M. & Dykman, R. A. 1994. Gender and the comparison of self and observer ratings of anxiety and depression in adolescents. J Am Acad Child Adolesc Psychiatry, 33, 1284-8.
Kashani, J. H., Sherman, D. D., Parker, D. R. & Reid, J. C. 1990. Utility of the Beck Depression Inventory with clinic-referred adolescents. J Am Acad Child Adolesc Psychiatry, 29, 278-82.
Kovacs, M. 1984. The Children's Depression, Inventory (CDI). Psychopharmacol Bull, 21, 995-998.
Kovacs, M. & Beck, A. T. 1977. An empirical-clinical approach toward a definition of childhood depression. Depression in childhood: Diagnosis, treatment, and conceptual models, 1-25.
Krefetz, D. G., Steer, R. A., Gulab, N. A. & Beck, A. T. 2002. Convergent validity of the Beck depression inventory-II with the reynolds adolescent depression scale in psychiatric inpatients. J Pers Assess, 78, 451-60.
Kumar, G., Steer, R. A., Teitelman, K. B. & Villacis, L. 2002. Effectiveness of Beck Depression Inventory-II subscales in screening for major depressive disorders in adolescent psychiatric inpatients. Assessment, 9, 164-70.
Leaf, P. J., Alegria, M., Cohen, P., Goodman, S. H., Horwitz, S. M., Hoven, C. W., Narrow, W. E., Vaden-Kiernan, M. & Regier, D. A. 1996. Mental health service use in the community and schools: results from the four-community MECA Study. Methods for the Epidemiology of Child and Adolescent Mental Disorders Study. J Am Acad Child Adolesc Psychiatry, 35, 889-97.
32
Leeflang, M. M., Deeks, J. J., Rutjes, A. W., Reitsma, J. B. & Bossuyt, P. M. 2012. Bivariate meta-analysis of predictive values of diagnostic tests can be an alternative to bivariate meta-analysis of sensitivity and specificity. Journal of clinical epidemiology, 65, 1088-97.
Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke, M., Devereaux, P. J., Kleijnen, J. & Moher, D. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ, 339.
Logsdon, M. C. & Myers, J. A. 2010. Comparative performance of two depression screening instruments in adolescent mothers. J Womens Health 19, 1123-8.
MacMillan, H. L., Patterson, C. J. S., Wathen, C. N. & Care, a. T. C. T. F. o. P. H. 2005. Screening for depression in primary care: recommendation statement from the Canadian Task Force on Preventive Health Care. Can Med Assoc J, 172, 33-35.
Marton, P., Churchard, M., Kutcher, S. & Korenblum, M. 1991. Diagnostic utility of the Beck Depression Inventory with adolescent psychiatric outpatients and inpatients. Canadian journal of psychiatry. Revue canadienne de psychiatrie, 36, 428-31.
McKenzie, D. P., Toumbourou, J. W., Forbes, A. B., Mackinnon, A. J., McMorris, B. J., Catalano, R. F. & Patton, G. C. 2011. Predicting future depression in adolescents using the Short Mood and Feelings Questionnaire: a two-nation study. J Affect Dis, 134, 151-9.
McKnight, P. E. & Kashdan, T. B. 2009. The importance of functional impairment to mental health outcomes: A case for reassessing our goals in depression treatment research. Clinical Psychology Review, 29, 243.
Merry, S., McDowell, H., Hetrick, S., Bir, J. & Muller, N. 2004. Psychological and/or educational interventions for the prevention of depression in children and adolescents. Cochrane Database Syst Rev, CD003380.
Merry, S. N., Hetrick, S. E., Cox, G. R., Brudevold-Iversen, T., Bir, J. J. & McDowell, H. 2011. Psychological and educational interventions for preventing depression in children and adolescents. Cochrane Database Syst Rev, CD003380.
Murphy, K. & Davidshofer, C. 1994. Psychological testing: principles and applications. Upper Saddle River, NJ: Prentice Hall.
Myers, K. & Winters, N. C. 2002a. Ten-year review of rating scales. I: overview of scale functioning, psychometric properties, and selection. J Am Acad Child Adolesc Psychiatry, 41, 114-22.
Myers, K. & Winters, N. C. 2002b. Ten-year review of rating scales. II: Scales for internalizing disorders. J Am Acad Child Adolesc Psychiatry, 41, 634-59.
National Collaborating Centre for Mental Health 2005. Depression in children and young people: Identification and management in primary, community and secondary care,
UK, NICE. Nelson III, W., Politano, P., Finch Jr, A., Wendel, N. & Mayhall, C. 1987. Children's
Depression Inventory: Normative data and utility with emotionally disturbed children. Journal of the American Academy of Child & Adolescent Psychiatry, 26, 43-48.
Osman, A., Gutierrez, P. M., Bagge, C. L., Fang, Q. & Emmerich, A. 2010. Reynolds adolescent depression scale-second edition: a reliable and useful instrument. J Clin Psychol, 66, 1324-45.
Pavuluri, M. & Birmaher, B. 2004. A practical guide to using ratings of depression and anxiety in child psychiatric practice. Current psychiatry reports, 6, 108-16.
Ponterotto, J. G. & Ruckdeschel, D. E. 2007. An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures. Percept Motor Skill, 105, 997-1014.
33
Prescott, C. A., McArdle, J. J., Hishinuma, E. S., Johnson, R. C., Miyamoto, R. H., Andrade, N. N., Edman, J. L., Makini, G. K., Jr., Nahulu, L. B., Yuen, N. Y. & Carlton, B. S. 1998. Prediction of major depression and dysthymia from CES-D scores among ethnic minority adolescents. J Am Acad Child Adolesc Psychiatry, 37, 495-503.
Radloff, L. S. 1977. The CES-D scale a self-report depression scale for research in the general population. Appl Psych Meas, 1, 385-401.
Reavley, N. J., Cvetkovski, S., Jorm, A. F. & Lubman, D. I. 2010. Help-seeking for substance use, anxiety and affective disorders among young people: results from the 2007 Australian National Survey of Mental Health and Wellbeing. Aust N Z J Psychiatry, 44, 729-35.
Reitsma, J., Rutjes, A., Whiting, P., Vlassov, V., Leeflang, M. & Deeks, J. 2009. Chapter 9: Assessing methodological quality. In: DEEKS JJ, B. P., GATSONIS C (ed.) Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version
1.0.0. Available from: http://srdta.cochrane.org/. The Cochrane Collaboration. Reynolds, W. M. 1986. A model for the screening and identification of depressed children
and adolescents in school settings. Prof School Psychol, 1, 117. Reynolds, W. M., Anderson, G. & Bartell, N. 1985. Measuring depression in children: a
multimethod assessment investigation. Journal of abnormal child psychology, 13, 513-26.
Reynolds, W. M. & Miller, K. L. 1985. Depression and learned helplessness in mentally retarded and nonmentally retarded adolescents: An initial investigation. Applied Research in Mental Retardation, 6, 295-306.
Roberts, R. E., Lewinsohn, P. M. & Seeley, J. R. 1991. Screening for adolescent depression: a comparison of depression scales. J Am Acad Child Adolesc Psychiatry, 30, 58-66.
Roelofs, J., Braet, C., Rood, L., Timbremont, B., van Vlierberghe, L., Goossens, L. & van Breukelen, G. 2010. Norms and screening utility of the Dutch version of the Children's Depression Inventory in clinical and nonclinical youths. Psychol Assess, 22, 866-77.
Russell, P. S., Basker, M., Russell, S., Moses, P. D., Nair, M. K. & Minju, K. A. 2012. Comparison of a self-rated and a clinician-rated measure for identifying depression among adolescents in a primary-care setting. Indian journal of pediatrics, 79 Suppl 1, S45-51.
Saylor, C. F., Finch, A. J., Jr., Spirito, A. & Bennett, B. 1984. The children's depression inventory: a systematic evaluation of psychometric properties. J Consult Clin Psychol, 52, 955-67.
Shemesh, E., Yehuda, R., Rockmore, L., Shneider, B. L., Emre, S., Bartell, A. S., Schmeidler, J., Annunziato, R. A., Stuber, M. L. & Newcorn, J. H. 2005. Assessment of depression in medically ill children presenting to pediatric specialty clinics. J Am Acad Child Adolesc Psychiatry, 44, 1249-57.
Smucker, M. R., Craighead, W. E., Craighead, L. W. & Green, B. J. 1986. Normative and reliability data for the Children's Depression Inventory. Journal of abnormal child psychology, 14, 25-39.
Solomon, A., Haaga, D. A. & Arnow, B. A. 2001. Is clinical depression distinct from subthreshold depressive symptoms? A review of the continuity issue in depression research. The Journal of nervous and mental disease, 189, 498-506.
Sorensen, M. J., Frydenberg, M., Thastum, M. & Thomsen, P. H. 2005. The Children's Depression Inventory and classification of major depressive disorder: validity and reliability of the Danish version. European child & adolescent psychiatry, 14, 328-34.
StataCorp 2013. Stata Statistical Software: Release 13. College Station, TX: StataCorp, LP.
34
Strober, M., Green, J. & Carlson, G. 1981. Utility of the Beck Depression Inventory with psychiatrically hospitalized adolescents. J Consult Clin Psychol, 49, 482-3.
Suzuki, T. 2011. Which rating scales are regarded as 'the standard' in clinical trials for schizophrenia? A critical review. Psychopharmacol Bull, 44, 18-31.
Teri, L. 1982. The use of the beck depression inventory with adolescents. Journal of abnormal child psychology, 10, 277-284.
Thombs, B. D., Roseman, M. & Kloda, L. A. 2012. Depression screening and mental health outcomes in children and adolescents: a systematic review protocol. Syst Rev, 1, 58.
Thompson, R. D., Craig, A. E., Mrakotsky, C., Bousvaros, A., DeMaso, D. R. & Szigethy, E. 2012. Using the Children's Depression Inventory in youth with inflammatory bowel disease: support for a physical illness-related factor. Comprehensive psychiatry, 53, 1194-9.
Thrane, L. E., Whitbeck, L. B., Hoyt, D. R. & Shelley, M. C. 2004. Comparing three measures of depressive symptoms among American Indian adolescents. American Indian and Alaska native mental health research (Online), 11, 20-42.
Timbremont, B., Braet, C. & Dreessen, L. 2004. Assessing depression in youth: relation between the Children's Depression Inventory and a structured interview. Journal of clinical child and adolescent psychology : the official journal for the Society of Clinical Child and Adolescent Psychology, American Psychological Association, Division 53, 33, 149-57.
Trikalinos, T., Balion, C., Coleman, C., Griffith, L., Santaguida, P. L., Vandermeer, B. & Fu, R. 2012. Chapter 8: Meta-Analysis of Test Performance When There Is a “Gold Standard”. In: CHANG, S., MATCHAR, D. & SMETANA, G. (eds.) Methods Guide
for Medical Test Reviews [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US).
Üstün, B. & Kennedy, C. 2009. What is “functional impairment”? Disentangling disability from clinical significance World Psychiatry, 8, 85-85.
van Lang, N. D. J., Ferdinand, R. F. & Verhulst, F. C. 2007. Predictors of future depression in early and late adolescence. J Affect Dis, 97, 137-144.
Walker, L., Merry, S., Watson, P. D., Robinson, E., Crengle, S. & Schaaf, D. 2005a. The Reynolds Adolescent Depression Scale in New Zealand adolescents. Aust NZ J Psychiat, 39, 136-140.
Walker, L., Merry, S., Watson, P. D., Robinson, E., Crengle, S. & Schaaf, D. 2005b. The Reynolds Adolescent Depression Scale in New Zealand adolescents. Australian & New Zealand Journal of Psychiatry, 39, 136-40.
Weber, S. & Terhorst, L. 2010. Does the currently accepted RADS-2 factor structure apply to LGBTIQ youth? Journal of Psychiatric & Mental Health Nursing, 17, 621-7.
Weiss, B., Weisz, J. R., Politano, M., Carey, M., Nelson, W. M. & Finch, A. J. 1991. Developmental differences in the factor structure of the Children's Depression Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 38.
Weissman, M. M., Orvaschel, H. & Padian, N. 1980. Children's symptom and social functioning self-report scales comparison of mothers' and children's reports. The Journal of nervous and mental disease, 168, 736-740.
Whitaker, A., Johnson, J., Shaffer, D., Rapoport, J. L., Kalikow, K., Walsh, B. T., Davies, M., Braiman, S. & Dolinsky, A. 1990. Uncommon troubles in young people: prevalence estimates of selected psychiatric disorders in a nonreferred adolescent population. Arch Gen Psychiatry, 47, 487-96.
Whiting, P. F., Rutjes, A. W., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., Leeflang, M. M., Sterne, J. A. & Bossuyt, P. M. 2011a. QUADAS-2: a revised tool
35
for the quality assessment of diagnostic accuracy studies. Annals of internal medicine, 155, 529-36.
Whiting, P. F., Rutjes, A. W. S., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., Leeflang, M. M. G., Sterne, J. A. C. & Bossuyt, P. M. M. 2011b. QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies. Annals of internal medicine, 155, 529-536.
Williams, S. B., O'Connor, E. A., Eder, M. & Whitlock, E. P. 2009. Screening for child and adolescent depression in primary care settings: A systematic evidence review for the US Preventive Services Task Force. Pediatrics, 123, e716-e735.
Yang, H. J., Soong, W. T., Kuo, P. H., Chang, H. L. & Chen, W. J. 2004. Using the CES-D in a two-phase survey for depressive disorders among nonreferred adolescents in Taipei: a stratum-specific likelihood ratio analysis. J Affect Disord, 82, 419-30.
36
Acknowledgements
The authors would like to acknowledge Professor Gavin Andrews (UNSW Australia) and Dr
Mary Lou Chatterton (Deakin University) for their assistance in developing ideas and
refining the concept of the manuscript.
37
Conflict of interest
All authors declare that they have no conflicts of interest.
38
Contributors
EAS, LD and CM conceived the study. EAS, YL and AL conducted the reviews and extracted
the data, and EAS and AL synthesised the data, with MH and GP contributing to
interpretation. All authors contributed to, and approved the final manuscript.
39
Role of funding source
The work was funded by the National Health and Medical Research Council of Australia
(NH&MRC Grant No. APP1041131).
40
Table 1. Validation evidence for the Children’s Depression Inventory (CDI) in child and adolescent samples
Source N Age
&
Gen
der
(%m, %f)
Sample
(location) Sca
le
na
me
(no. of ite
ms)
Reliabil
ity
Criterio
n
Cut
off
Sensiti
vity
Specifi
city
PP
V
NP
V
AU
C
Clinical
Allgaier et al. 2012
406
9-12 (56, 44)
Child hospital patients (Germany)
CDI (27)
α = 0.84
Kinder-DIPS
≥ 12
0.83
0.83
0.28
0.98
0.88
Figueras Masip et al. 2010a
102
10-18 (41, 59)
Child and adolescent clinical sample (Spain)
CDI (27)
α = 0.85
Clinical interview
19 0.95 0.96 - - -
Roelofs et al. 2010
511
7-18 (52, 48)
Child and adolescent clinical sample (Belgium)
CDI (27)
- KID-SCID
16 0.92 0.95 0.35
1.00
0.95
Shemesh et al. 2005
81 8-19 (42, 58)
Child and adolescent hospital patients (USA)
CDI (27)
- K-SADS-PL
11 0.80 0.70 0.38
0.94
-
Sorensen et al. 2005
149
8-13
Child and adolescent psychiatric patients (Denmark)
CDI (27)
α = 0.86
K-SADS-PL
12 0.63 0.64 0.21
0.63
0.72
Timbremont et al. 2004
80 8-18 (42, 58)
Child and adolescent psychiatric inpatients/outpatients, (Belgium, Netherlands)
CDI (27)
κ = 0.76
KID-SCID
16 0.84 0.94 0.63
0.98
0.86
Craighead et al. 1995
107
12-18 (54, 46)
Child and adolescent psychiatric inpatients (USA)
CDI (27)
- K-SADS-III-R
17 0.81 0.84 - - -
Weiss et al. 1991a
515
8-12 (73, 27)
Child in- and out- patients
CDI (27)
α = 0.86
- - - - - - -
Weiss et al. 1991b
768
13-16 (52, 48)
Adolescent in- and out- patients
CDI (27)
α = 0.88 - - - - - - -
Nelson III and Politano 1990
96 6-15 (61, 39)
Child and adolescent psychiatric inpatients (USA)
CDI (27)
Test-retest: 0.62
- - - - - - -
Fristad et al. 1988
98 6-13 Child and adolescent depressed inpatients, psychiatric inpatient controls,
CDI (27)
- DICA ≥15 0.89 0.65 0.90
- -
41
normal controls (USA)
Carey et al. 1987a
153
9-17 (75, 25)
Child and adolescent inpatients (USA)
CDI (27)
α = 0.83 - - - - - - -
Nelson III et al. 1987
535
6-17 (68, 32)
Child and adolescent psychiatric inpatients (USA)
CDI (27)
α = 0.86 - - - - - - -
Saylor et al. 1984a
105
7-16 (69, 31)
Child and adolescent inpatients (USA)
CDI (27)
Kuder-Richardson coefficient= 0.80
- - - - - - -
Non-
Clinical
Thompson et al. 2012
191
11-17 (53, 47)
Child and adolescent sample with inflammatory bowel disease (USA)
CDI (27)
α = 0.88 - - - - - - -
Figueras Masip et al. 2010b
1705
10-18 (46, 54)
Child and adolescent community sample (Spain)
CDI (27)
α = 0.82 Test-retest = 0.81
- - - - - - -
Giannakopoulos et al. 2009
538
8-12 (47, 53)
Child nation-wide school-based sample (Greece)
CDI (27)
α= 0.80; test-retest reliability ICC: 0.82 for girls and 0.62 for boys
- ≥ 15 - - - - -
Crowley and Emerson 1996
273
8-12 (51, 49)
Child school students (USA)
CDI (27)
α = 0.89 - - - - - - -
Crowley et al. 1994
164
11-16 (49, 51)
Child and adolescent community sample (USA)
CDI (27)
α = 0.86 - - - - - - -
Fundudis et al. 1991
93 8-16 (50, 50)
Children and adolescents of staff of a university child psychiatry department (USA)
CDI (27)
α = 0.88 Standardised Psychiatric Interview
15 0.77 0.77 - - -
Carey et al. 1987b
153
9-17 (74, 26)
Child and adolescent non-referred sample
CDI (27)
α = 0.75 - - - - - - -
Finch et al. 1987
108
7-12 (51, 49)
Child sample from public schools (USA)
CDI (27)
Test-retest: 0.82
- - - - - - -
42
Smucker et al. 1986
1252
8-16 (47, 53)
Child and adolescent sample from public schools (USA)
CDI (27)
α = 0.89 - - - - - - -
Reynolds et al 1985
166
8-12 (46, 54)
Child sample from public schools (USA)
CDI (27)
α = 0.90 - - - - - - -
Saylor et al. 1984b
72 10-13 (31, 69)
Child sample from public schools (USA)
CDI (27)
Kuder-Richardson coefficient= 0.94
- - - - - - -
Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis CDI = Children’s Depression Inventory α = Cronbach’s alpha reliability co-efficient κ = Cohen’s kappa reliability co-efficient Kinder-DIPS: The Diagnostic Interview for Psychiatric Disorders in Children and Adolescents KID-SCID: The child version of the Structured Clinical Interview for DSM Disorders K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological Version K-SADS-III-R: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – the revision of the third edition DICA: The Diagnostic Interview for Children and Adolescents
43
Table 2. Validation evidence for the Beck Depression Inventory (BDI) in child and adolescent samples
Source N Age &
Gender
(%m, %f)
Sample
(location)
Scal
e
nam
e
(no. of
items)
Reliabil
ity
Criterion Cuto
ff
Sensitiv
ity
Specific
ity
PP
V
NP
V
A
U
C
Clinic
al
Dolle et al. 2012
88 13-16 Adolescent psychiatric patients (Germany)
BDI-II (21)
α = 0.94 Kinder-DIPS
≥23 0.88 0.92 0.81
0.95 0.93
Blom et al. 2010
73 14-18 (0, 100)
Adolescent girls who were psychiatric patients (Sweden)
BDI-A1 (21)
- DAWBA >14 - - - - 0.86
Krefetz et al. 2002
100
12-17 (44, 56)
Child and adolescent psychiatric inpatients (USA)
BDI-II (21)
α = 0.92 PRIME-MD
24 0.74 0.70 0.76
0.67 0.78
Kumar et al. 2002
100
12-17 (45, 55)
Child and adolescent psychiatric inpatients (USA)
BDI-II (21)
α = 0.94 PRIME-MD
21 0.85 0.83 0.85
0.83 0.92
Bennett et al. 1997
328
11-19 (41.5, 58.5)
Child and adolescent psychiatric inpatients and outpatients of a clinic for depression (USA)
BDI (21)
- K-SADS 13 0.87 0.71 0.81
0.80 -
Jolly et al. 1994
75 12-17 (47, 53)
Child and adolescent inpatient
BDI (21)
α = 0.88 - - - - - - -
44
s (USA)
Ambrosini et al. 1991
122
12-19 (43, 57)
Child and adolescent outpatients referred to a clinic for depression (USA)
BDI (21)
α = 0.91 K-SADS-III-R
16 0.86 0.82 0.93
- -
Marton et al, 1991
119
14-19 (42.9; 57.1)
Adolescent inpatients and outpatients of a children's mental health centre with a current DSM-III Axis I psychiatric disorder (Depressed, n = 60 vs Non-Depressed, n = 59; USA)
BDI (21)
- DICA 16 0.68 0.81 0.79
- -
Barrera and Garrison-Jones 1988a
65 12-17 (48,52)
Child and adolescent psychiatric inpatients (USA)
BDI (21)
α = 0.86 CAS 11 0.82 0.53 - - -
Strober et al. 1981
78 12-16 (46, 54)
Child and adolescent patients (USA)
BDI (21)
α = 0.79 SADS 16 0.81 0.81 - - -
Non-Clinical
Russell et al. 2012
181
14-17 Adolescents from three schools (India)
BDI (21)
κ = 0.06 K-SADS-PL
18 0.63 0.70 0.14
0.97 0.67
Adewuya et al. 2007
1095
13-18 (58, 42)
Adolescents attending secondary school (Nigeria)
BDI (21)
α = 0.82 K-SADS-E
18 0.91 0.97 0.88
0.98 0.99
45
Canals et al. 2001
304
17.5-18.5 (50.3,49.7)
Adolescents from an urban commercial area (Spain)
BDI (21)
- SCAN 16 0.90 0.96 0.47
0.99 -
Roberts et al. 1991a
1710
14-18 (47, 53)
Adolescents from senior high schools (USA)
BDI (21)
α = 0.88 K-SADS 11 0.84 0.81 0.10
0.995
-
Roberts et al. 1991b
804
14-18 (100,0)
Male adolescent sample from senior high schools (USA)
BDI (21)
- K-SADS 15 - - - - 0.93
Roberts et al. 1991c
906
14-18 (0,100)
Female adolescent sample from senior high schools (USA)
BDI (21)
- K-SADS 11 - - - - 0.83
Kashani et al. 1990
102
13-18 (53, 47)
Adolescents who attended an outpatient “counselling center” (USA)
BDI (21)
α = 0.87 DICA 16 0.48 0.87 - - 0.79
Whitaker et al 1990
356
14-17 (45.8; 45.5)
Adolescents (grades 8-12) from 8 schools in New Jersey county (USA)
BDI (21)
- Semi-structured clinical interview based on DSM-III criteria administered by child psychiatrist or health professional
16 0.80 0.65 - - -
Barrera and Garrison-Jones 1988b
49 12-18 (45; 55)
Child and adolescent secondary school students (USA)
BDI (21)
α = 0.90 CAS 16 1.00 0.93 - - -
Teri 1982
568
14-17 (40, 60)
High school students (USA)
BDI (21)
α = 0.87 - - - - - -
46
Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis BDI = Beck Depression Inventory α = Cronbach’s alpha reliability co-efficient κ = Cohen’s kappa reliability co-efficient Kinder-DIPS: The Diagnostic Interview for Psychiatric Disorders in Children and Adolescents DAWBA: The Development and Wellbeing Assessment PRIME-MD: The Primary Care Evaluation of Mental Disorders K-SADS: The Schedule for Affective Disorders and Schizophrenia for School-Age Children K-SADS-III-R: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – the revision of the third edition CAS: The Child Assessment Schedule SADS: The Schedule for Affective Disorders and Schizophrenia K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological Version DICA: The Diagnostic Interview for Children and Adolescents SCAN: Schedules for Clinical Assessment in Neuropsychiatry
47
Table 3. Validation evidence for the Center for Epidemiologic Studies Depression Scale
(CES-D) in child and adolescent samples
Source N Age
&
Gend
er
(%m, %f)
Sample
(location) Scal
e
nam
e
(no. of items)
Reliabil
ity
Criteri
on
Cuto
ff
Sensitiv
ity
Specific
ity
PP
V
NP
V
AU
C
Clinical
Logsdon and Myers 2010
59
13-18 (0, 100)
Adolescent mothers at 4-6 weeks postpartum (USA)
CES-D (20)
α = 0.84 K-SADS-PL
16 0.7 0.52 0.25
0.12
0.62
Aebi et al. 2009
140
Mean: 15.5 (33, 67)
Adolescents diagnosed with major depressive disorders (Switzerland)
CES-D (20)
α = 0.83 Clinical interview
21 0.86 0.86 - - 0.94
Non-
Clinical
Betancourt et al. 2012
367
10-17 (33, 67)
Children and adolescents (Rwanda)
CES-DC (20)
α = 0.86 MINI-KID
≥30 0.82 0.72 - - 0.83
Cuijpers et al. 2008
1392
14-16 (52, 48)
Adolescents (Netherlands)
CES-D (20)
α = 0.93 MINI 22 0.9 0.74 - - 0.90
Thrane et al. 2004
213
9-16 (54, 46)
Adolescents from three American Indian reservations (USA)
CES-D (20)
α = 0.80 -
- - - - - -
Yang et al. 2004
2440
12-16 (52, 48)
Adolescents (Taiwan)
CES-D (20)
α = 0.9 K-SADS-E
90th %tile
0.41 0.9 - - 0.9
Prescott et al. 1998
556
Mean: 16.8
Adolescent students from grades 9-12 (USA)
CES-D (20)
- DISC 16 0.79 0.74 0.24
0.96
0.74
Garrison et al. 1991a
1231
12-14 (100, 0)
Child and adolescent boys from school sample (USA)
CES-D (20)
α = 0.81 K-SADS-P
12 0.85 0.49 0.16
0.98
0.61
Garrison et al.1991b
1234
12-14 (0, 100)
Child and adolescent girls from a school
CES-D (20)
α = 0.86 K-SADS-P
22 0.83 0.77 0.32
0.98
0.77
48
sample (USA)
Roberts et al. 1991a
1710
14-18 (47, 53)
Adolescents of nine senior high schools (USA)
CES-D (20)
α = 0.89 K-SADS
24 0.84 0.75 0.08
0.99
-
Roberts et al. 1991b
804
14-18 (100,0)
Adolescent male sample of nine senior high schools (Roberts et al, 1991a; USA)
CES-D (20)
- K-SADS
22 - - - - 0.87
Roberts et al. 1991c
906
14-18 (0,100)
Adolescent female enrolment of nine senior high schools (Roberts et al, 1991a; USA)
CES-D (20)
- K-SADS
24 - - - - 0.83
Fendrich et al. 1990
220
12-18 Children and adolescents at risk for depression according to their parents’ diagnosis (USA)
CES-DC (20)
α = 0.89 K-SADS-E
≥16 0.71 0.62 0.15
0.96
-
Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis CES-D = Center for Epidemiologic Studies Depression Scale CES-DC = Center for Epidemiologic Studies Depression Scale Child Version α = Cronbach’s alpha reliability co-efficient K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version MINI-KID: The Mini-International Neuropsychiatric Interview MINI: The Mini-International Neuropsychiatric Interview for children K-SADS: The Schedule for Affective Disorders and Schizophrenia for School-Age Children K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological version DISC: The National Institute of Mental Health Diagnostic Interview Schedule for Children K-SADS-P: The Schedule for Affective Disorders and Schizophrenia for School-Age Children Present Episode version
49
Table 4. Validation evidence for the Reynolds Adolescent Depression Scale (RADS) in child and adolescent samples
Source N Age
&
Gend
er
(%m, %f)
Sample
(location) Scale
name
(no. of
items)
Reliabili
ty
Criteri
on
Cuto
ff
Sensitivi
ty
Specifici
ty
PP
V
NP
V
AU
C
Clinic
al
Osman, A. et al. 2010
196 14-17 (43, 57)
Adolescent inpatients (USA)
RADS-2 (30)
α = 0.95 BHS 67 0.64 0.8 0.77
0.68
0.80
Krefetz et al. 2002
100 12-17 (44, 56)
Child and adolescent psychiatric inpatients (USA)
RADS (30)
α = 0.91 PRIME-MD
70 0.86 0.49 0.69
0.72
0.76
Non-
Clinic
al
Weber and Terhorst 2010
265 13-18 (73, 27)
Adolescent LGBTIQ youth (USA)
RADS-2 (30)
α = 0.92 - - - - - - -
Walker et al. 2005
9699
12-18 (46, 54)
Child and adolescent school students (New Zealand)
RADS (30)
α = 0.94 - - - - - - -
Boyd et al. 1997
783 11-18 (49, 52)
Child and adolescent school students (Australia)
RADS (30)
α = 0.85 - - - - - - -
Reynolds & Miller 1985a
26 Mean: 17.3
Intellectually delayed adolescent school students (USA)
RADS (30)
α = 0.87 - - - - - - -
Reynolds & Miller 1985b
26 Mean: 16.7
Non-intellectually delayed adolescent school students (USA)
RADS (30)
α = 0.97 - - - - - - -
Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis RADS = Reynolds Adolescent Depression Scale α = Cronbach’s alpha reliability co-efficient BHS: Beck Hopelessness Scale PRIME-MD: The Primary Care Evaluation of Mental Disorders LGBTIQ: Lesbian, Gay, Bisexual, Transgender, Intersexual, Questioning.
51
Highlights:
• Symptom screening scales are often used to determine a clinical diagnosis of
depression.
• We examined the diagnostic utility of these scales using systematic review and meta-
analysis.
• Commonly used screening scales for depression are reliable measures among
adolescents.
• Using cutpoints to determine a clinical diagnosis may produce high misclassification
rates.