Author's Accepted Manuscript - UQ eSpace348925/UQ348925_OA.pdf · Author's Accepted Manuscript Symptom screening scales for detecting major depressive disorder in children and adoles-cents:

Author's Accepted Manuscript

Symptom screening scales for detecting majordepressive disorder in children and adoles-cents: A systematic review and meta-analysisof reliability, validity and diagnostic utility

Emily Stockings, Louisa Degenhardt, Yong YiLee, Cathrine Mihalopoulos, Angus Liu, MeganHobbs, George Patton

PII: S0165-0327(14)00785-XDOI: http://dx.doi.org/10.1016/j.jad.2014.11.061Reference: JAD7152

To appear in: Journal of Affective Disorders

Received date: 26 August 2014Revised date: 28 November 2014Accepted date: 29 November 2014

Cite this article as: Emily Stockings, Louisa Degenhardt, Yong Yi Lee, CathrineMihalopoulos, Angus Liu, Megan Hobbs, George Patton, Symptom screeningscales for detecting major depressive disorder in children and adolescents: Asystematic review and meta-analysis of reliability, validity and diagnosticutility, Journal of Affective Disorders, http://dx.doi.org/10.1016/j.jad.2014.11.061

This is a PDF file of an unedited manuscript that has been accepted forpublication. As a service to our customers we are providing this early version ofthe manuscript. The manuscript will undergo copyediting, typesetting, andreview of the resulting galley proof before it is published in its final citable form.Please note that during the production process errors may be discovered whichcould affect the content, and all legal disclaimers that apply to the journalpertain.

www.elsevier.com/locate/jad

http://dx.doi.org/10.1016/j.jad.2014.11.061






1

Symptom screening scales for detecting major depressive disorder in children and

adolescents: A systematic review and meta-analysis of reliability, validity and diagnostic

utility Emily Stockings*1 PhD, Louisa Degenhardt1 PhD, Yong Yi Lee2 MHEcon, Cathrine Mihalopoulos3 PhD, Angus Liu1 MPH, Megan Hobbs4 PhD, George Patton5 MD. 1National Drug and Alcohol Research Centre, University of New South Wales, 22-32 King Street, Randwick, New South Wales, Australia 2031.

2School of Population Health, University of Queensland, Herston Road, Herston, Queensland, Australia, 4006. 3Population Health Strategic Research Centre, Deakin University, 211 Burwood Highway, Burwood, Victoria, Australia, 3125. 4Clinical Research Unit for Anxiety and Depression (CRUfAD), University of Newcastle, St Vincent’s Hospital, 390 Victoria Street, Darlinghurst, 2010. 5Centre for Adolescent Health, The Royal Children’s Hospital, 50 Flemington Road, Parkville, Victoria, 3025. *Corresponding author: Emily Stockings, Email: [email protected] Phone: (02)9385 1062 Address: National Drug and Alcohol Research Centre (NDARC), Building R3, 22-32 King Street, Randwick, New South Wales, Australia, 2031

Abstract

Background: Depression symptom screening scales are often used to determine a clinical

diagnosis of major depressive disorder (MDD) in prevention research. The aim of this review

is to systematically examine the reliability, validity and diagnostic utility of commonly used

screening scales in depression prevention research among children and adolescents.

Methods: We conducted a systematic review of the electronic databases PsycINFO,

PsycEXTRA and Medline examining the reliability, validity and diagnostic utility of four

commonly used depression symptom rating scales among children and adolescents: the

Children's Depression Inventory (CDI), Beck Depression Inventory (BDI), Center for

Epidemiologic Studies - Depression Scale (CES-D) and the Reynolds Adolescent Depression

Scale (RADS). We used univariate and bivariate random effects models to pool data and

conducted metaregression to identify and explain causes of heterogeneity.

Results: We identified 54 studies (66 data points, 34,542 participants). Across the four

scales, internal reliability was 'good' (pooled estimate: 0.89, 95% Confidence Interval (CI):

2

0.86 to 0.92). Sensitivity and specificity were 'moderate' (sensitivity: 0.80, 95% CI: 0.76 to

0.84; specificity: 0.78, 95% CI: 0.74 to 0.83). For studies that used a diagnostic interview to

determine a diagnosis of MDD, positive predictive power for identifying true cases was

mostly poor. Psychometric properties did not differ on the basis of study quality, sample type

(clinical vs. nonclinical) or sample age (child vs. adolescent).

Limitations: Some analyses may have been underpowered to identify conditions in which

test performance may vary, due to low numbers of studies with adequate data.

Conclusions: Commonly used depression symptom rating scales are reliable measures of

depressive symptoms among adolescents; however, using cutoff scores to indicate clinical

levels of depression may result in many false positives.

Keywords: Depression, Children, & Adolescents, Psychometrics, Validity, Psychiatric

Symptom Rating Scales, Prevention.

Abbreviations:

ADIS-C: Anxiety Disorders Interview Schedule for Children AUC: Area under the curve BDI: Beck Depression Inventory BSI: Brief Symptom Inventory BYI: Beck Youth Inventories CBCL: the Child Behaviour Checklist CDI: Children’s Depression Inventory CDRS-R: Children’s Depression Rating Scale – Revised CES-D: Center for Epidemiologic Studies- Depression Scale CES-DC: Center for Epidemiologic Studies – Depression Scale for Children CIDI: Composite International Diagnostic Interview DAWBA: The Development and Wellbeing Assessment DEPS-10: Depression Scale - Version 10 DICA-IV: Diagnostic Interview for Children and Adolescents- Version 4 DISC: The National Institute of Mental Health Diagnostic Interview Schedule for Children DSM: Diagnostic and Statistical Manual DSRS: Depression Symptom Rating Scale (DSRS) HADS: Hospital Anxiety and Depression Scale ICD: International Classification of Diseases KID-SCID: Structured Clinical Interview for DSM-IV disorders – Child version

3

Kinder DIPS: Diagnostisches Interview bei psychischen Störungen im Kindes- und Jugendalter [German] K-SADS: Kiddie-Schedule for Affective Disorders and Schizophrenia MDD: Major depressive disorder MDI-C: Multi-score Depression Inventory- Children MINI: Mini-International Neuropsychiatric Interview MINI-KID: The Mini-International Neuropsychiatric Interview for Children NPV: Negative predictive value PPV: Positive predictive value PRIME-MD: The Primary Care Evaluation of Mental Disorders RADS: Reynolds Adolescent Depression Inventory RCDAS: Revised Child Anxiety and Depression Scale ROC: Receiver operator characteristic SBB-DES: The Self-Report Questionnaire—Depression [German] SCAN: Schedules for Clinical Assessment in Neuropsychiatry SCID: Structured Clinical Interview for Depression for DSM disorders SCID-1: Structured Clinical Interview for DSM-IV Axis I disorders SDQ: Strengths and Difficulties Questionnaire SMFQ: Short Mood and Feelings Questionnaire YSR: Youth Self-Report

4

Introduction

Major depressive disorder (MDD) is the leading global cause of disability among

young people aged 10-24 years, accounting for 8.2% of the global non-fatal disease burden

(Gore et al., 2011). Approximately 3% of children and 6% of adolescents suffer current or

recent depression (Costello et al., 2006). MDD in young people is associated with poor

academic performance, substance abuse, attempted and completed suicide and an increased

risk of suffering depression during adulthood (Birmaher et al., 1996, Brent et al., 1986).

Despite the significant health burden associated with MDD, studies have suggested that less

than 50% of youths seek mental health treatment for the condition (Reavley et al., 2010, Leaf

et al., 1996). Valid and accurate screening tools for depression may assist clinicians in

identifying MDD in youths, and may subsequently increase the rates of appropriate treatment

and referral (Hosman et al., 2005, Andrews et al., 2002).

Given the significant health burden associated with MDD, there has been growing

recognition of the need to develop programs aiming to prevent the onset of MDD during

childhood and adolescence (Hosman et al., 2005, Andrews et al., 2002). Promisingly, the

number of studies that have examined the efficacy of preventative interventions for MDD

among children and adolescents more than doubled between 2004 and 2010 (Merry et al.,

2004, Merry et al., 2011). Such interventions have typically been delivered in the school

setting during regular classes by teachers or trained external facilitators (Merry et al., 2011).

Given that routinely administering structured or semi-structured diagnostic interviews in

schools can be costly and time consuming, many trials have used categorical thresholds on

MDD symptom screening scales, such as the Center for Epidemiologic Studies – Depression

Scale (CES-D) (Radloff, 1977) as a proxy for a diagnosis of MDD (Merry et al., 2011).

Evaluating the efficacy of preventative interventions using symptom screening scales is

problematic in two ways. Firstly, symptom screening scales that impose a categorical

5

threshold over a larger number of symptoms that are included in the DSM and the ICD can

increase the number of cases that are identified compared to the cases where DSM or ICD

diagnoses were applied (Ferrari et al., 2013).

Secondly, while the reliability, validity and utility of identifying cases of MDD using

the CES-D and other symptom screening scales have been established in adult populations

(Radloff, 1977), the same clinical thresholds have been applied among childhood and

adolescent samples. No review has systematically examined whether the clinical thresholds

identified in adult samples are reliable, valid or useful for identifying cases of MDD among

children and adolescents. There is also no review that has quantitatively synthesized the

traditional psychometric properties of symptom screeners for MDD among children and

adolescents. Indeed, previous reviews examining the traditional psychometric characteristics

of depression symptom screening scales among children and adolescents have been non-

systematic and qualitative in nature, the most comprehensive of which were conducted more

than a decade ago (Myers and Winters, 2002b, Brooks and Kutcher, 2001). The absence of

robust data in support of applying the adult-derived thresholds for MDD when assessing for

MDD among children and adolescents means that it is difficult to interpret the efficacy,

effectiveness and efficiency of early-life preventative interventions for MDD meaningfully.

The purpose of this review is to systematically: 1) identify symptom screening scales

that are commonly used in childhood and adolescent preventative interventions for MDD;

and 2) identify evidence of reliability, validity and diagnostic utility of these symptom

screening scales.

6

Methods

Search 1: Identify symptom screening scales that are commonly used in childhood and

adolescent preventative interventions for MDD.

a) Search strategy

Given the large number of randomized controlled trials examining the efficacy of preventive

interventions for MDD among children and adolescents, and the existence of multiple

reviews synthesizing these findings (e.g. (Merry et al., 2004, Merry et al., 2011)), we

conducted a systematic review of reviews, and updated recent reviews with any studies which

may have been published subsequent to their publication date. The review of reviews was

conducted in August, 2013 using the online databases PubMed, Medline, PSYCINFO and the

Cochrane Library of Systematic Reviews, in consultation with a librarian and in accordance

with the Preferred Reporting Items for Systematic Reviews and Meta Analyses (PRISMA)

statement (Liberati et al., 2009). Databases were searched using a combination of MeSH

terms and text words pertaining to depression and dysthymia (Depressive Disorder, Major

Depression, Dysthymic Disorder, “dysthymia.mp.”), prevention (Primary Prevention,

Preventative Psychiatry, “prevention.mp.”) and intervention trials (Intervention Studies,

“intervention.mp.”). An additional search of empirical studies dated from August 2010-

present was conducted in January 2014 to identify recently published randomised controlled

trials not included in the existing reviews. This additional search was conducted in the

electronic databases PubMed, Medline and PSYCINFO using the search string ((((depress*

OR dysthymi*)) AND (child* OR adolescen*)) AND (prevent* OR early intervention* OR

risk OR at-risk OR vulnerab*)) AND (randomised controlled trial OR controlled trial

[Publication Type]).

7

b) Inclusion criteria

i) Reviews: Reviews were eligible for inclusion if: 1) they were published

between 2003 and August 2013 in the English language; 2) the authors

employed systematic methods of reviewing the literature; and 3) the data were

reported in a usable form, and:

ii) Empirical studies: Individual studies identified within the reviews, and the

updated search of empirical studies were eligible for inclusion if: 4) the

participants were aged between 5 and 18 years; 5) participants were randomly

assigned to an intervention or a control group; 6) the control group received no

intervention, placebo or usual care; and 7) the intervention aimed to prevent of

the onset of depression or dysthymia.

c) Data extraction

Data were extracted from each empirical study by co-author ES and were cleaned and double

checked by co-author YL and included: 1) sociodemographics of the sample; 2) details of the

intervention and delivery; 3) timing of follow-up assessments; 4) the methods used to

measure the prevalence, or symptoms of depression or dysthymia, including depression

symptom screening scales, or structured diagnostic interviews; and 5) outcome data.

Symptom screening scales were defined as any type of measure that provided an assessment

of MDD, producing a numerical score that has guidelines for interpretation, whether

completed by the person of interest (self report) or someone else (e.g. parent, guardian,

teacher), with any type of response format (Myers and Winters, 2002a). For the purposes of

this review, depression symptom screening scales were defined as ‘common’ if they were

used in more than two studies. Scales used in less than two studies are mentioned only briefly

here.

8

Search 2: Identify evidence of reliability, validity and diagnostic utility of commonly used

depression symptom screening scales used in childhood and adolescent preventative

interventions for MDD.

a) Search strategy

For each symptom screening scale for MDD that was identified as commonly used in Search

1, evidence for each scale’s reliability, validity and diagnostic utility in any child or

adolescent sample (not limited to samples where preventive interventions were applied) was

obtained through a series of additional systematic searches of the literature using PsycINFO,

PsycEXTRA and Medline and a combination of MeSH terms and text words pertaining to

depression (exp Major Depression), children (children OR adolesc~) symptom screening

scales (exp Rating scales/exp Screening Tests/exp Psychological Tests/exp Psychiatric Status

Rating scales), and psychometric properties (exp Psychometrics/exp Test Validity/exp Test

Reliability/exp Diagnostic Utility), and the name of each individual scale as text words (e.g

“children’s depression inventory”, “center for epidemiologic studies depression scale”, “beck

depression inventory”,). Other sources of literature included existing reviews of depression

scales (Brooks and Kutcher, 2001, Brooks and Kutcher, 2003, Pavuluri and Birmaher, 2004,

Suzuki, 2011, Collett et al., 2003), and hand searches of articles identified as relevant.

b) Eligibility criteria

Articles reporting on the scales’ reliability, validity and diagnostic utility were eligible for

inclusion in the review if the study was: 1) published between 1980 (following publication of

the initial validation studies) and August 2013 in the English language; 2) the participants

were aged between 5 and 18 years; 3) the scale was identified in Search 1; and 4) the study

reported the traditional psychometric properties of the scale including measures of reliability

9

and/or validity. We excluded any revisions or short versions of the scales if they were not

analogous to the original scales (i.e. differed in terms of number of items, scoring method,

etc.). Studies attempting to translate an existing scale into another language without reporting

psychometric properties of the scale itself were excluded.

c) Data extraction

Data from each included study were extracted by co-author AL and were cleaned and double

checked by co-author ES and included: 1) characteristics of the sample (including sample

size, age, gender, country, language and setting); 2) the names of the scales assessed and the

number of items; 3) the test-retest and inter-rater reliability of the scale; 4) the internal

validity of the scale (e.g. Cronbach’s alpha); 5) details about the external validity of the scale

including the categorical thresholds used to indicate cases of MDD, the type of clinical

interview used to determine cases of MDD (i.e. the ‘criterion standard’), and the associated

sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and

area under the curve (AUC) results derived from receiver operating characteristics (ROC)

analyses (described below). Sensitivity, specificity, PPV and NPV were calculated manually

when data were partially reported, and could be imputed based on the presentation of other

data.

Primary outcomes: Traditional psychometric properties and diagnostic utility

a) Data synthesis

Data were synthesised via meta analysis using the statistical software program Stata/SE

version 13.1 (StataCorp, 2013). We used univariate random effects models to pool data

10

across all studies for the primary outcomes internal reliability and AUC. Bivariate random

effects models (using the ‘mvmeta’ command) were used to pool data for the primary

outcomes sensitivity and specificity in order to take into account the interdependency of these

values. Values for PPV and NPV were not meta analysed and were synthesised in a

qualitative manner only, as these values are highly dependent on the prevalence of depression

in each sample, which was likely to vary substantially between each study included in this

review (Leeflang et al., 2012). In order to identify and explain any causes of heterogeneity,

random effects metaregression (using the ‘metareg’ command) was performed on the primary

outcomes (with the exception of PPV and NPV) for each of the four scales, and across the

four scales overall. The effects of the sample setting (clinical vs. nonclinical), age (children

vs. adolescents), risk of bias quality score, the cutoff score used, and the scale (CDI, BDI,

CES-D, RADS) were evaluated by individually adding them as covariates in the regression

models. Due to the small number of studies, these analyses were primarily exploratory.

Additionally, the I2 index was employed to quantify any heterogeneity in the pooled

estimates, and was described as low, moderate or high according to an I2 value of 25, 50 and

75%, respectively (Higgins et al., 2003). Statistical significance for all analyses was set at p <

.05.

b) Reliability

The internal consistency of each scale was assessed using Cronbach’s alpha [α], and test

retest and interrater reliability were assessed using the Kappa [κ] statistic. These reliability

coefficients were classified as ‘excellent’ if α ≥ .9, ‘good’ if .85 ≤ α < .9, ‘moderate’ if .80 ≤

α < .85, ‘fair’ if .75 ≤ α < .80, or ‘unsatisfactory’ if < .75 (Ponterotto and Ruckdeschel, 2007).

11

c) Validity

To examine the validity of each scale, we assessed five key measures of discriminative

validity: 1) sensitivity (the proportion of true cases correctly identified, or “true positive

rate”); 2) specificity (proportion of non-cases correctly identified, or “true negative rate”); 3)

PPV (probability that subjects identified as cases are true cases, or “precision”); and 4) NPV

(probability that the subjects identified as not being a case are not cases) (Murphy and

Davidshofer, 1994). For inclusion of studies in the meta analysis, we assigned a minimum

quality assessment for the criterion standard as diagnoses made using either a standardised

(e.g. the Composite International Diagnostic Interview (CIDI)) or structured (e.g. Kiddie-

Schedule for Affective Disorders and Schizophrenia (K-SADS)) diagnostic interview

schedule based on DSM or ICD-10 diagnostic criteria. Clinician rated diagnoses without

reference to a specific diagnostic instrument were excluded from the analyses. Based on

expert statistical recommendation, we assigned a value of ‘excellent’ for the scales’

sensitivity, specificity, PPV and NPV if the pooled value of coefficient across studies was ≥

.9, a value of ‘good’ if .8 ≤ coefficient < .9, a value of ‘moderate’ if 0.6 ≤ coefficient < 0.8,

and a value of ‘low’ coefficient < .6 (Andrews et al., 1993). We additionally assessed the

scale’s diagnostic accuracy using the area under the curve (AUC) receiver operating

characteristic (ROC) analysis. ROC curves plot the tradeoff between sensitivity and

specificity values for all possible cutoff scores, and the AUC is a measure of the scales’

overall ability to correctly classify individuals as cases or noncases (Brooks and Kutcher,

2001). This is calculated as the sum of true positives and the sum of true negatives divided by

the total population (not taking into account false positive or negatives). AUC values range

from 0.5 (random classification) to 1 (perfect classification). Henderson (Henderson, 1993)

12

recommended the following interpretation with respect to the scales’ diagnostic accuracy:

AUC > 9 ‘high’; 0.7 < AUC ≤ 0.9 ‘moderate’; AUC ≤ 7 ‘low’.

Risk of bias

Risk of bias in the included studies was examined using the risk assessment tool developed

by the Cochrane Collaboration Diagnostic Test Accuracy Working Group (Reitsma et al.,

2009), derived from the Revised Quality Assessment of Diagnostic Accuracy Studies

(QUADAS-2) tool (Whiting et al., 2011b). Risk was determined across four domains: patient

selection, index test, reference standard, and flow and timing, and was categorised and

quantified as either low (3), unclear (2) or high (1) in order to derive a total quality score for

use in the metaregression analyses.

Results

Search 1: Identify commonly used depression symptom screening scales in childhood and

adolescent preventative interventions for MDD.

Search 1 identified 17 systematic reviews of childhood and adolescent preventative

interventions for MDD. These 17 reviews contained a total of 394 individual trials, of which

288 were duplicates (in instances where the same trial appeared in more than one review) and

106 were unique.

Of the 106 unique trials, 103 assessed depressive symptomatology as an outcome using

depression symptom screening scales, and 32 assessed the prevalence of MDD using either

cutoff scores on depression symptom screening scales (n = 15) or by administering structured

or semistructured clinical interviews (n = 17). Across the 103 trials, a total of 17 different

depression symptom screening scales, and five structured or semistructured clinical

interviews were used. The most commonly used screening scale for assessing depressive

13

symptomatology was the Children’s Depression Inventory (CDI; 45 studies), followed by the

Beck Depression Inventory (BDI; 16 studies), the Center for Epidemiologic Studies –

Depression Scale (CES-D; 15 studies), and the Reynolds Adolescent Depression Scale

(RADS; 5 studies). The following scales were used as outcome measures in two studies or

fewer: the Brief Symptom Inventory (BSI), the Child Behaviour Checklist (CBCL), the

Children’s Depression Rating Scale – Revised (CDRS-R), the Depression Symptom Rating

Scale (DSRS), Beck Youth Inventories (BYI), the Depression Scale version 10 [Finnish]

(DEPS-10), the Hospital Anxiety and Depression Scale (HADS), the Multi-score Depression

Inventory – Children (MDI-C), the Revised Child Anxiety and Depression Scale (RCADS),

Youth Self-Report (YSR), and The Self-Report Questionnaire—Depression [German]

(Selbstbeurteilungsbogen—Depressive Störungen; SBB–DES). Two studies were found to

have used undefined composite constructs. The most commonly used measure to determine

the presence of MDD was the Kiddie-Schedule for Affective Disorders and Schizophrenia

(K-SADS; 7 studies), followed by the Anxiety Disorders Interview Schedule for Children

(ADIS-C; 5 studies), the Diagnostic Interview for Children and Adolescents version 4,

(DICA-IV; 1 study), and the Structured Clinical Interview for Depression for DSM disorders

(SCID; 1 study).

Search 2: Identify evidence of reliability, validity and diagnostic utility of commonly used

depression symptom screening scales used in childhood and adolescent preventative

interventions for MDD.

Figure 1 shows the study selection process. The search for the reliability, validity and

diagnostic utility of depression symptom screening scales for children and adolescents

yielded a total of 2017 results, of which 1798 were unique. Of these, 1626 were excluded as

they were not relevant to the topic. Of the remaining 172 publications, 77 were excluded as

14

the age of the sample was outside the specified age range (5-18), 30 conducted unrelated

psychometric tests, 10 assessed the validity of translated versions only and 3 assessed short or

alternate versions of the target scale. We identified the remaining 52 papers (comprising 54

studies) as relevant, of which 21 assessed the reliability, validity and diagnostic utility of the

CDI, 17 examined the BDI (one study additionally examined the CES-D, and study

additionally examined the RADS), 10 examined the CES-D and 6 examined the RADS

(Figure 1). The 54 studies comprised a total of 66 individual data points and 34,542

participants. Studies were conducted predominantly in the United States, with a minority

conducted in Europe (Spain, Germany, Belgium, Denmark, the Netherlands, Greece, Sweden

and Switzerland), Africa (Nigeria and Rwanda), India and Taiwan. Half of the articles

included in this review (n = 26 articles, 32 studies; 50%) were published since the last major

review of symptom screening scales for MDD among children and adolescents in 2002

(Myers and Winters, 2002b).

Risk of bias in included studies

Of the 52 individual papers (based on 54 studies) included in the review, risk of bias was

mostly unable to be determined, or determined to be high (Figure 1a, online supplement).

Only 10 studies employed representative samples. Four studies pre-specified the cutoff scores

on the scales to determine ‘caseness’, with most studies calculating these values post-hoc (n =

39). Of the 33 studies that used a diagnostic interview to determine the discriminative validity

of the symptom screening scale, 24 used structured or semistructured clinical interviews

based on ICD-10 or DSM criteria, however only 14 of these stated that the interviewers were

blinded to the scale scores when administering the interview. The 9 studies that used

undefined clinical interviews that were not based on ICD-10 or DSM diagnostic criteria were

excluded from the analyses. Appropriate lag time between the index test (symptom screening

15

scale) and reference test (diagnostic interview) was deemed to be within one week (Whiting

et al., 2011a), and 11 studies fulfilled this criterion.

Commonly used depression scales for children and adolescents

Children’s Depression Inventory (CDI)

The CDI is the child version of Beck Depression Inventory (BDI), designed for use with

children aged 7-18 years (Kovacs, 1984). The scale is self report, comprises 27 items with

three-point answers and takes approximately 10-20 minutes to complete and score (Kovacs,

1984). The period of assessment is the prior two weeks, and the range of possible scores is 0

to 54. The CDI was specifically designed for and tested among children and adolescents and

includes a corresponding parent version. It has a good track record in depression research as

it is one of the most widely used and studied scale of youth depression, with normative data

available for children and adolescents (Kovacs, 1984). The CDI purports to possess a five

factor structure, however one of the factors relates to externalising disorders (‘acting out’)

and one to anxiety, resulting in uncertainty around the validity of depression construct (Myers

and Winters, 2002b).

Reliability, validity and diagnostic utility coefficients for the CDI among clinical and

nonclinical samples nonclinicalof children and adolescents identified in our review are

summarised in Table 1, and Figure 2. The 21 identified studies comprised a total of 25 data

points (n = 8371). Of these, 14 studies (18 data points) provided details of the scales’ internal

reliability (Allgaier et al., 2012, Carey et al., 1987, Crowley and Emerson, 1996, Crowley et

al., 1994, Figueras Masip et al., 2010, Fundudis et al., 1991, Giannakopoulos et al., 2009,

Nelson III et al., 1987, Reynolds et al., 1985, Saylor et al., 1984, Smucker et al., 1986,

16

Sorensen et al., 2005, Thompson et al., 2012, Weiss et al., 1991). The pooled estimate for

internal reliability was 0.86 (18 data points, n = 7372, 95% confidence interval (CI): 0.84 to

0.88) and was classified as ‘good’ (Figure 2), however heterogeneity was high (I2 = 76%).

Metaregression showed that sample type (clinical vs. nonclinical; t(1, 15) = 0.50, p = 0.62,

adjusted R2 <0), sample age (child vs adolescent; t(1,15) = 0.31, p = 0.76, R2 <0) and risk of

bias quality score (t(1, 15) = 1.61, p = 0.13, R2 = 67.38%) did not significantly affect the

internal reliability values on the CDI.

Nine studies (10 data points) examined the discriminative validity of the CDI (Allgaier et al.,

2012, Figueras Masip et al., 2010, Fristad et al., 1988, Fundudis et al., 1991, Roelofs et al.,

2010, Shemesh et al., 2005, Timbremont et al., 2004, Sorensen et al., 2005, Craighead et al.,

1995) with cutoff scores ranging from 11 to 19 (M = 14.5, SD = 2.67). Of these, seven used

structured or semi structured clinical interviews to determine presence of MDD. All seven

studies examined the specificity and sensitivity of the CDI. Bivariate meta analyses revealed

that sensitivity was ‘good’ overall (7 data points, n = 1432, pooled estimate: 0.83, 95% CI:

0.77 to 0.89), as was specificity (7 data points, n = 1432, pooled estimate: 0.84, 95% CI: 0.77

to 0.92), however heterogeneity was high (I2 = 91% and 96% respectively). Metaregression

showed that the sample type (clinical vs. nonclinical; t(1, 6) = 0.59, p = 0.58, R2 < 0), sample

age (child vs adolescent; t(1, 6)= 1.05, p = 0.33, R2 < 0), cutpoint used (t (1, 6) = 2.21, p =

0.06, R2 = 38.68%) and risk of bias quality score (t (1, 6) = 0.56, p = 0.59, R2 < 0) did not

significantly affect sensitivity values on the CDI. Six studies examined the PPV of the CDI,

all of which were conducted among clinical samples, and most were classified as ‘low’, with

values of 0.28 (Allgaier et al., 2012), 0.35 (Roelofs et al., 2010), 0.38 (Shemesh et al., 2005),

0.21 (Sorensen et al., 2005), 0.63 (Timbremont et al., 2004) and 0.90 (Fristad et al., 1988).

Five studies examined the NPV of the CDI, and as per the PPV values, all were conducted

among clinical samples. NPV values were mostly classified as ‘excellent’, with values of 1.0

17

(Roelofs et al., 2010), 0.98 (Allgaier et al., 2012), , 0.98 (Timbremont et al., 2004), 0.94

(Shemesh et al., 2005) and 0.63 (Sorensen et al., 2005). Four studies examined the diagnostic

accuracy of the CDI using AUC analyses, all of which were conducted among clinical

samples (Allgaier et al., 2012, Roelofs et al., 2010, Sorensen et al., 2005, Timbremont et al.,

2004), and overall accuracy was classified as ‘high’ (4 data points, n = 1146, pooled estimate:

0.90, 95% CI: 0.79 to 0.98; Figure 2), however heterogeneity was high I2=95%. There was

insufficient data to conduct a metaregression on AUC values for the CDI.

Beck Depression Inventory (BDI)

The BDI was originally developed in 1961 as a depression symptom rating scale for the adult

population (Beck and Alford, 2009). The original inventory comprises 21 items, with four (or

five) statement responses representing how the respondent has been feeling during the past

week and current day. The statements are presented in order of increasing severity and are

scored from 0 to 3 (alternative statements sharing the same score are sometimes provided).

Six of the items refer to vegetative symptoms; the remaining 15 items refer to either affective

or cognitive symptoms. The range of possible scores is 0 to 63 (Beck et al., 1961). The scale

is widely used among both adults and adolescents and has a strong track record in depression

research (Myers and Winters, 2002b). The scale lacks items pertaining to school and does not

offer parallel parent or teacher rating forms, and as such, its use among younger children may

not be suitable (Kovacs and Beck, 1977). There are several alternate versions of the BDI. The

BDI-1A is a revision of the original BDI (Beck et al., 1996), also containing 21 items but

with refined response formats and a defined assessment period of the prior two weeks. The 21

item BDI-II is a further revision of the BDI, developed in 1996 to align with the DSM-IV

criteria for depression (Beck et al., 1996).

18

Reliability, validity and diagnostic utility coefficients for the BDI among clinical and

nonclinical samples of children and adolescents are summarised in Table 2 and Figure 3. The

17 identified studies comprised a total of 20 data points (n = 5464). Of these, 11 studies (12

data points) examined the scale’s internal reliability (Adewuya et al., 2007, Ambrosini et al.,

1991, Barrera and Garrison-Jones, 1988, Dolle et al., 2012, Jolly et al., 1994, Krefetz et al.,

2002, Kumar et al., 2002, Roberts et al., 1991, Strober et al., 1981, Teri, 1982, Kashani et al.,

1990). The pooled estimate for internal reliability was 0.86 (12 data points, n = 4152, 95%

CI: 0.81 to 0.90) and was classified as ‘good’, and heterogeneity was moderate (I2 = 72%)

(Figure 3). Metaregression showed that sample type (clinical vs. nonclinical; t (1, 9) = 1.86, p

= 0.09, R2 = 32.17%), sample age (child vs adolescent; t (1, 9) = 0.88, p = 0.40, R2 = 73.52%)

and risk of bias quality score (t (1, 9) = 0.70, p = 0.50, R2 < 0) did not significantly affect the

internal reliability values on the BDI.

Fifteen studies (18 data points) examined the discriminative validity of the BDI (Adewuya et

al., 2007, Ambrosini et al., 1991, Barrera and Garrison-Jones, 1988, Bennett et al., 1997,

Blom et al., 2010, Canals et al., 2001, Dolle et al., 2012, Kashani et al., 1990, Krefetz et al.,

2002, Kumar et al., 2002, Marton et al., 1991, Roberts et al., 1991, Russell et al., 2012,

Strober et al., 1981, Whitaker et al., 1990), with cutoff scores ranging from 11 to 24. Of the

13 studies that used a structured or semi-structured clinical interview, 12 studies (13 data

points) examined the scales’ sensitivity and specificity (Adewuya et al., 2007, Ambrosini et

al., 1991, Barrera and Garrison-Jones, 1988, Bennett et al., 1997, Canals et al., 2001, Dolle et

al., 2012, Kashani et al., 1990, Marton et al., 1991, Roberts et al., 1991, Russell et al., 2012,

Strober et al., 1981, Whitaker et al., 1990). Bivariate meta analyses revealed that sensitivity

was ‘good’ overall (13 data points, n = 4597, pooled estimate: 0.81, 95% CI: 0.74 to 0.87), as

was specificity (13 data points, n = 4597, pooled estimate: 0.81, 95% CI: 0.75 to 0.88; Figure

3), however heterogeneity was high (I2 = 94% and 97% respectively). Metaregression showed

19

that the sample type (clinical vs. nonclinical; t(1, 12) = 0.73, p = 0.48, R2 < 0), sample age

(child vs adolescent; t(1, 12) = 0.85, p = 0.41, R2 < 0), cutpoint used (t(1, 12) = 0.13, p =

0.90, R2 < 0) and risk of bias quality score (t(1, 12) = 1.32, p = 0.37, R2 = 28.33%) did not

significantly affect sensitivity values on the BDI. Similar results were found for specificity,

with no impact of sample type (t(1, 12) = 0.86, p = 0.40, R2 < 0), age (t(1, 12) = 0.83, p =

0.42, R2 < 0), cutpoint used (t(1, 12) = 0.94, p = 0.37, R2 < 0) or risk of bias quality score (t(1,

12) = 0.39, p = 0.70, R2 < 0).

Eight studies examined the PPV of the BDI, half of which were conducted with clinical

samples and half in non-clinical samples, with significant heterogeneity between studies.

PPV values were higher overall in clinical samples, with values of 0.79 (Marton et al., 1991),

0.81 (Bennett et al., 1997), 0.81 (Dolle et al., 2012), and 0.93 (Ambrosini et al., 1991) and

were lower, however more varied in nonclinical samples, with values of 0.10 (Roberts et al.,

1991), 0.14 (Russell et al., 2012), 0.47 (Canals et al., 2001) and 0.88 (Adewuya et al., 2007)

(Figure 3). Five studies examined the NPV of the BDI, with values of 0.80 (Bennett et al.,

1997), 0.95 (Dolle et al., 2012), 0.97 (Russell et al., 2012), 0.98 (Adewuya et al., 2007) 0.99

(Canals et al., 2001) and 0.99 (Roberts et al., 1991) (Figure 3). Six studies (7 data points)

used AUC analyses to examine the diagnostic accuracy of the BDI (Adewuya et al., 2007,

Blom et al., 2010, Dolle et al., 2012, Kashani et al., 1990, Roberts et al., 1991, Russell et al.,

2012), and overall accuracy was classified as ‘high’ (7 data points, n = 3249, pooled

estimate: 0.92, 95% CI: 0.81 to 1.00). Metaregression showed that the sample type (clinical

vs. nonclinical; t(1, 6) = 0.39, p = 0.71, R2 < 0), sample age (child vs adolescent; t(1, 6) =

0.09, p = 0.93, R2 <0), cutpoint used (t(1, 6) = 0.17, p = 0.87, R2 < 0) and risk of bias quality

score (t(1, 6) = 1.44, p = 0.19, R2 = 14.16%) did not significantly affect AUC values on the

BDI.

20

Center for Epidemiologic Studies Depression Scale (CES-D)

The CES-D is a 20-item self-report questionnaire designed to detect depression in the general

population. The items address six symptom areas of depression including depressed mood,

feelings of guilt/worthlessness, helplessness, psychomotor retardation, loss of appetite and

sleep disturbance (Radloff, 1977). Each of the 20 items is rated on a scale from 0 to 3, with

total scores ranging from 0-60. The period of assessment is the past week. The CES-DC is the

child version of the original CES-D and retains the same number of items and response

format; however language has been adjusted to suit a child reading level (Weissman et al.,

1980).

Reliability, validity and diagnostic utility coefficients for the CES-D among clinical and

nonclinical samples of children and adolescents are summarised in Table 3 and Figure 4.

Nine studies (10 data points) examined the internal reliability of the original full scale CES-D

(Aebi et al., 2009, Betancourt et al., 2012, Cuijpers et al., 2008, Fendrich et al., 1990,

Garrison et al., 1991, Logsdon and Myers, 2010, Roberts et al., 1991, Thrane et al., 2004).

The pooled estimate for internal reliability was 0.88 (10 data points, n = 9006, 95% CI: 0.84

to 0.92, Figure 4) was classified as ‘good’, however heterogeneity was high (I2 = 92%).

Metaregression showed that sample type (clinical vs. nonclinical; t(1, 7) = 0.87, p = 0.41, R2

< 0), sample age (child vs adolescent; t(1, 7) = 0.79, p = 0.45, R2 = 2.75%) and risk of bias

quality score (t(1, 7) = 0.44, p = 0.67, R2 < 0) did not significantly affect the internal

reliability values on the CES-D.

Nine studies (12 data points) examined the discriminative ability of the CES-D (Aebi et al.,

2009, Betancourt et al., 2012, Cuijpers et al., 2008, Fendrich et al., 1990, Garrison et al.,

1991, Logsdon and Myers, 2010, Prescott et al., 1998, Roberts et al., 1991, Yang et al.,

21

2004), with cutoff scores ranging from 12 to 24. Of these, eight studies used structured or

semi structured clinical interviews to determine a diagnosis of depression. Bivariate meta

analysis revealed that sensitivity coefficients were ‘moderate’ overall (9 data points, n =

9209, pooled estimate: 0.76, 95% CI: 0.67 to 0.84) as was specificity (9 data points, n =

9209, pooled estimate: 0.71, 95% CI: 0.63 to 0.79). Metaregression showed that sample type

(clinical vs. nonclinical; t(1, 7) = 0.08, p = 0.93, R2 < 0), sample age (child vs adolescent; t(1,

7) = 0.80, p = 0.45, R2 < 0) risk of bias quality score (t(1, 7) = 1.71, p = 0.13, R2 = 18.04%)

and cutpoint used (t (1, 7) = 0.70, p = 0.51, R2 < 0) did not significantly affect the sensitivity

values on the CES-D. Similar results were found for specificity, with no impact of sample

type (t(1, 7) = 0.12, p = 0.91, R2 < 0), age (t(1, 7) = 1.36, p = 0.21, R2 = 12.69%), risk of bias

quality score (t(1, 7) = 0.77, p = 0.46, R2 < 0) or cutpoint used (t(1, 7) = 2.12, p = 0.07, R2 =

32.62%).

Five studies (six data points) examined the PPV and NPV of the CES-D, with only one study

conducted among a clinical sample (Logsdon and Myers, 2010), and the remainder in

nonclinical samples (Fendrich et al., 1990, Garrison et al., 1991, Logsdon and Myers, 2010,

Prescott et al., 1998, Roberts et al., 1991). PPV was low overall, with values of 0.08 (Roberts

et al., 1991), 0.15 (Fendrich et al., 1990), 0.16 (Garrison et al., 1991), 0.24 (Prescott et al.,

1998), 0.25 (Logsdon and Myers, 2010) and 0.32 (Garrison et al., 1991). NPV was good

overall (with the exception of the one study conducted among a clinical sample where NPV =

0.12; (Logsdon and Myers, 2010)), and values were 0.96 (Fendrich et al., 1990, Prescott et

al., 1998), 0.98 (Garrison et al., 1991) and 0.99 (Roberts et al., 1991). Seven studies (nine

data points) examined diagnostic utility of the CES-D using AUC analyses (Betancourt et al.,

2012, Cuijpers et al., 2008, Garrison et al., 1991, Logsdon and Myers, 2010, Prescott et al.,

1998, Roberts et al., 1991, Yang et al., 2004), which was classified as ‘moderate’ overall (9

data points, n = 8989, pooled estimate: 0.83, 95% CI: 0.74 to 0.90, I2 = 99%). Metaregression

22

revealed that studies using higher cutpoints on the CES-D had higher diagnostic accuracy

(t(1, 8) = 5.1, p = 0.01, β = 1.01, R2 = 53.22%), with no significant effect found for sample

type (clinical vs. nonclinical; t(1, 8) = 1.54, p = 0.17, R2 = 12.88% ), sample age (child vs.

adolescent; t(1, 8) = 1.12, p = 0.30, R2 = 7.93%), and the risk of bias quality score (t(1, 8) =

0.69, p = 0.51, R2 < 0).

Reynolds Adolescent Depression Scale (RADS)

The Reynolds Adolescent Depression Scale is a 30-item self report instrument developed in

1986 to assess depression symptom severity in adolescents (13-18 years) (Reynolds, 1986).

Responses are made on a four point scale indicating frequency (almost never, hardly ever,

sometimes, most of the time) of symptoms characteristics of depression. The scale assesses

cognitive, somatic, psychomotor, and interpersonal symptoms derived from the DSM-III,

with an assessment period of the past two weeks. The RADS takes approximately 10 minutes

to complete, and scores range from 30 to 120 (Walker et al., 2005a). The RADS-2 is an

updated version of the original RADS. It also contains 30 items, and addresses several

psychometric issues in the original RADS such as re-examining estimates of internal

consistency, criterion related validity, and known groups validity (Osman et al., 2010).

Scores across the reliability, validity and diagnostic utility for the RADS among clinical and

nonclinical samples of children and adolescents are summarised in Table 4. Six studies (7

data points) examined the internal reliability of the RADS (Boyd and Gullone, 1997, Krefetz

et al., 2002, Osman et al., 2010, Reynolds and Miller, 1985, Walker et al., 2005b, Weber and

Terhorst, 2010), which was classified as ‘excellent’ overall (7 data points, n = 11095, pooled

estimate: 0.93, 95% CI: 0.86 to 0.99), however heterogeneity was high (I2 = 92%).

Metaregression showed that sample type (clinical vs. nonclinical; t(1, 4) = 0.52, p = 0.62, R2

23

< 0), sample age (child vs adolescent; t(1, 4) = 1.10, p = 0.32, R2 = 5.56% ) and risk of bias

quality score (t(1, 4) = 0.66, p = 0.54, R2 < 0) did not significantly affect the internal

reliability values on the RADS.

Only two studies examined the discriminative validity of the full scale RADS (Krefetz et al.,

2002, Osman et al., 2010) using cutoff scores of 67 and 70, however neither study used a

structured or semi-structured clinical interview based on DSM or ICD-10 criteria to

determine a diagnosis of MDD, and thus meta analyses and metaregression for sensitivity,

specificity and AUC values were not conducted.

Overall judgement

Comparisons across scales using metaregression revealed no differences in the four key

measures of reliability and validity. Internal reliability was classified as ‘good’ overall (47

data points, n = 31593, pooled estimate: 0.89, 95% CI: 0.86 to 0.92, I2 = 93%), with

metaregression revealing no impact of scale type (CDI, BDI, CES-D, RADS; t(1, 44) = 2.54,

p = 0.07, R2 = 15.31%).sample (clinical vs. nonclinical; t(1, 44) = 0.21, p = 0.84, R2 < 0), age

(child vs. adolescent; t(1, 44) = 1.30, p = 0.20, R2 = 3.33%) or risk of bias quality score (t(1,

44) = 0.05, p = 0.96, R2 < 0). Sensitivity and specificity coefficients were ‘moderate’

(sensitivity: 29 data points, n = 15238, pooled estimate: 0.80, 95% CI: 0.76 to 0.84;

specificity: 29 data points, n = 15238, pooled estimate: 0.78, 95% CI: 0.74 to 0.83), with

metaregression revealing no impact of scale type (t(1, 29) = 1.36, p = 0.18, R2 = 3.06%),

sample type (t(1, 29 = 1.24, p = 0.22, R2 = 1.53%) age (t(1, 29) = 1.25, p = 0.22, R2 = 1.60%)

cutpoint (t(1, 29) = 0.55, p = 0.59, R2 < 0) or risk of bias quality score (t(1, 29) = 1.79, p =

0.08, R2 = 6.54% for sensitivity coefficients. Similar results were found for specificity, with

no difference on the basis of scale type (t(1, 29) = 1.65, p = 0.12, R2 = 16.12%) sample (t(1,

24

29) = 0.32, p = 0.75, R2 < 0), age (t(1, 29) = 1.23, p = 0.23, R2 = 1.41%), cutpoint (t(1, 29) =

0.99. p = 0.33, R2 < 0) or risk of bias quality score (t (1, 29) = 0.25, p = 0.80, R2 < 0). Overall

diagnostic accuracy was ‘moderate’ (20 data points, n = 13384, pooled estimate: 0.86, 95%

CI: 0.79 to 0.92, I2 = 98%), with metaregression revealing no impact of scale type (t(1, 20) =

1.52, p = 0.14, R2 =6.19%) , sample (t(1, 20) = 0.84, p = 0.81, R2 < 0) , age (t(1, 20) = 0.41, p

= 0.68, R2 < 0) cutpoint (t(1, 20) = 0.25, p = 0.80, R2 < 0) or risk of bias quality score (t (1,

20) = 0.97, p = 0.34, R2 < 0).

Discussion

This is the first systematic review and meta analysis to provide an objective judgement of the

reliability, validity and diagnostic utility of symptom screening scales used in prevention

research for MDD among children and adolescents. The review identified 54 studies

(published in 52 papers), of which 32 studies were published since the last major narrative

review was conducted in 2002 (Myers and Winters, 2002b). The outcomes of this review

indicate that commonly used symptom screening scales for MDD including the CDI, BDI,

CES-D and the RADS have good internal consistency when used among children and

adolescents; and metaregression revealed that these properties did not differ when a range of

factors were considered, including sample type (clinical vs. nonclinical), age (child vs.

adolescent) and the risk of bias quality score. While the ability of each scale to correctly

identify positive and negative cases (i.e. diagnostic utility) was moderate overall; positive

predictive power was poor across most scales, suggesting that using cutoff scores on these

scales to determine clinical levels of MDD may result in high misclassification rates,

particularly when used in nonclinical settings (such as schools). Importantly, cutoff scores to

determine ‘caseness’ varied widely, and no single score was identified to be applicable in

both clinical and community settings, for any scale. However, metaregression revealed that

25

studies using higher cutoff scores on the CES-D had higher diagnostic accuracy (AUC) than

those with lower scores. Regardless, if researchers or clinicians choose to employ cutoff

scores on commonly used symptom screening scales for MDD to determine clinical

‘caseness’, they will most likely need to be adjusted to suit the sample under investigation.

Limitations

There are several limitations to consider when interpreting the outcomes of this review.

Firstly, risk of bias for the included studies was largely unable to be determined, or

determined to be high due to insufficient reporting. Secondly, many studies adjusted the

cutoff score to determine clinical caseness post hoc in order to maximise sensitivity and

specificity values, which may have artificially increased the diagnostic accuracy results using

AUC analyses, as these values are based on a trade off of sensitivity and specificity.

Consequently, the results regarding diagnostic utility identified in this review may differ in

real world settings where specific cutpoints are typically chosen a priori to classify

participants as having clinically significant depressive symptoms. Further, such cutpoints are

likely to vary substantially on a study by study basis and depending on the context in which

the scale is used (e.g. schools versus clinics). Thirdly, positive and negative predictive values

are dependent on the prevalence of the disorder in the sample (Trikalinos et al., 2012), and as

such, the low positive predictive values identified in this review may reflect a low prevalence

of depression among some samples; however the small number of studies and significant

heterogeneity between them precluded the use of metaregression to determine factors which

may have influenced predictive power in this review. Where metaregression was conducted,

it is possible that some analyses were underpowered due to low numbers of studies with

adequate data, as evidenced by low adjusted R2 values for some outcomes. A greater number

26

of high quality studies examining the psychometric properties of these scales across a range

of settings and ages is needed in in order to identify circumstances in which performance of

these scales is likely to differ. Further, while our review considered internal reliability of the

scales, other measures of reliability that may impact scale utility, including test retest and

inter rater reliability were less consistently reported, precluding the conduct of meta analyses

and metaregression on these outcomes. Finally, it is also important to consider that while the

criterion standards employed throughout this review (i.e. diagnostic interview schedules

based on DSM or ICD-10 diagnostic criteria such as the K-SADs) are currently the gold

standard in detecting the presence of a depressive disorder (Carlisle and McClellan, 2009),

they are not perfect measures, and are also open to the same issues of reliability and validity

as the screening scales themselves (Hodges, 1993).

There has been recent criticism of binary classification systems such as the DSM, which take

the assumption that people should be classified as either having or not having a particular

psychopathological disorder (Hankin et al., 2005), with suggestions that depression may

occur on a continuum (Solomon et al., 2001), particularly so in youth (Hankin et al., 2005). In

light of this, other outcomes in depression treatment that take into account the disability

caused by the disorder have been considered, such as functional impairment (McKnight and

Kashdan, 2009). Recent research suggests that indicators of disability or functional

impairment, in the absence of a formal diagnosis of a mental disorder may be a useful

measure of need for mental health services and treatment. An analysis of the 2007 Australian

Survey of Mental Health and Wellbeing found that a significant proportion of those who had

used mental health services in the past 12 months did not have a formally diagnosed mental

disorder, but had other indicators of possible need, including maintenance treatment

following a previous episode or significant psychological distress in the absence of a

diagnosis (Harris et al., 2014). Thus, in addition to using classification systems such as ICD

27

and DSM to determine mental disorder diagnoses and symptom screening scales to measure

symptom severity, researchers examining the efficacy of depression prevention and

treatment interventions should also consider including items assessing functional impairment,

as this may be a useful indicator of disability and need for treatment, even if a formal mental

diagnosis is not given (Üstün and Kennedy, 2009).

While there has been some suggestion that routine screening of MDD symptoms may

improve the detection of, and initiate timely treatment for MDD among children and

adolescents at a population level (Thombs et al., 2012), the impact of routine screening for

MDD on health outcomes for children and adolescents is unclear (Williams et al., 2009), and

there is some concern that screening may have the potential to cause harm to those who may

be diagnosed and treated inappropriately (MacMillan et al., 2005). Reviews conducted by the

Canadian Task Force on Preventive Health Care, and the United Kingdom’s National

Institute of Health and Clinical Excellence both concluded that there was insufficient

evidence regarding health outcomes to support the routine screening of MDD among children

and adolescents in primary healthcare settings (MacMillan et al., 2005, National

Collaborating Centre for Mental Health, 2005). Unfortunately there have been few large

prospective cohort studies that have examined the ability of MDD screening scales in

predicting the onset of depression among adolescents in order to inform their routine use in

healthcare settings (van Lang et al., 2007). Interestingly such studies have indicated that

recurrent screening of depressive symptoms does not improve detection of MDD, but that

cognitive and physical symptoms (such as sleeping problems) may be better predictors.

(McKenzie et al., 2011, van Lang et al., 2007). For example, McKenzie et al, 2011 identified

that particular items of the Short Mood and Feelings Questionnaire (SMFQ) that are not

included in formal diagnostic criteria for MDD, such as “I hated myself” were most

predictive of high depressive symptoms 12 months later. Such single items may have utility

28

for being incorporated into short screening measures, or existing population level surveys -

many of which exclude mental disorder symptomatology (Baxter et al., 2013), however

further research using large cohort studies is needed to support this.

Conclusions

Symptom screening scales for MDD are reliable measures of MDD symptomatology among

adolescent samples. While we found that the psychometric properties of the scales did not

differ when used in clinical and nonclinical settings, across a range of subject ages and with

variable study quality, limited data was available, and further high quality studies are needed

to identify conditions in which test performance may vary. Cutoff scores on symptom

screening scales for MDD may be useful for identifying ‘risk status’, however, the cutoffs

used will likely vary depending on the context in which the scale is applied, and

misclassification is likely to be high, particularly in nonclinical samples where disorder

prevalence is low (e.g. school settings).Further research examining the predictive ability of

depression screening scales in cohort studies of children and adolescents are needed.

29

References

Adewuya, A. O., Ola, B. A. & Aloba, O. O. 2007. Prevalence of major depressive disorders

and a validation of the Beck Depression Inventory among Nigerian adolescents. European Child & Adolescent Psychiatry, 16, 287-292.

Aebi, M., Metzke, C. W. & Steinhausen, H. C. 2009. Prediction of major affective disorders in adolescents by self-report measures. J Affect Disord, 115, 140-9.

Allgaier, A. K., Fruhe, B., Pietsch, K., Saravo, B., Baethmann, M. & Schulte-Korne, G. 2012. Is the Children's Depression Inventory Short version a valid screening tool in pediatric care? A comparison to its full-length version. Journal of psychosomatic research, 73, 369-74.

Ambrosini, P. J., Metz, C., Bianchi, M. D., Rabinovich, H. & Undie, A. 1991. Concurrent validity and psychometric properties of the Beck Depression Inventory in outpatient adolescents. J Am Acad Child Adolesc Psychiatry, 30, 51-7.

Andrews, G., Szabo, M. & Burns, J. 2002. Preventing major depression in young people. Brit J Psychiat, 181, 460-462.

Andrews, J. A., Lewinsohn, P. M., Hops, H. & Roberts, R. E. 1993. Psychometric properties of scales for the measurement of psychosocial variables associated with depression in adolescence. Psychol Rep, 73, 1019-1046.

Barrera, M., Jr. & Garrison-Jones, C. V. 1988. Properties of the Beck Depression Inventory as a screening instrument for adolescent depression. Journal of abnormal child psychology, 16, 263-73.

Baxter, A. J., Patton, G., Scott, K. M., Degenhardt, L. & Whiteford, H. A. 2013. Global epidemiology of mental disorders: what are we missing? PLoS One, 8, e65514.

Beck, A. T. & Alford, B. A. 2009. Depression: Causes and treatment, University of Pennsylvania Press.

Beck, A. T., Steer, R. A., Ball, R. & Ranieri, W. F. 1996. Comparison of Beck Depression Inventories-IA and-II in psychiatric outpatients. J Pers Assess, 67, 588-597.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J. & Erbaugh, J. 1961. An inventory for measuring depression. Arch Gen Psychiatry, 4, 561.

Bennett, D. S., Ambrosini, P. J., Bianchi, M., Barnett, D., Metz, C. & Rabinovich, H. 1997. Relationship of Beck Depression Inventory factors to depression among adolescents. J Affect Disord, 45, 127-34.

Betancourt, T., Scorza, P., Meyers-Ohki, S., Mushashi, C., Kayiteshonga, Y., Binagwaho, A., Stulac, S. & Beardslee, W. R. 2012. Validating the center for epidemiological studies depression scale for children in Rwanda. Journal of the American Academy of Child & Adolescent Psychiatry, 51, 1284-1292.

Birmaher, B., Ryan, N. D., Williamson, D. E., Brent, D. A., Kaufman, J., Dahl, R. E., Perel, J. & Nelson, B. 1996. Childhood and adolescent depression: a review of the past 10 years. Part I. J Am Acad Child Adolesc Psychiatry, 35, 1427-39.

Blom, E. H., Larsson, J.-O., Serlachius, E. & Ingvar, M. 2010. The differentiation between depressive and anxious adolescent females and controls by behavioural self-rating scales. J Affect Dis, 122, 232-240.

Boyd, C. P. & Gullone, E. 1997. An investigation of negative affectivity in Australian adolescents. Journal of Clinical Child Psychology, 26, 190-197.

Brent, D. A., Kalas, R., Edelbrock, C., Costello, A. J., Dulcan, M. K. & Conover, N. 1986. Psychopathology and its relationship to suicidal ideation in childhood and adolescence. J Am Acad Child Psychiatry, 25, 666-73.

30

Brooks, S. J. & Kutcher, S. 2001. Diagnosis and measurement of adolescent depression: a review of commonly utilized instruments. J Child Adolesc Psychopharmacol, 11, 341-76.

Brooks, S. J. & Kutcher, S. 2003. Diagnosis and measurement of anxiety disorder in adolescents: a review of commonly used instruments. J Child Adolesc Psychopharmacol, 13, 351-400.

Canals, J., Bladé, J., Carbajo, G. & Domènech-Llabería, E. 2001. The Beck Depression Inventory: Psychometric characteristics and usefulness in nonclinical adolescents. European Journal of Psychological Assessment, 17, 63-68.

Carey, M. P., Faulstich, M. E., Gresham, F. M., Ruggiero, L. & Enyart, P. 1987. Children's Depression Inventory: Construct and discriminant validity across clinical and nonreferred (control) populations. J Consult Clin Psychol, 55, 755.

Carlisle, L. & McClellan, J. 2009. Diagnostic interviews. In: DULCAN, M. K. (ed.) Textbook

of Child and Adolescent Psychiatry. Washington, DC: American Psychiatric Publishing.

Collett, B. R., Ohan, J. L. & Myers, K. M. 2003. Ten-Year Review of Rating Scales. V: Scales Assessing Attention-Deficit/Hyperactivity Disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 42, 1015-1037.

Costello, J. E., Erkanli, A. & Angold, A. 2006. Is there an epidemic of child or adolescent depression? J Child Psychol Psychiatry, 47, 1263-71.

Craighead, W. E., Curry, J. F. & Ilardi, S. S. 1995. Relationship of Children's Depression Inventory factors to major depression among adolescents. Psychol Assess, 7, 171.

Crowley, S. L. & Emerson, E. N. 1996. Discriminant validity of self-reported anxiety and depression in children: Negative affectivity or independent constructs? Journal of Clinical Child Psychology, 25, 139-146.

Crowley, S. L., Thompson, B. & Worchel, F. 1994. Validity Studies the Children'S Depression Inventory: A Comparison of Generalizability and Classical Test Theory Analyses. Educational and Psychological Measurement, 54, 705-713.

Cuijpers, P., Boluijt, P. & van Straten, A. 2008. Screening of depression in adolescents through the Internet : sensitivity and specificity of two screening questionnaires. European child & adolescent psychiatry, 17, 32-8.

Dolle, K., Schulte-Korne, G., O'Leary, A. M., von Hofacker, N., Izat, Y. & Allgaier, A. K. 2012. The Beck depression inventory-II in adolescent mental health patients: cut-off scores for detecting depression and rating severity. Psychiatry Res, 200, 843-8.

Fendrich, M., Weissman, M. M. & Warner, V. 1990. Screening for depressive disorder in children and adolescents: validating the Center for Epidemiologic Studies Depression Scale for Children. American journal of epidemiology, 131, 538-51.

Ferrari, A. J., Somerville, A. J., Baxter, A. J., Norman, R., Patten, S. B., Vos, T. & Whiteford, H. A. 2013. Global variation in the prevalence and incidence of major depressive disorder: a systematic review of the epidemiological literature. Psychol Med, 43, 471-81.

Figueras Masip, A., Amador-Campos, J. A., Gomez-Benito, J. & del Barrio Gandara, V. 2010. Psychometric properties of the Children's Depression Inventory in community and clinical sample. The Spanish journal of psychology, 13, 990-9.

Fristad, M. A., Weller, E. B., Weller, R. A., Teare, M. & Preskorn, S. H. 1988. Self-report vs. biological markers in assessment of childhood depression. J Affect Disord, 15, 339-45.

Fundudis, T., Berney, T. P., Kolvin, I., Famuyiwa, O. O., Barrett, L., Bhate, S. & Tyrer, S. P. 1991. Reliability and validity of two self-rating scales in the assessment of childhood depression. The British journal of psychiatry. Supplement, 36-40.

31

Garrison, C. Z., Addy, C. L., Jackson, K. L., McKeown, R. E. & Waller, J. L. 1991. The CES-D as a screen for depression and other psychiatric disorders in adolescents. J Am Acad Child Adolesc Psychiatry, 30, 636-41.

Giannakopoulos, G., Kazantzi, M., Dimitrakaki, C., Tsiantis, J., Kolaitis, G. & Tountas, Y. 2009. Screening for children's depression symptoms in Greece: the use of the Children's Depression Inventory in a nation-wide school-based sample. European Child & Adolescent Psychiatry, 18, 485-92.

Gore, F. M., Bloem, P. J. N., Patton, G. C., Ferguson, J., Joseph, V., Coffey, C., Sawyer, S. M. & Mathers, C. D. 2011. Global burden of disease in young people aged 10–24 years: a systematic analysis. Lancet, 377, 2093-2102.

Hankin, B. L., Fraley, R. C., Lahey, B. B. & Waldman, I. D. 2005. Is depression best viewed as a continuum or discrete category? A taxometric analysis of childhood and adolescent depression in a population-based sample. Journal of abnormal psychology, 114, 96-110.

Harris, M. G., Diminic, S., Burgess, P. M., Carstensen, G., Stewart, G., Pirkis, J. & Whiteford, H. A. 2014. Understanding service demand for mental health among Australians aged 16 to 64 years according to their possible need for treatment. Aust NZ J Psychiat.

Henderson, A. R. 1993. Assessing test accuracy and its clinical consequences: a primer for receiver operating characteristic curve analysis. Ann Clin Biochem, 30 ( Pt 6), 521-39.

Higgins, J., Thompson, S., Deeks, J. & Altman, D. 2003. Measuring inconsistency in meta-analyses. Brit Med J, 327, 557-560.

Hodges, K. 1993. Structured interviews for assessing children. J Child Psychol Psychiatry, 34, 49-68.

Hosman, C., Jane-Llopis, E. & Saxena, S. 2005. Prevention of mental disorders: effective interventions and policy options Oxford, Oxford University Press.

Jolly, J. B., Wiesner, D. C., Wherry, J. N., Jolly, J. M. & Dykman, R. A. 1994. Gender and the comparison of self and observer ratings of anxiety and depression in adolescents. J Am Acad Child Adolesc Psychiatry, 33, 1284-8.

Kashani, J. H., Sherman, D. D., Parker, D. R. & Reid, J. C. 1990. Utility of the Beck Depression Inventory with clinic-referred adolescents. J Am Acad Child Adolesc Psychiatry, 29, 278-82.

Kovacs, M. 1984. The Children's Depression, Inventory (CDI). Psychopharmacol Bull, 21, 995-998.

Kovacs, M. & Beck, A. T. 1977. An empirical-clinical approach toward a definition of childhood depression. Depression in childhood: Diagnosis, treatment, and conceptual models, 1-25.

Krefetz, D. G., Steer, R. A., Gulab, N. A. & Beck, A. T. 2002. Convergent validity of the Beck depression inventory-II with the reynolds adolescent depression scale in psychiatric inpatients. J Pers Assess, 78, 451-60.

Kumar, G., Steer, R. A., Teitelman, K. B. & Villacis, L. 2002. Effectiveness of Beck Depression Inventory-II subscales in screening for major depressive disorders in adolescent psychiatric inpatients. Assessment, 9, 164-70.

Leaf, P. J., Alegria, M., Cohen, P., Goodman, S. H., Horwitz, S. M., Hoven, C. W., Narrow, W. E., Vaden-Kiernan, M. & Regier, D. A. 1996. Mental health service use in the community and schools: results from the four-community MECA Study. Methods for the Epidemiology of Child and Adolescent Mental Disorders Study. J Am Acad Child Adolesc Psychiatry, 35, 889-97.

32

Leeflang, M. M., Deeks, J. J., Rutjes, A. W., Reitsma, J. B. & Bossuyt, P. M. 2012. Bivariate meta-analysis of predictive values of diagnostic tests can be an alternative to bivariate meta-analysis of sensitivity and specificity. Journal of clinical epidemiology, 65, 1088-97.

Liberati, A., Altman, D. G., Tetzlaff, J., Mulrow, C., Gøtzsche, P. C., Ioannidis, J. P. A., Clarke, M., Devereaux, P. J., Kleijnen, J. & Moher, D. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ, 339.

Logsdon, M. C. & Myers, J. A. 2010. Comparative performance of two depression screening instruments in adolescent mothers. J Womens Health 19, 1123-8.

MacMillan, H. L., Patterson, C. J. S., Wathen, C. N. & Care, a. T. C. T. F. o. P. H. 2005. Screening for depression in primary care: recommendation statement from the Canadian Task Force on Preventive Health Care. Can Med Assoc J, 172, 33-35.

Marton, P., Churchard, M., Kutcher, S. & Korenblum, M. 1991. Diagnostic utility of the Beck Depression Inventory with adolescent psychiatric outpatients and inpatients. Canadian journal of psychiatry. Revue canadienne de psychiatrie, 36, 428-31.

McKenzie, D. P., Toumbourou, J. W., Forbes, A. B., Mackinnon, A. J., McMorris, B. J., Catalano, R. F. & Patton, G. C. 2011. Predicting future depression in adolescents using the Short Mood and Feelings Questionnaire: a two-nation study. J Affect Dis, 134, 151-9.

McKnight, P. E. & Kashdan, T. B. 2009. The importance of functional impairment to mental health outcomes: A case for reassessing our goals in depression treatment research. Clinical Psychology Review, 29, 243.

Merry, S., McDowell, H., Hetrick, S., Bir, J. & Muller, N. 2004. Psychological and/or educational interventions for the prevention of depression in children and adolescents. Cochrane Database Syst Rev, CD003380.

Merry, S. N., Hetrick, S. E., Cox, G. R., Brudevold-Iversen, T., Bir, J. J. & McDowell, H. 2011. Psychological and educational interventions for preventing depression in children and adolescents. Cochrane Database Syst Rev, CD003380.

Murphy, K. & Davidshofer, C. 1994. Psychological testing: principles and applications. Upper Saddle River, NJ: Prentice Hall.

Myers, K. & Winters, N. C. 2002a. Ten-year review of rating scales. I: overview of scale functioning, psychometric properties, and selection. J Am Acad Child Adolesc Psychiatry, 41, 114-22.

Myers, K. & Winters, N. C. 2002b. Ten-year review of rating scales. II: Scales for internalizing disorders. J Am Acad Child Adolesc Psychiatry, 41, 634-59.

National Collaborating Centre for Mental Health 2005. Depression in children and young people: Identification and management in primary, community and secondary care,

UK, NICE. Nelson III, W., Politano, P., Finch Jr, A., Wendel, N. & Mayhall, C. 1987. Children's

Depression Inventory: Normative data and utility with emotionally disturbed children. Journal of the American Academy of Child & Adolescent Psychiatry, 26, 43-48.

Osman, A., Gutierrez, P. M., Bagge, C. L., Fang, Q. & Emmerich, A. 2010. Reynolds adolescent depression scale-second edition: a reliable and useful instrument. J Clin Psychol, 66, 1324-45.

Pavuluri, M. & Birmaher, B. 2004. A practical guide to using ratings of depression and anxiety in child psychiatric practice. Current psychiatry reports, 6, 108-16.

Ponterotto, J. G. & Ruckdeschel, D. E. 2007. An overview of coefficient alpha and a reliability matrix for estimating adequacy of internal consistency coefficients with psychological research measures. Percept Motor Skill, 105, 997-1014.

33

Prescott, C. A., McArdle, J. J., Hishinuma, E. S., Johnson, R. C., Miyamoto, R. H., Andrade, N. N., Edman, J. L., Makini, G. K., Jr., Nahulu, L. B., Yuen, N. Y. & Carlton, B. S. 1998. Prediction of major depression and dysthymia from CES-D scores among ethnic minority adolescents. J Am Acad Child Adolesc Psychiatry, 37, 495-503.

Radloff, L. S. 1977. The CES-D scale a self-report depression scale for research in the general population. Appl Psych Meas, 1, 385-401.

Reavley, N. J., Cvetkovski, S., Jorm, A. F. & Lubman, D. I. 2010. Help-seeking for substance use, anxiety and affective disorders among young people: results from the 2007 Australian National Survey of Mental Health and Wellbeing. Aust N Z J Psychiatry, 44, 729-35.

Reitsma, J., Rutjes, A., Whiting, P., Vlassov, V., Leeflang, M. & Deeks, J. 2009. Chapter 9: Assessing methodological quality. In: DEEKS JJ, B. P., GATSONIS C (ed.) Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy Version

1.0.0. Available from: http://srdta.cochrane.org/. The Cochrane Collaboration. Reynolds, W. M. 1986. A model for the screening and identification of depressed children

and adolescents in school settings. Prof School Psychol, 1, 117. Reynolds, W. M., Anderson, G. & Bartell, N. 1985. Measuring depression in children: a

multimethod assessment investigation. Journal of abnormal child psychology, 13, 513-26.

Reynolds, W. M. & Miller, K. L. 1985. Depression and learned helplessness in mentally retarded and nonmentally retarded adolescents: An initial investigation. Applied Research in Mental Retardation, 6, 295-306.

Roberts, R. E., Lewinsohn, P. M. & Seeley, J. R. 1991. Screening for adolescent depression: a comparison of depression scales. J Am Acad Child Adolesc Psychiatry, 30, 58-66.

Roelofs, J., Braet, C., Rood, L., Timbremont, B., van Vlierberghe, L., Goossens, L. & van Breukelen, G. 2010. Norms and screening utility of the Dutch version of the Children's Depression Inventory in clinical and nonclinical youths. Psychol Assess, 22, 866-77.

Russell, P. S., Basker, M., Russell, S., Moses, P. D., Nair, M. K. & Minju, K. A. 2012. Comparison of a self-rated and a clinician-rated measure for identifying depression among adolescents in a primary-care setting. Indian journal of pediatrics, 79 Suppl 1, S45-51.

Saylor, C. F., Finch, A. J., Jr., Spirito, A. & Bennett, B. 1984. The children's depression inventory: a systematic evaluation of psychometric properties. J Consult Clin Psychol, 52, 955-67.

Shemesh, E., Yehuda, R., Rockmore, L., Shneider, B. L., Emre, S., Bartell, A. S., Schmeidler, J., Annunziato, R. A., Stuber, M. L. & Newcorn, J. H. 2005. Assessment of depression in medically ill children presenting to pediatric specialty clinics. J Am Acad Child Adolesc Psychiatry, 44, 1249-57.

Smucker, M. R., Craighead, W. E., Craighead, L. W. & Green, B. J. 1986. Normative and reliability data for the Children's Depression Inventory. Journal of abnormal child psychology, 14, 25-39.

Solomon, A., Haaga, D. A. & Arnow, B. A. 2001. Is clinical depression distinct from subthreshold depressive symptoms? A review of the continuity issue in depression research. The Journal of nervous and mental disease, 189, 498-506.

Sorensen, M. J., Frydenberg, M., Thastum, M. & Thomsen, P. H. 2005. The Children's Depression Inventory and classification of major depressive disorder: validity and reliability of the Danish version. European child & adolescent psychiatry, 14, 328-34.

StataCorp 2013. Stata Statistical Software: Release 13. College Station, TX: StataCorp, LP.

34

Strober, M., Green, J. & Carlson, G. 1981. Utility of the Beck Depression Inventory with psychiatrically hospitalized adolescents. J Consult Clin Psychol, 49, 482-3.

Suzuki, T. 2011. Which rating scales are regarded as 'the standard' in clinical trials for schizophrenia? A critical review. Psychopharmacol Bull, 44, 18-31.

Teri, L. 1982. The use of the beck depression inventory with adolescents. Journal of abnormal child psychology, 10, 277-284.

Thombs, B. D., Roseman, M. & Kloda, L. A. 2012. Depression screening and mental health outcomes in children and adolescents: a systematic review protocol. Syst Rev, 1, 58.

Thompson, R. D., Craig, A. E., Mrakotsky, C., Bousvaros, A., DeMaso, D. R. & Szigethy, E. 2012. Using the Children's Depression Inventory in youth with inflammatory bowel disease: support for a physical illness-related factor. Comprehensive psychiatry, 53, 1194-9.

Thrane, L. E., Whitbeck, L. B., Hoyt, D. R. & Shelley, M. C. 2004. Comparing three measures of depressive symptoms among American Indian adolescents. American Indian and Alaska native mental health research (Online), 11, 20-42.

Timbremont, B., Braet, C. & Dreessen, L. 2004. Assessing depression in youth: relation between the Children's Depression Inventory and a structured interview. Journal of clinical child and adolescent psychology : the official journal for the Society of Clinical Child and Adolescent Psychology, American Psychological Association, Division 53, 33, 149-57.

Trikalinos, T., Balion, C., Coleman, C., Griffith, L., Santaguida, P. L., Vandermeer, B. & Fu, R. 2012. Chapter 8: Meta-Analysis of Test Performance When There Is a “Gold Standard”. In: CHANG, S., MATCHAR, D. & SMETANA, G. (eds.) Methods Guide

for Medical Test Reviews [Internet]. Rockville (MD): Agency for Healthcare Research and Quality (US).

Üstün, B. & Kennedy, C. 2009. What is “functional impairment”? Disentangling disability from clinical significance World Psychiatry, 8, 85-85.

van Lang, N. D. J., Ferdinand, R. F. & Verhulst, F. C. 2007. Predictors of future depression in early and late adolescence. J Affect Dis, 97, 137-144.

Walker, L., Merry, S., Watson, P. D., Robinson, E., Crengle, S. & Schaaf, D. 2005a. The Reynolds Adolescent Depression Scale in New Zealand adolescents. Aust NZ J Psychiat, 39, 136-140.

Walker, L., Merry, S., Watson, P. D., Robinson, E., Crengle, S. & Schaaf, D. 2005b. The Reynolds Adolescent Depression Scale in New Zealand adolescents. Australian & New Zealand Journal of Psychiatry, 39, 136-40.

Weber, S. & Terhorst, L. 2010. Does the currently accepted RADS-2 factor structure apply to LGBTIQ youth? Journal of Psychiatric & Mental Health Nursing, 17, 621-7.

Weiss, B., Weisz, J. R., Politano, M., Carey, M., Nelson, W. M. & Finch, A. J. 1991. Developmental differences in the factor structure of the Children's Depression Inventory. Psychological Assessment: A Journal of Consulting and Clinical Psychology, 3, 38.

Weissman, M. M., Orvaschel, H. & Padian, N. 1980. Children's symptom and social functioning self-report scales comparison of mothers' and children's reports. The Journal of nervous and mental disease, 168, 736-740.

Whitaker, A., Johnson, J., Shaffer, D., Rapoport, J. L., Kalikow, K., Walsh, B. T., Davies, M., Braiman, S. & Dolinsky, A. 1990. Uncommon troubles in young people: prevalence estimates of selected psychiatric disorders in a nonreferred adolescent population. Arch Gen Psychiatry, 47, 487-96.

Whiting, P. F., Rutjes, A. W., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., Leeflang, M. M., Sterne, J. A. & Bossuyt, P. M. 2011a. QUADAS-2: a revised tool

35

for the quality assessment of diagnostic accuracy studies. Annals of internal medicine, 155, 529-36.

Whiting, P. F., Rutjes, A. W. S., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., Leeflang, M. M. G., Sterne, J. A. C. & Bossuyt, P. M. M. 2011b. QUADAS-2: A Revised Tool for the Quality Assessment of Diagnostic Accuracy Studies. Annals of internal medicine, 155, 529-536.

Williams, S. B., O'Connor, E. A., Eder, M. & Whitlock, E. P. 2009. Screening for child and adolescent depression in primary care settings: A systematic evidence review for the US Preventive Services Task Force. Pediatrics, 123, e716-e735.

Yang, H. J., Soong, W. T., Kuo, P. H., Chang, H. L. & Chen, W. J. 2004. Using the CES-D in a two-phase survey for depressive disorders among nonreferred adolescents in Taipei: a stratum-specific likelihood ratio analysis. J Affect Disord, 82, 419-30.

36

Acknowledgements

The authors would like to acknowledge Professor Gavin Andrews (UNSW Australia) and Dr

Mary Lou Chatterton (Deakin University) for their assistance in developing ideas and

refining the concept of the manuscript.

37

Conflict of interest

All authors declare that they have no conflicts of interest.

38

Contributors

EAS, LD and CM conceived the study. EAS, YL and AL conducted the reviews and extracted

the data, and EAS and AL synthesised the data, with MH and GP contributing to

interpretation. All authors contributed to, and approved the final manuscript.

39

Role of funding source

The work was funded by the National Health and Medical Research Council of Australia

(NH&MRC Grant No. APP1041131).

40

Table 1. Validation evidence for the Children’s Depression Inventory (CDI) in child and adolescent samples

Source N Age

&

Gen

der

(%m, %f)

Sample

(location) Sca

le

na

me

(no. of ite

ms)

Reliabil

ity

Criterio

n

Cut

off

Sensiti

vity

Specifi

city

PP

V

NP

V

AU

C

Clinical

Allgaier et al. 2012

406

9-12 (56, 44)

Child hospital patients (Germany)

CDI (27)

α = 0.84

Kinder-DIPS

≥ 12

0.83

0.83

0.28

0.98

0.88

Figueras Masip et al. 2010a

102

10-18 (41, 59)

Child and adolescent clinical sample (Spain)

CDI (27)

α = 0.85

Clinical interview

19 0.95 0.96 - - -

Roelofs et al. 2010

511

7-18 (52, 48)

Child and adolescent clinical sample (Belgium)

CDI (27)

- KID-SCID

16 0.92 0.95 0.35

1.00

0.95

Shemesh et al. 2005

81 8-19 (42, 58)

Child and adolescent hospital patients (USA)

CDI (27)

- K-SADS-PL

11 0.80 0.70 0.38

0.94

-

Sorensen et al. 2005

149

8-13

Child and adolescent psychiatric patients (Denmark)

CDI (27)

α = 0.86

K-SADS-PL

12 0.63 0.64 0.21

0.63

0.72

Timbremont et al. 2004

80 8-18 (42, 58)

Child and adolescent psychiatric inpatients/outpatients, (Belgium, Netherlands)

CDI (27)

κ = 0.76

KID-SCID

16 0.84 0.94 0.63

0.98

0.86

Craighead et al. 1995

107

12-18 (54, 46)

Child and adolescent psychiatric inpatients (USA)

CDI (27)

- K-SADS-III-R

17 0.81 0.84 - - -

Weiss et al. 1991a

515

8-12 (73, 27)

Child in- and outpatients

CDI (27)

α = 0.86

- - - - - - -

Weiss et al. 1991b

768

13-16 (52, 48)

Adolescent in- and outpatients

CDI (27)

α = 0.88 - - - - - - -

Nelson III and Politano 1990

96 6-15 (61, 39)


CDI (27)

Test-retest: 0.62

- - - - - - -

Fristad et al. 1988

98 6-13 Child and adolescent depressed inpatients, psychiatric inpatient controls,

CDI (27)

- DICA ≥15 0.89 0.65 0.90

- -

41

normal controls (USA)

Carey et al. 1987a

153

9-17 (75, 25)

Child and adolescent inpatients (USA)

CDI (27)

α = 0.83 - - - - - - -

Nelson III et al. 1987

535

6-17 (68, 32)


CDI (27)

α = 0.86 - - - - - - -

Saylor et al. 1984a

105

7-16 (69, 31)

Child and adolescent inpatients (USA)

CDI (27)

Kuder-Richardson coefficient= 0.80

- - - - - - -

Non-

Clinical

Thompson et al. 2012

191

11-17 (53, 47)

Child and adolescent sample with inflammatory bowel disease (USA)

CDI (27)

α = 0.88 - - - - - - -

Figueras Masip et al. 2010b

1705

10-18 (46, 54)

Child and adolescent community sample (Spain)

CDI (27)

α = 0.82 Test-retest = 0.81

- - - - - - -

Giannakopoulos et al. 2009

538

8-12 (47, 53)

Child nation-wide school-based sample (Greece)

CDI (27)

α= 0.80; test-retest reliability ICC: 0.82 for girls and 0.62 for boys

- ≥ 15 - - - - -

Crowley and Emerson 1996

273

8-12 (51, 49)

Child school students (USA)

CDI (27)

α = 0.89 - - - - - - -

Crowley et al. 1994

164

11-16 (49, 51)

Child and adolescent community sample (USA)

CDI (27)

α = 0.86 - - - - - - -

Fundudis et al. 1991

93 8-16 (50, 50)

Children and adolescents of staff of a university child psychiatry department (USA)

CDI (27)

α = 0.88 Standardised Psychiatric Interview

15 0.77 0.77 - - -

Carey et al. 1987b

153

9-17 (74, 26)

Child and adolescent non-referred sample

CDI (27)

α = 0.75 - - - - - - -

Finch et al. 1987

108

7-12 (51, 49)

Child sample from public schools (USA)

CDI (27)

Test-retest: 0.82

- - - - - - -

42

Smucker et al. 1986

1252

8-16 (47, 53)

Child and adolescent sample from public schools (USA)

CDI (27)

α = 0.89 - - - - - - -

Reynolds et al 1985

166

8-12 (46, 54)


CDI (27)

α = 0.90 - - - - - - -

Saylor et al. 1984b

72 10-13 (31, 69)


CDI (27)

Kuder-Richardson coefficient= 0.94

- - - - - - -

Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis CDI = Children’s Depression Inventory α = Cronbach’s alpha reliability co-efficient κ = Cohen’s kappa reliability co-efficient Kinder-DIPS: The Diagnostic Interview for Psychiatric Disorders in Children and Adolescents KID-SCID: The child version of the Structured Clinical Interview for DSM Disorders K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological Version K-SADS-III-R: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – the revision of the third edition DICA: The Diagnostic Interview for Children and Adolescents

43

Table 2. Validation evidence for the Beck Depression Inventory (BDI) in child and adolescent samples

Source N Age &

Gender

(%m, %f)

Sample

(location)

Scal

e

nam

e

(no. of

items)

Reliabil

ity

Criterion Cuto

ff

Sensitiv

ity

Specific

ity

PP

V

NP

V

A

U

C

Clinic

al

Dolle et al. 2012

88 13-16 Adolescent psychiatric patients (Germany)

BDI-II (21)

α = 0.94 Kinder-DIPS

≥23 0.88 0.92 0.81

0.95 0.93

Blom et al. 2010

73 14-18 (0, 100)

Adolescent girls who were psychiatric patients (Sweden)

BDI-A1 (21)

- DAWBA >14 - - - - 0.86

Krefetz et al. 2002

100

12-17 (44, 56)


BDI-II (21)

α = 0.92 PRIME-MD

24 0.74 0.70 0.76

0.67 0.78

Kumar et al. 2002

100

12-17 (45, 55)


BDI-II (21)

α = 0.94 PRIME-MD

21 0.85 0.83 0.85

0.83 0.92

Bennett et al. 1997

328

11-19 (41.5, 58.5)

Child and adolescent psychiatric inpatients and outpatients of a clinic for depression (USA)

BDI (21)

- K-SADS 13 0.87 0.71 0.81

0.80 -

Jolly et al. 1994

75 12-17 (47, 53)

Child and adolescent inpatient

BDI (21)

α = 0.88 - - - - - - -

44

s (USA)

Ambrosini et al. 1991

122

12-19 (43, 57)

Child and adolescent outpatients referred to a clinic for depression (USA)

BDI (21)

α = 0.91 K-SADS-III-R

16 0.86 0.82 0.93

- -

Marton et al, 1991

119

14-19 (42.9; 57.1)

Adolescent inpatients and outpatients of a children's mental health centre with a current DSM-III Axis I psychiatric disorder (Depressed, n = 60 vs Non-Depressed, n = 59; USA)

BDI (21)

- DICA 16 0.68 0.81 0.79

- -

Barrera and Garrison-Jones 1988a

65 12-17 (48,52)


BDI (21)

α = 0.86 CAS 11 0.82 0.53 - - -

Strober et al. 1981

78 12-16 (46, 54)

Child and adolescent patients (USA)

BDI (21)

α = 0.79 SADS 16 0.81 0.81 - - -

Non-Clinical

Russell et al. 2012

181

14-17 Adolescents from three schools (India)

BDI (21)

κ = 0.06 K-SADS-PL

18 0.63 0.70 0.14

0.97 0.67

Adewuya et al. 2007

1095

13-18 (58, 42)

Adolescents attending secondary school (Nigeria)

BDI (21)

α = 0.82 K-SADS-E

18 0.91 0.97 0.88

0.98 0.99

45

Canals et al. 2001

304

17.5-18.5 (50.3,49.7)

Adolescents from an urban commercial area (Spain)

BDI (21)

- SCAN 16 0.90 0.96 0.47

0.99 -

Roberts et al. 1991a

1710

14-18 (47, 53)

Adolescents from senior high schools (USA)

BDI (21)

α = 0.88 K-SADS 11 0.84 0.81 0.10

0.995

-

Roberts et al. 1991b

804

14-18 (100,0)

Male adolescent sample from senior high schools (USA)

BDI (21)

- K-SADS 15 - - - - 0.93

Roberts et al. 1991c

906

14-18 (0,100)

Female adolescent sample from senior high schools (USA)

BDI (21)

- K-SADS 11 - - - - 0.83

Kashani et al. 1990

102

13-18 (53, 47)

Adolescents who attended an outpatient “counselling center” (USA)

BDI (21)

α = 0.87 DICA 16 0.48 0.87 - - 0.79

Whitaker et al 1990

356

14-17 (45.8; 45.5)

Adolescents (grades 8-12) from 8 schools in New Jersey county (USA)

BDI (21)

- Semi-structured clinical interview based on DSM-III criteria administered by child psychiatrist or health professional

16 0.80 0.65 - - -

Barrera and Garrison-Jones 1988b

49 12-18 (45; 55)

Child and adolescent secondary school students (USA)

BDI (21)

α = 0.90 CAS 16 1.00 0.93 - - -

Teri 1982

568

14-17 (40, 60)

High school students (USA)

BDI (21)

α = 0.87 - - - - - -

46

Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis BDI = Beck Depression Inventory α = Cronbach’s alpha reliability co-efficient κ = Cohen’s kappa reliability co-efficient Kinder-DIPS: The Diagnostic Interview for Psychiatric Disorders in Children and Adolescents DAWBA: The Development and Wellbeing Assessment PRIME-MD: The Primary Care Evaluation of Mental Disorders K-SADS: The Schedule for Affective Disorders and Schizophrenia for School-Age Children K-SADS-III-R: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – the revision of the third edition CAS: The Child Assessment Schedule SADS: The Schedule for Affective Disorders and Schizophrenia K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological Version DICA: The Diagnostic Interview for Children and Adolescents SCAN: Schedules for Clinical Assessment in Neuropsychiatry

47

Table 3. Validation evidence for the Center for Epidemiologic Studies Depression Scale

(CES-D) in child and adolescent samples

Source N Age

&

Gend

er

(%m, %f)

Sample

(location) Scal

e

nam

e

(no. of items)

Reliabil

ity

Criteri

on

Cuto

ff

Sensitiv

ity

Specific

ity

PP

V

NP

V

AU

C

Clinical

Logsdon and Myers 2010

59

13-18 (0, 100)

Adolescent mothers at 4-6 weeks postpartum (USA)

CES-D (20)

α = 0.84 K-SADS-PL

16 0.7 0.52 0.25

0.12

0.62

Aebi et al. 2009

140

Mean: 15.5 (33, 67)

Adolescents diagnosed with major depressive disorders (Switzerland)

CES-D (20)

α = 0.83 Clinical interview

21 0.86 0.86 - - 0.94

Non-

Clinical

Betancourt et al. 2012

367

10-17 (33, 67)

Children and adolescents (Rwanda)

CES-DC (20)

α = 0.86 MINI-KID

≥30 0.82 0.72 - - 0.83

Cuijpers et al. 2008

1392

14-16 (52, 48)

Adolescents (Netherlands)

CES-D (20)

α = 0.93 MINI 22 0.9 0.74 - - 0.90

Thrane et al. 2004

213

9-16 (54, 46)

Adolescents from three American Indian reservations (USA)

CES-D (20)

α = 0.80 -

- - - - - -

Yang et al. 2004

2440

12-16 (52, 48)

Adolescents (Taiwan)

CES-D (20)

α = 0.9 K-SADS-E

90th %tile

0.41 0.9 - - 0.9

Prescott et al. 1998

556

Mean: 16.8

Adolescent students from grades 9-12 (USA)

CES-D (20)

- DISC 16 0.79 0.74 0.24

0.96

0.74

Garrison et al. 1991a

1231

12-14 (100, 0)

Child and adolescent boys from school sample (USA)

CES-D (20)

α = 0.81 K-SADS-P

12 0.85 0.49 0.16

0.98

0.61

Garrison et al.1991b

1234

12-14 (0, 100)

Child and adolescent girls from a school

CES-D (20)

α = 0.86 K-SADS-P

22 0.83 0.77 0.32

0.98

0.77

48

sample (USA)

Roberts et al. 1991a

1710

14-18 (47, 53)

Adolescents of nine senior high schools (USA)

CES-D (20)

α = 0.89 K-SADS

24 0.84 0.75 0.08

0.99

-

Roberts et al. 1991b

804

14-18 (100,0)

Adolescent male sample of nine senior high schools (Roberts et al, 1991a; USA)

CES-D (20)

- K-SADS

22 - - - - 0.87

Roberts et al. 1991c

906

14-18 (0,100)

Adolescent female enrolment of nine senior high schools (Roberts et al, 1991a; USA)

CES-D (20)

- K-SADS

24 - - - - 0.83

Fendrich et al. 1990

220

12-18 Children and adolescents at risk for depression according to their parents’ diagnosis (USA)

CES-DC (20)

α = 0.89 K-SADS-E

≥16 0.71 0.62 0.15

0.96

-

Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis CES-D = Center for Epidemiologic Studies Depression Scale CES-DC = Center for Epidemiologic Studies Depression Scale Child Version α = Cronbach’s alpha reliability co-efficient K-SADS-PL: The Schedule for Affective Disorders and Schizophrenia for School-Age Children-Present and Lifetime version MINI-KID: The Mini-International Neuropsychiatric Interview MINI: The Mini-International Neuropsychiatric Interview for children K-SADS: The Schedule for Affective Disorders and Schizophrenia for School-Age Children K-SADS-E: The Schedule for Affective Disorders and Schizophrenia for School-Age Children – Epidemiological version DISC: The National Institute of Mental Health Diagnostic Interview Schedule for Children K-SADS-P: The Schedule for Affective Disorders and Schizophrenia for School-Age Children Present Episode version

49

Table 4. Validation evidence for the Reynolds Adolescent Depression Scale (RADS) in child and adolescent samples

Source N Age

&

Gend

er

(%m, %f)

Sample

(location) Scale

name

(no. of

items)

Reliabili

ty

Criteri

on

Cuto

ff

Sensitivi

ty

Specifici

ty

PP

V

NP

V

AU

C

Clinic

al

Osman, A. et al. 2010

196 14-17 (43, 57)

Adolescent inpatients (USA)

RADS-2 (30)

α = 0.95 BHS 67 0.64 0.8 0.77

0.68

0.80

Krefetz et al. 2002

100 12-17 (44, 56)


RADS (30)

α = 0.91 PRIME-MD

70 0.86 0.49 0.69

0.72

0.76

Non-

Clinic

al

Weber and Terhorst 2010

265 13-18 (73, 27)

Adolescent LGBTIQ youth (USA)

RADS-2 (30)

α = 0.92 - - - - - - -

Walker et al. 2005

9699

12-18 (46, 54)

Child and adolescent school students (New Zealand)

RADS (30)

α = 0.94 - - - - - - -

Boyd et al. 1997

783 11-18 (49, 52)

Child and adolescent school students (Australia)

RADS (30)

α = 0.85 - - - - - - -

Reynolds & Miller 1985a

26 Mean: 17.3

Intellectually delayed adolescent school students (USA)

RADS (30)

α = 0.87 - - - - - - -

Reynolds & Miller 1985b

26 Mean: 16.7

Non-intellectually delayed adolescent school students (USA)

RADS (30)

α = 0.97 - - - - - - -

Note: N = Number of participants in the study sample PPV = Positive predictive value NPV = Negative predictive value AUC = Area under the curve analysis RADS = Reynolds Adolescent Depression Scale α = Cronbach’s alpha reliability co-efficient BHS: Beck Hopelessness Scale PRIME-MD: The Primary Care Evaluation of Mental Disorders LGBTIQ: Lesbian, Gay, Bisexual, Transgender, Intersexual, Questioning.

51

Highlights:

• Symptom screening scales are often used to determine a clinical diagnosis of

depression.

• We examined the diagnostic utility of these scales using systematic review and meta-

analysis.

• Commonly used screening scales for depression are reliable measures among

adolescents.

• Using cutpoints to determine a clinical diagnosis may produce high misclassification

rates.

Documents

Author's Accepted Manuscript - UQ eSpace348925/UQ348925_OA.pdf · Author's Accepted Manuscript Symptom screening scales for detecting major depressive disorder in children and adoles-cents: