17
International Journal of Methods in Psychiatric Research 14(4); 186-201 (2005) DOI: 10.1002/mpr.7 ADHD Rating Scale IV: psychometric properties from a multinational study as a clinician-administered instrument S. ZHANG,' D.E. FARIES,' M. VOWLES,^ D. MICHELSON''^ 1 Lilly Research Laboratories, Indianapolis, USA 2 Lilly Research Centre, Surrey, UK 3 Indiana University School of Medicine, Indianapolis, USA ABSTRACT The development of rating scales for attention-dcficitlkyperactivity disorder (ADHD) has traditionally focused on parent- or teacher-rated scales. However, clinician-based instruments are valuable tools for assessing ADHD symptom severity. The ADHD Rating Scaie IV (ADHD RS), clinician administered and scored, has been validated as a useful instrument to assess ADHD symptoms among American children and adolescents. In this study, we assessed the ps^ichometric properties of the scale in a recent clinical trial coruiucted mainly in Europe with over 600 children and adolescents diagnosed with ADHD. The trial was conducted in I ] European countries plus Australia, Israel, and South Africa. Results based on data in the study indicate that this i^ersion of the scale has acceptable psychometric properties including inter-rater reliability, test-retest reliability, internal consistency, factor structure, convergent and divergent validity, discrim- inant validity, and responsiveness. There were low-to-moderate ceiling and floor effects. The psychometric properties were comparable with other validated scales for assessing A D H D symptom .severity. These results were consistent across the 14 countries participating in this trial. Overall, the data from this study support the use of the ADHD RS as a clinician-rated instrument for assessing the severity of ADHD symptoms in children and adolescent.'^ in Europe. Copyright © 2005 John Wiley & Sons, Ltd. Key words: Attention Deficit Hyperactivity Disorder, Ratinp Scii!e-IV-Parent, psychometric properties Information about the scale rating procedure is given in the method section of this paper. Introduction children (National Institute of Clinical Excellence, Attention-deficit/hyperactivity disorder (ADHD) is a 2000). In the Netherlands, a prevalence of 2% to 4% neurobiological disorder characterized by symptoms of for children between the ages of 5 and 14 years has inattention and/or hyperactivity/impulsivity with an been reported (Health Council of the Netherlands, onset during childhood (Pliszka et al., 1996; Faraone 2000), while Baumgaertel, Wolraich, and Dietrich and Biederman, 1998). It is one of the most common (1995) found a prevalence rate of 17,8% for attention psychiatric disorders of childhood atid adolescence, and deficit disorders in German school-aged children, occurs in 3% to 7% of school-aged children in the US These differences, however, appear to be primarily due (American Psychiatric Association, 2000). Outside of to different diagnostic criteria and methodology rather the US and Canada, the reported prevalence rates for than true differences in prevalence, as evidenced when ADHD var>' from country to country' (Mueller et al., consistent criteria are used to assess the presence ofthe 1995; Reid et al, 1998; Livingston, 1999; Graetz at al., disorder (Prendergast et al., 1988). 2001). In the UK, the estimated prevalence rate of The standard instruments for assessing ADHD ADHD has been reported to he 5% for school-aged symptom severity have traditionally been parent- or Copyright© 2005 John WLICV SJ. Sons, Led. [4; 186_2O1 (2005)

ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

Embed Size (px)

Citation preview

Page 1: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

International Journal of Methods in Psychiatric Research14(4); 186-201 (2005)DOI: 10.1002/mpr.7

ADHD Rating Scale IV: psychometricproperties from a multinational study as aclinician-administered instrumentS. ZHANG,' D.E. FARIES,' M. VOWLES,^ D. MICHELSON''^

1 Lilly Research Laboratories, Indianapolis, USA2 Lilly Research Centre, Surrey, UK3 Indiana University School of Medicine, Indianapolis, USA

ABSTRACTThe development of rating scales for attention-dcficitlkyperactivity disorder (ADHD) has traditionally focused on parent-

or teacher-rated scales. However, clinician-based instruments are valuable tools for assessing ADHD symptom severity.

The ADHD Rating Scaie IV (ADHD RS), clinician administered and scored, has been validated as a useful instrument to

assess ADHD symptoms among American children and adolescents. In this study, we assessed the ps^ichometric properties

of the scale in a recent clinical trial coruiucted mainly in Europe with over 600 children and adolescents diagnosed with

ADHD. The trial was conducted in I ] European countries plus Australia, Israel, and South Africa.

Results based on data in the study indicate that this i ersion of the scale has acceptable psychometric properties including

inter-rater reliability, test-retest reliability, internal consistency, factor structure, convergent and divergent validity, discrim-

inant validity, and responsiveness. There were low-to-moderate ceiling and floor effects. The psychometric properties were

comparable with other validated scales for assessing ADHD symptom .severity. These results were consistent across the 14

countries participating in this trial. Overall, the data from this study support the use of the ADHD RS as a clinician-rated

instrument for assessing the severity of ADHD symptoms in children and adolescent.'^ in Europe. Copyright © 2005 John

Wiley & Sons, Ltd.

Key words: Attention Deficit Hyperactivity Disorder, Ratinp Scii!e-IV-Parent, psychometric properties

Information about the scale rating procedure is given in the method section of this paper.

Introduction children (National Institute of Clinical Excellence,Attention-deficit/hyperactivity disorder (ADHD) is a 2000). In the Netherlands, a prevalence of 2% to 4%neurobiological disorder characterized by symptoms of for children between the ages of 5 and 14 years hasinattention and/or hyperactivity/impulsivity with an been reported (Health Council of the Netherlands,onset during childhood (Pliszka et al., 1996; Faraone 2000), while Baumgaertel, Wolraich, and Dietrichand Biederman, 1998). It is one of the most common (1995) found a prevalence rate of 17,8% for attentionpsychiatric disorders of childhood atid adolescence, and deficit disorders in German school-aged children,occurs in 3% to 7% of school-aged children in the US These differences, however, appear to be primarily due(American Psychiatric Association, 2000). Outside of to different diagnostic criteria and methodology ratherthe US and Canada, the reported prevalence rates for than true differences in prevalence, as evidenced whenADHD var>' from country to country' (Mueller et al., consistent criteria are used to assess the presence ofthe1995; Reid et al, 1998; Livingston, 1999; Graetz at al., disorder (Prendergast et al., 1988).2001). In the UK, the estimated prevalence rate of The standard instruments for assessing ADHDADHD has been reported to he 5% for school-aged symptom severity have traditionally been parent- or

Copyright© 2005 John WLICV SJ. Sons, Led. [4; 186_2O1 (2005)

Page 2: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scaie/V 187

teacher.-rated scales, which have heen well validatedand in use for many years (Barkley, 1990; Conners,1997; DuPaul et al., 1998). Cohen et al. (1990) con-cluded that the low correlation hetween parent- andteacher-scored scales was not due to low rellahility ofthe scales but suf gested the need for multiple methodsfor assessing symptom severity. Brown et al. (2001)found that the teacher rating scale had a lower effectsize when compared with the parent- and clinician-rated scales. Recently, Beidemian et al. (2004)performed a literature search and summarized clinicaltrials, which contained both parent- and teacher-reported measures. Their results showed similarsensitivity for parent and teacher ratings.

Parent- and teacher-rated scales provide importantinformation for assessing ADHD symptom severity huta clinician-rated assessment tnay he preferahle as a pri-mary outcome measure in a clinical trial designed toassess treatment efficacy. First of all, clinicians aretrained to assess the impact of interventions alreadyimplemented to cope with ongoing hehaviour proh-lems (for example, special classroom methods), animportant factor in achieving an accurate assessmentof the severity of the underlying disorder. Secondly, useof a clinician-rated scale avoids the prohlems of need-ing to obtain rating for children from multiple teachersand issues of school vacations. Thirdly, inclusion ofclinician assessments is consistent with the DSM-IVcriteria for the diagnosis of ADHD, which requiresthat the ADHD symptoms are present to a degree ofimpairment in at least 2 settings, which cannot he sat-isfied hy parent or teacher ratings alone. Furthermore,in the clinical trial setting, trained clinician raters canapply standardized, symptom-severity inclusion crite-ria, which lead to the selectiim of a more homogenouspatient population and a reduction in variability thatcan be important in detecting true drug effects.Additionally, application of standardized inclusion cri-teria (DSM-IV) by the trained clinicians facilitates theincorporation of data collected from multiple sourcesand settings into a single score, thereby reducing thestatistical multiplicity that occurs when data are col-lected inconsistently using separate measures. Andfinally, in neuroscience clinical trials, using trainedclinician raters can reduce variability hy having themapply consistent judgements on severity ratings acrosspatient.s; therefore, clinician-rated instruments of ill-ness severity are typically required as the primaryefficacy outcome measures by regulatory authorities.

Parent-, teacher-, and clinician-scored versions ofthe ADHD Rating Scale-IV (ADHD RS, DuPaul etal., 1998; Farics, 2001) have been validated in NorthAmerican populations. Magnusson et al. (1999) stud-ied the validity of an Icelandic version of ADHD RSrated by parents and teachers in over 400 Icelandicschoolchildren. However, the validity of this instru-ment has not heen previously studied in majoritynon-North American populations, especially as a clin-ician-rated instrument.

The objective is to examine the psychometricproperties of the ADHDRS-IV-Parent: Clinicianadministrated and scored (ADHDRS-PI), scored bytrained clinicians based on parent interviews in chil-dren with a diagnosis of ADHD who reside outside ofNorth America. Specifically, the assessment includesthe inter-rater reliability, internal consistency, factorstructure, test-retest reliability, convergent and diver-gent validity, discriminant validity, responsiveness,and ceiling and floor effects. These psychometricproperties are assessed relative to ClinicalGlobal Impressions-ADHD-Severity (CGI-ADHD-S),Conners' Parent Rating Scales (CPRS), and Conners'Teacher Rating Scales (CTRS).

Materials and methods

Situdy desifrn and rating scale

The study of the validity and reliability of theADHDRS-PI was assessed as part of a large clinical trialof atomoxetine. The study was conducted at 33 sites inthe UK, France, Spain, Italy, Belgium, the Netherlands,Germany, Poland, Hungary, Sweden, Norway, Israel,South Africa, and Australia, and enrollment took placeover approximately 11 months. Principle investigatorsat each site were physicians with specialty training inpsychiatry or paediatrics and psychologists with experi-ence diagnosing and treating children and adolescentswith ADHD. After description of the procedures, pur-pose ofthe study, and prior to the administration of anystudy procedure or dispensing of study medication, writ-ten informed consent was obtained from each patient'sparent or guardian and written assent was obtained fromeach patient. This study was reviewed hy each site'sinstitutional review hoard and was conducted in accor-dance with the ethical standards of the Declaration ofHelsinki 1975, as revised in 2000.

In this trial, 604 children and adolescents, aged 6through 15 years, who met the Diagnostic and

2005 John Wiley & Sons, LtJ- 14:186-201 (2005)

Page 3: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

188 Zhang eta/.

Statistical Manual of Mental Disorders, Fourth Edition(DSM-IV) criteria for a diagnosis of ADHD, wereenrolled at 33 investigational sites in 14 countries.The study design included a 1-week assessment anddrug washout phase for those children and adolescentstaking any medication excluded by the protocol, fol-lowed by a 10-week, open-lahel, treatment periodduring which atomoxetine was administrated twice aday (the subsequent extension phase of the study isongoing). Detailed information regarding the efficacyand safety of the use of atomtjxetine has been previ-ously published (Allen et al., 2001; Michelson et al.,2001; Spencer et al., 2001; Michelson et al., in press).Only data from the acute phase are presented here.

Patients were children and adolescents 6 through15 years of age at the time of study entry. For eachpatient, the clinical diagnosis of ADHD was confirmedusing the Kiddie Schedule of Affective Disorders andSchizophrenia for School-Aged Children Present andLifetime Version (KSADS-PL) (Kaufman et al., 1996).Patients must have had an ADHDRS-PI score of atleast 1.5 standard deviations above the age and gendernorm for their diagnostic subtype using published USnorms for the ADHDRS-PI. Generally, the sameinvestigator completed hoth KSADS-PL and theADHD RS. Patients who were taking psychotropicmedication at study entry had a washout equal to aminimutn of five half-lives of the psychotropic medica-tion prior to obtaining the baseline severity assessmentand starting treatment with atomoxetine. Patientswere seen at approximately weekly or biweekly visitsduring tbe 10-week, open-label period, and no otherpsychotropic medications were allowed during thestudy.

Parent-, teacher-, and clinician-scored instrumentswere used in this study to assess the severity of ADHD.Since clinicians at each investigational site werefluent in English, the English version of clinician-ratedscales was used without translation into the native lan-guages. The intention of this paper is simply validatingthe use of the English clinician-rated ADHDRS-PIscale in countries outside the US. The translation andvalidation of self and investigator forms is the subjectof future work.

The primary efficacy measure was the ADHDRS-PI. It was completed at each visit to assess ADHDsymptom severity over the past week or past 2 weeks.The ADHDRS-PI is an 18-item scale witb one itemfor each of the 18 symptoms contained in the DSM-IV

diagnosis of ADHD. Each item is scored on a 0 to 3scale: 0 = none (never or rarely); 1 = mild (some-times); 2 = moderate (often); 3 = severe (very often).The total score is computed as the sum of the scores oneach of the 18 items. In addition to the total score, thescores from tbe inattention and hyperactivity/impul-sivity subscales were computed. The inattentionsubscale score is the sum ofthe scores on the odd-num-bered items and the hyperactivity/impulsivity subscaleis the sum of the scores on the even-numbered items.A clinician-ratc'd symptom severity for each item wasbased on his or her interview with the child's parent orprimary caretaker. The clinicians were trained at thetime of study start-up to rate scores based on the fre-quency ofthe hehaviour (across multiple settings) andthe degree of impairment, and to use developmentalcomparison (to consider the age appropriateness ofbehaviour in rating each item).

The Clinical Global Impressions-ADHD-Severity(CGI-ADHD-S) (Guy, 1976) is a single-item clini-cian rating of the clinician's assessment of theseverity of the ADHD symptoms in relation to tbeclinician's total experience with patients withADHD. Severity is rated on a 7-point scale: 1 =normal, not ill; 2 = minimally ill; 3 = mildly ill; 4 =moderately ill; 5 = markedly ill; 6 = severely ill; and 7= very severely ill. It was completed at each visit.Other scales used in this study included tbe Conners'Parent Rating Scale - Revised: Short Form (CPRS)and Conners' Teacher Rating Scale-Revised: ShortForm (CTRS) (Conners, 1997). Both the CPRS andCTRS have 4 subscales and were collected at Visit 1and endpoint: ADHD Index, Hyperactivity,Cognitive, and Oppositional.

Reliability and validity methods

Inter-rater reliabilityInter-rater reliability refers to a scale's ability toachieve similar ratings by different raters assessing thesame patient. The Kappa statistic and average squareddeviation from the mode were used to assess inter-raterreliability. Kappa statistic has a range from 0 to 1 and iscommonly used to assess inter-rater reliability whenobserving categorical variables. The Kappa statistic isthe proportion of agreeing pairs out of all possiblepairs, adjusted for chance agreement (Fleiss, 1971).The average squared deviation was computed for eachrater by averaging (over the 18 items) the squared

Copyright © 2005 John Wiley &. Sons, Ltd. 14:186-201(2005)

Page 4: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Raring Scoft; IV 189

difference between their ratings and the mode ratingfor each item from the entire group (Channon andBurler, 1998); values from all raters were then aver-aged. Unlike the Kappa statistic, the average squareddeviation statistic more severely penalizes ratings thatarc more than I point from the mode.

For this study, all clinical personnel who would beusing the ADHDRS-PI were trained and assessed in 2rater-training sessions prior to the study. The rater-training session began with a presentation anddiscussion of the rating scale. Raters then watched avideotaped interview between a clinician and a parentof a child with ADHD, and independently completedthe ADHDRS-PI after the video. The results werethen presented and points of agreement and disagree-ment were discussed to help gain consistency in futurescoring. The raters then watched a second videotapedinterview and completed the ADHDRS-PI for thesecond time. Data from 1 of the 2 ratcr-trainitig ses-sions were collected and available for the analysis inthis manuscript.

Factor structureFactor analysis is a powerful tool used to uncover thelatent structure (dimensions) of a set of variables. Itcan be used to validate a scale by demonstrating thatits constituent items load on the same factor, and todrop proposed scale itetns that cross-load on tiiore than1 factor.

In this paper, a principal-components factor analy-sis with Varimax rotation (Reid, 1995) was performedto examine the factor structure of ADHD RS.

Internal consistencyCronbach's alpha was used to evaluate internal consis-tency (Cronbach, 1951). It assesses the degree towhich each item of a rating scale measures the sameconstruct based upon all possible correlations between2 sets of items within a rating scale. The range of thestatistic is from 0 to 1. The accepted minimal standardto claim internal consistency is 0.65 (Nunnally, 1994).The scores for all 18 ttems in ADHDRS-PI at Visit 1for all enrolled patients were used to cotnputeCronbach's alpha for ADHDRS-PI total score. Scoresfrom all odd- and even-numbered items were used tocompute Cronbach's alpha for ADHDRS-PIinattention subscale and hyperactive/impulsive sub-scale. Similar analysis was also done for ADHDRS-PIat Visit 2.

Test-retest reliabilityA good rating scale sht)uld he able to reproduce thesame score for the same individual at different timeswhile in the same disease condition. Test-retest relia-bility indicates such stability of a rating scale.

The intra-class correlation coefficient (ICC) is therecommended measure to assess the test-retest reliabil-ity (Deyo et al., 1991). The ICC is computed as thevariability due to patients divided by the total variabil-ity (Included variability due to patients, time, andother factors). Since Pearson's correlation coeft icientis the most commonly u.sed measure for assessing thestrength of the relationship between the scores, bothcoefficients were computed to assess the correlationbetween the ADHD-PI total scores at Visits 1 and 2.To show the variability of correlation, the confidenceintervals for ICC and Pearson's correlation coefficientswere computed based on Fisher's Z transformation(Anderson, 1990). To further quantify shifts over time,t-test was used to test the null hypothesis that themean ADHD total scores obtained were similar atVisits 1 and 2.

In this study, ADHDRS-PI and CGl-ADHD-S werecompleted at both Visits 1 and 2, which were sched-uled approximately 1 week apart. Since no study drugwas dispensed at these two visits, the changes inADHDRS-PI and CGI-ADHD-S scores from Visits 1to 2 can be used to assess the stability of scores.Patients who were taking tnedication for ADHD otherthan the study drug at Visit 1 were excluded from theanalysis group due to the possible changes in ADHDsymptoms from Visits 1 to 2 after discontinuing thedrug. Test-retest reliability was not assessed for theCPRS and CTRS as these scales were only adminis-tered at Visit 1.

In addition to the correlation analysis, a graphicalapproach. Bland and Altman Plot (Bland and Altman,1986), which plots the difference between the 2 mea-sures plot against the average score, was also used toassess the test-retest reliability.

Starting from this section, the imputed scores fortotal and subscales are used for all analyses. Theimputed score for subscale was computed as follows: ifonly 1 single item was missing in the subscale, the meanscore for all other items in the subscale was imputed asthe score for the missing score. The total score wascomputed as the sum of the imputed subscale scores. Ifmore than 1 item was mi.ssing In the subscale then thesubscale score and total score would be missing.

Copyrighr © 2005 John Wiley & Sons, Ltd. 14: 186-201(2005)

Page 5: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

190 Zhang etai

Convergent and divergent validityConvergent and divergent validity were utilized toestablish the construct validity ai the scale.Convergent/divergent validity indicates a relationshipbetween the scale under review and other scalesthought to measure the same/different construct. Inthis manuscript, the scale undet review is ADHDRS-PI; other validated scales for cotuparison were theCPRS (parent scored), CTRS (teacher scored), andCGI-ADHD-S (investigator scored). All of thesescales were used to measure the severity of ADHDsymptoms.

To assess convergent validity, Pearsons correlationcoefficients were computed between the followingmeasures (meastircs of the same construct): ADHDRS-Pl Total with CGl-ADHD-S, CPRS ADHD Index andCTRS ADHD Index, ADHDRS-PI InattenttonSubscale with CPRS Cognitive and CTRS CognitiveSubscale, and ADHDRS-PI hyperactive/impulsivesubscale with CPRS hyperactive and CTRS hyperac-tive suhscale. Correlations were computed ft>r bothbaseline and change-from-haseline-to-endpoint scores.

To assess divergent validity, Pearson's correlationcoefficients were computed between the followingmeasures (measures of different construct); ADHDRS-PI Inattention Subscale with CTRS hyperactive andCTRS hyperactive subscale, and ADHDRS-PI bypcr-active/impulsive subscale with CPRS cognitive andCTRS cognitive suhscale. Correlations were cotnputedfor haseline scores.

To show the variability of correlatiem, the confi-dence intervals for Pearson's correlation coefficientswere computed based on Fisher's Z transformation(Anderson, 1990).

Discriminant validityThe di.scriminant validity of a scale tneasures the scalesability to distinguish hetween different groups of sub-jects. An instrument for asscssitig ADHD symptomseverity with discriminant validity should distinguishhetween patients with and without a diagnosis ofADHD, or between patients with and without signifi-cant hyperactive symptoms. In this study. (n"ily patientswith ADHD sytuptoms were recruited. Thus, insteadof a comparison hetween ADHD patients and a con-trol group, 3 alternative approaches were used to assessthe discriminant validity.

First, a cotiiparison hetween patients with differentADHD sLthtypes, namely inattentive, hyperactive/

impulsive, and combined (inattentive plus hyperactive/impulsive) was conducted. As noted previously,ADHD suhtype was assessed at Visit 1 using theKSADS-PL. Patients with an inattentive suhtype didnot have sufficient hyperactive/impulsive symptoma-tology to meet the full ctnnhined ADHD subtype. Thisprovides an opportunity to assess whether the hyperac-tive/impulsive subscale of the ADHDRS-PI was ableto distinguish between the ADHD subtypes.Therefore, we think the comparison hetween subtypescan be used to assess the discriminant validity of a sub-scale of the ADHDRS-PI.

Secondly, although patients without ADHD sytnp-ttims were not recruited, with active treatment,severity levels of the sytnptoms decrease to aboutnormal (as indicated by CGl-ADHD-S score = I or 2)at endpoint for some patients. Thus, an analysis ofvariance can he used to compare mean ADHDRS-PItotal scores with CGl-ADHD-S scores. Comparisonwith the CGI-ADHD-S scores also provides additionalinformation regarding the clinical significance of spe-cific ADHDRS-PI total scores.

Third, Pearson's correlation coefficient between theADHDRS-PI and other measures that should not helogically related were cotnputed. Several measurementsincluded in the study were the Children's DepressionInventory (CDl) and Children's Depression RatingScale-Revised (CDRS) to measure presence and sever-ity of depression, and the Multidimensional AnxietyScale for Children (MASC) to assess anxiety. The cor-relation of scores hetween ADHDRS-PI and thesemeasurements should be low compared with thathetween ADHDRS-PI and CPRS or CTRS.

ResponsivenessResponsiveness indicates the ability of a scale to detectsmall but clinically significant changes in the patient'ssymptotn severity when a change has tx;curred. Thestandardized response mean (SRM) is a commonlyused statistic to assess responsiveness (Stratford et al.,1996). The SRM is defined as the mean change-from-baseline score divided hy the standard deviation of thechanj 'e.s scores. The SRM hascd on the ADHDRS-PIwas compared with the SRM from other validatedscales (CGI-ADHD-S, CPRS, and CTRS).

Minitnal clinically important differencesAnother itiiporr^inr need is to determine the hetween-and within-treatrnent minimum clinically important

Copyright © 2005 John Wiley 6* Sons. Ltd. 14; 186-201 (Z005)

Page 6: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scale IV 191

differences (MCID) for an instrument. The MCIDhelps clinicians interpret the relevance of changes inthe instrument scores. The within-treatment MCID isdefined as the improvement in a score with treattnent atwhich a patient recognizes that sheA e is improved. TTiebetween-treatment MCID is the minimutii differencebetween 2 treatments that can be considered clinicallyrelevant. One widely accepted way to determine theMCIDs is to anchor the scale to a global rating scalesuch as CGl-lmprovetnent (CGl-1). Unfortunately,CGl-1 was not collected in this study; as an alternative,CCl-ADHD-S was used. First, the LOCF change frombaseline to endpoint of CGI-ADHD-S was calculatedfor each patient. If the change score is 0, then thepatient was rated as having 'no change'; if the changescore is 1 (the smallest change detectable by the CGl-ADHD-S), then the patient was rated as 'a little better'.The mean change in tbe ADHDRS for those subjectswho rated as 'a little better' could be considered as thewithin-treatment MCID. The difference in the meanchanges for subjects who rated as 'a little better' andwho rated 'no change' could be considered as thebetween-treatment MCID. The between-treatmentMCID can be a sound choice for the treatment differ-ence in order to power the clinical studies. These twoMCIDs provide guidance to researchers to interpret the

change scores for the instmment. They become criticalwhen statistically significant differences needed to bejustified as clinically relevant.

Ceiling and floor effectsCeiling and floor effects exist if a substantial percent-age of the patient scores are at the ends of the scales -then the scale may not be able to accurately capturechange scores or differentiate among patients near theceiling or floor. A scale with floor effect lacks the abil-ity to detect minor disease symptoms, while a scalewith ceiling effect would he less sensitive to changes inthe more serious symptoms (Stucki and Michel, 1995;Herrmann et al., 1997). Percentages of lowest andhighest possible scores at baseline and endpoint werecalculated to assess ceiling and floor effects.

Results

SubjectsSix-hundred-and-ftmr patients enrolled in this study(H countries, 33 investigational sites). Tahle 1 sum-marizes the patient characteristics for this group aswell as the patient characteristics for a similar US-based study. The mean (SD) age for the group was10.24 (2.25) years.

Table 1. Summary ot h;i L'Ii L• patient

Current multinationtii study (N = 604) Previous US study (Farics, 2001) (N = 228)

Variable

GenderMale

OriginCaucasian

ADHD suhrypeHyper/impulsiveInattentiveCotTibined

Previous stimulanr

Ag<i (yrs.)5-78-910-1112-1314-15

541

583

.30124450

341

4415518414081

(89.6)

(96,5)

(5,0)(20.5)(74.5)

(56.5)

(7.3)(25.7)(30.5)(23.2)(13.4)

n (%)

211 (92.5)

175 (76.8)

3 (1.6)52 (22.9)

172 (75.8)

125 (54.8)

69 (30.3)72 (31.6)51 (22.4)36 (15.8)

Coryright © 2005 John Wiley & Sons, Lt. 14;186-201(2005)

Page 7: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

192 Zhang etal.

The diagnosis of ADHD and comorbid diagnoseswere assessed by clinical interview and confirmedusing the KSADS-PL semi-structured interview. Inaddition to the diagnosis of ADHD, 45.5% of thepatients also had a diagnosis of oppositional defiantdisorder, 5.6% had conduct disorder, and 1.5% haddepression.

Inter-rater reliability

Rater training data from 41 raters were collected foranalysis prior to the start ofthe trial. The Kappa statis-tics for the first and second tapes were 0.58 and 0.63,respectively. The average squared deviations from themode for the first and second tapes for the rating scalewere 0.38 and 0.30, respectively. Agreement was simi-lar for inattention and hyperactive/impulsive items.Approximately 80% of the raters independently gave atotal score in the range of 33 through 39,

Factor structure

The first three eigenvalues In the solution were 5.99(55%), 2.80 (26%), and 0.80 (7%). Kaiser's criterionof an eigenvalue greater than 1 indicated that two fac-tors could be extracted for ADHDRS-PI. Table 2shows the structure matrix for the two-factor solution.

The pattern of loadings also showed a clear hyperac-tivity-impulsivity factor and a clear inattention factor.Odd-numbered items (reflecting inattention) loadedon Factor 2 and even-numbered items (reflectinghyperactivity) loaded on Factor 1.

Internal consistencyCronbach's alpha for the ADHDRS-PI total score was0.795 based on Visit 1 data and 0.838 based on Visit 2data from the 604 enrolled patients (for the inatten-tion subscale, 0.724 for Visit 1 and 0.770 for Visit 2; forthe hyperactivity-impulsivity subscale, 0.825 for Visit1 and 0.848 for Visit 2). The item-to-total correlationsrange from 0.25 to 0.51 at Visit 1, and from 0.29 to0.57 at Visit 2.

Test-retest reliability

Test-retest assessments included the 565 patients whowere not taking stimulant medication for ADHD atVisit 1 and who had an efficacy measurement(ADHDRS-PI) at both Visits 1 and 2. Table 3 reportsthe ICC, the Pearson's correlation coefficients, themean changes and their associated confidence inter-vals for ADHDRS-PI total score, ADHDRS-PIinattentive subscale score, ADHDRS-PI hyperactive/

Table 2. Factor pattern matrix of a Varitnax rotation for the ADHD RS

Item* Item short description Factor 1 Factor 2 Communality

Hyperactivity100616180814041202

Inattentive071709110115031305

On the goRuns aboutDifficult waiting; turnInterruptsDifficult playinyBlurts out answersLeaves seatTalks excessivelyFidgets

No follow-throughForgetfulDifficult organizingAvoids tasksClose attentionEasily distractedSustaining attentionLoses thingsDoes not listen

0.710.680.630.620,610,540,530,510.46

0.040.060.000,040,050.130,140,100,27

0,020,090.090.080,160.130.060.040.12

0.600.570.560.540.520.480.440.420.35

0.500.480.410.390.390.310.290.260.22

0.360.330.310.290.270.250.220.190.19

Copyright © 2005 John Wiley & Sons, Ltd. 14; 186-201(2005)

Page 8: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scale IV 193

Table 3. Test-retest reliability of ADHDRS-PI and CGI-ADHD-S

Variable N ICC(LC!,UCI)*

Pearson's correlation(LC1,UCI)**

Mean difference

ADHDRS-PI Total scoreADHDRS-PI Hyper/imp SuhscaleADHDRS-PI Inattentive SubscaleCGl-ADHD-S score

565565565566

.84 (.82, .87)

.89 (.87, .90)

.77 (.74. .80)

.85 (.83, .87)

.85 (.82,

.89 (.87,

.78 (.74,

.86 (.83,

.87)

.90)

.81)

.88)

.09 (-.25,

.19 (-,03,-.09 (-,31,

.02 (-.01,

.44),40),12).06)

ICC: Intra-class coefficient from ANOVA model with tertu of PATIENT and VISIT; meiin difference: mean of changescore from Visits 1 to 2.LCI - lower bound of 95% confidence interval.UCI — upper hound of 95% confidence interval,* 95% confidence interval of ICC hased on Fisher's Z transformation.** 95% confidence interval of Pearson's correlation between Visit 1 and Visit 2 scores. Based on Fisher's Z transfortnation,*** 95% confidence interval of mean difference hased on normal distrihution.

impulsive subscale score, and CGI-ADHD-S score.The correlation coefficients for the ADHDRS-PI scaleand the CCI-ADHD-S scale were both high (rangefrom 0.78 to 0.89), and the mean differences in scoresfrom Visits 1 to 2 were both very low (range from-0.09 to 0.19).

Figure 1 plots the difference of ADHD RS totalscore between Visits 1 and 2 against the average of

ADHDRS-PI total scores at those 2 visits. The meandifference was -0.14, and the limits of agreement were-8.9 to 8.6.

Convergent and divergent validityPearson's correlation coefficients between ADHDRS-PI scores and scores from <.)ther scales thought tomeasure the same construct for both baseline and

20

I 10

uj -5-

-10

-15-

-20

-2510 15 20 25 30 35 40 45 50

AOHDRS-PI Total SCORE (average of Visit 1 and 2)55 60

Figure 1. ADHDRS-PI: difference vs, average of total scores measured at Visits 1 and 2 with 93% limits of agreement.

Copyrishc © 2005 John Wiley &, Snns, Ltd, 14:186-201 (2005)

Page 9: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

194 Zhang etal.

change-from-haseline-tO'endpoint measurements arepresented in Table 4. Correlations between tbeADHDRS-PI scores and other parent (CPRS) andclinician (CGI-ADHD-S) measures of ADHD symp-tom severity for both baseline and change score weremoderate to high. Correlations with teacher-ratedmeasures were low to moderate. AU correlations werestatistically different from 0.

Pearson's correlation coefficients betweenADHDRS'PI scores and scores from other scalesthought to measure a different construct at baseline, aswell as the correlations between scales thought tomeasure the same set of symptoms, are presented in

Table 5. As expected, the ADHDRS-PI inattentivesubscale had a much lower correlation with otherhyperactivity subscales than with assessments of othercognitive subscales. The corresponding trend wasobserved for the ADHDRS-PI hyperactive/impulsivesubscale.

Discriminant validity

The ADHDRS-PI total score and subscale scores forpatients with each ADHD subtype were summarized atbaseline and reported in Figure 2. A statistically signif-icant difference (P < 0.001) was noted betweeninattentive subtype patients and hyperactive/impul-

Table 4. Convergent v;i!idity of ADHDRS-Pl based on score at baseline and change score from baseline to enJpoint

Variable I

ADHDRS-PI Total

ADHDRS-PIInattentiveADHDRS-PIHyper/imp

Variable 2

CGI-ADHD-SCPRS ADHD IndexCTRS ADHD IndexCPRS CognitiveCTRS CognitiveCPRS HyperactiveCTRS Hyperactive

N

604602535602529603536

Baseline

Correlation^

0.560.490.210.530.150.730.36

(LCI^ UCF)

(0.50,0.61)(0.42,0.54)(0.13,0.29)(0.47,0.58)(0.07,0.24)(0.69,0.76)(0.28,0.43)

N

603566457566452572461

Change

Correlation-'

0.770.710.180.670.140.720.21

(0.74,0.80)(0.67,0.75)(0.09,0.27)(0.63,0.72)(0.05,0.23)(0.68,0.76)(0.12,0.29)

''Correlation is a.ssessed by Pearson's correlation coefficient between variable 1 and variable 2.^LCI - lower bound of 95% confidence interval based on Fisber's Z transformation.

- upper bound of 95% confidence interval based on Fisber's Z tran.sforniation.

Table 5. Correlations between the ADHDRS-PI subscales and corresponding CPRS and CTRS subscales atbaseline

Variable 1

ADHDRS-PI

Inattentive subscaleHyper/imp subscale

Cognitivesubscale

correlation(LCI^ UCI'

N = 602

0.53(0.47,0.0.06 (-0.02,0.

CPRS

58) 014) 0

Variable 2

Hyperactivesubscale

correlation^(LCI^UC^)

N = 6O3

Cognitivesubscale

correlation^

N = 529

.15(0.07,0.23) 0.15(0.07,0.

.73(0.69,0.76) -0.09 (-0.17,-0.

CTRS

,

24)01)

Hyp/impulsivesubscale

correlation^(LCI^ UCI^)

N=536

0.02 (-0.07,0.10)0.36(0.28,0.43)

^Correlation is assessed by Pearson's correlation coefficient between variable I and variable 2.- lower bound of 95% confidence interval based on Fisher's Z transformation.- upper bound ot 95% confidence interval based on Fisher's Z transformation.

Copyright © Z005 lohn Wiley Ck Sons, Ltd. 14:186-201(2005)

Page 10: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scale IV 195

ow

3

Figure

CGI-Scverity

2. ADHDRS-PI baseline total and subscale scores by subtype.

sive subtype patients on both subscales of theADHDRS'PI. In addition, statistically significant dif-ferences were noted between the inattentive-subtypepatients and combined-subtype patients on theADHDRS-PI hyperactive/impulsive subscaie, andbetween the hyperactive/impulsive-subtype andcombined-subtype patients on the ADHDRS-PIinattention suKscale.

Figure 3 summarizes the mean ADHDRS-PI rotalscores hy CGI-ADHD-S score at endpoint. Analyses ofvariance (ANOVA) on endpoint data indicate statisti-cally significant differences in mean ADHDRS-PI totalscores between each CGI-ADHD-S level (results notshown). In the total sample of patients, CGI-ADHD-Sscores of'minimally ill', 'mildly ill', 'moderately ill', and'markedly ill' ccirresponded to mean ADHDRS-PI totalt-score 52.3, 60.0, 68.6, and 78.1, respectively. In apopulation sample, a t-score of 50 represents the meanraw score for a child of a given age and gender, andeach change o( 10 points in t-score corre.sponding to 1SD from the mean for each patient. In this study, nor-mative data based on a sample of 2000 US childrenhave been used to compute t-scores (transfontiations ofraw scores based on normative data).

The Pearson correlation coefficients for total scoresat baseline between ADHDRS-PI total score were

-0.05 (P = 0.183) for CDRS, 0.016 (P = 0.709) forCDI, and 0.002 (P = 0.956) for MASC. Thus, correla-tions between the ADHDRS-PI total score andmeasurements of comorbid disease severity scores werevery low and were not statistically significant.

ResponsivenessFor patients who received at least one dose of atomoxe-tine, the mean baseline, mean change from baseline toendpoint, standard deviation in change scores, and thestandardized response mean (SRM) for ADHDRS-PI,CGI'ADHD-S, GPRS, and GTRS are presented inTable 6. The SRM produced hy the ADHDRS-PIscores was similar or numerically higher than the SRMusing other parent and clinician measures of ADHD.All scales demonstrated a statistically significantchange from baseline. SRM for ADHDRS-PI was alsocalculated for each ADHD subrype, and is included inTable 6. The scores were consistent across 3 subtypes,with the hyperactive/impulsive subtype having aslightly lower score, which might be due, in part, to themuch smaller sample size. In addition, patients weredivided into 3 groups according to their baselineADHD RS total score, and the SRM for ADHDRS-PIwas computed for each subgroup. The results are as fol-lows: for Group 1 (ADHD RS total score (:s36), SRM =

Copyright © 2005 John Wiley & Sons, Ltd. 14: 186-201 (2005)

Page 11: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

196 Zhang etal.

50

Hyp Iiu Tot

AU Patients

N=604

Hyp Ilia Tot

Hype ryimp Subtype

N = 3 0

Hyp Ina Tot

Inattentive Subtype

N=124

Hyp Ina Tot

CombiDed Subtype

N = 450

Hyp = Hyperactive/Impulsive Subscaie scoreIna = Inattentive Subscale scoreTot= Total score

Figure 3. Mean ADHDRS-PI total scores by CGl-ADHD-severity score.

Table 6. Responsiveness of ADHDRS-PI hased on change score frotn baseline to endpoint in the atotitoxetine group

Baseline Change

Population Variable

AOHDRS'PI totalADHDRS-PI totalADHDRS-PI totalADHDRS-PI totalCGI-ADHD-SCPRS ADHD indexCTRS ADHD itidex

N

60344912430

603572458

Mean

41.3243.5234.8934.90

5.2128.3724.04

SD

7.856.807.336.400.795.708.02

Mean

-23.37-24.57-20.54-17.13

-2.65-12.83-5.98

SD

12.6812.9011.4410.64

1.468.917.70

P Value*

<.OO1<.OO1<.OO1<.OO1

<.OO1

<.OO1<.OO1

SRM**

1.841.901.801.611.821.440.78

All patientsMixed subtypeInatt. subtypeHyper, .subtypeAll patietitsAll patiencsAll patients

*Within-treattnent group P values ;ire frotn Wilcoxon's Signed Rank Test.**SRM (Standardized Response Mean) is cotitputed as the tnean change from baseline to endpoint divided by tbe stan-dard deviation of cbange from baseline to endpoint scores.

Copyright © 2005 John Wiley & Sons, Ltd. 14: 186-201 (2005)

Page 12: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scak IV 197

1.67; for Group 2 (ADHD RS total score of 37 to 44),SRM = 1.96; and for Group 3 (ADHD RS total score(fi45), SRM = 2.21. It seems that SRM is affected hyhaseline severity; the higher the haselinc ADHD RStotal score, the greater the change from baseline toendpoint (P< 0.001).

Minimal clinically important differences

The within-treattnent MGID for ADHD RS totalscore is 10.2 (or a 27% decrease from baseline). Thehetween-treattnent MCID is 6.6 (or a 19% decreasefrotn haseline). If grouping the patients into approxi-mately three equal groups hased on their baselineADHD RS total score (Group 1: (s36,Group2: 37-44,Group 3: (^45), then the within-treatment MCID forrhe ADHD RS total score is 9.7 (or a 35% decreasefrom baseline) for Group 1, 9.6 (or a 24% decreasefrom baseline) for Group 2, and 11.4 (or a 23%

decrease from baseline) for Group 3. The between-treatment MCID for the ADHD RS total score is 7.7for Group 1, 7.6 for Group 2, and 5.2 for Group 3.

Ceiling and floor effects

The frequency of the highest and lowest possiblescores is reported at both baseline and endpoint (Table7). The percentage of patients achieving the highestand lowest possible scores was very small at both base-line and endpoint (less than 7%).

Validation ofADHDRS'P! hy countryThis was a multicountry study so the psychometricproperties of ADHDRS-PI were also investigated forindividual countries. Table 8 summarizes the key results.

Across all 14 countries, tbe range of Cronbach'salpha for the ADHDRS-PI total score at Visit 1 was0.718 to 0.844. Intra-class correlation coefficients

Table 7. Frequency of best and worst possible scote.s for ADHDRS-Pi at baseline and endpoint

Frequency

ADHDRS-PI totalADHDRS-PI inattentiveADHDRS-PI hyper/imp

Best scoren(%)

0(0.0)0 (0.0)1 (0.16)

Baseline

Worst scoren (%)

12(1.86)44 (6.82)42(6.51)

Best scoren(%)

5 (0.78)16(2.48)43 (6.67)

Endpoint

Wor5t scoren(%)

0 (0.0)1 (0.16)1 (0.16)

Table 8. Psychotnetric properties of ADHDRS-PI by country

Country

AustraliaBelgiumGermanySpainFranceUKHungaryIsraelItalyHollandNorwayP<ilandSwedenSouth Africa

Copyright © 2005 John Wil

N

3836263635297734313124623768

t-'Y & Sons, Ltd.

Internal Consistency(Cronhach's Alpba)

0.8440.8150.7180.7650.7360.7880.7410.8150.7180.7980.7310.7390.7560.815

Test-retesr reliahility(ICC)

0.8170.8900.8990.8650.7600.7330.7680.8710.8040.7310.9460.8680.9400.800

Responsiveness(SRM)

1.531.311.732.421.822.922.681.152.371.331.742.033.152.02

14; 186-201(2005)

Page 13: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

198 Zhangeld.

between ADHDRS-PI total .score for Visits 1 and 2ranged from 0.731 to 0,946, respectively. The SRMproduced for the ADHDRS-PI total scores had a mini-mum of 1.15 (Israel) and a maximum of 3.15(Sweden). Positive correlations were found betweenADHDRS-PI ana other ADHD scales (CGl-ADHD-Sand CPRS) for all 14 countries with the exception otNorway, where the correlation between ADHDRS-PItotal score and CPRS ADHD Index subscale score was-0.08 at baseline. The number of patients with dhyperactive/impulsive subtype was too small to helpassess discriminant validity within each country, a sta-tistically significant difference (P <0.05) was notedbetween inattentive-suhtype patients and cotnbinedsubtype patients on the ADHDRS-PI hyperactive/impulsive suhscale. Correlations with the ADHDRS-PI inattentive subscale were much lower with CPRSHyperactivity suhscales than with assessments ofCPRS cognitive subscales, and vice versa for theADHDRS-PI hyperactive/impulsive subscaie (resultsnot shown here).

DiscussionIn this study, the psychometric properties of theADHD RS-PI, clinician administered and scored, wereassessed in a ^roup ot over 600 patients from Europe(about 500 patients), Australia, Israel, and SouthAfrica (106 patients) with a diagnosis of ADHD.

Generally, a Kappa statistic greater than 0.6 sug-gested good agreement, and a larger value indicatedgreater strength of agreement (Landis and Koch,1977). There are no published guidelines that definean acceptahle level of avctagc squared deviation toassess inter-rater reliability. However, average squareddeviation trom a similar training tape for a US trialusing ADHDRS-PI was 0.28 (Fades, 2001). Theseresults suggested that the inter-rater reliability ofADHDRS-PI (Kappa = 0.63, average squared devia-tion = 030) is satisfactory for a tniilticentre clinicaltrial.

The results of exploratory factor analysis of the scaleindicate that a two-factor solution would best repre-sent the structure of this scale, which is also consistentwith the DSM-IV two-dimensional diagnostic criteria.

To assess ititernal consistency, the literature suggeststhat 0.65 to 0.70 is an acceptable mitiifnal statidard{Nunnally, 1994; Perrin et al., 1997). The higherCronbach's alpha, the greater the internal consistency.A very low value indicates that the rating scale is

either too short or the items included in the scale havevery little in common. Conversely, a very high valuesuggests some redundancy in the scale. Therefore, theinternal consistency of the ADHDRS-PI scale(Cronbach's alpha = 0.795 at Visit 1 and 0.838 at Visit2) is satisfied and is not sufficiently high to suggestredundancy in items high enough to suggest accept-able internal consistency. The modest item-to-totalcorrelation also indicated that there were no itemswithin the scale that showed either a non-acceptahlecorrelation, or high colinearity. Note that Cronbach'salpha is a little higher at Visit 2 compared with Visit I,which is consistent with the previous finding that theCronbach's alpha value increases once raters havetimre experience with the scale. Also noted that thisCronbach's alpha is slightly lower than the target(0.90) suggested for use of the scale tor individualrather than gmup cotnparisotis (Perrin et al,, 1997).

In this study, ICCs tor ADHDRS-PI total and sub-scale scores range from 0.773 to 0.887, respectively,(see Table 2). Landis and Koch (1977) suggested thatICCs above 0.60 indicate satisfactory test-retest relia-bility and ICCs greater than 0.80 are excellent. Thescore of 0.773 to 0.887 indicated excellent test-retestreliability of ADHDRS-PI. Bland and Altman plotalso showed a good agreement between scoresobtained at two different time points, which suggestedthere is no potential systematic diffetence.

In genetal, cortelations between the ADHDRS-PIand other measures of the same set of ADHD symp-toms were high except for ct>rrelations with CTRS,especially for the ADHDRS-PI Inattentive suhscaleand CTRS cognitive subscale {see Table 3). At thesame time, correlations between scales thought tomeasure different symptom groups were low. Theseresults suggest adequate content validity for the scale.The low correlation between ADHDRS-PI and CTRSis consistent with other researches comparing parentand teacher scales (Faries, 2001). This may he due tomultiple factors, including the fact that the scales arenot cotTipleted at the same time (due to teacher sched-ules), there is limited contact or time to developrelationships between teacher and child, and theCTRS cognitive subscale assesses a slightly broader setof symptoms (academic performatice) than just theADHD symptt)tTi list. Previous studies also suggest thatthere is a low correlation between teachers' ratings andratitigs made by trained classroom observers using theCTRS (Conger et al., 1983; Kazdin ct al., 1983). This

Copyright © 2005 John Wiley & Sons. Ltd, 14:186-201(2005)

Page 14: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scale ZV 199

may be in part due to the observation that teacherstend to put excessive emphasis on children's acadetnicexcellence (Lessinget al., 1974)-

There were statistically significant differences (P <0.001) among different ADHD-subtype patients (seeFigure I) indicating that the ADHDRS-PI discritni-nates between patients whose diagnosis suggested thepresence of clinically significant hyperactive/impulsiveor inattentive symptoms and those whose diagnosis didnot. The statistically significant differences inADHDRS-PI mean total scores between each CGI-Severity score suggest that ADHDRS-PI had theability to discriminate between groups of patients atdifferent severity levels (see Figure 2). These tworesults indicate the satisfactory discriminant validity ofthe ADHDRS-PI scale. Patients with ADHD showedlarge improvement in ADHDRS-PI total score overtime (SRM ranged from 1.61 to 1.90 for patients withdifferent ADHD subtype). There are no publishedguidelines that define an acceptable level of SRM.However, this result is comparable with CGl-ADHD-S (SRM ^ 1.82), which is a well-accepted clinicianrating scale. This is also consistent with the result(SRM = 1.21) from a US trial with a similar designusing the ADHDRS-PI (Faries, 2001). These resultsindicated acceptable tesponsiveness ofthe ADHDRS-PI scale. Note that the SRM score in this study isrelatively high. We think the reason is that mostpatients entered the study with severe ADHD symp-toms (mean ADHD RS total t-score 3 SD ahovenorms), and thus had bigger opportunity to change.The higher SRM for patients with a higher baselinescore supports our assumption.

Fot ADHDRS-PI total score, the percentages ofpatients given the best and worst possible scores werevery small (see Table 7), thus suggesting that ceilingand floor effects are not an issue for this scale in thispatient population.

The patient characteristics and baseline ADHDseverity scores in this study were very similar to thoseenrolled in a similar US study used to assess the valid-ity of ADHDRS-PI (Fades, 2001). Tahle 1 sumtnatiiesthe patient baseline characteristics for both studies.Moreover, the psychometric properties of the scale forbotb patient populations were similar. This sitnilarityallows us to address the interpretation tif data from USstudies relative to other countries. The consistency ofthe results of this study with the results of the atomox-etine study conducted in tbe US suggests that

ADHDRS-PI can be successfully used to identifypatients with ADHD symptoms, and to detect thechange of symptom severity over time under treattnentin both the US and outside the US.

The ADHDRS-PI demonstrated acceptable inter-nal consistency across all 14 countries (minimumCronbach's alpha was 0.718, see Table 8). ADHDRS-PI was also found to bave acceptable levels oftest-retest reliability for ail 14 countries as ICCsranged from 0.731 to 0.946. Note correlations of atleast 0,60 are considered satisfactory while thosegreater than 0.80 are considered excellent (Landis andKoch, 1977). As with findings from the overall popu-lation, acceptable levels of convergent/divergentvalidity were found between ADHDRS-PI and CPRSfor all 14 countries. Acceptable discritninant validitywas observed within each country. The SRM producedfor the ADHDRS-PI rotal scores hy countries had theminimutn as 1.15, which indicated acceptable respon-siveness of ADHDRS-PI for all countries. Consistentresults of the psychometric properties across patientgrtiups from multiple countries in this study indicatethis rating scale is reasonable for multinational prac-tice.

There are several limitations to this study. It hasbeen noted that culture may play a role in ADHDassessment. Though Magnusson et al. (1999) showedthat the factor stmctures of this rating scale werehighly similar across cultures and the norm scores weresimilar to those found in American studies when ratedby parents, that study used an Icelandic version of thescale and only applied it to Icelandic schoolchildren.In this study, the English version of the rating scalewas utilized in countries where English was not thenative language and the US norms were applied topopulations outside the US. Thus, the ability of theclinicians to translate to non-English-speakingpatients was an additional source of variahility andresults may not extend to other or future translations.Nonetheless, Buitelaar et al. (2004) reported thatwhile there were several differences between the inter-national and Nortb American study populations, thetwo groups were very similar in most respects.

We also realized that because of the small samplesi2e in eacb country (604 participants spread over 14countries), it is difficult to interpret the validity of thisrating scale for each country.

In this study, children without ADHD symptomswere not recruited for ethical reasons. Therefore, a

Copyright © 2005 John Wiley tSi Sons. Ltd. 14; 186-201(2005)

Page 15: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

200 Zhang eial.

comparison hetween patients with ADHD and con-

trols was not possible.

Although discriminant validity could he assessed hy

cotnparing patients with a diagnosis of ADHD inat-

tentive suhtype and a diagnosis of ADHD comhined

suhtype (defined elsewhere in text), the inference was

sotnewhat limited. Additionally, there was neither a

placeho-control arm nor an active comparator

(methylphenidate) in this analysis. Therefore, we

could not assess responsiveness of the scale using effect

sizes based on treatment differences from placeho, nor

compare the standardized response mean from atom-

oxetine with another proven efficacious compound.

Another limitation is that tnore than 50% of

patients had previous drug therapy for ADHD, and we

speculate that the parents of those patients with previ-

ous medication might have significantly more

knowledge ahout ADHD than parents of children

without previous medication. Clinician-scored

ADHDRS-PI is hased on interviews with parents, so

there is potential hias toward either higher or lower

scores for patients with previous experience compared

with patient5 who were treatment naive. This aspect is

not covered in this paper and might be worth further

research.

In conclusion, our results support the validity and

reliability of the ADHD RS as a clinician-adminis-

tered and clinician-scored tool for assessing the

severity of ADHD symptoms in children and adoles-

cents under a research setting in Europe. The ADHD

RS as a clinician-rated tool could be used for screening

patients for baseline symptom severity to determine

whether they meet a threshcild condition for study par-

ticipation and for following the course ai symptom

change over a clinical trial.

AcknowledgementThis research was funded hy Eli Lilly artd Company.

ReferencesAllen AJ, Spencer TJ, Heihgenstein JH, Faries DE,

Kelsey DK, Laws HF, Wernicke J, Kendrick KL,Michelson D. Safety and efficacy of atotnoxetinefor ADHD in two double-blind, placebo-controlledtrials. Biological Psychiatry, Society of BiologicalPsychiatry 56th Annual Convention and ScientificProgtam, 3-5 May 2001; New Orleans LA. Dallas, TX:Elsevier.

Atnerican Psychiatric Association. Diagnostic andStatistical Manual of Mental Disorders: DSM-IV-TR.Wasbington DC: AmtTican Psycbiatric Press, 2000.

Anderson TW. An Introduction to Multivariate StatisticalAnalysis. 2 cdn. Beijing People's Republic of.China:John Wiley & Sons, 1990, pp. 120-5.

Barkley RA. Attention Deficit Hyperactivity Disorder: AHandbook for Diagnosis and Treattnent. New Yorlc:Guilford, 1990.

BaumKaertel A, Wolraicb ML, Dietrich M. Comparison ofdiagnostic criteria for attention deficit disorders in aGertnan eletiietit^ry school satnple. J Am AcaJ CbildAdolesc Psychiatry 1995; .34: 629-38.

Biederman J, Faraone SV, Monuteaux MC, Grosshard JR.How informative are parent reports of attention-deficit/hyperactivity disorder symptotns for assessingoutcome in clinical trials of long-acting treatments.' Apooled analysis o( parents' and teachers' reports.Pediatrics 2004; 113: 1667-71.

Bland JM, Altman DG. Statistical tnetbods for assessingagreement between two tnethods of clitiical measure-ment. Lancet 1986; 1:307-10.

Brown, RT, Freeman WS, Perrin JM, Stein MT, AmierRW, Feldman HM, Pierce K, Buitelaar JK, DanckaertsM, Gillherg C, Zuddas A, Becker K, Bouvard M, FaganJ, Gadoros J, Harpin V, Haiell P, Johnsson M, Lerman-Sagie T, Soutullo C, Wolanczyk T, Zeiner P, Foucbe DS,Krikke-Workel J, Zbang S, Micbelson. A prospective,muiticenter, open-label assessment of atomoxetine itition-North American cbildren and adolescents witbADHD. Eur Child Adolesc Psychiatry 2004; 13(4):249-57.

Catnpbell DT, Fiske DW. Convergent and discriminantvalidation by tbe multitrait-mtiltimetbod matrix.Psychological Bulletin. 1959; 56: 81-105.

Channon E, Butler A. Comparing investigators' use ofrating scales sucb as PANSS in muti-investigatorstudies of schizophrenia. Poster presentation at tbeEleventh European College ofNeuropsycbopbarmacology (ECNP) Congress, 1998.

Coben M, Becker MG, Campbell R. Relationships amongfour metbods of as.sesstnent of cbilJren witb attentiondeficit-byperactivity disorder. J Scbool Fsychol 1990;28: 189-202.

Conger AJ, Conger JC, Wallander j , Wark D, Dygdon J. Ageneralizability study of tbe Conners' Teacber RatingScale-Revised. Educ PsycbnlMeas 1983; 43: 1019-31.

Conners CK. Conners' Rating Scales: Revised TecbnicalManual. Nortb Towanda (New York): Multi-HealtbSystems, 1997.

Cronbacb LE Coefficient alpba and tbe internal structureof tests. Psycbometrika 1951; 16: 297-334.

Dcyo RA, Diehr P, Patrick KL. Reproducibility andresponsiveness of bealtb status treasures: Statistics andstratifies for evaluation. Control Cliti Trials 1991; 12:142S-158S.

DuPaul GJ, Power TJ, Anastopoulos AD, Reid R. ADHDRating Scale IV: checklists, norms, and clinical inter-pretation. New York: Guilford, 1998.

Copyright © 2005 John Wiley & Sons, Ltd. 14;186-201 (2005)

Page 16: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each

ADHD Rating Scale IV 201

Faraone SV, Biederman J. Neurohiology of attention-deficit hyperactivity distirJer. Biol Psychiatry 1998; 10:951-8.

Faries DE, Yalcin I, Harder D, Heiligenstein JH. Validationof the ADHD Rating Scale a.s a clinician administeredand scored instrument. J Atten Disord 2001; 5(2):107-15.

Fleiss JL. Measuring nominal scale agreement among manyraters. P.sychol Bull 1971; 76(5): 378-82.

Graetz BW, Sawyer MG, Hazell PL, Arney F, Baghurst P.Validity of DSM-IV ADHD subtypes in a nationallyrepresentative sample of Australian children andadolescents. J Am Acad Child Adolesc Psychiatry2001; 40: 1410-17,

Guy W. ECDEU a.ssessment manual for psychopharma-cology, revised. Bethesda (MD): US Department ofHealth, Education, atid Welfare, 1976.

Health Council of the Netherlands. Diagnosis and treat-ment of ADHD. Publication no. 2000/24, The Hague:Health Council of the Netherlands, 2000,

Herrmann C. International experiences with the HospitalAnxiety and Depression Scale - a review of validationdata and clinical results. J Psychosom Res 1997; 42:17-41.

Kaufman J, Birmaher B, Brent D, Rao U, Ryan N. Kiddie-SaJs-Present and Lifetime Version (K-SADS-PL),Version 1.0. Pittsburgh: Department of Psychiatry,University of Pittsburgh School of Medicine, 1996.

Kaidin AE, Esveldt-Dawson K, Loar LL. Correspondenceof teacher ratings and direct observations of classroomhehaviour of psychiatric inpatient children. J AbnormPsychol 1983; 10; 483-95.

Landis JR, Koch GG. The measurement of observer agree-rnent for categorical data. Biometrics 1977; 33: 159-74.

Lessing E, Oherlatider Ml, Barhera L. Convergent validityof the IPAT Children's Personality Questionnaire andteachers' ratings of the adjustment of elementary schoolcbtldren. Soc Behav Pers 1974; 2: 222-9.

Magnusson P, Smart J, Gretarsdottir H, Prandardottir H,Attention-Deficit/Hyperactivitity symptoms inIcelandic schoolchildren: assessment with theAttention Deficit/Hyperactivity Rating Scale-IV.Scand J P-sychol 1999; 40: 301-6.

Livingston R, Cultural issues iti Diagnosis and treatment ofADHD, J Am Acad Child Adolesc Psychiatry 1999;38(12): 1591-4.

Michelson D, Faries DE, Wernicke J, Kelsey DK, KendrickKL, Sallee FR, Spencer T, and the Atomoxetine ADHDStudy Group, Atomoxetine in the treatment of chil-dren and adolescents with ADHD: A randomized,placebo-controlled dose-response study. Pediatrics2001; 108(5): 1-9.

Michelson D, Buitelaar JK, Danckaerts MJ, Gillberg C,Spencer T, Zuddas A, Faries D, Zhang S, Biederman J.Relapse prevention in Patients with Attention-

Deficit/Hyperactivity disorder treated with atomoxe-tine: a randomized, double-blind, placebo-controlledstudy. J Am Acad Child Adolesc Psychiatry 2004;43(7): 896-904.

Mueller CW, Mann EM, Tbanapum S, Humris E, ikeda Y,Takahashi A, Tao KT, Li BL. Teachers' ratings of disrup-tive hehaviour in five countries. J Clin Child Psychol1995; 24(4): 434-42,

Nati<in;il Institute of Clinical Excellence. Guidance on theUse of Methylphenidate (Ritalin, Equasym) forAttention Deficit/Hyperactivity Disorder (ADHD) inChildhood. Technology Appraisal Guidance no. 13.London: NICE, 2000.

Nunnally, JC. Psychometric Theory. 3 edn. New York:McGraw-Hill, 1994.

Perrin ES, Aaronson NK, Alonso J, Burnatn A, Lohr K,Patrick KL. Instrument Review Criteria. MedicalOutcome.s Trust 1997: 1-5.

Pliszka SR, McCracken JT, Maas JW. Catecholamines inattention-deficit hyperactivity disorder: currentperspectives. J Am Acad Child Adolesc Psychiatry1996; 35(3): 264-72.

Prendergast M, Taylor E, Rapoport JL, Bartko J, DonnellyM, Zametkin A, Ahearn MB, Dunn G, Wieselberg HM.The diagnosis of childhood hyperactivity. A US-UKcross-national study of DSM-III and IGD-9. J ChildPsychol Psychiatry 1988; 29(3): 289-300.

Reid, R, Assessment of ADHD with culturally differentgroups: the use of behaviour rating scales. SchoolPsychology Review 1995; 24: 537-60.

Reid R, DuPaul Gj, Power TJ, Anastopoulos AD, Rogers-Adkinson D, Noll M, Riccio G. Assessing culturallydifferent students for attention deficit hyperactivitydisorder using behaviour rating scales. J Abnorm ChildPsychol 1998; 26(3): 187-98.

Spencer F, Biederman j , Heiligenstein J, Wilens T, FariesD, Prince J. An open-label, dose ranging study oftomoxetine in children with attention deficit hyperac-tivity disorder. J Ghild Adolesc Psychopharmacol 2001;11(3); 251-65.

Stratford PW, Binkley JM, Riddle DL. Health statusmeasures: strategies and analytic methods for assessingchange scores, PKys Ther 1996; 76(10): 109-23.

Stucki G, Michel BA, How to tneasure improvement: rulesand fallacies, Rheumatol Eur Supp 1995; 2: 107-11.

Correspondence: Shuyu Zhang, Lilly Research

Laboratories, EU Lilly and Company, Drop Code 6161.

Indianapolis, IN 46285.

Telephone (-¥1)317-2763455.

Fax (+1) 317-4336590.

Email: shuyu_zhan^lilly.com.

Copyright © 200S John Wiley & Sons, LtJ. 14:186-201(2005)

Page 17: ADHD Rating Scale IV: psychometric properties from …sbs5/Greinar_ADHD/adhd_rating_scale...ADHD Rating Scale IV: ... gent validity, discriminant validity, responsiveness, ... at each