This article was downloaded by: [71.189.131.183] On: 05 August 2013, At: 10:58 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK The Clinical Neuropsychologist Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ntcn20 Effectiveness of the Comalli Stroop Test as a Measure of Negative Response Bias Timothy J. Arentsen a , Kyle Brauer Boone b , Tracy T. Y. Lo c , Hope E. Goldberg d , Maria E. Cottingham e , Tara L. Victor f , Elizabeth Ziegler g & Michelle A. Zeller h a Fuller Graduate School of Psychology , Pasadena , CA , USA b California School of Forensic Studies, Alliant International University , Los Angeles , CA , USA c City of Hope Medical Center , Duarte , CA , USA d Olive View UCLA-Medical Center , Sylmar , CA , USA e Private Practice, Los Angeles , CA , USA f California State University Dominguez Hills , Carson , CA , USA g VA Medical Center Spokane, WA , Spokane , WA , USA h West Los Angeles VA , Los Angeles , CA , USA Published online: 07 Jun 2013. To cite this article: The Clinical Neuropsychologist (2013): Effectiveness of the Comalli Stroop Test as a Measure of Negative Response Bias, The Clinical Neuropsychologist, DOI: 10.1080/13854046.2013.803603 To link to this article: http://dx.doi.org/10.1080/13854046.2013.803603 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. 
The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content.



This article may be used for research, teaching, and private study purposes. Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions


Effectiveness of the Comalli Stroop Test as a Measure of Negative Response Bias

Timothy J. Arentsen1, Kyle Brauer Boone2, Tracy T. Y. Lo3, Hope E. Goldberg4, Maria E. Cottingham5, Tara L. Victor6, Elizabeth Ziegler7 and Michelle A. Zeller8

1 Fuller Graduate School of Psychology, Pasadena, CA, USA
2 California School of Forensic Studies, Alliant International University, Los Angeles, CA, USA
3 City of Hope Medical Center, Duarte, CA, USA
4 Olive View UCLA-Medical Center, Sylmar, CA, USA
5 Private Practice, Los Angeles, CA, USA
6 California State University Dominguez Hills, Carson, CA, USA
7 VA Medical Center Spokane, WA, Spokane, WA, USA
8 West Los Angeles VA, Los Angeles, CA, USA

Practice guidelines recommend the use of multiple performance validity tests (PVTs) to detect noncredible performance during neuropsychological evaluations, and PVTs embedded in standard cognitive tests achieve this goal most efficiently. The present study examined the utility of the Comalli version of the Stroop Test as a measure of response bias in a large sample of “real world” noncredible patients (n = 129) as compared with credible neuropsychology clinic patients (n = 233). The credible group performed significantly better than the noncredible group on all trials, but particularly on word-reading (Stroop A) and color-naming (Stroop B); cut-scores for Stroop A and Stroop B trials were associated with moderate sensitivity (49–53%) as compared to the low sensitivity found for the color interference trial (29%). Some types of diagnoses (including learning disability, severe traumatic brain injury, psychosis, and depression), very advanced age (≥80), and lowered IQ were associated with increased rates of false positive identifications, suggesting the need for some adjustments to cut-offs in these subgroups. Despite some previous reports of an inverted Stroop effect (i.e., color-naming worse than color interference) in noncredible subjects, individual Stroop word reading and color naming trials were much more effective in identifying response bias.

Keywords: Stroop; Malingering/Symptom validity testing; Forensic neuropsychology.

INTRODUCTION

In order to avoid increasing test battery administration time and to reduce the risk of coaching, researchers have encouraged the use of embedded indices of response bias derived from standard cognitive tests in the assessment of noncredible test performance (Boone, 2013; Bortnik et al., 2010; Erdal, 2004; Iverson & Binder, 2000; M. Kim et al., 2010). Use of multiple, effective, and independent performance validity tests (PVTs) can enhance a clinician’s ability to detect noncredible test taking as negative response bias may fluctuate across time (Boone, 2009; Boone, 2013; Bush et al., 2005; Sweet & Nelson, 2007), and failures on multiple uncorrelated performance validity indicators increase the likelihood that a test taker was not performing to true ability (Boone, 2013; Nelson et al., 2003; Slick, Sherman, & Iverson, 1999). The current study examined the use of a verbal processing speed and inhibition task, the Comalli version of the Stroop Test, as a performance validity indicator.

Address correspondence to: Timothy J. Arentsen, Fuller Graduate School of Psychology, 135 N. Oakland Ave, Pasadena, CA, USA. Email: [email protected]

Accepted for publication 7 March 2013. First published online 5 June 2013

The Clinical Neuropsychologist, 2013
http://dx.doi.org/10.1080/13854046.2013.803603

© 2013 Taylor & Francis

Multiple versions of the Stroop Test exist, which complicates describing the task and synthesizing the research conducted on it (see Mitrushina, Boone, Razani, & D’Elia, 2005). Trials typically include a word-reading task (Stroop A), a color-naming task (Stroop B), and an interference task in which the participant must name the ink color of a discrepant word (e.g., the word “red” is printed in blue ink; Stroop C). Stroop A and B measure verbal processing speed, while Stroop C assesses the test taker’s ability to inhibit over-learned behaviors (e.g., the tendency to “automatically” read words with little consideration as to the colors they are printed in). Subjects typically require up to twice as long to complete the color interference task as compared to the word-reading and color-naming tasks due to the active inhibition necessary for execution of this task (see Mitrushina et al., 2005).

Little research exists on the ability of the Stroop Test to detect noncredible test performances despite the fact that timed response measures have been repeatedly shown to be sensitive to negative response bias (Arnold et al., 2005; Babikian, Boone, Lu, & Arnold, 2006; Boone, Lu, & Herzberg, 2002b; Boone, Lu, & Herzberg, 2002c; M. Kim et al., 2010; N. Kim et al., 2010).

The few known-groups and simulation studies on the utility of the Stroop Test as a measure of response bias have documented that noncredible subjects perform worse than do credible patients (Schmand et al., 1998; van Gorp et al., 1999; Vickery et al., 2004). However, sensitivity, specificity, and cut-off scores have generally not been reported, limiting the clinical utility of these data. In one exception, Backhaus, Fichtenberg, and Hanks (2004) showed that use of a cut-off at the 50th percentile for a moderate to severe post-acute traumatic brain injury (TBI) group resulted in 92% specificity in nonlitigating mild brain injury patients and 88% sensitivity in mild traumatic injury litigants who met Slick et al. (1999) criteria for probable/definite malingered neurocognitive dysfunction. Interestingly, some studies have demonstrated an inverted Stroop effect in which noncredible subjects show prolonged times on word reading and/or color naming as compared to performance when naming the incongruent color word (Egeland & Langfjaeran, 2007; Osimani, Alon, Berger, & Abarbanel, 1997). However, when presence of an inverted Stroop effect was used to classify subjects, specificity was poor (61% sensitivity but only 59% specificity; Egeland & Langfjaeran, 2007).

Some promising experimental adaptations of the Stroop Test for detection of feigned brain injury and post-traumatic stress disorder (PTSD) have been reported. For example, brain injury simulators scored more poorly when the words to be “not read” on the interference trial involved deception (cheat, fake, lie; Cannon, 2003). Others observed that patients with actual PTSD showed a differential slowing on the interference task if the words involved PTSD-related threat content, whereas simulators of PTSD displayed equal slowing on threat and neutral content words (Buckley, Galovski, Blanchard, & Hickling, 2003), though caution has been advised in generalizing these findings (see Thomas & Fremouw, 2009). Further, these paradigms have not been validated in “real world” settings.


A unique role for the Stroop Test has been identified in detection of individuals feigning severe reading disability. Specifically, errors of reading on Stroop C (i.e., the word is “read” rather than the color ink named) are pathognomonic for malingering in those falsely claiming a complete inability to read (i.e., indicate they cannot read any words on Stroop A; Lu, Boone, Jimenez, & Razani, 2004). However, admittedly, claimed total inability to read is a rare presenting complaint, thus limiting this use of the Stroop Test as a validity indicator.

In summary, only limited research has focused on the potential of the Stroop Test to identify negative response bias, and many of the studies have employed small samples (e.g., Backhaus et al., 2004; Lu et al., 2004; Osimani et al., 1997; van Gorp et al., 1999; Vickery et al., 2004), simulators (e.g., Osimani et al., 1997; Vickery et al., 2004), or questionable methods for identifying negative response bias (e.g., definition of noncredible as failure on a single PVT, Egeland & Langfjaeran, 2007).

The purpose of the present study was to examine the effectiveness of the word-reading, color-naming, and color interference trials of the Comalli Stroop Test (Comalli, Wapner, & Werner, 1962) as a measure of negative response bias in a large known-groups sample meeting stringent criteria for group assignment.

METHOD

Participants

All participants were referred for neuropsychological assessment to the Olive View UCLA Medical Center Neuropsychology Service, the Harbor-UCLA Medical Center Outpatient Neuropsychology Service, or the private practice of the second author.1 All participants were fluent in English. IRB approval to examine archival data was obtained from the hospital-affiliated research institutes (Los Angeles Biomedical Institute and Olive View-UCLA Medical Center Educational and Research Institute). Criteria for inclusion and exclusion within credible and noncredible groups are described below. Assignment to groups included information on whether PVTs were passed or failed. As the neuropsychological battery has adapted over time, not all participants were administered all 11 of the PVTs, especially the earlier cases. Therefore, only participants with data for ≥3 PVTs were included.

Credible group. The 233 credible patients (108 males, 125 females) failed ≤1 independent neurocognitive PVTs (tests and cut-offs listed in Table 1; failures on multiple scores from a single test were counted as a single failure) and were not in litigation or applying for disability services. Patients who failed one PVT were retained in the sample because research shows that failure on a single indicator among several is not unusual in credible populations (Boone, 2013; Victor, Boone, Serpa, Buehler, & Ziegler, 2009). Subjects with FSIQ <70 and diagnoses of dementia were excluded given evidence that these populations fail PVTs at a high rate despite performing to true ability (Dean, Victor, Boone, & Arnold, 2008; Dean, Victor, Boone, Philpott, & Hess, 2009). Demographic information is shown in Table 2, and final diagnoses (incorporating test results) are listed in Table 3.


Table 1. Performance validity tests used for group assignment

(1) Rey 15 plus Recognition (Boone et al., 2002)
    (a) Combination score <20

(2) Dot Counting Test (Boone et al., 2002a)
    (a) E-score ≥17

(3) b Test (Boone et al., 2002b)
    (a) E-score ≥155

(4) WAIS III Digit Symbol Coding (N. Kim et al., 2010)
    (a) Combination equation ≤57

(5) WAIS III or IV Digit Span (Babikian et al., 2006)
    (a) Age-Corrected Scaled Score for WAIS III ≤5, OR
    (b) Reliable Digit Span ≤6, OR
    (c) Average time to repeat 3 digits forward ≥3 seconds

(6) Rey Word Recognition (Nitch et al., 2006)
    (a) Total recognition (without subtracting false positives) for men ≤5, OR
    (b) Total recognition (without subtracting false positives) for women ≤7, OR
    (c) Combination equation ≤9

(7) Rey Auditory Verbal Learning Test (RAVLT) Effort Equation (Boone, Lu, & Wen, 2005) and Rey-Osterrieth Complex Figure Test (RCFT)/RAVLT Discriminant Function (Sherman, Boone, Lu, & Razani, 2002)
    (a) Effort Equation Score ≤12, OR
    (b) RCFT/RAVLT discriminant function ≤–0.40

(8) Finger Tapping dominant hand (Arnold et al., 2005)
    (a) ≤35 for men, OR
    (b) ≤28 for women

(9) Rey-Osterrieth Complex Figure Effort Equation (Lu et al., 2003)
    (a) Combination score <47

(10) Warrington Recognition Memory Test – Words (M. Kim et al., 2010)
    (a) Total score ≤42, OR
    (b) Total time to complete ≥207 seconds

(Continued)


Noncredible group. A total of 129 patients (80 males, 49 females) met Slick et al.’s (1999) criteria for probable malingered neurocognitive dysfunction. All were seeking disability compensation or were in litigation for claimed medical or psychiatric disorders, and failed ≥2 independent neurocognitive PVTs (tests and cut-offs are listed in Table 1). Per Slick et al. (1999) criterion D, it was critical to identify and exclude from this group any truly low functioning individuals who failed the PVTs due to actual neurologic or developmental conditions (e.g., with actual dementia or FSIQ <70). However, IQ data and memory scores could not be used for this purpose given research showing that noncredible subjects obtain spuriously lowered neurocognitive scores (e.g., Bortnik et al., 2010; Demakis et al., 2001); that is, neurocognitive scores are not accurate indicators of function in this population. Therefore, clinical judgments by experienced neuropsychologists (K.B.B., H.E.G., and M.E.C.) were used to determine whether low neuropsychological scores were inconsistent with evidence of normal function in activities of daily living (e.g., substantially impaired memory or other scores yet subjects lived independently, drove, handled finances, etc.). Demographic information for this noncredible group is contained in Table 2, and claimed/presenting diagnoses are listed in Table 3.
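The group-assignment rules described above can be summarized in a short Python sketch. This is only an illustration of the stated criteria, not the authors' procedure: the function and parameter names are hypothetical, and the clinical judgment applied to the noncredible group is reduced here to a single boolean supplied by the caller.

```python
from typing import Optional

# Participants needed data for at least 3 PVTs to be included.
MIN_PVTS_ADMINISTERED = 3


def count_failed_tests(failed_scores):
    """Count distinct tests failed.

    `failed_scores` is a list of (test_name, score_name) pairs. Several
    failed scores from one test count as a single failure.
    """
    return len({test_name for test_name, _score_name in failed_scores})


def assign_group(pvts_administered, pvts_failed, compensation_seeking,
                 fsiq_below_70_or_dementia,
                 judged_truly_low_functioning=False) -> Optional[str]:
    """Return 'credible', 'noncredible', or None (not included in either group).

    All argument names are hypothetical labels for the criteria in the text.
    """
    if pvts_administered < MIN_PVTS_ADMINISTERED:
        return None
    if pvts_failed <= 1:
        # Credible: no litigation or disability claim, no FSIQ <70, no dementia.
        if compensation_seeking or fsiq_below_70_or_dementia:
            return None
        return "credible"
    # Two or more failures: noncredible only with an external incentive and
    # no clinical judgment that the person is truly low functioning.
    if compensation_seeking and not judged_truly_low_functioning:
        return "noncredible"
    return None
```

For example, `assign_group(5, 3, True, False)` returns `"noncredible"`, whereas a patient with one failure who is in litigation falls into neither group.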

Procedures

The Comalli version (Comalli et al., 1962) of the Stroop Color/Word Test was administered as part of a larger clinical neuropsychological test battery (test stimuli can be obtained from the second author). This version consists of three trials (Word-Reading – A, Color-Naming – B, and Interference – C), each having 100 items (words or color blocks) with three possible response options (red/blue/green). For the first trial, Stroop A, patients were instructed to read the color names printed in black ink as quickly as possible. In the second trial, Stroop B, patients named 100 colored blocks as quickly as possible. Stroop C was the third trial and required participants to rapidly name the ink color of color names that were incongruent. Each trial was timed. Scores used for analysis were the number of seconds required to complete each trial. Error data were not consistently recorded and were not available for analysis.

Due to previous reports of an inverted Stroop effect in noncredible subjects, four previously published methods (two examined in noncredible subjects) were employed to identify whether performance on Stroop C was better or worse than that expected given performance on Stroop A and/or B: Equation 1 = C – B (Comalli et al., 1962; Golden, 1978); Equation 2 = C – (A + B) (Golden, 1978); Equation 3 = C – [(A × B)/(A + B)] (Golden, 1978); and Equation 4 = C – {[(216 – A) × B]/[(216 – A) + B]} in which trial scores were transformed by: 4500/Time (Chafetz & Matthews, 2004). These various equations theoretically produce a more pure estimate of the Stroop effect than Stroop C alone by accounting for the time it takes to name and/or read the colors. Negative scores on Equations 1–3 and positive scores on Equation 4 (reflecting faster scores for C relative to A and/or B) were judged to reflect an inverted Stroop effect.

Table 1. Continued

(11) Test of Memory Malingering (Tombaugh, 1997)
    (a) Trial 2 <45, OR
    (b) Retention Trial <45

Table 2. Group comparisons for demographic variables and Stroop scores

                             Credible (n = 233)   Noncredible (n = 129)   t          d
Age                          43.27 ± 14.61        43.62 ± 11.10           –0.24      0.03
Education (years)            13.32 ± 2.72         12.70 ± 4.37            1.67       0.17
Stroop A (seconds)           50.82 ± 11.01        74.14 ± 33.89           –9.63***   1.04
Stroop B (seconds)           71.70 ± 16.19        94.76 ± 32.98           –8.89***   0.94
Stroop C (seconds)           138.70 ± 40.33       178.08 ± 61.15          –7.21***   0.78
Stroop Effect Equation 1     70.80 ± 65.61        85.36 ± 49.63           –2.14*     0.25
Stroop Effect Equation 2     19.98 ± 64.71        12.97 ± 52.09           1.03       0.12
Stroop Effect Equation 3     112.94 ± 67.50       144.82 ± 92.21          –3.70***   0.40
Stroop Effect Equation 4     –7.31 ± 8.46         –9.60 ± 6.44            2.6*       0.30

                             n (%)                n (%)
Ethnicity
  Caucasian                  112 (48.07)          49 (37.98)
  Hispanic                   45 (19.31)           24 (18.60)
  Asian American             16 (6.87)            6 (4.65)
  African American           30 (12.88)           43 (33.33)
  Other                      30 (12.88)           7 (5.43)
English as second language   55 (23.71)           25 (19.53)

Note: Stroop A = Word Reading trial; Stroop B = Color Naming trial; Stroop C = Interference trial. Stroop Effect Equation 1 = C – B; Stroop Effect Equation 2 = C – (A + B); Stroop Effect Equation 3 = C – [(A × B)/(A + B)]; Stroop Effect Equation 4 = C – {[(216 – A) × B]/[(216 – A) + B]} with trial scores transformed by (4500/time).
*p < .05; **p < .01; ***p < .001.
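The four Stroop effect equations and the inverted-effect rule described above can be sketched in Python as follows. This is a minimal illustration, not the authors' scoring code: the names `StroopTimes`, `stroop_effect_scores`, and `inverted_stroop_flags` are hypothetical, and the sketch assumes that for Equation 4 all three trial times are transformed by 4500/time before the formula is applied, which is one reading of the description.

```python
from dataclasses import dataclass


@dataclass
class StroopTimes:
    """Completion times in seconds for the three Comalli Stroop trials."""
    a: float  # Stroop A, word reading
    b: float  # Stroop B, color naming
    c: float  # Stroop C, interference


def stroop_effect_scores(t: StroopTimes) -> dict:
    """Return the four Stroop effect scores keyed by equation number."""
    eq1 = t.c - t.b
    eq2 = t.c - (t.a + t.b)
    eq3 = t.c - (t.a * t.b) / (t.a + t.b)
    # Assumption: Equation 4 uses scores transformed by 4500 / time.
    ta, tb, tc = 4500 / t.a, 4500 / t.b, 4500 / t.c
    eq4 = tc - ((216 - ta) * tb) / ((216 - ta) + tb)
    return {1: eq1, 2: eq2, 3: eq3, 4: eq4}


def inverted_stroop_flags(scores: dict) -> dict:
    """Flag an inverted Stroop effect: negative on Equations 1-3, positive on Equation 4."""
    return {eq: (value > 0 if eq == 4 else value < 0)
            for eq, value in scores.items()}


# Hypothetical patient whose interference trial is faster than color naming.
example = StroopTimes(a=100, b=120, c=110)
flags = inverted_stroop_flags(stroop_effect_scores(example))
```

With these hypothetical times, Equations 1, 2, and 4 flag an inverted effect while Equation 3 does not, which illustrates how the four methods can disagree for the same patient.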

Table 3. Frequency of primary diagnoses by group

                                          Credible   Noncredible
Alcohol disorder                          8          1
Anoxia                                    3          1
Anxiety/Panic attacks/OCD                 5          3
Asperger’s                                1          –
  R/O Asperger’s disorder                 1          –
Attention deficit/Hyperactive disorder    5          –
Autoimmune disorder                       1          –
Bipolar disorder                          12         1
Brain tumor                               3          1
Chronic fatigue                           –          1
Cognitive disorder NOS                    4          5
  R/O Cognitive disorder NOS              1          –
Dementia                                  –          3
  R/O Dementia                            5          1
Depressive disorder                       40         12
  R/O Depressive disorder                 2          –
R/O Dissociative disorder                 1          –
Electrocution                             –          2
Fetal alcohol/Drug exposure               1          –
HIV/AIDS                                  4          1
Klinefelter syndrome                      1          –
Impulse control disorder                  1          –
Learning disability                       24         4
  R/O Learning disability                 4          1
Meningitis                                –          1
Mental retardation                        –          3
  R/O Mental retardation                  –          1
Mild cognitive impairment                 4          2
Multiple sclerosis                        4          –
Personality disorder                      3          –
Posttraumatic stress disorder             1          –
Schizoaffective disorder                  1          –
Schizophrenia/Psychosis                   27         24
  R/O Psychotic disorder                  1          –
Seizures                                  13         4
Somatoform disorder                       12         2
  R/O Somatoform disorder                 11         1
Stroke/Aneurysm                           7          9
Substance abuse/Dependence                8          2
Toxic exposure                            –          2
Traumatic Brain Injury
  Mild                                    1          21
  Moderate                                1          3
  Severe                                  10         13
  Unknown severity                        2          4
Total                                     233        129


RESULTS

Group comparisons

As shown in Table 2, the credible and noncredible groups did not differ significantly in age or education. Significant group differences were documented on all three Stroop trials, with the credible patients consistently performing faster than noncredible subjects. As the Stroop scores were positively skewed, the significant group comparisons were also confirmed through Mann–Whitney U-tests (ps < .001). As shown in Table 4, the following cut-off scores (and sensitivity rates) were associated with specificity of ≥90%: Stroop A (≥66, 53.49% sensitivity), Stroop B (≥93, 48.82% sensitivity), and Stroop C (≥191, 29.41% sensitivity).
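The way a cut-off is tied to a specificity floor can be illustrated with a short Python sketch. It is not the authors' analysis code: the function names are hypothetical and the completion times are invented for illustration only. A trial is counted as failed when the time is at or above the cut-off, sensitivity is the proportion of noncredible subjects who fail, and specificity is the proportion of credible subjects who pass.

```python
def sensitivity_specificity(noncredible_times, credible_times, cutoff):
    """Return (sensitivity, specificity) when time >= cutoff counts as a failure."""
    sensitivity = sum(t >= cutoff for t in noncredible_times) / len(noncredible_times)
    specificity = sum(t < cutoff for t in credible_times) / len(credible_times)
    return sensitivity, specificity


def lowest_cutoff_meeting_specificity(noncredible_times, credible_times,
                                      min_specificity=0.90):
    """Return (cutoff, sensitivity, specificity) for the lowest observed score
    that reaches the specificity floor, or None if no observed score does.

    Raising the cut-off can only raise specificity and lower sensitivity, so
    the lowest qualifying cut-off is also the most sensitive one.
    """
    for cutoff in sorted(set(noncredible_times) | set(credible_times)):
        sens, spec = sensitivity_specificity(noncredible_times, credible_times, cutoff)
        if spec >= min_specificity:
            return cutoff, sens, spec
    return None


# Hypothetical completion times in seconds (invented, not study data).
credible = [45, 48, 50, 52, 55, 58, 60, 62, 64, 70]
noncredible = [50, 60, 66, 70, 75, 80, 90, 100]
result = lowest_cutoff_meeting_specificity(noncredible, credible)
```

The same trade-off is visible row by row in Table 4: each higher cut-off buys specificity at the cost of sensitivity.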

As reproduced in Table 2 and confirmed through Mann–Whitney U-tests (ps < .001), groups differed on three out of four equations specific to quantification of extent of Stroop effect. However, effect sizes were smaller than for individual Stroop trials. Two cut-offs were identified for each equation at the opposite extremes of the score distributions (each associated with ≤10% false positive identifications in the credible sample) that identified enhanced versus reduced Stroop effects. However, sensitivities of all cut-offs were low and poorer than those for individual Stroop trials (see Table 5). Further, in the noncredible subjects, an inverted Stroop effect (a negative score for Equations 1–3 and a positive score for Equation 4) was only present in 0.01% (n = 1) on Equation 1, in 35.66% (n = 46) on Equation 2, in 0% on Equation 3, and 3.36% (n = 4) on Equation 4. For comparison, in credible subjects, an inverted Stroop effect was present in 31.33% (n = 73) on Equation 2, 14.29% (n = 33) on Equation 4, and not at all for Equations 1 or 3. As trial scores were highly correlated (rs from .39 to .59, ps < .001 for noncredible participants; rs from .50 to .63, ps < .001 for credible participants), logistic regression modeling was not performed. Due to poor classification accuracy of the Stroop effect equations, they were not included in further statistical analyses.

We suspected that some subgroups within the credible group might have been at enhanced risk for scoring beyond cut-offs despite performing to true ability, and we conducted additional analyses to investigate this possibility.

First, we were concerned that the 23 patients with primary diagnoses of somatoform disorder (including rule out) might have performed in an anomalous manner on the Stroop Test, because, while they failed ≤1 of the PVTs used for group assignment, their conditions involved fabrication of physical symptoms for psychological reasons. However, as shown in Table 6, somatoform and nonsomatoform patients did not significantly differ on Stroop scores on independent t-tests (Mann–Whitney U-tests were also nonsignificant, ps = .08 to .92), and in fact, somatoform patients showed a trend toward better performance on Stroop B and C.

We also hypothesized that credible patients who spoke English as a second language might have performed more slowly on the Stroop tasks because of their relative lack of facility with the English language. In fact, reduced color naming speed in English has been reported in bilingual individuals (Rosselli et al., 2002). However, as shown in Table 6, independent t-test results (confirmed with Mann–Whitney U-tests, ps = .06 to .85) showed that performance on the Stroop trials was highly similar in the 55 credible subjects who spoke English as a second language or who learned English concurrently with a second language versus the 177 credible subjects who were native English speakers, with actually a trend toward better performance in the ESL speakers on Stroop A.

Table 4. Sensitivity and specificity for time elapsed on Stroop trials

Cut-off (seconds)   Sensitivity % (n = 129)   Specificity % (n = 233)

Stroop A
≥50                 83.72                     54.08
≥60                 65.12                     84.12
≥61                 62.02                     84.98
≥62                 60.47                     87.12
≥63                 58.14                     87.12
≥64                 55.81                     89.27
≥65                 55.04                     89.70
≥66                 53.49                     90.56
≥67                 51.16                     90.99
≥68                 48.84                     91.85
≥70                 45.74                     93.99
≥74                 41.09                     95.71
≥76                 37.21                     96.14
≥77                 36.43                     96.57
≥86                 20.16                     98.71
≥94                 13.95                     99.14
≥95                 13.95                     100.00

Stroop B
≥80                 66.14                     76.82
≥88                 57.48                     86.27
≥89                 56.69                     87.12
≥91                 55.12                     87.98
≥92                 53.54                     89.27
≥93                 48.82                     90.13
≥94                 48.03                     90.56
≥96                 43.31                     90.99
≥97                 43.31                     91.42
≥99                 40.16                     92.70
≥101                33.07                     93.56
≥103                28.35                     94.42
≥105                27.56                     94.42
≥108                25.20                     97.00
≥115                17.32                     97.85
≥129                9.45                      98.28
≥131                9.45                      99.57
≥132                8.66                      100.00

Stroop C
≥170                46.22                     83.55
≥174                43.70                     85.71
≥177                41.18                     86.58
≥179                39.50                     87.01
≥187                33.61                     89.18
≥190                31.09                     89.61
≥191                29.41                     90.48
≥193                29.41                     91.34
≥196                28.57                     91.77

(Continued)

We additionally suspected that credible subjects with histories of likely learning disability might underperform on the Stroop trials relative to the remaining credible subjects given literature showing that word reading and color naming speed are reduced in learning disability (Golden & Golden, 2002; Willcutt et al., 2001). While scores of the 28 individuals with learning disability did not significantly differ from those of credible individuals without histories of learning problems (n = 205), there were trends toward poorer performances in the learning disability group on t-test analyses (see Table 6). Further, though the Mann–Whitney U-test was nonsignificant for Stroop A (p = .13), nonparametric comparisons were significant for Stroop B and C (ps < .05). For this reason, specificity rates for the Stroop cut-offs were examined in the learning disability group separately. Using the cut-offs appropriate for the credible group as a whole, 39% of the learning disability subjects failed at least one. To maintain at least 90% specificity in the learning disordered credible subjects, the cut-off for Stroop A had to be raised to 76 (37.21% sensitivity), for Stroop B had to be changed to 98 (41.73% sensitivity), and for Stroop C had to be adjusted to 211 (21.85% sensitivity).
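As a hedged illustration of how the whole-sample and learning-disability cut-offs reported above might be applied to one patient's trial times, consider the following Python sketch. The cut-off values come from the text; the function name, dictionary layout, and the rule of flagging times at or above the cut-off are assumptions made for the example, and the sketch is not a clinical decision rule.

```python
# Cut-offs in seconds associated with >= 90% specificity, taken from the text.
STANDARD_CUTOFFS = {"A": 66, "B": 93, "C": 191}
LEARNING_DISABILITY_CUTOFFS = {"A": 76, "B": 98, "C": 211}


def failed_trials(times, learning_disability=False):
    """Return the Stroop trials whose completion time meets or exceeds the cut-off.

    `times` maps trial letters ("A", "B", "C") to seconds; missing trials are
    skipped. The raised cut-offs are used when a learning disability history
    is indicated.
    """
    cutoffs = LEARNING_DISABILITY_CUTOFFS if learning_disability else STANDARD_CUTOFFS
    return [trial for trial in ("A", "B", "C")
            if trial in times and times[trial] >= cutoffs[trial]]


# Hypothetical patient times in seconds.
patient = {"A": 70, "B": 95, "C": 180}
standard_result = failed_trials(patient)
adjusted_result = failed_trials(patient, learning_disability=True)
```

Here the same hypothetical times exceed the Stroop A and B cut-offs for the sample as a whole but none of the raised learning-disability cut-offs, which is the kind of false positive the adjustment is meant to avoid.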

Table 4. Continued.

Cut-off (seconds)   Sensitivity % (n = 129)   Specificity % (n = 233)

Stroop C (continued)
≥200                26.89                     92.64
≥204                25.21                     93.07
≥218                19.33                     97.40
≥284                5.88                      99.13
≥384                0.84                      100.00

Table 5. Cut-off scores for Stroop effect equations and associated sensitivity levels

                                                      Low Stroop effect        High Stroop effect
                                                      Cut-off    Sensitivity   Cut-off    Sensitivity
Equation 1: C – B                                     ≤32        5.83%         ≥110       22.50%
Equation 2: C – (A + B)                               ≤–17       20.00%        ≥55        13.33%
Equation 3: C – [(A × B)/(A + B)]                     ≤70        2.50%         ≥159       25.83%
Equation 4: C – {[(216 – A) × B]/[(216 – A) + B]}     ≤–16.37    13.33%        ≥3.11      1.67%

Note: A = Stroop Word Reading Trial; B = Stroop Color Naming Trial; C = Stroop Interference Trial.

To examine the potential effect of age and education on Stroop raw scores, correlations (parametric and nonparametric) were computed between these demographic variables and the Stroop data in each group separately. Age was not significantly related to any of the three Stroop scores in the noncredible group (rs = –.11 to .14, p > .05), whereas education was significantly negatively correlated with Stroop A (r = –.23, p < .01) but not the other Stroop trials. In the credible group, age was positively related to Stroop A and C (rs = .14 to .26, p < .05), and education was negatively related to Stroop C (r = –.13, p < .05). However, all significant relationships were modest, accounting for less than 7% of the test score variance.

In fact, when we examined demographic characteristics of the 22 to 23 credibleindividuals who fell beyond individual Stroop trial cut-offs (i.e., the 10% incorrectlyidentified as noncredible; see Table 7), educational level did not appear to be a factorin the failures, as the mean education level of subjects falling beyond cut-offs was13.20, and was comparable to the mean of 13.32 years of education in the larger credi-ble group. Further, those who failed the Stroop cut-offs obtained a mean age of 44.27,which was comparable to the overall mean of 43.27. However, of the two crediblesubjects aged P80, one fell beyond cut-offs for Stroop B and C; therefore, some cau-tion may be appropriate when using the Stroop Test as a measure of response bias inthis age range.

Effect of gender on Stroop performance was examined in the credible and non-credible groups separately. No gender differences were found in the noncredible group(ps from .09 to .64), which was confirmed through Mann–Whitney U-tests (ps from.09 to .92). In the credible group, while men and women did not differ on Stroop A(men = 49.94 ± 10.45; women = 51.59 ± 11.46) and Stroop B (men = 70.49 ± 15.11;women = 72.75 ± 17.07), women scored poorer on Stroop C (men = 131.15 ± 35.55;women = 145.21 ± 43.13; p = .008). Further, Mann–Whitney U-tests were nonsignfi-cant for Stroop A and B (both ps = .29), but significant for Stroop C (p < .01). In fact,female credible participants were somewhat over-represented in those credible subjectswho scored beyond Stroop cut-offs (Stroop A = 15 female, 7 male; Stroop B = 13female, 10 male; Stroop C = 14 female, 8 male). As shown in Table 7, the apparentincrease in false positive identifications among credible women appeared to be anartifact of severe neurologic conditions (e.g., stroke, multiple sclerosis, severe TBI) inthose female patients scoring beyond cut-offs.

Examination of types of diagnoses that were associated with increased false positive rates revealed that 26% (11 out of 42) of patients with diagnoses of depressive disorder, 20% (2 out of 10) of severe TBI patients, and 21% (6 out of 28) of patients with psychosis scored beyond test cut-offs. These findings are preliminary given the

Table 6. Means, standard deviations, and group comparisons for credible subjects with and without somatoform disorder, English as a second or concurrent language, and learning disability

| Group | Stroop A | Stroop B | Stroop C |
|---|---|---|---|
| Somatoform diagnosis: present (n = 23) | 50.52 ± 10.51 | 65.91 ± 10.30 | 125.13 ± 27.25 |
| Somatoform diagnosis: absent (n = 210) | 50.86 ± 11.08 | 72.34 ± 16.61 | 140.20 ± 41.30 |
| Group comparison | p = .89 | p = .07 | p = .09 |
| English as second/concurrent language: present (n = 55) | 48.31 ± 9.80 | 71.45 ± 17.21 | 135.75 ± 36.61 |
| English as second/concurrent language: absent (n = 177) | 51.62 ± 11.29 | 71.81 ± 15.96 | 139.45 ± 41.54 |
| Group comparison | p = .05 | p = .89 | p = .55 |
| Learning disability: present (n = 28) | 54.14 ± 12.95 | 76.21 ± 14.82 | 150.78 ± 36.94 |
| Learning disability: absent (n = 205) | 50.37 ± 10.67 | 71.09 ± 16.31 | 137.10 ± 40.57 |
| Group comparison | p = .09 | p = .12 | p = .10 |
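Because Table 6 reports n, mean, and SD for each subgroup, a reader can roughly re-derive a group comparison from the summary statistics alone. The sketch below does this for Stroop A in subjects with versus without learning disability. It assumes a pooled-variance (Student) t-test and uses a normal approximation for the p-value, which is reasonable only because the degrees of freedom are large; the table does not state which test produced the reported values, so both choices are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's t (equal variances assumed) from group means, SDs, and ns."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se


def approx_two_sided_p(t):
    """Normal approximation to the two-sided p-value; adequate only for large df."""
    return 2 * (1 - NormalDist().cdf(abs(t)))


# Stroop A, learning disability present vs. absent (values taken from Table 6)
t = pooled_t(54.14, 12.95, 28, 50.37, 10.67, 205)
p = approx_two_sided_p(t)
print(f"t = {t:.2f}, approximate p = {p:.3f}")
```

Under these assumptions the approximate p-value falls close to the p = .09 reported in the table for this comparison.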

Table 7. Demographic, IQ, and diagnostic characteristics of credible subjects exceeding Stroop A, B, and C cut-offs

| Gender | Age | Education | Ethnicity | ESL | Diagnosis | FSIQ | A | B | C |
|---|---|---|---|---|---|---|---|---|---|
| Female | 57 | 13 | African American | No | Stroke/Aneurysm | 88 | Y | Y | Y |
| Male | 20 | 13 | Asian | Yes | Asperger’s disorder | – | Y | Y | Y |
| Female | 38 | 14 | Other | No | Anxiety disorder | – | Y | Y | Y |
| Female | 58 | 14 | African American | No | Depressive disorder | 76 | Y | Y | – |
| Female | 52 | 10 | Caucasian | No | Multiple sclerosis | 83 | Y | Y | – |
| Female | 56 | 14 | African American | No | Alcohol abuse | 77 | Y | Y | – |
| Male | 21 | 11 | African American | No | Learning disability | 75 | Y | – | Y |
| Female | 58 | 11 | Caucasian | No | Learning disability | 79 | Y | – | Y |
| Male | 31 | 13 | Hispanic | Yes | R/O Somatoform disorder | 87 | Y | – | – |
| Female | 56 | 15 | Caucasian | No | R/O Somatoform disorder | 104 | Y | – | – |
| Female | 32 | 11 | Other | No | Depressive disorder | 85 | Y | – | – |
| Female | 34 | 12 | Other | No | Severe TBI | 74 | Y | – | – |
| Female | 63 | 18 | Caucasian | No | R/O Dementia | 97 | Y | – | – |
| Female | 44 | 14 | Caucasian | No | Depressive disorder | 87 | Y | – | – |
| Male | 58 | 6 | Hispanic | Yes | Depressive disorder | 99 | Y | – | – |
| Female | 58 | 9 | Other | No | Depressive disorder | 97 | Y | – | – |
| Male | 43 | 18 | Asian | No | Depressive disorder | 91 | Y | – | – |
| Female | 30 | 12 | Other | No | Severe TBI | 70 | Y | – | – |
| Female | 51 | 12 | African American | No | Multiple sclerosis | 72 | Y | – | – |
| Male | 48 | 14 | Caucasian | No | Learning disability | 82 | Y | – | – |
| Female | 57 | 11 | Hispanic | No | Learning disability | 77 | Y | – | – |
| Male | 46 | 11 | Caucasian | No | Learning disability | 86 | Y | – | – |
| Female | 47 | 12 | African American | No | Alcohol abuse | 82 | – | Y | Y |
| Male | 81 | 16 | Caucasian | Yes | Schizophrenia/Psychosis | – | – | Y | Y |
| Female | 59 | 14 | Caucasian | No | Depressive disorder | 90 | – | Y | Y |
| Male | 64 | 16 | Caucasian | No | Schizophrenia/Psychosis | 107 | – | Y | Y |
| Male | 31 | 11 | Other | Yes | Learning disability | 82 | – | Y | Y |
| Female | 30 | 13 | African American | No | Schizophrenia/Psychosis | 88 | – | Y | – |
| Female | 22 | 14 | Other | No | Cognitive disorder NOS | 100 | – | Y | – |
| Female | 45 | 18 | Caucasian | No | Bipolar disorder | 108 | – | Y | – |
| Female | 26 | 14 | African American | No | Depressive disorder | 102 | – | Y | – |
| Female | 21 | 15 | Hispanic | Yes | Fetal alcohol/Drug exposure | 89 | – | Y | – |
| Male | 51 | 12 | Caucasian | No | Schizophrenia/Psychosis | 93 | – | Y | – |
| Male | 45 | 13 | Other | Yes | Depressive disorder | 98 | – | Y | – |
| Male | 30 | 14 | Asian | Yes | Schizophrenia/Psychosis | – | – | Y | – |
| Male | 45 | 16 | Other | Yes | Depressive disorder | 95 | – | Y | – |
| Male | 30 | 12 | Caucasian | No | Learning disability | 90 | – | Y | – |
| Male | 46 | 12 | Caucasian | No | Schizophrenia/Psychosis | 85 | – | Y | – |
| Female | 47 | 14 | Caucasian | No | Learning disability | 74 | – | Y | – |
| Female | 46 | 15 | Caucasian | No | Bipolar disorder | 84 | – | – | Y |
| Female | 51 | 12 | African American | No | R/O Dementia | 84 | – | – | Y |
| Female | 58 | 16 | Caucasian | No | Stroke/Aneurysm | 96 | – | – | Y |
| Male | 48 | 15 | Other | No | Substance abuse | 100 | – | – | Y |
| Female | 61 | 12 | Caucasian | No | Bipolar disorder | 83 | – | – | Y |
| Female | 21 | 14 | Caucasian | No | Schizoaffective disorder | – | – | – | Y |
| Female | 60 | 16 | Caucasian | No | Depressive disorder | 110 | – | – | Y |
| Male | 35 | 11 | Caucasian | No | Learning disability | 88 | – | – | Y |
| Female | 24 | 14 | Caucasian | No | Learning disability | 87 | – | – | Y |
| Female | 38 | 14 | Caucasian | No | Anoxia | 91 | – | – | Y |
| Male | 55 | 11 | Hispanic | Yes | R/O Learning disability | 91 | – | – | Y |
| Female | 30 | 11 | Hispanic | Yes | R/O Asperger’s disorder | 98 | – | – | Y |

Note: ESL = English as second/concurrent language status; R/O = rule out; TBI = traumatic brain injury; A = Stroop Word Reading Trial; B = Stroop Color Naming Trial; C = Stroop Color-Word Interference Trial; Y = failed cut-off.
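The summary figures quoted in the text for credible subjects exceeding cut-offs (mean age of 44.27, mean education of 13.20 years, and the female/male counts per trial) can be re-derived from the rows of Table 7. The sketch below is a transcription check rather than part of the original analysis; the compact row encoding (gender, age, education, then Y or - for trials A, B, and C) is an illustrative convention, and FSIQ is left out.

```python
from collections import Counter
from statistics import mean

# One record per Table 7 row: gender, age, education, failure flags for A, B, C.
RAW = """
F 57 13 YYY; M 20 13 YYY; F 38 14 YYY
F 58 14 YY-; F 52 10 YY-; F 56 14 YY-
M 21 11 Y-Y; F 58 11 Y-Y; M 31 13 Y--
F 56 15 Y--; F 32 11 Y--; F 34 12 Y--
F 63 18 Y--; F 44 14 Y--; M 58 6 Y--
F 58 9 Y--; M 43 18 Y--; F 30 12 Y--
F 51 12 Y--; M 48 14 Y--; F 57 11 Y--
M 46 11 Y--; F 47 12 -YY; M 81 16 -YY
F 59 14 -YY; M 64 16 -YY; M 31 11 -YY
F 30 13 -Y-; F 22 14 -Y-; F 45 18 -Y-
F 26 14 -Y-; F 21 15 -Y-; M 51 12 -Y-
M 45 13 -Y-; M 30 14 -Y-; M 45 16 -Y-
M 30 12 -Y-; M 46 12 -Y-; F 47 14 -Y-
F 46 15 --Y; F 51 12 --Y; F 58 16 --Y
M 48 15 --Y; F 61 12 --Y; F 21 14 --Y
F 60 16 --Y; M 35 11 --Y; F 24 14 --Y
F 38 14 --Y; M 55 11 --Y; F 30 11 --Y
"""

records = []
for chunk in RAW.replace("\n", ";").split(";"):
    if chunk.strip():
        gender, age, edu, flags = chunk.split()
        records.append((gender, int(age), int(edu), flags))

mean_age = mean(age for _, age, _, _ in records)
mean_edu = mean(edu for _, _, edu, _ in records)

# Failures per trial split by gender; flags[i] == "Y" means trial i was failed.
fails = {
    trial: Counter(g for g, _, _, flags in records if flags[i] == "Y")
    for i, trial in enumerate("ABC")
}

print(len(records), round(mean_age, 2), round(mean_edu, 2))
for trial, counts in fails.items():
    print(trial, dict(counts))
```

With the rows transcribed as above, the tallies agree with the values given in the text: 22, 23, and 22 subjects beyond the Stroop A, B, and C cut-offs, respectively.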

small ns, but may indicate that cut scores require adjustment for these conditions. Adjusted Stroop trial cut-offs that resulted in ≥90% specificity were: ≥74 for Stroop A in severe TBI; ≥103 for Stroop B in psychosis; and ≥71 for Stroop A and ≥97 for Stroop B in depression. The other Stroop trials did not require more lenient cut-scores to maintain specificity of at least 90% in these diagnostic groups.
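The logic of choosing a cut-off that preserves at least 90% specificity can be sketched as follows. The example assumes that Stroop scores are completion times (larger is worse) and that a score at or above the cut-off counts as a failure; the demonstration scores are hypothetical and are not study data, and the function names are illustrative.

```python
def specificity(credible_scores, cutoff):
    """Share of credible subjects scoring below the cut-off (correctly passed)."""
    passed = sum(1 for s in credible_scores if s < cutoff)
    return passed / len(credible_scores)


def lowest_cutoff_for_specificity(credible_scores, target=0.90):
    """Smallest integer cut-off (score >= cut-off fails) meeting the target specificity."""
    start = int(min(credible_scores))
    stop = int(max(credible_scores)) + 2  # max + 1 always yields specificity 1.0
    for cutoff in range(start, stop):
        if specificity(credible_scores, cutoff) >= target:
            return cutoff
    raise ValueError("no cut-off reaches the target specificity")


# Hypothetical completion times for ten credible patients; NOT study data.
demo_scores = [45, 50, 52, 55, 58, 60, 62, 65, 68, 90]
cutoff = lowest_cutoff_for_specificity(demo_scores)
print(cutoff, specificity(demo_scores, cutoff))
```

Choosing the lowest qualifying cut-off keeps sensitivity as high as possible while holding false positives in the credible subgroup to 10% or fewer.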

Further, the mean FSIQ of credible subjects falling beyond any of the cut-offs averaged more than seven points lower than that of the entire credible group (88.78 ± 9.94 vs. 96.06 ± 14.63), suggesting that even though credible subjects with extremely low IQ (<70) had been excluded, lowered IQ was still a risk factor for false positive identification as noncredible on the Stroop Test. The Stroop A, B, and C cut-offs associated with ≥90% specificity in those with borderline IQ (70–79; n = 26 credible subjects) to low average IQ (80–89; n = 48 credible subjects) are ≥77, ≥107, and ≥210, respectively.

DISCUSSION

The present study, utilizing a large sample of “real world” noncredible subjects and heterogeneous credible neuropsychological patients, revealed that the word-reading and color-naming sections of the Comalli version of the Stroop task appear to have moderate sensitivity for the detection of noncredible cognitive performance (49–53% sensitivity).

In contrast, the Stroop interference trial was associated with poor sensitivity (29%). The most effective measures of response bias are those that appear difficult but in fact are simple, such as tasks involving recognition memory (Warrington Recognition Memory Test – Words, M. Kim et al., 2010; Digit Symbol Recognition, N. Kim et al., 2010; Rey-Osterrieth Effort Equation, Lu, Boone, Cozolino, & Mitchell, 2003; Rey Word Recognition, Nitch, Boone, Wen, Arnold, & Alfano, 2006) and rapid utilization of overlearned information (dot counting and letter discrimination, Boone, Lu, & Herzberg, 2002b, 2002c). Stroop A and B would fall into the latter category, while Stroop C, requiring rapid response inhibition, is a difficult task for many types of patients. As such, its utility for identification of response bias will always be limited in that cut-offs selected to detect a sufficient percentage of noncredible subjects will have unacceptable false positive rates.

Some previous studies had observed an inverted Stroop effect (in which color-naming performance is poorer than scores on color interference) in brain injury simulators (Osimani et al., 1997) and probable malingerers (Egeland & Langfjaeran, 2007). However, in the current study, equations documenting an inverted Stroop effect achieved very poor classification rates (0–26% sensitivity). Results from the earlier studies were based on small sample sizes, used simulators, and employed problematic group assignment methods (i.e., use of a single PVT), which likely rendered those findings unreliable.
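One simple way to operationalize the inverted Stroop effect described above is a direct comparison of the color-naming and interference scores. The sketch below is only one possible formulation and is not necessarily any of the equations evaluated in the study; it assumes both scores are completion times, so that a larger value indicates poorer performance, and the example values are hypothetical.

```python
def inverted_stroop(color_naming: float, interference: float) -> bool:
    """True when color naming (Stroop B) is poorer than the interference trial (Stroop C).

    Assumes both scores are completion times, so larger means worse.
    """
    return color_naming > interference


# Hypothetical score pairs (Stroop B, Stroop C); NOT study data.
typical_pattern = inverted_stroop(70, 140)    # interference slower, as expected
inverted_pattern = inverted_stroop(150, 120)  # color naming slower than interference
print(typical_pattern, inverted_pattern)
```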

Learning disability status, defined as a diagnosis (including rule-out diagnosis) of learning disability or self-reported history of a suspected learning disability and/or poor performance in school, was associated with increased false positive rates on the Stroop trials. This was not an unexpected finding given that rapid word-reading and color-naming are lower in individuals with learning disabilities (Golden & Golden, 2002; Willcutt et al., 2001). Cut-offs had to be modified to maintain adequate specificity in the credible subgroup with history of learning difficulties, which slightly lowered test sensitivity (48 and 42% for Stroop A and B cut-offs). It is recommended that the Stroop Test be used as a PVT in individuals with histories of confirmed or suspected learning disability, but only if cut-off scores are adjusted to prevent them from incorrectly being identified as making inadequate effort.

Examination of the characteristics of those credible subjects without histories of learning problems who fell beyond cut-offs showed that very advanced age (≥80) and presence of depression, severe TBI, and psychosis increased the risk of false positive identifications, and as a result, the Stroop Test should be used cautiously as a measure of response bias (with adjusted cut-offs) in these populations. Further, given that credible subjects with IQ <70 and diagnoses of dementia were excluded from the credible group, cut-offs validated in this study cannot be applied to these populations. Credible individuals with borderline to low average overall intelligence also showed lowered specificity rates, requiring adjustment of cut-offs for individuals in these groups. However, the difficulty with making IQ-corrections to cut-scores in clinical practice is that individuals feigning neurocognitive impairment also typically obtain spuriously lowered IQ scores; thus, adjustment to cut-scores based on IQ should be made based on premorbid IQ estimates, not postmorbid IQ data.
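The adjusted cut-offs reported in the Results for these populations can be collected into a small lookup, sketched below. The group keys and function names are hypothetical labels chosen for illustration; `None` marks trials for which no more lenient cut-off was reported (the standard cut-off, not restated here, would apply), and adjusted values for learning disability are omitted because they are not given in this section. Following the recommendation above, the IQ band is keyed to a premorbid IQ estimate.

```python
from typing import Optional

# Adjusted cut-offs (score >= value fails) reported for >=90% specificity.
# None: no adjusted value reported for that trial in that group.
ADJUSTED_CUTOFFS = {
    "severe_tbi": {"A": 74, "B": None, "C": None},
    "psychosis": {"A": None, "B": 103, "C": None},
    "depression": {"A": 71, "B": 97, "C": None},
    "iq_70_89": {"A": 77, "B": 107, "C": 210},
}


def adjusted_cutoff(group: str, trial: str) -> Optional[int]:
    """Return the adjusted cut-off for a group and trial, or None if none was reported."""
    return ADJUSTED_CUTOFFS[group][trial]


def premorbid_iq_group(premorbid_iq_estimate: float) -> Optional[str]:
    """Map a premorbid IQ estimate to the borderline-to-low-average band, if it applies."""
    if premorbid_iq_estimate < 70:
        raise ValueError("cut-offs were not validated for IQ < 70")
    return "iq_70_89" if premorbid_iq_estimate <= 89 else None


print(adjusted_cutoff("depression", "B"), premorbid_iq_group(82))
```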

Of note, ESL status or having learned English concurrently with another language, somatoform diagnosis, and educational level were not associated with increased failure rates on Stroop trials, and the apparent link between gender and false positive identifications instead was likely an artifact of diagnosis type. Thus, these conditions/characteristics do not warrant concern when using the Stroop as a PVT.

The current study employed a known groups design, which has the advantage of enhanced generalizability over simulation studies. However, accuracy of group assignment is a concern in the former, and several steps were taken to minimize criterion group contamination. First, members of the credible group could not have a motive to feign whereas the noncredible group subjects were either in litigation or attempting to secure disability. Many studies of negative response bias employ a single compensation-seeking population with group assignment made based on PVT failures, but we believe that this allows for inadvertent assignment of noncredible subjects to the credible group because of imperfect PVT sensitivity. The requirement that credible subjects have no external motive to feign markedly reduces the possibility of incorrect inclusion of noncredible test takers in the credible group.

Second, we required that noncredible subjects fail at least two independent PVTs, whereas some investigations of response bias use a single PVT failure to assign to noncredible groups. The problem with this latter approach is that some research has shown that failure on a single PVT out of several administered may approach 40% (Victor et al., 2009); in contrast, failure on 2 PVTs is rare in credible populations (i.e., associated with 95–100% specificity; see Gigler, Merten, Merchelback, & Oswald, 2010; Suhr, Tranel, Wefel, & Barrash, 1997; Victor et al., 2009).
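The reasoning behind the two-failure requirement can be illustrated with a simple binomial calculation. The sketch below assumes, purely for illustration, that five PVTs are administered, that each has a 10% false positive rate in credible patients, and that failures are independent; none of these values is taken from the cited studies, and real PVTs are correlated, so the output is a rough guide rather than an estimate of the empirical rates quoted above.

```python
from math import comb


def prob_at_least(k_fail: int, n_tests: int, fp_rate: float) -> float:
    """P(at least k_fail failures among n_tests independent tests), binomial model."""
    return sum(
        comb(n_tests, k) * fp_rate**k * (1 - fp_rate) ** (n_tests - k)
        for k in range(k_fail, n_tests + 1)
    )


# Assumed for illustration only: 5 PVTs, each with a 10% false positive rate.
n_pvts, fp = 5, 0.10
p_one_or_more = prob_at_least(1, n_pvts, fp)
p_two_or_more = prob_at_least(2, n_pvts, fp)
print(f"P(>=1 failure) = {p_one_or_more:.3f}, P(>=2 failures) = {p_two_or_more:.3f}")
```

Under these assumptions a credible patient has roughly a 41% chance of failing at least one test but only about an 8% chance of failing two or more, which shows why a single failure is a weak basis for group assignment.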

Finally, we carefully examined evidence of functionality in activities of daily living in those cases that met the first two criteria (i.e., motive and at least 2 PVT failures) to ensure that the PVT failure was not due to actual very low cognitive functioning.
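The three criteria for noncredible group assignment described above can be summarized as a simple decision rule. The sketch below is a schematic restatement only; the class and field names are hypothetical, and the clinical judgment involved in reviewing daily functioning is reduced here to a single flag.

```python
from dataclasses import dataclass


@dataclass
class Examinee:
    external_motive: bool                # e.g., in litigation or seeking disability
    pvt_failures: int                    # number of independent PVTs failed
    failures_explained_by_low_functioning: bool  # daily-living evidence of true impairment


def meets_noncredible_criteria(e: Examinee) -> bool:
    """Motive to feign, at least two PVT failures, and no genuine very low functioning."""
    return (
        e.external_motive
        and e.pvt_failures >= 2
        and not e.failures_explained_by_low_functioning
    )


# Hypothetical cases for illustration
litigant = Examinee(external_motive=True, pvt_failures=3,
                    failures_explained_by_low_functioning=False)
single_failure = Examinee(external_motive=True, pvt_failures=1,
                          failures_explained_by_low_functioning=False)
print(meets_noncredible_criteria(litigant), meets_noncredible_criteria(single_failure))
```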

In conclusion, these data on Comalli Stroop Test performance in credible and noncredible groups suggest that the Stroop Test can be added to the accumulating group of “embedded” performance validity indicators available from standard neurocognitive tests.

Note

1. Analyses were initially performed for each hospital setting separately, but mean Stroop scores were highly similar across campuses, and the data sets were collapsed and analyzed conjointly.

REFERENCES

Arnold, G., Boone, K. B., Lu, P., Dean, A., Wen, J., Nitch, S., & McPherson, S. (2005). Sensitivity and specificity of finger tapping test scores for the detection of suspect effort. Clinical Neuropsychologist, 19, 105–120.

Babikian, T., Boone, K. B., Lu, P., & Arnold, G. (2006). Sensitivity and specificity of various digit span scores in the detection of suspect effort. Clinical Neuropsychologist, 20, 145–159.

Backhaus, S. L., Fichtenberg, N. L., & Hanks, R. A. (2004). Detection of sub-optimal performance using a floor effect strategy in patients with traumatic brain injury. Clinical Neuropsychologist, 18, 591–603.

Boone, K. B. (2009). The need for continuous and comprehensive sampling of effort/response bias during neuropsychological examinations. Clinical Neuropsychologist, 23, 729–741.

Boone, K. B. (2013). Clinical practice of forensic neuropsychology: An evidence-based approach. New York, NY: Guilford Press.

Boone, K. B., Lu, P., Back, C., King, C., Lee, A., Philpott, L., & Warner-Chacon, K. (2002a). Sensitivity and specificity of the Rey Dot Counting Test in patients with suspect effort and various clinical samples. Archives of Clinical Neuropsychology, 17, 625–642.

Boone, K. B., Lu, P., & Herzberg, D. (2002b). The b test. Los Angeles, CA: Western Psychological Services.

Boone, K. B., Lu, P., & Herzberg, D. (2002c). The Dot Counting Test. Los Angeles, CA: Western Psychological Services.

Boone, K. B., Lu, P., & Wen, J. (2005). Comparison of various RAVLT scores in the detection of noncredible memory performance. Archives of Clinical Neuropsychology, 20, 301–319.

Boone, K. B., Salazar, X., Lu, P., Warner-Chacon, K., & Razani, J. (2002). The Rey 15-item recognition trial: A technique to enhance sensitivity of the Rey 15-Item Memorization Test. Journal of Clinical and Experimental Neuropsychology, 24, 561–573.

Bortnik, K. E., Boone, K. B., Marion, S. D., Amano, S., Ziegler, E., Cottingham, M. E., ..., & Zeller, M. A. (2010). Examination of various WMS-III logical memory scores in the assessment of response bias. Clinical Neuropsychologist, 24, 344–357.

Buckley, T. C., Galovski, T., Blanchard, E. B., & Hickling, E. J. (2003). Is the emotional stroop paradigm sensitive to malingering? A between-groups study with professional actors and actual trauma survivors. Journal of Traumatic Stress, 16, 59–66.

Bush, S. S., Ruff, R. M., Tröster, A. I., Barth, J. T., Koffler, S. P., Pliskin, N. H., ..., & Silver, C. H. (2005). Symptom validity assessment: Practice issues and medical necessity NAN Policy & Planning Committee. Archives of Clinical Neuropsychology, 20, 419–426.

Cannon, B. J. (2003). An emotional stroop effect to malingering-related words. Perceptual and Motor Skills, 96, 827–834.

Comalli, P. E., Wapner, S., & Werner, H. (1962). Interference effects of stroop color-word test in childhood, adulthood, and aging. Journal of Genetic Psychology, 100, 47–53.

Chafetz, M. D., & Matthews, L. H. (2004). A new interference score for the Stroop test. Archives of Clinical Neuropsychology, 19, 555–567.

Dean, A. C., Victor, T. L., Boone, K. B., & Arnold, G. (2008). The relationship of IQ to effort test performance. Clinical Neuropsychologist, 22, 705–722.

Dean, A. C., Victor, T. L., Boone, K. B., Philpott, L. M., & Hess, R. A. (2009). Dementia and effort test performance. Clinical Neuropsychologist, 23, 133–152.

Demakis, G. J., Sweet, J. J., Sawyer, T. P., Moulthrop, M., Nies, K., & Clingerman, S. (2001). Discrepancy between predicted and obtained WAIS-R IQ scores discriminates between traumatic brain injury and insufficient effort. Psychological Assessment, 13, 240–248.

Egeland, J., & Langfjaeran, T. (2007). Differentiating malingering from genuine cognitive dysfunction using the Trail Making Test-ratio and Stroop interference scores. Applied Neuropsychology, 14, 113–119.

Erdal, K. (2004). The effects of motivation, coaching, and knowledge of neuropsychology on the simulated malingering of head injury. Archives of Clinical Neuropsychology, 19, 73–88.

Gigler, P., Merten, T., Merchelback, H., & Oswald, M. (2010). Detection of feigned crime-related amnesia: A multimethod approach. Journal of Forensic Psychology Practice, 10, 440–463.

Golden, C. J. (1978). Stroop color word test: A manual for clinical and experimental uses. Wood Dale, IL: Stoelting Company.

Golden, C. J., & Golden, Z. L. (2002). Patterns of performance on the stroop color and word test in children with learning, attentional, and psychiatric disabilities. Psychology in the Schools, 39, 489–495.

Iverson, G. L., & Binder, L. M. (2000). Detecting exaggeration and malingering in neuropsychological assessment. Journal of Head Trauma Rehabilitation, 15, 829–858.

Kim, M., Boone, K. B., Victor, T., Marion, S. D., Amano, S., Cottingham, M. E., ..., & Zeller, M. A. (2010). The Warrington Recognition Memory Test for words as a measure of response bias: Total score and response time cutoffs developed on “real world” credible and noncredible subjects. Archives of Clinical Neuropsychology, 25, 60–70.

Kim, N., Boone, K. B., Victor, T., Lu, P., Keatinge, C., & Mitchell, C. (2010). Sensitivity and specificity of a digit symbol recognition trial in the identification of response bias. Archives of Clinical Neuropsychology, 25, 420–428.

Lu, P. H., Boone, K. B., Cozolino, L., & Mitchell, C. (2003). Effectiveness of the Rey-Osterrieth Complex Figure Test and the Meyers and Meyers recognition trial in the detection of suspect effort. Clinical Neuropsychologist, 17, 426–440.

Lu, P. H., Boone, K. B., Jimenez, N., & Razani, J. (2004). Failure to inhibit the reading response on the Stroop test: A pathognomonic indicator of suspect effort. Journal of Clinical and Experimental Neuropsychology, 26, 180–189.

Mitrushina, M., Boone, K. B., Razani, J., & D’Elia, L. F. (2005). Handbook of normative data for neuropsychological assessment (2nd ed.). New York, NY: Oxford University Press.

Nelson, N. W., Boone, K., Dueck, A., Wagener, L., Lu, P., & Grills, C. (2003). Relationships between eight measures of suspect effort. Clinical Neuropsychologist, 17, 263–272.

Nitch, S., Boone, K. B., Wen, J., Arnold, G., & Alfano, K. (2006). The utility of the Rey Word Recognition Test in the detection of suspect effort. Clinical Neuropsychologist, 20, 873–887.

Osimani, A., Alon, A., Berger, A., & Abarbanel, J. M. (1997). Use of the Stroop phenomenon as a diagnostic tool for malingering. Journal of Neurology, Neurosurgery, and Psychiatry, 62, 617–621.

Rosselli, M., Ardila, A., Santisi, M. N., Del Rosario Arecco, M., Salvatierra, J., Conde, A., & Lenis, B. (2002). Stroop effect in Spanish-English bilinguals. Journal of the International Neuropsychological Society, 8, 819–827.

Schmand, B., Lindeboom, J., Schagen, S., Heijt, R., Koene, T., & Hamberger, H. L. (1998). Cognitive complaints in patients after whiplash injury: The impact of malingering. Journal of Neurology, Neurosurgery, and Psychiatry, 64, 339–343.

Sherman, D. S., Boone, K. B., Lu, P., & Razani, J. (2002). Re-examination of a Rey Auditory Verbal Learning Test/Rey Complex Figure Discriminant Function to detect suspect effort. Clinical Neuropsychologist, 16, 242–250.

Slick, D. J., Sherman, E. M. S., & Iverson, G. L. (1999). Diagnostic criteria for malingered neurocognitive dysfunction: Proposed standards for clinical practice and research. Clinical Neuropsychologist, 13, 545–561.

Suhr, J., Tranel, D., Wefel, J., & Barrash, J. (1997). Memory performance after head injury: Contributions of malingering, litigation status, psychological factors, and medication use. Journal of Clinical and Experimental Neuropsychology, 19, 500–514.

Sweet, J. J., & Nelson, N. W. (2007). Validity indicators within executive function measures: Use and limits in detection of malingering. In K. B. Boone (Ed.), Assessment of feigned cognitive impairment: A neuropsychological perspective. New York, NY: Guilford Press.

Thomas, T. A., & Fremouw, W. J. (2009). The Stroop interference effect and spontaneous recall in true and malingered motor-vehicle accident-related PTSD. Journal of Forensic Psychiatry & Psychology, 20, 936–949.

Tombaugh, T. N. (1997). The Test of Memory Malingering (TOMM): Normative data from cognitively intact and cognitively impaired individuals. Psychological Assessment, 9, 260–268.

van Gorp, W. G., Humphrey, L. A., Kalechstein, A., Brumm, V. L., McMullen, W. J., Stoddard, M., & Pachana, N. A. (1999). How well do standard clinical neuropsychological tests identify malingering? A preliminary analysis. Journal of Clinical and Experimental Neuropsychology, 21, 245–250.

Vickery, C. D., Berry, D. T. R., Dearth, C. S., Vagnini, V. L., Baser, R. E., Cragar, D. E., & Orey, S. A. (2004). Head injury and the ability to feign neuropsychological deficits. Archives of Clinical Neuropsychology, 19, 37–48.

Victor, T. L., Boone, K. B., Serpa, J. G., Buehler, J., & Ziegler, E. A. (2009). Interpreting the meaning of multiple symptom validity test failure. Clinical Neuropsychologist, 23, 297–313.

Willcutt, E. G., Pennington, B. F., Boada, R., Ogline, J. S., Tunick, R. A., Chhabildas, N. A., & Olson, R. K. (2001). A comparison of the cognitive deficits in reading disability and attention-deficit/hyperactivity disorder. Journal of Abnormal Psychology, 110, 157–172.
