
Psychophysiological and behavioral measures for detecting concealed information: The role of memory for crime details

GALIT NAHARI (a) AND GERSHON BEN-SHAKHAR (b)

(a) Department of Criminology, Bar-Ilan University, Ramat Gan, Israel
(b) Department of Psychology, The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract

This study examined the role of memory for crime details in detecting concealed information using the electrodermal

measure, Symptom Validity Test, and Number Guessing Test. Participants were randomly assigned to three groups:

guilty, who committed a mock theft; informed-innocents, who were exposed to crime-relevant items; and uninformed-

innocents, who had no crime-relevant information. Participants were tested immediately or 1 week later. Results

showed (a) all tests detected the guilty in the immediate condition, and combining the tests improved detection

efficiency; (b) tests’ efficiency declined in the delayed condition, mainly for peripheral details; (c) no distinction between

guilty and informed innocents was possible in the immediate condition, yet some distinction emerged in the delayed condition.

These findings suggest that, while time delay may somewhat reduce the ability to detect the guilty, it also diminishes the

danger of accusing informed-innocents.

Descriptors: Concealed Information Test, Symptom Validity Test, Skin conductance response, Memory

Scientists and forensic experts have attempted for many years to

develop instruments and methods for the purpose of detecting

deception (e.g., Vrij, 2008). One notable approach, which has

spawned several methods over the past century, is the use of

psychophysiological responses (see, e.g., Ben-Shakhar & Fu-

redy, 1990; Marston, 1917; Raskin, 1989; Reid & Inbau, 1977).

In this study, we focus on just one of the two prominent methods

of psychophysiological detection, known as the Guilty Knowl-

edge Test (GKT) or the Concealed Information Test (CIT). This

method, which is designed to detect concealed knowledge, rather

than deception, is based on sound theoretical principles and

proper controls and therefore satisfies the necessary requirements

of an objective test (see Ben-Shakhar, Bar-Hillel, & Kremnitzer,

2002; Ben-Shakhar & Elaad, 2002a; Lykken, 1974, 1998).

The CIT (Lykken, 1959, 1960) utilizes a series of multiple-

choice questions, each having one relevant alternative (e.g., a

feature of the crime under investigation) and several neutral

(control) alternatives, chosen so that an unknowledgeable (in-

nocent) suspect would not be able to discriminate them from the

relevant alternative (Lykken, 1998). These relevant items are

significant only for knowledgeable (guilty) individuals and, thus,

if the suspect’s physiological responses to the relevant alternative

are consistently larger than to the neutral alternatives, knowledge

about the event (e.g., crime) is inferred. As long as information

about the event has not leaked out and assuming that each al-

ternative appears equally plausible to an individual with no guilty

knowledge, the probability that an innocent suspect would pro-

duce consistently larger responses to the relevant than to the

neutral alternatives depends only on the number of questions and

the number of alternative answers per question, and hence it can

be controlled such that maximal protection for the innocent is

provided.
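The dependence of the false-positive risk on the number of questions and alternatives can be illustrated with a short calculation. The following is a minimal sketch, not the authors' scoring procedure: it assumes that, for an uninformed examinee, each alternative is equally likely to elicit the largest response, and it uses a hypothetical decision rule ("largest response to the relevant item on at least `min_hits` questions"). The function and parameter names are illustrative only.

```python
from math import comb


def false_positive_probability(n_questions, n_alternatives, min_hits):
    """Chance that an uninformed examinee responds maximally to the
    relevant alternative on at least `min_hits` of `n_questions`.

    Assumes independent questions and equally plausible alternatives,
    so the per-question hit probability is 1 / n_alternatives.
    The "at least min_hits" rule is a hypothetical decision rule.
    """
    p = 1.0 / n_alternatives
    return sum(
        comb(n_questions, k) * p**k * (1 - p) ** (n_questions - k)
        for k in range(min_hits, n_questions + 1)
    )


if __name__ == "__main__":
    # More questions or more alternatives shrink the risk for a fixed rule.
    for n in (5, 10):
        print(n, false_positive_probability(n, 5, n))
```

Under these assumptions, adding questions or alternatives lowers the probability that chance alone produces an incriminating pattern, which is the sense in which the protection for innocent examinees can be controlled.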

Extensive research conducted since the early 1960s has dem-

onstrated that the CIT can be successfully used for detecting

relevant information and discriminating between knowledgeable

(guilty) and innocent individuals (e.g., Ben-Shakhar & Furedy,

1990; Ben-Shakhar & Elaad, 2003; Elaad, 1998; Lykken, 1959,

1960, 1998). In the last decade, the interest in the CIT seems to be

growing, and various studies examining the mechanisms under-

lying this method, as well as applied questions related to its pos-

sible use as an aid in criminal investigations, have been published

(e.g., Gamer, Bauermann, Stoeter, & Vossel, 2007; Gamer &

Berti, 2010; Langleben et al., 2005; Rosenfeld et al., 2008; Rosenfeld,

Shue, & Singer, 2007; Verschuere, Crombez, De Clercq, &

Koster, 2004; Verschuere, Crombez, & Koster, 2004).

However, in spite of the extensive research conducted on the CIT

and its impressive validity estimates, the method has been applied

extensively only in Japan (see Nakayama, 2002; Osugi, 2010). Many

possible accounts have been offered to explain this gap between

research and practice (e.g., Iacono, 2010; Krapohl, 2010; Podlesny,


This research was funded by grants from the Israel Science Foundation to Gershon Ben-Shakhar. We thank Keren Maoz, Assaf Breska, and Tamar Pelet for their assistance in this research and Ewout Meijer for his helpful comments.

Address correspondence to: Galit Nahari, Department of Criminology, Bar-Ilan University, Ramat Gan, 52900, Israel. E-mail: [email protected]

Psychophysiology, ]]] (2010), 1–12. Wiley Periodicals, Inc. Printed in the USA. Copyright © 2010 Society for Psychophysiological Research. DOI: 10.1111/j.1469-8986.2010.01148.x


1993), but one notable limitation of the bulk of CIT research conducted so far is its questionable external validity. Estimates of CIT validity were based almost exclusively on mock-crime studies, which differ in many important respects from real polygraph examinations. Roughly, these differences can be classified into two major categories: (a) motivational-emotional factors, related to the differences between committing a real crime and following the instructions of an experimenter to commit a mock crime, as well as differences related to the possible consequences of an incriminating polygraph test compared with failing a laboratory CIT (which typically means that the participant will not receive a bonus of a few dollars); and (b) cognitive factors, related to processing the critical information during the crime and the ability to remember this information during the test.

As the present study focuses only on the second category of

cognitive factors, only this category will be elaborated. In the

typical mock-crime experiment, it is guaranteed that all subjects

learn all the relevant items (e.g., six features of the mock crime,

such as the color of an envelope stolen and the amount of money

it contained). Furthermore, subjects are typically tested imme-

diately after being exposed to this critical information, thus

memory does not play a role in the experimental situation. In real

life, things are typically entirely different. The guilty person is

faced with a complex scene, and it cannot be assumed that all

details were indeed noticed, processed, and stored in memory.

Criminal suspects are very rarely tested immediately after

committing the criminal act. In most cases they are tested

days, weeks, and sometimes months after the crime was com-

mitted (see Ben-Shakhar & Furedy, 1990; Carmel, Dayan,

Naveh, Raveh, & Ben-Shakhar, 2003).

Carmel et al. (2003) were the first to systematically examine

these cognitive aspects of the external validity of CIT experiments

by comparing the standard mock crime procedure with a more

realistic type of mock crime and by comparing immediate and

delayed CITs. The results of this study revealed that the ‘‘real-

istic’’ mock-crime was associated with overall lower recall rates

and weaker detection efficiency than the standard procedure.

However, these effects were mediated by the type of CIT ques-

tions used, such that the decline in memory and detection effi-

ciency was observed mainly for peripheral items that were not

directly related to the mock crime (e.g., a picture on the wall), but

not for items that were central to the event (e.g., the amount of

money stolen). The results further indicated that a CIT based

exclusively on the central items was unaffected by the type of

mock-crime procedure. More recently, Gamer, Kosiol and

Vossel (2010) also demonstrated that central items, but not pe-

ripheral ones, are recalled after a 2-week period. Thus, these

studies imply that a careful selection of central items (e.g., modus

operandi, type of weapon used) can produce high accuracy levels,

not only in the artificial laboratory conditions, but also in more

realistic settings.

Another potential limitation of the CIT is the possibility that,

in actual criminal cases, some critical information may leak out

to innocent suspects. Leakage of information to unaware sus-

pects may lead to enhanced responses to these items and even-

tually to a misclassification of the informed innocent suspects as

guilty (e.g., Bradley, Barefoot, & Arsenault, 2010). Several stud-

ies examined the effects of exposing the critical information to

‘‘innocent’’ subjects in mock-crime experiments (e.g., Ben-

Shakhar, Gronau, & Elaad, 1999; Bradley, MacLaren, & Carle,

1997; Bradley & Rettinger, 1992; Bradley & Warfield, 1984)

and generally demonstrated that, although informed innocent

subjects showed smaller responses to the critical items than guilty

subjects, they did show significantly larger responses to these

items when compared with uninformed innocent subjects. Bradley

and his colleagues (e.g., Bradley & Warfield, 1984; Bradley et

al., 1997) proposed a method, labeled the Guilty Action Test

(GAT), in which subjects are asked about their actions rather

than their knowledge. Bradley et al. (1997) demonstrated that,

while the GAT was associated with a smaller rate of false positive

outcomes in informed innocents than the standard version of the

CIT, it still produced a much larger rate of false positive outcomes

in informed innocents compared with uninformed innocents.

Recently, Gamer, Verschuere, Crombez, and Vossel (2008)

used the GAT and compared ‘‘guilty’’ subjects with ‘‘informed

innocents’’ both when tested immediately after committing a

mock crime and when tested 2 weeks later. They found that,

while ‘‘guilty’’ subjects tended to forget only the peripheral items

during this 2-week period, the informed innocents forgot all

items. Consequently, detection of guilty subjects remained stable

(i.e., the areas under the Receiver Operating Characteristic

(ROC) were 0.89 and 0.90 in the immediate and delayed con-

ditions, respectively), whereas erroneous detection of informed

innocents was significantly reduced in the delayed condition (the

ROC areas were 0.95 and 0.75 in the immediate and delayed

conditions, respectively).

The purpose of the present study is to continue and extend the

line of research initiated by Carmel et al. (2003) and Gamer et al.

(2010). Specifically, we used the more realistic type of mock

crime proposed by Carmel et al. (2003) and a 3 × 2 between-

subjects design with guilt (‘‘guilty,’’ ‘‘informed innocents,’’ and

‘‘uninformed innocents’’) and time of testing (immediate vs. de-

layed by 1 week) as the two orthogonal factors. Furthermore, in

addition to measuring skin conductance, which has been dem-

onstrated as the most efficient autonomic measure in CIT re-

search (e.g., Gamer, Verschuere, Crombez, & Vossel, 2008), we

examined two behavioral measures that have been rarely applied

for detecting concealed information.

Both of these measures are based on asking examinees, who

deny knowledge of some critical items, to guess these items.

Effective concealment is possible when guessing is random (i.e.,

where the critical alternative is guessed with the same probability

as all other alternatives), but producing random guesses may be

very difficult for those who are actually aware of the true alter-

natives. Consequently, the outcome of multiple guessing at-

tempts may differentiate knowledgeable (who would not be able

to produce random guessing) and unknowledgeable examinees

(whose guesses will be random). Specifically, we adopted the

Symptom Validity Test (SVT), which is a forced-choice self-re-

port test (with two alternative answers for each question) that has

been used to detect malingering in various contexts (e.g., Me-

rckelbach, Hauer, & Rassin, 2002; Pankratz, Fausti, & Peed,

1975; Verschuere, Meijer, & Crombez, 2008). The SVT may be a

promising tool for detecting concealed information because it is

based on an entirely different rationale than the physiological

measures and thus may add non-redundant information. Re-

cently, Meijer, Smulders, Johnston and Merckelbach (2007)

demonstrated that the SVT can be a valuable tool for detect-

ing concealed knowledge and, at least in some conditions, it

can increase the validity of CITs based on skin conductance re-

sponse (SCR).

The second measure adopted in this study was derived from

the Number Guessing Test (NGT) proposed by Lieblich and

Ninio (1972) and by Lieblich, Shaham, and Ninio (1976). It is


based on a similar rationale to the SVT, but relies on guessing

values of continuous variables, rather than guessing which of two

possible alternatives is the correct one. Specifically, this method

utilizes several numerical items (e.g., the house number where a

crime was committed, the day of the month when the event oc-

curred). As in the SVT, examinees are asked to guess the correct

value of each item, and the detection measure is based on the

correlation between the true profile and the profile guessed by

each examinee. It is expected that knowledgeable examinees will

produce correlations of larger absolute value (either positive or negative) than unknowledgeable

individuals.

In the present experiment, we examine the utility of the three

detection measures (SCRs based on the CIT, SVT, and NGT) in

differentiating between ‘‘guilty,’’ ‘‘informed innocents,’’ and

‘‘uninformed innocents’’ both when examined immediately after

committing a mock crime and 1 week later. In addition, we ex-

amine whether memory of the critical items and detection effi-

ciency depend on the type of items used (central vs. peripheral).

Methods

Participants

One hundred and twenty Hebrew University of Jerusalem undergraduate students (86 females and 34 males) participated in the experiment for course credit or payment (they received 40 New Israeli Shekels (NIS), which is equivalent to US$10.50). Their mean age was 24.06 (SD = 3.22) years. Participants were recruited through ads placed on notice boards throughout the campus. All participants signed a consent form indicating that participation was voluntary and that they could withdraw from the experiment at any time without penalty. Eleven participants were eliminated due to unusually high skin resistance levels or excessive movements during the experiment, and eight additional participants were eliminated because they did not commit the mock crime or failed to show up for the second part of the experiment. These participants were replaced, so the total number of participants remained at 120.

Apparatus

Skin conductance was measured by a constant voltage system (0.5 V; Atlas Researches, Hod Hasharon, Israel). Two Ag/AgCl electrodes (0.8-cm diameter) were used with a 0.05 M NaCl electrolyte. The experiment was conducted in an air-conditioned laboratory, and an NEC CF-500 computer was used to control the stimulus presentation and compute skin conductance changes. The stimuli were displayed on the computer monitor.

Design

A 3 × 2 between-participants design was used, with the following two orthogonal factors: (a) group: participants either performed a mock crime (“guilty” condition), did not perform the mock crime but were informed about the relevant details (“informed-innocent” condition), or did not perform the mock crime and had no knowledge of the relevant details (“uninformed-innocent” condition); and (b) time of test: immediately after the first stage of the experiment (see below; immediate condition) or after 1 week (delayed condition). The participants were randomly allocated to the six conditions created by this design, with 20 participants in each condition.
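The balanced random allocation described above can be sketched as follows. This is an illustrative reconstruction only: the paper does not report how the randomization was implemented, and the function name, condition labels, and seed handling are assumptions.

```python
import random
from itertools import product

# Condition labels are illustrative; the design itself (3 groups x 2 times,
# 20 participants per cell) is taken from the text.
GROUPS = ("guilty", "informed-innocent", "uninformed-innocent")
TIMES = ("immediate", "delayed")


def allocate(participant_ids, per_cell=20, seed=None):
    """Randomly assign participants to the six cells with equal cell sizes."""
    cells = [cell for cell in product(GROUPS, TIMES) for _ in range(per_cell)]
    if len(participant_ids) != len(cells):
        raise ValueError("number of participants must equal 6 * per_cell")
    rng = random.Random(seed)
    rng.shuffle(cells)  # shuffle the slots, then pair them with participants
    return dict(zip(participant_ids, cells))


if __name__ == "__main__":
    allocation = allocate(list(range(1, 121)), seed=2010)
    print(allocation[1])
```

Shuffling a fixed list of cell slots, rather than drawing a condition independently for each participant, guarantees exactly 20 participants per condition.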

Procedure

The experiment was conducted in two stages:

Stage 1

Participants arrived at the laboratory individually at the pre-

determined time. They were met by an assistant who read out

loud the instructions appropriate for their particular condition.

Guilty participants. These participants were instructed to go to the office of a staff member and ask for a particular numbered article. They were told that, if the staff member was not in his office, they should open the office using a key that had been handed to them in advance, enter the room, and find the particular article in a pile of numbered articles placed on the desk. In addition, they were requested to take advantage of the situation and steal an envelope containing money and a jewel, to hide it in a mail box that was indicated to them, and then to enter the laboratory and hand over the requested article to the assistant. In fact, the staff member was never in the office, and thus all participants in this experimental condition were able to steal the envelope. Upon arrival at the designated office, participants faced a locked door with the name of the staff member and the office number typed on it. They opened the office using the key and, when they entered, they saw that the light was turned on. On the desk, they found a pile of numbered articles with a newspaper on top of it. Beside the pile were a family photo and a soft drink bottle. They found the requested article and looked for the envelope in the room. The envelope was located in the first drawer of a cabinet. It was a colored envelope with a date on it. The envelope was open and contained Euro bills and a jewel. A note with a name was attached to the bills by a paper clip. After checking its contents, the participants stole the envelope, dropped it in the mail box, and returned to the lab with the requested article.

A total of six profiles of items were used in the CIT. Each of

these profiles was composed of 11 items, described in Table 1.


Table 1. Profiles of Items Used in the Experiments

Profile  Envelope color*  Name on note  Victim’s family name*  Soft drink     Newspaper        Jewel*        Article number*  Office number*  Sum of Euros*  Date  Sex of victim
Buffer   Yellow           Lisa          Topaz                  Mineral water  Hazofe           Brooch        6                5               26             15    –
a        Green            Marsha        Koren                  Sprite         Haaretz          Earrings      27               15              8              11    Male
b        Orange           Lora          Morag                  Ice-tea        Yediot aharonot  Ring          15               10              22             26    Female
c        Blue             Susan         Carmel                 Coca-cola      Maariv           Necklace      19               25              6              28    Male
d        Red              Judy          Marom                  Orange juice   Israel hayom     Neck pendant  22               20              14             6     Female
e        Purple           Ashlee        Zamir                  Soda water     Globes           Bracelet      12               30              4              19    Female

Note: ‘Sex of victim’ is the only item among the 11 items that did not appear in the CIT, but only on the SVT. Thus, it does not have a buffer profile.
*These items were classified as central.


One profile, the buffer profile, was used only in the interrogation

phase of the experiment, and was never used as the relevant

profile. One of the other five profiles of items (a–e) was randomly

chosen as the relevant profile for each participant, such that each

profile served as the relevant profile for 20% of the participants.

Eight additional relevant items, which were identical for all par-

ticipants, were used in this experiment for the SVT. These items

are described in Table 2.

Informed-innocent participants. These participants read an article entitled ‘‘A

Scandal: Theft in the Campus.’’ The article described the mock-

crime and included all the relevant details (according to the par-

ticular profile assigned to the participant). To give the impression

that the article was real, it was embedded among other articles in

a student newspaper. Participants were not asked to memorize

the details in order to preserve the more realistic nature of the

manipulation (see Carmel et al., 2003). After reading the article,

as a control assignment, the participants were requested to go to

the teaching assistants’ mail boxes, where they found a short

questionnaire dealing with personal hobbies and interests. They

were requested to fill out the questionnaire and drop it into an-

other mail box that was pointed out to them (the same mail box

in which the guilty participants were instructed to put the stolen

envelope).

Uninformed-innocent participants. These participants were exposed to the same

procedure as the informed-innocent participants, except that the

article they read did not reveal any of the relevant details.

Stage 2

CIT, SVT, and NGT were administered to all participants.

Participants in the immediate condition took the tests immediately

after stage 1, and those in the delayed condition took them 1

week later. An experimenter, who was unaware of the exper-

imental condition to which the examinee was assigned, informed

the participants that a theft was committed in the Psychology

Department, and that they are suspects in committing this theft.

He/she explained that the experiment was designed to test

whether they could cope with lie detection tests and convince the

examiner that they are innocent of stealing the money and jewel.

It was emphasized that beating these tests is a difficult assignment

that only few people can succeed in, and they were promised a

bonus of 10 NIS (about $2.50) for a successful performance of

the task. Subsequently, the participant was attached to the electrodes,

and the CIT examination was conducted. The CIT questions

were presented after an initial rest period of 2 min, during

which skin conductance baseline was recorded. All examinees

were presented with ten different questions, each targeting a

different relevant detail of the mock crime (the envelope color,

the name of the person written on the note, the family name of

the office owner, the brand of soft drink, the name of the news-

paper, the type of jewel, the number of the requested article, the

sum of money, the office’s number, and the date written on the

envelope). The questions were simultaneously presented on the

computer monitor and heard through the computer speakers.

Each question was followed by a buffer item, designed to absorb

the initial orienting response, and a set of five items (the relevant

item and four neutral control items). The order of the questions

as well as the order of the five items within each question was

randomized. Each question was presented for 10 s, and each item

(alternative answer) was presented for 5 s. The inter-stimulus

interval (blank screen) ranged randomly from 16 to 24 s with a

mean of 20 s. Participants were asked to respond verbally, saying

‘‘no’’ to every item. A short, participant-terminated break was

given after presentation of five questions.
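The presentation scheme just described (randomized question order, a buffer item first, then the relevant item and four neutral items in random order, with a variable inter-stimulus interval) can be sketched as below. This is a hedged illustration rather than the software used in the study: the dictionary layout and function name are hypothetical, and a uniform distribution over 16 to 24 s is an assumption, since the text reports only the range and the 20-s mean.

```python
import random


def build_cit_schedule(questions, seed=None):
    """Return a flat list of item presentations for one CIT session.

    `questions` maps a question label to a dict with keys
    "buffer", "relevant", and "neutral" (a list of four control items).
    """
    rng = random.Random(seed)
    order = list(questions)
    rng.shuffle(order)  # question order is randomized
    schedule = []
    for label in order:
        q = questions[label]
        items = [q["relevant"], *q["neutral"]]
        rng.shuffle(items)  # relevant and neutral items in random order
        for item in [q["buffer"], *items]:  # buffer always comes first
            schedule.append({
                "question": label,
                "item": item,
                "is_relevant": item == q["relevant"],
                "item_duration_s": 5,
                "isi_s": rng.uniform(16, 24),  # assumed uniform, mean 20 s
            })
    return schedule


if __name__ == "__main__":
    # Envelope-color items from Table 1, with profile "a" as the relevant profile.
    demo = {"envelope color": {
        "buffer": "Yellow", "relevant": "Green",
        "neutral": ["Orange", "Blue", "Red", "Purple"],
    }}
    for trial in build_cit_schedule(demo, seed=1):
        print(trial["item"], round(trial["isi_s"], 1))
```

With ten questions of six items each, a schedule built this way contains the 60 item presentations that the scoring section refers to.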

Upon completion of the CIT, participants were detached

from the electrodes and performed the SVT and NGT on a personal

computer. The SVT consisted of 15 questions, each with 2 alternative

answers: the relevant detail (correct answer) and a

non-relevant detail (wrong answer). Six of the SVT questions

resembled those of the CIT (the envelope color, the name of the

person that was written on the note, the family name of the

office’s owner, the brand of soft drink, name of the newspaper,

and type of jewel). For each of these 6 questions, the alternative

to the correct answer was chosen randomly from among the 4

control items, included in the CIT. The other 9 questions ap-

peared only on the SVT, 8 of them had a fixed alternative answer,

while for the 9th (the victim’s gender), the answer depended on

the specific profile. These questions along with the correct and

incorrect alternative answers are displayed in Table 2.

The questions appeared on the screen, one at a time, with the

two alternative answers. The participants were instructed as

follows: ‘‘Please choose one alternative answer each time and if

you do not know the answer, just guess it!’’ Participants were not

aware of the length of the test, and thus would have had difficulty

adjusting their performance in accordance with chance. The

NGT consisted of 4 open questions referring to numerical rel-

evant details, which were included in the CIT (the number of the

requested article, sum of money, the office’s number, and the

date that was written on the envelope). The participants were

informed that answers should be within the range of 1 to 30, and

were instructed as follows: ‘‘Please type your answer by using the

keyboard and if you do not know the answer, just guess it.’’

Before each test, the experimenter indicated to the participants

that the correct answers were known only to the thief. Following

Carmel et al. (2003) and Gamer et al. (2010), the 19 questions

included in this experiment were classified as either central

(questions directly related to the execution of the mock crime) or

peripheral (questions related to items that were present in the

crime scene, but were unrelated to its execution). Tables 1 and 2

specify for each question whether it was classified as central

or peripheral.

At the end of the questioning session, the experimenter

thanked the participants, and asked them to wait until the com-

puter program processed the data of the tests and reached a

decision as to whether they were found ‘‘guilty’’ or ‘‘innocent.’’



Table 2. Items Included Only in the SVT

Question            Currency*  Victim’s title*  Envelope’s condition*  Family photo  Position of newspaper*  Drawer*  Glasses in drawer  Light in office*  Victim’s gender
Correct answer      Euro       Dr.              Open                   Present       On top of the pile      First    Absent             On                See Table 1
Alternative answer  Dollar     Professor        Closed                 Absent        Not on top              Second   Present            Off               See Table 1

*These items were classified as central.


The processing took 1 min, and subsequently two memory tests

were administered to examine whether participants recalled the

relevant items: the first was a recall memory test consisting of the

19 questions used in the three detection tests, and the second was

a recognition memory test in which participants were requested

to choose the correct alternative on a copy of the SVT and NGT

which, together, covered all the 19 questions that were used.

Guilty and informed-innocent participants responded to both

recall and recognition memory tests, while uninformed-innocents

responded only to the recognition memory test. All participants

were asked to attempt to recall or recognize the relevant details

and guess only when they did not know the answer. Level of confidence

was rated for each answer on a 6-point scale ranging from

1 (not confident at all) to 6 (very confident). In addition,

participants filled out a questionnaire regarding their performance

in the experiment. Specifically, they were asked about

their motivation to beat the tests, whether or not they used a

strategy during the tests, etc. Finally, all participants were de-

briefed and compensated.

Scoring of the Dependent Measures

SCR

Responses were transmitted in real time to the computer.

SCR was defined as the maximal increase in conductance ob-

tained from the examinee, from 1 s to 5 s after stimulus onset and

computed using an A/D (NB-MIO-16) converter with a sam-

pling rate of 50 Hz. To eliminate individual differences in re-

sponsivity and permit meaningful comparisons of the responses

of different examinees, each participant’s SCR was transformed

into within-examinee standard scores (Ben-Shakhar, 1985). To

minimize habituation effects, within-block standard scores were

used (see Ben-Shakhar & Elaad, 2002b; Elaad & Ben-Shakhar,

1997). The 60 items (see Table 1 for a description of the 10 ques-

tions used in the CIT, with 6 alternative items for each question)

were divided into 2 blocks, each consisting of 30 items. Thus, the

z scores used in this study were computed relative to the mean

and standard deviation of the participant’s responses to the 30

items of each block. Finally, two detection scores were computed

for each participant (one for each item-type category) by aver-

aging the standardized SCRs elicited by the critical items within

each item-type category.
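The within-block standardization and averaging steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the trial dictionaries and their field names are hypothetical, the population standard deviation is assumed (the text does not say which estimator was used), and a block with zero variance is arbitrarily given z scores of 0.

```python
from statistics import mean, pstdev


def detection_scores(trials):
    """Average within-block z scores of the relevant items, per item type.

    Each trial is a dict with keys "block", "scr", "is_relevant", and
    "item_type" ("central" or "peripheral"); these names are illustrative.
    """
    # Mean and SD are computed over all items of a block (30 per block in the study).
    by_block = {}
    for t in trials:
        by_block.setdefault(t["block"], []).append(t["scr"])
    stats = {b: (mean(v), pstdev(v)) for b, v in by_block.items()}

    z_by_type = {}
    for t in trials:
        if not t["is_relevant"]:
            continue
        m, sd = stats[t["block"]]
        z = (t["scr"] - m) / sd if sd > 0 else 0.0  # assumption for flat blocks
        z_by_type.setdefault(t["item_type"], []).append(z)
    return {item_type: mean(zs) for item_type, zs in z_by_type.items()}
```

A positive score indicates that, relative to the examinee's own responsivity within each block, the critical items of that category elicited larger responses than the average item.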

SVT

An unknowledgeable individual (uninformed innocent) is expected to guess the answers on the SVT and thus give about 50% correct answers (chance level). It is hypothesized that a person who is aware of the critical items will be unable to ignore this information when answering the SVT and will consequently deviate from chance-level performance. Although it is reasonable to assume that individuals attempting to conceal critical items will display below-chance performance on the SVT (e.g., Verschuere et al., 2008), we defined a detection measure based on the SVT as the absolute deviation of the percent of correct answers from chance level (50%). Specifically, this measure was defined as |P − 50%|, where P is the percent of the participant’s correct answers. We used this measure because in some cases knowledgeable individuals may use their knowledge to guess above chance level.
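The SVT measure can be computed in a few lines. The sketch below follows the definition |P − 50%| given in the text; the function name and the input format (one Boolean per question) are assumptions made for illustration.

```python
def svt_detection_score(answers_correct):
    """Absolute deviation of percent correct from chance (50%).

    `answers_correct` holds one True/False value per SVT question.
    """
    if not answers_correct:
        raise ValueError("at least one answer is required")
    percent_correct = 100.0 * sum(answers_correct) / len(answers_correct)
    return abs(percent_correct - 50.0)


if __name__ == "__main__":
    # 3 of 15 correct (20%) and 12 of 15 correct (80%) deviate equally from chance.
    print(svt_detection_score([True] * 3 + [False] * 12))
    print(svt_detection_score([True] * 12 + [False] * 3))
```

Because the deviation is taken in absolute value, systematic avoidance of the correct answers and systematic selection of them yield the same score.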

NGT

The NGT-based detection measure was defined as

the absolute value of the Pearson correlation coefficient

between the actual values and the values guessed by the

participant.1
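A small sketch of the NGT score follows, using only the definition above (absolute Pearson correlation between true and guessed values). The function name is hypothetical, and returning 0.0 when either profile has no variance (for example, four identical guesses) is an assumption, because the correlation is undefined in that case and the text does not say how such cases were handled.

```python
from math import sqrt


def ngt_detection_score(true_values, guessed_values):
    """Absolute Pearson correlation between true and guessed numbers."""
    if len(true_values) != len(guessed_values) or len(true_values) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(true_values)
    mean_t = sum(true_values) / n
    mean_g = sum(guessed_values) / n
    cov = sum((t - mean_t) * (g - mean_g)
              for t, g in zip(true_values, guessed_values))
    var_t = sum((t - mean_t) ** 2 for t in true_values)
    var_g = sum((g - mean_g) ** 2 for g in guessed_values)
    if var_t == 0 or var_g == 0:
        return 0.0  # assumption: undefined correlation scored as 0
    return abs(cov / sqrt(var_t * var_g))


if __name__ == "__main__":
    # Numerical items of profile "a" in Table 1: article, office, Euros, date.
    truth = [27, 15, 8, 11]
    print(ngt_detection_score(truth, [30 - v for v in truth]))
```

An examinee who deliberately steers away from the true numbers in a systematic way produces a strong negative correlation, which the absolute value treats the same as a strong positive one.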

Data Analysis and Statistics

Each dependent measure (rates of correctly recognized items and the three detection scores constructed for SCR, SVT, and NGT²) was subjected to a mixed 2 × 3 × 2 analysis of variance (ANOVA), with item type (central vs. peripheral) serving as a within-subjects factor and group (“guilty,” “informed innocents,” and “uninformed innocents”) and time of CIT (immediate vs. delayed) serving as the two between-subjects factors. This was followed by two sets of orthogonal planned contrasts. The first, which was designed to examine more closely the effect of the item-type factor and its interaction with the other factors by excluding the “uninformed innocents” (for whom no item-type effect is expected), included the following contrasts: (1) the dependent measure difference between central and peripheral items among “guilty” participants was compared with the respective difference among “informed innocents” (i.e., examining the item-type × group interaction, excluding “uninformed innocents”); (2) the dependent measure difference between central and peripheral items in the immediate condition was compared with the respective difference in the delayed condition (i.e., examining the item-type × time of testing interaction, excluding “uninformed innocents”); and (3) a contrast examining whether the item-type differences reflect a group × time interaction (i.e., whether the item-type differences among “guilty” participants are less affected by delaying the test than the respective differences among “informed innocents”).

The second set of contrasts, which was designed to examine more closely the effects of the between-subjects factors, included the following four planned contrasts: (1) Combined ‘‘guilty’’ and ‘‘informed innocents’’ (knowledgeable participants) were compared with the ‘‘uninformed innocents’’; (2) ‘‘Guilty’’ were compared with ‘‘informed innocents’’; (3) The time effect (defined as the dependent variable difference between the immediate and the delayed conditions) among knowledgeable participants was compared with the time effect among ‘‘uninformed innocents’’; and (4) The time effect among ‘‘guilty’’ participants was compared with the time effect among ‘‘informed innocents.’’ A rejection region of p < .05 was used for all statistical tests, and effect size estimates were computed using Cohen’s f (Cohen, 1988). One-tailed tests were used to test directional, a priori formulated hypotheses.
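One way to make the second set of contrasts concrete is to write each as a weight vector over the six group × time cells and check that the vectors are mutually orthogonal. The Python sketch below does this under the simplifying assumption of equal cell sizes. The particular weights, the cell ordering, and the cell means are illustrative choices, not values taken from the study.

```python
from itertools import combinations

# Cell order: (guilty, informed, uninformed) x (immediate, delayed).
CELLS = [
    ("guilty", "immediate"), ("guilty", "delayed"),
    ("informed", "immediate"), ("informed", "delayed"),
    ("uninformed", "immediate"), ("uninformed", "delayed"),
]

# Illustrative weights for the four between-subjects contrasts.
CONTRASTS = {
    "knowledgeable_vs_uninformed": [1, 1, 1, 1, -2, -2],
    "guilty_vs_informed": [1, 1, -1, -1, 0, 0],
    "time_effect_knowledgeable_vs_uninformed": [1, -1, 1, -1, -2, 2],
    "time_effect_guilty_vs_informed": [1, -1, -1, 1, 0, 0],
}


def dot(u, v):
    """Sum of element-wise products of two weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_orthogonal_set(contrasts):
    """True if every contrast sums to zero and all pairs have zero dot product."""
    vectors = list(contrasts.values())
    sums_to_zero = all(sum(v) == 0 for v in vectors)
    pairwise = all(dot(u, v) == 0 for u, v in combinations(vectors, 2))
    return sums_to_zero and pairwise


def contrast_value(weights, cell_means):
    """Weighted sum of cell means for one contrast."""
    return dot(weights, cell_means)


# Hypothetical cell means in the order given by CELLS.
example_means = [0.9, 0.8, 0.9, 0.6, 0.5, 0.5]
differential_time_effect = contrast_value(
    CONTRASTS["time_effect_guilty_vs_informed"], example_means
)
```

In this hypothetical example the fourth contrast is negative, which corresponds to a larger decline over time among the informed innocents than among the guilty.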

Results

Memory Tests

As the pattern of results in the recall and recognition tests

was essentially similar, only the results of the recognition tests



¹ We used a slightly different measure than the one employed by Lieblich and Ninio (1972) and Lieblich et al. (1976). They transformed each negative correlation into the absolute value of the observed correlation plus one. This measure was inefficient because many uninformed innocents produced negative correlations (which were expected when participants are guessing with no prior knowledge) and adding 1 to the absolute value of these correlations inflated the detection measure among these participants and resulted in a high rate of false positives.

² As the NGT is based on a limited number of numerical items, it was impossible to compare central and peripheral items in this context, and thus the item-type factor was not included in the NGT analysis. Thus, a 3 × 2 between-subjects ANOVA, with group and time as the two orthogonal factors, was conducted.


are presented. The recognition results of four participants were lost, and thus the following analyses are based on 116 participants. The mean rates of correctly recognized items were computed across the 12 central and the 7 peripheral items, and they are displayed in Figure 1 as a function of experimental condition. The ANOVA conducted on the recognition rates yielded the following outcomes: The results for the within-subjects factor revealed a statistically significant interaction between item-type and group (F(2,110) = 12.40, f = 0.31, p < .05). The main effect of item-type was not statistically significant (F(1,110) = 2.23, f = 0.07), mainly because item-type differences are neither expected nor observed in the ‘‘uninformed innocent’’ condition. The interaction of item-type with time of testing as well as the triple interaction produced very small and non-significant effects (F < 1 in both tests).

The three orthogonal contrasts conducted after excluding the ‘‘uninformed innocents’’ revealed that, consistent with our hypothesis, the advantage of central over peripheral items (i.e., higher recognition rates) was significantly more pronounced among ‘‘guilty’’ than among ‘‘informed innocent’’ participants (t(110) = 3.49, f = 0.27, p < .001). However, in contrast to our hypothesis, the advantage of central over peripheral items was not more pronounced in the delayed than in the immediate test (t(110) = 0.17). Finally, the item-type differences did not reflect a group × time interaction (t(110) = 1.87, f = 0.10).

The analysis of the between-subjects factors revealed statistically significant results for both main effects (F(2,110) = 78.84, f = 1.16, p < .001 for the group factor and F(1,110) = 17.99, f = 0.38, p < .001 for the time of testing factor). The interaction between these two factors was also statistically significant (F(2,110) = 10.07, f = 0.40, p < .001). The four orthogonal contrasts conducted following this analysis revealed that: (1) Combined ‘‘guilty’’ and ‘‘informed innocents’’ (knowledgeable participants) displayed significantly higher rates of correctly recognized items than unknowledgeable participants (t(110) = 8.88, f = 0.82, p < .001). (2) The difference in the rate of correctly recognized items between the ‘‘guilty’’ and the ‘‘informed innocents’’ was not statistically significant (t(110) = 0.95). (3) As expected, the time effect (i.e., a smaller rate of correctly recognized items in the delayed than in the immediate condition) was significantly larger for knowledgeable than for unknowledgeable participants (t(110) = 4.01, f = 0.36, p < .001). (4) Similarly, a significantly larger time effect was found for ‘‘informed innocents’’ than for ‘‘guilty’’ participants (t(110) = 2.66, f = 0.23, p < .01).

SCR

The means of the SCR detection scores, computed across par-

ticipants within each experimental condition and each item-type

category, are displayed in Figure 2. These data were subjected to

the same ANOVA conducted for the recognition results. The



Figure 1. Means and Standard Errors of the rate of correctly recognized items, computed across the 12 central and 7 peripheral questions within each

experimental condition.

Figure 2. Means and Standard Errors of the Standardized SCRs to the Relevant Items, computed across the 6 central and 4 peripheral within each experimental condition.



item-type factor showed a statistically significant main effect (F(1,114) = 14.18, f = 0.23, p < .001), indicating that central items elicited larger relative SCRs than the peripheral items, but it did not show any statistically significant interactions with the other factors. However, these non-significant interactions may be due to the inclusion of the uninformed innocents, for whom neither item-type nor time of CIT should make a difference. Indeed, the three orthogonal contrasts conducted, excluding the ‘‘uninformed innocents,’’ revealed that, consistent with our hypothesis, the advantage in detection of central over peripheral items was more pronounced in the delayed CIT than in the immediate test (t(114) = 1.75, p < .05, one-tailed, f = 0.11). On the other hand, in contrast to our hypothesis, the advantage in detection of central over peripheral items was not more pronounced in the ‘‘guilty’’ than in the ‘‘informed innocents’’ (t(114) = 1.46, f = 0.08). Finally, the contrast examining whether the item-type differences reflect a group × time interaction did not yield a statistically significant result (t(114) = 0.49).

The analysis of the between-subjects factors revealed a statistically significant group effect (F(2,114) = 14.59, f = 0.48, p < .001) and a smaller time effect (F(1,114) = 3.56, p < .05, one-tailed, f = 0.15), reflecting larger relative SCRs in the immediate than in the delayed condition. The group factor also showed a statistically significant interaction with time (F(2,114) = 4.49, f = 0.24, p < .05). This interaction was expected, as time of CIT should affect only knowledgeable participants. The four orthogonal contrasts, conducted to examine more closely the group effect and its interaction with time, revealed that: (1) knowledgeable participants showed a significantly larger SCR detection score than non-knowledgeable participants (t(114) = 4.20, f = 0.37, p < .001); (2) ‘‘guilty’’ participants did not differ significantly from ‘‘informed innocents’’ (t(114) = 0.47); (3) the time effect (larger detection score in the immediate than in the delayed condition) was significantly larger for knowledgeable participants than for ‘‘uninformed innocents’’ (t(114) = 1.72, p < .05, one-tailed; f = 0.13); and (4) the comparison of the time effect on ‘‘guilty’’ vs. ‘‘informed innocents’’ did not yield a statistically significant outcome (t(114) = 1.02; f = 0.02).

SVT

The means of the SVT detection scores computed across participants are displayed in Figure 3 as a function of item-type and experimental condition.


The data of Figure 3 were subjected to the same analyses applied to the CIT and the recognition results. Surprisingly, central items produced a smaller SVT detection score than peripheral items, although this difference only approached significance (F(1,114) = 3.86, f = 0.11, p = .052). However, an inspection of Figure 3 reveals that this trend was due to differences between the two item-types in the uninformed innocents, who are obviously guessing and are unable to differentiate between central and peripheral items. Indeed, when the uninformed innocents were excluded, the differences between the two item-types were not significant. In addition, no statistically significant interactions between item-type and the other factors were found. The same three planned contrasts involving the item-type factor were computed as in the previous analyses, and none revealed a statistically significant outcome (t(114) = 1.26, f = 0.06 for the item-type × time interaction; t(114) = 0.45 for the item-type × group interaction; and t(114) = 1.51, f = 0.09 for the triple interaction).

The analysis of the between-subjects factors revealed that both main effects and their interaction produced statistically significant outcomes (F(2,114) = 8.58, f = 0.36, p < .001 for the group factor; F(1,114) = 3.23, p < .05, one-tailed, f = 0.14, for the time factor, reflecting a larger SVT detection score in the immediate than in the delayed condition; and F(2,114) = 3.46, f = 0.20, p < .05 for the group × time interaction, indicating that, as expected, the reduction over time in the detection measure was small in the ‘‘guilty’’ condition but much more pronounced with the ‘‘informed innocents’’). To examine these effects more closely, we conducted the same four planned contrasts computed for the CIT and recognition data. The results of these analyses were generally similar to the SCR results, indicating that, while knowledgeable participants displayed a larger average value of the SVT detection score than unknowledgeable participants (t(114) = 4.27, f = 0.38, p < .001), there were no significant differences between ‘‘informed innocents’’ and ‘‘guilty’’ participants (t(114) = 0.12). The effect of time of testing was larger for knowledgeable than for unknowledgeable participants (t(114) = 3.17, f = 0.27, p < .001) and, unlike the SCR results, it was larger for ‘‘informed innocents’’ than for ‘‘guilty’’ participants (t(114) = 2.44, f = 0.25, p < .01).

NGT

The means of the NGT detection scores computed across par-

ticipants within each condition are presented in Figure 4. These



Figure 3. Means and Standard Errors of the SVT-based detection measure, computed across the 9 central and 6 peripheral within each experimental condition.


data are based on 113 participants, as seven participants guessed the same value for all 4 questions, and thus it was impossible to compute a detection measure for them. A 3 × 2 between-subjects ANOVA, with group and time as the two orthogonal factors, was conducted on the data of Figure 4. This analysis yielded a statistically significant group × time of test interaction (F(2,107) = 3.47, f = 0.21, p < .05). To further examine the nature of this interaction and possible group differences, we conducted the same 4 planned contrasts computed for the analysis of the recognition, CIT, and SVT data. Knowledgeable participants showed significantly larger NGT detection scores than unknowledgeable participants (t(107) = 2.24, f = 0.19, p < .05), but there was no significant difference between the ‘‘guilty’’ and the ‘‘informed innocents’’ (t(107) = 1.02, f = 0.02). In addition, the reduction in the detection score over time of testing was significantly larger with knowledgeable than with unknowledgeable participants (t(107) = 2.57, f = 0.22, p < .01), but no time effect differences were found between the ‘‘guilty’’ and the ‘‘informed innocents’’ (t(107) = 0.44).

ROC Curves

An additional approach for describing and comparing detection efficiency was adopted from Signal Detection Theory (SDT; e.g., Green & Swets, 1966; Swets, Tanner, & Birdsall, 1961). This approach is particularly useful for analyzing psychophysiological as well as behavioral detection data, and it has been applied extensively in this area (e.g., Ben-Shakhar & Elaad, 2003; National Research Council, 2003). Typically, detection efficiency is defined in terms of the relationship between the detection measure and the actual guilt (or knowledge of the relevant items). In SDT terms, this is measured by a ROC curve reflecting the degree of separation between the distributions of the detection score of ‘‘guilty’’ and ‘‘innocent’’ participants. In the present experiment, there are two groups of knowledgeable participants, and the ROC for each of these groups was constructed by comparing the detection score distribution of the knowledgeable participants (either ‘‘guilty’’ or ‘‘informed innocents’’) with the respective distribution of the ‘‘uninformed innocents.’’ These ROCs were constructed within each experimental condition for each measure, based on the 12 central, the 7 peripheral, as well as all 19 items. In addition, we examined the possibility of combining the three detection measures and constructed additional ROC curves, one based on a combination of the SCR and SVT and another based on a combination of all three measures. The measures were combined by using simple averages of the standardized detection measures (each measure was first transformed into standard scores based on the entire sample, and then the three standardized measures were averaged). We did not apply optimal weights to the three detection measures to avoid the possibility of inflating detection efficiency estimates due to capitalization on chance.
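The combination rule and the area statistic described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' analysis code: the function names and scores are hypothetical, the population standard deviation is assumed for standardization, and the area is computed with the standard pairwise (Mann-Whitney) formulation, in which the area equals the probability that a randomly chosen knowledgeable participant scores higher than a randomly chosen uninformed one, with ties counted as one half.

```python
from statistics import mean, pstdev


def standardize(scores):
    """Convert raw scores to z-scores using the whole-sample mean and SD."""
    m, s = mean(scores), pstdev(scores)  # population SD is an assumption
    if s == 0:
        raise ValueError("cannot standardize a constant measure")
    return [(x - m) / s for x in scores]


def combine_measures(*measures):
    """Average the standardized measures per participant (equal weights)."""
    z_scores = [standardize(m) for m in measures]
    return [mean(per_person) for per_person in zip(*z_scores)]


def roc_area(knowledgeable, uninformed):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    wins = 0.0
    for k in knowledgeable:
        for u in uninformed:
            if k > u:
                wins += 1.0
            elif k == u:
                wins += 0.5
    return wins / (len(knowledgeable) * len(uninformed))


# Hypothetical scores for six participants; the first three are knowledgeable.
scr = [1.2, 0.8, 0.9, 0.1, -0.3, 0.4]
svt = [30.0, 10.0, 20.0, 10.0, 0.0, 10.0]
combined = combine_measures(scr, svt)
area = roc_area(combined[:3], combined[3:])
```

Equal weighting requires no parameters to be estimated from the sample, which is the property the text relies on to avoid capitalizing on chance.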

Table 3 displays the areas under the ROC curves of the

various measures as a function of item-types and experimental

conditions. An inspection of Table 3 reveals that detection effi-

ciency of ‘‘guilty’’ participants, as reflected by the ROC area,

ranged in the immediate testing from 0.69 to 0.82 when a single



Figure 4. Means and Standard Errors of the NGT-based detection measure computed within each experimental condition.

Table 3. Areas Under the ROC Curves Computed for Each Detection Measure and for 2 Combinations of these Measures, Within Each Item Category, Across Categories, and Within Each Experimental Condition

                     CIT                           SVT                           NGT      CIT+SVT   All 3 Tests
                     All      Central  Peripheral  All      Central  Peripheral  All      All       All
Guilty
  Immediate          0.82**   0.78**   0.77**      0.77**   0.69*    0.73*       0.81**   0.94**    0.97**
  Delayed            0.76**   0.80**   0.55        0.68     0.77**   0.62        0.49     0.84**    0.80**
Informed-innocent
  Immediate          0.91**   0.76**   0.87**      0.87**   0.81**   0.83**      0.70*    0.97**    0.97**
  Delayed            0.64     0.74*    0.44        0.54     0.64     0.50        0.46     0.65      0.66

*p < .05; **p < .01.


measure is considered, and it increased to a level of 0.94, or even

0.97, when two or three measures were combined. This increase

in the ability to differentiate ‘‘guilty’’ from ‘‘uninformed inno-

cents’’ with the addition of the two behavioral measures can be

accounted for by the fact that these behavioral measures reflect

different psychological processes than the psychophysiological

measure. Indeed, the Pearson correlation coefficients among the

three measures, computed across all knowledgeable participants,

were nearly zero (ranging between −0.05 and 0.02).

In the delayed testing condition, detection efficiency generally

decreased, and both the SVT and NGT produced detection efficiency estimates that did not significantly exceed a chance level of

0.50. The SCR, on the other hand, remained relatively stable and

produced an area of 0.76 when all items were considered and 0.80

when only central items were used. The addition of the SVT

further increased the area to a level of 0.84 in the delayed con-

dition.

While this may be seen as good news, it must be qualified by

the relatively high areas obtained for the ‘‘informed innocents,’’

which means that the risk of false-positive outcomes when critical

information is leaked out may be severe. This danger is partic-

ularly severe in immediate testing, but much less when the test is

delayed. In fact, in almost all cases, the areas computed for the

‘‘informed innocents’’ in the delayed testing were not signifi-

cantly larger than chance. For example, the ROC area computed

for the ‘‘informed innocents’’ decreased from 0.91 to 0.64 when

only the SCR was used and from 0.97 to 0.66 when all three

measures were used. To further examine whether ‘‘guilty’’ and

‘‘informed innocents’’ can be differentiated, additional ROC

curves were constructed, such that sensitivity represented the rate

of correctly classifying ‘‘guilty’’ participants and false-positive

rate represented the proportion of ‘‘informed innocents’’ classi-

fied as ‘‘guilty.’’ The results of this analysis revealed that all the

areas under the ROC curves in the immediate condition were

around a chance level of 0.50, but increased somewhat in the

delayed condition (e.g., it increased from 0.48 to 0.65 for the

SCR and from 0.44 to 0.72 for the combination of SCR and

SVT), implying that false-positive errors due to information

leakage may be attenuated when the test is delayed.

Discussion

The results of this experiment join many previous studies in

demonstrating that the CIT can be a powerful tool in differen-

tiating between individuals possessing critical information and

those who were not exposed to this information. However, the

present results also demonstrate that, at least when tested im-

mediately, individuals who actually committed the mock-crime

cannot be differentiated from those who were just exposed to the

critical information in a neutral context. This pattern was re-

vealed in each of the three detection measures employed in this

study as well as when participants’ memory for the critical items

was examined after they took the various tests. It can be argued

that this result reflects the fact that the standard version of the

CIT (the GKT), rather than the GAT proposed by Bradley and

his colleagues (e.g., Bradley et al., 1997), was used in this exper-

iment. But, on the other hand, our results with respect to the

informed innocents are quite similar to those reported recently by

Gamer et al. (2010) who used the GAT. Furthermore, in an

additional study, Gamer (2010) directly compared the GAT with

the standard CIT (or GKT) and found that, while both formats

were equally effective in differentiating between knowledgeable and

unknowledgeable individuals, they were also equally ineffective in

differentiating between guilty and informed innocents.

All measures used in this experiment reflect, as expected, an

effect of time among knowledgeable participants. More interestingly, most measures revealed a stronger time effect (a decrement of the detection measure in the delayed condition relative to

the immediate condition) among ‘‘informed innocents’’ as com-

pared with ‘‘guilty’’ participants. However, this tendency was

statistically significant only in the ANOVAs conducted for the

recognition test and the SVT, but not when the SCRs and the

NGT were used. The differential time effect is also revealed in the

ROC analysis (see Table 3) where the decline in the area statistic

was smaller among ‘‘guilty’’ participants (e.g., from 0.82 to 0.76

for SCR; from 0.97 to 0.80 for all measures combined), than

among ‘‘informed innocents’’ (e.g., from 0.91 to 0.64 for SCR

and from 0.97 to 0.66 for all three measures). This finding is

consistent with the results reported by Gamer et al. (2010) who

used a combination of autonomic measures and demonstrated

that, while the area statistic did not show any decline in the

delayed test for the ‘‘guilty’’ participants (0.89 and 0.90 in the

immediate and delayed tests, respectively), ‘‘informed innocents’’

showed a considerable decline (from 0.95 to 0.75).

This result may reflect the roles of involvement and active

task-participation in memory. Individuals who actually commit-

ted the mock crime took an active part in producing the items to

be remembered, while ‘‘informed innocents’’ became aware of

the critical details through reading a newspaper. This difference

between the two groups does not affect their responses in the

immediate testing, but it does affect memory and, consequently,

differential responding to the critical items shows greater decline

with time among ‘‘informed innocents’’ than among ‘‘guilty.’’

This account is consistent with an extensive literature on the

‘‘generation effect’’ in memory (e.g., Slamecka & Graf, 1978;

deWinstanley, 1995; deWinstanley & Bjork, 2004), demonstrat-

ing that individuals tend to remember information better when

they take an active part in producing it. For example, partici-

pants who generated words by themselves (e.g., generated the

opposite of a given word) subsequently remembered them better

than participants who read the same words (Slamecka & Graf,

1978). Similarly, the superiority of memory for actions (self-per-

formed tasks) over memory for verbally learned material (ver-

bally learned tasks) has been demonstrated to be highly robust

(‘‘the enactment effect’’; Engelkamp, 1998). By the same token,

‘‘guilty’’ participants who actually experienced the event, enacted

the mock crime and had a direct contact with the critical items

were more involved in the task and thus remembered these items

better than ‘‘informed-innocents’’ who were exposed to the con-

cealed items by reading about them.

The practical implication of these results is that, although great caution must be exercised against the possibility of information leakage, this problem may be less severe in actual applications of the CIT, because CITs are typically not conducted

immediately after a crime was committed and often it may take a

few weeks to identify potential suspects and design a CIT. Ide-

ally, of course, CIT should not be conducted at all with suspects

who were informed about the critical information, and some-

times such suspects can be identified by a proper pre-test inter-

view. However, suspects in criminal offenses may be reluctant to

disclose knowledge of crime-related items, even when they did

not commit the crime and the critical information was leaked to

them, because they can’t be certain that they will be believed to



have obtained this guilty knowledge through leakage. For ex-

ample, often such suspects are unable to explain how they be-

came aware of the crime-related details (see Ben-Shakhar et al.,

1999, for a more detailed discussion of this issue). Consequently,

it is impossible to guarantee, in practice, that guilty and only

guilty suspects have knowledge of the critical information, and,

therefore, any means that may minimize the risks involved in

testing informed innocent suspects is important.

Our results differ somewhat from those reported by Gamer et

al. (2010) in demonstrating attenuation in detection efficiency of

‘‘guilty’’ participants after 1 week. As indicated earlier, Gamer et

al. (2010) did not find any time effect on ROC area with ‘‘guilty’’

subjects. Similarly, Carmel et al. (2003), who used only ‘‘guilty’’

subjects, reported identical ROC areas in the immediate and de-

layed conditions when the standard mock crime was applied

(0.84 in both conditions), but with the more realistic version of

the mock crime detection efficiency showed some decline when

the test was delayed (from 0.71 to 0.68). Thus, the present results

and those reported by Carmel et al. (2003) suggest that, in a more

realistic mock crime, some reduction in SCR-CIT detection effi-

ciency may be expected when the test is delayed. However, as

Gamer et al. (2010) reached a different conclusion, this issue may

require further research.

Another important aspect of the present results is the differ-

entiation between central and peripheral items. As predicted,

central items produced higher SCR detection efficiency,

and this effect was stronger when the test was delayed. This is

most clearly reflected by the ROC analysis, where the two item-types produced, in the immediate CIT, either similar areas (0.78

and 0.77 for central and peripheral items, respectively, with

‘‘guilty’’ participants) or even an advantage of peripheral items in

the ‘‘informed innocents’’ (0.76 vs. 0.87 for central and periph-

eral items, respectively). In the delayed condition, on the other

hand, the areas remained stable for the central items (0.80 and

0.74 for ‘‘guilty’’ and ‘‘informed innocents,’’ respectively) but

declined drastically when only peripheral items were used (0.55

and 0.44, both not significantly different from a chance area of

0.50).

The ROC analysis for the SVT reveals a similar pattern (see

Table 3), although the ANOVA conducted on the SVT detection

measure did not reveal a statistically significant item-type � time

interaction. Furthermore, when the test is delayed, relying on just

the central items results in larger areas than relying on all items,

and this pattern is reflected by both the SCR and the SVT. In this

respect, the present results strengthen the conclusion made by

both Carmel et al. (2003) and Gamer et al. (2010), namely, that

when constructing a CIT, an effort should be made to identify as

many central items as possible. Ben-Shakhar and Elaad (2003)

demonstrated that CITs based on at least five questions produce

optimal detection efficiency. However, it is doubtful whether it

would be possible to identify five central features of a crime, in

the realistic criminal context, and it is unclear from the present

results as well as from Carmel et al. (2003) and Gamer et al.

(2010) whether adding peripheral items would be beneficial. One

option would be to use only central items and repeat each ques-

tion several times (see Ben-Shakhar & Elaad, 2002b; Elaad &

Ben-Shakhar, 1997), but this requires additional research as the

previous examinations of item-repetition effects did not relate to

the distinction between central and peripheral items, nor did they

relate to the crucial factor of delaying the test.

The inclusion of two behavioral measures in addition to SCRs

allows us to examine how these measures are affected by delaying

the test and also to assess their incremental validity when com-

bined with the physiological measure. Both the SVT and the

NGT showed the expected time effect on knowledgeable partic-

ipants. Furthermore, the SVT demonstrated a significantly larger

time effect for ‘‘informed innocents’’ than for the ‘‘guilty’’ par-

ticipants, implying that, in realistic conditions where the CIT is

almost always delayed, its vulnerability to information leakage

may be reduced.

The present results also demonstrate that these behavioral

measures may be useful when used in combination with physiological measures in enhancing the validity of the CIT. For example, when adding the SVT to the SCR measure, the area under

the ROC curve for detecting ‘‘guilty’’ participants in the imme-

diate test increased from 0.82 to 0.94, and adding the NGT fur-

ther increased the area to 0.97. In the delayed test, the addition of

SVT increased the area from 0.76 to 0.84, but no further increase

with the NGT was revealed. Clearly the addition of these be-

havioral measures increases also the likelihood of false-positive

outcomes in the ‘‘informed innocents,’’ at least in the immediate

testing (the area increased from 0.91 to 0.97). Interestingly, erroneous detection of ‘‘informed innocents’’ in the delayed condition is relatively minor, and the addition of the SVT and NGT does not make any difference (i.e., the area slightly increased from 0.64 to 0.65 and 0.66, values that are not significantly larger than a

chance area of 0.50). These results, which are consistent with the

results of the second experiment reported by Meijer et al. (2007),

imply that the SVT can be a valuable addition to the traditional

physiological measures in applied settings.

Of course, it is premature to make a definitive recommenda-

tion at this stage, and various aspects of this behavioral measure

must be further investigated. In particular, it will be important to

study its vulnerability to countermeasures and to devise algo-

rithms protecting it from countermeasure attempts. The present

study did not include a systematic examination of the effects of

countermeasures on the SVT, but a post-experiment interview

with the participants revealed that some knowledgeable partic-

ipants tried to use sophisticated strategies to produce a random

pattern (e.g., ignoring the content of the questions and answers,

always choosing the answers that appeared on the right (or left)

side of the screen). This issue was examined by Verschuere et al.

(2008) who coached half of their participants not to perform

below chance level. Indeed, none of the coached participants

performed below chance level and consequently they were not

detected, but 21% of these participants were detected when a runs test was applied to detect deviations in the number of response alternations. Although these results cast doubt on the utility of

the SVT as an aid in criminal investigations, it is possible that

additional algorithms could be developed to detect deviations

from randomness. Future studies should be conducted to further

examine the vulnerability of the SVTto countermeasures and the

effectiveness of various methods to detect deviations from ran-

domness.
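One way to make the idea of detecting deviations from randomness concrete is the standard Wald-Wolfowitz runs test, sketched below in Python. It is a generic textbook version using the normal approximation, not necessarily the exact procedure applied by Verschuere et al. (2008). The function name `runs_test_z` and the example sequences are hypothetical.

```python
import math


def runs_test_z(responses):
    """Wald-Wolfowitz runs test (normal approximation) for a binary sequence.

    `responses` holds truthy/falsy values, e.g. 1 = correct, 0 = incorrect.
    A large positive z means too many alternations; a large negative z
    means too few (long streaks) relative to a random ordering.
    """
    flags = [bool(r) for r in responses]
    n1 = sum(flags)
    n2 = len(flags) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("sequence must contain both response types")
    n = n1 + n2
    # A new run starts at every position where the response changes.
    runs = 1 + sum(1 for a, b in zip(flags, flags[1:]) if a != b)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance <= 0:
        raise ValueError("sequence too short for the normal approximation")
    return (runs - expected) / math.sqrt(variance)


# Hypothetical response patterns (illustrative only).
strict_alternation = [1, 0] * 10        # switches on every trial
long_streaks = [1] * 10 + [0] * 10      # a single switch
print(runs_test_z(strict_alternation), runs_test_z(long_streaks))
```

A participant who keeps the overall proportion of correct answers near chance but alternates too regularly, or answers in long streaks, would produce an extreme z value even though the below-chance criterion is not met.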

Even greater caution should be exercised regarding the use of the NGT. Although the present results show that it may have potential, it must be remembered that, in this experiment, the NGT was based on just four questions, and correlation coefficients derived from such a small profile may be unreliable. In addition, it should be noted that the use of only four NGT questions reflects an inherent difficulty in identifying critical numerical items. Thus, the validity of the NGT and its potential as an additional detection measure in forensic applications should be explored further before any conclusions are reached.
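The concern about correlations based on four items can be illustrated with an exact permutation test. The Python sketch below assumes, purely for illustration, that a score is a Pearson correlation computed over four paired values; it does not reproduce the scoring used in this experiment. The function names and the numbers are hypothetical.

```python
from itertools import permutations
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def exact_permutation_p(x, y):
    """One-sided exact p-value: share of reorderings of y whose correlation
    with x is at least as large as the observed one."""
    observed = pearson_r(x, y)
    orderings = list(permutations(y))
    hits = sum(1 for p in orderings if pearson_r(x, p) >= observed - 1e-12)
    return hits / len(orderings)


# Hypothetical four-item profile (illustrative only).
critical_values = [1, 2, 3, 4]
guesses = [2, 1, 3, 4]
print(pearson_r(critical_values, guesses))
print(exact_permutation_p(critical_values, guesses))
```

With four items there are only 24 possible orderings, so even a perfect correlation cannot yield a one-sided p-value below 1/24, and a profile with a single adjacent swap is already far from conventional significance levels.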



Finally, it should be pointed out once again that this study focused on just two aspects differentiating the laboratory mock crime set-up from realistic criminal investigations (memory of various types of critical items when the test is delayed, and leakage of critical information to innocent suspects). Clearly, there are various emotional and motivational differences between mock crime studies and criminal investigations that may affect differential responding to the critical items. It is, of course, an empirical question whether the present results, as well as the large body of CIT research that is based on mock crime paradigms, will generalize to the forensic usage of the CIT. We believe, following Lykken (1974), that because the CIT, unlike the Comparison Questions Test (CQT), focuses on the detection of specific knowledge stored in memory rather than on the detection of deception, it would be relatively unaffected by the increased emotional arousal associated with realistic criminal investigations. A study by Kugelmass and Lieblich (1966), who successfully manipulated emotional arousal and stress and found a general increase in measures of physiological arousal but no effect on differential responding to the relevant information, provides some empirical support for this belief. Clearly, however, further research focusing on emotional and motivational factors and their effect on the outcomes of the CIT is needed.

REFERENCES

Ben-Shakhar, G. (1985). Standardization within individuals: A simple method to neutralize individual differences in psychophysiological responsivity. Psychophysiology, 22, 292–299.

Ben-Shakhar, G., Bar-Hillel, M., & Kremnitzer, M. (2002). Trial by polygraph: Reconsidering the use of the GKT in court. Law and Human Behavior, 26, 527–541.

Ben-Shakhar, G., & Elaad, E. (2002a). The Guilty Knowledge Test (GKT) as an application of psychophysiology: Future prospects and obstacles. In M. Kleiner (Ed.), Handbook of polygraph testing (pp. 87–102). San Diego, CA: Academic Press.

Ben-Shakhar, G., & Elaad, E. (2002b). Effects of questions’ repetition and variation on the efficiency of the guilty knowledge test: A reexamination. Journal of Applied Psychology, 87, 972–977.

Ben-Shakhar, G., & Elaad, E. (2003). The validity of psychophysiological detection of deception with the Guilty Knowledge Test: A meta-analytic review. Journal of Applied Psychology, 88, 131–151.

Ben-Shakhar, G., & Furedy, J. J. (1990). Theories and applications in the detection of deception: A psychophysiological and international perspective. New York: Springer-Verlag.

Ben-Shakhar, G., Gronau, N., & Elaad, E. (1999). Leakage of relevant information to innocent examinees in the GKT: An attempt to reduce false-positive outcomes by introducing target stimuli. Journal of Applied Psychology, 84, 651–660.

Bradley, M. T., Barefoot, C., & Arsenault, A. (2010). Leakage of information to innocents. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.

Bradley, M. T., MacLaren, V. V., & Carle, S. B. (1997). Deception and nondeception in guilty knowledge and guilty actions polygraph tests. Journal of Applied Psychology, 81, 153–160.

Bradley, M. T., & Rettinger, J. (1992). Awareness of crime-relevant information and the guilty knowledge test. Journal of Applied Psychology, 77, 55–59.

Bradley, M. T., & Warfield, J. F. (1984). Innocence, information, and the guilty knowledge test in the detection of deception. Psychophysiology, 21, 683–689.

Carmel, D., Dayan, E., Naveh, A., Raveh, O., & Ben-Shakhar, G. (2003). Estimating the validity of the Guilty Knowledge Test from simulated experiments: The external validity of mock crime studies. Journal of Experimental Psychology: Applied, 9, 261–269.

Cohen, J. E. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, NJ: Lawrence Erlbaum.

deWinstanley, P. A. (1995). A generation effect can be found during naturalistic learning. Psychonomic Bulletin & Review, 2, 538–541.

deWinstanley, P. A., & Bjork, E. L. (2004). Processing strategies and the generation effect: Implications for making a better reader. Memory & Cognition, 32, 945–955.

Elaad, E. (1998). The challenge of the concealed knowledge polygraph test. Expert Evidence, 6, 161–187.

Elaad, E., & Ben-Shakhar, G. (1997). Effects of items’ repetitions and variations on the efficiency of the guilty knowledge test. Psychophysiology, 34, 587–596.

Engelkamp, J. (1998). Memory for actions. East Sussex, UK: Psychology Press Publishers.

Gamer, M. (2010). Does the guilty action test allow for differentiating guilty participants from informed innocents? A re-examination. International Journal of Psychophysiology, 76, 19–24.

Gamer, M., Bauermann, T., Stoeter, P., & Vossel, G. (2007). Covariations among fMRI, skin conductance and behavioral data during processing of concealed information. Human Brain Mapping, 28, 1287–1301.

Gamer, M., & Berti, S. (2010). Task relevance and recognition of concealed information have different influences on electrodermal activity and event-related brain potentials. Psychophysiology, 47, 355–364.

Gamer, M., Kosiol, D., & Vossel, G. (2010). Strength of memory encoding affects physiological responses in the Guilty Action Test. Biological Psychology, 83, 101–107.

Gamer, M., Verschuere, B., Crombez, G., & Vossel, G. (2008). Combining physiological measures in the detection of concealed information. Physiology and Behavior, 95, 333–340.

Green, D. M., & Swets, J. A. (1966). Signal detection theory and psychophysics. New York: John Wiley & Sons.

Iacono, W. I. (2010). Encouraging the use of the guilty knowledge test (GKT): What the GKT has to offer to law enforcement. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.

Kraphol, D. (2010). Practical limitations of the concealed information test in criminal cases. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.

Kugelmass, S., & Lieblich, I. (1966). Effects of realistic stress and procedural interference in experimental lie detection. Journal of Applied Psychology, 50, 211–216.

Langleben, D. D., Loughead, J. W., Bilker, W. B., Ruparel, K., Childress, A. R., Busch, S. I., & Gur, R. C. (2005). Telling truth from lie in individual subjects with fast event-related fMRI. Human Brain Mapping, 26, 262–272.

Lieblich, I., & Ninio, A. (1972). Detection of suppressed involvement with information through a forced number-guessing technique. Acta Psychologica, 36, 381–387.

Lieblich, I., Shaham, E., & Ninio, A. (1976). Effects of time stress and stimulus-response set size on the efficiency of detection of involvement with suppressed information through the use of the forced number-guessing technique. Acta Psychologica, 40, 75–84.

Lykken, D. T. (1959). The GSR in the detection of guilt. Journal of Applied Psychology, 43, 385–388.

Lykken, D. T. (1960). The validity of the guilty knowledge technique: The effects of faking. Journal of Applied Psychology, 44, 258–262.

Lykken, D. T. (1974). Psychology and the lie detector industry. American Psychologist, 29, 725–739.

Lykken, D. T. (1998). A tremor in the blood: Uses and abuses of the lie detector. New York: Plenum Trade.

Marston, W. M. (1917). Systolic blood pressure symptoms of deception. Journal of Experimental Psychology, 2, 117–163.

Meijer, E. H., Smulders, F. T. Y., Johnston, J. E., & Merckelbach, H. L. G. J. (2007). Combining skin conductance and forced choice in the detection of concealed information. Psychophysiology, 44, 814–822.



Merckelbach, H. L. G. J., Hauer, B., & Rassin, E. (2002). Symptom validity testing of feigned dissociative amnesia: A simulation study. Psychology, Crime and Law, 8, 311–318.

Nakayama, M. (2002). Practical use of the concealed information test for criminal investigation in Japan. In M. Kleiner (Ed.), Handbook of polygraph testing (pp. 49–86). San Diego, CA: Academic Press.

National Research Council. (2003). The polygraph and lie detection. Committee to Review the Scientific Evidence on the Polygraph. Washington: The National Academies Press.

Osugi, A. (2010). Daily application of the CIT: Japan. In B. Verschuere, G. Ben-Shakhar, & E. Meijer (Eds.), Memory detection: Theory and application of the Concealed Information Test. Cambridge, UK: Cambridge University Press, Forthcoming.

Pankratz, L., Fausti, S. A., & Peed, S. (1975). A forced-choice technique to evaluate deafness in the hysterical or malingering patient. Journal of Consulting and Clinical Psychology, 43, 421–422.

Podlesny, J. A. (1993). Is the guilty knowledge polygraph technique applicable in criminal investigations? A review of FBI case records. Crime Laboratory Digest, 20, 57–61.

Raskin, D. C. (1989). Polygraph techniques for the detection of deception. In D. C. Raskin (Ed.), Psychological methods in criminal investigation and evidence (pp. 247–296). New York: Springer-Verlag.

Reid, J. E., & Inbau, F. E. (1977). Truth and deception: The Polygraph (“Lie Detection”) Technique. Baltimore: Williams and Wilkins.

Rosenfeld, J. P., Labkovsky, E., Winograd, M., Lui, M. A., Vandenboom, C., & Chedid, E. (2008). The Complex Trial Protocol (CTP): A new, countermeasure-resistant, accurate P300-based method for detection of concealed information. Psychophysiology, 45, 906–919.

Rosenfeld, J. P., Shue, E., & Singer, E. (2007). Single versus multiple probe blocks of P300-based concealed information tests for autobiographical versus incidentally learned information. Biological Psychology, 74, 396–404.

Slamecka, N. J., & Graf, P. (1978). The generation effect: Delineation of a phenomenon. Journal of Experimental Psychology: Learning, Memory & Cognition, 4, 492–604.

Swets, J. A., Tanner, W. P. Jr., & Birdsall, T. C. (1961). Decision processes in perception. Psychological Review, 68, 301–340.

Verschuere, B., Crombez, G., De Clercq, A., & Koster, E. (2004). Autonomic and behavioral responding to concealed information: Differentiating defensive and orienting responses. Psychophysiology, 41, 461–466.

Verschuere, B., Crombez, G., & Koster, E. (2004). Orienting to guilty knowledge. Cognition & Emotion, 18, 265–279.

Verschuere, B., Meijer, E., & Crombez, G. (2008). Symptom validity testing for the detection of simulated amnesia: Not robust to coaching. Psychology, Crime, & Law, 14, 523–528.

Vrij, A. (2008). Detecting lies and deceit. Pitfalls and opportunities (Second Edition). West Sussex: John Wiley and Sons.

(Received May 5, 2010; Accepted September 17, 2010)


