Psychometric Properties of a Working Memory Span Task

University of Central Florida University of Central Florida

STARS STARS

Honors Undergraduate Theses UCF Theses and Dissertations

2018

Psychometric Properties of a Working Memory Span Task Psychometric Properties of a Working Memory Span Task

Juan M. Alzate Vanegas University of Central Florida

Part of the Applied Statistics Commons, Cognitive Psychology Commons, Multivariate Analysis

Commons, and the Quantitative Psychology Commons

Find similar works at: https://stars.library.ucf.edu/honorstheses

University of Central Florida Libraries http://library.ucf.edu

This Open Access is brought to you for free and open access by the UCF Theses and Dissertations at STARS. It has

been accepted for inclusion in Honors Undergraduate Theses by an authorized administrator of STARS. For more

information, please contact [email protected].

Recommended Citation Recommended Citation Alzate Vanegas, Juan M., "Psychometric Properties of a Working Memory Span Task" (2018). Honors Undergraduate Theses. 358. https://stars.library.ucf.edu/honorstheses/358

https://stars.library.ucf.edu/



https://stars.library.ucf.edu/honorstheses

https://stars.library.ucf.edu/thesesdissertations

http://network.bepress.com/hgg/discipline/209?utm_source=stars.library.ucf.edu%2Fhonorstheses%2F358&utm_medium=PDF&utm_campaign=PDFCoverPages





https://stars.library.ucf.edu/honorstheses

http://library.ucf.edu/

mailto:[email protected]

https://stars.library.ucf.edu/honorstheses/358?utm_source=stars.library.ucf.edu%2Fhonorstheses%2F358&utm_medium=PDF&utm_campaign=PDFCoverPages



PSYCHOMETRIC PROPERTIES OF A WORKING MEMORY SPAN TASK

by

JUAN M. ALZATE VANEGAS

A thesis submitted in partial fulfillment of the requirement

for the Honors in the Major Program in Psychology

in the College of Sciences

and in the Burnett Honors College

at the University of Central Florida

Orlando, Florida

Summer Term, 2018

Thesis Chair: Mustapha Mouloua, Ph.D.

ii

ABSTRACT

The intent of this thesis is to examine the psychometric properties of a complex span task (CST)

developed to measure working memory capacity (WMC) using measurements obtained from a

sample of 68 undergraduate students at the University of Central Florida. The Grocery List Task

(GLT) promises several design improvements over traditional CSTs in a prior study about

individual differences in WMC and distraction effects on driving performance, and it offers

potential benefits for studying WMC as well as the serial-position effect. Currently, the working

memory system is composed of domain-general memorial storage processes and information-

processing, which involves the use of executive functions. Prior research has found WMC to be

associated with attentional measures (i.e., executive attention) and the updating function, and

unrelated to the shifting function. The present study replicates these relationships to other latent

variables in measures obtained from the GLT as convergent and discriminant evidence of

validity. In addition, GLT measures correlate strongly with established measures of WMC. Task

reliability is assessed by estimates of internal consistency, pairwise comparisons with a cross-

validation sample, and an analysis of demographic effects on task measurements.

iii

DEDICATION

I dedicate this thesis to the undergraduate student who is enthusiastic about the idea of

completing a research project, but is afraid to ask for help or follow through with the process.

iv

ACKNOWLEDGEMENTS

No man does it all by himself. Although I am listed as the sole author of this thesis, the humbling

reality is that this work would not have been possible without these outstanding individuals.

My parents. Juan and María, for their love, support, and counsel throughout the process.

My mentors. Dr. Jennifer F. Louie, for inspiring this project out of the task that we

designed for her dissertation, and Dr. Charles Negy, for shaping the scholar I am today.

My research assistants. Madison Hurst, Emily Asaro, and Isabelle Perez Santos, for

their dedication in helping me collect data from a modestly large sample size in a

surprisingly short amount of time.

My participants. The 72 participants who showed genuine interest in this project and

entrusted me with their data as a means of enhancing our understanding of the

psychometrics of working memory ever so slightly.

My committee. Dr. Mustapha Mouloua, Dr. Matthew Chin, and Dr. Alexander

Mantzaris, for their counsel and flexibility in scheduling my thesis defense.

The authors referenced in this thesis. Without your interdisciplinary contributions to

the body of literature in working memory, studies like these would not be possible.

In publishing this work, I would like to express my deepest gratitude to these individuals.

v

TABLE OF CONTENTS

Introduction……………………………………………………………………………………......1

Background……………………………………………………………………………………..1

Theoretical Models…….………………………………………………………………………..2

Method….………………………………………………………………………………………..10

Participants…………………………………………………………………………………….10

Materials…………………………………………………………………………………….…10

Procedures……………………………………………………………………………………..14

Cross-Validation Sample…………………...………………………………….………………16

Results…..………………………………………………………………………………………..17

Demographics…….……………………………………………………………………………17

Reliability……………………………………………………………………………………...17

Validity…..…………………………………………………………………………………….19

Cross-Validation……………………………………………………………………………….22

Item Recall Patterns…………………………………………………………………………...24

Discussion………………………………………………………………………………………..26

Theoretical Implications……………………………………………………………………….26

Practical Implications…………….……………………………………………………………28

Limitations…………………………………………………………………………………….29

Future Research………………………………………………………………………………..30

Appendix A: IRB Approval Letter………………………………………………………………34

Appendix B: GLT Instructions and Sample Lists………………………………………………..37

Appendix C: GLT Lists………………………………………………………………………….39

Appendix D: Item Recall Patterns……………………………………………………………….45

vi

References………………………………………………………………………………………..55

1

INTRODUCTION

The present thesis aims to evaluate the psychometric properties of the Grocery List Task,

a mental task designed to study working memory. Before proceeding, an overview of the

working memory system and its importance for human life are included in this section, along

with a discussion of applicable models used guide the hypotheses tested in this thesis.

BACKGROUND

Working memory refers to a cognitive system in which a small amount of information is

temporarily maintained for processing and rapid access (Baddeley, 1986; Cowan, 2005). In

practice, working memory has been predictive of performance in a variety of higher-order

capabilities such as understanding instructions, problem-solving on the computer, learning a

foreign language, and situational awareness in maneuvering of technical systems in driving and

aviation (Just & Carpeter, 1992; Engle, 2002; Sohn & Doane, 2003; Engle & Kane, 2004;

Conway, Jarrold, Kane, Miyake, & Towse, 2007; Unsworth, Redick, Heitz, Broadway, & Engle,

2009). Usage of the working memory system is so inescapable to humans, in fact, that most

everyday situations have been found to require some degree of processing by working memory

(Kane, Conway, Hambrick, & Engle, 2007; Cohen & Conway, 2008). Literature on the working

memory system follows the paradigm of studying individual differences in working memory

capacity (WMC), the limited capacity of the working memory system, which has been used to

explain differences in academic achievement and deficits in learning (Larkin, McDermott,

Simon, & Simon, 1980; Groen & Patel, 1988; Gathercole & Alloway, 2008), and is known to

increase as the prefrontal cortex develops and decrease with age in adults (Baddeley, 2012;

Clark, Hardman, Schachtman, Saults, Glass, & Cowan, 2018).

2

THEORETICAL MODELS

Although working memory is defined fairly consistently, the literature lacks consensus in

defining precisely what constitutes the working memory system itself. The Baddeley-Hitch

multi-component model, for instance, conceptualizes the central executive as the control center

of information to its slave systems—the phonological loop for verbal content, and the

visuospatial sketchpad for visual and spatial content—as well as an episodic buffer to link them

to the long-term memory system (Baddeley & Hitch, 1974; Baddeley, 1986, 2009, 2012). In this

regard, Baddeley’s modal model assumes working memory is a domain-specific, with distinct

structures to support processing of different varieties of information. Like other critics of

Baddeley’s theory, Engle and Kane (2004) have questioned this assumption and argued that the

multi-component model implies the existence of different types of working memory (e.g.,

spatial, verbal, sound, etc.), and hence its conceptualization of working memory lacks a

parsimonious explanation for an underlying, domain-general system for information-processing

and memory storage (Pashler, 1999).

Instead, Engle and Kane (2004) proposed that the working memory system behaves

according to the two-factor theory of cognitive control. The overarching theory describes

executive attention (EA), an attentional component specified by Fan, McCandliss, Somer, Raz,

and Posner (2002), as separate but closely related to working memory, especially within the

context of situational awareness (Miyake et al., 2000; Engle & Kane, 2004; Redick & Engle,

2006; Kane et al., 2007; Brown, & Robert, 2014). As a factor of cognitive control, EA behaves

similarly to Posner (1980)’s attentional spotlight in maintaining attention on information

3

pertinent to a specific task (Kane & Engle, 2003). The tri-layer embedded-processes model

expands on Engle and Kane’s proposed link between WMC and attention, suggesting that

individual differences in WMC are attributed in part to the attentional component of the working

memory system, the focus of attention (FA) (Cowan, 1999; 2001; 2005). The FA activates a

layer of LTM (a-LTM), resulting in our ongoing thoughts; however, this activated memory is

susceptible to interference effects and decay (Cowan, 1999). Among other such activation

models, the concentric model describes storage systems with a structural paradigm similar to that

of Baddeley’s, and has shown promise beyond specialized updating tasks (Oberauer, 2002;

Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012; Velichkovsky, 2017).

Despite their differences in conceptualizing working memory, functional models are

helpful in describing distinct functions of the working memory system, which can then be

invoked by the demands of different tasks (Miyake et al., 2000; Brauer, Neumann, & Friederici,

2008). Updating, for instance, refers to one’s ability to update old, inaccurate information with

new information, and it is invoked in complex span tasks (CSTs) such as the Operation Span

(OSpan) task, as well as other non-CSTs such as the n-back task and the mental counters task

(Garavan, 1998; Miyake et al., 2000; Kane & Engle, 2003; Owen, McMillan, Laird, & Bullmore,

2005; McMillan, Laird, Witt, & Meyerand, 2007). In this regard, the updating function is

responsible for memorial storage in the domain-general model of working memory. Notably,

however, although the n-back paradigm is a common method of measuring WMC, it tends to

correlate weakly with CST measures and should not be used interchangeably with CSTs (Redick

& Lindsey, 2013; Atkins et al., 2014). Shifting is the ability to switch between tasks without

making errors (Miyake et al., 2000), and it is invoked in the Task Switching Task used by Atkins

4

and colleagues (2014). Unlike updating, shifting primarily concerns EA and is unrelated to

WMC (Kane et al., 2007), performance on CSTs (Miyake et al., 2000) and fluid intelligence (gF)

(Friedman et al., 2006; Oswald, McAbee, Redick, & Hambrick, 2015). Lastly, inhibition is one’s

ability to intentionally suppress automatic responses to stimuli (Miyake et al., 2000). This

process is invoked in the Stroop and Tower of Hanoi tasks, and requires selecting ignoring

distractive stimuli and selecting goal-relevant information; thus, it is related to both WMC and

EA (Miyake et al., 2000; Heitz & Engle, 2007).

Unsworth, Redick, Heitz, Broadway, and Engle (2009) define working memory rather

broadly as a system with memorial storage and information processing. Working memory is

separate but related to long-term working memory (LTWM), and it is limited by attentional

resources (Unsworth et al., 2009). Although this definition is fairly succinct and captures an

intuitive view of working memory, the underlying model relies upon the assumption of a limited-

resource system specified by resource theory, which has received limited support at best

(Baddeley, 1986; Logan, 1997; Engle, Kane, & Tuholski, 1999; Cowan, 2005; Cowan, Morey,

Chen, & Bunting, 2007; Tu & Hampton, 2014).

Independently of the model used to operationalize working memory, there is some

agreement on how to measure WMC. Although related to short-term memory (STM), working

memory does not fully load onto simple span tasks (Felez-Nobrega, Foster, Ribera, & Draheim,

2017). To capture all of its dimensions, WMC must be measured with complex span tasks

(CSTs) in multiple domains (Conway et al., 2005; Unsworth et al., 2009; Foster et al., 2015).

Unfortunately, traditional CSTs have several drawbacks. For instance, the method of

administration is often intrusive and does not allow a participant to engage in other activities in

5

the meanwhile; this is problematic when using the task to study distraction effects in driving and

aviation systems. CSTs like the Operation Span task are difficult to administer to groups; and can

be biased by the presence of an experimenter (Pardo-Vázquez & Fernández-Rey, 2008; Heenan,

Herdman, Brown, & Robert, 2014). Moreover, the content of these tasks is abstract and unlike

our everyday experiences with the working memory system; hence, boredom effects can lead to

drops in task performance over a large enough block of trials. In fact, in unproctored CSTs

administered digitally, abstract stimuli such as letters have worsened criterion-related measures

of predictive validity (Hicks, Foster, & Engle, 2016). Reducing the level of abstraction in a

cognitive task has been done before: in the domain of hybrid search tasks, Drew, Boettcher, and

Wolfe (2015) demonstrated that CSTs can adequately measure visual working memory with

information on both partial spans related to a common theme—a grocery list.

Likewise, the newly-developed Grocery List Task (GLT) is a complex span task that

requires participants to manipulate and recall semantic and numerical information from a set of

grocery lists (Louie, in press). Adapted from Daneman and Carpenter (1980)’s auditory Reading

Span, the GLT was designed to require memorial storage as well as the use of executive

attention; hence, its measures should behave according to Miyake et al. (2000)’s model of

executive functions. Originally, the GLT was used as a naturalistic distractor in a study of the

effects of in-vehicle devices on simulated driving performance (Louie, in press). As such, the

GLT sought to fulfill certain objectives to serve as an improvement over traditional CSTs:

Fidelity. The stimuli presented in the task should not be abstract, but rather concretely

pertain to a common human activity that requires both memorial storage and information

processing.

6

Interrelatedness between partial spans. Although the task is a dual span, the

information on each partial span should be domain-general and interrelated to adequately

demonstrate evidence of content validity. In other words, the information presented should be

from the same source or about the same topic.

Portability. The task should not be invasive to administer. In particular, participants

should be physically capable of engaging in other activities (e.g., completing another task,

driving, etc.) while the task is being administered such that the task can be used to activate the

working memory system during the completion of these activities.

Standardization. Each set of lists was normalized according to frequency and word

length, while the whole set was normalized for number of items. These methods of

standardization allow the lists to be roughly equivalent and comparable according to these

variables, such that task content is roughly homogenous across lists and list sets.

Simplicity. The task should be straightforward for large samples of participants to

understand, while still fulfilling the other objectives.

If adequately reliable, the GLT could theoretically fulfill these objectives to facilitate

measurements of WMC in certain contexts, as well as provide evidence of the serial-position

effect, which occurs when items near the beginning and end of a list are more likely to be to

encoded and recalled (Capitani, della Sala, Logie, & Spinnler, 1992).

The main goal of this study is to empirically examine the psychometric properties of the

GLT, particularly to examine its predictive validity and reliability. As such, the present study

will follow approaches taken by past research for validating complex span tasks (Gonthier et al.,

2016; Felez-Nobrega et al., 2017).

7

To evaluate the GLT’s reliability, internal consistency will be estimated with Cronbach’s

coefficient alpha (Cronbach, 1951; Kane et al., 2004; Gonthier, et al., 2016). Traditional CSTs

often produce alpha values of .70 or better (Conway et al., 2005). Moreover, a regression model

will examine the effects on GLT scores of demographics variables such as age, sex, ethnicity,

major, GPA, and first language as well as clinical disorders known to impact WMC, such as

attention deficit disorder, Alzheimer’s disease, and schizophrenia (Manassis, Tannock, Young, &

Francis-John, 2007; Oswald, McAbee, Redick, & Hambrick, 2015; Brydges, Ozolnieks, &

Roberts, 2017; Colbert, 2018).

Finally, following an approach used by researchers to cross-validate the Reliable Digit

Span (Schroeder, Twumasi-Ankrah, Baade, & Marshall, 2012), the score distribution will be

cross-validated with a sample of undergraduates who completed the GLT several months prior to

the current administration of the GLT.

To demonstrate evidence of validity, the GLT should be reflective of the executive

functions of working memory that are invoked in CSTs. Thus, scores on a battery of CSTs will

be correlated with GLT scores as a convergent measure of validity. Consistent with attentional

models of working memory, performance on the Flanker task, a component of the Attentional

Networks Task (ANT), has been found to be predictive of individual differences in WMC

(Eriksen & Eriksen, 1974; Engle & Kane, 2004; Redick & Engle, 2006; Heitz & Engle, 2007).

GLT scores should therefore be reflective of this relationship; hence, scores and response times

on the Flanker task will be correlated with GLT scores to obtain a criterion-related measure of

validity. Given that shifting is an executive function unrelated to WMC, scores on the Task

Switching Task should be unrelated to GLT scores. Finally, scores obtained from the Task

8

Switching Task will be correlated with GLT scores to produce a discriminant measure of validity

(Atkins et al., 2014).

On the basis of this approach, the following set of hypotheses was generated:

Hypothesis 1. As with other CSTs, scores from different GLT lists will produce a value

of Cronbach’s coefficient alpha of .70 or better.

Hypothesis 2. The only significant demographic predictors of GLT scores will be age

and clinical disorders that impact WMC.

Clinical disorders that impact WMC include Attention-Deficit/Hyperactivity Disorder

(ADHD) and learning disorders such as dyslexia and dyscalculia (Brydges, Ozolnieks, &

Roberts, 2017; Colbert, 2018). Participants will also be asked if they receive any treatments for

their condition, as those who are currently being treated may show performance comparable to

participants without disorders (Manassis, Tannock, Young, & Francis-John, 2007; Oswald,

McAbee, Redick, & Hambrick, 2015; Brydges, Ozolnieks, & Roberts, 2017; Colbert, 2018).

The sample will consist of undergraduate students, for whom the prefrontal cortex and

cognitive abilities are still in development (Baddeley, 2012; Linares, Bajo, & Pelegrina, 2016;

Clark et al., 2018). Having smaller WMC, younger participants are thus expected to produce

lower scores.

Hypothesis 3. There will be a positive correlation between performance on the WMC

battery and performance on the GLT; i.e., high-span participants will score higher on the GLT

than low-span participants.

9

Hypothesis 4. There will be a positive correlation between performance on the Flanker

task and performance on the GLT; i.e., participants who score high in the Flanker task will score

high on the GLT.

Hypothesis 5. There will be no association between performance on the Task Switching

Task and performance on the GLT; i.e., participants who score high in the Task Switching Task

and participants who score low in the Task Switching Task will produce similar scores.

Hypothesis 6. Consistent with hypotheses 3-5, performance on the GLT should be

related to WMC and EA, and unrelated to shifting.

Hypothesis 7. The distribution of scores obtained in the present study will not differ

significantly from that of the cross-validation sample.

10

METHOD

PARTICIPANTS

A sample of 72 undergraduate students attending the University of Central Florida (UCF)

participated in the present study. All participants were recruited during the 2018 summer term

through UCF Sona Systems, a university website that allows students to meet course

requirements and earn extra credit by participating in psychological research. Three participants

did not complete the tasks correctly and were removed altogether from the analyses. In addition,

one participant’s data files for the computer tasks were corrupted, reducing the sample size to 68

for most analyses.

The remaining 69 individuals sampled ranged from 18 to 27 years of age (M = 19.30, SD

= 2.02), and 53.62% were female. A summary of demographics can be found in Table 1.

Moreover, 82.61% of participants’ spoke English as their first language, 10.14% spoke Spanish,

and 7.25% spoke another language (i.e., Arabic, Bengali, Cantonese, Portuguese, or Indonesian).

Table 1: Demographic characteristics

Sex

Ethnicity Male Female Totals

White 17 21 38 (55.07%)

Black/African American 9 7 16 (23.19%)

Hispanic/Latino 2 3 5 (7.25%)

Asian 2 5 7 (10.14%)

Multiracial 1 1 2 (2.90%)

Other 1 0 1 (1.45%)

Totals 32 (46.38%) 37 (53.62%) 69

MATERIALS

Demographics questionnaire. Participants completed a short demographics

questionnaire on Qualtrics, which prompted participants to list their age, GPA, first language,

and any clinical conditions (and treatment for these conditions) that may impair WMC or

11

performance on a domain-specific task, such as mental and neurological disorders like ADHD,

autism, schizophrenia, Alzheimer’s disease, dyslexia, dyscalculia, and others.

Grocery List Task (GLT). A complex span task consisting of a set of grocery lists for

which each item has a quantity (specified as a number of boxes) that will be purchased and a

corresponding cost per box. The lists were standardized according to number of words, length of

words, and word frequency for each item using a word frequency corpus1 to reduce the salience

effect (Platzer & Bröder, 2012) as well as the likelihood of random guessing associated with

each item (Dowd, Kiyonaga, Beck, & Egner, 2014). In addition, the quantities and prices were

randomized. Overall, the lists contain 97 unique items (i.e., “Almonds”, “Pasta”, and “Spinach”

each appear on two different lists) and can be found in Appendix C.

To complete the task, participants sat in a driving simulator chair facing a blank projector

screen while a voice recording of my mentor Dr. Jennifer F. Louie reading each list played. The

experimenter scored the participant’s responses silently, obscured from view by a black tarp.

For each item on a given list, participants were told the item type, quantity, and price.

Subsequently, they were given 4 seconds to confirm the correct cost of each type of item by

responding “Yes” or “No” to a possible value of the cost. At the end of each list, participants

were asked to recall as many items on the list as they can in whichever order they recall best

before moving onto the next list.

Participants received the following set of instructions verbally from the experimenter: “In

this task, imagine you are going to the grocery store and need to pick up a list of items. Each

1The corpus used may be accessed at www.wordfrequency.info.

12

item comes in a varying number of boxes. I will tell you the price of each item, and how many

boxes you need to buy. Please verify the total cost of each item by saying ‘Yes’ or ‘No’. We will

give you four seconds to respond before moving on to the next item. At the end of the list, I will

ask you for all the items.” To ensure that participants understood the task requirements, the

experimenter read aloud two sample lists, included in Appendix B. Participants were told to

recall as many items as possible at the end of the list in the order that they recalled them, while

responding and maintaining accuracy in the price-confirmation partial span task.

As designed, the GLT is a dual span task that produces two measures of accuracy:

Primary accuracy (item recall) is considered the average proportion of items correctly recalled

from each list, while secondary accuracy (price confirmation) is the proportion of items for

which a participant correctly verified the cost of a type of item. It is worth noting that the terms

“primary” and “secondary” are merely used as labels, not to indicate that primary accuracy

measure is psychometrically “primary”.

Due to time constraints, the majority of participants completed the task with exactly one

half of the 100 task items, determined using the third digit in the participants’ ID assigned by

Sona Systems. The first half contained lists 1-9, while the second half included lists 10-19. As

task lists were standardized and randomly-generated, both halves should produce equivalent

measures.

Finally, to examine the effects of semantics on recall of GLT items, an adjusted version

of primary accuracy was computed separately to account for items for which participants

incorrectly recalled a similar-sounding word instead of the item itself.

13

Complex Span Tasks. Participants completed shortened versions of the traditional

Operation, Reading, and Symmetry Span tasks on the computer (Turner & Engle, 1989; Conway

et al., 2005). The mean of the partial span scores is used as the WMC score.

Task Switching Task. A dual task involving the shifting executive function, adapted

from Atkins et al. (2014). Participants began by fixating on the computer display for 500 ms.

Based on a cue that was displayed centrally at variable time intervals (80 ms, 100 ms, 200 ms,

400 ms, and 800 ms), participants were asked to judge a set of digits (1-9, excluding 5) in

magnitude relative to 5 (lower or higher) and in parity (even or odd) as quickly and accurately as

possible using keys on a QWERTY keyboard.

Unlike Atkins et al. (2014)’s design which included “High-Low” and “Parity” as

additional cues that corresponded to Magnitude and Odd-Even trials, the only cues used to

designate the trial type were the words “Magnitude” and “Odd-Even”; this change was intended

to simplify the task for participants unfamiliar with the word, “Parity”, and in turn avoid possibly

contaminating response time measures over English vocabulary. A practice block of 10 trials was

used to ensure participants understand the instructions before moving onto a 100-trial

experimental block.

Each trial was considered a “task repeat” trial if the cue matched that of the previous trial;

otherwise, the trial was considered a “task switch” trial. To assess performance on trials that

require shifting relative to an accuracy baseline, the difference between task repeat and task

switch measures was considered the switching measure.

Flanker Task. The Flanker task is a component of the Attention Network Task (ANT),

the Flanker task requires participants to determine if a target arrow located directly above or

14

below a fixation cross points either left or right, while ignoring distractor stimuli (Eriksen &

Eriksen, 1974). The distractor stimuli is classified as neutral (straight lines), congruent (arrows

pointing in the same direction as the target arrow), or incongruent (arrows pointing in the

opposite direction as the target arrow).

Participants held the computer mouse and respond using their thumbs to left-click if the

target arrow points left or right-click if the target arrow points right. Participants received the

following set of instructions and were asked to respond as quickly as possible without losing

accuracy: “This task will take a bit longer than the other tasks (around half an hour). In this task,

you will see a target arrow pointing to either the left or right. The target arrow will always be

located directly above or directly below the fixation cross. Please use the mouse to indicate

which direction the arrow is pointing: If the target is pointing to the left, click the left key with

your left thumb. If the target is pointing to the right, click the right key with your right thumb.”

Accuracy and response time measures were recorded separately for the two

classifications of the distractor stimuli. The differences between each measure on the

incongruent and congruent trials were used to operationalize EA. Throughout this thesis, this

task is referenced interchangeably as “Flanker task” or “ANT”.

PROCEDURES

To minimize the effect of the order in which tasks were completed, participants were

assigned to begin the study in one of two parts based on the third digit of a unique, randomized

ID assigned to each participant by Sona Systems prior to his or her arrival to the laboratory.

Participants completed the tasks in order as they are listed within each part, and subsequently

they completed the remaining part.

15

Part A. Grocery List Task, Attention Networks Task

Part B. Operation Span, Reading Span, Symmetry Span, Task Switching Task

On the basis of this assignment, participants were directed to a seat at a computer station

upon arriving at the laboratory. Participants were asked to read and agree to the informed consent

form at the beginning of the demographics questionnaire. After consenting to take part in the

study, participants were asked to complete the questionnaire. Afterwards, the experimenter

prompted the participant to begin the tasks in the assigned order.

Every task except the GLT was completed at a computer station in the respective part of

the laboratory. After completing all the tasks in one part, participants completed the remaining

part at a different computer station. After completing both parts of the study, participants were

thanked for their time, awarded credit for their courses, and dismissed. The duration of the study

was approximately 90 minutes.

Notably, the order sequences were not implemented as a fully-randomized control of

order effects, but rather as a compromise. Specifically, given the short amount of time allotted to

data collection in the present study, two order sequences were devised as a means of increasing

the sample by allowing experimenters to administer the tasks to two participants simultaneously

without interruptions while providing some assessment of order effects. Although not every

possible permutation of task completion order is represented in the present study, if the GLT is

highly susceptible to order effects, they may still be detectable in a pairwise comparison of GLT

performance according to order sequence.

16

CROSS-VALIDATION SAMPLE

A sample of 70 undergraduate students from the University of Central Florida completed

the GLT and several other tasks several months prior to the current sample, for the dissertation

that prompted this research (Louie, in press). The individuals sampled ranged from 18 to 32

years of age (M = 18.56, SD = 1.80), and 67.14% were female. A summary of their

demographics can be found in Table 2.

Participants in the cross-validation sample only completed lists 1-9 of the GLT. Thus, the

obtained GLT and WMC measures differs methodologically across studies. Nonetheless,

although measures from a variety of different domains are necessary to produce adequate

estimates of WMC (Conway et al., 2005; Unsworth et al., 2009; Foster et al., 2015), performance

on CSTs generally does not vary across these domains (Kane, Conway, Hambrick, & Engle,

2007). Moreover, assuming both halves of the GLT are equivalent, the primary accuracy

measures should remain comparable. As such, this sample will be used jointly with the current

sample for cross-validation to assess psychometric properties of the GLT.

Table 2: Demographic characteristics, cross-validation sample

Sex

Ethnicity Male Female Totals

White 10 22 32 (44.44%)

Hispanic/Latino 2 9 11 (15.28%)

Black/African American 2 9 11 (15.28%)

Asian 5 4 9 (12.50%)

Mixed/Other 4 5 9 (12.50%)

Totals 23 (31.94%) 49 (68.06%) 72

17

RESULTS

DEMOGRAPHICS

An analysis of demographic effects was conducted on the GLT measures. For primary

accuracy, no significant differences were observed between male and female participants, t(66) =

-0.40, p = n.s.. Likewise, primary accuracy did not significantly differ by ethnicity, F(5, 62) =

1.29, p = n.s.. No Sex × Ethnicity interaction effects reached significance for primary accuracy,

F(9, 58) = 0.73, p = n.s.. For secondary accuracy, no significant differences were observed

between male and female participants, z = 1.71, p = .08. Likewise, no significant differences in

secondary accuracy were observed by ethnicity, χ2(5, N = 68) = 3.51, p = n.s..

A positive correlation between age and primary accuracy approached significance, r(66)

= .23, p = .06. However, no such association was observed with secondary accuracy, r(66) =

-.13, p = n.s.. Primary accuracy was unrelated to GPA, r(66) = -.14, p = n.s.. Likewise, secondary

accuracy was unrelated to GPA, r(66) = .06, p = n.s..

Finally, primary accuracy did not differ by first language (F(5, 62) = 0.98, p = n.s.) or

university major, F(16, 51) = 0.73, p = n.s.. Likewise, secondary accuracy did not differ by first

language (χ2(5, N = 68) = 9.54, p = n.s.) or by university major, χ2(16, N = 68) = 20.82, p = n.s..

RELIABILITY

Order effects. Half of participants began by completing part A of the study (i.e., GLT,

ANT) first; the other half by completing part B (i.e., Operation Span, Reading Span, Symmetry

Span, TST) first. Regardless, no order effects reached significance for primary accuracy (t(66) =

-0.88, p = n.s.), secondary accuracy (z = -0.42, p = n.s.), or any of the other measures used.

18

Internal consistency. Cronbach’s coefficient alpha was calculated for each set of lists

using the order in which items were recalled. For the first half of the task lists, the coefficient

alpha was calculated to be 0.73, while the second half produced a value of 0.81.

Further analyses of 4,802 inter-item correlations revealed 38.57% produced a correlation

coefficient estimate less than or equal to 0, while only 39.32% produced an estimate greater than

or equal to 0.10. Only 30.03% were statistically significantly larger than 0 at the 80% confidence

level. In addition, 17 items produced a median inter-item correlation coefficient of 0 or less.

These items were included in lists 1 (“pasta”), 2 (“cupcake”), 3 (“cinnamon”, “pepperoni”), 4

(“walnuts”), 5 (“pasta”), 6 (“peas”, “cashews”), 7 (“avocado”, “cabbage”, “pickles”), 9

(“croutons”, “broccoli”), 10 (“waffles”), 12 (“corn”, “garlic”), and 13 (“asparagus”). Notably,

“pasta”, the only item to appear twice on the same half (lists 1 and 5), produced a very poor

inter-item correlation with itself (r = -.14).

Figure 1: Median inter-item correlation coefficients for item recall.

Parallel-forms reliability. Thirty-six (52.17%) participants completed GLT lists 10-19,

while 32 (46.38%) completed lists 1-9; one participant completed both halves. Without

considering this participant, primary accuracy did not differ across the split halves, t(67) = -0.57,

p = n.s.. Likewise, secondary accuracy did not differ between task halves, z = -1.43, p = n.s..

19

VALIDITY

Primary accuracy. The distribution of primary accuracy on the GLT was normal, M =

58.80%, Mdn = 60.00%, SEM = 1.32%, p = .06. Primary accuracy was positively correlated with

the mean CST score, r(66) = .32, 95% CI [.09, .52], p < .01. The Reading Span task was the only

CST which produced scores significantly associated with GLT primary accuracy, r(66) = .45,

95% CI [.22, .63], p < .001. By contrast, no significant linear relationship was observed between

primary accuracy and Flanker task accuracy (r(66) = -.06, p = n.s.) or response times, r(66) =

-.01, p = n.s.. Likewise, primary accuracy was unrelated to accuracy (r(66) = .16, p = n.s.) and

response time measures on the Task Switching Task, r(66) = -.18, p = n.s..

The version of primary accuracy adjusted for semantics was 1.45% higher than the non-

adjusted version, t(67) = 6.14, 95% CI [0.98%, 1.92%] p < .0001. However, although adjustment

reduced standard error, primary accuracy was no longer normally-distributed, M = 59.89%, SEM

= 1.28%, p < .05. Moreover, the correlation coefficient between primary accuracy and the mean

WMC score was weakened by the adjustment, r(66) = .28, 95% CI [.05, .49], p = .019.

Secondary accuracy. The distribution of secondary accuracy on the GLT was non-

normal with a skewness of -3.44 and kurtosis of 17.45, M = 93.92%, Mdn = 96.00%, SEM =

1.01%, p < .0001. Secondary accuracy was unrelated to the WMC score, r(66) = .19, p = n.s..

20

Figure 2a: Distribution of GLT primary

accuracy.

Figure 2b: Distribution of GLT secondary

accuracy.

No significant linear relationship was observed between secondary accuracy and Flanker

task accuracy (r(66) = -.22, p = .07) or response times, r(66) = -.09, p = n.s.. Finally, secondary

accuracy was unrelated to both accuracy (r(66) = .10, p = n.s.) and response times measures on

the Task Switching Task, r(66) = .00, p = n.s..

Table 3: Correlation matrix of task measures

GLT ANT TST CSTs

Primary Secondary +

Acc +

RT +

Acc +

RT +

WMC

Primarya - .16 -.06 -.01 .16 -.18 .32**

Secondarya+ .16 - -.22 -.09 .10 .00 .19

Accb+ -.06 -.22 - .03 .13 -.19 -.14

RTb+ -.01 -.09 .03 - .05 -.09 .06

Accc+ .16 .10 .13 .05 - -.03 .09

RTc+ -.18 .00 -.19 -.09 -.03 - -.01

WMCd .32** .19 -.14 .06 .09 -.01 -

**p < .01 +Specifies that coefficients were calculated for non-normal data using Spearman’s

nonparametric method (rs). aDenotes accuracy measurements obtained from the Grocery List Task (GLT). bDenotes measurements obtained from the Attention Network Task (ANT). cDenotes measurements obtained from the Task Switching Task (TST). dDenotes the mean of the partial span score obtained from the Operation Span, Symmetry

Span, and Reading Span tasks.

Note: response time measures are labeled “RT” while accuracy measures appear as “Acc”.

21

WMC disorders. Among the individuals recruited in the study, five had Attention

Deficit Disorder (ADD) or Attention Deficit Hyperactivity Disorder (ADHD), two had

Obsessive-Compulsive Disorder (OCD), and one was dyslexic.

For primary accuracy, the standardized z-scores obtained from these individuals ranged

from -0.61 to 1.46. Combined, individuals with WMC disorders had significantly lower primary

accuracy than individuals without WMC disorders, t(66) = -1.82, p < .05 (one-tailed). Due to the

small sample size of participants with WMC disorders, however, no within-group differences

reached significance for analyses of WMC deficiencies according to disorder, F(4, 3) = 0.61, p =

n.s.. In addition, those with untreated WMC disorders performed comparably to those who

receive treatment, t(6) = -0.81, p = n.s..

For secondary accuracy, no significant differences were detected between individuals

with WMC disorders and individuals without WMC disorders, z = -1.46, p = n.s..

Regression analysis. A multiple linear regression was calculated to predict standardized

GLT primary accuracy with WMC (i.e., the mean WMC span) and EA (i.e., accuracy on the

Flanker task portion of the ANT) as predictors. A significant regression equation was found, F(3,

64) = 5.49, p < .01, f 2 = .26. Accordingly, this is the model endorsed by the regression analysis:

GLT Primary(z) = (0.15)WMC + (0.07)EA + (-4.45 × 10-4)WMC×EA - 2.50

Overall, the model accounted for approximately 16.74% of the variance in GLT primary

accuracy z-scores. Expected primary accuracy z-scores correlated strongly with observed

primary accuracy z-scores, r(66) = .45, 95% CI [.24, .62], p < .001. However, the distribution of

standardized residuals obtained from the model was non-normal, M = 3.19 × 10-17, Mdn = 0.06,

SD = 0.89, p = .03626.

22

According to the model, a 1-point increase in mean WMC score was associated with a

GLT primary accuracy increase of 0.15 SD, all other factors held constant, t(64) = 3.69, p < .001,

d = 0.92. Likewise, a 1-point increase in ANT accuracy was associated with a GLT primary

accuracy increase of 0.07 SD, all other factors held constant, t(64) = 2.27, p < .05, d = 0.57. In

addition, the main effect of WMC on primary accuracy depends on EA and vice versa, t(64) = -

2.62, p < .05, d = -0.66.

Table 4: Regression analysis of mean WMC span score and ANT accuracy as predictors of

standardized GLT primary accuracy

Source Estimate (β) SE t* p(|t| ≥ t*)

Intercept -2.497100 0.714891 -3.493 .000872***

WMCa 0.149421 0.040464 3.692 .000462***

EAb 0.065975 0.029107 2.267 .026799*

WMC*EA -0.004467 0.001704 -2.622 .010914*

*p < .05

***p < .001 aDenotes the mean WMC partial span score. bDenotes the mean accuracy difference between incongruent and congruent trials on the Flanker

task portion of the Attention Networks Task (ANT).

CROSS-VALIDATION

A Welch t-test revealed that there were no significant differences in primary accuracy

between the present sample and the cross-validation sample, t(133.25) = -0.24, p = n.s..

Likewise, no significant differences were found between samples in secondary accuracy, z =

0.60, p = n.s.. However, WMC measures for the cross-validation sample were significantly lower

than those of the current of study, Md = 2.52, t(138) = 4.52, 95% CI [1.42, 3.62], p < .0001. ANT

accuracy was also lower for the cross-validation than for the current sample, Mdnd = 3.00, z = -

3.11, 95% CI [-6.00, -1.00], p < .01.

23

Figure 3: Distributions of standardized GLT primary accuracy.

Note: the current sample appears violet, while the cross-validation sample appears in red.

Regression analysis. As in the previous regression analysis, a multiple linear regression

was calculated to predict standardized GLT primary accuracy based on WMC and EA measures

in the cross-validation sample. Given the significant differences in WMC and EA between the

samples, the analysis was conducted separately rather than by pooling the samples.

A significant regression equation was found, F(2, 69) = 6.85, p < .01, f 2 = .20.

Accordingly, this is the model endorsed by the regression analysis:

GLT Primary(z) = (0.09)WMC + (-0.02)EA - 0.80

Overall, the model accounted for approximately 14.15% of the variance in GLT primary

accuracy z-scores. Expected primary accuracy z-scores correlated strongly with observed

primary accuracy z-scores, r(70) = .41, 95% CI [.19, .58], p < .001. Also, the distribution of

standardized residuals obtained from the model was approximately normally distributed, M =

1.29 × 10-17, Mdn = -0.07, SD = 0.91, p = n.s..

Based on this model, a 1-point increase in mean WMC score was associated with a GLT

primary accuracy increase of 0.07 SD, all other factors held constant, t(69) = 2.51, p < .05, d =

24

0.60. In the cross-validation sample, the main effect of ANT accuracy failed to reach

significance, t(68) = 1.94, p = .05701. However, the main effect of WMC on primary accuracy is

dependent upon EA, t(69) = -3.31, p < .01, d = -0.80.

Table 5: Regression analysis of mean WMC span score and ANT accuracy as predictors of

standardized GLT primary accuracy, cross-validation sample

Source Estimate (β) SE t* p(|t| ≥ t*)

Intercept -0.8000176 0.4431461 -1.805 .07539

WMC 0.073701 0.0387765 2.511 .01438*

WMC*EA -0.0018551 0.0005607 -3.309 .00149**

*p < .05

**p < .01

ITEM RECALL PATTERNS

Finally, an analysis of recall patterns for every GLT list was conducted using locally-

weighted scatterplot smoothening (LOWESS) curves of the recall rank mode (i.e., the most

frequently-occurring order in which a particular item was recalled along the sequence of items

recalled by participants) and difficulty index (p) (i.e., proportion of participants who correctly

recalled the item) (Miller & Lovler, 2016) in order to identify lists in which item recall suggested

evidence of the serial-position effects (Capitani, et al., 1992). As lists 16 and 19 only have two

items each, they were excluded from this analysis. Regardless, the plots and curves for item

recall patterns appear in Appendix D.

The criteria for resembling the serial-position effect simply requires that for any given

list, the LOWESS curves of the difficulty index and recall rank mode resemble a U-shaped curve

about the middle items such that the first and last items are recalled relatively more frequently

(i.e., high value of p) and ahead of (i.e., low recall rank mode) other items on the list (Capitani, et

al., 1992). Using these criteria, evidence of the primacy effect was observed in the difficulty

25

index curves of every list except list 4, as well as in the recall rank mode curve of lists 1, 3-6, 9,

and 13. The recency effect was observed in the difficulty index curves of lists

2-11, 13, 15, 17, and 18, as well as in the recall rank mode curve of lists 1, 2, 5, 8, 12, and 15.

Overall, the primacy effect was observed in some capacity in every list, while the recency effect

was only entirely absent from list 14.

26

DISCUSSION

The body of psychometric evidence presented in this thesis provides support for the GLT

as an alternative measure of WMC to traditional CSTs. Below is an evaluation of the GLT’s

strengths and weaknesses as a complex span task.

THEORETICAL IMPLICATIONS

Hypothesis 1. Despite halving the sample size associated with analyses of internal

consistency, splitting the task showed success in producing two equivalent forms of the GLT,

each unaffected by order effects and above the conventional lower-bound threshold of 0.70 for

internal consistency in CSTs (Conway et al., 2005). Although Cronbach’s coefficient alpha is

subject to biases as the number of items increases (Okada 2013; Vaske, Beaman, & Sponarski,

2017), it serves as a preliminary, conservative estimate of the GLT’s internal consistency

(Cronbach, 1951).

Hypothesis 2. As expected, individuals with clinical WMC disorders produced

significantly lower primary accuracy on the GLT than other participants, although the same

differences were not present in secondary accuracy. Due to the small number of participants with

WMC disorders, however, not much else can be said about within-group differences according to

disorder or treatment.

Moreover, GLT accuracy measures failed to replicate age-linked differences in WMC in

the current sample (Clark et al., 2018). Although the significance test for this analysis

approached significance, it is likely that the variability in age of the current sample was too

restricted to detect a significant correlation between task measures and age.

27

As predicted, none of the other demographics factors were related to GLT measures,

suggesting that measurements obtained from the GLT are equivalent across demographic groups.

Hypotheses 3-6. The GLT primary accuracy measure was normally-distributed, allowing

for calculations of z-scores in regression analyses, and it was related to several of the other

measures. By contrast, secondary accuracy was non-normal and unrelated to any predictors. As

such, it was excluded from further analyses as a WMC measure. However, as most participants

had very high accuracy and low variability on this partial span, it may be useful to use secondary

accuracy to gauge whether participants are completing the task correctly or merely guessing

randomly. In fact, the participants who were excluded from the present sample for not

completing the tasks correctly produced much lower secondary accuracy than others.

As hypothesized, the GLT (specifically, primary accuracy) was strongly related to the

mean score obtained from other CSTs, especially the one most closely related in design: the

Reading Span task. This result is interpreted as convergent evidence of validity for GLT primary

accuracy. In addition, both GLT measures were unrelated to measures obtained from the Task

Switching Task, as expected of CSTs which are unrelated to shifting. Thus, this result provides

discriminant evidence of validity for the GLT measures. Combined, these results are supportive

of the GLT’s construct validity.

Although the correlation between GLT primary accuracy and EA was non-significant,

regression analyses for both samples revealed a significant interaction effect between WMC and

EA. The model fit for the current sample revealed a significant main effect of EA which was

non-significant when WMC was excluded from the model; the same main effect was only

marginally-significant in the cross-validation sample.

28

Regardless, these model fits suggest that variability in the GLT primary accuracy

measure can be explained by tasks which involve the updating and inhibition executive functions

(Miyake et al., 2000). Like other CSTs, the GLT was unrelated to the shifting executive function,

operationalized as measures obtained from Task Switching Task (Miyake et al., 2000).

Accordingly, these convergent and discriminant provide evidence of validity via established

relationships with other latent variables. In other words, GLT primary accuracy and its observed

relationships with other variables were consistent with an established theoretical framework of

working memory (i.e., Miyake et al. (2000)’s executive functions model).

Hypothesis 7. The cross-validation analysis revealed that GLT measures did not differ

across samples, providing evidence in favor of the GLT as a standardized measure. Likewise, the

regression model observed in the present sample was replicated with the cross-validation sample.

Unfortunately, however, the null differences in GLT accuracy were observed along with

statistically-significant differences in WMC and EA between samples, suggesting that the GLT

may not be currently as sensitive as other measures in detecting individual differences.

Although mixed, these preliminary findings suggest the GLT has potential as a

standardized measure WMC with further validation. In addition, although the current study

proposed using the GLT as a standardized measure of WMC, the task may have applicability

elsewhere in cognitive research given that many of the list recall patterns closely resemble the

patterns expected of the serial-position effect (Capitani et al., 1992).

PRACTICAL IMPLICATIONS

The evidence presented in this thesis is mostly supportive of the GLT as a higher-fidelity

measure of WMC. Researchers oriented towards applied cognitive research can benefit from the

29

advantages offered by the GLT over traditional CSTs and use the task to engage the working

memory system in studies of vehicular systems, aviation, user interface, or any activity that

restricts participants to verbal responses. For visually-impaired individuals, the GLT may serve

as an estimate of WMC comparable to that of other CSTs. Finally, with further improvements to

task items, the GLT may be capable of detecting WMC impairments for participants with mental

or learning disorders.

LIMITATIONS

Order effects. The control of order effects was limited to two distinct order sequences in

the present study for practical reasons. Although no order effects were observed between the two

order sequences, the present study does not conclusively eliminate the possibility of order effects

on GLT measures.

Ethnicity and first language. When considering the GLT’s reliability and potential

generalizability, however, it is worth noting that although the effects of ethnicity and first

language were reportedly null in the present sample, the task items are entirely in English and the

task was administered to an overwhelmingly English-speaking (82.61%) and White (55.07%)

sample. Thus, it is possible that further studies may observe significant differences in GLT

accuracy by first language or region due to a lack of familiarity with the vocabulary and

semantics of the items presented in the list.

Shifting. The shifting executive function was operationalized using an adaptation of

Atkins et al. (2014)’s Task Switching Task (TST). Aside from the face validity of task design,

not much is known about the psychometric properties of the TST.

30

Regression analyses. In the interest of examining construct-related evidence of validity,

a confirmatory factor analysis (CFA) would have been a method of studying the relationships

between latent variables in this study and the GLT (Miller & Lovler, 2016). However, the

minimum sample size required for this procedure is approximately 300 (VanVoorhis & Morgan,

2007) and data collection was limited to the summer semester. Although this approach was

methodologically unfeasible for the present study, it may describe the relationships observed in

the regression analysis in more depth.

Cross-validation. The sample obtained for cross-validation does not fit the common

definition of a cross-validation sample, given that these individuals were undergraduate students

recruited from the same university, and the sample size was approximately the same as that of

the current sample. As such, this analysis may be viewed as a quasi-between-subjects analysis of

test-retest reliability rather than cross-validation.

Although no significant differences were found between GLT measures, the samples

were not equivalent in EA and WMC, limiting the scope of conclusions reached by analysis.

FUTURE RESEARCH

In conducting this psychometric analysis of the GLT, the main intent was not to formally

create a task, but rather to demonstrate that the task design paradigm of grocery lists shows

promise in producing higher-fidelity measures of WMC. In fact, I highly suspect that future

research involving the GLT would likely involve modifications of the items on the lists, such as

improving internal consistency by conducting a formal item analysis on a larger sample or

revising problematic items to strengthen the weak inter-item correlations observed in the present

31

study. In turn, these improvements may improve the task’s sensitivity in detecting individual

differences in WMC, as was observed in the cross-validation analysis.

The vocabulary used in the GLT may contaminate some of the obtained measures in

larger samples with participants unfamiliar with certain grocery items. As such, other

psychometrically-oriented researchers may opt to create region- or language-specific items to

minimize the potential effects of semantics on measures obtained from the GLT. Likewise, they

may opt to use adjustments similar to the one proposed for item recall in which similar-sounding

words were recalled instead of the list items, although this particular approach was unsuccessful

in strengthening the relationship with WMC.

Notably, along the process of data collection for this study, participants frequently

reported that some of the items “stood out” to them or were difficult to remember due to their

lack of familiarity with the items, suggesting that the salience effect may have occurred with

some items (Platzer & Bröder, 2012). These occurrences warrant further study as the salience

effect is related to attentional factors (Dowd & Mitroff, 2013), which given the interaction

between WMC and executive attention reported in the regression analysis, certainly make the

salience effect a potential moderator of primary accuracy measures obtained from the GLT.

For cross-validation, future studies may recruit a much larger sample obtained than the

one used, and from a different participant population. As mentioned, a confirmatory factor

analysis would be a more suitable method for future researchers to test how the GLT adheres to

theoretical models of WMC on a larger sample. Coupled with this approach, researchers may opt

to add an external measure of the inhibiting executive function. In addition, it may be worth

exploring what item-level and list-level factors produce the recall pattern indicative of the serial-

32

position effect exhibited in the recall lists, as well as whether age-related differences can be

observed in a sample that is more variable in age.

Finally, it is also foreseeable that an automated version of the GLT may be created with

superior administration characteristics, such as the ability to record and score participant

responses and response times using voice-recognition software (from which online adaptations

can be made), expanding and modifying the grocery item vocabulary pool, minimizing

experimenter bias, providing feedback, and procedurally-generating lists for computer-adaptive

testing (CAT) (Miller & Lovler, 2016).

.

33

APPENDIX A: IRB APPROVAL LETTER

34

APPENDIX A: IRB APPROVAL LETTER

35

36

APPENDIX B: GLT INSTRUCTIONS AND SAMPLE LISTS

37

APPENDIX B: GLT INSTRUCTIONS AND SAMPLE LISTS

Instructions: “In this task, imagine you are going to the grocery store and need to pick up a list

of items. Each item comes in a varying number of boxes. I will tell you the price of each item,

and how many boxes you need to buy. Please verify the total cost of each item by saying ‘Yes’

or ‘No’. We will give you four seconds to respond before moving on to the next item. At the end

of the list, I will ask you for all the items.”

Sample List 1

You need to pick up 2 boxes of $2 bacon. $4? [Yes / No] (Correct response = “Yes”)

3 boxes of $9 onion. $21? [Yes / No] (Correct response = “No”)

2 boxes of $3 garlic. $3? [Yes / No] (Correct response = “No”)

1 box of $9 tuna. $9? [Yes / No] (Correct response = “Yes”)

Please recall the list. (Correct response = “bacon, onion, garlic, and tuna”)

Sample List 2

1 box of $7 corn. $7? [Yes / No] (Correct response = “Yes”)

4 boxes of $1 chocolate. $8? [Yes / No] (Correct response = “No”)

2 boxes of $2 chicken. $5? [Yes / No] (Correct response = “No”)

Please recall the list. (Correct response = “corn, chocolate, and chicken”)

38

APPENDIX C: GLT LISTS

39

APPENDIX C: GLT LISTS

Lists 1-9

Cost Item Question

Correct

answer

List 1

(7 items)

0:00 min

2 boxes of $1 Jello $2 YES

6 boxes of $4 Pasta $24 YES

6 boxes of $7 Berries $36 NO

7 boxes of $1 Mango $7 YES

1 box of $5 Pecans $6 NO

2 boxes of $4 Celery $8 YES

9 boxes of $6 Salami $54 YES

List 2

(3 items)

1:02 min

2 boxes of $1 Parfait $3 NO

3 boxes of $1 Cupcake $4 NO

6 boxes of $7 Cinnamon $49 NO

List 3

(3 items)

1:29 min

3 boxes of $1 Pepperoni $4 NO

3 boxes of $1 Yeast $2 NO

4 boxes of $8 Basil $40 NO

List 4

(3 items)

1:56 min

6 boxes of $4 Walnuts $24 YES

4 boxes of $5 Almonds $25 NO

5 boxes of $1 Zucchini $6 NO

40

List 5

(5 items)

2:24 min

5 boxes of $3 Pasta $15 YES

1 box of $8 Carrots $8 YES

5 boxes of $9 Coleslaw $45 YES

9 boxes of $9 Tangerine $81 YES

4 boxes of $4 Mushrooms $20 NO

List 6

(8 items)

3:09 min

2 boxes of $9 Tea $27 NO

6 boxes of $2 Onion $12 YES

3 boxes of $9 Figs $36 NO

9 boxes of $7 Peas $70 NO

6 boxes of $8 Cashews $48 YES

4 boxes of $4 Plums $12 NO

7 boxes of $1 Strawberries $14 NO

3 boxes of $5 Bacon $20 NO

List 7

(8 items)

4:22 min

1 box of $7 Spinach $7 YES

5 boxes of $6 Jam $36 NO

2 boxes of $4 Nutmeg $8 YES

6 boxes of $2 Ravioli $12 YES

9 boxes of $4 Avocado $45 NO

8 boxes of $8 Cabbage $64 YES

5 boxes of $2 Pickles $15 NO

6 boxes of $4 Vanilla $24 YES

List 8

(6 items)

5:36 min

5 boxes of $5 Cucumber $30 NO

1 box of $4 Grapes $4 YES

6 boxes of $4 Oregano $24 YES

6 boxes of $9 Oatmeal $54 YES

7 boxes of $5 Lettuce $35 YES

4 boxes of $3 Biscuit $16 NO

41

List 9

(7 items)

6:31 min

2 boxes of $3 Artichoke $6 YES

7 boxes of $4 Cheese $5 NO

5 boxes of $5 Peppers $30 NO

2 boxes of $7 Vinegar $21 NO

8 boxes of $5 Croutons $32 NO

1 boxes of $7 Broccoli $14 NO

5 boxes of $6 Margarine $36 NO

Cost Item Question

Correct

answer Lists 10-19

List 10

(7 items)

7:35 min

6 boxes of $9 Radish $54 YES

9 boxes of $2 Paprika $18 YES

8 boxes of $6 Flour $48 YES

9 boxes of $4 Lentil $45 NO

1 boxes of $8 Tamarind $16 NO

4 boxes of $2 Spinach $8 YES

6 boxes of

$1 Waffles $12 NO

List 11

(4 items)

8:40 min

1 boxes of $1 Cherries $2 NO

3 boxes of $6 Olives $18 YES

9 boxes of $8 Almonds $81 NO

7 boxes of $8 Chicken $56 YES

List 12

(5 items)

9:16 min

5 boxes of $8 Corn $40 YES

4 boxes of $2 Tomatoes $8 YES

1 boxes of $1 Ginger $1 YES

2 boxes of $7 Coconut $14 YES

8 boxes of $7 Garlic $56 YES

42

List 13

(6 items)

10:02 min

9 boxes of $4 Salmon $45 NO

6 boxes of $3 Apples $18 YES

7 boxes of $7 Chocolate $42 NO

2 boxes of $1 Sausage $2 YES

1 boxes of $8 Orange $1 NO

2 boxes of $7 Asparagus $21 NO

List 14

(6 items)

10:56 min

2 boxes of $1 Tuna $2 YES

1 boxes of $9 Anchovies $9 YES

4 boxes of $4 Croissants $4 NO

4 boxes of $1 Kimchi $4 YES

6 boxes of $6 Halibut $42 NO

2 boxes of $8 Sauerkraut $16 YES

List 15

(8 items)

11:51 min

5 boxes of $8 Horseradish $8 NO

2 boxes of $6 Nectarines $18 NO

9 boxes of $1 Cranberries $9 YES

1 boxes of $4 Blueberries $4 YES

5 boxes of $1 Crabmeat $1 NO

4 boxes of $2 Quinoa $12 NO

3 boxes of $7 Couscous $21 YES

9 boxes of $5 Hazelnut $45 YES

List 16

(2 items)

13:05 min

5 boxes of $9 Tortillas $45 YES

6 boxes of $2 Shallots $12 YES

43

List 17

(6 items)

13:22 min

5 boxes of $4 Prunes $25 NO

2 boxes of $8 Cilantro $2 NO

4 boxes of $1 Wasabi $8 NO

4 boxes of $5 Hummus $25 NO

8 boxes of $5 Molasses $40 YES

7 boxes of $3 Apricots $21 YES

List 18

(4 items)

14:22 min

7 boxes of $5 Lemongrass $35 YES

5 boxes of $6 Applesauce $36 NO

5 boxes of $5 Barley $5 NO

1 boxes of $2 Hushpuppies $2 YES

List 19

(2 items)

15:00 min

8 boxes of $2 Rigatoni $24 NO

8 boxes of $5 Seasoning $40 YES

44

APPENDIX D: ITEM RECALL PATTERNS

45

APPENDIX D: ITEM RECALL PATTERNS

List 1

List 2

46

List 3

List 4

47

List 5

List 6

48

List 7

List 8

49

List 9

List 10

50

List 11

List 12

51

List 13

List 14

52

List 15

List 16

53

List 17

List 18

54

List 19

55

REFERENCES

Atkins, S. M., Sprenger, A. M., Colflesh, G. H., Briner, T. L., Buchanan, J. B., Chavis, S. E.,

Chen, S.Y., Iannuzi, G.L., Kashtelyan, V., Downling, E., Harbison, J.I., Bolger, D.J.,

Bunting, M.F., & Dougherty, M. R. (2014). Measuring working memory is all fun and

games: A four-dimensional spatial game predicts cognitive task performance.

Experimental Psychology, 61(6), 417-438. doi:10.1027/1618-3169/a000262.

Baddeley, A. D., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The psychology of

learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New

York: Academic Press.

Baddeley, A.D. (1986). Working memory. New York, NY, US: Clarendon Press/Oxford

University Press.

Baddeley, A.D. (2009). The functional approach to autobiographical memory. Applied Cognitive

Psychology, 23(8), 1045-1049. doi:10.1002/acp.1608.

Baddeley, A.D. (2012). Working memory: Theories, models, and controversies. Annual Review

of Psychology, 6, 1-30. doi:10.1146/annurev-psych-120710-100422.

Brauer, J., Neumann, J., & Friederici, A. D. (2008). Temporal dynamics of perisylvian activation

during language processing in children and adults. NeuroImage, 41(4), 1484-1492.

Brydges, C. R., Ozolnieks, K. L., & Roberts, G. (2017). Working memory – not processing

speed – mediates fluid intelligence deficits associated with attention deficit/hyperactivity

disorder symptoms. Journal of Neuropsychology, 11(3), 362-377. doi:10.1111/jnp.12096.

Capitani, E., della Sala, S., Logie, R. H., & Spinnler, H. (1992). Recency, primacy, and memory:

Reappraising and standardising the serial position curve. Cortex: A Journal Devoted to

56

the Study of the Nervous System and Behavior, 28(3), 315-342. doi:10.1016/S0010-

9452(13)80143-8.

Clark, K. M., Hardman, K. O., Schachtman, T. R., Saults, J. S., Glass, B. A., & Cowan, N.

(2018). Tone series and the nature of working memory capacity

development. Developmental Psychology, 54(4), 663-676. doi:10.1037/dev0000466.

Cohen, G., & Conway, M. A. (2008). Memory in the Real World., (3rd ed). New York, NY, US:

Psychology Press.

Colbert, A. M. (2018). Working memory and symptoms of attention deficit hyperactivity

disorder in children. Dissertation Abstracts International, 78.

Conway, A. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W.

(2005). Working memory span tasks: A methodological review and user's guide.

Psychonomic Bulletin & Review, 12(5), 769-786. doi:10.3758/BF03196772.

Conway, A. A., Jarrold, C., Kane, M. J., Miyake, A., & Towse, J. N. (2007). Variation in

working memory. New York, NY, US: Oxford University Press.

Cowan, N. An embedded-processes model of working memory. In: Miyake A, Shah P, editors.

Models of working memory: Mechanisms of active maintenance and executive control.

Cambridge, UK: Cambridge University Press; 1999. pp. 62–101.

Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental

storage capacity. Behavioral and Brain Sciences, 24, 87-114.

Cowan, N. (2005). Working memory capacity. Hove, East Sussex, UK: Psychology Press.

Cowan, N., Morey, C. C., & Chen, Z. (2007). The legend of the magical number seven.

In S. Della Sala, S. Della Sala (Eds.) , Tall tales about the mind & brain: Separating fact

57

from fiction (pp. 45-59). New York, NY, US: Oxford University Press.

doi:10.1093/acprof:oso/9780198568773.003.0005.

Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika. 16(3):

297–334. doi:10.1007/bf02310555.

Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and

reading. Journal of Verbal Learning & Verbal Behavior, 19(4), 450-466.

doi:10.1016/S0022-5371(80)90312-6.

Drew, T., Boettcher, S. P., & Wolfe, J. M. (2016). Searching while loaded: Visual working

memory does not interfere with hybrid search efficiency but hybrid search uses working

memory capacity. Psychonomic Bulletin & Review, 23(1), 201-212. doi:10.3758/s13423-

015-0874-8.

Dowd, E. W., & Mitroff, S. R. (2013). Attentional guidance by working memory overrides

salience cues in visual search. Journal of Experimental Psychology: Human Perception

And Performance, 39(6), 1786-1796. doi:10.1037/a0032548.

Dowd, E. W., Kiyonaga, A., Beck, J. M., & Egner, T. (2014). Probability of guessing, not

precision, changes in mixture models of visual working memory during cognitive control

of attentional guidance. Visual Cognition, 22(8), 1027-1030.

doi:10.1080/13506285.2014.960669.

Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory

capacity and what they tell us about controlled attention, general fluid intelligence, and

functions of the prefrontal cortex. In A. Miyake, P. Shah, A. Miyake, P. Shah (Eds.),

Models of working memory: Mechanisms of active maintenance and executive control

58

(pp. 102-134). New York, NY, US: Cambridge University Press.

doi:10.1017/CBO9781139174909.007.

Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in

Psychological Science, 11, 19-23. doi:10.1111/1467-8721.00160.

Engle, R. W., & Kane, M. J. (2004). Executive attention, working memory capacity, and a

two-factor theory of cognitive control. In B. H. Ross, B. H. Ross (Eds.), The

psychology of learning and motivation: Advances in research and theory, (Vol. 44, pp.

145-199). New York, NY, US: Elsevier Science.

Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a

target letter in a nonsearch task. Attention, Perception, & Psychophysics, 16, 143–149.

Fan, J., McCandliss, B. D., Sommer, T., Raz, A., & Posner, M. I. (2002). Testing the efficiency

and independence of attentional networks. Journal of Cognitive Neuroscience, 14, 340-

347.

Felez-Nobrega, M., Foster, J. L., Puig-Ribera, A., Draheim, C., & Hillman, C. H. (2017).

Measuring working memory in the Spanish population: Validation of a multiple

shortened complex span task. Psychological Assessment. doi:10.1037/pas0000484.

Foster, J. L., Shipstead, Z., Harrison, T. L., Hicks, K. L., Redick, T. S., & Engle, R. W. (2015).

Shortened complex span tasks can reliably measure working memory capacity. Memory

& Cognition, 43(2), 226-236. doi:10.3758/s13421-014-0461-7.

Friedman, N. P., Miyake, A., Corley, R. P., Young, S. E., DeFries, J. C., & Hewitt, J. K. (2006).

Not all executive functions are related to intelligence. Psychological Science, 17, 172-

179.

59

Garavan, H. (1998). Serial attention within working memory. Memory & Cognition, 26(2),

263-276. doi:10.3758/BF03201138.

Gathercole, S. E., & Alloway, T. P. (2008). Working memory and classroom learning. In S. K.

Thurman, C. A. Fiorello, S. K. Thurman, C. A. Fiorello (Eds.), Applied cognitive

research in K–3 classrooms (pp. 17-40). New York, NY, US: Routledge/Taylor &

Francis Group.

Gonthier, C., Thomassin, N., & Roulin, J. (2016). The composite complex span: French

validation of a short working memory task. Behavior Research Methods, 48(1), 233-242.

doi:10.3758/s13428-015-0566-3.

Groen, G. J., & Patel, V. L. (1988). The relationship between comprehension and reasoning in

medical expertise. In M. H. Chi, R. Glaser, M. J. Farr, M. H. Chi, R. Glaser, M. J. Farr

(Eds.), The nature of expertise (pp. 287-310). Hillsdale, NJ, US: Lawrence Erlbaum

Associates, Inc.

Heenan, A., Herdman, C. M., Brown, M. S., & Robert, N. (2014). Effects of conversation on

situation awareness and working memory in simulated driving, Human Factors, 56,

1077-1092.

Heitz & Engle (2007). Focusing the spotlight: Individual differences in visual attention control.

Journal of Experimental Psychology: General, 136, 217-240.

Hicks, K. L., Foster, J. L., & Engle, R. W. (2016). Measuring working memory capacity on the

web with the online working memory lab (the OWL). Journal of Applied Research in

Memory and Cognition, 5(4), 478-489. doi:10.1016/j.jarmac.2016.07.010.

Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual

60

differences in working memory. Psychological Review, 99(1), 122-149.

doi:10.1037/0033-295X.99.1.122.

Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The

contributions of goal neglect, response competition, and task set to Stroop interference.

Journal of Experimental Psychology: General, 132(1), 47-70. doi:10.1037/0096-

3445.132.1.47.

Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W.

(2004). The generality of working memory capacity: A latent-variable approach to

verbal and visuospatial memory span and reasoning. Journal of Experimental

Psychology: General, 133(2), 189-217. doi:10.1037/0096-3445.133.2.189.

Kane, M. J., Conway, A. R. A., Hambrick, D. Z., & Engle, R. W. (2007). Variation in working

memory capacity as variation in executive attention and control. In A. R. A Conway, C.

Jarrold, M. J. Kane, A. Miyake, & J. N. Towse (Eds.), Variation in Working Memory.

NY: Oxford.

Larkin, J., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance

in solving physics problems. Science, 208, 1335-1342.

Linares, R., Bajo, M. T., & Pelegrina, S. (2016). Age-related differences in working memory

updating components. Journal of Experimental Child Psychology, 14739-52.

doi:10.1016/j.jecp.2016.02.009.

Logan, G.D. (1997). Automaticity and reading: Perspective from the instance theory of

automatization. Reading and Writing Quarterly, 13, 123-146.

61

Louie (in press). Predicting distracted driving: The role of individual differences in working

memory. Applied Ergonomics.

Manassis, K., Tannock, R., Young, A., & Francis-John, S. (2007). Cognition in anxious children

with attention deficit hyperactivity disorder: a comparison with clinical and normal

children. Behavioral and Brain Functions, 3, 4. doi: 10.1186/1744-9081-3-4.

McMillan, K. M., Laird, A. R., Witt, S. T., & Meyerand, M. E. (2007). Self-paced working

memory: Validation of verbal variations of the n-back paradigm. Brain Research, 1139,

133-142. doi:10.1016/j.brainres.2006.12.058.

Miller, L. A., & Lovler, R. L. (2016). Foundations of psychological testing: A practical

approach (5th ed.). Thousand Oaks, CA, US: Sage Publications, Inc.

Miyake, A., Friedman, N.P., Emerson, M.J., Witzki, A.H., Howerter, A., & Wager T.D. (2000).

Unity and diversity of executive functions and their contribution to complex ‘frontal

lobe’ tasks: A latent variable analysis. Cognitive Psychology, 41, 49-100.

doi:10.1006/cogp.1999.0734.

Oberauer, K. (2002). Access to information in working memory: Exploring the focus of

attention. Journal of Experimental Psychology: Learning, Memory, and Cognition,

28(3), 411-421. doi:10.1037/0278-7393.28.3.411.

Oberauer, K., Lewandowsky, S., Farrell, S., Jarrold, C., & Greaves, M. (2012). Modeling

working memory: An interference model of complex span. Psychonomic Bulletin &

Review, 19(5), 779-819. doi:10.3758/s13423-012-0272-4.

Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices

in one-way ANOVA. Behaviormetrika, 40(2), 129-147.

62

Oswald, F. L., McAbee, S. T., Redick, T. S., & Hambrick, D. Z. (2015). The development of a

short domain-general measure of working memory capacity. Behavior Research

Methods, 47(4), 1343-1355. doi:10.3758/s13428-014-0543-2.

Owen, A. M., McMillan, K. M., Laird, A. R., & Bullmore, E. (2005). N-back working memory

paradigm: A meta-analysis of normative functional neuroimaging studies. Human

Brain Mapping, 25(1), 46-59. doi:10.1002/hbm.20131.

Pardo-Vázquez, J. L., & Fernández-Rey, J. (2008). External validation of the computerized

group administrable adaptation of the 'operation span task.'. Behavior Research Methods,

40(1), 46-54. doi:10.3758/BRM.40.1.46.

Pashler, H. (1998). The Psychology of Attention. Cambridge, MA: MIT Press.

Platzer, C., & Bröder, A. (2012). Most people do not ignore salient invalid cues in memory-

based decisions. Psychonomic Bulletin & Review, 19(4), 654-661. doi:10.3758/s13423-

012-0248-4.

Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals.

Journal of Experimental Psychology: General, 109(2), 160-174. doi:10.1037/0096-

3445.109.2.160.

Redick, T. S., & Lindsey, D. B. (2013). Complex span and n-back measures of working memory:

A meta-analysis. Psychonomic Bulletin & Review, 20(6), 1102-1113.

doi:10.3758/s13423-013-0453-9.

Redick, T. S., & Engle, R. W. (2006). Working memory capacity and the Attention Network

Test performance. Applied Cognitive Psychology, 20, 713–721.

63

Schroeder, R. W., Twumasi-Ankrah, P., Baade, L. E., & Marshall, P. S. (2012). Reliable Digit

Span: A systematic review and cross-validation study. Assessment, 19(1), 21-30.

doi:10.1177/1073191111428764.

Sohn, Y., & Doane, S., (2003). Roles of working memory capacity and long-term working

memory skill in complex task performance. Memory & Cognition, 31, 458-466.

Tu, H.W., Hampton, R.R. (2014). Control of working memory in rhesus monkeys (Macaca

mulatta). Journal of Experimental Psychology: Animal Learning and Cognition, 40, 467-

476.

Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent?. Journal of

Memory and Language, 28(2), 127-154. doi:10.1016/0749-596X(89)90040-5.

Unsworth, N., Redick, T. S., Heitz, R. P., Broadway, J. M., & Engle, R. W. (2009). Complex

working memory span tasks and higher-order cognition: A latent-variable analysis of the

relationship between processing and storage. Memory, 17(6), 635-654.

doi:10.1080/09658210902998047.

Vaske, J. J., Beaman, J., & Sponarski, C. C. (2017). Rethinking internal consistency in

Cronbach's alpha. Leisure Sciences, 39(2), 163-173.

doi:10.1080/01490400.2015.1127189.

Velichkovsky, B. B. (2017). The concentric model of human working memory: A validation

study using complex span and updating tasks. Psychology in Russia: State of the Art,

10(3), 74-92.

64

Wilson VanVoorhis, C. R. & Morgan, B. L. (2007). Understanding power and rules of thumb for

determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43-50.

doi:10.20982/tqmp.03.2.p043.

Documents

Psychometric Properties of a Working Memory Span Task