Upload
others
View
4
Download
0
Embed Size (px)
Citation preview
University of Central Florida University of Central Florida
STARS STARS
Honors Undergraduate Theses UCF Theses and Dissertations
2018
Psychometric Properties of a Working Memory Span Task Psychometric Properties of a Working Memory Span Task
Juan M. Alzate Vanegas University of Central Florida
Part of the Applied Statistics Commons, Cognitive Psychology Commons, Multivariate Analysis
Commons, and the Quantitative Psychology Commons
Find similar works at: https://stars.library.ucf.edu/honorstheses
University of Central Florida Libraries http://library.ucf.edu
This Open Access is brought to you for free and open access by the UCF Theses and Dissertations at STARS. It has
been accepted for inclusion in Honors Undergraduate Theses by an authorized administrator of STARS. For more
information, please contact [email protected].
Recommended Citation Recommended Citation Alzate Vanegas, Juan M., "Psychometric Properties of a Working Memory Span Task" (2018). Honors Undergraduate Theses. 358. https://stars.library.ucf.edu/honorstheses/358
PSYCHOMETRIC PROPERTIES OF A WORKING MEMORY SPAN TASK
by
JUAN M. ALZATE VANEGAS
A thesis submitted in partial fulfillment of the requirement
for the Honors in the Major Program in Psychology
in the College of Sciences
and in the Burnett Honors College
at the University of Central Florida
Orlando, Florida
Summer Term, 2018
Thesis Chair: Mustapha Mouloua, Ph.D.
ii
ABSTRACT
The intent of this thesis is to examine the psychometric properties of a complex span task (CST)
developed to measure working memory capacity (WMC) using measurements obtained from a
sample of 68 undergraduate students at the University of Central Florida. The Grocery List Task
(GLT) promises several design improvements over traditional CSTs in a prior study about
individual differences in WMC and distraction effects on driving performance, and it offers
potential benefits for studying WMC as well as the serial-position effect. Currently, the working
memory system is composed of domain-general memorial storage processes and information-
processing, which involves the use of executive functions. Prior research has found WMC to be
associated with attentional measures (i.e., executive attention) and the updating function, and
unrelated to the shifting function. The present study replicates these relationships to other latent
variables in measures obtained from the GLT as convergent and discriminant evidence of
validity. In addition, GLT measures correlate strongly with established measures of WMC. Task
reliability is assessed by estimates of internal consistency, pairwise comparisons with a cross-
validation sample, and an analysis of demographic effects on task measurements.
iii
DEDICATION
I dedicate this thesis to the undergraduate student who is enthusiastic about the idea of
completing a research project, but is afraid to ask for help or follow through with the process.
iv
ACKNOWLEDGEMENTS
No man does it all by himself. Although I am listed as the sole author of this thesis, the humbling
reality is that this work would not have been possible without these outstanding individuals.
My parents. Juan and María, for their love, support, and counsel throughout the process.
My mentors. Dr. Jennifer F. Louie, for inspiring this project out of the task that we
designed for her dissertation, and Dr. Charles Negy, for shaping the scholar I am today.
My research assistants. Madison Hurst, Emily Asaro, and Isabelle Perez Santos, for
their dedication in helping me collect data from a modestly large sample size in a
surprisingly short amount of time.
My participants. The 72 participants who showed genuine interest in this project and
entrusted me with their data as a means of enhancing our understanding of the
psychometrics of working memory ever so slightly.
My committee. Dr. Mustapha Mouloua, Dr. Matthew Chin, and Dr. Alexander
Mantzaris, for their counsel and flexibility in scheduling my thesis defense.
The authors referenced in this thesis. Without your interdisciplinary contributions to
the body of literature in working memory, studies like these would not be possible.
In publishing this work, I would like to express my deepest gratitude to these individuals.
v
TABLE OF CONTENTS
Introduction……………………………………………………………………………………......1
Background……………………………………………………………………………………..1
Theoretical Models…….………………………………………………………………………..2
Method….………………………………………………………………………………………..10
Participants…………………………………………………………………………………….10
Materials…………………………………………………………………………………….…10
Procedures……………………………………………………………………………………..14
Cross-Validation Sample…………………...………………………………….………………16
Results…..………………………………………………………………………………………..17
Demographics…….……………………………………………………………………………17
Reliability……………………………………………………………………………………...17
Validity…..…………………………………………………………………………………….19
Cross-Validation……………………………………………………………………………….22
Item Recall Patterns…………………………………………………………………………...24
Discussion………………………………………………………………………………………..26
Theoretical Implications……………………………………………………………………….26
Practical Implications…………….……………………………………………………………28
Limitations…………………………………………………………………………………….29
Future Research………………………………………………………………………………..30
Appendix A: IRB Approval Letter………………………………………………………………34
Appendix B: GLT Instructions and Sample Lists………………………………………………..37
Appendix C: GLT Lists………………………………………………………………………….39
Appendix D: Item Recall Patterns……………………………………………………………….45
vi
References………………………………………………………………………………………..55
1
INTRODUCTION
The present thesis aims to evaluate the psychometric properties of the Grocery List Task,
a mental task designed to study working memory. Before proceeding, an overview of the
working memory system and its importance for human life are included in this section, along
with a discussion of applicable models used guide the hypotheses tested in this thesis.
BACKGROUND
Working memory refers to a cognitive system in which a small amount of information is
temporarily maintained for processing and rapid access (Baddeley, 1986; Cowan, 2005). In
practice, working memory has been predictive of performance in a variety of higher-order
capabilities such as understanding instructions, problem-solving on the computer, learning a
foreign language, and situational awareness in maneuvering of technical systems in driving and
aviation (Just & Carpeter, 1992; Engle, 2002; Sohn & Doane, 2003; Engle & Kane, 2004;
Conway, Jarrold, Kane, Miyake, & Towse, 2007; Unsworth, Redick, Heitz, Broadway, & Engle,
2009). Usage of the working memory system is so inescapable to humans, in fact, that most
everyday situations have been found to require some degree of processing by working memory
(Kane, Conway, Hambrick, & Engle, 2007; Cohen & Conway, 2008). Literature on the working
memory system follows the paradigm of studying individual differences in working memory
capacity (WMC), the limited capacity of the working memory system, which has been used to
explain differences in academic achievement and deficits in learning (Larkin, McDermott,
Simon, & Simon, 1980; Groen & Patel, 1988; Gathercole & Alloway, 2008), and is known to
increase as the prefrontal cortex develops and decrease with age in adults (Baddeley, 2012;
Clark, Hardman, Schachtman, Saults, Glass, & Cowan, 2018).
2
THEORETICAL MODELS
Although working memory is defined fairly consistently, the literature lacks consensus in
defining precisely what constitutes the working memory system itself. The Baddeley-Hitch
multi-component model, for instance, conceptualizes the central executive as the control center
of information to its slave systems—the phonological loop for verbal content, and the
visuospatial sketchpad for visual and spatial content—as well as an episodic buffer to link them
to the long-term memory system (Baddeley & Hitch, 1974; Baddeley, 1986, 2009, 2012). In this
regard, Baddeley’s modal model assumes working memory is a domain-specific, with distinct
structures to support processing of different varieties of information. Like other critics of
Baddeley’s theory, Engle and Kane (2004) have questioned this assumption and argued that the
multi-component model implies the existence of different types of working memory (e.g.,
spatial, verbal, sound, etc.), and hence its conceptualization of working memory lacks a
parsimonious explanation for an underlying, domain-general system for information-processing
and memory storage (Pashler, 1999).
Instead, Engle and Kane (2004) proposed that the working memory system behaves
according to the two-factor theory of cognitive control. The overarching theory describes
executive attention (EA), an attentional component specified by Fan, McCandliss, Somer, Raz,
and Posner (2002), as separate but closely related to working memory, especially within the
context of situational awareness (Miyake et al., 2000; Engle & Kane, 2004; Redick & Engle,
2006; Kane et al., 2007; Brown, & Robert, 2014). As a factor of cognitive control, EA behaves
similarly to Posner (1980)’s attentional spotlight in maintaining attention on information
3
pertinent to a specific task (Kane & Engle, 2003). The tri-layer embedded-processes model
expands on Engle and Kane’s proposed link between WMC and attention, suggesting that
individual differences in WMC are attributed in part to the attentional component of the working
memory system, the focus of attention (FA) (Cowan, 1999; 2001; 2005). The FA activates a
layer of LTM (a-LTM), resulting in our ongoing thoughts; however, this activated memory is
susceptible to interference effects and decay (Cowan, 1999). Among other such activation
models, the concentric model describes storage systems with a structural paradigm similar to that
of Baddeley’s, and has shown promise beyond specialized updating tasks (Oberauer, 2002;
Oberauer, Lewandowsky, Farrell, Jarrold, & Greaves, 2012; Velichkovsky, 2017).
Despite their differences in conceptualizing working memory, functional models are
helpful in describing distinct functions of the working memory system, which can then be
invoked by the demands of different tasks (Miyake et al., 2000; Brauer, Neumann, & Friederici,
2008). Updating, for instance, refers to one’s ability to update old, inaccurate information with
new information, and it is invoked in complex span tasks (CSTs) such as the Operation Span
(OSpan) task, as well as other non-CSTs such as the n-back task and the mental counters task
(Garavan, 1998; Miyake et al., 2000; Kane & Engle, 2003; Owen, McMillan, Laird, & Bullmore,
2005; McMillan, Laird, Witt, & Meyerand, 2007). In this regard, the updating function is
responsible for memorial storage in the domain-general model of working memory. Notably,
however, although the n-back paradigm is a common method of measuring WMC, it tends to
correlate weakly with CST measures and should not be used interchangeably with CSTs (Redick
& Lindsey, 2013; Atkins et al., 2014). Shifting is the ability to switch between tasks without
making errors (Miyake et al., 2000), and it is invoked in the Task Switching Task used by Atkins
4
and colleagues (2014). Unlike updating, shifting primarily concerns EA and is unrelated to
WMC (Kane et al., 2007), performance on CSTs (Miyake et al., 2000) and fluid intelligence (gF)
(Friedman et al., 2006; Oswald, McAbee, Redick, & Hambrick, 2015). Lastly, inhibition is one’s
ability to intentionally suppress automatic responses to stimuli (Miyake et al., 2000). This
process is invoked in the Stroop and Tower of Hanoi tasks, and requires selecting ignoring
distractive stimuli and selecting goal-relevant information; thus, it is related to both WMC and
EA (Miyake et al., 2000; Heitz & Engle, 2007).
Unsworth, Redick, Heitz, Broadway, and Engle (2009) define working memory rather
broadly as a system with memorial storage and information processing. Working memory is
separate but related to long-term working memory (LTWM), and it is limited by attentional
resources (Unsworth et al., 2009). Although this definition is fairly succinct and captures an
intuitive view of working memory, the underlying model relies upon the assumption of a limited-
resource system specified by resource theory, which has received limited support at best
(Baddeley, 1986; Logan, 1997; Engle, Kane, & Tuholski, 1999; Cowan, 2005; Cowan, Morey,
Chen, & Bunting, 2007; Tu & Hampton, 2014).
Independently of the model used to operationalize working memory, there is some
agreement on how to measure WMC. Although related to short-term memory (STM), working
memory does not fully load onto simple span tasks (Felez-Nobrega, Foster, Ribera, & Draheim,
2017). To capture all of its dimensions, WMC must be measured with complex span tasks
(CSTs) in multiple domains (Conway et al., 2005; Unsworth et al., 2009; Foster et al., 2015).
Unfortunately, traditional CSTs have several drawbacks. For instance, the method of
administration is often intrusive and does not allow a participant to engage in other activities in
5
the meanwhile; this is problematic when using the task to study distraction effects in driving and
aviation systems. CSTs like the Operation Span task are difficult to administer to groups; and can
be biased by the presence of an experimenter (Pardo-Vázquez & Fernández-Rey, 2008; Heenan,
Herdman, Brown, & Robert, 2014). Moreover, the content of these tasks is abstract and unlike
our everyday experiences with the working memory system; hence, boredom effects can lead to
drops in task performance over a large enough block of trials. In fact, in unproctored CSTs
administered digitally, abstract stimuli such as letters have worsened criterion-related measures
of predictive validity (Hicks, Foster, & Engle, 2016). Reducing the level of abstraction in a
cognitive task has been done before: in the domain of hybrid search tasks, Drew, Boettcher, and
Wolfe (2015) demonstrated that CSTs can adequately measure visual working memory with
information on both partial spans related to a common theme—a grocery list.
Likewise, the newly-developed Grocery List Task (GLT) is a complex span task that
requires participants to manipulate and recall semantic and numerical information from a set of
grocery lists (Louie, in press). Adapted from Daneman and Carpenter (1980)’s auditory Reading
Span, the GLT was designed to require memorial storage as well as the use of executive
attention; hence, its measures should behave according to Miyake et al. (2000)’s model of
executive functions. Originally, the GLT was used as a naturalistic distractor in a study of the
effects of in-vehicle devices on simulated driving performance (Louie, in press). As such, the
GLT sought to fulfill certain objectives to serve as an improvement over traditional CSTs:
Fidelity. The stimuli presented in the task should not be abstract, but rather concretely
pertain to a common human activity that requires both memorial storage and information
processing.
6
Interrelatedness between partial spans. Although the task is a dual span, the
information on each partial span should be domain-general and interrelated to adequately
demonstrate evidence of content validity. In other words, the information presented should be
from the same source or about the same topic.
Portability. The task should not be invasive to administer. In particular, participants
should be physically capable of engaging in other activities (e.g., completing another task,
driving, etc.) while the task is being administered such that the task can be used to activate the
working memory system during the completion of these activities.
Standardization. Each set of lists was normalized according to frequency and word
length, while the whole set was normalized for number of items. These methods of
standardization allow the lists to be roughly equivalent and comparable according to these
variables, such that task content is roughly homogenous across lists and list sets.
Simplicity. The task should be straightforward for large samples of participants to
understand, while still fulfilling the other objectives.
If adequately reliable, the GLT could theoretically fulfill these objectives to facilitate
measurements of WMC in certain contexts, as well as provide evidence of the serial-position
effect, which occurs when items near the beginning and end of a list are more likely to be to
encoded and recalled (Capitani, della Sala, Logie, & Spinnler, 1992).
The main goal of this study is to empirically examine the psychometric properties of the
GLT, particularly to examine its predictive validity and reliability. As such, the present study
will follow approaches taken by past research for validating complex span tasks (Gonthier et al.,
2016; Felez-Nobrega et al., 2017).
7
To evaluate the GLT’s reliability, internal consistency will be estimated with Cronbach’s
coefficient alpha (Cronbach, 1951; Kane et al., 2004; Gonthier, et al., 2016). Traditional CSTs
often produce alpha values of .70 or better (Conway et al., 2005). Moreover, a regression model
will examine the effects on GLT scores of demographics variables such as age, sex, ethnicity,
major, GPA, and first language as well as clinical disorders known to impact WMC, such as
attention deficit disorder, Alzheimer’s disease, and schizophrenia (Manassis, Tannock, Young, &
Francis-John, 2007; Oswald, McAbee, Redick, & Hambrick, 2015; Brydges, Ozolnieks, &
Roberts, 2017; Colbert, 2018).
Finally, following an approach used by researchers to cross-validate the Reliable Digit
Span (Schroeder, Twumasi-Ankrah, Baade, & Marshall, 2012), the score distribution will be
cross-validated with a sample of undergraduates who completed the GLT several months prior to
the current administration of the GLT.
To demonstrate evidence of validity, the GLT should be reflective of the executive
functions of working memory that are invoked in CSTs. Thus, scores on a battery of CSTs will
be correlated with GLT scores as a convergent measure of validity. Consistent with attentional
models of working memory, performance on the Flanker task, a component of the Attentional
Networks Task (ANT), has been found to be predictive of individual differences in WMC
(Eriksen & Eriksen, 1974; Engle & Kane, 2004; Redick & Engle, 2006; Heitz & Engle, 2007).
GLT scores should therefore be reflective of this relationship; hence, scores and response times
on the Flanker task will be correlated with GLT scores to obtain a criterion-related measure of
validity. Given that shifting is an executive function unrelated to WMC, scores on the Task
Switching Task should be unrelated to GLT scores. Finally, scores obtained from the Task
8
Switching Task will be correlated with GLT scores to produce a discriminant measure of validity
(Atkins et al., 2014).
On the basis of this approach, the following set of hypotheses was generated:
Hypothesis 1. As with other CSTs, scores from different GLT lists will produce a value
of Cronbach’s coefficient alpha of .70 or better.
Hypothesis 2. The only significant demographic predictors of GLT scores will be age
and clinical disorders that impact WMC.
Clinical disorders that impact WMC include Attention-Deficit/Hyperactivity Disorder
(ADHD) and learning disorders such as dyslexia and dyscalculia (Brydges, Ozolnieks, &
Roberts, 2017; Colbert, 2018). Participants will also be asked if they receive any treatments for
their condition, as those who are currently being treated may show performance comparable to
participants without disorders (Manassis, Tannock, Young, & Francis-John, 2007; Oswald,
McAbee, Redick, & Hambrick, 2015; Brydges, Ozolnieks, & Roberts, 2017; Colbert, 2018).
The sample will consist of undergraduate students, for whom the prefrontal cortex and
cognitive abilities are still in development (Baddeley, 2012; Linares, Bajo, & Pelegrina, 2016;
Clark et al., 2018). Having smaller WMC, younger participants are thus expected to produce
lower scores.
Hypothesis 3. There will be a positive correlation between performance on the WMC
battery and performance on the GLT; i.e., high-span participants will score higher on the GLT
than low-span participants.
9
Hypothesis 4. There will be a positive correlation between performance on the Flanker
task and performance on the GLT; i.e., participants who score high in the Flanker task will score
high on the GLT.
Hypothesis 5. There will be no association between performance on the Task Switching
Task and performance on the GLT; i.e., participants who score high in the Task Switching Task
and participants who score low in the Task Switching Task will produce similar scores.
Hypothesis 6. Consistent with hypotheses 3-5, performance on the GLT should be
related to WMC and EA, and unrelated to shifting.
Hypothesis 7. The distribution of scores obtained in the present study will not differ
significantly from that of the cross-validation sample.
10
METHOD
PARTICIPANTS
A sample of 72 undergraduate students attending the University of Central Florida (UCF)
participated in the present study. All participants were recruited during the 2018 summer term
through UCF Sona Systems, a university website that allows students to meet course
requirements and earn extra credit by participating in psychological research. Three participants
did not complete the tasks correctly and were removed altogether from the analyses. In addition,
one participant’s data files for the computer tasks were corrupted, reducing the sample size to 68
for most analyses.
The remaining 69 individuals sampled ranged from 18 to 27 years of age (M = 19.30, SD
= 2.02), and 53.62% were female. A summary of demographics can be found in Table 1.
Moreover, 82.61% of participants’ spoke English as their first language, 10.14% spoke Spanish,
and 7.25% spoke another language (i.e., Arabic, Bengali, Cantonese, Portuguese, or Indonesian).
Table 1: Demographic characteristics
Sex
Ethnicity Male Female Totals
White 17 21 38 (55.07%)
Black/African American 9 7 16 (23.19%)
Hispanic/Latino 2 3 5 (7.25%)
Asian 2 5 7 (10.14%)
Multiracial 1 1 2 (2.90%)
Other 1 0 1 (1.45%)
Totals 32 (46.38%) 37 (53.62%) 69
MATERIALS
Demographics questionnaire. Participants completed a short demographics
questionnaire on Qualtrics, which prompted participants to list their age, GPA, first language,
and any clinical conditions (and treatment for these conditions) that may impair WMC or
11
performance on a domain-specific task, such as mental and neurological disorders like ADHD,
autism, schizophrenia, Alzheimer’s disease, dyslexia, dyscalculia, and others.
Grocery List Task (GLT). A complex span task consisting of a set of grocery lists for
which each item has a quantity (specified as a number of boxes) that will be purchased and a
corresponding cost per box. The lists were standardized according to number of words, length of
words, and word frequency for each item using a word frequency corpus1 to reduce the salience
effect (Platzer & Bröder, 2012) as well as the likelihood of random guessing associated with
each item (Dowd, Kiyonaga, Beck, & Egner, 2014). In addition, the quantities and prices were
randomized. Overall, the lists contain 97 unique items (i.e., “Almonds”, “Pasta”, and “Spinach”
each appear on two different lists) and can be found in Appendix C.
To complete the task, participants sat in a driving simulator chair facing a blank projector
screen while a voice recording of my mentor Dr. Jennifer F. Louie reading each list played. The
experimenter scored the participant’s responses silently, obscured from view by a black tarp.
For each item on a given list, participants were told the item type, quantity, and price.
Subsequently, they were given 4 seconds to confirm the correct cost of each type of item by
responding “Yes” or “No” to a possible value of the cost. At the end of each list, participants
were asked to recall as many items on the list as they can in whichever order they recall best
before moving onto the next list.
Participants received the following set of instructions verbally from the experimenter: “In
this task, imagine you are going to the grocery store and need to pick up a list of items. Each
1The corpus used may be accessed at www.wordfrequency.info.
12
item comes in a varying number of boxes. I will tell you the price of each item, and how many
boxes you need to buy. Please verify the total cost of each item by saying ‘Yes’ or ‘No’. We will
give you four seconds to respond before moving on to the next item. At the end of the list, I will
ask you for all the items.” To ensure that participants understood the task requirements, the
experimenter read aloud two sample lists, included in Appendix B. Participants were told to
recall as many items as possible at the end of the list in the order that they recalled them, while
responding and maintaining accuracy in the price-confirmation partial span task.
As designed, the GLT is a dual span task that produces two measures of accuracy:
Primary accuracy (item recall) is considered the average proportion of items correctly recalled
from each list, while secondary accuracy (price confirmation) is the proportion of items for
which a participant correctly verified the cost of a type of item. It is worth noting that the terms
“primary” and “secondary” are merely used as labels, not to indicate that primary accuracy
measure is psychometrically “primary”.
Due to time constraints, the majority of participants completed the task with exactly one
half of the 100 task items, determined using the third digit in the participants’ ID assigned by
Sona Systems. The first half contained lists 1-9, while the second half included lists 10-19. As
task lists were standardized and randomly-generated, both halves should produce equivalent
measures.
Finally, to examine the effects of semantics on recall of GLT items, an adjusted version
of primary accuracy was computed separately to account for items for which participants
incorrectly recalled a similar-sounding word instead of the item itself.
13
Complex Span Tasks. Participants completed shortened versions of the traditional
Operation, Reading, and Symmetry Span tasks on the computer (Turner & Engle, 1989; Conway
et al., 2005). The mean of the partial span scores is used as the WMC score.
Task Switching Task. A dual task involving the shifting executive function, adapted
from Atkins et al. (2014). Participants began by fixating on the computer display for 500 ms.
Based on a cue that was displayed centrally at variable time intervals (80 ms, 100 ms, 200 ms,
400 ms, and 800 ms), participants were asked to judge a set of digits (1-9, excluding 5) in
magnitude relative to 5 (lower or higher) and in parity (even or odd) as quickly and accurately as
possible using keys on a QWERTY keyboard.
Unlike Atkins et al. (2014)’s design which included “High-Low” and “Parity” as
additional cues that corresponded to Magnitude and Odd-Even trials, the only cues used to
designate the trial type were the words “Magnitude” and “Odd-Even”; this change was intended
to simplify the task for participants unfamiliar with the word, “Parity”, and in turn avoid possibly
contaminating response time measures over English vocabulary. A practice block of 10 trials was
used to ensure participants understand the instructions before moving onto a 100-trial
experimental block.
Each trial was considered a “task repeat” trial if the cue matched that of the previous trial;
otherwise, the trial was considered a “task switch” trial. To assess performance on trials that
require shifting relative to an accuracy baseline, the difference between task repeat and task
switch measures was considered the switching measure.
Flanker Task. The Flanker task is a component of the Attention Network Task (ANT),
the Flanker task requires participants to determine if a target arrow located directly above or
14
below a fixation cross points either left or right, while ignoring distractor stimuli (Eriksen &
Eriksen, 1974). The distractor stimuli is classified as neutral (straight lines), congruent (arrows
pointing in the same direction as the target arrow), or incongruent (arrows pointing in the
opposite direction as the target arrow).
Participants held the computer mouse and respond using their thumbs to left-click if the
target arrow points left or right-click if the target arrow points right. Participants received the
following set of instructions and were asked to respond as quickly as possible without losing
accuracy: “This task will take a bit longer than the other tasks (around half an hour). In this task,
you will see a target arrow pointing to either the left or right. The target arrow will always be
located directly above or directly below the fixation cross. Please use the mouse to indicate
which direction the arrow is pointing: If the target is pointing to the left, click the left key with
your left thumb. If the target is pointing to the right, click the right key with your right thumb.”
Accuracy and response time measures were recorded separately for the two
classifications of the distractor stimuli. The differences between each measure on the
incongruent and congruent trials were used to operationalize EA. Throughout this thesis, this
task is referenced interchangeably as “Flanker task” or “ANT”.
PROCEDURES
To minimize the effect of the order in which tasks were completed, participants were
assigned to begin the study in one of two parts based on the third digit of a unique, randomized
ID assigned to each participant by Sona Systems prior to his or her arrival to the laboratory.
Participants completed the tasks in order as they are listed within each part, and subsequently
they completed the remaining part.
15
Part A. Grocery List Task, Attention Networks Task
Part B. Operation Span, Reading Span, Symmetry Span, Task Switching Task
On the basis of this assignment, participants were directed to a seat at a computer station
upon arriving at the laboratory. Participants were asked to read and agree to the informed consent
form at the beginning of the demographics questionnaire. After consenting to take part in the
study, participants were asked to complete the questionnaire. Afterwards, the experimenter
prompted the participant to begin the tasks in the assigned order.
Every task except the GLT was completed at a computer station in the respective part of
the laboratory. After completing all the tasks in one part, participants completed the remaining
part at a different computer station. After completing both parts of the study, participants were
thanked for their time, awarded credit for their courses, and dismissed. The duration of the study
was approximately 90 minutes.
Notably, the order sequences were not implemented as a fully-randomized control of
order effects, but rather as a compromise. Specifically, given the short amount of time allotted to
data collection in the present study, two order sequences were devised as a means of increasing
the sample by allowing experimenters to administer the tasks to two participants simultaneously
without interruptions while providing some assessment of order effects. Although not every
possible permutation of task completion order is represented in the present study, if the GLT is
highly susceptible to order effects, they may still be detectable in a pairwise comparison of GLT
performance according to order sequence.
16
CROSS-VALIDATION SAMPLE
A sample of 70 undergraduate students from the University of Central Florida completed
the GLT and several other tasks several months prior to the current sample, for the dissertation
that prompted this research (Louie, in press). The individuals sampled ranged from 18 to 32
years of age (M = 18.56, SD = 1.80), and 67.14% were female. A summary of their
demographics can be found in Table 2.
Participants in the cross-validation sample only completed lists 1-9 of the GLT. Thus, the
obtained GLT and WMC measures differs methodologically across studies. Nonetheless,
although measures from a variety of different domains are necessary to produce adequate
estimates of WMC (Conway et al., 2005; Unsworth et al., 2009; Foster et al., 2015), performance
on CSTs generally does not vary across these domains (Kane, Conway, Hambrick, & Engle,
2007). Moreover, assuming both halves of the GLT are equivalent, the primary accuracy
measures should remain comparable. As such, this sample will be used jointly with the current
sample for cross-validation to assess psychometric properties of the GLT.
Table 2: Demographic characteristics, cross-validation sample
Sex
Ethnicity Male Female Totals
White 10 22 32 (44.44%)
Hispanic/Latino 2 9 11 (15.28%)
Black/African American 2 9 11 (15.28%)
Asian 5 4 9 (12.50%)
Mixed/Other 4 5 9 (12.50%)
Totals 23 (31.94%) 49 (68.06%) 72
17
RESULTS
DEMOGRAPHICS
An analysis of demographic effects was conducted on the GLT measures. For primary
accuracy, no significant differences were observed between male and female participants, t(66) =
-0.40, p = n.s.. Likewise, primary accuracy did not significantly differ by ethnicity, F(5, 62) =
1.29, p = n.s.. No Sex × Ethnicity interaction effects reached significance for primary accuracy,
F(9, 58) = 0.73, p = n.s.. For secondary accuracy, no significant differences were observed
between male and female participants, z = 1.71, p = .08. Likewise, no significant differences in
secondary accuracy were observed by ethnicity, χ2(5, N = 68) = 3.51, p = n.s..
A positive correlation between age and primary accuracy approached significance, r(66)
= .23, p = .06. However, no such association was observed with secondary accuracy, r(66) =
-.13, p = n.s.. Primary accuracy was unrelated to GPA, r(66) = -.14, p = n.s.. Likewise, secondary
accuracy was unrelated to GPA, r(66) = .06, p = n.s..
Finally, primary accuracy did not differ by first language (F(5, 62) = 0.98, p = n.s.) or
university major, F(16, 51) = 0.73, p = n.s.. Likewise, secondary accuracy did not differ by first
language (χ2(5, N = 68) = 9.54, p = n.s.) or by university major, χ2(16, N = 68) = 20.82, p = n.s..
RELIABILITY
Order effects. Half of participants began by completing part A of the study (i.e., GLT,
ANT) first; the other half by completing part B (i.e., Operation Span, Reading Span, Symmetry
Span, TST) first. Regardless, no order effects reached significance for primary accuracy (t(66) =
-0.88, p = n.s.), secondary accuracy (z = -0.42, p = n.s.), or any of the other measures used.
18
Internal consistency. Cronbach’s coefficient alpha was calculated for each set of lists
using the order in which items were recalled. For the first half of the task lists, the coefficient
alpha was calculated to be 0.73, while the second half produced a value of 0.81.
Further analyses of 4,802 inter-item correlations revealed 38.57% produced a correlation
coefficient estimate less than or equal to 0, while only 39.32% produced an estimate greater than
or equal to 0.10. Only 30.03% were statistically significantly larger than 0 at the 80% confidence
level. In addition, 17 items produced a median inter-item correlation coefficient of 0 or less.
These items were included in lists 1 (“pasta”), 2 (“cupcake”), 3 (“cinnamon”, “pepperoni”), 4
(“walnuts”), 5 (“pasta”), 6 (“peas”, “cashews”), 7 (“avocado”, “cabbage”, “pickles”), 9
(“croutons”, “broccoli”), 10 (“waffles”), 12 (“corn”, “garlic”), and 13 (“asparagus”). Notably,
“pasta”, the only item to appear twice on the same half (lists 1 and 5), produced a very poor
inter-item correlation with itself (r = -.14).
Figure 1: Median inter-item correlation coefficients for item recall.
Parallel-forms reliability. Thirty-six (52.17%) participants completed GLT lists 10-19,
while 32 (46.38%) completed lists 1-9; one participant completed both halves. Without
considering this participant, primary accuracy did not differ across the split halves, t(67) = -0.57,
p = n.s.. Likewise, secondary accuracy did not differ between task halves, z = -1.43, p = n.s..
19
VALIDITY
Primary accuracy. The distribution of primary accuracy on the GLT was normal, M =
58.80%, Mdn = 60.00%, SEM = 1.32%, p = .06. Primary accuracy was positively correlated with
the mean CST score, r(66) = .32, 95% CI [.09, .52], p < .01. The Reading Span task was the only
CST which produced scores significantly associated with GLT primary accuracy, r(66) = .45,
95% CI [.22, .63], p < .001. By contrast, no significant linear relationship was observed between
primary accuracy and Flanker task accuracy (r(66) = -.06, p = n.s.) or response times, r(66) =
-.01, p = n.s.. Likewise, primary accuracy was unrelated to accuracy (r(66) = .16, p = n.s.) and
response time measures on the Task Switching Task, r(66) = -.18, p = n.s..
The version of primary accuracy adjusted for semantics was 1.45% higher than the non-
adjusted version, t(67) = 6.14, 95% CI [0.98%, 1.92%] p < .0001. However, although adjustment
reduced standard error, primary accuracy was no longer normally-distributed, M = 59.89%, SEM
= 1.28%, p < .05. Moreover, the correlation coefficient between primary accuracy and the mean
WMC score was weakened by the adjustment, r(66) = .28, 95% CI [.05, .49], p = .019.
Secondary accuracy. The distribution of secondary accuracy on the GLT was non-
normal with a skewness of -3.44 and kurtosis of 17.45, M = 93.92%, Mdn = 96.00%, SEM =
1.01%, p < .0001. Secondary accuracy was unrelated to the WMC score, r(66) = .19, p = n.s..
20
Figure 2a: Distribution of GLT primary
accuracy.
Figure 2b: Distribution of GLT secondary
accuracy.
No significant linear relationship was observed between secondary accuracy and Flanker
task accuracy (r(66) = -.22, p = .07) or response times, r(66) = -.09, p = n.s.. Finally, secondary
accuracy was unrelated to both accuracy (r(66) = .10, p = n.s.) and response times measures on
the Task Switching Task, r(66) = .00, p = n.s..
Table 3: Correlation matrix of task measures
GLT ANT TST CSTs
Primary Secondary +
Acc +
RT +
Acc +
RT +
WMC
Primarya - .16 -.06 -.01 .16 -.18 .32**
Secondarya+ .16 - -.22 -.09 .10 .00 .19
Accb+ -.06 -.22 - .03 .13 -.19 -.14
RTb+ -.01 -.09 .03 - .05 -.09 .06
Accc+ .16 .10 .13 .05 - -.03 .09
RTc+ -.18 .00 -.19 -.09 -.03 - -.01
WMCd .32** .19 -.14 .06 .09 -.01 -
**p < .01 +Specifies that coefficients were calculated for non-normal data using Spearman’s
nonparametric method (rs). aDenotes accuracy measurements obtained from the Grocery List Task (GLT). bDenotes measurements obtained from the Attention Network Task (ANT). cDenotes measurements obtained from the Task Switching Task (TST). dDenotes the mean of the partial span score obtained from the Operation Span, Symmetry
Span, and Reading Span tasks.
Note: response time measures are labeled “RT” while accuracy measures appear as “Acc”.
21
WMC disorders. Among the individuals recruited in the study, five had Attention
Deficit Disorder (ADD) or Attention Deficit Hyperactivity Disorder (ADHD), two had
Obsessive-Compulsive Disorder (OCD), and one was dyslexic.
For primary accuracy, the standardized z-scores obtained from these individuals ranged
from -0.61 to 1.46. Combined, individuals with WMC disorders had significantly lower primary
accuracy than individuals without WMC disorders, t(66) = -1.82, p < .05 (one-tailed). Due to the
small sample size of participants with WMC disorders, however, no within-group differences
reached significance for analyses of WMC deficiencies according to disorder, F(4, 3) = 0.61, p =
n.s.. In addition, those with untreated WMC disorders performed comparably to those who
receive treatment, t(6) = -0.81, p = n.s..
For secondary accuracy, no significant differences were detected between individuals
with WMC disorders and individuals without WMC disorders, z = -1.46, p = n.s..
Regression analysis. A multiple linear regression was calculated to predict standardized
GLT primary accuracy with WMC (i.e., the mean WMC span) and EA (i.e., accuracy on the
Flanker task portion of the ANT) as predictors. A significant regression equation was found, F(3,
64) = 5.49, p < .01, f 2 = .26. Accordingly, this is the model endorsed by the regression analysis:
GLT Primary(z) = (0.15)WMC + (0.07)EA + (-4.45 × 10-4)WMC×EA - 2.50
Overall, the model accounted for approximately 16.74% of the variance in GLT primary
accuracy z-scores. Expected primary accuracy z-scores correlated strongly with observed
primary accuracy z-scores, r(66) = .45, 95% CI [.24, .62], p < .001. However, the distribution of
standardized residuals obtained from the model was non-normal, M = 3.19 × 10-17, Mdn = 0.06,
SD = 0.89, p = .03626.
22
According to the model, a 1-point increase in mean WMC score was associated with a
GLT primary accuracy increase of 0.15 SD, all other factors held constant, t(64) = 3.69, p < .001,
d = 0.92. Likewise, a 1-point increase in ANT accuracy was associated with a GLT primary
accuracy increase of 0.07 SD, all other factors held constant, t(64) = 2.27, p < .05, d = 0.57. In
addition, the main effect of WMC on primary accuracy depends on EA and vice versa, t(64) = -
2.62, p < .05, d = -0.66.
Table 4: Regression analysis of mean WMC span score and ANT accuracy as predictors of
standardized GLT primary accuracy
Source Estimate (β) SE t* p(|t| ≥ t*)
Intercept -2.497100 0.714891 -3.493 .000872***
WMCa 0.149421 0.040464 3.692 .000462***
EAb 0.065975 0.029107 2.267 .026799*
WMC*EA -0.004467 0.001704 -2.622 .010914*
*p < .05
***p < .001 aDenotes the mean WMC partial span score. bDenotes the mean accuracy difference between incongruent and congruent trials on the Flanker
task portion of the Attention Networks Task (ANT).
CROSS-VALIDATION
A Welch t-test revealed that there were no significant differences in primary accuracy
between the present sample and the cross-validation sample, t(133.25) = -0.24, p = n.s..
Likewise, no significant differences were found between samples in secondary accuracy, z =
0.60, p = n.s.. However, WMC measures for the cross-validation sample were significantly lower
than those of the current of study, Md = 2.52, t(138) = 4.52, 95% CI [1.42, 3.62], p < .0001. ANT
accuracy was also lower for the cross-validation than for the current sample, Mdnd = 3.00, z = -
3.11, 95% CI [-6.00, -1.00], p < .01.
23
Figure 3: Distributions of standardized GLT primary accuracy.
Note: the current sample appears violet, while the cross-validation sample appears in red.
Regression analysis. As in the previous regression analysis, a multiple linear regression
was calculated to predict standardized GLT primary accuracy based on WMC and EA measures
in the cross-validation sample. Given the significant differences in WMC and EA between the
samples, the analysis was conducted separately rather than by pooling the samples.
A significant regression equation was found, F(2, 69) = 6.85, p < .01, f 2 = .20.
Accordingly, this is the model endorsed by the regression analysis:
GLT Primary(z) = (0.09)WMC + (-0.02)EA - 0.80
Overall, the model accounted for approximately 14.15% of the variance in GLT primary
accuracy z-scores. Expected primary accuracy z-scores correlated strongly with observed
primary accuracy z-scores, r(70) = .41, 95% CI [.19, .58], p < .001. Also, the distribution of
standardized residuals obtained from the model was approximately normally distributed, M =
1.29 × 10-17, Mdn = -0.07, SD = 0.91, p = n.s..
Based on this model, a 1-point increase in mean WMC score was associated with a GLT
primary accuracy increase of 0.07 SD, all other factors held constant, t(69) = 2.51, p < .05, d =
24
0.60. In the cross-validation sample, the main effect of ANT accuracy failed to reach
significance, t(68) = 1.94, p = .05701. However, the main effect of WMC on primary accuracy is
dependent upon EA, t(69) = -3.31, p < .01, d = -0.80.
Table 5: Regression analysis of mean WMC span score and ANT accuracy as predictors of
standardized GLT primary accuracy, cross-validation sample
Source Estimate (β) SE t* p(|t| ≥ t*)
Intercept -0.8000176 0.4431461 -1.805 .07539
WMC 0.073701 0.0387765 2.511 .01438*
WMC*EA -0.0018551 0.0005607 -3.309 .00149**
*p < .05
**p < .01
ITEM RECALL PATTERNS
Finally, an analysis of recall patterns for every GLT list was conducted using locally-
weighted scatterplot smoothening (LOWESS) curves of the recall rank mode (i.e., the most
frequently-occurring order in which a particular item was recalled along the sequence of items
recalled by participants) and difficulty index (p) (i.e., proportion of participants who correctly
recalled the item) (Miller & Lovler, 2016) in order to identify lists in which item recall suggested
evidence of the serial-position effects (Capitani, et al., 1992). As lists 16 and 19 only have two
items each, they were excluded from this analysis. Regardless, the plots and curves for item
recall patterns appear in Appendix D.
The criteria for resembling the serial-position effect simply requires that for any given
list, the LOWESS curves of the difficulty index and recall rank mode resemble a U-shaped curve
about the middle items such that the first and last items are recalled relatively more frequently
(i.e., high value of p) and ahead of (i.e., low recall rank mode) other items on the list (Capitani, et
al., 1992). Using these criteria, evidence of the primacy effect was observed in the difficulty
25
index curves of every list except list 4, as well as in the recall rank mode curve of lists 1, 3-6, 9,
and 13. The recency effect was observed in the difficulty index curves of lists
2-11, 13, 15, 17, and 18, as well as in the recall rank mode curve of lists 1, 2, 5, 8, 12, and 15.
Overall, the primacy effect was observed in some capacity in every list, while the recency effect
was only entirely absent from list 14.
26
DISCUSSION
The body of psychometric evidence presented in this thesis provides support for the GLT
as an alternative measure of WMC to traditional CSTs. Below is an evaluation of the GLT’s
strengths and weaknesses as a complex span task.
THEORETICAL IMPLICATIONS
Hypothesis 1. Despite halving the sample size associated with analyses of internal
consistency, splitting the task showed success in producing two equivalent forms of the GLT,
each unaffected by order effects and above the conventional lower-bound threshold of 0.70 for
internal consistency in CSTs (Conway et al., 2005). Although Cronbach’s coefficient alpha is
subject to biases as the number of items increases (Okada 2013; Vaske, Beaman, & Sponarski,
2017), it serves as a preliminary, conservative estimate of the GLT’s internal consistency
(Cronbach, 1951).
Hypothesis 2. As expected, individuals with clinical WMC disorders produced
significantly lower primary accuracy on the GLT than other participants, although the same
differences were not present in secondary accuracy. Due to the small number of participants with
WMC disorders, however, not much else can be said about within-group differences according to
disorder or treatment.
Moreover, GLT accuracy measures failed to replicate age-linked differences in WMC in
the current sample (Clark et al., 2018). Although the significance test for this analysis
approached significance, it is likely that the variability in age of the current sample was too
restricted to detect a significant correlation between task measures and age.
27
As predicted, none of the other demographics factors were related to GLT measures,
suggesting that measurements obtained from the GLT are equivalent across demographic groups.
Hypotheses 3-6. The GLT primary accuracy measure was normally-distributed, allowing
for calculations of z-scores in regression analyses, and it was related to several of the other
measures. By contrast, secondary accuracy was non-normal and unrelated to any predictors. As
such, it was excluded from further analyses as a WMC measure. However, as most participants
had very high accuracy and low variability on this partial span, it may be useful to use secondary
accuracy to gauge whether participants are completing the task correctly or merely guessing
randomly. In fact, the participants who were excluded from the present sample for not
completing the tasks correctly produced much lower secondary accuracy than others.
As hypothesized, the GLT (specifically, primary accuracy) was strongly related to the
mean score obtained from other CSTs, especially the one most closely related in design: the
Reading Span task. This result is interpreted as convergent evidence of validity for GLT primary
accuracy. In addition, both GLT measures were unrelated to measures obtained from the Task
Switching Task, as expected of CSTs which are unrelated to shifting. Thus, this result provides
discriminant evidence of validity for the GLT measures. Combined, these results are supportive
of the GLT’s construct validity.
Although the correlation between GLT primary accuracy and EA was non-significant,
regression analyses for both samples revealed a significant interaction effect between WMC and
EA. The model fit for the current sample revealed a significant main effect of EA which was
non-significant when WMC was excluded from the model; the same main effect was only
marginally-significant in the cross-validation sample.
28
Regardless, these model fits suggest that variability in the GLT primary accuracy
measure can be explained by tasks which involve the updating and inhibition executive functions
(Miyake et al., 2000). Like other CSTs, the GLT was unrelated to the shifting executive function,
operationalized as measures obtained from Task Switching Task (Miyake et al., 2000).
Accordingly, these convergent and discriminant provide evidence of validity via established
relationships with other latent variables. In other words, GLT primary accuracy and its observed
relationships with other variables were consistent with an established theoretical framework of
working memory (i.e., Miyake et al. (2000)’s executive functions model).
Hypothesis 7. The cross-validation analysis revealed that GLT measures did not differ
across samples, providing evidence in favor of the GLT as a standardized measure. Likewise, the
regression model observed in the present sample was replicated with the cross-validation sample.
Unfortunately, however, the null differences in GLT accuracy were observed along with
statistically-significant differences in WMC and EA between samples, suggesting that the GLT
may not be currently as sensitive as other measures in detecting individual differences.
Although mixed, these preliminary findings suggest the GLT has potential as a
standardized measure WMC with further validation. In addition, although the current study
proposed using the GLT as a standardized measure of WMC, the task may have applicability
elsewhere in cognitive research given that many of the list recall patterns closely resemble the
patterns expected of the serial-position effect (Capitani et al., 1992).
PRACTICAL IMPLICATIONS
The evidence presented in this thesis is mostly supportive of the GLT as a higher-fidelity
measure of WMC. Researchers oriented towards applied cognitive research can benefit from the
29
advantages offered by the GLT over traditional CSTs and use the task to engage the working
memory system in studies of vehicular systems, aviation, user interface, or any activity that
restricts participants to verbal responses. For visually-impaired individuals, the GLT may serve
as an estimate of WMC comparable to that of other CSTs. Finally, with further improvements to
task items, the GLT may be capable of detecting WMC impairments for participants with mental
or learning disorders.
LIMITATIONS
Order effects. The control of order effects was limited to two distinct order sequences in
the present study for practical reasons. Although no order effects were observed between the two
order sequences, the present study does not conclusively eliminate the possibility of order effects
on GLT measures.
Ethnicity and first language. When considering the GLT’s reliability and potential
generalizability, however, it is worth noting that although the effects of ethnicity and first
language were reportedly null in the present sample, the task items are entirely in English and the
task was administered to an overwhelmingly English-speaking (82.61%) and White (55.07%)
sample. Thus, it is possible that further studies may observe significant differences in GLT
accuracy by first language or region due to a lack of familiarity with the vocabulary and
semantics of the items presented in the list.
Shifting. The shifting executive function was operationalized using an adaptation of
Atkins et al. (2014)’s Task Switching Task (TST). Aside from the face validity of task design,
not much is known about the psychometric properties of the TST.
30
Regression analyses. In the interest of examining construct-related evidence of validity,
a confirmatory factor analysis (CFA) would have been a method of studying the relationships
between latent variables in this study and the GLT (Miller & Lovler, 2016). However, the
minimum sample size required for this procedure is approximately 300 (VanVoorhis & Morgan,
2007) and data collection was limited to the summer semester. Although this approach was
methodologically unfeasible for the present study, it may describe the relationships observed in
the regression analysis in more depth.
Cross-validation. The sample obtained for cross-validation does not fit the common
definition of a cross-validation sample, given that these individuals were undergraduate students
recruited from the same university, and the sample size was approximately the same as that of
the current sample. As such, this analysis may be viewed as a quasi-between-subjects analysis of
test-retest reliability rather than cross-validation.
Although no significant differences were found between GLT measures, the samples
were not equivalent in EA and WMC, limiting the scope of conclusions reached by analysis.
FUTURE RESEARCH
In conducting this psychometric analysis of the GLT, the main intent was not to formally
create a task, but rather to demonstrate that the task design paradigm of grocery lists shows
promise in producing higher-fidelity measures of WMC. In fact, I highly suspect that future
research involving the GLT would likely involve modifications of the items on the lists, such as
improving internal consistency by conducting a formal item analysis on a larger sample or
revising problematic items to strengthen the weak inter-item correlations observed in the present
31
study. In turn, these improvements may improve the task’s sensitivity in detecting individual
differences in WMC, as was observed in the cross-validation analysis.
The vocabulary used in the GLT may contaminate some of the obtained measures in
larger samples with participants unfamiliar with certain grocery items. As such, other
psychometrically-oriented researchers may opt to create region- or language-specific items to
minimize the potential effects of semantics on measures obtained from the GLT. Likewise, they
may opt to use adjustments similar to the one proposed for item recall in which similar-sounding
words were recalled instead of the list items, although this particular approach was unsuccessful
in strengthening the relationship with WMC.
Notably, along the process of data collection for this study, participants frequently
reported that some of the items “stood out” to them or were difficult to remember due to their
lack of familiarity with the items, suggesting that the salience effect may have occurred with
some items (Platzer & Bröder, 2012). These occurrences warrant further study as the salience
effect is related to attentional factors (Dowd & Mitroff, 2013), which given the interaction
between WMC and executive attention reported in the regression analysis, certainly make the
salience effect a potential moderator of primary accuracy measures obtained from the GLT.
For cross-validation, future studies may recruit a much larger sample obtained than the
one used, and from a different participant population. As mentioned, a confirmatory factor
analysis would be a more suitable method for future researchers to test how the GLT adheres to
theoretical models of WMC on a larger sample. Coupled with this approach, researchers may opt
to add an external measure of the inhibiting executive function. In addition, it may be worth
exploring what item-level and list-level factors produce the recall pattern indicative of the serial-
32
position effect exhibited in the recall lists, as well as whether age-related differences can be
observed in a sample that is more variable in age.
Finally, it is also foreseeable that an automated version of the GLT may be created with
superior administration characteristics, such as the ability to record and score participant
responses and response times using voice-recognition software (from which online adaptations
can be made), expanding and modifying the grocery item vocabulary pool, minimizing
experimenter bias, providing feedback, and procedurally-generating lists for computer-adaptive
testing (CAT) (Miller & Lovler, 2016).
.
33
APPENDIX A: IRB APPROVAL LETTER
34
APPENDIX A: IRB APPROVAL LETTER
35
36
APPENDIX B: GLT INSTRUCTIONS AND SAMPLE LISTS
37
APPENDIX B: GLT INSTRUCTIONS AND SAMPLE LISTS
Instructions: “In this task, imagine you are going to the grocery store and need to pick up a list
of items. Each item comes in a varying number of boxes. I will tell you the price of each item,
and how many boxes you need to buy. Please verify the total cost of each item by saying ‘Yes’
or ‘No’. We will give you four seconds to respond before moving on to the next item. At the end
of the list, I will ask you for all the items.”
Sample List 1
You need to pick up 2 boxes of $2 bacon. $4? [Yes / No] (Correct response = “Yes”)
3 boxes of $9 onion. $21? [Yes / No] (Correct response = “No”)
2 boxes of $3 garlic. $3? [Yes / No] (Correct response = “No”)
1 box of $9 tuna. $9? [Yes / No] (Correct response = “Yes”)
Please recall the list. (Correct response = “bacon, onion, garlic, and tuna”)
Sample List 2
1 box of $7 corn. $7? [Yes / No] (Correct response = “Yes”)
4 boxes of $1 chocolate. $8? [Yes / No] (Correct response = “No”)
2 boxes of $2 chicken. $5? [Yes / No] (Correct response = “No”)
Please recall the list. (Correct response = “corn, chocolate, and chicken”)
38
APPENDIX C: GLT LISTS
39
APPENDIX C: GLT LISTS
Lists 1-9
Cost Item Question
Correct
answer
List 1
(7 items)
0:00 min
2 boxes of $1 Jello $2 YES
6 boxes of $4 Pasta $24 YES
6 boxes of $7 Berries $36 NO
7 boxes of $1 Mango $7 YES
1 box of $5 Pecans $6 NO
2 boxes of $4 Celery $8 YES
9 boxes of $6 Salami $54 YES
List 2
(3 items)
1:02 min
2 boxes of $1 Parfait $3 NO
3 boxes of $1 Cupcake $4 NO
6 boxes of $7 Cinnamon $49 NO
List 3
(3 items)
1:29 min
3 boxes of $1 Pepperoni $4 NO
3 boxes of $1 Yeast $2 NO
4 boxes of $8 Basil $40 NO
List 4
(3 items)
1:56 min
6 boxes of $4 Walnuts $24 YES
4 boxes of $5 Almonds $25 NO
5 boxes of $1 Zucchini $6 NO
40
List 5
(5 items)
2:24 min
5 boxes of $3 Pasta $15 YES
1 box of $8 Carrots $8 YES
5 boxes of $9 Coleslaw $45 YES
9 boxes of $9 Tangerine $81 YES
4 boxes of $4 Mushrooms $20 NO
List 6
(8 items)
3:09 min
2 boxes of $9 Tea $27 NO
6 boxes of $2 Onion $12 YES
3 boxes of $9 Figs $36 NO
9 boxes of $7 Peas $70 NO
6 boxes of $8 Cashews $48 YES
4 boxes of $4 Plums $12 NO
7 boxes of $1 Strawberries $14 NO
3 boxes of $5 Bacon $20 NO
List 7
(8 items)
4:22 min
1 box of $7 Spinach $7 YES
5 boxes of $6 Jam $36 NO
2 boxes of $4 Nutmeg $8 YES
6 boxes of $2 Ravioli $12 YES
9 boxes of $4 Avocado $45 NO
8 boxes of $8 Cabbage $64 YES
5 boxes of $2 Pickles $15 NO
6 boxes of $4 Vanilla $24 YES
List 8
(6 items)
5:36 min
5 boxes of $5 Cucumber $30 NO
1 box of $4 Grapes $4 YES
6 boxes of $4 Oregano $24 YES
6 boxes of $9 Oatmeal $54 YES
7 boxes of $5 Lettuce $35 YES
4 boxes of $3 Biscuit $16 NO
41
List 9
(7 items)
6:31 min
2 boxes of $3 Artichoke $6 YES
7 boxes of $4 Cheese $5 NO
5 boxes of $5 Peppers $30 NO
2 boxes of $7 Vinegar $21 NO
8 boxes of $5 Croutons $32 NO
1 boxes of $7 Broccoli $14 NO
5 boxes of $6 Margarine $36 NO
Cost Item Question
Correct
answer Lists 10-19
List 10
(7 items)
7:35 min
6 boxes of $9 Radish $54 YES
9 boxes of $2 Paprika $18 YES
8 boxes of $6 Flour $48 YES
9 boxes of $4 Lentil $45 NO
1 boxes of $8 Tamarind $16 NO
4 boxes of $2 Spinach $8 YES
6 boxes of
$1 Waffles $12 NO
List 11
(4 items)
8:40 min
1 boxes of $1 Cherries $2 NO
3 boxes of $6 Olives $18 YES
9 boxes of $8 Almonds $81 NO
7 boxes of $8 Chicken $56 YES
List 12
(5 items)
9:16 min
5 boxes of $8 Corn $40 YES
4 boxes of $2 Tomatoes $8 YES
1 boxes of $1 Ginger $1 YES
2 boxes of $7 Coconut $14 YES
8 boxes of $7 Garlic $56 YES
42
List 13
(6 items)
10:02 min
9 boxes of $4 Salmon $45 NO
6 boxes of $3 Apples $18 YES
7 boxes of $7 Chocolate $42 NO
2 boxes of $1 Sausage $2 YES
1 boxes of $8 Orange $1 NO
2 boxes of $7 Asparagus $21 NO
List 14
(6 items)
10:56 min
2 boxes of $1 Tuna $2 YES
1 boxes of $9 Anchovies $9 YES
4 boxes of $4 Croissants $4 NO
4 boxes of $1 Kimchi $4 YES
6 boxes of $6 Halibut $42 NO
2 boxes of $8 Sauerkraut $16 YES
List 15
(8 items)
11:51 min
5 boxes of $8 Horseradish $8 NO
2 boxes of $6 Nectarines $18 NO
9 boxes of $1 Cranberries $9 YES
1 boxes of $4 Blueberries $4 YES
5 boxes of $1 Crabmeat $1 NO
4 boxes of $2 Quinoa $12 NO
3 boxes of $7 Couscous $21 YES
9 boxes of $5 Hazelnut $45 YES
List 16
(2 items)
13:05 min
5 boxes of $9 Tortillas $45 YES
6 boxes of $2 Shallots $12 YES
43
List 17
(6 items)
13:22 min
5 boxes of $4 Prunes $25 NO
2 boxes of $8 Cilantro $2 NO
4 boxes of $1 Wasabi $8 NO
4 boxes of $5 Hummus $25 NO
8 boxes of $5 Molasses $40 YES
7 boxes of $3 Apricots $21 YES
List 18
(4 items)
14:22 min
7 boxes of $5 Lemongrass $35 YES
5 boxes of $6 Applesauce $36 NO
5 boxes of $5 Barley $5 NO
1 boxes of $2 Hushpuppies $2 YES
List 19
(2 items)
15:00 min
8 boxes of $2 Rigatoni $24 NO
8 boxes of $5 Seasoning $40 YES
44
APPENDIX D: ITEM RECALL PATTERNS
45
APPENDIX D: ITEM RECALL PATTERNS
List 1
List 2
46
List 3
List 4
47
List 5
List 6
48
List 7
List 8
49
List 9
List 10
50
List 11
List 12
51
List 13
List 14
52
List 15
List 16
53
List 17
List 18
54
List 19
55
REFERENCES
Atkins, S. M., Sprenger, A. M., Colflesh, G. H., Briner, T. L., Buchanan, J. B., Chavis, S. E.,
Chen, S.Y., Iannuzi, G.L., Kashtelyan, V., Downling, E., Harbison, J.I., Bolger, D.J.,
Bunting, M.F., & Dougherty, M. R. (2014). Measuring working memory is all fun and
games: A four-dimensional spatial game predicts cognitive task performance.
Experimental Psychology, 61(6), 417-438. doi:10.1027/1618-3169/a000262.
Baddeley, A. D., & Hitch, G. (1974). Working memory. In G.H. Bower (Ed.), The psychology of
learning and motivation: Advances in research and theory (Vol. 8, pp. 47–89). New
York: Academic Press.
Baddeley, A.D. (1986). Working memory. New York, NY, US: Clarendon Press/Oxford
University Press.
Baddeley, A.D. (2009). The functional approach to autobiographical memory. Applied Cognitive
Psychology, 23(8), 1045-1049. doi:10.1002/acp.1608.
Baddeley, A.D. (2012). Working memory: Theories, models, and controversies. Annual Review
of Psychology, 6, 1-30. doi:10.1146/annurev-psych-120710-100422.
Brauer, J., Neumann, J., & Friederici, A. D. (2008). Temporal dynamics of perisylvian activation
during language processing in children and adults. NeuroImage, 41(4), 1484-1492.
Brydges, C. R., Ozolnieks, K. L., & Roberts, G. (2017). Working memory – not processing
speed – mediates fluid intelligence deficits associated with attention deficit/hyperactivity
disorder symptoms. Journal of Neuropsychology, 11(3), 362-377. doi:10.1111/jnp.12096.
Capitani, E., della Sala, S., Logie, R. H., & Spinnler, H. (1992). Recency, primacy, and memory:
Reappraising and standardising the serial position curve. Cortex: A Journal Devoted to
56
the Study of the Nervous System and Behavior, 28(3), 315-342. doi:10.1016/S0010-
9452(13)80143-8.
Clark, K. M., Hardman, K. O., Schachtman, T. R., Saults, J. S., Glass, B. A., & Cowan, N.
(2018). Tone series and the nature of working memory capacity
development. Developmental Psychology, 54(4), 663-676. doi:10.1037/dev0000466.
Cohen, G., & Conway, M. A. (2008). Memory in the Real World., (3rd ed). New York, NY, US:
Psychology Press.
Colbert, A. M. (2018). Working memory and symptoms of attention deficit hyperactivity
disorder in children. Dissertation Abstracts International, 78.
Conway, A. A., Kane, M. J., Bunting, M. F., Hambrick, D. Z., Wilhelm, O., & Engle, R. W.
(2005). Working memory span tasks: A methodological review and user's guide.
Psychonomic Bulletin & Review, 12(5), 769-786. doi:10.3758/BF03196772.
Conway, A. A., Jarrold, C., Kane, M. J., Miyake, A., & Towse, J. N. (2007). Variation in
working memory. New York, NY, US: Oxford University Press.
Cowan, N. An embedded-processes model of working memory. In: Miyake A, Shah P, editors.
Models of working memory: Mechanisms of active maintenance and executive control.
Cambridge, UK: Cambridge University Press; 1999. pp. 62–101.
Cowan, N. (2001). The magical number 4 in short-term memory: A reconsideration of mental
storage capacity. Behavioral and Brain Sciences, 24, 87-114.
Cowan, N. (2005). Working memory capacity. Hove, East Sussex, UK: Psychology Press.
Cowan, N., Morey, C. C., & Chen, Z. (2007). The legend of the magical number seven.
In S. Della Sala, S. Della Sala (Eds.) , Tall tales about the mind & brain: Separating fact
57
from fiction (pp. 45-59). New York, NY, US: Oxford University Press.
doi:10.1093/acprof:oso/9780198568773.003.0005.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika. 16(3):
297–334. doi:10.1007/bf02310555.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and
reading. Journal of Verbal Learning & Verbal Behavior, 19(4), 450-466.
doi:10.1016/S0022-5371(80)90312-6.
Drew, T., Boettcher, S. P., & Wolfe, J. M. (2016). Searching while loaded: Visual working
memory does not interfere with hybrid search efficiency but hybrid search uses working
memory capacity. Psychonomic Bulletin & Review, 23(1), 201-212. doi:10.3758/s13423-
015-0874-8.
Dowd, E. W., & Mitroff, S. R. (2013). Attentional guidance by working memory overrides
salience cues in visual search. Journal of Experimental Psychology: Human Perception
And Performance, 39(6), 1786-1796. doi:10.1037/a0032548.
Dowd, E. W., Kiyonaga, A., Beck, J. M., & Egner, T. (2014). Probability of guessing, not
precision, changes in mixture models of visual working memory during cognitive control
of attentional guidance. Visual Cognition, 22(8), 1027-1030.
doi:10.1080/13506285.2014.960669.
Engle, R. W., Kane, M. J., & Tuholski, S. W. (1999). Individual differences in working memory
capacity and what they tell us about controlled attention, general fluid intelligence, and
functions of the prefrontal cortex. In A. Miyake, P. Shah, A. Miyake, P. Shah (Eds.),
Models of working memory: Mechanisms of active maintenance and executive control
58
(pp. 102-134). New York, NY, US: Cambridge University Press.
doi:10.1017/CBO9781139174909.007.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in
Psychological Science, 11, 19-23. doi:10.1111/1467-8721.00160.
Engle, R. W., & Kane, M. J. (2004). Executive attention, working memory capacity, and a
two-factor theory of cognitive control. In B. H. Ross, B. H. Ross (Eds.), The
psychology of learning and motivation: Advances in research and theory, (Vol. 44, pp.
145-199). New York, NY, US: Elsevier Science.
Eriksen, B. A., & Eriksen, C. W. (1974). Effects of noise letters upon the identification of a
target letter in a nonsearch task. Attention, Perception, & Psychophysics, 16, 143–149.
Fan, J., McCandliss, B. D., Sommer, T., Raz, A., & Posner, M. I. (2002). Testing the efficiency
and independence of attentional networks. Journal of Cognitive Neuroscience, 14, 340-
347.
Felez-Nobrega, M., Foster, J. L., Puig-Ribera, A., Draheim, C., & Hillman, C. H. (2017).
Measuring working memory in the Spanish population: Validation of a multiple
shortened complex span task. Psychological Assessment. doi:10.1037/pas0000484.
Foster, J. L., Shipstead, Z., Harrison, T. L., Hicks, K. L., Redick, T. S., & Engle, R. W. (2015).
Shortened complex span tasks can reliably measure working memory capacity. Memory
& Cognition, 43(2), 226-236. doi:10.3758/s13421-014-0461-7.
Friedman, N. P., Miyake, A., Corley, R. P., Young, S. E., DeFries, J. C., & Hewitt, J. K. (2006).
Not all executive functions are related to intelligence. Psychological Science, 17, 172-
179.
59
Garavan, H. (1998). Serial attention within working memory. Memory & Cognition, 26(2),
263-276. doi:10.3758/BF03201138.
Gathercole, S. E., & Alloway, T. P. (2008). Working memory and classroom learning. In S. K.
Thurman, C. A. Fiorello, S. K. Thurman, C. A. Fiorello (Eds.), Applied cognitive
research in K–3 classrooms (pp. 17-40). New York, NY, US: Routledge/Taylor &
Francis Group.
Gonthier, C., Thomassin, N., & Roulin, J. (2016). The composite complex span: French
validation of a short working memory task. Behavior Research Methods, 48(1), 233-242.
doi:10.3758/s13428-015-0566-3.
Groen, G. J., & Patel, V. L. (1988). The relationship between comprehension and reasoning in
medical expertise. In M. H. Chi, R. Glaser, M. J. Farr, M. H. Chi, R. Glaser, M. J. Farr
(Eds.), The nature of expertise (pp. 287-310). Hillsdale, NJ, US: Lawrence Erlbaum
Associates, Inc.
Heenan, A., Herdman, C. M., Brown, M. S., & Robert, N. (2014). Effects of conversation on
situation awareness and working memory in simulated driving, Human Factors, 56,
1077-1092.
Heitz & Engle (2007). Focusing the spotlight: Individual differences in visual attention control.
Journal of Experimental Psychology: General, 136, 217-240.
Hicks, K. L., Foster, J. L., & Engle, R. W. (2016). Measuring working memory capacity on the
web with the online working memory lab (the OWL). Journal of Applied Research in
Memory and Cognition, 5(4), 478-489. doi:10.1016/j.jarmac.2016.07.010.
Just, M. A., & Carpenter, P. A. (1992). A capacity theory of comprehension: Individual
60
differences in working memory. Psychological Review, 99(1), 122-149.
doi:10.1037/0033-295X.99.1.122.
Kane, M. J., & Engle, R. W. (2003). Working-memory capacity and the control of attention: The
contributions of goal neglect, response competition, and task set to Stroop interference.
Journal of Experimental Psychology: General, 132(1), 47-70. doi:10.1037/0096-
3445.132.1.47.
Kane, M. J., Hambrick, D. Z., Tuholski, S. W., Wilhelm, O., Payne, T. W., & Engle, R. W.
(2004). The generality of working memory capacity: A latent-variable approach to
verbal and visuospatial memory span and reasoning. Journal of Experimental
Psychology: General, 133(2), 189-217. doi:10.1037/0096-3445.133.2.189.
Kane, M. J., Conway, A. R. A., Hambrick, D. Z., & Engle, R. W. (2007). Variation in working
memory capacity as variation in executive attention and control. In A. R. A Conway, C.
Jarrold, M. J. Kane, A. Miyake, & J. N. Towse (Eds.), Variation in Working Memory.
NY: Oxford.
Larkin, J., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance
in solving physics problems. Science, 208, 1335-1342.
Linares, R., Bajo, M. T., & Pelegrina, S. (2016). Age-related differences in working memory
updating components. Journal of Experimental Child Psychology, 14739-52.
doi:10.1016/j.jecp.2016.02.009.
Logan, G.D. (1997). Automaticity and reading: Perspective from the instance theory of
automatization. Reading and Writing Quarterly, 13, 123-146.
61
Louie (in press). Predicting distracted driving: The role of individual differences in working
memory. Applied Ergonomics.
Manassis, K., Tannock, R., Young, A., & Francis-John, S. (2007). Cognition in anxious children
with attention deficit hyperactivity disorder: a comparison with clinical and normal
children. Behavioral and Brain Functions, 3, 4. doi: 10.1186/1744-9081-3-4.
McMillan, K. M., Laird, A. R., Witt, S. T., & Meyerand, M. E. (2007). Self-paced working
memory: Validation of verbal variations of the n-back paradigm. Brain Research, 1139,
133-142. doi:10.1016/j.brainres.2006.12.058.
Miller, L. A., & Lovler, R. L. (2016). Foundations of psychological testing: A practical
approach (5th ed.). Thousand Oaks, CA, US: Sage Publications, Inc.
Miyake, A., Friedman, N.P., Emerson, M.J., Witzki, A.H., Howerter, A., & Wager T.D. (2000).
Unity and diversity of executive functions and their contribution to complex ‘frontal
lobe’ tasks: A latent variable analysis. Cognitive Psychology, 41, 49-100.
doi:10.1006/cogp.1999.0734.
Oberauer, K. (2002). Access to information in working memory: Exploring the focus of
attention. Journal of Experimental Psychology: Learning, Memory, and Cognition,
28(3), 411-421. doi:10.1037/0278-7393.28.3.411.
Oberauer, K., Lewandowsky, S., Farrell, S., Jarrold, C., & Greaves, M. (2012). Modeling
working memory: An interference model of complex span. Psychonomic Bulletin &
Review, 19(5), 779-819. doi:10.3758/s13423-012-0272-4.
Okada, K. (2013). Is omega squared less biased? A comparison of three major effect size indices
in one-way ANOVA. Behaviormetrika, 40(2), 129-147.
62
Oswald, F. L., McAbee, S. T., Redick, T. S., & Hambrick, D. Z. (2015). The development of a
short domain-general measure of working memory capacity. Behavior Research
Methods, 47(4), 1343-1355. doi:10.3758/s13428-014-0543-2.
Owen, A. M., McMillan, K. M., Laird, A. R., & Bullmore, E. (2005). N-back working memory
paradigm: A meta-analysis of normative functional neuroimaging studies. Human
Brain Mapping, 25(1), 46-59. doi:10.1002/hbm.20131.
Pardo-Vázquez, J. L., & Fernández-Rey, J. (2008). External validation of the computerized
group administrable adaptation of the 'operation span task.'. Behavior Research Methods,
40(1), 46-54. doi:10.3758/BRM.40.1.46.
Pashler, H. (1998). The Psychology of Attention. Cambridge, MA: MIT Press.
Platzer, C., & Bröder, A. (2012). Most people do not ignore salient invalid cues in memory-
based decisions. Psychonomic Bulletin & Review, 19(4), 654-661. doi:10.3758/s13423-
012-0248-4.
Posner, M. I., Snyder, C. R., & Davidson, B. J. (1980). Attention and the detection of signals.
Journal of Experimental Psychology: General, 109(2), 160-174. doi:10.1037/0096-
3445.109.2.160.
Redick, T. S., & Lindsey, D. B. (2013). Complex span and n-back measures of working memory:
A meta-analysis. Psychonomic Bulletin & Review, 20(6), 1102-1113.
doi:10.3758/s13423-013-0453-9.
Redick, T. S., & Engle, R. W. (2006). Working memory capacity and the Attention Network
Test performance. Applied Cognitive Psychology, 20, 713–721.
63
Schroeder, R. W., Twumasi-Ankrah, P., Baade, L. E., & Marshall, P. S. (2012). Reliable Digit
Span: A systematic review and cross-validation study. Assessment, 19(1), 21-30.
doi:10.1177/1073191111428764.
Sohn, Y., & Doane, S., (2003). Roles of working memory capacity and long-term working
memory skill in complex task performance. Memory & Cognition, 31, 458-466.
Tu, H.W., Hampton, R.R. (2014). Control of working memory in rhesus monkeys (Macaca
mulatta). Journal of Experimental Psychology: Animal Learning and Cognition, 40, 467-
476.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent?. Journal of
Memory and Language, 28(2), 127-154. doi:10.1016/0749-596X(89)90040-5.
Unsworth, N., Redick, T. S., Heitz, R. P., Broadway, J. M., & Engle, R. W. (2009). Complex
working memory span tasks and higher-order cognition: A latent-variable analysis of the
relationship between processing and storage. Memory, 17(6), 635-654.
doi:10.1080/09658210902998047.
Vaske, J. J., Beaman, J., & Sponarski, C. C. (2017). Rethinking internal consistency in
Cronbach's alpha. Leisure Sciences, 39(2), 163-173.
doi:10.1080/01490400.2015.1127189.
Velichkovsky, B. B. (2017). The concentric model of human working memory: A validation
study using complex span and updating tasks. Psychology in Russia: State of the Art,
10(3), 74-92.
64
Wilson VanVoorhis, C. R. & Morgan, B. L. (2007). Understanding power and rules of thumb for
determining sample sizes. Tutorials in Quantitative Methods for Psychology, 3(2), 43-50.
doi:10.20982/tqmp.03.2.p043.