Assessing Information and Communications Technology Literacy for Higher Education
Irvin R. Katz, David M. Williamson, Heather L. Nadelman, Irwin Kirsch, Russell G. Almond, Peter L. Cooper, Margaret L. Redman, and Diego Zapata
Educational Testing Service
Abstract
Whereas few would argue about the growing importance of information and
communication technology (ICT) skills as they affect educational attainment, workforce
readiness, or lifelong learning, there is less agreement as to what these skills and
knowledge are and how best to measure them. This paper describes an effort to define
ICT Literacy and to design and develop an assessment that measures ICT Literacy for the
higher education community. We begin with a summary of efforts to understand ICT
Literacy as it pertains to educational achievement in a variety of higher educational
settings. We use this foundation to describe how Evidence Centered Design (ECD)
methodology is being used to design and develop a simulations-based assessment that
serves both individuals and higher education institutions. We follow with a presentation
of the challenges and tensions that exist when implementing such an assessment design in
the form of operational simulation tasks. We conclude with a discussion of the strategies
and challenges of automatically scoring such an assessment using Bayesian networks.
Acknowledgements
We thank the members of the National Higher Education Information and
Communication Technology Initiative for their continued support and contributions to
this project. We also thank Dan Eignor, Aurora Graf, and Pat Kyllonen for their
comments on an earlier version of this manuscript.
Assessing Information and Communications Technology Literacy for Higher Education
We must rise above the obsession with the quantity of information and the speed of transmission, and focus on the fact that the key issue for us is our ability to organize information once it has been amassed, to assimilate it, to find meaning in it and assure its survival.
Vartan Gregorian, White House Conference on School Libraries
June 4, 2002
Preparing young adults to meet the challenges of the future is a vital part of any
educational system. For many (if not most) of these young adults that future will include
information and communication technologies, both those familiar to us today and those
not yet envisioned. Such technologies are becoming increasingly important in people’s
everyday lives and that presence will most certainly expand in coming years. No longer
relegated to specialized workplace settings or jobs, Information and Communications
Technology (ICT) competencies are now projected by the U.S. Department of Labor to
be required in eight out of the ten fastest growing occupations (Ellis, 2001). Even beyond
the workplace, the ways in which we access and manage information and communicate
with one another in schools, at home, and in the community have become increasingly
technology relevant. Whether one is gathering information about a political candidate,
purchasing an item over the internet, using a simulation tool to learn or better understand
a new concept, managing personal finances, looking up information in an electronic
database or communicating with friends or colleagues, the evidence of this surrounds us.
Recognizing the growing importance of information and communication
technologies in all aspects of people’s lives, ETS convened an international panel in
January of 2001. Covering a 15-month period, the deliberations of this international
panel resulted in a set of recommendations and assumptions about the transformative
nature of ICT competencies. In addition to taking initial steps in laying out a definition
and framework for ICT literacy, the international panel noted in their recommendations
that ETS and others should begin to work with governments and agencies to develop
measures of ICT literacy. The rationale for such measures, according to the panel, was
grounded in a number of key issues of concern to policy makers and practitioners in the
education community.
ICT is changing the very nature and value of knowledge and information. The growth of information and digital communication technologies, including capabilities for networking and shared environments, is changing the nature of social interactions and collaborative endeavors. Digital technology, in all its forms, allows information to be continuously available and adapted for different uses. Computers, handheld personal digital assistants (PDAs), on-line resources, networks and mobile telephone systems allow us to extend the reach of our cognitive capabilities and communication. Participating in this digital world is fast becoming a necessary condition for successful participation in society.
ICT literacy, in its highest form, has the potential to change the way we live, learn and work. Higher levels of ICT literacy have the potential to transform the lives of individuals who develop these requisite skills and knowledge. Just as researchers have shown that compulsory schooling and literacy lead to changes in how individuals learn and think, future research might show similar advantages resulting from the development and application of ICT literacy skills. For example, researchers studying reading and writing have noted that different cultures and groups may engage in different kinds of literacy practices (Heath, 1980; Scribner & Cole, 1981; Szwed, 1981). The cognitive behaviors connected with these various practices have been associated with the acquisition of different types of knowledge and skills. The transformative nature of information and communication technologies might similarly influence and change not only the kinds of activities we perform at school, at home and in our communities but also how we engage in those activities. As with reading and writing, ICT has the potential to change how we think and learn, advantaging not just the individuals who acquire these skills and knowledge but societies as a whole.
ICT literacy cannot be defined primarily as the mastery of technical skills. The concept of ICT literacy should be broadened to include critical cognitive skills such as reading, numeracy, critical thinking and problem solving and the integration of those skills with technical skills and knowledge. Because of the importance of these underlying cognitive skills, current levels of literacy, critical thinking and problem solving might present a barrier to the attainment of ICT literacy. There are strikingly low levels of general literacy around the world. Even within many OECD countries, there are many young people who fail to develop adequate levels of literacy (OECD, 2001a). Without such skills, it seems doubtful that comprehensive ICT literacy can be attained.
There is a lack of information about the current levels of ICT literacy both within and among countries. Meaningful data from large-scale global assessments and
from diagnostic tests designed to inform governments, schools, and private sector organizations and consortiums will be crucial in understanding the breadth and gaps in ICT literacy across the world. These data should be important in analyzing the outcomes and effectiveness of current policies and educational programs, as well as in identifying potentially new and more effective strategies.
If information and communication technologies are changing the very nature of
how we live, think, and learn, what are the consequences of lacking skills in this domain?
The negative implications are potentially numerous, not just for individuals but for
societies as a whole. Gary Becker, a Nobel Prize winner in economics, recently noted
“human capital is by far the most important form of capital in modern societies” (Becker,
2002). In the emerging global economy, individuals and nations with these skills will
most likely prosper while those lacking them will struggle to compete.
As stated in a recent report titled The Well-Being of Nations (OECD, 2001b),
human capital is made up of the knowledge, skills and attitudes that facilitate the creation
of personal, social and economic well-being. Recent data from national and international
surveys show that, in addition to obtaining and succeeding in a job, literacy and
numeracy skills are also associated with the likelihood that individuals will participate in
lifelong learning, keep abreast of social and political events, and vote in state and national
elections. These data also suggest that literacy is likely to be one of the major pathways
linking education and health and may be a contributing factor to the disparities that have
been observed in the quality of health care in developed countries. Thus, the non-
economic returns to literacy and schooling in the form of enhanced personal well-being
and greater social cohesion have been viewed by some as being as important as the
economic and labor-market returns. According to some, ICT is becoming an essential
literacy for the 21st Century (Partnership for 21st Century Skills, 2003).
Despite widespread consensus about the need for ICT literacy among young
adults, there is little information available to tell us the dimensions of the need or what
might be done to address it. As noted by the international panel, this can be attributed to
the almost exclusive concentration of research on access to technology. In this country
and abroad, countless studies have sought to measure (and thereby close) the “digital
divide” between those who have access to computer hardware, software, and networks,
and those who do not. Access is obviously important, but increased exposure to
technology does not automatically lead to increased ability to use it. Access is not the
same as understanding.
What is urgently needed, then, is an assessment program that will make it possible
to determine whether (or to what extent) young adults have obtained the combination of
technical and cognitive skills needed to be productive members of an information-rich,
technology-based society.
As part of its subsequent work in the area of ICT literacy, Educational Testing
Service (ETS) has joined forces with seven leading college and university systems in the
United States to create the National Higher Education Information and Communication
Technology Initiative. The purpose of this initiative is to design and incorporate scenario-based, computer-delivered and computer-scored tasks into one or more tests of ICT competencies
that reflect the integration of technology and cognitive skills. This paper provides an
overview of the process that was followed in conceptualizing and developing this test.
The project is an ongoing effort that targets the development of a large-scale
survey assessment of ICT literacy for release in 2005 and an accompanying measure of
individual student ICT Literacy for 2006. The project began with an understanding that,
before ICT literacy can be effectively measured, it must be sufficiently defined so that an
appropriate assessment can be designed. The National Higher Education ICT Initiative
committee took the view that ICT literacy is predominantly a cognitive activity, with a
fundamental technical capability required in order to execute cognitive strategies in
information retrieval, use, and dissemination. Based on the work of the international
panel described earlier, the committee developed the following definition of ICT literacy:
ICT Literacy is the ability to appropriately use digital technology, communication tools, and/or networks to solve information problems in order to function in an information society. This includes the ability to use technology as a tool to research, organize, and communicate information and having a fundamental understanding of the ethical/legal issues surrounding accessing and using information.
This definition emphasizes the importance of cognitive activities in acquiring,
interpreting, and disseminating information; the supporting nature of basic technical
competence; and the relevance of operating with an understanding of the legal and
ethical implications for society.
Once a suitable definition of ICT literacy was agreed upon, the task of assessment
design and development began in earnest. For this effort, we employed the methodology
of Evidence Centered Design (ECD).
Evidence Centered Design of an Assessment of ICT Literacy for Higher Education
Evidence centered design (ECD) is a methodology employed at Educational
Testing Service that emphasizes a logical and explicit representation of an evidence-
based chain of reasoning for assessment design. Such a chain of reasoning provides a
portion of the construct validity evidence for assessment use. ECD emphasizes this
evidential reasoning from the purpose of assessment, to the proficiencies of interest for
measurement, to the evidence required to support hypotheses about such proficiencies.
This process highlights implications of such evidential requirements for task design, and
the nature of performances that must be tracked and recorded to provide such evidence.
The process of ECD centers around four key questions:
Foundations: Who is being measured and why are we measuring them? What claims will we be making about people on the basis of this assessment?
Proficiencies: What proficiencies of people do we want to measure to make appropriate claims from the assessment?
Evidence: How will we recognize and interpret observable evidence of these proficiencies so that we can make these claims?
Tasks: Given limitations on test design, how can we design situations that will elicit the valid and reliable observable evidence we need?
Ultimately, the ECD process addresses these questions and develops models for
the assessment design on the basis of these issues. Resultant models include:
Proficiency Model - defines the constructs of interest for the assessment and their interrelationships
Evidence Models - define how observations of behavior are considered as evidence of proficiency
Task Models - describe how assessment tasks must be structured to ensure opportunities to observe behaviors constituting evidence.
These interrelated models comprise a chain of reasoning for an assessment design
that connects the design of assessment tasks to evidence of proficiencies targeted by the
assessment. Further details on ECD methodology may be found in Mislevy, Steinberg,
& Almond (2003).
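To make the relationship among these three models concrete, the sketch below represents each as a minimal data structure. This is purely illustrative: the class and field names are our own, and the ECD literature does not prescribe any particular encoding.

```python
from dataclasses import dataclass, field

@dataclass
class ProficiencyVariable:
    """A construct of interest, e.g., the 'Access' subproficiency."""
    name: str
    levels: list  # e.g., ["AboveBasic", "Basic", "BelowBasic"]

@dataclass
class EvidenceRule:
    """Maps an observable behavior to the proficiencies it informs."""
    observable: str   # e.g., "quality_of_syntax"
    informs: list     # names of the ProficiencyVariables it updates

@dataclass
class TaskModel:
    """Specifies what a task must present and capture to yield evidence."""
    stimulus: str                                       # material shown to the test taker
    work_products: list = field(default_factory=list)  # e.g., log file, saved resources
    evidence_rules: list = field(default_factory=list)  # EvidenceRule objects

# A toy instance mirroring the search task described later in the paper:
access = ProficiencyVariable("Access", ["AboveBasic", "Basic", "BelowBasic"])
rule = EvidenceRule("quality_of_syntax", informs=["Access"])
task = TaskModel(stimulus="Assignment sheet from instructor",
                 work_products=["log file", "selected resources"],
                 evidence_rules=[rule])
```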
Foundations
The first stage of this process is to explore the foundations of the assessment
design: to understand why we are measuring and what interests and populations the
measure is intended to serve. This process, in which the expert committee played a key
role, was folded into discussions on the nature of the construct itself. Several key areas
of discussion had implications for subsequent assessment design, including the purpose
of the assessment, nature of the construct, characteristics of the test users, and the nature
of the population. Each of these foundational decisions was based on the needs of the higher
education community the test is intended to serve.
The committee agreed that the purpose of the assessment is to determine the
degree to which students are sufficiently ICT literate to use digital technology,
communication tools, and/or networks to solve information problems likely to be
encountered in most common academic and workplace situations. Potential uses
envisioned for the assessment include:
Understanding student ICT literacy, including comparisons of literacy levels between groups of interest
Informing resource allocation at the institution regarding course offerings, such as a basic ICT literacy course, or curriculum content.
Advising individual students regarding the potential benefits of enrollment in a basic ICT literacy course.
Advising students on their preparedness to enter academic years, courses of study, or particular courses, based on the level of ICT literacy associated with success in these endeavors
For each of these potential applications, appropriate score reports could be
developed. Consumers of the score reports might include academic administrators,
academic advisors, and individual students.
The interests of the committee also drove the definition of the testing population.
Examinees would consist entirely of students enrolled in a college or university, with a
likely emphasis on students progressing from their sophomore to junior years, in four-
year institutions, or graduating from junior colleges. There is also the potential for use by
entry-level workforce examinees, although the test was specifically designed to avoid any
direct reflection of particular software products or platforms.
The format of the test was driven both by committee needs and measurement
constraints. For logistical reasons, the committee determined that the assessment should
not take more than 2 hours—implications of this constraint are discussed in the section on
Task Production, below. The definition of ICT Literacy presented earlier implied that it
would be difficult for simple multiple-choice tasks to address higher-order, integrative
cognitive skills and that computerized simulations would be required to target the kinds
of proficiencies suggested. Such tasks would require specific targeting of cognitive
proficiencies of interest through simulated tasks. These tasks would also require
automated scoring if they are to provide score reports within a timeframe to allow proper
use of assessment results in advisement. The use of automated scoring also allows the
operational use of the assessment for a more modest fee than would be required under
human scoring.
Finally, to discourage use of this assessment as a software product certification,
the design called for a simple and generic technical interface driven by the needs for
cognitive strategy rather than technical proficiency. This aspect of design is also intended
to allow the assessment to remain relevant during continuing evolution and variation of
future commercial software products.
These decisions were folded into a design timeline that incorporates a versioning
approach to assessment design and release. The first use of test tasks will be during field
trials occurring during the latter half of 2004. The field trials will serve as operational
tests for the administrative infrastructure, delivery, automatic scoring, and task
appropriateness. The initial release that incorporates score reporting will be in early 2005
as an institutional survey that will not report scores for individual students. By
administering an institutional survey first, we collect additional data for empirical study
of reliability, validity and other characteristics required for individual decision-making.
The institutional survey release will be used for college and university decision-making
regarding educational needs and design of curricula. In early 2006, the assessment will
be administered as an instrument for individual decision-making for students while
remaining in service as an institutional survey. With this release, individuals’ scores will
be reported to aid academic advising on students’ enrollment decisions, such as an
individual’s decision whether to take ICT courses.
Proficiency Model
With foundations established for the assessment, the next step is to establish
assessment design models in a formal sense. This effort begins with the development of
the proficiency model, provided as a graphic representation in Figure 1. This model illustrates the concepts previously discussed, expressed in a way that represents the makeup of the proficiencies targeted by the assessment design.
Figure 1. Higher Education ICT Proficiency Model
As specified in Figure 1, each of the seven subproficiencies includes cognitive,
technical, and social/ethical issues in the definition. The seven subproficiencies are
further defined as:
Define: The ability to use ICT tools to identify and appropriately represent an information need.
Access: The ability to collect and/or retrieve information in digital environments. This includes the ability to identify likely digital information sources and to get the information from these sources.
Manage: The ability to apply an existing organizational or classification scheme for digital information. This ability focuses on reorganizing existing digital information from a single source using pre-existing organizational formats. This includes the ability to identify preexisting organization schemes, select appropriate scheme(s) for the current usage, and to apply the scheme(s).
Integrate: The ability to interpret and represent digital information. This includes the ability to use ICT tools to synthesize, summarize, compare, and contrast information from multiple digital sources.
Evaluate: The ability to determine the degree to which digital information satisfies the needs of the task in ICT environments. This includes the ability to judge the quality, relevance, authority, point-of-view/bias, currency, coverage, or accuracy of digital information.
Create: The ability to generate information by adapting, applying, designing, or inventing information in ICT environments.
Communicate: The ability to communicate information properly in its context of use for ICT environments. This includes the ability to gear electronic information for a particular audience and to communicate knowledge in the appropriate venue.
Evidence Models
With these targeted proficiencies defined, the goal of the evidence model is to
describe how we would optimally evaluate the level of ability in each of these areas of
proficiency. The evidence model develops through several steps:
1) Consider perfect opportunities for naturalistic observations, assuming no constraints and error-free observation
2) Identify sources of evidence in these situations and their value in understanding individual ability
3) List characteristics of these observations and the circumstances under which they are observed that are critical for discriminating among levels of ability
4) Document the characteristics of these observations that most clearly distinguish among these levels of ability
The end result of this evidence modeling process is a formal structure that
represents valued evidence for a proficiency. This structure can be used to inform the
development of tasks that elicit the necessary evidence. An example of a task designed to
target particular proficiencies is a search task in which students are asked to locate
resources (e.g. articles, web pages) relevant to a research issue (Figure 2).
This task screen illustrates how a task can target specific aspects of proficiency.
This task was designed to assess both Access and Evaluate proficiencies. As outlined
earlier, the access proficiency is defined as the ability to collect and/or retrieve
information in a digital environment. This proficiency is targeted by requiring the
student to access information from the database using the search engine provided (the
results are tracked and strategies scored based on how a student searches for information,
such as key words, sequential refined searches, etc.). The evaluate proficiency is the
ability to identify the degree to which digital information meets the needs of the task.
The proficiency is targeted by requiring the student to select resources to use as
references that meet a specific information need (student choices are tracked and scored
based on tagged characteristics of the sources they choose, including authority, currency,
relevance, etc.). In combination, these tasks evaluate a student's ability to locate and separate “wheat from chaff” with respect to an information need in a searchable database.
Figure 2. Search Screen from a Sample ICT Assessment Task
Figure 3. Portion of an Evidence Model for Access and Evaluate Abilities
Figure 3 provides a sample Evidence Model illustrating how this example
provides evidence that informs our beliefs about student proficiency in Access and
Evaluate. Such a model, in combination with interpretation rules, indicates the
characteristics that must be observed in a performance and how these characteristics are
valued as evidence of targeted abilities. Table 1 represents the mechanics of scoring to
produce the values for “Quality of syntax” and “Quality of selected resources,” which are
two of the five performance variables (quality of search terms; quality of search results;
quality of syntax; use of delimiting terms; and quality of selected resources) in Figure 3.
In subsequent sections, we describe how empirical values are assigned to these models
for scoring.
Table 1. Scoring Table Illustrating How Observable Data Are Determined

Observable: Quality of syntax (work product: search terms)
  High:   Uses AND in first web search
  Medium: Does not use AND in first web search, but uses AND in a subsequent web search
  Low:    Does not use AND

Observable: Quality of selected resources (work product: selected resources)
  High:   All of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
  Medium: At least 80%, but less than 100%, of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
  Low:    Less than 80% of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
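To make the mechanics of Table 1 concrete, the following is a minimal sketch of the two rules in code. The function names and input formats are our own assumptions about how the parsed work products might be represented; the operational rules run against the assessment's actual log files and resource tags.

```python
def score_quality_of_syntax(searches):
    """Score search syntax per Table 1.

    `searches` is an ordered list of search strings parsed from the log file.
    """
    if not searches:
        return "Low"
    if "AND" in searches[0].split():
        return "High"            # uses AND in the first web search
    if any("AND" in s.split() for s in searches[1:]):
        return "Medium"          # uses AND only in a subsequent search
    return "Low"                 # never uses AND

def score_quality_of_selected_resources(resources):
    """Score selected resources per Table 1.

    `resources` is a list of dicts carrying the 1-5 point ratings on the five
    tagged characteristics assigned to each resource during task development.
    """
    criteria = ("authority", "objectivity", "coverage", "timeliness", "relevance")
    if not resources:
        return "Low"
    perfect = [all(r[c] == 5 for c in criteria) for r in resources]
    fraction = sum(perfect) / len(perfect)
    if fraction == 1.0:
        return "High"
    if fraction >= 0.8:
        return "Medium"
    return "Low"

# Example: AND appears only in the second search -> "Medium"
print(score_quality_of_syntax(["college costs", "college AND tuition"]))
```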
Task Models
The process of producing task models is based on the needs defined by the
evidence models. The circumstances of optimal evidence are adapted to the constraints
of the test environment to produce models for task design that specify a number of
elements for task production. These include the nature of observable data that must be
collected, proficiencies that these observable data inform, nature of ICT tools required to
perform the task, cognitive distinctions targeted by the task, elements of construct
represented in task design, and delivery requirements for the task itself. Figure 4 provides
an example of an abbreviated task model for the example task in Figure 2.
PROFICIENCIES: Access, Evaluate

OBSERVABLE FEATURES
  Quality of search terms with respect to level of specificity in the initial search and in subsequent searches in response to initial search results
  Use of delimiting terms
  Use of syntax
  Quality of search results (i.e., results returned, not resources selected)
  Authority, relevance, objectivity, coverage, and currency of resources selected

MATERIAL PRESENTED TO THE TEST TAKER
  Stimulus: Assignment sheet from instructor
  Critical parts of the tool functionality and interface: Resource interfaces must have search boxes, help links, limiters (filters) for scholarly content and other common limiters (filters), the ability to mark, save, and email results, and advanced search capability

WORK PRODUCT SPECIFICATIONS
  Constructed response
  Log file of student work (includes search terms and syntax, search results)
  Selected resources (saved or emailed)

TASK MODEL VARIABLES
  Directedness of the demand (words in the stimulus): search terms explicit and specific (easier); search terms explicit and general (moderate); search terms implied (harder). Note: more general search terms in the stimulus tend to drive the need for successive searches.
  Number of limiters (filters) in resource interfaces (number of limiters supported by the database interface): 1, 2, 3, or 4. Note: more limiters make the search easier for skilled people and have no effect for unskilled people.
  Syntax in most elegant search (optimal syntax terms): AND, OR, NOT, NEAR, or none. Note: more restrictive syntax for the optimal search may make the search easier for skilled searchers; a requirement of multiple syntax terms for optimal searches may make the search easier for skilled searchers.
Figure 4. Abbreviated Model for the Example Task (see Figure 2)
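Read as a specification, the task model variables in Figure 4 define a small configuration space from which task variants can be generated. The sketch below encodes that space; the variable names and the enumeration are our own illustration, not part of the design documents.

```python
from itertools import product

# Illustrative encoding of the task model variables in Figure 4.
DIRECTEDNESS = ["explicit_specific",   # easier
                "explicit_general",    # moderate
                "implied"]             # harder
NUM_LIMITERS = [1, 2, 3, 4]            # limiters (filters) supported by the interface
OPTIMAL_SYNTAX = ["AND", "OR", "NOT", "NEAR", "none"]

# Enumerate the space of task variants this model can instantiate:
variants = list(product(DIRECTEDNESS, NUM_LIMITERS, OPTIMAL_SYNTAX))
print(len(variants))  # 3 * 4 * 5 = 60 possible settings
```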
With task models developed, the chain of reasoning from the construct definition
and purpose of assessment to the evidence required for supporting assessment use and the
elements of task design required for providing the evidence is complete. The next stage
is the development of tasks that meet the evidential requirements of the design.
Task Production for ICT Literacy Assessment using Automatically Scored Simulation Tasks
The challenge in producing tasks from the ECD framework is to meet the
simultaneous and often conflicting constraints of construct targeting, psychometric
soundness, automated scoring capability, and realistic, relevant, and engaging task
settings within a two-hour assessment window. This section describes how such
challenges were addressed in task production for this assessment. This task creation effort
required balancing six issues:
Naturalistic tasks vs. Principles of Measurement
Familiar vs. Academic Context
Cognitive vs. Technical Emphasis
Technical Fidelity vs. Fairness
Construct Definition vs. Fairness Guidelines
Automated Scoring vs. Unconstrained Work
Naturalistic Tasks vs. Principles of Measurement
An initial challenge in this design was balancing the naturalistic characteristics of
sophisticated and rich environments, implying a high degree of interdependence of
actions within a simulated environment, with measurement requirements of conditional
independence and reliability estimation. This challenge was resolved by constructing
assessment forms that mix complex tasks, requiring more sophisticated and cognitively demanding problem solving, with relatively simple problem-solving tasks.
The more complex problems provide evidence concerning four subproficiencies, while
simple tasks are targeted to inform a single subproficiency. In total, the blueprint calls
for 16 tasks per form for individual student measures, with a total of 61 observable pieces
of data, and is expected to take less than 2 hours to complete. Table 2 shows the
distribution of tasks in a test form using this design. This form design allows for both the
collection of data from rich testing environments requiring substantial cognitive effort
and the collection of multiple observations on a variety of individual abilities to bolster
reliability estimates and scale definition for subscales.
Table 2. Tasks Comprising a Form for Individual Student Assessment

Task Complexity   Number in Individual   Typical Observables   Expected Completion Time
                  Student Test Form      per Task              per Task (minutes)
Simple            13                     3                     4
Moderate          2                      5                     15
Complex           1                      12                    30
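As a quick check on the blueprint arithmetic, treating the "typical" per-task figures in Table 2 as exact, the totals work out as follows (our own verification sketch, not part of the blueprint):

```python
blueprint = {  # complexity: (tasks per form, observables per task, minutes per task)
    "Simple":   (13, 3, 4),
    "Moderate": (2, 5, 15),
    "Complex":  (1, 12, 30),
}
tasks       = sum(n     for n, _, _ in blueprint.values())  # 13 + 2 + 1 = 16 tasks
observables = sum(n * o for n, o, _ in blueprint.values())  # 39 + 10 + 12 = 61 observables
minutes     = sum(n * m for n, _, m in blueprint.values())  # 52 + 30 + 30 = 112 min (< 2 hours)
print(tasks, observables, minutes)
```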
Familiar vs. Academic Context
Another challenge is the tension between a task performance context that is
engaging and comfortable for students and a context that is academically challenging.
The former might be argued to facilitate ICT-based problem solving in the abstract, as students commonly engage in such situations in informal environments, while the latter
might be argued to be more relevant to the ultimate criterion of performance within a
strictly academic environment. In the case of this assessment, the test is being designed
to provide a balance of both academic and non-academic contexts. Subsequent empirical
analyses will investigate the extent to which the context impacts performance and its
relationship with established external validity criteria.
Cognitive vs. Technical Emphasis
The target construct measured by the ICT literacy assessment is the effectiveness
with which students integrate cognitive strategies and technical performance. A
challenge in designing such an assessment is how to represent and balance the relative
emphasis on cognitive elements with technical performance in a way that maintains the
stated goal of emphasizing cognitive aspects of performance. In such a design, the ability
to technically implement solutions is a precondition to success in the cognitive aspects of
task performance. However, by undergoing a test development process that documents
the extent to which each task requires technical vs. cognitive elements, this aspect is
tracked and balanced for the assessment form. In addition, for technical requirements,
the tasks are specifically designed to require only basic technical functionality. In this
way, the simulated tools mitigate the potential for limitations in students’ technical
proficiency to obscure important aspects of targeted cognitive skill.
Technical Fidelity vs. Fairness
A technical challenge is to balance the goal of providing realistic simulated
technical tools with the goal of eliminating any unfair advantage some students may have
as a result of experience with a particular commercial software package. Whereas
realism is a targeted aspect of the assessment design, so too is an effort to create a
“generic” testing environment that does not privilege users of one operating system over
those of another. The solution implemented was to develop “stripped-down” word
processor, spreadsheet, email, file manager, presentation, and search engine tools that
contain general menu options common to most applications, but not specific to any. This
ensures that no students have an unfair advantage by making the interface equally
unfamiliar to all students. Elements of this interface will be made available to all students
prior to taking the assessment via test preparation materials.
Construct Definition vs. Fairness Guidelines
An interesting challenge encountered in this development effort that is not
typically encountered in task design is a tension between elements of the construct and
ETS fairness guidelines. The construct of ICT literacy explicitly includes the awareness
of and appropriate behavior with respect to the ethical and social issues of ICT usage for
information problem solving, yet efforts to incorporate this into task development can
easily conflict with ETS fairness guidelines that prohibit the use of potentially offensive
or upsetting material, which is often the very basis of such ethical and social issues in
ICT usage. This issue was resolved in task design, but it required delicate navigation of how potential issues are presented and how a student would be asked to resolve them.
Automated Scoring vs. Unconstrained Work
A final balance is between the development of realistic, scenario-based tasks that
are nevertheless completely scorable by automated scoring systems. Achieving this
balance requires a degree of constraint over the way in which students complete tasks,
particularly those tasks that require word processing or spreadsheet manipulation tools.
A completely free-format task interface would allow users to enter responses that are not
scorable by current computer technology, but a restricted format would reduce such tasks
to purely technical procedures that assess stepwise tool usage rather than the relevant
cognitive abilities. The solution implemented was to apply c-rater (Leacock &
Chodorow, 2003) content scoring technology to score short-answer free text entry tasks
in which moderate constraints are applied to ensure scorability with this technology.
Given the successful (and ongoing) navigation of these challenges, the natural
question is, of course, how such innovative assessment tasks are scored in a way that is
completely automated and consistent with the ECD assessment design. The next section
outlines the current progress and planning for scoring this assessment.
Scoring the Simulations-Based ICT Literacy Assessment
As one might imagine, the scoring of such an assessment is not an afterthought,
but is part of the overall assessment design process. The evidence models specify the
nature of scoring required and a statistical method (or multiple methods) is selected for
eventual implementation prior to task production. This approach allows the task
production to be conducted in a manner that is designed to be consistent with
expectations for scoring.
In this instance, the objective for scoring is straightforward: how can we best
statistically model the value of evidence from observable elements of performance to
update our belief about student ability? There are a number of challenges implicit in this
objective that must be addressed by the scoring mechanism. These include:
Multidimensional proficiency model – assessing many proficiencies in a single task
Multiple scorable elements per task – extracting multiple aspects of a single task performance for scoring
Conditional dependence – scorable elements of tasks are not completely independent as a result of appearing in a common context
As a result, our scoring method must allow for multidimensional proficiency
models, must be able to accommodate information from multiple sources of evidence,
and must have a mechanism for representing the fact that some sets of observable
elements of performance may share covariance unrelated to the primary construct of
interest as well as sharing covariance that is related to the construct of interest. That is,
the model must be capable of expressly representing assumptions of conditional
independence between variables as well as modeling conditional dependence
relationships. Of these conditionally dependent relationships the model must be capable
of specifying how these conditional dependencies contribute to, or are modeled as
distinct from, targeted proficiencies. This is a situation similar to commonly known
issues with sets of multiple-choice reading comprehension items that refer to a common
reading passage, but with a potentially increased degree of induced dependence.
In some ways, the prior models of assessment design establish the scoring model.
The proficiency model establishes the targets of the assessment and therefore, the latent
variables that the statistical model must be able to accommodate. In the case of this
assessment there are seven subproficiencies for which tasks provide evidence. By design,
there are multiple instances where a single task informs several subproficiencies. The
task model design specifies characteristics of tasks, and performance on these tasks
constitutes evidence. In turn, the evidence models specify how the evidence is weighted
and combined with other evidence to inform estimates of ability. Together, these models
provide conceptual relationships between observations and proficiency estimates. It is
therefore the role of the statistical model to apply numeric values that implement these
evidential relationships as a scoring model linking observations and estimates of ability.
Such scoring models have two components:
evidence identification – the process of determining what elements of the task performance constitute evidence and summarizing their values
evidence accumulation – the process of aggregating evidence to update estimates of ability in the proficiency model
Evidence Identification
Evidence identification for this assessment is initially a rule-based approach that
parses the work products that are produced by the students into observable elements of
performance that can be automatically scored. Essentially, this set of logical rules
determines which aspects of student performance are relevant to scoring and which are
not: the rules define evidence as distinct from data. The rules specify how we
characterize elements of performance in meaningful ways. An abbreviated example of
such a rule summary is provided in Table 1.
The procedure for implementing evidence identification rules consists of:
Recording the performance
Parsing the work product
Production of observable variables
Prior specification of observable variables
Empirical modification of observable variables
The recording is a technical requirement in that the interface must initially be
capable of tracking relevant actions that students take and recording them in a log file.
Obviously, if a technical limitation precludes the tracking of some behavior of
importance in the interface, that behavior cannot be used in subsequent scoring.
Once some information is recorded in the log file, it must be parsed to extract
relevant information from the raw data collected during administration. For example, if
time is important as a measure of efficiency for some task performance, the log file
would provide timestamps associated with key actions taken. Subsequent parsing
would require that the relevant actions from the sequence of steps taken be identified in
the log file and the elapsed time computed between these actions. This extraction and
computation of elapsed time represents the initial parsing of the log file for relevant
evidence used in scoring.
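As an illustration of this parsing step, the sketch below computes the elapsed time between two tagged actions in a log file. The log format shown, one timestamped action per line, is entirely our assumption; the actual log structure is internal to the delivery system.

```python
from datetime import datetime

# Hypothetical log format: "ISO-timestamp<TAB>action", one action per line.
SAMPLE_LOG = """\
2004-09-15T10:02:03\tsearch_submitted
2004-09-15T10:02:41\tresults_viewed
2004-09-15T10:04:19\tresource_selected
"""

def elapsed_seconds(log_text, start_action, end_action):
    """Return seconds between the first occurrences of two logged actions."""
    times = {}
    for line in log_text.splitlines():
        stamp, action = line.split("\t")
        times.setdefault(action, datetime.fromisoformat(stamp))
    return (times[end_action] - times[start_action]).total_seconds()

print(elapsed_seconds(SAMPLE_LOG, "search_submitted", "resource_selected"))  # 136.0
```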
Production of observable variables requires using this parsed information to
derive summary variables (observables) used in scoring. For example, once the elapsed
time of some action is parsed, it may be characterized in an observable as “fast” or
“slow” based upon some cutpoint in the elapsed time distribution. Alternatively, it might
be algorithmically combined with the elapsed time on other steps of the problem-solving
process and/or number of steps undertaken to solve the problem to produce an observable
representing “efficiency of performance” in problem solving. This process of
summarizing the parsed data in observable variables is analogous to the stage of multiple-
choice scoring in which a parsed response (A, B, C or D) is converted into a dichotomous
observable taking on the value of “correct” or “incorrect” by an algorithm comparing the
response to a key. Also, just as in multiple-choice testing, the conversion of a parsed
response to an observable often involves loss of information. In the case of multiple
choice, this loss of information is in using only “correct” or “incorrect” in scoring rather
than the option selected (in some multiple choice tests the option selected is retained in
scoring to infer the kinds of mistakes and misconceptions of the examinee). In this
assessment, the loss of information occurs in summarizing one variable, or a combination of variables, as a single polytomous variable.
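Continuing the example, a parsed elapsed time and step count might be summarized into a polytomous "efficiency of performance" observable as sketched below. The cutpoints are placeholder values that, as discussed next, would have to be revisited against field-trial data.

```python
def efficiency_observable(elapsed_seconds, n_steps,
                          time_cut=120.0, step_cut=6):
    """Summarize parsed data as a polytomous 'efficiency' observable.

    The cutpoints are illustrative placeholders, not calibrated values.
    """
    fast = elapsed_seconds <= time_cut
    direct = n_steps <= step_cut
    if fast and direct:
        return "High"
    if fast or direct:
        return "Medium"
    return "Low"

print(efficiency_observable(136.0, 4))  # "Medium": slow, but few steps taken
```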
In the absence of any large-sample empirical data, the initial specification of the
evaluation rules is based on the logical expectations and requirements of the assessment
design. These decisions are based on the opinion of subject matter experts, assessment
designers and developers, and a small sample of pilot test data. As a result, the
particular cutpoints and algorithms for observables (e.g., the cutpoint on elapsed time to
be classified as “fast” or “slow” or the particular rules that combine variables into a
polytomous “efficiency of performance” variable) may not be optimal for the purposes of
the assessment or may make assumptions that, in light of empirical data, are found to be
false. Therefore, these must always be treated as tentative until they can be subjected to
empirical evaluation and modification on the basis of large-scale field trials. Once such
larger samples of performance are available the assumptions and decisions of the design
team (subject matter experts and assessment designers) must be revisited. Part of this
process includes conducting corollaries of classical item analyses (e.g. percent correct,
option analysis, correlations between scoring element and total performance, etc.) on the
observables. On the basis of these results and comparisons to viable alternative cutpoints
and combinations of variables the evidence identification rules that produce observables
are modified to better serve the purpose of assessment and the now-known performance
characteristics of the population of interest. Such modification is not limited to
recombination of existing variables but includes the creation of new observables not
previously defined and extracting new variables of interest from the assessment log files.
Naturally, this implies iterative cycles of evidence identification on the field trial data to
produce the new values of observables to ensure that they are more appropriate than
former versions. Once such analyses and decisions are complete, the evidence
identification scoring algorithms may be considered finalized for operational use.
Evidence Accumulation via Bayesian Networks
The evidence accumulation process takes these observable variables and specifies
how they constitute evidence of abilities in the proficiency model. As such, the evidence
accumulation engine of scoring is responsible for drawing inferences about the students
on the basis of identified evidence. For this scoring process we use Bayesian networks
(Jensen, 1996; Pearl, 1988). Bayesian Networks are based upon Bayes theorem, which
posits that the probability of a variable A can be determined from the value of B if the probability of B given A is known and the marginal probabilities of the two variables are also known. This is formally represented as

$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$.

To provide an assessment context for this example, the reader might substitute an ability variable, $\theta$, for A and an observable, x, for B in the equation for Bayes theorem. In Bayesian networks this relationship is extended to a joint distribution over the observable and latent variables $z_1, \ldots, z_n$, expressed as

$p(z_1, \ldots, z_n) = \prod_{i} p\big(z_i \mid \mathrm{pa}(z_i)\big)$,

where $\mathrm{pa}(z_i)$ denotes the variables upon which $z_i$ directly depends. By creating an interrelated network of such variables, each
related to the next through Bayes theorem, we can develop a scoring framework that is
capable of propagating evidence from multiple observable variables to multiple
proficiency variables in the proficiency model. In this way, Bayesian networks support
probability-based reasoning about the collection of observable variables from examinee
performance on complex tasks to inferences about levels of ability represented in the
proficiency model. Through this probability-based reasoning the Bayesian networks
serve as a means of transmitting complex observational evidence throughout a network of
interrelated variables to update our estimates of proficiency for a particular student. A
Bayesian network is a graphical model (see Figure 5) of a joint probability distribution
over a set of random variables and consists of the following (Jensen, 1996):
A set of variables (represented by circles and referred to as nodes) with a set of directed edges (represented by arrows) between nodes indicating the statistical dependence between variables. Reflecting the tradition of pedigree analysis, for which Bayesian networks have been a popular tool, nodes at the source of a directed edge are referred to as parents of nodes at the destination of the directed edge, referred to as children. The structure of these edges represents explicit assumptions about the conditional independence of variables: specifically, the values of two variables are considered conditionally independent if there is no directed edge between them and they are associated purely through connection with a common parent variable.
For discrete variables each of the variables has a set of exhaustive and mutually exclusive states. For continuous variables the distribution of variable values is defined by the mean and standard deviation of the distribution.
The variables and the directed edges together form an acyclic directed graph (ADG). These graphs are directed in that the directed edges follow a “flow” of dependence in a single direction (i.e., the arrows are always unidirectional rather than bi-directional). The graphs are acyclic in that by following the directional flow of directed edges from any node, it is impossible to return to the node of origin.
To each variable A with parents B1,…,Bn there is attached a conditional probability table or distribution (depending on whether the variable is discrete or continuous) such that for given values of some variables there results a conditional distribution of values of other variables, for example p(A|B1). As such, this distribution provides a basis for subsequent inferences about an examinee given some evidence (observed variables). Figure 5 is reproduced as Figure 6, explicitly showing where conditional probability tables appear in the model.
Figure 5. Graphical Model of Bayesian Network Scoring
Figure 6. Bayesian Network Scoring with Conditional Probability Tables Represented
An Example
In applications of Bayesian networks the evidence accumulation portion of
scoring uses observable elements of task performance to update probabilistic estimates of
proficiency model variables. For example, the probability distributions for the
proficiency model and evidence model previously represented as Figures 1 and 3,
respectively, are now combined as a Bayesian network in Figure 7. In this figure, the
probability distributions are specified for five distinct categories of proficiency for
overall ability and three categories of proficiency for each subproficiency in the
proficiency model. Also note that the observable variables shown are as yet unobserved, but each can take on one of three values (High, Medium, and Low). Since no observations have yet been made in the assessment, the
probability distributions for the proficiencies reflect uninformative priors. In contrast, the
probability distributions for the observable variables reflect known difficulty
characteristics of the task from pretesting.
Given this model, if we then assume that some observations are made and
observable variables computed, we have an updated Bayes net, presented in Figure 8,
representing observations of:
quality of search terms = Medium
quality of search results = High
quality of syntax = Medium
use of delimiting terms = Medium
quality of selected resources = Low
The figure shows that the estimates of ability for the various subproficiencies
have now been updated to reflect our new belief about ability based on these
observations. In addition, Access and Evaluate have been updated based on direct
evidence while the remaining variables were updated, to a lesser extent, on the basis of
indirect evidence.
Figure 7. Graphical Model of Bayesian Network Scoring Prior to Observation1
1 Figures 7 and 8 were produced using the Netica™ Application for Belief Networks and Influence Diagrams v2.17 by Norsys Software Corporation.
[Figure 8 displays the following node probabilities, in percent, after the observations listed above:]

ICT_Literacy (Advanced / Proficient / Basic / BelowBasic / Minimal): 13.2 / 19.9 / 27.4 / 24.4 / 15.2
QualityOfSearchTerms (High / Medium / Low): 0 / 100 / 0
QualityOfSearchResults (High / Medium / Low): 100 / 0 / 0
QualityOfSyntax (High / Medium / Low): 0 / 100 / 0
UseOfDelimitingTerms (High / Medium / Low): 0 / 100 / 0
QualityOfSelectedResources (High / Medium / Low): 0 / 0 / 100
Define (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Access (AboveBasic / Basic / BelowBasic): 30.6 / 60.1 / 9.32
Manage (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Integrate (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Create (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Evaluate (AboveBasic / Basic / BelowBasic): 11.7 / 43.5 / 44.8
Figure 8. Graphical Model of Bayesian Network Scoring Subsequent to Observation
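For readers who want to trace the arithmetic behind Figures 7 and 8, the sketch below updates a single proficiency from one observable by direct application of Bayes theorem; a full network simply chains many such updates with propagation. The prior and the conditional probability table are invented for illustration and are not values from the operational model.

```python
# Minimal one-parent, one-child Bayesian network update.
# All numbers are illustrative, not values from the operational model.

prior = {"AboveBasic": 1/3, "Basic": 1/3, "BelowBasic": 1/3}  # uninformative prior

# p(observable level | proficiency state): an invented conditional probability table
cpt = {
    "AboveBasic": {"High": 0.70, "Medium": 0.25, "Low": 0.05},
    "Basic":      {"High": 0.30, "Medium": 0.50, "Low": 0.20},
    "BelowBasic": {"High": 0.05, "Medium": 0.35, "Low": 0.60},
}

def posterior(observed_level):
    """p(proficiency | observable) = p(obs | prof) * p(prof) / p(obs)."""
    joint = {state: cpt[state][observed_level] * prior[state] for state in prior}
    norm = sum(joint.values())  # p(obs), the normalizing constant
    return {state: p / norm for state, p in joint.items()}

print(posterior("Medium"))
# {'AboveBasic': 0.227..., 'Basic': 0.454..., 'BelowBasic': 0.318...}
```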
Subject-matter experts provide the initial values for the conditional probability
tables. The values represent weights that contribute to hypotheses about proficiency
model variables. These initial values are treated as tentative values for the purpose of
field testing the capabilities of the model, model structure, and calculation infrastructure.
With the full release of the assessment the data collected during initial release will be
used for empirical calibration of the conditional probability tables using Markov Chain
Monte Carlo estimation (Gilks, Richardson, & Spiegelhalter, 1996).
Conclusion
This paper outlined the assessment design of the Higher Ed ICT literacy
assessment, an Internet delivered assessment that measures a student’s blended cognitive
and technical abilities to use technology to research, organize and communicate
information. Unlike traditional assessments—which often use discrete, artificial tasks to
evaluate performance—these assessments will evaluate ICT proficiency using a variety
of simple and more complex authentic tasks. The simpler tasks contribute to the overall
reliability of the assessment whereas the more complex tasks focus on the richer aspects
of performance identified as critical for someone to be considered ICT literate. Also
unlike traditional assessments, which typically provide single scores based on isolated
skills, the Higher Ed ICT literacy assessment uses innovative statistical procedures to
produce detailed aggregated information about individuals’ proficiencies in various
contexts. The authentic nature of the assessments, and the involvement of higher
education institutions throughout the development process, ensure both the quality and validity of the assessment and the utility of the results.
References
Becker, G. S. (2002). The age of human capital. In E. P. Lazear (Ed.), Education in the
twenty-first century. Stanford, CA: Hoover Institution Press.
Ellis, C. (2001, April 3). Innovation in education: The increasing digital world-issues of
today and tomorrow. Presentation at the National IT Workforce Convocation of
the Information Technology Association of America, San Diego, California.
Retrieved from http://www.itaa.org/workforce/events/01conf/highlights.htm. See
also: U.S. Department of Labor (2001). BLS releases 2000-2010 employment
projections. Retrieved from http://www.bls.gov/emp.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte
Carlo in practice. New York: Chapman and Hall.
Heath, S. B. (1980). The functions and uses of literacy. Journal of Communication, 30,
123-133.
International ICT Literacy Panel. (2002, May). Digital transformation: A framework for
ICT literacy. Princeton, NJ: Educational Testing Service. Available online at:
http://www.ets.org/research/ictliteracy/index.html.
Jensen, F. V. (1996). An introduction to Bayesian networks. London: UCL Press.
Leacock, C., & Chodorow, M. (2003). C-rater: Scoring of short-answer questions.
Computers and the Humanities, 37(4), 389-405.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational
assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-67.
Norsys Software Corp. (2002). Netica™ application for belief networks and influence
diagrams [Computer software].
Organisation for Economic Co-operation and Development. (2001a). Understanding the digital divide. Paris: Author.
Organisation for Economic Co-operation and Development. (2001b). The well-being of nations. Paris: Author.
Partnership for 21st Century Skills (2003). Learning for the 21st century: A report and
mile guide for 21st century skills. Washington, DC: Author.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible
inference. Palo Alto, CA: Morgan Kaufmann Publishers.
Scribner, S., & Cole, M. (1981). The psychology of literacy. Cambridge, MA: Harvard
University Press.
Szwed, J. (1981). The ethnography of literacy. In M. Whitman (Ed.), Writing: The
nature, development, and teaching of written communication: Vol. 1. Hillsdale,
NJ: Erlbaum.