Assessing Information and Communications Technology Literacy for Higher Education
Irvin R. Katz, David M. Williamson, Heather L. Nadelman, Irwin Kirsch, Russell G. Almond, Peter L. Cooper, Margaret L. Redman, and Diego Zapata
Educational Testing Service
Abstract
Whereas few would argue about the growing importance of information and
communication technology (ICT) skills as they affect educational attainment, workforce
readiness, or lifelong learning, there is less agreement as to what these skills and
knowledge are and how best to measure them. This paper describes an effort to define
ICT Literacy and to design and develop an assessment that measures ICT Literacy for the
higher education community. We begin with a summary of efforts to understand ICT
Literacy as it pertains to educational achievement in a variety of higher educational
settings. We use this foundation to describe how Evidence Centered Design (ECD)
methodology is being used to design and develop a simulations-based assessment that
serves both individuals and higher education institutions. We follow with a presentation
of the challenges and tensions that exist when implementing such an assessment design in
the form of operational simulation tasks. We conclude with a discussion of the strategies
and challenges of automatically scoring such an assessment using Bayesian networks.
Acknowledgements
We thank the members of the National Higher Education Information and
Communication Technology Initiative for their continued support and contributions to
this project. We also thank Dan Eignor, Aurora Graf, and Pat Kyllonen for their
comments on an earlier version of this manuscript.
Assessing Information and Communications Technology Literacy for Higher Education
We must rise above the obsession with the quantity of information and the speed of transmission, and focus on the fact that the key issue for us is our ability to organize information once it has been amassed, to assimilate it, to find meaning in it and assure its survival.
Vartan Gregorian, White House Conference on School Libraries
June 4, 2002
Preparing young adults to meet the challenges of the future is a vital part of any
educational system. For many (if not most) of these young adults that future will include
information and communication technologies, both those familiar to us today and those
not yet envisioned. Such technologies are becoming increasingly important in people’s
everyday lives and that presence will most certainly expand in coming years. No longer
relegated to specialized workplace settings or jobs, Information and Communications
Technology (ICT) competencies are now projected by the U.S. Department of Labor to
be required in eight out of the ten fastest growing occupations (Ellis, 2001). Even beyond
the workplace, the ways in which we access and manage information and communicate
with one another in schools, at home, and in the community have become increasingly
technology relevant. Whether one is gathering information about a political candidate,
purchasing an item over the internet, using a simulation tool to learn or better understand
a new concept, managing personal finances, looking up information in an electronic
database or communicating with friends or colleagues, the evidence of this surrounds us.
Recognizing the growing importance of information and communication
technologies in all aspects of people’s lives, ETS convened an international panel in
January of 2001. Covering a 15-month period, the deliberations of this international
panel resulted in a set of recommendations and assumptions about the transformative
nature of ICT competencies. In addition to taking initial steps in laying out a definition
and framework for ICT literacy, the international panel noted in their recommendations
that ETS and others should begin to work with governments and agencies to develop
measures of ICT literacy. The rationale for such measures, according to the panel, was
grounded in a number of key issues of concern to policy makers and practitioners in the
education community.
ICT is changing the very nature and value of knowledge and information. The growth of information and digital communication technologies, including capabilities for networking and shared environments, is changing the nature of social interactions and collaborative endeavors. Digital technology, in all its forms, allows information to be continuously available and adapted for different uses. Computers, handheld personal digital assistants (PDAs), on-line resources, networks and mobile telephone systems allow us to extend the reach of our cognitive capabilities and communication. Participating in this digital world is fast becoming a necessary condition for successful participation in society.
ICT literacy, in its highest form, has the potential to change the way we live, learn and work. Higher levels of ICT literacy have the potential to transform the lives of individuals who develop these requisite skills and knowledge. Just as researchers have shown that compulsory schooling and literacy lead to changes in how individuals learn and think, future research might show similar advantages resulting from the development and application of ICT literacy skills. For example, researchers studying reading and writing have noted that different cultures and groups may engage in different kinds of literacy practices (Heath, 1980; Scribner & Cole, 1981; Szwed, 1981). The cognitive behaviors connected with these various practices have been associated with the acquisition of different types of knowledge and skills. The transformative nature of information and communication technologies might similarly influence and change not only the kinds of activities we perform at school, at home and in our communities but also how we engage in those activities. As with reading and writing, ICT has the potential to change how we think and learn, advantaging not just the individuals who acquire these skills and knowledge but societies as a whole.
ICT literacy cannot be defined primarily as the mastery of technical skills. The concept of ICT literacy should be broadened to include critical cognitive skills such as reading, numeracy, critical thinking and problem solving and the integration of those skills with technical skills and knowledge. Because of the importance of these underlying cognitive skills, current levels of literacy, critical thinking and problem solving might present a barrier to the attainment of ICT literacy. There are strikingly low levels of general literacy around the world. Even within many OECD countries, there are many young people who fail to develop adequate levels of literacy (OECD, 2001a). Without such skills, it seems doubtful that comprehensive ICT literacy can be attained.
There is a lack of information about the current levels of ICT literacy both within and among countries. Meaningful data from large-scale global assessments and
from diagnostic tests designed to inform governments, schools, and private sector organizations and consortiums will be crucial in understanding the breadth and gaps in ICT literacy across the world. These data should be important in analyzing the outcomes and effectiveness of current policies and educational programs, as well as in identifying potentially new and more effective strategies.
If information and communication technologies are changing the very nature of
how we live, think, and learn, what are the consequences of lacking skills in this domain?
The negative implications are potentially numerous, not just for individuals but for
societies as a whole. Gary Becker, a Nobel Prize winner in economics, recently noted
“human capital is by far the most important form of capital in modern societies” (Becker,
2002). In the emerging global economy, individuals and nations with these skills will
most likely prosper while those lacking them will struggle to compete.
As stated in a recent report titled The Well-Being of Nations (OECD, 2001b),
human capital is made up of the knowledge, skills and attitudes that facilitate the creation
of personal, social and economic well-being. Recent data from national and international
surveys show that, in addition to obtaining and succeeding in a job, literacy and
numeracy skills are also associated with the likelihood that individuals will participate in
lifelong learning, keep abreast of social and political events, and vote in state and national
elections. These data also suggest that literacy is likely to be one of the major pathways
linking education and health and may be a contributing factor to the disparities that have
been observed in the quality of health care in developed countries. Thus, the non-
economic returns to literacy and schooling in the form of enhanced personal well-being
and greater social cohesion have been viewed by some as being as important as the
economic and labor-market returns. According to some, ICT is becoming an essential
literacy for the 21st Century (Partnership for 21st Century Skills, 2003).
Despite widespread consensus about the need for ICT literacy among young
adults, there is little information available to tell us the dimensions of the need or what
might be done to address it. As noted by the international panel, this can be attributed to
the almost exclusive concentration of research on access to technology. In this country
and abroad, countless studies have sought to measure (and thereby close) the “digital
divide” between those who have access to computer hardware, software, and networks,
and those who do not. Access is obviously important, but increased exposure to
technology does not automatically lead to increased ability to use it. Access is not the
same as understanding.
What is urgently needed, then, is an assessment program that will make it possible
to determine whether (or to what extent) young adults have obtained the combination of
technical and cognitive skills needed to be productive members of an information-rich,
technology-based society.
As part of its subsequent work in the area of ICT literacy, Educational Testing
Service (ETS) has joined forces with seven leading college and university systems in the
United States to create the National Higher Education Information and Communication
Technology Initiative. The purpose of this initiative is to design and incorporate scenario-based, computer-delivered and computer-scored tasks into one or more tests of ICT competencies
that reflect the integration of technology and cognitive skills. This paper provides an
overview of the process that was followed in conceptualizing and developing this test.
The project is an ongoing effort that targets the development of a large-scale
survey assessment of ICT literacy for release in 2005 and an accompanying measure of
individual student ICT Literacy for 2006. The project began with an understanding that,
before ICT literacy can be effectively measured, it must be sufficiently defined so that an
appropriate assessment can be designed. The National Higher Education ICT Initiative
committee took the view that ICT literacy is predominantly a cognitive activity, with a
fundamental technical capability required in order to execute cognitive strategies in
information retrieval, use, and dissemination. Based on the work of the international
panel described earlier, the committee developed the following definition of ICT literacy:
ICT Literacy is the ability to appropriately use digital technology, communication tools, and/or networks to solve information problems in order to function in an information society. This includes the ability to use technology as a tool to research, organize, and communicate information and having a fundamental understanding of the ethical/legal issues surrounding accessing and using information.
This definition emphasizes the importance of cognitive activities in acquiring,
interpreting, and disseminating information; the supporting nature of basic technical
competence; and the relevance of operating with an understanding of the legal and
ethical implications for society.
Once a suitable definition of ICT literacy was agreed upon, the task of assessment
design and development began in earnest. For this effort, we employed the methodology
of Evidence Centered Design (ECD).
Evidence Centered Design of an Assessment of ICT Literacy for Higher Education
Evidence centered design (ECD) is a methodology employed at Educational
Testing Service that emphasizes a logical and explicit representation of an evidence-
based chain of reasoning for assessment design. Such a chain of reasoning provides a
portion of the construct validity evidence for assessment use. ECD emphasizes this
evidential reasoning from the purpose of assessment, to the proficiencies of interest for
measurement, to the evidence required to support hypotheses about such proficiencies.
This process highlights implications of such evidential requirements for task design, and
the nature of performances that must be tracked and recorded to provide such evidence.
The process of ECD centers around four key questions:
Foundations: Who is being measured and why are we measuring them? What claims will we be making about people on the basis of this assessment?
Proficiencies: What proficiencies of people do we want to measure to make appropriate claims from the assessment?
Evidence: How will we recognize and interpret observable evidence of these proficiencies so that we can make these claims?
Tasks: Given limitations on test design, how can we design situations that will elicit the valid and reliable observable evidence we need?
Ultimately, the ECD process addresses these questions and develops models for
the assessment design on the basis of these issues. Resultant models include:
Proficiency Model - defines the constructs of interest for the assessment and their interrelationships
Evidence Models - define how observations of behavior are considered as evidence of proficiency
Task Models - describe how assessment tasks must be structured to ensure opportunities to observe behaviors constituting evidence.
These interrelated models comprise a chain of reasoning for an assessment design
that connects the design of assessment tasks to evidence of proficiencies targeted by the
assessment. Further details on ECD methodology may be found in Mislevy, Steinberg,
& Almond (2003).
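To make the relationship among these three models concrete, the sketch below represents each as a minimal data structure. This is purely illustrative: the class and field names are our own, and the ECD literature does not prescribe any particular encoding.

```python
from dataclasses import dataclass, field

@dataclass
class ProficiencyVariable:
    """A construct of interest, e.g., the 'Access' subproficiency."""
    name: str
    levels: list  # e.g., ["AboveBasic", "Basic", "BelowBasic"]

@dataclass
class EvidenceRule:
    """Maps an observable behavior to the proficiencies it informs."""
    observable: str   # e.g., "quality_of_syntax"
    informs: list     # names of the ProficiencyVariables it updates

@dataclass
class TaskModel:
    """Specifies what a task must present and capture to yield evidence."""
    stimulus: str                                       # material shown to the test taker
    work_products: list = field(default_factory=list)  # e.g., log file, saved resources
    evidence_rules: list = field(default_factory=list)  # EvidenceRule objects

# A toy instance mirroring the search task described later in the paper:
access = ProficiencyVariable("Access", ["AboveBasic", "Basic", "BelowBasic"])
rule = EvidenceRule("quality_of_syntax", informs=["Access"])
task = TaskModel(stimulus="Assignment sheet from instructor",
                 work_products=["log file", "selected resources"],
                 evidence_rules=[rule])
```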
Foundations
The first stage of this process is to explore the foundations of the assessment
design: to understand why we are measuring and what interests and populations the
measure is intended to serve. This process, in which the expert committee played a key
role, was folded into discussions on the nature of the construct itself. Several key areas
of discussion had implications for subsequent assessment design, including the purpose
of the assessment, nature of the construct, characteristics of the test users, and the nature
of the population. Each of these foundational decisions was based on the needs of the higher
education community the test is intended to serve.
The committee agreed that the purpose of the assessment is to determine the
degree to which students are sufficiently ICT literate to use digital technology,
communication tools, and/or networks to solve information problems likely to be
encountered in most common academic and workplace situations. Potential uses
envisioned for the assessment include:
Understanding student ICT literacy, including comparisons of literacy levels between groups of interest
Informing resource allocation at the institution regarding course offerings, such as a basic ICT literacy course, or curriculum content.
Advising individual students regarding the potential benefits of enrollment in a basic ICT literacy course.
Advising students on their preparedness to enter academic years, courses of study, or particular courses, based on the level of ICT literacy associated with success in these endeavors
For each of these potential applications, appropriate score reports could be
developed. Consumers of the score reports might include academic administrators,
academic advisors, and individual students.
The interests of the committee also drove the definition of the testing population.
Examinees would consist entirely of students enrolled in a college or university, with a
likely emphasis on students progressing from their sophomore to junior years, in four-
year institutions, or graduating from junior colleges. There is also the potential for use by
entry-level workforce examinees, although the test was specifically designed to avoid any
direct reflection of particular software products or platforms.
The format of the test was driven both by committee needs and measurement
constraints. For logistical reasons, the committee determined that the assessment should
not take more than 2 hours—implications of this constraint are discussed in the section on
Task Production, below. The definition of ICT Literacy presented earlier implied that it
would be difficult for simple multiple-choice tasks to address higher-order, integrative
cognitive skills and that computerized simulations would be required to target the kinds
of proficiencies suggested. Such tasks would require specific targeting of cognitive
proficiencies of interest through simulated tasks. These tasks would also require
automated scoring if they are to provide score reports within a timeframe to allow proper
use of assessment results in advisement. The use of automated scoring also allows the
operational use of the assessment for a more modest fee than would be required under
human scoring.
Finally, to discourage use of this assessment as a software product certification,
the design called for a simple and generic technical interface driven by the needs for
cognitive strategy rather than technical proficiency. This aspect of design is also intended
to allow the assessment to remain relevant during continuing evolution and variation of
future commercial software products.
These decisions were folded into a design timeline that incorporates a versioning
approach to assessment design and release. The first use of test tasks will be during field
trials occurring during the latter half of 2004. The field trials will serve as operational
tests for the administrative infrastructure, delivery, automatic scoring, and task
appropriateness. The initial release that incorporates score reporting will be in early 2005
as an institutional survey that will not report scores for individual students. By
administering an institutional survey first, we collect additional data for empirical study
of reliability, validity and other characteristics required for individual decision-making.
The institutional survey release will be used for college and university decision-making
regarding educational needs and design of curricula. In early 2006, the assessment will
be administered as an instrument for individual decision-making for students while
remaining in service as an institutional survey. With this release, individuals’ scores will
be reported to aid academic advising on students’ enrollment decisions, such as an
individual’s decision whether to take ICT courses.
Proficiency Model
With foundations established for the assessment, the next step is to establish
assessment design models in a formal sense. This effort begins with the development of
the proficiency model, provided as a graphic representation in Figure 1. This model illustrates the concepts previously discussed, expressed in a way that represents the makeup of the proficiencies targeted by the assessment design.
Figure 1. Higher Education ICT Proficiency Model
As specified in Figure 1, each of the seven subproficiencies includes cognitive,
technical, and social/ethical issues in the definition. The seven subproficiencies are
further defined as:
Define: The ability to use ICT tools to identify and appropriately represent an information need.
Access: The ability to collect and/or retrieve information in digital environments. This includes the ability to identify likely digital information sources and to get the information from these sources.
Manage: The ability to apply an existing organizational or classification scheme for digital information. This ability focuses on reorganizing existing digital information from a single source using pre-existing organizational formats. This includes the ability to identify preexisting organization schemes, select appropriate scheme(s) for the current usage, and to apply the scheme(s).
Integrate: The ability to interpret and represent digital information. This includes the ability to use ICT tools to synthesize, summarize, compare, and contrast information from multiple digital sources.
Evaluate: The ability to determine the degree to which digital information satisfies the needs of the task in ICT environments. This includes the ability to judge the quality, relevance, authority, point-of-view/bias, currency, coverage, or accuracy of digital information.
Create: The ability to generate information by adapting, applying, designing, or inventing information in ICT environments.
Communicate: The ability to communicate information properly in its context of use for ICT environments. This includes the ability to gear electronic information for a particular audience and to communicate knowledge in the appropriate venue.
Evidence Models
With these targeted proficiencies defined, the goal of the evidence model is to
describe how we would optimally evaluate the level of ability in each of these areas of
proficiency. The evidence model develops through several steps:
1) Consider perfect opportunities for naturalistic observations, assuming no constraints and error-free observation
2) Identify sources of evidence in these situations and their value in understanding individual ability
3) List characteristics of these observations and the circumstances under which they are observed that are critical for discriminating among levels of ability
4) Document the characteristics of these observations that most clearly distinguish among these levels of ability
The end result of this evidence modeling process is a formal structure that
represents valued evidence for a proficiency. This structure can be used to inform the
development of tasks that elicit the necessary evidence. An example of a task designed to
target particular proficiencies is a search task in which students are asked to locate
resources (e.g. articles, web pages) relevant to a research issue (Figure 2).
This task screen illustrates how a task can target specific aspects of proficiency.
This task was designed to assess both Access and Evaluate proficiencies. As outlined
earlier, the access proficiency is defined as the ability to collect and/or retrieve
information in a digital environment. This proficiency is targeted by requiring the
student to access information from the database using the search engine provided (the
results are tracked and strategies scored based on how a student searches for information,
such as key words, sequential refined searches, etc.). The evaluate proficiency is the
ability to identify the degree to which digital information meets the needs of the task.
The proficiency is targeted by requiring the student to select resources to use as
references that meet a specific information need (student choices are tracked and scored
based on tagged characteristics of the sources they choose, including authority, currency,
relevance, etc.). In combination, these tasks evaluate a student's ability to locate and separate “wheat from chaff” with respect to an information need in a searchable database.
Figure 2. Search Screen from a Sample ICT Assessment Task
Figure 3. Portion of an Evidence Model for Access and Evaluate Abilities
Figure 3 provides a sample Evidence Model illustrating how this example
provides evidence that informs our beliefs about student proficiency in Access and
Evaluate. Such a model, in combination with interpretation rules, indicates the
characteristics that must be observed in a performance and how these characteristics are
valued as evidence of targeted abilities. Table 1 represents the mechanics of scoring to
produce the values for “Quality of syntax” and “Quality of selected resources,” which are
two of the five performance variables (quality of search terms; quality of search results;
quality of syntax; use of delimiting terms; and quality of selected resources) in Figure 3.
In subsequent sections, we describe how empirical values are assigned to these models
for scoring.
Table 1. Scoring Table Illustrating How Observable Data Are Determined

Observable: Quality of syntax (work product: search terms)
  High:   Uses AND in first web search
  Medium: Does not use AND in first web search, but uses AND in a subsequent web search
  Low:    Does not use AND

Observable: Quality of selected resources (work product: selected resources)
  High:   All of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
  Medium: At least 80%, but less than 100%, of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
  Low:    Less than 80% of the resources selected scored 5 points for authority, objectivity, coverage, timeliness, and relevance
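To make the mechanics of Table 1 concrete, the following is a minimal sketch of the two rules in code. The function names and input formats are our own assumptions about how the parsed work products might be represented; the operational rules run against the assessment's actual log files and resource tags.

```python
def score_quality_of_syntax(searches):
    """Score search syntax per Table 1.

    `searches` is an ordered list of search strings parsed from the log file.
    """
    if not searches:
        return "Low"
    if "AND" in searches[0].split():
        return "High"            # uses AND in the first web search
    if any("AND" in s.split() for s in searches[1:]):
        return "Medium"          # uses AND only in a subsequent search
    return "Low"                 # never uses AND

def score_quality_of_selected_resources(resources):
    """Score selected resources per Table 1.

    `resources` is a list of dicts carrying the 1-5 point ratings on the five
    tagged characteristics assigned to each resource during task development.
    """
    criteria = ("authority", "objectivity", "coverage", "timeliness", "relevance")
    if not resources:
        return "Low"
    perfect = [all(r[c] == 5 for c in criteria) for r in resources]
    fraction = sum(perfect) / len(perfect)
    if fraction == 1.0:
        return "High"
    if fraction >= 0.8:
        return "Medium"
    return "Low"

# Example: AND appears only in the second search -> "Medium"
print(score_quality_of_syntax(["college costs", "college AND tuition"]))
```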
Task Models
The process of producing task models is based on the needs defined by the
evidence models. The circumstances of optimal evidence are adapted to the constraints
of the test environment to produce models for task design that specify a number of
elements for task production. These include the nature of observable data that must be
collected, proficiencies that these observable data inform, nature of ICT tools required to
perform the task, cognitive distinctions targeted by the task, elements of construct
represented in task design, and delivery requirements for the task itself. Figure 4 provides
an example of an abbreviated task model for the example task in Figure 2.
PROFICIENCIES: Access, Evaluate

OBSERVABLE FEATURES
  Quality of search terms with respect to level of specificity in the initial search and in subsequent searches in response to initial search results
  Use of delimiting terms
  Use of syntax
  Quality of search results (i.e., results returned, not resources selected)
  Authority, relevance, objectivity, coverage, and currency of resources selected

MATERIAL PRESENTED TO THE TEST TAKER
  Stimulus: Assignment sheet from instructor
  Critical parts of the tool functionality and interface: Resource interfaces must have search boxes, help links, limiters (filters) for scholarly content and other common limiters (filters), the ability to mark, save, and email results, and advanced search capability

WORK PRODUCT SPECIFICATIONS
  Constructed response
  Log file of student work (includes search terms and syntax, search results)
  Selected resources (saved or emailed)

TASK MODEL VARIABLES
  Directedness of the demand (words in the stimulus): search terms explicit and specific (easier); search terms explicit and general (moderate); search terms implied (harder). Note: more general search terms in the stimulus tend to drive the need for successive searches.
  Number of limiters (filters) in resource interfaces (number of limiters supported by the database interface): 1, 2, 3, or 4. Note: more limiters make the search easier for skilled people and have no effect for unskilled people.
  Syntax in most elegant search (optimal syntax terms): AND, OR, NOT, NEAR, or none. Note: more restrictive syntax for the optimal search may make the search easier for skilled searchers; a requirement of multiple syntax terms for optimal searches may make the search easier for skilled searchers.
Figure 4. Abbreviated Model for the Example Task (see Figure 2)
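Read as a specification, the task model variables in Figure 4 define a small configuration space from which task variants can be generated. The sketch below encodes that space; the variable names and the enumeration are our own illustration, not part of the design documents.

```python
from itertools import product

# Illustrative encoding of the task model variables in Figure 4.
DIRECTEDNESS = ["explicit_specific",   # easier
                "explicit_general",    # moderate
                "implied"]             # harder
NUM_LIMITERS = [1, 2, 3, 4]            # limiters (filters) supported by the interface
OPTIMAL_SYNTAX = ["AND", "OR", "NOT", "NEAR", "none"]

# Enumerate the space of task variants this model can instantiate:
variants = list(product(DIRECTEDNESS, NUM_LIMITERS, OPTIMAL_SYNTAX))
print(len(variants))  # 3 * 4 * 5 = 60 possible settings
```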
With task models developed, the chain of reasoning from the construct definition
and purpose of assessment to the evidence required for supporting assessment use and the
elements of task design required for providing the evidence is complete. The next stage
is the development of tasks that meet the evidential requirements of the design.
Task Production for ICT Literacy Assessment using Automatically Scored Simulation Tasks
The challenge in producing tasks from the ECD framework is to meet the
simultaneous and often conflicting constraints of construct targeting, psychometric
soundness, automated scoring capability, and realistic, relevant, and engaging task
settings within a two-hour assessment window. This section describes how such
challenges were addressed in task production for this assessment. This task creation effort
required balancing six issues:
Naturalistic tasks vs. Principles of Measurement
Familiar vs. Academic Context
Cognitive vs. Technical Emphasis
Technical Fidelity vs. Fairness
Construct Definition vs. Fairness Guidelines
Automated Scoring vs. Unconstrained Work
Naturalistic Tasks vs. Principles of Measurement
An initial challenge in this design was balancing the naturalistic characteristics of
sophisticated and rich environments, implying a high degree of interdependence of
actions within a simulated environment, with measurement requirements of conditional
independence and reliability estimation. This challenge was resolved by constructing
assessment forms that mix complex tasks, requiring more sophisticated and cognitively demanding problem solving, with relatively simple problem-solving tasks.
The more complex problems provide evidence concerning four subproficiencies, while
simple tasks are targeted to inform a single subproficiency. In total, the blueprint calls
for 16 tasks per form for individual student measures, with a total of 61 observable pieces
of data, and is expected to take less than 2 hours to complete. Table 2 shows the
distribution of tasks in a test form using this design. This form design allows for both the
collection of data from rich testing environments requiring substantial cognitive effort
and the collection of multiple observations on a variety of individual abilities to bolster
reliability estimates and scale definition for subscales.
Table 2. Tasks Comprising a Form for Individual Student Assessment

Task Complexity   Number in Individual   Typical Observables   Expected Completion Time
                  Student Test Form      per Task              per Task (minutes)
Simple            13                     3                     4
Moderate          2                      5                     15
Complex           1                      12                    30
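As a quick check on the blueprint arithmetic, treating the "typical" per-task figures in Table 2 as exact, the totals work out as follows (our own verification sketch, not part of the blueprint):

```python
blueprint = {  # complexity: (tasks per form, observables per task, minutes per task)
    "Simple":   (13, 3, 4),
    "Moderate": (2, 5, 15),
    "Complex":  (1, 12, 30),
}
tasks       = sum(n     for n, _, _ in blueprint.values())  # 13 + 2 + 1 = 16 tasks
observables = sum(n * o for n, o, _ in blueprint.values())  # 39 + 10 + 12 = 61 observables
minutes     = sum(n * m for n, _, m in blueprint.values())  # 52 + 30 + 30 = 112 min (< 2 hours)
print(tasks, observables, minutes)
```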
Familiar vs. Academic Context
Another challenge is the tension between a task performance context that is
engaging and comfortable for students and a context that is academically challenging.
The former might be argued to facilitate ICT-based problem solving in the abstract, as students commonly engage in such situations in informal environments, while the latter
might be argued to be more relevant to the ultimate criterion of performance within a
strictly academic environment. In the case of this assessment, the test is being designed
to provide a balance of both academic and non-academic contexts. Subsequent empirical
analyses will investigate the extent to which the context impacts performance and its
relationship with established external validity criteria.
Cognitive vs. Technical Emphasis
The target construct measured by the ICT literacy assessment is the effectiveness
with which students integrate cognitive strategies and technical performance. A
challenge in designing such an assessment is how to represent and balance the relative
emphasis on cognitive elements with technical performance in a way that maintains the
stated goal of emphasizing cognitive aspects of performance. In such a design, the ability
to technically implement solutions is a precondition to success in the cognitive aspects of
task performance. However, by undergoing a test development process that documents
the extent to which each task requires technical vs. cognitive elements, this aspect is
tracked and balanced for the assessment form. In addition, for technical requirements,
the tasks are specifically designed to require only basic technical functionality. In this
way, the simulated tools mitigate the potential for limitations in students’ technical
proficiency to obscure important aspects of targeted cognitive skill.
Technical Fidelity vs. Fairness
A technical challenge is to balance the goal of providing realistic simulated
technical tools with the goal of eliminating any unfair advantage some students may have
as a result of experience with a particular commercial software package. Whereas
realism is a targeted aspect of the assessment design, so too is an effort to create a
“generic” testing environment that does not privilege users of one operating system over
those of another. The solution implemented was to develop “stripped-down” word
processor, spreadsheet, email, file manager, presentation, and search engine tools that
contain general menu options common to most applications, but not specific to any. This
ensures that no students have an unfair advantage by making the interface equally
unfamiliar to all students. Elements of this interface will be made available to all students
prior to taking the assessment via test preparation materials.
Construct Definition vs. Fairness Guidelines
An interesting challenge encountered in this development effort that is not
typically encountered in task design is a tension between elements of the construct and
ETS fairness guidelines. The construct of ICT literacy explicitly includes the awareness
of and appropriate behavior with respect to the ethical and social issues of ICT usage for
information problem solving, yet efforts to incorporate this into task development can
easily conflict with ETS fairness guidelines that prohibit the use of potentially offensive
or upsetting material, which is often the very basis of such ethical and social issues in
ICT usage. This issue was resolved in task design, but it required delicate navigation of how potential issues are presented and how a student would be asked to resolve them.
Automated Scoring vs. Unconstrained Work
A final balance is between the development of realistic, scenario-based tasks that
are nevertheless completely scorable by automated scoring systems. Achieving this
balance requires a degree of constraint over the way in which students complete tasks,
particularly those tasks that require word processing or spreadsheet manipulation tools.
A completely free-format task interface would allow users to enter responses that are not
scorable by current computer technology, but a restricted format would reduce such tasks
to purely technical procedures that assess stepwise tool usage rather than the relevant
cognitive abilities. The solution implemented was to apply c-rater (Leacock &
Chodorow, 2003) content scoring technology to score short-answer free text entry tasks
in which moderate constraints are applied to ensure scorability with this technology.
Given the successful (and ongoing) navigation of these challenges, the natural
question is, of course, how such innovative assessment tasks are scored in a way that is
completely automated and consistent with the ECD assessment design. The next section
outlines the current progress and planning for scoring this assessment.
Scoring the Simulations-Based ICT Literacy Assessment
As one might imagine, the scoring of such an assessment is not an afterthought,
but is part of the overall assessment design process. The evidence models specify the
nature of scoring required and a statistical method (or multiple methods) is selected for
eventual implementation prior to task production. This approach allows the task
production to be conducted in a manner that is designed to be consistent with
expectations for scoring.
In this instance, the objective for scoring is straightforward: how can we best
statistically model the value of evidence from observable elements of performance to
update our belief about student ability? There are a number of challenges implicit in this
objective that must be addressed by the scoring mechanism. These include:
Multidimensional proficiency model – assessing many proficiencies in a single task
Multiple scorable elements per task – extracting multiple aspects of a single task performance for scoring
Conditional dependence – scorable elements of tasks are not completely independent as a result of appearing in a common context
As a result, our scoring method must allow for multidimensional proficiency
models, must be able to accommodate information from multiple sources of evidence,
and must have a mechanism for representing the fact that some sets of observable
elements of performance may share covariance unrelated to the primary construct of
interest as well as sharing covariance that is related to the construct of interest. That is,
the model must be capable of expressly representing assumptions of conditional
independence between variables as well as modeling conditional dependence
relationships. Of these conditionally dependent relationships the model must be capable
of specifying how these conditional dependencies contribute to, or are modeled as
distinct from, targeted proficiencies. This is a situation similar to commonly known
issues with sets of multiple-choice reading comprehension items that refer to a common
reading passage, but with a potentially increased degree of induced dependence.
In some ways, the prior models of assessment design establish the scoring model.
The proficiency model establishes the targets of the assessment and therefore, the latent
variables that the statistical model must be able to accommodate. In the case of this
assessment there are seven subproficiencies for which tasks provide evidence. By design,
there are multiple instances where a single task informs several subproficiencies. The
task model design specifies characteristics of tasks, and performance on these tasks
constitutes evidence. In turn, the evidence models specify how the evidence is weighted
and combined with other evidence to inform estimates of ability. Together, these models
provide conceptual relationships between observations and proficiency estimates. It is
therefore the role of the statistical model to apply numeric values that implement these
evidential relationships as a scoring model linking observations and estimates of ability.
Such scoring models have two components:
evidence identification – the process of determining what elements of the task performance constitute evidence and summarizing their values
evidence accumulation – the process of aggregating evidence to update estimates of ability in the proficiency model
Evidence Identification
Evidence identification for this assessment is initially a rule-based approach that
parses the work products that are produced by the students into observable elements of
performance that can be automatically scored. Essentially, this set of logical rules
determines which aspects of student performance are relevant to scoring and which are
not: the rules define evidence as distinct from data. The rules specify how we
characterize elements of performance in meaningful ways. An abbreviated example of
such a rule summary is provided in Table 1.
The procedure for implementing evidence identification rules consists of:
Recording the performance
Parsing the work product
Production of observable variables
Prior specification of observable variables
Empirical modification of observable variables
The recording is a technical requirement in that the interface must initially be
capable of tracking relevant actions that students take and recording them in a log file.
Obviously, if a technical limitation precludes the tracking of some behavior of
importance in the interface, that behavior cannot be used in subsequent scoring.
Once some information is recorded in the log file, it must be parsed to extract
relevant information from the raw data collected during administration. For example, if
time is important as a measure of efficiency for some task performance, the log file
would provide timestamps associated with key actions taken. Subsequent parsing
would require that the relevant actions from the sequence of steps taken be identified in
the log file and the elapsed time computed between these actions. This extraction and
computation of elapsed time represents the initial parsing of the log file for relevant
evidence used in scoring.
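As an illustration of this parsing step, the sketch below computes the elapsed time between two tagged actions in a log file. The log format shown, one timestamped action per line, is entirely our assumption; the actual log structure is internal to the delivery system.

```python
from datetime import datetime

# Hypothetical log format: "ISO-timestamp<TAB>action", one action per line.
SAMPLE_LOG = """\
2004-09-15T10:02:03\tsearch_submitted
2004-09-15T10:02:41\tresults_viewed
2004-09-15T10:04:19\tresource_selected
"""

def elapsed_seconds(log_text, start_action, end_action):
    """Return seconds between the first occurrences of two logged actions."""
    times = {}
    for line in log_text.splitlines():
        stamp, action = line.split("\t")
        times.setdefault(action, datetime.fromisoformat(stamp))
    return (times[end_action] - times[start_action]).total_seconds()

print(elapsed_seconds(SAMPLE_LOG, "search_submitted", "resource_selected"))  # 136.0
```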
Production of observable variables requires using this parsed information to
derive summary variables (observables) used in scoring. For example, once the elapsed
time of some action is parsed, it may be characterized in an observable as “fast” or
“slow” based upon some cutpoint in the elapsed time distribution. Alternatively, it might
be algorithmically combined with the elapsed time on other steps of the problem-solving
process and/or number of steps undertaken to solve the problem to produce an observable
representing “efficiency of performance” in problem solving. This process of
summarizing the parsed data in observable variables is analogous to the stage of multiple-
choice scoring in which a parsed response (A, B, C or D) is converted into a dichotomous
observable taking on the value of “correct” or “incorrect” by an algorithm comparing the
response to a key. Also, just as in multiple-choice testing, the conversion of a parsed
response to an observable often involves loss of information. In the case of multiple
choice, this loss of information is in using only “correct” or “incorrect” in scoring rather
than the option selected (in some multiple choice tests the option selected is retained in
scoring to infer the kinds of mistakes and misconceptions of the examinee). In this
assessment, the loss of information occurs in summarizing one variable, or a combination of variables, as a single polytomous variable.
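Continuing the example, a parsed elapsed time and step count might be summarized into a polytomous "efficiency of performance" observable as sketched below. The cutpoints are placeholder values that, as discussed next, would have to be revisited against field-trial data.

```python
def efficiency_observable(elapsed_seconds, n_steps,
                          time_cut=120.0, step_cut=6):
    """Summarize parsed data as a polytomous 'efficiency' observable.

    The cutpoints are illustrative placeholders, not calibrated values.
    """
    fast = elapsed_seconds <= time_cut
    direct = n_steps <= step_cut
    if fast and direct:
        return "High"
    if fast or direct:
        return "Medium"
    return "Low"

print(efficiency_observable(136.0, 4))  # "Medium": slow, but few steps taken
```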
In the absence of any large-sample empirical data, the initial specification of the
evaluation rules is based on the logical expectations and requirements of the assessment
design. These decisions are based on the opinion of subject matter experts, assessment
designers and developers, and a small sample of pilot test data. As a result, the
particular cutpoints and algorithms for observables (e.g., the cutpoint on elapsed time to
be classified as “fast” or “slow” or the particular rules that combine variables into a
polytomous “efficiency of performance” variable) may not be optimal for the purposes of
the assessment or may make assumptions that, in light of empirical data, are found to be
false. Therefore, these must always be treated as tentative until they can be subjected to
empirical evaluation and modification on the basis of large-scale field trials. Once such
larger samples of performance are available the assumptions and decisions of the design
team (subject matter experts and assessment designers) must be revisited. Part of this
process includes conducting corollaries of classical item analyses (e.g. percent correct,
option analysis, correlations between scoring element and total performance, etc.) on the
observables. On the basis of these results and comparisons to viable alternative cutpoints
and combinations of variables the evidence identification rules that produce observables
are modified to better serve the purpose of assessment and the now-known performance
characteristics of the population of interest. Such modification is not limited to
recombination of existing variables but includes the creation of new observables not
previously defined and extracting new variables of interest from the assessment log files.
Naturally, this implies iterative cycles of evidence identification on the field trial data to
produce the new values of observables to ensure that they are more appropriate than
former versions. Once such analyses and decisions are complete, the evidence
identification scoring algorithms may be considered finalized for operational use.
Evidence Accumulation via Bayesian Networks
The evidence accumulation process takes these observable variables and specifies
how they constitute evidence of abilities in the proficiency model. As such, the evidence
accumulation engine of scoring is responsible for drawing inferences about the students
on the basis of identified evidence. For this scoring process we use Bayesian networks
(Jensen, 1996; Pearl, 1988). Bayesian Networks are based upon Bayes theorem, which
posits that the probability of a variable A can be determined from the value of B if the probability of B given A is known and the marginal probabilities of the two variables are also known. This is formally represented as

$p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}$.

To provide an assessment context for this example, the reader might substitute an ability variable, $\theta$, for A and an observable, x, for B in the equation for Bayes theorem. In Bayesian networks this relationship is extended to a joint distribution over the observable and latent variables $z_1, \ldots, z_n$, expressed as

$p(z_1, \ldots, z_n) = \prod_{i} p\big(z_i \mid \mathrm{pa}(z_i)\big)$,

where $\mathrm{pa}(z_i)$ denotes the variables upon which $z_i$ directly depends. By creating an interrelated network of such variables, each
related to the next through Bayes theorem, we can develop a scoring framework that is
capable of propagating evidence from multiple observable variables to multiple
proficiency variables in the proficiency model. In this way, Bayesian networks support
probability-based reasoning about the collection of observable variables from examinee
performance on complex tasks to inferences about levels of ability represented in the
proficiency model. Through this probability-based reasoning the Bayesian networks
serve as a means of transmitting complex observational evidence throughout a network of
interrelated variables to update our estimates of proficiency for a particular student. A
Bayesian network is a graphical model (see Figure 5) of a joint probability distribution
over a set of random variables and consists of the following (Jensen, 1996):
A set of variables (represented by circles and referred to as nodes) with a set of directed edges (represented by arrows) between nodes indicating the statistical dependence between variables. Reflecting the tradition of pedigree analysis, for which Bayesian networks have been a popular tool, nodes at the source of a directed edge are referred to as parents of nodes at the destination of the directed edge, referred to as children. The structure of these edges represents explicit assumptions about the conditional independence of variables: specifically, the values of two variables are considered conditionally independent if there is no directed edge between them and they are associated purely through connection with a common parent variable.
For discrete variables each of the variables has a set of exhaustive and mutually exclusive states. For continuous variables the distribution of variable values is defined by the mean and standard deviation of the distribution.
The variables and the directed edges together form an acyclic directed graph (ADG). These graphs are directed in that the directed edges follow a “flow” of dependence in a single direction (i.e., the arrows are always unidirectional rather than bi-directional). The graphs are acyclic in that by following the directional flow of directed edges from any node, it is impossible to return to the node of origin.
To each variable A with parents B1,…,Bn there is attached a conditional probability table or distribution (depending on whether the variable is discrete or continuous) such that for given values of some variables there results a conditional distribution of values of other variables, for example p(A|B1). As such, this distribution provides a basis for subsequent inferences about an examinee given some evidence (observed variables). Figure 5 is reproduced as Figure 6, explicitly showing where conditional probability tables appear in the model.
Figure 5. Graphical Model of Bayesian Network Scoring
Figure 6. Bayesian Network Scoring with Conditional Probability Tables Represented
An Example
In applications of Bayesian networks the evidence accumulation portion of
scoring uses observable elements of task performance to update probabilistic estimates of
proficiency model variables. For example, the probability distributions for the
proficiency model and evidence model previously represented as Figures 1 and 3,
respectively, are now combined as a Bayesian network in Figure 7. In this figure, the
probability distributions are specified for five distinct categories of proficiency for
overall ability and three categories of proficiency for each subproficiency in the
proficiency model. Also note that the observable variables shown are as yet unobserved, but each can take on one of three values (High, Medium, and Low). Since no observations have yet been made in the assessment, the
probability distributions for the proficiencies reflect uninformative priors. In contrast, the
probability distributions for the observable variables reflect known difficulty
characteristics of the task from pretesting.
Given this model, if we then assume that some observations are made and
observable variables computed, we have an updated Bayes net, presented in Figure 8,
representing observations of:
quality of search terms = Medium
quality of search results = High
quality of syntax = Medium
use of delimiting terms = Medium
quality of selected resources = Low
The figure shows that the estimates of ability for the various subproficiencies
have now been updated to reflect our new belief about ability based on these
observations. In addition, Access and Evaluate have been updated based on direct
evidence while the remaining variables were updated, to a lesser extent, on the basis of
indirect evidence.
Figure 7. Graphical Model of Bayesian Network Scoring Prior to Observation1
1 Figures 7 and 8 were produced using the Netica™ Application for Belief Networks and Influence Diagrams v2.17 by Norsys Software Corporation.
[Figure 8 displays the following node probabilities, in percent, after the observations listed above:]

ICT_Literacy (Advanced / Proficient / Basic / BelowBasic / Minimal): 13.2 / 19.9 / 27.4 / 24.4 / 15.2
QualityOfSearchTerms (High / Medium / Low): 0 / 100 / 0
QualityOfSearchResults (High / Medium / Low): 100 / 0 / 0
QualityOfSyntax (High / Medium / Low): 0 / 100 / 0
UseOfDelimitingTerms (High / Medium / Low): 0 / 100 / 0
QualityOfSelectedResources (High / Medium / Low): 0 / 0 / 100
Define (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Access (AboveBasic / Basic / BelowBasic): 30.6 / 60.1 / 9.32
Manage (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Integrate (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Create (AboveBasic / Basic / BelowBasic): 26.2 / 44.7 / 29.2
Evaluate (AboveBasic / Basic / BelowBasic): 11.7 / 43.5 / 44.8
Figure 8. Graphical Model of Bayesian Network Scoring Subsequent to Observation
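For readers who want to trace the arithmetic behind Figures 7 and 8, the sketch below updates a single proficiency from one observable by direct application of Bayes theorem; a full network simply chains many such updates with propagation. The prior and the conditional probability table are invented for illustration and are not values from the operational model.

```python
# Minimal one-parent, one-child Bayesian network update.
# All numbers are illustrative, not values from the operational model.

prior = {"AboveBasic": 1/3, "Basic": 1/3, "BelowBasic": 1/3}  # uninformative prior

# p(observable level | proficiency state): an invented conditional probability table
cpt = {
    "AboveBasic": {"High": 0.70, "Medium": 0.25, "Low": 0.05},
    "Basic":      {"High": 0.30, "Medium": 0.50, "Low": 0.20},
    "BelowBasic": {"High": 0.05, "Medium": 0.35, "Low": 0.60},
}

def posterior(observed_level):
    """p(proficiency | observable) = p(obs | prof) * p(prof) / p(obs)."""
    joint = {state: cpt[state][observed_level] * prior[state] for state in prior}
    norm = sum(joint.values())  # p(obs), the normalizing constant
    return {state: p / norm for state, p in joint.items()}

print(posterior("Medium"))
# {'AboveBasic': 0.227..., 'Basic': 0.454..., 'BelowBasic': 0.318...}
```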
Subject-matter experts provide the initial values for the conditional probability
tables. The values represent weights that contribute to hypotheses about proficiency
model variables. These initial values are treated as tentative values for the purpose of
field testing the capabilities of the model, model structure, and calculation infrastructure.
With the full release of the assessment the data collected during initial release will be
used for empirical calibration of the conditional probability tables using Markov Chain
Monte Carlo estimation (Gilks, Richardson, & Spiegelhalter, 1996).
Conclusion
This paper outlined the assessment design of the Higher Ed ICT literacy
assessment, an Internet delivered assessment that measures a student’s blended cognitive
and technical abilities to use technology to research, organize and communicate
information. Unlike traditional assessments—which often use discrete, artificial tasks to
evaluate performance—these assessments will evaluate ICT proficiency using a variety
of simple and more complex authentic tasks. The simpler tasks contribute to the overall
reliability of the assessment whereas the more complex tasks focus on the richer aspects
of performance identified as critical for someone to be considered ICT literate. Also
unlike traditional assessments, which typically provide single scores based on isolated
skills, the Higher Ed ICT literacy assessment uses innovative statistical procedures to
produce detailed aggregated information about individuals’ proficiencies in various
contexts. The authentic nature of the assessments, and the involvement of higher
education institutions throughout the development process, ensure both the quality and validity of the assessment and the utility of the results.
References
Becker, G. S. (2002). The age of human capital. In E. P. Lazear (Ed.), Education in the
twenty-first century. Stanford, CA: Hoover Institution Press.
Ellis, C. (2001, April 3). Innovation in education: The increasing digital world-issues of
today and tomorrow. Presentation at the National IT Workforce Convocation of
the Information Technology Association of America, San Diego, California.
Retrieved from http://www.itaa.org/workforce/events/01conf/highlights.htm. See
also: U.S. Department of Labor (2001). BLS releases 2000-2010 employment
projections. Retrieved from http://www.bls.gov/emp.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.) (1996). Markov chain Monte
Carlo in practice. New York: Chapman and Hall.
Heath, S. B. (1980). The functions and uses of literacy. Journal of Communication, 30,
123-133.
International ICT Literacy Panel. (2002, May). Digital transformation: A framework for
ICT literacy. Princeton, NJ: Educational Testing Service. Available online at:
http://www.ets.org/research/ictliteracy/index.html.
Jensen, F. V. (1996). An introduction to Bayesian networks. London: UCL Press.
Leacock, C., & Chodorow, M. (2003). C-rater: Scoring of short-answer questions.
Computers and the Humanities, 37(4), 389-405.
Mislevy, R. J., Steinberg, L. S., & Almond, R. G. (2003). On the structure of educational
assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-67.
Norsys Software Corp. (2002). Netica™ application for belief networks and influence
diagrams [Computer software].
Organisation for Economic Co-operation and Development. (2001a). Understanding the digital divide. Paris: Author.
Organisation for Economic Co-operation and Development. (2001b). The well-being of nations. Paris: Author.
Partnership for 21st Century Skills (2003). Learning for the 21st century: A report and
mile guide for 21st century skills. Washington, DC: Author.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: Networks of plausible
inference. Palo Alto, CA: Morgan Kaufmann Publishers.
Scribner, S., & Cole, M. (1981). The psychology of literacy. Cambridge, MA: Harvard
University Press.
Szwed, J. (1981). The ethnography of literacy. In M. Whitman (Ed.), Writing: The
nature, development, and teaching of written communication: Vol. 1. Hillsdale,
NJ: Erlbaum.