Making Sense of Data from Complex Assessments

FERA 2001, November 6, 2001

Robert J. Mislevy, University of Maryland

Linda S. Steinberg & Russell G. Almond, Educational Testing Service


How much can testing gain from modern cognitive psychology?

Buzz Hunt, 1986:

So long as testing is viewed as something that takes place in a few hours, out of the context of instruction, and for the purpose of predicting a vaguely stated criterion, then the gains to be made are minimal.


Opportunities for Impact

Informal / local use

Conceptual design frameworks

» E.g., Grant Wiggins, CRESST

Toolkits & building blocks

» E.g., Assessment Wizard, IMMEX

Building structures into products

» E.g., HYDRIVE, Mavis Beacon

Building structures into programs

» E.g., AP Studio Art, DISC


For further information, see...

www.education.umd.edu/EDMS/mislevy/


Don Melnick, NBME:

“It is amazing to me how many complex ‘testing’ simulation systems have been developed in the last decade, each without a scoring system.

“The NBME has consistently found the challenges in the development of innovative testing methods to lie primarily in the scoring arena.”


The DISC Project

The Dental Interactive Simulations Corporation (DISC)

The DISC Simulator

The DISC Scoring Engine

Evidence-Centered Assessment Design

The Cognitive Task Analysis (CTA)


Evidence-centered assessment design

The three basic models

[Figure: Student Model; Evidence Model(s), containing a statistical model and evidence rules; Task Model(s)]


What complex of knowledge, skills, or other attributes should be assessed?

(Messick, 1992)




[Figure: the three-model diagram, with the Student Model Variables labeled]



What behaviors or performances should reveal those constructs?





[Figure: Evidence Model(s): Work product, Observable variables, Student Model Variables]


What tasks or situations should elicit those behaviors?





[Figure: Task Model(s): Stimulus Specifications, Work Product Specifications]



Implications for Student Model

SM variables should be consistent with …

» The results of the CTA.

» The purpose of assessment: What aspects of skill and knowledge should be used to accumulate evidence across tasks, for pass/fail reporting and finer-grained feedback?


Simplified Version of the DISC Student Model

[Figure: simplified version of the DISC student model, with the following variables]

Communality

Information gathering/Usage

Assessment

Evaluation

Treatment Planning

Medical Knowledge

Ethics/Legal


Implications for Evidence Models

The CTA produced ‘performance features’ that characterize recurring patterns of behavior and differentiate levels of expertise.

These features ground generally-defined, re-usable ‘observed variables’ in evidence models.

We defined re-usable evidence models for recurring scenarios for use with many tasks.



An Evidence Model

[Figure: a simplified version of the EM for Information Gathering Procedures in the context of Assessment, with the following variables]

Assessment

Adapting to situational constraints

Addressing the chief complaint

Adequacy of examination procedures

Adequacy of history procedures

Collection of essential information

Context


Evidence Models: Statistical Submodel

What’s constant across cases that use the EM

» Student-model parents.

» Identification of observable variables.

» Structure of conditional probability relationships between SM parents and observable children.

What’s tailored to particular cases

» Values of conditional probabilities.

» Specific meaning of observables.
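
The split between constant structure and case-specific values might be sketched as follows. This is an illustrative sketch only, not the DISC scoring engine: the class name, the `instantiate` method, the restriction to a single student-model parent, and all probability values are assumptions made for the example. The proficiency levels and the 'good'/'bad' finding values echo labels used elsewhere in these slides.

```python
from dataclasses import dataclass

# Levels of the (single, assumed) student-model parent.
LEVELS = ("Expert", "Competent", "Novice")


@dataclass(frozen=True)
class StatisticalSubmodel:
    """The part that stays constant across cases that use the evidence model."""
    sm_parents: tuple            # student-model parents
    observables: tuple           # identified observable variables
    values: tuple = ("good", "bad")  # possible values of each observable

    def instantiate(self, case_id, cpts):
        """Tailor the fragment to one case by supplying conditional probabilities.

        cpts maps observable -> level -> {value: probability}.
        """
        for obs in self.observables:
            for level in LEVELS:
                row = cpts[obs][level]
                if set(row) != set(self.values):
                    raise ValueError(f"{obs}/{level}: unexpected values {sorted(row)}")
                if abs(sum(row.values()) - 1.0) > 1e-9:
                    raise ValueError(f"{obs}/{level}: probabilities must sum to 1")
        return {"case": case_id, "structure": self, "cpts": cpts}


# Constant structure, written once.
submodel = StatisticalSubmodel(
    sm_parents=("Assessment",),
    observables=("Adequacy of history procedures",),
)

# Case-specific values (hypothetical numbers).
fragment = submodel.instantiate(
    "case-1",
    {
        "Adequacy of history procedures": {
            "Expert": {"good": 0.9, "bad": 0.1},
            "Competent": {"good": 0.6, "bad": 0.4},
            "Novice": {"good": 0.2, "bad": 0.8},
        }
    },
)
```

A second case would reuse `submodel` unchanged and pass a different table, which is the sense in which only the values are tailored.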


Evidence Models: Evaluation Submodel

What’s constant across cases

» Identification and formal definition of observable variables.

» Generally-stated “proto-rules” for evaluating their values.

What’s tailored to particular cases

» Case-specific rules for evaluating values of observables: instantiations of proto-rules tailored to the specifics of case.
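
One way to picture a proto-rule and its case-specific instantiation is a function that builds an evaluation rule once the case details are supplied. The sketch below is hypothetical: the function name, the All/Some/None scale as the values of "Collection of essential information", and the example items are assumptions for illustration, not rules taken from DISC.

```python
def make_essential_info_rule(essential_items):
    """Proto-rule: rate how much of the essential information was collected.

    Supplying the essential items for a particular case yields the
    case-specific rule.
    """
    essential = frozenset(essential_items)

    def evaluate(work_product):
        """Map a work product (the actions taken) to an observable value."""
        found = essential & set(work_product)
        if found == essential:
            return "All"
        if found:
            return "Some"
        return "None"

    return evaluate


# Case-specific instantiation (item names are made up for the example).
rule = make_essential_info_rule({"medical history", "periodontal probing"})
value = rule(["medical history", "radiograph"])
print(value)  # prints "Some"
```

The proto-rule fixes what the observable means and how it is scored in general; only the set of essential items changes from case to case.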


“Docking” an Evidence Model

[Figure: Evidence Model (Assessment, Context) docked with the Student Model (Communality, Assessment, Evaluation, Treatment Planning, Medical Knowledge, Ethics/Legal)]


“Docking” an Evidence Model


[Figure: Evidence Model docked with the Student Model, annotated with the probabilities below]

Initial Status

Expert .28, Competent .43, Novice .28

All .33, Some .33, None .33

Status after four ‘good’ findings

All 1.00, Some .00, None .00

Status after one ‘good’ and three ‘bad’ findings

All .00, Some .00, None 1.00
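
The updating that docking makes possible can be illustrated with a small Bayes-rule calculation. This is a sketch under stated assumptions, not the network shown on the slide: it uses a single student-model variable, treats findings as conditionally independent given the proficiency level, and uses made-up values for `P_GOOD`. Only the prior is taken from the slide, and because its rounded values sum to .99, the function renormalizes.

```python
# Prior from the slide (rounded values).
PRIOR = {"Expert": 0.28, "Competent": 0.43, "Novice": 0.28}

# Hypothetical probability of a 'good' finding at each level.
P_GOOD = {"Expert": 0.9, "Competent": 0.6, "Novice": 0.2}


def update(prior, findings):
    """Return the posterior over levels after a sequence of 'good'/'bad' findings."""
    weights = dict(prior)
    for finding in findings:
        for level in weights:
            p = P_GOOD[level]
            # Multiply in the likelihood of this finding at this level.
            weights[level] *= p if finding == "good" else 1.0 - p
    total = sum(weights.values())
    return {level: w / total for level, w in weights.items()}


after_good = update(PRIOR, ["good"] * 4)
after_mixed = update(PRIOR, ["good", "bad", "bad", "bad"])
```

With these assumed likelihoods, four 'good' findings shift belief toward Expert, while one 'good' and three 'bad' findings shift it toward Novice.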


“Docking” another Evidence Model


[Figure: a second Evidence Model docked with the Student Model]

Evidence Model variables:

Treatment Planning

Medical Knowledge

Context

Adequacy of treatment procedures

Individualization of procedures

Effect of treatment on patient

Performance of extraneous treatment

Student Model variables:

Communality

Assessment

Evaluation

Treatment Planning

Medical Knowledge

Ethics/Legal



Implications for Task Models

Task models are schemas for phases of cases, constructed around key features that ...

» the simulator needs for its virtual-patient data base,

» characterize features we need to evoke specified aspects of skill/knowledge,

» characterize features of tasks that affect difficulty,

» characterize features we need to assemble tasks into tests.
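
A task model of this kind can be thought of as a template whose feature groups mirror the four bullets above. The sketch below is hypothetical throughout: the class, the `write_case` method, and every feature name and value are assumptions chosen for illustration, not the DISC schema.

```python
from dataclasses import dataclass, field

GROUPS = ("simulator_features", "evidence_features",
          "difficulty_features", "assembly_features")


@dataclass
class TaskModel:
    """A schema for one phase of a case, organized by feature group."""
    phase: str
    simulator_features: dict = field(default_factory=dict)   # virtual-patient data
    evidence_features: dict = field(default_factory=dict)    # evoke skill/knowledge
    difficulty_features: dict = field(default_factory=dict)  # affect difficulty
    assembly_features: dict = field(default_factory=dict)    # assemble tasks into tests

    def write_case(self, case_id, **overrides):
        """Create one case from the schema, overriding selected feature values."""
        case = {"case": case_id, "phase": self.phase}
        for group in GROUPS:
            case[group] = dict(getattr(self, group))  # copy, so the schema is unchanged
        for key, value in overrides.items():
            group = next((g for g in GROUPS if key in case[g]), None)
            if group is None:
                raise KeyError(f"{key!r} is not a feature of this task model")
            case[group][key] = value
        return case


schema = TaskModel(
    phase="assessment",
    simulator_features={"chief_complaint": "unspecified"},
    evidence_features={"requires_history": True},
    difficulty_features={"medical_complications": 0},
    assembly_features={"content_area": "periodontics"},
)
case = schema.write_case("case-7", chief_complaint="bleeding gums",
                         medical_complications=2)
```

Each call to `write_case` yields a new case while the schema itself stays fixed, so many cases can share one task model.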


Implications for Simulator

Once we’ve determined the kind of evidence we need about targeted knowledge, how must we construct the simulator to provide the data we need?

Nature of problems

» Distinguish phases in the patient interaction cycle.

» Use typical forms of information & control availability.

» Dynamic patient condition & cross-time cases.

Nature of affordances

» Examinees must be able to seek and gather data,

» indicate hypotheses,

» justify hypotheses with respect to cues,

» justify actions with respect to hypotheses.


Payoff

Re-usable student model

» Can project to overall score for licensing

» Supports mid-level feedback as well

Re-usable evidence and task models

» Can write indefinitely many unique cases using schemas

» Framework for writing case-specific evaluation rules

Machinery can generalize to other uses & domains


Two ways to “score” complex assessments

THE HARD WAY:

Ask ‘how do you score it?’ after you’ve built the assessment and scripted the tasks or scenarios.

A DIFFERENT HARD, BUT MORE LIKELY TO WORK, WAY:

Design the assessment and the tasks/scenarios around what you want to make inferences about, what you need to see to ground them, and the structure of the interrelationships.

Part 2 Conclusion


We can attack new assessment challenges by working from generative principles:

» Principles from measurement and evidentiary reasoning, coordinated with...

» inferences framed in terms of current and continually evolving psychology,

» using current and continually evolving technologies to help gather and evaluate data in that light,

» in a coherent assessment design framework.

Grand Conclusion

