Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
2
Outline of the Talk
Rationale
Reconceptualizing “items” and test content
Item models and automatic item generation (AIG): mechanisms for mass producing items
Cognitive task models and “item families”: an engineering approach to scale and test development
Quality control (QC) for item families
3
Why Use Automated Item Generation and Principled Item Design Technologies?
“The demand for large numbers of items is challenging to satisfy because the traditional approach to test development uses the item as the fundamental unit of currency. That is, each item is individually hand-crafted—written, reviewed, revised, edited, entered into a computer, and calibrated—as if no other like it had ever been created before.”
Drasgow, Luecht & Bennett, Educational Measurement, 4th Edition, p. 473
4
Traditionally, the quality of test item generation has been dependent on the experience and interpretation of content specifications by item writers (Schmeiser & Welch, 2006)
Principled item design (Bennett, 2001; Irvine, 2002) is rapidly evolving from theory to practical implementation
Layout construct
Develop cognitive
task models
Design templates
Write & tryout
items per template
ReviewGenerate
items
Create Variables
Create Rendering
Models (Shells)
Combine Variables & Models by
Brute Force
NLP Quality Controls
Generate items
AE & ECD
AIG
6
Easy Task Model Family
Items
Templates
Moderate Task Model Family
Items
Templates
Difficult Task Model Family
Items
Templates
Lo
w
Hig
h
Population Distribution of
Examinee Proficiency
Task Model Map for the Item Bank
Scoring Evaluator
Data Model
Rendering Components
<Patient.article><Patient.description.age><Patient.description.occupation> “comes to” <Setting.description> “complaining of”<Patient.ailment.symptom1> <Patient.ailment.symptom1.duration><Patient.ailment.symptom2> <Patient.ailment.symptom2.duration><Patient.history.activity.recent><Patient.physicalexam.temp=# C, (convert(C,F))><Patient.physicalexam.pulse=#/min><Patient.physicalexam.respiration=#/min><Patient.physicalexam.bp=#1/#2><Patient.physicalexam.symptom1><Patient.physicalexam.symptom2> “What is the most likely cause of”
<Patient.ailment.prime_symptom> “?”
7
© Pearson
©American Institutes of Research
©American Institutes of Research
©NBME©AICPA
7
© Pearson
Where is the in the cerebral cortex’sparietal lobe? (Click on the appropriate area of the picture, below)
Parietal Lobe
8
Item Types
MCQ & SR
CR & ER
SA
PA
TE
SIM
Simulations
Multiple-Choice and Selected-Response
Constructed- and Extended Response
Technology-Enhanced
Performance Assessments
Short-Answer
• Stimuli, prompts and problem instructions
• Exhibits• Auxiliary tools/resources• Response capturing• Response data• Scoring evaluators
11
AIG in Three Steps*
The content required for the generated items is identified by test development specialists and defined as a cognitive model
An item model is developed by the test development specialists to specify where content is placed in each generated item
In Step #3, computer-based algorithms are used to place the content specified in Step #1 into the item model developed in Step #2
* Gierl, Lai & Turner (2012). Using automatic item generation to create multiple- choice test items. Medical Education, 46,757–765
13
[AGE] (Integer): From 25.0 to 60.0, by 5.0[GENDER] (String): 1: man 2: woman[PAIN] (String): 1: 2: and intense pain 3: and severe pain 4: and mild pain[LOCATION] (String): 1: the left groin 2: right groin 3: the umbilicus 4: an area near a recent surgery[ACUITYOFONSET] (String): 1: a few months ago 2: a few hours ago 3: a few days ago 4: a few days ago after moving a piano[PHYSICALFINDINGS] (String): 1: protruding but with no pain 2: tender 3: tender and exhibiting redness 4: tender and reducible[WBC] (String): 1: normal results 2: normal results 3: elevated white blood cell count 4: normal results
A 25-year-old man presented with a mass in the left groin. It occurred suddenly 2 hours ago while lifting a piano. On examination, the mass is firm and located in the left groin and lab work came back with normal results. Which of the following is the next best step?
A [AGE]-year-old [GENDER] presented with a mass [PAIN] in [LOCATION]. It occurred [ACUITYOFONSET]. On examination, the mass is [PHYSICALFINDINGS] and lab work came back with [WBC]. Which of the following is the next best step?
Contextual features: exploratory surgery; reduction of mass; hernia repair; ice applied to mass
14
Create Model Structures
Create Item Model
Extract Elements & Constraints
Generate Items
Extract Elements & Constraints
Table
Structures
Table
Constraints
Table
Variant Data
Table
Item Model(s)
Automatic Item Generator (Software)
Natural Language Processor Item 1.2
AIG Item Family
Item 1.3Item 1.2
Item 1.1
15
Sample AE Math Task Model and Templates
Lai, H.; Gierl, M. & Alves, C. (2010). Generating Items under the AE Framework. Invited symposium paper at the Annual Meeting of NCME, Denver
16
AE Item Production
Lai, H.; Gierl, M. & Alves, C. (2010). Generating Items under the AE Framework. Invited symposium paper at the Annual Meeting of NCME, Denver
17
Multiple-Language AIG*Human translations add expense and error on top of an already expensive process of artfully crafted items
Translated English-generated medical licensing examination multiple-choice items into Canadian French and Chinese by adding a “language layer” to the item models
Still partly a work in progress since the “art” of translation is seldom exact, given the nuances of language
Sin embargo... el "arte" de la traducción es raramente exacto
However... the 'art' of the translation is rarely accurate
* Gierl, Lai & Turner (2012). Medical Education. Gierl, Fung, Lai & Zheng (2013). National Council
on Measurement in Education Symposium Paper.
18
AIG model premise: number series problems are a convenient format to measure some aspects of numerical reasoning (application of rule-based induction) and are amenable to algorithmic item design
There is a mature “task model” for number series problems
19
AIG less human involvement ($$$)AIG without STRONG quality controls and evaluation criteria is not fruitfulStandards that depend on projected use and quality of evidence
Quality of item contentPredictability of item parametersImpact of item predictability on score reliability
How much should traditional “content blueprints” drive these standards?
23
Easier Tasks(Low Complexity)
Moderately Difficult Tasks(Medium Complexity)
Very Difficult Tasks(High Complexity)
Template 1.3
Item 1,3,n
Item 1,3,2Item 1,3,1
Template 1.1
Item 1,1,n
Item 1,1,2Item 1,1,1
Template 1.2 Template 25.3
Item 25,3,n
Item 25,3,2Item 25,3,1
Template 25.1
Item 25,1,n
Item 25,1,2Item 25,1,1
Template 25.2
Content Area 1
Item Difficulty
Content Area 2
Content Area 3
Content Area 3
24
Scalable & Replicable
Designs
Stable Cross-Platform
Performance
Consistent Data
Standardized Components
25
Task Model Grammars (TMGs) are domain-specific languages that describe the intended cognitive complexity design features for families of assessment tasks—the Task Models
Content and declarative knowledge components
Procedural skills needed
Tools, resourcesContextual conditions
Task Model Maps (TMMs) provide a distribution of Task Models on a scale
26
Skills: Required Examinee
Actions
Knowledge Objects
Relations Among Objects
or Actions
Contextual Complexity
Auxiliary Tools
32112 .,. , ,obj objectis relate contexec ta objec auxaction toolsc on tdt ti
27
Task Model Mapping: Locating IntendedChallenges to Support Evidence-Based Claims
Skill=identifyObjects = one, simple conceptRelations=noneContext=match word definitionTools=none
Skills=identify, compare, evaluateObjects = 3-4 complex propertiesRelations=hierarchical (3 levels) Context=complex text, dense info.Tools=facilitative if used correctly
28
Fixed-specification task models
Number of task models = no. of test items
Each task model is essential
Domain-sampled task models
Multiple task models per location
Task models are considered exchangeable at a particular location
Self-adaptive performance task models
Template components are manipulated to change the information (location)
Optimal reconfiguration of components
29
Rendering
Components
Semi-Passive Graphic Exhibits
Interactive Input
Compo-nents
Helm Controls
Online Tools
Online Resources
Scripts
Style Sheets
Data Model
Passive component rendering
data structures Interactive
component data
structures
Auxiliary tools dB
TMG codingComponent
object properties
Template and item statistical
information
Auxiliary content coding
Scoring Evaluator
Measurement opportunity
response structures
Scoring rubric, rules or
algorithms
Stored answer
components
Scoring QC agent(s)
30
A test has five multiple-choice questions scored correct/incorrect. Each question has four possible options. What will be the expected number-correct score for students who guess the answers to all five of the questions?
A. 0.25B. 0.80C. 1.25D. 3.75E. 5.00
Calculate expected values and use them to solve problems: (S-MD.3.) Develop a probability distribution for a random variable defined for a sample space in which theoretical probabilitiescan be calculated; find the expected value. (CCSS Initiatives Project, www.corestandards.org/the-standards/mathematics/hs-statistics-and-probability/)
31
Calculate expected values and use them to solve problems: (S-MD.3.) Develop a probability distribution for a random variable defined for a sample space in which theoretical probabilitiescan be calculated; find the expected value. (CCSS Initiatives Project, www.corestandards.org/the-standards/mathematics/hs-statistics-and-probability/)
1Recall.formula.SRS_uniform.discrete 1i i i
p P u aa
1
Recall.formula.expected_valuen
i iiE y y p u
1 1 2 21Apply.formula.sum_products
n
i i n niy p u p u p u p u
1Apply.formula.simplify_distributive 1/
n
iipu pn p a
maxConstraint.value.discrete_int 0,1, ,i
u u
maxConstraint.value.discrete_int 2, ,n n
Constraint.value.prob 0.0 1.0p
32
A <sample.event> has <n> <description.sample_units> <description.auxiliary_info>. <The/Each> <description.theoretical_event_probability>. What will be the expected <description.value_unit(s)> for <description.objects_using_theoretical_prob_distrib>?
<MCq5.distractor.1=p><MCq5.distractor.2=(1/n)*a><MCq5.distractor.3=n*p=∑xx*px><MCq5.distractor.4=(1/a)∑xx = p∑xx><MCq5.distractor.5=(1/a)*p*n>
Note: p=theoretical_prob_distr.constant=1/a
Scoring Evaluator ui=CAK(i.Selection.MCq.d=i.Key,1 if T,0 if F)
33
Reverse-Engineering Existing Items (Bottom-Up)
Reverse engineering actual items to develop a TMG (propositional) or language to detail required skills and knowledge components
Forward engineer templates and items from the TMG
Iteratively refine of TMG-based families, matching empirical item difficulty ordering
Construct Mapping Approach (Top-Down)
Develop a TMM along a trajectory
Design cognitive task models using challenge schema where skills knowledge | context, tools
Iteratively design and validate templates and item families using a hierarchical QC approach
34
Highly Constrained Templates/Models
Item Writing Guidelines with
Exemplars
Content Topic-Focused
Cognitive Complexity Focused Models
Item Modeling (e.g., Case & Swanson, 2003)
AE Task Models & Templates
AIG
Traditional Item Writing
37
1 ; 1if if if if if ifP x c c a b
θ ξ
Geerlings, H., Glas, C. A. W., & van der Linden, W. J. (2011). Modeling rule-based item generation. Psychometrika, 76, 337-359.
First-Level Model
Second-Level Model ,if f fMVNξ μ Σ
1
R
if fr rrd
38
Calibrate individual items (ignore the item family)…Refine templates to reduce variation in item characteristics
Calibrate task models as families…monitor variation over time and “tweak” templates as needed
39
QC Implications(a)Item families are working well(b)Calibrate TM families
QC Implications(a)Item families not working(b)Tighten/repair templates
ESTIMATION and QC
39
41
Substantive isomorphism within item families
Cognitively exchangeable tasks in terms of required knowledge and skills
Exchangeable evidence to inform measurement claims
Statistical isomorphism within item families
Sufficiently small variation of all item statistical properties within item families
Exchangeability of items within families for scoring purposes
42
Questions?
“The world is full of magical things patiently waiting for our wits to grow sharper.”
Bertrand Russell