
Randy Bennett
Frank Jenkins
Hilary Persky
Andy Weiss

[email protected]

Scoring Simulation Assessments

Funded by the National Center for Education Statistics, US Department of Education


What is NAEP?

• National Assessment of Educational Progress
• The only nationally representative and continuing assessment of what US students know and can do in various subject areas
• Paper testing program
• Administered to samples in grades 4, 8, and 12
• Scores reported for groups but not individuals


TRE Study Purpose

• Demonstrate an approach to assessing “problem solving with technology” at the 8th grade level that:
  • Fits the NAEP context
  • Uses extended performance tasks
  • Models student proficiency in an evidence-centered way


Conceptualizing Problem Solving with Technology

[Figure: a matrix crossing Technology Environment (Searchable Database, Text Processor, Simulation Tools, Dynamic Displays, Spreadsheet, Communication Tools) with Content Domain (Biology, Ecology, Physics, Economics, History); the TRE Balloon scenario occupies the Physics row, with entries in three of the technology columns.]


What do the Example Modules Attempt to Measure?

• By scientific-inquiry skill, we mean being able to find information about a given topic, judge what information is relevant, plan and conduct experiments, monitor one’s efforts, organize and interpret results, and communicate a coherent interpretation.  

• By computer skill, we mean being able to carry out the largely mechanical operations of using a computer to find information, run simulated experiments, get information from dynamic visual displays, construct a table or graph, sort data, and enter text.  


Scoring the TRE Modules

• Develop initial scoring specifications during assessment design

• Represent what is being measured as a graphical model
  • Proposal for how the components of proficiency are organized in the domain of problem solving in technology-rich environments


TRE Student Model

[Diagram: Problem Solving in Technology-Rich Environments combines Scientific Inquiry Skill and Computer Skill; Scientific Inquiry Skill in turn combines Scientific Inquiry Exploration Skill and Scientific Inquiry Synthesis Skill.]


Connecting Observations to the Student Model

• Three-step process
  • Feature extraction
  • Feature evaluation
  • Evidence accumulation


Feature Extraction

• All student actions are logged in a transaction record

• Feature extraction involves pulling out particular observations from the student transaction record

• Example: the specific experiments the student chose to run for each of the Simulation problems


A Portion of the Student Transaction Record

#  Action           Value         Time
1  ChooseValues                   30
2  SelectMass       90            35
3  TryIt                          37
4  MakeTable                      55
5  SelectedTabVars  Payload Mass  60
6  MakeGraph                      68
7  VertAxis         Altitude      75
8  HorizAxis        Helium        83
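A minimal sketch of feature extraction from a log like this one, assuming a simplified record format (the LogEntry class, the extract_experiment_masses function, and the pairing of SelectMass with TryIt are illustrative assumptions, not the TRE implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LogEntry:
    """One row of a student transaction record (illustrative format)."""
    seq: int
    action: str
    value: Optional[str]
    time: int

def extract_experiment_masses(log):
    """Pull out the payload masses the student actually ran.

    Assumes, for illustration, that 'SelectMass' sets the pending mass
    and 'TryIt' runs an experiment with whatever mass is pending.
    """
    masses = []
    pending = None
    for entry in log:
        if entry.action == "SelectMass":
            pending = int(entry.value)
        elif entry.action == "TryIt" and pending is not None:
            masses.append(pending)
    return masses

# Using the fragment above: SelectMass 90 followed by TryIt yields one experiment at mass 90
log = [
    LogEntry(1, "ChooseValues", None, 30),
    LogEntry(2, "SelectMass", "90", 35),
    LogEntry(3, "TryIt", None, 37),
]
print(extract_experiment_masses(log))  # [90]
```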


Feature Evaluation

• Each extracted observation needs to be judged for its correctness

• Feature evaluation involves assigning scores to these observations


A Provisional Feature-Evaluation Rule

• Quality of experiments used to solve Problem 1
• IF the list of payload masses includes the low extreme (10), the middle value (50), and the high extreme (90), with or without additional values, THEN the best experiments were run.

• IF the list omits one or more of the above required values but includes at least 3 experiments having a range of 50 or more, THEN very good experiments were run.

• IF the list has only two experiments but the range is at least 50 OR the list has more than two experiments with a range equal to 40, THEN good experiments were run.

• IF the list has two or fewer experiments with a range less than 50 OR has more than two experiments with a range less than 40, THEN insufficient experiments were run.
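A sketch of how this provisional rule could be expressed in code; the function name and the returned category labels are illustrative, while the thresholds follow the rule as stated above:

```python
def evaluate_experiment_quality(masses):
    """Score the quality of the experiments run for Problem 1,
    following the provisional feature-evaluation rule above."""
    if not masses:
        return "insufficient"
    span = max(masses) - min(masses)
    # Best: low extreme, middle value, and high extreme all present
    if {10, 50, 90}.issubset(set(masses)):
        return "best"
    # Very good: at least 3 experiments with a range of 50 or more
    if len(masses) >= 3 and span >= 50:
        return "very good"
    # Good: exactly two experiments spanning at least 50,
    # or more than two experiments spanning exactly 40
    if (len(masses) == 2 and span >= 50) or (len(masses) > 2 and span == 40):
        return "good"
    # Otherwise: insufficient experiments were run
    return "insufficient"

print(evaluate_experiment_quality([10, 50, 90]))  # best
print(evaluate_experiment_quality([20, 40, 80]))  # very good
print(evaluate_experiment_quality([30, 40]))      # insufficient
```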


An Example of a “Best” Solution


An Example of an “Insufficient” Solution


Evidence Accumulation

• Feature evaluations (like item responses) need to be combined into summary scores that support the inferences we want to make from performance
• Evidence accumulation entails combining the feature scores in some principled manner
• Bayesian inference networks
  • Offer a very general, formal, statistical framework for reasoning about interdependent variables in the presence of uncertainty
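The TRE networks are richer than this, but a minimal sketch of the core idea, updating belief about a single proficiency variable from one scored feature via Bayes' rule, might look like the following (the proficiency levels and conditional probabilities are invented for illustration):

```python
# Prior belief about a student's Exploration Skill (levels are illustrative)
prior = {"low": 1/3, "medium": 1/3, "high": 1/3}

# Illustrative conditional probabilities P(feature score | skill level)
# for the "quality of experiments" feature evaluated above
likelihood = {
    "low":    {"insufficient": 0.60, "good": 0.25, "very good": 0.10, "best": 0.05},
    "medium": {"insufficient": 0.20, "good": 0.35, "very good": 0.30, "best": 0.15},
    "high":   {"insufficient": 0.05, "good": 0.15, "very good": 0.35, "best": 0.45},
}

def update(prior, observed_score):
    """One step of evidence accumulation: posterior is proportional to prior times likelihood."""
    unnormalized = {level: p * likelihood[level][observed_score]
                    for level, p in prior.items()}
    total = sum(unnormalized.values())
    return {level: p / total for level, p in unnormalized.items()}

posterior = update(prior, "best")
print(posterior)  # belief shifts toward "high" after observing a "best" solution
```

In the study itself, many such scored features feed a full Bayesian inference network over the student-model variables rather than a single-node update like this one.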


An Evidence Model Fragment for Exploration Skill in Simulation 1


Using Evidence to Update the Student Model



TRE Student Model


Conclusion

• TRE illustrates:
  • Measuring problem solving with technology, with emphasis on the integration of the two skill sets
  • Using extended tasks like those encountered in advanced academic and work environments
  • Modeling student performance in a way that explicitly accounts for multidimensionality and for uncertainty


Conclusion

• Important remaining issues
  • Measurement
    • Tools to evaluate model fit not well-developed
    • Extended performance tasks have limited generalizability
  • Logistical
    • Adequate school technology not yet universal
  • Cost
    • Task production and scoring are labor-intensive


Randy Bennett
Frank Jenkins
Hilary Persky
Andy Weiss

[email protected]

Scoring Simulation Assessments

Funded by the National Center for Education Statistics, US Department of Education