Fundamentals of Automated
Essay Scoring
Mark D. Shermis, Ph.D.
Professor and Dean,
College of Education
The University of Akron
What is Automated Essay
Scoring?
• Software technology that automatically grades written
English. Graders for other languages have already
been developed.
• Has been applied successfully to short essays (high-
and low-stakes tests) and longer documents.
• Presently a web-based performance assessment.
• Provides both holistic and trait scores.
• Can provide discourse analysis.
CSSO Conference
AES--How Does It Work?
• Most grading engines use rater-behavior as the
criterion for their predictions.
• The computer doesn’t “understand” what is written,
but can be programmed to evaluate keywords and
synonyms.
• It is possible to write a nonsensical essay that gets a
good score, but you have to be a good writer to
accomplish this.
• Can evaluate both content and writing ability.
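The keyword-and-synonym idea above can be sketched in a few lines. This is only an illustration, not how any vendor's engine works: the concept list `KEYWORDS`, the function `keyword_coverage`, and the sample essay are all hypothetical.

```python
import re

# Hypothetical keyword concepts, each with the synonyms that count as a match.
KEYWORDS = {
    "voyage": {"voyage", "journey", "expedition"},
    "ship": {"ship", "ships", "vessel", "vessels"},
    "explorer": {"explorer", "navigator"},
}

def keyword_coverage(essay: str, keywords: dict) -> float:
    """Return the fraction of keyword concepts the essay mentions."""
    tokens = set(re.findall(r"[a-z']+", essay.lower()))
    # A concept counts once if any of its synonyms appears in the essay.
    hits = sum(1 for synonyms in keywords.values() if tokens & synonyms)
    return hits / len(keywords)

essay = "The navigator began his journey across the ocean."
coverage = keyword_coverage(essay, KEYWORDS)
print(round(coverage, 2))  # 2 of 3 concepts matched -> 0.67
```

A real engine would combine many such features and weight them to predict the scores human raters assign; this sketch shows a single feature only.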
Parsers Invested Heavily in
Content
• Intelligent Essay Assessor™ (Pearson
Knowledge Technologies)
• e-Rater® (Educational Testing Service)
• IntelliMetric™ (Vantage Learning)
Content is Slippery, However
• Christopher Columbus – Queen America sailed to Santa Maria with 1492
ships. Her husband, King Columbus, looked to
the Indian explorer, Nina Pinta, to find vast wealth
on the beaches of Isabella, but would settle for
spices from the continent of Ferdinand.
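One way to see why content is slippery: a scorer that only counts words cannot tell a factual sentence from a scrambled one. The sketch below assumes a plain bag-of-words representation, which is a simplification and not a description of any particular engine; both sentences are made up for illustration.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens, ignoring order and punctuation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

# Illustrative sentences (not from the slide): same words, different order.
accurate = "In 1492 Columbus sailed the Santa Maria to America"
scrambled = "In America Maria sailed the Santa Columbus to 1492"

same_bag = bag_of_words(accurate) == bag_of_words(scrambled)
print(same_bag)  # True: word counts alone cannot separate the two
```

Any feature built purely from word counts gives both sentences the same value, so the factual error goes unnoticed.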
Tape Measure Analogy
If you ask a person how to measure length…
Reliability
• Most studies show exact agreement in the
80s and adjacent agreement in the 90s for
the three major vendors.
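The two agreement statistics can be computed as follows. This is a minimal sketch that assumes "adjacent" means the two scores differ by at most one point (so exact matches also count); the score lists are invented for illustration, not data from any study.

```python
def agreement_rates(human, machine):
    """Return (exact, adjacent) agreement as proportions between 0 and 1."""
    if not human or len(human) != len(machine):
        raise ValueError("score lists must be non-empty and the same length")
    pairs = list(zip(human, machine))
    exact = sum(h == m for h, m in pairs) / len(pairs)
    # Assumption: adjacent agreement counts scores within one point.
    adjacent = sum(abs(h - m) <= 1 for h, m in pairs) / len(pairs)
    return exact, adjacent

# Made-up scores on a 1-6 scale for ten essays.
human = [4, 3, 5, 2, 6, 4, 3, 1, 5, 4]
machine = [4, 3, 4, 2, 6, 4, 1, 1, 5, 5]

exact, adjacent = agreement_rates(human, machine)
print(exact, adjacent)  # 0.7 0.9
```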
Validity
• Validity demonstrated through true score
analysis, correlations with other (objective)
tests, and prediction studies (Keith, 2003).
Writing A Prompt
• A good prompt is a good prompt; no different in the
automated world.
• Focused Topic and Expectations
• Clear Task/ Charge
• Other Characteristics
– Generate enough content
– Scorability
– Stimulates original writing
– Unemotional/unbiased
Rating Rubrics
• Scoring mechanism that evaluates essays holistically, analytically, or via traits.
• Most analytic and trait rubrics don’t seem to differentiate all that much from holistic scoring, but people like them (Shermis et al., 2002).
• May miss important (unarticulated) aspects of the writing enterprise (Bennett & Bejar, 1999).
6+1 Traits™
• Ideas
• Organization
• Voice
• Word Choice
• Sentence Fluency
• Conventions
• +1 Presentation (not used)
• Source: Northwest Educational Research Laboratory, Eugene, OR. 6+1™ is a trademark of NWREL.
6+1 Traits™ Scoring Rubric
e-Rater® and Criterion℠
• http://www.ets.org/criterion
Developing The Model
• Ideal: 300 Typical, Scored-Responses Drawn
From the Population
• Ideal: Strong Representation at the Tails of
the Distribution
• Ideal: Scored by Two Well Trained Scorers
• Cross-validated
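The model-building steps above (two human scores per essay, a statistical model fit to them, then cross-validation) can be sketched with a toy example. Everything here is an assumption made for illustration: the single predictor (word count), the six invented essays, and the helper names `fit_line` and `cross_validate`. A real model would use many features and roughly the 300 scored responses the slide recommends.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit for one predictor; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def cross_validate(xs, ys, k=3):
    """Mean absolute error when each of k folds is held out once."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        errors += [abs(slope * xs[i] + intercept - ys[i]) for i in held_out]
    return mean(errors)

# Invented data: (word count, rater 1 score, rater 2 score).
essays = [(120, 2, 2), (180, 3, 2), (250, 3, 4),
          (310, 4, 4), (400, 5, 5), (460, 6, 5)]
word_counts = [e[0] for e in essays]
# The criterion is the average of the two trained scorers.
criterion = [(e[1] + e[2]) / 2 for e in essays]

mae = cross_validate(word_counts, criterion)
print(f"cross-validated mean absolute error: {mae:.2f}")
```

Holding essays out of the fit gives an error estimate on responses the model has not seen, which is the point of the cross-validation step.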
Portfolios for Document
Storage/Evaluation
• View Reports/Reporting Options
• Set up Assignments/Assignments
• View Setup Options (Tools, feedback)
The Florida Proposal
• Develop norms for automated essay scoring
& assess for “vulnerable” groups; replace
FCAT+ Writing
[Chart: Score (0 to 6) by Assignment, Assign 1 through Assign 15]
Future Directions
• Development of general writing models that will
speed up formulation of specific statistical models for
grading.
• Grading by an “ideal” or “gold standard” essay.
• More work with LSA-like approaches to evaluating
content.
• Writing tutorials that will provide additional feedback.
For Further Information…
Lawrence Erlbaum Associates,
Inc.
http://www.erlbaum.com