First prize for Engineering Design at ISTE 2011
The Essay Scoring Tool - TEST
B.E. Project presentation
Submitted by:
Abhinav Gupta 201/CO/03
Danish Contractor 233/CO/03
Gaurav Singh 238/CO/03
Himanshu Mehrotra 241/CO/03
Under the guidance of: Dr. Shampa Chakraverty
COE Dept., NSIT
Date of presentation: 1st June 2007
NSIT, Delhi
PRIOR WORK
Overview of the Software
[Diagram: TEST takes a Student Essay and Training Essays as INPUTS and produces Feedback to student and a Score as OUTPUTS. Other labels in the diagram: Spelling & Grammatical Checks, Corpus Facts.]
Scoring Parameters
[Diagram: the Scoring Engine combines four parameters: Quality of Content, Global Coherence, Factual Accuracy, and Local Coherence.]
SINGULAR VALUES (K) RETAINED
[Figure: Variation of correlation of TEST scores with human rater, according to k. x-axis: k (7, 10, 28, 42, 57, 74, 93, 114, 137, 162, 190, 220); y-axis: correlation between human rater's and TEST's scores (0 to 0.9).]
Study Undertaken
Set of essays given to Human Graders
Essays rated as:
Good Essays
Bad Essays
LOCAL COHERENCE – Good Essays
[Figure: average inter-sentence similarity (axis 0 to 0.242) for essay nos. 5, 7, 9, 10, 14, 15.]
Average variance from gold standard - 0.0219
LOCAL COHERENCE – Other Essays
[Figure: average inter-sentence similarity (axis 0 to 0.363) for essay nos. 3, 4, 6, 11, 12, 17.]
Average variance from gold standard - 0.212
LOCAL COHERENCE – Combined Essays
[Figure: average inter-sentence similarity (axis 0 to 0.484) against no. of observations (1 to 6); the chart legend labels the two series Series1 and Series4.]
Series 1 : Good essays
Series 2 : Other Essays
LOCAL COHERENCE - MARKING SCHEME
[Figure: average inter-sentence similarity (axis 0 to 0.363) for essay nos. 1 to 17.]
LOCAL COHERENCE - MARKS
[Figure: comparative marks (0 to 100) for essay nos. 1 to 17.]
CONTENTS - ESSAYS TO BE MARKED
[Figure: average similarity with gold standard (axis 0 to 0.8) for essay nos. 18, 19, 33, 21, 26, 24, 20, 27, 28, 32, 22, 25, 29, 31, 34, 30, 23.]
CONTENT – Good Essays
[Figure: average similarity with gold standard (axis 0 to 0.8) for essay nos. 20, 25, 29, 31, 34, 30, 23.]
CONTENT – Other Essays
[Figure: average similarity with gold standard (axis 0 to 0.6) for essay nos. 18, 19, 33, 21, 26, 24, 28.]
CONTENT - COMBINED
[Figure: average similarity with gold standard (axis 0 to 0.8) against no. of observations (1 to 7), two series.]
SERIES 1 : GOOD ESSAYS
SERIES 5 : OTHER ESSAYS
CONTENT - NORMALIZED MARKS
[Figure: marks (0 to 100) for essay nos. 18, 19, 33, 21, 26, 24, 20, 27, 28, 32, 22, 25, 29, 31, 34, 30, 23.]
GLOBAL COHERENCE
Essays are classified as having a:
Good Structure
Average Structure
Bad Structure
[Figure: GOOD STRUCTURED ESSAY. Correlation coefficient (axis 0 to 0.8) by theme no. (1 to 5). Data labels on the chart: 0.502, 0.523, 0.512, 0.729, 0.432.]
[Figure: AVERAGELY STRUCTURED ESSAY. Correlation coefficient (axis 0 to 0.6) by theme no. (1 to 5). Data labels on the chart: 0.214, 0.523, 0.398, 0.305, 0.412.]
[Figure: BADLY STRUCTURED ESSAY. Correlation coefficient (axis 0 to 0.6) by theme no. (1 to 5). Data labels on the chart: 0.231, 0.342, 0.109, 0.285, 0.198.]
GLOBAL COHERENCE MARKS
[Figure: marks (0 to 90) by type of essay: Bad structure, Average Structure, Good structure.]
Fact Evaluation Module
[Diagram: the TEST Fact Evaluation Module. Labels: Topic Specific Keywords, List of Essays, Correct Facts List, Incorrect Facts List, Individual Essay Reports & Scores, N x 1 Score Matrix (for internal use by TEST).]
Fact Evaluation
No. of facts matched: 4
No. of Incorrect Facts matched: 1
SCORE: 0.8
Breakup of Essay Scores
[Figure: marks (0 to 100) for essay nos. 2, 11, 15, broken down into Global Coherence, Content, Local Coherence, Factual Accuracy, and Overall Score.]
Human scores v/s TEST scores
[Figure: marks (0 to 10) by essay no. (0 to 18) for two series: TEST tool and Human Rater.]
Performance of TEST
[Figure: correlation coefficient (axis 0.6 to 0.9) for human-human and human-TEST.]
Adjacent agreement with human graders around 77%
Agreement among human graders around 73%
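The slides report adjacent agreement and correlation but do not define how they were computed. A minimal sketch, assuming adjacent agreement means the share of essays whose two scores differ by at most one mark (a common convention, not confirmed by the slides) and that correlation is Pearson's r. The score lists and the `tolerance` parameter are hypothetical.

```python
import numpy as np

def adjacent_agreement(a, b, tolerance=1):
    """Fraction of essays whose two scores differ by at most `tolerance` marks."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.mean(np.abs(a - b) <= tolerance))

def pearson(a, b):
    """Pearson correlation coefficient between two score lists."""
    return float(np.corrcoef(a, b)[0, 1])

# Hypothetical marks out of 10 for six essays (not data from the study).
human = [7, 5, 8, 4, 6, 9]
tool = [6, 5, 6, 4, 7, 9]

print("adjacent agreement:", adjacent_agreement(human, tool))
print("correlation:", pearson(human, tool))
```

The same two functions can be applied to a pair of human raters to obtain the human-human figures for comparison.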
TIME COMPLEXITY
PRE-PROCESSING FOR GLOBAL COHERENCE
O(N^3), where N = no. of sentences in corpus.
O(t*n^2), where t = no. of themes, n = no. of sentences in the essay under evaluation.
FACT MODULE – O(k^4)
k = no. of keywords
COMPARISON OF TEST WITH OTHER AES TOOLS
Evaluation parameters
PEG: Essay length, complexity of sentence and word length
IEA: Similarity with gold standard
E-Rater: Lexical complexity, vocabulary, essay organization and many more
TEST: Similarity with gold standard, essay organization, fact accuracy
Feedback
PEG: No; IEA: Yes; E-Rater: Yes; TEST: Yes
Essay content checking
PEG: No; IEA: Yes; E-Rater: Yes; TEST: Yes
Fact checking
PEG: No; IEA: No; E-Rater: Yes; TEST: Yes
Training phase
PEG: Time consuming & inexpensive
IEA: Time consuming & inexpensive
E-Rater: Time consuming & expensive
TEST: Time consuming & inexpensive
Language of essays
PEG: English; IEA: English; E-Rater: English; TEST: Hindi
Performance
PEG: Correlation of 0.87 with human raters
IEA: Correlation of 0.85 with human raters
E-Rater: Correlation of 0.87 with human raters
TEST: Correlation of 0.7652 with human raters
FUTURE WORK
Include OCR (Optical Character Recognition).
Increasing the size and variety of the corpus.
Incorporating modules for spelling and grammar evaluation.
The use of Random Indexing (RI) techniques can reduce the size of the matrix which is input for the SVD procedure and thus can reduce time-complexity.
LIMITATIONS
Absence of grammatical checking
Absence of a spell-check
The tool is unable to check individualistic styles of writing
Domain-specific knowledge required before checking an essay
CONTRIBUTION
First AES tool for Hindi
Local Coherence at granularity of sentences
Good correlation with human raters
SVD done only once for Local and Global Coherence
WE WOULD LIKE TO THANK
Dr. Shampa Chakraverty, without whose constant guidance and support we would have given up long ago.
Dr. Niladri Chatterjee, Dept. of Mathematics, IIT Delhi, for sharing his experience in the NLP field.
Ms. Yasmin Contractor, Principal, Summerfields School, Gurgaon, for providing us with the student essays.
Faculty of COE Dept. and fellow students.
Q & A?
Automatic Essay Evaluation Software
B.E. Final Year Project: Final Evaluation
Project Guide: Dr. Shampa Chakraverty
Team:
Abhinav Gupta 201/CO/03
Danish Contractor 233/CO/03
Gaurav Singh 238/CO/03
Himanshu Mehrotra 241/CO/03
Aim of the software
To score students’ essays on a specific topic.
Give feedback to the student on deficiencies in his/her essay.
Need for this software
Teachers these days are overburdened with the evaluation of answer scripts.
Teachers are unable to give personalized attention to the students’ needs.
Students feel the need to practice writing essays in a non-test environment.
Many factors influence the scoring of essays and introduce error.
Overview of the Software
Parameters used for evaluation
Similarity with the gold standard
Local coherence of essay
Global and Theme coherence checker and Feedback generator
Fact checking
Latent Semantic Analysis (LSA)
Latent semantic analysis is a statistical technique in natural language processing for analyzing relationships between a set of documents and the terms they contain by producing a set of concepts related to the documents and terms.
LSA derives a high-dimensional semantic space. Words and passages are represented as vectors in the space.
The LSA-measured similarities have been shown to closely mimic human judgments of meaning similarity.
LSA: Steps involved
1. Start with the training corpus (gold standard essay plus other articles and essays on the same topic) together with the essay under evaluation.
2. Build the term-document matrix (M).
3. Apply singular-value decomposition to obtain three matrices: T, S and D (T = term matrix, S = singular-values matrix, D = document matrix).
4. Reduce dimensionality by preserving only the 2 largest dimensions in S, giving S-improved.
5. Multiply T, S-improved and D to obtain the new term-by-document matrix.
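The steps above can be sketched with NumPy. This is an illustration under assumptions, not the project's own code: the function name `lsa_reduce` and the small count matrix are hypothetical, and `numpy.linalg.svd` returns the singular values as a vector and the document matrix already transposed.

```python
import numpy as np

def lsa_reduce(M, k):
    """Rank-k reconstruction of a term-document matrix M via truncated SVD."""
    # M = T @ diag(s) @ Dt, with singular values s in descending order.
    T, s, Dt = np.linalg.svd(M, full_matrices=False)
    # "S-improved": keep the k largest singular values, zero out the rest.
    s_improved = np.zeros_like(s)
    s_improved[:k] = s[:k]
    # Multiply the three matrices back together.
    return T @ np.diag(s_improved) @ Dt

# Hypothetical 4-term x 4-document count matrix.
M = np.array([
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
], dtype=float)

M_hat = lsa_reduce(M, k=2)
print(np.round(M_hat, 2))
print("rank after reduction:", np.linalg.matrix_rank(M_hat))
```

Keeping every singular value reproduces M exactly; lowering k trades exact counts for a smoother matrix in which related terms and documents move closer together.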
LSA Example
Titles of Some Technical Memos
• c1: Human machine interface for ABC computer applications
• c2: A survey of user opinion of computer system response time
• c3: The EPS user interface management system
• c4: System and human system engineering testing of EPS
• c5: Relation of user perceived response time to error measurement
• m1: The generation of random, binary, ordered trees
• m2: The intersection graph of paths in trees
• m3: Graph minors IV: Widths of trees and well-quasi-ordering
• m4: Graph minors: A survey
LSA Example : Term by document matrix
LSA Example: After SVD
LSA Example: Results
Similarity between documents:
C1 and C2 = 0.91 (high)
C1 and C3 = 1.00 (very high)
C1 and C5 = 0.85 (high)
C2 and C3 = 0.91 (high)
C1 and M1 = -0.85 (low)
M1 and M2 = 1.00 (very high)
M2 and M3 = 1.00 (very high)
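A sketch of how figures of this kind can be obtained for the nine titles. It assumes the usual 12 index terms for this example (human, interface, computer, user, system, response, time, EPS, survey, trees, graph, minors), a rank-2 reconstruction, and Pearson correlation between the reconstructed document columns. The slides do not state the similarity measure, so treat that choice as an assumption.

```python
import numpy as np

docs = ["c1", "c2", "c3", "c4", "c5", "m1", "m2", "m3", "m4"]

# Rows: terms, columns: the nine titles (raw term counts).
M = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],  # human
    [1, 0, 1, 0, 0, 0, 0, 0, 0],  # interface
    [1, 1, 0, 0, 0, 0, 0, 0, 0],  # computer
    [0, 1, 1, 0, 1, 0, 0, 0, 0],  # user
    [0, 1, 1, 2, 0, 0, 0, 0, 0],  # system
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # response
    [0, 1, 0, 0, 1, 0, 0, 0, 0],  # time
    [0, 0, 1, 1, 0, 0, 0, 0, 0],  # EPS
    [0, 1, 0, 0, 0, 0, 0, 0, 1],  # survey
    [0, 0, 0, 0, 0, 1, 1, 1, 0],  # trees
    [0, 0, 0, 0, 0, 0, 1, 1, 1],  # graph
    [0, 0, 0, 0, 0, 0, 0, 1, 1],  # minors
], dtype=float)

# Truncated SVD: keep the 2 largest singular values.
T, s, Dt = np.linalg.svd(M, full_matrices=False)
k = 2
M_hat = T[:, :k] @ np.diag(s[:k]) @ Dt[:k, :]

# Correlation between documents: each row of M_hat.T is one document.
corr = np.corrcoef(M_hat.T)

def sim(a, b):
    """Correlation between two titles in the reduced space."""
    return float(corr[docs.index(a), docs.index(b)])

for pair in [("c1", "c3"), ("c1", "m1"), ("m1", "m2")]:
    print(pair, round(sim(*pair), 2))
```

Titles from the same group (c or m) should come out strongly positively correlated and titles from different groups negatively correlated, even where they share no words in the raw matrix.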
Local Coherence Estimation
What is Coherence?
Each sentence in an essay is connected to previous sentences. The degree of this connection measures the coherence of the sentence pairs.
Coherence estimation using LSA:
By comparing vectors for two adjoining segments of text in a semantic space, LSA measures degree of semantic relatedness between the segments.
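A minimal sketch of this measure, assuming each sentence has already been mapped to a vector in the LSA space and that relatedness is the cosine between adjacent sentence vectors, averaged over the essay. The function names and the toy vectors are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def local_coherence(sentence_vectors):
    """Mean cosine similarity between each sentence and the one before it."""
    pairs = zip(sentence_vectors[:-1], sentence_vectors[1:])
    sims = [cosine(u, v) for u, v in pairs]
    return sum(sims) / len(sims) if sims else 0.0

# Three hypothetical sentence vectors in a 2-dimensional semantic space.
vectors = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(local_coherence(vectors))
```

An essay whose consecutive sentences drift between unrelated topics yields a low average, while one that repeats itself yields a value near 1, so neither extreme is necessarily desirable.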
Global and theme coherence checker and feedback generator
The global structure of the essay is as follows:
Introduction
Ideas in individual paragraphs
Conclusion
Ideas in an essay are presented in the following way:
1. Main idea
2. Supporting idea
3. Explanation of 1 and 2
Global and theme coherence checker and feedback generator
A set of possible introductions, conclusions and ideas are extracted from gold standard and other training essays.
The similarity of student essay introduction is measured against the set of introductions using LSA. The same is done for the ideas and conclusions.
Using the similarity measures the presence or absence of ideas, introductions and conclusions can be determined.
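One way the presence check could work, as a sketch: compare the LSA vector of each part of the student essay with every reference vector for that part, take the best match, and call the part present if it clears a threshold. The threshold of 0.5, the function names, and the toy vectors are all assumptions; the slides do not give the decision rule.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def best_match(segment_vec, reference_vecs):
    """Highest similarity between one essay segment and a set of references."""
    return max(cosine(segment_vec, r) for r in reference_vecs)

def check_structure(essay_parts, reference_parts, threshold=0.5):
    """Return {part: (present, best similarity)} for each expected part."""
    report = {}
    for part, refs in reference_parts.items():
        vec = essay_parts.get(part)
        score = best_match(vec, refs) if vec is not None else 0.0
        report[part] = (score >= threshold, score)
    return report

# Hypothetical 2-dimensional LSA vectors.
reference_parts = {
    "introduction": [np.array([1.0, 0.0]), np.array([0.9, 0.2])],
    "conclusion": [np.array([1.0, 0.0])],
}
essay_parts = {
    "introduction": np.array([1.0, 0.1]),
    "conclusion": np.array([0.0, 1.0]),
}

for part, (present, score) in check_structure(essay_parts, reference_parts).items():
    print(part, "present" if present else "missing or off-topic", round(score, 2))
```

The per-part result doubles as feedback: a part that falls below the threshold can be reported to the student as missing or weak.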
Fact Evaluation
To facilitate this we will have 2 sets of facts, correct facts and incorrect facts, per essay topic.
The following guidelines would be used to evaluate facts:
Set of "keywords" to be checked at the sentential level in the text.
Detection of two or more keywords invokes the checking module.
2 databases of facts (Correct and Incorrect) contain sets of keywords to form a "fact".
Each sentence would be assumed to have a maximum of one fact.
Connectives in sentences to be treated as "end-of-sentence" markers for fact evaluation purposes.
Fact Evaluation
The keywords detected are paired and matched to form sets of "facts" and then checked in the database. Three cases may arise:
It returns a positive match in both databases.
It returns a positive match in the correct facts database.
It returns a positive match in the incorrect facts database.
The time complexity of factual evaluation is around O(m * (log p)^2)
p = No. of keywords
m = Average sentence length
This could be a huge overhead while evaluating essays as fact evaluation is a very small aspect of the entire process.
The use of SQL (for reading facts) and other database optimizations should reduce the time required during computation.
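A rough sketch of the guidelines above, using in-memory sets in place of the two databases and English text for readability. Several details are assumptions rather than the project's method: the list of connectives, the handling of a pair found in both databases (skipped as ambiguous), and the score formula correct / (correct + incorrect).

```python
import re
from itertools import combinations

# Hypothetical connectives treated as end-of-sentence markers.
CONNECTIVES = r"\b(?:and|but|because|however)\b"

def split_clauses(text):
    """Split text at sentence punctuation and at connectives."""
    parts = re.split(r"[.!?]|" + CONNECTIVES, text.lower())
    return [p.strip() for p in parts if p.strip()]

def evaluate_facts(text, keywords, correct_facts, incorrect_facts):
    """Count matched correct and incorrect facts; at most one fact per clause."""
    n_correct = n_incorrect = 0
    for clause in split_clauses(text):
        found = sorted(set(re.findall(r"\w+", clause)) & keywords)
        if len(found) < 2:  # the checker is invoked only for 2+ keywords
            continue
        for pair in combinations(found, 2):
            fact = frozenset(pair)
            in_correct = fact in correct_facts
            in_incorrect = fact in incorrect_facts
            if in_correct and in_incorrect:
                continue  # assumption: ambiguous pairs are not counted
            if in_correct:
                n_correct += 1
                break
            if in_incorrect:
                n_incorrect += 1
                break
    total = n_correct + n_incorrect
    score = n_correct / total if total else None
    return n_correct, n_incorrect, score

keywords = {"delhi", "mumbai", "capital", "india"}
correct_facts = {frozenset({"delhi", "capital"})}
incorrect_facts = {frozenset({"mumbai", "capital"})}
text = ("Delhi is the capital of India. "
        "Mumbai is the capital of the country. The city is large.")

print(evaluate_facts(text, keywords, correct_facts, incorrect_facts))
```

Storing each fact as a `frozenset` of keywords makes the lookup independent of word order within the sentence.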
Local Coherence Module
[Diagram: the Local Coherence Module takes the reduced term-document matrix after LSA and the evaluation essay's column number in the term-document matrix, and produces a Score on Local Coherence and Feedback to Student.]
Local Coherence Results
[Figure: Variation of Marks according to Local Coherence with different values of 'k' - scoring scheme 1. Marks (0 to 100) by essay no. (0 to 18) for k = 114, k = 42, k = 10.]
Content Evaluation Module
[Diagram: the Essay Content Evaluation Module takes a set of domain-specific Golden Standard Essays and a set of essays to be evaluated, and produces normalized scores on the basis of content.]
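A sketch of how content scores of this kind might be derived, assuming each essay is represented by its LSA vector, content quality is the average cosine similarity with the gold standard essays, and normalization scales the best essay in the batch to full marks. The normalization scheme in particular is an assumption; the slides do not specify it.

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

def content_similarity(essay_vec, gold_vecs):
    """Average cosine similarity between an essay and each gold standard essay."""
    return sum(cosine(essay_vec, g) for g in gold_vecs) / len(gold_vecs)

def normalized_marks(similarities, max_marks=100):
    """Scale similarities so the best essay receives max_marks (assumed scheme)."""
    sims = np.clip(np.asarray(similarities, dtype=float), 0.0, None)
    top = sims.max()
    return sims / top * max_marks if top > 0 else np.zeros_like(sims)

# Hypothetical 2-dimensional LSA vectors.
gold = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
essays = [np.array([1.0, 1.0]), np.array([1.0, 0.0])]

sims = [content_similarity(e, gold) for e in essays]
marks = normalized_marks(sims)
print(np.round(marks, 1))
```

Because the marks are relative to the best essay in the batch, they are comparable within one batch but not across batches.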
Content Evaluation Results
[Figure: Variation of Marks according to content with different values of 'k' - scoring scheme 1. Marks (0 to 100) by essay no. (0 to 18) for k = 114, k = 42, k = 10.]
Content Evaluation Normalized Results
[Figure: Variation of Marks according to content with different values of 'k' - scoring scheme 2. Marks (0 to 100) by essay no. (0 to 18) for k = 114, k = 42, k = 10.]
Global Coherence Module
[Diagram: the Global Coherence Evaluation Module takes Golden Standard Essays and the Evaluation Essay(s), and produces Feedback and a Score.]
Global Coherence Evaluation: Effect of K
[Figure: Variation of Marks according to Global Coherence with different values of k. Marks (0 to 120) by essay no. (0 to 18) for k = 114, k = 42, k = 10.]