Grammar Correction
- Learner Error Corpora
- Grammatical Error Detection
- Grammatical Error Correction
- Evaluation of Error Detection/Correction Systems
A learner corpus is a computerized textual database of the language produced by foreign language learners.
Benefits:
- Researchers gain access to learners' interlanguage
- May lead to the development of language learning tools
[Diagram: Native Language → Interlanguage → Foreign Language]
Types of corpora:
- Error-tagged corpora: deal with real errors made by language learners
- Well-formed corpora: language corpora with well-formed constructs (BNC, WSJ, n-gram corpora)
- Artificial error corpora: error-tagged corpora are expensive and well-formed corpora do not deal with errors, so well-formed corpora are artificially modified to become error corpora
NUCLE: NUS Corpus of Learner English
- About 1,400 essays from university-level students, with 1.2 million words
- Completely annotated with error categories and corrections
- Annotation performed by English instructors at the NUS Centre for English Language Communication (CELC)
Annotation Task
- Select arbitrary, contiguous text spans using the cursor to identify grammatical errors.
- Classify errors by choosing an error tag from a drop-down menu.
- Correct errors by typing the correction into a text box.
- Comment to give additional explanations if necessary.
Tool: Writing, Annotation, and Marking Platform (WAMP)
27 error categories with 13 error groups
Other resources:
- NICT-JLE: error annotation for a corpus of Japanese Learner English (Izumi et al.)
- CoNLL shared task data: http://www.comp.nus.edu.sg/~nlp/conll13st.html
- HOO data: http://clt.mq.edu.au/research/projects/hoo/hoo2012/index.html
Precision Grammar: a formal grammar designed to distinguish ungrammatical from grammatical sentences.
Constraint Dependency Grammar (CDG): every grammatical rule is given as a constraint on word-to-word modifications.
Resource: Structural Disambiguation with Constraint Propagation, Hiroshi Maruyama, ACL 1990
A CDG is a 4-tuple G = <Σ, R, L, C>:
- Σ = {w1, ..., wn}: finite set of terminal symbols (words)
- R = {r1, ..., rk}: finite set of role-ids
- L = {l1, ..., lm}: finite set of labels
- C: a constraint that an assignment A should satisfy
A sentence s = w1 w2 ... wn is a finite string on Σ.
Each word w in a sentence s has k roles r1(w), ..., rk(w), where k = |R|.
Roles are variables that can take <label, modifiee> as their value, where label ∈ L and the modifiee is either a position 1..n or the special symbol nil.
Analysis of a sentence = assigning values to the roles.
Definitions
Assume x is a role of the word at position i, with value <l, m>:
- pos(x) = i, and word(x) = the word at position i
- rid(x) = the role-id of x
- lab(x) = l and mod(x) = m (the label and modifiee assigned to x)
A constraint C has the form P1 ∧ P2 ∧ ... ∧ Pm, where the variables range over the set of roles in an assignment. Each Pi is a subformula with vocabulary:
- Variables: x1, x2, ...
- Constants: elements of Σ ∪ R ∪ L ∪ {1, 2, ..., n} ∪ {nil}
- Function symbols: word, pos, rid, lab, mod
- Predicate symbols: =, <, >
- Logical connectors: ∧, ∨, ¬, ⇒
Definitions
- The arity of a subformula depends on the number of variables that it contains.
- The degree of a grammar is the size of its set of role-ids, |R|.
- A non-null string over the alphabet Σ is generated iff there exists an assignment A that satisfies the constraint C.
Example grammar G1 (degree 1): Σ1 = {D, N, V}, R1 = {governor}, L1 = {DET, SUBJ, ROOT}, C1 = P1 ∧ P2 ∧ P3 ∧ P4:
- P1: A determiner (D) modifies a noun (N) on the right with the label DET:
  word(x) = D ⇒ (lab(x) = DET ∧ word(mod(x)) = N ∧ pos(x) < mod(x))
- P2: A noun modifies a verb (V) on the right with the label SUBJ:
  word(x) = N ⇒ (lab(x) = SUBJ ∧ word(mod(x)) = V ∧ pos(x) < mod(x))
- P3: A verb modifies nothing and its label should be ROOT:
  word(x) = V ⇒ (lab(x) = ROOT ∧ mod(x) = nil)
- P4: No two words can modify the same word with the same label:
  (lab(x) = lab(y) ∧ mod(x) = mod(y)) ⇒ x = y
Example: [A]1:D [dog]2:N [runs]3:V — satisfying assignment: A → <DET, 2>, dog → <SUBJ, 3>, runs → <ROOT, nil>.
CDG parsing: assigning values to roles from the finite set L × {1, ..., n, nil}.
This is a constraint satisfaction problem (CSP); use constraint propagation (filtering) to solve it:
1. Form an initial constraint network using a core grammar.
2. Remove local inconsistencies by filtering.
3. If any ambiguity remains, add new constraints and go to Step 2.
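Below is a minimal, runnable sketch of the filtering idea for the toy grammar G1 above. All helper names (filter_arc_consistent, unary_ok, binary_ok) are illustrative rather than taken from Maruyama's paper, and the propagation loop is a naive fixed-point rather than an efficient arc-consistency algorithm.

```python
from itertools import product

def filter_arc_consistent(domains, binary_ok):
    """domains: dict role -> set of candidate <label, modifiee> values.
    binary_ok(r1, v1, r2, v2): True if the two values are jointly allowed.
    Naive fixed-point filtering: drop any value that lacks support in
    some other role's domain, until nothing changes."""
    changed = True
    while changed:
        changed = False
        for r1, d1 in domains.items():
            for v1 in list(d1):
                for r2, d2 in domains.items():
                    if r1 == r2:
                        continue
                    if not any(binary_ok(r1, v1, r2, v2) for v2 in d2):
                        d1.discard(v1)   # no supporting value: remove v1
                        changed = True
                        break
    return domains

# Toy version of G1 on "A dog runs" (word categories D, N, V).
words = ["D", "N", "V"]

def unary_ok(pos, value):
    """Unary constraints P1-P3: D -> DET on a noun to the right,
    N -> SUBJ on a verb to the right, V -> ROOT with no modifiee."""
    label, mod = value
    w = words[pos]
    if w == "D":
        return label == "DET" and mod is not None and mod > pos and words[mod] == "N"
    if w == "N":
        return label == "SUBJ" and mod is not None and mod > pos and words[mod] == "V"
    return label == "ROOT" and mod is None

def binary_ok(r1, v1, r2, v2):
    # P4: no two words may modify the same word with the same label.
    return not (v1 == v2 and v1[1] is not None)

labels = ["DET", "SUBJ", "ROOT"]
mods = [None, 0, 1, 2]
domains = {pos: {v for v in product(labels, mods) if unary_ok(pos, v)}
           for pos in range(len(words))}
print(filter_arc_consistent(domains, binary_ok))
# {0: {('DET', 1)}, 1: {('SUBJ', 2)}, 2: {('ROOT', None)}}
```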
Example: Put the block on the floor on the table in the room — the PP attachments are ambiguous.
Grammar G2 = <Σ2, R2, L2, C2> (degree 1), with labels including OBJ, LOC, POSTMOD, ROOT.
Constraints
e.g. constraints comparing mod(x) and mod(y) for pairs of roles x, y — modification links must not cross.
Total number of possible parse trees? A Catalan number.
Explicit representation is not feasible; use a constraint network for implicit representation of the parse trees.
[Figure: initial constraint network — the domain of each role, i.e. its candidate <label, modifiee> values]
A constraint network is said to be arc consistent if, for any constraint matrix, there are no rows and no columns that contain only zeros.
- A node corresponding to an all-zero row or column is removed from the solution.
- Removing one value may make others inconsistent.
- The process is propagated until the network becomes arc consistent.
- The network in the example is arc consistent.
Two more constraints, using a function sem(word(x)) that extracts semantic features of the word.
Put the block on the floor on the table in the room
[Figures: filtering step by step — locally inconsistent values are removed from the constraint network]
Two more constraints
[Figures: the network after the new constraints are added and filtering is repeated]
Put the block on the floor on the table in the room
Put the block on the floor on the table in the room — final analysis, with the labels OBJ, LOC, and POSTMOD assigned to the modifications.
All constraints are treated with the same priority: failure to adhere to the set of specified constraints marks an utterance as ungrammatical. But natural language shows gradation.
Weighted Constraint Dependency Grammar (WCDG) attaches weights to constraints and can model robustness, the ability to deal with unexpected and possibly erroneous input.
Different error detection tasks:
- Grammatical vs. ungrammatical
- Detecting errors for targeted categories (preposition errors, article errors)
- Agnostic to error category
Approaches:
- Error detection as classification
- Error detection as sequence labelling
Generic steps:
- Decide on the error category
- Pick a learning algorithm
- Identify discriminative features
- Train the algorithm with training data
Training data:
- Error corpora: the model encodes error contexts and flags an error on detecting a matching context in the learner response
- Well-formed corpora: the model learns the ideal usage for the targeted categories and flags an error in case of mismatch
- Artificial error corpora
Types of preposition errors:
- Selection error [They arrived to the town]
- Extraneous use [They came to outside]
- Omission error [He is fond this book]
Tasks:
- Classifier prediction
- Training a model
- What are the features?
Resource: The Ups and Downs of Preposition Error Detection in ESL Writing, Tetreault and Chodorow, COLING 2008
Cast the error detection task as a classification problem. Given a trained classifier model and a context:
- The system outputs a probability distribution over all prepositions
- Compare the weight of the system's top preposition with the writer's preposition
An error is flagged when:
- The writer's preposition differs from the classifier's prediction
- And the difference in probabilities exceeds a threshold
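A small sketch of this flagging rule; the function name and the probability distribution are illustrative stand-ins for a trained MaxEnt classifier's output.

```python
def flag_preposition_error(writer_prep, dist, threshold=0.2):
    """dist: dict preposition -> probability from the classifier.
    Flag an error only when the classifier's top choice differs from
    the writer's preposition AND wins by more than `threshold`."""
    top_prep = max(dist, key=dist.get)
    if top_prep == writer_prep:
        return None                      # agreement: no error
    if dist[top_prep] - dist.get(writer_prep, 0.0) > threshold:
        return top_prep                  # flag, suggesting the top preposition
    return None                          # too close to call: skip

# "He is fond with beer": the model strongly prefers "of" over "with".
dist = {"of": 0.85, "in": 0.05, "at": 0.04, "by": 0.03, "with": 0.03}
print(flag_preposition_error("with", dist))  # -> 'of'
```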
Developing a training set of error-annotated ESL essays (millions of examples?) is too labor intensive to be practical.
Alternative: train on millions of examples of proper usage, and determine how close to correct the writer's preposition is.
Prepositions are influenced by:
- Words in the local context, and how they interact with each other (lexical)
- Syntactic structure of the context
- Semantic interpretation
1. Extract lexical and syntactic features from well-formed (native) text
2. Train a MaxEnt model on the feature set to output a probability distribution over a set of prepositions
3. Evaluate on an error-annotated ESL corpus: compare the system's preposition with the writer's; if they differ, use thresholds to determine the correctness of the writer's preposition
Feature | Description
PV | Prior verb
PN | Prior noun
FH | Headword of the following phrase
FP | Following phrase
TGLR | Middle trigram (POS + words)
TGL | Left trigram
TGR | Right trigram
BGL | Left bigram

Example: He will take our place in the line.
For this example, PV = take, PN = place, FH = line, and TGLR is the trigram surrounding the preposition.
MaxEnt does not model the interactions between features, so build combination features of the head nouns and commanding verbs (PV, PN, FH):
- 3 types: word, tag, word+tag
- Each type has four possible combinations
- Maximum of 12 features (see the table and sketch below)
Class | Components | +Combo:word
p-N | FH | line
N-p-N | PN-FH | place-line
V-p-N | PV-FH | take-line
V-N-p-N | PV-PN-FH | take-place-line

He will take our place in the line.
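A small sketch of assembling the +Combo:word features in the table above; extraction of PV, PN, and FH is assumed to have been done upstream, and the function name is hypothetical.

```python
def combo_word_features(pv, pn, fh):
    """Return the four word-level combination features for one context.
    Analogous tag and word+tag variants would use POS tags as well."""
    return {
        "p-N": fh,                         # headword of following phrase
        "N-p-N": f"{pn}-{fh}",             # prior noun + following head
        "V-p-N": f"{pv}-{fh}",             # prior verb + following head
        "V-N-p-N": f"{pv}-{pn}-{fh}",      # all three
    }

# "He will take our place in the line."
print(combo_word_features(pv="take", pn="place", fh="line"))
# {'p-N': 'line', 'N-p-N': 'place-line', 'V-p-N': 'take-line',
#  'V-N-p-N': 'take-place-line'}
```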
A typical way non-native speakers check whether usage is correct: Google the phrase and its alternatives.
Google N-gram corpus: queries provided frequency data for the +Combo features; the top three prepositions per query were used as features for the ME model (maximum of 12 Google features).
Class | Combo:word | Google features
p-N | line | P1 = on, P2 = in, P3 = of
N-p-N | place-line | P1 = in, P2 = on, P3 = of
V-p-N | take-line | P1 = on, P2 = to, P3 = into
V-N-p-N | take-place-line | P1 = in, P2 = on, P3 = after

He will take our place in the line.
Thresholds allow the system to skip cases where the top-ranked preposition and what the student wrote differ by less than a pre-specified amount.
[Chart: classifier probabilities over {of, in, at, by, with} for "He is fond with beer" — "of" far outweighs the writer's "with" → FLAG AS ERROR]
[Chart: classifier probabilities over {of, in, around, by, with} for "My sister usually gets home around 3:00" — the writer's "around" scores at or near the top → FLAG AS OK]
Errors consist of a sub-sequence of tokens in a longer token sequence; some sub-sequences are errors while the others are not.
Advantage: error-category independent.
Sequence modelling tasks in NLP: part-of-speech tagging, information extraction.
Resource: High-Order Sequence Modeling for Language Learner Error Detection, Michael Gamon, 6th Workshop on Innovative Use of NLP for Building Educational Applications
Many NLP problems can be viewed as sequence labeling: each token in a sequence is assigned a label, and the labels of tokens depend on the labels of other tokens in the sequence, particularly their neighbors (not i.i.d.).
foo bar blam zonk zonk bar blam
(Slides from Raymond J. Mooney)
POS tagging: annotate each word in a sentence with a part-of-speech.
- The lowest level of syntactic analysis.
- Useful for subsequent syntactic parsing and word sense disambiguation.
John saw the saw and decided to take it to the table.
PN V Det N Conj V Part V Pro Prep Det N
Information extraction: identify phrases in language that refer to specific types of entities and relations in text.
- Named Entity Recognition (NER) is the task of identifying names of people, places, organizations, etc. in text:
  Michael Dell is the CEO of Dell Computer Corporation and lives in Austin, Texas. (person, organization, place)
- Extract pieces of information relevant to a specific application, e.g. used car ads (make, model, year, mileage, price):
  For sale, 2002 Toyota Prius, 20,000 mi, $15K or best offer. Available starting July 30, 2006.
Sliding window: classify each token independently, but use information about the surrounding tokens as input features.

John saw the saw and decided to take it to the table.

Moving the window left to right, the classifier outputs in turn:
PN V Det N Conj V Part V Pro Prep Det N

A sketch of this setup follows.
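A compact sketch of sliding-window classification, assuming a feature window of ±2 tokens; the classify stub stands in for a trained model (e.g. MaxEnt), and its two hand rules are made up just to keep the example runnable.

```python
def window_features(tokens, i, width=2):
    """Features for token i: the token plus its neighbors within `width`,
    padded at sentence boundaries."""
    feats = {"w0": tokens[i].lower()}
    for off in range(1, width + 1):
        feats[f"w-{off}"] = tokens[i - off].lower() if i - off >= 0 else "<s>"
        feats[f"w+{off}"] = tokens[i + off].lower() if i + off < len(tokens) else "</s>"
    return feats

def classify(feats):
    """Stub for a trained classifier: just enough rules to show the
    context disambiguating the homograph 'saw'."""
    if feats["w0"] in {"the", "a"}:
        return "Det"
    if feats["w-1"] in {"the", "a"}:
        return "N"   # a noun right after a determiner
    return "?"

tokens = "John saw the saw and decided to take it to the table .".split()
for i, tok in enumerate(tokens):
    print(tok, classify(window_features(tokens, i)))
# the second 'saw' (after 'the') comes out as N, the first as '?'
```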
Better input features are usually the categories of the surrounding tokens, but these are not available yet. We can use the categories of either the preceding or the succeeding tokens by going forward or backward and using the previous outputs.
Forward classification: the label predicted for each token is fed back as a feature when classifying the next token. On the example sentence this again yields:
PN V Det N Conj V Part V Pro Prep Det N
Hidden Markov Model
- A finite state automaton with stochastic state transitions and observations.
- Start from a state emitting an observation, transition to a new state emitting an observation, ..., until a final state.
- Parameters: state transition probability P(s_t | s_{t-1}), observation probability P(o_t | s_t), initial state distribution P(s_1).
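A compact Viterbi decoder as a sketch of HMM decoding for tagging; the toy probabilities below are made up, not estimated from data.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable state sequence for the observations."""
    # V[t][s] = (best probability of reaching s at step t, best path so far)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-9), [s]) for s in states}]
    for o in obs[1:]:
        row = {}
        for s in states:
            prob, prev = max(
                (V[-1][ps][0] * trans_p[ps][s] * emit_p[s].get(o, 1e-9), ps)
                for ps in states
            )
            row[s] = (prob, V[-1][prev][1] + [s])
        V.append(row)
    return max(V[-1].values())[1]

states = ["N", "V", "Det"]
start_p = {"N": 0.3, "V": 0.1, "Det": 0.6}
trans_p = {"N": {"N": 0.2, "V": 0.6, "Det": 0.2},
           "V": {"N": 0.3, "V": 0.1, "Det": 0.6},
           "Det": {"N": 0.9, "V": 0.05, "Det": 0.05}}
emit_p = {"N": {"dog": 0.4, "saw": 0.1}, "V": {"saw": 0.5, "runs": 0.4},
          "Det": {"the": 0.7, "a": 0.3}}
print(viterbi(["the", "dog", "saw", "a", "dog"], states, start_p, trans_p, emit_p))
# ['Det', 'N', 'V', 'Det', 'N']
```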
Maximum Entropy Markov Model (MEMM): combines the transition and observation functions into a single function P(s_t | s_{t-1}, o_t).
NER annotation convention:
- O: outside an NE
- B: beginning of an NE
- I: inside an NE
Learner error annotation uses only O and I, since most error spans are short.
Michael Dell is the CEO of Dell Computer Corporation and lives in Austin Texas.
B I O O O O B I I O O O B I
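A small sketch of converting annotated error spans into the O/I token labels used here (no B tag, per the convention above); the function name is illustrative.

```python
def spans_to_oi(tokens, error_spans):
    """error_spans: list of (start, end) token indices, end exclusive."""
    labels = ["O"] * len(tokens)
    for start, end in error_spans:
        for i in range(start, end):
            labels[i] = "I"   # every token inside an error span is I
    return labels

tokens = "He is fond with beer".split()
print(spans_to_oi(tokens, [(3, 4)]))  # ['O', 'O', 'O', 'I', 'O']
```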
Features:
- Language model features: how close or far is the learner's utterance from ideal language usage?
- String features: whether a token is capitalized (initial capitalization or all capitalized), token length in characters, number of tokens in the sentence
- Linguistic analysis features: features from the constituency parse tree
All features are calculated for each token t_i of the tokens t_1, ..., t_m in a sentence.
Basic LM features:
- Unigram probability of t_i
- Average n-gram probability of all n-grams in the sentence that contain t_i (see the sketch below)
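A sketch of the "average n-gram probability" token feature; ngram_prob stands in for a real language-model lookup, and the probability table is hypothetical.

```python
def avg_ngram_prob(tokens, i, n, ngram_prob):
    """Average probability of every n-gram that includes tokens[i]."""
    probs = []
    for start in range(max(0, i - n + 1), min(i, len(tokens) - n) + 1):
        probs.append(ngram_prob(tuple(tokens[start:start + n])))
    return sum(probs) / len(probs)

# Toy "LM": hypothetical fixed probabilities for a few bigrams.
table = {("fond", "of"): 0.05, ("of", "beer"): 0.01,
         ("fond", "with"): 0.0001, ("with", "beer"): 0.002}
lm = lambda ng: table.get(ng, 1e-6)

tokens = ["fond", "with", "beer"]
# Bigrams containing "with": ('fond','with') and ('with','beer')
print(avg_ngram_prob(tokens, 1, 2, lm))  # 0.00105
```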
Ratio features: tokens that are part of an unlikely combination of otherwise likely smaller n-grams indicate an error.
Drop features: the drop or increase in n-gram probability across a token.
A good n-gram is likely to have a much higher probability than an n-gram with the same tokens in random order. Features: minimum ratio to random, average ratio to random, overall ratio to random.
Overlap-to-adjacent ratio: an erroneous word may cause n-grams that contain the word to be less likely than adjacent but non-overlapping n-grams.
Features extracted from syntactic parse trees:
- Label of the parent and grandparent node (some of the labels denote complex constructs, e.g., SBAR)
- Number of sibling nodes
- Number of siblings of the parent
- Length of the path to the root
GEC Approaches:
- Rule-based
- Classification
- Language modelling
- SMT
- Hybrid
Whole-sentence error correction, pipeline-based approach:
- Design classifiers for different error categories
- Deploy the classifiers independently
- Relations between errors are ignored
Example: A cats runs
- An article classifier may propose to delete "A"
- A noun number classifier may propose to change "cats" to "cat"
Resource: Grammatical Error Correction Using Integer Linear Programming, Yuanbin Wu and Hwee Tou Ng
Joint inference: errors interact in most cases, so they need to be corrected jointly.
Steps:
- For every possible correction, a score (how grammatical the result is) is assigned to the corrected sentence
- The set of corrections resulting in the maximum score is selected
Integer Linear Programming: optimize a linear objective over integer variables, e.g. maximize c·x subject to A·x ≤ b, x ≥ 0, x integer.
GEC: given an input sentence, choose a set of corrections which results in the best output sentence.
ILP formulation of GEC:
- Encode the output space using integer variables (the corrections that a word needs)
- Express the inference objective as a linear objective function (maximize the grammaticality of the corrections)
- Introduce constraints to refine the feasible output space (constraints guarantee that the corrections do not conflict with each other)
What corrections at which positions? Each variable records:
- Location of the error: position p = 1, ..., n
- Error type k
- Correction l of type k
First-order variables X(p, k, l) ∈ {0, 1}.
X(p, k, l) = 1: the word at position p should be corrected to l, which is of error type k.
X(p, k, l) = 0: the correction is not applied at position p (or position p is not applicable for correction).
Deletion of a word is encoded as a correction with an empty output.
Objective: find the best set of corrections. This is exponential in the combinations of corrections, so approximate it with a decomposability assumption: measuring the output quality of multiple corrections decomposes into measuring the quality of the individual corrections.
Let s(p, k, l) measure the grammaticality of applying correction l of type k at position p; the objective becomes
max Σ_{p,k,l} s(p, k, l) · X(p, k, l).
For an individual correction (p, k, l), the quality depends on:
- Language model score
- Classifier confidence
- Disagreement score: the difference between the maximum confidence score and the score of the word that is being corrected
s(p, k, l) is a weighted combination of these three scores.
Constraint to avoid conflicts: for each error type k, only one output is allowed at any applicable position p.
Final ILP formulation (see the sketch below):
max Σ_{p,k,l} s(p, k, l) · X(p, k, l)
s.t. Σ_l X(p, k, l) = 1 for each applicable (p, k)
X(p, k, l) ∈ {0, 1}
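A sketch of this ILP using the PuLP library; the candidate corrections and their scores for "A cats sat on the mat" are made up for illustration (keeping a word unchanged is itself one of the candidates for each position).

```python
import pulp

# Hypothetical candidates: (position, error_type, correction) -> score s(p,k,l)
scores = {
    (0, "ART", "A"): 0.2, (0, "ART", "The"): 0.5, (0, "ART", ""): 0.6,
    (1, "NOUN", "cats"): 0.3, (1, "NOUN", "cat"): 0.4,
    (3, "PREP", "on"): 0.9, (3, "PREP", "in"): 0.2,
}

prob = pulp.LpProblem("gec", pulp.LpMaximize)
x = {key: pulp.LpVariable(f"x_{i}", cat="Binary")
     for i, key in enumerate(scores)}

# Objective: total grammaticality of the chosen corrections.
prob += pulp.lpSum(scores[k] * x[k] for k in x)

# For each applicable (position, error type), exactly one output is chosen.
for p, k in {(p, k) for (p, k, _) in scores}:
    prob += pulp.lpSum(v for (pp, kk, _), v in x.items() if (pp, kk) == (p, k)) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([key for key, v in x.items() if v.varValue == 1])
# e.g. [(0, 'ART', ''), (1, 'NOUN', 'cat'), (3, 'PREP', 'on')]
# -> "Cats sat on the mat"
```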
Example: A cats sat on the mat — possible corrections and their related variables.
Constraints: Σ_l X(p, k, l) = 1 for each applicable (p, k).
Computing weights: language model score, classifier confidence score, disagreement score.
Classifiers: article (ART), preposition (PREP), noun number (NOUN).
Weight for X(p, ART, l): s(p, ART, l) combines the language model score of the sentence with l applied at position p, the ART classifier's confidence in l, and the disagreement score at p. The other first-order variables are weighted analogously.
A motivating case: both "A cat sat on the mat" and "Cats sat on the mat" are good corrections of "A cats sat on the mat".
- The weight of the noun-number correction alone will be small, due to the missing article in its context
- The weight of the article deletion alone will be small, due to the low LM score of "A cats"
Relaxing the decomposability assumption: combine multiple corrections into a single correction. Instead of considering the corrections A→ε and cats→cat separately, consider them together: higher-order variables.
Let X(p1, k1, l1) and X(p2, k2, l2) be first-order variables with weights w1 and w2.
A second-order variable Y combines them: Y = 1 iff both corrections are applied together.
The weight for a second-order variable is computed in the same way as for first-order variables, but on the sentence with both corrections applied. Why? The language model score now sees the jointly corrected sentence, so interacting corrections such as A→ε together with cats→cat receive a high score.
New constraints enforce consistency between the first- and second-order variables (e.g. the standard linearization Y ≤ X1, Y ≤ X2, Y ≥ X1 + X2 − 1), and the objective function is extended to include the second-order variables.
Statistical Machine Translation for GEC:
E* = arg max_E P(E | F)
Model GEC as SMT, with E = the corrected (L1-like) text and F = the learner (L2) text.
Parallel corpora: learner error corpora.
GEC is only as good as the underlying SMT: increasing the size of parallel corpora covering the targeted types of errors is expensive.
A hack: round-trip through SMT systems, which are considered to be meaning preserving.
- Generate alternate surface renderings of the meaning expressed in the L2 sentence
- Select the most fluent one
Resource: Exploring Grammatical Error Correction with Not-So-Crummy Machine Translation, Madnani et al.
[Figure: the erroneous sentence is fed to n bilingual MT systems (pivot languages PL1, ..., PLn); each pivot translation is translated back into English, giving round-trip translations RT1, ..., RTn, which are then selected among or combined]
Find the most fluent alternative using an n-gram language model.
Issues:
- The language model does not care about preserving sentence meaning
- No single translation is error free in general
To increase the likelihood of whole-sentence correction, combine the evidence of corrections produced by each independent translation model.
Steps (combination-based approach):
1. Align (original, round-trip translation) pairs
2. Combine the aligned pairs to form a word lattice
3. Decode for the best candidate
The task: align each sentence pair.
Alignment: for a (hypothesis, reference) pair, perform edit operations that transform the hypothesis sentence into the reference one. Each edit operation involves a cost; the best alignment is the one with minimal cost.
Also used as a machine translation metric.
Word Error Rate (WER):
- Levenshtein distance between the pair
- Edit operations: match, insertion, deletion, and substitution
- Fails to model reordering of words or phrases in translation
Translation Edit Rate (TER): introduces a shift operation.
Resource: TERp System Description, Snover et al.
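A minimal sketch of token-level WER via Levenshtein distance (match, substitution, insertion, deletion), i.e. the baseline that TER extends with shifts.

```python
def wer(hyp, ref):
    """Edit distance between token lists, normalized by reference length."""
    # d[i][j]: cost of aligning the first i hyp tokens with the first j ref tokens
    d = [[0] * (len(ref) + 1) for _ in range(len(hyp) + 1)]
    for i in range(len(hyp) + 1):
        d[i][0] = i                      # deletions
    for j in range(len(ref) + 1):
        d[0][j] = j                      # insertions
    for i in range(1, len(hyp) + 1):
        for j in range(1, len(ref) + 1):
            sub = d[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1])  # match/substitution
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[-1][-1] / len(ref)

print(wer("he is fond with beer".split(), "he is fond of beer".split()))  # 0.2
```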
Shift operation in TER: allows block movement of words, with a number of constraints:
- Shifts are selected by a greedy algorithm that picks the shift that most reduces the WER between the reference and the hypothesis.
- The shifted words must exactly match the reference words in the destination position.
- The words to be shifted must contain at least one error, to prevent the shifting of words that are currently correctly matched.
- The word sequence of the reference that corresponds to the destination position must be misaligned before the shift.
TER-Plus (TERp): three more edit operations: stem match, synonym match, and phrase substitution. TERp allows shifts if the words being shifted are exactly the same, or are synonyms, stems, or paraphrases of each other, or any such combination.
Example TERp alignment (built up edit by edit in the original slides; final state shown):

Hypothesis: ----    ----  both  experience  and  ---  books  are  very  important  about  living
Reference:  related to    the   experiences and  the  books  are  very  important  -----  life
Edits:      [I]     [I]   [S]   [T]         [M]  [I]  [M]    [M]  [M]   [M]        [S]    [Y]*

Key: [I] insertion, [S] substitution, [M] match, [T] stemming, [Y] WordNet synonym, * shift
The task: combine all translations, using their alignments to the original sentence. We need a data structure for combination: the word lattice.
Word lattice:
- A directed acyclic graph with a single start point and edges labeled with a word and a weight
- Every path must pass through every node
- A word lattice can represent an exponential number of sentences in polynomial space
Create the backbone of the lattice from the original sentence: nodes 1 → 2 → 3 → 4 → ... with edges both/1, experience/1, and/1, ... (word/weight).
For all round-trip translations, map the alignments onto the lattice:
- Each insertion, substitution, stemming, synonymy, and paraphrase operation leads to the creation of new nodes
- Duplicate nodes are merged (match operation)
- Edges produced by different translations between the same pair of nodes are merged and their weights are added (two consecutive match operations)
Original: Both experience and books are very important about living.
Russian round trip: And the experience, and a very important book about life.

Hypothesis: ----  both  experience  ---  and  books  are  very  important  about  living
Reference:  and   the   experience  ,    and  book   a    very  important  about  life
Edits:      [I]   [S]   [M]         [I]  [M]  [T]*   [S]  [M]   [M]        [M]    [Y]
[Figure: the word lattice built by merging all round-trip alignments]
Greedy best-first decoding yields: Both experience and books are very important about life
1-Best:
- Convert TERp lattice edge weights to edge costs by multiplying the weights by -1
- Find the output as the shortest path in the TERp lattice
Both experience and the books are very important about life (cost: -59)
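A sketch of 1-best decoding over a tiny hypothetical lattice: edge weights are negated into costs and the shortest path through the DAG is found by a dynamic-programming sweep. The lattice content is made up for illustration.

```python
def best_path(num_nodes, edges):
    """edges: list of (src, dst, word, weight); nodes 0..num_nodes-1 are
    numbered in topological (left-to-right) order, as in a word lattice."""
    INF = float("inf")
    cost = [INF] * num_nodes
    back = [None] * num_nodes
    cost[0] = 0.0
    for u, v, word, w in sorted(edges):  # sorted by src = topological sweep
        if cost[u] + (-w) < cost[v]:     # weight * -1 -> cost
            cost[v] = cost[u] + (-w)
            back[v] = (u, word)
    words, v = [], num_nodes - 1         # backtrace from the final node
    while back[v] is not None:
        u, word = back[v]
        words.append(word)
        v = u
    return list(reversed(words)), cost[-1]

# Tiny hypothetical lattice with two ambiguous spans.
edges = [(0, 1, "both", 2), (0, 1, "and", 1), (1, 2, "experience", 3),
         (2, 3, "and", 3), (3, 4, "books", 2), (3, 4, "book", 1)]
print(best_path(5, edges))
# (['both', 'experience', 'and', 'books'], -10.0)
```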
Language-model ranked:
- Find the n-best (lowest-cost) list from the TERp lattice
- Rank the list using an n-gram language model
- Suggest the top-ranked candidate as the correction
Language model composition:
- Convert the edge weights in the TERp lattice into probabilities
- Build a Weighted Finite State Transducer (WFST) representation of the lattice
- Train an n-gram finite-state language model as a WFST
- Compose the two; the shortest path through the composition is suggested as the correction
Summary:
- Learner error corpora
- Grammatical error detection
- Grammatical error correction
- Evaluation of error detection and correction systems