1. Reference Scope Identification in Citing Sentences
Authors: Amjad Abu-Jbara, Dragomir Radev (University of Michigan)
Conference: NAACL 2012
Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo)
2. Abstract
Problem: multiple citations in one sentence, e.g. "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English."
Approach: preprocessing and 2 + 1 + 2*3 + 1 = 10 methods (2 word classifiers, 1 word-level CRF, 2 segmentations x 3 segment-labeling rules, 1 baseline).
3. Preprocessing & Methods
4. Reference Preprocessing (tagging, grouping, non-syntactic element removal); a sketch of the tagging and grouping steps follows the examples.
Tagged: "These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6)."
Grouped: "These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3)."
Another example: "(GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2)."
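The slide only shows the before/after examples, so here is a minimal sketch of how the tagging and grouping steps could be done with regular expressions; the `tag_references` / `group_references` helpers, the citation regex, and the target-author heuristic are my own illustration, not the authors' implementation.

```python
import re

# Hypothetical illustration of the tagging and grouping steps, not the
# authors' code: citations become REF.n placeholders (TREF.n for the target
# reference), and parenthesized runs of placeholders are merged into groups.
CITE = re.compile(r"[A-Z][A-Za-z.\- ]*?(?:et al\.)?,\s*\d{4}")

def tag_references(sentence, target_author="Brill"):
    """Replace each 'Author, Year' citation with REF.n or TREF.n."""
    counter = {"n": 0}

    def repl(match):
        counter["n"] += 1
        kind = "TREF" if target_author in match.group(0) else "REF"
        return f"{kind}.{counter['n']}"

    return CITE.sub(repl, sentence)

def group_references(tagged):
    """Merge parenthesized runs of placeholders, e.g. (REF.1; REF.2) -> (GREF.1)."""
    counter = {"n": 0}

    def repl(match):
        counter["n"] += 1
        kind = "GTREF" if "TREF" in match.group(0) else "GREF"
        return f"({kind}.{counter['n']})"

    return re.sub(r"\((?:T?REF\.\d+)(?:;\s*T?REF\.\d+)*\)", repl, tagged)

sentence = ("There are many POS taggers such as transformation-based "
            "error-driven learning (Brill, 1995), decision trees "
            "(Black et al., 1992) and Markov model (Cutting et al., 1992).")
tagged = tag_references(sentence)
print(tagged)                    # ... (TREF.1), ... (REF.2) and ... (REF.3).
print(group_references(tagged))  # each run of placeholders becomes one group
```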
5. Approach 1 (SVM, LR)
Word classification with an SVM and with a logistic regression classifier (a sketch follows this slide).
Features: Distance, Position (Before/After), Same Segment (segments delimited by , . ; and, but, for, nor, or, so, yet), POS Tag, Dependency Distance, Dependency Relations, Common Ancestor Node, Syntactic Distance
Problem example: "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English."
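A minimal sketch of Approach 1 under my own simplifying assumptions: only three of the listed features (distance, position, same segment) plus the lowercased word, a one-sentence toy training set, and scikit-learn's LinearSVC standing in for LibSVM.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

# Toy sketch of Approach 1 (not the authors' code): classify every word of a
# citing sentence as inside (1) or outside (0) the scope of the target
# reference TREF, using a simplified subset of the slide's feature list.
SEGMENT_BREAKS = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}

def word_features(tokens, i, target_idx):
    lo, hi = sorted((i, target_idx))
    return {
        "distance": abs(i - target_idx),                      # word distance
        "before_target": i < target_idx,                      # position
        "same_segment": not any(t in SEGMENT_BREAKS
                                for t in tokens[lo + 1:hi]),  # same segment
        "word": tokens[i].lower(),
    }

# One hand-labeled toy sentence: tokens, index of TREF, gold in/out labels.
tokens = ["parse", "selection", "(", "TREF", ")", ",",
          "parse", "reranking", "(", "REF", ")"]
target_idx, labels = 3, [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

X = [word_features(tokens, i, target_idx) for i in range(len(tokens))]
vec = DictVectorizer()
clf = LinearSVC().fit(vec.fit_transform(X), labels)
print(clf.predict(vec.transform(X)))
```

The logistic regression variant can be obtained by swapping LinearSVC() for sklearn.linear_model.LogisticRegression().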
6. Approach 2 (CRF)
Sequence labeling with a CRF; the features are the same as in Approach 1 (a sketch follows this slide).
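A minimal sketch of Approach 2, assuming the sklearn-crfsuite package instead of the MALLET CRF named on the Tools slide; the feature function is a reduced stand-in for the feature set listed above.

```python
import sklearn_crfsuite

# Toy sketch of Approach 2 (not the authors' code): label the whole sentence
# jointly with a linear-chain CRF instead of classifying each word in isolation.
def feats(tokens, i, target_idx):
    return {"distance": abs(i - target_idx),
            "before_target": int(i < target_idx),
            "word": tokens[i].lower()}

tokens = ["parse", "selection", "(", "TREF", ")", ",",
          "parse", "reranking", "(", "REF", ")"]
target_idx = 3
X_train = [[feats(tokens, i, target_idx) for i in range(len(tokens))]]
y_train = [["I", "I", "I", "I", "I", "O", "O", "O", "O", "O", "O"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X_train, y_train)
print(crf.predict(X_train))  # per-token I/O sequence for the sentence
```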
7. Approach 3-S1-* (CRF/segment)
Segmentation (1): split at punctuation marks, the coordinating conjunctions and, but, for, nor, or, so, yet, and a set of special expressions ("for example", "for instance", "including", "includes", "such as", "like", etc.); see the sketch after the example.
Example: [Rerankers have been successfully applied to numerous NLP tasks such as] [parse selection (GTREF)], [parse reranking (GREF)], [question-answering (REF)].
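A minimal sketch of the S1 segmentation under my own simplifications: boundary tokens are simply dropped rather than attached to a neighboring segment, and the expression list is the one given on the slide.

```python
import re

# Toy sketch of the S1 segmentation (not the authors' code): break the token
# stream at punctuation, coordinating conjunctions, and special expressions.
CONJUNCTIONS = {"and", "but", "for", "nor", "or", "so", "yet"}
SPECIAL = {"for example", "for instance", "including", "includes",
           "such as", "like"}

def segment_s1(tokens):
    segments, current, i = [], [], 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2]).lower()
        unigram = tokens[i].lower()
        if bigram in SPECIAL or unigram in SPECIAL \
                or unigram in CONJUNCTIONS or re.fullmatch(r"[,.;]", tokens[i]):
            if current:
                segments.append(current)
            current = []
            i += 2 if bigram in SPECIAL else 1   # skip the boundary token(s)
        else:
            current.append(tokens[i])
            i += 1
    if current:
        segments.append(current)
    return segments

sentence = ("Rerankers have been successfully applied to numerous NLP tasks "
            "such as parse selection ( GTREF ) , parse reranking ( GREF ) , "
            "question-answering ( REF )")
print(segment_s1(sentence.split()))
```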
8. Approach 3-S2-* (CRF/segment)
Segmentation (2): segments produced by a chunking tool (noun groups, verb groups, preposition groups, adjective groups, adverb groups); all other parts form segments by themselves. See the sketch after the example.
Example: [To] [score] [the output] [of] [the coreference models], [we] [employ] [the commonly-used MUC scoring program (REF)] [and] [the recently-developed CEAF scoring program (TREF)].
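A minimal sketch of the S2 segmentation, assuming NLTK's RegexpParser as a stand-in for the LT-TTT chunker the paper uses; the chunk grammar is my own rough approximation, and the NLTK POS-tagger model must be installed.

```python
import nltk

# Toy sketch of the S2 segmentation (not the authors' code): segments are the
# chunker's noun/verb/preposition groups, and every leftover token is a
# one-word segment.  Requires: nltk.download("averaged_perceptron_tagger").
GRAMMAR = r"""
  NG: {<DT>?<JJ.*>*<NN.*>+}   # noun group
  VG: {<MD>?<VB.*>+}          # verb group
  PG: {<IN|TO>}               # preposition group
"""
chunker = nltk.RegexpParser(GRAMMAR)

tokens = ("We employ the commonly-used MUC scoring program and "
          "the recently-developed CEAF scoring program".split())
tree = chunker.parse(nltk.pos_tag(tokens))

segments = [st.leaves() if isinstance(st, nltk.Tree) else [st]
            for st in tree]
print([[word for word, tag in seg] for seg in segments])
```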
9. Approach 3-*-R1,2,3 (CRF/segment)
Rules for deriving a segment label from the word labels it contains (sketch below):
R1: majority label of the words it contains
R2: inside if any word is inside
R3: outside if any word is outside
Example word labels grouped by segment: [I O O O O] [I I I] [O O]
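A minimal sketch of the three rules, applied to the bracketed example from the slide.

```python
from collections import Counter

# Sketch of the three segment-labeling rules listed on the slide.
def label_segment(word_labels, rule):
    if rule == "R1":   # majority label of the words the segment contains
        return Counter(word_labels).most_common(1)[0][0]
    if rule == "R2":   # inside if any word is inside
        return "I" if "I" in word_labels else "O"
    if rule == "R3":   # outside if any word is outside
        return "O" if "O" in word_labels else "I"
    raise ValueError(f"unknown rule: {rule}")

segments = [["I", "O", "O", "O", "O"], ["I", "I", "I"], ["O", "O"]]
for rule in ("R1", "R2", "R3"):
    print(rule, [label_segment(seg, rule) for seg in segments])
# R1 -> ['O', 'I', 'O'], R2 -> ['I', 'I', 'O'], R3 -> ['O', 'I', 'O']
```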
10. AR2011 (baseline): the link grammar parser (Sleator and Temperley, 1991)
11. Experiment
12. Data
ACL Anthology Network Corpus: 3300 sentences, each with at least 2 citations.
Annotation agreement: checked on 500 of the 3300 sentences; preprocessing was perfect on these.
Kappa coefficient for the scope annotation: K = (P(A) - P(E)) / (1 - P(E)) = 0.61 (worked example below).
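The kappa formula written out as code; the observed and expected agreement inputs are made up for illustration, only the formula and the reported value of 0.61 come from the slide.

```python
# Cohen's kappa as used for the scope-annotation agreement.  The 0.805 / 0.5
# inputs are illustrative placeholders, not numbers reported by the authors.
def kappa(p_a, p_e):
    """K = (P(A) - P(E)) / (1 - P(E))."""
    return (p_a - p_e) / (1.0 - p_e)

print(round(kappa(0.805, 0.5), 2))  # 0.61, the agreement reported on the slide
```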
13. Tools
Edinburgh Language Technology Text Tokenization Toolkit (LT-TTT): text tokenization, part-of-speech tagging, chunking, and noun phrase head identification.
Stanford parser: syntactic and dependency parsing.
LibSVM with a linear kernel: SVM classification.
Weka: logistic regression classification.
14. Tools
Machine Learning for Language Toolkit (MALLET): CRF.
Validation: 10-fold cross validation.
15. Experiment (Preprocessing)
Tagging: 98.3% precision and 93.1% recall.
"These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6)."
Grouping: perfect.
"These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3)."
Non-syntactic reference removal: 90.08% precision and 90.1% recall.
"(GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2)."
16. Experiment (Main): CRF, chunking segmentation, majority rule
17. Feature Analysis
Features: Distance, Position (Before/After), Same Segment (, . ; and, but, for, nor, or, so, yet), POS Tag, Dependency Distance, Dependency Relations, Common Ancestor Node, Syntactic Distance
18. Summary
Identified the reference scope in sentences that contain multiple citations (CRF, chunking segmentation, majority rule).