Reference Scope Identification in Citing Sentences

1. Reference Scope Identification in Citing Sentences. Authors: Amjad Abu-Jbara, Dragomir Radev (University of Michigan). Conference: NAACL 2012. Expositor: Akihiro Kameda (Aizawa Lab., The University of Tokyo).

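2. Abstract. Problem: a single sentence often contains multiple citations, e.g. "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English." The task is to identify which words of such a sentence lie within the scope of a given target reference. Approach: preprocessing, followed by 2 + 1 + 2×3 + 1 = 10 methods (Approach 1: 2, Approach 2: 1, Approach 3: 2 segmentations × 3 rules, AR2011: 1).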

3. Preprocessing & Methods

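4. Reference Preprocessing (tagging, grouping, non-syntactic reference removal). Tagging: each citation is replaced by REF.i, or by TREF.i for the target reference: "These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6)." Grouping: adjacent references are merged into GREF.i, or GTREF.i if the group contains the target: "These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3)." Non-syntactic reference removal: references that play no syntactic role in the sentence, such as the trailing (REF.2) in "(GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2).", are removed, while (GTREF.1), which acts as the subject, is kept.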
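The tagging and grouping steps can be sketched with regular expressions. This is a minimal illustration, not the paper's implementation: it assumes citations look like "(Author, Year)" or "(Author et al., Year)" separated by semicolons, that the target reference is given as its author-year string, and the names `tag_references` and `group_references` are hypothetical. Non-syntactic reference removal needs a parser and is not covered.

```python
import re

# Assumed pattern for one author-year citation, e.g. "Brill, 1995" or
# "Black et al., 1992". Real citation formats are far more varied.
CITATION = re.compile(r"[A-Z][A-Za-z\-]+(?: et al\.)?,\s*\d{4}")
# A parenthesised list of already-tagged references, e.g. "(REF.1; TREF.2)".
TAGGED_GROUP = re.compile(r"\(((?:T?REF\.\d+)(?:;\s*T?REF\.\d+)*)\)")


def tag_references(sentence, target):
    """Tagging: replace every citation with REF.i, or TREF.i for the target."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        tag = "TREF" if match.group(0) == target else "REF"
        return f"{tag}.{count}"

    return CITATION.sub(repl, sentence)


def group_references(tagged):
    """Grouping: collapse each parenthesised list of 2+ references into one
    GREF (or GTREF if it contains the target), renumbering every item."""
    count = 0

    def repl(match):
        nonlocal count
        count += 1
        refs = [r.strip() for r in match.group(1).split(";")]
        has_target = any(r.startswith("TREF") for r in refs)
        if len(refs) == 1:
            tag = "TREF" if has_target else "REF"
        else:
            tag = "GTREF" if has_target else "GREF"
        return f"({tag}.{count})"

    return TAGGED_GROUP.sub(repl, tagged)
```

Applied to the slide's first example sentence with made-up citations (Smith, 2001; Jones, 2002), (Lee, 2003; Kim et al., 2004), (Wu, 2005; Li, 2006) and target "Kim et al., 2004", the two functions reproduce the REF/TREF and GREF/GTREF forms of this slide.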

5. Approach 1 (SVM, LR): word classification. Each word is classified as inside or outside the scope of the target reference, using an SVM and a logistic regression classifier. Features: distance, position (before/after the target), same segment (segments delimited by , . ; and the coordinating conjunctions and, but, for, nor, or, so, yet), POS tag, dependency distance, dependency relations, common ancestor node, syntactic distance. Problem example: "There are many POS taggers developed using different techniques for many major languages such as transformation-based error-driven learning (Brill, 1995), decision trees (Black et al., 1992), Markov model (Cutting et al., 1992), maximum entropy methods (Ratnaparkhi, 1996) etc for English."
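To make the first three features concrete, here is a sketch that computes distance, position, and same-segment features for every token relative to the target reference. It assumes the sentence is already tokenized and preprocessed so that the target appears as a single token `TREF`; `word_features` and `segment_ids` are hypothetical names, and the remaining features (POS, dependency and syntactic ones) would require a tagger and a parser.

```python
# Segment delimiters listed on the slide: punctuation and coordinating conjunctions.
# Note that "for" is also a preposition; this sketch treats it as a delimiter anyway.
BOUNDARIES = {",", ".", ";", "and", "but", "for", "nor", "or", "so", "yet"}


def segment_ids(tokens):
    """Assign each token a segment index; a delimiter opens a new segment."""
    ids, seg = [], 0
    for tok in tokens:
        if tok.lower() in BOUNDARIES:
            seg += 1
        ids.append(seg)
    return ids


def word_features(tokens, target="TREF"):
    """Return one feature dict per token, relative to the target reference token."""
    t = tokens.index(target)
    segs = segment_ids(tokens)
    feats = []
    for i, tok in enumerate(tokens):
        feats.append({
            "word": tok.lower(),
            "distance": abs(i - t),  # token distance to the target reference
            "position": "before" if i < t else ("after" if i > t else "target"),
            "same_segment": segs[i] == segs[t],
        })
    return feats
```

The resulting dictionaries could then be vectorized and passed to an SVM or logistic regression classifier that predicts inside/outside for each word independently.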

6. Approach 2 (CRF): sequence labeling with a CRF. The features are the same as in Approach 1.
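Unlike word classification, a linear-chain CRF scores the whole I/O label sequence jointly, so a word's label can depend on its neighbours' labels. The sketch below shows only the decoding step (Viterbi search for the highest-scoring sequence), assuming per-word emission scores and label-transition scores have already been learned; the `viterbi` function and the scores in the usage example are illustrative assumptions, not MALLET's API or trained values.

```python
def viterbi(emissions, transitions, labels=("I", "O")):
    """Highest-scoring label sequence under a linear-chain CRF.

    emissions[i][y]: score of label y at position i (e.g. weights . features).
    transitions[(y_prev, y)]: score of moving from label y_prev to label y.
    Scores are log-potentials, so they are added, not multiplied.
    """
    # best[y] = score of the best partial path ending in label y
    best = {y: emissions[0][y] for y in labels}
    backptrs = []
    for i in range(1, len(emissions)):
        new_best, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[(p, y)])
            new_best[y] = best[prev] + transitions[(prev, y)] + emissions[i][y]
            ptr[y] = prev
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final label.
    y = max(labels, key=lambda lab: best[lab])
    path = [y]
    for ptr in reversed(backptrs):
        y = ptr[y]
        path.append(y)
    return path[::-1]
```

With emission scores favouring I, O, I for three words, transition scores of +1 for keeping a label and -1 for switching smooth the output to I, I, I; with all transition scores set to 0, the result falls back to the per-word choice I, O, I.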

7. Approach 3-S1-* (CRF/segment): segmentation (1). Segment boundaries are punctuation marks; the coordinating conjunctions and, but, for, nor, or, so, yet; and a set of special expressions: "for example", "for instance", "including", "includes", "such as", "like", etc. Example: [Rerankers have been successfully applied to numerous NLP tasks such as] [parse selection (GTREF)], [parse reranking (GREF)], [question-answering (REF)].
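Segmentation (1) can be sketched as one left-to-right pass over the tokens. Two details are assumptions not stated on the slide: punctuation closes a segment and is dropped from it, and a conjunction or special expression is kept at the end of the segment it closes (which reproduces the "such as" bracketing in the example). Only the explicitly listed expressions are included, and `segment_s1` is a hypothetical name.

```python
PUNCT = {",", ".", ";"}
CONJUNCTIONS = {"and", "but", "for", "nor", "or", "so", "yet"}
# Special expressions named on the slide (its list ends with "etc.").
EXPRESSIONS = [("for", "example"), ("for", "instance"), ("such", "as"),
               ("including",), ("includes",), ("like",)]


def segment_s1(tokens):
    """Split a token list into segments at punctuation, conjunctions and expressions."""
    segments, current, i = [], [], 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in PUNCT:
            # Punctuation closes the current segment and is not kept in it.
            if current:
                segments.append(current)
            current = []
            i += 1
            continue
        # Multi-word expressions are checked before single conjunctions,
        # so "for example" is not split at "for".
        lowered = [t.lower() for t in tokens[i:i + 2]]
        expr = next((e for e in EXPRESSIONS if tuple(lowered[:len(e)]) == e), None)
        if expr is None and tok.lower() in CONJUNCTIONS:
            expr = (tok,)
        if expr:
            # The expression or conjunction ends the segment it closes.
            current.extend(tokens[i:i + len(expr)])
            segments.append(current)
            current = []
            i += len(expr)
            continue
        current.append(tok)
        i += 1
    if current:
        segments.append(current)
    return segments
```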

8. Approach 3-S2-* (CRF/segment): segmentation (2). A chunking tool groups words into noun groups, verb groups, preposition groups, adjective groups and adverb groups; all other words form segments by themselves. Example: [To] [score] [the output] [of] [the coreference models], [we] [employ] [the commonly-used MUC scoring program (REF)] [and] [the recently-developed CEAF scoring program (TREF)].
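Segmentation (2) can be sketched on top of chunker output. The sketch assumes BIO chunk tags in the CoNLL-2000 style (`B-NP`, `I-VP`, `O`, ...); the chunker actually used may emit a different format, and `segment_s2` is a hypothetical name.

```python
# Chunk types that form multi-word segments: noun, verb, preposition,
# adjective and adverb groups (tag names assumed, CoNLL-2000 style).
GROUP_TYPES = {"NP", "VP", "PP", "ADJP", "ADVP"}


def segment_s2(tagged):
    """tagged: list of (token, BIO chunk tag) pairs. Returns a list of segments."""
    segments, current, current_type = [], [], None
    for tok, tag in tagged:
        prefix, _, ctype = tag.partition("-")
        if prefix == "I" and ctype == current_type and current:
            current.append(tok)  # continue the open chunk
            continue
        if current:
            segments.append(current)  # close the previous chunk
        if ctype in GROUP_TYPES:
            current, current_type = [tok], ctype  # open a new chunk
        else:
            segments.append([tok])  # any other token is a segment by itself
            current, current_type = [], None
    if current:
        segments.append(current)
    return segments
```

Run on the second half of the slide's example with hand-assigned tags, it yields [we] [employ] [the commonly-used MUC scoring program (REF)] [and] [the recently-developed CEAF scoring program (TREF)].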

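9. Approach 3-*-R1,2,3 (CRF/segment): each segment receives one label, derived from the labels of its words (I = inside, O = outside the scope). R1: the majority label of the words it contains. R2: inside if any word is inside. R3: outside if any word is outside. Example word labels: [I O O O O] [I I I] [O O].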
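The three rules can be sketched as follows, assuming word-level I/O labels (for instance from the CRF) are collapsed per segment and the segment label is then copied back to each of its words. The slide does not say how R1 breaks ties; this sketch resolves them towards I as an assumption.

```python
from collections import Counter


def segment_label(word_labels, rule):
    """Collapse the word labels ("I"/"O") of one segment into a single label."""
    if rule == "R1":
        # Majority label; ties go to "I" (an assumption).
        counts = Counter(word_labels)
        return "I" if counts["I"] >= counts["O"] else "O"
    if rule == "R2":
        return "I" if "I" in word_labels else "O"
    if rule == "R3":
        return "O" if "O" in word_labels else "I"
    raise ValueError(f"unknown rule: {rule}")


def relabel(segments, rule):
    """Give every word in a segment the segment's label."""
    return [[segment_label(seg, rule)] * len(seg) for seg in segments]
```

For the example segments [I O O O O] [I I I] [O O], R1 gives O, I, O; R2 gives I, I, O; and R3 gives O, I, O.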

10. AR2011: a method based on the link grammar parser (Sleator and Temperley, 1991).

11. Experiment

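12. Data: ACL Anthology Network corpus, 3300 sentences, each containing at least 2 citations. Annotation agreement was measured on 500 of the 3300 sentences, with preprocessing assumed to be perfect. The Kappa coefficient of the scope annotation is $K = \frac{P(A) - P(E)}{1 - P(E)} = 0.61$, where P(A) is the observed agreement and P(E) the agreement expected by chance.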
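Cohen's kappa corrects raw agreement for agreement expected by chance. The sketch below computes it for two annotators' per-word I/O labels, taking P(E) from each annotator's own label distribution. That is the standard Cohen formulation; it is an assumption that the reported figure was computed exactly this way, and `cohen_kappa` is a hypothetical name.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa K = (P(A) - P(E)) / (1 - P(E)) for two label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    p_a = sum(x == y for x, y in zip(a, b)) / n  # observed agreement P(A)
    ca, cb = Counter(a), Counter(b)
    # Chance agreement P(E): both annotators independently pick the same label.
    p_e = sum(ca[lab] * cb[lab] for lab in set(a) | set(b)) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when P(E) = 1")
    return (p_a - p_e) / (1 - p_e)
```

For two annotators labelling four words I I O O and I O O O, P(A) = 0.75, P(E) = 0.5, and K = 0.5.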

13. Tools: Edinburgh Language Technology Text Tokenization Toolkit (LT-TTT) for text tokenization, part-of-speech tagging, chunking, and noun phrase head identification; Stanford parser for syntactic and dependency parsing; LibSVM with a linear kernel; Weka for logistic regression classification.

14. Tools (continued): Machine Learning for Language Toolkit (MALLET) for the CRF. Validation: 10-fold cross-validation.
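A minimal sketch of a 10-fold split, assuming folds are formed round-robin over sentence indices (the actual fold assignment is not given here): each sentence is tested exactly once and used for training in the other nine folds. `k_fold_splits` is a hypothetical helper.

```python
def k_fold_splits(n_items, k=10):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Items are assigned to folds round-robin; shuffle first if the data is ordered.
    """
    folds = [list(range(i, n_items, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

With 3300 sentences, each test fold holds 330 sentences and each training set 2970.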

15. Experiment (Preprocessing). Tagging: 98.3% precision and 93.1% recall, e.g. "These constraints can be lexicalized (REF.1; REF.2), unlexicalized (REF.3; TREF.4) or automatically learned (REF.5; REF.6)." Grouping: perfect, e.g. "These constraints can be lexicalized (GREF.1), unlexicalized (GTREF.2) or automatically learned (GREF.3)." Non-syntactic reference removal: 90.08% precision and 90.1% recall, e.g. "(GTREF.1) apply fuzzy techniques for integrating source syntax into hierarchical phrase-based systems (REF.2)."
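The preprocessing results are reported as precision and recall. A minimal sketch of how such scores are computed over sets of predicted and gold items (for example, the references a tagger marks versus the true ones); the function name and the set-based formulation are assumptions.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 of a predicted set of items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)  # items that are both predicted and correct
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1
```

As a side calculation, the harmonic mean (F1) of the tagging figures, 98.3% precision and 93.1% recall, is about 95.6%.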

16. Experiment (Main): the best configuration combines the CRF with chunking-based segmentation (S2) and the majority rule (R1).

17. Feature Analysis. Features: distance, position (before/after), same segment (, . ; and, but, for, nor, or, so, yet), POS tag, dependency distance, dependency relations, common ancestor node, syntactic distance.

18. Summary: identified the scope of a target reference in sentences that contain multiple citations. Best method: CRF + chunking-based segmentation + majority rule.