Using the GATE Architecture for NE Recognition in the Football Domain

Horacio Saggion, Hamish Cunningham, Diana Maynard, Yorick Wilks

Department of Computer Science, University of Sheffield

MUMIS Objectives

• European Project: U. of Twente (CTIT), U. of Nijmegen (TSI), DFKI Saarbrücken, MPI, Sheffield (DCS), ESTEAM, and VDA

• Technology development to automatically index (with formal annotations) lengthy multimedia recordings (off-line process)

• Technology development to exploit indexed multimedia archives (on-line process)

• Test Domain: Football Games / UEFA Tournament 2000

Information Extraction Task

• 31 events in the football domain: shot on goal, goal, yellow card, red card, foul, free-kick, pass, etc.

• Meta Data (result, teams, referee, city, stadium, …)

• Named Entities
– Person => player, referee, etc.
– Place => location on the pitch, etc.
– Time => relative time (2 min)
– Numbers => score, distance

39 England's best movement of the match. Wise plays a crossfield pass to Gary Neville, who feeds Scholes,

Event: Pass
Time: 39
Player1: Dennis Wise
Player2: Gary Neville
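The template above can be pictured as a small record filled from one ticker line. The following is only a sketch under stated assumptions: the `PassEvent` class, the regular expression, and the `FULL_NAMES` table (standing in for the resolution of a surname to a full player name) are hypothetical illustrations, not the MUMIS system's actual rules.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PassEvent:
    event: str
    time: int
    player1: str
    player2: str


# Hypothetical surname-to-full-name table; the real system would resolve
# names with its own gazetteers and co-reference components.
FULL_NAMES = {"Wise": "Dennis Wise"}

NAME = r"[A-Z][a-z]+(?: [A-Z][a-z]+)*"

# Assumed surface pattern: "<minute> ... <Name> plays a(n) [<word>] pass to <Name>"
PASS_PATTERN = re.compile(
    rf"^(?P<minute>\d+)\s.*?(?P<p1>{NAME}) plays an? (?:\w+ )?pass to (?P<p2>{NAME})"
)


def extract_pass(ticker_line: str) -> Optional[PassEvent]:
    """Fill a Pass template from one ticker line, or return None."""
    m = PASS_PATTERN.search(ticker_line)
    if m is None:
        return None
    p1, p2 = m.group("p1"), m.group("p2")
    return PassEvent(
        event="Pass",
        time=int(m.group("minute")),
        player1=FULL_NAMES.get(p1, p1),
        player2=FULL_NAMES.get(p2, p2),
    )


line = ("39 England's best movement of the match. Wise plays a crossfield "
        "pass to Gary Neville, who feeds Scholes,")
print(extract_pass(line))
```

A single surface pattern like this covers only one phrasing; it is meant to show the shape of the output, not the coverage of a real grammar.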

Text Sources

Tickers
England: Seaman, G. Neville, P. Neville, Campbell, Keown, Beckham, Scholes, Shearer, Owen, Ince, Wise. Substitutes: Martyn, Wright, Southgate, Barry, Gerrard, Barmby, Heskey, Fowler, Phillips.
1 England kick off. After all the expectation, we're finally under way. Playing from right to left, the first England attack is a long ball to Shearer.

Comments
After 34 years of hurt, self examination, navel gazing, inferiority complexes and frustration, Kevin Keegan believes the tide of German superiority over England has turned. 'We're fed up of hearing they've got something on us and we play them again soon. I hope we make them pay as we've had to pay.'

Matches
Alan Shearer scored the all-important goal, not one of his most difficult but a strike destined to be remembered longer than many others, early in the second half. They had to survive a few subsequent scares, but England did enough to confirm they are not the worst team in their group. Indeed, England could swagger into the quarter-finals with confidence. They may need to, for Italy in Brussels are their most likely opponents.

Sheffield Information Extraction System

Basic Steps

Text Formats
• HTML, XML, SGML, EMAIL
– HTML: head, title, paragraph, etc.
– EMAIL: from, date, subject, etc.
• PLAIN TEXT, RTF

Unicode Tokeniser
• Rule Based
– (UPPERCASE_LETTER) (LOWERCASE_LETTER)* > Token; orth = upperInitial; kind = word
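The tokeniser rule above assigns features to a token according to its character classes. A minimal Python sketch of that idea follows; it is not GATE's tokeniser. The function name, the regular expression, and the feature values other than `upperInitial` and `word` (for example `allCaps`, `mixedCaps`, `number`, `space`, `punctuation`) are assumptions made for illustration.

```python
import re

# Runs of letters, runs of digits, runs of whitespace, or a single other character.
TOKEN_PATTERN = re.compile(r"[^\W\d_]+|\d+|\s+|[^\w\s]|_")


def orthography(word: str) -> str:
    """Classify the capitalisation of an alphabetic token (assumed labels)."""
    if word[0].isupper() and (len(word) == 1 or word[1:].islower()):
        # Mirrors (UPPERCASE_LETTER) (LOWERCASE_LETTER)* from the slide.
        return "upperInitial"
    if word.isupper():
        return "allCaps"
    if word.islower():
        return "lowercase"
    return "mixedCaps"


def tokenise(text: str) -> list:
    """Split text into tokens, each a dict with string, kind and (for words) orth."""
    tokens = []
    for match in TOKEN_PATTERN.finditer(text):
        s = match.group()
        if s.isalpha():
            tokens.append({"string": s, "kind": "word", "orth": orthography(s)})
        elif s.isdigit():
            tokens.append({"string": s, "kind": "number"})
        elif s.isspace():
            tokens.append({"string": s, "kind": "space"})
        else:
            tokens.append({"string": s, "kind": "punctuation"})
    return tokens


for token in tokenise("Adams (Keown 82mins)"):
    print(token)
```

Later components, such as the grammar rules, can then test these features (for example `kind == number`) instead of re-inspecting the raw characters.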

Gazetteer Look-up

• Hand-coded lists (.lst) from different sources
– referee_names_euro2000.lst

Günter Benkö

Pierluigi Collina

• Set of lists defined in .def file and compiled into FSM

• Each element has attributes MajorType and MinorType

national_teams_euro2000.lst:championships_info:team

referee_names_euro2000.lst:championships_info:referee

players_goalkeeper.lst:player:goalkeeper
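The relation between the .def entries and the Lookup features can be sketched as follows. This is an illustration only: it uses a plain dictionary and substring search in place of the compiled FSM, the lists are held in memory rather than read from files, and the team entries are assumed contents (the slide shows only the referee names).

```python
def build_gazetteer(def_lines, lists):
    """Map every list entry to the major/minor type given in the .def lines."""
    gazetteer = {}
    for line in def_lines:
        filename, major, minor = line.strip().split(":")
        for entry in lists.get(filename, []):
            gazetteer[entry] = {"majorType": major, "minorType": minor}
    return gazetteer


def lookup(text, gazetteer):
    """Return (start, end, majorType, minorType) for every entry found in text."""
    annotations = []
    for entry, features in gazetteer.items():
        start = text.find(entry)
        while start != -1:
            annotations.append(
                (start, start + len(entry), features["majorType"], features["minorType"])
            )
            start = text.find(entry, start + 1)
    return sorted(annotations)


DEF_LINES = [
    "national_teams_euro2000.lst:championships_info:team",
    "referee_names_euro2000.lst:championships_info:referee",
]
LISTS = {
    # Assumed contents for illustration; not taken from the real list file.
    "national_teams_euro2000.lst": ["England", "Yugoslavia"],
    # Entries shown on the slide.
    "referee_names_euro2000.lst": ["Günter Benkö", "Pierluigi Collina"],
}

gaz = build_gazetteer(DEF_LINES, LISTS)
text = "Referee: Pierluigi Collina. England 6 - 1 Yugoslavia"
for start, end, major, minor in lookup(text, gaz):
    print(text[start:end], major, minor)
```

A finite-state implementation gives the same annotations but scans the text once, which matters when the lists are large.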

Regular Grammars

Java Annotation Pattern Engine (JAPE) Grammar

• Similar to Common Pattern Specification Language

• Set of rules
– LHS regular expression over annotations
– RHS annotations to be added
– Priority
– Left and Right context around the pattern
– JAVA Code

• Rules are compiled into an FST over annotations

• A set of grammars can be loaded

• Rules for sentence splitting

Rules

Adams (Keown 82mins)

Rule: TimeStamp5
(
  ({Token.kind == number}) (SPACE)?
  ({Lookup.minorType == minutes})
) :annotate
({Token.string == ")"})
-->
:annotate.TimeStamp = { rule = "TimeStamp5" }
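The effect of the TimeStamp5 rule can be approximated over plain text with a regular expression. This is a sketch, not JAPE semantics: it matches characters rather than annotations, and the set of strings standing in for `Lookup.minorType == minutes` (`min`, `mins`, `minutes`) is an assumption. As in the rule, the closing parenthesis is right context and is left out of the annotated span.

```python
import re

# number, optional space, a "minutes" word; ")" is required as right context only.
TIMESTAMP5 = re.compile(r"(?P<number>\d+)\s?(?:minutes|mins|min)(?=\))")


def find_timestamps(text):
    """Return TimeStamp annotations as dicts with span, minute and rule name."""
    return [
        {
            "start": m.start(),
            "end": m.end(),
            "minute": int(m.group("number")),
            "rule": "TimeStamp5",
        }
        for m in TIMESTAMP5.finditer(text)
    ]


text = "Adams (Keown 82mins)"
for ann in find_timestamps(text):
    print(text[ann["start"]:ann["end"]], ann)
```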

England 6 - 1 Yugoslavia

Team1 = England
Team2 = Yugoslavia
Score1 = 6
Score2 = 1

Rule: AddValueStateOfGame1
({StateOfGame.rule == "rule1"}):annotate
-->
:annotate { JAVA CODE }
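In the rule above, the Java code on the right-hand side computes the feature values for the StateOfGame annotation. As a rough Python stand-in, and assuming a score line of the form "Team N - M Team", the values could be derived like this (the pattern and function name are hypothetical):

```python
import re

# Assumed line shape: "<Team1> <Score1> - <Score2> <Team2>"
STATE_OF_GAME = re.compile(
    r"^(?P<team1>[A-Z][\w ]*?)\s+(?P<score1>\d+)\s*-\s*(?P<score2>\d+)\s+(?P<team2>[A-Z][\w ]*)$"
)


def state_of_game(line):
    """Return the feature values for a score line, or None if it does not match."""
    m = STATE_OF_GAME.match(line.strip())
    if m is None:
        return None
    return {
        "Team1": m.group("team1"),
        "Team2": m.group("team2"),
        "Score1": int(m.group("score1")),
        "Score2": int(m.group("score2")),
    }


print(state_of_game("England 6 - 1 Yugoslavia"))
```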

NE Recognition

Holland 1 - 0 Czech Republic
Full Time. Holland 1 - 0 Czech Republic
Holland 1 Czech Rep 0
Germany Scholl 28 1 - 1 Romania Moldovan 5
England: Seaman, G. Neville, Adams, …
Holland (4-3-2-1): Van der Sar; Reiziger, Stam (Konterman, 75min), …
France: 1. Bernard Lama; 19. Christian Karembeu, 18. Franck...

Gazetteer Lookup and Classification
“Seaman”: Player, Goalkeeper, England

Cascade of JAPE Grammars
Players (Name and Position), Teams (National and Collective), Substitution (On, Off, Time), Lists of Players (all playing, all substitutes), Formation, Temporal Expressions (General), Teams Playing, State of Game, Time Stamps, Results (partial, final)
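Several of the entity types listed above (team, formation, list of players, substitution) appear together in a single line-up line. The sketch below parses such a line with regular expressions instead of a grammar cascade. The patterns and field names are hypothetical, and the `substitute` field records only the name found in parentheses; reading it as the player who came on is an assumption.

```python
import re

LINEUP = re.compile(
    r"^(?P<team>[A-Z][\w ]*?)(?: \((?P<formation>\d(?:-\d)+)\))?: (?P<players>.+)$"
)
SUBSTITUTION = re.compile(
    r"^(?P<player>[^()]+?) \((?P<sub>[^,()]+?),? (?P<minute>\d+) ?mins?\)$"
)
# Split on "," or ";" unless the separator sits inside parentheses.
SEPARATOR = re.compile(r"[;,]\s*(?![^()]*\))")


def parse_lineup(line):
    """Return team, formation (or None) and a list of player records."""
    m = LINEUP.match(line)
    if m is None:
        return None
    players = []
    for item in SEPARATOR.split(m.group("players")):
        item = item.strip()
        if not item:
            continue
        sub = SUBSTITUTION.match(item)
        if sub:
            players.append({
                "name": sub.group("player"),
                "substitute": sub.group("sub"),
                "minute": int(sub.group("minute")),
            })
        else:
            players.append({"name": item})
    return {
        "team": m.group("team"),
        "formation": m.group("formation"),
        "players": players,
    }


print(parse_lineup("Holland (4-3-2-1): Van der Sar; Reiziger, Stam (Konterman, 75min)"))
```

In the system described here, each of these pieces would come from a separate grammar phase working over gazetteer Lookup annotations, which lets later phases reuse the Player and Team annotations found by earlier ones.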

Other Finite State Components

• Lemmatiser

– List of Exceptions (“biases” analysed as “bias”+“s”)

– biases => root:bias, affix:s

– Rules for Regular forms (“expresses” analysed as “express”+“s”)

– ANY+ DOUBLE “ES” => root:ANY+ DOUBLE, affix:s

• POS tagger

– Lexicon

– beginning VBG NN

– observed VBD VBN JJ

– Rules

– VB NN PREV1OR2TAG DT

– VB VBP PREVTAG NNS

– IN JJ SURROUNDTAG DT NN
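Both components above follow the same shape: a table is consulted first, then rules are applied. The sketch below mirrors the slide's examples in Python. It is an illustration, not the actual components: the regular expression is one possible reading of `ANY+ DOUBLE "ES"`, the rule semantics follow the usual reading of the transformation templates (`PREVTAG`, `PREV1OR2TAG`, `SURROUNDTAG`), and the default tag `NN` for unknown words is an assumption.

```python
import re

# --- Lemmatiser: exceptions first, then regular rules -----------------------
EXCEPTIONS = {"biases": ("bias", "s")}
DOUBLE_ES = re.compile(r"^(.+(.)\2)es$")  # ANY+ DOUBLE "ES"


def lemmatise(word):
    """Return (root, affix) for a word form."""
    if word in EXCEPTIONS:
        return EXCEPTIONS[word]
    m = DOUBLE_ES.match(word)
    if m:
        return (m.group(1), "s")
    return (word, "")


# --- POS tagger: lexicon gives an initial tag, rules then correct it --------
LEXICON = {"beginning": ["VBG", "NN"], "observed": ["VBD", "VBN", "JJ"]}
RULES = [
    ("VB", "NN", "PREV1OR2TAG", ("DT",)),
    ("VB", "VBP", "PREVTAG", ("NNS",)),
    ("IN", "JJ", "SURROUNDTAG", ("DT", "NN")),
]


def initial_tags(words):
    """First tag listed in the lexicon; unknown words default to NN (assumption)."""
    return [LEXICON.get(w, ["NN"])[0] for w in words]


def apply_rules(tags):
    """Apply each rule in order, left to right, and return the corrected tags."""
    tags = list(tags)
    for old, new, template, args in RULES:
        for i, tag in enumerate(tags):
            if tag != old:
                continue
            prev1 = tags[i - 1] if i >= 1 else None
            prev2 = tags[i - 2] if i >= 2 else None
            nxt = tags[i + 1] if i + 1 < len(tags) else None
            if template == "PREVTAG":
                fire = prev1 == args[0]
            elif template == "PREV1OR2TAG":
                fire = args[0] in (prev1, prev2)
            else:  # SURROUNDTAG
                fire = prev1 == args[0] and nxt == args[1]
            if fire:
                tags[i] = new
    return tags


print(lemmatise("expresses"), lemmatise("biases"))
print(apply_rules(["DT", "IN", "NN"]))
```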

Processing Resource

An Application

Language Resource

Visualization

Visualization

Prolog Components

Named Entities and Semantic Annotations feed Prolog back-end components

• Bottom Up Chart Parsing
• Context Free Grammar
• Semantic Rules

• Discourse Interpretation
• Entity and Event Co-reference
• Presuppositions and Consequences