J. Turmo, 2006 Adaptive Information Extraction

Information Extraction

Jordi Turmo
TALP Research Centre
Dep. Llenguatges i Sistemes Informàtics
Universitat Politècnica de Catalunya
[email protected]
http://www.lsi.upc.edu/~turmo

Summary

• Information Extraction Systems
• Evaluation
• Multilinguality
• Adaptability

Summary

• Information Extraction Systems
  • Introduction
  • Historical framework
  • Architecture
  • Knowledge specific for IE
  • Examples
• Evaluation
• Multilinguality
• Adaptability

Introduction

Definition

• Goal: locate and extract, in a specific format, the relevant information contained in a collection of documents
• Input requirements: extraction scenario and document collection
• Output requirements: output format

Typology

• Different points of view:
  − conceptual coverage: restricted-domain IE vs. open-domain IE
  − language coverage: monolingual IE vs. multilingual IE
  − media coverage: written text IE, speech IE, image IE, multimedia IE
  − document type: IE from free text, from semi-structured documents, from structured documents (including Web pages in HTML and XML)


Example 1: Structured documents

• Web pages
• A list of members of an organization per document
• English
• Scenario of extraction: name, degree, school and affiliation of the member

Example 1: Structured documents

Name       Degree  School   Affiliation
WL Hsu     PhD     Cornell  IIS, Sinica
CS Ho      PhD     NTU      EE, NTIT
C.Chen     PhD     SUNY     EE, NTIT
C.Wu       PhD     Utexas   Cedu, NNU
Mark Liao  PhD     NWU      IIS, Sinica
CJ Liau    PhD     NTU      IIS, Sinica
WK Cheng   PhD     TKU      Tunghai
WC Wang    MS      Syracus  FIT
...

Example 2: Semi-structured documents

• 485 seminar announcements
• A description of one seminar per document
• English
• Scenario of extraction: speaker, location, start time and end time of the seminar


Example 3: Free text

• 318 Wall Street Journal articles
• A description of an incident per document
• English
• Scenario of extraction: type of incident, perpetrator, target, date, location, effects and instrument

Example 3: Free text

A bomb went off this morning near a power tower in San Salvador leaving a large part of the city without energy, but no casualties have been reported. According to unofficial sources, the bomb -allegedly detonated by urban guerrilla commandos- blew up a power tower in the northwestern part of San Salvador at 0650.

Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb
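The filled template above can be held in a small typed record. The sketch below is only one possible representation: the class name and field names are illustrative assumptions, not part of the original scenario definition.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class IncidentTemplate:
    """One filled template for the incident scenario (field names are illustrative)."""
    incident_type: str
    date: Optional[str] = None
    location: Optional[str] = None
    perpetrator: Optional[str] = None
    physical_target: Optional[str] = None
    human_target: Optional[str] = None  # None plays the role of the "-" slot
    effect_on_physical_target: Optional[str] = None
    effect_on_human_target: Optional[str] = None
    instrument: Optional[str] = None


template = IncidentTemplate(
    incident_type="bombing",
    date="March 19",
    location="El Salvador: San Salvador (city)",
    perpetrator="urban guerrilla commandos",
    physical_target="power tower",
    effect_on_physical_target="destroyed",
    effect_on_human_target="no injury or death",
    instrument="bomb",
)

# Slots that the extraction left unfilled.
empty_slots = [name for name, value in asdict(template).items() if value is None]
```

Keeping unfilled slots as `None` makes it easy to tell "not extracted" apart from an extracted value.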

Example 4: Free text

• 78 documents
• A description of one mushroom per document
• Spanish
• Scenario of extraction: colors of parts of mushrooms and the circumstances in which they occur


Example 4: Free text

El color blanco de su sombrero pasa a amarillo crema al corte. El sombrero ennegrece si se corta.
(English: The white color of its cap turns cream yellow when cut. The cap blackens if it is cut.)

Sombrero_1
  color:

Sombrero_2
  color:

virar_1
  inicio:
  final:
  causa: corte

virar_2
  inicio: indef
  final:
  causa: corte

color_1
  base: blanco
  tono: indef
  luz: indef

color_2
  base: amarillo
  tono: crema
  luz: indef

color_3
  base: indef
  tono: negro
  luz: indef

Example 5: Combination

• 78 documents
• A description of one mushroom per document
• Spanish
• Scenario of extraction: names of the mushroom in different languages, etymology, colors of parts of mushrooms and the circumstances in which they occur


Applications

• IE from the Web
• Building of news DBs
• Information Integration
• Support for QA and Summarization …

Limitation when P < 80%

References

• D.E. Appelt, D.J. Israel, 1999
• E. Hovy, 1999
• R.J. Mooney, C. Cardie, 1999
• Muslea, 1999
• J. Cowie, Y. Wilks, 2000
• M.T. Pazienza, 2003
• Turmo, 2003
• Turmo et al., 2005

Recent events

• IJCAI 2001 Workshop on Adaptive Text Extraction and Mining (ATEM-2001)

• ECML 03/PKDD Workshop on Adaptive Text Extraction and Mining (ATEM-2003)

• AAAI 04 Workshop on Adaptive Text Extraction and Mining (ATEM-2004)

• EACL 06 Workshop on Adaptive Text Extraction and Mining (ATEM-2006)

• COLING-ACL 06 Workshop on Information Extraction Beyond the Document

• ECAI 06 Workshop on Adaptive Text Extraction and Mining (ATEM-2006)


Historical framework

Origin of IE

• Acquisition of the relevant information involved in knowledge-based systems
• Traditionally (high human cost): experts on the domain → manual process → relevant information

Origin of IE

• Acquisition of the relevant information involved in knowledge-based systems
• 80's (text sources): text-based intelligent systems → relevant information

Origin of IE

• Text-Based Intelligent Systems (TBIS)
  − Information Retrieval
  − Information Integration
  − Information Filtering
  − Information Routing
  − Information Extraction
  − Document Classification
  − Question Answering
  − Automatic Summarization
  − Topic Detection & Tracking
  ...

Relevant Historical Programs

• Precedents: LSP (Sager, 81), FRUMP (DeJong, 82), JASPER (Hayes, 86)
• In the USA
  − (1987-1991): MUC [US Navy]
  − TIPSTER (1991-1998): MUC [DARPA]
  − TIDES (1999-): ACE [NIST]
• In Europe
  − LRE (1993-1996): TREE, AVENTINUS, FACILE, ECRAN, SPARKLE
  − PASCAL excellence network (2003-)

MUC Evolution

• MUC-1 (1987)
  – naval operations
  – auto-definition of scenarios
  – auto-evaluation
• MUC-2 (1989)
  – naval operations
  – output structure with 10 attributes (type of event, agent, place, ...)
  – auto-evaluation

MUC Evolution

• MUC-3 (1991)
  – Latin-American terrorism
  – output structure with 18 attributes (type of incident, date, place, ...)
  – recall and precision measures

[Figure: diagram of the extracted and relevant sets divided into regions a, b, c, d, e and f, with a region of partially extracted items]

extracted = a + b + e + f
relevant = a + f + d
recall = (a + 0.5 f) / (a + f + d)
precision = (a + 0.5 f) / (a + f + b + e)
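The MUC-3 measures can be checked numerically with a few lines of Python. This is a minimal sketch: the reading of the regions (a fully correct, f partially extracted and given half credit, d relevant but missed, b and e extracted but not relevant) is inferred from the formulas, and the counts in the example are made up.

```python
def muc3_scores(a, b, d, e, f):
    """Recall and precision with half credit for partially extracted items."""
    credit = a + 0.5 * f          # full credit for a, half credit for f
    relevant = a + f + d          # everything that should have been extracted
    extracted = a + b + e + f     # everything the system returned
    recall = credit / relevant if relevant else 0.0
    precision = credit / extracted if extracted else 0.0
    return recall, precision


# Hypothetical counts, for illustration only.
recall, precision = muc3_scores(a=6, b=2, d=2, e=2, f=2)
```

With these counts the credit is 7, over 10 relevant items and 12 extracted items.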

MUC Evolution

• MUC-4 (1992)
  – Latin-American terrorism
  – 24 attributes
  – F-score (harmonic average of precision p and recall r)

    $F = \frac{(\beta^2 + 1)\, p\, r}{\beta^2 p + r}$

• MUC-5 (1993)
  – Financial news, microelectronics
  – English, Japanese

MUC Evolution

• MUC-6 (1995)
  – financial news
  – subtasks: NE, coreference
  – tasks: TE (template element), ST (scenario template)
• MUC-7 (1998)
  – air crashes
  – new task: TR (template relation)

MUC Evolution

• MUC-6, MUC-7
  – Partial extractions are discarded

[Figure: diagram of the extracted and relevant sets divided into regions a, b, c and d]

extracted = a + b
relevant = a + d
recall = a / (a + d)
precision = a / (a + b)

$F = \frac{(\beta^2 + 1)\, p\, r}{\beta^2 p + r}$
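As a worked example of the strict MUC-6/MUC-7 scoring and of the F-score, the following sketch computes recall, precision and F from the region counts. The function names and the counts are illustrative assumptions; β = 1 weights precision and recall equally.

```python
def strict_scores(a, b, d):
    """Recall and precision when partial extractions are discarded."""
    recall = a / (a + d) if (a + d) else 0.0
    precision = a / (a + b) if (a + b) else 0.0
    return recall, precision


def f_score(precision, recall, beta=1.0):
    """F = (beta^2 + 1) * p * r / (beta^2 * p + r)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (beta ** 2 + 1) * precision * recall / denominator


# Hypothetical counts: 6 correct, 2 spurious, 4 missed.
recall, precision = strict_scores(a=6, b=2, d=4)
f1 = f_score(precision, recall)
```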


Architecture

General Architecture

• Hobbs, 93:
  – Cascade of transducers (or modules) that add structure to text and, often, drop irrelevant information by applying rules
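A cascade of this kind can be sketched as a list of functions applied in order, each one adding a layer of structure to a shared document record. The three stages and the toy lexicon below are hypothetical and only illustrate the control flow; they are not taken from any particular system.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def tokenize(doc: Dict) -> Dict:
    # Adds a token layer on top of the raw text.
    return {**doc, "tokens": doc["text"].rstrip(".").split()}


def tag(doc: Dict) -> Dict:
    # Adds a semantic tag per token using a toy lexicon (hypothetical entries).
    lexicon = {"bomb": "INSTRUMENT", "tower": "TARGET"}
    return {**doc, "tags": [lexicon.get(t.lower(), "O") for t in doc["tokens"]]}


def drop_irrelevant(doc: Dict) -> Dict:
    # Rule: keep only tokens that received a scenario-relevant tag.
    kept = [(t, g) for t, g in zip(doc["tokens"], doc["tags"]) if g != "O"]
    return {**doc, "entities": kept}


def run_cascade(doc: Dict, stages: List[Stage]) -> Dict:
    for stage in stages:
        doc = stage(doc)
    return doc


out = run_cascade(
    {"text": "A bomb went off near a power tower."},
    [tokenize, tag, drop_irrelevant],
)
```

Because every stage has the same signature, modules can be added, removed or reordered without touching the others.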

Traditional Architecture

[Figure: block diagram with the modules Document Preprocessing, Pattern Matching and Postprocess, and the resources Conceptual Hierarchy and Pattern Base]

Traditional Architecture

[Figure: block diagram with the modules Text Control, Lexical Analysis, Syntactic Analysis, Pattern Matching and Postprocess, and the resources Conceptual Hierarchy and Pattern Base]

Traditional Architecture

[Figure: block diagram with the modules Text Control, Lexical Analysis, Syntactic Analysis, Pattern Matching, Discourse Analysis and Output Template Generation, and the resources Conceptual Hierarchy, Pattern Base and Output Format]

Text control

• Filtering relevant documents
• Guessing the language of the documents
• Splitting documents into textual zones
• Filtering relevant zones
• Splitting text into appropriate units (e.g. sentences)
• Filtering relevant units
• Tokenizing units
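The last three steps of the list (splitting into units, filtering units, tokenizing) can be sketched with regular expressions. This is a deliberately naive illustration: the splitting rule breaks on abbreviations, and the keyword set used for filtering is a made-up stand-in for a real relevance criterion.

```python
import re


def split_units(text):
    # Split after '.', '!' or '?' followed by whitespace (naive sentence splitter).
    return [u.strip() for u in re.split(r"(?<=[.!?])\s+", text) if u.strip()]


def tokenize(unit):
    # A token is a run of word characters or a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", unit)


def is_relevant(tokens, keywords):
    return any(t.lower() in keywords for t in tokens)


text = ("El sombrero ennegrece si se corta. "
        "Puede confundirse con otras foliotas comestibles.")
units = [tokenize(u) for u in split_units(text)]
relevant = [u for u in units if is_relevant(u, {"sombrero", "color"})]
```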


Text control

• Example

<Sombrero bastante carnoso de 4 a 8 cm , convexo , luego completamente extendido , aplanado y mamelonado , liso , húmedo e higrófano .> <Esta última condición influye en la variabilidad de su coloración desde canela claro a toda la gama de tostados .> <Con la edad generalmente palidece sus tonos .>

<Puede confundirse con otras foliotas comestibles , pero alguna especie es amarga . ><Los aficionados poco experimentados pueden también confundir este género con otros no comestibles , como Hypholoma y Flacemula , también lignícolas.>

Lexical analysis

• Identifying morpho-syntactic categories and semantic categories of words (resource: general lexicon)
• Recognizing terminology words (resource: specific dictionaries)
• Recognizing time expressions, quantities, abbreviations, …
• Expanding abbreviations (resource: lists of abbreviations + expansions)

Lexical analysis

• Recognizing and classifying proper nouns (Named Entities, NERC): gazetteers, patterns
• Dealing with unknown words
• Dealing with lexical ambiguities: POS taggers, WSD (???)
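Gazetteers and patterns are the two resources named for NERC. A minimal sketch of how they combine is shown below, assuming a one-entry toy gazetteer and a hand-written pattern for four-digit clock times; both are illustrative and far smaller than anything usable in practice.

```python
import re

# Toy gazetteer: surface form (lowercased) -> entity class. Hypothetical content.
GAZETTEER = {"san salvador": "LOCATION"}

# Pattern for 24-hour times written as four digits, e.g. 0650.
TIME_PATTERN = re.compile(r"\b(?:[01]\d|2[0-3])[0-5]\d\b")


def recognize(text):
    """Return (mention, class) pairs found by gazetteer lookup and by pattern."""
    entities = []
    lowered = text.lower()
    for name, label in GAZETTEER.items():
        for m in re.finditer(re.escape(name), lowered):
            # Offsets are reused on the original text to keep its casing.
            entities.append((text[m.start():m.end()], label))
    for m in TIME_PATTERN.finditer(text):
        entities.append((m.group(), "TIME"))
    return entities


entities = recognize(
    "The bomb blew up a power tower in the northwestern part of San Salvador at 0650."
)
```

Gazetteer lookup covers known names, while patterns generalize to mentions that cannot be listed exhaustively.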

Lexical analysis

• Example 1

<Sombrero bastante carnoso de 4 a 8 cm , convexo , luego completamente extendido , aplanado y mamelonado , liso , húmedo e higrófano .> <Esta última condición influye en la variabilidad de su coloración desde canela claro a toda la gama de tostados .> <Con la edad generalmente palidece sus tonos .>

<Puede confundirse con otras foliotas comestibles , pero alguna especie es amarga . ><Los aficionados poco experimentados pueden también confundir este género con otros no comestibles , como Hypholoma y Flacemula , también lignícolas.>

[Figure legend: the highlighted categories are time expressions, mushroom names, abbreviations, numbers and morphological parts; the categories depend on the scenario]

Lexical analysis

• Example 2

[Figure legend: the highlighted categories are time expressions, locations, organizations and persons]

<A bomb went off this morning near a power tower in San Salvador leaving a large part of the city without energy , but no casualties have been reported .><According to unofficial sources , the bomb-allegedly detonated by urban guerrilla commandos- blew up a power tower in the northwestern part of San Salvador at 0650 .>

Syntactic analysis

• Full parsing (Lolita, LaSIE, LaSIE-II)
  – inefficient, because of the size of the grammars
  – lack of robustness (out-of-vocabulary words)
  – treebank grammars
  – cascaded grammars
• Solves some problems related to tuning and incompleteness

Syntactic analysis

• Partial parsing
  − the most commonly used
  − chunks or phrasal trees (noun phrases, verbal phrases, prep phrases, adj phrases, adv phrases)
  − absence of global dependences

Semantic interpretation

• Compositional semantics
  − full parsing + λ-expressions
    − LaSIE, LaSIE-II
    − entries with λ-expressions in the lexicons
  − partial parsing + grammatical relations [Vilain, 99]
  − output = logical forms

Semantic interpretation

• Compositional semantics (example 1)

A bomb went off this morning near a power tower in San Salvador …

[Figure: parse tree of the sentence with constituents labelled np, pp, vp and s]

go_off → λ(t) λ(s) λ(r) λ(z) λ(y) λ(x) (bombing(x,y,z,r,s,t))
power_tower → λ(x) (power_tower(x))

λ(z) λ(y) λ(x) (bombing(x,y,z,bomb,today_morning,power_tower(San_Salvador)))

Semantic interpretation

• Compositional semantics (example 2)

A bomb went off this morning near a power tower in San Salvador …

[Figure: the sentence annotated with the grammatical relations subj, time, place and location_of]

event(bombing, E)
subj(bomb, E)
time(today_morning, E)
place(power_tower, E)
location_of(power_tower, San_Salvador)

Semantic interpretation

• Pattern matching
  − after partial parsing + SVO dependences
  − the most widespread approach
  − patterns can be implemented in different ways
  − scenario-driven approach (TE, TR, ST, …)
  − output = partial templates

Semantic interpretation

• Pattern matching (example)

A bomb went off this morning near a power tower in San Salvador …

np(C-instrument) … vp(go_off) … np(C-time) … “near” np(C-place) “in” np(C-location)
→
INSTRUMENT := C-instrument
DATE := C-time
PHIS_TARGET := C-place
LOCATION := C-location
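One way to implement such a pattern is as a sequence of elements matched left to right against the chunked sentence, where "…" allows chunks to be skipped. The sketch below assumes that chunking and concept assignment have already produced (kind, label, text) triples; the encoding of the pattern and the greedy, non-backtracking matcher are illustrative choices, not the method of any specific system.

```python
# Chunked sentence: (chunk kind, concept or head, surface text). Assumed input.
CHUNKS = [
    ("np", "C-instrument", "A bomb"),
    ("vp", "go_off", "went off"),
    ("np", "C-time", "this morning"),
    ("word", "near", "near"),
    ("np", "C-place", "a power tower"),
    ("word", "in", "in"),
    ("np", "C-location", "San Salvador"),
]

# Pattern elements: (kind, label, slot to fill or None, may skip chunks before it).
PATTERN = [
    ("np", "C-instrument", "INSTRUMENT", True),
    ("vp", "go_off", None, True),
    ("np", "C-time", "DATE", True),
    ("word", "near", None, True),
    ("np", "C-place", "PHIS_TARGET", False),
    ("word", "in", None, False),
    ("np", "C-location", "LOCATION", False),
]


def match(chunks, pattern):
    """Greedy left-to-right match; returns the filled slots or None."""
    slots, pos = {}, 0
    for kind, label, slot, may_skip in pattern:
        while pos < len(chunks) and chunks[pos][:2] != (kind, label):
            if not may_skip:
                return None  # adjacent element required but not found
            pos += 1
        if pos == len(chunks):
            return None
        if slot:
            slots[slot] = chunks[pos][2]
        pos += 1
    return slots


partial_template = match(CHUNKS, PATTERN)
```

The result is a partial template: only the slots mentioned in the pattern are filled.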

Discourse analysis

• Inter-sentence analysis
  − Co-reference resolution
  − Ellipsis resolution
  − Alias resolution
  − Traditional semantic interpretation procedures
  − Template merging procedures
• Inference procedures
  − Open-domain and domain-specific knowledge for inferences

Discourse analysis

• Example

A bomb went off this morning near a power tower in San Salvador …, but no casualties have been reported

According to unofficial sources , the bomb -allegedly detonated by urban guerrilla commandos- blew up a power tower in the northwestern part of San Salvador at 0650

λ(y) λ(x) (bombing(x,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))

λ(z) λ(y) (bombing(urban_guerrilla_comandos,y,z,bomb,0650,power_tower(the_northwestern_part_of_San_Salvador)))

Discourse analysis

• Example

λ(y) λ(x) (bombing(x,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))
λ(z) λ(y) (bombing(urban_guerrilla_comandos,y,z,bomb,0650,power_tower(the_northwestern_part_of_San_Salvador)))

Unification & inference:
λ(y) (bombing(urban_guerrilla_comandos,y,no_casualties,bomb,today_morning,power_tower(San_Salvador)))

Inference (blew_up → destroyed):
bombing(urban_guerrilla_comandos,destroyed,no_casualties,bomb,today_morning,power_tower(San_Salvador))
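The merging step of this example can be approximated in a few lines. The sketch below is a simplified merge, not full unification: each logical form is a dictionary whose unbound arguments are `None`, and when both forms fill a slot the value of the first one is kept, which is an assumed policy chosen to reproduce the result shown. The slot names and the single inference rule are also illustrative.

```python
# The two partial logical forms, with None for arguments still bound by λ.
first = {
    "perpetrator": None,
    "physical_effect": None,
    "human_effect": "no_casualties",
    "instrument": "bomb",
    "time": "today_morning",
    "target": "power_tower(San_Salvador)",
}
second = {
    "perpetrator": "urban_guerrilla_comandos",
    "physical_effect": None,
    "human_effect": None,
    "instrument": "bomb",
    "time": "0650",
    "target": "power_tower(the_northwestern_part_of_San_Salvador)",
}

# Domain inference rules: verb observed in the text -> effect on the target.
INFERENCE = {"blew_up": "destroyed"}


def merge(a, b):
    """Fill each empty slot of a with the value from b (first value wins)."""
    return {slot: a[slot] if a[slot] is not None else b[slot] for slot in a}


def infer_effect(event, verbs):
    """Fill physical_effect from the first verb that has an inference rule."""
    if event["physical_effect"] is None:
        for verb in verbs:
            if verb in INFERENCE:
                return {**event, "physical_effect": INFERENCE[verb]}
    return event


merged = merge(first, second)
final = infer_effect(merged, verbs=["went_off", "blew_up"])
```

A real system would also have to decide whether two forms describe the same event before merging them; that check is left out here.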

Output template generation

• Mapping of the extracted pieces onto the desired output format
• Specific inferences:
  − Normalization to predefined values of slots
  − Mandatory slots
  − Extracted information that implies different slot values

Output template generation

• Example

bombing(urban_guerrilla_comandos,destroyed,no_casualties,bomb,today_morning,power_tower(San_Salvador))

Today_morning → March_19
No_casualties = no_injuries_or_death

Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb
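The normalization and mandatory-slot checks of this step can be sketched as a mapping function. Two assumptions are made loudly here: that the relative expression today_morning is resolved against a known document date, and that the set of mandatory slots is the one listed in `MANDATORY`; neither is stated in the original.

```python
# Normalization of extracted values to predefined slot values.
NORMALIZE = {"no_casualties": "no injury or death"}
# Hypothetical choice of mandatory slots.
MANDATORY = ("Incident type", "Date", "Location")


def generate_template(event, document_date):
    """Map an extracted event onto the output slots, normalizing values."""
    template = {
        "Incident type": event["type"],
        # Relative time expressions are replaced by the document date (assumption).
        "Date": document_date if event["time"] == "today_morning" else event["time"],
        "Location": event["location"].replace("_", " "),
        "Perpetrator": event["perpetrator"].replace("_", " "),
        "Effect on physical target": event["physical_effect"],
        "Effect on human target": NORMALIZE.get(event["human_effect"], event["human_effect"]),
        "Instrument": event["instrument"],
    }
    missing = [slot for slot in MANDATORY if not template.get(slot)]
    if missing:
        raise ValueError(f"mandatory slots not filled: {missing}")
    return template


event = {
    "type": "bombing",
    "perpetrator": "urban_guerrilla_comandos",
    "physical_effect": "destroyed",
    "human_effect": "no_casualties",
    "instrument": "bomb",
    "time": "today_morning",
    "location": "San_Salvador",
}
template = generate_template(event, document_date="March 19")
```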


Knowledge specific for IE

Characteristics of IE systems

• Strong dependence on the domain
  − Scenario of extraction
  − Semantics vs. syntax
  − Discourse analysis
• Strong dependence on the text structure
  − Sublanguages
  − Meta-information
• Strong dependence on the output format
  − DBs
  − annotations

Characteristics of IE systems

• Importance of portability and tuning
• Importance of Knowledge Engineering
  − Modularity
  − Basic tasks and specific tasks
  − Use of weak and local knowledge
• Importance of NL resources
  − MDRs, ontologies, general lexicons, specific dictionaries, …

Knowledge resources

• Knowledge more or less stable
  − general lexicon
  − general grammar
  − basic NL processors: segmenters, taggers, parsers, …
• Domain-dependent knowledge
  − domain-specific vocabularies, terminology
  − gazetteers and patterns for NERC
  − IE patterns (knowledge specifically used for IE)

Types of IE patterns

• Viewpoint 1: type of representation
  − rules

np(C-instrument) … vp(go_off) … np(C-time) … “near” np(C-place) “in” np(C-location)
→
Event:INSTRUMENT := C-instrument
Event:DATE := C-time
Event:PHIS_TARGET := C-place
Event:LOCATION := C-location

Types of IE patterns

• Viewpoint 1: type of representation
  − statistical models (BNs, HMMs, ME, Hyperplanes, …)

[Figure: state diagram of an HMM whose states emit words such as "who", "speaker", "appointment", "with", "about", "seminar", "reminder", "dr.", "professor", "robert", "michael", "will" and "received", with transition probabilities 1.0, 1.0, 0.99, 0.76, 0.24, 0.99 and 0.56]

Page 61

Knowledge specific for IE

Types of IE patterns

• Viewpoint 2: type of values extracted
− slot filler extraction patterns (the HMM presented before)

Page 62

Knowledge specific for IE

Types of IE patterns

• Viewpoint 2: type of values extracted
− slot filler extraction patterns (the HMM presented before)
− event extraction patterns (the rule presented before)

np(C-instrument) … vp(go_off) … np(C-time) … “near” np(C-place) “in” np(C-location)
→
Event:INSTRUMENT := C-instrument
Event:DATE := C-time
Event:PHIS_TARGET := C-place
Event:LOCATION := C-location

Page 63

Knowledge specific for IE

Types of IE patterns

• Viewpoint 2: type of values extracted
− slot filler extraction patterns (the HMM presented before)
− relation extraction patterns

np(C-person) … vp(is) pron(C-his) “wife”
→
Married_with:HUSBAND := C-his
Married_with:WIFE := C-person

− event extraction patterns (the rule presented before)

Page 64

Knowledge specific for IE

Types of IE patterns

• Viewpoint 3: number of slot fillers extracted
− single-slot IE patterns (the HMM presented before)
− multi-slot IE patterns (both rules presented before)

Page 65

Summary

• Information Extraction Systems
− Introduction
− Historical framework
− Architecture
− Knowledge specific for IE
− Examples
• Evaluation
• Multilinguality
• Adaptability

Page 66

Examples of IE systems

Methodologies [Turmo, 2002]

System      Reference
LaSIE       Gaizauskas et al., 1995
LaSIE-II    Humphreys et al., 1998
LOLITA      Garigliano et al., 1998
CIRCUS      Lehnert et al., 1991
FASTUS      Hobbs et al., 1993
BADGER      Fisher et al., 1995
HASTEN      Krupka, 1995
PROTEUS     Grishman, 1995
ALEMBIC     Aberdeen et al., 1993
PIE         Lin, 1995
TURBIO      Turmo, 2002
PLUM        Weischedel et al., 1995
IE2         Aone et al., 1998
LOUELLA     Childs et al., 1995
SIFT        Miller et al., 1998

Parsing / Semantics / Discourse columns (cell values only; their assignment to systems is not recoverable from the extraction):
− in-depth understanding
− template merging
− Chunking / Pattern matching / -
− semantic Gramm relations interp interpretation procedures
− Partial parsing / pattern matching
− Pattern matching / template merging / -
− syntactico-semantic parsing

Page 67

Examples of IE systems

Knowledge [Turmo, 2002]

Systems: LaSIE, LaSIE-II, LOLITA, CIRCUS, FASTUS, BADGER, HASTEN, PROTEUS, ALEMBIC, TURBIO, PIE, PLUM, IE2, LOUELLA, SIFT

Parsing / Semantics / Discourse columns (cell values only; their assignment to systems is not recoverable from the extraction):
− Treebank grammar; -expressions; hand-crafted stratified general grammar; general grammar; semantic network
− concept nodes (AutoSlog); hand-crafted IE rules; concept nodes (CRYSTAL); decision trees
− phrasal grammar; E-graphs; IE rules (ExDISCO)
− hand-crafted gram relations; IE rules (EVIUS)
− general grammar; hand-crafted IE rules
− hand-crafted rules
− hand-crafted IE rules; decision trees
− statistical models for syntactic-semantic parsing and coreference resolution, learned from PTB and on-domain annotated texts

Page 68

Examples of IE systems

LaSIE-II system

[Figure: LaSIE-II architecture. Modules: Gazetteer lookup, Sentence splitter, Brill tagger, Tagged morph, Buchart parser, Name matcher, Discourse interpreter, Template writer. Resources: lexicon, gazetteers, stratified grammar, conceptual hierarchy. Outputs: TE, TR, ST.]

Page 69

Examples of IE systems

LaSIE-II system

Preprocessing
• NERC preprocessing via gazetteers and keyword lists
• Root form and inflectional suffix for verbs, nouns and adjectives found in sentences

According_to-adv unofficial-adj source[s]-n , the-det bomb-n – allegedly-adv detonate[ed]-v by-prep urban-adj guerrilla-n commando[s]-n - blow_up-v a-det power_tower-n in-prep the-det northwestern-adj part-n of-prep San Salvador-loc at-prep 0650

Page 70

Examples of IE systems

LaSIE-II system

Syntactico-semantic interpretation
• bottom-up chart parser
• cascade of NERC grammars (e.g. aircraft, person, money, time, timex)

According_to-adv unofficial-adj source[s]-n , the-det bomb-n – allegedly-adv detonate[ed]-v by-prep urban-adj guerrilla-n commando[s]-n - blow_up-v a-det power_tower-n in-prep the-det northwestern part of San Salvador-loc at-prep 0650-time

(NE1 = northwestern part of San Salvador-loc, NE2 = 0650-time)

Page 71

Examples of IE systems

LaSIE-II system

Syntactico-semantic interpretation
• bottom-up chart parser
• cascade of NERC grammars (e.g. aircraft, person, money, time)
• cascade of partial grammars (NPs, PPs, complex NPs, VPs, complex VPs, RelClauses, Sentence)

S(According_to-adv NP(unofficial-adj source[s]-n) , NP(the-det bomb-n) – allegedly-adv VP(detonate[ed]-v) PP(by-prep NP(urban-adj guerrilla-n commando[s]-n)) - VP(blow_up-v) NP(a-det power_tower-n) PP(in-prep NP(the-det NE1-loc)) PP(at-prep NP(NE2-time)))

Page 72

Examples of IE systems

LaSIE-II system

Syntactico-semantic interpretation
• bottom-up chart parser
• cascade of NERC grammars (e.g. aircraft, person, money, time)
• cascade of partial grammars (NPs, PPs, complex NPs, VPs, complex VPs, RelClauses, Sentence)
• QLFs (note: the real implementation of QLFs is not specified)

Event(E1), detonate(E1,Y,X), urban_guerrilla_comando(X), bomb(Y), Event(E2), blow_up(E2,Y,Z), power_tower(Z), location_of(Z,NE1), time_of(E2,NE2)
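Since the slide notes that the real implementation of QLFs is not specified, the following is only one possible encoding: each predicate becomes a tuple, and two small helper functions (both hypothetical names) look up the arguments of an event and the type of a variable. The predicates themselves are copied from the QLF above.

```python
# Predicates copied from the slide's QLF; the tuple encoding is an assumption.
QLF = [
    ("Event", "E1"),
    ("detonate", "E1", "Y", "X"),
    ("urban_guerrilla_comando", "X"),
    ("bomb", "Y"),
    ("Event", "E2"),
    ("blow_up", "E2", "Y", "Z"),
    ("power_tower", "Z"),
    ("location_of", "Z", "NE1"),
    ("time_of", "E2", "NE2"),
]


def args_of(predicate, facts):
    """Return the arguments of the first fact with the given predicate name."""
    for fact in facts:
        if fact[0] == predicate:
            return fact[1:]
    return None


def type_of(variable, facts):
    """Return the unary predicate (other than Event) that describes a variable."""
    for fact in facts:
        if len(fact) == 2 and fact[1] == variable and fact[0] != "Event":
            return fact[0]
    return None


# Which object was blown up, and with what?
event, instrument, target = args_of("blow_up", QLF)
print(type_of(instrument, QLF), type_of(target, QLF))
```

Sharing the variable Y between detonate and blow_up is what lets later stages connect the two events to the same bomb.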

Page 73

Examples of IE systems

LaSIE-II system

Discourse analysis
• Name matcher: matches variants of NEs across the text
• Discourse interpreter:
− adds the QLF representation to a semantic net (links)
− adds presuppositions
− coreference resolution

Event(E1), detonate(E1,Y,X), urban_guerrilla_comando(X), bomb(Y), Event(E2), blow_up(E2,Y,Z), power_tower(Z), location_of(Z,NE1), time_of(E2,NE2)

[Figure: semantic net fragment connecting the QLF to the nodes “bombing event”, “destroy” and “location of event” through one “isa” link and two “implies” links.]

Page 74

Examples of IE systems

LaSIE-II system

Output template generation
• procedure that writes the templates in the desired format

Incident type: bombing
Date: March 19
Location: El Salvador: San Salvador (city)
Perpetrator: urban guerrilla commandos
Physical target: power tower
Human target: -
Effect on physical target: destroyed
Effect on human target: no injury or death
Instrument: bomb

Page 75

Examples of IE systems

PROTEUS system

[Figure: PROTEUS architecture. Modules: Lexical Analyzer, NERC, Partial parsing, Scenario Patterns, Coreference resolution, Discourse Analysis, Output generator. Resources: lexicon, NERC rules, chunk grammar, IE-rules, conceptual hierarchy, inference rules, format rules. Outputs: TE, TR, ST.]

Page 76

Examples of IE systems

PROTEUS system

Preprocessing

According_to-adv unofficial-adj sources-n , the-det bomb-n – allegedly-adv detonated-v by-prep urban-adj guerrilla-n commandos-n - blew_up-v a-det power_tower-n in-prep the-det northwestern part of San Salvador-loc at-prep 0650-time

(NE1 and NE2 mark the two named entities recognized in the sentence.)

Page 77

Examples of IE systems

PROTEUS system

Syntactico-semantic interpretation
• basic VP and NP chunks + head semantics
• semantics refer to types of slot fillers (conceptual hierarchy)

According_to-adv NP(unofficial-adj sources-n-s1) , NP(the-det bomb-n-artifact) – allegedly-adv VP(detonated-v-s3) by-prep NP(urban-adj guerrilla-n commandos-n-person) – VP(blew_up-v-s4) NP(a-det power_tower-n-building) in-prep NP(NE1-location) at-prep NP(NE2-time)

Page 78

Examples of IE systems

PROTEUS system

Syntactico-semantic interpretation
• basic VP and NP chunks + head semantics
• IE-rules for relations (appositions, PP-attachments, limited conjunctions)
− NP(A-person) , B-integer years old , → instance(X,person), name_of(X,A), age_of(X,B)
− NP(A-position) of NP(B-company) → instance(X,person), position_of(X,A), company_of(X,B)

Real implementation as objects:

Class: person
Slot    Value
name    A
age     B
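As a rough illustration of the first relation rule, the sketch below approximates the chunk-level pattern with a regular expression over raw text. This is a deliberate simplification: PROTEUS matches over NP chunks with head semantics, whereas the regex merely treats a run of capitalized words as a person name. The function name, the variable naming scheme (X1, X2, …) and the example sentence are assumptions.

```python
import re

# Stand-in for "NP(A-person) , B-integer years old ,": a run of capitalized
# words, a comma, an integer, and the literal "years old,".
AGE_RULE = re.compile(
    r"(?P<name>[A-Z][a-z]+(?: [A-Z][a-z]+)*), (?P<age>\d+) years old,"
)


def extract_person_age(text):
    """Return instance/name_of/age_of facts for every match of the age rule."""
    facts = []
    for i, m in enumerate(AGE_RULE.finditer(text), start=1):
        x = f"X{i}"  # fresh variable for each matched person
        facts.append(("instance", x, "person"))
        facts.append(("name_of", x, m["name"]))
        facts.append(("age_of", x, int(m["age"])))
    return facts


print(extract_person_age("John Smith, 47 years old, was named president."))
```

The three facts per match correspond to the object view shown on the slide: one object of class person with its name and age slots filled.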

Page 79

Examples of IE systems

PROTEUS system

Syntactico-semantic interpretation
• basic VP and NP chunks + head semantics
• IE-rules for relations (appositions, PP-attachments, limited conjunctions)
• IE-rules for events (PET interface or ExDISCO)
− NP(A-artifact) v-s4 NP(B-building) → instance(E1,s4), instrument_of(E1,A), phisical_target_of(E1,B)

According_to-adv NP(unofficial-adj sources-n-s1) , NP(the-det bomb-n-artifact) – allegedly-adv VP(detonated-v-s3) by-prep NP(urban-adj guerrilla-n commandos-n-person) – VP(blew_up-v-s4) NP(a-det power_tower-n-building) in-prep NP(NE1-location) at-prep NP(NE2-time)

Page 80

Examples of IE systems

PROTEUS system

Discourse analysis
• antecedents are sought in sequential order
• constraints:
− instance of a hyperclass
− same number
− share arguments
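The antecedent search on this slide can be sketched as a scan over earlier mentions that returns the first one satisfying all three constraints. Several details here are assumptions rather than PROTEUS behaviour: the scan runs from the most recent mention backwards, "instance of a hyperclass" is read as "the anaphor's class is the candidate's class or one of its ancestors", and "share arguments" is read as "no argument present in both mentions disagrees". The class hierarchy and the mentions are invented.

```python
# Hypothetical fragment of a conceptual hierarchy (child -> parent).
PARENT = {"company": "organization", "organization": "entity", "person": "entity"}


def is_a(cls, ancestor):
    """True if cls equals ancestor or lies below it in the hierarchy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def find_antecedent(anaphor, previous_mentions):
    """Return the most recent earlier mention compatible with the anaphor."""
    for cand in reversed(previous_mentions):
        if not is_a(cand["class"], anaphor["class"]):
            continue  # anaphor class must be a hyperclass of the candidate
        if cand["number"] != anaphor["number"]:
            continue  # same grammatical number
        shared = set(cand["args"]) & set(anaphor["args"])
        if any(cand["args"][k] != anaphor["args"][k] for k in shared):
            continue  # arguments known for both must agree
        return cand
    return None


mentions = [
    {"text": "Microsoft", "class": "company", "number": "sg", "args": {}},
    {"text": "the analysts", "class": "person", "number": "pl", "args": {}},
]
anaphor = {"text": "the organization", "class": "organization", "number": "sg", "args": {}}
print(find_antecedent(anaphor, mentions)["text"])
```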

Page 81

Examples of IE systems

PROTEUS system

Discourse analysis
• QLFs + inference rules = more complex QLFs
− conversion of date expressions
− inference of slot values from the QLFs already achieved
− inference of events from others explicitly described

Fred, the president of Cuban Cigar Corp., was appointed vice president of Microsoft
implies
Fred left the Cuban Cigar Corp.
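The Fred example can be expressed as a single inference rule over extracted facts: a person who holds a position at one company and is appointed to a position at a different company is inferred to have left the first. The sketch below assumes a tuple encoding and the predicate names `position_at`, `appointed` and `left`, none of which come from PROTEUS.

```python
def infer_departures(facts):
    """Infer ("left", person, company) facts from appointments elsewhere."""
    inferred = []
    for fact in facts:
        if fact[0] != "appointed":
            continue
        _, person, _new_position, new_company = fact
        for other in facts:
            # An earlier position of the same person at a different company.
            if (other[0] == "position_at" and other[1] == person
                    and other[3] != new_company):
                inferred.append(("left", person, other[3]))
    return inferred


facts = [
    ("position_at", "Fred", "president", "Cuban Cigar Corp."),
    ("appointed", "Fred", "vice president", "Microsoft"),
]
print(infer_departures(facts))
```

Adding the inferred facts back to the original ones yields the "more complex QLFs" the slide refers to.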

Page 82

Examples of IE systems

PROTEUS system

Output template generation
• use of rules to build the templates in the desired format

Page 83

Examples of IE systems

IE2 system

[Figure: IE2 architecture. Modules: NetOwl Extractor 3.0, Custom NameTag, PhraseTag, EventTag, Discourse Module, TempGen. Knowledge: hand-crafted rules, decision tree. Outputs: TE, TR, ST.]

Page 84

Examples of IE systems

IE2 system

Preprocessing
• only NERC
• SGML-tagged
• general NE types and subtypes
• restricted-domain NE types and subtypes

<person id=1>Jeff Bantle</person>, <entity id=2>NASA</entity>’s mission operations directorate representative for the shuttle flight
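To show how later modules could consume this SGML-tagged output, here is a small sketch that reads the entity tags back into a table keyed by id. It relies only on the tag shape visible in the example (`<type id=N>text</type>`); the regular expression and the function name are assumptions and do not describe NetOwl's actual output format beyond that example.

```python
import re

# Matches <type id=N>text</type>, as in the slide's example.
NE_TAG = re.compile(r"<(?P<type>\w+) id=(?P<id>\d+)>(?P<text>.*?)</(?P=type)>")


def read_entities(sgml):
    """Map each entity id to its (type, text) pair."""
    return {int(m["id"]): (m["type"], m["text"]) for m in NE_TAG.finditer(sgml)}


tagged = (
    "<person id=1>Jeff Bantle</person>, <entity id=2>NASA</entity>’s mission "
    "operations directorate representative for the shuttle flight"
)
print(read_entities(tagged))
```

The ids are what the subsequent stages refer to, for instance in attributes such as affil=2 or ref=1.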

Page 85

Examples of IE systems

IE2 system

Syntactico-semantic interpretation
• SGML-tagging of phrases that are values of slots
• NPs denoting persons (PNP), organizations (ENP), artifacts (ANP), …
• local links (location-of, employee-of, owner-of, …)

<person id=1>Jeff Bantle</person>, <PNP affil=2><entity id=2>NASA</entity>’s mission operations directorate representative for the shuttle flight</PNP>

Page 86

Examples of IE systems

IE2 system

Syntactico-semantic interpretation
• SGML-tagging of phrases that are values of slots in templates
• NPs
• local semantic relations (employee-of, location-of, product-of, …)
• event IE-rules (note: the real implementation is not specified)
− $Vehicle + LaunchN → launch_event::vehicle_info := $Vehicle

<launch_event id=2 vehicle_info=1><ANP> The <vehicle id=1>Ariane 5</vehicle> launch </ANP> was successfully achieved at 6am

Page 87

Examples of IE systems

IE2 system

Discourse analysis
• Three coreference resolution methods
− Rule based
− Machine learning based
− Hybrid
• Name alias resolution in addition to that performed by NetOwl
• Definite NPs
• Singular personal pronouns

<person id=1>Jeff Bantle</person>, <PNP ref=1 affil=2><entity id=2>NASA</entity>’s mission operations directorate representative for the shuttle flight</PNP>

Page 88

Examples of IE systems

IE2 system

Output template generation
• Translates SGML output into templates in the desired format
• Resolves and normalizes time expressions
• Performs event merging

Page 89

Examples of IE systems

SIFT system

[Figure: SIFT architecture. Modules: IdentiFinder(TM), Sentence level, Cross-sentence level, Output generator. Knowledge: statistical models. Outputs: TE, TR.]

Page 90

Examples of IE systems

SIFT system

Preprocessing
• NERC using an HMM [Bikel et al. 97] + Viterbi, maximizing Pr(W,F,C)
• each word is tagged with one NE class

[Figure: HMM with the states start-sentence, person, organization, location, not-a-name and end-sentence.]
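The decoding step can be illustrated with a plain first-order Viterbi search over name classes. This is a much simplified stand-in for the model cited on the slide: it ignores word features and conditions each word only on its class, and every probability below is invented for the example. The `unk` floor for unseen words is also an assumption.

```python
import math


def viterbi(words, states, start_p, trans_p, emit_p, unk=1e-6):
    """Return the most likely state sequence for words (log-space Viterbi)."""
    # best[s] = (log probability of the best path ending in s, that path)
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s].get(words[0], unk)), [s])
        for s in states
    }
    for word in words[1:]:
        new_best = {}
        for s in states:
            score, path = max(
                (best[p][0] + math.log(trans_p[p][s]), best[p][1]) for p in states
            )
            new_best[s] = (score + math.log(emit_p[s].get(word, unk)), path + [s])
        best = new_best
    return max(best.values())[1]


# Toy model with invented numbers; three of the slide's classes are used.
states = ["person", "organization", "not-a-name"]
start_p = {"person": 0.3, "organization": 0.2, "not-a-name": 0.5}
trans_p = {
    "person": {"person": 0.5, "organization": 0.05, "not-a-name": 0.45},
    "organization": {"person": 0.05, "organization": 0.5, "not-a-name": 0.45},
    "not-a-name": {"person": 0.15, "organization": 0.15, "not-a-name": 0.7},
}
emit_p = {
    "person": {"jeff": 0.4, "bantle": 0.4},
    "organization": {"nasa": 0.8},
    "not-a-name": {"works": 0.3, "at": 0.3},
}

print(viterbi(["jeff", "bantle", "works", "at", "nasa"],
              states, start_p, trans_p, emit_p))
```

Each word receives exactly one class, which matches the second bullet on the slide.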

Page 91

Examples of IE systems

SIFT system

Syntactico-semantic interpretation
• properties of NEs (TE) and relations (TR)
• generative statistical model [Miller et al. 98, 00]
• search for the most likely augmented parse tree (bottom-up, chart based)
• pruning of low-probability constituents

Page 92

Examples of IE systems

SIFT system

Syntactico-semantic interpretation

[Figure: augmented parse tree for the fragment below.]

Words: Nance , a paid consultant to ABC News , …
Tags: per/nnp , det vbn per-desc/nn to org’/nnp org/nnp ,
Constituents: per-r/np, per-desc/np, org-r/np, org-ptr/pp, emp-of/pp-lnk, per-desc-r/np, per/np

Page 93

Examples of IE systems

SIFT system

Syntactico-semantic interpretation
• relations between NEs across sentences
• statistical model [Miller et al. 98]
• classifier of pairs of entities
− entities in different sentences
− entities do not take part in local relations
− their types are compatible with some relation
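The three conditions above define which entity pairs are handed to the cross-sentence classifier. The sketch below only generates those candidate pairs; the classifier itself is not shown. The data layout, the function name and the reading of the second condition (neither entity already participates in a sentence-level relation) are assumptions.

```python
from itertools import combinations


def candidate_pairs(entities, local_relations, compatible_types):
    """Return id pairs that satisfy the three cross-sentence conditions."""
    in_local = {eid for pair in local_relations for eid in pair}
    pairs = []
    for a, b in combinations(entities, 2):
        if a["sent"] == b["sent"]:
            continue  # must come from different sentences
        if a["id"] in in_local or b["id"] in in_local:
            continue  # must not already take part in a local relation
        types = (a["type"], b["type"])
        if types in compatible_types or types[::-1] in compatible_types:
            pairs.append((a["id"], b["id"]))
    return pairs


entities = [
    {"id": 1, "type": "person", "sent": 0},
    {"id": 2, "type": "organization", "sent": 0},
    {"id": 3, "type": "person", "sent": 1},
    {"id": 4, "type": "organization", "sent": 2},
    {"id": 5, "type": "location", "sent": 2},
]
local_relations = [(1, 2)]                      # found at sentence level
compatible_types = {("person", "organization")}  # e.g. an employee-of relation

print(candidate_pairs(entities, local_relations, compatible_types))
```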

Page 94

Examples of IE systems

TURBIO system

[Figure: TURBIO architecture. Modules: Lexical Analyzer, NERC, Partial parsing, controller, IE-Rule set processor, Output generator. Resources: lexicon, NERC rules, partial-tree grammar, IE-rule sets, IE-rule set scheduling. Outputs: TE, TR.]

Page 95

Examples of IE systems

TURBIO system

Preprocessing
• WordNet synsets, lemmas, POS tags
• NERC
• parsed trees of noun, verbal, and adjectival phrases

Page 96

Examples of IE systems

TURBIO system

Syntactico-semantic interpretation
• Hypothesis: dependence among relations of NEs
• Iterative execution of IE-rule sets depending on the scheduling
• Example:
− Scenario = mushroom parts, their possible colors and the circumstances by which they are produced
− There are colors in the documents that are not related to any mushroom part, but all colors related to a circumstance are colors related to mushroom parts.
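The scheduling idea can be sketched as running rule sets in a fixed order, with each set allowed to read the facts produced by the earlier ones. In the toy version below, the color/circumstance rule set runs first, and the part/color rule set accepts a color only if it has already been linked to a circumstance, mirroring the mushroom example. The rule sets, the fact names and the pre-analysed clauses are all invented; TURBIO's actual rules and scheduler are not shown in the slides.

```python
def color_circumstance_rules(clauses, facts):
    """Extract (color, circumstance) relations from clauses that state both."""
    return {
        ("color_in_circumstance", c["color"], c["circumstance"])
        for c in clauses
        if "color" in c and "circumstance" in c
    }


def part_color_rules(clauses, facts):
    """Extract (part, color) relations, trusting only colors already linked
    to a circumstance by an earlier rule set."""
    known = {f[1] for f in facts if f[0] == "color_in_circumstance"}
    return {
        ("color_of_part", c["part"], c["color"])
        for c in clauses
        if "part" in c and c.get("color") in known
    }


def run_schedule(schedule, clauses):
    """Run the rule sets in order, accumulating the extracted facts."""
    facts = set()
    for rule_set in schedule:
        facts |= rule_set(clauses, facts)
    return facts


clauses = [
    {"color": "blue", "circumstance": "when bruised"},
    {"part": "cap", "color": "blue"},
    {"part": "stem", "color": "green"},  # e.g. green moss around the stem
]

print(run_schedule([color_circumstance_rules, part_color_rules], clauses))
```

Reversing the schedule loses the part/color relation, because the second rule set then has no circumstance-linked colors to rely on; this is the dependence among relations that the hypothesis refers to.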