Information Extraction and Named Entity Recognition-2



  • 7/22/2019 Information Extraction and Named Entity Recognition-2

Information Extraction and Named Entity Recognition

Introducing the tasks: Getting simple structured information out of text


    Christopher Manning

    Information Extraction

Information extraction (IE) systems:

Find and understand limited relevant parts of texts

Gather information from many pieces of text

Produce a structured representation of relevant information: relations (in the database sense), a.k.a. a knowledge base

Goals:

1. Organize information so that it is useful to people

2. Put information in a semantically precise form that allows inferences to be made by computer algorithms


    Information Extraction (IE)

IE systems extract clear, factual information. Roughly: Who did what to whom when?

E.g., gathering earnings, profits, board members, headquarters, etc. from company reports:

The headquarters of BHP Billiton Limited, and the global headquarters of the combined BHP Billiton Group, are located in Melbourne, Australia.

headquarters(BHP Billiton Limited, Melbourne, Australia)

Learn drug-gene product interactions from medical research literature


    Low-level information extraction

Is now available (and I think popular) in applications like Apple or Google mail, and web indexing

Often seems to be based on regular expressions and name lists
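A minimal sketch of the kind of regex-driven extraction such applications perform; the patterns below are illustrative assumptions of mine, not any particular product's rules:

```python
import re

# Toy low-level extractors: pull date and time mentions out of an email
# body with regular expressions, as mail clients do when they offer
# "add to calendar" actions.
DATE = re.compile(
    r"\b(?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day, "
    r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\w* \d{1,2}\b")
TIME = re.compile(r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b", re.IGNORECASE)

def extract_low_level(text):
    return {"dates": DATE.findall(text), "times": TIME.findall(text)}

msg = "Dinner on Saturday, February 27 at 7:30pm - reply by 10:00 AM."
print(extract_low_level(msg))
# {'dates': ['Saturday, February 27'], 'times': ['7:30pm', '10:00 AM']}
```

Patterns like these are brittle (they miss "Feb 27th", "19:30", etc.), which is exactly why this is "low-level" extraction.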


    Low-level information extraction


    Why is IE hard on the web?

[Figure: an annotated product web page. We need specific fields from it, e.g. the price and the title, and must tell that the item is a book, not a toy.]


How is IE useful? Classified Advertisements (Real Estate)

Background: Plain text advertisements

Lowest common denominator: only thing that 70+ newspapers using many different publishing systems can all handle

[Figure: a plain-text real estate ad, dated March 02, 1998]



Why doesn't text search (IR) work?

What you search for in real estate advertisements:

Town/suburb. You might think easy, but:

Real estate agents: Coldwell Banker, Mosman

Phrases: Only 45 minutes from Parramatta

Multiple property ads have different suburbs in one ad

Money: want a range not a textual match

Multiple amounts: was $155K, now $145K

Variations: offers in the high 700s [but not rents for $270]

Bedrooms: similar issues: br, bdr, beds, B/R
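These variations are exactly what a small normalization layer handles. A sketch, with patterns that are illustrative and tuned only to the examples above:

```python
import re

# Normalize two of the problem fields above: bedroom counts written as
# br/bdr/beds/B/R, and prices written with a K (thousands) suffix.
BEDS = re.compile(r"\b(\d+)\s*(?:br|bdr|beds?|b/r|bedrooms?)\b", re.IGNORECASE)
PRICE = re.compile(r"\$\s?(\d+(?:,\d{3})*)\s?(K?)\b", re.IGNORECASE)

def parse_ad(text):
    beds = BEDS.search(text)
    prices = [int(num.replace(",", "")) * (1000 if k else 1)
              for num, k in PRICE.findall(text)]
    return {"bedrooms": int(beds.group(1)) if beds else None,
            "prices": prices}

print(parse_ad("Charming 3 bdr cottage, was $155K, now $145K"))
# {'bedrooms': 3, 'prices': [155000, 145000]}
```

With numeric prices extracted, range queries ("under $150K") become possible where plain text match would fail.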


    Named Entity Recognition (NER)

A very important sub-task: find and classify names in text, for example:

The decision by the independent MP Andrew Wilkie to withdraw his support for the minority Labor government sounded dramatic but it should not further threaten its stability. When, after the 2010 election, Wilkie, Rob Oakeshott, Tony Windsor and the Greens agreed to support Labor, they gave just two guarantees: confidence and supply.



    Named Entity Recognition (NER)

The uses:

Named entities can be indexed, linked off, etc.

Sentiment can be attributed to companies or products

A lot of IE relations are associations between named entities

For question answering, answers are often named entities.

Concretely:

Many web pages tag various entities, with links to bio or topic pages

Reuters' OpenCalais, Evri, AlchemyAPI, Yahoo's Term Extraction, ...

Apple/Google/Microsoft/... smart recognizers for document content


Evaluation of Named Entity Recognition

Precision, Recall, and the F measure; their extension to sequences


    The 2-by-2 contingency table

                  correct    not correct
    selected      tp         fp
    not selected  fn         tn


    Precision and recall

Precision: % of selected items that are correct
Recall: % of correct items that are selected

                  correct    not correct
    selected      tp         fp
    not selected  fn         tn


    A combined measure: F

A combined measure that assesses the P/R tradeoff is F measure (weighted harmonic mean):

F = 1 / (α(1/P) + (1 − α)(1/R)) = ((β² + 1)PR) / (β²P + R)

The harmonic mean is a very conservative average (see Sec. 8.3)

People usually use balanced F1 measure, i.e., with β = 1 (that is, α = 1/2): F = 2PR/(P + R)


    Quiz question

    What is the F1?

    P= 40% R= 40% F1 =



    Quiz question

    What is the F1?

    P= 75% R= 25% F1 =
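The two quiz answers can be checked directly from the formula in a few lines:

```python
def f_measure(p, r, beta=1.0):
    # Weighted harmonic mean of precision and recall:
    # F = (beta^2 + 1) * P * R / (beta^2 * P + R)
    return (beta ** 2 + 1) * p * r / (beta ** 2 * p + r)

print(round(f_measure(0.40, 0.40), 4))  # 0.4   -> F1 = 40%
print(round(f_measure(0.75, 0.25), 4))  # 0.375 -> F1 = 37.5%
# Note how conservative the harmonic mean is: the arithmetic mean of
# 75% and 25% would be 50%, but F1 stays close to the lower value.
```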



    The Named Entity Recognition Task

    Task: Predict entities in a text

    Foreign      ORG
    Ministry     ORG
    spokesman    O
    Shen         PER
    Guofang      PER
    told         O
    Reuters      ORG
    :            :

Standard evaluation is per entity, not per token


    Precision/Recall/F1 for IE/NER

Recall and precision are straightforward for tasks like text categorization, where there is only one grain size (documents)

The measure behaves a bit funnily for IE/NER when there are boundary errors (which are common):

First Bank of Chicago announced earnings

This counts as both a fp and a fn

Selecting nothing would have been better

Some other metrics (e.g., MUC scorer) give partial credit (according to complex rules)
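A sketch of per-entity exact-match scoring that shows why a boundary error is doubly penalized; entities are represented here as (start, end, type) triples, a choice of mine rather than a standard scorer's format:

```python
def entity_prf(gold, predicted):
    """Exact-match, per-entity scoring: an entity counts as correct only
    if both its boundaries and its type are exactly right."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1, fp, fn

# Gold: "First Bank of Chicago" (tokens 0-3) is an ORG.
gold = {(0, 3, "ORG")}
# The system found only "Bank of Chicago" (tokens 1-3): a boundary error.
pred = {(1, 3, "ORG")}
print(entity_prf(gold, pred))  # (0.0, 0.0, 0.0, 1, 1)
```

The boundary error yields one fp and one fn; predicting nothing at all would have produced only the fn.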


Methods for doing NER and IE

The space; hand-written patterns


    Three standard approaches to NER

1. Hand-written regular expressions

   Perhaps stacked

2. Using classifiers

   Generative: Naïve Bayes

   Discriminative: Maxent models

3. Sequence models

   HMMs

   CMMs/MEMMs

   CRFs


Hand-written Patterns for Information Extraction

If extracting from automatically generated web pages, simple regex patterns usually work.

    Amazon page


    MUC: the NLP genesis of IE

DARPA funded significant efforts in IE in the early to mid 1990s

Message Understanding Conference (MUC) was an annual event/competition where results were presented.

    Focused on extracting information from news articles

    Terrorist events

    Industrial joint ventures

    Company management changes

    Starting off, all rule-based, gradually moved to ML



MUC Information Extraction Example

Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a local concern and a Japanese trading house to produce golf clubs to be supplied to Japan. The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new Taiwan dollars, will start production in January 1990 with production of 20,000 iron and metal wood clubs a month.

JOINT-VENTURE-1
  Relationship: TIE-UP
  Entities: Bridgestone Sports Co., a local concern, a Japanese trading house
  Joint Ent: Bridgestone Sports Taiwan Co.
  Activity: ACTIVITY-1
  Amount: NT$20,000,000

ACTIVITY-1
  Activity: PRODUCTION
  Company: Bridgestone Sports Taiwan Co.
  Product: iron and metal wood clubs
  Start date: DURING: January 1990


Natural Language Processing-based Hand-written Information Extraction

For unstructured human-written text, some NLP may help:

Part-of-speech (POS) tagging

Mark each word as a noun, verb, preposition, etc.

Syntactic parsing

Identify phrases: NP, VP, PP

Semantic word categories (e.g. from WordNet)

KILL: kill, murder, assassinate, strangle, suffocate


[Figure: a finite automaton over states 0-4 with transitions labeled PN, Art, N, P, and 's, recognizing noun groups such as "John's interesting book with a nice cover"]

Grep++ = Cascaded grepping


Natural Language Processing-based Hand-written Information Extraction

We use cascaded regular expressions to match relations

Higher-level regular expressions can use categories matched by lower-level expressions

E.g. the CRIME-VICTIM pattern can use things matching NOUN-GROUP

This was the basis of the SRI FASTUS system in later MUCs

Example extraction pattern:

Crime victim:
  Prefiller: [POS: V, Hypernym: KILL]
  Filler: [Phrase: NOUN-GROUP]


    Rule-based Extraction Examples

    Determining which person holds what office in what organization

    [person] , [office] of [org]

    Vuk Draskovic, leader of the Serbian Renewal Movement

    [org] (named, appointed, etc.) [person] Prep [office]

    NATO appointed Wesley Clark as Commander in Chief

    Determining where an organization is located

[org] in [loc]

    NATO headquarters in Brussels

    [org] [loc] (division, branch, headquarters, etc.)

    KFOR Kosovo headquarters
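A hedged sketch of the first pattern, "[person], [office] of [org]". Here the entity slots are crudely approximated by runs of capitalized words; a real system would match over NER output instead:

```python
import re

# Capitalized-word runs stand in for NER spans in this toy pattern.
ENT = r"((?:[A-Z][\w-]*\s?)+)"
PERSON_OFFICE_ORG = re.compile(
    ENT + r",\s+(\w[\w\s]*?)\s+of\s+(?:the\s+)?" + ENT)

m = PERSON_OFFICE_ORG.search(
    "Vuk Draskovic, leader of the Serbian Renewal Movement")
person, office, org = (g.strip() for g in m.groups())
print((person, office, org))
# ('Vuk Draskovic', 'leader', 'Serbian Renewal Movement')
```

This illustrates why such rules are written over entity categories rather than raw words: the capitalization heuristic fails as soon as an office name is itself capitalized.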


Information Extraction as Classification


Naïve use of text classification for IE

Use conventional classification algorithms to classify substrings of a document as "to be extracted" or not.

In some simple but compelling domains, this naïve technique is remarkably effective. But do think about when it would and wouldn't work!


    Change of Address email



Change-of-Address detection [Kushmerick et al., ATEM 2001]

From: Robert Kubinsky  Subject: Email update

Hi everyone so.... My new email address is: [email protected] Hope a...

[Figure: a two-stage pipeline. 1. Classification: a message-level naïve Bayes model decides whether the email is a change-of-address message (Yes/No). 2. Extraction: an address-level naïve Bayes model scores each address in the message, e.g. P[[email protected]].]


Change-of-Address detection results [Kushmerick et al., ATEM 2001]

Corpus of 36 CoA emails and 5720 non-CoA emails

Results from 2-fold cross validation (train on half, test on other half)

Very skewed distribution intended to be realistic

Note very limited training data: only 18 training CoA messages

36 CoA messages have 86 email addresses: old, new, and misc.

                            P      R      F1
    Message classification  98%    97%    98%
    Address classification  98%    68%    80%


Sequence Models for Named Entity Recognition


    The ML sequence model approach

    Training

    1. Collect a set of representative training documents

    2. Label each token for its entity class or other (O)

    3. Design feature extractors appropriate to the text and classes

    4. Train a sequence classifier to predict the labels from the data

    Testing

    1. Receive a set of testing documents

    2. Run sequence model inference to label each token

    3. Appropriately output the recognized entities



Encoding classes for sequence labeling

               IO encoding    IOB encoding
    Fred       PER            B-PER
    showed     O              O
    Sue        PER            B-PER
    Mengqiu    PER            B-PER
    Huang      PER            I-PER
    's         O              O
    new        O              O
    painting   O              O
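A minimal decoder from IOB tags back to entity spans (a helper of mine, not from the lecture), which shows why IOB can separate the adjacent "Sue" and "Mengqiu Huang" where plain IO encoding cannot:

```python
def iob_to_entities(tokens, tags):
    """Decode IOB tags into (entity_text, type) pairs. A B- tag always
    starts a new entity, so two adjacent entities stay distinct."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current: entities.append(current)
            current = ([token], tag[2:])
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(token)
        else:  # "O" (or an ill-formed I- tag, treated as outside)
            if current: entities.append(current)
            current = None
    if current: entities.append(current)
    return [(" ".join(words), etype) for words, etype in entities]

tokens = ["Fred", "showed", "Sue", "Mengqiu", "Huang", "'s", "new", "painting"]
tags   = ["B-PER", "O", "B-PER", "B-PER", "I-PER", "O", "O", "O"]
print(iob_to_entities(tokens, tags))
# [('Fred', 'PER'), ('Sue', 'PER'), ('Mengqiu Huang', 'PER')]
```

Under IO encoding the same sentence would tag Sue, Mengqiu, and Huang all PER, and the decoder could only produce one three-word entity.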


    Features for sequence labeling

    Words

    Current word (essentially like a learned dictionary)

    Previous/next word (context)

    Other kinds of inferred linguistic classification

    Part-of-speech tags

    Label context

    Previous (and perhaps next) label



Features: Word substrings

[Figure: counts of words containing particular substrings, broken down by class (drug, company, movie, place, person), for examples like Cotrimoxazole, Wethersfield, and "Alien Fury: Countdown to Invasion": the substring "oxa" occurs essentially only in drug names, ":" mostly in movie titles, and "field" mostly in place names.]


    Features: Word shapes

Word Shapes

Map words to simplified representation that encodes attributes such as length, capitalization, numerals, Greek letters, internal punctuation, etc.

    Varicella-zoster   Xx-xxx
    mRNA               xXXX
    CPA1               XXXd
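One simple variant of a word-shape function. It collapses repeated character classes in longer words, so it gives "Xx-x" rather than the slide's "Xx-xxx" for Varicella-zoster; real systems such as Stanford NER use fancier shape functions:

```python
def word_shape(word, short=4):
    # X = uppercase, x = lowercase, d = digit; other chars kept as-is.
    classes = ["X" if c.isupper() else "x" if c.islower()
               else "d" if c.isdigit() else c for c in word]
    if len(classes) <= short:          # short words keep their full shape
        return "".join(classes)
    out = [classes[0]]                  # longer words: collapse class runs
    for c in classes[1:]:
        if c != out[-1]:
            out.append(c)
    return "".join(out)

print(word_shape("mRNA"))              # xXXX
print(word_shape("CPA1"))              # XXXd
print(word_shape("Varicella-zoster"))  # Xx-x
```

Shapes let the model generalize from seen drug names like CPA1 to unseen ones with the same XXXd pattern.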


Maximum entropy sequence models

Maximum entropy Markov models (MEMMs), a.k.a. Conditional Markov models


    Sequence problems

Many problems in NLP have data which is a sequence of characters, words, phrases, lines, or sentences

We can think of our task as one of labeling each item:

    VBG     NN          IN  DT  NN   IN  NN
    Chasing opportunity in  an  age  of  upheaval
    (POS tagging)

    B B I I B I
    (Word segmentation)

    PERS    O         O      O  ORG  ORG
    Murdoch discusses future of News Corp.
    (Named entity recognition)


    MEMM inference in systems

For a Conditional Markov Model (CMM), a.k.a. a Maximum Entropy Markov Model (MEMM), the classifier makes a single decision at a time, conditioned on evidence from observations and previous decisions

A larger space of sequences is usually explored via search

Local context (decision point at position 0):

    Position:  -3   -2   -1    0     +1
    Tag:       DT   NNP  VBD   ???   ???
    Word:      The  Dow  fell  22.6  %

Features: W0, W+1, W-1, T-1, T-1-T-2, hasDigit?, ...

(Ratnaparkhi 1996; Toutanova et al. 2003, etc.)


    Example: POS Tagging

Scoring individual labeling decisions is no more complex than standard classification decisions

We have some assumed labels to use for prior positions

We use features of those and the observed data (which can include current, previous, and next words) to predict the current label


    Example: POS Tagging

Features can include:

Current, previous, next words in isolation or together.

Previous one, two, three tags.

Word-internal features: word types, suffixes, dashes, etc.


    Inference in Systems

[Figure: at the local level, local data goes through feature extraction to features, and then into a maximum entropy model (the classifier type), trained with conjugate gradient optimization and smoothed with quadratic penalties; at the sequence level, these local label decisions are combined with the sequence data into a sequence model.]


    Greedy Inference

Greedy inference:

We just start at the left, and use our classifier at each position to assign a label

The classifier can depend on previous labeling decisions as well as observed data

Advantages:

Fast, no extra memory requirements

Very easy to implement

With rich features including observations to the right, it may perform quite well

Disadvantage:

Greedy. We commit to errors that we cannot recover from
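A sketch of greedy decoding; the score function here is a made-up stand-in for the trained classifier's log-probabilities:

```python
def greedy_decode(tokens, labels, score):
    """score(i, prev_label, label) -> local score of `label` at position i,
    given the label already committed at position i-1."""
    seq = []
    for i in range(len(tokens)):
        prev = seq[-1] if seq else "<s>"
        seq.append(max(labels, key=lambda y: score(i, prev, y)))
    return seq

# A toy score table where the greedy choice at position 0 blocks a much
# better continuation: greedy commits to "A" and cannot recover.
s = {(0, "<s>", "A"): 1.0, (0, "<s>", "B"): 0.9,
     (1, "A", "A"): 0.0, (1, "A", "B"): 0.0,
     (1, "B", "A"): 2.0, (1, "B", "B"): 0.0}
print(greedy_decode(["w1", "w2"], ["A", "B"], lambda i, p, y: s[(i, p, y)]))
# ['A', 'A']  (total score 1.0, although ['B', 'A'] would score 2.9)
```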


    Beam Inference

Beam inference:

At each position keep the top k complete sequences.

Extend each sequence in each local way.

The extensions compete for the k slots at the next position.

Advantages:

Fast; beam sizes of 3-5 are almost as good as exact inference in many cases.

Easy to implement (no dynamic programming required).

Disadvantage:

Inexact: the globally best sequence can fall off the beam.
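A sketch of beam inference, with the same kind of made-up local scoring interface a trained classifier would provide:

```python
def beam_decode(tokens, labels, score, k=3):
    """Keep the k best partial label sequences at each position."""
    beam = [((), 0.0)]  # (label sequence, cumulative score)
    for i in range(len(tokens)):
        candidates = [(seq + (y,),
                       total + score(i, seq[-1] if seq else "<s>", y))
                      for seq, total in beam for y in labels]
        beam = sorted(candidates, key=lambda c: c[1], reverse=True)[:k]
    return list(beam[0][0])

# With beam size 1 this degenerates to greedy decoding; with a beam of 2
# the globally best sequence for this toy score table survives.
s = {(0, "<s>", "A"): 1.0, (0, "<s>", "B"): 0.9,
     (1, "A", "A"): 0.0, (1, "A", "B"): 0.0,
     (1, "B", "A"): 2.0, (1, "B", "B"): 0.0}
score = lambda i, p, y: s[(i, p, y)]
print(beam_decode(["w1", "w2"], ["A", "B"], score, k=1))  # ['A', 'A']
print(beam_decode(["w1", "w2"], ["A", "B"], score, k=2))  # ['B', 'A']
```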


    Viterbi Inference

Viterbi inference:

Dynamic programming or memoization.

Requires small window of state influence (e.g., past two states are relevant).

Advantage:

Exact: the global best sequence is returned.

Disadvantage:

Harder to implement long-distance state-state interactions (but beam inference tends to allow long-distance resurrection of sequences anyway).
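A compact Viterbi sketch for the first-order case, where the score depends only on the previous label, matching the "small window of state influence" requirement; the score table is the same kind of toy stand-in as above:

```python
def viterbi_decode(tokens, labels, score):
    n = len(tokens)
    # delta[y]: best score of any labeling of tokens[:i+1] that ends in y
    delta = {y: score(0, "<s>", y) for y in labels}
    backptrs = []
    for i in range(1, n):
        new_delta, ptr = {}, {}
        for y in labels:
            prev = max(labels, key=lambda p: delta[p] + score(i, p, y))
            ptr[y] = prev
            new_delta[y] = delta[prev] + score(i, prev, y)
        backptrs.append(ptr)
        delta = new_delta
    y = max(labels, key=lambda l: delta[l])  # best final label
    seq = [y]
    for ptr in reversed(backptrs):           # follow backpointers
        y = ptr[y]
        seq.append(y)
    return seq[::-1]

# A toy score table on which greedy decoding would fail: Viterbi
# returns the exact global optimum.
s = {(0, "<s>", "A"): 1.0, (0, "<s>", "B"): 0.9,
     (1, "A", "A"): 0.0, (1, "A", "B"): 0.0,
     (1, "B", "A"): 2.0, (1, "B", "B"): 0.0}
print(viterbi_decode(["w1", "w2"], ["A", "B"], lambda i, p, y: s[(i, p, y)]))
# ['B', 'A']
```

The cost is O(n · |labels|²) time, versus O(n · |labels|) for greedy decoding.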


    Viterbi Inference: J&M Ch. 6

I'm punting on this; read J&M Ch. 5/6.

I'll do dynamic programming for parsing.

Basically, providing you only look at neighboring states, you can dynamic program a search for the optimal state sequence.


    Viterbi Inference: J&M Ch. 5/6



CRFs [Lafferty, Pereira, and McCallum 2001]

Another sequence model: Conditional Random Fields (CRFs)

A whole-sequence conditional model rather than a chaining of local models

The space of c's is now the space of sequences, but if the features fi remain local, the conditional sequence likelihood can be calculated using dynamic programming

Training is slower, but CRFs avoid causal-competition biases

These (or a variant using a max margin criterion) are seen as the state of the art these days, but in practice usually work much the same as MEMMs

P(c|d) = exp(Σi λi fi(c,d)) / Σc′ exp(Σi λi fi(c′,d))
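The whole-sequence normalization can be seen by computing it by brute force; the scores here are a made-up toy, and real CRFs compute the denominator with dynamic programming rather than enumeration:

```python
from itertools import product
from math import exp

def crf_prob(labels, n, score, target):
    """P(c|d) = exp(total score of c) / sum over all c' of exp(total
    score of c'), with the partition function computed by enumerating
    every length-n label sequence c'."""
    def total(seq):  # sum of local (prev label, label) feature scores
        return sum(score(i, seq[i - 1] if i else "<s>", seq[i])
                   for i in range(n))
    z = sum(exp(total(seq)) for seq in product(labels, repeat=n))
    return exp(total(target)) / z

s = {(0, "<s>", "A"): 1.0, (0, "<s>", "B"): 0.9,
     (1, "A", "A"): 0.0, (1, "A", "B"): 0.0,
     (1, "B", "A"): 2.0, (1, "B", "B"): 0.0}
score = lambda i, p, y: s[(i, p, y)]
probs = {c: crf_prob(("A", "B"), 2, score, c)
         for c in product(("A", "B"), repeat=2)}
print(probs)  # probabilities over all 4 sequences; they sum to 1
```

Because sequences compete against each other globally (one softmax over all of c'), there is no per-position normalization, which is what lets CRFs avoid the causal-competition (label bias) problem of chained local models.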


The Full Task of Information Extraction


The Full Task of Information Extraction

Information Extraction = segmentation + classification + association

As a family of techniques:

For years, Microsoft Corporation CEO Bill Gates railed against the economic philosophy of open-source software with Orwellian fervor, denouncing its communal licensing as a "cancer" that stifled technological innovation. Now Gates himself says Microsoft will gladly disclose its crown jewels, the coveted code behind the Windows operating system, to select customers. "We can be open source. We love the concept of shared source," said Bill Veghte, a Microsoft VP. "That's a super-important shift for us in terms of code access." Richard Stallman, founder of the Free Software Foundation, countered saying...

Extracted segments: Microsoft Corporation, CEO, Bill Gates, Gates, Microsoft, Bill Veghte, Microsoft, VP, Richard Stallman, founder, Free Software Foundation

Slide by Andrew McCallum. Used with permission.


    An Even Broader View

[Figure: an IE pipeline. Spider a document collection, filter by relevance, then run IE (segment, classify, associate, cluster) and load the results into a database for query and search; supporting steps include creating an ontology, labeling training data, and training extraction models.]


Landscape of IE Tasks: Document Formatting

Text paragraphs without formatting:

Astro Teller is the CEO and co-founder of BodyMedia. Astro holds a Ph.D. in Artificial Intelligence from Carnegie Mellon University, where he was inducted as a national Hertz fellow. His M.S. in symbolic and heuristic computation and B.S. in computer science are from Stanford University.

Grammatical sentences and some formatting & links

Non-grammatical snippets, rich formatting & links

Tables


Landscape of IE Tasks: Intended Breadth of Coverage

Web site specific (Formatting): Amazon.com book pages

Genre specific (Layout): Resumes

Wide, non-specific (Language): University names

Landscape of IE Tasks: Complexity of entities/relations

Closed set (e.g., U.S. states):

    He was born in Alabama... The big Wyoming sky...

Regular set (e.g., U.S. phone numbers):

    Phone: (413) 545-13...

Complex pattern (e.g., U.S. postal addresses):

    University of Arkansas, P.O. Box 140, Hope, AR 71802
    Headquarters: 1128 Main Street, 4th Floor, Cincinnati, Ohio 45210

Ambiguous patterns, needing context and many sources of evidence (e.g., person names):

    ...was among the six sold by Hope Feldman...
    Pawel Opalinski, Software Engineer at WhizBang
    The CALD main office is 412-... 1299

Landscape of IE Tasks: Arity of relation

Jack Welch will retire as CEO of General Electric tomorrow. The top role at the Connecticut company will be filled by Jeffrey Immelt.

Single entity (named entity extraction):

    Person: Jack Welch
    Person: Jeffrey Immelt
    Location: Connecticut

Binary relationship:

    Relation: Person-Title
    Person: Jack Welch
    Title: CEO

    Relation: Company-Location
    Company: General Electric
    Location: Connecticut

N-ary relation:

    Relation: Succession
    Company: General Electric
    Title: CEO
    Out: Jack Welch
    In: Jeffrey Immelt


Association task = Relation Extraction

Checking if groupings of entities are instances of a relation

1. Manually engineered rules

Rules defined over words/entities: "located in"


Relation Extraction: Disease Outbreaks

May 19 1995, Atlanta -- The Centers for Disease Control and Prevention, which is in the front line of the world's response to the deadly Ebola epidemic in Zaire, is finding itself hard pressed to cope with the crisis...

Information Extraction System output:

    Date        Disease Name
    Jan. 1995   Malaria
    July 1995   Mad Cow Disease
    Feb. 1995   Pneumonia
    May 1995    Ebola


Relation Extraction: Protein Interactions

We show that CBF-A and CBF-C interact with each other to form a CBF-A-CBF-C complex and that CBF-B does not interact with CBF-A or CBF-C individually but that it associates with the CBF-A-CBF-C complex.

[Diagram: CBF-A interacts with CBF-C, forming the CBF-A-CBF-C complex; CBF-B associates with the CBF-A-CBF-C complex]

Binary Relation Association as Binary Classification

Christos Faloutsos (Person) conferred with Ted Senator (Person), the KDD 2003 General Chair (Role).

Person-Role (Christos Faloutsos, KDD 2003 General Chair) → NO
Person-Role (Ted Senator, KDD 2003 General Chair) → YES

Resolving coreference (both within and across documents)

John Fitzgerald Kennedy was born at 83 Beals Street in Brookline, Massachusetts on May 29, 1917, at 3:00 pm,[7] the second son of Joseph P. Kennedy, Sr., and Rose Fitzgerald; who, in turn, was the eldest child of John "Honey Fitz" Fitzgerald, a prominent Boston politician who was the city's mayor and a three-term member of Congress. Kennedy lived in Brookline for ten years and attended Edward Devotion School, Noble and Greenough Lower School, and the Dexter School, through 4th grade. In 1927, the family moved to 5040 Independence Avenue in Riverdale, Bronx, New York City; two years later, they moved to 294 Pondfield Road in Bronxville, New York, where Kennedy was a member of Scout Troop 2 (and was the first Boy Scout to become President).[8] Kennedy spent summers with his family at their home in Hyannisport, Massachusetts, and Christmas and Easter holidays with his family at their winter home in Palm Beach, Florida. For the 5th through 7th grade, Kennedy attended Riverdale Country School, a private school for boys. For 8th grade in September 1930, the 13-year-old Kennedy attended Canterbury School in New Milford, Connecticut.



Rough Accuracy of Information Extraction

Errors cascade (an error in entity tagging becomes an error in relation extraction)

These are very rough, actually optimistic, numbers

Hold for well-established tasks, but lower for many specific/novel IE tasks

    Information type   Accuracy
    Entities           90-98%
    Attributes         80%
    Relations          60-70%
    Events             50-60%
