Machine Reading of Web Text

Oren Etzioni
Turing Center, University of Washington
http://turing.cs.washington.edu


2

Rorschach Test

3

Rorschach Test for CS

4

Moore’s Law?

5

Storage Capacity?

6

Number of Web Pages?

7

Number of Facebook Users?

8

9

Turing Center Foci

Scale MT to 49,000,000 language pairs: a 2,500,000-word translation graph, P(V F C)?, PanImages

Accumulate knowledge from the Web

A new paradigm for Web Search

10

Outline

1. A New Paradigm for Search
2. Open Information Extraction
3. Tractable Inference
4. Conclusions

11

Web Search in 2020?

Type keywords into a search box?
Social or "human powered" search?
The Semantic Web?
What about our technology exponentials?

“The best way to predict the future is to invent it!”

12

Intelligent Search

Instead of merely retrieving Web pages, read ‘em!

Machine Reading = Information Extraction (IE) + tractable inference

IE(sentence) = who did what? e.g., speaker(Alon Halevy, UW)

Inference = uncover implicit information: Will Alon visit Seattle?

13

Application: Information Fusion

What kills bacteria?
What west coast nanotechnology companies are hiring?
Compare Obama's "buzz" versus Hillary's?
What is a quiet, inexpensive, 4-star hotel in Vancouver?

14

Opine (Popescu & Etzioni, EMNLP ’05)

IE(product reviews): informative; abundant, but varied; textual

Summarize reviews without any prior knowledge of the product category

Opinion Mining

15

16

17

But "Reading" the Web is Tough

Traditional IE is narrow: it has been applied to small, homogeneous corpora
On Web text, no parser achieves high accuracy, named-entity taggers do not suffice, and supervised learning does not scale

How about semi-supervised learning?

18

Semi-Supervised Learning

Few hand-labeled examples
Limit on the number of concepts
Concepts are pre-specified
Problematic for the Web

Alternative: self-supervised learning
Learner discovers concepts on the fly
Learner automatically labels examples per concept!

19

2. Open IE = Self-Supervised IE (Banko, Cafarella, Soderland, et al., IJCAI '07)

                Traditional IE                Open IE
Input:          Corpus + hand-labeled data    Corpus
Relations:      Specified in advance          Discovered automatically
Complexity:     O(D * R), R relations         O(D), D documents
Text analysis:  Parser + named-entity tagger  NP chunker

20

Extractor Overview (Banko & Etzioni, ’08)

1. Use a simple model of relationships in English to label extractions

2. Bootstrap a general model of relationships in English sentences, encoded as a CRF

3. Decompose each sentence into one or more (NP1, VP, NP2) “chunks”

4. Use CRF model to retain relevant parts of each NP and VP.

The extractor is relation-independent!

21

TextRunner Extraction

Extract a triple (Arg1, Relation, Arg2) representing a binary relation from each sentence.

Internet powerhouse, EBay, was originally founded by Pierre Omidyar.

(Ebay, Founded by, Pierre Omidyar)
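A minimal sketch of the step just illustrated, assuming the (NP1, VP, NP2) chunking from the extractor overview is already given: trim each chunk to its essential words to form the triple. The stop lists below are hand-written stand-ins for TextRunner's learned CRF and exist only for illustration.

    # Sketch: form (Arg1, Relation, Arg2) from an (NP1, VP, NP2) chunking.
    # The real system makes the keep/drop decision with a CRF learned by
    # self-supervision; these stop lists are hand-written stand-ins.
    NON_ESSENTIAL_VP = {"was", "were", "is", "are", "originally", "also"}
    NON_ESSENTIAL_NP = {"internet", "powerhouse"}  # appositive noise (assumed)

    def to_triple(np1: str, vp: str, np2: str):
        """Trim each chunk to its essential words -> (Arg1, Relation, Arg2)."""
        def trim(text, stop):
            return " ".join(w for w in text.replace(",", " ").split()
                            if w.lower() not in stop)
        return (trim(np1, NON_ESSENTIAL_NP),
                trim(vp, NON_ESSENTIAL_VP),
                trim(np2, set()))

    print(to_triple("Internet powerhouse, EBay",
                    "was originally founded by",
                    "Pierre Omidyar"))
    # -> ('EBay', 'founded by', 'Pierre Omidyar')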

22

Numerous Extraction Challenges

Drop non-essential info: "was originally founded by" → founded by
Retain key distinctions: (EBay, founded by, Pierre) ≠ (EBay, founded, Pierre)
Non-verb relationships: "George Bush, president of the U.S…"
Synonymy & aliasing: Albert Einstein = Einstein ≠ Einstein Bros.

23

TextRunner (the Web's 1st Open IE system)

1. Self-Supervised Learner: automatically labels example extractions & learns an extractor
2. Single-Pass Extractor: makes a single pass over the corpus, identifying extractions in each sentence
3. Query Processor: indexes extractions, enabling queries at interactive speeds

TextRunner Demo

25

26

27

Sample of 9 million Web pages:

Triples:                     11.3 million
With well-formed relation:    9.3 million
With well-formed entities:    7.8 million
  Abstract:                   6.8 million (79.2% correct)
  Concrete:                   1.0 million (88.1% correct)

Concrete facts: (Oppenheimer, taught at, Berkeley)

Abstract facts: (fruit, contain, vitamins)

28

3. Tractable Inference

Much of textual information is implicit

I. Entity and predicate resolution
II. Probability of correctness
III. Composing facts to draw conclusions

29

I. Entity Resolution

Resolver (Yates & Etzioni, HLT ’07): determines synonymy based on relations found by TextRunner (cf. Pantel & Lin ‘01)

(X, born in, 1941)     (M, born in, 1941)
(X, citizen of, US)    (M, citizen of, US)
(X, friend of, Joe)    (M, friend of, Mary)

P(X = M) ~ shared relations

30

Relation Synonymy

(1, R, 2)     (1, R', 2)
(2, R, 4)     (2, R', 4)
(4, R, 8)     (4, R', 8)
etc.          etc.

P(R = R’) ~ shared argument pairs

Unsupervised probabilistic model
O(N log N) algorithm, run on millions of docs
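A sketch of the signal Resolver builds on, using the toy triples from the two slides above. Bare counting only: the real system wraps these counts in its unsupervised probabilistic model and an O(N log N) candidate-generation pass, so everything here is an illustrative simplification.

    # Entity synonymy: strings sharing many relational "properties" are
    # candidate synonyms. Relation synonymy: relations sharing many
    # argument pairs are candidate synonyms.
    triples = [
        ("X", "born in", "1941"), ("M", "born in", "1941"),
        ("X", "citizen of", "US"), ("M", "citizen of", "US"),
        ("X", "friend of", "Joe"), ("M", "friend of", "Mary"),
    ]

    def properties(entity):
        """The (relation, other-argument) pairs an entity occurs with."""
        return ({(rel, a2) for a1, rel, a2 in triples if a1 == entity} |
                {(a1, rel) for a1, rel, a2 in triples if a2 == entity})

    def arg_pairs(relation):
        """The (Arg1, Arg2) pairs a relation occurs with."""
        return {(a1, a2) for a1, rel, a2 in triples if rel == relation}

    # Evidence that X = M grows with the number of shared properties.
    print(len(properties("X") & properties("M")))  # 2 shared relations
    # Evidence that R = R' would be len(arg_pairs(R) & arg_pairs(R')).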

31

II. Probability of Correctness

How likely is an extraction to be correct? Factors to consider include:

Authoritativeness of the source
Confidence in the extraction method
Number of independent extractions

32

Counting Extractions

Lexico-syntactic patterns (Hearst '92): "…cities such as Seattle, Boston, and…"

Turney's PMI-IR (ACL '02): PMI ~ co-occurrence frequency, estimated as a ratio of search-engine result counts; higher PMI means greater confidence in class membership.
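For reference, the PMI score in this line of work is a ratio of hit counts; for an instance I and a discriminator phrase D (notation assumed, following the KnowItAll formulation):

    \mathrm{PMI}(I, D) = \frac{|\mathrm{Hits}(D + I)|}{|\mathrm{Hits}(I)|}

For example, with D = "cities such as" and I = "Seattle": the more often the combined query matches relative to the instance alone, the stronger the evidence that Seattle is a city.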

33

Formal Problem Statement

If an extraction x appears k times in a set of n distinct sentences, each suggesting that x belongs to C, what is the probability that x ∈ C?

C is a class ("cities") or a relation ("mayor of")

Note: we only count distinct sentences!

34

Combinatorial Model (“Urns”)

Odds increase exponentially with k, but decrease exponentially with n

See Downey et al.’s IJCAI ’05 paper for formal details.
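To see where those exponentials come from, here is a deliberately simplified single-urn instance (my simplification, with uniform label frequencies; the paper's full model is more general). Suppose each of |C| correct labels appears in any given sentence with probability p_c, each of |E| error labels with probability p_e < p_c, and the priors are uniform. A binomial likelihood ratio then gives the posterior odds:

    \frac{P(x \in C \mid k, n)}{P(x \notin C \mid k, n)}
      = \frac{|C|}{|E|}
        \left(\frac{p_c}{p_e}\right)^{k}
        \left(\frac{1 - p_c}{1 - p_e}\right)^{n - k}

Each of the k appearances multiplies the odds by p_c / p_e > 1, and each of the n - k misses multiplies them by (1 - p_c) / (1 - p_e) < 1, which is exactly the behavior stated above.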

35

Performance (15x Improvement)

[Figure: deviation from ideal log likelihood (y-axis, 0 to 5) for the City, Film, Country, and MayorOf classes, comparing the urns, noisy-or, and PMI models.]

Self-supervised, domain-independent method

36

URNS is limited on "sparse" facts

[Figure: number of times an extraction appears in a pattern (y-axis, 0 to 500) vs. frequency rank of the extraction (x-axis, 0 to 100,000): a long tail of sparse extractions.]

Sparse extractions are a mixture of correct and incorrect, e.g., (Dave Shaver, Pickerington), (Ronald McDonald, McDonaldland)

High-frequency extractions tend to be correct, e.g., (Michael Bloomberg, New York City)

37

Language Models to the Rescue (Downey, Schoenmackers, Etzioni, ACL '07)

Instead of only lexico-syntactic patterns, leverage all contexts of a particular entity

Statistical 'type check': does Pickerington "behave" like a city? Does Shaver "behave" like a mayor?

Language model = HMM (built once per corpus)
Project each string to a point in a 20-dimensional space
Measure proximity of Pickerington to Seattle, Boston, etc.

38

III. Compositional Inference (work in progress; Schoenmackers, Etzioni, Weld)

Much information is implicit (2 + 2 = 4):

TextRunner: (Turing, born in, London)
WordNet: (London, part of, England)
Rule: 'born in' is transitive through 'part of'
Conclusion: (Turing, born in, England)

Mechanism: MLN instantiated on the fly
Rules: learned from corpus (future work)

Inference Demo

39

KnowItAll Family Tree

WebKB '99, Mulder '01, and PMI-IR '01 led to KnowItAll '04, which spawned Opine '05, UrnsBE '05, KnowItNow '05, Woodward '06, TextRunner '07, Resolver '07, REALM '07, and Inference '08.

40

KnowItAll Team

Michele Banko, Michael Cafarella, Doug Downey, Alan Ritter, Dr. Stephen Soderland, Stefan Schoenmackers, Prof. Dan Weld, Mausam

Alumni: Dr. Ana-Maria Popescu, Dr. Alex Yates, and others.

41

Related Work

Sekine's "preemptive IE"
Powerset
Textual entailment
AAAI '07 Symposium on "Machine Reading"
Growing body of work on IE from the Web

42

4. Conclusions

Imagine search systems that operate over a (more) semantic space:

Keywords, documents → extractions
TF-IDF, PageRank → relational models
Web pages, hyperlinks → entities, relations

Reading the Web → a new search paradigm

43

44

Machine Reading = unsupervised understanding of text

Much is implicit → tractable inference is key!

45

HMM in More Detail

Training: seek to maximize the probability of the corpus w given latent states t, using EM, where each word w_i is emitted by a latent state t_i ∈ {1, …, N}

[HMM trellis: latent states t_i, …, t_{i+4} emitting the words w_i, …, w_{i+4}, e.g., "cities such as Los Angeles"]

46

Using the HMM at Query Time

Given a set of extractions (Arg1, Rln, Arg2), take as seeds the most frequent Args for Rln

    f(arg, seeds) = (1 / |seeds|) Σ_i KL( P(t | seed_i) || P(t | arg) )

1. The distribution over t is read from the HMM
2. Compute the KL divergence via f(arg, seeds)
3. For each extraction, average f over Arg1 & Arg2
4. Sort "sparse" extractions in ascending order
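A sketch of this ranking step, assuming the HMM's state distributions P(t | word) are already given (in the real system they come from the HMM trained on the corpus with EM; the distributions below are made up for illustration).

    import math

    def kl(p, q, eps=1e-9):
        """KL divergence between two distributions over latent states."""
        return sum(pi * math.log((pi + eps) / (qi + eps))
                   for pi, qi in zip(p, q))

    def f(arg, seeds, state_dist):
        """Average KL from each seed's state distribution to arg's."""
        return sum(kl(state_dist[s], state_dist[arg]) for s in seeds) / len(seeds)

    # Toy state distributions over N = 3 latent states (assumed values).
    state_dist = {
        "Seattle":      [0.7, 0.2, 0.1],
        "Boston":       [0.6, 0.3, 0.1],
        "Pickerington": [0.5, 0.3, 0.2],   # behaves like a city
        "Microsoft":    [0.1, 0.2, 0.7],   # does not
    }

    seeds = ["Seattle", "Boston"]  # most frequent arguments of the relation
    for cand in ["Pickerington", "Microsoft"]:
        print(cand, round(f(cand, seeds, state_dist), 3))
    # Lower scores rank higher: Pickerington type-checks as a city.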

47

Language Modeling & Open IE

Self-supervised
From illuminating phrases to full context
Handles sparse extractions

48

Focus: Open IE on Web Text

Advantages:                           Challenges:
"Semantically tractable" sentences    Difficult, ungrammatical sentences
Redundancy                            Unreliable information
Search engines                        Heterogeneous corpus

49

II. Probability of Correctness

How likely is an extraction to be correct?

Distributional Hypothesis: "words that occur in the same contexts tend to have similar meanings"

KnowItAll Hypothesis: extractions that occur in the same informative contexts more frequently are more likely to be correct.

50

Argument "Type Checking" via HMM

A relation's arguments are "typed": (Person, Mayor Of, City)

Training: model the distribution of Person & City contexts in the corpus (Distributional Hypothesis)

Query time: rank sparse triples by how well each argument's context distribution matches that of its type

51

Silly Example

Prefer (Shaver, Mayor of, Pickerington) over (Spice Girls, Mayor of, Microsoft)

Because Shaver's contexts are more like "other mayors'" than the Spice Girls', and Pickerington's contexts are more like "other cities'" than Microsoft's

52

Utilizing HMMs to Check Types

Challenges: argument types are not known; we can't build a model for each argument type; "textual types" are fuzzy

Solution: train an HMM for the corpus using EM & bootstrapping

REALM improves precision by 90%

53

[Architecture diagram: the MLN, knowledge bases, and a query formula feed a loop of "find best query" → "run query" → "find implied nodes & cliques", passing along the best KB + query, the query results, and new nodes + cliques.]

Query: Was Turing born in England?  BornIn(Turing, England)?

TextRunner: "Turing was born in London" → BornIn(Turing, London)
WordNet: "London is in England" → In(London, England)
Inference rule: BornIn(X, city) ∧ In(city, country) → BornIn(X, country)

BornIn(Turing, London) ∧ In(London, England) → BornIn(Turing, England)

Yes! Turing was born in England!
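A minimal forward-chaining sketch of the composition step above. The actual system instantiates a Markov Logic Network on the fly and reasons probabilistically; the hard rule here is a simplification for illustration.

    # Facts from the two sources in the walkthrough.
    facts = {("BornIn", "Turing", "London"),   # from TextRunner
             ("In", "London", "England")}      # from WordNet

    def apply_transitivity(facts):
        """BornIn(x, place) & In(place, region) => BornIn(x, region)."""
        derived = {("BornIn", x, region)
                   for rel, x, place in facts if rel == "BornIn"
                   for rel2, place2, region in facts
                   if rel2 == "In" and place2 == place}
        return facts | derived

    facts = apply_transitivity(facts)
    print(("BornIn", "Turing", "England") in facts)  # True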
