213
Learning Open Domain Knowledge from Text Gabor Angeli Stanford University November 5, 2015 Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 0 / 43

Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Learning Open Domain Knowledge from Text

Gabor Angeli

Stanford University

November 5, 2015

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 0 / 43

Page 2: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Learning “Knowledge?”

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 1 / 43

Page 3: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Learning “Knowledge?”

Knowledge = True Statements

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 1 / 43

Page 4: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Harder For Computers Than Humans

...for a human ...for a computer

Born in Honolulu, Hawaii,Obama is a graduate ofColumbia University and Har-vard Law School.

Need to learn lexical items.Syntax is often non-trivial.Many “facts” in same sentence.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 2 / 43

Page 5: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Harder For Computers Than Humans

...for a human ...for a computer

Born in Honolulu, Hawaii,Obama is a graduate ofColumbia University and Har-vard Law School.

Rattled for Austin, Alaska, Je-sus is the mouse in MicrosoftGoogle but Facebook TwitterSnapchat.

Need to learn lexical items.Syntax is often non-trivial.Many “facts” in same sentence.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 2 / 43

Page 6: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Harder For Computers Than Humans

...for a human ...for a computer

Born in Honolulu, Hawaii,Obama is a graduate ofColumbia University and Har-vard Law School.

Rattled for Austin, Alaska, Je-sus is the mouse in MicrosoftGoogle but Facebook TwitterSnapchat.

Need to learn lexical items.

Syntax is often non-trivial.Many “facts” in same sentence.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 2 / 43

Page 7: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Harder For Computers Than Humans

...for a human ...for a computer

Born in Honolulu, Hawaii,Obama is a graduate ofColumbia University and Har-vard Law School.

Rattled for Austin, Alaska, Je-sus is the mouse in MicrosoftGoogle but Facebook TwitterSnapchat.

Need to learn lexical items.Syntax is often non-trivial.

Many “facts” in same sentence.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 2 / 43

Page 8: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Harder For Computers Than Humans

...for a human ...for a computer

Born in Honolulu, Hawaii,Obama is a graduate ofColumbia University and Har-vard Law School.

Rattled for Austin, Alaska, Je-sus is the mouse in MicrosoftGoogle but Facebook TwitterSnapchat.

Need to learn lexical items.Syntax is often non-trivial.Many “facts” in same sentence.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 2 / 43

Page 9: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text Fixed-Schema Knowledge Bases

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 10: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text Fixed-Schema Knowledge Bases

⇒(OBAMA; born in; HONOLULU)(OBAMA; born in; HAWAII)

(OBAMA; born on; 1961-8-4)(OBAMA; spouse; MICHELLE)(OBAMA; children; MALIA)

(OBAMA; children; NATASHA)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 11: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Active area of research:Supervised relation extractors[Doddington et al., 2004, Surdeanu and Ciaramita, 2007].

Distantly supervised extractors[Wu and Weld, 2007, Mintz et al., 2009].

Weakly+distantly supervised extractors[Hoffmann et al., 2011, Surdeanu et al., 2012].

Partially+weakly+distantly supervised extractors[Angeli et al., 2014a, Angeli et al., 2014b].

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 12: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

More to Life Than Fixed Relation Schema

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 13: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text 1. Fixed-Schema Knowledge Bases2. Open-Domain KBs (Open IE)

⇒ (SUBJECT; relation; OBJECT)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 14: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text 1. Fixed-Schema Knowledge Bases2. Open-Domain KBs (Open IE)

⇒ (CATS; have; TAILS)(RABBITS; eat; CARROTS)

(OBAMA; enjoys playing; BASKETBALL)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 15: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text1. Fixed-Schema Knowledge Bases2. Open-Domain KBs (Open IE)3. Unstructured Text

⇒cats have tails

rabbits eat carrotsObama enjoys playing basketball

A graduated cylinder is best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 16: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

How do we Represent Knowledge?

Unstructured Text1. Fixed-Schema Knowledge Bases2. Open-Domain KBs (Open IE)3. Unstructured Text

⇒cats have tails

rabbits eat carrotsObama enjoys playing basketball

A graduated cylinder is best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 3 / 43

Page 17: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

This Thesis

Store Information as Text (easier)Query Information as Text (hard!)

Build a system that:Takes as input a candidate textual statement.Produces as output the truth of that statement.

Generalizes Fixed-Schema KBsObama was born in Hawaii

7 Obama was born in KenyaGeneralizes Open IE

Rabbits eat carrotsMore precise than web search7 A stopwatch is best to measure the volume of a liquid.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 4 / 43

Page 18: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

This Thesis

Store Information as Text (easier)Query Information as Text (hard!)

Build a system that:Takes as input a candidate textual statement.Produces as output the truth of that statement.

Generalizes Fixed-Schema KBsObama was born in Hawaii

7 Obama was born in KenyaGeneralizes Open IE

Rabbits eat carrotsMore precise than web search7 A stopwatch is best to measure the volume of a liquid.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 4 / 43

Page 19: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

This Thesis

Store Information as Text (easier)Query Information as Text (hard!)

Build a system that:Takes as input a candidate textual statement.Produces as output the truth of that statement.

Generalizes Fixed-Schema KBsObama was born in Hawaii

7 Obama was born in Kenya

Generalizes Open IERabbits eat carrots

More precise than web search7 A stopwatch is best to measure the volume of a liquid.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 4 / 43

Page 20: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

This Thesis

Store Information as Text (easier)Query Information as Text (hard!)

Build a system that:Takes as input a candidate textual statement.Produces as output the truth of that statement.

Generalizes Fixed-Schema KBsObama was born in Hawaii

7 Obama was born in KenyaGeneralizes Open IE

Rabbits eat carrots

More precise than web search7 A stopwatch is best to measure the volume of a liquid.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 4 / 43

Page 21: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

This Thesis

Store Information as Text (easier)Query Information as Text (hard!)

Build a system that:Takes as input a candidate textual statement.Produces as output the truth of that statement.

Generalizes Fixed-Schema KBsObama was born in Hawaii

7 Obama was born in KenyaGeneralizes Open IE

Rabbits eat carrotsMore precise than web search7 A stopwatch is best to measure the volume of a liquid.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 4 / 43

Page 22: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

Relation Extraction: (BARACK OBAMA; born in; ???) p

Flexibility

Que

stio

nC

over

age

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 23: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

Relation Extraction: (BARACK OBAMA; born in; ???) p

Flexibility

Que

stio

nC

over

age

[Hoffmann et al., 2011, Surdeanu et al., 2012, Angeli et al., 2014b]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 24: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

Open Relation Extraction: (RABBITS; eat; ???)

Flexibility

Que

stio

nC

over

age

[Banko et al., 2007, Fader et al., 2011, Mausam et al., 2012]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 25: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

Entailment: If a watch measures time, does it measure volume?p

Flexibility

Que

stio

nC

over

age

[Glickman et al., 2006, MacCartney, 2009]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 26: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Textual Entailment

Single Premise:

Mitsubishi Motors Corp.’s new vehicle sales in the US fell 46 percent inJune.

Single Hypothesis:

Mitsubishi sales rose 46 percent.

Classification Task: If you accept the premise, would you accept thehypothesis?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 27: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

This Thesis: Formal reasoning with text over large corpora

Flexibility

Que

stio

nC

over

age

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 28: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

This Thesis: Formal reasoning with text over large corpora

Flexibility

Que

stio

nC

over

age

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 29: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Prior Work

This Thesis: Formal reasoning with text over large corpora

Flexibility

Que

stio

nC

over

age

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 30: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 31: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 5 / 43

Page 32: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Reasoning About Common Sense Facts

Kittens play with yarn 7 Kittens play with computers

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 6 / 43

Page 33: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Reasoning About Common Sense Facts

Kittens play with yarn 7 Kittens play with computers

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 6 / 43

Page 34: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Common Sense Reasoning for NLP

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 7 / 43

Page 35: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Start with a large knowledge base

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 36: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Start with a large knowledge base

The cat atea mouse

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 37: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 38: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

7 No car-nivores eat

animals

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

↑ Don’t want to run inferenceover every fact!

← Don’t want to store all of these!

Page 39: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

7 No car-nivores eat

animals

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

↑ Don’t want to run inferenceover every fact!

← Don’t want to store all of these!

Page 40: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

7 No car-nivores eat

animals

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

↑ Don’t want to run inferenceover every fact!

← Don’t want to store all of these!

Page 41: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...on demand from a query...

No carnivoreseat animals?

. . .

. . .

The catate a mouse

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 42: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

...Using text as the meaning representation...

No carnivoreseat animals?

The carnivoreseat animals

The cateats animals

The catate an animal

The catate a mouse

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 43: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

...Without aligning to any particular premise.

No carnivoreseat animals?

The carnivoreseat animals

The cateats animals

The catate an animal

The catate a mouse

. . .

. . . . . .

. . .

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 44: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

7 No car-nivores eat

animals

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 8 / 43

Page 45: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

We’re Doing Logical Inference

The cat ate a mouse � ¬ No carnivores eat animals

Recall: Inference on every query: speed is important!

Recall: Both premise and query are sentences.

Detour: Let’s talk about logic!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 9 / 43

Page 46: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

We’re Doing Logical Inference

The cat ate a mouse � ¬ No carnivores eat animals

Recall: Inference on every query: speed is important!

Recall: Both premise and query are sentences.

Detour: Let’s talk about logic!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 9 / 43

Page 47: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

We’re Doing Logical Inference

The cat ate a mouse � ¬ No carnivores eat animals

Recall: Inference on every query: speed is important!

Recall: Both premise and query are sentences.

Detour: Let’s talk about logic!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 9 / 43

Page 48: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

We’re Doing Logical Inference

The cat ate a mouse � ¬ No carnivores eat animals

Recall: Inference on every query: speed is important!

Recall: Both premise and query are sentences.

Detour: Let’s talk about logic!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 9 / 43

Page 49: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Intractable

Theorem ProversPropositional logic is already NP-complete!

Markov Logic NetworksGrounding 3,800 rules takes 7 hours (Alchemy).Add Chris Re + decades of DB research: 106 seconds.. . . but still slow for open-domain inference.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 10 / 43

Page 50: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Intractable

Theorem ProversPropositional logic is already NP-complete!

Markov Logic NetworksGrounding 3,800 rules takes 7 hours (Alchemy).

Add Chris Re + decades of DB research: 106 seconds.. . . but still slow for open-domain inference.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 10 / 43

Page 51: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Intractable

Theorem ProversPropositional logic is already NP-complete!

Markov Logic NetworksGrounding 3,800 rules takes 7 hours (Alchemy).Add Chris Re + decades of DB research: 106 seconds.. . . but still slow for open-domain inference.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 10 / 43

Page 52: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First order logic is an unnatural language

∃x (location(x)∧born in(Obama,x))

[Berant et al., 2013]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 11 / 43

Page 53: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Inexpressive

Some people think that Obama was born in Kenya.Second order logic:∃x∃P [P = born∧ think(x, P)∧P(Obama,Kenya)]

But, can still infer: Some people think that Obama is from Kenya.

Most students who learned a foreign language learned it at auniversity.

Most is not a first-order quantifier.Scoping ambiguities everywhere!But, can still infer: Most students learned it at a school.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 12 / 43

Page 54: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Inexpressive

Some people think that Obama was born in Kenya.Second order logic:∃x∃P [P = born∧ think(x, P)∧P(Obama,Kenya)]But, can still infer: Some people think that Obama is from Kenya.

Most students who learned a foreign language learned it at auniversity.

Most is not a first-order quantifier.Scoping ambiguities everywhere!But, can still infer: Most students learned it at a school.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 12 / 43

Page 55: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Inexpressive

Some people think that Obama was born in Kenya.Second order logic:∃x∃P [P = born∧ think(x, P)∧P(Obama,Kenya)]But, can still infer: Some people think that Obama is from Kenya.

Most students who learned a foreign language learned it at auniversity.

Most is not a first-order quantifier.Scoping ambiguities everywhere!

But, can still infer: Most students learned it at a school.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 12 / 43

Page 56: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

First Order Logic is Inexpressive

Some people think that Obama was born in Kenya.Second order logic:∃x∃P [P = born∧ think(x, P)∧P(Obama,Kenya)]But, can still infer: Some people think that Obama is from Kenya.

Most students who learned a foreign language learned it at auniversity.

Most is not a first-order quantifier.Scoping ambiguities everywhere!But, can still infer: Most students learned it at a school.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 12 / 43

Page 57: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

[Sanchez Valencia, 1991, MacCartney and Manning, 2008,Icard and Moss, 2014]

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 13 / 43

Page 58: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

Does a given mutation to a sentence preserve its truth?

Logic over natural languageInstantaneous and perfect semantic parsing!Plays nice with lexical methods

TractablePolynomial time entailment checking[MacCartney and Manning, 2008].

Expressive (for common inferences)Second-order phenomena; most; quantifier scopingNo free lunch: shallow quantification; single-premise only

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 59: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

Does a given mutation to a sentence preserve its truth?

Logic over natural languageInstantaneous and perfect semantic parsing!Plays nice with lexical methods

TractablePolynomial time entailment checking[MacCartney and Manning, 2008].

Expressive (for common inferences)Second-order phenomena; most; quantifier scopingNo free lunch: shallow quantification; single-premise only

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 60: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

Does a given mutation to a sentence preserve its truth?

Logic over natural languageInstantaneous and perfect semantic parsing!Plays nice with lexical methods

TractablePolynomial time entailment checking[MacCartney and Manning, 2008].

Expressive (for common inferences)Second-order phenomena; most; quantifier scopingNo free lunch: shallow quantification; single-premise only

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 61: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

Does a given mutation to a sentence preserve its truth?

Logic over natural languageInstantaneous and perfect semantic parsing!Plays nice with lexical methods

TractablePolynomial time entailment checking[MacCartney and Manning, 2008].

Expressive (for common inferences)Second-order phenomena; most; quantifier scoping

No free lunch: shallow quantification; single-premise only

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 62: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic

Does a given mutation to a sentence preserve its truth?

Logic over natural languageInstantaneous and perfect semantic parsing!Plays nice with lexical methods

TractablePolynomial time entailment checking[MacCartney and Manning, 2008].

Expressive (for common inferences)Second-order phenomena; most; quantifier scopingNo free lunch: shallow quantification; single-premise only

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 63: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic as Syllogisms

s/Natural Logic/Syllogistic Reasoning/g

Some cat ate a mouse(all mice are rodents)

∴ Some cat ate a rodent

Beyond syllogismsGeneral-purpose logic

Compositional grammarArbitrary quantifiers

Model-theoretic soundness + completeness proof[Icard and Moss, 2014]

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 64: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic as Syllogisms

s/Natural Logic/Syllogistic Reasoning/g

Some cat ate a mouse(all mice are rodents)

∴ Some cat ate a rodent

Beyond syllogismsGeneral-purpose logic

Compositional grammarArbitrary quantifiers

Model-theoretic soundness + completeness proof[Icard and Moss, 2014]

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 14 / 43

Page 65: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

animal

feline

cat

house cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 66: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

animal

feline

cat

house cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 67: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

animal

feline

↑ cat

house cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 68: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

living thing

animal

↑ feline

cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 69: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

thing

living thing

↑ animal

feline

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 70: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

thing

living thing

↓ animal

feline

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 71: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

living thing

animal

↓ feline

cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 72: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic and Polarity

Treat hypernymy as a partial order.>

animal

feline

cat

dog

Polarity is the direction a lexical item can move in the ordering.

animal

feline

↓ cat

house cat

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 15 / 43

Page 73: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Polarity

Inference is reversible.

↑ All↓↑

carnivores

felines

↓ cats

house cats

consume

↑ eat

slurp

placentals

rodents

↑ mice

fieldmice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 74: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

consume

↑ eat

slurp

placentals

rodents

↑ mice

fieldmice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 75: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

↑ consume

eat

placentals

rodents

↑ mice

fieldmice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 76: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

↑ consume

eat

rodents

mice

↑ fieldmice

pine vole

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 77: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

↑ consume

eat

placentals

rodents

↑ mice

fieldmice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 78: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

↑ consume

eat

mammals

placentals

↑ rodents

mice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 79: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Inference

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Inference is reversible.

↑ All↓↑

felines

cats

↓ house cats

kitties

↑ consume

eat

mammals

placentals

↑ rodents

mice

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 80: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...

The cat atea mouse

. . .

. . .

7 No car-nivores eat

animals

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 81: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Infer new facts...on demand from a query...

No carnivoreseat animals?

. . .

. . .

The catate a mouse

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 82: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

...Using text as the meaning representation...

No carnivoreseat animals?

The carnivoreseat animals

The cateats animals

The catate an animal

The catate a mouse

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 83: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

...Without aligning to any particular premise.

No carnivoreseat animals?

The carnivoreseat animals

The cateats animals

The catate an animal

The catate a mouse

. . .

. . . . . .

. . .

All catshave tails

All kittensare cute

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 16 / 43

Page 84: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic Inference is Search

Nodes ( fact, truth maintained ∈ {true, false})

Start Node ( query fact, true )End Nodes any known fact

Edges Mutations of the current factEdge Costs How “wrong” an inference step is (learned)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 17 / 43

Page 85: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic Inference is Search

Nodes ( fact, truth maintained ∈ {true, false})

Start Node ( query fact, true )End Nodes any known fact

Edges Mutations of the current factEdge Costs How “wrong” an inference step is (learned)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 17 / 43

Page 86: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic Inference is Search

Nodes ( fact, truth maintained ∈ {true, false})

Start Node ( query fact, true )End Nodes any known fact

Edges Mutations of the current fact

Edge Costs How “wrong” an inference step is (learned)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 17 / 43

Page 87: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic Inference is Search

Nodes ( fact, truth maintained ∈ {true, false})

Start Node ( query fact, true )End Nodes any known fact

Edges Mutations of the current factEdge Costs How “wrong” an inference step is (learned)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 17 / 43

Page 88: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

Shorthand for a node:

↑ No↓↓

organism

animal

↓ carnivores

felines

consume

↓ eat

slurp

living thing

organism

↓ animals

chordate

No carnivoreseat animals?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 89: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

ROOT

No carnivoreseat animals?

The carnivoreseat animals

No animalseat animals

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 90: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

No carnivoreseat animals

The carnivoreseat animals?

The felineeats animals

All carnivoreseat animals

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 91: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

The carnivoreseat animals

The felineeats animals?

The cat eatsanimals

The cat eatschordate

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 92: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

The felineeats animals

The cat eatsanimals?

The cat eatschordates

The kittyeats animals

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 93: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

The cat eatsanimals

The cat eatschordates?

The cateats mice

The cateats dogs

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 94: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

The cat eatschordates

The cateats mice?

The cat atea mouse

The kittyeats mice

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 18 / 43

Page 95: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search (with edges)

ROOT

No carnivoreseat animals?

The carnivoreseat animals

No animalseat animals

. . .

Template Instance Edge

Operator Negate NOOPNOOPNOOP

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 19 / 43

Page 96: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search (with edges)

ROOT

No carnivoreseat animals?

The carnivoreseat animals

No animalseat animals

. . .

Template Instance Edge

Operator Negate No→ TheNOOPNOOP

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 19 / 43

Page 97: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search (with edges)

ROOT

No carnivoreseat animals?

The carnivoreseat animals

No animalseat animals

. . .

Template Instance Edge

Operator Negate No→ TheNo carnivores eat animals→The carnivores eat animals

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 19 / 43

Page 98: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Edge Templates

Template InstanceHypernym animal→ catHyponym cat→ animalAntonym good→ badSynonym cat→ true cat

Add Word cat→ ·Delete Word · → cat

Operator Weaken some→ allOperator Strengthen all→ someOperator Negate all→ noOperator Synonym all→ every

Nearest Neighbor cat→ dog

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 20 / 43

Page 99: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.

Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .Cost of a path is θ · f.Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 100: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .Cost of a path is θ · f.Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 101: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .Cost of a path is θ · f.Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 102: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .

Cost of a path is θ · f.Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 103: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .Cost of a path is θ · f.

Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 104: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

“Soft” Natural Logic

Want to make likely (but not certain) inferences.Same motivation as Markov Logic, Probabilistic Soft Logic, etc.Each edge template has a cost θ ≥ 0.

Detail: Variation among edge instances of a template.WordNet: cat→ feline vs. cup→ container.Nearest neighbors distance.Each edge instance has a distance f .

Cost of an edge is θi · fi .Cost of a path is θ · f.Can learn parameters θ .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 21 / 43

Page 105: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Experiments

ConceptNet:A semi-curated collection of common-sense facts.

not all birds can flynoses are used to smellnobody wants to diemusic is used for pleasure

Negatives: ReVerb extractions marked false by Turkers.Small (1378 train / 1080 test), but fairly broad coverage.

Our Knowledge Base:270 million lemmatized Ollie extractions.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 22 / 43

Page 106: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Experiments

ConceptNet:A semi-curated collection of common-sense facts.

not all birds can flynoses are used to smellnobody wants to diemusic is used for pleasure

Negatives: ReVerb extractions marked false by Turkers.Small (1378 train / 1080 test), but fairly broad coverage.

Our Knowledge Base:270 million lemmatized Ollie extractions.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 22 / 43

Page 107: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

ConceptNet Results

SystemsDirect Lookup: Lookup by lemmas.Thesis: This thesis.

Thesis - Lookup: Remove query facts from KB.

System P RDirect Lookup 100.0 12.1Thesis 90.6 49.1Thesis - Lookup 88.8 40.1

4x improvement in recall.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 23 / 43

Page 108: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

ConceptNet Results

SystemsDirect Lookup: Lookup by lemmas.Thesis: This thesis.Thesis - Lookup: Remove query facts from KB.

System P RDirect Lookup 100.0 12.1Thesis 90.6 49.1Thesis - Lookup 88.8 40.1

4x improvement in recall.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 23 / 43

Page 109: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

ConceptNet Results

SystemsDirect Lookup: Lookup by lemmas.Thesis: This thesis.Thesis - Lookup: Remove query facts from KB.

System P RDirect Lookup 100.0 12.1

Thesis 90.6 49.1Thesis - Lookup 88.8 40.1

4x improvement in recall.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 23 / 43

Page 110: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

ConceptNet Results

SystemsDirect Lookup: Lookup by lemmas.Thesis: This thesis.Thesis - Lookup: Remove query facts from KB.

System P RDirect Lookup 100.0 12.1Thesis 90.6 49.1Thesis - Lookup 88.8 40.1

4x improvement in recall.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 23 / 43

Page 111: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

ConceptNet Results

SystemsDirect Lookup: Lookup by lemmas.Thesis: This thesis.Thesis - Lookup: Remove query facts from KB.

System P RDirect Lookup 100.0 12.1Thesis 90.6 49.1Thesis - Lookup 88.8 40.1

4x improvement in recall.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 23 / 43

Page 112: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Success?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 113: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Not Yet!

The Internet doesn’t speak in atomic utterances

Where was Obama born?Born in Hawaii, Obama is a graduate of Columbia University andHarvard Law School.

⇒ Obama was born in Hawaii

.

Let’s store the inferred fact instead

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 114: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Not Yet!

The Internet doesn’t speak in atomic utterancesWhere was Obama born?

Born in Hawaii, Obama is a graduate of Columbia University andHarvard Law School.

⇒ Obama was born in Hawaii

.

Let’s store the inferred fact instead

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 115: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Not Yet!

The Internet doesn’t speak in atomic utterancesWhere was Obama born?

Born in Hawaii, Obama is a graduate of Columbia University andHarvard Law School.⇒ Obama was born in Hawaii.

Let’s store the inferred fact instead

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 116: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Not Yet!

The Internet doesn’t speak in atomic utterancesWhere was Obama born?

Born in Hawaii, Obama is a graduate of Columbia University andHarvard Law School.⇒ Obama was born in Hawaii.

Let’s store the inferred fact instead

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 117: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 118: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 24 / 43

Page 119: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Atomic Clauses from Sentences

Input: Long sentence.Born in a small town, she took the midnighttrain going anywhere.

Output: Short clauses.she was born in a small town.

Born in a small town, she took the midnight train going anywhere.

nmod:in

amod

det

advcl

nsubj

dobj

nn

det

advcl dobj

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 25 / 43

Page 120: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Atomic Clauses from Sentences

Input: Long sentence.Born in a small town, she took the midnighttrain going anywhere.

Output: Short clauses.she was born in a small town.

Born in a small town, she took the midnight train going anywhere.

nmod:in

amod

det

advcl

nsubj

dobj

nn

det

advcl dobj

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 25 / 43

Page 121: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Atomic Clauses from Sentences

Input: Long sentence.Born in a small town, she took the midnighttrain going anywhere.

Output: Short clauses.she was born in a small town.

Born in a small town, she took the midnight train going anywhere.

nmod:in

amod

det

advcl

nsubj

dobj

nn

det

advcl dobj

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 25 / 43

Page 122: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Clause Classifier

Born in a small town, she took the midnight train going anywhere.

nmod:in

amod

det

advcl

nsubj

dobj

nn

det

advcl dobj

Input: Dependency arc.Output: Action to take.

Yield (you should brush your teeth)Yield (Subject Controller) (Obama Born in Hawaii)Yield (Object Controller) (Fred leave the room)Yield (Parent Subject) (Obama is our 44th president)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 26 / 43

Page 123: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Clause Classifier

Dentists suggest that you should brush your teeth.

ccomp

nsubj

mark

nsubj

aux

dobj

nmod:poss

Input: Dependency arc.Output: Action to take.

Yield (you should brush your teeth)

Yield (Subject Controller) (Obama Born in Hawaii)Yield (Object Controller) (Fred leave the room)Yield (Parent Subject) (Obama is our 44th president)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 26 / 43

Page 124: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Clause Classifier

Born in Hawaii , Obama is a US citizen.

advcl

nsubj

cop

det

amod

nmod:in

Input: Dependency arc.Output: Action to take.

Yield (you should brush your teeth)Yield (Subject Controller) (Obama Born in Hawaii)

Yield (Object Controller) (Fred leave the room)Yield (Parent Subject) (Obama is our 44th president)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 26 / 43

Page 125: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Clause Classifier

I persuaded Fred to leave the room.

xcomp

dobjnsubj mark

dobj

det

Input: Dependency arc.Output: Action to take.

Yield (you should brush your teeth)Yield (Subject Controller) (Obama Born in Hawaii)Yield (Object Controller) (Fred leave the room)

Yield (Parent Subject) (Obama is our 44th president)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 26 / 43

Page 126: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Clause Classifier

Obama, our 44th president.

appos

nmod:poss

amod

Input: Dependency arc.Output: Action to take.

Yield (you should brush your teeth)Yield (Subject Controller) (Obama Born in Hawaii)Yield (Object Controller) (Fred leave the room)Yield (Parent Subject) (Obama is our 44th president)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 26 / 43

Page 127: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Classifier Training

Training Data Generation1. Label the Penn Treebank with Open IE triples using traces.2. Run exhaustive search over possible clause splits.

3. Positive Labels: A sequence of actions which yields a relation(33.5k examples).Negative Labels: All other sequences of actions (1.1M examples).

Features:Edge label; incoming edge label.Neighbors of governor + dependent; number of neighbors.Existence of subject/object edges at governor; dependent.POS tag of governor; dependent.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 27 / 43

Page 128: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Classifier Training

Training Data Generation1. Label the Penn Treebank with Open IE triples using traces.2. Run exhaustive search over possible clause splits.3. Positive Labels: A sequence of actions which yields a relation

(33.5k examples).Negative Labels: All other sequences of actions (1.1M examples).

Features:Edge label; incoming edge label.Neighbors of governor + dependent; number of neighbors.Existence of subject/object edges at governor; dependent.POS tag of governor; dependent.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 27 / 43

Page 129: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Classifier Training

Training Data Generation1. Label the Penn Treebank with Open IE triples using traces.2. Run exhaustive search over possible clause splits.3. Positive Labels: A sequence of actions which yields a relation

(33.5k examples).Negative Labels: All other sequences of actions (1.1M examples).

Features:Edge label; incoming edge label.Neighbors of governor + dependent; number of neighbors.Existence of subject/object edges at governor; dependent.POS tag of governor; dependent.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 27 / 43

Page 130: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Maximally Shorten Clauses

Some strange, nuanced function:

Heinz Fischer of Austria =⇒ Heinz FischerUnited States president Obama =⇒ ObamaAll young rabbits drink milk 6=⇒ 7 All rabbits drink milkSome young rabbits drink milk =⇒ Some rabbits drink milkEnemies give fake praise 6=⇒ 7 Enemies give praiseFriends give true praise =⇒ Friends give praise

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 131: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Maximally Shorten Clauses

An entailment function:

Heinz Fischer of Austria =⇒ Heinz FischerUnited States president Obama =⇒ ObamaAll young rabbits drink milk 6=⇒ 7 All rabbits drink milkSome young rabbits drink milk =⇒ Some rabbits drink milkEnemies give fake praise 6=⇒ 7 Enemies give praiseFriends give true praise =⇒ Friends give praise

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 132: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Maximally Shorten Clauses

A natural logic entailment function:

Heinz Fischer of Austria =⇒ Heinz FischerUnited States president Obama =⇒ ObamaAll young rabbits drink milk 6=⇒ 7 All rabbits drink milkSome young rabbits drink milk =⇒ Some rabbits drink milkEnemies give fake praise 6=⇒ 7 Enemies give praiseFriends give true praise =⇒ Friends give praise

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 133: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic For Clause Shortening

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Polarity determines valid deletions.

↑ Some↑↑

mammals

rabbits

↑ young rabbits

baby rabbits

consume

↑ drink

slurp

something

liquid

↑ milk

Lucerne

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 134: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic For Clause Shortening

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Polarity determines valid deletions.

↑ Some↑↑

animals

mammals

↑ rabbits

young rabbits

consume

↑ drink

slurp

something

liquid

↑ milk

Lucerne

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 135: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic For Clause Shortening

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Polarity determines valid deletions.

↑ All↓↑

mammals

rabbits

↓ young rabbits

baby rabbits

consume

↑ drink

slurp

something

liquid

↑ milk

Lucerne

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 136: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Natural Logic For Clause Shortening

Quantifiers determines the polarity (↑ or ↓) of words.

Mutations must respect polarity.

Polarity determines valid deletions.

↑ All↓↑

animals

mammals

↓ rabbits

young rabbits

consume

↑ drink

slurp

something

liquid

↑ milk

Lucerne

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 28 / 43

Page 137: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Bonus: Knowledge Base Triples

Heinz Fischer visited US =⇒ (HEINZ FISCHER; visited; US)

Obama born in Hawaii =⇒ (OBAMA; born in; HAWAII)Cats are cute =⇒ (CATS; are; CUTE)Cats are sitting next to dogs =⇒ (CATS; are sitting next to; DOGS)

. . .

5 dependency tree patterns (+ 8 nominal patterns)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 138: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Bonus: Knowledge Base Triples

Heinz Fischer visited US =⇒ (HEINZ FISCHER; visited; US)Obama born in Hawaii =⇒ (OBAMA; born in; HAWAII)

Cats are cute =⇒ (CATS; are; CUTE)Cats are sitting next to dogs =⇒ (CATS; are sitting next to; DOGS)

. . .

5 dependency tree patterns (+ 8 nominal patterns)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 139: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Bonus: Knowledge Base Triples

Heinz Fischer visited US =⇒ (HEINZ FISCHER; visited; US)Obama born in Hawaii =⇒ (OBAMA; born in; HAWAII)Cats are cute =⇒ (CATS; are; CUTE)

Cats are sitting next to dogs =⇒ (CATS; are sitting next to; DOGS). . .

5 dependency tree patterns (+ 8 nominal patterns)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 140: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Bonus: Knowledge Base Triples

Heinz Fischer visited US =⇒ (HEINZ FISCHER; visited; US)Obama born in Hawaii =⇒ (OBAMA; born in; HAWAII)Cats are cute =⇒ (CATS; are; CUTE)Cats are sitting next to dogs =⇒ (CATS; are sitting next to; DOGS)

. . .

5 dependency tree patterns (+ 8 nominal patterns)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 141: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Bonus: Knowledge Base Triples

Heinz Fischer visited US =⇒ (HEINZ FISCHER; visited; US)Obama born in Hawaii =⇒ (OBAMA; born in; HAWAII)Cats are cute =⇒ (CATS; are; CUTE)Cats are sitting next to dogs =⇒ (CATS; are sitting next to; DOGS)

. . .

5 dependency tree patterns (+ 8 nominal patterns)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 142: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Extrinsic Evaluation: Knowledge Base Population

Unstructured Text

Structured Knowledge Base

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 143: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Extrinsic Evaluation: Knowledge Base Population

Relation Extraction Task:Fixed schema of 41 relations.Precision: answers marked correct by humans.Recall: answers returned by any team (including LDC annotators).

Comparison: Open Information Extraction to KBP Relations in 3Hours. [Soderland et al., 2013]

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 144: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Extrinsic Evaluation: Knowledge Base Population

Relation Extraction Task:Fixed schema of 41 relations.Precision: answers marked correct by humans.Recall: answers returned by any team (including LDC annotators).

Comparison: Open Information Extraction to KBP Relations in 3Hours. [Soderland et al., 2013]

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 29 / 43

Page 145: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Map Triples to Structured Knowledge Base

KBP Relation Text PMI2

Per:Date Of Birth be bear on 1.83bear on 1.28

Per:Date Of Death die on 0.70be assassinate on 0.65

Per:LOC Of Birth be bear in 1.21Per:LOC Of Death *elect president of 2.89Per:Religion speak about 0.67

popular for 0.60Per:Parents daughter of 0.54

son of 1.52Per:LOC Residence of 1.48

*independent from 1.18

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 30 / 43

Page 146: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Results

TAC-KBP 2013 Slot Filling Challenge:End-to-end task: includes IR + consistency.Precision: facts LDC evaluators judged as correct.Recall: facts other teams (including LDC annotators) also found.

System P R F1UW Submission 69.8 11.4 19.6Ollie 57.7 11.8 19.6

Our System 61.9 13.9 22.7Median Team 18.6Our System + + 58.6 18.6 28.3Top Team 45.7 35.8 40.2

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 31 / 43

Page 147: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Results

TAC-KBP 2013 Slot Filling Challenge:End-to-end task: includes IR + consistency.Precision: facts LDC evaluators judged as correct.Recall: facts other teams (including LDC annotators) also found.

System P R F1UW Submission 69.8 11.4 19.6Ollie 57.7 11.8 19.6Our System 61.9 13.9 22.7

Median Team 18.6Our System + + 58.6 18.6 28.3Top Team 45.7 35.8 40.2

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 31 / 43

Page 148: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Results

TAC-KBP 2013 Slot Filling Challenge:End-to-end task: includes IR + consistency.Precision: facts LDC evaluators judged as correct.Recall: facts other teams (including LDC annotators) also found.

System P R F1UW Submission 69.8 11.4 19.6Ollie 57.7 11.8 19.6Our System 61.9 13.9 22.7Median Team 18.6Our System + + 58.6 18.6 28.3Top Team 45.7 35.8 40.2

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 31 / 43

Page 149: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 31 / 43

Page 150: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Roadmap

Common Sense Reasoning:Cats have tails[Angeli and Manning, 2013, Angeli and Manning, 2014]

Complex premises:Born in Hawaii, Obama is a graduate ofColumbia[Angeli et al., 2015]

Lexical + Logical Reasoning:A graduated cylinder would be best tomeasure the volume of a liquid

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 31 / 43

Page 151: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

ROOT

Rain and snow aretypes of precipitation?

Rain and snoware forms ofprecipitation

Heavy rain and snoware types of precipitation

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 32 / 43

Page 152: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

ROOT

Rain and snow aretypes of precipitation?

Rain and snoware forms ofprecipitation

Heavy rain and snoware types of precipitation

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 33 / 43

Page 153: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example SearchRain and snow are

types of precipitation

Rain and snow areforms of precipitation?

Rain and snoware formsof weather

Heavy rain and snoware forms of precipitation

. . .

Forms of precipitationinclude rain and sleet

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 33 / 43

Page 154: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example SearchRain and snow are

types of precipitation

Rain and snow areforms of precipitation?

Rain and snoware formsof weather

Heavy rain and snoware forms of precipitation

. . .

Forms of precipitationinclude rain and sleet

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 33 / 43

Page 155: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example SearchRain and snow are

types of precipitation

Rain and snow areforms of precipitation?

Rain and snoware formsof weather

Heavy rain and snoware forms of precipitation

. . .

Forms of precipitationinclude rain and sleet

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 33 / 43

Page 156: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Forms of precipitation include rain and sleet

Rain and snow are forms of precipitation

Features1. Matching words2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 157: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 158: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 159: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words

2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 160: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words2. Mismatched words

3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 161: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 162: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Lexical Alignment Classifier

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Features1. Matching words2. Mismatched words3. Unmatched words in premise/consequent

Competitive with Stanford RTE system (63% on RTE3)

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 34 / 43

Page 163: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Old Problem: Logic + Lexical Classifiers

FOL and lexical classifiers don’t speak the same language

...but natural logic does!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 35 / 43

Page 164: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Old Problem: Logic + Lexical Classifiers

FOL and lexical classifiers don’t speak the same language...but natural logic does!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 35 / 43

Page 165: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Big Picture

Run our usual search1. If we find a premise, great!

2. If not, use lexical classifier as an evaluation function

Visit 1M nodes / second: We have to be fast!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 36 / 43

Page 166: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Big Picture

Run our usual search1. If we find a premise, great!2. If not, use lexical classifier as an evaluation function

Visit 1M nodes / second: We have to be fast!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 36 / 43

Page 167: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Big Picture

Run our usual search1. If we find a premise, great!2. If not, use lexical classifier as an evaluation function

Visit 1M nodes / second: We have to be fast!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 36 / 43

Page 168: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Dissecting Our Classifier

Anatomy of a ClassifierFeatures f (matching / mismatched / unmatched words)Weights wEntailment pair x

p(entail | x) = 11+exp(−wTf (x))

p(entail | x) monotone w.r.t.(wTf (x)

)Only need wTf (x) during search to compute maxp(entail | x)wTf (x) is our evaluation function

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 37 / 43

Page 169: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Dissecting Our Classifier

Anatomy of a ClassifierFeatures f (matching / mismatched / unmatched words)Weights wEntailment pair x

p(entail | x) = 11+exp(−wTf (x))

p(entail | x) monotone w.r.t.(wTf (x)

)

Only need wTf (x) during search to compute maxp(entail | x)wTf (x) is our evaluation function

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 37 / 43

Page 170: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Dissecting Our Classifier

Anatomy of a ClassifierFeatures f (matching / mismatched / unmatched words)Weights wEntailment pair x

p(entail | x) = 11+exp(−wTf (x))

p(entail | x) monotone w.r.t.(wTf (x)

)Only need wTf (x) during search to compute maxp(entail | x)wTf (x) is our evaluation function

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 37 / 43

Page 171: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Incorporating our Evaluation Function

Anatomy of a Search Step1. Mutate a word, or2. Delete a word, or3. Insert a word.

Each step updates a small number of features

wT f (x) = v xi

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 38 / 43

Page 172: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Incorporating our Evaluation Function

Anatomy of a Search Step1. Mutate a word, or2. Delete a word, or3. Insert a word.

Each step updates a small number of features

wT f (x) = v xi

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 38 / 43

Page 173: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Incorporating our Evaluation Function

Anatomy of a Search Step1. Mutate a word, or2. Delete a word, or3. Insert a word.

Each step updates a small number of features

v ′ = v - wi · fi + wi · fi

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 38 / 43

Page 174: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Why is this Important?

Faster Search⇒ Deeper ReasoningSpeed: Around 1M search states visited per secondMemory: 32 byte search states

Speed: Don’t re-featurize at every timestep.

Memory: Never store intermediate fact as String.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 39 / 43

Page 175: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are typesc of precipitationc

Score wTf (x): -0.5

Feature w f (x)Matching words 2.0 2Mismatched words -1.0 2Unmatched premise -0.5 1Unmatched consequent -0.75 0Bias -2.0 1

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 176: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

ROOT

Rain and snow aretypes of precipitation?

Rain and snoware forms ofprecipitation

Rain and snow aretypes of weather

. . .

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 177: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Score wTf (x): -0.5 + 2 − -1

Feature w f (x)Matching words 2.0 3Mismatched words -1.0 1Unmatched premise -0.5 1Unmatched consequent -0.75 0Bias -2.0 1

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 39 / 43

Page 178: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

An Example Search

Formsp of precipitationp includep rainp and sleetp

Rainc and snowc are formsc of precipitationc

Score wTf (x): 2.5

Feature w f (x)Matching words 2.0 3Mismatched words -1.0 1Unmatched premise -0.5 1Unmatched consequent -0.75 0Bias -2.0 1

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 39 / 43

Page 179: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

Common Sense Reasoningp

Flexibility

Cov

erag

e

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 180: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

+ Complex Premisesp

Flexibility

Cov

erag

e

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 181: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

+ Evaluation Functionp

Flexibility

Cov

erag

e

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 182: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

Common Sense FactsNatural logic inference as searchSoft relaxation of “strict” inference4x improvement in recall

Complex PremisesSplit the premise into atomic clausesShorten each clause w/ natural logic3 F1 improvement on knowledge base population

Evaluation FunctionUse lexical classifier as evaluation functionDetect likely entailment / contradictions

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 183: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

Common Sense FactsNatural logic inference as searchSoft relaxation of “strict” inference4x improvement in recall

Complex PremisesSplit the premise into atomic clausesShorten each clause w/ natural logic3 F1 improvement on knowledge base population

Evaluation FunctionUse lexical classifier as evaluation functionDetect likely entailment / contradictions

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 184: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

The Full System

Common Sense FactsNatural logic inference as searchSoft relaxation of “strict” inference4x improvement in recall

Complex PremisesSplit the premise into atomic clausesShorten each clause w/ natural logic3 F1 improvement on knowledge base population

Evaluation FunctionUse lexical classifier as evaluation functionDetect likely entailment / contradictions

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 40 / 43

Page 185: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

Which activity is an example of a good health habit?(A) Watching television(B) Smoking cigarettes(C) Eating candy(D) Exercising every day

In our corpus:Plasma TV’s can display up to 16 million colors . . . great forwatching TV . . . also make a good screen.Not smoking or drinking alcohol is good for health, regardless ofwhether clothing is worn or not.Eating candy for diner is an example of a poor health habit.Healthy is exercising

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 186: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

Which activity is an example of a good health habit?(A) Watching television(B) Smoking cigarettes(C) Eating candy(D) Exercising every day

In our corpus:Plasma TV’s can display up to 16 million colors . . . great forwatching TV . . . also make a good screen.Not smoking or drinking alcohol is good for health, regardless ofwhether clothing is worn or not.Eating candy for diner is an example of a poor health habit.Healthy is exercising

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 187: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

Which activity is an example of a good health habit?(A) Watching television(B) Smoking cigarettes(C) Eating candy(D) Exercising every day

In our corpus:Plasma TV’s can display up to 16 million colors . . . great forwatching TV . . . also make a good screen.Not smoking or drinking alcohol is good for health, regardless ofwhether clothing is worn or not.Eating candy for diner is an example of a poor health habit.Healthy is exercising

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 188: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45KNOWBOT (ORACLE) 57

IR Baseline 49This Thesis 52More Data + IR Baseline 62More Data + This Thesis 65This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 189: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45KNOWBOT (ORACLE) 57IR Baseline 49This Thesis 52

More Data + IR Baseline 62More Data + This Thesis 65This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 190: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45KNOWBOT (ORACLE) 57IR Baseline 49This Thesis 52More Data + IR Baseline 62More Data + This Thesis 65

This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 191: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45 −KNOWBOT (ORACLE) 57 −IR Baseline 49 42This Thesis 52 51More Data + IR Baseline 62 58More Data + This Thesis 65 61

This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 192: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45 −KNOWBOT (ORACLE) 57 −IR Baseline 49 42This Thesis 52 51More Data + IR Baseline 62 58More Data + This Thesis 65 61This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 193: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Solving 4th Grade Science

Multiple choice questions from real 4th grade science exams

System Train TestKNOWBOT 45 −KNOWBOT (ORACLE) 57 −IR Baseline 49 42This Thesis 52 51More Data + IR Baseline 62 58More Data + This Thesis 65 61This Thesis + + 74 67

We’re able to pass 4th grade science!

[Hixon et al., 2015]Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 41 / 43

Page 194: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

First Order Logic Lexical Methods

Already useful for textual entailment[MacCartney and Manning, 2008, MacCartney, 2009]This thesis: Useful for question answeringThis thesis: We can bridge the two methods

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 195: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

Natural Logic Lexical Methods

Already useful for textual entailment[MacCartney and Manning, 2008, MacCartney, 2009]This thesis: Useful for question answeringThis thesis: We can bridge the two methods

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 196: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

Natural Logic Lexical Methods

Already useful for textual entailment[MacCartney and Manning, 2008, MacCartney, 2009]

This thesis: Useful for question answeringThis thesis: We can bridge the two methods

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 197: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

Natural Logic Lexical Methods

Already useful for textual entailment[MacCartney and Manning, 2008, MacCartney, 2009]This thesis: Useful for question answeringThis thesis: We can bridge the two methods

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 198: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

1. Encode logic in traditionally lexical representations[Bowman, 2014, Bowman et al., 2015]

2. Make natural logic more expressivePropositional + Natural logics:

Apples are red ∨ Bananas are redBananas are not red

∴ Apples are redSyntactic and idiomatic entailment models: SNLI corpus?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 199: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

1. Encode logic in traditionally lexical representations[Bowman, 2014, Bowman et al., 2015]

2. Make natural logic more expressive

Propositional + Natural logics:Apples are red ∨ Bananas are redBananas are not red

∴ Apples are redSyntactic and idiomatic entailment models: SNLI corpus?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 200: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

1. Encode logic in traditionally lexical representations[Bowman, 2014, Bowman et al., 2015]

2. Make natural logic more expressivePropositional + Natural logics:

Apples are red ∨ Bananas are redBananas are not red

∴ Apples are red

Syntactic and idiomatic entailment models: SNLI corpus?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 201: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

1. Encode logic in traditionally lexical representations[Bowman, 2014, Bowman et al., 2015]

2. Make natural logic more expressivePropositional + Natural logics:

Apples are red ∨ Bananas are redBananas are not red

∴ Apples are red

Syntactic and idiomatic entailment models: SNLI corpus?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 202: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Remaining Work

1. Encode logic in traditionally lexical representations[Bowman, 2014, Bowman et al., 2015]

2. Make natural logic more expressivePropositional + Natural logics:

Apples are red ∨ Bananas are redBananas are not red

∴ Apples are redSyntactic and idiomatic entailment models: SNLI corpus?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 42 / 43

Page 203: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Thanks!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 43 / 43

Page 204: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Thanks!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 43 / 43

Page 205: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Thanks!

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 43 / 43

Page 206: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

Questions?

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 43 / 43

Page 207: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References I

Angeli, G., Chaganty, A., Chang, A., Reschke, K., Tibshirani, J., Wu,J. Y., Bastani, O., Siilats, K., and Manning, C. D. (2014a).Stanford’s 2013 KBP system.In TAC-KBP.

Angeli, G. and Manning, C. D. (2013).Philosophers are mortal: Inferring the truth of unseen facts.In CoNLL.

Angeli, G. and Manning, C. D. (2014).Naturalli: Natural logic inference for common sense reasoning.In EMNLP.

Angeli, G., Premkumar, M. J., and Manning, C. D. (2015).Leveraging linguistic structure for open domain information extraction.In ACL.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 44 / 43

Page 208: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References II

Angeli, G., Tibshirani, J., Wu, J. Y., and Manning, C. D. (2014b).Combining distant and partial supervision for relation extraction.In EMNLP.

Banko, M., Cafarella, M. J., Soderland, S., Broadhead, M., and Etzioni,O. (2007).Open information extraction for the web.In IJCAI.

Berant, J., Chou, A., Frostig, R., and Liang, P. (2013).Semantic parsing on freebase from question-answer pairs.In Proceedings of EMNLP.

Bowman, S. (2014).Can recursive neural tensor networks learn logical reasoning?ICLR (arXiv:1312.6192).

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 45 / 43

Page 209: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References III

Bowman, S. R., Potts, C., and Manning, C. D. (2015).Recursive neural networks can learn logical semantics.ACL-IJCNLP 2015.

Doddington, G. R., Mitchell, A., Przybocki, M. A., Ramshaw, L. A.,Strassel, S., and Weischedel, R. M. (2004).The automatic content extraction (ACE) program–tasks, data, andevaluation.In LREC.

Fader, A., Soderland, S., and Etzioni, O. (2011).Identifying relations for open information extraction.In EMNLP.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 46 / 43

Page 210: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References IV

Glickman, O., Dagan, I., and Koppel, M. (2006).A lexical alignment model for probabilistic textual entailment.Machine Learning Challenges. Evaluating Predictive Uncertainty, VisualObject Classification, and Recognising Tectual Entailment.

Hixon, B., Clark, P., and Hajishirzi, H. (2015).Learning knowledge graphs for question answering throughconversational dialog.NAACL.

Hoffmann, R., Zhang, C., Ling, X., Zettlemoyer, L., and Weld, D. S.(2011).Knowledge-based weak supervision for information extraction ofoverlapping relations.In ACL-HLT.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 47 / 43

Page 211: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References V

Icard, III, T. and Moss, L. (2014).Recent progress on monotonicity.Linguistic Issues in Language Technology.

MacCartney, B. (2009).Natural Language Inference.PhD thesis, Stanford.

MacCartney, B. and Manning, C. D. (2008).Modeling semantic containment and exclusion in natural languageinference.In Coling.

Mausam, Schmitz, M., Bart, R., Soderland, S., and Etzioni, O. (2012).Open language learning for information extraction.In EMNLP.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 48 / 43

Page 212: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References VI

Mintz, M., Bills, S., Snow, R., and Jurafsky, D. (2009).Distant supervision for relation extraction without labeled data.In ACL.

Sanchez Valencia, V. M. S. (1991).Studies on natural logic and categorial grammar.PhD thesis, University of Amsterdam.

Soderland, S., Gilmer, J., Bart, R., Etzioni, O., and Weld, D. S. (2013).Open information extraction to KBP relations in 3 hours.In Text Analysis Conference.

Surdeanu, M. and Ciaramita, M. (2007).Robust information extraction with perceptrons.In ACE07 Proceedings.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 49 / 43

Page 213: Learning Open Domain Knowledge from Textangeli/talks/thesis.pdf · 2016. 8. 24. · This Thesis: Formal reasoning with text over large corpora age Flexibility Gabor Angeli (Stanford)

References VII

Surdeanu, M., Tibshirani, J., Nallapati, R., and Manning, C. D. (2012).

Multi-instance multi-label learning for relation extraction.

In EMNLP.

Wu, F. and Weld, D. S. (2007).

Autonomously semantifying wikipedia.

In Proceedings of the sixteenth ACM conference on information andknowledge management. ACM.

Gabor Angeli (Stanford) Learning Knowledge From Text November 5, 2015 50 / 43