Introduction into Inference-Based Natural Language Understanding

Ekaterina Ovchinnikova, ISI, University of Southern California
February 28th, UNED
Outline

1. Introduction: natural language understanding (NLU)
2. Knowledge for Reasoning
3. Automatic reasoning for NLU
4. Experiments
5. Conclusion
Introduction: Natural Language Understanding
Natural Language Understanding (NLU)

In order to understand natural language, we need to know a lot about the world
and be able to draw inferences.

Text: "Romeo and Juliet" is one of Shakespeare's early tragedies. The play
has been highly praised by critics for its language and dramatic effect.

Knowledge:
• tragedies are plays
• plays are written in some language and have dramatic effect
• Shakespeare is a playwright; playwrights write plays
• "early" indicates time; time modifies events
• ...
Computational NLU: applications

Text: "Romeo and Juliet" is one of Shakespeare's early tragedies. The play
has been highly praised by critics for its language and dramatic effect.

Queries:
Shakespeare is the author of "Romeo and Juliet".
Shakespeare went through a tragedy.
...

Applications: question answering, information extraction, automatic text
summarization, semantic search, ...
Computational NLU: approaches

Shallow NLP methods are based on:
lexical overlap, pattern matching, distributional similarity, ...

(continuum of methods)

Deep NLP methods are based on:
semantic analysis, lexical and world knowledge, logical inference, ...
Inference-based NLU

TEXT: "Romeo and Juliet" is one of Shakespeare's tragedies.

LOGICAL REPRESENTATION:
"Romeo and Juliet"(x) ∧ tragedy(x) ∧ Shakespeare(y) ∧ rel(y,x)

KNOWLEDGE BASE:
Shakespeare(y) → playwright(y)
tragedy(x) → play(x)
playwright(y) ∧ play(x) → write(y,x)

INTERPRETATION:
"Romeo and Juliet"(x) ∧ tragedy(x) ∧ play(x) ∧ Shakespeare(y) ∧ write(y,x)
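The interpretation step on this slide can be sketched as naive forward chaining over propositionalized facts. The set-of-strings encoding and the `forward_chain` helper are illustrative inventions, not the representation used by any actual reasoner:

```python
# Minimal sketch: ground facts from the logical representation plus Horn-style
# rules from the knowledge base are closed under forward chaining, yielding
# the enriched interpretation shown on the slide.

def forward_chain(facts, rules):
    """Repeatedly apply rules (premises -> conclusion) until a fixpoint."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Logical representation of the text (propositionalized for brevity):
facts = {"RomeoAndJuliet(x)", "tragedy(x)", "Shakespeare(y)"}

# Knowledge base:
rules = [
    ({"Shakespeare(y)"}, "playwright(y)"),
    ({"tragedy(x)"}, "play(x)"),
    ({"playwright(y)", "play(x)"}, "write(y,x)"),
]

interpretation = forward_chain(facts, rules)
```

Note that `write(y,x)` is only derivable after `play(x)` has been derived, which is why the loop runs to a fixpoint rather than making a single pass.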
Inference-based NLU pipeline

Text → Semantic parser → Logical representation → Inference machine → Final application (queries)

The semantic parser uses knowledge about language (lexicon, grammar);
the inference machine uses knowledge about the world (knowledge base).
Knowledge for Reasoning

Sources of machine-readable world knowledge
1. Lexical-semantic dictionaries
2. Distributional resources
3. Ontologies
Lexical-semantic dictionaries
Manually developed resources
Encode relations defined on word senses
Examples: WordNet, FrameNet
[Figure: WordNet relations (pandanus/screw pine is a type of tree; trunk/bole; forest – wood) and FrameNet frames give/get (give causes get) with roles donor, recipient, theme, source]
Type of inference:
∀x (dog(x) → animal(x))          Pluto is a dog ⇒ Pluto is an animal
∀x,y,z (give(x,y,z) → get(y,z))  John gave Mary a book ⇒ Mary got a book
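The "type of" inferences licensed by a lexical-semantic dictionary amount to walking hypernym links. The tiny dictionary below is invented for illustration; a real system would read these relations from WordNet:

```python
# Toy hypernym table (word -> its direct hypernym), illustrative only.
HYPERNYM = {"dog": "animal", "animal": "organism", "pine": "tree", "tree": "plant"}

def is_a(word, category):
    """True if `category` is reachable from `word` via hypernym links."""
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word == category:
            return True
    return False
```

Transitivity comes for free: `is_a("dog", "organism")` holds even though the dictionary only records the two individual links.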
Distributional resources

Automatically learned from corpora
Encode distributional properties of words
Examples: VerbOcean, DIRT, Proposition Store, WikiRules

X finds a solution to Y ≈ Y is solved by X ≈ X resolves Y
John finds a solution to the problem ≈ the problem is solved by John ≈ John resolves the problem
Type of inference:
∀x,y (solve(x,y) → ∃z (find(x,z) ∧ solution(z) ∧ to(z,y)))
John solves a puzzle ⇒ John finds a solution for a puzzle
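Applying a DIRT-style paraphrase rule can be sketched as rewriting one predicate-argument tuple into another pattern. The triple encoding and the single hard-coded rule are simplifications for illustration:

```python
# Sketch of the "X solves Y => X finds a solution to Y" paraphrase rule
# over (subject, predicate, object) triples. Real distributional resources
# store thousands of such rules with confidence scores.

def apply_paraphrase(triple):
    """Rewrite a 'solve' triple into its 'find a solution to' paraphrase."""
    x, pred, y = triple
    if pred == "solve":
        return [(x, "find", "solution"), ("solution", "to", y)]
    return [triple]  # no rule applies: leave the triple unchanged

result = apply_paraphrase(("John", "solve", "puzzle"))
```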
Ontologies

Manually developed resources
Encode relationships between concepts
Examples: SUMO, OpenCyc, DOLCE

∀x (French_Polynesia_Island(x) → Pacific_Island(x))
∀x (Pacific_Island(x) → Island(x) ∧ ∃y (located_in(x,y) ∧ Pacific_Ocean(y)))

Type of inference:
Tahiti is a Pacific island ⇒ Tahiti is located in the Pacific Ocean
Three types of sources of world knowledge for NLU

                        1. Lexical-semantic    2. Distributional      3. Ontologies
                           dictionaries           resources
relations between       word senses            words                  concepts
                        tree1 – plant1         tree – wood            ∀x (tree(x) → ∃y
                        tree2 – structure1     tree – leaf            (part(y,x) ∧ branch(y)))
knowledge/updates       common-sense/static    common-sense/dynamic   "scientific"/static
designed for reasoning  no                     no                     yes
(consistent)
language-dependent      yes                    yes                    no
domain-dependent        no                     no/yes                 yes/no
constructed             manually               automatically          manually
structure               simple                 no/simple              complex
lexicalized             yes                    yes                    no
probabilistic           poor                   yes                    no

These three types of knowledge sources:
• contain disjoint knowledge with different properties
• are all useful for NLU
• can be used in combination
Logical Inference for NLU

Logical inference: deduction and abduction

Deduction: valid logical inference
∀x (p(x) → q(x))    Dogs are animals.
p(A)                Pluto is a dog.
________________    ____________________
q(A)                Pluto is an animal.

Abduction: inference to the best explanation
∀x (p(x) → q(x))    If it rains, then the grass is wet.
q(A)                The grass is wet.
________________    ____________________
p(A)                It rains.
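The two inference directions can be contrasted on a single propositional rule base. This is a deliberately minimal sketch: deduction runs the rules forward from a fact, abduction runs them backward from an observation to candidate explanations:

```python
# Rules as (antecedent, consequent) pairs, propositional for brevity.
RULES = [("rain", "wet_grass"), ("dog", "animal")]

def deduce(fact):
    """Forward (valid): everything entailed by `fact` via the rules."""
    return {q for p, q in RULES if p == fact}

def abduce(observation):
    """Backward (defeasible): candidate explanations of `observation`.

    These are hypotheses, not guaranteed truths - the grass may be wet
    for some other reason than rain."""
    return {p for p, q in RULES if q == observation}
```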
Deduction for NLU

Blackburn and Bos (2005)
– valid inference given a FOL representation of the text

1. Proving that a text follows from another text and the KB
Text1: Pluto is a dog.       ∃Pluto (dog(Pluto))
Text2: Pluto is an animal.   ∃Pluto (animal(Pluto))
KB: All dogs are animals.    ∀x (dog(x) → animal(x))
Text1 entails Text2

2. Constructing models
Text: John saw a house. The door was open.
∃John,h,d (see(John,h) ∧ house(h) ∧ door(d) ∧ open(d))
KB: All houses have doors as their parts.
∀x (house(x) → ∃y (door(y) ∧ part-of(y,x)))
Model: see(John,h) ∧ house(h) ∧ door(d1) ∧ open(d1) ∧ door(d2) ∧ part-of(d2,h)
Min. model: see(John,h) ∧ house(h) ∧ door(d) ∧ open(d) ∧ part-of(d,h)
Weighted abduction for NLU

Hobbs et al. (1993)
– best interpretation of the text based on meaning overlap

Text: John composed a sonata.

KB: (1) ∀x (sonata(x)^1.2 → work-of-art(x))
    (2) ∀x,y (put-together(x,y)^0.6 ∧ collection(y)^0.6 → compose(x,y))
    (3) ∀x,y (create(x,y)^0.6 ∧ work-of-art(y)^0.6 → compose(x,y))

compose(John,s)$10 ∧ sonata(s)$10 : $30

Backchaining on compose:
via (2): put-together(John,x)$6 ∧ collection(x)$6
via (3): create(John,y)$6 ∧ work-of-art(y)$6
                             sonata(y)$12   ← meaning overlap
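The cost bookkeeping in the example can be sketched as follows: backchaining on a literal with cost c through an axiom whose antecedents carry weights w_i produces assumptions costing c·w_i each, and a literal that unifies with something already explainable costs nothing extra (the "meaning overlap"). The numbers follow the sonata example; the `backchain_cost` helper is an invented illustration, not Mini-TACITUS code:

```python
def backchain_cost(literal_cost, weights):
    """Costs of the assumed antecedents when backchaining on one literal."""
    return [literal_cost * w for w in weights]

# compose(John,s)$10 backchained via axiom (3): create^0.6, work-of-art^0.6
create_cost, woa_cost = backchain_cost(10.0, [0.6, 0.6])

# work-of-art(y) overlaps with the consequence of sonata via axiom (1),
# so instead of paying for work-of-art we assume sonata (weight 1.2):
(sonata_cost,) = backchain_cost(10.0, [1.2])

# Cost of the interpretation using axioms (3) and (1): $6 + $12.
total = create_cost + sonata_cost
```

The cheaper proof wins, which is how "meaning overlap" between the composing event and the sonata is rewarded.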
Experiments
1. Recognizing Textual Entailment

Recognizing textual entailment (RTE)

Task: given a Text-Hypothesis (T-H) pair, predict entailment

Text: John gave a book to Mary.
Hypothesis: Mary got a book.
Entailment: YES

Text: John gave a book to Mary.
Hypothesis: Mary read a book.
Entailment: NO

Experimental data:
Second RTE Challenge (RTE-2) datasets;
development and test sets contain 800 T-H pairs each.
Evaluation measure: accuracy – the percentage of pairs correctly judged.
Deduction for RTE

1. Convert T and H into FOL logical forms
2. Query a theorem prover:
   IF (KB ∧ T) → H is proven THEN return "entailment"
   IF ¬(KB ∧ T ∧ H) is proven THEN return "inconsistent"
3. Query a model builder:
   IF the model of T ∧ H grows with respect to the model of T towards the KB
   THEN return "no entailment"
   ELSE return "entailment possible"

Related work: Akhmatova and Molla (2005), Fowler et al. (2005),
Bos and Markert (2006), Tatu and Moldovan (2007)
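The three-step decision procedure above can be sketched with the theorem prover and model builder stubbed out as callables. The function names, formula encoding, and return conventions are assumptions for illustration, not Nutcracker's actual interface:

```python
def rte_by_deduction(prove, build_model, kb, t, h):
    """prove(formula) -> bool; build_model(formula) -> set of atoms or None."""
    # Step 2: theorem prover.
    if prove(("implies", ("and", kb, t), h)):
        return "entailment"
    if prove(("not", ("and", kb, t, h))):
        return "inconsistent"
    # Step 3: model builder - if adding H barely grows the model,
    # entailment is at least plausible.
    m_t = build_model(("and", kb, t))
    m_th = build_model(("and", kb, t, h))
    if m_t is not None and m_th is not None and len(m_th) > len(m_t):
        return "no entailment"
    return "entailment possible"

# Trivial stubs for demonstration: the prover "succeeds" only on (KB & T) -> H.
demo_prove = lambda f: f == ("implies", ("and", "KB", "T"), "H")
demo_model = lambda f: set()

verdict = rte_by_deduction(demo_prove, demo_model, "KB", "T", "H")
```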
RTE-2: results for deductive reasoner Nutcracker

RTE system Nutcracker (Bos and Markert, 2005)

Knowledge Base:
WordNet inheritance: ∀x (dog(x) → animal(x))
        synonymy: ∀x (heat(x) ↔ warmth(x))          (ca. 619,000 axioms)
FrameNet relations: ∀x,y,z (give(x,y,z) → get(y,z))  (ca. 1,500 axioms)

KB                            Proof found     No proof        Total
                              (no. of pairs)  (no. of pairs)  (no. of pairs)
No KB                         19 (2.4%)       781             800
WordNet and FrameNet axioms   22 (2.8%)       778             800
Deductive reasoning for RTE: Discussion (1)

1. Finding proofs failed because the KB was incomplete
T: This dog barks.
H: Some animal is noisy.   ?

Solutions:
• LF decomposition (proposition-to-proposition entailment)
• introduction of heuristics (model size, sentence length, semantic similarity)

2. No handling of ambiguity
∀x (tree(x) → plant(x))
∀x (tree(x) → structure(x))

Solution: statistical disambiguation before reasoning (not based on knowledge,
depends on annotated corpora)
Deductive reasoning for RTE: Discussion (2)

3. Unbounded inference
dog(x) → animal(x) → creature(x) → physical_object(x) → entity(x)
Everything will be a part of the model!

4. Complexity of reasoning (FOL is undecidable)
ca. 30 min per solved problem on average
(20 words per sentence on average)
Abduction for RTE

1. Construct the best interpretations of T and H towards the KB
2. Add the interpretation of T to the KB
3. Construct the best interpretation of H towards KB + T
4. Compare the cost of the interpretation of H towards KB and towards KB + T
5. If the cost difference exceeds a trained threshold, return "entailment";
   otherwise, return "no entailment".

Related work: Raina et al. (2005), Ovchinnikova et al. (2011)
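Steps 4-5 of the procedure above reduce to a single comparison: if T in the background makes H sufficiently cheaper to interpret, predict entailment. The costs and threshold below are illustrative numbers, not trained values:

```python
def rte_by_abduction(cost_h_given_kb, cost_h_given_kb_plus_t, threshold):
    """Entailment iff T makes H cheaper to interpret by more than `threshold`.

    cost_h_given_kb: cost of the best interpretation of H towards KB alone.
    cost_h_given_kb_plus_t: cost of the best interpretation of H towards KB + T.
    """
    if cost_h_given_kb - cost_h_given_kb_plus_t > threshold:
        return "entailment"
    return "no entailment"
```

With T available, parts of H can be explained by unification with T's interpretation instead of being assumed at full cost, which is exactly what shrinks `cost_h_given_kb_plus_t` for true entailment pairs.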
RTE-2: results for abductive reasoner Mini-TACITUS

Abductive reasoner Mini-TACITUS (Mulkar et al., 2007)

Knowledge Base:
WordNet inheritance, instantiation, entailment, ... (ca. 507,500 axioms)
FrameNet synonymy and relations (ca. 55,300 axioms)

Knowledge base       Accuracy   Average number of axioms per sentence
                                T       H
No KB                57.3%      0       0
WordNet              59.6%      294     111
FrameNet             61.1%      1233    510
WordNet+FrameNet     62.6%      1527    621

Best run outperforms 21 of 24 RTE-2 participants
(2 systems – 73% and 75%, 2 systems – 62% and 63%, 20 systems – from 55% to 61%)
Abductive reasoning for RTE: Discussion (1)

1. No treatment of logical connectives, quantifiers, and modality in natural
language
"If A then B" entails "A and B"
"A is not B" entails "A is B"

Solution: explicitly axiomatizing logical connectives
... ∧ if(e1,e2)
... ∧ or(e1,e2)

2. Over-unification
John eats an apple and Bill eats an apple.

Solution:
John(x1) ∧ eat(e1,x1,y1) ∧ apple(y1) ∧ Bill(x2) ∧ eat(e2,x2,y2) ∧ apple(y2) ∧ y1 ≠ y2
But how to formulate the constraints?
Abductive reasoning for RTE: Discussion (2)

3. Complexity of reasoning (Horn clauses – exponential)
In 30 min per sentence, optimal solutions were found for 6% of the cases.

Mini-TACITUS was not designed for large-scale processing;
substantial optimization is possible.
Deduction vs. abduction for NLU

                       Deduction                       Weighted abduction
ambiguity              unable to choose between        choice between readings
                       alternative readings if both    based on meaning overlap
                       are consistent
unbounded inference    unlimited inference chains      an inference is appropriate
                                                       if it is part of the
                                                       lowest-cost proof
incomplete knowledge   if a piece of relevant          assumptions are allowed
                       knowledge is missing,
                       fails to find a proof
expressivity/          FOL: expressive enough to       Horn clauses: restricted
complexity             represent most of NL;           quantification and logical
                       reasoning is computationally    connectives; reasoning is
                       complex                         computationally cheaper
Experiments
2. Semantic Role Labeling

Semantic role labeling (SRL)

Task: given a predicate, disambiguate it and label its arguments with
semantic roles

Text:                        Senses of "take":
John took Mary home.         Bringing [agent, theme, goal]
John took drugs.             Ingest_substance [ingestor, substance]
John took a bus.             Ride_vehicle [theme, vehicle]
The walk took 30 minutes.    Taking_time [activity, time_length]

Experimental data:
RTE-2 Challenge test data annotated with FrameNet frames and roles
(Burchardt and Pennacchiotti, 2008), used as a gold standard
Abduction for SRL

In the abductive framework, SRL is a by-product of constructing the best
interpretation.

Text: John took the bus. He got off at 10th street.

LF: John(x1) ∧ take(e1,x1,x2) ∧ bus(x2) ∧ get_off(e2,x1) ∧ at(e2,x3) ∧ 10th_street(x3)

Axioms from FrameNet:
1. Ride_vehicle(e1,x1,x2) → take(e1,x1,x2)
2. Taking_time(e1,x1,x2) → take(e1,x1,x2)
3. Disembarking(e1,x1,x2) → get_off(e1,x1) ∧ at(e1,x2)
4. Disembarking(e1,x1,x2) → Ride_vehicle(e2,x1,x3)

Interpretation: John(x1) ∧ take(e1,x1,x2) ∧ bus(x2) ∧ Ride_vehicle(e1,x1,x2) ∧
get_off(e2,x1) ∧ at(e2,x3) ∧ 10th_street(x3) ∧ Disembarking(e2,x1,x3)
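The way role labels fall out of abduction can be sketched as a lookup: backchaining from a lexical predicate to a frame axiom simultaneously picks a frame and binds the predicate's arguments to that frame's roles. The axiom table and role names below are simplified inventions based on the FrameNet axioms above, not an actual reasoner's knowledge base:

```python
# Lexical predicate -> (frame, roles of its arguments), one sense per
# predicate for brevity; the full problem also involves choosing among
# competing frame axioms by interpretation cost.
FRAME_AXIOMS = {
    "take": ("Ride_vehicle", ["theme", "vehicle"]),
    "get_off": ("Disembarking", ["traveller"]),
}

def label_roles(pred, args):
    """Return (frame, {role: argument}) for `pred`, or None if no axiom fits."""
    if pred not in FRAME_AXIOMS:
        return None
    frame, roles = FRAME_AXIOMS[pred]
    return frame, dict(zip(roles, args))

frame, roles = label_roles("take", ["John", "bus"])
```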
SRL for RTE-2 test set: results

Frame-Annotated Corpus for Textual Entailment (FATE) used as a gold standard:
• annotates the RTE-2 test set
• annotates only frames relevant for computing entailment

Results are compared against the state-of-the-art system for assigning
FrameNet frames and roles, Shalmaneser (alone and boosted with the WordNet
Detour to FrameNet).
• only recall is considered for frame match

System               Frame match   Role match
                     Recall        Precision   Recall
Shalmaneser          0.55          0.54        0.37
Shalmaneser+Detour   0.85          0.52        0.36
Mini-TACITUS         0.65          0.55        0.30
Experiments
3. Paraphrasing Noun-Noun Dependencies

Paraphrasing noun-noun dependencies in RTE

Task: given a noun-noun construction (noun compound or possessive) in an
entailment pair, find the best paraphrases

T: Muslims make up some 3.2 million of Germany's 82 million people...
H: 82 million people live in Germany.

Text: Germany's people...
Paraphrases: Germany has people
             people from Germany
             people live in Germany
             ...

Experimental data:
1600 pairs of the RTE-2 set have been manually investigated;
only those NN-dependencies which are crucial for inferring entailment
have been considered (93 T-H pairs)
Abduction for paraphrasing NN-dependencies

Text: Shakespeare's poem was written...

LF: Shakespeare(x) ∧ poem(y) ∧ of(y,x) ∧ write(e,z,y)

Axioms from Proposition Store:
Shakespeare(x) ∧ write(e,x,y) ∧ poem(y) → Shakespeare(x) ∧ poem(y) ∧ of(y,x)
Shakespeare(x) ∧ have(e,x,y) ∧ poem(y) → Shakespeare(x) ∧ poem(y) ∧ of(y,x)

Interpretation: Shakespeare(x) ∧ poem(y) ∧ of(y,x) ∧ write(e,x,y)
Paraphrasing axioms from Proposition Store

Peñas and Hovy (2010)

Dependency parses of newspaper texts are used to generate propositions.

Propositions from the sentence
"Steve Walsh threw a pass to Brent Jones in the first quarter.":
[Steve_Walsh:noun, throw:verb, pass:noun]
[Steve_Walsh:noun, throw:verb, pass:noun, to:prep, Brent_Jones:noun]
[Steve_Walsh:noun, throw:verb, pass:noun, in:prep, quarter:noun]

Propositions containing the nouns Germany and people:
[people:noun, in:prep, Germany:noun] : 6433
[Germany:noun, have:verb, people:noun] : 2035
[people:noun, live:verb, in:prep, Germany:noun] : 288
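One simple way to turn these counts into a paraphrase choice is to prefer the most frequent proposition. This is a simplification for illustration (a later slide notes that 18 of the 42 applied paraphrases were in fact not the most frequent ones, since the abductive interpretation also weighs context); the counts are the ones shown above:

```python
# Proposition Store counts for propositions linking "Germany" and "people".
PROPOSITIONS = {
    "people in Germany": 6433,
    "Germany have people": 2035,
    "people live in Germany": 288,
}

def best_paraphrase(propositions):
    """Pick the most frequent proposition as the preferred paraphrase."""
    return max(propositions, key=propositions.get)
```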
NN-dependencies in RTE-2: results for abduction

Peñas and Ovchinnikova (2012)

KB: WordNet, FrameNet, and Proposition Store

                                Number of pairs
Correct paraphrasing            42
Wrong paraphrasing              1
No NN construction found        27
No relevant paraphrase found    23

Outcomes:
• Integrating heterogeneous knowledge gives advantages
  (legalization of marijuana ≈ drugs legalization)
• Not all applied paraphrases (18 out of 42) were the most frequent
Conclusions

• World knowledge is not a bottleneck anymore
• Inference-based approaches can compete with purely statistical ones
• Weighted abduction seems to be more promising for NLU than classical deduction
• Complexity of reasoning is still an issue
Outlook

Main obstacles to large-scale inference-based NLU:
1. Lack of structured world knowledge applicable for reasoning
2. Computational complexity of reasoning

The situation is changing:
• a lot of machine-readable knowledge is available
• computational capacities increase
• new reasoners are being developed

It's time to look again at inference-based NLU!
Thank you!