CPSC 422, Lecture 27 Slide 1
Intelligent Systems (AI-2)
Computer Science cpsc422, Lecture 27
Mar 18, 2015
NLP: Knowledge-Formalisms Map
(including probabilistic formalisms)
Logical formalisms (First-Order Logics, Prob. Logics)
Rule systems (and prob. versions)
(e.g., (Prob.) Context-Free Grammars)
State Machines (and prob. versions)
(Finite State Automata, Finite State Transducers, Markov Models)
Morphology
Syntax
Semantics
Pragmatics, Discourse and Dialogue
AI planners (Task decomposition, MDP Markov Decision Processes)
Machine Learning
Lecture Overview
• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning
Key Constituents: Examples
(Specifier) X (Complement)
• Noun phrases: (Det) N (PP), e.g., the cat on the table
• Verb phrases: (Qual) V (NP), e.g., never eat a cat
• Prepositional phrases: (Deg) P (NP), e.g., almost in the net
• Adjective phrases: (Deg) A (PP), e.g., very happy about it
• Sentences: (NP) (I) (VP), e.g., a mouse -- ate it
Example of a relatively complex parse tree
Source: Journal of the American Medical Informatics Association, 2005, "Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon"
Structural Ambiguity (Ex. 1)
“I shot an elephant in my pajamas”
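Reading 1 (the PP attaches to the NP: the elephant is in the pajamas): VP -> V NP ; NP -> NP PP
Reading 2 (the PP attaches to the VP: the shooting happens in pajamas): VP -> V NP PP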
Structural Ambiguity (Ex.2)
“I saw Mary passing by cs2”
(ROOT
  (S
    (NP (PRP I))
    (VP (VBD saw)
      (S
        (NP (NNP Mary))
        (VP (VBG passing)
          (PP (IN by)
            (NP (NNP cs2))))))))

(ROOT
  (S
    (NP (PRP I))
    (VP (VBD saw)
      (NP (NNP Mary))
      (S
        (VP (VBG passing)
          (PP (IN by)
            (NP (NNP cs2))))))))
Lecture Overview
• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning (acquiring the probabilities)
• Intro to Parsing PCFG
Probabilistic CFGs (PCFGs)
• Each grammar rule is augmented with a conditional probability
• GOAL: assign a probability to parse trees and to sentences
• If these are all the rules for VP and .55 is the probability of VP -> Verb:
  VP -> Verb .55
  VP -> Verb NP .40
  VP -> Verb NP NP ??
• What should ?? be?
A. 1
B. 0
C. .05
D. None of the above
Probabilistic CFGs (PCFGs)
• Each grammar rule is augmented with a conditional probability
Formal Def: 5-tuple (N, Σ, P, S, D), where D assigns a probability to each rule in P
• The expansions for a given non-terminal sum to 1
  VP -> Verb .55
  VP -> Verb NP .40
  VP -> Verb NP NP .05
• GOAL: assign a probability to parse trees and to sentences
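The sum-to-1 constraint can be checked mechanically. The following is a minimal sketch, assuming rules are stored as a dictionary from (left-hand side, right-hand side) pairs to probabilities; that layout and the function names are illustrative choices, not part of the lecture.

```python
from collections import defaultdict
from math import isclose

# VP rule probabilities copied from the slide; the dictionary layout is an assumption.
RULES = {
    ("VP", ("Verb",)): 0.55,
    ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
}

def expansion_totals(rules):
    """Sum the probabilities of all expansions of each non-terminal."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in rules.items():
        totals[lhs] += prob
    return dict(totals)

def is_proper(rules):
    """True if the expansions of every non-terminal sum to 1 (up to rounding)."""
    return all(isclose(total, 1.0) for total in expansion_totals(rules).values())
```

Calling `is_proper(RULES)` succeeds for the three VP rules above and fails if one of them is dropped.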
PCFGs are used to….
• Estimate Prob. of parse tree
  A. Sum of the probs of all the rules applied
  B. Product of the probs of all the rules applied
• Estimate Prob. of a sentence
  A. Sum of the probs of all the parse trees
  B. Product of the probs of all the parse trees
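PCFGs are used to….
• Estimate Prob. of parse tree
  P(Tree) = product of the probs of all the rules applied in the tree
• Estimate Prob. of a sentence
  P(Sentence) = sum of the probs of all the parse trees of the sentence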
Example
P(Tree_a) = .15 × .4 × ... = 1.5 × 10^-6
P(Tree_b) = .15 × .4 × ... = 1.7 × 10^-6
P("Can you ....") = 1.5 × 10^-6 + 1.7 × 10^-6 = 3.2 × 10^-6
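The two computations (product over rules for a tree, sum over trees for a sentence) can be sketched in a few lines. Everything below is an assumption made for illustration: the toy grammar, its probabilities, the nested-tuple tree encoding, and the choice to stop trees at part-of-speech tags so that lexical rules are left out. It is not the grammar behind the slide's numbers.

```python
from math import prod

# Hypothetical toy PCFG, (lhs, rhs) -> probability. Values are illustrative only.
PCFG = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V", "NP", "PP")): 0.3,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Pro",)): 0.3,
    ("NP", ("Det", "N")): 0.5,
    ("PP", ("P", "NP")): 1.0,
}

def rules_used(tree):
    """Yield (lhs, rhs) for each internal node; leaves are POS-tag strings."""
    lhs, children = tree[0], tree[1:]
    yield lhs, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules_used(child)

def tree_probability(tree, pcfg=PCFG):
    """P(Tree): product of the probabilities of all rules applied."""
    return prod(pcfg[rule] for rule in rules_used(tree))

def sentence_probability(trees, pcfg=PCFG):
    """P(Sentence): sum of the probabilities of all its parse trees."""
    return sum(tree_probability(t, pcfg) for t in trees)

# Two analyses of a "Pro V Det N P Det N" sentence: PP under VP vs. PP under NP.
TREE_A = ("S", ("NP", "Pro"),
          ("VP", "V", ("NP", "Det", "N"), ("PP", "P", ("NP", "Det", "N"))))
TREE_B = ("S", ("NP", "Pro"),
          ("VP", "V", ("NP", ("NP", "Det", "N"), ("PP", "P", ("NP", "Det", "N")))))
```

Under these made-up probabilities the VP-attachment tree is the more likely of the two, and `sentence_probability([TREE_A, TREE_B])` adds their probabilities.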
Lecture Overview
• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning (acquiring the probabilities)
Treebanks
• DEF. corpora in which each sentence has been paired with a parse tree
• These are generally created
  – Parse collection with parser
  – human annotators revise each parse
• Requires detailed annotation guidelines
  – POS tagset
  – Grammar
  – instructions for how to deal with particular grammatical constructions.
Penn Treebank
• Penn TreeBank is a widely used treebank.
• Best known is the Wall Street Journal section of the Penn TreeBank.
  – 1 M words from the 1987-1989 Wall Street Journal.
Treebank Grammars
• Such grammars tend to be very flat because they tend to avoid recursion.
  – To ease the annotators' burden
• For example, the Penn Treebank has 4500 different rules for VPs! Among them...
Heads in Trees
• Finding heads in treebank trees is a task that arises frequently in many applications.
  – Particularly important in statistical parsing
• We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node.
Head Finding
• The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar.
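As a rough illustration of per-non-terminal traversal rules, the sketch below picks a head child by scanning the children in a given direction for the first label in a priority list. The rule table is a deliberately tiny, made-up assumption (real head tables are far larger), and the nested-tuple tree encoding is also an assumption.

```python
# Illustrative head rules (an assumption, not a published table):
# parent label -> (search direction, priority list of child labels).
HEAD_RULES = {
    "S": ("left", ["VP", "S"]),
    "VP": ("left", ["VBD", "VBZ", "VBP", "VB", "VBG", "VP"]),
    "NP": ("right", ["NN", "NNS", "NNP", "PRP", "NP"]),
    "PP": ("left", ["IN", "TO"]),
}

def is_preterminal(node):
    """A preterminal is (POS tag, word)."""
    return len(node) == 2 and isinstance(node[1], str)

def head_child(node):
    """Pick the head child of a non-terminal node using HEAD_RULES."""
    children = list(node[1:])
    direction, priorities = HEAD_RULES.get(node[0], ("left", []))
    ordered = children if direction == "left" else children[::-1]
    for wanted in priorities:
        for child in ordered:
            if child[0] == wanted:
                return child
    return ordered[0]  # fallback: first child in the search direction

def head_word(node):
    """Follow head children down to a word."""
    while not is_preterminal(node):
        node = head_child(node)
    return node[1]

TREE = ("S",
        ("NP", ("PRP", "I")),
        ("VP", ("VBD", "saw"), ("NP", ("NNP", "Mary"))))

print(head_word(TREE))  # saw
```

Annotating every node with `head_word(node)` gives the head-annotated tree described on the previous slide.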
Acquiring Grammars and Probabilities
Manually parsed text corpora (e.g., PennTreebank)
• Grammar: read it off the parse trees
  Ex: if an NP contains an ART, ADJ, and NOUN then we create the rule NP -> ART ADJ NOUN.
• Probabilities: P(A -> β | A) = count(A -> β) / count(A)
  Ex: if the NP -> ART ADJ NOUN rule is used 50 times and all NP rules are used 5000 times, then the rule's probability is …
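Both steps (reading rules off the trees, then turning counts into relative frequencies) fit in a short sketch. The two-sentence treebank and the nested-tuple tree encoding below are made up for illustration; a real treebank would be read from annotated files.

```python
from collections import Counter

def rules_in(tree):
    """Yield (lhs, rhs) for every internal node; leaves are word strings."""
    lhs, children = tree[0], tree[1:]
    yield lhs, tuple(c if isinstance(c, str) else c[0] for c in children)
    for child in children:
        if not isinstance(child, str):
            yield from rules_in(child)

def estimate_pcfg(treebank):
    """Relative-frequency estimate: count(A -> beta) / count(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in rules_in(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}

# Hypothetical two-tree treebank.
TREEBANK = [
    ("S", ("NP", ("ART", "the"), ("ADJ", "big"), ("NOUN", "dog")),
          ("VP", ("VERB", "barked"))),
    ("S", ("NP", ("ART", "a"), ("NOUN", "cat")),
          ("VP", ("VERB", "slept"))),
]

probs = estimate_pcfg(TREEBANK)
print(probs[("NP", ("ART", "ADJ", "NOUN"))])  # 0.5
```

In this toy treebank NP -> ART ADJ NOUN accounts for one of the two NP expansions. With the counts from the slide, the same ratio is 50 / 5000 = 0.01.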
Learning Goals for today’s class
You can:
• Provide a formal definition of a PCFG
• Apply a PCFG to compute the probability of a parse tree of a sentence as well as the probability of a sentence
• Describe the content of a treebank
• Describe the process to identify a head of a syntactic constituent
• Compute the probability distribution of a PCFG from a treebank
Next class on Fri
• Parsing Probabilistic CFG: CKY parsing
• PCFG in practice: Modeling Structural and Lexical Dependencies
Assignment-3 due on Mon
Assignment-4 out Mon/Tue
Treebank Uses
• Searching a Treebank. TGrep2: NP < PP (NP immediately dominates PP) or NP << PP (NP dominates PP)
• Treebanks (and headfinding) are particularly critical to the development of statistical parsers
  – Chapter 14
• Also valuable to Corpus Linguistics
  – Investigating the empirical details of various constructions in a given language
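To make the difference between the two search relations concrete, here is a small sketch that mimics them on a single tree. It is not TGrep2 and does not use its query syntax; the function names and the nested-tuple tree encoding are assumptions made for illustration.

```python
def subtrees(tree):
    """Yield every internal node of a nested-tuple tree, top-down."""
    if isinstance(tree, str):
        return
    yield tree
    for child in tree[1:]:
        yield from subtrees(child)

def immediately_dominates(node, label):
    """True if some child of node carries the label (the '<' relation)."""
    return any(not isinstance(c, str) and c[0] == label for c in node[1:])

def dominates(node, label):
    """True if some descendant of node carries the label (the '<<' relation)."""
    return any(d[0] == label for c in node[1:] for d in subtrees(c))

def search(tree, parent, child, immediate=True):
    """Return all nodes labelled parent that (immediately) dominate child."""
    relation = immediately_dominates if immediate else dominates
    return [n for n in subtrees(tree) if n[0] == parent and relation(n, child)]

TREE = ("NP", ("Det", "the"),
        ("Nom", ("N", "cat"),
         ("PP", ("P", "on"), ("NP", ("Det", "the"), ("N", "table")))))

print(len(search(TREE, "NP", "PP", immediate=True)))   # 0
print(len(search(TREE, "NP", "PP", immediate=False)))  # 1
```

In this tree the PP sits under Nom, so no NP immediately dominates it, while the outer NP still dominates it.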
Today Jan 31
• Partial Parsing: Chunking
• Dependency Grammars / Parsing
• Treebank
• Start PCFG
Syntactic Ambiguity…..
“the man saw the girl with the telescope”
• The girl has the telescope
• The man has the telescope
Syntax of Natural Languages
Def. The study of how sentences are formed by grouping and ordering words
Example: Ming and Sue prefer morning flights
* Ming Sue flights morning and prefer
Groups behave as a single unit wrt Substitution, Movement, Coordination
Syntax: Useful tasks
• Why should you care?
  – Grammar checkers
  – Basis for semantic interpretation
    • Question answering
    • Information extraction
    • Summarization
  – Machine translation
  – ……
Key Constituents – with heads (English)
(Specifier) X (Complement)
• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)

Some simple specifiers
Category      Typical function       Examples
Determiner    specifier of N         the, a, this, no..
Qualifier     specifier of V         never, often..
Degree word   specifier of A or P    very, almost..

Complements?
Context Free Grammar (Example)
• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left

Terminals: a, flight, left
Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb
Start-symbol: S

• Backbone of many models of syntax
• Parsing is tractable
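The example grammar is small enough to enumerate its whole language. Below is a minimal sketch, assuming the grammar is stored as a dictionary from non-terminals to lists of right-hand sides; it only terminates on grammars without recursion, which is true of this one.

```python
from itertools import product

# The example grammar from the slide; any symbol without an entry is a terminal.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def generate(symbol, grammar):
    """Return every word sequence derivable from symbol (non-recursive grammars only)."""
    if symbol not in grammar:  # terminal
        return [[symbol]]
    results = []
    for rhs in grammar[symbol]:
        parts = [generate(s, grammar) for s in rhs]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results

def language(grammar, start="S"):
    """All sentences of the grammar as strings."""
    return [" ".join(words) for words in generate(start, grammar)]

print(language(GRAMMAR))  # ['a flight left']
```

Adding a second Verb alternative such as `["arrived"]` would double the language, which shows how the rules alone determine the set of generated strings.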
CFGs
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
  – Generate strings in the language
  – Reject strings not in the language
  – Impose structures (trees) on strings in the language
CFG: Formal Definitions
• 4-tuple (non-term., term., productions, start)
• (N, Σ, P, S)
• P is a set of rules A → α; A ∈ N, α ∈ (Σ ∪ N)*
• A derivation is the process of rewriting α1 into αm (both strings in (Σ ∪ N)*) by applying a sequence of rules: α1 ⇒* αm
• L_G = {w | w ∈ Σ* and S ⇒* w}
Common Sentence-Types
• Declaratives: A plane left
  S -> NP VP
• Imperatives: Leave!
  S -> VP
• Yes-No Questions: Did the plane leave?
  S -> Aux NP VP
• WH Questions: Which flights serve breakfast?
  S -> WH NP VP
  When did the plane leave?
  S -> WH Aux NP VP
NP: more details
NP -> Specifiers N Complements
• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom
  e.g., all the other cheap cars
• Nom -> Nom PP (PP) (PP)
  e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP
  e.g., flight arriving on Monday
• Nom -> Nom RelClause
  RelClause -> (who | that) VP
  e.g., flight that arrives in the evening
Conjunctive Constructions
• S -> S and S
  – John went to NY and Mary followed him
• NP -> NP and NP
  – John went to NY and Boston
• VP -> VP and VP
  – John went to NY and visited MOMA
• …
• In fact the right rule for English is X -> X and X
CFG for NLP: summary
• CFGs cover most syntactic structure in English.
• But there are problems (over-generation)
  – That can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• Many practical computational grammars simply rely on CFG
Parsing with CFGs
Assign valid trees: a valid tree covers all and only the elements of the input and has an S at the top
[Figure: the Parser takes a CFG and a sequence of words ("I prefer a morning flight") and returns valid parse trees]
Parsing as Search
CFG:
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does
The CFG defines the search space of possible parse trees
Parsing: find all trees that cover all and only the words in the input
Constraints on Search
[Figure: the Parser takes a CFG (the search space) and a sequence of words ("I prefer a morning flight") and returns valid parse trees]
Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed
Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.
Input: flight
Next step: Top Down Space
• When POS categories are reached, reject trees whose leaves fail to match all words in the input
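A top-down search of this kind can be sketched as a small backtracking parser. The grammar below is the one from the "Parsing as Search" slide; the dictionary encoding, the function names, and the nested-tuple output format are assumptions. The sketch relies on the grammar having no left-recursive rules, since a naive top-down expansion would otherwise loop.

```python
# Grammar from the "Parsing as Search" slide; symbols without an entry are words.
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"], ["arrive"]],
    "Aux": [["do"], ["does"]],
}

def expand(symbol, words, i):
    """Yield (tree, next_index) for each way symbol can derive words[i:next_index]."""
    if symbol not in GRAMMAR:  # a word: it must match the input at position i
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in expand_sequence(rhs, words, i):
            yield (symbol, *children), j

def expand_sequence(symbols, words, i):
    """Yield (list of trees, next_index) for a sequence of symbols starting at i."""
    if not symbols:
        yield [], i
        return
    for tree, j in expand(symbols[0], words, i):
        for rest, k in expand_sequence(symbols[1:], words, j):
            yield [tree, *rest], k

def parse(sentence):
    """Return all trees rooted in S that cover all and only the input words."""
    words = sentence.split()
    return [tree for tree, j in expand("S", words, 0) if j == len(words)]

print(parse("a flight left"))
# [('S', ('NP', ('Det', 'a'), ('Noun', 'flight')), ('VP', ('Verb', 'left')))]
```

The S -> Aux NP VP expansion is tried for every input and rejected as soon as the first word fails to match an Aux, which is the wasted work that motivates constraining the search with the input.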
Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.
[Figure: first bottom-up steps over the input words]
Two more steps: Bottom-Up Space
[Figure: bottom-up search space after two more steps]
Top-Down vs. Bottom-Up
• Top-down
  – Only searches for trees that can be answers
  – But suggests trees that are not consistent with the words
• Bottom-up
  – Only forms trees consistent with the words
  – Suggests trees that make no sense globally
Effective Parsing
• Top-down and Bottom-up can be effectively combined but still cannot deal with ambiguity and repeated parsing
• SOLUTION: Dynamic Programming approaches (you’ll see one applied to Prob. CFG). Fills tables with solutions to subproblems
Parsing: sub-trees consistent with the input, once discovered, are stored and can be reused
1. Stores ambiguous parses compactly
2. Does not do (avoidable) repeated work
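One way to see the table-filling idea is a CKY-style chart that records, for every span and non-terminal, how many subtrees exist, so each sub-result is computed once and reused. This is a sketch under assumptions: the grammar is a hand-made one in Chomsky Normal Form (binary rules plus a lexicon), written only to reproduce the PP-attachment ambiguity of the pajamas sentence, and it counts parses rather than building them.

```python
from collections import defaultdict

# Hypothetical CNF grammar: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
# Hypothetical lexicon: word -> possible categories.
LEXICON = {
    "I": ["NP"], "shot": ["V"], "an": ["Det"], "my": ["Det"],
    "elephant": ["N"], "pajamas": ["N"], "in": ["P"],
}

def count_parses(sentence, start="S"):
    """Count parse trees with a chart indexed by word spans."""
    words = sentence.split()
    n = len(words)
    # table[(i, j)][A] = number of subtrees rooted in A that span words[i:j]
    table = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            table[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):            # shorter spans are filled first
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):        # split point between the two children
                for parent, left, right in BINARY:
                    ways = table[(i, k)][left] * table[(k, j)][right]
                    if ways:
                        table[(i, j)][parent] += ways
    return table[(0, n)][start]

print(count_parses("I shot an elephant in my pajamas"))  # 2
```

The PP "in my pajamas" is analysed once and its chart entry is shared by both attachments, which is the reuse the slide refers to.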
Learning Goals for today’s class
You can:
• Explain what the syntax of a Natural Language is
• Formally define a Context Free Grammar
• Justify why a CFG is a reasonable model for the English Syntax
• Apply a CFG as a Generative Formalism to
  – Impose structures (trees) on strings in the language (i.e., trace Top-down and Bottom-up parsing on a sentence given a grammar)
  – Reject strings not in the language (also part of parsing)
  – Generate strings in the language given a CFG