CPSC 422, Lecture 27 Slide 1 Intelligent Systems (AI-2) Computer Science cpsc422, Lecture 27 Mar, 18, 2015






NLP: Knowledge-Formalisms Map

(including probabilistic formalisms)

Logical formalisms (First-Order Logics, Prob. Logics)

Rule systems (and prob. versions)

(e.g., (Prob.) Context-Free Grammars)

State Machines (and prob. versions)

(Finite State Automata,Finite State Transducers, Markov Models)

Morphology

Syntax

PragmaticsDiscourse

and Dialogue

Semantics

AI planners (Task decomposition, MDP Markov Decision Processes)

Machine Learning


Lecture Overview

• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning


Key Constituents: Examples

• Noun phrases: (Det) N (PP), e.g., "the cat on the table"
• Verb phrases: (Qual) V (NP), e.g., "never eat a cat"
• Prepositional phrases: (Deg) P (NP), e.g., "almost in the net"
• Adjective phrases: (Deg) A (PP), e.g., "very happy about it"
• Sentences: (NP) (I) (VP), e.g., "a mouse -- ate it"

(Specifier) X (Complement)


CFG more complex Example

(figures: Grammar with example phrases; Lexicon)


Derivations as Trees

(figure: parse tree with nested Nominal nodes over the leaf "flight")


Journal of the American Medical Informatics Association, 2005, Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with the UMLS Specialist Lexicon

Example of relatively complex parse tree


Lecture Overview

• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning


Structural Ambiguity (Ex. 1)

“I shot an elephant in my pajamas”

VP -> V NP ; NP -> NP PP
VP -> V NP PP


Structural Ambiguity (Ex.2)

“I saw Mary passing by cs2”

(ROOT (S (NP (PRP I)) (VP (VBD saw) (S (NP (NNP Mary)) (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))))

(ROOT (S (NP (PRP I)) (VP (VBD saw) (NP (NNP Mary)) (S (VP (VBG passing) (PP (IN by) (NP (NNP cs2))))))))
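The two bracketed parses above can be loaded into nested structures so the attachment difference becomes explicit. A minimal sketch of a reader for this notation (it ignores the extra conventions of real treebank files, such as empty elements and escaped brackets):

```python
def read_tree(s):
    """Parse a bracketed tree like '(A (B leaf))' into (label, children) pairs."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def parse():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())        # nested constituent
            else:
                children.append(tokens[pos])    # leaf word
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return parse()

tree = read_tree("(ROOT (S (NP (PRP I)) (VP (VBD saw) (NP (NNP Mary)))))")
```

Comparing the structures returned for the two readings shows exactly where "passing by cs2" attaches.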


Structural Ambiguity (Ex. 3)

• Coordination “new student and profs”


Structural Ambiguity (Ex. 4)

• NP-bracketing “French language teacher”


Lecture Overview

• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning (acquiring the probabilities)
• Intro to Parsing PCFG


Probabilistic CFGs (PCFGs)

• Each grammar rule is augmented with a conditional probability

• GOAL: assign a probability to parse trees and to sentences

• If these are all the rules for VP, what should ?? be?
VP -> Verb .55
VP -> Verb NP .40
VP -> Verb NP NP ??

A. 1    B. 0    C. .05    D. None of the above


Probabilistic CFGs (PCFGs)

• Each grammar rule is augmented with a conditional probability

Formal Def: 5-tuple (N, Σ, P, S, D)

• The expansions for a given non-terminal sum to 1:
VP -> Verb .55
VP -> Verb NP .40
VP -> Verb NP NP .05
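The sum-to-1 constraint is easy to check mechanically. A minimal sketch (the rule encoding is just one possible choice; the probabilities are the ones from the slide):

```python
from collections import defaultdict

# Each rule: (lhs, rhs, probability).
pcfg = [
    ("VP", ("Verb",), 0.55),
    ("VP", ("Verb", "NP"), 0.40),
    ("VP", ("Verb", "NP", "NP"), 0.05),
]

totals = defaultdict(float)
for lhs, rhs, prob in pcfg:
    totals[lhs] += prob

# Every non-terminal's expansions must sum to 1 (up to rounding).
for lhs, total in totals.items():
    assert abs(total - 1.0) < 1e-9, f"{lhs} rules sum to {total}, not 1"
```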

• GOAL: assign a probability to parse trees and to sentences


Sample PCFG


PCFGs are used to….
• Estimate Prob. of a parse tree
A. Sum of the probs of all the rules applied
B. Product of the probs of all the rules applied

• Estimate Prob. of a sentence
A. Sum of the probs of all the parse trees
B. Product of the probs of all the parse trees


PCFGs are used to….

• Estimate Prob. of a parse tree: P(Tree) = product of the probabilities of all the rules applied in its derivation

• Estimate Prob. of a sentence: P(Sentence) = sum of the probabilities of all its parse trees


Example

P(Tree_a) = .15 × .40 × … = 1.5 × 10^-6
P(Tree_b) = .15 × .40 × … = 1.7 × 10^-6
P("Can you ….") = 1.5 × 10^-6 + 1.7 × 10^-6 = 3.2 × 10^-6
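The tree-probability computation can be sketched in a few lines: look up each rule's probability and multiply down the tree. The grammar and tree below are illustrative (a different toy sentence than the slide's, with made-up probabilities):

```python
from math import prod

# A PCFG as {(lhs, rhs): prob}; probabilities are illustrative.
grammar = {
    ("S", ("NP", "VP")): 0.8,
    ("NP", ("Det", "Noun")): 0.3,
    ("VP", ("Verb",)): 0.55,
    ("Det", ("a",)): 0.5,
    ("Noun", ("flight",)): 0.1,
    ("Verb", ("left",)): 0.2,
}

def tree_prob(tree):
    """tree = (label, children); leaves are plain strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = grammar[(label, rhs)]
    # Multiply this rule's probability by those of all subtrees.
    return p * prod(tree_prob(c) for c in children if not isinstance(c, str))

t = ("S", [("NP", [("Det", ["a"]), ("Noun", ["flight"])]),
           ("VP", [("Verb", ["left"])])])
p = tree_prob(t)  # .8 * .3 * .55 * .5 * .1 * .2
```

The sentence probability is then the sum of `tree_prob` over all parse trees of the sentence.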


Lecture Overview

• Recap English Syntax and Parsing
• Key Problem with parsing: Ambiguity
• Probabilistic Context Free Grammars (PCFG)
• Treebanks and Grammar Learning (acquiring the probabilities)


Treebanks
• DEF. Corpora in which each sentence has been paired with a parse tree.
• These are generally created by:
– parsing the collection with a parser
– having human annotators revise each parse
• Requires detailed annotation guidelines:
– POS tagset
– grammar
– instructions for how to deal with particular grammatical constructions.


Penn Treebank
• The Penn Treebank is a widely used treebank.
• Most well known is the Wall Street Journal section of the Penn Treebank: 1 M words from the 1987-1989 Wall Street Journal.


Treebank Grammars
• Such grammars tend to be very flat, because they tend to avoid recursion
– to ease the annotators' burden
• For example, the Penn Treebank has 4500 different rules for VPs! Among them...


Heads in Trees

• Finding heads in treebank trees is a task that arises frequently in many applications.
– It is particularly important in statistical parsing.
• We can visualize this task by annotating each node of a parse tree with its head.


Lexically Decorated Tree


Head Finding

• The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar.
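Such traversal rules can be sketched as a small table plus a scan. The table below is a tiny illustrative fragment (in the spirit of Collins-style head rules), not the real rule set:

```python
# For each parent label: a scan direction and a priority list of
# child labels. Illustrative fragment only.
HEAD_RULES = {
    "NP": ("right", ["NN", "NNS", "NNP", "NP"]),
    "VP": ("left", ["VBD", "VBZ", "VB", "VP"]),
    "S": ("left", ["VP", "S"]),
}

def find_head(label, child_labels):
    """Return the index of the head child of a node."""
    direction, priorities = HEAD_RULES[label]
    indices = list(range(len(child_labels)))
    if direction == "right":
        indices.reverse()
    for wanted in priorities:
        for i in indices:
            if child_labels[i] == wanted:
                return i
    return indices[0]  # fallback: first child in scan direction

# In "S -> NP VP", the head child is the VP.
assert find_head("S", ["NP", "VP"]) == 1
```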


Noun Phrases


Acquiring Grammars and Probabilities

Manually parsed text corpora (e.g., Penn Treebank)

• Grammar: read it off the parse trees
Ex: if an NP contains an ART, ADJ, and NOUN, then we create the rule NP -> ART ADJ NOUN.

• Probabilities: P(α -> β | α) = Count(α -> β) / Count(α)
Ex: if the NP -> ART ADJ NOUN rule is used 50 times and all NP rules are used 5000 times, then the rule's probability is 50/5000 = .01.
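Both steps can be sketched directly: walk each tree collecting rule counts, then normalize per left-hand side. The tree encoding and the two-tree mini treebank are illustrative:

```python
from collections import Counter

def count_rules(tree, counts):
    """tree = (label, children); leaves are plain strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for child in children:
        if not isinstance(child, str):
            count_rules(child, counts)

def estimate_pcfg(treebank):
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    # P(lhs -> rhs | lhs) = Count(lhs -> rhs) / Count(lhs)
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

bank = [("NP", [("ART", ["the"]), ("NOUN", ["cat"])]),
        ("NP", [("ART", ["the"]), ("ADJ", ["big"]), ("NOUN", ["cat"])])]
probs = estimate_pcfg(bank)
```

Here each of the two NP expansions was seen once out of two NP uses, so each gets probability .5.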



Learning Goals for today’s class

You can:
• Provide a formal definition of a PCFG
• Apply a PCFG to compute the probability of a parse tree of a sentence as well as the probability of a sentence
• Describe the content of a treebank
• Describe the process to identify the head of a syntactic constituent
• Compute the probability distribution of a PCFG from a treebank


Next class on Fri

• Parsing Probabilistic CFG: CKY parsing
• PCFG in practice: Modeling Structural and Lexical Dependencies

Assignment-3 due on Mon
Assignment-4 out Mon/Tue


Treebank Uses

• Searching a treebank, e.g., with TGrep2: NP < PP or NP << PP
• Treebanks (and head finding) are particularly critical to the development of statistical parsers
– Chapter 14
• Also valuable to Corpus Linguistics
– Investigating the empirical details of various constructions in a given language


Today Jan 31

• Partial Parsing: Chunking
• Dependency Grammars / Parsing
• Treebank
• Start PCFG


Syntactic Ambiguity…..
“the man saw the girl with the telescope”
• The girl has the telescope
• The man has the telescope


Today March 17

• Finish Context Free Grammars / English Syntax

• Parsing


Syntax of Natural Languages
Def. The study of how sentences are formed by grouping and ordering words.

Example: Ming and Sue prefer morning flights
* Ming Sue flights morning and prefer

Groups behave as single units wrt substitution, movement, and coordination.


Syntax: Useful tasks

• Why should you care?
– Grammar checkers
– Basis for semantic interpretation
  • Question answering
  • Information extraction
  • Summarization
– Machine translation
– ……


Key Constituents – with heads (English)
• Noun phrases: (Det) N (PP)
• Verb phrases: (Qual) V (NP)
• Prepositional phrases: (Deg) P (NP)
• Adjective phrases: (Deg) A (PP)
• Sentences: (NP) (I) (VP)

Some simple specifiers:
Category | Typical function | Examples
Determiner | specifier of N | the, a, this, no..
Qualifier | specifier of V | never, often..
Degree word | specifier of A or P | very, almost..

Complements?

(Specifier) X (Complement)


Key Constituents: Examples
• Noun phrases: (Det) N (PP), e.g., "the cat on the table"
• Verb phrases: (Qual) V (NP), e.g., "never eat a cat"
• Prepositional phrases: (Deg) P (NP), e.g., "almost in the net"
• Adjective phrases: (Deg) A (PP), e.g., "very happy about it"
• Sentences: (NP) (I) (VP), e.g., "a mouse -- ate it"


Context Free Grammar (Example)

• S -> NP VP
• NP -> Det NOMINAL
• NOMINAL -> Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left

Start symbol: S. Non-terminals: S, NP, NOMINAL, VP, Det, Noun, Verb. Terminals: a, flight, left.

• Backbone of many models of syntax
• Parsing is tractable
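The toy grammar above can be written down and exercised directly. A minimal sketch: repeatedly rewrite the leftmost non-terminal (each non-terminal here has a single expansion, so the derivation is deterministic):

```python
# The toy CFG from the slide, one expansion list per non-terminal.
grammar = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "NOMINAL"]],
    "NOMINAL": [["Noun"]],
    "VP": [["Verb"]],
    "Det": [["a"]],
    "Noun": [["flight"]],
    "Verb": [["left"]],
}

def derive(symbols):
    """Expand the leftmost non-terminal until only terminals remain."""
    for i, sym in enumerate(symbols):
        if sym in grammar:  # non-terminal: rewrite it
            expansion = grammar[sym][0]
            return derive(symbols[:i] + expansion + symbols[i + 1:])
    return symbols  # all terminals

sentence = derive(["S"])  # S => NP VP => ... => a flight left
```

With several expansions per non-terminal, the same loop plus a choice point enumerates strings of the language.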


CFG more complex Example

(figures: Grammar with example phrases; Lexicon)


CFGs
• Define a Formal Language (un/grammatical sentences)
• Generative Formalism
– Generate strings in the language
– Reject strings not in the language
– Impose structures (trees) on strings in the language


CFG: Formal Definitions

• 4-tuple (non-terminals, terminals, productions, start symbol)

• (N, Σ, P, S)

• P is a set of rules A -> β, with A ∈ N and β ∈ (N ∪ Σ)*

• A derivation is the process of rewriting α1 into αm (both strings in (N ∪ Σ)*) by applying a sequence of rules: α1 ⇒* αm

• L(G) = {w | w ∈ Σ* and S ⇒* w}


Derivations as Trees

(figure: parse tree with nested Nominal nodes over the leaf "flight")

Context Free?


Common Sentence-Types
• Declaratives: A plane left.  S -> NP VP
• Imperatives: Leave!  S -> VP
• Yes-No Questions: Did the plane leave?  S -> Aux NP VP
• WH Questions: Which flights serve breakfast?  S -> WH NP VP
  When did the plane leave?  S -> WH Aux NP VP


NP: more details
NP -> Specifiers N Complements
• NP -> (Predet)(Det)(Card)(Ord)(Quant) (AP) Nom
  e.g., all the other cheap cars
• Nom -> Nom PP (PP) (PP)
  e.g., reservation on BA456 from NY to YVR
• Nom -> Nom GerundVP
  e.g., flight arriving on Monday
• Nom -> Nom RelClause
  RelClause -> (who | that) VP
  e.g., flight that arrives in the evening


Conjunctive Constructions
• S -> S and S
– John went to NY and Mary followed him
• NP -> NP and NP
– John went to NY and Boston
• VP -> VP and VP
– John went to NY and visited MOMA
• …
• In fact the right rule for English is X -> X and X


CFG for NLP: summary

• CFGs cover most syntactic structure in English.

• But there are problems (over-generation)
– These can be dealt with adequately, although not elegantly, by staying within the CFG framework.
• Many practical computational grammars simply rely on CFGs.


Today March 17

• Finish Context Free Grammars / English Syntax

• Parsing


Parsing with CFGs

Assign valid trees: trees that cover all and only the elements of the input and have an S at the top.

Sequence of words ("I prefer a morning flight") + CFG -> Parser -> valid parse trees


Parsing as Search
• S -> NP VP
• S -> Aux NP VP
• NP -> Det Noun
• VP -> Verb
• Det -> a
• Noun -> flight
• Verb -> left, arrive
• Aux -> do, does

The CFG defines the search space of possible parse trees.
Parsing: find all trees that cover all and only the words in the input.


Constraints on Search

Sequence of words + CFG (search space) -> Parser -> valid parse trees

Search Strategies:
• Top-down or goal-directed
• Bottom-up or data-directed


Top-Down Parsing
• Since we’re trying to find trees rooted with an S (Sentences), start with the rules that give us an S.
• Then work your way down from there to the words.

(figure: top-down search; input word “flight”)


Next step: Top Down Space

• When POS categories are reached, reject trees whose leaves fail to match all words in the input



Bottom-Up Parsing
• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.

(figure: partial bottom-up trees over the word “flight”)


Two more steps: Bottom-Up Space

(figure: bottom-up search space over the word “flight”)


Top-Down vs. Bottom-Up
• Top-down
– Only searches for trees that can be answers
– But suggests trees that are not consistent with the words
• Bottom-up
– Only forms trees consistent with the words
– But suggests trees that make no sense globally


Effective Parsing
• Top-down and bottom-up can be effectively combined, but still cannot deal with ambiguity and repeated parsing.
• SOLUTION: Dynamic Programming approaches (you’ll see one applied to Prob. CFG). Fill tables with solutions to subproblems.
Parsing: sub-trees consistent with the input, once discovered, are stored and can be reused.
1. Stores ambiguous parses compactly
2. Does not do (avoidable) repeated work
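The dynamic-programming idea can be sketched as a CKY recognizer (the algorithm previewed for the next lecture). CKY requires a grammar in Chomsky Normal Form; the tiny grammar below is illustrative. Each table cell stores the constituents found for a span, so sub-results are computed once and reused:

```python
from collections import defaultdict

# CNF grammar: binary rules (B, C) -> {A} and lexical rules word -> {A}.
# Tiny illustrative grammar, not the slide's.
binary = {("NP", "VP"): {"S"}, ("Det", "Noun"): {"NP"}}
unary = {"a": {"Det"}, "flight": {"Noun"}, "left": {"Verb", "VP"}}

def cky_recognize(words):
    n = len(words)
    table = defaultdict(set)  # (i, j) -> labels spanning words[i:j]
    for i, w in enumerate(words):
        table[(i, i + 1)] = set(unary.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):  # try every split point, reusing cells
                for left in table[(i, k)]:
                    for right in table[(k, j)]:
                        table[(i, j)] |= binary.get((left, right), set())
    return "S" in table[(0, n)]
```

Storing rule back-pointers and probabilities in the cells turns this recognizer into a probabilistic parser.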


Journal of the American Medical Informatics Association, 2005, Improved Identification of Noun Phrases in Clinical Radiology Reports ….

Example of relatively complex parse tree


Learning Goals for today’s class
You can:
• Explain what is the syntax of a Natural Language
• Formally define a Context Free Grammar
• Justify why a CFG is a reasonable model for English Syntax
• Apply a CFG as a Generative Formalism to
– Impose structures (trees) on strings in the language (i.e., trace top-down and bottom-up parsing on a sentence given a grammar)
– Reject strings not in the language (also part of parsing)
– Generate strings in the language given a CFG



Next class Wed

• Probabilistic CFG…