28
Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Embed Size (px)

Citation preview

Page 1: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Parsing withContext-Free Grammars for

ASR

Julia Hirschberg

CS 4706

Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Page 2: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

What is Syntax?

• Structure of language• How words are arranged together and related to one

another• Goal of syntactic analysis: relate surface form (what

someone says or writes) to underlying structure, to support semantic analysis (what the utterance or text means)

• Syntactic representation: typically a tree structure

Page 3: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Structure in Strings

• A set of words, or, a lexicon: the a small nice big very boy girl sees likes

• Some `good’ (grammatical) sentences:– the boy likes a girl – the small girl likes the big girl– a very small nice boy sees a very nice boy

• Some bad (ungrammatical) sentences:– *the boy the girl– *small boy likes nice girl

• Can we find a way of distinguishing between the two kinds of sequences?

• Can we identify similarities among grammatical subsequences?

Page 4: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

One Version of Constituent Structure

• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:

– (the) boy (likes a girl) – (the small) girl (likes the big girl)– (a very small nice) boy (sees a very nice boy)

• Ungrammatical sentences:– *(the) boy (the girl)– *(small) boy (likes the nice girl)

Page 5: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Another Constituency Hypothesis

• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:

– (the boy) likes (a girl) – (the small girl) likes (the big girl)– (a very small nice boy) sees (a very nice boy)

• Ungrammatical sentences:– *(the boy) (the girl)– *(small boy) likes (the nice girl)

• Better: fewer types of constituents (blue and red are of same type)

Page 6: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Even More Structures

• Lexicon: the a small nice big very boy girl sees likes• Grammatical sentences:

– ((the) boy) likes ((a) girl) – ((the) (small) girl) likes ((the) (big) girl)– ((a) ((very) small) (nice) boy) sees ((a) ((very) nice)

girl)• Ungrammatical sentences:

– *((the) boy) ((the) girl)– *((small) boy) likes ((the) (nice) girl)

Page 7: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

From Substrings to Trees

• (((the) boy) likes ((a) girl))

boythe

likesgirl

a

Page 8: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

How do we Label the Nodes?

• ( ((the) boy) likes ((a) girl) )• Choose constituents so each one has one non-bracketed

word: the head• Group words by distribution of constituents they head

(POS)– Noun (N), verb (V), adjective (Adj), adverb (Adv),

determiner (Det)• Category of constituent: XP, where X is POS

– NP, S, AdjP, AdvP, DetP

Page 9: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Types of Nodes

• (((the/Det) boy/N) likes/V ((a/Det) girl/N))

boy

the

likes

girl

a

DetP

NP NP

DetP

S

Phrase-structuretree

nonterminalsymbols= constituents

terminal symbols = words

Page 10: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Determining Part-of-Speech

A blue seat/a child seat: noun or adjective?– Syntax:

• a blue seat a child seat• a very blue seat *a very child seat • this seat is blue *this seat is child

– Morphology:• bluer *childer

– blue and child are not the same POS – blue is Adj, child is Noun

Page 11: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Determining Part-of-Speech

– Preposition or particle?• A he threw out the garbage• B he threw the garbage out the door• A he threw the garbage out • B *he threw the garbage the door out

– The two out are not same POS• A is particle, B is Preposition

Page 12: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Constituency

• Some Noun phrases (NPs)• A red dog on a blue tree• A blue dog on a red tree• Some big dogs and some little dogs• A dog• I• Big dogs, little dogs, red dogs, blue dogs,

yellow dogs, green dogs, black dogs, and white dogs

• How do we know these form a constituent?

Page 13: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

NP Constituency

• NPs can all appear before a verb:– Some big dogs and some little dogs are going

around in cars…– Big dogs, little dogs, red dogs, blue dogs,

yellow dogs, green dogs, black dogs, and white dogs are all at a dog party!

– I do not• But individual words can’t always appear before

verbs:– *little are going…– *blue are…– *and are

• Must be able to state generalizations like:– Noun phrases occur before verbs

Page 14: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

PP Constituency

• Preposing and postposing:– Under a tree is a yellow dog. – A yellow dog is under a tree.

• But not:– *Under, is a yellow dog a tree. – *Under a is a yellow dog tree.

• Prepositional phrases notable for ambiguity in attachment– I saw a man on a hill with a telescope.

Page 15: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Context-Free Grammars

• Defined in formal language theory– Terminals: e.g. cat – Non-terminal symbols: e.g. NP, VP– Start symbol: e.g. S– Rewriting rules: e.g. S NP VP

• Start with start symbol, rewrite using rules, done when only terminals left

Page 16: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

A Fragment of English

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Input: the cat is on the mat

Page 17: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

S

S

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 18: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

NP VP

NP

S

VP

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 19: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

DetP N VP

DetP

NP

S

VP

N

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 20: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

the cat VP

catthe

DetP

NP

S

VP

N

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 21: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

the cat V PP

catthe

DetP

NP

PP

S

VP

N V

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 22: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

the cat is Prep NP

catthe is

DetP

NP

PP

Prep

S

VP

N

NP

V

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

Page 23: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

the cat is on Det N

catthe is

DetP

NP

DetP

PP

Prep

S

VP

N

NP

V

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

on N

Page 24: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Derivations in a CFG

the cat is on the mat

catthe is

DetP

NP

DetP

PP

Prep

S

VP

N

NP

V

S NP VPVP V PPNP DetP NN cat | matV isPP Prep NPPrep onDetP the

on N

the mat

Page 25: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

A More Complicated Fragment of English

• S NP VP• S VP• VP V PP• VP V NP• VP V• NP DetP NP• NP N NP• NP N• PP Prep NP

• N cat | mat | food | bowl | Mary• V is | likes | sits• Prep on | in | under• DetP the | a

Mary likes the cat bowl.

The cat ate the tasty food.

Hello. Nice talking to you.

Page 26: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Pocket Sphinx Grammar Format

• Variables go in angle brackets, e.g. <city>• Terminals must appear in your pronunciation

dictionary (case sensitive)• X Y is concatenation -- e.g., I WANT• (X | Y) means X or Y -- e.g., (WANT|NEED)• Square brackets mean optional -- e.g., [ON]

FRIDAY• * means that the expansion may be spoken zero or

more times -- e.g. <digit>*• + means one or more times -- e.g. <digit>+

Page 27: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Example

• <city> = BOSTON | NEWYORK | WASHINGTON | BALTIMORE;

• <time> = MORNING | EVENING; • <day> = FRIDAY | MONDAY; • public <query> = (((WHAT TRAINS LEAVE) |

(WHAT TIME CAN I TRAVEL) | (IS THERE A TRAIN)) (FROM|TO) <city> [(FROM|TO) <city>] ON <day> [<time>]);

Hello. No. I want to go on Tuesday. When does the train leave?

Page 28: Parsing with Context-Free Grammars for ASR Julia Hirschberg CS 4706 Slides with contributions from Owen Rambow, Kathy McKeown, Dan Jurafsky and James Martin

Next Class

• Language modeling for large vocabulary applications: Ngrams