Basic Parsing with Context-Free Grammars
Some slides adapted from Julia Hirschberg and Dan Jurafsky

kathy/NLP/ClassSlides/Class7... · 2009-09-29


Announcements
To view past videos: http://globe.cvn.columbia.edu:8080/oncampus.php?c=133ae14752e27fde909fdbd64c06b337
Usually available only for 1 week. Right now, available for all previous lectures.

Syntactic Parsing
Declarative formalisms like CFGs and FSAs define the legal strings of a language, but they only tell you 'this is a legal string of the language X'.
Parsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses.

CFG Example
Many possible CFGs for English; here is an example (fragment):
S → NP VP
VP → V NP
NP → Det N | Adj NP
N → boy | girl
V → sees | likes
Adj → big | small
Det → a | the

big the small girl sees a boy
John likes a girl
I like a girl
I sleep
The old dog the footsteps of the young
the small boy likes a girl
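One way to see which strings this fragment actually licenses is to enumerate its language up to a length bound. The sketch below is illustrative only: the names `GRAMMAR`, `generate` and `language` are made up here, and the determiner rule is read as `Det → a | the` (an assumption, so that it connects to `NP → Det N`).

```python
from typing import Dict, Iterator, List, Tuple

# The fragment above; the determiner category is assumed to be Det.
GRAMMAR: Dict[str, List[List[str]]] = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP"]],
    "NP": [["Det", "N"], ["Adj", "NP"]],
    "N": [["boy"], ["girl"]],
    "V": [["sees"], ["likes"]],
    "Adj": [["big"], ["small"]],
    "Det": [["a"], ["the"]],
}


def generate(form: Tuple[str, ...], max_len: int) -> Iterator[Tuple[str, ...]]:
    """Yield every word string of at most max_len words derivable from form."""
    if len(form) > max_len:
        return  # no rule shrinks a form, so this branch can never fit
    for i, symbol in enumerate(form):
        if symbol in GRAMMAR:  # expand the leftmost non-terminal
            for rhs in GRAMMAR[symbol]:
                yield from generate(form[:i] + tuple(rhs) + form[i + 1:], max_len)
            return
    yield form  # only words are left


def language(max_len: int) -> set:
    """All sentences of the fragment with at most max_len words."""
    return {" ".join(words) for words in generate(("S",), max_len)}
```

Under this reading, `language(6)` contains "small the boy likes a girl" but not "the small boy likes a girl": an adjective can only precede a complete NP, so the fragment both over- and under-generates.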

Modified CFG
S -> NP VP
S -> Aux NP VP
S -> VP
NP -> Det Nom
NP -> PropN
NP -> Pronoun
Nom -> Adj Nom
Nom -> N
Nom -> N Nom
Nom -> Nom PP
VP -> V NP
VP -> V
VP -> V PP
PP -> Prep NP
N -> old | dog | footsteps | young | flight
V -> dog | include | prefer | book
Aux -> does
Prep -> from | to | on | of
PropN -> Bush | McCain | Obama
Det -> that | this | a | the
Adj -> old | green | red

Parse Tree for 'The old dog the footsteps of the young' for Prior CFG

[S [NP [DET The] [NOM [N old]]]
   [VP [V dog]
       [NP [DET the]
           [NOM [NOM [N footsteps]]
                [PP [Prep of] [NP [DET the] [NOM [N young]]]]]]]]

Parsing as a Form of Search
Searching FSAs
  • Finding the right path through the automaton
  • Search space defined by structure of FSA
Searching CFGs
  • Finding the right parse tree among all possible parse trees
  • Search space defined by the grammar
Constraints provided by the input sentence and the automaton or grammar

Top-Down Parser
Builds from the root S node to the leaves
Expectation-based
Common search strategy
  • Top-down, left-to-right, backtracking
  • Try first rule with LHS = S
  • Next expand all constituents in these trees/rules
  • Continue until leaves are POS
  • Backtrack when candidate POS does not match input string
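The search strategy above can be sketched as a depth-first recognizer over the modified CFG. This is a minimal illustration, not the lecture's implementation: `read_rules`, `recognize` and the dead-end counter are hypothetical names, and one extra pruning test is assumed (abandon a branch when more symbols remain than words) so that the left-recursive rule `Nom -> Nom PP` cannot loop forever.

```python
RULES = """
S -> NP VP | Aux NP VP | VP
NP -> Det Nom | PropN | Pronoun
Nom -> Adj Nom | N | N Nom | Nom PP
VP -> V NP | V | V PP
PP -> Prep NP
"""
WORDS = """
N -> old | dog | footsteps | young | flight
V -> dog | include | prefer | book
Aux -> does
Prep -> from | to | on | of
PropN -> bush | mccain | obama
Det -> that | this | a | the
Adj -> old | green | red
"""


def read_rules(text):
    """Turn 'A -> B C | D' lines into {A: [(B, C), (D,)]}."""
    table = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        table[lhs.strip()] = [tuple(alt.split()) for alt in rhs.split("|")]
    return table


GRAMMAR = read_rules(RULES)
LEXICON = {pos: {alt[0] for alt in alts} for pos, alts in read_rules(WORDS).items()}
LEXICON["Pronoun"] = set()  # the grammar lists no pronouns


def recognize(words):
    """Top-down, left-to-right, backtracking search; returns (accepted, dead_ends)."""
    words = [w.lower() for w in words]
    dead_ends = 0
    agenda = [(("S",), 0)]  # stack of (symbols still to derive, input position)
    while agenda:
        symbols, pos = agenda.pop()
        if not symbols:
            if pos == len(words):
                return True, dead_ends
            dead_ends += 1  # derived an S that does not cover the whole input
            continue
        if len(symbols) > len(words) - pos:
            dead_ends += 1  # assumed pruning: every symbol needs at least one word
            continue
        first, rest = symbols[0], symbols[1:]
        if first in LEXICON:  # leaf is a POS: compare with the input word
            if words[pos] in LEXICON[first]:
                agenda.append((rest, pos + 1))
            else:
                dead_ends += 1  # backtrack: candidate POS does not match
        else:  # expand, pushing in reverse so the first rule is tried first
            for rhs in reversed(GRAMMAR[first]):
                agenda.append((rhs + rest, pos))
    return False, dead_ends
```

Calling `recognize("The old dog the footsteps of the young".split())` accepts the sentence only after abandoning several branches, for instance the one that first reads "old" as an adjective.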

Rule Expansion
"The old dog the footsteps of the young."
Where does backtracking happen?
What are the computational disadvantages?
What are the advantages?

Bottom-Up Parsing
Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches

Det N   V   Det N         Prep Det N
The old dog the footsteps of   the young

Det Adj N   Det N         Prep Det N
The old dog the footsteps of   the young

Parse continues until an S root node is reached or no further node expansion is possible
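A naive version of this bottom-up search can be written as repeated reduction: wherever the right-hand side of a rule matches part of the current sequence, replace it with the left-hand side, and stop when only S is left. The sketch below is an illustration under assumptions: `RULES`, `LEXICON` and `bottom_up_recognize` are hypothetical names, the rules are those of the modified CFG, and a set of already-seen sequences is added so the same sequence is never explored twice.

```python
RULES = [  # (LHS, RHS) pairs of the modified CFG
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nom")), ("NP", ("PropN",)), ("NP", ("Pronoun",)),
    ("Nom", ("Adj", "Nom")), ("Nom", ("N",)), ("Nom", ("N", "Nom")),
    ("Nom", ("Nom", "PP")),
    ("VP", ("V", "NP")), ("VP", ("V",)), ("VP", ("V", "PP")),
    ("PP", ("Prep", "NP")),
]
LEXICON = {
    "N": "old dog footsteps young flight",
    "V": "dog include prefer book",
    "Aux": "does",
    "Prep": "from to on of",
    "PropN": "bush mccain obama",
    "Det": "that this a the",
    "Adj": "old green red",
}
# Lexical rules such as N -> dog are treated like any other rule.
RULES += [(pos, (word,)) for pos, words in LEXICON.items() for word in words.split()]


def bottom_up_recognize(words):
    """Reduce the input until only S remains; returns (accepted, sequences_seen)."""
    start = tuple(w.lower() for w in words)
    seen, agenda = {start}, [start]
    while agenda:
        form = agenda.pop()
        if form == ("S",):
            return True, len(seen)
        for lhs, rhs in RULES:
            for i in range(len(form) - len(rhs) + 1):
                if form[i:i + len(rhs)] == rhs:  # RHS matches here: reduce it
                    reduced = form[:i] + (lhs,) + form[i + len(rhs):]
                    if reduced not in seen:
                        seen.add(reduced)
                        agenda.append(reduced)
    return False, len(seen)
```

The second value returned gives a rough feel for the wasted work: many of the sequences built along the way (for example ones that reduce "dog" to a noun) can never lead to an S.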

Det N   V   Det N         Prep Det N
The old dog the footsteps of   the young
Det Adj N   Det N         Prep Det N

Bottom-up parsing
When does disambiguation occur?
What are the computational advantages and disadvantages?

What's right/wrong with…
Top-Down parsers: they never explore illegal parses (e.g. parses which can't form an S), but waste time on trees that can never match the input
Bottom-Up parsers: they never explore trees inconsistent with the input, but waste time exploring illegal parses (with no S root)
For both: find a control strategy, i.e. how to explore the search space efficiently
  • Pursue all parses in parallel, or backtrack, or …?
  • Which rule to apply next?
  • Which node to expand next?

Some Solutions
Dynamic Programming Approaches: use a chart to represent partial results
CKY Parsing Algorithm
  • Bottom-up
  • Grammar must be in (Chomsky) Normal Form
  • The parse tree might not be consistent with linguistic theory
Earley Parsing Algorithm
  • Top-down
  • Expectations about constituents are confirmed by input
  • A POS tag for a word that is not predicted is never added
Chart Parser
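To make the CKY option concrete, here is a minimal recognizer sketch. The grammar is a hand-converted Chomsky Normal Form version of part of the modified CFG and is an assumption of this example: the unit rule `Nom -> N` is folded into the lexicon, so nouns are listed directly as `Nom`, which also illustrates why CKY trees may not match the linguist's grammar. The names `BINARY`, `LEXICAL` and `cky_recognize` are hypothetical.

```python
from collections import defaultdict

# Binary rules, indexed by their right-hand side: (B, C) -> {A : A -> B C}.
BINARY = {
    ("NP", "VP"): {"S"},
    ("Det", "Nom"): {"NP"},
    ("V", "NP"): {"VP"},
    ("Prep", "NP"): {"PP"},
    ("Nom", "PP"): {"Nom"},
    ("Adj", "Nom"): {"Nom"},
}
# Word -> categories; nouns carry Nom directly (folded unit rule Nom -> N).
LEXICAL = {
    "the": {"Det"},
    "old": {"Adj", "Nom"},
    "dog": {"V", "Nom"},
    "footsteps": {"Nom"},
    "of": {"Prep"},
    "young": {"Nom"},
}


def cky_recognize(words):
    """Fill table[i, j] with every category that spans words[i:j]."""
    n = len(words)
    table = defaultdict(set)
    for i, word in enumerate(words):
        table[i, i + 1] = set(LEXICAL.get(word.lower(), ()))
    for width in range(2, n + 1):            # shorter spans first
        for i in range(0, n - width + 1):
            j = i + width
            for k in range(i + 1, j):         # split point
                for left in table[i, k]:
                    for right in table[k, j]:
                        table[i, j] |= BINARY.get((left, right), set())
    return "S" in table[0, n]
```

Each cell is filled exactly once from smaller cells, so no partial result is ever recomputed, which is the point of using a chart.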

Earley Parsing
Allows arbitrary CFGs
Fills a table in a single sweep over the input words
  • Table is length N+1; N is number of words
  • Table entries represent
      • Completed constituents and their locations
      • In-progress constituents
      • Predicted constituents

States
The table entries are called states and are represented with dotted rules.
S -> • VP               A VP is predicted
NP -> Det • Nominal     An NP is in progress
VP -> V NP •            A VP has been found

States/Locations
It would be nice to know where these things are in the input, so…
S -> • VP [0,0]              A VP is predicted at the start of the sentence
NP -> Det • Nominal [1,2]    An NP is in progress; the Det goes from 1 to 2
VP -> V NP • [0,3]           A VP has been found starting at 0 and ending at 3
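A dotted rule with its span maps naturally onto a small record type. The following is one possible encoding, given as a sketch; the class name `State` and its field and method names are choices made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # how many RHS symbols have been recognized so far
    start: int             # where the constituent begins in the input
    end: int               # how far the recognized part reaches

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol right after the dot, or None when the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"
```

For example, `State("NP", ("Det", "Nominal"), 1, 1, 2)` is the in-progress NP from the list above. Making the record frozen (hashable) means a chart entry can cheaply check whether it already holds a given state.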

Earley
As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
In this case, there should be an S state in the final column (the last of the N+1 entries) that spans the whole input, from 0 to N, and is complete.
If that's the case, you're done: S -> α • [0,N]

Earley Algorithm
March through chart left-to-right.
At each step, apply 1 of 3 operators
  • Predictor: create new states representing top-down expectations
  • Scanner: match word predictions (rule with word after dot) to words
  • Completer: when a state is complete, see what rules were looking for that completed constituent

Predictor
Given a state
  • With a non-terminal to right of dot (not a part-of-speech category)
  • Create a new state for each expansion of the non-terminal
  • Place these new states into same chart entry as generated state, beginning and ending where generating state ends
  • So predictor looking at
      S -> • VP [0,0]
    results in
      VP -> • Verb [0,0]
      VP -> • Verb NP [0,0]

Scanner
Given a state
  • With a non-terminal to right of dot that is a part-of-speech category
  • If the next word in the input matches this POS
  • Create a new state with dot moved over the non-terminal
  • So scanner looking at VP -> • Verb NP [0,0]
  • If the next word, "book", can be a verb, add new state
      VP -> Verb • NP [0,1]
  • Add this state to chart entry following current one
  • Note: Earley algorithm uses top-down input to disambiguate POS! Only POS predicted by some state can get added to chart!

Completer
Applied to a state when its dot has reached right end of rule.
Parser has discovered a category over some span of input.
Find and advance all previous states that were looking for this category
  • copy state, move dot, insert in current chart entry
Given:
  NP -> Det Nominal • [1,3]
  VP -> Verb • NP [0,1]
Add
  VP -> Verb NP • [0,3]

How do we know we are done?
Find an S state in the final column that spans from 0 to N and is complete.
If that's the case, you're done: S -> α • [0,N]

Earley
More specifically…
1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the last chart entry (index N, the last of the N+1) to see if you have a winner
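Put together, the three operators and the loop above fit in a short recognizer. The sketch below makes several assumptions: it uses the modified CFG (which contains "book", "flight" and `S -> VP`), it adds a dummy start state `γ -> • S`, states are plain tuples `(lhs, rhs, dot, start)` whose end position is the index of the chart entry holding them, and all function names are hypothetical.

```python
RULES = """
S -> NP VP | Aux NP VP | VP
NP -> Det Nom | PropN | Pronoun
Nom -> Adj Nom | N | N Nom | Nom PP
VP -> V NP | V | V PP
PP -> Prep NP
"""
WORDS = """
N -> old | dog | footsteps | young | flight
V -> dog | include | prefer | book
Aux -> does
Prep -> from | to | on | of
PropN -> Bush | McCain | Obama
Det -> that | this | a | the
Adj -> old | green | red
"""


def read_rules(text):
    """Turn 'A -> B C | D' lines into {A: [(B, C), (D,)]}."""
    table = {}
    for line in text.strip().splitlines():
        lhs, rhs = line.split("->")
        table[lhs.strip()] = [tuple(alt.split()) for alt in rhs.split("|")]
    return table


GRAMMAR = read_rules(RULES)
LEXICON = {pos: {alt[0].lower() for alt in alts}
           for pos, alts in read_rules(WORDS).items()}


def earley_recognize(words):
    words = [w.lower() for w in words]
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # N+1 entries for N words

    def add(state, k):
        if state not in chart[k]:  # never add the same state twice
            chart[k].append(state)

    add(("γ", ("S",), 0, 0), 0)  # dummy start state (assumption of this sketch)
    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # chart[k] may grow while it is processed
            lhs, rhs, dot, start = chart[k][i]
            i += 1
            if dot < len(rhs):
                symbol = rhs[dot]
                if symbol in GRAMMAR:  # Predictor: expand the non-terminal
                    for alt in GRAMMAR[symbol]:
                        add((symbol, alt, 0, k), k)
                elif k < n and words[k] in LEXICON.get(symbol, ()):
                    # Scanner: next word can be this POS, move the dot over it
                    add((lhs, rhs, dot + 1, start), k + 1)
            else:  # Completer: advance states that were waiting for lhs
                for lhs2, rhs2, dot2, start2 in chart[start]:
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        add((lhs2, rhs2, dot2 + 1, start2), k)
    return ("γ", ("S",), 1, 0) in chart[n]  # complete S spanning 0..N
```

Because duplicate states are never added, the left-recursive rule `Nom -> Nom PP` is predicted once per chart entry and causes no looping.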

Example
Book that flight
We should find… an S from 0 to 3 that is a completed state…

CFG for Fragment of English
S -> NP VP
S -> Aux NP VP
NP -> Det Nom
NP -> PropN
Nom -> Adj Nom
Nom -> N
Nom -> N Nom
Nom -> Nom PP
VP -> V NP
VP -> V
PP -> Prep NP
N -> old | dog | footsteps | young
V -> dog | include | prefer
Aux -> does
Prep -> from | to | on | of
PropN -> Bush | McCain | Obama
Det -> that | this | a | the
Adj -> old | green | red

Details
What kind of algorithms did we just describe?
  • Not parsers: recognizers
  • The presence of an S state with the right attributes in the right place indicates a successful recognition.
  • But no parse tree… no parser
  • That's how we solve (not) an exponential problem in polynomial time

Converting Earley from Recognizer to Parser
With the addition of a few pointers we have a parser
Augment the "Completer" to point to where we came from.

[Figure: Augmenting the chart with structural information (chart states S8 to S13)]

Retrieving Parse Trees from Chart
All the possible parses for an input are in the table
We just need to read off all the backpointers from every complete S in the last column of the table
Find all the S -> X • [0,N]
Follow the structural traces from the Completer
Of course, this won't be polynomial time, since there could be an exponential number of trees
We can at least represent ambiguity efficiently
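The gap between "exponentially many trees" and "efficient representation" can be illustrated by counting trees without building them. The sketch below is not the Earley chart: it is a plain memoized counter over spans, and the tiny NP grammar, the lexicon and the name `count_parses` are assumptions made for this example. It relies on the grammar having no empty rules and no cycles of unit rules.

```python
from functools import lru_cache

GRAMMAR = {
    "NP": [("NP", "PP"), ("Nom",)],
    "Nom": [("Det", "N")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "Det": {"the", "a"},
    "N": {"man", "hill", "telescope"},
    "Prep": {"on", "with"},
}


def count_parses(words, start="NP"):
    """Number of distinct trees rooted in start that cover all the words."""
    words = tuple(w.lower() for w in words)

    @lru_cache(maxsize=None)
    def count(symbol, i, j):
        """Trees rooted in symbol over words[i:j]."""
        if symbol in LEXICON:
            return int(j - i == 1 and words[i] in LEXICON[symbol])
        return sum(split(rhs, i, j) for rhs in GRAMMAR.get(symbol, ()))

    @lru_cache(maxsize=None)
    def split(rhs, i, j):
        """Ways to divide words[i:j] among the symbols of rhs, left to right."""
        if len(rhs) == 1:
            return count(rhs[0], i, j)
        # Every symbol covers at least one word, so leave room for the rest.
        return sum(count(rhs[0], i, k) * split(rhs[1:], k, j)
                   for k in range(i + 1, j - len(rhs) + 2))

    return count(start, 0, len(words))
```

With this grammar, "the man on the hill with the telescope" has 2 analyses and adding a third PP gives 5. Each (symbol, span) pair is computed once however many trees share it, whereas listing the trees themselves takes time proportional to their number.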

Left Recursion vs Right Recursion
Depth-first search will never terminate if grammar is left recursive (e.g. NP -> NP PP)

$A \rightarrow^{*} \alpha A \beta \quad (\alpha \rightarrow^{*} \varepsilon)$

Solutions:
Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
e.g. The man on the hill with the telescope…
  NP -> NP PP     (wanted: Nom plus a sequence of PPs)
  NP -> Nom PP
  NP -> Nom
  Nom -> Det N
…becomes…
  NP -> Nom NP'
  Nom -> Det N
  NP' -> PP NP'   (wanted: a sequence of PPs)
  NP' -> ε
Not so obvious what these rules mean…
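The rewrite can be done mechanically for immediate left recursion: A -> A α | β becomes A -> β A' and A' -> α A' | ε. The function below is a sketch of that textbook transformation; the name `remove_immediate_left_recursion` and the use of an empty list for ε are choices made here, and it does not handle non-immediate left recursion.

```python
def remove_immediate_left_recursion(grammar):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    result = {}
    for lhs, alternatives in grammar.items():
        # alpha parts of the left-recursive alternatives A -> A alpha
        recursive = [alt[1:] for alt in alternatives if alt and alt[0] == lhs]
        # beta alternatives that do not start with A
        others = [alt for alt in alternatives if not alt or alt[0] != lhs]
        if not recursive:
            result[lhs] = [list(alt) for alt in alternatives]
            continue
        tail = lhs + "'"  # fresh non-terminal, e.g. NP'
        result[lhs] = [list(beta) + [tail] for beta in others]
        result[tail] = [list(alpha) + [tail] for alpha in recursive] + [[]]  # [] is epsilon
    return result


grammar = {
    "NP": [["NP", "PP"], ["Nom", "PP"], ["Nom"]],
    "Nom": [["Det", "N"]],
}
rewritten = remove_immediate_left_recursion(grammar)
```

Here `rewritten["NP"]` is `[["Nom", "PP", "NP'"], ["Nom", "NP'"]]`. The mechanical result keeps both alternatives, while the hand-written version above keeps only NP -> Nom NP', since NP' can already produce the PP.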

Harder to detect and eliminate non-immediate left recursion
  NP -> Nom PP
  Nom -> NP
Fix depth of search explicitly
Rule ordering: non-recursive rules first
  NP -> Det Nom
  NP -> NP PP

Another Problem: Structural ambiguity
Multiple legal structures
  • Attachment (e.g. I saw a man on a hill with a telescope)
  • Coordination (e.g. younger cats and dogs)
  • NP bracketing (e.g. Spanish language teachers)
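For the NP bracketing case, the competing structures can simply be listed. The helper below is illustrative only (the name `bracketings` is made up) and enumerates every binary bracketing of a word sequence, independent of any grammar.

```python
def bracketings(words):
    """Return every binary bracketing of the word sequence as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for k in range(1, len(words)):  # split into a left part and a right part
        for left in bracketings(words[:k]):
            for right in bracketings(words[k:]):
                results.append(f"[{left} {right}]")
    return results
```

`bracketings(["Spanish", "language", "teachers"])` gives the two readings `[Spanish [language teachers]]` (language teachers who are Spanish) and `[[Spanish language] teachers]` (teachers of the Spanish language). The number of bracketings grows quickly with length: four words already have five.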

NP vs VP Attachment
Solution? Return all possible parses and disambiguate using "other methods"

Summing Up
Parsing is a search problem which may be implemented with many control strategies
  • Top-Down or Bottom-Up approaches each have problems
  • Combining the two solves some but not all issues
      • Left recursion
      • Syntactic ambiguity
Next time: Making use of statistical information about syntactic constituents
  • Read Ch 14

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done?
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up
Page 4: Basic Parsing with Context- Free Grammarskathy/NLP/ClassSlides/Class7... · 2009-09-29 · `Declarative formalisms like CFGs, FSAs define the legal strings of a language-- but only

4

5

Declarative formalisms like CFGs FSAs define the legal strings of a language -- but only tell you lsquothis is a legal string of the language XrsquoParsing algorithms specify how to recognize the strings of a language and assign each string one (or more) syntactic analyses

6

Many possible CFGs for English here is an example (fragment) S rarr NP VP VP rarr V NP NP rarr Det N | Adj NP N rarr boy | girl V rarr sees | likes Adj rarr big | small DetP rarr a | the

big the small girl sees a boy John likes a girl I like a girl I sleep The old dog the footsteps of the young

the small boy likes a girl

S NP VP VP VS Aux NP VP VP -gt V PPS -gt VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young | flight

NP PropN V dog | include | prefer | book

NP -gt PronounNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red

Parse Tree for lsquoThe old dog the footsteps of the youngrsquo for Prior CFG

S

NP VP

NPV

DETNOM

N PP

DET NOM

N

The old dog the

footstepsof the young

Searching FSAs Finding the right path through the automaton Search space defined by structure of FSASearching CFGs Finding the right parse tree among all possible

parse trees Search space defined by the grammarConstraints provided by the input sentenceand the automaton or grammar

10

Builds from the root S node to the leavesExpectation-basedCommon search strategy Top-down left-to-right backtracking Try first rule with LHS = S Next expand all constituents in these treesrules Continue until leaves are POS Backtrack when candidate POS does not match input string

11

ldquoThe old dog the footsteps of the youngrdquoWhere does backtracking happen

What are the computational disadvantages

What are the advantages

12

Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det NThe old dog the footsteps of the young

Det Adj N Det N Prep Det NThe old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

13

Det N V Det N Prep Det NThe old dog the footsteps of the youngDet Adj N Det N Prep Det N

14

When does disambiguation occur

What are the computational advantages and disadvantages

15

Top-Down parsers ndash they never explore illegal parses (eg which canrsquot form an S) -- but waste time on trees that can never match the inputBottom-Up parsers ndash they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)For both find a control strategy -- how explore search space efficiently Pursuing all parses in parallel or backtrack or hellip Which rule to apply next Which node to expand next

16

Dynamic Programming Approaches: use a chart to represent partial results

CKY Parsing Algorithm
  • Bottom-up
  • Grammar must be in (Chomsky) Normal Form
  • The parse tree might not be consistent with linguistic theory
Earley Parsing Algorithm
  • Top-down
  • Expectations about constituents are confirmed by input
  • A POS tag for a word that is not predicted is never added
Chart Parser

17

Allows arbitrary CFGs
Fills a table in a single sweep over the input words
  • Table is length N+1; N is number of words
  • Table entries represent
      • Completed constituents and their locations
      • In-progress constituents
      • Predicted constituents

18

The table entries are called states and are represented with dotted rules

S -> • VP               A VP is predicted

NP -> Det • Nominal     An NP is in progress

VP -> V NP •            A VP has been found

19

It would be nice to know where these things are in the input, so…

S -> • VP [0,0]             A VP is predicted at the start of the sentence

NP -> Det • Nominal [1,2]   An NP is in progress; the Det goes from 1 to 2

VP -> V NP • [0,3]          A VP has been found starting at 0 and ending at 3
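One way to hold such a state in code is a small immutable record. This is a sketch only; the class name State and its field names are assumptions, not part of the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    lhs: str        # left-hand side of the rule
    rhs: tuple      # right-hand side symbols
    dot: int        # how many RHS symbols have been recognized so far
    start: int      # where the constituent begins in the input
    end: int        # how far the recognized part extends

    def next_symbol(self):
        """Symbol right after the dot, or None if the state is complete."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)} [{self.start},{self.end}]"

for state in (State("S", ("VP",), 0, 0, 0),
              State("NP", ("Det", "Nominal"), 1, 1, 2),
              State("VP", ("V", "NP"), 2, 0, 3)):
    print(state, "complete" if state.is_complete() else "incomplete")
```

Making the record frozen keeps states hashable, so duplicates can be detected cheaply when a chart entry is filled.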

20

21

As with most dynamic programming approaches, the answer is found by looking in the table in the right place
In this case, there should be an S state in the final column (the last of the N+1 chart entries) that spans from 0 to N and is complete
If that's the case, you're done
  • S -> α • [0,N]

22

March through chart left-to-right
At each step, apply 1 of 3 operators
  • Predictor
      Create new states representing top-down expectations
  • Scanner
      Match word predictions (rule with word after dot) to words
  • Completer
      When a state is complete, see what rules were looking for that completed constituent

23

Given a state
  • With a non-terminal to right of dot (not a part-of-speech category)
  • Create a new state for each expansion of the non-terminal
  • Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
  • So predictor looking at
      S -> • VP [0,0]
  • results in
      VP -> • Verb [0,0]
      VP -> • Verb NP [0,0]
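A minimal sketch of the predictor step, reproducing the example above. States are written as plain tuples (lhs, rhs, dot, start, end); the function name predictor and the two-rule grammar are assumptions for illustration.

```python
# Just the rules needed for the slide's example (assumption).
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}

def predictor(state, grammar=GRAMMAR):
    """Return one new state per expansion of the non-terminal after the dot."""
    lhs, rhs, dot, start, end = state
    wanted = rhs[dot]
    # New states begin and end where the generating state ends.
    return [(wanted, expansion, 0, end, end) for expansion in grammar[wanted]]

for new_state in predictor(("S", ("VP",), 0, 0, 0)):
    print(new_state)
```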

24

Given a state
  • With a non-terminal to right of dot that is a part-of-speech category
  • If the next word in the input matches this POS
  • Create a new state with dot moved over the non-terminal
  • So scanner looking at
      VP -> • Verb NP [0,0]
  • If the next word "book" can be a verb, add new state
      VP -> Verb • NP [0,1]
  • Add this state to chart entry following current one
  • Note: Earley algorithm uses top-down input to disambiguate POS. Only POS predicted by some state can get added to chart.
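The scanner step can be sketched the same way. The tuple layout (lhs, rhs, dot, start, end), the function name scanner, and the toy lexicon (which lets "book" be a noun as well as a verb) are assumptions, not taken from the slides.

```python
# Hypothetical lexicon: each word maps to the POS categories it can take.
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def scanner(state, words, lexicon=LEXICON):
    """Advance the dot over a POS category if the next input word can have it."""
    lhs, rhs, dot, start, end = state
    if end < len(words) and rhs[dot] in lexicon.get(words[end], set()):
        # The new state belongs in the chart entry after the current one.
        return (lhs, rhs, dot + 1, start, end + 1)
    return None    # the predicted POS does not fit the word: nothing is added

words = ["book", "that", "flight"]
print(scanner(("VP", ("Verb", "NP"), 0, 0, 0), words))
print(scanner(("NP", ("Det", "Nominal"), 0, 0, 0), words))
```

Only the Verb reading of "book" produces a state here, because no state in this example predicts a Noun at position 0.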

25

Applied to a state when its dot has reached the right end of the rule
Parser has discovered a category over some span of input
Find and advance all previous states that were looking for this category
  • copy state, move dot, insert in current chart entry
Given
  NP -> Det Nominal • [1,3]
  VP -> Verb • NP [0,1]
Add
  VP -> Verb NP • [0,3]
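The completer example above can be reproduced with a few lines. As before, states are tuples (lhs, rhs, dot, start, end); the function name completer and the dictionary-based chart are assumptions made for this sketch.

```python
def completer(completed, chart):
    """Advance every earlier state that was waiting for the completed category."""
    lhs, rhs, dot, start, end = completed
    advanced = []
    # Waiting states end exactly where the completed constituent starts.
    for (w_lhs, w_rhs, w_dot, w_start, w_end) in chart[start]:
        if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
            # Copy the state, move the dot, extend its span to the current entry.
            advanced.append((w_lhs, w_rhs, w_dot + 1, w_start, end))
    return advanced

# chart[1] holds the states that end at position 1.
chart = {1: [("VP", ("Verb", "NP"), 1, 0, 1)]}
print(completer(("NP", ("Det", "Nominal"), 2, 1, 3), chart))
```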

26

Find an S state in the final column that spans from 0 to N and is complete

If that's the case, you're done
  • S -> α • [0,N]

27

More specifically…

1. Predict all the states you can upfront
2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2
3. Look at the final chart entry (entry N+1 of the table) to see if you have a winner
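Putting the three operators together gives a compact recognizer. This is a sketch under assumptions: states are tuples (lhs, rhs, dot, origin) whose end position is the chart index, a dummy GAMMA -> S state marks success, the grammar is the earlier modified CFG written as a dictionary, and the grammar has no empty rules. The name earley_recognize is hypothetical.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",), ("Pronoun",)],
    "Nom": [("Adj", "Nom"), ("N",), ("N", "Nom"), ("Nom", "PP")],
    "VP": [("V",), ("V", "NP"), ("V", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {
    "book": {"V"}, "that": {"Det"}, "this": {"Det"}, "a": {"Det"}, "the": {"Det"},
    "flight": {"N"}, "old": {"N", "Adj"}, "dog": {"N", "V"}, "footsteps": {"N"},
    "young": {"N"}, "of": {"Prep"}, "does": {"Aux"},
}

def earley_recognize(words, grammar=GRAMMAR, lexicon=LEXICON, start="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # N+1 entries for N words

    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)

    add(("GAMMA", (start,), 0, 0), 0)         # dummy start state
    for i in range(n + 1):
        for state in chart[i]:                # chart[i] may grow while we loop
            lhs, rhs, dot, origin = state
            if dot < len(rhs) and rhs[dot] in grammar:          # Predictor
                for expansion in grammar[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i), i)
            elif dot < len(rhs):                                # Scanner
                if i < n and rhs[dot] in lexicon.get(words[i], ()):
                    add((lhs, rhs, dot + 1, origin), i + 1)
            else:                                               # Completer
                for w_lhs, w_rhs, w_dot, w_origin in chart[origin]:
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        add((w_lhs, w_rhs, w_dot + 1, w_origin), i)
    # Success means a complete S spanning the whole input.
    return ("GAMMA", (start,), 1, 0) in chart[n]

print(earley_recognize("book that flight".split()))
```

Because duplicate states are never added twice, the left-recursive rule Nom -> Nom PP is predicted once per chart entry and causes no infinite loop.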

28

Book that flight
We should find… an S from 0 to 3 that is a completed state…

29

S -> NP VP              VP -> V
S -> Aux NP VP          PP -> Prep NP
NP -> Det Nom           N -> old | dog | footsteps | young
NP -> PropN             V -> dog | include | prefer
Nom -> Adj Nom          Aux -> does
Nom -> N                Prep -> from | to | on | of
Nom -> N Nom            PropN -> Bush | McCain | Obama
Nom -> Nom PP           Det -> that | this | a | the
VP -> V NP              Adj -> old | green | red

31

32

33

What kind of algorithms did we just describe?
  • Not parsers – recognizers
  • The presence of an S state with the right attributes in the right place indicates a successful recognition
  • But no parse tree… no parser
  • That's how we solve (not) an exponential problem in polynomial time

34

With the addition of a few pointers we have a parser
Augment the "Completer" to point to where we came from

35

[Figure: chart entries annotated with backpointers to states S8 through S13]

All the possible parses for an input are in the table

We just need to read off all the backpointers from every complete S in the last column of the table

Find all the S -> X • [0,N]

Follow the structural traces from the Completer

Of course, this won't be polynomial time, since there could be an exponential number of trees

We can at least represent ambiguity efficiently
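The sketch below shows one possible way to add backpointers and read trees off the chart. It assumes a small subset of the slide grammar without empty rules, stores each state as (lhs, rhs, dot, origin), and records for every state a list of (previous state, previous end, child) pointers; earley_parse and its helpers are hypothetical names.

```python
GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom")],
    "Nom": [("Adj", "Nom"), ("N",), ("N", "Nom")],
    "VP": [("V",), ("V", "NP")],
}
LEXICON = {"book": {"V"}, "the": {"Det"}, "old": {"Adj", "N"}, "dog": {"N", "V"}}

def earley_parse(words, start="S"):
    n = len(words)
    chart = [dict() for _ in range(n + 1)]    # state -> list of backpointers
    queues = [[] for _ in range(n + 1)]       # states in order of discovery

    def add(state, i, pointer=None):
        if state not in chart[i]:
            chart[i][state] = []
            queues[i].append(state)
        if pointer is not None:
            chart[i][state].append(pointer)   # remember every way to get here

    for rhs in GRAMMAR[start]:
        add((start, rhs, 0, 0), 0)
    for i in range(n + 1):
        for state in queues[i]:
            lhs, rhs, dot, origin = state
            if dot == len(rhs):                                   # Completer
                for waiting in list(chart[origin]):
                    w_lhs, w_rhs, w_dot, w_origin = waiting
                    if w_dot < len(w_rhs) and w_rhs[w_dot] == lhs:
                        add((w_lhs, w_rhs, w_dot + 1, w_origin), i,
                            (waiting, origin, (state, i)))
            elif rhs[dot] in GRAMMAR:                             # Predictor
                for expansion in GRAMMAR[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i), i)
            elif i < n and rhs[dot] in LEXICON.get(words[i], ()):  # Scanner
                add((lhs, rhs, dot + 1, origin), i + 1, (state, i, words[i]))

    def children(state, end):
        """Yield each possible tuple of subtrees for the symbols before the dot."""
        lhs, rhs, dot, origin = state
        if dot == 0:
            yield ()
            return
        for prev, prev_end, child in chart[end][state]:
            for left in children(prev, prev_end):
                if isinstance(child, str):                 # scanned word
                    yield left + ((rhs[dot - 1], child),)
                else:                                      # completed constituent
                    for sub in tree(*child):
                        yield left + (sub,)

    def tree(state, end):
        for kids in children(state, end):
            yield (state[0],) + kids

    roots = [s for s in chart[n]
             if s[0] == start and s[2] == len(s[1]) and s[3] == 0]
    return [t for s in roots for t in tree(s, n)]

for parse in earley_parse("book the old dog".split()):
    print(parse)
```

The chart stores the completed NP state only once, with two backpointers (one for "old" as Adj, one for "old" as N); the two trees appear only when the pointers are followed.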

37

Depth-first search will never terminate if the grammar is left recursive (e.g. NP --> NP PP)

38

$(A \xrightarrow{*} \alpha A B,\ \alpha \xrightarrow{*} \varepsilon)$

Solutions:
  • Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
    e.g. The man on the hill with the telescope…
      NP -> NP PP       (wanted: Nom plus a sequence of PPs)
      NP -> Nom PP
      NP -> Nom
      Nom -> Det N
    …becomes…
      NP -> Nom NP'
      Nom -> Det N
      NP' -> PP NP'     (wanted: a sequence of PPs)
      NP' -> ε

  • Not so obvious what these rules mean…
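The standard textbook rewrite for immediate left recursion (A -> A α | β becomes A -> β A', A' -> α A' | ε) can be sketched as follows. The function name and the convention of naming the new symbol by appending a prime are assumptions.

```python
def remove_immediate_left_recursion(lhs, expansions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon."""
    new = lhs + "'"
    recursive = [rhs[1:] for rhs in expansions if rhs and rhs[0] == lhs]
    others = [rhs for rhs in expansions if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: list(expansions)}     # nothing to do
    return {
        lhs: [beta + (new,) for beta in others],
        # The empty tuple stands for the epsilon expansion.
        new: [alpha + (new,) for alpha in recursive] + [()],
    }

rules = remove_immediate_left_recursion(
    "NP", [("NP", "PP"), ("Nom", "PP"), ("Nom",)])
for left, alternatives in rules.items():
    for rhs in alternatives:
        print(left, "->", " ".join(rhs) if rhs else "ε")
```

The mechanical result keeps NP -> Nom PP NP' alongside NP -> Nom NP'. The hand-written version above drops the first of these, since NP' already generates any sequence of PPs.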

  • Harder to detect and eliminate non-immediate left recursion
      NP --> Nom PP
      Nom --> NP

  • Fix depth of search explicitly

  • Rule ordering: non-recursive rules first
      NP --> Det Nom
      NP --> NP PP

40

Multiple legal structures
  • Attachment (e.g. I saw a man on a hill with a telescope)
  • Coordination (e.g. younger cats and dogs)
  • NP bracketing (e.g. Spanish language teachers)
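To get a feel for how quickly such structures multiply, the sketch below enumerates every binary bracketing of a noun sequence. It is a toy illustration with a hypothetical function name; it ignores the grammar entirely and only counts shapes.

```python
def bracketings(words):
    """Return every way to group `words` into nested binary brackets."""
    if len(words) == 1:
        return [words[0]]
    result = []
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                result.append((left, right))
    return result

for reading in bracketings(["Spanish", "language", "teachers"]):
    print(reading)

# The number of readings grows with the Catalan numbers.
print([len(bracketings(["n"] * k)) for k in range(1, 7)])
```

Three nouns give two readings (teachers of the Spanish language, or language teachers who are Spanish); six nouns already give 42.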

41

42

NP vs VP Attachment

Solution?
  • Return all possible parses and disambiguate using "other methods"

43

Parsing is a search problem which may be implemented with many control strategies
  • Top-Down or Bottom-Up approaches each have problems
  • Combining the two solves some but not all issues
      • Left recursion
      • Syntactic ambiguity
Next time: Making use of statistical information about syntactic constituents
  • Read Ch 14

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up
Page 7: Basic Parsing with Context- Free Grammarskathy/NLP/ClassSlides/Class7... · 2009-09-29 · `Declarative formalisms like CFGs, FSAs define the legal strings of a language-- but only

Many possible CFGs for English here is an example (fragment) S rarr NP VP VP rarr V NP NP rarr Det N | Adj NP N rarr boy | girl V rarr sees | likes Adj rarr big | small DetP rarr a | the

big the small girl sees a boy John likes a girl I like a girl I sleep The old dog the footsteps of the young

the small boy likes a girl

S NP VP VP VS Aux NP VP VP -gt V PPS -gt VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young | flight

NP PropN V dog | include | prefer | book

NP -gt PronounNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red

Parse Tree for lsquoThe old dog the footsteps of the youngrsquo for Prior CFG

S

NP VP

NPV

DETNOM

N PP

DET NOM

N

The old dog the

footstepsof the young

Searching FSAs Finding the right path through the automaton Search space defined by structure of FSASearching CFGs Finding the right parse tree among all possible

parse trees Search space defined by the grammarConstraints provided by the input sentenceand the automaton or grammar

10

Builds from the root S node to the leavesExpectation-basedCommon search strategy Top-down left-to-right backtracking Try first rule with LHS = S Next expand all constituents in these treesrules Continue until leaves are POS Backtrack when candidate POS does not match input string

11

ldquoThe old dog the footsteps of the youngrdquoWhere does backtracking happen

What are the computational disadvantages

What are the advantages

12

Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det NThe old dog the footsteps of the young

Det Adj N Det N Prep Det NThe old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

13

Det N V Det N Prep Det NThe old dog the footsteps of the youngDet Adj N Det N Prep Det N

14

When does disambiguation occur

What are the computational advantages and disadvantages

15

Top-Down parsers ndash they never explore illegal parses (eg which canrsquot form an S) -- but waste time on trees that can never match the inputBottom-Up parsers ndash they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)For both find a control strategy -- how explore search space efficiently Pursuing all parses in parallel or backtrack or hellip Which rule to apply next Which node to expand next

16

Dynamic Programming Approaches ndash Use a chart to represent partial results

CKY Parsing Algorithm Bottom-up Grammar must be in Normal Form The parse tree might not be consistent with linguistic

theoryEarly Parsing Algorithm Top-down Expectations about constituents are confirmed by input A POS tag for a word that is not predicted is never addedChart Parser

17

Allows arbitrary CFGsFills a table in a single sweep over the input words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

18

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

19

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

20

21

As with most dynamic programming approaches the answer is found by looking in the table in the right placeIn this case there should be an S state in the final column that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done S ndashgt α [0n+1]

22

March through chart left-to-rightAt each step apply 1 of 3 operators Predictor

Create new states representing top-down expectations Scanner

Match word predictions (rule with word after dot) to words

CompleterWhen a state is complete see what rules were looking for that completed constituent

23

Given a state With a non-terminal to right of dot (not a part-

of-speech category) Create a new state for each expansion of the

non-terminal Place these new states into same chart entry as

generated state beginning and ending where generating state ends So predictor looking at

S -gt VP [00] results in

VP -gt Verb [00]VP -gt Verb NP [00]

24

Given a state With a non-terminal to right of dot that is a part-of-

speech category If the next word in the input matches this POS Create a new state with dot moved over the non-

terminal So scanner looking at VP -gt Verb NP [00] If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01] Add this state to chart entry following current one Note Earley algorithm uses top-down input to

disambiguate POS Only POS predicted by some state can get added to chart

25

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category copy state move dot insert in current chart entryGiven NP -gt Det Nominal [13] VP -gt Verb NP [01]Add VP -gt Verb NP [03]

26

Find an S state in the final column that spans from 0 to n+1 and is complete

If thatrsquos the case yoursquore done S ndashgt α [0n+1]

27

More specificallyhellip

1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

28

Book that flightWe should findhellip an S from 0 to 3 that is a completed statehellip

29

S NP VP VP VS Aux NP VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young

NP PropN V dog | include | preferNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red

31

32

33

What kind of algorithms did we just describe Not parsers ndash recognizers

The presence of an S state with the right attributes in the right place indicates a successful recognitionBut no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

34

With the addition of a few pointers we have a parser.
Augment the "Completer" to point to where we came from.

35

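[Figure: chart augmented with structural information; completed states carry backpointers to earlier states, labelled S8 through S13]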

All the possible parses for an input are in the table.

We just need to read off all the backpointers from every complete S in the last column of the table.

Find all the S -> X • [0,N]

Follow the structural traces from the Completer.

Of course, this won't be polynomial time, since there could be an exponential number of trees.

We can at least represent ambiguity efficiently.

37

Depth-first search will never terminate if the grammar is left-recursive (e.g. NP --> NP PP)

38

($A \overset{*}{\rightarrow} \alpha A \beta$, $\alpha \overset{*}{\rightarrow} \varepsilon$)

Solutions: Rewrite the grammar (automatically) to a weakly equivalent one which is not left-recursive.
e.g. The man on the hill with the telescope…
NP --> NP PP (wanted: Nom plus a sequence of PPs)
NP --> Nom PP
NP --> Nom
Nom --> Det N
…becomes…
NP --> Nom NP'
Nom --> Det N
NP' --> PP NP' (wanted: a sequence of PPs)
NP' --> ε
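The rewrite for immediate left recursion follows the standard pattern: A --> A α | β becomes A --> β A', A' --> α A' | ε. A minimal sketch, assuming rules are stored as tuples of symbols with the empty tuple standing for ε; the function name is hypothetical. The example input leaves out the rule NP --> Nom PP, which the new NP' rules already cover.

```python
def remove_immediate_left_recursion(lhs, productions):
    """Rewrite A -> A alpha | beta as A -> beta A', A' -> alpha A' | epsilon.

    `productions` is a list of tuples of symbols; () stands for epsilon.
    """
    alphas = [p[1:] for p in productions if p and p[0] == lhs]   # recursive tails
    betas = [p for p in productions if not p or p[0] != lhs]     # non-recursive
    if not alphas:
        return {lhs: list(productions)}        # nothing to rewrite
    new = lhs + "'"
    return {
        lhs: [beta + (new,) for beta in betas],
        new: [alpha + (new,) for alpha in alphas] + [()],
    }


rules = [("NP", "PP"), ("Nom",)]               # NP --> NP PP | Nom
rewritten = remove_immediate_left_recursion("NP", rules)
for left, expansions in rewritten.items():
    for rhs in expansions:
        print(left, "-->", " ".join(rhs) if rhs else "ε")
```

This handles only immediate left recursion. Indirect cases such as NP --> Nom PP together with Nom --> NP need the rules to be substituted into each other first.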

Not so obvious what these rules mean…

Harder to detect and eliminate non-immediate left recursion:

NP --> Nom PP
Nom --> NP

Fix depth of search explicitly

Rule ordering: non-recursive rules first
NP --> Det Nom
NP --> NP PP
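Fixing the depth of search can be illustrated with a small top-down, depth-first recognizer that gives up on a branch once a depth bound is reached. This is a sketch under assumptions: the grammar, lexicon, function name and the bound of 6 are all chosen for illustration. The left-recursive rule is deliberately listed first, so without the bound the search would keep re-expanding NP at the same position without consuming any input.

```python
# Hypothetical toy grammar and lexicon; NP --> NP PP is left-recursive.
GRAMMAR = {
    "NP": [("NP", "PP"), ("Det", "Nom")],
    "Nom": [("Noun",)],
    "PP": [("Prep", "NP")],
}
LEXICON = {"the": {"Det"}, "man": {"Noun"}, "hill": {"Noun"}, "on": {"Prep"}}


def expand(symbol, words, pos, depth):
    """Yield each input position where a `symbol` starting at `pos` can end."""
    if depth == 0:                        # depth bound reached: abandon branch
        return
    if symbol not in GRAMMAR:             # POS category: match a single word
        if pos < len(words) and symbol in LEXICON.get(words[pos], ()):
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:           # try each expansion in rule order
        ends = [pos]
        for sym in rhs:
            ends = [e2 for e in ends
                    for e2 in expand(sym, words, e, depth - 1)]
        yield from ends


words = "the man on the hill".split()
print(sorted(set(expand("NP", words, 0, depth=6))))
```

The bound trades completeness for termination: a bound that is too small misses legitimate analyses (with a depth of 2 this example finds nothing), so it has to be chosen with the expected sentence length in mind.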

40

Multiple legal structures:
Attachment (e.g. I saw a man on a hill with a telescope)
Coordination (e.g. younger cats and dogs)
NP bracketing (e.g. Spanish language teachers)

41

42

NP vs VP Attachment

Solution: Return all possible parses and disambiguate using "other methods"

43

Parsing is a search problem which may be implemented with many control strategies.
Top-Down or Bottom-Up approaches each have problems.
Combining the two solves some but not all issues:
Left recursion
Syntactic ambiguity
Next time: Making use of statistical information about syntactic constituents.
Read Ch. 14

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done?
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up



  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • Whatrsquos rightwrong withhellip
  • Some Solutions
  • Earley Parsing
  • States
  • StatesLocations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up
Page 11: Basic Parsing with Context- Free Grammarskathy/NLP/ClassSlides/Class7... · 2009-09-29 · `Declarative formalisms like CFGs, FSAs define the legal strings of a language-- but only

Builds from the root S node to the leavesExpectation-basedCommon search strategy Top-down left-to-right backtracking Try first rule with LHS = S Next expand all constituents in these treesrules Continue until leaves are POS Backtrack when candidate POS does not match input string

11

ldquoThe old dog the footsteps of the youngrdquoWhere does backtracking happen

What are the computational disadvantages

What are the advantages

12

Parser begins with words of input and builds up trees applying grammar rules whose RHS matches

Det N V Det N Prep Det NThe old dog the footsteps of the young

Det Adj N Det N Prep Det NThe old dog the footsteps of the young

Parse continues until an S root node reached or no further node expansion possible

13

Det N V Det N Prep Det NThe old dog the footsteps of the youngDet Adj N Det N Prep Det N

14

When does disambiguation occur

What are the computational advantages and disadvantages

15

Top-Down parsers ndash they never explore illegal parses (eg which canrsquot form an S) -- but waste time on trees that can never match the inputBottom-Up parsers ndash they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)For both find a control strategy -- how explore search space efficiently Pursuing all parses in parallel or backtrack or hellip Which rule to apply next Which node to expand next

16

Dynamic Programming Approaches ndash Use a chart to represent partial results

CKY Parsing Algorithm Bottom-up Grammar must be in Normal Form The parse tree might not be consistent with linguistic

theoryEarly Parsing Algorithm Top-down Expectations about constituents are confirmed by input A POS tag for a word that is not predicted is never addedChart Parser

17

Allows arbitrary CFGsFills a table in a single sweep over the input words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

18

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

19

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

20

21

As with most dynamic programming approaches the answer is found by looking in the table in the right placeIn this case there should be an S state in the final column that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done S ndashgt α [0n+1]

22

March through chart left-to-rightAt each step apply 1 of 3 operators Predictor

Create new states representing top-down expectations Scanner

Match word predictions (rule with word after dot) to words

CompleterWhen a state is complete see what rules were looking for that completed constituent

23

Given a state
  With a non-terminal to the right of the dot (not a part-of-speech category)
  Create a new state for each expansion of the non-terminal
  Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends
  So predictor looking at
    S -> • VP [0,0]
  results in
    VP -> • Verb [0,0]
    VP -> • Verb NP [0,0]
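The predictor step can be sketched as follows. The state layout (a named tuple with `lhs`, `rhs`, `dot`, `start`, `end`) and the names `GRAMMAR` and `predictor` are assumptions for illustration; only the two VP rules come from the example above.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

# Hypothetical rule table; part-of-speech categories such as Verb are
# deliberately absent because the scanner, not the predictor, handles them.
GRAMMAR = {
    "S": [("VP",)],
    "VP": [("Verb",), ("Verb", "NP")],
}

def predictor(state, chart):
    """Add one dot-initial state per expansion of the non-terminal after the dot."""
    symbol = state.rhs[state.dot]
    for rhs in GRAMMAR[symbol]:
        # New states begin and end where the generating state ends.
        new = State(symbol, rhs, 0, state.end, state.end)
        if new not in chart[state.end]:        # never add a duplicate state
            chart[state.end].append(new)

chart = [[State("S", ("VP",), 0, 0, 0)]]       # S -> • VP [0,0]
predictor(chart[0][0], chart)
```

The duplicate check matters in practice: without it, a left-recursive rule would make the predictor add the same state forever.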

24

Given a state
  With a non-terminal to the right of the dot that is a part-of-speech category
  If the next word in the input matches this POS
  Create a new state with the dot moved over the non-terminal
  So scanner looking at VP -> • Verb NP [0,0]
  If the next word, "book", can be a verb, add new state:
    VP -> Verb • NP [0,1]
  Add this state to the chart entry following the current one
  Note: the Earley algorithm uses top-down input to disambiguate POS. Only a POS predicted by some state can get added to the chart.
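A minimal sketch of the scanner as described here, where the dot moves directly over the part-of-speech category. The `LEXICON` contents and the function name `scanner` are hypothetical; the state layout is an assumed named tuple.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

# Hypothetical lexicon: each word maps to the POS tags it may carry.
LEXICON = {"book": {"Verb", "Noun"}, "that": {"Det"}, "flight": {"Noun"}}

def scanner(state, words, chart):
    """Move the dot over a POS category if the next input word can carry that tag."""
    pos = state.rhs[state.dot]
    if state.end < len(words) and pos in LEXICON.get(words[state.end], set()):
        new = State(state.lhs, state.rhs, state.dot + 1, state.start, state.end + 1)
        if new not in chart[state.end + 1]:
            chart[state.end + 1].append(new)   # goes into the following chart entry

words = ["book", "that", "flight"]
chart = [[] for _ in range(len(words) + 1)]
waiting = State("VP", ("Verb", "NP"), 0, 0, 0)   # VP -> • Verb NP [0,0]
chart[0].append(waiting)
scanner(waiting, words, chart)
```

Note that "book" is also listed as a noun, but no Noun reading enters the chart here because no state predicted one at position 0.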

25

Applied to a state when its dot has reached the right end of the rule.
The parser has discovered a category over some span of input.
Find and advance all previous states that were looking for this category
  Copy the state, move the dot, insert in the current chart entry
Given:
  NP -> Det Nominal • [1,3]
  VP -> Verb • NP [0,1]
Add
  VP -> Verb NP • [0,3]
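The completer can be sketched in a few lines, reproducing the NP/VP example. As in any illustration of this kind, the named-tuple state layout and the function name `completer` are assumptions, and the chart is indexed by the position where each state ends.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

def completer(done, chart):
    """Advance every earlier state that was waiting for the category just completed."""
    # States waiting for this constituent end exactly where it starts.
    for waiting in list(chart[done.start]):
        if waiting.dot < len(waiting.rhs) and waiting.rhs[waiting.dot] == done.lhs:
            new = State(waiting.lhs, waiting.rhs, waiting.dot + 1,
                        waiting.start, done.end)
            if new not in chart[done.end]:
                chart[done.end].append(new)

chart = [[] for _ in range(4)]
chart[1].append(State("VP", ("Verb", "NP"), 1, 0, 1))    # VP -> Verb • NP [0,1]
np_done = State("NP", ("Det", "Nominal"), 2, 1, 3)       # NP -> Det Nominal • [1,3]
chart[3].append(np_done)
completer(np_done, chart)
```

The waiting state is copied rather than modified, so it stays available if another NP starting at position 1 is completed later.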

26

Find a complete S state in the final column that spans from 0 to N

If that's the case, you're done: S -> α • [0,N]

27

More specifically...

1. Predict all the states you can up front

2. Read a word
   1. Extend states based on matches
   2. Add new predictions
   3. Go to 2

3. Look at the final chart entry (the (N+1)-th) to see if you have a winner

28

Book that flight
We should find... an S from 0 to 3 that is a completed state...
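Putting the three operators together gives a small recognizer that can be run on "book that flight". This is a sketch under assumptions: the rule table below is a hypothetical subset chosen so the sentence is covered (it includes S -> VP and lexicon entries for "book" and "flight"), and `recognize` is an illustrative name.

```python
from collections import namedtuple

State = namedtuple("State", "lhs rhs dot start end")

# Hypothetical grammar fragment and lexicon, assumed for this example.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nom"), ("PropN",)],
    "Nom": [("N",), ("N", "Nom"), ("Adj", "Nom")],
    "VP": [("V",), ("V", "NP")],
}
LEXICON = {"book": {"V", "N"}, "that": {"Det"}, "flight": {"N"}, "does": {"Aux"}}
POS = {"V", "N", "Det", "Aux", "Adj", "PropN", "Prep"}

def add(state, entry):
    if state not in entry:
        entry.append(state)

def recognize(words, start_symbol="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]                 # N+1 chart entries
    for rhs in GRAMMAR[start_symbol]:
        add(State(start_symbol, rhs, 0, 0, 0), chart[0])
    for i in range(n + 1):
        for state in chart[i]:                         # entry grows while we scan it
            if state.dot == len(state.rhs):            # completer
                for w in list(chart[state.start]):
                    if w.dot < len(w.rhs) and w.rhs[w.dot] == state.lhs:
                        add(State(w.lhs, w.rhs, w.dot + 1, w.start, i), chart[i])
            elif state.rhs[state.dot] in POS:          # scanner
                if i < n and state.rhs[state.dot] in LEXICON.get(words[i], ()):
                    add(State(state.lhs, state.rhs, state.dot + 1,
                              state.start, i + 1), chart[i + 1])
            else:                                      # predictor
                symbol = state.rhs[state.dot]
                for rhs in GRAMMAR[symbol]:
                    add(State(symbol, rhs, 0, i, i), chart[i])
    # Success means a complete S spanning the whole input in the last entry.
    return any(s.lhs == start_symbol and s.start == 0 and s.dot == len(s.rhs)
               for s in chart[n])

ok = recognize("book that flight".split())
```

For the three-word input the last chart entry is `chart[3]`, and the winning state is the complete S covering positions 0 to 3.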

29

S -> NP VP          VP -> V
S -> Aux NP VP      PP -> Prep NP
NP -> Det Nom       N -> old | dog | footsteps | young
NP -> PropN         V -> dog | include | prefer
Nom -> Adj Nom      Aux -> does
Nom -> N            Prep -> from | to | on | of
Nom -> N Nom        PropN -> Bush | McCain | Obama
Nom -> Nom PP       Det -> that | this | a | the
VP -> V NP          Adj -> old | green | red

31

32

33

What kind of algorithms did we just describe?
  Not parsers: recognizers
  The presence of an S state with the right attributes in the right place indicates a successful recognition.
  But no parse tree... no parser
  That's how we solve (not) an exponential problem in polynomial time

34

With the addition of a few pointers we have a parser
Augment the "Completer" to point to where we came from
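One simple way to turn the recognizer into a parser is sketched below. It is a simplification, not the pointer scheme itself: each state carries the subtrees of its recognized children directly (the `children` field, an assumed name) instead of pointers to other states. That is easy to read but gives up the compact shared representation, since two analyses of the same span become two distinct states. The grammar and lexicon are hypothetical.

```python
from collections import namedtuple

# children: subtrees recognized so far, standing in for "where we came from"
State = namedtuple("State", "lhs rhs dot start end children")

GRAMMAR = {"S": [("VP",)], "VP": [("V", "NP")], "NP": [("Det", "N")]}
LEXICON = {"book": {"V"}, "that": {"Det"}, "flight": {"N"}}
POS = {"V", "Det", "N"}

def parse(words, start_symbol="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, i):
        if state not in chart[i]:
            chart[i].append(state)

    for rhs in GRAMMAR[start_symbol]:
        add(State(start_symbol, rhs, 0, 0, 0, ()), 0)
    for i in range(n + 1):
        for s in chart[i]:
            if s.dot == len(s.rhs):                    # completer records the subtree
                tree = (s.lhs,) + s.children
                for w in list(chart[s.start]):
                    if w.dot < len(w.rhs) and w.rhs[w.dot] == s.lhs:
                        add(w._replace(dot=w.dot + 1, end=i,
                                       children=w.children + (tree,)), i)
            elif s.rhs[s.dot] in POS:                  # scanner records a leaf
                if i < n and s.rhs[s.dot] in LEXICON.get(words[i], ()):
                    leaf = (s.rhs[s.dot], words[i])
                    add(s._replace(dot=s.dot + 1, end=i + 1,
                                   children=s.children + (leaf,)), i + 1)
            else:                                      # predictor
                for rhs in GRAMMAR[s.rhs[s.dot]]:
                    add(State(s.rhs[s.dot], rhs, 0, i, i, ()), i)
    return [(s.lhs,) + s.children for s in chart[n]
            if s.lhs == start_symbol and s.start == 0 and s.dot == len(s.rhs)]

trees = parse(["book", "that", "flight"])
```

Each tree is a nested tuple whose first element is the category, so the single result here is an S over a VP containing the verb and the NP.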

35

[Figure: augmenting the chart with structural information (states labeled S8 to S13)]

All the possible parses for an input are in the table

We just need to read off all the backpointers from every complete S in the last column of the table

Find all the S -> X • [0,N]

Follow the structural traces from the Completer

Of course, this won't be polynomial time, since there could be an exponential number of trees

We can at least represent ambiguity efficiently
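To get a feel for how quickly the number of trees can grow, consider the assumed worst case in which every binary bracketing of a sequence is a legal analysis (a long chain of attachable modifiers behaves roughly like this). The count is then a Catalan number; `catalan` is an illustrative helper, not something from the slides.

```python
from math import comb

def catalan(n: int) -> int:
    """Number of distinct binary bracketings of a sequence of n + 1 items."""
    return comb(2 * n, n) // (n + 1)

# Bracketings for sequences of 2 to 9 items.
counts = [catalan(n) for n in range(1, 9)]
```

The chart itself stays polynomial in size because it stores each constituent once; the blow-up only appears when every tree is enumerated.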

37

Depth-first search will never terminate if the grammar is left-recursive (e.g. NP -> NP PP)

38

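A grammar is left-recursive if it contains a non-terminal $A$ such that $A \Rightarrow^{*} \alpha A \beta$ with $\alpha \Rightarrow^{*} \varepsilon$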

Solutions:
Rewrite the grammar (automatically) to a weakly equivalent one which is not left-recursive
  e.g. The man on the hill with the telescope...
    NP -> NP PP      (wanted: Nom plus a sequence of PPs)
    NP -> Nom PP
    NP -> Nom
    Nom -> Det N
  ...becomes...
    NP -> Nom NP'
    Nom -> Det N
    NP' -> PP NP'    (wanted: a sequence of PPs)
    NP' -> ε
  Not so obvious what these rules mean...
Harder to detect and eliminate non-immediate left recursion
    NP -> Nom PP
    Nom -> NP
Fix depth of search explicitly
Rule ordering: non-recursive rules first
    NP -> Det Nom
    NP -> NP PP
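The grammar rewrite for immediate left recursion can be sketched with the standard textbook transformation. The function name and the tuple-based rule encoding are assumptions; the empty tuple stands for ε. Non-immediate left recursion (such as NP -> Nom PP with Nom -> NP) is not handled by this sketch.

```python
def remove_immediate_left_recursion(nt, rules):
    """Rewrite A -> A x | y as A -> y A' and A' -> x A' | epsilon.

    rules is a list of right-hand sides (tuples of symbols) for non-terminal nt.
    Returns a dict mapping each non-terminal to its new right-hand sides.
    """
    recursive = [rhs[1:] for rhs in rules if rhs and rhs[0] == nt]
    others = [rhs for rhs in rules if not rhs or rhs[0] != nt]
    if not recursive:
        return {nt: list(rules)}              # nothing to do
    new_nt = nt + "'"
    return {
        nt: [rhs + (new_nt,) for rhs in others],
        new_nt: [tail + (new_nt,) for tail in recursive] + [()],   # () is epsilon
    }

result = remove_immediate_left_recursion("NP", [("NP", "PP"), ("Nom",)])
```

Applied to NP -> NP PP | Nom, this yields NP -> Nom NP' together with NP' -> PP NP' | ε. The language is unchanged, but the trees are different, which is why the equivalence is only weak.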

40

Multiple legal structures
  Attachment (e.g. I saw a man on a hill with a telescope)
  Coordination (e.g. younger cats and dogs)
  NP bracketing (e.g. Spanish language teachers)

41

42

NP vs. VP Attachment

Solution: return all possible parses and disambiguate using "other methods"

43

Parsing is a search problem which may be implemented with many control strategies
  Top-down or bottom-up approaches each have problems
  Combining the two solves some but not all issues
    Left recursion
    Syntactic ambiguity
Next time: making use of statistical information about syntactic constituents
  Read Ch. 14

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with...
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done?
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up

"The old dog the footsteps of the young."
Where does backtracking happen?

What are the computational disadvantages?

What are the advantages?

12

Parser begins with words of input and builds up trees, applying grammar rules whose RHS matches

Det N   V   Det N         Prep Det N
The old dog the footsteps of   the young

Det Adj N   Det N         Prep Det N
The old dog the footsteps of   the young

Parse continues until an S root node is reached or no further node expansion is possible

13

Det N   V   Det N         Prep Det N
The old dog the footsteps of   the young
Det Adj N   Det N         Prep Det N

14

When does disambiguation occur?

What are the computational advantages and disadvantages?

15

When does disambiguation occur

What are the computational advantages and disadvantages

15

Top-Down parsers ndash they never explore illegal parses (eg which canrsquot form an S) -- but waste time on trees that can never match the inputBottom-Up parsers ndash they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root)For both find a control strategy -- how explore search space efficiently Pursuing all parses in parallel or backtrack or hellip Which rule to apply next Which node to expand next

16

Dynamic Programming Approaches ndash Use a chart to represent partial results

CKY Parsing Algorithm Bottom-up Grammar must be in Normal Form The parse tree might not be consistent with linguistic

theoryEarly Parsing Algorithm Top-down Expectations about constituents are confirmed by input A POS tag for a word that is not predicted is never addedChart Parser

17

Allows arbitrary CFGsFills a table in a single sweep over the input words Table is length N+1 N is number of words Table entries represent

Completed constituents and their locationsIn-progress constituentsPredicted constituents

18

The table-entries are called states and are represented with dotted-rulesS -gt VP A VP is predicted

NP -gt Det Nominal An NP is in progress

VP -gt V NP A VP has been found

19

It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

20

21

As with most dynamic programming approaches the answer is found by looking in the table in the right placeIn this case there should be an S state in the final column that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done S ndashgt α [0n+1]

22

March through chart left-to-rightAt each step apply 1 of 3 operators Predictor

Create new states representing top-down expectations Scanner

Match word predictions (rule with word after dot) to words

CompleterWhen a state is complete see what rules were looking for that completed constituent

23

Given a state With a non-terminal to right of dot (not a part-

of-speech category) Create a new state for each expansion of the

non-terminal Place these new states into same chart entry as

generated state beginning and ending where generating state ends So predictor looking at

S -gt VP [00] results in

VP -gt Verb [00]VP -gt Verb NP [00]

24

Given a state With a non-terminal to right of dot that is a part-of-

speech category If the next word in the input matches this POS Create a new state with dot moved over the non-

terminal So scanner looking at VP -gt Verb NP [00] If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01] Add this state to chart entry following current one Note Earley algorithm uses top-down input to

disambiguate POS Only POS predicted by some state can get added to chart

25

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category copy state move dot insert in current chart entryGiven NP -gt Det Nominal [13] VP -gt Verb NP [01]Add VP -gt Verb NP [03]

26

Find an S state in the final column that spans from 0 to n+1 and is complete

If thatrsquos the case yoursquore done S ndashgt α [0n+1]

27

More specificallyhellip

1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

28

Book that flightWe should findhellip an S from 0 to 3 that is a completed statehellip

29

S NP VP VP VS Aux NP VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young

NP PropN V dog | include | preferNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red

31

32

33

What kind of algorithms did we just describe Not parsers ndash recognizers

The presence of an S state with the right attributes in the right place indicates a successful recognitionBut no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

34

With the addition of a few pointers we have a parser. Augment the "Completer" to point to where we came from.

35

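[Figure: Augmenting the chart with structural information; chart states annotated with backpointers such as S8, S9, S10, S11, S12, S13]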

All the possible parses for an input are in the table.

We just need to read off all the backpointers from every complete S in the last column of the table.

Find all the S -> X •, [0,N]

Follow the structural traces from the Completer.

Of course, this won't be polynomial time, since there could be an exponential number of trees.

We can at least represent ambiguity efficiently.
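The idea of reading trees off the chart can be sketched in Python as below. This is a simplified variant, not the backpointer scheme of the slides: instead of pointers, each state carries the subtrees of the constituents it has already recognized, which makes retrieval trivial but does not pack ambiguity efficiently. The grammar, lexicon, and the shortened sentence are assumptions for illustration, and the sketch assumes no empty rules and no unary cycles.

```python
# Hypothetical grammar with both NP and VP attachment of PPs.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Pron",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {"i": {"Pron"}, "saw": {"V"}, "a": {"Det"}, "man": {"N"},
           "with": {"Prep"}, "telescope": {"N"}}


def earley_parse(words, grammar, lexicon, start_symbol="S"):
    """Return every parse tree as nested tuples: (label, child, child, ...)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    # State: (lhs, rhs, dot, start, children); its end is the chart index.
    for rhs in grammar[start_symbol]:
        add((start_symbol, rhs, 0, 0, ()), 0)

    for i in range(n + 1):
        for lhs, rhs, dot, start, children in chart[i]:
            if dot == len(rhs):  # Completer: hand the finished subtree to waiting states
                subtree = (lhs,) + children
                for l2, r2, d2, s2, c2 in chart[start]:
                    if d2 < len(r2) and r2[d2] == lhs:
                        add((l2, r2, d2 + 1, s2, c2 + (subtree,)), i)
            elif rhs[dot] in grammar:  # Predictor
                for expansion in grammar[rhs[dot]]:
                    add((rhs[dot], expansion, 0, i, ()), i)
            elif i < n and rhs[dot] in lexicon.get(words[i], set()):  # Scanner
                leaf = (rhs[dot], words[i])
                add((lhs, rhs, dot + 1, start, children + (leaf,)), i + 1)

    return [
        (lhs,) + children
        for lhs, rhs, dot, start, children in chart[n]
        if lhs == start_symbol and dot == len(rhs) and start == 0
    ]


trees = earley_parse("i saw a man with a telescope".split(), GRAMMAR, LEXICON)
for tree in trees:
    print(tree)
```

On this sentence the sketch yields one tree with the PP attached to the VP and one with it attached to the NP. A production parser would share subtrees through pointers rather than copy them into every state.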

37

Depth-first search will never terminate if the grammar is left recursive (e.g. NP --> NP PP)

38

($A \Rightarrow^{*} \alpha A \beta$, where $\alpha \Rightarrow^{*} \varepsilon$)

Solutions: Rewrite the grammar (automatically) to a weakly equivalent one which is not left-recursive,

e.g. The man on the hill with the telescope…

NP --> NP PP (wanted: Nom plus a sequence of PPs)
NP --> Nom PP
NP --> Nom
Nom --> Det N

…becomes…

NP --> Nom NP'
Nom --> Det N
NP' --> PP NP' (wanted: a sequence of PPs)
NP' --> ε
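The rewrite above can be done mechanically. The following Python sketch applies the standard transformation for immediate left recursion (A --> A β | α becomes A --> α A', A' --> β A' | ε). The function name and the dict-of-tuples grammar encoding are assumptions for illustration, and the input omits NP --> Nom PP because the new NP' already covers a following PP.

```python
def remove_immediate_left_recursion(grammar):
    """Rewrite A -> A beta | alpha as A -> alpha A', A' -> beta A' | epsilon."""
    rewritten = {}
    for lhs, expansions in grammar.items():
        # Tails (beta) of the left-recursive rules, and the remaining rules (alpha).
        tails = [rhs[1:] for rhs in expansions if rhs and rhs[0] == lhs]
        others = [rhs for rhs in expansions if not rhs or rhs[0] != lhs]
        if not tails:
            rewritten[lhs] = list(expansions)
            continue
        fresh = lhs + "'"  # new non-terminal, e.g. NP'
        rewritten[lhs] = [rhs + (fresh,) for rhs in others]
        # The empty tuple () stands for the epsilon expansion.
        rewritten[fresh] = [tail + (fresh,) for tail in tails] + [()]
    return rewritten


left_recursive = {
    "NP": [("NP", "PP"), ("Nom",)],
    "Nom": [("Det", "N")],
}
fixed = remove_immediate_left_recursion(left_recursive)
for lhs, expansions in fixed.items():
    print(lhs, expansions)
```

This handles only immediate left recursion; chains such as NP --> Nom PP, Nom --> NP need the more general procedure.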

Not so obvious what these rules mean…

Harder to detect and eliminate non-immediate left recursion:

NP --> Nom PP
Nom --> NP

Fix depth of search explicitly

Rule ordering: non-recursive rules first

NP --> Det Nom
NP --> NP PP

40

Multiple legal structures:
Attachment (e.g. I saw a man on a hill with a telescope)
Coordination (e.g. younger cats and dogs)
NP bracketing (e.g. Spanish language teachers)
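To get a feel for how quickly bracketing ambiguity grows, note that the number of distinct binary bracketings of a sequence of n items is the Catalan number C(n-1). The short Python sketch below computes it; the function name `bracketings` is illustrative.

```python
from math import comb


def bracketings(n_items):
    """Number of binary bracketings of n items: the Catalan number C(n - 1)."""
    k = n_items - 1
    return comb(2 * k, k) // (k + 1)


# "Spanish language teachers" has 3 words, hence 2 bracketings:
# ((Spanish language) teachers) and (Spanish (language teachers)).
for n in (3, 4, 10):
    print(n, bracketings(n))
```

The count grows exponentially with the length of the sequence, which is why enumerating every tree cannot stay polynomial.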

41

42

NP vs. VP Attachment

Solution: Return all possible parses and disambiguate using "other methods"

43

Parsing is a search problem which may be implemented with many control strategies.
Top-Down or Bottom-Up approaches each have problems.
Combining the two solves some but not all issues:
Left recursion
Syntactic ambiguity

Next time: Making use of statistical information about syntactic constituents. Read Ch. 14.

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up

Top-Down parsers – they never explore illegal parses (e.g. which can't form an S) -- but waste time on trees that can never match the input.
Bottom-Up parsers – they never explore trees inconsistent with input -- but waste time exploring illegal parses (with no S root).
For both: find a control strategy -- how to explore the search space efficiently?
Pursuing all parses in parallel or backtrack or …?
Which rule to apply next?
Which node to expand next?

16

Dynamic Programming Approaches – Use a chart to represent partial results

CKY Parsing Algorithm
Bottom-up
Grammar must be in Chomsky Normal Form
The parse tree might not be consistent with linguistic theory

Earley Parsing Algorithm
Top-down
Expectations about constituents are confirmed by input
A POS tag for a word that is not predicted is never added

Chart Parser

17

Allows arbitrary CFGs
Fills a table in a single sweep over the input words
Table is length N+1; N is number of words
Table entries represent:
Completed constituents and their locations
In-progress constituents
Predicted constituents

18

The table entries are called states and are represented with dotted rules:

S -> • VP               A VP is predicted

NP -> Det • Nominal     An NP is in progress

VP -> V NP •            A VP has been found

19

It would be nice to know where these things are in the input, so…

S -> • VP, [0,0]               A VP is predicted at the start of the sentence

NP -> Det • Nominal, [1,2]     An NP is in progress; the Det goes from 1 to 2

VP -> V NP •, [0,3]            A VP has been found starting at 0 and ending at 3
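One possible way to represent such a state in code is a small immutable record holding the rule, the dot position, and the span. The Python sketch below is an assumption for illustration; the class name `State` and its field names do not come from the slides.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class State:
    lhs: str               # left-hand side of the rule
    rhs: Tuple[str, ...]   # right-hand side symbols
    dot: int               # how many rhs symbols have been recognized so far
    start: int             # input position where the constituent begins
    end: int               # input position reached so far

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self) -> Optional[str]:
        """Symbol immediately right of the dot, or None if the state is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"{self.lhs} -> {' '.join(symbols)}, [{self.start},{self.end}]"


predicted = State("S", ("VP",), 0, 0, 0)
in_progress = State("NP", ("Det", "Nominal"), 1, 1, 2)
found = State("VP", ("V", "NP"), 2, 0, 3)
for state in (predicted, in_progress, found):
    print(state)
```

Making the record frozen keeps states hashable, so a chart entry can cheaply check whether a state is already present.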

20

21

As with most dynamic programming approaches, the answer is found by looking in the table in the right place.
In this case, there should be an S state in the final column that spans from 0 to N (the whole input) and is complete.
If that's the case, you're done: S -> α •, [0,N]

22

March through the chart left-to-right. At each step, apply 1 of 3 operators:

Predictor: Create new states representing top-down expectations

Scanner: Match word predictions (rule with word after dot) to words

Completer: When a state is complete, see what rules were looking for that completed constituent

23

Given a state with a non-terminal to the right of the dot (not a part-of-speech category): create a new state for each expansion of the non-terminal. Place these new states into the same chart entry as the generating state, beginning and ending where the generating state ends.

So the predictor looking at

S -> • VP, [0,0] results in

VP -> • Verb, [0,0]
VP -> • Verb NP, [0,0]

24





It would be nice to know where these things are in the input sohellipS -gt VP [00] A VP is predicted at the

start of the sentence

NP -gt Det Nominal [12] An NP is in progress the Det goes from 1 to 2

VP -gt V NP [03] A VP has been found starting at 0 and ending at 3

20

21

As with most dynamic programming approaches the answer is found by looking in the table in the right placeIn this case there should be an S state in the final column that spans from 0 to n+1 and is completeIf thatrsquos the case yoursquore done S ndashgt α [0n+1]

22

March through chart left-to-rightAt each step apply 1 of 3 operators Predictor

Create new states representing top-down expectations Scanner

Match word predictions (rule with word after dot) to words

CompleterWhen a state is complete see what rules were looking for that completed constituent

23

Given a state With a non-terminal to right of dot (not a part-

of-speech category) Create a new state for each expansion of the

non-terminal Place these new states into same chart entry as

generated state beginning and ending where generating state ends So predictor looking at

S -gt VP [00] results in

VP -gt Verb [00]VP -gt Verb NP [00]

24

Given a state With a non-terminal to right of dot that is a part-of-

speech category If the next word in the input matches this POS Create a new state with dot moved over the non-

terminal So scanner looking at VP -gt Verb NP [00] If the next word ldquobookrdquo can be a verb add new state

VP -gt Verb NP [01] Add this state to chart entry following current one Note Earley algorithm uses top-down input to

disambiguate POS Only POS predicted by some state can get added to chart

25

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category copy state move dot insert in current chart entryGiven NP -gt Det Nominal [13] VP -gt Verb NP [01]Add VP -gt Verb NP [03]

26

Find an S state in the final column that spans from 0 to n+1 and is complete

If thatrsquos the case yoursquore done S ndashgt α [0n+1]

27

More specificallyhellip

1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

28

Book that flightWe should findhellip an S from 0 to 3 that is a completed statehellip

29

S NP VP VP VS Aux NP VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young

NP PropN V dog | include | preferNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red


What kind of algorithms did we just describe? Not parsers – recognizers
The presence of an S state with the right attributes in the right place indicates a successful recognition
But no parse tree… no parser
That's how we solve (not) an exponential problem in polynomial time

With the addition of a few pointers we have a parser
Augment the "Completer" to point to where we came from

[Figure: augmenting the chart with structural information (chart states S8 through S13)]

All the possible parses for an input are in the table

We just need to read off all the backpointers from every complete S in the last column of the table

Find all the S → X • [0,N+1]

Follow the structural traces from the Completer

Of course, this won't be polynomial time, since there could be an exponential number of trees

We can at least represent ambiguity efficiently
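To make the backpointer idea concrete, here is a small sketch of an Earley-style parser that records, in each state, the completed children it was built from. Everything in it is an assumption made for illustration: the `kids` field, the toy grammar and lexicon, and the choice to keep the backpointers inside the state itself. That last choice is the simplest one to write down, but it stores each distinct derivation as a separate state, so it does not give the compact shared representation of ambiguity mentioned above.

```python
from collections import namedtuple

# kids holds backpointers: completed child states, or (POS, word) pairs.
State = namedtuple("State", "lhs rhs dot start end kids")

GRAMMAR = {   # hypothetical toy grammar with an attachment ambiguity
    "S": [("VP",)],
    "VP": [("Verb", "NP"), ("VP", "PP")],
    "NP": [("Det", "Noun"), ("NP", "PP")],
    "PP": [("Prep", "NP")],
}
LEXICON = {"book": "Verb", "that": "Det", "the": "Det",
           "flight": "Noun", "plane": "Noun", "on": "Prep"}
POS = set(LEXICON.values())

def tree(s):
    """Follow backpointers to build a nested (label, children...) tuple."""
    return (s.lhs,) + tuple(tree(k) if isinstance(k, State) else k
                            for k in s.kids)

def parse(words):
    n = len(words)
    chart = [[] for _ in range(n + 1)]

    def add(s):
        if s not in chart[s.end]:
            chart[s.end].append(s)

    for rhs in GRAMMAR["S"]:
        add(State("S", rhs, 0, 0, 0, ()))
    for i in range(n + 1):
        for s in chart[i]:
            if s.dot == len(s.rhs):                       # completer
                for w in chart[s.start]:
                    if w.dot < len(w.rhs) and w.rhs[w.dot] == s.lhs:
                        # Record where we came from: append the finished child.
                        add(w._replace(dot=w.dot + 1, end=i, kids=w.kids + (s,)))
            elif s.rhs[s.dot] in POS:                     # scanner
                pos = s.rhs[s.dot]
                if i < n and LEXICON.get(words[i]) == pos:
                    add(s._replace(dot=s.dot + 1, end=i + 1,
                                   kids=s.kids + ((pos, words[i]),)))
            else:                                         # predictor
                for rhs in GRAMMAR[s.rhs[s.dot]]:
                    add(State(s.rhs[s.dot], rhs, 0, i, i, ()))
    # Read off a tree from every complete S in the last column.
    return [tree(s) for s in chart[n]
            if s.lhs == "S" and s.start == 0 and s.dot == len(s.rhs)]

for t in parse("book that flight on the plane".split()):
    print(t)
```

With this toy grammar the six-word input yields two trees, one attaching the PP to the VP and one attaching it to the NP.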

Depth-first search will never terminate if the grammar is left-recursive (e.g. NP → NP PP)

$(A \Rightarrow^{*} \alpha A \beta,\ \alpha \Rightarrow^{*} \epsilon)$

Solutions:
Rewrite the grammar (automatically?) to a weakly equivalent one which is not left-recursive
e.g. The man on the hill with the telescope…
NP → NP PP (wanted: Nom plus a sequence of PPs)
NP → Nom PP
NP → Nom
Nom → Det N
…becomes…
NP → Nom NP'
Nom → Det N
NP' → PP NP' (wanted: a sequence of PPs)
NP' → e
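The rewrite can be done mechanically for immediate left recursion. The sketch below uses the textbook transformation A → A β | α becomes A → α A', A' → β A' | ε; the function name and the tuple encoding of rules are assumptions made here. Note that the mechanical result keeps NP → Nom PP NP' alongside NP → Nom NP', whereas the slide drops the former as redundant, since NP' already covers any sequence of PPs.

```python
def remove_immediate_left_recursion(lhs, expansions):
    """Rewrite A -> A b | a  as  A -> a A',  A' -> b A' | e.

    Rules are tuples of symbols; the empty tuple stands for the empty string.
    """
    recursive = [rhs[1:] for rhs in expansions if rhs and rhs[0] == lhs]
    other = [rhs for rhs in expansions if not rhs or rhs[0] != lhs]
    if not recursive:
        return {lhs: list(expansions)}       # nothing to do
    new = lhs + "'"
    return {
        lhs: [rhs + (new,) for rhs in other],
        new: [tail + (new,) for tail in recursive] + [()],
    }

rules = [("NP", "PP"), ("Nom", "PP"), ("Nom",)]
result = remove_immediate_left_recursion("NP", rules)
for left, expansions in result.items():
    for rhs in expansions:
        print(left, "->", " ".join(rhs) or "e")
```

The new grammar accepts the same strings but assigns different trees, which is what "weakly equivalent" means here.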

Not so obvious what these rules mean…

Harder to detect and eliminate non-immediate left recursion
NP → Nom PP
Nom → NP

Fix depth of search explicitly

Rule ordering: non-recursive rules first
NP → Det Nom
NP → NP PP
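Non-immediate left recursion can at least be detected by following first symbols from rule to rule. The sketch below assumes that no rule derives the empty string, so only the first symbol of each right-hand side matters; the function name and the extra PP rule are hypothetical additions for illustration.

```python
def left_recursive_symbols(grammar):
    """Return the non-terminals A that can derive a string starting with A.

    Assumes no rule derives the empty string, so only the first symbol
    of each right-hand side needs to be followed.
    """
    first = {a: {rhs[0] for rhs in rules if rhs} for a, rules in grammar.items()}
    result = set()
    for a in grammar:
        seen, stack = set(), list(first[a])
        while stack:                      # depth-first walk over first symbols
            sym = stack.pop()
            if sym in seen:
                continue
            seen.add(sym)
            stack.extend(first.get(sym, ()))
        if a in seen:                     # A is reachable from itself
            result.add(a)
    return result

grammar = {
    "NP": [("Nom", "PP")],
    "Nom": [("NP",)],
    "PP": [("Prep", "NP")],               # hypothetical extra rule
}
print(sorted(left_recursive_symbols(grammar)))
```

Here NP and Nom are both reported, even though neither rule mentions its own left-hand side as its first symbol.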

Multiple legal structures
Attachment (e.g. I saw a man on a hill with a telescope)
Coordination (e.g. younger cats and dogs)
NP bracketing (e.g. Spanish language teachers)

NP vs. VP Attachment

Solution? Return all possible parses and disambiguate using "other methods"

Parsing is a search problem which may be implemented with many control strategies
Top-Down or Bottom-Up approaches each have problems
Combining the two solves some but not all issues
Left recursion
Syntactic ambiguity
Next time: Making use of statistical information about syntactic constituents
Read Ch 14

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up
Page 26: Basic Parsing with Context- Free Grammarskathy/NLP/ClassSlides/Class7... · 2009-09-29 · `Declarative formalisms like CFGs, FSAs define the legal strings of a language-- but only

Applied to a state when its dot has reached right end of roleParser has discovered a category over some span of inputFind and advance all previous states that were looking for this category copy state move dot insert in current chart entryGiven NP -gt Det Nominal [13] VP -gt Verb NP [01]Add VP -gt Verb NP [03]

26

Find an S state in the final column that spans from 0 to n+1 and is complete

If thatrsquos the case yoursquore done S ndashgt α [0n+1]

27

More specificallyhellip

1 Predict all the states you can upfront

2 Read a word1 Extend states based on matches2 Add new predictions3 Go to 2

3 Look at N+1 to see if you have a winner

28

Book that flightWe should findhellip an S from 0 to 3 that is a completed statehellip

29

S NP VP VP VS Aux NP VP PP -gt Prep NPNP Det Nom N old | dog | footsteps |

young

NP PropN V dog | include | preferNom -gt Adj Nom Aux doesNom N Prep from | to | on | ofNom N Nom PropN Bush | McCain |

ObamaNom Nom PP Det that | this | a| theVP V NP Adj -gt old | green | red

31

32

33

What kind of algorithms did we just describe Not parsers ndash recognizers

The presence of an S state with the right attributes in the right place indicates a successful recognitionBut no parse treehellip no parserThatrsquos how we solve (not) an exponential problem in polynomial time

34

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

35

S8S9

S10

S11

S13S12

S8

S9S8

All the possible parses for an input are in the table

We just need to read off all the backpointers from every complete S in the last column of the table

Find all the S -gt X [0N+1]

Follow the structural traces from the Completer

Of course this wonrsquot be polynomial time since there could be an exponential number of trees

We can at least represent ambiguity efficiently

37

Depth-first search will never terminate if grammar is left recursive (eg NP --gt NP PP)

38

)( εαα ⎯rarr⎯ΑΒ⎯rarr⎯Α

Solutions Rewrite the grammar (automatically) to a weakly

equivalent one which is not left-recursiveeg The man on the hill with the telescopehellipNP NP PP (wanted Nom plus a sequence of PPs)NP Nom PPNP NomNom Det NhellipbecomeshellipNP Nom NPrsquoNom Det NNPrsquo PP NPrsquo (wanted a sequence of PPs)NPrsquo e

Not so obvious what these rules mean…

Harder to detect and eliminate non-immediate left recursion

NP --> Nom PP
Nom --> NP

Fix depth of search explicitly

Rule ordering: non-recursive rules first
NP --> Det Nom
NP --> NP PP
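A minimal sketch of the last two ideas together: a top-down, depth-first recognizer that lists the non-recursive NP rule first and gives up on any branch deeper than an explicit bound. The grammar fragment, lexicon, and the `max_depth` parameter are assumptions for illustration.

```python
# Hypothetical fragment; the non-recursive NP rule is listed first.
GRAMMAR = {
    "NP": [("Det", "Nom"), ("NP", "PP")],
    "Nom": [("N",)],
    "PP": [("P", "NP")],
}
LEXICON = {"the": "Det", "man": "N", "hill": "N", "on": "P"}

def expand(symbol, words, pos, depth):
    """Yield every position where `symbol` can end when it starts at `pos`."""
    if depth == 0:
        return                      # depth bound reached: abandon this branch
    if symbol not in GRAMMAR:       # part-of-speech symbol: match one word
        if pos < len(words) and LEXICON.get(words[pos]) == symbol:
            yield pos + 1
        return
    for rhs in GRAMMAR[symbol]:     # rules are tried in the listed order
        ends = [pos]
        for part in rhs:
            ends = [e2 for e in ends
                    for e2 in expand(part, words, e, depth - 1)]
        yield from ends

def recognize(words, start="NP", max_depth=8):
    return len(words) in expand(start, words, 0, max_depth)
```

Without the `depth` check, expanding NP --> NP PP would recurse on NP at the same position forever. With it the search terminates, but a bound that is too small rejects valid input: `recognize("the man on the hill".split(), max_depth=3)` fails even though the default bound accepts the phrase.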

40

Multiple legal structures
Attachment (e.g., I saw a man on a hill with a telescope)
Coordination (e.g., younger cats and dogs)
NP bracketing (e.g., Spanish language teachers)

41

42

NP vs VP Attachment

Solution: Return all possible parses and disambiguate using "other methods"

43

Parsing is a search problem which may be implemented with many control strategies
Top-Down or Bottom-Up approaches each have problems
Combining the two solves some but not all issues
Left recursion
Syntactic ambiguity
Next time: Making use of statistical information about syntactic constituents
Read Ch 14

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • What's right/wrong with…
  • Some Solutions
  • Earley Parsing
  • States
  • States/Locations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done?
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up
Page 35: Basic Parsing with Context- Free Grammarskathy/NLP/ClassSlides/Class7... · 2009-09-29 · `Declarative formalisms like CFGs, FSAs define the legal strings of a language-- but only

With the addition of a few pointers we have a parserAugment the ldquoCompleterrdquo to point to where we came from

35

S8S9

S10

S11

S13S12

S8

S9S8

All the possible parses for an input are in the table

We just need to read off all the backpointers from every complete S in the last column of the table

Find all the S -gt X [0N+1]

Follow the structural traces from the Completer

Of course this wonrsquot be polynomial time since there could be an exponential number of trees

We can at least represent ambiguity efficiently

37

Depth-first search will never terminate if grammar is left recursive (eg NP --gt NP PP)

38

)( εαα ⎯rarr⎯ΑΒ⎯rarr⎯Α

Solutions Rewrite the grammar (automatically) to a weakly

equivalent one which is not left-recursiveeg The man on the hill with the telescopehellipNP NP PP (wanted Nom plus a sequence of PPs)NP Nom PPNP NomNom Det NhellipbecomeshellipNP Nom NPrsquoNom Det NNPrsquo PP NPrsquo (wanted a sequence of PPs)NPrsquo e

Not so obvious what these rules meanhellip

Harder to detect and eliminate non-immediate left recursion

NP --gt Nom PPNom --gt NP

Fix depth of search explicitly

Rule ordering non-recursive rules firstNP --gt Det NomNP --gt NP PP

40

Multiple legal structures Attachment (eg I saw a man on a hill with a

telescope) Coordination (eg younger cats and dogs) NP bracketing (eg Spanish language teachers)

41

42

NP vs VP Attachment

Solution Return all possible parses and disambiguate

using ldquoother methodsrdquo

43

Parsing is a search problem which may be implemented with many control strategies Top-Down or Bottom-Up approaches each have

problemsCombining the two solves some but not all issues

Left recursion Syntactic ambiguityNext time Making use of statistical information about syntactic constituents Read Ch 14

44

  • Slide Number 1
  • Announcements
  • Homework Questions
  • Evaluation
  • Syntactic Parsing
  • Syntactic Parsing
  • CFG Example
  • Modified CFG
  • Slide Number 9
  • Parsing as a Form of Search
  • Top-Down Parser
  • Rule Expansion
  • Bottom-Up Parsing
  • Slide Number 14
  • Bottom-up parsing
  • Whatrsquos rightwrong withhellip
  • Some Solutions
  • Earley Parsing
  • States
  • StatesLocations
  • Graphically
  • Earley
  • Earley Algorithm
  • Predictor
  • Scanner
  • Completer
  • How do we know we are done
  • Earley
  • Example
  • CFG for Fragment of English
  • Example
  • Example
  • Example
  • Details
  • Converting Earley from Recognizer to Parser
  • Augmenting the chart with structural information
  • Retrieving Parse Trees from Chart
  • Left Recursion vs Right Recursion
  • Slide Number 39
  • Slide Number 40
  • Another Problem: Structural ambiguity
  • Slide Number 42
  • Slide Number 43
  • Summing Up