Fall 2005 Lecture Notes #3 EECS 595 / LING 541 / SI 661 Natural Language Processing

Context-Free Grammars for English

Context-Free Rules and Trees

• Grammars

• CFG = PSG = BNF (context-free grammar, phrase-structure grammar, Backus-Naur form)

• Derivations, parse trees
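To make the terms concrete, here is a minimal sketch of a CFG stored as a dictionary from non-terminals to alternative right-hand sides, together with a leftmost derivation. The toy grammar and the helper name `leftmost_derivation` are hypothetical; the `choices` list simply says which alternative to apply at each step.

```python
# A toy CFG: each non-terminal maps to a list of alternative right-hand sides.
# Symbols that never appear as keys are terminals (words).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "Noun"]],
    "VP": [["Verb", "NP"]],
    "Det": [["the"], ["a"]],
    "Noun": [["dog"], ["ball"]],
    "Verb": [["chased"]],
}


def leftmost_derivation(choices):
    """Rewrite the leftmost non-terminal at each step.

    `choices[k]` is the index of the alternative used at step k.
    Returns every sentential form from S down to the final word string.
    """
    form = ["S"]
    steps = [list(form)]
    for choice in choices:
        # Locate the leftmost non-terminal in the current sentential form.
        i = next(k for k, sym in enumerate(form) if sym in GRAMMAR)
        form = form[:i] + GRAMMAR[form[i]][choice] + form[i + 1:]
        steps.append(list(form))
    return steps


for step in leftmost_derivation([0, 0, 0, 0, 0, 0, 0, 1, 1]):
    print(" ".join(step))
```

The last line printed is the derived sentence "the dog chased a ball"; the parse tree records which rule was applied at each step, independent of the order in which non-terminals were rewritten.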

Constituency

• Examples:
– Josephine
– My neighbor’s cat
– He
– Peter, Paul, and Mary
– The first three people to participate in the competition
– with (?)

• Preposed and postposed constructions:
– In the park, he plays with his dog.
– He plays in the park with his dog.
– He plays with his dog in the park.

Examples of noun phrases

[Figures 9.1 to 9.4 not reproduced]

• Terminals, non-terminals

• Parsing: the process of mapping from a string of words to one or more parse trees

Sentence-level constructions

• Declarative vs. imperative sentences

• Imperative sentences: S → VP

• Yes-no questions: S → Aux NP VP

• Wh-type questions: S → Wh-NP VP

• Fronting (less frequent):
– On Tuesday, I would like to fly to San Diego

Noun phrase

• Before the noun:
– Determiner: a, the, that, this, those, any, some
– No determiner (e.g., with plural nouns, or with mass nouns such as “dinner”)
– Predeterminers: all
– Postdeterminers (cardinals, ordinals, quantifiers): one, two; first, second, next, last, past, other, another; many, (a) few, several, much, a little
– Adjectives: a first-class fare, a nonstop flight, the longest layover
– AP: the least expensive fare
– NP → (Det) (Card) (Ord) (Quant) (AP) Nominal
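The parenthesized elements in the NP rule are shorthand: a rule with k optional symbols stands for 2^k ordinary CFG rules. A small sketch of that expansion (the helper `expand_optional` is hypothetical):

```python
from itertools import product


def expand_optional(lhs, rhs):
    """Turn a rule with optional "(X)" symbols into plain CFG rules.

    Each optional symbol may be present or absent, so a rule with k
    optional symbols expands into 2**k ordinary rules.
    """
    slots = []
    for sym in rhs:
        if sym.startswith("(") and sym.endswith(")"):
            slots.append([sym[1:-1], None])  # present or absent
        else:
            slots.append([sym])              # obligatory symbol
    rules = []
    for combo in product(*slots):
        rules.append((lhs, [s for s in combo if s is not None]))
    return rules


np_rules = expand_optional("NP", ["(Det)", "(Card)", "(Ord)", "(Quant)", "(AP)", "Nominal"])
print(len(np_rules))  # 32 = 2**5
```

One line on the slide thus corresponds to 32 plain rules, ranging from NP → Det Card Ord Quant AP Nominal down to NP → Nominal.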

Noun phrases (Cont’d)

• Postmodifiers:
– any stopovers [for Delta seven fifty one]
– all flights [from Cleveland] [to Newark]

• Nominal → Nominal PP (PP) (PP)

• Non-finite postmodifiers: gerundive, -ed, infinitive

Gerunds

• any flights [arriving after ten p.m.]

• Nominal → Nominal GerundVP

• GerundVP → GerundV NP | GerundV PP | GerundV | GerundV NP PP

• GerundV → being | preferring | arriving | …

Infinitives and -ed forms

• the last flight to arrive in Boston

• I need to have dinner served

• which is the aircraft used by this flight?

Postnominal relative clauses

• Restrictive relative clauses:
– A flight that serves breakfast
– Flights that leave in the morning
– The United flight that arrives in San Jose at ten p.m.

• Rules:
– Nominal → Nominal RelClause
– RelClause → (who | that) VP

• Multiple postnominal modifiers can be combined:
– A boy from London studying French in Spain (What are the modifiers in this example?)

Combining post-modifiers

• A flight from Phoenix to Detroit leaving Monday evening

• Evening flights from Nashville to Houston that serve dinner

A slightly more complicated example

• The earliest American Airlines flight that I can get

• What rules are needed in the grammar for this type of construction?

Coordination

• Coordinate noun phrases:
– NP → NP and NP
– S → S and S
– Similar for VP, etc.

Agreement

• Examples:
– Do any flights stop in Chicago?
– Do I get dinner on this flight?
– Does Delta fly from Atlanta to Boston?
– What flights leave in the morning?
– * What flight leave in the morning?

• Rules:
– S → Aux NP VP
– S → 3sgAux 3sgNP VP
– S → Non3sgAux Non3sgNP VP
– 3sgAux → does | has | can | …
– Non3sgAux → do | have | can | …

Agreement

• We now need similar rules for pronouns, for number agreement, etc.:
– 3sgNP → (Det) (Card) (Ord) (Quant) (AP) SgNominal
– Non3sgNP → (Det) (Card) (Ord) (Quant) (AP) PlNominal
– SgNominal → SgNoun | SgNoun SgNoun
– etc.

Combinatorial explosion

• What other phenomena will cause the grammar to expand?

• Solution: parameterization with feature structures (see Chapter 11)
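A rough sketch of the parameterization idea. It assumes flat feature dictionaries rather than the full feature structures of Chapter 11: there is no nesting and no reentrancy. A single `agr` feature with the hypothetical values `3sg` and `non3sg` replaces the split categories 3sgAux/Non3sgAux and 3sgNP/Non3sgNP. One rule S → Aux NP VP plus an agreement check then does the work of the duplicated rules.

```python
def unify(f1, f2):
    """Unify two flat feature structures (dicts); return None on a clash.

    A feature absent from one structure is unconstrained and unifies with
    any value; two different values for the same feature fail.
    """
    result = dict(f1)
    for feat, val in f2.items():
        if feat in result and result[feat] != val:
            return None
        result[feat] = val
    return result


# Hypothetical lexical entries; "can" carries no agreement feature.
LEXICON = {
    "does": {"cat": "Aux", "agr": "3sg"},
    "do": {"cat": "Aux", "agr": "non3sg"},
    "can": {"cat": "Aux"},
    "flight": {"cat": "Noun", "agr": "3sg"},
    "flights": {"cat": "Noun", "agr": "non3sg"},
    "I": {"cat": "Pronoun", "agr": "non3sg"},
}


def subject_aux_agree(aux, head):
    """Check the one rule S -> Aux NP VP: Aux and the NP head must unify on agr."""
    a = {k: v for k, v in LEXICON[aux].items() if k != "cat"}
    n = {k: v for k, v in LEXICON[head].items() if k != "cat"}
    return unify(a, n) is not None


print(subject_aux_agree("does", "flight"))   # True
print(subject_aux_agree("does", "flights"))  # False
print(subject_aux_agree("can", "flights"))   # True
```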

The Verb phrase

• VP → Verb

• VP → Verb NP

• VP → Verb NP PP

• VP → Verb PP

Sentential complements

• You said there were two flights that were the cheapest

• You said you had a two hundred sixty six dollar fare

• VP → Verb S

• I want to fly from Milwaukee to Orlando

• I’m trying to find a flight that goes from Pittsburgh to Denver next Friday

• VP → Verb VP

Subcategorization

• Frames:

– 0: eat, sleep

– NP: prefer, find, leave

– NP NP: show, give

– PPfrom PPto: fly, travel

– NP PPwith: help, load

– VPto: prefer, want, need

– VPbarestem: can, would, might

– S: mean
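The frames can be stored as a lexicon keyed by verb. The sketch below encodes only the frames listed above; a real lexicon would list more frames per verb (e.g., eat also takes an NP). `FRAMES` and `licensed` are illustrative names.

```python
# Subcategorization frames from the slide; [] is the empty (0) frame.
FRAMES = {
    "eat": [[]], "sleep": [[]],
    "prefer": [["NP"], ["VPto"]],          # prefer appears in two frames
    "find": [["NP"]], "leave": [["NP"]],
    "show": [["NP", "NP"]], "give": [["NP", "NP"]],
    "fly": [["PPfrom", "PPto"]], "travel": [["PPfrom", "PPto"]],
    "help": [["NP", "PPwith"]], "load": [["NP", "PPwith"]],
    "want": [["VPto"]], "need": [["VPto"]],
    "can": [["VPbarestem"]], "would": [["VPbarestem"]], "might": [["VPbarestem"]],
    "mean": [["S"]],
}


def licensed(verb, complements):
    """True if the verb subcategorizes for exactly this complement sequence."""
    return list(complements) in FRAMES.get(verb, [])


print(licensed("give", ["NP", "NP"]))  # True
print(licensed("sleep", ["NP"]))       # False
```

A grammar can use such a table to block VP → Verb NP for "sleep" without introducing a separate verb category per frame.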

Subcategorization ambiguity

• Find me a flight
– What phenomenon is related to this sentence?

• Others?

Auxiliaries

• Modals: can, could, may, might

• Perfect: have

• Progressive: be

• Passive: be

• What are their subcategories?

• Ordering: modal < perfect < progressive < passive
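The ordering constraint can be checked by giving each auxiliary type a rank. This sketch assumes the auxiliaries in a verb group have already been classified (the type labels are hypothetical) and that each type occurs at most once, as in "could have been being eaten" (modal, perfect, progressive, passive).

```python
# Rank of each auxiliary type in the order modal < perfect < progressive < passive.
RANK = {"modal": 0, "perfect": 1, "progressive": 2, "passive": 3}


def well_ordered(aux_types):
    """True if the types strictly increase in rank (so none repeats)."""
    ranks = [RANK[t] for t in aux_types]
    return all(a < b for a, b in zip(ranks, ranks[1:]))


# "could have been being eaten"
print(well_ordered(["modal", "perfect", "progressive", "passive"]))  # True
# "*has could eat": perfect before modal
print(well_ordered(["perfect", "modal"]))  # False
```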

Parsing with Context-Free Grammars

Introduction

• Parsing = associating a structure (parse tree) with an input string using a grammar

• CFGs are declarative: they don’t specify how the parse tree will be constructed

• Parsing programming languages is comparatively easy: they are designed to be unambiguous and efficiently parsable.

• However, natural languages are inherently ambiguous:
– I saw [the man] [with a telescope].
– I saw [the man with a telescope].

Applications

• Parse trees are used in:
– Grammar checking: MS Word
– Semantic analysis: explaining ambiguity
– Machine translation: parse tree operations
– Question answering: e.g., “How many people in the Human Resources Department receive salaries above $30,000?”
– Speech recognition: e.g., “Put the file in the folder.” vs. “Put the file and the folder.”
– Information extraction, information retrieval, etc.

Parsing as search

Grammar:
S → NP VP
S → Aux NP VP
S → VP
NP → Det Nominal
Nominal → Noun
Nominal → Noun Nominal
NP → Proper-Noun
VP → Verb
VP → Verb NP
Nominal → Nominal PP

Lexicon:
Det → that | this | a
Noun → book | flight | meal | money
Verb → book | include | prefer
Aux → does
Proper-Noun → Houston | TWA
Prep → from | to | on

Parsing as search

Book that flight.

[S [VP [Verb Book] [NP [Det that] [Nom [Noun flight]]]]]

Two types of constraints on the parses: (a) those that come from the input string, and (b) those that come from the grammar.

Top-down parsing

[Figure: the top-down search space for “Book that flight”. S is expanded with S → NP VP, S → Aux NP VP, and S → VP; each resulting tree is then expanded further (NP → Det Nom, NP → PropN, VP → V NP, VP → V).]

Bottom-up parsing

[Figure: the bottom-up search space for “Book that flight”. The words are first tagged as either Noun Det Noun or Verb Det Noun; Nominal (NOM), NP, and VP constituents are then built upward from these tags.]

Comparing TD and BU parsers

• TD parser
– never wastes time exploring trees that cannot result in an S
– but ignores the input until it reaches the “leaves” of the tree.

• BU parser
– never spends effort on trees that are not consistent with the input
– but constructs useless subtrees that do not lead to an S.

• Needed: some middle ground.
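To see the bottom-up search space concretely, here is a naive breadth-first recognizer. It is a sketch assuming the grammar and lexicon from the "Parsing as search" slide, with Nominal → Nominal PP left out because no PP rule is given there. It starts from every possible tag sequence and repeatedly replaces any span that matches a right-hand side with that rule's left-hand side. Unknown words raise KeyError. Many intermediate sequences (e.g., those built from Noun Det Noun) never reach S, which is exactly the waste noted for BU parsers.

```python
from collections import deque
from itertools import product

RULES = [
    ("S", ("NP", "VP")), ("S", ("Aux", "NP", "VP")), ("S", ("VP",)),
    ("NP", ("Det", "Nominal")), ("NP", ("Proper-Noun",)),
    ("Nominal", ("Noun",)), ("Nominal", ("Noun", "Nominal")),
    ("VP", ("Verb",)), ("VP", ("Verb", "NP")),
]
LEXICON = {
    "that": ["Det"], "this": ["Det"], "a": ["Det"],
    "book": ["Noun", "Verb"], "flight": ["Noun"], "meal": ["Noun"],
    "money": ["Noun"], "include": ["Verb"], "prefer": ["Verb"],
    "does": ["Aux"], "houston": ["Proper-Noun"], "twa": ["Proper-Noun"],
}


def bottom_up_recognize(words):
    """Breadth-first search over reductions until the sequence ("S",) appears."""
    # Every combination of possible tags is a separate starting point.
    starts = {tuple(tags) for tags in product(*(LEXICON[w.lower()] for w in words))}
    queue, seen = deque(starts), set(starts)
    while queue:
        form = queue.popleft()
        if form == ("S",):
            return True
        # Try every rule at every position where its right-hand side matches.
        for lhs, rhs in RULES:
            n = len(rhs)
            for i in range(len(form) - n + 1):
                if form[i:i + n] == rhs:
                    new = form[:i] + (lhs,) + form[i + n:]
                    if new not in seen:
                        seen.add(new)
                        queue.append(new)
    return False


print(bottom_up_recognize(["Book", "that", "flight"]))  # True
```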

Basic TD parser

• Practically infeasible to generate all trees in parallel.

• Use a depth-first strategy.

• When arriving at a tree that is inconsistent with the input, return to the most recently generated but still unexplored tree.

[Figure 10.5 not reproduced]

function TOP-DOWN-PARSE(input, grammar) returns a parse tree
  agenda ← (Initial S tree, Beginning of input)
  current-search-state ← POP(agenda)
  loop
    if SUCCESSFUL-PARSE?(current-search-state) then
      return TREE(current-search-state)
    else
      if CAT(NODE-TO-EXPAND(current-search-state)) is a POS then
        if CAT(node-to-expand) ∈ POS(CURRENT-INPUT(current-search-state)) then
          PUSH(APPLY-LEXICAL-RULE(current-search-state), agenda)
        else
          discard current-search-state (dead end: backtrack to the agenda)
      else
        PUSH(APPLY-RULES(current-search-state, grammar), agenda)
    if agenda is empty then
      return reject
    else
      current-search-state ← NEXT(agenda)
  end

A TD-DF-LR parser
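A compact Python rendering of the TD-DF-LR strategy follows. It uses generators, so backtracking falls out of resuming the most recently suspended alternative. This is a sketch, not the agenda-based design of the pseudocode. It assumes the grammar and lexicon from the "Parsing as search" slide but omits Nominal → Nominal PP, because a naive top-down parser loops forever on left-recursive rules (the left-recursion problem on the "Problems with the basic parser" slide). Function names are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"]],  # Nominal -> Nominal PP omitted
    "VP": [["Verb"], ["Verb", "NP"]],
}
LEXICON = {
    "that": ["Det"], "this": ["Det"], "a": ["Det"],
    "book": ["Noun", "Verb"], "flight": ["Noun"], "meal": ["Noun"],
    "money": ["Noun"], "include": ["Verb"], "prefer": ["Verb"],
    "does": ["Aux"], "houston": ["Proper-Noun"], "twa": ["Proper-Noun"],
}


def parse_symbol(symbol, words, pos):
    """Yield (tree, end) for each way `symbol` can cover words from pos."""
    if symbol not in GRAMMAR:
        # Part of speech: succeed only if the current word can carry this tag.
        if pos < len(words) and symbol in LEXICON.get(words[pos], ()):
            yield (symbol, words[pos]), pos + 1
        return
    # Try expansions in grammar order; a generator resumes here on backtrack.
    for rhs in GRAMMAR[symbol]:
        for children, end in parse_sequence(rhs, words, pos):
            yield (symbol, children), end


def parse_sequence(symbols, words, pos):
    """Yield (subtrees, end) for a sequence of symbols, expanded left to right."""
    if not symbols:
        yield [], pos
        return
    for first, mid in parse_symbol(symbols[0], words, pos):
        for rest, end in parse_sequence(symbols[1:], words, mid):
            yield [first] + rest, end


def top_down_parse(sentence):
    """Return the first parse tree that covers the whole input, or None."""
    words = sentence.lower().rstrip("?.!").split()
    for tree, end in parse_symbol("S", words, 0):
        if end == len(words):
            return tree
    return None


print(top_down_parse("Does this flight include a meal?"))
```

For "Does this flight include a meal?" the parser first tries S → NP VP, fails on "does", backtracks to S → Aux NP VP, and later backtracks again from VP → Verb to VP → Verb NP.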

An example

Does this flight include a meal?

[Figures 10.7, 10.8, and 10.9 not reproduced]

• We can add bottom-up filtering to eliminate the trees that are inconsistent with the input. This is called left corner (LC) parsing.
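A sketch of the left-corner table that drives this filtering, assuming the grammar from the "Parsing as search" slide. B is a left corner of A if A can derive a string that starts with B. Before expanding a rule, the parser checks that one of the current word's tags can begin the rule's first symbol. For "Does this flight include a meal?", the check rules out S → NP VP and S → VP at the first word. Names are hypothetical.

```python
GRAMMAR = {
    "S": [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP": [["Det", "Nominal"], ["Proper-Noun"]],
    "Nominal": [["Noun"], ["Noun", "Nominal"], ["Nominal", "PP"]],
    "VP": [["Verb"], ["Verb", "NP"]],
}


def left_corners(grammar):
    """Map each non-terminal to the categories that can begin it
    (reflexive-transitive closure of 'first symbol of a right-hand side')."""
    lc = {a: {a} for a in grammar}
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for a, rhss in grammar.items():
            for rhs in rhss:
                first = rhs[0]
                new = {first} | lc.get(first, set())
                if not new <= lc[a]:
                    lc[a] |= new
                    changed = True
    return lc


LC = left_corners(GRAMMAR)


def rule_allowed(rhs, word_tags):
    """Bottom-up filter: can one of the word's tags start the rule's first symbol?"""
    first = rhs[0]
    return bool(LC.get(first, {first}) & set(word_tags))


print(rule_allowed(["NP", "VP"], ["Aux"]))         # False
print(rule_allowed(["Aux", "NP", "VP"], ["Aux"]))  # True
```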

Problems with the basic parser

• Left recursion: rules of the type NP → NP PP. Solution: rewrite each rule of the form A → A β | α using a new symbol A′: A → α A′ and A′ → β A′ | ε

• Ambiguity: attachment ambiguity, coordination ambiguity, noun-phrase bracketing ambiguity

• Attachment ambiguity: I saw the Grand Canyon flying to New York

• Coordination ambiguity: old men and women

[Figures 10.10, 10.11, and 10.12 not reproduced]
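The left-recursion rewrite from the first bullet, sketched for immediate left recursion on a single non-terminal. Indirect left recursion (e.g., A → B x, B → A y) needs extra steps that are not shown. The empty list stands for ε, A′ is written A', and the function name is hypothetical.

```python
def remove_left_recursion(lhs, rhss):
    """Rewrite A -> A beta | alpha as A -> alpha A' and A' -> beta A' | epsilon."""
    new = lhs + "'"
    betas = [rhs[1:] for rhs in rhss if rhs and rhs[0] == lhs]     # A -> A beta
    alphas = [rhs for rhs in rhss if not rhs or rhs[0] != lhs]     # A -> alpha
    if not betas:
        return {lhs: rhss}  # nothing to rewrite
    return {
        lhs: [alpha + [new] for alpha in alphas],
        new: [beta + [new] for beta in betas] + [[]],  # [] is epsilon
    }


print(remove_left_recursion("NP", [["NP", "PP"], ["Det", "Nominal"], ["Proper-Noun"]]))
```

For NP → NP PP | Det Nominal | Proper-Noun this yields NP → Det Nominal NP′ | Proper-Noun NP′ and NP′ → PP NP′ | ε. The language is unchanged, but the parse trees change shape, so PP attachments must be recovered from the NP′ chain.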

Problems with the basic parser

• Example: President Kennedy today pushed aside other White House business to devote all his time and attention to working on the Berlin crisis address he will deliver tomorrow night to the American people over nationwide television and radio.

• Solutions: return all parses or include disambiguation in the parser.

• Inefficient reparsing of subtrees: a flight from Indianapolis to Houston on TWA

[Figures 10.13 and 10.14 not reproduced]

The Earley algorithm (aka chart parser)

• Resolving:
– Left-recursive rules
– Ambiguity
– Inefficient reparsing of subtrees

• A chart with N+1 entries

• Dotted rules:
– S → . VP, [0,0]
– NP → Det . Nominal, [1,2]
– VP → V NP ., [0,3]

[Figure 10.15 not reproduced]

Three operations

• Predictor (expands the rules)
– Given S → . VP, [0,0], derive VP → . Verb, [0,0] and VP → . Verb NP, [0,0]

• Scanner (scans the current word in the input if applicable)
– Given VP → . Verb, [0,0] and VP → . Verb NP, [0,0], derive VP → Verb ., [0,1] and VP → Verb . NP, [0,1] if the current word is a Verb.

• Completer (completes parsing an entire rule)
– Given NP → Det Nominal ., [1,3] and VP → Verb . NP, [0,1], derive VP → Verb NP ., [0,3]

[Figures 10.16 and 10.17 not reproduced]
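A minimal Earley recognizer showing the three operations. It assumes the grammar from the "Parsing as search" slide plus a rule PP → Prep NP that the slide does not list; that rule is added here so the left-recursive Nominal → Nominal PP can be exercised. Following the simplification above, the scanner advances a dotted rule directly over a part-of-speech symbol instead of adding a separate lexical state. The dummy start rule GAMMA → . S and all names are illustrative. The code only recognizes; building trees would need back-pointers.

```python
from collections import namedtuple

# Dotted rule lhs -> rhs with the dot before rhs[dot], begun at position start.
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("Proper-Noun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal"), ("Nominal", "PP")],  # left-recursive
    "VP": [("Verb",), ("Verb", "NP")],
    "PP": [("Prep", "NP")],  # assumption: not on the slide
}
LEXICON = {
    "that": ("Det",), "this": ("Det",), "a": ("Det",),
    "book": ("Noun", "Verb"), "flight": ("Noun",), "meal": ("Noun",),
    "money": ("Noun",), "include": ("Verb",), "prefer": ("Verb",),
    "does": ("Aux",), "houston": ("Proper-Noun",), "twa": ("Proper-Noun",),
    "from": ("Prep",), "to": ("Prep",), "on": ("Prep",),
}


def earley_recognize(words):
    words = [w.lower() for w in words]
    chart = [[] for _ in range(len(words) + 1)]  # N+1 entries

    def add(state, i):
        if state not in chart[i]:  # each state enters an entry only once
            chart[i].append(state)

    add(State("GAMMA", ("S",), 0, 0), 0)
    for i in range(len(words) + 1):
        j = 0
        while j < len(chart[i]):  # chart[i] may grow while we walk it
            st = chart[i][j]
            j += 1
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in GRAMMAR:  # PREDICTOR
                    for rhs in GRAMMAR[nxt]:
                        add(State(nxt, rhs, 0, i), i)
                elif i < len(words) and nxt in LEXICON.get(words[i], ()):  # SCANNER
                    add(State(st.lhs, st.rhs, st.dot + 1, st.start), i + 1)
            else:  # COMPLETER
                for prev in chart[st.start]:
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == st.lhs:
                        add(State(prev.lhs, prev.rhs, prev.dot + 1, prev.start), i)
    return State("GAMMA", ("S",), 1, 0) in chart[len(words)]


print(earley_recognize("Book that flight from Houston".split()))  # True
```

The left-recursive rule does not cause looping, because predicting Nominal a second time at the same position adds no new state.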

Overview of Chart Parser

• Dynamic programming. All possible states for chart[n] are produced before reading the (n+1)st word.

• Never parses the same subtree twice.

• The idea of “incremental” parsing is close to how humans parse sentences. Is the chart a representation of the human brain’s state?

Some Theoretical Limitations

• The chart parser is O(n³) in the number of words n.

• Fast CFG parsing requires fast Boolean matrix multiplication (Lee 2002), which makes it very unlikely that a much faster parsing algorithm exists.

• There is strong evidence that some natural languages are not context-free at all (Shieber 1985, on cross-serial dependencies in Swiss German).

Parsing with FSAs

• Shallow parsing

• Useful for information extraction: noun phrases, verb phrases, locations, etc.

• The FASTUS system (Appelt and Israel, 1997)

• Sample rules for noun groups:
– NG → Pronoun | Time-NP | Date-NP
– NG → (DETP) (Adjs) HdNns | DETP Ving HdNns
– DETP → DETP-CP | DETP-CP

• Complete determiner-phrases: “the only five”, “another three”, “this”, “many”, “hers”, “all”, “the most”

[Figures 10.20 and 10.21 not reproduced]
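A rough illustration of finite-state noun-group chunking in the spirit of the NG rules above; it is not FASTUS itself. Each tag is mapped to one character, and a regular expression is matched over the resulting tag string. The tags are hypothetical Penn-style tags assumed to come from an earlier tagging step. The Time-NP, Date-NP, and DETP Ving alternatives are left out.

```python
import re

# One character per (hypothetical, Penn-style) tag, so a regular expression
# over the tag string acts as a finite-state recognizer over tag sequences.
TAG_CODE = {"DT": "D", "JJ": "A", "NN": "N", "NNS": "N", "NNP": "N", "PRP": "P"}

# Noun group: a pronoun, or an optional determiner, any adjectives, and one
# or more head nouns (a simplification of NG -> (DETP) (Adjs) HdNns).
NOUN_GROUP = re.compile(r"P|D?A*N+")


def noun_groups(tagged):
    """Return the noun groups found in a list of (word, tag) pairs."""
    codes = "".join(TAG_CODE.get(tag, "x") for _, tag in tagged)
    # One character per token, so match offsets are token indices.
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in NOUN_GROUP.finditer(codes)]


sentence = [("it", "PRP"), ("had", "VBD"), ("set", "VBN"), ("up", "RP"),
            ("a", "DT"), ("joint", "JJ"), ("venture", "NN"),
            ("with", "IN"), ("a", "DT"), ("local", "JJ"), ("concern", "NN")]
print(noun_groups(sentence))  # ['it', 'a joint venture', 'a local concern']
```

A cascaded system would run further patterns (verb groups, locations, company names) over the same tag string, much as in the sample output.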

Sample FASTUS output

Company Name: Bridgestone Sports Co.
Verb Group: said
Noun Group: Friday
Noun Group: it
Verb Group: had set up
Noun Group: a joint venture
Preposition: in
Location: Taiwan
Preposition: with
Noun Group: a local concern
Conjunction: and
Noun Group: a Japanese trading house
Verb Group: to produce
Noun Group: golf clubs
Verb Group: to be shipped
Preposition: to
Location: Japan