69
Computational Grammar Sarmad Hussain, Tafseer Ahmed, Nayyara Karamat, Shanza Nayyer Center for Research in Urdu Language Processing National University of Computer and Emerging Sciences Lahore, Pakistan www.crulp.org [email protected]

Computational Grammar · 22 Grammatical Functions SUBJect, OBJect, OBJ2, He(SUBJ) made a cake(OBJ) He(SUBJ) made me(OBJ) a cake(OBJ2) COMPlement, XCOMPlement COMP is a closed function

  • Upload
    others

  • View
    19

  • Download
    1

Embed Size (px)

Citation preview

Computational Grammar

Sarmad Hussain, Tafseer Ahmed, Nayyara Karamat, Shanza NayyerCenter for Research in Urdu Language ProcessingNational University of Computer and Emerging SciencesLahore, [email protected]

www.PANL10n.net 2

How to develop a computational grammar?1. Define a basic POS set2. Identify simple constituents: phrases and

sentences 3. Develop phrasal and sentence level context-free

rules4. Add additonal information

i. Agreementii. Grammatical relationsiii. Sub-categorization frames

5. Repeat with more complex structures until analysis is stable

www.PANL10n.net 3

Part-of-Speech (POS)/Word Class

Open ClassesNounsVerbsAdjectivesAdverbs

Closed ClassesPrepositionsDeterminersPronounsConjunctionsAuxiliary VerbsParticlesNumeralsCase Markers

www.PANL10n.net 4

How many POS tags?

Penn Tree Bank = 45Brown Corpus = 87C5 tagset = 61C7 tagset = 146

Requirement dependent on the applicationDefine/refine tags as per requirement

www.PANL10n.net 5

Constituency

Words normally group together into phrasesPhrases group together to give larger phrases and sentencesHow to identify similar phrases or constituents

Similar syntactic environmentMovement

www.PANL10n.net 6

Similar Syntactic Environment

He sitsThe boy sitsThe naughty boy sitsThe three young men sitThe sitThree sitYoung sit

www.PANL10n.net 7

Movement

I am meeting with the nice old manWith the nice old man, I am meetingWith, I am meeting, the nice old manWith the, I am meeting, nice old manWith the nice, I am meeting, old man

www.PANL10n.net 8

Rule Definition using CFG

Context-Free GrammarA set of non-terminal symbols, NA set of terminal symbols, ΣA set of productions, P, such that each production is of the form A α, where A belongs to N and αis a string from (Σ U N)*A start symbol S

S AA aA | a

a, aa, aaa, aaaa, …

Grammar Development

www.PANL10n.net 10

Noun Phrase

[John] went[She] came[The big brown loyal dog] is missing[Some of the books] have got ruined by the rainThe child tried [ten ice-cream flavors]I saw [the naughty girls planning mischief in the corner].[The man who took the secret packet from me], vanished

www.PANL10n.net 11

Noun Phrase

[The book on the table] is good.Head = bookPre-Nominal Phrase = ThePost Nominal Phrase = on the table

[The man who took the secret packet] vanishedHead = manPre-Nominal Phrase = ThePost Nominal Phrase = who took the secret packet

www.PANL10n.net 12

NP – Grammar Rules

NP NPpronoun

NP NPnoun

NPnoun (PreNomP) n (PostNomP)

www.PANL10n.net 13

Pre-Nominal Phrase

A good book The nice girl Few boysFirst three hoursThe three old mice

www.PANL10n.net 14

PreNomP – Grammar Rules

PreNomP (DetP) (NumberP) (AdjP) (NounModP)

DetPa, the , an

NumberPsome, few , three, third, first three

AdjPnice, good, white, cloudy, warm

NounModPcricket( in “cricket team”), grammar (in “grammar book”)

www.PANL10n.net 15

PreNomP - Adjectives

NPnoun (PreNomP) n (PostNomP)PreNomP (DetP) (NumberP) (AdjP) (NounModP)

AdjP adj

n: bookadj: good

good

AdjP

adj

n

NPnoun

PreNomPbook

www.PANL10n.net 16

PreNomP - Determiner

NPnoun (PreNomP) n (PostNomP)PreNomP (DetP) (NumberP) (AdjP) (NounModP) AdjP adj

DetP det

n: bookadj: good

det: aa

DetP

det

book

NPnoun

PreNomP n

adjgood

AdjP

www.PANL10n.net 17

Over-Generation

NPnoun (PreNomP) n (PostNomP)PreNomP (DetP) (NumberP) (AdjP) (NounModP) AdjP adj

DetP det

n: bookadj: gooddet: a

n: books a

DetP

det

book

NPnoun

PreNomP n

adjgood

AdjP

a

DetP

det

books

NPnoun

PreNomP n

adjgood

AdjP

www.PANL10n.net 18

Agreement Problem

A bookA books

The bookThe books

www.PANL10n.net 19

Solution

CFG is not sufficientNeed to augment it Many frameworks availableWe look at Lexical Functional Grammar

www.PANL10n.net 20

Lexical Functional Grammar

Defines two different syntactic structuresC(Constituent) Structure represents linear and hierarchical organization of words into phrasesF(Functional) Structure represents abstract functional syntactic organization of the sentence, representing syntactic predicate-argument structure and functional relations like subject and object

www.PANL10n.net 21

www.PANL10n.net 22

Grammatical Functions

SUBJect, OBJect, OBJ2, He(SUBJ) made a cake(OBJ)

He(SUBJ) made me(OBJ) a cake(OBJ2)

COMPlement, XCOMPlementCOMP is a closed function that contain an internal SUBJ phrase

David complained that Chris yawned. (COMP)XCOMP is an open function that does not contain an internal SUBJ phrase

David seemed to yawn. (XCOMP)

www.PANL10n.net 23

Grammatical Functions

ADJunct, XADJunctADJ is optional

He works in the morning. (ADJ)XADJ is an adjunct having a verb without subject (subject outside the clause)

Stretching his arms, David yawned. (XADJ).

OBLiqueRequired arguments, normally attached with verbs

David gave the book to Chris. (OBL)

www.PANL10n.net 24

Grammatical Functions

Governable Grammatical FunctionsThese are subcategorized (required) by predicate

SUBJ, OBJ, OBJ2, COMP, XCOMP, OBL

Modifying Grammatical FunctionsThese are not subcategorized (required) by predicate

ADJ, XADJ

www.PANL10n.net 25

Well-formedness

CompletenessF-Structure contains all the governable grammatical functions that its predicate governs: eat<SUBJ,OBJ>They eat an appleThey eat

CoherenceF-Structure disallows extra governable grammatical functions that its predicate does not governsrun<SUBJ>

They runThey run stadium

ConsistencyAn Attribute in F-Structure can have at most one value

A boy runsA boys runs

www.PANL10n.net 26

Sample F-Structures

Good book

Good books⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎦

⎤⎢⎣

⎡ good''PREDADJUNCT{NOM,ACC}CASE

SGNUMbook''PRED

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎦

⎤⎢⎣

⎡ good''PREDADJUNCT{NOM,ACC}CASE

PLNUMbook''PRED

www.PANL10n.net 27

PreNomP - Adjectives

NPnoun (PreNomP) n (PostNomP)PreNomP (DetP) (NumberP) (AdjP) (NounModP)

AdjP adj

n: bookadj: good

good

AdjP

adj

n

NPnoun

PreNomPbook

www.PANL10n.net 28

Lexicon

good: adj, ↑ PRED = ‘good’

book: noun, ↑PRED = ‘book’, ↑ NUM = SG, ↑CASE = {NOM,ACC}

books: noun, ↑PRED = ‘book’, ↑ NUM = PL, ↑CASE = {NOM,ACC}

www.PANL10n.net 29

NP – Grammar Rules

Simple Syntax:

NP NPnounNPnoun (PreNomP) n (PostNomP)

With Functional Annotation:

NP NPnoun:↑=↓; NPnoun (PreNomP:↑=↓;) n:↑=↓; (PostNomP:↑=↓;)PreNomP (AdjP:↑ ADJUNCT = ↓;) (NumberP) (AdjP) (NounModP) AdjP adj:↑ = ↓;

www.PANL10n.net 30

Unification

Unification is a (partial) operation on feature structures that combines two feature structures such that the new feature structure contains all the information of the original two, and nothing more. If there is no smallest feature structure that is subsumed by both then we say that the unification is undefined.

www.PANL10n.net 31

Unification Examples

⎥⎥⎥⎥

⎢⎢⎢⎢

=3PERS

SGNUM

⎥⎦

⎤⎢⎣

⎡⎥⎦

⎤⎢⎣

⎡ SGNUMandPLSG,NUM ⎥⎦

⎤⎢⎣

⎡= SGNUM

⎥⎦

⎤⎢⎣

⎡⎥⎦

⎤⎢⎣

⎡ 3PERSandSGNUM

⎥⎦

⎤⎢⎣

⎡⎥⎦

⎤⎢⎣

⎡ PLNUMandSGNUM UNDEFINED=

www.PANL10n.net 32

Sample F-Structures

Good book

Good books⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎦

⎤⎢⎣

⎡ good''PREDADJUNCT{NOM,ACC}CASE

SGNUMbook''PRED

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎦

⎤⎢⎣

⎡ good''PREDADJUNCT{NOM,ACC}CASE

PLNUMbook''PRED

www.PANL10n.net 33

Over-Generation

NPnoun (PreNomP) n (PostNomP)PreNomP (DetP) (NumberP) (AdjP) (NounModP) AdjP adj

DetP det

n: bookadj: gooddet: a

n: books a

DetP

det

book

NPnoun

PreNomP n

adjgood

AdjP

a

DetP

det

books

NPnoun

PreNomP n

adjgood

AdjP

www.PANL10n.net 34

NP – Lexicon and Grammar (2)

a: det, ↑ DEFINITE = NEG, ↑ NUM = SG.the: det, ↑ DEFINITE = POS, ↑ NUM = {SG,PL}.

PreNomP (DetP: ↑ SPEC = ↓, ↑NUM =c ↓NUM;) (AdjP:↑ ADJUNCT = ↓;)

DetP det:↑=↓; .

www.PANL10n.net 35

Agreement

A book

The books ⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥

⎢⎢⎢⎢

SGNUMNEGDEFINITESPEC

{NOM,ACC}CASESGNUM

book''PRED

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥

⎢⎢⎢⎢

PL}{SG,NUMPOSDEFINITESPEC

ACC}{NOM,CASEPLNUM

book''PRED

www.PANL10n.net 36

Agreement

A books

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥

⎢⎢⎢⎢

SG*NUMNEGDEFINITESPEC

{NOM,ACC}CASEPL*NUM

book''PRED

www.PANL10n.net 37

Sentence

John drives the carJohn sleeps

S NP VPVP v (NP)

www.PANL10n.net 38

Sentence with F-Descriptions

S NP: ↑ SUBJ= ↓; VP:↑=↓; .VP v:↑ =↓; (NP: ↑ OBJ=↓;) .

sleeps: v, ↑ PRED = ‘sleep’, …drives: v ↑ PRED = ‘drive’ , …

www.PANL10n.net 39

Sentence – Sample F-Structure

John drives the car.

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥

⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢

........PL}{SG,NUM

POSDEFINITESPEC

{NOM,ACC}CASESGNUM'car'PRED

OBJ

{NOM,ACC}CASESGNUM

John''PREDSUBJ

drive''PRED

www.PANL10n.net 40

Problem

John drivesJohn sleeps the car

www.PANL10n.net 41

Solution: Sub-Categorization Frame

sleeps: v, ↑ PRED = ‘sleep<SUBJ>’, …drives: v ↑ PRED = ‘drive<SUBJ,OBJ>’ , …

likes:v, ↑ PRED = ‘like<SUBJ,OBJ>, …likes:v, ↑ PRED = ‘like<SUBJ,XCOMP>, …

John likes the car.John likes to drive the car.

www.PANL10n.net 42

Sentence – Sample F-Structure

John drives the car.

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥

⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢

........PL}{SG,NUM

POSDEFINITESPEC

{NOM,ACC}CASESGNUM'car'PRED

OBJ

{NOM,ACC}CASESGNUM

John''PREDSUBJ

drive''PRED

www.PANL10n.net 43

Pronouns and Cases

NP NPpronounNPpronoun pronoun

He likes her John likes Mary

Her likes he Mary likes JohnShe Likes him

www.PANL10n.net 44

Pronouns : Lexicon

he: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑ GEND = M, ↑ CASE = NOM

she: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = F, ↑ CASE = NOM

I: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑ GEND = {M,F}, ↑ CASE = NOM

him: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = M, ↑ CASE = ACC

her: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = F, ↑ CASE = ACC

www.PANL10n.net 45

Pronoun Case

him/her/itthem

he/ she/itthey

3rd person

youyou2nd person

meus

I we

1st person

Objective(ACC/DAT)

Subjective (NOM)Singular (SG)

Personal Pronouns (PERS)

www.PANL10n.net 46

Sentence – Revised Grammar Rules

S -> NP: ↑ SUBJ= ↓, ↓ CASE =c NOM; VP:↑=↓;

VP -> v:↑ =↓; (NP: ↑ OBJ=↓, ↓ CASE =c ACC;)

www.PANL10n.net 47

Other Phrases

Verbal Phrase with tense and aspectPrepositional PhraseAdverbial PhraseConjunctions…

www.PANL10n.net 48

Urdu Grammar – Some Remarks

Agreements within Noun Phrase

Case MarkingFree Order Language

www.PANL10n.net 49

Agreements within NP: Gender and Numberö achcha)ا larka)

achchi) ا larki)achche) ا larke)

öا * (achchi larka)ا * (achcha larki)öا * (achche larka)

www.PANL10n.net 50

NP - Urdu Lexicon and Grammar

↑ ,verb :ا PRED = ‘ ↑ ,’ا NUM = SG, ↑GEND = M.ا : verb, ↑ PRED = ‘ ↑ ,’ا NUM = SG, ↑

GEND = F.

ö : verb, ↑ PRED = ‘ ö ’, ↑ NUM = SG, ↑GEND = M.

: verb, ↑ PRED = ‘ ’, ↑ NUM = SG, ↑GEND = F.

www.PANL10n.net 51

NP - Urdu Lexicon and Grammar (2)

NPnoun -> (AdjP: ↑ ADJUNCT =↓, ↑ NUM= ↓NUM, ↑GEND = ↓ GEND,…;) n:↑=↓; .

ö ا (achcha larka)

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢

MGENDSGNUM

'achcha'PREDADJUNCT

MGENDSGNUM

lerka''PRED

www.PANL10n.net 52

Case MarkingEight (Morphological) Cases of Sanskrit

Addressing/CallingVocative relationGenitive comparisonAblative positionLocative CauseInstrumental Indirect ObjectDative Direct ObjectAccusativeSubjectNominative

RepresentsCase

www.PANL10n.net 53

Case Phrase and Case Markers

]ö [ ]۔ ] مآ[larka] [aam] khata hay

]ð ] اŠ] [ [ ] ] õ رى[را] وق [

[shikari ne] [shair ko] [[africa ke] jungle mein] [bandooq se] mara.] Š[ ]دى۔ ] وق[ ] رى

[larke ne] [shikari ko] [bandooq] di.] Š[ ]دى۔ ] وق[ ]ا

[larke ne] [usse] [bandooq] di.

www.PANL10n.net 54

Case Markers

Urdu has case markers to represent casesŠ:cm, ↑ CASE = ERG

: cm, ↑ CASE = {ACC,DAT} : cm, ↑ CASE = INST : cm, ↑ CASE = LOC

Some pronouns have case through morphology

↑ ,pronoun:ا PRED = ‘pro’, ↑ PERS = 3, NUM = SG, GEND ={M,F}, ↑ CASE = {ACC,DAT}

www.PANL10n.net 55

Case Phrase

KP NPnoun : ↑=↓, ↑CASE = NOM; .KP NPnoun : ↑=↓; cm : ↑=↓; .

Š رى (shikari ne) ð (jungle mein)

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

ERGCASEMGEND

SGNUMshikari''PRED

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

LOC_MENCASEMGEND

SGNUMjungle''PRED

www.PANL10n.net 56

Word Order

۔ د] ] [Š رى[[shikari ne] [shair ko] dekha. ۔د] Š رى][ [[shair ko] [shikari ne] daikha. ۔] [ د]Š رى[[shikari ne] daikha [shair ko].

] Š ][۔د] رى[shair ne] [shikari ko] dekha. ۔د] Š [ ] رى[[shikari ko] [shair ne] daikha.

www.PANL10n.net 57

Word Order and Shuffle Operator

S KP:↑=SUBJ=↓, ↓ CASE =c NOM;KP:↑=OBJ=↓, ↓ CASE =c ACC;VerbalP: ↑=↓, …;

S [KP:↑=SUBJ=↓, ↓ CASE =c NOM;% KP:↑=OBJ=↓, ↓ CASE =c ACC;% VerbalP: ↑=↓, …; ]

www.PANL10n.net 58

F-Structure of “He walks”

⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢⎢

⎥⎥⎥⎥⎥⎥⎥⎥⎥

⎢⎢⎢⎢⎢⎢⎢⎢⎢

><

NominativeCaseTrueAnimatedHeForm

ThirdPersonMasculineGenderSingularNumberPronounPred

Subject

FalsePerfFalseProg

PresentTenseSubjectwalkPred

www.PANL10n.net 59

Parsing Algorithms

Parsing as searchChart Parsing

www.PANL10n.net 60

Parsing as Search

Top Down AlgorithmOnly builds trees that are rooted in SWastes time building trees that don't match the input

Bottom Up AlgorithmOnly builds trees that match the inputWastes time building trees that never lead to S

www.PANL10n.net 61

Problems in Search Algorithm

Left RecursionNP NP PPVP VP PPS S and S

Left Recursion can be removed from productions of the CFG, but the transformed grammar gives different parse structures

www.PANL10n.net 62

Problems in Search AlgorithmRepeated ParsingAmbiguous Grammar results in repeated parses of same structures.

The phrases “I”, “an elephant”, “in my pajamas” are parsed twice in two different parses

www.PANL10n.net 63

Chart Parsing

Chart parsing is an example of dynamic programming.Dynamic Programming solves problem by finding solutions of sub-problems, and then combining them to form complete solution.In Chart parsing, Sub-trees are built, that are used by more than one parse trees.Chart parsing avoids Left Recursion and Repeated parsing problems.

www.PANL10n.net 64

Chart Parsing Algorithm

A chart consists of N + 1 states (where N is the number of words in the input). The algorithm goes through the states from left to right and processes each state in order applying one of the following operators to the state:

PredictorCompleterScanner

www.PANL10n.net 65

Predictor

Predictor creates new states from top-down application of rules.

S -> .VP, [0,0] applying Predictor on above results in adding the following new states on the chart.VP -> .verb, [0,0]VP -> .verb NP, [0,0]

www.PANL10n.net 66

Completer

The completer is applied to a state when its dot has reached the right end of a rule. (whole phrase is parsed.)The completer takes old states and combines them into new states.

S -> . NP VP [0,0]NP -> det noun. [0,2]Applying Cpmpleter on above results in adding the following new state on the chart.S -> NP . VP [0,2]

www.PANL10n.net 67

Scanner

The dot advances and new state is formed if there is POS Category at right of the dot, and this category is also predicted by a state (by current word of input sentence).

VP -> . verb NP [0, 1] scanner checks that current word is a verb and adds new state. VP -> verb . NP [0, 2]

Thank You

www.PANL10n.net 69

References and Further Readings

Jurafsky, Daniel. (2000), Speech and language Processing, Prentice Hall, New Jersey.Butt, Miraiam. (1999), A Grammar Writer's Cookbook. CSLI Publications, Stanford, CA.http://cslu.cse.ogi.edu/HLTsurvey/ch3node5.htmlhttp://www.coli.uni-saarland.de/~kris/nlp-with-prolog/html/node83.htmlhttp://emsah.uq.edu.au/linguistics/Working%20Papers/ananda_ling/LFG_Summary.htmhttp://www.stanford.edu/group/nasslli/courses/as-cr-da/apbook-nasslli.pdfhttp://www.sfu.ca/person/dearmond/222/222.x-bar.htm