Computational Grammar
Sarmad Hussain, Tafseer Ahmed, Nayyara Karamat, Shanza Nayyer
Center for Research in Urdu Language Processing
National University of Computer and Emerging Sciences
Lahore, [email protected]
www.PANL10n.net 2
How to develop a computational grammar?
1. Define a basic POS set
2. Identify simple constituents: phrases and sentences
3. Develop phrasal and sentence level context-free rules
4. Add additional information
   i. Agreement
   ii. Grammatical relations
   iii. Sub-categorization frames
5. Repeat with more complex structures until analysis is stable
Part-of-Speech (POS)/Word Class
Open Classes
  Nouns
  Verbs
  Adjectives
  Adverbs
Closed Classes
  Prepositions
  Determiners
  Pronouns
  Conjunctions
  Auxiliary Verbs
  Particles
  Numerals
  Case Markers
How many POS tags?
Penn Tree Bank = 45
Brown Corpus = 87
C5 tagset = 61
C7 tagset = 146
Requirement dependent on the application
Define/refine tags as per requirement
Constituency
Words normally group together into phrases
Phrases group together to give larger phrases and sentences
How to identify similar phrases or constituents
  Similar syntactic environment
  Movement
Similar Syntactic Environment
He sits
The boy sits
The naughty boy sits
The three young men sit
* The sit
* Three sit
* Young sit
Movement
I am meeting with the nice old man
With the nice old man, I am meeting
* With, I am meeting, the nice old man
* With the, I am meeting, nice old man
* With the nice, I am meeting, old man
Rule Definition using CFG
Context-Free Grammar
  A set of non-terminal symbols, N
  A set of terminal symbols, Σ
  A set of productions, P, such that each production is of the form A -> α, where A belongs to N and α is a string from (Σ ∪ N)*
  A start symbol S
Example grammar:
  S -> A
  A -> aA | a
Strings generated:
  a, aa, aaa, aaaa, …
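The toy grammar above can be explored with a few lines of code. The following is a minimal sketch, assuming the grammar is stored as a Python dictionary; the function name generate and the max_len cut-off are illustrative choices, not part of the original slides.

```python
from collections import deque

# Toy grammar from the slide: S -> A, A -> a A | a
GRAMMAR = {
    "S": [["A"]],
    "A": [["a", "A"], ["a"]],
}

def generate(grammar, start="S", max_len=4):
    """Breadth-first leftmost derivation; returns terminal strings of at most max_len symbols."""
    results = []
    queue = deque([[start]])
    while queue:
        form = queue.popleft()
        # Index of the leftmost non-terminal, or None if the form is all terminals
        idx = next((i for i, sym in enumerate(form) if sym in grammar), None)
        if idx is None:
            results.append("".join(form))
            continue
        for rhs in grammar[form[idx]]:
            new_form = form[:idx] + rhs + form[idx + 1:]
            if len(new_form) <= max_len:  # prune forms that are already too long
                queue.append(new_form)
    return results

print(generate(GRAMMAR))
```

Raising max_len yields longer strings of the same shape, which illustrates that this grammar describes the infinite language a, aa, aaa, and so on.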
Noun Phrase
[John] went
[She] came
[The big brown loyal dog] is missing
[Some of the books] have got ruined by the rain
The child tried [ten ice-cream flavors]
I saw [the naughty girls planning mischief in the corner].
[The man who took the secret packet from me] vanished
Noun Phrase
[The book on the table] is good.
  Head = book
  Pre-Nominal Phrase = The
  Post-Nominal Phrase = on the table
[The man who took the secret packet] vanished
  Head = man
  Pre-Nominal Phrase = The
  Post-Nominal Phrase = who took the secret packet
Pre-Nominal Phrase
A good book
The nice girl
Few boys
First three hours
The three old mice
PreNomP – Grammar Rules
PreNomP -> (DetP) (NumberP) (AdjP) (NounModP)
DetP: a, the, an
NumberP: some, few, three, third, first three
AdjP: nice, good, white, cloudy, warm
NounModP: cricket (in “cricket team”), grammar (in “grammar book”)
PreNomP - Adjectives
NPnoun -> (PreNomP) n (PostNomP)
PreNomP -> (DetP) (NumberP) (AdjP) (NounModP)
AdjP -> adj
n: book
adj: good
Tree for “good book”:
NPnoun
  PreNomP
    AdjP
      adj: good
  n: book
PreNomP - Determiner
NPnoun -> (PreNomP) n (PostNomP)
PreNomP -> (DetP) (NumberP) (AdjP) (NounModP)
AdjP -> adj
DetP -> det
n: book
adj: good
det: a
Tree for “a good book”:
NPnoun
  PreNomP
    DetP
      det: a
    AdjP
      adj: good
  n: book
Over-Generation
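NPnoun -> (PreNomP) n (PostNomP)
PreNomP -> (DetP) (NumberP) (AdjP) (NounModP)
AdjP -> adj
DetP -> det
n: book
adj: good
det: a
n: books
Tree for “a good book”:
NPnoun
  PreNomP
    DetP
      det: a
    AdjP
      adj: good
  n: book
Tree for “* a good books” (ungrammatical, yet licensed by the same rules):
NPnoun
  PreNomP
    DetP
      det: a
    AdjP
      adj: good
  n: books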
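The over-generation can be reproduced with a small recognizer. This is a minimal sketch that works on POS categories only and leaves out PostNomP; the tag names num and nmod (for NumberP and NounModP) and the function name accepts_np are assumptions made for illustration.

```python
# Lexicon from the slide: word -> POS category
LEXICON = {"a": "det", "good": "adj", "book": "n", "books": "n"}

# PreNomP -> (DetP) (NumberP) (AdjP) (NounModP), flattened to POS tags.
# "num" and "nmod" are hypothetical tag names for NumberP and NounModP.
PRENOM_SLOTS = ["det", "num", "adj", "nmod"]

def accepts_np(words):
    """Recognise NPnoun -> (PreNomP) n using categories only (PostNomP omitted)."""
    tags = [LEXICON.get(w) for w in words]
    if not tags or None in tags or tags[-1] != "n":
        return False
    slot = 0
    for tag in tags[:-1]:
        # Each optional slot may be filled at most once, and only in order
        while slot < len(PRENOM_SLOTS) and PRENOM_SLOTS[slot] != tag:
            slot += 1
        if slot == len(PRENOM_SLOTS):
            return False
        slot += 1
    return True

print(accepts_np("a good book".split()))   # grammatical
print(accepts_np("a good books".split()))  # ungrammatical, but also accepted
```

Because the rules only mention categories, nothing stops a singular determiner from combining with a plural noun.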
Solution
CFG is not sufficient
Need to augment it
Many frameworks available
We look at Lexical Functional Grammar
Lexical Functional Grammar
Defines two different syntactic structures
  C (Constituent) Structure represents linear and hierarchical organization of words into phrases
  F (Functional) Structure represents abstract functional syntactic organization of the sentence, representing syntactic predicate-argument structure and functional relations like subject and object
Grammatical Functions
SUBJect, OBJect, OBJ2
  He(SUBJ) made a cake(OBJ)
  He(SUBJ) made me(OBJ) a cake(OBJ2)
COMPlement, XCOMPlement
  COMP is a closed function that contains an internal SUBJ phrase
    David complained that Chris yawned. (COMP)
  XCOMP is an open function that does not contain an internal SUBJ phrase
    David seemed to yawn. (XCOMP)
Grammatical Functions
ADJunct, XADJunct
  ADJ is optional
    He works in the morning. (ADJ)
  XADJ is an adjunct having a verb without subject (subject outside the clause)
    Stretching his arms, David yawned. (XADJ)
OBLique
  Required arguments, normally attached with verbs
    David gave the book to Chris. (OBL)
Grammatical Functions
Governable Grammatical Functions
  These are subcategorized (required) by predicate
  SUBJ, OBJ, OBJ2, COMP, XCOMP, OBL
Modifying Grammatical Functions
  These are not subcategorized (required) by predicate
  ADJ, XADJ
Well-formedness
Completeness
  F-Structure contains all the governable grammatical functions that its predicate governs: eat<SUBJ,OBJ>
    They eat an apple
    * They eat
Coherence
  F-Structure disallows extra governable grammatical functions that its predicate does not govern: run<SUBJ>
    They run
    * They run stadium
Consistency
  An attribute in F-Structure can have at most one value
    A boy runs
    * A boys runs
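Completeness and coherence can be checked mechanically once each predicate lists the functions it governs. The sketch below assumes a flat f-structure stored as a dictionary and a hypothetical FRAMES table holding the two frames mentioned above; consistency is not covered here, since it falls out of how attribute values are combined.

```python
GOVERNABLE = {"SUBJ", "OBJ", "OBJ2", "COMP", "XCOMP", "OBL"}

# Hypothetical frame table, mirroring eat<SUBJ,OBJ> and run<SUBJ>
FRAMES = {"eat": {"SUBJ", "OBJ"}, "run": {"SUBJ"}}

def check(fstructure):
    """Return a list of well-formedness violations for a flat f-structure (a dict)."""
    required = FRAMES[fstructure["PRED"]]
    present = {attr for attr in fstructure if attr in GOVERNABLE}
    problems = []
    missing = required - present   # governed by the predicate but absent
    extra = present - required     # governable but not governed by the predicate
    if missing:
        problems.append("incomplete: missing " + ", ".join(sorted(missing)))
    if extra:
        problems.append("incoherent: extra " + ", ".join(sorted(extra)))
    return problems

print(check({"PRED": "eat", "SUBJ": "they"}))
print(check({"PRED": "run", "SUBJ": "they", "OBJ": "stadium"}))
```

Modifying functions such as ADJ are ignored by the check, which matches their status as optional.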
Sample F-Structures
Good book
[ PRED     ‘book’
  NUM      SG
  CASE     {NOM,ACC}
  ADJUNCT  [ PRED ‘good’ ] ]

Good books
[ PRED     ‘book’
  NUM      PL
  CASE     {NOM,ACC}
  ADJUNCT  [ PRED ‘good’ ] ]
Lexicon
good: adj, ↑ PRED = ‘good’
book: noun, ↑PRED = ‘book’, ↑ NUM = SG, ↑CASE = {NOM,ACC}
books: noun, ↑PRED = ‘book’, ↑ NUM = PL, ↑CASE = {NOM,ACC}
NP – Grammar Rules
Simple Syntax:
NP -> NPnoun
NPnoun -> (PreNomP) n (PostNomP)
With Functional Annotation:
NP -> NPnoun: ↑=↓;
NPnoun -> (PreNomP: ↑=↓;) n: ↑=↓; (PostNomP: ↑=↓;)
PreNomP -> (AdjP: ↑ ADJUNCT = ↓;) (NumberP) (AdjP) (NounModP)
AdjP -> adj: ↑=↓;
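To see what the annotations do, the following minimal sketch builds an f-structure for an adjective plus noun. It hard-codes two annotations (the noun's features become the NP's features, and the adjective's features go under ADJUNCT); the function name np_fstructure and the dictionary encoding are assumptions for illustration only.

```python
# Lexical entries as on the Lexicon slide: word -> (category, features)
LEXICON = {
    "good":  ("adj",  {"PRED": "good"}),
    "book":  ("noun", {"PRED": "book", "NUM": "SG", "CASE": {"NOM", "ACC"}}),
    "books": ("noun", {"PRED": "book", "NUM": "PL", "CASE": {"NOM", "ACC"}}),
}

def np_fstructure(words):
    """F-structure for optional adjectives followed by a head noun."""
    *modifiers, head = words
    cat, features = LEXICON[head]
    if cat != "noun":
        raise ValueError("head must be a noun")
    fs = dict(features)                     # n: ↑=↓ (noun features become NP features)
    for word in modifiers:
        mcat, mfeatures = LEXICON[word]
        if mcat != "adj":
            raise ValueError("only adjectives are handled in this sketch")
        fs["ADJUNCT"] = dict(mfeatures)     # AdjP: ↑ ADJUNCT = ↓
    return fs

print(np_fstructure(["good", "books"]))
```

The result for “good books” has NUM = PL from the noun and the adjective's PRED embedded under ADJUNCT.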
Unification
Unification is a (partial) operation on feature structures that combines two feature structures such that the new feature structure contains all the information of the original two, and nothing more. If there is no smallest feature structure that is subsumed by both then we say that the unification is undefined.
Unification Examples
[ NUM SG ] and [ PERS 3 ] = [ NUM SG, PERS 3 ]
[ NUM {SG,PL} ] and [ NUM SG ] = [ NUM SG ]
[ NUM SG ] and [ NUM PL ] = UNDEFINED
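The three cases can be reproduced with a short function. This is a minimal sketch, assuming feature structures are dictionaries whose values are either nested dictionaries or sets of admissible atoms, so that {"SG", "PL"} stands for the disjunctive value {SG,PL}; the encoding and the name unify are illustrative choices.

```python
def unify(f, g):
    """Unify two feature structures; return None when unification is undefined."""
    result = dict(f)
    for attr, g_val in g.items():
        if attr not in result:
            result[attr] = g_val            # information only g carries
            continue
        f_val = result[attr]
        if isinstance(f_val, dict) and isinstance(g_val, dict):
            merged = unify(f_val, g_val)    # recurse into embedded structures
        elif isinstance(f_val, (set, frozenset)) and isinstance(g_val, (set, frozenset)):
            merged = f_val & g_val          # keep only atoms both sides allow
        else:
            merged = None                   # structure versus atom clash
        if merged is None or (not isinstance(merged, dict) and not merged):
            return None
        result[attr] = merged
    return result

print(unify({"NUM": {"SG"}}, {"PERS": {"3"}}))
print(unify({"NUM": {"SG", "PL"}}, {"NUM": {"SG"}}))
print(unify({"NUM": {"SG"}}, {"NUM": {"PL"}}))
```

An empty intersection means the two structures carry incompatible information, which is the UNDEFINED case.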
NP – Lexicon and Grammar (2)
a: det, ↑ DEFINITE = NEG, ↑ NUM = SG.
the: det, ↑ DEFINITE = POS, ↑ NUM = {SG,PL}.
PreNomP -> (DetP: ↑ SPEC = ↓, ↑ NUM =c ↓ NUM;) (AdjP: ↑ ADJUNCT = ↓;)
DetP -> det: ↑=↓; .
Agreement
A book
[ PRED  ‘book’
  NUM   SG
  CASE  {NOM,ACC}
  SPEC  [ DEFINITE  NEG
          NUM       SG ] ]

The books
[ PRED  ‘book’
  NUM   PL
  CASE  {NOM,ACC}
  SPEC  [ DEFINITE  POS
          NUM       {SG,PL} ] ]
Agreement
* A books
[ PRED  ‘book’
  NUM   PL *
  CASE  {NOM,ACC}
  SPEC  [ DEFINITE  NEG
          NUM       SG * ] ]
The asterisks mark the NUM values that fail to agree.
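The agreement check can be sketched in code. The example below uses a simplified reading of the constraint ↑ NUM =c ↓ NUM: the noun's NUM must be among the values the determiner allows. That reading, the set encoding of NUM, and the function name build_np are assumptions; a full LFG implementation treats constraining equations more generally.

```python
# Lexical entries from the slides, with NUM stored as a set of admissible values
LEXICON = {
    "a":     {"cat": "det", "DEFINITE": "NEG", "NUM": {"SG"}},
    "the":   {"cat": "det", "DEFINITE": "POS", "NUM": {"SG", "PL"}},
    "book":  {"cat": "n", "PRED": "book", "NUM": {"SG"}},
    "books": {"cat": "n", "PRED": "book", "NUM": {"PL"}},
}

def build_np(det, noun):
    """Build an f-structure for 'det noun', or return None if NUM disagrees."""
    d, n = LEXICON[det], LEXICON[noun]
    # Simplified reading of ↑ NUM =c ↓ NUM: the determiner must allow the noun's NUM
    if not n["NUM"] <= d["NUM"]:
        return None
    return {
        "PRED": n["PRED"],
        "NUM": n["NUM"],
        "CASE": {"NOM", "ACC"},
        "SPEC": {"DEFINITE": d["DEFINITE"], "NUM": d["NUM"]},  # ↑ SPEC = ↓
    }

print(build_np("the", "books"))
print(build_np("a", "books"))  # None: SG determiner with PL noun
```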
Sentence with F-Descriptions
S -> NP: ↑ SUBJ = ↓; VP: ↑=↓; .
VP -> v: ↑=↓; (NP: ↑ OBJ = ↓;) .
sleeps: v, ↑ PRED = ‘sleep’, …
drives: v, ↑ PRED = ‘drive’, …
Sentence – Sample F-Structure
John drives the car.
[ PRED  ‘drive’
  SUBJ  [ PRED  ‘John’
          NUM   SG
          CASE  {NOM,ACC} ]
  OBJ   [ PRED  ‘car’
          NUM   SG
          CASE  {NOM,ACC}
          SPEC  [ DEFINITE  POS
                  NUM       {SG,PL} ] ] ]
Solution: Sub-Categorization Frame
sleeps: v, ↑ PRED = ‘sleep<SUBJ>’, …
drives: v, ↑ PRED = ‘drive<SUBJ,OBJ>’, …
likes: v, ↑ PRED = ‘like<SUBJ,OBJ>’, …
likes: v, ↑ PRED = ‘like<SUBJ,XCOMP>’, …
John likes the car.
John likes to drive the car.
Pronouns and Cases
NP -> NPpronoun
NPpronoun -> pronoun
He likes her            John likes Mary
* Her likes he          Mary likes John
She likes him
Pronouns : Lexicon
he: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑ GEND = M, ↑ CASE = NOM
she: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = F, ↑ CASE = NOM
I: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 1, ↑ NUM = SG, ↑ GEND = {M,F}, ↑ CASE = NOM
him: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = M, ↑ CASE = ACC
her: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑GEND = F, ↑ CASE = ACC
Pronoun Case
Personal Pronouns (PERS)   Subjective (NOM)               Objective (ACC/DAT)
                           Singular (SG)   Plural (PL)    Singular (SG)   Plural (PL)
1st person                 I               we             me              us
2nd person                 you             you            you             you
3rd person                 he/she/it       they           him/her/it      them
Sentence – Revised Grammar Rules
S -> NP: ↑ SUBJ= ↓, ↓ CASE =c NOM; VP:↑=↓;
VP -> v:↑ =↓; (NP: ↑ OBJ=↓, ↓ CASE =c ACC;)
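The effect of the two case constraints can be illustrated with a short check. This is a minimal sketch: the CASE table repeats the pronoun entries from the lexicon, the entries for John and Mary are assumed to allow both cases (as nouns do in the earlier lexicon), and the function name well_formed is hypothetical.

```python
# Case values from the pronoun lexicon; proper nouns are assumed to allow both cases
CASE = {
    "he": {"NOM"}, "she": {"NOM"}, "I": {"NOM"},
    "him": {"ACC"}, "her": {"ACC"},
    "John": {"NOM", "ACC"}, "Mary": {"NOM", "ACC"},
}

def well_formed(subject, obj=None):
    """Check ↓ CASE =c NOM on the subject and ↓ CASE =c ACC on the optional object."""
    if "NOM" not in CASE[subject]:
        return False
    if obj is not None and "ACC" not in CASE[obj]:
        return False
    return True

print(well_formed("he", "her"))   # He likes her
print(well_formed("her", "he"))   # * Her likes he
```

With the constraints in place, “Her likes he” is rejected while “Mary likes John” and “John likes Mary” are both still accepted.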
Other Phrases
Verbal Phrase with tense and aspect
Prepositional Phrase
Adverbial Phrase
Conjunctions
…
Urdu Grammar – Some Remarks
Agreements within Noun Phrase
Case Marking
Free Order Language
Agreements within NP: Gender and Number
achcha larka (good boy)
achchi larki (good girl)
achche larke (good boys)
* achchi larka
* achcha larki
* achche larka
NP - Urdu Lexicon and Grammar
achcha: adj, ↑ PRED = ‘achcha’, ↑ NUM = SG, ↑ GEND = M.
achchi: adj, ↑ PRED = ‘achchi’, ↑ NUM = SG, ↑ GEND = F.
larka: noun, ↑ PRED = ‘larka’, ↑ NUM = SG, ↑ GEND = M.
larki: noun, ↑ PRED = ‘larki’, ↑ NUM = SG, ↑ GEND = F.
NP - Urdu Lexicon and Grammar (2)
NPnoun -> (AdjP: ↑ ADJUNCT = ↓, ↑ NUM = ↓ NUM, ↑ GEND = ↓ GEND, …;) n: ↑=↓; .
achcha larka
[ PRED     ‘larka’
  NUM      SG
  GEND     M
  ADJUNCT  [ PRED  ‘achcha’
             NUM   SG
             GEND  M ] ]
Case Marking
Eight (Morphological) Cases of Sanskrit
Case            Represents
Nominative      Subject
Accusative      Direct Object
Dative          Indirect Object
Instrumental    Cause
Locative        Position
Ablative        Comparison
Genitive        Relation
Vocative        Addressing/Calling
Case Phrase and Case Markers
[larka] [aam] khata hay. (The boy eats a mango.)
[shikari ne] [shair ko] [[africa ke] jungle mein] [bandooq se] mara. (The hunter killed the lion with a gun in the jungle of Africa.)
[larke ne] [shikari ko] [bandooq] di. (The boy gave the gun to the hunter.)
[larke ne] [usse] [bandooq] di. (The boy gave him/her the gun.)
Case Markers
Urdu has case markers to represent cases
  ne: cm, ↑ CASE = ERG
  ko: cm, ↑ CASE = {ACC,DAT}
  se: cm, ↑ CASE = INST
  mein: cm, ↑ CASE = LOC
Some pronouns have case through morphology
  usse: pronoun, ↑ PRED = ‘pro’, ↑ PERS = 3, ↑ NUM = SG, ↑ GEND = {M,F}, ↑ CASE = {ACC,DAT}
Case Phrase
KP -> NPnoun: ↑=↓, ↑ CASE = NOM; .
KP -> NPnoun: ↑=↓; cm: ↑=↓; .

shikari ne
[ PRED  ‘shikari’
  NUM   SG
  GEND  M
  CASE  ERG ]

jungle mein
[ PRED  ‘jungle’
  NUM   SG
  GEND  M
  CASE  LOC_MEN ]
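The two KP rules can be mimicked with a small function: a bare noun receives nominative case, and a noun followed by a case marker takes the marker's case. This is a minimal sketch; the romanized marker names (ne, ko, se, mein) are read off the transliterated examples, the plain value LOC is used rather than LOC_MEN, and the function name kp is hypothetical.

```python
# Noun entries, following the f-structures shown on the slide
NOUNS = {
    "shikari": {"PRED": "shikari", "NUM": "SG", "GEND": "M"},
    "jungle":  {"PRED": "jungle", "NUM": "SG", "GEND": "M"},
}

# Case markers (romanized forms are an assumption based on the transliterations)
MARKERS = {"ne": "ERG", "ko": {"ACC", "DAT"}, "se": "INST", "mein": "LOC"}

def kp(noun, marker=None):
    """F-structure of a case phrase: noun alone (NOM) or noun plus case marker."""
    fs = dict(NOUNS[noun])                    # NPnoun: ↑=↓
    if marker is None:
        fs["CASE"] = "NOM"                    # first rule: ↑ CASE = NOM
    else:
        fs["CASE"] = MARKERS[marker]          # second rule: cm: ↑=↓ contributes CASE
    return fs

print(kp("shikari", "ne"))
print(kp("jungle", "mein"))
```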
Word Order
[shikari ne] [shair ko] dekha. (The hunter saw the lion.)
[shair ko] [shikari ne] dekha.
[shikari ne] dekha [shair ko].

[shair ne] [shikari ko] dekha. (The lion saw the hunter.)
[shikari ko] [shair ne] dekha.
Word Order and Shuffle Operator
Fixed order:
S -> KP: ↑ SUBJ = ↓, ↓ CASE =c NOM; KP: ↑ OBJ = ↓, ↓ CASE =c ACC; VerbalP: ↑=↓, …;
Free order with the shuffle operator (%):
S -> [KP: ↑ SUBJ = ↓, ↓ CASE =c NOM; % KP: ↑ OBJ = ↓, ↓ CASE =c ACC; % VerbalP: ↑=↓, …;]
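The shuffle operator lets the daughters of a rule appear in any order. As a rough illustration, assuming each daughter is a single chunk, the licensed orders are simply the permutations of the chunks; the function name shuffle_orders is hypothetical.

```python
from itertools import permutations

def shuffle_orders(constituents):
    """All orderings of the daughters of a rule joined by the shuffle operator %."""
    return [" ".join(order) for order in permutations(constituents)]

orders = shuffle_orders(["[shikari ne]", "[shair ko]", "dekha"])
for order in orders:
    print(order)
```

Three daughters give six orders. Since the grammatical functions are tied to case rather than to position, every order maps to the same f-structure.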
F-Structure of “He walks”
[ Pred     walk<Subject>
  Tense    Present
  Prog     False
  Perf     False
  Subject  [ Pred      Pronoun
             Number    Singular
             Gender    Masculine
             Person    Third
             Form      He
             Animated  True
             Case      Nominative ] ]
Parsing as Search
Top Down Algorithm
  Only builds trees that are rooted in S
  Wastes time building trees that don't match the input
Bottom Up Algorithm
  Only builds trees that match the input
  Wastes time building trees that never lead to S
Problems in Search Algorithm
Left Recursion
  NP -> NP PP
  VP -> VP PP
  S -> S and S
Left Recursion can be removed from productions of the CFG, but the transformed grammar gives different parse structures
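The standard rewrite for immediate left recursion turns A -> A α | β into A -> β A' and A' -> α A' | ε. The sketch below applies it to NP -> NP PP, assuming NP -> det noun as the non-recursive alternative; the function name and the primed symbol NP' are illustrative.

```python
def remove_left_recursion(nonterminal, productions):
    """Rewrite A -> A α | β as A -> β A', A' -> α A' | ε (immediate left recursion only)."""
    recursive = [rhs[1:] for rhs in productions if rhs and rhs[0] == nonterminal]
    others = [rhs for rhs in productions if not rhs or rhs[0] != nonterminal]
    if not recursive:
        return {nonterminal: productions}
    if not others:
        raise ValueError("no non-recursive production to start from")
    new = nonterminal + "'"
    return {
        nonterminal: [rhs + [new] for rhs in others],
        new: [alpha + [new] for alpha in recursive] + [[]],  # [] stands for ε
    }

rules = remove_left_recursion("NP", [["NP", "PP"], ["det", "noun"]])
print(rules)
```

In the original grammar each PP attaches to a growing NP on the left, whereas the rewritten grammar hangs the PPs off a right-branching chain of NP' nodes, so the trees differ even though the accepted strings are the same.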
Problems in Search Algorithm
Repeated Parsing
Ambiguous grammar results in repeated parses of the same structures.
The phrases “I”, “an elephant”, “in my pajamas” are parsed twice in two different parses
Chart Parsing
Chart parsing is an example of dynamic programming.
Dynamic programming solves a problem by finding solutions to sub-problems and then combining them to form the complete solution.
In chart parsing, sub-trees are built once and then shared by more than one parse tree.
Chart parsing avoids the Left Recursion and Repeated Parsing problems.
Chart Parsing Algorithm
A chart consists of N + 1 states (where N is the number of words in the input). The algorithm goes through the states from left to right and processes each state in order applying one of the following operators to the state:
  Predictor
  Completer
  Scanner
Predictor
Predictor creates new states from top-down application of rules.
S -> . VP, [0,0]
Applying the Predictor to the above state adds the following new states to the chart:
VP -> . verb, [0,0]
VP -> . verb NP, [0,0]
Completer
The completer is applied to a state when its dot has reached the right end of a rule (the whole phrase is parsed).
The completer takes old states and combines them into new states.
S -> . NP VP, [0,0]
NP -> det noun ., [0,2]
Applying the Completer to the above states adds the following new state to the chart:
S -> NP . VP, [0,2]
Scanner
The scanner applies when a POS category is to the right of the dot. If the current word of the input sentence belongs to that category, the dot advances over it and a new state is formed.
VP -> . verb NP, [0,0]
The scanner checks that the current word is a verb and adds the new state:
VP -> verb . NP, [0,1]
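The three operators can be combined into a compact recognizer. The following is a minimal sketch of an Earley-style chart recognizer, assuming a small illustrative grammar and lexicon (both hypothetical, built from the rules used as examples above). It only answers whether a sentence is accepted and does not build trees; the end of each state's span is implied by the chart entry that holds it.

```python
from collections import namedtuple

# A dotted rule: lhs -> rhs with the dot before rhs[dot], started at position start
State = namedtuple("State", "lhs rhs dot start")

GRAMMAR = {
    "S": [("NP", "VP"), ("VP",)],
    "NP": [("det", "noun")],
    "VP": [("verb",), ("verb", "NP")],
}
# Hypothetical lexicon mapping words to POS categories
LEXICON = {"the": "det", "a": "det", "boy": "noun", "book": "noun",
           "reads": "verb", "sleeps": "verb"}

def recognize(words, start="S"):
    chart = [[] for _ in range(len(words) + 1)]   # N + 1 chart entries

    def add(state, position):
        if state not in chart[position]:
            chart[position].append(state)

    for rhs in GRAMMAR[start]:
        add(State(start, rhs, 0, 0), 0)

    for i in range(len(words) + 1):
        for state in chart[i]:                    # the list may grow while we iterate
            if state.dot < len(state.rhs):
                nxt = state.rhs[state.dot]
                if nxt in GRAMMAR:                # Predictor: expand a non-terminal
                    for rhs in GRAMMAR[nxt]:
                        add(State(nxt, rhs, 0, i), i)
                elif i < len(words) and LEXICON.get(words[i]) == nxt:
                    # Scanner: the current word has the expected POS category
                    add(state._replace(dot=state.dot + 1), i + 1)
            else:                                 # Completer: advance states waiting for lhs
                for waiting in chart[state.start]:
                    if waiting.dot < len(waiting.rhs) and waiting.rhs[waiting.dot] == state.lhs:
                        add(waiting._replace(dot=waiting.dot + 1), i)

    return any(s.lhs == start and s.dot == len(s.rhs) and s.start == 0
               for s in chart[len(words)])

print(recognize("the boy reads a book".split()))
```

Because a state is added to a chart entry only once, a sub-phrase such as “a book” is analysed a single time no matter how many larger analyses use it.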