Lecture VI: GRAMMAR - by which rules of composition?

Prof. Dr. Mario Essert

Faculty of Mechanical Engineering and Naval Architecture (FSB), Zagreb

Osijek, 7 November 2017



Contents:

1 CROATIAN and OTHER GRAMMARS
   Since the seventh century
   The central place of grammar

2 LEXICON + GRAMMAR = LANGUAGE
   Context-free & context-sensitive
   An ambiguous sentence
   Penn treebank
   Constituent structure

3 NLTK PARSING
   Chunking (komadanje)
   Chinking (rascjepkanje)
   Parsing (rasclamba)


CROATIAN and OTHER GRAMMARS

CROATIAN GRAMMARS - since the seventh century

STJEPKO TEZAK • STJEPAN BABIC; 17th edition, SK 2009.

EUGENIJA BARIC • MIJO LONCARIC • DRAGICA MALIC • SLAVKO PAVESIC • MIRKO PETI • VESNA ZECEVIC • MARIJA ZNIKA; SK 2000.


Unfortunately, our grammars do not teach us linguistic structures.

So we will have to learn from ENGLISH grammars, in the hope that together we will build our own as soon as possible!


The central place of grammar

Two tasks:
1. recognizing patterns and labeled objects (parsing)
2. generating utterances and sentences (synthesis, syntactic realization)


LEXICON + GRAMMAR = LANGUAGE

A grammar together with a lexicon defines a language


Nested dependencies


Languages depend on context (automata fail)


Context-Free Grammars (CFG) or Phrase-Structure Grammars (PSG)

The (Chomsky) formalism is equivalent to Backus-Naur Form (BNF):

A context-free grammar consists of a set of RULES or productions, each of which expresses the ways that symbols of the language can be grouped and ordered together, and a LEXICON of words and symbols.

An NP (noun phrase) can be composed of either a ProperNoun or a determiner (Det) followed by a Nominal; a Nominal can be one or more Nouns, etc.
Note: the English 'nominal' corresponds to the Croatian 'imenicki' or 'imenski'.
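The NP rules described above can be written down as a small NLTK grammar. This is only a sketch: the recursive rule `Nominal -> Noun | Nominal Noun` is one possible reading of "one or more Nouns", and the lexicon entries ('Zagreb', 'the', and so on) are illustrative assumptions, not taken from the slide.

```python
from nltk import CFG

# Rules (productions over non-terminals) plus a tiny, made-up lexicon.
np_grammar = CFG.fromstring("""
NP -> ProperNoun | Det Nominal
Nominal -> Noun | Nominal Noun
ProperNoun -> 'Zagreb'
Det -> 'a' | 'the'
Noun -> 'morning' | 'flight'
""")

# Separate the structural RULES from the LEXICON entries.
rules = [p for p in np_grammar.productions() if p.is_nonlexical()]
lexicon = [p for p in np_grammar.productions() if p.is_lexical()]

print("Start symbol:", np_grammar.start())
print("Rules:", rules)
print("Lexicon:", lexicon)
```

Keeping rules and lexicon apart mirrors the slide's definition: the rules say how categories combine, while the lexicon ties categories to actual words.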


Tree structure

[S [NP [Pro I]] [VP [V prefer] [NP [Det a] [Nom [N morning] [Nom [N flight]]]]]]

In linguistics, the use of formal languages to model natural languages is called GENERATIVE grammar.
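The bracketed tree above can be loaded into NLTK for inspection. The sketch below rewrites the square brackets as round ones, which is the default notation for `Tree.fromstring`; the tree content itself is unchanged.

```python
from nltk import Tree

# Same tree as on the slide, in NLTK's default parenthesis notation.
tree = Tree.fromstring(
    "(S (NP (Pro I)) "
    "(VP (V prefer) (NP (Det a) (Nom (N morning) (Nom (N flight))))))"
)

print(tree.label())    # root category
print(tree.leaves())   # the words of the sentence, left to right

# Each node together with its children corresponds to one grammar production.
for production in tree.productions():
    print(production)
```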


Formal definition of context-free grammar

A context-free grammar G is a 4-tuple defined by four parameters N, Σ, P, S:

N a set of non-terminal symbols (or variables)

Σ a set of terminal symbols (disjoint from N)

P a set of rules or productions, each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*

S a designated start symbol

A language is defined via the concept of derivation. One string derives another if it can be rewritten as the second one via some series of rule applications. If A → β is a production of P and α and γ are any strings in the set (Σ ∪ N)*, then we say that αAγ DIRECTLY DERIVES αβγ, or αAγ ⇒ αβγ.

Let α1, α2, ..., αm be strings in (Σ ∪ N)*, m ≥ 1, such that α1 ⇒ α2, α2 ⇒ α3, ..., αm−1 ⇒ αm; then we say that α1 DERIVES αm, or α1 ⇒* αm.
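The definition of derivation can be checked mechanically. The following is a minimal plain-Python sketch, not an NLTK API: the grammar (N, SIGMA, P, START) and its English words are hypothetical, and strings over (Σ ∪ N)* are represented as tuples of symbols.

```python
# A hypothetical toy grammar G = (N, SIGMA, P, START).
N = {"S", "NP", "VP", "Det", "Noun", "Verb"}
SIGMA = {"the", "dog", "cat", "chases"}
P = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "Noun")),
    ("VP", ("Verb", "NP")),
    ("Det", ("the",)),
    ("Noun", ("dog",)),
    ("Noun", ("cat",)),
    ("Verb", ("chases",)),
]
START = "S"


def directly_derives(alpha, beta):
    """Return True if alpha => beta, i.e. one rule A -> rhs rewrites one symbol."""
    for i, symbol in enumerate(alpha):
        for lhs, rhs in P:
            if symbol == lhs and alpha[:i] + rhs + alpha[i + 1:] == beta:
                return True
    return False


def is_derivation(chain):
    """Return True if every string in the chain directly derives the next one."""
    return all(directly_derives(a, b) for a, b in zip(chain, chain[1:]))


chain = [
    ("S",),
    ("NP", "VP"),
    ("Det", "Noun", "VP"),
    ("the", "Noun", "VP"),
    ("the", "dog", "VP"),
    ("the", "dog", "Verb", "NP"),
    ("the", "dog", "chases", "NP"),
    ("the", "dog", "chases", "Det", "Noun"),
    ("the", "dog", "chases", "the", "Noun"),
    ("the", "dog", "chases", "the", "cat"),
]
print(is_derivation(chain))
```

A chain containing a single string (m = 1) counts as a derivation with zero steps, which matches the condition m ≥ 1.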


Generative grammar - example 01

    from nltk import CFG

    # Toy Croatian grammar: rules (productions) plus a small lexicon
    grammar = CFG.fromstring("""
    S -> NP VP
    PP -> P NP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    Det -> 'neki' | 'jedna'
    N -> 'pas' | 'macka'
    V -> 'progoni' | 'sjedi'
    P -> 'na' | 'u'
    """)
    print('1.', grammar)                # the whole grammar
    print('2.', grammar.start())        # its start symbol
    print('3.', grammar.productions())  # the list of productions


Generative grammar - example 02

    from nltk import CFG
    from nltk.parse.generate import generate

    moj_grammar = """
    S -> NP VP
    PP -> P NP
    NP -> Det N | NP PP
    VP -> V NP | VP PP
    Det -> 'neki' | 'jedna'
    N -> 'pas' | 'macka'
    V -> 'progoni' | 'sjedi'
    P -> 'na' | 'u'
    """
    grammar = CFG.fromstring(moj_grammar)

    # Print the first 10 sentences the grammar generates
    for sentence in generate(grammar, n=10):
        print(' '.join(sentence))

    # Limit the derivation depth and see how the set of sentences grows
    print(len(list(generate(grammar, depth=4))))
    print(list(generate(grammar, depth=5)))
    print(len(list(generate(grammar, depth=6))))


Generative grammar - example 02: solution

Constituent structure is based on the observation that words combine with other words to form units. The evidence that a sequence of words forms such a unit is given by substitutability: a sequence of words in a well-formed sentence can be replaced by a shorter sequence without rendering the sentence ill-formed.


An ambiguous sentence - ubiquitous ambiguity
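Structural ambiguity shows up even in the toy Croatian grammar from the earlier examples. In the sketch below, the test sentence is an illustrative assumption: its final prepositional phrase can attach either to the verb phrase or to the object noun phrase, so a chart parser should return more than one tree.

```python
from nltk import CFG, ChartParser

# The toy grammar used in the generative grammar examples.
grammar = CFG.fromstring("""
S -> NP VP
PP -> P NP
NP -> Det N | NP PP
VP -> V NP | VP PP
Det -> 'neki' | 'jedna'
N -> 'pas' | 'macka'
V -> 'progoni' | 'sjedi'
P -> 'na' | 'u'
""")

# Illustrative sentence; "na neki pas" can modify the verb or the object.
tokens = "neki pas progoni jedna macka na neki pas".split()

parses = list(ChartParser(grammar).parse(tokens))
print(len(parses), "parse(s) found")
for tree in parses:
    print(tree)
```

A chart parser is used here because the grammar is left-recursive (`NP -> NP PP`, `VP -> VP PP`), which a plain recursive-descent parser cannot handle.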


TREEBANK

It is possible to build a corpus in which every sentence is syntactically annotated with a parse tree (a TREEBANK).

The Penn Treebank project has produced treebanks from the Brown, Switchboard, ATIS, and Wall Street Journal corpora of English, as well as treebanks in Arabic and Chinese.

Various tree-searching languages exist in different tools: Tgrep (Pito, 1993) and TGrep2 (Rohde, 2005) are publicly available tools for searching treebanks that use a similar language for expressing tree constraints.


POS: part-of-speech


Nouns, pronouns, adverbs, ...


Verbs, wh-question words, punctuation, ...


Constituent structure

Consider the following sentence:
The little bear saw the fine fat trout in the brook.
The fact that we can substitute 'He' for 'The little bear' indicates that the latter sequence is a unit.
We systematically substitute longer sequences with shorter ones in a way that preserves grammaticality. Each sequence that forms a unit can in fact be replaced by a single word, and we end up with just two elements.
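The substitution procedure can be traced step by step in plain Python. Only the first replacement ('He' for 'The little bear') is given in the text; the remaining steps and the helper `substitute` are illustrative assumptions chosen so that each replacement keeps the sentence grammatical.

```python
def substitute(sentence, steps):
    """Apply each (longer, shorter) replacement once and record every stage."""
    history = [sentence]
    for longer, shorter in steps:
        if longer not in sentence:
            raise ValueError(f"sequence not found: {longer!r}")
        sentence = sentence.replace(longer, shorter, 1)
        history.append(sentence)
    return history


sentence = "the little bear saw the fine fat trout in the brook"

# Hypothetical substitution steps; each left-hand sequence is assumed to be a unit.
steps = [
    ("the little bear", "he"),
    ("the fine fat trout", "it"),
    ("in the brook", "there"),
    ("saw it there", "ran"),
]

history = substitute(sentence, steps)
for stage in history:
    print(stage)
```

Each replaced sequence is a candidate constituent; the final stage contains only two words, one standing for the subject and one for the predicate.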


Substitution + tagset

Substitution of Word Sequences Plus Grammatical Categories

This diagram reproduces the table shown earlier, adding the grammatical categories that correspond to noun phrases (NP), verb phrases (VP), prepositional phrases (PP), and nominals (Nom).


Phrase structure tree

If we now strip out the words apart from the topmost row, add an S node, and flip the figure over, we end up with a standard phrase structure tree:


NLTK PARSING

Chunking (komadanje)


Chinking (rascjepkanje)

NLTK provides classes and interfaces for producing tree structures that represent the internal organization of a text. This task is known as PARSING the text, and the resulting tree structures are called the text's PARSES. Typically, the text is a single sentence, and the tree structure represents the syntactic structure of the sentence. However, parsers can also be used in other domains. For example, parsers can be used to derive the morphological structure of the morphemes that make up a word, or to derive the discourse structure for a set of utterances.
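Chunking and chinking produce a shallow tree of this kind. The sketch below uses NLTK's `RegexpParser` on a hand-tagged English sentence (the sentence and its Penn Treebank tags are illustrative assumptions, supplied by hand so that no tagger is needed): the first rule chunks every token into one NP, and the second rule chinks verbs and prepositions back out.

```python
from nltk import RegexpParser

# Hand-tagged sentence (word, POS tag); tags follow the Penn Treebank tagset.
sentence = [
    ("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
    ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN"),
]

chunk_grammar = r"""
NP:
  {<.*>+}       # chunk everything into one NP
  }<VBD|IN>+{   # chink sequences of VBD and IN
"""

parser = RegexpParser(chunk_grammar)
result = parser.parse(sentence)
print(result)

# Collect the words of every NP chunk that survived the chinking step.
noun_phrases = [
    [word for word, tag in subtree.leaves()]
    for subtree in result.subtrees()
    if subtree.label() == "NP"
]
print(noun_phrases)
```

Chinking is convenient when it is easier to say what does not belong in a chunk than to enumerate everything that does.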


Parsing (rasclamba)

Interactive visual parsing
