Bc. Jozef Lang (xlangj01) Bc. Zoltán Zemko (xzemko01) Increasing power of LL(k) parsers

Increasing power of LL(k) parsers

Hello, let me introduce myself and my teammate. We have prepared a presentation about methods for increasing the power of LL(k) parsers. If you have any questions, feel free to ask at the end of the presentation.

Outline
- LL(1) parsers
- Why increase the power of LL(k) parsers?
- LL(k) parsers
- Linear-approximate LL(k) parsing
- LL-regular parsing
- Parse tree grammars
- Extended LL(1) grammars
- Conclusion

LL(1) parsing
- Deterministic top-down parsing
- Each prediction is based on a single input symbol, so LL(1) parsing uses a look-ahead of one symbol
- Constructing the parse table requires the terminal symbols that can start each non-terminal symbol
- FIRSTk set: the set of terminal strings formed by the first k symbols of the strings that a non-terminal can derive
- FOLLOW set: the set of all terminal symbols that can follow a non-terminal symbol in any sentential form derivable from S#

LL(1) versus strong-LL(1)
- If every entry of the parse table has at most one element, the grammar is called strong-LL(1)
- Every LL(1) grammar is also a strong-LL(1) grammar
- A parse table entry that contains more than one element is an LL(1) conflict
- A parser with an LL(1) conflict is not deterministic and is therefore less efficient

LL(1) versus strong-LL(1) (2)
- LL(1) conflicts can be resolved by
  - Left-recursion elimination
  - Left factoring
  - Conflict resolvers

Why increase the power of LL(k) parsers?
- Consider the following grammar, where idf produces identifiers:

- This fragment defines expression elements such as x, sin(0.41) and T[2,3]
- The first token is the same for all alternatives; only the second token distinguishes between them
- A look-ahead of only one token is not enough
- It would be useful to increase the power of deterministic LL parsing

Every expression starts with an identifier; only the second token allows us to distinguish between the alternatives. idf produces identifiers (as in x, sin(0.45), T[5,6]). A look-ahead of one token is not enough. It is useful to have a look-ahead of k > 1 tokens.

About the grammar: every expression that starts with an identifier can be distinguished only after the second token has been received.

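LL(k) parsers
- It is sometimes useful to look ahead k symbols, where k > 1
- This requires FIRSTk sets
- Let x be a sentential form. FIRSTk(x) is the set of terminal strings defined by:

$$\mathrm{FIRST}_k(x) = \{\, w \mid (|w| < k \land x \Rightarrow^{*} w) \lor (|w| = k \land x \Rightarrow^{*} w y) \,\}$$

where y is some sentential form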

If x is a sentential form, then FIRSTk(x) is the set of terminal strings w such that |w| (the length of w) is less than k and x ⇒* w, or |w| is equal to k and x ⇒* wy for some sentential form y. For k = 1 this definition coincides with the definition of the FIRST sets as we have seen it before.

LL(k) lets us look ahead k symbols, where k > 1. We must define the FIRSTk sets, which provide the look-ahead of k symbols. Let x be a sentential form; then FIRSTk(x) is the set of terminal strings w for which the following holds (see the formulas).
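The FIRSTk definition can be computed by a fixed-point iteration. The following is a minimal sketch, not the presenters' implementation: the names `concat_k` and `first_k` are illustrative, and the one-rule grammar `GRAMMAR` is a hypothetical stand-in modelled on the identifier expressions x, sin(0.41) and T[2,3]. Terminal strings are represented as tuples of tokens.

```python
def concat_k(left, right, k):
    """Concatenate two sets of terminal strings and keep only the first k symbols."""
    return {(u + v)[:k] for u in left for v in right}


def first_k(grammar, k):
    """Compute FIRSTk for every non-terminal of `grammar` by fixed-point iteration.

    `grammar` maps a non-terminal to a list of alternatives; each alternative is
    a tuple of symbols. A symbol that is not a key of `grammar` is a terminal.
    """
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for nonterminal, alternatives in grammar.items():
            for alternative in alternatives:
                strings = {()}  # start from the empty string
                for symbol in alternative:
                    symbol_first = first[symbol] if symbol in grammar else {(symbol,)}
                    strings = concat_k(strings, symbol_first, k)
                if not strings <= first[nonterminal]:
                    first[nonterminal] |= strings
                    changed = True
    return first


# Hypothetical grammar (an assumption, not taken from the slides):
#   Expr -> idf | idf ( Expr ) | idf [ Expr ]
GRAMMAR = {
    "Expr": [
        ("idf",),
        ("idf", "(", "Expr", ")"),
        ("idf", "[", "Expr", "]"),
    ],
}

print(first_k(GRAMMAR, 1)["Expr"])
print(first_k(GRAMMAR, 2)["Expr"])
```

For k = 1 the set contains only the string idf; for k = 2 it also contains "idf (" and "idf [", while the string idf itself stays in the set because it is shorter than k and can be derived completely.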

LL(k) parsers (2)
- Assume that grammar G contains the following rule:

- Grammar G is an LL(k) grammar iff the sets FIRSTk(α1 x #^k), ..., FIRSTk(αn x #^k) are pairwise disjoint
- #^k denotes a sequence of k end markers #, one for each look-ahead symbol
- Every LL(k) grammar is also an LL(k+1) grammar, but the converse does not hold
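The disjointness condition can be checked mechanically for a single prediction. The sketch below makes a simplifying assumption: every alternative consists of terminals only, so FIRSTk(α x #^k) is just the k-symbol prefix of α x #^k. The alternatives, the token names and the function names are hypothetical, chosen to mirror the identifier example.

```python
from itertools import combinations

END = "#"  # end marker


def lookahead_set(alternative, rest, k):
    """FIRSTk(alternative rest #^k) for an alternative made of terminals only."""
    padded = tuple(alternative) + tuple(rest) + (END,) * k
    return {padded[:k]}


def is_ll_k(alternatives, rest, k):
    """Check that the look-ahead sets of all alternatives are pairwise disjoint.

    Only the single prediction "A rest #" is examined; a full LL(k) test would
    repeat this for every prediction that can occur during parsing.
    """
    sets = [lookahead_set(alternative, rest, k) for alternative in alternatives]
    return all(a.isdisjoint(b) for a, b in combinations(sets, 2))


# Hypothetical alternatives of one rule: idf | idf ( num ) | idf [ num ]
ALTERNATIVES = [
    ("idf",),
    ("idf", "(", "num", ")"),
    ("idf", "[", "num", "]"),
]

print(is_ll_k(ALTERNATIVES, (), 1))  # False: every alternative starts with idf
print(is_ll_k(ALTERNATIVES, (), 2))  # True: "idf #", "idf (" and "idf [" differ
```

With one token of look-ahead all three sets coincide, which is the LL(1) conflict; with two tokens the sets become disjoint.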

LL(k) parsers (3)
- As with LL(1) parsers, producing parse tables for LL(k) parsers is difficult
- The FOLLOW set of a non-terminal A in an LL(k) grammar is defined as the union of FIRSTk(x #^k) over all predictions A x #^k
- As with LL(1) parsers, the parse table is indexed by a pair consisting of a non-terminal symbol and a string of terminals of length k
- If every entry of the parse table has at most one element, the grammar described by the table is strong-LL(k)
- For k > 1 there are grammars that are LL(k) but not strong-LL(k)

LL(k) parsers (4)
- Strong-LL(k) parsers are seldom used in practice
- A similar effect can be obtained by using conflict resolvers

Linear-approximate LL(k) parsing
- The difficult construction of LL(k) parse tables can be avoided by a simple trick
- In addition to the FIRST set, introduce SECOND, THIRD, etc. sets
- The table size is reduced from O(t^k) to k tables of size O(t), where t is the number of terminals and k is the number of sets
- A linear-approximate LL(k) grammar is weaker than an LL(k) grammar because it breaks the relationship between tokens
- Assume an LL(2) grammar in which two alternatives have the look-ahead sets {ab, cd} and {ad, cb}
- In the linear-approximate LL(2) version both alternatives have the FIRST set {a, c} and the SECOND set {b, d}, so the sets are not disjoint

LL-regular parsing
- LL(k) provides bounded look-ahead
- There are grammars where a discriminating token can be arbitrarily far away

- Unbounded look-ahead is needed
- The unbounded look-ahead forms its own context-free grammar
- A context-free grammar can be approximated by a regular grammar
- There is no algorithm for approximating a context-free grammar, but there are several heuristics
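The idea of regular look-ahead can be illustrated on a small case. The sketch below assumes a hypothetical grammar S -> X | Y, X -> a X | b, Y -> a Y | c, in which the discriminating token (b or c) may follow any number of a tokens, so no fixed k is enough. Each alternative of S is given a regular expression that describes its look-ahead; the grammar, the expressions and the function name are assumptions made for illustration, and each token is a single character.

```python
import re

# Regular look-ahead for the two alternatives of S (hypothetical grammar):
#   S -> X   is chosen when the rest of the input looks like a* b
#   S -> Y   is chosen when the rest of the input looks like a* c
LOOKAHEAD = {
    "S -> X": re.compile(r"a*b"),
    "S -> Y": re.compile(r"a*c"),
}


def choose_alternative(rest):
    """Pick the alternative whose regular look-ahead matches a prefix of `rest`."""
    matching = [rule for rule, pattern in LOOKAHEAD.items() if pattern.match(rest)]
    if len(matching) != 1:
        raise SyntaxError(f"cannot choose an alternative for {rest!r}")
    return matching[0]


print(choose_alternative("aaab"))          # S -> X
print(choose_alternative("a" * 50 + "c"))  # S -> Y
```

Here the look-ahead languages happen to be regular already, so no approximation is needed; in general the regular expressions would only approximate the context-free look-ahead.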

Parse tree grammar from LL(1)
- A straightforward process
- The basic idea is to create a new rule for every prediction
- The non-terminals are numbered by an increasing global counter
- They are then pushed onto the prediction stack
- The newly created rules form the parse tree grammar
- Because the parser is deterministic, a parse tree grammar is obtained instead of a parse forest grammar

Parse tree grammar from LL(1) (2)

To see how it works in some more detail we refer to the grammar in Figure 8.9 and parse table 8.10. We start with a prediction stack Session_1 #, a look-ahead ! and a global counter which now stands at 2. For non-terminal Session and look-ahead ! the table predicts Session → Facts Question. So we generate the parse tree grammar rule Session_1 → Facts_2 Question_3, where Session_1 obtains its number from the prediction and Facts_2 and Question_3 obtain their numbers from the global counter. Next we turn the prediction stack into Facts_2 Question_3 #. For Facts and ! the parse table yields the prediction Facts → Fact Facts, which gives us the parse tree grammar rule Facts_2 → Fact_4 Facts_5 and a stack Fact_4 Facts_5 Question_3 #. The next step is similar and produces the grammar rule Fact_4 → ! STRING and a stack ! STRING Facts_5 Question_3 #. Now we are ready to match the !. This process generates successive layers of the parse tree, using non-terminal names like Question_3 and Facts_5 as forward pointers. See Figure 8.11, where the leaves of the tree spell the absorbed input followed by the prediction stack. When the parsing is finished, the leaves spell the input string.
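The walkthrough above can be sketched as a table-driven LL(1) parser that emits one numbered rule per prediction. This is a minimal sketch under stated assumptions: the parse table of Figure 8.10 is not reproduced in these notes, so `TABLE` is a hypothetical reconstruction. Only Session → Facts Question, Facts → Fact Facts and Fact → ! STRING come from the notes; the rule Question → ? STRING and the empty alternative of Facts are assumed so that the parse can finish.

```python
from itertools import count

# Hypothetical LL(1) parse table: (non-terminal, look-ahead) -> right-hand side.
TABLE = {
    ("Session", "!"): ["Facts", "Question"],
    ("Session", "?"): ["Facts", "Question"],
    ("Facts", "!"): ["Fact", "Facts"],
    ("Facts", "?"): [],                  # assumed empty alternative
    ("Fact", "!"): ["!", "STRING"],
    ("Question", "?"): ["?", "STRING"],  # assumed rule
}


def show(symbol):
    """Render a stack symbol; numbered non-terminals are (name, number) pairs."""
    return f"{symbol[0]}_{symbol[1]}" if isinstance(symbol, tuple) else symbol


def parse_tree_grammar(table, start, tokens):
    """Parse `tokens` with `table` and return the rules of the parse tree grammar."""
    nonterminals = {name for name, _ in table}
    counter = count(1)                      # the global counter
    stack = ["#", (start, next(counter))]   # the top of the stack is the list end
    tokens = list(tokens) + ["#"]
    position = 0
    rules = []
    while stack:
        top = stack.pop()
        look = tokens[position]
        if isinstance(top, tuple):          # numbered non-terminal: predict
            key = (top[0], look)
            if key not in table:
                raise SyntaxError(f"no prediction for {top[0]} on {look!r}")
            rhs = [(s, next(counter)) if s in nonterminals else s for s in table[key]]
            rules.append(f"{show(top)} -> {' '.join(map(show, rhs)) or 'ε'}")
            stack.extend(reversed(rhs))
        elif top == look:                   # terminal: match
            position += 1
        else:
            raise SyntaxError(f"expected {top!r}, found {look!r}")
    return rules


for rule in parse_tree_grammar(TABLE, "Session", ["!", "STRING", "?", "STRING"]):
    print(rule)
```

For the input ! STRING ? STRING the first three rules printed are Session_1 -> Facts_2 Question_3, Facts_2 -> Fact_4 Facts_5 and Fact_4 -> ! STRING, matching the numbering in the walkthrough.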

Page 258

Extended LL(1) grammars
- Some parsers accept Extended LL(1) grammars instead of ordinary ones
- To accept an Extended LL(1) grammar, the parser must transform it into an ordinary one without introducing LL(1) conflicts
- An advantage of Extended LL(1) grammars is that they allow a more efficient implementation in recursive descent parsers

Conclusion
- LL(1) parsing is very intuitive: each step is based on a prediction from one token
- There are situations where a look-ahead of only one symbol is not sufficient
- The power of LL parsers can be improved by extending the look-ahead to
  - a bounded length, resulting in LL(k) parsing
  - an unbounded length, resulting in LL-regular parsing
- Linear-approximate LL(2) parsing is a convenient and simplified form of LL(2) parsing

Thank you for your attention
