Introduction to Language Theory
Prepared by
Manuel E. Bermúdez, Ph.D.
Associate Professor
University of Florida
Programming Language Translators
Definition: An alphabet (or vocabulary) Σ is a finite set of symbols.
Example: Alphabet of Pascal:
+ - * / < … (operators)
begin end if var (keywords)
<identifier> (identifiers)
<string> (strings)
<integer> (integers)
; : , ( ) [ ] (punctuators)
Note: All identifiers are represented by one symbol, because Σ must be finite.
Definition: A sequence t = t1 t2 … tn of symbols from an alphabet Σ is a string.
Definition: The length of a string t = t1 t2 … tn (denoted |t|) is n. If n = 0, the string is ε, the empty string.
Definition: Given strings s = s1 s2 … sn and t = t1 t2 … tm, the concatenation of s and t, denoted st, is the string s1 s2 … sn t1 t2 … tm.
Note: εu = u = uε, uεv = uv, for any strings u,v (including ε)
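The definitions of string, length, and concatenation can be sketched in a few lines of Python. This is only an illustration: it assumes a string is modelled as a tuple of symbols, so that a multi-character symbol such as "begin" counts as one symbol, and the names EPSILON, length, and concat are made up for the example.

```python
# Assumption: a string over an alphabet is a tuple of symbols.
EPSILON = ()  # the empty string ε

def length(t):
    """|t|: the number of symbols in t."""
    return len(t)

def concat(s, t):
    """st: the symbols of s followed by the symbols of t."""
    return s + t

u = ("begin", "<identifier>", ";")
v = ("end",)

print(length(u))                       # three symbols, not the number of characters
print(concat(EPSILON, u) == u)         # εu = u
print(concat(u, EPSILON) == u)         # uε = u
print(concat(concat(u, EPSILON), v) == concat(u, v))  # uεv = uv
```

Modelling symbols as tuple elements rather than characters keeps the one-symbol-per-identifier convention from the Pascal example intact.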
Definition: Σ* is the set of all strings of symbols from Σ.
Note: Σ* is called the reflexive, transitive closure of Σ.
Σ* is described by the graph (Σ*, ·), where “·” denotes concatenation, and there is a designated “start” node, ε.
Example: Σ = {a, b}.
[Figure: part of the graph (Σ*, ·), starting at ε, with nodes ε, a, b, aa, ab, ba, bb, aba, abba and edges labeled a, b, and ba.]
Σ* is countably infinite, so we cannot compute all of Σ*; we can only compute finite subsets of it. We can, however, compute whether a given string is in Σ*.
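Both halves of that observation can be illustrated with a short Python sketch. It assumes single-character symbols, and the function names are invented for the example: a generator that lists Σ* shortest-first (it never finishes, so only a finite prefix is taken), and a membership test that always terminates.

```python
from itertools import count, islice, product

def sigma_star(alphabet):
    """Yield every string over `alphabet`, shortest first. Never terminates."""
    symbols = sorted(alphabet)
    for n in count(0):
        for combo in product(symbols, repeat=n):
            yield "".join(combo)

def in_sigma_star(s, alphabet):
    """A string is in Σ* exactly when each of its symbols is in Σ."""
    return all(ch in alphabet for ch in s)

SIGMA = {"a", "b"}
first_seven = list(islice(sigma_star(SIGMA), 7))
print(first_seven)
print(in_sigma_star("abba", SIGMA), in_sigma_star("abc", SIGMA))
```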
Example: Σ = Pascal vocabulary. Σ* = all possible alleged Pascal programs, i.e. all possible inputs to a Pascal compiler.
We need to specify L ⊆ Σ*, the set of correct Pascal programs.
Definition: A language L over an alphabet Σ is a subset of Σ*.
Example: Σ = {a, b}.
L1 = ø is a language
L2 = {ε} is a language
L3 = {a} is a language
L4 = {a, ba, bbab} is a language
L5 = {a^n b^n | n ≥ 0} is a language, where a^n = aa…a (n times)
L6 = {a, aa, aaa, …} is a language
Note: L5 is an infinite language, but described finitely.
THIS IS THE MAIN GOAL OF LANGUAGE SPECIFICATION:
To describe (infinite) programming languages finitely, and to provide corresponding finite inclusion-test algorithms.
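As a small illustration of a finite inclusion test for an infinite language, here is a Python sketch for L5 = {a^n b^n | n ≥ 0}. The function name in_L5 is made up for the example, and this is one possible test among many.

```python
def in_L5(s):
    """Decide whether s is in {a^n b^n | n >= 0}."""
    n, remainder = divmod(len(s), 2)
    # s must have even length 2n and consist of n a's followed by n b's.
    return remainder == 0 and s == "a" * n + "b" * n

for candidate in ["", "ab", "aabb", "aab", "ba", "abab"]:
    print(repr(candidate), in_L5(candidate))
```

The test runs in time proportional to the length of the input, even though L5 itself has infinitely many members.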
Language Constructors
Definition: The catenation (or product) of two languages L1 and L2, denoted L1L2, is the set {uv | u ∈ L1, v ∈ L2}.
Example: L1 = {ε, a, bb}, L2 = {ac, c}
L1L2 = {ac, c, aac, ac, bbac, bbc}
= {ac, c, aac, bbac, bbc}
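For finite languages, catenation can be sketched directly with Python sets. The representation (languages as sets of ordinary strings, ε as the empty string) and the function name catenate are assumptions made for the example.

```python
def catenate(L1, L2):
    """L1L2 = {uv | u in L1, v in L2}, for finite languages."""
    return {u + v for u in L1 for v in L2}

L1 = {"", "a", "bb"}   # "" stands for ε
L2 = {"ac", "c"}
result = catenate(L1, L2)
print(sorted(result, key=lambda w: (len(w), w)))
```

Because the result is a set, the string ac, which arises both as ε·ac and as a·c, appears only once.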
Definition: L^n = LL…L (n times), and L^0 = {ε}.
Example: L = {a, bb}
L^3 = {aaa, aabb, abba, abbbb, bbaa, bbabb, bbbba, bbbbbb}
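The power of a finite language can be sketched as repeated catenation starting from {ε}. The helper names catenate and power are illustrative, and languages are again assumed to be Python sets of strings.

```python
def catenate(L1, L2):
    """{uv | u in L1, v in L2}, for finite languages."""
    return {u + v for u in L1 for v in L2}

def power(L, n):
    """L^n: catenate L with itself n times, starting from L^0 = {ε}."""
    result = {""}
    for _ in range(n):
        result = catenate(result, L)
    return result

L = {"a", "bb"}
cube = power(L, 3)
print(sorted(cube, key=lambda w: (len(w), w)))
```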
Definition: The union of two languages L1 and L2 is the set L1 ∪ L2 = {u | u ∈ L1} ∪ {v | v ∈ L2}.
Definition: The Kleene star (L*) of a language L is the set L* = ∪ L^n, n ≥ 0.
Example: L = {a, bb}. L* = {any string composed of a’s and bb’s, including ε}.
Definition: The Transitive Closure (L+) of a language L is the set L+ = ∪ L^n, n ≥ 1.
Note: In general, L* = L+ ∪ {ε}, but L+ is not always equal to L* − {ε}.
For example, consider L = {ε}. Then L+ = {ε}, whereas L* − {ε} = {ε} − {ε} = ø.
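Since L* is usually infinite, a program can only list it up to some length bound. The sketch below does that for finite L and then checks the note about L+ on L = {ε}. The bound max_len and the names star_up_to and plus_up_to are assumptions of the example, not part of the definitions.

```python
def star_up_to(L, max_len):
    """All strings of L* whose length is at most max_len (L finite)."""
    result = {""}            # L^0 = {ε}
    frontier = {""}
    while frontier:
        # Extend the newest strings by one more factor from L.
        frontier = {u + v for u in frontier for v in L
                    if len(u + v) <= max_len} - result
        result |= frontier
    return result

def plus_up_to(L, max_len):
    """All strings of L+ = L L* whose length is at most max_len."""
    star = star_up_to(L, max_len)
    return {u + v for u in L for v in star if len(u + v) <= max_len}

L = {"a", "bb"}
star = star_up_to(L, 3)
plus = plus_up_to(L, 3)
print(sorted(star, key=lambda w: (len(w), w)))
print(plus == star - {""})                              # holds here: ε is not in L

E = {""}                                                # L = {ε}
print(plus_up_to(E, 3), star_up_to(E, 3) - {""})        # L+ keeps ε, L* − {ε} is empty
```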
Grammars
Goal: Provide a means of describing languages finitely.
Method: Provide a subgraph (Σ*, →*) of (Σ*, ·), and a start node S, such that the nodes reachable from S are exactly the strings in the language.
Example: Σ = {a, b}
L = {a^n b^n | n ≥ 0}
[Figure: part of the graph (Σ*, ·) over Σ = {a, b}, with nodes ε, a, b, aa, ab, ba, bb, aaa, aab, bba, bbb, aaba, aabb, bbaa, bbab and edges labeled a, b, and bb.]
“=>” (derives) is a relation defined by a finite set of rewrite rules known as productions.
Definition: Given a vocabulary V, a production is a pair (u, v) ∈ V* × V*, denoted u → v. u is called the left-part; v is called the right-part.
Example: Pseudo-English.
V = {Sentence, NP, VP, Adj, N, V, boy, girl, the, tall, jealous, hit, bit}
Sentence → NP VP (one production)
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit
Note: English is much too complicated to be described this way.
Definition: Given a finite set of productions P ⊆ V* × V*, the relation => is defined such that, for all α, β, u, v ∈ V*, αuβ => αvβ iff u → v ∈ P.
Example:
Sentence → NP VP
NP → N
NP → Adj NP
N → boy
N → girl
Adj → the
Adj → tall
Adj → jealous
VP → V NP
V → hit
V → bit
Sentence => NP VP
=> Adj NP VP
=> the NP VP
=> the Adj NP VP
=> the jealous NP VP
=> the jealous N VP
=> the jealous girl VP
=> the jealous girl V NP
=> the jealous girl hit NP
=> the jealous girl hit Adj NP
=> the jealous girl hit the NP
=> the jealous girl hit the N
=> the jealous girl hit the boy
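The relation => can be sketched as a function that returns every form reachable in one step, which then lets a program check the derivation above step by step. This is an illustrative sketch: sentential forms are assumed to be tuples of symbols, and the names PRODUCTIONS, one_step, and DERIVATION are invented for the example.

```python
# Each production u → v is a pair of symbol tuples.
PRODUCTIONS = [
    (("Sentence",), ("NP", "VP")),
    (("NP",), ("N",)),
    (("NP",), ("Adj", "NP")),
    (("N",), ("boy",)),
    (("N",), ("girl",)),
    (("Adj",), ("the",)),
    (("Adj",), ("tall",)),
    (("Adj",), ("jealous",)),
    (("VP",), ("V", "NP")),
    (("V",), ("hit",)),
    (("V",), ("bit",)),
]

def one_step(form, productions):
    """Return every w with form => w, i.e. αuβ => αvβ for some u → v."""
    results = set()
    for left, right in productions:
        k = len(left)
        for i in range(len(form) - k + 1):
            if form[i:i + k] == left:
                results.add(form[:i] + right + form[i + k:])
    return results

DERIVATION = [tuple(line.split()) for line in [
    "Sentence", "NP VP", "Adj NP VP", "the NP VP", "the Adj NP VP",
    "the jealous NP VP", "the jealous N VP", "the jealous girl VP",
    "the jealous girl V NP", "the jealous girl hit NP",
    "the jealous girl hit Adj NP", "the jealous girl hit the NP",
    "the jealous girl hit the N", "the jealous girl hit the boy",
]]

# Every consecutive pair of forms must be related by =>.
valid = all(b in one_step(a, PRODUCTIONS)
            for a, b in zip(DERIVATION, DERIVATION[1:]))
print(valid)
```

Matching on a slice of the form, rather than on a single symbol, keeps one_step usable for productions whose left part is longer than one symbol.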
Definition: A grammar is a 4-tuple G = (Φ, Σ, P, S) where
Φ is a finite set of nonterminals,
Σ is a finite set of terminals,
V = Φ ∪ Σ is the grammar’s vocabulary,
S ∈ Φ is called the start or goal symbol, and
P ⊆ V* × V* is a finite set of productions.
Example: Grammar for {a^n b^n | n ≥ 0}.
G = (Φ, Σ, P, S), where Φ = {S}, Σ = {a, b}, and P = {S → aSb, S → ε}
Derivations:
S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Applying S → ε to each sentential form gives S => ε, aSb => ab, aaSbb => aabb, aaaSbbb => aaabbb, aaaaSbbbb => aaaabbbb.
Note: Normally, grammars are given by simply listing the productions.
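The derivations can be reproduced mechanically with a breadth-first search over sentential forms. The sketch below is specific to small context-free grammars like this one: it assumes single-character symbols with upper case for nonterminals, and it stops exploring once a form has more than max_len terminals (safe here because terminals, once produced, are never rewritten). The names sentences_up_to and max_len are hypothetical.

```python
from collections import deque

PRODUCTIONS = [("S", "aSb"), ("S", "")]   # S → aSb, S → ε

def sentences_up_to(max_len, start="S", productions=PRODUCTIONS):
    """Sentences with at most max_len symbols, found by breadth-first search."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if not any(ch.isupper() for ch in form):
            found.add(form)            # no nonterminals left: a sentence
            continue
        for left, right in productions:
            i = form.find(left)
            while i != -1:
                new = form[:i] + right + form[i + len(left):]
                terminals = sum(1 for ch in new if not ch.isupper())
                if terminals <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return found

result = sorted(sentences_up_to(8), key=len)
print(result)
```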
Grammar Conventions
TWS convention
1. Upper case letter (identifier) – nonterminal
2. Lower case letter (string) – terminal
3. Lower case Greek letter – strings in V*
4. Left part of the first production is assumed to be the start symbol, e.g.
   S → aSb
   S → ε
5. Left part omitted if same as for preceding production, e.g.
   S → aSb
     → ε
Example: Grammar for identifiers.
Identifier → Letter
           → Identifier Letter
           → Identifier Digit
Letter → ‘a’
       → ‘A’
       → ‘b’
       → ‘B’
       …
       → ‘z’
       → ‘Z’
Digit → ‘0’
      → ‘1’
      …
      → ‘9’
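A recognizer that follows this grammar's left recursion can be sketched as follows. It assumes the elided Letter alternatives are the remaining ASCII letters, which is why string.ascii_letters is used; the function name is_identifier is made up for the example.

```python
import string

LETTERS = set(string.ascii_letters)   # assumed reading of ‘a’ … ‘Z’
DIGITS = set(string.digits)           # ‘0’ … ‘9’

def is_identifier(s):
    """Undo Identifier → Identifier Letter | Identifier Digit from the right,
    until what is left must match Identifier → Letter."""
    if not s:
        return False
    while len(s) > 1:
        if s[-1] in LETTERS or s[-1] in DIGITS:
            s = s[:-1]
        else:
            return False
    return s in LETTERS

for candidate in ["x", "x1", "abc9Z", "", "9x", "a_b"]:
    print(repr(candidate), is_identifier(candidate))
```

In other words, the grammar describes a letter followed by any number of letters and digits.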
Definition: The language generated by a grammar G is the set L(G) = {ω ∈ Σ* | S =>* ω}.
Definition: A sentential form generated by a grammar G is any string α such that S =>* α.
Definition: A sentence generated by a grammar G is any sentential form ω such that ω ∈ Σ*.
Example:
Sentential forms: S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => …
Sentences: ε, ab, aabb, aaabbb, aaaabbbb, each derived in one further step from S, aSb, aaSbb, aaaSbbb, aaaaSbbbb respectively.
Lemma: L(G) = {ω | ω is a sentence}
Proof: Trivial.
Derivations:
A => aABC => aaABCBC => …
A => aBC => abC => abc
aABC => aaBCBC => aabCBC => aabBCC => aabbCC => aabbcC => aabbcc
aaABCBC => aaaBCBCBC => aaaBBCBCC => aaaBBBCCC => aaabBBCCC =>(2) aaabbbCCC => aaabbbcCC =>(2) aaabbbccc
where (2) marks two derivation steps.
L(G) = {a^n b^n c^n | n ≥ 1}
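The derivations above do not list the productions of G. The sketch below assumes the set that is consistent with each step shown (A → aABC, A → aBC, CB → BC, aB → ab, bB → bb, bC → bc, cC → cc); this set is an inference, not something the source states. Under that assumption, a breadth-first search over sentential forms, cut off at a length bound, lists the short sentences. The cut-off is safe because none of the assumed productions shortens a form.

```python
from collections import deque

# Assumed productions, inferred from the derivation steps shown.
PRODUCTIONS = [
    ("A", "aABC"), ("A", "aBC"), ("CB", "BC"),
    ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc"),
]

def sentences_up_to(max_len, start="A", productions=PRODUCTIONS):
    """Sentences of length at most max_len reachable from start."""
    seen = {start}
    queue = deque([start])
    found = set()
    while queue:
        form = queue.popleft()
        if form.islower():
            found.add(form)            # only terminals left: a sentence
        for left, right in productions:
            i = form.find(left)
            while i != -1:
                new = form[:i] + right + form[i + len(left):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(left, i + 1)
    return found

result = sorted(sentences_up_to(9), key=len)
print(result)
```

Forms such as aabcBC, reached by rewriting bC too early, are dead ends: no production applies to cB, so they never yield a sentence.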
The Chomsky Hierarchy
A hierarchy of grammars, the languages they generate, and the machines that accept those languages.
Type  Language Name               Grammar Name                    Restrictions on Grammar                  Accepting Machine
0     Recursively Enumerable      Unrestricted re-writing system  None                                     Turing Machine
1     Context-Sensitive Language  Context-Sensitive Grammar       For all α → β, |α| ≤ |β|                 Linear Bounded Automaton
2     Context-Free Language       Context-Free Grammar            For all α → β, α ∈ Φ                     Push-Down Automaton (parser)
3     Regular Language            Regular Grammar                 For all α → β, α ∈ Φ, β ∈ Σ ∪ ΣΦ ∪ {ε}   Finite-State Automaton
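The restrictions on the grammar can be checked mechanically, production by production. The sketch below returns the highest-numbered type whose restriction every production satisfies. It rests on several assumptions: symbols are single characters, the regular restriction is read as the right-linear form β ∈ Σ ∪ ΣΦ ∪ {ε}, and the function name chomsky_type is invented for the example.

```python
def chomsky_type(productions, nonterminals):
    """Highest-numbered type (3, 2, 1 or 0) whose restriction all productions meet."""
    def is_regular(left, right):
        # α ∈ Φ and β ∈ Σ ∪ ΣΦ ∪ {ε} (assumed right-linear reading)
        return left in nonterminals and (
            right == ""
            or (len(right) == 1 and right not in nonterminals)
            or (len(right) == 2 and right[0] not in nonterminals
                and right[1] in nonterminals))

    def is_context_free(left, right):
        return left in nonterminals          # α is a single nonterminal

    def is_context_sensitive(left, right):
        return len(left) <= len(right)       # |α| ≤ |β|

    checks = [(3, is_regular), (2, is_context_free), (1, is_context_sensitive)]
    for type_number, check in checks:
        if all(check(left, right) for left, right in productions):
            return type_number
    return 0

anbn = [("S", "aSb"), ("S", "")]
anbncn = [("A", "aABC"), ("A", "aBC"), ("CB", "BC"),
          ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
right_linear = [("S", "aS"), ("S", "b")]
shrinking = [("AB", "A")]

print(chomsky_type(anbn, {"S"}))
print(chomsky_type(anbncn, {"A", "B", "C"}))
print(chomsky_type(right_linear, {"S"}))
print(chomsky_type(shrinking, {"A", "B"}))
```

The check classifies a grammar, not a language: a language may have a more restricted grammar than the one being examined.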