Compilation Techniques: Lexical Analysis (Scanning)
Sulistyo Pusptodjati
Source: Compiler Construction by Vana Doufexi, users.ece.northwestern.edu/~boz283/cs-322-original
Quiz
A compiler is a kind of: a. interpreter b. translator
A translator that is not involved in producing machine language: a. compiler b. interpreter
Produces an output program (an executable) that can be run separately from the original program: a. compiler b. interpreter
Interpreters are better suited to web programming than compilers: a. true b. false
Compiler or interpreter: C, Ruby, C++, Java, PHP?
The Compilation Process
Source program with macros
  → Preprocessor → Source program
  → Compiler → Target assembly program
  → Assembler → Relocatable machine code
  → Linker → Absolute machine code
Try g++ with the -v, -E, and -S flags on linprog.
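As a small illustration of the first stages, the C fragment below uses a macro. The names SQUARE and area_of_square are hypothetical and chosen only for this sketch. Running the compiler driver with -E on such a file prints the source after the preprocessor has expanded the macro, and -S stops after the compiler has produced the target assembly program.

```c
/* Hypothetical example file for experimenting with the -E and -S flags. */
#define SQUARE(x) ((x) * (x))   /* macro: expanded by the preprocessor */

/* After preprocessing, the return statement reads: return ((side) * (side)); */
int area_of_square(int side)
{
    return SQUARE(side);
}
```

Comparing the -E output with the original file shows that the macro definition is gone and only plain source text reaches the compiler proper.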
Compiler Front- and Back-end
Front end (analysis):
  Source program (character stream)
  → Scanner (lexical analysis) → Tokens
  → Parser (syntax analysis) → Parse tree
  → Semantic Analysis and Intermediate Code Generation → Abstract syntax tree or other intermediate form
Back end (synthesis):
  → Machine-Independent Code Improvement → Modified intermediate form
  → Target Code Generation → Assembly or object code
  → Machine-Specific Code Improvement → Modified assembly or object code
Example of the Compilation Process (figure not reproduced)
The Scanning Process (Lexical Analysis)
Main goal: recognize words (tokens).
How? By recognizing patterns.
Example: an identifier is a sequence of letters or digits that begins with a letter.
Lexical patterns form a regular language.
Regular languages can be described using regular expressions (REs).
Can an RE recognizer be automated? Yes!
The scanning process
Goal: automate the process
Idea:
  Start with an RE
  Build a DFA. How?
    We can build a non-deterministic finite automaton (Thompson's construction)
    Convert that to a deterministic one (subset construction)
    Minimize the DFA (Hopcroft's algorithm)
  Implement it
Existing scanner generators: flex, lex
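The NFA-to-DFA step can be sketched in C as follows. This is a minimal sketch under stated assumptions, not the code a scanner generator actually emits: the NFA is a hand-written one for (a|b)*abb with no ε-transitions, so the ε-closure step that a Thompson-style NFA would need is omitted, and minimization is not shown. All names (nfa_move, build_dfa, dfa_matches, and so on) are hypothetical.

```c
#include <stdbool.h>

#define NFA_STATES 4
#define ALPHABET   2                 /* symbol 0 = 'a', symbol 1 = 'b' */
#define MAX_DFA    (1 << NFA_STATES) /* at most one DFA state per subset */

/* nfa_move[s][c] = bitmask of NFA states reachable from s on symbol c */
static const unsigned nfa_move[NFA_STATES][ALPHABET] = {
    { 0x3, 0x1 },   /* state 0: a -> {0,1}, b -> {0} */
    { 0x0, 0x4 },   /* state 1: b -> {2} */
    { 0x0, 0x8 },   /* state 2: b -> {3} */
    { 0x0, 0x0 },   /* state 3: accepting, no moves */
};
static const unsigned nfa_start  = 0x1;   /* {0} */
static const unsigned nfa_accept = 0x8;   /* {3} */

static unsigned dfa_set[MAX_DFA];         /* NFA-state set behind each DFA state */
static int dfa_next[MAX_DFA][ALPHABET];   /* DFA transition table */
static int dfa_count;

/* Return the index of the DFA state for `set`, adding it if it is new. */
static int dfa_state_for(unsigned set)
{
    for (int i = 0; i < dfa_count; i++)
        if (dfa_set[i] == set)
            return i;
    dfa_set[dfa_count] = set;
    return dfa_count++;
}

/* Subset construction: each reachable set of NFA states becomes one DFA state. */
void build_dfa(void)
{
    dfa_count = 0;
    dfa_state_for(nfa_start);
    /* dfa_count grows inside the loop, so newly found states are processed too. */
    for (int d = 0; d < dfa_count; d++) {
        for (int c = 0; c < ALPHABET; c++) {
            unsigned target = 0;
            for (int s = 0; s < NFA_STATES; s++)
                if (dfa_set[d] & (1u << s))
                    target |= nfa_move[s][c];
            dfa_next[d][c] = dfa_state_for(target);
        }
    }
}

/* Run the constructed DFA on a string over {a, b}; call build_dfa() first. */
bool dfa_matches(const char *input)
{
    int state = 0;
    for (; *input != '\0'; input++) {
        if (*input != 'a' && *input != 'b')
            return false;
        state = dfa_next[state][*input - 'a'];
    }
    return (dfa_set[state] & nfa_accept) != 0;
}
```

After build_dfa() has run, dfa_matches("abb") follows one table lookup per input character, which is the property that makes the deterministic form attractive for a scanner.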
The scanning process
Definition: regular expressions (over an alphabet Σ)
  ε is an RE denoting {ε}
  If α ∈ Σ, then α is an RE denoting {α}
  If r and s are REs, then
    (r) is an RE denoting L(r)
    r|s is an RE denoting L(r) ∪ L(s)
    rs is an RE denoting L(r)L(s)
    r* is an RE denoting the Kleene closure of L(r)
Property: REs are closed under many operations. This allows us to build complex REs.
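A small worked example may make the definition concrete. The alphabet Σ = {a, b} is an assumption chosen purely for illustration:

```latex
\begin{align*}
L(a \mid b) &= L(a) \cup L(b) = \{a, b\} \\
L(ab) &= L(a)\,L(b) = \{ab\} \\
L(a^{*}) &= \{\varepsilon, a, aa, aaa, \ldots\} \\
L((a \mid b)^{*} a) &= \{\, wa : w \in \{a, b\}^{*} \,\}
\end{align*}
```

The last line combines all three operators: it denotes every string over {a, b} that ends in a.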
Regular Definitions
A regular expression that describes digits is:
  0|1|2|3|4|5|6|7|8|9
For convenience, we'd like to give it a name and then use the name in building more complex regular expressions:
  digit → 0|1|2|3|4|5|6|7|8|9
This is called a regular definition.
Examples:
  integer → 0|((1|2|3|...|9) digit*)
  letter → a|...|z|A|...|Z
  ident → letter (letter | digit)*
  Token_if → if
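The integer and ident definitions can be checked by hand with two short C functions. This is only a sketch: the function names are hypothetical, and it assumes the default "C" locale, in which isalpha accepts exactly a..z and A..Z.

```c
#include <ctype.h>
#include <stdbool.h>

/* ident -> letter (letter | digit)* */
bool matches_ident(const char *s)
{
    if (!isalpha((unsigned char)*s))
        return false;              /* must start with a letter */
    for (s++; *s != '\0'; s++)
        if (!isalnum((unsigned char)*s))
            return false;          /* only letters or digits may follow */
    return true;
}

/* integer -> 0 | (1|...|9) digit* */
bool matches_integer(const char *s)
{
    if (s[0] == '0')
        return s[1] == '\0';       /* "0" alone; leading zeros are rejected */
    if (s[0] < '1' || s[0] > '9')
        return false;              /* must start with a nonzero digit */
    for (s++; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}
```

Note how the shape of each function mirrors its definition: one test for the first symbol, then a loop for the starred part.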
digit → 0|1|2|3|4|5|6|7|8|9
letter → a|...|z|A|...|Z
ident → letter (letter | digit)*
What’s next
Given an input string, we need a “machine” that has a regular expression hard-coded in it and can tell whether the input string matches the pattern described by the regular expression or not.
A machine that determines whether a given string belongs to a language is called a finite automaton.
The scanning process
Definition: a Deterministic Finite Automaton is a five-tuple (Σ, S, δ, s0, F) where
  Σ is the alphabet
  S is the set of states
  δ is the transition function (S×Σ→S)
  s0 is the starting state
  F is the set of final states (F ⊆ S)
Notation: use a transition diagram to describe a DFA.
DFAs are equivalent to REs. Hey! We just came up with a recognizer!
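The five-tuple maps directly onto a table-driven recognizer. The sketch below encodes a DFA for identifiers (a letter followed by letters or digits). The state names, the grouping of the alphabet into three character classes, and the function names are all assumptions made for this illustration.

```c
#include <ctype.h>
#include <stdbool.h>

enum { START, IN_IDENT, DEAD, NUM_STATES };   /* S: the set of states */
enum { LETTER, DIGIT, OTHER, NUM_CLASSES };   /* Sigma, grouped into classes */

/* delta: S x Sigma -> S, stored as a lookup table */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /* START    */ { IN_IDENT, DEAD,     DEAD },
    /* IN_IDENT */ { IN_IDENT, IN_IDENT, DEAD },
    /* DEAD     */ { DEAD,     DEAD,     DEAD },
};

/* F: only IN_IDENT is a final state */
static const bool is_final[NUM_STATES] = { false, true, false };

/* Map an input character to its character class. */
static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Run the DFA from s0 = START and report whether it ends in a final state. */
bool dfa_accepts(const char *input)
{
    int state = START;
    for (; *input != '\0'; input++)
        state = delta[state][char_class(*input)];
    return is_final[state];
}
```

The explicit DEAD state keeps δ a total function, as the definition requires; a transition diagram would normally leave it out.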
The scanning process
Main goal: recognize words/tokens
Snapshot:
  At any point in time, the scanner has read some input and is on the way to identifying what kind of token has been read (e.g. identifier, operator, integer literal, etc.)
  Once the scanner identifies a token, it sends it off to the parser and starts over with the next word.
  Some tokens need additional data to be carried along with them. For example, an identifier token needs to have the identifier itself attached to it.
  Alternatively, the scanner generates a file of tokens which is then input to the parser.
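One common way to carry the additional data is a token record with a kind tag and an attribute field. The layout below is a sketch only: the type names, the 32-byte name buffer, and the helper functions are hypothetical choices for this example.

```c
#include <string.h>

typedef enum { TOK_IDENT, TOK_INTEGER, TOK_LPAREN } TokenKind;

typedef struct {
    TokenKind kind;
    union {
        char name[32];   /* TOK_IDENT: the identifier text (truncated if longer) */
        long value;      /* TOK_INTEGER: the numeric value */
    } attr;              /* unused for TOK_LPAREN */
} Token;

/* Build an identifier token with the identifier itself attached. */
Token make_ident(const char *name)
{
    Token t;
    t.kind = TOK_IDENT;
    strncpy(t.attr.name, name, sizeof t.attr.name - 1);
    t.attr.name[sizeof t.attr.name - 1] = '\0';   /* always NUL-terminated */
    return t;
}

/* Build an integer token with its value attached. */
Token make_integer(long value)
{
    Token t;
    t.kind = TOK_INTEGER;
    t.attr.value = value;
    return t;
}
```

The parser then switches on kind and reads only the attribute that belongs to that kind.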
The scanning process
A simple hand-written scanner would look a bit like this:
...
nextchar = getNextChar();
switch (nextchar) {
case '(':
    return LPAREN;                      /* return LPAREN token */
case '0': case '1': ... case '9':
    value = nextchar - '0';             /* value: integer being built from the digits */
    nextchar = getNextChar();
    while (isdigit(nextchar)) {         /* isdigit is declared in <ctype.h> */
        value = value * 10 + (nextchar - '0');
        nextchar = getNextChar();
    }
    putBack(nextchar);                  /* the first non-digit belongs to the next token */
    /* make a new INTEGER token with the integer value attached */
    return INTEGER;
...
}
...
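The scanning process
Not always as simple as it seems. Example from old versions of FORTRAN:
  DO 5 I=1,10
  vs.
  DO 5 I=1.10
Because blanks are not significant in old FORTRAN, the first line begins a DO loop, while the second assigns 1.10 to a variable named DO5I. The scanner cannot tell which until it reaches the comma or the period.
Hand-written vs. automated scanner:
  Instead of writing a scanner by hand, we can automate the process.
  Specify what needs to be recognized and what to do when something is recognized.
  Have a scanner generator create the scanner based on our specification.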
The scanning process
Goal: automate the process
Idea:
  Start with an RE
  Build a DFA. How?
    We can build a non-deterministic finite automaton (Thompson's construction)
    Convert that to a deterministic one (subset construction)
    Minimize the DFA (Hopcroft's algorithm)
  Implement it
Existing scanner generators: lex, flex
Scanner generator: Lex
digit   [0-9]
letter  [a-zA-Z]
%%
{letter}({letter}|{digit})*   printf("id: %s\n", yytext);
\n                            printf("new line\n");
%%
int main(void) {
    yylex();
    return 0;
}
Lex source is a table of regular expressions and corresponding program fragments.
Lex Source
Lex source is separated into three sections by %% delimiters.
The general format of Lex source is:
  {definitions}
  %%                    (required)
  {transition rules}
  %%                    (optional)
  {user subroutines}
The absolute minimum Lex program is thus:
  %%
Example for a "Tiny" language (figure not reproduced)