Compilation Techniques: Syntax Analysis (Scanning)

Sulistyo Pusptodjati

Source: Compiler Construction by Vana Doufexi, users.ece.northwestern.edu/~boz283/cs-322-original

Quiz

1. A compiler is a kind of:
   a. interpreter  b. translator
2. A translator that is not involved in generating machine language:
   a. compiler  b. interpreter
3. Produces an output program (as an exe) that can be run separately from the original program:
   a. compiler  b. interpreter
4. An interpreter is better suited to web programming than a compiler:
   a. true  b. false
5. Compiled or interpreted: C, Ruby, C++, Java, PHP?

The compilation process

Source program with macros
  → Preprocessor → Source program
  → Compiler → Target assembly program
  → Assembler → Relocatable machine code
  → Linker → Absolute machine code

Try g++ with the -v, -E, and -S flags on linprog.

Compiler front end and back end

Front end (analysis):
  Source program (character stream)
  → Scanner (lexical analysis) → Tokens
  → Parser (syntax analysis) → Parse tree
  → Semantic analysis and intermediate code generation → Abstract syntax tree or other intermediate form

Back end (synthesis):
  Abstract syntax tree or other intermediate form
  → Machine-independent code improvement → Modified intermediate form
  → Target code generation → Assembly or object code
  → Machine-specific code improvement → Modified assembly or object code

Example of the compilation process

The scanning process (lexical analysis)

Main goal: recognize words (tokens).
How? By recognizing patterns.
  Example: an identifier is a sequence of letters or digits that begins with a letter.
Lexical patterns form a regular language.
Regular languages can be described using regular expressions (REs).
Can an RE recognizer be automated?
  Yes!
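
The identifier pattern above is simple enough to check by hand. The following is a minimal sketch in C, not taken from the slides: the function name `is_identifier` is an assumption made for illustration, and "letter" is taken to mean whatever `isalpha` accepts in the current locale.

```c
#include <ctype.h>

/* Returns 1 if s is a non-empty string of letters and digits that
   starts with a letter, and 0 otherwise. */
int is_identifier(const char *s)
{
    if (!isalpha((unsigned char)s[0]))
        return 0;                       /* empty, or does not start with a letter */
    for (const char *p = s + 1; *p != '\0'; p++)
        if (!isalnum((unsigned char)*p))
            return 0;                   /* found something other than a letter or digit */
    return 1;
}
```

With this sketch, `is_identifier("x1")` returns 1 and `is_identifier("1x")` returns 0. A hand-written check like this works for one pattern; a scanner generator does the same job for a whole set of patterns at once.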

The scanning process

Goal: automate the process.
Idea: start with an RE, build a DFA.
How?
  We can build a non-deterministic finite automaton (Thompson's construction).
  Convert that to a deterministic one (subset construction).
  Minimize the DFA (Hopcroft's algorithm).
  Implement it.
Existing scanner generators: flex, lex

The scanning process

Definition: regular expressions (over an alphabet Σ)
  ε is an RE denoting {ε}
  If α ∈ Σ, then α is an RE denoting {α}
  If r and s are REs, then
    (r) is an RE denoting L(r)
    r|s is an RE denoting L(r) ∪ L(s)
    rs is an RE denoting L(r)L(s)
    r* is an RE denoting the Kleene closure of L(r)

Property: REs are closed under many operations. This allows us to build complex REs.
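
For reference, the concatenation and Kleene closure used in the definition can be written out as follows. These are the standard textbook definitions, added here as a reminder rather than taken from the slides.

```latex
\begin{align*}
L(r)L(s) &= \{\, xy \mid x \in L(r),\ y \in L(s) \,\} \\
L(r)^{0} &= \{\varepsilon\}, \qquad L(r)^{i} = L(r)\,L(r)^{i-1} \quad (i \ge 1) \\
L(r^{*}) &= \bigcup_{i \ge 0} L(r)^{i}
\end{align*}
```

For example, with r = a|b, L(r*) is the set of all strings over {a, b}, including the empty string ε.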

Regular definitions

A regular expression that describes digits is:
  0|1|2|3|4|5|6|7|8|9
For convenience, we'd like to give it a name and then use the name in building more complex regular expressions:
  digit → 0|1|2|3|4|5|6|7|8|9
This is called a regular definition.

Examples:
  integer → 0|((1|2|3|...|9) digit*)
  letter → a|...|z|A|...|Z
  ident → letter (letter | digit)*
  token_if → if
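
To see what the integer definition accepts and rejects, it can be transcribed almost literally into C. This is a sketch under the assumption that the whole input string must match; the names `is_digit` and `is_integer` are hypothetical helpers, not part of the slides.

```c
#include <ctype.h>

/* digit → 0|1|...|9 */
static int is_digit(char c)
{
    return isdigit((unsigned char)c) != 0;
}

/* integer → 0 | ((1|...|9) digit*) */
int is_integer(const char *s)
{
    if (s[0] == '0')
        return s[1] == '\0';    /* a leading 0 must be the whole number */
    if (!is_digit(s[0]))
        return 0;               /* empty string or not a digit */
    for (s++; *s != '\0'; s++)  /* first character was 1..9; now digit* */
        if (!is_digit(*s))
            return 0;
    return 1;
}
```

Note that the definition rules out leading zeros: `is_integer("0")` and `is_integer("42")` return 1, while `is_integer("007")` returns 0.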

digit → 0|1|2|3|4|5|6|7|8|9
letter → a|...|z|A|...|Z
ident → letter (letter | digit)*

bEtE2bE2tE

What’s next

Given an input string, we need a “machine” that has a regular expression hard-coded in it and can tell whether the input string matches the pattern described by the regular expression or not.

A machine that determines whether a given string belongs to a language is called a finite automaton.

The scanning process

Definition: a deterministic finite automaton (DFA) is a five-tuple (Σ, S, δ, s0, F) where
  Σ is the alphabet
  S is the set of states
  δ is the transition function (S × Σ → S)
  s0 is the starting state
  F is the set of final states (F ⊆ S)

Notation: use a transition diagram to describe a DFA.

DFAs are equivalent to REs. Hey! We just came up with a recognizer!
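
One way to make the five-tuple concrete is a table-driven recognizer. The sketch below encodes a DFA for ident → letter (letter | digit)* in C. The state names, the grouping of Σ into three character classes, the explicit dead state, and the function name `dfa_accepts` are all assumptions chosen for illustration.

```c
#include <ctype.h>

enum { START, IN_IDENT, DEAD, NUM_STATES };    /* S, with START as s0 */
enum { LETTER, DIGIT, OTHER, NUM_CLASSES };    /* Σ, grouped into character classes */

/* δ: one row per state, one column per character class */
static const int delta[NUM_STATES][NUM_CLASSES] = {
    /* START    */ { IN_IDENT, DEAD,     DEAD },
    /* IN_IDENT */ { IN_IDENT, IN_IDENT, DEAD },
    /* DEAD     */ { DEAD,     DEAD,     DEAD },
};

/* F = { IN_IDENT } */
static const int is_final[NUM_STATES] = { 0, 1, 0 };

static int char_class(char c)
{
    if (isalpha((unsigned char)c)) return LETTER;
    if (isdigit((unsigned char)c)) return DIGIT;
    return OTHER;
}

/* Runs the DFA over the whole input; returns 1 if it ends in a final state. */
int dfa_accepts(const char *input)
{
    int state = START;
    for (; *input != '\0'; input++)
        state = delta[state][char_class(*input)];
    return is_final[state];
}
```

The driver loop never changes; recognizing a different regular language only means swapping the table and the set of final states, which is what makes this form attractive for generated scanners.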

The scanning process

Main goal: recognize words/tokens.
Snapshot:
  At any point in time, the scanner has read some input and is on the way to identifying what kind of token has been read (e.g. identifier, operator, integer literal, etc.).
  Once the scanner identifies a token, it sends it off to the parser and starts over with the next word.
  Some tokens need additional data to be carried along with them. For example, an identifier token needs to have the identifier itself attached to it.
  Alternatively, the scanner generates a file of tokens which is then input to the parser.
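
A token that carries extra data can be represented as a small record: the kind of token plus its attribute. The following C sketch is one possible layout, not the one used in the course; the type and function names (`Token`, `TokenKind`, `make_ident`, `make_integer`) and the fixed 32-byte lexeme buffer are assumptions.

```c
#include <stdlib.h>
#include <string.h>

typedef enum { TOK_IDENT, TOK_INTEGER, TOK_LPAREN } TokenKind;

typedef struct {
    TokenKind kind;
    char lexeme[32];   /* the identifier's text, used when kind == TOK_IDENT */
    long value;        /* the numeric value, used when kind == TOK_INTEGER */
} Token;

/* Builds an identifier token; text longer than the buffer is truncated. */
Token make_ident(const char *text)
{
    Token t;
    t.kind = TOK_IDENT;
    strncpy(t.lexeme, text, sizeof t.lexeme - 1);
    t.lexeme[sizeof t.lexeme - 1] = '\0';
    t.value = 0;
    return t;
}

/* Builds an integer token from a string of decimal digits. */
Token make_integer(const char *digits)
{
    Token t;
    t.kind = TOK_INTEGER;
    t.lexeme[0] = '\0';
    t.value = strtol(digits, NULL, 10);
    return t;
}
```

The parser only needs `kind` to check syntax; the attribute fields matter later, for example when building a symbol table or generating code.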

The scanning process

A simple hand-written scanner would look a bit like this:

    ...
    nextchar = getNextChar();
    switch (nextchar) {
    case '(':
        return LPAREN;              /* return LPAREN token */
    case '0':
    case '1':
    ...
    case '9':
        /* nextchar holds the first digit of the integer */
        nextchar = getNextChar();
        while (nextchar is a digit) {
            concat the digits to build an integer
            nextchar = getNextChar();
        }
        putBack(nextchar);          /* the non-digit belongs to the next token */
        make a new INTEGER token with the integer value attached
        return INTEGER;
    ...
    }
    ...
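
The outline above can be turned into a small runnable scanner. The sketch below is one way to fill in the blanks, under several assumptions that are not in the slides: input comes from a NUL-terminated string, whitespace between tokens is skipped, only `(` and integer tokens are handled, and the names `Scanner`, `Token`, `next_token`, `get_next_char`, and `put_back` are made up for the example.

```c
#include <ctype.h>

typedef enum { TOK_LPAREN, TOK_INTEGER, TOK_EOF, TOK_ERROR } TokenKind;

typedef struct {
    TokenKind kind;
    long value;             /* meaningful only for TOK_INTEGER */
} Token;

typedef struct {
    const char *src;        /* NUL-terminated input */
    int pos;                /* index of the next unread character */
} Scanner;

static int get_next_char(Scanner *sc)
{
    int c = (unsigned char)sc->src[sc->pos];
    if (c != '\0')
        sc->pos++;          /* never advance past the terminator */
    return c;
}

static void put_back(Scanner *sc, int c)
{
    if (c != '\0')
        sc->pos--;          /* the terminator was never consumed */
}

Token next_token(Scanner *sc)
{
    Token tok = { TOK_ERROR, 0 };
    int c = get_next_char(sc);

    while (c == ' ' || c == '\t' || c == '\n')   /* assumption: skip whitespace */
        c = get_next_char(sc);

    switch (c) {
    case '\0':
        tok.kind = TOK_EOF;
        break;
    case '(':
        tok.kind = TOK_LPAREN;
        break;
    case '0': case '1': case '2': case '3': case '4':
    case '5': case '6': case '7': case '8': case '9':
        tok.kind = TOK_INTEGER;
        while (isdigit(c)) {                     /* accumulate the decimal value */
            tok.value = tok.value * 10 + (c - '0');
            c = get_next_char(sc);
        }
        put_back(sc, c);                         /* first non-digit starts the next token */
        break;
    default:
        break;                                   /* unrecognized character: TOK_ERROR */
    }
    return tok;
}
```

Calling `next_token` repeatedly on a scanner initialized as `Scanner sc = { "(42 7", 0 };` yields TOK_LPAREN, TOK_INTEGER with value 42, TOK_INTEGER with value 7, and then TOK_EOF.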

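The scanning process

Not always as simple as it seems.
Example from old versions of FORTRAN:
  DO 5 I=1,10
vs.
  DO 5 I=1.10
Because old FORTRAN ignores blanks, the first line starts a DO loop while the second assigns 1.10 to a variable named DO5I; the scanner cannot tell which until it reaches the comma or the period.

Instead of writing a scanner by hand, we can automate the process.
  Specify what needs to be recognized and what to do when something is recognized.
  Have a scanner generator create the scanner based on our specification.
Hand-written vs. automated scanner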
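
The FORTRAN example shows why a scanner sometimes has to look far ahead before it can classify the first word of a statement. The C sketch below illustrates the idea with a deliberately simplified test: drop the blanks, then look for a comma outside parentheses after the `=`. The function name `looks_like_do_loop` and the 128-byte buffer are assumptions, and the check ignores string constants and other details a real FORTRAN scanner would have to handle.

```c
#include <string.h>

/* Returns 1 if stmt looks like a DO loop header, 0 otherwise. */
int looks_like_do_loop(const char *stmt)
{
    char buf[128];
    size_t n = 0;

    /* Blanks are not significant, so remove them first. */
    for (; *stmt != '\0' && n < sizeof buf - 1; stmt++)
        if (*stmt != ' ')
            buf[n++] = *stmt;
    buf[n] = '\0';

    if (strncmp(buf, "DO", 2) != 0)
        return 0;                       /* does not even start with DO */

    const char *eq = strchr(buf, '=');
    if (eq == NULL)
        return 0;                       /* no '=', so no loop control */

    /* A comma at parenthesis depth 0 after '=' separates the loop bounds. */
    int depth = 0;
    for (const char *p = eq + 1; *p != '\0'; p++) {
        if (*p == '(')
            depth++;
        else if (*p == ')')
            depth--;
        else if (*p == ',' && depth == 0)
            return 1;
    }
    return 0;                           /* plain assignment, e.g. DO5I = 1.10 */
}
```

Here `looks_like_do_loop("DO 5 I=1,10")` returns 1 and `looks_like_do_loop("DO 5 I=1.10")` returns 0, even though the two statements differ only in a character near the end.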

Scanner generator: Lex

    digit   [0-9]
    letter  [a-zA-Z]
    %%
    {letter}({letter}|{digit})*   printf("id: %s\n", yytext);
    \n                            printf("new line\n");
    %%
    int main(void) {
        yylex();
        return 0;
    }

Lex source is a table of regular expressions and corresponding program fragments.

Lex source

Lex source is separated into three sections by %% delimiters.
The general format of Lex source is:

    {definitions}
    %%                    (required)
    {transition rules}
    %%                    (optional)
    {user subroutines}

The absolute minimum Lex program is thus:

    %%

Example for a language called "Tiny"
