2. Introduction To Compilers And Phase 1

  • Slide 1

2.1 2. Introduction To Compilers And Phase 1

    - Inside a compiler.
    - Inside a C-- compiler.
    - The compilation process.
    - Example C-- code.
    - Extended Backus-Naur form.
    - Lexical analysis.
    - The syntax of C--.
    - The unit directory.
    - Phase 1.

  • Slide 2

2.2 Inside A Compiler

[Diagram of a C++ compiler. Labels: p.cxx, Lexical analyser, tokens, Syntax analyser, symbol table, abstract syntax tree, Optimiser, Code generator, p.lst, a.out.]

  • Slide 3

2.3 Inside A C-- Compiler

[Diagram of a C-- compiler. Labels: p.c--, Lexical analyser, tokens, Syntax analyser, symbol table, abstract syntax tree, Code generator, a.s.]

  • Slide 4

2.4 The Compilation Process

There are three main stages to compilation :

    - Lexical analysis.
    - Syntax analysis.
    - Code generation.

Lexical analysis.

    - Recognising the individual components of the language : literals, identifiers, operators etc.
    - Throwing away irrelevant things like comments and whitespace.
    - Often called tokenising.

  • Slide 5

2.5 The Compilation Process II

Syntax analysis.

    - Recognising declarations, statements etc.
    - Detecting syntactic and static semantic errors.
    - Building the symbol table and abstract syntax tree (AST).

Code generation.

    - Generating machine code from the symbol table and the AST.

Most modern compilers also optimise the code, both after syntax analysis (macro optimisation) and after code generation (micro optimisation).

There are three phases to writing a compiler.

    - Phase 1 : write a lexical analyser.
    - Phase 2 : write a syntax analyser.
    - Phase 3 : write a code generator.

  • Slide 6

2.6 Example C-- Code

The factorial program :

    // Computes the factorial of a value read
    // from input.

    int a = 1 ; // Result
    int b = 0 ; // Data

    {
      cin >> b ; // Read data

      // Loop to compute factorial.

      while (b > 0) // Check for termination
      {
        a = a * b ; // Compute new a value
        b = b - 1 ; // Decrement b
      }

      cout [...]

2.11 C-- Lexical Tokens

The lexical tokens are the terminal symbols of the grammar.

The lexical tokens in C-- can be split into the following groups :

    - Identifiers.
    - Literals.
    - Punctuation : = , ; & ( ) [ ] { } and !
    - Operators :
        - Relational : ==, !=, >, <, >= and <=
        [...]

2.15 Example Run

Assume that the file prog.c-- contains the following simple program :

    // Simple test program

    int a ;
    const string s = "Input : " ;
    const string endl = "\n" ;

    {
      cin >> a ;
      cout [...]

2.16 Example Run II

Use the makefile to compile and link your lexer into the file lexer. Then run it :

    jaguar> make lexer
    jaguar> lexer < prog.c--
    INT
    IDENTIFIER : a
    TERMINATOR
    CONST
    STRING
    IDENTIFIER : s
    STRINGLIT : Input :
    TERMINATOR
    CONST
    STRING
    IDENTIFIER : endl
    STRINGLIT : \n
    TERMINATOR
    LBRACE

Lexer reads from cin and writes to cout each token's kind and, if required, its value.

  • Slide 17

2.17 Example Run III

    CIN
    INOP
    IDENTIFIER : a
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : s
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : b
    TERMINATOR
    COUT
    OUTOP
    IDENTIFIER : endl
    TERMINATOR
    RBRACE
    jaguar>

  • Slide 18

2.18 lexprog.cxx

    #include [...]
    #include ".../phase1/lexer.h"

    void main()
    {
      LexToken lexToken ; // Next lexical token

      skipWhiteComments() ;
      while (cin)
      {
        lexAnal(lexToken) ;
        writeToken(lexToken) ;
        cout [...]

2.23 Lookahead

There are 28 possible kinds of lexical token, with one value in the LexTokenTag enum for each kind.

lexAnal can identify the kind of some tokens just by looking at the first character on input (i.e. by one character lookahead). This is true for tokens that start with (or consist of only) the following characters :

    , ; ( ) [ ] { } + - * / % 0..9

The following pairs of tokens require lexAnal to inspect the next two input characters (i.e. two character lookahead) to distinguish between them :

    - = and ==
    - ! and !=
    - | and ||
    - > and >=
    - < and [...]
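
The two character lookahead described above can be sketched in C++ roughly as follows. This is only an illustration, not the course's lexAnal : the Tok enum, its value names and the functions scanOperator and scanString are all hypothetical, since the real LexTokenTag values are not shown here. It handles just three of the pairs (= and ==, ! and !=, > and >=) and ignores other tokens that begin with the same characters, such as >>.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical token kinds, invented for this sketch only.
enum class Tok { Assign, Eq, Not, Ne, Gt, Ge, Unknown };

// Read one token that starts with '=', '!' or '>'.
// The first character is consumed. The second is only peeked at,
// and is consumed only if it completes a two character token.
Tok scanOperator(std::istream& in)
{
    int first = in.get();
    if (first == std::istream::traits_type::eof())
        return Tok::Unknown;

    bool startsPair = (first == '=' || first == '!' || first == '>');
    bool followedByEq = startsPair && (in.peek() == '=');
    if (followedByEq)
        in.get();               // consume the second character

    switch (first)
    {
        case '=': return followedByEq ? Tok::Eq : Tok::Assign;
        case '!': return followedByEq ? Tok::Ne : Tok::Not;
        case '>': return followedByEq ? Tok::Ge : Tok::Gt;
        default:  return Tok::Unknown;
    }
}

// Convenience wrapper (also hypothetical) : scan the start of a string.
Tok scanString(const std::string& text)
{
    std::istringstream in(text);
    return scanOperator(in);
}
```

For example, scanString("==") yields Tok::Eq while scanString("= ") yields Tok::Assign. Using peek() rather than get() for the second character means a character that does not belong to the token is left on the input for the next call.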