29
INTRODUCTION TO COMPILER Baishakhi Ray Programming Languages & Translators These slides are motivated from Prof. Calvin Lin, UT Austin

INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

INTRODUCTION TO COMPILERBaishakhi Ray

Programming Languages & Translators

These slides are motivated from Prof. Calvin Lin, UT Austin

Page 2: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

What is a Compiler?

Compiler

#include <iostream>

int main() { std::cout << "Hello World!"; return 0;}

001100110011001001001100110000000111001000110000000110110011

Source Program Target Program

Input

Output

Page 3: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

What is a Compiler?

CompilerSource Program Target Program

Input

Output

InterpreterSource Program

InputOutput

Page 4: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

A Hybrid Compiler

Translator

Source Program

InterpreterIntermediate Program

InputOutput

Page 5: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

A language processing system

Preprocessor

Compiler

Assembler

Linker/Loader

Source Program

Modified Source Program

Target Assembly Program

Relocatable Machine Code

Target Machine CodeLibrary filesRelocatable object files

A source code typically have many modules written in different files.

Collect source program from different files, expand macros, etc.

Assembly code is easier to produce and debug.

A linker resolves external memory addresses, where one file’s code

refers another file’s location. Loader puts all the executive object files to memory for

execution.

Page 6: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

What is a Compiler?

Compiler

#include <iostream>

int main() { std::cout << "Hello World!"; return 0;}

001100110011001001001100110000000111001000110000000110110011

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Page 7: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Structure of a Typical Compiler

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Analysis Phase

Front End

Synthesis Phase

Back End

Page 8: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Input to Compiler

for i = 1 to 10 doa[i] = x * 5;

f o r i = 1 t o 1 0 d o a [ i ] = x * 5 ;

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Page 9: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Lexical Analysis

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

Break character stream into tokens (“words”)

for id(i) <=> number(1) to number(10) do id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>

Page 10: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Compiler Data Structure

▪ Symbol Tables▪ Compile-time data structures▪ Hold names, type information, and scope information for variables

▪ Scopes ▪ A name space e.g., In C/C++, each set of curly braces defines a new scope ▪ Can create a separate symbol table for each scope

Page 11: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Lexical Analysis

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

Break character stream into tokens (“words”)

for id(i) <=> number(1) to number(10) do id(a) <[> id(i) <]> <=> id(x) <*> number(5) <;>

for <id,1> <=> number(1) to number(10) do <id,2> <[> <id,1> <]> <=> <id,3> <*> number(5) <;>

1 i …2 a …3 x …

Symbol Table

Page 12: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Syntactic Analysis (Parsing)

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

Impose Structure to Token Stream

for

i 1 10 assign

a i x 5

array timesIn a typical syntax tree, intermediate nodes represent operations and Leaf node represent the arguments of the operations.

Page 13: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Semantic Analysis

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

Determine whether source is meaningful

▪ Check for semantic errors▪ Check for type errors▪ Gather type information for subsequent stages

▪ Relate variable uses to their declarations

Page 14: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Usage of Symbol Tables

▪ For each variable declaration: ▪ Check for symbol table entry ▪ Add new entry (parsing)▪ add type info (semantic analysis)

▪ For each variable use: ▪ Check symbol table entry (semantic analysis)

Page 15: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Intermediate Code Generation

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Transform AST into low-level intermediate representation (IR)

Simplifies the IR

• Removes high-level control structures: - for, while, do, switch • Removes high-level data structures: - arrays, structs, unions, enums

Page 16: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Intermediate Code Generation

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Transform AST into low-level intermediate representation (IR)

One possible result is assembly-like code

• Semantic lowering • Control-flow expressed in terms of “gotos” • Each expression is very simple (three-address code) e.g., x := a * b * c

t := a * b x := t * c

Page 17: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Intermediate Code Generation

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

i := 1 loop1: t1 := x * 5 t2 := &a t3 := sizeof(int) t4 := t3 * i t5 := t2 + t4 *t5 := t1 i := i + 1 if i <= 10 goto loop1

Page 18: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Optimization

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Mostly machine independent optimization Phase aims to generate better code.

Better can be• Faster• Shorter• Energy efficient• …

Page 19: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Optimization

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

for i = 1 to 10 doa[i] = x * 5;

i := 1 t3 := sizeof(int) loop1: t1 := x * 5 t2 := &a t3 := sizeof(int) t4 := t3 * i t5 := t2 + t4 *t5 := t1 i := i + 1 if i <= 10 goto loop1

Page 20: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Low Level Code Generation

Intermediate Code Generation

optimization

Code Generation

Lexical Analysis

Syntactic Analysis

Semantic Analysis

Interpreter

Character stream

Token stream

Syntax trees

Syntax trees

IR

IR

Target Language

Register Transfer Language (RTL) – Linear representation – Typically language-independent – Nearly corresponds to machine instructions

Example operations • Assignment x := y • Unary op x := op y • Binary op x := y op z • Call x := f() • Cbranch if (x==3) goto L• Address of p := & y • Load x := *(p+4) • Store *(p+4) := y

Page 21: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Exercise:

a = b + c * 5

Page 22: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Compiler vs Interpreter

Compiler InterpreterOptimization Compiler sees the entire program. Thus

optimization is easy.Interpreter sees program line by line. Thus, optimization is not robust.

Running time Compiled code runs faster Interpreted code runs slower

Program Generation

Compiler generates output program, which can be run independently at a later point of time.

Do not generate output program. Evaluate each line one by one during program execution.

Error Execution Emits compilation errors after the whole compilation process.

Reads each line and shows errors, if any.

Example C, C++ Command Lines

Page 23: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Why studying compiler?

Isn’t it a solved problem?

▪ Machines keep changing ▪ New features present new problems (e.g., GPU) ▪ Changing costs lead to different concerns

▪ Languages keep changing ▪ New ideas (e.g., OOP and GC) have gone mainstream

▪ Applications keep changing ▪ Interactive, real-time, mobile, machine-learning based applications

Page 24: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Why studying compiler?

▪ Values keep changing

▪ We used to just care about run-time performance

▪ Now? ▪ Compile-time performance▪ Code size ▪ Correctness ▪ Energy consumption▪ Security ▪ Fault tolerance

Page 25: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Value added compilation

▪ The more we rely on software, the more we demand more of it

▪ Compilers can help– treat code as data ▪ Analyze the code

▪ Correctness

▪ Security

Page 26: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Correctness and Security

▪ Can we check whether pointers and addresses are valid?

▪ Can we detect when untrusted code accesses a sensitive part of a system?

▪ Can we detect whether locks are used properly?

▪ Can we use compilers to certify that code is correct?

▪ Can we use compilers to verify that a given compiler transformation is correct?

Page 27: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Value-added Compilation

▪ The more we rely on software, the more we demand more of it

▪ Compilers can help– treat code as data ▪ Analyze the code

▪ Correctness ▪ Security ▪ Reliability ▪ Program understanding ▪ Program evolution

▪ Software testing ▪ Reverse engineering ▪ Program obfuscation ▪ Code compaction ▪ Energy efficiency

Computation important understanding computation important

Page 28: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Why studying compiler?

▪ Compilers are a fundamental building block of modern systems

▪ We need to understand their power and limitations ▪ Computer architects ▪ Language designers ▪ Software engineers ▪ OS/Runtime system researchers ▪ Security researchers ▪ Formal methods researchers (model checking, automated theorem proving)

Page 29: INTRODUCTION TO COMPILERbaishakhir.github.io/class/2020_Fall/1_introduction.pdfrefers another file’s location. Loader puts all the executive object files to memory for execution

Due

• Programming assignment 0 is due next Wednesday.• Set up Git environment