Lecture1 - Overview of Compiler


  • 8/3/2019 Lecture1 - Overview of Compiler


    Overview of Compiler

    A compiler is a program (written in a high-level language) that converts, translates, or compiles a source program written in a high-level language into equivalent machine code.

    source program -> [compiler] -> machine code (or object code)


    What is a Compiler?

    Definition: a compiler is a program that translates one language into another.

    Usually, the translation takes place between a high-level language and a low-level language.

    Clearly, our first step is to discuss some terminology.


    Terminology

    Source language: the language that is being translated.

    Object language: the language into which the translation is being done.

    High-level language: a language that is far removed from the computer; one that is close to the problem area(s) for which the language is designed.


    Terminology

    Low-level language: a language that is close to the machine (computer) on which the language will run (execute).

    Object language (sometimes called machine code): the language of some computer. This language usually is not human-readable (it is expressed in bits or hex).


    Terminology

    Intermediate language: a language that is used either because it is a temporary step in the translation process, or because it is neither particularly high- nor low-level and is the output of a translation.

    Assembly language: a language that translates almost one-to-one to machine language, but is in human-readable form.


    What's a Compiler? (cont.)

    Today, compilers are written using high-level languages (such as Java, C++, etc.).

    The earliest compilers were written in assembly language (e.g., the first FORTRAN compiler, begun in 1954 and released in 1957, and the first COBOL compilers around 1960).

    Sometimes a compiler is written in the same language that it compiles. This is done through bootstrapping.


    Why Should I Learn Compiler Construction?

    How do compilers work?

    How do computers work? (instruction set, registers, addressing modes, run-time data structures, ...)

    What machine code is generated for certain language constructs? (efficiency considerations)

    Getting "a feeling" for good language design


    Why Compilers? A Brief History

    The first computers were hard-wired.

    That is, they were collections of physical devices connected to one another in an assemblage designed to calculate particular kinds of results.


    Why Compilers? A Brief History

    For example, Babbage's Analytical Engine and his Difference Engine were assemblages of gears designed to solve numeric problems.

    A primary driving force behind early computing machines was the calculation of ballistics tables for artillery.

    Jacquard's loom is another example, and Hollerith's work for the US Census Bureau is yet another.


    Why Compilers? A Brief History

    In the mid-1940s, John von Neumann described the stored-program computer.

    The key observation is that just as you can store data in the memory of a computer, that data can be machine instructions. Then the computer can take its instructions from memory.


    Why Compilers? A Brief History

    The computer can also modify the instructions in its memory.

    In fact, it can write its own programs, storing them in memory.

    It quickly became apparent that the simplest way to store information in a computer was in the form of binary numbers.


    Why Compilers? A Brief History

    So, to program a computer, you only needed to enter a sequence of binary numbers into memory and then tell the computer at which memory address to start execution.

    This was programming in machine language.

    Instructions (and data) were entered from a console, one word (in binary) at a time.
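To make the stored-program idea concrete, here is a minimal Python sketch of a hypothetical accumulator machine whose program and data live in the same memory array. The opcodes (HALT, LOAD, ADD, STORE) and the two-word instruction format are invented for illustration and do not model any real computer; "programming" it means filling memory with numbers and giving a start address.

```python
# Hypothetical accumulator machine: program and data share one memory array.
# Each instruction is two words: an opcode followed by an operand address.
HALT, LOAD, ADD, STORE = 0, 1, 2, 3


def run(memory, start):
    """Fetch-decode-execute loop beginning at address `start`."""
    acc = 0      # the single accumulator register
    pc = start   # program counter: address of the next instruction
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == LOAD:
            acc = memory[operand]
        elif opcode == ADD:
            acc += memory[operand]
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError(f"bad opcode {opcode} at address {pc - 2}")


# Words 0-7 hold the program; words 8-10 hold the data.
memory = [
    LOAD, 8,     # acc = mem[8]
    ADD, 9,      # acc = acc + mem[9]
    STORE, 10,   # mem[10] = acc
    HALT, 0,
    7, 9, 0,     # data: two inputs and a slot for the result
]
run(memory, start=0)
print(memory[10])  # 16
# The program part of memory, shown as 4-bit binary words:
print(" ".join(format(w, "04b") for w in memory[:8]))
# 0001 1000 0010 1001 0011 1010 0000 0000
```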


    Why Compilers? A Brief History

    This form of coding (note the word!) was quickly replaced by programming in assembly language.

    A program called an assembler was written (in machine language) to translate assembly language into machine language.
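A minimal sketch of what an assembler does, targeting a hypothetical accumulator machine (opcodes HALT=0, LOAD=1, ADD=2, STORE=3, plus a WORD directive for raw data, all assumptions). It maps each mnemonic to its numeric opcode, one line at a time; real assemblers also resolve symbolic labels and addresses, which this sketch omits.

```python
# Hypothetical opcode table for a toy accumulator machine (not a real ISA).
OPCODES = {"HALT": 0, "LOAD": 1, "ADD": 2, "STORE": 3}


def assemble(source):
    """Translate lines such as 'LOAD 8' into a flat list of machine words."""
    words = []
    for line in source.splitlines():
        line = line.split(";")[0].strip()   # drop comments and surrounding blanks
        if not line:
            continue
        mnemonic, *operands = line.split()
        mnemonic = mnemonic.upper()
        if mnemonic == "WORD":              # directive: emit one raw data word
            words.append(int(operands[0]))
        elif mnemonic in OPCODES:
            operand = int(operands[0]) if operands else 0
            words.extend([OPCODES[mnemonic], operand])
        else:
            raise ValueError(f"unknown mnemonic {mnemonic!r}")
    return words


program = """
    LOAD 8      ; acc = mem[8]
    ADD 9       ; acc = acc + mem[9]
    STORE 10    ; mem[10] = acc
    HALT
    WORD 7
    WORD 9
    WORD 0
"""
print(assemble(program))  # [1, 8, 2, 9, 3, 10, 0, 0, 7, 9, 0]
```

The output is the memory image that a programmer would otherwise have entered by hand, one binary word at a time.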


    Why Compilers? A Brief History

    After the first assembler was written, no one needed to code in machine language any longer.

    But coding x = 3; can take many instructions.

    So the thought was: can we create a program that translates something like x = 3; into assembly language or into machine language?


    Why Compilers? A Brief History: Formal Languages

    At about the same time, in the mid-1950s, Noam Chomsky (M.I.T.) began investigating the formal structure of natural languages.

    His work led to the Chomsky hierarchy of type 0, 1, 2, and 3 languages and their associated grammars.


    Why Compilers? A Brief History: Formal Languages

    The type 2 (context-free) grammars turned out to be very good at describing computer languages.

    Efficient ways to recognize the structure of a source program using a type 2 grammar were then developed.

    Such recognition is called parsing.
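As an illustration of parsing with a type 2 grammar, the following Python sketch is a recursive-descent recognizer: one method per nonterminal of a small expression grammar chosen as an assumption for this example. It only answers whether the token list is a sentence of the grammar; it does not build a tree, and it takes already-split tokens so no scanner is needed.

```python
# Assumed grammar (EBNF):
#   expr   -> term   { ("+" | "-") term }
#   term   -> factor { ("*" | "/") factor }
#   factor -> NUMBER | IDENT | "(" expr ")"


class Recognizer:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, predicate):
        tok = self.peek()
        if tok is None or not predicate(tok):
            raise SyntaxError(f"unexpected token {tok!r} at position {self.pos}")
        self.pos += 1

    def expr(self):
        self.term()
        while self.peek() in ("+", "-"):
            self.pos += 1
            self.term()

    def term(self):
        self.factor()
        while self.peek() in ("*", "/"):
            self.pos += 1
            self.factor()

    def factor(self):
        if self.peek() == "(":
            self.pos += 1
            self.expr()
            self.expect(lambda t: t == ")")
        else:
            # NUMBER or IDENT
            self.expect(lambda t: t.isdigit() or t.isidentifier())


def recognizes(tokens):
    """Return True if the whole token list is a sentence of the grammar."""
    r = Recognizer(tokens)
    try:
        r.expr()
        return r.pos == len(tokens)
    except SyntaxError:
        return False


print(recognizes(["f", "*", "(", "a", "+", "3", ")"]))  # True
print(recognizes(["a", "+", "*", "3"]))                 # False
```

Operator precedence falls out of the grammar itself: `*` binds tighter than `+` because `term` sits below `expr`.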


    Why Compilers? A Brief History: Formal Languages

    Very closely related to context-free grammars are the type 3 grammars.

    These are the regular grammars; they are equivalent to finite automata and to regular expressions.

    An entire sub-branch of mathematics studies automata; it is called automata theory.
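A type 3 language can be recognized by a finite automaton. The sketch below is a hypothetical example, not any particular scanner's code: it encodes a deterministic finite automaton (DFA) for identifiers of the form letter (letter | digit)* as a transition table. Treating `_` as a letter is an assumption; for ASCII input it accepts the same strings as the regular expression `[A-Za-z_][A-Za-z0-9_]*`.

```python
# DFA for identifiers: letter (letter | digit)*, with "_" counted as a letter.
START, IN_IDENT, DEAD = 0, 1, 2
ACCEPTING = {IN_IDENT}

# Transition table: (state, character class) -> next state.
# Any pair not listed leads to the DEAD (rejecting) state.
TRANSITIONS = {
    (START, "letter"): IN_IDENT,
    (IN_IDENT, "letter"): IN_IDENT,
    (IN_IDENT, "digit"): IN_IDENT,
}


def char_class(c):
    if c.isalpha() or c == "_":
        return "letter"
    if c.isdigit():
        return "digit"
    return "other"


def accepts(text):
    """Run the DFA over `text`; accept if it ends in an accepting state."""
    state = START
    for c in text:
        state = TRANSITIONS.get((state, char_class(c)), DEAD)
        if state == DEAD:
            return False                     # no way back out of DEAD
    return state in ACCEPTING


for s in ["x", "count1", "_tmp", "9lives", ""]:
    print(repr(s), accepts(s))
```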


    Why Compilers? A Brief History: Formal Languages

    It turns out that type 3 (regular) grammars are very good at describing the atoms used in computer languages.

    These atoms are the reserved words, symbols, and user-defined words (identifiers) that are used in a computer language.

    Recognizing atoms is called scanning (or lexing).


    Why Compilers? A Brief History

    By far the most difficult and complicated problem has been how to generate object code that is concise and, most importantly, executes efficiently.

    This is called optimization.


    Why Compilers? A Brief History

    Far simpler are the front-end issues of scanning and parsing, i.e., recognizing the source code.

    This is because we have developed (semi-)automatic ways to create scanners and parsers, using scanner generators and parser generators.


    Programs Related to Compilers

    Interpreters: directly execute the code as it is recognized, usually statement by statement.

    Assemblers: translate assembly language to machine language.

    Macro Assemblers: the same, but with (powerful) macro capabilities.
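To contrast an interpreter with a compiler, here is a minimal sketch that executes each statement as soon as it is recognized, without producing any target code. The two statement forms (`NAME = expr` and `print expr`) are assumptions, and to stay short it reuses Python's own expression parser from the standard `ast` module rather than writing one.

```python
import ast
import operator

# Operators the toy interpreter supports (integer arithmetic only).
OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.FloorDiv: operator.floordiv}


def evaluate(node, env):
    """Evaluate an expression tree produced by ast.parse(..., mode="eval")."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]                  # KeyError if the variable is unset
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate(node.left, env),
                                  evaluate(node.right, env))
    raise SyntaxError(f"unsupported expression: {ast.dump(node)}")


def interpret(program):
    """Execute statements one by one; return the list of printed values."""
    env, output = {}, []
    for line in program.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("print "):
            tree = ast.parse(line[len("print "):], mode="eval")
            output.append(evaluate(tree.body, env))
        else:
            name, expr = (part.strip() for part in line.split("=", 1))
            env[name] = evaluate(ast.parse(expr, mode="eval").body, env)
    return output


print(interpret("""
a = 100
b = a * 2 + 3
print b
"""))  # [203]
```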


    Programs Related to Compilers

    Linkers: combine object modules to produce an executable module.

    Linkage Editors: manage the linking process, and are able to create and maintain object libraries.


    Programs Related to Compilers

    Loaders: load executable modules into memory and launch execution.

    Dynamic Loaders: loaders that stay around during execution to handle the loading of DLLs (dynamic-link libraries).


    Programs Related to Compilers

    Preprocessors: usually a separate program whose input is source code and whose output is source code; a preprocessor performs macro expansion, comment deletion, etc. It is sometimes the first phase of a compiler.
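A minimal sketch of a source-to-source preprocessor that performs the two jobs named above: comment deletion and macro expansion. The C-like syntax (`//` comments, parameterless `#define NAME VALUE`) is an assumption, and the sketch ignores string literals, so a `//` inside a string would be mistaken for a comment.

```python
import re

DEFINE = re.compile(r"\s*#define\s+(\w+)\s+(.*)")


def preprocess(source):
    """Source-to-source pass: delete // comments, record object-like
    '#define NAME VALUE' macros, and expand them in later lines."""
    macros, out_lines = {}, []
    for line in source.splitlines():
        line = line.split("//", 1)[0].rstrip()   # comment deletion
        match = DEFINE.match(line)
        if match:
            macros[match.group(1)] = match.group(2)
            continue                              # the directive itself is not emitted
        for name, value in macros.items():
            # \b...\b: replace whole identifiers only, so SIZE2 is left alone
            line = re.sub(rf"\b{re.escape(name)}\b", lambda _m, v=value: v, line)
        out_lines.append(line)
    return "\n".join(out_lines)


print(preprocess("#define SIZE 100\nint a[SIZE];  // buffer\nint SIZE2;"))
```

Both input and output are source text, which is why such a pass can run as a separate program before the compiler proper.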


    Programs Related to Compilers

    Editors: allow the user to create and update source code.

    Smart Editors: include syntax coloring, parenthesis balancing, etc.

    Debuggers: provide an environment in which code may be debugged, including single stepping, symbol tables, etc.


    Programs Related to Compilers

    IDEs (integrated development environments): provide an integrated editor, debugger, and execution environment.

    Profilers: collect statistics about where programs spend their time during execution; important for optimizing at the source code level.


    Programs Related to Compilers

    Project Managers: programs that help software managers deal with hundreds or thousands of modules, build reports, etc.

    SCCS (source code control systems): provide multiple users with access to shared code in a controlled manner.


    The Translation Process

    The translation process consists of a collection of phases, with the output of one phase feeding the input of the next.

    During this process, the original source code is transformed into a sequence of intermediate representations (IRs).


    The Translation Process [diagram not reproduced]


    Phases of Compiler

    Parallel to all the other phases are two activities:

    Symbol table manipulation. The symbol table is one of the primary data structures that a compiler uses; it is used by all of the phases.

    Error detection and handling.
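One common way to organize a symbol table, sketched here under the assumption of a block-structured language with nested scopes, is a stack of dictionaries. The errors raised on duplicate or missing declarations show symbol table manipulation and error detection working together; the class and method names are hypothetical.

```python
class SymbolTable:
    """Scoped symbol table: a stack of dicts, innermost scope last."""

    def __init__(self):
        self.scopes = [{}]                   # start with the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, info):
        scope = self.scopes[-1]
        if name in scope:                    # error detection: redeclaration
            raise NameError(f"{name!r} already declared in this scope")
        scope[name] = info

    def lookup(self, name):
        for scope in reversed(self.scopes):  # innermost declaration wins
            if name in scope:
                return scope[name]
        raise NameError(f"{name!r} is not declared")


table = SymbolTable()
table.declare("a", {"type": "int"})
table.enter_scope()
table.declare("a", {"type": "float"})        # shadows the outer a
print(table.lookup("a")["type"])             # float
table.exit_scope()
print(table.lookup("a")["type"])             # int
```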


    The Scanner

    The scanner reads the source program as a stream of characters and performs lexical analysis, collecting sequences of characters into meaningful units called tokens.

    The scanner may also create a symbol table and a literal table.
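The following Python sketch shows a scanner built from regular expressions with the standard `re` module. The token classes and the keyword set are illustrative assumptions for a small C-like language; as described above, the scanner also records identifiers in a symbol table and numbers in a literal table.

```python
import re

# Token classes for a small C-like language (an illustrative assumption).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[=+\-*/>]"),
    ("PUNCT",  r"[(){};,]"),
    ("SKIP",   r"\s+"),
    ("ERROR",  r"."),                 # any other single character
]
KEYWORDS = {"int", "if", "while", "return"}
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table, literal_table) for the source text."""
    tokens, symbols, literals = [], [], []
    for m in MASTER.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        if kind == "IDENT" and text in KEYWORDS:
            kind = "KEYWORD"
        elif kind == "IDENT" and text not in symbols:
            symbols.append(text)
        elif kind == "NUMBER" and int(text) not in literals:
            literals.append(int(text))
        tokens.append((kind, text))
    return tokens, symbols, literals


tokens, symbols, literals = scan("b = f(a) + 3;")
print(tokens)
print(symbols, literals)  # ['b', 'f', 'a'] [3]
```

The alternatives are tried in list order at each position, so keywords are first matched as identifiers and then reclassified by the `KEYWORDS` lookup.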


    The Parser

    The parser reads the tokens produced by the scanner and performs syntactic analysis, creating an IR (a parse tree or a syntax tree) that shows the structure of the program.

    Syntax trees (abstract syntax trees) are reduced representations of the parse tree, with many irrelevant nodes eliminated.
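To show what an abstract syntax tree keeps, the sketch below parses the statement `b = f(a) + 3` into a tree of Python dataclasses. The node classes and the tiny grammar (assignment, `+` only, one-argument calls) are assumptions for illustration. A parse tree for the same statement would also contain a node for each nonterminal and for the parentheses; the AST keeps only the assignment, the operator, the call, and the operands.

```python
from dataclasses import dataclass


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Call:
    func: str
    arg: object      # any expression node


@dataclass
class BinOp:
    op: str
    left: object
    right: object


@dataclass
class Assign:
    target: str
    value: object


def parse_assignment(tokens):
    """Assumed grammar:  assign -> IDENT "=" expr
                         expr   -> atom { "+" atom }
                         atom   -> NUMBER | IDENT [ "(" expr ")" ]"""
    pos = 0

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom():
        tok = advance()
        if tok.isdigit():
            return Num(int(tok))
        if not tok.isidentifier():
            raise SyntaxError(f"unexpected token {tok!r}")
        if peek() == "(":                    # function call: f(expr)
            advance()
            arg = expr()
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return Call(tok, arg)
        return Var(tok)

    def expr():
        node = atom()
        while peek() == "+":
            advance()
            node = BinOp("+", node, atom())  # left-associative
        return node

    target = advance()
    if advance() != "=":
        raise SyntaxError("expected '='")
    tree = Assign(target, expr())
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


print(parse_assignment(["b", "=", "f", "(", "a", ")", "+", "3"]))
# Assign(target='b', value=BinOp(op='+', left=Call(func='f', arg=Var(name='a')), right=Num(value=3)))
```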


    The Semantic Analyzer

    The semantics of a program are its meaning: what it is intended to accomplish.

    The semantic analyzer creates an intermediate data structure that contains the part of this meaning that can be determined without running the program; this part is the static semantics.

    The dynamic semantics of a program can only be determined by executing the program.


    The Semantic Analyzer

    An example of the static semantics of a program is the data types of the variables (and expressions).

    These static semantics usually are represented in the intermediate representations (IRs) as attributes.

    The IR usually is a tree, decorated with these attributes.
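As a sketch of how a semantic analyzer decorates a tree with type attributes, the function below walks an expression tree (nested dictionaries, a hypothetical representation) and stores a `type` attribute on every node, looking variables up in a symbol table. The promotion rule int + float -> float is an assumption modeled on C-like languages.

```python
def check(node, symbols):
    """Decorate an expression tree with a 'type' attribute on every node.
    Nodes are dicts with a 'kind' key; `symbols` maps names to declared types."""
    kind = node["kind"]
    if kind == "num":
        node["type"] = "float" if isinstance(node["value"], float) else "int"
    elif kind == "var":
        if node["name"] not in symbols:
            raise NameError(f"undeclared variable {node['name']!r}")
        node["type"] = symbols[node["name"]]
    elif kind == "add":
        left = check(node["left"], symbols)
        right = check(node["right"], symbols)
        # Assumed C-like promotion rule: int + float -> float.
        node["type"] = "float" if "float" in (left, right) else "int"
    else:
        raise TypeError(f"unknown node kind {kind!r}")
    return node["type"]


symbols = {"a": "int", "x": "float"}          # from the declarations
tree = {"kind": "add",
        "left": {"kind": "var", "name": "x"},
        "right": {"kind": "num", "value": 3}}
print(check(tree, symbols))                   # float
print(tree["right"]["type"])                  # int (the attribute stays on the node)
```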


    (Source) Code Optimization

    Optimization may occur during several phases.

    Source code optimization rearranges the source (or the IR of the source) in order to produce more efficient results.

    E.g., x = 7 + 9; can become x = 16;

    This is called constant folding.
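Constant folding can be sketched as a bottom-up pass over an expression tree: fold the children first, and if both operands are then constants, compute the result at compile time. The tuple representation is an assumption. Division is deliberately left out so that an expression such as `1 / 0` is not evaluated by the compiler.

```python
import operator

# Operators safe to evaluate at compile time. Division is omitted on purpose,
# so an expression like 1 / 0 is left for run time rather than crashing the compiler.
FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(node):
    """Constant-fold an expression tree bottom-up.
    Leaves are ints (constants) or strs (variable names);
    interior nodes are (op, left, right) tuples."""
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if op in FOLDABLE and isinstance(left, int) and isinstance(right, int):
        return FOLDABLE[op](left, right)     # both operands known at compile time
    return (op, left, right)


print(fold(("+", 7, 9)))                     # 16
print(fold(("*", "y", ("+", 7, 9))))         # ('*', 'y', 16)
print(fold(("/", 1, 0)))                     # ('/', 1, 0)
```

A real compiler would also fold using the target machine's integer width and overflow rules, which Python's unbounded integers do not model.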


    (Source) Code Optimization

    Duplicated computations can be saved as temporaries and their values reused.

    Recursion (in particular, tail recursion) can be converted to iteration.

    Repeated calculations can be moved out of loops.

    Many other transformations are possible.
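Reusing duplicated computations is usually called common subexpression elimination. Below is a sketch of a local version over straight-line three-address code; the `(dest, op, arg1, arg2)` instruction format and the `copy` pseudo-operation are assumptions, and operand order is not normalized, so `b * c` and `c * b` are treated as different expressions.

```python
def local_cse(code):
    """Local common subexpression elimination over straight-line code.
    Each instruction is (dest, op, arg1, arg2), meaning dest = arg1 op arg2."""
    available = {}   # (op, arg1, arg2) -> a variable currently holding that value
    result = []
    for dest, op, a, b in code:
        key = (op, a, b)
        if key in available:
            # Same computation already done and still valid: reuse it.
            result.append((dest, "copy", available[key], None))
        else:
            result.append((dest, op, a, b))
        # Assigning to dest invalidates expressions that read dest
        # and entries whose value lived in dest.
        available = {k: v for k, v in available.items()
                     if dest not in (k[1], k[2]) and v != dest}
        # Record the new value unless it reads its own destination (e.g. x = x + 1).
        if dest not in (a, b) and key not in available:
            available[key] = dest
    return result


code = [
    ("t1", "*", "b", "c"),
    ("t2", "+", "a", "t1"),
    ("t3", "*", "b", "c"),    # recomputes b * c
    ("t4", "+", "t3", "d"),
]
for instr in local_cse(code):
    print(instr)
# ('t1', '*', 'b', 'c')
# ('t2', '+', 'a', 't1')
# ('t3', 'copy', 't1', None)
# ('t4', '+', 't3', 'd')
```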


    The Code Generator

    The code generator takes the IR and generates code for the target machine.

    Here the details of how various numeric and non-numeric quantities are represented become important.

    E.g., word length, hardware stack, hardware calling conventions, memory access, etc.
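A minimal code-generator sketch, assuming a hypothetical stack machine as the target (`PUSH`, `LOAD`, `STORE`, `ADD`, `SUB`, `MUL`). It emits code by a post-order walk of the expression tree: operands first, then the operator. A real code generator would also have to deal with registers, word length, and calling conventions.

```python
# Hypothetical stack-machine mnemonics for each binary operator.
OPCODE = {"+": "ADD", "-": "SUB", "*": "MUL"}


def gen_expr(node, out):
    """Append code that leaves the value of `node` on top of the stack.
    Leaves are ints (constants) or strs (variables); other nodes are
    (op, left, right) tuples."""
    if isinstance(node, int):
        out.append(f"PUSH {node}")
    elif isinstance(node, str):
        out.append(f"LOAD {node}")
    else:
        op, left, right = node
        gen_expr(left, out)                  # post-order: operands first...
        gen_expr(right, out)
        out.append(OPCODE[op])               # ...then the operator
    return out


def gen_assign(target, expr):
    """Code for `target = expr;`: evaluate expr, then store the result."""
    code = gen_expr(expr, [])
    code.append(f"STORE {target}")
    return code


print(gen_assign("b", ("+", ("*", "a", 2), 3)))
# ['LOAD a', 'PUSH 2', 'MUL', 'PUSH 3', 'ADD', 'STORE b']
```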


    The Target Code Optimizer

    The target code optimizer examines the emitted target code to see if further possibilities for optimization are present, and then capitalizes on them.

    E.g., reuse of registers, using a shift instruction to replace a multiplication or division by a power of two, etc.
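A peephole-style sketch of the shift replacement mentioned above, over a hypothetical `(op, dest, src, imm)` instruction format with made-up `MULI` and `SHLI` mnemonics. It only rewrites multiplication by a positive power of two. Division is left alone because, for signed values, an arithmetic right shift rounds toward negative infinity while integer division in C-like languages rounds toward zero (e.g., -7 / 2 is -3 but -7 >> 1 is -4).

```python
def strength_reduce(instrs):
    """Rewrite 'MULI dest, src, 2**k' as 'SHLI dest, src, k'.
    Instructions are (op, dest, src, imm) tuples in a hypothetical format."""
    out = []
    for op, dest, src, imm in instrs:
        # imm & (imm - 1) == 0 holds exactly when a positive imm is a power of two.
        if op == "MULI" and imm > 0 and (imm & (imm - 1)) == 0:
            out.append(("SHLI", dest, src, imm.bit_length() - 1))
        else:
            out.append((op, dest, src, imm))
    return out


print(strength_reduce([("MULI", "r1", "r2", 8), ("MULI", "r3", "r3", 10)]))
# [('SHLI', 'r1', 'r2', 3), ('MULI', 'r3', 'r3', 10)]
```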


    Phases of the Compiler

    Source Program -> Lexical Analyzer (Scanner) -> Tokens -> Syntax Analyzer (Parser) -> Parse Tree -> Semantic Analyzer -> Abstract Syntax Tree with attributes


    Sample Program Compiled

    Consider the example:

        int a, b;
        {
            a = 100;
            b = f(a) + 3;
        }

    Source Program -> Lexical Analyzer -> Token stream


    Sample Program Compiled

    Tokens are entities, defined by the compiler writer, that are of interest. A sequence of characters with a collective meaning is grouped to form a token.

    Examples of tokens:

    Single-character operators: = + - * >