Chapter 10: Compilers and Language Translation

  • View
    26

  • Download
    3

Embed Size (px)

DESCRIPTION

Chapter 10: Compilers and Language Translation. Invitation to Computer Science, C++ Version, Third Edition. Objectives. In this chapter, you will learn about: The compilation process Phase I: Lexical analysis Phase II: Parsing Phase III: Semantics and code generation - PowerPoint PPT Presentation

Transcript

  • Chapter 10: Compilers and Language TranslationInvitation to Computer Science,C++ Version, Third Edition

    Invitation to Computer Science, C++ Version, Third Edition

  • ObjectivesIn this chapter, you will learn about:The compilation processPhase I: Lexical analysisPhase II: ParsingPhase III: Semantics and code generationPhase IV: Code optimization

    Invitation to Computer Science, C++ Version, Third Edition

  • IntroductionHigh-level language instructions must be translated into machine language prior to executionCompilerA piece of system software that translates high-level languages into machine language

    Invitation to Computer Science, C++ Version, Third Edition

  • Introduction (continued)Goals of a compiler when performing a translationCorrectnessProducing a reasonably efficient and concise machine language code

    Invitation to Computer Science, C++ Version, Third Edition

  • Figure 10.1General Structure of a Compiler

    Invitation to Computer Science, C++ Version, Third Edition

  • The Compilation ProcessPhase I: Lexical analysisCompiler examines the individual characters in the source program and groups them into syntactical units called tokensPhase II: ParsingThe sequence of tokens formed by the scanner is checked to see whether it is syntactically correct

    Invitation to Computer Science, C++ Version, Third Edition

  • The Compilation Process (continued)Phase III: Semantic analysis and code generationThe compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actionsPhase IV: Code optimizationThe compiler takes the generated code and sees whether it can be made more efficient

    Invitation to Computer Science, C++ Version, Third Edition

  • Figure 10.2Overall Execution Sequence on a High-level Language Program

    Invitation to Computer Science, C++ Version, Third Edition

  • The Compilation Process (continued)Final stepObject program is written to an object fileSource programOriginal high-level language programObject programMachine language translation of the source program

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase I: Lexical AnalysisLexical analyzerThe program that performs lexical analysisMore commonly called a scannerJob of lexical analyzerGroup input characters into tokensTokens: syntactical units that are treated as single, indivisible entities for the purposes of translationClassify tokens according to their type

    Invitation to Computer Science, C++ Version, Third Edition

  • Figure 10.3Typical Token Classifications

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase I: Lexical Analysis (continued)Input to a scannerA high-level language statement from the source programScanners outputA list of all the tokens in that statementThe classification number of each token found

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase II: Parsing IntroductionParsing phaseA compiler determines whether the tokens recognized by the scanner are a syntactically legal statementPerformed by a parser

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase II: Parsing Introduction (continued)Output of a parserA parse tree, if such a tree existsAn error message, if a parse tree cannot be constructedSuccessful construction of a parse tree is proof that the statement is correctly formed

    (See the next slide)

    Invitation to Computer Science, C++ Version, Third Edition

  • ExampleHigh-level language statement: a = b + c

    Invitation to Computer Science, C++ Version, Third Edition

  • Grammars, Languages, and BNFSyntaxThe grammatical structure of the languageThe parser must be given the syntax of the languageBNF (Backus-Naur Form)Most widely used notation for representing the syntax of a programming language;

    Invitation to Computer Science, C++ Version, Third Edition

  • Grammars, Languages, and BNF (continued)In BNFThe syntax of a language is specified as a set of rules (also called productions)A grammarThe entire collection of rules for a languageStructure of an individual BNF ruleleft-hand side ::= definition

    Invitation to Computer Science, C++ Version, Third Edition

  • Grammars, Languages, and BNF (continued)BNF rules use two types of objects on the right-hand side of a productionTerminalsThe actual tokens of the languageNever appear on the left-hand side of a BNF ruleNonterminalsIntermediate grammatical categories used to help explain and organize the languageMust appear on the left-hand side of one or more rules

    Invitation to Computer Science, C++ Version, Third Edition

  • Grammars, Languages, and BNF (continued)Goal symbolThe highest-level nonterminalThe nonterminal object that the parser is trying to produce as it builds the parse treeAll nonterminals are written inside angle brackets

    Invitation to Computer Science, C++ Version, Third Edition

  • Parsing Concepts and TechniquesFundamental rule of parsingBy repeated applications of the rules of the grammarIf a parser can convert the sequence of input tokens into the goal symbol, then that sequence of tokens is a syntactically valid statement of the languageIf the parser cannot convert the input tokens into the goal symbol, then this is not a syntactically valid statement of the language

    Invitation to Computer Science, C++ Version, Third Edition

  • Parsing Concepts and Techniques (continued)One of the biggest problems in building a compiler is designing a grammar that:Includes every valid statement that we want to be in the languageExcludes every invalid statement that we do not want to be in the language

    Invitation to Computer Science, C++ Version, Third Edition

  • Parsing Concepts and Techniques (continued)Another problem in constructing a compiler: designing a grammar that is not ambiguousAn ambiguous grammar allows the construction of two or more distinct parse trees for the same statement

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase III: Semantics and Code GenerationSemantic analysisThe compiler makes first pass over parse tree to determine whether all branches of the tree are semantically validIf they are valid, the compiler can generate machine language instructionsIf not, there is a semantic error; machine language instructions are not generated

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase III: Semantics and Code Generation (continued)Code generationCompiler makes the second pass over the parse tree to produce the translated code

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase IV: Code OptimizationTwo types of optimizationLocal Global Local optimizationThe compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code blockRelatively easy; included as part of most compilers

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase IV: Code Optimization (continued)Examples of possible local optimizationsConstant evaluationStrength reductionEliminating unnecessary operations

    Invitation to Computer Science, C++ Version, Third Edition

  • Phase IV: Code Optimization (continued)Global optimizationThe compiler looks at large segments of the program to decide how to improve performanceMuch more difficult; usually omitted from all but the most sophisticated and expensive production-level optimizing compilersOptimization cannot make an inefficient algorithm efficient

    Invitation to Computer Science, C++ Version, Third Edition

  • SummaryA compiler is a piece of system software that translates high-level languages into machine languageGoals of a compiler: correctness, and producing efficient and concise codeSource program: high-level language program

    Invitation to Computer Science, C++ Version, Third Edition

  • SummaryObject program: the machine language translation of the source programPhases of the compilation processPhase I: Lexical analysisPhase II: ParsingPhase III: Semantic analysis and code generationPhase IV: Code optimization

    Invitation to Computer Science, C++ Version, Third Edition