Compilers and Language Translation

Embed Size (px)


Compilers and Language Translation. Gordon College. What’s a compiler?. All computers only understand machine language Therefore, high-level language instructions must be translated into machine language prior to execution. This is a program. 10000010010110100100101……. What’s a compiler?. - PowerPoint PPT Presentation

Text of Compilers and Language Translation

  • Compilers and Language TranslationGordon College

  • Whats a compiler?All computers only understand machine language

    Therefore, high-level language instructions must be translated into machine language prior to execution

    10000010010110100100101This isa program

  • Whats a compiler?Compiler A piece of system software that translates high-level languages into machine language

    10000010010110100100101Congrats!while (c!='x') { if (c == 'a' || c == 'e' || c == 'i')printf("Congrats!"); else if (c!='x') printf("You Loser!"); }

    Compilergcc -o prog program.cprogram.cprog

  • Assembler (a kind of compiler)LOADXAssembly0101 0000 0000 1001Machine Language(symbol table)(opcode table)One-to-one translation

  • Compiler (high-level language translator)a = b + c - d;One-to-many translation0101 00001110001LOAD B0111 00001110010ADD C0110 00001110011SUBTRACT D0100 00001110100STORE A0101 00001110001 0111 00001110010.

  • Goals of a compilerCode produced must be correct

    A = (B+C)-(D+E);Possible translation:LOAD BADD CSTORE BLOAD DADD ESTORE DLOAD BSUBTRACT DSTORE ANo - STORE B and STORE D changes the values of variables B and D which is the high-level language does not intendIs this correct?

  • Goals of a compilerCode produced should be reasonably efficient and conciseCompute the sum - 2x1+ 2x2+ 2x3+ 2x4+. 2x50000sum = 0.0for(i=0;i
  • General Structure of a Compiler

  • The Compilation ProcessPhase I: Lexical analysisCompiler examines the individual characters in the source program and groups them into syntactical units called tokensPhase II: ParsingThe sequence of tokens formed by the scanner is checked to see whether it is syntactically correctScannerSourcecodeGroups oftokensParserGroups oftokenscorrectnot correct

  • The Compilation ProcessPhase III: Semantic analysis and code generationThe compiler analyzes the meaning of the high-level language statement and generates the machine language instructions to carry out these actions

    Code GeneratorGroupsof tokensMachinelanguage

  • The Compilation ProcessPhase IV: Code optimizationThe compiler takes the generated code and sees whether it can be made more efficientCode OptimizerMachinelanguageMachinelanguage

  • Overall Execution Sequence on a High-Level Language Program

  • The Compilation ProcessSource programOriginal high-level language programObject programMachine language translation of the source program

  • Phase I: Lexical AnalysisLexical analyzerThe program that performs lexical analysisMore commonly called a scannerJob of lexical analyzerGroup input characters into tokensTokens: Syntactical units that are treated as single, indivisible entities for the purposes of translationClassify tokens according to their type

  • Phase I: Lexical AnalysisProgram statementsum = sum + a[i];

    Digital perspective:tab,s,u,m,blank,=,blank,s,u,m,blank,+,blank,a,[,i,],;


  • Phase I: Lexical Analysis

    TOKEN TYPECLASSIFICATION NUMBERSymbol1Number2=3+4- 5;6==7If8Else9(10)11[ 12] 13Typical Token Classifications

  • Phase I: Lexical AnalysisLexical Analysis Process 1. Discard blanks, tabs, etc. - look for beginning of token. 2. Put characters together 3. Repeat step 2 until end of token 4. Classify and save token 5. Repeat steps 1-4 until end of statement 6. Repeat steps 1-5 until end of source codeScannersum=sum+a[i];sum 1=3+4a1[12i1]13;6

  • Phase I: Lexical AnalysisInput to a scanner - A high-level language statement from the source programScanners output - A list of all the tokens in that statement - The classification number of each token foundScannersum=sum+a[i];sum 1=3+4a1[12i1]13;6

  • Phase II: Parsing Parsing phaseA compiler determines whether the tokens recognized by the scanner are a syntactically legal statementPerformed by a parser

  • Phase II: Parsing Output of a parserA parse tree, if such a tree existsAn error message, if a parse tree cannot be constructedSuccessful construction of a parse tree is proof that the statement is correctly formed

  • ExampleHigh-level language statement: a = b + c

  • Grammars, Languages, and BNFSyntaxThe grammatical structure of the languageThe parser must be given the syntax of the languageBNF (Backus-Naur Form) Most widely used notation for representing the syntax of a programming languageliteral_expression ::= integer_literal | float_literal | string | character

  • Grammars, Languages, and BNFIn BNFThe syntax of a language is specified as a set of rules (also called productions)A grammarThe entire collection of rules for a languageStructure of an individual BNF ruleleft-hand side ::= definition

  • Grammars, Languages, and BNFBNF rules use two types of objects on the right-hand side of a productionTerminalsThe actual tokens of the languageNever appear on the left-hand side of a BNF ruleNonterminalsIntermediate grammatical categories used to help explain and organize the languageMust appear on the left-hand side of one or more rules

  • Grammars, Languages, and BNFGoal symbolThe highest-level nonterminalThe nonterminal object that the parser is trying to produce as it builds the parse treeAll nonterminals are written inside angle bracketsJava BNF

  • BNF Example ::= ::= | ::= | "." ::= ::= "," ::= "Sr." | "Jr." | | ""Identify the following:Goal symbol, terminals, nonterminals, a individual rule

    Is this a legal postal address?Steve Moses Sr. 215 Rose Ave.Everywhere, NC 43563

  • Parsing Concepts and TechniquesFundamental rule of parsing:By repeated applications of the rules of the grammar-If the parser can convert the sequence of input tokens into the goal symbol the sequence of tokens is a syntactically valid statement of the language else the sequence of tokens is not a syntactically valid statement of the language

  • Parsing Example ::= http:// [ / ] [ ? ] ::= [ : ] ::= | ::= [ . ] ::= . . . ::= ::= | [ / ] ::= [ + ] ::= | | | | ::= [ ] ::= | + ::= [ ] ::= [ ] ::= a | b | | z | A | B | | Z ::= 0 |1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 ::= $ | - | _ | @ | . | & | ~ ::= ! | * | " | ' | ( | ) | : | ; | , | ::= % ::= | a | b | c | d | e | f | A | B | C | D | E | F ::= [ ] ::=Is the following http address legal:

  • Parsing Concepts and TechniquesLook-ahead parsing algorithms - intelligent parsersOne of the biggest problems in building a compiler is designing a grammar that:Includes every valid statement that we want to be in the languageExcludes every invalid statement that we do not want to be in the language

  • Parsing Concepts and TechniquesAnother problem in constructing a compiler: Designing a grammar that is not ambiguousAn ambiguous grammar allows the construction of two or more distinct parse trees for the same statement NOT GOOD - multiple interpretations

  • Phase III: Semantics and Code GenerationSemantic analysisThe compiler makes a first pass over the parse tree to determine whether all branches of the tree are semantically validIf they are valid the compiler can generate machine language instructions else there is a semantic error; machine language instructions are not generated

  • Phase III: Semantics and Code GenerationSemantic analysisSyntactically correct, but semantically incorrect example:sum = a + b;int a; double sum;data type mismatch char b;Semantic recordsaintegersumdoublebchar

  • Phase III: Semantics and Code GenerationSemantic analysis

    a integerb char

    temp ?


    Parse treeSemantic recordSemantic recordSemantic record

  • Phase III: Semantics and Code GenerationSemantic analysis

    a integerb integer

    temp integer


    Parse treeSemantic recordSemantic recordSemantic record

  • Phase III: Semantics and Code GenerationCode generationCompiler makes a second pass over the parse tree to produce the translated code

  • Phase IV: Code OptimizationTwo types of optimizationLocal Global Local optimizationThe compiler looks at a very small block of instructions and tries to determine how it can improve the efficiency of this local code blockRelatively easy; included as part of most compilers:

  • Phase IV: Code OptimizationExamples of possible local optimizationsConstant evaluation x = 1 + 1 ---> x = 2Strength reduction x = x * 2 ---> x = x + xEliminating unnecessary operations

  • Phase IV: Code OptimizationGlobal optimizationThe compiler looks at large segments of the program to decide how to improve performanceMuch more difficult; usually omitted from all but the most sophisticated and expensive production-level optimizing compilersOptimization cannot make an inefficient algorithm efficient - only makes an efficient algorithm more efficient

  • SummaryA compiler is a piece of system software that translates high-level languages into machine languageGoals of a compiler: Correctness and the production of efficient and concise codeSource program: High-level language program

  • SummaryObject program: The machine language translation of the source programPhases of the compilation processPhase I: Lexical analysisPhase II: ParsingPhase III: Semantic analysis and code generationPhase IV: Code optimization