Programming Language Processors in Java COMPILERS AND INTERPRETERS
DAVID A WATT University of Glasgow, Scotland and DERYCK F BROWN The Robert Gordon University, Scotland
An imprint of Pearson Education Harlow, England · London · New York · Reading, Massachusetts · San Francisco · Toronto · Don Mills, Ontario · Sydney · Tokyo · Singapore · Hong Kong · Seoul · Taipei · Cape Town · Madrid · Mexico City · Amsterdam · Munich · Paris · Milan
Pearson Education Limited Edinburgh Gate Harlow Essex, CM20 2JE England and Associated Companies throughout the world Visit us on the World Wide Web at: http://www.pearsoneduc.com
First published 2000
© Pearson Education Limited 2000
The rights of David A Watt and Deryck F Brown to be identified as authors of this Work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the Publishers or a licence permitting restricted copying in the United Kingdom issued by the Copyright Licensing Agency Ltd., 90 Tottenham Court Road, London W1P 0LP.
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Pearson Education Limited has made every attempt to supply trademark information about manufacturers and their products mentioned in this book. A list of the trademark designations and their owners appears on page x.
ISBN 0 130 25786 9
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library.
Library of Congress Cataloging-in-Publication Data Watt, David A. (David Anthony)
Programming language processors in Java : compilers and interpreters / David A. Watt and Deryck F. Brown
p. cm. Includes bibliographical references. ISBN 0-13-025786-9 (case) 1. Java (Computer program language) 2. Compilers (Computer programs) 3. Interpreters (Computer programs) I. Brown, Deryck F. II. Title.
Typeset by 7 Printed and bound in Great Britain by Biddles Ltd, www.biddles.co.uk
1 Introduction 1.1 Levels of programming language 1.2 Programming language processors 1.3 Specification of programming languages
1.3.1 Syntax 1.3.2 Contextual constraints 1.3.3 Semantics
1.4 Case study: the programming language Triangle 1.5 Further reading
2 Language Processors 2.1 Translators and compilers 2.2 Interpreters 2.3 Real and abstract machines 2.4 Interpretive compilers 2.5 Portable compilers 2.6 Bootstrapping 2.6.1 Bootstrapping a portable compiler 2.6.2 Full bootstrap 2.6.3 Half bootstrap 2.6.4 Bootstrapping to improve efficiency 2.7 Case study: the Triangle language processor 2.8 Further reading Exercises
3 Compilation 3.1 Phases
3.1.1 Syntactic analysis 3.1.2 Contextual analysis 3.1.3 Code generation
3.2 Passes 3.2.1 Multi-pass compilation 3.2.2 One-pass compilation 3.2.3 Compiler design issues
3.3 Case study: the Triangle compiler 3.4 Further reading
4 Syntactic Analysis 4.1 Subphases of syntactic analysis
4.1.1 Tokens 4.2 Grammars revisited
4.2.1 Regular expressions 4.2.2 Extended BNF 4.2.3 Grammar transformations 4.2.4 Starter sets
4.3 Parsing 4.3.1 The bottom-up parsing strategy 4.3.2 The top-down parsing strategy 4.3.3 Recursive-descent parsing 4.3.4 Systematic development of a recursive-descent parser
4.4 Abstract syntax trees 4.4.1 Representation 4.4.2 Construction
4.5 Scanning 4.6 Case study: syntactic analysis in the Triangle compiler
4.6.1 Scanning 4.6.2 Abstract syntax trees 4.6.3 Parsing 4.6.4 Error handling
4.7 Further reading Exercises
5 Contextual Analysis 5.1 Identification
5.1.1 Monolithic block structure 5.1.2 Flat block structure 5.1.3 Nested block structure 5.1.4 Attributes 5.1.5 Standard environment
5.2 Type checking 5.3 A contextual analysis algorithm
5.3.1 Decoration 5.3.2 Visitor classes and objects 5.3.3 Contextual analysis as a visitor object
5.4 Case study: contextual analysis in the Triangle compiler
5.4.1 Identification 5.4.2 Type checking 5.4.3 Standard environment
5.5 Further reading Exercises
6 Run-Time Organization 6.1 Data representation
6.1.1 Primitive types 6.1.2 Records 6.1.3 Disjoint unions 6.1.4 Static arrays 6.1.5 Dynamic arrays 6.1.6 Recursive types
6.2 Expression evaluation 6.3 Static storage allocation 6.4 Stack storage allocation
6.4.1 Accessing local and global variables 6.4.2 Accessing nonlocal variables
6.5 Routines 6.5.1 Routine protocols 6.5.2 Static links 6.5.3 Arguments 6.5.4 Recursion
6.6 Heap storage allocation 6.6.1 Heap management 6.6.2 Explicit storage deallocation 6.6.3 Automatic storage deallocation and garbage collection
6.7 Run-time organization for object-oriented languages 6.8 Case study: the abstract machine TAM 6.9 Further reading
7 Code Generation 7.1 Code selection
7.1.1 Code templates 7.1.2 Special-case code templates
7.2 A code generation algorithm 7.2.1 Representation of the object program 7.2.2 Systematic development of a code generator 7.2.3 Control structures
7.3 Constants and variables 7.3.1 Constant and variable declarations 7.3.2 Static storage allocation 7.3.3 Stack storage allocation
7.4 Procedures and functions 7.4.1 Global procedures and functions 7.4.2 Nested procedures and functions 7.4.3 Parameters
7.5 Case study: code generation in the Triangle compiler 7.5.1 Entity descriptions 7.5.2 Constants and variables
7.6 Further reading Exercises
8 Interpretation 8.1 Iterative interpretation
8.1.1 Iterative interpretation of machine code 8.1.2 Iterative interpretation of command languages 8.1.3 Iterative interpretation of simple programming languages
8.2 Recursive interpretation 8.3 Case study: the TAM interpreter 8.4 Further reading
9 Conclusion 9.1 The programming language life cycle
9.1.1 Design 9.1.2 Specification 9.1.3 Prototypes 9.1.4 Compilers
9.2 Error reporting 9.2.1 Compile-time error reporting 9.2.2 Run-time error reporting
9.3 Efficiency 9.3.1 Compile-time efficiency 9.3.2 Run-time efficiency
9.4 Further reading Exercises Projects with the Triangle language processor
A Answers to Selected Exercises Answers 1 Answers 2 Answers 3 Answers 4 Answers 5 Answers 6
Answers 7 Answers 8 Answers 9
B Informal Specification of the Programming Language Triangle B.1 Introduction B.2 Commands B.3 Expressions B.4 Value-or-variable names B.5 Declarations B.6 Parameters B.7 Type-denoters B.8 Lexicon B.9 Programs
C Description of the Abstract Machine TAM C.1 Storage and registers C.2 Instructions C.3 Routines
D Class Diagrams for the Triangle Compiler D.1 Compiler D.2 Abstract syntax trees
D.2.1 Commands D.2.2 Expressions D.2.3 Value-or-variable names D.2.4 Declarations D.2.5 Parameters D.2.6 Type-denoters D.2.7 Terminals
D.3 Syntactic analyzer D.4 Contextual analyzer D.5 Code generator
The subject of this book is the implementation of programming languages. Programming language processors are programs that process other programs. The primary examples of language processors are compilers and interpreters.
Programming languages are of central importance in computer science. They are the most fundamental tools of software engineers, who are completely dependent on the quality of the language processors they use. There is an interplay between the design of programming languages and computer instruction sets: compilers must bridge the gap between high-level languages and machine code. And programming language design itself raises strong feelings among computer scientists, as witnessed by the proliferation of language paradigms. Imperative and object-oriented languages are currently dominant in terms of actual usage, and it is on the implementation of such languages that this book focuses.
Programming language implementation is a particularly fascinating topic, in our view, because of its close interplay between theory and practice. Ever since the dawn of computer science, the engineering of language processors has driven, and has been vastly improved by, the development of relevant theories.
Nowadays, the principles of programming language implementation are very well understood. An experienced compiler writer can implement a simple programming language about as fast as he or she can type. The basic techniques are simple yet effective, and can be lucidly presented to students. Once the techniques have been mastered, building a compiler from scratch is essentially an exercise in software engineering.
A textbook example of a compiler is often the first complete program of its size seen by computer science students. Such an example should therefore be an exemplar of good software engineering principles. Regrettably, many compiler textbooks offend these principles. This textbook, based on a total of about twenty-five years' experience of teaching programming language implementation, aims to exemplify good software engineering principles at the same time as explaining the specific techniques needed to build compilers and interpreters.
The book shows how to design and build simple compilers and interpreters using the object-o