View
214
Download
0
Embed Size (px)
Citation preview
Michael Eckmann - Skidmore College - CS 330 - Fall 2004
Today’s Topics• Continue the discussion of Language Evaluation• von Neumann architecture / Imperitive languages• The major language groups• Language design tradeoffs• Compilation vs. Interpretation vs. Hybrids• Syntax & Semantics
Language Evaluation• Writability
– Orthogonality• if small set and all combinations make sense +
• if large set and all combinations exist but some don't make sense to ever use or would be rarely used – (could cause undetected errors)
– Abstraction (functional (methods/functions) & data (classes))• don't need to know/be reminded of/replicate the implementation
details, just need to use what has already been written/tested
– Expressivity (powerful operators or many ways to do the same things e.g. count++, for vs. while ...)
• Reliability
– (+)Type checking (at run-time vs. compile-time vs. neither)• when is better?
– (+)Exception Handling
– (-)Aliasing (why?)Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Language Evaluation• Cost (time == $)
– training programmers in the language– writing --- IDEs– compiling, executing --- compiler optimization
tradeoff– Implementation system (e.g. The compiler and/or
interpreter available for free?)– Poor reliability– Maintenance (up to 4 times cost of developing)
• Can you think of others?Michael Eckmann - Skidmore College - CS 330 - Fall 2007
A few Imperative Languages
• C, Pascal and many others
• C++ is imperative-based but it is OO
• Java is also imperative-based but it is OO
von Neumann bottleneck
• transmission/piping between memory and CPU takes longer than executing instructions within the CPU.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Imperative Language Features• Imperative languages are built FOR the von
Neumann architecture – Data and programs stored in same memory– Memory is separate from CPU– Instructions and data are transmitted to CPU from
memory, results are transmitted back to memory
• Variables model memory cells
• Assignment statements model transmission
• Iteration is efficient (b/c instructions are in adjacent memory cells) – recursion is inefficient (b/c save state of each call).
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Functional vs. Imperative Languages
• Functional (e.g. Lisp, Scheme, et al.)– Apply functions to parameters– different use of variables– No assignment statements– No iteration
– Many programmers feel that there are extreme benefits to computing using functional languages. So, why aren't they used more?
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Functional vs. Imperative Languages
• Because of von Neumann, that's why!
• What might be a drawback to using a functional language on a von Neumann machine?
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Imperative Languages
• Von Neumann => imperative languages
• But many programmers do not realize this and think that imperative languages are the natural way to program.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
The major Language Groups• Imperative
– And visual langs (drag/drop code)
• Functional• Logic
– Rule based, no order of execution
• Object-Oriented– provide support for
• abstract data types
• inheritance
• dynamic binding of method calls to methodsMichael Eckmann - Skidmore College - CS 330 - Fall 2007
Some Language Design Trade-offs
• Reliability vs. cost of execution– e.g. Run-time array range checking --- Java vs. C
• Readability vs. writability (expressivity)– Compactness
• Flexibility vs. safety– e.g. Multiple types allowed in same memory location
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Let's think about some stuff
• Any other language evaluation criteria you can think of?
• What are the pros and cons of case sensitivity in user defined names?
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Compilation, Interpretation, Hybrids
• What is a compiler?
• What is an interpreter?
• What is a hybrid of these?– Java, Perl– Compiler generates intermediate code that is then
interpreted when executed– Any advantages/disadvantages?
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Compilation Process• Lexical analyzer – groups the code into variable names,
reserved words, punctuation, etc.
• Syntax analyzer – constructs parse trees to determine if the lexical units in order are syntactically correct
• Both create/update items in the symbol table to refer to these units
• Intermediate code generator (and semantic analyzer) generates assembly-like code from the symbol table and parse trees.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Compilation Process
• Optimization may occur at this stage• What might the compiler try to optimize?
• Code generator takes the intermediate code and the symbol table to create the machine code for the particular architecture on which the code is compiled.
• Before execution, the machine code must be linked to any necessary programmed libraries and/or systems programs.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Execution Process
• Fetch-execute cycle on a von-Neumann machine– Uses a program counter to know which instruction is
next to be fetched.
while (true)
{
Fetch instruction,
increment program counter,
decode instruction,
execute instruction
}Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Now Chapter 3 ...
• Describing a language– How to give it a clear and precise definition so that
implementers (compiler writers) will get it right– How to describe it to users (programmers) of the
language
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax and Semantics
• Describing a language involves both it's syntax and semantics
• Syntax is the form, semantics is the meaning– e.g. English language example:
• Time flies like an arrow.– Syntactically correct but has 3 different meanings (semantics)
• Easier to describe syntax formally than it is to describe semantics formally
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax
• A language is a set of strings (or sentences or statements) of characters from an alphabet.
• Lexemes vs. tokens– tokens (more general) are categories of lexemes
(more specific)– e.g. Some tokens might be: identifier, int_literal,
plus_op– e.g. Some lexemes might be: idx, 42, +
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax
Lexemesidx
=
42
+
Count
;
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Tokensidentifier
equal_sign
int_literal
plus_op
identifier
semicolon
idx = 42 + count;
Syntax• Recognizers and generators are used to define languages.
• Generators generate valid programs in a language.
• Recognizers determine whether or not a program is in the language (valid syntactically.)
• Generators are studied in Chapter 3 (stuff coming up next in this lecture) and recognizers (parsers) in Chapter 4.
• How many valid programs are there for some particular language, say Java?
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• Context Free Grammars (CFGs) developed by Noam
Chomsky are essentially equal to Backus-Naur Form (BNF) by Backus and Naur.
• They are used to describe syntax.
• These are metalanguages (languages used to describe languages.)
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• A Context Free Grammar is a four-tuple
(T, N, S, P) where– T is the set of terminal symbols– N is the set of non-terminal symbols– S is the start symbol (which is one of the non-
terminals)– P is the set of productions of the form:
• A -> X1 . . . XM where– A element of N
– Xi element of N U T, 1 <= i <= m, m>=0
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax
• How are CFGs used to describe the syntax of a programming language?– The nonterminals are abstractions– The terminals are tokens and lexemes– The productions are used to describe programs,
individual statements, expressions etc.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• Example production:
<while_stmt> while ( <logic_expr> ) <stmt>
• Everything to the left of the arrow is considered the left-hand side, LHS, and to the right the RHS.
• The only thing that can appear on the LHS is one nonterminal.
• Multiple RHS's for a LHS are separated by the | or symbol, e.g.
<compound_stmt> <single_stmt> ;
| { <stmt_list> }
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• Recursion is allowed in productions, e.g.
<ident_list> ident
| ident, <ident_list>
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• An example grammar:
<program> <stmts> <stmts> <stmt> | <stmt> ; <stmts> <stmt> <var> = <expr> <var> a | b | c | d <expr> <term> + <term>
| <term> - <term> <term> <var> | const
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax• Derivations are repeated applications of production rules.
• An example derivation:
<program> => <stmts> => <stmt> => <var> = <expr> => a = <expr> => a = <term> + <term> => a = <var> + <term> => a = b + <term> => a = b + const
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax
• Every string of symbols in the derivation is a sentential form
• A sentence is a sentential form that has only terminal symbols
• A leftmost derivation is one in which the leftmost nonterminal in each sentential form is the one that is expanded in each step of the derivation.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Syntax / Parse Trees
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
• A hierarchical representation of a derivation (parse trees also hold some semantic information)
<program>
<stmts>
<stmt>
const
a
<var> = <expr>
<var>
b
<term> + <term>
An Ambiguous Expression Grammar
<expr> <expr> <op> <expr> | const
<op> / | -
<expr>
<expr> <expr>
<expr> <expr>
<expr>
<expr> <expr>
<expr> <expr>
<op>
<op>
<op>
<op>
const const const const const const- -/ /
<op>
This one is now unambiguous• Ambiguity is bad for compilers, so the language
description should be unambiguous.
<expr> <expr> - <term> | <term>
<term> <term> / const | const
• Compiler examines parse tree to determine the code to generate. Two parse trees for the same syntax causes the meaning (semantics) of the code to not be unique.
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Ambiguous?• Look at the if statement rules below
<if_stmt> if <logic_expr> then <stmt> | if <logic_expr> then <stmt> else <stmt>
<stmt> <if_stmt> | ...
• Do you think this is ambiguous? That is, can more than one parse tree be generated from the same code?
• if (a==b) then if (c==d) then print_something() else print_something_else()
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
Ambiguous?if (a==b) then if (c==d) then print_something() else
print_something_else()
if (a==b) then
if (c==d) then
print_something()
else
print_something_else()
Michael Eckmann - Skidmore College - CS 330 - Fall 2007
if (a==b) then
if (c==d) then
print_something()
else
print_something_else()