Introduction to Language Processing Technology
Natawut Nupairoj, Ph.D.
Department of Computer Engineering
Chulalongkorn University
Outline
• Level of Programming Languages
• Language Processors
• Specification of Programming Languages
Level of Programming Languages
• High level: C / Java / Pascal
• Low level: Assembly / Bytecode
• Machine language

High-level program (C):
    void swap(int v[], int k)
    {
        int temp;
        temp = v[k];
        v[k] = v[k+1];
        v[k+1] = temp;
    }

The C compiler translates it into assembly:
    swap:
        muli $2, $5, 4
        add  $2, $4, $2
        lw   $15, 0($2)
        ...

The assembler translates the assembly into machine language:
    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    000010001101101100110000
    ...
High-Level Language Characteristics
• Expressions: a = b + (c - d)/2;
• Data types: integer, character, boolean; record, array.
• Control structures: selective, iterative.
• Declarations: an identifier can denote a constant, variable, procedure, function, or type.
• Abstraction (object-oriented concept): concern only with what, not how.
• Encapsulation (object-oriented concept): information hiding.
Language Processors
A language processor is a program that manipulates programs expressed in some programming language.
Examples:
• Editor.
• Translator / Compiler.
• Interpreter.
Translator
Translates a “source” program into an “equivalent” “object” program.
[Figure: source program → Translator → object program, with error messages as a second output.]
Source languages: C, C++, FORTRAN, Java, VB.
Object languages: Assembly, C, Bytecode, p-code.
Tombstone Diagrams
Ordinary program: program P written in language L.
[Figure: tombstone with the program name P on top and its language L below. Examples: Sort written in Java; Sort in x86 code; Web Browser in x86 code.]
Tombstone Diagrams
Machine: machine M.
[Figure: machine tombstone labeled M. Examples: x86 and SPARC machines; Web Browser in x86 code running on an x86 machine.]
Tombstone Diagrams
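Translator: S is translated to T; the translator is written in language L.
[Figure: T-shaped tombstone with S → T across the top and L at the base. Examples: Java → x86 written in C; Java → x86 written in x86 code; Java → C written in C++.]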
Tombstone Diagrams
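Compilation
[Figure: a C → x86 compiler in x86 code runs on an x86 machine and translates Sort written in C into Sort in x86 code; the resulting Sort then runs on an x86 machine.]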
Tombstone Diagrams
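Cross Compilation
[Figure: a C → SPARC compiler in x86 code runs on an x86 machine and translates Sort written in C into Sort in SPARC code; the resulting Sort runs on a SPARC machine, not on the x86 machine that compiled it.]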
Tombstone Diagrams
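Two-stage compilation
[Figure: on an x86 machine, a Java → C translator in x86 code turns Sort written in Java into Sort written in C; a C → x86 compiler in x86 code then turns that into Sort in x86 code.]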
Tombstone Diagrams
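Compiling a compiler
[Figure: a C → x86 compiler in x86 code runs on an x86 machine and translates a Pascal → x86 compiler written in C into a Pascal → x86 compiler in x86 code.]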
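The compilation diagrams above all follow one rule: a translator can run only on a machine whose code it is expressed in, and it accepts only subjects expressed in its source language. The following is a minimal Python sketch of that rule. The names Program, Translator, and run_translator are illustrative assumptions and are not part of the tombstone notation itself.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Program:
    """Ordinary program tombstone: `name` expressed in language `lang`."""
    name: str
    lang: str


@dataclass(frozen=True)
class Translator:
    """Translator tombstone: `source` -> `target`, expressed in `lang`."""
    source: str
    target: str
    lang: str


def run_translator(translator, machine, subject):
    """Run `translator` on `machine` (a machine-code name) over `subject`.

    Returns the subject re-expressed in the translator's target language,
    or raises ValueError when the tombstones do not fit together.
    """
    if translator.lang != machine:
        raise ValueError(f"translator is in {translator.lang} code, not {machine}")
    if subject.lang != translator.source:
        raise ValueError(f"translator accepts {translator.source}, not {subject.lang}")
    # Only the language the subject is expressed in changes.
    return replace(subject, lang=translator.target)


c_to_x86 = Translator("C", "x86", "x86")
c_to_sparc = Translator("C", "SPARC", "x86")

# Compilation: Sort in C becomes Sort in x86 code.
sort_x86 = run_translator(c_to_x86, "x86", Program("Sort", "C"))

# Cross compilation: compiled on x86, runnable only on SPARC.
sort_sparc = run_translator(c_to_sparc, "x86", Program("Sort", "C"))

# Compiling a compiler: the subject is itself a translator.
pascal_x86 = run_translator(c_to_x86, "x86", Translator("Pascal", "x86", "C"))
```

Because a compiler written in C is just another subject expressed in C, the same function covers both ordinary compilation and compiling a compiler.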
Tombstone Diagrams
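Interpreter: interprets source language S; the interpreter is written in language L.
[Figure: interpreter tombstone with S on top and L below. Examples: a Basic interpreter in x86 code; an SQL interpreter in SPARC code; Sort written in Basic, run by the Basic interpreter on an x86 machine.]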
Tombstone Diagrams
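An abstract machine is a hardware emulator: an interpreter for a low-level language.
[Figure: a C → x86 compiler on an x86 machine compiles a 370 interpreter written in C into a 370 interpreter in x86 code. That interpreter running on an x86 machine is equivalent to a 370 machine, so the 370 program HW1 can run on either.]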
Tombstone Diagrams
Java
• Portable environment: write once, run anywhere.
• Interpretive compiler.
• JVM code = bytecode.
[Figure: a Java → JVM compiler and a JVM interpreter, both expressed in the code of machine M.]
Tombstone Diagrams
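[Figure: a Java → JVM compiler in x86 code runs on an x86 machine and translates Sort written in Java into Sort in JVM code. The same Sort in JVM code then runs on a JVM interpreter in x86 code on an x86 machine, or on a JVM interpreter in SPARC code on a SPARC machine.]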
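The interpretive-compiler arrangement above can be sketched in a few lines of Python: a compiler translates a source expression into code for a small stack machine, and a separate interpreter executes that code. The instruction names PUSH, ADD, and MUL and the nested-tuple source form are assumptions made for this sketch; they are not real JVM bytecode.

```python
def compile_expr(expr):
    """Translate a nested-tuple expression into stack-machine instructions.

    An expression is either an int or a tuple (op, left, right)
    with op being "+" or "*".
    """
    if isinstance(expr, int):
        return [("PUSH", expr)]
    op, left, right = expr
    opcode = {"+": "ADD", "*": "MUL"}[op]
    # Operands first, operator last: postfix order suits a stack machine.
    return compile_expr(left) + compile_expr(right) + [(opcode, None)]


def interpret(code):
    """Execute stack-machine instructions and return the final value."""
    stack = []
    for opcode, arg in code:
        if opcode == "PUSH":
            stack.append(arg)
            continue
        right = stack.pop()
        left = stack.pop()
        if opcode == "ADD":
            stack.append(left + right)
        elif opcode == "MUL":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction {opcode}")
    return stack.pop()


# (2 + 10) * 3 compiled once, then run by the interpreter.
code = compile_expr(("*", ("+", 2, 10), 3))
result = interpret(code)  # 36
```

The compiled code is independent of the host machine; only interpret would need to be rewritten for each machine, which is the portability argument made on the slide.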
Tombstone Diagrams
Bootstrapping: a compiler for language L that is itself written in L.
• Full bootstrap: start from nothing.
• Half bootstrap: start from another machine.
[Figure: a C → NNP compiler expressed in NNP code.]
Tombstone Diagrams
Full Bootstrap
[Figure: compilation steps on the NNP machine. Csub → NNP compilers appear written in NNP code and in Csub; a C → NNP compiler written in Csub is compiled into a C → NNP compiler in NNP code.]
Tombstone Diagrams
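[Figure: on the NNP machine, the C → NNP compiler in NNP code compiles the C → NNP compiler written in C, producing a C → NNP compiler in NNP code.]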
Tombstone Diagrams
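[Figure: the full bootstrap chain on the NNP machine, combining the previous steps: Csub → NNP compilers, then the C → NNP compiler written in Csub, then the C → NNP compiler written in C, ending with a C → NNP compiler in NNP code.]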
Tombstone Diagrams
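Half Bootstrap
[Figure: on an x86 machine, the existing C → x86 compiler compiles the C → NNP compiler written in C into a C → NNP compiler in x86 code. Running that compiler on the x86 machine over the same C source yields a C → NNP compiler in NNP code.]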
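The two bootstrap strategies can be traced step by step with a small Python sketch. The Translator record and compile_translator helper are hypothetical names introduced here, and the sketch assumes that the first Csub → NNP compiler in a full bootstrap is written by hand in NNP code, since the slide says only that a full bootstrap starts from nothing.

```python
from collections import namedtuple

# Hypothetical record for a translator tombstone: source -> target, written in lang.
Translator = namedtuple("Translator", ["source", "target", "lang"])


def compile_translator(compiler, machine, subject):
    """Run `compiler` on `machine` to compile the translator `subject`."""
    if compiler.lang != machine:
        raise ValueError(f"compiler is in {compiler.lang} code, cannot run on {machine}")
    if subject.lang != compiler.source:
        raise ValueError(f"compiler accepts {compiler.source}, not {subject.lang}")
    # The subject still translates source -> target; only its own language changes.
    return subject._replace(lang=compiler.target)


c_in_c = Translator("C", "NNP", "C")            # C -> NNP compiler written in C

# Full bootstrap: only the NNP machine exists.
csub_v1 = Translator("Csub", "NNP", "NNP")      # assumption: hand-written in NNP code
c_in_csub = Translator("C", "NNP", "Csub")      # C -> NNP compiler written in Csub
c_v1 = compile_translator(csub_v1, "NNP", c_in_csub)
c_v2 = compile_translator(c_v1, "NNP", c_in_c)  # the compiler now compiles itself

# Half bootstrap: an x86 machine with a C compiler already exists.
host = Translator("C", "x86", "x86")
cross = compile_translator(host, "x86", c_in_c)     # C -> NNP, runs on x86
native = compile_translator(cross, "x86", c_in_c)   # C -> NNP, runs on NNP
```

In both cases the final compiler is a C → NNP compiler in NNP code; the half bootstrap reaches it without ever writing NNP code by hand.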
Specification of Programming Languages
• Syntax: defines the symbols and structure of the language (the grammar).
• Contextual constraints: constraints beyond the grammar; rules of the language such as scope rules and type rules.
• Semantics: the meaning of a program, that is, its behavior when run; how to translate a sentence S of the language L into machine code on M.
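A contextual constraint is a rule that the grammar alone cannot express. As a rough illustration, the Python sketch below checks one assignment against two such rules: every identifier must be declared, and the target of an assignment must not be a constant. The function name, the dictionary representation of declarations, and the second rule are assumptions made for this example rather than rules stated on the slide.

```python
def check_assignment(declarations, target, used_identifiers):
    """Return a list of contextual errors for `target := <expression>`.

    `declarations` maps each declared identifier to "const" or "var".
    `used_identifiers` lists the identifiers appearing in the expression.
    """
    errors = []
    # Scope rule: each identifier must be declared (report each name once).
    for name in dict.fromkeys([target, *used_identifiers]):
        if name not in declarations:
            errors.append(f"{name} is not declared")
    # Assumed rule: a constant cannot be the target of an assignment.
    if declarations.get(target) == "const":
        errors.append(f"{target} is a constant and cannot be assigned")
    return errors


declared = {"m": "const", "n": "var"}
ok = check_assignment(declared, "n", ["m"])          # no errors
bad = check_assignment(declared, "m", ["n", "k"])    # two errors
```

Both assignments are syntactically well formed; only the second violates the constraints, which is why such checks sit between syntax and semantics.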
Syntax
Context-free grammar:
• Terminals.
• Non-terminals / variables.
• Start symbol.
• Production rules.
Usually expressed in BNF notation.
BNF Notation
Backus-Naur Form.
Given the production rules:
N → α
N → β
they can be written as:
N ::= α | β
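The grouping step that BNF performs, collecting every production with the same left-hand side into one rule with alternatives, can be shown with a short Python sketch. The function name to_bnf and the pair-of-lists representation of productions are assumptions made for illustration.

```python
def to_bnf(productions):
    """Group (lhs, rhs_symbols) productions into BNF rule strings.

    Productions sharing a left-hand side become alternatives
    separated by "|", in their original order.
    """
    grouped = {}
    for lhs, rhs in productions:
        grouped.setdefault(lhs, []).append(" ".join(rhs))
    return [f"{lhs} ::= {' | '.join(alts)}" for lhs, alts in grouped.items()]


rules = to_bnf([
    ("Operator", ["+"]),
    ("Operator", ["-"]),
    ("Declaration", ["single-Declaration"]),
    ("Declaration", ["Declaration", ";", "single-Declaration"]),
])
for rule in rules:
    print(rule)
```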
Example: Mini-Triangle Program
! This is a comment. It continues to the end-of-line.
let
    const m ~ 7;
    var n: Integer
in
    begin
        n := 2 * m * m;
        putint(n)
    end
Terminals: begin const do else end if in let then var while ; : := ~ ( ) + - * / < > = \
Mini-Triangle Syntax
Program ::= Command
Command ::= single-Command
| Command ; single-Command
single-Command ::= V-name := Expression
| Identifier ( Expression )
| if Expression then single-Command
else single-Command
| while Expression do single-Command
| let Declaration in single-Command
| begin Command end
Mini-Triangle Syntax
Expression ::= primary-Expression
| Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
| V-name
| Operator primary-Expression
| ( Expression )
V-name ::= Identifier
Declaration ::= single-Declaration
| Declaration ; single-Declaration
single-Declaration ::= const Identifier ~ Expression
| var Identifier : Type-denoter
Mini-Triangle Syntax
Type-denoter ::= Identifier
Operator ::= + | - | * | / | < | > | = | \
Identifier ::= Letter | Identifier Letter
| Identifier Digit
Integer-Literal ::= Digit | Integer-Literal Digit
Comment ::= ! Graphic* eol
Letter ::= a | b | … | z
Digit ::= 0 | 1 | 2 | … | 9
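The lexical rules above (Identifier, Integer-Literal, Comment, and the terminal symbols) are enough to sketch a scanner. The Python version below is one possible approach using regular expressions. The token kind names are assumptions, and it accepts upper-case letters as well, since the example program uses Integer even though Letter is listed only as a to z.

```python
import re

KEYWORDS = {"begin", "const", "do", "else", "end", "if",
            "in", "let", "then", "var", "while"}

TOKEN_RE = re.compile(r"""
    (?P<comment>![^\n]*)              # ! up to the end of the line
  | (?P<space>\s+)
  | (?P<int>[0-9]+)                   # Integer-Literal
  | (?P<ident>[A-Za-z][A-Za-z0-9]*)   # Identifier (or keyword)
  | (?P<symbol>:=|[;:~()+\-*/<>=\\])  # := must be tried before :
""", re.VERBOSE)


def scan(source):
    """Split Mini-Triangle source text into (kind, text) tokens."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        pos = match.end()
        kind, text = match.lastgroup, match.group()
        if kind in ("comment", "space"):
            continue  # comments and blanks separate tokens but are not tokens
        if kind == "ident" and text in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
    return tokens


tokens = scan("! double it\nn := 2 * m")
# [('ident', 'n'), ('symbol', ':='), ('int', '2'), ('symbol', '*'), ('ident', 'm')]
```

Keywords are recognized after the fact by looking identifiers up in a set, which keeps the pattern short and avoids treating a name such as letter as the keyword let.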
Syntax Tree
A syntax tree is an ordered tree in which:
• Internal nodes are non-terminals.
• Leaf nodes are terminals.
An N-tree of G is a syntax tree with N as the root.
Mini-Triangle Syntax Tree
Expression ::= primary-Expression
 | Expression Operator primary-Expression
primary-Expression ::= Integer-Literal
 | V-name
 | Operator primary-Expression
 | ( Expression )
V-name ::= Identifier
…
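Syntax tree of the expression d + 10 * n:

Expression
├─ Expression
│  ├─ Expression
│  │  └─ primary-Expression
│  │     └─ V-name
│  │        └─ Identifier: d
│  ├─ Operator: +
│  └─ primary-Expression
│     └─ Integer-Literal: 10
├─ Operator: *
└─ primary-Expression
   └─ V-name
      └─ Identifier: n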
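The shape of this tree follows directly from the left-recursive Expression rule: operators have no precedence and group to the left, so d + 10 * n is read as (d + 10) * n. A small Python parser sketch makes this visible. It takes a list of token strings, replaces the left recursion with a loop, and builds a condensed tree of nested tuples; the node labels and function names are assumptions, and keywords are not distinguished from identifiers here.

```python
OPERATORS = {"+", "-", "*", "/", "<", ">", "=", "\\"}


def parse_expression(tokens):
    """Parse a complete Expression from a list of token strings."""
    tree, rest = _expression(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected token {rest[0]!r}")
    return tree


def _expression(tokens):
    # Expression ::= primary-Expression | Expression Operator primary-Expression
    left, tokens = _primary(tokens)
    while tokens and tokens[0] in OPERATORS:
        op = tokens[0]
        right, tokens = _primary(tokens[1:])
        left = ("binary", left, op, right)  # the tree grows down the left side
    return left, tokens


def _primary(tokens):
    if not tokens:
        raise SyntaxError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if head.isdigit():
        return ("int", head), rest
    if head.isalnum() and head[0].isalpha():
        return ("vname", head), rest
    if head in OPERATORS:                   # Operator primary-Expression
        operand, rest = _primary(rest)
        return ("unary", head, operand), rest
    if head == "(":                         # ( Expression )
        inner, rest = _expression(rest)
        if not rest or rest[0] != ")":
            raise SyntaxError("expected ')'")
        return inner, rest[1:]
    raise SyntaxError(f"unexpected token {head!r}")


tree = parse_expression(["d", "+", "10", "*", "n"])
# ('binary', ('binary', ('vname', 'd'), '+', ('int', '10')), '*', ('vname', 'n'))
```

Writing d + (10 * n) with explicit parentheses is the only way to obtain the other grouping under this grammar.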