
Page 1: Theory of Compilation 236360
Erez Petrank
Lecture 9: Runtime Part II

Page 2: Runtime Environment

• Mediates between the OS and the programming language.
• Hides details of the machine from the programmer:
  – ranges from simple support functions all the way to a full-fledged virtual machine.
• Handles common tasks:
  – runtime stack (activation records)
  – memory management
  – JIT (dynamic optimization)
  – debugging
  – …

Page 3: Dynamic Memory Management: Introduction

There is a dedicated course on this topic: 236780, “Algorithms for Dynamic Memory Management”.

Page 4: Static and Dynamic Variables

• Static variables are defined in a method and allocated on the runtime stack; their space is reclaimed when the method returns.
• Dynamic allocation:
  – In C, “malloc” allocates a block of memory, and “free” declares that the program will not use this block anymore.

Ptr = malloc(256);   /* allocate 256 bytes */
/* ... use Ptr ... */
free(Ptr);

Page 5: Dynamic Memory Allocation

• In Java, “new” allocates an object of a given class.
  – President obama = new President();
• But there is no instruction for manually deleting the object.
• Objects are reclaimed by a garbage collector when the program “does not need them” anymore.

course c = new course(236360);
c.classroom = “TAUB 2”;
Faculty.add(c);

Page 6: Manual vs. Automatic Memory Management

• Manual memory management lets the programmer decide when objects are deleted.
• A memory manager that lets a garbage collector delete objects is called automatic.
• Manual memory management creates severe debugging problems:
  – memory leaks,
  – dangling pointers.
• These were considered THE big debugging problems of the ’80s.

Page 7: Automatic Memory Reclamation

• An object is reclaimed when the program has “no way of accessing it”.
• Formally: when it is unreachable by a path of pointers from the “root” pointers, to which the program has direct access.
  – Local variables, pointers on the stack, global (class) pointers, JNI pointers, etc.

Page 8: What’s good about automatic “garbage collection”?

• Software engineering:
  – relieves users of the book-keeping burden;
  – stronger reliability, fewer bugs, faster debugging;
  – code is understandable and reliable (less interaction between modules).
• Security (Java):
  – the program never gets a pointer to “play with”.

Page 9: Importance

• Memory is the bottleneck in modern computation.
  – Time & energy (and space).
• Optimal allocation is NP-complete even to approximate, even if all accesses are known in advance to the allocator.
• Must be done right for a program to run efficiently.
• Must be done right to ensure reliability.

Page 10: GC and languages

• Sometimes it’s built in:
  – LISP, Java, C#.
  – The user cannot free an object.
• Sometimes it’s an added feature:
  – C, C++.
  – The user can choose whether to free objects; the collector frees all objects not freed by the user.
• Most modern languages are supported by garbage collection.

Page 11: Most modern languages rely on GC

Source: “The Garbage Collection Handbook” by Richard Jones, Antony Hosking, and Eliot Moss. (Chart of language counts omitted.)

Page 12: What’s bad about automatic “garbage collection”?

• It has a cost:
  – old LISP systems: 40%;
  – today’s Java programs (if the collection is done “right”): 5-15%.
• Considered a major factor determining program efficiency.
• Techniques have evolved since the ’60s. We will only go over the basic techniques.

Page 13: Garbage Collection Efficiency

• Overall collection overhead (program throughput).
• Pauses in the program run.
• Space overhead.
• Cache locality (efficiency and energy).

Page 14: Three classical algorithms

• Reference counting
• Mark and sweep (and mark-compact)
• Copying

• The last two are also called tracing algorithms because they go over (trace) all reachable objects.

Page 15: Reference counting

• Recall that we would like to know whether an object is reachable from the roots.
• Associate a reference-count field with each object: how many pointers reference this object.
• When nothing points to an object, it can be deleted.
• Very simple; used in many systems.

Page 16: Basic Reference Counting

• Each object has an RC field; a new object gets o.RC := 1.
• When p, which points to o1, is modified to point to o2, we execute: o1.RC--, o2.RC++.
• If now o1.RC == 0:
  – delete o1;
  – decrement o.RC for every “child” o of o1;
  – recursively delete objects whose RC is decremented to 0.

(Figure: p is redirected from o1 to o2.)
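The update rule above can be sketched in C; the two-child object layout and the helper names (rc_assign, rc_release) are illustrative assumptions, not part of the lecture:

#include <stdlib.h>

typedef struct Object Object;
struct Object {
    int rc;                  /* reference count */
    Object *children[2];     /* outgoing pointers (fixed arity for simplicity) */
};

/* Decrement o's count and reclaim it, recursively, when it drops to 0. */
static void rc_release(Object *o) {
    if (o == NULL) return;
    if (--o->rc == 0) {
        for (int i = 0; i < 2; i++)
            rc_release(o->children[i]);   /* decrement RC of all children */
        free(o);
    }
}

/* Redirect a pointer slot *p (the "p" of the slide) from o1 to o2. */
static void rc_assign(Object **p, Object *o2) {
    if (o2 != NULL) o2->rc++;    /* o2.RC++ */
    Object *o1 = *p;
    *p = o2;
    rc_release(o1);              /* o1.RC--, recursive delete at 0 */
}

Note the order: the new referent is incremented before the old one is released, so assigning a pointer to itself does not transiently free a live object.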

Page 17: A Problem: Cycles

• The reference counting algorithm does not reclaim cycles!
• Solution 1: ignore cycles; they do not appear frequently in modern programs.
• Solution 2: run a tracing algorithm (which can reclaim cycles) infrequently.
• Solution 3: designated algorithms for cycle collection.

• Another problem for the naïve algorithm: it requires a lot of synchronization in parallel programs.
• Advanced versions solve that.

Page 18: The Mark-and-Sweep Algorithm

• Mark phase:
  – start from the roots and traverse all objects reachable by a path of pointers;
  – mark all traversed objects.
• Sweep phase:
  – go over all objects in the heap;
  – reclaim objects that are not marked.

Page 19: The Mark-Sweep algorithm

• Traverse live objects & mark them black.
• White objects can be reclaimed.

(Figure: the roots (stack and registers) point into the heap. Note! This is not the heap data structure!)

Page 20: Triggering

New(A) =
  if free_list is empty
    mark_sweep()
    if free_list is empty
      return (“out-of-memory”)
  pointer = allocate(A)
  return (pointer)

Garbage collection is triggered by allocation.

Page 21: Basic Algorithm

mark_sweep() =
  for Ptr in Roots
    mark(Ptr)
  sweep()

mark(Obj) =
  if mark_bit(Obj) == unmarked
    mark_bit(Obj) = marked
    for C in Children(Obj)
      mark(C)

sweep() =
  p = Heap_bottom
  while (p < Heap_top)
    if (mark_bit(p) == unmarked) then free(p)
    else mark_bit(p) = unmarked
    p = p + size(p)
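The recursive mark above can overflow the machine stack on deep object graphs. A common engineering trick (the “stack for DFS” in the comparison slide later) is an explicit mark stack; here is a minimal iterative sketch under an assumed two-child object layout:

#include <assert.h>
#include <stddef.h>

typedef struct Object Object;
struct Object {
    int marked;              /* the mark bit */
    Object *children[2];     /* outgoing pointers (fixed arity for simplicity) */
};

#define MARK_STACK_MAX 4096  /* illustrative capacity; real collectors handle overflow */

static void mark_iterative(Object **roots, size_t nroots) {
    Object *stack[MARK_STACK_MAX];
    size_t top = 0;

    for (size_t i = 0; i < nroots; i++)
        if (roots[i] != NULL) {
            assert(top < MARK_STACK_MAX);
            stack[top++] = roots[i];
        }

    while (top > 0) {
        Object *obj = stack[--top];
        if (obj->marked)
            continue;                    /* each object is traversed once */
        obj->marked = 1;
        for (int i = 0; i < 2; i++)
            if (obj->children[i] != NULL) {
                assert(top < MARK_STACK_MAX);
                stack[top++] = obj->children[i];
            }
    }
}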

Page 22: Mark & Sweep Example

(Figure: a heap reachable from roots r1 and r2; marked objects survive the sweep, unmarked objects are reclaimed.)

Page 23: Properties of Mark & Sweep

• The most popular method today (in a more advanced form).
• Simple.
• Does not move objects, and so the heap may fragment.
• Complexity:
  – mark phase: proportional to the live objects (the dominant phase);
  – sweep phase: proportional to the heap size.
• Termination: each pointer is traversed once.
• Various engineering tricks are used to improve performance.

Page 24: Mark-Compact

• During the run, objects are allocated and reclaimed.
• Gradually, the heap gets fragmented.
• When space is too fragmented to allocate, a compaction algorithm is used.
• Move all live objects to the beginning of the heap and update all pointers to reference the new locations.
• Compaction is considered very costly and we usually attempt to run it infrequently, or only partially.

(Figure: the heap.)

Page 25: An Example: The Compressor

• A simplistic presentation of the Compressor:
• Go over the heap and compute, for each live object, where it moves to:
  – to the address that is the sum of the live space before it in the heap;
  – save the new locations in a separate table.
• Go over the heap again and, for each object:
  – move it to its new location;
  – update all its pointers.
• Why can’t we do it all in a single heap pass?
• (In the full algorithm: a succinct table, a very fast first pass, and parallelization.)
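A minimal C sketch of the two passes over a toy heap of fixed-size records (the forwarding table and field layout are simplifying assumptions; the real Compressor uses a succinct table and parallelizes the work):

#define HEAP_OBJS 1024

typedef struct {
    int live;        /* set by the preceding mark phase */
    int field[2];    /* heap indices of referenced objects, or -1 */
} Obj;

static Obj heap[HEAP_OBJS];
static int new_loc[HEAP_OBJS];   /* pass 1 output: the forwarding table */

/* Pass 1: each live object moves to the sum of live space before it. */
static void compute_new_locations(void) {
    int next_free = 0;
    for (int i = 0; i < HEAP_OBJS; i++)
        if (heap[i].live)
            new_loc[i] = next_free++;
}

/* Pass 2: move each object and update its pointers.
   Left-to-right moves are safe because new_loc[i] <= i. */
static void move_and_update(void) {
    for (int i = 0; i < HEAP_OBJS; i++) {
        if (!heap[i].live) continue;
        Obj o = heap[i];
        for (int f = 0; f < 2; f++)
            if (o.field[f] >= 0)
                o.field[f] = new_loc[o.field[f]];
        heap[new_loc[i]] = o;
    }
}

This also shows why a single pass does not suffice: updating a pointer to an object that lies later in the heap requires that object’s new location, which is only known once the scan has passed it, so the forwarding table must be complete before any pointer is rewritten.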

Page 26: Mark-Compact

• Important parameters of a compaction algorithm:
  – Does it keep the order of objects?
  – Does it use extra space for compactor data structures?
  – How many heap passes does it make?
  – Can it run in parallel on a multi-processor?
• We do not elaborate in this intro.

Page 27: Copying garbage collection

• The heap is partitioned into two parts.
• Part 1 takes all allocations.
• Part 2 is reserved.
• During GC, the collector traces all reachable objects and copies them to the reserved part.
• After copying, the parts’ roles are reversed:
  – allocation activity goes to part 2, which was previously reserved;
  – part 1, which was active, is reserved till the next collection.

(Figure: the two parts of the heap.)

Page 28: Copying garbage collection

(Figure: Part I holds objects A, B, C, D, E; the roots reach A and C. Part II is empty.)

Page 29: The collection copies…

(Figure: A and C are copied into Part II; the originals in Part I are now garbage.)

Page 30: Roots are updated; Part I reclaimed.

(Figure: the roots now point to the copies of A and C in Part II.)

Page 31: Properties of Copying Collection

• Compaction for free.
• Major disadvantage: half of the heap is not used.
• “Touches” only the live objects:
  – good when most objects are dead;
  – usually most new objects are dead, and so there are methods that use a small space for young objects and collect this space using copying garbage collection.
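The slides leave the copying traversal abstract; the classic implementation is Cheney’s breadth-first scan, which uses to-space itself as the traversal queue. A minimal sketch, assuming fixed-size objects with a forwarding field that starts out NULL (all names illustrative):

#include <stddef.h>

typedef struct Object Object;
struct Object {
    Object *forward;       /* non-NULL once copied: the new address */
    Object *children[2];
};

#define SEMI_OBJS 512
static Object to_space[SEMI_OBJS];

/* Copy obj into to-space at most once and return its new address. */
static Object *forward(Object *obj, size_t *free_idx) {
    if (obj == NULL) return NULL;
    if (obj->forward == NULL) {              /* not copied yet */
        Object *copy = &to_space[(*free_idx)++];
        *copy = *obj;
        copy->forward = NULL;
        obj->forward = copy;                 /* leave a forwarding pointer behind */
    }
    return obj->forward;
}

/* Cheney's scan: objects between 'scan' and 'free_idx' are copied but unscanned. */
static void collect(Object **roots, size_t nroots) {
    size_t scan = 0, free_idx = 0;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = forward(roots[i], &free_idx);
    while (scan < free_idx) {
        Object *obj = &to_space[scan++];
        for (int i = 0; i < 2; i++)
            obj->children[i] = forward(obj->children[i], &free_idx);
    }
}

Compaction indeed falls out for free: survivors end up contiguous at the bottom of to-space, and the work done is proportional to the live objects only.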

Page 32: A very simplistic comparison

                  Copying             Mark & Sweep                   Reference Counting
Complexity        live objects        size of heap (live objects)    pointer updates + dead objects
Space overhead    half heap wasted    bit/object + stack for DFS     count/object + stack for DFS
Compaction        for free            additional work                additional work
Pause time        long                long                           mostly short
More issues       -                   -                              cycle collection

Page 33: Memory Management with Parallel Processors

• Stop the world
• Parallel (stop-the-world) GC
• Concurrent GC

• Trade-offs in pauses and throughput.
• Differences in complexity.
• Choose between parallel (longer pauses) and concurrent (lower throughput).

Page 34: Terminology

• Stop-the-World
• Parallel
• Concurrent
• On-the-Fly

(Figure: timelines showing how program threads and GC threads interleave in each style.)

Page 35: Concurrent GC

• Problem: long pauses disturb the user. An important measure for the collection: the length of its pauses.

• Can we just run the program while the collector runs on a different thread?
• Not so simple!
• The heap changes while we collect.
• For example:
  – we look at an object B, but before we have a chance to mark its children, the program changes them.

Page 36: Problem: Interference

SYSTEM = program || GC

1. GC traced B

(Figure: A points to C; B has already been traced.)

Page 37: Problem: Interference

SYSTEM = program || GC

1. GC traced B
2. Program links C to B

(Figure: C is now reachable from the already-traced B as well.)

Page 38: Problem: Interference

SYSTEM = program || GC

1. GC traced B
2. Program links C to B
3. Program unlinks C from A

(Figure: the only path to C now goes through the already-traced B.)

Page 39: Problem: Interference

SYSTEM = program || GC

1. GC traced B
2. Program links C to B
3. Program unlinks C from A
4. GC traced A

C is LOST: the collector finishes without ever marking C, although C is reachable.

Page 40: Coordination with a write-barrier

• The program notifies the collector that changes happen, so that it can mark objects conservatively.
• E.g., the program registers objects that gain pointers:

update(object y, field f, object x) {
  notifyCollector(x);   // record new referent
  y.f := x;
}

• The collector can then assume all recorded objects are alive (and trace their descendants as live).

Page 41: Three Typical Write Barriers

1. Record C when C is linked to B (record the new target).
2. Record C when the link to C is removed (record the overwritten target).
3. Record B when C is linked to B (record the modified object).
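In C-like terms the three flavors differ only in what gets recorded on a pointer store; record() stands in for whatever remembered-set mechanism the collector uses, and all names here are illustrative:

typedef struct Object Object;
struct Object { Object *fields[8]; };

static Object *remembered[1024];     /* the collector later treats these as live */
static unsigned n_remembered;

static void record(Object *o) {
    if (o != NULL && n_remembered < 1024)
        remembered[n_remembered++] = o;
}

/* 1. Record the new target (as in the update() barrier above). */
void update_record_new(Object *y, int f, Object *x) {
    record(x);
    y->fields[f] = x;
}

/* 2. Record the old target before it is overwritten. */
void update_record_old(Object *y, int f, Object *x) {
    record(y->fields[f]);
    y->fields[f] = x;
}

/* 3. Record the modified object itself, to be rescanned later. */
void update_record_source(Object *y, int f, Object *x) {
    record(y);
    y->fields[f] = x;
}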

Page 42: Dijkstra on Concurrent GC

It has been surprisingly hard to find the published solution and justification.

It was only too easy to design what looked--sometimes even for weeks and to many people--like a perfectly valid solution, until the effort to prove it to be correct revealed a (sometimes deep) bug.

Page 43: Generational Garbage Collection

“The weak generational hypothesis”: most objects die young.

Using the hypothesis: separate objects according to their ages, and collect the area of the young objects more frequently.

Page 44: More Specifically,

• The heap is divided into two or more areas (generations).
• Objects are allocated in the 1st (youngest) generation.
• The youngest generation is collected frequently.
• Objects that survive in the young generation “long enough” are promoted to the old generation.

(Figure: young generation and old generation areas.)

Page 45: Advantages

• Short pauses: the young generation is kept small, and so most pauses are short.
• Efficiency: collection efforts are concentrated where many dead objects exist.
• Locality:
  – collector: works mostly on a small part of the heap;
  – program: allocates (and mostly uses) young objects in a small part of the memory.

Page 46: Mark-Sweep or Copying?

• Copying is good when the live space is small (time) and the heap is small (space).
• A popular choice:
  – copying for the (small) young generation;
  – mark-and-sweep for the full collection.
• A small waste of space, high efficiency.

Page 47: Inter-Generational Pointers

• When tracing the young generation, is it enough to trace from the roots through the young generation only?

(Figure: roots point into both the young and the old generation.)

Page 48: Inter-Generational Pointers

• When tracing the young generation, is it enough to trace from the roots through the young generation?
• No! Pointers from the old to the young generation may witness the liveness of an object.

(Figure: an old-generation object points to a young-generation object that is reachable in no other way.)

Page 49: Inter-Generational Pointers

• We don’t trace the old generation.
  – Why?
• The solution:
  – “maintain a list” of all inter-generational pointers;
  – assume (conservatively) that the parent (old) object is alive;
  – treat these pointers as additional roots.
• “Typically”: most pointers go from young to old (few go from old to young).
• When collecting the old generation, collect the entire heap.

Page 50: Inter-Generational Pointers

• Inter-generational pointers are created:
  – when objects are promoted to the old generation;
  – when pointers are modified in the old generation.
• The first can be monitored by the collector during promotion.
• The second requires a write barrier.

Page 51: Write Barrier

update(object y, field f, object x) {
  if y is in old space and x is in young space
    remember y.f -> x;
  y.f := x;
}

(Figure: y.f in the old generation is set to point to x in the young generation.)

Reference counting also had a write barrier:

update(object y, field f, object x) {
  x.RC++;                 // increment new referent
  y.f^.RC--;              // decrement old referent
  if (y.f^.RC == 0) collect(y.f);
  y.f := x;
}
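In practice the old-to-young test is often implemented with a card table rather than explicit range checks plus a list append: every store dirties the card covering the written slot, and at a young collection the collector scans objects on dirty cards for old-to-young pointers. A minimal sketch (the heap layout and card size are assumptions):

#include <stdint.h>

#define HEAP_BYTES (1u << 20)
#define CARD_BITS  9                 /* 512-byte cards */

static uint8_t heap[HEAP_BYTES];
static uint8_t card_table[HEAP_BYTES >> CARD_BITS];

/* Write barrier: do the store, then dirty the card holding the slot. */
static inline void write_ref(void **slot, void *value) {
    *slot = value;
    card_table[((uintptr_t)slot - (uintptr_t)heap) >> CARD_BITS] = 1;
}

The unconditional one-byte store keeps the common path cheap; the price is that the collector must scan (and clear) the dirty cards at each young collection.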

Page 52: Modern Memory Management

• Handle parallel platforms.
• Real-time.
• Cache consciousness.
• Handle new platforms (GPU, PCM, …).
• Hardware assistance.

Page 53: Summary: Dynamic Memory Management

• The compiler generates code for allocation of objects, and garbage collection runs when necessary.
• Reference counting, mark-sweep, mark-compact, copying.
• Concurrent garbage collection.
• Generations:
  – inter-generational pointers, write barrier.

Page 54: Terminology

• Heap, objects
• Allocate, free (deallocate, delete, reclaim)
• Reachable, live, dead, unreachable
• Roots
• Reference counting, mark and sweep, copying, compaction, tracing algorithms
• Fragmentation
• Concurrent GC
• Generational GC

Page 55: Runtime Summary

• Runtime:
  – services that are always there: function calls, memory management, threads, etc.
  – We discussed function calls:
    • scoping rules
    • activation records
    • caller/callee conventions
    • nested procedures (and the display array)
  – Memory management:
    • mark-sweep, copying, reference counting, compaction
    • generations
    • concurrent garbage collection

Page 56: OO Issues

Page 57: Representing Data at Runtime

• Source language types:
  – int, boolean, string, object types
• Target language types:
  – single bytes, integers, address representation
• The compiler should map source types to some combination of target types:
  – implement source types using target types.

Page 58: Basic Types

• int, boolean, string, void
• Arithmetic operations:
  – addition, subtraction, multiplication, division, remainder
• Can be mapped directly to target language types and operations.

Page 59: Pointer Types

• Represent addresses of source language data structures.
• Usually implemented as an unsigned integer.
• Pointer dereferencing retrieves the pointed-to value.
• May produce an error:
  – null pointer dereference. When is this error triggered?

Page 60: Object Types

• Basic operations:
  – field selection + read/write
    • computing the address of the field, dereferencing the address
  – copying
    • copying a whole block, or field-by-field copying
  – method invocation
    • identifying the method to be called, calling it
• How does it look at runtime?

Page 61: Object Types

class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

Runtime memory layout for an object of class Foo: DispatchVectorPtr, then x, then y.
Compile-time information: the dispatch vector lists rise and shine.

Page 62: Field Selection

Foo f;
int q;

q = f.x;

MOV f, %EBX          # base pointer
MOV 4(%EBX), %EAX    # field offset from base pointer
MOV %EAX, q

(Layout as before: DispatchVectorPtr at offset 0, x at offset 4, y at offset 8.)

Page 63: Object Types - Inheritance

class Foo {
  int x;
  int y;
  void rise() {…}
  void shine() {…}
}

class Bar extends Foo {
  int z;
  void twinkle() {…}
}

Runtime memory layout for an object of class Bar: DispatchVectorPtr, x, y, z.
Compile-time information: the dispatch vector lists rise, shine, twinkle.

Page 64: Object Types - Polymorphism

class Foo {
  …
  void rise() {…}
  void shine() {…}
}

class Bar extends Foo {
  …
}

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}

(Figure: runtime memory layout for an object of class Bar. f is a pointer to Bar, and it is also a pointer to the Foo inside Bar: the DVPtr, x, y prefix, followed by z.)

Page 65: Static & Dynamic Binding

• Which “rise” should main() use?
• Static binding: f is of type Foo and therefore always refers to Foo’s rise.
• Dynamic binding: f points to a Bar object now, so it refers to Bar’s rise.

class Foo {
  …
  void rise() {…}
  void shine() {…}
}

class Bar extends Foo {
  void rise() {…}
}

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}

Page 66: Typically, Dynamic Binding is used

• Find the right method implementation at runtime, according to the object’s type.
• Using the Dispatch Vector (a.k.a. Dispatch Table).

(Code as on the previous slide: Foo declares rise and shine, Bar overrides rise, and main() calls f.rise() on a Foo-typed reference to a Bar object.)

Page 67: Dispatch Vectors in Depth

• The vector contains the addresses of the methods.
• It is indexed by method-id number.
• A method signature has the same id number in all subclasses: rise is id 0 and shine is id 1 in both Foo and Bar.

class Main {
  void main() {
    Foo f = new Bar();
    f.rise();
  }
}

(Figure: f points to a Bar object, and also to the Foo inside Bar. The object’s DVPtr points to Bar’s dispatch vector, whose entry 0 is Bar’s rise and entry 1 is Foo’s shine; the call f.rise() reaches the method code through Bar’s dispatch table.)
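What the compiler emits for this can be pictured in C, with the dispatch vector as an array of function pointers indexed by method id (a sketch; all names are illustrative):

#include <stdio.h>

typedef struct Foo Foo;
typedef void (*method_t)(Foo *self);

struct Foo {
    method_t *dvptr;      /* DispatchVectorPtr: the first word of every object */
    int x, y;
};

typedef struct { Foo base; int z; } Bar;   /* Bar extends Foo: Foo is a prefix */

static void foo_rise(Foo *self)  { (void)self; printf("Foo.rise\n"); }
static void foo_shine(Foo *self) { (void)self; printf("Foo.shine\n"); }
static void bar_rise(Foo *self)  { (void)self; printf("Bar.rise\n"); }

/* Method id 0 = rise, id 1 = shine: the same index in every subclass. */
static method_t foo_dv[] = { foo_rise, foo_shine };
static method_t bar_dv[] = { bar_rise, foo_shine };   /* rise overridden */

int main(void) {
    Bar b = { { bar_dv, 1, 2 }, 3 };
    Foo *f = &b.base;       /* Foo f = new Bar(); */
    f->dvptr[0](f);         /* f.rise(): prints Bar.rise */
    return 0;
}

The dynamic-binding call f.rise() thus compiles to two loads and an indirect call: load the object’s dispatch-vector pointer, load entry 0, call it.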

Page 68: Dispatch Vectors in Depth

class Main {
  void main() {
    Foo f = new Foo();
    f.rise();
  }
}

(Figure: f now points to a Foo object (DVPtr, x, y). Its DVPtr points to Foo’s dispatch vector, where entry 0 is Foo’s rise and entry 1 is Foo’s shine; the same call site reaches Foo’s rise through Foo’s dispatch table.)

Page 69: Multiple Inheritance

class C {
  field c1; field c2;
  void m1() {…}
  void m2() {…}
}
class D {
  field d1;
  void m3() {…}
  void m4() {…}
}
class E extends C, D {
  field e1;
  void m2() {…}
  void m4() {…}
  void m5() {…}
}

Supertyping:
  convert_ptr_to_E_to_ptr_to_C(e) = e
  convert_ptr_to_E_to_ptr_to_D(e) = e + sizeof(class C)
Subtyping:
  convert_ptr_to_C_to_ptr_to_E(e) = e
  convert_ptr_to_D_to_ptr_to_E(e) = e - sizeof(class C)

(Figure: E-object layout. A pointer to E is also a pointer to the C inside E: DVPtr, c1, c2. The D inside E starts sizeof(class C) bytes later with its own DVPtr, followed by d1 and e1. The first dispatch vector holds m1_C_C, m2_C_E, m5_E_E; the second holds m3_D_D, m4_D_E.)

Page 70: Runtime checks

• Generate code that checks for attempted illegal operations:
  – null pointer check
  – array bounds check
  – array allocation size check
  – division by zero
  – …
• If a check fails, jump to error-handler code that prints a message and gracefully exits the program.
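In C terms, the generated checks amount to guards like the following inserted before each risky operation (the fail() helper standing in for the shared handler is an assumption):

#include <stdio.h>
#include <stdlib.h>

/* The single handler the compiler emits once for the entire program. */
static void fail(const char *msg) {
    puts(msg);     /* print the error message */
    exit(1);       /* exit with error code 1 */
}

/* An array load, with the checks a compiler would insert around it. */
static int checked_load(int *arr, int length, int i) {
    if (arr == NULL)          fail("Null pointer dereference");
    if (i < 0 || i >= length) fail("Array index out of bounds");
    return arr[i];
}

/* An array allocation, with a size check. */
static int *checked_alloc(int n) {
    if (n <= 0) fail("Illegal array allocation size");
    return malloc((size_t)n * sizeof(int));
}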

Page 71: Null pointer check

# null pointer check
  cmp $0, %eax
  je labelNPE

labelNPE:
  push $strNPE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.

Page 72: Array bounds check

# array bounds check
  mov -4(%eax), %ebx   # ebx = length
  # ecx holds the index
  cmp %ecx, %ebx
  jle labelABE         # ebx <= ecx ?
  cmp $0, %ecx
  jl labelABE          # ecx < 0 ?

labelABE:
  push $strABE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.

Page 73: Array allocation size check

# array size check
  cmp $0, %eax    # eax == array size
  jle labelASE    # eax <= 0 ?

labelASE:
  push $strASE    # error message
  call __println
  push $1         # error code
  call __exit

A single generated handler serves the entire program.

Page 74: Exceptions

(Figure: activation records for main, foo, and bar on the stack; an exception thrown in bar unwinds the stack until a handler environment (env) is found.)

Page 75: Exception example

org.eclipse.swt.SWTException: Graphic is disposed
    at org.eclipse.swt.SWT.error(SWT.java:3744)
    at org.eclipse.swt.SWT.error(SWT.java:3662)
    at org.eclipse.swt.SWT.error(SWT.java:3633)
    at org.eclipse.swt.graphics.GC.getClipping(GC.java:2266)
    at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:260)
    at com.aelitis.azureus.ui.swt.views.list.ListRow.doPaint(ListRow.java:237)
    at com.aelitis.azureus.ui.swt.views.list.ListView.handleResize(ListView.java:867)
    at com.aelitis.azureus.ui.swt.views.list.ListView$5$2.runSupport(ListView.java:406)
    at org.gudy.azureus2.core3.util.AERunnable.run(AERunnable.java:38)
    at org.eclipse.swt.widgets.RunnableLock.run(RunnableLock.java:35)
    at org.eclipse.swt.widgets.Synchronizer.runAsyncMessages(Synchronizer.java:130)
    at org.eclipse.swt.widgets.Display.runAsyncMessages(Display.java:3323)
    at org.eclipse.swt.widgets.Display.readAndDispatch(Display.java:2985)
    at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.<init>(SWTThread.java:183)
    at org.gudy.azureus2.ui.swt.mainwindow.SWTThread.createInstance(SWTThread.java:67)
…

Page 76: Recap

• Lexical analysis
  – regular expressions identify tokens (“words”)
• Syntax analysis
  – context-free grammars identify the structure of the program (“sentences”)
• Contextual (semantic) analysis
  – type checking defined via typing judgements
  – can be encoded via attribute grammars
  – syntax-directed translation
• Intermediate representation
  – many possible IRs; generation of the intermediate representation; 3AC; backpatching
• Runtime:
  – services that are always there: function calls, memory management, threads, etc.

Page 77: Journey inside a compiler

Lexical Analysis → Syntax Analysis → Semantic Analysis → Inter. Rep. → Code Gen.

float position;
float initial;
float rate;
position = initial + rate * 60

Token stream:
<float> <ID,position> <;> <float> <ID,initial> <;> <float> <ID,rate> <;>
<ID,1> <=> <ID,2> <+> <ID,3> <*> <60>

Page 78: Journey inside a compiler

Lexical Analysis → Syntax Analysis → Semantic Analysis → Inter. Rep. → Code Gen.

<ID,1> <=> <ID,2> <+> <ID,3> <*> <60>

Grammar:
S → ID = E
E → ID | E + E | E * E | NUM

AST:
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      60

Symbol table:
id   symbol     type    data
1    position   float   …
2    initial    float   …
3    rate       float   …

Page 79: Journey inside a compiler

Lexical Analysis → Syntax Analysis → Semantic Analysis → Inter. Rep. → Code Gen.

AST before semantic analysis:
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      60

AST after semantic analysis:
=
  <id,1>
  +
    <id,2>
    *
      <id,3>
      inttofloat
        60

Coercion: an automatic conversion from int to float, inserted by the compiler.

Symbol table:
id   symbol     type
1    position   float
2    initial    float
3    rate       float

Page 80: Journey inside a compiler

Lexical Analysis → Syntax Analysis → Semantic Analysis → Inter. Rep. → Code Gen.

3AC:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

production              semantic rule
S → id = E              S.code := E.code || gen(id.var ‘:=‘ E.var)
E → E1 op E2            E.var := freshVar(); E.code := E1.code || E2.code || gen(E.var ‘:=‘ E1.var ‘op’ E2.var)
E → inttofloat(num)     E.var := freshVar(); E.code := gen(E.var ‘:=‘ inttofloat(num))
E → id                  E.var := id.var; E.code := ‘’

(For brevity, the bubbles in the slide show only the code generated by each node, not the whole accumulated “code” attribute.)

Note the structure: translate E1, translate E2, handle the operator.

Page 81: Journey inside a compiler

Inter. Rep. → Code Gen.

3AC:
t1 = inttofloat(60)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3

Optimized:
t1 = id3 * 60.0
id1 = id2 + t1

• The value 60 is known at compile time, so code can be generated with the converted value 60.0.
• The temporary t3 is eliminated.

Page 82: Journey inside a compiler

Optimized:
t1 = id3 * 60.0
id1 = id2 + t1

Code Gen:
LDF  R2, id3
MULF R2, R2, #60.0
LDF  R1, id2
ADDF R1, R1, R2
STF  id1, R1

Page 83: Problem 3.8 from [Appel]

A simple left-recursive grammar:
E → E + id
E → id

A simple right-recursive grammar accepting the same language:
E → id + E
E → id

Which has better behavior for shift-reduce parsing?

Page 84: Answer

Left recursive:
E → E + id
E → id

The stack never has more than three items on it. In general, with LR parsing of left-recursive grammars, an input string of length O(n) requires only O(1) space on the stack.

Input: id+id+id+id+id

Stack:
id          (reduce)
E
E +
E + id      (reduce)
E
E +
E + id      (reduce)
E
E +
E + id      (reduce)
E
E +
E + id      (reduce)
E

Page 85: Answer

Right recursive:
E → id + E
E → id

The stack grows as large as the input string. In general, with LR parsing of right-recursive grammars, an input string of length O(n) requires O(n) space on the stack.

Input: id+id+id+id+id

Stack:
id
id +
id + id
id + id +
id + id + id
id + id + id +
id + id + id + id
id + id + id + id +
id + id + id + id + id    (reduce)
id + id + id + id + E     (reduce)
id + id + id + E          (reduce)
id + id + E               (reduce)
id + E                    (reduce)
E