Cse322, Programming Languages and Compilers 1 06/14/22 Lecture #4, April 12, 2007 Strings (representation, byte operation, copying), Structures (representation, anonymous value, field information, layout), Control Flow (basic blocks, generating code, loops).


Lecture #4, April 12, 2007
• Strings (representation, byte operation, copying)
• Structures (representation, anonymous value, field information, layout)
• Control Flow (basic blocks, generating code, loops)


Assignments

• Reading
– Read chapter 7, sections 7.9, 7.10, and 7.11.
– Possible quiz Monday on the reading.


Strings

• Strings are usually represented as byte sequences.

• Operations on strings do not generally map onto hardware operations.
– Load instructions load whole words
– Strings are composed of bytes
– Shifting and masking are often necessary

• String representations are often both language and machine dependent.
– In C, strings are null-terminated arrays of adjacent chars.
– In Java, strings are objects that hold their characters in an array, with the length stored explicitly.


Representation

Null terminated:      a b c \0
Length prefixed:      3 a b c
Length plus pointer:  (3, pointer) -> a b c

Note that sharing is possible with the length plus pointer representation: several descriptors can point into the same characters.
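A minimal C sketch of why the length plus pointer form allows sharing follows. The names `Str` and `substring` are hypothetical. A substring is built as a new descriptor that points into the original characters, so nothing is copied.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-plus-pointer descriptor: the characters live
   elsewhere; the descriptor records where they start and how many. */
typedef struct {
    size_t len;
    const char *chars;
} Str;

/* Substring by sharing: the result points into the same storage as s. */
Str substring(Str s, size_t start, size_t len) {
    assert(start + len <= s.len);      /* stay inside the original string */
    Str t = { len, s.chars + start };
    return t;
}
```

Because the two descriptors share storage, a change to the underlying characters is visible through both. A language that allows in-place updates must take this into account.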


Assignment of individual characters
• a[1] = b[2]

loadI @b => rb

cloadAI rb,2 => r2

loadI @a => ra

cstoreAI r2 => ra,1

• This is only possible if the machine has byte level load and store. Many machines do not.


Without byte oriented operations
• Masking and shifting are necessary without byte oriented operations.

• Masking
– A mask is a word with “1”s in the important positions and “0”s in the other positions.
– For example, in a 32 bit word, the mask for the second byte is
» 00000000 00000000 11111111 00000000
» 0x0000FF00 in hex
» Anding a mask with a word “zeros” out the unmasked bits:
      00000000 00000000 11111111 00000000
  and 01011101 11011101 01010001 11110101
   -> 00000000 00000000 01010001 00000000

• Shifting
– Shifting moves the bits over, for example a right shift by 8:
  shift 00000000 00000000 11111111 00000000, 8
   ->   00000000 00000000 00000000 11111111


a[1] = b[2]
• Load the source word (b)
– 01011101 11011101 01010001 11110101
• Mask away the unwanted characters (everything but position 2)
– 00000000 00000000 01010001 00000000
• Shift to the byte position of the target in its word (position 1)
– 00000000 00000000 00000000 01010001
• Load the target word (a)
– 01110100 10111011 00001011 11010111
• Mask away the position of the target character
– 01110100 10111011 00001011 00000000
• Or the shifted and masked source with the masked target
     00000000 00000000 00000000 01010001
  or 01110100 10111011 00001011 00000000
  -> 01110100 10111011 00001011 01010001
• Store the result at the target address
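The mask, shift and or steps above can be sketched in C with whole-word operations only. The sketch assumes a 32-bit word with byte positions numbered 0 to 3 from the least significant end. The slides count from 1 at that end, so their b[2] and a[1] are positions 1 and 0 here. The names `byte_mask` and `copy_byte` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with 1s in byte position pos (0 = least significant byte). */
static uint32_t byte_mask(unsigned pos) {
    return (uint32_t)0xFF << (8 * pos);
}

/* Copy byte src_pos of src_word into byte dst_pos of dst_word,
   using only word-sized and, or, and shift operations. */
uint32_t copy_byte(uint32_t dst_word, unsigned dst_pos,
                   uint32_t src_word, unsigned src_pos) {
    assert(dst_pos < 4 && src_pos < 4);
    uint32_t ch = src_word & byte_mask(src_pos);      /* mask away other chars  */
    ch >>= 8 * src_pos;                               /* move to position 0     */
    ch <<= 8 * dst_pos;                               /* move to target position */
    uint32_t hole = dst_word & ~byte_mask(dst_pos);   /* clear the target byte  */
    return hole | ch;                                 /* or them together       */
}
```

Calling `copy_byte(0x74BB0BD7, 0, 0x5DDD51F5, 1)` reproduces the worked example above. The words are the slide's bit patterns written in hex, and the result is 0x74BB0B51.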


Longer words

• If a and b are longer strings (more than 4 characters), then we need to select the right word from the longer string.

• A[n] = B[m]

• The correct source word is ( m `div` 4 )

• The correct source position in that word is ( m `mod` 4 )

• Similarly, for the target string A the word is ( n `div` 4 ) and the position is ( n `mod` 4 )
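The word and position selection above can be sketched in C. The sketch assumes strings packed 4 characters per 32-bit word, with character k in word k / 4 at byte position k % 4, and position 0 as the least significant byte. `get_char` and `assign_char` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Character k of a word-packed string: word k/4, byte position k%4. */
static unsigned get_char(const uint32_t *words, unsigned k) {
    return (words[k / 4] >> (8 * (k % 4))) & 0xFF;
}

/* A[n] = B[m] using only whole-word loads and stores. */
void assign_char(uint32_t *a, unsigned n, const uint32_t *b, unsigned m) {
    unsigned ch = get_char(b, m);          /* source: word m/4, position m%4 */
    unsigned shift = 8 * (n % 4);          /* target position n%4 ...        */
    uint32_t w = a[n / 4];                 /* ... inside target word n/4     */
    w &= ~((uint32_t)0xFF << shift);       /* clear the target byte          */
    w |= (uint32_t)ch << shift;            /* insert the source byte         */
    a[n / 4] = w;                          /* store the whole word back      */
}
```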


Copying Strings

• To copy a string we need to copy all the component characters.

• With byte oriented load and store this is easy.

• With word oriented load and store we again need to load and move whole words.
– How many words must we move?
– When do we need to mask?
» How is this affected by length?
» How is it affected by the word alignment of the two strings?

• Error conditions
– Strings are generally allocated once, with a fixed size.
– A := B could cause an error if B is longer than A.
– Test the lengths first.
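The following C sketch adds the length test that comes before a copy. It assumes a hypothetical fixed-capacity string type, `FixedStr`, whose storage is allocated once. The hypothetical `assign_str` refuses an assignment that would overrun the target.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical fixed-capacity string: room for cap characters,
   of which len are in use. */
typedef struct {
    size_t cap;
    size_t len;
    char *chars;
} FixedStr;

/* A := B.  Test the lengths first and fail rather than overrun A. */
bool assign_str(FixedStr *a, const FixedStr *b) {
    if (b->len > a->cap)
        return false;                       /* B is longer than A        */
    memmove(a->chars, b->chars, b->len);    /* byte-oriented copy        */
    a->len = b->len;                        /* record the new length     */
    return true;
}
```

With word-oriented hardware, the `memmove` would become the word-by-word masking loop discussed above.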


String Concatenation A^B

• Compute the lengths of A and B, lenA and lenB.

• Allocate (lenA + lenB) bytes plus room for the length and any necessary alignment.

• Copy A to the target.
• Copy B to the target, right after A.
• Set the length convention appropriately.
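The concatenation steps can be sketched in C for a length-prefixed string. This assumes a hypothetical `LStr` type whose header holds the length, with the characters following in the same allocation through a C99 flexible array member. `sizeof *r` covers the length field plus any padding the compiler adds for alignment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-prefixed string: header, then the characters. */
typedef struct {
    size_t len;
    char chars[];            /* flexible array member */
} LStr;

LStr *lstr_new(const char *s, size_t len) {
    LStr *r = malloc(sizeof *r + len);
    if (r) { r->len = len; memcpy(r->chars, s, len); }
    return r;
}

/* A ^ B: allocate, copy A, copy B after it, set the length. */
LStr *concat(const LStr *a, const LStr *b) {
    size_t lenA = a->len, lenB = b->len;
    LStr *r = malloc(sizeof *r + lenA + lenB);  /* length + padding + chars */
    if (r == NULL) return NULL;
    memcpy(r->chars, a->chars, lenA);           /* copy A to target          */
    memcpy(r->chars + lenA, b->chars, lenB);    /* copy B after A            */
    r->len = lenA + lenB;                       /* set the length            */
    return r;
}
```

For a null-terminated representation, the last step would instead store a 0 byte after the copied characters, and the allocation would need one extra byte.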


String Length

• Here we use the explicit information stored with the string.

• Null terminated
– Loop and count until 0 is encountered

• Length prefixed
– The address of the string stores the length

• Length plus pointer
– The descriptor at the string's address stores the length

Null terminated:      a b c \0
Length prefixed:      3 a b c
Length plus pointer:  (3, pointer) -> a b c
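The cost difference between the representations shows up in a short C sketch. Finding the length of a null-terminated string takes a loop over every character. For a length-prefixed string it is a single load. The sketch assumes a one-byte length prefix, so strings are limited to 255 characters. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Null terminated: loop and count until the 0 byte (cost grows with length). */
size_t len_null_terminated(const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Length prefixed (assumed one-byte prefix): the length sits at the
   string's address, so this is one load. */
size_t len_prefixed(const unsigned char *s) {
    return s[0];
}
```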


Structures
• Structures are heterogeneous aggregates with statically known accessors.
• Statically known means we know each field's “offset” at compile time.
– Sometimes these are named: X.age
– Sometimes the names are implicit, as in pattern matching in ML:
fun f (Node(x,y,z)) = …
The positions of x, y, and z are statically known.

Examples include
– C: struct
struct node { int value; struct node *next; };
– Java: objects with instance variables
– ML: datatypes with constructors that have more than one field


Problems

• Anonymous values
struct node {
    int value;
    struct node *next;
};
struct node x;
f( *(x.next) )

Note that *(x.next) is an anonymous value: a value without a name.

• Structure layout
– Layout requires alignment
– Computing offsets for each field
– A field's offset depends on the sizes of the preceding fields in the structure


Anonymous values
• Aliasing is a problem with anonymous values.
– Pointers
int a, *b;
b = &a;      (now *b and a name the same location)
– Array references
Are x[i] and x[j-n] different?

p1 = (node *) malloc(sizeof(node));
p2 = (node *) malloc(sizeof(node));
if (. . .)
then p3 = p1;
else p3 = p2;
p1->value = . . .
p3->value = . . .
w := p1->value;

We would like to keep p1->value in a register. But is that register still valid after the store through p3? It depends upon the path through the if-then-else. Anonymous values are often kept in memory because we can’t tell when they might change because of aliasing.
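The p1/p3 example can be reduced to a C function whose result depends on whether the two pointers alias. The name `store_and_reload` is hypothetical. Because the store through `p3` may change `p1->value`, the compiler must reload it from memory rather than reuse the value it just stored.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* Store through p1, store through p3, then read p1->value again.
   The result is 1 if p3 points elsewhere, 2 if p3 == p1. */
int store_and_reload(struct node *p1, struct node *p3) {
    assert(p1 != NULL && p3 != NULL);
    p1->value = 1;
    p3->value = 2;           /* may or may not overwrite p1->value */
    return p1->value;        /* must be reloaded from memory       */
}
```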


Recording and using field information

Structure name: node (2 fields)

Field name   Length    Offset   Type
value        4 bytes   0        int
next         4 bytes   4        node *

struct node { int value; struct node *next; }

p1->next

loadI 4 => r1          // offset of next
loadAO rp1,r1 => r2    // value of p1->next
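The same base-plus-offset access can be written in C. `offsetof` asks the C compiler for the field offset it actually chose. `load_next` is a hypothetical name. The table above assumes 4-byte ints and pointers, which gives an offset of 4 for `next`. On a machine with 8-byte pointers, the offset is typically 8.

```c
#include <assert.h>
#include <stddef.h>

struct node { int value; struct node *next; };

/* p1->next computed like the ILOC: base address plus the field's
   statically known offset, then a load. */
struct node *load_next(struct node *p1) {
    char *base = (char *)p1;
    struct node **field =
        (struct node **)(base + offsetof(struct node, next));
    return *field;
}
```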


Layout of structures
• When laying out structures
– Meet all alignment rules
– Minimize the amount of space used
– Statically know the offset of each field

struct example {
    int fee;
    double fie;
    int foe;
    double fum;
} e1;

Structure name: example (4 fields)

Field name   Length    Naïve offset   Type
fee          4 bytes   0              int
fie          8 bytes   4              double
foe          4 bytes   12             int
fum          8 bytes   16             double

Aligned layout: fee at 0, padding at 4, fie at 8, foe at 16, padding at 20, fum at 24

Note that aligning fie on a double word boundary makes the naïve offsets incorrect.
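The aligned offsets can be computed mechanically. The following C model uses hypothetical names (`Field`, `align_up`, `layout`) and is not how any particular compiler is written. It assumes each field's alignment equals its size (int 4, double 8), as in the example. Real alignment rules come from the target ABI, and some 32-bit ABIs align double to only 4 inside structures.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    size_t size;
    size_t align;
    size_t offset;      /* filled in by layout() */
} Field;

/* Round off up to the next multiple of align. */
static size_t align_up(size_t off, size_t align) {
    return (off + align - 1) / align * align;
}

/* Lay out fields in declaration order. Returns the total size, padded
   to the largest alignment so every element of an array stays aligned. */
size_t layout(Field *f, size_t n) {
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, f[i].align);   /* skip padding if needed */
        f[i].offset = off;
        off += f[i].size;
        if (f[i].align > max_align)
            max_align = f[i].align;
    }
    return align_up(off, max_align);
}
```

For `struct example` this gives offsets 0, 8, 16, 24 and a total of 32 bytes, compared with 24 bytes of actual data.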


Alternate structure
• We can reorder the layout of the fields.
• As long as the table is correct, the programmer cannot observe this change.
• This also saves space, as we don't use unnecessary padding.

Structure name: example (4 fields)

Field name   Length    Offset   Type
fee          4 bytes   16       int
fie          8 bytes   0        double
foe          4 bytes   20       int
fum          8 bytes   8        double

Layout: fie at 0, fum at 8, fee at 16, foe at 20 (end at 24)
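The reordering can be sketched by sorting fields by decreasing alignment before laying them out. This repeats the same hypothetical model, with each field's alignment assumed equal to its size. A C compiler may not do this to a `struct`, because the C standard requires members to appear in declaration order. The idea applies to languages whose layout the programmer cannot observe.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *name;
    size_t size;
    size_t align;
    size_t offset;
} Field;

static size_t align_up(size_t off, size_t align) {
    return (off + align - 1) / align * align;
}

size_t layout(Field *f, size_t n) {
    size_t off = 0, max_align = 1;
    for (size_t i = 0; i < n; i++) {
        off = align_up(off, f[i].align);
        f[i].offset = off;
        off += f[i].size;
        if (f[i].align > max_align)
            max_align = f[i].align;
    }
    return align_up(off, max_align);
}

/* Stable insertion sort by decreasing alignment, so fields with equal
   alignment keep their declaration order. */
void sort_by_align(Field *f, size_t n) {
    for (size_t i = 1; i < n; i++) {
        Field key = f[i];
        size_t j = i;
        while (j > 0 && f[j - 1].align < key.align) {
            f[j] = f[j - 1];
            j--;
        }
        f[j] = key;
    }
}
```

Sorting `fee, fie, foe, fum` gives `fie, fum, fee, foe`, which matches the table above. The total is 24 bytes with no padding.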


Arrays of structures

struct node { int value; int age; };
struct node x[4];

Array of structures (one record per element):
x[0]: value = 5, age = 34
x[1]: value = 2, age = 18
x[2]: value = 0, age = 3
x[3]: value = 9, age = 45

One array per field:
index:  0   1   2   3
value:  5   2   0   9
age:    34  18  3   45

We can represent these in at least two ways. Performance may vary.
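Both representations can be written in C. The type names and the `sum_ages_*` functions are hypothetical. A loop that needs only `age` touches one contiguous array in the second form but steps over every `value` in the first. This is one reason performance can differ with the access pattern.

```c
#include <assert.h>

#define N 4

/* Representation 1: array of structures. */
struct node { int value; int age; };

/* Representation 2: one array per field. */
struct nodes { int value[N]; int age[N]; };

int sum_ages_aos(const struct node x[N]) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += x[i].age;            /* strides over value and age together */
    return s;
}

int sum_ages_soa(const struct nodes *x) {
    int s = 0;
    for (int i = 0; i < N; i++)
        s += x->age[i];           /* reads only the age array */
    return s;
}
```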


Unions and run-time tags
• Unions can have several different layouts at runtime.
• To distinguish them at runtime, the user must add a tag field that can be tested at runtime.

struct two {
    int tag;
    union choice {
        struct { char *name; } A;
        struct { int age; } B;
    } field;
} u2;

• In ML the tags are the constructor names!
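A C sketch of testing the tag before touching the union follows. The tag values `TAG_A` and `TAG_B` and the function `summarize` are hypothetical. The compiler does not check that the tag matches the member that was last stored. Keeping them consistent is the programmer's job, unlike with ML constructors.

```c
#include <assert.h>
#include <string.h>

enum { TAG_A, TAG_B };          /* hypothetical tag values */

struct two {
    int tag;                    /* says which member of field is live */
    union choice {
        struct { char *name; } A;
        struct { int age; } B;
    } field;
};

/* Dispatch on the run-time tag before reading the union. */
int summarize(const struct two *u) {
    switch (u->tag) {
    case TAG_A: return (int)strlen(u->field.A.name);
    case TAG_B: return u->field.B.age;
    default:    return -1;      /* unknown tag */
    }
}
```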


Basic Blocks

• A (maximal length) straight-line code segment.

• Any jump ends a basic block, and any label starts a new one (because it may be the target of a jump).

    loadI @a => r2
    loadAO rA,r2 => r3
    loadI @b => r4
    loadAO rA,r4 => r5
L1: comp r3,r5 => cc1
    cbr_Lt cc1 -> L2,L5
L5: loadI @c => r6
    loadAO rA,r6 => r7
    loadI @d => r8
    loadAO rA,r8 => r9
    comp r7,r9 => cc2
    cbr_Lt cc2 -> L6,L3
L6: loadI @e => r10
    loadAO rA,r10 => r11
    loadI @f => r12
    loadAO rA,r12 => r13
    comp r11,r13 => cc3
    cbr_Lt cc3 -> L2,L3
L2: loadI true => r1
    jumpI -> L4
L3: loadI false => r1
    jumpI -> L4
L4: nop
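The block boundaries can be found by marking "leaders". These are the first instruction, any labelled instruction, and any instruction that follows a jump. This is a minimal C sketch with a hypothetical `Instr` summary that records only those two facts. On the 23-instruction listing above it finds 7 basic blocks.

```c
#include <assert.h>

/* Hypothetical instruction summary: only what matters for block boundaries. */
typedef struct {
    int has_label;   /* labelled, so it may be a jump target */
    int is_jump;     /* cbr or jumpI: control may leave here */
} Instr;

/* leader[i] = 1 when instruction i starts a basic block.
   Returns the number of basic blocks. */
int find_leaders(const Instr *code, int n, int *leader) {
    int blocks = 0;
    for (int i = 0; i < n; i++) {
        /* i == 0 short-circuits, so code[-1] is never read */
        leader[i] = (i == 0) || code[i].has_label || code[i - 1].is_jump;
        blocks += leader[i];
    }
    return blocks;
}
```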


Sources

• Basic blocks are produced by
– Control Flow constructs in the language

» If-then-else

» Loops

– Positional evaluation of booleans

– Short circuit evaluation


Predication vs jumps

• Recall the predicated move
Mov_GT cc,r1,r2 => r3

• Mostly we use these to avoid branches or jumps
– if x<y then a <- c+d else a <- e+f
– comp rx,ry => cc1
  add rc,rd => r1
  add re,rf => r2
  mov_LT cc1,r1,r2 => ra

• If the then and else branches are large, we may do too much speculative execution (both branches are always computed), so using jumps may be better.

• Other considerations
– Expected frequency of one path over another
– Complicated control flow (other if-then-else statements) inside the then or else branch
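The predicated sequence can be imitated in C without a branch. Both arms are computed, and a mask stands in for the condition code. The mask is all 1s when x<y and all 0s otherwise. `select_lt` is a hypothetical name. An optimizing C compiler may emit its own conditional move for an ordinary `?:`, so this is only an illustration of the idea.

```c
#include <assert.h>

/* if x<y then a <- c+d else a <- e+f, computed without a jump. */
unsigned select_lt(int x, int y, unsigned c, unsigned d,
                   unsigned e, unsigned f) {
    unsigned r1 = c + d;                  /* then-arm, always executed   */
    unsigned r2 = e + f;                  /* else-arm, always executed   */
    unsigned m = 0u - (unsigned)(x < y);  /* all 1s if x<y, else all 0s */
    return (r1 & m) | (r2 & ~m);          /* plays the role of mov_LT    */
}
```

Because both additions always run, the cost grows with the size of both arms. This is the speculative-execution concern raised above.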


Generating code
• Because of our experience with short-circuit evaluation, we have all the tools to generate code with control flow.

• We will need one more IR instruction.

datatype IR
  = LoadI of (string * Reg)
  | LoadAO of (Reg * Reg * Reg)
  | Arith of (Op * Reg * Reg * Reg)
  | Comp of (Reg * Reg * CC)
  | Neg of (Reg * Reg)
  | Cmp of (Op * Reg * Reg * Reg)
  | Cbr of (Op * CC * Label * Label)
  | JumpI of Label
  | Lab of (Label * IR)
  | Nop
  | StoreAO of (Reg * Reg * Reg)


Translating statements

fun stmt dict x =
  case x of
    Assign (NONE,v,NONE,exp) =>
      let val result = expr dict exp
          val b = base (Var(NONE,v))
          val delta = offset (Var(NONE,v))
      in emit (StoreAO(result,b,delta));
         result
      end


z := x - (2 * y)

loadI @x => r1

loadAO rA,r1 => r2

loadI 2 => r3

loadI @y => r4

loadAO rA,r4 => r5

Mul r3,r5 => r6

Sub r2,r6 => r7

loadI @z => r8

storeAO r7 => rA,r8
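To show where each line of this listing comes from, here is a small recursive code generator in C that emits the same ILOC text for z := x - (2 * y). It is a sketch, not the course's SML translator. The expression type and the functions `expr`, `emit` and `assign` are hypothetical. Variables are assumed to live at offset @name from rA, which matches the listing. The SML `stmt` above gets its base and offset from `base` and `offset` instead.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

typedef enum { VAR, NUM, BIN } Kind;
typedef struct Expr {
    Kind kind;
    const char *name;            /* VAR: variable name        */
    int num;                     /* NUM: constant value       */
    char op;                     /* BIN: '*' (Mul), else Sub  */
    const struct Expr *l, *r;    /* BIN: operands             */
} Expr;

static int next_reg = 0;         /* last register handed out  */
static char out[1024];           /* emitted ILOC, one per line */

static void emit(const char *line) {
    strncat(out, line, sizeof out - strlen(out) - 1);
    strncat(out, "\n", sizeof out - strlen(out) - 1);
}

/* Emit code for e and return the register that holds its value. */
static int expr(const Expr *e) {
    char buf[64];
    if (e->kind == VAR) {                        /* offset, then load  */
        int ra = ++next_reg;
        snprintf(buf, sizeof buf, "loadI @%s => r%d", e->name, ra);
        emit(buf);
        int rv = ++next_reg;
        snprintf(buf, sizeof buf, "loadAO rA,r%d => r%d", ra, rv);
        emit(buf);
        return rv;
    }
    if (e->kind == NUM) {
        int rn = ++next_reg;
        snprintf(buf, sizeof buf, "loadI %d => r%d", e->num, rn);
        emit(buf);
        return rn;
    }
    int a = expr(e->l);                          /* left operand first */
    int b = expr(e->r);
    int rd = ++next_reg;
    snprintf(buf, sizeof buf, "%s r%d,r%d => r%d",
             e->op == '*' ? "Mul" : "Sub", a, b, rd);
    emit(buf);
    return rd;
}

/* var := e : evaluate e, compute var's offset, store relative to rA. */
static void assign(const char *var, const Expr *e) {
    char buf[64];
    int result = expr(e);
    int delta = ++next_reg;
    snprintf(buf, sizeof buf, "loadI @%s => r%d", var, delta);
    emit(buf);
    snprintf(buf, sizeof buf, "storeAO r%d => rA,r%d", result, delta);
    emit(buf);
}
```

Registers are numbered in the order the recursive walk reaches each subexpression. That is why the constant 2 lands in r3, between the loads of x and y.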


If then else

fun stmt dict x =
  case x of
    If (tst,thenS,elseS) =>
      let val [start,thenL,elseL,endL] = NextLabel 4
      in short dict tst start thenL elseL;
         emitAt thenL Nop;
         stmt dict thenS;
         emit (JumpI endL);
         emitAt elseL Nop;
         stmt dict elseS;
         emitAt endL Nop
      end;

Note how we take advantage of the short circuit evaluation mechanism.


Loops

• Loops have multiple parts
– Initialization
– Tests for termination
– Body
– Jump to continue the loop

• Your homework on Tuesday will be to extend S04code.sml to include translation of the while statement.
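One common layout for a while loop is a test at the top, the body, a jump back, and an exit label. The following C sketch emits that skeleton as ILOC text, using the `Lx: nop` label convention from the if-then-else translator. It is not the homework's SML solution. `gen_while` and its string parameters are hypothetical: the caller supplies ready-made code for the condition and the body, where a real translator would generate them recursively. Any initialization code would be emitted before the test label.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static int next_label = 0;

/* Emits:
       Ltest: nop
              <cond code>
              <cbr> -> Lbody,Lexit
       Lbody: nop
              <body code>
              jumpI -> Ltest
       Lexit: nop                                                 */
void gen_while(char *out, size_t cap,
               const char *cond, const char *cbr, const char *body) {
    int testL = next_label++, bodyL = next_label++, exitL = next_label++;
    int n = snprintf(out, cap,
                     "L%d: nop\n%s%s -> L%d,L%d\nL%d: nop\n%sjumpI -> L%d\nL%d: nop\n",
                     testL, cond, cbr, bodyL, exitL, bodyL, body, testL, exitL);
    assert(n >= 0 && (size_t)n < cap);   /* output fit in the buffer */
}
```

Another common layout puts a second copy of the test at the bottom of the loop. That saves the jumpI on each iteration, at the cost of duplicating the test code.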