Computer Organization: A Programmer's Perspective

Computer Organization:A Programmer's Perspective

Bits, Bytes, Nibbles, Words and Strings

Computer Organization: A Programmer's Perspective Based on class notes by Bryant and O'Hallaron 2

Topics

Why bits? Why 0/1? Basic terms: Bits, Bytes, Nibbles, Words Representing information as bits

Characters and strings Instructions (more on this when we look at assembly) Numbers

Bit-level manipulations Boolean algebra Expressing in C


Everything is bits

Each bit is 0 or 1 By encoding/interpreting sets of bits in various ways

Computers represent numbers, sets, strings, etc…

Computers manipulate representations (instructions)

Why bits? Electronic implementation is easy Easy to store with bistable elements

Reliably transmitted on noisy and inaccurate wires

0.0V

0.2V

0.9V

1.1V

0 1 0


Binary Representations Number representation (base 2)

Represent 1521310 as 111011011011012

Represent 1.5213 X 104 as 1.11011011011012 X 213

Represent 1.2010 as 1.0011001100110011[0011]…2

Instruction representation In AMD64 machine code

RETQ command: C3 (hex) = 110000112

MOV $0, %EAX: b8 (hex) = 101110002 [00000000 … ]


Terms

Bit: Single binary digit, 0 or 1 Byte: 8 bits.

Smallest unit of memory used in modern computers Nibble (English: small bite): 4 bits

2 nibbles = 1 byte Word: 8-64 bits (1 to 8 bytes)

Depends on machine!


Encoding Byte Values

Byte = 8 bits, nibble = 4 bits Binary 000000002 to 111111112

Decimal: 010 to 25510

Hexadecimal 0016 to FF16

Two nibblesBase 16 number representationUse characters ‘0’ to ‘9’ and ‘A’ to ‘F’Write FA1D37B16 in C as 0xFA1D37B

» Or 0xfa1d37b

Octal: 08 to 3778

Base 8, Not often usedWritten in C as '0256' (0 is zero)

0 0 00001 1 00012 2 00103 3 00114 4 01005 5 01016 6 01107 7 01118 8 10009 9 1001A 10 1010B 11 1011C 12 1100D 13 1101E 14 1110F 15 1111

HexDecim

al

Binary


Bit level operations


Boolean Algebra (George Boole, 19th century)

Developed by George Boole in 19th Century Algebraic representation of logic

Encode “True” as 1 and “False” as 0

And A&B = 1 when both A=1 and

B=1

Or A|B = 1 when either A=1 or

B=1

Not ~A = 1 when

A=0

Exclusive-Or (Xor) A^B = 1 when either A=1 or B=1, but not

both


General Boolean Algebras

Operate on Bit Vectors Operations applied bitwise

All of the Properties of Boolean Algebra Apply

01101001& 01010101 01000001

01101001| 01010101 01111101

01101001^ 01010101 00111100

~ 01010101 10101010 01000001 01111101 00111100 10101010


Example: Representing & Manipulating Sets

Representation Width w bit vector represents subsets of {0, …, w–1}

aj = 1 if j A∈

01101001 { 0, 3, 5, 6 }

76543210

01010101 { 0, 2, 4, 6 }

76543210

Operations & Intersection 01000001 { 0, 6 }

| Union 01111101 { 0, 2, 3, 4, 5, 6 }

^ Symmetric difference 00111100 { 2, 3, 4, 5 }

~ Complement 10101010 { 1, 3, 5, 7 }


Bit-Level Operations in C

Operations &, |, ~, ^ Available in C Apply to any “integral” data type

long, int, short, char, unsigned

View arguments as bit vectors

Arguments applied bit-wise

Examples (Char data type) ~0x41 ➙ 0xBE

~010000012 ➙ 101111102

~0x00 ➙ 0xFF ~000000002 ➙ 111111112

0x69 & 0x55 ➙ 0x41 011010012 & 010101012 ➙ 010000012

0x69 | 0x55 ➙ 0x7D 011010012 | 010101012 ➙ 011111012


Contrast: Logic Operations in C

Contrast to Logical Operators &&, ||, !

View 0 as “False”

Anything nonzero as “True”

Always return 0 or 1

Early termination

Examples (char data type) !0x41 ➙ 0x00 !0x00 ➙ 0x01 !!0x41 ➙ 0x01

0x69 && 0x55 ➙ 0x01 0x69 || 0x55 ➙ 0x01

p && *p (avoids null pointer access)


Shift Operations

Left Shift: x << y

Shift bit-vector x left y positions (throw extras on left)

Fill with 0’s on right

Right Shift: x >> y

Shift bit-vector x right y positions (throw extras on right)

Logical shift: Fill with 0’s on left

Arithmetic shift: Replicate most significant bit on left

Undefined Behavior

Arithmetic or logical right shift? Up to compiler!

Often arithm. for signed, logical otherwise

Shift amount < 0 or ≥ word size

01100010Argument x

00010000<< 3

00011000Log. >> 2

00011000Arith. >> 2

10100010Argument x

00010000<< 3

00101000Log. >> 2

11101000Arith. >> 2

0001000000010000

0001100000011000

0001100000011000

00010000

00101000

11101000

00010000

00101000

11101000


Cool Stuff with Xor

void funnyswap(int *x, int *y){ *x = *x ^ *y; /* #1 */ *y = *x ^ *y; /* #2 */ *x = *x ^ *y; /* #3 */}

Bitwise Xor is form of addition

With extra property that every value is its own additive inverse A ^ A = 0

BABegin

BA^B1

(A^B)^B = AA^B2

A(A^B)^A = B3

ABEnd

*y*x


Some basic representations

C Data Type Typical 32-bit Typical 64-bit x86-64

char 1 1 1

short 2 2 2

int 4 4 4

long 4 8 8

float 4 4 4

double 8 8 8

long double − − 10/16

pointer 4 8 8


Encoding IntegersEncoding Integers

short int x = 15213; short int y = -15213;

C short 2 bytes long

Sign Bit For 2’s complement, most significant bit indicates sign

0 for nonnegative 1 for negative

B2T (X ) xw1 2w1 xi 2

i

i0

w2

B2U(X ) xi 2i

i0

w1

Unsigned Two’s Complement

SignBit

Decimal Hex Binaryx 15213 3B 6D 00111011 01101101y -15213 C4 93 11000100 10010011


Encoding Example (Cont.)Encoding Example (Cont.) x = 15213: 00111011 01101101 y = -15213: 11000100 10010011

Weight 15213 -152131 1 1 1 12 0 0 1 24 1 4 0 08 1 8 0 0

16 0 0 1 1632 1 32 0 064 1 64 0 0

128 0 0 1 128256 1 256 0 0512 1 512 0 0

1024 0 0 1 10242048 1 2048 0 04096 1 4096 0 08192 1 8192 0 0

16384 0 0 1 16384-32768 0 0 1 -32768

Sum 15213 -15213


Numeric RangesNumeric Ranges

Unsigned Values UMin = 0

000…0 UMax = 2w – 1

111…1

Two’s Complement Values TMin = –2w–1

100…0 TMax = 2w–1 – 1

011…1

Other Values Minus 1

111…1

Decimal Hex BinaryUMax 65535 FF FF 11111111 11111111TMax 32767 7F FF 01111111 11111111TMin -32768 80 00 10000000 00000000-1 -1 FF FF 11111111 111111110 0 00 00 00000000 00000000

Values for W = 16


Values for Different Word Sizes

Observations |TMin | =

TMax + 1 Asymmetric range

UMax = 2 * TMax + 1

C Programming #include <limits.h>

K&R App. B11

Declares constants, e.g., ULONG_MAX

LONG_MAX

LONG_MIN

Values platform-specific

W8 16 32 64

UMax 255 65,535 4,294,967,295 18,446,744,073,709,551,615TMax 127 32,767 2,147,483,647 9,223,372,036,854,775,807TMin -128 -32,768 -2,147,483,648 -9,223,372,036,854,775,808


Unsigned & Signed Numeric ValuesEquivalence

Same encodings for nonnegative values

Uniqueness Every bit pattern represents

unique integer value Each representable integer has

unique bit encoding

Can Invert Mappings U2B(x) = B2U-1(x) T2B(x) = B2T-1(x)

Bit pattern for two’s comp integer

X B2T(X)B2U(X)0000 00001 10010 20011 30100 40101 50110 60111 7

–88

–79

–610

–511

–412

–313

–214

–115

1000

1001

1010

1011

1100

1101

1110

1111

0

1

2

3

4

5

6

7


short int x = 15213; unsigned short int ux = (unsigned short) x; short int y = -15213; unsigned short int uy = (unsigned short) y;

Casting Signed to UnsignedCasting Signed to Unsigned

C Allows Conversions from Signed to Unsigned

Resulting Value No change in bit representation Nonnegative values unchanged

ux = 15213 Negative values change into (large) positive values

uy = 50323


Relation Between Signed & UnsignedRelation Between Signed & Unsigned

uy = y + 2 * 32768= y + 65536

Weight -15213 503231 1 1 1 12 1 2 1 24 0 0 0 08 0 0 0 0

16 1 16 1 1632 0 0 0 064 0 0 0 0

128 1 128 1 128256 0 0 0 0512 0 0 0 0

1024 1 1024 1 10242048 0 0 0 04096 0 0 0 08192 0 0 0 0

16384 1 16384 1 1638432768 1 -32768 1 32768

Sum -15213 50323


0

TMax

TMin

–1–2

0

UMaxUMax – 1

TMaxTMax + 1

2’s Complement Range

UnsignedRange

Conversion Visualized 2’s Comp. Unsigned

Ordering Inversion

Negative Big Positive


Signed vs. Unsigned in CSigned vs. Unsigned in CConstants

By default are considered to be signed integers Unsigned if have “U” as suffix

0U, 4294967259U

Casting Explicit casting between signed & unsigned (U2T and T2U)

int tx, ty;unsigned ux, uy;tx = (int) ux;uy = (unsigned) ty;

Implicit casting also occurs via assignments, procedure callstx = ux;uy = ty;


0 0U == unsigned

-1 0 < signed

-1 0U > unsigned

2147483647 -2147483648 > signed

2147483647U -2147483648 < unsigned

-1 -2 > signed

(unsigned) -1 -2 > unsigned

2147483647 2147483648U < unsigned

2147483647 (int) 2147483648U > signed

Casting SurprisesCasting SurprisesExpression Evaluation

If mix unsigned and signed in single expression, signed values implicitly cast to unsigned

Including comparison operations <, >, ==, <=, >=Examples for W = 32

Constant1 Constant2 Relation Evaluation0 0U-1 0-1 0U2147483647 -2147483648 2147483647U -2147483648 -1 -2 (unsigned)-1 -2 2147483647 2147483648U 2147483647 (int)2147483648U


Remember!

Bit pattern does not change Bit pattern gets re-interpreted==> Unexpected effects

When mixed signed, unsigned => cast to unsigned


Sign ExtensionSign Extension

Task: Given w-bit signed integer x Convert it to w+k-bit integer with same value

Rule: Make k copies of sign bit: X = xw–1 ,…, xw–1 , xw–1 , xw–2 ,…, x0

k copies of MSB• • •X

X • • • • • •

• • •

w

wk


Justification For Sign ExtensionJustification For Sign ExtensionProve Correctness by Induction on k

Induction Step: extending by single bit maintains value

Key observation: –2w–1 = –2w +2w–1

Look at weight of upper bits: X –2w–1 xw–1 X –2w xw–1 + 2w–1 xw–1 = –2w–1 xw–1

- • • •X

X - + • • •

w+1

w


Sign Extension ExampleSign Extension Example

Converting from smaller to larger integer data type C automatically performs sign extension

short int x = 15213; int ix = (int) x; short int y = -15213; int iy = (int) y;

Decimal Hex Binary

x 15213 3B 6D 00111011 01101101ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101y -15213 C4 93 11000100 10010011iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011


Visualizing (Mathematical) Integer Addition

Integer Addition 4-bit integers u, v

Compute true sum Add4(u , v)

Values increase linearly with u and v

Forms planar surface

Add4(u , v)

u

v0 2

46 8

1012

14

0

2

4

6

8

10

1214

0

4

8

12

16

20

24

28

32

Integer Addition


Unsigned Addition

Standard Addition Function Ignores carry output

Implements Modular Arithmetics = UAddw(u , v) = u + v mod 2w

• • •

• • •

u

v+

• • •u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits UAddw(u , v)


Visualizing Unsigned Addition

0

2w

2w+1

UAdd4(u , v)

u

v

True Sum

Modular Sum

Overflow

Overflow

02

46

810

1214

0

2

4

6

8

10

1214

0

2

4

6

8

10

12

14

16

Wraps Around If true sum ≥ 2w

At most once


Two’s Complement Addition

TAdd and UAdd have Identical Bit-Level Behavior Signed vs. unsigned addition in C:

int s, t, u, v;s = (int) ((unsigned) u + (unsigned) v);

t = u + v

Will give s == t

• • •

• • •

u

v+

• • •u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits TAddw(u , v)


TAdd Overflow

–2w –1

–2w

0

2w –1–1

2w–1

True Sum

TAdd Result

1 000…0

1 011…1

0 000…0

0 100…0

0 111…1

100…0

000…0

011…1

PosOver

NegOver

Functionality True sum requires w+1

bits

Drop off MSB

Treat remaining bits as 2’s comp. integer


Visualizing 2’s Complement Addition

Values 4-bit two’s comp.

Range from -8 to +7

Wraps Around If sum 2w–1

Becomes negative

At most once

If sum < –2w–1

Becomes positive

At most once

TAdd4(u , v)

u

v

PosOver

NegOver

-8 -6 -4-2 0

24

6

-8

-6

-4

-2

0

2

46

-8

-6

-4

-2

0

2

4

6

8


Multiplication

Goal: Computing Product of w-bit numbers x, y Either signed or unsigned

But, exact results can be bigger than w bits Unsigned: up to 2w bits

Result range: 0 ≤ x * y ≤ (2w – 1) 2 = 22w – 2w+1 + 1

Two’s complement min (negative): Up to 2w-1 bits

Result range: x * y ≥ (–2w–1)*(2w–1–1) = –22w–2 + 2w–1

Two’s complement max (positive): Up to 2w bits, but only for (TMinw)2

Result range: x * y ≤ (–2w–1) 2 = 22w–2

So, maintaining exact results… would need to keep expanding word size with each product computed

is done in software, if needed

e.g., by “arbitrary precision” arithmetic packages


Unsigned Multiplication in C

Standard Multiplication Function Ignores high order w bits

Implements Modular ArithmeticUMultw(u , v) = u · v mod 2w

• • •

• • •

u

v*

• • •u · v

• • •

True Product: 2*w bits

Operands: w bits

Discard w bits: w bitsUMultw(u , v)

• • •


Signed Multiplication in C


Some of which are different for signed vs. unsigned multiplication

Lower bits are the same

• • •

• • •

u

v*

• • •u · v

• • •


Operands: w bits

Discard w bits: w bitsTMultw(u , v)

• • •


Power-of-2 Multiply with Shift

Operation u << k gives u * 2k

Both signed and unsigned

Examples u << 3 == u * 8 (u << 5) – (u << 3) == u * 24 Most machines shift and add faster than multiply

Compiler generates this code automatically

• • •

0 0 1 0 0 0•••

u

2k*u · 2kTrue Product: w+k bits

Operands: w bits

Discard k bits: w bits UMultw(u , 2k)

•••

k

• • • 0 0 0•••

TMultw(u , 2k)0 0 0••••••


Why Should I Use Unsigned?Why Should I Use Unsigned?Don’t Use Just Because Number Nonzero

C compilers on some machines generate less efficient codeunsigned i; int cnt;for (i = 1; i < cnt; i++) a[i] += a[i-1];

Cast cnt to unsigned

(repeatedly)




Easy to make mistakesfor (i = cnt-2; i >= 0; i--) a[i] += a[i+1];

What’s the bug?





Loop foreveri never < 0




Should befor (i = cnt-2; i <cnt; i--) a[i] += a[i+1];

Why does this work?





Do Use For: Multiprecision or modular arithmetic Need extra bits' range, right up to limit of word size Represent sets


C PuzzleC Puzzle Assume machine with 32 bit word size, two’s comp. integers TMin makes a good counterexample in many cases

x < 0 ((x*2) < 0)

ux >= 0

(x & 7) == 7 (x<<30) < 0

ux > -1

x > y -x < -y

x * x >= 0

x > 0 && y > 0 x + y > 0

x >= 0 -x <= 0

x <= 0 -x >= 0

int x = foo();

int y = bar();

unsigned ux = x;

unsigned uy = y;


C Puzzle AnswersC Puzzle Answers

x < 0 ((x*2) < 0) False: TMin

ux >= 0 True: 0 = UMin

(x & 7) == 7 (x<<30) < 0 True: x1 = 1

ux > -1 False: 0

x > y -x < -y False: -1, TMin

x * x >= 0 False: x=65535

x > 0 && y > 0 x + y > 0 False: TMax, TMax

x >= 0 -x <= 0 True: –TMax < 0

x <= 0 -x >= 0 False: TMin

Assume machine with 32 bit word size, two’s comp. integers TMin makes a good counterexample in many cases


Security Example: Impact of (un)signed

Similar to code found in FreeBSD’s implementation of getpeername There are legions of smart people trying to find vulnerabilities in

programs

/* Kernel memory region holding user-accessible data */#define KSIZE 1024char kbuf[KSIZE];

/* Copy at most maxlen bytes from kernel region to user buffer */int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len;}


Typical Usage



#define MSIZE 528

void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, MSIZE); printf(“%s\n”, mybuf);}


Malicious Usage



#define MSIZE 528

void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, -MSIZE); . . .}

/* Declaration of library function memcpy */void *memcpy(void *dest, void *src, size_t n);


Code Security Example #2

SUN XDR library Widely used library for transferring data between machines

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);

ele_src

malloc(ele_cnt * ele_size)


XDR Code

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) { /* * Allocate buffer for ele_cnt objects, each of ele_size bytes * and copy from locations designated by ele_src */ void *result = malloc(ele_cnt * ele_size); if (result == NULL)

/* malloc failed */return NULL;

void *next = result; int i; for (i = 0; i < ele_cnt; i++) { /* Copy object i to destination */ memcpy(next, ele_src[i], ele_size);

/* Move pointer to next memory region */next += ele_size;

} return result;}


XDR Vulnerability

What if: ele_cnt = 220 + 1

ele_size = 4096 = 212

Allocation on 32 bits? What happens to the ‘next’ pointer?

How can I make this function secure?




Machine-Level Code Representation

Encode Program as Sequence of Instructions Arithmetic, logical, math operations Read or write memory Conditional branches, jumps

Different machines, different instructions Most code not binary compatible Alpha’s, Sun’s, ARM (tablets) use fix-length instructions

Reduced Instruction Set Computer (RISC) PC’s use variable length instructions

Complex Instruction Set Computer (CISC)


Representing Instructionsint sum(int x, int y){ return x+y;}

Different CPUs use totally different instructions and encodings

00

00

30

42

Alpha sum

01

80

FA

6B

E0

08

81

C3

Sun sum

90

02

00

09

For this example, Alpha & Sun use two 4-byte instructions Use differing numbers of instructions

in other cases

PC uses 7 instructions with lengths 1, 2, and 3 bytes Same for NT and for Linux

E5

8B

55

89

PC sum

45

0C

03

45

08

89

EC

5D

C3


Machines have Words

Imprecise definitions Nominal size of integer-valued data Sometimes size of address, memory bus width

Current desktop machines are 64 bits (8 bytes) Potentially address 1.8 X 1019 bytes 32-bit machines phasing out (but in phones, tablets)

Low-end use 8- or 16-bit words Machines support multiple data formats

Fractions or multiples of word size Always integral number of bytes


Byte-Oriented Memory Organization Programs Refer to Virtual Addresses

Conceptually very large array of bytes Implemented with hierarchy of different memory types

SRAM, DRAM, disk In Unix and Windows NT, address space private to “process”

Program can clobber its own data, but not that of others You will see this again, in much more detail

Compiler + Run-Time System Control Allocation Where different program objects should be stored Multiple mechanisms: static, stack, and heap In any case, all allocation within single virtual address space You will see this again, in much more detail


Word-Oriented Memory Organization

Addresses Specify Byte Locations Address of first byte in word Addresses of successive

words differ by 4 (32-bit) or 8 (64-bit)

0000

0001

0002

0003

0004

0005

0006

0007

0008

0009

0010

0011

32-bitWords

Bytes Addr.

0012

0013

0014

0015

64-bitWords

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

0000

0004

0008

0012

0000

0008


Byte Ordering

How should bytes within multi-byte word be ordered in memory?

Conventions Sun’s, Mac’s are “Big Endian” machines

Least significant byte has highest address Alphas, PC’s are “Little Endian” machines

Least significant byte has lowest address


Byte Ordering Example

Big Endian Least significant byte has highest address

Little Endian Least significant byte has lowest address

Example Variable x has 4-byte representation 0x01234567 Address given by &x is 0x100

0x100 0x101 0x102 0x103

01 23 45 67

0x100 0x101 0x102 0x103

67 45 23 01

Big Endian

Little Endian

01 23 45 67

67 45 23 01


Reading Byte-Reversed Listings

Disassembly Text representation of binary machine code Generated by program that reads the machine code

Example Fragment

Address Instruction Code Assembly Rendition 8048365: 5b pop %ebx 8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx 804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

Deciphering Numbers Value: 0x12ab

Pad to 4 bytes: 0x000012ab

Split into bytes: 00 00 12 ab

Reverse: ab 12 00 00


Examining Data Representations

Code to Print Byte Representation of Data Casting pointer to unsigned char * creates byte array

typedef unsigned char *pointer;

void show_bytes(pointer start, int len){ int i; for (i = 0; i < len; i++) printf("0x%p\t0x%.2x\n", start+i, start[i]); printf("\n");}

Printf directives:%p: Print pointer%x: Print Hexadecimal


show_bytes Execution Example

int a = 15213;

printf("int a = 15213;\n");

show_bytes((pointer) &a, sizeof(int));

Result (Linux):

int a = 15213;

0x11ffffcb8 0x6d

0x11ffffcb9 0x3b

0x11ffffcba 0x00

0x11ffffcbb 0x00


Representing Integersint A = 15213;int B = -15213;long int C = 15213;

Decimal: 15213

Binary: 0011 1011 0110 1101

Hex: 3 B 6 D

6D

3B

00

00

Alpha A

3B

6D

00

00

Sun A

93

C4

FF

FF

Alpha B

C4

93

FF

FF

Sun B

Two’s complement representation

00

00

00

00

6D

3B

00

00

Alpha C

3B

6D

00

00

Sun C

6D

3B

00

00

ia32 C


Representing Pointersint B = -15213;int *P = &B;

Alpha Address

Hex: ... 0 1 F F F F F C A 0

Binary: 0000 0001 1111 1111 1111 1111 1111 1100 1010 0000

01

00

00

00

A0

FC

FF

FF

Alpha P

Sun Address

Hex: E F F F F B 2 C Binary: 1110 1111 1111 1111 1111 1011 0010 1100

Different compilers & machines assign different locations to objects

FB

2C

EF

FF

Sun P

FF

BF

D4

F8

Linux PLinux Address

Hex: B F F F F 8 D 4 Binary: 1011 1111 1111 1111 1111 1000 1101 0100


Characters and Strings


Representing a character Single byte sufficient for small codes

In C: char c; But interpretation of value depends on chosen code Fixed-length code:

Baudot, ITA2, USTTY: 5 bits BCD (binary coded decimal, used in early computers): 6 bits ASCII – defines 128 characters (7 bits) ISO 8859, EBCDIC: 8 bits

Variable-length codes UTF-8: 8-32 bits UTF-16: 16-32 bits

Wide chars: chars more than 8 bits.


Example: ASCII


Representing Strings (in C) Represented by array of 1-byte characters

Each character encoded in ASCII formatStandard 7-bit encoding of character setCharacter “0” has code 0x30

» Digit i has code 0x30+i Other encodings exist (e.g., for Hebrew) String are null-terminated

Final character = 0x00Array length must be strlen()+1.

Alpha S Sun S

32

31

31

35

33

00

32

31

31

35

33

00

char S[6] = "15213";






Text files generally platform independent But: Different conventions of line termination character(s)! But: Different default encodings (depend on locale)

Alpha S Sun S

32

31

31

35

33

00

32

31

31

35

33

00

char S[6] = "15213";


Representing Strings (Cont'd) Strings in Pascal (a.k.a P-Strings)

Represented by array of characters+1 First cell holds length, no need for null-termination Length known in O(1)!

Think about strcat(), strlen(), strcpy(), ... Size of string limited by size of array cell value

p-String

35

32

5

31

31

33





c-strings and p-strings can be used with wide chars, but: big/little endian now matters (each cell multiple bytes) No longer built-in, instead platform/library specific

p-String

35

32

5

31

31

33


Main Points

Boolean Algebra is Mathematical Basis Basic form encodes “false” as 0, “true” as 1 General form like bit-level operations in C Good for representing & manipulating sets

It’s All About Bits & Bytes Numbers, text, programs: They are all collections of bits!

Different Machines/Languages: Different Conventions Representations Word size Byte ordering


שאלות?

Computer Organization:A Programmer's Perspective

Bits, Bytes, Nibbles, Words and Strings


Topics Why bits? Why 0/1? Basic terms: Bits, Bytes, Nibbles, Words Representing information as bits

Characters and strings Instructions (more on this when we look at assembly) Numbers

Bit-level manipulations Boolean algebra Expressing in C


Everything is bits

Each bit is 0 or 1 By encoding/interpreting sets of bits in various ways

Computers represent numbers, sets, strings, etc…

Computers manipulate representations (instructions)

Why bits? Electronic implementation is easy Easy to store with bistable elements

Reliably transmitted on noisy and inaccurate wires

0.0V

0.2V

0.9V

1.1V

0 1 0


Binary Representations Number representation (base 2)

Represent 1521310 as 111011011011012

Represent 1.5213 X 104 as 1.11011011011012 X 213

Represent 1.2010 as 1.0011001100110011[0011]…2

Instruction representation In AMD64 machine code

RETQ command: C3 (hex) = 110000112

MOV $0, %EAX: b8 (hex) = 101110002 [00000000 … ]


Terms

Bit: Single binary digit, 0 or 1 Byte: 8 bits.

Smallest unit of memory used in modern computers Nibble (English: small bite): 4 bits

2 nibbles = 1 byte Word: 8-64 bits (1 to 8 bytes)

Depends on machine!


Encoding Byte Values

Byte = 8 bits, nibble = 4 bits Binary 000000002 to 111111112

Decimal: 010 to 25510

Hexadecimal 0016 to FF16

Two nibblesBase 16 number representationUse characters ‘0’ to ‘9’ and ‘A’ to ‘F’Write FA1D37B16 in C as 0xFA1D37B

» Or 0xfa1d37b

Octal: 08 to 3778

Base 8, Not often usedWritten in C as '0256' (0 is zero)

0 0 00001 1 00012 2 00103 3 00114 4 01005 5 01016 6 01107 7 01118 8 10009 9 1001A 10 1010B 11 1011C 12 1100D 13 1101E 14 1110F 15 1111

HexDecim

al

Binary


Bit level operations


Boolean Algebra (George Boole, 19th century)

Developed by George Boole in 19th Century Algebraic representation of logic

Encode “True” as 1 and “False” as 0

And A&B = 1 when both A=1 and

B=1

Or A|B = 1 when either A=1 or

B=1

Not ~A = 1 when

A=0

Exclusive-Or (Xor) A^B = 1 when either A=1 or B=1, but not

both


General Boolean Algebras

Operate on Bit Vectors Operations applied bitwise

All of the Properties of Boolean Algebra Apply

01101001& 01010101 01000001

01101001| 01010101 01111101

01101001^ 01010101 00111100

~ 01010101 10101010 01000001 01111101 00111100 10101010


Example: Representing & Manipulating Sets

Representation Width w bit vector represents subsets of {0, …, w–1}

aj = 1 if j A∈

01101001 { 0, 3, 5, 6 }

76543210

01010101 { 0, 2, 4, 6 }

76543210

Operations & Intersection 01000001 { 0, 6 }

| Union 01111101 { 0, 2, 3, 4, 5, 6 }

^ Symmetric difference 00111100 { 2, 3, 4, 5 }

~ Complement 10101010 { 1, 3, 5, 7 }


Bit-Level Operations in C

Operations &, |, ~, ^ Available in C Apply to any “integral” data type

long, int, short, char, unsigned

View arguments as bit vectors

Arguments applied bit-wise

Examples (Char data type) ~0x41 ➙ 0xBE

~010000012 ➙ 101111102

~0x00 ➙ 0xFF ~000000002 ➙ 111111112

0x69 & 0x55 ➙ 0x41 011010012 & 010101012 ➙ 010000012

0x69 | 0x55 ➙ 0x7D 011010012 | 010101012 ➙ 011111012


Contrast: Logic Operations in C

Contrast to Logical Operators &&, ||, !

View 0 as “False”

Anything nonzero as “True”

Always return 0 or 1

Early termination

Examples (char data type) !0x41 ➙ 0x00 !0x00 ➙ 0x01 !!0x41 ➙ 0x01

0x69 && 0x55 ➙ 0x01 0x69 || 0x55 ➙ 0x01

p && *p (avoids null pointer access)


Shift Operations

Left Shift: x << y Shift bit-vector x left y positions (throw extras on left)

Fill with 0’s on right

Right Shift: x >> y Shift bit-vector x right y positions (throw extras on right)

Logical shift: Fill with 0’s on left

Arithmetic shift: Replicate most significant bit on left

Undefined Behavior

Arithmetic or logical right shift? Up to compiler!

Often arithm. for signed, logical otherwise

Shift amount < 0 or ≥ word size

01100010Argument x

00010000<< 3

00011000Log. >> 2

00011000Arith. >> 2

10100010Argument x

00010000<< 3

00101000Log. >> 2

11101000Arith. >> 2

0001000000010000

0001100000011000

0001100000011000

00010000

00101000

11101000

00010000

00101000

11101000


Cool Stuff with Xor

void funnyswap(int *x, int *y){ *x = *x ^ *y; /* #1 */ *y = *x ^ *y; /* #2 */ *x = *x ^ *y; /* #3 */}

Bitwise Xor is form of addition

With extra property that every value is its own additive inverse A ^ A = 0

BABegin

BA^B1

(A^B)^B = AA^B2

A(A^B)^A = B3

ABEnd

*y*x


Some basic representations

C Data Type Typical 32-bit Typical 64-bit x86-64

char 1 1 1

short 2 2 2

int 4 4 4

long 4 8 8

float 4 4 4

double 8 8 8

long double − − 10/16

pointer 4 8 8


Encoding IntegersEncoding Integers

short int x = 15213; short int y = -15213;

C short 2 bytes long

Sign Bit For 2’s complement, most significant bit indicates sign

0 for nonnegative 1 for negative

B2T (X ) xw1 2w1 xi 2

i

i0

w2

B2U(X ) xi 2i

i0

w1

Unsigned Two’s Complement

SignBit

Decimal Hex Binaryx 15213 3B 6D 00111011 01101101y -15213 C4 93 11000100 10010011


Encoding Example (Cont.)Encoding Example (Cont.) x = 15213: 00111011 01101101 y = -15213: 11000100 10010011

Weight 15213 -152131 1 1 1 12 0 0 1 24 1 4 0 08 1 8 0 0

16 0 0 1 1632 1 32 0 064 1 64 0 0

128 0 0 1 128256 1 256 0 0512 1 512 0 0

1024 0 0 1 10242048 1 2048 0 04096 1 4096 0 08192 1 8192 0 0

16384 0 0 1 16384-32768 0 0 1 -32768

Sum 15213 -15213


Numeric RangesNumeric Ranges

Unsigned Values UMin = 0

000…0 UMax = 2w – 1

111…1

Two’s Complement Values TMin = –2w–1

100…0 TMax = 2w–1 – 1

011…1

Other Values Minus 1

111…1

Decimal Hex BinaryUMax 65535 FF FF 11111111 11111111TMax 32767 7F FF 01111111 11111111TMin -32768 80 00 10000000 00000000-1 -1 FF FF 11111111 111111110 0 00 00 00000000 00000000

Values for W = 16


Values for Different Word Sizes

Observations |TMin | =

TMax + 1 Asymmetric range

UMax = 2 * TMax + 1

C Programming #include <limits.h>

K&R App. B11

Declares constants, e.g., ULONG_MAX LONG_MAX LONG_MIN

Values platform-specific

W8 16 32 64

UMax 255 65,535 4,294,967,295 18,446,744,073,709,551,615TMax 127 32,767 2,147,483,647 9,223,372,036,854,775,807TMin -128 -32,768 -2,147,483,648 -9,223,372,036,854,775,808


Unsigned & Signed Numeric ValuesEquivalence

Same encodings for nonnegative values

Uniqueness Every bit pattern represents

unique integer value Each representable integer has

unique bit encoding

Can Invert Mappings U2B(x) = B2U-1(x) T2B(x) = B2T-1(x)

Bit pattern for two’s comp integer

X B2T(X)B2U(X)0000 00001 10010 20011 30100 40101 50110 60111 7

–88

–79

–610

–511

–412

–313

–214

–115

1000

1001

1010

1011

1100

1101

1110

1111

0

1

2

3

4

5

6

7


short int x = 15213; unsigned short int ux = (unsigned short) x; short int y = -15213; unsigned short int uy = (unsigned short) y;

Casting Signed to UnsignedCasting Signed to Unsigned

C Allows Conversions from Signed to Unsigned

Resulting Value No change in bit representation Nonnegative values unchanged

ux = 15213 Negative values change into (large) positive values

uy = 50323


Relation Between Signed & UnsignedRelation Between Signed & Unsigned

uy = y + 2 * 32768= y + 65536

Weight -15213 503231 1 1 1 12 1 2 1 24 0 0 0 08 0 0 0 0

16 1 16 1 1632 0 0 0 064 0 0 0 0

128 1 128 1 128256 0 0 0 0512 0 0 0 0

1024 1 1024 1 10242048 0 0 0 04096 0 0 0 08192 0 0 0 0

16384 1 16384 1 1638432768 1 -32768 1 32768

Sum -15213 50323


0

TMax

TMin

–1–2

0

UMaxUMax – 1

TMaxTMax + 1

2’s Complement Range

UnsignedRange

Conversion Visualized 2’s Comp. Unsigned

Ordering Inversion

Negative Big Positive


Signed vs. Unsigned in CSigned vs. Unsigned in CConstants

By default are considered to be signed integers Unsigned if have “U” as suffix

0U, 4294967259U

Casting Explicit casting between signed & unsigned (U2T and T2U)

int tx, ty;unsigned ux, uy;tx = (int) ux;uy = (unsigned) ty;

Implicit casting also occurs via assignments, procedure callstx = ux;uy = ty;


0 0U == unsigned

-1 0 < signed

-1 0U > unsigned

2147483647 -2147483648 > signed

2147483647U -2147483648 < unsigned

-1 -2 > signed

(unsigned) -1 -2 > unsigned

2147483647 2147483648U < unsigned

2147483647 (int) 2147483648U > signed

Casting SurprisesCasting SurprisesExpression Evaluation

If mix unsigned and signed in single expression, signed values implicitly cast to unsigned

Including comparison operations <, >, ==, <=, >=Examples for W = 32

Constant1 Constant2 Relation Evaluation0 0U-1 0

-1 0U2147483647 -2147483648 2147483647U -2147483648

-1 -2 (unsigned)-1 -2

2147483647 2147483648U 2147483647 (int)2147483648U


Remember!

Bit pattern does not change Bit pattern gets re-interpreted==> Unexpected effects

When mixed signed, unsigned => cast to unsigned


Sign ExtensionSign Extension

Task: Given w-bit signed integer x Convert it to w+k-bit integer with same value

Rule: Make k copies of sign bit: X = xw–1 ,…, xw–1 , xw–1 , xw–2 ,…, x0

k copies of MSB• • •X

X • • • • • •

• • •

w

wk


Justification For Sign ExtensionJustification For Sign ExtensionProve Correctness by Induction on k

Induction Step: extending by single bit maintains value

Key observation: –2w–1 = –2w +2w–1

Look at weight of upper bits: X –2w–1 xw–1 X –2w xw–1 + 2w–1 xw–1 = –2w–1 xw–1

- • • •X

X - + • • •

w+1

w


Sign Extension ExampleSign Extension Example

Converting from smaller to larger integer data type C automatically performs sign extension

short int x = 15213; int ix = (int) x; short int y = -15213; int iy = (int) y;

Decimal Hex Binary

x 15213 3B 6D 00111011 01101101ix 15213 00 00 3B 6D 00000000 00000000 00111011 01101101y -15213 C4 93 11000100 10010011iy -15213 FF FF C4 93 11111111 11111111 11000100 10010011


Visualizing (Mathematical) Integer Addition

Integer Addition 4-bit integers u, v

Compute true sum Add4(u , v)

Values increase linearly with u and v

Forms planar surface

Add4(u , v)

u

v0

2 46

810

1214

0

2

4

6

8

10

1214

0

4

8

12

16

20

24

28

32

Integer Addition


Unsigned Addition

Standard Addition Function Ignores carry output

Implements Modular Arithmetics = UAddw(u , v) = u + v mod 2w

• • •

• • •

u

v+

• • •u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits UAddw(u , v)


Visualizing Unsigned Addition

0

2w

2w+1

UAdd4(u , v)

u

v

True Sum

Modular Sum

Overflow

Overflow

0 24

6 810

1214

0

2

4

6

8

10

1214

0

2

4

6

8

10

12

14

16

Wraps Around If true sum ≥ 2w

At most once


Two’s Complement Addition

TAdd and UAdd have Identical Bit-Level Behavior Signed vs. unsigned addition in C:

int s, t, u, v;s = (int) ((unsigned) u + (unsigned) v);

t = u + v

Will give s == t

• • •

• • •

u

v+

• • •u + v

• • •

True Sum: w+1 bits

Operands: w bits

Discard Carry: w bits TAddw(u , v)


TAdd Overflow

–2w –1

–2w

0

2w –1–1

2w–1

True Sum

TAdd Result

1 000…0

1 011…1

0 000…0

0 100…0

0 111…1

100…0

000…0

011…1

PosOver

NegOver

Functionality True sum requires w+1

bits

Drop off MSB

Treat remaining bits as 2’s comp. integer


Visualizing 2’s Complement Addition

Values 4-bit two’s comp.

Range from -8 to +7

Wraps Around If sum 2w–1

Becomes negative

At most once

If sum < –2w–1

Becomes positive

At most once

TAdd4(u , v)

u

v

PosOver

NegOver

-8 -6 -4-2 0

24

6

-8

-6

-4

-2

0

2

46

-8

-6

-4

-2

0

2

4

6

8


Multiplication

Goal: Computing Product of w-bit numbers x, y Either signed or unsigned

But, exact results can be bigger than w bits Unsigned: up to 2w bits

Result range: 0 ≤ x * y ≤ (2w – 1) 2 = 22w – 2w+1 + 1

Two’s complement min (negative): Up to 2w-1 bits

Result range: x * y ≥ (–2w–1)*(2w–1–1) = –22w–2 + 2w–1

Two’s complement max (positive): Up to 2w bits, but only for (TMinw)2

Result range: x * y ≤ (–2w–1) 2 = 22w–2

So, maintaining exact results… would need to keep expanding word size with each product computed

is done in software, if needed

e.g., by “arbitrary precision” arithmetic packages


Unsigned Multiplication in C


Implements Modular ArithmeticUMultw(u , v) = u · v mod 2w

• • •

• • •

u

v*

• • •u · v

• • •


Operands: w bits

Discard w bits: w bitsUMultw(u , v)

• • •


Signed Multiplication in C


Some of which are different for signed vs. unsigned multiplication

Lower bits are the same

• • •

• • •

u

v*

• • •u · v

• • •


Operands: w bits

Discard w bits: w bitsTMultw(u , v)

• • •


Power-of-2 Multiply with Shift

Operation u << k gives u * 2k

Both signed and unsigned

Examples u << 3 == u * 8 (u << 5) – (u << 3) == u * 24 Most machines shift and add faster than multiply

Compiler generates this code automatically

• • •

0 0 1 0 0 0•••

u

2k*u · 2kTrue Product: w+k bits

Operands: w bits

Discard k bits: w bits UMultw(u , 2k)

•••

k

• • • 0 0 0•••

TMultw(u , 2k)0 0 0••••••




Cast cnt to unsigned

(repeatedly)

First code bad because may cause repeated casting of cnt to unsigned with every passage of loop

Second code will loop forever. When unsigned i=0, do i-- which is a big number





What’s the bug?







Loop foreveri never < 0







Why does this work?







Do Use For: Multiprecision or modular arithmetic Need extra bits' range, right up to limit of word size Represent sets




C PuzzleC Puzzle Assume machine with 32 bit word size, two’s comp. integers TMin makes a good counterexample in many cases

x < 0 ((x*2) < 0)

ux >= 0

(x & 7) == 7 (x<<30) < 0

ux > -1

x > y -x < -y

x * x >= 0

x > 0 && y > 0 x + y > 0

x >= 0 -x <= 0

x <= 0 -x >= 0

int x = foo();

int y = bar();

unsigned ux = x;

unsigned uy = y;


C Puzzle AnswersC Puzzle Answers

x < 0 ((x*2) < 0) False: TMin

ux >= 0 True: 0 = UMin

(x & 7) == 7 (x<<30) < 0 True: x1 = 1

ux > -1 False: 0

x > y -x < -y False: -1, TMin

x * x >= 0 False: x=65535

x > 0 && y > 0 x + y > 0 False: TMax, TMax

x >= 0 -x <= 0 True: –TMax < 0

x <= 0 -x >= 0 False: TMin

Assume machine with 32 bit word size, two’s comp. integers TMin makes a good counterexample in many cases


Security Example: Impact of (un)signed

Similar to code found in FreeBSD’s implementation of getpeername There are legions of smart people trying to find vulnerabilities in

programs




Typical Usage



#define MSIZE 528

void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, MSIZE); printf(“%s\n”, mybuf);}


Malicious Usage



#define MSIZE 528

void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, -MSIZE); . . .}

/* Declaration of library function memcpy */void *memcpy(void *dest, void *src, size_t n);


Code Security Example #2

SUN XDR library Widely used library for transferring data between machines

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size);

ele_src



XDR Code

void* copy_elements(void *ele_src[], int ele_cnt, size_t ele_size) { /* * Allocate buffer for ele_cnt objects, each of ele_size bytes * and copy from locations designated by ele_src */ void *result = malloc(ele_cnt * ele_size); if (result == NULL)

/* malloc failed */return NULL;

void *next = result; int i; for (i = 0; i < ele_cnt; i++) { /* Copy object i to destination */ memcpy(next, ele_src[i], ele_size);

/* Move pointer to next memory region */next += ele_size;

} return result;}


XDR Vulnerability

What if: ele_cnt = 220 + 1

ele_size = 4096 = 212

Allocation on 32 bits? What happens to the ‘next’ pointer?

How can I make this function secure?




Machine-Level Code Representation

Encode Program as Sequence of Instructions Arithmetic, logical, math operations Read or write memory Conditional branches, jumps

Different machines, different instructions Most code not binary compatible Alpha’s, Sun’s, ARM (tablets) use fix-length instructions

Reduced Instruction Set Computer (RISC) PC’s use variable length instructions

Complex Instruction Set Computer (CISC)


Representing Instructionsint sum(int x, int y){ return x+y;}

Different CPUs use totally different instructions and encodings

00

00

30

42

Alpha sum

01

80

FA

6B

E0

08

81

C3

Sun sum

90

02

00

09

For this example, Alpha & Sun use two 4-byte instructions Use differing numbers of instructions

in other cases

PC uses 7 instructions with lengths 1, 2, and 3 bytes Same for NT and for Linux

E5

8B

55

89

PC sum

45

0C

03

45

08

89

EC

5D

C3


Machines have Words

Imprecise definitions Nominal size of integer-valued data Sometimes size of address, memory bus width

Current desktop machines are 64 bits (8 bytes) Potentially address 1.8 X 1019 bytes 32-bit machines phasing out (but in phones, tablets)

Low-end use 8- or 16-bit words Machines support multiple data formats

Fractions or multiples of word size Always integral number of bytes


Byte-Oriented Memory Organization Programs Refer to Virtual Addresses

Conceptually very large array of bytes Implemented with hierarchy of different memory types

SRAM, DRAM, disk In Unix and Windows NT, address space private to “process”

Program can clobber its own data, but not that of others You will see this again, in much more detail

Compiler + Run-Time System Control Allocation Where different program objects should be stored Multiple mechanisms: static, stack, and heap In any case, all allocation within single virtual address space You will see this again, in much more detail


Word-Oriented Memory Organization

Addresses Specify Byte Locations Address of first byte in word Addresses of successive

words differ by 4 (32-bit) or 8 (64-bit)

0000

0001

0002

0003

0004

0005

0006

0007

0008

0009

0010

0011

32-bitWords

Bytes Addr.

0012

0013

0014

0015

64-bitWords

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

Addr =??

0000

0004

0008

0012

0000

0008


Byte Ordering

How should bytes within multi-byte word be ordered in memory?

Conventions Sun’s, Mac’s are “Big Endian” machines

Least significant byte has highest address Alphas, PC’s are “Little Endian” machines

Least significant byte has lowest address


Byte Ordering Example

Big Endian Least significant byte has highest address

Little Endian Least significant byte has lowest address

Example Variable x has 4-byte representation 0x01234567 Address given by &x is 0x100

0x100 0x101 0x102 0x103

01 23 45 67

0x100 0x101 0x102 0x103

67 45 23 01

Big Endian

Little Endian

01 23 45 67

67 45 23 01


Reading Byte-Reversed Listings

Disassembly Text representation of binary machine code Generated by program that reads the machine code

Example Fragment

Address Instruction Code Assembly Rendition 8048365: 5b pop %ebx 8048366: 81 c3 ab 12 00 00 add $0x12ab,%ebx 804836c: 83 bb 28 00 00 00 00 cmpl $0x0,0x28(%ebx)

Deciphering Numbers Value: 0x12ab

Pad to 4 bytes: 0x000012ab

Split into bytes: 00 00 12 ab

Reverse: ab 12 00 00


Examining Data Representations

Code to Print Byte Representation of Data Casting pointer to unsigned char * creates byte array

typedef unsigned char *pointer;

void show_bytes(pointer start, int len){ int i; for (i = 0; i < len; i++) printf("0x%p\t0x%.2x\n", start+i, start[i]); printf("\n");}

Printf directives:%p: Print pointer%x: Print Hexadecimal


show_bytes Execution Example

int a = 15213;

printf("int a = 15213;\n");

show_bytes((pointer) &a, sizeof(int));

Result (Linux):

int a = 15213;

0x11ffffcb8 0x6d

0x11ffffcb9 0x3b

0x11ffffcba 0x00

0x11ffffcbb 0x00


Representing Integersint A = 15213;int B = -15213;long int C = 15213;

Decimal: 15213

Binary: 0011 1011 0110 1101

Hex: 3 B 6 D

6D

3B

00

00

Alpha A

3B

6D

00

00

Sun A

93

C4

FF

FF

Alpha B

C4

93

FF

FF

Sun B

Two’s complement representation

00

00

00

00

6D

3B

00

00

Alpha C

3B

6D

00

00

Sun C

6D

3B

00

00

ia32 C


Representing Pointersint B = -15213;int *P = &B;

Alpha Address

Hex: ... 0 1 F F F F F C A 0

Binary: 0000 0001 1111 1111 1111 1111 1111 1100 1010 0000

01

00

00

00

A0

FC

FF

FF

Alpha P

Sun Address

Hex: E F F F F B 2 C Binary: 1110 1111 1111 1111 1111 1011 0010 1100

Different compilers & machines assign different locations to objects

FB

2C

EF

FF

Sun P

FF

BF

D4

F8

Linux PLinux Address

Hex: B F F F F 8 D 4 Binary: 1011 1111 1111 1111 1111 1000 1101 0100


Characters and Strings


Representing a character Single byte sufficient for small codes

In C: char c; But interpretation of value depends on chosen code Fixed-length code:

Baudot, ITA2, USTTY: 5 bits BCD (binary coded decimal, used in early computers): 6 bits ASCII – defines 128 characters (7 bits) ISO 8859, EBCDIC: 8 bits

Variable-length codes UTF-8: 8-32 bits UTF-16: 16-32 bits

Wide chars: chars more than 8 bits.


Example: ASCII






Alpha S Sun S

32

31

31

35

33

00

32

31

31

35

33

00

char S[6] = "15213";






Text files generally platform independent But: Different conventions of line termination character(s)! But: Different default encodings (depend on locale)

Alpha S Sun S

32

31

31

35

33

00

32

31

31

35

33

00

char S[6] = "15213";





p-String

35

32

5

31

31

33





c-strings and p-strings can be used with wide chars, but: big/little endian now matters (each cell multiple bytes) No longer built-in, instead platform/library specific

p-String

35

32

5

31

31

33


Main Points

Boolean Algebra is Mathematical Basis Basic form encodes “false” as 0, “true” as 1 General form like bit-level operations in C Good for representing & manipulating sets

It’s All About Bits & Bytes Numbers, text, programs: They are all collections of bits!

Different Machines/Languages: Different Conventions Representations Word size Byte ordering


שאלות?

Documents

Computer Organization: A Programmer's Perspective