Bits and Bytes Spring, 2015 Topics Why bits? Representing information as bits Binary / Hexadecimal Byte representations »Numbers »Characters and strings

Bits and BytesSpring, 2015

Bits and BytesSpring, 2015

TopicsTopics Why bits? Representing information as bits

Binary / Hexadecimal Byte representations

» Numbers» Characters and strings» Instructions

Bit-level manipulations Boolean algebra Expressing in C

COMP 21000Comp Org & Assembly Lang

– 2 – COMP 21000, S’15

How then do we represent data?How then do we represent data?

Each different Each different ““typetype”” in a programming language has in a programming language has its own representation.its own representation.

Next few slides discuss the representation for the Next few slides discuss the representation for the common types:common types: Ints Pointers Floats Strings

– 3 – COMP 21000, S’15

Representing IntegersRepresenting Integers

int A = 15213;int A = 15213;int B = -15213;int B = -15213;long int C = 15213;long int C = 15213;

Decimal: 15213

Binary: 0011 1011 0110 1101

Hex: 3 B 6 D

6D3B0000

Linux/Alpha A

3B6D

0000

Sun A

93C4FFFF

Linux/Alpha B

C493

FFFF

Sun B

Two’s complement representation(Covered next lecture)

00000000

6D3B0000

Alpha C

3B6D

0000

Sun C

6D3B0000

Linux C

Linux/Win/OS X: little endianSolaris: big endian

– 4 – COMP 21000, S’15

Representing PointersRepresenting Pointersint B = -15213;int B = -15213;int *P = &B;int *P = &B;

Alpha Address

Hex: 1 F F F F F C A 0

Binary: 0001 1111 1111 1111 1111 1111 1100 1010 0000

01000000

A0FCFFFF

Alpha P

Sun Address

Hex: E F F F F B 2 C Binary: 1110 1111 1111 1111 1111 1011 0010 1100

Different compilers & machines assign different locations to objects

FB2C

EFFF

Sun P

FFBF

D4F8

Linux P

Linux Address

Hex: B F F F F 8 D 4 Binary: 1011 1111 1111 1111 1111 1000 1101 0100

Linux/Win/OS X: little endianSolaris: big endian

– 5 – COMP 21000, S’15

Representing FloatsRepresenting Floats

Float F = 15213.0;Float F = 15213.0;

IEEE Single Precision Floating Point Representation

Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000

15213: 1110 1101 1011 01

Not same as integer representation, but consistent across machines

00B46D46

Linux/Win/OS X F

B400

466D

Solaris F

Can see some relation to integer representation, but not obvious


Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000

15213: 1110 1101 1011 01


Hex: 4 6 6 D B 4 0 0 Binary: 0100 0110 0110 1101 1011 0100 0000 0000

15213 1110 1101 1011 01 (in binary)

– 6 – COMP 21000, S’15

char S[6] = "15213";char S[6] = "15213";

Representing StringsRepresenting Strings

Strings in CStrings in C Represented by array of characters Each character encoded in ASCII format

Standard 7-bit encoding of character setCharacter “0” has code 0x30

» Digit i has code 0x30+i String should be null-terminated

Final character = 0

CompatibilityCompatibility Byte ordering not an issue Text files generally platform independent

Except for different conventions of line termination character(s)!» Unix (‘\n’ = 0x0a = ^J) » Mac (‘\r’ = 0x0d = ^M) » DOS and HTTP (‘\r\n’ = 0x0d0a = ^M^J)

Linux/Win/OS X S Solaris S

3231

3135

3300

3231

3135

3300

– 7 – COMP 21000, S’15

ASCII ChartASCII Chart

ASCII ‘0’ is 0x30

Reference:http://jimprice.com/jim-asc.htm

– 8 – COMP 21000, S’15

CharactersCharacters

Other popular encoding standards: Other popular encoding standards: Unicode

16 bit codeAllows more characters: most languages can be encodedSee http://en.wikipedia.org/wiki/UnicodeThe most commonly used encodings are UTF-8, UTF-16 and the

now-obsolete UCS-2UTF-8

» uses one byte for any ASCII character,

» These bytes have the same code values in both UTF-8 and ASCII encoding,

» and up to four bytes for other characters

ISO Latin-88-bit encoding standard with dotted consonants International standard for Latin letters

– 9 – COMP 21000, S’15

Machine-Level Code RepresentationMachine-Level Code Representation

Encode Program as Sequence of InstructionsEncode Program as Sequence of Instructions Each instruction is a simple operation

Arithmetic operationRead or write memoryConditional branch

Instructions encoded as bytesAlpha’s, Sun’s, Mac’s use 4 byte instructions

» Reduced Instruction Set Computer (RISC)PC’s (and intel Mac’s) use variable length instructions

» Complex Instruction Set Computer (CISC)

Different instruction types and encodings for different machines

Most code is not binary compatible

Programs are Byte Sequences Too!Programs are Byte Sequences Too!

– 10 – COMP 21000, S’15

Representing InstructionsRepresenting Instructions

int sum(int x, int y)int sum(int x, int y){{ return x+y;return x+y;}}

Different machines use totally different instructions and encodings

00003042

Alpha sum

0180FA6B

E008

81C3

Solaris sum

90020009

For this example, Alpha & Sun use two 4-byte instructions

Use differing numbers of instructions in other cases

PC uses 7 instructions with lengths 1, 2, and 3 bytes

Same for NT, OSX (intel) and for Linux

NT / Linux/OSX not fully binary compatible

E58B

5589

PC sum

450C03450889EC5DC3

– 11 – COMP 21000, S’15

What do we know now?What do we know now?

A computer is a processor and RAM. A computer is a processor and RAM. Everything else is peripheral

Everything is 1’s and 0’s. Everything is 1’s and 0’s. A processor executes instructions

Instructions are encoded in 1’s and 0’seach instruction has binary representation

RAM stores 1’s and 0’sWe interpret these 1’s and 0’s based on context

Every type is interpreted from binary integers: binary float: special representation in binarychar: binary (ASCII encoding)

– 12 – COMP 21000, S’15

How do we get to bits?How do we get to bits?

We write programs in an abstract language like CWe write programs in an abstract language like C

How do we get to the binary representation?How do we get to the binary representation?

Compilers & Assemblers!Compilers & Assemblers!

We writeWe write

x = 5.x = 5.

The compiler changes it intoThe compiler changes it into

mov 5, 0x005Fmov 5, 0x005F

The assembler changes it into:The assembler changes it into:

80483c7: 8b 45 5F 80483c7: 8b 45 5F

– 13 – COMP 21000, S’15

Manipulating bitsManipulating bits

Since all information is in the form of bits, we must Since all information is in the form of bits, we must manipulate bits when programming at low levelsmanipulate bits when programming at low levels

We did some of this with the show_bytes program. To understand how to do more, we must understand

Boolean algebra We can manipulate bits from C; this is where C and

hardware meet.

– 14 – COMP 21000, S’15

Boolean AlgebraBoolean AlgebraDeveloped by George Boole in 19th CenturyDeveloped by George Boole in 19th Century

Algebraic representation of logicEncode “True” as 1 and “False” as 0

AndAnd A&B = 1 when both A=1 and

B=1

NotNot ~A = 1 when A=0

OrOr A|B = 1 when either A=1 or

B=1

Exclusive-Or (Xor)Exclusive-Or (Xor) A^B = 1 when either A=1 or

B=1, but not both

– 15 – COMP 21000, S’15

A

~A

~B

B

Connection when A&~B | ~A&B

Application of Boolean AlgebraApplication of Boolean Algebra

Applied to Digital Systems by Claude ShannonApplied to Digital Systems by Claude Shannon 1937 MIT Master’s Thesis Reason about networks of relay switches

Encode closed switch as 1, open switch as 0

A&~B

~A&B = A^B

– 16 – COMP 21000, S’15

General Boolean AlgebrasGeneral Boolean Algebras

Operate on Bit VectorsOperate on Bit Vectors

Operations applied bitwise

All of the Properties of Boolean Algebra ApplyAll of the Properties of Boolean Algebra Apply

01101001& 01010101 01000001

01101001| 01010101 01111101

01101001^ 01010101 00111100

~ 01010101 10101010 01000001 01111101 00111100 10101010

– 17 – COMP 21000, S’15

Representing & Manipulating SetsRepresenting & Manipulating Sets

RepresentationRepresentation Width w bit vector represents subsets of {0, …, w–1} aj = 1 if j A

01101001 { 0, 3, 5, 6 }

76543210

01010101 { 0, 2, 4, 6 }

76543210

OperationsOperations & Intersection 01000001 { 0, 6 } | Union 01111101 { 0, 2, 3, 4, 5, 6

} ^ Symmetric difference 00111100 { 2, 3, 4, 5 } ~ Complement 10101010 { 1, 3, 5, 7 }

– 18 – COMP 21000, S’15

Bit-Level Operations in CBit-Level Operations in C

Operations &, |, ~, ^ Available in COperations &, |, ~, ^ Available in C Apply to any “integral” data type

long, int, short, char, unsigned View arguments as bit vectors Arguments applied bit-wise

Examples (Char data type)Examples (Char data type) ~0x41 --> 0xBE

~010000012 --> 101111102

~0x00 --> 0xFF~000000002 --> 111111112

0x69 & 0x55 --> 0x41011010012 & 010101012 --> 010000012

0x69 | 0x55 --> 0x7D011010012 | 010101012 --> 011111012

– 19 – COMP 21000, S’15

Contrast: Logic Operations in CContrast: Logic Operations in C

Contrast to Logical OperatorsContrast to Logical Operators &&, ||, !

View 0 as “False”Anything nonzero as “True”Always return 0 or 1Early termination

Examples (char data type)Examples (char data type) !0x41 --> 0x00 !0x00 --> 0x01 !!0x41 --> 0x01

0x69 && 0x55 --> 0x01 0x69 || 0x55 --> 0x01 p && *p++ (avoids null pointer access)

Or p && (*p = 10) also protects

Works because of short-circuit evaluation in C

– 20 – COMP 21000, S’15

Shift OperationsShift Operations

Left Shift: Left Shift: x << yx << y Shift bit-vector x left y positions

Throw away extra bits on leftFill with 0’s on right

Right Shift: Right Shift: x >> yx >> y Shift bit-vector x right y

positionsThrow away extra bits on right

Logical shiftFill with 0’s on left

Arithmetic shiftReplicate most significant bit on

rightUseful with two’s complement

integer representation

01100010Argument x

00010000<< 3

00011000Log. >> 2

00011000Arith. >> 2

10100010Argument x

00010000<< 3

00101000Log. >> 2

11101000Arith. >> 2

0001000000010000

0001100000011000

0001100000011000

00010000

00101000

11101000

00010000

00101000

11101000

Which right shift does C do? Depends on the compiler!

Often: int do ASR While unsign does LSR

– 21 – COMP 21000, S’15

Effect of LSLEffect of LSL

Each shift doubles the number.Each shift doubles the number.

If the carry bit is 1 after a shift, then overflow (if the number If the carry bit is 1 after a shift, then overflow (if the number is unsigned).is unsigned).

Decimal: shift multiplies by 10.Decimal: shift multiplies by 10.35 << 350

Binary:

0 1 0 1 0 = 1 x 23 + 0 x 22 + 1 x 21 + 0 x 20

1 0 1 0 0 = 1 x 24 + 0 x 23 + 1 x 22 + 0 x 21+ 0 x 20

= 2 x (1 x 23 + 0 x 22 + 1 x 21 + 0 x 20)

– 22 – COMP 21000, S’15

Effect of LSREffect of LSR

Each shift halves the number.Each shift halves the number.

Round down (because you drop the least significant bit)Round down (because you drop the least significant bit)

Decimal: shift divides by 10.Decimal: shift divides by 10.350 >> 35

Binary:

1 0 1 0 0 = 1 x 24 + 0 x 23 + 1 x 22 + 0 x 21+ 0 x 20

0 1 0 1 0 = 1 x 23 + 0 x 22 + 1 x 21 + 0 x 20

= (1 x 24 + x 23 + 0 x 22 + 1 x 21 + 0 x 20) ÷ 2

– 23 – COMP 21000, S’15

TruncatingTruncating

Casting to a shorter form Casting to a shorter form truncatestruncates the number. the number.int x = 53191;int x = 53191;

short sx = (short) x;short sx = (short) x; /* -12345 *//* -12345 */

int y = sx;int y = sx; /* -12345*//* -12345*/

On a typical 32-bit machine:On a typical 32-bit machine: Casting int to short truncate 32-bit to 16-bits When cast back to 32 bits, do sign extension (see next

lecture) so value doesn’t change.

– 24 – COMP 21000, S’15

MasksMasks

Can use bit patterns as masksCan use bit patterns as masks

ExampleExample

Mask = 0xFFFFFF00Mask = 0xFFFFFF00 when used with & filters out when used with & filters out (i.e., zeros out) the last byte(i.e., zeros out) the last byte

Mask = 0x000000FFMask = 0x000000FF when used with | places 1when used with | places 1’’s s in the last byte, rest is unaffected.in the last byte, rest is unaffected.

– 25 – COMP 21000, S’15

MasksMasks

Can perform a mod by a power of 2 with a maskCan perform a mod by a power of 2 with a mask

Examples:Examples:

x % 2 = 0 (x is even) or 1 (x is odd)x % 2 = 0 (x is even) or 1 (x is odd)

1111 = 1 x 21111 = 1 x 233 + 1 x 2 + 1 x 222 + 1 x 2 + 1 x 211 + 1 x 2 + 1 x 200 : all places : all places except the except the lastlast are divisible by 2. So if the last bit is are divisible by 2. So if the last bit is 0, mod is 0. If last bit is 1, mod is 10, mod is 0. If last bit is 1, mod is 1

x % 4 x % 4

1111 = 1 x 21111 = 1 x 233 + 1 x 2 + 1 x 222 + 1 x 2 + 1 x 211 + 1 x 2 + 1 x 200 : all places : all places except the except the last twolast two are divisible by 4. So if the last are divisible by 4. So if the last two bits are 0, mod is 0. Otherwise the mod is two bits are 0, mod is 0. Otherwise the mod is whatever value the last two bits havewhatever value the last two bits have

– 26 – COMP 21000, S’15

MasksMasks

Can perform a mod by a power of 2 with a maskCan perform a mod by a power of 2 with a mask

Examples:Examples:

x % 8 = 0 x % 8 = 0

1111 = 1 x 21111 = 1 x 233 + 1 x 2 + 1 x 222 + 1 x 2 + 1 x 211 + 1 x 2 + 1 x 200 : all places : all places except the except the last threelast three are divisible by 8. So if the last are divisible by 8. So if the last three bits are 0, mod is 0. . Otherwise the mod is the three bits are 0, mod is 0. . Otherwise the mod is the value in the last three bitsvalue in the last three bits

soso

x % 2 == x & 00000001 (or 0x01)x % 2 == x & 00000001 (or 0x01)

x % 4 == x & 00000011 (or 0x03)x % 4 == x & 00000011 (or 0x03)

x % 8 == x & 00000111 (or 0x07)x % 8 == x & 00000111 (or 0x07)

– 27 – COMP 21000, S’15

Converting (revisited)Converting (revisited)

#include <stdlib.h>#define LOWDIGIT 07#define DIG 20

void octal_print(register int x){ register int i = 0; char s[DIG]; if ( x < 0 ){ /* treat negative x */ putchar ('-'); /* output a minus sign */ x = - x; } do{

/* mod performed with & */ s[i++] = ((x & LOWDIGIT) + '0'); } while ( ( x >>= 3) > 0 ); /* divide x by 8 via shift */ putchar('0'); /* octal number prefix */ do{ putchar(s[--i]); /* output characters on s */ } while ( i > 0 ); putchar ('\n');}

– 28 – COMP 21000, S’15

Converting to binaryConverting to binary

void printBits(unsigned int word, int nBits){void printBits(unsigned int word, int nBits){

for (int i=nBits-1;i>=0;i--){for (int i=nBits-1;i>=0;i--){

printf("%u", !!(word&(1<<i)));printf("%u", !!(word&(1<<i)));

}}

printf("\n");printf("\n");

}}

– 29 – COMP 21000, S’15

Cool Stuff with XorCool Stuff with Xor

void funny(int *x, int *y)void funny(int *x, int *y){{ *x = *x ^ *y; /* #1 */*x = *x ^ *y; /* #1 */ *y = *x ^ *y; /* #2 */*y = *x ^ *y; /* #2 */ *x = *x ^ *y; /* #3 */*x = *x ^ *y; /* #3 */}}

Bitwise Xor is form of addition

With extra property that every value is its own additive inverse

A ^ A = 0

BABegin

BA^B1

(A^B)^B = AA^B2

A(A^B)^A = B3

ABEnd

*y*x

– 30 – COMP 21000, S’15

Bit techniquesBit techniquesINTEGER CODING RULES:INTEGER CODING RULES:

Replace the "return" statement in each function with oneReplace the "return" statement in each function with one

or more lines of C code that implements the function. Your code or more lines of C code that implements the function. Your code

must conform to the following style:must conform to the following style:

int Funct(arg1, arg2, ...) {int Funct(arg1, arg2, ...) {

/* brief description of how your implementation works *//* brief description of how your implementation works */

int var1 = Expr1;int var1 = Expr1;

......

int varM = ExprM;int varM = ExprM;

varJ = ExprJ;varJ = ExprJ;

......

varN = ExprN;varN = ExprN;

return ExprR;return ExprR;

}}

– 31 – COMP 21000, S’15

Bit techniquesBit techniques Each "Expr" is an expression using ONLY the following:Each "Expr" is an expression using ONLY the following:

1. Integer constants 0 through 255 (0xFF), inclusive. You are1. Integer constants 0 through 255 (0xFF), inclusive. You are

not allowed to use big constants such as 0xffffffff.not allowed to use big constants such as 0xffffffff.

2. Function arguments and local variables (no global variables).2. Function arguments and local variables (no global variables).

3. Unary integer operations ! ~3. Unary integer operations ! ~

4. Binary integer operations & ^ | + << >>4. Binary integer operations & ^ | + << >>

Some of the problems restrict the set of allowed operators even further.Some of the problems restrict the set of allowed operators even further.

Each "Expr" may consist of multiple operators. You are not restricted toEach "Expr" may consist of multiple operators. You are not restricted to

one operator per line.one operator per line.

You are expressly You are expressly forbidden forbidden to:to:

1. Use any control constructs such as if, do, while, for, switch, etc.1. Use any control constructs such as if, do, while, for, switch, etc.

2. Define or use any macros.2. Define or use any macros.

3. Define any additional functions in this file.3. Define any additional functions in this file.

4. Call any functions.4. Call any functions.

5. Use any other operations, such as &&, ||, -, or ?:5. Use any other operations, such as &&, ||, -, or ?:

6. Use any form of casting.6. Use any form of casting.

7. Use any data type other than int. This implies that you7. Use any data type other than int. This implies that you

cannot use arrays, structs, or unions.cannot use arrays, structs, or unions.

– 32 – COMP 21000, S’15

Bit techniquesBit techniques/* pow2plus1 - returns 2^x + 1, where 0 <= x <= 31 *//* pow2plus1 - returns 2^x + 1, where 0 <= x <= 31 */

int pow2plus1(int x) {int pow2plus1(int x) {

/* exploit ability of shifts to compute powers of 2 *//* exploit ability of shifts to compute powers of 2 */

return return (1 << x) + 1;(1 << x) + 1;

}}

/* pow2plus4 - returns 2^x + 4, where 0 <= x <= 31 *//* pow2plus4 - returns 2^x + 4, where 0 <= x <= 31 */

int pow2plus4(int x) {int pow2plus4(int x) {

/* exploit ability of shifts to compute powers of 2 *//* exploit ability of shifts to compute powers of 2 */

int result = (1 << x);int result = (1 << x);

result += 4;result += 4;

return result;return result;

}}

– 33 – COMP 21000, S’15

Bit techniquesBit techniques/ * bitNor - ~(x|y) using only ~ and & / * bitNor - ~(x|y) using only ~ and & * Example: bitNor(0x6, 0x5) = 0xFFFFFFF8* Example: bitNor(0x6, 0x5) = 0xFFFFFFF8 * Legal ops: ~ &* Legal ops: ~ & * Max ops: 8* Max ops: 8 * Rating: 1* Rating: 1 */*/

int bitNor(int x, int y) {int bitNor(int x, int y) {

}}/* evenBits - return word with all even-numbered bits set to 1/* evenBits - return word with all even-numbered bits set to 1 * Legal ops: ! ~ & ^ | + << >>* Legal ops: ! ~ & ^ | + << >> * Max ops: 8* Max ops: 8 * Rating: 1* Rating: 1 */*/

int evenBits(void) {int evenBits(void) {

}}

return (~x & ~y)

int byte = 0x55;int word = byte | byte<<8;return word | word<<16;

use DeMorgan’s Law: ~(A & B) = (~A) | (~B)

And~(A | B) = (~A) & (~B)

use DeMorgan’s Law: ~(A & B) = (~A) | (~B)

And~(A | B) = (~A) & (~B)

– 34 – COMP 21000, S’15

Some Other Uses for BitvectorsSome Other Uses for Bitvectors

Representation of small setsRepresentation of small sets

Representation of polynomials:Representation of polynomials: Important for error correcting codes (see later slides) Arithmetic over finite fields, say GF(2^n) Example 0x15213 : x16 + x14 + x12 + x9 + x4 + x + 1

Representation of graphs (intersection of sys & algo)Representation of graphs (intersection of sys & algo) A ‘1’ represents the presence of an edge

Representation of bitmap images, icons, cursors, …Representation of bitmap images, icons, cursors, … Exclusive-or cursor patent (see later slides)

Representation of Boolean expressions and logic Representation of Boolean expressions and logic circuitscircuits

– 35 – COMP 21000, S’15

UDPUDP

The User Datagram Protocol (UDP) is one of the core The User Datagram Protocol (UDP) is one of the core members of the Internet protocol suite, the set of network members of the Internet protocol suite, the set of network protocols used for the Internet. With UDP, computer protocols used for the Internet. With UDP, computer applications can send messages, in this case referred to applications can send messages, in this case referred to as datagrams, to other hosts on an Internet Protocol (IP) as datagrams, to other hosts on an Internet Protocol (IP) network without prior communications to set up special network without prior communications to set up special transmission channels or data paths. The protocol was transmission channels or data paths. The protocol was designed by David P. Reed in 1980 and formally defined designed by David P. Reed in 1980 and formally defined in RFC 768.in RFC 768.

http://en.wikipedia.org/wiki/http://en.wikipedia.org/wiki/User_Datagram_ProtocolUser_Datagram_Protocol

– 36 – COMP 21000, S’15Transport Layer 3-36

UDP: User Datagram Protocol [RFC 768]UDP: User Datagram Protocol [RFC 768]

““no frills,no frills,”” ““bare bonesbare bones”” Internet transport protocolInternet transport protocol

““best effortbest effort”” service, UDP service, UDP segments may be:segments may be:

lost delivered out of order

to app

connectionless:connectionless: no handshaking

between UDP sender, receiver

each UDP segment handled independently of others

Why is there a UDP?Why is there a UDP?

no connection no connection establishment establishment (TCP: 3-(TCP: 3-way handshake)way handshake)

simple: simple: no connection no connection state state at sender, receiver at sender, receiver (TCP keeps buffers, (TCP keeps buffers, sequence no., ack, etc.)sequence no., ack, etc.)

small segment header small segment header (20 (20 bytes for TCP, 8 for UDP)bytes for TCP, 8 for UDP)

no congestion controlno congestion control: UDP : UDP can blast away as fast as can blast away as fast as desireddesired

With UDP applications basically talk directly to IP

– 37 – COMP 21000, S’15Transport Layer

UDP checksumUDP checksum

Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Result:UDP does error detection not error correction

1. may just discard a segment with errors 2. may pass it to the application with a warning.


UDP checksumUDP checksum

Sender:Sender:

treat segment contents treat segment contents as sequence of 16-bit as sequence of 16-bit integersintegers

checksum: addition (1checksum: addition (1’’s s complement sum) of complement sum) of segment contentssegment contents

sender puts checksum sender puts checksum value into UDP value into UDP checksum fieldchecksum field

Receiver:Receiver:

compute checksum of compute checksum of received segmentreceived segment

check if computed checksum check if computed checksum equals checksum field value:equals checksum field value:

NO - error detected YES - no error detected.

But maybe errors nonetheless? More later ….

Goal: detect “errors” (e.g., flipped bits) in transmitted segment


Internet Checksum ExampleInternet Checksum ExampleNoteNote

When adding numbers, a carryout from the most significant bit needs to be added to the result

sendersender: add two 16-bit integers: add two 16-bit integers1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 01 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1

wraparound

sum

Checksum (complement the sum)


Internet Checksum ExampleInternet Checksum ExampleNoteNote

When adding numbers, a carryout from the most significant bit needs to be added to the result

receiverreceiver: add two 16-bit integers plus the : add two 16-bit integers plus the

checksum: should get all 1checksum: should get all 1’’s (why?)s (why?)1 1 1 1 0 0 1 1 0 0 1 1 0 0 1 1 01 1 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1

1 1 0 1 1 1 0 1 1 1 0 1 1 1 0 1 1

1 0 1 1 1 0 1 1 1 0 1 1 1 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

wraparound

sumChecksum Result

For an explanation of 1’s complement addition see http://mathforum.org/library/drmath/view/54379.html

For an explanation of 1’s complement addition see http://mathforum.org/library/drmath/view/54379.html

Do

XO

R o

f su

m &

ch

ecks

um

; sh

ou

ld g

et a

ll 1’

sD

o X

OR

of

sum

& c

hec

ksu

m;

sho

uld

get

all

1’s

– 41 – COMP 21000, S’15

UDP checksum code (part 1)UDP checksum code (part 1)typedef unsigned short u16;typedef unsigned short u16;

typedef unsigned long u32;typedef unsigned long u32;

u16 udp_sum_calc(u16 len_udp, u16 src_addr[],u16 dest_addr[], BOOL padding, u16 buff[])u16 udp_sum_calc(u16 len_udp, u16 src_addr[],u16 dest_addr[], BOOL padding, u16 buff[])

{ u16 prot_udp=17;{ u16 prot_udp=17;

u16 padd=0;u16 padd=0;

u16 word16;u16 word16;

u32 sum;u32 sum;

// Find out if the length of data is even or odd number. If odd,// Find out if the length of data is even or odd number. If odd,

// add a padding byte = 0 at the end of packet// add a padding byte = 0 at the end of packet

if (padding&1==1){if (padding&1==1){

padd=1;padd=1;

buff[len_udp]=0;buff[len_udp]=0;

}}

//initialize sum to zero//initialize sum to zero

sum=0;sum=0;

See http://www.netfor2.com/udpsum.htm for the full code.See http://www.netfor2.com/udpsum.htm for the full code.

// make 16 bit words out of every two adjacent 8 bit words and // calculate the sum of all 16 vit words


– 42 – COMP 21000, S’15

UDP checksum code (part 2)UDP checksum code (part 2)for (i=0; i < len_udp+padd; i=i+2){for (i=0; i < len_udp+padd; i=i+2){

word16 =((buff[i]<<8)&0xFF00)+(buff[i+1]&0xFF);word16 =((buff[i]<<8)&0xFF00)+(buff[i+1]&0xFF);

sum = sum + (unsigned long)word16;sum = sum + (unsigned long)word16;

}}

// add the UDP pseudo header which contains the IP source and destinationn // add the UDP pseudo header which contains the IP source and destinationn addressesaddresses

for (i=0;i<4;i=i+2){for (i=0;i<4;i=i+2){

word16 =((src_addr[i]<<8)&0xFF00)+(src_addr[i+1]&0xFF);word16 =((src_addr[i]<<8)&0xFF00)+(src_addr[i+1]&0xFF);

sum=sum+word16;sum=sum+word16;

}}

for (i=0;i<4;i=i+2){for (i=0;i<4;i=i+2){

word16 =((dest_addr[i]<<8)&0xFF00)+(dest_addr[i+1]&0xFF);word16 =((dest_addr[i]<<8)&0xFF00)+(dest_addr[i+1]&0xFF);

sum=sum+word16; sum=sum+word16;

}}See http://www.netfor2.com/udpsum.htm for the full code.See http://www.netfor2.com/udpsum.htm for the full code.



// overflow bits are added on the next slide// overflow bits are added on the next slide

– 43 – COMP 21000, S’15

UDP checksum code (part 3)UDP checksum code (part 3)// the protocol number and the length of the UDP packet// the protocol number and the length of the UDP packet

sum = sum + prot_udp + len_udp;sum = sum + prot_udp + len_udp;

// keep only the last 16 bits of the 32 bit calculated sum and add the carries// keep only the last 16 bits of the 32 bit calculated sum and add the carries

while (sum>>16)while (sum>>16)

sum = (sum & 0xFFFF)+(sum >> 16);sum = (sum & 0xFFFF)+(sum >> 16);

// Take the one's complement of sum// Take the one's complement of sum

sum = ~sum;sum = ~sum;

return ((u16) sum);return ((u16) sum);

}}

See http://www.netfor2.com/udpsum.htm for the full code.See http://www.netfor2.com/udpsum.htm for the full code.

/* here’s where the carry’s are added in. Turns out you can save the overflow bits and add them back at the end */

/* here’s where the carry’s are added in. Turns out you can save the overflow bits and add them back at the end */

– 44 – COMP 21000, S’15

XOR patent disputeXOR patent disputefor (y=0; y < y_height; ++y) {for (y=0; y < y_height; ++y) {

int loc = (y+y_off)*stride + x_off;int loc = (y+y_off)*stride + x_off;

vidmem[loc] ^= 255; /* XOR */vidmem[loc] ^= 255; /* XOR */

}}

// another way of doing this:// another way of doing this:

for (y=0; y < y_height; ++y) {for (y=0; y < y_height; ++y) {

int loc = (y+y_off)*stride + x_off;int loc = (y+y_off)*stride + x_off;

for (x=0; x < 8; ++x)for (x=0; x < 8; ++x)

if (vidmem[loc] & (1 << x))if (vidmem[loc] & (1 << x))

vidmem[loc] &= ~(1 << x);vidmem[loc] &= ~(1 << x);

elseelse

vidmem[loc] |= (1 << x);vidmem[loc] |= (1 << x);

}} See http://nothings.org/computer/patents.html for a full discussionSee http://nothings.org/computer/patents.html for a full discussion

Closely related is a patent which apparently covers results, not process. The exclusive-or-cursor patent is a simple example of this.

There are a number of mathematically equivalent ways of expressing the xor algorithm, but it appears to be widely believed that all of them are covered by the patent. The patent apparently is understood to cover the idea of representing an onscreen cursor by complementing each "visible" pixel of the cursor with the contents of whatever is on screen.

This is somewhat surprising, since it is predated by hardware implementations that do the exact same thing for text-only modes--rendering a block or underline cursor so its appearance is the same as above.

Closely related is a patent which apparently covers results, not process. The exclusive-or-cursor patent is a simple example of this.

There are a number of mathematically equivalent ways of expressing the xor algorithm, but it appears to be widely believed that all of them are covered by the patent. The patent apparently is understood to cover the idea of representing an onscreen cursor by complementing each "visible" pixel of the cursor with the contents of whatever is on screen.

This is somewhat surprising, since it is predated by hardware implementations that do the exact same thing for text-only modes--rendering a block or underline cursor so its appearance is the same as above.

– 45 – COMP 21000, S’15

More Bitvector MagicMore Bitvector Magic

Count the number of 1’s in a wordCount the number of 1’s in a wordMIT Hackmem 169:

int bitcount(unsigned int n){ unsigned int tmp;

tmp = n - ((n >> 1) & 033333333333) - ((n >> 2) & 011111111111); return ((tmp + (tmp >> 3)) & 030707070707)%63;}

MIT Hackmem Count. The main idea.1. Consider a 3 bit unsigned number as being 4a+2b+c,

• example: consider the three bit number 101• then the unsigned number represents 1 x 22 + 0 x 21 + 1 x 20 = 1 x 4 + 0 x 2 + 1 • let “a” represent the leftmost bit (the 1), “b” the next bit (the 0) and c the rightmost bit (the rightmost 1).• then we get a x 4 + b x 2 + c or 4a+2b+c

2. If we shift the number right 1 bit, we get 010. If we follow the convention of using “abc”

for the bits, then after the shift (logical) we get 0ab or 0 x 22 + a x 21 + b x 20 = 2a+b. 3. Subtracting this shifted number (2a+b) from the original number (4a+2b+c) gives 2a+b+c. 4. If we right-shift the original 3-bit number by two bits, we go from “abc” to “00a” which is

just a x 20

5. If we subract this new shifted number (a) from the result of 3, we get 2a+b+c – a = a+b+c, 6. Since a, b, and c represent the bits (1, 0, and 1 in our example), a+b+c is just the number

of bits in the original number!

– 46 – COMP 21000, S’15





MIT Hackmem Count. (continued) How is this insight employed in the code? 1. The key insight is that we’re actually going to consider the parameter n to be composed of

11 groups of 3 (3 x 11 = 33, or one more bit than we actually have in a 32-bit number).2. So we’re going to do the algorithm from the previous page one each three bit group of bits.3. Consider the bits 101 111 010 To apply the previous algorithm on each group we want to:

a. (101 – 010 – 001) (111 – 011 – 001) (010 – 001 – 000)b. If we shift n we get: 010 111 101 Note that the shifted number in each position

has a “wrong” first bit (it’s the shifted bit from the previous group) but the correct last two digits.c. We know that the first bit must be 0 since we’re doing logical shift rightd. So we can take the result of shifting and mask out the first bit in each group.e. Since each group can be represented by an octal number, the mask for each

group will be 3 in octal (011). This explains the line ((n >> 1) & 033333333333). Note that a number beginning in “0” in C is taken as an octal number.

f. A similar explanation can be made for the next part of the expression, i.e., to shift each group twice, shift n right twice and mask with 001 (or octal 1) for each group.

Now each octal digit in the number n represents the number of 1’s the original number in that position.In our example, the 9 digits become

– 47 – COMP 21000, S’15





MIT Hackmem Count. (continued) The last return statement sums these octal digits to produce the final answer.

The key idea is to add adjacent pairs of octal digits together and then compute the remainder modulus 63.

1. Right-shifting tmp by three bits, 2. adding it to tmp itself and 3. ANDing with a suitable mask. This step saves every other group of 3 bits. We want this

because we’ve added each 3-bit group with its right neighbor group; if we save every group we will have counted each group twice.

4. The result is a number in which groups of six adjacent bits (starting from the LSB) contain the number of 1's among those six positions in n.

5. This number modulo 63 yields the final answer.

For 64-bit numbers, we would have to add triples of octal digits and use modulus 1023. This is HACKMEM 169, as used in X11 sources.

Source: MIT AI Lab memo, late 1970's.

– 48 – COMP 21000, S’15

Summary of the Main PointsSummary of the Main Points

It’s All About Bits & BytesIt’s All About Bits & Bytes Numbers Programs Text

Different Machines Follow Different Conventions forDifferent Machines Follow Different Conventions for Word size Byte ordering Representations

Boolean Algebra is the Mathematical BasisBoolean Algebra is the Mathematical Basis Basic form encodes “false” as 0, “true” as 1 General form like bit-level operations in C

Good for representing & manipulating sets

Documents

Bits and Bytes Spring, 2015 Topics Why bits? Representing information as bits Binary / Hexadecimal Byte representations »Numbers »Characters and strings