Computer Architecture CS 154
Where software and hardware finally meet
Dr. Franklin
What is Computer Architecture?
• Program software
• Write compilers
• Design assembly language
• Design processor
• Optimize layout, circuits, etc.
• Design transistor technology

Architecture – this class! – sits in the middle of this stack: the assembly language and the processor that implements it.
Coming together – the basics
• What do high-level instructions get compiled down to?
• How do you build a basic machine?
Hardware optimization
• What do high-level instructions get compiled down to?
• How do you build a basic machine?
• How do architects specialize the hardware to run programs quickly?
Software optimization
• What do high-level instructions get compiled down to?
• How do you build a basic machine?
• How do architects specialize the hardware to run programs quickly?
• How do programmers optimize programs to run quickly?
CS 154 Topics
• How do you build a basic machine?
• How do architects specialize the hardware to run programs quickly?
• How do programmers optimize programs to run quickly?
Architecture
• Must understand software
  – Programs have certain characteristics
  – Optimize the design to take advantage of those characteristics
• Must understand hardware
  – Hardware design complexity
  – Ease of programming
  – Performance
  – Power
Looking smart for your friends and family
Where is computing going?
Technology Trends: Memory Capacity (Single-Chip DRAM)
• Now 1.4X/yr, or 2X every 2 years
• 8000X since 1980!
Technology Trends: Microprocessor Complexity
Moore’s Law
• 2X transistors/chip every 1.5 years
• Alpha 21264: 15 million
• Pentium Pro: 5.5 million
• PowerPC 620: 6.9 million
• Alpha 21164: 9.3 million
• Sparc Ultra: 5.2 million
• Athlon (K7): 22 million
• Itanium 2: 41 million
Technology Trends: Processor Performance
• 1.5X/yr
• Example data point: Intel P4, 2000 MHz (Fall 2001)
[Figure: performance measure vs. year]
• This curve has now flattened out – that is why we are seeing multicore
Technology Trends Summary
• 2X every 2.0 years in memory size
• 2X every 1.0 year in disk capacity
• 2X every 1.5 years in processor complexity (Moore’s Law)
• More processors per chip each generation
The Architecture Walls
• Memory Wall – Processor speed kept increasing, but memory speed did not keep up, so the processor is often idle waiting for memory.
• ILP Wall – There are not enough independent instructions for the processor to get real work done while one instruction waits for another (or for memory).
• Power Wall – Solving the above two walls requires too much power, and we don’t have cooling technology to dissipate that much heat.
Beginning of the multi-core era
• Multi-core chips
  – Place multiple processors on a single die
• Because
  – They can communicate very quickly
  – Much higher potential throughput
  – Less power per area than accelerating a single thread
• But
  – You need parallel programs (or multiple programs) to exploit them
The next frontier
• GPU – Graphics Processing Unit
  – Specialized hardware for graphics
  – Optimized to run the same operation on many pieces of data (e.g. pixels)
• Why?
  – Mature technology, driven by gaming
  – Low-power parallel processing
• Barrier
  – Limited programming model
  – Not appropriate for many programs (e.g. servers)
Performance
• Not an absolute
• Depends on application characteristics
  – Graphics
  – General-purpose desktop
  – Scientific apps
  – Servers
• Rapidly changing technology
  – DRAM speed, chip density, etc.
• This is the focus of our class
What is Computer Architecture?
• Program software
• Write compilers
• Design assembly language
• Design processor
• Optimize layout, circuits, etc.
• Design transistor technology

Architecture – this class!
Why do I care?!? I’m 3 levels above.
But I’m CS
• Why do I have to learn about hardware? (I hear you ask)
• Hardware is optimized to take advantage of particular program characteristics.
• If your software is different, it can get atrocious performance.
• You must understand general architecture to program for it.
• In an ideal world, compilers would do this for you. (We live in the real world.)
Which is faster?

Sequence 1:
R1 = A[5]
B[6] = R1
R3 = R0 + R2
R5 = R4 - R3
R7 = R0 + R6
C[7] = R7

Sequence 2:
R1 = A[5]
R3 = R0 + R2
R7 = R0 + R6
B[6] = R1
R5 = R4 - R3
C[7] = R7
Which is faster in C/Java?
for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[j][i] = i*j + 7;

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i][j] = i*j + 7;
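A sketch of why the order matters (function names are mine): C stores 2-D arrays in row-major order, so `A[i][j]` and `A[i][j+1]` are adjacent in memory. The second loop walks memory sequentially and uses every word of each cache line it fetches; the first jumps a whole row between consecutive writes.

```c
#define N 64

/* Column-major traversal: consecutive iterations write A[0][i], A[1][i],
   ... which sit N ints apart in memory - poor cache locality. */
void fill_col_major(int A[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[j][i] = i * j + 7;
}

/* Row-major traversal: consecutive iterations write A[i][0], A[i][1],
   ... which are adjacent - each cache line fetched is fully used. */
void fill_row_major(int A[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            A[i][j] = i * j + 7;
}
```

Both fill the array with identical values (since i*j is symmetric in i and j); for large N the row-major version is typically several times faster.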
What data structure should I use?
• Array or linked structure?
• Does it change often? Yes – then linked nodes
• Does it get searched often? Yes – then array
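A minimal C sketch of the trade-off (helper names are mine): a linked node supports O(1) insertion at the head with no data movement, while a sorted array supports O(log n) binary search but would need O(n) shifting on insert.

```c
#include <stdlib.h>

/* Linked nodes: insertion is one allocation and one pointer write. */
struct node { int val; struct node *next; };

struct node *push_front(struct node *head, int val) {
    struct node *n = malloc(sizeof *n);
    n->val = val;
    n->next = head;   /* no existing elements are moved */
    return n;
}

/* Sorted array: searching halves the range each step (O(log n)),
   but inserting would require shifting elements (O(n)). */
int find(const int *a, int n, int key) {
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (a[mid] == key) return mid;
        if (a[mid] < key) lo = mid + 1;
        else              hi = mid - 1;
    }
    return -1;        /* not found */
}
```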
General Class Info
• When, where and who
  – Website: http://www.cs.ucsb.edu/~franklin/154/154.html
  – Professor: Diana Franklin, franklin@cs
  – TAs: Michael, Nadav, Shivapriya
• Office Hours:
  – Franklin: MTWR, 3:30-4:30, HFH 1115
  – TA:
Grading Policy
• Grading
  – Labs: 0-5% (0.5% for each attended)
  – Projects: 25-30%
  – Quizzes: 10%
  – Midterms: 25%
  – Final: 35%
• Plagiarism
  – You may discuss the design of programming assignments
  – You may not show or look at any other group’s code
    • Come to office hours!!!
    • Look at example code from class!!!
  – Plagiarism will result in an F in the class and reporting to Judicial Affairs for further action.
Curve
• Individual tests and assignments are not curved
• Curving only occurs at the end to offset grading that is too harsh
Projects
• 2 or 3 students per group
• Discussions focus on skills for the projects
• Projects build on each other
  – Don’t get behind – you have fair warning
  – The expectation is that everyone completes all projects properly (unlike in the past, when one bad grade did not affect the projects that followed)
Discussion group
• Piazza – join this week
  – Announcements will be made here
  – Do not post code or partial solutions EVER, even to ask for help about what is wrong
    • Post those privately!
Exams
• 2 MiniExams – 1 side of 1 page notes
• 2 Midterms – 2 sides of 1 page notes
• 1 Final – 2 sides of 2 pages of notes
• If your weighted average on exams is < 60% (straight scale) and well below the class average, you may receive an F.
Learning a new ISA
Learn the syntax and semantics of:
• Arithmetic operations
• Control operations
• Memory operations
High-Level MIPS
• Arithmetic: All computation occurs in registers
• Branches: Two-step process – calculate then branch
• Memory: Move data between registers (for computation) and memory (huge)
MIPS Registers – 32 registers

Name     Number  Usage             Preserved across call?
$zero    0       The constant 0    Yes
$v0-$v1  2-3     Function results  No
$a0-$a3  4-7     Arguments         No
$t0-$t7  8-15    Temporaries       No
$s0-$s7  16-23   Saved             Yes
$t8-$t9  24-25   More temporaries  No
$gp      28      Global pointer    Yes
$sp      29      Stack pointer     Yes
$fp      30      Frame pointer     Yes
$ra      31      Return address    Yes
Page 140, Figure 3.13
Operation # meaning
add $2,$3,$5 # $2 <- $3 + $5
sub $2,$3,$5 # $2 <- $3 - $5
addu $2,$3,$5 # $2 <- $3 + $5 (no overflow exception)
slt $2,$3,$5 # if ($3 < $5) $2 <- 1; else $2 <- 0
Arithmetic “R-Format”
• Two input registers – rs & rt
• One output register - rd
Arithmetic “I-format”
• One input register – rs
• One hard-coded constant – a 16-bit immediate
• One output register – rt
Operation # comment
addi $2, $3, 8 # $2 <- $3 + 8
andi $2, $3, 10 # $2 <- $3 & 10
slti $2, $3, 7 # if ($3 < 7) $2 <- 1; else $2 <- 0
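Both `slt` and `slti` write a 0-or-1 result into the destination register; in C terms the semantics are (helper name is mine):

```c
#include <stdint.h>

/* slt/slti semantics: destination <- 1 if rs < rt (signed compare), else 0. */
int32_t slt(int32_t rs, int32_t rt) {
    return rs < rt ? 1 : 0;
}
```

This is why a test like `if (i < n)` compiles to an slt followed by a branch against $0, as in the loop translation later in the deck.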
Branches
• goto loop
• if (i < 100) goto loop
Operation # comment
beq $3,$2,loop # if ($3 == $2) goto loop
bne $3,$2, loop # if ($3 != $2) goto loop
jr $3 # goto $3
j loop # goto loop
jal function # goto function, store return address in $ra
Operation # comment
lw $2, 32($3) # $2 <- M[32 +$3]
sw $2, 16($3) # M[16 +$3] <- $2
Load/Store Instructions
• Displacement addressing mode
• Register indirect is displacement with a 0 offset
• lw = load word (4 bytes)
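In C terms, displacement addressing is just pointer-plus-byte-offset. A sketch (helper names are mine; real lw/sw also require the address to be word-aligned):

```c
#include <stdint.h>
#include <string.h>

/* lw rd, off(rs): load the 32-bit word at byte address rs + off.
   memcpy sidesteps alignment/aliasing issues in portable C. */
int32_t lw(const uint8_t *rs, int32_t off) {
    int32_t word;
    memcpy(&word, rs + off, sizeof word);
    return word;
}

/* sw rt, off(rs): store the 32-bit word rt at byte address rs + off. */
void sw(uint8_t *rs, int32_t off, int32_t rt) {
    memcpy(rs + off, &rt, sizeof rt);
}
```

With offset 0 this is register indirect; an offset of 20 reaches A[5] of an int array, since each word is 4 bytes.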
Let’s do a code example
int sum = 0;
for (i = 0; i < n; i++)
  sum += A[i];
1. Split apart the parts of the for loop
2. Translate the regular code
3. Insert branches
4. Translate memory operations
int sum = 0;
for (i = 0; i < n; i++)
  sum += A[i];
• int sum = 0;
• i = 0;
• if !(i < n) -> skip loop
• sum += A[i]
• i++
• if (i < n) -> loop again
• $t0 -> sum, $t1 -> i
• assume &A[0] is in $a0, n is in $a1

        addi $t0, $0, 0        # int sum = 0;
        add  $t1, $0, $0       # i = 0;
        slt  $t2, $t1, $a1     # if !(i < n)
        beq  $t2, $0, skiploop #   -> skip loop
loop:   sll  $t2, $t1, 2       # $t2 = i * 4
        add  $t3, $t2, $a0     # $t3 = &A[i]
        lw   $t2, 0($t3)       # $t2 = A[i]
        add  $t0, $t0, $t2     # sum += A[i]
        addi $t1, $t1, 1       # i++
        slt  $t2, $t1, $a1     # if (i < n)
        bne  $t2, $0, loop     #   -> loop again
skiploop:
sum += A[i]
• load A[i], then add it to sum

sll $t2, $t1, 2   # $t2 = i * 4 (byte offset of A[i])
add $t3, $t2, $a0 # $t3 = &A[i]
lw  $t2, 0($t3)   # $t2 = A[i]
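Putting it together, the translated loop computes the same value as this C function (a sketch; `sum_array` is my own name, and the comments map each line back to the registers used above):

```c
/* C equivalent of the translated MIPS loop.
   Register mapping from the slide: $t0 = sum, $t1 = i,
   $a0 = &A[0], $a1 = n. The `sll $t2, $t1, 2` step is the
   i*4 byte-offset computation hidden inside A[i]. */
int sum_array(const int *A, int n) {
    int sum = 0;                 /* addi $t0, $0, 0 */
    for (int i = 0; i < n; i++)  /* slt/beq guard, slt/bne loop-back */
        sum += A[i];             /* sll + add + lw + add */
    return sum;
}
```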