Mark Franklin, S06
CS, CoE, EE 362 Digital Computers II: Architecture
• Prof. Mark Franklin: [email protected]
• Course Assistants: – Drew Frank: [email protected]
• Required Book: “Heuring & Jordan” 2nd Edition
• Optional Book: “Intro. VHDL” Yalamanchili
• Read: Academic Integrity Statement.
• Course Web Site: http://www.cse.wustl.edu/~jbf/cse362.d/cse362.html
Four Key Questions
• What components must every computer have?
• How can computers be described, specified, and evaluated?
• What constitutes computer architecture (hardware, software, firmware, algorithms, etc.)?
• How does technology affect computer architecture (chip size, feature size, power, pin density, etc.)?
Essential Computer Components
• Processor: interpret/execute instructions.
• Memory: store instructions & data.
• Communication Device(s): communicate with outside world, I/O.
[Figure: Classic Computer Architecture (SISD: Single Instruction Stream, Single Data Stream). Blocks: Processor (Control Unit, ALU), Memory, Input/Output.]
Architecture Components
• INSTRUCTION SET DESIGN: Programmer-visible instruction set; algorithm, compiler, and OS design; algorithmic complexity.
• HIGH LEVEL COMPONENT ORGANIZATION: Memory system, bus structure, processor design; branch handling, pipelining, execution algorithms; instructions/second, clocks/instruction.
• HARDWARE: Detailed logic design, packaging; VLSI & logic design CAD algorithms; speed, area, power, …
[Figure: SIMD (Single Instruction Stream, Multiple Data Stream) Architecture. Blocks: four ALUs, Interconnection Network, Data Memory Unit, Program Control Unit, Program Memory, Input/Output.]
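The contrast between the SISD and SIMD organizations can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and a Python list slice stands in for a group of ALUs executing the same instruction in lockstep.

```python
from typing import Callable, List, Tuple

def sisd_execute(op: Callable[[int, int], int],
                 a: List[int], b: List[int]) -> Tuple[List[int], int]:
    """Single ALU: the instruction is issued once per data element."""
    results = []
    issued = 0
    for x, y in zip(a, b):
        issued += 1
        results.append(op(x, y))
    return results, issued

def simd_execute(op: Callable[[int, int], int],
                 a: List[int], b: List[int],
                 num_alus: int = 4) -> Tuple[List[int], int]:
    """One control unit broadcasts each instruction to num_alus ALUs.

    Each issued instruction processes up to num_alus data elements.
    """
    results = []
    issued = 0
    for start in range(0, len(a), num_alus):
        issued += 1  # one instruction issue covers a whole group of lanes
        lane_a = a[start:start + num_alus]
        lane_b = b[start:start + num_alus]
        results.extend(op(x, y) for x, y in zip(lane_a, lane_b))
    return results, issued

add = lambda x, y: x + y
data_a = [1, 2, 3, 4, 5, 6, 7, 8]
data_b = [10] * 8
print(sisd_execute(add, data_a, data_b))
print(simd_execute(add, data_a, data_b, num_alus=4))
```

Both versions produce the same results; the SIMD version simply needs fewer instruction issues for the same data.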
Performance Expression: Amdahl’s Law
\( f \) = fraction of operations that must be performed sequentially; \( 0 \le f \le 1 \)
\( S \) = maximum achievable speedup
\( p \) = number of processors present
\( E \) = efficiency
\[ S = \frac{1}{f + (1 - f)/p}, \qquad E = \frac{S}{p} \]
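As a quick numerical check, the speedup and efficiency expressions can be evaluated directly. The sketch below assumes the usual form of Amdahl's Law, S = 1 / (f + (1 - f)/p) with E = S/p; the function names are illustrative only.

```python
def amdahl_speedup(f: float, p: int) -> float:
    """Maximum speedup with sequential fraction f on p processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("f must be between 0 and 1")
    if p < 1:
        raise ValueError("p must be at least 1")
    return 1.0 / (f + (1.0 - f) / p)

def efficiency(f: float, p: int) -> float:
    """Efficiency E = S / p."""
    return amdahl_speedup(f, p) / p

# Speedup saturates as processors are added when 10% of the work is sequential.
for p in (1, 2, 10, 100, 1000):
    print(p, round(amdahl_speedup(0.1, p), 3), round(efficiency(0.1, p), 3))
```

With f = 0.1 the speedup can never exceed 1/f = 10, no matter how many processors are added.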
Amdahl’s Law
It does no good to have many processors if there is not enough parallelism. What fraction of a computation can be sequential if we want the processors to be used at 50 percent efficiency (S = p/2)?
\[ \frac{p}{2} = \frac{1}{f + (1 - f)/p} \;\Rightarrow\; fp + 1 - f = 2 \;\Rightarrow\; f = \frac{1}{p - 1} \]
To maintain a constant efficiency, the fraction of the computation devoted to sequential processing must be proportional to the inverse of the number of processors.
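The same algebra works for any target efficiency, not just 50 percent. Setting E = S/p in S = 1 / (f + (1 - f)/p) gives pf + 1 - f = 1/E, so f = (1/E - 1)/(p - 1). The sketch below uses that rearrangement; the function name is hypothetical.

```python
def max_sequential_fraction(p: int, target_efficiency: float) -> float:
    """Sequential fraction f at which p processors run at the target efficiency.

    Derived from p*f + 1 - f = 1/E, i.e. f = (1/E - 1) / (p - 1).
    """
    if p < 2:
        raise ValueError("need at least 2 processors")
    if not 0.0 < target_efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return (1.0 / target_efficiency - 1.0) / (p - 1)

# At 50% efficiency the allowed sequential fraction is 1/(p - 1).
for p in (2, 5, 11, 101, 1001):
    print(p, max_sequential_fraction(p, 0.5))
```

For p = 11 the result is 0.1: only 10 percent of the work may be sequential if eleven processors are to stay half busy.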
Generalize Amdahl’s Law
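\[ \text{Speedup}_{\text{overall}} = \frac{\text{ExTime}_{\text{old}}}{\text{ExTime}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}} \]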
Example: “Suppose a program runs in 100 seconds on a machine. Multiply operations are responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?”
What about 5 times faster?
PRINCIPLE: Make the common case fast!
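The multiplication example can be worked through numerically. The sketch below assumes only the enhanced portion (the 80 seconds of multiplies) gets faster while the other 20 seconds are unchanged; the function name is illustrative.

```python
import math

def required_enhancement(total_time: float, enhanced_time: float,
                         target_speedup: float) -> float:
    """Speedup needed on the enhanced portion to hit the overall target.

    Returns math.inf when no finite speedup of that portion is enough.
    """
    unaffected = total_time - enhanced_time      # time that cannot be improved
    new_total = total_time / target_speedup      # required overall run time
    budget = new_total - unaffected              # time left for the enhanced part
    if budget <= 0:
        return math.inf
    return enhanced_time / budget

print(required_enhancement(100, 80, 4))  # 4 times faster overall
print(required_enhancement(100, 80, 5))  # 5 times faster overall
```

Running 4 times faster means finishing in 25 seconds, which leaves 5 seconds for the multiplies, so multiplication must become 16 times faster. Running 5 times faster means finishing in 20 seconds, which is already consumed by the non-multiply work, so no finite improvement to multiplication is sufficient.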
Computer Market Partitioning (costs are for processor, not system)
• Desktop Computing ($100 - $1,000):
– Price-performance
• Servers ($200 - $2,000):
– Availability (reliability + effectiveness)
– Scalability
– Throughput
• Embedded Computers ($0.20 - $1,000):
– Real-time performance
– Power and memory minimization
– Cost minimization
– Interface with special purpose logic; use of processor cores
HLL (e.g., C, C++, Perl) vs Machine/Assembly Language (AL)
• HLL Pros:
– Easier to express algorithms due to higher level constructs (e.g., For, Case, arithmetic expressions, objects, etc.)
– Type checking (hardware for type checking?)
– Some memory allocation checking
• Assembly Language Pros:
– More control over the ISA: more speed, less memory
– More control over I/O
• Combination is often best for embedded systems: HLL calling AL.
Example: HLL to AL Mapping
• HLL:
– b = c + d*e
• AL:
– LOAD R1, d
– LOAD R2, e
– LOAD R3, c
– MPY R4, R2, R1
– ADD R5, R4, R3
– STORE R5, b
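To see that the assembly sequence really computes b = c + d*e, it can be run on a toy interpreter. This is a minimal sketch, not a real ISA: it assumes LOAD takes (destination register, variable), STORE takes (source register, variable), and MPY/ADD take (destination, source, source), which is how the slide's instructions read.

```python
def run(program, memory):
    """Execute a list of instruction strings against a dict-based memory."""
    regs = {}
    for line in program:
        opcode, operands = line.split(None, 1)
        args = [a.strip() for a in operands.split(",")]
        if opcode == "LOAD":        # register <- memory variable
            regs[args[0]] = memory[args[1]]
        elif opcode == "STORE":     # memory variable <- register
            memory[args[1]] = regs[args[0]]
        elif opcode == "MPY":       # dest <- src1 * src2
            regs[args[0]] = regs[args[1]] * regs[args[2]]
        elif opcode == "ADD":       # dest <- src1 + src2
            regs[args[0]] = regs[args[1]] + regs[args[2]]
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return memory

PROGRAM = [
    "LOAD R1, d",
    "LOAD R2, e",
    "LOAD R3, c",
    "MPY R4, R2, R1",
    "ADD R5, R4, R3",
    "STORE R5, b",
]

memory = {"c": 2, "d": 3, "e": 4}
run(PROGRAM, memory)
print(memory["b"])  # same value as c + d*e
```

One HLL statement expands to six instructions here, which is the trade the previous slide describes: the HLL is easier to write, while the assembly exposes every register and memory access.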
Buses: I
• A set of path(s) (wires) connecting on-chip or off-chip modules.
– Serial bus: transmits one bit at a time
– Parallel bus: transmits many bits simultaneously
• Generally time-shared.
• Generally has separate data & control paths.
• Typically has a separate bus controller or arbiter that decides which modules can use the bus at any given time.
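The slide does not say which policy the arbiter uses. As one possible illustration, the sketch below assumes a round-robin policy, where the module after the most recently granted one gets first chance at the bus; the class and method names are hypothetical.

```python
from typing import List, Optional

class RoundRobinArbiter:
    """Grants a time-shared bus to one requesting module per cycle."""

    def __init__(self, num_modules: int):
        self.num_modules = num_modules
        # Start so that module 0 is the first one checked.
        self.last_granted = num_modules - 1

    def grant(self, requests: List[bool]) -> Optional[int]:
        """Return the index of the module granted the bus, or None if idle."""
        for offset in range(1, self.num_modules + 1):
            candidate = (self.last_granted + offset) % self.num_modules
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None

arbiter = RoundRobinArbiter(3)          # e.g. CPU, I/O, memory controller
print(arbiter.grant([True, True, False]))
print(arbiter.grant([True, True, False]))
print(arbiter.grant([True, True, False]))
```

Round-robin avoids starving any module; a fixed-priority arbiter would be simpler but could let one busy module monopolize the bus.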
Buses: II
• Some common buses:
– On-chip: AMBA, Wishbone (generally not standard)
– Off-chip: PCI bus family

Bus               32-bit transfer   64-bit transfer
33-MHz PCI        133 MB/sec        266 MB/sec
66-MHz PCI        266 MB/sec        532 MB/sec
100-MHz PCI-X     ------------      800 MB/sec
133-MHz PCI-X     ------------      1 GB/sec

PCI-e(xpress) serial, 1 lane: 500 MB/sec
PCI-e(xpress) serial, 4 lanes: 2 GB/sec

– Off-chip: Other buses: SCSI, IDE, Infiniband
• Common issues: arbitration, congestion.
• Logical equivalence between buses, multiplexers, and switches.
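The parallel-bus figures in the table follow from clock rate times bus width, assuming one transfer per clock cycle. The sketch below makes that arithmetic explicit; the function names are illustrative, the 33.33 MHz and 66.66 MHz clock values are assumed nominal frequencies behind the rounded "33-MHz" and "66-MHz" labels, and the per-lane PCI Express rate is taken from the slide.

```python
def parallel_bus_peak_mb_per_s(clock_mhz: float, width_bits: int) -> float:
    """Peak bandwidth in MB/sec, assuming one transfer per clock cycle."""
    return clock_mhz * width_bits / 8

def pcie_peak_mb_per_s(lanes: int, per_lane_mb_per_s: float = 500.0) -> float:
    """Peak bandwidth of a serial multi-lane link; lanes add linearly."""
    return lanes * per_lane_mb_per_s

for name, clock in (("33-MHz PCI", 33.33), ("66-MHz PCI", 66.66)):
    print(name, parallel_bus_peak_mb_per_s(clock, 32),
          parallel_bus_peak_mb_per_s(clock, 64))
print("100-MHz PCI-X", parallel_bus_peak_mb_per_s(100, 64))
print("133-MHz PCI-X", parallel_bus_peak_mb_per_s(133, 64))
print("PCI-e x4", pcie_peak_mb_per_s(4))
```

These are peak numbers; arbitration and congestion reduce the bandwidth a module actually sees.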
Bandwidth Requirements
Bandwidth Trend
Simple Queuing Theory View of Buses
• Bus is a shared resource and can be viewed as a server in a queuing system.
• Modules attached to the bus present inputs (i.e., requests) to the server (or Bus) and are queued up if the server is busy.
[Figure: requests from the CPU, I/O, and Memory enter a Queue that feeds the BUS, which acts as the Server.]
Basic Queueing Theory
• Utilization: % time a server is busy.
• Average Queue Length: Avg # of jobs in queue.
• Average System Delay (latency): Avg time from job entry into the system to job departure from it.
• Arrival Time Distribution: Poisson arrival process (exponential interarrival times).
• Service Time Distribution: Exponentially distributed service times.
• Queue Characteristics: Infinite length; FIFO service discipline.
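These assumptions are enough to simulate the bus as a single-server queue. The sketch below is one simple way to do it, using the Lindley recursion for FIFO waiting times rather than a full event-driven simulator; the function name, rates, and seed are arbitrary choices for illustration.

```python
import random

def simulate_mm1(arrival_rate: float, service_rate: float,
                 num_jobs: int, seed: int = 0) -> float:
    """Estimate the average system delay (queueing + service) per job.

    Interarrival and service times are drawn from exponential
    distributions; the queue is unbounded and served in FIFO order.
    """
    rng = random.Random(seed)
    wait = 0.0          # time the current job spends waiting in the queue
    total_delay = 0.0
    for _ in range(num_jobs):
        service = rng.expovariate(service_rate)
        total_delay += wait + service
        interarrival = rng.expovariate(arrival_rate)
        # Lindley recursion: the next job waits for whatever work remains.
        wait = max(0.0, wait + service - interarrival)
    return total_delay / num_jobs

avg_delay = simulate_mm1(arrival_rate=0.5, service_rate=1.0, num_jobs=200_000)
print(avg_delay)
```

With an arrival rate of 0.5 and a service rate of 1.0, the standard M/M/1 result for average system delay is 1/(1.0 - 0.5) = 2 time units, and the estimate should land close to that.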
Basic Queueing Results
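Utilization: \( \rho = \lambda / \mu \), where \( \lambda \) = arrival rate and \( \mu \) = service rate
Avg Queue Length: \( L_q = \dfrac{\rho^2}{1 - \rho} \)
Avg Num in System: \( L = \dfrac{\rho}{1 - \rho} \)
Avg System Waiting Time: \( \dfrac{1}{\mu} \cdot \dfrac{1}{1 - \rho} \)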
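The closed-form results are easy to tabulate. The sketch below assumes the standard M/M/1 formulas (utilization as arrival rate over service rate, and the usual expressions for queue length, number in system, and system time); the function and key names are illustrative.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 metrics; requires arrival_rate < service_rate."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate
    if rho >= 1.0:
        raise ValueError("queue is unstable when utilization >= 1")
    return {
        "utilization": rho,
        "avg_queue_length": rho ** 2 / (1.0 - rho),
        "avg_number_in_system": rho / (1.0 - rho),
        "avg_system_time": 1.0 / (service_rate - arrival_rate),
    }

# Delay grows sharply as the bus approaches full utilization.
for lam in (0.1, 0.5, 0.9, 0.99):
    m = mm1_metrics(lam, 1.0)
    print(lam, round(m["avg_number_in_system"], 2), round(m["avg_system_time"], 2))
```

The practical point for bus design is the blow-up near utilization 1: a bus that is busy 90 percent of the time already makes requests wait about ten service times on average.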
[Figure: Basic Queueing Results for the M/M/1 queue. Two plots over utilization from 0 to 1: Queue Length and Waiting Time, with 1/μ marked on the Waiting Time axis.]
Computer Generations
• 1: 1950 - 1959 Vacuum Tubes
• 2: 1960 - 1968 Transistors
• 3: 1969 - 1977 Integrated Circuit
• 4: 1978 - 2005 LSI-Large Scale Integration; VLSI-Very LSI
• 5: 2005 - 20?? ULSI-Ultra LSI; parallel processing
Technology: How we make a chip (roughly)
Integrated Circuit Cost
\[ \text{Cost per die} = \frac{\text{Cost per wafer}}{\text{Dies per wafer} \times \text{Yield}} \]
\[ \text{Dies per wafer} \approx \frac{\text{Wafer area}}{\text{Die area}} \quad \text{(approximate)} \]
\[ \text{Yield} = \frac{1}{\left(1 + \text{Defects per area} \times \text{Die area} / 2\right)^{2}} \quad \text{(empirical observation)} \]
Typical: Die area = 1.5 cm x 1.5 cm; Wafer diameter = 10 inches; Defects per cm^2 = 1.7; Yield = 50%
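The three cost relations chain together naturally in code. The sketch below is an illustration only: the function names are made up, the wafer cost of $1,000 is a hypothetical placeholder (the slide gives none), and the yield is passed in separately so that either the empirical formula or a quoted figure such as 50% can be used. Note that plugging the listed defect density and die area into the empirical formula gives a yield of roughly 12%, so the 50% figure is treated here as a separately quoted typical value.

```python
import math

def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Approximate die count: wafer area divided by die area."""
    wafer_area = math.pi * (wafer_diameter_cm / 2.0) ** 2
    return wafer_area / die_area_cm2

def die_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Empirical yield model: 1 / (1 + defects * area / 2)^2."""
    return 1.0 / (1.0 + defects_per_cm2 * die_area_cm2 / 2.0) ** 2

def cost_per_die(cost_per_wafer: float, dies: float, yield_fraction: float) -> float:
    """Wafer cost spread over the good dies only."""
    return cost_per_wafer / (dies * yield_fraction)

die_area = 1.5 * 1.5                      # cm^2
dies = dies_per_wafer(10 * 2.54, die_area)  # 10-inch wafer in cm
wafer_cost = 1000.0                       # hypothetical, not from the slide

print(round(dies, 1))
print(round(die_yield(1.7, die_area), 3))
print(round(cost_per_die(wafer_cost, dies, 0.5), 2))
```

Because yield falls quickly as die area grows, cost per die rises faster than linearly with die size.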
TECHNOLOGY TRENDS
• Semiconductors:
– Transistor Density: +50%/year, quadruple in 4 years.
– Die Size: +10 - 25%/year
• IC Logic Technology:
– Transistors per Chip: +50 - 60%/year
– Device Speed: +30%/year
– Wire/Communications Speed: ~constant (Cu vs Al)
• Magnetic Disk Technology:
– Density: +25 - 60%/year
– Access Time: +35% / 10 years (8 ms).
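Annual growth rates compound, which is how a per-year percentage turns into a multi-year multiple. The short sketch below shows the arithmetic for the rates listed above; the function names are illustrative.

```python
import math

def growth_factor(annual_rate: float, years: float) -> float:
    """Total multiplier after compounding annual_rate for the given years."""
    return (1.0 + annual_rate) ** years

def years_to_multiply(annual_rate: float, factor: float) -> float:
    """Years of compounding needed to grow by the given factor."""
    return math.log(factor) / math.log(1.0 + annual_rate)

print(growth_factor(0.50, 4))        # transistor density after 4 years
print(years_to_multiply(0.50, 4))    # years to quadruple at +50%/year
print(growth_factor(0.30, 10))       # device speed after a decade
```

At +50%/year, four years of compounding gives about a 5x increase, so the quadrupling point falls a little before the 4-year mark.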
Feature and Die Size
Wafer Size: 12-inch wafer
SILICON & MAGNETIC DENSITIES
Processor Performance Gains
[Figure: y-axis is Performance (x VAX-11/780).]
Processor Cost Trends with Time