Lecture 2: Fundamentals of Computer Design
http://list.zju.edu.cn/kaibu/comparch
Chapter 1
• Transition from single processor to multiple processors;
• Quantitative approach: uses empirical observations of programs, experimentation, and simulation as its tools;
Outline
• Classes of computers
• Parallelism
• Instruction Set Architecture
• Trends
• Dependability
• Performance Measurement
5 Classes of Computers
PMD: Personal Mobile Device
• Wireless devices with multimedia user interfaces
• cell phones, tablet computers, etc.
• a few hundred dollars
PMD Characteristics
• Cost effectiveness
  less expensive packaging; absence of fan for cooling
• Responsiveness & Predictability
  real-time performance: a maximum execution time for each app segment;
  soft real-time: average time constraint – tolerates an occasionally missed time constraint on an event.
• Memory efficiency
  optimize code size
• Energy efficiency
  battery power, heat dissipation
Desktop Computing
• Largest market share
• low-end netbooks: $x00
• …
• high-end workstations: $x000
Desktop Characteristics
• Price-Performance
  combination of performance and price;
  compute performance, graphics performance
• The most important to customers, and hence to computer designers
Servers
• Provide large-scale and reliable file and computing services (to desktops)
• Constitute the backbone of large-scale enterprise computing
Server Characteristics
• Availability
  against server failure
• Scalability
  in response to increasing demand, by scaling up computing capacity, memory, storage, and I/O bandwidth
• Efficient throughput
  toward more requests handled per unit time
Why Server Availability
Clusters/WSCs (Warehouse-Scale Computers)
• collections of desktop computers or servers connected by local area networks to act as a single larger computer
• Characteristics
  price-performance, power, availability
Embedded Computers
hide everywhere
Embedded vs Non-embedded
• Dividing line
  the ability to run third-party software
• Embedded computers’ primary goal
  meet the performance need at a minimum price, rather than achieve higher performance at a higher price
Application Parallelism
• DLP: Data-Level Parallelism
  many data items being operated on at the same time
• TLP: Task-Level Parallelism
  tasks of work created to operate independently and largely in parallel
Hardware Parallelism
• Computer hardware exploits these two kinds of application parallelism in four major ways:
  Instruction-Level Parallelism
  Vector Architectures and GPUs
  Thread-Level Parallelism
  Request-Level Parallelism
Hardware Parallelism
• Instruction-Level Parallelism
  exploits data-level parallelism
  at modest levels – pipelining;
  at medium levels – speculative execution;
Hardware Parallelism
• Vector Architectures & GPUs (Graphics Processing Units)
  exploit data-level parallelism
  apply a single instruction to a collection of data in parallel
Hardware Parallelism
• Thread-Level Parallelism
  exploits either DLP or TLP
  in a tightly coupled hardware model that allows for interaction among parallel threads
Hardware Parallelism
• Request-Level Parallelism
  exploits parallelism among largely decoupled tasks specified by the programmer or the OS
Classes of Parallel Architectures
• by Michael Flynn
• according to the parallelism in the instruction and data streams called for by the instructions at the most constrained component of the multiprocessor:
  SISD, SIMD, MISD, MIMD
SISD
• Single instruction stream, single data stream – uniprocessor
• Can exploit instruction-level parallelism
SIMD
• Single instruction stream, multiple data stream
• The same instruction is executed by multiple processors using different data streams.
• Exploits data-level parallelism
• Each processor has its own data memory, whereas there is a single instruction memory and control processor.
MISD
• Multiple instruction streams, single data stream
• No commercial multiprocessor of this type yet
MIMD
• Multiple instruction streams, multiple data streams
• Each processor fetches its own instructions and operates on its own data.
• Exploits task-level parallelism
Instruction Set Architecture
ISA
• actual programmer-visible instruction set
• the boundary between software and hardware
• 7 major dimensions
ISA: Class
• Most are general-purpose register architectures with operands of either registers or memory locations
• Two popular versions
  register-memory ISA: e.g., 80x86
    many instructions can access memory
  load-store ISA: e.g., ARM, MIPS
    only load or store instructions can access memory
ISA: Memory Addressing
• Byte addressing
• Aligned address
  object width: s bytes
  address: A
  aligned if A mod s = 0
Each misaligned object requires two memory accesses
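The alignment rule can be checked with a few lines of Python. This is a minimal sketch: the function names and the 8-byte memory word width are assumptions chosen for illustration, not part of the slides.

```python
def is_aligned(address: int, size: int) -> bool:
    """Return True if an object of `size` bytes at byte `address` is aligned,
    i.e. A mod s == 0."""
    if size <= 0:
        raise ValueError("size must be positive")
    return address % size == 0


def aligned_words_touched(address: int, size: int, word: int = 8) -> int:
    """Count how many `word`-byte aligned memory words an access overlaps.

    `word` = 8 is an assumed memory width, used only for illustration.
    """
    first = address // word                 # index of the first word touched
    last = (address + size - 1) // word     # index of the last word touched
    return last - first + 1


print(is_aligned(8, 4))               # aligned 4-byte object
print(is_aligned(6, 4))               # misaligned 4-byte object
print(aligned_words_touched(6, 4))    # the misaligned object straddles two words
```

Under these assumptions, the misaligned 4-byte object at address 6 spans bytes 6 to 9 and therefore overlaps two 8-byte words, which is where the second memory access comes from.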
ISA: Addressing Modes
• Specify the address of a memory object
• Register, Immediate, Displacement
ISA: Types and Sizes of Operands
Type                                           Size in bits
ASCII character                                8
Unicode character, half word                   16
Integer, word                                  32
Double word, long integer                      64
IEEE 754 floating point – single precision     32
IEEE 754 floating point – double precision     64
Floating point – extended double precision     80
MIPS64 Operations
• Data transfer
• Arithmetic/Logical
• Control
• Floating point
ISA: Control Flow Instructions
• Types:
  conditional branches
  unconditional jumps
  procedure calls
  returns
• Branch address: PC-relative, formed by adding an address field to the PC (program counter)
ISA: Encoding an ISA
• Fixed length: ARM, MIPS – 32 bits
• Variable length: 80x86 – 1 to 18 bytes
http://en.wikipedia.org/wiki/MIPS_architecture
MIPS instruction formats: each starts with a 6-bit opcode.
R-type: three registers, a shift amount field, and a function field;
I-type: two registers, a 16-bit immediate value;
J-type: a 26-bit jump target.
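The three formats can be illustrated by decoding a 32-bit instruction word. This is a sketch under stated assumptions: the 5-bit register and shift-amount widths and the 6-bit function field follow the standard MIPS32 layout, and the format selection is simplified (opcode 0 is treated as R-type, opcodes 2 and 3 as J-type, everything else as I-type), which ignores coprocessor and other special encodings. The function name `decode_mips` is hypothetical.

```python
def decode_mips(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields (simplified)."""
    if not 0 <= word <= 0xFFFFFFFF:
        raise ValueError("expected a 32-bit instruction word")

    opcode = (word >> 26) & 0x3F            # top 6 bits in every format

    if opcode == 0x00:                      # R-type: three registers, shamt, funct
        return {
            "format": "R",
            "opcode": opcode,
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }

    if opcode in (0x02, 0x03):              # J-type (j, jal): 26-bit jump target
        return {"format": "J", "opcode": opcode, "target": word & 0x03FFFFFF}

    return {                                # I-type: two registers, 16-bit immediate
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


# add $t0, $t1, $t2 assembles to 0x012A4020 (rs=9, rt=10, rd=8, funct=0x20)
print(decode_mips(0x012A4020))
```

Because every format is exactly 32 bits and the opcode sits in the same position, the decoder needs only shifts and masks; a variable-length encoding such as 80x86 would require parsing the instruction byte by byte.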
Computer Architecture
• ISA
  actual programmer-visible instruction set; boundary between software and hardware;
• Organization
  high-level aspects of computer design: memory system, memory interconnect, design of internal processor or CPU;
• Hardware
  computer specifics: logic design, packaging technology;
Five Critical Implementation Technologies
• Integrated circuit logic technology
• Semiconductor DRAM
• Semiconductor flash
• Magnetic disk technology
• Network technology
Integrated circuit logic technology
• Moore’s Law: a growth rate in transistor count on a chip of about 40% to 55% per year
doubles every 18 to 24 months
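The link between an annual growth rate and a doubling time can be checked numerically. This is a minimal sketch assuming steady compound growth; the function name is chosen for illustration.

```python
import math


def doubling_time_months(annual_growth: float) -> float:
    """Months needed to double at a compound growth rate of `annual_growth`
    per year (0.40 means 40% per year)."""
    if annual_growth <= 0:
        raise ValueError("growth rate must be positive")
    years = math.log(2) / math.log(1 + annual_growth)
    return 12 * years


for rate in (0.40, 0.55):
    print(f"{rate:.0%} per year -> doubles in {doubling_time_months(rate):.1f} months")
```

Under this assumption, 55% per year gives a doubling time of roughly 19 months and 40% per year roughly 25 months, in line with the "18 to 24 months" rule of thumb.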
Semiconductor DRAM
• Capacity per DRAM chip doubles roughly every 2 or 3 years
Semiconductor Flash
• Electrically erasable programmable read-only memory (EEPROM)
• Capacity per Flash chip doubles roughly every two years
• In 2011, 15 to 20 times cheaper per bit than DRAM
Magnetic Disk Technology
• Since 2004, density doubles every three years
• 15 to 20 times cheaper per bit than Flash
• 300 to 500 times cheaper per bit than DRAM
• For server and warehouse scale storage
Network Technology
• Switches
• Transmission systems
Performance Trends
• Bandwidth/Throughput
  the total amount of work done in a given time;
• Latency/Response Time
  the time between the start and the completion of an event;
Bandwidth over Latency
Trends in Power and Energy
• Power = Energy per unit time
  1 watt = 1 joule per second
  energy to execute a workload = avg power x execution time
• Three primary concerns
  the max power for a processor
  sustained power consumption
  energy and energy efficiency
Trends in Power and Energy
• Sustained power consumption
• Metric: TDP (Thermal Design Power)
  determines cooling requirement
• Heat management
  1. reduce clock rate, and hence power, as the temperature approaches the junction temperature limit;
  2. if step 1 does not work, power down the chip.
Trends in Power and Energy
• Energy and Energy Efficiency
• energy to execute a workload = avg power x execution time
• Example
  processor A has 20% higher avg power consumption than processor B, but A executes the task in 70% of the time needed by B; is A or B more energy efficient?
• EnergyConsumptionA = 1.2 x 0.7 x EnergyConsumptionB = 0.84 x EnergyConsumptionB, so A consumes less energy for the task and is more energy efficient
Trends in Power and Energy
• Primary energy consumption within a microprocessor is for switching transistors – dynamic energy
  logic transition: 0->1->0 or 1->0->1
• The energy of a single transition
Trends in Power and Energy
• The power required per transistor
• For a fixed task, slowing clock rate (frequency) reduces power, but not energy.
Trends in Power and Energy
• Example
  some microprocessors have adjustable voltage;
  a 15% reduction in voltage results in a 15% reduction in frequency;
  what is the impact on dynamic energy and dynamic power?
Trends in Power and Energy
• Answer
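A minimal sketch of the calculation, assuming the standard dynamic model in which the energy of a transition is proportional to capacitive load x voltage^2 and dynamic power is proportional to capacitive load x voltage^2 x switching frequency. The capacitive load is assumed unchanged, so it cancels in the ratios; the function names are chosen for illustration.

```python
def dynamic_energy_ratio(voltage_scale: float) -> float:
    """New/old dynamic energy when voltage is scaled by `voltage_scale`.

    Assumes energy ~ capacitive_load * voltage**2 with a fixed load.
    """
    return voltage_scale ** 2


def dynamic_power_ratio(voltage_scale: float, frequency_scale: float) -> float:
    """New/old dynamic power when voltage and frequency are both scaled.

    Assumes power ~ capacitive_load * voltage**2 * frequency with a fixed load.
    """
    return voltage_scale ** 2 * frequency_scale


# 15% reduction in voltage and in frequency -> both scale factors are 0.85
print(f"energy ratio: {dynamic_energy_ratio(0.85):.2f}")
print(f"power ratio:  {dynamic_power_ratio(0.85, 0.85):.2f}")
```

Under this model, dynamic energy falls to about 72% of its original value and dynamic power to about 61%: voltage enters squared, and power picks up an extra factor from the lower frequency.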
Trends in Power and Energy
• Challenges
  distributing the power
  removing the heat
  preventing hot spots
• potential research topics
Trends in Power and Energy
• Energy-efficiency improvement techniques
  1. do nothing well
     turn off the clock of inactive modules
  2. DVFS: dynamic voltage-frequency scaling
     scale down clock frequency and voltage during periods of low activity
DVFS
Trends in Power and Energy
• Energy-efficiency improvement techniques
  3. design for typical case
     PMDs, laptops – often idle
     memory and storage with low power modes to save energy
  4. overclocking
     the chip runs at a higher clock rate for a short time until temperature rises
Trends in Cost
• Cost of an Integrated Circuit
  wafer for test; chopped into dies for packaging
Trends in Cost
• Cost of an Integrated Circuit
  yield: the percentage of manufactured devices that survive the testing procedure
Intel Core i7 Die
Trends in Cost
• Example
Trends in Cost
• Cost of an Integrated Circuit
• N: process-complexity factor, a measure of manufacturing difficulty
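The die cost model can be sketched in Python. The formulas below are the commonly used textbook model and are an assumption here: dies per wafer = wafer area / die area minus an edge-loss term, and die yield = wafer yield x 1 / (1 + defects per unit area x die area)^N. All input values (30 cm wafer, 1.5 cm x 1.5 cm die, 0.031 defects per cm^2, N = 13.5, a $5000 wafer) are illustrative, and the function names are hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_cm: float, die_area_cm2: float) -> float:
    """Estimated number of whole dies on a round wafer (not rounded)."""
    wafer_area = math.pi * (wafer_diameter_cm / 2) ** 2
    # Dies lost along the circular edge of the wafer.
    edge_loss = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return wafer_area / die_area_cm2 - edge_loss


def die_yield(defects_per_cm2: float, die_area_cm2: float, n: float,
              wafer_yield: float = 1.0) -> float:
    """Fraction of dies that pass testing; `n` is the process-complexity factor."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def die_cost(wafer_cost: float, wafer_diameter_cm: float, die_area_cm2: float,
             defects_per_cm2: float, n: float) -> float:
    """Cost per good die = wafer cost / (dies per wafer x die yield)."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies


area = 1.5 * 1.5  # cm^2, illustrative die size
print(f"dies per wafer: {dies_per_wafer(30, area):.0f}")
print(f"die yield:      {die_yield(0.031, area, 13.5):.2f}")
print(f"cost per die:   ${die_cost(5000, 30, area, 0.031, 13.5):.2f}")
```

Because die area appears both in the dies-per-wafer count and inside the yield exponent, cost per good die grows much faster than linearly with die size under this model.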
Dependability
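• SLA: service level agreements
• System states: up or down
• Service states
  service accomplishment
  service interruption
• Transitions between the two states
  failure: accomplishment -> interruption
  restoration: interruption -> accomplishment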
Dependability
• Two measures of dependability
  Module reliability
  Module availability
• Module reliability
  continuous service accomplishment from a reference initial instant
  MTTF: mean time to failure
  MTTR: mean time to repair
  MTBF: mean time between failures
  MTBF = MTTF + MTTR
  FIT: failures in time, i.e., failures per billion hours
  MTTF of 1,000,000 hours = 10^9/10^6 = 1000 FIT
• Module availability
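A sketch of how these measures relate, assuming the usual definition of module availability as MTTF / (MTTF + MTTR). The 24-hour MTTR is a hypothetical value chosen only for illustration, as are the function names.

```python
def fit_from_mttf(mttf_hours: float) -> float:
    """Failures in time: expected failures per billion (10**9) hours."""
    return 1e9 / mttf_hours


def mtbf(mttf_hours: float, mttr_hours: float) -> float:
    """Mean time between failures = MTTF + MTTR."""
    return mttf_hours + mttr_hours


def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Assumed definition: fraction of time the service is accomplished,
    MTTF / (MTTF + MTTR)."""
    return mttf_hours / mtbf(mttf_hours, mttr_hours)


mttf = 1_000_000   # hours, as in the FIT example
mttr = 24          # hours, hypothetical repair time
print(f"FIT:          {fit_from_mttf(mttf):.0f}")
print(f"MTBF:         {mtbf(mttf, mttr)} hours")
print(f"availability: {availability(mttf, mttr):.6f}")
```

Reliability (MTTF, FIT) describes how long a module runs before failing, while availability also depends on how quickly it is repaired, so a short MTTR can keep availability high even for a less reliable module.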
Dependability
• Example
Dependability
• Answer
Measuring Performance
• Execution time
  the time between the start and the completion of an event
• Throughput
  the total amount of work done in a given time
Measuring Performance
• Computer X and Computer Y
• X is n times faster than Y
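"X is n times faster than Y" is conventionally read as a ratio of execution times, with performance taken as the reciprocal of execution time. A minimal sketch under that assumption; the 15-second and 10-second timings are hypothetical.

```python
def times_faster(exec_time_y: float, exec_time_x: float) -> float:
    """Return n such that X is n times faster than Y.

    Assumes n = ExecutionTime_Y / ExecutionTime_X, which equals
    Performance_X / Performance_Y when performance = 1 / execution time.
    """
    if exec_time_x <= 0 or exec_time_y <= 0:
        raise ValueError("execution times must be positive")
    return exec_time_y / exec_time_x


# Hypothetical timings: Y needs 15 s, X needs 10 s for the same program.
print(times_faster(15.0, 10.0))
```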
Quantitative Principles
• Parallelism
• Locality
  temporal locality: recently accessed items are likely to be accessed in the near future;
  spatial locality: items whose addresses are near one another tend to be referenced close together in time
Quantitative Principles
• Amdahl’s Law
Quantitative Principles
• Amdahl’s Law: two factors
  1. Fraction_enhanced: e.g., 20/60 if 20 seconds of a 60-second program can be enhanced
  2. Speedup_enhanced: e.g., 5/2 if the enhanced portion takes 2 seconds where it originally took 5 seconds
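The two factors combine into an overall speedup. A minimal sketch, assuming the usual statement of Amdahl's Law, Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced). Feeding the two example values into a single calculation is done here purely for illustration.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when `fraction_enhanced` of the original execution time
    is sped up by a factor of `speedup_enhanced`."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if speedup_enhanced <= 0:
        raise ValueError("speedup_enhanced must be positive")
    unenhanced = 1.0 - fraction_enhanced
    return 1.0 / (unenhanced + fraction_enhanced / speedup_enhanced)


# Fraction_enhanced = 20/60, Speedup_enhanced = 5/2
print(f"{amdahl_speedup(20 / 60, 5 / 2):.2f}")

# Upper bound: even an unbounded speedup of that fraction gives at most 1 / (1 - 20/60)
print(f"{1 / (1 - 20 / 60):.2f}")
```

With these values the overall speedup is 1.25, and it could never exceed 1.5 however fast the enhanced portion becomes, because the unenhanced two thirds of the run time still has to execute.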
Quantitative Principles
• Example
Quantitative Principles
• The Processor Performance Equation
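A minimal sketch of the equation, assuming its standard form: CPU time = instruction count x cycles per instruction (CPI) x clock cycle time, or equivalently instruction count x CPI / clock rate. The workload numbers are hypothetical.

```python
def cpu_time_seconds(instruction_count: float, cpi: float,
                     clock_rate_hz: float) -> float:
    """CPU time = instruction count x CPI / clock rate (assumed standard form)."""
    if clock_rate_hz <= 0:
        raise ValueError("clock rate must be positive")
    return instruction_count * cpi / clock_rate_hz


# Hypothetical workload: 1e9 instructions, average CPI of 2.0, 2 GHz clock.
print(cpu_time_seconds(1e9, 2.0, 2e9))
```

The three factors depend on different layers: instruction count on the ISA and compiler, CPI on the organization and ISA, and clock rate on the hardware technology and organization, so a change to one often moves another.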
Quantitative Principles
• Example
Reading
• Chapter 1.8, 1.10 – 1.13