High Performance Processor Architecture
Neeraj Goel
2004csz8035
Embedded System Group
Dept. of Computer Science and Engineering
Indian Institute of Technology Delhi
http://embedded.cse.iitd.ernet.in/
HU810 Seminar
Outline
Introduction
History and Future prediction
Pentium 4 features
Pipelining
Superscalar features
Hyper-Threading
Conclusion and future
Moore’s Law
Intel Microprocessors (source: www.intel.com)
Intel’s Processors: Past and Current
Processor                      Year of introduction   Transistors
8008                           1972                   2,500
8080                           1974                   5,000
8086                           1978                   29,000
286                            1982                   120,000
Intel386 processor             1985                   275,000
Intel486 processor             1989                   1,180,000
Intel Pentium processor        1993                   3,100,000
Intel Pentium II processor     1997                   7,500,000
Intel Pentium III processor    1999                   24,000,000
Intel Pentium 4 processor      2000                   42,000,000
Intel Itanium processor        2002                   220,000,000
Intel Itanium 2 processor      2003                   410,000,000
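The growth rate implied by the table can be checked with a little arithmetic. The sketch below takes only two rows from the table (the 8008 and the Pentium 4) and estimates the average doubling time; the choice of those two endpoints is an assumption made for illustration, and other pairs of rows give somewhat different values.

```python
import math

# (year, transistor count) pairs copied from the table above
first_year, first_count = 1972, 2_500        # 8008
last_year, last_count = 2000, 42_000_000     # Pentium 4

# Number of times the transistor count doubled over the interval
doublings = math.log2(last_count / first_count)

# Average time per doubling, in years
doubling_time = (last_year - first_year) / doublings

print(f"{doublings:.1f} doublings, one every {doubling_time:.2f} years")
```

For these two endpoints the estimate comes out close to one doubling every two years.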
How to increase performance
Pipelining
  Breaking a large system into a number of stages
Instruction-level parallelism
  Software code is written serially
  Independent instructions can be executed in parallel
  A large number of functional units is required
Thread-level parallelism
  Applications are written with threads
  The operating system can have threads
  Different applications can run on different threads
How the Pentium 4 achieves high performance
Rapid execution, more pipelining stages
Out of order execution
Speculative execution
Hyper threading
Trace cache
Store to load forwarding enhancements
Pipelining
The concept of splitting a job into sub-processes in which the output of one sub-process feeds into the next.
A mechanical example of a pipeline is a washer/dryer system for clothing.
More stages mean higher throughput but also higher latency
Issue: all stages should have nearly equal delay; otherwise the slowest stage determines the clock cycle
Stages: Fetch, Decode, Execute, Write-back
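The effect of unequal stage delays can be illustrated with a small model. This is a minimal sketch under simplifying assumptions: the function name `pipeline_metrics`, the stage delays in nanoseconds, and the optional latch overhead are all hypothetical, and hazards and stalls are ignored.

```python
def pipeline_metrics(stage_delays_ns, latch_overhead_ns=0.0):
    """Return (cycle time, latency, throughput) for an ideal pipeline."""
    # The clock must be slow enough for the slowest stage to finish
    cycle = max(stage_delays_ns) + latch_overhead_ns
    # One instruction passes through every stage, one cycle per stage
    latency = cycle * len(stage_delays_ns)
    # In steady state one instruction completes per cycle
    throughput = 1.0 / cycle  # instructions per nanosecond
    return cycle, latency, throughput

# Both pipelines contain 8 ns of logic in total, split differently
balanced = pipeline_metrics([2.0, 2.0, 2.0, 2.0])
unbalanced = pipeline_metrics([1.0, 1.0, 5.0, 1.0])

print("balanced  :", balanced)
print("unbalanced:", unbalanced)
```

With the same total amount of logic, the unbalanced split has a 5 ns cycle instead of 2 ns, so both its latency and its throughput are worse.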
Superscalar Architecture
We can have a large number of functional units, but the program is serial
Will fetching multiple instructions solve the problem?
Issues
  Dependencies
  Branches
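The dependency issue can be made concrete with a toy in-order issue model. This is a sketch under stated assumptions, not a description of any real processor: the `Instr` type, the register names, and the issue width of 2 are hypothetical, and only read-after-write and write-after-write conflicts within a group are checked.

```python
from typing import NamedTuple

class Instr(NamedTuple):
    dest: str      # register written
    srcs: tuple    # registers read

def issue_groups(program, width=2):
    """Group instructions that may issue in the same cycle, in order."""
    groups, current, written = [], [], set()
    for ins in program:
        # Conflict if the instruction reads or rewrites a register
        # produced by an earlier instruction in the same group
        depends = any(s in written for s in ins.srcs) or ins.dest in written
        if depends or len(current) == width:
            groups.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        groups.append(current)
    return groups

a = Instr("r1", ("r2", "r3"))   # r1 = r2 op r3
b = Instr("r4", ("r5", "r6"))   # independent of a
c = Instr("r7", ("r1", "r4"))   # needs the results of a and b
d = Instr("r8", ("r7",))        # needs the result of c

groups = issue_groups([a, b, c, d])
print(len(groups), "issue cycles for", 4, "instructions")
```

Only `a` and `b` can issue together; `c` and `d` each wait for a result, so fetching several instructions per cycle does not help when the instructions depend on each other.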
Speculative Execution
Situation: There is a pipeline of 20 stages, and all of them are waiting for a branch to be resolved
Effect: Will the benefits of pipelining and superscalar execution vanish at branch instructions?
Solution?
Execute the instructions of both the if and the else paths simultaneously
Discard the wrong path when the result of the branch arrives
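The both-paths idea described on this slide can be sketched in software. This is only a toy model: the function `speculate_both` and the dictionary used as machine state are hypothetical, and the two paths run one after the other here rather than truly in parallel. Hardware commonly predicts one path and executes only that one; the sketch follows the both-paths description given above.

```python
def speculate_both(state, condition, then_path, else_path):
    """Run both branch paths on private copies, then keep the right one."""
    # Start both paths before the branch outcome is known;
    # each works on its own copy so neither changes the real state
    then_state = then_path(dict(state))
    else_state = else_path(dict(state))
    # The branch condition is resolved; the wrong result is discarded
    taken = condition(state)
    return then_state if taken else else_state

state = {"x": 5, "y": 0}
result = speculate_both(
    state,
    condition=lambda s: s["x"] > 3,
    then_path=lambda s: {**s, "y": s["x"] * 2},
    else_path=lambda s: {**s, "y": -1},
)
print(result)
```

The original state is left untouched until the branch resolves, which is what allows the wrong path to be thrown away without side effects.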
Thread level parallelism
Multi-processors
  Supercomputers
Chip Multi-Processing
  Dual-core chips like Intel’s Xeon
Simultaneous Multi-threading
  One processor and multiple threads
  Different from multi-programming and multi-tasking
Hyper-threading
Makes a single processor appear as multiple logical processors
Each logical processor keeps its own copy of the architecture state
The OS views the logical processors as physical processors
The logical processors share a single set of physical resources
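The split between private architecture state and shared physical resources can be sketched as follows. This is a toy model under loud assumptions: the class `LogicalProcessor`, the single accumulator register, and the round-robin sharing policy are all hypothetical simplifications, and real hardware shares its execution resources at a much finer grain.

```python
class LogicalProcessor:
    """Private architecture state: a program counter and one register."""
    def __init__(self, name, program):
        self.name = name
        self.pc = 0
        self.acc = 0
        self.program = program   # list of numbers to add to acc

    def done(self):
        return self.pc >= len(self.program)

def run_shared_core(logical_cpus):
    """One shared adder serves the logical processors in round-robin order."""
    trace = []
    while not all(cpu.done() for cpu in logical_cpus):
        for cpu in logical_cpus:
            if cpu.done():
                continue
            # The shared execution unit works on this thread's private state
            cpu.acc += cpu.program[cpu.pc]
            cpu.pc += 1
            trace.append(cpu.name)
    return trace

lp0 = LogicalProcessor("LP0", [1, 2, 3])
lp1 = LogicalProcessor("LP1", [10, 20])
trace = run_shared_core([lp0, lp1])
print(trace, lp0.acc, lp1.acc)
```

Each logical processor finishes with its own result even though only one execution unit exists, which is why software can treat them as separate processors.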
Conclusion and Future
Future processors will need more performance, which so far has meant higher clock speed
Not achievable just by shrinking device dimensions
Architectural solutions are needed
SMP and CMP will be the solution
More instruction-level parallelism can be exploited using compiler techniques
Thank You
Backup
Source Files
http://www.cse.iitd.ernet.in/ neeraj/doc
Some Definitions
Cache
  An on-chip memory with a very short access time
  Cost is higher
  Frequently required data can be placed there
Clock speed
  Expressed in MHz and GHz
  MHz: million clock cycles per second
Buses
  Data, address and control
  Bus width: number of bits that can be transferred in parallel
Block Diagram of Pentium 4