A New Era in Processor Evolution Dezső Sima Fall 2007 (Ver. 2.2) Dezső Sima, 2007

A New Era in Processor Evolution


Page 1: A New Era in Processor Evolution

A New Era in Processor Evolution

Dezső Sima

Fall 2007

(Ver. 2.2) Dezső Sima, 2007

Page 2: A New Era in Processor Evolution

Foreword

Beginning with second generation superscalars, the continuous, approximately 10-fold-per-decade increase in processor efficiency leveled off, for reasons shown in Chapter I. Designers responded by raising clock frequencies massively, at a rate of up to 100-fold per decade, in order to sustain an approximately 100-fold-per-decade performance increase. Such rapid progress, however, inevitably ran into limits: declining processor efficiency, increasing dissipation and skew in parallel buses, as shown in this chapter. As a consequence, a decade-long era of processor evolution, characterized by massively rising clock frequencies, has ended in the last few years. The new era is heralded by multicore and multithreaded designs, as discussed in Chapters III and IV.

Page 3: A New Era in Processor Evolution

Contents

1. Processor performance
2. Efficiency of processors
3. Addressing the levelling off of processor efficiency
4. Aggressively raising clock frequency
5. The efficiency wall
6. The thermal wall
7. The skew wall
8. EPIC architectures/processors
9. The end of an era in processor evolution

Page 4: A New Era in Processor Evolution

1. Processor Performance

Page 5: A New Era in Processor Evolution

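1.1. Introduction (1)

Absolute performance

  • Number of successfully executed instructions/sec: $P_{ai} = f_c \cdot IPC_{eff}$
  • Number of successfully executed operations/sec (SIMD): $P_{ao} = f_c \cdot IPC_{eff} \cdot OPI$

with $f_c$: clock frequency, $IPC$: instructions/cycle, $OPI$: operations/instruction.

Relative performance

Relating the execution times of a benchmark program on the tested system to those on a reference system according to the following interpretation:

$P_r = \sqrt[n]{\frac{t_{ref_1}}{t_{v_1}} \cdot \ldots \cdot \frac{t_{ref_n}}{t_{v_n}}}$

with $t_{ref_i}$, $t_{v_i}$: execution times of the i-th component program on the reference system and on the tested system.

E.g.: SPECint92, SPECint_base2000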
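As a minimal sketch of the relative performance measure on this slide, the following Python function takes the runtimes of the component programs on a reference system and on a tested system and returns the geometric mean of the runtime ratios. The function name and the sample runtimes are illustrative assumptions, not data from the slides.

```python
import math

def relative_performance(t_ref, t_tested):
    """Geometric mean of the runtime ratios t_ref[i] / t_tested[i]."""
    if len(t_ref) != len(t_tested) or not t_ref:
        raise ValueError("need two non-empty runtime lists of equal length")
    # Summing logarithms avoids overflow for long benchmark lists.
    log_sum = sum(math.log(r / v) for r, v in zip(t_ref, t_tested))
    return math.exp(log_sum / len(t_ref))

# Hypothetical runtimes in seconds for three component programs.
t_ref = [1400.0, 1800.0, 1100.0]
t_v = [700.0, 450.0, 550.0]   # ratios: 2, 4, 2
p_r = relative_performance(t_ref, t_v)
print(round(p_r, 3))
```

If all runtime ratios were equal, the result would reduce to that single ratio.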

Page 6: A New Era in Processor Evolution

1.1. Introduction (2)

In general purpose applications:

$OPI \approx 1$

$IPC_{eff} = \eta \cdot IPC$

where:
IPC: issued instructions per cycle
η: ratio of successfully executed to issued instructions (efficiency of the speculative execution)

$P_a = P_{ai} = f_c \cdot IPC_{eff}$

Page 7: A New Era in Processor Evolution

1.1. Introduction (3)

In performance/efficiency studies:

  • Theoretical interpretation: $P_a$
  • Practical measurement: $P_r$

$\frac{P_{a2}}{P_{a1}} \stackrel{?}{=} \frac{P_{r2}}{P_{r1}}$

Page 8: A New Era in Processor Evolution

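1.1. Introduction (4)

If the following were true:

$\frac{t_{ref_1}}{t_{v_1}} = \frac{t_{ref_2}}{t_{v_2}} = \ldots = \frac{t_{ref_n}}{t_{v_n}} = \frac{t_{ref}}{t_v}$

In that case:

$P_r = \frac{t_{ref}}{t_v}$

$\frac{P_{r2}}{P_{r1}} = \frac{t_{ref}/t_{v2}}{t_{ref}/t_{v1}} = \frac{t_{v1}}{t_{v2}} = \frac{I/P_{a1}}{I/P_{a2}} = \frac{P_{a2}}{P_{a1}}$

I: Number of instructions in the application considered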

Page 9: A New Era in Processor Evolution

However:

Figure 1.1.: Runtime ratios of the component programs of SPECint2000

Source: http://www.spec.org

1.1. Introduction (5)

Page 10: A New Era in Processor Evolution

1.1. Introduction (6)

When comparing the performance of two systems:

$\frac{P_{a2}}{P_{a1}} \approx \frac{P_{r2}}{P_{r1}}$

This estimate is usable in trend considerations.

Page 11: A New Era in Processor Evolution

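1.1. Introduction (7)

Comparing the efficiency of two systems:

$\frac{IPC_{eff2}}{IPC_{eff1}} = \frac{P_{a2}/f_{c2}}{P_{a1}/f_{c1}} = \frac{P_{a2}}{P_{a1}} \cdot \frac{f_{c1}}{f_{c2}} \approx \frac{P_{r2}}{P_{r1}} \cdot \frac{f_{c1}}{f_{c2}} = \frac{P_{r2}/f_{c2}}{P_{r1}/f_{c1}}$

That is:

$\frac{IPC_{eff2}}{IPC_{eff1}} \approx \frac{P_{r2}/f_{c2}}{P_{r1}/f_{c1}}$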
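The efficiency comparison on this slide can be sketched in a few lines of Python: efficiency is approximated by a measured relative performance score divided by the clock frequency, and two systems are compared by the ratio of these quotients. The function names and the sample scores are hypothetical and serve only to illustrate the arithmetic.

```python
def efficiency(score, f_c_mhz):
    """Approximate efficiency as relative performance per MHz (P_r / f_c)."""
    return score / f_c_mhz

def efficiency_ratio(score_1, f_c1_mhz, score_2, f_c2_mhz):
    """Estimate IPC_eff2 / IPC_eff1 from measured scores and clock frequencies."""
    return efficiency(score_2, f_c2_mhz) / efficiency(score_1, f_c1_mhz)

# Hypothetical systems: system 1 scores 400 at 1000 MHz, system 2 scores 900 at 3000 MHz.
ratio = efficiency_ratio(400.0, 1000.0, 900.0, 3000.0)
print(round(ratio, 2))  # system 2 is faster overall but less efficient per clock
```

In this made-up case system 2 delivers 2.25 times the performance at 3 times the clock frequency, so its efficiency is about three quarters of that of system 1.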

Page 12: A New Era in Processor Evolution

1.2. Evolution of processor performance (1)

Figure 1.2: Integer performance growth of Intel’s x86 processors

(Plot: SPECint92 on a logarithmic scale, 0.2 to 10000, vs. year, 1979 to 2005. Data points run from the 8088/5 through the 80286, 386, 486, Pentium, Pentium Pro, Pentium II and Pentium III to the Pentium 4 (P4/1500 to P4/3200, Northwood B, Prescott 1M/2M). Trend: ~100x/10 years, levelling off at the Pentium 4 end of the curve.)

Page 13: A New Era in Processor Evolution

Figure 1.3: Integer performance growth (in general - 1)

Source: X86-64 Technology White Paper, AMD Inc., Sunnyvale, CA, 2000

1.2. Evolution of processor performance (2)

Page 14: A New Era in Processor Evolution

Figure 1.4: Integer performance growth (in general - 2)

Source: F. Labonte, www-vlsi.stanford.edu/group/chart/specInf2000.pdf

1.2. Evolution of processor performance (3)

Page 15: A New Era in Processor Evolution

2. Efficiency of processors

Page 16: A New Era in Processor Evolution

2.1. Introduction

$P_a = f_c \cdot IPC_{eff}$

Growth of $P_a$: ~100x/10 years. Growth of $IPC_{eff}$: ?

Page 17: A New Era in Processor Evolution

Figure 2.1: Efficiency of Intel processors

2.2. Growth of processor efficiency (1)

(Plot: efficiency, SPECint_base2000/f_c, on a logarithmic scale, 0.01 to 1, vs. year, 1978 to 2002. Data points: 286, 386DX, 486DX, Pentium, Pentium Pro, Pentium II, Pentium III. Trend: ~10x/10 years, levelling off with the 2nd generation superscalars.)

Page 18: A New Era in Processor Evolution

Figure 2.2: Growth of processor performance/efficiency (in general)

Source: J. Birnbaum, „Architecture at HP: Two decades of Innovation”, Microprocessor Forum, October 14, 1997.

2.2. Growth of processor efficiency (2)

Page 19: A New Era in Processor Evolution

2.3. Contribution of raising processor efficiency to the growth of processor performance

(up to the 2nd generation of superscalars)

Up to the second generation of superscalars, raising the clock frequency and raising the efficiency contributed in equal measure to the growth of performance.

$P_a = f_c \cdot IPC_{eff}$

$P_a$: ~100x/10 years; $f_c$: ~10x/10 years; $IPC_{eff}$: ~10x/10 years
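Because performance is the product of clock frequency and efficiency, the per-decade growth factors multiply. The short Python sketch below converts the per-decade factors quoted on this slide into equivalent annual growth rates; the helper name is an assumption made for illustration.

```python
def annual_growth(decade_factor):
    """Annual growth rate equivalent to a given growth factor per 10 years."""
    return decade_factor ** (1 / 10) - 1

f_c_rate = annual_growth(10)    # clock frequency: ~10x per decade
ipc_rate = annual_growth(10)    # efficiency: ~10x per decade
# Growth rates of the two factors compound multiplicatively.
p_a_rate = (1 + f_c_rate) * (1 + ipc_rate) - 1

print(f"f_c: {f_c_rate:.1%}/year, IPC_eff: {ipc_rate:.1%}/year, P_a: {p_a_rate:.1%}/year")
```

Two factors each growing by roughly 26% per year compound to roughly 58% per year, which corresponds to the ~100x/10 years performance trend.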

Page 20: A New Era in Processor Evolution

2.4. Sources of raising processor efficiency

  • Increasing the word length: 8/16 → 32 bit (286 → 386DX)
  • Introducing and increasing temporal parallelism: 1st and 2nd generation pipelined processors (386DX, 486DX)
  • Introducing and increasing issue parallelism: 1st and 2nd generation superscalars (Pentium, Pentium Pro)

Page 21: A New Era in Processor Evolution

2.5. Limit of raising processor efficiency (1)

Processing width of 2nd generation superscalars (wide superscalars): 4 RISC instructions/cycle, or ~3 CISC instructions/cycle

Figure 2.3: Processing width of 2nd generation (wide) superscalars vs. extent of parallelism available in general purpose applications

Source: Wall: Limits of ILP, WRL TN-15, Dec. 1990

Page 22: A New Era in Processor Evolution

Figure 2.4: Growth of processor efficiency (in general)

(Plot: SPECint_base2000/f_c on a logarithmic scale, 0.01 to 1, vs. year, 1978 to 2005. Trend: ~10x/10 years, levelling off with the 2nd generation superscalars.)

2.5. Limit of raising processor efficiency (2)

Page 23: A New Era in Processor Evolution

2.5. Limit of raising processor efficiency (3)

In general purpose applications, the width of 2nd generation superscalars already approaches the extent of available parallelism (ILP).

Hence, beginning with 2nd generation (wide) superscalars, the sources for extensively raising processor efficiency became exhausted.

Page 24: A New Era in Processor Evolution

3. Addressing the levelling off of processor efficiency

Page 25: A New Era in Processor Evolution

$P_a = f_c \cdot IPC_{eff}$

Two approaches:

  • Aggressively raising clock frequency ($f_c$): the main road of evolution (Sections 4 – 7)
  • Essentially widening the core ($IPC_{eff}$) by introducing EPIC architectures (Section 8)

Page 26: A New Era in Processor Evolution

4. Aggressively raising clock frequency

Page 27: A New Era in Processor Evolution

4.1. Sources of raising clock frequencies (1)

Raising clock frequency:

  • By reducing the logic depth of pipeline stages
  • By scaling down the feature size in the manufacturing process
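The two sources of higher clock frequency can be illustrated with a rough model in Python. It assumes the common rule of thumb that one FO4 inverter delay is about 360 ps per micrometre of drawn feature size; this constant, the function name and the sample values are assumptions that do not come from the slides, and real designs deviate from the model.

```python
FO4_PS_PER_UM = 360.0  # assumed rule of thumb: FO4 delay in ps per um of feature size

def clock_frequency_ghz(feature_size_um, fo4_per_stage):
    """Estimate f_c from the feature size and the logic depth of a pipeline stage."""
    t_fo4_ps = FO4_PS_PER_UM * feature_size_um   # delay of one FO4 inverter
    cycle_ps = fo4_per_stage * t_fo4_ps          # one stage must fit into one cycle
    return 1000.0 / cycle_ps

base = clock_frequency_ghz(0.25, 30)     # hypothetical starting point
shrunk = clock_frequency_ghz(0.13, 30)   # scaling down the feature size
deeper = clock_frequency_ghz(0.13, 15)   # additionally halving the logic depth

print(f"{base:.2f} GHz -> {shrunk:.2f} GHz -> {deeper:.2f} GHz")
```

In this model the two effects are independent and multiply: the shrink raises the frequency by the ratio of the feature sizes, and halving the logic depth per stage doubles it again.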

Page 28: A New Era in Processor Evolution

Figure 4.1: Evolution of Intel’s process technology

Source: D. Bhandarkar: „The Dawn of a New Era”, 11. EMEA, May, 2006.

4.1. Sources of raising clock frequencies (2)

Page 29: A New Era in Processor Evolution

Figure 4.2: Number of pipeline stages in Intel’s and AMD’s processors

(Plot: number of pipeline stages, 10 to 40, vs. year, 1990 to 2005. Labelled data points include Pentium (5), Pentium Pro (~12), Pentium 4 (~20), P4 Prescott (~30), Athlon-64 (12) and Conroe (14), as well as K6, Athlon and Core Duo.)

4.1. Sources of raising clock frequencies (3)

Page 30: A New Era in Processor Evolution

Figure 4.3: Max. logic depth of pipeline stages in processors (in terms of FO4)

Source: F. Labonte www-vlsi.stanford.edu/group/chart/CycleFO4.pdf

4.1. Sources of raising clock frequencies (4)

Page 31: A New Era in Processor Evolution

Figure 4.4: Growth of clock frequencies in Intel’s x86 line of processors

4.2. Growth rate of clock frequencies (1)

(Plot: clock frequency f_c in MHz on a logarithmic scale, 1 to 5000, vs. year of first volume shipment, 1978 to 2005. Data points: 8088, 286, 386, 486, 486-DX2, 486-DX4, Pentium, Pentium Pro, Pentium II, Pentium III, Pentium 4. Trends: ~10x/10 years at first, then ~100x/10 years, levelling off at the end of the period.)

Page 32: A New Era in Processor Evolution

Figure 4.5: Growth of clock frequencies (in general)

4.2. Growth rate of clock frequencies (2)

Page 33: A New Era in Processor Evolution

4.3. Implications of aggressively raising clock frequencies

4.3.1 Overview

  • Ousting of major RISC families (4.3.2)
  • Emerging limits of evolution (4.3.3)

Page 34: A New Era in Processor Evolution

Figure 4.6: The shift in performance leadership between RISC and x86 lines

4.3.2. Ousting of major RISC families (1)

Page 35: A New Era in Processor Evolution

4.3.2. Ousting of major RISC families (2)

1995-2000: CISCs took over the performance leadership. (Raising fc at a given rate is a harder task when starting from a higher value than from a lower one.)

1997: Intel and HP unveiled IA-64/Merced as the next generation architecture/processor line.

Result: cancellation of most major RISC lines, such as MIPS’s R lines, HP’s Alpha and PA lines, and the PowerPC Consortium’s PowerPC line.

Page 36: A New Era in Processor Evolution

4.3.3. Emerging limits of evolution

  • The efficiency wall (Section 5)
  • The thermal wall (Section 6)
  • The skew wall (Section 7)

Page 37: A New Era in Processor Evolution

5. The efficiency wall

Page 38: A New Era in Processor Evolution

5.1. Overview (1)

Basic reason: the speed gap between the processor and the memory (it widens at higher clock frequencies)

Page 39: A New Era in Processor Evolution

5.1. Overview (2)

Main manifestations of the speed gap between the processor and the memory:

  • Memory transfer rates
  • DRAM latencies
  • Transfer rates of processor buses
  • L2 cache latencies

Page 40: A New Era in Processor Evolution

5.2. Speed gap between processor and memory (1a)

Figure 5.1a: DRAM types

  • DRAM (Dynamic RAM): asynchronous; random access, typ. access time 60/70/80/100 ns; cycle time within a burst (for a 60 ns part): (60 ns); full burst timing: (5-5-5-5)
  • FPM (Fast Page Mode DRAM): asynchronous; access to 4 subsequent columns; cycle time within a burst: ~40 ns; full burst timing: (5-7)-3-3-3 or (5-7)-4-4-4; examples: Triton I: 7-3-3-3, Triton III: 6-3-3-3
  • EDO (Extended Data Out DRAM): asynchronous; overlapping the read and address transfer operations; cycle time within a burst: ~25 ns; full burst timing: (5-7)-2-2-2; examples: Triton I: 7-2-2-2, Triton II, III: 6-2-2-2
  • BEDO (Burst mode EDO): asynchronous; internal 2-bit address generator, dual banks; cycle time within a burst: ~15 ns; full burst timing: 5-1-1-1; developed by Micron
  • SDRAM (Synchronous DRAM): synchronous; fully pipelined operation, assuming at least dual banks; 66/100/133 MHz; cycle time within a burst: ~15/10/7.5 ns; full burst timing: (5-7)-1-1-1; examples: Triton III: 7-1-1-1, 430 ZX: 7-1-1-1
  • DRDRAM (Direct Rambus DRAM): synchronous; cached structure (internal on-chip SRAM cache, page is filled in 1 clock cycle); 1-2 B wide data path, 256/300/356/400 MHz transfer rate; cycle time within a burst: (4/3.3/2.8/2.5 ns); examples: 820, 840; developed by Rambus

Remarks: burst timings refer to burst mode access (4*8 B) on the same row (page). The level of overlapping increases from 1 (DRAM) to 6 (DRDRAM). Further annotations in the figure: up to 66 MHz bus frequency; since 1996. The figure also has rows for max. bandwidth (MB/s) and effective bandwidth (MB/s).
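The full burst timings in Figure 5.1a translate directly into an effective transfer rate. The Python sketch below assumes a 66 MHz memory bus and 8 bytes per transfer (a 4*8 B burst), with the lead-off taken as 5 cycles; the function name and the choice of bus frequency are assumptions made for the sake of the example.

```python
def burst_bandwidth_mb_s(timing, bus_mhz=66.0, bytes_per_transfer=8):
    """Effective bandwidth of one burst, e.g. timing (5, 2, 2, 2) for 5-2-2-2."""
    cycles = sum(timing)                          # bus cycles for the whole burst
    bytes_moved = bytes_per_transfer * len(timing)
    return bytes_moved * bus_mhz / cycles         # bytes * 1e6 cycles/s / cycles = MB/s

fpm = burst_bandwidth_mb_s((5, 3, 3, 3))
edo = burst_bandwidth_mb_s((5, 2, 2, 2))
sdram = burst_bandwidth_mb_s((5, 1, 1, 1))
peak = 8 * 66.0                                   # one transfer in every cycle

for name, value in (("FPM", fpm), ("EDO", edo), ("SDRAM", sdram), ("peak", peak)):
    print(f"{name}: {value:.0f} MB/s")
```

Under these assumptions even the 5-1-1-1 burst reaches only half of the peak rate, because the lead-off cycles dominate short bursts.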

Page 41: A New Era in Processor Evolution

5.2. Speed gap between processor and memory (1b)

Figure 5.1b: Latency of DRAM chips

(Plot: row access time tRAC in ns, 20 to 200, vs. year, 1981 to 2000, annotated with the processors of each period (PC, AT, 386 DX, 486 DX, P, PPro, PII, PIII), the chipsets (430 NX, 450 KX/GX, 440 BX, 815) and the typical DRAM parts (16 K to 256 M). tRAC falls from about 200 ns to about 30 ns over the period.)

tRAC: Row access time (time from row address until data valid)

Page 42: A New Era in Processor Evolution

5.2. Speed gap between processor and memory (1c)

Figure 5.1c: System-level memory latency in x86-based PCs

(Plot: system-level memory latency vs. year, 1981 to 2000, for the PC (8088), AT (286), 386 DX, 486 DX, P, PPro, PII, PIII and P4. Two curves: latency in ns (linear scale, 100 to 500 ns) and latency in processor cycles (logarithmic scale, 1 to 1000). The latency in ns decreases over the period while the latency in processor cycles rises by more than two orders of magnitude.)
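The two curves of Figure 5.1c are linked by a simple conversion: the latency in processor cycles is the latency in ns multiplied by the clock frequency. The Python sketch below uses hypothetical latency and frequency values, not the data points of the figure.

```python
def latency_in_cycles(latency_ns, f_c_mhz):
    """Convert a memory latency from ns to processor clock cycles."""
    return latency_ns * f_c_mhz / 1000.0

# Hypothetical systems: latency improves by a third, clock frequency rises 30-fold.
old = latency_in_cycles(150.0, 100.0)    # 150 ns at 100 MHz
new = latency_in_cycles(100.0, 3000.0)   # 100 ns at 3 GHz

print(old, new)
```

Although the memory in the second case is faster in absolute terms, the processor waits twenty times as many cycles for it, which is the widening speed gap the slide refers to.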

Page 43: A New Era in Processor Evolution

5.2. Speed gap between processor and memory (1d)

Figure 5.1d: Latency of DRAM chips (in clock cycles)

(Plot: memory latency in processor clock cycles, 10 to 130, vs. f_c in GHz, 0.5 to 4.0, for the 386, 486, Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4. Memory types: FPM, EDO, PC 66, PC 100, PC 133, DDR 266, DDR 333, DDR 400, DDR2 533, RDRAM-40, RDRAM-60.)

Page 44: A New Era in Processor Evolution

Figure 5.2: Relative transfer rate of memories (D: dual channel)

(Plot: relative memory transfer rate T_memory/f_c, 0.10 to 1.00, vs. f_c in GHz, 0.5 to 4.0, for the Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4. Memory types: FPM, EDO, PC-66, PC-100, PC-133, PC-800D, DDR 266, DDR 333, DDR 333D, DDR 400, DDR 400D, DDR 533D.)

5.2. Speed gap between processor and memory (2)

Page 45: A New Era in Processor Evolution

Core         fc max at intro. (GHz)   L2 size (Kbyte)   L2 latency (clock cycles)
Willamette   1.5                      256               7
Northwood    2.0                      512               16
Prescott     3.4                      1024              23

Figure 5.3: Latency of L2 caches
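To see what the cycle counts in Figure 5.3 mean in absolute terms, the following Python sketch converts the L2 latency of each core into ns at the clock frequency given in the table. The clock frequencies and cycle counts are taken from the figure; the variable and function names are illustrative.

```python
# (core, f_c max at introduction in GHz, L2 latency in clock cycles), from Figure 5.3
L2_DATA = [
    ("Willamette", 1.5, 7),
    ("Northwood", 2.0, 16),
    ("Prescott", 3.4, 23),
]

def latency_ns(cycles, f_c_ghz):
    """Latency in ns: one cycle lasts 1 / f_c ns when f_c is given in GHz."""
    return cycles / f_c_ghz

l2_latency_ns = {core: latency_ns(cycles, f_c) for core, f_c, cycles in L2_DATA}
for core, value in l2_latency_ns.items():
    print(f"{core}: {value:.2f} ns")
```

The cycle count more than triples from Willamette to Prescott while the clock frequency only slightly more than doubles, so the larger L2 caches are slower in absolute terms as well.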

5.2. Speed gap between processor and memory (3)

Page 46: A New Era in Processor Evolution

Figure 5.4: Relative transfer rates of processor buses

(Plot: relative processor bus transfer rate T_pb/f_c, 0.10 to 1.00, vs. f_c in GHz, 0.5 to 4.0, for the Pentium, Pentium Pro, Pentium II, Pentium III and Pentium 4. Bus speeds: 66, 100, 133, 400, 533, 800, 1066 MHz.)

5.2. Speed gap between processor and memory (4)

Page 47: A New Era in Processor Evolution

5.3. Efficiency of 3rd generation superscalars (1)

Figure 5.5: Efficiency of Intel’s Pentium III and Pentium 4 processors in general purpose applications

(Plot: SPECint_base2000/f_c, 0.30 to 0.55, vs. f_c in GHz, 0.5 to 4.0. Pentium III cores: Katmai (512K dir. L2), Coppermine (256K on-die L2). Pentium 4 cores: Willamette (256K on-die L2), Northwood A, B and C (512K on-die L2), Prescott (1M on-die L2), Prescott (2M on-die L2), Irwindale (512K on-die L2, 2M on-die L3). Each curve is annotated with its platform: FSB speed from 100 MHz to 800 MHz, memory from PC-100 through PC-800 RDRAM to PC-3200/PC-4300, and disk interface from SCSI-U2W and ATA-66/100 to SATA-150; HT marks hyperthreading.)

Page 48: A New Era in Processor Evolution

Figure 5.6: Efficiency of AMD’s Athlon, Athlon XP and Athlon 64 processors in general purpose applications

(Plot: SPECint_base2000/f_c, 0.30 to 0.65, vs. f_c in GHz, 1.0 to 4.0. Athlon cores: K7 (512K dir. L2, f_L2 = 0.5*f_c), K75 (512K dir. L2, f_L2 = 0.4*f_c for f_c = 750/800/850 MHz and f_L2 = 0.3*f_c for f_c = 900/950/1000 MHz), Thunderbird (256K on-die L2). Athlon XP cores: Palomino (256K on-die L2), Thoroughbred (256K on-die L2), Barton (512K on-die L2). Athlon 64 core: Clawhammer (1M on-die L2, f_FSB = f_memory). Each curve is annotated with its platform: FSB speed from 200 MHz to 400 MHz, memory from PC-100 to PC-3200, disk interface from ATA-66 to ATA-133.)

5.3. Efficiency of 3rd generation superscalars (2)

Page 49: A New Era in Processor Evolution

Figure 5.7: Main aspects of the memory subsystem affecting core efficiency

(Schematic plot: core efficiency vs. f_c in GHz. Two opposing effects are marked: decreasing core efficiency due to the broadening memory gap, and increasing core efficiency primarily due to enhancing the memory subsystem (memory, FSB, L2).)

5.3. Efficiency of 3rd generation superscalars (3)

Page 50: A New Era in Processor Evolution

Figure 5.8: Contrasting the efficiency of Intel’s and AMD’s processors

(Plot: SPECint_base2000/f_c, 0.30 to 0.65, vs. f_c in GHz, 0.5 to 4.0, for the Pentium III, Pentium 4, Athlon, Athlon XP and Athlon 64. The curves are labelled with L2 size/FSB speed, e.g. 512K/100, 256K/100, 256K/200, 256K/266, 512K/333, 256K/400, 512K/400, 512K/533, 512K/800, 1M/800, 2M/800 and 1M/fFSB.)

5.3. Efficiency of 3rd generation superscalars (4)

Page 51: A New Era in Processor Evolution

Figure 5.9: Contrasting Intel’s and AMD’s processor design philosophies

(Plot: SPECint_base2000/f_c, 0.35 to 0.80, vs. f_c in GHz, 0.5 to 4.0, for the Pentium III, Pentium 4, Pentium M, Athlon, Athlon XP and Athlon 64, with curves labelled by L2 size/FSB speed, from 512K/100 to 2M/800, plus 2M/400 and 1M/fFSB. Two groups are marked: designs preferring core efficiency and designs preferring clock frequency.)

5.3. Efficiency of 3rd generation superscalars (5)

Page 52: A New Era in Processor Evolution

Implication of the emerging efficiency wall:

Diminishing returns at higher clock frequencies

5.3. Efficiency of 3rd generation superscalars (6)

Page 53: A New Era in Processor Evolution

6. The thermal wall

Page 54: A New Era in Processor Evolution

6. The thermal wall (1)

Dissipation (D): $D = D_d + D_s$

Dynamic: $D_d = A \cdot C \cdot V^2 \cdot f_c$

Static: $D_s = V \cdot I_{leak}$

with
A: fraction of active gates
C: effective capacitance of the gates
V: supply voltage
fc: clock frequency
Ileak: leakage current
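The two dissipation terms can be evaluated with a small Python sketch. All numeric values below (activity ratio, capacitance, voltages, leakage current) are hypothetical and chosen only to give magnitudes of a plausible order; they are not taken from the slides.

```python
def dynamic_dissipation(a, c_farad, v, f_c_hz):
    """D_d = A * C * V^2 * f_c, in watts."""
    return a * c_farad * v ** 2 * f_c_hz

def static_dissipation(v, i_leak_amp):
    """D_s = V * I_leak, in watts."""
    return v * i_leak_amp

# Hypothetical chip: 10% of gates active, 100 nF effective capacitance, 3 GHz, 20 A leakage.
d_dyn = dynamic_dissipation(0.1, 100e-9, 1.4, 3e9)
d_stat = static_dissipation(1.4, 20.0)
# Same chip with the supply voltage lowered from 1.4 V to 1.2 V.
d_dyn_low_v = dynamic_dissipation(0.1, 100e-9, 1.2, 3e9)

print(f"dynamic {d_dyn:.1f} W + static {d_stat:.1f} W = {d_dyn + d_stat:.1f} W")
print(f"dynamic at 1.2 V: {d_dyn_low_v:.1f} W")
```

Because the supply voltage enters the dynamic term quadratically, a modest voltage reduction lowers dissipation far more than the same relative reduction in clock frequency would.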

Page 55: A New Era in Processor Evolution

6. The thermal wall (2)

Figure 6.1: Chip dynamic and static power dissipation trends

Source: N. S. Kim et al., „Leakage Current: Moore’s Law Meets Static Power”, Computer, Dec. 2003, pp. 68-75.

Page 56: A New Era in Processor Evolution

Source: Solie D., „Technology Trends”, Aug. 2006, http://www-03.ibm.com/procurement/proweb.nsf/objectdocswebview/file14+-+darryl+solie+-+ibm+power+symposium+presentation/$file/14+-+darryl+solie-ibm-power+symposium+presentation+v2.pdf

Figure 6.2: Dynamic and static power dissipation trends

Page 57: A New Era in Processor Evolution

Figure 6.3: Relative dissipation of Intel’s x86 family of processors

(Plot: dissipation per die area, D/die area in W/cm², on a logarithmic scale, 2 to 100, vs. f_c in MHz on a logarithmic scale, 20 to 5000. Cores in order of rising clock frequency: P5, P54C, P54CS, P6, Klamath, Deschutes, Katmai, Coppermine, Tualatin, Willamette, Northwood, Prescott, manufactured in processes from 0.8μ down to 0.09μ.)

6. The thermal wall (3)

Page 58: A New Era in Processor Evolution

Figure 6.4: Contrasting the evolution of Intel’s and AMD’s processor lines with the thermal wall

(Plot: SPECint_base2000/f_c, 0.35 to 0.80, vs. f_c in GHz, 0.5 to 4.0, for the Pentium III, Pentium 4, Pentium M, Athlon, Athlon XP and Athlon 64, with curves labelled by L2 size/FSB speed as in Figure 5.9. The thermal wall is marked as a boundary in the plot; its position depends on core design and technology.)

6. The thermal wall (4)

Page 59: A New Era in Processor Evolution

Figure 6.5: Intel’s P4 processor family (Netburst architecture)

(Timeline, 2000 to 2005, of the Xeon MP line, the Xeon DP line, the desktop line (including the Extreme Edition) and the Celeron line (value PCs). For each core the figure gives the introduction date, feature size and transistor count, FSB speed, clock frequencies, on-die cache sizes and socket. Cores shown: Willamette, Northwood-A, Northwood-B, Northwood-C, Prescott, Prescott-F, Willamette-128, Northwood-128, Celeron-D, Foster, Prestonia-A, Prestonia-B, Prestonia-C, Nocona, Irwindale-A, Irwindale-B, Irwindale-C, Foster-MP, Gallatin and Potomac, plus Jayhawk and Tejas (both cancelled 5/04). Legend: cores supporting hyperthreading; cores with EM64T implemented but not enabled; cores supporting EM64T.)

6. The thermal wall (5)

Page 60: A New Era in Processor Evolution

Figure 6.6: The growth of relative dissipation of processors (in general)

Source: R. Hetherington, „The UltraSPARC T1 Processor”, White Paper, Sun Inc., 2005

6. The thermal wall (6)

Page 61: A New Era in Processor Evolution

6. The thermal wall (7)

Implications of the thermal wall:

  • The approach of increasing performance by aggressively raising clock frequency has met the thermal wall.
  • Processor designs now focus more and more on power-aware techniques.

Page 62: A New Era in Processor Evolution

7. The skew wall

Page 63: A New Era in Processor Evolution

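Reason: skew between the bit lines of parallel buses

Figure 7.1: Skew between lines of parallel buses (the signals on bit lines 0 to 63 arrive at different times; the difference is the skew)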

7. The skew wall (1)

Page 64: A New Era in Processor Evolution

Figure 7.2: Equalizing skews among different bit lines of the processor bus on the MSI 915G Combo motherboard

7. The skew wall (2)
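Why skew becomes a wall at high transfer rates can be sketched numerically in Python. The example assumes a signal propagation delay of about 6.5 ps per mm of board trace, a typical order of magnitude for printed circuit boards; this constant, the function names and the trace mismatch are assumptions and do not come from the slides.

```python
PS_PER_MM = 6.5  # assumed propagation delay of a board trace, in ps per mm

def skew_ps(length_mismatch_mm):
    """Arrival time difference between two bit lines of unequal length."""
    return length_mismatch_mm * PS_PER_MM

def bit_time_ps(transfer_rate_mt_s):
    """Duration of one transfer, in ps, at the given rate in MT/s."""
    return 1e6 / transfer_rate_mt_s

def skew_fraction(length_mismatch_mm, transfer_rate_mt_s):
    """Share of the bit time consumed by the skew."""
    return skew_ps(length_mismatch_mm) / bit_time_ps(transfer_rate_mt_s)

slow = skew_fraction(20.0, 66.0)    # 20 mm mismatch on a 66 MT/s bus
fast = skew_fraction(20.0, 800.0)   # the same mismatch on an 800 MT/s bus

print(f"{slow:.1%} vs {fast:.1%} of the bit time")
```

The same physical mismatch that is negligible on a slow bus eats a tenth of the bit time on a fast one, which is why trace lengths have to be equalized as in Figure 7.2.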

Page 65: A New Era in Processor Evolution

7. The skew wall (3)

Implication of emerging skews between bit lines of parallel buses:

Introducing serial (sequential) buses (also in slow peripheral buses, due to impressive cost savings)

Figure 7.3: Signal transfer over a sequential bus (differential signal pair D+/D-, the polarity of which encodes "0" and "1")

Page 66: A New Era in Processor Evolution

Implication of emerging limits of evolution

The approach of aggressively raising clock frequencies met the efficiency, thermal and skew walls and thus reached a dead end.

Page 67: A New Era in Processor Evolution

8. EPIC architectures/processors

Page 68: A New Era in Processor Evolution

8. EPIC architectures/processors (1)

$P_a = f_c \cdot IPC_{eff}$

Two approaches:

  • Aggressively raising clock frequency ($f_c$): the main road of evolution (Sections 4 – 7)
  • Essentially widening the core ($IPC_{eff}$) by introducing EPIC architectures (Section 8)

Page 69: A New Era in Processor Evolution

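Principle of superscalar processing: the instruction stream may contain dependent instructions; the processor resolves the dependencies dynamically and forwards the instructions to the FEs.

Principle of VLIW processing (VLIW: Very Long Instruction Word): each instruction word contains only independent instructions (static dependency resolution), which the processor forwards to the FEs.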

Figure 8.1: Contrasting the principles of operation of superscalar and VLIW processors

8. EPIC architectures/processors (2)
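Static dependency resolution, as contrasted with the dynamic resolution of superscalars in Figure 8.1, can be illustrated with a toy scheduler in Python. The sketch packs instructions into fixed-width instruction words so that no word contains an instruction depending on another one in the same word. The `Instr` type, the greedy strategy and the sample program are assumptions for illustration; the sketch considers read-after-write dependencies only and assumes every register is written once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # register written by the instruction
    srcs: tuple = ()   # registers read by the instruction

def schedule_bundles(program, width=3):
    """Greedily pack instructions into instruction words of at most `width` slots."""
    bundles = []       # each bundle holds mutually independent instructions
    ready_at = {}      # register -> index of the first bundle that may read it
    for ins in program:
        # An instruction must follow the bundles that produce its operands.
        slot = max((ready_at.get(r, 0) for r in ins.srcs), default=0)
        while slot < len(bundles) and len(bundles[slot]) >= width:
            slot += 1  # skip bundles that are already full
        while len(bundles) <= slot:
            bundles.append([])
        bundles[slot].append(ins)
        ready_at[ins.dest] = slot + 1
    return bundles

program = [
    Instr("load a", "a", ("x",)),
    Instr("load b", "b", ("y",)),
    Instr("add c", "c", ("a", "b")),
    Instr("load d", "d", ("z",)),
    Instr("mul e", "e", ("c", "d")),
]
for i, bundle in enumerate(schedule_bundles(program, width=2)):
    print(i, [ins.name for ins in bundle])
```

In a VLIW or EPIC design this kind of analysis is done ahead of time by the compiler, so the hardware needs no dependency-checking logic for the instructions within one word.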

Page 70: A New Era in Processor Evolution

8. EPIC architectures/processors (3)

VLIW → EPIC

EPIC: Explicitly Parallel Instruction Computing

EPIC is an enhanced VLIW (integration of advanced superscalar features):

  • branch prediction
  • explicit cache control
  • ...

1994: Intel, HP
1997: EPIC designation
2001: IA-64 Itanium

Page 71: A New Era in Processor Evolution

(Timeline, 2001 to 2005, of the multiprocessor (MP) and dual processor (DP) lines: Itanium (Merced, 5/01), Itanium 2 (McKinley, 7/02) and several Itanium 2 (Madison) versions introduced between 6/03 and 7/05. For each core the figure gives the feature size and transistor count, FSB width (64-bit or 128-bit) and transfer rate (266 to 667 MT/s), clock frequencies, and L2/L3 cache sizes.)

Figure 8.2: Overview of Itanium cores

8. EPIC architectures/processors (4)

Page 72: A New Era in Processor Evolution

(Plot: SPECint_base2000/f_c, 0.4 to 1.0, vs. f_c in MHz, 500 to 2000. Itanium: 64-bit FSB/266 MT/s, 96K L2 with 2M or 4M dir. L3. Itanium 2: 128-bit FSB/400 MT/s, 256K L2 with 3M, 6M or 9M L3, DDR 266.)

Figure 8.3: The efficiency of Itanium processors

8. EPIC architectures/processors (5)

Page 73: A New Era in Processor Evolution

Figure 8.4: Expected spreading of the IA-64 architecture (Itanium processors)

Source: L. Gwennap: Intel’s Itanium and IA-64: Technology and Market Forecast, MDR, 2000

8. EPIC architectures/processors (6)

Page 74: A New Era in Processor Evolution

Figure 8.5: Revenue expectations concerning Intel’s Itanium line

8. EPIC architectures/processors (7)

Page 75: A New Era in Processor Evolution

In general purpose applications, EPIC architectures/processors play a decreasing role.

8. EPIC architectures/processors (8)

Page 76: A New Era in Processor Evolution

9. The end of an era in processor evolution

Page 77: A New Era in Processor Evolution

9. The end of an era in processor evolution (1)

In general purpose applications, beginning with the 2nd generation superscalars, processor efficiency leveled off. Both approaches to addressing the leveling off of efficiency met limits of evolution and thus reached a dead end.

Single-core complex superscalars are at the end of an era.

Page 78: A New Era in Processor Evolution

9. The end of an era in processor evolution (2)

A new era in processor evolution: the dawn of multicore, multithreaded processors

Available hardware complexity continues to increase exponentially (Moore’s law): complexity doubles roughly every 24 months.

The number of processors (cores) will therefore also double roughly every 24 months.
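The doubling rule on this slide is a plain exponential, which the following Python sketch evaluates. The starting point of two cores and the function name are assumptions for illustration only.

```python
def cores_after(months, initial_cores=2, doubling_period_months=24):
    """Projected core count if the number of cores doubles every ~24 months."""
    return initial_cores * 2 ** (months / doubling_period_months)

for years in (0, 2, 4, 10):
    print(years, "years:", cores_after(12 * years))
```

Under this assumption a dual-core starting point grows to 64 cores within ten years, which mirrors the growth of available hardware complexity itself.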

Page 79: A New Era in Processor Evolution

Figure 9.1: Rapid spreading of multicore processors, as revealed by Intel

9. The end of an era in processor evolution (3)