Advanced Micro Devices - Athlon
Buddy Guest, Mike Lewitt, Bill McCorkle
November 28, 2001


What Have We Seen So Far?
RISC
IA-64
IA-32

Where is the Competition?

Overview of Today’s Events
Company History
Differences in AMD Athlon Architecture
  System Bus
  Macro vs. Micro Operations
  Floating Point Operations
  Branch Prediction
  Memory Management
Comparing Processor Performance

Company History

AMD
May 1, 1969: founded as a semiconductor company
1975: 8080A and AM2900
1976: signs cross-licensing agreement with Intel
1987: AMD and Intel go to court
1991: AM386 (breaks Intel monopoly)
1992: court awards AMD full rights to produce the AM386 processor
1993: AM486
1997: AMD-K6
1999: Athlon, the first 7th-generation processor

Intel
July 18, 1968: founded as a semiconductor memory company
1971: 4004 introduced
1972: 8008 introduced
1976: signs cross-licensing agreement with AMD
1978: 16-bit 8086
1982: 286 (on-board memory)
1985: 32-bit 386
1989: 486
1993: Pentium
1997: Pentium II
1998: Celeron

Architecture Summary
AMD Approach
  Balanced approach: optimize work per clock (IPC) while raising the operating frequency at the same time.
Intel Approach
  Increased pipeline depth to handle more instructions, at the cost of lower work per clock (IPC).
  Solution: compensate with a much higher frequency to stay competitive.

Architecture Summary
Overall Improvement to Performance
Frequency Improvements
  Smaller geometries and faster transistors (“process shrinks”)
  Deeper pipelines
  Fewer gates per clock cycle
Work Per Clock Improvements
  Superscalar architectures
  Dynamic instruction schedulers
  Larger on-chip caches
  Advanced branch prediction

Architecture Summary
Clock Speed / EV6 Bus
Designed with very high clock speeds in mind
K7 has very deep buffers to enable those high clock speeds, allowing up to 72 x86 instructions in flight.
The bus transfers data on both the rising and the falling clock edge:
  100 MHz clock: 200 MHz effective bus rate
  133 MHz clock: 266 MHz effective bus rate

AMD vs. Intel comparing same clock

Architecture Summary
EV6 Bus on AMD Athlon
Scalable up to 200 MHz, yielding an effective frequency of 400 MHz
Multiprocessor support
Highest bus bandwidth (1.60 GB/s)
Intel uses a 133 MHz bus (1.01 GB/s)
[Figure panels: AMD Athlon, PIII]
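The bandwidth figures above follow from bus width times transfer rate. The sketch below is a rough check, assuming a 64-bit (8-byte) data bus on both processors, which the slides do not state. The function name `peak_bandwidth_gb_s` is illustrative only.

```python
def peak_bandwidth_gb_s(clock_mhz, transfers_per_clock, bus_bytes=8):
    """Peak bus bandwidth in decimal GB/s (1 GB = 10**9 bytes).

    bus_bytes=8 assumes a 64-bit data bus (an assumption, not from the slides).
    """
    transfers_per_second = clock_mhz * 1_000_000 * transfers_per_clock
    return transfers_per_second * bus_bytes / 1e9

# EV6 bus: 100 MHz clock, data on both clock edges (2 transfers per clock)
athlon = peak_bandwidth_gb_s(100, 2)
# Pentium III bus: 133 MHz clock, one transfer per clock
piii = peak_bandwidth_gb_s(133, 1)

print(f"Athlon EV6: {athlon:.2f} GB/s, Pentium III: {piii:.2f} GB/s")
```

Under these assumptions the Athlon figure matches the slide's 1.60 GB/s. The Pentium III result comes out slightly above the slide's 1.01 GB/s, presumably because the slide uses a different unit convention.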

Architecture Summary
Instruction Control Unit
Holds 72 MOps before assignment (1 MOp = 1 x86 instruction, so the Athlon can have 72 “in-flight” instructions)
P6 holds only 13 in-flight MOps

Architecture Summary
Execution Ports
AMD has no fewer than 9
Intel has 5, of which 2 are dedicated to memory stores

Enhanced Parallelism Inside Athlon

Micro-OPs / Macro-OPs
Athlon has 3 parallel x86 instruction decoders that translate instructions into Macro-Ops for the 72-entry ICU
Uses 2 decoding pipelines (Intel uses 1)
-Decoding common instructions (direct path)
-Decoding complex x86 instructions (vector path)
The integer scheduler holds at most 15 Macro-Ops at a time, representing up to 30 operations
It feeds 3 parallel integer execution units

Micro-OPs / Macro-OPs
Athlon Decoders: 3-Way Instruction Decoding
  Has 3 parallel decoding units
  All three are “fully capable” decoders, so any combination of instructions can go to any of its decoders
  Handles complex and simple instructions
Intel Decoders
  Has 3 parallel decoding units: 1 complex, 2 simple
  Handles Complex / Simple / Simple
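The effect of the two decoder layouts can be illustrated with a toy model. This is a sketch under strong simplifying assumptions: instructions are just `S` (simple) or `C` (complex), decoding is strictly in program order, and fetch limits, instruction lengths and the Athlon's vector path are ignored. The function names are hypothetical.

```python
import math

def cycles_fully_capable(stream, width=3):
    """Every decoder accepts any instruction: up to `width` per cycle."""
    return math.ceil(len(stream) / width)

def cycles_complex_simple_simple(stream):
    """Decoder 0 accepts anything; decoders 1 and 2 accept only simple ('S')."""
    cycles = 0
    i = 0
    while i < len(stream):
        cycles += 1
        i += 1                      # decoder 0 takes the next instruction
        for _ in range(2):          # decoders 1 and 2, in program order
            if i < len(stream) and stream[i] == "S":
                i += 1
            else:
                break               # a complex instruction waits for decoder 0
    return cycles

mixed = "SCSCSC"    # alternating simple and complex instructions
ideal = "CSSCSS"    # already in complex / simple / simple order

for stream in (mixed, ideal):
    print(stream, cycles_fully_capable(stream), cycles_complex_simple_simple(stream))
```

In this model the two layouts tie when the code already arrives in complex / simple / simple order, and the restricted layout falls behind when complex instructions are interleaved differently.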

3DNOW!

                                   3DNOW! (Athlon)    SSE (Intel)
Pipelines (parallel)               2                  2
Instructions (how wide)            2                  4
Effective instructions per cycle   4*                 4
Registers used                     3DNOW! / FPU       No FPU

Every 4-wide Intel SSE instruction is actually 2 Athlon micro-ops
*AMD takes advantage of the rising edge as well as the falling edge
**SSE cannot be used with MMX registers
MMX was developed when FPUs were not as important

3DNOW!

Each pipeline can do any instruction above.

The second pipeline can do any instruction in any group except the group the first pipeline has chosen.

3DNOW!

Conclusion of 3DNOW! vs. SSE
Both have pairing restrictions
SSE is a separate unit: implementation is more difficult, but programs have more freedom
MMX-add and prefetch instructions are slightly better for SSE
Final Conclusion: DRAW

Full Architecture Views
[Figure panels: AMD Athlon, PIII]

Looking at the ALUs

Floating Point Operations
Fully pipelined FPU
3 parallel floating point execution units, each on its own port
The Pentium III also has 3, but they sit behind only one port
The Athlon FPU can execute two 80-bit extended ops at a time
Intel can currently execute only one

Pipelining Differences
Determining the length
  Execution rate of pipeline (ALU)
  Degree of parallelism

                                 AMD Athlon    Intel Pentium III
Integer pipeline length          10            12-17
Floating point pipeline length   15            25

(AMD-Athlon)

Branch Prediction
Example:

    if (x > 0) {
        a = 0;
        b = 1;
        c = 2;
    }
    d = 3;

When x>0

Cycle  Fetch     Decode    Execute   Save
1      if (x>0)  -         -         -
2      a=0       if (x>0)  -         -
3      b=1       a=0       if (x>0)  -
4      c=2       b=1       a=0       if (x>0)
5      -         c=2       b=1       a=0
6      -         -         c=2       b=1
7      -         -         -         c=2

When x<0

Cycle  Fetch     Decode        Execute       Save
1      if (x>0)  -             -             -
2      a=0       if (x>0)      -             -
3      b=1       a=0           if (x>0)      -
4      d=3       b=1 (squash)  a=0 (squash)  if (x>0)
5      -         d=3           b=1 (squash)  a=0 (squash)
6      -         -             d=3           b=1 (squash)
7      -         -             -             d=3

Predicting x<0

Cycle  Fetch     Decode    Execute   Save
1      if (x>0)  -         -         -
2      d=3       if (x>0)  -         -
3      -         d=3       if (x>0)  -
4      -         -         d=3       if (x>0)
5      -         -         -         d=3
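The cycle counts in the three pipeline tables can be reproduced with a small model of the four-stage pipeline. This is a sketch, assuming one instruction is fetched per cycle and every stage takes one cycle. The names `STAGES` and `pipeline_table` are illustrative. For simplicity, squashed instructions are labeled from the moment they are fetched, whereas the slide marks them only once the branch has resolved.

```python
STAGES = ["Fetch", "Decode", "Execute", "Save"]

def pipeline_table(fetched):
    """Return one row per cycle; each row lists the instruction in each stage.

    `fetched` is the sequence of instructions in fetch order, one per cycle.
    """
    total_cycles = len(fetched) + len(STAGES) - 1
    rows = []
    for cycle in range(1, total_cycles + 1):
        row = []
        for stage in range(len(STAGES)):
            idx = cycle - 1 - stage   # instruction fetched `stage` cycles ago
            row.append(fetched[idx] if 0 <= idx < len(fetched) else "")
        rows.append(row)
    return rows

branch = "if (x>0)"
fall_through = [branch, "a=0", "b=1", "c=2"]                     # x > 0
mispredicted = [branch, "a=0 (squash)", "b=1 (squash)", "d=3"]   # x < 0, wrong guess
predicted_skip = [branch, "d=3"]                                 # x < 0, right guess

for name, seq in [("x>0", fall_through),
                  ("x<0 mispredicted", mispredicted),
                  ("x<0 predicted", predicted_skip)]:
    print(name, "finishes in", len(pipeline_table(seq)), "cycles")
```

In this model the misprediction costs two extra cycles, one for each wrongly fetched instruction that has to be squashed.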

Branch Prediction
AMD Athlon
  Branch Target Buffer holds 2048 entries
  Branch History Table holds 4096 entries
Intel Pentium III
  Dynamic branch predictor holds 512 entries
Approximate Correct Branch Predictions
  AMD Athlon: 95%
  Intel Pentium III: 90-92%
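To see why a few points of prediction accuracy matter, the accuracy figures can be combined with the integer pipeline lengths quoted earlier (10 stages for the Athlon, 12 at the low end for the Pentium III). The sketch below assumes a misprediction costs roughly one full pipeline flush, which is a simplification and not a figure from the slides. The function name is hypothetical.

```python
def avg_stall_cycles_per_branch(accuracy, flush_penalty_cycles):
    """Expected cycles lost per branch = P(mispredict) * penalty."""
    return (1.0 - accuracy) * flush_penalty_cycles

# Accuracy and pipeline length taken from the slides; the penalty model is assumed
athlon = avg_stall_cycles_per_branch(0.95, 10)
piii = avg_stall_cycles_per_branch(0.90, 12)

print(f"Athlon: {athlon:.2f}, Pentium III: {piii:.2f} stall cycles per branch")
```

Under this rough model the Pentium III loses more than twice as many cycles per branch, because it mispredicts more often and pays a longer flush each time.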

Memory Management
Level 2 Cache
  512 kB to 8 MB
  Runs at 1/3, 1/2, 2/3, or 1/1 of the CPU clock frequency
  External to the CPU (weakness of the Athlon)
  Intel L2: 256 kB ‘on-die’
  Intel is moving away from Slot 1 and back to a socket
  AMD will need to move to ‘on-die’ cache and socket connections to stay competitive
  Main push towards the 0.18 µm process
Level 1 Cache
  64 kB data and 64 kB instruction caches (4x Pentium III)
Scalability

Which One Is Better?
In the past (286, 386, 486)
  Performance = Frequency
In today’s world
  Performance = IPC * Frequency
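The relation Performance = IPC * Frequency means a lower-clocked chip can still come out ahead. The numbers below are hypothetical, chosen only to illustrate the trade-off, and do not describe any real processor.

```python
def relative_performance(ipc, frequency_mhz):
    """Millions of instructions per second: IPC * frequency (MHz)."""
    return ipc * frequency_mhz

# Hypothetical values, not measurements
high_ipc_chip = relative_performance(ipc=1.2, frequency_mhz=1400)
high_clock_chip = relative_performance(ipc=0.9, frequency_mhz=1800)

print(f"High-IPC chip:   {high_ipc_chip:.0f} MIPS")
print(f"High-clock chip: {high_clock_chip:.0f} MIPS")
```

Here the chip with the 400 MHz clock deficit still executes more instructions per second, which is why clock frequency alone is no longer a fair comparison.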

How else do we compare? Benchmarking

Benchmarking
Software that performs different tasks to obtain comparisons between processors.
Problems:
  Processor frequencies
  Other processes already running
  Types of programs: some programs are written to take advantage of a certain architecture.
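A minimal timing harness illustrates two of the problems listed above. It is a sketch only: the two workloads are hypothetical stand-ins for "types of programs", and taking the fastest of several runs reduces, but does not remove, interference from other running processes.

```python
import time

def best_time(task, repeats=5):
    """Run `task` several times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        timings.append(time.perf_counter() - start)
    return min(timings)

def integer_task():
    """Integer-heavy workload (illustrative)."""
    return sum(i * i for i in range(100_000))

def float_task():
    """Floating-point-heavy workload (illustrative)."""
    return sum(i ** 0.5 for i in range(100_000))

for task in (integer_task, float_task):
    print(f"{task.__name__}: {best_time(task):.4f} s")
```

Two processors can rank differently on the two workloads, so a single benchmark number rarely settles the comparison.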

Photo Editing Software

Animation Software

3D Graphics Editor

3D Gaming

Various Benchmarks

Summary
Over the past couple of years, AMD and Intel have taken different approaches.
We have gone over the main architectural differences.
We have shown how they compare.
It will be very interesting to see how the market plays out.

Questions?

