
The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler


Page 1: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

The Alpha 21264

Thomas Daniels

Other Dude

Matt Ziegler

Page 2: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Outline

•Overview

•Instruction Set Architecture

•Instruction Stream

•Data Stream

Page 3: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

System Overview

•RISC instruction set

•64-bit processor

•15 million transistors

•1st Alpha with out-of-order execution

•Speculative execution

Page 4: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Specs

•6/7-stage pipeline

•4-way integer issue

•2-way floating point issue

•Peak of 6 instructions per cycle

•Sustainable rate of 4 instructions per cycle

•Can have up to 80 instructions in process at one time

•4-way out-of-order issue

Page 5: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Specs

•4 integer execution units

•2 general purpose units

•2 address ALUs

•32 int & 31 fp registers (64 bits wide)

•48 int & 40 fp reorder registers

•80 additional int registers (copies of the other 80)

Page 6: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

RISC vs. CISC

•P3 and Athlon are CISC processors

•21264 is a RISC processor

•Today the main difference is that RISC ISAs generally do not take operands from memory: operate instructions work on registers, and only loads and stores access memory.

Page 7: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Differences from 21164

•Out-of-order issue

•Smaller pipeline

•Increased memory bandwidth

•Memory references can be accessed in parallel to caches

•One pipeline for both floating point and integer operations

•4x the bandwidth

Page 8: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Previous instruction types

•Branch

•Floating point

•Memory

•Memory/Function code

•Memory/branch

•Operate

•PALcode

Page 9: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

New instructions

•Floating point

•Cache prefetching

•MVI (motion video instructions)

Page 10: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Floating point instructions

•To better calculate square root

  •Single precision (SQRTS)

  •Double precision (SQRTT)

•Move data between floating point and integer register files

Page 11: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Prefetch instructions

•Allows the compiler to exploit the higher bandwidth

•Five instructions introduced

Page 12: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

Prefetch instructions

21264 Cache Prefetch and Management Instructions

Normal Prefetch

The 21264 fetches the (64-byte) block into the (level one data and level two) cache.

Prefetch with Modify Intent

The same as the normal prefetch except that the block is loaded into the cache in dirty state so that subsequent stores can immediately update the block.

Prefetch and Evict Next

The same as the normal prefetch except that the block will be evicted from the (level one) data cache as soon as there is another block loaded at the same cache index.

Write Hint 64

The 21264 obtains write access to the 64-byte block without reading the old contents of the block. The application typically intends to over-write the entire contents of the block.

Evict

The cache block is evicted from the caches.

Page 13: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

MVI

•Or Motion Video Instructions

•Set of Alpha processor instructions that are categorized as SIMD

•Intended to implement high quality software video encoding (MPEG-1, MPEG-2, H.261 and H.263)

Page 14: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

MVI (unpack)

•UNPKBW: Unpack bytes to words

•UNPKBL: Unpack bytes to long words
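
The unpack operations can be sketched in software. The block below is a minimal Python model, not Alpha code: the function names `unpkbw` and `unpkbl` are hypothetical, registers are modeled as 64-bit integers, and it assumes the low four (or two) bytes of the source are zero-extended into 16-bit words (or 32-bit long words).

```python
MASK64 = (1 << 64) - 1  # a register is modeled as a 64-bit unsigned integer


def unpkbw(rb: int) -> int:
    """Spread the low four bytes of rb into four 16-bit words (assumed zero-extended)."""
    result = 0
    for i in range(4):
        byte = (rb >> (8 * i)) & 0xFF   # extract byte i
        result |= byte << (16 * i)      # place it in word i
    return result & MASK64


def unpkbl(rb: int) -> int:
    """Spread the low two bytes of rb into two 32-bit long words (assumed zero-extended)."""
    result = 0
    for i in range(2):
        byte = (rb >> (8 * i)) & 0xFF   # extract byte i
        result |= byte << (32 * i)      # place it in long word i
    return result & MASK64


if __name__ == "__main__":
    print(hex(unpkbw(0x04030201)))
    print(hex(unpkbl(0x0201)))
```

Widening bytes this way gives each pixel value headroom, so later arithmetic on the words does not overflow an 8-bit lane.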

Page 15: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

MVI (pack)

•PACKWB: Truncates the four component words of the input register and writes them to the low four bytes of the output register.

•PACKLB: Truncates the 2 component long words of the input register to byte values and writes them to the low two bytes of the output register.
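
The pack operations described above can be modeled the same way. This is a hedged Python sketch with hypothetical function names (`packwb`, `packlb`); it treats a register as a 64-bit integer and assumes the unused upper bytes of the result are zero.

```python
def packwb(rb: int) -> int:
    """Keep the low byte of each of the four 16-bit words, packed into the low four bytes."""
    result = 0
    for i in range(4):
        word = (rb >> (16 * i)) & 0xFFFF   # extract word i
        result |= (word & 0xFF) << (8 * i)  # truncate to a byte, store as byte i
    return result


def packlb(rb: int) -> int:
    """Keep the low byte of each of the two 32-bit long words, packed into the low two bytes."""
    result = 0
    for i in range(2):
        longword = (rb >> (32 * i)) & 0xFFFFFFFF  # extract long word i
        result |= (longword & 0xFF) << (8 * i)     # truncate to a byte, store as byte i
    return result


if __name__ == "__main__":
    print(hex(packwb(0x1104220333024401)))
    print(hex(packlb(0xAABBCC02DDEEFF01)))
```

Note that the truncation simply discards the high bits of each element; it does not saturate, so values should be clamped first (for example with the min and max instructions) if they may exceed the byte range.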

Page 16: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

MVI (Byte & Word Min. and Max.)

•MINUB8

•MINUW4

•MINSB8

•MINSW4

•MAXUB8

•MAXUW4

•MAXSB8

•MAXSW4

Take the form

MINxxx Ra, Rb, Rc
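
A rough Python model of the lane-wise minimum and maximum follows. It assumes the mnemonics encode signedness (U or S) and lane layout (B8 for eight bytes, W4 for four words), and that each lane of Ra is compared with the matching lane of Rb to produce Rc. The helper names (`_lanes`, `_pack`, `lanewise`) are hypothetical and exist only for illustration.

```python
def _lanes(value: int, width: int, signed: bool) -> list:
    """Split a 64-bit value into lanes of `width` bits, lowest lane first."""
    mask = (1 << width) - 1
    lanes = []
    for i in range(64 // width):
        lane = (value >> (width * i)) & mask
        if signed and lane >= 1 << (width - 1):
            lane -= 1 << width  # reinterpret as two's complement
        lanes.append(lane)
    return lanes


def _pack(lanes: list, width: int) -> int:
    """Reassemble lanes (lowest first) into one 64-bit value."""
    mask = (1 << width) - 1
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & mask) << (width * i)
    return result


def lanewise(op, ra: int, rb: int, width: int, signed: bool) -> int:
    """Apply op (min or max) to each pair of corresponding lanes."""
    pairs = zip(_lanes(ra, width, signed), _lanes(rb, width, signed))
    return _pack([op(a, b) for a, b in pairs], width)


def minub8(ra, rb): return lanewise(min, ra, rb, 8, False)
def minsb8(ra, rb): return lanewise(min, ra, rb, 8, True)
def maxuw4(ra, rb): return lanewise(max, ra, rb, 16, False)
def maxsw4(ra, rb): return lanewise(max, ra, rb, 16, True)


if __name__ == "__main__":
    # 0xFF is 255 unsigned but -1 signed, so the two minima differ.
    print(hex(minub8(0xFF, 0x01)), hex(minsb8(0xFF, 0x01)))
```

The signed and unsigned variants differ only in how the lane bits are interpreted before the comparison, which is why the sketch shares one helper across all of them.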

Page 17: The Alpha 21264 Thomas Daniels Other Dude Matt Ziegler

MVI (PERR)

•Replaced nine instructions for motion estimation.

•It takes the 8 bytes packed into each of 2 quadword registers and computes the absolute difference between each pair of corresponding bytes, then adds the eight intermediate results and right-aligns the sum in the destination register.

•RESULT: Motion estimation calculations on 8 pixels in a single clock tick.
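
The PERR computation described above is a sum of absolute differences over eight byte lanes. A minimal Python sketch, assuming the bytes are unsigned pixel values and using a hypothetical function name `perr`:

```python
def perr(ra: int, rb: int) -> int:
    """Sum of absolute differences of the eight unsigned bytes in ra and rb."""
    total = 0
    for i in range(8):
        a = (ra >> (8 * i)) & 0xFF  # byte i of the first operand
        b = (rb >> (8 * i)) & 0xFF  # byte i of the second operand
        total += abs(a - b)
    return total  # at most 8 * 255, so it fits in the low bits of the result


if __name__ == "__main__":
    # Two rows of pixels that differ in the lowest two bytes only.
    print(perr(0xFF00, 0x00FF))
```

A motion estimator would call this once per 8-pixel row of a candidate block and keep the candidate with the smallest running total.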