The Alpha 21264
Thomas Daniels
Other Dude
Matt Ziegler
Outline
•Overview
•Instruction Set Architecture
•Instruction Stream
•Data Stream
System Overview
•RISC instruction set
•64-bit processor
•15 million transistors
•1st Alpha w/ out-of-order execution
•Speculative execution
Specs
•6/7-stage pipeline
•4-way integer issue
•2-way floating point issue
•Peak of 6 instructions per cycle
•Sustainable of 4 instructions per cycle
•Can have up to 80 instructions in process at one time
•4-way out-of-order issue
Specs
•4 integer execution units: 2 general purpose units, 2 address ALUs
•32 int & 31 fp registers (64 bits wide)
•48 int & 40 fp reorder registers
•80 additional int registers (copies of the other 80)
RISC vs. CISC
•P3 and Athlon are CISC processors
•21264 is a RISC processor
•Today's difference is that RISC ISAs generally do not take operands from memory.
Differences from 21164
•Out of order issue
•Smaller pipeline
•Increased memory bandwidth
•Memory references can be accessed in parallel to caches
•One pipeline for both floating point and integer operations
•4x the bandwidth
Previous instruction types
•Branch
•Floating point
•Memory
•Memory/Function code
•Memory/branch
•Operate
•PALcode
New instructions
•Floating point
•Cache prefetching
•MVI (motion video instructions)
Floating point instructions
•To better calculate square root: single precision (SQRTS), double precision (SQRTT)
•Move data between floating point and integer register files
Prefetch instructions
•Allow the compiler to exploit the higher bandwidth
•Five instructions introduced
Prefetch instructions
21264 Cache Prefetch and Management Instructions
•Normal Prefetch: The 21264 fetches the (64-byte) block into the (level one data and level two) cache.
•Prefetch with Modify Intent: The same as the normal prefetch except that the block is loaded into the cache in dirty state so that subsequent stores can immediately update the block.
•Prefetch and Evict Next: The same as the normal prefetch except that the block will be evicted from the (level one) data cache as soon as there is another block loaded at the same cache index.
•Write Hint 64: The 21264 obtains write access to the 64-byte block without reading the old contents of the block. The application typically intends to overwrite the entire contents of the block.
•Evict: The cache block is evicted from the caches.
MVI
•Short for Motion Video Instructions
•Set of Alpha processor instructions that are categorized as SIMD
•Intended to implement high quality software video encoding (MPEG-1, MPEG-2, H.261 and H.263)
MVI (unpack)
•UNPKBW: Unpack bytes to words
•UNPKBL: Unpack bytes to long words
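The unpack operations can be modelled on plain 64-bit integers. The following is only a sketch: the function names are made up for illustration, and it assumes that the low four (or two) bytes of the source are zero-extended into the word (or long word) lanes of the result, a detail the slide does not spell out.

```python
def unpack_bytes_to_words(rb: int) -> int:
    """Model of UNPKBW: spread the low 4 bytes of rb into four 16-bit words."""
    result = 0
    for i in range(4):
        byte = (rb >> (8 * i)) & 0xFF   # i-th byte of the source
        result |= byte << (16 * i)      # placed in the i-th 16-bit lane
    return result


def unpack_bytes_to_longwords(rb: int) -> int:
    """Model of UNPKBL: spread the low 2 bytes of rb into two 32-bit long words."""
    result = 0
    for i in range(2):
        byte = (rb >> (8 * i)) & 0xFF   # i-th byte of the source
        result |= byte << (32 * i)      # placed in the i-th 32-bit lane
    return result
```

For example, under these assumptions `unpack_bytes_to_words(0x44332211)` gives `0x0044003300220011`.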
MVI (pack)
•PKWB (pack words to bytes): Truncates the four component words of the input register to byte values and writes them to the low four bytes of the output register.
•PKLB (pack long words to bytes): Truncates the 2 component long words of the input register to byte values and writes them to the low two bytes of the output register.
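A small model of the two pack operations, written on plain 64-bit integers. This is an illustrative sketch with hypothetical function names; it assumes that "truncate" means keeping the low byte of each lane and that the unused upper bytes of the result are zero.

```python
def pack_words_to_bytes(rb: int) -> int:
    """Keep the low byte of each of the four 16-bit words, packed into bytes 0..3."""
    result = 0
    for i in range(4):
        word = (rb >> (16 * i)) & 0xFFFF   # i-th 16-bit word
        result |= (word & 0xFF) << (8 * i)  # truncated to a byte
    return result


def pack_longwords_to_bytes(rb: int) -> int:
    """Keep the low byte of each of the two 32-bit long words, packed into bytes 0..1."""
    result = 0
    for i in range(2):
        longword = (rb >> (32 * i)) & 0xFFFFFFFF  # i-th 32-bit long word
        result |= (longword & 0xFF) << (8 * i)     # truncated to a byte
    return result
```

Note that truncation discards the upper bits of each lane, so packing is the inverse of unpacking only when every lane already fits in a byte.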
MVI (Byte & Word Min. and Max.)
•MINUB8
•MINUW4
•MINSB8
•MINSW4
•MAXUB8
•MAXUW4
•MAXSB8
•MAXSW4
Take the form
MINxxx Ra, Rb, Rc
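The eight min/max instructions differ only in lane width (8 bytes or 4 words), signedness (U or S), and whether the minimum or maximum is kept. A sketch of that lane-wise behavior, assuming Ra and Rb are the sources and Rc the destination; the helper names are invented for illustration.

```python
def split_lanes(value: int, width: int, count: int, signed: bool) -> list:
    """Split a 64-bit value into `count` lanes of `width` bits, lowest lane first."""
    lanes = []
    for i in range(count):
        lane = (value >> (width * i)) & ((1 << width) - 1)
        if signed and lane >= 1 << (width - 1):
            lane -= 1 << width          # reinterpret as two's complement
        lanes.append(lane)
    return lanes


def join_lanes(lanes: list, width: int) -> int:
    """Pack lanes back into one integer, lowest lane first."""
    result = 0
    for i, lane in enumerate(lanes):
        result |= (lane & ((1 << width) - 1)) << (width * i)
    return result


def lanewise(op, ra: int, rb: int, width: int, count: int, signed: bool) -> int:
    """Apply op (min or max) to each pair of corresponding lanes."""
    a = split_lanes(ra, width, count, signed)
    b = split_lanes(rb, width, count, signed)
    return join_lanes([op(x, y) for x, y in zip(a, b)], width)


def minub8(ra: int, rb: int) -> int:
    return lanewise(min, ra, rb, width=8, count=8, signed=False)


def maxsw4(ra: int, rb: int) -> int:
    return lanewise(max, ra, rb, width=16, count=4, signed=True)
```

The remaining six variants follow by changing the `op`, `width`, `count`, and `signed` arguments.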
MVI (PERR)
•Replaces a sequence of nine instructions used for motion estimation.
•It takes the 8 bytes packed into each of 2 quadword registers, computes the absolute difference between each pair of corresponding bytes, then adds the eight intermediate results and right-aligns the sum in the destination register.
•RESULT: Motion estimation calculations on 8 pixels in a single clock tick.
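The PERR computation described above is a sum of absolute differences over eight unsigned bytes. A minimal sketch on 64-bit integers (the function name is an illustrative choice; each byte is assumed to be an unsigned pixel value):

```python
def pixel_error(ra: int, rb: int) -> int:
    """Sum of absolute differences over the 8 unsigned bytes of ra and rb."""
    total = 0
    for i in range(8):
        a = (ra >> (8 * i)) & 0xFF   # i-th pixel of the first block
        b = (rb >> (8 * i)) & 0xFF   # i-th pixel of the second block
        total += abs(a - b)
    return total                      # at most 8 * 255 = 2040
```

In motion estimation this value is accumulated over all rows of a candidate block, and the candidate with the smallest total is taken as the best match.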