Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
HOT CHIPS III SYMPOSIUM
An 80MHz 54-Bit
Floating Point RISC Processor
with Direct DRAM Support
IC:ReTECHNOlOGY.INC
Presentation Outline
- MICRON FRISC PROJECT
- SYSTEM DESIGN I TARGETAPPLICATIONS
- FRISC CHIP ARCHITECTUREAND DESIGN
- SOFTWARE SYSTEM
What is the Micron FRISC Project?
.EBlSQ - Stands for: Floating Point Reduced Instruction Set Computer
Objectives:
1. Design and Develop the World's Fastest Single Chip Processor2. Construct Systems From Processor Chip:
(SS) Scalar Supercomputer (256 MByte - 1 GByte)(MP) Multi-Processor 8 - 32 Processors (256 MByte - 1 GByte)(LSP) Large Scale Parallel Processor 64 - 256 Processors
(32 MByte - 128 MByte per Node)(RTW) Real Time Unix Workstation(RTL) Real Time Unix Laptop
3. Construct Embedded Control Board Level Products
performance Measyres:Peak Performance:
160 MFlop - 160 million Floating Point Operations per secondpeak rate for single precision IEEE format (32 bit)
80 MFlop - 80 million Floating Point Operations per secondpeak rate for double precision IEEE format (64 bit)
Sustained Performance:UNPACK Benchmark 38 - 42 MFlop, Double Precision
75 - 80 MFlop, Single Precision
leRCTUHClLClGY.1IC
FRlse SS - Scalar Supercomputer System Block Diagram
3.2
MICRONHOSTUNIX
PLATFORM
DRIVE
...._LA_N..... ETHERNETFOOl
leRC
FRISC SS - Scalar Supercomputer System Block Diagram
Fiii::::::::::::::::::::::::::::::::::::: 3wtr
F·......·····..··..··....··..···..···..··3........................................................................•.......Ylr
;. ........•......... .................. • l- .................. .................. .!.................. .................. $ $ $ $ .................. ..................-. .................. ..................-. .0 ..l- .•................ .1;a .................. ...............•..
D U 0'...........:..:........::....... •.................
セN........••..••.... ••.•.............. • Ie .................. ...............•.. .!.................. •....•............ ..............•... ................... .................. ...•.............. • l- .................. ..................
••..................•................. .................. .................... ...:.....................................•........... • セ .................. ..•............... .!;a •....•...........• .................. ..................セN
........••.•..•... .................. • $ • .f.................................... $ .................:.............................:..............:......................... ........•......... .;a. .................. .•...•.•.......... •
.0 0 0' Ie .................. ........•......... .!.................. ..................セN
.................. ..................• l- .................................... .1..........•....... .................. .................. ..................セN
........•..•...... .................. • Ie .................. ..•.•............. Nセ.................. •.................•.........••...... .................. ..................................... ..................•..•••.•.......... • l- •................. ......•...........••.................. ..................;. ..........•....... .................. • $ $ l- .................. .................. .!1.................. .................. .................. ..................
ii· .................. •................. II ,0 0 D Ie .................. .................. .Il!.................. ...•......•....•.................... ........•......... ......................•.......•••...セN .................. ..................• l- .................................... Nセ.................. ........•......... .................. ..................a· ........•......... ...•........•..... • • .................. .................. ."..............•... ....•............. ...............•.. ..................i· .................. ........•......... • . 0 l- .................. .................. Nセ.................. ..................セN
.................. ..................• $ $ $ $ Ie .................. .................. Nセ.................. .................. ......................•.............n
Ji iii::::::::::::::::::::::::::::::::::::: ., F·····....··..··..·....·..····..···..··..3..•.............................................................................
II:RO
FRlse MP - Multi-Processor System Block Diagram
MASTER SLAVE SLAVE
\ FAlSeFRlse FRISC
(256M -1G) (256M - 1G) (256M ·1G)
MSTR SlV7 SlV6 SlV5 SLV4 SlV3 SlV2 SlV1 SlVOen en en en en en en en:I
セ :I セ セ
セセ セ... ...
2 g セ 2«
2 セ 2 セ 2 セ 2 2 セ 2 セ> > ... > > > > > :>en en en en en en en
1 f f 1 f 1 f 1 f 1 f 1 t -I t 1 t I1 1 1 1 1 1 1 1
SHARED SRAM MEMORY IN FRlse INPUT/OUTPUT SPACE (SM)
II:ROセ N i c N
3.3
3.4
FRlse LSP - Multi-Mode Large Scale Parallel Processor
leRe
Target Applications
scientific and Technical Users:
1. Electronic simulation - analog and switch level simulationSPICE, SIlDS, COSMOS, VERIlDG, MSC/EMAS
2. ProcessSimulation -Supreae,pisces
3. Mechanical simUlation - finite elementand finite differenceNastran, Patran, ADINA, CINDA, DYNA3D
4. Fluid Dynaaics - Navier stkes EquationsFIDAP, (ADM, ARC3D, F1D52, OCEAN, SPEC77)
5. Mathe...tics -IMSL, LINPACK, EISPACK, SPARSEPACK, Mathematica
6. optical DesignACOS5
7. 3D Graphics and RenderingWavefront Technologies
8. ComputationalChemistry(BOHA, IIDG, QCD, TRFD)
9. l_ge and signal Signal processinc:J-Radar, Sonar, speechprocessing,l ...ge compression
10. Medical l_ging-MRI l ..ge Reconstruction
FRISC Chip Unique Architecture Features
• Processor bus unit directly supports a 32 way. bank interleaved. DRAM memorysystem and inpuVoutput memory
• Block transfer load/store operation permits concurrent execution unit and dataload store unit operation
• Vector interrupt unit supports premptable and non-premptable interrupts withinterrupt latency as short as 2 machine cycles
• Single cycle floating point units support 32 bit. 64 bit. fixed point and 32 bit. 64 bitIEEE floating point formats and chain mode operation in floating point
• Eight fully associative instruction buffers auto load with a least recently usedreplacement strategy. and permits concurrent execution and instructionbuffer load ッ ー セ イ 。 エ ゥ ッ ョ
• Pipeline interlocks and vector interrupts supported in hardware
leRCTECM«llOGY.1Nt
Instruction Set Summary (10F2)
131-10
セセセセセセセセセセnセセセセセセZセセ]・セュセセセNセnセo
I OPCODE I RSI I RD I IMMEDIATE r-- 21
iセZZ[ odャGBヲエョMeセ[eャャャqZGMGMMB[Rtセ MMMNN[[NLセセセ --::;:001";"';'-r-.-'-"'I::=l,'''-UIo' " RSI RD RS2 COUNT セRS
o
ilセ⦅ッ⦅イ」ッ⦅de⦅nli __RS_I_1- I_M_M_ED_I_ATE r-24
leRC........,.K. 3.5
Instruction Set Summary (20F2)
I. LMcI MllI &lor. In.ructlona
Theload 8Ild _.In_ClDftIiIl 01:-lI¥l., hdwonl._d.1llld セ load Illld_.
apar........M セ i n 、 X n i ⦅ B B B
L I- T".. LNd 8Ild SIDfe InIInIr:IiDftaI. l.oedIlyIe
セ Z エ Z Z Z Z ] セ:: t:::=:-UlIaIgned6. l.oed00IlIIle-.f7. SlDfeByl.I. SIDfeH8IIwMIt. SlDNWonlャ o N s ャ d n セ
セ NlIuclIiDna_balIlI-T"..8IldR-T"...
l i M t ケ ー ・ セ B B G M B
I. Add ...........2. Add UnIigned3. SUlMrIll:l --...Illld Compule CondIl_. aogMd4. SUlMrIll:l ..-....IllldCompule e.-.s. UftIIlIMd5.1wJ ...........6. Or ............7. Ea-Or...........
3. ConVoIT.lltla'" InlllrUCllonaConlrolwMSfer ..Sl_c:neng. セ_ cI • pr......T1_.__ 1flCUIe:
-Jump- Jumpend link ... cab- CondIl__ brand! Oft ALU ...• UnclondaIDNll .eldl
L J- TypeCOrllIaI T.....IMWI:IiDna
I. Jump2. JumpMd....
b.R-T"..COrllIaIT ......-
i ]セr・ァゥimイc. J. Type8lMdI ..........
I. CcIndlIiDneI8lMdIone.-セ ==::t:.'1:.or&lual4. CondiIianeI8lMdIonGr...lJlM5. ConlIiIoaMI8lMdI on"-T...,6. CandlIooneI8lMdIonGr...T.....7. CandlIooneI8lMdIon セ 8Ild ....
:: セ ] Z エ Z N セ M Z セ X i ャ 、 ....10. Cond.1DNII8lMdIono....lJlM Illld .....11. ConlliIilIfteIIhncIl on..-T.....endUr*12.ConlMiDMIlhncIlonGr_lJlM or セ end.....
4. l M t ケ ー ・ l d n セ B B B B G BL lAIId inIlMdiII. セ _ UMd
II) tarm...,.,32...__ 1Iom
....... ⦅ i ョ セ Q i ゥ i i i
...Or .........inIIruI:IIDft,
1.l.oed",,*.........
5.......' ...........
1. SpIciaI r ・ ァ i i イ ・ イ セ L-'2. Spea.IRegIIl. セ SlDN
6. S,....ConInII...........
,. HOP2. \/1CIDrel"mndIIiOrleI-"-' _11li13. Normal Enllfapl4. c....PcoeS. SU" P'C1g,-""",T_6 s.-- Aetelee-IW N | O ⦅ 「 ᄋ N G ⦅ セ ...tmPl
ICRC
Micron FRISC Supercomputer ChipFunction Block Diagram
oACe
t !BUSU
IA IOl OA lOS
f J セ 6
+ I +IA IOl 105 IOl
PCU FETCH14- FETCHIFU reRA RF EXU
Ol"HIA セ MIA wpQセ
(RO)STAll I (RS1HRS2) I IRO) I (AS1) (RS2 I IRSn1RS2) (110:
1· r i .- TI
I I (RS1)(RS2
BTRUセra
OLSU
セ DADl
WP2 WP1 SCRBRD SCR8RO STALL WP2
t I • II
3.6 ICRC
leRD
Unique Bank Interleaved Main Memory SystemProvides 640 MByte/Second Burst Transfer Rate
MEMORY ADDRESS
DRAM COlUMN ADDRESS
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 1514 13 12 11
[C.IJ I11101 BNK IX I DRAM ROW ADDRESS I
01...-. 1
10 9 8 7 6 5 4 3 2 1 0
CIIIIJJI IMBC I BYTE I
'MBC (MAPPER)110SPACE
SRAMIEPROM)
DBAMSPACE
BOOT
(
-BANK3
BANK2
BANK1
BANKO
STACK
---+ DATA2
---+ DATAIBSS
---+ TEXTTEXTDATA
PROCESS?
STACK
MAPPED ADDRESS
VECTORINTERRUPT
REGION1 G'GA·BmMEMORY AppRESS pEFINITION"
leRe 3.7
Bus Interface Unit
IC:Re
RASB3.RASBO}CASB3-BORDFIWRKB EXT. BUSRDIWRB SIGNALS
MSBLBOEB
OVERRNGUNDRNG
AD31-ADO ] EXT. BUSAC1-ACO SIGNALSBNK1-BNKO
DONEBJ ....---4
CONF.REG.
BUSYXDRV
OEB
IDL
BUSARBO
1A31.IAO MEM. CONF. PARAM.
IAC1·IACODA31·DAO ADDPORTODAC1·DACO
RESET--..-t----------t-----o+-t
CLK---"---------f------+-+-f
Instruction Fetch Unit
RST1 ClK1(NIA)
A031·A02 RSTO ClKO
J l セ lHITP HITP
DONEBP I DONEBP
\7LRUXLUX
BRESET aSQセ
L.A5-3TG2·0 TGR2-o TGWR-O -
A031-6 AD5-3 OLAT -RST1 CLK1
セ セRESET MCLK
セ tgrRセ DLAT 14-TGWR-O セM
Mセ AD5-2 L.A5-3
131-0 IBUFX
OLAT 14-.... TGR2-o TGWR-O セ----. AD5-2 L.A5-3BRESETHITPB
FETCH FETCH
YBUFX
BROOBGNTOBUSYO
1DL63-0
3.8 II:Reセ N i i i c N
Vector Interrupt Unit StateTransition Diagram
Floating Point UnitsChained Mode Operation
10000SURIW PORTS)
FPAD01
FPAQD2
IC:Re
3.10
Fast Newton Method for Divides andSquare Root Computations
RECIPROCAL ITERATIONS
COMPUTATION XR OPERAND XS OPERAND COMMENTS
1. X(O) c seed lookup
2. p(O) = cX(O) X(0) c mixed precisionwithfeedback• 0
3. X(1) - X(O)(2 - p(O» x(O) p(O) mixed precisionwithfeedback• 0
4. pel) = cX(l) X(l) c mixed precisionwithfeedback- 0
5. x(2) - X(l)(2 - p(l» x(l) pel) mixed precisionwithfeedback- 0
6. p(2) ., cX(l) X(l) c fixed point, full cLSB's of x(l)
7. p(2) - cx(1) x(1) c mixed precisionwithfeedback from step 6.
8. X(3) - X(2)(2 - p(2» x(2) p(2) fixed point, full p(2)LSB's of x(2)
9. x(3) ., X(2) (2 - p(2» x(2) p(2) mixed precisionwithfeedback from step 8.
leRelECMlOl.OQY•..c.
Fast Newton Method for Divides andSquare Root Computations
RECIPROCAL COMPUTATION OF xC"l):
where: xCa+1). X(.) (2 - p(m»pCa) - cx(.)pCa) - 2**Calpha) (1.fp), fp - p(n-2)p(n-3)••• p(O)alpha • exponentvalue of pea)
xc.) • (-1)**sx 2**(xexp - bias)( 1.fx)xexp - biasedexponentof x(a)bias • IEEE bias value 1023 for double and 127 for single
precision floating pointfx - x(n-2)X(n-3)••• x(O), x(.) fraction field
CASE 1. pea) exponentvalue (alpha) - セャN _. -
x("1) - « 1.0p(n-2) p(n-3)•••p(O) ) ( 1.x(n-2)x(n-3)••• x(O)
+ 2**(-n)( 1.x(n-2)x(n-3)•••X(O) » «-1)**sx) 2**(xexp - bias)
CASE 2. pea) exponentvalue (alpha) • 0
X("1) • « P'("n-2f. p(n-3)•••p(O) )( 1.x(n-2)x(n-3)...x(O)
+ 2** (-n+2)( 1.x(n-2)x(n-3)•••X(O) » «-1)**sx) 2**(xexp - bias -1)
leRe
Fast Newton Method for Divides andSquare Root Computations
RECIPROCAL SQUARE ROOT ITERATIONS
COMPUTATION XR OPERAND XS OPERAND COMMENTS
1. x(O) c seed lookup
2. q(O) - X(O)X(O) X(O) X(O) aixed precisionwithfeedback- 0
3. p(O) - cq(O) q(O) c aixed precisionwithfeedback- 0
4. x(l) - .5X(0)(3 -p(O» X(O) p(o) aixed precisionwithfeedback- 0
5. q(l) - x(l)X(l) X(l) x(l) aixed precisionwithfeedback- 0
6. pIll - cq(l) q(l) c aixed precision withfeedback- 0
7. X(2) - .5X(1)(3 - ー・ャᄏセ X(l) p(1) aixed precision withfeedback- 0
8. q(2) - x(2)x(2) x(2) x(2) fixed point, full x(2)LSB's of x(2)
9. q(2) - x(2)X(2) X(2) x(2) aixed precisionwithfeedback fro. step 8.
10. p(2) - cq(2) q(2) c fixed point, full cLSB's of q(2)
U. p(2) - cq(2) q(2) c aixed precisionwithfeedback froa step 10.
12. X(3) •• 5X(2)(3 - p(2») X(2) p(2) fixed point. full p(2)LSB's of x(2)
13. x(3) ••5X(2)(3 - p(2») X(2) p(2) aixed precision withfeedback froa step 12.
IC:ROQ ャ セ N Q n c N
Fast Newton Method for Divides andSquare Root Computations
RECIPROCAL SQUARE ROOT COMPUTATION OF x(m+l):
fx K
x(m+l)p(m)pCm)alphaxClII) ..xexpbias ..
where: • .5x(m)(3 - p(m»• cx(m)**2• 2** (alpha)(l.fp), fp. p(n-2)p(n-3)••• P(O)• exponentvalue of p(m)
(-l)**sx 2**(xexp - biaa)( l.fx)biasedexponentof x(m)IEEE bias value 1023 for double and 127 for single
precision floating pointx(n-2)x(n-3)••• X(O), x(m) fraction field
CASE 1. p(m) exponentvalue (alpha) - 0
xCm+l) .. « l.p(n-2) p(n-3)•••p(0) )( l.x(n-2)x(n-3)•••X(0)
+ 2**(-n+l)( l.x(n-2)x(n-3)••• x(0) » «-l)**sx) 2**(xexp - bias -1)
CASE 2. p(m) exponentvalue (alpha) • -1
SUBCASE 1. p(O) .. 0
X(m+l) .. « 1.00p(n-2) p(n-3) •••p(l) )( 1.x(n-2)x(n-3)•••X(0) )
+ 2** (-n-l)( l.x(n-2)x(n-3)•••x(0) )} «-l)**sx) 2**(xexp - bias)
SUBCASE 2. p(O) - 1
X(a+l) - « 1.00P<n=2rP<D=3)•••P7l) )( 1.x(n-2)x(n-3)••• x(0) }
+ 2**(-n)( 1.x(n-2)x(n-3)•••X(0) )} «-l)**sx) 2**(xexp - bi••) IC:RO3.11
Daxpy Scatter/Gather
This COCl8 overlaps anlhmellC and110 cycI8sby UNO.ng \he loop andaoutlle-bullenng \0l hel1lO'Sl8rlil8:
.ddlu
.d
LI: Cbt1.lrauL1r...1.1r"'l.lr_l.1cbt.l.lradd.lracld.lracld.lhcld.lcbts.lacldlacldl
cbtl.l
r'l,ICrOIr2, r2,l
rl, r3. r'l. 4r16.rl6, r'rll. rll, Jr6r20,r20. r6r22, r22,r61'24,r4,r1,4r32.r32,r16r34, r34.JrIlr36,r36.Jr20rll, rJl, r22r32,r4. r'l. 4d, 32Cr31Jr4,32 Cr4)
rl6, r3. r'l. 4
; ..,1 - 1IP1I2
rl.rlO,r12.r14 c- ."U.4I,."U.S)••.r16 c- ••·."(1)rll c- ••·."U.1Ir20 c- ••••" U.2)r22 c- •• ' •• U. 31r24, Jr26.r21. r30 <- .y U.4I,.y U.S) ...r32 c- .yUI •••·."UIr34 c- .yU.U •••·."Il.11r36 c- sy11.21 ·511(1021
; rll <- .y(1.31 • •• 11.31.yl1l ••yU.!I ••.•syU.31 c- r32•...rJl
., .2nd h.lf or loop
3.12
.cldl d,4(rl)
.cldl r3. 32Cr))
.ubcc rD. d, r 2ble LI.ddl r4. 32Cr4)
IC:RC
FRISC Chip and System Patents Pending
A Flexible MicroprocessorBus unit for the Operationof a HighSpeed Bank InterleavedDRAM Memory System.
Fast Newton Iterations for Reciprocal and Reciprocal squareRoots. (3)
Easily ConfiguraableFully Differential Fast Logic Circuit.
Universal Multiplier Accumulator.
High SpeedCMOS Driver Circuits.
Area Efficient Clock Noise Reduction in an IntegratedCircuitSubstrate.
Block TransferRegisterScoreboardUnit for· Data Processingsysteas.
High Speed, Five Port RegisterFile Having SimultaneousReadand Write and High Toleranceto Clock Skew. .
A RISC MicroprocessorPriority Vector Interrupt SystemwithPremptableand Non-PremptableOperationModes for the Directsupportof a Real Tiae operatingSystem.
High SpeedFully AssociateInstruction Buffer with Least RecentlyUsed ReplaceaentStrategyand Block Transfer Load operation.
A Bank InterleavedMemory Buffer.
A RISC Floating Point Unit that SupportsChainedMode InstructionExecutionand Hardware Pipeline Interlocking.
leRC
System Software Architecture
Sun DEC 3100 IBM RSI6OO0 VAXSPARCstatlon
I I I
FRISC SoftwareOeYelopmentEnvironment
Application LayerSunos I SVR4
FRISC ApplicationsDOSNetworking Software C server
/' r----\----TCpnp NFS TLP ... SunOSKemelI HosIDevice
FSX Operating SystemI セI OrMIrI
Network Interface Hosl Motherboard Host Interface FRISe Mothelboard
II::RC1iCHNDUlQY• INC.
Software Development Environment
...._.-......•.............. .. .1 C++ (rool end 1,_______ ._••••_ ••.1
Funaioninlining.machine independentoptimizaLions
-- inplace
••-.-.- in ー イ ッ セ ウ ウ
FRISCIL
3.13IC:RCIOUI'CC leveldcbuaer
...--._._..-._.... ⦅セ. .MMMMNMMMMNセ library inlincr iassemblycode ".--.-.;--.--.-.:
._--_..
enerator
Loop optimizations,machine dependentoptimizarions
Postpassoptimizer
3.14
Acknowledgements
FRISC Software. Hardware. Layout Team:
John MosbyGreg Perkins
Richard SchoenerChris UnreinCraig CarterLarry Balsan
View Graphs By ... Larry Cromar
leRe