SMT Verification of the POWER5 and POWER6 High-Performance Processors


1. SMT Verification of the POWER5 and POWER6 High-Performance Processors
   John Ludden, Senior Technical Staff Member, Hardware Verification
   IBM Systems & Technology Group
   IBM Power Systems, © 2008 IBM Corporation

2. Introduction to Simultaneous Multi-Threading (SMT)
   (Slide footer: IBM System p, © 2006 IBM Corporation; IBM Systems, DRAFT: IBM Confidential; © 2008 IBM Corporation, IBM Systems & Technology)
   1. What is a multi-threaded processor?
      • Essentially a processor core that executes multiple instruction streams simultaneously
      • Each thread appears to software as a virtual processor core
   2. What are the advantages of SMT?
      • More efficient utilization of silicon real estate and power: small die size increase compared to adding another core
      • Increased system throughput by utilizing processor resources that would otherwise be idle
   3. What are the disadvantages of SMT?
      • Increased complexity -> makes verification state space MUCH larger
      • SMT verification much harder than SMP
      • Possibly degrades performance of some applications

3. Examples of SMT microprocessors
   1. Video game systems
      • Sony PlayStation 3: IBM CELL processor
      • Xbox 360: IBM Xenon processor
   2. Personal computers: Intel Pentium 4 Hyper-Threading (HT) processors
   3. Servers
      • SUN UltraSparc Systems: T1 (4 threads) and T2 (8 threads)
      • HP Superdome Systems: Intel Itanium 2
      • IBM Power Systems: POWER5 and POWER6 processors

4. Overview
   1. Context: POWER5 vs. POWER6 microarchitecture comparison
   2. Verification methodology: In the beginning
   3. The times they are a changing: SMT arrives in POWER5
   4. POWER6: An in-order design should be simpler, but ...
   5. Future directions?
5. Consistent predictable delivery: IBM POWER systems
   [Figure: roadmap showing POWER4, POWER4+, POWER5, POWER5+ and POWER6 along a timeline marked 2001, 2003, 2004, 2006, 2007]

6. [Figure: POWER5 chip and POWER6 chip block diagrams]
   • POWER5 chip: two high-frequency POWER5 SMT2 cores, ~2 MB L2, 36 MB L3 controller with 36 MB L3 chip, SMP interconnect fabric, memory controller, buffer chips
   • POWER6 chip: two ultra-frequency POWER6 SMT2 cores, each with 4 MB L2, 32 MB L3 controller with 32 MB L3 chip(s), SMP interconnect fabric, two memory controllers, buffer chips

7. POWER5 Pipeline
   [Figure: pipeline timing diagram and thread resource diagram]
   • Pipeline sections: instruction fetch; group formation and instruction decode; out-of-order processing; branch redirects; interrupts & flushes
   • Stage labels: IF, IC, BP, D0, D1, D2, D3, Xfer, GD, MP, ISS, RF, EA, DC, Fmt, EX, F6, WB, CP, across the BR, LD/ST, FX and FP pipes
   • Resource blocks: program counter, instruction cache, instruction translation, branch prediction, branch history tables, return stack, alternate target cache, instruction buffer 0, instruction buffer 1, thread priority, group formation / instruction decode / dispatch, shared register mappers, shared issue queues, dynamic instruction selection, read shared register files, shared execution units (LSU0, FXU0, LSU1, FXU1, FPU0, FPU1, BXU, CRL), write shared register files, group completion, store queue, data cache, data translation, L2 cache
   • Legend: shared by two threads; resource used by thread 0; resource used by thread 1
8. High-end server: New POWER6 microprocessor
   Topology
   • Two cores on chip, a 2-way SMP
   • Core private L1s (64 KB I, 64 KB D)
   • Superscalar, SMT cores
   • Chip private 8 MB L2 cache
   • L3 32 MB off chip
   • Two-tier SMP fabric
   Technology
   • 65 nm SOI
   • 341 mm² die size
   • 10 layers of metal
   • 790 million transistors on chip
   • Frequency: 3.5, 4.2, 4.7, 5.0 GHz
   Custom & semi-custom design style
   • High frequency constraints
   3.3 M lines of VHDL

9. POWER6 core pipeline
   [Figure: pipeline diagram]
   • Pipelines: instruction fetch pipeline; instruction dispatch pipeline; BR/FX/Load pipeline; floating point pipeline; check point recovery pipeline
   • Units: BR/CR, FX, LOAD
   • Legend: pre-decode stage; Ifetch/branch stage; delayed/transmit stage; instruction decode stage; instruction dispatch/issue stage; operand access/execution stage; cache access stage; write back stage; completion stage; check point stage; FX result bypass; load result bypass; float result bypass
   • Stage labels: P1 to P4, IC0, IC1, ROT, IB0, IB1, PD, DISP, ISS, RF, AG, DC0, DC1, FMT, EX, EX1 to EX7, ECC, BHT, IFAR
10. POWER6 core
   • POWER6 processor is ~2X frequency of POWER5 (4-5 GHz)
   • POWER6 instruction pipeline depth equivalent to POWER5
     - Minimize power
     - Scale performance with frequency
   [Figure: pipeline comparison with stages Instruction Fetch, Instruction Buffer/Decode, Instruction Dispatch/Issue, Data Fetch/Execute; FXU dependent execution and Load dependent execution; labelled ~6ns/instr and ~3ns/instr]
   • POWER6 extends functionality of POWER5 core
     - 64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
     - Two way SMT with 7 instruction dispatch from 2 threads (maximum of 5 instructions per thread)
     - Decimal Floating Point Unit
     - VMX Unit (PowerPC's SIMD ISA)
     - Recovery Unit

11. Bullet-proof computing
   System reliability with recovery unit
   • Every measure possible taken to preserve application execution
     - Retry soft errors
     - Change hardware for hard errors
   [Figure: recovery flowchart]
   • Processor architected state check pointed every 1 cycle
   • ECC & non-ECC protected circuitry checked every cycle
   • Soft error case: processor restarts from last saved checkpoint
   • Hard error case: processor workload moved to another CPU
   • Decision labels: "No error found", "Error found"

12. Overview
   1. Context: POWER5 vs. POWER6 microarchitecture comparison
   2. Verification methodology: In the beginning
   3. The times they are a changing: SMT arrives in POWER5
   4. POWER6: An in-order design should be simpler, but ...
   5. Future directions?
13. POWER4/5/6 RTL verification technology
   [Figure: tool flow diagram]
   • Design flow: RTL (VHDL, Verilog), language compile, model build, cycle-based model; physical VLSI design tools / custom design
   • Formal verification: Boolean equivalence check (Verity)
   • Simulation platforms: software simulator (MESA), hardware accelerator (Awan)
   • Testbench: driver/checker, assertions, test program generator (GPRO, X-Gen), C++ testbench, constraint random unit testbench, PSL et al.
   • (Semi) formal verification (SixthSense, RuleBase)

14. Single threaded uniprocessor verification for POWER4
   Unit level: methodology inherited from POWER4
   • Driven by a combination of instruction level test cases (AVPs) created by the Genesys-Pro (GPRO) pseudo-random test generator and random C++ driven irritation
   • Instruction-By-Instruction (IBI) checking against AVP results
   • Low level microarchitecture checkers written in C++
   Processor core (aka core) level
   • Mixture of GPRO pseudo-random and directed random instruction level test cases
   • IBI checking against AVP results
   • Low level microarchitecture checkers written in C++
     - Irritation from random C++ drivers
     - Highly deterministic and architected state easily verifiable against test
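
As a rough illustration of the instruction-by-instruction (IBI) checking named on the POWER4 uniprocessor slide, the sketch below replays the results predicted by a test case and compares them with the architected state observed after each completed instruction. It is a minimal sketch under stated assumptions, not IBM's checker: the `ExpectedStep` record, the `ibi_check` function, the snapshot layout and the register names are all hypothetical, and a real AVP carries far more state than a few registers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class ExpectedStep:
    """Hypothetical per-instruction record taken from a generated test case."""
    pc: int                   # address of the instruction
    updates: Dict[str, int]   # architected registers the instruction should write


def ibi_check(expected: List[ExpectedStep],
              observed: Iterable[Dict[str, int]]) -> List[str]:
    """Compare observed state with predicted state after every instruction.

    `observed` yields one snapshot per completed instruction; each snapshot
    maps register names to values and holds the instruction address under "pc".
    Returns a list of mismatch messages (an empty list means the run passed).
    """
    errors: List[str] = []
    model: Dict[str, int] = {}   # reference architected state built from the test
    steps = iter(expected)
    count = 0
    for snapshot in observed:
        step = next(steps, None)
        if step is None:
            errors.append(f"instr {count}: design completed an unexpected instruction")
            break
        model.update(step.updates)
        if snapshot.get("pc") != step.pc:
            errors.append(f"instr {count}: pc = {snapshot.get('pc')!r}, expected {step.pc:#x}")
        for reg, value in model.items():
            got = snapshot.get(reg)
            if got != value:
                errors.append(f"instr {count} @ {step.pc:#x}: {reg} = {got!r}, expected {value}")
        count += 1
    if next(steps, None) is not None:
        errors.append(f"design completed only {count} of {len(expected)} instructions")
    return errors


if __name__ == "__main__":
    test = [ExpectedStep(0x100, {"r1": 5}), ExpectedStep(0x104, {"r2": 7})]
    run = [{"pc": 0x100, "r1": 5}, {"pc": 0x104, "r1": 5, "r2": 8}]
    for message in ibi_check(test, run):
        print(message)
```

Checking after every instruction, rather than only at the end of the test, points at the first instruction where the design diverges. That only works when the expected result is unique, which is why the slides describe this level as highly deterministic.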
15. Symmetric multi-processor (SMP) verification for POWER4
   Chip (dual-core) level
   • Test generation similar to uniprocessor via GPRO for false-sharing or non-sharing tests
     - IBI checking against AVP results for two independent instruction streams contained within a single test
     - Low level microarchitecture checkers written in C++
     - L1/L2 interactions primary focus
   • True-sharing scenarios, lock testing and storage access (weak) ordering checked
     - GPRO employed, but IBI checking of these accesses is limited or not possible: non-unique or non-deterministic results
     - CML (architecture level coherency monitor) employed to detect the right answer as a post-simulation rule check

16. Overview
   1. Context: POWER5 vs. POWER6 microarchitecture comparison
   2. Verification methodology: In the beginning
   3. The times they are a changing: SMT arrives in POWER5
   4. POWER6: An in-order design should be simpler, but ...