SMT Verification of the POWER5 and POWER6 High-Performance Processors

John Ludden
Senior Technical Staff Member, Hardware Verification
IBM Systems & Technology Group

IBM Power Systems
© 2008 IBM Corporation


Introduction to Simultaneous Multi-Threading (SMT)

1. What is a multi-threaded processor?
  • Essentially a processor core that executes multiple instruction streams simultaneously
  • Each thread appears to software as a “virtual” processor core

2. What are the advantages of SMT?
  • More efficient utilization of silicon real estate and power: small die size increase compared to adding another core
  • Increased system throughput by utilizing processor resources that would otherwise be idle

3. What are the disadvantages of SMT?
  • Increased complexity -> Makes verification state space MUCH larger
  • SMT verification much harder than SMP
  • Possibly degrades performance of some applications

Examples of SMT microprocessors

1. Video Game Systems
  • Sony PlayStation 3: IBM CELL processor
  • Xbox 360: IBM Xenon processor

2. Personal Computers
  • Intel Pentium 4 Hyper-Threading (HT) processors

3. Servers
  • SUN UltraSPARC Systems: T1 (4 threads per core) and T2 (8 threads per core)
  • HP Superdome Systems: Intel Itanium 2
  • IBM Power Systems: POWER5 and POWER6 processors

Overview

1. Context: POWER5 vs. POWER6 microarchitecture comparison
2. Verification methodology: In the beginning…
3. The times they are a changing: SMT arrives in POWER5
4. POWER6: An in-order design should be simpler, but…
5. Future directions?

Consistent predictable delivery

[Figure: IBM POWER systems roadmap showing POWER4, POWER4+, POWER5, POWER5+ and POWER6 along a timeline marked 2001, 2003, 2004, 2006 and 2007]

POWER5 Chip and POWER6 Chip

[Figure: POWER5 chip block diagram: two high-frequency POWER5 SMT2 cores, ~2 MB L2, L3 controller with a 36 MB L3 chip, memory controller with buffer chips, SMP interconnect fabric]

[Figure: POWER6 chip block diagram: two ultra-high-frequency POWER6 SMT2 cores, each with a 4 MB L2, L3 controller with 32 MB L3 chip(s), two memory controllers with buffer chips, SMP interconnect fabric]

POWER5 Pipeline

[Figure: POWER5 pipeline and SMT resource diagram]
• Pipeline stage rows as labeled: IF, IC, BP (instruction fetch); D0, D1, D2, D3, Xfer, GD (group formation and instruction decode); MP, ISS, RF, EX, WB, Xfer (BR and FX); MP, ISS, RF, EA, DC, Fmt, WB, Xfer (LD/ST); MP, ISS, RF, F6 stages, WB, Xfer (FP); CP (completion)
• Control paths: branch redirects, interrupts & flushes, out-of-order processing
• Front-end blocks: program counter, instruction cache, instruction translation, branch prediction, branch history tables, return stack, alternate target cache, instruction buffer 0, instruction buffer 1
• Dispatch blocks: thread priority; group formation, instruction decode, dispatch; shared register mappers; shared issue queues; dynamic instruction selection
• Execution blocks: read shared register files; shared execution units (LSU0, FXU0, LSU1, FXU1, FPU0, FPU1, BXU, CRL); write shared register files; group completion
• Storage blocks: store queue, data translation, data cache, L2 cache
• Legend: shared by two threads; resource used by thread 0; resource used by thread 1

High-end server: New POWER6 microprocessor

Topology
– Two cores on chip, a 2-way SMP
– Core private L1s (64 KB I, 64 KB D)
– Superscalar, SMT cores
– Chip private 8 MB L2 cache
– L3 32 MB off chip
– Two-tier SMP fabric

Technology
– 65 nm SOI
– 341 mm² die size
– 10 layers of metal
– 790 million transistors on chip
– Frequency: 3.5, 4.2, 4.7, 5.0 GHz

Custom & semi-custom design style
– High frequency constraints

3.3 M lines of VHDL

POWER6 core pipeline

[Figure: POWER6 core pipeline diagram]
• Pipelines: instruction fetch pipeline, instruction dispatch pipeline, BR/FX/Load pipeline (BR/CR, FX, LOAD), floating point pipeline, check point recovery pipeline
• Stage labels: P1, P2, P3, P4, IFAR, BHT, IC0, IC1, ROT, IB0, IB1, PD, DISP, ISS, RF, AG, DC0, DC1, FMT, EX, EX1 to EX7, ECC
• Legend: pre-decode stage, ifetch/branch stage, delayed/transmit stage, instruction decode stage, instruction dispatch/issue stage, operand access/execution stage, cache access stage, write back stage, completion stage, check point stage, FX result bypass, load result bypass, float result bypass

POWER6 core

POWER6 processor is ~2X frequency of POWER5 (4 – 5 GHz)

POWER6 instruction pipeline depth equivalent to POWER5
– Minimize power
– Scale performance with frequency

[Figure: pipeline comparison across Instruction Fetch, Instruction Buffer/Decode, Instruction Dispatch/Issue and Data Fetch/Execute, with FXU dependent execution and load dependent execution marked; annotations: ~6 ns/instr and ~3 ns/instr]

POWER6 extends functionality of POWER5 core
– 64K I cache, 64K D cache, 2 FXU, 2 Binary FPU, 1 branch execution unit
– Two way SMT with 7 instruction dispatch from 2 threads (maximum of 5 instructions per thread)
– Decimal Floating Point Unit
– VMX Unit (PowerPC’s SIMD ISA)
– Recovery Unit
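The dispatch limits quoted above (7 instructions per cycle across both threads, at most 5 from one thread) can be turned into a small enumeration of per-cycle thread mixes. This is a sketch only: it assumes any split that satisfies the two limits is legal, and it ignores the execution-unit constraints that real hardware also applies. The function name is hypothetical.

```python
from itertools import product

MAX_TOTAL = 7        # instructions dispatched per cycle, both threads (from the slide)
MAX_PER_THREAD = 5   # cap for a single thread (from the slide)


def legal_dispatch_mixes():
    """Return every (thread0, thread1) instruction count allowed in one cycle."""
    return [
        (t0, t1)
        for t0, t1 in product(range(MAX_PER_THREAD + 1), repeat=2)
        if t0 + t1 <= MAX_TOTAL
    ]


mixes = legal_dispatch_mixes()
print(len(mixes))        # 30 legal mixes out of 36 candidate pairs
print((5, 2) in mixes)   # True: one thread at its cap, the other fills the rest
print((4, 4) in mixes)   # False: 8 exceeds the total of 7
```

Even this simplified count shows why symmetric tests are not enough: most legal mixes are asymmetric.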

Bullet-proof computing

System reliability with recovery unit
– Every measure possible taken to preserve application execution
– Retry soft errors
– Change hardware for hard errors

[Figure: recovery flowchart]
• Processor architected state checkpointed every cycle
• ECC & non-ECC protected circuitry checked every cycle
  – No error found: execution continues
  – Error found, soft error case: processor restarts from last saved checkpoint
  – Error found, hard error case: processor workload moved to another CPU
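The checkpoint, retry and migrate flow above can be sketched as a toy model. Everything here is an assumption made for illustration: the `ToyCore` class, the single-counter "architected state", and the way errors are scheduled by cycle number are hypothetical and say nothing about how the POWER6 recovery unit is built.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCore:
    """Toy core whose whole architected state is one counter (hypothetical)."""
    state: int = 0
    checkpoint: int = 0
    log: list = field(default_factory=list)


def run(core, cycles, soft_errors=(), hard_errors=()):
    """Execute `cycles` cycles; return "done" or "migrated"."""
    pending_soft = set(soft_errors)
    cycle = 0
    while cycle < cycles:
        if cycle in hard_errors:
            # Hard error case: a retry cannot help, so the last good
            # checkpoint is what would be handed to another CPU.
            core.state = core.checkpoint
            core.log.append(("migrate", cycle))
            return "migrated"
        if cycle in pending_soft:
            pending_soft.discard(cycle)       # transient fault, gone on retry
            core.state = core.checkpoint      # discard possibly damaged state
            core.log.append(("retry", cycle))
            continue                          # re-execute the same cycle
        core.state += 1                       # normal execution
        core.checkpoint = core.state          # checkpoint every cycle
        cycle += 1
    return "done"


core = ToyCore()
print(run(core, 5, soft_errors={2}), core.state, core.log)
```

The soft error costs one retry but the final state equals that of an error-free run, which is the property a recovery test has to check.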

POWER4/5/6 RTL verification technology

[Figure: verification tool flow]
• RTL (VHDL, Verilog)
• Language compile, model build
• Cycle-based model
• Physical VLSI design tools / custom design
• Formal verification: Boolean equivalence check (Verity)
• Software simulator (MESA)
• Hardware accelerator (Awan)
• Driver/checker, assertions
• Test program generator (GPRO, X-Gen)
• C++ testbench
• Constraint random
• Unit testbench
• PSL et al.
• (Semi) formal verification (SixthSense, RuleBase)

Single threaded uniprocessor verification for POWER4

Unit level: methodology inherited from POWER4
– Driven by a combination of instruction level test cases (AVPs) created by Genesys-Pro (GPRO) pseudo-random test generator and random C++ driven irritation
– Instruction-By-Instruction (IBI) checking against AVP results
– Low level microarchitecture checkers written in C++

Processor core (aka “core”) level
– Mixture of GPRO pseudo-random and directed random instruction level test cases
– IBI checking against AVP results
– Low level microarchitecture checkers written in C++
– Irritation from random C++ drivers
– Highly deterministic and architected state easily verifiable against test
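Instruction-by-instruction checking, as described above, compares the architected results after each completed instruction with the results predicted in the test case. The sketch below shows the idea only; the record layout (a mnemonic plus a dictionary of register updates) and the function name are hypothetical and are not the AVP format.

```python
def ibi_check(expected, observed):
    """Return None if every step matches, else a description of the first miss.

    Both arguments are lists of (mnemonic, register_updates) tuples, one per
    completed instruction, in program order.
    """
    for index, ((exp_op, exp_regs), (obs_op, obs_regs)) in enumerate(
        zip(expected, observed)
    ):
        if exp_op != obs_op:
            return f"instr {index}: expected {exp_op}, saw {obs_op}"
        for reg, value in exp_regs.items():
            if obs_regs.get(reg) != value:
                return (
                    f"instr {index} ({exp_op}): {reg} expected {value:#x}, "
                    f"saw {obs_regs.get(reg)!r}"
                )
    if len(expected) != len(observed):
        return f"expected {len(expected)} instructions, saw {len(observed)}"
    return None


avp = [("addi", {"r3": 0x10}), ("add", {"r4": 0x20}), ("stw", {})]
good = [("addi", {"r3": 0x10}), ("add", {"r4": 0x20}), ("stw", {})]
bad = [("addi", {"r3": 0x10}), ("add", {"r4": 0x21}), ("stw", {})]

print(ibi_check(avp, good))
print(ibi_check(avp, bad))
```

This kind of check only works when each instruction has one predictable result, which is why it fits the deterministic single-thread case.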

Symmetric multi-processor (SMP) verification for POWER4

Chip (dual-core) level
– Test generation similar to uniprocessor via GPRO for false-sharing or non-sharing tests
  • IBI checking against AVP results for two independent instruction streams contained within single test
  • Low level microarchitecture checkers written in C++
  • L1/L2 interactions primary focus
– True-sharing scenarios, lock testing and storage access (“weak”) ordering checked
  • GPRO employed but…
    – IBI checking of these accesses is limited or not possible:
      › Non-unique or non-deterministic results
      › CML (architecture level coherency monitor) employed to detect the “right answer” as a post-simulation rule check
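When a load can legally return more than one value, a single predicted result no longer exists, so the check becomes a rule applied to the simulation log afterwards. The sketch below illustrates that style with one deliberately weak rule: a load must return either the initial value or a value that some thread has already stored to that address. It is not how CML works; the event format, the function name and the rule itself are assumptions for illustration.

```python
def check_loads(initial, events):
    """Return the load events that break the rule.

    initial: {address: value} before the test starts.
    events:  (thread, op, address, value) tuples in log order, where op is
             "store" or "load".
    """
    legal = {addr: {value} for addr, value in initial.items()}
    violations = []
    for event in events:
        _thread, op, addr, value = event
        if op == "store":
            legal.setdefault(addr, set()).add(value)
        elif op == "load" and value not in legal.get(addr, set()):
            violations.append(event)
    return violations


initial = {0x100: 0}
events = [
    (0, "store", 0x100, 1),
    (1, "load", 0x100, 1),   # sees thread 0's store: legal
    (1, "store", 0x100, 2),
    (0, "load", 0x100, 0),   # stale initial value: allowed by this weak rule
    (0, "load", 0x100, 7),   # nobody ever wrote 7: violation
]
print(check_loads(initial, events))
```

A real coherency monitor would also enforce ordering between accesses; the point here is only that the check accepts a set of answers rather than one.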

POWER5 SMT verification methodology

Evolutionary: based on single thread uniprocessor and SMP approaches
– Traditional SMP scenarios now self-contained in a single core simulation model
  • Downward migration of dual-core methodology to single core model

New SMT verification scenario categories
– Shared resource and priority conflicts:
  • SMT resource types:
    – Equally shared between threads: Queue full conditions easier to hit
    – Dynamically shared / tagged: Either thread can consume most/all of the resource
    – Replicated: Not shared…same as single thread
– Dynamic thread mode switching: SMT->ST; ST->SMT
  • Some applications attain better performance in ST mode
  • Shared resources re-allocated on each mode switch
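The three resource types above differ in how many entries one thread may hold in SMT mode versus ST mode. A minimal sketch follows; the 20-entry size is a made-up number, and the `Sharing` and `per_thread_limit` names are hypothetical.

```python
from enum import Enum


class Sharing(Enum):
    EQUAL = "equally shared"
    DYNAMIC = "dynamically shared / tagged"
    REPLICATED = "replicated"


def per_thread_limit(policy, entries, active_threads):
    """Maximum entries a single thread may occupy.

    `entries` is the size of the shared structure, or of each private copy
    when the policy is REPLICATED.
    """
    if active_threads not in (1, 2):
        raise ValueError("this sketch models ST (1) and SMT2 (2) only")
    if policy is Sharing.EQUAL:
        return entries // active_threads   # split in half in SMT mode
    # DYNAMIC: one thread may take everything. REPLICATED: private copy.
    return entries


QUEUE_ENTRIES = 20  # hypothetical size, not a POWER5 figure
for policy in Sharing:
    smt = per_thread_limit(policy, QUEUE_ENTRIES, 2)
    st = per_thread_limit(policy, QUEUE_ENTRIES, 1)
    print(f"{policy.value}: SMT {smt}, ST {st}")
```

Only the equally shared case changes its limit on a mode switch, which is why those structures are the ones that need re-allocation tests.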

Traditional SMP approach applied to SMT verification

[Figure: test generation flow. A test template (SMP.def) feeds test generation, which produces the output test case (SMT.tst). The test case holds a random instruction stream for t0 and one for t1, t0 registers, t1 registers, and core level registers common to both threads. Real memory is common to both threads, with the test generator managing some potential overlap.]

Shared resource and priority conflicts

Approach was similar to SMP verification
– Testing largely consisted of “symmetric” instruction streams on each thread
  • A particular resource targeted (e.g., GPR rename registers)
    – 100 load instructions on each thread
– Coverage and lab feedback validated this approach
  • Good enough: “Got the job done”

POWER5 dynamic thread mode switching

[Figure: flowchart with one column per thread (Thread 0, Thread 1)]

Initial state
  • Both threads: all architected states initialized, thread enabled

Run state
  • Thread 0: random instructions, then thread 0 terminates itself (save architected state, thread kills itself); shared resources reallocated
  • Thread 1: random instructions

Restart thread 0
  • Sources shown: sim driver, other thread, interrupt
  • Wake up thread, partition resources, restore architected state, then random instructions

Final state
  • Both threads: normal finish, thread enabled

POWER5 shared resource re-allocation on mode switch

[Figure: four bar charts comparing per-thread limits in SMT mode and ST mode]
• Rename registers per thread (GPR, FPR): SMT mode max vs. ST mode
• Load Miss Queue entries per thread: split in half in SMT mode
• Branch Queue (BIQ) entries per thread: split in half in SMT mode
• Max LRQ/SRQ entries per thread: dynamically shared, SMT mode max vs. ST mode

POWER5: centralized complexity

POWER5
– Out-of-order design: Even in single thread mode, complex events naturally occur simultaneously
– Started from POWER4+: Known working design that was modified incrementally
– 23 FO4 design: Isolated complexity in Instruction Sequencing Unit (ISU):
  • Every unit communicated back to ISU
  • ISU resolved all exceptions and out-of-order conflicts
– ST and SMT modes both supported:
  • Alternating dispatch cycles per thread
  • Resources re-allocated on mode switch

[Figure: core unit diagram with FXU, FPU, LSU, IFU and ISU]

POWER6 distributed complexity

POWER6
– From-scratch mostly in-order design
  • Normally, design is well behaved
  • Cross-thread interaction necessary for “tough bugs”
– 13 FO4 design: Distributed complexity needed to achieve high performance goals
– Recovery unit (RU):
  • Must resolve out-of-order FP with in-order pipelines
  • Checkpoints machine state
  • Recovers processor from soft errors
– Design is inherently in SMT mode all the time (almost)
  • Dispatch to both threads in same cycle
  • Most resources dynamically shared / tagged
  • No resource reallocation on mode switch

[Figure: core unit diagram with IFU, IDU, FPU, LSU, RU and FXU]

POWER6 verification process

The different verification engines have different strengths related to the verification tasks

Software simulation
– Slow, but low penalty for highly intrusive checking of model internals. Total model visibility.
– Hundreds of AIX workstations running 24x7x365
– New enhancements helped keep pace with design complexity
– 2x number of simulation cycles of POWER5 design

Hardware-accelerated simulation
– 10x to 1,000x faster than SW sim, but needs less intrusive driving/checking so as not to slow down the hardware box
– New usage: Mainline function verification
– Yields additional 3x simulation cycle advantage over POWER5 (5x cycle advantage overall)

(Semi)-formal verification
– (High to) Exhaustive coverage, but higher skill needed to drive. Scaling problems with model size.
– Extensively used: Proved extremely valuable for complex SMT bugs

Hardware bring-up
– Ideal speed, very limited visibility/controllability

Software simulation enhancements

Random command driven unit simulation for most core units
– Yielded >1 million lines of C++ code
– More control over generation for low level events
– More efficient test generation

Irritator threads at “core model” level
– “Symmetric” instruction stream approach employed on POWER5 proved inadequate
  “S” in SMT is for “Simultaneous”, not “Symmetric”
– Target cross-thread interactions at the microarchitecture level
– ~2x test generation efficiency
– Ensures both threads running the same length (self adjusting)

Irritator thread example

[Figure: test generation flow. A test template (SMT_Irritator.def) feeds test generation, which produces the output test case (SMT_Irritator.tst). The test case holds a long random stream for t0 and a short irritator stream for t1, t0 registers, t1 registers, and core level registers common to both threads, over real memory with the test generator managing some potential overlap.]

Irritator thread restrictions
• Cannot cause unexpected exceptions
• Cannot modify memory read by random thread
• Cannot modify registers shared with other threads
• Architected results may be undefined
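Two of the restrictions above (no stores to memory the random thread reads, no writes to shared registers) are simple set checks that a generator could apply to a candidate irritator stream. The sketch below assumes the generator already knows the address and register sets; all names are hypothetical, and the exception restriction is not modelled.

```python
def irritator_violations(irritator_stores, irritator_reg_writes,
                         random_thread_loads, shared_registers):
    """Return a list of human-readable violations (empty if the stream is clean)."""
    problems = []
    overlap = set(irritator_stores) & set(random_thread_loads)
    for addr in sorted(overlap):
        problems.append(f"store to {addr:#x} is read by the random thread")
    for reg in sorted(set(irritator_reg_writes) & set(shared_registers)):
        problems.append(f"write to shared register {reg}")
    return problems


clean = irritator_violations(
    irritator_stores={0x2000, 0x2008},
    irritator_reg_writes={"f1", "f2"},
    random_thread_loads={0x1000, 0x1008},
    shared_registers={"core_reg_a"},   # placeholder name for a core-level register
)
dirty = irritator_violations(
    irritator_stores={0x1008},
    irritator_reg_writes={"core_reg_a"},
    random_thread_loads={0x1000, 0x1008},
    shared_registers={"core_reg_a"},
)
print(clean)
print(dirty)
```

Keeping the irritator out of the random thread's inputs is what lets the random thread keep its predicted results even though the irritator's own results may be undefined.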

Irritator thread example

Long Random Thread (Generated Instr: 101, Simulated Instr: 101)

    SEQUENCE
      REPEAT 100
        SELECT
          Group_All
      stw nop, A

Irritator Thread (Generated Instr: 2, Simulated Instr: Infinite)

    SEQUENCE
      LB0: fdiv
      A:   b to LB0

Kill Irritator Thread: the stw at the end of the long random thread stores a nop at label A, replacing the branch and ending the irritator loop.
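The self-adjusting behaviour of this template can be reproduced with a toy interleaving model: the irritator loops on its two generated instructions until the other thread overwrites the branch with a nop. The one-instruction-per-thread-per-step interleaving and the `simulate` function are assumptions for illustration, not how the core or the simulator schedules threads.

```python
def simulate(main_length=100):
    """Interleave the two threads one instruction at a time.

    Returns (main_simulated, irritator_simulated, irritator_alive).
    """
    # Irritator code as generated: two instructions that loop forever.
    code = [("LB0", "fdiv"), ("A", "b LB0")]
    labels = {label: i for i, (label, _) in enumerate(code)}

    main_simulated = 0
    irritator_simulated = 0
    irritator_pc = 0
    irritator_alive = True

    def irritator_step():
        nonlocal irritator_simulated, irritator_pc, irritator_alive
        _, instr = code[irritator_pc]
        irritator_simulated += 1
        if instr.startswith("b "):
            irritator_pc = labels[instr[2:]]   # taken branch back to LB0
        elif irritator_pc + 1 < len(code):
            irritator_pc += 1                  # fall through to next instruction
        else:
            irritator_alive = False            # ran off the end: thread done

    # Main thread: `main_length` random instructions, then "stw nop, A".
    for step in range(main_length + 1):
        if step == main_length:
            code[labels["A"]] = ("A", "nop")   # overwrite the branch
        main_simulated += 1
        if irritator_alive:
            irritator_step()

    # Let the irritator drain; it terminates now that the branch is gone.
    while irritator_alive:
        irritator_step()

    return main_simulated, irritator_simulated, irritator_alive


print(simulate(100))   # (101, 102, False)
```

Two generated instructions yield roughly as many simulated instructions as the long thread runs, so the irritator stays active for the whole test without the generator having to match stream lengths.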

Simulation acceleration usage on POWER6

Extensively used on POWER6
– Run lab exercisers prior to tape-out
  • Found additional bugs missed by software simulation
  • Debug new exerciser functionality prior to lab
  • Error injection and recovery testing
  • Reproducibility of lab bugs in “simulation-like” environment for rapid debug of root cause
  • Rapid testing of bug fixes and collateral damage testing
– Linux boot prior to tape-out
– Not employed on POWER5 for “mainline” functional verification

(Semi) Formal methods

Formal methods are a vital complement to simulation flow
– Lab bring-up bug re-creation
  • Often faster reproduction than simulation based approaches
  • Aids in root cause analysis
  • High-coverage / proof of side-effect-free fixes

Error detection and soft error recovery

Biggest challenge on POWER6
– Why so hard?
  • Myriad injection points coupled with large SMT state space
    – Often needed multiple “rare” combinations of “asymmetric” events on both threads while specific error was injected
  • End-to-end recovery testing difficult at unit level
    – Really a “core” effort
– Verification strategy:
  • Error injection and recovery on hardware accelerated simulation platform
  • Dynamic on-the-fly error injection combined with “irritator threads” needed to cover large SMT recovery state space
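Dynamic error injection over many injection points is, at its simplest, a seeded random choice of where and when to inject while the test runs. The sketch below shows only that planning step. The injection point names are invented placeholders, and nothing here reflects the interface of the accelerator platform.

```python
import random


def plan_injections(points, max_cycle, count, seed):
    """Return `count` (cycle, injection_point) pairs, sorted by cycle."""
    rng = random.Random(seed)   # seeded so a failing run can be replayed
    plan = [
        (rng.randrange(max_cycle), rng.choice(points))
        for _ in range(count)
    ]
    return sorted(plan)


# Hypothetical injection point names, for illustration only.
POINTS = ["gpr_parity", "icache_ecc", "lsu_queue_parity"]

plan = plan_injections(POINTS, max_cycle=10_000, count=5, seed=42)
for cycle, point in plan:
    print(f"cycle {cycle}: inject {point}")
```

Running such a plan while an irritator thread is active is what combines the random injection timing with asymmetric activity on the other thread.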

Summary

1. SMT verification has four key pieces
  – Traditional SMP-like effort
  – Thread starvation and priority
  – Starting and stopping threads
  – Asymmetric “irritator thread” approach to verify often unforeseen cross-thread interactions at the microarchitecture level

2. “From-scratch in-order” SMT design was more difficult to verify than the “out-of-order retrofitted” SMT design
  – Complex events only occurred due to cross thread interaction
  – Even though team had experience
  – Required more “weapons” in the arsenal

3. High frequency design drove distributed complexity
  – Makes verification job harder
  – Increased dependency on formal verification for difficult bugs

4. “Mainframe”-like RAS on POWER6 drove a huge amount of work that was difficult to attack at the unit level

Future directions

Predictions
– RAS features will be an increasingly important feature of server systems
  • POWER6 design has set the “bar” to a new high standard to which future processors will have to measure up
    - Power Systems Revenue up 29% in 2Q08 (from 2Q07)
  • Verification methods employed on POWER6 to attack nearly infinite state space created by the combination of SMT and processor recovery features will become standard practice
– A migration of “pre-silicon” verification techniques into “post-silicon” hardware lab verification effort
  • Hardware is the fastest “simulator” available and the state space is getting bigger with SMT