"JAGUAR" X86 CORE FUNCTIONAL VERIFICATION Zihno Jusufovic


2 | Jaguar x86 Core Functional Verification | December 2012

“JAGUAR” X86 LOW-POWER CORE


TWO X86 CORES TUNED FOR TARGET MARKETS

§ “Bulldozer” Family: Performance and Scalability
  – Mainstream Client and Server Markets

§ “Cat” Family: Flexible, Low Power, and Small
  – Low-power Markets
  – Optimized for Cloud Clients
  – Small Die Area

Jaguar Hotchips 2012


“JAGUAR” – DESIGN FOR LOW-POWER X86 CORE

§ Jaguar is based on AMD’s Bobcat low-power x86 core, with the goals to:
  – Improve IPC/power/frequency
  – Update the ISA/feature set

§ Significant changes between Bobcat and Jaguar:
  – Totally new inclusive L2 cache shared among four Jaguar cores
  – New power-management flow
  – Updated ISA/feature set:
     – SSE4.1, SSE4.2
     – AES, CLMUL
     – MOVBE
     – AVX, XSAVE/XSAVEOPT
     – F16C, BMI1
  – 40-bit physical address capable vs. 36-bit on Bobcat
  – Improved virtualization
  – Many design blocks totally or significantly redesigned


JAGUAR X86 CORE Microarchitecture

[Figure: Jaguar x86 core microarchitecture block diagram. Labeled blocks: Branch Prediction; 32KB ICache; Decode and Microcode ROMs; Int Rename; Scheduler (x2); Int PRF; ALU, ALU, LAGU, SAGU; Mul; Div; FP Decode Rename; FP Scheduler; FP PRF; VALU, VALU; FPAdd, FPMul; VIMul; St Conv.; Ld/St Queues; 32KB DCache; BU; To/From Shared Cache Unit.]

Jaguar Hotchips 2012


JAGUAR COMPUTE UNIT (CU)

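§ Four independent Jaguar cores

§ Shared cache unit (SCU)
  – 4 L2 data banks (total 2MB)
  – L2 interface tile

[Figure: compute unit (CU) block diagram. Four cores connect to the SCU, which contains the L2 interface and four L2D banks; the L2 interface connects to/from the NB.]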

Jaguar Hotchips 2012


JAGUAR CORE PIPELINE

[Figure: core pipeline diagram with stages numbered 0 to 13. Stage labels shown: Fetch0 to Fetch5; uCode ROM, MDec; Dec0, Dec1, Dec2, iDec, Pack, FDec, Dispatch; Sched, RegRd, ALU, WB; Transit, FpDec, RegRen, Sched, RegRd1, RegRd2, EXE, WB; AGU, DC1, DC2.]

§ Branch mispredict latency: 14 cycles
§ Load-use latency (L1 hit): 3 cycles

Jaguar Hotchips 2012


"JAGUAR" FUNCTIONAL VERIFICATION


"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY

§ Jaguar core is verified with test benches at multiple levels:
  – Unit (Whacker)-level test benches
     § ID
     § DE
     § FP
     § LSDC
     § BU
     § L2I
     § MP
  – Top (Cluster or CPC)-level test bench
  – System (SOC)-level test benches


"JAGUAR" TEST BENCHES

[Figure: the Jaguar core microarchitecture block diagram, overlaid with the regions covered by the ID, DE, FP, LSDC, and BU test benches.]


"JAGUAR" TEST BENCHES - CONT

[Figure: compute unit (CU) block diagram (four cores, SCU with L2 interface and four L2D banks, connection to/from the NB), overlaid with test bench scopes:
  – L2I test bench: SCU
  – MP test bench: SCU + LS/DC/BU of each core
  – Top (Cluster) test bench: CU]


"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY

§ Cluster (Core) verification with mixed C++/SV (OVM/VMM)/assembly environment, using random and directed stimulus
§ Unit-level verification with SV OVM/VMM transaction-based random test benches
§ Formal verification used in FP and a few other blocks
§ Emulation done at SOC level
§ MVSIM used for power verification in cluster test bench
§ X-propagation targeted with special tool/regressions
§ Extensive use of coverage:
  – Functional coverage
  – Code coverage
  – Microcode coverage


"JAGUAR" FUNCTIONAL VERIFICATION STRATEGY

§ Long bake (soak) time for bug hunting
§ Maintain high passing rate through the entire project
  – Core/CPC team organized around “always tape-out ready” principle
     § Main code line should always be higher than 90% pass rate
     § Anything below 90% is considered a crisis: all hands required to drive up pass rate
  – Features developed on branches and merged when healthy enough to support main line pass rate > 90%
§ Different stimulus strategies used at different levels
  – Core test bench uses a mix of random exercisers (generators) and directed tests supported by global tools
     § Biased towards exerciser-based new development
     § Conscious effort not to write new directed tests because of maintenance costs
     § Rigorous core debug strategy
  – Unit test benches use SV OVM/VMM constrained-random, transaction-based tests
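The 90% rule above can be read as a simple gate on regression results. The following is a minimal sketch of that reading, not the team's actual tooling: `RegressionSummary`, `mainline_status`, and `merge_allowed` are hypothetical names, and using a branch's own pass rate as the proxy for "healthy enough to merge" is an assumption.

```python
from dataclasses import dataclass

FLOOR_PERCENT = 90  # threshold quoted on the slide


@dataclass(frozen=True)
class RegressionSummary:
    """Pass/fail counts for one regression run (hypothetical record)."""
    passed: int
    failed: int

    @property
    def total(self) -> int:
        return self.passed + self.failed


def mainline_status(run: RegressionSummary) -> str:
    """Return 'crisis' when the main line is below the floor, else 'healthy'."""
    if run.total == 0:
        return "no-data"
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    below_floor = run.passed * 100 < FLOOR_PERCENT * run.total
    return "crisis" if below_floor else "healthy"


def merge_allowed(branch: RegressionSummary) -> bool:
    """Assumption: a branch may merge only if its pass rate is strictly above the floor."""
    return branch.total > 0 and branch.passed * 100 > FLOOR_PERCENT * branch.total
```

For example, a run with 899 passes out of 1000 would be flagged as a crisis, while a branch sitting at exactly 90% would not yet qualify for a merge under this reading.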


JAGUAR TOP (CLUSTER)-LEVEL TEST BENCH BLOCK DIAGRAM

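[Figure: top (cluster)-level test bench block diagram. DUT: CU with four cores and the SCU (L2I plus four L2D banks), attached to a fake UNB with DRAM and I/O memory. Test bench components: system model with DRAM and I/O memory; bridge code; MP memory model; x86 ISA models (one per core); various monitors and programmable drivers; various core/CU-level checkers, irritators, and cache preloaders.]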


"JAGUAR" TOP-LEVEL STIMULUS

§ x86 random test generators
  – Many single-threaded and multi-threaded generators
  – The most recent generator has more directed-random capabilities and is used extensively in core/cluster-level test plan execution
§ Heavy emphasis on random stimulus and coverage for new stimulus requirements
§ Randomize control/configuration register state on a per-test basis
§ L1/L2 cache preloaders and other dynamic, random irritators:
  – MCA, TLBs, external probes, power-management events, interrupts, etc.
§ Fake UNB:
  – Built-in randomization for things like memory-read latency
§ Large number of self-checking x86 directed tests, mostly legacy:
  – Use coverage-based test case selection to reduce run cost
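Coverage-based test case selection can be illustrated with a greedy set-cover pass over per-test coverage data. This is only a sketch of the general idea: the slides do not say which selection algorithm was used, and `select_tests` and the test and coverage-point names are hypothetical.

```python
from typing import Dict, List, Set


def select_tests(coverage: Dict[str, Set[str]]) -> List[str]:
    """Greedily pick tests until every reachable coverage point is hit.

    `coverage` maps a test name to the set of coverage points it hits.
    """
    remaining: Set[str] = set().union(*coverage.values())
    selected: List[str] = []
    while remaining:
        # Pick the test that adds the most new points; iterating over sorted
        # names makes tie-breaking deterministic (first name wins).
        best = max(sorted(coverage), key=lambda name: len(coverage[name] & remaining))
        gained = coverage[best] & remaining
        if not gained:
            break
        selected.append(best)
        remaining -= gained
    return selected


legacy_suite = {
    "t_alu": {"add", "sub"},
    "t_mem": {"load", "store"},
    "t_mix": {"add", "load"},
    "t_all_alu": {"add", "sub", "mul"},
}
subset = select_tests(legacy_suite)
```

Here `t_alu` and `t_mix` add nothing beyond what `t_all_alu` and `t_mem` already cover, so they drop out of the selected subset. Greedy selection is not guaranteed to find the smallest subset, but it is cheap and usually close.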


"JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS

§ Checking:
  – x86 ISA model
     § Architectural state compared at instruction retire
  – MP memory model checks all memory accesses, ordering rules, and consistency
     § Also used in MP unit-level test bench
  – Cache coherency checkers
     § MOESI state and corresponding data checked between all caches
  – Variety of other cluster-level checkers (e.g., power management, probes, stalls)
  – Thousands of inline RTL assertions
  – All unit-level checkers re-used in top test bench
  – Self-checking legacy directed tests

§ Coverage:
  – Heavy use of, and dependency on, functional coverage
  – Code coverage
  – Microcode coverage
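Comparing architectural state at instruction retire amounts to stepping a reference model in lockstep with the design and flagging the first divergence. The sketch below uses a toy two-instruction ISA rather than x86; `step_reference`, `first_mismatch`, and the state layout are hypothetical and stand in for the real ISA model and retire interface.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Instr = Tuple[str, str, int]  # (op, destination register, immediate); toy ISA, not x86
MASK64 = 0xFFFFFFFFFFFFFFFF


def step_reference(state: Dict[str, int], instr: Instr) -> Dict[str, int]:
    """Apply one instruction to the reference state and return the new state."""
    op, dest, imm = instr
    new_state = dict(state)
    if op == "mov":
        new_state[dest] = imm & MASK64
    elif op == "add":
        new_state[dest] = (new_state.get(dest, 0) + imm) & MASK64
    else:
        raise ValueError(f"unsupported op: {op}")
    return new_state


def first_mismatch(program: Sequence[Instr],
                   dut_retire_states: Sequence[Dict[str, int]]) -> Optional[int]:
    """Return the index of the first retired instruction whose DUT state
    differs from the reference model, or None if all retires match."""
    reference: Dict[str, int] = {}
    for index, (instr, dut_state) in enumerate(zip(program, dut_retire_states)):
        reference = step_reference(reference, instr)
        if reference != dut_state:
            return index
    return None


program: List[Instr] = [("mov", "rax", 1), ("add", "rax", 2)]
good_trace = [{"rax": 1}, {"rax": 3}]
bad_trace = [{"rax": 1}, {"rax": 4}]
```

Checking at every retire, rather than only at the end of a test, points debug at the first instruction that went wrong instead of a corrupted final state.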


"JAGUAR" TOP TEST BENCH CHECKING, COVERAGE, AND REGRESSIONS

§ 24x7 regression runs
  – Use machine resources effectively
     § Have enough pending sims to keep all machines busy
  – Requires a good, organized debug effort to cover all fails
§ User-friendly regression database with many options/filters
  – Helps synchronize debug efforts among multiple teams
§ Debug methodology
  – Debug to root cause


"JAGUAR" TOP TEST BENCH METRICS

§ Test plan completeness
§ Functional, code, and microcode coverage
§ Regression cycles/instructions, pass rates, and fail signatures
§ RTL bug rates and open backlog
§ Verification bug rates and open backlog


UNIT-LEVEL TEST BENCHES SUMMARY (1 OF 3)

§ All unit-level test benches based on SV (VMM or OVM)
§ Most stimulus is constrained-random and transaction-based
  – Coverage-driven random stimulus
  – Randomization of control/configuration registers is shared with higher-level test benches
  – Stimulus “state targets” with time-outs
     § Stimulus attempts to put DUT in a targeted state, with a time-out, to catch deadlock/livelock bugs
     § Example: artificial reduction of RTL queue size
§ Multi-unit test bench used to target coherency
§ Good simulation performance, measured in cycles per second (CPS)
  – Goal: 5-10x compared to the top test bench
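The "state target with time-out" idea can be sketched as a driver loop that pushes a model towards a target condition and gives up after a cycle budget. Everything here is illustrative: `ToyQueue` stands in for an RTL queue with an artificially reduced depth, and `drive_until` and `biased_stimulus` are hypothetical helpers rather than part of the Jaguar test benches.

```python
import random
from typing import Callable, Tuple


class ToyQueue:
    """Stand-in for an RTL queue whose depth was artificially reduced."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.count = 0

    def cycle(self, push: bool, pop: bool) -> None:
        # Pop is applied before push; an arbitrary choice for this toy model.
        if pop and self.count > 0:
            self.count -= 1
        if push and self.count < self.depth:
            self.count += 1

    @property
    def full(self) -> bool:
        return self.count == self.depth


def drive_until(dut: ToyQueue,
                target: Callable[[ToyQueue], bool],
                stimulus: Callable[[], Tuple[bool, bool]],
                timeout_cycles: int) -> int:
    """Drive stimulus until `target` holds; return the cycle count.

    Raises TimeoutError if the target state is not reached in time.
    """
    for cycle in range(1, timeout_cycles + 1):
        push, pop = stimulus()
        dut.cycle(push, pop)
        if target(dut):
            return cycle
    raise TimeoutError(f"state target not reached in {timeout_cycles} cycles")


def biased_stimulus(rng: random.Random, push_bias: float) -> Callable[[], Tuple[bool, bool]]:
    """Random push/pop generator biased towards filling the queue."""
    return lambda: (rng.random() < push_bias, rng.random() < 1.0 - push_bias)


cycles_to_full = drive_until(ToyQueue(depth=4), lambda q: q.full,
                             lambda: (True, False), timeout_cycles=10)
```

A smaller queue depth makes the "full" state reachable in fewer cycles, which is presumably why reducing RTL queue sizes helps. In a real bench, a time-out would then need triage to separate a weak stimulus from a genuine hang in the design.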


UNIT-LEVEL TEST BENCHES SUMMARY (2 OF 3)

§ 100% functional and code coverage, with a few coverage points waived
  – Selectively exporting functional coverage points to higher-level test benches
§ Checking done using assertions, high-level checkers, and x86 ISA model
  – All checks are re-used in the higher-level test benches
  – Checks for unit stimulus constraints exported to higher-level test benches
  – Create overlap of critical checking functions between unit-level and higher-level checkers
  – Black-box and white-box checking
  – White-box checks for:
     § Consistency between fields of different internal queues and arrays
     § Many others…
  – Thousands of inline RTL assertions


UNIT-LEVEL TEST BENCHES SUMMARY (3 OF 3)

§ Formal verification
  – Still relying on simulation
  – FP: FPA and FPM theorem proofs
  – Debug bus
  – LS (USQ)
  – CRC32
  – DE block
  – Others


"JAGUAR" FP UNIT-LEVEL TEST BENCH BLOCK DIAGRAM

[Figure: FP unit-level test bench block diagram. Labeled components: FPU RTL; ME BFM; CCU; Opgen; Test(s); FPU KOS Bridge; KOS; Checkers; FPU Mon; Monitors; CLK, Reset, Timeout; Load Store; Op db; Broadcast cloud; SRB BFM.]


JAGUAR LSDC TEST BENCH BLOCK DIAGRAM

[Figure: LSDC test bench block diagram.
  – DUT blocks: LS (scheduler, ordering queues, store queue), miss buffers, and DC (data cache, TLBs, table walker)
  – Front-end agent (represents ID, ME, EX, etc.)
  – Back-end agent (represents BU, NB, etc.; not re-used)
  – System memory with DRAM and I/O memory; MP memory model
  – MP probe/write irritator
  – Transactional stimulus generator recipes (multiply selectable, interleavable)
  – Memory layout and page translation configuration engines
  – Numerous monitors and scoreboard-based checkers, plus shadow models of D$ and TLBs
  – A dashed line indicates global re-use in the MP test bench]


JAGUAR L2I UNIT-LEVEL TEST BENCH BLOCK DIAGRAM

L2I connects 4 cores to NB and manages a shared, inclusive L2 cache.

[Figure: L2I unit-level test bench block diagram.
  – DUT: SCU, containing L2I (labels CRQ, PRQ, DSM, DPM), BANK/L2 TAG, and L2 DATA; several blocks are marked x4
  – Interface stimulus per external device and interface checking per device
  – Transaction generator (test plug-ins); memory preloaders, irritators, etc.
  – Checker comparing stimulus state with monitored state
  – Transaction driver (drive-x if invalid) and transaction monitor (x-checks)
  – Interfaces labeled Internal, Test, External, and Back door; Test interface]


"JAGUAR" RTL BUGS FOUND PER TEST BENCH LEVELS

Test Bench Level             Percentage of Found RTL Bugs
Unit-level test benches      31%
Top-level test bench         65%
System-level test bench      4%

•  Bug distribution does not match the typical/expected distribution
•  Top-level test bench found most bugs because:
   •  Some RTL blocks were covered only in the top-level test bench
   •  Unit-level test benches were used extensively by the RTL team to validate bug fixes, thanks to their good simulation performance (CPS)
   •  Bug hunting late in the project relies more on the top-level test bench to find corner cases involving multiple blocks


CHALLENGES


POWER-MANAGEMENT VERIFICATION (1 OF 3)

§ New power-management interface between Jaguar and the rest of the system
§ Shared L2 cache increases complexity
§ Each core can independently enter different power states
§ Number of possible states is very high:
  – Number of clusters
  – Number of cores
  – Number of possible power states
  – Number of wake-up events
  – Specific windows of interest
  – Number of features affected by power-management events (examples: probes, debug features, etc.)
§ Specification changes


POWER-MANAGEMENT VERIFICATION (2 OF 3)

§ Stimulus
  – Random generators
     § Random sequences to change power-management states
     § Per-thread stimulus
  – Power-management irritators
  – UNB BFM built-in randomization for certain power-management events
  – Very few directed tests

§ Coverage
  – Functional coverage extensively used
     § Per-core coverage points
     § Cross-coverage points
  – Microcode coverage
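The difference between per-core and cross-coverage points can be made concrete with a small collector. This is a sketch under stated assumptions: the three state names in `POWER_STATES` are placeholders rather than the actual Jaguar power states, and `PowerStateCoverage` is a hypothetical class, not the SV covergroup used in the project.

```python
from itertools import product
from typing import List, Sequence, Set, Tuple

POWER_STATES = ("C0", "C1", "C6")  # placeholder state names, not from the slides
NUM_CORES = 4                      # four cores per compute unit, per the slides


class PowerStateCoverage:
    """Records which combinations of per-core power states were observed."""

    def __init__(self) -> None:
        self.hit: Set[Tuple[str, ...]] = set()

    def sample(self, states: Sequence[str]) -> None:
        if len(states) != NUM_CORES or any(s not in POWER_STATES for s in states):
            raise ValueError(f"bad sample: {states!r}")
        self.hit.add(tuple(states))

    def per_core(self, core: int) -> float:
        """Fraction of power states seen on one core, ignoring the others."""
        seen = {combo[core] for combo in self.hit}
        return len(seen) / len(POWER_STATES)

    def cross(self) -> float:
        """Fraction of all state combinations across the four cores."""
        return len(self.hit) / len(POWER_STATES) ** NUM_CORES

    def holes(self) -> List[Tuple[str, ...]]:
        """Combinations not yet observed, sorted for stable reporting."""
        every = set(product(POWER_STATES, repeat=NUM_CORES))
        return sorted(every - self.hit)


cov = PowerStateCoverage()
cov.sample(["C0", "C0", "C0", "C0"])
cov.sample(["C6", "C0", "C0", "C0"])
```

Even with only three states per core there are 3^4 = 81 combinations to cross, before wake-up events or timing windows are considered. Per-core points can therefore reach 100% long before the cross does.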


POWER-MANAGEMENT VERIFICATION (3 OF 3)

§ Checking
  – Power-management checker
  – L2I and other checkers
  – Self-checking directed tests

§ Test bench level
  – Unit-level test benches (L2I)
  – Top-level test bench
     § Most of the verification is done here because of the heavy dependency on microcode and better simulation performance than on the SOC-level test bench
  – SOC (System)-level test benches
     § For power-management verification, the SOC-level test bench is very important:
        – Some power-management features are very complex and not all details are well documented
        – Use SOC-level test bench to check power-management constraints used at lower-level test benches
        – First level at which all power-management components are integrated


COHERENCY, SELF-MODIFYING CODE, AND CROSS-MODIFYING CODE (1 OF 2)

§ Coherency: traditionally an area of concern in multi-processor (MP) environments
  – MOESI protocol
  – Common core interface (CCI) protocol between cluster and NB
  – Inclusive L2 cache shared among four cores (designed from scratch)
     § Example: flushing and invalidation of shared caches
  – Complexity increases with increased number of clusters
§ SMC/CMC handled by hardware in x86
  – Increased complexity due to inclusive shared L2 cache
§ Out-of-order execution adds complexity
  – LS block totally redesigned
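To make the MOESI checking concrete, the sketch below tests the textbook invariants for one cache line across peer caches. It is not the Jaguar coherency checker: `moesi_violations` is a hypothetical helper, it ignores transient states, and it does not model the inclusive L2 that sits above the L1 caches.

```python
from typing import List, Optional, Sequence, Tuple

Copy = Tuple[str, Optional[int]]  # (MOESI state, line data; None when invalid)
VALID_STATES = frozenset("MOESI")


def moesi_violations(copies: Sequence[Copy]) -> List[str]:
    """Return a list of invariant violations for one line across peer caches."""
    problems: List[str] = []
    for state, _ in copies:
        if state not in VALID_STATES:
            problems.append(f"unknown state {state!r}")
    valid = [(state, data) for state, data in copies if state != "I"]
    states = [state for state, _ in valid]
    # Modified and Exclusive copies must be the only valid copy in the system.
    for exclusive in ("M", "E"):
        if exclusive in states and len(states) > 1:
            problems.append(f"{exclusive} copy coexists with another valid copy")
    # Owned may coexist with Shared copies, but only one cache can own the line.
    if states.count("O") > 1:
        problems.append("more than one Owned copy")
    # All valid copies must hold the same data.
    if len({data for _, data in valid}) > 1:
        problems.append("valid copies hold different data")
    return problems


owned_and_shared = [("O", 7), ("S", 7), ("S", 7), ("I", None)]
modified_and_shared = [("M", 7), ("S", 7), ("I", None), ("I", None)]
```

The first example is legal, since one owner may coexist with shared copies. The second is not, because a Modified line must be the only valid copy.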


COHERENCY, SMC, AND CMC (2 OF 2)

§ Verification
  – Done at multiple levels of test benches
     § L2I test bench, top-level test bench, and SOC-level test benches
  – Multiple levels of checkers (Jaguar-specific checkers and IP checkers)
     § CCI IP protocol checkers (and coverage)
  – MP memory-model checker used for ordering and data consistency
  – Different types of stimulus (SV transaction-based, random generators, and directed tests)
     § Some random generators created to target coherence/SMC/CMC specifically
        – True and false sharing
     § Cache preloading
     § Functional coverage used to check quality of random stimulus

§ MP test bench created to target coherency
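True and false sharing differ only in how threads' addresses map onto a cache line, which a stimulus generator can control directly. The sketch below assumes a 64-byte line and 8-byte accesses; both values, and the helper names, are illustrative assumptions rather than details taken from the slides.

```python
from typing import List

LINE_BYTES = 64  # assumed cache-line size for this sketch


def same_line(addr_a: int, addr_b: int) -> bool:
    """True when two byte addresses fall in the same cache line."""
    return addr_a // LINE_BYTES == addr_b // LINE_BYTES


def true_sharing_addresses(line_base: int, num_threads: int, offset: int = 0) -> List[int]:
    """Every thread accesses the very same location."""
    return [line_base + offset] * num_threads


def false_sharing_addresses(line_base: int, num_threads: int,
                            access_bytes: int = 8) -> List[int]:
    """Each thread gets its own location, but all locations share one line."""
    if line_base % LINE_BYTES != 0:
        raise ValueError("line_base must be line-aligned")
    if num_threads * access_bytes > LINE_BYTES:
        raise ValueError("accesses do not fit in one cache line")
    return [line_base + thread * access_bytes for thread in range(num_threads)]


false_shared = false_sharing_addresses(0x1000, num_threads=4)
true_shared = true_sharing_addresses(0x1000, num_threads=4)
```

With false sharing the threads never touch each other's data, yet every write still forces coherence traffic on the shared line. That makes it a useful stress pattern for the probe and invalidation paths.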


JAGUAR MP (LSDC + BU + L2I) TEST BENCH BLOCK DIAGRAM

[Figure: MP test bench block diagram.
  – DUT: CPC with four “Core” instances and the SCU (L2I plus four L2D banks), attached to a fake NB with DRAM and I/O memory
  – Exploded view of each “Core”: BU and LSDC, with BU monitors and checkers, LSDC monitors and checkers, LSDC test bench stimulus, and IF stimulus (not re-used)
  – System memory with DRAM and I/O memory; MP memory model
  – Memory layout and page translation configuration engines
  – Various core/CPC-level checkers, irritators, and cache preloaders]


MISCELLANEOUS

§ Verification done in multiple geographic locations
  – Time zone differences
     § Good for 24-hours-a-day work on a project
     § Challenge for meetings and communication
  – Sharing methodologies and tools
§ Jaguar is designed to be used in multiple SOCs
§ Adding new features late in a project
§ Compressed schedule
§ Jaguar verification team worked on two very successful projects (Bobcat and Jaguar)
§ Verification team starts a new project


DISCLAIMER & ATTRIBUTION

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION © 2012 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.