
Rocketick accelerated verilog simulations

Page 1: Rocketick  accelerated verilog simulations

May 2, 2012 1

Uri Tal, CEO

GPU-Accelerated Simulations

Page 2: Rocketick  accelerated verilog simulations

The User Challenge: Verification Bottleneck

Long verification process
– In 66% of designs, verification takes >50% of the design cycle
– In ~40% of projects, simulation regression runtime is longer than 1 day

Large design/SoC simulation challenge
– >40% of designs are larger than 10M gates
– Difficult to simulate the entire design/SoC

Excessive computing resources
– Requires 10s or 100s of GBytes of memory
– Needs the most advanced CPUs

Effort spent on verification increased by >58% in the last 4 years

Source: 2010 Wilson Research Group and Mentor Graphics Functional Verification Study

Page 3: Rocketick  accelerated verilog simulations

Exploding Cost of Verification

[Chart: verification cost by year, 2001 to 2011; vertical axis from $0B to $8.0B; series: Engineering Cost and Compute Farm]

Source: Synopsys

Page 4: Rocketick  accelerated verilog simulations

Functional Verification Solutions

– Simulation
– Emulation/Acceleration
– Formal Verification
– Others

Page 5: Rocketick  accelerated verilog simulations

Simulators using CPUs

– Event driven: a single queue of events
– Memory access patterns: cache misses
– Multi-core CPUs: only one order of magnitude
  – Communication latency
  – Limited bandwidth
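
To make the "single queue of events" point concrete, here is a minimal sketch of an event-driven gate-level simulator in Python. It is illustrative only and is not taken from any commercial simulator; the function name `simulate`, the netlist format, and the unit gate delay are all assumptions made for this example. The point to notice is that every value change passes through one time-ordered queue and is handled one event at a time.

```python
import heapq

def simulate(gates, initial, stimuli, delay=1):
    """Event-driven simulation with one global, time-ordered event queue.

    gates:   {output_net: (function, [input_nets])}  (hypothetical format)
    initial: {net: value} starting value of every net
    stimuli: [(time, net, value)] externally driven changes
    Returns the final net values and the number of events processed.
    """
    values = dict(initial)
    # Map each net to the gates that read it, so a change wakes only its fanout.
    fanout = {}
    for out, (_, inputs) in gates.items():
        for net in inputs:
            fanout.setdefault(net, []).append(out)

    queue = list(stimuli)
    heapq.heapify(queue)
    processed = 0
    while queue:
        time, net, value = heapq.heappop(queue)  # one event at a time
        processed += 1
        if values[net] == value:
            continue  # no change, nothing to propagate
        values[net] = value
        for out in fanout.get(net, []):
            func, inputs = gates[out]
            new_value = func(*(values[i] for i in inputs))
            heapq.heappush(queue, (time + delay, out, new_value))
    return values, processed


# Example netlist (assumed): y = (a AND b) OR c
gates = {
    "n1": (lambda a, b: a & b, ["a", "b"]),
    "y": (lambda n1, c: n1 | c, ["n1", "c"]),
}
initial = {"a": 0, "b": 0, "c": 0, "n1": 0, "y": 0}
stimuli = [(0, "a", 1), (0, "b", 1)]
final_values, events = simulate(gates, initial, stimuli)
print(final_values["y"], events)
```

Because each popped event can schedule further events that depend on it, the loop is inherently sequential, and the lookups into `values` follow the netlist connectivity rather than any cache-friendly order.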

Page 6: Rocketick  accelerated verilog simulations

HW Solutions

Hardware accelerators and emulators are not simulators
– Suitable for system-level debug
– Are very expensive (HW cost, custom design)
– Require significant effort for bring-up
– Are limited in capacity (large designs require several boxes)

They lack:
– Support for non-synthesizable code
– 4-state logic
– Full debug visibility
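
For readers unfamiliar with the term, 4-state logic means each signal can be 0, 1, X (unknown), or Z (high impedance), as in standard Verilog. The sketch below shows the usual Verilog truth-table behavior for AND, OR, and NOT in Python. The function names `and4`, `or4`, and `not4` are hypothetical and exist only for this illustration.

```python
# The four Verilog logic values, written as single-character strings:
# "0", "1", "x" (unknown), "z" (high impedance).

def and4(a, b):
    """4-state AND: a 0 on either input forces 0, even if the other is unknown."""
    if a == "0" or b == "0":
        return "0"
    if a == "1" and b == "1":
        return "1"
    return "x"  # any remaining combination involves x or z

def or4(a, b):
    """4-state OR: a 1 on either input forces 1, even if the other is unknown."""
    if a == "1" or b == "1":
        return "1"
    if a == "0" and b == "0":
        return "0"
    return "x"

def not4(a):
    """4-state NOT: x and z inputs both yield x."""
    return {"0": "1", "1": "0"}.get(a, "x")

print(and4("0", "x"), and4("1", "z"), or4("1", "x"), not4("z"))
```

A 2-state engine collapses X and Z to 0 or 1, which can hide uninitialized-register and floating-bus bugs that a 4-state simulation exposes.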

Page 7: Rocketick  accelerated verilog simulations

Simulations are Single-Threaded

Page 8: Rocketick  accelerated verilog simulations

The Cache becomes Useless

Page 9: Rocketick  accelerated verilog simulations

CPU Low Utilization

Thread #1 ALU Memory Load ALU Memory Load

Single thread + no “cache hit” → low CPU utilization

Page 10: Rocketick  accelerated verilog simulations

A Brute-Force Approach

Page 11: Rocketick  accelerated verilog simulations

GPU Computing

Source: NVIDIA

Page 12: Rocketick  accelerated verilog simulations

GPU Computing (cont'd)

– 146X: Medical Imaging (U of Utah)
– 36X: Molecular Dynamics (U of Illinois, Urbana)
– 50X: Matlab Computing (AccelerEyes)
– 100X: Astrophysics (RIKEN)
– 149X: Financial simulation (Oxford)

Source: NVIDIA

Page 13: Rocketick  accelerated verilog simulations

The Power of GPU

[Chart: Peak Single Precision Performance (GFlops/sec), 2003 to 2010; labeled points: Tesla 8-series, Tesla 10-series, Tesla 20-series, Nehalem 3 GHz]

Source: NVIDIA

Page 14: Rocketick  accelerated verilog simulations

GPU 100% Utilization

Thread #1 ALU Memory Load ALU Memory Load

Thread #2 ALU Memory Load ALU Memory Load

Thread #3 ALU Memory Load ALU Memory Load

Thread #4 ALU Memory Load ALU Memory Load

Thread #5 ALU Memory Load ALU Memory Load

Pipelining multiple threads can increase utilization to 100%
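
The utilization claim can be checked with a toy model. The sketch below assumes one shared ALU, threads that alternate a 1-cycle ALU operation with a memory load, and a load latency of 4 cycles. These numbers and the name `alu_utilization` are assumptions for illustration, not figures from the slides or from any real GPU.

```python
def alu_utilization(num_threads, mem_latency=4, cycles=1000):
    """Fraction of cycles in which the single ALU does useful work.

    Each thread repeats: 1 cycle of ALU work, then a memory load that
    takes `mem_latency` cycles, during which that thread cannot use the ALU.
    """
    ready_at = [0] * num_threads  # cycle at which each thread may use the ALU again
    busy = 0
    for cycle in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1  # thread t occupies the ALU this cycle
                ready_at[t] = cycle + 1 + mem_latency  # then waits on its load
                break  # only one thread can use the ALU per cycle
    return busy / cycles

for n in (1, 2, 5):
    print(n, alu_utilization(n))
```

Under these assumptions a single thread keeps the ALU busy one cycle in five, while five threads overlap each other's load latency completely and keep it busy every cycle.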

Page 15: Rocketick  accelerated verilog simulations

Logic Simulations - Challenge

– Billions of computing elements

– Short/simple calculations

– Many dependencies

– How to SIMD?

Page 16: Rocketick  accelerated verilog simulations

Breaking the Dependency Barrier

Page 17: Rocketick  accelerated verilog simulations

Compilation Stages

Analyze
• Parse source files
• RTL/Static Elaboration

Compile
• Create optimal dependency graphs
• Calculate optimal GPU invocation schemes
• Generate skeleton (ske.v)

Assembly
• Calculate optimal memory allocation for variables
• Generate final recipes for the GPU virtual machine
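
The slides do not describe how the dependency graphs or GPU invocation schemes are computed. As a generic illustration of why a dependency graph matters for data-parallel execution, the sketch below uses textbook levelization: nodes are grouped into levels so that each node depends only on earlier levels, which means all nodes in one level could be evaluated in a single parallel step. This is not presented as Rocketick's algorithm; the name `levelize` and the input format are assumptions.

```python
from collections import defaultdict

def levelize(deps):
    """Group nodes into levels; each node depends only on earlier levels.

    deps: {node: [nodes it reads]} (hypothetical format). The graph must be
    acyclic, e.g. the combinational logic between registers.
    """
    level = {}

    def depth(node):
        # A node's level is one more than the deepest node it reads;
        # nodes with no dependencies land on level 0.
        if node not in level:
            level[node] = 1 + max(
                (depth(d) for d in deps.get(node, [])), default=-1
            )
        return level[node]

    for node in deps:
        depth(node)

    batches = defaultdict(list)
    for node, lvl in level.items():
        batches[lvl].append(node)
    return [sorted(batches[lvl]) for lvl in sorted(batches)]


# Example (assumed netlist): y reads n1 and n2, which read the inputs a, b, c.
deps = {
    "a": [], "b": [], "c": [],
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "y": ["n1", "n2"],
}
levels = levelize(deps)
print(levels)
```

In this example `n1` and `n2` share a level because neither reads the other, so they are candidates for simultaneous evaluation.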

Page 18: Rocketick  accelerated verilog simulations

Distributed Compilation

[Diagram: Splitting Phase (Splitter) → Compilation Phase (Split 0, Split 1, Split 2, ..., Split n) → Linking Phase (Link)]

Page 19: Rocketick  accelerated verilog simulations

Co-Simulation Approach

Page 20: Rocketick  accelerated verilog simulations

RocketSim™ Overview: Summary

Highly cost-effective simulation offload engine, based on GPUs

10x acceleration factor compared to Cadence ncsim & Synopsys vcs

Acceleration increases with every new GPU generation

Works seamlessly with every existing simulator (Cadence, Synopsys, …)

Zero ramp-up time

Supports extremely large designs (giga-gates)

Short compilation time (minutes)

Full visibility

Page 21: Rocketick  accelerated verilog simulations

Customer testimonials

“With RocketSim, simulation time was reduced dramatically from weeks to days, with a tenfold increase in speed and five-fold decrease in server RAM requirements. This, together with support of 4-state and capacity of more than 1G gates, give us a superb tool to simulate our next generation GPU designs.”

- Dan Smith, Director of Engineering, NVIDIA

“Rocketick's RocketSim™ simulation accelerator solved verification bottleneck of our SwitchX® switch silicon IC project by running 37 days’ worth of simulation over a single weekend without changes in our standard verification environment and scripts. In addition, reducing memory consumption from 192GB to less than 8GB allowed full chip simulation of SwitchX®, which was previously impossible using standard simulators.”

- Eitan Zahavi, Senior Director of Engineering, Mellanox

Page 22: Rocketick  accelerated verilog simulations

Thank you for your time

For more information: www.rocketick.com