May 2, 2012 1
Uri Tal, CEO
GPU-Accelerated Simulations
The User Challenge: Verification Bottleneck
Long verification process
– In 66% of designs, verification takes >50% of the design cycle
– In ~40% of projects, simulation regression runtime is longer than 1 day
Large design/SoC simulation challenge
– >40% of designs are larger than 10M gates
– Difficult to simulate the entire design/SoC
Excessive computing resources
– Requires tens or hundreds of GBytes of memory
– Needs the most advanced CPUs
Effort spent on verification increased by >58% in the last 4 years
Source: 2010 Wilson Research Group and Mentor Graphics Functional Verification Study
Exploding Cost of Verification
[Chart: verification cost per year, 2001 to 2011, on a scale of $0B to $8B, split into Engineering Cost and Compute Farm]
Source: Synopsys
Functional Verification Solutions
– Simulation
– Emulation/Acceleration
– Formal Verification
– Others
Simulators using CPUs
– Event driven: single queue of events
– Memory access patterns: cache misses
– Multi-core CPUs:
  – Only one order of magnitude
  – Communication latency
  – Limited bandwidth
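To make the "single queue of events" point concrete, here is a minimal sketch of an event-driven simulator loop. It is a toy model and not the internals of any commercial simulator; the class name `EventSim`, its methods and the unit gate delay are all assumptions made for illustration. The thing to notice is that every value change passes through one time-ordered queue, so the main loop is inherently serial.

```python
import heapq
from itertools import count

class EventSim:
    """Toy event-driven simulator with one global, time-ordered event queue."""

    def __init__(self):
        self.queue = []      # heap of (time, seq, signal, value)
        self.seq = count()   # tie-breaker keeps ordering deterministic
        self.values = {}     # current value of each signal
        self.fanout = {}     # signal -> gates that read it
        self.processed = 0   # number of events popped so far

    def add_gate(self, output, func, inputs, delay=1):
        for sig in inputs:
            self.fanout.setdefault(sig, []).append((delay, output, func, inputs))

    def schedule(self, time, signal, value):
        heapq.heappush(self.queue, (time, next(self.seq), signal, value))

    def run(self):
        # Events are popped strictly one at a time: the serial bottleneck.
        while self.queue:
            time, _, signal, value = heapq.heappop(self.queue)
            self.processed += 1
            if self.values.get(signal) == value:
                continue     # no change, nothing to propagate
            self.values[signal] = value
            for delay, output, func, inputs in self.fanout.get(signal, []):
                args = [self.values.get(s, 0) for s in inputs]  # unknown inputs read as 0
                self.schedule(time + delay, output, func(*args))
        return self.values

# y = (a AND b) OR c
sim = EventSim()
sim.add_gate("n1", lambda a, b: a & b, ["a", "b"])
sim.add_gate("y", lambda n, c: n | c, ["n1", "c"])
for name, value in (("a", 1), ("b", 1), ("c", 0)):
    sim.schedule(0, name, value)
result = sim.run()
```

Even this three-input circuit generates more events than it has signals, because intermediate values are scheduled and then overwritten; each of them goes through the same queue.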
HW Solutions
Hardware accelerators and emulators are not simulators
– Suitable for system-level debug
– Very expensive: HW cost, custom design
– Require significant effort for bring-up
– Limited in capacity (large designs require several boxes)
They lack:
– Support for non-synthesizable code
– 4-state logic
– Full debug visibility
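The slide does not define 4-state logic. Assuming it means the usual Verilog-style value set (0, 1, unknown X, high-impedance Z), a minimal sketch of the basic gate rules looks like this. The function names `and4`, `or4` and `not4` are illustrative only.

```python
# Four-state values as used in Verilog-style simulation (assumed value set).
ZERO, ONE, X, Z = "0", "1", "x", "z"

def and4(a, b):
    # A controlling 0 on either input decides the result, even against X or Z.
    if a == ZERO or b == ZERO:
        return ZERO
    if a == ONE and b == ONE:
        return ONE
    return X  # any remaining combination involves X or Z

def or4(a, b):
    # A controlling 1 on either input decides the result.
    if a == ONE or b == ONE:
        return ONE
    if a == ZERO and b == ZERO:
        return ZERO
    return X

def not4(a):
    # X and Z both invert to X.
    return {ZERO: ONE, ONE: ZERO}.get(a, X)
```

A 2-state engine collapses X and Z to 0 or 1, which can hide uninitialized-register and undriven-net bugs that a 4-state simulation would propagate as X.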
Simulations are Single-Threaded
The Cache becomes Useless
CPU Low Utilization
Thread #1 ALU Memory Load ALU Memory Load
Single thread + no “cache hit” → low CPU utilization
A Brute-Force Approach
GPU Computing
Source: NVIDIA
GPU Computing (cont.)
– 146X: Medical Imaging, U of Utah
– 36X: Molecular Dynamics, U of Illinois, Urbana
– 50X: Matlab Computing, AccelerEyes
– 100X: Astrophysics, RIKEN
– 149X: Financial Simulation, Oxford
Source: NVIDIA
The Power of GPU
[Chart: Peak Single Precision Performance (GFlops/sec), 2003 to 2010, comparing Tesla 8-series, Tesla 10-series and Tesla 20-series GPUs with a 3 GHz Nehalem CPU]
Source: NVIDIA
GPU 100% Utilization
Thread #1 ALU Memory Load ALU Memory Load
Thread #2 ALU Memory Load ALU Memory Load
Thread #3 ALU Memory Load ALU Memory Load
Thread #4 ALU Memory Load ALU Memory Load
Thread #5 ALU Memory Load ALU Memory Load
Pipelining multiple threads can increase utilization to 100%
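The utilization claim can be checked with a small cycle-count model. This is a deliberately simplified sketch, not a model of any real GPU scheduler: it assumes a single ALU, threads that strictly alternate one ALU cycle with one memory load, and a fixed load latency. The function name `alu_utilization` and the latency of 4 cycles are assumptions for illustration.

```python
def alu_utilization(num_threads, mem_latency, cycles=10_000):
    """Fraction of cycles in which the single ALU does useful work.

    Each thread repeats: one ALU cycle, then a memory load that keeps it
    stalled for `mem_latency` cycles. Stalled threads do not block the others.
    """
    ready_at = [0] * num_threads  # cycle at which each thread can issue again
    busy = 0
    for cycle in range(cycles):
        for t in range(num_threads):
            if ready_at[t] <= cycle:
                busy += 1                              # ALU executes thread t
                ready_at[t] = cycle + 1 + mem_latency  # then t waits on its load
                break                                  # one ALU op per cycle
    return busy / cycles

single = alu_utilization(num_threads=1, mem_latency=4)  # 0.2
paired = alu_utilization(num_threads=2, mem_latency=4)  # 0.4
five = alu_utilization(num_threads=5, mem_latency=4)    # 1.0
```

Under these assumptions utilization grows linearly with the thread count until there are `mem_latency + 1` threads in flight, at which point some thread is always ready and the ALU never idles.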
Logic Simulations - Challenge
– Billions of computing elements
– Short/simple calculations
– Many dependencies
– How to SIMD?
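The slides do not say how RocketSim answers the SIMD question. One generic approach, sketched here purely as an illustration, is to collect gates that do not depend on each other and group them by operation, so that each group becomes "one instruction, many data". The names `OPS` and `evaluate_batched` are hypothetical, and the inner Python loop only stands in for what would be a data-parallel launch on a GPU.

```python
from collections import defaultdict

OPS = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}

def evaluate_batched(gates, values):
    """Evaluate mutually independent gates, one batch per operation type.

    gates:  list of (output, op, input_a, input_b)
    values: dict of already-known signal values (0 or 1)
    Returns (results, batch_sizes).
    """
    batches = defaultdict(list)
    for out, op, a, b in gates:
        batches[op].append((out, values[a], values[b]))
    results = {}
    for op, batch in batches.items():
        fn = OPS[op]
        # Same operation applied to every element of the batch.
        for out, a, b in batch:
            results[out] = fn(a, b)
    return results, {op: len(batch) for op, batch in batches.items()}

values = {"a": 1, "b": 0, "c": 1, "d": 1}
gates = [
    ("g1", "and", "a", "b"),
    ("g2", "and", "c", "d"),
    ("g3", "xor", "a", "c"),
    ("g4", "or", "b", "d"),
]
results, sizes = evaluate_batched(gates, values)
```

The batch sizes show the catch: with many short, heterogeneous calculations, batches can be small unless the compiler reorganizes the design to expose more identical work.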
Breaking the Dependency Barrier
Compilation Stages
Analyze
• Parse source files
• RTL/Static Elaboration
Compile
• Create optimal dependency graphs
• Calculate optimal GPU invocation schemes
• Generate skeleton (ske.v)
Assembly
• Calculate optimal memory allocation for variables
• Generate final recipes for the GPU virtual machine
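The slide names the dependency graph and the GPU invocation scheme without describing how they are built. A textbook technique that fits the description is levelization: sort the graph topologically and group nodes into levels, so that everything in one level can be evaluated in parallel. The sketch below shows that technique only; it is an assumption that it resembles what the tool does, and the function name `levelize` is hypothetical.

```python
from collections import defaultdict, deque

def levelize(deps):
    """Group nodes into levels so each node depends only on earlier levels.

    deps: dict mapping node -> list of nodes it reads from.
    Returns a list of levels, each a sorted list of node names.
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    indegree = {n: len(set(deps.get(n, []))) for n in nodes}
    users = defaultdict(list)
    for n, ds in deps.items():
        for d in set(ds):
            users[d].append(n)

    level = {n: 0 for n in nodes}
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    visited = 0
    while ready:
        n = ready.popleft()
        visited += 1
        for u in users[n]:
            level[u] = max(level[u], level[n] + 1)  # one past the deepest input
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    if visited != len(nodes):
        raise ValueError("dependency cycle: graph is not a DAG")

    grouped = defaultdict(list)
    for n, lv in level.items():
        grouped[lv].append(n)
    return [sorted(grouped[lv]) for lv in sorted(grouped)]

levels = levelize({
    "n1": ["a", "b"],
    "n2": ["b", "c"],
    "y": ["n1", "n2"],
    "z": ["y", "a"],
})
```

In this example `levels` has four entries; under the one-launch-per-level assumption that would mean four GPU invocations, with `n1` and `n2` evaluated together in the second one.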
Distributed Compilation
Splitting Phase → Compilation Phase → Linking Phase
[Diagram: a Splitter divides the design into Split 0, Split 1, Split 2, ..., Split-n; each split feeds a Link step]
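The three phases can be sketched as a split, parallel compile, link pipeline. Everything below is a stand-in: the round-robin splitter, the `compile_split` placeholder and the module names are assumptions for illustration, threads stand in for separate compile machines, and the sketch merges all results in a single link step even though the slide's diagram appears to draw a Link box per split.

```python
from concurrent.futures import ThreadPoolExecutor

def split(modules, n):
    """Splitting phase: deal modules round-robin into n independent splits."""
    return [modules[i::n] for i in range(n)]

def compile_split(part):
    """Compilation phase stand-in: turn each module name into an 'object'."""
    return {name: f"obj({name})" for name in part}

def link(objects):
    """Linking phase: merge the per-split results into one image."""
    image = {}
    for obj in objects:
        image.update(obj)
    return image

def distributed_compile(modules, n_splits=3):
    parts = split(modules, n_splits)
    # Splits share no state, so they can be compiled concurrently.
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        objects = list(pool.map(compile_split, parts))
    return link(objects)

image = distributed_compile(["cpu", "dma", "uart", "ddr", "pcie"])
```

Because the compile step of each split is independent, wall-clock compile time in such a scheme is bounded by the largest split plus the split and link overhead, rather than by the whole design.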
Co-Simulation Approach
RocketSim™ Overview: Summary
– Highly cost-effective simulation offload engine, based on GPUs
– 10x acceleration factor compared to Cadence ncsim and Synopsys vcs
– Acceleration increases with every new GPU generation
– Works seamlessly with every existing simulator (Cadence, Synopsys, …)
– Zero ramp-up time
– Supports extremely large designs (giga-gates)
– Short compilation time (minutes)
– Full visibility
“With RocketSim, simulation time was reduced dramatically from weeks to days, with a tenfold increase in speed and five-fold decrease in server RAM requirements. This, together with support of 4-state and capacity of more than 1G gates, give us a superb tool to simulate our next generation GPU designs.”
- Dan Smith, Director of Engineering, NVIDIA
"Rocketick's RocketSim™ simulation accelerator solved verification bottleneck of our SwitchX® switch silicon IC project by running 37 days’ worth of simulation over a single weekend without changes in our standard verification environment and scripts. In addition, reducing memory consumption from 192GB to less than 8GB allowed full chip simulation of SwitchX®, which was previously impossible using standard simulators."
- Eitan Zahavi, Senior Director of Engineering, Mellanox
Customer Testimonials
Thank you for your time
For more information: www.rocketick.com