Enabling HPC Design
David Zhang, Cadence
Sridhar Valluru, Arm
Arm TechSymposia
November 6, 2017
2 © 2017 Cadence Design Systems, Inc. All rights reserved.
Agenda
• Introductions
• Challenges to High-Performance Computing (HPC) Design
– With challenges come opportunities
• HPC Architecture and Key Subsystem
– Knowing the key blocks
• HPC System Design Enablement
– Lower risk and increased success
• Real Results and Silicon
– Leveraging learnings
• Wrap Up/Questions
Challenges and Growth Opportunity for HPC
Data Is Growing Exponentially
• Facebook: 4.5B posts/day, 1B videos/day
• Amazon: 426 items sold/second
• Netflix: 225 GB/week per person
• YouTube: 100 hours of video uploaded each minute
• WeChat: 40 minutes/day, 1.1 billion users
Exponential Data Growth Is Driving Changes in the Datacenter
Compute: More Processing Power
• Mobility and IoT have accelerated the need for fast and efficient data management
Connectivity: Bigger Pipe
• PCI Express® (PCIe®) 3.0 cannot handle the bandwidth of the latest Ethernet without increased lane usage
Storage: Larger/Faster Memory
• Storage and analysis of the vast quantity of data being created
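The connectivity point can be checked with simple arithmetic. The sketch below is illustrative only: it uses the per-lane signalling rates of PCIe 3.0 (8 GT/s) and PCIe 4.0 (16 GT/s) with 128b/130b line encoding, and it ignores packet and protocol overhead, so real usable throughput is lower. The function names and the choice of 25GbE and 100GbE as example Ethernet rates are assumptions made for this example.

```python
import math

# Per-lane raw signalling rates in GT/s for the two PCIe generations.
LANE_RATE_GTPS = {"PCIe 3.0": 8.0, "PCIe 4.0": 16.0}
# Both generations use 128b/130b line encoding.
ENCODING_EFFICIENCY = 128 / 130


def lane_bandwidth_gbps(generation: str) -> float:
    """Bandwidth of one lane in one direction, in Gb/s, after line encoding only."""
    return LANE_RATE_GTPS[generation] * ENCODING_EFFICIENCY


def lanes_needed(generation: str, ethernet_gbps: float) -> int:
    """Smallest common link width whose bandwidth covers the Ethernet rate."""
    min_lanes = math.ceil(ethernet_gbps / lane_bandwidth_gbps(generation))
    for width in (1, 2, 4, 8, 16, 32):
        if width >= min_lanes:
            return width
    raise ValueError("rate exceeds a x32 link")


for gen in LANE_RATE_GTPS:
    for eth in (25, 100):
        print(f"{gen}: {eth}GbE needs x{lanes_needed(gen, eth)}")
```

Under these assumptions a 100GbE port needs a x16 link on PCIe 3.0 but only a x8 link on PCIe 4.0, which is the "increased lane usage" the slide refers to.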
Design Challenges: Demands at the SoC
• Needs/requirements
– Power (efficiency), performance (higher), latency (lower)
– Scalability, reliability, portability, and durability
– QoS-sensitive computing demands
– Heterogeneity, redundancy
– Packaging and thermal
• Challenges
– New competing specifications – CCIX, Gen-Z, OpenCAPI
– Architectural changes – x86 to Arm®-based
– IP selection and integration
– Verification and validation
– System bring-up
– Hardware and software – for example, PCIe just needs to work
– Ultimately, determining system requirements
Development Decisions: Building your SoC
• Architectural choices
– Components: CPUs, GPUs, FPGAs, DSPs, accelerators
– Processors: x86, Arm, Power, etc.
• Process choices
– Process nodes: 28/22nm, 16/14nm, 10/7nm
– Foundry technologies: Bulk, FinFET, FDSOI
• Tools decisions
– Design, development, verification, and validation
– Hardware, software, and performance analysis
• IP decisions
– Processing: CPU, DSP, controllers, NoC
– Interface and I/Os: PCIe, DDR, CCIX, Ethernet, peripherals
• Hardware and software decisions
– Emulation
– Operating systems, drivers, and APIs
HPC Architecture and Key Subsystems
The Enterprise SoC Server Anatomy: A typical system diagram
[Block diagram: FPGA, CPU Cluster, PCIe 4.0, CCIX, DDR4, DDR4 PHY, Debug and Trace, Peripherals, Power Management, Cache-Coherent Interconnect, GPIO, NIC and Flash I/F (two instances), Clocks and Test, I/Os, System Memory Management Unit]
The Enterprise SoC Server Anatomy: Processing subsystem and NoC
• Arm Cortex®-A CPU Cluster
• Arm Processors
• Cache-Coherent Interconnect or NoC/Fabric
The Enterprise SoC Server Anatomy: Power management subsystem
• Power Management Subsystem
• NIC and Boot
The Enterprise SoC Server Anatomy: Debug and peripherals
• Debug
• Peripherals
The Enterprise SoC Server Anatomy: PCIe subsystem
• PCIe 3.0 or PCIe 4.0 optimized with specific SMMU<>PCIe RC integration features
• PCIe 4.0 steps up bandwidth requirements significantly
The Enterprise SoC Server Anatomy: CCIX subsystem
• New CCIX protocol adds increased performance and cache coherency
• New CCIX protocol adds chip-to-chip acceleration and flexibility
The Enterprise SoC Server Anatomy: DDR memory subsystem
• Integration of DDR4 controller and PHY to provide memory access to PCIe 4.0 and CCIX
• DDR4 at 3200+ to achieve optimal system performance and provide high bandwidth
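To put "DDR4 at 3200+" in context, the sketch below computes the theoretical peak bandwidth of a DDR4-3200 channel and compares it with one direction of a PCIe 4.0 x16 link. It assumes a standard 64-bit data bus per channel (ECC bits excluded) and counts only 128b/130b line encoding on the PCIe side. Sustained figures in a real system are lower, and the function names are made up for this example.

```python
import math


def ddr_peak_gbytes_per_s(transfer_rate_mtps: float, bus_width_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one DDR channel in GB/s."""
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mtps * 1e6 * bytes_per_transfer / 1e9


def pcie_gbytes_per_s(rate_gtps: float, lanes: int) -> float:
    """One-direction PCIe link bandwidth in GB/s, counting 128b/130b encoding only."""
    return rate_gtps * (128 / 130) * lanes / 8


ddr4_3200 = ddr_peak_gbytes_per_s(3200)       # one 64-bit channel
pcie4_x16 = pcie_gbytes_per_s(16.0, 16)       # PCIe 4.0, x16, one direction
channels = math.ceil(pcie4_x16 / ddr4_3200)   # channels needed to absorb that stream

print(f"DDR4-3200 channel: {ddr4_3200:.1f} GB/s")
print(f"PCIe 4.0 x16 (one direction): {pcie4_x16:.1f} GB/s")
print(f"DDR4-3200 channels needed to match: {channels}")
```

With these numbers a single DDR4-3200 channel peaks at 25.6 GB/s, below what one direction of a PCIe 4.0 x16 link can deliver, which is one reason memory speed and channel count matter for the PCIe 4.0 and CCIX subsystems.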
Flagship Design IP Subsystems: Arm and Cadence IP integration
• Design IP: PCIe 4.0 Controller and PHY; CCIX Controller and PHY; DDR4 Controller and PHY; Peripherals
• Arm IP: Arm Cortex-A CPU Cluster; CMN Interconnect
• Software: Drivers and API
• FPGA
[Diagram, CCIX controller: PCIe TL and CCIX TL (each inbound and outbound), Data Link Layer (DLL), Logical PHY Layer (LPL), AXI4 I/F, CXS, APB, PIPE Interface]
[Diagram, DDR controller: Host Interface 2 to 16 Points (AXI4, AXI3, AHB), AMBA, Multiport and Command Arbiter, Command Queue, Write Queue, Read Queue, Transaction Processing, Look-Ahead Optimization, BIST, Config Registers, Low power, DFI, DDR PHY]
[Diagram, software stack: ISR, OS Wrapper/Reference, Core Driver, Hardware Access, Interrupt Handling/Polling, CPS Interface, DIP Int. Ctrl, CDI; layers marked as developed by Cadence or developed by customer]
HPC System Integration and Design Enablement
Digital Implementation Tools Suite: Core engines
• Genus™ Synthesis
• Innovus™ Place and Route
• Voltus™ Power
• Tempus™ Timing
• Quantus™ Extraction
• Modus™ Test Solution
• Pegasus™ Design Rule Check
Verification Tools Suite: Verification fabric
• Xcelium™: RTL Simulation, Waveform Debug
• Palladium®: Hardware Acceleration, Waveform Debug
• Indago™: Smart Debug, Embedded Software, Protocol Analyzer
• IWB: AMBA® Performance Analysis
• Perspec™: Constraint-Driven Software Use-Case Generation
• JasperGold®: Connectivity Formal Proof
• vManager™: Verification Plan, Coverage Closure
Accelerating HPC Designs with Custom DTI-ATS: Next-generation system – PCIe and ATS
• Proven interoperability
– Cadence IP for PCIe 3.0 and 4.0 (ctrl and PHY) and Arm system IP
– Cadence IP for CCIX (ctrl and PHY) and Arm system IP
– Cadence IP for DDR4 (PHY) and Arm DMC
• Server/enterprise system optimized
– Arm DTI-ATS SMMU integration for improved system performance
– PCIe ordering issues optimized
– Proven system validation for Cadence and Arm
• System IP
– Cadence IP for PCIe, CCIX, and DDR
– Arm Cortex-A and server subsystems
[Diagram: PCIe IP, VIP + Testbench, Arm Subsystem, EP, RP, PCIe, TBU, ATC, TCU, Interconnect and Mem]
• Optimized and proven interoperability for server, enterprise, and datacenter enablement
• Specialized Arm DTI-ATS
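The performance benefit of ATS comes from caching address translations on the PCIe side in an address translation cache (ATC), so that repeated DMA accesses to the same page do not each need a translation lookup. The toy model below illustrates only that caching effect. It does not model the DTI protocol, the SMMU architecture, or any Cadence or Arm interface, and the class names, 4 KiB page size and cache capacity are assumptions chosen for the example.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed 4 KiB translation granule


class ToySmmu:
    """Stand-in for the translation service; counts how often it is asked."""

    def __init__(self, page_table):
        self.page_table = page_table  # virtual page number -> physical page number
        self.requests = 0

    def translate(self, vpn):
        self.requests += 1
        return self.page_table[vpn]


class ToyAtc:
    """Small LRU cache of translations held on the PCIe side."""

    def __init__(self, smmu, capacity=4):
        self.smmu = smmu
        self.capacity = capacity
        self.entries = OrderedDict()

    def translate(self, virtual_address):
        vpn, offset = divmod(virtual_address, PAGE_SIZE)
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # hit: refresh LRU position
        else:
            self.entries[vpn] = self.smmu.translate(vpn)  # miss: ask once
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
        return self.entries[vpn] * PAGE_SIZE + offset

    def invalidate(self, vpn):
        """Drop a cached translation, as an invalidation request would."""
        self.entries.pop(vpn, None)


smmu = ToySmmu({vpn: vpn + 0x100 for vpn in range(8)})
atc = ToyAtc(smmu)
# A DMA stream touching 64-byte blocks across two pages: 128 accesses.
addresses = [page * PAGE_SIZE + off for page in (0, 1)
             for off in range(0, PAGE_SIZE, 64)]
physical = [atc.translate(a) for a in addresses]
print(f"{len(addresses)} accesses, {smmu.requests} translation requests")
```

In this model 128 accesses cost two translation requests, one per page. Without a cache each access would need its own lookup.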
Enabling Advanced Architectures with New Specifications: Next-generation system – CCIX accelerators
New Enterprise and Server System Solutions
• CCIX – Cache coherent interconnect for accelerators
– Coherent multi-chip interconnect and acceleration framework to improve performance and efficiency of datacenter applications
– Open standard and multi-vendor support gives system designers the choice to customize their system
– Leverages existing PCIe ecosystem – physical connection, software, connectors
– Up to 25Gbps per lane
• Cadence/Arm IP enablement
– Cadence IP for CCIX (Ctrl and PHY) development
– Cadence IP for PCIe 4.0 (Ctrl and PHY)
– Cadence/Arm IP controller to interconnect integration
– Si-proven PHY platform upgraded to 25G to support CCIX
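As a rough sense of what the move from 16G to 25G signalling buys, the sketch below compares one-direction link bandwidth at both rates for a few link widths. It assumes the same 128b/130b line encoding at 25 GT/s as at 16 GT/s and ignores all protocol overhead, so treat the figures as upper bounds. The function name is made up for this example.

```python
ENCODING_EFFICIENCY = 128 / 130  # assumed to apply at both signalling rates


def link_gbytes_per_s(rate_gtps: float, lanes: int) -> float:
    """One-direction link bandwidth in GB/s, counting line encoding only."""
    return rate_gtps * ENCODING_EFFICIENCY * lanes / 8


for lanes in (4, 8, 16):
    pcie4 = link_gbytes_per_s(16.0, lanes)
    ccix = link_gbytes_per_s(25.0, lanes)
    print(f"x{lanes}: 16 GT/s = {pcie4:.1f} GB/s, "
          f"25 GT/s = {ccix:.1f} GB/s ({ccix / pcie4:.2f}x)")
```

The ratio is the same at every width, about 1.56x, because only the per-lane rate changes.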
SoC Performance Exploration: IWB on Palladium solution for full SoC performance
[Diagram: Cadence Interconnect Workbench with AVIP instances, PCIe VIP, Interconnect Validator, Software Use-Case, Performance Log, and Performance Charts]
• IWB can generate AVIP instantiations from same spreadsheet
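The step from a performance log to performance charts amounts to aggregating per-transaction records into latency and bandwidth figures. The sketch below shows that aggregation on a small hypothetical log. The CSV layout, the column names and the sample values are invented for illustration and are not the Interconnect Workbench log format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical log format (not IWB's actual output): one completed
# transaction per row, with timestamps in nanoseconds.
SAMPLE_LOG = """master,start_ns,end_ns,bytes
cpu,0,120,64
pcie,10,250,256
cpu,130,230,64
pcie,260,460,256
"""


def summarize(log_text):
    """Return per-master transaction count, latency, and bandwidth figures."""
    per_master = defaultdict(list)
    for row in csv.DictReader(io.StringIO(log_text)):
        per_master[row["master"]].append(
            (int(row["start_ns"]), int(row["end_ns"]), int(row["bytes"]))
        )
    summary = {}
    for master, txns in per_master.items():
        latencies = [end - start for start, end, _ in txns]
        window_ns = max(end for _, end, _ in txns) - min(start for start, _, _ in txns)
        total_bytes = sum(size for _, _, size in txns)
        summary[master] = {
            "transactions": len(txns),
            "mean_latency_ns": sum(latencies) / len(latencies),
            "max_latency_ns": max(latencies),
            "bandwidth_gbytes_per_s": total_bytes / window_ns,  # bytes/ns == GB/s
        }
    return summary


for master, stats in summarize(SAMPLE_LOG).items():
    print(master, stats)
```

Bandwidth here is measured over the window from a master's first request to its last completion, which is one of several reasonable definitions.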
CCIX Performance Exploration: Chip-to-chip coherency validation and performance
[Diagram: Cadence Interconnect Workbench with AVIP instances, Porter, CCIX, Interconnect Validator, Software Use-Case, Performance Log, and Performance Charts]
Simulation: Software-driven integration verification using Perspec System Verifier
[Diagram: Perspec System Verifier, Arm Library, PCIe Library, Brooks Model, Use-Cases, Software Use-Case, PCIe VIP, CCIX VIP]
• PCIe RC needs software configuration to drive DMA transfers into and out of memory
• Perspec PCIe and Arm libraries provide out-of-the-box productivity for easy creation of complex software use-cases
• Goal-oriented use-cases created by the user allow tests to be generated to give best coverage of integration features
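The idea behind goal-oriented test generation can be sketched as a search: each action has preconditions and effects, and the generator enumerates every legal ordering of actions that reaches the goal. The example below is a toy illustration of that idea in plain Python. It is not Perspec syntax or its API, and the action and state names are hypothetical.

```python
# Hypothetical actions: name -> (preconditions, effects).
ACTIONS = {
    "train_link":   (set(), {"link_up"}),
    "configure_rc": ({"link_up"}, {"rc_ready"}),
    "map_smmu":     (set(), {"buffer_mapped"}),
    "program_dma":  ({"rc_ready", "buffer_mapped"}, {"dma_programmed"}),
    "run_dma":      ({"dma_programmed"}, {"dma_done"}),
}


def generate_tests(goal, state=frozenset(), used=()):
    """Yield every action sequence that reaches `goal`, using each action at most once."""
    if goal in state:
        yield list(used)
        return
    for name, (preconditions, effects) in ACTIONS.items():
        # An action is legal when it is unused and all its preconditions hold.
        if name not in used and preconditions <= state:
            yield from generate_tests(goal, state | effects, used + (name,))


tests = list(generate_tests("dma_done"))
for sequence in tests:
    print(" -> ".join(sequence))
```

The user states only the goal, a completed DMA transfer. The search finds the three legal orderings, which differ in when the buffer mapping happens relative to link training and root-complex configuration.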
Hardware Emulation/Acceleration: Palladium hardware/software debug
• Indago ESWD
• Use synthesizable TARMAC logger for offline hardware/software debug
Real Results and Real Silicon for HPC
Real Results – Real Silicon: TSMC 7nm demonstration platform
• Collaboration between Xilinx, Arm, Cadence, and TSMC
• Enabling and accelerating HPC designs
• Providing real silicon results
Complete Cadence Design Flow Solution: Revolutionizing the digital flow
Cadence Full-Flow Digital Solution
• Quantus QRC Extraction Solution
• Innovus Implementation Solution
• Joules RTL Power Analysis Solution
• Pegasus Verification Solution
• Modus Test Solution
• Genus Synthesis Solution
• Stratus High-Level Synthesis Solution
• Voltus Power Integrity Solution
• Tempus Timing Solution
[Timeline dates shown: May 2013, Nov 2013, Jul 2014, Feb 2015, Mar 2015, June 2015, Aug 2015, Feb 2016, Apr 2017]
Summary
• High-performance computing (HPC) is a high-growth area
– Driven by datacenter, mobility, and IoT
– Growth presents many opportunities
• Cadence has developed a strong ecosystem
– Key partners from IP to software to hardware to technology
– Working closely with Arm and TSMC
• Cadence provides a comprehensive enterprise system enablement solution
– Comprehensive tools: IP, VIP, design, verification, and performance tools
– Fully integrated solutions: Accelerates designs
– Silicon proven and validated: Reduces risks
© 2017 Cadence Design Systems, Inc. All rights reserved worldwide. Cadence, the Cadence logo, and the other Cadence marks found at www.cadence.com/go/trademarks
are trademarks or registered trademarks of Cadence Design Systems, Inc. Arm, AMBA, and Cortex are registered trademarks of Arm Limited (or its subsidiaries) in the EU
and/or elsewhere. CoreLink and CoreSight are trademarks of Arm Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. PCI Express and PCIe are
registered trademarks of PCI-SIG. SystemC is a trademark of Accellera Systems Initiative Inc. All other trademarks are the property of their respective owners.