Erik Bohnhorst, Jacob Liberman
Last Updated: 05/15/2020
EGX EDGE AI PLATFORM
2
AGENDA
AI at the Edge Opportunity
Introduction to EGX Platform
GTC’20 New EGX Product Announcements
3
Our Opportunity: Confluence of IoT, AI and 5G
Next Phase of Digital Transformation
IoT devices are projected to grow to >150B by 2025 and >1T by 2035
5G will deliver 1000X better bandwidth and 10X lower latency than 4G
By 2030, AI has a potential total economic impact of up to $17T/year
5G | IoT | AI
4
Emerging Intelligent Applications at the Edge
Need for AI at the Edge for Real-Time Analysis
5
Need for Inferencing at the Edge
Enables Low-Latency, Highly Secure Processing While Lowering Bandwidth to the Cloud
High volume of real-time streaming data
AI-recommended insights/actions
Cloud
Secure & Lower Bandwidth
High-Performance Edge Servers with NVIDIA
IoT Sensors
Authenticate | Deploy | Monitor | Scale | Manage
6
Requirements for Deploying AI at the Edge
AI Model Security
Always-on AI applications require resiliency for reliability
Fleets in distant locations with limited on-site IT support
Multiple Remote Locations Require a Different Approach
Container 1 Container 2 Container n
…
Edge
Cloud
7
EGX Platform Opportunity
40M miles of road | 12B acres | 2M factories
10T miles driven per year
13M stores | 5M call center agents
Enabling Opportunities Across Industries
EGX Server
Application Frameworks
NGC
EGX Stack
Kubernetes | Networking | Storage | Security
CUDA-X
Third-Party ISVs
METROPOLIS Smart Retail
5G AERIAL Telco
ISAAC Robotics
METROPOLIS Smart Cities
CLARA Healthcare
8
NGC-Ready Servers: High Performance COTS Optimized for AI at the Edge
Performance-Validated “Out-of-the-box” systems accelerate time to solution
Remote Management and Security
Enabled through TPM 2.0 and Redfish
Enterprise Support option for NGC Containers
Support for the software stack, including NVIDIA drivers, CUDA®, container runtime, and NGC containers running on Ubuntu and RHEL
9
Enabling Resilience & Monitoring of Advanced Deployments
Package manager for Kubernetes: easily configure, deploy and update applications on Kubernetes
Container Orchestration: automated container deployment, including self-healing
Cloud Native Deployment Approach
NVIDIA EGX Stack
GPU Operator
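The self-healing mentioned above comes from Kubernetes' declarative control loop: you declare the desired state, and a controller repeatedly compares it with the observed state and acts on the difference. The following is a minimal Python sketch of that idea in isolation, not Kubernetes code; `ClusterState` and `reconcile` are hypothetical names, and a real controller watches the API server rather than a local dictionary and cannot assume its actions succeed instantly.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ClusterState:
    """Hypothetical stand-in for observed cluster state: workload name -> running replicas."""
    running: dict[str, int] = field(default_factory=dict)


def reconcile(desired: dict[str, int], state: ClusterState) -> list[str]:
    """One pass of a declarative control loop.

    Compares desired replica counts with observed ones and returns the actions
    needed to converge. For simplicity, actions are assumed to succeed
    immediately, so `state` is updated in place.
    """
    actions: list[str] = []
    for name, want in desired.items():
        have = state.running.get(name, 0)
        if have < want:
            actions.append(f"start {want - have} x {name}")
        elif have > want:
            actions.append(f"stop {have - want} x {name}")
        state.running[name] = want
    # Anything still running that is no longer declared gets removed.
    for name in list(state.running):
        if name not in desired:
            actions.append(f"stop {state.running.pop(name)} x {name}")
    return actions


if __name__ == "__main__":
    # One "inference" replica crashed on a remote edge node; "old-job" is no longer declared.
    state = ClusterState(running={"inference": 2, "old-job": 1})
    print(reconcile({"inference": 3}, state))  # ['start 1 x inference', 'stop 1 x old-job']
    print(reconcile({"inference": 3}, state))  # [] (already converged)
```

Running the loop a second time produces no actions, which is the property that lets it run unattended at remote sites: it only acts when observed state drifts from the declaration.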
10
NVIDIA GPU Operator: Enabling GPUs on Kubernetes
Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes
Legacy: NVIDIA Driver, Linux Distribution
11
NVIDIA GPU Operator: Enabling GPUs on Kubernetes
Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes
Legacy: NVIDIA Driver, Linux Distribution
K8s Device-Plugin Model: NVIDIA Runtime, Kubernetes, NVIDIA Device Plugin, NVIDIA Monitoring, NVIDIA Driver, Linux Distribution
12
NVIDIA GPU Operator: Enabling GPUs on Kubernetes
Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes
Legacy: NVIDIA Driver, Linux Distribution
K8s Device-Plugin Model: NVIDIA Runtime, Kubernetes, NVIDIA Device Plugin, NVIDIA Monitoring, NVIDIA Driver, Linux Distribution
GPU Operator for K8s: NVIDIA Driver, NVIDIA Monitoring, NVIDIA Runtime, NVIDIA Device Plugin, Kubernetes, Linux Distribution
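In the device-plugin model, the NVIDIA device plugin advertises GPUs to the kubelet as the extended resource `nvidia.com/gpu`, so a workload asks for a GPU the same way it asks for CPU or memory, which is what "manage GPU nodes like CPU nodes" amounts to from the application side. The sketch below builds such a Pod manifest in Python; the pod name and the image `example.com/cuda-base:latest` are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a minimal Pod manifest that requests GPUs through the
    `nvidia.com/gpu` extended resource exposed by the NVIDIA device plugin."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are set under limits; the scheduler only
                    # places the pod on a node with enough unallocated GPUs.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image name; substitute a CUDA image you actually use.
    manifest = gpu_pod_manifest("gpu-workload", "example.com/cuda-base:latest")
    print(json.dumps(manifest, indent=2))  # kubectl apply -f also accepts JSON manifests
```

GPUs are requested as whole units here; the pod spec is identical whether the driver stack was installed by hand (device-plugin model) or by the GPU Operator.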
13
Simplifying Day 0 Operations: NVIDIA GPU Operator
How Your GPU Nodes Are Automatically Configured
1. Helm install NVIDIA/gpu-operator
2. The Node Feature Discovery (NFD) DaemonSet detects and labels the GPU node
3. The GPU Operator deploys the driver, runtime, device plugin and monitoring to the GPU node
https://github.com/NVIDIA/gpu-operator
https://devblogs.NVIDIA.com/NVIDIA-gpu-operator-simplifying-gpu-management-in-kubernetes/
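The operator has to bring these components up in an order that respects their dependencies. The Python sketch below expresses that as a topological sort with the standard-library `graphlib` module (Python 3.9+). The dependency edges and component names are assumptions chosen for illustration (runtime after driver, device plugin after runtime, monitoring after both), not values read from the GPU Operator source.

```python
from graphlib import TopologicalSorter

# Assumed prerequisites: component -> components that must be ready first.
DEPENDS_ON = {
    "nvidia-driver": set(),
    "nvidia-container-runtime": {"nvidia-driver"},
    "nvidia-device-plugin": {"nvidia-container-runtime"},
    "gpu-monitoring": {"nvidia-driver", "nvidia-container-runtime"},
}


def rollout_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an install order in which every component follows its prerequisites.
    Raises graphlib.CycleError if the declared dependencies form a cycle."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, component in enumerate(rollout_order(DEPENDS_ON), start=1):
        print(step, component)
```

Components with no ordering constraint between them (here the device plugin and monitoring) may come out in either order, which is also why they can be rolled out in parallel.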
14
Automating Day 1..N Operations: NVIDIA GPU Operator
No Human Interaction Required
Management Nodes
1. Operator plugs into the Node Feature Discovery Kubernetes service
2. Automatically detects new hardware features and labels the node
Worker Nodes
Discovered Node
15
Automating Day 1..N Operations: NVIDIA GPU Operator
No Human Interaction Required
Management Nodes
1. Operator plugs into the Node Feature Discovery Kubernetes service
2. Automatically detects new hardware features and labels the node
3. Calls the GPU Operator to automatically install drivers on hosts labeled as having GPUs
Worker Nodes
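The Day 1..N flow above boils down to label-driven selection: NFD labels nodes that contain NVIDIA hardware, and the operator targets labeled nodes that are not yet provisioned. The Python sketch below models that selection step. The label key `feature.node.kubernetes.io/pci-10de.present` (0x10de is NVIDIA's PCI vendor ID) is an assumption; the exact key depends on how NFD is configured, and the node names are hypothetical.

```python
# Assumed NFD label meaning "an NVIDIA PCI device is present on this node".
GPU_LABEL = "feature.node.kubernetes.io/pci-10de.present"


def nodes_needing_gpu_stack(
    nodes: dict[str, dict[str, str]], provisioned: set[str]
) -> list[str]:
    """Return GPU-labeled nodes that do not yet have the driver stack, sorted by name."""
    return sorted(
        name
        for name, labels in nodes.items()
        if labels.get(GPU_LABEL) == "true" and name not in provisioned
    )


if __name__ == "__main__":
    cluster = {
        "edge-cpu-01": {"kubernetes.io/arch": "amd64"},
        "edge-gpu-01": {"kubernetes.io/arch": "amd64", GPU_LABEL: "true"},
        "edge-gpu-02": {"kubernetes.io/arch": "amd64", GPU_LABEL: "true"},  # newly joined
    }
    print(nodes_needing_gpu_stack(cluster, provisioned={"edge-gpu-01"}))  # ['edge-gpu-02']
```

In a real cluster this selection is expressed declaratively (for example, a node selector on the operator's DaemonSets), so a newly joined GPU node is picked up without anyone running a command.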
16
EGX Platform Software Stack
| Component | Kubernetes (x86) | Red Hat OpenShift (x86) | Kubernetes (ARM) |
|---|---|---|---|
| GPU Operator | 1.1 | 1.1 | - |
| NVIDIA Driver | 440.64.00 | 440.64.00 | JetPack 4.4 |
| NVIDIA Container Runtime | 1.0.2 | 1.0.2 | 0.9 |
| NVIDIA Kubernetes Device Plugin | 1.0.0-beta6 | 1.0.0-beta6 | - |
| Data Center GPU Manager | 1.7.2 | 1.7.2 | - |
| Helm | 3 | N/A (OLM) | 3 |
| Kubernetes | 1.17 | OpenShift 4 | 1.17 |
| Container Runtime | Docker CE 19.03 | CRI-O | NVIDIA Container Runtime |
| Operating System | Ubuntu Server 18.04 LTS | Red Hat CoreOS 4 | JetPack 4.4 |
| Hardware | NGC-Ready for Edge System | NGC-Ready for Edge System | EGX Jetson Xavier NX |
GPU Accelerated Applications on Kubernetes
GPU Operator
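A version matrix like the one on this slide is easy to turn into an automated check, which is useful when many remote nodes must stay on a validated stack. The Python sketch below encodes the x86 Kubernetes column and reports drift for one node. The inventory format and the exact-string comparison are simplifying assumptions; a real check would parse versions and gather them from each node.

```python
from __future__ import annotations

# Versions transcribed from the x86 Kubernetes column of the EGX software stack table.
EXPECTED_X86_K8S: dict[str, str] = {
    "GPU Operator": "1.1",
    "NVIDIA Driver": "440.64.00",
    "NVIDIA Container Runtime": "1.0.2",
    "NVIDIA Kubernetes Device Plugin": "1.0.0-beta6",
    "Data Center GPU Manager": "1.7.2",
    "Helm": "3",
    "Kubernetes": "1.17",
    "Container Runtime": "Docker CE 19.03",
    "Operating System": "Ubuntu Server 18.04 LTS",
}


def stack_mismatches(
    installed: dict[str, str],
    expected: dict[str, str] = EXPECTED_X86_K8S,
) -> dict[str, tuple[str, str | None]]:
    """Map each component that differs from (or is missing versus) the expected
    stack to a tuple (expected_version, installed_version_or_None)."""
    return {
        component: (want, installed.get(component))
        for component, want in expected.items()
        if installed.get(component) != want
    }


if __name__ == "__main__":
    # Hypothetical inventory of one edge node: older driver, Helm not installed.
    node = dict(EXPECTED_X86_K8S)
    node["NVIDIA Driver"] = "418.87.01"
    del node["Helm"]
    print(stack_mismatches(node))
    # {'NVIDIA Driver': ('440.64.00', '418.87.01'), 'Helm': ('3', None)}
```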
17
NGC Catalog
Containers | Models | Helm Charts
Build | Customize | Integrate
NGC-Ready
Simplify Development
Manage & Secure Applications
Deploy Anywhere
NGC Private Registry
Access Control
Scanning, Signing, Encryption
Lifecycle Management
Manage | Secure | Share
Certified | Hybrid | Supported
On-Premises | Cloud | Edge
ACCELERATING TIME TO SOLUTION
18
NVIDIA Applications & Frameworks
CUDA-X
Third-Party ISVs
Systems
Infrastructure & Storage
Growing Ecosystem Adoption Across All Industries
Security & Networking
Industrial
Medical
Intelligent Video Analytics
Conversational AI
5G & Cloud-RAN
EGX Stack
Cloud
19
Leading Enterprise Customers
Global Leaders Turning to EGX for AI at the Edge
21
Introducing NVIDIA EGX™ A100 and EGX Jetson Xavier NX
22
Needs of Edge AI Computing Infrastructure
Remote Locations Are Unlike a Physical Data Center
Network
High Performance Compute
CPU-only infrastructure is inefficient
Unique Security Requirements
Multiple attack surfaces (sensors, physical system, software)
Low Latency Decisions
WAN latency is a bottleneck
23
Introducing EGX Jetson Xavier NX & NVIDIA EGX A100
Cloud Native, AI Scalability from 20 TOPS to 10,000 TOPS, One Software Stack, Starting at 10 W
Application Frameworks
NGC
EGX Stack
Kubernetes | Networking | Storage | Security
CUDA-X
Third-Party ISVs
METROPOLIS Smart Retail
5G AERIAL Telco
ISAAC Robotics
METROPOLIS Smart Cities
CLARA Healthcare
NVIDIA T4
NVIDIA EGX Jetson Xavier NX
NVIDIA RTX
NVIDIA V100
NVIDIA EGX A100
24
Introducing EGX Support for Jetson Xavier NX
Xavier Performance. Nano Size.
Jetson Xavier NX
Cloud Native support
● 21 TOPS (INT8) at 15 W
● Supports up to 32 1080p IP cameras
● Cloud-native ready
Ecosystem Partner Micro-Edge Servers
● Turnkey partner systems
● AI platform for AIoT, retail, industrial computers, IoT gateways
Jetson Xavier NX Developer Kit
● Available now for $399
● JetPack: open-source Linux and CUDA-X software stack
● 500K+ Jetson developers
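The two headline numbers on this slide (21 TOPS INT8 at 15 W, up to 32 1080p cameras) imply a rough compute budget per stream. The Python arithmetic below makes that explicit under a strong simplifying assumption: peak TOPS are shared evenly across streams. Real per-stream cost depends on the model, resolution, and frame rate, and sustained throughput is below peak.

```python
def per_stream_budget(tops: float, watts: float, streams: int) -> tuple[float, float]:
    """Return (TOPS per watt, TOPS per stream), assuming peak compute is split evenly."""
    return tops / watts, tops / streams


if __name__ == "__main__":
    efficiency, per_camera = per_stream_budget(tops=21, watts=15, streams=32)
    print(f"{efficiency:.2f} TOPS/W, {per_camera:.2f} TOPS per 1080p stream")
    # 1.40 TOPS/W, 0.66 TOPS per 1080p stream
```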
25
Introducing EGX A100 Converged Accelerator
Combining Mellanox ConnectX-6 Dx and NVIDIA Ampere GPU Architecture
Unprecedented AI Inference Performance
● Ampere-based architecture
● 3rd-generation Tensor Cores
In-Line Network Acceleration
● Dual 100 Gb/s Ethernet or InfiniBand
● Accelerated switch & packet processing
● Time-triggered transmission technology for telco (5T for 5G)
Enhanced Security
● Secure GPU enclave protects AI model
● Line-speed TLS & IPsec crypto engines
● Service mesh offloads (SDN)
26
EGX A100 Family Hardware Support for End-to-End Security
System-Level Security Benefits of Integration
New Security Engine for Confidential AI
Secure/Authenticated Boot
RDMA and GPUDirect
L4 Firewall
TLS Crypto Engine
IPsec Crypto Engine
Hardware Root of Trust
Secure Application
Secure Data
Secure Platform
Benefits of Integration
27
Secure and Accelerate End-to-End AI Workflows
NGC AI Model and Security Enhancements
PRE-TRAINED MODELS
AI Toolkits & SDKs
Transfer Learning
Federated Learning
NeMo Conversational AI
TensorRT Optimizer
Service Maker
TRAINING & REFINING
NGC Catalog | Private Registry
Container Signing
Model Encryption
Model Versioning
Security Scanning
Access Control
Deploy | Secure | Manage
Remote EGX Systems
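Container signing and model versioning both rest on one basic check: before a remote system loads an artifact, it confirms the bytes match what was published for that version. The Python sketch below shows only that digest-pinning step with the standard library; it is not NGC's signing or encryption mechanism. Real container signing uses asymmetric signatures so the verifier holds no secret, and the model file name here is hypothetical.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large model files are never fully loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: Path, pinned_digest: str) -> bool:
    """Return True only if the file matches the SHA-256 digest pinned for this model version."""
    # compare_digest compares in constant time with respect to where the strings differ.
    return hmac.compare_digest(sha256_of(path), pinned_digest.lower())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp) / "model-v1.plan"  # hypothetical model file name
        model.write_bytes(b"example weights")
        pinned = hashlib.sha256(b"example weights").hexdigest()
        print(verify_model(model, pinned))  # True
        model.write_bytes(b"tampered weights")
        print(verify_model(model, pinned))  # False
```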
28
EGX Stack for A100 Capabilities
Triton Inference Server
Kubernetes
GPU Operator
Storage
NVMe-oF
Network Storage Encryption (AES-XTS)
ASAP²
Networking
Mellanox 5T for 5G
Secure Service Mesh (SDN)
Confidential AI Model Enclave
Security
Key Management | TLS/IPsec Crypto
Secure Cloud Native Edge Deployments
EGX STACK
Application Frameworks
NGC | Third-Party ISVs
METROPOLIS Smart Retail
5G AERIAL Telco
ISAAC Robotics
METROPOLIS Smart Cities
CLARA Healthcare
29
Use Case: Building an Accelerated CloudRAN at the Edge
30
Legacy Approach to Wireless
Dedicated Hardware for Every Function
Proprietary Base Station Processing
4G RAN
DATA
VOICE
31
5G RAN Is Cloud-Native & Runs Many Workloads
Paves the Way to Creation of New Business Models
CLOUD NATIVE
OFF-THE-SHELF SERVERS WITH GPUS (COTS)
5G CloudRAN
DATA
VOICE
AR / VR
ROBOTICS
IoT
AV
Virtualized, Distributed or Centralized RAN (COTS)
32
Building an Accelerated CloudRAN at the Edge
Getting Started Today w/ Aerial & Metropolis
EGX 5G gNB Server #1
Host OS
CPU | V100 GPU
Mellanox SmartNIC
Kubernetes
cuVNF | cuBB
L2+ Processing
Aerial 5G: L1 Processing
EGX MEC Server #2
Host OS
CPU | Tesla GPU
Kubernetes
DeepStream | TensorRT
NGC Registry of Containers, Models & Helm Charts
Metropolis
NVIDIA Mellanox SmartNIC | NVIDIA SmartNIC | Switch | PTP Grand Master
EPC
Radio Unit + User Equipment
O-RAN Fronthaul
Qty: 9 1080p Cameras
70 Mbps
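The camera figures on this slide also show why processing video at the edge lowers backhaul to the cloud. The Python arithmetic below assumes that 70 Mbps is the aggregate rate of all 9 cameras streaming continuously (the slide does not state this explicitly) and uses decimal units (1 GB = 10^9 bytes).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def backhaul(aggregate_mbps: float, cameras: int) -> tuple[float, float]:
    """Return (Mbps per camera, GB per day) for a continuous aggregate stream rate."""
    per_camera = aggregate_mbps / cameras
    gb_per_day = aggregate_mbps * 1e6 * SECONDS_PER_DAY / 8 / 1e9  # bits -> bytes -> GB
    return per_camera, gb_per_day


if __name__ == "__main__":
    per_camera, daily = backhaul(aggregate_mbps=70, cameras=9)
    print(f"{per_camera:.1f} Mbps per camera, {daily:.0f} GB/day")
    # 7.8 Mbps per camera, 756 GB/day
```

Roughly 756 GB per day would have to cross the WAN if raw video went to the cloud; running DeepStream and TensorRT on the MEC server means only the resulting metadata and insights need to leave the site.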
33
SUMMARY AND KEY TAKEAWAYS
1. Enterprises today are using NVIDIA GPUs to accelerate AI processing, and the NVIDIA Mellanox SmartNIC adds accelerated crypto processing, networking and security.
2. NVIDIA announced two new EGX products, the EGX Jetson Xavier NX micro-servers and the EGX A100 converged accelerator family, for scalable performance and the lowest latencies at the edge.
3. The powerful combination of the NVIDIA Mellanox SmartNIC and NVIDIA’s eighth-generation GPU architecture in the EGX A100 family helps build a more secure end-to-end AI compute infrastructure: Secure Data, Secure Application, Secure Platform.