EGX EDGE AI PLATFORM - Nvidia...Erik Bohnhorst, Jacob Liberman Last Updated: 05/15/2020 EGX EDGE AI PLATFORM 2 AGENDA AI at the Edge Opportunity Introduction to EGX Platform GTC’20

Erik Bohnhorst, Jacob Liberman

Last Updated: 05/15/2020

EGX EDGE AI PLATFORM

2

AGENDA

AI at the Edge Opportunity

Introduction to EGX Platform

GTC’20 New EGX Product Announcements

3

Our Opportunity: Confluence of IoT, AI and 5G

Next Phase of Digital Transformation

IoT devices projected to grow to >150B by 2025, >1T by 2035

5G will deliver 1000X better bandwidth and 10X lower latency than 4G

By 2030, AI has a potential total economic impact of up to $17T/year

5G, IoT, AI

4

Emerging Intelligent Applications at the Edge

Need for AI at the Edge for Real Time Analysis

5

Need for Inferencing at the Edge

Enables Low-Latency, Highly Secure Processing while Lowering Bandwidth to Cloud

High volume of real-time streaming data

AI-recommended insights/actions

Cloud

Secure & Lower Bandwidth

High Performance Edge Servers with NVIDIA

IoT Sensors

Authenticate, Deploy, Monitor, Scale, Manage

6

Requirements for Deploying AI at the Edge

AI Model Security

Always-on AI applications require reliability and resiliency

Fleets in distant locations with limited on-site IT support

Multiple Remote Locations Require a Different Approach

Container 1 Container 2 Container n

…

Edge

Cloud

7

EGX Platform Opportunity

40M Miles of road

12B Acres

2M Factories

10T Miles driven per year

13M Stores

5M Call center agents

Enabling Opportunities Across Industries

EGX Server

Application Frameworks

NGC

EGX Stack

Kubernetes, Networking, Storage, Security

CUDA-X

Third-Party ISVs

METROPOLIS: Smart Retail

5G AERIAL: Telco

ISAAC: Robotics

METROPOLIS: Smart Cities

CLARA: Healthcare

8

NGC-Ready Servers: High Performance COTS Optimized for AI at the Edge

Performance-Validated “Out-of-the-box” systems accelerate time to solution

Remote Management and Security: enabled through TPM 2.0 and Redfish

Enterprise Support option for NGC Containers

Support for the software stack, including NVIDIA drivers, CUDA®, container runtime, and NGC containers running on Ubuntu and RHEL

9

Enabling Resilience & Monitoring of Advanced Deployments

Package manager for Kubernetes: easily configure, deploy and update applications on Kubernetes

Container Orchestration: automated container deployment including self-healing

Cloud Native Deployment Approach

NVIDIA EGX Stack

GPU Operator
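
The self-healing behavior mentioned on this slide can be pictured as a reconciliation loop: the orchestrator compares the desired number of containers with what is actually running and starts replacements for anything missing. The following is a minimal toy sketch of that idea only. `ToyCluster`, `reconcile`, and the application name are hypothetical and are not part of Kubernetes or the EGX stack.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyCluster:
    """Hypothetical in-memory stand-in for a cluster; not a Kubernetes API."""
    desired_replicas: Dict[str, int]
    running: Dict[str, List[str]] = field(default_factory=dict)
    _next_id: int = 0

    def start_container(self, app: str) -> str:
        # Give every new container a unique, increasing suffix.
        self._next_id += 1
        name = f"{app}-{self._next_id}"
        self.running.setdefault(app, []).append(name)
        return name

    def crash(self, container: str) -> None:
        # Simulate a container dying by dropping it from the running set.
        for containers in self.running.values():
            if container in containers:
                containers.remove(container)


def reconcile(cluster: ToyCluster) -> List[str]:
    """Start containers until the running count matches the desired count."""
    started = []
    for app, want in cluster.desired_replicas.items():
        have = len(cluster.running.get(app, []))
        for _ in range(want - have):
            started.append(cluster.start_container(app))
    return started


cluster = ToyCluster(desired_replicas={"video-analytics": 3})
reconcile(cluster)                   # initial deployment starts three containers
cluster.crash("video-analytics-2")   # simulate a failure at a remote site
restarted = reconcile(cluster)       # the next pass notices the gap and refills it
print(restarted)
```

A real orchestrator runs this comparison continuously and also handles scheduling, health checks, and rolling updates, which the sketch leaves out.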

10

NVIDIA GPU Operator: Enabling GPUs on Kubernetes

Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes

Legacy: NVIDIA Driver, Linux Distribution

11

NVIDIA GPU Operator: Enabling GPUs on Kubernetes

Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes

Legacy: NVIDIA Driver, Linux Distribution

K8s Device-Plugin Model: NVIDIA Runtime, Kubernetes, NVIDIA Device Plugin, NVIDIA Monitoring, NVIDIA Driver, Linux Distribution

12

NVIDIA GPU Operator: Enabling GPUs on Kubernetes

Manage GPU Accelerated Worker Nodes Like CPU Worker Nodes

Legacy: NVIDIA Driver, Linux Distribution

K8s Device-Plugin Model: NVIDIA Runtime, Kubernetes, NVIDIA Device Plugin, NVIDIA Monitoring, NVIDIA Driver, Linux Distribution

GPU Operator for K8s: NVIDIA Driver, NVIDIA Monitoring, NVIDIA Runtime, NVIDIA Device Plugin, Kubernetes, Linux Distribution
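
Once the NVIDIA device plugin is running on a node, GPUs are advertised to Kubernetes as the extended resource `nvidia.com/gpu`, and a workload asks for them in its resource limits. The sketch below builds such a Pod manifest as a plain Python dictionary. The helper name `gpu_pod_manifest`, the pod name, and the image reference are placeholders chosen for illustration, not values from the presentation.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a Pod spec that asks the scheduler for `gpus` NVIDIA GPUs."""
    if gpus < 1:
        raise ValueError("request at least one GPU")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "OnFailure",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources such as GPUs are requested under `limits`.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


# The image below is a placeholder, not a real registry path.
manifest = gpu_pod_manifest("inference-demo", "example.invalid/inference:latest")
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts JSON manifests as well as YAML.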

13

Simplifying Day 0 Operations: NVIDIA GPU Operator

How Your GPU Nodes are Automatically Configured

Helm install NVIDIA/gpu-operator

GPU Operator

NFD DaemonSet, NVIDIA NFD Plugin

GPU Node: Driver, Runtime, Device Plugin, Monitoring

https://github.com/NVIDIA/gpu-operator


https://devblogs.nvidia.com/nvidia-gpu-operator-simplifying-gpu-management-in-kubernetes/
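
The Helm install step on this slide can be spelled out as a short command sequence. The sketch below only assembles and prints the commands; it does not run them. The chart repository URL and the `gpu-operator` namespace are assumptions, not taken from the slides, so check the GPU Operator README linked above for the values that match your release. The flags used (`--wait`, `--generate-name`, `--namespace`, `--create-namespace`) are standard Helm 3 options.

```python
import shlex
from typing import List


def gpu_operator_install_commands(
    repo_url: str, namespace: str = "gpu-operator"
) -> List[List[str]]:
    """Return the Helm 3 commands to add the chart repo and install the operator."""
    return [
        ["helm", "repo", "add", "nvidia", repo_url],
        ["helm", "repo", "update"],
        [
            "helm", "install", "--wait", "--generate-name",
            "--namespace", namespace, "--create-namespace",
            "nvidia/gpu-operator",
        ],
    ]


# Assumed repository URL; verify against the GPU Operator documentation.
commands = gpu_operator_install_commands("https://helm.ngc.nvidia.com/nvidia")
for cmd in commands:
    print(shlex.join(cmd))
```

Keeping the commands as argument lists makes it straightforward to hand them to `subprocess.run` later without shell quoting problems.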





14

Automating Day 1..N Operations: NVIDIA GPU Operator

No Human Interaction Required

Management Nodes

1. Operator plugs into the Node Feature Discovery Kubernetes service

2. Automatically detects new hardware features and labels the node

Worker Nodes

Discovered Node

15

Automating Day 1..N Operations: NVIDIA GPU Operator

No Human Interaction Required

Management Nodes

1. Operator plugs into the Node Feature Discovery Kubernetes service

2. Automatically detects new hardware features and labels the node

3. Calls the GPU Operator to automatically install drivers on hosts labeled as having GPUs

Worker Nodes
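
The discover, label, install flow above can be simulated in a few lines. The sketch assumes that feature discovery marks a node carrying a PCI device from vendor `10de` (NVIDIA's PCI vendor ID) with a label of the form `feature.node.kubernetes.io/pci-10de.present=true`. The exact label format depends on how Node Feature Discovery is configured, and the node names and inventory are hypothetical.

```python
from typing import Dict, List

NVIDIA_PCI_VENDOR = "10de"  # NVIDIA's PCI vendor ID


def label_nodes(pci_vendors_by_node: Dict[str, List[str]]) -> Dict[str, Dict[str, str]]:
    """Step 2: derive one 'present' label per PCI vendor found on each node."""
    labels = {}
    for node, vendors in pci_vendors_by_node.items():
        labels[node] = {
            f"feature.node.kubernetes.io/pci-{vendor}.present": "true"
            for vendor in vendors
        }
    return labels


def nodes_needing_gpu_stack(labels: Dict[str, Dict[str, str]]) -> List[str]:
    """Step 3: select the nodes whose labels show an NVIDIA device."""
    key = f"feature.node.kubernetes.io/pci-{NVIDIA_PCI_VENDOR}.present"
    return sorted(node for node, node_labels in labels.items()
                  if node_labels.get(key) == "true")


# Hypothetical inventory: node name -> PCI vendor IDs detected on that node.
inventory = {
    "edge-store-01": ["8086", "10de"],
    "edge-store-02": ["8086"],
    "edge-store-03": ["10de", "15b3"],
}
labels = label_nodes(inventory)
targets = nodes_needing_gpu_stack(labels)
print(targets)  # ['edge-store-01', 'edge-store-03']
```

In a real cluster the operator reacts to these labels through DaemonSets with node selectors rather than an explicit list, so a newly added GPU node is picked up without any manual step.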

16

EGX Platform Software Stack

Component                        Kubernetes                 Red Hat OpenShift          Kubernetes
Architecture                     x86                        x86                        ARM
GPU Operator                     1.1                        1.1                        -
NVIDIA Driver                    440.64.00                  440.64.00                  JetPack 4.4
NVIDIA Container Runtime         1.0.2                      1.0.2                      0.9
NVIDIA Kubernetes Device Plugin  1.0.0-beta6                1.0.0-beta6                -
Data Center GPU Manager          1.7.2                      1.7.2                      -
Helm                             3                          N/A (OLM)                  3
Kubernetes                       1.17                       OpenShift 4                1.17
Container Runtime                Docker CE 19.03            CRI-O                      NVIDIA Container Runtime
Operating System                 Ubuntu Server 18.04 LTS    Red Hat CoreOS 4           JetPack 4.4
Hardware                         NGC-Ready for Edge System  NGC-Ready for Edge System  EGX Jetson Xavier NX

GPU Accelerated Applications on Kubernetes

GPU Operator

17

ACCELERATING TIME TO SOLUTION

Simplify Development

NGC Catalog: Containers, Models, Helm Charts

Build | Customize | Integrate

Manage & Secure Applications

NGC Private Registry: Access Control; Scanning, Signing, Encryption; Lifecycle Management

Manage | Secure | Share

Deploy Anywhere

NGC-Ready: On Premises, Cloud, Edge

Certified | Hybrid | Supported

18

Growing Ecosystem Adoption Across All Industries

NVIDIA Applications & Frameworks

CUDA-X

Third-Party ISVs

Systems

Infrastructure & Storage

Security & Networking

Industrial

Medical

Intelligent Video Analytics

Conversational AI

5G & Cloud-RAN

EGX Stack

Cloud

19

Leading Enterprise Customers

Global Leaders turning to EGX for AI at the Edge

21

Introducing NVIDIA EGX™ A100 and EGX Jetson Xavier NX

22

Needs of Edge AI Computing Infrastructure

Remote Locations are Unlike a Physical Data Center

Network

High Performance Compute: CPU-only infrastructure is inefficient

Unique Security Requirements: multiple attack surfaces (sensors, physical system, software)

Low Latency Decisions: WAN latency is a bottleneck

23

Introducing EGX Jetson Xavier NX & NVIDIA EGX A100

Cloud Native, AI Scalability from 20 TOPS to 10,000 TOPS, One Software Stack, Starting at 10 W


NGC

EGX Stack

Kubernetes, Networking, Storage, Security

CUDA-X

Third-Party ISVs

METROPOLIS: Smart Retail

5G AERIAL: Telco

ISAAC: Robotics

CLARA: Healthcare

NVIDIA T4

NVIDIA EGX Jetson Xavier NX

NVIDIA RTX

NVIDIA V100

NVIDIA EGX A100

24

Introducing EGX Support for Jetson Xavier NX

Xavier Performance. Nano Size.

Jetson Xavier NX Cloud Native Support

● 21 TOPS (INT8) at 15 W

● Supports up to 32 1080p IP cameras

● Cloud-native ready
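
The headline figures above translate into two simple ratios: compute per watt, and the rough compute budget per camera when all 32 streams are attached. The short calculation below uses only the numbers on this slide. Treating peak INT8 TOPS as evenly divisible across camera streams is a simplifying assumption, not an NVIDIA specification.

```python
# Figures quoted on the slide for Jetson Xavier NX.
tops_int8 = 21        # peak INT8 throughput, in TOPS
power_watts = 15      # power envelope, in watts
max_cameras = 32      # maximum number of 1080p IP camera streams

# Efficiency: how much peak compute each watt buys.
tops_per_watt = tops_int8 / power_watts

# Rough per-stream budget, assuming compute is shared evenly (simplification).
tops_per_camera = tops_int8 / max_cameras

print(f"{tops_per_watt:.2f} TOPS/W")                      # 1.40 TOPS/W
print(f"{tops_per_camera:.2f} TOPS per camera stream")    # 0.66 TOPS per camera stream
```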

Ecosystem Partner Micro-Edge Servers

● Turnkey partner systems

● AI Platform for AIoT, Retail, Industrial computers, IoT gateways

Jetson Xavier NX Developer Kit

● Available now for $399

● JetPack: Open source Linux and CUDA-X software stack

● 500K+ Jetson developers

25

Introducing EGX A100 Converged Accelerator

Combining Mellanox ConnectX-6 Dx and NVIDIA Ampere GPU Architecture

Unprecedented AI Inference Performance

● Ampere based Architecture

● 3rd Generation Tensor Core

In-Line Network Acceleration

● Dual 100Gb/s Ethernet or InfiniBand

● Accelerated Switch & Packet Processing

● Time Triggered transmission tech for Telco (5T for 5G)

Enhanced Security

● Secure GPU enclave protects AI model

● Line-speed TLS & IPSec Crypto Engines

● Service Mesh Offloads (SDN)

26

EGX A100 Family Hardware Support for End to End Security

System Level Security Benefits of Integration

New Security Engine for Confidential AI

Secure/Authenticated Boot

RDMA and GPUDirect

L4 Firewall

TLS Crypto Engine

IPsec Crypto Engine

Hardware Root of Trust

Secure Application

Secure Data

Secure Platform

Benefits of Integration

https://github.com/NVIDIA/egx-platform

27

Secure and Accelerate End to End AI Workflows

NGC AI Model and Security Enhancements

PRE-TRAINED MODELS

AI Toolkits & SDKs

Transfer Learning

Federated Learning

NeMo

Conversational AI

TensorRT Optimizer

Service Maker

TRAINING & REFINING

NGC Catalog Private Registry

Container Signing

Model Encryption

Model Versioning

Security Scanning

Access Control

Deploy | Secure | Manage

Remote EGX Systems

28

EGX Stack for A100 Capabilities

Triton Inference Server

Kubernetes

GPU Operator

Storage

NVMe-oF

Network Storage Encryption (AES-XTS)

ASAP²

Networking

Mellanox 5T for 5G

Secure Service Mesh (SDN)

Confidential AI Model Enclave

Security

Key Management

TLS/IPSec Crypto

Secure Cloud Native Edge Deployments

EGX STACK

NGC

Third-Party ISVs

METROPOLIS: Smart Retail

5G AERIAL: Telco

ISAAC: Robotics

CLARA: Healthcare

29

Use Case: Building an Accelerated CloudRAN at the Edge

30

Legacy Approach to Wireless

Dedicated Hardware for Every Function

Proprietary Base Station Processing

4G RAN

DATA

VOICE

31

5G RAN is Cloud-Native & Runs Many Workloads

Paves the Way to Creation of New Business Models

CLOUD NATIVE

OFF THE SHELF SERVERS WITH GPUS (COTS)

5G CloudRAN

DATA

VOICE

AR / VR

ROBOTICS

IoT

AV

Virtualized, Distributed or Centralized RAN (COTS)

32

Building an Accelerated CloudRAN at the Edge

Getting Started Today w/ Aerial & Metropolis

EGX 5G gNB Server #1

Host OS

CPU, V100 GPU

Mellanox SmartNIC

Kubernetes

cuVNF, cuBB

L2+ Processing

Aerial 5G: L1 Processing

EGX MEC Server #2

Host OS

CPU, Tesla GPU

Kubernetes

DeepStream, TensorRT

NGC Registry of Containers, Models & Helm Charts

Metropolis

NVIDIA Mellanox SmartNIC

NVIDIA SmartNIC

Switch

PTP Grand Master

EPC

Radio Unit + User Equipment

O-RAN FrontHaul

Qty: 9 1080p Cameras, 70 Mbps

33

SUMMARY AND KEY TAKEAWAYS

1. Enterprises today are using NVIDIA GPUs to accelerate AI processing, and the NVIDIA Mellanox SmartNIC brings accelerated crypto processing, networking and security

2. NVIDIA announced two new EGX products: EGX Jetson Xavier NX micro-servers and the EGX A100 converged accelerator family, for scalable performance and lowest latencies at the edge

3. The powerful combination of the NVIDIA Mellanox SmartNIC and NVIDIA’s eighth-generation GPU architecture in the EGX A100 family helps build a more secure end-to-end AI compute infrastructure: Secure Data, Secure Application, Secure Platform
