Snapdragon NPE Overview Mark Charlebois Director, Engineering Qualcomm Technologies, Inc. Linaro Connect March 2018 Hong Kong




Snapdragon Neural Processing Engine

TensorFlow, the TensorFlow logo and any related marks are trademarks of Google Inc

Software accelerated runtime for the execution of deep neural networks on device

Available at: developer.qualcomm.com

Efficient execution on Snapdragon

Model framework/network support

Fixed and floating point optimizations

Supports Caffe2, CNTK, MxNet

New optimizations for networks

What’s new?

Developer Tools


Elements of Snapdragon NPE SDK

API

• C++ library in binary form and header files

• Java library for Android integration

• C++ and Python API support for interacting with DLC

DLC

• Snapdragon NPE DNN model format

• Network is a collection of connected layers

• DNN models are stored in DLC files

Tools

• Model converters to create Snapdragon NPE compatible DNN models from popular training framework formats

• Optimization and debugging support tools

Support Assets

• Development host (x86 Ubuntu 14.04)

• User and reference documentation

• Tutorials and examples

• Benchmarking



Snapdragon NPE SW Diagram

• SDK Productivity Components: 3rd Party Apps, Benchmarking, Tutorial Samples, User & Reference Docs

• SDK API

• Core Runtime: DL Container, Model loader, Runtime Engine, Compute Networks (CPU, GPU, DSP), Model Debug, Profiling, Logging

• User Defined Layers (UDL) API, UDL Plugin

• Network Debug Tools

• OS Drivers: libOpenCL.so, libsnpe_[a,c]dsp*.so, libsnpe_dsp_*skel.so

• OS: Android & Linux (x86_64, Armv7, Armv7hf, AArch64), QuRT

• HW: CPU, Adreno GPU, Hexagon DSP

• DNN Model Conversion Tools: Caffe/2 -> DLC fixed, TensorFlow -> DLC fixed, Caffe/2 -> DLC Float, TensorFlow -> DLC Float


Snapdragon NPE SDK

• SDK can be downloaded from Qualcomm Developer Network

◦ https://developer.qualcomm.com/software/snapdragon-neural-processing-engine-ai

The NPE SDK supports Qualcomm® Snapdragon™ 845, 820, 835, 625, 626, 650, 652, 653, 660, 630, 636, and 450 as well as the Qualcomm® Snapdragon™ 820Am automotive platform and Qualcomm Snapdragon™ Flight. For Qualcomm® Adreno™ GPU support, libOpenCL.so must be present on device.

Toolchains:

• Android (armv7, aarch64) - GCC and Clang toolchains

• Linux (armv7, armv7hf, aarch64, x86_64*) - GCC on ARM, Clang on x86_64

* CPU only


NPE SDK Developer Tools

• snpe-net-run

• snpe-caffe-to-dlc

• snpe-caffe2-to-dlc

• snpe-tensorflow-to-dlc

• snpe-onnx-to-dlc*

• snpe-diagview

• snpe-dlc-info

• snpe-dlc-quantize

• snpe_bench.py

*Coming soon


Using the Snapdragon NPE


Snapdragon NPE Workflow


Figure: workflow diagram. Networks named: GoogleNet, Inception, SSD, AlexNet, ResNet, MobileNet, SqueezeNet, Faster R-CNN. Framework named: Caffe2.

User Defined Layer (UDL) – enables prototyping of layers not yet supported


Input Image Formatting

◦ In the Snapdragon NPE, images must be presented as a tensor of shape (height x width x channel), where channel is the fastest-changing dimension.

◦ See $SNPE_ROOT/models/alexnet/scripts/create_alexnet_raws.py in the SDK

Figure labels: Image, Mean, Convert. Caffe format is NCHW; SNPE format is NHWC.

For current Snapdragon NPE SDK release, N=1. Batch support coming in future release.

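Figure: in the SNPE layout, the channel values of pixel (1,1) are stored together, followed by those of pixel (1,2), and so on through pixel (H,W).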
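The reordering from Caffe's channel-major layout to the channel-fastest layout described on this slide can be sketched in a few lines of C++. This is an illustrative sketch, not code from the SDK: the function name `chwToHwc` and its parameters are assumptions, and it handles a single image (N=1), matching the restriction noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the SNPE SDK): reorders one image from
// channel-major CHW order into HWC order, where the channel index changes fastest.
std::vector<float> chwToHwc(const std::vector<float>& chw,
                            std::size_t channels,
                            std::size_t height,
                            std::size_t width) {
    assert(chw.size() == channels * height * width);
    std::vector<float> hwc(chw.size());
    for (std::size_t c = 0; c < channels; ++c) {
        for (std::size_t h = 0; h < height; ++h) {
            for (std::size_t w = 0; w < width; ++w) {
                const std::size_t src = (c * height + h) * width + w;    // CHW offset
                const std::size_t dst = (h * width + w) * channels + c;  // HWC offset
                hwc[dst] = chw[src];
            }
        }
    }
    return hwc;
}
```

For a 2-channel 2x2 image stored as {1, 2, 3, 4, 10, 20, 30, 40}, the result interleaves the channels per pixel: {1, 10, 2, 20, 3, 30, 4, 40}.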


Quantized vs Non-Quantized Models

• Non-quantized DLC files use 32-bit floating point representations of network parameters.

• Quantized DLC files use 8-bit fixed point representations of network parameters and are smaller than their non-quantized counterparts.

Figure: .dlc file and quantized .dlc file; CPU/GPU runtime (32-bit) and DSP runtime (8-bit).
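To make the 8-bit fixed point idea concrete, here is a minimal sketch of min/max affine quantization. It is a generic scheme chosen for illustration: the slides do not say how snpe-dlc-quantize encodes parameters, so the encoding is an assumption and `QuantParams`, `computeParams`, `quantize`, and `dequantize` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical encoding, assumed for illustration only.
struct QuantParams {
    float minValue;  // real value represented by code 0
    float scale;     // real-value width of one 8-bit step
};

// Derives the encoding from the observed range of the values.
QuantParams computeParams(const std::vector<float>& values) {
    assert(!values.empty());
    const auto range = std::minmax_element(values.begin(), values.end());
    const float lo = *range.first;
    const float hi = *range.second;
    // Guard against a zero-width range to avoid dividing by zero later.
    const float scale = (hi > lo) ? (hi - lo) / 255.0f : 1.0f;
    return QuantParams{lo, scale};
}

// Maps a float to the nearest 8-bit code, clamped to [0, 255].
std::uint8_t quantize(float value, const QuantParams& p) {
    const float steps = std::round((value - p.minValue) / p.scale);
    return static_cast<std::uint8_t>(std::min(255.0f, std::max(0.0f, steps)));
}

// Recovers an approximation of the original float from its 8-bit code.
float dequantize(std::uint8_t code, const QuantParams& p) {
    return p.minValue + static_cast<float>(code) * p.scale;
}
```

Each parameter then occupies one byte instead of four, which is where the size reduction comes from; the price is a rounding error of at most about half a step per value.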


Making a Snapdragon NPE Enabled Application

App setup

bool useUserSuppliedBuffers = false;

// Set the runtime: use the GPU when it is available, otherwise fall back to the CPU
static zdl::DlSystem::Runtime_t runtime =
    zdl::SNPE::SNPEFactory::isRuntimeAvailable(zdl::DlSystem::Runtime_t::GPU) ?
    zdl::DlSystem::Runtime_t::GPU : zdl::DlSystem::Runtime_t::CPU;

// Load DLC container
std::unique_ptr<zdl::DlContainer::IDlContainer> container =
    zdl::DlContainer::IDlContainer::open(dlcPath);

// Build SNPE instance (the builder takes a raw pointer to the container)
zdl::SNPE::SNPEBuilder snpeBuilder(container.get());
std::unique_ptr<zdl::SNPE::SNPE> snpe = snpeBuilder.setOutputLayers({})
    .setRuntimeProcessor(runtime)
    .setUdlBundle(udlBundle)
    .setUseUserSuppliedBuffers(useUserSuppliedBuffers)
    .build();
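The GPU-or-CPU choice in the setup code generalizes to an ordered preference list. The sketch below shows that fallback logic in isolation. The `Runtime` enum and `pickRuntime` are hypothetical stand-ins so the example compiles without the SDK; in an application the predicate would presumably wrap `zdl::SNPE::SNPEFactory::isRuntimeAvailable`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for zdl::DlSystem::Runtime_t.
enum class Runtime { CPU, GPU, DSP };

// Returns the first preferred runtime that the predicate reports as available,
// falling back to the CPU when none of them is.
Runtime pickRuntime(const std::vector<Runtime>& preferred,
                    const std::function<bool(Runtime)>& isAvailable) {
    assert(static_cast<bool>(isAvailable));  // a callable predicate is required
    for (Runtime candidate : preferred) {
        if (isAvailable(candidate)) {
            return candidate;
        }
    }
    return Runtime::CPU;
}
```

Calling `pickRuntime({Runtime::DSP, Runtime::GPU}, probe)` tries the DSP first, then the GPU, and returns the CPU only when `probe` rejects both.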


Making a Snapdragon NPE Enabled Application

Running the network (ITensor)

// Load the inputs
std::unique_ptr<zdl::DlSystem::ITensor> inputTensor = loadInputTensor(snpe, fileLine); // See SDK docs
static zdl::DlSystem::TensorMap outputTensorMap;

// Run the network (snpe is a std::unique_ptr, so call through -> and pass the raw tensor pointer)
snpe->execute(inputTensor.get(), outputTensorMap);
zdl::DlSystem::StringList tensorNames = outputTensorMap.getTensorNames();

// Access the results: visit every value of every output tensor
std::for_each(tensorNames.begin(), tensorNames.end(), [&](const char* name) {
    auto tensorPtr = outputTensorMap.getTensor(name);
    for (auto it = tensorPtr->cbegin(); it != tensorPtr->cend(); ++it) {
        float f = *it;
        ...
    }
});
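What happens to each value in the result loop is left to the application. As one possible example, assuming the output holds classification scores, the sketch below finds the index of the largest value in an iterator range. `indexOfLargest` is a hypothetical helper, not an SDK function; it works on any forward iterator whose elements convert to `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the position of the largest value in
// [first, last). The range must not be empty.
template <typename Iterator>
std::size_t indexOfLargest(Iterator first, Iterator last) {
    assert(first != last);
    std::size_t bestIndex = 0;
    float bestValue = *first;
    std::size_t index = 0;
    for (Iterator it = first; it != last; ++it, ++index) {
        const float value = *it;
        if (value > bestValue) {
            bestValue = value;
            bestIndex = index;
        }
    }
    return bestIndex;
}
```

With a `std::vector<float>` named `scores`, the call would be `indexOfLargest(scores.begin(), scores.end())`; ties resolve to the earliest position.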


Making a Snapdragon NPE Enabled Application

Running the network (UserBuffer)

// Load the inputs
loadInputUserBuffer(applicationInputBuffers, snpe, fileLine); // See SDK docs

// Run the network (snpe is a std::unique_ptr, so call through ->)
snpe->execute(inputMap, outputMap);
const zdl::DlSystem::StringList& outputBufferNames = outputMap.getUserBufferNames();

// Access the results
std::for_each(outputBufferNames.begin(), outputBufferNames.end(), [&](const char* name)
{
    // applicationOutputBuffers.at(name) is the byte buffer that backs this output
    const auto& buffer = applicationOutputBuffers.at(name);
    // Step through the bytes one float at a time
    for (size_t i = 0; i + sizeof(float) <= buffer.size(); i += sizeof(float)) {
        const float* f = reinterpret_cast<const float*>(&buffer[i]);
        ...
    }
});
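Reading floats out of a raw byte buffer, as the loop above does, can also be written with `std::memcpy`, which sidesteps alignment and aliasing concerns. The sketch below assumes the output buffer is a `std::vector<std::uint8_t>` holding tightly packed 32-bit floats in host byte order; `bytesToFloats` is a hypothetical helper, not an SDK function.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: copies a byte buffer of tightly packed floats into a
// std::vector<float>. The byte count must be a multiple of sizeof(float).
std::vector<float> bytesToFloats(const std::vector<std::uint8_t>& bytes) {
    assert(bytes.size() % sizeof(float) == 0);
    std::vector<float> values(bytes.size() / sizeof(float));
    if (!values.empty()) {
        std::memcpy(values.data(), bytes.data(), bytes.size());
    }
    return values;
}
```

The copy costs one extra pass over the data, which is usually acceptable for output tensors and leaves the application with a typed container it can index directly.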



For more information, visit us at:

www.qualcomm.com & www.qualcomm.com/blog

Thank you!

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2018 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable. Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.