Deep Vision
Better object recognition with less power
Purpose
We aim to implement a real-time deep neural network image recognition and detection algorithm running on an embedded GPU device.
Drones send a real-time video stream to the mobile GPU device, which performs real-time object recognition.
Participating in the 2018 DAC Contest
Sponsored by and in collaboration with Prof. Xie and the SEAL Lab
Team Members
Jenny Zeng, Charlie Xu, Chenghao Jiang, Terry Xie
2018 DAC Contest
Features an embedded-system implementation of neural-network-based object detection for drones
Provides hardware (Nvidia Jetson TX2) and a training dataset
Evaluation
○ 20 FPS minimum
○ Accuracy of detection
○ Power consumption
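As a rough illustration of the 20 FPS requirement, throughput can be measured by timing a detector over a batch of frames. This is a minimal sketch; `detect` is a hypothetical stand-in for the real network inference call:

```python
import time

def measure_fps(detect, frames):
    """Run a detector over a list of frames and return average frames per second."""
    start = time.perf_counter()
    for frame in frames:
        detect(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Hypothetical no-op detector; real code would run the trained network here.
frames = [None] * 100
fps = measure_fps(lambda frame: None, frames)
assert fps > 0
```

In the contest setting, the frames would come from the drone's video stream and `detect` would run the TensorRT-optimized network on the TX2.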
Hardware - Nvidia Jetson TX2
GPU: NVIDIA Pascal™, 256 CUDA cores
CPU: HMP Dual Denver 2 / 2 MB L2 + Quad ARM® A57 / 2 MB L2
Memory: 8 GB 128-bit LPDDR4, 1866 MHz, 59.7 GB/s
Power: External 19 V AC adapter, 7.5 W typical / 15 W max
Software Support
● Ubuntu 16.04 LTS
● JetPack 3.1 SDK
○ Deep Learning: TensorRT, cuDNN, NVIDIA DIGITS™ Workflow
○ Computer Vision: NVIDIA VisionWorks, OpenCV
○ GPU Compute: NVIDIA CUDA, CUDA Libraries
○ Multimedia: ISP Support, Camera imaging, Video CODEC
                   Nvidia Jetson TX2                Nvidia GTX 1080 Founders Edition
CUDA Cores         256                              2560
GPU Frequency      854 MHz Max-Q / 1122 MHz Max-P   1607 MHz Base / 1733 MHz Boost
Memory             8 GB LPDDR4, shared with CPU     8 GB GDDR5X, 10 Gbps
                   (no dedicated graphics memory)
Power Consumption  7.5 W typical / 15 W max         180 W max
Performance        1.5 TFLOPS                       9 TFLOPS
Software - Algorithm
Detects and tracks people and objects in video captured by drones.
Why not conventional tracking?
Deep learning
We are implementing our design based on the state-of-the-art YOLO algorithm
Deep Learning basic idea
Dataset Modification
Each original image is augmented with: counter-clockwise and clockwise rotations, 15% and 25% crops, a mirrored copy, high- and low-brightness variants, and no-object (background) examples.
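A minimal sketch of such augmentations using NumPy array operations (the exact pipeline is not shown in the slides, and crops are omitted here for brevity):

```python
import numpy as np

def augment(img):
    """Yield simple variants of an H x W x 3 image array for dataset expansion."""
    yield img[:, ::-1, :]                               # mirrored image
    yield np.clip(img * 1.5, 0, 255).astype(img.dtype)  # high brightness
    yield np.clip(img * 0.5, 0, 255).astype(img.dtype)  # low brightness
    yield np.rot90(img)                                 # counter-clockwise rotate
    yield np.rot90(img, k=-1)                           # clockwise rotate

img = np.full((4, 6, 3), 100, dtype=np.uint8)
variants = list(augment(img))
assert variants[0].shape == img.shape
assert variants[3].shape == (6, 4, 3)  # rotation swaps height and width
```

For detection training, the bounding-box labels must be transformed alongside each image (e.g. mirrored x-coordinates for the mirrored copy).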
Algorithm: YOLO (You Only Look Once)
● Object detection with a single Convolutional Neural Network (CNN)
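The single-network idea can be illustrated by the size of YOLO's output tensor: the image is divided into an S x S grid, and each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities. A small sketch using the published YOLOv1 configuration (YOLOv2's anchor-based layout differs):

```python
def yolo_output_size(S, B, C):
    """Each of the S*S grid cells predicts B boxes of 5 values
    (x, y, w, h, confidence) plus C conditional class probabilities."""
    return S * S * (B * 5 + C)

# YOLOv1 configuration: 7x7 grid, 2 boxes per cell, 20 PASCAL VOC classes.
assert yolo_output_size(7, 2, 20) == 1470
```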
Algorithm: YOLOv2
● Fast and Accurate
Model            mAP    FLOPs      FPS
YOLOv2 608x608   48.1   62.94 Bn   40
Tiny YOLO        23.7   5.41 Bn    244

*mAP stands for mean average precision
*Table evaluated on the COCO dataset
*FLOPs: billions of floating-point operations per forward pass
Training with YOLO
● Train for classification: pretrain with the ImageNet 1000-class dataset
● Train for detection: convert the model to perform detection
Learning runs through many iterations, which takes days.
YOLO Architecture
Inference
Speed: FPS (frames per second)
Accuracy: IoU (Intersection over Union)
Inference runs through a single pass, which takes minutes.
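IoU can be computed directly from box coordinates; a minimal sketch for axis-aligned boxes given as (x1, y1, x2, y2):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# Two 10x10 boxes overlapping in a 5x10 strip: inter = 50, union = 150.
assert abs(iou((0, 0, 10, 10), (5, 0, 15, 10)) - 1 / 3) < 1e-9
```

Averaging this value over all predicted/ground-truth box pairs on the test set gives the average IoU reported in the results below.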
Inference Results
Accuracy (Average IoU) & FPS:
1st Submission: 37.27% & 11.9 FPS
2nd Submission: 34.09% & 29 FPS
Final Submission: 72% (measured on test set) & 20 FPS
PARROT BEBOP 2
Camera 14 megapixels with fish-eye lens
Video Resolution
1080p (30 fps) recording / 480p (30 fps) streaming
Battery life 25 minutes flying time (with 2700 mAh battery)
Storage 8 GB flash storage system
Connectivity Wi-Fi 802.11a/b/g/n/ac
Signal range 300 m
SDK ARDroneSDK3: connect to the drone and receive the H.264 video stream
Block Diagram
PARROT BEBOP 2 DRONE
Monitor
HMP Dual Denver + Quad ARM A57 CPU
NVIDIA Pascal GPU
Ethernet Module
Drone Controller
ARDroneSDK3
Trained Neural Networks
USB to Ethernet Adapter
WiFi
Nvidia Jetson TX2
Competition Result
Ranked 13th out of 53 groups in the April submission
Improved accuracy from 34% to 72% for the May submission
Final rank yet to be determined
Demo Video
Collaborators / Mentors / Sponsor
Prof. Yuan Xie (UCSB SEAL LAB)
Prof. Yogananda Isukapalli (ECE 189)
Caio Motta (ECE 189)
Dr. Lei Deng (UCSB SEAL LAB)
Yiming Gan (Master's Student)
NVIDIA (SPONSOR)
DJI (SPONSOR)
Questions?