Grokking TechTalk #21: Deep Learning in Computer Vision

[email protected], 2017

Dang Huynh

Education
• Ph.D. in Computer Science (France)

Work
• Jan 2017 – now: Axon Enterprise
• 2015 – 2016: Misfit
• 2011 – 2015: Nokia Bell Labs

Research domains
• Machine vision
• Data science
• Telecommunication systems

We are AXON!

Outline
• Refresh
• Computer vision
• Deep learning in Computer vision
• Theory vs. Reality
• Demo

Refresh
Machine learning and Deep learning

Machine learning
Input data → prediction model → output label

[Figure: y = F(x) plotted against x; for a new input x0, what is the output y0?]

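Machine Learning
$y = 4x_1^3 - 2x_2^2 + 8$

[Figure: network computing y. Input x1 feeds a unit with f(x) = x^3 and input x2 feeds a unit with f(x) = x^2, each through a connection of weight 1 (the cross connections have weight 0). The output y combines the two units and the +1 bias input with weights 4, -2 and 8.]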
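As a rough illustration of the slide above, the polynomial can be evaluated as a weighted sum of hand-picked features. The function name `predict` and the sample inputs are assumptions made for this sketch, not part of the talk.

```python
def predict(x1, x2, weights=(4.0, -2.0, 8.0)):
    """Evaluate y = 4*x1^3 - 2*x2^2 + 8 as a weighted sum of features."""
    # The constant 1.0 plays the role of the "+1" bias input on the slide.
    features = (x1 ** 3, x2 ** 2, 1.0)
    return sum(w * f for w, f in zip(weights, features))


print(predict(1.0, 2.0))  # 4*1 - 2*4 + 8 = 4.0
```

In a learning setting the three weights would be estimated from data instead of being fixed by hand.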

Machine Learning

Challenges
• Relevant data acquisition
• Data preprocessing
• Feature selection
• Model selection: simplicity versus complexity
• Result interpretation

Deep Learning
• Machine Learning with many (deep) hidden layers

[Figure: neural network with inputs x1 and x2, +1 bias units, several hidden layers, and outputs y1 and y2.]
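A minimal sketch of a forward pass through such a network, with two inputs, two hidden layers and two outputs. The weights, the layer sizes and the choice of ReLU as the hidden activation are all hypothetical values picked for illustration; the slide does not specify them.

```python
def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: out[j] = sum_i inputs[i] * weights[i][j] + biases[j]."""
    return [
        sum(x * row[j] for x, row in zip(inputs, weights)) + biases[j]
        for j in range(len(biases))
    ]


def forward(x, layers):
    """Apply every layer in order, with ReLU after all but the last one."""
    for index, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if index < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical weights: 2 inputs -> 3 hidden -> 3 hidden -> 2 outputs.
# The bias vectors correspond to the "+1" units in the figure.
layers = [
    ([[1.0, -1.0, 0.5], [0.5, 1.0, -1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], [0.0, 0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.5, -0.5]),
]

y1, y2 = forward([1.0, 2.0], layers)
print(y1, y2)  # 2.5 0.5
```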

Why deep learning?

[Figure: performance versus amount of data, with one curve for deep learning and one for machine learning.]

Computer Vision intro

Computer Vision
Make computers understand images and video:
- Detection
- Recognition
- Tracking
- Extraction

[Figure: object detection]

Computer Vision
There are still challenges: an object can be…

… partly occluded

… or even fully occluded.

Challenge
We were building a human detector, and we accidentally got a future-human detector!

Traditional approach vs. Deep learning

Traditional approach: feature engineering
• Has two eyes?
• Has a nose below the eyes?
• OK, it's a face!
• …..

Deep learning approach: NO feature engineering

Deep Learning in Computer Vision
ImageNet: 1.2 million images with 1000 object categories

[Figure: chart with regions labelled "Tradition" and "Deep learning".]
Source: http://pattern-recognition.weebly.com/

Computer Vision
What the computer sees

Red
43 45 21
13 34 12
23 88 55

Green
19 89 27
17 57 29
75 56 94

Blue
19 89 27
17 57 29
75 56 94

y = F(Red, Green, Blue)
3-D input array

[Figure: facial detection]
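To make the "3-D input array" concrete, the three channel matrices from the slide can be stacked into a height x width x channels structure. This is a plain-Python sketch; the helper name `stack_channels` is an assumption, and real pipelines would normally use an array library instead of nested lists.

```python
red = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
green = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
blue = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]


def stack_channels(r, g, b):
    """Combine three height x width matrices into one height x width x 3 array."""
    return [
        [[r[i][j], g[i][j], b[i][j]] for j in range(len(r[0]))]
        for i in range(len(r))
    ]


image = stack_channels(red, green, blue)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (3, 3, 3)
print(image[0][0])  # [43, 19, 19], the (R, G, B) values of the top-left pixel
```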

Intuition

[Figure: the Red, Green and Blue channels are fed as input to a neural network (input, hidden layers with +1 bias units, outputs y1 and y2) for facial detection.]

Convolutional Neural Network (CNN)
Idea: have a filter scan over the image.

[Figure: convolution process. A filter (grey) slides over the input matrix (e.g., an image) and produces the output matrix.]
Source: https://github.com/vdumoulin/conv_arithmetic

CNN – Striding and Padding
Control how the filter convolves around the input matrix.

[Figure: a filter (grey) slides over a zero-padded input matrix (e.g., an image) and produces the output matrix. Stride = 2, zero-padding = 1.]
Source: https://github.com/vdumoulin/conv_arithmetic
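The effect of stride and padding on the output size can be sketched with the usual formula, floor((input + 2 * padding - filter) / stride) + 1, applied per spatial dimension. The function name and the 5x5 input size are assumptions chosen for illustration; the slide does not state the input size of the figure.

```python
def conv_output_size(input_size, filter_size, stride=1, padding=0):
    """Output size of a convolution along one spatial dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1


# Hypothetical 5x5 input, 3x3 filter, stride 2, zero-padding 1.
print(conv_output_size(5, 3, stride=2, padding=1))  # 3

# 7x7 input, 3x3 filter, stride 1, no padding.
print(conv_output_size(7, 3))  # 5
```

A larger stride shrinks the output faster, while zero-padding keeps border pixels from being under-sampled.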

Convolution operation

7x7 input:
0 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 0 1 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 0 0
0 1 1 0 0 0 0
1 1 0 0 0 0 0

* 3x3 filter:
1 0 1
0 1 0
1 0 1

= 5x5 output:
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
1 3 3 1 1
3 3 1 1 0

Input [height1, width1, # of channels]
Filter [height2, width2, # of channels]
Output [height3, width3, # of filters]
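The single-channel example on this slide can be reproduced with a few lines of plain Python. This is a sketch under the assumption of stride 1 and no padding; the function name `conv2d` is made up for the example, and the loop computes a cross-correlation (no filter flip), which is what CNN layers typically do and which gives the same result here because the filter is symmetric.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding sliding-window sum of element-wise products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                image[i + di][j + dj] * kernel[di][dj]
                for di in range(k_h)
                for dj in range(k_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


image = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

output = conv2d(image, kernel)
for row in output:
    print(row)  # the five rows of the 5x5 output shown on the slide
```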

Rectified Linear Unit (ReLU)
Non-linear activation function.

ReLU: F(y) = max(0, y)

Input:
-3  2  0
 1 -1  0
-5  2  4

After ReLU:
0 2 0
1 0 0
0 2 4
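A minimal sketch of ReLU applied element-wise to the matrix from the slide. The function and variable names are chosen for the example.

```python
def relu(matrix):
    """Replace every negative entry with 0 and keep the rest unchanged."""
    return [[max(0, value) for value in row] for row in matrix]


feature_map = [
    [-3, 2, 0],
    [1, -1, 0],
    [-5, 2, 4],
]

print(relu(feature_map))  # [[0, 2, 0], [1, 0, 0], [0, 2, 4]]
```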

Max Pooling
Reduce dimensions and avoid overfitting.

Max pool with 2x2 filter and stride 2

Input:
1 0 2 3
4 6 6 8
3 1 1 0
1 2 2 4

Output:
6 8
3 4
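The pooling example from the slide, sketched in plain Python. The function name `max_pool` and its default arguments are assumptions for the illustration.

```python
def max_pool(matrix, size=2, stride=2):
    """Keep the maximum of each size x size window, moving by `stride`."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(
                matrix[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]

print(max_pool(feature_map))  # [[6, 8], [3, 4]]
```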

Example

Input: 24x24x3
→ Conv: 3x3, MP: 2x2 → 11x11x28
→ Conv: 3x3, MP: 3x3 → 4x4x48
→ Conv: 2x2 → 3x3x64
→ Fully connected: 128
→ Outputs: face/non-face (2), bounding box regression (4)

Suppose that every Max Pooling (MP) layer has stride 2.

Input: 24x24x3, Conv: 3x3x3, MP: 2x2 (stride 2) → output dimension (24 – 3 + 1)/2 = 11
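The spatial sizes in this example can be checked with a short trace. This sketch assumes stride-1 convolutions without padding and pooling with stride 2 and floor rounding, which reproduces the 11, 4 and 3 shown on the slide; the helper names and the `layers` list are made up for the illustration.

```python
def conv_size(size, kernel):
    """Stride-1 convolution without padding."""
    return size - kernel + 1


def pool_size(size, window, stride=2):
    """Max pooling with the given window and stride (floor rounding)."""
    return (size - window) // stride + 1


def trace(size, layers):
    """Return the spatial size after the input and after every layer."""
    sizes = [size]
    for kind, k in layers:
        size = conv_size(size, k) if kind == "conv" else pool_size(size, k)
        sizes.append(size)
    return sizes


layers = [("conv", 3), ("pool", 2), ("conv", 3), ("pool", 3), ("conv", 2)]
print(trace(24, layers))  # [24, 22, 11, 9, 4, 3]
```

The depths (28, 48, 64) are the number of filters in each convolution and do not depend on the spatial arithmetic.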

Object scales
• Detect objects of various sizes.

[Figure: a detection window scans over the input.]
Source: https://www.pyimagesearch.com

Tradeoffs?
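One common way to detect objects of various sizes is to run a fixed-size window over an image pyramid. The slide does not name the exact method, so the sketch below is an assumption, and the scale factor, window size and step are hypothetical values. It only counts windows, to make the cost side of the tradeoff visible.

```python
def pyramid_sizes(width, height, scale=2.0, min_size=24):
    """List the (width, height) of each pyramid level, largest first."""
    sizes = []
    while width >= min_size and height >= min_size:
        sizes.append((width, height))
        width, height = int(width / scale), int(height / scale)
    return sizes


def window_positions(width, height, window=24, step=8):
    """Top-left corners of every window that fits in a width x height image."""
    return [
        (x, y)
        for y in range(0, height - window + 1, step)
        for x in range(0, width - window + 1, step)
    ]


levels = pyramid_sizes(96, 96)
counts = [len(window_positions(w, h)) for w, h in levels]
print(levels)       # [(96, 96), (48, 48), (24, 24)]
print(counts)       # [100, 16, 1]
print(sum(counts))  # 117 windows to classify for a single 96x96 image
```

A finer scale factor or a smaller step catches more object sizes and positions, at the price of many more windows to evaluate.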

Data augmentation
• Generate more artificial data points from base data.
• Apply with care to other data types!

[Figure: the same image shown as original, with little noise, moderate noise and heavy noise.]
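A minimal sketch of noise-based augmentation for a grayscale image stored as nested lists. The Gaussian noise model, the three sigma values (standing in for little, moderate and heavy noise) and the function names are assumptions; the talk does not say how the noise was generated.

```python
import random


def add_gaussian_noise(image, sigma, rng):
    """Return a noisy copy of a 2-D image, with pixels clamped to 0..255."""
    return [
        [min(255, max(0, round(pixel + rng.gauss(0.0, sigma)))) for pixel in row]
        for row in image
    ]


def augment(image, sigmas=(5.0, 20.0, 50.0), seed=0):
    """Create one noisy copy per sigma; the original image is left untouched."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    return [add_gaussian_noise(image, sigma, rng) for sigma in sigmas]


# Hypothetical 3x3 grayscale image.
image = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
copies = augment(image)
print(len(copies))  # 3 extra training samples from one base image
```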

Complex data augmentation

[Figure: face rotation]

Why data augmentation?

[Figure: AXON detection WITHOUT augmentation versus WITH augmentation.]

How to benchmark?

[Figure: Facebook detection]

Theory vs. Reality

Deep learning in Computer Vision

Pros:
• DL reduces the need for feature engineering.
• DL outperforms classical Computer Vision approaches.

Cons:
• DL requires a huge amount of data (>100K samples).
• DL is extremely computationally expensive to train (weeks on GPUs).
• DL model structure is a black box.

Performance vs. Portability

[Figure: Theory versus Reality]

Performance vs. Power consumption

[Figure: Theory versus Reality, with a portable battery.]

Special hardware for Deep Learning

Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad

• Optimized for specific use cases.
• Not plug-and-play; needs good engineers to make it work.

Still far from consumer…

Privacy

• The police are our customers, so data privacy is important.
• Can we "extract features" from the private data?

Demo

Workflow and toolset

Skin blurring

Facial detection with tracking

License plate detection

Take Home message

Industry perspective

Always consider the following 4 Ps:
• Performance
• Power consumption
• Portability
• Price

Deep learning is not magic: a tradeoff always exists!

Thank you

We are Hiring

Full Stack, Research Engineers, Security.

https://jobs.lever.co/axon
