Deep Learning in Computer Vision Axon@Grokking Oct. 28, 2017

Grokking TechTalk #21: Deep Learning in Computer Vision


Page 1: Grokking TechTalk #21: Deep Learning in Computer Vision

[email protected],2017

Page 2: Grokking TechTalk #21: Deep Learning in Computer Vision

Dang Huynh

Education
• Ph.D. in Computer Science (France)

Work
• Jan 2017 – now: Axon Enterprise
• 2015 – 2016: Misfit
• 2011 – 2015: Nokia Bell Labs

Research domains
• Machine vision
• Data science
• Telecommunication systems

2/43

Page 3: Grokking TechTalk #21: Deep Learning in Computer Vision

!=

We are AXON!

3/43

Page 4: Grokking TechTalk #21: Deep Learning in Computer Vision

Outline

• Refresh
• Computer vision
• Deep learning in Computer vision
• Theory vs. Reality
• Demo

4/43

Page 5: Grokking TechTalk #21: Deep Learning in Computer Vision

Refresh
Machine learning and Deep learning

5/43

Page 6: Grokking TechTalk #21: Deep Learning in Computer Vision

Machine learning
Input data → prediction model → output label
(Figure: plot of y against x with the model y = F(x); for a new input x0, what is y0?)

6/43

Page 7: Grokking TechTalk #21: Deep Learning in Computer Vision

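Machine Learning
$y = 4x_1^3 - 2x_2^2 + 8$
(Figure: a small network that computes y. Input x1 feeds a unit with f(x) = x^3 and input x2 feeds a unit with f(x) = x^2, each through weight = 1, with cross weights 0. The output y combines the two units with weights 4 and -2 and a +1 bias input with weight 8.)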
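As a rough sketch of the network on this slide, the Python below evaluates the same function layer by layer. The function names (`cube`, `square`, `predict`) are illustrative and not from the talk, and the weights are read off the slide's diagram as best the extracted text allows.

```python
def cube(v):
    """Activation of the first hidden unit: f(x) = x^3."""
    return v ** 3


def square(v):
    """Activation of the second hidden unit: f(x) = x^2."""
    return v ** 2


def predict(x1, x2):
    # Hidden layer: each input reaches its own unit with weight 1,
    # and the cross connections have weight 0.
    h1 = cube(1 * x1 + 0 * x2)
    h2 = square(0 * x1 + 1 * x2)
    # Output layer: weights 4 and -2, plus a bias input of +1 with weight 8.
    return 4 * h1 - 2 * h2 + 8 * 1


print(predict(1, 2))  # 4*1 - 2*4 + 8 = 4
```

Learning, in this picture, means finding the weights (4, -2, 8) from examples of (x1, x2, y) rather than writing them down by hand.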

7/43

Page 8: Grokking TechTalk #21: Deep Learning in Computer Vision

MachineLearning

Challenges
• Relevant data acquisition
• Data preprocessing
• Feature selection
• Model selection: simplicity versus complexity
• Result interpretation

8/43

Page 9: Grokking TechTalk #21: Deep Learning in Computer Vision

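Deep Learning
• Machine Learning with many (deep) hidden layers
(Figure: a network with an input layer (x1, x2), several hidden layers, and an output layer (y1, y2); each layer also receives a +1 bias input.)
9/43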
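A minimal sketch of a forward pass through such a network, in plain Python. The layer sizes, the weights, and the choice of ReLU for the hidden layers are assumptions made for illustration; the slide only shows two inputs, bias units, hidden layers, and two outputs.

```python
def relu(v):
    return max(0.0, v)


def identity(v):
    return v


def dense(inputs, weights, biases, activation):
    """One fully connected layer: an activated weighted sum per output unit."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        # The bias is the weight attached to the constant +1 input.
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias * 1.0
        outputs.append(activation(total))
    return outputs


# Hypothetical weights: (weight matrix, bias vector) for each layer.
HIDDEN_1 = ([[1.0, -1.0], [0.5, 0.5], [-1.0, 2.0]], [0.0, 1.0, -1.0])
HIDDEN_2 = ([[1.0, 0.0, 1.0], [0.0, 1.0, -1.0]], [0.5, 0.0])
OUTPUT = ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0])


def forward(x1, x2):
    h = dense([x1, x2], *HIDDEN_1, relu)
    h = dense(h, *HIDDEN_2, relu)
    return dense(h, *OUTPUT, identity)  # [y1, y2]


print(forward(1.0, 2.0))  # [3.0, 2.0]
```

"Deep" here just means that `dense` is applied several times in a row before the output layer.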

Page 10: Grokking TechTalk #21: Deep Learning in Computer Vision

Why deep learning?
(Figure: performance plotted against amount of data, with one curve for deep learning and one for machine learning.)

10/43

Page 11: Grokking TechTalk #21: Deep Learning in Computer Vision

Computer Vision intro

11/43

Page 12: Grokking TechTalk #21: Deep Learning in Computer Vision

Computer Vision
Make computers understand images and video:
- Detection
- Recognition
- Tracking
- Extraction
(Figure: object detection)
12/43

Page 13: Grokking TechTalk #21: Deep Learning in Computer Vision

Computer Vision
There are still challenges: an object can be…
… partly occluded
… or even fully occluded.

13/43

Page 14: Grokking TechTalk #21: Deep Learning in Computer Vision

Challenge
We were building a human detector, and we accidentally got a future-human detector!

14/43

Page 15: Grokking TechTalk #21: Deep Learning in Computer Vision

15/43

Traditional approach vs. Deep learning approach

Traditional approach: feature engineering
has two eyes?
has a nose below eyes?
…..
Ok, it’s a face!

Deep learning approach: NO feature engineering

Page 16: Grokking TechTalk #21: Deep Learning in Computer Vision

Traditional approach vs. Deep learning

16/43

ImageNet: 1.2 million images with 1000 object categories
(Figure: chart with two regions labelled "Tradition" and "Deep learning".)
Source: http://pattern-recognition.weebly.com/

Page 17: Grokking TechTalk #21: Deep Learning in Computer Vision

Deep Learning in Computer Vision

17/43

Page 18: Grokking TechTalk #21: Deep Learning in Computer Vision

Computer Vision
What the computer sees

Red
43 45 21
13 34 12
23 88 55

Green
19 89 27
17 57 29
75 56 94

Blue
19 89 27
17 57 29
75 56 94

y = F(Red, Green, Blue)
3-D input array
Facial detection
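To make the "3-D input array" concrete, the sketch below stacks the three channel matrices from the slide into a height x width x channel structure using plain Python lists. The helper name `to_hwc` is made up for this example; the pixel values are copied from the slide, including the Blue channel that repeats the Green values.

```python
RED = [[43, 45, 21], [13, 34, 12], [23, 88, 55]]
GREEN = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]
BLUE = [[19, 89, 27], [17, 57, 29], [75, 56, 94]]  # same values as Green on the slide


def to_hwc(red, green, blue):
    """Stack three 2-D channels into one [height][width][channel] array."""
    height, width = len(red), len(red[0])
    return [
        [[red[r][c], green[r][c], blue[r][c]] for c in range(width)]
        for r in range(height)
    ]


image = to_hwc(RED, GREEN, BLUE)
shape = (len(image), len(image[0]), len(image[0][0]))
print(shape)        # (3, 3, 3)
print(image[0][0])  # [43, 19, 19], the top-left pixel as (R, G, B)
```

A detector F then takes this whole array as its input x.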

18/43

Page 19: Grokking TechTalk #21: Deep Learning in Computer Vision

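Intuition
(Figure: the Red, Green, and Blue channels of an image are fed to a network with an input layer (x1, x2), hidden layers with +1 bias inputs, and an output layer (y1, y2), used here for facial detection.)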

19/43

Page 20: Grokking TechTalk #21: Deep Learning in Computer Vision

Convolutional Neural Network (CNN)
Idea: have a filter scan over the image.
(Figure: the convolution process, showing the input matrix (e.g., image), the filter (grey), and the output matrix.)
Source: https://github.com/vdumoulin/conv_arithmetic

20/43

Page 21: Grokking TechTalk #21: Deep Learning in Computer Vision

CNN – Striding and Padding
Control how the filter convolves around the input matrix.
(Figure: convolution with stride = 2 and zero-padding = 1, showing the input matrix (e.g., image), the filter (grey), and the output matrix.)
Source: https://github.com/vdumoulin/conv_arithmetic
21/43
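The effect of stride and padding on the output size follows the standard convolution arithmetic, output = floor((n + 2p - k) / s) + 1. The sketch below is one way to write it; the function name and the 5x5 example input are assumptions, since the slide does not state the input size.

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Output length along one dimension of a convolution.

    input_size: n, kernel_size: k, stride: s, padding: p (zeros added on each side).
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# Hypothetical 5x5 input with a 3x3 filter, stride 2 and zero-padding 1.
print(conv_output_size(5, 3, stride=2, padding=1))  # 3

# No padding, stride 1: a 7x7 input with a 3x3 filter.
print(conv_output_size(7, 3))  # 5
```

A larger stride shrinks the output, while padding lets the filter visit the border pixels and keeps the output from shrinking too quickly.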

Page 22: Grokking TechTalk #21: Deep Learning in Computer Vision

Convolutional operation

7x7 Input
0 1 1 1 0 0 0
0 0 1 1 1 0 0
0 0 0 1 1 1 0
0 0 0 1 1 0 0
0 0 1 1 0 0 0
0 1 1 0 0 0 0
1 1 0 0 0 0 0

*

3x3 Filter
1 0 1
0 1 0
1 0 1

=

5x5 Output
1 4 3 4 1
1 2 4 3 3
1 2 3 4 1
1 3 3 1 1
3 3 1 1 0

Input [height1, width1, # of channels]
Filter [height2, width2, # of channels]
Output [height3, width3, # of filters]
22/43
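The 7x7 example can be checked with a few lines of plain Python. This is a minimal single-channel sketch with stride 1 and no padding; the function name `conv2d` is illustrative. As in most CNN libraries, the filter is slid over the input without being flipped, which makes no difference here because the filter is symmetric.

```python
INPUT = [
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0],
]
FILTER = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k_h) for j in range(k_w))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


for row in conv2d(INPUT, FILTER):
    print(row)
# [1, 4, 3, 4, 1]
# [1, 2, 4, 3, 3]
# [1, 2, 3, 4, 1]
# [1, 3, 3, 1, 1]
# [3, 3, 1, 1, 0]
```

With several channels, the same sum also runs over the channel axis, and each filter produces one slice of the output, which is why the output depth equals the number of filters.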

Page 23: Grokking TechTalk #21: Deep Learning in Computer Vision

Rectified Linear Unit (ReLU)
ReLU: F(y) = max(0, y)
Non-linear activation function.

Input
-3 2 0
1 -1 0
-5 2 4

Output after ReLU
0 2 0
1 0 0
0 2 4
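Applied element-wise, ReLU reproduces the matrix on this slide. A short sketch, with a helper name (`relu_matrix`) chosen only for this example:

```python
def relu(y):
    """F(y) = max(0, y)."""
    return max(0, y)


def relu_matrix(matrix):
    """Apply ReLU to every element of a 2-D list."""
    return [[relu(value) for value in row] for row in matrix]


before = [
    [-3, 2, 0],
    [1, -1, 0],
    [-5, 2, 4],
]
print(relu_matrix(before))  # [[0, 2, 0], [1, 0, 0], [0, 2, 4]]
```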

23/43

Page 24: Grokking TechTalk #21: Deep Learning in Computer Vision

Max Pooling
Reduce dimension and avoid overfitting.
Max pool with 2x2 filter and stride 2

Input
1 0 2 3
4 6 6 8
3 1 1 0
1 2 2 4

Output
6 8
3 4
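The pooling example can be reproduced with the sketch below, which takes the maximum over each window. The function name and its default arguments are illustrative choices that match the 2x2, stride-2 setting on the slide.

```python
def max_pool(matrix, size=2, stride=2):
    """Keep the largest value in each size x size window, moving by `stride`."""
    out_h = (len(matrix) - size) // stride + 1
    out_w = (len(matrix[0]) - size) // stride + 1
    return [
        [
            max(matrix[r * stride + i][c * stride + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


feature_map = [
    [1, 0, 2, 3],
    [4, 6, 6, 8],
    [3, 1, 1, 0],
    [1, 2, 2, 4],
]
print(max_pool(feature_map))  # [[6, 8], [3, 4]]
```

The 4x4 input shrinks to 2x2, so later layers have a quarter as many values to process.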

24/43

Page 25: Grokking TechTalk #21: Deep Learning in Computer Vision

Example
Suppose that all Max Pooling (MP) layers have stride 2.

Input: 24x24x3
→ Conv: 3x3, MP: 2x2 → 11x11x28
→ Conv: 3x3, MP: 3x3 → 4x4x48
→ Conv: 2x2 → 3x3x64
→ Fully connected: 128
→ Outputs: face/non-face (2), bounding box regression (4)

Input: 24x24x3, Conv: 3x3x3, MP: 2x2 (stride 2) → Output dimension (24 – 3 + 1)/2 = 11
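The spatial sizes in this example can be traced with two small helpers. This is a sketch under the slide's stated assumptions (convolutions with stride 1 and no padding, every max pooling layer with stride 2); the helper names are made up for illustration.

```python
def conv_size(n, kernel):
    """Convolution with stride 1 and no padding."""
    return n - kernel + 1


def pool_size(n, kernel, stride=2):
    """Max pooling with the given window and stride."""
    return (n - kernel) // stride + 1


def trace_sizes(input_size=24):
    """Return the spatial size after each stage of the example network."""
    sizes = [input_size]
    n = pool_size(conv_size(input_size, 3), 2)  # Conv 3x3, MP 2x2
    sizes.append(n)
    n = pool_size(conv_size(n, 3), 3)           # Conv 3x3, MP 3x3
    sizes.append(n)
    n = conv_size(n, 2)                         # Conv 2x2
    sizes.append(n)
    return sizes


print(trace_sizes())  # [24, 11, 4, 3]
print(3 * 3 * 64)     # 576 values feed the 128-unit fully connected layer
```

The first step matches the slide's arithmetic: a 3x3 convolution takes 24 to 22, and the 2x2 pooling with stride 2 halves that to 11.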

25/43

Page 26: Grokking TechTalk #21: Deep Learning in Computer Vision

Object scales
• Detect objects of various sizes.
(Figure: a detection window scans over the input.)
Tradeoffs?
Source: https://www.pyimagesearch.com
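One common way to handle several object sizes with a fixed-size detector is an image pyramid combined with a sliding window; the slide does not name the method, so treat this as an assumed illustration. The function names, the scale factor, the 24-pixel window, and the step are all hypothetical values chosen to show the tradeoff.

```python
def pyramid_sizes(width, height, scale=1.5, min_size=24):
    """Sizes of successively downscaled copies of the image."""
    sizes = []
    while width >= min_size and height >= min_size:
        sizes.append((width, height))
        width = int(width / scale)
        height = int(height / scale)
    return sizes


def window_count(width, height, window=24, step=4):
    """Number of window positions when scanning one pyramid level."""
    per_row = (width - window) // step + 1
    per_col = (height - window) // step + 1
    return per_row * per_col


levels = pyramid_sizes(96, 96, scale=2.0)
print(levels)  # [(96, 96), (48, 48), (24, 24)]

total = sum(window_count(w, h) for w, h in levels)
print(total)   # 361 + 49 + 1 = 411
```

The tradeoff shows up directly in `total`: a scale factor closer to 1 or a smaller step catches more object sizes and positions, but multiplies the number of windows the detector has to evaluate.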

26/43

Page 27: Grokking TechTalk #21: Deep Learning in Computer Vision

Data augmentation
• Generate more artificial data points from base data.
• Apply with care to other data types!
(Figure: the same image shown as Original, with Little noise, with Moderate noise, and with Heavy noise.)
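A minimal sketch of noise-based augmentation on a grayscale image stored as nested lists. The use of Gaussian noise, the function name, and the three sigma values standing in for "little", "moderate", and "heavy" are assumptions; the slide does not say which noise model was used.

```python
import random


def add_noise(image, sigma, seed=None):
    """Return a copy of the image with Gaussian noise added to every pixel.

    Pixel values are rounded and clipped to the valid 0..255 range.
    """
    rng = random.Random(seed)
    return [
        [min(255, max(0, round(pixel + rng.gauss(0, sigma)))) for pixel in row]
        for row in image
    ]


original = [
    [43, 45, 21],
    [13, 34, 12],
    [23, 88, 55],
]

# Hypothetical noise levels; each call yields one extra training sample.
augmented = {
    "little": add_noise(original, sigma=5, seed=1),
    "moderate": add_noise(original, sigma=25, seed=2),
    "heavy": add_noise(original, sigma=60, seed=3),
}
```

Each noisy copy keeps the label of the original image, which is why the same trick needs care on data types where small perturbations can change the meaning of a sample.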

27/43

Page 28: Grokking TechTalk #21: Deep Learning in Computer Vision

Complex data augmentation
(Figure: face rotation)
28/43

Page 29: Grokking TechTalk #21: Deep Learning in Computer Vision

Why data augmentation?
(Figure: AXON detection WITHOUT augmentation compared to WITH augmentation.)

29/43

Page 30: Grokking TechTalk #21: Deep Learning in Computer Vision

How to benchmark?
(Figure: Facebook detection)
30/43

Page 31: Grokking TechTalk #21: Deep Learning in Computer Vision

Theory vs. Reality

31/43

Page 32: Grokking TechTalk #21: Deep Learning in Computer Vision

Deep learning in Computer Vision
Pros:
• DL reduces the need for feature engineering.
• DL outperforms classical Computer Vision approaches.

Cons:
• DL requires a huge amount of data (> 100K samples).
• DL is extremely computationally expensive to train (weeks on GPUs).
• DL model structure is a black box.

32/43

Page 33: Grokking TechTalk #21: Deep Learning in Computer Vision

Performance vs. Portability
(Figure: two panels, "Theory" and "Reality".)

33/43

Page 34: Grokking TechTalk #21: Deep Learning in Computer Vision

Performance vs. Power consumption
(Figure: two panels, "Theory" and "Reality"; caption: portable battery.)
34/43

Page 35: Grokking TechTalk #21: Deep Learning in Computer Vision

Special hardware for Deep Learning
Jetson TX2 (NVIDIA), Google TPU, Movidius Myriad
• Optimized for a specific use case.
• Not plug-and-play: it takes good engineers to make it work.
Still far from consumer…
35/43

Page 36: Grokking TechTalk #21: Deep Learning in Computer Vision

Privacy

• The police are our customers, so data privacy is important.
• Can we “extract features” from the private data?

36/43

Page 37: Grokking TechTalk #21: Deep Learning in Computer Vision

Demo

37/43

Page 38: Grokking TechTalk #21: Deep Learning in Computer Vision

Workflow and toolset

38/43

Page 39: Grokking TechTalk #21: Deep Learning in Computer Vision

Skin blurring

39/43

Page 40: Grokking TechTalk #21: Deep Learning in Computer Vision

Facial detection with tracking

40/43

Page 41: Grokking TechTalk #21: Deep Learning in Computer Vision

License plate detection

41/43

Page 42: Grokking TechTalk #21: Deep Learning in Computer Vision

Take-home message

42/43

Page 43: Grokking TechTalk #21: Deep Learning in Computer Vision

Industry perspective

Always consider the following 4 Ps:
• Performance
• Power consumption
• Portability
• Price

Deep learning is not magic: there is always a tradeoff!

43/43

Page 44: Grokking TechTalk #21: Deep Learning in Computer Vision

Thank you

44/43

Page 45: Grokking TechTalk #21: Deep Learning in Computer Vision

We are Hiring

Full Stack, Research Engineers, Security.

https://jobs.lever.co/axon

45/43