lviv-it-arena
Applied Deep Learning
Computer Vision: Landscape, Capabilities and Case Studies
Programming is an art in which the artist understands little of what was created.
Unknown Author
A.I. headlines
Google DeepMind software masters the game of Go, takes aim at the world’s top player
GeekWire, January 2016
A Learning Advance in Artificial Intelligence Rivals Human Abilities
The New York Times, December 2015
AI is nearly as good as humans in detecting breast cancer.
Engadget, June 2016
Why is it important?
Speech
FinTech
ADAS
Medical
Social
Security
99,98% _______ 10,000K
90% _____ 50K
Why NOW?
Needs and ways:
• Hardware
• Methods
• Tools
• Data
• Use cases
Demand pyramid (by value):
• Integration
• Classification
• Comprehension
Challenge: Video Comprehension
✔Rank videos for a given event
• TRECVID MED'13–'16 (audio, OCR, speech)
✔Assign an action label to a video event
• Dense trajectory features [Wang, 13]
• CNN features for optical flow [Simonyan, 14]
• 3D convolutional networks for videos [Tran, 15]
✔Action localization
• Learning to track for spatio-temporal action localization [Weinzaepfel, 15]
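The 3D-convolution approach [Tran, 15] extends the 2D convolution window along time, so one filter sees several consecutive frames at once. Below is a minimal single-channel sketch, assuming a clip shaped (T, H, W), "valid" padding and a hand-set kernel; the function name `conv3d_valid` is hypothetical, and real 3D CNNs stack many such layers with learned, multi-channel kernels.

```python
import numpy as np

def conv3d_valid(video: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive single-channel 3D cross-correlation (what DL frameworks call
    convolution) with 'valid' padding.

    video:  (T, H, W) frames stacked along time
    kernel: (kt, kh, kw)
    returns (T - kt + 1, H - kh + 1, W - kw + 1)
    """
    T, H, W = video.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for y in range(out.shape[1]):
            for x in range(out.shape[2]):
                # The window spans kt consecutive frames, so every output
                # value mixes spatial and temporal neighbourhoods.
                window = video[t:t + kt, y:y + kh, x:x + kw]
                out[t, y, x] = np.sum(window * kernel)
    return out

# Toy clip: one bright pixel moving one column to the right per frame.
clip = np.zeros((4, 5, 5))
for t in range(4):
    clip[t, 2, t] = 1.0

# Hand-set (2, 1, 1) kernel: next frame minus current frame, i.e. a motion detector.
frame_diff = np.array([[[-1.0]], [[1.0]]])
response = conv3d_valid(clip, frame_diff)
print(response.shape)  # (3, 5, 5)
```

A static clip gives an all-zero response to `frame_diff`, which is why even this tiny temporal kernel already separates motion from appearance.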
Pros & Cons
Pros:
• Highly flexible
• Adaptable to new tasks
• Great for complex, noisy data
• Deterministic latency
Cons:
• Hard to debug
• Compute- and power-intensive
• Large memory footprint
• Poorly understood by many developers
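Both "deterministic latency" and "compute intensive" follow from how a plain feed-forward CNN layer works: its cost depends only on tensor shapes, never on pixel values. A minimal sketch counting multiply-accumulates (MACs) for one convolution layer; the helper name `conv_macs` is hypothetical, and the layer shape used is AlexNet's first layer (96 filters of 11×11×3 producing a 55×55 map).

```python
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulates of one k x k 2D convolution layer (no groups).

    Note that the image itself is not an argument: the work, and hence the
    latency on fixed hardware, is the same for every input of this shape.
    """
    return h_out * w_out * c_out * c_in * k * k

# AlexNet conv1: 55x55 output, 3 input channels, 96 filters, 11x11 kernels.
print(conv_macs(55, 55, 3, 96, 11))  # 105415200
```

Roughly 10^8 MACs for a single early layer illustrates why such nets need dedicated hardware, and the same count for every frame is what makes their latency predictable.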
What’s different?
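ILSVRC – Architectures Competition (ImageNet classification)
Year | Network | Layers | Top-5 error, %
2010 | shallow, pre-CNN | – | 28.2
2011 | shallow, pre-CNN | – | 25.8
2012 | AlexNet | 8 | 16.4
2013 | AlexNet-style (ZFNet) | 8 | 11.7
2014 | GoogLeNet | 22 | 6.7
2014 | VGG | 19 | 7.3
2015 | ResNet | 152 | 3.57
Human reference: 5.1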
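ILSVRC ranks classification entries by top-5 error: a prediction counts as correct if the true label is among the five highest-scoring classes. A minimal sketch of that metric; the function name and the toy scores are hypothetical, and the example uses k = 2 so it fits on a few classes.

```python
import numpy as np

def top_k_error(scores: np.ndarray, labels: np.ndarray, k: int = 5) -> float:
    """Fraction of samples whose true label is NOT among the k highest scores.

    scores: (N, C) class scores; labels: (N,) integer class ids.
    """
    # argsort is ascending, so the last k columns hold the k best classes.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(1.0 - hits.mean())

# Toy data: 3 samples, 6 classes.
scores = np.array([
    [0.10, 0.50, 0.20, 0.05, 0.10, 0.05],   # best two: classes 1, 2
    [0.30, 0.10, 0.10, 0.40, 0.05, 0.05],   # best two: classes 3, 0
    [0.05, 0.25, 0.10, 0.15, 0.35, 0.10],   # best two: classes 4, 1
])
labels = np.array([2, 1, 0])
print(top_k_error(scores, labels, k=2))  # only sample 0 is a hit
```

Top-5 rather than top-1 is used because many ImageNet images contain several plausible objects, so a single "wrong" first guess is often not really wrong.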
Dev Landscape: Frameworks
✔Caffe
✔TensorFlow
✔Torch & Co.
Dev Landscape: Tools
✔NVIDIA DIGITS
✔OpenML
✔Proprietary solutions
CNN Optimizations
✔Fine-tuning: adapt a pretrained net to cut training time
✔Network pruning: smaller memory footprint
✔Forward-pass speed-up: lower latency
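Pruning is listed as a way to shrink the memory footprint. One common variant, magnitude pruning, zeroes the weights with the smallest absolute values; the sketch below assumes that variant and a single weight array, and the function name is hypothetical. Zeros only save memory once the weights are stored in a sparse or compressed format.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out (at least) the `sparsity` fraction of smallest-magnitude weights."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    flat = np.abs(weights).ravel()
    n_prune = int(sparsity * flat.size)
    if n_prune == 0:
        return weights.copy()
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = np.partition(flat, n_prune - 1)[n_prune - 1]
    # Keep only weights strictly above the threshold (ties are pruned too).
    return np.where(np.abs(weights) > threshold, weights, 0.0)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pruned = magnitude_prune(w, 0.9)
print(np.mean(pruned == 0))  # about 0.9 of the entries are now zero
```

In practice the pruned net is usually fine-tuned again for a few epochs to recover the accuracy lost by removing weights.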
Architectures Performance
CNN | AlexNet | GoogLeNet | VGG-19 | VGG-16 | SqueezeNet | ResNet-152
Training time, hours | 187 | 673 | 4100 | 3500 | n/a | 2680
Forward pass, s | 0.3 | 1 | 4 | 3.5 | 0.3 | ?
Weights, MB | 230 | 51 | 548 | 528 | 4.7 | 230
Top-5 error, % | 16.4 | 6.7 | 7.3 | 7.4 | 16.4 | 3.57
Number of layers | 8 | 22 | 19 | 16 | 65 | 152
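The "Weights, MB" row follows almost directly from parameter count × 4 bytes (32-bit floats). The sketch counts VGG-16 and VGG-19 parameters from their standard configuration (3×3 convolutions, max-pools, then three fully connected layers; a 224×224 input gives 7×7×512 = 25088 inputs to the first one) and reproduces the 528 and 548 figures when MB is read as MiB. The helper names are hypothetical.

```python
def vgg_param_count(conv_cfg, in_channels=3, fc_dims=(25088, 4096, 4096, 1000)):
    """Weights + biases of a VGG-style net.

    conv_cfg: output channels of each 3x3 conv layer; "M" marks a max-pool,
    which has no parameters.
    """
    params, c_in = 0, in_channels
    for item in conv_cfg:
        if item == "M":
            continue
        params += 3 * 3 * c_in * item + item  # 3x3 kernels + one bias per filter
        c_in = item
    for d_in, d_out in zip(fc_dims[:-1], fc_dims[1:]):
        params += d_in * d_out + d_out  # dense weights + biases
    return params

def weights_mib(n_params, bytes_per_param=4):
    """Stored weight size in MiB, assuming 32-bit floats."""
    return n_params * bytes_per_param / 2**20

VGG16 = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
         512, 512, 512, "M", 512, 512, 512, "M"]
VGG19 = [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
         512, 512, 512, 512, "M", 512, 512, 512, 512, "M"]

for name, cfg in (("VGG-16", VGG16), ("VGG-19", VGG19)):
    n = vgg_param_count(cfg)
    print(f"{name}: {n:,} params, {weights_mib(n):.0f} MiB")
# VGG-16: 138,357,544 params, 528 MiB
# VGG-19: 143,667,240 params, 548 MiB
```

Almost 90% of VGG-16's parameters sit in the first fully connected layer, which is why designs like GoogLeNet and SqueezeNet that avoid large dense layers are so much smaller.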
Machine Learning: Misconceptions
No self-learning
No universal architecture
Spatial-temporal analysis
Examples
AlexNet on ARM Mali-T760
AlexNet on Caffe for on-device image analysis: creates tags and interprets your pictures offline.
DNN for ADAS
Driver-assistance systems use DNNs to classify road obstacles, here in a hybrid HOG + DNN pipeline.
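In a hybrid HOG + DNN pipeline, the classic Histogram of Oriented Gradients describes local shape cheaply. A simplified sketch of its core step, magnitude-weighted histograms of unsigned gradient orientation per cell; the 8-pixel cells and 9 bins are common defaults assumed here, the function name is hypothetical, and the block normalization and interpolated voting of full HOG are omitted.

```python
import numpy as np

def hog_cell_histograms(img: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG: per-cell histograms of unsigned gradient orientation.

    img: 2D grayscale array. Returns (H // cell, W // cell, bins).
    """
    gy, gx = np.gradient(img.astype(np.float64))  # derivatives along rows, cols
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0   # unsigned angle in [0, 180)
    bin_idx = np.minimum((ang / (180.0 / bins)).astype(int), bins - 1)
    n_cy, n_cx = img.shape[0] // cell, img.shape[1] // cell
    hist = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Each pixel votes for its orientation bin, weighted by gradient magnitude.
            hist[cy, cx] = np.bincount(bin_idx[rows, cols].ravel(),
                                       weights=mag[rows, cols].ravel(),
                                       minlength=bins)
    return hist

# A vertical edge: all gradient energy lands in the 0-degree bin.
edge = np.zeros((16, 16))
edge[:, 8:] = 1.0
print(hog_cell_histograms(edge).shape)  # (2, 2, 9)
```

In a hybrid detector, such features (or a HOG-based detector) can propose candidate regions cheaply, leaving the more expensive DNN to classify only those regions.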
YOLO + DNN + Dense Trajectories
Specialized video surveillance for the pharmaceutical industry that improves the reliability and accuracy of video data; an example of visual cognition.
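YOLO, like most object detectors, emits several overlapping candidate boxes per object, which are usually pruned with non-maximum suppression (NMS) before later stages (here a DNN and dense trajectories) consume them. A minimal greedy NMS sketch; the [x1, y1, x2, y2] box format, the 0.5 IoU threshold and the function names are assumptions, not details of the pipeline on this slide.

```python
import numpy as np

def _area(b: np.ndarray) -> np.ndarray:
    return (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection over union of one box against N boxes, all [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    return inter / (_area(box) + _area(boxes) - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.5) -> list:
    """Greedy NMS: keep the best box, drop boxes overlapping it, repeat."""
    order = np.argsort(scores)[::-1]  # highest score first
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Survivors are boxes that do not overlap the kept one too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU of about 0.68
```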
Lessons Learned:
Deep learning is in its infancy
Be ready for a chaotic tools landscape
Don't abandon traditional algorithmic approaches
Be ready to change your mindset!
Thank you!
Questions?