Copyright © 2015 CEVA 1
Moshe Shahar
12 May 2015
The Evolution of Object Recognition in
Embedded Systems

Timeline of CEVA’s Feature Extraction and Classification Algorithms
[Figure: timeline from 2012 to 2016 plotted against a cycle-count scale. Algorithms shown: HARRIS, GFTT, LBP, SIFT, SURF, ORB, HOG, 3D Objects (KD-Tree), CNN. Platforms shown: CEVA-MM3101, CEVA-XM4.]

SIFT: Scale Invariant Feature Transform
• Find peak responses over scale in a Laplacian pyramid
• Find each response with sub-pixel accuracy
• Keep only “corner-like” responses
• Assign orientation
• Create recognition signature
• (Patent protected)
Performance estimates were 30-50 MCycles per VGA frame
[Figure: Gaussian and Difference of Gaussian pyramids across scale, first octave and next octave]
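
The first SIFT step (peak responses over scale in a Difference of Gaussian stack) can be sketched in one dimension. This is a minimal illustration, not CEVA's implementation: the function names, the 3-sigma kernel truncation, the border clamping and the threshold parameter are all assumptions made for the example.

```python
import math

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel, truncated at about 3 sigma (assumed)."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)
            acc += w * signal[j]
        out.append(acc)
    return out

def dog_stack(signal, sigmas):
    """Difference of Gaussian levels: blur(sigmas[i + 1]) - blur(sigmas[i])."""
    blurred = [blur(signal, s) for s in sigmas]
    return [[b - a for a, b in zip(lo, hi)]
            for lo, hi in zip(blurred, blurred[1:])]

def scale_space_peaks(dog, threshold):
    """(level, position) pairs whose |response| exceeds every neighbour
    in both position and scale, and is at least `threshold`."""
    peaks = []
    for level in range(1, len(dog) - 1):
        for x in range(1, len(dog[level]) - 1):
            value = abs(dog[level][x])
            if value < threshold:
                continue
            neighbours = [abs(dog[level + dl][x + dx])
                          for dl in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (dl, dx) != (0, 0)]
            if value > max(neighbours):
                peaks.append((level, x))
    return peaks
```

At least four sigma values are needed so that some DoG level has a neighbour on both sides in scale. A real detector works on 2-D images, resamples between octaves and refines each peak to sub-pixel accuracy, none of which is shown here.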

SURF: Speeded Up Robust Features
• SIFT is very accurate but very complex to implement on a programmable platform
• SURF includes a fast implementation and fast descriptor matching
• Invariant to various image transforms, such as rotation and illumination changes
• Still widely used
• Partial patent protection
[Figure: SURF flow in three stages. Detector: Image, Integral Image, Response Map Calculation, Maxima 3x3x3 + interpolate, Detected Interest Points. Sort: sort according to levels and regions, Sorted Interest Points. Descriptor: Integral sum, Build 64 sum Descriptor, Descriptor DB.]
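
The integral image in the SURF flow is what makes box-filter responses and descriptor sums cheap: once it is built, the sum over any rectangle takes four lookups. A minimal sketch follows; the function names and the zero-padded layout are choices made for this example.

```python
def integral_image(img):
    """ii[y][x] holds the sum of img over rows < y and columns < x.
    The extra leading row and column of zeros avoid border special cases."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def box_sum(ii, top, left, bottom, right):
    """Sum over rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 1, 1, 3, 3))  # 5 + 6 + 8 + 9 = 28
```

The cost of `box_sum` does not depend on the rectangle size, which is why the response map can be evaluated at several filter scales without rescaling the image.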

HOG: Histogram of Oriented Gradients
The HOG algorithm is based on the Dalal & Triggs paper (2005)
Common use is object detection, especially pedestrian detection
Reference code: OpenCV 2.4.3
[Figure: HOG pipeline. Input image, Bilinear Scaling (scaled images, Scale 1 to Scale 9), Gamma Normalization, Gradient Calculation, Descriptor Calculation.]
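
The gradient and descriptor stages can be illustrated with a single-cell orientation histogram. This is a simplified sketch, not the OpenCV reference code: it assumes centered [-1, 0, 1] gradients, 9 unsigned orientation bins over 0 to 180 degrees, hard binning without interpolation, and it skips border pixels and block normalization.

```python
import math

def cell_histogram(img, bins=9):
    """Accumulate gradient magnitude into unsigned orientation bins."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(1, h - 1):          # interior pixels only
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            index = min(int(angle / bin_width), bins - 1)
            hist[index] += magnitude
    return hist

# Horizontal ramp: every interior gradient points along x (angle 0).
ramp = [[10 * x for x in range(4)] for _ in range(4)]
print(cell_histogram(ramp))  # all of the weight lands in bin 0
```

A full descriptor also interpolates votes between neighbouring bins and cells and normalizes over blocks of cells.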

HOG: Bilinear Scaling
1. Load 2 vectors in a single cycle
2. Perform multiple operations in a single cycle
3. Store a transposed rectangle of 4x4 pixels in a single cycle
4. Perform the load and filter again
5. Store the 4x4 transposed block to memory in a single cycle
[Figure: three stages, each moving data between memory and vector registers vA, vB, vC, vD, labeled filter and transpose]
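
The filter, transpose, filter, transpose sequence above is the usual way to run a separable resize with a row-oriented filter. A scalar reference sketch is shown below; it says nothing about the actual vector instructions, and the corner-aligned sample mapping and function names are assumptions for the example.

```python
def resize_rows(img, out_w):
    """Linearly interpolate every row to out_w samples (corner-aligned)."""
    in_w = len(img[0])
    out = []
    for row in img:
        new_row = []
        for i in range(out_w):
            pos = i * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(pos)
            x1 = min(x0 + 1, in_w - 1)
            frac = pos - x0
            new_row.append(row[x0] * (1 - frac) + row[x1] * frac)
        out.append(new_row)
    return out

def transpose(img):
    """Swap rows and columns."""
    return [list(col) for col in zip(*img)]

def bilinear_scale(img, out_h, out_w):
    """Filter along x, transpose, filter along the original y, transpose back."""
    horizontal = resize_rows(img, out_w)
    vertical = resize_rows(transpose(horizontal), out_h)
    return transpose(vertical)

print(bilinear_scale([[0, 10], [20, 30]], 3, 3))
# [[0.0, 5.0, 10.0], [10.0, 15.0, 20.0], [20.0, 25.0, 30.0]]
```

Transposing between the two passes lets the same row filter serve both axes, which is the role the 4x4 transposed store plays in the steps listed on the slide.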

HOG: Gamma Normalization
• Implemented using a look-up table (LUT): N-way parallel access to local memory in one cycle
• Parallel load mechanism: load N gamma values in a cycle
[Figure: memory map]
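
A look-up table turns gamma normalization into one memory read per pixel, which is what an N-way parallel load can then do for N pixels at once. The sketch below is scalar; the gamma value of 0.5 (square-root compression) and the 8-bit range are assumptions, since the slides do not state them.

```python
def build_gamma_lut(gamma=0.5, levels=256):
    """Precompute the gamma curve for every possible pixel value."""
    top = levels - 1
    return [round(top * (i / top) ** gamma) for i in range(levels)]

def apply_lut(pixels, lut):
    """One table read per pixel; a vector unit would do N reads per cycle."""
    return [lut[p] for p in pixels]

lut = build_gamma_lut()
print(apply_lut([0, 64, 255], lut))  # [0, 128, 255]
```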

ORB: Oriented FAST and Rotated BRIEF
• An efficient alternative to SIFT (and patent free)
• A pyramid is used for scale invariance
• Features are detected using FAST9, Harris and non-maximum suppression
• Descriptors are based on BRIEF with normalized orientation

ORB: Feature Extraction
[Figure: pipeline. Input Image, Pyramid, Fast9, Harris, Non-Max-Suppress, Oriented BRIEF, Descriptors list.]

ORB: FAST9 Implementation
Continuous arc of 9 or more pixels:
All brighter than (p+Th)
or
All darker than (p-Th)
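
The arc test can be written directly as a scalar reference. The sketch assumes the commonly used 16-pixel ring of radius 3 around the center pixel p; the helper names and the test-image builder are hypothetical.

```python
# 16 (dx, dy) offsets of the radius-3 ring, listed in order around the ring.
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def longest_circular_run(flags):
    """Length of the longest run of True values, allowing wrap-around."""
    best = run = 0
    for flag in flags + flags:        # doubling the list handles the wrap
        run = run + 1 if flag else 0
        best = max(best, run)
    return min(best, len(flags))

def is_fast9_corner(img, x, y, th):
    """True if 9 or more contiguous ring pixels are all brighter than p + th
    or all darker than p - th. (x, y) must be at least 3 pixels from the border."""
    p = img[y][x]
    ring = [img[y + dy][x + dx] for dx, dy in CIRCLE]
    brighter = [v > p + th for v in ring]
    darker = [v < p - th for v in ring]
    return longest_circular_run(brighter) >= 9 or longest_circular_run(darker) >= 9

def ring_image(bright_count):
    """7x7 test image: flat 50, with the first bright_count ring pixels set to 200."""
    img = [[50] * 7 for _ in range(7)]
    for dx, dy in CIRCLE[:bright_count]:
        img[3 + dy][3 + dx] = 200
    return img

print(is_fast9_corner(ring_image(9), 3, 3, 20))  # True
print(is_fast9_corner(ring_image(8), 3, 3, 20))  # False
```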

ORB: FAST9 Implementation
• Early exit is used to detect potential positions
• Long memory accesses of 32 bytes are used to quickly load consecutive pixels
• Vector compare is used to compare the center of the corner to the borders
• Build a binary (bit) map of the positions that need to be calculated
• Calculate N positions in parallel
• Use different two-dimensional loads
• Vector predicates are used to selectively calculate only the locations that pass the threshold
• Use N-way parallel lookup
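
One common form of the early-exit test is sketched below. It is an assumption, not necessarily the test used on the CEVA platform: any arc of 9 contiguous pixels on a 16-pixel ring must cover at least 2 of the 4 compass pixels, so a position with fewer than 2 compass pixels brighter and fewer than 2 darker cannot be a corner. The result is a bit map of the positions that still need the full arc test.

```python
# Compass pixels of the radius-3 ring as (dx, dy): top, right, bottom, left.
COMPASS = [(0, -3), (3, 0), (0, 3), (-3, 0)]

def candidate_bitmap(img, th):
    """Mark positions that survive the cheap 4-pixel pre-test.
    Border positions (closer than 3 pixels to an edge) are left at 0."""
    h, w = len(img), len(img[0])
    bitmap = [[0] * w for _ in range(h)]
    for y in range(3, h - 3):
        for x in range(3, w - 3):
            p = img[y][x]
            probes = [img[y + dy][x + dx] for dx, dy in COMPASS]
            brighter = sum(v > p + th for v in probes)
            darker = sum(v < p - th for v in probes)
            if brighter >= 2 or darker >= 2:
                bitmap[y][x] = 1      # needs the full 16-pixel arc test
    return bitmap

img = [[100] * 9 for _ in range(9)]
img[4][4] = 10                        # one dark spot on a flat background
bitmap = candidate_bitmap(img, 20)
print(sum(map(sum, bitmap)))          # 1 candidate, at (4, 4)
```

On a vector machine the two comparisons map to vector compares, and the bit map plays the role of the predicate that limits the expensive computation to the surviving positions.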

Vision Processor Evolution
• More MACs per cycle
• Complex multi-operation instructions
• Wider bandwidth to local memory (while conserving power)
• More orthogonal memory accesses
• Supporting ISA to allow conditional execution per vector element
• More operations per loaded bit
• Higher performance per mW
• Improve local data reuse
[Figure labels: Computing power, “Multi Scalar”, Improve efficiency]

2-Dimension Data Processing Capability
• The ability to process 2D data efficiently is critical for many algorithms, especially convolutions
• The idea is to take advantage of pixel overlap in image processing by reusing the same data to produce multiple outputs
• Significantly increases processing capability
• Saves external memory bandwidth and frees system buses for other tasks
• Reduces power consumption
[Figure: input image with overlapping windows, labeled Reuse]
For 16 MACs with 512-bit bandwidth, only 176 bits are actually loaded
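
The slides do not say how the 512-bit and 176-bit figures are derived. One reading that happens to reproduce them, offered purely as an assumption, uses 16-bit operands: 16 MACs with two fresh operands each need 16 x 2 x 16 = 512 bits, while 4 outputs of a 4-tap filter share 4 + 4 - 1 = 7 input samples plus 4 coefficients, or 11 x 16 = 176 bits.

```python
def naive_bits(macs, operand_bits=16):
    """Bits loaded if every MAC fetches two fresh operands."""
    return macs * 2 * operand_bits

def reuse_bits(outputs, taps, operand_bits=16):
    """Bits loaded when neighbouring outputs of a sliding filter share inputs:
    (outputs + taps - 1) samples plus `taps` coefficients."""
    samples = outputs + taps - 1
    return (samples + taps) * operand_bits

print(naive_bits(16))    # 512
print(reuse_bits(4, 4))  # 176
```

Other output and tap combinations give the same total (8 outputs of a 2-tap filter, for instance), so this is an illustration of the reuse principle and not a description of the actual hardware configuration.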

Vector Accelerated Deep Neural Network
• Convolutional neural network (CNN)
• A deep learning neural network algorithm
• Used for classification, localization and detection of objects in images
• CNN value
1. Best recognition quality
2. Re-trainable to any object without code change
• CNN combines 2D convolutions, 2D max and 1D MAC operations
• Good match for vector DSPs
[Figure: Input Image, NxM Convolution, Subsampling, NxM Convolution, Subsampling, Fully Connected]

CNN: 2D Convolution Layer
• The algorithm makes it possible to exploit overlapping convolutions for efficient processing
• One or several filters are executed in parallel on the same input, which is ideal for a 2-dimensional data processing capability
[Figure: convolution layer]
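
Running several filters over the same input can be sketched as below: each input window is gathered once and reused for every filter. The example is a plain scalar reference with hypothetical names. It computes a "valid" cross-correlation (no kernel flip, no padding), as CNN layers commonly do, and assumes all filters have the same size.

```python
def conv2d_valid(img, kernels):
    """Apply every kernel to the same input; returns one output map per kernel."""
    h, w = len(img), len(img[0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    out_h, out_w = h - kh + 1, w - kw + 1
    flat_kernels = [[c for row in kernel for c in row] for kernel in kernels]
    outputs = [[[0] * out_w for _ in range(out_h)] for _ in kernels]
    for y in range(out_h):
        for x in range(out_w):
            # Gather the window once, then reuse it for all filters.
            window = [img[y + j][x + i] for j in range(kh) for i in range(kw)]
            for k, flat in enumerate(flat_kernels):
                outputs[k][y][x] = sum(p * c for p, c in zip(window, flat))
    return outputs

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
pick_top_left = [[1, 0], [0, 0]]
box = [[1, 1], [1, 1]]
print(conv2d_valid(img, [pick_top_left, box]))
# [[[1, 2], [4, 5]], [[12, 16], [24, 28]]]
```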

CNN: Pooling Layer
• Subsampling stage: a max filter finds the strongest response in each MxN patch of the previous layer, reducing the size along each axis
• Example processor capability: calculate an MxN max filter using vector max on 3 input vectors of 16 elements each, 16 bits per element
[Figure: pooling layer over a patch with dimensions labeled m and n]
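
A scalar sketch of the max filter is given below. It assumes non-overlapping patches (stride equal to the patch size), which the slides do not specify, and `vmax3` is a hypothetical stand-in for a 3-input element-wise vector max.

```python
def vmax3(a, b, c):
    """Element-wise max of three equal-length vectors."""
    return [max(x, y, z) for x, y, z in zip(a, b, c)]

def max_pool(fmap, m, n):
    """Max over non-overlapping patches of m rows by n columns."""
    out = []
    for top in range(0, len(fmap) - m + 1, m):
        # Vertical step: element-wise max across the m rows of the patch.
        col_max = fmap[top]
        for r in range(1, m):
            col_max = [max(a, b) for a, b in zip(col_max, fmap[top + r])]
        # Horizontal step: max within each group of n columns.
        out.append([max(col_max[left:left + n])
                    for left in range(0, len(col_max) - n + 1, n)])
    return out

fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(max_pool(fmap, 2, 2))  # [[6, 8], [14, 16]]
```

For a patch with three rows, the vertical step reduces to a single `vmax3(row0, row1, row2)` call per output row.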

CNN: Fully Connected Layer
• Includes many multiplications with different weights, accumulated into a single result
• Requires high accumulation precision and a large number of MAC operations
• Ideal for a vector processor
[Figure: fully connected layer with dimensions labeled m and n]
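
The need for accumulation precision can be shown with integer arithmetic. In this sketch (hypothetical names, 16-bit operand range assumed), the same multiply-accumulate loop is run once with a wide accumulator and once with an accumulator saturated to 16 bits after every step.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def saturate16(value):
    """Clamp to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, value))

def neuron_wide(inputs, weights, bias=0):
    """MAC loop with a wide accumulator (Python ints do not overflow)."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc += x * w
    return acc

def neuron_narrow(inputs, weights, bias=0):
    """Same loop, but the accumulator saturates at 16 bits after every MAC."""
    acc = bias
    for x, w in zip(inputs, weights):
        acc = saturate16(acc + x * w)
    return acc

def fully_connected(inputs, weight_rows, biases):
    """One output per weight row: y[j] = bias[j] + sum_i inputs[i] * W[j][i]."""
    return [neuron_wide(inputs, row, b) for row, b in zip(weight_rows, biases)]

print(neuron_wide([1000] * 4, [100] * 4))    # 400000
print(neuron_narrow([1000] * 4, [100] * 4))  # 32767 (saturated)
print(fully_connected([1, 2, 3], [[1, 0, 0], [1, 1, 1]], [0, 10]))  # [1, 16]
```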

What Do We See as the Next Trends?
• CNN: the new king of the block
• Will dominate object recognition once real-time operation is possible
• Allows a lot of algorithmic freedom within the implementation
• Ideal for programmable or accelerator+processor solutions
• 3D is becoming more widely used
• 3D object detection, classification and recognition will evolve rapidly
• No clear winner; each vendor has its own flow
• The dominant database seems to be KD-Tree, which is very serial in nature
• Rapid developments, new innovations coming soon…

Timeline of CEVA’s Feature Extraction and Classification Algorithms
[Figure: timeline from 2012 to 2016 plotted against a cycle-count scale, showing SURF, ORB, HOG and CNN annotated with the processor features they rely on: parallel access, fast filters, data reuse, 64 bit.]

Introducing CEVA-XM4™
• 4th-generation imaging and vision processor IP
• Vector-type processor; combines fixed- and floating-point math; up to 4096-bit processing per cycle
• Platform includes vision processor, libraries, tools and applications

Come see us at our booth for real-time demos…