Where Tegra meets Titan
Prof Tom Drummond
Computer vision is easy!
But first, a diversion to 10th-century Persia …
… and the first recorded game of chess
The rice and the chessboard
First half of the chessboard: 100 tons of rice
Second half of the chessboard: 400 billion tons of rice = 1000 years of production
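The two figures follow from doubling the grains on every square. Here is a minimal sketch that checks them. It assumes one grain on the first square, a grain mass of about 25 mg and world rice production of about 500 million tonnes per year. The grain mass and the production figure are our assumptions and do not come from the slide.

```cpp
#include <cassert>
#include <cmath>

// Square n (1-based) holds 2^(n-1) grains, so squares 1..k hold 2^k - 1 grains.
// Computed in double: exact for k <= 53, within rounding error beyond that.
double grains_on_first_squares(int k) {
    assert(k >= 0 && k <= 64);
    return std::ldexp(1.0, k) - 1.0;
}

// Assumption (not from the slide): one grain of rice weighs about 25 mg.
constexpr double kGrainMassKg = 25e-6;
// Assumption (not from the slide): world rice production of ~500 million tonnes/year.
constexpr double kWorldTonnesPerYear = 5e8;

double grains_to_tonnes(double grains) { return grains * kGrainMassKg / 1000.0; }

struct ChessboardRice {
    double first_half_tonnes;   // squares 1..32
    double second_half_tonnes;  // squares 33..64
    double second_half_years;   // second half divided by annual production
};

ChessboardRice rice_on_chessboard() {
    const double first = grains_on_first_squares(32);
    const double second = grains_on_first_squares(64) - first;
    const double second_tonnes = grains_to_tonnes(second);
    return {grains_to_tonnes(first), second_tonnes, second_tonnes / kWorldTonnesPerYear};
}
```

With these assumptions the first half comes to roughly 107 tonnes. The second half comes to roughly 460 billion tonnes, a little over 900 years at the assumed production rate, which agrees with the slide's round numbers.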
And the moral of the story is …
The transistor and the chessboard
1974: Intel 8080 (6,000 transistors)
1978: Intel 8086 (29,000 transistors)
1982: Intel 80286 (134,000 transistors)
1993: Intel Pentium (3,000,000 transistors)
2004: Intel Pentium 4 Prescott (125,000,000 transistors)
How many on the last square…?
This notebook: > 2 trillion transistors
2004: Nvidia NV40 (222,000,000 transistors)
2006: Nvidia G80 (681,000,000 transistors)
2008: Nvidia GT200 (1,400,000,000 transistors)
2010: Nvidia GF104 (1,900,000,000 transistors)
2012: Nvidia GK104 (3,540,000,000 transistors)
2015: Nvidia GM200 (8,000,000,000 transistors)
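Both lists fit a doubling roughly every two years. Here is a minimal sketch of the calculation. It assumes pure exponential growth between two data points, and the function name is ours.

```cpp
#include <cassert>
#include <cmath>

// Implied doubling period (in years) between two chips, assuming pure
// exponential growth: count1 = count0 * 2^((year1 - year0) / T), solved for T.
double doubling_period_years(int year0, double count0, int year1, double count1) {
    assert(count0 > 0.0 && count1 > count0 && year1 > year0);
    return (year1 - year0) / std::log2(count1 / count0);
}
```

From the Intel 8080 (1974) to the Prescott (2004) this gives about 2.1 years per doubling. From the NV40 (2004) to the GM200 (2015) it also gives about 2.1 years.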
Can run Moore's law backwards
Q: According to Moore's law, when was there just one transistor?
A: 1948
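The 1948 answer can be reproduced from the chip data. This sketch assumes a fixed doubling period of two years, which is our assumption because the slide does not state one. The 8080's 6,000 transistors are about 12.6 doublings above one, or roughly 25 years, so a single transistor lands in late 1948.

```cpp
#include <cassert>
#include <cmath>

// Run Moore's law backwards: starting from `count` transistors in `year`,
// halve the count every `doubling_years` until one transistor remains.
// Assumption: a fixed doubling period (two years by default).
double year_of_first_transistor(int year, double count, double doubling_years = 2.0) {
    assert(count >= 1.0 && doubling_years > 0.0);
    return year - doubling_years * std::log2(count);
}
```

Starting instead from the GM200's 8 billion transistors in 2015 gives almost the same answer (about 1949).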
In Nov 1947, Bardeen, Brattain and Shockley attached two gold contacts to a crystal of germanium…
Power
Moore's law gives us increasing compute power
BUT
With great power comes great …
Moore's law is not always our friend!
Even with GPUs, compute on mobile devices is limited.
Can't put a K40 on a quadrotor!
But a TX1 fits just fine! (Stereolabs TX1-enabled drone)
ACRV
The Australian Research Council Centre of Excellence for Robotic Vision
• $25.5M over 7 years
• 13 Chief Investigators in 4 Universities
• 16 Research Fellows
• ~50 PhD students
• Research into:
  – Semantics (deep learning)
  – Robust vision (all weathers)
  – Vision and Action (closing the loop)
  – Algorithms and Architecture (constrained resources)
Distributed Robotic Vision
The simplest method is just to partition the problem somewhere, giving some tasks to the mobile device and some to the server.
[Figure: tasks partitioned between mobile and server]
But often this isn't the best solution. For example, latency introduced by the network may be a problem.
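A toy cost model makes the latency trade-off concrete. It is our illustration and not the method in this talk. It assumes a linear pipeline, a fixed network round-trip time and a fixed uplink rate. All struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical cost model for splitting a linear pipeline between a mobile
// device and a server: stages [0, k) run on the mobile, stages [k, n) on the
// server. The model and all names are illustrative assumptions.
struct Stage {
    double mobile_ms;     // time for this stage on the mobile device
    double server_ms;     // time for this stage on the server
    double output_bytes;  // size of this stage's output
};

struct Split {
    std::size_t k;      // index of the first server stage (k == n: all on mobile)
    double latency_ms;  // modelled end-to-end latency
};

Split best_split(const std::vector<Stage>& stages, double input_bytes,
                 double bytes_per_ms, double round_trip_ms) {
    assert(bytes_per_ms > 0.0);
    const std::size_t n = stages.size();
    Split best{n, std::numeric_limits<double>::infinity()};
    for (std::size_t k = 0; k <= n; ++k) {
        double t = 0.0;
        for (std::size_t i = 0; i < k; ++i) t += stages[i].mobile_ms;
        if (k < n) {
            // Ship the input of stage k (the raw input if k == 0) and pay the round trip.
            const double bytes = (k == 0) ? input_bytes : stages[k - 1].output_bytes;
            t += round_trip_ms + bytes / bytes_per_ms;
            for (std::size_t i = k; i < n; ++i) t += stages[i].server_ms;
        }
        if (t < best.latency_ms) best = {k, t};
    }
    return best;
}
```

Consider two hypothetical stages, feature extraction followed by matching. With a 50 ms round trip, the cheapest plan offloads matching (k = 1, 75 ms). At 200 ms it is better to keep everything on the mobile (k = 2, 110 ms).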
Many interesting solutions are not like this, e.g.:
[Figure: flow diagram with the steps Obtain sensor data; Extract summary information; Compute accurate solution; Compute approximate solution; Compare; Calculate output; Update local model; Bring correction up to date; Calculate and send correction; Compute approximate solution]
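One reading of this flow is a cheap approximate estimate on the mobile that a more accurate, delayed solution corrects later. The following is a toy 1-D sketch of that reading. It is entirely our assumption and does not reproduce the talk's system, and the class and method names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy 1-D illustration (our assumption, not the talk's system):
//  - the mobile integrates a cheap, drifting motion estimate every frame;
//  - an accurate estimate for an earlier frame arrives some frames later;
//  - the mobile compares it with its own estimate for that frame and applies
//    the difference to its current estimate, bringing the correction up to date.
class CorrectedTracker {
public:
    // Local update: integrate the approximate motion for the new frame.
    void on_frame(double approx_motion) {
        position_ += approx_motion;
        history_.push_back(position_);
    }

    // Remote update: the accurate position for an earlier frame arrives late.
    void on_correction(std::size_t frame, double accurate_position) {
        assert(frame < history_.size());
        const double offset = accurate_position - history_[frame];
        position_ += offset;
        // Later history entries carry the same error, so shift them too.
        for (std::size_t i = frame; i < history_.size(); ++i) history_[i] += offset;
    }

    double position() const { return position_; }

private:
    double position_ = 0.0;
    std::vector<double> history_;  // the mobile's estimate after each frame
};
```

Suppose the true motion is 1.0 per frame and the local estimate measures 1.1. After 10 frames the mobile is 1.0 out. A correction for frame 4 removes the drift accumulated up to that frame and leaves only the 0.5 drift since then.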
We want to create solutions that enable robotics in a distributed sensing and compute environment.
[Figure: a distributed environment of three TX1s, eight K40s and two CPUs]
Distributed Localisation Service
• CCTV1: Extract landmarks, Build Image Pyramid, Build Descriptors, Index, Match
• CCTV2: Extract landmarks, Build Image Pyramid, Build Descriptors, Index, Match
• Robot: Extract landmarks, Build Image Pyramid, Build Descriptors
• Compute 1: Compute Robot pose
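The profile that follows names an OrbDescriptors kernel, and ORB produces binary descriptors. Here is a minimal CPU sketch of the Match stage. It assumes 256-bit binary descriptors compared by Hamming distance with exhaustive search. A real system would use the Index stage instead of this brute-force loop, and all names here are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Assumption: 256-bit binary descriptors (ORB's usual length).
using Descriptor = std::bitset<256>;

struct Match {
    std::size_t query;     // index into the query set
    std::size_t train;     // index of the nearest reference descriptor
    std::size_t distance;  // Hamming distance (number of differing bits)
};

// Brute-force nearest neighbour by Hamming distance; illustrative only.
std::vector<Match> match_brute_force(const std::vector<Descriptor>& query,
                                     const std::vector<Descriptor>& train,
                                     std::size_t max_distance) {
    assert(!train.empty());
    std::vector<Match> matches;
    for (std::size_t q = 0; q < query.size(); ++q) {
        std::size_t best = std::numeric_limits<std::size_t>::max();
        std::size_t best_idx = 0;
        for (std::size_t t = 0; t < train.size(); ++t) {
            const std::size_t d = (query[q] ^ train[t]).count();  // popcount of XOR
            if (d < best) { best = d; best_idx = t; }
        }
        if (best <= max_distance) matches.push_back({q, best_idx, best});
    }
    return matches;
}
```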
==3031== NVPROF is profiling process 3031, command: ./ComputeOrb 1
Frame# 1
Elapsed time : 5.955523 ms
Frame Elapsed time : 7.765627 ms
numCorners: 28304, nmsnumCorners: 5073
==3031== Profiling application: ./ComputeOrb 1
==3031== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
 57.18%  3.2379ms      1  3.2379ms  3.2379ms  3.2379ms  OrbDescriptors(…)
 30.57%  1.7312ms      1  1.7312ms  1.7312ms  1.7312ms  (…)
  4.29%  242.92us      1  242.92us  242.92us  242.92us  fastcorner(…)
  4.00%  226.31us      1  226.31us  226.31us  226.31us  harris(…)
  1.46%  82.553us      1  82.553us  82.553us  82.553us  NMS(…)
  0.73%  41.458us      1  41.458us  41.458us  41.458us  cleansweep(…)
Speedup over CPU* implementation is 4-5x
* Intel Core 2 Quad Q8400 @ 2.66 GHz
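The profile lists a fastcorner kernel. As an illustration only, here is a scalar CPU sketch of the standard FAST segment test (Rosten and Drummond's corner detector) on a 16-pixel circle of radius 3. It assumes the common 9-pixel arc. The actual kernel's parameters are not given in the source, and the function name is ours.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Offsets (dx, dy) of the 16-pixel Bresenham circle of radius 3 used by FAST,
// listed clockwise starting from the pixel directly above the centre.
constexpr std::array<std::array<int, 2>, 16> kCircle = {{
    {0, -3}, {1, -3}, {2, -2}, {3, -1}, {3, 0}, {3, 1}, {2, 2}, {1, 3},
    {0, 3}, {-1, 3}, {-2, 2}, {-3, 1}, {-3, 0}, {-3, -1}, {-2, -2}, {-1, -3}}};

// FAST segment test: (x, y) is a corner if at least `arc` contiguous circle
// pixels are all brighter than centre + t or all darker than centre - t.
// `img` is a row-major 8-bit image; (x, y) must be at least 3 px from the border.
// Assumption: arc = 9 (FAST-9), a common choice.
bool is_fast_corner(const std::vector<std::uint8_t>& img, int width, int height,
                    int x, int y, int t, int arc = 9) {
    assert(x >= 3 && y >= 3 && x < width - 3 && y < height - 3);
    const int c = img[y * width + x];
    int run_bright = 0, run_dark = 0;
    // Walk the circle twice so that arcs wrapping past pixel 15 are counted.
    for (int i = 0; i < 32; ++i) {
        const auto& o = kCircle[i % 16];
        const int p = img[(y + o[1]) * width + (x + o[0])];
        run_bright = (p > c + t) ? run_bright + 1 : 0;
        run_dark = (p < c - t) ? run_dark + 1 : 0;
        if (run_bright >= arc || run_dark >= arc) return true;
    }
    return false;
}
```

A straight edge lights up only 7 contiguous circle pixels, so it fails the 9-pixel test. That test is what separates corners from edges.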
Sub-pixel localisation
Timing results (µs/keypoint):
Inverse Additive: 672
Inverse Compositional: 367
Ours: 7
• Camera 1: Find landmarks, Extract image patch
• Camera 2: Find landmarks, Extract image patch
• Compute 1: Compute matrix
• Compute sub-pixel correspondence on many subsequent frames
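The timing table compares against the inverse additive and inverse compositional forms of Lucas-Kanade image alignment. Here is a minimal 1-D sketch of inverse compositional alignment for a pure translation, given as background only. It is not the authors' 7 µs/keypoint method. It assumes linear interpolation and a fixed iteration cap, and the function names are ours.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Linear interpolation of a sampled 1-D signal at a non-integer position.
double sample(const std::vector<double>& s, double x) {
    assert(x >= 0.0);
    const std::size_t i = static_cast<std::size_t>(std::floor(x));
    assert(i + 1 < s.size());
    const double a = x - static_cast<double>(i);
    return (1.0 - a) * s[i] + a * s[i + 1];
}

// 1-D inverse compositional Lucas-Kanade for a translation p: finds p such
// that image(x + p) ~= tmpl(x) over the interior samples [margin, n - margin).
// The template gradient and Hessian are computed once, before the loop.
double align_translation_ic(const std::vector<double>& tmpl,
                            const std::vector<double>& image,
                            std::size_t margin, int iterations = 20) {
    const std::size_t n = tmpl.size();
    assert(image.size() == n && margin >= 1 && 2 * margin < n);
    std::vector<double> grad(n, 0.0);
    double hessian = 0.0;
    for (std::size_t x = margin; x < n - margin; ++x) {
        grad[x] = 0.5 * (tmpl[x + 1] - tmpl[x - 1]);  // central difference
        hessian += grad[x] * grad[x];
    }
    assert(hessian > 0.0);
    double p = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double b = 0.0;
        for (std::size_t x = margin; x < n - margin; ++x)
            b += grad[x] * (sample(image, static_cast<double>(x) + p) - tmpl[x]);
        const double dp = b / hessian;
        p -= dp;  // compose with the inverse of the increment: x -> x - dp + p
        if (std::fabs(dp) < 1e-9) break;
    }
    return p;
}
```

The inverse compositional form precomputes the gradient and Hessian on the template. The additive form would recompute them on the warped image at every iteration.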
Approximate Nearest Neighbor
Big data in high-dimensional spaces: given a query point, find the nearest reference point.
Solution: FANNG (Fast Approximate Nearest Neighbor Graphs) @ CVPR 2016
Can serve 1.2M queries/second at 90% recall in a database of 1M reference points in 128-D space on a Titan X
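Graph-based nearest-neighbour search starts at some node and walks along graph edges towards the query. Here is a minimal sketch of plain greedy search on a neighbour graph, as background only. FANNG's own graph construction and search strategy are not reproduced here, and the names are ours.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Point = std::vector<float>;

float squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    float d = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Greedy search on a neighbour graph: from `start`, repeatedly move to the
// neighbour closest to the query; stop when no neighbour is closer than the
// current node (a local minimum, which need not be the true nearest point).
std::size_t greedy_graph_search(const std::vector<Point>& points,
                                const std::vector<std::vector<std::size_t>>& graph,
                                const Point& query, std::size_t start) {
    assert(points.size() == graph.size() && start < points.size());
    std::size_t current = start;
    float current_dist = squared_distance(points[current], query);
    while (true) {
        std::size_t best = current;
        float best_dist = current_dist;
        for (std::size_t nb : graph[current]) {
            const float d = squared_distance(points[nb], query);
            if (d < best_dist) { best = nb; best_dist = d; }
        }
        if (best == current) return current;  // no closer neighbour
        current = best;
        current_dist = best_dist;
    }
}
```

Greedy descent can stop at a local minimum. That is why practical graph methods trade extra search effort for recall, as in the 90% recall figure quoted above.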
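The CUDA implementation requires a short priority queue, BUT
int array[30]; // very slow: dynamically indexed, so it is placed in local memory, which lives in off-chip global memory
The solution is to treat a warp as a single unit, with the array spread over the warp in a single register:
int array; // there are 32 of these in a warp
...
// find the first entry in array that is > thresh
// __ffs gives the 1-based lane position of the lowest set bit (0 if no lane matches);
// __ballot_sync replaces __ballot, which is deprecated since CUDA 9
int pq = __ffs(__ballot_sync(0xffffffff, array > thresh));
...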
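The semantics of the ballot-and-find-first idiom can be checked without a GPU. Here is a CPU emulation in which each array element stands in for the one register held by each of the 32 lanes. The emulate_ functions and the sorted-queue precondition are our assumptions for illustration. They are not CUDA APIs.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

// One element per lane: the single register each of the 32 threads holds.
using Warp = std::array<int, 32>;

// Emulates __ballot_sync(0xffffffff, lanes[lane] > thresh): bit `lane` is set
// when that lane's predicate is true.
std::uint32_t emulate_ballot(const Warp& lanes, int thresh) {
    std::uint32_t mask = 0;
    for (int lane = 0; lane < 32; ++lane)
        if (lanes[lane] > thresh) mask |= (std::uint32_t{1} << lane);
    return mask;
}

// Emulates __ffs: 1-based position of the lowest set bit, or 0 if none is set.
int emulate_ffs(std::uint32_t mask) {
    for (int bit = 0; bit < 32; ++bit)
        if (mask & (std::uint32_t{1} << bit)) return bit + 1;
    return 0;
}

// Lane index of the first entry greater than thresh, or -1 if there is none.
// Assumption: the priority queue is kept sorted, so this is the insert position.
int first_greater(const Warp& lanes, int thresh) {
    assert(std::is_sorted(lanes.begin(), lanes.end()));
    return emulate_ffs(emulate_ballot(lanes, thresh)) - 1;
}
```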
We want to keep the array sorted when we insert a new value, discarding the largest value.
thread:       0   1   2   3   4   5   6   7
array:        1   2   4   5   9  11  13  15
new_value:    8   (each thread sees this value)
shift value:  8   8   8   8   9  11  13  15   (= max(new_value, array))
shuffle:          8   8   8   8   9  11  13   (thread i receives thread i-1's shift value)
Write the shuffled value if it is less than array:
array:        1   2   4   5   8   9  11  13
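Here is a CPU emulation of the three steps in this sorted insert, with one array element per lane. On the GPU the shuffle step would be __shfl_up_sync(mask, shift, 1). In that call, lane 0 gets its own value back, so lane 0 has to use new_value itself. The function name and the 32-lane layout are our assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// One element per lane; the array is sorted ascending, and inserting a value
// discards the largest entry.
using Warp = std::array<int, 32>;

void warp_insert(Warp& lanes, int new_value) {
    assert(std::is_sorted(lanes.begin(), lanes.end()));
    // Step 1: every lane computes shift = max(new_value, array).
    Warp shift{};
    for (int i = 0; i < 32; ++i) shift[i] = std::max(new_value, lanes[i]);
    // Step 2 (shuffle up by one): lane i sees lane i-1's shift value. Lane 0
    // has no left neighbour, so it sees new_value itself.
    Warp seen{};
    seen[0] = new_value;
    for (int i = 1; i < 32; ++i) seen[i] = shift[i - 1];
    // Step 3: each lane writes the value it sees if that is less than its entry.
    for (int i = 0; i < 32; ++i)
        if (seen[i] < lanes[i]) lanes[i] = seen[i];
}
```

Every lane does the same constant amount of work whatever the insert position. That keeps the whole warp in lockstep, with no per-thread loops over the queue.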