Geometric Learning Algorithms
Stephen M. Omohundro
International Computer Science Institute
Berkeley, California
• Density estimation, classification, mappings
• Average case algorithms
• K-d trees and nearest neighbors
• Balltrees and boxtrees
• Robot arm task
• Bumptrees and mapping learning
Geometric Learning Algorithms
• Geometric - Mathematics
• Learning - Statistics
• Algorithms - Computer Science
Some Basic Tasks
• Density Estimation
• Classification learning
• Smooth nonlinear mapping learning
[Figure: example data for classification learning (points labeled A, B, C) and for mapping learning (output plotted against input).]
Density estimation by Histogram
• Bins too large -> washes out the fine structure
• Bins too small -> only one or no samples per bin
• Correct size for one region may be incorrect for another
Volume Density Estimator
• Instead of taking regions of fixed volume and counting samples,
• take regions with a fixed number of samples and measure volume.
• E.g., take the inverse volume of the nth-nearest-neighbor sphere.
• Get small regions in high-density parts of the space,
• large regions in low-density parts of the space.
• Need to compute the distance to the nth nearest neighbor (a sketch follows).
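A minimal Python sketch of this estimator (illustrative, not from the talk; it uses a brute-force distance scan where the k-d tree search described later would be used, and the common n/(N·V) normalization, which some texts write with n-1):

import numpy as np
from math import gamma, pi

def knn_density(x, samples, n):
    """Estimate p(x) as n / (N * volume of the nth-nearest-neighbor ball)."""
    N, d = samples.shape
    r = np.sort(np.linalg.norm(samples - x, axis=1))[n - 1]
    unit_ball = pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the unit d-ball
    return n / (N * unit_ball * r ** d)

# e.g. density of a 2-d standard normal near the origin
samples = np.random.randn(1000, 2)
print(knn_density(np.zeros(2), samples, n=10))     # roughly 1/(2*pi) ~ 0.16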
The beta distribution and non-parametric statistics.
Let S_α be a nested family of sets parameterized by α such that p(S_α) = α. If we draw N points and choose the smallest set S_α containing exactly n points, then the α's are distributed according to the Beta distribution:
p_n(\alpha) = \frac{N!}{(n-1)!\,(N-n)!}\,\alpha^{n-1}(1-\alpha)^{N-n},
\qquad E(\alpha) = \frac{n}{N+1}, \qquad \sigma^2(\alpha) \approx \frac{n}{N^2}
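A quick Monte Carlo check of this order-statistic fact (my own sketch): for N uniform draws, the mass α of the smallest nested set containing n points is the nth order statistic, distributed as Beta(n, N-n+1):

import numpy as np

N, n, trials = 100, 5, 20000
alphas = np.sort(np.random.rand(trials, N), axis=1)[:, n - 1]   # nth order statistic
print(alphas.mean(), n / (N + 1))                                # matches E(alpha)
print(alphas.var(), n * (N - n + 1) / ((N + 1) ** 2 * (N + 2)))  # matches Beta variance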
Nearest Neighbor Classification
• The Bayes-optimal approach would choose the class for which the posterior probability is largest.
• Typically don't know the distributions, however.
• Nearest neighbor assigns a point the class of the nearest example.
• Asymptotically, the error is less than twice that of the Bayes-optimal classifier.
• Non-parametric.
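A one-line version of the rule in Python (an illustrative sketch; a k-d tree or balltree would replace the linear scan in practice):

import numpy as np

def nn_classify(x, examples, labels):
    """Assign x the class of the nearest stored example (Euclidean metric)."""
    return labels[np.argmin(np.linalg.norm(examples - x, axis=1))]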
Double bucket sort: an adaptive algorithm
• Sorting n values in [0,1] takes O(n log n) worst case.
• Would like good average case performance
• But what distribution do you average over?
• Assume samples come from an unknown distribution, want good average with respect to that distribution.
• Bucket sort: Break interval into equal sized buckets, sort each one.
• Double bucket sort: Further break each bucket into a number of subintervals proportional to the number of samples that fell into it (a sketch follows).
• Linear expected time O(n); beats quicksort by a factor of 2 in practice.
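A sketch of the algorithm in Python (my own rendering of the description above; values are assumed to lie in [0, 1), and the final sorted() calls act on sublists that hold O(1) samples on average for smooth distributions, which is where the linear expected time comes from):

def double_bucket_sort(xs):
    """Sort values in [0, 1): bucket once into n equal buckets, then re-bucket
    each bucket into a number of sub-buckets proportional to its count."""
    n = len(xs)
    if n < 2:
        return list(xs)
    buckets = [[] for _ in range(n)]
    for x in xs:                                # first pass: n equal buckets
        buckets[min(int(x * n), n - 1)].append(x)
    out = []
    for i, b in enumerate(buckets):             # second pass: subdivide each bucket
        k = len(b)
        if k == 0:
            continue
        subs = [[] for _ in range(k)]
        lo, width = i / n, 1 / n
        for x in b:
            subs[min(int((x - lo) / width * k), k - 1)].append(x)
        for s in subs:
            out.extend(sorted(s))               # tiny sublists: expected O(1) each
    return out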
K-d trees can find n nearest neighbors in log expected time.
If N samples are drawn from a non-vanishing, smooth distribution on a compact region, then we can use a k-d tree to find the n nearest neighbors of a new sample in O(log N) expected time, asymptotically for large N.
Idea: Asymptotically, the volume of the n-nearest-neighbor ball is beta distributed. With the k-d construction, tree regions are also beta distributed, so the n-nearest-neighbor ball will overlap only a constant number of tree regions on average. Using branch and bound, we do a log-time initial descent and then a constant amount of extra work (see the sketch below).
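A minimal k-d tree with the branch-and-bound nearest neighbor search (my own compact sketch: it splits the most-spread dimension at the median and returns the single nearest neighbor; the n-neighbor case keeps a priority queue of the n best instead):

import numpy as np

class KDNode:
    """k-d tree: split the most-spread dimension at its median."""
    def __init__(self, points):
        if len(points) == 1:
            self.point = points[0]
            self.left = self.right = None
            return
        self.point = None
        self.dim = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
        order = np.argsort(points[:, self.dim])
        mid = len(points) // 2
        self.split = points[order[mid], self.dim]
        self.left = KDNode(points[order[:mid]])
        self.right = KDNode(points[order[mid:]])

def nearest(node, q, best=(np.inf, None)):
    """Branch and bound: descend the near side first, explore the far side
    only if it could still beat the current best distance."""
    if node.point is not None:                       # leaf
        d = np.linalg.norm(node.point - q)
        return (d, node.point) if d < best[0] else best
    near, far = ((node.left, node.right) if q[node.dim] < node.split
                 else (node.right, node.left))
    best = nearest(near, q, best)
    if abs(q[node.dim] - node.split) < best[0]:      # bound test on the splitting plane
        best = nearest(far, q, best)
    return best

pts = np.random.rand(1000, 3)
print(nearest(KDNode(pts), np.random.rand(3)))       # (distance, nearest point)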
K-d Tree Search Dimension Dependence
[Figure: average nearest neighbor search time (secs) vs. number of points (0-10000), in panels for 4, 8, 9, 10, 11, and 12 dimensions, comparing truncated brute search against k-d search.]
Balltree queries.
Pruning: • Return leaf balls containing a query point.
Branch and bound: • Return the nearest leaf ball to the query point.
Distribution independent average performance:
If leaves and queries are drawn from an underlying distribution p, then we would like good performance on average with respect to p (a sketch of both queries follows).
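A sketch of both query types (illustrative code of mine; it relies on the balltree invariant that each node's ball contains the balls of all leaves beneath it):

import numpy as np

class Ball:
    def __init__(self, center, radius, left=None, right=None):
        self.center, self.radius = np.asarray(center, float), radius
        self.left, self.right = left, right        # both None for a leaf

def dist_to_ball(q, node):
    return max(0.0, np.linalg.norm(q - node.center) - node.radius)

def leaves_containing(node, q, out):
    """Pruning query: collect the leaf balls that contain q."""
    if np.linalg.norm(q - node.center) > node.radius:
        return out       # q is outside this ball, hence outside every leaf below
    if node.left is None:
        out.append(node)
    else:
        leaves_containing(node.left, q, out)
        leaves_containing(node.right, q, out)
    return out

def nearest_leaf(node, q, best=(np.inf, None)):
    """Branch and bound: dist(q, center) - radius lower-bounds the distance
    to every leaf ball inside, so subtrees that cannot win are pruned."""
    if dist_to_ball(q, node) >= best[0]:
        return best
    if node.left is None:
        return (dist_to_ball(q, node), node)
    for child in sorted((node.left, node.right), key=lambda c: dist_to_ball(q, c)):
        best = nearest_leaf(child, q, best)
    return best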
Five balltree construction algorithms.
• K-d algorithm: split the most-spread dimension at the median (a sketch follows this list).
• Top-down algorithm: split the best dimension at the point that minimizes new ball volume.
• Insertion algorithm: insert each ball on-line at the minimum-volume insertion location.
• Cheap insertion algorithm: insert each ball on-line at a heuristically good location.
• Bottom-up algorithm: repeatedly pair the best two balls.
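A sketch of the first (k-d) construction, reusing the Ball class from the query sketch above and treating leaves as zero-radius point balls; enclosing() computes the smallest ball containing two balls:

import numpy as np

def enclosing(a, b):
    """Smallest ball containing balls a and b."""
    d = np.linalg.norm(b.center - a.center)
    if d + b.radius <= a.radius:
        return a.center, a.radius          # a already contains b
    if d + a.radius <= b.radius:
        return b.center, b.radius          # b already contains a
    r = (d + a.radius + b.radius) / 2
    return a.center + (r - a.radius) * (b.center - a.center) / d, r

def build_kd_balltree(points):
    """K-d construction: split the most-spread dimension at the median,
    then bound each pair of subtrees by its enclosing ball."""
    if len(points) == 1:
        return Ball(points[0], 0.0)
    dim = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
    order = np.argsort(points[:, dim])
    mid = len(points) // 2
    left = build_kd_balltree(points[order[:mid]])
    right = build_kd_balltree(points[order[mid:]])
    center, radius = enclosing(left, right)
    return Ball(center, radius, left, right)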
The balltrees produced from a Cantor set by the five construction algorithms.
[Figure: panels showing the insertion and k-d trees, the cheap insertion tree, the top-down tree, and the bottom-up tree.]
Balltree volume and construction time vs. number of 2-d uniformly distributed point leaves.
[Figure: construction time and balltree volume vs. number of leaves (0-500) for the top-down, bottom-up, k-d, insertion, and cheap insertion algorithms.]
Octbox image trees
289 leaves, depth 12, average depth 8.7
[Figure: the extracted edges and the tree at levels 0 through 7 and level 12.]
Robot Arm Task
[Diagram: camera system observing the arm.]
Kinematic space -> visual space: R3 -> R6
Setup studied by Bartlett Mel in "Connectionist Robot Motion Planning"
Kinematic to Visual map testbed
[Screenshot: predicted and actual arm views, with sliders for the first joint [52] (0-180), second joint [94] (0-180), and third joint [135] (0-180), plus sample-count and neighbor-count controls.]
RBF vs. Bumptree Learning Time
[Figure: learning time (secs, axis to 2000) vs. number of training samples (50-350); the Gaussian RBF and linear RBF curves rise steeply while the bumptree curve stays near zero.]
Function Evaluation Time
[Figure: evaluation time (secs) vs. number of training samples (0-300); the Gaussian RBF curve rises toward 0.03 secs while the bumptree curve stays much lower.]
[Figure: evaluation time (secs) vs. number of training samples (2000-10000) for the Gaussian RBF and the bumptree.]
What is a Bumptree?
A bumptree is a complete binary tree with a real-valued function associated to each node, such that an interior node's function is everywhere larger than the functions associated with the leaves beneath it.
[Figure: e.g. ball-supported bumps; 2-d leaf functions for leaves a-f, the tree structure, and the tree functions.]
Some bumptree functions
• kd-trees, oct-trees, boxtrees: 1 in a rectangle, 0 outside.
• balltrees: 1 in a ball, 0 outside; or smooth bumps in a ball, 0 outside.
• Gaussian mixtures: really store the quadratic form in the exponent.
For fast access, parent functions are a quadratic function of the distance from a coordinate partition on one side, 0 on the other.
Bumptree queries.
Pruning: • Return leaf functions which are larger than a given value at the query point.
Branch and bound: • Return leaf functions which are within a given factor or constant of the largest leaf function value (a sketch of the pruning query follows).
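A minimal sketch of the structure and the pruning query, using the ball-supported bumps from the figure (my own illustrative code; here an interior node simply carries the constant max-of-children height on a ball enclosing both children's supports, which satisfies the dominance condition, and such a node can be built with the enclosing() helper from the balltree sketch):

import numpy as np

class Bump:
    """Bumptree node: leaves carry a bump h*(1 - |x-c|^2/r^2) supported on a
    ball; interior nodes carry the constant h on an enclosing ball, which is
    everywhere >= the leaf functions beneath them."""
    def __init__(self, center, radius, height, left=None, right=None):
        self.center, self.radius, self.height = np.asarray(center, float), radius, height
        self.left, self.right = left, right

    def value(self, x):
        d2 = float(np.sum((x - self.center) ** 2))
        if d2 >= self.radius ** 2:
            return 0.0
        if self.left is None:
            return self.height * (1 - d2 / self.radius ** 2)   # leaf bump
        return self.height                                     # interior upper bound

def leaves_above(node, x, threshold, out):
    """Pruning query: leaf functions larger than threshold at x. Because a
    node's value bounds every leaf below it, a failing node prunes its subtree."""
    if node.value(x) <= threshold:
        return out
    if node.left is None:
        out.append(node)
    else:
        leaves_above(node.left, x, threshold, out)
        leaves_above(node.right, x, threshold, out)
    return out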
Bumptrees can find nearest neighbors in log expected time over a wide variety of probability distributions. In practice, this works well up to about 10 input dimensions for a few thousand uniformly distributed data points. Structure in the distribution can give speedups in arbitrarily high dimensions.
Bumptrees for Density Estimation, Classification and Constraint Learning
Use a bumptree to hold a normal mixture.
Density estimation by adaptive kernel estimator (with merging).
Classification by Bayes' decision rule on density.
Constraint queries such as partial match:
(the product of Gaussians is Gaussian, so normal mixtures may be composed to form normal mixtures; the identity is written out below)
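The closure fact being used, written out (a standard identity, not spelled out on the slide): for Gaussian densities,

\[
\mathcal{N}(x;\mu_1,\Sigma_1)\,\mathcal{N}(x;\mu_2,\Sigma_2)
  = c\,\mathcal{N}(x;\mu,\Sigma),
\qquad
\Sigma = \left(\Sigma_1^{-1}+\Sigma_2^{-1}\right)^{-1},
\quad
\mu = \Sigma\left(\Sigma_1^{-1}\mu_1+\Sigma_2^{-1}\mu_2\right),
\]

with the constant c = \mathcal{N}(\mu_1;\mu_2,\Sigma_1+\Sigma_2). Multiplying two normal mixtures therefore yields another normal mixture, with one component per pair of input components.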