Geometric Learning Algorithms
Stephen M. Omohundro
International Computer Science Institute
Berkeley, California
• Density estimation, classification, mappings
• Average case algorithms
• K-d trees and nearest neighbors
• Balltrees and boxtrees
• Robot arm task
• Bumptrees and mapping learning
Geometric Learning Algorithms
• Geometric - Mathematics
• Learning - Statistics
• Algorithms - Computer Science
Some Basic Tasks
• Density Estimation
• Classification learning
• Smooth nonlinear mapping learning
[Figure: example data for classification learning (points labeled A, B, C) and for mapping learning (output plotted against input).]
Density estimation by Histogram
• Bins too large -> washes out the fine structure
• Bins too small -> only one or no samples per bin
• Correct size for one region may be incorrect for another
Volume Density Estimator
• Instead of taking regions of fixed volume and counting samples,
• take regions with a fixed number of samples and measure volume.
• E.g., take the inverse volume of the nth-nearest-neighbor sphere.
• Get small regions in high-density parts of the space,
• large regions in low-density parts of the space.
• Need to compute the distance to the nth nearest neighbor (a sketch follows).
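A minimal Python sketch of this estimator (illustrative, not from the talk; it uses a brute-force distance scan where the k-d tree search described later would be used, and the common n/(N·V) normalization, which some texts write with n-1):

import numpy as np
from math import gamma, pi

def knn_density(x, samples, n):
    """Estimate p(x) as n / (N * volume of the nth-nearest-neighbor ball)."""
    N, d = samples.shape
    r = np.sort(np.linalg.norm(samples - x, axis=1))[n - 1]
    unit_ball = pi ** (d / 2) / gamma(d / 2 + 1)   # volume of the unit d-ball
    return n / (N * unit_ball * r ** d)

# e.g. density of a 2-d standard normal near the origin
samples = np.random.randn(1000, 2)
print(knn_density(np.zeros(2), samples, n=10))     # roughly 1/(2*pi) ~ 0.16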
The beta distribution and non-parametric statistics.
Let S_α be a nested family of sets parameterized by α such that p(S_α) = α. If we draw N points and choose the smallest set S_α containing exactly n points, then the α's are distributed according to the Beta distribution:
p_n(\alpha) = \frac{N!}{(n-1)!\,(N-n)!}\,\alpha^{n-1}(1-\alpha)^{N-n},
\qquad E(\alpha) = \frac{n}{N+1}, \qquad \sigma^2(\alpha) \approx \frac{n}{N^2}
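A quick Monte Carlo check of this order-statistic fact (my own sketch): for N uniform draws, the mass α of the smallest nested set containing n points is the nth order statistic, distributed as Beta(n, N-n+1):

import numpy as np

N, n, trials = 100, 5, 20000
alphas = np.sort(np.random.rand(trials, N), axis=1)[:, n - 1]   # nth order statistic
print(alphas.mean(), n / (N + 1))                                # matches E(alpha)
print(alphas.var(), n * (N - n + 1) / ((N + 1) ** 2 * (N + 2)))  # matches Beta variance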
Nearest Neighbor Classification
• The Bayes-optimal approach would choose the class for which the posterior probability is largest.
• Typically don't know the distributions, however.
• Nearest neighbor assigns a point the class of the nearest example.
• Asymptotically, the error is less than twice that of the Bayes-optimal classifier.
• Non-parametric.
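A one-line version of the rule in Python (an illustrative sketch; a k-d tree or balltree would replace the linear scan in practice):

import numpy as np

def nn_classify(x, examples, labels):
    """Assign x the class of the nearest stored example (Euclidean metric)."""
    return labels[np.argmin(np.linalg.norm(examples - x, axis=1))]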
Double bucket sort: an adaptive algorithm
• Sorting n values in [0,1] takes O(n log n) worst case.
• Would like good average case performance
• But what distribution do you average over?
• Assume samples come from an unknown distribution, want good average with respect to that distribution.
• Bucket sort: Break interval into equal sized buckets, sort each one.
• Double bucket sort: Further break each bucket into a number of subintervals proportional to the number of samples that fell into it (a sketch follows).
• Linear expected time O(n); beats quicksort by a factor of 2 in practice.
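A sketch of the algorithm in Python (my own rendering of the description above; values are assumed to lie in [0, 1), and the final sorted() calls act on sublists that hold O(1) samples on average for smooth distributions, which is where the linear expected time comes from):

def double_bucket_sort(xs):
    """Sort values in [0, 1): bucket once into n equal buckets, then re-bucket
    each bucket into a number of sub-buckets proportional to its count."""
    n = len(xs)
    if n < 2:
        return list(xs)
    buckets = [[] for _ in range(n)]
    for x in xs:                                # first pass: n equal buckets
        buckets[min(int(x * n), n - 1)].append(x)
    out = []
    for i, b in enumerate(buckets):             # second pass: subdivide each bucket
        k = len(b)
        if k == 0:
            continue
        subs = [[] for _ in range(k)]
        lo, width = i / n, 1 / n
        for x in b:
            subs[min(int((x - lo) / width * k), k - 1)].append(x)
        for s in subs:
            out.extend(sorted(s))               # tiny sublists: expected O(1) each
    return out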
K-d trees can find n nearest neighbors in log expected time.
If N samples are drawn from a non-vanishing, smooth distribution on a compact region, then we can use a k-d tree to find the n nearest neighbors of a new sample in O(log N) expected time, asymptotically for large N.
Idea: Asymptotically, the volume of the n-nearest-neighbor ball is beta distributed. With the k-d construction, tree regions are also beta distributed, so the n-nearest-neighbor ball will overlap only a constant number of tree regions on average. Using branch and bound, we do a log-time initial descent and then a constant amount of extra work (see the sketch below).
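A minimal k-d tree with the branch-and-bound nearest neighbor search (my own compact sketch: it splits the most-spread dimension at the median and returns the single nearest neighbor; the n-neighbor case keeps a priority queue of the n best instead):

import numpy as np

class KDNode:
    """k-d tree: split the most-spread dimension at its median."""
    def __init__(self, points):
        if len(points) == 1:
            self.point = points[0]
            self.left = self.right = None
            return
        self.point = None
        self.dim = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
        order = np.argsort(points[:, self.dim])
        mid = len(points) // 2
        self.split = points[order[mid], self.dim]
        self.left = KDNode(points[order[:mid]])
        self.right = KDNode(points[order[mid:]])

def nearest(node, q, best=(np.inf, None)):
    """Branch and bound: descend the near side first, explore the far side
    only if it could still beat the current best distance."""
    if node.point is not None:                       # leaf
        d = np.linalg.norm(node.point - q)
        return (d, node.point) if d < best[0] else best
    near, far = ((node.left, node.right) if q[node.dim] < node.split
                 else (node.right, node.left))
    best = nearest(near, q, best)
    if abs(q[node.dim] - node.split) < best[0]:      # bound test on the splitting plane
        best = nearest(far, q, best)
    return best

pts = np.random.rand(1000, 3)
print(nearest(KDNode(pts), np.random.rand(3)))       # (distance, nearest point)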
K-d Tree Search Dimension Dependence
[Figure: average nearest neighbor search time (secs) vs. number of points (0-10000), in panels for 4, 8, 9, 10, 11, and 12 dimensions, comparing truncated brute search against k-d search.]
Balltree queries.
Pruning: • Return leaf balls containing a query point.
Branch and bound: • Return the nearest leaf ball to the query point.
Distribution independent average performance:
If leaves and queries are drawn from an underlying distribution p, then we would like good performance on average with respect to p (a sketch of both queries follows).
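A sketch of both query types (illustrative code of mine; it relies on the balltree invariant that each node's ball contains the balls of all leaves beneath it):

import numpy as np

class Ball:
    def __init__(self, center, radius, left=None, right=None):
        self.center, self.radius = np.asarray(center, float), radius
        self.left, self.right = left, right        # both None for a leaf

def dist_to_ball(q, node):
    return max(0.0, np.linalg.norm(q - node.center) - node.radius)

def leaves_containing(node, q, out):
    """Pruning query: collect the leaf balls that contain q."""
    if np.linalg.norm(q - node.center) > node.radius:
        return out       # q is outside this ball, hence outside every leaf below
    if node.left is None:
        out.append(node)
    else:
        leaves_containing(node.left, q, out)
        leaves_containing(node.right, q, out)
    return out

def nearest_leaf(node, q, best=(np.inf, None)):
    """Branch and bound: dist(q, center) - radius lower-bounds the distance
    to every leaf ball inside, so subtrees that cannot win are pruned."""
    if dist_to_ball(q, node) >= best[0]:
        return best
    if node.left is None:
        return (dist_to_ball(q, node), node)
    for child in sorted((node.left, node.right), key=lambda c: dist_to_ball(q, c)):
        best = nearest_leaf(child, q, best)
    return best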
Five balltree construction algorithms.
• K-d algorithm: split the most-spread dimension at the median (a sketch follows this list).
• Top-down algorithm: split the best dimension at the point that minimizes new ball volume.
• Insertion algorithm: insert each ball on-line at the minimum-volume insertion location.
• Cheap insertion algorithm: insert each ball on-line at a heuristically good location.
• Bottom-up algorithm: repeatedly pair the best two balls.
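A sketch of the first (k-d) construction, reusing the Ball class from the query sketch above and treating leaves as zero-radius point balls; enclosing() computes the smallest ball containing two balls:

import numpy as np

def enclosing(a, b):
    """Smallest ball containing balls a and b."""
    d = np.linalg.norm(b.center - a.center)
    if d + b.radius <= a.radius:
        return a.center, a.radius          # a already contains b
    if d + a.radius <= b.radius:
        return b.center, b.radius          # b already contains a
    r = (d + a.radius + b.radius) / 2
    return a.center + (r - a.radius) * (b.center - a.center) / d, r

def build_kd_balltree(points):
    """K-d construction: split the most-spread dimension at the median,
    then bound each pair of subtrees by its enclosing ball."""
    if len(points) == 1:
        return Ball(points[0], 0.0)
    dim = int(np.argmax(points.max(axis=0) - points.min(axis=0)))
    order = np.argsort(points[:, dim])
    mid = len(points) // 2
    left = build_kd_balltree(points[order[:mid]])
    right = build_kd_balltree(points[order[mid:]])
    center, radius = enclosing(left, right)
    return Ball(center, radius, left, right)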
The balltrees produced from a Cantor set by the five construction algorithms.
[Figure: panels showing the insertion and k-d trees, the cheap insertion tree, the top-down tree, and the bottom-up tree.]
Balltree volume and construction time vs. number of 2-d uniformly distributed point leaves.
[Figure: construction time and balltree volume vs. number of leaves (0-500) for the top-down, bottom-up, k-d, insertion, and cheap insertion algorithms.]
Octbox image trees
289 leaves, depth 12, average depth 8.7
[Figure: the extracted edges and the tree at levels 0 through 7 and level 12.]
Robot Arm Task
[Diagram: camera system observing the arm.]
Kinematic space -> visual space: R3 -> R6
Setup studied by Bartlett Mel in "Connectionist Robot Motion Planning"
Kinematic to Visual map testbed
[Screenshot: predicted and actual arm views, with sliders for the first joint [52] (0-180), second joint [94] (0-180), and third joint [135] (0-180), plus sample-count and neighbor-count controls.]
RBF vs. Bumptree Learning Time
[Figure: learning time (secs, axis to 2000) vs. number of training samples (50-350); the Gaussian RBF and linear RBF curves rise steeply while the bumptree curve stays near zero.]
Function Evaluation Time
[Figure: evaluation time (secs) vs. number of training samples (0-300); the Gaussian RBF curve rises toward 0.03 secs while the bumptree curve stays much lower.]
[Figure: evaluation time (secs) vs. number of training samples (2000-10000) for the Gaussian RBF and the bumptree.]
What is a Bumptree?
A bumptree is a complete binary tree with a real-valued function associated to each node, such that an interior node's function is everywhere larger than the functions associated with the leaves beneath it.
[Figure: e.g. ball-supported bumps; 2-d leaf functions for leaves a-f, the tree structure, and the tree functions.]
Some bumptree functions
• kd-trees, oct-trees, boxtrees: 1 in a rectangle, 0 outside.
• balltrees: 1 in a ball, 0 outside; or smooth bumps in a ball, 0 outside.
• Gaussian mixtures: really store the quadratic form in the exponent.
For fast access, parent functions are a quadratic function of the distance from a coordinate partition on one side, 0 on the other.
Bumptree queries.
Pruning: • Return leaf functions which are larger than a given value at the query point.
Branch and bound: • Return leaf functions which are within a given factor or constant of the largest leaf function value (a sketch of the pruning query follows).
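A minimal sketch of the structure and the pruning query, using the ball-supported bumps from the figure (my own illustrative code; here an interior node simply carries the constant max-of-children height on a ball enclosing both children's supports, which satisfies the dominance condition, and such a node can be built with the enclosing() helper from the balltree sketch):

import numpy as np

class Bump:
    """Bumptree node: leaves carry a bump h*(1 - |x-c|^2/r^2) supported on a
    ball; interior nodes carry the constant h on an enclosing ball, which is
    everywhere >= the leaf functions beneath them."""
    def __init__(self, center, radius, height, left=None, right=None):
        self.center, self.radius, self.height = np.asarray(center, float), radius, height
        self.left, self.right = left, right

    def value(self, x):
        d2 = float(np.sum((x - self.center) ** 2))
        if d2 >= self.radius ** 2:
            return 0.0
        if self.left is None:
            return self.height * (1 - d2 / self.radius ** 2)   # leaf bump
        return self.height                                     # interior upper bound

def leaves_above(node, x, threshold, out):
    """Pruning query: leaf functions larger than threshold at x. Because a
    node's value bounds every leaf below it, a failing node prunes its subtree."""
    if node.value(x) <= threshold:
        return out
    if node.left is None:
        out.append(node)
    else:
        leaves_above(node.left, x, threshold, out)
        leaves_above(node.right, x, threshold, out)
    return out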
Bumptrees can find nearest neighbors in log expected time over a wide variety of probability distributions. In practice, this works well up to about 10 input dimensions for a few thousand uniformly distributed data points. Structure in the distribution can give speedups in arbitrarily high dimensions.
Bumptrees for Density Estimation, Classification and Constraint Learning
Use a bumptree to hold a normal mixture.
Density estimation by adaptive kernel estimator (with merging).
Classification by Bayes' decision rule on density.
Constraint queries such as partial match:
(the product of Gaussians is Gaussian, so normal mixtures may be composed to form normal mixtures; the identity is written out below)
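The closure fact being used, written out (a standard identity, not spelled out on the slide): for Gaussian densities,

\[
\mathcal{N}(x;\mu_1,\Sigma_1)\,\mathcal{N}(x;\mu_2,\Sigma_2)
  = c\,\mathcal{N}(x;\mu,\Sigma),
\qquad
\Sigma = \left(\Sigma_1^{-1}+\Sigma_2^{-1}\right)^{-1},
\quad
\mu = \Sigma\left(\Sigma_1^{-1}\mu_1+\Sigma_2^{-1}\mu_2\right),
\]

with the constant c = \mathcal{N}(\mu_1;\mu_2,\Sigma_1+\Sigma_2). Multiplying two normal mixtures therefore yields another normal mixture, with one component per pair of input components.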