
Dynamics and generalization of LVQ, Birmingham, 09-12-05

3) Vector Quantization (VQ) and Learning Vector Quantization (LVQ)

References

M. Biehl, A. Freking, G. Reents: Dynamics of on-line competitive learning. Europhysics Letters 38 (1997) 73-78

M. Biehl, A. Ghosh, B. Hammer: Dynamics and generalization ability of LVQ algorithms. J. Machine Learning Research 8 (2007) 323-360

and references in the latter

Vector Quantization (VQ)

aim: representation of large amounts of data by (few) prototype vectors

example: identification and grouping of similar data in clusters

assignment of a feature vector $\xi$ to the closest prototype $w$ (similarity or distance measure, e.g. Euclidean distance)

unsupervised competitive learning

• initialize K prototype vectors
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner even closer towards the example

intuitively clear, plausible procedure

- places prototypes in areas with high density of data
- identifies the most relevant combinations of features
- (stochastic) on-line gradient descent with respect to the cost function

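quantization error

$$H_{VQ} = \sum_{j=1}^{K} \sum_{\mu=1}^{P} \left( \xi^{\mu} - w_j \right)^2 \prod_{k \neq j}^{K} \Theta\!\left( d_k^{\mu} - d_j^{\mu} \right)$$

The sum over $j$ runs over the prototypes, the sum over $\mu$ over the data; the product of step functions $\Theta$ is nonzero only if $w_j$ is the winner. Here $d_j^{\mu} = (\xi^{\mu} - w_j)^2$ is the (squared) Euclidean distance.

aim: faithful representation (in general $\neq$ clustering)

Result depends on
- the number of prototype vectors
- the distance measure / metric used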
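The competitive learning steps and the quantization error can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function names, the toy two-cluster data, and the learning rate are hypothetical and not taken from the slides.

```python
import numpy as np

def winner(prototypes, xi):
    """Index of the prototype closest to xi (squared Euclidean distance)."""
    d = np.sum((prototypes - xi) ** 2, axis=1)
    return int(np.argmin(d))

def quantization_error(prototypes, data):
    """Sum over all examples of the squared distance to the winning prototype."""
    d = np.sum((data[:, None, :] - prototypes[None, :, :]) ** 2, axis=2)
    return float(np.sum(np.min(d, axis=1)))

def vq_step(prototypes, xi, eta):
    """Move only the winner towards xi; returns an updated copy."""
    w = prototypes.copy()
    j = winner(w, xi)
    w[j] += eta * (xi - w[j])
    return w

# toy data (assumption): two well separated clusters in two dimensions
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(-2.0, 0.5, size=(200, 2)),
                  rng.normal(+2.0, 0.5, size=(200, 2))])
prototypes = rng.normal(0.0, 0.1, size=(2, 2))

error_before = quantization_error(prototypes, data)
for epoch in range(5):
    for xi in rng.permutation(data):
        prototypes = vq_step(prototypes, xi, eta=0.05)
error_after = quantization_error(prototypes, data)
```

In this toy setting the prototypes drift towards the two dense regions, so `error_after` should come out well below `error_before`.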

Learning Vector Quantization (LVQ)

∙ identification of prototype vectors from labelled example data
∙ distance based classification (e.g. Euclidean, Manhattan, …)

basic, heuristic LVQ scheme: LVQ1 [Kohonen]

• initialize prototype vectors for different classes
• present a single example
• identify the closest prototype, i.e. the so-called winner
• move the winner
  - closer towards the data (same class)
  - away from the data (different class)

$$w(t+1) = w(t) \pm \eta \left[ \xi - w(t) \right]$$

classification: assignment of a vector $\xi$ to the class of the closest prototype $w$ (piecewise linear decision boundaries in the N-dim. feature space)

aim: generalization ability, i.e. classification of novel data after learning from examples
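A minimal sketch of one LVQ1 step and of nearest-prototype classification, assuming squared Euclidean distance. The function names and the two-dimensional example values are hypothetical.

```python
import numpy as np

def lvq1_step(prototypes, labels, xi, sigma, eta):
    """One LVQ1 update: only the winner moves, towards xi if its label
    equals sigma and away from xi otherwise. Returns an updated copy."""
    w = prototypes.copy()
    j = int(np.argmin(np.sum((w - xi) ** 2, axis=1)))
    sign = 1.0 if labels[j] == sigma else -1.0
    w[j] += sign * eta * (xi - w[j])
    return w

def classify(prototypes, labels, xi):
    """Label of the closest prototype (nearest-prototype classification)."""
    j = int(np.argmin(np.sum((prototypes - xi) ** 2, axis=1)))
    return labels[j]

prototypes = np.array([[1.0, 0.0], [-1.0, 0.0]])
labels = np.array([+1, -1])
xi = np.array([0.5, 0.5])

attracted = lvq1_step(prototypes, labels, xi, sigma=+1, eta=0.1)  # same class
repelled = lvq1_step(prototypes, labels, xi, sigma=-1, eta=0.1)   # different class
```

Here the first prototype is the winner in both calls; it moves to (0.95, 0.05) when the labels agree and to (1.05, -0.05) when they differ, while the second prototype stays where it is.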

LVQ algorithms

- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- often based on heuristic arguments or cost functions with unclear relation to generalization
- limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability

here: analysis of LVQ algorithms w.r.t.

- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation

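Model situation: two clusters of N-dimensional data

random vectors $\xi \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$$P(\xi) = \sum_{\sigma = \pm 1} p_\sigma \, P(\xi \mid \sigma), \qquad P(\xi \mid \sigma) = \frac{1}{(2\pi v_\sigma)^{N/2}} \exp\!\left[ -\frac{1}{2 v_\sigma} \left( \xi - \ell B_\sigma \right)^2 \right]$$

orthonormal center vectors: $B_+, B_- \in \mathbb{R}^N$, $(B_\pm)^2 = 1$, $B_+ \cdot B_- = 0$

prior weights of classes $p_+, p_-$ with $p_+ + p_- = 1$

cluster distance $\propto \ell$

independent components with mean $\langle \xi_j \rangle_\sigma = \ell \, (B_\sigma)_j$ and variance $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = v_\sigma$

[Figure: two clusters in $\mathbb{R}^N$, centered at $\ell B_+$ (weight $p_+$) and $\ell B_-$ (weight $p_-$)]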
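The model density can be sampled directly. The following is a sketch under assumptions: `sample_mixture` and its argument names are hypothetical, and the center vectors are simply chosen as the first two coordinate axes, which satisfies the orthonormality condition.

```python
import numpy as np

def sample_mixture(n, B_plus, B_minus, p_plus, v_plus, v_minus, ell, rng):
    """Draw n labelled examples (xi, sigma) from the two-cluster model."""
    N = B_plus.shape[0]
    # class label +1 with probability p_plus, otherwise -1
    sigma = np.where(rng.random(n) < p_plus, 1, -1)
    # cluster center ell * B_sigma for every example
    centers = np.where(sigma[:, None] == 1, ell * B_plus, ell * B_minus)
    # isotropic Gaussian noise with class-dependent variance v_sigma
    std = np.where(sigma == 1, np.sqrt(v_plus), np.sqrt(v_minus))
    xi = centers + std[:, None] * rng.standard_normal((n, N))
    return xi, sigma

rng = np.random.default_rng(1)
N = 200
B_plus = np.zeros(N)
B_plus[0] = 1.0
B_minus = np.zeros(N)
B_minus[1] = 1.0

xi, sigma = sample_mixture(400, B_plus, B_minus, p_plus=0.4,
                           v_plus=1.44, v_minus=0.64, ell=1.0, rng=rng)
```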

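high-dimensional data (formally $N \to \infty$)

[Figure: example vectors $\xi^\mu \in \mathbb{R}^N$ with N = 200, ℓ = 1, p+ = 0.4, v+ = 1.44, v− = 0.64 (240 and 160 points in the two clusters), shown in two projections]

- projections into the plane of center vectors $B_+, B_-$: $y_\pm = B_\pm \cdot \xi^\mu$
- projections on two independent random directions $w_1, w_2$: $x_1 = w_1 \cdot \xi^\mu$, $x_2 = w_2 \cdot \xi^\mu$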
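The two kinds of projection can be compared numerically. The sketch below uses examples from the cluster σ = +1 only; the sample size, seed, and choice of center vectors are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
N, ell, n = 200, 1.0, 2000
B_plus = np.zeros(N)
B_plus[0] = 1.0
B_minus = np.zeros(N)
B_minus[1] = 1.0

# class +1 examples only, variance v_+ = 1.44
xi_plus = ell * B_plus + np.sqrt(1.44) * rng.standard_normal((n, N))

# projections into the plane of the center vectors
y_plus = xi_plus @ B_plus      # mean close to ell
y_minus = xi_plus @ B_minus    # mean close to 0

# projection on a random unit direction
w1 = rng.standard_normal(N)
w1 /= np.linalg.norm(w1)
x1 = xi_plus @ w1              # mean ell * (w1 . B_plus), typically of order 1/sqrt(N)
```

The cluster offset of size ℓ is visible in the plane of the center vectors, whereas along a random direction it is suppressed by the small overlap `w1 @ B_plus`, so the clusters are hard to tell apart there.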

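Dynamics of on-line training

sequence of new, independent random examples $\{\xi^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$, drawn according to $p_{\sigma^\mu} P(\xi^\mu \mid \sigma^\mu)$

update of two prototype vectors $w_+, w_-$:

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N} \, f_s\!\left[ d_+^{\mu}, d_-^{\mu}, \sigma^\mu, \ldots \right] \left( \xi^{\mu} - w_s^{\mu-1} \right), \qquad d_s^{\mu} = \left( \xi^{\mu} - w_s^{\mu-1} \right)^2, \quad s, \sigma^\mu = \pm 1$$

- $\eta$: learning rate, step size
- $f_s$: competition, direction of update, etc.
- $(\xi^{\mu} - w_s^{\mu-1})$: change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm

$$f_s = \Theta\!\left( d_{-s}^{\mu} - d_{s}^{\mu} \right) s \, \sigma^\mu$$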
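The generic update with a pluggable modulation function $f_s$ can be written as follows. This is a sketch: storing the two prototypes in a dict keyed by s = ±1 and the names `online_step` and `lvq1_modulation` are assumptions, and ties in the distances are treated as "not the winner".

```python
import numpy as np

def lvq1_modulation(d, s, sigma):
    """f_s for LVQ1: nonzero only for the winner, with sign s * sigma."""
    return float(d[-s] > d[s]) * s * sigma

def online_step(w, xi, sigma, eta, modulation):
    """Apply w_s <- w_s + (eta / N) * f_s * (xi - w_s) for s = +1, -1.
    w is a dict {+1: vector, -1: vector}; distances use the old prototypes."""
    N = xi.shape[0]
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    return {s: w[s] + (eta / N) * modulation(d, s, sigma) * (xi - w[s])
            for s in (+1, -1)}

w0 = {+1: np.array([1.0, 0.0]), -1: np.array([-1.0, 0.0])}
xi = np.array([0.5, 0.5])
# w_+ wins; the example carries label -1, so w_+ is pushed away from xi
w1 = online_step(w0, xi, sigma=-1, eta=2.0, modulation=lvq1_modulation)
```

Other algorithms only differ in the function passed as `modulation`, which is the point of writing the update in this form.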

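Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^{7}$)

$$R_{s\sigma}^{\mu} = w_s^{\mu} \cdot B_\sigma, \qquad Q_{st}^{\mu} = w_s^{\mu} \cdot w_t^{\mu}, \qquad s, t, \sigma \in \{+1, -1\}$$

$R_{s\sigma}^{\mu}$: projections into the $(B_+, B_-)$-plane; $Q_{st}^{\mu}$: length and relative position of prototypes

algorithm → recursions for $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$

2. average over the current example

random vector $\xi^\mu$ according to $P(\xi^\mu \mid \sigma)$, avg. length $\langle \xi^2 \rangle_\sigma \approx N v_\sigma$

in the thermodynamic limit $N \to \infty$ the projections

$$x_s^{\mu} = w_s^{\mu-1} \cdot \xi^{\mu}, \qquad y_\tau^{\mu} = B_\tau \cdot \xi^{\mu}$$

are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices $\mu$):

$$\langle x_s \rangle_\sigma = \sum_{j=1}^{N} (w_s)_j \langle \xi_j \rangle_\sigma = \ell R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$$

$$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma Q_{st}$$

$$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma R_{s\tau}, \qquad \langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \delta_{\rho\tau}$$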
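The first and second moments of the projections can be checked by sampling at finite N. The sketch below fixes two arbitrary prototype vectors and draws examples from the cluster σ = +1; the dimensions, sample size, and variable names are assumptions chosen to keep the check cheap.

```python
import numpy as np

rng = np.random.default_rng(3)
N, ell, v_plus, n = 20, 1.0, 1.44, 200_000

B = np.zeros((2, N))
B[0, 0] = 1.0   # B_+
B[1, 1] = 1.0   # B_-
w = rng.standard_normal((2, N)) / np.sqrt(N)   # two arbitrary fixed prototypes

R = w @ B.T   # R[s, sigma] = w_s . B_sigma
Q = w @ w.T   # Q[s, t]     = w_s . w_t

# examples from the cluster sigma = +1
xi = ell * B[0] + np.sqrt(v_plus) * rng.standard_normal((n, N))
x = xi @ w.T   # x[:, s]   = w_s . xi
y = xi @ B.T   # y[:, tau] = B_tau . xi

mean_x = x.mean(axis=0)                                    # expected: ell * R[:, 0]
cov_xx = np.cov(x, rowvar=False)                           # expected: v_plus * Q
cov_xy = np.cov(np.hstack([x, y]), rowvar=False)[:2, 2:]   # expected: v_plus * R
```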

averaged recursions, closed in $\{R_{s\sigma}^{\mu}, Q_{st}^{\mu}\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$

3. self-averaging property of characteristic quantities $R_{s\sigma}^{\mu}$, $Q_{st}^{\mu}$

- depend on the random sequence of example data
- their fluctuations vanish with $N \to \infty$

learning dynamics is completely described in terms of averages

[Figure: $R_{++}$ at $\alpha = 10$ (mean and variance) vs. $1/N$, computer simulations (LVQ1)]

- mean results approach theoretical prediction
- variance vanishes as $N \to \infty$

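4. continuous learning time

$$\alpha = \frac{\mu}{N}$$

# of examples, # of learning steps per degree of freedom

stochastic recursions → deterministic ODE

integration yields evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

5. learning curve

probability for misclassification of a novel example:

$$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$$

$$\varepsilon_g = p_+ \, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell \left[ R_{++} - R_{-+} \right]}{2 \sqrt{v_+} \left[ Q_{++} - 2 Q_{+-} + Q_{--} \right]^{1/2}} \right] + p_- \, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell \left[ R_{--} - R_{+-} \right]}{2 \sqrt{v_-} \left[ Q_{++} - 2 Q_{+-} + Q_{--} \right]^{1/2}} \right]$$

with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples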
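The generalization error can be evaluated directly from the characteristic quantities, using the Gaussian statistics of the model (mean $\ell B_\sigma$, variance $v_\sigma$) and the standard normal cumulative distribution. The sketch below is an assumption-laden illustration: the dict layout for R and Q and the function names are hypothetical.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, p_plus, v_plus, v_minus, ell):
    """eps_g from R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t,
    with s, t, sigma in {+1, -1}."""
    p = {+1: p_plus, -1: 1.0 - p_plus}
    v = {+1: v_plus, -1: v_minus}
    # length of the difference vector w_+ - w_-
    norm = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for s in (+1, -1):
        # class s is misclassified if prototype w_{-s} is closer than w_s
        num = Q[(s, s)] - Q[(-s, -s)] - 2.0 * ell * (R[(s, s)] - R[(-s, s)])
        eps += p[s] * phi(num / (2.0 * math.sqrt(v[s]) * norm))
    return eps

# prototypes placed exactly on the cluster centers: w_s = ell * B_s
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, p_plus=0.5, v_plus=1.0, v_minus=1.0, ell=ell)
```

For this symmetric configuration the expression reduces to $\Phi(-\ell/\sqrt{2})$, the error of the boundary halfway between two unit-variance clusters at distance $\ell\sqrt{2}$.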

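LVQ1: The winner takes it all

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-s}^{\mu} - d_{s}^{\mu} \right) s \, \sigma^\mu \left( \xi^{\mu} - w_s^{\mu-1} \right)$$

$\Theta(\cdot)$ selects the winner $w_s$; $s \, \sigma^\mu = \pm 1$: only the winner is updated, according to the class label

initialization $w_s(0) = 0$

theory and simulation (N = 100): p+ = 0.8, v+ = 4, v− = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{s\sigma}$ vs. $\alpha$]

self-averaging property

[Figure: $R_{++}$ ($\alpha = 10$), mean and variances, vs. $1/N$]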

LVQ1: trajectories in the $(B_+, B_-)$-plane

[Figure: trajectories of $w_+$ and $w_-$ in the plane of the projections $R_{s+}$, $R_{s-}$, with cluster centers $\ell B_+$, $\ell B_-$; (•) mark $\alpha = 20, 40, \ldots, 140$; also shown: optimal decision boundary and asymptotic position]

$R_{s\sigma} = w_s \cdot B_\sigma$, $Q_{st} = w_s \cdot w_t$

initialization $w_s(0) \approx 0$; theory and simulation (N = 100): p+ = 0.8, v+ = 4, v− = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs

Learning curve

[Figure: $\varepsilon_g$ vs. $\alpha$ for η = 2.0, 1.0, 0.2; p+ = 0.2, ℓ = 1.0, v+ = v− = 1.0]

- suboptimal, non-monotonic behavior for small η
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$

achievable generalization error

[Figure: $\varepsilon_g$ vs. $p_+$, two panels: v+ = v− = 1.0 and v+ = 0.25, v− = 0.81; curves: best linear boundary, LVQ1]

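LVQ 2.1 [Kohonen]

here: update correct and wrong winner

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \left( \xi^{\mu} - w_s^{\mu-1} \right)$$

[Figure: characteristic quantities $R_{s\sigma}$, $Q_{st}$ vs. $\alpha$ (0 to 10, vertical range −6 to 6); theory and simulation (N = 100): p+ = 0.8, ℓ = 1, v+ = v− = 1, η = 0.5; averages over 100 independent runs; annotation: quantities grow with $\alpha$, with some combinations of $Q$ and $R$ remaining finite]

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$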
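The repulsion behind the instability is easy to see in code. The sketch below implements the unrestricted update of both prototypes as described on this slide; the function name and the example values are hypothetical.

```python
import numpy as np

def lvq21_step(w_plus, w_minus, xi, sigma, eta):
    """Unrestricted LVQ 2.1 step: the prototype with the correct label is
    attracted, the other one is repelled, regardless of which one wins."""
    N = xi.shape[0]
    new_plus = w_plus + (eta / N) * (+1) * sigma * (xi - w_plus)
    new_minus = w_minus + (eta / N) * (-1) * sigma * (xi - w_minus)
    return new_plus, new_minus

w_plus = np.array([0.0, 0.0])
w_minus = np.array([1.0, 1.0])
xi = np.array([2.0, 0.0])
eta = 2.0   # with N = 2 the effective step eta / N equals 1

new_plus, new_minus = lvq21_step(w_plus, w_minus, xi, sigma=+1, eta=eta)
```

Each repulsive step multiplies the offset of the wrong prototype from the example by $(1 + \eta/N)$, so without a restriction on the updates this distance can grow without bound.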

suggested strategy: selection of data in a window close to the current decision boundary

slows down the repulsion, but the system remains unstable

Early stopping: end training process at minimal $\varepsilon_g$ (idealized)

[Figure: $\varepsilon_g$ vs. $\alpha$ for η = 2.0, 1.0, 0.5]

- pronounced minimum in $\varepsilon_g(\alpha)$, depends on initialization and cluster geometry
- here: lowest minimum value reached for $\eta \to 0$

[Figure: $\varepsilon_g$ vs. $p_+$ for v+ = 0.25, v− = 0.81; curves: LVQ1, early stopping]

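Learning From Mistakes (LFM)

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{\sigma^\mu}^{\mu} - d_{-\sigma^\mu}^{\mu} \right) s \, \sigma^\mu \left( \xi^{\mu} - w_s^{\mu-1} \right)$$

LVQ 2.1 update only if the current classification is wrong

crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory in the plane of $R_{s+}$, $R_{s-}$ with cluster centers $\ell B_+$, $\ell B_-$; p+ = 0.8, ℓ = 3.0, v+ = 4.0, v− = 9.0]

Learning curves

[Figure: $\varepsilon_g$ vs. $\alpha$ for η = 2.0, 1.0, 0.5; p+ = 0.8, ℓ = 1.2, v+ = v− = 1.0]

η-independent asymptotic $\varepsilon_g$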
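A sketch of one LFM step, assuming the two prototypes are stored in a dict keyed by their labels ±1 and that a tie in the distances counts as a correct classification (both are assumptions, as is the function name):

```python
import numpy as np

def lfm_step(w, xi, sigma, eta):
    """Learning From Mistakes: apply the LVQ 2.1 update to both prototypes,
    but only if xi is currently misclassified. w is {+1: vec, -1: vec}."""
    N = xi.shape[0]
    d = {s: float(np.sum((xi - w[s]) ** 2)) for s in (+1, -1)}
    if d[sigma] <= d[-sigma]:
        # the prototype with the correct label is the winner: no update
        return {s: w[s].copy() for s in (+1, -1)}
    return {s: w[s] + (eta / N) * s * sigma * (xi - w[s]) for s in (+1, -1)}

w0 = {+1: np.array([1.0, 0.0]), -1: np.array([-1.0, 0.0])}
xi = np.array([0.5, 0.0])

unchanged = lfm_step(w0, xi, sigma=+1, eta=2.0)   # classified correctly
corrected = lfm_step(w0, xi, sigma=-1, eta=2.0)   # misclassified, both move
```

In the second call the wrong winner $w_+$ is pushed away to (1.5, 0) and the correct prototype $w_-$ is pulled to (0.5, 0).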

Comparison: achievable generalization ability

[Figure: $\varepsilon_g$ vs. $p_+$, two panels: equal cluster variances (v+ = v− = 1.0) and unequal variances (v+ = 0.25, v− = 0.81)]

curves: best linear boundary, LVQ1, LVQ 2.1 (early stopping), LFM, trivial classification

Vector Quantization

competitive learning

$$w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta\!\left( d_{-s}^{\mu} - d_{s}^{\mu} \right) \left( \xi^{\mu} - w_s^{\mu-1} \right)$$

$w_s$: winner; class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ (p+ = 0.2, ℓ = 1.0, η = 1.2)

[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+, LVQ1; $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]

system is invariant under exchange of the prototypes → weakly repulsive fixed points

interpretations:

- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$)]

$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$

Summary

• a model scenario of LVQ training: two clusters, two prototypes, dynamics of online training

• comparison of algorithms (within the model):

- LVQ 1: original formulation of LVQ, with close to optimal asymptotic generalization
- LVQ 2.1: intuitive extension creates instability, trivial (stationary) classification; + stopping: potentially good performance; practical difficulties, depends on initialization
- LFM: crisp limit of Soft Robust LVQ, stable behavior, far from optimal generalization
- VQ: description of in-class competition

Outlook

• Self-Organizing Maps (SOM): neighborhood preserving SOM, Neural Gas (distance rank based)

• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. distance measure

$$d_\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \left( w_{s,i} - \xi_i \right)^2$$

with the factors $\lambda_i$ adapted in training

• applications

• multi-class, multi-prototype problems

• optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line
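The adaptive distance measure mentioned for Generalized Relevance LVQ amounts to a weighted squared Euclidean distance. A minimal sketch, with the function name and example values as assumptions; how the weights are adapted during training is not shown here.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)**2."""
    return float(np.sum(lam * (w - xi) ** 2))

w = np.array([0.0, 0.0])
xi = np.array([1.0, 2.0])

d_uniform = relevance_distance(w, xi, np.array([1.0, 1.0]))   # plain squared Euclidean
d_first = relevance_distance(w, xi, np.array([1.0, 0.0]))     # second feature ignored
```

With all weights equal to one the measure reduces to the squared Euclidean distance used in the rest of the talk; setting a weight to zero removes that feature from the comparison.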

Dynamics and generalization of LVQ Birmingham 09-12- 05

Vector Quantization (VQ)

aim

representation of large amounts

of data by (few) prototype vectors

example

identification and grouping

in clusters of similar data

assignment of feature vector to the closest prototype w

(similarity or distance measure

eg Euclidean distance )

Dynamics and generalization of LVQ Birmingham 09-12- 05

unsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototype ie the so-called winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) on-line gradient descent with respect to

the cost function

Dynamics and generalization of LVQ Birmingham 09-12- 05

quantization error

μj

μk

K

jk

P

1μj

μK

1jVQ ddΘ

2 wξH

μjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors - the distance measure metric used

Dynamics and generalization of LVQ Birmingham 09-12- 05

bull identify the closest prototype ie the so-called winner

bull initialize prototype vectors for different classes

bull present a single example

bull move the winner - closer towards the data (same class)

- away from the data (different class)

classification

assignment of a vector to the class of the closest

prototype w

aim generalization ability

classification of novel data

after learning from examples

∙ identification of prototype vectors from labelled example data

∙ distance based classification (eg Euclidean Manhattan hellip)

basic heuristic LVQ scheme LVQ1 [Kohonen]

piecewise linear decision boundaries

Learning Vector Quantization

(t)wξ(t)w1tw (t)w

η

N-dimfeature space

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ algorithms

- frequently applied in a variety

of practical problems

- plausible intuitive flexible

- fast easy to implement

- often based on heuristic arguments

or cost functions with unclear relation to generalization

- limited theoretical understanding of

- dynamics and convergence properties

- achievable generalization ability

here analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- typical properties in a model situation

Dynamics and generalization of LVQ Birmingham 09-12- 05

Model situation two clusters of N-dimensional data

random vectors isin ℝN according to σ)P(p )P(1σ

σ ξξ

σN2

σ

- v 2

1exp

v 2π

1σ)P( Βξξ mixture of two Gaussians

orthonormal center vectors

B+ B- isin ℝN ( B )2 =1 B+ B- =0

prior weights of classes p+ p-

p+ + p- = 1

B+

B-

(p+)

(p-)

cluster distance prop ℓ ℓ

jj Bσσξ

σσσvξξ

22jj

indep components with

and variance

ℝN

Dynamics and generalization of LVQ Birmingham 09-12- 05

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw

Dynamics and generalization of LVQ Birmingham 09-12- 05

Dynamics of on-line training

sequence of new independent random examples 123μσμμ ξ

drawn according to μμσ σPp μ ξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

example

LVQ1 original formulation [Kohonen]

Winner-Takes-All (WTA) algorithm

μs

μs

μs d d σS f

1-μs

μμμs-

μss

1-μs

μs σSddf

N

ηwξww 21

μs

μμs

μ

d

1σS

update of two prototype vectors w+ w-

Dynamics and generalization of LVQ Birmingham 09-12- 05

algorithm recursions

Mathematical analysis of the learning dynamics

11 σtsμt

μs

μstσ

μs

μsσ QBR www

projections into the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities

( here ℝ2N ℝ7 )

2 average over the current example

random vector according to avg lengthσ)|P( μξ 22 vN σσ

ξ

in the thermodynamic limit N μμ

μ1-μs

μs

By

wx

ξ

ξ

correlated Gaussian random quantities

completely specified in terms of first and second moments (wo indices μ)

sσσ

N

1jjsσs R x

jw stσσtσsσt s Qv xx- xx

sσσσsσ s Rv yx- yx σσσσv yy- yy

sσσ y

Dynamics and generalization of LVQ Birmingham 09-12- 05

averaged recursions closed in p σ1σ

σ

μsσ

μst R Q

- depend on the random sequence of example data

- their fluctuations vanish with N

learning dynamics is completely described in terms of averages

3 self-averaging property of characteristic quantities

μsσ

μst R Q

1N

(mean and variance)

R++ (α=10) computer simulations (LVQ1)

- mean results approach theoretical prediction- variance vanishes as N

Dynamics and generalization of LVQ Birmingham 09-12- 05

4 continuous learning time

N

μ α of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst integration yields evolution of projections

stochastic recursions deterministic ODE

probability for misclassification of a novel example

ddpddp gε

]2

]

]2

][

2

1

2

1

QQ[Qv

R[R2QQ

QQ[Q v

RR2QQpp

5 learning curve

generalization error εg(α) after training with α N examples

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

tsst

σssσ

Q

BR

ww

w

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ 21 [Kohonen] here update correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)

p+=08 ℓ=1 v+=v-=1 =05

averages over 100 independent runs

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification for αinfin

εg = min p+p- RS+

RS-

Dynamics and generalization of LVQ Birmingham 09-12- 05

suggested strategy

selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

Early stopping end training process at minimal εg (idealized)

εg

η= 20 10 05

η

- pronounced minimum in εg (α) depends on initialization and cluster geometry

- here lowest minimum value reached for η0

v+ =025 v-=081εg

p+

― LVQ1__ early stopping

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning From Mistakes (LFM)

1-μs

μμμσ-

μσ

1-μs

μs Sσdd

N

ημμ wξww

LVQ21 updateonly if the current classification is wrong

crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]

projected trajetory

ℓ B-

ℓ B+

RS+

RS-

εg

p+=08 ℓ=30

v+=40 v-=90

η= 20 10 05

Learning curves

η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10

Dynamics and generalization of LVQ Birmingham 09-12- 05

εg

p+

equal cluster variances

p+

unequal variances

best linear boundary

― LVQ1

--- LVQ21 (early stopping)middot-middot LFM

Comparison achievable generalization ability

v+=025 v-=081v+=v-=10

― trivial classification

Dynamics and generalization of LVQ Birmingham 09-12- 05

Vector Quantization

competitive learning 1-μs

μμS

μS

1-μs

μs dd

N

ηwξww

ws winner

class membership is unknown

or identical for all data

numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10system is invariant under

exchange of the prototypes

weakly repulsive fixed

points

Dynamics and generalization of LVQ Birmingham 09-12- 05

interpretations

- VQ unsupervised learning unlabelled data

- LVQ two prototypes of the same class identical labels

- LVQ different classes but labels are not used in training

εg

p+

asymptotics (0 )

p+asymp0

p-asymp1

- low quantization error- high gen error εg

Dynamics and generalization of LVQ Birmingham 09-12- 05

Summary

bulla model scenario of LVQ training

two clusters two prototypes

dynamics of online training

bullcomparison of algorithms (within the model)

LVQ 1 original formulation of LVQ

with close to optimal asymptotic generalization

LVQ 21 intuitive extension creates instability

trivial (stationary) classification

+ stopping potentially good performance

practical difficulties depends on initialization

LFM crisp limit of Soft Robust LVQ stable behavior

far from optimal generalization

VQ description of in-class competition

Dynamics and generalization of LVQ Birmingham 09-12- 05

Outlook

bullSelf-Organizing Maps (SOM)

neighborhood preserving SOM Neural Gas (distance rank based)

bull Generalized Relevance LVQ [eg Hammer amp Villmann]

adaptive metrics eg distance measure

N

i

iii w1

2)( sλ ξξwd

training

bullapplications

bull multi-class multi-prototype problems

bull optimized procedures learning rate schedules

variational approach Bayes optimal on-line

Dynamics and generalization of LVQ Birmingham 09-12- 05

unsupervised competitive learning

bull initialize K prototype vectors

bull present a single example

bull identify the closest prototype ie the so-called winner

bull move the winner even closer towards the example

intuitively clear plausible procedure

- places prototypes in areas with high density of data

- identifies the most relevant combinations of features

- (stochastic) on-line gradient descent with respect to

the cost function

Dynamics and generalization of LVQ Birmingham 09-12- 05

quantization error

μj

μk

K

jk

P

1μj

μK

1jVQ ddΘ

2 wξH

μjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors - the distance measure metric used

Dynamics and generalization of LVQ Birmingham 09-12- 05

bull identify the closest prototype ie the so-called winner

bull initialize prototype vectors for different classes

bull present a single example

bull move the winner - closer towards the data (same class)

- away from the data (different class)

classification

assignment of a vector to the class of the closest

prototype w

aim generalization ability

classification of novel data

after learning from examples

∙ identification of prototype vectors from labelled example data

∙ distance based classification (eg Euclidean Manhattan hellip)

basic heuristic LVQ scheme LVQ1 [Kohonen]

piecewise linear decision boundaries

Learning Vector Quantization

(t)wξ(t)w1tw (t)w

η

N-dimfeature space

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ algorithms

- frequently applied in a variety

of practical problems

- plausible intuitive flexible

- fast easy to implement

- often based on heuristic arguments

or cost functions with unclear relation to generalization

- limited theoretical understanding of

- dynamics and convergence properties

- achievable generalization ability

here analysis of LVQ algorithms wrt

- dynamics of the learning process

- performance ie generalization ability

- typical properties in a model situation

Dynamics and generalization of LVQ Birmingham 09-12- 05

Model situation two clusters of N-dimensional data

random vectors isin ℝN according to σ)P(p )P(1σ

σ ξξ

σN2

σ

- v 2

1exp

v 2π

1σ)P( Βξξ mixture of two Gaussians

orthonormal center vectors

B+ B- isin ℝN ( B )2 =1 B+ B- =0

prior weights of classes p+ p-

p+ + p- = 1

B+

B-

(p+)

(p-)

cluster distance prop ℓ ℓ

jj Bσσξ

σσσvξξ

22jj

indep components with

and variance

ℝN

Dynamics and generalization of LVQ Birmingham 09-12- 05

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw

Dynamics and generalization of LVQ Birmingham 09-12- 05

Dynamics of on-line training

sequence of new independent random examples 123μσμμ ξ

drawn according to μμσ σPp μ ξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

example

LVQ1 original formulation [Kohonen]

Winner-Takes-All (WTA) algorithm

μs

μs

μs d d σS f

1-μs

μμμs-

μss

1-μs

μs σSddf

N

ηwξww 21

μs

μμs

μ

d

1σS

update of two prototype vectors w+ w-

Dynamics and generalization of LVQ Birmingham 09-12- 05

algorithm recursions

Mathematical analysis of the learning dynamics

11 σtsμt

μs

μstσ

μs

μsσ QBR www

projections into the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities

( here ℝ2N ℝ7 )

2 average over the current example

random vector according to avg lengthσ)|P( μξ 22 vN σσ

ξ

in the thermodynamic limit N μμ

μ1-μs

μs

By

wx

ξ

ξ

correlated Gaussian random quantities

completely specified in terms of first and second moments (wo indices μ)

sσσ

N

1jjsσs R x

jw stσσtσsσt s Qv xx- xx

sσσσsσ s Rv yx- yx σσσσv yy- yy

sσσ y

Dynamics and generalization of LVQ Birmingham 09-12- 05

averaged recursions closed in p σ1σ

σ

μsσ

μst R Q

- depend on the random sequence of example data

- their fluctuations vanish with N

learning dynamics is completely described in terms of averages

3 self-averaging property of characteristic quantities

μsσ

μst R Q

1N

(mean and variance)

R++ (α=10) computer simulations (LVQ1)

- mean results approach theoretical prediction- variance vanishes as N

Dynamics and generalization of LVQ Birmingham 09-12- 05

4 continuous learning time

N

μ α of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst integration yields evolution of projections

stochastic recursions deterministic ODE

probability for misclassification of a novel example

ddpddp gε

]2

]

]2

][

2

1

2

1

QQ[Qv

R[R2QQ

QQ[Q v

RR2QQpp

5 learning curve

generalization error εg(α) after training with α N examples

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

tsst

σssσ

Q

BR

ww

w

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ 21 [Kohonen] here update correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)

p+=08 ℓ=1 v+=v-=1 =05

averages over 100 independent runs

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification for αinfin

εg = min p+p- RS+

RS-

Dynamics and generalization of LVQ Birmingham 09-12- 05

suggested strategy

selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

Early stopping end training process at minimal εg (idealized)

εg

η= 20 10 05

η

- pronounced minimum in εg (α) depends on initialization and cluster geometry

- here lowest minimum value reached for η0

v+ =025 v-=081εg

p+

― LVQ1__ early stopping

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning From Mistakes (LFM)

1-μs

μμμσ-

μσ

1-μs

μs Sσdd

N

ημμ wξww

LVQ21 updateonly if the current classification is wrong

crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]

projected trajetory

ℓ B-

ℓ B+

RS+

RS-

εg

p+=08 ℓ=30

v+=40 v-=90

η= 20 10 05

Learning curves

η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10

Dynamics and generalization of LVQ Birmingham 09-12- 05

εg

p+

equal cluster variances

p+

unequal variances

best linear boundary

― LVQ1

--- LVQ21 (early stopping)middot-middot LFM

Comparison achievable generalization ability

v+=025 v-=081v+=v-=10

― trivial classification

Dynamics and generalization of LVQ Birmingham 09-12- 05

Vector Quantization

competitive learning 1-μs

μμS

μS

1-μs

μs dd

N

ηwξww

ws winner

class membership is unknown

or identical for all data

numerical integration for ws(0)asymp0 ( p+=02 ℓ=10 =12 )

εg

α

VQ

LVQ+

LVQ1

αα

R++

R+-

R-+

R--

100 200 3000

0

10system is invariant under

exchange of the prototypes

weakly repulsive fixed

points

Dynamics and generalization of LVQ Birmingham 09-12- 05

interpretations

- VQ unsupervised learning unlabelled data

- LVQ two prototypes of the same class identical labels

- LVQ different classes but labels are not used in training

εg

p+

asymptotics (0 )

p+asymp0

p-asymp1

- low quantization error- high gen error εg

Dynamics and generalization of LVQ Birmingham 09-12- 05

Summary

bulla model scenario of LVQ training

two clusters two prototypes

dynamics of online training

bullcomparison of algorithms (within the model)

LVQ 1 original formulation of LVQ

with close to optimal asymptotic generalization

LVQ 21 intuitive extension creates instability

trivial (stationary) classification

+ stopping potentially good performance

practical difficulties depends on initialization

LFM crisp limit of Soft Robust LVQ stable behavior

far from optimal generalization

VQ description of in-class competition

Dynamics and generalization of LVQ Birmingham 09-12- 05

Outlook

bullSelf-Organizing Maps (SOM)

neighborhood preserving SOM Neural Gas (distance rank based)

bull Generalized Relevance LVQ [eg Hammer amp Villmann]

adaptive metrics eg distance measure

N

i

iii w1

2)( sλ ξξwd

training

bullapplications

bull multi-class multi-prototype problems

bull optimized procedures learning rate schedules

variational approach Bayes optimal on-line

Dynamics and generalization of LVQ Birmingham 09-12- 05

quantization error

μj

μk

K

jk

P

1μj

μK

1jVQ ddΘ

2 wξH

μjdprototypes data wj is the winner

here

Euclidean distance

aim faithful representation (in general ne clustering )

Result depends on - the number of prototype vectors - the distance measure metric used

Dynamics and generalization of LVQ Birmingham 09-12- 05

bull identify the closest prototype ie the so-called winner

bull initialize prototype vectors for different classes

bull present a single example

bull move the winner - closer towards the data (same class)

- away from the data (different class)

classification

assignment of a vector to the class of the closest

prototype w

aim generalization ability

classification of novel data

after learning from examples

∙ identification of prototype vectors from labelled example data

∙ distance based classification (eg Euclidean Manhattan hellip)

basic heuristic LVQ scheme LVQ1 [Kohonen]

piecewise linear decision boundaries

Learning Vector Quantization

(t)wξ(t)w1tw (t)w

η

N-dimfeature space

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ algorithms

- frequently applied in a variety of practical problems
- plausible, intuitive, flexible
- fast, easy to implement
- often based on heuristic arguments or cost functions with unclear relation to generalization
- limited theoretical understanding of
  - dynamics and convergence properties
  - achievable generalization ability

here: analysis of LVQ algorithms w.r.t.
- dynamics of the learning process
- performance, i.e. generalization ability
- typical properties in a model situation

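Model situation: two clusters of N-dimensional data

random vectors $\boldsymbol{\xi} \in \mathbb{R}^N$ according to a mixture of two Gaussians:

$$ P(\boldsymbol{\xi}) = \sum_{\sigma=\pm 1} p_\sigma \, P(\boldsymbol{\xi}\,|\,\sigma), \qquad P(\boldsymbol{\xi}\,|\,\sigma) = \frac{1}{(2\pi v_\sigma)^{N/2}} \exp\!\left[-\frac{1}{2 v_\sigma}\left(\boldsymbol{\xi} - \ell\, \mathbf{B}_\sigma\right)^2\right] $$

orthonormal center vectors: $\mathbf{B}_+, \mathbf{B}_- \in \mathbb{R}^N$, $(\mathbf{B}_\pm)^2 = 1$, $\mathbf{B}_+ \cdot \mathbf{B}_- = 0$

prior weights of classes $p_+$, $p_-$: $p_+ + p_- = 1$

cluster distance ∝ ℓ

independent components with mean $\langle \xi_j \rangle_\sigma = \ell\, (\mathbf{B}_\sigma)_j$ and variance $\langle \xi_j^2 \rangle_\sigma - \langle \xi_j \rangle_\sigma^2 = v_\sigma$

[Figure: two clusters in $\mathbb{R}^N$, centered at $\ell\,\mathbf{B}_+$ (weight $p_+$) and $\ell\,\mathbf{B}_-$ (weight $p_-$)]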
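To make the model situation concrete, the following sketch draws labelled examples from the two-cluster mixture. It assumes, purely for illustration, that the orthonormal center vectors are the first two unit vectors of the coordinate system; the function name `sample_example` and its argument order are likewise hypothetical.

```python
import random


def sample_example(n, ell, p_plus, v_plus, v_minus, rng):
    """Draw one labelled example (xi, sigma) from the two-cluster mixture.

    Assumption for this sketch: B_+ is the first unit vector and B_- the
    second one, so the cluster centers are ell * e_1 and ell * e_2.
    """
    if n < 2:
        raise ValueError("need at least two dimensions for two orthonormal centers")
    sigma = 1 if rng.random() < p_plus else -1      # class label with prior p_sigma
    variance = v_plus if sigma == 1 else v_minus
    center_index = 0 if sigma == 1 else 1
    # Independent components with variance v_sigma around the cluster center.
    xi = [rng.gauss(0.0, variance ** 0.5) for _ in range(n)]
    xi[center_index] += ell
    return xi, sigma


rng = random.Random(0)
xi, sigma = sample_example(200, 1.0, 0.4, 1.44, 0.64, rng)
print(len(xi), sigma)
```

Passing the random number generator explicitly keeps repeated runs reproducible, which is convenient when averaging over independent training runs.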

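high-dimensional data (formally: N → ∞)

example: $\boldsymbol{\xi}^\mu \in \mathbb{R}^N$, N = 200, ℓ = 1, $p_+$ = 0.4, $v_+$ = 1.44, $v_-$ = 0.64 (240 and 160 displayed examples in the two clusters)

[Figure: projections into the plane of center vectors $\mathbf{B}_+$, $\mathbf{B}_-$: $y_\pm = \mathbf{B}_\pm \cdot \boldsymbol{\xi}^\mu$]

[Figure: projections on two independent random directions $\mathbf{w}_{1,2}$: $x_1 = \mathbf{w}_1 \cdot \boldsymbol{\xi}^\mu$, $x_2 = \mathbf{w}_2 \cdot \boldsymbol{\xi}^\mu$]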

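Dynamics of on-line training

sequence of new, independent random examples $\{\boldsymbol{\xi}^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$

drawn according to $p_{\sigma^\mu} P(\boldsymbol{\xi}^\mu\,|\,\sigma^\mu)$

update of two prototype vectors $\mathbf{w}_+$, $\mathbf{w}_-$:

$$ \mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, f_s\!\left[d_+^\mu, d_-^\mu, \sigma^\mu\right] \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right), \qquad d_s^\mu = \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right)^2, \qquad s, \sigma^\mu \in \{+1, -1\} $$

- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update etc.
- $(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1})$: change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm

$$ f_s = \Theta\!\left(d_{-s}^\mu - d_s^\mu\right) s\, \sigma^\mu $$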
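The generic on-line update with an exchangeable modulation function can be sketched as follows. The prototypes are stored in a dictionary keyed by s = +1 and s = -1; this layout, the function names, and the convention that the step function is 1 only for strictly positive arguments are assumptions of the sketch, not part of the slides.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lvq1_modulation(s, d, sigma):
    """f_s = Theta(d_{-s} - d_s) * s * sigma (winner takes all)."""
    is_winner = 1 if d[-s] - d[s] > 0 else 0
    return is_winner * s * sigma


def online_step(w, xi, sigma, eta, modulation):
    """One on-line step: w_s <- w_s + (eta/N) * f_s * (xi - w_s) for s = +1, -1."""
    n = len(xi)
    # Distances are evaluated with the prototypes before the update.
    d = {s: sq_dist(xi, w[s]) for s in (1, -1)}
    return {
        s: [w_j + (eta / n) * modulation(s, d, sigma) * (x_j - w_j)
            for w_j, x_j in zip(w[s], xi)]
        for s in (1, -1)
    }


w = {1: [1.0, 0.0], -1: [0.0, 1.0]}
print(online_step(w, [2.0, 0.0], 1, 1.0, lvq1_modulation))   # winner attracted
print(online_step(w, [2.0, 0.0], -1, 1.0, lvq1_modulation))  # winner repelled
```

Passing the modulation function as an argument mirrors the role of f_s in the update equation: other training prescriptions only need a different modulation, while the step itself stays unchanged.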

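Mathematical analysis of the learning dynamics

algorithm → recursions

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$$ R_{s\sigma}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{B}_\sigma, \qquad Q_{st}^\mu = \mathbf{w}_s^\mu \cdot \mathbf{w}_t^\mu, \qquad s, t, \sigma \in \{+1, -1\} $$

- $R_{s\sigma}$: projections into the $(\mathbf{B}_+, \mathbf{B}_-)$-plane
- $Q_{st}$: length and relative position of prototypes

2. average over the current example

random vector $\boldsymbol{\xi}^\mu$ according to $P(\boldsymbol{\xi}^\mu\,|\,\sigma)$, avg. length $\langle \boldsymbol{\xi}^2 \rangle_\sigma = v_\sigma N + \ell^2$

in the thermodynamic limit N → ∞, the projections

$$ x_s^\mu = \mathbf{w}_s^{\mu-1} \cdot \boldsymbol{\xi}^\mu, \qquad y_\sigma^\mu = \mathbf{B}_\sigma \cdot \boldsymbol{\xi}^\mu $$

are correlated Gaussian random quantities, completely specified in terms of first and second moments (w/o indices μ):

$$ \langle x_s \rangle_\sigma = \sum_{j=1}^{N} (\mathbf{w}_s)_j \langle \xi_j \rangle_\sigma = \ell\, R_{s\sigma}, \qquad \langle y_\tau \rangle_\sigma = \ell\, \delta_{\tau\sigma} $$

$$ \langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma Q_{st} $$

$$ \langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma R_{s\tau}, \qquad \langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \delta_{\rho\tau} $$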
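The reduction from 2N prototype components to seven characteristic quantities can be written down directly. In this sketch the prototypes and center vectors are dictionaries keyed by ±1, which is an assumption made for readability; the function names are hypothetical as well.

```python
def dot(a, b):
    """Scalar product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def order_parameters(w, centers):
    """Return R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t."""
    signs = (1, -1)
    R = {(s, sigma): dot(w[s], centers[sigma]) for s in signs for sigma in signs}
    Q = {(s, t): dot(w[s], w[t]) for s in signs for t in signs}
    return R, Q


w = {1: [1.0, 2.0, 0.0], -1: [0.0, 1.0, 3.0]}
centers = {1: [1.0, 0.0, 0.0], -1: [0.0, 1.0, 0.0]}
R, Q = order_parameters(w, centers)
print(R)
print(Q)
```

The dictionaries hold eight entries, but Q is symmetric, so four projections R and three independent overlaps Q remain, in line with the count of seven on the slide.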

averaged recursions, closed in $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$, with $\langle \ldots \rangle = \sum_{\sigma=\pm 1} p_\sigma \langle \ldots \rangle_\sigma$

3. self-averaging property of characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- depend on the random sequence of example data
- their fluctuations vanish with N → ∞

learning dynamics is completely described in terms of averages

[Figure: $R_{++}$ (α = 10), mean and variance, versus 1/N; computer simulations (LVQ1)]

- mean results approach theoretical prediction
- variance vanishes as N → ∞

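4. continuous learning time

$$ \alpha = \frac{\mu}{N} $$

(# of examples, i.e. # of learning steps, per degree of freedom)

stochastic recursions → deterministic ODE

integration yields evolution of projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

probability for misclassification of a novel example:

$$ \varepsilon_g = p_+ \left\langle \Theta(d_+ - d_-) \right\rangle_+ + p_- \left\langle \Theta(d_- - d_+) \right\rangle_- $$

$$ \varepsilon_g = p_+ \, \Phi\!\left[ \frac{Q_{++} - Q_{--} - 2\ell\,[R_{++} - R_{-+}]}{2 \sqrt{v_+}\, [Q_{++} - 2 Q_{+-} + Q_{--}]^{1/2}} \right] + p_- \, \Phi\!\left[ \frac{Q_{--} - Q_{++} - 2\ell\,[R_{--} - R_{+-}]}{2 \sqrt{v_-}\, [Q_{++} - 2 Q_{+-} + Q_{--}]^{1/2}} \right] $$

with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}}\, e^{-x^2/2}$

5. learning curve

generalization error $\varepsilon_g(\alpha)$ after training with αN examples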
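Given the characteristic quantities, the generalization error can be evaluated numerically. The sketch below assumes the closed form in which each class contributes a Gaussian integral Φ of a ratio built from R and Q; the dictionaries keyed by (s, σ) and (s, t) and the function names are assumptions for illustration. The expression is undefined if both prototypes coincide, because the denominator vanishes.

```python
import math


def phi(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def generalization_error(R, Q, p_plus, v_plus, v_minus, ell):
    """eps_g = sum over sigma of p_sigma * Phi(argument_sigma)."""
    p = {1: p_plus, -1: 1.0 - p_plus}
    v = {1: v_plus, -1: v_minus}
    # Length of the difference vector w_+ - w_- (must be nonzero).
    spread = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])
    eps = 0.0
    for sigma in (1, -1):
        numerator = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
                     - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p[sigma] * phi(numerator / (2.0 * math.sqrt(v[sigma]) * spread))
    return eps


# Prototypes placed exactly at the cluster centers, ell = 1, equal variances.
R = {(1, 1): 1.0, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): 1.0}
Q = {(1, 1): 1.0, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): 1.0}
print(generalization_error(R, Q, 0.5, 1.0, 1.0, 1.0))
```

In this symmetric test case both classes contribute Φ(-1/√2), the error of the mid-plane between two unit-variance clusters whose centers are a distance √2 apart.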

LVQ1: The winner takes it all

$$ \mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-s}^\mu - d_s^\mu\right) s\,\sigma^\mu \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right) $$

$\Theta(\ldots)$ selects the winner $\mathbf{w}_s$; the factor $s\,\sigma^\mu = \pm 1$

only the winner is updated, according to the class label

initialization $\mathbf{w}_s(0) \approx 0$

theory and simulation (N = 100): $p_+$ = 0.8, $v_+$ = 4, $v_-$ = 9, ℓ = 2.0, η = 1.0, averaged over 100 indep. runs

[Figure: characteristic quantities $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\sigma}$ versus α, with $Q_{st} = \mathbf{w}_s \cdot \mathbf{w}_t$ and $R_{s\sigma} = \mathbf{w}_s \cdot \mathbf{B}_\sigma$]

[Figure: self-averaging property, $R_{++}$ (α = 10), mean and variances, versus 1/N]

Trajectories in the $(\mathbf{B}_+, \mathbf{B}_-)$-plane

[Figure: trajectories of $\mathbf{w}_+$ and $\mathbf{w}_-$ in the plane spanned by $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell\,\mathbf{B}_+$ and $\ell\,\mathbf{B}_-$; (•) marks α = 20, 40, …, 140; further markings: optimal decision boundary, asymptotic position]

Learning curve

[Figure: $\varepsilon_g$ versus α for η = 2.0, 1.0, 0.2; $p_+$ = 0.2, ℓ = 1.0, $v_+ = v_-$ = 1.0]

- suboptimal, non-monotonic behavior for small η
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with η
- well-defined asymptotics: η → 0, α → ∞, (ηα) → ∞

[Figure: $\varepsilon_g$ versus η]

achievable generalization error

[Figure: $\varepsilon_g$ versus $p_+$, two panels: $v_+ = v_-$ = 1.0 and $v_+$ = 0.25, $v_-$ = 0.81; curves: best linear boundary, LVQ1]

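LVQ 2.1 [Kohonen]

here: update correct and wrong winner

$$ \mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, s\,\sigma^\mu \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right) $$

[Figure: characteristic quantities R, Q versus α (α from 0 to 10, vertical axis from -6 to 6); theory and simulation (N = 100), $p_+$ = 0.8, ℓ = 1, $v_+ = v_-$ = 1, η = 0.5, averages over 100 independent runs. The accompanying formulas, which state how the R and Q behave with growing α and which combinations remain finite, did not survive extraction.]

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for α → ∞: $\varepsilon_g = \min\{p_+, p_-\}$

[Figure: plane spanned by $R_{S+}$ and $R_{S-}$]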
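A single LVQ 2.1 step in the two-prototype setting can be sketched as below: both prototypes move in every step, the one carrying the label of the example towards it and the other one away. The dictionary layout keyed by ±1 and the function name are assumptions of this sketch; window rules and stopping criteria are left out.

```python
def lvq21_step(w, xi, sigma, eta):
    """w_s <- w_s + (eta/N) * s * sigma * (xi - w_s) for both s = +1 and s = -1."""
    n = len(xi)
    return {
        s: [w_j + (eta / n) * s * sigma * (x_j - w_j) for w_j, x_j in zip(w[s], xi)]
        for s in (1, -1)
    }


w = {1: [1.0, 0.0], -1: [0.0, 1.0]}
updated = lvq21_step(w, [2.0, 0.0], 1, 1.0)
print(updated[1])   # correct prototype is attracted
print(updated[-1])  # wrong prototype is repelled
```

Because the wrong prototype is repelled in every step, regardless of how far away it already is, repeated application illustrates the source of the instability named on the slide.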

suggested strategy: selection of data in a window close to the current decision boundary

→ slows down the repulsion, but the system remains unstable

Early stopping: end training process at minimal $\varepsilon_g$ (idealized)

[Figure: $\varepsilon_g$ versus α for η = 2.0, 1.0, 0.5]

- pronounced minimum in $\varepsilon_g(\alpha)$, depends on initialization and cluster geometry
- here: lowest minimum value reached for η → 0

[Figure: $\varepsilon_g$ versus $p_+$ for $v_+$ = 0.25, $v_-$ = 0.81; curves: LVQ1, early stopping]

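Learning From Mistakes (LFM)

$$ \mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{\sigma^\mu}^\mu - d_{-\sigma^\mu}^\mu\right) s\,\sigma^\mu \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right) $$

LVQ 2.1 update only if the current classification is wrong

crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory in the plane spanned by $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell\,\mathbf{B}_+$ and $\ell\,\mathbf{B}_-$]

$p_+$ = 0.8, ℓ = 3.0, $v_+$ = 4.0, $v_-$ = 9.0

Learning curves

[Figure: $\varepsilon_g$ versus α for η = 2.0, 1.0, 0.5]

η-independent asymptotic $\varepsilon_g$

$p_+$ = 0.8, ℓ = 1.2, $v_+ = v_-$ = 1.0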
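The LFM prescription can be sketched as an LVQ 2.1 step that is only carried out when the example is currently misclassified, i.e. when the prototype with the correct label is farther away than the other one. The dictionary layout keyed by ±1, the function names, and the strict inequality used for the step function are assumptions of this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lfm_step(w, xi, sigma, eta):
    """Update both prototypes only if xi is misclassified by the current w."""
    n = len(xi)
    # Theta(d_sigma - d_{-sigma}): 1 if the correct prototype is farther away.
    misclassified = 1 if sq_dist(xi, w[sigma]) - sq_dist(xi, w[-sigma]) > 0 else 0
    return {
        s: [w_j + (eta / n) * misclassified * s * sigma * (x_j - w_j)
            for w_j, x_j in zip(w[s], xi)]
        for s in (1, -1)
    }


w = {1: [1.0, 0.0], -1: [0.0, 1.0]}
print(lfm_step(w, [2.0, 0.0], 1, 1.0))   # correctly classified: no change
print(lfm_step(w, [2.0, 0.0], -1, 1.0))  # misclassified: both prototypes move
```

Since correctly classified examples leave the prototypes untouched, the repulsion stops once an example falls on the right side of the decision boundary.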

Comparison: achievable generalization ability

[Figure: $\varepsilon_g$ versus $p_+$, two panels: equal cluster variances ($v_+ = v_-$ = 1.0) and unequal variances ($v_+$ = 0.25, $v_-$ = 0.81)]

curves:
- best linear boundary
- LVQ1
- LVQ 2.1 (early stopping)
- LFM
- trivial classification

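Vector Quantization

competitive learning:

$$ \mathbf{w}_s^\mu = \mathbf{w}_s^{\mu-1} + \frac{\eta}{N}\, \Theta\!\left(d_{-s}^\mu - d_s^\mu\right) \left(\boldsymbol{\xi}^\mu - \mathbf{w}_s^{\mu-1}\right) $$

$\mathbf{w}_s$: winner

class membership is unknown or identical for all data

numerical integration for $\mathbf{w}_s(0) \approx 0$ ($p_+$ = 0.2, ℓ = 1.0, η = 1.2)

[Figure: $\varepsilon_g$ versus α; curves: VQ, LVQ+, LVQ1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus α (α from 0 to 300)]

system is invariant under exchange of the prototypes → weakly repulsive fixed points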
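For comparison with the supervised prescriptions, the unsupervised competitive learning step can be sketched as follows: the winner is always moved towards the example and no label enters the update. The dictionary layout keyed by ±1 and the function names are assumptions of this sketch; an exact tie leaves both prototypes unchanged here.

```python
def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def vq_step(w, xi, eta):
    """w_s <- w_s + (eta/N) * Theta(d_{-s} - d_s) * (xi - w_s); labels are not used."""
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (1, -1)}
    updated = {}
    for s in (1, -1):
        is_winner = 1 if d[-s] - d[s] > 0 else 0
        updated[s] = [w_j + (eta / n) * is_winner * (x_j - w_j)
                      for w_j, x_j in zip(w[s], xi)]
    return updated


w = {1: [1.0, 0.0], -1: [0.0, 1.0]}
print(vq_step(w, [2.0, 0.0], 1.0))
```

Swapping the two dictionary entries before the step gives the same pair of prototypes afterwards, which reflects the invariance under exchange of the prototypes.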

interpretations:

- VQ, unsupervised learning: unlabelled data
- LVQ, two prototypes of the same class: identical labels
- LVQ, different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$, asymptotics (η → 0)]

$p_+ \approx 0$, $p_- \approx 1$:
- low quantization error
- high gen. error $\varepsilon_g$

Summary

• a model scenario of LVQ training

  two clusters, two prototypes

  dynamics of online training

• comparison of algorithms (within the model)

  LVQ 1: original formulation of LVQ, with close to optimal asymptotic generalization

  LVQ 2.1: intuitive extension, creates instability, trivial (stationary) classification

    + stopping: potentially good performance; practical difficulties, depends on initialization

  LFM: crisp limit of Soft Robust LVQ, stable behavior, far from optimal generalization

  VQ: description of in-class competition

Model situation two clusters of N-dimensional data

random vectors isin ℝN according to σ)P(p )P(1σ

σ ξξ

σN2

σ

- v 2

1exp

v 2π

1σ)P( Βξξ mixture of two Gaussians

orthonormal center vectors

B+ B- isin ℝN ( B )2 =1 B+ B- =0

prior weights of classes p+ p-

p+ + p- = 1

B+

B-

(p+)

(p-)

cluster distance prop ℓ ℓ

jj Bσσξ

σσσvξξ

22jj

indep components with

and variance

ℝN

Dynamics and generalization of LVQ Birmingham 09-12- 05

high-dimensional data (formally Ninfin)

ξμ isinℝN N=200 ℓ=1 p+=04 v+=144 v-=064μ

B

( 240)( 160)

projections into the plane of center vectors B+ B-

μ By ξ

μ 2

2xξ

w

projections on two independent random directions w12

μ 11x ξw

Dynamics and generalization of LVQ Birmingham 09-12- 05

Dynamics of on-line training

sequence of new independent random examples 123μσμμ ξ

drawn according to μμσ σPp μ ξ

learning ratestep size

competitiondirection ofupdate etc

change of prototypetowards or away from the current data

example

LVQ1 original formulation [Kohonen]

Winner-Takes-All (WTA) algorithm

μs

μs

μs d d σS f

1-μs

μμμs-

μss

1-μs

μs σSddf

N

ηwξww 21

μs

μμs

μ

d

1σS

update of two prototype vectors w+ w-

Dynamics and generalization of LVQ Birmingham 09-12- 05

algorithm recursions

Mathematical analysis of the learning dynamics

11 σtsμt

μs

μstσ

μs

μsσ QBR www

projections into the (B+ B- )-plane

length and relativeposition of prototypes

1 description in terms of a few characteristic quantitities

( here ℝ2N ℝ7 )

2 average over the current example

random vector according to avg lengthσ)|P( μξ 22 vN σσ

ξ

in the thermodynamic limit N μμ

μ1-μs

μs

By

wx

ξ

ξ

correlated Gaussian random quantities

completely specified in terms of first and second moments (wo indices μ)

sσσ

N

1jjsσs R x

jw stσσtσsσt s Qv xx- xx

sσσσsσ s Rv yx- yx σσσσv yy- yy

sσσ y

Dynamics and generalization of LVQ Birmingham 09-12- 05

averaged recursions closed in p σ1σ

σ

μsσ

μst R Q

- depend on the random sequence of example data

- their fluctuations vanish with N

learning dynamics is completely described in terms of averages

3 self-averaging property of characteristic quantities

μsσ

μst R Q

1N

(mean and variance)

R++ (α=10) computer simulations (LVQ1)

- mean results approach theoretical prediction- variance vanishes as N

Dynamics and generalization of LVQ Birmingham 09-12- 05

4 continuous learning time

N

μ α of examples

of learning stepsper degree of freedom

) α (R ) α (Q sσst integration yields evolution of projections

stochastic recursions deterministic ODE

probability for misclassification of a novel example

ddpddp gε

]2

]

]2

][

2

1

2

1

QQ[Qv

R[R2QQ

QQ[Q v

RR2QQpp

5 learning curve

generalization error εg(α) after training with α N examples

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

tsst

σssσ

Q

BR

ww

w

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ 21 [Kohonen] here update correct and wrong winner

1-μs

μ1-μs

μs Sσ

N

ηwξww

αQQRR

Q R R

with

finite remain

Q R R

R Q

Q R

α 102 4 86

6-

0

6theory and simulation (N=100)

p+=08 ℓ=1 v+=v-=1 =05

averages over 100 independent runs

problem instability of the algorithm

due to repulsion of wrong prototypes

trivial classification for αinfin

εg = min p+p- RS+

RS-

Dynamics and generalization of LVQ Birmingham 09-12- 05

suggested strategy

selection of data in a window close to the current decision boundary

slows down the repulsion system remains instable

Early stopping end training process at minimal εg (idealized)

εg

η= 20 10 05

η

- pronounced minimum in εg (α) depends on initialization and cluster geometry

- here lowest minimum value reached for η0

v+ =025 v-=081εg

p+

― LVQ1__ early stopping

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning From Mistakes (LFM)

1-μs

μμμσ-

μσ

1-μs

μs Sσdd

N

ημμ wξww

LVQ21 updateonly if the current classification is wrong

crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]

projected trajetory

ℓ B-

ℓ B+

RS+

RS-

εg

p+=08 ℓ=30

v+=40 v-=90

η= 20 10 05

Learning curves

η-independent asymptotic εg p+=08 ℓ= 12 v+=v-=10

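High-dimensional data (formally $N \to \infty$)

Example: $\xi^\mu \in \mathbb{R}^N$ with $N = 200$, $\ell = 1$, $p_+ = 0.4$, $v_+ = 1.44$, $v_- = 0.64$

[Figure panel: projections into the plane of the center vectors $B_+, B_-$, i.e. $y_\pm^\mu = B_\pm \cdot \xi^\mu$; legend counts: (240), (160) examples]

[Figure panel: projections on two independent random directions $w_1, w_2$, i.e. $x_{1,2}^\mu = w_{1,2} \cdot \xi^\mu$]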
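The contrast between the two projections can be reproduced numerically. The following is a minimal sketch under stated assumptions, not the code behind the slide: it takes $B_+$ and $B_-$ to be the first two unit vectors, reuses the parameter values quoted above, and the helper names (`sample_example`, `class_mean`) are hypothetical.

```python
import random

def sample_example(n, ell, p_plus, v_plus, v_minus, rng):
    """Draw (xi, sigma) from the two-cluster mixture.

    Assumption for illustration: B+ = e_0 and B- = e_1 (orthonormal centre directions).
    """
    sigma = 1 if rng.random() < p_plus else -1
    std = (v_plus if sigma == 1 else v_minus) ** 0.5
    xi = [rng.gauss(0.0, std) for _ in range(n)]
    xi[0 if sigma == 1 else 1] += ell  # shift the cluster centre to ell * B_sigma
    return xi, sigma

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def class_mean(values, labels, label):
    picked = [v for v, s in zip(values, labels) if s == label]
    return sum(picked) / len(picked)

rng = random.Random(0)
n, ell = 200, 1.0
data = [sample_example(n, ell, 0.4, 1.44, 0.64, rng) for _ in range(400)]
labels = [s for _, s in data]

# Projection onto the centre direction B+: y_+ = B+ . xi
y_plus = [xi[0] for xi, _ in data]

# Projection on a random direction w1 of length ~1, drawn independently of B+ and B-
w1 = [rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]
x1 = [dot(w1, xi) for xi, _ in data]

# The cluster structure is visible along B+ but (almost) invisible along w1
gap_y = class_mean(y_plus, labels, 1) - class_mean(y_plus, labels, -1)
gap_x = class_mean(x1, labels, 1) - class_mean(x1, labels, -1)
print(f"class-mean gap along B+: {gap_y:.2f}, along random w1: {gap_x:.2f}")
```

Along $B_+$ the class means are separated by roughly $\ell$, whereas along a random direction the separation is only of order $\ell / \sqrt{N}$.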

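Dynamics of on-line training

sequence of new, independent random examples $\{\xi^\mu, \sigma^\mu\}$, $\mu = 1, 2, 3, \ldots$

drawn according to $p_{\sigma^\mu} P(\xi^\mu \mid \sigma^\mu)$

update of two prototype vectors $w_+, w_-$:

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, f_s[d_+^\mu, d_-^\mu, \sigma^\mu, \ldots] \, (\xi^\mu - w_s^{\mu-1})$, with $d_s^\mu = (\xi^\mu - w_s^{\mu-1})^2$ and $s, \sigma^\mu \in \{+1, -1\}$

- $\eta$: learning rate, step size
- $f_s[\ldots]$: competition, direction of update etc.
- $(\xi^\mu - w_s^{\mu-1})$: change of prototype towards or away from the current data

example: LVQ1, original formulation [Kohonen], Winner-Takes-All (WTA) algorithm

$f_s[d_+^\mu, d_-^\mu, \sigma^\mu] = \Theta(d_{-s}^\mu - d_s^\mu) \, s \, \sigma^\mu$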
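The generic update and its LVQ1 instance can be sketched in a few lines. This is an illustrative sketch only: the function names (`online_step`, `lvq1`) and the dictionary layout for the two prototypes are assumptions made here, and exact ties in the distances (a measure-zero event) are treated as "no winner".

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def online_step(w, xi, sigma, eta, modulation):
    """One on-line update of both prototypes.

    w: dict {+1: list, -1: list} of prototype vectors, xi: example, sigma: its label (+1/-1).
    modulation(s, d, sigma) plays the role of f_s[d_+, d_-, sigma].
    """
    n = len(xi)
    # distances are computed from the prototypes *before* the update
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    return {
        s: [wi + (eta / n) * modulation(s, d, sigma) * (x - wi)
            for wi, x in zip(w[s], xi)]
        for s in (+1, -1)
    }

def lvq1(s, d, sigma):
    """LVQ1: only the winner moves; towards the example if labels agree, away otherwise."""
    return s * sigma if d[s] < d[-s] else 0

w = {+1: [1.0, 0.0], -1: [-1.0, 0.0]}
attracted = online_step(w, [2.0, 0.0], +1, 1.0, lvq1)  # winner w_+ has the correct label
repelled = online_step(w, [2.0, 0.0], -1, 1.0, lvq1)   # winner w_+ has the wrong label
print(attracted[+1], repelled[+1])
```

Passing the modulation function as an argument keeps the step itself identical for all algorithms discussed in the talk; only `modulation` changes.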

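Mathematical analysis of the learning dynamics

1. description in terms of a few characteristic quantities (here: $\mathbb{R}^{2N} \to \mathbb{R}^7$)

$R_{s\sigma}^\mu = w_s^\mu \cdot B_\sigma$: projections into the $(B_+, B_-)$-plane

$Q_{st}^\mu = w_s^\mu \cdot w_t^\mu$: length and relative position of prototypes

with $s, t, \sigma \in \{+1, -1\}$

algorithm → recursions for $R_{s\sigma}^\mu$ and $Q_{st}^\mu$

2. average over the current example

random vector $\xi^\mu$ according to $P(\xi^\mu \mid \sigma)$, avg. length $\langle \xi^2 \rangle_\sigma = v_\sigma N + \ell^2$

in the thermodynamic limit $N \to \infty$, the projections $x_s^\mu = w_s^{\mu-1} \cdot \xi^\mu$ and $y_\sigma^\mu = B_\sigma \cdot \xi^\mu$ are correlated Gaussian random quantities,

completely specified in terms of first and second moments (w/o indices $\mu$):

$\langle x_s \rangle_\sigma = \ell \, R_{s\sigma}$, $\quad \langle y_\tau \rangle_\sigma = \ell \, \delta_{\tau\sigma}$

$\langle x_s x_t \rangle_\sigma - \langle x_s \rangle_\sigma \langle x_t \rangle_\sigma = v_\sigma Q_{st}$

$\langle x_s y_\tau \rangle_\sigma - \langle x_s \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma R_{s\tau}$

$\langle y_\rho y_\tau \rangle_\sigma - \langle y_\rho \rangle_\sigma \langle y_\tau \rangle_\sigma = v_\sigma \delta_{\rho\tau}$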
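The first two conditional moments of $x_s$ can be checked by sampling. The sketch below is an illustration with assumed values ($N = 50$, $\ell = 2$, $v_\sigma = 4$, $B_+ = e_0$) and a fixed prototype `w` drawn at random; none of these choices come from the slides.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

rng = random.Random(1)
n, ell, v_sigma = 50, 2.0, 4.0      # illustrative values; examples from class sigma = +1 only
center_index = 0                     # assumption: B_+ = e_0

# a fixed prototype w with Q = w.w of order 1, and R = w.B_+
w = [rng.gauss(0.0, 1.0) / n ** 0.5 for _ in range(n)]
Q = dot(w, w)
R = w[center_index]

# sample x = w.xi for many independent examples from cluster sigma = +1
xs = []
for _ in range(4000):
    xi = [rng.gauss(0.0, v_sigma ** 0.5) for _ in range(n)]
    xi[center_index] += ell
    xs.append(dot(w, xi))

mean_x = sum(xs) / len(xs)
var_x = sum((x - mean_x) ** 2 for x in xs) / (len(xs) - 1)
print(f"<x>   = {mean_x:.3f}   vs  ell * R = {ell * R:.3f}")
print(f"var x = {var_x:.3f}   vs  v * Q   = {v_sigma * Q:.3f}")
```

The check relies on `w` being independent of the sampled example, which is exactly the situation in on-line learning, where $w_s^{\mu-1}$ has not yet seen $\xi^\mu$.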

averaged recursions, closed in $\{R_{s\sigma}^\mu, Q_{st}^\mu\}$, with $\langle \cdots \rangle = \sum_{\sigma = \pm 1} p_\sigma \langle \cdots \rangle_\sigma$

3. self-averaging property of the characteristic quantities $R_{s\sigma}^\mu$, $Q_{st}^\mu$

- they depend on the random sequence of example data
- their fluctuations vanish with $N \to \infty$

the learning dynamics is completely described in terms of averages

[Figure: mean and variance of $R_{++}$ ($\alpha = 10$) vs. $1/N$, computer simulations (LVQ1)]

- mean results approach the theoretical prediction
- variance vanishes as $N \to \infty$
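The shrinking fluctuations can be observed in a small experiment. The sketch below is a rough illustration under assumptions made here (equal priors, unit variances, $\ell = 1$, $\eta = 1$, $\alpha = 2$, only 24 runs per system size, $B_+ = e_0$, $B_- = e_1$); it is not the simulation setup of the slide, and the function name `lvq1_r_plus_plus` is hypothetical.

```python
import random

def lvq1_r_plus_plus(n, alpha, eta, ell, rng):
    """Run LVQ1 for alpha*n steps and return R_++ = w_+ . B_+ (with B_+ = e_0, B_- = e_1)."""
    # prototypes start close to, but not exactly at, the origin
    w = {s: [rng.gauss(0.0, 1e-3) for _ in range(n)] for s in (+1, -1)}
    for _ in range(int(alpha * n)):
        sigma = rng.choice((+1, -1))             # equal priors, unit variances (assumed)
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi[0 if sigma == 1 else 1] += ell
        d = {s: sum((x - wi) ** 2 for x, wi in zip(xi, w[s])) for s in (+1, -1)}
        s = +1 if d[+1] < d[-1] else -1          # the winner
        step = (eta / n) * s * sigma
        w[s] = [wi + step * (x - wi) for wi, x in zip(w[s], xi)]
    return w[+1][0]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

rng = random.Random(2)
runs = 24
var_small_n = variance([lvq1_r_plus_plus(10, 2.0, 1.0, 1.0, rng) for _ in range(runs)])
var_large_n = variance([lvq1_r_plus_plus(90, 2.0, 1.0, 1.0, rng) for _ in range(runs)])
print(f"Var[R++] at N=10: {var_small_n:.4f}, at N=90: {var_large_n:.4f}")
```

With so few runs the variance estimates are themselves noisy; the point is only the trend towards smaller run-to-run fluctuations at larger $N$.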

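4. continuous learning time

$\alpha = \mu / N$: number of examples (number of learning steps) per degree of freedom

stochastic recursions → deterministic ODE

integration yields the evolution of the projections $R_{s\sigma}(\alpha)$, $Q_{st}(\alpha)$

5. learning curve

probability for misclassification of a novel example:

$\varepsilon_g = p_+ \langle \Theta(d_+ - d_-) \rangle_+ + p_- \langle \Theta(d_- - d_+) \rangle_-$

$\varepsilon_g = \sum_{\sigma = \pm 1} p_\sigma \, \Phi\!\left[ \frac{Q_{\sigma\sigma} - Q_{-\sigma-\sigma} - 2\ell \, [R_{\sigma\sigma} - R_{-\sigma\sigma}]}{2 \sqrt{v_\sigma} \, \sqrt{Q_{++} - 2 Q_{+-} + Q_{--}}} \right]$, with $\Phi(z) = \int_{-\infty}^{z} \frac{dx}{\sqrt{2\pi}} \, e^{-x^2/2}$

generalization error $\varepsilon_g(\alpha)$ after training with $\alpha N$ examples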
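The generalization error is a closed-form function of the seven characteristic quantities, so it is easy to evaluate. The sketch below assumes the $\Phi$-form of $\varepsilon_g$ that follows from the Gaussian statistics of $x_\pm$; the function name and the dictionary keys `(s, sigma)` are choices made for this illustration.

```python
import math

def phi(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def generalization_error(R, Q, ell, p_plus, v_plus, v_minus):
    """eps_g from the characteristic quantities.

    R[(s, sigma)] = w_s . B_sigma and Q[(s, t)] = w_s . w_t, with s, t, sigma in {+1, -1}.
    Requires w_+ != w_-, otherwise the denominator vanishes.
    """
    p = {+1: p_plus, -1: 1.0 - p_plus}
    v = {+1: v_plus, -1: v_minus}
    norm = math.sqrt(Q[(1, 1)] - 2.0 * Q[(1, -1)] + Q[(-1, -1)])  # |w_+ - w_-|
    eps = 0.0
    for sigma in (+1, -1):
        num = (Q[(sigma, sigma)] - Q[(-sigma, -sigma)]
               - 2.0 * ell * (R[(sigma, sigma)] - R[(-sigma, sigma)]))
        eps += p[sigma] * phi(num / (2.0 * math.sqrt(v[sigma]) * norm))
    return eps

# Prototypes placed exactly on the cluster centres: w_s = ell * B_s
ell = 1.0
R = {(1, 1): ell, (1, -1): 0.0, (-1, 1): 0.0, (-1, -1): ell}
Q = {(1, 1): ell ** 2, (1, -1): 0.0, (-1, -1): ell ** 2}
eps = generalization_error(R, Q, ell, 0.5, 1.0, 1.0)
print(f"eps_g = {eps:.4f}")
```

For this symmetric test case (equal priors, unit variances) the result reduces to $\Phi(-\ell/\sqrt{2})$, the error of the optimal boundary halfway between two centers that are a distance $\sqrt{2}\,\ell$ apart.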

LVQ1: The winner takes it all

only the winner $w_s$ ($s = \pm 1$) is updated, according to the class label:

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$

initialization $w_s(0) = 0$

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\sigma}$ vs. $\alpha$; theory and simulation ($N = 100$), $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs]

[Figure: self-averaging property, mean and variances of $R_{++}$ ($\alpha = 10$) vs. $1/N$]

Trajectories in the $(B_+, B_-)$-plane

initialization $w_s(0) \approx 0$

[Figure: trajectories of $w_+$ and $w_-$ in the plane of $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell B_+$, $\ell B_-$; (•) $\alpha = 20, 40, \ldots, 140$; optimal decision boundary; (____) asymptotic position]

[Figure: $Q_{++}$, $Q_{--}$, $Q_{+-}$ and $R_{S\sigma}$ vs. $\alpha$; theory and simulation ($N = 100$), $p_+ = 0.8$, $v_+ = 4$, $v_- = 9$, $\ell = 2.0$, $\eta = 1.0$, averaged over 100 indep. runs]

$R_{s\sigma} = w_s \cdot B_\sigma$, $\quad Q_{st} = w_s \cdot w_t$
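A direct Monte Carlo run of LVQ1 yields the same characteristic quantities that the theory tracks. The sketch below is an illustration under assumptions chosen here for a quick, well-separated example ($N = 50$, $\alpha = 40$, $\eta = 0.5$, $\ell = 2$, equal priors, unit variances, $B_+ = e_0$, $B_- = e_1$); these are not the parameter values of the figures, and `simulate_lvq1` is a hypothetical name.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def simulate_lvq1(n, alpha, eta, ell, p_plus, v_plus, v_minus, seed=0):
    """Train two prototypes with LVQ1 for alpha*n steps; return (R, Q).

    Assumption: B_+ = e_0, B_- = e_1, so R[(s, sigma)] is just a component of w_s.
    """
    rng = random.Random(seed)
    # w_s(0) ~ 0: tiny random values avoid an exact tie in the first competition
    w = {s: [rng.gauss(0.0, 1e-3) for _ in range(n)] for s in (+1, -1)}
    for _ in range(int(alpha * n)):
        sigma = +1 if rng.random() < p_plus else -1
        std = (v_plus if sigma == 1 else v_minus) ** 0.5
        xi = [rng.gauss(0.0, std) for _ in range(n)]
        xi[0 if sigma == 1 else 1] += ell
        d = {s: sum((x - wi) ** 2 for x, wi in zip(xi, w[s])) for s in (+1, -1)}
        s = +1 if d[+1] < d[-1] else -1           # the winner
        step = (eta / n) * s * sigma               # attract if s == sigma, else repel
        w[s] = [wi + step * (x - wi) for wi, x in zip(w[s], xi)]
    R = {(s, sig): w[s][0 if sig == 1 else 1] for s in (+1, -1) for sig in (+1, -1)}
    Q = {(s, t): dot(w[s], w[t]) for s in (+1, -1) for t in (+1, -1)}
    return R, Q

# illustrative, well-separated setting (not the parameter values of the figures)
R, Q = simulate_lvq1(n=50, alpha=40, eta=0.5, ell=2.0, p_plus=0.5, v_plus=1.0, v_minus=1.0)
for key in sorted(R):
    print("R", key, round(R[key], 2))
```

Averaging such runs over many seeds, and repeating them for growing $N$, is how simulation points of the kind shown in the figures would be compared with the ODE solution.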

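Learning curve

[Figure: $\varepsilon_g(\alpha)$ for $\eta = 2.0, 1.0, 0.2$; $p_+ = 0.2$, $\ell = 1.0$, $v_+ = v_- = 1.0$]

- suboptimal, non-monotonic behavior for small $\eta$
- stationary state: $\varepsilon_g(\alpha \to \infty)$ grows linearly with $\eta$
- well-defined asymptotics: $\eta \to 0$, $\alpha \to \infty$, $(\eta \alpha) \to \infty$

[Figure: $\varepsilon_g(\alpha \to \infty)$ vs. $\eta$]

achievable generalization error

[Figure: $\varepsilon_g$ vs. $p_+$, two panels: $v_+ = v_- = 1.0$ and $v_+ = 0.25$, $v_- = 0.81$; curves: best linear boundary, LVQ1 (solid)]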

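LVQ 2.1 [Kohonen]

here: update correct and wrong winner

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$

[Figure: characteristic quantities $R_{S+}$, $R_{S-}$ and $Q$ vs. $\alpha$ (0 to 10, values between $-6$ and $6$); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $v_+ = v_- = 1$, $\eta = 0.5$, averages over 100 independent runs]

problem: instability of the algorithm due to repulsion of wrong prototypes; some of the quantities $R$, $Q$ grow without bound with $\alpha \to \infty$, while others remain finite

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$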
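The instability is easy to provoke in simulation. The sketch below runs the LVQ 2.1 update with the parameter values quoted for the figure ($N = 100$, $p_+ = 0.8$, $\ell = 1$, $v_\pm = 1$, $\eta = 0.5$) and records $Q_{--} = |w_-|^2$ at two learning times; the choice $B_+ = e_0$, $B_- = e_1$, the near-zero initialization and the function name `lvq21_norms` are assumptions of this illustration.

```python
import random

def lvq21_norms(n, alphas, eta, ell, p_plus, seed=0):
    """Run LVQ 2.1 and record Q_-- = |w_-|^2 at the requested learning times alpha.

    Assumptions: unit variances, B_+ = e_0, B_- = e_1, prototypes initialised near 0.
    """
    rng = random.Random(seed)
    w = {s: [rng.gauss(0.0, 1e-3) for _ in range(n)] for s in (+1, -1)}
    checkpoints = {int(a * n): a for a in alphas}   # step count -> alpha
    recorded = {}
    for step in range(1, max(checkpoints) + 1):
        sigma = +1 if rng.random() < p_plus else -1
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        xi[0 if sigma == 1 else 1] += ell
        # both prototypes move: the correct one towards xi, the wrong one away from it
        for s in (+1, -1):
            step_size = (eta / n) * s * sigma
            w[s] = [wi + step_size * (x - wi) for wi, x in zip(w[s], xi)]
        if step in checkpoints:
            recorded[checkpoints[step]] = sum(wi * wi for wi in w[-1])
    return recorded

q_minus = lvq21_norms(n=100, alphas=(2, 10), eta=0.5, ell=1.0, p_plus=0.8)
print(q_minus)
```

With $p_+ > p_-$ the prototype $w_-$ is repelled more often than it is attracted, so its norm keeps growing and eventually every example is assigned to $w_+$, which is the trivial classification with $\varepsilon_g = p_-$.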

suggested strategy: selection of data in a window close to the current decision boundary

- slows down the repulsion, but the system remains unstable

Early stopping: end the training process at minimal $\varepsilon_g$ (idealized)

[Figure: $\varepsilon_g(\alpha)$ for $\eta = 2.0, 1.0, 0.5$]

- pronounced minimum in $\varepsilon_g(\alpha)$; depends on initialization and cluster geometry
- here: lowest minimum value reached for $\eta \to 0$

[Figure: $\varepsilon_g$ vs. $p_+$ for $v_+ = 0.25$, $v_- = 0.81$; curves: LVQ1 (solid), early stopping (__)]

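Learning From Mistakes (LFM)

LVQ 2.1 update, only if the current classification is wrong:

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{\sigma^\mu}^\mu - d_{-\sigma^\mu}^\mu) \, s \, \sigma^\mu \, (\xi^\mu - w_s^{\mu-1})$

crisp limit version of Soft Robust LVQ [Seo and Obermayer, 2003]

[Figure: projected trajectory in the plane of $R_{S+}$ and $R_{S-}$, with the cluster centers $\ell B_+$, $\ell B_-$]

[Figure: learning curves $\varepsilon_g(\alpha)$ for $\eta = 2.0, 1.0, 0.5$]

parameter sets given on the slide: $p_+ = 0.8$, $\ell = 3.0$, $v_+ = 4.0$, $v_- = 9.0$ and $p_+ = 0.8$, $\ell = 1.2$, $v_+ = v_- = 1.0$

- $\eta$-independent asymptotic $\varepsilon_g$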
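The LFM rule differs from LVQ 2.1 only by the misclassification check. A minimal sketch follows; the function name `lfm_step`, the dictionary layout and the convention that an exact tie counts as "correct" are assumptions made here.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lfm_step(w, xi, sigma, eta):
    """Learning From Mistakes: LVQ 2.1-type update of both prototypes,
    applied only if xi is currently misclassified (nearest prototype has the wrong label).

    w: dict {+1: list, -1: list}; returns the updated prototypes.
    """
    n = len(xi)
    d = {s: sq_dist(xi, w[s]) for s in (+1, -1)}
    if d[sigma] <= d[-sigma]:        # correct prototype is at least as close: no update
        return {s: list(w[s]) for s in (+1, -1)}
    return {
        s: [wi + (eta / n) * s * sigma * (x - wi) for wi, x in zip(w[s], xi)]
        for s in (+1, -1)
    }

w = {+1: [1.0, 0.0], -1: [-1.0, 0.0]}
unchanged = lfm_step(w, [2.0, 0.0], +1, 1.0)   # classified correctly: nothing happens
corrected = lfm_step(w, [2.0, 0.0], -1, 1.0)   # misclassified: both prototypes move
print(unchanged, corrected)
```

Because updates stop once an example is classified correctly, the unbounded repulsion seen in plain LVQ 2.1 does not build up in the same way.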

Comparison: achievable generalization ability

[Figure: $\varepsilon_g$ vs. $p_+$, two panels: equal cluster variances ($v_+ = v_- = 1.0$) and unequal variances ($v_+ = 0.25$, $v_- = 0.81$)]

curves: best linear boundary; LVQ1 (solid); LVQ 2.1 with early stopping (dashed); LFM (dash-dotted); trivial classification (solid)

Vector Quantization

competitive learning, $w_s$: winner

$w_s^\mu = w_s^{\mu-1} + \frac{\eta}{N} \, \Theta(d_{-s}^\mu - d_s^\mu) \, (\xi^\mu - w_s^{\mu-1})$

class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $\varepsilon_g$ vs. $\alpha$ for VQ, LVQ+ and LVQ1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ vs. $\alpha$ (0 to 300)]

- the system is invariant under exchange of the prototypes
- weakly repulsive fixed points
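Without labels, the update reduces to plain winner-takes-all competitive learning. The toy sketch below uses a made-up four-point data set and hypothetical helper names (`vq_step`, `quantization_error`) to show the quantization error dropping as the prototypes settle on the two groups of points.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def vq_step(w, xi, eta):
    """Unsupervised competitive learning: the closest prototype moves towards xi."""
    n = len(xi)
    winner = min(w, key=lambda s: sq_dist(xi, w[s]))
    updated = {s: list(w[s]) for s in w}
    updated[winner] = [wi + (eta / n) * (x - wi) for wi, x in zip(w[winner], xi)]
    return updated

def quantization_error(w, data):
    """Mean squared distance of each example to its closest prototype."""
    return sum(min(sq_dist(xi, w[s]) for s in w) for xi in data) / len(data)

# toy data: two tight groups of points on a line (illustrative only)
data = [[0.0, 0.0], [0.2, 0.0], [4.0, 0.0], [4.2, 0.0]]
w = {+1: [1.0, 0.0], -1: [3.0, 0.0]}
before = quantization_error(w, data)
for _ in range(50):
    for xi in data:
        w = vq_step(w, xi, eta=0.2)
after = quantization_error(w, data)
print(before, after)
```

The prototype indices carry no class meaning here, which is why the dynamics is invariant under exchanging them.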

interpretations:

- VQ: unsupervised learning, unlabelled data
- LVQ: two prototypes of the same class, identical labels
- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ vs. $p_+$, asymptotics ($\eta \to 0$)]

$p_+ \approx 0$, $p_- \approx 1$:

- low quantization error
- high gen. error $\varepsilon_g$

Summary

• a model scenario of LVQ training: two clusters, two prototypes; dynamics of online training

• comparison of algorithms (within the model):

  LVQ 1: original formulation of LVQ, with close to optimal asymptotic generalization

  LVQ 2.1: intuitive extension, creates instability, trivial (stationary) classification
    + stopping: potentially good performance; practical difficulties, depends on initialization

  LFM: crisp limit of Soft Robust LVQ; stable behavior, far from optimal generalization

  VQ: description of in-class competition

Outlook

• Self-Organizing Maps (SOM): neighborhood preserving SOM, Neural Gas (distance rank based)

• Generalized Relevance LVQ [e.g. Hammer & Villmann]: adaptive metrics, e.g. distance measure

  $d^\lambda(w_s, \xi) = \sum_{i=1}^{N} \lambda_i \, (w_{s,i} - \xi_i)^2$, with the $\lambda_i$ adapted in training

• applications

• multi-class, multi-prototype problems

• optimized procedures: learning rate schedules, variational approach, Bayes optimal on-line
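The adaptive distance measure mentioned for Generalized Relevance LVQ is a weighted squared Euclidean distance. The sketch below only evaluates the distance for given relevance factors; how the factors are adapted during training is not shown, and the function name `relevance_distance` is hypothetical.

```python
def relevance_distance(w, xi, relevances):
    """Weighted squared Euclidean distance d(w, xi) = sum_i lambda_i * (w_i - xi_i)^2."""
    return sum(lam * (wi - x) ** 2 for lam, wi, x in zip(relevances, w, xi))

w = [0.0, 0.0, 0.0]
xi = [1.0, 2.0, 3.0]

uniform = [1.0, 1.0, 1.0]          # plain squared Euclidean distance
focused = [1.0, 0.0, 0.0]          # only the first feature is considered relevant

d_uniform = relevance_distance(w, xi, uniform)
d_focused = relevance_distance(w, xi, focused)
print(d_uniform, d_focused)
```

Setting a relevance factor to zero removes the corresponding feature from the competition between prototypes altogether.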

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

initialization ws(0)=0

theory and simulation (N=100)p+=08 v+=4 p+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

self-averaging property

(mean and variances)

1N

R++ (α=10)

Dynamics and generalization of LVQ Birmingham 09-12- 05

LVQ1 The winner takes it all

winner ws 1

1-μs

μμμS

μS

1-μs

μs Sσdd

N

ηwξww

only the winner is updated according to the class label

w-

w+

ℓ B-

ℓ B+

RS- w+

RS+

Trajectories in the (B+B- )-plane

(bull) =2040140 optimal decision boundary ____ asymptotic position

initialization ws(0)asymp0

theory and simulation (N=100)p+=08 v+=4 v+=9 ℓ=20 =10

averaged over 100 indep runs

Q++

Q--

Q+-

α

RSσ

tsst

σssσ

Q

BR

ww

w

Dynamics and generalization of LVQ Birmingham 09-12- 05

Learning curve

η= 201002

- suboptimal non-monotonic behavior for small η

εg (αinfin) grows linearly with η- stationary state

η 0 αinfin (η α ) infin

- well-defined asymptotics

η

εgp+ = 02 ℓ=10

v+ = v- = 10

achievable generalization error

εgεg

p+ p+

v+ = v- =10 v+ =025 v-=081

best linear boundary― LVQ1

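LVQ 2.1 [Kohonen]

here: update the correct and the wrong winner

$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\, s\,\sigma^{\mu}\,\left(\xi^{\mu} - w_s^{\mu-1}\right) $$

[Equation: behavior of the order parameters $Q$, $R$ with $\alpha$; stated combinations of $Q$ and $R$ remain finite]

[Figure: order parameters versus $\alpha$ ($\alpha$ from 0 to 10, vertical axis from $-6$ to $6$); theory and simulation ($N = 100$), $p_+ = 0.8$, $\ell = 1$, $v_+ = v_- = 1$, $\eta = 0.5$, averages over 100 independent runs]

[Figure: axes $R_{s+}$ and $R_{s-}$]

problem: instability of the algorithm due to repulsion of wrong prototypes

trivial classification for $\alpha \to \infty$: $\varepsilon_g = \min\{p_+, p_-\}$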
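The instability can be reproduced with a short simulation. The sketch below is an illustration under assumptions (function names, unit vectors for $B_\pm$, zero initialization, one random seed), using the parameters quoted for the figure. With $p_+ > p_-$ the prototype $w_-$ is repelled more often than it is attracted, so its norm keeps growing.

```python
import numpy as np

def lvq21_step(w, xi, sigma, eta):
    """Move both prototypes: the one with label sigma toward xi, the other one away from it."""
    n = xi.size
    for s in (1, -1):
        w[s] = w[s] + (eta / n) * s * sigma * (xi - w[s])

def run_lvq21(n=100, alpha_max=10, eta=0.5, p_plus=0.8, ell=1.0, seed=0):
    """Return the final prototypes and |w_-| recorded at alpha = 1, 2, ..., alpha_max."""
    rng = np.random.default_rng(seed)
    B = {1: np.eye(n)[0], -1: np.eye(n)[1]}              # orthonormal directions (assumption)
    w = {1: np.zeros(n), -1: np.zeros(n)}
    norms = []
    for mu in range(1, alpha_max * n + 1):
        sigma = 1 if rng.random() < p_plus else -1
        xi = ell * B[sigma] + rng.standard_normal(n)     # v_+ = v_- = 1
        lvq21_step(w, xi, sigma, eta)
        if mu % n == 0:
            norms.append(float(np.linalg.norm(w[-1])))
    return w, norms

w, norms = run_lvq21()
print([round(x, 1) for x in norms])                      # |w_-| at alpha = 1 ... 10
```

Averaging the update over the data gives a growth rate $\eta\,(p_+ - p_-)$ for $w_-$ in this setting, which is why the divergence is roughly exponential in $\alpha$.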

suggested strategy: selection of data in a window close to the current decision boundary

slows down the repulsion, but the system remains unstable

Early stopping: end the training process at minimal $\varepsilon_g$ (idealized)

[Figure: $\varepsilon_g$ for $\eta = 2.0, 1.0, 0.5$; axis labels $\varepsilon_g$, $\eta$]

- pronounced minimum in $\varepsilon_g(\alpha)$; depends on initialization and cluster geometry

- here: lowest minimum value reached for $\eta \to 0$

[Figure: $\varepsilon_g$ versus $p_+$ for $v_+ = 0.25$, $v_- = 0.81$; curves: LVQ1 (solid line), early stopping]
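Early stopping for LVQ 2.1 can be sketched as follows. The slide treats the idealized case in which $\varepsilon_g(\alpha)$ is known exactly; the sketch replaces it with an error estimate on a held-out sample, which is an assumption made here, as are all function names and the evaluation interval.

```python
import numpy as np

def draw(rng, B, m, p_plus, ell, v):
    """Draw m labelled examples; v maps each label to its cluster variance."""
    sigma = np.where(rng.random(m) < p_plus, 1, -1)
    centers = np.where(sigma[:, None] == 1, B[1], B[-1]) * ell
    scale = np.sqrt(np.where(sigma == 1, v[1], v[-1]))[:, None]
    return centers + scale * rng.standard_normal((m, B[1].size)), sigma

def error_rate(w, X, sigma):
    """Fraction of examples whose nearest prototype carries the wrong label."""
    d_plus = np.sum((X - w[1]) ** 2, axis=1)
    d_minus = np.sum((X - w[-1]) ** 2, axis=1)
    return float(np.mean(np.where(d_plus <= d_minus, 1, -1) != sigma))

def lvq21_early_stopping(n=50, alpha_max=10, eta=0.5, p_plus=0.8, ell=1.0, seed=0):
    """Run LVQ 2.1 and keep the prototypes with the lowest held-out error seen so far."""
    rng = np.random.default_rng(seed)
    B = {1: np.eye(n)[0], -1: np.eye(n)[1]}
    v = {1: 1.0, -1: 1.0}
    X_val, s_val = draw(rng, B, 2000, p_plus, ell, v)    # held-out sample (assumption)
    w = {1: 1e-3 * rng.standard_normal(n), -1: 1e-3 * rng.standard_normal(n)}
    history = [error_rate(w, X_val, s_val)]
    best = (history[0], 0.0, {s: w[s].copy() for s in w})
    X, sig = draw(rng, B, alpha_max * n, p_plus, ell, v)
    for mu in range(1, alpha_max * n + 1):
        xi, sigma = X[mu - 1], int(sig[mu - 1])
        for s in (1, -1):                                # LVQ 2.1 update of both prototypes
            w[s] = w[s] + (eta / n) * s * sigma * (xi - w[s])
        if mu % (n // 10) == 0:                          # evaluate ten times per unit of alpha
            err = error_rate(w, X_val, s_val)
            history.append(err)
            if err < best[0]:
                best = (err, mu / n, {s: w[s].copy() for s in w})
    return best, history

best, history = lvq21_early_stopping()
print("stop at alpha =", best[1], "held-out error =", best[0], "final error =", history[-1])
```

The returned prototypes are a snapshot taken at the best evaluation point, so continuing the unstable training afterwards does not affect them.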

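Learning From Mistakes (LFM)

LVQ 2.1 update only if the current classification is wrong:

$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{\sigma^{\mu}}^{\mu} - d_{-\sigma^{\mu}}^{\mu}\right)\, s\,\sigma^{\mu}\,\left(\xi^{\mu} - w_s^{\mu-1}\right) $$

crisp limit version of Soft Robust LVQ [Seo and Obermayer 2003]

[Figure: projected trajectory, axes $R_{s+}$ and $R_{s-}$, with the cluster centers $\ell B_+$ and $\ell B_-$; $p_+ = 0.8$, $\ell = 3.0$, $v_+ = 4.0$, $v_- = 9.0$]

Learning curves

[Figure: $\varepsilon_g$ for $\eta = 2.0, 1.0, 0.5$; $p_+ = 0.8$, $\ell = 1.2$, $v_+ = v_- = 1.0$]

$\eta$-independent asymptotic $\varepsilon_g$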
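A minimal sketch of one LFM step, assuming the same dictionary layout for the prototypes as in LVQ 2.1. The function name `lfm_step` and the handling of exact ties (no update) are assumptions.

```python
import numpy as np

def lfm_step(w, xi, sigma, eta):
    """Apply the LVQ 2.1 update only if xi is currently misclassified; return True if updated."""
    n = xi.size
    d = {s: np.sum((xi - w[s]) ** 2) for s in (1, -1)}
    if d[sigma] <= d[-sigma]:            # prototype with the correct label is closest: no update
        return False
    for s in (1, -1):
        w[s] = w[s] + (eta / n) * s * sigma * (xi - w[s])
    return True

w = {1: np.array([1.0, 0.0]), -1: np.array([0.0, 1.0])}
xi = np.array([0.9, 0.1])

print(lfm_step(w, xi, 1, eta=1.0))       # False: xi is closest to w_+ and labelled +1
print(lfm_step(w, xi, -1, eta=1.0))      # True: xi is closest to w_+ but labelled -1
print(w[1], w[-1])                       # [ 1.05 -0.05] [0.45 0.55]
```

Because updates stop once an example is classified correctly, the repulsion of LVQ 2.1 only acts on mistakes.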

Comparison: achievable generalization ability

[Figures: $\varepsilon_g$ versus $p_+$ for equal cluster variances ($v_+ = v_- = 1.0$) and for unequal variances ($v_+ = 0.25$, $v_- = 0.81$)]

curves: best linear boundary, LVQ1 (solid line), --- LVQ 2.1 (early stopping), ·-· LFM, trivial classification (solid line)

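Vector Quantization

competitive learning, $w_s$: winner

$$ w_s^{\mu} = w_s^{\mu-1} + \frac{\eta}{N}\,\Theta\!\left(d_{-s}^{\mu} - d_{+s}^{\mu}\right)\,\left(\xi^{\mu} - w_s^{\mu-1}\right) $$

class membership is unknown or identical for all data

numerical integration for $w_s(0) \approx 0$ ($p_+ = 0.2$, $\ell = 1.0$, $\eta = 1.2$)

[Figure: $\varepsilon_g$ versus $\alpha$ for VQ, LVQ+ and LVQ1]

[Figure: $R_{++}$, $R_{+-}$, $R_{-+}$, $R_{--}$ versus $\alpha$ ($\alpha$ from 0 to 300, vertical axis from 0 to 1.0)]

system is invariant under exchange of the prototypes

weakly repulsive fixed points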
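The competitive-learning rule differs from LVQ1 only in that the label factor is dropped, so the winner always moves toward the example. A minimal sketch, storing the prototypes as rows of an array so that it also covers more than two of them (the function name `vq_step` is an assumption):

```python
import numpy as np

def vq_step(w, xi, eta):
    """Move the closest prototype (row of w) toward xi; return the winner's row index."""
    n = xi.size
    d = np.sum((w - xi) ** 2, axis=1)        # squared Euclidean distance to every prototype
    k = int(np.argmin(d))                    # winner
    w[k] = w[k] + (eta / n) * (xi - w[k])
    return k

w = np.array([[0.0, 0.0],
              [1.0, 1.0]])
k = vq_step(w, np.array([0.2, 0.0]), eta=1.0)
print(k, w[0], w[1])                         # 0 [0.1 0. ] [1. 1.]
```

Nothing in the rule distinguishes the prototypes from one another, which is the exchange symmetry noted on the slide.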

interpretations:

- VQ: unsupervised learning, unlabelled data

- LVQ: two prototypes of the same class, identical labels

- LVQ: different classes, but labels are not used in training

[Figure: $\varepsilon_g$ versus $p_+$; asymptotics ($\eta \to 0$)]

for $p_+ \approx 0$ or $p_- \approx 1$:

- low quantization error

- high generalization error $\varepsilon_g$

Summary

• a model scenario of LVQ training:

two clusters, two prototypes

dynamics of online training

• comparison of algorithms (within the model):

LVQ 1: original formulation of LVQ, with close to optimal asymptotic generalization

LVQ 2.1: intuitive extension, creates instability, trivial (stationary) classification

+ early stopping: potentially good performance; practical difficulties, depends on initialization

LFM: crisp limit of Soft Robust LVQ, stable behavior, far from optimal generalization

VQ: description of in-class competition

Outlook

• Self-Organizing Maps (SOM)

neighborhood preserving SOM, Neural Gas (distance rank based)

• Generalized Relevance LVQ [e.g. Hammer & Villmann]

adaptive metrics, e.g. distance measure

$$ d_{\lambda}(w_s, \xi) = \sum_{i=1}^{N} \lambda_i\,\left(w_{s,i} - \xi_i\right)^2 $$

training

• applications

• multi-class, multi-prototype problems

• optimized procedures: learning rate schedules, variational approach / Bayes optimal on-line
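The relevance-weighted distance mentioned for Generalized Relevance LVQ is a one-line change to the squared Euclidean distance. The sketch below only evaluates the distance for given relevances $\lambda_i$; how the relevances are adapted during training is not covered, and the function name `relevance_distance` is an assumption.

```python
import numpy as np

def relevance_distance(w, xi, lam):
    """Weighted squared distance sum_i lam_i * (w_i - xi_i)**2."""
    w, xi, lam = (np.asarray(a, dtype=float) for a in (w, xi, lam))
    return float(np.sum(lam * (w - xi) ** 2))

w = [1.0, 2.0, 3.0]
xi = [0.0, 0.0, 1.0]

print(relevance_distance(w, xi, [1.0, 1.0, 1.0]))      # 9.0, the plain squared Euclidean distance
print(relevance_distance(w, xi, [1.0, 0.0, 0.0]))      # 1.0, only the first feature counts
print(relevance_distance(w, xi, [0.5, 0.25, 0.25]))    # 2.5
```

Setting a relevance to zero removes the corresponding feature from the winner selection entirely.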
