
Page 1: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Huicheng Zheng, Christophe Laurent and Grégoire Lefebvre

13th September 2006

Thanks to the MUSCLE Internal Fellowship (http://www.muscle-noe.org).

ICANN’06

Page 2: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 3: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Motivation of ASSOM

• Learning "invariance classes" with subspace learning and SOM [Kohonen, T., et al., 1997]
  – For example: spatial-translation invariance of rectangles, circles, triangles, …

Page 4: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Applications of ASSOM

• Invariant feature formation [Kohonen, T., et al., 1997]
• Speech processing [Hase, H., et al., 1996]
• Texture segmentation [Ruiz del Solar, J., 1998]
• Image retrieval [De Ridder, D., et al., 2000]
• Image classification [Zhang, B., et al., 1999]

Page 5: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

ASSOM Modules Representing Subspaces

The module arrays in ASSOM: rectangular topology and hexagonal topology, with winner module $c$ and neighboring modules $i$, $j$.

A module representing the subspace $L^{(j)}$ projects the input $\mathbf{x}$ onto its basis vectors $\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_H$, computing the coefficients $\mathbf{x}^T\mathbf{b}_1, \dots, \mathbf{x}^T\mathbf{b}_H$, and outputs the squared projection norm $\|\hat{\mathbf{x}}_{L^{(j)}}\|^2$.

Page 6: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Competition and Adaptation

• Repeatedly:
  – Competition: the winner is

    $c = \arg\max_j \|\hat{\mathbf{x}}_{L^{(j)}}\|^2$

  – Adaptation: for the winner and the modules $i$ in its neighborhood,

    $\mathbf{b}_h^{(i)\prime} = \mathbf{P}_c^{(i)}(\mathbf{x}, t)\, \mathbf{b}_h^{(i)}$

    where the rotation operator is the N×N matrix

    $\mathbf{P}_c^{(i)}(\mathbf{x}, t) = \mathbf{I} + \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}\mathbf{x}^T}{\|\hat{\mathbf{x}}_{L^{(i)}}(\mathbf{x})\|\, \|\mathbf{x}\|}$

  – Orthonormalize the basis vectors
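The competition and adaptation steps above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation; the dimensions, module count, learning rate, and the neighborhood value h are placeholder choices:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, n_modules = 8, 2, 5   # input dimension, subspace dimension, module count
# Each module i stores an orthonormal basis B[i] (columns b_1..b_M)
B = [np.linalg.qr(rng.standard_normal((N, M)))[0] for _ in range(n_modules)]

def proj_norm_sq(Bi, x):
    """Squared norm of the orthogonal projection of x onto span(Bi)."""
    coeffs = Bi.T @ x           # projection coefficients x^T b_h
    return float(coeffs @ coeffs)   # ||x_hat_L(i)||^2 for an orthonormal basis

x = rng.standard_normal(N)

# Competition: winner c = argmax_j ||x_hat_L(j)||^2
c = int(np.argmax([proj_norm_sq(Bi, x) for Bi in B]))

# Adaptation for the winner (neighborhood value h = 1 for the winner itself)
lam, h = 0.05, 1.0
x_hat = np.sqrt(proj_norm_sq(B[c], x))
P = np.eye(N) + lam * h * np.outer(x, x) / (x_hat * np.linalg.norm(x))
B[c] = P @ B[c]                 # rotate every basis vector: b_h' = P b_h

# Orthonormalize the updated basis (QR performs Gram-Schmidt)
B[c] = np.linalg.qr(B[c])[0]
```

In a full run, the same adaptation would also be applied to the modules in the winner's topological neighborhood, with h decaying with grid distance.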

Page 7: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Transformation Invariance

• Episodes correspond to signal subspaces.
• Example:
  – One episode, S, consists of 8 vectors, each translated in time with respect to the others.

Page 8: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Episode Learning

• Episode winner:

  $c = \arg\max_j \sum_{s \in S} \|\hat{\mathbf{x}}_{L^{(j)}}(s)\|^2$

• Adaptation: for each sample x(s) in the episode X = {x(s), s ∈ S}:
  – Rotate the basis vectors:

    $\mathbf{b}_h^{(i)\prime} = \mathbf{P}_c^{(i)}(\mathbf{x}(s), t)\, \mathbf{b}_h^{(i)}$

  – Orthonormalize the basis vectors
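The episode winner sums the squared projection norms over the whole episode before the argmax. A minimal sketch (placeholder dimensions; the episode vectors here are simply random, whereas in the experiments they would be time-translated copies of one signal):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, n_modules, n_samples = 8, 2, 5, 4
# Orthonormal basis per module
B = [np.linalg.qr(rng.standard_normal((N, M)))[0] for _ in range(n_modules)]
# One episode X = {x(s), s in S}
episode = [rng.standard_normal(N) for _ in range(n_samples)]

def episode_energy(Bi):
    """sum_s ||x_hat_L(j)(s)||^2: total projection energy over the episode."""
    return sum(float((Bi.T @ x) @ (Bi.T @ x)) for x in episode)

# Episode winner: c = argmax_j sum_s ||x_hat_L(j)(s)||^2
energies = [episode_energy(Bi) for Bi in B]
c = int(np.argmax(energies))
```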

Page 9: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Deficiency of the Traditional Learning Rule

• The rotation operator $\mathbf{P}_c^{(i)}(\mathbf{x}(s), t)$ is an N×N matrix.
  – N: input vector dimension
• Approximately: NOP (number of operations) ∝ MN²
  – M: subspace dimension

Page 10: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Efforts in the Literature

• Adaptive Subspace Map (ASM) [De Ridder, D., et al., 2000]:
  – Drops topological ordering
  – Performs batch-mode updating with PCA
  – Essentially not ASSOM
• Replace the basis updating rule [McGlinchey, S.J., Fyfe, C., 1998]:
  – NOP ∝ M²N

Page 11: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline
• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 12: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Minimization of the ASSOM Objective Function

$E = \int_X \sum_i \sum_{s \in S} h_c^{(i)}\, \dfrac{\|\tilde{\mathbf{x}}_{L^{(i)}}(s)\|^2}{\|\mathbf{x}(s)\|^2}\, p(X)\, dX$

where $\tilde{\mathbf{x}}_{L^{(i)}} = \mathbf{x} - \hat{\mathbf{x}}_{L^{(i)}}$ (the projection error) and $p(X)$ is the probability density function of X.

Solution: stochastic gradient descent:

$\mathbf{b}_h^{(i)\prime} = \Bigl[\mathbf{I} + 2\lambda(t)\, h_c^{(i)}(t) \sum_{s \in S} \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\mathbf{x}(s)\|^2}\Bigr]\, \mathbf{b}_h^{(i)}$

$\lambda(t)$: learning rate function

Page 13: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Minimization of the ASSOM Objective Function

When $\lambda(t)$ is small:

$\prod_{s \in S} \Bigl[\mathbf{I} + 2\lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\mathbf{x}(s)\|^2}\Bigr] \approx \mathbf{I} + 2\lambda(t)\, h_c^{(i)}(t) \sum_{s \in S} \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\mathbf{x}(s)\|^2}$

In practice, better stability has been observed with the modified form proposed in [Kohonen, T., et al., 1997]:

$\mathbf{M}_c^{(i)}(t) = \prod_{s \in S} \Bigl[\mathbf{I} + \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}\Bigr]$
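The small-$\lambda(t)$ approximation is the first-order expansion of the product of rotation operators. Writing $\varepsilon$ for the small learning-rate factor and $\mathbf{A}_s$ for the rank-one term attached to sample $s$:

```latex
\prod_{s \in S}\left(\mathbf{I} + \varepsilon\,\mathbf{A}_s\right)
  = \mathbf{I} + \varepsilon \sum_{s \in S} \mathbf{A}_s
    + O\!\left(\varepsilon^2\right)
  \approx \mathbf{I} + \varepsilon \sum_{s \in S} \mathbf{A}_s .
```

The cross terms collected in $O(\varepsilon^2)$ become negligible as $\varepsilon \to 0$, which is why the product of per-sample rotations and the summed update agree to first order.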

Page 14: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Minimization of the ASSOM Objective Function

• $\mathbf{M}_c^{(i)}(t)$ corresponds to a modified objective function (maximization of the normalized projections):

  $E_m = -\int_X \sum_i \sum_{s \in S} h_c^{(i)}\, \dfrac{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|}{\|\mathbf{x}(s)\|}\, p(X)\, dX$

• Solution to $E_m$:

  $\mathbf{B}_c^{(i)}(t) = \mathbf{I} + \lambda(t)\, h_c^{(i)}(t) \sum_{s \in S} \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

• When $\lambda(t)$ is small: $\mathbf{M}_c^{(i)}(t) \approx \mathbf{B}_c^{(i)}(t)$

Page 15: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline

• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 16: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Insight on the Basis Vector Rotation

• Recall: traditional learning

  $\mathbf{P}_c^{(i)}(\mathbf{x}(s), t) = \mathbf{I} + \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

  $\mathbf{b}_h^{(i)\prime} = \mathbf{P}_c^{(i)}(\mathbf{x}(s), t)\, \mathbf{b}_h^{(i)}$

Page 17: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Insight on the Basis Vector Rotation

Expanding the update shows that the correction is just a scaled copy of the input:

$\mathbf{b}_h^{(i)\prime} = \mathbf{b}_h^{(i)} + \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\bigl(\mathbf{x}(s)^T \mathbf{b}_h^{(i)}\bigr)\, \mathbf{x}(s)}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|} = \mathbf{b}_h^{(i)} + \alpha_{c,h}^{(i)}(s, t)\, \mathbf{x}(s)$

where the scalar factor (built from the projection $\mathbf{x}(s)^T \mathbf{b}_h^{(i)}$) is

$\alpha_{c,h}^{(i)}(s, t) = \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)^T \mathbf{b}_h^{(i)}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

• For fast computing, calculate the scalar $\alpha_{c,h}^{(i)}(s, t)$ first, then scale x(s) by it to update $\mathbf{b}_h^{(i)}$.
• NOP ∝ MN
• Referred to as FL-ASSOM (Fast-Learning ASSOM)
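The scalar-then-scale trick avoids ever forming the N×N operator. A sketch with placeholder data and neighborhood value h = 1; the final re-orthonormalization uses QR:

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 8, 2
Bi = np.linalg.qr(rng.standard_normal((N, M)))[0]   # orthonormal basis b_1..b_M
x = rng.standard_normal(N)
lam, h = 0.05, 1.0                                   # lambda(t) and h_c^(i)(t)

x_hat = np.linalg.norm(Bi.T @ x)                     # ||x_hat_L(i)(s)||
x_norm = np.linalg.norm(x)
for j in range(M):
    # Scalar alpha = lam * h * (x^T b_h) / (||x_hat|| * ||x||): O(N) work
    alpha = lam * h * float(x @ Bi[:, j]) / (x_hat * x_norm)
    Bi[:, j] += alpha * x                            # b_h' = b_h + alpha * x: O(N)
# Total cost O(MN) per sample, versus O(MN^2) when applying the N x N matrix
Bi = np.linalg.qr(Bi)[0]                             # re-orthonormalize
```

Column by column this produces exactly the same updated basis as multiplying by the full rotation operator, since each column's scalar is computed from the untouched basis vector.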

Page 18: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Insight on the Basis Vector Rotation

Geometric illustration: the correction $\alpha_{c,h}^{(i)}(s, t)\, \mathbf{x}(s)$ rotates the basis vector $\mathbf{b}_h^{(i)}$ toward the input $\mathbf{x}(s)$, yielding $\mathbf{b}_h^{(i)\prime}$.
Page 19: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline

• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 20: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Batch-mode Fast Learning (BFL-ASSOM)

• Motivation: re-use the projection norms $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ previously calculated during module competition.

  $\alpha_{c,h}^{(i)}(s, t) = \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)^T \mathbf{b}_h^{(i)}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

• In the basic ASSOM, $L^{(i)}$ keeps changing as each component vector x(s) is received, so $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ has to be re-calculated for each x(s).

Page 21: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Batch-mode Rotation

• Use the solution to the modified objective function $E_m$:

  $\mathbf{B}_c^{(i)}(t) = \mathbf{I} + \lambda(t)\, h_c^{(i)}(t) \sum_{s \in S} \dfrac{\mathbf{x}(s)\mathbf{x}(s)^T}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

• The subspace remains the same for all the component vectors in the episode, so we can now re-use the $\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|$ calculated during module competition.

Page 22: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Batch-mode Fast Learning

$\mathbf{b}_h^{(i)\prime} = \mathbf{b}_h^{(i)} + \sum_{s \in S} \alpha_{c,h}^{(i)}(s, t)\, \mathbf{x}(s)$

where $\alpha_{c,h}^{(i)}(s, t)$ is the scalar defined by:

$\alpha_{c,h}^{(i)}(s, t) = \lambda(t)\, h_c^{(i)}(t)\, \dfrac{\mathbf{x}(s)^T \mathbf{b}_h^{(i)}}{\|\hat{\mathbf{x}}_{L^{(i)}}(s)\|\, \|\mathbf{x}(s)\|}$

• The correction to $\mathbf{b}_h^{(i)}$ is a linear combination of the component vectors x(s) in the episode.
• For each episode, one orthonormalization of the basis vectors is enough.
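In batch mode, the per-sample scalars use projection norms that are fixed for the whole episode (so they can be taken from the competition step), the corrections are accumulated, and the basis is orthonormalized once. A sketch with placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, n_samples = 8, 2, 4
Bi = np.linalg.qr(rng.standard_normal((N, M)))[0]   # orthonormal basis
episode = [rng.standard_normal(N) for _ in range(n_samples)]
lam, h = 0.05, 1.0

# ||x_hat_L(i)(s)|| per sample: in BFL-ASSOM these come for free
# from the module competition, since the subspace is fixed over the episode
proj_norms = [np.linalg.norm(Bi.T @ x) for x in episode]

# The correction is a linear combination of the episode's component vectors
correction = np.zeros((N, M))
for x, pn in zip(episode, proj_norms):
    for j in range(M):
        alpha = lam * h * float(x @ Bi[:, j]) / (pn * np.linalg.norm(x))
        correction[:, j] += alpha * x

# One orthonormalization per episode is enough
Bi = np.linalg.qr(Bi + correction)[0]
```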

Page 23: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline

• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 24: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Experimental Demonstration

• Emergence of translation-invariant filters
  – Episodes are drawn from a colored noise image
  – Vectors in episodes are subject to translation

Figures: white noise image, colored noise image, and an example episode (magnified).

Page 25: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Resulting Filters

Figure: the learned basis vectors $\mathbf{b}_1$ and $\mathbf{b}_2$ for FL-ASSOM and BFL-ASSOM.

Plot: decrease of the average projection error e with the learning step t for FL-ASSOM and BFL-ASSOM (e axis: 5.1–6; t axis: 0–30×10³).

Page 26: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Timing Results

Table: times given in seconds for 1,000 training steps.
– M: subspace dimension
– N: input vector dimension
– VU: Vector Updating time
– WL: Whole Learning time

Page 27: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Timing Results

Left plot: change of the vector updating time (VU, in seconds) with the input dimension N (N = 50, 100, 200, 400) for ASSOM, FL-ASSOM and BFL-ASSOM.

Right plot: change of the vector updating time (VU) with the subspace dimension M (M = 2, 3, 4).

The vertical scales of FL-ASSOM and BFL-ASSOM have been magnified 10 times for clarity.

Page 28: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Outline

• Introduction
• Minimization of the ASSOM objective function
• Fast-learning methods
  – Insight on the basis vector rotation
  – Batch-mode basis vector updating
• Experiments
• Conclusions

Page 29: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

Conclusions

• The basic ASSOM algorithm corresponds to a modified objective function.
• Updating the basis vectors in the basic ASSOM corresponds to a scaling of the component vectors in the input episode.
• In batch-mode updating, the correction to the basis vectors is a linear combination of the component vectors in the input episode.
• Basis learning can be dramatically accelerated using these insights.

Page 30: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)

References

• De Ridder, D., et al., 2000: The adaptive subspace map for image description and image database retrieval. SSPR&SPR 2000.
• Hase, H., et al., 1996: Speech signal processing using Adaptive Subspace SOM (ASSOM). Technical Report NC95-140, The Institute of Electronics, Information and Communication Engineers, Tottori University, Koyama, Japan.
• Kohonen, T., et al., 1997: Self-organized formation of various invariant-feature filters in the adaptive-subspace SOM. Neural Computation 9(6).
• McGlinchey, S.J., Fyfe, C., 1998: Fast formation of invariant feature maps. EUSIPCO'98.
• Ruiz del Solar, J., 1998: Texsom: texture segmentation using Self-Organizing Maps. Neurocomputing 21(1–3).
• Zhang, B., et al., 1999: Handwritten digit recognition by adaptive-subspace self-organizing map (ASSOM). IEEE Trans. on Neural Networks 10(4).

Page 31: On the Basis Learning Rule of Adaptive-Subspace SOM (ASSOM)


Thanks and questions?