
Unsupervised Change Detection in the Feature Space Using Kernels

Michele Volpi 1, Devis Tuia 2, Gustavo Camps-Valls 2, and Mikhail Kanevski 1

1 Institute of Geomatics and Analysis of Risk (IGAR), University of Lausanne, Switzerland

2 Image Processing Laboratory (IPL), University of Valencia, Spain

IGARSS 2011, Vancouver, July 25th

kernelcd.org

SWISS NATIONAL SCIENCE FOUNDATION

project no. 200021_126505/1

Michele Volpi (IGAR-UNIL), Change Detection in Feature Spaces, IGARSS 2011


Unsupervised Change Detection

• Automatic, minimal user intervention

• Binary result: change / no-change map

• Many assumptions: linear combinations, covariances, specific variables, ...

• The ‘explicit’ way: per-pixel difference image analysis

• We present a way of performing accurate unsupervised change detection with kernels:

- Difference image in feature spaces

- Clustering changes using kernel k-means



Difference image in the input space

Difference image in the input space X:

D = X2 − X1

• Unchanged pixels result in low difference pixel norm values

• Changed pixels have a direction and high norm

• For low dimensions: risk of ambiguity (e.g. high mixing)

• For high dimensions: hard to discover changes (e.g. ℓ2-norm)
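The input-space difference image and its per-pixel norm can be sketched as below. This is only an illustration: it assumes each image is a list of co-registered pixels, each pixel a list of band values. The function names and the toy data are hypothetical, and plain Python is used to keep the example self-contained.

```python
import math


def difference_image(x1, x2):
    """Per-pixel spectral difference D = X2 - X1 of two co-registered images."""
    return [[b2 - b1 for b1, b2 in zip(p1, p2)] for p1, p2 in zip(x1, x2)]


def magnitude(diff):
    """Euclidean (l2) norm of every difference vector."""
    return [math.sqrt(sum(v * v for v in d)) for d in diff]


# Hypothetical toy data: three pixels, two spectral bands.
x1 = [[0.2, 0.3], [0.5, 0.5], [0.1, 0.9]]
x2 = [[0.2, 0.3], [0.5, 0.6], [0.9, 0.1]]

d = difference_image(x1, x2)
norms = magnitude(d)  # low for unchanged pixels, high for changed ones
```

In this toy case the first pixel is unchanged (zero norm), the second differs slightly, and the third has a large norm and a clear direction of change.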



Difference image in the kernel-induced feature space

The difference image in feature spaces H:

• A non-linear combination can better describe transitions

(It will result in a linear combination of kernels)

• Maximize the dependence between images X1, X2 and ‘true’ (unknown) labels Y

• Once the correct representation is found, a simple algorithm can model the data



Difference image in the kernel-induced feature space

Formulation of the difference image in H:

Let $\varphi_1(\cdot)$ and $\varphi_2(\cdot)$ be two (possibly different) feature maps to H, and $A_1$, $A_2$ positive-definite scaling matrices. Then

$$\phi(x_i) = A_2 \varphi_2(x_i) - A_1 \varphi_1(x_i) \qquad (1)$$

computes the difference image in the feature space.

• Nonlinearly map and scale each image

• Plugging Eq. (1) into the inner product, under Mercer’s conditions:

$$\langle \phi(x_i), \phi(x_j) \rangle = k_2(x_i, x_j) + k_1(x_i, x_j) - k_{1,2}(x_i, x_j) - k_{2,1}(x_i, x_j) = k(x_i, x_j)$$
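The composite kernel above can be sketched as follows. The example makes several assumptions that the slides do not state: identity scaling matrices, Gaussian RBF kernels for every term, one width for the single-image kernels and one for the cross kernels, and pixel i observed as `x1[i]` at time 1 and `x2[i]` at time 2. The names `rbf` and `difference_kernel` are hypothetical.

```python
import math


def rbf(a, b, sigma):
    """Gaussian RBF kernel between two pixel vectors."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def difference_kernel(x1, x2, sigma_single, sigma_cross):
    """Kernel matrix k = k2 + k1 - k12 - k21 of the difference image in H.

    Assumption: A1 = A2 = identity; x1[i] and x2[i] are the same pixel
    at time 1 and time 2.
    """
    n = len(x1)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            K[i][j] = (
                rbf(x2[i], x2[j], sigma_single)    # k2: time 2 vs time 2
                + rbf(x1[i], x1[j], sigma_single)  # k1: time 1 vs time 1
                - rbf(x1[i], x2[j], sigma_cross)   # k12: cross-time term
                - rbf(x2[i], x1[j], sigma_cross)   # k21: cross-time term
            )
    return K


# Hypothetical toy data: pixel 0 is unchanged, pixel 2 changes strongly.
x1 = [[0.2, 0.3], [0.5, 0.5], [0.1, 0.9]]
x2 = [[0.2, 0.3], [0.5, 0.6], [0.9, 0.1]]
K = difference_kernel(x1, x2, sigma_single=1.0, sigma_cross=1.0)
```

With equal widths, an unchanged pixel gets `K[i][i] = 0`, i.e. it maps to the origin of the difference space, while a changed pixel has a positive squared norm.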



Kernel k-means

• To discover changes: group similar pixels composing the difference image in H

• Simply use this kernel as the similarity/metric matrix!

The kernel k-means solves:

$$\arg\min_{\mu_k} \sum_{k=1}^{|k|} \sum_{i \in \pi_k} \|\phi(x_i) - \mu_k\|^2 \quad \text{with} \quad \mu_k = \frac{1}{|\pi_k|} \sum_{j \in \pi_k} \phi(x_j)$$

In other words:

$$\|\phi(x_i) - \mu_k\|^2 = k(x_i, x_i) + \frac{1}{|\pi_k|^2} \sum_{j,l \in \pi_k} k(x_j, x_l) - \frac{2}{|\pi_k|} \sum_{j \in \pi_k} k(x_i, x_j)$$

• Iteratively discovers dense hyperspherical clusters in H
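The distance expansion above is all that kernel k-means needs: it works from the kernel matrix alone, without explicit feature vectors. A minimal sketch, assuming a precomputed kernel matrix `K` (list of lists) and an initial label per pixel; the function name, the stopping rule, and the toy data are illustrative choices, not taken from the slides.

```python
def kernel_kmeans(K, init_labels, n_clusters=2, max_iter=100):
    """Refine cluster labels using only the kernel matrix K."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c]
                   for c in range(n_clusters)]
        # Second term of the expansion: mean kernel value inside each cluster.
        within = [
            sum(K[j][l] for j in idx for l in idx) / len(idx) ** 2 if idx else None
            for idx in members
        ]
        new_labels = []
        for i in range(n):
            best, best_dist = labels[i], float("inf")
            for c, idx in enumerate(members):
                if not idx:  # skip empty clusters
                    continue
                cross = sum(K[i][j] for j in idx) / len(idx)
                dist = K[i][i] + within[c] - 2.0 * cross
                if dist < best_dist:
                    best, best_dist = c, dist
            new_labels.append(best)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels


# Toy 1-D example with a linear kernel (for illustration only).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
K = [[a * b for b in points] for a in points]
noisy_init = [0, 0, 1, 1, 1, 0]  # two deliberately wrong labels
labels = kernel_kmeans(K, noisy_init)
```

On this toy data the two mislabelled points are moved to the correct cluster after one pass, which is the refinement behaviour the next slide relies on.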



Initialization

• Optimization is non-convex and prone to local minima (greedy solution)

• Provide a rough pseudo-training set to ‘train’ the kernel k-means

• Automatically refines the cluster assignments if the initialization is noisy

[Figure panels: true distribution, input distribution, refined distribution]



Initialization

• Compute the magnitude of the difference image ‖D‖2 (in X )

• Model a bimodal Gaussian Mixture on it

• Dense regions of the Gaussians are used for sampling (thresholds ∝ std.)

• Many methods

• Reliable initialization, widely used in change detection (with minimum error thresholding)
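The initialization steps above can be sketched as follows. The slides do not give the exact sampling rule, so this example assumes that a pixel becomes a pseudo-training sample when its magnitude lies within `t` standard deviations of exactly one mixture mean. The EM routine, the parameter `t`, and all names are hypothetical.

```python
import math


def fit_two_gaussians(values, n_iter=50):
    """Plain EM for a 1-D mixture of two Gaussians."""
    lo, hi = min(values), max(values)
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    stds = [max((hi - lo) / 4.0, 1e-6)] * 2
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weights[k]
                 * math.exp(-0.5 * ((v - means[k]) / stds[k]) ** 2)
                 / (stds[k] * math.sqrt(2.0 * math.pi))
                 for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: update weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), 1e-6)
            weights[k] = nk / len(values)
    return means, stds, weights


def pseudo_labels(values, t=1.0):
    """0 = no change, 1 = change, None = too ambiguous to use for training."""
    means, stds, _ = fit_two_gaussians(values)
    order = sorted(range(2), key=lambda k: means[k])  # low mean = no change
    labels = []
    for v in values:
        hits = [rank for rank, k in enumerate(order)
                if abs(v - means[k]) <= t * stds[k]]
        labels.append(hits[0] if len(hits) == 1 else None)
    return labels


# Hypothetical difference magnitudes: five unchanged and four changed pixels.
magnitudes = [0.05, 0.10, 0.08, 0.12, 0.07, 2.0, 2.1, 1.9, 2.05]
init = pseudo_labels(magnitudes)
```

Pixels far from both modes are left unlabelled, so only the dense region of each Gaussian feeds the kernel k-means.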



Cost Function

• But how to tune the kernel parameters Θ? They define the correct representation of the data!

• Rely on geometrical properties: unsupervised

• Closely related to Fisher’s ratio (maximum separation)

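$$\Theta^{*} = \arg\min_{\Theta} \frac{\frac{1}{n} \sum_{k} \sum_{i \in \pi_k} d(\varphi(x_i), \mu_k \mid \Theta)}{\sum_{k \neq p} d(\mu_k, \mu_p \mid \Theta)}$$

• Explicitly computed in H

• Centroids with Θ∗ are used to assign pixels to clusters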
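The cost above can be evaluated from the kernel matrix alone. The sketch below assumes that d is the squared Euclidean distance in H (the slides do not specify it) and sums the denominator over ordered pairs k ≠ p, as written. The name `cluster_cost` and the toy data are hypothetical.

```python
def cluster_cost(K, labels):
    """Mean within-cluster distance divided by summed between-centroid distance."""
    n = len(K)
    clusters = sorted(set(labels))
    members = {c: [i for i in range(n) if labels[i] == c] for c in clusters}

    def mean_k(a, b):
        # Average kernel value between index sets a and b.
        return sum(K[i][j] for i in a for j in b) / (len(a) * len(b))

    # Numerator: average squared distance of each sample to its centroid.
    within = 0.0
    for idx in members.values():
        self_term = mean_k(idx, idx)
        for i in idx:
            within += K[i][i] + self_term - 2.0 * mean_k([i], idx)
    within /= n

    # Denominator: squared distances between centroids, over ordered pairs.
    between = 0.0
    for k in clusters:
        for p in clusters:
            if k != p:
                between += (mean_k(members[k], members[k])
                            + mean_k(members[p], members[p])
                            - 2.0 * mean_k(members[k], members[p]))
    return within / between


# Toy 1-D example with a linear kernel (for illustration only).
points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
K = [[a * b for b in points] for a in points]
good = [0, 0, 0, 1, 1, 1]
bad = [0, 1, 0, 1, 0, 1]
cost_good = cluster_cost(K, good)
cost_bad = cluster_cost(K, bad)
```

Compact, well-separated clusters give a small ratio, so the partition `good` scores far lower than `bad`. The function does not guard against coinciding centroids, which would make the denominator zero.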



Cost Function - Example: Gaussian RBF kernel k-means

[Figure: 2-D toy dataset (true distribution) partitioned by Gaussian RBF kernel k-means, with the cost function and the true error plotted against σ]

• Tuning a single RBF σ

• The minimum of the cost function corresponds approximately to the minimum error

Values shown on the successive frames (σ from 0.2 to 2):

σ      Cost        True error
0.2    1.6931      0.483
0.5    0.10778     0.009
0.7    0.056636    0.01
0.9    0.043534    0.014
1      0.041345    0.016
1.1    0.040596    0.019
1.5    0.043006    0.148
2      0.05359     0.185



VHR QB Zurich, Switzerland

• Brüttisellen (Zurich), Switzerland

• Pansharpened, registration error ≈ 1 px

• Some differences in illumination (sun angle), shadows

[Images: 2002 and 2006 acquisitions]




Kernel k-Means on D ∈ X

Example: nonlinear partitioning of the difference image in input space

• 1 parameter to find, line search

[Figure: cost function, distance between clusters, and distance within clusters plotted against the RBF σ value (0 to 10); change map; maps of the distance to ‘No change’ and to ‘Change’]
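The line search over a single RBF σ can be sketched as follows: for each candidate σ, build the kernel on the input-space difference vectors, refine the pseudo-labels with kernel k-means, and keep the σ with the lowest cost. The candidate grid, the two-cluster restriction, the squared-distance choice for d, and all function names are assumptions made for illustration.

```python
import math


def rbf_matrix(points, sigma):
    """Gaussian RBF kernel matrix of a list of difference vectors."""
    return [[math.exp(-sum((u - v) ** 2 for u, v in zip(a, b)) / (2 * sigma ** 2))
             for b in points] for a in points]


def mean_k(K, a, b):
    return sum(K[i][j] for i in a for j in b) / (len(a) * len(b))


def sq_dist(K, i, idx):
    """Squared distance in H between sample i and the centroid of idx."""
    return K[i][i] + mean_k(K, idx, idx) - 2 * mean_k(K, [i], idx)


def split(labels):
    return [[i for i, l in enumerate(labels) if l == c] for c in (0, 1)]


def kernel_kmeans(K, labels, max_iter=50):
    labels = list(labels)
    for _ in range(max_iter):
        groups = split(labels)
        if not all(groups):  # one cluster emptied: stop
            break
        new = [min((0, 1), key=lambda c: sq_dist(K, i, groups[c]))
               for i in range(len(K))]
        if new == labels:
            break
        labels = new
    return labels


def cost(K, labels):
    groups = split(labels)
    if not all(groups):
        return float("inf")
    within = sum(sq_dist(K, i, g) for g in groups for i in g) / len(K)
    between = (mean_k(K, groups[0], groups[0]) + mean_k(K, groups[1], groups[1])
               - 2 * mean_k(K, groups[0], groups[1]))
    return within / between if between > 0 else float("inf")


def line_search(points, init_labels, sigmas):
    """Return (cost, sigma, labels) for the candidate with the lowest cost."""
    results = []
    for sigma in sigmas:
        K = rbf_matrix(points, sigma)
        labels = kernel_kmeans(K, init_labels)
        results.append((cost(K, labels), sigma, labels))
    return min(results, key=lambda r: r[0])


# Hypothetical 1-D difference values with one wrong initial label.
points = [[0.0], [0.1], [0.2], [3.0], [3.1], [3.2]]
candidates = [0.2, 0.5, 1.0, 2.0]
best_cost, best_sigma, labels = line_search(points, [0, 0, 1, 1, 1, 1], candidates)
```

Because the cost needs no reference labels, the search stays fully unsupervised; the reference map is only used afterwards to report the true error.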




Kernel k-Means on D ∈ H

Example: nonlinear partitioning of the difference image in RKHS

• Grid search of 2 parameters: σsingle and σcross

[Figure: within-cluster distance, between-cluster distance, and cost function shown over the RBF σ of the single-image kernel (x-axis, 0.5 to 2.5) and the RBF σ of the cross kernel (y-axis, 0.5 to 3.5); change map; maps of the distance to ‘No change’ and to ‘Change’]




• 10 independent runs, random initialization

• 200 pseudo-training samples

• Best accuracy is given by a low false alarm rate

• Hit rate is similar for the three approaches

Skill scores:

Approach       Est. κ    AUC
H diff, RBF    0.756     0.992
X diff, RBF    0.686     0.983
X diff, lin    0.599     0.811

[Figure: bar chart of percentages (0 to 100 %) for the change (Ch.) and no-change (Nch.) classes, for each of the three approaches]
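The slides do not say how κ, the hit rate, or the false alarm rate were computed. The sketch below uses the standard definitions for a binary change / no-change confusion matrix; the function names and the counts are hypothetical and do not reproduce the reported results.

```python
def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa: agreement with the reference map beyond chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1.0 - expected)


def hit_rate(tp, fn):
    """Fraction of truly changed pixels that were detected."""
    return tp / (tp + fn)


def false_alarm_rate(fp, tn):
    """Fraction of unchanged pixels wrongly flagged as changed."""
    return fp / (fp + tn)


# Hypothetical counts: 'change' is the positive class.
tp, fp, fn, tn = 90, 40, 10, 860
kappa = cohen_kappa(tp, fp, fn, tn)
hr = hit_rate(tp, fn)
far = false_alarm_rate(fp, tn)
```

Since unchanged pixels usually dominate the scene, a small drop in the false alarm rate moves κ considerably even when the hit rate stays the same.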



Conclusions

• Difference image in H seems to be a better representation (greatly reduces the false alarm rate)

• Pseudo-training samples can be obtained at zero cost: best trade-off between computational time and accuracy

• Parameters learned from data

• Correct convergence is still not ensured (kernel k-means)

• The relationship between kernel and overfitting is still unclear

• Inclusion of spatial domain

• Correct convergence ↔ Initialization ↔ Change detection



Thank you for your attention! And thanks to my co-authors and colleagues.
