
Change Detection Tests Using the ICI rule

Giacomo Boracchi

Dipartimento di Elettronica e Informazione,

Politecnico di Milano

[email protected]

Department of Signal Processing

Tampere University of Technology

4 October 2011

Application Scenario

▪ Reliable systems working in real-world environments have to handle unpredictable events that change the data-generating process:

  • ageing effects
  • thermal drifts
  • faults

▪ Change detection is relevant for two main reasons:

  1. Changes may be due to faults, malfunctions and ageing effects, e.g. in industrial production (Statistical Process Control, control charts).
  2. On-line systems (such as a classifier) have to adapt to the new operating conditions to maintain their performance after the change.

[Alippi10a] Alippi, C., Boracchi, G., Roveri, M., "Change Detection Tests Using the ICI rule", in Proceedings of IJCNN 2010, 18-23 July 2010, Barcelona, Spain.

[Alippi10b] Alippi, C., Boracchi, G., Roveri, M., "Adaptive Classifiers with ICI-based Adaptive Knowledge Base Management", in Proceedings of ICANN 2010, 15-18 September 2010, Thessaloniki, Greece.


Problem Statement

Problem statement

▪ Let $X : \mathbb{N} \to \mathbb{R}$ be the stochastic process.
▪ $X$ is stationary until $T^*$ (i.e. data are i.i.d.), then it may change.
▪ The goal is to determine $T^*$, the time instant when $X$ becomes non-stationary.
▪ Change has to be detected online.

What do we know?

▪ parametric tests: the distribution of $X$ (before and/or after the change)
▪ non-parametric tests: a training set $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$, with $T^* > T_0$, generated by $X$ in stationary conditions.

▪ Design an online statistical test for change detection.


Change Detection Tests

▪ Parametric statistical tests:

  • CUSUM test (control area): based on cumulative sum charts
    Pros: widely used, easy to implement
    Cons: requires knowledge of the pdfs; parameters to fix

▪ Non-parametric statistical tests:

  • Mann-Kendall test (environmental science): evaluates the signs of the differences between observed values
    Pros: widely used, easy to implement
    Cons: thresholds to fix, high computational complexity

  • CI-CUSUM test (computational intelligence): based on features extracted from data
    Pros: automatic configuration of the parameters, effective
    Cons: requires a large training set

The contributions

1. We propose a change detection test that requires only a training set (TS).

2. Once a change has been detected, we give the test the capability to identify a novel training set.

[Figure: timeline of the observations showing the training set $O_{T_0}$ (ending at $T_0$), the change time $T^*$ and the detection time $\hat{T}$.]


The core elements of the test

1. The ICI rule

• The Intersection of Confidence Intervals (ICI) rule is an adaptation algorithm used to define neighborhoods for polynomial regression [Goldenshluger97, Katkovnik99]

[Goldenshluger97] Goldenshluger, A., Nemirovski, A., "On spatially adaptive estimation of nonparametric regression", Math. Meth. Statistics, vol. 6, pp. 135-170, 1997.

[Katkovnik99] Katkovnik, V., "A new method for varying adaptive bandwidth selection", IEEE Transactions on Signal Processing, vol. 47, no. 9, pp. 2567-2571, 1999.

The ICI rule

▪ The ICI rule operates, combined with a polynomial regression technique, on sequences of noisy data having a Gaussian distribution.

▪ Given

  • a set of nested neighborhoods in $t$
  • the corresponding polynomial fits
  • the value of […]

▪ the ICI rule selects an adaptive neighborhood, namely the largest one for which the polynomial model fits the data.

▪ Thus, it is reasonable to exploit the ICI-selected neighborhood to detect nonstationarity in the process.





2. The Gaussianization via Feature Extraction

• Since data distribution is unknown, ICI cannot be directly applied on observed data


Gaussianization via Feature Extraction

▪ Process stationarity is monitored through features, i.e. functions of the observed values that are Gaussian distributed, such as:

  • the sample mean of non-overlapping sets of data
  • the sample variance of non-overlapping sets of data, transformed with a Gaussianizing power law [Mudholkar81]
  • ...

▪ When $X$ is stationary, any feature should follow a Gaussian distribution with constant parameters.

▪ The ICI rule is used on feature values to determine to what extent these can be considered constant, and thus whether $X$ is stationary.

[Mudholkar81] Mudholkar G. S., Trivedi M. C.: “A Gaussian Approximation to the Distribution of the Sample Variance for Nonnormal Populations”. In: Journal of the American Statistical Association, Vol. 76, No. 374 (Jun., 1981), pp. 479-485


• Features are the handles on the process:

  − Only changes affecting the features can be perceived
  − A change is detected in $X$ when a change is detected in any feature
  − Besides the two sample moments, other features can be devised

Training Phase

▪ The training set $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ has to provide, for each feature

  • an estimate of its expected value (in stationarity)
  • an estimate of its standard deviation (in stationarity)

▪ $O_{T_0}$ is divided into non-overlapping subsequences of $\nu$ observations

▪ In each subsequence we compute

  • the sample mean $M(\cdot)$
  • the sample variance $S(\cdot)$

[Figure: the training set $O_{T_0}$ split into subsequences, yielding $M(1), S(1)$ up to $M(S_0), S(S_0)$ at $T_0$.]


▪ The whole training set $O_{T_0}$ is used to compute the exponent $h_0$ of the power-law transformation [Mudholkar81], which gives the sample variance an approximately Gaussian distribution:

$$\mathcal{T}(S(s)) = \left( \frac{S(s)}{\nu - 1} \right)^{h_0}$$





▪ The values of the sample variance are replaced by $V(s) = \mathcal{T}(S(s))$

▪ For each feature we compute

  • the mean (in stationarity): $\hat{\mu}^M_{S_0}$, $\hat{\mu}^V_{S_0}$
  • the standard deviation (in stationarity): $\hat{\sigma}^M_{S_0}$, $\hat{\sigma}^V_{S_0}$

[Figure: the training set $O_{T_0}$ split into subsequences, yielding $M(1), V(1)$ up to $M(S_0), V(S_0)$ at $T_0$.]

The Test

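▪ Summarizing, the training phase provides the test with

  • an estimate of each feature distribution, and in particular the confidence intervals

$$\mathcal{I}^M_{S_0} = \left[ \hat{\mu}^M_{S_0} - \Gamma \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \Gamma \hat{\sigma}^M_{S_0} \right]$$

$$\mathcal{I}^V_{S_0} = \left[ \hat{\mu}^V_{S_0} - \Gamma \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \Gamma \hat{\sigma}^V_{S_0} \right]$$

▪ The test is composed of the following three parts:

X(t) → Feature Extraction → Polynomial Regression → ICI rule → test outcome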
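The training-phase estimates and the initial confidence intervals can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `train_intervals`, its arguments and the returned dictionary layout are assumptions, and the feature values `M` and `V` are taken as already computed on the training subsequences.

```python
import numpy as np

def train_intervals(M, V, gamma):
    """Estimate mean, standard deviation and initial confidence interval
    of each feature from its values on the training subsequences.

    M, V  : feature values on the S0 training subsequences
    gamma : the parameter Gamma scaling the confidence intervals
    """
    result = {}
    for name, values in (("M", M), ("V", V)):
        values = np.asarray(values, dtype=float)
        mu = values.mean()            # \hat{mu}_{S0}
        sigma = values.std(ddof=1)    # \hat{sigma}_{S0}, with the S0 - 1 denominator
        interval = (mu - gamma * sigma, mu + gamma * sigma)
        result[name] = (mu, sigma, interval)
    return result
```

The sample standard deviation uses `ddof=1` so that the denominator is $S_0 - 1$.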


The ICI-based change detection test: feature extraction

▪ Upcoming observations are partitioned in non-overlapping subsequences of $\nu$ observations

▪ Features are computed on each subsequence $s$:

$$M(s) = \frac{1}{\nu} \sum_{t=(s-1)\nu+1}^{s\nu} X(t)$$

$$S(s) = \sum_{t=(s-1)\nu+1}^{s\nu} \left( X(t) - M(s) \right)^2$$

$$V(s) = \mathcal{T}(S(s))$$

[Figure: the observations $X(t)$ versus $t$, split into subsequences of $\nu$ observations; each subsequence $s$ yields $M(s)$ and $V(s)$.]
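The feature extraction above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions: the function name `extract_features` is hypothetical, and the exponent `h0` is treated as a given input. Its default of 1/3 is the cube-root (Wilson-Hilferty) exponent appropriate for Gaussian data, not the value obtained from the cumulant-based computation of [Mudholkar81], which is not reproduced here.

```python
import numpy as np

def extract_features(x, nu, h0=1.0 / 3.0):
    """Compute M(s) and V(s) on non-overlapping subsequences of nu samples.

    h0 is assumed to be known; an incomplete trailing subsequence is dropped.
    """
    x = np.asarray(x, dtype=float)
    n_sub = len(x) // nu
    blocks = x[: n_sub * nu].reshape(n_sub, nu)
    M = blocks.mean(axis=1)                        # sample mean M(s)
    S = ((blocks - M[:, None]) ** 2).sum(axis=1)   # sum of squared deviations S(s)
    V = (S / (nu - 1)) ** h0                       # power-law transform V(s) = T(S(s))
    return M, V
```

For example, `extract_features(x, nu=20)` on 2000 observations returns two arrays of 100 feature values each.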


The ICI-based change detection test: polynomial regression

▪ Let us consider a set of nested neighborhoods including the training set, i.e. $\{[1, S_0], \dots, [1, S]\}$

▪ The estimates associated with each neighborhood are obtained by the 0th-order polynomial fit of the feature values (stationary processes provide constant feature values)

▪ Any optimal neighborhood selected by the ICI rule contains feature values that can be considered constant, and consistent with the training set

[Figure: "Nested neighborhoods for 0th order polynomial regression": feature values versus subsequence index, with neighborhoods $U_0, \dots, U_3$ ending at $S_0, \dots, S_3$; the features are stationary.]


The ICI-based change detection test: the ICI rule

▪ The ICI rule selects an adaptive neighborhood for fitting constant functions to features

▪ For each neighborhood $i$ we have:

  • the estimate of the polynomial regression, $\hat{\mu}_i$
  • the standard deviation of the polynomial estimator, $\sigma_i$

▪ However, we are not interested in the estimates provided by the ICI rule themselves, but rather in their neighborhoods.

[Figure: feature values and the estimates $\hat{\mu}_i$ with their confidence intervals, versus neighborhood index (0 to 3).]


▪ Compute iteratively the intersection of the confidence intervals.

▪ The ICI rule selects the largest neighborhood $[1, S_j]$ for which the intersection $\mathcal{I}$ is not empty.

▪ The ICI rule acts as a nonstationarity test, determining whether the feature can be treated as constant within the considered time interval.

[Figure: confidence intervals versus neighborhood index (0 to 3); a change is detected when the intersection becomes empty.]
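The iterative intersection can be sketched as below. This is a generic illustration of the ICI rule on a sequence of estimates, assuming the estimates and their standard deviations are already available for each nested neighborhood; the function name `ici_select` and its return convention are hypothetical.

```python
import math

def ici_select(estimates, sigmas, gamma):
    """Return the index of the largest neighborhood whose confidence
    interval still intersects all the previous ones, or -1 if none does.

    estimates[i], sigmas[i] : estimate and standard deviation of the
                              estimator on the i-th nested neighborhood
    gamma                   : the parameter Gamma scaling the intervals
    """
    lower, upper = -math.inf, math.inf
    selected = -1
    for i, (mu, sd) in enumerate(zip(estimates, sigmas)):
        # Intersect the running interval with [mu - gamma*sd, mu + gamma*sd].
        lower = max(lower, mu - gamma * sd)
        upper = min(upper, mu + gamma * sd)
        if lower > upper:      # empty intersection: stop here
            break
        selected = i
    return selected
```

When the returned index is smaller than the index of the last neighborhood, the intersection became empty, which the test interprets as a change.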

Test Execution

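1. Compute the features on the current subsequence: $M(s)$, $V(s)$

2. Fit a constant value to each feature in $[1, s]$:

$$\hat{\mu}^M_s = \frac{(s-1)\,\hat{\mu}^M_{s-1} + M(s)}{s}, \qquad \hat{\mu}^V_s = \frac{(s-1)\,\hat{\mu}^V_{s-1} + V(s)}{s}$$

3. Compute the estimator's standard deviation:

$$\hat{\sigma}^M_s = \frac{\hat{\sigma}^M_{S_0}}{\sqrt{s}}, \qquad \hat{\sigma}^V_s = \frac{\hat{\sigma}^V_{S_0}}{\sqrt{s}}$$

4. Intersect the confidence intervals:

$$\mathcal{I}^M_s = \left[ \hat{\mu}^M_s - \Gamma \hat{\sigma}^M_s;\ \hat{\mu}^M_s + \Gamma \hat{\sigma}^M_s \right] \cap \mathcal{I}^M_{s-1}$$

$$\mathcal{I}^V_s = \left[ \hat{\mu}^V_s - \Gamma \hat{\sigma}^V_s;\ \hat{\mu}^V_s + \Gamma \hat{\sigma}^V_s \right] \cap \mathcal{I}^V_{s-1}$$

5. Continue if $(\mathcal{I}^M_s \neq \emptyset \ \&\&\ \mathcal{I}^V_s \neq \emptyset)$

[Flowchart: subsequence $s$ → Feature Extraction ($M(s)$, $V(s)$) → Polynomial Regression → ICI rule (using $\mathcal{I}^M_{s-1}$, $\mathcal{I}^V_{s-1}$) → Detect? If no, move to the next subsequence.]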
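As a short worked check of steps 2 and 3, the recursive update is just the running average of the feature values, i.e. the 0th-order polynomial fit on $[1, s]$. Here $F$ stands for either feature ($M$ or $V$) and $\sigma$ for its standard deviation in stationarity; both symbols are introduced only for this note, and the variance formula assumes the feature values are i.i.d.

$$\hat{\mu}_s = \frac{1}{s} \sum_{i=1}^{s} F(i) = \frac{1}{s} \left( \sum_{i=1}^{s-1} F(i) + F(s) \right) = \frac{(s-1)\,\hat{\mu}_{s-1} + F(s)}{s}$$

$$\operatorname{Var}(\hat{\mu}_s) = \frac{1}{s^2} \sum_{i=1}^{s} \operatorname{Var}(F(i)) = \frac{\sigma^2}{s} \quad \Longrightarrow \quad \operatorname{std}(\hat{\mu}_s) = \frac{\sigma}{\sqrt{s}}$$

The confidence intervals therefore shrink as $1/\sqrt{s}$ while the process stays stationary.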

Experiments – Figures of Merit

▪ Change detection performance is evaluated considering

  • False Positives (FP): counts the times a test detects a change in the sequence when there is none.
  • False Negatives (FN): counts the times a test does not detect a change when there is one.
  • Recognition Delay (RD): measures the time delay in detecting a change.
  • Computational Time (CT): the execution time needed to perform the test (reference platform: Intel Xeon CPU 2.33 GHz)

[Figure: timeline showing the training set $O_{T_0}$ (ending at $T_0$), the change time $T^*$ and the detection time $\hat{T}$; RD is the interval between $T^*$ and $\hat{T}$.]
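These figures of merit can be computed from a batch of runs along the following lines. The sketch rests on assumptions that the slides do not spell out: a detection before $T^*$ is counted as a false positive, a run with no detection as a false negative, and RD is averaged over the remaining runs only. The function name and the input format are hypothetical.

```python
def figures_of_merit(runs):
    """runs: list of (t_star, t_hat) pairs, with t_hat = None when the
    test never fires. Returns FP and FN as percentages and the mean RD."""
    false_pos = false_neg = 0
    delays = []
    for t_star, t_hat in runs:
        if t_hat is None:
            false_neg += 1                 # change never detected
        elif t_hat < t_star:
            false_pos += 1                 # detection before the change
        else:
            delays.append(t_hat - t_star)  # recognition delay of this run
    n = len(runs)
    return {
        "FP": 100.0 * false_pos / n,
        "FN": 100.0 * false_neg / n,
        "RD": sum(delays) / len(delays) if delays else None,
    }
```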

Experiments: datasets and tests

▪ Application D1: mono-dimensional Gaussian process with four kinds of perturbations

  • Abrupt change on mean (variance)
  • Drift on mean (variance)

▪ Application D2: SATIMAGE dataset (Landsat multispectral images)

▪ Application D3: Self-Assembled-Monolayer gas sensors

▪ The ICI-based change detection test is compared with:

  • CUSUM test (in Application D1)
  • Mann-Kendall test
  • CI-CUSUM test

▪ Two configurations for the CI-CUSUM and the ICI-based CDT

  • Long training sequence (2000 observations)
  • Short training sequence (500 observations)
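Sequences in the spirit of Application D1 can be generated with a sketch like the one below. The slides do not give the perturbation magnitudes or the drift profile, so everything here is an assumption made for illustration: a unit-variance Gaussian process, a shift of size `delta`, and a linear ramp for the drift that starts at the change time and reaches `delta` at the end of the sequence.

```python
import numpy as np

def make_sequence(n, t_star, kind="abrupt", target="mean", delta=1.0, seed=0):
    """Gaussian sequence of length n, stationary (N(0, 1)) before t_star.

    kind   : "abrupt" (step at t_star) or "drift" (linear ramp, an assumption)
    target : "mean" or "variance" (delta is added to the mean or to the std)
    """
    if kind not in ("abrupt", "drift") or target not in ("mean", "variance"):
        raise ValueError("unknown kind or target")
    rng = np.random.default_rng(seed)
    t = np.arange(n)
    if kind == "abrupt":
        profile = (t >= t_star).astype(float)
    else:
        profile = np.clip((t - t_star) / max(n - t_star, 1), 0.0, 1.0)
    mean = delta * profile if target == "mean" else 0.0
    std = 1.0 + delta * profile if target == "variance" else 1.0
    return mean + std * rng.standard_normal(n)
```

For instance, `make_sequence(4000, 2000, kind="drift", target="variance")` is stationary over the first 2000 samples and then has a slowly growing standard deviation.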

Application D1: 2000 training samples



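▪ When shorter training sequences are available, the performance gap between the ICI test and CI-CUSUM increases even further.

| Perturbation | Figure of merit | CUSUM | Mann-Kendall | CI-CUSUM (T0=2000) | ICI test (T0=2000) | CI-CUSUM (T0=500) | ICI test (T0=500) |
|---|---|---|---|---|---|---|---|
| Abrupt, mean | FP (%) | 0 | 7.3 | 0 | 0 | 7.7 | 5.5 |
| Abrupt, mean | FN (%) | 0 | 0 | 0 | 0 | 0 | 0 |
| Abrupt, mean | RD (samples) | 11.4 | 94.9 | 386.1 | 149.5 | 345.0 | 140.5 |
| Abrupt, mean | CT (s) | 0.5 | 1044.0 | 6.9 | 0.12 | 6.9 | 0.1 |
| Drift, mean | FP (%) | n/a | 8 | 0 | 0 | 8.0 | 5.9 |
| Drift, mean | FN (%) | n/a | 0 | 0.3 | 0 | 0 | 0 |
| Drift, mean | RD (samples) | n/a | 590.0 | 1110.5 | 793.2 | 832.9 | 764.2 |
| Drift, mean | CT (s) | n/a | 1046.9 | 7.1 | 0.1 | 4.5 | 0.2 |
| Abrupt, variance | FP (%) | 0 | 10 | 0 | 0 | 8.0 | 5.9 |
| Abrupt, variance | FN (%) | 0 | 90 | 2.0 | 0 | 0 | 0 |
| Abrupt, variance | RD (samples) | 39.5 | n/a | 642.2 | 300.3 | 437.9 | 280.9 |
| Abrupt, variance | CT (s) | 0.5 | 1037.5 | 9.2 | 0.1 | 6.61 | 0.1 |
| Drift, variance | FP (%) | n/a | 10 | 0 | 0 | 9.4 | 5.8 |
| Drift, variance | FN (%) | n/a | 90 | 0 | 0 | 0 | 0 |
| Drift, variance | RD (samples) | n/a | n/a | 1029.1 | 630.8 | 765.8 | 597.6 |
| Drift, variance | CT (s) | n/a | 1050.3 | 8.8 | 0.13 | 7.4 | 0.2 |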

Application D2 and D3

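| Dataset | Perturbation | Figure of merit | CUSUM | Mann-Kendall | CI-CUSUM (T0=2000) | CI-CUSUM (T0=500) | ICI test (T0=2000) | ICI test (T0=500) |
|---|---|---|---|---|---|---|---|---|
| D2 | Abrupt | FP (%) | n/a | 6.0 | 0 | 26.6 | 0.0 | 7.8 |
| D2 | Abrupt | FN (%) | n/a | 41.8 | 12.0 | 7.3 | 6.0 | 2.6 |
| D2 | Abrupt | RD (samples) | n/a | 1003.0 | 574.7 | 487.1 | 196.1 | 229.1 |
| D2 | Abrupt | CT (s) | n/a | 51.7 | 2.3 | 2.0 | 0.05 | 0.07 |
| D2 | Drift | FP (%) | n/a | 6.1 | 0 | 25.3 | 0 | 7.3 |
| D2 | Drift | FN (%) | n/a | 58.5 | 22.6 | 9.3 | 10.7 | 8.6 |
| D2 | Drift | RD (samples) | n/a | 1718.1 | 1304.1 | 996.5 | 831 | 811.4 |
| D2 | Drift | CT (s) | n/a | 50.6 | 2.3 | 1.5 | 0.05 | 0.05 |
| D3 | Abrupt | FP (%) | n/a | n/a | 0.3 | 60.6 | 1.3 | 8.6 |
| D3 | Abrupt | FN (%) | n/a | n/a | 5.3 | 2.6 | 1.3 | 2.0 |
| D3 | Abrupt | RD (samples) | n/a | n/a | 384.5 | 438.1 | 361.7 | 295.4 |
| D3 | Abrupt | CT (s) | n/a | n/a | 81.4 | 55.1 | 0.1 | 0.5 |
| D3 | Drift | FP (%) | n/a | n/a | 0.7 | 61.6 | 0.6 | 14 |
| D3 | Drift | FN (%) | n/a | n/a | 12 | 1.3 | 4.6 | 4 |
| D3 | Drift | RD (samples) | n/a | n/a | 1911.3 | 1924.1 | 1890.1 | 1843.1 |
| D3 | Drift | CT (s) | n/a | n/a | 49.2 | 73.4 | 0.1 | 0.1 |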

Pros / Cons

▪ Pros

  • Good performance (prompter detections, fewer FP and FN)
  • Low computational complexity
  • No need for an alternative hypothesis: a change corresponds to the impossibility, according to the ICI rule, of fitting a zero-order polynomial to the whole feature set

▪ Cons

  • The process is handled by means of features:
    − changes that do not affect the features are not perceived
    − data are processed in subsequences
  • The ICI rule balances bias and variance, while a change detection test should be zero-bias.


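Performance of the ICI rule on long feature sequences

▪ How does the Recognition Delay, $RD = \hat{T} - T^*$, vary when $T^*$ increases?

▪ We follow a Monte Carlo approach, considering processes having abrupt changes at different time instants $T^*$.

[Figure: timeline showing the training set $O_{T_0}$ (ending at $T_0$), the change time $T^*$, the detection time $\hat{T}$ and the recognition delay RD.]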


ICI behavior on long time execution: Monte Carlo

▪ The later the change, the more observations are required to detect it.

▪ These delays cannot be analytically compensated on-line.

▪ The ICI rule provides prompter detection on shorter observation sequences.

[Figure: Recognition Delay versus $T^*$ (axis in units of $10^4$).]
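A back-of-the-envelope argument, not taken from the slides, suggests why later changes take longer to detect. It ignores noise, assumes an abrupt shift of size $\delta$ in the expected value of one feature, and uses the recursive estimate and the standard deviation $\hat{\sigma}_s = \hat{\sigma}_{S_0}/\sqrt{s}$ from the Test Execution slide. The symbols $s^*$ (index of the last pre-change subsequence), $k$ (number of post-change subsequences) and $\mu_0$ (pre-change expected value) are introduced here for illustration only.

$$E\left[\hat{\mu}_{s^*+k}\right] = \frac{s^* \mu_0 + k\,(\mu_0 + \delta)}{s^* + k} = \mu_0 + \delta\,\frac{k}{s^* + k}$$

The running intersection cannot extend above $\hat{\mu}_{s^*} + \Gamma \hat{\sigma}_{s^*}$, so it is certainly empty once the lower bound of the newest interval exceeds that value, i.e. once the estimate has moved by

$$c = \Gamma \left( \hat{\sigma}_{s^*} + \hat{\sigma}_{s^*+k} \right) \le \frac{2\,\Gamma\,\hat{\sigma}_{S_0}}{\sqrt{s^*}}$$

Setting $\delta\,k/(s^*+k) = c$ gives

$$k = \frac{c\,s^*}{\delta - c} \approx \frac{2\,\Gamma\,\hat{\sigma}_{S_0}}{\delta}\,\sqrt{s^*} \qquad (c \ll \delta)$$

Under these simplifications the number of post-change subsequences needed grows with $s^*$, roughly like $\sqrt{s^*}$, which is consistent with the trend reported above.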

Refinement Procedure

▪ Let $\hat{T}$ be the test outcome

▪ We split the segment into […]

▪ We run the test on […], where it should react faster

▪ Assume there is a detection in […]

▪ Set […], and run the test on […]

▪ Assume that there is a detection in […]

▪ Stop as […]

▪ The refined estimate is $T_{ref}$

Refinement Procedure Performance

▪ The change-detection refinement procedure effectively reduces Recognition Delays when $T^*$ increases

[Figure: Recognition Delay versus $T^*$ (axis in units of $10^4$).]


What is the refinement procedure for?

▪ The gap between the initial detection and the refined detection is assumed to be composed of samples generated by $X$ in the novel state

▪ These samples can be considered representative of the novel stationary state and can be used as a new training set for the test, in order to detect further changes

[Figure: timeline showing the training set $O_{T_0}$ (ending at $T_0$), the change time $T^*$, the refined estimate $T_{ref}$ and the detection time $\hat{T}$.]

Experiments

▪ The test has been paired with an on-line classification system.

  • Data are taken from two different x-ray sources; the goal is to determine the source
  • k-NN classifier; one sample out of five is classified by a supervisor

[Figure: classification error, over 30 runs.]

Ongoing Works

▪ Modeling the behavior of the ICI rule as $t$ increases,

  • to motivate the refinement procedure
  • comparison with parametric tests

▪ Use Gaussianizing transforms (such as Box-Cox) to define additional features

▪ Use Gaussianity tests on features when a novel training set is identified

▪ The test can be used to monitor polynomial trends in features

▪ A truly multivariate extension would be very useful...

Training Phase

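1. Compute the features on the training subsequences of $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$:

$$\{M(s),\ s = 1, \dots, S_0\}, \quad \{S(s),\ s = 1, \dots, S_0\}, \quad S_0 = T_0 / \nu$$

2. Compute the polynomial estimate of the features:

$$\hat{\mu}^M_{S_0} = \sum_{s=1}^{S_0} \frac{M(s)}{S_0}, \qquad \hat{\sigma}^M_{S_0} = \sqrt{ \frac{ \sum_{s=1}^{S_0} \left( M(s) - \hat{\mu}^M_{S_0} \right)^2 }{ S_0 - 1 } }$$

3. Compute the first six cumulants of $X$ from $O_{T_0}$

4. Compute $h_0$ as in [Mudholkar81] and define $\{V(s) = \mathcal{T}(S(s)),\ s = 1, \dots, S_0\}$

5. Compute

$$\hat{\mu}^V_{S_0} = \sum_{s=1}^{S_0} \frac{V(s)}{S_0}, \qquad \hat{\sigma}^V_{S_0} = \sqrt{ \frac{ \sum_{s=1}^{S_0} \left( V(s) - \hat{\mu}^V_{S_0} \right)^2 }{ S_0 - 1 } }$$

6. Define

$$\mathcal{I}^M_{S_0} = \left[ \hat{\mu}^M_{S_0} - \Gamma \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \Gamma \hat{\sigma}^M_{S_0} \right], \qquad \mathcal{I}^V_{S_0} = \left[ \hat{\mu}^V_{S_0} - \Gamma \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \Gamma \hat{\sigma}^V_{S_0} \right]$$

[Flowchart: training samples → Training Phase → $\mathcal{I}^M_{S_0}$, $\mathcal{I}^V_{S_0}$.]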


The ICI-based change detection test

▪ The proposed test relies on a set of functions that transform the observations into Gaussian-distributed features

▪ The ICI rule, combined with a polynomial regression technique, assesses the stationarity of the features (and hence of the process)

X(t) → Feature Extraction → Polynomial Regression → ICI rule → test outcome

A change is detected in the process when at least one of the features shows a change



Problem Statement

▪ Let $X : \mathbb{N} \to \mathbb{R}^d$ be the data-generating process

▪ Let $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ be the training set, where $X$ is stationary.

▪ The goal is to determine $T^*$ $(T^* > T_0)$, the time instant when $X$ becomes non-stationary

▪ $\{X(t),\ t < T^*\}$ are i.i.d. samples; the pdf of $X$ is unknown

  • practical implications (updating the knowledge base (KB) of a classifier)


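The ICI-based change detection test: the algorithm

Training phase, on the training samples $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$:

1. Compute $\{M(s),\ s = 1, \dots, S_0\}$ and $\{S(s),\ s = 1, \dots, S_0\}$

2. $\hat{\mu}^M_{S_0} = \sum_{s=1}^{S_0} \frac{M(s)}{S_0}$, $\quad \hat{\sigma}^M_{S_0} = \sqrt{ \frac{ \sum_{s=1}^{S_0} ( M(s) - \hat{\mu}^M_{S_0} )^2 }{ S_0 - 1 } }$

3. Define $\mathcal{I}^M_{S_0} = \left[ \hat{\mu}^M_{S_0} - \Gamma \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \Gamma \hat{\sigma}^M_{S_0} \right]$

4. Compute the first six cumulants of $X$ from TS

5. Compute $\{V(s) = \mathcal{T}(S(s)),\ s = 1, \dots, S_0\}$

6. $\hat{\mu}^V_{S_0} = \sum_{s=1}^{S_0} \frac{V(s)}{S_0}$, $\quad \hat{\sigma}^V_{S_0} = \sqrt{ \frac{ \sum_{s=1}^{S_0} ( V(s) - \hat{\mu}^V_{S_0} )^2 }{ S_0 - 1 } }$

7. Define $\mathcal{I}^V_{S_0} = \left[ \hat{\mu}^V_{S_0} - \Gamma \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \Gamma \hat{\sigma}^V_{S_0} \right]$

8. Set $s = S_0$

Test execution:

9. while $(\mathcal{I}^M_s \neq \emptyset \ \&\&\ \mathcal{I}^V_s \neq \emptyset)$ {

10. Set $s = s + 1$

11. Wait for $\nu$ samples, until $Y(s)$ is populated

12. Compute $M(s)$ and $V(s)$

13. $\hat{\mu}^M_s = \frac{(s-1)\,\hat{\mu}^M_{s-1} + M(s)}{s}$, $\quad \hat{\sigma}^M_s = \frac{\hat{\sigma}^M_{S_0}}{\sqrt{s}}$

14. $\hat{\mu}^V_s = \frac{(s-1)\,\hat{\mu}^V_{s-1} + V(s)}{s}$, $\quad \hat{\sigma}^V_s = \frac{\hat{\sigma}^V_{S_0}}{\sqrt{s}}$

15. $\mathcal{I}^M_s = \left[ \hat{\mu}^M_s - \Gamma \hat{\sigma}^M_s;\ \hat{\mu}^M_s + \Gamma \hat{\sigma}^M_s \right] \cap \mathcal{I}^M_{s-1}$

16. $\mathcal{I}^V_s = \left[ \hat{\mu}^V_s - \Gamma \hat{\sigma}^V_s;\ \hat{\mu}^V_s + \Gamma \hat{\sigma}^V_s \right] \cap \mathcal{I}^V_{s-1}$

17. }

18. Detect a change in $[(s-1)\nu,\ s\nu]$

[Flowchart: training samples → Training Phase; then, for each sequence $s$: Feature Extraction → Polynomial Regression → ICI rule (using $\mathcal{I}^M_{s-1}$, $\mathcal{I}^V_{s-1}$) → Detect? If no, move to the next sequence; if yes, Change Detected.]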
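The algorithm above can be sketched end to end in Python. This is a minimal sketch under stated assumptions rather than the authors' implementation: the function name and signature are hypothetical, the exponent `h0` is passed in instead of being computed from the six cumulants as in [Mudholkar81] (the default 1/3 is the cube-root exponent suited to Gaussian data), and the whole sequence is processed off-line instead of waiting for samples.

```python
import numpy as np

def ici_change_detection(x, t0, nu, gamma, h0=1.0 / 3.0):
    """Return the sample index at which a change is detected, or None.

    x     : observations; the first t0 form the training set
    nu    : subsequence length
    gamma : the parameter Gamma scaling the confidence intervals
    h0    : power-law exponent (assumed given, see the note above)
    """
    x = np.asarray(x, dtype=float)
    s0 = t0 // nu

    def features(block):
        m = block.mean()
        v = (((block - m) ** 2).sum() / (nu - 1)) ** h0
        return np.array([m, v])          # [M(s), V(s)]

    # Training phase (steps 1-8): means, standard deviations, initial intervals.
    train = np.array([features(x[i * nu:(i + 1) * nu]) for i in range(s0)])
    mu = train.mean(axis=0)
    sigma0 = train.std(axis=0, ddof=1)
    lower, upper = mu - gamma * sigma0, mu + gamma * sigma0

    # Test execution (steps 9-18), one subsequence at a time.
    s = s0
    while (s + 1) * nu <= len(x):
        s += 1
        f = features(x[(s - 1) * nu:s * nu])
        mu = ((s - 1) * mu + f) / s       # recursive 0th-order fit
        sigma = sigma0 / np.sqrt(s)       # estimator standard deviation
        lower = np.maximum(lower, mu - gamma * sigma)
        upper = np.minimum(upper, mu + gamma * sigma)
        if np.any(lower > upper):         # an intersection became empty
            return s * nu                 # change detected in ((s-1)*nu, s*nu]
    return None
```

A call such as `ici_change_detection(x, t0=2000, nu=20, gamma=2.5)` returns the index of the last sample of the subsequence in which the intersection of either feature became empty; the values of `nu` and `gamma` here are illustrative, not those used in the experiments.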
