Change Detection Tests Using the ICI rule
Giacomo Boracchi
Dipartimento di Elettronica e Informazione,
Politecnico di Milano
Department of Signal Processing
Tampere University of Technology
4 October 2011

Application Scenario
▪ Reliable systems working in real-world environments have to handle unpredictable events that change the data-generating process, such as
  • ageing effects
  • thermal drifts
  • faults
▪ Change detection is relevant for two main reasons:
  1. Changes may be due to faults, malfunctions and ageing effects, e.g. in industrial production (Statistical Process Control, Control Charts).
  2. On-line systems (such as a classifier) have to adapt to the new operating conditions to maintain their performance after the change.
[Alippi10a] Alippi, C., Boracchi, G., Roveri, M., "Change Detection Tests Using the ICI Rule", in Proceedings of IJCNN 2010, 18-23 July 2010, Barcelona, Spain.
[Alippi10b] Alippi, C., Boracchi, G., Roveri, M., "Adaptive Classifiers with ICI-based Adaptive Knowledge Base Management", in Proceedings of ICANN 2010, 15-18 September 2010, Thessaloniki, Greece.

Problem Statement
▪ Let $X: \mathbb{N} \to \mathbb{R}$ be the stochastic process.
▪ $X$ is stationary until $T^*$ ($T^* > T_0$), i.e. the data are i.i.d.; after $T^*$ it may change.
▪ The goal is to determine $T^*$, the time instant when $X$ becomes non-stationary.
▪ Design an online statistical test for change detection.
What do we know?
▪ Parametric tests: the distribution of $X$ (before and/or after the change)
▪ Non-parametric tests: a training set $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$, generated by $X$ in stationary conditions

Change Detection Tests
▪ Parametric statistical tests:
  • CUSUM test (control area): based on cumulative sum charts
    − Pros: widely used, easy to implement
    − Cons: requires knowledge of the pdfs; parameters to fix
▪ Non-parametric statistical tests:
  • Mann-Kendall test (environmental science): evaluates the signs of the differences between observed values
    − Pros: widely used, easy to implement
    − Cons: thresholds to fix, high computational complexity
  • CI-CUSUM test (computational intelligence): based on features extracted from the data
    − Pros: automatic configuration of the parameters, effective
    − Cons: requires a large training set
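
The contributions
1. We propose a change detection test that requires only a training set (TS).
2. Once a change has been detected, we give the test the capability to identify a new training set.
[Figure: timeline of the observations, marking the training set $O_{T_0}$ ending at $T_0$, the change time $T^*$ and the detection time $\hat{T}$.]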

The core elements of the test
1. The ICI rule
  • The Intersection of Confidence Intervals (ICI) rule is an adaptation algorithm used to define neighborhoods for polynomial regression [Goldenshluger97, Katkovnik99]
[Goldenshluger97] Goldenshluger, A., Nemirovski, A., "On spatially adaptive estimation of nonparametric regression", Math. Meth. Statistics, vol. 6, pp. 135-170, 1997.
[Katkovnik99] Katkovnik, V., "A new method for varying adaptive bandwidth selection", IEEE Transactions on Signal Processing, vol. 47, no. 9, pp. 2567-2571, 1999.

The ICI rule
▪ The ICI rule operates, combined with a polynomial regression technique, on sequences of noisy data having a Gaussian distribution.
▪ Given a set of nested neighborhoods in t
  • the corresponding polynomial fits
  • the value of
▪ The ICI rule selects an adaptive neighborhood, namely the largest one for which the polynomial model fits the data.
▪ Thus, it is reasonable to exploit the neighborhood selected by the ICI rule to detect nonstationarity in the process.
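
The selection step of the ICI rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `ici_select`, the default `gamma=2.0` and the sample numbers are assumptions made for the example. The estimates and their standard deviations are taken as given, ordered from the smallest to the largest neighborhood.

```python
import math


def ici_select(estimates, stds, gamma=2.0):
    """Return the index of the largest nested neighborhood whose confidence
    intervals [mu_i - gamma*sigma_i, mu_i + gamma*sigma_i] still share a
    common point. Hypothetical helper, written for illustration only."""
    lower, upper = -math.inf, math.inf
    selected = -1
    for i, (mu, sigma) in enumerate(zip(estimates, stds)):
        # Intersect the running interval with the i-th confidence interval.
        lower = max(lower, mu - gamma * sigma)
        upper = min(upper, mu + gamma * sigma)
        if lower > upper:  # empty intersection: stop at the previous index
            break
        selected = i
    return selected


# Made-up estimates: the last one is inconsistent with the first three.
estimates = [0.0, 0.1, 0.05, 2.0]
stds = [1.0, 0.7, 0.5, 0.4]
print(ici_select(estimates, stds))  # prints 2
```

The running intersection only needs two numbers (its lower and upper bound), so the rule can be evaluated incrementally as new neighborhoods become available.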

The core elements of the test
1. The ICI rule
  • The Intersection of Confidence Intervals (ICI) rule is an adaptation algorithm used to define neighborhoods for polynomial regression [Goldenshluger97, Katkovnik99]
2. Gaussianization via feature extraction
  • Since the data distribution is unknown, the ICI rule cannot be applied directly to the observed data

Gaussianization via Feature Extraction
▪ Process stationarity is monitored through features, i.e. functions of the observed values that are Gaussian distributed, such as:
  • the sample mean of non-overlapping sets of data
  • the sample variance of non-overlapping sets of data, transformed with a Gaussianizing power law [Mudholkar81]
  • ...
▪ When $X$ is stationary, any feature should follow a Gaussian distribution with constant parameters.
▪ The ICI rule is applied to the feature values to determine to what extent they can be considered constant, and thus whether the process is stationary.
[Mudholkar81] Mudholkar, G. S., Trivedi, M. C., "A Gaussian Approximation to the Distribution of the Sample Variance for Nonnormal Populations", Journal of the American Statistical Association, vol. 76, no. 374, pp. 479-485, June 1981.

The core elements of the test
1. The ICI rule
  • The Intersection of Confidence Intervals (ICI) rule is an adaptation algorithm used to define neighborhoods for polynomial regression [Goldenshluger97, Katkovnik99]
2. Gaussianization via feature extraction
  • Since the data distribution is unknown, the ICI rule cannot be applied directly to the observed data
  • Features represent the process handles:
    − Only changes affecting the features can be perceived
    − A change is detected in $X$ when a change is detected in any feature
    − Besides the two sample moments, other features can be devised

Training Phase
▪ The training set $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ has to provide, for each feature,
  • an estimate of its expected value (in stationarity)
  • an estimate of its standard deviation (in stationarity)
▪ $O_{T_0}$ is divided into non-overlapping subsequences of $\nu$ observations.
▪ In each subsequence we compute
  • the sample mean $M(\cdot)$
  • the sample variance $S(\cdot)$
[Figure: the training set $O_{T_0}$, ending at $T_0$, split into subsequences with features $M(1), S(1)$ up to $M(s_0), S(s_0)$.]

Training Phase
▪ The whole training set $O_{T_0}$ is used to compute the exponent $h_0$ of the power-law transformation [Mudholkar81] that gives the sample variance an approximately Gaussian distribution:
  $\mathcal{T}(S(s)) = \left( \frac{S(s)}{\nu - 1} \right)^{h_0}$

Training Phase
▪ The values of the sample variance are replaced by $V(s) = \mathcal{T}(S(s))$.
▪ For each feature we compute
  • the mean (in stationarity): $\hat{\mu}^M_{S_0}$, $\hat{\mu}^V_{S_0}$
  • the standard deviation (in stationarity): $\hat{\sigma}^M_{S_0}$, $\hat{\sigma}^V_{S_0}$
[Figure: the training set $O_{T_0}$, ending at $T_0$, split into subsequences with features $M(1), V(1)$ up to $M(s_0), V(s_0)$.]
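
The Test
▪ Summarizing, the training phase provides the test with
  • an estimate of each feature distribution and, in particular, the confidence intervals
    $\mathcal{I}^M_{S_0} = \left[ \hat{\mu}^M_{S_0} - \Gamma \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \Gamma \hat{\sigma}^M_{S_0} \right]$
    $\mathcal{I}^V_{S_0} = \left[ \hat{\mu}^V_{S_0} - \Gamma \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \Gamma \hat{\sigma}^V_{S_0} \right]$
▪ The test is composed of the following three parts:
[Block diagram: X(t) → Feature Extraction → Polynomial Regression → ICI rule → test outcome]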
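
As a rough sketch of what the training phase hands over to the test, the snippet below computes the mean and standard deviation of one feature over the training subsequences and builds the initial confidence interval. The function names `train_feature` and `initial_interval` and the sample values are hypothetical; the $S_0 - 1$ normalization follows the detailed Training Phase slide at the end of the deck, and $\Gamma$ is treated as a user-chosen parameter.

```python
import math


def train_feature(values):
    """Mean and sample standard deviation of one feature, computed over
    the S_0 training subsequences (hypothetical helper)."""
    s0 = len(values)
    mu = sum(values) / s0
    var = sum((v - mu) ** 2 for v in values) / (s0 - 1)
    return mu, math.sqrt(var)


def initial_interval(mu, sigma, gamma):
    """Initial confidence interval [mu - gamma*sigma, mu + gamma*sigma]."""
    return (mu - gamma * sigma, mu + gamma * sigma)


# Made-up feature values, e.g. sample means M(1), M(2), M(3).
mu0, sigma0 = train_feature([1.0, 2.0, 3.0])
print(mu0, sigma0)                          # 2.0 1.0
print(initial_interval(mu0, sigma0, 2.0))   # (0.0, 4.0)
```

The same two functions would be applied once per feature, so that each of $M$ and $V$ gets its own starting interval.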

The ICI-based change detection test: feature extraction
▪ Upcoming observations are partitioned into non-overlapping subsequences of $\nu$ observations.
▪ Features are computed on each subsequence $s$:
  $M(s) = \sum_{t=(s-1)\nu+1}^{s\nu} \frac{X(t)}{\nu}$
  $S(s) = \sum_{t=(s-1)\nu+1}^{s\nu} \left( X(t) - M(s) \right)^2$
  $V(s) = \mathcal{T}(S(s))$
[Figure: the sequence $X(t)$ over $t$, split into subsequences of $\nu$ observations, each yielding one value of $M$ and one of $S$.]
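
The feature extraction step can be illustrated with plain Python. This is a sketch under stated assumptions: the function names `extract_features` and `power_law` are made up for the example, trailing observations that do not fill a whole subsequence are simply dropped, and the exponent `h0` is taken as an input because its computation from the training set is not reproduced here.

```python
def extract_features(x, nu):
    """Split x into non-overlapping subsequences of nu observations and
    return the sample means M(s) and the sums of squared deviations S(s)."""
    n_sub = len(x) // nu  # incomplete trailing block is ignored (assumption)
    means, sq_devs = [], []
    for s in range(n_sub):
        block = x[s * nu:(s + 1) * nu]
        m = sum(block) / nu
        means.append(m)
        sq_devs.append(sum((v - m) ** 2 for v in block))
    return means, sq_devs


def power_law(sq_devs, nu, h0):
    """Power-law transform V(s) = (S(s) / (nu - 1)) ** h0."""
    return [(s / (nu - 1)) ** h0 for s in sq_devs]


m, s = extract_features([1, 2, 3, 4, 5, 6, 7], nu=3)
print(m)                          # [2.0, 5.0]
print(s)                          # [2.0, 2.0]
print(power_law(s, nu=3, h0=1.0)) # [1.0, 1.0]
```

With `h0 = 1.0` the transform reduces to the usual unbiased sample variance, which makes the example easy to check by hand.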

The ICI-based change detection test: polynomial regression
▪ Let us consider a set of nested neighborhoods including the training set, i.e. $\{[1, S_0], \dots, [1, S]\}$.
▪ The estimates associated with each neighborhood are obtained by the 0th-order polynomial fit of the feature values (stationary processes provide constant feature values).
▪ The optimal neighborhood selected by the ICI rule is the largest one whose feature values are consistent with the training set.
[Figure: nested neighborhoods $U_0, U_1, U_2, U_3$, ending at $S_0, S_1, S_2, S_3$, for 0th-order polynomial regression; horizontal axis: subsequence index; vertical axis: feature values (stationary features).]

The ICI-based change detection test: the ICI rule
▪ The ICI rule selects an adaptive neighborhood for fitting constant functions to the features.
▪ For each neighborhood we have
  • the estimate $\hat{\mu}_i$ of the polynomial regression
  • the standard deviation $\sigma_i$ of the polynomial estimator
▪ However, we are not interested in the estimates provided by the ICI rule themselves, but rather in their neighborhoods.
[Figure: estimates $\hat{\mu}_i$ with their standard deviations $\sigma_i$ for the neighborhood indices 0, 1, 2, 3; the width $\Gamma \sigma_0$ is marked on the first interval.]

The ICI-based change detection test: the ICI rule
▪ Compute iteratively the intersection $\mathcal{I}$ of the confidence intervals.
▪ The ICI rule selects the largest neighborhood $[1, S_j]$ for which $\mathcal{I}$ is not empty.
▪ The ICI rule acts as a nonstationarity test: it determines whether the feature can be treated as constant within the considered time interval.
[Figure: confidence intervals for the neighborhood indices 0, 1, 2, 3; a change is detected where their intersection becomes empty.]
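
Test Execution
1. Feature Extraction: compute the features $M(s)$, $V(s)$ on the current subsequence $s$.
2. Polynomial Regression: fit a constant value to each feature in $[1, s]$:
   $\hat{\mu}^M_s = \frac{(s-1)\,\hat{\mu}^M_{s-1} + M(s)}{s}$, $\quad \hat{\mu}^V_s = \frac{(s-1)\,\hat{\mu}^V_{s-1} + V(s)}{s}$
3. Compute the estimator's standard deviation:
   $\hat{\sigma}^M_s = \frac{\hat{\sigma}^M_{S_0}}{\sqrt{s}}$, $\quad \hat{\sigma}^V_s = \frac{\hat{\sigma}^V_{S_0}}{\sqrt{s}}$
4. ICI rule: intersect the confidence intervals with the previous intersections $\mathcal{I}^M_{s-1}$, $\mathcal{I}^V_{s-1}$:
   $\mathcal{I}^M_s = \left[ \hat{\mu}^M_s - \Gamma \hat{\sigma}^M_s;\ \hat{\mu}^M_s + \Gamma \hat{\sigma}^M_s \right] \cap \mathcal{I}^M_{s-1}$
   $\mathcal{I}^V_s = \left[ \hat{\mu}^V_s - \Gamma \hat{\sigma}^V_s;\ \hat{\mu}^V_s + \Gamma \hat{\sigma}^V_s \right] \cap \mathcal{I}^V_{s-1}$
5. Continue with the next subsequence if $(\mathcal{I}^M_s \neq \emptyset)\ \&\&\ (\mathcal{I}^V_s \neq \emptyset)$; otherwise a change is detected.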
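
The five steps above can be sketched as a small online loop for a single feature. This is an illustrative sketch rather than the authors' code: the function name `ici_cdt`, its arguments and the sample data are assumptions. It takes the feature values computed after the training set, the training estimates `mu0` and `sigma0`, the number of training subsequences `s0`, and `gamma`, and it assumes the estimator's standard deviation decreases as `sigma0 / sqrt(s)`. In the full test the same loop would run on every feature, and a change would be declared as soon as any of them detects.

```python
import math


def ici_cdt(feature_values, mu0, sigma0, s0, gamma):
    """Run the ICI-based test on one feature. Returns the subsequence
    index s at which the intersection becomes empty, or None if no
    change is detected. Hypothetical helper for illustration."""
    mu = mu0
    lower, upper = mu0 - gamma * sigma0, mu0 + gamma * sigma0
    for k, value in enumerate(feature_values, start=1):
        s = s0 + k
        mu = ((s - 1) * mu + value) / s      # 0th-order fit on [1, s]
        sigma = sigma0 / math.sqrt(s)        # std of the estimator (assumed)
        lower = max(lower, mu - gamma * sigma)
        upper = min(upper, mu + gamma * sigma)
        if lower > upper:                    # empty intersection
            return s
    return None


# Made-up feature sequence: stationary at 0, then an abrupt shift to 10.
print(ici_cdt([0, 0, 10, 10, 10], mu0=0.0, sigma0=1.0, s0=4, gamma=2.0))  # 8
print(ici_cdt([0.0] * 50, mu0=0.0, sigma0=1.0, s0=4, gamma=2.0))          # None
```

In the first call the shift enters at subsequence 7 but is flagged at subsequence 8, a small instance of the recognition delay discussed in the experiments.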

Experiments – Figures of Merit
▪ Change detection performance is evaluated considering
  • False Positives (FP): the number of times a test detects a change in the sequence when there is none.
  • False Negatives (FN): the number of times a test does not detect a change when there is one.
  • Recognition Delay (RD): the time delay in detecting a change.
  • Computational Time (CT): the execution time needed to perform the test (reference platform: Intel Xeon CPU, 2.33 GHz).
[Figure: timeline with the training set $O_{T_0}$ ending at $T_0$, the change time $T^*$ and the detection time $\hat{T}$; RD is the interval between $T^*$ and $\hat{T}$.]
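
A possible way to compute these figures of merit over repeated runs is sketched below. Several choices here are assumptions, not taken from the slides: a detection before the change time counts as a false positive, a run with no detection counts as a false negative, FP and FN are reported as percentages of the runs, and RD is averaged over the runs that detect at or after the change. The function name `figures_of_merit` and the sample numbers are made up.

```python
def figures_of_merit(detections, change_time):
    """detections: one detection time per run, or None if the test never
    fired. Returns (FP %, FN %, mean RD); mean RD is None if there is
    no valid detection. Hypothetical helper for illustration."""
    n = len(detections)
    fp = sum(1 for d in detections if d is not None and d < change_time)
    fn = sum(1 for d in detections if d is None)
    delays = [d - change_time for d in detections
              if d is not None and d >= change_time]
    mean_rd = sum(delays) / len(delays) if delays else None
    return 100.0 * fp / n, 100.0 * fn / n, mean_rd


# Four made-up runs with the change at t = 1000.
print(figures_of_merit([1100, 1200, None, 900], change_time=1000))
# (25.0, 25.0, 150.0)
```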

Experiments: datasets and tests
Datasets
▪ Application D1: one-dimensional Gaussian process with four kinds of perturbations
  • Abrupt change in the mean (variance)
  • Drift in the mean (variance)
▪ Application D2: SATIMAGE dataset (Landsat multispectral images)
▪ Application D3: self-assembled-monolayer gas sensors
Tests
▪ The ICI-based change detection test (CDT) is compared with:
  • CUSUM test (in Application D1)
  • Mann-Kendall test
  • CI-CUSUM test
▪ Two configurations for the CI-CUSUM and the ICI-based CDT:
  • Long training sequence (2000 observations)
  • Short training sequence (500 observations)

Application D1: 2000 training samples
Perturbation       Metric         CUSUM    Mann-Kendall   CI-CUSUM (T0=2000)   ICI test (T0=2000)
Abrupt, mean       FP (%)         0        7.3            0                    0
                   FN (%)         0        0              0                    0
                   RD (samples)   11.4     94.9           386.1                149.5
                   CT (s)         0.5      1044.0         6.9                  0.12
Drift, mean        FP (%)         n/a      8              0                    0
                   FN (%)         n/a      0              0.3                  0
                   RD (samples)   n/a      590.0          1110.5               793.2
                   CT (s)         n/a      1046.9         7.1                  0.1
Abrupt, variance   FP (%)         0        10             0                    0
                   FN (%)         0        90             2.0                  0
                   RD (samples)   39.5     n/a            642.2                300.3
                   CT (s)         0.5      1037.5         9.2                  0.1
Drift, variance    FP (%)         n/a      10             0                    0
                   FN (%)         n/a      90             0                    0
                   RD (samples)   n/a      n/a            1029.1               630.8
                   CT (s)         n/a      1050.3         8.8                  0.13

Application D1: 500 training samples
▪ When shorter training sequences are available, the performance gap between the ICI test and CI-CUSUM increases further.
Perturbation       Metric         CUSUM    Mann-Kendall   CI-CUSUM    ICI test    CI-CUSUM   ICI test
                                                          (T0=2000)   (T0=2000)   (T0=500)   (T0=500)
Abrupt, mean       FP (%)         0        7.3            0           0           7.7        5.5
                   FN (%)         0        0              0           0           0          0
                   RD (samples)   11.4     94.9           386.1       149.5       345.0      140.5
                   CT (s)         0.5      1044.0         6.9         0.12        6.9        0.1
Drift, mean        FP (%)         n/a      8              0           0           8.0        5.9
                   FN (%)         n/a      0              0.3         0           0          0
                   RD (samples)   n/a      590.0          1110.5      793.2       832.9      764.2
                   CT (s)         n/a      1046.9         7.1         0.1         4.5        0.2
Abrupt, variance   FP (%)         0        10             0           0           8.0        5.9
                   FN (%)         0        90             2.0         0           0          0
                   RD (samples)   39.5     n/a            642.2       300.3       437.9      280.9
                   CT (s)         0.5      1037.5         9.2         0.1         6.61       0.1
Drift, variance    FP (%)         n/a      10             0           0           9.4        5.8
                   FN (%)         n/a      90             0           0           0          0
                   RD (samples)   n/a      n/a            1029.1      630.8       765.8      597.6
                   CT (s)         n/a      1050.3         8.8         0.13        7.4        0.2

Application D2 and D3
Dataset, change   Metric         CUSUM   Mann-Kendall   CI-CUSUM    CI-CUSUM   ICI test    ICI test
                                                        (T0=2000)   (T0=500)   (T0=2000)   (T0=500)
D2, abrupt        FP (%)         n/a     6.0            0           26.6       0.0         7.8
                  FN (%)         n/a     41.8           12.0        7.3        6.0         2.6
                  RD (samples)   n/a     1003.0         574.7       487.1      196.1       229.1
                  CT (s)         n/a     51.7           2.3         2.0        0.05        0.07
D2, drift         FP (%)         n/a     6.1            0           25.3       0           7.3
                  FN (%)         n/a     58.5           22.6        9.3        10.7        8.6
                  RD (samples)   n/a     1718.1         1304.1      996.5      831         811.4
                  CT (s)         n/a     50.6           2.3         1.5        0.05        0.05
D3, abrupt        FP (%)         n/a     n/a            0.3         60.6       1.3         8.6
                  FN (%)         n/a     n/a            5.3         2.6        1.3         2.0
                  RD (samples)   n/a     n/a            384.5       438.1      361.7       295.4
                  CT (s)         n/a     n/a            81.4        55.1       0.1         0.5
D3, drift         FP (%)         n/a     n/a            0.7         61.6       0.6         14
                  FN (%)         n/a     n/a            12          1.3        4.6         4
                  RD (samples)   n/a     n/a            1911.3      1924.1     1890.1      1843.1
                  CT (s)         n/a     n/a            49.2        73.4       0.1         0.1

Pros / Cons
▪ Pros
  • Good performance (prompter detections, fewer FPs and FNs)
  • Low computational complexity
  • No need for an alternative hypothesis: a change corresponds to the impossibility, according to the ICI rule, of fitting a zero-order polynomial to the whole feature set
▪ Cons
  • The process is handled by means of features:
    − changes that do not affect the features are not perceived
    − data are processed in subsequences
  • The ICI rule balances bias and variance, while a change detection test should be zero-bias.
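
Performance of the ICI rule on long feature sequences
▪ How does the Recognition Delay vary when $T^*$ increases?
▪ We follow a Monte Carlo approach, considering processes with abrupt changes at different time instants.
[Figure: timeline with the training set $O_{T_0}$ ending at $T_0$, the change time $T^*$ and the detection time $\hat{T}$; the Recognition Delay is RD $= \hat{T} - T^*$.]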

ICI behavior on long time execution – Monte Carlo
▪ The later the change occurs, the more observations are required to detect it.
▪ These delays cannot be compensated analytically on-line.
▪ The ICI rule provides prompter detection on shorter observation sequences.
[Figure: Recognition Delay as a function of $T^*$ (axis scale $10^4$).]
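
One way to see why later changes take longer to detect is to look at the running-mean update used in the Test Execution steps. The following is a sketch under assumptions that are not stated on the slide: the change is an abrupt shift $\delta$ in the expected value $\mu$ of a feature $M$, it occurs right after subsequence $s^*$, and $k$ counts the subsequences observed after the change.

```latex
\hat{\mu}_{s^*+k} = \frac{s^*\,\hat{\mu}_{s^*} + \sum_{j=1}^{k} M(s^*+j)}{s^*+k},
\qquad
\mathbb{E}\!\left[\hat{\mu}_{s^*+k}\right]
  = \frac{s^*\mu + k(\mu+\delta)}{s^*+k}
  = \mu + \delta\,\frac{k}{s^*+k}.
```

For a fixed $k$, the displacement $\delta\,k/(s^*+k)$ of the estimate shrinks as $s^*$ grows, so more post-change subsequences are needed before the new confidence interval leaves the running intersection.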

Refinement Procedure
▪ Let $\hat{T}$ be the test outcome
▪ We split the segment … into …
▪ We run the test on …, where it should react faster
▪ Assume there is a detection in …
▪ Set …, and run the test on …
▪ Assume that there is a detection in …
▪ Stop as …
▪ The refined estimate is $T_{ref}$

Refinement Procedure Performance
▪ The change-detection refinement procedure effectively reduces the Recognition Delay when $T^*$ increases.
[Figure: Recognition Delay as a function of $T^*$ (axis scale $10^4$).]

What is the refinement procedure for?
▪ The gap between the initial detection and the refined detection is assumed to be composed of samples generated by $X$ in the new state.
▪ These samples can be considered representative of the new stationary state and can be used as a new training set for the test, in order to detect further changes.
[Figure: timeline marking the training set $O_{T_0}$, $T_0$, the change time $T^*$, the refined detection $T_{ref}$ and the initial detection $\hat{T}$.]

Experiments
▪ The test has been paired with an on-line classification system.
  • Data are taken from two different X-ray sources; the goal is to determine the source.
  • k-NN classifier; one sample out of five is classified by a supervisor.
[Figure: classification error, over 30 runs.]

Ongoing Works
▪ Modeling the behavior of the ICI rule as t increases,
  • to motivate the refinement procedure
  • for comparison with parametric tests
▪ Use Gaussianizing transforms (such as Box-Cox) to define additional features
▪ Use Gaussianity tests on the features when a new training set is identified
▪ The test can be used to monitor polynomial trends in the features
▪ A truly multivariate extension would be very useful...
Training Phase
[Diagram: training samples $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ → Training Phase → $\mathcal{I}^M_{S_0}$, $\mathcal{I}^V_{S_0}$]
1. Compute the features on training subsequences:
   $\{M(s),\ s = 1, \dots, S_0\}$, $\{S(s),\ s = 1, \dots, S_0\}$, with $S_0 = T_0/\nu$
2. Compute polynomial estimate of features:
   $\hat{\mu}^M_{S_0} = \sum_{s=1}^{S_0} M(s) / S_0$, $\hat{\sigma}^M_{S_0} = \sqrt{\frac{\sum_{s=1}^{S_0} \left(M(s) - \hat{\mu}^M_{S_0}\right)^2}{S_0 - 1}}$
3. Compute the first six cumulants of X from $O_{T_0}$
4. Compute h_0 as in [REF] and define $\{V(s) = \mathcal{T}(S(s)),\ s = 1, \dots, S_0\}$
5. Compute
   $\hat{\mu}^V_{S_0} = \sum_{s=1}^{S_0} V(s) / S_0$, $\hat{\sigma}^V_{S_0} = \sqrt{\frac{\sum_{s=1}^{S_0} \left(V(s) - \hat{\mu}^V_{S_0}\right)^2}{S_0 - 1}}$
6. Define
   $\mathcal{I}^V_{S_0} = \left[\hat{\mu}^V_{S_0} - \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \hat{\sigma}^V_{S_0}\right]$
   $\mathcal{I}^M_{S_0} = \left[\hat{\mu}^M_{S_0} - \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \hat{\sigma}^M_{S_0}\right]$
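The training computations for a single feature can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `train_feature` is hypothetical, and `gamma` is an assumed interval-width multiplier of the kind ICI-type rules are usually tuned with; `gamma = 1` gives the interval [mu - sigma; mu + sigma]. The transform producing V(s) (which needs h_0 and the cumulants) is not reproduced.

```python
from statistics import fmean, stdev

def train_feature(values, gamma=1.0):
    """Return (mu0, sigma0, interval) for one feature from its training values.

    values: feature values computed on the S_0 training subsequences
    gamma:  assumed interval-width multiplier (hypothetical parameter)
    """
    mu0 = fmean(values)        # 0th-order polynomial fit: the sample mean
    sigma0 = stdev(values)     # sample standard deviation, divides by S_0 - 1
    interval = (mu0 - gamma * sigma0, mu0 + gamma * sigma0)
    return mu0, sigma0, interval

mu0, sigma0, interval = train_feature([1.0, 2.0, 3.0])
```

The same function would be called once per feature, on the training values of M and of V respectively.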
The ICI-based change detection test
▪ The proposed test relies on a set of functions that transform the observations into Gaussian distributed features
▪ The ICI rule, combined with a polynomial regression technique, assesses the stationarity of the features (and hence of the process)
[Diagram: X(t) → Feature Extraction → Polynomial Regression → ICI rule → test outcome]
A change is detected in the process when at least one of the features shows a change
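As a rough illustration of the feature-extraction stage and of the "at least one feature" decision, the sketch below splits a sequence into non-overlapping subsequences and computes one value per subsequence. It assumes that the features are built from the sample mean and the sample variance of each subsequence; the names `extract_features`, `change_detected` and the parameter `nu` (subsequence length) are hypothetical, and the Gaussianizing transform applied to the variance is not reproduced.

```python
from statistics import fmean, variance

def extract_features(x, nu):
    """Return per-subsequence sample means and sample variances.

    x:  list of scalar observations
    nu: number of samples per subsequence (assumed non-overlapping)
    """
    n_sub = len(x) // nu  # an incomplete trailing subsequence is ignored
    means, variances = [], []
    for s in range(n_sub):
        chunk = x[s * nu:(s + 1) * nu]
        means.append(fmean(chunk))
        variances.append(variance(chunk))  # unbiased sample variance
    return means, variances

def change_detected(per_feature_flags):
    """The process is flagged as changed when at least one feature is."""
    return any(per_feature_flags)

means, variances = extract_features([1, 2, 3, 4, 5, 6, 7], nu=3)
```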
The ICI-based change detection test: polynomial regression
[Diagram: X(t) → Feature Extraction → Polynomial Regression → ICI rule → test outcome]
▪ Let us consider a set of neighborhoods having the leftmost extreme at S=1: $\{[1, S_0], \dots, [1, S]\}$
▪ The estimates associated to each neighborhood are obtained by the 0th order polynomial fit of feature values: stationary processes provide constant feature values
▪ Any optimal neighborhood selected by ICI contains feature values that can be considered constant for a stationary process.
[Figure: feature values, with neighborhoods $U_0, \dots, U_3$ and endpoints $S_0, \dots, S_3$]
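A 0th-order polynomial fit over a neighborhood reduces to the sample mean of the feature values it contains, since the mean is the constant that minimizes the sum of squared residuals. A minimal check in Python, with hypothetical helper names and made-up feature values:

```python
from statistics import fmean

def constant_fit(values):
    """Least-squares fit of a 0th-order polynomial: the sample mean."""
    return fmean(values)

def sse(values, c):
    """Sum of squared residuals of the constant model c."""
    return sum((v - c) ** 2 for v in values)

features = [0.9, 1.1, 1.0, 1.2, 0.8]  # illustrative values only
c = constant_fit(features)

# Any other constant yields a larger residual sum of squares.
assert sse(features, c) < sse(features, c + 0.05)
assert sse(features, c) < sse(features, c - 0.05)
```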
The ICI rule
▪ Thus, it is reasonable to exploit the adaptive neighborhood selected using the ICI rule
[Figure: observations $z(t)$ and estimates $\hat{\mu}(t)$ versus $t$]
Problem Statement
▪ Let $X : \mathbb{N} \to \mathbb{R}^d$ be the data generating process
▪ Let $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ be the training set, where $X$ is stationary.
▪ The goal is to determine $T^*$ (with $T^* > T_0$), the time instant when $X$ becomes nonstationary
▪ $\{X(t),\ t < T^*\}$ are i.i.d. samples; the distribution of $X$ is unknown
• practical implications (updating the KB of a classifier)
Test Execution
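1. Compute the features on the current sequence: $M(s)$, $V(s)$
2. Fit constant value to each feature in [0,s]:
   $\hat{\mu}^M_s = \frac{(s-1)\,\hat{\mu}^M_{s-1} + M(s)}{s}$, $\hat{\mu}^V_s = \frac{(s-1)\,\hat{\mu}^V_{s-1} + V(s)}{s}$
3. Compute the estimator's standard deviation:
   $\hat{\sigma}^M_s = \hat{\sigma}^M_{S_0} / \sqrt{s}$, $\hat{\sigma}^V_s = \hat{\sigma}^V_{S_0} / \sqrt{s}$
4. Intersect Confidence Intervals:
   $\mathcal{I}^M_s = \left[\hat{\mu}^M_s - \hat{\sigma}^M_s;\ \hat{\mu}^M_s + \hat{\sigma}^M_s\right] \cap \mathcal{I}^M_{s-1}$
   $\mathcal{I}^V_s = \left[\hat{\mu}^V_s - \hat{\sigma}^V_s;\ \hat{\mu}^V_s + \hat{\sigma}^V_s\right] \cap \mathcal{I}^V_{s-1}$
[Diagram: sequence $s$ and $\mathcal{I}^M_{s-1}$, $\mathcal{I}^V_{s-1}$ → Feature Extraction → Polynomial Regression → ICI rule → Detect?; while $(\mathcal{I}^M_s \neq \emptyset\ \&\&\ \mathcal{I}^V_s \neq \emptyset)$ the answer is no and the loop continues]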
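One execution step for a single feature can be sketched as below: the recursive update of the constant fit, the shrinking standard deviation of the estimator, and the intersection with the previous interval. The function name `ici_step` and its signature are hypothetical, and `gamma` is an assumed interval-width multiplier (`gamma = 1` gives the interval [mu - sigma; mu + sigma]).

```python
from math import sqrt

def ici_step(s, feature, mu_prev, sigma0, interval_prev, gamma=1.0):
    """Update one feature at step s; return (mu_s, interval_s or None).

    feature:       feature value computed on the s-th subsequence
    mu_prev:       estimate at step s - 1
    sigma0:        standard deviation of the feature from the training phase
    interval_prev: (low, high) intersection up to step s - 1
    """
    mu_s = ((s - 1) * mu_prev + feature) / s    # recursive constant fit
    sigma_s = sigma0 / sqrt(s)                  # std of the estimator
    low = max(mu_s - gamma * sigma_s, interval_prev[0])
    high = min(mu_s + gamma * sigma_s, interval_prev[1])
    interval_s = (low, high) if low <= high else None  # None: empty intersection
    return mu_s, interval_s

# A feature value consistent with the past keeps the intersection non-empty,
# while a strongly deviating one empties it.
print(ici_step(4, 2.0, 2.0, 1.0, (1.0, 3.0)))
print(ici_step(4, 10.0, 2.0, 1.0, (1.0, 3.0)))
```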
The ICI-based change detection test: the algorithm
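1. Compute $\{M(s),\ s = 1, \dots, S_0\}$ and $\{S(s),\ s = 1, \dots, S_0\}$
2. $\hat{\mu}^M_{S_0} = \sum_{s=1}^{S_0} M(s) / S_0$, $\hat{\sigma}^M_{S_0} = \sqrt{\frac{\sum_{s=1}^{S_0} \left(M(s) - \hat{\mu}^M_{S_0}\right)^2}{S_0 - 1}}$
3. Define $\mathcal{I}^M_{S_0} = \left[\hat{\mu}^M_{S_0} - \hat{\sigma}^M_{S_0};\ \hat{\mu}^M_{S_0} + \hat{\sigma}^M_{S_0}\right]$
4. Compute the first six cumulants of $X$ from $TS$
5. Compute $\{V(s) = \mathcal{T}(S(s)),\ s = 1, \dots, S_0\}$
6. $\hat{\mu}^V_{S_0} = \sum_{s=1}^{S_0} V(s) / S_0$, $\hat{\sigma}^V_{S_0} = \sqrt{\frac{\sum_{s=1}^{S_0} \left(V(s) - \hat{\mu}^V_{S_0}\right)^2}{S_0 - 1}}$
7. Define $\mathcal{I}^V_{S_0} = \left[\hat{\mu}^V_{S_0} - \hat{\sigma}^V_{S_0};\ \hat{\mu}^V_{S_0} + \hat{\sigma}^V_{S_0}\right]$
8. Set $s = S_0$
9. while $(\mathcal{I}^M_s \neq \emptyset\ \&\&\ \mathcal{I}^V_s \neq \emptyset)$ {
10. Set $s = s + 1$
11. Wait for $\nu$ samples, until $Y(s)$ is populated
12. Compute $M(s)$ and $V(s)$
13. $\hat{\mu}^M_s = \frac{(s-1)\,\hat{\mu}^M_{s-1} + M(s)}{s}$, $\hat{\sigma}^M_s = \hat{\sigma}^M_{S_0} / \sqrt{s}$
14. $\hat{\mu}^V_s = \frac{(s-1)\,\hat{\mu}^V_{s-1} + V(s)}{s}$, $\hat{\sigma}^V_s = \hat{\sigma}^V_{S_0} / \sqrt{s}$
15. $\mathcal{I}^M_s = \left[\hat{\mu}^M_s - \hat{\sigma}^M_s;\ \hat{\mu}^M_s + \hat{\sigma}^M_s\right] \cap \mathcal{I}^M_{s-1}$
16. $\mathcal{I}^V_s = \left[\hat{\mu}^V_s - \hat{\sigma}^V_s;\ \hat{\mu}^V_s + \hat{\sigma}^V_s\right] \cap \mathcal{I}^V_{s-1}$
17. }
18. Detect a change in $\left((s-1)\nu,\ s\nu\right)$
[Diagram: training samples $O_{T_0} = \{X(t),\ t = 1, \dots, T_0\}$ → Training Phase; sequence $s$ and $\mathcal{I}^M_{s-1}$, $\mathcal{I}^V_{s-1}$ → Feature Extraction → Polynomial Regression → ICI rule → Detect? (no: continue with the next sequence; yes: Change Detected)]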
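The loop can be sketched end to end for a single feature as follows. This is a simplified illustration, not the authors' implementation: it monitors one feature instead of the pair (M, V), takes precomputed feature values as input, and uses an assumed interval-width multiplier `gamma`; the function name `ici_change_detection` and the data are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev

def ici_change_detection(train_features, stream_features, gamma=2.0):
    """Return the index s at which the intersection becomes empty, or None.

    train_features:  feature values on the S_0 training subsequences
    stream_features: feature values on the subsequences S_0 + 1, S_0 + 2, ...
    gamma:           assumed interval-width multiplier (hypothetical)
    """
    s = len(train_features)                      # s = S_0
    mu = fmean(train_features)
    sigma0 = stdev(train_features)
    low, high = mu - gamma * sigma0, mu + gamma * sigma0
    for value in stream_features:
        s += 1
        mu = ((s - 1) * mu + value) / s          # recursive constant fit
        sigma_s = sigma0 / sqrt(s)               # std of the estimator
        low = max(low, mu - gamma * sigma_s)     # intersect with past intervals
        high = min(high, mu + gamma * sigma_s)
        if low > high:                           # empty intersection: change
            return s
    return None

train = [0.9, 1.1] * 5                           # S_0 = 10 training features
detected = ici_change_detection(train, [1.0, 1.0, 3.0, 3.0, 3.0])
stationary = ici_change_detection(train, [1.0] * 20)
```

With $\nu$ samples per subsequence, a returned index $s$ localizes the change in $((s-1)\nu,\ s\nu)$; monitoring several features amounts to running the same update for each and stopping as soon as one intersection is empty.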