Dataset Shift Detection in Non-Stationary Environments using EWMA Charts

http://isrc.ulster.ac.uk

Magee Campus


Prof. Girijesh Prasad

Co-authors: Haider Raza, Yuhua Li

School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry.

[email protected]



Outline

Motivation

Background

Proposed contribution

Future work and Conclusion


Motivation

Classical learning systems are built on the assumption that the input data distribution is the same for training and testing.

Real-world environments are often non-stationary (e.g., EEG-based BCI). Learning in real-time environments is therefore difficult: non-stationarity effects cause system performance to degrade over time, so predictors need to adapt online.

However, online adaptation, particularly for classifiers, is difficult to perform and should be avoided as far as possible. Deciding when adaptation is actually needed requires performing, in real time, a non-stationary shift-detection test.

[Figure: learning system with inputs x1 and x2, output y, target t, and error e.]

Background

• Supervised learning (Mitchell, 1997; Sugiyama et al., 2009)

• Non-stationary environments (Krauledat, 2008; Sugiyama, 2012)

• Dataset shift (Torres et al., 2012)

• Dataset shift-detection (Shewhart, 1939; Page, 1954; Roberts, 1959; Alippi & Roveri, 2008a; Alippi & Roveri, 2008b; Alippi et al., 2011b)

• Proposed work: shift-detection



Supervised Learning

Training samples: input $x$ and output $y$, i.e., pairs $(x_i, y_i)$.

Learn the input-output rule: $y = f(x)$.

Assumption: "Training and test samples are drawn from the same probability distribution", i.e., $P_{train}(x, y) = P_{test}(x, y)$.

Is this assumption really true?

No! It is not always true.

Reason: non-stationary environments.



Non-Stationarity

Definition: Let $(X_t)_{t \in I}$ be a multivariate time-series, where $I$ is an index set. Then $(X_t)_{t \in I}$ is called a stationary time-series if the probability distribution does not change over time, i.e., $P(X_t) = P(X_{t'})$ for all $t, t' \in I$. A time-series is called non-stationary if it is not stationary.

Examples:

• Brain-computer interface

• Robot control

• Remote sensing applications

• Network intrusion detection

What is the challenge?

Learning from the past only is of limited use.



Dataset Shift

Dataset shift appears when the training and test joint distributions are different, that is, when $P_{train}(x, y) \neq P_{test}(x, y)$ (Torres, 2012).

*Note: Relationship between covariates (x) and class label (y):

• X→Y: predictive model (e.g., spam filtering)

• Y→X: generative model (e.g., fault detection)

Types of dataset shift:

• Covariate shift

• Prior probability shift

• Concept shift

Covariate shift appears only in X→Y problems. Prior probability shift appears only in Y→X problems. Concept shift can appear in both X→Y and Y→X problems.
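The three shift types can be stated more formally. The block below is a sketch following the usual formulation in the dataset-shift literature; the subscripts $train$ and $test$ are notation assumed here, not taken from the slide.

```latex
\begin{align*}
\text{Covariate shift } (X \to Y):\quad
  & P_{train}(y \mid x) = P_{test}(y \mid x), \quad P_{train}(x) \neq P_{test}(x) \\
\text{Prior probability shift } (Y \to X):\quad
  & P_{train}(x \mid y) = P_{test}(x \mid y), \quad P_{train}(y) \neq P_{test}(y) \\
\text{Concept shift } (X \to Y):\quad
  & P_{train}(y \mid x) \neq P_{test}(y \mid x), \quad P_{train}(x) = P_{test}(x) \\
\text{Concept shift } (Y \to X):\quad
  & P_{train}(x \mid y) \neq P_{test}(x \mid y), \quad P_{train}(y) = P_{test}(y)
\end{align*}
```

In each case the joint distribution changes, but only one factor of it moves while the other stays fixed.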



Dataset Shift-Detection

Detecting abrupt and gradual shifts in time-series data is called dataset shift-detection.

Types of shift-detection:

• Retrospective/offline detection (i.e., shift-point analysis)

• Real-time/online detection (i.e., control charts)

Types of control charts:

• Shewhart chart (Shewhart, 1939)

• Cumulative Sum (CUSUM) (Page, 1954)

• Exponentially Weighted Moving Average (EWMA) (Roberts, 1959)

• Computational Intelligence CUSUM (CI-CUSUM) (Alippi et al., 2008)

• Intersection of Confidence Intervals (ICI) (Alippi et al., 2011)



Proposed Contribution

We propose a dataset shift-detection test:

• Shift-Detection based on the Exponentially Weighted Moving Average (SD-EWMA) model


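Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA)

$z_i = \lambda x_i + (1 - \lambda) z_{i-1}$   (1)

where λ is the smoothing constant (0 < λ ≤ 1).

The underlying process is modelled as a first-order integrated moving average, ARIMA(0,1,1), model:

$x_i = x_{i-1} + \varepsilon_i - \theta \varepsilon_{i-1}$   (2)

where $\varepsilon_i$ is a sequence of i.i.d. random signals with zero mean and constant variance. Equation (1) with $\lambda = 1 - \theta$ is the optimal 1-step-ahead prediction for this process.

The 1-step-ahead errors are calculated as

$err_i = x_i - z_{i-1}$   (3)

If the 1-step-ahead error is normally distributed, then

$P\left(z_{i-1} - L\,\sigma_{err} \le x_i \le z_{i-1} + L\,\sigma_{err}\right) = 1 - \alpha$

where the lower and upper bounds are the control limits LCL and UCL, and $L$ is the control-limit multiplier.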
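The recursion and the 1-step-ahead error can be illustrated with a few lines of Python. This is a minimal sketch, assuming the standard EWMA form $z_i = \lambda x_i + (1-\lambda) z_{i-1}$ and the error $err_i = x_i - z_{i-1}$; the function name and the sample values are hypothetical.

```python
def ewma_one_step_errors(x, lam, z0):
    """Return EWMA statistics z_i and 1-step-ahead errors err_i = x_i - z_{i-1}."""
    z_prev = z0
    zs, errs = [], []
    for xi in x:
        # The forecast for x_i is the previous EWMA value z_{i-1}.
        errs.append(xi - z_prev)
        # EWMA update with smoothing constant lam (0 < lam <= 1).
        z_prev = lam * xi + (1.0 - lam) * z_prev
        zs.append(z_prev)
    return zs, errs


# Hypothetical toy input: three samples, lam = 0.5, initial value z0 = 10.
zs, errs = ewma_one_step_errors([10.0, 12.0, 11.0], lam=0.5, z0=10.0)
```

A larger `lam` makes the statistic follow recent samples more closely, while a smaller `lam` smooths more heavily.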



Proposed Algorithm: SD-EWMA

Training input: For each observation in the training dataset, go to the training phase and compute the parameters for testing.

Testing input: Receive new data sample-by-sample and perform the check as follows. IF (shift detected) THEN (report the point of shift and initiate an appropriate corrective action) ELSE (continue and integrate the upcoming information).

Output: Shift-detection points.

Training Phase

1. Initialize the training data $x_i$ for $i = 1, \dots, n$, where $n$ is the size of the training data.

2. Calculate the mean of the input data and set it as the initial value $z_0$.

3. Compute the $z$-statistic for each observation in the training data: $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$

4. Compute the 1-step-ahead prediction errors: $err_i = x_i - z_{i-1}$

5. Compute the sum of the squared 1-step-ahead prediction errors divided by the number of observations, and use it as the initial value of the variance $\hat{\sigma}^2_{err(0)}$ for the testing phase.

6. Set $\lambda$ by minimizing the sum of squared prediction errors on the training data.

Testing Phase

7. For each data point $x_i$ in the operation/testing phase,

8. Compute $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$

9. Compute $err_i = x_i - z_{i-1}$

10. Compute the estimated variance $\hat{\sigma}^2_{err(i)}$

11. Compute $UCL_i$ and $LCL_i$:

12. $UCL_i = z_{i-1} + L\,\hat{\sigma}_{err(i-1)}$

13. $LCL_i = z_{i-1} - L\,\hat{\sigma}_{err(i-1)}$

14. IF ($LCL_i < x_i < UCL_i$)

15. THEN

16. Continue processing with new sample

17. ELSE (Covariate shift detected)

18. Initiate an appropriate corrective action.
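The two phases can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the control-limit multiplier `L`, the exponentially smoothed variance update with constant `var_smooth`, the function names, and the toy data are all assumptions made here, and the selection of `lam` by minimizing squared error (step 6) is left out.

```python
import math


def train_sd_ewma(train, lam):
    """Training phase: return (z, var) used to seed the testing phase."""
    z = sum(train) / len(train)        # initial EWMA value = training mean
    sq_err = 0.0
    for x in train:
        err = x - z                    # 1-step-ahead error uses the previous z
        sq_err += err * err
        z = lam * x + (1.0 - lam) * z  # EWMA update
    return z, sq_err / len(train)      # mean squared error as initial variance


def detect_shifts(stream, lam, z, var, L=3.0, var_smooth=0.01):
    """Testing phase: return indices of samples falling outside (LCL, UCL)."""
    shifts = []
    for i, x in enumerate(stream):
        sigma = math.sqrt(var)
        lcl, ucl = z - L * sigma, z + L * sigma   # limits around the forecast
        if not (lcl < x < ucl):
            shifts.append(i)           # shift detected at this sample
        err = x - z
        # Assumed variance update: exponential smoothing of squared errors.
        var = var_smooth * err * err + (1.0 - var_smooth) * var
        z = lam * x + (1.0 - lam) * z
    return shifts


# Hypothetical data: a stable alternating series, then one large jump.
train = [1.0 + 0.1 * (-1) ** k for k in range(100)]
z0, var0 = train_sd_ewma(train, lam=0.2)
shifts = detect_shifts([1.0, 1.1, 0.9, 5.0, 1.0], lam=0.2, z=z0, var=var0)
```

In this sketch the detector keeps updating its statistics after a detection; a real system would instead trigger the corrective action named in step 18.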


Datasets

Synthetic Data

Dataset 1-Jumping Mean (D1): The series starts from fixed initial values and is driven by noise with a given mean and a standard deviation of 1.5. A change point is inserted every 100 time steps by resetting the noise mean for the next block of 100 samples.

Dataset 2-Scaling Variance (D2): A change point is inserted every 100 time steps by resetting the noise standard deviation for the next block of 100 samples.

Dataset 3-Positive-Auto-correlated (D3): The dataset consists of 2000 data points. The non-stationarity occurs in the middle of the data stream, where the data shift from one normal distribution to another, each specified by its mean and standard deviation.
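The idea of inserting a change point every 100 time steps can be sketched as below. This is a simplified stand-in, not the generator used for D1: it draws independent Gaussian noise and omits any autoregressive dynamics, and the mean increment `step`, the seed, and the function name are hypothetical.

```python
import random


def jumping_mean_series(n_segments, seg_len=100, sd=1.5, step=2.0, seed=0):
    """Generate noise whose mean jumps by `step` at every segment boundary."""
    rng = random.Random(seed)          # seeded for reproducibility
    series, change_points = [], []
    mean = 0.0
    for seg in range(n_segments):
        if seg > 0:
            mean += step               # assumed jump size, for illustration only
            change_points.append(seg * seg_len)
        series.extend(rng.gauss(mean, sd) for _ in range(seg_len))
    return series, change_points


# Ten segments of 100 samples give nine change points.
series, change_points = jumping_mean_series(n_segments=10)
```

A scaling-variance variant would follow the same pattern, changing `sd` instead of `mean` at each boundary.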



Real-world Dataset: EEG Based Brain Signals

The real-world data used here are from the BCI Competition III dataset (IV-b). This dataset contains 2 classes, 118 EEG channels (0.05-200 Hz), a 1000 Hz sampling rate down-sampled to 100 Hz, 210 training trials, and 420 test trials.

Figure: PDF plot of three different sessions' data taken from the training dataset (x-axis: log(bandpower); curves: Session 1, Session 2, Session 3). The plot shows that the distribution changes from session to session through a shift in the mean.

Dataset 4-Auto-correlated (D4): The dataset is a time-series of 2000 data points generated with the 1-D digital filter function in MATLAB, which creates a direct form II transposed implementation of a standard difference equation. The denominator coefficient of the filter is changed from 2 to 0.5 after the first 1000 points have been produced.



Figure: Shift detection based on SD-EWMA, Dataset 1 (jumping mean). (a) A shift point is detected at every 100th point. (b) Zoomed view of (a): the shift is detected at the 401st sample, where the series crosses the upper control limit. (x-axis: Sample Number; legend: UCL, LCL, SD-EWMA, shift points.)

Figure: Shift detection based on SD-EWMA. (a) Dataset 2 (scaling variance): the shift is detected at 3 points. (b) Dataset 3 (positive auto-correlated): the shift is detected after 1000 observations. (c) Dataset 4 (auto-correlated): the shift is detected after 1000 observations. (x-axis: Sample Number; legend: UCL, LCL, SD-EWMA, shift points.)



Dataset  Total shifts  Lambda (λ)  # TP  # FP  # FN  % Shift detected
D1       9             0.20        8     1     1     88.8
D1       9             0.30        8     3     1     88.8
D1       9             0.40        All   4     0     100
D2       5             0.60        All   11    1     100
D2       5             0.70        All   6     0     100
D2       5             0.80        4     8     1     80
D3       1             0.40        All   13    0     100
D3       1             0.50        All   9     0     100
D3       1             0.60        All   15    0     100
D4       1             0.40        All   7     0     100
D4       1             0.50        All   6     0     100
D4       1             0.60        All   7     0     100

TABLE: SD-EWMA shift detection in time-series data
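The "% Shift detected" column appears to be the number of true positives divided by the total number of shifts. Under that assumption (the formula is not stated on the slide, and the function name is hypothetical), the D1 and D2 entries can be checked as follows.

```python
def detection_rate(true_positives, total_shifts):
    """Percentage of true shift points that were detected."""
    return 100.0 * true_positives / total_shifts


rate_d1 = detection_rate(8, 9)   # D1 with lambda = 0.20 or 0.30
rate_d2 = detection_rate(4, 5)   # D2 with lambda = 0.80
```

For D1 this gives roughly 88.9%, which agrees with the tabulated 88.8 if that value was truncated rather than rounded.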

         ICI-CDT                     SD-EWMA
Dataset  # TP  # FP  # FN  AD        # TP  # FP  # FN  AD
D1       6     0     3     35        All   1     0     0
D2       1     0     4     60        All   6     0     0
D3       1     0     0     80        All   9     0     0
D4       1     0     0     60        All   6     0     0

Table: Simulation results on different tests



Figure 4: A window of 2000 samples obtained from the real-world dataset. (x-axis: Sample Number; y-axis: log(bandpower); legend: UCL, LCL, SD-EWMA.)

                              # Shift points (SD-EWMA)
Lambda (λ)  Number of trials  Session 2   Session 3
0.01        1-20              0           1
0.01        21-45             3           1
0.01        46-70             4           2
0.05        1-20              10          4
0.05        21-45             9           6
0.05        46-70             7           5
0.10        1-20              16          9
0.10        21-45             22          21
0.10        46-70             25          17

TABLE 4: SD-EWMA shift detection in BCI data



Conclusion and Future Work

The drawbacks of classical supervised learning techniques in non-stationary environments, and the motivation behind dataset shift-detection, were discussed.

The background of non-stationary environments and dataset shift-detection was presented.

The proposed SD-EWMA method was presented and its results were discussed.

In future, SD-EWMA will be combined into an adaptive learning framework for non-stationary learning.


Questions

Thank You !
