http://isrc.ulster.ac.uk
Magee Campus
Dataset Shift Detection in Non-Stationary Environments using EWMA Charts
Prof. Girijesh Prasad
Co-authors: Haider Raza, Yuhua Li
School of Computing & Intelligent Systems @ Magee, Faculty of Computing & Engineering, Derry~Londonderry.
Outline
Motivation
Background
Proposed contribution
Future work and Conclusion
Motivation
Classical learning systems are built on the assumption that the training and test inputs are drawn from the same distribution.
Real-world environments are often non-stationary (e.g., EEG-based BCI), so learning in real time is difficult: non-stationarity effects degrade system performance over time.
Predictors therefore need to adapt online. However, online adaptation, particularly for classifiers, is difficult to perform and should be avoided as far as possible. Deciding when adaptation is actually needed requires performing, in real time:
Non-stationary shift-detection test.
Figure: Block diagram of the learning system, with inputs x1 and x2, output y, target t, and error e.
Background
• Supervised learning (Mitchell, 1997), (Sugiyama et al., 2009)
• Non-stationary environments (M Krauledat, 2008), (Sugiyama, 2012)
• Dataset shift (Torres et al., 2012)
• Dataset shift-detection (Shewhart, 1939), (Page, 1954), (Roberts, 1959), (Alippi et al., 2011b), (Alippi & Roveri, 2008a; Alippi & Roveri, 2008b)
• Proposed work: shift-detection
Supervised Learning
Training samples: input and output $(x, y)$. Learn the input-output rule: $y = f(x)$.
Assumption: “Training and test samples are drawn from the same probability distribution”, i.e., $P_{train}(x, y) = P_{test}(x, y)$.
Is this assumption really true?
No, not always.
Reason: non-stationary environments!
𝓝(𝝁 ,𝝈 𝟐)
Non-Stationarity
Definition: Let $(X_t)_{t \in T}$ be a multivariate time-series, where $T$ is an index set. Then $(X_t)_{t \in T}$ is called a stationary time-series if its probability distribution does not change over time, i.e., $P(X_t) = P(X_s)$ for all $t, s \in T$. A time-series is called non-stationary if it is not stationary.
Examples:
• Brain-computer interface
• Robot control
• Remote sensing application
• Network intrusion detection
What is the challenge?
Learning from the past alone is of limited use.
Dataset Shift
Dataset shift appears when the training and test joint distributions are different, that is, when $P_{train}(y, x) \neq P_{test}(y, x)$ (Torres, 2012).
*Note: Relationship between covariates (x) and class label (y)
• X→Y: Predictive model (e.g., spam filtering)
• Y→X: Generative model (e.g., fault detection)
Types of Dataset Shift
• Covariate shift
• Prior probability shift
• Concept shift
Covariate shift appears only in X→Y problems.
Prior probability shift appears only in Y→X problems.
Concept shift appears …
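The three types of shift listed above are commonly formalised in terms of which factor of the joint distribution changes between training and test. The block below is a sketch of that usual formulation; the notation $P_{train}$, $P_{test}$ and the conditional factorisation are assumptions here, since the slide text itself gives no formulas for the individual types.

```latex
\begin{align*}
\text{Covariate shift } (X \to Y):\quad
  & P_{train}(y \mid x) = P_{test}(y \mid x), \quad P_{train}(x) \neq P_{test}(x) \\
\text{Prior probability shift } (Y \to X):\quad
  & P_{train}(x \mid y) = P_{test}(x \mid y), \quad P_{train}(y) \neq P_{test}(y) \\
\text{Concept shift } (X \to Y):\quad
  & P_{train}(y \mid x) \neq P_{test}(y \mid x), \quad P_{train}(x) = P_{test}(x) \\
\text{Concept shift } (Y \to X):\quad
  & P_{train}(x \mid y) \neq P_{test}(x \mid y), \quad P_{train}(y) = P_{test}(y)
\end{align*}
```

In each case the joint distribution differs between training and test, but only one factor of it is responsible.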
Dataset Shift-Detection
Detecting abrupt and gradual shifts in time-series data is called dataset shift-detection.
Types of Shift-Detection
• Retrospective/offline detection (i.e., shift-point analysis)
• Real-time/online detection (i.e., control charts)
Types of Control Charts
• Shewhart chart (Shewhart, 1939)
• Cumulative Sum (CUSUM) (E S Page, 1954)
• Exponentially Weighted Moving Average (EWMA) (S W Roberts, 1959)
• Computational Intelligence CUSUM (CI-CUSUM) (Alippi et al., 2008)
• Intersection of Confidence Intervals (ICI) (Alippi et al., 2011)
Proposed Contribution
We propose a dataset shift-detection test:
• Shift-Detection based on the Exponentially Weighted Moving Average (SD-EWMA) model
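Shift-Detection based on Exponentially Weighted Moving Average (SD-EWMA)
The EWMA statistic is
$z_i = \lambda x_i + (1 - \lambda) z_{i-1}$    (1)
where $\lambda$ is the smoothing constant ($0 < \lambda \le 1$).
The underlying process is modelled as a first-order integrated moving average (ARIMA) model:
$x_i = x_{i-1} + \varepsilon_i - \theta \varepsilon_{i-1}$    (2)
where $\varepsilon_i$ is a sequence of i.i.d. random signals with zero mean and constant variance.
Equation (1) with $\lambda = 1 - \theta$ is the optimal 1-step-ahead prediction for this process.
The 1-step-ahead errors are calculated as
$err_i = x_i - z_{i-1}$    (3)
If the 1-step-ahead error is normally distributed with standard deviation $\sigma_{err}$, then $x_i$ is expected to lie between the control limits
$\mathrm{LCL}_i = z_{i-1} - L\,\sigma_{err}$ and $\mathrm{UCL}_i = z_{i-1} + L\,\sigma_{err}$,
where $L$ is the control-limit multiplier.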
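A short derivation may help to see why the EWMA recursion is the optimal 1-step-ahead predictor for a first-order integrated moving average process. It is a sketch under the usual assumptions: the process is written as $x_i = x_{i-1} + \varepsilon_i - \theta\varepsilon_{i-1}$ with zero-mean i.i.d. $\varepsilon_i$, and $\hat{x}_{i+1}$ denotes the forecast of $x_{i+1}$ made at time $i$. The symbol $\theta$ and the hat notation are introduced here for illustration.

```latex
\begin{align*}
\hat{x}_{i+1} &= x_i - \theta\,\varepsilon_i
  && \text{(conditional mean of } x_{i+1}\text{, since } E[\varepsilon_{i+1}] = 0\text{)} \\
\varepsilon_i &= x_i - \hat{x}_i
  && \text{(the innovation equals the previous forecast error)} \\
\hat{x}_{i+1} &= x_i - \theta\,(x_i - \hat{x}_i)
   = (1 - \theta)\,x_i + \theta\,\hat{x}_i \\
  &= \lambda\,x_i + (1 - \lambda)\,\hat{x}_i ,
  \qquad \lambda = 1 - \theta .
\end{align*}
```

Identifying the EWMA statistic $z_i$ with $\hat{x}_{i+1}$ recovers the EWMA recursion, and the 1-step-ahead error $x_i - z_{i-1}$ then coincides with the innovation $\varepsilon_i$, which is why its distribution can be used to set control limits.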
Proposed Algorithm: SD-EWMA
Training input: For each observation in the training dataset, go to the training phase and compute the parameters for testing.
Testing input: Receive new data sample-by-sample and perform the check as follows: IF (shift detected) THEN (report the point of shift and initiate an appropriate corrective action) ELSE (continue and integrate the upcoming information).
Output: Shift-detection points.
Training Phase
1. Initialize the training data: $x_i$ for $i = 1, \dots, n$, where $n$ is the size of the training data.
2. Calculate the mean of the input data and set it as the initial value $z_0$.
3. Compute the $z$-statistic for each observation in the training data: $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$.
4. Compute the 1-step-ahead prediction errors: $err_i = x_i - z_{i-1}$.
5. Compute the sum of squared 1-step-ahead prediction errors divided by the number of observations and use it as the initial value of the variance $\hat{\sigma}^2_{err}$ for the testing phase.
6. Set $\lambda$ by minimizing the sum of squared prediction errors on the training data.
Testing Phase
7. For each data point $x_{i+1}$ in the operation/testing phase:
8.   Compute $z_i = \lambda x_i + (1 - \lambda) z_{i-1}$
9.   Compute $err_i = x_i - z_{i-1}$
10.  Compute the estimated variance $\hat{\sigma}^2_{err}$
11.  Compute $\mathrm{UCL}_{i+1}$ and $\mathrm{LCL}_{i+1}$, with control-limit multiplier $L$:
12.    $\mathrm{UCL}_{i+1} = z_i + L\,\hat{\sigma}_{err}$
13.    $\mathrm{LCL}_{i+1} = z_i - L\,\hat{\sigma}_{err}$
14.  IF $\mathrm{LCL}_{i+1} < x_{i+1} < \mathrm{UCL}_{i+1}$
15.  THEN
16.    Continue processing with the new sample
17.  ELSE (covariate shift detected)
18.    Initiate an appropriate corrective action.
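The training and testing phases above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the grid search over `lambdas` (standing in for "minimizing the sum of squared prediction error"), the control-limit multiplier `L = 3`, and the exponentially smoothed variance update with constant `theta` are all assumptions made for the example. The corrective action is reduced to recording the index of the flagged sample.

```python
from statistics import mean


def train_sd_ewma(train, lambdas=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Training phase: pick lambda by grid search and estimate the error variance."""
    z0 = mean(train)                          # step 2: initial EWMA value
    best = None
    for lam in lambdas:                       # step 6: assumed grid search over lambda
        z, sse = z0, 0.0
        for x in train:
            err = x - z                       # step 4: error against the previous z
            sse += err * err
            z = lam * x + (1.0 - lam) * z     # step 3: EWMA update
        if best is None or sse < best[0]:
            best = (sse, lam, z)
    sse, lam, z = best
    return lam, z, sse / len(train)           # step 5: mean squared error as variance


def detect_shifts(stream, lam, z, var, L=3.0, theta=0.05):
    """Testing phase: flag samples outside z +/- L * sigma (L and theta are assumed)."""
    shifts = []
    for i, x in enumerate(stream):
        sigma = var ** 0.5
        if not (z - L * sigma < x < z + L * sigma):
            shifts.append(i)                  # shift detected at sample i
        err = x - z
        var = theta * err * err + (1.0 - theta) * var   # assumed smoothed variance
        z = lam * x + (1.0 - lam) * z
    return shifts


# Deterministic toy data: values alternate around 0, then jump to around 10.
train = [(-1.0) ** k for k in range(200)]
stream = [(-1.0) ** k for k in range(100)] + [10.0 + (-1.0) ** k for k in range(100)]

lam, z, var = train_sd_ewma(train)
shifts = detect_shifts(stream, lam, z, var)
print("first flagged index:", shifts[0])
```

A smaller `lam` smooths more heavily and reacts more slowly, while a larger `L` widens the limits and trades false positives for missed or delayed detections.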
Datasets
Synthetic Data
Dataset 1-Jumping Mean (D1): The series is driven by a noise term with mean $\mu$ and standard deviation 1.5. A change point is inserted every 100 time steps by resetting the noise mean $\mu$ for the $N$-th segment of 100 samples, where $N$ is a natural number.
Dataset 2-Scaling Variance (D2): A change point is inserted every 100 time steps by resetting the noise standard deviation $\sigma$ for the $N$-th segment of 100 samples, where $N$ is a natural number.
Dataset 3-Positive-Auto-correlated (D3): The dataset consists of 2000 data points. The non-stationarity occurs in the middle of the data stream, where the data shift from one normal distribution to another; $\mathcal{N}(\mu, \sigma)$ denotes the normal distribution with mean $\mu$ and standard deviation $\sigma$.
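A jumping-mean series of the kind described for Dataset 1 can be generated as in the sketch below. Only the noise standard deviation (1.5) and the change point every 100 steps come from the description above. The AR(2) coefficients 0.6 and -0.5, the zero initial values, and the mean increment of N/16 per segment are assumptions borrowed from a commonly used jumping-mean benchmark. The function name and parameters are hypothetical.

```python
import random


def jumping_mean(n_segments=5, seg_len=100, sd=1.5, seed=0):
    """Generate an AR(2) series whose noise mean jumps at every segment boundary."""
    rng = random.Random(seed)
    y = [0.0, 0.0]                 # assumed initial values y(1) = y(2) = 0
    mu = 0.0                       # noise mean of the first segment
    change_points = []
    for seg in range(1, n_segments + 1):
        if seg > 1:
            mu += seg / 16.0       # assumed jump size for the N-th segment
            change_points.append((seg - 1) * seg_len)
        for _ in range(seg_len):
            eps = rng.gauss(mu, sd)
            # assumed AR(2) recursion: y(t) = 0.6 y(t-1) - 0.5 y(t-2) + eps
            y.append(0.6 * y[-1] - 0.5 * y[-2] + eps)
    return y[2:], change_points    # drop the two initial values


series, change_points = jumping_mean()
print(len(series), change_points)
```

The returned change points are 0-based indices of the first sample in each new segment, which makes it straightforward to count true and false detections against a detector's output.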
Real-world Dataset: EEG Based Brain Signals
The real-world data used here are from the BCI competition-III dataset (IV-b). This dataset contains 2 classes, 118 EEG channels (0.05-200 Hz), a 1000 Hz sampling rate down-sampled to 100 Hz, 210 training trials, and 420 test trials.
Figure: PDF plot of data from 3 different sessions taken from the training dataset (x-axis: log(bandpower); curves: Session 1, Session 2, Session 3). The plot shows that the distribution changes from session to session through a shift in the mean.
Dataset 4-Auto-correlated (D4): The dataset is a time-series of 2000 data points generated with the 1-D digital filter function from MATLAB, which creates a direct form II transposed implementation of a standard difference equation. The denominator coefficient of the filter is changed from 2 to 0.5 after 1000 points have been produced.
Figure: Shift detection based on SD-EWMA on Dataset 1 (jumping mean); x-axis: sample number; curves: UCL, LCL, SD-EWMA, with shift points marked. (a) A shift point is detected at every 100th point. (b) Zoomed view of (a): the shift is detected at the 401st sample, where the series crosses the upper control limit.
Figure: Shift detection based on SD-EWMA; x-axis: sample number; curves: UCL, LCL, SD-EWMA, with shift points marked. (a) Dataset 2 (scaling variance): the shift is detected at 3 points. (b) Dataset 3 (positive auto-correlated): the shift is detected after 1000 observations. (c) Dataset 4 (auto-correlated): the shift is detected after 1000 observations.
Dataset   Total shifts   Lambda (λ)   # TP   # FP   # FN   % Shift detected
D1        9              0.20         8      1      1      88.8
D1        9              0.30         8      3      1      88.8
D1        9              0.40         All    4      0      100
D2        5              0.60         All    11     1      100
D2        5              0.70         All    6      0      100
D2        5              0.80         4      8      1      80
D3        1              0.40         All    13     0      100
D3        1              0.50         All    9      0      100
D3        1              0.60         All    15     0      100
D4        1              0.40         All    7      0      100
D4        1              0.50         All    6      0      100
D4        1              0.60         All    7      0      100
TABLE: SD-EWMA shift detection in time-series data
          ICI-CDT                      SD-EWMA
Dataset   # TP   # FP   # FN   AD      # TP   # FP   # FN   AD
D1        6      0      3      35      All    1      0      0
D2        1      0      4      60      All    6      0      0
D3        1      0      0      80      All    9      0      0
D4        1      0      0      60      All    6      0      0
Table: Simulation results on different tests
Figure 4: A window of 2000 samples obtained from the real-world dataset (x-axis: sample number; y-axis: log(bandpower); curves: UCL, LCL, SD-EWMA).
                                 # Shift points (SD-EWMA)
Lambda (λ)   Number of trials    Session 2   Session 3
0.01         1-20                0           1
0.01         21-45               3           1
0.01         46-70               4           2
0.05         1-20                10          4
0.05         21-45               9           6
0.05         46-70               7           5
0.10         1-20                16          9
0.10         21-45               22          21
0.10         46-70               25          17
TABLE 4: SD-EWMA shift detection in BCI data
Conclusion and Future Work
The drawbacks of classical supervised learning techniques in non-stationary environments and the motivation behind dataset shift-detection were discussed.
The background of non-stationary environments and dataset shift-detection was presented.
The proposed SD-EWMA method was presented and its results were discussed.
In future work, SD-EWMA will be integrated into an adaptive learning framework for non-stationary learning.
Questions
Thank You !