Advanced Methods of Prediction
Motti Sorani, Boaz Cohen
Supervisor: Gady Zohar
Technion - Israel Institute of Technology, Department of Electrical Engineering
The Image and Computer Vision Laboratory
[Demo screenshot of the prediction tool ("Press <a> for information, <q> to quit"). At t=88: PtNum=54, SelDim=7, MSE=0.005135, Real=79, Predicted=79.5789, confidence limits 78.2096 (low) to 80.9483 (high). Plot: prediction of X_{n+1} vs. n, NMSE=0.071472, with Real, Predicted, Hi, and Lo curves.]
Project Goals
• Enhanced prediction scheme based on the former project.
• Better approximation of the system behavior using Kalman filtering.
• Implementation of a competitive prediction tool based on a neural network.
• Implementation of the LZ predictor, adapting its prediction scheme to continuous signals.
Enhanced Prediction Schemes
The following points of weakness were diagnosed in the former project:
• A need for an "optimal" criterion when searching for the optimal evaluation environment.
• Prediction limited to a fixed dimension (the fractal dimension calculated using the GP algorithm).
• Symmetrical environments in the search for an optimal evaluation environment, giving poor results near sharp areas of the system's behavior.
Confidence Interval Criterion
The former criteria: the Neighbor criterion.
[Figure: X_{n+1} vs. X_n, showing the new point X_new, its neighbor X_nei, and the predicted X_new+1 in the optimal environment.]
Confidence Interval Criterion (cont)
The former criteria: the NMSE criterion.
[Figure: X_{n+1} vs. X_n, showing X_new and the predicted X_new+1 in the optimal environment.]
Confidence Interval Criterion (cont)
The new criterion: the Confidence Interval criterion.
Choose the environment in which the regression has the best (minimal) confidence interval.
Motivation: the confidence interval gives us the interval around the predicted value of X_{n+1} in which the true X_{n+1} lies with 90% probability.
Small confidence interval ⇒ better evaluation environment.
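As a rough illustration of this criterion, here is a minimal Python sketch (not the project's code): for each candidate environment around the query point, fit a local linear regression and compute the 90% prediction interval at the query, then keep the environment whose interval is narrowest. The 1-D setting, the symmetric candidate radii, and all names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def best_environment_prediction(X, y, x_new, radii, conf=0.90):
    """Pick the environment (radius around x_new) whose local linear
    regression yields the narrowest `conf` prediction interval at x_new,
    and return that environment's prediction. (Illustrative sketch.)"""
    best = None
    for r in radii:
        mask = np.abs(X - x_new) <= r             # points inside this environment
        n = int(mask.sum())
        if n < 3:                                  # need df = n - 2 >= 1
            continue
        Xe, ye = X[mask], y[mask]
        A = np.column_stack([np.ones(n), Xe])      # design matrix [1, x]
        beta, *_ = np.linalg.lstsq(A, ye, rcond=None)
        y_hat = beta[0] + beta[1] * x_new
        resid = ye - A @ beta
        s2 = resid @ resid / (n - 2)               # residual variance
        sxx = ((Xe - Xe.mean()) ** 2).sum()
        if sxx == 0:
            continue
        # standard error of a *new* observation predicted at x_new
        se = np.sqrt(s2 * (1 + 1 / n + (x_new - Xe.mean()) ** 2 / sxx))
        half = stats.t.ppf(0.5 + conf / 2, n - 2) * se
        if best is None or half < best[1]:
            best = (y_hat, half)
    return best                                    # (prediction, interval half-width)
```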
Confidence Interval Criterion (cont)
Criteria comparison (prediction NMSE per criterion):

Signal | Points | Prediction horizon | NMSE crit. | Neighbor crit. | Confidence Interval crit.
D      | 1,000  | 100                | 0.037534   | 0.029739       | 0.023888
HEN    | 1,000  | 100                | 0.001858   | 0.002973       | 0.001724
AA     | 1,000  | 100                | 0.128929   | 0.019392       | 0.040898
AA     | 2,180  | 100                | 0.086130   | 0.073822       | 0.081742
Confidence Interval Criterion (cont)
Criteria comparison:
[Figure: prediction of X_{n+1} vs. n for signal D under the three criteria. Confidence Interval panel: NMSE=0.023888; Neighbor panel: NMSE=0.029739; NMSE panel: NMSE=0.037534. Each panel shows the Real and Predicted curves.]
Confidence Interval Criterion - Conclusions
The Confidence Interval criterion proved its superiority over the NMSE criterion; in most cases it was better than the Neighbor criterion as well.
Thus, the Confidence Interval criterion was selected as the major criterion in our experiments.
Enhanced Prediction Schemes (recap)
The next weakness addressed: prediction limited to a fixed dimension (the fractal dimension calculated using the GP algorithm).
Multi Dimensional Prediction
In the former project, prediction was done on a state vector of fixed dimension (the fractal dimension of the set).
The reason:
• Smaller dimension → the attractor is not embedded correctly in the embedding space.
• Bigger dimension → the points move far from each other, which demands a large number of samples.
Multi Dimensional Prediction (cont)
Fixed dimensional prediction:
• Advantage: Speed, Speed, Speed.
• Disadvantage: the fractal dimension calculated is an averaged one. We know that certain areas of the attractor have a bigger dimension than the averaged value.
We want to allow our prediction to increase/decrease the dimension as needed.
Multi Dimensional Prediction (cont)
The solution:

X_n (samples) → Embedding at Dim = 1, 2, 3, …, 10 → Prediction at Dim = 1, 2, 3, …, 10 → Pick Best (in terms of confidence interval) → predicted X_{n+1}
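A minimal Python sketch of this pick-best-dimension idea, under simplifying assumptions (nearest-neighbor affine regression, residual standard deviation as a stand-in for the regression confidence interval); the names and parameters are illustrative:

```python
import numpy as np

def delay_embed(x, dim):
    """Rows are delay vectors [x_t, ..., x_{t+dim-1}]; targets are x_{t+dim}."""
    X = np.array([x[t:t + dim] for t in range(len(x) - dim)])
    return X, np.asarray(x[dim:], dtype=float)

def local_predict(X, y, query, k):
    """Affine regression on the k nearest delay vectors; the residual std
    stands in for the width of the regression confidence interval."""
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    A = np.column_stack([np.ones(k), X[idx]])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    pred = np.concatenate([[1.0], query]) @ beta
    conf = (y[idx] - A @ beta).std()
    return pred, conf

def multi_dim_predict(x, dims=range(1, 11), k=15):
    """Predict x_{n+1} at every embedding dimension and keep the dimension
    whose local fit is most confident."""
    best = None
    for dim in dims:
        X, y = delay_embed(x, dim)
        if len(X) <= k:
            continue
        pred, conf = local_predict(X, y, np.asarray(x[-dim:], dtype=float), k)
        if best is None or conf < best[1]:
            best = (pred, conf, dim)
    return best   # (prediction, confidence score, chosen dimension)
```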
Multi Dimensional Prediction (cont)
Example. Set: AA, N: 2180, LookAhead: 200.
[Figure: prediction of X_{n+1} vs. n. Multi Dim panel: NMSE=0.071472; fixed Dim = 5 panel: NMSE=0.136284. Each panel shows the Real, Predicted, Hi, and Lo curves.]
Multi Dimensional Prediction - Conclusions
As we expected, multi-dimensional prediction improved the quality of the prediction, at the cost of run-time.
Enhanced Prediction Schemes (recap)
The next weakness addressed: symmetrical environments in the search for an optimal evaluation environment, which give poor results near sharp areas of the system's behavior.
Asymmetrical Evaluation Environment
In the former project we searched for environments that are symmetrical around X_new, which gives poor results near sharp areas.
[Figure: X_{n+1} vs. X_n, showing the optimal symmetric environment around X_new.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 1: partition the range.
[Figure: system transform function on [0, 1]; the range [Min, Max] around X_n is partitioned.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 2: try all possibilities.
[Figure: system transform function; all candidate sub-ranges between Min and Max are evaluated.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 3: find the optimal environment (Opt).
[Figure: system transform function; the optimal sub-range Opt is marked.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 4: go to step 1 (repartition); a code sketch of the full loop follows the figure.
[Figure: system transform function; the winning sub-range is repartitioned.]
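A minimal Python sketch of this coarse-to-fine search, assuming a 1-D signal and a pluggable quality criterion (for example, a confidence-interval half-width like the one in the earlier sketch); the names and grid size are illustrative:

```python
import numpy as np

def asymmetric_environment(X, y, x_new, criterion, parts=4, iters=3):
    """Search for an asymmetric environment [lo, hi] around x_new:
    partition the current range, score every pair of grid points that
    brackets x_new, keep the best pair, then repartition inside it."""
    lo, hi = float(X.min()), float(X.max())
    best = (lo, hi)
    for _ in range(iters):
        grid = np.linspace(lo, hi, parts + 1)
        scored = []
        for a in grid:
            for b in grid:
                if not (a < x_new < b):
                    continue                   # environment must contain x_new
                mask = (X >= a) & (X <= b)
                if mask.sum() >= 3:
                    scored.append(((a, b), criterion(X[mask], y[mask], x_new)))
        if not scored:
            break
        best = min(scored, key=lambda t: t[1])[0]
        lo, hi = best                          # step 4: repartition inside the winner
    return best
```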
Asymmetrical Evaluation Environment (cont)
Examples. Set: AA, N: 2180, LookAhead: 100, Dim: 2. Symmetric environment vs. asymmetric environment.
[Figure: prediction of X_{n+1} vs. n. Symmetric environment: NMSE=0.083721, average confidence interval 6.9375, hit ratio 0.8. Asymmetric environment: NMSE=0.081769, average confidence interval 6.1299, hit ratio 0.71. Each panel shows the Real, Predicted, Hi, and Lo curves.]
Asymmetrical Evaluation Environment - Conclusions
• The algorithm succeeds in finding an environment with a minimum value of the quality criterion.
• Thus the confidence interval is reduced, but in some cases the hit ratio isn't improved.
• Possible reason: noise contribution.
System approximation using Kalman Filtering
The model (one-dimensional Kalman filter):
  x(k+1) = A·x(k) + w(k)
  y(k)   = C·x(k) + v(k)
The noises w, v are Gaussian, independent in time, and independent of each other.
Kalman Filter
The filter:
  x̂(k) = a(k)·x̂(k-1) + b(k)·y(k)
• A recursive filter.
• An optimization problem: find a(k) and b(k) that minimize the error
  e(k) = x(k) - x̂(k),   p(k) = E[e²(k)]
The resulting equations:
  p₁(k) = A²·p(k-1) + σ_w²
  b(k)  = C·p₁(k) / (C²·p₁(k) + σ_v²)
  p(k)  = p₁(k) - b(k)·C·p₁(k)
  x̂(k) = A·x̂(k-1) + b(k)·(y(k) - C·A·x̂(k-1))
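These recursions translate directly into code. A minimal Python sketch of the one-dimensional filter, with q = σ_w² and r = σ_v² (names illustrative):

```python
import numpy as np

def scalar_kalman(y, A, C, q, r, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter for x(k+1) = A·x(k) + w, y(k) = C·x(k) + v,
    with Var[w] = q and Var[v] = r. Returns the filtered estimates."""
    x_hat = np.empty(len(y))
    x, p = x0, p0
    for k, yk in enumerate(y):
        x_pred = A * x                          # predict the state
        p_pred = A * A * p + q                  # p1(k): predicted error variance
        b = C * p_pred / (C * C * p_pred + r)   # gain b(k)
        x = x_pred + b * (yk - C * x_pred)      # correct with the innovation
        p = p_pred - b * C * p_pred             # updated error variance p(k)
        x_hat[k] = x
    return x_hat
```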
Kalman Filter
The Extended Kalman Filter (EKF). The model:
  x(k+1) = f(x(k)) + w(k)
  y(k)   = C·x(k) + v(k)
• f is non-linear.
• x, w can be multi-dimensional.
Kalman Filter
The Extended Kalman Filter (EKF). The linearized model:
  x(k+1) = A(k)·x(k) + B(k)·u(k) + w(k)
  y(k)   = C(k)·x(k) + v(k)
• A, B are a local linear approximation of f.
• The EKF doesn't promise us the optimal solution!
Kalman Filter
The Extended Kalman Filter (EKF). The filter:
  P*(k) = A(k-1)·P(k-1)·Aᵀ(k-1) + Q(k-1)
  K(k)  = P*(k)·Cᵀ(k)·[C(k)·P*(k)·Cᵀ(k) + R(k)]⁻¹
  P(k)  = P*(k) - K(k)·C(k)·P*(k)
  x̂(k) = A(k-1)·x̂(k-1) + B(k-1)·u(k-1) + K(k)·[y(k) - C(k)·(A(k-1)·x̂(k-1) + B(k-1)·u(k-1))]
System approximation using Kalman Filtering
[Flowchart: {y(n)} → Embedding → {X(n)} → Regression → transform function f → Kalman Filter → has f converged? If no, repeat; if yes, done, with X_{N+1} = f(X_N).]
Our goal: to eliminate the measurement noise from the state vectors.
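A minimal Python sketch of this filter/re-fit loop, assuming a scalar signal and an affine map f (in the project, f comes from the regression on embedded state vectors and is filtered with the EKF above); all names are illustrative:

```python
import numpy as np

def fit_transform(x_prev, x_next):
    """Affine fit x_{n+1} ≈ c0 + c1·x_n, standing in for the regression step."""
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    c, *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return c

def iterative_kalman_denoise(y, q, r, iters=3):
    """Loop: regress f from the current (filtered) samples, run a Kalman-style
    filter that uses f as the system model, and repeat."""
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(iters):
        c0, c1 = fit_transform(x[:-1], x[1:])   # estimate the map x_{n+1} = f(x_n)
        xf = np.empty_like(x)
        xk, p = x[0], 1.0
        xf[0] = xk
        for k in range(1, len(x)):
            x_pred = c0 + c1 * xk               # predict through f
            p_pred = c1 * c1 * p + q            # slope of f plays the role of A(k)
            b = p_pred / (p_pred + r)           # gain with C = 1 (we measure x directly)
            xk = x_pred + b * (y[k] - x_pred)
            p = p_pred - b * p_pred
            xf[k] = xk
        x = xf                                   # re-fit f on the smoother samples
    return x
```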
Kalman Filtering examples
[Figure: real transform vs. filtered transform, linear transform, N=1000.]

Filtering error vs. number of Kalman filtering iterations (the original table also listed the system-noise and measurement-noise levels per row, which did not survive extraction):

Transform | Learning points | ITR=5    | ITR=2    | ITR=1    | Unfiltered
Linear    | 50              | 1.091395 | 1.093282 | 1.093854 | 1.635057
Linear    | 50              | 0.232660 | 0.232660 | 0.232660 | 0.224859
Linear    | 50              | 0.745915 | 0.862583 | 0.911736 | 3.233010
Linear    | 50              | 1.090163 | 1.092035 | 1.092688 | 1.626887
Linear    | 50              | 0.647221 | 0.676751 | 0.893782 | 4.465825
Triangle  | 150             | 0.000489 | 0.000489 | 0.000489 | 0.000534
Triangle  | 450             | 1.291086 | 1.176407 | 1.346896 | 0.436515
Prediction using Kalman Filtering
[Table/plot: NMSE of predicting x_{n+1} from x_n for the triangle transform, over system- and measurement-noise levels spanning powers of ten (about 10⁻² down to 10⁻¹²).]
Prediction using Kalman Filtering
Example: linear transform, N=50.
[Figure: predictions with ITR=5, ITR=1, and without Kalman filtering.]
Prediction using Kalman Filtering - Conclusions
• The EKF demands accurate knowledge of the system's behavior, but obtaining that accurate knowledge is the very reason we use the Kalman filter in the first place…
• We checked the iterative process: filter → improved transform → filter → …
• Signals with fast changes in their behavior are not improved by this scheme (the fast changes are treated as noise, and the filter smoothes the behavior).
• Rule of thumb: prediction will be efficient if the measurement noise is greater than the system noise by at least one order of magnitude.
• In most cases the first iteration is enough.
Competitive tool - neural network
We implemented a competitive prediction tool based on a neural network, to be used as a comparison against our prediction scheme.
We used the backpropagation algorithm to train the network.
The tool was written in MATLAB.
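For flavor, here is a minimal Python sketch of such a predictor (the actual tool was MATLAB): a one-hidden-layer network trained with plain backpropagation to map a window of past samples to the next sample. The architecture, hyperparameters, and names are illustrative assumptions:

```python
import numpy as np

def train_nn_predictor(x, dim=4, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network with plain backpropagation to
    map [x_t, ..., x_{t+dim-1}] -> x_{t+dim}. Returns a predictor function."""
    rng = np.random.default_rng(seed)
    X = np.array([x[t:t + dim] for t in range(len(x) - dim)], dtype=float)
    y = np.asarray(x[dim:], dtype=float)
    W1 = rng.normal(0.0, 0.5, (dim, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden);        b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # forward pass
        err = (h @ W2 + b2) - y                  # dLoss/dOutput for 0.5*err^2
        dh = np.outer(err, W2) * (1.0 - h**2)    # backprop through tanh
        W2 -= lr * (h.T @ err) / len(y);  b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dh) / len(y);   b1 -= lr * dh.mean(axis=0)
    return lambda window: float(np.tanh(np.asarray(window) @ W1 + b1) @ W2 + b2)
```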
Competitive tool - neural network
Comparison (our predictor uses the Confidence Interval criterion):

Signal | Learning points | Duration | NMSE, neural net. | NMSE, our predictor
AA     | 1000            | 100      | 0.3442            | 0.2835
AA     | 2180            | 100      | 0.2217            | 0.1576
AA     | 3870            | 100      | 0.2251            | 0.0648
D      | 1000            | 100      | 0.3913            | 0.0258
Competitive tool - neural network
Comparison. Set: AA, N: 1000, LookAhead: 100.
[Figure: neural network panel, NMSE=0.3442; our predictor panel, prediction of X_{n+1}, NMSE=0.283492. Each panel shows the Real and Predicted curves.]
Competitive tool - neural network
Comparison. Set: D, N: 1100, LookAhead: 100.
[Figure: neural network panel, NMSE=0.3913; our predictor panel, prediction of X_{n+1}, NMSE=0.025866. Each panel shows the Real and Predicted curves.]
Competitive tool - neural network - conclusions
The comparison between the prediction results of our tool and the neural network shows our tool's superiority for the signals that were tested.
Sequential Prediction
Common usage: signals with finite accuracy.
The idea: the predictor is an FS (finite-state) predictor. It keeps in memory only part of the past knowledge, and thus can be used for sequential prediction of an infinite set.
Sequential Prediction
Some terms before we start…
Alphabet: the set of all possible measurement values. For example, digital information has an alphabet of {0, 1}.
We deal with the case of a finite alphabet.
Sequential Prediction
[Diagram: x_n → FS Predictor → x̂_{n+1}]
The predictor keeps all the information needed for the prediction inside. In other words, the FS predictor keeps an approximation of the system's state, which it updates sequentially.
Sequential Prediction
[Diagram: x_n → FS Predictor → Ĉ_{n+1}, where C_{n+1} = Class(x_{n+1}) and C ∈ {C_1, …, C_m}]
For example: the alphabet {-2, -1, 0, 1, 2}; the classes: Negative {-2, -1}, Non-negative {0, 1, 2}.
Sequential Prediction
The sequential FS prediction scheme:
[Diagram: a chain of states S_n → S_{n+1} → S_{n+2} → …, where each prediction is x̂_{t+1} = f(S_t) and each next state is S_{t+1} = g(S_t, x_{t+1}).]
f is stochastic; g is deterministic.
The problem: find the optimal f and g that minimize the fraction of errors.
Sequential Prediction
Markovian predictor: a Markovian predictor of order k is an FS predictor with the following properties:
• The state is composed of a k-order embedding of the last samples.
• The f-function predicts (see the sketch below):
  x̂_{n+1} = "0" with probability p(0 | x_n, …, x_{n-k+1})
            "1" with probability p(1 | x_n, …, x_{n-k+1})
  with the empiric probability
  p(0 | x_n, …, x_{n-k+1}) = (N(x_{n-k+1} … x_n, 0) + 1) / (N(x_{n-k+1} … x_n) + 2)
The problem: k must increase as n increases.
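A minimal Python sketch of the order-k Markovian predictor over a binary alphabet, using the empiric probability above (names illustrative):

```python
import random
from collections import defaultdict

def markov_error_fraction(bits, k, seed=0):
    """Order-k Markovian predictor: guess each bit from the Laplace-smoothed
    empiric probability conditioned on the last k bits, then update counts.
    Returns the fraction of prediction errors."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])     # context -> [N(ctx, 0), N(ctx, 1)]
    errors = 0
    for n in range(k, len(bits)):
        ctx = tuple(bits[n - k:n])
        n0, n1 = counts[ctx]
        p0 = (n0 + 1) / (n0 + n1 + 2)        # p(0 | last k bits), +1/+2 smoothing
        guess = 0 if rng.random() < p0 else 1
        errors += int(guess != bits[n])
        counts[ctx][bits[n]] += 1            # sequential update of the state
    return errors / max(1, len(bits) - k)
```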
LZ Predictor
An FS predictor that increases its order automatically.
Based on LZ parsing.
LZ Parsing
The result of parsing 00101010100 is: 0, 01, 010, 1, 0100.
The dictionary tree is actually the g-function. The probabilities in the nodes generate the f-function.
The tree is self-increasing.
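A small Python sketch of the incremental (LZ78-style) parsing that builds this dictionary; it reproduces the example above (illustrative, not the project's code):

```python
def lz_parse(s):
    """Incremental LZ parsing: each phrase is the longest previously seen
    phrase extended by one symbol.
    lz_parse('00101010100') -> ['0', '01', '010', '1', '0100']"""
    phrases, seen = [], {""}
    cur = ""
    for ch in s:
        cur += ch
        if cur not in seen:       # longest known phrase just got extended
            seen.add(cur)
            phrases.append(cur)
            cur = ""              # start a new phrase (a new tree leaf)
    if cur:
        phrases.append(cur)       # trailing partial phrase, if any
    return phrases
```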
Applying the LZ Predictor to continuous signals
[Diagram: X_n → map from continuous to discrete → LZ Predictor → Ĉ_{n+1}]
For example: predicting the aim (direction) of the signal.
NOTE: the partitioning of the continuous space into cells is very important for the quality of the prediction.
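One simple such mapping, sketched in Python under the assumption of a uniform partition (the project stresses that the choice of partition matters):

```python
import numpy as np

def quantize(x, levels):
    """Map a continuous signal to a finite alphabet by uniformly partitioning
    its range into `levels` cells and returning each sample's cell index."""
    x = np.asarray(x, dtype=float)
    inner_edges = np.linspace(x.min(), x.max(), levels + 1)[1:-1]
    return np.digitize(x, inner_edges)     # values in 0 .. levels-1
```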
Applying the LZ Predictor to continuous signals
Results: predicting the sequence 000100010001… with salt & pepper noise.
Fraction of errors (rows: noise probability Np; columns: sequence length N):

Np \ N  | 64   | 128  | 256  | 512  | 1024 | 2048 | 4096 | 8192
0.1     | 0.41 | 0.34 | 0.30 | 0.30 | 0.26 | 0.25 | 0.23 | 0.21
0.01    | 0.41 | 0.27 | 0.21 | 0.17 | 0.11 | 0.09 | 0.07 | 0.05
0.001   | 0.38 | 0.26 | 0.22 | 0.15 | 0.11 | 0.08 | 0.06 | 0.04
0.0001  | 0.36 | 0.23 | 0.20 | 0.16 | 0.10 | 0.08 | 0.06 | 0.04
0.00001 | 0.33 | 0.26 | 0.20 | 0.16 | 0.10 | 0.07 | 0.05 | 0.04
Applying the LZ Predictor to continuous signals
Example - stocks: prediction of the aim (direction) of the signal.

LEV | Fraction of errors
2   | 0.475465
4   | 0.489002
8   | 0.495770
16  | 0.467005
32  | 0.458545
Applying the LZ Predictor to continuous signals - Conclusions
• The fraction of errors is lower-bounded, as can be seen in the case of the binary sequence (decreasing the noise probability doesn't decrease the error). The reason: guessing at the leaves of the dictionary tree.
• Discretization of continuous signals shows good results, especially for the STOCKS signal.
• Partitioning the space into cells proved to be very effective.