Advanced Methods of Prediction
Motti Sorani, Boaz Cohen
Supervisor: Gady Zohar
Technion - Israel Institute of Technology, Department of Electrical Engineering
The Image and Computer Vision Laboratory
[Demo screenshot of the prediction tool ("Press <a> for information, <q> to quit"). At t=88: PtNum=54, SelDim=7, MSE=0.005135, Real=79, Predicted=79.5789, confidence limits 78.2096 (low) to 80.9483 (high). Plot: prediction of X_{n+1} vs. n, NMSE=0.071472, with Real, Predicted, Hi, and Lo curves.]
Project Goals
• Enhanced prediction scheme based on the former project.
• Better approximation of the system behavior using Kalman filtering.
• Implementation of a competitive prediction tool based on a neural network.
• Implementation of the LZ predictor, adapting its prediction scheme to continuous signals.
Enhanced Prediction Schemes
The following points of weakness were diagnosed in the former project:
• A need for an "optimal" criterion when searching for the optimal evaluation environment.
• Prediction limited to a fixed dimension (the fractal dimension calculated using the GP algorithm).
• Symmetrical environments in the search for an optimal evaluation environment, giving poor results near sharp areas of the system's behavior.
Confidence Interval Criterion
The former criteria: the Neighbor criterion.
[Figure: X_{n+1} vs. X_n, showing the new point X_new, its neighbor X_nei, and the predicted X_new+1 in the optimal environment.]
Confidence Interval Criterion (cont)
The former criteria: the NMSE criterion.
[Figure: X_{n+1} vs. X_n, showing X_new and the predicted X_new+1 in the optimal environment.]
Confidence Interval Criterion (cont)
The new criterion: the Confidence Interval criterion.
Choose the environment in which the regression has the best (minimal) confidence interval.
Motivation: the confidence interval gives us the interval around the predicted value of X_{n+1} in which the true X_{n+1} lies with 90% probability.
Small confidence interval ⇒ better evaluation environment.
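As a rough illustration of this criterion, here is a minimal Python sketch (not the project's code): for each candidate environment around the query point, fit a local linear regression and compute the 90% prediction interval at the query, then keep the environment whose interval is narrowest. The 1-D setting, the symmetric candidate radii, and all names are illustrative assumptions.

```python
import numpy as np
from scipy import stats

def best_environment_prediction(X, y, x_new, radii, conf=0.90):
    """Pick the environment (radius around x_new) whose local linear
    regression yields the narrowest `conf` prediction interval at x_new,
    and return that environment's prediction. (Illustrative sketch.)"""
    best = None
    for r in radii:
        mask = np.abs(X - x_new) <= r             # points inside this environment
        n = int(mask.sum())
        if n < 3:                                  # need df = n - 2 >= 1
            continue
        Xe, ye = X[mask], y[mask]
        A = np.column_stack([np.ones(n), Xe])      # design matrix [1, x]
        beta, *_ = np.linalg.lstsq(A, ye, rcond=None)
        y_hat = beta[0] + beta[1] * x_new
        resid = ye - A @ beta
        s2 = resid @ resid / (n - 2)               # residual variance
        sxx = ((Xe - Xe.mean()) ** 2).sum()
        if sxx == 0:
            continue
        # standard error of a *new* observation predicted at x_new
        se = np.sqrt(s2 * (1 + 1 / n + (x_new - Xe.mean()) ** 2 / sxx))
        half = stats.t.ppf(0.5 + conf / 2, n - 2) * se
        if best is None or half < best[1]:
            best = (y_hat, half)
    return best                                    # (prediction, interval half-width)
```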
Confidence Interval Criterion (cont)
Criteria comparison (prediction NMSE per criterion):

Signal | Points | Prediction horizon | NMSE crit. | Neighbor crit. | Confidence Interval crit.
D      | 1,000  | 100                | 0.037534   | 0.029739       | 0.023888
HEN    | 1,000  | 100                | 0.001858   | 0.002973       | 0.001724
AA     | 1,000  | 100                | 0.128929   | 0.019392       | 0.040898
AA     | 2,180  | 100                | 0.086130   | 0.073822       | 0.081742
Confidence Interval Criterion (cont)
Criteria comparison:
[Figure: prediction of X_{n+1} vs. n for signal D under the three criteria. Confidence Interval panel: NMSE=0.023888; Neighbor panel: NMSE=0.029739; NMSE panel: NMSE=0.037534. Each panel shows the Real and Predicted curves.]
Confidence Interval Criterion - Conclusions
The Confidence Interval criterion proved its superiority over the NMSE criterion; in most cases it was better than the Neighbor criterion as well.
Thus, the Confidence Interval criterion was selected as the major criterion in our experiments.
Enhanced Prediction Schemes (recap)
The next weakness addressed: prediction limited to a fixed dimension (the fractal dimension calculated using the GP algorithm).
Multi Dimensional Prediction
In the former project, prediction was done on a state vector of fixed dimension (the fractal dimension of the set).
The reason:
• Smaller dimension → the attractor is not embedded correctly in the embedding space.
• Bigger dimension → the points move far from each other, which demands a large number of samples.
Multi Dimensional Prediction (cont)
Fixed dimensional prediction:
• Advantage: Speed, Speed, Speed.
• Disadvantage: the fractal dimension calculated is an averaged one. We know that certain areas of the attractor have a bigger dimension than the averaged value.
We want to allow our prediction to increase/decrease the dimension as needed.
Multi Dimensional Prediction (cont)
The solution:

X_n (samples) → Embedding at Dim = 1, 2, 3, …, 10 → Prediction at Dim = 1, 2, 3, …, 10 → Pick Best (in terms of confidence interval) → predicted X_{n+1}
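A minimal Python sketch of this pick-best-dimension idea, under simplifying assumptions (nearest-neighbor affine regression, residual standard deviation as a stand-in for the regression confidence interval); the names and parameters are illustrative:

```python
import numpy as np

def delay_embed(x, dim):
    """Rows are delay vectors [x_t, ..., x_{t+dim-1}]; targets are x_{t+dim}."""
    X = np.array([x[t:t + dim] for t in range(len(x) - dim)])
    return X, np.asarray(x[dim:], dtype=float)

def local_predict(X, y, query, k):
    """Affine regression on the k nearest delay vectors; the residual std
    stands in for the width of the regression confidence interval."""
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]
    A = np.column_stack([np.ones(k), X[idx]])
    beta, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    pred = np.concatenate([[1.0], query]) @ beta
    conf = (y[idx] - A @ beta).std()
    return pred, conf

def multi_dim_predict(x, dims=range(1, 11), k=15):
    """Predict x_{n+1} at every embedding dimension and keep the dimension
    whose local fit is most confident."""
    best = None
    for dim in dims:
        X, y = delay_embed(x, dim)
        if len(X) <= k:
            continue
        pred, conf = local_predict(X, y, np.asarray(x[-dim:], dtype=float), k)
        if best is None or conf < best[1]:
            best = (pred, conf, dim)
    return best   # (prediction, confidence score, chosen dimension)
```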
Multi Dimensional Prediction (cont)
Example. Set: AA, N: 2180, LookAhead: 200.
[Figure: prediction of X_{n+1} vs. n. Multi Dim panel: NMSE=0.071472; fixed Dim = 5 panel: NMSE=0.136284. Each panel shows the Real, Predicted, Hi, and Lo curves.]
Multi Dimensional Prediction - Conclusions
As we expected, multi-dimensional prediction improved the quality of the prediction, at the cost of run-time.
Enhanced Prediction Schemes (recap)
The next weakness addressed: symmetrical environments in the search for an optimal evaluation environment, which give poor results near sharp areas of the system's behavior.
Asymmetrical Evaluation Environment
In the former project we searched for environments that are symmetrical around X_new, which gives poor results near sharp areas.
[Figure: X_{n+1} vs. X_n, showing the optimal symmetric environment around X_new.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 1: partition the range.
[Figure: system transform function on [0, 1]; the range [Min, Max] around X_n is partitioned.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 2: try all possibilities.
[Figure: system transform function; all candidate sub-ranges between Min and Max are evaluated.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 3: find the optimal environment (Opt).
[Figure: system transform function; the optimal sub-range Opt is marked.]
Asymmetrical Evaluation Environment (cont)
The algorithm (by example). Step 4: go to step 1 (repartition); a code sketch of the full loop follows the figure.
[Figure: system transform function; the winning sub-range is repartitioned.]
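A minimal Python sketch of this coarse-to-fine search, assuming a 1-D signal and a pluggable quality criterion (for example, a confidence-interval half-width like the one in the earlier sketch); the names and grid size are illustrative:

```python
import numpy as np

def asymmetric_environment(X, y, x_new, criterion, parts=4, iters=3):
    """Search for an asymmetric environment [lo, hi] around x_new:
    partition the current range, score every pair of grid points that
    brackets x_new, keep the best pair, then repartition inside it."""
    lo, hi = float(X.min()), float(X.max())
    best = (lo, hi)
    for _ in range(iters):
        grid = np.linspace(lo, hi, parts + 1)
        scored = []
        for a in grid:
            for b in grid:
                if not (a < x_new < b):
                    continue                   # environment must contain x_new
                mask = (X >= a) & (X <= b)
                if mask.sum() >= 3:
                    scored.append(((a, b), criterion(X[mask], y[mask], x_new)))
        if not scored:
            break
        best = min(scored, key=lambda t: t[1])[0]
        lo, hi = best                          # step 4: repartition inside the winner
    return best
```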
Asymmetrical Evaluation Environment (cont)
Examples. Set: AA, N: 2180, LookAhead: 100, Dim: 2. Symmetric environment vs. asymmetric environment.
[Figure: prediction of X_{n+1} vs. n. Symmetric environment: NMSE=0.083721, average confidence interval 6.9375, hit ratio 0.8. Asymmetric environment: NMSE=0.081769, average confidence interval 6.1299, hit ratio 0.71. Each panel shows the Real, Predicted, Hi, and Lo curves.]
Asymmetrical Evaluation Environment - Conclusions
• The algorithm succeeds in finding an environment with a minimum value of the quality criterion.
• Thus the confidence interval is reduced, but in some cases the hit ratio isn't improved.
• Possible reason: noise contribution.
System approximation using Kalman Filtering
The model (one-dimensional Kalman filter):
  x(k+1) = A·x(k) + w(k)
  y(k)   = C·x(k) + v(k)
The noises w, v are Gaussian, independent in time, and independent of each other.
Kalman Filter
The filter:
  x̂(k) = a(k)·x̂(k-1) + b(k)·y(k)
• A recursive filter.
• An optimization problem: find a(k) and b(k) that minimize the error
  e(k) = x(k) - x̂(k),   p(k) = E[e²(k)]
The resulting equations:
  p₁(k) = A²·p(k-1) + σ_w²
  b(k)  = C·p₁(k) / (C²·p₁(k) + σ_v²)
  p(k)  = p₁(k) - b(k)·C·p₁(k)
  x̂(k) = A·x̂(k-1) + b(k)·(y(k) - C·A·x̂(k-1))
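These recursions translate directly into code. A minimal Python sketch of the one-dimensional filter, with q = σ_w² and r = σ_v² (names illustrative):

```python
import numpy as np

def scalar_kalman(y, A, C, q, r, x0=0.0, p0=1.0):
    """One-dimensional Kalman filter for x(k+1) = A·x(k) + w, y(k) = C·x(k) + v,
    with Var[w] = q and Var[v] = r. Returns the filtered estimates."""
    x_hat = np.empty(len(y))
    x, p = x0, p0
    for k, yk in enumerate(y):
        x_pred = A * x                          # predict the state
        p_pred = A * A * p + q                  # p1(k): predicted error variance
        b = C * p_pred / (C * C * p_pred + r)   # gain b(k)
        x = x_pred + b * (yk - C * x_pred)      # correct with the innovation
        p = p_pred - b * C * p_pred             # updated error variance p(k)
        x_hat[k] = x
    return x_hat
```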
Kalman Filter
The Extended Kalman Filter (EKF). The model:
  x(k+1) = f(x(k)) + w(k)
  y(k)   = C·x(k) + v(k)
• f is non-linear.
• x, w can be multi-dimensional.
Kalman Filter
The Extended Kalman Filter (EKF). The linearized model:
  x(k+1) = A(k)·x(k) + B(k)·u(k) + w(k)
  y(k)   = C(k)·x(k) + v(k)
• A, B are a local linear approximation of f.
• The EKF doesn't promise us the optimal solution!
Kalman Filter
The Extended Kalman Filter (EKF). The filter:
  P*(k) = A(k-1)·P(k-1)·Aᵀ(k-1) + Q(k-1)
  K(k)  = P*(k)·Cᵀ(k)·[C(k)·P*(k)·Cᵀ(k) + R(k)]⁻¹
  P(k)  = P*(k) - K(k)·C(k)·P*(k)
  x̂(k) = A(k-1)·x̂(k-1) + B(k-1)·u(k-1) + K(k)·[y(k) - C(k)·(A(k-1)·x̂(k-1) + B(k-1)·u(k-1))]
System approximation using Kalman Filtering
[Flowchart: {y(n)} → Embedding → {X(n)} → Regression → transform function f → Kalman Filter → has f converged? If no, repeat; if yes, done, with X_{N+1} = f(X_N).]
Our goal: to eliminate the measurement noise from the state vectors.
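A minimal Python sketch of this filter/re-fit loop, assuming a scalar signal and an affine map f (in the project, f comes from the regression on embedded state vectors and is filtered with the EKF above); all names are illustrative:

```python
import numpy as np

def fit_transform(x_prev, x_next):
    """Affine fit x_{n+1} ≈ c0 + c1·x_n, standing in for the regression step."""
    A = np.column_stack([np.ones_like(x_prev), x_prev])
    c, *_ = np.linalg.lstsq(A, x_next, rcond=None)
    return c

def iterative_kalman_denoise(y, q, r, iters=3):
    """Loop: regress f from the current (filtered) samples, run a Kalman-style
    filter that uses f as the system model, and repeat."""
    y = np.asarray(y, dtype=float)
    x = y.copy()
    for _ in range(iters):
        c0, c1 = fit_transform(x[:-1], x[1:])   # estimate the map x_{n+1} = f(x_n)
        xf = np.empty_like(x)
        xk, p = x[0], 1.0
        xf[0] = xk
        for k in range(1, len(x)):
            x_pred = c0 + c1 * xk               # predict through f
            p_pred = c1 * c1 * p + q            # slope of f plays the role of A(k)
            b = p_pred / (p_pred + r)           # gain with C = 1 (we measure x directly)
            xk = x_pred + b * (y[k] - x_pred)
            p = p_pred - b * p_pred
            xf[k] = xk
        x = xf                                   # re-fit f on the smoother samples
    return x
```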
Kalman Filtering examples
[Figure: real transform vs. filtered transform, linear transform, N=1000.]

Filtering error vs. number of Kalman filtering iterations (the original table also listed the system-noise and measurement-noise levels per row, which did not survive extraction):

Transform | Learning points | ITR=5    | ITR=2    | ITR=1    | Unfiltered
Linear    | 50              | 1.091395 | 1.093282 | 1.093854 | 1.635057
Linear    | 50              | 0.232660 | 0.232660 | 0.232660 | 0.224859
Linear    | 50              | 0.745915 | 0.862583 | 0.911736 | 3.233010
Linear    | 50              | 1.090163 | 1.092035 | 1.092688 | 1.626887
Linear    | 50              | 0.647221 | 0.676751 | 0.893782 | 4.465825
Triangle  | 150             | 0.000489 | 0.000489 | 0.000489 | 0.000534
Triangle  | 450             | 1.291086 | 1.176407 | 1.346896 | 0.436515
Prediction using Kalman Filtering
[Table/plot: NMSE of predicting x_{n+1} from x_n for the triangle transform, over system- and measurement-noise levels spanning powers of ten (about 10⁻² down to 10⁻¹²).]
Prediction using Kalman Filtering
Example: linear transform, N=50.
[Figure: predictions with ITR=5, ITR=1, and without Kalman filtering.]
Prediction using Kalman Filtering - Conclusions
• The EKF demands accurate knowledge of the system's behavior, but obtaining that accurate knowledge is the very reason we use the Kalman filter in the first place…
• We checked the iterative process: filter → improved transform → filter → …
• Signals with fast changes in their behavior are not improved by this scheme (the fast changes are treated as noise, and the filter smoothes the behavior).
• Rule of thumb: prediction will be efficient if the measurement noise is greater than the system noise by at least one order of magnitude.
• In most cases the first iteration is enough.
Competitive tool - neural network
We implemented a competitive prediction tool based on a neural network, to be used as a comparison against our prediction scheme.
We used the backpropagation algorithm to train the network.
The tool was written in MATLAB.
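For flavor, here is a minimal Python sketch of such a predictor (the actual tool was MATLAB): a one-hidden-layer network trained with plain backpropagation to map a window of past samples to the next sample. The architecture, hyperparameters, and names are illustrative assumptions:

```python
import numpy as np

def train_nn_predictor(x, dim=4, hidden=8, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network with plain backpropagation to
    map [x_t, ..., x_{t+dim-1}] -> x_{t+dim}. Returns a predictor function."""
    rng = np.random.default_rng(seed)
    X = np.array([x[t:t + dim] for t in range(len(x) - dim)], dtype=float)
    y = np.asarray(x[dim:], dtype=float)
    W1 = rng.normal(0.0, 0.5, (dim, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden);        b2 = 0.0
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                 # forward pass
        err = (h @ W2 + b2) - y                  # dLoss/dOutput for 0.5*err^2
        dh = np.outer(err, W2) * (1.0 - h**2)    # backprop through tanh
        W2 -= lr * (h.T @ err) / len(y);  b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dh) / len(y);   b1 -= lr * dh.mean(axis=0)
    return lambda window: float(np.tanh(np.asarray(window) @ W1 + b1) @ W2 + b2)
```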
Competitive tool - neural network
Comparison (our predictor uses the Confidence Interval criterion):

Signal | Learning points | Duration | NMSE, neural net. | NMSE, our predictor
AA     | 1000            | 100      | 0.3442            | 0.2835
AA     | 2180            | 100      | 0.2217            | 0.1576
AA     | 3870            | 100      | 0.2251            | 0.0648
D      | 1000            | 100      | 0.3913            | 0.0258
Competitive tool - neural network
Comparison. Set: AA, N: 1000, LookAhead: 100.
[Figure: neural network panel, NMSE=0.3442; our predictor panel, prediction of X_{n+1}, NMSE=0.283492. Each panel shows the Real and Predicted curves.]
Competitive tool - neural network
Comparison. Set: D, N: 1100, LookAhead: 100.
[Figure: neural network panel, NMSE=0.3913; our predictor panel, prediction of X_{n+1}, NMSE=0.025866. Each panel shows the Real and Predicted curves.]
Competitive tool - neural network - conclusions
The comparison between the prediction results of our tool and the neural network shows our tool's superiority for the signals that were tested.
Sequential Prediction
Common usage: signals with finite accuracy.
The idea: the predictor is an FS (finite-state) predictor. It keeps in memory only part of the past knowledge, and thus can be used for sequential prediction of an infinite set.
Sequential Prediction
Some terms before we start…
Alphabet: the set of all possible measurement values. For example, digital information has an alphabet of {0, 1}.
We deal with the case of a finite alphabet.
Sequential Prediction
[Diagram: x_n → FS Predictor → x̂_{n+1}]
The predictor keeps all the information needed for the prediction inside. In other words, the FS predictor keeps an approximation of the system's state, which it updates sequentially.
Sequential Prediction
[Diagram: x_n → FS Predictor → Ĉ_{n+1}, where C_{n+1} = Class(x_{n+1}) and C ∈ {C_1, …, C_m}]
For example: the alphabet {-2, -1, 0, 1, 2}; the classes: Negative {-2, -1}, Non-negative {0, 1, 2}.
Sequential Prediction
The sequential FS prediction scheme:
[Diagram: a chain of states S_n → S_{n+1} → S_{n+2} → …, where each prediction is x̂_{t+1} = f(S_t) and each next state is S_{t+1} = g(S_t, x_{t+1}).]
f is stochastic; g is deterministic.
The problem: find the optimal f and g that minimize the fraction of errors.
Sequential Prediction
Markovian predictor: a Markovian predictor of order k is an FS predictor with the following properties:
• The state is composed of a k-order embedding of the last samples.
• The f-function predicts (see the sketch below):
  x̂_{n+1} = "0" with probability p(0 | x_n, …, x_{n-k+1})
            "1" with probability p(1 | x_n, …, x_{n-k+1})
  with the empiric probability
  p(0 | x_n, …, x_{n-k+1}) = (N(x_{n-k+1} … x_n, 0) + 1) / (N(x_{n-k+1} … x_n) + 2)
The problem: k must increase as n increases.
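A minimal Python sketch of the order-k Markovian predictor over a binary alphabet, using the empiric probability above (names illustrative):

```python
import random
from collections import defaultdict

def markov_error_fraction(bits, k, seed=0):
    """Order-k Markovian predictor: guess each bit from the Laplace-smoothed
    empiric probability conditioned on the last k bits, then update counts.
    Returns the fraction of prediction errors."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: [0, 0])     # context -> [N(ctx, 0), N(ctx, 1)]
    errors = 0
    for n in range(k, len(bits)):
        ctx = tuple(bits[n - k:n])
        n0, n1 = counts[ctx]
        p0 = (n0 + 1) / (n0 + n1 + 2)        # p(0 | last k bits), +1/+2 smoothing
        guess = 0 if rng.random() < p0 else 1
        errors += int(guess != bits[n])
        counts[ctx][bits[n]] += 1            # sequential update of the state
    return errors / max(1, len(bits) - k)
```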
LZ Predictor
An FS predictor that increases its order automatically.
Based on LZ parsing.
LZ Parsing
The result of parsing 00101010100 is: 0, 01, 010, 1, 0100.
The dictionary tree is actually the g-function. The probabilities in the nodes generate the f-function.
The tree is self-increasing.
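A small Python sketch of the incremental (LZ78-style) parsing that builds this dictionary; it reproduces the example above (illustrative, not the project's code):

```python
def lz_parse(s):
    """Incremental LZ parsing: each phrase is the longest previously seen
    phrase extended by one symbol.
    lz_parse('00101010100') -> ['0', '01', '010', '1', '0100']"""
    phrases, seen = [], {""}
    cur = ""
    for ch in s:
        cur += ch
        if cur not in seen:       # longest known phrase just got extended
            seen.add(cur)
            phrases.append(cur)
            cur = ""              # start a new phrase (a new tree leaf)
    if cur:
        phrases.append(cur)       # trailing partial phrase, if any
    return phrases
```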
Applying the LZ Predictor to continuous signals
[Diagram: X_n → map from continuous to discrete → LZ Predictor → Ĉ_{n+1}]
For example: predicting the aim (direction) of the signal.
NOTE: the partitioning of the continuous space into cells is very important for the quality of the prediction.
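One simple such mapping, sketched in Python under the assumption of a uniform partition (the project stresses that the choice of partition matters):

```python
import numpy as np

def quantize(x, levels):
    """Map a continuous signal to a finite alphabet by uniformly partitioning
    its range into `levels` cells and returning each sample's cell index."""
    x = np.asarray(x, dtype=float)
    inner_edges = np.linspace(x.min(), x.max(), levels + 1)[1:-1]
    return np.digitize(x, inner_edges)     # values in 0 .. levels-1
```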
Applying the LZ Predictor to continuous signals
Results: predicting the sequence 000100010001… with salt & pepper noise.
Fraction of errors (rows: noise probability Np; columns: sequence length N):

Np \ N  | 64   | 128  | 256  | 512  | 1024 | 2048 | 4096 | 8192
0.1     | 0.41 | 0.34 | 0.30 | 0.30 | 0.26 | 0.25 | 0.23 | 0.21
0.01    | 0.41 | 0.27 | 0.21 | 0.17 | 0.11 | 0.09 | 0.07 | 0.05
0.001   | 0.38 | 0.26 | 0.22 | 0.15 | 0.11 | 0.08 | 0.06 | 0.04
0.0001  | 0.36 | 0.23 | 0.20 | 0.16 | 0.10 | 0.08 | 0.06 | 0.04
0.00001 | 0.33 | 0.26 | 0.20 | 0.16 | 0.10 | 0.07 | 0.05 | 0.04
Applying the LZ Predictor to continuous signals
Example - stocks: prediction of the aim (direction) of the signal.

LEV | Fraction of errors
2   | 0.475465
4   | 0.489002
8   | 0.495770
16  | 0.467005
32  | 0.458545
Applying the LZ Predictor to continuous signals - Conclusions
• The fraction of errors is lower-bounded, as can be seen in the case of the binary sequence (decreasing the noise probability doesn't decrease the error). The reason: guessing at the leaves of the dictionary tree.
• Discretization of continuous signals shows good results, especially for the STOCKS signal.
• Partitioning the space into cells proved to be very effective.