11-755 Machine Learning for Signal Processing
Regression and Prediction
Class 15. 23 Oct 2012
Instructor: Bhiksha Raj
Matrix Identities

The derivative of a scalar function w.r.t. a vector is a vector; the derivative w.r.t. a matrix is a matrix. For $\mathbf{x} = [x_1\ x_2\ \cdots\ x_D]^\top$:

$$df(\mathbf{x}) = \frac{df}{dx_1}\,dx_1 + \frac{df}{dx_2}\,dx_2 + \cdots + \frac{df}{dx_D}\,dx_D = \left[\frac{df}{dx_1}\ \ \frac{df}{dx_2}\ \ \cdots\ \ \frac{df}{dx_D}\right] d\mathbf{x}$$
Matrix Identities

The derivative of a scalar function w.r.t. a matrix is a matrix. For a $D \times D$ matrix $X$ with entries $x_{ij}$:

$$df(X) = \sum_{i,j} \frac{df}{dx_{ij}}\,dx_{ij}, \qquad \frac{df}{dX} = \begin{bmatrix} \frac{df}{dx_{11}} & \frac{df}{dx_{12}} & \cdots & \frac{df}{dx_{1D}} \\ \frac{df}{dx_{21}} & \frac{df}{dx_{22}} & \cdots & \frac{df}{dx_{2D}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{df}{dx_{D1}} & \frac{df}{dx_{D2}} & \cdots & \frac{df}{dx_{DD}} \end{bmatrix}$$
Matrix Identities

The derivative of a vector function w.r.t. a vector is a matrix. For $\mathbf{F}(\mathbf{x}) = [F_1\ F_2\ \cdots\ F_N]^\top$ and $\mathbf{x} = [x_1\ x_2\ \cdots\ x_D]^\top$:

$$[dF_1\ \ dF_2\ \ \cdots\ \ dF_N] = [dx_1\ \ dx_2\ \ \cdots\ \ dx_D] \begin{bmatrix} \frac{dF_1}{dx_1} & \frac{dF_2}{dx_1} & \cdots & \frac{dF_N}{dx_1} \\ \frac{dF_1}{dx_2} & \frac{dF_2}{dx_2} & \cdots & \frac{dF_N}{dx_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{dF_1}{dx_D} & \frac{dF_2}{dx_D} & \cdots & \frac{dF_N}{dx_D} \end{bmatrix}$$

Note the transposition of order: the $(i, j)$ entry of the derivative is $dF_j / dx_i$, so the matrix is $D \times N$.
Derivatives

In general: differentiating an M×N function by a U×V argument results in an M×N×U×V tensor derivative. For example, differentiating an N×1 function by a U×V argument gives an N×U×V tensor (or U×V×N, depending on the ordering convention).
Matrix derivative identities

Some basic linear and quadratic identities:

$$d(X\mathbf{a}) = (dX)\,\mathbf{a}, \qquad d(\mathbf{a}^\top X) = \mathbf{a}^\top\,dX$$
(X is a matrix, a is a vector; depending on the layout convention, the solution may also involve $X^\top$)

$$d(AX) = A\,(dX), \qquad d(XA) = (dX)\,A$$
(A is a matrix)

$$\frac{d(\mathbf{a}^\top X\mathbf{a})}{dX} = \mathbf{a}\mathbf{a}^\top$$

$$\frac{d\,\mathrm{trace}(XA)}{dX} = \frac{d\,\mathrm{trace}(AX)}{dX} = A^\top, \qquad \frac{d\,\mathrm{trace}(XAX^\top)}{dX} = X(A + A^\top)$$
A Common Problem

Can you spot the glitches?

How to fix this problem? "Glitches" in audio:
- Must be detected. How? Then what?
- Glitches must be "fixed": delete the glitch, which results in a "hole"; then fill in the hole. How?
Interpolation..

- "Extend" the curve on the left to "predict" the values in the "blank" region: forward prediction.
- Extend the blue curve on the right leftwards to predict the blank region: backward prediction.
- How? Regression analysis..
Detecting the Glitch

- Regression-based reconstruction can be done anywhere.
- The reconstructed value will not match the actual value.
- A large error of reconstruction identifies glitches.

[Figure: reconstruction error, small in clean regions ("OK") and large at the glitch ("NOT OK")]
What is a regression

- Analyzing the relationship between variables, expressed in many forms. Wikipedia: linear regression, simple regression, ordinary least squares, polynomial regression, general linear model, generalized linear model, discrete choice, logistic regression, multinomial logit, mixed logit, probit, multinomial probit, ...
- Generally a tool to predict variables.
Regressions for prediction

$$\mathbf{y} = f(\mathbf{x}; \Theta) + e$$

Different possibilities:
- y is a scalar: y is real, or y is categorical (classification); or y is a vector.
- x is a vector: x is a set of real-valued variables, a set of categorical variables, or a combination of the two.
- f(.) is a linear or affine function, a non-linear function, or a time-series model.
A linear regression

Assumption: the relationship between the variables is linear, so a linear trend may be found relating x and y.
- y = dependent variable
- x = explanatory variable
- Given x, y can be predicted as an affine function of x.

[Figure: scatter of the data in the X-Y plane with a fitted line]
An imaginary regression.. (http://pages.cs.wisc.edu/~kovar/hall.html)

"Check this shit out (Fig. 1). That's bonafide, 100% real data, my friends. I took it myself over the course of two weeks. And this was not a leisurely two weeks, either; I busted my ass day and night in order to provide you with nothing but the best data possible. Now, let's look a bit more closely at this data, remembering that it is absolutely first-rate. Do you see the exponential dependence? I sure don't. I see a bunch of crap. Christ, this was such a waste of my time. Banking on my hopes that whoever grades this will just look at the pictures, I drew an exponential through my noise. I believe the apparent legitimacy is enhanced by the fact that I used a complicated computer program to make the fit. I understand this is the same process by which the top quark was discovered."
Linear Regressions

$$\mathbf{y} = A\mathbf{x} + \mathbf{b} + \mathbf{e}, \qquad \mathbf{e} = \text{prediction error}$$

Given a "training" set of {x, y} values, estimate A and b:

$$\mathbf{y}_1 = A\mathbf{x}_1 + \mathbf{b} + \mathbf{e}_1, \quad \mathbf{y}_2 = A\mathbf{x}_2 + \mathbf{b} + \mathbf{e}_2, \quad \mathbf{y}_3 = A\mathbf{x}_3 + \mathbf{b} + \mathbf{e}_3, \quad \ldots$$

If A and b are well estimated, the prediction error will be small.
Linear Regression to a scalar

$$y_1 = \mathbf{a}^\top\mathbf{x}_1 + b + e_1, \quad y_2 = \mathbf{a}^\top\mathbf{x}_2 + b + e_2, \quad y_3 = \mathbf{a}^\top\mathbf{x}_3 + b + e_3$$

Define:

$$\mathbf{y} = [y_1\ y_2\ y_3\ \cdots], \qquad X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots \\ 1 & 1 & 1 & \cdots \end{bmatrix}, \qquad A = \begin{bmatrix} \mathbf{a} \\ b \end{bmatrix}, \qquad \mathbf{e} = [e_1\ e_2\ e_3\ \cdots]$$

Rewrite:

$$\mathbf{y} = A^\top X + \mathbf{e}$$
Learning the parameters

Given training data: several (x, y) pairs. Assuming no error, the model predicts $\hat{\mathbf{y}} = A^\top X$; with error, $\mathbf{y} = A^\top X + \mathbf{e}$.

Can define a "divergence" $D(\mathbf{y}, \hat{\mathbf{y}})$:
- Measures how much $\hat{\mathbf{y}}$ differs from $\mathbf{y}$.
- Ideally, if the model is accurate, this should be small.

Estimate A, b to minimize $D(\mathbf{y}, \hat{\mathbf{y}})$.
The prediction error as divergence

$$y_1 = \mathbf{a}^\top\mathbf{x}_1 + b + e_1, \quad y_2 = \mathbf{a}^\top\mathbf{x}_2 + b + e_2, \quad \ldots \qquad \text{i.e.}\quad \mathbf{y} = A^\top X + \mathbf{e}$$

Define the divergence as the sum of the squared errors in predicting y:

$$D(\mathbf{y}, \hat{\mathbf{y}}) = E = e_1^2 + e_2^2 + e_3^2 + \cdots = (y_1 - \mathbf{a}^\top\mathbf{x}_1 - b)^2 + (y_2 - \mathbf{a}^\top\mathbf{x}_2 - b)^2 + \cdots$$

In matrix form:

$$E = \|\mathbf{y} - A^\top X\|^2 = (\mathbf{y} - A^\top X)(\mathbf{y} - A^\top X)^\top$$
Prediction error as divergence

$$y = \mathbf{a}^\top\mathbf{x} + e, \qquad e = \text{prediction error}$$

Find the "slope" a such that the total squared length of the (vertical) error lines is minimized.
Solving a linear regression

$$\mathbf{y} = A^\top X + \mathbf{e}$$

Minimize the squared error:

$$E = \|\mathbf{y} - A^\top X\|^2 = (\mathbf{y} - A^\top X)(\mathbf{y} - A^\top X)^\top = \mathbf{y}\mathbf{y}^\top - 2\mathbf{y}X^\top A + A^\top XX^\top A$$

Differentiating w.r.t. A and equating to 0:

$$\frac{dE}{dA} = 2XX^\top A - 2X\mathbf{y}^\top = 0 \quad\Rightarrow\quad A = \left(XX^\top\right)^{-1}X\mathbf{y}^\top = \mathrm{pinv}(X^\top)\,\mathbf{y}^\top$$
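A minimal numpy sketch of this closed-form solution (the synthetic data and variable names are illustrative; `X` and `A` follow the slide's notation):

```python
import numpy as np

# Synthetic training data: y = a^T x + b + noise (for illustration only)
rng = np.random.default_rng(0)
D, N = 3, 100                          # input dimension, number of samples
a_true, b_true = np.array([2.0, -1.0, 0.5]), 0.3
x = rng.normal(size=(D, N))            # columns are input vectors
y = a_true @ x + b_true + 0.01 * rng.normal(size=N)

# Append a row of ones so the affine term b is absorbed into A
X = np.vstack([x, np.ones((1, N))])    # (D+1) x N

# Closed form: A = (X X^T)^{-1} X y^T, i.e. pinv(X^T) y^T
A = np.linalg.solve(X @ X.T, X @ y)    # contains [a; b]
print("estimated a:", A[:-1], "estimated b:", A[-1])
```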
An Aside

What happens if we minimize the perpendicular distance to the line instead of the vertical error?
Regression in multiple dimensions

$$\mathbf{y}_1 = A^\top\mathbf{x}_1 + \mathbf{b} + \mathbf{e}_1, \quad \mathbf{y}_2 = A^\top\mathbf{x}_2 + \mathbf{b} + \mathbf{e}_2, \quad \mathbf{y}_3 = A^\top\mathbf{x}_3 + \mathbf{b} + \mathbf{e}_3$$

Here $\mathbf{y}_i$ is a vector; this is also called multiple regression. Equivalent of saying:

$$y_{11} = \mathbf{a}_1^\top\mathbf{x}_1 + b_1 + e_{11}, \quad y_{12} = \mathbf{a}_2^\top\mathbf{x}_1 + b_2 + e_{12}, \quad y_{13} = \mathbf{a}_3^\top\mathbf{x}_1 + b_3 + e_{13}$$

where $y_{ij}$ is the jth component of vector $\mathbf{y}_i$, $\mathbf{a}_i$ is the ith column of A, and $b_i$ is the ith component of b.

Fundamentally no different from N separate single regressions, but we can use the relationship between the ys to our benefit.
Multiple Regression

Define (a 1 is appended to each $\mathbf{x}_i$ to absorb the bias $\mathbf{b}$):

$$Y = [\mathbf{y}_1\ \mathbf{y}_2\ \mathbf{y}_3\ \cdots], \qquad X = \begin{bmatrix} \mathbf{x}_1 & \mathbf{x}_2 & \mathbf{x}_3 & \cdots \\ 1 & 1 & 1 & \cdots \end{bmatrix}, \qquad A = \begin{bmatrix} A \\ \mathbf{b}^\top \end{bmatrix}, \qquad E = [\mathbf{e}_1\ \mathbf{e}_2\ \mathbf{e}_3\ \cdots]$$

so that $Y = A^\top X + E$. The divergence is

$$DIV = \sum_i \|\mathbf{y}_i - A^\top\mathbf{x}_i - \mathbf{b}\|^2 = \mathrm{trace}\!\left((Y - A^\top X)(Y - A^\top X)^\top\right)$$

Differentiating and equating to 0:

$$\frac{d\,DIV}{dA} = 2XX^\top A - 2XY^\top = 0 \quad\Rightarrow\quad A = \left(XX^\top\right)^{-1}XY^\top = \mathrm{pinv}(X^\top)\,Y^\top$$
A Different Perspective

$$\mathbf{y} = A^\top\mathbf{x} + \mathbf{e}, \qquad \mathbf{e} \sim N(0, \sigma^2 I)$$

y is a noisy reading of $A^\top\mathbf{x}$; the error e is Gaussian. Estimate A from

$$Y = [\mathbf{y}_1\ \mathbf{y}_2\ \cdots\ \mathbf{y}_N], \qquad X = [\mathbf{x}_1\ \mathbf{x}_2\ \cdots\ \mathbf{x}_N]$$
The Likelihood of the data

Probability of observing a specific y, given x, for a particular matrix A:

$$P(\mathbf{y} \mid \mathbf{x}; A) = N(A^\top\mathbf{x},\ \sigma^2 I)$$

Probability of the collection $Y = [\mathbf{y}_1 \cdots \mathbf{y}_N]$, $X = [\mathbf{x}_1 \cdots \mathbf{x}_N]$, assuming IID for convenience (not necessary):

$$P(Y \mid X; A) = \prod_i N(A^\top\mathbf{x}_i,\ \sigma^2 I)$$
A Maximum Likelihood Estimate

$$P(Y \mid X; A) = \prod_i \frac{1}{(2\pi\sigma^2)^{D/2}} \exp\!\left(-\frac{\|\mathbf{y}_i - A^\top\mathbf{x}_i\|^2}{2\sigma^2}\right)$$

$$\log P(Y \mid X; A) = C - \frac{1}{2\sigma^2}\sum_i \|\mathbf{y}_i - A^\top\mathbf{x}_i\|^2$$

Maximizing the log probability is identical to minimizing the trace

$$\mathrm{trace}\!\left((Y - A^\top X)(Y - A^\top X)^\top\right),$$

which is identical to the least squares solution:

$$A = \left(XX^\top\right)^{-1}XY^\top = \mathrm{pinv}(X^\top)\,Y^\top$$
Predicting an output

From a collection of training data, we have learned A. Given x for a new instance, but not y, what is y? Simple solution:

$$\hat{\mathbf{y}} = A^\top\mathbf{x}$$
Applying it to our problem

Prediction by regression:

Forward regression:
$$x_t = a_1 x_{t-1} + a_2 x_{t-2} + \cdots + a_K x_{t-K} + e_t$$

Backward regression:
$$x_t = b_1 x_{t+1} + b_2 x_{t+2} + \cdots + b_K x_{t+K} + e_t$$
Applying it to our problem

Forward prediction: stack the predictions for successive samples,

$$[x_t\ \ x_{t-1}\ \ \cdots\ \ x_{K+1}] = \mathbf{a}^\top \begin{bmatrix} x_{t-1} & x_{t-2} & \cdots & x_K \\ x_{t-2} & x_{t-3} & \cdots & x_{K-1} \\ \vdots & \vdots & \ddots & \vdots \\ x_{t-K} & x_{t-K-1} & \cdots & x_1 \end{bmatrix} + [e_t\ \ e_{t-1}\ \ \cdots\ \ e_{K+1}]$$

i.e. $\mathbf{x}_t = \mathbf{a}^\top X + \mathbf{e}$, giving

$$\mathbf{a}^\top = \mathbf{x}_t\,\mathrm{pinv}(X)$$
Applying it to our problem

Backward prediction: predict each sample from the K samples that follow it,

$$[x_{t-K}\ \ x_{t-K-1}\ \ \cdots\ \ x_1] = \mathbf{b}^\top \begin{bmatrix} x_{t-K+1} & x_{t-K} & \cdots & x_2 \\ x_{t-K+2} & x_{t-K+1} & \cdots & x_3 \\ \vdots & \vdots & \ddots & \vdots \\ x_t & x_{t-1} & \cdots & x_{K+1} \end{bmatrix} + [e_{t-K}\ \ e_{t-K-1}\ \ \cdots\ \ e_1]$$

i.e. $\mathbf{x}_t = \mathbf{b}^\top X + \mathbf{e}$, giving

$$\mathbf{b}^\top = \mathbf{x}_t\,\mathrm{pinv}(X)$$
Finding the burst

At each time:
- Learn a "forward" predictor $\mathbf{a}_t$.
- Predict the next sample: $x_t^{est} = \sum_k a_{t,k}\, x_{t-k}$.
- Compute the error: $ferr_t = |x_t - x_t^{est}|^2$.
- Learn a "backward" predictor and compute the backward error $berr_t$ the same way.
- Compute the average prediction error over a window and threshold it, as in the sketch below.
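A minimal numpy sketch of the forward half of this procedure (illustrative only: the predictor order `K`, window length `win`, and any final threshold are arbitrary assumptions; the backward error is obtained the same way on the time-reversed signal):

```python
import numpy as np

def forward_prediction_error(x, K=16, win=256):
    """Per-sample squared forward-prediction error. At each time t, a
    K-tap predictor is re-estimated from the most recent `win` samples
    and used to predict x[t]."""
    err = np.zeros(len(x))
    for t in range(K + win, len(x)):
        seg = x[t - win:t]                        # recent history
        # Regressor matrix: column j holds, in reverse time order, the
        # K samples preceding the target seg[K + j]
        X = np.array([seg[K + j - 1::-1][:K] for j in range(win - K)]).T
        tgt = seg[K:]                             # targets x_t ... x_{K+1}
        a = tgt @ np.linalg.pinv(X)               # a^T = x_t pinv(X)
        pred = a @ x[t - 1:t - K - 1:-1]          # predict sample t
        err[t] = (x[t] - pred) ** 2
    return err

# Average err (and its backward counterpart) over a window and threshold
# the result to flag glitches.
```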
Filling the hole

- Learn a "forward" predictor at the left edge of the "hole". For each missing sample, predict the next sample $x_t^{est} = \sum_k a_k x_{t-k}$, using estimated samples where real samples are not available.
- Learn a "backward" predictor at the right edge of the "hole". For each missing sample, predict $x_t^{est} = \sum_k b_k x_{t+k}$, again using estimated samples where real samples are not available.
- Average the forward and backward predictions, as in the sketch below.
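A minimal sketch of the interpolation step; it assumes the predictor coefficients `a` and `b` have already been learned from the clean samples to the left and right of the hole, and that K valid samples exist on each side:

```python
import numpy as np

def fill_hole(x, start, end, a, b):
    """Interpolate x[start:end] (a deleted glitch) by averaging a
    forward prediction (left to right) and a backward prediction
    (right to left). a, b: K-tap predictor coefficient vectors."""
    K = len(a)
    fwd = x.astype(float).copy()
    for t in range(start, end):                   # left-to-right pass
        fwd[t] = a @ fwd[t - 1:t - K - 1:-1]      # uses estimates as needed
    bwd = x.astype(float).copy()
    for t in range(end - 1, start - 1, -1):       # right-to-left pass
        bwd[t] = b @ bwd[t + 1:t + K + 1]
    x[start:end] = 0.5 * (fwd[start:end] + bwd[start:end])
    return x
```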
Reconstruction: zoom in

[Figures: the next glitch; the reconstruction area, comparing the interpolation result with the actual data; the distorted signal and the recovered signal]
Incrementally learning the regression

$$A = \left(XX^\top\right)^{-1}XY^\top$$

requires knowledge of all (x, y) pairs. Can we learn A incrementally instead, as data comes in?

The Widrow-Hoff rule (scalar prediction version, with learning rate $\eta$):

$$\hat{\mathbf{a}}_{t+1} = \hat{\mathbf{a}}_t + \eta\,\mathbf{x}_t \underbrace{\left(y_t - \hat{\mathbf{a}}_t^\top\mathbf{x}_t\right)}_{\text{error}}$$

Note the structure: each update nudges the weights along the current input, scaled by the prediction error. Can also be done in batch mode!
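A minimal sketch of this update (the learning-rate value is an arbitrary assumption):

```python
import numpy as np

def widrow_hoff(xs, ys, eta=0.01):
    """Incremental (LMS) estimate of the weights a in y ~ a^T x.
    xs: sequence of input vectors; ys: matching scalar targets."""
    a = np.zeros(len(xs[0]))
    for x, y in zip(xs, ys):
        err = y - a @ x          # prediction error on this sample
        a += eta * x * err       # Widrow-Hoff update
    return a
```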
Predicting a value

$$\hat{\mathbf{y}} = A^\top\mathbf{x} = YX^\top\left(XX^\top\right)^{-1}\mathbf{x}$$

What are we doing exactly? Let $C = XX^\top$, and define the normalized, rotated inputs $\hat{\mathbf{x}} = C^{-1/2}\mathbf{x}$ and $\hat{\mathbf{x}}_i = C^{-1/2}\mathbf{x}_i$ (the rotation is irrelevant). Then

$$\hat{\mathbf{y}} = YX^\top C^{-1}\mathbf{x} = \sum_i \mathbf{y}_i\,\hat{\mathbf{x}}_i^\top\hat{\mathbf{x}}$$

The prediction is a weighted combination of the training outputs, weighted by the similarity of the (normalized) inputs.
Relationships are not always linear

How do we model these? Multiple solutions.
Non-linear regression

$$\mathbf{y} = A^\top\boldsymbol{\varphi}(\mathbf{x}) + \mathbf{e}, \qquad \boldsymbol{\varphi}(\mathbf{x}) = [\varphi_1(\mathbf{x})\ \ \varphi_2(\mathbf{x})\ \ \cdots\ \ \varphi_N(\mathbf{x})]^\top$$

$$\Phi(X) = [\boldsymbol{\varphi}(\mathbf{x}_1)\ \ \boldsymbol{\varphi}(\mathbf{x}_2)\ \ \cdots\ \ \boldsymbol{\varphi}(\mathbf{x}_K)]$$

so $Y = A^\top\Phi(X) + E$. Replace X with $\Phi(X)$ in the earlier equations for the solution:

$$A = \left(\Phi(X)\,\Phi(X)^\top\right)^{-1}\Phi(X)\,Y^\top$$
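A minimal sketch using a polynomial basis as the φ (the basis choice and synthetic data are illustrative assumptions):

```python
import numpy as np

def poly_basis(x, degree=3):
    """phi(x) = [1, x, x^2, ..., x^degree] for an array of scalar inputs."""
    return np.vstack([x**d for d in range(degree + 1)])  # (degree+1) x N

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 200)
y = np.sin(3 * x) + 0.1 * rng.normal(size=x.size)

# Replace X with phi(X) in the linear solution
Phi = poly_basis(x)
A = np.linalg.solve(Phi @ Phi.T, Phi @ y)   # basis-function weights
y_hat = A @ Phi                             # non-linear fit to y
```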
What we are doing

Finding the optimal combination of various functions of the input. Remind you of something?
Being non-committal: Local Regression

Regression is usually trained over the entire data and must apply everywhere. How about doing this locally? For any x, recall that the global prediction is

$$\hat{\mathbf{y}} = \sum_i \mathbf{y}_i\,\hat{\mathbf{x}}_i^\top\hat{\mathbf{x}} = \sum_i \left(\mathbf{x}^\top C^{-1}\mathbf{x}_i\right)\mathbf{y}_i + \mathbf{e}$$

Replace the global weights with local, distance-based ones:

$$\hat{\mathbf{y}} = \sum_i d(\mathbf{x}, \mathbf{x}_i)\,\mathbf{y}_i + \mathbf{e}$$
Local Regression

$$\hat{\mathbf{y}} = \sum_i d(\mathbf{x}, \mathbf{x}_i)\,\mathbf{y}_i, \qquad e(\mathbf{x}) = \left\|\mathbf{y} - \sum_i d(\mathbf{x}, \mathbf{x}_i)\,\mathbf{y}_i\right\|^2$$

- The resulting regression is dependent on x! There is no closed form solution, but it can be highly accurate.
- But what is d(x, x')?
Kernel Regression

$$\hat{\mathbf{y}} = \frac{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)\,\mathbf{y}_i}{\sum_i K_h(\mathbf{x} - \mathbf{x}_i)}$$

- Actually a non-parametric MAP estimator of y. Note: an estimator of y, not of the parameters of the regression.
- The "kernel" is the kernel of a Parzen window.
- But first.. MAP estimators..
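A minimal sketch of this estimator with a Gaussian Parzen kernel (the bandwidth `h` is an arbitrary assumption):

```python
import numpy as np

def kernel_regression(x_train, y_train, x_query, h=0.2):
    """Nadaraya-Watson estimate: the kernel-weighted average of the
    training targets, evaluated at a scalar query point."""
    w = np.exp(-0.5 * ((x_query - x_train) / h) ** 2)  # K_h(x - x_i)
    return (w @ y_train) / w.sum()
```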
MAP Estimators

- MAP (Maximum A Posteriori): find a "best guess" for y (in a statistical sense), given that we know x: $y = \arg\max_Y P(Y \mid x)$.
- ML (Maximum Likelihood): find that value of Y for which the statistical best guess of X would have been the observed X: $y = \arg\max_Y P(x \mid Y)$.
- MAP is simpler to visualize.
MAP estimation: Gaussian PDF

Assume X and Y are jointly Gaussian. The parameters of the Gaussian are learned from training data.

[Figure: level sets of the joint Gaussian over the (F0, F1) plane]
Learning the parameters of the Gaussian

$$\mathbf{z} = \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix}, \qquad \boldsymbol{\mu}_z = \frac{1}{N}\sum_i \mathbf{z}_i, \qquad C_z = \frac{1}{N}\sum_i (\mathbf{z}_i - \boldsymbol{\mu}_z)(\mathbf{z}_i - \boldsymbol{\mu}_z)^\top = \begin{bmatrix} C_{XX} & C_{XY} \\ C_{YX} & C_{YY} \end{bmatrix}$$
Learning the parameters of the Gaussian

In terms of the components:

$$\boldsymbol{\mu}_x = \frac{1}{N}\sum_i \mathbf{x}_i, \qquad C_{XY} = \frac{1}{N}\sum_i (\mathbf{x}_i - \boldsymbol{\mu}_x)(\mathbf{y}_i - \boldsymbol{\mu}_y)^\top$$

and similarly for $\boldsymbol{\mu}_y$, $C_{XX}$, $C_{YY}$, and $C_{YX}$.
MAP Estimator for Gaussian RV

Assume X and Y are jointly Gaussian; the parameters of the Gaussian are learned from training data. Now we are given an X, but no Y. What is Y?

[Figure: level set of the Gaussian]
MAP estimation: the Gaussian at a particular value of X

[Figure sequence: the joint Gaussian PDF; the slice of the Gaussian at a particular value x0 of X; the most likely value of Y on that slice]

MAP Estimation of a Gaussian RV

$$\hat{Y} = \arg\max_y P(y \mid X)\ ???$$
So what is this value?

Clearly it lies on a line. Equation of the line (scalar version given; the vector version is identical):

$$\hat{\mathbf{y}} = \boldsymbol{\mu}_Y + C_{YX}\,C_{XX}^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)$$

Derivation? Later in the program a bit.
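A minimal sketch of this estimate, fitting the joint Gaussian to data (the function name is illustrative):

```python
import numpy as np

def gaussian_map_estimate(Z, x_query, dx):
    """MAP (= MMSE) estimate of y given x under a joint Gaussian fit
    to samples Z = [x; y] (one column per sample); dx = dim of x."""
    mu = Z.mean(axis=1)
    C = np.cov(Z)                        # joint covariance estimate
    Cxx, Cyx = C[:dx, :dx], C[dx:, :dx]
    # y_hat = mu_y + C_yx C_xx^{-1} (x - mu_x)
    return mu[dx:] + Cyx @ np.linalg.solve(Cxx, x_query - mu[:dx])
```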
This is a multiple regression

$$\hat{\mathbf{y}} = \boldsymbol{\mu}_Y + C_{YX}\,C_{XX}^{-1}(\mathbf{x} - \boldsymbol{\mu}_X)$$

- This is the MAP estimate of y, NOT the regression parameter.
- What about the ML estimate of y? Again, the ML estimate of y, not of the regression parameter.
It's also a minimum-mean-squared-error estimate

General principle of MMSE estimation: y is unknown, x is known. Estimate y such that the expected squared error is minimized:

$$Err = E\left[\|\mathbf{y} - \hat{\mathbf{y}}\|^2 \mid \mathbf{x}\right]$$

Minimize the above term.
It's also a minimum-mean-squared-error estimate

Minimize the error:

$$Err = E\left[(\mathbf{y} - \hat{\mathbf{y}})^\top(\mathbf{y} - \hat{\mathbf{y}}) \mid \mathbf{x}\right] = E[\mathbf{y}^\top\mathbf{y} \mid \mathbf{x}] - 2\hat{\mathbf{y}}^\top E[\mathbf{y} \mid \mathbf{x}] + \hat{\mathbf{y}}^\top\hat{\mathbf{y}}$$

Differentiating and equating to 0:

$$\frac{d\,Err}{d\hat{\mathbf{y}}} = -2E[\mathbf{y} \mid \mathbf{x}] + 2\hat{\mathbf{y}} = 0 \quad\Rightarrow\quad \hat{\mathbf{y}} = E[\mathbf{y} \mid \mathbf{x}]$$

The MMSE estimate is the mean of the distribution. For the Gaussian, MAP = MMSE: the most likely value is also the mean value. This would be true of any symmetric distribution.
MMSE estimates for mixture distributions

Let P(y|x) be a mixture density: $P(\mathbf{y}\mid\mathbf{x}) = \sum_k P(k\mid\mathbf{x})\,P(\mathbf{y}\mid k,\mathbf{x})$. The MMSE estimate of y is given by

$$E[\mathbf{y}\mid\mathbf{x}] = \int \mathbf{y}\sum_k P(k\mid\mathbf{x})\,P(\mathbf{y}\mid k,\mathbf{x})\,d\mathbf{y} = \sum_k P(k\mid\mathbf{x})\int \mathbf{y}\,P(\mathbf{y}\mid k,\mathbf{x})\,d\mathbf{y} = \sum_k P(k\mid\mathbf{x})\,E[\mathbf{y}\mid k,\mathbf{x}]$$

Just a weighted combination of the MMSE estimates from the component distributions.
MMSE estimates from a Gaussian mixture

Let P(x, y) be a Gaussian mixture over $\mathbf{z} = \begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix}$:

$$P(\mathbf{x}, \mathbf{y}) = P(\mathbf{z}) = \sum_k P(k)\,N(\mathbf{z};\ \boldsymbol{\mu}_k, \Sigma_k)$$

Then P(y|x) is also a Gaussian mixture:

$$P(\mathbf{y}\mid\mathbf{x}) = \frac{P(\mathbf{x},\mathbf{y})}{P(\mathbf{x})} = \frac{\sum_k P(\mathbf{x}, \mathbf{y}, k)}{P(\mathbf{x})} = \sum_k \frac{P(k)\,P(\mathbf{x}\mid k)\,P(\mathbf{y}\mid k,\mathbf{x})}{P(\mathbf{x})} = \sum_k P(k\mid\mathbf{x})\,P(\mathbf{y}\mid k,\mathbf{x})$$
MMSE estimates from a Gaussian mixture

P(y|x) is a Gaussian mixture: for each component,

$$P(\mathbf{x}, \mathbf{y} \mid k) = N\!\left(\begin{bmatrix}\mathbf{x}\\ \mathbf{y}\end{bmatrix};\ \begin{bmatrix}\boldsymbol{\mu}_{k,x}\\ \boldsymbol{\mu}_{k,y}\end{bmatrix}, \begin{bmatrix}C_{k,xx} & C_{k,xy}\\ C_{k,yx} & C_{k,yy}\end{bmatrix}\right)$$

$$P(\mathbf{y}\mid k, \mathbf{x}) = N\!\left(\mathbf{y};\ \boldsymbol{\mu}_{k,y} + C_{k,yx}C_{k,xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{k,x}),\ \Theta_k\right)$$

$$P(\mathbf{y}\mid\mathbf{x}) = \sum_k P(k\mid\mathbf{x})\, N\!\left(\mathbf{y};\ \boldsymbol{\mu}_{k,y} + C_{k,yx}C_{k,xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{k,x}),\ \Theta_k\right)$$
MMSE estimates from a Gaussian mixture

P(y|x) is a mixture density, so E[y|x] is also a mixture:

$$E[\mathbf{y}\mid\mathbf{x}] = \sum_k P(k\mid\mathbf{x})\,E[\mathbf{y}\mid k,\mathbf{x}] = \sum_k P(k\mid\mathbf{x})\left(\boldsymbol{\mu}_{k,y} + C_{k,yx}C_{k,xx}^{-1}(\mathbf{x} - \boldsymbol{\mu}_{k,x})\right)$$

A mixture of the estimates from the individual Gaussians (sketched below).
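A minimal sketch of this mixture-of-estimates computation; it assumes a GMM with full covariances already fit on joint vectors z = [x; y] (here via scikit-learn's `GaussianMixture`, whose `weights_`, `means_`, and `covariances_` attributes are used):

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_mmse_estimate(gmm, x, dx):
    """E[y|x] = sum_k P(k|x) (mu_ky + C_kyx C_kxx^{-1} (x - mu_kx))."""
    resp = np.zeros(gmm.n_components)
    cond_means = []
    for k in range(gmm.n_components):
        mu, C = gmm.means_[k], gmm.covariances_[k]
        mu_x, mu_y = mu[:dx], mu[dx:]
        Cxx, Cyx = C[:dx, :dx], C[dx:, :dx]
        # P(k|x) is proportional to P(k) N(x; mu_kx, C_kxx)
        resp[k] = gmm.weights_[k] * multivariate_normal.pdf(x, mu_x, Cxx)
        cond_means.append(mu_y + Cyx @ np.linalg.solve(Cxx, x - mu_x))
    resp /= resp.sum()
    return resp @ np.array(cond_means)
```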
MMSE with GMM: Voice Transformation

Festvox GMM transformation suite (Toda). Audio examples: awb, bdl, jmk, slt speaker pairs.
Voice Morphing

- Align training recordings from both speakers (cepstral vector sequences).
- Learn a GMM on the joint vectors.
- Given speech from one speaker, find the MMSE estimate of the other.
- Synthesize from the cepstra.
A problem with regressions

$$A = \left(XX^\top\right)^{-1}XY^\top$$

- The ML fit is sensitive: the error is squared, so small variations in the data cause large variations in the weights, and outliers affect it adversely.
- Unstable: if the dimension of X >= the number of instances, $(XX^\top)$ is not invertible.
MAP estimation of weights

Model: $\mathbf{y} = \mathbf{a}^\top X + \mathbf{e}$. Assume the weights are drawn from a Gaussian: $P(\mathbf{a}) = N(0, \sigma^2 I)$.

Maximum likelihood estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log P(\mathbf{y} \mid X; \mathbf{a})$$

Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log P(\mathbf{a} \mid \mathbf{y}, X) = \arg\max_{\mathbf{a}} \log\left[P(\mathbf{y} \mid X, \mathbf{a})\,P(\mathbf{a})\right]$$
MAP estimation of weights

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}} \log\left[P(\mathbf{y} \mid X, \mathbf{a})\,P(\mathbf{a})\right]$$

With $P(\mathbf{a}) = N(0, \sigma^2 I)$: $\log P(\mathbf{a}) = C - \log\sigma - 0.5\,\sigma^{-2}\|\mathbf{a}\|^2$, and

$$\log P(\mathbf{y} \mid X, \mathbf{a}) = C - \frac{1}{2\sigma_e^2}(\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top$$

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\left[C' - \frac{1}{2\sigma_e^2}(\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top - 0.5\,\sigma^{-2}\mathbf{a}^\top\mathbf{a}\right]$$

Similar to the ML estimate, with an additional term.
MAP estimate of weights

Differentiating and equating to 0 (with $\lambda$ absorbing the variance ratio):

$$\frac{dL}{d\mathbf{a}} = 2XX^\top\mathbf{a} - 2X\mathbf{y}^\top + 2\lambda\,\mathbf{a} = 0 \quad\Rightarrow\quad \mathbf{a} = \left(XX^\top + \lambda I\right)^{-1}X\mathbf{y}^\top$$

- Equivalent to diagonal loading of the correlation matrix (sketched below): improves its condition number, so it can be inverted with greater stability, and will not affect the estimation from well-conditioned data.
- Also called Tikhonov regularization; dual form: ridge regression.
- Not to be confused with the MAP estimate of Y.
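A minimal sketch of the diagonally loaded solution (the value of `lam` is an arbitrary assumption):

```python
import numpy as np

def ridge_weights(X, y, lam=1e-2):
    """MAP / Tikhonov-regularized weights: a = (X X^T + lam I)^{-1} X y^T.
    X: D x N matrix whose columns are inputs; y: length-N targets."""
    D = X.shape[0]
    return np.linalg.solve(X @ X.T + lam * np.eye(D), X @ y)
```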
MAP estimate priors

[Figure: left, a Gaussian prior on the weights; right, a Laplacian prior]
MAP estimation of weights with Laplacian prior

Assume the weights are drawn from a Laplacian: $P(\mathbf{a}) = \lambda^{-1}\exp\left(-\lambda^{-1}\|\mathbf{a}\|_1\right)$.

Maximum a posteriori estimate:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\left[C' - (\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top - \lambda^{-1}\|\mathbf{a}\|_1\right]$$

No closed form solution; a quadratic programming solution is required. Non-trivial.
MAP estimation of weights with Laplacian prior

The same maximum a posteriori estimate

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\left[C' - (\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top - \lambda^{-1}\|\mathbf{a}\|_1\right]$$

is identical to L1-regularized least-squares estimation.
L1-regularized LSE

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\left[C' - (\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top - \lambda\|\mathbf{a}\|_1\right]$$

- No closed form solution; quadratic programming solutions are required.
- Dual formulation:

$$\hat{\mathbf{a}} = \arg\max_{\mathbf{a}}\left[C' - (\mathbf{y} - \mathbf{a}^\top X)(\mathbf{y} - \mathbf{a}^\top X)^\top\right] \quad \text{subject to } \|\mathbf{a}\|_1 \le t$$

- "LASSO": Least Absolute Shrinkage and Selection Operator.
LASSO Algorithms

- Various convex optimization algorithms:
- LARS: least angle regression.
- Pathwise coordinate descent..
- Matlab code available from the web; a Python counterpart is sketched below.
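As a rough Python counterpart to the Matlab code the slides mention, scikit-learn's `Lasso` solves the same L1-regularized problem (the synthetic data and `alpha` value are illustrative):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
N, D = 50, 100                        # fewer samples than dimensions
a_true = np.zeros(D)
a_true[:5] = rng.normal(size=5)       # sparse ground-truth weights
X = rng.normal(size=(N, D))           # sklearn expects N x D
y = X @ a_true + 0.01 * rng.normal(size=N)

lasso = Lasso(alpha=0.05).fit(X, y)   # alpha plays the role of lambda
print("nonzero weights:", np.count_nonzero(lasso.coef_))
```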
Regularized least squares

- Regularization results in selection of a suboptimal (in the least-squares sense) solution: one of the loci outside the center.
- Tikhonov regularization selects the shortest solution; L1 regularization selects the sparsest solution.

[Image credit: Tibshirani]
LASSO and Compressive Sensing

$$\mathbf{y} = X\mathbf{a}$$

Given Y and X, estimate the sparse a.
- LASSO: X = explanatory variable, Y = dependent variable, a = weights of regression.
- CS: X = measurement matrix, Y = measurement, a = data.
An interesting problem: Predicting War!

Economists measure a number of social indicators for countries weekly:
- Happiness index
- Hunger index
- Freedom index
- Twitter records
- ...

Question: will there be a revolution or war next week?
An interesting problem: Predicting War!

Issues:
- Dissatisfaction builds up; it is not (usually) an instantaneous phenomenon.
- War / rebellion builds up much faster, often in hours.
- Important to predict: preparedness for security, economic impact.
Predicting War

Given:
- A sequence of economic indicators for each week.
- A sequence of unrest markers for each week.
- At the end of each week we know if war happened or not that week.

Predict the probability of unrest next week. This could be a new unrest or the persistence of a current one.

[Diagram: weeks wk1..wk8, each with hidden war state W and indicators S, emitting observations O1..O8]
A Step Aside: Predicting Time Series

An HMM is a model for time-series data. How can we use it to predict the future?
Predicting with an HMM

Given:
- Observations O1..Ot.
- All HMM parameters, learned from some training data.

Must estimate the future observation $O_{t+1}$:
- The estimate must consider the entire history (O1..Ot).
- We have no knowledge of the actual state of the process at any time.
Predicting with an HMM

Given O1..Ot, compute $P(O_{1..t}, s)$ using the forward algorithm, which computes

$$\alpha(s, t) = P(O_1, O_2, \ldots, O_t,\ state(t) = s)$$

$$P(s_t = s \mid O_{1..t}) = \frac{\alpha(s, t)}{\sum_{s'} \alpha(s', t)}$$
Predicting with an HMM

Given $P(s_t = s \mid O_{1..t})$ for all s:

$$P(s_{t+1} = s \mid O_{1..t}) = \sum_{s'} P(s_t = s' \mid O_{1..t})\,P(s \mid s')$$

$$P(O_{t+1}, s \mid O_{1..t}) = P(O_{t+1} \mid s)\,P(s_{t+1} = s \mid O_{1..t})$$

$$P(O_{t+1} \mid O_{1..t}) = \sum_s P(O_{t+1}, s \mid O_{1..t}) = \sum_s P(O_{t+1} \mid s)\,P(s_{t+1} = s \mid O_{1..t})$$

This is a mixture distribution.
Predicting with an HMM

$$P(O_{t+1} \mid O_{1..t}) = \sum_s P(O_{t+1} \mid s)\,P(s_{t+1} = s \mid O_{1..t})$$

MMSE estimate of $O_{t+1}$ given $O_{1..t}$:

$$E[O_{t+1} \mid O_{1..t}] = \sum_s P(s_{t+1} = s \mid O_{1..t})\,E[O \mid s]$$

A weighted sum of the state means.
Predicting with an HMM

MMSE estimate: $\hat{O}_{t+1} = E[O_{t+1} \mid O_{1..t}] = \sum_s P(s_{t+1} = s \mid O_{1..t})\,E[O \mid s]$.

If $P(O \mid s)$ is a GMM, then $E[O \mid s] = \sum_k w_{k,s}\,\boldsymbol{\mu}_{k,s}$, so

$$\hat{O}_{t+1} = \sum_s \frac{\sum_{s'} \alpha(s', t)\,P(s \mid s')}{\sum_{s''} \alpha(s'', t)}\, \sum_k w_{k,s}\,\boldsymbol{\mu}_{k,s}$$
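A minimal sketch of this one-step-ahead prediction (the parameter layout is an assumption: `trans[s_prev, s] = P(s | s_prev)`, and a single mean per state stands in for the GMM's weighted component means):

```python
import numpy as np

def hmm_predict_next(trans, alpha_t, state_means):
    """MMSE prediction of O_{t+1} from the forward variables alpha(s, t).
    state_means[s] = E[O | s] (a GMM emission would supply
    sum_k w_ks mu_ks here)."""
    p_state = alpha_t / alpha_t.sum()   # P(s_t = s | O_1..t)
    p_next = p_state @ trans            # P(s_{t+1} = s | O_1..t)
    return p_next @ state_means         # sum_s P(s | ...) E[O | s]
```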
Predicting War

- Train an HMM on $\mathbf{z} = [w, \mathbf{s}]$ (war marker and social indicators).
- After the t-th week, predict the probability distribution $P(\mathbf{z}_{t+1} \mid \mathbf{z}_1 \ldots \mathbf{z}_t) = P(w, \mathbf{s} \mid \mathbf{z}_{1..t})$.
- Marginalize out s (not known for next week): $P(w \mid \mathbf{z}_{1..t}) = \int P(w, \mathbf{s} \mid \mathbf{z}_{1..t})\,d\mathbf{s}$.
- War? $E[w \mid \mathbf{z}_{1..t}]$.