Graphical Models of Probability
Graphical models use directed or undirected graphs over a set of random variables to specify variable dependencies explicitly. They allow less restrictive independence assumptions while limiting the number of parameters that must be estimated.
Bayesian Networks: Directed acyclic graphs that indicate causal structure.
Markov Networks: Undirected graphs that capture general dependencies.
Middleware, CCNT, ZJU, 04/10/23
Hidden Markov Model
Zhejiang Univ
CCNT
Yueshen Xu
Overview
Markov Chain
HMM
Three Core Problems and Algorithms
Application
Markov Chain
Instance
We can regard the weather as three states:
  state 1: Rain
  state 2: Cloudy
  state 3: Sun

            Tomorrow
Today       Rain   Cloudy   Sun
Rain        0.4    0.3      0.3
Cloudy      0.2    0.6      0.2
Sun         0.1    0.1      0.8

We can obtain this transition matrix through long-term observation.
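As a minimal sketch of how such a transition matrix is used, the code below (plain Python, no libraries) propagates today's weather forward n days by repeated multiplication with the matrix above:

```python
# Sketch: using the weather transition matrix above to forecast the
# state distribution n days ahead of a known starting state.
ROWS = ["Rain", "Cloudy", "Sun"]
P = [
    [0.4, 0.3, 0.3],  # from Rain
    [0.2, 0.6, 0.2],  # from Cloudy
    [0.1, 0.1, 0.8],  # from Sun
]

def step(dist, P):
    """One step of the chain: new_dist[j] = sum_i dist[i] * P[i][j]."""
    return [sum(dist[i] * P[i][j] for i in range(len(P))) for j in range(len(P))]

def forecast(today, days):
    """Distribution over {Rain, Cloudy, Sun} `days` days after `today`."""
    dist = [1.0 if s == today else 0.0 for s in ROWS]
    for _ in range(days):
        dist = step(dist, P)
    return dict(zip(ROWS, dist))

print(forecast("Sun", 1))   # {'Rain': 0.1, 'Cloudy': 0.1, 'Sun': 0.8}
print(forecast("Sun", 2))
```

Note how the one-day forecast is just the "Sun" row of the matrix, while longer horizons mix the rows together.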
Definition

One-step transition probability: aij = P(q(t+1) = sj | q(t) = si)

That is to say, the evolution of the stochastic process relies only on the current state and has nothing to do with earlier states. We call this the Markov property, and such a process is called a Markov process.

State space: S = {s1, s2, ..., sN}
Observation sequence: q1, q2, ..., qT
Keystone

State transition matrix: A = [aij], where aij = P(q(t+1) = sj | q(t) = si),
with aij ≥ 0 and Σj aij = 1 for every state si.

Initial state probability vector: π = (π1, ..., πN), where πi = P(q1 = si).
HMM
A HMM is a doubly stochastic process consisting of two parallel parts:
- Markov chain: describes the transition of the states by means of the transition probability matrix; this part is unobservable.
- Common stochastic process: describes the stochastic process of the observable events.

HMM:
  Markov chain (π, A)     →  state sequence q1, q2, ..., qT         (unobservable: the core)
  Stochastic process (B)  →  observation sequence o1, o2, ..., oT   (observable: the feature)
The model has three states S1, S2 and S3, and two output symbols, a and b. Each arc carries a transition probability and an output probability distribution:

Arc      Transition   P(a)   P(b)
S1→S1    0.3          0.8    0.2
S1→S2    0.5          1.0    0
S1→S3    0.2          0      1.0
S2→S2    0.4          0.3    0.7
S2→S3    0.6          0.5    0.5

Example: What is the probability that this stochastic process produces the sequence "aab", ending in the final state S3?
Instance 1: S1→S1→S2→S3
P = 0.3×0.8 × 0.5×1.0 × 0.6×0.5 = 0.036
Instance 2: S1→S2→S2→S3
P = 0.5×1.0 × 0.4×0.3 × 0.6×0.5 = 0.018
Instance 3: S1→S1→S1→S3
P = 0.3×0.8 × 0.3×0.8 × 0.2×1.0 = 0.01152

Therefore, the total probability is 0.036 + 0.018 + 0.01152 = 0.06552.
We just know "aab", but we don't know the state sequence "S?S?S?": that's the point.
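The hand computation above can be checked by brute force: enumerate every length-3 state path that starts in S1 and finishes in the final state S3, and sum the probability of the paths that emit the observed sequence. The arc numbers below are taken from the figure.

```python
from itertools import product

# Arcs of the model: (from, to) -> (transition prob, {symbol: emission prob})
ARCS = {
    (1, 1): (0.3, {"a": 0.8, "b": 0.2}),
    (1, 2): (0.5, {"a": 1.0, "b": 0.0}),
    (1, 3): (0.2, {"a": 0.0, "b": 1.0}),
    (2, 2): (0.4, {"a": 0.3, "b": 0.7}),
    (2, 3): (0.6, {"a": 0.5, "b": 0.5}),
}

def prob_of_sequence(obs, start=1, end=3):
    """Sum, over every state path from `start` that finishes in `end`,
    of the product of transition and emission probabilities."""
    total = 0.0
    for path in product((1, 2, 3), repeat=len(obs)):
        if path[-1] != end:   # the walk must finish in the final state S3
            continue
        p, cur = 1.0, start
        for nxt, sym in zip(path, obs):
            trans, emit = ARCS.get((cur, nxt), (0.0, {}))
            p *= trans * emit.get(sym, 0.0)
            cur = nxt
        total += p
    return total

print(round(prob_of_sequence("aab"), 5))  # 0.06552
```

Only the three paths worked out above contribute a nonzero term; every other path either uses a missing arc or emits the wrong symbol with probability 0.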
Description
A HMM can be identified by the parameters below:
  N: the number of states
  M: the number of observable events for each state
  A: the state transition matrix
  B: the observable event (emission) probability matrix
  π: the initial state probability distribution

We generally record it as λ = (A, B, π).
Three Core Problems

Evaluation: given the observation sequence O = o1, o2, ..., oT and the model λ = (A, B, π), how can we calculate P(O|λ)?

Optimization: based on problem 1, how do we choose a state sequence S = q1, q2, ..., qT so that the observation sequence O is explained most reasonably?

Training: based on problem 1, how do we adjust the parameters of the model λ = (A, B, π) to maximize P(O|λ)?

We know O, but we don't know Q.
Solution
There is no need to expound those algorithms here, since we should pay attention to the application context.

Evaluation: the Forward and Backward algorithms (dynamic programming)
Optimization: the Viterbi algorithm (dynamic programming)
Training: the Baum-Welch algorithm (iterative) and maximum likelihood estimation

You can think over and deduce these methods after the workshop.
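As a sketch of the first of these, here is a minimal Forward algorithm for the standard state-emission formulation of a HMM (the small example earlier attaches emissions to arcs instead; the toy numbers below are made up):

```python
def forward(obs, pi, A, B):
    """Forward algorithm: alpha[t][i] = P(o_1..o_t, q_t = i | lambda).

    pi[i]   : initial probability of state i
    A[i][j] : transition probability i -> j
    B[i][o] : probability that state i emits observation o
    Returns P(obs | lambda) = sum_i alpha[T-1][i].
    """
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: sum over the final states
    return sum(alpha)

# Toy two-state example (made-up numbers; observations coded as 0/1):
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.5], [0.1, 0.9]]
print(forward([0, 1, 1], pi, A, B))
```

The point of the dynamic programming is that this runs in O(N²T) instead of enumerating all N^T state paths.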
Application Context
Just think it over: given the features of a HMM, which kinds of problems can it describe and model?

Two stochastic sequences, where one depends on the other (or the two are correlated). One can be "seen", but the other cannot. Just think about the three core problems...

I think we can draw a conclusion: use one sequence to deduce and predict the other, that is, find out who is behind: the "Iceberg" problem.
Application Context (1): Voice Recognition
Statistical description:
I. The characteristic pattern of the voice, obtained by sampling: T = t1, t2, ..., tn
II. The word sequence W(n): W1, W2, ..., Wn
III. Therefore, what we care about is P(W(n)|T)

Formalization: what we have to solve is
  k = arg max over n of { P(W(n)|T) }
Application Context (1): Voice Recognition

[Figure: recognition framework. Speech database → feature extraction (waveform → feature) → Baum-Welch re-estimation of HMMs 1, 2, ..., 7 → converged? If no, re-estimate again; if yes, end.]
Application Context (2): Text Information Extraction
Figure out the HMM model λ = (A, B, π):
Q1: What is the state, and what is the observation event?
    State: what you want to extract. Observation event: a text block, each word, etc.
Q2: How to figure out the parameters, such as aij?
    Through training samples.
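When the training samples are already labeled with states, the transition parameters can be estimated by simple frequency counting. A toy sketch (the field names are hypothetical labels, not from any real dataset):

```python
from collections import Counter

def estimate_transitions(state_sequences):
    """Supervised estimate: a_ij = count(i -> j) / count(i -> anything),
    computed from labeled training state sequences."""
    pair_counts, from_counts = Counter(), Counter()
    for seq in state_sequences:
        for cur, nxt in zip(seq, seq[1:]):
            pair_counts[(cur, nxt)] += 1
            from_counts[cur] += 1
    return {pair: c / from_counts[pair[0]] for pair, c in pair_counts.items()}

# Hypothetical labeled header fields from two training documents:
samples = [
    ["title", "author", "email", "abstract"],
    ["title", "author", "author", "abstract"],
]
A = estimate_transitions(samples)
print(A[("title", "author")])            # 1.0
print(round(A[("author", "email")], 3))  # 0.333
```

Emission probabilities B are estimated the same way, by counting which words each state produces; with unlabeled samples one would fall back on Baum-Welch instead.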
Application Context (2): Text Information Extraction

[Figure: extraction framework. Training samples → document partitioning → HMM → state list → extracted sequence. Example states: country, state, city, street; or title, author, email, abstract.]
Application Context (3): Other Fields

Face recognition; POS tagging; Web data extraction; bioinformatics; network intrusion detection; handwriting recognition; document categorization; multiple sequence alignment; ...

Which field are you interested in?
Bayes Belief Network
Yueshen Xu, too
Overview
Bayes Theorem
Naïve Bayes Theorem
Bayes Belief Network
Application
Bayes Theorem
Basic Bayes formula: the basis of everything here, and vital.

P(Bi|A) = P(A|Bi) P(Bi) / P(A)

P(Bi|A) = P(A|Bi) P(Bi) / Σj P(A|Bj) P(Bj),   i = 1, 2, ..., n

P(Bi) is the prior probability, P(Bi|A) is the posterior probability, and the denominator Σj P(A|Bj) P(Bj) = P(A) is the total probability formula.
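As a minimal sketch, the formula can be evaluated directly; the priors and likelihoods below are made-up numbers:

```python
def posterior(priors, likelihoods):
    """Bayes formula: P(B_i|A) = P(A|B_i)P(B_i) / sum_j P(A|B_j)P(B_j)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]
    total = sum(joint)           # the total probability formula: P(A)
    return [j / total for j in joint]

# Made-up example: two hypotheses B1, B2 with priors 0.01 / 0.99,
# and likelihoods P(A|B1) = 0.9, P(A|B2) = 0.1:
post = posterior([0.01, 0.99], [0.9, 0.1])
print(post)
```

Even with a likelihood ratio of 9:1, the tiny prior keeps the posterior of B1 below 10%, which is exactly the "condition inversion" the formula performs.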
Condition inversion
The naïve Bayes model is a simple probabilistic model based on applying Bayes' theorem with strong (naïve) independence assumptions.
Naïve Bayes Theorem
P(C, F1, ..., Fn) = P(C) P(F1|C) P(F2|C, F1) ... P(Fn|C, F1, ..., Fn-1)    (chain rule)

Conditional independence: P(Fi|C, Fj) = P(Fi|C)

Therefore: P(C, F1, ..., Fn) = P(C) Πi P(Fi|C)

[Figure: a star-shaped network with the class node C pointing to the feature nodes F1, F2, ..., Fn.]
Naïve Bayes is a simple Bayes Net
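A quick numeric sketch of the factorization, with made-up CPT numbers: build the joint from P(C) and the per-feature conditionals P(Fi|C), and check that the result is a proper distribution.

```python
from itertools import product

# Made-up model: binary class C and three binary features F1..F3.
P_C = {True: 0.3, False: 0.7}
# P(F_i = True | C) for each feature:
P_F_GIVEN_C = [
    {True: 0.8, False: 0.1},
    {True: 0.4, False: 0.5},
    {True: 0.9, False: 0.2},
]

def joint(c, fs):
    """P(C = c, F1..Fn = fs) = P(c) * prod_i P(f_i | c)."""
    p = P_C[c]
    for table, f in zip(P_F_GIVEN_C, fs):
        p_true = table[c]
        p *= p_true if f else 1.0 - p_true
    return p

# The factored joint is a proper distribution: it sums to 1.
total = sum(joint(c, fs)
            for c in (True, False)
            for fs in product((True, False), repeat=3))
print(total)  # 1.0 (up to float rounding)
```

With n binary features, the factorization needs only 2n + 1 numbers instead of the 2^(n+1) − 1 entries of the full joint table.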
Bayes Belief Network: Graph Structure

Directed acyclic graph (DAG):
- Nodes are random variables
- Edges indicate causal influences
[Figure: the burglary network. Burglary and Earthquake are the parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls. Each node is a random variable (RV); edges run from parents to descendants and represent causal relationships.]
Bayes Belief Network: Conditional Probability Table
Each node has a conditional probability table (CPT) that gives the probability of each of its values given every possible combination of values for its parents.
Roots (sources) of the DAG that have no parents are given prior probabilities.
[Figure: the burglary network annotated with its CPTs]

P(B) = .001        P(E) = .002

B  E | P(A)        A | P(J)        A | P(M)
T  T | .95         T | .90         T | .70
T  F | .94         F | .05         F | .01
F  T | .29
F  F | .001
Bayes Belief Network: Joint Distributions
A Bayesian Network implicitly defines a joint distribution.
P(x1, x2, ..., xn) = Πi P(xi | Parents(Xi))

Example:
P(J ∧ M ∧ A ∧ ¬B ∧ ¬E) = P(J|A) P(M|A) P(A|¬B,¬E) P(¬B) P(¬E)
                        = 0.9 × 0.7 × 0.001 × 0.999 × 0.998 ≈ 0.00062

Therefore an inefficient approach to inference is:
1) Compute the joint distribution using this equation.
2) Compute any desired conditional probability using the joint distribution.
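The hand computation above can be replayed in code; the CPT numbers come from the burglary network shown earlier.

```python
# CPTs of the burglary network (from the slides).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}  # P(A=T | B, E)
P_J = {True: 0.90, False: 0.05}   # P(J=T | A)
P_M = {True: 0.70, False: 0.01}   # P(M=T | A)

def joint(j, m, a, b, e):
    """P(J, M, A, B, E) as the product of each node given its parents."""
    def on(p, val):               # P(X = val), given p = P(X = True)
        return p if val else 1.0 - p
    return (on(P_J[a], j) * on(P_M[a], m) *
            on(P_A[(b, e)], a) * on(P_B, b) * on(P_E, e))

# P(John calls, Mary calls, alarm, no burglary, no earthquake):
p = joint(True, True, True, False, False)
print(p)  # ≈ 0.000628
```

Summing this function over all 32 assignments returns 1, which is a quick check that the factorization defines a proper joint distribution.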
Conditional Independence & D-separation

D-separation:
- Let X, Y and Z be three sets of nodes.
- If X and Y are d-separated by Z, then X and Y are conditionally independent given Z.
- X is d-separated from Y given Z if every undirected path between them is blocked.

Path blocking: three cases that correspond to the three basic independence structures.
Application: Simple Document Classification (1)
Step 1: Assume for the moment that there are only two mutually exclusive classes, S and ¬S (e.g., spam and not spam), such that every element (email) is in either one or the other, and that each document D is a set of independent words wi:

P(D|S) = Πi p(wi|S)    and    P(D|¬S) = Πi p(wi|¬S)

Step 2: What we care about is:

P(S|D) = P(S)/P(D) · Πi p(wi|S)    and    P(¬S|D) = P(¬S)/P(D) · Πi p(wi|¬S)
Application: Simple Document Classification (2)
Step 3: Dividing one by the other gives a ratio that can be re-factored:

P(S|D) / P(¬S|D) = [P(S) Πi p(wi|S)] / [P(¬S) Πi p(wi|¬S)]
                 = P(S)/P(¬S) · Πi [p(wi|S) / p(wi|¬S)]

Step 4: Take the logarithm of all these ratios to decrease the calculated quantity:

ln[P(S|D) / P(¬S|D)] = ln[P(S)/P(¬S)] + Σi ln[p(wi|S) / p(wi|¬S)]

The document is classified as spam if this sum is > 0 and as non-spam if it is < 0. The word probabilities p(wi|S) and p(wi|¬S) are obtained by training on known samples.
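The steps above can be sketched as a tiny word-level classifier. The word probabilities and the prior here are made-up stand-ins for trained estimates; a real system would also smooth them to avoid zeros.

```python
from math import log

# Made-up trained estimates p(w|S) and p(w|not S) for a few words:
P_W_SPAM = {"viagra": 0.30, "offer": 0.20, "meeting": 0.01, "report": 0.02}
P_W_HAM  = {"viagra": 0.001, "offer": 0.05, "meeting": 0.20, "report": 0.25}
P_SPAM = 0.4   # prior P(S); P(not S) = 1 - P_SPAM

def log_ratio(words):
    """ln[P(S|D)/P(not S|D)] = ln[P(S)/P(not S)] + sum_i ln[p(w_i|S)/p(w_i|not S)]."""
    score = log(P_SPAM / (1.0 - P_SPAM))
    for w in words:
        if w in P_W_SPAM:          # ignore words we have no estimates for
            score += log(P_W_SPAM[w] / P_W_HAM[w])
    return score

def is_spam(words):
    return log_ratio(words) > 0.0  # > 0: spam; < 0: not spam

print(is_spam(["viagra", "offer"]))     # True
print(is_spam(["meeting", "report"]))   # False
```

Working in log space turns the long product of small probabilities into a numerically stable sum, which is exactly why Step 4 takes the logarithm.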
Application: Overall

Medical diagnosis: the Pathfinder system outperforms leading experts in the diagnosis of lymph-node disease.
Microsoft applications: problem diagnosis (printer problems); recognizing user intents for HCI.
Text categorization and spam filtering.
Student modeling for intelligent tutoring systems.
Biochemical data analysis: predicting mutagenicity.
So many...

Which field are you interested in?