3
Neural networks: Supervised learning
The training data consist of inputs together with their corresponding outputs.
Unsupervised learning
The training data consist of inputs without their corresponding outputs.
4
Neural networks: Generative and discriminative models
Generative model: models the joint distribution of the inputs and outputs, P(x, y).
Discriminative model: models the posterior probabilities, P(y | x).
[Figure: the class-conditional joint densities P(x, y1) and P(x, y2) contrasted with the posterior probabilities P(y1 | x) and P(y2 | x).]
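The two formulations are linked by ordinary probability (a standard identity, added here for context, not taken from the slides): the posterior used by a discriminative model can be recovered from a generative model's joint distribution by normalizing over the classes.

```latex
P(y_k \mid x) \;=\; \frac{P(x, y_k)}{\sum_{k'} P(x, y_{k'})}
```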
5
Neural networks: What is a neuron?
Linear neurons
Binary threshold neurons
Sigmoid neurons
Stochastic binary neurons
[Diagram: a single neuron with inputs x1, x2 and a constant input 1, weights w1, w2 and bias b, producing the output y.]
Linear neuron: y = b + Σ_i x_i w_i
Binary threshold neuron: y = 1 if b + Σ_i x_i w_i ≥ 0, otherwise y = 0
Sigmoid neuron: z = b + Σ_i x_i w_i, y = 1 / (1 + e^(−z))
Stochastic binary neuron: z = b + Σ_i x_i w_i, p(y = 1) = 1 / (1 + e^(−z))
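To make the four definitions concrete, here is a minimal NumPy sketch (my own illustration; the function and variable names are not from the slides) that computes each neuron's output from the shared weighted input z = b + Σ_i x_i w_i.

```python
import numpy as np

def weighted_input(x, w, b):
    """z = b + sum_i x_i * w_i, shared by all four neuron types."""
    return b + np.dot(x, w)

def linear_neuron(x, w, b):
    return weighted_input(x, w, b)

def binary_threshold_neuron(x, w, b):
    return 1.0 if weighted_input(x, w, b) >= 0 else 0.0

def sigmoid_neuron(x, w, b):
    return 1.0 / (1.0 + np.exp(-weighted_input(x, w, b)))

def stochastic_binary_neuron(x, w, b, rng=np.random.default_rng()):
    # p(y = 1) is the sigmoid of z; the output is a random 0/1 sample.
    return 1.0 if rng.random() < sigmoid_neuron(x, w, b) else 0.0

x = np.array([0.5, -1.0])   # inputs x1, x2
w = np.array([0.8, 0.2])    # weights w1, w2
b = 0.1                     # bias
print(linear_neuron(x, w, b), sigmoid_neuron(x, w, b))
```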
6
Neural networks: Two-layer neural networks (sigmoid neurons)
Back-propagation:
Step 1: Randomly initialize the weights and compute the output vector (forward pass).
Step 2: Evaluate the gradient of an error function.
Step 3: Adjust the weights.
Repeat the forward pass, gradient evaluation, and weight update until the error is low enough. (A code sketch follows below.)
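A minimal NumPy sketch of these steps for a two-layer network of sigmoid neurons with a squared-error loss (the layer sizes, data, and learning rate are illustrative assumptions, not taken from the slides).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Step 1: randomly initialize the weights (toy sizes: 2 inputs, 3 hidden units, 1 output).
W1, b1 = rng.normal(scale=0.1, size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(scale=0.1, size=(3, 1)), np.zeros(1)

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # toy inputs
T = np.array([[1.0], [0.0]])             # toy targets
lr = 0.5

for step in range(1000):
    # Forward pass: determine the output vector.
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)

    # Step 2: gradient of the error E = 0.5 * sum((Y - T)^2), back-propagated layer by layer.
    dY = (Y - T) * Y * (1 - Y)            # delta at the output layer
    dH = (dY @ W2.T) * H * (1 - H)        # delta at the hidden layer

    # Step 3: adjust the weights; repeat until the error is low enough.
    W2 -= lr * H.T @ dY;  b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH;  b1 -= lr * dH.sum(axis=0)
```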
7
Neural networks: Back-propagation is not good for deep learning
It requires labeled training data, but almost all data are unlabeled.
The learning time is very slow in networks with multiple hidden layers.
It can get stuck in poor local optima; for deep nets these are often far from optimal.
Instead, learn P(input) rather than P(output | input). What kind of generative model should we learn?
9
Graphical model
A graphical model is a probabilistic model in which a graph denotes the conditional dependence structure between random variables.
In this example: D depends on A, D depends on B, D depends on C, C depends on B, and C depends on D.
10
Graphical model: Directed and undirected graphical models
[Diagram: two graphs over the nodes A, B, C, D, one undirected and one directed.]
Undirected model: P(A, B, C, D) = (1/Z) · φ(A, B, C) · φ(B, C, D)
Directed model: P(A, B, C, D) = P(A) · P(B | A) · P(C | A) · P(D | B, C)
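To illustrate the directed factorization, a small Python sketch that evaluates P(A, B, C, D) from hypothetical conditional probability tables (all numbers are made up for the example).

```python
# Hypothetical CPTs for binary variables A, B, C, D; tables store the probability of value 1.
P_A = {1: 0.6}
P_B_given_A = {(1, 1): 0.7, (1, 0): 0.2}            # P(B=1 | A)
P_C_given_A = {(1, 1): 0.3, (1, 0): 0.8}            # P(C=1 | A)
P_D_given_BC = {(1, 1, 1): 0.9, (1, 1, 0): 0.5,     # P(D=1 | B, C)
                (1, 0, 1): 0.4, (1, 0, 0): 0.1}

def prob(table, value, *parents):
    """Look up P(value | parents); tables only store the value = 1 entries."""
    p1 = table[(1, *parents)] if parents else table[1]
    return p1 if value == 1 else 1.0 - p1

def joint(a, b, c, d):
    # P(A, B, C, D) = P(A) * P(B | A) * P(C | A) * P(D | B, C)
    return (prob(P_A, a)
            * prob(P_B_given_A, b, a)
            * prob(P_C_given_A, c, a)
            * prob(P_D_given_BC, d, b, c))

print(joint(1, 1, 0, 1))   # probability of one full assignment: 0.6 * 0.7 * 0.7 * 0.5
```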
12
Belief nets
A belief net is a directed acyclic graph composed of stochastic variables.
[Diagram: layers of stochastic hidden causes above a layer of visible units.]
The units are stochastic binary neurons: p(y = 1) = 1 / (1 + e^(−z)), with z = b + Σ_i x_i w_i.
This makes it a sigmoid belief net.
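A minimal sketch of how a sigmoid belief net generates data (my own illustration with toy layer sizes): sample each layer of stochastic binary units top-down, conditioned on the layer above.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def sample_layer(parents, W, b):
    """Sample a layer of stochastic binary units given the layer above it."""
    p = sigmoid(parents @ W + b)             # p(unit = 1 | parents)
    return (rng.random(p.shape) < p).astype(float)

# Toy net: top hidden layer (4 units) -> hidden layer (6 units) -> visible layer (8 units).
W2, b1 = rng.normal(size=(4, 6)), np.zeros(6)
W1, b0 = rng.normal(size=(6, 8)), np.zeros(8)

h2 = (rng.random(4) < 0.5).astype(float)     # top-level stochastic hidden causes
h1 = sample_layer(h2, W2, b1)                # lower hidden causes
v  = sample_layer(h1, W1, b0)                # visible units: one generated sample
print(v)
```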
13
Belief nets: We would like to solve two problems
The inference problem: infer the states of the unobserved variables.
The learning problem: adjust the interactions between variables to make the network more likely to generate the training data.
[Diagram: stochastic hidden causes above visible units.]
14
Belief nets
It is easy to generate a sample from P(v | h), but it is hard to infer P(h | v), because of explaining away.
[Diagram: stochastic hidden causes above visible units.]
15
Belief nets: Explaining away
[Diagram: two hidden causes H1 and H2 with arrows into a common effect V.]
H1 and H2 are independent, but they can become dependent when we observe an effect that they can both influence. (A numeric example follows below.)
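A small enumeration with hypothetical numbers, chosen only to show the effect: H1 and H2 are independent causes with prior probability 0.1, and V = 1 whenever either cause is on. Observing V = 1 raises the probability of H1, but additionally observing H2 = 1 explains the evidence away and pushes H1 back to its prior.

```python
from itertools import product

p_h = 0.1                                   # independent priors p(H1=1) = p(H2=1) = 0.1

def joint(h1, h2, v):
    """Joint probability with a deterministic OR: V = 1 iff H1 or H2 is on."""
    prior = (p_h if h1 else 1 - p_h) * (p_h if h2 else 1 - p_h)
    return prior if v == (h1 or h2) else 0.0

def conditional(query, evidence):
    """P(query | evidence) by brute-force enumeration over (h1, h2, v)."""
    num = den = 0.0
    for h1, h2, v in product([0, 1], repeat=3):
        world = {'h1': h1, 'h2': h2, 'v': v}
        if all(world[k] == val for k, val in evidence.items()):
            den += joint(h1, h2, v)
            if all(world[k] == val for k, val in query.items()):
                num += joint(h1, h2, v)
    return num / den

print(conditional({'h1': 1}, {'v': 1}))             # ~0.53: seeing V=1 raises belief in H1
print(conditional({'h1': 1}, {'v': 1, 'h2': 1}))    # 0.10: H2=1 explains V away
```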
16
Belief nets: Some methods for learning deep belief nets
Monte Carlo methods, but they are painfully slow for large, deep belief nets.
Learning with samples from the wrong distribution: use Restricted Boltzmann Machines.
18
Boltzmann Machine
It is an undirected graphical model.
[Diagram: visible units (index i) and hidden units (index j) with symmetric connections.]
The energy of a joint configuration (v, h): E(v, h) = −Σ_i s_i b_i − Σ_{i<j} s_i s_j w_ij, where s ranges over the states of all visible and hidden units.
20
Boltzmann Machine: A very surprising fact
The derivative of the log probability of one training vector v under the model:
∂ log p(v) / ∂w_ij = ⟨s_i s_j⟩_data − ⟨s_i s_j⟩_model
⟨s_i s_j⟩_data is the expected value of the product of states at thermal equilibrium when v is clamped on the visible units; ⟨s_i s_j⟩_model is the expected value of the product of states at thermal equilibrium with no clamping.
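Once the two expectations have been estimated from samples, the rule becomes a simple weight update. A sketch for the restricted layout used in the following slides, with visible units indexed by i and hidden units by j (the function and batch layout are my own illustration):

```python
import numpy as np

def weight_gradient(v_data, h_data, v_model, h_model):
    """Estimate d log p(v) / d w_ij = <v_i h_j>_data - <v_i h_j>_model
    from batches of sampled binary states (one sample per row)."""
    positive = v_data.T @ h_data / len(v_data)       # clamped (data) phase
    negative = v_model.T @ h_model / len(v_model)    # free-running (model) phase
    return positive - negative

# Usage sketch: w += learning_rate * weight_gradient(v0, h0, v_eq, h_eq)
```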
21
Boltzmann Machines: Restricted Boltzmann Machine
We restrict the connectivity to make learning easier:
Only one layer of hidden units (we will deal with more layers later).
No connections between hidden units, which makes the updates more parallel.
[Diagram: a layer of hidden units fully connected to a layer of visible units, with no within-layer connections.]
22
Boltzmann Machines: The Boltzmann machine learning algorithm for an RBM
[Diagram: alternating Gibbs sampling between the visible units i and the hidden units j, starting from a data vector at t = 0 and running the chain through t = 1, t = 2, ... to t = infinity; the learning rule compares ⟨v_i h_j⟩ at t = 0 with ⟨v_i h_j⟩ at t = infinity.]
23
Boltzmann Machines: Contrastive divergence, a very surprising short-cut
[Diagram: start the chain at a data vector (t = 0), update the hidden units j, reconstruct the visible units i (t = 1), and update the hidden units once more; the weight update compares ⟨v_i h_j⟩ on the data with ⟨v_i h_j⟩ on the reconstruction.]
This is not following the gradient of the log likelihood, but it works well. (A code sketch follows below.)
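A minimal CD-1 training step for a binary RBM in NumPy (a sketch of the standard recipe; the sizes, names, and learning rate are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One contrastive-divergence (CD-1) update on a batch of visible vectors v0."""
    ph0 = sigmoid(v0 @ W + b_hid)          # t = 0: hidden probabilities given the data
    h0 = sample(ph0)
    pv1 = sigmoid(h0 @ W.T + b_vis)        # t = 1: reconstruct the visible units
    v1 = sample(pv1)
    ph1 = sigmoid(v1 @ W + b_hid)          # hidden probabilities for the reconstruction
    n = len(v0)
    # Update: <v h>_data - <v h>_reconstruction (not the true log-likelihood gradient).
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    return W, b_vis, b_hid

# Usage on toy data: 6 visible units, 4 hidden units.
W, b_vis, b_hid = rng.normal(scale=0.01, size=(6, 4)), np.zeros(6), np.zeros(4)
v_batch = (rng.random((10, 6)) < 0.5).astype(float)
W, b_vis, b_hid = cd1_step(v_batch, W, b_vis, b_hid)
```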
25
DBN
It is easy to generate a sample from P(v | h), but it is hard to infer P(h | v), because of explaining away.
Using RBMs to initialize the weights can lead to a good optimum.
[Diagram: stochastic hidden causes above visible units.]
26
DBN: Combining two RBMs to make a DBN
[Diagram: one RBM between v and h1 with weights W1, and a second RBM between h1 and h2 with weights W2; the binary state of h1 inferred for each v is copied up as training data for the second RBM.]
Train the first RBM (v, h1, W1) first, then train the second RBM (h1, h2, W2).
Compose the two RBM models to make a single DBN model. It is a deep belief net! (A training sketch follows below.)
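A sketch of the greedy, layer-by-layer procedure, reusing the hypothetical cd1_step function from the contrastive-divergence sketch above (layer sizes and iteration counts are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

V = (rng.random((100, 6)) < 0.5).astype(float)              # toy visible data

# Train this RBM first: v <-> h1 with weights W1.
W1, bv, bh1 = rng.normal(scale=0.01, size=(6, 4)), np.zeros(6), np.zeros(4)
for _ in range(50):
    W1, bv, bh1 = cd1_step(V, W1, bv, bh1)

# Copy the binary state of h1 for each v, to serve as data for the next RBM.
H1 = (rng.random((100, 4)) < sigmoid(V @ W1 + bh1)).astype(float)

# Then train this RBM: h1 <-> h2 with weights W2.
W2, bh1b, bh2 = rng.normal(scale=0.01, size=(4, 3)), np.zeros(4), np.zeros(3)
for _ in range(50):
    W2, bh1b, bh2 = cd1_step(H1, W2, bh1b, bh2)

# Composing (W1, W2) gives the single DBN model.
```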
27
DBN: Why can we use an RBM to initialize the belief net's weights?
An infinite sigmoid belief net is equivalent to an RBM.
Inference in a directed net with replicated weights is trivial: we just multiply v0 by W transpose. The model above h0 implements a complementary prior, so multiplying v0 by W transpose gives the product of the likelihood term and the prior term.
[Diagram: an infinite directed net with layers ..., v2, h1, v1, h0, v0, the same weight matrices W and Wᵀ replicated between every pair of layers, continuing upward without end (etc.).]
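Concretely (a standard result stated here for completeness, not spelled out on the slide): because the layers above implement a complementary prior, the posterior over h0 factorizes, and each hidden unit's posterior is exactly the RBM activation obtained by multiplying v0 by W transpose and applying the sigmoid.

```latex
p\left(h^0_j = 1 \mid v^0\right) = \sigma\Big(b_j + \sum_i v^0_i \, w_{ij}\Big),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}
```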
28
DBN: Complementary prior
A Markov chain is a sequence of variables X1, X2, ... with the Markov property: P(X_t | X_1, ..., X_{t−1}) = P(X_t | X_{t−1}).
A Markov chain is stationary if the transition probabilities do not depend on time; T(x' ← x) = P(X_{t+1} = x' | X_t = x) is called the transition matrix.
If a Markov chain is ergodic, it has a unique equilibrium distribution.
[Diagram: a chain X1 → X2 → X3 → X4.]
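The equilibrium distribution is the distribution that the transition matrix leaves unchanged (standard definition, added for completeness).

```latex
p_\infty(x') \;=\; \sum_{x} p_\infty(x)\, T(x' \leftarrow x)
```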
29
DBN
Most Markov chains used in practice satisfy detailed balance: p_∞(X) T(X → X') = p_∞(X') T(X' → X), e.g. Gibbs sampling, Metropolis-Hastings, slice sampling, ...
Such Markov chains are reversible: a trajectory run forward from equilibrium has the same probability as the same trajectory run backward,
p_∞(X1) T(X1 → X2) T(X2 → X3) T(X3 → X4) = T(X1 ← X2) T(X2 ← X3) T(X3 ← X4) p_∞(X4).
[Diagram: the chain X1 → X2 → X3 → X4 traversed forward and backward.]
(A small numeric check follows below.)
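A tiny NumPy check of detailed balance and reversibility on a hypothetical two-state chain (the transition matrix and its equilibrium distribution are chosen by hand for the example).

```python
import numpy as np

# Hypothetical 2-state chain: T[i, j] = probability of moving from state i to state j.
T = np.array([[0.9, 0.1],
              [0.2, 0.8]])
pi = np.array([2/3, 1/3])                # its equilibrium distribution

print(np.allclose(pi @ T, pi))           # True: pi is left unchanged by T

# Detailed balance: pi_i * T[i, j] == pi_j * T[j, i] for every pair of states.
flows = pi[:, None] * T
print(np.allclose(flows, flows.T))       # True: the chain is reversible

# Reversibility of a whole trajectory, as on the slide:
# pi(x1) T(x1 -> x2) T(x2 -> x3) == T(x1 <- x2) T(x2 <- x3) pi(x3).
x = [0, 1, 1]
forward = pi[x[0]] * T[x[0], x[1]] * T[x[1], x[2]]
backward = T[x[1], x[0]] * T[x[2], x[1]] * pi[x[2]]
print(np.isclose(forward, backward))     # True
```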