Machine Learning and having it deep and structured
Introduction
Outline
Some tasks are very complex
• You know how to write programs.
• One day, you are asked to write a program for speech recognition.
[Figure: four waveforms, each an utterance of "你好" (Hello)]
Try to find the common patterns among the waveforms. It seems impossible to write a program for speech recognition by hand: you quickly get lost in the exceptions and special cases.
Let the machine learn by itself
Give it a large amount of audio data, e.g., "你好" (Hello), "大家好" (Hello everyone), "人帥真好" (It's good to be handsome).
The machine learns how to do speech recognition from the data, then outputs: you said "你好".
You only have to write the program for learning.
Learning ≈ Looking for a Function
• Speech Recognition: f(audio) = "你好"
• Handwriting Recognition: f(handwritten image) = "2"
• Weather Forecast: f(weather today) = "sunny tomorrow"
• Playing Video Games: f(positions and number of enemies) = "jump"
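As a minimal illustration of the "learning ≈ looking for a function" view (my sketch, not from the slides), each task above is literally a function with a typed input and output; all type aliases and function names below are hypothetical placeholders:

```python
from typing import List

# Each learning task is a function from an input domain to an output domain.
Audio = List[float]        # a waveform as a sequence of samples
Image = List[List[int]]    # a handwritten digit as a grid of pixels

def speech_recognition(audio: Audio) -> str: ...       # -> "你好"
def handwriting_recognition(image: Image) -> str: ...  # -> "2"
def weather_forecast(weather_today: str) -> str: ...   # -> "sunny tomorrow"
def play_video_game(enemy_positions: str) -> str: ...  # -> "jump"
```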
Types of Learning
Supervised Learning
Reinforcement Learning
Unsupervised Learning
Supervised Learning
Training: pick the best function $f^*$ from a hypothesis function set (the model) $\{f_1, f_2, \dots\}$ using training data $\{(x^1, \hat{y}^1), (x^2, \hat{y}^2), \dots\}$
Testing: $y = f^*(x)$
x: function input, e.g., an utterance of "你好" or an image of a handwritten "2"
y: function output (label), e.g., "你好" or "2"
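A minimal sketch of "pick the best function $f^*$" (my illustration with a toy hypothesis set, not code from the course): training selects the candidate function with the lowest error on the labeled data.

```python
import numpy as np

def train(hypothesis_set, training_data):
    """training_data is a list of (x, y_hat) pairs with known labels."""
    def total_error(f):
        return sum(f(x) != y_hat for x, y_hat in training_data)
    return min(hypothesis_set, key=total_error)  # the "best" function f*

# Toy hypothesis set: threshold classifiers on a scalar input.
hypothesis_set = [lambda x, t=t: int(x > t) for t in np.linspace(0, 1, 11)]
training_data = [(0.1, 0), (0.2, 0), (0.7, 1), (0.9, 1)]

f_star = train(hypothesis_set, training_data)
print(f_star(0.8))  # testing: y = f*(x) -> 1
```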
Reinforcement Learning
Training: pick the best function $f^*$ from a hypothesis function set (the model) $\{f_1, f_2, \dots\}$
Training data: inputs $\{x^1, x^2, \dots\}$ only, with no labels; the machine only gets feedback on how good $f(x)$ is.
Example: Dialogue System
x: "How are you?"
$f_1(x)$ = "Good Bye" → Bad!
$f_2(x)$ = "Fine" → Good!
Testing: $y = f^*(x)$, e.g., x' = "hello" → Machine: y' = "hi"
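A minimal sketch of the dialogue example (my illustration, with a made-up reward function): there are no labels, only feedback on how good each candidate's reply is.

```python
# Hypothesis set: two candidate dialogue functions, as on the slide.
candidates = {
    "f1": lambda x: "Good Bye",
    "f2": lambda x: "Fine",
}

def reward(x, y):
    # Hypothetical feedback: "Fine" is a good reply to "How are you?".
    return 1.0 if (x, y) == ("How are you?", "Fine") else -1.0

# Training: try each candidate function and keep the one with the best reward.
x = "How are you?"
scores = {name: reward(x, f(x)) for name, f in candidates.items()}
best = max(scores, key=scores.get)  # f2 wins: reward +1.0 vs -1.0
print(best, candidates[best](x))    # testing: f*(x) = "Fine"
```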
Unsupervised Learning
Training data: inputs $\{x^1, x^2, \dots\}$ only, with no labels, e.g., lots of audio without text annotation.
What can I do with this data?
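One answer, as a minimal sketch (my illustration, not from the slides): with only unlabeled inputs we can still group similar examples, e.g., with a few steps of k-means.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 0.5, (50, 2)),   # unlabeled blob A
                    rng.normal(3, 0.5, (50, 2))])  # unlabeled blob B

# k-means with k = 2: assign points to the nearest center, then move each
# center to the mean of its points. Start from one point of each blob for
# simplicity (a deterministic, toy initialization).
centers = np.array([X[0], X[50]])
for _ in range(10):
    assign = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
    centers = np.array([X[assign == k].mean(axis=0) for k in range(2)])
print(centers)  # one center near (0, 0), the other near (3, 3)
```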
Outline
Inspired by the human brain
Human Brains are Deep
A Neuron for Machine
Each neuron is a function:
$z = w_1 x_1 + w_2 x_2 + \dots + w_N x_N + b$, where $w_1, \dots, w_N$ are the weights and $b$ is the bias
$a = \sigma(z)$, where the activation function here is the sigmoid $\sigma(z) = \frac{1}{1 + e^{-z}}$
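A direct transcription of this neuron in code (the numeric values are made-up examples):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    z = np.dot(w, x) + b   # z = w1*x1 + ... + wN*xN + b
    return sigmoid(z)      # activation a = sigma(z)

x = np.array([1.0, -2.0, 0.5])   # inputs x1 .. xN
w = np.array([0.8, 0.2, -1.0])   # weights w1 .. wN
b = 0.1                          # bias
print(neuron(x, w, b))           # a single neuron is a function R^N -> R
```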
Deep Learning
• Neural Network: cascading the neurons
Input $x_1, \dots, x_N$ → Layer 1 → Layer 2 → … → Layer L (hidden layers) → Output $y_1, \dots, y_M$
The whole network is a function $f: \mathbb{R}^N \to \mathbb{R}^M$.
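A minimal forward pass for "cascading the neurons" (my sketch, with arbitrary random weights): each layer applies weights, biases, and the sigmoid from the previous slide, so the whole network is one function $f: \mathbb{R}^N \to \mathbb{R}^M$.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def network(x, layers):
    """layers is a list of (W, b) pairs, one per layer."""
    a = x
    for W, b in layers:
        a = sigmoid(W @ a + b)  # each layer's output feeds the next layer
    return a

rng = np.random.default_rng(0)
N, hidden, M = 3, 4, 2          # input, hidden, and output sizes
layers = [(rng.normal(size=(hidden, N)), rng.normal(size=hidden)),
          (rng.normal(size=(M, hidden)), rng.normal(size=M))]
y = network(np.array([1.0, 0.5, -0.5]), layers)
print(y.shape)  # (2,): the network maps R^3 to R^2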
Deep Learning
Universality Theorem: any continuous function $f: \mathbb{R}^N \to \mathbb{R}^M$ can be realized by a network with one hidden layer (given enough hidden neurons).
Reference: http://neuralnetworksanddeeplearning.com/chap4.html
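A small numerical illustration in the spirit of the linked chapter (my sketch, not a proof): a single hidden layer of sigmoid neurons with enough units can closely approximate a continuous function, here sin(x). Only the output weights are fitted, by least squares over random hidden features, which is enough to show the representational power.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.linspace(-np.pi, np.pi, 200)[:, None]
target = np.sin(x).ravel()

# Random hidden layer with 200 sigmoid neurons; fit the output layer.
rng = np.random.default_rng(0)
H = sigmoid(x @ rng.normal(size=(1, 200)) + rng.normal(size=200))
w_out, *_ = np.linalg.lstsq(H, target, rcond=None)

error = np.max(np.abs(H @ w_out - target))
print(f"max approximation error: {error:.6f}")  # near zero with 200 units
```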
Popular and Powerful
• Speech Recognition (TIMIT): deep neural networks on TIMIT usually use 4 to 8 layers.
Three misunderstandings about Deep Learning
1. Deep learning works because the model is more "complex"
Is deep simply more complex?
[Figure: a shallow network and a deep network over the same inputs $x_1, x_2, \dots, x_N$]
The misunderstanding: deep works better simply because it uses more parameters.
Fat + Short vs. Thin + Tall
[Figure: a shallow (fat + short) network and a deep (thin + tall) network, both with inputs $x_1, x_2, \dots, x_N$]
Which one is better?
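To make the comparison concrete, here is a back-of-the-envelope parameter count (my arithmetic, with made-up layer sizes, not the slides' numbers): over the same N = 100 inputs, the thin + tall network gets many more layers out of far fewer parameters, which is the setup behind the "same performance, fewer parameters" claim below.

```python
def param_count(layer_sizes):
    """Weights + biases for a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

N = 100
shallow = [N, 2000, 1]                   # fat + short: one wide hidden layer
deep = [N, 128, 128, 128, 128, 128, 1]   # thin + tall: five narrow layers

print(param_count(shallow))  # 204001 parameters
print(param_count(deep))     # 79105 parameters: deeper, yet far fewer
```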
Deep Learning - Why? Toy Example
A target function $f: \mathbb{R}^2 \to \{0, 1\}$: the input is a point $(x, y)$, the output is 0 or 1.
Sample 100,000 points as training data.
Deep Learning - Why? Toy Example
1 hidden layer: 125, 500, or 2500 neurons
3 hidden layers: how many neurons in each hidden layer are needed to match it? 100–200, 50–100, 25–50, or fewer than 25?
Deep Learning - Why?
• Experiments on handwritten digit classification
Deeper: using fewer parameters to achieve the same performance
Three misunderstandings about Deep Learning
2. When you are using deep learning, you need more training data.
Size of Training Data
• Different numbers of training examples: 100,000, 50,000, and 20,000
• Compared: 1 hidden layer vs. 3 hidden layers
Size of Training Data
• Experiments on handwritten digit classification
Deeper: using less training data to achieve the same performance
Three misunderstandings about Deep Learning
3. You can simply get the power of deep learning by cascading the neurons.
It is hard to get the full power of deep …
Can I get all the power of deep learning from this course? No; researchers still do not fully understand the mysteries of deep learning.
Outline
In the real world …
$f: X \to Y$
X (input domain): sequence, graph structure, tree structure, …
Y (output domain): sequence, graph structure, tree structure, …
Retrieval
$f: X \to Y$
X: a keyword, e.g., "Machine learning"
Y: a list of web pages (the search result)
Translation
$f: X \to Y$ (one kind of sequence to another kind of sequence)
X: "Machine learning and having it deep and structured"
Y: "機器學習及其深層與結構化"
Speech Recognition
$f: X \to Y$ (one kind of sequence to another kind of sequence)
X: audio
Y: "大家好,歡迎大家來修機器學習及其深層與結構化" (Hello everyone, welcome to Machine Learning and having it deep and structured)
Speech Summarization
$f: X \to Y$
X: recorded lectures
Y: a summary, selecting the most informative segments to form a compact version
Object Detection
$f: X \to Y$
X: an image
Y: object positions, e.g., bounding boxes for Haruhi and Mikuru
Image Segmentation
$f: X \to Y$
X: an image
Y: the foreground
Source of images: Nowozin, Sebastian, and Christoph H. Lampert. "Structured Learning and Prediction in Computer Vision." Foundations and Trends® in Computer Graphics and Vision 6.3–4 (2011): p. 57.
Remote Image Ground Survey
$f: X \to Y$
Source of images: Nowozin, Sebastian, and Christoph H. Lampert. "Structured Learning and Prediction in Computer Vision." Foundations and Trends® in Computer Graphics and Vision 6.3–4 (2011): p. 146.
Pose Estimation
$f: X \to Y$
X: an image
Y: a pose
Source of images: http://groups.inf.ed.ac.uk/calvin/Publications/eichner-techreport10.pdf
Structured Learning
• The tasks above were developed separately in the past.
• Recently, people have realized that there is a unified framework behind these approaches.
• Three steps: Evaluation, Inference, Learning (see the sketch below)
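A minimal sketch of the three-step framework (my illustration with hypothetical names, not the course's code). Evaluation defines a score F(x, y), Inference finds the best-scoring y, and Learning tunes the score so the correct outputs win; the toy task tags a word as "noun" or "verb".

```python
CANDIDATES = ["noun", "verb"]

def phi(x, y):
    # Joint features of input and output: does the word end in "-ing",
    # and is the proposed label "noun"?
    return [x.endswith("ing") and y == "verb", y == "noun"]

def evaluation(x, y, w):            # Step 1: score F(x, y) = w . phi(x, y)
    return sum(wi * fi for wi, fi in zip(w, phi(x, y)))

def inference(x, w):                # Step 2: y* = argmax_y F(x, y)
    return max(CANDIDATES, key=lambda y: evaluation(x, y, w))

def learning(data, w):              # Step 3: perceptron-style weight update
    for x, y_hat in data:
        y_star = inference(x, w)
        if y_star != y_hat:         # boost features of the correct output
            w = [wi + a - b
                 for wi, a, b in zip(w, phi(x, y_hat), phi(x, y_star))]
    return w

w = learning([("running", "verb"), ("dog", "noun")], [0.0, 0.0])
print(inference("singing", w))      # -> "verb"
```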
Concluding Remarks
Reference
• Deep Learning
  • Neural Networks and Deep Learning: http://neuralnetworksanddeeplearning.com/
  • For more information: http://deeplearning.net/
• Structured Learning
  • Structured Learning and Prediction in Computer Vision: http://www.nowozin.net/sebastian/papers/nowozin2011structured-tutorial.pdf
  • Linguistic Structure Prediction: http://www.cs.cmu.edu/afs/cs/Web/People/nasmith/LSP/PUBLISHED-frontmatter.pdf
Thank you!
Powerful
• Inspired by the human brain
[Figure: the visual pathway (Retina → visual cortex: pixels → edges → primitive shapes → …) aligned with network layers (Layer 1 → Layer 2 → … → Layer L) over inputs $x_1, x_2, \dots, x_N$]
Powerful
• Image Recognition
[Figure: visualizations of features learned by the 1st, 2nd, and 3rd hidden layers]
Reference: Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision - ECCV 2014 (pp. 818-833).