
Page 1

Scaling Learning Algorithms Towards AI

Authors: Yoshua Bengio, Yann LeCun
Presenter: Marilyn Vazquez

NLDA Seminar, George Mason University

February 10, 2017


Page 2

Outline

1 Curse of Dimensionality

2 Shallow Learning

3 Deep Learning

4 Results

5 Conclusion


Page 3

Curse of Dimensionality

Curse of Dimensionality

The curse of dimensionality can be viewed as the limitation on data analysis due to the large amount of data, or the large number of parameters, needed to analyze the data.


Page 4

Curse of Dimensionality

Curse of Dimensionality: Example 1

Kernel density estimation: At a point x_i ∈ R^d, the kernel density estimate q̂ approximates the real density q with high probability, i.e.

\[
E(\hat{q}(x_i))
= E\left[ \frac{\sigma^{-d}}{N} \sum_{j=1}^{N} K_\sigma(x_i, x_j) \right]
= E\left[ \frac{1}{N} \sum_{j=1}^{N} \frac{e^{-\|x_i - x_j\|^2 / 2\sigma^2}}{(2\pi\sigma^2)^{d/2}} \right]
\to q(x_i) + O\left( \sigma^2,\; N^{-1/2} \sigma^{-d/2} \sqrt{q(x_i)} \right)
\]

where the bias error, σ², is dominant with large data, and the variance error, N^{-1/2}σ^{-d/2}, blows up if we take σ → 0.
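As a concrete illustration, here is a minimal numpy sketch of this estimator (ours, not from the paper; the toy standard-normal check and all names are our own choices):

```python
import numpy as np

# Gaussian kernel density estimate matching the formula above:
# q_hat(x) = (1/N) * sum_j exp(-||x - x_j||^2 / (2 sigma^2)) / (2 pi sigma^2)^(d/2)
def kde(x, samples, sigma):
    N, d = samples.shape
    sq_dists = np.sum((samples - x) ** 2, axis=1)
    kernels = np.exp(-sq_dists / (2 * sigma**2)) / (2 * np.pi * sigma**2) ** (d / 2)
    return kernels.mean()

# Toy check against a standard normal in d = 2: the true density at the
# origin is 1/(2*pi), and the estimate approaches it as N grows.
rng = np.random.default_rng(0)
d, sigma = 2, 0.3
for N in (100, 1_000, 10_000):
    samples = rng.standard_normal((N, d))
    print(N, kde(np.zeros(d), samples, sigma), 1 / (2 * np.pi))
```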


Page 5

Curse of Dimensionality

Error

To find an optimal bandwidth σ, we can balance errors:

\[
\sigma^2 = c_1 N^{-1/2} \sigma^{-d/2}
\implies \sigma^{(4+d)/2} = c_1 N^{-1/2}
\implies \sigma = c_1' N^{-1/(4+d)}
\implies \text{error} = c_2 N^{-2/(4+d)}
\]

So if, for d = 1, we need n_1 points to achieve a fixed error e_1, then, increasing the dimension to d, we need n_1^{(4+d)/5} data points, i.e. exponential in the dimension!
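A quick arithmetic sketch of this growth (ours; the baseline n_1 = 10,000 is an arbitrary assumption):

```python
# If n1 points achieve a fixed error at d = 1, the same error at dimension d
# needs n1 ** ((4 + d) / 5) points, per the bandwidth-balancing argument above.
n1 = 10_000
for d in (1, 2, 5, 10, 20):
    print(f"d = {d:2d}: {n1 ** ((4 + d) / 5):.3e} points")
```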


Page 6

Curse of Dimensionality

Curse of Dimensionality: Example 2

Smooth function representation: A Gaussian kernel machine is a representation

\[
f(x) = b + \sum_{i=1}^{n} w_i K(x_i, x)
\]

where the x_i are the base points, the w_i are weights found through regression, and K(x_i, x) is a Gaussian kernel.

Theorem

Let f : R → R be computed by a Gaussian kernel machine with k base points (k non-zero w_i's). Then f has at most 2k zeros.
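A numeric spot-check of the theorem (our sketch, with b = 0 and random base points and weights; the grid resolution is arbitrary):

```python
import numpy as np

# Build a 1-D Gaussian kernel machine with k base points and count the sign
# changes of f on a fine grid; they should never exceed 2k.
rng = np.random.default_rng(1)
k, sigma = 5, 0.5
bases = rng.uniform(-5, 5, k)
weights = rng.standard_normal(k)

xs = np.linspace(-10, 10, 200_001)
f = np.sum(weights[:, None] * np.exp(-(xs[None, :] - bases[:, None]) ** 2 / sigma**2), axis=0)
sign_changes = np.count_nonzero(np.diff(np.sign(f)))
print(sign_changes, "<= 2k =", 2 * k)
```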


Page 7

Curse of Dimensionality

Curse of Dimensionality: Example 2

Smooth Function Representation


Page 8

Curse of Dimensionality

Curse of Dimensionality: Example 2

Smooth function representation: A Gaussian kernel machine is a representation

\[
f(x) = b + \sum_{i=1}^{n} w_i K(x_i, x)
\]

where the x_i are the base points, the w_i are weights found through regression, and K(x_i, x) is a Gaussian kernel.

Corollary

In R^d, if the learning problem requires f to change sign at least 2k times along some straight line, then the kernel machine must have at least k base points (k non-zero w_i's).


Page 9

Curse of Dimensionality

Curse of Dimensionality: Example 3

Local Derivative: For a Gaussian kernel classifier, the normal of the tangent of the decision surface at x is constrained to approximately lie in the span of the vectors (x − x_i), where ||x − x_i|| is small compared to σ and the x_i are in the training set.


Page 10

Curse of Dimensionality

Local Derivative

Brief explanation:

For

\[
f(x) = b + \sum_{i=1}^{n} w_i K(x, x_i) = b + \sum_{i=1}^{n} w_i e^{-\|x - x_i\|^2 / \sigma^2}
\]

we get

\[
\frac{\partial f(x)}{\partial x} = -\sum_{i=1}^{n} \frac{2(x - x_i) w_i}{\sigma^2} \, e^{-\|x - x_i\|^2 / \sigma^2}.
\]

Note that the dominant terms are those for which x_i is a near neighbor of x, so that we approximately get

\[
\frac{\partial f(x)}{\partial x} \approx -\sum_{i=1}^{m} w_i' \, e^{-\|x - x_i\|^2 / \sigma^2}
\]

where w_i' = 2(x − x_i) w_i / σ² and m ≤ n.
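A numeric sketch of the span claim (ours; the sizes, seed, and the choice to place the query near one base point are all arbitrary):

```python
import numpy as np

# The gradient of a Gaussian kernel machine is a weighted sum of the vectors
# (x - x_i), with weights that decay fast in ||x - x_i||, so the near
# neighbors of x dominate and the gradient approximately lies in their span.
rng = np.random.default_rng(2)
n, d, sigma = 30, 10, 1.0
xi = rng.standard_normal((n, d))          # base points
w = rng.standard_normal(n)                # regression weights
x = xi[0] + 0.1 * rng.standard_normal(d)  # query point near one base point

diffs = x - xi                            # rows are (x - x_i)
kern = np.exp(-np.sum(diffs**2, axis=1) / sigma**2)
grad = -np.sum(2 * diffs * (w * kern)[:, None] / sigma**2, axis=0)

# Project grad onto the span of the m nearest (x - x_i): the relative
# residual is tiny, i.e. grad approximately lies in that span.
m = 3
near = np.argsort(np.sum(diffs**2, axis=1))[:m]
coeffs, *_ = np.linalg.lstsq(diffs[near].T, grad, rcond=None)
print(np.linalg.norm(diffs[near].T @ coeffs - grad) / np.linalg.norm(grad))
```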


Page 11

Curse of Dimensionality

Local Derivative Example


Page 12

Shallow and Deep Learning

Getting Around the Curse of Dimensionality

There is no universal solution to the curse of dimensionality; however, for particular purposes you can make assumptions that may help.

“We hypothesize that many tasks in the AI set may be built aroundcommon representations, which can be understood as a set of interrelatedconcepts”

Translation: Use the mathematical idea of composition of functions tobuild a complicated function from simple parts (common representation),such as Gaussian kernels or any basis functions.


Page 13

Shallow and Deep Learning

Shallow Learning

\[
f(x) = b + \sum_{i=1}^{N} w_i \phi_i(x)
\]

where the w_i result from the training; the basis functions could be something fixed, such as K(x_i, x), or also a result of the training.
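A minimal sketch of shallow learning with fixed Gaussian bases (ours; the toy 1-D target, the ridge parameter, and the choice to omit the bias b are our assumptions):

```python
import numpy as np

# Fit the weights w_i of f(x) = sum_i w_i K(x_i, x) by ridge regression,
# with the training inputs doubling as the base points x_i.
rng = np.random.default_rng(3)
n, sigma, lam = 50, 0.5, 1e-3
xi = rng.uniform(-3, 3, n)
y = np.sin(2 * xi) + 0.1 * rng.standard_normal(n)

K = np.exp(-(xi[:, None] - xi[None, :]) ** 2 / sigma**2)  # kernel matrix
w = np.linalg.solve(K + lam * np.eye(n), y)               # regularized weights

def f(x):
    """The learned shallow representation (bias b omitted)."""
    return np.exp(-(x - xi) ** 2 / sigma**2) @ w

print(f(1.0), np.sin(2.0))  # prediction vs. noiseless target
```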


Page 14

Shallow and Deep Learning

Shallow Learning


Page 15

Shallow and Deep Learning

Deep Learning as Neural Network


Page 16

Shallow and Deep Learning

Deep Learning as Function Composition

Let f_{j,k} represent the j-th feature in the k-th layer:

\[
f_{j,1}(x) = b_{j,1} + \sum_{i} w_{ij1} K(x_{ij1}, x)
\]

\[
f_{j,2}(x) = b_{j,2} + \sum_{i} w_{ij2} K(x_{ij2}, f_{j,1}(x))
\]

\[
\vdots
\]

They conjecture that, by allowing these compositions, we will need fewer parameters to fit compared to a shallow representation.
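A structural sketch of this composition (ours; the centers and weights are random placeholders that would be learned in practice, and the layer widths are arbitrary):

```python
import numpy as np

def kernel_layer(h, centers, weights, biases, sigma=1.0):
    """f_j(h) = b_j + sum_i w_ij * exp(-||c_ij - h||^2 / sigma^2), one j per feature."""
    sq = np.sum((centers - h[None, None, :]) ** 2, axis=2)  # (features, bases)
    return biases + np.sum(weights * np.exp(-sq / sigma**2), axis=1)

rng = np.random.default_rng(4)
d, n_bases, n_feat = 4, 6, 3
x = rng.standard_normal(d)

# Layer 1 kernels live in input space; layer 2 kernels live in feature space.
layer1 = (rng.standard_normal((n_feat, n_bases, d)),
          rng.standard_normal((n_feat, n_bases)), rng.standard_normal(n_feat))
layer2 = (rng.standard_normal((n_feat, n_bases, n_feat)),
          rng.standard_normal((n_feat, n_bases)), rng.standard_normal(n_feat))

h1 = kernel_layer(x, *layer1)   # f_{j,1}(x)
h2 = kernel_layer(h1, *layer2)  # f_{j,2}(x): a composition, not a wider sum
print(h2)
```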


Page 17

Shallow and Deep Learning

Deep Learning Steps

Step 1: Initialization via unsupervised learning, with feedback that helps reconstruct the input from the output.

Step 2: Refine via gradient-descent supervised learning.
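A compressed toy sketch of the two steps (ours: a one-hidden-layer numpy network with an untied autoencoder for step 1; all sizes, rates, and iteration counts are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((200, 8))                   # inputs (unlabeled in step 1)
y = (X[:, 0] + X[:, 1] > 0).astype(float)[:, None]  # labels (used only in step 2)
W1 = 0.1 * rng.standard_normal((8, 4))              # hidden layer, kept for step 2
D = 0.1 * rng.standard_normal((4, 8))               # decoder, used only in step 1
W2 = 0.1 * rng.standard_normal((4, 1))              # supervised output layer

# Step 1: unsupervised initialization -- train W1 so the input can be
# reconstructed from the hidden layer's output.
for _ in range(500):
    H = np.tanh(X @ W1)
    err = (H @ D - X) / len(X)                      # reconstruction error
    D -= 0.5 * H.T @ err
    W1 -= 0.5 * X.T @ ((err @ D.T) * (1 - H**2))

# Step 2: refine the whole stack with gradient-descent supervised learning.
for _ in range(500):
    H = np.tanh(X @ W1)
    p = 1 / (1 + np.exp(-(H @ W2)))                 # sigmoid output
    g = (p - y) / len(X)                            # cross-entropy gradient
    W2 -= 1.0 * H.T @ g
    W1 -= 1.0 * X.T @ ((g @ W2.T) * (1 - H**2))

print("train accuracy:", ((p > 0.5) == (y > 0.5)).mean())
```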


Page 18

Shallow and Deep Learning

Deep Learning

\[
c_{ijxy} = \tanh\left( b_{ij} + \sum_{k} \sum_{p=0}^{P_i - 1} \sum_{q=0}^{Q_i - 1} w_{ijkpq} \, c_{(i-1),k,(x+p),(y+q)} \right)
\]
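A direct, loop-based reading of this formula (our sketch; the shapes are illustrative, and no pooling, padding, or stride is included):

```python
import numpy as np

def conv_layer(prev, weights, biases, P, Q):
    """prev: (K, H, W) maps of layer i-1; weights: (J, K, P, Q); biases: (J,).
    Returns c[j, x, y] = tanh(b_j + sum_{k,p,q} w[j,k,p,q] * prev[k, x+p, y+q])."""
    K, H, W = prev.shape
    J = weights.shape[0]
    out = np.zeros((J, H - P + 1, W - Q + 1))
    for j in range(J):
        for x in range(H - P + 1):
            for y in range(W - Q + 1):
                window = prev[:, x:x + P, y:y + Q]  # all maps k, offsets p, q
                out[j, x, y] = np.tanh(biases[j] + np.sum(weights[j] * window))
    return out

rng = np.random.default_rng(6)
maps = conv_layer(rng.standard_normal((2, 8, 8)),
                  rng.standard_normal((3, 2, 3, 3)), np.zeros(3), 3, 3)
print(maps.shape)  # (3, 6, 6)
```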


Page 19

Shallow and Deep Learning

Deep Learning


Page 20

Results

Results


Page 21

Results

Sample Data


Page 22

Results

Results


Page 23

Conclusion

Summary

The curse of dimensionality can limit the amount of data that can be analyzed.

We cannot completely get rid of the curse of dimensionality, but we can get around it if we make some assumptions.

Shallow learning assumes that we can represent functions with smooth functions such as Gaussian kernels.

Deep learning assumes that complicated functions can be built by composing simple functions such as Gaussian kernels.

Deep learning is composed of several two-layer sequences, a feature-detection layer and a feature-pooling layer, in which a non-linear supervision step is performed at each layer.

The authors show successful results in image classification.


Page 24

Conclusion

References

Yoshua Bengio and Yann LeCun, Scaling Learning Algorithms towards AI, in Large-Scale Kernel Machines, 2007.

Leslie Lamport, Deep Learning and Convolutional Neural Networks, RSIP Vision Blogs, http://www.rsipvision.com/exploring-deep-learning/.

Jianxin Wu, Introduction to Convolutional Neural Networks, National Key Lab for Novel Software Technology, Nanjing University, China, 2016.
