
Deep Learning for AI

Yoshua Bengio

August 28th, 2017 @ DS3

Data Science Summer School

A new revolution seems to be in the works after the industrial revolution. And Machine Learning, especially Deep Learning, is at the epicenter of this revolution.

Deep Learning Breakthroughs

Computers have made huge strides in perception, manipulating language, playing games, reasoning, ...


Intelligence Needs Knowledge

• Learning: powerful way to transfer knowledge to intelligent agents

• Failure of classical AI: a lot of knowledge is intuitive

• Solution: get knowledge from data & experience

Machine Learning, AI & No Free Lunch

• Five key ingredients for ML towards AI:

1. Lots & lots of data

2. Very flexible models

3. Enough computing power

4. Computationally efficient inference

5. Powerful priors that can defeat the curse of dimensionality


Bypassing the Curse of Dimensionality

We need to build compositionality into our ML models

Just as human languages exploit compositionality to give representations and meanings to complex ideas

Exploiting compositionality can give an exponential gain in representational power

Distributed representations / embeddings: feature learning

Deep architecture: multiple levels of feature learning

Prior assumption: compositionality is useful to describe the world around us efficiently

Distributed Representations: The Power of Compositionality – Part 1

• Distributed (possibly sparse) representations, learned from data, can capture the meaning of the data and state

• Parallel composition of features: can be exponentially advantageous


[Figure: a distributed representation contrasted with a non-distributed (e.g. one-hot) representation]
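As a rough illustration of the "exponentially advantageous" claim (a standard counting argument, not from the slides): n binary features in a distributed code can distinguish up to 2^n configurations, while a non-distributed (one-hot) code with the same number of units distinguishes only n.

```python
# Counting sketch: distributed vs. one-hot codes with n = 10 units.
import itertools

n = 10
distributed_codes = list(itertools.product([0, 1], repeat=n))          # all binary feature vectors
one_hot_codes = [tuple(int(i == j) for j in range(n)) for i in range(n)]

print(len(distributed_codes))  # 1024 distinguishable configurations
print(len(one_hot_codes))      # 10 distinguishable configurations
```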

Deep Representations: The Power of Compositionality – Part 2

• Learned function seen as a composition of simpler operations, e.g. inspired by neural computation

• Hierarchy of features, concepts, leading to more abstract factors enabling better generalization

• Again, theory shows this can be exponentially advantageous


Why multiple layers? The world is compositional

Anything New with Deep Learning since the Neural Nets of the 90s?

• Rectified linear units instead of sigmoids, enable training much deeper networks by backprop (Glorot & Bengio AISTATS 2011)

• Some forms of noise (like dropout) are powerful regularizers yielding superior generalization abilities

• Success of deep convnets trained on large labeled image datasets

• Success of recurrent nets with more memory, with gating units

• Attention mechanisms liberate neural nets from fixed-size inputs
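A minimal sketch of the first two ingredients in this list, assuming PyTorch (layer sizes are illustrative, not from the talk): rectified linear units as activations and dropout as a noise-based regularizer.

```python
# Sketch: ReLU activations instead of sigmoids, plus dropout regularization.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(784, 512),
    nn.ReLU(),          # rectified linear unit: max(0, x)
    nn.Dropout(p=0.5),  # randomly zero half the activations during training
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(512, 10),
)

x = torch.randn(32, 784)   # a batch of 32 dummy inputs
logits = model(x)          # shape: (32, 10)
print(logits.shape)
```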

What’s New with Deep Learning?

• Progress in unsupervised generative neural nets allows them to synthesize a diversity of images, sounds and text imitating unlabeled images, sounds or text


GANs (NIPS'2014)

[Figure: GAN architecture. A random vector is fed to a generator network that produces a fake image; a discriminator network learns to distinguish fake images from real images drawn (by random index) from the training set]
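The figure can be turned into a minimal training-step sketch, assuming PyTorch; the tiny MLP generator and discriminator and the flattened 784-pixel images are illustrative stand-ins, not the original NIPS'2014 code.

```python
# Minimal GAN training step, mirroring the figure above (a sketch only).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
D = nn.Sequential(nn.Linear(784, 256), nn.LeakyReLU(0.2), nn.Linear(256, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def gan_step(real_images):                       # real_images: (batch, 784)
    batch = real_images.size(0)
    z = torch.randn(batch, 100)                  # random vector
    fake_images = G(z)                           # fake image

    # Discriminator step: real images -> 1, fake images -> 0
    d_loss = (bce(D(real_images), torch.ones(batch, 1)) +
              bce(D(fake_images.detach()), torch.zeros(batch, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make the discriminator output 1 on fakes
    g_loss = bce(D(fake_images), torch.ones(batch, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
    return d_loss.item(), g_loss.item()

print(gan_step(torch.randn(32, 784)))  # dummy "real" batch just to run the sketch
```

The `.detach()` keeps the discriminator update from propagating gradients into the generator; only the second loss trains G.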

What’s New with Deep Learning?

• Incorporating the idea of attention, using GATING units, has unlocked a breakthrough in machine translation:

Neural Machine Translation

• Now in Google Translate:


[Figure: attention in neural machine translation. A softmax over lower-level (source) locations, conditioned on context at both the lower and higher levels, decides where to attend (ICLR'2015)]

[Figure: human evaluation of translation quality. Current neural net translation closes much of the gap between n-gram translation and human translation]
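A minimal sketch of the attention computation described in the first figure, assuming PyTorch and additive (Bahdanau-style) scoring; all dimensions and weight names are illustrative.

```python
# Sketch of soft attention: a softmax over source (lower-level) positions,
# conditioned on the decoder (higher-level) state.
import torch
import torch.nn.functional as F

d = 128
W_enc = torch.randn(d, d) * 0.01
W_dec = torch.randn(d, d) * 0.01
v = torch.randn(d) * 0.01

def attend(encoder_states, decoder_state):
    # encoder_states: (src_len, d), decoder_state: (d,)
    scores = torch.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v  # (src_len,)
    weights = F.softmax(scores, dim=0)            # softmax over source locations
    context = weights @ encoder_states            # weighted sum of source states, (d,)
    return context, weights

context, weights = attend(torch.randn(20, d), torch.randn(d))
print(weights.sum())   # ~1.0: a distribution over the 20 source positions
```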

What’s New with Deep Learning?

• Attention has also opened the door to neural nets which can write to and read from a memory

• Two systems:

  • Cortex-like (state controller and representations): System 1, intuition, fast heuristic answer

  • Hippocampus-like (memory) + prefrontal cortex: System 2, slow, logical, sequential

[Figure: a controller network that writes to and reads from an external memory]

• Memory-augmented networks gave rise to:

  • Systems which reason: sequentially combining several selected pieces of information (from the memory) in order to obtain a conclusion

  • Systems which answer questions: accessing relevant facts and combining them
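A rough sketch of the read/write mechanism such memory-augmented networks rely on, assuming PyTorch: attention weights over memory slots select what to read, and a soft write blends new content into the slots. Only content-based addressing is shown; this is a simplification, not any specific published architecture.

```python
# Content-based memory read and soft write (illustrative sketch).
import torch
import torch.nn.functional as F

def read(memory, query):
    # memory: (slots, d), query: (d,)
    weights = F.softmax(memory @ query, dim=0)   # similarity -> distribution over slots
    return weights @ memory                      # weighted sum, (d,)

def write(memory, weights, new_content):
    # Blend new content into each slot in proportion to its write weight.
    w = weights.unsqueeze(1)                     # (slots, 1)
    return memory * (1 - w) + w * new_content

memory = torch.randn(16, 64)
query = torch.randn(64)
r = read(memory, query)
memory = write(memory, F.softmax(memory @ query, dim=0), torch.randn(64))
print(r.shape, memory.shape)
```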

We are starting to better understand why deep learning works

• Generalization:

• Distributed representations: (up to) exponential statistical advantage, if the world is compositional

• Depth, multiple layers: similar story, on top

• Optimization: MYTHS BUSTED

• Non-convexity & local minima of the objective function: not a curse

• Stochastic gradient descent is very efficient

• Additional human-inspired tricks: curriculum learning (ICML’2009)

(NIPS'2014, ICLR'2014)

Still Far from Human-Level AI

• Industrial successes mostly based on supervised learning

• Learning superficial clues, not generalizing well outside of training contexts; easy to fool trained networks:

  • Current models cheat by picking up on surface regularities

• Still unable to discover higher-level abstractions at multiple time scales, very long-term dependencies

• Still relying heavily on smooth differentiable predictors (using backprop, the workhorse of deep learning)

Humans outperform machines at unsupervised learning

• Humans are very good at unsupervised learning, e.g. a 2-year-old knows intuitive physics

• Babies construct an approximate but sufficiently reliable model of physics; how do they manage that? Note that they interact with the world, not just observe it.

Latent Variables and Abstract Representations

• Encoder/decoder view: maps between low & high levels

• Encoder does inference: interpret the data at the abstract level

• Decoder can generate new configurations

• Encoder flattens and disentangles the data manifold


[Figure: encoder/decoder diagram. The encoder Q(h|x) maps points x in the data space to the abstract representation space, which carries a prior P(h); the decoder P(x|h) maps abstract representations h back to the data space]
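A minimal sketch of this encoder/decoder view, assuming PyTorch: a deterministic autoencoder stands in for Q(h|x) and P(x|h); a probabilistic version (e.g. a VAE) would add the prior P(h) and sampling. Layer sizes are illustrative.

```python
# Encoder maps data space -> abstract representation; decoder maps back.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 32))
decoder = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 784))

x = torch.randn(8, 784)        # points in data space
h = encoder(x)                 # abstract representation (inference)
x_hat = decoder(h)             # back to data space (generation/reconstruction)
loss = F.mse_loss(x_hat, x)    # reconstruction objective
print(h.shape, loss.item())
```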

Maps Between Representations (Larochelle et al., AAAI 2008)

• x and y represent different modalities, e.g., image, text, sound…

• Can provide zero-shot generalization to new categories (values of y)
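A sketch of the underlying idea, not the exact AAAI 2008 method: map images (x) and class descriptions (y) into a shared embedding space, then classify by similarity, which works even for classes with no image examples. The feature dimensions and linear encoders below are assumptions.

```python
# Zero-shot classification via a shared embedding space (illustrative sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

image_encoder = nn.Linear(2048, 128)   # e.g. on top of pretrained image features
text_encoder = nn.Linear(300, 128)     # e.g. on top of word embeddings describing y

def zero_shot_predict(image_feat, class_desc_feats):
    # image_feat: (2048,), class_desc_feats: (num_classes, 300)
    img = F.normalize(image_encoder(image_feat), dim=0)
    cls = F.normalize(text_encoder(class_desc_feats), dim=1)
    scores = cls @ img                  # cosine similarity to each class
    return scores.argmax().item()       # nearest class, possibly never seen with images

print(zero_shot_predict(torch.randn(2048), torch.randn(5, 300)))
```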

Convolutional GANs

Strided convolutions, batch normalization, only convolutional layers, ReLU and leaky ReLU

(Radford et al., arXiv:1511.06434)
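A sketch of a DCGAN-style generator matching the ingredients listed above, assuming PyTorch: only (transposed) convolutions, batch normalization, and ReLU; the discriminator side would use strided convolutions with leaky ReLU. Channel counts and the 32x32 output size are illustrative, not the paper's exact architecture.

```python
# DCGAN-style generator sketch: transposed convolutions + batch norm + ReLU.
import torch
import torch.nn as nn

generator = nn.Sequential(
    nn.ConvTranspose2d(100, 256, kernel_size=4, stride=1, padding=0),  # 1x1 -> 4x4
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 4x4 -> 8x8
    nn.BatchNorm2d(128), nn.ReLU(),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),   # 8x8 -> 16x16
    nn.BatchNorm2d(64), nn.ReLU(),
    nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1),     # 16x16 -> 32x32
    nn.Tanh(),
)

z = torch.randn(16, 100, 1, 1)   # latent vectors as 1x1 feature maps
images = generator(z)
print(images.shape)              # (16, 3, 32, 32)
```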

GAN: Interpolating in Latent Space

If the model is good (unfolds the manifold), interpolating between latent values yields plausible images.



Figure 7: Vector arithmetic for visual concepts. For each column, the Z vectors of samples are averaged. Arithmetic was then performed on the mean vectors, creating a new vector Y. The center sample on the right-hand side is produced by feeding Y as input to the generator. To demonstrate the interpolation capabilities of the generator, uniform noise sampled with scale ±0.25 was added to Y to produce the 8 other samples. Applying arithmetic in the input space (bottom two examples) results in noisy overlap due to misalignment.
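A sketch of latent-space interpolation and the vector arithmetic described in the caption, written for any trained generator G mapping latent vectors to images; the stand-in linear "generator" below exists only so the snippet runs.

```python
# Latent-space interpolation and vector arithmetic with a generator G (sketch).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())   # stand-in for a trained generator

def interpolate(G, z_a, z_b, steps=8):
    # Linear path between two latent vectors; a good model yields plausible
    # images all along the path.
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    z_path = (1 - alphas) * z_a + alphas * z_b       # (steps, 100)
    return G(z_path)

def vector_arithmetic(G, z_group_a, z_group_b, z_group_c, n=8):
    # Mirror of the caption: average each group's latent vectors, combine them
    # (a - b + c), then add uniform noise of scale ±0.25 for the other samples.
    y = z_group_a.mean(0) - z_group_b.mean(0) + z_group_c.mean(0)
    noise = (torch.rand(n, y.numel()) - 0.5) * 0.5
    return G(y + noise)

imgs = interpolate(G, torch.randn(1, 100), torch.randn(1, 100))
print(imgs.shape)   # (8, 784)
```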


Combining Iterative Sampling from Denoising Auto-Encoders with GAN

Plug & Play Generative Networks: Conditional Iterative Generation of Images in Latent Space

Anh Nguyen, Jason Yosinski, Yoshua Bengio, Alexey Dosovitskiy, Jeff Clune

(submitted to CVPR 2017) arXiv:1612.00005


[Figure: 227 x 227 generated ImageNet images of the category "volcano"]

(cheating a bit by using lots of labeled data during training)

Plug & Play Generative Networks

[Figure: high-resolution 227 x 227 generated samples for the categories bird, ant, lemon, and volcano]
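A very rough sketch of the general flavor of conditional iterative generation in latent space (not the exact PPGN update rule, which combines a denoising auto-encoder prior with classifier gradients): repeatedly nudge the latent code toward higher classifier probability for the target class while keeping it close to a simple prior, then decode. The stand-in generator and classifier are illustrative only.

```python
# Iterative, class-conditional generation in latent space (rough sketch).
import torch
import torch.nn as nn
import torch.nn.functional as F

G = nn.Sequential(nn.Linear(100, 784), nn.Tanh())   # stand-in generator
C = nn.Sequential(nn.Linear(784, 1000))             # stand-in ImageNet-style classifier

def generate_for_class(target_class, steps=50, lr=0.1, prior_weight=1e-3):
    z = torch.randn(1, 100, requires_grad=True)
    for _ in range(steps):
        logp = F.log_softmax(C(G(z)), dim=1)[0, target_class]
        loss = -logp + prior_weight * (z ** 2).sum()        # class term + Gaussian prior term
        grad, = torch.autograd.grad(loss, z)
        z = (z - lr * grad).detach().requires_grad_(True)   # gradient step on the latent code
    return G(z)

img = generate_for_class(target_class=42)   # illustrative class index
print(img.shape)
```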

What’s Missing

• More autonomous learning, better unsupervised learning

• Discovering the underlying causal factors

• Model-based RL which extends to completely new situations by unrolling powerful predictive models which can help reason about rarely observed dangerous states

• Sufficient computational power for models large enough to capture human-level knowledge

What’s Missing

• Autonomously discovering multiple time scales to handle very long-term dependencies

• Actually understanding language (which would also solve generation), requiring enough world knowledge / common sense

• Neural nets which really understand the notions of object, agent, action, etc.

• Large-scale knowledge representation allowing one-shot learning as well as discovering new abstractions and explanations by 'compiling' previous observations

Acting to Guide Representation Learning

• What is a good latent representation?

• Disentangling the underlying factors of representation so that computers make sense of the world

• Some factors (e.g. objects) correspond to ‘independently controllable’ aspects of the world

• Can only be discovered by acting in the world

The Future of Deep AI

• Scientific progress is slow and continuous, but social and economic impact can be disruptive

• Many fundamental research questions are in front of us, with much uncertainty about when we will crack them, but we will

• Importance of continued investment in basic & exploratory AI research, for both short-term practical reasons (recruitment) and long-term reasons

• Let us continue to keep the field open and fluid, be mindful of social impacts, and make sure AI will bloom for the benefit of all


Montreal Institute for Learning Algorithms
