
Page 1: PyData2015

Representation learning

PyData Warsaw 2015

Michael Jamroz, Matthew Opala

24th September 2015

Page 2: PyData2015

● Goals of AI
● Learning representations
● Deep learning
● Examples

Presentation Plan

Page 3: PyData2015

AI

● Goal: build an intelligent machine
● It needs knowledge to make decisions
● Putting all of that knowledge into a computer program by hand is infeasible
● Knowledge is gained by learning from data

Page 4: PyData2015

Data representation

● Representation - features passed to ML algorithms, crucial for good performance on various tasks

● Features can be handcrafted or learned automatically

● Representation learning: discovering meaningful features by the computer

Page 5: PyData2015

ML in industry nowadays

● Most of the time is spent on manual feature extraction

● We would like to have this step automated

Page 6: PyData2015

Why representation learning ?

● Previous slide: manual feature engineering is time-consuming and incomplete
● Unsupervised feature learning:
○ Collected data are mostly unlabeled (bigger datasets)
○ Labels do not provide enough information
○ The learning process is independent of the ML task performed on the data

Page 7: PyData2015

Semi-supervised, transfer learning

● Transfer learning - transferring knowledge from a previously learned task to a new machine learning task

● Semi-supervised learning - learning from a few labeled examples together with many unlabeled examples

Page 8: PyData2015

Need for Deep Architectures

● a deep architecture can represent certain functions more compactly than a shallow one

● any boolean function (e.g. AND, OR, XOR) can be represented by a single hidden layer - however, it may require an exponential number of hidden units

Page 9: PyData2015

Formally

● shown by Yao in 1985 that d-bit parity circuits of depth 2 have exponential size

● generalised to perceptrons with linear threshold units by Håstad in 1991

Page 10: PyData2015

How deep a representation do we need?

Page 11: PyData2015

Informal arguments

Page 12: PyData2015

Shallow program

Page 13: PyData2015

Deep program

Page 14: PyData2015

Biology inspirations

Page 15: PyData2015

Learning multiple levels of representation

Page 16: PyData2015

“I'm sorry, Dave. I'm afraid I can't do that.”

Page 17: PyData2015

对不起,戴夫。恐怕我不能这样做。 (Chinese: "I'm sorry, Dave. I'm afraid I can't do that.")

Page 18: PyData2015

Let’s build a deep representation

Page 19: PyData2015

Multilayer Perceptron

input layer

hidden layers

output layer
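
The diagram above can be summarized as a minimal forward pass in NumPy; the layer sizes and the ReLU/softmax choices below are illustrative assumptions, not something taken from the talk:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Illustrative layer sizes: 784 inputs -> two hidden layers -> 10 outputs.
rng = np.random.RandomState(0)
sizes = [784, 256, 128, 10]
weights = [rng.randn(n_in, n_out) * 0.01 for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def forward(x):
    """Propagate a batch: input layer -> hidden layers -> output layer."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(h.dot(W) + b)                          # hidden layers
    return softmax(h.dot(weights[-1]) + biases[-1])     # output layer

probs = forward(rng.randn(32, 784))   # class probabilities for a batch of 32 inputs
```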

Page 20: PyData2015

Reminder - Gradient Descent
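
As a refresher, a minimal gradient descent loop in NumPy; the linear least-squares objective and the learning rate are assumptions chosen only for illustration:

```python
import numpy as np

def loss(w, X, y):
    """Mean squared error of a linear model."""
    return np.mean((X.dot(w) - y) ** 2)

def grad(w, X, y):
    """Gradient of the mean squared error with respect to w."""
    return 2.0 * X.T.dot(X.dot(w) - y) / len(y)

rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X.dot(np.array([1.0, -2.0, 0.5])) + 0.1 * rng.randn(100)

w = np.zeros(3)
learning_rate = 0.1
for step in range(200):
    w -= learning_rate * grad(w, X, y)   # step against the gradient

print(loss(w, X, y), w)
```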

Page 21: PyData2015

But MLPs have their problems

● vanishing and exploding gradients
● getting stuck in poor local optima
● lack of good initializations
● lack of labeled data
● hard to attract research interest
● slow hardware

Page 22: PyData2015

Breakthrough 2006

Page 23: PyData2015

Greedy layer-wise pretraining

Page 24: PyData2015

Restricted Boltzmann Machine

Page 25: PyData2015

Stacking RBMs

Page 26: PyData2015

Limitations of fully connected networks

● for natural images we would like to be invariant to translations, rotations and other class-preserving transformations

● fully connected networks do not introduce such invariance

Page 27: PyData2015

Convolutional Neural Nets

Page 28: PyData2015

Convolution = sparse connectivity + parameter sharing

Page 29: PyData2015

Sparse connectivity

Page 30: PyData2015

Parameter sharing

Page 31: PyData2015

Convolution

Page 32: PyData2015

Pooling
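
A small NumPy/SciPy sketch of both operations on a single-channel image; the particular 3x3 filter and the 2x2 pooling window are illustrative assumptions:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.RandomState(0)
image = rng.rand(28, 28)                       # single-channel input

# One 3x3 filter shared across all spatial positions (parameter sharing);
# each output unit only sees a 3x3 patch of the input (sparse connectivity).
kernel = np.array([[1.0, 0.0, -1.0],
                   [2.0, 0.0, -2.0],
                   [1.0, 0.0, -1.0]])          # example edge-detecting filter
feature_map = convolve2d(image, kernel, mode="valid")

def max_pool(x, size=2):
    """Non-overlapping max pooling: keep the strongest response in each window."""
    h, w = x.shape
    h, w = h - h % size, w - w % size          # crop to a multiple of the window
    x = x[:h, :w].reshape(h // size, size, w // size, size)
    return x.max(axis=(1, 3))

pooled = max_pool(feature_map)                 # smaller, translation-tolerant map
print(feature_map.shape, pooled.shape)
```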

Page 33: PyData2015

Architecture

Page 34: PyData2015

Examples

Page 35: PyData2015

Word2Vec / Doc2Vec

● Tomas Mikolov et al., 2013
● Embedding words / documents in a vector space
● Neural network with one hidden layer
● Trained in an unsupervised way
● Representation for a word obtained by computing the hidden layer activation
● Good explanation: http://arxiv.org/pdf/1411.2738v1.pdf
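
A minimal sketch of training Doc2Vec with gensim and reading back document vectors; the toy corpus and hyper-parameters are made up, and parameter names vary slightly between gensim versions:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Toy corpus: in practice each document is a tokenized company report.
raw_docs = [
    "quarterly revenue grew in the software segment",
    "the bank reported higher interest income",
    "drug trials for the new therapy were delayed",
]
corpus = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(raw_docs)]

# Unsupervised training: no industry labels are needed at this stage.
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=40)

vec = model.dv[0]                                                      # embedding of document 0
new_vec = model.infer_vector("insurance premiums increased".split())   # unseen document
```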

Page 36: PyData2015
Page 37: PyData2015

Problem

● ~180k documents: activity reports filed by American companies

● companies belong to different industry segments (260 of them)

● ~9k labeled documents (the label is the industry the company operates in)

● an example of semi-supervised learning
● task: classify the remaining documents

Page 38: PyData2015

Doc2Vec - document embedding

Page 39: PyData2015

Doc2Vec - classification

● Labeled set split into training/test data with a 70/30 ratio

● Test set: ~2700 examples, 260 classes
● Classification performed on the representation obtained from Doc2Vec
● Accuracy on the test set:

○ KNN with voting: ~85%
○ SVM one-versus-one: ~83%
○ Random forest: ~80%
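
A sketch of the classification step with scikit-learn, using the KNN variant from the list above; `doc_vectors` and `labels` are placeholder arrays standing in for the real Doc2Vec embeddings and industry labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data with the same shape as the real problem:
# one Doc2Vec vector per labeled document, one of 260 industry classes.
rng = np.random.RandomState(0)
doc_vectors = rng.randn(9000, 100)
labels = rng.randint(0, 260, size=9000)

X_train, X_test, y_train, y_test = train_test_split(
    doc_vectors, labels, test_size=0.3, random_state=0)   # 70/30 split

knn = KNeighborsClassifier(n_neighbors=5)                  # KNN with majority voting
knn.fit(X_train, y_train)
print("test accuracy:", knn.score(X_test, y_test))
```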

Page 40: PyData2015

Neural Art Style Transfer

Page 41: PyData2015
Page 42: PyData2015
Page 43: PyData2015
Page 44: PyData2015

Pretrain CNN

Page 45: PyData2015

Content representation

Page 46: PyData2015

Art style representation

Page 47: PyData2015

Objective function

Page 48: PyData2015

Summing up

● define a loss function for content
● define a loss function for art style
● define the total loss
● perform gradient-based optimization
● compute derivatives with respect to the data (the input image, not the network weights)
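
A compact sketch of that recipe; the `content_features` and `gram` functions below are trivial stand-ins for the pretrained CNN activations used in the actual method, and the image sizes and weights are arbitrary:

```python
import numpy as np
from scipy.optimize import minimize

# Trivial stand-ins for CNN feature extractors: the real method uses
# activations of a pretrained convolutional network, not raw pixels.
def content_features(img):
    return img

def gram(img):
    f = img.reshape(img.shape[0], -1)
    return f.dot(f.T)                    # correlations between "feature maps"

rng = np.random.RandomState(0)
content_img = rng.rand(3, 8, 8)          # toy "photo"
style_img = rng.rand(3, 8, 8)            # toy "painting"
alpha, beta = 1.0, 1e-3                  # content/style trade-off (arbitrary values)

def total_loss(x_flat):
    x = x_flat.reshape(content_img.shape)
    content_loss = np.sum((content_features(x) - content_features(content_img)) ** 2)
    style_loss = np.sum((gram(x) - gram(style_img)) ** 2)
    return alpha * content_loss + beta * style_loss

# Gradient-based optimization of the image itself, not of network weights.
result = minimize(total_loss, content_img.ravel(), method="L-BFGS-B")
stylized = result.x.reshape(content_img.shape)
```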

Page 49: PyData2015
Page 50: PyData2015

● Theano & Lasagne● NViDIA GTX● https://github.com/Craftinity/art_style● http://deeplearning.net

Page 51: PyData2015

Contact

● http://www.craftinity.com
● https://www.facebook.com/craftinitycom
● https://twitter.com/craftinitycom
● [email protected]
● [email protected]
● [email protected]

Page 52: PyData2015

Q&A

Page 53: PyData2015

The End