
CENG 501 Deep Learning

Week 2

Sinan Kalkan

© AlchemyAPI

Administrative Issues

• Registrations
• Access to Moodle (https://odtuclass.metu.edu.tr/)
• Project paper selection
  – https://docs.google.com/spreadsheets/d/1tzPHq_Vgu6gCwNyXJHGvqeA6pgU67H0nKYjqkisWfKc/edit?usp=sharing
  – Deadline: 29 March + 1-2 weeks

Spring 2021 Sinan Kalkan 2

Deep learning and other approaches

Fig.: I. Goodfellow

Spring 2021 Sinan Kalkan 3

Previously on CENG501!

So what does Deep (Machine) Learning bring/provide?

• (Hierarchical) Compositionality
  – Cascade of non-linear transformations
  – Multiple layers of representations

• End-to-End Learning
  – Learning (goal-driven) representations
  – Learning feature extraction

• Distributed Representations
  – No single neuron “encodes” everything
  – Groups of neurons work together

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Spring 2021 Sinan Kalkan 4

Previously on CENG501!


Building A Complicated Function

Given a library of simple functions, compose them into a complicated function:

𝑓(𝑥) = 𝑔₁(𝑔₂(…(𝑔ₙ(𝑥))…))

Idea 2: Compositions
• Deep Learning
• Grammar models
• Scattering transforms…

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Spring 2021 Sinan Kalkan 5

Previously on CENG501!

Deep Learning = Hierarchical Compositionality

Low-Level Feature → Mid-Level Feature → High-Level Feature → Trainable Classifier → “car”

Feature visualization of convolutional net trained on ImageNet from [Zeiler & Fergus 2013]

Slide Credit: Marc'Aurelio Ranzato, Yann LeCun

Spring 2021 Sinan Kalkan 6

Previously on CENG501!

Distributed Representations Toy Example
• Can we interpret each dimension?

Slide Credit: Moontae Lee

Spring 2021 Sinan Kalkan 7

Previously on CENG501!

It can make mistakes

• T. Xu, J. White, S. Kalkan, H. Gunes, "Investigating Bias and Fairness in Facial Expression Recognition", ECCV2020 Workshop: ChaLearn Looking at People workshop ECCV: Fair Face Recognition and Analysis, 2020.

• J. Cheong, S. Kalkan, H. Gunes, "The Hitchhiker’s Guide to Bias and Fairness in Facial Affective Signal Processing", IEEE Signal Processing Magazine.

Joy Buolamwini & Timnit Gebru

Spring 2021 Sinan Kalkan 8

Previously on CENG501!

Beery, Sara, Grant Van Horn, and Pietro Perona. "Recognition in terra incognita." Proceedings of the European Conference on Computer Vision (ECCV). 2018.

Spring 2021 Sinan Kalkan 9

Previously on CENG501!

Today

• Introduction to machine learning

• Towards deep learning
  – History of deep learning
  – Biological neuron
  – Artificial neuron
  – Perceptron learning
  – Linear classification/regression
  – Non-linear classification/regression
  – Multi-layer perceptrons

Spring 2021 Sinan Kalkan 10

A CRASH COURSE ON MACHINE LEARNING

Facial images in this part of the lecture are from the JAFFE database: https://figshare.com/articles/journal_contribution/jaffe_desc_pdf/5245003

Spring 2021 Sinan Kalkan 11

Sinan Kalkan

What is machine learning?

Finding “hidden” structures in data “automatically”.

[Figure: example facial images as inputs, one group labelled “Unhappy” and another labelled “Happy”]

Sinan Kalkan

What is machine learning?

Finding “hidden” structures in data “automatically”.

With supervision ⇒ Supervised learning

Sinan Kalkan

What is machine learning?

𝐱 ∈ ℐ (input, e.g. a facial image)

𝐲 ∈ 𝒪 (output: Happy or Unhappy)

Sinan Kalkan

What is machine learning?

𝐱 ∈ ℐ (input, e.g. a facial image), 𝐲 ∈ 𝒪 (output: Happy or Unhappy)

𝐲 = 𝑓(𝐱) — but what is 𝑓?

• Training: estimate 𝑓 from labelled examples.
• Testing: apply the learned 𝑓 to new inputs.

Machine learning pipeline

Training: data with labels (label: happy, unhappy) → Extract features → “Learn” → Learned models or classifiers

Testing: test input → Extract features → Predict → Happy or Unhappy

Features:
- Size
- Texture
- Color
- Histogram of oriented gradients
- SIFT
- Etc.

Spring 2021 Sinan Kalkan 16

General Approaches: Generative vs. Discriminative

• Discriminative: find a separating line (in general: a hyperplane), or fit a function to the data (regression).
• Generative: learn a model for each class.

Spring 2021 Sinan Kalkan 17

General Approaches (cont’d)

• Supervised: instances come with labels (happy, unhappy, …) → Extract features → Learn a model (e.g. SVM)
• Unsupervised: instances come without labels → Extract features → Learn a model (e.g. k-means clustering)

Spring 2021 Sinan Kalkan 18

General Approaches (cont’d)

• Supervised + Discriminative: Support Vector Machines, Neural Networks, K-Nearest Neighbors, Decision Trees, Random Forest
• Supervised + Generative: Neural Networks
• Unsupervised + Generative: Mixture Models
• Unsupervised + Discriminative: K-means

Spring 2021 Sinan Kalkan 19

Outline for the Machine Learning Part

Supervised Learning

Unsupervised Learning

Other Forms of Learning

General Issues in Learning

Evaluation

Spring 2021 Sinan Kalkan 20

SUPERVISED LEARNING

Spring 2021 Sinan Kalkan 21

Supervised Machine Learning

• Find the following mapping, given the training set {(𝐱ᵢ, 𝐲ᵢ)}, i = 1..N:

𝐲 = 𝑓(𝐱), with 𝑓: 𝕏 → 𝕐.

• Targets 𝐲 can be discrete classes (classification) or continuous values (regression).

• Methods:
  – K Nearest Neighbors, Support Vector Machines, Decision Trees, Random Forest, Neural Networks

Spring 2021 Sinan Kalkan 22

[Example: facial images as inputs, with target values for regression (0.95, 0.80, 0.65, 0.88) and target classes for classification (Happy)]

Supervised Machine Learning Methods: K Nearest Neighbors

• Advantages:
  – No training
  – Simple
• Disadvantages:
  – Slow testing time (more efficient versions exist)
  – Needs a lot of memory

Fig: http://cs231n.github.io/classification/

Spring 2021 Sinan Kalkan 23

k shall be 3
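As a quick illustration (not part of the original slides), a minimal k-NN classifier in scikit-learn; the toy data and the 3-neighbour setting are assumptions made for the sketch:

```python
# Minimal k-NN sketch with scikit-learn; toy data, k=3 chosen for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-class dataset standing in for (feature vector, happy/unhappy) pairs.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=3)  # "no training": fit() essentially stores the data
knn.fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))  # prediction time is where the cost lies
```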

Supervised Machine Learning Methods: Support Vector Machines

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Binary classification can be viewed as the task of separating classes in feature space:

𝐲 = 𝑓(𝐱) = sign(𝐰ᵀ𝐱 + 𝑏)

The hyperplane 𝐰ᵀ𝐱 + 𝑏 = 0 separates the region where 𝐰ᵀ𝐱 + 𝑏 > 0 from the region where 𝐰ᵀ𝐱 + 𝑏 < 0.

Spring 2021 Sinan Kalkan 24

Supervised Machine Learning Methods: Support Vector Machines

• How many separating lines are there?
• Which one is optimal?

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 25

Supervised Machine Learning Methods: Support Vector Machines

• Distance from example 𝐱ᵢ to the separator is r = (𝐰ᵀ𝐱ᵢ + 𝑏) / ‖𝐰‖.
• Examples closest to the hyperplane are support vectors.
• Margin ρ of the separator is the distance between support vectors.

Maximum Margin Classification
• Maximizing the margin is good according to intuition.
• Implies that only support vectors matter; other training examples are ignorable.

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 26

Supervised Machine Learning Methods: Support Vector Machines

What we know:
• 𝐰 ⋅ 𝐱⁺ + 𝑏 = +1
• 𝐰 ⋅ 𝐱⁻ + 𝑏 = −1
• 𝐰 ⋅ (𝐱⁺ − 𝐱⁻) = 2

[Figure: the hyperplanes 𝐰𝐱 + 𝑏 = +1 and 𝐰𝐱 + 𝑏 = −1 bound the “Predict Class = +1” and “Predict Class = −1” zones on either side of the separator 𝐰𝐱 + 𝑏 = 0]

Margin width:

ρ = (𝐱⁺ − 𝐱⁻) ⋅ 𝐰 / ‖𝐰‖ = 2 / ‖𝐰‖

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 27

Supervised Machine Learning Methods: Support Vector Machines

• Then we can formulate the optimization problem:

Find 𝐰 and 𝑏 that maximize ρ = 2 / ‖𝐰‖,
such that 𝑦ᵢ(𝐰ᵀ𝐱ᵢ + 𝑏) ≥ 1 for all (𝐱ᵢ, 𝑦ᵢ), i = 1..n

which can be reformulated as:

Find 𝐰 and 𝑏 that minimize Φ(𝐰) = ‖𝐰‖² = 𝐰ᵀ𝐰,
such that 𝑦ᵢ(𝐰ᵀ𝐱ᵢ + 𝑏) ≥ 1 for all (𝐱ᵢ, 𝑦ᵢ), i = 1..n

or as the following in more compact form:

min over 𝐰 ∈ ℝᵈ, 𝑏 ∈ ℝ of  ‖𝐰‖² − Σᵢ αᵢ [𝑦ᵢ(𝐰ᵀ𝐱ᵢ + 𝑏) − 1]

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 29

Supervised Machine Learning Methods: Support Vector Machines

Limitation: Not robust to noisy data.
• Solution: allow some margin of error εᵢ for each sample point:

Find 𝐰 and 𝑏 that minimize Φ(𝐰) = ‖𝐰‖² = 𝐰ᵀ𝐰,
such that 𝑦ᵢ(𝐰ᵀ𝐱ᵢ + 𝑏) ≥ 1 − εᵢ for all (𝐱ᵢ, 𝑦ᵢ), i = 1..n,
and ε₁ + ⋯ + εₙ < 𝐶.

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 30

Take C as …

Supervised Machine Learning Methods: Support Vector Machines

Limitation: Not suitable for non-linearly separable data.
• Solution: Expand the feature space using a “kernel”, a function calculating some similarity between samples as a dot product:

K(𝐱ᵢ, 𝐱ⱼ) = φ(𝐱ᵢ) ⋅ φ(𝐱ⱼ)

• Example kernels:
  – Polynomial: K(𝐱ᵢ, 𝐱ⱼ) = (𝐱ᵢ ⋅ 𝐱ⱼ + 1)ᵖ
  – Radial-basis function: K(𝐱ᵢ, 𝐱ⱼ) = exp(−γ Σₖ (xᵢₖ − xⱼₖ)²)

[Figure: 1D data that is not linearly separable in x becomes separable after mapping to the feature x²]

Borrowed mostly from the slides of: ML Group, Uni of Texas at Austin; Mingyue Tan, Uni of British Columbia

Spring 2021 Sinan Kalkan 31
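A hedged sketch of the soft-margin and kernel SVMs above, using scikit-learn's SVC; the dataset and hyperparameter values are illustrative, and note that SVC's C penalizes the slack terms rather than bounding their sum as in the formulation on the previous slide:

```python
# Sketch of linear and kernel SVMs with scikit-learn's SVC; data and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

# Linear (soft-margin) SVM: C controls the penalty on margin violations.
linear_svm = SVC(kernel="linear", C=1.0).fit(X, y)

# Kernel SVMs: the kernel plays the role of K(x_i, x_j) = phi(x_i) . phi(x_j).
poly_svm = SVC(kernel="poly", degree=3, coef0=1.0).fit(X, y)  # (gamma * x_i.x_j + coef0)^degree
rbf_svm = SVC(kernel="rbf", gamma=0.5).fit(X, y)              # exp(-gamma * ||x_i - x_j||^2)

print("Number of support vectors (linear):", linear_svm.support_vectors_.shape[0])
print("w and b of the linear separator:", linear_svm.coef_, linear_svm.intercept_)
```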

Supervised Machine Learning Methods: Decision Trees

T. Mitchell, “Machine Learning”, McGraw Hill, 1997.

Spring 2021 Sinan Kalkan 33

Supervised Machine Learning Methods: Decision Trees

• At each node 𝑚, we divide the space based on a simple rule, e.g. 𝑥ⱼ > θₘ.
• Following the nodes, we reach the leaves.
• Classification: leaves are labels.
• Regression: leaves are numerical values (a function of local values).

Fig: Oya Celiktutan

Spring 2021 Sinan Kalkan 34

Supervised Machine Learning Methods: Random Forest

• Instead of a single strong classifier, have many weak classifiers and combine their answers.

Fig: https://towardsdatascience.com/understanding-random-forest-58381e0602d2

Spring 2021 Sinan Kalkan 36

Take 9 trees
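A small, illustrative comparison of one decision tree against a random forest of 9 trees with scikit-learn (toy data and settings are assumptions for the sketch):

```python
# Single decision tree vs. a random forest of 9 trees; dataset and settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# One (potentially strong but high-variance) tree: each node tests a rule like x_j > theta_m.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Many weaker trees whose answers are combined.
forest = RandomForestClassifier(n_estimators=9, random_state=0).fit(X_train, y_train)

print("Tree  :", tree.score(X_test, y_test))
print("Forest:", forest.score(X_test, y_test))
```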

UNSUPERVISED LEARNING

Fig: http://www.nlpca.org/

Spring 2021 Sinan Kalkan 37

Unsupervised Machine Learning

• Learning without labels
• Goal: discover a better representation of the data
• Uses:
  – Dimensionality reduction
  – Data visualization
  – Preparation for supervised learning
  – Clustering
• Methods:
  – Principal/Independent Component Analysis, k-Means/x-Means Clustering, Manifold learning methods, …

Spring 2021 Sinan Kalkan 38

Unsupervised Machine Learning Methods: Clustering with k-Means

Spring 2021 Sinan Kalkan 39

k shall be 2

Unsupervised Machine Learning Methods: Clustering with k-Means

Raw data

Problem:

- Need to know k beforehand.

Spring 2021 Sinan Kalkan 40
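A minimal k-means sketch with scikit-learn, assuming toy blob data and k = 2:

```python
# k-means clustering sketch; the two blobs and k=2 are illustrative assumptions.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

X, _ = make_blobs(n_samples=300, centers=2, random_state=0)  # "raw data"; labels are unused

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)     # k must be chosen beforehand
labels = kmeans.fit_predict(X)

print("Cluster centers:\n", kmeans.cluster_centers_)
print("First 10 assignments:", labels[:10])
```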

Unsupervised Machine Learning Methods: Dimensionality Reduction

Principal Component Analysis
• Analyze variance in the data.
• Find a new set of axes such that along the first axis variance is the highest; along the second, variance is the second highest; etc.

Fig: http://cs231n.github.io/neural-networks-2/

Fig: http://www.nlpca.org/

Spring 2021 Sinan Kalkan 41
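A short PCA sketch with scikit-learn on illustrative correlated data; the components are the new axes, ordered by decreasing variance:

```python
# PCA sketch: new axes ordered by decreasing explained variance; data is illustrative.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))  # correlated toy features

pca = PCA(n_components=2).fit(X)
X_reduced = pca.transform(X)              # project onto the first two principal axes

print("Variance explained per axis:", pca.explained_variance_ratio_)
print("Reduced shape:", X_reduced.shape)  # (200, 2)
```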

OTHER FORMS OF LEARNING

Self-supervised learning
Weakly-supervised learning
Zero-shot/Few-shot learning
Reinforcement Learning
Meta-learning
Life-long learning

Spring 2021 Sinan Kalkan 42

Self-supervised Learning

• Form supervision from data itself by predicting one portion of data from another portion

• An introductory read: https://lilianweng.github.io/lil-log/2019/11/10/self-supervised-learning.html

Fig.: Doersch et al., 2015

Spring 2021 Sinan Kalkan 43
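A toy illustration of "supervision from the data itself" (deliberately simpler than the patch-context task of Doersch et al.): a regressor is trained to predict the hidden portion of each sample from its visible portion, so the labels come from the data:

```python
# Toy self-supervision: predict the hidden half of each sample from its visible half.
# This only illustrates forming labels from the data itself; it is not the
# patch-context pretext task of Doersch et al., 2015.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 5:] = X[:, :5] @ rng.normal(size=(5, 5)) + 0.1 * rng.normal(size=(500, 5))

visible, hidden = X[:, :5], X[:, 5:]      # the "labels" are just another part of the data
model = LinearRegression().fit(visible, hidden)
print("Pretext-task R^2:", model.score(visible, hidden))
```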

Weakly-supervised Learning

E.g. weakly supervised object detection

There is at least one penguin in this photo

Spring 2021 Sinan Kalkan 44

Few-shot/Zero-shot Learning

Fig: https://preparingforgre.com/item/42488

Spring 2021 Sinan Kalkan 45

Reinforcement Learning

Fig: Richard Sutton

Spring 2021 Sinan Kalkan 46

Meta Learning

• One learning module shaping/controlling another

Spring 2021 Sinan Kalkan 47

F. I. Dogan, I. Bozcan, M. Celik, S. Kalkan, "CINet: A Learning Based Approach to Incremental Context Modeling in Robots",IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2018.

Lifelong Learning

I. Dogan, H. Celikkanat, S. Kalkan, "A Deep Incremental Boltzmann Machine for Modeling Context in Robots", International Conference on Robotics and Automation (ICRA), pp. 2411-2416, IEEE, 2018.

Spring 2021 Sinan Kalkan 48

ISSUES IN MACHINE LEARNING

Spring 2021 Sinan Kalkan 49

Model selection

𝑓(𝑥) = 𝑊𝑥

𝑓(𝑥) = Σᵢ 𝑊ᵢ 𝑥ⁱ

𝑓(𝑥) = 𝑊₁𝑥 + 𝑊₂𝑥²

𝑓(𝑥) = 𝑊 𝑓₁(𝑥)

𝑓(𝑥) = 𝑊₁𝑓₁(𝑥) + 𝑊₂𝑓₂(𝑥)²

Spring 2021 Sinan Kalkan 50
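A small numpy sketch (with illustrative data) of choosing among candidate models like the ones above, fitting polynomials f(x) = Σᵢ Wᵢ xⁱ of increasing degree:

```python
# Fitting candidate models of different complexity to the same 1-D data (illustrative).
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = 1.5 * x + 0.5 * x**2 + 0.1 * rng.normal(size=x.shape)  # toy ground truth + noise

for degree in (1, 2, 9):                       # increasingly flexible models
    coeffs = np.polyfit(x, y, degree)
    mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training MSE = {mse:.4f}")
# Training error always shrinks as the degree grows; whether the model generalizes
# is a separate question (see the bias-variance discussion below).
```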

Model Complexity

Models range in their flexibility to fit arbitrary data:

• Complex model: unconstrained; its large capacity may allow it to memorize data and fail to capture regularities → low bias, high variance.
• Simple model: constrained; its small capacity may prevent it from representing all structure in the data → high bias, low variance.

Slide Credit: Michael Mozer

Spring 2021 Sinan Kalkan 51

Bias-Variance Dilemma

[Figure: error on the test set as a function of model complexity, with an underfit regime on one side and an overfit regime on the other]

Slide Credit: Michael Mozer

Spring 2021 Sinan Kalkan 52

EVALUATION

Spring 2021 Sinan Kalkan 53

How to evaluate performance

• Hold-out method: split the data into a training set and a test set; optionally, hold out a further validation set (training set / validation set / test set).

Spring 2021 Sinan Kalkan 54
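A hold-out sketch with scikit-learn's train_test_split, carving out test and validation sets (the split ratios are illustrative):

```python
# Hold-out split sketch: train / validation / test via two calls to train_test_split.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)

# First carve out the test set, then split the rest into training and validation.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=0)

print(len(y_train), "train /", len(y_val), "validation /", len(y_test), "test samples")
```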

Cross-validation

• K-fold cross validation

Fig: https://scikit-learn.org/stable/modules/cross_validation.html

Spring 2021 Sinan Kalkan 55

Cross-validation

Exhaustive validation:

– Leave-p-out (LPO) cross validation
  • One iteration: use p samples as the validation set and the others as the training set.
  • Repeat this for all possible combinations of p samples.

– Leave-one-out (LOO) is LPO with p = 1.

Spring 2021 Sinan Kalkan 56
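A sketch of k-fold, leave-one-out and leave-p-out validation with scikit-learn; the data and the classifier are chosen only for illustration:

```python
# K-fold, leave-one-out and leave-p-out cross-validation with scikit-learn (illustrative).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, KFold, LeaveOneOut, LeavePOut
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=60, random_state=0)
clf = KNeighborsClassifier(n_neighbors=3)

kfold_scores = cross_val_score(clf, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=0))
loo_scores = cross_val_score(clf, X, y, cv=LeaveOneOut())   # LPO with p = 1
print("5-fold mean accuracy:", kfold_scores.mean())
print("LOO mean accuracy  :", loo_scores.mean())

# LeavePOut(p=2) enumerates all C(60, 2) = 1770 validation pairs: exhaustive and costly.
print("Leave-2-out iterations:", LeavePOut(p=2).get_n_splits(X))
```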

Cross-validation

• Group/subject-based validation: determine the folds based on group/subject information.
• Leave-one-subject-out (LOSO) cross validation: divide the dataset into k subsets with respect to subject identifiers.
• With random folds, training and test sets may contain the same subjects; with subject-based folds, training and test sets do NOT contain the same subjects.

Spring 2021 Sinan Kalkan 57
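A sketch of leave-one-subject-out validation using scikit-learn's LeaveOneGroupOut, with made-up subject identifiers as the groups:

```python
# Subject-based validation: folds are defined by subject IDs, so the same subject
# never appears in both training and test sets (illustrative data and IDs).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, LeaveOneGroupOut
from sklearn.svm import SVC

X, y = make_classification(n_samples=120, random_state=0)
subjects = np.repeat(np.arange(10), 12)      # 10 subjects, 12 samples each

loso = LeaveOneGroupOut()                    # one fold per subject
scores = cross_val_score(SVC(), X, y, groups=subjects, cv=loso)
print("Per-subject accuracies:", scores)
```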

Evaluation Metrics: Classification

Credit: Oya Celiktutan

Spring 2021 Sinan Kalkan 58

Evaluation Metrics: Classification

• Always try to quantify the predictions that have not been made.

• Recall & F-measure are frequently used

Credit: Oya Celiktutan

Spring 2021 Sinan Kalkan 59
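A minimal sketch of the usual classification metrics with scikit-learn; the true and predicted labels below are made up for illustration:

```python
# Common classification metrics; predictions here are hard-coded for illustration.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]   # 1 = happy, 0 = unhappy
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy :", accuracy_score(y_true, y_pred))
print("Precision:", precision_score(y_true, y_pred))
print("Recall   :", recall_score(y_true, y_pred))   # penalizes missed positives
print("F-measure:", f1_score(y_true, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_true, y_pred))
```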

Evaluation Metrics: Regression

Credit: Oya Celiktutan

Spring 2021 Sinan Kalkan 60
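A minimal sketch of common regression metrics with scikit-learn, reusing the example target values from the earlier slide together with made-up predictions:

```python
# Common regression metrics on toy target values (illustrative numbers).
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = [0.95, 0.80, 0.65, 0.88]
y_pred = [0.90, 0.75, 0.70, 0.80]

print("MSE:", mean_squared_error(y_true, y_pred))
print("MAE:", mean_absolute_error(y_true, y_pred))
print("R^2:", r2_score(y_true, y_pred))
```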

Training vs. Test Set Error

[Figure: error curves on the training set and on the test set]

Slide Credit: Michael Mozer

Spring 2021 Sinan Kalkan 61

EXAMPLE

Spring 2021 Sinan Kalkan 62

Facial Emotion Recognition

- Using SVM and MLP on Google Colab + scikit-learn: https://colab.research.google.com/drive/1mZimbXBAJlqgl-04phHxXRYku6EvgUOw

Spring 2021 Sinan Kalkan 63