Transcript
Page 1: Machine Learning 101 - Amazon Web Servicesaws-de-media.s3.amazonaws.com/images/Webinar/2016...© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Michael Brückner

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Michael Brückner

Manager Machine Learning

25/02/2016

Machine Learning 101

Page 2:

Agenda

• What is Machine Learning and why do we need it?

• Model Building

• Model Evaluation & Tuning

Page 3:

What is Machine Learning?

Methods and Systems that …

• Adapt based on recorded data

• Predict new data based on recorded data

• Optimize an action given a utility function

• Extract hidden structure from the data

• Summarize data into concise descriptions

Page 4:

What is Machine Learning NOT?

Methods and Systems that …

• can yield Garbage-In Knowledge-Out

• perform well without data modeling & feature engineering

• avoid the curse-of-dimensionality

• are a replacement for business rules

Page 5:

Infer-Predict-Decide Cycle

• Inference: Build & evaluate Predictor

• Prediction: Apply the learned Predictor

• Decision Making: Adjust Business loss and get new/more data

Page 6:

What for?

Automate tasks that typically require humans, in order to

• scale

• improve over humans (non-experts)

• preserve privacy

or solve tasks that are impossible for humans

Page 7:

Examples: Personalized Recommendation

• Input:

Page 8:

Examples: Personalized Recommendation

• Output:

Page 9:

Examples: Face Detection & Recognition

Face detection

• Input: image

• Output: face position

Face recognition

• Input: face (image & face position)

• Output: person’s name

Page 10:

Examples: Full-Text Translation

• Input: text in one language

• Output: text in another language

Page 11:

Examples: Spam Filtering

• Input: email (text, images, …)

• Output: spam/non-spam flag

• Challenges:

  • extremely high precision for legitimate emails

  • spam changes constantly

  • noisy ground truth

Page 12:

Supervised Machine Learning

1. Model problem in terms of input data and output data

2. Collect sample of input-output pairs

3. Learn a mapping that produces the output given the input

4. Apply this mapping to new inputs to make predictions

Page 13:

A Programmer’s Perspective

• Traditional Programming (Predicting): Input Data + Mapping → Computer → Output Data

• Supervised Machine Learning: Input Data + Output Data → Computer → Mapping

Page 14:

Advantages

• Use data instead of intuition to derive the mapping

• Can solve very complex tasks

• Can adapt to new situations (collect more data)

• Does not require much expert knowledge

Page 15:

Input Data

Description             | Type              | Cost      | Actual Cost | Diff   | In Catalogue
Movies                  | Entertainment     | $50       | $28         | $22    | Yes
Music (CDs, MP3s, etc.) | (missing)         | $500      | $30         | $470   | No
Sporting Events         | Entertainment     | $0        | $40         | ($40)  | No
Dining Out              | Food              | $1,000    | $1,200      | ($200) | Yes
Groceries               | (missing)         | $100      | $0          | $100   | Yes
Charity 1               | Gifts and Charity | $200      | $200        | $0     | No
Charity 2               | (missing)         | $500      | $500        | $0     | No
Cable/Satellite         | Housing           | $100      | $100        | $0     | Yes
Electric                | Housing           | $45       | $40         | $5     | Yes
Mortgage or Rent        | (missing)         | $700      | $700        | $0     | Yes
Health                  | Insurance         | $400      | $400        | $0     | Yes
Home                    | Insurance         | $400      | $400        | $0     | No
Credit Card 1           | (missing)         | (missing) | (missing)   | $0     | Yes

Callouts in the figure: Dataset, Attribute, Attribute Name, Attribute Value. Data types shown: Text Data, Categorical Data, Numerical Data, Binary Data, Missing Data.

Page 16:

Output Data

Description             | Type              | Cost      | Actual Cost | Diff   | In Catalogue
Movies                  | Entertainment     | $50       | $28         | $22    | Yes
Music (CDs, MP3s, etc.) | ?                 | $500      | $30         | $470   | No
Sporting Events         | Entertainment     | $0        | $40         | ($40)  | No
Dining Out              | Food              | $1,000    | $1,200      | ($200) | Yes
Groceries               | ?                 | $100      | $0          | $100   | Yes
Charity 1               | Gifts and Charity | $200      | $200        | $0     | No
Charity 2               | ?                 | $500      | $500        | $0     | No
Cable/Satellite         | Housing           | $100      | $100        | $0     | Yes
Electric                | Housing           | $45       | $40         | $5     | Yes
Mortgage or Rent        | ?                 | $700      | $700        | $0     | Yes
Health                  | Insurance         | $400      | $400        | $0     | Yes
Home                    | Insurance         | $400      | $400        | $0     | No
Credit Card 1           | ?                 | (missing) | (missing)   | $0     | Yes

Callouts in the figure: Target Attribute (the Type column), Target Attribute Values (its entries; "?" marks a value that is not known).

Page 17:

Agenda

• What is Machine Learning and why do we need it?

• Model Building

• Model Evaluation & Tuning

Page 18:

Problem Setting

• Input: vector of observable attributes, x

• Output: target attribute value, y

• Training data: pairs of input and corresponding output, $D = (x_1, y_1), \dots, (x_N, y_N)$

• Application data: inputs only

• Goal: learn mapping $f_w: x \mapsto y$ (the Predictor)

Page 19:

Challenges in Model Building

• Which function class for Predictor (data modeling)?

• How to pre-process the data (feature engineering)?

• How to learn this Predictor from our training data?

• How to generalize to new data?

Page 20:

Which function class for Predictor?

Types of prediction tasks (output type):

• Binary Classification ⇒ binary target $y \in \{-1, +1\}$

• Multinomial Classification ⇒ categorical target $y \in \{1, \dots, K\}$

• Regression ⇒ numeric target $y \in [l, u] \subset \mathbb{R}$

Page 21:

Which function class for Binary Classification?

• Decision Tree

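[Figure: decision tree with the tests "x2 > 7?", "x1 < 3?", "x2 < 5?" and "x1 < 1?" (each with a no and a yes branch, leaves labeled + or -), next to a scatter plot of + and - points in the (x1, x2) plane with axis marks at x1 = 1, 3 and x2 = 5, 7]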

Page 22:

Which function class for Binary Classification?

• Decision Tree

[Figure: decision tree with the tests "x2 > 7?", "x1 < 3?", "x2 < 5?" and "x1 < 1?" (each with a no and a yes branch), next to the (x1, x2) plane with regions labeled + or -]
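A decision tree of this kind can be read as nested conditionals. The sketch below reuses the four tests that appear in the figure, but the way they are nested and the label at each leaf are assumptions made for illustration, since the tree layout is not recoverable from the transcript.

```python
def predict_tree(x1: float, x2: float) -> int:
    """Return +1 or -1 by walking a small hand-written decision tree.

    The thresholds come from the slide; the nesting order and the leaf
    labels are hypothetical.
    """
    if x2 > 7:
        return +1
    if x1 < 3:
        # Left part of the plane: split once more on x1.
        if x1 < 1:
            return +1
        return -1
    # Right part of the plane: split on x2.
    if x2 < 5:
        return -1
    return +1


# Each prediction follows exactly one root-to-leaf path.
examples = [(0.0, 8.0), (2.0, 0.0), (0.5, 0.0), (4.0, 4.0), (4.0, 6.0)]
labels = [predict_tree(x1, x2) for x1, x2 in examples]
```

Every test compares a single attribute with a constant, which is why the resulting regions in the (x1, x2) plane are axis-parallel rectangles.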

Page 23:

Which function class for Binary Classification?

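• Linear function

• binary target attribute values $y \in \{-1, +1\}$

• Decision function: $\hat{y}(x) = \operatorname{sign}(f_w(x))$

• Separating hyperplane: $H_w = \{x \mid f_w(x) = x^T w + w_0 = 0\}$

[Figure: hyperplane $H_w$ separating the + side from the - side in the (x1, x2) plane]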
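As a minimal sketch, the linear decision function can be written in a few lines of Python. The weights used in the example are made-up values, and mapping a score of exactly zero to +1 is a convention chosen here, not something the slide specifies.

```python
from typing import Sequence


def f_w(x: Sequence[float], w: Sequence[float], w0: float) -> float:
    """Linear score x^T w + w0."""
    return sum(xi * wi for xi, wi in zip(x, w)) + w0


def predict(x: Sequence[float], w: Sequence[float], w0: float) -> int:
    """Sign of the linear score; a score of exactly 0 is mapped to +1 (assumption)."""
    return 1 if f_w(x, w, w0) >= 0 else -1


# Hypothetical parameters: the hyperplane is x1 - x2 + 0.5 = 0.
w, w0 = [1.0, -1.0], 0.5
above = predict([2.0, 1.0], w, w0)  # score  1.5
below = predict([0.0, 3.0], w, w0)  # score -2.5
```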

Page 24:

Which function class for Binary Classification?

• Generalized linear function (Kernel methods)

• Layered Generalized linear function (Neural Networks)

• Ensemble of functions

• …

[Figure: + and - regions in the (x1, x2) plane]

Page 25:

How to pre-process the data?

• Predictor’s function class is defined only for a limited input domain ⇒ transform/extract attributes first (pre-processing)

• Number to (normalized) Number:

  • z-standardization, min-max normalization

• Number to Category:

  • Binning (quantile, equidistant)

• Category to (numeric) Vector:

  • One-hot encoding
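The transformations listed above can be sketched with the standard library only. The function names are illustrative, and the sketch assumes a non-constant numeric attribute (otherwise the divisions below would fail).

```python
from statistics import mean, pstdev


def z_standardize(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max_normalize(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def equidistant_bin(values, n_bins):
    """Assign each value to one of n_bins equally wide bins (0-based index)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    # The maximum would fall just outside the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def one_hot(categories):
    """Encode each category as a 0/1 vector over the sorted set of categories."""
    vocab = sorted(set(categories))
    return [[1 if c == v else 0 for v in vocab] for c in categories]


scaled = min_max_normalize([0, 5, 10])
bins = equidistant_bin([0, 5, 10], 2)
encoded = one_hot(["Food", "Housing", "Food"])
```

In practice the statistics (mean, minimum, bin edges, category set) would be computed on the training data and then reused unchanged on new data.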

Page 26:

How to pre-process the data?

• Predictor’s function class is defined only for a limited input domain ⇒ transform/extract attributes first (pre-processing)

• Text to (numeric) Vector:

  • Normalization, tokenization, stemming

  • Bag-of-Words, Bag-of-NGrams, TF-IDF ⇒ sparse vector

  • Latent word embedding (LSI, word2vec, LDA) ⇒ dense vector

• Image to (numeric) Vector:

  • HoG, DAISY, color histogram
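A Bag-of-Words representation, for instance, might be built as follows. This is a simplified sketch: normalization is reduced to lower-casing, the tokenizer is a plain regular expression, and the vectors are stored as dense lists for readability even though real implementations keep them sparse.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return the sorted vocabulary and one token-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        # Counter returns 0 for tokens that do not occur in this document.
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


vocab, vectors = bag_of_words(["Buy cheap pills", "cheap cheap offer"])
```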

Page 27:

How to learn a Predictor?

• Loss of Predictor $f_w: x \mapsto y$ for a given input-output pair: $L(y, f_w(x))$, with loss function $L$, ground truth $y$ and prediction $f_w(x)$

Page 28:

How to learn a Predictor?

Loss functions for binary classification (target $y \in \{-1, +1\}$):
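The slide's own formulas did not survive in the transcript. As a hedged illustration, the loss functions named elsewhere in this deck (0/1, hinge, logistic and quadratic loss) are shown below in their usual textbook forms for a target y in {-1, +1} and a real-valued score f = f_w(x); the exact scaling used on the slide may differ.

```python
import math


def zero_one_loss(y: int, f: float) -> float:
    """1 if the sign of the score disagrees with the label, else 0."""
    return 0.0 if y * f > 0 else 1.0


def hinge_loss(y: int, f: float) -> float:
    """Zero once the margin y*f reaches 1, linear penalty below that."""
    return max(0.0, 1.0 - y * f)


def logistic_loss(y: int, f: float) -> float:
    """Smooth, strictly positive loss that decreases with the margin y*f."""
    return math.log(1.0 + math.exp(-y * f))


def quadratic_loss(y: int, f: float) -> float:
    """Squared difference between label and score."""
    return (y - f) ** 2
```

All four depend on how well the score agrees with the label; they differ in how strongly they penalize confident mistakes and whether they are differentiable.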

Page 29:

How to learn a Predictor?

Function Class                      | Loss Function  | Learning Algorithm
Decision Trees                      | 0/1 loss       | ID3
Decision Trees                      | Quadratic loss | CART
Linear function                     | Quadratic loss | Least-squares regression
Linear function                     | Logistic loss  | Logistic regression
Linear function                     | Hinge loss     | Support Vector Machines
Layered Generalized Linear function | Logistic loss  | Neural Networks (Binary Classification)
Layered Generalized Linear function | Quadratic loss | Neural Networks (Regression)

Page 30:

How to learn a Predictor?

• Theoretical Risk: average over all possible data

• Empirical Risk: average over training data
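The formulas for the two risks are missing from the transcript. In the standard textbook form they can be written as below, where the data distribution $p(x, y)$ is notation assumed here and $L$, $f_w$ and the training pairs $(x_i, y_i)$, $i = 1, \dots, N$, are as defined earlier in the deck.

```latex
% Theoretical risk: expected loss over all possible data
R(w) = \mathbb{E}_{(x, y) \sim p}\bigl[ L(y, f_w(x)) \bigr]

% Empirical risk: average loss over the N training pairs
\hat{R}(w) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f_w(x_i)\bigr)
```

The theoretical risk cannot be computed because $p$ is unknown, so the empirical risk serves as its estimate.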

Page 31:

How to learn a Predictor?

• Prediction $\hat{y}(x)$ depends on Predictor $f_w$ with model parameters $w$

• Minimize Risk w.r.t. those model parameters $w$ ⇒ mathematical Optimisation Problem

  • Gradient-based first- or second-order methods

  • Coordinate-descent methods

  • (Greedy) Search
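As one example of a gradient-based first-order method, the sketch below minimizes the average logistic loss of a linear Predictor with plain batch gradient descent. The learning rate, number of epochs and the toy data are arbitrary choices for illustration, not values from the talk.

```python
import math


def train_logistic(data, lr=0.5, epochs=200):
    """Fit w and w0 of f_w(x) = x^T w + w0 by batch gradient descent.

    data is a list of (x, y) pairs with x a list of floats and y in {-1, +1}.
    """
    dim = len(data[0][0])
    n = len(data)
    w, w0 = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_w0 = [0.0] * dim, 0.0
        for x, y in data:
            score = sum(xi * wi for xi, wi in zip(x, w)) + w0
            # Derivative of log(1 + exp(-y * f)) with respect to f.
            g = -y / (1.0 + math.exp(y * score))
            for j in range(dim):
                grad_w[j] += g * x[j] / n
            grad_w0 += g / n
        # Step against the gradient of the empirical risk.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        w0 -= lr * grad_w0
    return w, w0


# Toy one-dimensional data: negative inputs are labeled -1, positive ones +1.
data = [([-2.0], -1), ([-1.0], -1), ([1.0], 1), ([2.0], 1)]
w, w0 = train_logistic(data)
predictions = [1 if x[0] * w[0] + w0 >= 0 else -1 for x, _ in data]
```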

Page 32:

How to generalize to new data?

[Figure: Error plotted against Model Complexity]

Page 33:

How to generalize to new data?

• Empirical Risk:

• Structural Risk: Empirical Risk plus a Regularizer
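The formulas on this slide are not preserved in the transcript. A common way to write the structural (regularized) risk is shown below; the regularization weight $\lambda$ and the regularizer $\Omega$ are notation assumed here, and the squared Euclidean norm is only one typical choice.

```latex
% Structural risk: empirical risk plus a penalty on model complexity
\hat{R}_{\lambda}(w) = \frac{1}{N} \sum_{i=1}^{N} L\bigl(y_i, f_w(x_i)\bigr) + \lambda \, \Omega(w),
\qquad \text{e.g. } \Omega(w) = \lVert w \rVert_2^2
```

A larger $\lambda$ favors simpler Predictors, trading a higher training error for better behavior on new data.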

Page 34:

Agenda

• What is Machine Learning and why do we need it?

• Model Building

• Model Evaluation & Tuning

Page 35:

Performance for Binary Classification

Total number of data points (N) | True Target: positive | True Target: negative
Predicted Target: positive      | True Positive         | False Positive
Predicted Target: negative      | False Negative        | True Negative

Page 36:

Performance for Binary Classification

• Accuracy: $\frac{TP + TN}{N}$

• Recall (true positive rate): $\frac{TP}{TP + FN}$

• Precision: $\frac{TP}{TP + FP}$

• Fall-out (false positive rate): $\frac{FP}{TN + FP}$
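These four measures follow directly from the confusion-matrix counts. The sketch below uses made-up labels and predictions, and assumes every denominator is non-zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for labels in {-1, +1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == -1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == -1)
    tn = sum(1 for t, p in pairs if t == -1 and p == -1)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Accuracy, recall, precision and fall-out from the four counts."""
    n = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / n,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "fall_out": fp / (tn + fp),
    }


# Hypothetical example: 3 positives and 5 negatives.
y_true = [1, 1, 1, -1, -1, -1, -1, -1]
y_pred = [1, 1, -1, 1, -1, -1, -1, -1]
counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
```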

Page 37:

Performance for Binary Classification

• Decision function: $\hat{y}(x) = \operatorname{sign}(f_w(x) + b)$, with Predictor $f_w$ and Decision threshold $b$

• AUC (Area Under ROC Curve)
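The AUC does not depend on a particular decision threshold. One equivalent way to compute it, sketched here with made-up scores, is as the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels are in {-1, +1}; both classes must be present.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical Predictor scores f_w(x) and the true labels.
value = auc([0.9, 0.8, 0.4, 0.3], [1, -1, 1, -1])
```

The pairwise loop is quadratic in the number of examples; it is meant to show the definition, not to be efficient.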

Page 38:

Training vs. Test Performance

How do we know that a Predictor works well on new data?

Small error on training data ≠ small error on new data (test data)!

Page 39:

Hold-out Evaluation

• Put some data aside before training = test data

• Use this hold-out data for evaluation

• Disadvantages:

• What if we were (un)lucky when choosing the hold-out data?

• We do NOT use all the data for model training!

Page 40:

K-Fold Cross Validation-based Evaluation

• Split data into K partitions (folds)

• Take all but one partition to train a Predictor

• Evaluate Predictor on the left-out partition

• Repeat this for all partitions

• Average performance for all K evaluations

• Finally train a Predictor on all data
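The procedure above can be sketched as follows. The `train` and `evaluate` callables are placeholders for whatever learning algorithm and metric are in use, and the folds are contiguous index ranges, so the sketch assumes the data has already been shuffled.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over range(n)."""
    # Spread the remainder over the first n % k folds.
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, test_idx
        start += size


def cross_validate(data, k, train, evaluate):
    """Average the evaluation score over all k train/test splits."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(data), k):
        model = train([data[i] for i in train_idx])
        scores.append(evaluate(model, [data[i] for i in test_idx]))
    return sum(scores) / k


folds = list(k_fold_indices(5, 2))
```

After the averaged score has been reported, the final Predictor is trained once more on all data, as the last bullet states.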

Page 41:

Model Tuning

Learning methods and Predictors have hyper-parameters

• Amount of regularization

• Choice of loss function

• Decision threshold score

• Learning rate

• …

Page 42:

Example: Decision threshold

[Figure with the Decision threshold marked]

Page 43:

How to choose hyper-parameters?

Grid Search:

• Evaluate Predictor for all grid points (hyper-parameter combinations)

• Take best grid point

Very expensive!

[Figure: hyper-parameter grid with logarithmically scaled axes (powers of 10 on one axis, powers of 2 on the other)]
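Grid search amounts to an exhaustive loop over all hyper-parameter combinations. In the sketch below the hyper-parameter names, the grid values and the toy scoring function are hypothetical; in a real setting `evaluate` would return a validation score such as a cross-validated metric, never a score computed on test data.

```python
import math
from itertools import product


def grid_search(grid, evaluate):
    """Evaluate every combination in the grid and return the best one.

    grid maps a hyper-parameter name to its list of candidate values;
    evaluate takes a dict of parameters and returns a score (higher is better).
    """
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Logarithmically spaced candidates, so each axis covers several scales.
grid = {
    "regularization": [10 ** e for e in (-2, 0, 2)],
    "learning_rate": [2 ** e for e in (-1, 0, 1)],
}


def toy_score(params):
    """Stand-in for a validation score, peaked at regularization=1, learning_rate=0.5."""
    return -(math.log10(params["regularization"]) ** 2) - (
        math.log2(params["learning_rate"]) + 1
    ) ** 2


best_params, best_score = grid_search(grid, toy_score)
```

The number of evaluations is the product of the grid sizes (nine here), which is why the cost grows quickly with every additional hyper-parameter.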

Page 44:

How to choose hyper-parameters?

Bayesian Optimisation:

• Learn model to predict evaluation outcomes

• Evaluate Predictor only for promising grid points

• Take best grid point after a fixed number of evaluations

[Figure: hyper-parameter grid with logarithmically scaled axes (powers of 10 on one axis, powers of 2 on the other)]

Page 45:

Common Pitfalls

• Model tuning is part of training ⇒ Do NOT use test data or test CV partitions!

• Use proper grid resolution and axis scaling

• Use same metric for tuning as for evaluation

Page 46:

Thank you!
