Fabien VAUCHELLES - Java2Days€¦ · Men Women Child Adult 1st class 3th class GAME OVER JACK LIFE...

Preview:

Citation preview

10/12/2019 @

Fabien VAUCHELLESzelros.com / fabien.vauchelles.@zelros.com / @fabienvhttp://bit.ly/ml-aismarttech

INTRODUCTION

I WILL TELL YOU THE TRUTH

JACK KNEW HE WOULD DIE

JACK KNEW HIS FUTURE

JACK

LIFE DECISION PATH

Men

Women

JACK

LIFE DECISION PATH

Men

Women

JACK Child

Adult

LIFE DECISION PATH

Men

Women

Child

Adult

1st class

3th class

JACK

LIFE DECISION PATH

Men

Women

Child

Adult

1st class

3th class

GAME OVER

JACK

LIFE DECISION PATH

70%

BACK TO MACHINE LEARNING

If you know the passengers list:

- Gender- Age- Ticket class- Does he survived ?

You can create a Decision Tree … … for this Supervised Problem !

REAL LIFE IS DIFFERENT...

KEEP COOL AND USE MACHINE LEARNING

ZELROS // FABIEN VAUCHELLES

Speech Recognition

NLP

ComputerVision

Knowledge

AI coachInsurer Data Insurance ML / Predictive models Insurance Advisor

What is Machine Learning

ML SOLVE PROBLEMS WHICH ARE UNSOLVABLE BY CONVENTIONAL METHODS

What is the type of problem

TYPE OF PROBLEM

SUPERVISEDLEARNING

UNSUPERVISEDLEARNING

SUPERVISED LEARNING

Men

Women

Child

Adult

1st class

3th class

GAME OVER

JACK

70%

UNSUPERVISED LEARNING

age

satisfaction

UNSUPERVISED LEARNING

age

satisfaction

UNSUPERVISED LEARNING

age

satisfaction

What is the goal of ML algorithms

MINIMIZE THE ERROR

How to start a ML problem

AS DEVELOPWE USE TEST DRIVEN DEVELOPMENT

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

THE PROCESS OF DATA SCIENCE

ANALYZE DATA

VALIDATE THE PREDICTION

TRAIN THE MODEL1 2

3

ANALYZE DATA

Do you speakData

DATA LANGUAGE

NAME AGE CLASS DIED ?John 23 3 Yes

Marry 31 1 NoHenry 23 2 Yes

Nicolas 41 1 NoAnna 18 3 Yes

Dataset

DATA LANGUAGE

NAME AGE CLASS DIED ?John 23 3 Yes

Marry 31 1 NoHenry 23 2 Yes

Nicolas 41 1 NoAnna 18 3 Yes

Feature

DATA LANGUAGE

NAME AGE CLASS DIED ?John 23 3 Yes

Marry 31 1 NoHenry 23 2 Yes

Nicolas 41 1 NoAnna 18 3 Yes

Target

DATA LANGUAGE

NAME AGE CLASS DIED ?John 23 3 Yes

Marry 31 1 NoHenry 23 2 Yes

Nicolas 41 1 NoAnna 18 3 Yes

Sample

DEMO TIME !

How to visualizea feature

EXCEL WON’T HELP YOU ON BIG DATA

YOU LOVE STATISTIC METHODS

BOX PLOT

Median: 50%

BOX PLOT

5% 25% 50% 75% 95%

Quantile 1 Quantile 2 Quantile 3 Quantile 4Outlier

DEMOTIME !

Why do we care of missing data

ALGORITHMS DON’T LIKE MISSING DATA

MISSING VALUES

10Nan20Nan30

10Nan20Nan30

102030

MISSING VALUES

BAD !!!

MISSING VALUES

10Nan20Nan30

10Nan20Nan30

10020030

BAD !!!

MISSING VALUES

Fill empty value with median: 20

10Nan20Nan30

1020202030

GOOD !!!

MISSING VALUES

DEMO TIME !

CREATE FEATURES

Why do we createArtificial Features

ARTIFICIAL FEATURES HELP ALGORITHMS TO HAVE BETTER PREDICTION

I DON’T UNDERSTAND TEXT !

UNDERSTAND CATEGORIES

NAME GENDERJohn male

Marry femaleHenry male

Nicolas maleAnna female

UNDERSTAND CATEGORIES

NAME GENDERJohn male

Marry femaleHenry male

Nicolas maleAnna female

NAME GENDERJohn 1

Marry 2Henry 1

Nicolas 1Anna 2

LabelEncoder

2 > 1 !!!

UNDERSTAND CATEGORIES

NAME GENDERJohn male

Marry femaleHenry male

Nicolas maleAnna female

NAME GENDER_MALE GENDER_FEMALEJohn 1 0

Marry 0 1Henry 1 0

Nicolas 1 0Anna 0 1

OneHotEncoder

DEMO TIME !

TRAIN THE MODEL

FIT & PREDICT

FIT MODEL PREDICT1 2

FIT & PREDICT

FIT MODEL PREDICT1 2

What can we predict

CLASSIFICATION

Do we have survived on Titanic ?NAME AGE CLASS DIED ?John 23 3 Yes

Marry 31 1 NoHenry 23 2 Yes

Nicolas 41 1 NoAnna 18 3 ?

REGRESSION

What is the price of the ticket ?NAME AGE CLASS FAREJohn 23 3 71

Marry 31 1 8Henry 23 2 53

Nicolas 41 1 7Anna 18 3 ?

What algorithmcan we choose

ALGORITHMS

DECISION TREE

COMPUTER VISION

LINEAR

CLUSTERING

BAYESIAN

NLP

Linear Regression

Logistic RegressionConvolutional Neural

Network

GAN

Recurrent Neural Network

Gaussian Naive Bayes

Multinomial Naive Bayes Bayesian Network

k-Means

k-Medians

Hierarchical Clustering

Perceptron

Random Forest

CATboost

XGBoostTF-IDF

Word2Vec

BERTGPT2

Siamese Network

LightGBM

What is Linear Regression

REGRESSION

y

X

h(X)=θ0+θ1X

/ LINEAR

REGRESSION / POLYNOMIAL

y

X

h(X)=θ0+θ1X+θ2X2

REGRESSION / POLYNOMIAL

y

X

h(X)=θ0+θ1X+θ2X2+θ3X

3

REGRESSION / POLYNOMIAL

y

X

OVERFITTING !!!

h(X)=θ0+θ1X+θ2X2+θ3X

3

REGRESSION / POLYNOMIAL

y

X

h(X)=θ0+θ1X+θ2X2

REGRESSION / COST FUNCTION

J=∑(h(X)-y)2prediction real value

Sum of errors :

mcount

REGRESSION / COST FUNCTION

h(X)=θ0+θ1X+θ2X2

REGRESSION / COST FUNCTION

h(X)=θ0+θ1X+θ2X2+θ3X

3

DEMO TIME !

VALIDATETHE PREDICTION

How to validatea trained model

TRAIN & TEST

X

y

X

y

y_predictΔ

TEST (30%)TRAIN (70%)

CROSS VALIDATION

X

y

X

y

y_predictΔ

TEST (30%)TRAIN (70%)

x5

random

PREDICT

X

y_predict

NEW DATA

DOES JACK SURVIVE ?

DEMO TIME !

RESOURCES

● Coursera Machine Learninghttps://www.coursera.org/learn/machine-learning

● Coursera Deep Learninghttps://www.coursera.org/specializations/deep-learning

● Kagglehttp://www.kaggle.com

ANY QUESTIONS ?

zelros.com / fabien.vauchelles.@zelros.com / @fabienvhttp://bit.ly/ml-aismarttech