Machine learning for a busy developer Leonid Igolnik


About me

5/31/13

• Based in California

• Java Developer for over 11 years

• More than 13 years of SaaS experience

• Love to Travel

AGENDA

• What is machine learning

• R for Machine Learning

• R demo

• Further reading

Machine learning

Defining machine learning

“Field of study that gives computers the ability to learn without being explicitly programmed” - Arthur Samuel, 1959

Defining machine learning

• Statistics is a set of tools that helps humans learn more about the world so they can make better decisions

• Machine learning is about teaching computers about the world so they can use the knowledge to perform tasks

• Intersection of mathematics, statistics, computer science and software engineering

Problems we are solving

• Job Title clustering

• Predicting time to hire based on location, job type, source type etc.

• Predict probability of hire of a candidate given job type, skills, past employer, source type etc.

• Use the number of hires in Taleo to predict overall job market trends

• And many more

R for machine learning

R for Machine learning

A language and environment for statistical computing, and an open source alternative to S

R for Machine learning

• Comes with a comprehensive ecosystem of extensions: CRAN

• Provides a variety of statistical and graphical data analysis tools

http://cran.us.r-project.org/

R for Machine learning

http://www.revolutionanalytics.com/what-is-open-source-r/companies-using-r.php

R for Machine learning

Does not always scale well with large data sets

“The best thing about R is that it was developed by statisticians. The worst thing about R is that... it was developed by statisticians.” - Bo Cowgill, Google

Statistics, you say?

Teapots and other kitchen tools

What can you use R for?

• Log analysis

• Getting insights into your data

• Creating pretty charts for management...

• Fun

HELLO world

R Basics

• Workspaces

• Variables

    • X = 1 or Y <- 3

• Functions

    • c(1, 2, 3)

    • [1] 1 2 3

• Comments

    • 1 + 1 # this is a comment
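
The basics above can be tried in any R session. The following is a minimal sketch; the names `v` and `add_one` are illustrative and do not come from the slides.

```r
# Assignment: both `=` and `<-` work at the top level; `<-` is the more common style.
X = 1
Y <- 3

# c() combines its arguments into a vector (note the lowercase c).
v <- c(1, 2, 3)
print(v)                       # [1] 1 2 3

# Functions are ordinary objects created with `function`.
add_one <- function(n) n + 1   # hypothetical helper, for illustration only
print(add_one(v))              # [1] 2 3 4

# ls() lists the objects currently defined in the workspace.
print(ls())
```

Note that `add_one` is applied to the whole vector at once; most R arithmetic is vectorized in this way.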

R Data types

• Numeric

• Integer

• Complex

• Logical: TRUE or FALSE

• Character

• Factors: aka Enum
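
One way to see these types is to ask R for the class of a value. This is a small sketch; the `job_type` values are made-up sample data, not taken from the talk.

```r
# class() reports the type of each value.
class(3.14)            # "numeric"
class(42L)             # "integer" (the L suffix marks an integer literal)
class(1 + 2i)          # "complex"
class(TRUE)            # "logical"
class("hello")         # "character"

# A factor stores a fixed set of levels, much like an enum.
job_type <- factor(c("full-time", "contract", "full-time"))
levels(job_type)       # "contract" "full-time"
as.integer(job_type)   # 2 1 2 (codes index into the sorted levels)
```

Levels are sorted alphabetically by default, which is why "contract" receives code 1 even though it appears second in the data.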

R Data types

• Vectors

    • V1 = c(1, 2, 3, 4)

    • V2 = c(3, 4, 5)

    • V3 = c(V1, V2)

    • V4 = 5 * V3

• Matrix

    • M1 = matrix(c(1,2,3,4,5,6), nrow=2)

    • M2 = matrix(c(1,2,3,4,5,6), nrow=2, byrow=TRUE)

    • M3 = t(M2) # transpose
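
To make the results of these expressions visible, here is the same sequence as a runnable sketch with the resulting values shown in comments. It assumes the standard argument names `nrow` and `byrow` of base R's `matrix()`.

```r
V1 <- c(1, 2, 3, 4)
V2 <- c(3, 4, 5)
V3 <- c(V1, V2)      # concatenation: 1 2 3 4 3 4 5
V4 <- 5 * V3         # arithmetic is element-wise: 5 10 15 20 15 20 25

# matrix() fills column by column unless byrow = TRUE.
M1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
M2 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
M3 <- t(M2)          # transpose: 3 rows, 2 columns
dim(M3)              # 3 2
```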

R Data types

• Sequences

    • 1:5 [1] 1 2 3 4 5

    • 2^(1:5) [1] 2 4 8 16 32

• Missing values: NA

• Indexing

    • letters[1:3]

    • letters[c(7,9)]

    • letters[-c(1:15)]
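
A short sketch of how these behave in practice. It uses the built-in constant `letters` (lowercase in R); the `days_to_hire` vector is made-up sample data.

```r
s <- 1:5
2^s                               # 2 4 8 16 32

# NA marks a missing value; most summaries need na.rm = TRUE to skip it.
days_to_hire <- c(30, NA, 45)
mean(days_to_hire)                # NA
mean(days_to_hire, na.rm = TRUE)  # 37.5
is.na(days_to_hire)               # FALSE TRUE FALSE

# Indexing is 1-based; negative indices drop elements.
letters[1:3]                      # "a" "b" "c"
letters[c(7, 9)]                  # "g" "i"
letters[-c(1:15)]                 # "p" through "z"
```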

Sounds like MATLAB, no?

http://www.math.umaine.edu/~hiebeler/comp/matlabR.pdf MATLAB® / R Reference, by David Hiebeler

Your mission should you ….

Your second mission if you choose to accept it….

Kernel density estimation

“In statistics, kernel density estimation (KDE) is a non-parametric way to estimate the probability density function of a random variable. Kernel density estimation is a fundamental data smoothing problem where inferences about the population are made, based on a finite data sample.” - Wikipedia
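
In R, a kernel density estimate is available through the base function `density()`. The sketch below is not the demo from the talk; it runs on simulated data, and the variable names and the "days to hire" framing are assumptions made for illustration.

```r
set.seed(42)   # arbitrary seed, for reproducibility

# Simulated, made-up data: two groups with different typical times to hire.
days <- c(rnorm(200, mean = 30, sd = 5), rnorm(100, mean = 60, sd = 8))

# density() uses a Gaussian kernel and picks a bandwidth automatically by default.
kde <- density(days)
print(kde$bw)  # the bandwidth that was chosen

# The estimate is a density, so the area under the curve should be close to 1.
# kde$x is an evenly spaced grid, so a simple Riemann sum is enough here.
area <- sum(kde$y) * diff(kde$x[1:2])
print(area)
```

Calling `plot(kde)` draws the smoothed curve. With this data it should show two bumps, one per simulated group, which a single mean or a coarse histogram can hide.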

Generalized linear model

“In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value.” - Wikipedia
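
Base R fits these models with `glm()`. As a hedged illustration (not the talk's actual demo), the sketch below fits a logistic regression, which is a GLM with a binomial response and logit link, to simulated data. The column names `experience` and `hired` and the generating coefficients are invented for the example.

```r
set.seed(1)    # arbitrary seed, for reproducibility
n <- 500

# Simulated, made-up data: years of experience and whether a candidate was hired.
experience <- runif(n, min = 0, max = 15)
p_hire <- plogis(-2 + 0.3 * experience)   # "true" model used only to generate data
hired <- rbinom(n, size = 1, prob = p_hire)
candidates <- data.frame(experience, hired)

# family = binomial selects the logit link by default, i.e. logistic regression.
model <- glm(hired ~ experience, data = candidates, family = binomial)
print(coef(model))

# type = "response" returns probabilities rather than log-odds.
new_candidates <- data.frame(experience = c(1, 5, 10))
p <- predict(model, newdata = new_candidates, type = "response")
print(p)
```

Swapping the `family` argument (for example to `poisson` for counts) changes the response distribution and link function while the rest of the call stays the same.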

Typical flow

Model

• Gather

• Cleanse

• Explore

• Theorize

• Create training set

• Train

• Validate

• Refine

• Port
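
Several of the steps above can be sketched in a few lines of R. This is an assumed, simplified walk-through on the `mtcars` dataset bundled with R, not the workflow shown in the talk; the 70/30 split and the choice of a linear model are arbitrary, and the Refine and Port steps are left out.

```r
set.seed(7)                       # arbitrary seed, for reproducibility

data(mtcars)                      # Gather: a dataset that ships with R
cars <- na.omit(mtcars)           # Cleanse: drop rows with missing values
print(summary(cars$wt))           # Explore: a quick look at one variable

# Create training set: a random 70% of rows; the rest is held out.
train_idx <- sample(nrow(cars), size = floor(0.7 * nrow(cars)))
train <- cars[train_idx, ]
test  <- cars[-train_idx, ]

# Theorize and Train: assume fuel economy depends linearly on weight.
model <- lm(mpg ~ wt, data = train)

# Validate: root mean squared error on the held-out rows.
pred <- predict(model, newdata = test)
rmse <- sqrt(mean((test$mpg - pred)^2))
print(rmse)
```

Measuring the error on rows the model never saw is what makes the Validate step meaningful; the error on the training rows alone would be optimistic.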

Further reading

http://www.amazon.com/Machine-Learning-Hackers-Drew-Conway/dp/1449303714

http://cran.r-project.org/doc/manuals/R-intro.pdf

Q&A

Open Middleware 2.0 community & concept

proj. art. Natalia Borowicz

Douglas Tait, Oracle

Marcin Nowak, Orange Labs

www.openmiddleware.pl

Thank you !!!
