Copyright © 2012, SAS Institute Inc. All rights reserved. GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS, BRAD PITT AND THE IKEA BILLY INDEX Longhow Lam Freelance Data Scientist https://www.linkedin.com/in/longhowlam https://longhowlam.wordpress.com @longhowlam

Data science in action


GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,

BRAD PITT AND THE IKEA BILLY INDEX

Longhow Lam – Freelance Data Scientist

https://www.linkedin.com/in/longhowlam

https://longhowlam.wordpress.com

@longhowlam


AGENDA

TEXT MINING AND MACHINE LEARNING

SOME CRAZY EXAMPLES

Goede tijden Slechte tijden

IENS Restaurant Reviews

Who looks like Brad Pitt?

The IKEA Billy Index


Text mining and Machine Learning


Text mining: simple example

Doc 1 “I walked accross the street in Amsterdam, 1057DK, with my bike”

Doc 2 “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”

Doc 3 “My bicycle is broken, what a piece of junk, @#$%$@!”

Terms                  Doc 1   Doc 2   Doc 3
+Bicycle (noun)            1       1       1
Cycling (verb)             0       1       0
Blue (adjective)           0       1       0
Amsterdam (location)       1       0       0
+Walk (verb)               1       1       0
Street (noun)              1       0       0
Broken (adjective)         0       0       1
Piece of junk (noun)       0       0       1
1057DK (postal code)       1       0       0
//bitly.com/sdrtw          0       1       0

TERM DOCUMENT MATRIX: A

• Every text document becomes a (very) long vector of term counts (with many zeros!)

• Data mining techniques are applied to this matrix A
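A minimal Python sketch of how the three example documents could be turned into the term-document matrix A. The normalisation map `CANONICAL` (spelling variants and synonyms folded onto one term), the phrase list and the URL rule are hand-made, hypothetical stand-ins for the stemming, synonym and entity detection a text mining tool such as SAS performs; they are not SAS's actual rules.

```python
import re
from collections import Counter

DOCS = {
    "Doc 1": "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "Doc 2": "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "Doc 3": "My bicycle is broken, what a piece of junk, @#$%$@!",
}

# Hypothetical normalisation map: variants and synonyms fold onto one term.
# Tokens not listed here are treated as stop words and dropped.
CANONICAL = {
    "bike": "bicycle", "biike": "bicycle", "bicycle": "bicycle",
    "walked": "walk", "walk": "walk", "cycled": "cycling",
    "blue": "blue", "street": "street", "amsterdam": "amsterdam",
    "broken": "broken", "1057dk": "1057dk",
}
PHRASES = ["piece of junk"]  # multi-word terms kept as a single term
URL = re.compile(r"//\S+")   # URLs are kept whole as a single term


def extract_terms(doc: str) -> Counter:
    """Count the canonical terms found in one document."""
    text = doc.lower()
    found = Counter(URL.findall(text))
    text = URL.sub(" ", text)
    for phrase in PHRASES:
        n = text.count(phrase)
        if n:
            found[phrase] += n
            text = text.replace(phrase, " ")
    for token in re.findall(r"[a-z0-9]+", text):
        if token in CANONICAL:
            found[CANONICAL[token]] += 1
    return found


def term_document_matrix(docs: dict) -> dict:
    """Return {term: [count in doc 1, doc 2, ...]}, one row per term."""
    counts = [extract_terms(text) for text in docs.values()]
    vocab = sorted(set().union(*counts))
    return {term: [c[term] for c in counts] for term in vocab}


if __name__ == "__main__":
    for term, row in term_document_matrix(DOCS).items():
        print(f"{term:20s} {row}")
```

Each row is one term and each column one document; with a realistic vocabulary of thousands of terms, most cells of A are zero.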


TEXT MINING PREDICT OR CLUSTER

Combine texts and “normal data” to predict behaviour (churn / fraud)

Use machine learning to train a learner f to predict the TARGET

Automatically create topics / clusters in huge piles of documents

Apply clustering techniques to divide documents into topics

(Figure: documents grouped into Topic 1, Topic 2 and Topic 3)
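To illustrate the clustering route, here is a small k-means sketch in Python on hypothetical term-count vectors. Rows are scaled to unit length first, a common choice for text. The data, the choice k=2 and the deterministic initialisation (the first k documents) are assumptions for the example; a real topic analysis on thousands of documents would normally use a library implementation with smarter initialisation.

```python
import math

# Hypothetical term counts per document (columns: bicycle, walk, broken,
# restaurant, food, menu). Documents 0, 2, 4 are about bikes; 1, 3 about food.
DOC_VECTORS = [
    [2, 1, 0, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 1, 2, 2],
    [3, 1, 1, 0, 0, 0],
]


def normalize(v):
    """Scale a count vector to unit length so long documents do not dominate."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; centroids start at the first k points (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no document changed topic
        labels = new_labels
        # Update step: move each centroid to the mean of its documents.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    print(kmeans([normalize(v) for v in DOC_VECTORS], k=2))  # [0, 1, 0, 1, 0]
```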


MACHINE LEARNING SOME ALGORITHMS

Predict

• Trees

• Random Forests

• Linear regression: $y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$

• Neural networks: $y = f(g(h(x)))$

Cluster

• K-means

• Hierarchical clustering

• DBSCAN
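The two prediction formulas can be written as plain functions. This Python sketch evaluates the linear model for given coefficients and builds a tiny neural network as three nested layers, so that y = f(g(h(x))) is literally a function composition. The weights are hypothetical and untrained; fitting them (least squares for the linear model, backpropagation for the network) is not shown.

```python
import math


def linear_model(x, a):
    """y = a0 + a1*x1 + ... + an*xn, where a = [a0, a1, ..., an]."""
    return a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))


def dense(weights, bias, activation):
    """Build one fully connected layer x -> activation(W x + b)."""
    def layer(x):
        return [
            activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)
        ]
    return layer


def identity(v):
    return v


# Hypothetical, untrained weights: they only illustrate the nesting f(g(h(x))).
h = dense([[0.5, -1.0], [1.0, 1.0]], [0.0, 0.0], math.tanh)
g = dense([[1.0, -0.5], [0.3, 1.0]], [0.1, -0.1], math.tanh)
f = dense([[1.0, 1.0]], [0.0], identity)


def neural_network(x):
    """y = f(g(h(x))): three layers applied one after another."""
    return f(g(h(x)))[0]


if __name__ == "__main__":
    print(linear_model([1.0, 2.0], [1.0, 2.0, 3.0]))  # 1 + 2*1 + 3*2 = 9.0
    print(neural_network([0.2, 0.4]))
```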


GTST ANALYSIS TEXT ANALYTICS

Business pain

Looking at GTST (Goede Tijden, Slechte Tijden, a Dutch soap): what the heck is this all about?

Are there trends in the series, or is it all the same?

Approach

Take the 5000 episode summaries and apply text mining in SAS


GTST ANALYSIS RESULTS

Main topics in 5000 episodes


GTST ANALYSIS DISTANCES BETWEEN TOPICS


GTST ANALYSIS ZOOMING IN ON A TOPIC


GTST ANALYSIS ZOOMING IN ON A TOPIC

Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)

Harmsen feeling lonely

Plan by Jack, dangerous

Writing a farewell letter

Panic, fear

Questions about giving kid assignment

Getting money back, paying

IMPORTANT: Business validation!

I asked my wife; she used to be a loyal GTST viewer


GTST ANALYSIS TREND RESULTS

Trends over time with the SAS text profile feature
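The idea behind a topic trend can be sketched without SAS: count, per year, which share of the episode summaries mentions the topic. The Python below is a simplified stand-in, not a description of how the SAS text profile feature works; the `EPISODES` list and the topic term set are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical (year, summary) pairs standing in for the GTST episode summaries.
EPISODES = [
    (1991, "A wedding is planned in secret."),
    (1991, "A new bar opens in the neighbourhood."),
    (1992, "Panic breaks out at the wedding party."),
    (1992, "Someone writes a farewell letter."),
    (1992, "The wedding is cancelled."),
    (1993, "Money is borrowed and never paid back."),
]


def topic_share_per_year(episodes, topic_terms):
    """Fraction of each year's episodes that mention at least one topic term."""
    total = defaultdict(int)
    hits = defaultdict(int)
    for year, summary in episodes:
        words = set(re.findall(r"[a-z]+", summary.lower()))
        total[year] += 1
        if words & topic_terms:
            hits[year] += 1
    return {year: hits[year] / total[year] for year in sorted(total)}


if __name__ == "__main__":
    print(topic_share_per_year(EPISODES, {"wedding"}))
```

Plotting these shares per year for each topic gives a simple trend line per topic.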


GTST ANALYSIS TRENDS OVER TIME


GTST ANALYSIS SIMILARITY OF EPISODES THROUGH THE YEARS
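One common way to measure how similar episodes (or whole years of episodes) are is the cosine similarity between their term-count vectors. The slides do not state which measure was used; this Python sketch, with hypothetical pooled texts per year in `YEARS`, only shows the computation.

```python
import math
import re
from collections import Counter

# Hypothetical pooled summary text per year.
YEARS = {
    1991: "wedding wedding bar secret",
    2001: "wedding bar hospital",
    2011: "money debt prison",
}


def bag_of_words(text):
    """Term counts for one text (lower-cased, letters only)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (1 = same word mix)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_matrix(texts_by_year):
    """Pairwise cosine similarity between the pooled texts of each year."""
    bags = {year: bag_of_words(text) for year, text in texts_by_year.items()}
    return {(y1, y2): cosine_similarity(bags[y1], bags[y2])
            for y1 in bags for y2 in bags}


if __name__ == "__main__":
    for pair, sim in similarity_matrix(YEARS).items():
        print(pair, round(sim, 3))
```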


Can you shake hands with your neighbor?

A LITTLE STATISTICAL EXPERIMENT

Two statistics I'd like to share:


50.1% of people don’t wash their hands after visiting the toilet


84.6% of all statistics are just made up on the spot!!


IENS RESTAURANT PATH ANALYTICS

Business pain

I have eaten Chinese; where should I go next?

Approach

Look at what others do: the IENS restaurant reviewers!
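A minimal way to answer "I have eaten Chinese, where next?" from review histories is to count which cuisine most often directly follows Chinese in reviewers' paths (a first-order, Markov-chain style view). The `PATHS` data are hypothetical, and the SAS path analysis behind the slides may use a different method.

```python
from collections import Counter, defaultdict

# Hypothetical reviewer histories: cuisines in the order each reviewer visited them.
PATHS = [
    ["Chinese", "Italian", "French"],
    ["Chinese", "Italian"],
    ["Dutch", "Chinese", "Japanese"],
    ["Chinese", "Italian", "Dutch"],
]


def transition_counts(paths):
    """Count how often cuisine A is directly followed by cuisine B."""
    following = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            following[current][nxt] += 1
    return following


def where_next(paths, current, n=1):
    """The n most frequent next stops after `current` (empty if never seen)."""
    return transition_counts(paths)[current].most_common(n)


if __name__ == "__main__":
    print(where_next(PATHS, "Chinese"))  # [('Italian', 3)]
```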


A FEW FACTS… IENS DATA (TRADITIONAL BI)

Most frequent restaurant name (39 times)

Most frequent name among “Dutch” restaurants (6 times)

% sustainable kitchens per cuisine: Organic (67%), French (58%), Fish (44%), Vegetarian (39%), Chinese (3%)

700 reviews on a “normal” Saturday

Valentine's Day 2015: 1200 reviews (1.7 times as many)

23 times

12 times


IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS


IENS REVIEWS CAN SENTIMENT BE PREDICTED?

Translate the reviews into a term document matrix

Apply machine learning to predict scores

Why would you do this?


IENS REVIEWS CAN I PREDICT THE SENTIMENT?


IENS REVIEWS PREDICT THE ‘EAT’ SCORE

Neural network (2 × 20): R² of 0.65

Linear regression model: R² of 0.56
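R² is the share of the variance in the given scores that a model's predictions explain, so, assuming both models were evaluated on the same reviews, 0.65 against 0.56 means the neural network explains somewhat more of the variance in the ‘eat’ scores. A minimal Python sketch of the computation, on hypothetical scores:

```python
def r_squared(given, predicted):
    """R² = 1 - SS_res / SS_tot: share of the score variance the model explains."""
    mean = sum(given) / len(given)
    ss_res = sum((y - p) ** 2 for y, p in zip(given, predicted))
    ss_tot = sum((y - mean) ** 2 for y in given)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    given = [6, 7, 8, 9]            # hypothetical review scores
    predicted = [6.5, 7, 7.5, 9]    # hypothetical model predictions
    print(r_squared(given, predicted))
```

A perfect model gives R² = 1; always predicting the mean score gives R² = 0.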


IENS REVIEWS PREDICTING THE ‘EAT’ SCORE

Predicted review score vs. given review score


IENS REVIEWS SENTIMENT ANALYSIS / PREDICTIVE MODELING


OUTLIERS IN FACES DATA MINING & MACHINE LEARNING

Business pain

Tell me: Who has a strange face at SAS Netherlands?

Approach

Take photos of SAS employees, translate them into data, and apply machine learning


OUTLIERS IN FACES DATA MINING & MACHINE LEARNING


STRANGE FACE DETECTION: COMBO OF OPEN API & SAS

Use Face++ to do facial landmarking (no deep learning!!)

Import all landmarks in SAS as an ABT (analytical base table)

Now you can solve some funny business issues with machine learning:

Which persons are look-alikes?

Hierarchical clustering

Are there any account managers?

Predictive modeling / machine learning

Who is the Brad Pitt at SAS?

Nearest Neighbor

Funny faces

Anomaly / outlier detection
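Two of these questions reduce to distances between rows of the landmark ABT. The Python sketch below uses a hypothetical table `FACES` of scaled (x, y) landmark coordinates: the nearest neighbour of a target row answers "who looks most like Brad Pitt?", and the distance to the average face is one simple outlier score. Real Face++ landmarks would first need aligning for position, scale and rotation, which is omitted here.

```python
import math

# Hypothetical ABT: one row per person, columns are flattened (x, y) landmark
# coordinates (e.g. left eye, right eye, nose tip), already scaled to 0..1.
FACES = {
    "Brad Pitt":  [0.30, 0.40, 0.70, 0.40, 0.50, 0.60],
    "Employee A": [0.31, 0.41, 0.69, 0.40, 0.50, 0.62],
    "Employee B": [0.28, 0.38, 0.72, 0.42, 0.52, 0.58],
    "Employee C": [0.35, 0.45, 0.65, 0.45, 0.50, 0.70],
    "Employee D": [0.10, 0.20, 0.90, 0.25, 0.45, 0.90],
}


def nearest_neighbor(target, faces):
    """Name of the person whose landmark vector is closest to `target`'s."""
    others = (name for name in faces if name != target)
    return min(others, key=lambda name: math.dist(faces[name], faces[target]))


def outlier_scores(faces):
    """Distance of each face to the average face: larger means more unusual."""
    rows = list(faces.values())
    centroid = [sum(col) / len(rows) for col in zip(*rows)]
    return {name: math.dist(vec, centroid) for name, vec in faces.items()}


if __name__ == "__main__":
    print(nearest_neighbor("Brad Pitt", FACES))  # Employee A
    scores = outlier_scores(FACES)
    print(max(scores, key=scores.get))           # Employee D
```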


STRANGE FACE DETECTION: HIERARCHICAL CLUSTERING
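Hierarchical clustering builds a tree of merges bottom-up. Below is a minimal agglomerative sketch in Python with single linkage (cluster distance = distance between the closest members) on hypothetical 2-D points; the linkage method used for the faces is not stated in the slides, and for real data a library routine would be the practical choice.

```python
import math
from itertools import combinations

# Hypothetical 2-D points (e.g. two summary features per face).
POINTS = {"P1": (0.0, 0.0), "P2": (0.0, 1.0), "P3": (5.0, 5.0), "P4": (5.0, 6.5)}


def linkage(points, a, b):
    """Single linkage: distance between the closest members of clusters a and b."""
    return min(math.dist(points[x], points[y]) for x in a for y in b)


def single_linkage(points):
    """Repeatedly merge the two closest clusters; return merges as (members, distance)."""
    clusters = [[name] for name in points]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(points, *pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((frozenset(a + b), linkage(points, a, b)))
    return merges


if __name__ == "__main__":
    for members, distance in single_linkage(POINTS):
        print(sorted(members), round(distance, 3))
```

Cutting the merge tree at a chosen distance gives the groups of look-alikes.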


STRANGE FACE DETECTION: BRAD PITT LOOK-ALIKES…


STRANGE FACE DETECTION: OUTLIER DETECTION


IKEA WEBSITE KEEP TRACK OF BILLY STOCK

Define the IKEA Billy Index as the change in stock over time
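A minimal sketch of the index as defined: take stock levels read from the IKEA website at regular intervals (the numbers here are hypothetical) and compute the change between consecutive readings. Treating every drop as sales and every rise as a restock is an assumption for the example; a restock and sales within the same interval would be mixed together.

```python
def billy_index(stock):
    """Change in Billy stock between consecutive readings: stock[t] - stock[t-1]."""
    return [later - earlier for earlier, later in zip(stock, stock[1:])]


def estimated_units_sold(index):
    """Assumption: every drop in stock is a sale, every rise is a restock."""
    return -sum(change for change in index if change < 0)


if __name__ == "__main__":
    daily_stock = [120, 95, 80, 150, 130]   # hypothetical scraped stock levels
    idx = billy_index(daily_stock)
    print(idx, estimated_units_sold(idx))   # [-25, -15, 70, -20] 60
```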


IKEA WEBSITE THE IKEA BILLY INDEX


THE BILLY INDEX SOME STATISTICS


Every extra unit of wind speed corresponds to 19 fewer Billys sold
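A statement like "19 fewer Billys per unit of wind speed" is typically the slope of a regression of sales on wind speed. A least-squares sketch in Python; the data are hypothetical and constructed to lie exactly on a line with slope -19, so they reproduce the number rather than the original analysis.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    a1 = cov / var
    return mean_y - a1 * mean_x, a1


if __name__ == "__main__":
    wind = [1, 2, 3, 4, 5]              # hypothetical wind speeds
    sold = [181, 162, 143, 124, 105]    # hypothetical daily Billy sales
    print(fit_line(wind, sold))         # (200.0, -19.0)
```

The slope a1 is the change in Billys sold per extra unit of wind speed; on real, noisy data it describes an association, not necessarily a cause.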


Thanks for your attention, QUESTIONS?

Freelance Data Scientist, always open to meeting for a cup of coffee

https://www.linkedin.com/in/longhowlam

https://longhowlam.wordpress.com/

@longhowlam