Copyright © 2012, SAS Institute Inc. All rights reserved.
GOEDE TIJDEN SLECHTE TIJDEN, RESTAURANT REVIEWS,
BRAD PITT AND THE IKEA BILLY INDEX
Longhow Lam – Freelance Data Scientist
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com
@longhowlam
Data Science in Action
AGENDA
TEXT MINING AND MACHINE LEARNING
SOME CRAZY EXAMPLES
Goede tijden Slechte tijden
IENS Restaurant Reviews
Who looks like Brad Pitt?
The IKEA Billy Index
Text mining and Machine Learning
Text mining: simple example
Doc 1: “I walked accross the street in Amsterdam, 1057DK, with my bike”
Doc 2: “She didn’t walk but cycled with her blue biike, //bitly.com/sdrtw”
Doc 3: “My bicycle is broken, what a piece of junk, @#$%$@!”
Terms                   Doc 1   Doc 2   Doc 3
+Bicycle (noun)           1       1       1
Cycling (verb)            0       1       0
Blue (adjective)          0       1       0
Amsterdam (location)      1       0       0
+Walk (verb)              1       1       0
Street (noun)             1       0       0
Broken (adjective)        0       0       1
Piece of junk (noun)      0       0       1
1057DK (postal code)      1       0       0
//bitly.com/sdrtw         0       1       0
TERM DOCUMENT MATRIX: A
• Every text document is a (very)
long column of term counts (with many zeros!)
• Data mining techniques are
applied to this matrix A
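To make the term document matrix concrete, here is a minimal Python sketch that builds A for the three example documents. It is a plain bag-of-words count: unlike the table above, it does no stemming, spelling correction, synonym grouping or entity tagging, so "walk"/"walked" and "bike"/"biike"/"bicycle" stay separate rows. The function names are illustrative and are not part of SAS Text Miner.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def term_document_matrix(docs):
    """Return (terms, A) where A[i][j] counts how often terms[i] occurs in docs[j]."""
    counts = [Counter(tokenize(doc)) for doc in docs]
    terms = sorted(set().union(*counts))
    # A Counter returns 0 for missing terms, which gives the many zeros in A.
    A = [[c[term] for c in counts] for term in terms]
    return terms, A

docs = [
    "I walked accross the street in Amsterdam, 1057DK, with my bike",
    "She didn't walk but cycled with her blue biike, //bitly.com/sdrtw",
    "My bicycle is broken, what a piece of junk, @#$%$@!",
]
terms, A = term_document_matrix(docs)
for term, row in zip(terms, A):
    print(f"{term:10s} {row}")
```

Each column of A is one document; any data mining technique that works on a numeric table can then be applied to it.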
TEXT MINING PREDICT OR CLUSTER
Combine texts and “normal data” to predict behaviour (churn / fraud)
Use machine learning to train a
learner f to predict the TARGET
Automatically create topics / clusters in huge piles of documents
Apply clustering techniques to divide
documents into topics
Topic 1 Topic 2 Topic 3
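A minimal sketch of dividing documents into topics, assuming each document is already a row of term counts (the hypothetical columns below stand in for a real term document matrix). It uses plain k-means, one of the cluster techniques named in this deck; it is not how SAS Text Miner builds its topics.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Lloyd's k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid moves to the mean of its member documents.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical term counts per document; columns: bike, cycling, menu, dinner.
doc_vectors = [
    [2, 1, 0, 0],
    [0, 0, 2, 1],
    [1, 2, 0, 0],
    [0, 0, 1, 2],
]
labels = kmeans(doc_vectors, k=2)
print(labels)
```

The two "cycling" documents end up in one cluster and the two "restaurant" documents in the other. Real text clustering usually first reduces the very sparse matrix (for example with an SVD) before clustering.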
MACHINE LEARNING SOME ALGORITHMS
Predict
Linear regression: $y = f(x) = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$
Neural networks: $y = f(g(h(x)))$
Trees
Random Forests
Cluster
K-means
Hierarchical clustering
DBSCAN
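The two predictive formulas can be written out as plain functions. This sketch assumes the first coefficient is the intercept a0 and uses tanh layers with made-up weights for h and g; in a real neural network these weights are learned from data, and the layer sizes here are arbitrary.

```python
import math

def linear(x, a):
    """y = a0 + a1*x1 + ... + an*xn, with a[0] the intercept a0."""
    return a[0] + sum(ai * xi for ai, xi in zip(a[1:], x))

def layer(x, weights, biases):
    """One dense layer: a weighted sum per unit followed by a tanh activation."""
    return [math.tanh(b + sum(w * xi for w, xi in zip(row, x)))
            for row, b in zip(weights, biases)]

def neural_net(x):
    # Hypothetical weights for illustration only.
    h = layer(x, [[0.5, -0.2], [0.1, 0.3]], [0.0, 0.1])  # h(x): 2 hidden units
    g = layer(h, [[1.0, -1.0]], [0.0])                   # g(h(x)): 1 hidden unit
    return linear(g, [0.2, 2.0])                         # f(g(h(x))): linear output

print(linear([1.0, 2.0], [1.0, 0.5, 0.25]))
print(neural_net([1.0, 2.0]))
```

The linear model is a single weighted sum; the neural network nests several of them with a non-linear function in between, which is what lets it fit more complex relations.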
GTST ANALYSIS TEXT ANALYTICS
Business pain
Looking at GTST (Dutch soap opera): what the heck is this all about?
Are there trends in the series, or is it all the same?
Approach
Take the 5000 episode summaries and apply text mining in SAS
GTST ANALYSIS RESULTS
Main topics in 5000 episodes
GTST ANALYSIS DISTANCES BETWEEN TOPICS
GTST ANALYSIS ZOOMING IN ON A TOPIC
GTST ANALYSIS ZOOMING IN ON A TOPIC
Sub-topics of main topic: topic 16 (Ludo, Isabelle, Martine, Janine)
Harmsen feeling lonely.
Plan by Jack, dangerous
Writing a farewell letter
Panic, fear
Questions about giving kid assignment
Getting money back, paying
IMPORTANT: Business validation!
I asked my wife, who used to be a loyal GTST watcher
GTST ANALYSIS TREND RESULTS
Trends over time with the SAS text profile feature
GTST ANALYSIS TRENDS OVER TIME
GTST ANALYSIS SIMILARITY OF EPISODES THROUGH THE YEARS
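The deck does not say which similarity measure was used to compare episodes. One common choice for texts such as episode summaries is the cosine similarity of their bag-of-words vectors, sketched here with simple whitespace tokenization (an assumption; a real pipeline would reuse the term document matrix).

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the bag-of-words count vectors of two texts."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms that occur in both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

The score runs from 0 (no shared terms) to 1 (identical word proportions), so averaging it over pairs of summaries from two different years gives one possible measure of how much the series changed between them.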
Can you shake hands with your neighbor?
A LITTLE STATISTICAL EXPERIMENT
Two statistics I'd like to share:
50.1% of people don’t
wash their hands
after visiting the toilet
84.6% of all statistics are
just made up on the spot!!
IENS RESTAURANT PATH ANALYTICS
Business pain
I have eaten Chinese; where should I go next?
Approach
Look at what others do: the IENS restaurant reviewers!
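One simple way to turn reviewer histories into "where next?" advice, assuming each reviewer's reviews can be ordered in time into a path of cuisines, is to count which cuisine most often directly follows the current one. This first-order transition count is a sketch, not the SAS path analysis used in the talk, and the paths below are hypothetical.

```python
from collections import Counter, defaultdict

def transition_counts(paths):
    """Count how often cuisine B directly follows cuisine A within each reviewer's path."""
    counts = defaultdict(Counter)
    for path in paths:
        for current, nxt in zip(path, path[1:]):
            counts[current][nxt] += 1
    return counts

def next_cuisine(counts, current):
    """Most frequent next cuisine after `current`, or None if nothing ever followed it."""
    followers = counts.get(current)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical reviewer paths, each in chronological order.
paths = [
    ["Chinese", "Italian", "French"],
    ["Chinese", "Italian"],
    ["Chinese", "Thai"],
    ["Dutch", "Chinese", "Italian"],
]
counts = transition_counts(paths)
print(next_cuisine(counts, "Chinese"))
```

Here "Chinese" is followed by "Italian" three times and by "Thai" once, so the sketch suggests Italian next.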
A FEW FACTS… IENS DATA (TRADITIONAL BI)
Most frequent restaurant name (39 times)
Among “Dutch” restaurants (6 times)
% sustainable kitchens per cuisine:
Organic (67%)
French (58%)
Fish (44%)
Vegetarian (39%)
…
Chinese (3%)
700 reviews on a “normal” Saturday
Valentine’s Day 2015: 1200 reviews (1.7 times as many)
23 times
12 times
IENS RESTAURANT PATH ANALYSIS: GENERATED PATHS
IENS REVIEWS CAN SENTIMENT BE PREDICTED?
Translate the reviews into a term document matrix
Apply machine learning to predict scores
Why would you do this?
IENS REVIEWS CAN I PREDICT THE SENTIMENT?
IENS REVIEWS PREDICT THE ‘EAT’ SCORE
Neural network (2 × 20): R² of 0.65
Linear regression model: R² of 0.56
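R² (the coefficient of determination) is the share of the variance in the given scores that the predictions explain, $R^2 = 1 - SS_{res}/SS_{tot}$. A minimal computation, with illustrative function and argument names:

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in actual)                  # total variation
    return 1 - ss_res / ss_tot
```

A perfect model scores 1 and always predicting the mean scores 0, so the neural network explains roughly 65% of the variance in the ‘eat’ score against 56% for the linear model.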
Predicted review score vs. given review score
IENS REVIEWS PREDICTING THE ‘EAT’ SCORE
IENS REVIEWS SENTIMENT ANALYSIS / PREDICTIVE MODELING
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
Business pain
Tell me: Who has a strange face at SAS Netherlands?
Approach
Take photos of SAS employees, translate them into data, and apply machine learning
OUTLIERS IN FACES DATA MINING & MACHINE LEARNING
STRANGE FACE DETECTION COMBO OF OPEN API & SAS
Use Face++ to do facial landmarking (no deep learning!!)
Import all landmarks into SAS as an analytical base table (ABT)
Now you can solve some funny business issues with machine learning:
Which persons are look-alikes? → Hierarchical clustering
Are there any account managers? → Predictive modeling / machine learning
Who is the Brad Pitt at SAS? → Nearest neighbor
Funny faces → Anomaly / outlier detection
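Once every face is a row of landmark coordinates in the ABT, the Brad Pitt question and the funny-face question both reduce to distances. This sketch assumes the landmarks are already normalized for position, scale and rotation and stored in a hypothetical name → coordinate-list dictionary; the distance to the average face is a simple stand-in for whatever outlier method was used in SAS.

```python
import math

def nearest_neighbor(faces, name):
    """The other person whose landmark vector is closest (Euclidean) to `name`'s."""
    target = faces[name]
    return min((other for other in faces if other != name),
               key=lambda other: math.dist(target, faces[other]))

def outlier_scores(faces):
    """Distance of each face to the average face; large values suggest a 'funny face'."""
    n = len(faces)
    mean = [sum(col) / n for col in zip(*faces.values())]
    return {name: math.dist(vec, mean) for name, vec in faces.items()}

# Hypothetical, heavily simplified landmark vectors (real ones hold many coordinates).
faces = {
    "anna": [0.0, 0.0],
    "bram": [0.1, 0.0],
    "carla": [0.0, 0.2],
    "dirk": [3.0, 3.0],
}
scores = outlier_scores(faces)
print(nearest_neighbor(faces, "anna"), max(scores, key=scores.get))
```

To find "the Brad Pitt at SAS", his landmark vector would simply be added as one more entry and its nearest neighbor looked up.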
STRANGE FACE DETECTION HIERARCHICAL CLUSTERING
STRANGE FACE DETECTION BRAD PITT LOOK-ALIKES…
STRANGE FACE DETECTION OUTLIER DETECTION
IKEA WEBSITE KEEP TRACK OF BILLY STOCK
Define the IKEA Billy Index
as the change in stock over time
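A sketch of the Billy Index as defined above, assuming the stock level is read once per period (for example daily) from the IKEA website; the function name and sign convention are assumptions.

```python
def billy_index(stock_levels):
    """Period-over-period change in stock: stock[t] - stock[t-1].

    Negative values mean Billys left the shelf; a large positive value most
    likely means restocking, which this simple difference does not separate out.
    """
    return [today - yesterday for yesterday, today in zip(stock_levels, stock_levels[1:])]

print(billy_index([120, 100, 95, 140]))
```

With n readings the index has n − 1 values, one per interval between readings.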
IKEA WEBSITE THE IKEA BILLY INDEX
THE BILLY INDEX SOME STATISTICS
Every one-unit increase in wind speed is associated with 19 fewer Billys sold
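A statement like this is the slope of a regression of Billys sold on wind speed. A minimal ordinary least squares fit for one predictor looks like this; the numbers in the usage line are toy values, not the IKEA data, and like any regression slope the result describes an association rather than a proven cause.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a0 + a1*x; returns (a0, a1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    a1 = sxy / sxx               # slope: change in y per unit of x
    return mean_y - a1 * mean_x, a1

# Toy data: hypothetical wind speeds and sales.
print(fit_line([0, 1, 2, 3], [10, 8, 6, 4]))
```

In the real fit, a slope a1 of about −19 corresponds to the sentence above.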
Thanks for your attention, QUESTIONS?
Freelance Data Scientist, always open to meeting for a cup of coffee
https://www.linkedin.com/in/longhowlam
https://longhowlam.wordpress.com/
@longhowlam