Jeroen Hardon | Venture Café | March 2017
Bringing big data to life
Big data is everywhere
1.3 exabytes
2.9 million per second
375 megabytes per day
24 petabytes per day
50 million per day
700 billion minutes per month
73 items per second
20 hours per minute
A journey in segmentation with data scientists and big data.
What was the problem?
What was the solution?
How well did it work?
Original segmentation study
Needs-based segmentation: 7 segments created.
Classifier tool built, using 10 questions.
This resulted in a happy client.
“Let’s tag a segment to each person in our database of 40 million.”
12,000 people from the database answered the classifier questions.
Those 12,000 were classified into 1 of the 7 segments.
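A classifier (typing) tool turns a respondent's answers into one segment. The deck does not say how its 10-question tool scores answers, so this is a minimal sketch assuming a nearest-centroid rule: each segment has a hypothetical mean answer profile on a 1-5 scale, and the respondent goes to the closest one. Only three of the seven segments are shown, and `SEGMENT_PROFILES` and `classify` are illustrative names.

```python
import math

# Hypothetical mean answers (1-5 scale) per segment on the 10 classifier
# questions. Real profiles would come from the original segmentation study.
SEGMENT_PROFILES = {
    "Seg 1": [5, 5, 4, 1, 1, 2, 3, 3, 2, 4],
    "Seg 2": [1, 2, 2, 5, 5, 4, 3, 3, 4, 2],
    "Seg 3": [3, 3, 3, 3, 3, 3, 5, 5, 1, 1],
}


def classify(answers, profiles):
    """Assign a respondent to the segment whose profile is nearest (Euclidean)."""
    if len(answers) != 10:
        raise ValueError("expected answers to all 10 classifier questions")
    return min(profiles, key=lambda seg: math.dist(answers, profiles[seg]))


print(classify([1, 2, 3, 5, 4, 4, 3, 3, 4, 2], SEGMENT_PROFILES))  # Seg 2
```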
Attitudinal segments were not explained by demographics.
Attitudes ≠ Demographics
Revised segments should align better with big data,
but must still predict the original segments from the segmentation study.
Merging the 2 types of data
New classification tool
The database and survey demographics did not match.
We built classifiers by matching the survey data to resemble the database.
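The deck does not say which matching method was used. One common way to make a survey resemble a database is post-stratification weighting, sketched here as an assumption: each respondent gets a weight equal to the group's share in the database divided by its share in the survey. The age groups and shares are hypothetical.

```python
from collections import Counter


def poststrat_weights(survey_groups, population_shares):
    """Weight respondents so the survey's group mix matches the database's.

    survey_groups: one demographic group label per respondent.
    population_shares: hypothetical share of each group in the database.
    """
    n = len(survey_groups)
    counts = Counter(survey_groups)
    # weight = population share / sample share, so weights sum to n
    return [population_shares[g] / (counts[g] / n) for g in survey_groups]


# Survey is 75% aged 18-34, but (hypothetically) the database is only 40%.
weights = poststrat_weights(["18-34"] * 3 + ["35+"], {"18-34": 0.4, "35+": 0.6})
print(weights)
```

After weighting, the three younger respondents together carry 40% of the total weight, matching the assumed database mix.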
We generated many samples of our survey data and built an ensemble of classifiers.
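Generating many samples and training one classifier per sample is the idea behind bagging. A minimal sketch, assuming plain bootstrap resampling (the deck's samples may instead have been drawn to mimic the database) and a simple nearest-centroid classifier per sample; predictions are combined by majority vote. All function names are hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each segment."""
    labels = np.unique(y)
    return labels, np.array([X[y == k].mean(axis=0) for k in labels])


def predict_centroids(model, X):
    labels, centroids = model
    dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]


def bagged_ensemble(X, y, n_models=25, seed=0):
    """Train one model per bootstrap sample (n rows drawn with replacement)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n)
        models.append(fit_centroids(X[idx], y[idx]))
    return models


def predict_vote(models, X):
    """Majority vote of all ensemble members for each row of X."""
    votes = np.stack([predict_centroids(m, X) for m in models])  # (models, rows)
    result = []
    for column in votes.T:
        values, counts = np.unique(column, return_counts=True)
        result.append(values[counts.argmax()])
    return np.array(result)
```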
Ensembles
While building ensembles of classifiers helped, it was still inadequate.
We needed to strengthen the demographic / behavioral signal
Expectation Maximization
How do I "assign" each of the individual fruits to a tree type?
What are the characteristics of the fruit of each tree type?
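The two questions map onto the two EM steps: the E-step "assigns" each fruit to tree types as probabilities, and the M-step re-estimates each tree type's fruit characteristics from those soft assignments. A minimal sketch, assuming each fruit is described only by its weight and each tree type's weights are normally distributed (a two-component Gaussian mixture); the data and names are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def em_two_trees(weights, n_iter=100):
    """Fit a 2-component Gaussian mixture to fruit weights with EM."""
    mu = [min(weights), max(weights)]          # initial guesses for the two trees
    sigma = [statistics.pstdev(weights)] * 2
    pi = [0.5, 0.5]                            # share of fruit from each tree type
    resp = []
    for _ in range(n_iter):
        # E-step: probability that each fruit came from each tree type
        resp = []
        for x in weights:
            p = [pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate each tree type's share, mean and spread
        for k in range(2):
            nk = sum(r[k] for r in resp)
            pi[k] = nk / len(weights)
            mu[k] = sum(r[k] * x for r, x in zip(resp, weights)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, weights)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))
    return pi, mu, sigma, resp
```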
Observed data + Model 1
Initial segmentation data: 6,500 respondents, known (fixed) segment
Augment of 12,000 from big data: unknown segment
Observed data + big data variables + Model 2
Initial segmentation data: 6,500 respondents, known (fixed) segment
Augment of 12,000 from big data: unknown segment
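In this setup some rows have a known, fixed segment and others do not, which is semi-supervised EM: the survey respondents keep their segment throughout, while the E-step estimates segment probabilities only for the augment. A minimal sketch, assuming the model variables are binary (for Model 2 they would include the big data variables) and each segment is a product of Bernoulli distributions; the deck does not specify the actual model family.

```python
import numpy as np


def semi_supervised_em(X, y, n_segments, n_iter=50, eps=1e-3):
    """EM for a Bernoulli mixture where some segment labels are known and fixed.

    X: (n, d) binary array of model variables.
    y: length-n int array; segment index for survey respondents,
       -1 for rows with an unknown segment (the augment).
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    known = y >= 0
    R = np.full((n, n_segments), 1.0 / n_segments)  # segment probabilities per row
    R[known] = np.eye(n_segments)[y[known]]          # known segments stay one-hot
    for _ in range(n_iter):
        # M-step: segment sizes and P(variable = 1 | segment), lightly smoothed
        nk = R.sum(axis=0)
        pi = nk / n
        theta = (R.T @ X + eps) / (nk[:, None] + 2 * eps)
        # E-step: posterior segment probabilities, applied to unknown rows only
        log_p = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        log_p -= log_p.max(axis=1, keepdims=True)
        post = np.exp(log_p)
        post /= post.sum(axis=1, keepdims=True)
        R[~known] = post[~known]
    return pi, theta, R
```

Because the known rows never change, they anchor each segment's definition, while the unknown rows can still shift the estimated variable profiles toward what the big data shows.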
We got classifiers that were slightly less accurate at predicting the survey data, but much more aligned with the big data.
We made sure the predictive accuracy did not drop below 70% (originally 80%).
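The trade-off above is a constrained choice: prefer the classifier most aligned with the big data, but only among those that still reach 70% accuracy on the survey data. A minimal sketch; the candidate names, scores, and the `bigdata_alignment` metric are hypothetical, since the deck does not say how alignment was measured.

```python
def pick_classifier(candidates, min_accuracy=0.70):
    """Best big-data alignment among candidates meeting the accuracy floor."""
    eligible = [c for c in candidates if c["survey_accuracy"] >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate meets the accuracy floor")
    return max(eligible, key=lambda c: c["bigdata_alignment"])


candidates = [
    {"name": "original", "survey_accuracy": 0.80, "bigdata_alignment": 0.30},
    {"name": "em_model_a", "survey_accuracy": 0.74, "bigdata_alignment": 0.55},
    {"name": "em_model_b", "survey_accuracy": 0.66, "bigdata_alignment": 0.70},
]
print(pick_classifier(candidates)["name"])  # em_model_a
```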
How well did it work?
Data source: survey data of 6,500
Rows: initial classifier segment; columns: revised classifier segment

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     564     84     15     56     36     14     18
Seg 2      68    844     84     13      7     13     10
Seg 3      33     72    561      2      3      1      5
Seg 4      34      8      0    567      5     81     29
Seg 5      27     12      1      6    635     50     57
Seg 6      21     27      6     76     43    873     30
Seg 7      18     28      9     50     59     52   1193

Only 19% changed.
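The "changed" figure is the share of respondents off the diagonal: everyone whose revised segment differs from their initial one. This sketch recomputes it from the table above (the function name is illustrative).

```python
# Survey data of 6,500: rows = initial segment, columns = revised segment.
SURVEY_6500 = [
    [564, 84, 15, 56, 36, 14, 18],
    [68, 844, 84, 13, 7, 13, 10],
    [33, 72, 561, 2, 3, 1, 5],
    [34, 8, 0, 567, 5, 81, 29],
    [27, 12, 1, 6, 635, 50, 57],
    [21, 27, 6, 76, 43, 873, 30],
    [18, 28, 9, 50, 59, 52, 1193],
]


def share_changed(matrix):
    """Fraction of people whose revised segment differs from the initial one."""
    total = sum(map(sum, matrix))
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - unchanged) / total


print(f"{share_changed(SURVEY_6500):.1%}")  # 19.4%
```

The diagonal holds 5,237 of the 6,500 respondents, so 1,263 changed segment.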
Data source: augment of 12,000
Rows: initial classifier segment; columns: revised classifier segment

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     135    102     18     66    207    157     45
Seg 2     119    545    171     58    174    203    101
Seg 3      55    113    316     44    240    219     72
Seg 4      90     67      4    283    233    287     69
Seg 5     303    169     41    216   1994    925    205
Seg 6     325    259     36    261    646   1591    127
Seg 7      52     26      3     90    193    191    156

Over 58% changed.
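Besides the overall share that changed, the same table shows how much of each initial segment kept its label (diagonal cell divided by row total). This sketch recomputes both from the table; note that the cells sum to 12,002 rather than exactly 12,000.

```python
# Augment of 12,000: rows = initial segment, columns = revised segment.
AUGMENT_12000 = [
    [135, 102, 18, 66, 207, 157, 45],
    [119, 545, 171, 58, 174, 203, 101],
    [55, 113, 316, 44, 240, 219, 72],
    [90, 67, 4, 283, 233, 287, 69],
    [303, 169, 41, 216, 1994, 925, 205],
    [325, 259, 36, 261, 646, 1591, 127],
    [52, 26, 3, 90, 193, 191, 156],
]


def retention_by_segment(matrix):
    """Share of each initial segment that kept the same segment after revision."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


def share_changed(matrix):
    total = sum(map(sum, matrix))
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - unchanged) / total


for seg, kept in enumerate(retention_by_segment(AUGMENT_12000), start=1):
    print(f"Seg {seg}: {kept:.0%} kept")
print(f"Overall: {share_changed(AUGMENT_12000):.1%} changed")  # Overall: 58.2% changed
```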
Conclusions
Big data cannot predict everything.
No need to be scared of big data.
Surveys and big data can coexist.
Expectation maximization provides a framework for joint modeling.
So what?