Bringing big data to life - SKIM

Jeroen Hardon | Venture Café | March 2017

Bringing big data to life

Big data is everywhere

1.3 exabytes

2.9 million per second

375 megabytes per day

24 petabytes per day

50 million per day

700 billion minutes per month

73 items per second

20 hours per minute

A journey in segmentation with data scientists and big data.

What was the problem?

What was the solution?

How well did it work?

Original segmentation study

Needs-based segmentation

7 segments created

Classifier tool built, using 10 questions

This resulted in a happy client.

“Let’s tag a segment to each person in our database of 40 million”

12,000 people from the database answered the classifier questions

Those 12,000 were each classified into 1 of the 7 segments

Attitudes ≠ Demographics

Attitudinal segments not explained by demographics

New classification tool

Revised segments should align better with big data

Must predict original segments in segmentation study

Merging the 2 types of data

Ensembles

The database and survey demographics did not match

We built classifiers by matching the survey data to resemble the database

We generated many samples of our survey data and built an ensemble of classifiers
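The resampling and ensemble idea on this slide can be sketched roughly as follows. This is a minimal illustration, not the method used in the study: the slides do not say how the survey was matched to the database or which classifiers were used. The weighted bootstrap, the toy lookup classifier, and the names `resample_to_match`, `train_lookup`, `train_ensemble`, `age_band` and `owns_car` are all assumptions, and the survey rows are made-up toy data.

```python
import random
from collections import Counter

def resample_to_match(survey, target_share, key, rng):
    """Bootstrap the survey so the distribution of `key` follows target_share."""
    n = len(survey)
    survey_share = {v: c / n for v, c in Counter(r[key] for r in survey).items()}
    # Over-sample groups that are under-represented relative to the database.
    weights = [target_share[r[key]] / survey_share[r[key]] for r in survey]
    return rng.choices(survey, weights=weights, k=n)

def train_lookup(sample, features):
    """Toy base classifier (assumption): most common segment per feature pattern."""
    counts = {}
    for r in sample:
        pattern = tuple(r[f] for f in features)
        counts.setdefault(pattern, Counter())[r["segment"]] += 1
    table = {p: c.most_common(1)[0][0] for p, c in counts.items()}
    fallback = Counter(r["segment"] for r in sample).most_common(1)[0][0]
    return lambda row: table.get(tuple(row[f] for f in features), fallback)

def train_ensemble(survey, target_share, key, features, n_models=25, seed=0):
    """Train one base classifier per resampled copy of the survey."""
    rng = random.Random(seed)
    return [train_lookup(resample_to_match(survey, target_share, key, rng), features)
            for _ in range(n_models)]

def predict(ensemble, row):
    """Majority vote over the ensemble members."""
    return Counter(model(row) for model in ensemble).most_common(1)[0][0]

# Hypothetical toy survey: 20% "old" respondents, while the database has 50%.
survey = [{"age_band": "old" if i % 5 == 0 else "young",
           "owns_car": i % 2 == 0,
           "segment": "A" if i % 2 == 0 else "B"} for i in range(200)]
database_share = {"old": 0.5, "young": 0.5}

ensemble = train_ensemble(survey, database_share, "age_band", ["age_band", "owns_car"])
print(predict(ensemble, {"age_band": "old", "owns_car": True}))
```

Each ensemble member sees a differently resampled survey whose demographic mix is closer to the database, so the vote is less tied to the survey's own sample composition.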

While building ensembles of classifiers helped, it was still inadequate.

We needed to strengthen the demographic / behavioral signal

Expectation Maximization

How do I "assign" each of the individual fruits to a tree type?

What are the characteristics of the fruit of each tree type?
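The two questions depend on each other, and expectation maximization alternates between them. A minimal sketch, assuming each fruit is described by a single weight and each of two tree types produces normally distributed weights (the fruit weights below are made-up toy values, and `em_two_trees` is a hypothetical name):

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_trees(weights, n_iter=50):
    """EM for a two-component 1-D Gaussian mixture."""
    # Crude starting guess: the lightest and heaviest fruit as the two means.
    means = [min(weights), max(weights)]
    overall = sum(weights) / len(weights)
    var0 = sum((w - overall) ** 2 for w in weights) / len(weights)
    variances = [var0, var0]
    shares = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: how likely is each fruit to come from each tree type?
        resp = []
        for w in weights:
            dens = [shares[k] * normal_pdf(w, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: what does the fruit of each tree type look like?
        for k in range(2):
            nk = sum(r[k] for r in resp)
            shares[k] = nk / len(weights)
            means[k] = sum(r[k] * w for r, w in zip(resp, weights)) / nk
            spread = sum(r[k] * (w - means[k]) ** 2 for r, w in zip(resp, weights)) / nk
            variances[k] = max(spread, 1e-6)  # guard against a collapsed component
    return means, shares, resp

# Hypothetical fruit weights in grams, from two unlabelled tree types.
fruit_weights = [98, 102, 100, 105, 95, 148, 152, 150, 155, 145]
means, shares, resp = em_two_trees(fruit_weights)
assigned = [0 if r[0] > r[1] else 1 for r in resp]
print(means, assigned)
```

The E-step answers the first question (assignment) given the current answer to the second (characteristics), and the M-step does the reverse.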




Model 1

Observed data:
Initial segmentation data, 6,500 respondents: known, fixed segment
+ Augment of 12,000 from big data: unknown segment

Model 2

Observed data:
Initial segmentation data, 6,500 respondents: known, fixed segment
+ Augment of 12,000 from big data: unknown segment
+ Big data variables
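One way to read this setup is as semi-supervised expectation maximization: rows with a known segment keep it fixed, and only rows with an unknown segment get soft assignments that are re-estimated each round. The sketch below assumes a simple naive-Bayes-style model over categorical variables. The slides do not state the actual model form, so the function `semi_supervised_em`, the smoothing parameter `alpha`, and the toy variables are illustrative assumptions only.

```python
def semi_supervised_em(labelled, unlabelled, segments, n_iter=20, alpha=1.0):
    """EM for a naive-Bayes-style mixture where some segment labels are known.

    labelled:   list of (features, segment) pairs; the segment stays fixed
    unlabelled: list of feature tuples; the segment is treated as missing
    """
    n_feat = len(unlabelled[0])
    rows = [x for x, _ in labelled] + list(unlabelled)
    values = [sorted({x[j] for x in rows}) for j in range(n_feat)]
    # Known segments are clamped to responsibility 1 and never updated.
    fixed = [{s: float(s == seg) for s in segments} for _, seg in labelled]
    # Unknown segments start from a uniform guess.
    soft = [{s: 1.0 / len(segments) for s in segments} for _ in unlabelled]
    prior, cond = {}, {}
    for _ in range(n_iter):
        resp = fixed + soft
        # M-step: segment sizes and per-segment variable distributions.
        size = {s: sum(r[s] for r in resp) for s in segments}
        prior = {s: size[s] / len(rows) for s in segments}
        cond = {}
        for s in segments:
            for j in range(n_feat):
                for v in values[j]:
                    count = sum(r[s] for r, x in zip(resp, rows) if x[j] == v)
                    cond[s, j, v] = (count + alpha) / (size[s] + alpha * len(values[j]))
        # E-step: update responsibilities for the unlabelled rows only.
        soft = []
        for x in unlabelled:
            joint = {}
            for s in segments:
                p = prior[s]
                for j in range(n_feat):
                    p *= cond[s, j, x[j]]
                joint[s] = p
            total = sum(joint.values())
            soft.append({s: joint[s] / total for s in segments})
    return prior, cond, soft

# Hypothetical toy data: two variables per person, two segments.
labelled = [(("young", "app"), "A")] * 3 + [(("old", "store"), "B")] * 3
unlabelled = [("young", "app"), ("old", "store"), ("young", "app")]
prior, cond, soft = semi_supervised_em(labelled, unlabelled, ["A", "B"])
print(soft[0], soft[1])
```

Because the labelled rows never change segment, they anchor the meaning of each segment while the unlabelled rows pull the estimated distributions toward what the larger data set looks like.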

We got classifiers that were slightly less accurate in predicting the survey data, but much more aligned with the big data.

We made sure not to let the predictive accuracy drop below 70% (originally 80%)

How well did it work?

Data source: survey data of 6,500

Rows: initial classifier segment. Columns: revised classifier segment.

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     564     84     15     56     36     14     18
Seg 2      68    844     84     13      7     13     10
Seg 3      33     72    561      2      3      1      5
Seg 4      34      8      0    567      5     81     29
Seg 5      27     12      1      6    635     50     57
Seg 6      21     27      6     76     43    873     30
Seg 7      18     28      9     50     59     52   1193

Only 19% changed
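The "19% changed" figure can be checked directly from the cross-tab: everyone off the diagonal moved to a different segment. A small sketch using the counts from the slide (the name `share_changed` is just illustrative):

```python
# Rows: initial classifier segment; columns: revised classifier segment.
survey_crosstab = [
    [564,  84,  15,  56,  36,  14,   18],
    [ 68, 844,  84,  13,   7,  13,   10],
    [ 33,  72, 561,   2,   3,   1,    5],
    [ 34,   8,   0, 567,   5,  81,   29],
    [ 27,  12,   1,   6, 635,  50,   57],
    [ 21,  27,   6,  76,  43, 873,   30],
    [ 18,  28,   9,  50,  59,  52, 1193],
]

def share_changed(crosstab):
    """Fraction of people whose revised segment differs from their initial one."""
    total = sum(sum(row) for row in crosstab)
    unchanged = sum(crosstab[i][i] for i in range(len(crosstab)))
    return 1 - unchanged / total

total_respondents = sum(sum(row) for row in survey_crosstab)
changed = share_changed(survey_crosstab)
print(total_respondents, f"{changed:.1%}")  # 6500 19.4%
```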

How well did it work?

Data source: augment of 12,000

Rows: initial classifier segment. Columns: revised classifier segment.

        Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1     135    102     18     66    207    157     45
Seg 2     119    545    171     58    174    203    101
Seg 3      55    113    316     44    240    219     72
Seg 4      90     67      4    283    233    287     69
Seg 5     303    169     41    216   1994    925    205
Seg 6     325    259     36    261    646   1591    127
Seg 7      52     26      3     90    193    191    156

Over 58% changed
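Beyond the overall share, the same cross-tab shows which initial segments were reshuffled most. The sketch below computes, per initial segment, the fraction that kept its segment under the revised classifier. The counts are those on the slide (as transcribed they sum to 12,002 rather than exactly 12,000), and `retention_by_segment` is a hypothetical helper name.

```python
# Rows: initial classifier segment; columns: revised classifier segment.
augment_crosstab = [
    [135, 102,  18,  66,  207,  157,  45],
    [119, 545, 171,  58,  174,  203, 101],
    [ 55, 113, 316,  44,  240,  219,  72],
    [ 90,  67,   4, 283,  233,  287,  69],
    [303, 169,  41, 216, 1994,  925, 205],
    [325, 259,  36, 261,  646, 1591, 127],
    [ 52,  26,   3,  90,  193,  191, 156],
]
labels = [f"Seg {i}" for i in range(1, 8)]

def retention_by_segment(crosstab, labels):
    """Share of each initial segment that the revised classifier left unchanged."""
    return {labels[i]: row[i] / sum(row) for i, row in enumerate(crosstab)}

retention = retention_by_segment(augment_crosstab, labels)
total = sum(sum(row) for row in augment_crosstab)
overall_changed = 1 - sum(augment_crosstab[i][i] for i in range(7)) / total

for seg, share in retention.items():
    print(f"{seg}: {share:.0%} kept")
print(f"overall changed: {overall_changed:.1%}")
```

In this table the revised classifier moves many people into segments 5 and 6, which is where most of the off-diagonal mass sits.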

Conclusions

Big data cannot predict everything

No need to be scared of big data. Surveys and big data can coexist

Expectation maximization provides a framework for joint modeling

So what?


Questions?

Jeroen Hardon

Director Methodology and Innovation EU

+31 6 288 399 47

[email protected]
