Bringing big data to life
Jeroen Hardon | Venture Café | March 2017

Big data is everywhere

1.3 exabytes
2.9 million per second
375 megabytes per day
24 petabytes per day
50 million per day
700 billion minutes per month
73 items per second
20 hours per minute

A journey in segmentation with data scientists and big data.

What was the problem?
What was the solution?
How well did it work?

Original segmentation study

Needs-based segmentation
7 segments created
Classifier tool built, using 10 questions

This resulted in a happy client.

“Let’s tag a segment to each person in our database of 40 million”

Attitudes ≠ Demographics

12,000 people from the database answered the classifier questions
Those 12,000 were classified into 1 of the 7 segments
Attitudinal segments were not explained by demographics

New classification tool

Revised segments should align better with big data
Must predict original segments in segmentation study
Merging the 2 types of data

Ensembles

The database and survey demographics did not match
We built classifiers by matching survey data to resemble the database
We generated many samples of our survey data and built an ensemble of classifiers
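The slide does not say how the samples were drawn or which classifier was used. As a minimal sketch of the general idea, the example below assumes that each sample is drawn from the survey with weights that pull one demographic variable (a hypothetical `age` field) toward the database mix, that each sample trains a toy nearest-centroid classifier, and that the ensemble predicts by majority vote. All names and data here are illustrative, not taken from the study.

```python
import random
from collections import Counter

def matched_sample(survey, target_share, key, rng):
    """Resample survey rows so the mix of `key` resembles `target_share`."""
    n = len(survey)
    survey_count = Counter(row[key] for row in survey)
    # Weight = database share / survey share of the row's demographic group.
    weights = [target_share[row[key]] / (survey_count[row[key]] / n) for row in survey]
    return rng.choices(survey, weights=weights, k=n)

def fit_centroids(rows):
    """Toy classifier (an assumption): mean feature vector per segment."""
    sums, counts = {}, Counter()
    for row in rows:
        seg = row["segment"]
        counts[seg] += 1
        acc = sums.setdefault(seg, [0.0] * len(row["x"]))
        for i, value in enumerate(row["x"]):
            acc[i] += value
    return {seg: [v / counts[seg] for v in acc] for seg, acc in sums.items()}

def predict_one(centroids, x):
    """Return the segment whose centroid is closest to x."""
    return min(
        centroids,
        key=lambda seg: sum((a - b) ** 2 for a, b in zip(x, centroids[seg])),
    )

def fit_ensemble(survey, target_share, key, n_models=25, seed=0):
    """Train one classifier per demographically matched sample."""
    rng = random.Random(seed)
    return [
        fit_centroids(matched_sample(survey, target_share, key, rng))
        for _ in range(n_models)
    ]

def predict_ensemble(models, x):
    """Majority vote over the ensemble members."""
    votes = Counter(predict_one(m, x) for m in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy survey: half young, half old; the database is 75% old.
survey = (
    [{"age": "young", "segment": "A", "x": [1.0, 0.0]}] * 30
    + [{"age": "old", "segment": "A", "x": [0.8, 0.2]}] * 10
    + [{"age": "young", "segment": "B", "x": [0.1, 1.0]}] * 10
    + [{"age": "old", "segment": "B", "x": [0.0, 0.9]}] * 30
)
database_share = {"young": 0.25, "old": 0.75}

models = fit_ensemble(survey, database_share, key="age")
label = predict_ensemble(models, [0.9, 0.1])
```

Voting over many matched samples makes the prediction less dependent on any single draw, which is the usual motivation for an ensemble of this kind.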

While building ensembles of classifiers helped, it was still inadequate.

We needed to strengthen the demographic / behavioral signal

Expectation Maximization?

Expectation Maximization

How do I "assign" each of the individual fruits to a tree type?
What are the characteristics of the fruit of each tree type?
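The two questions above map onto the two alternating steps of expectation maximization: the E-step assigns each fruit to a tree type given the current description of each type, and the M-step re-estimates each type's characteristics given those assignments. The sketch below illustrates this on a toy case, assuming two tree types, a single measurement per fruit (weight in grams, made up) and Gaussian-distributed weights. The function name `em_two_tree_types` and the data are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_two_tree_types(weights, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    # Crude starting guess: lightest and heaviest fruit as the two means.
    means = [min(weights), max(weights)]
    overall_mean = sum(weights) / len(weights)
    overall_var = sum((w - overall_mean) ** 2 for w in weights) / len(weights)
    variances = [overall_var, overall_var]
    shares = [0.5, 0.5]
    resp = []
    for _ in range(n_iter):
        # E-step: how likely is it that each fruit came from each tree type?
        resp = []
        for w in weights:
            dens = [shares[k] * normal_pdf(w, means[k], variances[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: what does the fruit of each tree type look like?
        for k in range(2):
            nk = sum(r[k] for r in resp)
            shares[k] = nk / len(weights)
            means[k] = sum(r[k] * w for r, w in zip(resp, weights)) / nk
            spread = sum(r[k] * (w - means[k]) ** 2 for r, w in zip(resp, weights)) / nk
            variances[k] = max(spread, 1e-6)  # guard against a collapsed component
    return means, variances, shares, resp

# Made-up fruit weights in grams: five light fruits, five heavy ones.
fruit_weights = [98, 102, 105, 95, 100, 148, 152, 155, 150, 145]
means, variances, shares, resp = em_two_tree_types(fruit_weights)

# Hard assignment: the tree type with the larger responsibility.
assigned = [0 if r[0] > r[1] else 1 for r in resp]
```

Neither question can be answered without the other, which is why the procedure starts from a guess and iterates until the assignments and the type descriptions stop changing.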

Expectation Maximization

Observed data:
  Initial segmentation data (6,500 respondents): known, fixed segment
  + Augment of 12,000 from big data: unknown segment
Model 1

Expectation Maximization

Observed data:
  Initial segmentation data (6,500 respondents): known, fixed segment
  + Augment of 12,000 from big data: unknown segment
  + Big data variables
Model 2
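The diagram combines respondents whose segment is known and held fixed with records whose segment is unknown. One common way to fit such a setup is semi-supervised EM, where the known rows keep a one-hot segment membership and only the unknown rows are re-assigned in each E-step. The sketch below is a minimal illustration under strong assumptions: a single numeric variable, Gaussian segments, and made-up data. The deck does not describe the actual model form, so `semi_supervised_em` is hypothetical and is not a reconstruction of Model 1 or Model 2.

```python
import math

def gauss(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def semi_supervised_em(labeled, unlabeled, n_segments, n_iter=30):
    """labeled: list of (x, segment) pairs; unlabeled: list of x values."""
    xs = [x for x, _ in labeled] + list(unlabeled)
    # Known rows: membership is one-hot and never updated.
    fixed = [[1.0 if k == seg else 0.0 for k in range(n_segments)] for _, seg in labeled]
    # Unknown rows: start undecided between all segments.
    free = [[1.0 / n_segments] * n_segments for _ in unlabeled]
    params = []
    for _ in range(n_iter):
        resp = fixed + free
        # M-step: estimate share, mean and variance per segment from all rows.
        params = []
        for k in range(n_segments):
            nk = sum(r[k] for r in resp)
            mean = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nk
            params.append((nk / len(xs), mean, max(var, 1e-3)))
        # E-step: update memberships of the unknown rows only.
        free = []
        for x in unlabeled:
            dens = [share * gauss(x, mean, var) for share, mean, var in params]
            total = sum(dens)
            free.append([d / total for d in dens])
    return params, free

# Made-up data: six survey rows with known segments, six database rows without.
survey_rows = [(1.0, 0), (1.2, 0), (0.8, 0), (3.0, 1), (3.1, 1), (2.9, 1)]
database_rows = [1.1, 0.9, 1.3, 2.8, 3.2, 3.3]

params, memberships = semi_supervised_em(survey_rows, database_rows, n_segments=2)
predicted = [max(range(2), key=lambda k: m[k]) for m in memberships]
```

Because the unknown rows contribute to the M-step, the segment descriptions are shaped by both data sources rather than by the survey alone.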

The resulting classifiers were slightly less accurate at predicting the survey data, but much better aligned with the big data.

We did not let predictive accuracy drop below 70% (originally 80%).
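The trade-off described here can be read as a constrained choice: among candidate classifiers, prefer the one best aligned with the big data, but only consider those that stay at or above the accuracy floor on the survey data. The sketch below shows that rule. The deck does not say how alignment was measured, so the `alignment` score, the candidate names and all numbers are hypothetical.

```python
def pick_model(candidates, min_accuracy=0.70):
    """Pick the best-aligned candidate that meets the survey accuracy floor."""
    eligible = [c for c in candidates if c["survey_accuracy"] >= min_accuracy]
    if not eligible:
        raise ValueError("no candidate meets the accuracy floor")
    return max(eligible, key=lambda c: c["alignment"])

# Hypothetical candidates; the scores are made up for illustration.
candidates = [
    {"name": "survey only", "survey_accuracy": 0.80, "alignment": 0.40},
    {"name": "joint, light", "survey_accuracy": 0.74, "alignment": 0.62},
    {"name": "joint, heavy", "survey_accuracy": 0.66, "alignment": 0.75},
]

chosen = pick_model(candidates)
```

With these made-up numbers the best-aligned candidate is rejected because it falls below the floor, and the middle candidate is chosen.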

How well did it work?

Rows: initial classifier segment. Columns: revised classifier segment.

       Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1    564     84     15     56     36     14     18
Seg 2     68    844     84     13      7     13     10
Seg 3     33     72    561      2      3      1      5
Seg 4     34      8      0    567      5     81     29
Seg 5     27     12      1      6    635     50     57
Seg 6     21     27      6     76     43    873     30
Seg 7     18     28      9     50     59     52   1193

Only 19% changed

Data source: survey data of 6,500

How well did it work?

Rows: initial classifier segment. Columns: revised classifier segment.

       Seg 1  Seg 2  Seg 3  Seg 4  Seg 5  Seg 6  Seg 7
Seg 1    135    102     18     66    207    157     45
Seg 2    119    545    171     58    174    203    101
Seg 3     55    113    316     44    240    219     72
Seg 4     90     67      4    283    233    287     69
Seg 5    303    169     41    216   1994    925    205
Seg 6    325    259     36    261    646   1591    127
Seg 7     52     26      3     90    193    191    156

Over 58% changed

Data source: augment of 12,000
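The "changed" percentages on the two "How well did it work?" slides can be checked from the tables themselves: every respondent off the diagonal received a different segment from the revised classifier than from the initial one. The short sketch below recomputes both shares from the counts as printed on the slides. The function name `share_changed` is illustrative.

```python
def share_changed(matrix):
    """Share of respondents whose revised segment differs from the initial one."""
    total = sum(sum(row) for row in matrix)
    unchanged = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - unchanged) / total

# Rows: initial classifier segment; columns: revised classifier segment.
survey = [
    [564, 84, 15, 56, 36, 14, 18],
    [68, 844, 84, 13, 7, 13, 10],
    [33, 72, 561, 2, 3, 1, 5],
    [34, 8, 0, 567, 5, 81, 29],
    [27, 12, 1, 6, 635, 50, 57],
    [21, 27, 6, 76, 43, 873, 30],
    [18, 28, 9, 50, 59, 52, 1193],
]
augment = [
    [135, 102, 18, 66, 207, 157, 45],
    [119, 545, 171, 58, 174, 203, 101],
    [55, 113, 316, 44, 240, 219, 72],
    [90, 67, 4, 283, 233, 287, 69],
    [303, 169, 41, 216, 1994, 925, 205],
    [325, 259, 36, 261, 646, 1591, 127],
    [52, 26, 3, 90, 193, 191, 156],
]

survey_changed = share_changed(survey)    # about 0.194
augment_changed = share_changed(augment)  # about 0.582
print(f"survey: {survey_changed:.1%}, augment: {augment_changed:.1%}")
```

Both results agree with the figures quoted on the slides: roughly 19% of the survey sample and just over 58% of the augment sample changed segment.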

Conclusions

Big data cannot predict everything
No need to be scared of big data. Surveys and big data can coexist
Expectation maximization provides a framework for joint modeling

So what?

Questions?

Jeroen Hardon

Director Methodology and Innovation EU

+31 6 288 399 47
