Keynote at Big Data Tech Con SF 2014


Building Data Products: The Right Order of Things

Gloria Lau, VP of Data, Timeful

Keynote @ Big Data Tech Con @gloriatlau

What do they have in common?

Right order of things

def __init__(self):
    data infrastructure

for x in range(3):
    offline modeling
    online data product
    user feedback

Model Product

The challenge

Exception: tracking code missing/overloaded!

Debug: Power user computation takes forever!

The challenge

Data viz --> ID'ed new data potential --> Yet another data product

Sparse data --> Crappy model --> Need to nudge users for *more* data

Non-standardized data --> Crappy model --> Need to standardize

• Four diseases have broken out in the world and it is up to a team of specialists in various fields to find cures for these diseases before mankind is wiped out ... the diseases are breaking out fast and time is running out: the team must try to stem the tide of infection in diseased areas while also working towards cures. A truly cooperative game where you all win or you all lose.

• How do you win?

• Optimally deploy minimal resources in the right order

• What is optimal?

• Do you fix that tracking issue first?

• Do you optimize your power user computation?

• Do you double down on standardization?

• Relevant classifications

• P0 vs P1

• big company vs small company

2 Questions to ask

1 Quote answers them all

“Premature optimization is the root of all evil.”

–Donald Knuth

What is the one metric that your data product will move?

• Retention. Growth. Engagement. Money. Etc.

• Find it, and focus

If each user gives your product one minute a day, how would you spend that minute?

• Data scientists love data. The more the merrier.

• More data solves your data scientist's problem. It does not solve your user's problem.

Do you fix that tracking issue first?

• Q1: Is it in the critical path of measuring that metric?

• Q2: Are you throwing away users' time?

Do you optimize your power user computation?

• Q1: Are power users your key user metric to lift?

• Q2: What fraction of total user time is affected by this?
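Q2 here is plain arithmetic once you have usage numbers per segment. A minimal sketch, assuming usage can be summarized as a user count and average minutes per user per day for each segment; the function name, segment names, and figures below are all hypothetical and for illustration only.

```python
def affected_time_fraction(segments, affected):
    """Share of total daily user time spent by the affected segment.

    segments: dict mapping segment name -> (user_count, minutes_per_user_per_day)
    affected: name of the segment touched by the slow computation
    """
    total = sum(count * minutes for count, minutes in segments.values())
    if total == 0:
        return 0.0
    count, minutes = segments[affected]
    return count * minutes / total


# Hypothetical numbers, not taken from the talk.
segments = {
    "power": (1_000, 10.0),    # few users, heavy usage
    "casual": (99_000, 1.0),   # many users, about a minute a day
}
share = affected_time_fraction(segments, "power")
print(f"{share:.1%} of total user time is spent by power users")
```

If the share is small and power users are not the metric you are trying to lift, the optimization can probably wait.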

Do you double down on standardization?

• Q1: Peel the onion. How will an x% increase in the standardization rate affect your current and projected metric?

• Q2: Does it add friction to the funnel?
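The three triage slides above share a shape: Q1 asks whether the work sits on the critical path to the one metric, and Q2 asks how much user time or friction is at stake. One rough way to make that explicit is to record both answers per candidate and sort on them. This is only a sketch; the `Candidate` class, the `triage` function, and every number in the backlog are hypothetical, and the resulting order is an illustration rather than a recommendation.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    on_critical_path: bool     # Q1: does it block measuring or moving the one metric?
    user_time_at_stake: float  # Q2: fraction of total user time wasted or affected (0..1)


def triage(candidates):
    """Order work: critical-path items first, then by user time at stake."""
    return sorted(
        candidates,
        key=lambda c: (c.on_critical_path, c.user_time_at_stake),
        reverse=True,
    )


# Hypothetical answers to Q1 and Q2; real values come from your own product.
backlog = [
    Candidate("optimize power user computation", False, 0.09),
    Candidate("fix tracking issue", True, 0.00),
    Candidate("double down on standardization", False, 0.30),
]
for c in triage(backlog):
    print(c.name)
```

Sorting on a tuple keeps Q1 strictly ahead of Q2. A weighted score would be an equally defensible choice if the two questions should trade off against each other.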

“Premature optimization is the root of all evil.”

–Donald Knuth

• Right order:

• talent first

• assimilation

• the 3%; fail fast

“Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%. A good programmer will not be lulled into complacency by such reasoning, he will be wise to look carefully at the critical code; but only after that code has been identified. It is often a mistake to make a priori judgments about what parts of a program are really critical, since the universal experience of programmers who have been using measurement tools has been that their intuitive guesses fail.”

–Donald Knuth
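Knuth's point about measurement tools is easy to try. The sketch below uses Python's standard `cProfile` and `pstats` modules to find where a toy pipeline actually spends its time before anyone optimizes it. The functions `parse_events`, `score_users`, and `pipeline` are hypothetical stand-ins, not anything from the talk.

```python
import cProfile
import pstats


def parse_events(n):
    # Cheap step: builds a small list of event codes.
    return [i % 7 for i in range(n)]


def score_users(events):
    # Deliberately quadratic step standing in for a real hot spot.
    matches = 0
    for a in events:
        for b in events:
            if a == b:
                matches += 1
    return matches


def pipeline():
    events = parse_events(300)
    return score_users(events)


# Measure first; decide what to optimize only after reading the numbers.
profiler = cProfile.Profile()
profiler.enable()
result = pipeline()
profiler.disable()

stats = pstats.Stats(profiler)
stats.sort_stats("cumulative").print_stats(5)
```

In the printed report, `score_users` should sit near the top by cumulative time, which identifies it as the critical code in this toy example.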

It's an art.