Idiot’s Guide to Creating a Data Science Practice
We Create Emotionally Powerful and
Economically Sound Brand Experiences
powered by Programmatic And Strategic
Content
Ashish Bansal
April 30th 2015
Today’s objectives
Today I hope to convey to you that…
1. …data science teams are built with a limited understanding of the benefits
2. …it is very hard to find the right people for the role
3. …there are a few core things to build once the team is gathered
I hope that you…
1. …improve your understanding of the role of data scientists in an org.
2. …learn how to increase your value in the market (get paid more!)
3. …find my jokes funny
Why Does a Digital Marketing Agency Need a Data Science Practice?
We are founded on four core pillars –
Strategy, User Experience, Technology, and Data
We want to build programmatic experiences (not programmatic media) and foster brand loyalty with strong, measurable ROI
We want to build our own IP, solve problems no one has
attempted before
We need data scientists who know marketing, and
marketers who understand data
Knowing What You Want is a Great First Step!
Knowing What You Want…
I want to do deep learning
We need to build a recommendation system
All other companies in our category have a data science practice
I want to automatically categorize content for more relevant search
We want to improve customer retention and cross-sell more
Knowing What You Want…
Have an end in mind – envision what success looks like for your data science team
Be hypothesis-driven – don’t fall into the trap of ‘let’s just look for something cool’
Simpler, explainable algorithms before complicated ones (given enough data)
Understand the domain… of the business you are in
Data Science, Data Engineering, Big Data blah blah blah..
What is the difference between Computer Science and Computer Engineering? Do you NEED
computer scientists or computer engineers? Do you HIRE computer scientists or computer
engineers?
Big Data is inconsequential.
Today’s tools hide the complexity of big, small, thick, thin, light, dark, wide data. Think of Hadoop as an operating system.
Pop Quiz: If Hadoop is the OS, then how would Cloudera, HortonWorks and MapR map to Microsoft, Mac OSX and Linux?
Differentiate between Exploratory Data Scientists and Operational Data Scientists
• Exploratory work requires challenging assumptions, learning very quickly, and moving on to the next thing – these people are most likely bored by repetitive work; architecture and code quality are immaterial
• Operational work focuses on large-scale deployment of algorithms, getting rid of feedback loops, good code and architecture, performance optimizations
Data Scientist, Data Engineer….
Your options are:
• Hire one that can do both… (impossible to find)
• Hire a data scientist and a data engineer… (expensive)
• Hire one or the other and grow them into the other role… (takes
time)
Who do you need first?
• I need to prove that this team can add value and build a business case: hire a data scientist
• I don’t know where all the data is and need to manage it properly before analyzing it: hire a data engineer
About Hiring Unicorns…
Programmer
Statistician
Man with glasses and hair
Wears Cardigan over a tie
Marketer (or your domain)
Writer
Must own Converse sneakers
No beer belly
Let’s Do Some Sampling Right Now!
N > 20
Everyone who considers themselves to be a programmer, raise your hands
Now, everyone who considers themselves a data scientist, knows math & statistics, or works on machine learning algorithms, keep your hands up
Everyone who can put these two together and roll their own k-means on Hadoop/Spark etc., keep your hands up
Everyone who can write a best seller, present to large audiences, create decks for executive audiences, build D3 visualizations, keep your hands up
Everyone who understands the business/domain and customers of the
How to Hire a Data Scientist/Engineer if you are not one?
Ben Horowitz’s advice*:
Don’t hire on look and feel
Don’t value lack of weakness rather than strength
How I Did It:
• Educated myself – great resources available now – Coursera, Big Data
University, THUG meetups, PoC, AWS free tiers
• Talked to experts in my network – what do they do, what problems are they
solving
• Got leeway from my organization to fail early and learn quickly
• Decided against hiring unicorns – would rather grow them
* From The Hard Thing About Hard Things by Ben Horowitz
Applying Software Engineering to Data Science/Engineering Work Product
Layout, style, self-documenting code
Refactoring Code
Debugging
Unit Testing (esp. stochastic processes)
Pipeline Jungles*
Handling Changes to The Matrix*
*Must Read: Machine Learning: The High-Interest Credit Card of Technical Debt: http://research.google.com/pubs/pub43146.html
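One way to approach the unit testing point for stochastic processes is sketched below. It is only an illustration, not something from the talk: the random_walk function and the three tests are hypothetical names, and the tolerance is an assumption. The idea is to pass the seed in explicitly so a run can be repeated exactly, check invariants that hold for every run, and check statistical properties against a loose bound rather than an exact value.

```python
import random
import statistics


def random_walk(steps, seed=None):
    """Return the positions of a simple +/-1 random walk (hypothetical example)."""
    rng = random.Random(seed)  # local generator, so tests never touch global state
    position = 0
    path = []
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


def test_same_seed_is_reproducible():
    # With a fixed seed the process is deterministic and can be compared exactly.
    assert random_walk(100, seed=42) == random_walk(100, seed=42)


def test_steps_are_unit_sized():
    # Invariant that must hold for every run, whatever the seed.
    path = random_walk(1000, seed=1)
    previous = [0] + path[:-1]
    assert all(abs(b - a) == 1 for a, b in zip(previous, path))


def test_mean_endpoint_is_near_zero():
    # Statistical check: the endpoint has mean 0 and variance `steps`,
    # so the mean over many walks should sit within a few standard errors of 0.
    steps, walks = 100, 2000
    endpoints = [random_walk(steps, seed=s)[-1] for s in range(walks)]
    standard_error = (steps ** 0.5) / (walks ** 0.5)
    assert abs(statistics.mean(endpoints)) < 5 * standard_error
```

The tests are plain functions with bare asserts, so a runner such as pytest should pick them up, or they can be called directly. The five-standard-error bound is deliberately loose: a tight bound on a random quantity tends to produce flaky tests.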
Thank You