Warren Buffett often thinks of companies as castles, with a competitive moat protecting the business. Products or companies that figure out how to build and leverage differentiated data assets will be best positioned to win their respective markets. This talk describes the properties of a good data moat, why it matters, and how to go about building one within your organization.
Building Competitive Moats With Data
Pete Skomoroch (@peteskomoroch)
DataLead, Oct 1, 2014 - Berkeley
About Me
• Ex Principal Data Scientist @ LinkedIn
• Entrepreneur, Advisor at Data Collective
Competitive Moats
Data as Competitive Moat
Why the current obsession with Big Data?
The rise of Hadoop
What is Big Data?
Big Data: Myths
Big Data: Reality
• Science, theory, and reason are not being replaced
• Big Data is different: for some problems, big data produces better results than we find with smaller samples
• Data storage and logging are increasingly cheap, so err on the side of collecting data to process later if you think it may be valuable
• Large, differentiated data assets are the foundation for defensible products and better decisions
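The "collect now, process later" point can be illustrated with an append-only event log. This is only a minimal sketch: the `log_event` and `read_events` helpers and the JSON Lines file layout are assumptions made for illustration, not something taken from the talk.

```python
import json
import time
from pathlib import Path


def log_event(path, user_id, action, **properties):
    """Append one raw event as a JSON line; nothing is aggregated or discarded."""
    event = {
        "ts": time.time(),         # wall-clock timestamp of the event
        "user_id": user_id,
        "action": action,
        "properties": properties,  # free-form details, kept as-is for later use
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(event, sort_keys=True) + "\n")
    return event


def read_events(path):
    """Replay the raw log later, once a question worth asking comes up."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

Keeping the raw events, rather than only pre-computed aggregates, is what leaves room to ask new questions of the data later.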
If software is eating the world…
… it is replacing it with data
Startups are moving offline life to online data
• Restaurants => Yelp
• Resume + Rolodex => LinkedIn
• PowerPoint => SlideShare
• Yearbook + Photos => Facebook
• Real Estate => Redfin
• Interior Design => Houzz
The Data Factory Revolution
Source: http://www.linkedin.com/channels/disrupt
2013 Steve Jennings/Getty Images Entertainment
Early Data Factory: del.icio.us
User Generated Data Moats
User entered data has Gravity
Behavioral history is a moat: life is easier when apps remember you
Reputation based Data Moats
Network Based Data Moats
Don’t build on top of someone else’s moat
Real scientists make their own data
Build distinct, defensible datasets
This sounds great, how do I build a data moat?
http://xkcd.com/802/
A new occupation: data scientist
source: data from http://www.linkedin.com/skills
What do data scientists actually do?
Two species of data scientist*
Type I: Traditional BI
• Question-driven
• Interactive
• Ad-hoc, post-hoc
• Fixed data
• Focus on speed and flexibility
• Output is embedded into a report, dashboard, or in-database scoring engine
Type II: Data Products
• Metric-driven
• Automated
• Systematic
• Fluid data
• Focus on transparency and reliability
• Output is a production system that makes customer-facing decisions
*Slide adapted from Josh Wills “From the Lab to the Factory”
Data Products: automated systems that make customer facing decisions and collect data
Data Product pre-history: Data Aggregators
• 1972: Vinod Gupta forms American Business Information, Inc., a database initially built via manual data entry of Yellow Pages information
• 1973: LEXIS full text legal search launches publicly
• 1986: Bloomberg reaches 5,000 terminal subscribers
• 1994: Jerry Yang & David Filo compile and maintain a hand curated set of categorized links on the World Wide Web known as the Yahoo! Directory
The Rise of Algorithmic Data Products
• Google: Web Search, PageRank, AdWords
• Netflix: Movie Recommendations
• Pandora: Music Recommendations
• eBay: Product Search, Fraud Detection, Advertising
• Amazon: Similar Items, Book Recommendations
• LinkedIn: People You May Know, Who Viewed My Profile
LinkedIn Skills: a moat built by data products
Data Product investment and ROI
• Skill Extraction and Standardization Pipeline
• Skill Pages
• Skills Section on member profiles
• Suggested Skills Algorithm and email > 20M members
• Skill Endorsements > 60M members, 3B+ Edges
• Big product wins: engagement, recall, relevance
• SkillRank & Reputation Algorithm R&D
• LinkedIn is now the definitive source for information on skills & expertise
*Statistics as of 2013
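To give a feel for what a skill standardization step might involve, here is a toy sketch. It is not LinkedIn's pipeline: the `ALIASES` table, the `normalize` and `standardize_skills` helpers, and the splitting rules are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical alias table: lowercased surface form -> canonical skill name.
ALIASES = {
    "ml": "Machine Learning",
    "machine learning": "Machine Learning",
    "hadoop": "Hadoop",
    "apache hadoop": "Hadoop",
    "js": "JavaScript",
    "javascript": "JavaScript",
}


def normalize(phrase):
    """Lowercase, collapse whitespace, and trim stray punctuation at the edges."""
    phrase = re.sub(r"\s+", " ", phrase.strip().lower())
    return phrase.strip(" .;:")


def standardize_skills(raw_text):
    """Split free-text skills on commas, semicolons, or newlines and map each
    piece to a canonical name. Unrecognized phrases are returned separately
    so they can be reviewed and added to the alias table."""
    counts = Counter()
    unknown = []
    for piece in re.split(r"[,;\n]", raw_text):
        key = normalize(piece)
        if not key:
            continue
        if key in ALIASES:
            counts[ALIASES[key]] += 1
        else:
            unknown.append(key)
    return counts, unknown
```

Calling `standardize_skills("ML, Apache Hadoop; js")` would count one mention each of "Machine Learning", "Hadoop", and "JavaScript". The list of unknown phrases is where new user-entered data feeds back into the dataset.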
How leaders can drive data growth
• Accountability: Who defines the data vision & roadmap in your organization? Who is accountable for building and expanding your moat?
• Invest in data infrastructure, training, logging, & tools for rapid iteration. Build a data lake.
• Invest in exploration and innovation, including user facing data product and algorithm development
• Define a framework for trading off data quality and quantity metrics
• Ask “How does this increase our data moat?” when evaluating any new project, and incentivize projects that do
Twitter: @peteskomoroch LinkedIn: linkedin.com/in/peterskomoroch