Big Data EcoSystem and Analytics @ LinkedIn
May 16, 2013
LinkedIn Confidential ©2013 All Rights Reserved
Srinu Adira
Manager, Data Services (Business Solutions)
LinkedIn Corporation
http://www.linkedin.com/in/srinuadira
Outline
LinkedIn Overview
Why is data important for LinkedIn?
Big Data Ecosystem
Analytics at LinkedIn
Our Mission
Connect the world’s professionals to make them more productive and successful
The LinkedIn Opportunity
Connect talent with opportunity at massive scale
+
Fundamentally transforming the way the world works
200M+
The World’s Largest Professional Network
[Chart: LinkedIn Members (Millions), 2006–2012. Values shown: 8, 17, 32, 55, 90, 147]
88% of Fortune 100 companies use LinkedIn to hire
~2/sec new members joining
>2.9M Company Pages
~5.7B professional searches in 2012
Outline
LinkedIn Overview
Why is data important for LinkedIn?
Big Data Ecosystem
Analytics at LinkedIn
“If you are not embarrassed by the first version of your product, you have launched it too late.”
Reid Hoffman, Founder & Chairman LinkedIn Corp
“What gets measured gets fixed.”
David Henke, SVP Technology Operations, LinkedIn Corp
The Power of LinkedIn’s Network Effects
Member growth and engagement
Relevant and valuable products, solutions & services
Critical mass of data
A Few Data-Driven Products
People You May Know
Groups You May Like
Jobs You May Be Interested In
Who's Viewed Your Profile
Companies You May Want To Follow
Data Insights (Sample)
Data Solutions (Sample)
Segmentation/Standardization
Propensity Modeling
Targeting
Churn Analysis/LTV
Business Forecasting
Technologies: Java/MPP/Hadoop, ML/Statistical Packages, Hadoop, MPP
Data Solutions Drivers
• Business analytics (e.g., data mining, enable decision making)
• Sales analytics (e.g., customer segmentation, targeting)
• Marketing (e.g., campaigns)
• Data insights for customers (e.g., Career site analytics)
• Business operations (forecasting, business pulse)
Outline
LinkedIn Overview
Why is data important at LinkedIn?
Big Data Ecosystem
Analytics at LinkedIn
Big Data at LinkedIn
* Chart from Philip Russom, Research Director, TDWI
Big Data at LinkedIn
Platform and solutions that
• Scale at cost with data complexity
• Simplify the data continuum across online, near-line and offline
• Enable business decisions
What does “big data” mean at LinkedIn?
ERP data…
Social Data…
CRM data…
Web data…
[Chart: Data Volume vs. Analytical Challenge & Complexity, each axis running from 0 to +∞]
3 major data dimensions at LinkedIn
Identity Data
Social Data
Behavioral Data
Big Data at LinkedIn
High-level data environment
[Diagram: Users, Application, Web Logs, Online Data Store, Near-Line Data Store, Offline Data Store]
Challenges so complex that off-the-shelf products or a handful of technologies can’t address them
Built our own combination of toolsets/technologies to meet specific requirements
LinkedIn’s Sample Data Stack
Let’s do a deep dive to understand how the capabilities of LinkedIn’s data stack meet our requirements
LinkedIn Data Stack – Online
Capabilities
• Rich structures (e.g., indexes)
• Change capture capability
LinkedIn Data Stack – Nearline
Systems and capabilities
• Distributed key-value store: Voldemort
• Search platform: Bobo, Sensei, Zoie
• Distributed graph engine: D-Graph
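To make the "distributed key-value store" capability concrete, here is a minimal in-memory sketch. It is not Voldemort's API: the class `PartitionedStore`, the node names, and the replica count are all hypothetical, and it picks owner nodes with plain modulo hashing of an MD5 digest, which is simpler than what a production store would use.

```python
import hashlib


class PartitionedStore:
    """Toy key-value store that spreads keys over named nodes (hypothetical)."""

    def __init__(self, node_names, replicas=2):
        self.names = sorted(node_names)
        # Each "node" is just a dict here; a real node would be a server.
        self.nodes = {name: {} for name in self.names}
        self.replicas = min(replicas, len(self.names))

    def _owners(self, key):
        # Hash the key to a starting node, then take the next nodes as replicas.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.names)
        return [self.names[(start + i) % len(self.names)]
                for i in range(self.replicas)]

    def put(self, key, value):
        # Write the value to every owner node.
        for name in self._owners(key):
            self.nodes[name][key] = value

    def get(self, key):
        # Read from the first owner that holds the key.
        for name in self._owners(key):
            if key in self.nodes[name]:
                return self.nodes[name][key]
        return None


store = PartitionedStore(["node-a", "node-b", "node-c"])
store.put("member:42", {"headline": "Data Engineer"})
print(store.get("member:42"))
```

Writing each key to more than one node is what lets a read succeed even if one copy is unavailable; the sketch does not model node failure itself.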
LinkedIn Data Stack – Pipeline
Capabilities
• Messaging for site events, monitoring
• Change data capture streams
• Reliable, consistent, low-latency pipe
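The "change data capture streams" capability can be illustrated with a small in-memory sketch: a source store appends every change to an ordered log, and a downstream replica catches up by replaying the changes it has not yet seen. The names `Change`, `ChangeLog`, and `Replica` are hypothetical and do not correspond to any LinkedIn system; ordering by a sequence number is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    seq: int               # position in the log, starting at 1
    key: str
    value: Optional[Any]   # None marks a delete


class ChangeLog:
    """Ordered, append-only log of changes from the source store."""

    def __init__(self):
        self._changes = []

    def append(self, key, value):
        change = Change(len(self._changes) + 1, key, value)
        self._changes.append(change)
        return change

    def read_since(self, seq):
        # Changes are stored in seq order, so slicing skips the first `seq`.
        return self._changes[seq:]


class Replica:
    """Downstream store that replays the log in order."""

    def __init__(self):
        self.data = {}
        self.last_seq = 0

    def catch_up(self, log):
        for change in log.read_since(self.last_seq):
            if change.value is None:
                self.data.pop(change.key, None)
            else:
                self.data[change.key] = change.value
            self.last_seq = change.seq


log = ChangeLog()
log.append("member:1", {"title": "Engineer"})
replica = Replica()
replica.catch_up(log)
print(replica.data, replica.last_seq)
```

Because the replica remembers the last sequence number it applied, it can stop and resume without missing or re-applying changes, which is the consistency property the slide asks of the pipe.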
LinkedIn Data Stack – Offline
Capabilities
• Machine learning, ranking, relevance, solutions
• Warehouse and analytics
LinkedIn with Hadoop, Aster, and Teradata
Connectors: Aster/Teradata bi-directional connector; Aster/Teradata Hadoop connectors
Hadoop: data transformation & batch processing
• Image processing
• Search indexes
• Graph (PYMK)
• MapReduce
Batch data transformations for engineering groups using HDFS + MapReduce
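As a rough illustration of how a graph job such as PYMK could be phrased as MapReduce, the sketch below imitates the map, shuffle, and reduce phases in plain Python rather than on Hadoop. Counting mutual connections is an assumed heuristic chosen for the example, not LinkedIn's actual PYMK algorithm, and the function names and sample graph are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(member, connections):
    """For each pair of this member's connections, emit the pair as key.

    The member is the value: it is a mutual connection of the pair.
    """
    for a, b in combinations(sorted(connections), 2):
        yield (a, b), member


def people_you_may_know(graph):
    """Return {(a, b): mutual_count} for pairs that are not yet connected."""
    # Shuffle: group mapper output by key.
    grouped = defaultdict(list)
    for member, connections in graph.items():
        for pair, mutual in map_phase(member, connections):
            grouped[pair].append(mutual)

    # Reduce: count mutual connections, skipping pairs already connected.
    suggestions = {}
    for (a, b), mutuals in grouped.items():
        if b not in graph.get(a, set()):
            suggestions[(a, b)] = len(mutuals)
    return suggestions


graph = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann", "dan"},
    "dan": {"bob", "cat"},
}
print(people_you_may_know(graph))
```

Each mapper only needs one member's connection list, which is why this kind of job parallelizes well as a batch computation.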
Aster: analytic platform for data discovery
• nPath pattern/path
• Clickstream analysis
• A/B site testing
• Data sciences discovery
• SQL-MapReduce
Interactive MapReduce analytics for the enterprise using MapReduce Analytics & SQL-MapReduce
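The idea behind path and clickstream analysis can be sketched in a few lines: order each member's page views by time, then look for a given sequence of pages. This is plain Python, not Aster nPath or SQL-MapReduce syntax, and the event layout `(member_id, timestamp, page)` and the page names are assumptions made for the example.

```python
from collections import defaultdict


def paths_by_member(events):
    """Group (member_id, timestamp, page) events into time-ordered page paths."""
    by_member = defaultdict(list)
    for member_id, timestamp, page in events:
        by_member[member_id].append((timestamp, page))
    return {member: [page for _, page in sorted(visits)]
            for member, visits in by_member.items()}


def contains_path(path, pattern):
    """True if `pattern` appears as consecutive steps somewhere in `path`."""
    n = len(pattern)
    return any(path[i:i + n] == pattern for i in range(len(path) - n + 1))


def members_matching(events, pattern):
    """Sorted list of members whose clickstream contains the pattern."""
    return sorted(member for member, path in paths_by_member(events).items()
                  if contains_path(path, pattern))


events = [
    ("m1", 3, "apply"),
    ("m1", 1, "search"),
    ("m1", 2, "job_view"),
    ("m2", 1, "search"),
    ("m2", 2, "home"),
    ("m2", 3, "job_view"),
]
print(members_matching(events, ["search", "job_view"]))
```

The match here requires the steps to be adjacent; allowing gaps between steps would be a different, looser pattern.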
Teradata: integrated data warehouse
• Exec dashboards
• Ad hoc/OLAP
• Complex SQL
• SQL
Integration with structured data, operational intelligence, scalable distribution of analytics
Outline
LinkedIn Overview
Why is data important at LinkedIn?
Big Data Ecosystem
Analytics at LinkedIn
Several examples of business analytics evolution at LinkedIn
Products
Marketing
Sales
How we leverage data to support Marketing
Identity Data
Social Data
Behavioral Data
Overall Audience
Target Audience
The closed-loop analytical framework
Design: campaign planning & design
Test: execution
Measure: reporting & business intelligence
Why?: post-campaign analysis
Predict: model building and tuning
An example of using data to improve sales
Step 1: Which account?
Step 2: Who?
Step 3: How?
Identity Data
Social Data
Behavioral Data
How to provide 500 to 1000X impact?
Insights portal for sales org.
1. Easy: quickly find the right info
2. Fast: response time of a few seconds for most insights
3. Scalable: 2M+ accounts/prospects
4. Accurate: mimics an analyst/data scientist
Four stages of data analytics
What will happen?
What happened?
Why did it happen?
What is happening?
[Chart: Business Value vs. Analytical Challenge & Complexity, each axis running from 0 to High]
Use data to solve product problems: a solution for answering A/B testing questions
Several thousand A/B tests are live; how do we measure their performance?
1. Let technology work for us
2. Results first, methodology later
3. Bypass the charts and reports
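One way to read "results first" is that each live test should reduce to a one-line verdict. The sketch below does this with a two-proportion z-test, which is a common choice for conversion-style metrics but is an assumption here; the slides do not say which statistical method LinkedIn used. The function names, the `alpha` threshold, and the sample numbers are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) comparing conversion rates of A and B."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants converted 0% or 100%: no measurable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def summarize(tests, alpha=0.05):
    """Map each test name to a plain-language verdict."""
    verdicts = {}
    for name, (conv_a, n_a, conv_b, n_b) in tests.items():
        z, p_value = two_proportion_z_test(conv_a, n_a, conv_b, n_b)
        if p_value >= alpha:
            verdicts[name] = "no significant difference"
        else:
            verdicts[name] = "B wins" if z > 0 else "A wins"
    return verdicts


tests = {
    "new_signup_button": (100, 1000, 150, 1000),
    "footer_copy": (100, 1000, 100, 1000),
}
print(summarize(tests))
```

With thousands of tests evaluated at once, some will cross the threshold by chance, so a real system would likely need a correction for multiple comparisons; the sketch leaves that out.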
Next play: Web 3.0 – It’s all about data!