Upload
vitaly-gordon
View
193
Download
1
Tags:
Embed Size (px)
DESCRIPTION
Strata New York 2014
Citation preview
Computing Professional Identity for the Economic Graph
Strata, New York – October 16th 2014
©2013 LinkedIn Corporation. All Rights Reserved.
Agenda
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
4
About me
©2013 LinkedIn Corporation. All Rights Reserved.
Vitaly Gordon
5
About me
©2013 LinkedIn Corporation. All Rights Reserved.
6
About me
©2013 LinkedIn Corporation. All Rights Reserved.
7
About me
©2013 LinkedIn Corporation. All Rights Reserved.
8
About me
©2013 LinkedIn Corporation. All Rights Reserved.
9
About me
©2013 LinkedIn Corporation. All Rights Reserved.
10
About me
©2013 LinkedIn Corporation. All Rights Reserved.
@bigdatasc /in/vitalygordon
11
What’s in it for you?
©2013 LinkedIn Corporation. All Rights Reserved.
12
What’s in it for you?
©2013 LinkedIn Corporation. All Rights Reserved.
1. You will learn about how LinkedIn takes a massive vision and breaks it down to small data problems
13
What’s in it for you?
©2013 LinkedIn Corporation. All Rights Reserved.
1. You will learn about how LinkedIn takes a massive vision and breaks it down to small data problems
2. You will learn about how hard cleaning data can be
14
What’s in it for you?
©2013 LinkedIn Corporation. All Rights Reserved.
1. You will learn about how LinkedIn takes a massive vision and breaks it down to small data problems
2. You will learn about how hard cleaning data can be
3. You will learn why LinkedIn needs endorsements
15
What’s in it for me?
©2013 LinkedIn Corporation. All Rights Reserved.
16
What’s in it for me?
©2013 LinkedIn Corporation. All Rights Reserved.
17
What’s in it for me?
©2013 LinkedIn Corporation. All Rights Reserved.
@bigdatasc
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
19©2013 LinkedIn Corporation. All Rights Reserved.
Create economic opportunity for every professional in the world
20©2013 LinkedIn Corporation. All Rights Reserved.
21©2013 LinkedIn Corporation. All Rights Reserved.
• CEO• Chief Executive Officer• CEO and Founder• CEO & Co-founder• President and CEO• Owner
22©2013 LinkedIn Corporation. All Rights Reserved.
• IBM• International Business Machines• International Bus. Machines• IBM Research• IBM T.J. Watson Research Center• IBM Canada• IBM India
23©2013 LinkedIn Corporation. All Rights Reserved.
• UCLA• University of California, Los Angeles• UC Los Angeles• The Anderson School of Management
24©2013 LinkedIn Corporation. All Rights Reserved.
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
©2013 LinkedIn Corporation. All Rights Reserved.26
Why Do We Need Identity Standardization?
©2013 LinkedIn Corporation. All Rights Reserved.27
Why Do We Need Identity Standardization?
©2013 LinkedIn Corporation. All Rights Reserved.28
Why Do We Need Identity Standardization?
©2013 LinkedIn Corporation. All Rights Reserved.29
©2013 LinkedIn Corporation. All Rights Reserved.30
Text Based Solution
Applies acronym expansion (e.g. vp -> vice president) Applies abbreviation expansion (e.g. sr. -> senior) Select the most common standard titles Selects standard sub strings (e.g. software engineer and tech lead
in search -> [software engineer, tech lead])
©2013 LinkedIn Corporation. All Rights Reserved.31
Problems with a Text Based Approach
Senior Software Engineer
Software Engineer
©2013 LinkedIn Corporation. All Rights Reserved.32
Problems with a Text Based Approach
Software Engineer Software Developer Programmer
©2013 LinkedIn Corporation. All Rights Reserved.33
Problems with a Text Based Approach
Architect
©2013 LinkedIn Corporation. All Rights Reserved.34
©2013 LinkedIn Corporation. All Rights Reserved.35
©2013 LinkedIn Corporation. All Rights Reserved.36
©2013 LinkedIn Corporation. All Rights Reserved.37
38
Profile Inferred Skills Endorsements Skill Vectors
39
Profile Inferred Skills Endorsements Skill Vectors
Data Mining Hadoop Machine Learning Java Algorithms Python MapReduce Data Science0%
5%
10%
15%
20%
25%
30%
40
Profile Inferred Skills Endorsements Skill Vectors
Data Mining Hadoop Machine Learning Java Algorithms Python MapReduce Data Science0%
5%
10%
15%
20%
25%
30%
http://www.slideshare.net/s_shah/strata-endorsements
41
Ontology Creation
42
Ontology Creation
43
Classification
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
©2013 LinkedIn Corporation. All Rights Reserved.45
Normalization
46
Normalization
©2013 LinkedIn Corporation. All Rights Reserved.47
Clustering
48
Clustering
• Each topic is a distribution over words• Each document is a mixture of corpus-wide topics• Each word is drawn from one of those topics
49
Anomaly Detection
50
Anomaly Detection
51
Anomaly Detection
http://www.slideshare.net/tdunning/strata-2014-anomaly-detection
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
60
Summary
©2013 LinkedIn Corporation. All Rights Reserved.
1. User generated content from 300M members, creates 300M problems
61
Summary
©2013 LinkedIn Corporation. All Rights Reserved.
1. User generated content from 300M members, creates 300M problems
2. Data cleaning is so much more than filtering out empty values
62
Summary
©2013 LinkedIn Corporation. All Rights Reserved.
1. User generated content from 300M members, creates 300M problems
2. Data cleaning is so much more than filtering out empty values
3. Try to be creative and work around difficult language problems
1 Introduction
2 LinkedIn’s Vision
3 Computing Professional Identity
4 Selected Topics
5 Summary
6 Final Words
We’re building the next big thing
We’re building the next big thingJoin Us
We’re building the next big thingJoin Us!
We’re building the next big thingJoin Us!
DJ Patil Gary Flake Beau Cronin
@bigdatasc /in/vitalygordon