Could You Be a Data Scientist? Carlo Torniai, Ph.D. @carlotorniai

Could You Be a Data Scientist? Quantify Data Scientist Profiles using Machine Learning and LinkedIn API


DESCRIPTION

A short presentation on my final project at Zipfian Academy: quantifying data scientist profiles using LinkedIn data. The prototype web app is available at bit.ly/cybads

Page 1

Could You Be a Data Scientist?

Carlo Torniai, Ph.D. @carlotorniai

Page 2

Goal

• Quantify the features of data scientist profiles
• Analyze the profiles of aspiring data scientists
• Provide useful feedback

Page 3

Why is this relevant?

• A quantitative characterization of data scientist profiles can help close the loop between job seekers and recruiters

Image: http://www.getelastic.com/wp-content/uploads/puzzle1.jpg

Page 4

Data Collection | Data Analysis | Feature Extraction | Model Testing | Data Product

• LinkedIn API:
  – General information
  – Past work history
  – Education
• Web scraping:
  – Skills
• 1500 profiles:
  – Data Scientists
  – Software Engineers
  – Business Analysts
  – Mathematicians
  – Statisticians
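The slides do not show how the collected profiles were stored or turned into features. As a minimal sketch, assuming each profile is reduced to a category plus lists of skills and degrees, a binary encoding could look like the following. The `Profile` class, its field names, the helper functions, and the two sample profiles are all hypothetical and serve only as illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Profile:
    """Hypothetical record for one collected profile (field names are assumptions)."""
    category: str                                  # e.g. "Data Scientist"
    skills: list = field(default_factory=list)     # e.g. scraped skill names
    degrees: list = field(default_factory=list)    # e.g. "PhD Physics"


def build_vocabulary(profiles):
    """Return a sorted list of every distinct skill or degree token."""
    tokens = set()
    for p in profiles:
        tokens.update(t.lower() for t in p.skills + p.degrees)
    return sorted(tokens)


def encode(profile, vocabulary):
    """Binary vector: 1 if the profile mentions the token, else 0."""
    present = {t.lower() for t in profile.skills + profile.degrees}
    return [1 if token in present else 0 for token in vocabulary]


# Made-up sample data, not taken from the project.
profiles = [
    Profile("Data Scientist", ["Python", "Machine Learning"], ["PhD Physics"]),
    Profile("Software Engineer", ["Java", "Python"], ["BS Computer Science"]),
]
vocab = build_vocabulary(profiles)
X = [encode(p, vocab) for p in profiles]
```

Each row of `X` can then be fed to a model together with the profile's `category` as its label.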

Page 5

[Figure: one panel per profile category: Business Analysts, Data Scientists, Software Engineers, Statisticians, Mathematicians]

Page 6

[Figure: Number of PhDs by topic and profiles. Topics on the axis: Bioinformatics, Biology, Computer Science, Economics, Electronics, Astronomy, Math, Neuroscience, Other, Physics, Psychology, Stats, Engineering]

Page 7

For this project I trained the following models on skills and education features:

• Random Forest: classify the profile
• Naïve Bayes: multi-class probabilities to assess the background components of a profile
• K-means: suggest similar and relevant profiles
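The first two models can be sketched with scikit-learn, which the deck lists under its technologies. This is an illustrative sketch, not the project's code: the five feature columns and the twelve toy rows are made up, Mathematicians are left out for brevity, and the choice of `BernoulliNB` is an assumption (the slides only say "Naïve Bayes"). Following the deck's model table, the Naïve Bayes model is fit only on the classic categories, so a data scientist profile is expressed as a mix of those backgrounds.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import BernoulliNB

# Hypothetical binary features: [python, java, statistics, excel, machine_learning]
X = np.array([
    [1, 0, 1, 0, 1], [1, 0, 1, 0, 1], [1, 1, 1, 0, 1],  # DS (data scientists)
    [1, 1, 0, 0, 0], [0, 1, 0, 0, 0], [1, 1, 0, 0, 0],  # SE (software engineers)
    [0, 0, 0, 1, 0], [0, 0, 1, 1, 0], [0, 0, 0, 1, 0],  # BA (business analysts)
    [0, 0, 1, 0, 0], [0, 0, 1, 0, 0], [1, 0, 1, 0, 0],  # ST (statisticians)
])
y = np.array(["DS"] * 3 + ["SE"] * 3 + ["BA"] * 3 + ["ST"] * 3)

# Random Forest: trained on every category, used to classify a profile.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Naive Bayes: trained only on the non-DS categories, so its class
# probabilities describe the background components of a profile.
classic = y != "DS"
nb = BernoulliNB().fit(X[classic], y[classic])

new_profile = np.array([[1, 0, 1, 0, 1]])  # made-up profile to assess
rf_label = rf.predict(new_profile)[0]
background = dict(zip(nb.classes_, nb.predict_proba(new_profile)[0]))

print("Random Forest class:", rf_label)
print("Background mix:", {k: round(float(v), 3) for k, v in background.items()})
```

The `background` dictionary maps each classic category to a probability, which is one way to present a profile's background components as feedback.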

Page 8

Models trained on skills and education features:

Model | Training set | Purpose
Random Forest | All 5 categories | Classify the profile
Naïve Bayes | 4 classic categories: SE, BA, MT, ST (Software Engineers, Business Analysts, Mathematicians, Statisticians) | Assess profile background components with multi-class probabilities
K-means | All 5 categories | Identify similar profiles
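The K-means row can be illustrated in the same spirit. The sketch below assumes that "similar profiles" means profiles assigned to the same cluster as the query; the slides do not spell out the similarity criterion. The feature columns, the nine toy profiles, their names, and the `similar_profiles` helper are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical binary features: [python, machine_learning, java, git, excel, sql]
names = ["ds_1", "ds_2", "ds_3", "se_1", "se_2", "se_3", "ba_1", "ba_2", "ba_3"]
X = np.array([
    [1, 1, 0, 0, 0, 1], [1, 1, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0],  # data scientists
    [1, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 0], [0, 0, 1, 1, 0, 1],  # software engineers
    [0, 0, 0, 0, 1, 1], [0, 0, 0, 0, 1, 1], [1, 0, 0, 0, 1, 1],  # business analysts
], dtype=float)

# Fixed seed and explicit n_init keep the toy run reproducible.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)


def similar_profiles(query):
    """Return the names of stored profiles in the same cluster as `query`."""
    cluster = km.predict(np.array([query], dtype=float))[0]
    return [name for name, label in zip(names, km.labels_) if label == cluster]


suggestions = similar_profiles([1, 1, 0, 0, 0, 0])
print(suggestions)
```

On real data the number of clusters would need tuning, and a distance ranking within the cluster would give a more useful ordering of suggestions.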

Page 9

bit.ly/cybads

Page 10

[Figure: Naïve Bayes multi-class probabilities; Random Forest]

Page 11

[Figure: K-means clustering]

Page 12

Next Steps

• Get more data:
  – Other websites
  – Indeed
  – User input on the web app
• Fine-grained parsing of education
• Experiment with additional features (industry, years of experience)
• Extend the feature set and test more models
• Fuzzy C-means
• Add interactive data collection
• Personalized links for skills
• Explanation of similarity results
• Close the loop by analyzing job offers and suggesting matching profiles

Page 13

Thank you!

Technologies

Web app: Flask, jQuery, Vega, MongoDB

Models (NMF, HC, RF, DT, NB, K-means): scikit-learn

Visualizations: Vincent, Vega, NetworkX, Gephi

Acknowledgements

yatish27: Ruby LinkedIn public profile web scraper

ozgut: LinkedIn API Python wrapper