How I became a Data Scientist Owen Zhang

Let’s tell a story

● Just to prove that I can talk about things other than Kaggle
● Today’s goal, as always, is to entertain, not enlighten.
● Apologies for presuming myself to be an “expert data scientist”.

What It Takes to be a Good Data Scientist

● Domain knowledge
● Coding skills
● Math/Stats

But maybe equally (or even more) important:
● Ask the right question
● Tell a good story

How Much Math(/Stats) is Required?

● Math is an extremely broad field
● Personally I am good at numerical problems but bad at algebra
● Guesstimating has always been my strength, rather than the “precise answer”
● Having good intuition is more helpful than being able to prove theorems

Majored in Engineering but...

Always wanted to be a “Data Scientist”
● Unfortunately that didn’t exist at the time

Three useful things learned in college
● Linear algebra
● Programming
● Teamwork (a.k.a. party with your friends)

Even after Y2K, there were plenty of IT jobs

● By chance I got a job as a software developer
● By chance it was in insurance
○ Arguably insurance has the best data to practice data science on
○ Very noisy
○ High variety
○ Not too small and not too big

The Most Useful Things Learned Doing IT

● It is NOT how to program!
○ My coding skills probably degenerated
● Be interested in learning the domain
○ I learned my “domain expertise” here
● Speak the “business language”
○ Terminology is very important
● How to talk to IT folks

What to do when bored with your job?

● Career switch!
● The following approach isn’t recommended:

Wanna be a chef? I’ve never cooked before but you can trust me

Lesson learned in switching careers

● It is counterproductive to talk about how good you would be at something that you haven’t done before
● Use cases / stories
● Find the right mentor/sponsor

Don’t Laugh, but Almost Became an Actuary

● Why?
○ Actuaries were doing “data science” way before “data scientist” became a job title
○ My wife is an actuary
○ I am good at taking exams
● Why not?
○ Data Science came along before I finished all the exams

Finally made it to Data Science

IT Developer → Data Scientist

Became “Expert Data Scientist”

● It is both easy and hard to transform from “some IT guy who wants to be a (predictive) modeler” to “expert data scientist”
○ The trick is to get new colleagues
● At that time it was called “predictive modeler”
● “Legitimized” by Kaggle

What I Learned being a “Practitioner”

● The most important insight:
○ Asking the right question is more important than getting the perfect answer
● The right “form” of question:
○ What will/can you do differently if you have a prediction of [????]

If We Finish here...

● Then we would have made a very common mistake in data analysis
○ All we have is an anecdote
● Enemies and friends of Data Science
○ “Anecdotal” vs “general”
○ “Co-occurrence” vs “correlation”
○ “Correlation” vs “causality”

An Example

Owen was good at math and became a data scientist
(1000 people) were good at math and became data scientists

An Example

Good@Math     Became Data Scientist
              Yes       No        %Became DS
Yes           1,000     99,000    1%
No            10,000    90,000    10%
%Good@Math    9%        52%

Owen was good at math and became a data scientist
(1000 people) were good at math and became data scientists

An Example

Good@Math     Became Data Scientist
              Yes       No        %Became DS
Yes           1,000     9,000     10%
No            1,000     99,000    1%
%Good@Math    50%       8.3%

Owen was good at math and became a data scientist
(1000 people) were good at math and became data scientists
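The percentages in the two example tables can be checked with a few lines of Python. This is only a sketch: it assumes the four counts in each table are the whole population, and the function name `summarize` and the dictionary keys are made up for illustration, not taken from the talk.

```python
def summarize(good_ds, good_not, bad_ds, bad_not):
    """Row and column percentages for a 2x2 table of counts.

    good_* = people good at math, bad_* = people not good at math
    *_ds   = became a data scientist, *_not = did not
    (All argument and key names are hypothetical.)
    """
    return {
        # "%Became DS" column: share of each row that became a data scientist
        "pct_ds_given_good": 100 * good_ds / (good_ds + good_not),
        "pct_ds_given_not_good": 100 * bad_ds / (bad_ds + bad_not),
        # "%Good@Math" row: share of each column that was good at math
        "pct_good_given_ds": 100 * good_ds / (good_ds + bad_ds),
        "pct_good_given_not_ds": 100 * good_not / (good_not + bad_not),
    }


# Counts copied from the two example tables.
table_a = summarize(1_000, 99_000, 10_000, 90_000)
table_b = summarize(1_000, 9_000, 1_000, 99_000)

for name, result in (("first table", table_a), ("second table", table_b)):
    print(name, {key: round(value, 1) for key, value in result.items()})
```

Both tables contain the same 1,000 people who were good at math and became data scientists, yet the computed percentages point in opposite directions. The results agree with the figures on the slides up to rounding.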

An Example
● We found something!
○ People who are good at math have a 10 times better chance of becoming a Data Scientist!
● Is this good enough? Depending on your use case:
○ Probably good enough to make up some math interview questions for DS
○ But not necessarily good enough to say “let’s teach kids more math so that more of them become data scientists”
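The “10 times” figure is the ratio of the two “%Became DS” rates. A small sketch of that arithmetic, using the counts from both example tables; the function name `relative_chance` is hypothetical, and the result describes an association in these counts only, not a causal effect.

```python
def relative_chance(good_ds, good_not, bad_ds, bad_not):
    """Ratio of P(became DS | good at math) to P(became DS | not good at math)."""
    rate_good = good_ds / (good_ds + good_not)
    rate_not_good = bad_ds / (bad_ds + bad_not)
    return rate_good / rate_not_good


# Counts from the first and second example tables, respectively.
first = relative_chance(1_000, 99_000, 10_000, 90_000)
second = relative_chance(1_000, 9_000, 1_000, 99_000)

print(round(first, 2), round(second, 2))
```

With the second table's counts the ratio is about 10; with the first table's counts it is about 0.1, so the same anecdote would fit either conclusion until the rest of the table is known.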

That’s All
● Questions?
● Office hour at 1:30pm