Data scientists are big data wranglers. They take an enormous mass of messy data points and use their formidable skills in math, statistics and programming to clean, massage and organize them. Then they apply all their analytic powers and domain knowledge to uncover hidden solutions to business challenges.
Script (modified) from http://www.mastersindatascience.org/careers/data-scientist/
A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
+ domain knowledge, business understanding
http://www.mastersindatascience.org/careers/data-scientist/
Diagram from https://www.quora.com/What-is-a-data-scientist-3
http://www.sintetia.com/wp-content/uploads/2014/05/Data-Scientist-What-I-really-do.png
[Diagram: data science workflow. Labels: DB, Log, SQL, Data, TXT / EXL, Visualization, Implement, Test & Deploy; [KR] Algorithm (Regression, Classification, Clustering); [Insight]; Value]
Data Science
Engineering vs. Science
BIG & FAST vs. SMART
Count & Trend vs. Predictive
Technical vs. Meaningful
Analytics
Volume, Variety, Velocity
Scientific Method
Proved by Theory: Algorithm (Equations)
Verified with Experiment: Testing (Evidence)
Netflix’s Weekly Test & Deployment
Weekly cycle (M T W T F S S): Code Release, Off-line Test, On-line Test, Deployment, Monitoring & Improvement
A/B Test Configuration
Traffic-driven: for every incoming request, if random() < 0.1, assign it to the treatment group (10%); otherwise, assign it to the control group (90%).
User-driven: for every requester, take 'NN', the last two digits of the userId. If 'NN' is in '00' ~ '09', assign the user to the treatment group; otherwise, assign the user to the control group.
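The two assignment rules above can be sketched in Python. This is a minimal illustration, not Netflix's implementation: the function names `assign_by_traffic` and `assign_by_user`, the constant `TREATMENT_RATE`, and the assumption that every userId is a string ending in two digits are all hypothetical.

```python
import random

TREATMENT_RATE = 0.1  # 10% treatment, 90% control, as on the slide


def assign_by_traffic(rng=random):
    """Traffic-driven: draw a fresh random number for every request."""
    return "treatment" if rng.random() < TREATMENT_RATE else "control"


def assign_by_user(user_id):
    """User-driven: decide from 'NN', the last two digits of the userId.

    Assumes (hypothetically) that user_id is a string ending in two digits.
    """
    suffix = user_id[-2:]
    if len(suffix) != 2 or not suffix.isdigit():
        raise ValueError("userId must end in two digits: %r" % user_id)
    # '00' to '09' is 10 of the 100 possible suffixes.
    return "treatment" if int(suffix) <= 9 else "control"


if __name__ == "__main__":
    print(assign_by_traffic())         # varies from request to request
    print(assign_by_user("user1205"))  # suffix '05' is in 00 to 09
    print(assign_by_user("user1210"))  # suffix '10' is outside 00 to 09
```

With the user-driven rule the same user always lands in the same group, whereas the traffic-driven rule can place the same user in different groups on different requests. The user-driven split is close to 10% only if the last two digits of the userIds are roughly uniformly distributed.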
A/B Test: Random assignment to Control Group or Treatment (A)
Multivariate Test: Random assignment to Control Group, Treatment A, Treatment B, or Treatment C
[Diagram: the data science workflow again. Labels: DB, Log, SQL, Data, Implement, Test & Deploy; [KR] Algorithm (Regression, Classification, Clustering); [Insight]; annotated with the figures 20, 60, 15, 5]