2015 Healthcare Data Science
Practical Data Science: The WPC Healthcare Strategy for Delivering Meaningful Data Science Projects
Damian Mingle @OPENDATASCI
Representative Clients
What’s the Problem?
A Common Scenario: Johnny Data Scientist
• Does not like working with others
• Too much black magic, not enough explanation
• His process is always different
• Uses multiple languages
• Hates producing presentations
• Project status is constantly unclear
• Doesn’t capture business needs
• Models aren’t production quality
Why Jupyter?
Interactive Computing Environment
• Notebook Web Application: Writing and running code interactively
• Kernels: Over 40 programming languages
• Notebook Documents: Self-contained documents which include live code, interactive widgets, plots, narrative text, equations, images, and video
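A notebook document is stored on disk as a JSON file (the .ipynb format), which is what makes it self-contained and shareable. The sketch below builds a minimal two-cell notebook with only the Python standard library. It assumes nbformat version 4.4, which does not require cell ids; the helper name `make_notebook` and the cell contents are hypothetical.

```python
import json

def make_notebook(markdown_text, code_text):
    """Return a minimal nbformat 4.4 notebook document as a Python dict (hypothetical helper)."""
    return {
        "cells": [
            # Narrative text lives in a markdown cell.
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            # Live code lives in a code cell; outputs stay empty until the cell is run.
            {
                "cell_type": "code",
                "execution_count": None,
                "metadata": {},
                "outputs": [],
                "source": code_text,
            },
        ],
        # The kernelspec tells Jupyter which language kernel to start for this document.
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,
    }

nb = make_notebook("# Exercise notes", "print('hello')")
text = json.dumps(nb, indent=1)  # save this string as a .ipynb file to open it in Jupyter
```

Swapping the kernelspec is how the same document format carries code in a different language.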
Why a Data Science Methodology?
Data Science Projects Involve Risk
• Strategically: Gives the business confidence that Data Science projects can be delivered profitably
• Tactically: Helps management understand project status assessments
• Operationally: Empowers the Data Science team to do the right thing, the right way, the first time
Business Understanding
Uncover Important Factors at the Start
• Determine business objectives
• Assess situation
• Determine data science goals
• Produce project plan
Understand the Data Science project objectives and requirements from a business perspective. Then convert this knowledge into a Data Science problem definition and preliminary plan designed to achieve the objective.
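One way to make the conversion from business objective to problem definition explicit is to record both side by side, together with the success criterion and a preliminary plan. The class below is a hypothetical sketch, not a template from the talk; the readmission example is illustrative only.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProblemDefinition:
    """Hypothetical record tying a business objective to a data science goal."""
    business_objective: str   # what the business wants, in its own terms
    data_science_goal: str    # the same objective restated as an analytical task
    success_criterion: str    # how the business will judge the result
    plan_steps: List[str] = field(default_factory=list)  # preliminary project plan

    def is_complete(self) -> bool:
        # Ready for review only when every element has been filled in.
        return all([self.business_objective, self.data_science_goal,
                    self.success_criterion, self.plan_steps])

plan = ProblemDefinition(
    business_objective="Reduce avoidable 30-day readmissions",
    data_science_goal="Rank discharged patients by predicted 30-day readmission risk",
    success_criterion="Care managers agree the top of the list is worth calling first",
    plan_steps=["Data Understanding", "Data Preparation", "Modeling",
                "Evaluation", "Deployment"],
)
print(plan.is_complete())
```

Leaving any field empty makes `is_complete()` return False, which flags a plan that skipped part of Business Understanding.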
Exercise 1
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Data Understanding
Become Familiar with the Data
• Collect initial data
• Describe data
• Explore data
• Verify data quality
Identify data quality problems, discover first insights into the data, and/or detect interesting subsets to form hypotheses regarding hidden information.
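The "describe data" and "verify data quality" tasks can start with a simple report of missing and implausible values per column. This is a minimal standard-library sketch; the function name `describe`, the plausible ranges, and the toy records are all hypothetical.

```python
from collections import Counter

def describe(rows, numeric_ranges):
    """Count missing and out-of-range values per column in a list of dict records.

    numeric_ranges maps a column name to its plausible (low, high) bounds.
    """
    missing = Counter()
    out_of_range = Counter()
    columns = sorted({col for row in rows for col in row})
    for row in rows:
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                missing[col] += 1
            elif col in numeric_ranges:
                low, high = numeric_ranges[col]
                if not low <= float(value) <= high:
                    out_of_range[col] += 1
    return {"n_rows": len(rows), "columns": columns,
            "missing": dict(missing), "out_of_range": dict(out_of_range)}

# Hypothetical raw extract: one missing age and one implausible age.
records = [
    {"patient_id": "p1", "age": "54", "los_days": "3"},
    {"patient_id": "p2", "age": "", "los_days": "2"},
    {"patient_id": "p3", "age": "212", "los_days": "5"},
]
report = describe(records, {"age": (0, 120), "los_days": (0, 365)})
print(report)
```

Counts like these are quality problems to resolve in Data Preparation, and they are worth noting as possible hypotheses too (for example, why ages are missing for some patients).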
Exercise 2
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Data Preparation
Construct the Final Dataset
• Select data
• Clean data
• Construct data
• Integrate data
• Format data
Data Science tasks in this phase include selecting tables, records, and attributes, as well as transforming and cleaning the data.
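The five preparation tasks often happen in a single pass over the records. The sketch below applies each one to two hypothetical tables (admissions and demographics); the table layouts, the `prepare` function, and the derived length-of-stay attribute are illustrative assumptions.

```python
from datetime import date

# Hypothetical source tables.
admissions = [
    {"patient_id": "p1", "admit": "2015-03-01", "discharge": "2015-03-04", "ward": "A"},
    {"patient_id": "p2", "admit": "2015-03-02", "discharge": None, "ward": "B"},
    {"patient_id": "p3", "admit": "2015-03-05", "discharge": "2015-03-06", "ward": "A"},
]
patients = {"p1": {"age": 54}, "p3": {"age": 71}}

def prepare(admissions, patients):
    prepared = []
    for row in admissions:
        # Clean: drop records that cannot yield a length of stay.
        if row["discharge"] is None:
            continue
        # Integrate: join demographics on patient_id; skip unmatched records.
        demo = patients.get(row["patient_id"])
        if demo is None:
            continue
        # Construct: derive length of stay in days from the two dates.
        los = (date.fromisoformat(row["discharge"]) - date.fromisoformat(row["admit"])).days
        # Select and format: keep only the attributes the model needs, as numbers.
        prepared.append({"patient_id": row["patient_id"], "age": demo["age"], "los_days": los})
    return prepared

dataset = prepare(admissions, patients)
print(dataset)
```

The `ward` attribute is dropped during selection; whether to keep it is a modeling decision that may send you back to this phase later.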
Exercise 3
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Modeling
Various Modeling Techniques Are Selected
• Select modeling technique
• Generate test design
• Build model
• Assess model
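In this phase, modeling techniques are selected and applied, and their parameters are calibrated to optimal values. Some techniques have specific requirements on the form of the data, so the Data Scientist may need to go back to the data preparation phase.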
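The key discipline in this phase is generating the test design before calibrating parameters, so the model is assessed on data it never saw. In this minimal sketch a one-parameter threshold model stands in for a real modeling technique; the data, the 30% holdout, and the function names are hypothetical.

```python
import random

def split(records, test_fraction=0.3, seed=0):
    """Test design: shuffle with a fixed seed, then hold out a test set."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def accuracy(records, threshold):
    """Share of (score, label) pairs where 'score >= threshold' agrees with the label."""
    hits = sum((score >= threshold) == label for score, label in records)
    return hits / len(records)

def calibrate(train):
    """Build the model: pick the threshold that maximizes accuracy on training data only."""
    candidates = sorted({score for score, _ in train})
    return max(candidates, key=lambda t: accuracy(train, t))

# Hypothetical data: a risk score and whether the outcome occurred.
data = [(i / 100, i / 100 >= 0.6) for i in range(100)]
train, test = split(data)
threshold = calibrate(train)
print(threshold, accuracy(test, threshold))  # assess on the untouched test set
```

Because calibration only sees `train`, the test accuracy is an honest estimate; tuning on the full dataset would overstate it.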
Exercise 4
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Evaluation
Review Your Steps with Certainty
• Evaluate results
• Review process
• Determine next steps
At the end of this phase, a decision on the use of the data science results should be reached.
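The decision is easier to defend when the success criteria from Business Understanding are written down as numbers and checked mechanically. This sketch compares holdout precision and recall against hypothetical minimums; the counts, thresholds, and decision wording are illustrative assumptions.

```python
def evaluate(tp, fp, fn, min_recall, min_precision):
    """Compare holdout confusion counts with success criteria agreed with the business."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    meets_criteria = precision >= min_precision and recall >= min_recall
    return {
        "precision": precision,
        "recall": recall,
        "decision": "proceed to deployment" if meets_criteria
                    else "iterate, or revisit the business objective",
    }

# Hypothetical holdout counts and success criteria.
result = evaluate(tp=40, fp=10, fn=20, min_recall=0.6, min_precision=0.7)
print(result["decision"])  # → proceed to deployment
```

Here precision is 40/50 = 0.8 and recall is 40/60 ≈ 0.67, so both criteria are met.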
Exercise 5
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Deployment
Make Use of the Model
• Plan deployment
• Plan monitoring and maintenance
• Produce final report
• Review project
This phase can be as simple as generating a report or as complex as implementing a repeatable data science process across the enterprise.
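For monitoring and maintenance, one common check is whether production inputs still look like the training data. The sketch below computes a population stability index (PSI), a drift metric chosen here for illustration rather than taken from the talk. The bin edges, the example values, and the 0.25 alert level (a widely used rule of thumb, not a standard) are assumptions.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges):
    """Share of values in each bin defined by sorted interior edges."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def psi(expected_values, actual_values, edges, eps=1e-6):
    """Population stability index between training (expected) and production (actual) data."""
    # Clamp empty bins to eps so the log ratio stays finite.
    e = [max(s, eps) for s in bin_shares(expected_values, edges)]
    a = [max(s, eps) for s in bin_shares(actual_values, edges)]
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

train_ages = [30, 45, 50, 65, 70, 85]   # hypothetical training-time values
batch_ages = [70, 75, 82, 85, 88, 90]   # hypothetical production batch
edges = [40, 60, 80]                    # hypothetical bin boundaries
drift = psi(train_ages, batch_ages, edges)
print(round(drift, 2), "investigate" if drift > 0.25 else "stable")
```

Every PSI term is non-negative, and identical distributions score exactly zero, so a scheduled job can alert whenever the value crosses the chosen threshold.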
Exercise 6
https://github.com/drmingle/Boston-Data-Festival-2015/tree/master/Exercises
Data Scientist 2.0
Lead Your Organization Analytically
• Use Jupyter to document your process in real time, using whatever language you want!
• Establish a comprehensive Data Science methodology
• Provide insights that help the organization make better decisions to solve its business problems
Have Questions?
E-mail: [email protected]
Twitter: @damianmingle
LinkedIn: DamianRMingle