2018 Predictive Analytics Symposium
Session 33: Commercializing a Data Science Model as Application Programming Interface (API) or Batch Service
Jeffrey Heaton, Ph.D. and Ed Deuser
September 2018
Commercializing a Data Science Model as API or Batch Service
2
Agenda
Intro
Operational Readiness
Model Methodology
Partnerships
Example
3
Intro
4
Presenters
Jeffrey Heaton, Ph.D. – Lead Data Scientist - RGA
Ed Deuser – Technical Architect and Developer - RGA
RGA Reinsurance Company
The security of experience. The power of innovation. www.rgare.com
Ed Deuser is a Technical Architect with RGA Reinsurance Company. In this role, Ed is responsible for technical solutions that support RGA’s global business units, including Valuation, Financial Solutions, Underwriting, and Global Research, Development and Analytics. He also served as the technical lead for B3i, the Blockchain Insurance Industry Initiative, and guides other digital objectives for RGA. In addition to his experience in the insurance sector, Ed has worked in financial services, government and law enforcement. Accomplished in the emerging field of distributed ledger technology, Ed has participated in RGA-sponsored hackathons as a coach and was part of the winning team at the Office of the National Coordinator (ONC) for Health Information Technology’s first-ever hackathon. Ed received his Bachelor of Science in Information Systems from the University of Missouri–St. Louis. His article “From R Studio to Real-Time Operations,” which he co-authored with RGA Lead Data Scientist Jeff Heaton, was published in the December 2017 issue of the Society of Actuaries’ Predictive Analytics and Futurism Section newsletter.
Jeff Heaton is a lead data scientist at Reinsurance Group of America (RGA), an adjunct instructor for the Sever Institute at Washington University, and the author of several books about artificial intelligence. Jeff holds a Master of Information Management (MIM) from Washington University and a Ph.D. in computer science from Nova Southeastern University. Over twenty years of experience in all aspects of software development allows Jeff to bridge the gap between complex data science problems and proven software development. Working primarily with the Python, R, Java/C#, and JavaScript programming languages, he leverages frameworks such as TensorFlow, Scikit-Learn, NumPy, and Theano to implement deep learning, random forests, gradient boosting machines, support vector machines, t-SNE, and generalized linear models (GLM). Jeff holds numerous certifications and credentials, such as the Johns Hopkins Data Science certification, Fellow of the Life Management Institute (FLMI), ACM Upsilon Pi Epsilon (UPE), and a senior membership with IEEE. He has published his research through peer-reviewed papers with the Journal of Machine Learning Research and IEEE.
5
Science is good, but how do my customers use it?
6
Operational Readiness
7
Operational Readiness
Readiness occurs throughout the project, most importantly when it starts.
End User Journey – Contract and Service Level Agreement (SLA)
Security is the first and last thing we think of.
Agreed-on patterns of use
• Batch
• Real Time
• Web
[Figure labels: Project Execution, Workload Reality, Project Inception, Project at Risk, Project Failure]
8
Contract Management
Clear Expectation Management in Contractual Terms
End User Journey and Expectations
Standard service level agreement (SLA) as the basis
End User's Journey to a Delivered Service Level Agreement (SLA)
9
Security in Depth
Should be the first and last thing we think of
Threat Modeling
• How could it be compromised?
• How to protect compromised sections?
Logging, Monitoring and Alerting
• Forensic logging of the item to be protected and where it is housed.
• Monitor and alert on suspicious activities and logs.
Pen Testing
• Contract with someone to ensure the item is protected.
“According to Microsoft, the potential cost of cyber-crime to the global community is a mind-boggling $500 billion, and a data breach will cost the average company about $3.8 million.”
10
API in English Please
API stands for Application Programming Interface.
What is an API?
[Diagram: a cohort of 100 records (id, gender, conditions) goes into the API, which computes scores and returns the cohort with a score per record (id, gender, conditions, score).]
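As a rough illustration of the idea on this slide, an API can be read as a function with an agreed input and output shape. The sketch below is hypothetical: the field names (id, gender, conditions, score) follow the slide, while `compute_score`, `score_cohort`, and the scoring rule are placeholders, not the presenters' model.

```python
import json


def compute_score(record):
    """Placeholder rule (an assumption, not a real model):
    a base of 1.0 plus 0.5 for each listed condition."""
    return 1.0 + 0.5 * len(record["conditions"])


def score_cohort(request_body):
    """Accept a JSON cohort and return the same cohort with a score per record."""
    cohort = json.loads(request_body)
    scored = [dict(record, score=compute_score(record)) for record in cohort]
    return json.dumps(scored)


# Two records standing in for the 100-record cohort on the slide.
request_body = json.dumps([
    {"id": 1, "gender": "F", "conditions": ["diabetes"]},
    {"id": 2, "gender": "M", "conditions": []},
])
response = json.loads(score_cohort(request_body))
```

The caller only needs to know the request and response shapes; how the score is computed stays behind the interface.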
11
Model Development Methodology
12
Model Development Methodology
• Model Scoping and Business Understanding
• Data Understanding
• Data Discovery and Enrichment
• Model Fitting / Validation
• Model Deployment
13
Input Format for Model
Clients tend to vary the format of input data during model development.
Columns provided might change.
Column names might change.
Date formats may not be consistent.
For an automated API, this format must become consistent.
For an API, data input must be very standardized
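One way to make a varying client format consistent is a small normalization step in front of the model. The sketch below assumes a hypothetical alias table (`COLUMN_ALIASES`) and a short list of accepted date layouts (`DATE_FORMATS`); both would have to be agreed with the client and are not taken from the presentation.

```python
from datetime import datetime

# Hypothetical mapping from column names a client might send to one canonical name.
COLUMN_ALIASES = {
    "id": "id", "member_id": "id",
    "gender": "gender", "sex": "gender",
    "dob": "birth_date", "birth_date": "birth_date", "date_of_birth": "birth_date",
}

# Date layouts accepted on input (an assumption); output is always ISO 8601.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y%m%d")


def normalize_date(text):
    """Try each accepted layout in turn and return YYYY-MM-DD."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def normalize_record(raw):
    """Rename columns to canonical names and standardize the date."""
    record = {}
    for name, value in raw.items():
        key = name.strip().lower().replace(" ", "_")
        if key not in COLUMN_ALIASES:
            raise ValueError(f"unexpected column: {name!r}")
        record[COLUMN_ALIASES[key]] = value.strip()
    missing = set(COLUMN_ALIASES.values()) - set(record)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    record["birth_date"] = normalize_date(record["birth_date"])
    return record


normalized = normalize_record({"Member ID": "17", "Sex": "F", "DOB": "03/15/1970"})
```

Rejecting unknown columns and unknown date layouts, rather than guessing, keeps an automated API from silently scoring misread data. A day-first layout such as 15/03/1970 is deliberately not accepted here because it is ambiguous with the month-first one.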
14
Use Excel as a Tool, Not a Format
Excel is a powerful data exploration tool for rapid analysis.
However, Excel can be a problematic data exchange format.
• Inability to specify export encoding (UTF-8, Unicode, etc.).
• Excel often mangles input by inferring data format, such as treating SNOMED codes as numbers.
• Different tools generate Excel files differently.
• Many more ways to confuse automated imports with Excel than CSV.
For tabular data, we prefer CSV (UTF-8)
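A minimal sketch of why explicit encoding and text-only fields matter. The file content and column names are made up for illustration, and the float conversion only stands in for what a type-inferring importer might do; it is not a reproduction of Excel's behavior.

```python
import csv
import io

# Illustrative file content; the 18-digit code value is for illustration only.
raw_bytes = (
    "id,code,description\n"
    "1,900000000000207008,café au lait spots\n"
).encode("utf-8")

# Decode explicitly as UTF-8 and let the csv module keep every field as text.
reader = csv.DictReader(io.StringIO(raw_bytes.decode("utf-8"), newline=""))
rows = list(reader)

code_as_text = rows[0]["code"]

# What a type-inferring importer might do: treat the code as a floating-point number.
# An 18-digit integer does not fit in a 64-bit float, so digits are lost.
code_as_number = int(float(code_as_text))

print(code_as_text)
print(code_as_number)
```

When reading from disk, the equivalent is `open(path, newline="", encoding="utf-8")` passed to `csv.DictReader`. Codes stay strings until something deliberately converts them.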
15
Input Format for Model
Input from the client is usually in JSON, XML, or CSV format.
For real-time APIs we prefer JSON/XML
• JSON and XML provide a hierarchical view of data.
• JSON and XML do not always easily fit into Excel.
For batch, we generally prefer CSV (sometimes Excel)
• CSV and Excel both store data in tabular format.
JSON, CSV, or XML?
16
The XML Format
Verbose and Hierarchical
17
The JSON Format
Concise and JavaScript-like
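The original slide examples are not in this text, so the sketch below substitutes a hypothetical record and serializes it both ways. The element and field names (`applicant`, `conditions`, `condition`) are assumptions chosen for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; field names are assumptions for illustration.
record = {"id": 1, "gender": "F", "conditions": ["diabetes", "asthma"]}

# JSON: the nested list maps directly onto a JSON array.
json_text = json.dumps(record)

# XML: every value needs an element, and the list needs a wrapper
# element plus one child element per item.
root = ET.Element("applicant")
ET.SubElement(root, "id").text = str(record["id"])
ET.SubElement(root, "gender").text = record["gender"]
conditions = ET.SubElement(root, "conditions")
for name in record["conditions"]:
    ET.SubElement(conditions, "condition").text = name
xml_text = ET.tostring(root, encoding="unicode")

print(json_text)
print(xml_text)
```

Both forms carry the same hierarchy; the XML version is longer because each value is wrapped in an opening and a closing tag.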
18
Data Discovery and Enrichment
Client input data usually will not contain all necessary information for a model.
• If identity of individual is known (PII), we might augment with:
  o 3rd party marketing data on individual.
  o 3rd party credit data on individual.
• If identity of individual is unknown (PII-less):
  o RGA severity scores for drugs or medical diagnosis.
  o RGA mortality tables.
Augmenting the input data with additional data sources
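For the PII-less case, enrichment can be sketched as a lookup against a reference table. The table below (`SEVERITY_BY_CONDITION`) and its values are invented for illustration and do not represent RGA severity scores; the derived feature names are likewise assumptions.

```python
# Hypothetical severity lookup keyed by condition name; values are made up.
SEVERITY_BY_CONDITION = {"diabetes": 3, "asthma": 2, "hypertension": 2}


def enrich(record, severity_table=SEVERITY_BY_CONDITION, default=0):
    """Return a copy of the record with severity features added.

    Conditions missing from the table fall back to `default`.
    """
    severities = [severity_table.get(c, default) for c in record["conditions"]]
    enriched = dict(record)
    enriched["max_severity"] = max(severities, default=default)
    enriched["total_severity"] = sum(severities)
    return enriched


enriched = enrich({"id": 1, "gender": "F", "conditions": ["diabetes", "asthma"]})
```

The input record is copied rather than modified, so the client's original data can still be logged or returned unchanged.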
19
Model Fitting
Model fitting is where a data scientist trains a model based on data.
Fitting is usually a very manual process that can go on for days, weeks, or months.
The final output from fitting is a model that can be deployed for client use.
Teaching a model from data
20
Model Deployment
How will your model be used?
• Will the model be used directly by individual human users?
• Will the model be integrated into a system developed by client’s IT?
• Will the model be used as part of a client’s mobile application?
• Will users upload files that a client will upload?
Manual steps from fitting must be automated.
Input data must be checked for errors.
Making your model available to clients
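Checking input for errors is one of the manual steps that has to become code before deployment. The sketch below assumes a hypothetical record layout (id, gender, conditions) and an assumed set of gender codes; the actual rules would come from the agreed input format.

```python
ALLOWED_GENDERS = {"F", "M", "U"}  # assumed code set, for illustration


def validate_record(record):
    """Return a list of problems; an empty list means the record is usable."""
    errors = []
    if not str(record.get("id", "")).strip():
        errors.append("id is missing")
    if record.get("gender") not in ALLOWED_GENDERS:
        errors.append(f"gender must be one of {sorted(ALLOWED_GENDERS)}")
    conditions = record.get("conditions")
    if not isinstance(conditions, list) or not all(
        isinstance(c, str) for c in conditions
    ):
        errors.append("conditions must be a list of strings")
    return errors


def split_batch(records):
    """Separate usable records from rejected ones so one bad row does not stop a batch."""
    accepted, rejected = [], []
    for position, record in enumerate(records):
        errors = validate_record(record)
        if errors:
            rejected.append({"position": position, "errors": errors})
        else:
            accepted.append(record)
    return accepted, rejected


accepted, rejected = split_batch([
    {"id": 1, "gender": "F", "conditions": ["asthma"]},
    {"id": "", "gender": "X", "conditions": "asthma"},
])
```

Returning the rejected rows with their position and reasons gives the client something actionable, which matters more in batch use than in a single real-time call.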
21
Personally Identifiable Information (PII) and Data Retention
Some input data contains PII; some does not.
Some clients request us to retain no data.
We prefer to keep some data.
We usually do not store PII data on the model side.
What data should we retain? (and where)
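One possible way to keep some data without keeping PII is to drop identifying fields and replace the client identifier with a keyed hash before anything is stored. This is an assumption for illustration, not the presenters' stated method; the field list (`PII_FIELDS`) and the function name are hypothetical.

```python
import hashlib
import hmac

PII_FIELDS = {"name", "ssn", "birth_date", "address"}  # assumed list, for illustration


def to_retained_record(record, secret_key):
    """Drop PII fields and replace the client id with a keyed hash.

    The keyed hash lets records for the same id be linked later
    without storing the id itself on the model side.
    """
    retained = {
        key: value
        for key, value in record.items()
        if key not in PII_FIELDS and key != "id"
    }
    digest = hmac.new(secret_key, str(record["id"]).encode("utf-8"), hashlib.sha256)
    retained["record_key"] = digest.hexdigest()
    return retained


source = {
    "id": 42,
    "name": "Jane Doe",
    "ssn": "000-00-0000",
    "gender": "F",
    "conditions": ["asthma"],
}
retained = to_retained_record(source, b"demo-key")
```

A keyed hash is pseudonymization rather than anonymization: whoever holds the key and the original ids can still re-link records, so where the key lives is part of the retention decision.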
22
Ongoing Model Validation
Client data distributions can change over time.
Baseline truth can change.
Models must be evaluated over time to ensure they remain relevant.
Calibration is an ongoing process.
Keeping the model relevant
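The presentation does not name a drift metric. As one common choice, the sketch below computes a population stability index (PSI) between a baseline sample and a current sample of a single input; the bin edges and sample values are invented for illustration, and both samples are assumed to be non-empty.

```python
import math


def population_stability_index(baseline, current, edges):
    """PSI between two samples using shared, sorted bin edges.

    PSI = sum over bins of (q - p) * ln(q / p), where p and q are the
    baseline and current proportions in each bin.
    """
    def proportions(values):
        counts = [0] * (len(edges) + 1)
        for value in values:
            # Bin index = number of edges the value meets or exceeds.
            counts[sum(value >= edge for edge in edges)] += 1
        # A small floor keeps the logarithm defined when a bin is empty.
        return [max(count / len(values), 1e-6) for count in counts]

    p = proportions(baseline)
    q = proportions(current)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


ages_at_fit = [25, 35, 45, 55]
ages_now = [55, 65, 75, 85]
edges = [30, 50, 70]

psi_same = population_stability_index(ages_at_fit, ages_at_fit, edges)
psi_shifted = population_stability_index(ages_at_fit, ages_now, edges)
```

Running a check like this on a schedule turns "models must be evaluated over time" into an alert: a value near zero means the input still looks like the fitting data, and a growing value is a prompt to recalibrate.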
23
Partnerships
24
Know your strengths
Types of partnerships:
• Internal
“Partnering with different parts of your organization”
• External
“e.g. staff augmentation, client partner (e.g. RGA)”
Partnerships in Place to Ensure Success
Questions to ask:
• Do you have data scientists in your organization?
• Are you experienced in cloud deployments?
• Can you sustain the DevOps practice?
• Do you understand where your attack vectors are?
25
Example Commercialization
26
Commercialization example
EXAMPLE. models
SwaggerHub – Create an API first (what's on the menu).
Upload the API to API Gateway on AWS.
Pre-templated Node.js Lambda to compute the score on a cohort.
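The presenters' template is a Node.js Lambda. The sketch below is a Python analogue, not their code, written in the shape used by API Gateway's Lambda proxy integration: the request arrives as a string in `event["body"]`, and the function returns `statusCode`, `headers`, and a string `body`. The scoring rule is a placeholder.

```python
import json


def compute_score(record):
    """Placeholder rule standing in for the real model (an assumption)."""
    return 1.0 + 0.5 * len(record.get("conditions", []))


def handler(event, context):
    """Score a cohort sent as a JSON array in the request body."""
    try:
        cohort = json.loads(event.get("body") or "")
        if not isinstance(cohort, list):
            raise ValueError("body must be a JSON array of records")
        scored = [dict(record, score=compute_score(record)) for record in cohort]
        status, payload = 200, scored
    except (ValueError, TypeError, AttributeError) as exc:
        # Malformed JSON or records of the wrong shape become a client error.
        status, payload = 400, {"error": str(exc)}
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }


# Local call with a hand-built event; no AWS account is needed to try it.
ok = handler(
    {"body": json.dumps([{"id": 1, "gender": "F", "conditions": ["a", "b"]}])},
    None,
)
bad = handler({"body": "not json"}, None)
```

Because the handler is an ordinary function, it can be exercised locally with hand-built events before it is uploaded behind the gateway.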
27
Questions
28
Appendix
29
Resources to use for creating your own API
Disclaimer:
The resources provided are intended for educational purposes only and do not replace independent professional judgment. Statements of fact and opinions expressed are those of the participants individually and, unless expressly stated to the contrary, are not the opinion or position of Reinsurance Group of America, its cosponsors, or its committees. Reinsurance Group of America does not endorse or approve, and assumes no responsibility for, the content, accuracy or completeness of the information presented. The above resources do not provide all recommended security measures; because appropriate security measures are not provided, use the resources freely at your own risk.
https://github.com/eddeuser2017/commercialize_api