
Page 1: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

JANUARY 2019

1

PMSA Winter SymposiumUncovering the Machine Learning Black Box

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Page 2: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

• Log on to https://meet.ps/pmsa

• Start the polls

• Start questions

• Broadcast outputs

2

For meeting pulse

Page 3: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

For our discussion today…

• Machine learning: What is it and why do it?

• The history and future of learning

• Uncovering the machine learning “black box”
• The process

• When the output does not look so good: Troubleshooting issues

• Reliable and repeatable output

• Machine Learning road map

(case studies incorporated into deck)

3

Page 4: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What is your experience with machine learning?

1. I am interested in machine learning but have not managed or executed machine learning projects yet

2. We have been using machine learning in other parts of our organization and I would like to start using these techniques for our department

3. I started using machine learning within the last 12 months

4. I have been a data scientist and practitioner for over a year

4

Page 5: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Machine Learning, What is it and why do it?

5

Page 6: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

A. The practice of using algorithms to parse data, learn from it, and then make a determination or prediction

B. The science of getting computers to act without being explicitly programmed

C. The science of using algorithms to take over human decision making and ultimately jobs like making cocktails, driving cars and eventually the practice of medicine

D. The science of getting computers to learn as well as humans do or better

E. All the above

F. A, B and D

6

What is machine learning?

Page 7: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What is machine learning?

7

An example from Amazon.com

• Machine learning algorithms allow retailers to provide recommendations based on previous search and purchase activity

• The model algorithms use prior history to make predictions

• Machine learning models make recommendations based on patient characteristics, previous medical history and treatment activity

In the case of healthcare: predictions that will improve patient health outcomes


Recommendations for Karin

Page 8: PMSA Winter Symposium Uncovering the Machine Learning

© 2014 Symphony Health Solutions. All Rights Reserved. 8

Why do machine learning?

The holy grail in health care is not fancier technology and tools, it is physician and patient behavior change. Machine learning will truly come of age when it can systematically and reliably do one of two things – improve the decision-making of clinicians and patients or improve their efficiency in carrying out the actions that follow from those decisions.

8

Jean Drouin, M.D., Founder and CEO, Clarify Health Solutions

https://www.academyhealth.org/blog/2018-02/breaking-through-hype-health-care-what-can-machine-learning-really-do-your-patients

Page 9: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

❏ Quicker diagnoses, better treatment plans, and new approaches to insurance

❏ In the US: $300B in possible savings from population health forecasting

❏ In the UK: £3.3B in possible savings from preventative care and reductions in non-elective hospital admissions

❏ 30-50% improvement in nurse productivity

❏ ≤2% of GDP saved through operational efficiencies in developed countries

❏ 5-9% reduction in health expenditure by tailoring treatments and keeping patients engaged

❏ $2-10T in savings globally by tailoring medications and treatments

❏ +0.2-1.3 years added to average life expectancy

9

Projected benefits of adopting AI in healthcare

https://www.mckinsey.com/~/media/mckinsey/industries/advanced%20electronics/our%20insights/how%20artificial%20intelligence%20can%20deliver%20real%20value%20to%20companies/mgi-artificial-intelligence-discussion-paper.ashx

Page 10: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 10

The generation of healthcare “big data” sources and machine learning have advanced pharma R&D and commercialization

The leading 10 disease types considered in the artificial intelligence (AI) literature

Explosion of highly disparate health data sources being collected

Data source categories: GENETIC, WEARABLES, DIET & EXERCISE, EMR, CLAIMS, LABS, MICROBIOMIC

Top machine learning algorithms used in medical literature

Jiang F, Jiang Y, Zhi H, et al. Artificial intelligence in healthcare: past, present and future. Stroke and Vascular Neurology 2017;2:e000101. doi:10.1136/svn-2017-000101

Page 11: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 11

Uncovering the machine learning black box

UNPACKING A DECISION TREE1

Page 12: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 12

Uncovering the machine learning black box

Bigram analysis & word cloud to show associations2

Bigram | % Score
('child', 'health') | 2.31%
('need', 'prophylactic') | 1.97%
('prophylactic', 'vaccination') | 1.94%
('vaccination', 'inoculation') | 1.90%
('otitis', 'medium') | 1.46%
('infant', 'child') | 1.26%
('attention', 'deficit') | 1.01%
('respiratory', 'infection') | 0.95%
('examination', 'abnormal') | 0.86%
('allergic', 'rhinitis') | 0.80%
('acute', 'respiratory') | 0.73%
('encounter', 'immunization') | 0.68%
('health', 'examination') | 0.64%
('spontaneous', 'rupture') | 0.45%
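As an illustration of how such bigram scores can be produced, here is a minimal sketch (an assumption about the general approach, not the code behind this slide) that tokenizes diagnosis-description text and reports each adjacent word pair's share of all bigrams:

    from collections import Counter

    # Hypothetical diagnosis-description snippets; the real input would be
    # text drawn from patient claims data.
    docs = [
        "need prophylactic vaccination inoculation",
        "routine child health examination",
        "acute respiratory infection",
    ]

    bigram_counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        bigram_counts.update(zip(tokens, tokens[1:]))

    total = sum(bigram_counts.values())
    for bigram, count in bigram_counts.most_common(10):
        print(bigram, f"{100 * count / total:.2f}%")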

Page 13: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 13

Uncovering the machine learning black box

Common Drugs Prescribed to Patients with Zellweger Syndrome & Peroxisomal Disorder

Drug Name | # Counts
HYDROCORTISONE | 1091
LEVETIRACETAM | 901
MEPHYTON | 869
FLUDROCORTISONE ACETATE | 637
RANITIDINE HCL | 305
GABAPENTIN | 292
BACLOFEN | 258
ALBUTEROL SULFATE | 252
LEVOTHYROXINE SODIUM | 238
FLUTICASONE PROPIONATE | 238
CLONAZEPAM | 235
OMEPRAZOLE | 226
DIAZEPAM | 223
MONTELUKAST SODIUM | 214
AMOXICILLIN | 211

Network visualizations to show associations between diagnosis & concomitanttherapies

3

Page 14: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 14

Patient Finding
Case Study 1: Leveraging Machine Learning to Identify Physicians with a Rare Lipid Disorder

Page 15: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 15

Case 1 - Working with a manufacturer of a rare disease drug to identify undiagnosed patients who

are likely to have the rare lipid disorder

Model Patients: 106 Brand X patients with full patient histories and no exclusionary diagnoses

Rx Model: Separate Brand X patients from the control population based on historical prescription information

Dx Model: Separate Brand X patients from the control population based on historical diagnosis information

Px Model: Separate Brand X patients from the control population based on historical procedure information

Optimized Model: Separate Brand X patients from the control population based on historical diagnosis, procedure, prescription, and demographic information

Page 16: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 16

Multiple models are built with varying methodologies

Tree Model: 84% positive predictive value; overall error 24% (FN: 10%, FP: 14%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.49   |  0.10   | 0.16
Control            |  0.14   |  0.27   | 0.35

Random Forest Model: 95% positive predictive value; overall error 8% (FN: 6%, FP: 2%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.52   |  0.06   | 0.11
Control            |  0.02   |  0.40   | 0.04

Boost Model: 97% positive predictive value; overall error 13% (FN: 8%, FP: 5%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.50   |  0.08   | 0.14
Control            |  0.05   |  0.36   | 0.12
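For reference, here is a minimal sketch showing how the positive predictive value and overall error quoted above can be derived from such a confusion matrix (values taken from the Random Forest matrix on this slide; the layout assumption is rows = actual, columns = predicted):

    import numpy as np

    # Rows = actual (Brand X, Control); columns = predicted (Brand X, Control).
    cm = np.array([[0.52, 0.06],
                   [0.02, 0.40]])

    tp, fn = cm[0, 0], cm[0, 1]
    fp, tn = cm[1, 0], cm[1, 1]

    ppv = tp / (tp + fp)     # positive predictive value (precision)
    overall_error = fn + fp  # share of all patients misclassified

    print(f"PPV: {ppv:.0%}, overall error: {overall_error:.0%}")  # roughly 96% and 8%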

Page 17: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 17

Patient probability thresholds ensure precise deployment

Page 18: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 18

Case Study 1 – factors found in modeling can provide new clinical insights

Age is a critical demographic factor in recognizing Brand X patients.

Diabetes diagnoses and treatment play a major role in identifying Brand X patients.

The lipodystrophy diagnosis, as well as diagnoses and treatments for high triglycerides, plays a major role.

Pain-killers may be prescribed for joint pain, while vaccinations may be a proxy for age. Lisinopril treats hypertension and congestive heart failure.

Diagnosis/Prescription/Procedure/Demographic Variable | Codes of Interest
Lipodystrophy | 272.6
Pure Hyperglyceridemia | 272.1
Fibric Acid Derivatives |
Age |
Diabetes Mellitus | 250.00-250.99
Diabetes Accessories |
HUMAN INSULIN, ANALOG LONG ACTING |
BIGUANIDES, ALONE |
Metformin HCl |
Hydrocodone acetaminophen |
Immunization administration (1 vaccine) | 90471
Lisinopril |

Page 19: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Resolving Diagnosis Ambiguities: Forecasting
Case Study 2: Leveraging Machine Learning to Distinguish Between T1 and T2 Diabetes

19

Page 20: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 20

Case Study 2 – Machine learning resolves “ambiguous” diabetes patients by assigning them to Type-1 or Type-2 status prior to projection

Inductive Modeling

Deductive Rule Block

Validation & Scoring

Inductive Models were developed across algorithms, patient samples, and variable sets

Deductive approaches were used to identify a set of consensus business rules

Models validated against known T1 & T2 patients not used in training, followed by scoring of the “ambiguous” patients to assign them to T1 or T2 with known probabilities

Ambiguous Diabetes Patient Typing & Projection

OBJECTIVE
• The CHALLENGE with patient-level diagnosis data is that, for various reasons, patients have confounding diabetes diagnoses: many Type 1 patients also have a diagnosis for Type 2, and vice versa.
• The SOLUTION: Develop a robust algorithm that accurately identifies patients as Type 1 or Type 2 based on their demographics and medical history.

Page 21: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 21

Case Study 2 – Deductive methods require specific defined rules based on research and knowledge of disease space

AMBIGUOUS PATIENTS THAT HAVE MORE THAN 1 DIAGNOSIS OF BOTH T1 AND T2: 2,681,370
• More than 4 diagnoses: 2,332,786
• Fewer than 4 diagnoses: 348,584

INDUCTIVE: Ability to identify patterns and hidden relationships

DEDUCTIVE: Four business rules, with purity of 97.2%, 98.6%, 97.4% and 98.8% respectively

RULE 1 (TYPE 1): Patient is LESS THAN 28 years of age; patient HAS BEEN using ‘Human Insulin, Analog Fast Acting, VIAL’

RULE 2 (TYPE 1): Patient is LESS THAN 30 years of age; patient HAS NOT BEEN using Metformin and Type II diabetes drugs; patient HAS BEEN using some form of insulin

RULE 3 (TYPE 2): Patient HAS NOT been provided with an ‘Infusion Pump’; patient HAS BEEN diagnosed with hypertension; patient HAS BEEN using Metformin and Type II diabetes drugs

RULE 4 (TYPE 2): Patient is MORE THAN 30 years of age; patient HAS NOT BEEN provided with an ‘Infusion Pump’; patient HAS BEEN using Metformin and Type II diabetes drugs; patient HAS NOT BEEN using any form of insulin

Patient counts shown on the slide: 52,202 / 69,583 / 56,293 unique patients classified as T1 via the deductive rules; 1,404,443 / 1,383,843 unique patients classified as T2 via the deductive rules; 385,688 patients classified. (A sketch of applying such a rule in code follows below.)
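To make the deductive step concrete, here is a minimal sketch (an illustration, not Symphony Health's production code) of applying a business rule such as Rule 4 to a patient record; the field names are hypothetical:

    # Hypothetical patient record; field names are illustrative only.
    patient = {
        "age": 54,
        "has_infusion_pump": False,
        "uses_metformin_and_t2_drugs": True,
        "uses_any_insulin": False,
    }

    def rule_4_type2(p):
        """Rule 4 (Type 2), as described on this slide."""
        return (
            p["age"] > 30
            and not p["has_infusion_pump"]
            and p["uses_metformin_and_t2_drugs"]
            and not p["uses_any_insulin"]
        )

    if rule_4_type2(patient):
        print("Classified as Type 2 by Rule 4")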

Page 22: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 22

Multiple inductive models are built with varying methodologies (ensembles)

DX + PX + RX Model: T1 84% positive predictive value, T2 86% positive predictive value (FN: 8%, FP: 7%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.43   |  0.08   | 0.16
Control            |  0.07   |  0.43   | 0.14

DX + PX Model: T1 80% positive predictive value, T2 79% positive predictive value (FN: 10%, FP: 11%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.40   |  0.10   | 0.20
Control            |  0.11   |  0.40   | 0.21

DX + RX Model: T1 80% positive predictive value, T2 84% positive predictive value (FN: 10%, FP: 8%)

Actual \ Predicted | Brand X | Control | Error
Brand X            |  0.41   |  0.10   | 0.20
Control            |  0.08   |  0.41   | 0.16

Page 23: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 23

Inductive models are used to further classify “ambiguous” patients as Type 1 vs Type 2 diabetics

AMBIGUOUS PATIENTS THAT HAVE MORE THAN 1 DIAGNOSIS OF BOTH T1 AND T2: 2,681,370
• More than 4 diagnoses: 2,332,786
• Fewer than 4 diagnoses: 348,584

DEDUCTIVE → INDUCTIVE

Model                  | DX + PX + RX | DX + PX | DX + RX
T1 Patients Identified |   490,731    | 103,513 | 136,692
T2 Patients Identified |   107,721    | 107,738 | 186,543
Total Unique Patients  |   598,452    | 211,251 | 323,235

DX + PX + RX Model: T1 84% positive predictive value, T2 86% positive predictive value
DX + PX Model: T1 80% positive predictive value, T2 79% positive predictive value
DX + RX Model: T1 80% positive predictive value, T2 84% positive predictive value

CONCLUSION: Final proportions after attribution: 14% Type-1, 86% Type-2

Page 24: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Resolving Diagnosis Ambiguities: Projection
Case Study 3: Leveraging Machine Learning to Correctly Identify Cancer Sub-Types

24

Page 25: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 25

Case Study 3 - SHS investigated a modeling approach to reassign a clinical “catch-all” designation back to known good tumor type designations

Current: Following JBI/PCYC protocols all DLBCL2 would be assigned to DLBCL

The clinical difficulties in distinguishing between DLBCL, FL and MZL could make business rule allocations hard to establish

Alternative: DLBCL2 re-allocation via Machine Learning back to known good categories

• 5-6 candidate data matrices

• ~15,000 variables

Inputs: SHA Patient Journey Data; interim DLBCL2 patient indication (CLL*, DLBCL*, FL*, MCL*, MZL*, WM*)

Machine Learning → application of clinical expertise as meta-rules/business rules → final DLBCL2 patient indication assignments

Modeling passes (1st, 2nd, 3rd): dimension reduction & variable selection down to a low-dimensional set (11 variables), then comparative model selection & validation, giving a greater range of options for error reduction

Page 26: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 26

Case Study 3 – Machine learning was leveraged to re-assign patients with an ambiguous ICD-9 code in order to enhance the accuracy of projections

Overall NHL patients across indications (N = 169,849):
DLBCL2: 76,400 | MCL: 4,360 | CLL: 56,176 | WM: 4,752 | FL: 15,981 | DLBCL: 6,105 | MZL: 6,075

Re-allocate the 76,400 DLBCL2 patients across tumor types. Training results (number of patients & share):
MCL: 872 (3.97%) | CLL: 1,093 (5.35%) | WM: 343 (1.58%) | FL: 16,352 (75.16%) | DLBCL: 1,543 (8.73%) | MZL: 1,207 (5.21%)

CONCLUSION
• The model does well at re-allocating DLBCL2 patients into MCL, CLL and WM.
• However, it is more prone to error when re-allocating DLBCL2 patients to FL, DLBCL and MZL; this echoes the clinical difficulty in discerning between these tumor types.
• The majority of the re-allocated DLBCL2 patients go to MZL and FL, suggesting a business rule may be needed to supplement the model.

Page 27: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Machine learning allows us to evaluate large numbers of patient characteristics to identify the factors that lead to the most promising cohorts of patients, pre-diagnosis.

Attribution from high-probability patients to target physicians provides an opportunity to focus on the best market, rather than a market focus that might be less effective because it misses key doctors of interest.

The approach allows for optimal and efficient allocation of resources as well as rolling updates of the patient-to-physician target market over time.

27

Case study conclusions

Page 28: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Steps in a machine learning project

1. Define the business question

2. Define the cohort(s) to be used

3. Gather the data to be used for the model

4. Prepare the data

5. Choose a model

6. Determine the method for variable reduction

7. Training

8. Validation

9. Parameter tuning

10. Prediction

28

Page 29: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

The Process

29

Page 30: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Define the Business question

• Does your business problem require machine learning?
• Some problems may be solved by straightforward automation, for example filtering out bad data such as negative values in the input
• Problems with fewer known variables can be solved with a statistical approach
• On the other hand, sentiment analysis of doctor notes may require the application of machine learning

• What are some business problems suited to machine learning methods?
• Problems that require prediction rather than causal inference, i.e. we care more about accurate predictions than about explaining how certain aspects of the data relate to each other
• Problems that are relatively self-contained: we are confident that the data we feed to our machine learning model includes everything relevant to the problem

30

Page 31: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What types of problems are best suited for a machine learning model?

31

Well suited for machine learning:

• Diagnosis: rare diseases, diseases with vague symptoms, complicated multi-organ conditions, diseases that go undiagnosed or misdiagnosed for many years

• Product recommendations: find characteristics of patients or prior treatment that suggest a recommendation for a specific product or therapeutic class

• Forecasting: defining the eligible patient universe

• Image recognition: abnormal findings in imaging studies or pathology

• Disease progression: determining line-of-therapy progression

• Segmentation and targeting: grouping practitioners or patients based on similar disease or treatment behavior

Less suited for machine learning:

• Any problem where there are only a few known variables

• Where simple correlations and/or inferential statistics will answer the question

• Forecasting a mature market with no anticipated market changes or events

• Diagnosing a disease that is commonly tested for, with clear test results

• When you can see the answer using descriptive statistics

• Where key variables are missing or not available in the data

Page 32: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Define the cohorts to be used

• Training set
• A set of examples used to build our prediction algorithm(s)
• Roughly 60% of the dataset is set aside for this purpose

• Cross-validation set
• This dataset is used to test the performance of the algorithms built on the training set
• We generally pick the algorithm(s) with the best performance; this test allows us to choose between the models
• 20% of the original dataset is set aside for this purpose

• Test set
• We apply our chosen algorithm(s) to see how it performs on unseen data; this test tells us how well we have done
• 20% of the original dataset is set aside as the test set

(A minimal sketch of this 60/20/20 split follows below.)
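As an illustration, a minimal sketch of the 60/20/20 split using scikit-learn (assuming a feature matrix X and labels y have already been prepared):

    from sklearn.model_selection import train_test_split

    # Carve out 60% for training, then split the remaining 40% evenly
    # into cross-validation and test sets (60/20/20 overall).
    X_train, X_rest, y_train, y_rest = train_test_split(
        X, y, train_size=0.6, random_state=42, stratify=y)
    X_cv, X_test, y_cv, y_test = train_test_split(
        X_rest, y_rest, test_size=0.5, random_state=42, stratify=y_rest)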

32

Page 33: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What are the best practices in choosing the cohorts?

33

• Include an appropriate mix of the dependent variable and independent variables in the cohort. For example, in a study predicting coffee taste, the dependent variable is the taste rating, while independent variables include region of growth, color, roast type, weight, the plantation where it was grown, etc.

• Choose training sets of appropriate size; the size has to be just right

• The number of features used should be no more than double the number of records

• Use a 60-40 or 70-30 rule when splitting between training and validation/test sets

• The training dataset should be representative of the problem statement

• Skipping the validation dataset because of too few records is not recommended

• The validation/test dataset should be representative of future data we have not yet seen

• When using time-series data, the validation/test dataset should not be drawn at random; it is better to use the earlier data for training and the later data for testing

Page 34: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Gather the data

• Identify the various data sources

• Build an automated or semi-automated process to select the data

• The quality and quantity of the data determine how good our predictions will be

34

Page 35: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Data Preparation

Missing data: Massive amounts of data are available today; however, as we acquire data for a study, we may find that several data points pertinent to the project are missing. For example, in a patient study for a disease, we may find that not all of a patient's history is available, which may lead to excluding that patient from the study or to other design considerations.

Improper format or standard: Machine learning projects by nature use data not only from several sources (multiple vendors providing data) but also from various data sets (e.g. prescription claims, hospitalization records, diagnosis claims, lab procedures). Often the data is still not “clean” enough even after downstream data-warehousing processes; it typically requires proper formatting and standardization before use.

Feature engineering: An important piece of any machine learning project is feature engineering, i.e. using domain knowledge to create or extract features that make machine learning algorithms work. For example, we may encode a time-of-day attribute as sine and cosine features that an algorithm can use (see the sketch below).
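A minimal sketch of the time-of-day example, assuming hours are available as integers 0-23:

    import numpy as np

    hours = np.array([0, 6, 12, 18, 23])  # hypothetical time-of-day values

    # Encode the cyclical hour-of-day so that 23:00 and 00:00 end up
    # close together in feature space.
    hour_sin = np.sin(2 * np.pi * hours / 24)
    hour_cos = np.cos(2 * np.pi * hours / 24)

    features = np.column_stack([hour_sin, hour_cos])
    print(features.round(3))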

35

Page 36: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Types of Models:

36

Type of ML Problem | Description | Example
Classification | Pick one of N labels | Cat, dog, horse, or bear
Regression | Predict numerical values | Click-through rate
Clustering | Group similar examples | Most relevant documents (unsupervised)
Association rule learning | Infer likely association patterns in data | If you buy hamburger buns, you're likely to buy hamburgers (unsupervised)
Structured output | Create complex output | Natural language parse trees, image recognition bounding boxes
Ranking | Identify position on a scale or status | Search result ranking

Page 37: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Supervised Learning Use Case

• Patient propensity scoring using a classification model

• Use labelled input (disease) and features (therapy, diagnosis, age, gender, procedures) over 3 years

• SVM model with a linear kernel

• Given an individual with these features, the model produces a probability score indicating the likelihood of disease (see the sketch below)
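A minimal sketch of this kind of propensity model in scikit-learn (illustrative only, not the production pipeline; X_train/y_train hold prepared patient features and disease labels, and X_new holds patients to be scored):

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Linear-kernel SVM with probability estimates enabled so each patient
    # can be scored with a likelihood of disease.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear", probability=True))
    model.fit(X_train, y_train)

    propensity = model.predict_proba(X_new)[:, 1]  # probability of disease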

37

Page 38: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Unsupervised Learning Use Case

• Disease prediction with K-means clustering for under-diagnosed/mis-diagnosed ailments

• Given a set of ICD-10 codes for patients that are essentially unlabeled (they cannot be tagged as a diagnosis for the disease), use K-means clustering to predict disease

• The choice of cluster centers is critical to the quality of the outcomes

• The model may be expanded to identify similar patients based on their attributes, to optimize costs and intervene early with therapy (see the sketch below)
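A minimal K-means sketch, assuming each patient is represented as a binary indicator vector over ICD-10 codes (illustrative only):

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical patient-by-ICD-10-code indicator matrix (1 = code present).
    X = np.array([
        [1, 0, 1, 0],
        [1, 1, 1, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 1],
    ])

    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
    print(kmeans.labels_)           # cluster assignment per patient
    print(kmeans.cluster_centers_)  # centers drive interpretation of each cluster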

38

Page 39: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Reinforcement Learning Use Case

39

Develop a Markov model to study patient transition through the system

What is the likely next step for the patient?

What is the optimal next step for the patient?
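A minimal sketch of such a Markov transition model over patient states (the states and probabilities below are hypothetical):

    import numpy as np

    states = ["diagnosis", "first_line_tx", "second_line_tx", "discontinued"]

    # Hypothetical transition probabilities; each row sums to 1.
    P = np.array([
        [0.0, 0.8, 0.0, 0.2],
        [0.0, 0.1, 0.6, 0.3],
        [0.0, 0.0, 0.5, 0.5],
        [0.0, 0.0, 0.0, 1.0],
    ])

    current = states.index("first_line_tx")
    next_step = states[int(np.argmax(P[current]))]  # most likely next step
    print(next_step)  # "second_line_tx" under these assumed probabilities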

Page 40: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Variable Reduction

• Selecting the relevant variables to use is important for any model to predict with acceptable accuracy.

• For example, if we consider a patient diagnosed with COPD and look at all of their diagnosis history, we may find several events that are not relevant to the study, such as a doctor encounter for a strep infection that should not be considered in a COPD study.

• Some techniques used for variable reduction include: exploratory factor analysis (statistical), principal component analysis (unsupervised), and correlation analysis (association). A PCA sketch follows below.
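A minimal PCA sketch for shrinking a wide claims-derived feature matrix (assuming X is a numeric patient-by-feature matrix that has already been assembled):

    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    X_scaled = StandardScaler().fit_transform(X)

    # Keep enough components to explain about 95% of the variance.
    pca = PCA(n_components=0.95)
    X_reduced = pca.fit_transform(X_scaled)

    print(X_reduced.shape, pca.explained_variance_ratio_.sum())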

40

Page 41: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Training the model

• Training the model means providing the algorithm with training data to learn from.

• The training data contains the target attribute we are interested in predicting.

• The algorithm finds patterns in the training dataset and outputs an ML model that captures this experience.

41

Page 42: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Validating Algorithms and troubleshooting issues:

42

The goal of testing the model: maximize prediction performance not just on the training data set but on new data points

Page 43: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 43

How well did the model work?

Underfitting: The model was not able to make a reliable prediction in the training/test data set or in new data.

Just right: The model predicts well in the training and test sets and with new data.

Overfitting: The model fits the “noise” or some characteristic of the training dataset and does not predict as well on new data points.

1. What type of performance issue is it? (Comparing training and cross-validated scores helps; see the sketch below.)
2. Determine the underlying issue…
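A minimal sketch of that diagnostic, comparing training and cross-validated accuracy to spot under- vs over-fitting (assuming a candidate model and X_train/y_train are already defined):

    from sklearn.model_selection import cross_val_score

    train_score = model.fit(X_train, y_train).score(X_train, y_train)
    cv_score = cross_val_score(model, X_train, y_train, cv=5).mean()

    # Both scores low            -> likely underfitting
    # Train high, CV much lower  -> likely overfitting
    # Both high and close        -> "just right"
    print(f"train: {train_score:.2f}  cv: {cv_score:.2f}")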

Page 44: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What does good look like…

Performing additional validation:

• Measure outcomes using historical data

• Pilot: Measure outcomes using prospective data

• Review key variables in the new data to validate that the model works with a different set of data

• Evaluate patient journey

44

The model looks “Just right”

Page 45: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 45

Machine Learning Output…

Please insert pictures of examples…

-Overfitting

-Underfitting

-Data looks good but too many false positives…

-Other examples

Page 46: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

A. To optimize the algorithms within the training and test samples

B. To optimize the algorithms with new data

C. Both A and B above

D. None of the above

46

What is the objective of tuning or in some cases “overhauling” a machine learning model?

Page 47: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Data attributes that differentiate test from control but do not apply to the larger sample: increase the sample size of test/control in order to gain meaningful differentiation.

Pruning variables: The CART algorithm will repeatedly partition data into smaller and smaller subsets until those final subsets are homogeneous in terms of the outcome variable. In practice this often means that the final subsets (known as the leaves of the tree) consist of only one or a few data points. The tree has learned the data exactly, but a new data point that differs even slightly might not be predicted well. (A pruning sketch appears at the end of this slide.)

• Minimum error: The tree is pruned back to the point where the cross-validated error is at a minimum. Cross-validation is the process of building a tree with most of the data and then using the remaining part of the data to test the accuracy of the decision tree.

• Smallest tree: The tree is pruned back slightly further than the minimum error. Technically, the pruning creates a decision tree with cross-validation error within 1 standard error of the minimum error. The smaller tree is more intelligible, at the cost of a small increase in error.

Remove variables using a heuristic approach: consider medical relevance; results contrary to expectations are sometimes OK. Other approaches may also apply.

47

Over-fitting issues and solutions
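A minimal sketch of the pruning idea using scikit-learn's cost-complexity pruning, with the pruning strength chosen by cross-validated error (X_train/y_train assumed; this illustrates "minimum error" pruning in general, not any specific production model):

    import numpy as np
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    # Candidate pruning strengths from the cost-complexity pruning path.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
    alphas = path.ccp_alphas[:-1]  # drop the alpha that collapses the tree to a single node

    # Pick the alpha with the best cross-validated accuracy ("minimum error" pruning).
    scores = [cross_val_score(DecisionTreeClassifier(random_state=0, ccp_alpha=a),
                              X_train, y_train, cv=5).mean() for a in alphas]
    best_alpha = alphas[int(np.argmax(scores))]

    pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X_train, y_train)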

Page 48: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Insufficient variables

The sample is too small

The sample is missing key data points

The model does not have the required variables to differentiate the test from control

Cohort needs to be re-designed to show differentiation

48

Under-fitting issues

Page 49: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Parameter Tuning

• Each algorithm has a set of parameters that need to be set before we can use the model.

• The goal is to set these parameters in the most optimal fashion so that the learning task can be completed efficiently (a grid-search sketch follows below).

• Can we use unsupervised learning models to help fine-tune?
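A minimal hyperparameter-tuning sketch using scikit-learn's grid search (the model choice and parameter grid are illustrative; X_train/y_train assumed):

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    param_grid = {
        "n_estimators": [100, 300],
        "max_depth": [None, 5, 10],
    }

    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)

    print(search.best_params_, search.best_score_)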

49

Page 50: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Prediction and Productionizing

• Prediction is the step where we get the answer to our questions: we apply the learned model to unseen data and produce results.

• Frequency: Because of the volume of data involved, how frequently should these models be re-run? Machine learning models are typically memory- and core- (CPU or GPU) intensive, so a pragmatic run frequency needs to be chosen.
• What data do we want to consider? Subsetting only the relevant data allows for smaller input datasets.

• Outcome skew: ML implementations need to consider the downstream effects of changes in prediction outcomes. A typical implementation may produce large swings between two run cycles because of significant changes in the input data, which can cause problems in downstream systems.

• Repeatability: As discussed earlier, ML implementations need to be tuned so that a production system produces consistent results and is not unduly impacted by input changes in either volume or content.

50

Page 51: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Machine Learning Roadmap

Define: Determine the business question to be solved; is the question best solved by machine learning?

Mine: Select datasets; create the (test and control) cohorts; label data and normalize; feature extraction; dimensional reduction; exploratory analysis and hypothesis testing

Learn: Create training and test data sets; select learning algorithms; run learning algorithms

Validate: Test with a hold-out sample; test on new data and a larger sample

Predict: Run models to make predictions on new data

Feedback on model performance → refine the model

Document…Document…Document

Page 52: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Top 10 Machine Learning Best Practices

1. Ensure what you are trying to predict is addressable.

2. Machine learning models don't learn “on their own”: business and clinical expertise are required.

3. Don't expect the results to be “logical”: the algorithms find correlations, not causes.

4. Treat your data carefully:
1. Normalize and establish MDM
2. Use statistics and visualization to avoid biases and holes in the data
3. Check quality and completeness

5. Start small and add data layers, especially where the data is not well known or was recently integrated.

6. Don't stop with just a few algorithms: use many, and of different types.

7. Validate…validate…validate.

8. Use feedback and monitoring with thresholds to ensure continued reliability of the model(s).

9. Keep track of model changes.

10. Keep learning: ML is a new field and advancing quickly.

52

Page 53: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 53

What next?.....

Page 54: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 54

Page 55: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we really talking about?

55

It matters to us: our data’s already being used by third parties to develop ML applications

Page 56: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 56

Page 57: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL 57

Page 58: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Machine learning has broad potential across industries and use cases

58

Page 59: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Prediction 4: By 2019, 40% of Digital Transformation Initiatives Will Use AI Services; by 2021, 75% of Commercial Enterprise Apps Will Use AI, Over 90% of Consumers Will Interact with Customer Support Bots, and Over 50% of New Industrial Robots Will Leverage AI

59

https://www.idc.com/url.do?url=/getfile.dyn?containerId=US43234117&attachmentId=47303129&elementId=54687882&position=8

Page 60: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

How much money is being invested?

60

Page 61: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Top AI Trends for 2018

61

http://usblogs.pwc.com/emerging-technology/wp-content/uploads/2017/12/Top-10-AI-trends-for-2018_PwC.pdf

Page 62: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Top 15 Deep Learning applications that will rule the world in 2018 and beyond

Self-driving cars

Deep Learning in Healthcare

Voice Search & Voice-Activated Assistants

Automatically Adding Sounds To Silent Movies

Automatic Machine Translation

Automatic Text Generation

Automatic Handwriting Generation

62

Image Recognition

Automatic Image Caption Generation

Automatic Colorization

Advertising

Predicting Earthquakes

Neural Networks for Brain Cancer Detection

Neural Networks in Finance

Energy Market Price Forecasting

https://medium.com/@vratulmittal/top-15-deep-learning-applications-that-will-rule-the-world-in-2018-and-beyond-7c6130c43b01

Page 63: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Not everything is good with the ML world

63

Page 64: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Deep learning is all the rage...why?

Neural networks: universal function approximators

Some spectacular applications

Confluence of hardware, software, theory, and data

64

Page 65: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing?

Architecture

Patient-Condition Prediction

HCP-Practice-Condition Prediction (Condition360)

Synthetic Data Generation

Synthetic Personas

Sentiment Analysis

Patient Journey (Step) Prediction

Rare Disease Prediction

65

Page 66: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Architecture

Built on our HDMP and IDV

Python-based environment with predefined packages

Standard images

Standardized feature set by business object

Integration with Subversion SVN

Leverage existing model outputs as features into other models

66

API and DAPI

Webstack

Visualization

Solutions and Applications

Page 67: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Predicting Patient Disease Risk

What is the likelihood that a particular patient has a specific disease?

Given a set of (80) conditions, what are the relative risks?

Refresh every 90 days

Already a licensed offering

67

Page 68: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? HCP-Practice-Condition Prediction (Condition360)

Identify the pool of HCPs associated with the pool of at risk patients.

While patients may see many types of HCPs, we select associated HCPs who meet the following criteria:

• Have diagnosed or prescribed conditions/drugs in the market of interest in the prior 2 years

• In cases where patients are not diagnosed or not on therapy, we select the PCP and/or in-market specialists seen in the prior 2 years

68

Page 69: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Imputing Data

Generating data that has the same characteristics as the real data we receive

More sophisticated approach than our legacy imputation framework

Why? So we don’t have to synthesize-delete-restate

69

Page 70: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? NLP

Standard vocabulary: ours or third-party?

Embedding using word2vec

Application programming interface (API) and a Data “application programming interface” (DAPI)

Applications:

● Sentiment analysis
● Text classification/categorization
● Named entity recognition (NER)
● Speech recognition (DeepSpeech)
● Semantic parsing/Q&A

● Machine translation?
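A minimal word2vec embedding sketch with gensim (4.x API); the tokenized sentences are hypothetical stand-ins for clinical text:

    from gensim.models import Word2Vec

    # Hypothetical tokenized notes; real input would be tokenized clinical text.
    sentences = [
        ["patient", "reports", "chest", "pain"],
        ["chest", "pain", "resolved", "after", "treatment"],
    ]

    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=20)
    print(model.wv.most_similar("pain", topn=3))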

70

Page 71: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Synthetic Personas

Generating patients with a required profile based on the real data we receive

Use autoencoders (AE) or principal component analysis (PCA), depending on dimensionality

Synths can be used in studies targeting rare-disease modelling, or to fill gaps where patient data does not exist due to various factors (e.g. HIPAA restrictions)
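A minimal sketch of the PCA route: fit PCA on real patient features, perturb in the reduced space, and project back (purely illustrative; X is an assumed numeric patient-by-feature matrix):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)

    pca = PCA(n_components=5).fit(X)
    Z = pca.transform(X)

    # Sample new latent points around the real ones and reconstruct
    # synthetic patient feature vectors.
    Z_synth = Z + rng.normal(scale=0.1 * Z.std(axis=0), size=Z.shape)
    X_synth = pca.inverse_transform(Z_synth)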

71

Page 72: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Predicting Patient Journey Next Steps

Develop Markov models to study patient transition through the system

What is the likely next step for the patient?

What is the optimal next step for the patient?

72

http://www.drgdigital.com/drg-digital-innovation-blog/report-modernizing-the-patient-journey-with-digital

Page 73: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Rare Disease Prediction

Predict diseases that are difficult to diagnose or uncommon

Use RNNs and leverage model output from our patient disease risk assessment combined with claims and socio-economic features

Examples: Infantile spasm, Amyloidosis
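A minimal sketch of an RNN classifier over padded claim-event sequences using Keras (the architecture, shapes, and randomly generated placeholder data are illustrative assumptions, not the production model):

    import numpy as np
    import tensorflow as tf

    # Hypothetical input: 100 patients, up to 50 claim events, 20 features per
    # event (zero-padded); labels flag the rare disease.
    X = np.random.rand(100, 50, 20).astype("float32")
    y = np.random.randint(0, 2, size=100)

    model = tf.keras.Sequential([
        tf.keras.layers.Masking(mask_value=0.0, input_shape=(50, 20)),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
    model.fit(X, y, epochs=3, batch_size=16, verbose=0)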

73

Page 74: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Anomaly Detection (DataPulse)

Is this the same as fraud?

Looking for (low-band) signal in (high-band) noise

When is the inbound data “bad”? What does “bad” even mean?

Process point-in-time and time-series data to gather statistics, flag anomalies, and assign each a significance score and importance (based on how far “off” the data is, and for how long)

Detect unusual behavior in the data, both positive and negative issues

Handle seasonality, trends, and changing behaviors in the data

Data agnostic (a minimal flagging sketch follows below)
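A minimal rolling z-score sketch for flagging anomalous points in a metric time series (the window, threshold, and injected spike are illustrative):

    import numpy as np
    import pandas as pd

    # Hypothetical daily claim counts with one injected spike.
    s = pd.Series([100, 102, 98, 101, 99, 103, 240, 100, 97, 101], dtype=float)

    # Compare each point against the preceding window so a spike does not
    # inflate its own baseline.
    baseline_mean = s.shift(1).rolling(window=5, min_periods=3).mean()
    baseline_std = s.shift(1).rolling(window=5, min_periods=3).std()

    z = (s - baseline_mean) / baseline_std
    anomalies = s[z.abs() > 3]  # significance score = |z|; flag beyond 3 sigma
    print(anomalies)            # flags the 240 spike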

74

Page 75: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

So what are we doing? Diagnostics and Alerts

Leverages part of our anomaly detection framework

Identify an “actionable anomaly”, generating a transaction for use by downstream systems

Key points: similarity and anomaly

75

Page 76: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Industry Example: Bladder Cancer Diagnosis Using Deep Learning

Multinomial classification of the primary tumor that can recognize bladder cancer in magnetic resonance images without human intervention

TNM classification:
❏ Tumor: How large is the primary tumor? Where is it located?
❏ Node: Has the tumor spread to the lymph nodes? If so, where and how many?
❏ Metastasis: Has the cancer spread to other parts of the body? If so, where and how much?

The study focused on the primary tumor and tracked 4 types of primary bladder-cancer tumors: T2a, T2b, T3a and T4a

● Classification outcomes cover 4 classes: T2a, T2b, T3a and T4a
● Using the ConvNet, top-1 accuracy reaches 81.30%
● A multinomial logistic regression baseline achieved 72.27% top-1 accuracy

76

Mauro Damo, Wei Lin, Ronaldo Braga and William Schneider
https://cdn.oreillystatic.com/en/assets/1/event/269/Bladder%20cancer%20diagnosis%20using%20deep%20learning%20Presentation.pdf

Page 77: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

What else?

❏ Predicting physician performance and scoring physicians

❏ Recommender system for suggesting treatments

❏ Tailored treatments based on genetic profile?

❏ Predict disease outbreak

77

Conclusions

Health informatics systems based on machine learning are in their infancy, and the translation of such systems into clinical management has yet to be performed at scale.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4587065/pdf/ymi-10-0038.pdf

Page 78: PMSA Winter Symposium Uncovering the Machine Learning

ⓒ 2018 SYMPHONY HEALTH ALL RIGHTS RESERVED | CONFIDENTIAL

Thoughtful analysis,intelligent answers, powerful impact.

78