Dekker TROG - Learning outcome prediction models from cancer data - 2017

Learning outcome prediction models from cancer data
Andre Dekker
Department of Radiation Oncology (MAASTRO)
GROW - Maastricht University Medical Centre+
Maastricht, The Netherlands
SLIDES AVAILABLE ON SLIDESHARE (slideshare.net/AndreDekker)

Disclosures
• Research collaborations incl. funding and speaker honoraria
  – Varian (VATE, SAGE, ROO, chinaCAT, euroCAT), Siemens (euroCAT), Sohard (SeDI, CloudAtlas), Mirada Medical (CloudAtlas), Philips (EURECA, TraIT, SWIFT-RT, BIONIC), Xerox (EURECA), De Praktijkindex (DLRA), ptTheragnostic (DART, Strategy), CZ (My Best Treatment)
• Public research funding
  – Radiomics (USA-NIH/U01CA143062), euroCAT (EU-Interreg), duCAT & Strategy (NL-STW), EURECA (EU-FP7), SeDI & CloudAtlas & DART (EU-EUROSTARS), TraIT (NL-CTMM), DLRA (NL-NVRO), BIONIC (NWO)
• Spin-offs and commercial ventures
  – MAASTRO Innovations B.V. (CSO)
  – Various patents on medical machine learning

TROG 2017 talks
• Learning outcome prediction models from cancer data
  – Technical Research Workshop, Monday 08:40-09:10, followed by Panel Discussion
• Big Data in Radiation Oncology
  – Statistical Methods, Evidence Appraisal and Research for Trainees, Monday 14:50-15:20
• Knowledge Engineering in Oncology
  – TROG Plenary, Tuesday 09:25-10:00
• Radiomics for Oncology
  – TROG Plenary, Thursday 11:50-12:20

Some overlap

No overlap

Learning objectives
After the lecture, attendees should be able to:
• Name the major sources of cancer data and their absolute and relative size
• Understand the challenges of sharing data and solutions to these
• Itemize steps in the methodology to go from data to models
• Appraise papers that describe models, incl. using TRIPOD

The Data Part

Cancer Data?
• Oncology, 2005-2015
• 140M patients
• 0.1-10 GB per patient
• 14-1400 PB
• 80% unstructured
• 100k hospitals
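The storage range follows from multiplying the patient count by the per-patient data volume. A quick check, assuming decimal units (1 PB = 1,000,000 GB):

```python
# Back-of-the-envelope check of the slide's figures.
# Assumption: decimal units, 1 PB = 1,000,000 GB.
patients = 140_000_000           # oncology patients, 2005-2015
gb_low, gb_high = 0.1, 10        # data per patient, low and high estimate
GB_PER_PB = 1_000_000

low_pb = patients * gb_low / GB_PER_PB
high_pb = patients * gb_high / GB_PER_PB
print(f"{low_pb:.0f}-{high_pb:.0f} PB")
```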

Barriers to sharing data
"[…] the problem is not really technical […]. Rather, the problems are ethical, political, and administrative." (Lancet Oncol 2011;12:933)

1. Administrative (I don't have the resources)
2. Political (I don't want to)
3. Ethical (I am not allowed to)
4. Technical (I can't)

Common approaches to sharing
• Sharing standardized, highly curated data from clinical research programs
  – Very useful, but only 3% of patients (if that)
• Sharing standardized, highly curated data to clinical registries
  – Very useful, but a limited number of features and a lot of work
• Big Data companies, usually cloud-based (Watson Health Cloud, Flatiron/Google, ASCO/SAP CancerLinQ)
  – Worries about privacy, loss of control, limited reusability, silos

Data landscape
• Clinical research: 3% of patients, 100% of features, 5% missing, 285 data points
• Clinical registries: 100% of patients, 3% of features, 20% missing, 240 data points
• Clinical routine: 100% of patients, 100% of features, 80% missing, 2000 data points

[Figure axes: data elements, patients]
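The "data points" figures are consistent with a notional grid of 100 patients × 100 data elements (10,000 cells), counting only cells that are both covered and not missing. That grid size is our reading of the slide, not something it states; the sketch below reproduces the three numbers under that assumption.

```python
# Landscape numbers from the slide, in percent:
# (share of patients, share of data elements, share missing)
sources = {
    "clinical research": (3, 100, 5),
    "clinical registries": (100, 3, 20),
    "clinical routine": (100, 100, 80),
}

GRID = 100 * 100  # assumed notional grid: 100 patients x 100 data elements

def available_data_points(patients_pct: int, features_pct: int, missing_pct: int) -> int:
    """Grid cells that are covered by the source and actually filled in."""
    # GRID * p/100 * f/100 * (100 - m)/100, kept in integer arithmetic
    return patients_pct * features_pct * (100 - missing_pct) * GRID // 100**3

for name, pct in sources.items():
    print(f"{name}: {available_data_points(*pct)} data points")
```

Under this reading, routine care holds roughly seven to eight times as many usable data points as research or registries, despite 80% missingness.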

A different approach
• If sharing is the problem: don't share the data
• If you can't bring the data to the research, you have to bring the research to the data
• Challenges
  – The research application has to be distributed (trains & track)
  – The data have to be understandable by an application (i.e. not a human) -> FAIR data stations

CORAL: Community in Oncology for RApid Learning
[Map; © Copyright Showeet.com]
• meerCAT: Lung - Dyspnea (U Michigan, MAASTRO, The Christie)
• canCAT: Lung SBRT - Control (Princess Margaret, MAASTRO)
• BIONIC: Radiomics (MAASTRO, Tata Memorial)
• duCAT: Lung - Dysphagia (MAASTRO, Radboud, NKI)
• euroCAT: Lung - Survival (UK Aachen, LOC Hasselt, Catharina, MAASTRO, CHU Liege)
• ozCAT: Head & Neck - Survival (Liverpool, Illawarra, Newcastle, Westmead, MAASTRO, RTOG/NRG)
• worldCAT: Rectum - Local Control (Fudan, Rome/EU, RTOG/NRG)
• Interest to join: Erasmus (Breast), BCCA (Breast), Bloemfontein (Cervix), Odense (HN, Lung), Aalst (Lung), McGill (Brain)

Typical data quality challenges
• Data are unstructured
• Data are not understandable
• Data are missing
• Data are incorrect
• Data are contradictory
• Data are biased
• Data are missing in a biased way
• Garbage in – garbage out?

Examples: 声门下区 (Chinese for "subglottic region"); T4N0M0 Stage IV patient; patient weighing 1000 kg; Grade 3+ toxicities
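Some of these problems can be caught with simple rule-based checks before any model is trained. A minimal sketch; the record layout, the 300 kg plausibility limit and the accepted grade values are illustrative assumptions, not rules from the talk:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    patient_id: str
    weight_kg: Optional[float]
    toxicity_grade: Optional[str]  # raw text as exported from the source system

MAX_WEIGHT_KG = 300.0  # hypothetical plausibility limit
VALID_GRADES = {"0", "1", "2", "3", "4", "5"}  # CTCAE grades 1-5, plus 0 for "none" (assumed)

def quality_issues(rec: Record) -> list:
    """Return human-readable data-quality flags for one record."""
    issues = []
    if rec.weight_kg is None:
        issues.append("weight missing")
    elif not (0 < rec.weight_kg <= MAX_WEIGHT_KG):
        issues.append(f"implausible weight: {rec.weight_kg} kg")
    if rec.toxicity_grade is None:
        issues.append("toxicity grade missing")
    elif rec.toxicity_grade.strip() not in VALID_GRADES:
        # e.g. "3+" is not a single grade and cannot be used as-is
        issues.append(f"non-standard toxicity grade: {rec.toxicity_grade!r}")
    return issues

records = [
    Record("p1", 72.5, "2"),
    Record("p2", 1000.0, "1"),
    Record("p3", None, "3+"),
]
for r in records:
    print(r.patient_id, quality_issues(r))
```

Checks like these flag incorrect or unusable values; they do not detect bias, which needs comparisons across groups of patients.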

For the techies…

Horizontal partitions
[Figure: data elements × patients; patient rows split between Maastricht and Shanghai]
• Reasonably well understood
• Distributed learning possible if data are FAIR
• No need for data to leave the hospital
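With horizontally partitioned data every hospital holds different patients with the same data elements, so a model whose training only needs sums over patients can be fitted without moving rows. A minimal sketch of one such scheme, gradient descent for logistic regression where each site returns only its summed gradient; the synthetic data, learning rate and iteration count are assumptions, and this is not necessarily the algorithm used in the projects above:

```python
import math
import random

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def local_gradient(w, X, y):
    """Runs inside one hospital: log-loss gradient summed over its own patients.
    Only this vector and a patient count leave the site, never X or y."""
    grad = [0.0] * len(w)
    for xi, yi in zip(X, y):
        err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
        for j, xj in enumerate(xi):
            grad[j] += err * xj
    return grad, len(y)

def federated_fit(sites, n_features, lr=0.5, iters=500):
    """Coordinator: sums the site gradients and takes one gradient-descent step per round."""
    w = [0.0] * n_features
    for _ in range(iters):
        total, n = [0.0] * n_features, 0
        for X, y in sites:  # in a real deployment: a request sent to each site
            g, k = local_gradient(w, X, y)
            total = [a + b for a, b in zip(total, g)]
            n += k
        w = [wj - lr * gj / n for wj, gj in zip(w, total)]
    return w

def make_site(n, seed):
    """Synthetic cohort with two standardized features; purely illustrative."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append([1.0, a, b])  # leading 1.0 is the intercept
        y.append(1 if rng.random() < sigmoid(-0.5 + 1.2 * a - 0.8 * b) else 0)
    return X, y

maastricht, shanghai = make_site(150, seed=1), make_site(100, seed=2)
w_fed = federated_fit([maastricht, shanghai], n_features=3)

# Sanity check only: the same fit on pooled data, which the real setting would not allow
pooled = (maastricht[0] + shanghai[0], maastricht[1] + shanghai[1])
w_pooled = federated_fit([pooled], n_features=3)
print([round(v, 3) for v in w_fed])
print(max(abs(a - b) for a, b in zip(w_fed, w_pooled)))
```

Because the pooled gradient is simply the sum of the per-site gradients, the distributed fit matches training on pooled data up to floating-point rounding.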

Vertical and complex partitions
[Figure: data elements × patients; data elements for the same patients split between MAASTRO and a registry]

A bit more technical detail
• Keep data locally
• Standardize it according to an ontology
• Make and send around learning "bots"
• Share the results - not the data!
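The "send the bot, share the results" pattern in its simplest form: every site runs the same query locally and returns only aggregate counts, which a coordinator combines. A minimal sketch; the record layout and the site cohorts are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class LocalPatient:
    """Hypothetical local record, already mapped to the shared ontology."""
    alive_at_2y: bool

def survival_bot(patients):
    """Runs inside the hospital and returns counts only, never patient-level rows."""
    return {"n": len(patients), "alive_2y": sum(p.alive_at_2y for p in patients)}

def combine(results):
    """Coordinator side: pool the site counts into one two-year survival estimate."""
    n = sum(r["n"] for r in results)
    return sum(r["alive_2y"] for r in results) / n

site_a = [LocalPatient(True)] * 30 + [LocalPatient(False)] * 90
site_b = [LocalPatient(True)] * 20 + [LocalPatient(False)] * 60
results = [survival_bot(site_a), survival_bot(site_b)]
print(results, combine(results))
```

Here the two sites report 30 of 120 and 20 of 80 patients alive at two years, and the coordinator obtains a pooled 25% without seeing any individual record.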

Even more technical details
• De-identification
• Semantic web, linked data
• Imaging/DICOM data & clinical data stream
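For the clinical data stream, one common de-identification step is to drop direct identifiers and replace the patient ID with a keyed hash, so records stay linkable inside the hospital without exposing the original ID. A minimal sketch using Python's standard hmac module; the field names and the key handling are assumptions, and it covers neither DICOM headers (which need a dedicated tool) nor indirect identifiers such as dates:

```python
import hashlib
import hmac

DIRECT_IDENTIFIERS = ("patient_id", "name", "address")  # assumed field names

def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Keyed hash (HMAC-SHA256) of a normalized patient ID.

    The same ID and key always give the same pseudonym, so records remain linkable
    inside the hospital, but the pseudonym cannot be linked back to the ID without the key."""
    normalized = patient_id.strip().upper().encode("utf-8")
    return hmac.new(secret_key, normalized, hashlib.sha256).hexdigest()

def deidentify(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and add a pseudonymous ID."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    out["pseudo_id"] = pseudonymize(record["patient_id"], secret_key)
    return out

key = b"replace-with-a-secret-kept-at-the-site"  # hypothetical; never shipped with the data
record = {"patient_id": "mrn-00123", "name": "Jane Doe", "address": "1 Example Street",
          "age": 67, "t_stage": "T2"}
print(deidentify(record, key))
```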

The Modelling Part

Our modelling approach
• Hypothesis driven!!

How much data do you need?
• Rule of thumb: min. 10 events per input feature
• 200 NSCLC patients
• 25% survival at two years
• 50 events
• 10 input features
• More is better
Source: vitalflux.com (2017)
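Working through the rule of thumb with the slide's numbers, taking the 50 two-year survivors (the less frequent outcome) as the events:

```python
import math

EVENTS_PER_FEATURE = 10  # rule of thumb from the slide

def max_features(n_events: int, epv: int = EVENTS_PER_FEATURE) -> int:
    """Largest number of input features the events-per-feature rule allows."""
    return n_events // epv

def patients_needed(n_features: int, event_rate: float, epv: int = EVENTS_PER_FEATURE) -> int:
    """Smallest cohort expected to contain epv * n_features events at the given event rate."""
    return math.ceil(n_features * epv / event_rate)

n_patients, two_year_survival = 200, 0.25
n_events = round(n_patients * two_year_survival)  # survivors are the rarer outcome here
print(n_events, max_features(n_events), patients_needed(10, two_year_survival))
```

With 50 events the rule supports at most 5 input features; using all 10 would call for about 100 events, i.e. roughly 400 patients at the same 25% two-year survival.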

Machine Learning
[Figure; source: Jason Brownlee (2013)]

Considerations for machine learning
• Discrimination (AUC)
• Calibration (Brier score)
• Interpretability (black box vs. transparent)
• Can it handle low data quality (of training and validation data)?
• Can it be learned in a distributed setting?
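Both metrics are short enough to write out. A minimal pure-Python sketch for a binary outcome (for example, two-year survival); the toy labels and predictions are made up:

```python
def auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0 for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))

def brier(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((pr - y) ** 2 for pr, y in zip(y_prob, y_true)) / len(y_true)

y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auc(y, p), round(brier(y, p), 6))
```

The pairwise AUC is O(n²) and only meant to show the definition; scikit-learn's `roc_auc_score` and `brier_score_loss` compute the same quantities efficiently.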

Choose already
Simple and quick, but need complete data:
• Logistic regression
• Support Vector Machines
Intuitive and can handle missing data:
• Bayesian Networks
All can be learned in a distributed setting
Review pending
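Why a Bayesian network copes with a missing input: instead of requiring the value, it sums over the possible values weighted by their probabilities. A minimal sketch with a hypothetical two-node network (T-stage → two-year survival); all probabilities are invented for illustration:

```python
# Hypothetical network T_stage -> survival_2y; the probabilities are illustrative only.
P_T = {"T1-2": 0.6, "T3-4": 0.4}               # prior over T-stage
P_SURV_GIVEN_T = {"T1-2": 0.35, "T3-4": 0.10}  # P(alive at 2 years | T-stage)

def predict_survival(t_stage=None):
    """P(alive at 2 years), with or without an observed T-stage."""
    if t_stage is not None:
        return P_SURV_GIVEN_T[t_stage]
    # T-stage missing: marginalize it out, P(S) = sum over t of P(t) * P(S | t)
    return sum(P_T[t] * P_SURV_GIVEN_T[t] for t in P_T)

print(predict_survival("T3-4"), round(predict_survival(), 3))
```

A logistic regression with T-stage as an input has no value to plug in for such a patient, so the record must be dropped or the value imputed first.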

TRIPOD (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis)

https://www.tripod-statement.org/

Model validation
• Discrimination: Is the model able to classify the population into two or more groups with different observed survival?
• Calibration: Is the estimated probability of survival equal to the observed survival probability?
• Clinical usefulness: Are the data on which the model is based representative of my patient, and is the predicted outcome clinically relevant for my patient?
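A simple way to look at the first two questions together is to split patients into risk groups by predicted probability and compare, per group, the mean prediction with the observed outcome rate: well-separated observed rates indicate discrimination, matching predicted and observed rates indicate calibration. A minimal sketch; the number of groups and the toy data are assumptions:

```python
def risk_group_table(pred, observed, n_groups=2):
    """Sort by predicted probability, cut into n_groups (the last group takes the remainder),
    and return (mean predicted, observed rate, size) per group.
    Assumes at least n_groups patients."""
    pairs = sorted(zip(pred, observed))
    size = len(pairs) // n_groups
    table = []
    for g in range(n_groups):
        end = (g + 1) * size if g < n_groups - 1 else len(pairs)
        group = pairs[g * size:end]
        mean_pred = sum(p for p, _ in group) / len(group)
        obs_rate = sum(o for _, o in group) / len(group)
        table.append((round(mean_pred, 3), round(obs_rate, 3), len(group)))
    return table

pred = [0.1, 0.2, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]  # predicted 2-year survival
alive = [0, 0, 1, 0, 1, 0, 1, 1]                 # observed 2-year survival
for row in risk_group_table(pred, alive):
    print(row)
```

Here the low-risk group has a mean predicted survival of 0.2 against 0.25 observed, and the high-risk group 0.75 against 0.75: the groups separate and the predictions are close to the observed rates. Real validations use more groups (deciles are common) and far more patients.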

Laryngeal carcinoma model
• 994 MAASTRO patients
• 1990-2005
• www.predictcancer.org
• Input parameters
  – Age
  – Hemoglobin
  – T-stage
  – Radiotherapy dose (Gy)
  – Gender
  – N+
  – Tumor location
• Output parameters
  – Overall survival

Discrimination / Calibration / Clinical Relevance?

There is an app for that

Acknowledgements
• Fudan Cancer Center, Shanghai, China
• Varian, Palo Alto, CA, USA
• Siemens, Malvern, PA, USA
• RTOG, Philadelphia, PA, USA
• MAASTRO, Maastricht, Netherlands
• Policlinico Gemelli, Roma, Italy
• UH Ghent, Belgium
• UZ Leuven, Belgium
• Radboud, Nijmegen, Netherlands
• University of Sydney, Australia
• University of Michigan, Ann Arbor, USA
• Liverpool and Macarthur CC, Australia
• CHU Liege, Belgium
• Uniklinikum Aachen, Germany
• LOC Genk/Hasselt, Belgium
• Princess Margaret CC, Canada
• The Christie, Manchester, UK
• UH Leuven, Belgium
• State Hospital, Rovigo, Italy
• Illawarra Shoalhaven CC, Australia
• Catharina Zkh Eindhoven, Netherlands
• Philips, Eindhoven, Netherlands

More info on: www.predictcancer.org, www.cancerdata.org, www.eurocat.info, www.mistir.info

Thank you for your attention

Andre Dekker
Department of Radiation Oncology (MAASTRO)
GROW - Maastricht University Medical Centre+
Maastricht, The Netherlands

[Figure: ontology-based data model for lung cancer radiotherapy data. Concepts and codes shown:
• Patient (ncit:C16960); Hospital (ncit:C19326; uri=http://www.uhn.ca/PrincessMargaret)
• Non-small cell lung carcinoma (ncit:C2926); Histology (nci:2926, nci:2852, nci:3780, nci:2929, nci:2852, nci:3915)
• Age at diagnosis (roo:100002) and Age at start RT (roo:100003), in Year (uo:UO_0000036)
• Sex (nci:C20197 and nci:C16576)
• Survival (roo:100063), in Month (uo:UO_0000035); Vital Status (ncit:C37987 or ncit:28554)
• FEV1 (nci:C38084), in Liter (uo:UO_0000099); Percentage FEV1 (nci:C112376), in Percent (uo:UO_0000187)
• ECOG performance status (nci:105722, nci:105723, nci:105725, nci:105726, nci:105727, nci:105728)
• Positive Lymph Node Stations (roo:100049), Count (uo:UO_0000189), has_unit (roo:100027)
• Clinical TNM Finding (ncit:C48881), has_clinical_t_stage (roo:100244); Generic T-stage 0-4 (ncit:48719, ncit:48720, …, ncit:48732); AJCC Edition (roo:100052, roo:100053)
• Diagnostic Procedure (ncit:C18020); Volume of primary tumor (roo:100054), has_volume (roo:100315), in Cubic centimeter (uo:UO_0000097); RT Structure Set (sedi:RTStructureSet); MIA Version (mia:<version>)
• Radiation Therapy (ncit:C15313) OR SBRT (ncit:C118286)
• Prescribed Radiotherapy Dose (roo:100013) and Delivered Radiotherapy Dose (roo:100012), in Gray (uo:UO_0000134), Value (xsd:double)
• No. RT Fractions Per Treatment (roo:100356) and No. RT Fractions Per Day (roo:100355), Value (xsd:integer)
• First radiotherapy fraction (roo:100058), Last radiotherapy fraction (roo:100059), each with a DateTimeDescription
• Toxicities, each linked to a DateTimeDescription via at_date_time (roo:100041): Pneumonitis (ctcae:Pneumonitis), Fracture (ctcae:Fracture) with Rib (fma:fma7574), Reaction (ctcae:Radiation_recall_reaction_dermatologic), Fatigue (ctcae:Fatigue), Dyspnea (ctcae:Dyspnea), Cough (ctcae:Cough), Anorexia (ctcae:Anorexia), Dysphagia (ctcae:Dysphagia), Hemoptysis (nci:C3094), Esophagitis (ctcae:Esophagitis), Pulmonary Fibrosis (ctcae:Pulmonary_fibrosis), Brachial plexopathy (ctcae:Brachial_plexopathy)]
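Once local values are mapped to shared ontology classes, an application rather than a human can interpret them, whichever coding each hospital used. A minimal sketch that turns one local table row into N-Triples typed with NCIt classes from the model (Patient C16960, Male C20197, Female C16576, NSCLC C2926); the namespace URIs, the two predicates and the local codes are assumptions, not the actual ROO terms or the project's ETL:

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
NCIT = "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#"  # assumed NCIt namespace
BASE = "http://example.org/site-a/"                            # hypothetical local namespace
HAS_SEX = "http://example.org/sketch#has_sex"                  # hypothetical predicate
HAS_DISEASE = "http://example.org/sketch#has_disease"          # hypothetical predicate

# Each hospital maintains its own mapping from local codes to shared ontology classes.
SEX_CODES = {"M": "C20197", "F": "C16576", "1": "C20197", "2": "C16576"}  # NCIt Male / Female
DISEASE_CODES = {"NSCLC": "C2926"}                                         # NCIt NSCLC

def triple(s: str, p: str, o: str) -> str:
    return f"<{s}> <{p}> <{o}> ."

def row_to_ntriples(row: dict) -> list:
    """Map one local table row to N-Triples typed with NCIt classes."""
    patient = f"{BASE}patient/{row['id']}"
    out = [triple(patient, RDF_TYPE, NCIT + "C16960")]  # ncit:C16960 = Patient
    if row.get("sex") in SEX_CODES:
        sex_node = f"{patient}/sex"
        out.append(triple(patient, HAS_SEX, sex_node))
        out.append(triple(sex_node, RDF_TYPE, NCIT + SEX_CODES[row["sex"]]))
    if row.get("diagnosis") in DISEASE_CODES:
        dx_node = f"{patient}/diagnosis"
        out.append(triple(patient, HAS_DISEASE, dx_node))
        out.append(triple(dx_node, RDF_TYPE, NCIT + DISEASE_CODES[row["diagnosis"]]))
    return out

for t in row_to_ntriples({"id": "42", "sex": "2", "diagnosis": "NSCLC"}):
    print(t)
```

A site coding sex as "F" and one coding it as "2" both end up with the same NCIt Female class, which is what lets a distributed application query them in the same way.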

Tech used
• ETL (Pentaho, Talend)
• DICOM de-identification (CTP)
• RDF store & SPARQL endpoint (Blazegraph, Sesame)
• Ontology editing (Protégé)
• Ontology publishing (BioPortal)
• Database (PostgreSQL)
• Database to RDF (D2R)
• DICOM to RDF (SeDI)
• PACS (dcm4chee)
• Image processing pipeline (MIA-MAASTRO)
• Distributed application (Varian, Docker)
• Generic & machine learning (Matlab, R, Java, Python)