Database Approach for Innovative Discovery Robert J. Chen, MD, MPH [email protected]

Data Science in Cardiac Sciences

Page 1: Data Science in Cardiac Sciences

Database Approach for Innovative Discovery

Robert J. Chen, MD, MPH

[email protected]

Page 2: Data Science in Cardiac Sciences

Backgrounds

• Data sciences: applied epidemiology

• Prediction and monitoring

• Exploration and confirmation

• Cardiac surgery
  – Risk and performance
  – Heart failure therapeutics

Page 3: Data Science in Cardiac Sciences

Cardiac Surgery

• Prediction of risks
  – EuroSCORE
  – STS Score
  – Our score?

• Study of controversies
  – CABG: beating vs. arrest
  – Atrial fibrillation

Page 4: Data Science in Cardiac Sciences

Cardiac Surgery

• Transplantation
  – Donor, awaiting, recipients

• Clinical database system
  – Retrospective data
  – Prospective data
  – Comprehensive data

• Preop, Op, Postop

Page 5: Data Science in Cardiac Sciences

Cardiac Surgery

• Conventional vs. new methods
  – Robotics, trans-catheter valves, aortic stent graft

• Ultimate treatment for heart failure
  – Cardiac stem cells
  – Adult somatic cell trans-differentiation
  – Cardiac re-genesis from adult somatic cells
  – Missing link?

Page 12: Data Science in Cardiac Sciences

Cytoscape

• Http://www.cytoscape.org

• Biomolecular interaction network

• P-P, P-G, G-G interactions

• Plug-in

Page 13: Data Science in Cardiac Sciences

Cytoscape

Page 14: Data Science in Cardiac Sciences

Specific Aims

• To understand nucleostemin at the molecular level;

• To master Cytoscape and its applications;

• To use Cytoscape to construct the functional network of nucleostemin;

• To propose a potential future direction of cardiac stem cell research.

Page 15: Data Science in Cardiac Sciences

Methods and Procedures

• Literature review for cardiac regenerative therapy;

• Literature review for cardiac stem cells;

• Literature review for nucleostemin and related molecules;

• Acquisition of experience and expertise for using Cytoscape;

Page 16: Data Science in Cardiac Sciences

Methods and Procedures

• Use of Cytoscape for nucleotide-protein and protein-protein interaction analysis;

• Construction of the functional network of nucleostemin;

• Hypothesis generation for more research targets of cardiac stem cells starting from the network of nucleostemin.

Page 18: Data Science in Cardiac Sciences

Gantt Chart

Page 19: Data Science in Cardiac Sciences

Backgrounds

• Cardiovascular surgery
  – Technology-intensive
  – Technique-oriented

• A specialty that needs
  – Risk assessment
  – Outcome prediction
  – Performance feedback

• Paper-based data -> demanding of time and manpower

Page 20: Data Science in Cardiac Sciences

Backgrounds

• Disease/procedure-specific
  – CABG, aorta, valve, heart failure (LVAD, transplant), AF, …

• Risk-adjusted outcomes

• Patient-surgeon preop discussion

• Decision-making for choosing therapies

• Performance comparison

• Quality improvement

• Research, reports, and publications
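Risk-adjusted outcomes are commonly summarized as an observed-to-expected (O/E) mortality ratio: observed deaths divided by the sum of each patient's predicted risk. The sketch below is a minimal illustration only. The function name `observed_expected_ratio` and the five-patient cohort are hypothetical and do not come from the database described in this deck.

```python
from typing import Sequence


def observed_expected_ratio(predicted_risks: Sequence[float],
                            died: Sequence[bool]) -> float:
    """Return observed deaths divided by expected deaths.

    predicted_risks: per-patient predicted mortality (0..1), e.g. from a
    risk score. died: whether each patient actually died.
    """
    if len(predicted_risks) != len(died):
        raise ValueError("inputs must have the same length")
    expected = sum(predicted_risks)
    if expected == 0:
        raise ValueError("expected deaths is zero; ratio undefined")
    observed = sum(1 for d in died if d)
    return observed / expected


# Hypothetical cohort of five patients (illustrative numbers only).
risks = [0.02, 0.05, 0.10, 0.30, 0.53]
outcomes = [False, False, False, True, False]
ratio = observed_expected_ratio(risks, outcomes)
```

A ratio below 1 suggests fewer deaths than the risk model predicted, and a ratio above 1 suggests more. With small cohorts the ratio is unstable and would normally be reported with a confidence interval.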

Page 21: Data Science in Cardiac Sciences

Specific Aims

• Establish the hospital-based CVS database system
  – Data entry (web-based, intranet)
  – Data storage and management (secure, confidential)
  – Data analysis

• Features: flexible, compatible with standard syntax, and open-structured

Page 22: Data Science in Cardiac Sciences

Specific Aims

• Data exchange
  – Data import from various sources
  – Data export to advanced statistical software

• Procedure-specific: CABG, valvular, aortic, transplant, atrial fibrillation, …

• Data entry once the system is ready
  – Prospective: clinical staff
  – Retrospective: data staff

• Regular reports

• Clinical research

Page 23: Data Science in Cardiac Sciences

Methods & Procedures

• Variable selection:
  – Demographic, underlying, preop status
  – Operation-related
  – Postop condition
  – Complications, outcomes, follow-ups

• Interaction with database programmers
  – Entry interface
  – Hospital IT support and HIS (hospital information system) integration

Page 24: Data Science in Cardiac Sciences

Methods & Procedures

• Test-drive
  – Debugging
  – Feedback and revision

• Data entry
  – Past data
  – Current and new data

• Statistical analysis
  – Descriptive statistics
  – Inferential statistics
  – Stata 10.2

Page 25: Data Science in Cardiac Sciences

Content of Cardiovascular Surgery Database

1. Administrative
2. Demographics
3. Hospitalization
4. Risk Factors
5. Previous CV Interventions (1..n)
6. Preoperative Cardiac Status (1..n)
7. Preoperative Medications (1..n)
8. Hemodynamics, cath, and echocardiogram (1..n) …

• There are 19 raw tables in total, with hundreds of features drawn from the patient's special chart.

• Highly de-normalized!

• Transactional data tracking is required!
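The "(1..n)" markers indicate that one patient can have many rows in a table, for example several previous interventions. The sketch below shows one way such a one-to-many relation might be expressed as relational tables. It uses Python's built-in `sqlite3` purely as a stand-in, and every table and column name (`demographics`, `previous_cv_intervention`, and so on) is a hypothetical illustration, not the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the patient_id reference

# One row per patient, many intervention rows per patient (the "1..n" case).
conn.executescript("""
CREATE TABLE demographics (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER,
    sex        TEXT
);
CREATE TABLE previous_cv_intervention (
    intervention_id INTEGER PRIMARY KEY,
    patient_id      INTEGER NOT NULL REFERENCES demographics(patient_id),
    procedure_name  TEXT NOT NULL,
    performed_on    TEXT
);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO demographics VALUES (1, 1950, 'M')")
conn.executemany(
    "INSERT INTO previous_cv_intervention "
    "(patient_id, procedure_name, performed_on) VALUES (?, ?, ?)",
    [(1, "PCI", "2005-03-01"), (1, "CABG", "2007-08-15")],
)

# Join the patient to all of their interventions, oldest first.
rows = conn.execute(
    "SELECT d.patient_id, p.procedure_name "
    "FROM demographics d "
    "JOIN previous_cv_intervention p ON d.patient_id = p.patient_id "
    "ORDER BY p.performed_on"
).fetchall()
```

Keeping repeating items in their own table avoids the wide, sparsely filled rows that a chart-shaped (de-normalized) layout tends to produce, at the cost of a join when the data are analyzed.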

Page 26: Data Science in Cardiac Sciences

Database System Meta-structure

• 1. Software

• 2. Data

• 3. Meta-data: mapping of variable names of identical meaning, for importing data from other existing datasets.
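The meta-data layer described above can be pictured as a lookup table from each source dataset's variable names to one canonical name. The following is a minimal sketch under that assumption. The names in `VARIABLE_MAP` and the `harmonize` helper are hypothetical examples, not the project's actual variable list.

```python
from typing import Dict, Mapping

# Hypothetical meta-data: source variable name -> canonical name.
VARIABLE_MAP: Dict[str, str] = {
    "pt_age": "age",
    "AgeAtOp": "age",
    "gender": "sex",
    "Sex": "sex",
    "lvef": "ejection_fraction",
    "EF_percent": "ejection_fraction",
}


def harmonize(record: Mapping[str, object]) -> Dict[str, object]:
    """Rename known variables to canonical names; keep unknown ones unchanged."""
    out: Dict[str, object] = {}
    for name, value in record.items():
        canonical = VARIABLE_MAP.get(name, name)
        if canonical in out:
            # Two source columns with the same meaning in one record.
            raise ValueError(f"two source variables map to {canonical!r}")
        out[canonical] = value
    return out


imported = harmonize({"AgeAtOp": 67, "Sex": "F", "EF_percent": 45, "site": "A"})
```

Storing the mapping as data rather than code means a new source dataset can be supported by adding rows to the mapping, without changing the import routine.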

Page 27: Data Science in Cardiac Sciences

Database System Meta-structure

Page 28: Data Science in Cardiac Sciences

Expected Results

• 1. A database system for retrospective and prospective data

• 2. A trained data-entry team

• 3. A preliminary analysis report

• 4. A schedule for regular reporting

• 5. Analysis upon request on a daily basis

Page 29: Data Science in Cardiac Sciences

Analysis Report Outline

1) Outcome Reporting and International Comparisons
2) Overall Cardiac Surgical Activity
3) Preoperative Assessment
4) Patient Demographics
5) Risk-Stratification and Presentation of Risk-Adjusted Outcomes
6) Coronary Artery Bypass Grafting (CABG)
7) Heart Transplantation
8) Summary

Page 30: Data Science in Cardiac Sciences

Gantt Chart

Page 31: Data Science in Cardiac Sciences

Funding

• Transplantation database: NT$300,000

• Nucleostemin: NT$130,000

• Cardiac surgery database: NT$2,000,000

Page 32: Data Science in Cardiac Sciences

Current Status

• Transplantation database:
  – Completed in 2008 (N=3,000)
  – EMB and TR in HTx (N=2,000; n=200)
    • Presented at CAST 2007 and ASCVS 2008
    • Published in Transplantation Proceedings 2008

• Cardiac surgery database:
  – EuroSCORE and our score for CAD-LMD (N=444)
    • Presented at ASCVS 2009
  – Arrest CABG performance (N=800)
    • Presented in the TSOC 2009 debate

Page 34: Data Science in Cardiac Sciences

Current Status

• Cardiac Surgery Clinical Database
  – Dendrite, Inc.
  – Connection with HIS
  – Data entry reduced to minimum
  – N=600*15 (electronic *5)

• Nucleostemin
  – Literature review
  – Data exploration (on-line database)

Page 35: Data Science in Cardiac Sciences

Perspectives

• Outcome-based cardiac surgery
  – Evidence-based
  – Selection of procedure
  – Selection of surgeon/team

• Novel and ultimate treatment for end-stage heart failure (initial stage)

• Database approach both for research and clinical practice

Page 36: Data Science in Cardiac Sciences

Thank You!

• 陳勁辰

• [email protected]

Page 37: Data Science in Cardiac Sciences

Theoretical Functional Network of Nucleostemin for Cardiac Stem Cells

Page 38: Data Science in Cardiac Sciences

Abstract

• Nucleostemin plays a pivotal role in cardiac stem cells for the regenerative function but its interactions with other key molecules are still unclear.

• We would like to perform nucleotide-protein and protein-protein interaction analysis by Cytoscape (http://www.cytoscape.org) to build the functional network map for nucleostemin.

Page 39: Data Science in Cardiac Sciences

Abstract

• New or revised bioinformatics methodology may be developed.

• The proposed functional network of nucleostemin may inspire future laboratory investigation of cardiac stem cell research.

Page 40: Data Science in Cardiac Sciences

Backgrounds

• Myocardial regeneration -> end-stage heart failure

• Cardiac stem cells

• Various sources: embryo, BM, iPS,…

• Cellular reprogramming: avoiding the use of embryo

Page 41: Data Science in Cardiac Sciences

Backgrounds

• Nucleostemin: a regulatory protein

• Its expression is associated with proliferation and maintenance of a primitive cellular phenotype

• Nucleostemin expression in cardiomyocytes is induced by fibroblast growth factor-2 and accumulates in response to Pim-1 kinase activity.

Page 42: Data Science in Cardiac Sciences

Backgrounds

• Cardiac stem cells also express nucleostemin that is diminished in response to commitment to a differentiated phenotype.

• Overexpression of nucleostemin in cultured cardiac stem cells increases proliferation while preserving telomere length, providing a mechanistic basis for potential actions of nucleostemin in promotion of cell survival and proliferation as seen in other cell types.

Page 45: Data Science in Cardiac Sciences

Cytoscape

• Http://www.cytoscape.org

• Biomolecular interaction network

• P-P, P-G, G-G interactions

• Plug-in

Page 46: Data Science in Cardiac Sciences

Cytoscape

Page 52: Data Science in Cardiac Sciences

Specific Aims

• To understand nucleostemin at the molecular level;

• To master Cytoscape and its applications;

• To use Cytoscape to construct the functional network of nucleostemin;

• To propose a potential future direction of cardiac stem cell research.

Page 53: Data Science in Cardiac Sciences

Methods and Procedures

• Literature review for cardiac regenerative therapy;

• Literature review for cardiac stem cells;

• Literature review for nucleostemin and related molecules;

• Acquisition of experience and expertise for using Cytoscape;

Page 54: Data Science in Cardiac Sciences

Methods and Procedures

• Use of Cytoscape for nucleotide-protein and protein-protein interaction analysis;

• Construction of the functional network of nucleostemin;

• Hypothesis generation for more research targets of cardiac stem cells starting from the network of nucleostemin.
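As a rough illustration of the network-construction step, the sketch below encodes the two relations stated in the Backgrounds slides (nucleostemin expression induced by FGF-2 and accumulating in response to Pim-1 kinase activity) as edges and writes them in Cytoscape's simple interaction format (SIF), where each line holds a source node, an interaction type, and a target node. The interaction-type labels and the helper names `to_sif` and `neighbors` are hypothetical, and the edge list is a toy example, not a curated network.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (source, interaction type, target)

# The two edges paraphrase statements in the Backgrounds slides;
# the interaction-type labels are hypothetical.
EDGES: List[Edge] = [
    ("FGF-2", "induces_expression", "nucleostemin"),
    ("Pim-1", "promotes_accumulation", "nucleostemin"),
]


def to_sif(edges: List[Edge]) -> str:
    """Serialize edges as tab-delimited SIF lines: source, type, target."""
    return "\n".join(f"{s}\t{t}\t{d}" for s, t, d in edges) + "\n"


def neighbors(edges: List[Edge], node: str) -> Set[str]:
    """Return all nodes directly connected to `node`, ignoring direction."""
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for source, _, target in edges:
        adjacency[source].add(target)
        adjacency[target].add(source)
    return adjacency[node]


sif_text = to_sif(EDGES)
first_shell = neighbors(EDGES, "nucleostemin")
```

The text in `sif_text` could be saved to a `.sif` file for import into Cytoscape. The first-shell neighbors of nucleostemin are then natural starting points for the hypothesis-generation step.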

Page 55: Data Science in Cardiac Sciences

Expected Results

• Molecular characteristics of nucleostemin;

• Functional network of nucleostemin;

• Role of nucleostemin in cardiac stem cells and cardiac regeneration therapy.

• New or revised bioinformatics methodology for the network analysis

Page 56: Data Science in Cardiac Sciences

Gantt Chart

Page 57: Data Science in Cardiac Sciences

Budgets

Page 58: Data Science in Cardiac Sciences

Cardiovascular Surgery Database and Data Exploratory Analysis

Cheng Hsin Rehabilitation Medical Center

2008/11/23

Page 59: Data Science in Cardiac Sciences

Outline

• Objectives

• Content of Cardiovascular Surgery Database

• Scope and challenges

• Clinical Case Management System
  – A possible technological innovation framework
  – Descriptive statistics, or beyond?

Page 60: Data Science in Cardiac Sciences

Objectives

• Develop cardiovascular surgery database
  – Clinical case management system?
  – Including bio-information, systemic complications?

• Risk assessment
  – Pre/post-operative probabilistic judgment?
  – Risk prediction model?

• Outcome prediction
  – Co-occurrence of complications?
  – Major feature screening? Patient screening?

• Statistical analysis
  – Advanced exploratory data analysis?

Page 61: Data Science in Cardiac Sciences

Questions

• Develop cardiovascular surgery database
  – Clinical case management system?
  – Including bio-information, systemic complications?

• Risk assessment
  – Pre/post-operative probabilistic judgment?
  – Risk prediction model?

• Outcome prediction
  – Co-occurrence of complications?
  – Major feature screening? Patient screening?

• Statistical analysis
  – Advanced exploratory data analysis?

Page 62: Data Science in Cardiac Sciences

Content of Cardiovascular Surgery Database

1. Administrative
2. Demographics
3. Hospitalization
4. Risk Factors
5. Previous CV Interventions (1..n)
6. Preoperative Cardiac Status (1..n)
7. Preoperative Medications (1..n)
8. Hemodynamics, cath, and echocardiogram (1..n) …

• There are 19 raw tables in total, with hundreds of features drawn from the patient's special chart.

• Highly de-normalized!

• Transactional data tracking is required!

Page 63: Data Science in Cardiac Sciences

Cardiovascular Surgery Database

Scope

• Surgical operations
  – CABG
  – Valvular heart
  – Heart transplantation
  – Aortic
  – Atrial fibrillation
  – Ventricular restoration
  – Ventricular assist device
  – Congenital heart

• Referred sites?

Page 64: Data Science in Cardiac Sciences

Cardiovascular Surgery Database

Challenge

• Multiple surgical operations
  – Involve different features?
  – Balance between the physician's and the database designer's viewpoints! (Special chart vs. relational tables)
  – NULL/missing values included!

• Inter/intra-hospital database system?

• How to track clinical patient records (pre/post-operative)?

• Need to develop a validation model?

Page 65: Data Science in Cardiac Sciences

Clinical Case Management System

• Four-level framework

• Monitoring Level
  – Frontend: web-based data entry, visualization, export interfaces for various data formats
  – Backend: validation model, relational databases, correlation among features

• Surveillance Level
  – Preoperative: probabilistic reasoning, Bayes decision, bibliography
  – Postoperative: time-tracking?
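The preoperative "probabilistic reasoning" item can be illustrated with the standard Bayes update from a pre-test to a post-test probability: convert the pre-test probability to odds, multiply by the likelihood ratio of the test result, and convert back. The sketch below assumes a single binary test with known sensitivity and specificity. The function name and the numbers are hypothetical and are not taken from the system described here.

```python
def post_test_probability(pre_test: float, sensitivity: float,
                          specificity: float, positive: bool = True) -> float:
    """Update a pre-test probability with one test result using Bayes' rule."""
    for name, p in (("pre_test", pre_test), ("sensitivity", sensitivity),
                    ("specificity", specificity)):
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must be strictly between 0 and 1")
    if positive:
        likelihood_ratio = sensitivity / (1.0 - specificity)   # LR+
    else:
        likelihood_ratio = (1.0 - sensitivity) / specificity   # LR-
    pre_odds = pre_test / (1.0 - pre_test)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Illustrative numbers only: 20% pre-test probability,
# test with 90% sensitivity and 80% specificity.
p_pos = post_test_probability(0.20, sensitivity=0.90, specificity=0.80)
p_neg = post_test_probability(0.20, sensitivity=0.90, specificity=0.80,
                              positive=False)
```

With these inputs a positive result raises the probability to roughly 53%, and a negative result lowers it to roughly 3%. A Bayesian network generalizes the same update to many interdependent variables.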

Page 66: Data Science in Cardiac Sciences

Clinical Case Management System

• Model Construction Level
  – Prediction/classification model (DT, NN, ensemble, etc.)
  – Correlation/co-occurrence frequency graph model
  – Knowledge model (Apriori, Carma, GRI, etc.)
  – Ontology knowledge base?

• Life Quality Level
  – Long-term tracking of patient status
  – WHOQOL-BREF Taiwan Version questionnaire
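The co-occurrence frequency idea in the Model Construction Level can be sketched as counting how often pairs of complications appear in the same patient and keeping the pairs above a support threshold. This is only the pair-counting step that algorithms such as Apriori build on, not a full association-rule miner. The function name `pair_support` and the patient data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def pair_support(cases: Iterable[Set[str]],
                 min_support: float) -> Dict[FrozenSet[str], float]:
    """Return complication pairs whose co-occurrence frequency >= min_support."""
    case_list = list(cases)
    if not case_list:
        return {}
    counts: Counter = Counter()
    for complications in case_list:
        # Count every unordered pair present in this patient's record.
        for pair in combinations(sorted(complications), 2):
            counts[frozenset(pair)] += 1
    n = len(case_list)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


# Hypothetical postoperative complication sets, one per patient.
cases = [
    {"atrial fibrillation", "renal failure"},
    {"atrial fibrillation", "renal failure", "stroke"},
    {"atrial fibrillation"},
    set(),
]
frequent = pair_support(cases, min_support=0.5)
```

The resulting pairs and their supports can be read as weighted edges of a co-occurrence graph, with complications as nodes.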

Page 67: Data Science in Cardiac Sciences

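A Brief Model

[Diagram: four-level framework; the labels recoverable from the figure are listed below.]

Communication via Web-Based Interface (Internet web-based services)

Users:

• Patients

• Laboratory staff

• Surgeons

• Nursing staff

Levels and modules:

• Monitoring Level, Medical Case Management: medical plans, medical case management, display interface for physiological and biochemical information, display of trends and changes in physiological and biochemical information, expert-system interface shell built with JESS

• Surveillance Level, Medical Decision Reasoning: surgical decision model, inference mechanism for pre-test and post-test probabilities, Bayesian belief network inference model, decision support for the choice of surgical approach

• Model Construction Level, Knowledge Construction: ontology-based expert-system knowledge base, analysis of key surgical decision factors, ensemble case classification model, preoperative/postoperative case comparison model

• Life Quality Level, Life Quality Evaluation: postoperative quality-of-life evaluation model for patients, WHOQOL-BREF Taiwan Version questionnaire, care plans for different surgical approaches

Data stores and displays:

• Surgical medicine database

• Surgical knowledge base

• Electronic patient health records

• Visual display

• Nursing care assessment

Captions:

• Executing medical decisions with patient information and an assisting probabilistic model

• Extraction of the key factors for surgical decision-making; model construction for comparing patients before and after surgery

• Life quality evaluation for comparing patients before and after surgery

Flows between users:

• Medication and care issues

• Diagnosis and medication issues

• Diagnosis and care issues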

Page 69: Data Science in Cardiac Sciences

Abstract

• The project was motivated by the need for risk assessment, outcome prediction, and performance feedback.

• Referring to other existing cardiovascular surgery database systems, we would select the variables of interest and then outsource the database design to database programmers with our

• Operations such as CABG, valvular heart surgery, heart transplantation, aortic surgery, atrial fibrillation surgery would be included.

Page 70: Data Science in Cardiac Sciences

Abstract

• The database would be established in a trustworthy system and platform.

• Revision of the database system would be made after test driving.

• Retrospective and prospective data entry (web-based) would be done by trained personnel.

• A preliminary report would be made from the data stored in the database, with the statistical analysis performed by a qualified professional.

Page 71: Data Science in Cardiac Sciences

Backgrounds

• Cardiovascular surgery
  – Technology-intensive
  – Technique-oriented

• A specialty that needs
  – Risk assessment
  – Outcome prediction
  – Performance feedback

• Paper-based data -> demanding of time and manpower

Page 72: Data Science in Cardiac Sciences

Backgrounds

• Disease/procedure-specific
  – CABG, aorta, valve, heart failure (LVAD, transplant), AF, …

• Risk-adjusted outcomes

• Patient-surgeon preop discussion

• Decision-making for choosing therapies

• Performance comparison

• Quality improvement

• Research, reports, and publications

Page 73: Data Science in Cardiac Sciences

Specific Aims

• Establish our hospital-based CVS database system
  – Data entry (web-based, intranet)
  – Data storage and management (secure, confidential)
  – Data analysis

• Features: flexible, compatible with standard syntax, and open-structured

Page 74: Data Science in Cardiac Sciences

Specific Aims

• Data exchange
  – Data import from various sources
  – Data export to advanced statistical software

• Procedure-specific: CABG, valvular, aortic, transplant, atrial fibrillation, …

• Data entry once the system is ready
  – Prospective: clinical staff
  – Retrospective: data staff

• Regular reports

• Clinical research

Page 75: Data Science in Cardiac Sciences

Methods & Procedures

• Variable selection:
  – Demographic, underlying, preop status
  – Operation-related
  – Postop condition
  – Complications, outcomes, follow-ups

• Interaction with database programmers
  – Entry interface
  – Hospital IT support and HIS (hospital information system) integration

Page 76: Data Science in Cardiac Sciences

Methods & Procedures

• Test-drive
  – Debugging
  – Feedback and revision

• Data entry
  – Past data
  – Current and new data

• Statistical analysis
  – Descriptive statistics
  – Inferential statistics
  – Stata 10.2

Page 77: Data Science in Cardiac Sciences

Content of Cardiovascular Surgery Database

1. Administrative
2. Demographics
3. Hospitalization
4. Risk Factors
5. Previous CV Interventions (1..n)
6. Preoperative Cardiac Status (1..n)
7. Preoperative Medications (1..n)
8. Hemodynamics, cath, and echocardiogram (1..n) …

• There are 19 raw tables in total, with hundreds of features drawn from the patient's special chart.

• Highly de-normalized!

• Transactional data tracking is required!

Page 78: Data Science in Cardiac Sciences

Database System Meta-structure

• 1. Software

• 2. Data

• 3. Meta-data: mapping of variable names of identical meaning, for importing data from other existing datasets.

Page 79: Data Science in Cardiac Sciences

Database System Meta-structure

Page 80: Data Science in Cardiac Sciences

Expected Results

• 1. A database system for retrospective and prospective data

• 2. A trained data-entry team

• 3. A preliminary analysis report

• 4. A schedule for regular reporting

• 5. Analysis upon request on a daily basis

Page 81: Data Science in Cardiac Sciences

Analysis Report Outline

1) Outcome Reporting and International Comparisons
2) Overall Cardiac Surgical Activity
3) Preoperative Assessment
4) Patient Demographics
5) Risk-Stratification and Presentation of Risk-Adjusted Outcomes
6) Coronary Artery Bypass Grafting (CABG)
7) Heart Transplantation
8) Summary

Page 82: Data Science in Cardiac Sciences

Gantt Chart

Page 83: Data Science in Cardiac Sciences

Budget

Page 84: Data Science in Cardiac Sciences

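Taiwan Transplant Registry Database Analysis Report

V.2.1 (January 2008)

Organ Donation and Transplantation Registry Center (器官捐贈移植登錄中心), Taiwan Society of Transplantation Medicine (台灣移植醫學學會)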

Page 85: Data Science in Cardiac Sciences

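Summary

• Date range: April 2004 to September 2007

• Transplanted organs include heart, liver, kidney, and lung.

• Covers the characteristics of donors, waiting-list candidates, and recipients, organ utilization rates, waiting times, patient survival rates, and so on.

• Also offers recommendations for upgrading the transplant registry database.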

Page 86: Data Science in Cardiac Sciences

Report Table of Contents

Page 87: Data Science in Cardiac Sciences

Report Table of Contents

Page 88: Data Science in Cardiac Sciences

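Work Report

• Retrospective analysis of the already-established database

• Patient names and hospital names have been encrypted into unique codes and cannot be identified

• Data updated through October 1, 2007
  – More reliable data begin in April 2004

• Database platform: PostgreSQL 8
  – 36 tables in total
  – Records linked by unique patient identifier
  – Exported as the sub-tables needed for analysis (*.csv)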
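The workflow on this slide (names replaced by unique codes, records linked by a patient identifier, sub-tables exported as CSV) can be sketched as follows. The slide does not say how the codes were produced. This example swaps in a keyed hash (HMAC-SHA256) as one common pseudonymization technique, and the key, field names, and sample rows are all hypothetical.

```python
import csv
import hashlib
import hmac
import io
from typing import Dict, Iterable, List

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key, kept off the export


def pseudonym(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable keyed hash so records link without exposing the name."""
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def export_csv(rows: Iterable[Dict[str, object]], fields: List[str]) -> str:
    """Write rows as CSV text, replacing patient_name with a pseudonymous code."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["patient_code"] + fields,
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = {f: row[f] for f in fields}
        out["patient_code"] = pseudonym(str(row["patient_name"]))
        writer.writerow(out)
    return buffer.getvalue()


# Hypothetical registry rows; the same patient appears twice.
rows = [
    {"patient_name": "Patient A", "organ": "heart", "year": 2005},
    {"patient_name": "Patient A", "organ": "heart", "year": 2006},
    {"patient_name": "Patient B", "organ": "kidney", "year": 2006},
]
csv_text = export_csv(rows, ["organ", "year"])
```

Because the same name always maps to the same code, rows for one patient can still be linked across exported sub-tables, while the name itself never appears in the CSV.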

Page 89: Data Science in Cardiac Sciences