Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data Joel Saltz MD, PhD Director Center for Comprehensive Informatics

Page 1: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Joel Saltz MD, PhD

Director, Center for Comprehensive Informatics

Page 2: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Objectives

• Reproducible anatomic/functional characterization at gross level (Radiology) and fine level (Pathology)

• Integration of anatomic/functional characterization with multiple types of “omic” information

• Create categories of jointly classified data to describe pathophysiology, predict prognosis, response to treatment

• Data modeling standards, semantics

• Data research issues

Page 3: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

In Silico Program Objectives (from NCI)

• In silico is an expression used to mean "performed on computer or via computer simulation." (Wikipedia)

• In silico science centers: support investigator-initiated, hypothesis-driven research in the etiology, treatment, and prevention of cancer using in silico methods
  – Generating and publishing novel cancer research findings leveraging caBIG tools and infrastructure
  – Identifying novel bioinformatics processes and tools to exploit existing data resources

• Encouraging the development of additional data resources and caBIG analytic services

• Assessing the capabilities of current caBIG tools

• Emory, Columbia, Georgetown, Fred Hutchinson Cancer , Translational Genomics Research Institute

Page 4: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

In Silico Center for Brain Tumor Research

Specific Aims:

1. Influence of necrosis/hypoxia on gene expression and genetic classification.

2. Molecular correlates of high resolution nuclear morphometry.

3. Gene expression profiles that predict glioma progression.

4. Molecular correlates of MRI enhancement patterns.

Page 5: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Integrative Analysis: Tumor Microenvironment

• Structural and functional differentiation within tumor

• Molecular pathways are time and space dependent

• “Field effects” – gradient of genetic, epigenetic changes

• Radiology, microscopy, high throughput genetic, genomic, epigenetic studies, flow cytometry, microCT, nanotechnologies …

• Create biomarkers to understand disease progression, response to treatment

Tumors are organs consisting of many interdependent cell types

• From John E. Niederhuber, M.D., Director, National Cancer Institute, NIH; presented at Integrating and Leveraging the Physical Sciences to Open a New Frontier in Oncology, Feb 2008

Page 6: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Informatics Requirements

• Parallel initiatives: Pathology, Radiology, “omics”

• Exploit synergies between all initiatives to improve ability to forecast survival & response.

[Diagram labels: Radiology Imaging; Patient Outcome; Pathologic Features; “Omic” Data]

Page 7: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

In Silico Center for Brain Tumor Research: Key Data Sets

REMBRANDT: Gene expression and genomics data set of all glioma subtypes

The Cancer Genome Atlas (TCGA): Rich “omics” set of GBM, digitized Pathology and Radiology

Vasari Feature Set: Standardized annotation of gliomas of all subtypes

Page 8: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

TCGA Research Network

Digital Pathology

Neuroimaging

Page 9: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Distinguishing Among the Gliomas

“There are also many cells which appear to be transitions between gigantic oligodendroglia and astrocytes. It is impossible to classify them as belonging in either group”

Bailey P, Bucy PC. Oligodendrogliomas of the brain. J Pathol Bacteriol 1929; 32:735

Page 10: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Oligodendroglioma Astrocytoma

Nuclear Qualities

Page 11: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Progression to GBM

Anaplastic Astrocytoma (WHO grade III)

Glioblastoma (WHO grade IV)

Page 12: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

TCGA Neuropathology Attributes

120 TCGA specimens; 3 reviewers

Presence and degree of:

• Microvascular hyperplasia: complex/glomeruloid; endothelial hyperplasia

• Necrosis: pseudopalisading pattern; zonal necrosis

• Inflammation: macrophages/histiocytes; lymphocytes; neutrophils

• Differentiation: small cell component; gemistocytes; oligodendroglial; multi-nucleated/giant cells; epithelial metaplasia; mesenchymal metaplasia

• Other features: perineuronal/perivascular satellitosis; entrapped gray or white matter; micro-mineralization

Page 13: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Feature Extraction

TCGA Whole Slide Images

Jun Kong

Page 14: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Astrocytoma vs Oligodendroglioma: Overlap in genetics, gene expression, histology

Astrocytoma vs Oligodendroglioma

• Assess nuclear size (area and perimeter), shape (eccentricity, circularity, major axis, minor axis, Fourier shape descriptor and extent ratio), intensity (average, maximum, minimum, standard error) and texture (entropy, energy, skewness and kurtosis).
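To make the feature families above concrete, here is a minimal sketch of the intensity and texture measurements for a single segmented nucleus, plus circularity as one shape example. It is not the pipeline used in the talk: the function names, the use of 8-bit gray levels, and the histogram-based definitions of entropy and energy are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def intensity_texture_features(pixels):
    """Summarize the gray levels (assumed 0-255 ints) inside one nucleus mask."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n  # population variance
    std = math.sqrt(var)
    # Normalized gray-level histogram drives the entropy and energy terms.
    probs = [count / n for count in Counter(pixels).values()]
    entropy = -sum(p * math.log2(p) for p in probs)
    energy = sum(p * p for p in probs)
    if std > 0:
        skewness = sum((p - mean) ** 3 for p in pixels) / (n * std ** 3)
        kurtosis = sum((p - mean) ** 4 for p in pixels) / (n * std ** 4)
    else:  # a perfectly flat nucleus has no shape to its distribution
        skewness = kurtosis = 0.0
    return {
        "average": mean,
        "maximum": max(pixels),
        "minimum": min(pixels),
        "standard_error": std / math.sqrt(n),
        "entropy": entropy,
        "energy": energy,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


def circularity(area, perimeter):
    """4*pi*area / perimeter^2: equals 1.0 for a perfect circle, lower otherwise."""
    return 4 * math.pi * area / perimeter ** 2
```

In practice a library such as scikit-image would supply the region measurements; the hand-written version is only meant to show what each named feature computes.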

Page 15: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Nuclear Qualities

Which features carry most prognostic significance?

Which features correlate with genetic alterations?

Page 16: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

• Whole slide scans from 14 TCGA GBMs (69 slides)

• 7 purely astrocytic in morphology; 7 with 2+ oligo component

• 399,233 nuclei analyzed for astro/oligo features

• Cases were categorized based on ratio of oligo/astro cells

Machine-based Classification of TCGA GBMs (J Kong)

TCGA Gene Expression Query: c-Met overexpression

Page 17: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Nuclear Feature Analysis: TCGA

• Using the parallel computation infrastructure of Sun Grid Engine, we analyzed 4096x4096 image tiles from 213 whole-slide TCGA images of permanent tissue sections.

• Approximately 90 million nuclei segmented.

• 79 patients: 57 are diagnosed as GBM (‘oligo 0’), 17 are classified as GBM with ‘oligo 1’, and 5 as GBM with ‘oligo 2+’.

• With each data file including all nuclear features from one patient, all nuclei were classified, with blue, green, and red representing nuclei scored as 1~3, 4~6, and 7~10, respectively.
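The tiling step described above can be sketched as follows. This is an illustration only: `tile_origins` and `process_tiles` are hypothetical names, and a local thread pool stands in for the Sun Grid Engine scheduler, which the slide names but does not describe in detail.

```python
from concurrent.futures import ThreadPoolExecutor


def tile_origins(width, height, tile=4096):
    """Yield (x, y, w, h) rectangles that cover a width x height slide.

    Tiles on the right and bottom edges are clipped to the slide bounds.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)


def process_tiles(width, height, analyze, tile=4096, workers=4):
    """Apply `analyze` to every tile rectangle and collect results in tile order.

    Assumption: a thread pool replaces the cluster scheduler for illustration.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze, tile_origins(width, height, tile)))
```

A call such as `process_tiles(w, h, segment_nuclei)` would then return one result per tile, where `segment_nuclei` is whatever per-tile analysis is being run.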

Page 18: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

• The ratio of nuclei classified ≤ 5 to > 5 was computed for each of the 79 patients.

• Ratios associated with ‘oligo 0’ and ‘oligo 2+’ patient populations were compared with two-sample t-test (p=0.0145)
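A rough sketch of the per-patient scoring summary and group comparison described on these slides is given below. The color bins and the ≤ 5 / > 5 cutoff come from the slides; the function names are invented, and the pooled-variance (Student) form of the two-sample t statistic is an assumption, since the slides do not say which variant was used.

```python
import math
from statistics import mean, variance


def score_color(score):
    """Map a 1-10 nuclear score to the display color used on the slides."""
    if 1 <= score <= 3:
        return "blue"
    if 4 <= score <= 6:
        return "green"
    if 7 <= score <= 10:
        return "red"
    raise ValueError(f"score out of range: {score}")


def low_high_ratio(scores, cutoff=5):
    """Ratio of nuclei scored <= cutoff to nuclei scored > cutoff for one patient."""
    low = sum(1 for s in scores if s <= cutoff)
    high = len(scores) - low
    return low / high if high else math.inf


def two_sample_t(a, b):
    """Pooled-variance t statistic and degrees of freedom for two groups."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2
```

Turning the statistic into a p-value requires the t distribution, for example through a statistics package; that step is left out here to keep the sketch dependency-free.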

Nuclear Feature Analysis: TCGA

Page 19: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Discriminating Features (Grade 1 vs. Grade 7-10)

Page 20: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Discriminating Features (Grade 1 vs. Grade 7-10)

Page 21: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Discriminating Features (Grade 1 vs. Grade 7-10)

Page 22: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Determine the influence of tumor microenvironment on gene expression profiling and genetic classification using TCGA data

Necrosis = Severe Hypoxia

Microvascular Hyperplasia

GBM

Page 23: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

3 Gene Families are Altered in GBM: RTK, p53 and RB

Page 24: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

GBM: necrosis, hypoxia, angiogenesis and gene expression

• Does the presence or degree of necrosis within digitized frozen section slides correlate with specific gene expression patterns or determine algorithm-based unsupervised clustering of GBM gene expression categories?

• Does the presence or degree of necrosis influence the type of angiogenesis or pro-angiogenic gene expression patterns within human gliomas?

Page 25: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

GBM: % Necrosis

Page 26: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

TCGA: GBM Frozen Sections

• 179 cases were assessed for % necrosis on frozen section slides for TCGA quality assurance.

• Cox-based regression analysis for % necrosis vs. gene expression (795 probe sets; 647 distinct genes)

Page 27: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Network Analysis based on % Necrosis

Carlos Moreno

David Gutman

Page 28: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Aim 4. Identify correlates of MRI enhancement patterns in astrocytic neoplasms with underlying vascular changes and gene expression profiles.

No enhancement, Normal Vessels, Stable lesion

?

Rim-enhancement, Vascular Changes, Rapid progression

Page 29: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Angiogenesis Segmentation

[Flowchart labels: H&E Image; Color Deconvolution; Hematoxylin Image; Eosin Image; Spatial Norm.; Density Calculation; Density Image; Boundary Smoothing; Object ID; Segmented Vessels. Panel labels: Eosin intensity image; Angiogenic Segmentation]
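The color deconvolution step named in this flowchart separates an H&E pixel into per-stain contributions. The sketch below shows the general idea in the optical-density domain. It is not the presenters' implementation: the stain vectors are illustrative values of the kind commonly quoted for H&E and would need calibration on real slides, and the third "residual" channel is an assumption added so the 3x3 system can be inverted.

```python
import math

# Assumed RGB optical-density directions for each stain (illustrative only).
HEMATOXYLIN = (0.65, 0.70, 0.29)
EOSIN = (0.07, 0.99, 0.11)


def _unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _inverse3(m):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    return (
        ((e * i - f * h) / det, (c * h - b * i) / det, (b * f - c * e) / det),
        ((f * g - d * i) / det, (a * i - c * g) / det, (c * d - a * f) / det),
        ((d * h - e * g) / det, (b * g - a * h) / det, (a * e - b * d) / det),
    )


_H, _E = _unit(HEMATOXYLIN), _unit(EOSIN)
# Rows are stains; the residual row is orthogonal to both stain directions.
STAINS = (_H, _E, _unit(_cross(_H, _E)))
UNMIX = _inverse3(STAINS)


def deconvolve_pixel(rgb):
    """Return (hematoxylin, eosin, residual) amounts for one RGB pixel."""
    # Beer-Lambert: optical density is -log10 of transmitted fraction.
    od = [-math.log10(max(ch, 1) / 255.0) for ch in rgb]
    # od = amounts @ STAINS, so amounts = od @ inverse(STAINS).
    return tuple(sum(od[k] * UNMIX[k][j] for k in range(3)) for j in range(3))
```

Applying `deconvolve_pixel` to every pixel and keeping the second component would give an eosin intensity image of the kind the flowchart feeds into the later density and smoothing steps.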

Page 30: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

States of Angiogenesis

Endothelial Hypertrophy

Complex Microvascular Hyperplasia

Endothelial Hyperplasia

Lee Cooper, Sharath Cholleti

Page 31: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Vessel Characterization

• Bifurcation detection
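The slide does not say how bifurcations were detected. One common approach, sketched here purely as an assumption, is to thin the segmented vessel mask to a one-pixel-wide skeleton and flag pixels whose "crossing number" (the count of 0-to-1 transitions around the 8-neighborhood) is at least 3. All names below are hypothetical, and the skeleton is assumed to be given as a list of rows of 0/1 values.

```python
# Clockwise 8-neighborhood offsets as (row, col), starting at north.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def crossing_number(skel, r, c):
    """Count 0 -> 1 transitions while walking once around pixel (r, c)."""
    def at(rr, cc):
        inside = 0 <= rr < len(skel) and 0 <= cc < len(skel[0])
        return 1 if inside and skel[rr][cc] else 0

    ring = [at(r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if ring[i] == 0 and ring[(i + 1) % 8] == 1)


def bifurcations(skel):
    """Return (row, col) of skeleton pixels where three or more branches meet."""
    return [
        (r, c)
        for r, row in enumerate(skel)
        for c, value in enumerate(row)
        if value and crossing_number(skel, r, c) >= 3
    ]
```

For a T-shaped skeleton this reports only the junction pixel, whereas a plain neighbor count would also flag the pixels next to it.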

Page 32: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Vasari Imaging Criteria (Adam Flanders, TJU; Dan Rubin, Stanford; Lori Dodd, NCI)

• Require standardized validated feature sets to describe de novo disease.

• Fundamental obstacle to new imaging criteria as treatment biomarkers is lack of standard terminology:
  – To define a comprehensive set of imaging features of cancer
  – For reporting imaging results
  – To provide a more quantitative, reproducible basis for assessing baseline disease and treatment response

Page 33: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Defining Rich Set of Qualitative and Quantitative Image Biomarkers

• Community-driven ontology development project; collaboration with ASNR

• Imaging features (5 categories)
  – Location of lesion
  – Morphology of lesion margin (definition, thickness, enhancement, diffusion)
  – Morphology of lesion substance (enhancement, PS characteristics, focality/multicentricity, necrosis, cysts, midline invasion, cortical involvement, T1/FLAIR ratio)
  – Alterations in vicinity of lesion (edema, edema crossing midline, hemorrhage, pial invasion, ependymal invasion, satellites, deep WM invasion, calvarial remodeling)
  – Resection features (extent of nCE tissue, CE tissue, resected components)

Page 34: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data
Page 35: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data
Page 36: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data
Page 37: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Results: Reader Agreement

• High inter-observer agreement among the three readers
  – (kappa = 0.68, p<0.001)

• Percentage agreement was also high for most features individually
  – 22 of 30 features (73%) had agreement greater than 50%
  – Twelve features (40%) had >80% agreement
  – No feature had less than 20% agreement

• Feature agreement rose substantially when used with tolerance (+/- 1).
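The agreement measures on this slide can be illustrated with a small sketch. The slide reports a kappa for three readers without naming the variant; Fleiss' kappa is used here as an assumption because it handles more than two raters. Ratings are assumed to be integer-coded ordinal categories, one tuple per case, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def percent_agreement(ratings, tolerance=0):
    """Fraction of cases where every pair of readers differs by <= tolerance."""
    agree = sum(
        1 for case in ratings
        if all(abs(a - b) <= tolerance for a, b in combinations(case, 2))
    )
    return agree / len(ratings)


def fleiss_kappa(ratings):
    """Fleiss' kappa for cases each rated by the same number of readers."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    observed = 0.0
    for case in ratings:
        counts = Counter(case)
        totals.update(counts)
        # Share of agreeing reader pairs within this case.
        pairs = sum(c * c for c in counts.values()) - n_raters
        observed += pairs / (n_raters * (n_raters - 1))
    observed /= n_cases
    # Agreement expected by chance from the overall category proportions.
    expected = sum((t / (n_cases * n_raters)) ** 2 for t in totals.values())
    return (observed - expected) / (1 - expected)
```

Calling `percent_agreement(ratings, tolerance=1)` corresponds to the "+/- 1" relaxation mentioned in the last bullet: near-misses on an ordinal scale count as agreement.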

Page 38: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Preliminary Relationships of Features to Survival

• Cox proportional hazards models were fit to each of the thirty features related to overall survival.

• Features associated with lower survival included (p<.0001):
  – Proportion of enhancing tissue at baseline.
  – Thick or nodular enhancement characteristics.
  – Contralateral hemisphere invasion.

• Proportion of non contrast enhancing tumor (nCET) had positive correlation with survival.

• Tumor size at baseline had no relationship to survival.

Page 39: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Coupling in silico methodology with a clinical trial: Will treatment work and if not, why not?

Example: Avastin and Glioblastoma as in RTOG-0825 (plus institutional accrual)

Treatment: Radiation therapy and Avastin (anti angiogenesis)

Predict and Explain: Genetic, gene expression, microRNA, Pathology, Imaging

Imaging/RT reproducibility, Integration with EMR, PACS, RT systems

Page 40: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Crucial to Leverage Institutional Data

Ohio State Information Warehouse Infrastructure

[Architecture diagram: Overview. Stages: Acquisition, Transfer, Data Integration, Information Warehouse, User Access. Source feeds (labeled real time, daily, weekly, monthly): CPOE, OR system, Patient Mgmt, Dictated reports, Pathology reports, ADT, Lab, Respiratory, Blood, Endoscopy, Cardiology, Siemens Img, Patient Billing, Practice Plans, Pt Satisfaction, Cancer Genetics, Wound Images, Tissue, Pulmonary, Genomic Data. Services and access: Web, Multi-Dimensional Analysis & Data Mining, Ad-hoc Query, Error Report, Benchmarking, De-Identification/Honest Broker, Wound Center, Research, Web Scorecards & Dashboards, Image Analysis, Text Mining/NLP, Meta Data. User groups: Business, Clinical, Research, External]

Page 41: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Annotations and Imaging Markup (AIM)

Annotations and Imaging Markup Developer (AIM) provides a standard for medical image annotation and markup for images used in the research space, and in particular, the image based cancer clinical trial. It is notable that there is no existing standard for radiology annotation and markup. The caBIG® program is working with almost every standards body such as DICOM to elicit consensus regarding use of AIM as the accepted standard for radiology annotation and markup, and is positioned to extend AIM to digital pathology.

The pixel at the tip of the arrow [coordinates (x,y)] in this image [DICOM: 1.2.814.234543.23243] represents the Ascending Thoracic Aorta [SNOMED: A3310657]
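The example above ties three things together: a point in an image, the image's identifier, and a coded term. A minimal sketch of such a record, serialized to XML, might look like the following. The element and attribute names are invented for illustration and are not the AIM schema; the coordinate values in the usage note are placeholders, since the slide gives only "(x,y)".

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class PointAnnotation:
    """One coded observation attached to a single pixel of one image."""
    image_uid: str      # identifier of the referenced image
    x: float            # pixel column of the marked point
    y: float            # pixel row of the marked point
    coding_scheme: str  # controlled vocabulary name, e.g. "SNOMED"
    code_value: str     # code within that vocabulary
    label: str          # human-readable meaning of the code

    def to_xml(self):
        # Hypothetical element names, NOT the real AIM document structure.
        root = ET.Element("ImageAnnotation")
        ET.SubElement(root, "ImageReference", uid=self.image_uid)
        ET.SubElement(root, "Point", x=str(self.x), y=str(self.y))
        entity = ET.SubElement(
            root,
            "AnatomicEntity",
            codingScheme=self.coding_scheme,
            codeValue=self.code_value,
        )
        entity.text = self.label
        return ET.tostring(root, encoding="unicode")
```

For instance, `PointAnnotation("1.2.814.234543.23243", 212.0, 145.0, "SNOMED", "A3310657", "Ascending Thoracic Aorta").to_xml()` produces a small XML document that a service could store and query; the point of the structure is that both the location and the meaning are machine-readable.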

Aim Data Service Emory (AIME) is a caGrid data service that manages AIM documents in XML databases. AIME supports query, enumerationQuery, queryByTransfer, submit and submitByTransfer methods

Page 42: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

MicroAIM

• Provide a semantically enabled data model for pathology and microscopy image markups and observations

• Goal: interoperable data exchange and knowledge sharing

• Ongoing collaborative efforts to standardize via caBIG and Association for Pathology Informatics

• microAIM data service provides comprehensive query support, and can be efficiently implemented either through an XML-based or a relational based approach.

Page 43: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Pathology AIM; Semantic Annotation and Spatial Reasoning

Observation on a single or multiple objects

Support multiple objects and associate different observations to each object

Calculation on a single or multiple objects

Class of type to represent mask or field value

Represent provenance information for computed markup and annotation

Identify all instances of a set of cell types

Complex spatial queries: Quantify areas of glomeruloid microvascular proliferation within X microns of pseudopalisading necrosis
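A spatial query of the kind described in the last item can be sketched in a few lines. This is a deliberate simplification and not the microAIM query engine: each annotated region is reduced to a centroid and an area, distance is measured centroid to centroid rather than boundary to boundary, and the class and label names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    label: str       # annotation class, e.g. "glomeruloid_mvp" or "pp_necrosis"
    x_um: float      # centroid x in microns
    y_um: float      # centroid y in microns
    area_um2: float  # region area in square microns


def area_within(regions, target, anchor, max_um):
    """Total area of `target` regions lying within max_um of any `anchor` region."""
    anchors = [(r.x_um, r.y_um) for r in regions if r.label == anchor]
    return sum(
        r.area_um2
        for r in regions
        if r.label == target
        and any(math.dist((r.x_um, r.y_um), a) <= max_um for a in anchors)
    )
```

A production system would use polygon geometry and a spatial index so that the query scales to millions of annotated objects per slide.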

Page 44: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Use of caBIG Tools in Emory in Silico Brain Tumor Research Center

[Diagram labels: Osirix; XIP/AVT; Clear Canvas; NBIA; AIME; TCGA Portal; Modified Workstation; Extended AIME Service; Enterprise Architecture 2.0; caB2B; caIntegrator 2; CBIIT Research Team (Carl Schaefer, Jinghui Zhang); TBPT; ICR; VCDE/ARCH; Science; CIP: Research & Clinical Trials; In vivo Imaging]

Page 45: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Data Science Research Challenges Driven by In Silico Discovery Research

• Data integration that targets multiple data sources with conflicting metadata and conflicting data

• Efficient methods for semantic query that targets questions involving complex multi-scale features associated with petascale and exascale ensembles of highly annotated images

• Computer assisted annotation and markup for very large datasets

• Systems to support combinations of structured and irregular accesses to exascale datasets

Page 46: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Data Science Research Challenges

• Structural and semantic metadata management: how to manage tradeoff between flexibility and curation

• Data and semantic modeling infrastructures and policies able to scale to handle distributed systems with an aggregate of 10^9 or more data models/concepts

• Three dimensional (time dependent) reconstruction, feature detection and annotation of 3-D microscopy imagery

• Workflow infrastructure for large scale data intensive computations

Page 47: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Final Data Science Challenge: Large Dataset Size

– Basic small mouse is 10 cm^3

– 1 μ resolution – very roughly 10^13 bytes/mouse

– Molecular data (spatial location) multiply by 10^2

– Vary genetic composition, environmental manipulation, systematic mechanisms for varying genetic expression; multiply by 10^3

Total: 10^18 bytes per big science animal experiment
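The estimate on this slide can be reproduced with a few lines of arithmetic. The one-byte-per-voxel figure is an assumption made here to recover the slide's "very roughly" 10^13 bytes per mouse; the variable names are illustrative.

```python
UM_PER_CM = 10_000          # 1 cm = 10^4 microns

volume_cm3 = 10             # "basic small mouse"
bytes_per_voxel = 1         # assumption: one byte per 1-micron voxel

# 10 cm^3 sampled at 1 micron in each dimension.
voxels_per_mouse = volume_cm3 * UM_PER_CM ** 3
bytes_per_mouse = voxels_per_mouse * bytes_per_voxel

# Spatially resolved molecular data: multiply by 10^2.
bytes_with_molecular = bytes_per_mouse * 10 ** 2

# Varying genetics, environment, and expression: multiply by 10^3.
bytes_per_experiment = bytes_with_molecular * 10 ** 3

print(f"per mouse:      {bytes_per_mouse:.0e} bytes")
print(f"per experiment: {bytes_per_experiment:.0e} bytes")
```

The final figure, 10^18 bytes, is one exabyte, which is why the preceding slides talk about petascale and exascale datasets.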

Page 48: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Thanks to:

• In silico center team: Dan Brat (Science PI), Tahsin Kurc, Ashish Sharma, Tony Pan, David Gutman, Jun Kong, Sharath Cholleti, Carlos Moreno, Chad Holder, Erwin Van Meir, Daniel Rubin, Tom Mikkelsen, Adam Flanders, Joel Saltz (Director)

• caGrid Knowledge Center: Joel Saltz, Mike Caliguiri, Steve Langella co-Directors; Tahsin Kurc, Himanshu Rathod Emory leads

• caBIG In vivo imaging team: Eliot Siegel, Paul Mulhern, Adam Flanders, David Channon, Daniel Rubin, Fred Prior, Larry Tarbox and many others

• In vivo imaging Emory team: Tony Pan, Ashish Sharma, Joel Saltz

• Emory ATC Supplement team: Tim Fox, Ashish Sharma, Tony Pan, Edi Schreibmann, Paul Pantalone

• Digital Pathology R01: Foran and Saltz; Jun Kong, Sharath Cholleti, Fusheng Wang, Tony Pan, Tahsin Kurc, Ashish Sharma, David Gutman (Emory), Wenjin Chen, Vicky Chu, Jun Hu, Lin Yang, David J. Foran (Rutgers)

Page 49: Integrative Analysis of Pathology, Radiology and High Throughput Molecular Data

Thanks!