
Disclosure of Relevant Financial Relationships - USCAP · Deep Learning All-digital Whole Slide Imaging Workflow Specific DP Reimbursement Models High Throughput Scanners and GPU-based


3/22/2017


COMPELLING USE-CASES FOR IMMEDIATE DEPLOYMENT OF IMAGE-BASED ANALYTICS IN DIGITAL WHOLE SLIDE IMAGING PATHOLOGY WORKFLOW

Ulysses G. J. Balis, M.D., FCAP, FASCP, FAIMBE
Professor of Pathology & Director, Division of Pathology Informatics
Director, Computational Pathology Lab Section
Department of Pathology
Michigan Medicine
[email protected]

Disclosure of Relevant Financial Relationships

USCAP requires that all planners (Education Committee) in a position to influence or control the content of CME disclose any relevant financial relationship WITH COMMERCIAL INTERESTS which they or their spouse/partner have, or have had, within the past 12 months, which relates to the content of this educational activity and creates a conflict of interest.

Disclosure of Relevant Financial Relationships


Dr. Balis declares affiliation with:

Inspirata, Inc. – Strategic Advisory Board

(This is included for completeness; no commercial or proprietary information is included in this presentation)

Outline

• Observations about data growth in Pathology
• Some thoughts on the “Hype Cycle”
• Maturation of required computational solutions needed in support of deploying WSI
– Workflow models
– High-throughput computation (GPUs, both local and cloud-based)
• Some thoughts on information theory and data compression
• Transitioning to cloud services to realize High-Throughput WSI solutions
• Example Opportunities, made possible by WSI-based workflow:
– Rare micrometastasis detection
– Mitotic Figure detection
• Another Motivation: Content-Based Image Retrieval
• Example Use-case: Democratization of Image-based Analytics on the Web
• Closing thoughts

Data Portfolio: Contemporary Pathology Setting - 2017 (The Present)

[Figure labels: Diagnostic Text; Image-based Data; All Other Metadata]


Data Portfolio: Digital Pathology Workspace ca. 2022 (…with near-complete adoption)

[Figure labels: Diagnostic Text; Image-based Data; All Other Metadata]

The Hype Cycle as Witnessed within Digital Pathology

[Figure: hype cycle curve. Horizontal axis: Time. Vertical axis: Expectations. Phases: Innovation Trigger Phase; Peak of Inflated Expectations; Trough of Disillusionment; Slope of Enlightenment; Plateau of Productivity. Items plotted on the curve: Deep Learning; All-digital Whole Slide Imaging Workflow; Specific DP Reimbursement Models; High Throughput Scanners and GPU-based Computation; Digital Consultation Outreach; Multiplex assays; FDA Clearance for Primary Diagnosis; Quantitative Immunoscoring; Informed Detection; Conventional Machine Learning; NLP; DICOM; Liquid Bx; Effective Federated Integration with AP-LIS systems]

High Level Integrated Diagnostics Architectural Map

[Figure: architecture diagram. Labeled components: Multiple Clinical Data Sources (Epic, LIS, AP-LIS, PACS, Cancer Registry); Enterprise Service-Oriented Architecture Message Bus; Discrete Numerical Data Parsing Pipeline (Staging, Numerical Validation, Relational DB, Final Data Transformation); Free Text NLP Parsing Pipeline (Staging, Lexical Validation, Relational DB, Final Data Transformation); Image Scanning Pipeline (Scanning Center, Image Aggregation, Relational DB, Image Analysis / Informed Detection); Multi-Axial Edge-Connected and Relational Database with High-Performance Cluster; Multiple User Classes]

High Level Discovery Workflow Map

[Figure: workflow diagram with numbered callouts 1 to 5. Labeled components: EMR; ADT / Billing System; Data Warehouse; Other Clinical Repositories; Interface Engine (WBI, Cloverleaf, eGate, etc.); LIS; Digital Integration Engine; Barcode Tracking System; High-Throughput Scanning Facility; High-Performance Scanner and Analysis Server; Mirrored High-performance On-line and Near-line Image Storage; Mirrored High-performance Image Server; DICOM-Based PACS; Application Server; Overall Application Suite; Pathology Discovery Workstations]

Graphics Processing Units (GPUs) as a Computational Solution to Needed Scale

• Instead of running a program as a single linear set of operations (“a thread”), why not run 1,000 or 10,000 threads, concurrently?

• GPU‐based computation is usually only amenable to algorithms that can be subdivided without the need for substantial inter‐thread communication

• Whole Slide Imaging computation, by virtue of its tiled structure, is an ideal candidate for most image search and analytical operations

• GPU technology has become very affordable

• GPU solutions are now available as a commodity from cloud service providers.

Hence, conditions are perfect for transitioning to cloud‐based WSI analytics!
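
The tiling argument above can be sketched in a few lines. This is a minimal illustration only, not the presenter's implementation: CPU threads stand in for GPU threads, the 4 x 4 "slide" is synthetic, and the names `split_into_tiles`, `mean_intensity` and `analyze_tiles` are hypothetical. The point it shows is that each tile is analyzed with no communication between workers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_tiles(image, tile_size):
    """Yield (top, left, tile) for each tile_size x tile_size block of a 2-D list."""
    height, width = len(image), len(image[0])
    for top in range(0, height, tile_size):
        for left in range(0, width, tile_size):
            tile = [row[left:left + tile_size] for row in image[top:top + tile_size]]
            yield top, left, tile


def mean_intensity(tile):
    """Stand-in per-tile analysis (assumption): average pixel value of one tile."""
    pixels = [p for row in tile for p in row]
    return sum(pixels) / len(pixels)


def analyze_tiles(image, tile_size, workers=4):
    """Analyze every tile concurrently; no tile needs data from any other tile."""
    tiles = list(split_into_tiles(image, tile_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda t: mean_intensity(t[2]), tiles))
    return {(top, left): value for (top, left, _), value in zip(tiles, results)}


# Synthetic 4 x 4 "slide" split into four 2 x 2 tiles.
slide = [
    [0, 0, 10, 10],
    [0, 0, 10, 10],
    [20, 20, 30, 30],
    [20, 20, 30, 30],
]
tile_means = analyze_tiles(slide, tile_size=2)
print(tile_means)
```

Because the per-tile function has no shared state, the same decomposition maps naturally onto one GPU thread block per tile.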

Claude Elwood Shannon (April 30, 1916 – February 24, 2001)

• Formalized our modern understanding of Information Theory and entropy

• Remarkably, Information Theory has seen little systematic use in extracting meaningful information from WSI data sets

• Lossy and lossless compression are often applied without rigorous assessment of pre- and post-operation information content

• It turns out that data compression is an incredibly important topic with respect to high-throughput WSI analytics
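
As a concrete reference point for "information content", Shannon entropy of a pixel-value distribution can be computed as below. This is a small sketch with made-up pixel lists, not a method taken from the slides; it treats pixels as independent symbols, which ignores spatial structure.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy in bits per pixel: H = -sum(p * log2(p)) over gray levels."""
    counts = Counter(pixels)
    total = len(pixels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Illustrative (made-up) image regions, flattened to lists of gray levels.
flat_region = [128] * 16                   # a single gray level
two_level_region = [0, 255] * 8            # two equally likely levels
four_level_region = [0, 85, 170, 255] * 4  # four equally likely levels

h_flat = shannon_entropy(flat_region)
h_two = shannon_entropy(two_level_region)
h_four = shannon_entropy(four_level_region)
print(h_flat, h_two, h_four)
```

Comparing such a measure before and after a compression step is one simple way to quantify what the operation discarded.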

Information Theory as Applied to Digital Pathology Image Subject Matter and Image Search


3 x 5 effective pixels → 15 bytes of information
• Essentially all high frequency information is absent
• Compression ratio of 71,093:1 as compared to the original

6 x 10 effective pixels → 60 bytes
• Most high frequency information is absent
• Compression ratio of 17,773:1 as compared to the original

12 x 20 effective pixels → 240 bytes
• Minimal high frequency information is present
• Compression ratio of 4,443:1 as compared to the original

24 x 40 effective pixels → 960 bytes
• Probably enough high frequency information is present
• Compression ratio of 1,111:1 as compared to the original

48 x 80 effective pixels → 3,840 bytes
• Adequate high frequency information is present
• Compression ratio of 278:1 as compared to the original

800 x 1333 effective pixels → 1.1 MB
• All high frequency information present
• Original Image
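
The ratios quoted for this image series follow from simple arithmetic. The sketch below reproduces them under the assumption, inferred from the byte counts on the slides (3 x 5 pixels = 15 bytes), that each effective pixel occupies one byte; the function name is illustrative.

```python
def compression_ratio(orig_w, orig_h, small_w, small_h):
    """Original size divided by downsampled size, assuming one byte per effective pixel."""
    return (orig_w * orig_h) / (small_w * small_h)


ORIGINAL = (800, 1333)  # 1,066,400 effective pixels, about 1.1 MB
levels = [(3, 5), (6, 10), (12, 20), (24, 40), (48, 80)]

ratios = {level: round(compression_ratio(*ORIGINAL, *level)) for level in levels}
for (w, h), ratio in ratios.items():
    print(f"{w} x {h}: {w * h} bytes, about {ratio:,}:1")
```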


4 x 3 effective pixels → 12 bytes
• 167,000:1

8 x 6 effective pixels → 48 bytes
• 41,752:1

16 x 13 effective pixels → 208 bytes
• 9,635:1

32 x 26 effective pixels → 832 bytes
• 2,409:1

64 x 51 effective pixels → 3,264 bytes
• 614:1

128 x 102 effective pixels → 13,056 bytes
• 153:1


256 x 205 effective pixels → 52,480 bytes
• 38:1

1583 x 1266 effective pixels → 2 MB (original image)

Malignant Melanoma

Use Case – Informed Detection

• Micro-metastasis identification is time-consuming

• A pre-screening tool would save pathologist time, provided it achieves sufficient sensitivity

• In such circumstances, the pathologist’s task is shifted to directed review, which is less fatiguing, allowing the pathologist to practice at their highest credentialed level

• This is no longer in the realm of fiction


Use Case – Mitosis Counting

• Mitotic figure identification is also time-consuming

• A pre-screening tool would similarly save pathologist time, provided it achieves sufficient sensitivity

• Neuropathology and bone & soft tissue services can allocate substantial time for this task

A Foundational Model Building towards Image-Based Differential Diagnosis Generation

[Figure: layered model, listed from top to bottom]
• Fully Automated Diagnosis
• Semi-Supervised Dx
• Machine Learning Techniques
• Normalized Discrete Vectorized and Scalar Data
• Application of Individual Image Extraction Operators (focal, diffuse and global)
• Mature Technologies (scanners, storage, servers, GPUs and network bandwidth)

Use Case: Using Images Themselves to Search Image Repositories & Retrieve Associated Case Metadata:

The Dawn of Image-Based Predictive Assays

• This is potentially a “killer app” for the field of Whole Slide Imaging

• It can extract information not directly available to human cognition, and therefore not available through optical microscopy alone; such information can only be reached by means of DP

• When validated, predictive assays hold the potential to elevate WSI to a “must have” modality, essentially creating a new gold standard

Use Case: Searching Libraries of Pathology Images with Images Themselves:

A Schematic Perspective

[Figure: Content-Based Image Retrieval (CBIR) schematic. Stages shown: Initial Predicate Image with feature of interest; Extraction from image repositories based upon spatial information; Analysis of data in the digital domain (…001011010111010111..); Resultant gallery of matching images and any/all associated metadata]

Use of CBIR to rapidly converge on a classifier
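
The retrieval loop in the schematic can be sketched as a nearest-neighbor search over feature vectors. Everything here is an assumption made for illustration: the three-element feature vectors, the case identifiers, the metadata labels, and the use of cosine similarity as the matching criterion. The slides do not specify a similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, repository, top_k=2):
    """Rank repository entries by similarity to the query; return the best top_k."""
    scored = [
        (cosine_similarity(query_vector, entry["features"]), entry)
        for entry in repository
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        (round(score, 3), entry["case_id"], entry["metadata"])
        for score, entry in scored[:top_k]
    ]


# Hypothetical repository: feature vectors with their associated case metadata.
repository = [
    {"case_id": "case-A", "features": [0.9, 0.1, 0.0], "metadata": {"label": "example label 1"}},
    {"case_id": "case-B", "features": [0.1, 0.8, 0.1], "metadata": {"label": "example label 2"}},
    {"case_id": "case-C", "features": [0.8, 0.2, 0.1], "metadata": {"label": "example label 1"}},
]

query = [1.0, 0.1, 0.0]  # features of the predicate image (made up)
matches = retrieve(query, repository)
for score, case_id, metadata in matches:
    print(score, case_id, metadata)
```

The returned metadata, not just the matching images, is what makes the gallery useful for converging on a classifier.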


…This raises the following questions…

• What is the actual useful (actionable) information content of the WSI data that we are generating?

• Does it make sense to query WSI datasets in native (non-tokenized) format?

• Are there more efficient ways to represent the spatial data?

• Are there some data elements that are more important than others?

Three Generations of Texture‐Based Pattern Recognition Software

• I – Vector Quantization (VQ)

• II – Spatially Invariant Vector Quantization (SIVQ)

• III – Vector Invariant Pattern Recognition (VIPR)
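
For readers unfamiliar with the first generation, plain vector quantization replaces each image patch with the index of its nearest codebook vector. The sketch below shows only that generic idea; the codebook, the four-pixel patches and the function names are invented for illustration and are not taken from the software named above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return the index of the nearest codebook vector (the code word)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(vector, codebook[i]))


# Hypothetical codebook of three 4-pixel prototypes.
codebook = [
    [0, 0, 0, 0],          # dark patch
    [255, 255, 255, 255],  # bright patch
    [0, 255, 0, 255],      # striped patch
]

patches = [[10, 5, 0, 12], [250, 240, 255, 251], [3, 249, 8, 240]]
codes = [quantize(p, codebook) for p in patches]
print(codes)
```

Each patch is now represented by one small integer, which is the compression that makes searching in the quantized space cheap.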

[Figure labels: Markov Field Synthesis grid; SIVQ individual search predicate feature match; Bayesian Probability Engine]

A Matter of Degrees of Freedom…

Candidate Feature: how many ways can this be sampled?

How Many Ways Can A Candidate Feature Be Matched During Training?

[Figure labels: X Translational Freedom; Y Translational Freedom; Rotational Freedom]
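
One way to see why rotational freedom need not be enumerated during training: if a feature is sampled on a ring, rotating the feature only shifts the samples around the ring, so a single stored ring can be compared at every circular shift at match time. The sketch below illustrates that idea only; it is an assumption-laden toy with invented sample values, not the SIVQ implementation.

```python
def ring_distance(ring_a, ring_b):
    """Squared difference between two rings of samples taken at the same angles."""
    return sum((a - b) ** 2 for a, b in zip(ring_a, ring_b))


def best_rotation_match(predicate_ring, candidate_ring):
    """Try every circular shift of the candidate; return (best_shift, best_distance)."""
    best = None
    for shift in range(len(candidate_ring)):
        rotated = candidate_ring[shift:] + candidate_ring[:shift]
        distance = ring_distance(predicate_ring, rotated)
        if best is None or distance < best[1]:
            best = (shift, distance)
    return best


# Eight samples around a ring (made-up intensities).
predicate = [10, 50, 200, 50, 10, 0, 0, 0]
candidate = [0, 0, 10, 50, 200, 50, 10, 0]  # same pattern, rotated by two samples

shift, distance = best_rotation_match(predicate, candidate)
print(shift, distance)
```

A single predicate ring therefore covers all rotations of the feature, at the cost of one comparison per shift.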


The Compression Opportunity of SIVQ / VIPR: It may be the same feature, but there are a great many ways to sample it

• Typical Feature Vector:
– 25 x 25 pixels (x by y) or larger
• 625 translational degrees of freedom
– Effective radius of 12.5 pixels
– After Nyquist rotational sampling (2x spatial frequency)
• 2 x 12.5 x π ≈ 79 separate rotations
– 3 color planes
– 2 mirror symmetries
– At least 20 possible semi-discrete length-scale Nyquist samples
– Altogether, there are at least 625 x 79 x 3 x 2 x 20 = 5,925,000 possible ways to represent one possible vector (assuming twenty fixed magnifications in use)
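
The count above can be checked with a few lines of arithmetic. One assumption is labeled in the code: the figure of 79 rotations is taken to be the circumference of a ring of radius 12.5 pixels (2πr ≈ 78.5), rounded up, since that is the value carried into the slide's total.

```python
import math

side = 25                                    # feature vector is 25 x 25 pixels
translations = side * side                   # 625 (x, y) offsets
radius = side / 2                            # effective radius of 12.5 pixels
# Assumption: 79 is the ring circumference 2*pi*r (about 78.5), rounded up.
rotations = math.ceil(2 * math.pi * radius)
color_planes = 3
mirror_symmetries = 2
length_scales = 20                           # twenty fixed magnifications

total_representations = (
    translations * rotations * color_planes * mirror_symmetries * length_scales
)
print(f"{total_representations:,}")
```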

Further Possible Reductions in Degrees of Freedom

• Length Scale
– Up to 20x impact on search space (40:2 magnification ratio)
• Dynamic Range (contrast)
– 3x impact on search space
• Black Level Offset (brightness)
– 5x impact on search space
• Biased distortion ellipsoid compression of fundamental circular vectors
– 30x (both angle of axis and degree of distortion)
• Total further reductions: at least 9,000 (20 x 3 x 5 x 30), or approximately 4 orders of magnitude, in addition to the initial 5.9 million-to-one reduction ratio

Total Realized Search Space Reductions

• RGB Images
– 5,925,000 x 10^4 ≈ 60 x 10^9
– (60 billion equivalent Cartesian vectors)

• Computational performance improves linearly with the reduction in required comparisons for each matching class (at least 60 billion times faster search for the predicate of interest)

• In many cases, a complete feature descriptor can be described with as few as a single vector.
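
The totals on the last two slides can be reproduced as follows. The sketch shows both the rounded figure used on the slide (9,000 treated as 10^4) and the unrounded product, so the size of the approximation is visible; variable names are illustrative.

```python
import math

initial_reduction = 5_925_000  # ways to represent one vector, from the earlier count

further = {
    "length scale": 20,
    "dynamic range (contrast)": 3,
    "black level offset (brightness)": 5,
    "distortion ellipsoid": 30,
}
further_reduction = math.prod(further.values())                   # 9,000

# The slide rounds 9,000 to four orders of magnitude (10^4).
rounded_further = 10 ** round(math.log10(further_reduction))      # 10,000
rounded_total = initial_reduction * rounded_further               # about 60 x 10^9
exact_total = initial_reduction * further_reduction               # about 53 x 10^9

print(f"further: {further_reduction:,}")
print(f"rounded total: {rounded_total:,}")
print(f"exact total:   {exact_total:,}")
```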

Motivation: Why Develop Semi-Supervised and Unsupervised Tools for Differential Diagnosis Generation?

• Not to replace the pathologist, but rather to:
– Transition primary screening activities (time-consuming and tedious) to a directed review paradigm (faster and less fatiguing)
– Add an interactive machine vision layer to the sign-out process, conferring quantitative, prognostic and theragnostic data, as required

• Find all image-based matches (or near-matches) in a repository that correlate with the current image, based on spatially-based acceptance criteria

• Use the matching images as a source of statistically convergent metadata that meets established thresholds for predictive power (standard ROC performance metrics) for key concepts such as:
– Diagnosis
– Biological potential of malignancy (e.g. survival)
– Expected disease-free survival following specific therapies
– Image-enhanced Kaplan-Meier statistics
– Histology-normalized response to therapeutic agents / regimen / clinical course
– Association with genomic data already known for the image-matched cohort of cases (allowing the constitutive image features to serve as a proxy for previously established multi-dimensional correlates between morphology and the molecular basis of disease, once initial discovery has bridged the two)

Simple Use Case Already Reduced to Practice: Ground Truth Cancer Mapping

• Useful for precisely identifying all areas of a whole-slide image that are involved by malignancy
– Tumor quantization
– Automated gating for LCM
– Fiducial mapping for multi-modality fusion studies

• As vectors are internally derived for each case, inter-slide variability from fixation and staining becomes inconsequential

Interactive Demonstration

• Web‐based deployment of WSI viewing in tandem with high‐performance computation

• Allows for real‐time analytical and diagnostic activities

• Publicly available


Final Observations

• The combined democratization of high-throughput computing, as made possible with cloud-based GPU solutions, in tandem with algorithms that can effectively operate in compressed data spaces, bodes well for the development of real-time, high-throughput WSI algorithms

• Continued migration to the cloud will accelerate the pace of discovery and implementation

• These tools are very real and the data they can extract will be incremental to our current diagnostic armamentarium

Important Information Regarding CME/SAMs

The Online CME/Evaluations/SAMs claim process will only be available on the USCAP website until September 30, 2017.

No claims can be processed after that date!

After September 30, 2017 you will NOT be able to obtain any CME or SAMs credits for attending this meeting.