62
Data Visualization in Molecular Biology Alexander Lex July 29, 2013

Data Visualization in Molecular Biology Alexander Lex July 29, 2013

Embed Size (px)

Citation preview

Page 1: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

Data Visualization in Molecular BiologyAlexander Lex

July 29, 2013

Page 2: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

2

research requires understanding data

but there is so much of it…

Page 3: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

3

What is important?

Where are the connections?

Page 4: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

4

methylation levels

mRNA expression

copy number status

mutation status

microRNA expression

clinical parameters

pathways

protein expression

sequencingdata

publications

Page 5: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

5

Data Visualization

… makes the data accessible

… combines strengths of humans and computers

… enables insight

… communicates

Page 6: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

6

THREE TOPICS

Teaser Picture

Page 7: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

7

Cancer Subtype Analysis

Page 8: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

8

Pathways & Experimental Data

Page 9: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

9

Managing Pathways & Cross-Pathway Analysis

Page 10: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

10

Who am I?

PostDoc @ Harvard, Hanspeter Pfister’s Group

PhD from TU Graz, Austria

Co-leader ofCaleydo Project

Page 11: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

11

What is Caleydo?

Software analyzing molecular biology data

Software for doing research in visualizationdeveloped in academic setting

platform for trying out radically new visualization ideas

Quest for compromise between academic prototyping and ready-to-use software

Page 12: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

12

What is Caleydo?

Open source platform for developing visualization and data analysis techniques

easily extendible due to plug-in architecture

you can create your own views

you can plug-in your own algorithms

Page 13: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

13

The Team

Marc Streit Johannes Kepler University Linz, AT

Christian Partl Graz University of Technology, AT

Samuel Gratzl Johannes Kepler University Linz, AT

Nils Gehlenborg Harvard Medical School, Boston, USA

Dieter Schmalstieg Graz University of Technology, AT

Hanspeter Pfister Harvard University, Cambridge, USA

Page 14: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

14

CANCER SUBTYPE VISUALIZATION

Caleydo StratomeX

Page 15: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

15

Cancer Subtypes

Cancer types are not homogeneous

They are divided into Subtypesdifferent histology

different molecular alterations

Subtypes have serious implicationsdifferent treatment for subtypes

prognosis varies between subtypes

Page 16: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

16

Cancer Subtype Analysis

Done using many different types of data,

for large numbers of patients.

Page 17: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

18

Goal:

Support tumor subtype characterization

through

Integrative visual analysis of cancer genomics data sets.

Page 18: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

19

StratificationPa

tient

sTabular Data

Candidate Subtypes

Genes, Proteins, etc.

Page 19: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

20

Stratification of a Single Dataset

Cluster A1

Cluster A2

Cluster A3

Subtypes are identified by stratifying datasets, e.g.,

based on an expression pattern

a mutation status

a copy number alteration

a combination of these

Page 20: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

21

Patie

nts

Genes

Candidate Subtype /Heat Map

Header /Summary of whole Stratification

Page 21: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

22

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabulare.g., mRNA

Categorical,e.g., mutation status

Page 22: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

23

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabulare.g., mRNA

Categorical,e.g., mutation status

Page 23: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

24

Clustering of mRNA Data

Stratification on Copy Number Status

Many shared Patients

Page 24: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

25

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Tabulare.g., mRNA

Categorical,e.g., mutation status

Page 25: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

26

Stratification of Multiple Datasets

Cluster A1

Cluster A2

Cluster A3

B1

B2

Dependent Data,e.g. clinical data

Dep. C1

Dep. C2

Tabulare.g., mRNA

Categorical,e.g., mutation status

Page 26: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

27

Survival data in Kaplan Meier plots

Page 27: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

28

Detail View

Page 28: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

29Dependent Pathway

Page 29: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

30

Page 30: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

31Stratification based on

clinical variable (gender)

Page 31: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

32

How to Choose Stratifications?

~ 15 clusterings per matrix

~ 15,000 stratifications for copy number & mutations

~ 500 pathways

~ 20 clinical variables

Calculating scores for matches

Ranking the results

Page 32: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

33

Query column Result column

Ranked Stratifications

Considered Datasets

Page 33: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

34

Algorithms for finding..

… matching stratification

… matching subtype

… mutual exclusivity

… relevant pathway

… stratification with significant effect in survival

… high/low structural variation

Page 34: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

35

Live-Demo!

http://stratomex.caleydo.org

Page 35: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

36

PATHWAYS & EXPERIMENTAL DATA

Caleydo enRoute

Teaser Picture

Page 36: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

37

Experimental Data and Pathways

Cannot account for variation found in real-world data

Branches can be (in)activated due to mutation,

changed gene expression,

modulation due to drug treatment,

etc.

Page 37: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

38

Why use Visualization?

Efficient communication of information

A -3.4

B 2.8

C 3.1

D -3

E 0.5

F 0.3

C

B

D

F

A

E

Page 38: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

39

Experimental Data and Pathways

[Lindroos2002]

[KEGG]

Page 39: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

40

Five RequirementsIdeal visualization technique addresses all

Talking about 3 today

Page 40: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

41

R I: Data Scale

Large number of experimentsLarge datasets have more than 500 experiments

Multiple groups/conditions

Page 41: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

42

R II: Data Heterogeneity

Different types of data, e.g.,mRNA expression numerical

mutation statuscategorical

copy number variation ordered categorical

metabolite concentration numerical

Require different visualization techniques

Page 42: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

43

R V: Supporting Multiple Tasks

Two central tasks:Explore topology of pathway

Explore the attributes of the nodes (experimental data)

Need to support both!

C

B

D

F

A

E

Page 43: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

44

Pathway View

A

E

C

B

D

F

Pathway View

C

B

D

F

A

E

enRoute View

Concept

Group 1Dataset 1

Group 2Dataset 1

Group 1Dataset 2

B

C

F

A

D

E

Page 44: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

45

Pathway View

On-Node Mapping

Path highlighting with Bubble Sets [Collins2009]

IGF-1

low high

Page 45: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

46

enRoute View

Group 1Dataset 1

Group 2Dataset 1

Group 1Dataset 2

B

C

F

A

D

E

Path Representation

Page 46: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

47

enRoute View

Group 1Dataset 1

Group 2Dataset 1

Group 1Dataset 2

B

C

F

A

D

E

Experimental Data Representation

Page 47: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

48

Experimental Data Representation

Gene Expression Data (Numerical)

Copy Number Data (Ordered Categorical)

Mutation Data

Page 48: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

49

enRoute View – Putting All Together

Page 49: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

50

CCLE Cell lines & Cancer Drugs

Analysis by Anne Mai Wasserman

Page 50: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

51

MANAGING PATHWAYS & CROSS-PATHWAY ANALYSIS

Collaboration with AM Wassermann, M Borowsky, M Glick @NIBR

Teaser Picture

Page 51: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

52

Pathways

Partitioning in pathways is artificial

Purpose: reduce complexity “Relevant” subset of nodes and edge

Makes it hard tounderstand cross-talk

identify role of nodes in other pathways

Page 52: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

53[Klukas & Schreiber 2006]

Page 53: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

54

Solution: Contextual Subsets

Page 54: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

55

Solution: Contextual Subsets

Page 55: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

56

Page 56: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

57

Levels of Detail

Experimental data highlights

Thumbnail showing paths

Page 57: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

58

How to Select Pathways?

Search Pathway

Find pathways that contain focus node

Find pathway that is similar to another one

Page 58: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

59

Ranked pathway list

Datasets

Selected Path

Page 59: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

60

Integrating enRoute

Page 60: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

61

Video!

http://enroute.caleydo.org

Page 61: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

62

More Information

http://caleydo.orgSoftware, Help, Project Information, Publications, Videos

Page 62: Data Visualization in Molecular Biology Alexander Lex July 29, 2013

63

Data Visualization In Molecular Biology

Alexander Lex, Harvard [email protected]://alexander-lex.com

?Credits:

Marc Streit, Nils Gehlenborg, Hanspeter Pfister, Anne Mai

Wasserman, Mark Borowsky,, Christian Partl, Denis Kalkofen,

Samuel Gratzl, Dieter Schmalstieg