Integrative ‘-omics’ – wherein multiple types of high-throughput data are combined and analysed together – continues to grow in popularity for its potential to illuminate the basis of complex diseases. Our work explores different ways of combining such data to reveal insights into cancer biology.
Integration and analysis of high-throughput data types for insights into complex disease
National Council of Women NSW
Olena Pchilka Branch of the Ukrainian Women’s Association NSW
SARAH-JANE SCHRAMM
Multiple approaches
Melanoma
› High and rising incidence
› Aggressive and therapy-resistant; surgical resection is key
› Same-stage disease can have markedly different survival outcomes
› Patient outcome predicted using clinical and histological features
› Limited predictive power for individual patients
Stages I & II, primary melanoma
Stage III, lymphatic drainage from primary (nodal metastases)
Stage IV, further dissemination (distant metastases)
Image adapted and reproduced from LANCET ONCOLOGY|Vol 8|2007
Research aims
› New prognostic markers
- To determine whether there are significant biomarker and pathway differences between melanomas of good and bad prognosis after resection of nodal metastatic disease;
› New therapeutic targets
- To identify and validate the principal regulatory pathway abnormalities that characterise metastatic (stage III and IV) melanomas;
- To investigate novel genomic drivers of melanoma tumour progression and outcome.
What is Cancer Systems Biology?
“Recent findings point to daunting heterogeneity within individuals, and even
within tumours over time…
…rummaging through that complexity is exactly what systems biologists
do…
…Rather than focusing on one molecular pathway, this integrative approach
blends many contexts, including DNA, RNA, proteins, signalling networks,
cells, organs, whole organisms and even environmental factors.
This varied data mix requires scientists to build complex mathematical
models of cancer, which in turn drive new research questions…”
Reprinted from NATURE|Vol 464|2010
How does one do Cancer Systems Biology?
1. Collect and prepare different data types, such as:
› Gene expression microarray data
› MicroRNA expression array data
› Proteomic data
› Clinical data, e.g., survival data
› Pathologic data, e.g., subtypes
› Mutation data, e.g., from RNA/DNA-seq
2. Combine and interpret data with mathematical models
3. Validate the models
Slide adapted from Los Alamos q-bio Summer School, 2009
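A minimal sketch of step 1, assuming each data type has already been reduced to per-sample records keyed by a shared sample ID. The helper name and the toy values are hypothetical, not from the original work; the point is only that samples missing from any one data type are dropped before modelling.

```python
from typing import Dict, List


def integrate_by_sample(
    expression: Dict[str, Dict[str, float]],
    clinical: Dict[str, Dict[str, object]],
    mutations: Dict[str, List[str]],
) -> Dict[str, Dict[str, object]]:
    """Hypothetical helper: join three data types on a shared sample ID.

    Only samples present in all three inputs are kept, so every returned
    record has expression, clinical and mutation data.
    """
    shared = sorted(set(expression) & set(clinical) & set(mutations))
    return {
        sample: {
            "expression": expression[sample],
            "clinical": clinical[sample],
            "mutations": mutations[sample],
        }
        for sample in shared
    }


# Illustrative toy inputs (made-up values).
expression = {"S1": {"AKT1": 7.2}, "S2": {"AKT1": 6.8}, "S3": {"AKT1": 8.1}}
clinical = {"S1": {"Person_FUStatus": "alive"}, "S2": {"Person_FUStatus": "dead"}, "S3": {}}
mutations = {"S1": ["BRAF"], "S2": []}  # S3 has no mutation data, so it is dropped

merged = integrate_by_sample(expression, clinical, mutations)
print(sorted(merged))  # ['S1', 'S2']
```

Dropping incomplete samples keeps later models simple, at the cost of a smaller cohort.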
1. Collection and preparation of data
Proc Natl Acad Sci USA|Nov. 13|2009; Clin Cancer Res.|Vol. 14|2008; Clin Cancer Res.|Vol. 16|2010; JID|Vol 133|2013
Gene expression microarray data
Thank you to Drs Anna Campain, Vivek Jayaswal and Yee Hwa Yang, School of Mathematics & Statistics, The University of Sydney
1. Collection and preparation of data
CLINICAL DATA
Tumour_DateBanked
Person_Sex
Person_DateBirth
Person_NumPrim
Person_DateLastFUDeath
Person_FUStatus
Person_StageatBank
Person_DateRelapse
Age_Analysis
Prognosis_TimeSinceLNMet
GENOTYPE
Tum_BRAFmut
Tum_NRASmut
Tum_FLT3mut
Tum_METmut
Tum_PIK3CAmut
Tum_PDGFRAmut
Tum_EGFRmut
PATHOLOGY - PRECEDING PRIMARY
Person_NumPrim
Prim_Worst
Prim_BestGuess
Prim_Date_Diag
Prognosis_TimeOverall
Prim_Site
Prim_Site_SunExp
Prim_Stage
Prim_TStage
Prim_NStage
Prim_Naevus
Prim_Breslow
Prim_Mitos
Prim_Clark
Prim_Histol
Prim_Regress
Prim_Ulc
Prim_Vasc
Prim_LymphInv
Prim_Satell
Sun_Damage_Score
NM
SSM
PATHOLOGY - METASTASES
Tum_NumNodesInv
Tum_MetSize
Tum_Extranodal
Tum_CellType
Tum_CellSize
Tum_Necrosis
Tum_Pigment
Tum_NonTumour%
Clinical, pathological, and mutation type data
Thank you to Prof. Richard Scolyer and his team at the Royal Prince Alfred Hospital, The University of Sydney
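Before clinical fields like these can drive a supervised analysis, each patient needs an outcome class. The sketch below assumes the two-class scheme described later for the Mann cohort (good: more than 4 years after resection of stage III disease with no sign of relapse; poor: death within 1 year). The function and its arguments are hypothetical; deriving `years_followed` from fields such as Prognosis_TimeSinceLNMet and Person_FUStatus is an assumption, not a documented mapping.

```python
from typing import Optional


def label_outcome(years_followed: float, died: bool, relapsed: bool) -> Optional[str]:
    """Hypothetical two-class outcome label for one patient.

    years_followed: time from resection of stage III disease to death or
    last follow-up. Returns "poor" for death within 1 year, "good" for more
    than 4 years of follow-up with no sign of relapse, and None otherwise
    (intermediate or short-follow-up patients are excluded).
    """
    if died and years_followed < 1.0:
        return "poor"
    if years_followed > 4.0 and not relapsed:
        return "good"
    return None


print(label_outcome(5.2, died=False, relapsed=False))  # good
```

Excluding intermediate patients sharpens the contrast between classes but reduces the number of samples available.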
1. Collection and preparation of data
› Human Protein Reference Database
- Keshava Prasad et al. 2009
› iRefWeb
- Turner et al. 2010
› BioGRID
- Chatr-aryamontri et al. 2013
› MetaCore
- From GeneGo Inc.
Hairball image generated using Cytoscape (Smoot et al. 2011)
Protein-protein interaction data
Thanks to Simone Li and Drs Igy Pang and David Fung at the Systems Biology Initiative, the University of New South Wales
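Interaction databases such as those above supply protein-protein interactions as pairs of identifiers. A minimal sketch of turning such pairs into an undirected network and picking out hubs by their number of distinct interaction partners (degree, k). The function names are hypothetical, and `min_degree` is an assumed parameter: the threshold used to define hubs in this work is not stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_network(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from interaction pairs.

    Duplicate pairs (in either orientation) collapse into one edge;
    self-interactions are ignored.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(adjacency)


def find_hubs(adjacency: Dict[str, Set[str]], min_degree: int) -> List[str]:
    """Nodes whose number of distinct interaction partners is at least min_degree."""
    return sorted(node for node, partners in adjacency.items() if len(partners) >= min_degree)


# Toy interaction list (made-up identifiers, not real interactions).
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "A"), ("D", "D")]
network = build_network(edges)
print(find_hubs(network, min_degree=3))  # ['A']
```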
2. Mathematical modeling and interpretation
NATURE BIOTECH.|Vol 27|2009
2. Mathematical modeling and interpretation
Results – gene co-expression networks are significantly disturbed between patients with good and poor clinical outcomes
› A: Patients surviving >4 yr after resection of metastatic disease
› B: Patients surviving <1 yr after resection of metastatic disease
› C & D: Enlarged view (HDAC)
PIG. CELL & MEL. RES.|26(5):708-22|2013
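One simple way to put a number on how a hub's co-expression with its interaction partners differs between the two outcome groups in panels A and B: compute hub-partner Pearson correlations separately within each group and average the absolute differences. This is an illustrative sketch under that assumption, not the scoring method of the cited study; the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("constant expression profile: correlation undefined")
    return sxy / math.sqrt(sxx * syy)


def coexpression_change(
    expr_good: Dict[str, Sequence[float]],
    expr_poor: Dict[str, Sequence[float]],
    hub: str,
    partners: Sequence[str],
) -> float:
    """Mean absolute change in hub-partner correlation between outcome groups.

    expr_good / expr_poor map gene -> expression values across the patients
    of that group (same patient order for every gene within a group).
    """
    diffs = [
        abs(pearson(expr_good[hub], expr_good[p]) - pearson(expr_poor[hub], expr_poor[p]))
        for p in partners
    ]
    return sum(diffs) / len(diffs)


# Toy data: both partners flip the sign of their correlation with the hub.
expr_good = {"HUB": [1, 2, 3, 4], "P1": [2, 4, 6, 8], "P2": [4, 3, 2, 1]}
expr_poor = {"HUB": [1, 2, 3, 4], "P1": [8, 6, 4, 2], "P2": [1, 2, 3, 4]}
print(coexpression_change(expr_good, expr_poor, "HUB", ["P1", "P2"]))  # 2.0
```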
2. Mathematical modeling and interpretation
Results – hubs are reproducibly ‘disturbed’ between patients with good and poor outcomes
Gene symbol ID | Known drug target | Causally implicated in cancer(s) | Number of interaction partners (k = 6–38) | Previously prognosis-associated | Previously progression-associated | Previously tumor thickness-associated | Protein type¹
AKT1 P P Protein Kinase
APPL1 P Protein
CCNA2 P P Protein
CDC25A P Phosphatase
CIITA P P Protein
CREBBP P Enzyme
CSNK2A1 Protein Kinase
FANCG P P Protein
GATA4 P Transcription Factor
GRAP2 P Protein
GRB2 Protein
HDAC1 P Enzyme
HIF1A P P P Transcription Factor
IKBKB P P Protein Kinase
IL16 Receptor Ligand
JAK1 P P Protein Kinase
KHDRBS1 P Protein
MYBL2 P Transcription Factor
NF2 P P Protein
PDZK1 P Protein
PIM1 P P P Protein Kinase
PSTPIP1 P Protein
PTPN11 P P Phosphatase
RAPGEF1 P Regulator
RBL1 P Protein
RBX1 P Enzyme
SMAD2 P Transcription Factor
SMAD7 P Protein
STAMBP P Metalloprotease
TGM2 P P Enzyme
TLE1 Protein
TNF P P P Receptor Ligand
› 9 are already known drug targets (although not in melanoma)
› 8 are already causally implicated in other cancers
› 5 were previously associated with melanoma progression or prognosis, either directly or indirectly via correlation with tumor thickness (more than would be expected by chance)
PIG. CELL & MEL. RES.|26(5):708-22|2013
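Claims of "more than would be expected by chance" are commonly backed by a one-sided hypergeometric test: with N genes in the background, K of which carry an annotation (for example, known drug target), how surprising is it to find at least k annotated genes among n selected hubs? This sketch assumes that test; the slides do not say which test was used, and the function name is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn).

    Sums the probability of drawing exactly i annotated genes for every
    i from k up to the largest possible count, min(K, n).
    """
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / total
```

For the hub table, n would be 32 and, for drug targets, k would be 9; N and K depend on the background gene list and annotation source, which are not given in the slides.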
2. Mathematical modeling and interpretation
Results – top-ranking hubs are cancer-associated both individually (below) and as a gene set (data not shown)
PIG. CELL & MEL. RES.|26(5):708-22|2013
2. Mathematical modeling and interpretation
Results – top-ranking hubs can be used together to predict patient outcome
Cohort | Mann | Bogunovic | Jönsson | John
Sample size (n good outcome; n poor outcome) | 47 (23; 25) | 33 (23; 10) | 54 (7; 47) | 24 (10; 14)
Classes compared | survival >4 yr with no sign of relapse, or <1 yr, after surgical resection of stage III disease | survival ≥1.5 yr or <1.5 yr since metastasis | overall survival | time to tumor progression from stage III to stage IV disease, ≥2 yr or <2 yr
Class prediction error rate (LOOCV under KNN) | 0.33 | 0.24 | 0.20 | 0.29
• Comparison with standard-of-care prognostic markers
• Novel proposed prognostic biomarkers should be tested for improved performance relative to current biomarkers (McShane, Altman et al. 2005)
• We compared the prediction accuracy of our 32-hub classifier with that of the four most statistically significant clinicopathologic prognostic parameters in stage III melanoma: number of tumor-positive lymph nodes, tumor burden at the time of staging (microscopic vs. macroscopic), presence or absence of primary tumor ulceration, and thickness of the primary melanoma (Balch, Gershenwald et al. 2009).
• These clinicopathologic parameters gave a misclassification rate of 56% for our set of 48 patients, compared with 33% for the hub-based classifier on the same cohort.
PIG. CELL & MEL. RES.|26(5):708-22|2013
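The class prediction error rates above come from leave-one-out cross-validation (LOOCV) of a k-nearest-neighbour (KNN) classifier: each patient is held out in turn, assigned the majority label of its k nearest patients in hub-expression space, and the error rate is the fraction misclassified. A minimal sketch assuming Euclidean distance and k = 3; both choices, and the function names, are assumptions rather than reported settings.

```python
import math
from collections import Counter
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[str], query: Sequence[float], k: int) -> str:
    """Majority label among the k training points nearest to query (Euclidean).

    Ties in distance are broken by label, so results are deterministic.
    """
    neighbours = sorted(zip((math.dist(x, query) for x in train_x), train_y))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def loocv_error_rate(x: List[Sequence[float]], y: List[str], k: int = 3) -> float:
    """Fraction of samples misclassified when each is held out in turn."""
    errors = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, x[i], k) != y[i]:
            errors += 1
    return errors / len(x)


# Toy example: two well-separated groups of patients (made-up hub expression values).
x = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y = ["good"] * 3 + ["poor"] * 3
print(loocv_error_rate(x, y, k=3))  # 0.0
```

LOOCV uses every patient for testing exactly once, which suits small cohorts like those in the table.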
3. Validation with a view to a mechanism…?
› Exome sequencing data (Hodis et al. 2012)
› Calculation of functional mutation burden for ~16,000 genes (Broad Institute software)
› Functional mutation burden is significantly (P<0.05) higher in protein interaction partners of top-ranking hubs than would be expected by chance
› So, is functional mutation burden a pathogenic mechanism behind the differential network behaviour we observe between patients with good and poor clinical outcomes?
› If so, can differential network behaviour act as a compass, pointing to genomic areas (i.e., members of disturbed networks) that should be carefully scrutinized (including non-coding regions) for undiscovered and potentially targetable mutations?
› More work is needed!
An association between network type and functional mutation burden
PIG. CELL & MEL. RES.|26(5):708-22|2013
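A minimal sketch of one way to test whether the interaction partners of top-ranking hubs carry more functional mutation burden than expected by chance: compare their mean burden with the means of many randomly drawn gene sets of the same size (a permutation test). This approach is assumed for illustration; the burden scores would come from the Broad Institute software mentioned above, and the function name is hypothetical.

```python
import random
from statistics import mean
from typing import Dict, Sequence


def permutation_p_value(
    burden: Dict[str, float], gene_set: Sequence[str], n_perm: int = 10000, seed: int = 0
) -> float:
    """One-sided empirical p-value for a gene set's mean burden.

    Counts how often a random gene set of the same size, drawn from all
    scored genes, has mean burden >= the observed set's mean.
    """
    rng = random.Random(seed)
    all_genes = list(burden)
    observed = mean(burden[g] for g in gene_set)
    hits = 0
    for _ in range(n_perm):
        draw = rng.sample(all_genes, len(gene_set))
        if mean(burden[g] for g in draw) >= observed:
            hits += 1
    # +1 in numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy scores: 100 background genes with no burden, 5 "partners" with high burden.
burden = {f"G{i}": 0.0 for i in range(100)}
burden.update({f"P{i}": 10.0 for i in range(5)})
partners = [f"P{i}" for i in range(5)]
p = permutation_p_value(burden, partners, n_perm=1000)
```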
Summary and conclusions
• Used a large-scale, ‘systems biology’ approach to identify features of intracellular networks that are perturbed in poor-prognosis metastatic melanoma.
• Showed this to be consistent across a number of independent patient cohorts, identifying:
• A portfolio of high priority potential targets for therapy, characterised by enrichment for cancer pathways, existing cancer drug targets, and functional mutation burden
• Gene expression of the 32 hubs forms a new, a priori-selected prognostic gene expression signature in the setting of metastatic melanoma, a critical turning point for many patients, for whom therapeutic options are very limited (further validation is needed).
• Present work is focussed on investigating our preliminary observation that network disturbances are associated with higher functional mutation burden
• Modelling and integration of different data types to answer clinically relevant questions is ongoing
Square One.
› Perform equivalent experiments using data from larger cohorts as well as other cancer types to see whether the observation can be repeated
› So,
- 1. Collect and prepare data: breast (TCGA), ovarian (Metabric), melanoma (in-house/TCGA) and lung (TCGA)
• Permissions and applications…yikes!
- 2. Mathematical modelling and interpretation
• In collaboration with the USYD Maths and Stats team (Yee Hwa Yang, Shila Ghazanfar, and John Ormerod)
• Software developed in-house (VAN, Jayaswal et al. 2013) and available externally (MuText and InVex, Broad Institute; Hodis et al. 2012)
- 3. Validation…
An association between network phenotype and functional mutation burden?
Sincere thanks to Dr Yee Hwa Yang and Shila Ghazanfar for their essential collaboration in this work
Software spruik
VAN: identifying biologically perturbed networks using differential variability analysis
BMC RES. NOTES.|6(430)|2013, special thanks to Dr Vivek Jayaswal for his invaluable collaboration
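VAN operates on networks, but the underlying idea of differential variability can be shown on single genes: rank genes by how much their expression variance differs between two patient groups, here scored as the absolute log of the variance ratio. This toy sketch illustrates the concept only; it is not VAN's algorithm or interface, and the function names are hypothetical.

```python
import math
from statistics import variance
from typing import Dict, List, Sequence


def log_variance_ratio(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Absolute log ratio of sample variances; 0 means equal variability."""
    var_a, var_b = variance(group_a), variance(group_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("zero variance: ratio undefined")
    return abs(math.log(var_a / var_b))


def rank_by_differential_variability(
    expr_a: Dict[str, Sequence[float]], expr_b: Dict[str, Sequence[float]]
) -> List[str]:
    """Genes sorted from most to least differentially variable between groups."""
    scores = {g: log_variance_ratio(expr_a[g], expr_b[g]) for g in expr_a}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: G2 is tightly regulated in group A but highly variable in group B.
expr_a = {"G1": [1, 2, 3, 4], "G2": [1.0, 1.1, 0.9, 1.0]}
expr_b = {"G1": [1, 2, 3, 4], "G2": [0, 5, -5, 1]}
print(rank_by_differential_variability(expr_a, expr_b))  # ['G2', 'G1']
```

Taking the absolute log makes an increase and a decrease in variability of the same fold size score equally.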
Issues common among different integration approaches
• Power
• Handling of prior knowledge biases
• Visualisation
• Maintaining clinical relevance
• Computational search space
Acknowledgements
› UNSW
› Marc Wilkins
- Simone Li
- Chi Nam Ignatius Pang
- David Fung
- Apurv Goel
- Natalie Twine
› USYD
› Graham Mann
- Gulietta Pupo & Varsha Tembe
› Swetlana Mactier
› Richard Scolyer (RPA)
› Yee Hwa Yang
- Anna Campain
- Vivek Jayaswal
- Kaushala Jayawardana
- Shila Ghazanfar
My contact details:
p. 0408 260 588