

SWEAVE documentation:

Cross-species analysis of genome-wide regulatory networks identifies

a synergistic dependency between FOXM1 and CENPF that drives

prostate cancer malignancy

Alvaro Aytes1,8,*, Antonina Mitrofanova2,*, Celine Lefebvre2,*, Mariano J. Alvarez2, Mireia Castillo-Martin9, Tian Zheng7,10, James A. Eastham11, Anuradha Gopalan12, Kenneth J. Pienta13, Michael M. Shen2,3,4,7, Andrea Califano3,5,7, and Cory Abate-Shen1,2,6,7

Departments of 1Urology, 2Systems Biology, 3Medicine, 4Genetics & Development, 5Biochemistry and Molecular Biophysics, and 6Pathology and Cell Biology

7Herbert Irving Comprehensive Cancer Center, Columbia University Medical Center, New York, NY 10032

8Translational Research Laboratory, Catalan Institute of Oncology, Bellvitge Institute for Biomedical Research, L’Hospitalet de Llobregat, Barcelona, Spain 08907

9Department of Pathology, Icahn School of Medicine at Mount Sinai, New York, NY 10029

10Department of Statistics, Columbia University, New York, NY 10027

Departments of 11Urology and 12Pathology, Memorial Sloan Kettering Cancer Center, New York, NY 10065

13The University of Michigan, Ann Arbor, MI 48109, and the Brady Urological Institute at the Johns Hopkins School of Medicine, Baltimore, MD 21231

* These authors contributed equally to this study

May 12, 2014

Contents

1 Introduction

2 Transcriptional Gene Regulatory networks (Interactomes)
  2.1 Reconstructing Interactomes (data objects)
  2.2 ChIP-Chip and ChIP-Seq validation of ARACNe inferred targets (data objects)

3 Conservation of human and mouse prostate interactomes (R code and objects)

4 Comparison of mouse models to human malignancy signature (data objects)

5 Master Regulator Analysis (MARINa)
  5.1 Human Master Regulators (data objects)
  5.2 Mouse Master Regulators (data objects)

6 Datasets for clinical validation (R code and objects)

7 Kaplan-Meier survival analysis (R code and objects)
  7.1 Swedish dataset
    7.1.1 Prostate Cancer-Specific Survival: based on transcriptional activity
    7.1.2 Prostate Cancer-Specific Survival: based on expression levels
  7.2 Glinsky dataset
    7.2.1 Time to BioChemical Recurrence: based on transcriptional activity
    7.2.2 Time to BioChemical Recurrence: based on expression levels
  7.3 TMA
    7.3.1 BioChemical Recurrence Free Survival
    7.3.2 Prostate Cancer-Specific Survival
    7.3.3 Metastases-Free Survival

8 C-statistics (R code and objects)
  8.1 C-statistics based on Prostate Cancer-Specific Survival
  8.2 C-statistics based on Metastases-Free Survival

1 Introduction

This SWEAVE document provides R objects and code for the computational analysis included in the manuscript “Cross-species analysis of genome-wide regulatory networks identifies a synergistic dependency between FOXM1 and CENPF that drives prostate cancer malignancy”. Sections 2, 4, and 5 of this document provide information on data objects (i.e., tables) referenced in the manuscript, and Sections 3, 6, 7, and 8 provide executable R code along with the necessary R objects. All R data objects necessary to execute this document can be downloaded from http://dx.doi.org/10.6084/m9.figshare.1023038.

2 Transcriptional Gene Regulatory networks (Interactomes)

2.1 Reconstructing Interactomes (data objects)

ARACNe [1, 3], the software for reconstruction of transcriptional regulatory networks (interactomes), can be downloaded from http://wiki.c2b2.columbia.edu/califanolab/index.php/Software.

To run ARACNe, you will need a gene expression dataset and a list of TRs (Transcriptional Regulators, as defined in Gene Ontology), which are provided below.

To reconstruct the human prostate cancer interactome, we used the gene expression dataset from Taylor et al. [2]; a detailed description of the data is available at GEO under GSE21034, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE21034.

To reconstruct the mouse prostate cancer interactome, we generated mouse gene expression profiles, deposited in GEO under GSE53202, http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53202.

The list of TRs (transcription factors and co-factors) is available as the R object TRs.

> TRs[1:5]

[1] "AEBP1" "NR0B1" "AHR" "ALX3" "AIRE"

The command-line arguments used to run ARACNe were: a p-value threshold Bonferroni-corrected for multiple hypothesis testing, p = 0.05/(number of TRs × number of tested targets); dpi = 0; and method = adaptive partitioning.
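The Bonferroni-corrected threshold is a single division. A minimal sketch in R follows; the two counts are hypothetical placeholders chosen for illustration and are not the values used in the manuscript.

```r
# Hypothetical counts, for illustration only (not taken from the manuscript)
n_trs     <- 2000    # assumed number of transcriptional regulators
n_targets <- 20000   # assumed number of tested target genes

# Bonferroni correction: divide the nominal level by the number of tested pairs
alpha       <- 0.05
p_threshold <- alpha / (n_trs * n_targets)
p_threshold
```

With the actual number of TRs and tested targets substituted, the resulting value is the p-value passed to ARACNe.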

The resulting human prostate cancer interactome can be loaded as the R object taylor.interactome. The resulting mouse prostate cancer interactome can be loaded as the R object mouse.interactome.

> taylor.interactome[1:3,]

  Transcriptional Regulator Target Mutual Information
1                      AATF ANKLE2             0.5180
2                      AATF   BLMH             0.4783
3                      AATF   CBX1             0.5839

> mouse.interactome[1:3,]

  Transcriptional Regulator Target Mutual Information
1                      Irf7 Apol9a              1.492
2                      Irf7 Apol9b              1.445
3                     Uhrf1   Mcm5              1.376
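Descriptive counts for an interactome stored as an edge list like the ones above can be derived in a few lines. The sketch below uses a toy data frame: its column names (regulator, target, mi) and its fourth row are made up for illustration and may not match the actual objects.

```r
# Toy edge list; rows 1-3 mirror the human example above, row 4 is invented
toy_interactome <- data.frame(
  regulator = c("AATF", "AATF", "AATF", "IRF7"),
  target    = c("ANKLE2", "BLMH", "CBX1", "BLMH"),
  mi        = c(0.5180, 0.4783, 0.5839, 0.6100),
  stringsAsFactors = FALSE
)

# Number of distinct regulators, distinct targets, and regulator-target edges
network_summary <- c(
  Regulators   = length(unique(toy_interactome$regulator)),
  Targets      = length(unique(toy_interactome$target)),
  Interactions = nrow(toy_interactome)
)
network_summary
```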

2.2 ChIP-Chip and ChIP-Seq validation of ARACNe inferred targets (data objects)

To assess how accurately the ARACNe-inferred targets of transcriptional regulators represent actual (known) targets, we compared the ARACNe-inferred targets with genome-wide DNA-occupancy data, using publicly available ChIP-Chip or ChIP-Seq data (see the object chipchip_chipseq_validation). The structure of the object is as follows:

> names(chipchip_chipseq_validation)

[1] "human_interactome" "mouse_interactome"

> names(chipchip_chipseq_validation$human_interactome)

[1] "TF"                                 "TF.pubmedID"
[3] "Targets.inferred.by.ARACNe"         "Targets.predicted.by.ChIP.ChIP.Seq"
[5] "Overlapping.targets"                "p.value..by.Fisher.exact.test."
[7] "Odds.ratio"                         "Experimental.technique"

As an example,

> chipchip_chipseq_validation$human_interactome[1:3,]

     TF    TF.pubmedID Targets.inferred.by.ARACNe
1 KDM5B KDM5B-21448134                        298
2 STAT4 STAT4-19710469                        396
3 SMAD3 SMAD3-18955504                        355
  Targets.predicted.by.ChIP.ChIP.Seq Overlapping.targets
1                               3178                 132
2                               1100                  55
3                               1696                  61
  p.value..by.Fisher.exact.test. Odds.ratio Experimental.technique
1                       2.68e-41      6.367               chip-seq
2                       8.28e-14      3.731              chip-chip
3                       1.97e-11      3.113              chip-chip

Among the human and mouse transcriptional regulators that could be evaluated using this approach (i.e., those having available ChIP-Chip or ChIP-Seq data), there were many known to be functionally relevant for prostate cancer, such as c-MYC, Androgen Receptor (AR), and the NKX3.1 homeobox gene, which displayed significant overlap of their ARACNe-inferred targets and ChIP-Chip or ChIP-Seq targets (Fisher exact test, p < 0.05).
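An overlap test of this kind can be sketched with fisher.test from base R. The three counts below are the KDM5B values from the table above; the size of the gene universe (n_universe) is an assumption made here for illustration, so the resulting p-value and odds ratio should not be expected to match the tabulated ones.

```r
# Counts for KDM5B, taken from the table above
n_aracne  <- 298    # targets inferred by ARACNe
n_chip    <- 3178   # targets predicted by ChIP-Seq
n_overlap <- 132    # targets present in both sets

# ASSUMPTION: total number of genes that could have been a target
n_universe <- 20000

# 2 x 2 contingency table: ARACNe membership (rows) by ChIP membership (columns)
contingency <- matrix(
  c(n_overlap,          n_aracne - n_overlap,
    n_chip - n_overlap, n_universe - n_aracne - n_chip + n_overlap),
  nrow = 2, byrow = TRUE,
  dimnames = list(ARACNe = c("inferred", "not inferred"),
                  ChIP   = c("bound", "not bound"))
)

# One-sided test for enrichment of ChIP-bound genes among ARACNe targets
res <- fisher.test(contingency, alternative = "greater")
res$p.value
res$estimate   # conditional maximum-likelihood odds ratio
```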


3 Conservation of human and mouse prostate interactomes (R code and objects)

It has been previously established that target-by-target analysis is a poor metric of cross-species interactome conservation, due to false negatives [8]. Moreover, since our goal has been to identify transcriptional regulators that drive a given tumor-related phenotype, conservation of individual targets is not critical; rather, the key question is whether transcriptional regulators from one interactome or the other yield comparable results in terms of their regulatory activity. Thus, as a quantitative metric to compare the human and mouse interactomes, we first used a computationally efficient version of the MARINa [4, 5] algorithm to infer, at the single-sample level, the differential activity of each transcriptional regulator represented in both the human and mouse interactomes, based on the expression of their human or mouse targets (ssMARINa, available from http://dx.doi.org/10.6084/m9.figshare.785718). We then compared the vectors of activity obtained from the mouse and human interactomes to determine whether they were significantly correlated.

We based this analysis on four human and one murine prostate carcinoma expression datasets, available from http://dx.doi.org/10.6084/m9.figshare.785720. The first step in the analysis pipeline is to load the ssmarina package, and the expression data and interactomes from the crosspeciesData package:

> library(ssmarina)

> library(crosspeciesData)

> data(prostate_dsets)

> data(prostate_interactomes)

The size of the datasets and some descriptive numbers about the interactomes can be obtained with:

> sapply(dsets, dim)

     Taylor Wang1 Wang2    Yu mouse
[1,]  22801 19609 12605 15374 13984
[2,]    185   154   148   138   384

> sapply(list(mouse=reg.mus, mouse.sub1=reg.mus1, mouse.sub2=reg.mus2, Taylor=reg.taylor,

+ Taylor.sub1=reg.taylor1, Taylor.sub2=reg.taylor2), summary)

              mouse mouse.sub1 mouse.sub2 Taylor Taylor.sub1 Taylor.sub2
Regulators     2032       2032       2032   2719        2719        2719
Targets       11958      11958      11958  20082       20082       20082
Interactions 612670    1174461     998612 880914     1022384     1376709

The inference of the single-sample relative activity profile and the correlation analysis are performed by the compareRegulon function in the ssmarina package:

> set.seed(1)

> cres <- compareRegulon(reg.taylor, reg.mus, dsets, list(reg.taylor1, reg.taylor2),

+ list(reg.mus1, reg.mus2), groups=c(1, 1, 1, 1, 2))

> cres1 <- lapply(cres, abs)

> class(cres1) <- class(cres)

The distribution of the correlation coefficients for the negative controls (comparison between randomized versions of the networks), positive controls (comparison between the alternative versions of the same network), and mouse vs. human interactomes can be represented in a graph by:

> plot(cres1,legend="topright")

[Figure: density of the correlation coefficients. x-axis: Correlation coefficient (0.0 to 1.0); y-axis: Density. Legend: Random network; Ntw1 to Ntw2; Ntw1 (+) ctrl; Ntw2 (+) ctrl.]

The statistical significance can be assessed by plotting the distribution of the Z-scores inferred from the correlation between the random networks:

> plot(cres1, type="zscore")

[Figure: density of the Z-scores. x-axis: Z-Score (0 to 12); y-axis: Density. Legend: Ntw1 to Ntw2; Ntw1 (+) ctrl; Ntw2 (+) ctrl. Annotations: 30%, 70%.]
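One plausible way to turn correlation coefficients into Z-scores against the random-network controls is to standardize them by the mean and standard deviation of the null correlations. The sketch below illustrates that idea on simulated numbers only; it is an assumption about the general approach and may differ from what the ssmarina plot method actually computes.

```r
set.seed(1)

# Simulated null: correlations obtained between randomized networks (made up)
null_cor <- rnorm(1000, mean = 0.05, sd = 0.02)

# Hypothetical observed correlations between the two real networks
observed_cor <- c(0.30, 0.45, 0.60)

# Standardize each observed correlation against the null distribution
z <- (observed_cor - mean(null_cor)) / sd(null_cor)

# One-sided p-values under a normal approximation of the null
p <- pnorm(z, lower.tail = FALSE)
```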

4 Comparison of mouse models to human malignancy signature (data objects)

To evaluate the molecular similarity of mouse models to the human malignancy signature, we first performed an unbiased screen of all potential GEMM signatures (from all possible GEMM pairs) to determine their similarity to the human prostate cancer malignancy signature, using Gene Set Enrichment Analysis (GSEA). The human malignancy signature was used as the reference signature for GSEA. Each mouse signature was used to define (i) a query gene set of the top 200 upregulated (over-expressed) genes and (ii) a query gene set of the top 200 downregulated (under-expressed) genes. The data object containing the results of this comparison can be loaded as mouse_human_similarity. Columns correspond to NESs and p-values from GSEA comparing upregulated (positive) and downregulated (negative) mouse gene sets with the human malignancy signature, while rows correspond to mouse model pairs (signatures).

> colnames(mouse_human_similarity)

[1] "NES_pos" "p_val_pos" "NES_neg" "p_val_neg"
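The construction of the two query gene sets from one mouse signature can be sketched as follows. The signature here is simulated, and the gene names and the use of a single per-gene statistic for ranking are assumptions made for illustration.

```r
set.seed(1)

# Simulated differential-expression signature: one statistic per gene (made up)
mouse_sig <- setNames(rnorm(1000), paste0("gene", seq_len(1000)))

# Rank genes from most upregulated to most downregulated
ranked <- sort(mouse_sig, decreasing = TRUE)

n_top    <- 200
up_set   <- names(head(ranked, n_top))   # top 200 upregulated genes
down_set <- names(tail(ranked, n_top))   # top 200 downregulated genes
```

Each set would then be tested by GSEA against the human malignancy signature, yielding the NES_pos/p_val_pos and NES_neg/p_val_neg columns respectively.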


5 Master Regulator Analysis (MARINa)

MARINa software can be downloaded from http://wiki.c2b2.columbia.edu/califanolab/index.php/Software.

5.1 Human Master Regulators (data objects)

The list of human MRs based on the Taylor et al. signature can be found in the R object Taylor_shample_shuffling_MRs. The list of human MRs based on the Balk et al. signature can be found in the R object Balk_shample_shuffling_MRs. Columns correspond to TR names. The structure of the MR objects is as follows:

> Taylor_shample_shuffling_MRs[,1:5]

              AATF ABCG1    ABCG4       ABL1 ABLIM3
p.value  0.8326613     1 0.221374  0.2340426      1
NES     -0.4513442     0 1.340957 -1.3797141      0

> Balk_shample_shuffling_MRs[,1:5]

              AATF ABCG1    ABCG4       ABL1 ABLIM3
p.value 0.09340659     1 0.400000 0.98765432      1
NES     1.82579914     0 1.234709 0.02566632      0

5.2 Mouse Master Regulators (data objects)

We performed MARINa analysis for the following mouse models: (i) NPK and NP AI models, MRs can be found in the R object NPK_NPAI_MRs; (ii) Pb myc and NP models, MRs can be found in the R object Pbmyc_NP_MRs; (iii) Pb myc and NP53 models, MRs can be found in the R object Pbmyc_NP53_MRs; (iv) NPB and NP AI models, MRs can be found in the R object NPB_NPAI_MRs.

> Pbmyc_NP_MRs[,1:5]

              Pbrm1 Phb Mzf1        Mynn      Pycard
p.value 0.004376368   1    1  0.94491525  0.02020202
NES     2.933441321   0    0 -0.05758808 -2.39203358

> NPB_NPAI_MRs[,1:5]

            Pbrm1 Phb Mzf1        Mynn    Pycard
p.value  0.001000   1    1 0.003629764  0.001000
NES     -4.441094   0    0 2.903557965 -3.500852

We used MARINa with gene shuffling for the mouse signatures because each group contained four samples.

6 Datasets for clinical validation (R code and objects)

Datasets used for clinical validation of FOXM1 and CENPF include: (i) Sboner et al. [6] (i.e., the Swedish dataset), stored in the R object mexp.swedish; (ii) Glinsky et al. [7], stored in the R object mexp.glinsky; and (iii) the MSKCC TMA dataset, stored in the R object TMA_FOXM1_CENPF_protein_levels. We performed Kaplan-Meier survival analysis and computed C-statistics.


7 Kaplan-Meier survival analysis (R code and objects)

7.1 Swedish dataset

The Swedish cohort data object can be loaded using:

> data(mexp.swedish)

where columns correspond to sample (i.e., patient) names and rows correspond to gene names.

First, you need to load the survival data object (which describes time to prostate cancer-specific death):

> attach(swedish_survival_data)

> names(swedish_survival_data)

[1] "patientsForSurvival"
[2] "Prostate_specific_survival_free_time"
[3] "Prostate_specific_survival_event"

Libraries necessary for Survival analysis can be loaded as:

> library(survival)

> library(survcomp)

7.1.1 Prostate Cancer-Specific Survival: based on transcriptional activity

To identify groups of patients with (i) both FOXM1 and CENPF active; (ii) FOXM1 active; (iii) CENPF active; and (iv) other cases, you need to use the activity levels for FOXM1 and CENPF (stored in FOXM1_activity_swedish_cohort and CENPF_activity_swedish_cohort), calculated from the enrichment of their activated and repressed targets in each sample (as described in Methods), as:

> FOXM1_up<-colnames(FOXM1_activity_swedish_cohort)[intersect(

+ which(FOXM1_activity_swedish_cohort[1,]>th_act),

+ which(FOXM1_activity_swedish_cohort[2,]<= (-th_act)))]

> CENPF_up<-colnames(CENPF_activity_swedish_cohort)[intersect(

+ which(CENPF_activity_swedish_cohort[1,]>th_act),

+ which(CENPF_activity_swedish_cohort[2,]<= (-th_act)))]

> both_up<-intersect(FOXM1_up,CENPF_up)

> FOXM1_up<-FOXM1_up[which(!(FOXM1_up %in% both_up))]

> CENPF_up<-CENPF_up[which(!(CENPF_up %in% both_up))]

> other<-colnames(FOXM1_activity_swedish_cohort)[

+ which(!(colnames(FOXM1_activity_swedish_cohort) %in% c(FOXM1_up,CENPF_up, both_up)))]

where the threshold th_act was defined as described in Methods.

Kaplan-Meier survival analysis depicting the relationship between the “both_up” (red) group and the “other” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(both_up, other))],

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time) %in% c(both_up, other))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(both_up, other))],

+ class1=both_up,

+ class2=other,yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 200      139    153.7       1.4      16.9
comp=1  35       29     14.3      15.1      16.9

 Chisq= 16.9  on 1 degrees of freedom, p= 4.03e-05
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     200   200     200    139    125     110     149
comp=1      35    35      35     29     49      40      93
[1] 4.03e-05

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 4.03e-05.]
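The source of the plotKaplanMeier helper is not listed in this document. Its printed output indicates a log-rank test (survdiff) and a Kaplan-Meier fit (survfit) on Surv(time, vital) ~ comp, so a comparable helper could be sketched as below. The function name km_two_groups, its argument names, and the simulated data are hypothetical, and the sketch assumes the time and event vectors are aligned with the patient vector.

```r
library(survival)

# Hypothetical stand-in for plotKaplanMeier (not the authors' implementation)
km_two_groups <- function(patients, surv_time, events, class1, class2,
                          yaxis = "Survival") {
  keep <- patients %in% c(class1, class2)
  d <- data.frame(
    time  = surv_time[keep],
    vital = events[keep],
    comp  = as.integer(patients[keep] %in% class1)  # 1 = class1, 0 = class2
  )
  # Log-rank test (rho = 0) and its p-value on 1 degree of freedom
  lr <- survdiff(Surv(time, vital) ~ comp, data = d, rho = 0)
  p_value <- pchisq(lr$chisq, df = 1, lower.tail = FALSE)
  # Kaplan-Meier estimate per group; comp=0 is drawn first (blue), comp=1 second (red)
  fit <- survfit(Surv(time, vital) ~ comp, data = d)
  plot(fit, col = c("blue", "red"), xlab = "months", ylab = yaxis,
       main = "survival curves")
  legend("bottomleft", legend = paste("p-value =", signif(p_value, 3)),
         bty = "n")
  invisible(list(logrank = lr, fit = fit, p.value = p_value))
}

# Toy usage on simulated data (not the Swedish cohort)
set.seed(1)
patients  <- paste0("P", 1:60)
surv_time <- c(rexp(30, rate = 1 / 40), rexp(30, rate = 1 / 120))
events    <- rbinom(60, 1, 0.8)
res <- km_two_groups(patients, surv_time, events,
                     class1 = patients[1:30], class2 = patients[31:60],
                     yaxis = "Prostate Cancer-Specific Survival")
```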

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1_up” (red) group and the “other” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(FOXM1_up, other))],


+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time)

+ %in%

+ c(FOXM1_up, other))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(FOXM1_up,

+ other))],

+ class1=FOXM1_up,

+ class2=other, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 200      139    143.6     0.146      2.19
comp=1  18       15     10.4     2.009      2.19

 Chisq= 2.2  on 1 degrees of freedom, p= 0.139
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     200   200     200    139    125     110     149
comp=1      18    18      18     15    111      77     135
[1] 0.139

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 0.139.]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF_up” (red) group and the “other” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(CENPF_up, other))],

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time)

+ %in%

+ c(CENPF_up, other))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(CENPF_up,

+ other))],

+ class1=CENPF_up,

+ class2=other, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 200      139    147.5     0.491      5.59
comp=1  28       23     14.5     4.993      5.59

 Chisq= 5.6  on 1 degrees of freedom, p= 0.0181
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     200   200     200    139    125     110     149
comp=1      28    28      28     23     73      69     161
[1] 0.0181

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 0.0181.]

Finally, Kaplan-Meier survival analysis depicting all four groups (curves) with respect to prostate cancer-specific survival (time to death) can be generated as

> plotKaplanMeier4(patientsForSurvival=swedish_survival_data$patientsForSurvival,

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time,

+ events=swedish_survival_data$Prostate_specific_survival_event, class1=both_up, class2=other,

+ class3=CENPF_up, class4=FOXM1_up, yaxis="Prostate Cancer-Specific Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0).]
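The four activity-based groups used in this section are mutually exclusive. The partition logic can be sketched compactly on simulated data as below. The activity matrices, sample names, and threshold value are made up, and the reading of row 1 as enrichment of activated targets and row 2 as enrichment of repressed targets is an assumption based on the description above.

```r
set.seed(1)
samples <- paste0("S", 1:20)

# Simulated 2 x n activity matrices (assumed layout: row 1 activated targets,
# row 2 repressed targets); values are random and purely illustrative
make_activity <- function() {
  matrix(rnorm(2 * length(samples), sd = 2), nrow = 2,
         dimnames = list(c("activated", "repressed"), samples))
}
foxm1_act <- make_activity()
cenpf_act <- make_activity()

th_toy <- 1  # hypothetical threshold; the real one is defined in Methods

# A regulator is called active when its activated targets are enriched above
# the threshold and its repressed targets are enriched below minus the threshold
is_active <- function(act, th) act[1, ] > th & act[2, ] <= -th

foxm1_on <- is_active(foxm1_act, th_toy)
cenpf_on <- is_active(cenpf_act, th_toy)

# Assign every sample to exactly one of the four groups
group <- ifelse(foxm1_on & cenpf_on, "both_up",
         ifelse(foxm1_on, "FOXM1_up",
         ifelse(cenpf_on, "CENPF_up", "other")))
names(group) <- samples
table(group)
```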

7.1.2 Prostate Cancer-Specific Survival: based on expression levels

To identify groups of patients with (i) both FOXM1 and CENPF over-expressed; (ii) FOXM1 over-expressed; (iii) CENPF over-expressed; and (iv) other cases, run

> FOXM1_expression<-mexp.swedish["FOXM1",]

> CENPF_expression<-mexp.swedish["CENPF",]

> both_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression>th),

+ which(CENPF_expression>th))]

> FOXM1_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression>th),

+ which(CENPF_expression<=th))]

> CENPF_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression<=th),

+ which(CENPF_expression>th))]

> others_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression<=th),

+ which(CENPF_expression<=th))]

where the threshold th was defined as described in Methods.

Kaplan-Meier survival analysis depicting the relationship between the “both_up_expression” (red) group and the “others_expression” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(both_up_expression, others_expression))],

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time) %in% c(both_up_expression, others_expression))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(both_up_expression, others_expression))],

+ class1=both_up_expression,

+ class2=others_expression,yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0  81       50     67.7      4.62      9.49
comp=1 108       86     68.3      4.58      9.49

 Chisq= 9.5  on 1 degrees of freedom, p= 0.00207
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      81    81      81     50  157.0     119     170
comp=1     108   108     108     86   82.5      72     122
[1] 0.00207

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 0.00207.]

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1_up_expression” (red) group and the “others_expression” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(FOXM1_up_expression, others_expression))],

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time)

+ %in%

+ c(FOXM1_up_expression, others_expression))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(FOXM1_up_expression,

+ others_expression))],

+ class1=FOXM1_up_expression,

+ class2=others_expression, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 81       50     63.3      2.79      8.83
comp=1 51       44     30.7      5.76      8.83

 Chisq= 8.8  on 1 degrees of freedom, p= 0.00296
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      81    81      81     50    157     119     170
comp=1      51    51      51     44    110      83     129
[1] 0.00296

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 0.00296.]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF_up_expression” (red) group and the “others_expression” (blue) group, with respect to prostate cancer-specific survival (time to death), can be generated as

> plotKaplanMeier(patientsForSurvival=swedish_survival_data$patientsForSurvival[which(

+ swedish_survival_data$patientsForSurvival

+ %in% c(CENPF_up_expression, others_expression))],

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time[which(names(

+ swedish_survival_data$Prostate_specific_survival_free_time)


+ %in%

+ c(CENPF_up_expression, others_expression))],

+ events=swedish_survival_data$Prostate_specific_survival_event[which(names(

+ swedish_survival_data$Prostate_specific_survival_event)

+ %in% c(CENPF_up_expression,

+ others_expression))],

+ class1=CENPF_up_expression,

+ class2=others_expression, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 81       50     52.4     0.107     0.351
comp=1 41       26     23.6     0.238     0.351

 Chisq= 0.4  on 1 degrees of freedom, p= 0.553
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      81    81      81     50    157     119     170
comp=1      41    41      41     26    121      78      NA
[1] 0.553

[Figure: Kaplan-Meier survival curves. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0); p-value = 0.553.]

Finally, Kaplan-Meier survival analysis depicting all four groups (curves) with respect to prostate cancer-specific survival (time to death) can be generated as

> plotKaplanMeier4(patientsForSurvival=swedish_survival_data$patientsForSurvival,

+ survTime=swedish_survival_data$Prostate_specific_survival_free_time,

+ events=swedish_survival_data$Prostate_specific_survival_event, class1=both_up_expression, class2=others_expression,

+ class3=CENPF_up_expression, class4=FOXM1_up_expression, yaxis="Prostate Cancer-Specific Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months (0 to 250); y-axis: Prostate Cancer-Specific Survival (0.0 to 1.0).]
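The source of plotKaplanMeier4 is likewise not listed here. A four-curve Kaplan-Meier plot with an overall log-rank test can be sketched with the survival package as below. The data are simulated, and the colors and the use of a single four-level factor are illustrative choices, not necessarily those of the original helper.

```r
library(survival)
set.seed(1)

group_levels <- c("both_up", "other", "CENPF_up", "FOXM1_up")

# Simulated cohort: 20 patients per group with group-specific hazard rates
d <- data.frame(
  time  = rexp(80, rate = rep(c(1 / 40, 1 / 120, 1 / 80, 1 / 100), each = 20)),
  vital = rbinom(80, 1, 0.8),
  group = factor(rep(group_levels, each = 20), levels = group_levels)
)

# One Kaplan-Meier curve per group and an overall log-rank test
fit4 <- survfit(Surv(time, vital) ~ group, data = d)
lr4  <- survdiff(Surv(time, vital) ~ group, data = d)
p4   <- pchisq(lr4$chisq, df = length(lr4$n) - 1, lower.tail = FALSE)

curve_cols <- c("red", "blue", "green", "orange")  # arbitrary colors
plot(fit4, col = curve_cols, xlab = "months",
     ylab = "Prostate Cancer-Specific Survival", main = "survival curves")
legend("topright", legend = group_levels, col = curve_cols, lty = 1, bty = "n")
```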

7.2 Glinsky dataset

The Glinsky et al data object can be loaded using:

> data(mexp.glinsky)

where columns correspond to sample (i.e., patient) names and rows correspond to gene names.

First, you need to load the survival data object (which describes time to biochemical recurrence):

> attach(glinsky_survival_data)

> names(glinsky_survival_data)

[1] "patientsForSurvival" "BCR_free_time" "BCR_event"

Libraries necessary for Survival analysis can be loaded as:

> library(survival)

> library(survcomp)

7.2.1 Time to BioChemical Recurrence: based on transcriptional activity

To identify groups of patients with (i) both FOXM1 and CENPF active; (ii) FOXM1 active; (iii) CENPF active; and (iv) other cases, you will need the activity levels of FOXM1 and CENPF, calculated in each sample from the enrichment of their activated and repressed targets (as described in Methods):

> FOXM1_up<-colnames(FOXM1_activity_glinsky)[intersect(

+ which(FOXM1_activity_glinsky[1,]>th_act_gl),

+

+ which(FOXM1_activity_glinsky[2,]<= (-th_act_gl)))]

> CENPF_up<-colnames(CENPF_activity_glinsky)[intersect(

+ which(CENPF_activity_glinsky[1,]>th_act_gl),

+ which(CENPF_activity_glinsky[2,]<= (-th_act_gl)))]

> both_up<-intersect(FOXM1_up,CENPF_up)

> FOXM1_up<-FOXM1_up[which(!(FOXM1_up %in% both_up))]

> CENPF_up<-CENPF_up[which(!(CENPF_up %in% both_up))]

> other<-colnames(FOXM1_activity_glinsky)[

+ which(!(colnames(FOXM1_activity_glinsky) %in% c(FOXM1_up,CENPF_up, both_up)))]

where the threshold th_act_gl was defined as described in Methods.

Kaplan-Meier survival analysis depicting the relationship between the “both up” (red) group and the “other” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(both_up, other))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time) %in%

+ c(both_up, other))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(both_up, other))],

+ class1=both_up,

+ class2=other, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 53       20     27.2      1.91      15.9
comp=1 13       11      3.8     13.64      15.9

 Chisq= 15.9 on 1 degrees of freedom, p= 6.52e-05
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      53    53      53     20     NA    82.4      NA
comp=1      13    13      13     11   12.2     6.1      NA
[1] 6.52e-05

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]
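The printed Call: lines indicate that the log-rank test and the Kaplan-Meier fit come from survdiff and survfit in the survival package. The sketch below reproduces that sequence on made-up data. It is not the source of plotKaplanMeier, whose internals are not shown here, and the vectors time, vital and comp hold hypothetical toy values.

```r
library(survival)  # recommended package shipped with standard R installations

# Toy data (hypothetical, not taken from the Glinsky cohort)
time  <- c(5, 8, 12, 20, 25, 30, 34, 40, 45, 50)  # follow-up in months
vital <- c(1, 1, 1, 0, 1, 0, 1, 0, 0, 0)          # 1 = event, 0 = censored
comp  <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)          # 1 = group of interest

# Log-rank test (rho = 0) comparing the two groups
lr <- survdiff(Surv(time, vital) ~ comp, rho = 0)

# p-value from the chi-square statistic; df = number of groups - 1
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

# Kaplan-Meier estimate per group (this is what would be plotted)
km <- survfit(Surv(time, vital) ~ comp)

print(lr)
print(km)
print(signif(p_value, 3))
```

The three printed objects correspond, in order, to the survdiff table, the survfit summary and the trailing [1] p-value line in the outputs of this section.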

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1 up” (red) group and the “other” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(FOXM1_up, other))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time)

+ %in%

+ c(FOXM1_up, other))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(FOXM1_up,other))],

+ class1=FOXM1_up, class2=other, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 53       20    18.91     0.063     0.635
comp=1  5        1     2.09     0.570     0.635

 Chisq= 0.6 on 1 degrees of freedom, p= 0.425
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      53    53      53     20     NA    82.4      NA
comp=1       5     5       5      1     NA      NA      NA
[1] 0.425

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]
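The calls in this section repeat the same subsetting of patients, times and events for every comparison. A small helper can make that pattern explicit. subsetSurvival below is a hypothetical function (it is not part of the package described here) and assumes, as the calls above do, that the time and event vectors are named by patient.

```r
# Hypothetical helper: keep only the patients that belong to the groups
# being compared, for the chosen time/event fields of a survival data object.
subsetSurvival <- function(survData, timeField, eventField, groups) {
  keep <- survData$patientsForSurvival %in% groups
  list(
    patientsForSurvival = survData$patientsForSurvival[keep],
    survTime = survData[[timeField]][names(survData[[timeField]]) %in% groups],
    events   = survData[[eventField]][names(survData[[eventField]]) %in% groups]
  )
}

# Toy object mimicking the layout of glinsky_survival_data (made-up values)
toy <- list(
  patientsForSurvival = c("P1", "P2", "P3", "P4"),
  BCR_free_time = c(P1 = 10, P2 = 25, P3 = 40, P4 = 60),
  BCR_event     = c(P1 = 1,  P2 = 0,  P3 = 1,  P4 = 0)
)

sub <- subsetSurvival(toy, "BCR_free_time", "BCR_event", c("P1", "P3", "P4"))
str(sub)
```

With the real objects, the three elements of the result could be passed to plotKaplanMeier as patientsForSurvival, survTime and events.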

Kaplan-Meier survival analysis depicting the relationship between the “CENPF up” (red) group and the “other” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(CENPF_up, other))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time)

+ %in%

+ c(CENPF_up, other))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(CENPF_up,other))],

+ class1=CENPF_up,class2=other, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 53       20     22.5     0.278      2.81
comp=1  8        5      2.5     2.504      2.81

 Chisq= 2.8 on 1 degrees of freedom, p= 0.0934
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      53    53      53     20     NA    82.4      NA
comp=1       8     8       8      5   49.8    11.6      NA
[1] 0.0934

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

Finally, Kaplan-Meier survival analysis depicting all four groups/curves with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier4(patientsForSurvival=glinsky_survival_data$patientsForSurvival,

+ survTime=glinsky_survival_data$BCR_free_time,

+ events=glinsky_survival_data$BCR_event,

+ class1=both_up, class2=other,

+ class3=CENPF_up, class4=FOXM1_up, yaxis="BioChemical Recurrence Free Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

7.2.2 Time to BioChemical Recurrence: based on expression levels

To identify groups of patients with (i) both FOXM1 and CENPF over-expressed; (ii) FOXM1 over-expressed; (iii) CENPF over-expressed; and (iv) other cases, run

> FOXM1_expression<-mexp.glinsky["FOXM1",]

> CENPF_expression<-mexp.glinsky["CENPF",]

> both_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression>th2),

+ which(CENPF_expression>th2))]

> FOXM1_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression>th2),

+ which(CENPF_expression<=th2))]

> CENPF_up_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression<=th2),

+ which(CENPF_expression>th2))]

> others_expression<-names(FOXM1_expression)[intersect(which(FOXM1_expression<=th2),

+ which(CENPF_expression<=th2))]

where the threshold th2 was defined as described in Methods.

Kaplan-Meier survival analysis depicting the relationship between the “both up expression” (red) group and the “others expression” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(both_up_expression, others_expression))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time) %in%

+ c(both_up_expression, others_expression))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(both_up_expression, others_expression))],

+ class1=both_up_expression,

+ class2=others_expression, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 35       13    16.08     0.592      3.89
comp=1  8        6     2.92     3.265      3.89

 Chisq= 3.9 on 1 degrees of freedom, p= 0.0487
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      35    35      35     13     NA    82.4      NA
comp=1       8     8       8      6     39    12.1      NA
[1] 0.0487

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1 up expression” (red) group and the “others expression” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(FOXM1_up_expression, others_expression))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time)

+ %in%

+ c(FOXM1_up_expression, others_expression))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(FOXM1_up_expression,

+ others_expression))],

+ class1=FOXM1_up_expression,

+ class2=others_expression, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 35       13    12.59    0.0134     0.427
comp=1  1        0     0.41    0.4103     0.427

 Chisq= 0.4 on 1 degrees of freedom, p= 0.513
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      35    35      35     13     NA    82.4      NA
comp=1       1     1       1      0     NA      NA      NA
[1] 0.513

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF up expression” (red) group and the “others expression” (blue) group with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=glinsky_survival_data$patientsForSurvival[which(

+ glinsky_survival_data$patientsForSurvival

+ %in% c(CENPF_up_expression, others_expression))],

+ survTime=glinsky_survival_data$BCR_free_time[which(names(

+ glinsky_survival_data$BCR_free_time)

+ %in%

+ c(CENPF_up_expression, others_expression))],

+ events=glinsky_survival_data$BCR_event[which(names(

+ glinsky_survival_data$BCR_event)

+ %in% c(CENPF_up_expression,

+ others_expression))],

+ class1=CENPF_up_expression,

+ class2=others_expression, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

        N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 35       13     16.5     0.730      1.57
comp=1 35       18     14.5     0.828      1.57

 Chisq= 1.6 on 1 degrees of freedom, p= 0.211
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0      35    35      35     13     NA    82.4      NA
comp=1      35    35      35     18   67.4    34.6      NA
[1] 0.211

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

Finally, Kaplan-Meier survival analysis depicting all four groups/curves with respect to BioChemical Recurrence Free Survival (time to BCR) can be generated as

> plotKaplanMeier4(patientsForSurvival=glinsky_survival_data$patientsForSurvival,

+ survTime=glinsky_survival_data$BCR_free_time,

+ events=glinsky_survival_data$BCR_event,

+ class1=both_up_expression, class2=others_expression,

+ class3=CENPF_up_expression, class4=FOXM1_up_expression, yaxis="BioChemical Recurrence Free Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

7.3 TMA

The MSKCC TMA data object (i.e., protein expression levels) can be loaded using:

> attach(TMA_FOXM1_CENPF_protein_levels)

where rows correspond to sample (i.e., patient) names and columns correspond to either FOXM1 or CENPF.

> names(TMA_FOXM1_CENPF_protein_levels)

[1] "FOXM1_score" "CENPF_score"

To apply Kaplan-Meier survival analysis to these data, you will need to load the survival data object (which describes time to BCR, time to prostate cancer-specific death, and time to metastasis)

> attach(TMA_survival_data)

> names(TMA_survival_data)

[1] "patientsForSurvival"
[2] "BCR_free_time"
[3] "BCR_event"
[4] "Prostate_specific_survival_free_time"
[5] "Prostate_specific_survival_event"
[6] "Metastasis_free_time"
[7] "Metastasis_event"

To identify patient/sample groups with (i) both FOXM1 and CENPF over-expressed; (ii) FOXM1 over-expressed; (iii) CENPF over-expressed; and (iv) other cases; you can run

> both_up<-names(TMA_FOXM1_CENPF_protein_levels$FOXM1_score)[intersect(which(

+ TMA_FOXM1_CENPF_protein_levels$FOXM1_score>th_TMA),

+ which(TMA_FOXM1_CENPF_protein_levels$CENPF_score>th_TMA))]

> FOXM1_up<-names(TMA_FOXM1_CENPF_protein_levels$FOXM1_score)[intersect(which(

+ TMA_FOXM1_CENPF_protein_levels$FOXM1_score>th_TMA),

+ which(!(TMA_FOXM1_CENPF_protein_levels$CENPF_score>th_TMA)))]

> CENPF_up<-names(TMA_FOXM1_CENPF_protein_levels$FOXM1_score)[intersect(which(

+ !(TMA_FOXM1_CENPF_protein_levels$FOXM1_score>th_TMA)),

+ which(TMA_FOXM1_CENPF_protein_levels$CENPF_score>th_TMA))]

> others<-names(TMA_FOXM1_CENPF_protein_levels$FOXM1_score)[intersect(which(

+ !(TMA_FOXM1_CENPF_protein_levels$FOXM1_score>th_TMA)),

+ which(!(TMA_FOXM1_CENPF_protein_levels$CENPF_score>th_TMA)))]

where th_TMA was defined as described in Methods. Alternatively, you can load these groups as

> attach(TMA_KM_groups)
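The four groups are meant to partition the cohort: each patient is above or below the threshold for each protein. A minimal sketch of that partition follows, using made-up scores and a made-up threshold th (the real th_TMA is defined in Methods). It uses logical indexing and therefore assumes no missing scores, whereas the which() calls above silently drop them.

```r
# Hypothetical scores and threshold, for illustration only
foxm1 <- c(P1 = 3, P2 = 0, P3 = 2, P4 = 0, P5 = 1)
cenpf <- c(P1 = 2, P2 = 3, P3 = 0, P4 = 1, P5 = 1)
th <- 1

foxm1_high <- foxm1 > th
cenpf_high <- cenpf > th

groups <- list(
  both_up  = names(foxm1)[ foxm1_high &  cenpf_high],
  FOXM1_up = names(foxm1)[ foxm1_high & !cenpf_high],
  CENPF_up = names(foxm1)[!foxm1_high &  cenpf_high],
  others   = names(foxm1)[!foxm1_high & !cenpf_high]
)

# Sanity check: every patient lands in exactly one group
all_ids <- unlist(groups, use.names = FALSE)
stopifnot(!anyDuplicated(all_ids), setequal(all_ids, names(foxm1)))

print(groups)
```

The same disjointness check can be run on the real group vectors to confirm that no group was overwritten during construction.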

Libraries necessary for Survival analysis can be loaded as:

> library(survival)

> library(survcomp)

7.3.1 BioChemical Recurrence Free Survival

Kaplan-Meier survival analysis depicting the relationship between the “both up” (red) group and the “others” (blue) group with respect to time to BioChemical recurrence (BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$BCR_free_time[which(names(TMA_survival_data$BCR_free_time) %in%

+ c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$BCR_event

+ [which(names(TMA_survival_data$BCR_event) %in% c(TMA_KM_groups$both_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418      111    138.7      5.51      21.1
comp=1 173       77     49.3     15.49      21.1

 Chisq= 21.1 on 1 degrees of freedom, p= 4.43e-06
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418    111     NA      NA      NA
comp=1     173   173     173     77     NA    85.8      NA
[1] 4.43e-06

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival; p-value = 4.43e-06]

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1 up” (red) group and the “others” (blue) group with respect to time to BioChemical recurrence (BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$BCR_free_time[which(names(TMA_survival_data$BCR_free_time) %in%

+ c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$BCR_event[which(names(TMA_survival_data$BCR_event) %in%

+ c(TMA_KM_groups$FOXM1_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$FOXM1_up,

+ class2=TMA_KM_groups$others, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418      111    119.8      0.65      3.75
comp=1  97       34     25.2      3.09      3.75

 Chisq= 3.7 on 1 degrees of freedom, p= 0.0528
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418    111     NA      NA      NA
comp=1      97    97      97     34     NA      NA      NA
[1] 0.0528

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival; p-value = 0.0528]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF up” (red) group and the “others” (blue) group with respect to time to BioChemical recurrence (BCR) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$BCR_free_time[which(names(TMA_survival_data$BCR_free_time) %in%

+ c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$BCR_event[which(names(TMA_survival_data$BCR_event) %in%

+ c(TMA_KM_groups$CENPF_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$CENPF_up,

+ class2=TMA_KM_groups$others, yaxis="BioChemical Recurrence Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418      111    120.1     0.696       3.1
comp=1 133       44     34.9     2.399       3.1

 Chisq= 3.1 on 1 degrees of freedom, p= 0.0782
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418    111     NA      NA      NA
comp=1     133   133     133     44     NA      NA      NA
[1] 0.0782

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: BioChemical Recurrence Free Survival; p-value = 0.0782]

Finally, Kaplan-Meier survival analysis depicting all four groups/curves with respect to time to BioChemical recurrence (BCR) can be generated as

> plotKaplanMeier4(patientsForSurvival=TMA_survival_data$patientsForSurvival,

+ survTime=TMA_survival_data$BCR_free_time,

+ events=TMA_survival_data$BCR_event, class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others,

+ class3=TMA_KM_groups$CENPF_up, class4=TMA_KM_groups$FOXM1_up,

+ yaxis="BioChemical Recurrence Free Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months; y-axis: BioChemical Recurrence Free Survival]

7.3.2 Prostate Cancer-Specific Survival

Kaplan-Meier survival analysis depicting the relationship between the “both up” (red) group and the “others” (blue) group with respect to Prostate Cancer-Specific Survival (time to death) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Prostate_specific_survival_free_time[which(names(

+ TMA_survival_data$Prostate_specific_survival_free_time) %in% c(TMA_KM_groups$both_up,

+ TMA_KM_groups$others))],

+ events=TMA_survival_data$Prostate_specific_survival_event[which(names(

+ TMA_survival_data$Prostate_specific_survival_event)

+ %in% c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418        3    15.03      9.63      33.9
comp=1 173       18     5.97     24.23      33.9

 Chisq= 33.9 on 1 degrees of freedom, p= 5.9e-09
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418      3     NA      NA      NA
comp=1     173   173     173     18     NA      NA      NA
[1] 5.9e-09

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Prostate Cancer-Specific Survival; p-value = 5.9e-09]
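As a consistency check on the survdiff table for this comparison, the per-group column is the squared difference between observed and expected events divided by the expected count, and the reported chi-square is the same squared difference divided by the log-rank variance V. Using the rounded values printed for the “both up” versus “others” comparison (so the results are approximate):

```latex
\frac{(O_0 - E_0)^2}{E_0} = \frac{(3 - 15.03)^2}{15.03} \approx 9.63,
\qquad
\frac{(O_1 - E_1)^2}{E_1} = \frac{(18 - 5.97)^2}{5.97} \approx 24.2,
\qquad
\chi^2 = \frac{(O_1 - E_1)^2}{V} = 33.9
\;\Rightarrow\;
V \approx \frac{144.7}{33.9} \approx 4.27 .
```

Both groups share the same squared difference because the observed and expected totals are equal, so an excess of events in one group is mirrored by a deficit in the other.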

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1 up” (red) group and the “others” (blue) group with respect to Prostate Cancer-Specific Survival (time to death) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Prostate_specific_survival_free_time[which(names(

+ TMA_survival_data$Prostate_specific_survival_free_time)

+ %in%

+ c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$Prostate_specific_survival_event[which(names(

+ TMA_survival_data$Prostate_specific_survival_event)

+ %in% c(TMA_KM_groups$FOXM1_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$FOXM1_up,

+ class2=TMA_KM_groups$others, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418        3     5.66      1.25       6.5
comp=1  97        4     1.34      5.25       6.5

 Chisq= 6.5 on 1 degrees of freedom, p= 0.0108
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418      3     NA      NA      NA
comp=1      97    97      97      4     NA      NA      NA
[1] 0.0108

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Prostate Cancer-Specific Survival; p-value = 0.0108]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF up” (red) group and the “others” (blue) group with respect to Prostate Cancer-Specific Survival (time to death) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Prostate_specific_survival_free_time[which(names(

+ TMA_survival_data$Prostate_specific_survival_free_time)

+ %in%

+ c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$Prostate_specific_survival_event[which(names(

+ TMA_survival_data$Prostate_specific_survival_event)

+ %in% c(TMA_KM_groups$CENPF_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$CENPF_up,

+ class2=TMA_KM_groups$others, yaxis="Prostate Cancer-Specific Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418        3      3.8     0.169     0.703
comp=1 133        2      1.2     0.534     0.703

 Chisq= 0.7 on 1 degrees of freedom, p= 0.402
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418      3     NA      NA      NA
comp=1     133   133     133      2     NA      NA      NA
[1] 0.402

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Prostate Cancer-Specific Survival; p-value = 0.402]

Finally, Kaplan-Meier survival analysis depicting all four groups/curves with respect to Prostate Cancer-Specific Survival (time to death) can be generated as

> plotKaplanMeier4(patientsForSurvival=TMA_survival_data$patientsForSurvival,

+ survTime=TMA_survival_data$Prostate_specific_survival_free_time,

+ events=TMA_survival_data$Prostate_specific_survival_event, class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others,

+ class3=TMA_KM_groups$CENPF_up, class4=TMA_KM_groups$FOXM1_up,

+ yaxis="Prostate Cancer-Specific Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months; y-axis: Prostate Cancer-Specific Survival]

7.3.3 Metastases-Free Survival

Kaplan-Meier survival analysis depicting the relationship between the “both up” (red) group and the “others” (blue) group with respect to Metastases-Free Survival (time to metastases) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Metastasis_free_time[which(names(

+ TMA_survival_data$Metastasis_free_time)

+ %in%

+ c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$Metastasis_event[which(names(

+ TMA_survival_data$Metastasis_event)

+ %in% c(TMA_KM_groups$both_up, TMA_KM_groups$others))],

+ class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others, yaxis="Metastases-Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418       10     39.6      22.2      83.6
comp=1 173       44     14.4      61.2      83.6

 Chisq= 83.6 on 1 degrees of freedom, p= 0
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418     10     NA      NA      NA
comp=1     173   173     173     44     NA      NA      NA
[1] 0

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Metastases-Free Survival; p-value = 0]
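The reported p-value of 0 does not mean the p-value is exactly zero. One plausible explanation, assuming the value is obtained as 1 - pchisq(chisq, df) (an assumption, since the source of plotKaplanMeier is not shown here), is floating-point cancellation. Asking pchisq for the upper tail directly avoids it:

```r
chisq <- 83.6  # log-rank statistic reported for this comparison, 1 degree of freedom

# Lower tail is so close to 1 that the subtraction loses all precision
p_naive <- 1 - pchisq(chisq, df = 1)

# Upper tail computed directly keeps the small value
p_tail <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(p_naive)
print(p_tail)
```

When reporting such results, a bound such as p < 1e-15 is usually more informative than a printed zero.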

Kaplan-Meier survival analysis depicting the relationship between the “FOXM1 up” (red) group and the “others” (blue) group with respect to Metastases-Free Survival (time to metastases) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Metastasis_free_time[which(names(

+ TMA_survival_data$Metastasis_free_time)

+ %in%

+ c(TMA_KM_groups$FOXM1_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$Metastasis_event[which(names(

+ TMA_survival_data$Metastasis_event)

+ %in% c(TMA_KM_groups$FOXM1_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$FOXM1_up,

+ class2=TMA_KM_groups$others, yaxis="Metastases-Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418       10    15.49      1.94      10.5
comp=1  97        9     3.51      8.58      10.5

 Chisq= 10.5 on 1 degrees of freedom, p= 0.00117
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418     10     NA      NA      NA
comp=1      97    97      97      9     NA      NA      NA
[1] 0.00117

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Metastases-Free Survival; p-value = 0.00117]

Kaplan-Meier survival analysis depicting the relationship between the “CENPF up” (red) group and the “others” (blue) group with respect to Metastases-Free Survival (time to metastases) can be generated as

> plotKaplanMeier(patientsForSurvival=TMA_survival_data$patientsForSurvival[which(

+ TMA_survival_data$patientsForSurvival

+ %in% c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ survTime=TMA_survival_data$Metastasis_free_time[which(names(

+ TMA_survival_data$Metastasis_free_time)

+ %in%

+ c(TMA_KM_groups$CENPF_up, TMA_KM_groups$others))],

+ events=TMA_survival_data$Metastasis_event[which(names(

+ TMA_survival_data$Metastasis_event)

+ %in% c(TMA_KM_groups$CENPF_up,

+ TMA_KM_groups$others))],

+ class1=TMA_KM_groups$CENPF_up,

+ class2=TMA_KM_groups$others, yaxis="Metastases-Free Survival")

Call:
survdiff(formula = Surv(time, vital) ~ comp, rho = 0)

         N Observed Expected (O-E)^2/E (O-E)^2/V
comp=0 418       10    20.01      5.01      21.8
comp=1 133       16     5.99     16.75      21.8

 Chisq= 21.8 on 1 degrees of freedom, p= 3.07e-06
Call: survfit(formula = Surv(time, vital) ~ comp)

       records n.max n.start events median 0.95LCL 0.95UCL
comp=0     418   418     418     10     NA      NA      NA
comp=1     133   133     133     16     NA      NA      NA
[1] 3.07e-06

[Figure: Kaplan-Meier survival curves. x-axis: months; y-axis: Metastases-Free Survival; p-value = 3.07e-06]

Finally, Kaplan-Meier survival analysis depicting all four groups/curves with respect to Metastases-Free Survival (time to metastases) can be generated as

> plotKaplanMeier4(patientsForSurvival=TMA_survival_data$patientsForSurvival,

+ survTime=TMA_survival_data$Metastasis_free_time,

+ events=TMA_survival_data$Metastasis_event, class1=TMA_KM_groups$both_up,

+ class2=TMA_KM_groups$others,

+ class3=TMA_KM_groups$CENPF_up,

+ class4=TMA_KM_groups$FOXM1_up, yaxis="Metastases-Free Survival")

[Figure: Kaplan-Meier survival curves for all four groups. x-axis: months; y-axis: Metastases-Free Survival]

8 C-statistics (R code and objects)

To calculate c-statistics on TMA data, load the following libraries

> library(survival)

> library(survcomp)

The MSKCC TMA data object can be loaded using:

> data(TMA_FOXM1_CENPF_protein_levels)

Survival data are loaded as:

> attach(TMA_survival_data)

> names(TMA_survival_data)

[1] "patientsForSurvival"
[2] "BCR_free_time"
[3] "BCR_event"
[4] "Prostate_specific_survival_free_time"
[5] "Prostate_specific_survival_event"
[6] "Metastasis_free_time"
[7] "Metastasis_event"

Gleason scores for these patients are stored in the TMA.Gleason object. To select the TMA samples with available Gleason scores:

> FOXM1_score_Gleason<-sapply(names(TMA.Gleason), function(x)

+ as.numeric(TMA_FOXM1_CENPF_protein_levels$FOXM1_score[which(

+ names(TMA_FOXM1_CENPF_protein_levels$FOXM1_score)==x)]) )

> names(FOXM1_score_Gleason)<-names(TMA.Gleason)

> CENPF_score_Gleason<-sapply(names(TMA.Gleason), function(x)

+ as.numeric(TMA_FOXM1_CENPF_protein_levels$CENPF_score[which(

+ names(TMA_FOXM1_CENPF_protein_levels$CENPF_score)==x)]) )

> names(CENPF_score_Gleason)<-names(TMA.Gleason)
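The same name-based selection can also be written by indexing a named vector directly. The following is a minimal sketch on hypothetical toy vectors (scores and gleason are illustrative names, and the scores are assumed to be stored as character strings); it is an alternative formulation, not the code used to produce the results in this document.

```r
# Hypothetical toy vectors standing in for the protein scores and Gleason grades
scores  <- c(s1 = "3", s2 = "1", s3 = "2", s4 = "0")
gleason <- c(s3 = 7, s1 = 6, s4 = 9)

# Index the named score vector by the names of the Gleason vector;
# the result follows the order of `gleason` and keeps only its samples
scores_gleason <- as.numeric(scores[names(gleason)])
names(scores_gleason) <- names(gleason)
print(scores_gleason)
```

One difference to keep in mind: a sample name that is absent from the score vector yields NA with this indexing, whereas the sapply/which construction returns an empty element for it.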

8.1 C-statistics based on Prostate Cancer-Specific Survival

To calculate c-statistics based on Prostate Cancer-Specific Survival and compare them to the commonly used predictions based on Gleason score, perform survival analysis on the data with available Gleason grading (799 cases).

> TMA_PSS_FT<-TMA_survival_data$Prostate_specific_survival_free_time[names(TMA.Gleason)]

> TMA_PSS_E<-TMA_survival_data$Prostate_specific_survival_event[names(TMA.Gleason)]

The survival object is then constructed as

> surv<- Surv(TMA_PSS_FT,TMA_PSS_E)

To calculate c-statistics based on Gleason score:

> ci.res.Gleason <-

+ concordance.index(x =

+ coxph(surv~as.numeric(TMA.Gleason))$linear.predictors, surv.time =TMA_PSS_FT

+ ,surv.event = TMA_PSS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res.Gleason$c.index

[1] 0.6451687

> ci.res.Gleason$p.value

[1] 0.03855914
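For intuition, the c-index is the fraction of comparable patient pairs in which the patient with the higher predicted risk has the earlier event. The following is a simplified base R sketch of that pairwise definition on hypothetical toy data; the function c_index is illustrative and does not reproduce survcomp's concordance.index (in particular, its handling of missing values and its standard error and p-value are omitted).

```r
# Simplified concordance index: x is a risk score (higher = worse prognosis),
# time is follow-up time, event is 1 for an observed event and 0 for censoring
c_index <- function(x, time, event) {
  concordant <- 0
  comparable <- 0
  n <- length(x)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # A pair is comparable when subject i has an observed event before time[j]
      if (event[i] == 1 && time[i] < time[j]) {
        comparable <- comparable + 1
        if (x[i] > x[j]) concordant <- concordant + 1       # correct ordering
        if (x[i] == x[j]) concordant <- concordant + 0.5    # tied scores
      }
    }
  }
  concordant / comparable
}

# Hypothetical toy data: five patients
time  <- c(5, 8, 12, 20, 30)
event <- c(1, 1, 0, 1, 0)
risk  <- c(0.9, 0.4, 0.6, 0.5, 0.1)
print(c_index(risk, time, event))  # 6 of 8 comparable pairs are concordant: 0.75
```

A value of 0.5 corresponds to a non-informative score and 1 to a perfect ranking.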

To calculate c-statistics based on FOXM1+CENPF protein expression levels (additive model):

> ci.res_FOXM1_CENPF<-concordance.index(x =

+ coxph(surv~CENPF_score_Gleason+FOXM1_score_Gleason)$linear.predictors,

+ surv.time =TMA_PSS_FT,

+ surv.event = TMA_PSS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_FOXM1_CENPF$c.index

[1] 0.7146325

> ci.res_FOXM1_CENPF$p.value

[1] 0.0002402222


To calculate c-statistics based on FOXM1+CENPF protein expression levels (multiplicative model):

> ci.res_FOXM1_CENPF_m<-concordance.index(x =

+ coxph(surv~CENPF_score_Gleason*FOXM1_score_Gleason)$linear.predictors,

+ surv.time =TMA_PSS_FT,

+ surv.event = TMA_PSS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_FOXM1_CENPF_m$c.index

[1] 0.7156727

> ci.res_FOXM1_CENPF_m$p.value

[1] 0.0002173858

These results show a higher predictive value for the protein levels of FOXM1 and CENPF (c-index 0.71 to 0.72) than for the Gleason score alone (c-index 0.65).
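The additive and multiplicative models differ only in the R formula: `+` includes one term per marker, while `*` additionally includes their interaction. The following minimal sketch on hypothetical toy scores (the data frame toy and its column names are illustrative) shows how the two formulas expand.

```r
# Hypothetical toy scores for six samples
toy <- data.frame(
  CENPF = c(0, 1, 2, 3, 1, 2),
  FOXM1 = c(1, 0, 2, 3, 3, 1)
)

# Additive model: one column (coefficient) per marker
additive <- model.matrix(~ CENPF + FOXM1, data = toy)

# Multiplicative model: main effects plus the CENPF:FOXM1 interaction
multiplicative <- model.matrix(~ CENPF * FOXM1, data = toy)

print(colnames(additive))
print(colnames(multiplicative))
```

The interaction column is the element-wise product of the two scores. A Cox model has no intercept term, so coxph drops the "(Intercept)" column shown here.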

To estimate the independent predictive value of the protein levels of FOXM1 and CENPF over (i.e., in addition to) the Gleason score, with the Gleason score added to the FOXM1 and CENPF terms (additive model):

> ci.res_Gleason_FOXM1_CENPF<-concordance.index(x =

+ coxph(surv~as.numeric(TMA.Gleason)+CENPF_score_Gleason*FOXM1_score_Gleason)$

+ linear.predictors,

+ surv.time =TMA_PSS_FT,

+ surv.event = TMA_PSS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_Gleason_FOXM1_CENPF$c.index

[1] 0.8687009

> ci.res_Gleason_FOXM1_CENPF$p.value

[1] 3.759784e-31

A multiplicative model for the Gleason score and the FOXM1 and CENPF protein levels yields similar results.

8.2 C-statistics based on Metastases-Free Survival

To calculate c-statistics based on Metastases-Free Survival and compare them to the commonly used predictions based on Gleason score, perform survival analysis on the data with available Gleason grading (799 cases).

> TMA_MFS_FT<-TMA_survival_data$Metastasis_free_time[names(TMA.Gleason)]

> TMA_MFS_E<-TMA_survival_data$Metastasis_event[names(TMA.Gleason)]

The survival object is then constructed as

> surv<- Surv(TMA_MFS_FT,TMA_MFS_E)

To calculate c-statistics based on Gleason score:

> ci.res.Gleason <-

+ concordance.index(x =

+ coxph(surv~as.numeric(TMA.Gleason))$linear.predictors, surv.time =TMA_MFS_FT

+ ,surv.event = TMA_MFS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res.Gleason$c.index

[1] 0.6185323

> ci.res.Gleason$p.value


[1] 0.00209945

To calculate c-statistics based on FOXM1+CENPF protein expression levels (additive model):

> ci.res_FOXM1_CENPF<-concordance.index(x =

+ coxph(surv~CENPF_score_Gleason+FOXM1_score_Gleason)$linear.predictors,

+ surv.time =TMA_MFS_FT,

+ surv.event = TMA_MFS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_FOXM1_CENPF$c.index

[1] 0.7693181

> ci.res_FOXM1_CENPF$p.value

[1] 3.100663e-19

To calculate c-statistics based on FOXM1+CENPF protein expression levels (multiplicative model):

> ci.res_FOXM1_CENPF_m<-concordance.index(x =

+ coxph(surv~CENPF_score_Gleason*FOXM1_score_Gleason)$linear.predictors,

+ surv.time =TMA_MFS_FT,

+ surv.event = TMA_MFS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_FOXM1_CENPF_m$c.index

[1] 0.7713868

> ci.res_FOXM1_CENPF_m$p.value

[1] 5.264187e-20

These results show a higher predictive value for the protein levels of FOXM1 and CENPF (c-index 0.77) than for the Gleason score alone (c-index 0.62).

To estimate the independent predictive value of the protein levels of FOXM1 and CENPF over (i.e., in addition to) the Gleason score, with the Gleason score added to the FOXM1 and CENPF terms (additive model):

> ci.res_Gleason_FOXM1_CENPF<-concordance.index(x =

+ coxph(surv~as.numeric(TMA.Gleason)+CENPF_score_Gleason*FOXM1_score_Gleason)$

+ linear.predictors,

+ surv.time =TMA_MFS_FT,

+ surv.event = TMA_MFS_E, method = "noether", na.rm = TRUE, outx=FALSE);

> ci.res_Gleason_FOXM1_CENPF$c.index

[1] 0.8506017

> ci.res_Gleason_FOXM1_CENPF$p.value

[1] 1.489094e-55

A multiplicative model for the Gleason score and the FOXM1 and CENPF protein levels yields similar results.
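The same four model comparisons are repeated for each endpoint, so they can be wrapped in a small helper. The following sketch runs on a simulated cohort and uses survival::concordance() in place of survcomp's concordance.index; the helper cox_cindex, the data frame toy, and all simulated values are hypothetical, so the resulting numbers are unrelated to the TMA results reported here.

```r
library(survival)

# Fit a Cox model for the given right-hand side and return its concordance.
# Uses survival::concordance(), not survcomp::concordance.index().
cox_cindex <- function(rhs, data) {
  fit <- coxph(as.formula(paste("Surv(time, event) ~", rhs)), data = data)
  concordance(fit)$concordance
}

set.seed(42)
n <- 300
# Hypothetical toy cohort; scores and outcomes are simulated
toy <- data.frame(
  gleason = sample(6:9, n, replace = TRUE),
  CENPF   = sample(0:3, n, replace = TRUE),
  FOXM1   = sample(0:3, n, replace = TRUE)
)
# Simulated hazard increases with all three covariates
hazard    <- 0.01 * exp(0.3 * (toy$gleason - 6) + 0.4 * toy$CENPF + 0.4 * toy$FOXM1)
event_t   <- rexp(n, rate = hazard)
censor_t  <- rexp(n, rate = 0.02)
toy$time  <- pmin(event_t, censor_t)          # observed follow-up time
toy$event <- as.integer(event_t <= censor_t)  # 1 = event, 0 = censored

# The four model specifications compared in this section
models <- c(gleason        = "gleason",
            additive       = "CENPF + FOXM1",
            multiplicative = "CENPF * FOXM1",
            combined       = "gleason + CENPF * FOXM1")
cindex <- sapply(models, cox_cindex, data = toy)
print(round(cindex, 3))
```

Note that a c-index computed on the same data used to fit the model tends to be optimistic.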


References

[1] Basso, K., Margolin, A. A., Stolovitzky, G., Klein, U., Dalla-Favera, R., and Califano, A. (2005). Reverse engineering of regulatory networks in human B cells. Nat Genet 37, 382-390.

[2] Taylor, B. S., Schultz, N., Hieronymus, H., Gopalan, A., Xiao, Y., Carver, B. S., Arora, V. K., Kaushik, P., Cerami, E., Reva, B., et al. (2010). Integrative genomic profiling of human prostate cancer. Cancer Cell 18, 11-22.

[3] Margolin, A. A., Wang, K., Lim, W. K., Kustagi, M., Nemenman, I., and Califano, A. (2006b). Reverse engineering cellular networks. Nat Protoc 1, 662-671.

[4] Carro, M. S., Lim, W. K., Alvarez, M. J., Bollo, R. J., Zhao, X., Snyder, E. Y., Sulman, E. P., Anne, S. L., Doetsch, F., Colman, H., et al. (2010). The transcriptional network for mesenchymal transformation of brain tumours. Nature 463, 318-325.

[5] Lefebvre, C., Rajbhandari, P., Alvarez, M. J., Bandaru, P., Lim, W. K., Sato, M., Wang, K., Sumazin, P., Kustagi, M., Bisikirska, B. C., et al. (2010). A human B-cell interactome identifies MYB and FOXM1 as master regulators of proliferation in germinal centers. Mol Syst Biol 6, 377.

[6] Sboner, A., Demichelis, F., Calza, S., Pawitan, Y., Setlur, S. R., Hoshida, Y., Perner, S., Adami, H. O., Fall, K., Mucci, L. A., et al. (2010). Molecular sampling of prostate cancer: a dilemma for predicting disease progression. BMC Med Genomics 3, 8.

[7] Glinsky, G. V., Glinskii, A. B., Stephenson, A. J., Hoffman, R. M., and Gerald, W. L. (2004). Gene expression profiling predicts clinical outcome of prostate cancer. J Clin Invest 113, 913-923.

[8] Zhang, Q. C., et al. (2012). Structure-based prediction of protein-protein interactions on a genome-wide scale. Nature 490, 556-560.
