Oral Presentations

95. Comprehensive Molecular Characterization of Papillary Renal Cell Carcinoma

W. Marston Linehan

The Cancer Genome Atlas Research Network, National Institutes of Health, Bethesda, Maryland

Background: Papillary renal cell carcinoma (PRCC), the second most common type of RCC, is a heterogeneous disease made up of a number of different types of renal cancer, including those with indolent, often multifocal presentation as well as solitary tumors with an aggressive, highly lethal phenotype. Little is known about the genetic basis of sporadic papillary RCC and there are no effective forms of therapy for patients with advanced disease.

Methods: We performed comprehensive molecular characterization of 161 surgically resected primary papillary renal cell carcinomas using whole-exome sequencing, messenger RNA, microRNA, copy number, methylation and proteomic analyses. Integrative analysis was performed to correlate the molecular features with stage and survival.

Results: We determined that Type 1 and Type 2 PRCC represented distinctly different types of renal cancer characterized by specific genetic alterations, and that Type 2 PRCC could be further classified into at least three individual subtypes based on molecular differences and patient survival. Alterations in MET were associated with Type 1 tumors, while CDKN2A silencing, SETD2 mutations, TFE3 fusions, and increased expression of the NRF2-ARE pathway were identified in Type 2 tumors. We found a CpG island methylator phenotype (CIMP) in a distinct subset of Type 2 PRCC characterized by early onset, poor survival, and germline or somatic mutation of the fumarate hydratase (FH) gene.

Conclusions: Integrative analysis confirms different biological entities characterized by distinct genetic features for the Type 1 and Type 2 histologic subtypes of PRCC, and reveals an important role for the MET (Type 1) and NRF2-ARE (Type 2) pathways.

38. High-Throughput Somatic Variant Impact Phenotyping Using Gene Expression Signatures

Angela N. Brooks1,2, Alice H. Berger1,2, Xiaoyun Wu2, Larson Hogstrom2, Itay Tirosh2, Federica Piccioni2, Mukta Bagul2, Cong Zhu2, Yashaswi Shretha2, David Root2, Pablo Tamayo2, Ryo Sakai3, Bang Wong2, Ted Natoli2, David Lahr2, Atanas Kamburov2, Aravind Subramanian2, Gad Getz2, Todd Golub1,2, Matthew Meyerson1,2, Jesse Boehm2

1Division of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts; 2Cancer Program, Broad Institute, Cambridge, Massachusetts; 3Department of Electrical Engineering, KU Leuven, Belgium

Cancer genome sequencing efforts have led to the rapid identification of thousands of cancer-associated somatic mutations; however, there is a significant bottleneck in determining the functional impact of these variants. In addition, infrequently observed variants, even in well-characterized genes, pose a challenge in distinguishing impactful from passenger mutations. Understanding mutation function is critical for our knowledge of cancer biology and to more rapidly determine targeted treatment strategies based on individual tumor genetic profiles.

We report a high-throughput approach for expression-based variant impact phenotyping called e-VIP. We applied this method to study ~450 somatic mutations identified in primary lung adenocarcinomas. The e-VIP approach compares gene expression changes upon introduction of wild-type versus mutant allele cDNAs in cell lines to disentangle functional mutations from likely inert mutations. We further classify alleles as gain-of-function or loss-of-function through differences in the signature strength between wild-type and mutant alleles.

e-VIP correctly classifies known functional mutations in genes such as KRAS, EGFR, and RIT1 and predicts functional effects of never-characterized mutations. We characterized rare mutations in clinically actionable oncogenes such as EGFR and unexpected dominant mutations in the transcription factor MAX and the phosphatase subunit PPP2R1A, among others. We observed an enrichment of loss-of-function missense mutations in known tumor suppressor genes such as STK11, KEAP1, and FBXW7. Most genes assayed also harbored variants that are likely inert, further underscoring the importance of characterizing individual variant alleles. Orthogonal functional approaches, including an EGFR inhibitor resistance screen and a pooled tumor formation assay, were used as validation. In principle, e-VIP can characterize any genetic variant, independent of prior knowledge of gene function, and should significantly advance the pace of functional characterization of variants identified from genome sequencing studies.

32. CoMEt: A Statistical Approach to Identify Combinations of Mutually Exclusive Alterations in Cancer

Mark D.M. Leiserson1, Hsin-Ta Wu1, Fabio Vandin1, Benjamin J. Raphael1

1Department of Computer Science and Center for Computational Molecular Biology, Brown University, Providence, Rhode Island

Identifying driver mutations in cancer genomes is a significant challenge due to the mutational heterogeneity of tumors. This mutational heterogeneity arises because driver mutations target genes in signaling and regulatory pathways, each of which can be perturbed in numerous ways. We introduce Combinations of Mutually Exclusive Alterations (CoMEt), an algorithm to identify combinations of candidate driver mutations de novo, without any prior biological knowledge (e.g., pathways or protein interactions). CoMEt searches for combinations of mutations that exhibit mutual exclusivity, a pattern frequently observed for mutations in cancer pathways.

CoMEt uses an exact statistical test for mutual exclusivity that is less biased toward high-frequency alterations than previous approaches and more sensitive in detecting combinations of lower-frequency alterations. We compute the exact test using a novel tail enumeration procedure and also derive a binomial approximation. CoMEt simultaneously identifies collections of one or more combinations of mutually exclusive alterations, consistent with the observation of multiple hallmarks of cancer. CoMEt summarizes over multiple possible collections with high scores. Finally, CoMEt also enables simultaneous analysis of subtype-specific mutations.

We show that CoMEt outperforms other mutual exclusivity approaches on simulated and real data. We apply CoMEt to hundreds of samples from four different TCGA cancer types: gastric cancer (STAD), glioblastoma (GBM), acute myeloid leukemia (AML), and breast cancer (BRCA). We identify multiple mutually exclusive sets within each cancer type. These include the RTK/RAS pathway in gastric cancer; the Rb and p53 signaling pathways in GBM; and a collection containing multiple kinases, including FLT3 and RAS genes, in AML. Many of these collections overlap known pathways, but others reveal novel putative cancer genes.

In addition, we analyze subtype-specific mutations in four molecular subtypes of breast cancer and three molecular subtypes of gastric cancer. We identify several pathways that are enriched for mutations in specific subtypes, including the PI(3)K/AKT signaling pathway in the Luminal A subtype of BRCA and a strong exclusive module containing CDH1, the fusion gene ARHGAP26-CLDN18, and amplification of EPHB3 in the genomically stable subtype of STAD. CoMEt analysis also reveals subtle relationships between subtype-specific mutations and mutations in different pathways. These findings provide testable hypotheses for experimental validation.
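To make the idea of an exact mutual exclusivity test concrete, the sketch below handles only the simplest case of two genes. It is not the CoMEt implementation: CoMEt works on sets of k genes and uses a tail enumeration procedure that the abstract does not detail. Here we assume that, with the number of altered samples per gene held fixed, the number of samples altered in both genes follows a hypergeometric distribution, and we report the probability of seeing as few or fewer co-occurrences than observed. The function name and arguments are hypothetical.

```python
from math import comb


def pairwise_exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Lower-tail hypergeometric p-value for co-occurrence of two alterations.

    n_samples: total number of tumors
    n_a, n_b:  tumors altered in gene A and in gene B (margins held fixed)
    n_both:    tumors altered in both genes (observed co-occurrence)

    A small p-value means fewer co-occurrences than expected by chance,
    i.e. evidence for mutual exclusivity. Illustrative two-gene case only.
    """
    lowest = max(0, n_a + n_b - n_samples)  # smallest feasible overlap
    highest = min(n_a, n_b)                 # largest feasible overlap
    if not (0 <= n_a <= n_samples and 0 <= n_b <= n_samples):
        raise ValueError("margins must lie between 0 and n_samples")
    if not (lowest <= n_both <= highest):
        raise ValueError("n_both is inconsistent with the margins")

    total = comb(n_samples, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical counts: 10 tumors, 5 altered in each gene, no overlap.
    print(pairwise_exclusivity_pvalue(10, 5, 5, 0))
```

Conditioning on the per-gene counts is what keeps a test of this kind from simply rewarding frequently altered genes, which is the bias the abstract says CoMEt reduces.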

91. Decoding Breast Cancer with Quantitative Radiomics and Radiogenomics: Imaging Phenotypes in Breast Cancer Risk Assessment, Diagnosis, Prognosis, and Response to Therapy

Maryellen Giger, Hui Li, Karen Drukker, Yuan Ji, Yitan Zhu, Charles Perou, Carl Jaffe, Justin Kirby, Erich Huang, John Freyman, Elizabeth Morris, Elizabeth Burnside, and the TCIA Breast Cancer Group

Purpose: To demonstrate, using the TCGA/TCIA breast cancer dataset, the role of quantitative radiomics in characterizing the molecular subtypes of breast cancer and associating the magnetic resonance imaging (MRI) computer-extracted image phenotypes (CEIP) with genomic data. Understanding of the potentially correlative or complementary relationships between quantitative image phenotypes, cancer subtypes, and genomic data of breast tumors is expected to allow for improved prognostic assessment and subsequently more effective cancer treatment plans.

Method and Materials: Analyses were performed on the TCGA breast cases that possessed corresponding MRI studies. MRI-based phenotyping analysis included 3D lesion segmentation based on a fuzzy c-means clustering algorithm and computerized feature extraction yielding quantitative characteristics from the six phenotypic categories of size, shape, morphology, enhancement texture, kinetics, and variance kinetics. Correlative and classification analyses were conducted. The performance of the image-based phenotypes and genomic data in distinguishing between molecular subtypes of breast cancer was evaluated using ROC analysis with area under the ROC curve (AUC) as the figure of merit.

Results: MR images were available on 91 TCGA breast cases. After identification of the lesion locations by TCIA radiologists, the quantitative analysis was conducted automatically and evaluated in tasks including receptor status (e.g., ER, PR), molecular subtype (e.g., Luminal A, Luminal B, Basal-like), risk of recurrence (e.g., PAM50, MammaPrint), and genomic data. Multiple linear regression analyses demonstrated statistically significant Pearson correlations (0.5-0.55) between MRI tumor signatures and multi-gene assay recurrence scores. Important MR phenotypes included tumor size and enhancement texture patterns characterizing tumor heterogeneity. Use of the MRI signatures in the tasks of distinguishing between good and poor prognosis in terms of levels of recurrence yielded AUC values (standard error) of 0.83 (0.07), 0.77 (0.06), 0.80 (0.07), and 0.75 (0.08) for MammaPrint, Oncotype DX, PAM50 Risk of Relapse Subtype (ROR-S), and PAM50 ROR-P (subtype+proliferation), respectively. Significant associations were also identified between the MRI phenotypes (such as tumor size, shape, margin, enhancement texture, blood flow kinetics) and molecular features involved in multiple regulation layers (including DNA mutation, miRNA expression, protein expression, pathway gene expression and copy number variation).

Conclusion: The results from this study indicate that quantitative MRI analysis shows promise as a means for high-throughput image-based phenotyping to yield quantitative predictive models of breast cancer for precision medicine and patient treatment strategies.

This project is funded in part by the University of Chicago Dean Bridge Fund and in part with federal funds from the National Cancer Institute, National Institutes of Health, under Contract No. HHSN261200800001E. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the U.S. Government.
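The figure of merit used above, the area under the ROC curve, can be read as the probability that a randomly chosen poor-prognosis case receives a higher signature score than a randomly chosen good-prognosis case. The sketch below computes AUC directly from that definition (the Mann-Whitney formulation). It is a generic illustration, not the authors' analysis code; the function name and the example scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison of scores.

    Counts the fraction of (positive, negative) pairs in which the positive
    case scores higher; ties count as half. Quadratic in the number of cases,
    which is fine for cohorts of around a hundred cases.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both groups must contain at least one score")

    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical MRI signature outputs for two prognosis groups.
    poor = [0.8, 0.4]
    good = [0.6, 0.2]
    print(auc(poor, good))
```

An AUC of 0.5 corresponds to chance-level separation and 1.0 to perfect separation, which gives a scale for reading the reported values of 0.75 to 0.83.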

102. Integrated Genomic Characterization of Pheochromocytoma and Paraganglioma

Matthew D. Wilkerson1, Katherine L. Nathanson2, Karel Pacak3, The Cancer Genome Atlas Pheochromocytoma and Paraganglioma Analysis Working Group

1Lineberger Comprehensive Cancer Center, Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; 2Abramson Cancer Center, Department of Medicine, Division of Translational Medicine and Human Genetics, University of Pennsylvania, Philadelphia, Pennsylvania; 3National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland

In recent years, we have seen important advances in understanding the molecular basis of pheochromocytoma and paraganglioma (PCC/PGL), particularly in relation to inherited susceptibility. Nevertheless, specific understanding of the somatic genomic alterations in hereditary and non-hereditary PCC/PGL, including malignant ones, is still very limited. The Cancer Genome Atlas (TCGA) consortium has conducted a coordinated effort to characterize the molecular basis of PCC/PGL and has collected a cohort of 184 PCC/PGL (excluding head and neck PGL).

At least 30% of patients carried a germline mutation in a known familial PCC/PGL susceptibility gene, thus making PCC/PGL the tumor type with the greatest rate of germline mutations in the TCGA. Some of the familial susceptibility genes (NF1, RET, and VHL) were also observed to be somatically mutated and in similar expression subtypes. Most notably, we have identified the first RNA fusion genes (and several species of fusion genes) in PCC/PGL, demonstrating for the first time that inter-chromosomal translocation and gene fusion is a mechanism of molecular pathogenesis in this disease. In particular, we report the novel fusion genes UBTF-MAML3 and TCF4-MAML3, which are activating based on over-expression properties, are recurrent (5% of cases), are found in tumors that had no other driving event, occur in exactly one gene expression subtype, and associate with poor patient outcome. We identified new somatically mutated driver genes, such as CSDE1, which is also coordinated with transcript splicing alterations. Lastly, our integration of these abundant germline and somatic alterations across mRNA, miRNA and methylation platforms has enabled a characterization of vastly divergent patterns of molecular pathogenesis in PCC/PGL.

58. A Multi-Cancer Gene Signature Associated With Stromal Activation

Zhenqiu Liu1, Ann E. Walts2, Beth Y. Karlan3,4, Sandra Orsulic3,4

1Biostatistics and Bioinformatics Research Center, 2Department of Pathology and Laboratory Medicine, 3Women’s Cancer Program, Samuel Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los Angeles, California; 4Department of Obstetrics and Gynecology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, California

The presence of cancer cells induces a reaction in the surrounding stroma similar to fibrosis and wound healing after injury. Activated stroma (also known as reactive or desmoplastic stroma) is morphologically different from disease-free stroma and is characterized by the increased presence of activated myofibroblasts and altered extracellular matrix. Despite recent advances toward understanding the key roles of activated stroma in the initiation, progression and recurrence of different cancers, it is currently unknown if different cancer types share a common pattern of stromal cell recruitment and activation across their respective microenvironments.

Previously, we identified a TGFβ-regulated collagen remodeling stromal gene signature associated with metastasis, recurrence and poor survival in ovarian cancer and showed that inhibition of one of the signature genes, COL11A1, is effective in reducing ovarian cancer growth and dissemination in a mouse model. Using TCGA datasets from multiple primary cancer types, we now show that a highly conserved set of COL11A1 co-expressed genes is present in all epithelial cancers examined, including cancers of the breast, ovary, lung, pancreas, stomach, urinary bladder, colon, thyroid, cervix, head and neck, and prostate, but not in non-epithelial malignancies such as leukemias, gliomas and sarcomas. In any given epithelial cancer, this gene signature is typically associated with the mesenchymal molecular subtype, which often has the worst prognosis of the molecular subtypes of that cancer. Although many of the individual signature genes are present in myofibroblasts and mesenchymal stem cells and are enriched in benign processes such as wound healing, fibrosis and tissue homeostasis, others are cancer-specific and could serve as biomarkers of malignant stromal activation and/or highly selective therapeutic targets within the tumor microenvironment.

104. Integrated Analysis of TCGA Identifies Targets and Patient Populations for Antibody-Drug Conjugates

Wenyan Zhong1, Keith Ching2, Tao Xie2, Jeremy Myers1, Marc Damelin1, Puja Sapra1, Kim Arndt1, Jadwiga R. Bienkowska2, Paul A. Rejto2

1Oncology Research, Pfizer Worldwide R&D, Pearl River, New York; 2Oncology Research, Pfizer Worldwide R&D, La Jolla, California

Antibody-drug conjugates (ADCs) are a promising class of therapeutics for the treatment of cancer. The strategy selectively targets tumor cells by attaching cytotoxic agents to an antibody that recognizes an antigen preferentially expressed on the cell surface of tumor tissues compared to normal tissues. We describe an integrated computational approach for ADC target identification and patient selection employing The Cancer Genome Atlas (TCGA) and Genotype-Tissue Expression (GTEx) data.

Scores were defined for each target to capture its expression in normal tissues (NormScore) and tumor tissues (TumorScore). NormScore was calculated as the number of normal tissues in which the median target expression level is above a defined expression threshold in that tissue. TumorScore was calculated as the rank product of differential expression between tumor and normal tissue (fold change) and prevalence (% of samples above a defined expression threshold in tumor tissues). We then developed customized target selection criteria tailored to the various payload classes, using these scores along with other factors such as cell surface expression. For example, microtubule inhibitor payloads may require higher target expression than DNA damage payloads due to differences in potency.

Our approach successfully predicted ADC targets currently in clinical trials (e.g., HER2, GPNMB, and STEAP1) in addition to novel ADC targets. While IHC-based target expression measurement is the primary biomarker used for patient selection for ADC therapeutics, RNA-seq from large tumor panels in TCGA is an excellent data resource for estimating the prevalence of target expression and predicting patient populations to guide clinical biomarker assay development and clinical trial design. These examples demonstrate how Pfizer incorporates integrated TCGA analysis from target identification through to patient selection.
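The two scores described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not say how ranks are assigned or how ties are handled, so here rank 1 goes to the largest fold change or prevalence, ties are broken by target name, and a lower rank product indicates a more attractive target. All function names, thresholds, and expression values are hypothetical.

```python
from statistics import median


def norm_score(normal_expr, thresholds):
    """Number of normal tissues whose median expression exceeds its threshold.

    normal_expr: dict mapping tissue name to a list of expression values
    thresholds:  dict mapping tissue name to that tissue's threshold
    """
    return sum(
        1
        for tissue, values in normal_expr.items()
        if median(values) > thresholds[tissue]
    )


def prevalence(tumor_values, threshold):
    """Fraction of tumor samples expressing the target above the threshold."""
    return sum(v > threshold for v in tumor_values) / len(tumor_values)


def rank_descending(values):
    """Map each target to a rank, with rank 1 for the largest value.

    Assumption: ties are broken alphabetically by target name.
    """
    ordered = sorted(values, key=lambda name: (-values[name], name))
    return {name: rank for rank, name in enumerate(ordered, start=1)}


def tumor_scores(fold_change, prevalence_by_target):
    """Rank product of fold change and prevalence; lower is better here."""
    fc_rank = rank_descending(fold_change)
    pv_rank = rank_descending(prevalence_by_target)
    return {name: fc_rank[name] * pv_rank[name] for name in fold_change}


if __name__ == "__main__":
    # Hypothetical targets A, B, C with made-up fold changes and prevalences.
    print(tumor_scores({"A": 8.0, "B": 2.0, "C": 4.0},
                       {"A": 0.5, "B": 0.9, "C": 0.1}))
```

A payload-specific selection rule could then filter on both scores, for example requiring a low NormScore and a stricter TumorScore for less potent payloads; the cutoffs themselves are not given in the abstract.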

57. Mutation Hotspots Associate with Gene Expression, Signaling Pathways, Protein Domains, and Drug Response

William Poole, Theo Knijnenburg, Brady Bernard, Ilya Shmulevich

Institute for Systems Biology, Seattle, Washington

Overview: The distribution of mutations in cancer genes is nonrandom. Oncogenes are recurrently mutated at the same amino acid positions. Tumor suppressor genes, on the other hand, often have protein-truncating alterations that form non-uniform patterns of mutations. The TCGA provides a significant opportunity to identify these hotspots of mutations in both well-known and novel cancer genes. Statistical associations with molecular data, such as gene expression, protein expression and drug response, can elucidate the functional consequences of the mutation hotspots.

Approach: We have developed a novel multiscale clustering algorithm that uses gene mutation data to detect ‘hotspots’ of single-nucleotide protein-affecting mutations. The approach begins by fitting multiple mixture models, each representing a different length scale. These multiscale models are combined using a greedy algorithm, which aims to find the set of non-overlapping clusters that minimizes the Akaike information criterion. This methodology allows for the discovery of mutation hotspots of widely differing sizes; clusters range from individual to hundreds of amino acids. We have applied this approach to all mutation data in 11 cancers from TCGA. We performed a large-scale statistical analysis to associate these mutation hotspots with tens of thousands of molecular features in TCGA, including gene expression, protein (RPPA) expression, and clinical phenotypes. Additionally, we employed annotations of protein domains and drug response measurements in cancer cell lines to further establish the functional importance of the mutation hotspots.

Results: The uncovered mutation hotspots led to a novel ranking of genes, which is different from gene rankings based on commonly employed methods for identification of significantly mutated genes. Mutation hotspots are significantly enriched with annotated protein domains. Additionally, our approach highlighted specific regions of interest in genes, as these regions had a stronger statistical association with pathway-level expression signatures when compared to mutations found across the entire gene. We interpret these findings as the identification of functional regions within these genes. This hypothesis is further supported by the observation that sensitivity to anti-cancer drugs in cell lines is better explained by mutation hotspots than by mutations in the entire gene.
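The greedy combination step in the Approach can be illustrated with a deliberately simplified sketch. The abstract does not specify the mixture models or how the criterion is updated as clusters are added, so the code below assumes each candidate cluster comes with a precomputed change in AIC (negative means the model improves) and that these changes can be treated independently. That independence is an assumption made for illustration, and all names and numbers are hypothetical.

```python
def aic(num_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * num_params - 2 * log_likelihood


def select_clusters(candidates):
    """Greedily pick non-overlapping clusters that each lower the AIC.

    candidates: list of (start, end, delta_aic) tuples, where start and end
    are inclusive amino acid positions and delta_aic is the assumed change
    in AIC from adding that cluster. Returns chosen clusters sorted by start.
    """
    chosen = []
    # Consider the most beneficial clusters (most negative delta) first.
    for start, end, delta in sorted(candidates, key=lambda c: c[2]):
        if delta >= 0:
            break  # remaining candidates would not improve the criterion
        overlaps = any(not (end < s or start > e) for s, e, _ in chosen)
        if not overlaps:
            chosen.append((start, end, delta))
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical candidates from models fitted at different length scales.
    candidates = [
        (10, 12, -30.0),   # tight hotspot around a few residues
        (11, 50, -20.0),   # broader cluster overlapping the first
        (100, 100, -5.0),  # single-residue hotspot
        (200, 260, 3.0),   # wide cluster that worsens the AIC
    ]
    print(select_clusters(candidates))
```

Because candidates from every length scale compete in one pool, a selection like this can return both single-residue hotspots and clusters spanning many residues, which matches the range of sizes described above.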

4. Integrated Molecular Characterization of Uterine Carcinosarcoma

Co-Chairs: Rehan Akbani, Douglas A. Levine

The Cancer Genome Atlas Research Network, UCS group

Uterine carcinosarcoma (UCS) is a rare tumor that accounts for less than 5% of all uterine cancers. The median age of patients affected by this disease is 65 years and patients often present with vaginal bleeding. The overall 5-year survival rate for UCS is approximately 35% with a median overall survival of approximately 24 months, which is much worse than that of endometrial carcinoma (UCEC), with a median overall survival of more than 60 months. UCS is an aggressive disease and TCGA’s mission is to better understand the molecular characteristics of this disease, with an overarching goal to improve treatment options.

We performed an integrated genomic, epigenomic, transcriptomic, and proteomic characterization of 57 uterine carcinosarcomas (UCSs) using array- and sequencing-based technologies. UCSs have extensive copy number alterations, poor unsupervised clustering, and highly recurrent somatic mutations. Nearly all (91%) cases had TP53 mutations, and frequent mutations were also found in PTEN, PIK3CA, PPP2R1A, FBXW7, and KRAS. Transcriptome sequencing identified a strong EMT gene signature in a subset of 17 (30%) cases. Corresponding decreases in mir-200 family expression were apparent in cases with EMT signatures and were generally under epigenetic control. UCS had the largest range of EMT signature among the different tumor types studied and shared proteomic features with both gynecological and non-epithelial tumors.

Our results indicate that UCS tumors share many features with serous-like endometrial carcinomas, including frequent TP53 mutations and extensive somatic copy number alterations, though with greater EMT features. Despite having mixed histology, the tumors demonstrated similar clonality to other common solid tumors, suggesting a homogeneous cellular population at the molecular level. These data taken together suggest that some UCS tumors develop from an endometrioid lineage, though most are thought to de-differentiate from a serous precursor, accounting for their clinical aggressiveness and poor response to treatment. Multiple somatic mutations and copy number alterations in genes that are therapeutic targets have been identified.

31. Characterization of Tumor-Infiltrating T-lymphocytes in TCGA Cancers

Linghua Wang, Liu Xi, Kyle Covington, Richard A. Gibbs, David A. Wheeler

Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas

Tumor-infiltrating lymphocytes (TILs) are a type of white blood cell recruited into the tumor in an attempt to kill the tumor cells. TILs reflect the host’s anti-tumor immune response and it has been demonstrated in multiple tumor types that TILs can be a valuable predictor of prognosis. The use of TILs as an adoptive cell transfer therapy has recently shown great promise in the treatment of human cancers including metastatic melanoma and colorectal cancer. To date, however, many properties of TILs are not fully understood.

To gain a better understanding of the anti-tumor immune response, we developed a pipeline to characterize expression of the T-cell receptor (TCR) repertoire and other T-cell markers in patients’ tumors using the RNA-seq data. We have applied this approach to cutaneous squamous cell carcinomas and the TCGA colorectal cancer samples. The expression of TCR-β and TCR-α genes in the variable region, joining region, and constant region was evident in over a third of patients, allowing further assessment of the state of the TCR repertoire. Further analysis of the sequencing reads aligned at the breakpoints of TCR rearrangement suggested a polyclonal profile. Compared to the TCR repertoire of normal lymphocytes, the repertoire was limited in some of the patients, with depletion of a portion of the TCR repertoire, suggesting the possibility that a specific response to tumor antigen was being mounted in those patients. We are developing methods to quantify the levels of TILs with the expression data as part of the TCGA pan-can project. In summary, our preliminary results revealed novel and important details about the biology of anti-tumor T-cell responses.

43. Global Analysis of Somatic Structural Alterations and Their Impact on Gene Expression in Diverse Human Cancers

Babak Alaeimahabadi, Erik Larsson

Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

Tumor genomes are mosaics of somatic structural variations (SVs) that may contribute to the activation of oncogenes or inactivation of tumor suppressors, for example, by altering gene copy-number. However, while gene copy-number variation accounts for some of the transcriptional variability seen between tumors, most mRNA changes remain unexplained. Notably, there are multiple other ways in which SVs can alter transcription, but the overall impact of these mechanisms on tumor transcriptional output has not been systematically studied. Moreover, the overall structural basis of copy-number changes in cancer is poorly known.

Here, we use whole-genome sequencing (WGS) data from TCGA to map SVs across >500 tumors and >15 cancers, and investigate the relationship between SVs, copy-number alterations, and mRNA levels. We use known chromosomal breakpoints from copy-number data to carefully evaluate and optimize tools and parameters for WGS-based SV detection. We find that ~20% of copy-number alterations can be clarified structurally and that most copy-number amplifications are due to tandem duplications. We also find that some seemingly simple copy-number alterations have a more complex structural basis involving composite events on different chromosomes. We observe frequent swapping of strong and weak promoters in the context of gene fusions, and find that these events have a measurable global impact on mRNA levels. Notably, many of these fusions are due to short-range events, visible in copy-number data as small copy-number segments (<100 kb), or copy-number neutral inversions, while only ~30% are long-range or interchromosomal fusions. Our analyses confirm several known fusion genes such as TMPRSS2-ERG in prostate cancer and CCDC6-RET in thyroid cancer, and in some of these cases our WGS-based analysis provides further structural insight.

In conclusion, by combining SV, copy-number and expression data we gain insight into the structural basis of copy-number alterations as well as the global impact of genomic rearrangements on gene expression in human tumors.
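The distinction drawn above between short-range, long-range, and interchromosomal events can be expressed as a simple rule on a pair of breakpoints. The sketch below is illustrative only: the abstract mentions 100 kb as the size of small copy-number segments, and reusing that figure as the cutoff on breakpoint distance is an assumption made here, as are the type and function names and the example coordinates.

```python
from dataclasses import dataclass

# Assumption: reuse the 100 kb segment size from the abstract as the cutoff.
SHORT_RANGE_BP = 100_000


@dataclass(frozen=True)
class Breakpoint:
    chrom: str  # chromosome name, e.g. "chr1"
    pos: int    # 1-based genomic coordinate


def classify_rearrangement(a, b, short_range_bp=SHORT_RANGE_BP):
    """Label a breakpoint pair as interchromosomal, short-range or long-range."""
    if a.chrom != b.chrom:
        return "interchromosomal"
    distance = abs(a.pos - b.pos)
    return "short-range" if distance < short_range_bp else "long-range"


if __name__ == "__main__":
    # Hypothetical breakpoint pairs, not taken from any tumor.
    pairs = [
        (Breakpoint("chr1", 1_000), Breakpoint("chr1", 50_000)),
        (Breakpoint("chr1", 1_000), Breakpoint("chr1", 5_000_000)),
        (Breakpoint("chr1", 1_000), Breakpoint("chr2", 1_000)),
    ]
    for a, b in pairs:
        print(a, b, classify_rearrangement(a, b))
```

Tallying these labels over all detected fusions would give the kind of breakdown reported above, where only about 30% fall in the long-range or interchromosomal categories.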

98. Radiogenomic Analysis of TCGA/TCIA Diffuse Lower Grade Gliomas by

Molecular Subtype L.M. Poisson1, L.A.D. Cooper2,3,6, E.P. Huang7, J.Y. Chen8, A.E. Flanders9, D.J. Brat2,5,6, C.A. Holder4,6, with TCGA Glioma Phenotype Research Group10,11 1Department of Public Health Sciences, Center for Bioinformatics, Hermelin Brain Tumor Center, Josephine Ford Cancer Institute, Henry Ford Health System, Detroit, Michigan; Departments of 2Biomedical Informatics, 3Biomedical Engineering, 4Radiology and Imaging Sciences, 5Pathology and Laboratory Medicine, 6Winship Cancer Institute, Emory University, Atlanta, Georgia; 7Division of Cancer Treatment and Diagnosis, National Cancer Institute, NIH, Bethesda, Maryland; 8Department of Radiology, University of California, San Diego, San Diego, California; 9Department of Radiology, Thomas Jefferson University, Philadelphia, Pennsylvania; 10http://www.cancerimagingarchive.net/, 11https://wiki.cancerimagingarchive.net/ Introduction: TCGA has dramatically improved our understanding of diffuse cerebral gliomas through comprehensive genomic analysis. For the Lower Grade Gliomas (LGGs), emerging genetic classification divides 6 histology and grade combinations into 3 clinically-relevant molecular classes based on the status of isocitrate dehydrogenase (IDH) mutations and chromosome 1p/19q co-deletions (IDHmut-codel, IDHmut-non-codel, IDHwt). To investigate relationships between neuroimaging (MRI) phenotypes and genetic classification of LGGs, we performed a comparative analysis of semi-quantitative MRI features (VASARI LGG feature-set) and IDH/1p19q genomic classifications. Methods: Pre-operative MRI scans of 72 TCGA-profiled LGGs were collected by The Cancer Imaging Archive, including T1-weighted without and with contrast enhancement, T2-weighted, fluid-attenuated inversion recovery (FLAIR) and DWI/ADC maps. Each imaging set was independently assessed by 3 neuroradiologists blinded to molecular status. 
The VASARI LGG feature-set was developed by the TCGA Glioma Phenotype Research Group and defines a standardized set of 26 morphological features. Data were compiled across the 3 readers to define a single feature-set measure per sample. Clinical and molecular classifications were obtained from the LGG-AWG marker paper (TCGA Research Network. NEJM;2015, in press). Associations with histology, WHO grade and molecular type were assessed by Fisher’s exact test (categorical features) and ANOVA/t-test (continuous features). Results: Of 70 tumors with known IDH/1p19q genomic classification, 16 were IDHmut-codel, 34 were IDHmut-non-codel, and 19 were IDHwt. IDHmut-codel tumors were preferentially centered in the frontal lobes (75%, FET p=0.026). IDHmut-non-codel tumors tended to arise in the frontal (41%) and temporal lobes (41%), whereas IDHwt tumors did not show regional preference. All LGGs had regions of non-enhancement. The nonenhancing tumor margins were more well-defined for IDHmut LGGs (56% and 76% were well-defined) than for IDHwt tumors (32%, FET p=0.027). Sixty-six percent of LGGs had an enhancing region, but this was not associated with molecular class (FET p=0.286), although contrast enhancement was more likely in grade III than grade II (FET, p=0.043). Twenty-three percent of these grade II/III tumors had MR imaging evidence of necrosis, with presence equally likely in any of the 3 molecular classes (FET p=0.931); however, 5/16 (31%) of LGGs with necrosis on MRI were grade II. IDHwt tumors tended to be smaller than IDHmut tumors (23.0 cm2 vs 39.7cm2, respectively, for maximal area, t-test p<0.001). Further differences were found in T1/FLAIR ratio (FET p=0.030), T2/FLAIR signal crossing the midline (FET p=0.007), and presence of hemorrhage (FET p=0.009), cysts (FET p=0.006), or satellites (FET p=0.030). Conclusions: Neuroimaging review demonstrated differential MR imaging features between LGG molecular classes. 
Interestingly, IDHwt LGGs showed association with aggressive features (e.g., small dimension with poorly-defined non-contrast-enhanced borders). Yet the lack of association with necrosis or presence of an enhancing region suggests that the IDHwt class is not simply underdiagnosed GBM. We did not uncover a specific feature that clearly defined a molecular group. An investigation of imaging profiles that align with molecular type or define further subclasses is underway.
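The categorical associations in this abstract are assessed with Fisher's exact test (FET). As an illustration only, the sketch below implements the two-sided test for a 2x2 table with the Python standard library. The function name and the example counts are assumptions made for this sketch and are not taken from the study; the abstract's comparisons across 3 molecular classes would need the general r x c form of the test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    # The small relative tolerance guards against floating-point ties.
    return sum(
        p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical counts: rows = molecular class (A vs. other),
# columns = tumor location (frontal vs. non-frontal).
p_value = fisher_exact_two_sided(12, 4, 20, 33)
print(f"two-sided p = {p_value:.4f}")
```

The function sums hypergeometric probabilities over every table with the same margins, which is the usual definition of the two-sided p-value for a 2x2 table.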


109. Integrative Molecular Analysis Across Adult Glioma Reveals Novel Relationships Between Histological Subtypes and Molecular Signatures

1Houtan Noushmehr, 1Tathiane Malta, 1Thais Sabedot, 2Floris Barthel, 3Michele Ceccarelli, 3Antonio Iavarone, 2Roel Verhaak, on behalf of the Pan-Glioma TCGA AWG.

1University of São Paulo, Ribeirão Preto, São Paulo, Brazil 2The University of Texas MD Anderson Cancer Center 3Columbia University, NY USA TCGA has performed comprehensive characterization of the genome, transcriptome, DNA methylome and proteome of 1,122 primary diffuse grade II-IV gliomas. We aimed to elucidate the molecular underpinnings associated with the clinical heterogeneity of glioma. We detected significantly mutated genes across 737 gliomas and re-identified known events (e.g. TP53, IDH1, ATRX, EGFR, CIC, PTEN, PIK3CA, NF1) and putative novel drivers including NBPF1, KDR, SETD2 and MAP3K1. Mutually exclusive TERT promoter and ATRX mutations were detected in 266 of 321 samples. Copy number alterations (CNA) were determined among 1,087 glioma cases, identifying novel deletions on 3p21 (targeting SETD2, also shown to be mutated), 16p12.2 (HS3ST4), and 2q22.1 (GIGYF2), amplification in 1q22 (ERRFI1), focal alterations targeting EGFR, PTEN, and CDKN2A, and whole chromosome arm events such as co-deletion of 1p and 19q, chr10 loss and chr7 gain. Using RNA sequencing from 665 gliomas, we identified a total of 39 candidate activating RTK fusions (involving FGFR3, EGFR, NTRK, EPHB, PDGFRA and MET), of which EGFR/FGFR fusions were the strongest activators of RTK signaling and correlated with patient survival in IDHwt gliomas. We performed unsupervised clustering of the promoter DNA methylation and gene expression of 932 and 1,045 gliomas, respectively, which resulted in six methylation groups (LGm1-6) and four expression subtypes (LGr1-4). These clusters overlapped significantly and could be divided into three macro groups separated by IDH mutation status, mutations in TP53/ATRX, and 1p19q codeletion. Supervised comparison of IDH wildtype LGG and GBM suggested upregulation of oxidative phosphorylation gene sets in GBM, which raises the intriguing possibility of pathogenetic differences between histologically different but molecularly similar gliomas.
In contrast, within-cluster functional enrichment analysis among IDH mutant samples revealed that GBMs are primarily marked by a hyperproliferation signature relative to LGGs, which are characterized by increased activation of genes related to glial neuronal function. The first DNA methylation macro-group contains clusters LGm1-3 (n=454) and is enriched for lower grade glioma (421/454, 92.7%) and IDH mutant samples (451/454, 99%). In contrast, the second macro-group contained clusters LGm4-6 and was enriched for glioblastoma (383/478, 80%) and IDH-wildtype samples (477/478, 99.8%). As expected, the high grade samples (n=33) in the LGm1-3 macro-group were all hypermethylated and the low grade discordant samples in the LGm4-6 macro-group were all IDH-wildtype. LGm6 was characterized by the lowest overall methylation, the youngest age at diagnosis among the IDHwt cohort, and improved overall survival relative to the LGm4-5 IDHwt methylation groups, suggesting that increasing age correlates with an overall increase in methylation that may affect survival in glioma. Interestingly, this group is enriched for BRAF somatic alterations and shares a similar methylomic signature with pilocytic astrocytomas and tumors with known H3F3A histone mutations (K27 and G34), suggesting that primary GBMs can be further classified based on methylation profiles with distinct clinical and molecular phenotypes. In summary, analysis of the most complete molecular glioma dataset to date supports the classification of glioma using molecular markers and identifies novel putative driver genes such as SETD2.


59. The Landscape of Somatic Structural Rearrangements in RAS Pathway Genes Angeliki Pantazi1, Alexei Protopopov2, Xingzhi Song2, Lixing Yang3, Semin Lee3, Christopher A. Bristow2, Michael Parfenov1, Melanie Kucherlapati1, Jon Seidman1, Peter Park3, Lynda Chin2, Harvard GCC Team, Raju Kucherlapati1

1Department of Genetics, Harvard Medical School, Boston, Massachusetts; Division of Genetics, Brigham and Women’s Hospital, Boston, Massachusetts; 2Institute for Applied Cancer Science, Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center, Houston, Texas; 3Center for Biomedical Informatics, Harvard Medical School, Boston, Massachusetts The RAS pathway is one of the most thoroughly studied cancer pathways. While the mutations and copy number aberrations in RAS-related genes have been thoroughly studied, our understanding of the landscape of structural aberrations in solid tumors is still incomplete. Current studies focus on cancer transcriptomes to elucidate functional fusions; however, such an approach cannot reproduce the underlying complex genomic architecture in cancer cells and cannot detect disruption and, thus, inactivation of tumor suppressor genes. To identify somatic structural rearrangements, we assembled a computational pipeline that makes use of two algorithms, Breakdancer and Meerkat. We applied this pipeline to the TCGA low pass paired-end whole genome sequencing data for 12 tumor types (BLCA, CESC, ESCA, HNSC, LGG, LUAD, PRAD, COAD, SKCM, STAD, THCA, UCEC). We then focused our detailed analysis on structural rearrangements that harbor at least one breakpoint within a gene participating in the RAS/RAF/MEK/ERK pathway. We detected a wide variety of events such as oncogene-activating gene fusions, promoter swapping, loss of regulatory 3’-UTR and disruption of tumor suppressor genes. Our pipeline was able to detect already known events and identify novel partners for known fusion genes and new pan-cancer structural rearrangements in RAS pathway genes.
Using an integrated approach, we drew on somatic mutation, copy-number aberration, mRNA and RPPA protein expression data derived from the TCGA to validate structural rearrangements, demonstrate their mutual exclusivity with other types of somatic aberrations and show that structural rearrangements were able to increase RAS activity. Our findings suggest that structural rearrangements are an alternative mechanism through which RAS signaling is activated in different tumor types.
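The selection step described above, keeping rearrangements with at least one breakpoint inside a RAS/RAF/MEK/ERK pathway gene, can be sketched as a simple interval lookup. This is a minimal illustration and not the authors' pipeline: the record types, function names, gene names and coordinates below are all placeholders assumed for the example.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int
    end: int


class Rearrangement(NamedTuple):
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


# Placeholder gene intervals for illustration only; not real annotations.
PATHWAY_GENES = [
    Gene("GENE_A", "chr7", 1_000, 5_000),
    Gene("GENE_B", "chr12", 20_000, 30_000),
]


def genes_hit(chrom, pos, genes):
    """Names of genes whose interval contains the given breakpoint."""
    return [g.name for g in genes if g.chrom == chrom and g.start <= pos <= g.end]


def pathway_rearrangements(events, genes):
    """Keep events with at least one breakpoint inside a listed gene."""
    kept = []
    for ev in events:
        hits = genes_hit(ev.chrom1, ev.pos1, genes) + genes_hit(ev.chrom2, ev.pos2, genes)
        if hits:
            kept.append((ev, sorted(set(hits))))
    return kept


events = [
    Rearrangement("chr7", 1_500, "chr1", 99),     # first breakpoint in GENE_A
    Rearrangement("chr2", 10, "chr3", 20),        # no pathway gene involved
    Rearrangement("chr5", 7, "chr12", 25_000),    # second breakpoint in GENE_B
]
for ev, hits in pathway_rearrangements(events, PATHWAY_GENES):
    print(ev, hits)
```

A linear scan is adequate for a short gene list; an interval tree would be the usual choice for genome-wide annotation.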


100. Somatic Copy Number Alterations and Aneuploidy Events in Uveal Melanoma Juliann Shih, Andrew Cherniack, Julian Hess, Carrie Sougnez, Gordon Saksena, Bita Esmaeli, Cyriac Kandoth, Matthew Meyerson, UVM AWG Uveal melanoma (UVM) is a rare tumor (~6 per million per year), but also the most common intraocular cancer. It is molecularly and clinically distinct from cutaneous melanoma. Past studies have identified prognostically significant chromosomal or arm-length aberrations in UVM, including monosomy 3 (associated with unfavorable prognosis and metastatic risk) and 6p gain (better prognosis). The TCGA UVM analysis working group profiled somatic copy number alterations (SCNAs) in 80 tumors on Affymetrix SNP 6.0 arrays. Unsupervised hierarchical clustering of significantly altered SCNA lesions by GISTIC 2.0 distinguished four distinct subtypes defined by arm-level alterations: 1A: quiet; 1B: 6p, 8q amplified; 2A: 3 loss; and 2B: 3, 6q, 8p loss with 8q gain. BAP1 mutations, thought to be the driver of chromosome 3 loss, were found in 17/42 subtype 1 tumors and no subtype 2 tumors. SF3B1 mutations were almost mutually exclusive (one exception) with BAP1 mutations, and found in 14/38 subtype 2 tumors and 4/42 subtype 1 tumors. Other aneuploidy events, including 1p loss, 1q gain, 11 gain, and 21 gain, were prevalent throughout UVM with no significant enrichment in any subtype. Subtypes 1A+1B (collectively subtype 1) were enriched for spindle-cell UVM, and subtypes 2A+2B (subtype 2) were enriched for epithelioid-cell UVM. Subtype 1 showed significantly better survival than subtype 2, while 2A showed slightly better survival than 2B, indicating that a loss of chromosome 3 is predictive of poor survival in UVM. Additional aneuploidy events in 2B may be predictive of poorer survival within subtype 2. Further analyses will determine absolute allelic copy number and correlations with other somatic alterations.


72. The Landscape of Driver Kinase Fusions in Cancer Nicolas Stransky, Ethan Cerami, Stefanie Schalm, Joseph L. Kim, Klaus P. Hoeflich, Christoph Lengauer Blueprint Medicines, Cambridge, Massachusetts Human cancer genomes harbor a variety of alterations leading to the deregulation of key pathways in tumor cells. The genomic characterization of tumors has uncovered numerous genes recurrently mutated, deleted or amplified, but gene fusions have not been studied as extensively. Kinase fusions represent ideal targets for the development of cancer drugs because they often confer oncogenic dependency in hematopoietic and solid malignancies as demonstrated by the success of several kinase inhibitors. For example, imatinib induces remission in leukemia patients who are positive for BCR–ABL1 fusions, and crizotinib and ceritinib have produced significant clinical benefit in patients with lung adenocarcinomas and mesenchymal tumors harboring ALK fusions. We have developed heuristics for reliably detecting gene fusion events in RNA-seq data sets and apply them to nearly 9,000 samples from The Cancer Genome Atlas (TCGA). Fusions between any two genes were identified based on the number of chimeric reads and split reads. Then a number of filtering criteria were applied to flag false positive and non-functional fusions, including the removal of kinase fusions observed in a panel of more than 3,500 normal samples from diverse origins. Finally, we reviewed all recurrent kinase fusions manually to identify putative oncogenic drivers with distinctive characteristics of functional kinase fusions. We thereby were able to recapitulate most known translocation events in solid tumors (i.e., ALK, BRAF, EGFR, FGFR1, 2 and 3, NTRK1, 2 and 3, PDGFRA, PRKCA, RAF1, RET, ROS1). Interestingly, we identified new tumor types harboring such fusions and discovered several novel fusion partners for these kinases. 
We also detected several low-frequency, pan-cancer kinase fusion events, for example in the neurotrophic tyrosine receptor kinases NTRK1, NTRK2 and NTRK3, that drive tumorigenesis in a small fraction of multiple cancers, regardless of tissue type. Using our computational pipeline, we identified several novel and recurrent kinase fusions involving the MET proto-oncogene and PIK3CA. These bona fide oncogenes have not been shown previously to be activated by fusion events. Our analysis also uncovered novel, recurrent fusions in kinases with no known tumorigenic genomic alterations (e.g., FGR and PKN1), potentially resulting in active and oncogenic fusion proteins that represent putative targets for drug discovery. These findings have immediate diagnostic and clinical implications and expand the therapeutic options for cancer patients, as approved or exploratory drugs exist for many of these kinases.
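The filtering logic outlined in this abstract (read-support counts followed by removal of fusions seen in normal samples) can be sketched as below. This is an illustration under stated assumptions, not the authors' heuristics: the thresholds, field names, the definitions given in the comments and the placeholder gene names are all hypothetical, and the manual review step is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FusionCall:
    gene5: str           # 5' partner gene
    gene3: str           # 3' partner gene
    chimeric_reads: int  # assumed: read pairs whose mates map to different genes
    split_reads: int     # assumed: single reads spanning the fusion junction


def filter_fusions(calls, normal_panel, min_chimeric=2, min_split=1):
    """Drop calls with weak read support or that also appear in normal samples.

    `normal_panel` is a set of (gene5, gene3) pairs observed in normals.
    The default thresholds are illustrative, not values from the abstract.
    """
    kept = []
    for call in calls:
        if call.chimeric_reads < min_chimeric or call.split_reads < min_split:
            continue  # insufficient read evidence
        if (call.gene5, call.gene3) in normal_panel:
            continue  # seen in normal tissue, so unlikely to be a driver
        kept.append(call)
    return kept


calls = [
    FusionCall("GENE_X", "KINASE_Y", chimeric_reads=5, split_reads=3),
    FusionCall("GENE_P", "KINASE_Q", chimeric_reads=1, split_reads=0),
    FusionCall("GENE_M", "KINASE_N", chimeric_reads=8, split_reads=4),
]
normal_panel = {("GENE_M", "KINASE_N")}
for call in filter_fusions(calls, normal_panel):
    print(call)
```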


107. Comprehensive Pan-Genomic Characterization of Adrenocortical Carcinoma Thomas J. Giordano University of Michigan Adrenocortical carcinoma (ACC) is a rare neoplasm with a heterogeneous outcome and limited treatment options. Here we describe the genomic, transcriptomic, epigenomic and proteomic profiling of 91 ACCs as a part of The Cancer Genome Atlas (TCGA). We identified potential driving alterations including amplifications (TERT, TERF2 and CDK4), deletions (ZNRF3, CDKN2A and RB1) and point mutations in genes previously not known to participate in adrenal disease (RPL22), as well as in genes known to initiate familial syndromes that include adrenocortical neoplasms (TP53, CTNNB1, PRKAR1A, MEN1). We observed a wide variability in ploidy in absolute copy number and genotypic analysis, which implies a sequential development from hypodiploidy to polyploidy via whole genome doubling in a subset of ACCs. Integrated analyses confirmed and expanded the role of mutations of β-catenin and PKA signaling pathways. Unsupervised clustering of multidimensional data revealed three classes of ACC with biological and clinical significance. Using genomic data of other tumour types, we performed pan-cancer analyses of ACC, which allowed us to place ACC in a broader context of cancer genomic profiles. Our results present a comprehensive genomic landscape and a refined molecular classification of ACC, improving our understanding of its pathogenesis, which will ultimately improve the care of patients.


108. NCI Cloud Pilots Report Tanja Davidsen National Cancer Institute, NIH The growth of large-scale sequence data for cancer research is rapidly out-stripping the required computational capacity for storage, processing, network transmission, and analysis. Investigators mine genomics data by downloading data stored at a variety of locations, often adding their own local data. They then compute over these data on local hardware using computational tools, many of which are locally developed and rapidly changing. This model has been successful for many years, but is becoming untenable given the enormous growth of biomedical data since the advent of large-scale programs like TCGA. At its projected completion in 2015, TCGA will generate approximately 2.5 Petabytes (PB) of data. Evidence indicates that with less than 1 PB of TCGA data available, much of the NCI-supported research community is already computationally limited by financial constraints and IT (network, storage, and computing) issues. Wider access to data and computational infrastructure utilizing new technological capabilities could potentially address these needs. The purpose of this project is to support the development of a new model for computational analysis of biological data that has the potential to address the challenges described above. This model involves the creation of a set of data repositories with co-located computational capacity and an Application Programming Interface (API) that provides security and data access for developers of analytic tools. In this model, applications are brought to the data, rather than bringing the data to the applications. Such a “Cancer Genomics Cloud” has the potential to democratize access to NCI-generated genomic data and provide a more cost-effective way to provide computational support to the cancer research community. 
Three NCI Cloud Pilots have been awarded and are currently in development at Broad, The Institute for Systems Biology, and Seven Bridges Genomics.


64. Analysis of Paired Tumor and Normal Data in TCGA Andrew M. Gross1, Jason F. Kreisberg2, Trey Ideker1,2 1Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, California; 1,2Moores Cancer Center, University of California, San Diego, La Jolla, California A remarkable aspect of cancer is the diversity of ways by which cells can achieve the shared effect of uncontrolled growth. While numerous studies have characterized the tumor phenotype within specific tissues and measurement platforms, this constrained study design provides little information about the true scope of the perturbation. In contrast, TCGA provides an unprecedented panoramic view of the disease. Here we describe a pan-cancer approach to characterize conserved changes from adjacent normal to tumor tissues in gene and miRNA expression as well as CpG methylation in over 500 patients. For this analysis, we focused on the fraction of patients for which an entity is overexpressed or methylated, allowing for a simple and robust handle on the differential tumor response across all cancer types. We employed this framework to identify ubiquitous activation of mitotic genes, repression of metabolic genes, global increases in miRNA levels and hypermethylation at PRC2 binding sites. After establishing these conserved cancer phenotypes, we then identified features with tissue-specific activity including overexpression of MET specifically in tissues in which germ-line MET mutations have been shown to drive familial cancers. Finally we used this approach to integrate across multiple data layers and found a number of genes that are differentially expressed in specific genetic or epigenetic contexts.

(a) Distribution of fraction of methylation probes, miRNA, and mRNA increased in tumor compared to patient-matched normal tissue in the pan-cancer cohort. (b) Comparison of fraction of patients with overexpression in tumor for 18,410 genes across the breast cancer cohort (BRCA) compared to all other matched TCGA patients. (c) Differential gene expression of MET across different TCGA tissues in matched tumor (red) and normal (blue) patients. Note that arrow in b corresponds to the MET gene.
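The summary statistic used in this abstract, the fraction of patients in whom a feature is higher in tumor than in the patient-matched normal, can be sketched in a few lines. The function names and the example values are assumptions for illustration. The accompanying sign test is one simple way to attach a p-value to such a fraction and is not stated in the abstract.

```python
from math import comb


def fraction_increased(tumor, normal):
    """Fraction of patients whose tumor value exceeds the matched normal value."""
    if len(tumor) != len(normal) or not tumor:
        raise ValueError("tumor and normal must be non-empty and the same length")
    return sum(t > n for t, n in zip(tumor, normal)) / len(tumor)


def sign_test_p(tumor, normal):
    """Two-sided sign test on paired values; ties are discarded (an assumption)."""
    diffs = [t - n for t, n in zip(tumor, normal) if t != n]
    n = len(diffs)
    if n == 0:
        return 1.0
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    # Probability of a split at least this uneven under a fair coin, doubled.
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2**n
    return min(1.0, p)


# Hypothetical expression values for one gene in four matched pairs.
tumor = [5.0, 6.0, 7.0, 8.0]
normal = [1.0, 2.0, 3.0, 9.0]
print(fraction_increased(tumor, normal), sign_test_p(tumor, normal))
```

Working with the fraction rather than the magnitude of change makes the statistic insensitive to outliers and to differences in scale between platforms, which fits the "simple and robust" motivation given above.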


Actionable Clinical Insights


6. Proteomic Stratification of Clear Cell Renal Cell Carcinoma Utilizing The Cancer Genome Atlas (TCGA) With External Validation

Samuel D. Kaffenberger1, Giovanni Ciriello1, Andrew G. Winer1, Martin H. Voss1, Jodi K. Maranchie2, Pheroze Tamboli3, W. Kimryn Rathmell4, Toni K. Choueiri5, Robert J. Motzer1, Jonathan A. Coleman1, Paul Russo1, Chris Sander1, James J. Hsieh1, A. Ari Hakimi1 1Memorial Sloan Kettering Cancer Center, New York, New York; 2University of Pittsburgh Medical Center, Pittsburgh, Pennsylvania; 3The University of Texas MD Anderson Cancer Center, Houston, Texas; 4The University of North Carolina Lineberger Comprehensive Cancer Center, Chapel Hill, North Carolina; 5Dana-Farber Cancer Institute, Boston, Massachusetts Introduction: Proteomics represents the ultimate convergence of DNA and expression alterations. We therefore sought to leverage TCGA reverse phase protein array (RPPA) data with an independent proteomic platform to identify druggable targets and pathways associated with prognosis in clear cell renal cell carcinoma (ccRCC). Methods: Unsupervised hierarchical consensus clustering was performed and differentially expressed proteins were identified for pathway analysis. Associations with clinicogenomic factors were assessed and Cox proportional hazards models were constructed for disease-specific survival (DSS). Results: RPPA clustering of 324 patients from the ccRCC TCGA revealed 5 robust clusters characterized by alterations in specific pathways and divergent prognoses (Figures). Cluster 1 was characterized by poor DSS and decreased expression of receptor tyrosine kinases (RTKs) and upregulation of the mTOR pathway. It was also associated with mTOR pathway somatic alterations, sarcomatoid histology, and the ccB poor-risk mRNA signature (all p<0.001). Cluster 2 was characterized by increased expression of RTKs and interestingly, also had upregulation of the mTOR pathway; however, it had few mTOR pathway somatic alterations and had excellent DSS.
Clusters 3-5 were associated with intermediate prognoses and other distinct pathway alterations. After accounting for grade and pTNM stage, cluster designation remained independently associated with DSS (HR 0.23 for cluster 2 versus cluster 1, 95% CI 0.08-0.68; p=0.008). External validation was performed on a separate cohort of 189 patients with a different quantitative proteomics platform. A panel of phosphoproteins (pHER1, pHER2, pHER3, pSHC, pMEK, pAKT), highly discriminant between the most divergent RPPA clusters (1 and 2) was evaluated. Those at the highest quartile of activation in > 3 proteins (similar to the good prognosis cluster 2) were associated with improved DSS (HR 0.19, 95% CI 0.05-0.082; p=0.03). Patients with mTOR pathway activation segregated to those with concurrent RTK activation similar to cluster 2 (n=83) and those without, similar to cluster 1 (n=13).


Conclusions: We have identified and validated proteomic signatures that cluster ccRCC patients into 5 prognostic groups. Furthermore, two distinct mTOR-activated clusters, one with high RTK activity and one with increased mTOR pathway somatic alterations, were revealed and validated, which may have prognostic and therapeutic implications with respect to response to mTOR inhibition.


2. TCGA Analysis of 412 Bladder Cancer Exomes: Preliminary Results J. Kim1, G. Getz1, D.J. Kwiatkowski1,2, S.P. Lerner3,4, J.N. Weinstein5, A. Cherniak1, G. Guangwu1, C.J. Creighton4, R. Akbani5, K.A. Hoadley6,7, W.Y. Kim7, M.B. Morgan8, T. Hinoue9, J.E. Rosenberg10, D.F. Bajorin10, D.E. Hansel11, H. Al-Ahmadie10, D. Gordenin12, J.M. Stuart13, G. Robertson14, A. Mungall14, R. Kucherlapati15, P.W. Laird9, G.B. Mills16, representing TCGA's Bladder Cancer Working Group 1Broad Institute; 2Brigham and Women’s Hospital; 3Scott Department of Urology, Baylor College of Medicine; 4Dan L Duncan Cancer Center, Baylor College of Medicine; 5Bioinformatics & Computational Biology, The University of Texas MD Anderson Cancer Center; 6The University of North Carolina Lineberger Comprehensive Cancer Center; 7The University of North Carolina at Chapel Hill; 8Human Genome Sequencing Center at Baylor College of Medicine; 9Van Andel Institute; 10Memorial Sloan Kettering Cancer Center; 11Cleveland Clinic; 12National Institute of Environmental Health Sciences; 13 Biomolecular Engineering, University of California, Santa Cruz; 14British Columbia Cancer Agency; 15Harvard Medical School; 16Systems Biology, The University of Texas MD Anderson Cancer Center The TCGA program at the Broad Institute has completed whole exome sequencing on 412 bladder cancer samples, the largest sequencing project in bladder cancer to date and more than three times the number of samples in the marker paper (Nature 2014). Sequence data was analyzed using standard computational pipelines (Firehose) at Broad, which identified a median mutation rate of 8.0 per Mb, third highest among all TCGA cancers analyzed, after melanoma and squamous cell lung cancer. 
MutSig2CV analysis on the 412 bladder samples identified 54 significantly mutated genes (SMGs) that included previously known drivers: TP53 (mutated in 49%), MLL2 (29%), KDM6A (26%), ARID1A (25%), PIK3CA (22%), MLL3 (19%), RB1 (17%), EP300 (15%), FGFR3 (14%), STAG2 (14%), CREBBP (12%), ELF3 (12%), ERBB2 (12%), MLL (11%), ERBB3 (11%), ERCC2 (9%), CDKN1A (9%), TSC1 (8%), FBXW7 (8%), ZFP36L1 (7%), CDKN2A (7%), RHOB (6%), NFE2L2 (6%), RXRA (6%), KLF5 (6%), PSIP1 (5%), HRAS (5%), RHOA (5%), KRAS (4%), and PTEN (3%). In addition to these known bladder cancer genes, we identified novel SMGs with mutation frequencies as low as 2%: ATM (14%), FAT1 (12%), SPTAN1 (12%), ASXL2 (10%), PARD3 (6%), FAM47C (5%), RBM10 (5%), CUL1 (5%), RPTN (5%), ACTB (5%), C3orf70 (4%), METTL3 (4%), CASP8 (4%), MBD1 (3%), GNA13 (3%), TAF11 (2%), NUP93 (2%), and SPN (2%). The frequency of driver mutations showed significant differences in comparison to the marker paper set of 130 samples. KRAS (0% to 4%, 14 mutations out of 17 at known hotspots - G12, A59, Q61), ERBB2 (7% to 12%), RB1 (14% to 17%), and ELF3 (9% to 12%) all showed a significant increase in mutation frequency in the 412 set, while mutations in CDKN1A (14% to 9%), ERCC2 (12% to 9%), FBXW7 (11% to 8%), NFE2L2 (9% to 6%), and RXRA (9% to 6%) were significantly less frequent. These differences may be due in part to inclusion of samples with a higher proportion of variant squamous cell histology, which had been excluded from the marker paper analysis. As reported in the marker paper, APOBEC mutagenesis was pervasive across all samples. 43% of single nucleotide variants (SNVs) were associated with known APOBEC motifs, i.e. C>T/G at TCW (W=A/T) mutation contexts, and there was a strong correlation between the number of APOBEC mutations and overall mutation burden across samples (Pearson r = 0.71 and p < 10^-16).
Eight of the 20 most frequently mutated SMGs were epigenetic modifiers: histone demethylase (KDM6A), histone methylases (MLL, MLL2, MLL3), histone acetylases (CREBBP, EP300), and chromatin modifying enzymes (ARID1A, ASXL2). Unsupervised pairwise exclusivity and co-occurrence analyses revealed that TP53 mutations were mostly exclusive with FGFR3 mutations (p < 0.0001 and q < 0.1, one-sided Fisher exact test), and co-occurred with RB1 mutations (p < 10^-7 and q < 10^-5). In addition, mutations in FGFR3, KDM6A, and STAG2, all more frequently mutated in non-muscle invasive bladder cancers, tended to co-occur in these samples (p < 0.0001 and q < 0.1 for all pairs). These observations suggest that there are two distinct pathogenic pathways for development of muscle invasive bladder cancer, one in which muscle invasive cancer develops from non-muscle invasive papillary bladder cancer, and a second in which muscle invasive cancer develops de novo or from carcinoma in situ lesions.
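The APOBEC motif quoted in this abstract, C>T or C>G at TCW (W = A or T) with the mutated cytosine in the middle position, can be checked per variant as sketched below. The function names and the input convention (a reference trinucleotide centered on the mutated base) are assumptions for this illustration. Variants reported on the opposite strand (G>A or G>C at WGA) are folded onto the C strand by reverse complement.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_apobec_context(trinuc, ref, alt):
    """True if the SNV is C>T or C>G within a TCW context (W = A or T).

    `trinuc` is the reference trinucleotide centered on the mutated base.
    """
    trinuc, ref, alt = trinuc.upper(), ref.upper(), alt.upper()
    if len(trinuc) != 3 or trinuc[1] != ref:
        raise ValueError("trinuc must be 3 bases centered on the reference base")
    if ref == "G":
        # Express the variant on the strand where the mutated base is C.
        trinuc, ref, alt = revcomp(trinuc), "C", alt.translate(COMPLEMENT)
    return ref == "C" and alt in ("T", "G") and trinuc in ("TCA", "TCT")


def apobec_fraction(snvs):
    """Fraction of (trinuc, ref, alt) SNVs that fall in the APOBEC context."""
    snvs = list(snvs)
    if not snvs:
        return 0.0
    return sum(is_apobec_context(*s) for s in snvs) / len(snvs)


# Hypothetical variants for illustration.
example = [("TCA", "C", "T"), ("TGA", "G", "C"), ("ACG", "C", "T"), ("TCT", "C", "A")]
print(apobec_fraction(example))
```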


3. Candidate Ovarian Cancer Oncogenes in the Recurrently Amplified Chr8q24.3 Region

R. Sinha, J. Martignetti, P. Dottino, B. Evans, E.E. Schadt, B.A. Reva

Icahn School of Medicine at Mount Sinai, New York, New York Precision medicine implies the ability to accurately, robustly and efficiently classify a tumor's driver alterations and pathways, its aggressiveness, and its sensitivity and resistance to chemotherapeutics. Currently, despite the large amount of cancer genome sequencing data being produced by The Cancer Genome Atlas project (TCGA), no publicly available protocol for doing so exists. Thus, there is a clear need for stringent computational approaches and tools that are available to the entire research community for independent use and validation. One of the major challenges in realizing the promise of precision medicine for ovarian cancer is the identification of new therapeutically targetable driver genes. A particularly promising approach is highlighting genomic regions of recurrent copy number amplification and identifying genes whose overexpression is driven by copy number amplification. These are putative drivers, whose potential oncogenic function must be evaluated by wet lab experiments. As an example of this approach, we investigated the recurrently amplified genomic region 8q24.3, which was identified by the TCGA study of high-grade serous ovarian carcinoma. While this potentially important region contains 80 genes, none of them were nominated by TCGA as drivers/oncogenes. We performed a deeper analysis of this region, utilizing TCGA copy number alteration and gene expression data. To identify genes whose expression was significantly elevated due to copy number amplification, we compared the gene expression of samples in which the gene was amplified to samples in which it was normal (diploid).
We found that GRINA (Glutamate Receptor, Ionotropic, N-Methyl D-Aspartate-Associated Protein), PTK2 (Protein tyrosine kinase 2, also known as Focal Adhesion Kinase), MAF1, COMMD5 (COMM domain containing 5) and HSF1 (heat shock transcription factor 1) displayed high level copy number amplifications in ~25% of samples in the TCGA ovarian dataset. Moreover, these five genes are the top ranking genes on 8q24.3 whose expression is elevated in copy number amplified samples to a statistically significant extent. Based on their known functions, four of these five genes represent exciting novel ovarian cancer candidate oncogenes whereas the other is an already known oncogene. GRINA (also known as Protein lifeguard 1) is a compelling novel candidate oncogene due to its role as a down-regulator of apoptosis. MAF1 is an element of the mTORC1 signaling pathway. COMMD5 down-regulates activation of NF-κB. HSF1 is involved in cell cycle control. High expression levels of PTK2 have already been linked to tumor progression, invasion, and worse outcomes in several cancers, including glioblastoma, lung cancer, head and neck cancer, breast cancer, and ovarian cancer. As such, it is an emerging drug target in ovarian cancer. Taken together, our results support the hypothesis that the recurrently amplified genes on 8q24.3 with significantly elevated expression play a key role in ovarian cancer, and justify further experimental study of significantly overexpressed genes of the 8q24.3 locus.
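The comparison described in this abstract, expression in samples where a gene is amplified versus samples where it is diploid, can be sketched as a one-sided two-group test. The abstract does not name the statistical test used, so the permutation test below is a stand-in chosen for this illustration; the function name and the expression values are likewise hypothetical.

```python
import random
from statistics import mean


def permutation_test_greater(amplified, diploid, n_perm=10_000, seed=0):
    """One-sided permutation test that mean(amplified) exceeds mean(diploid).

    Returns (observed difference in means, permutation p-value).
    """
    rng = random.Random(seed)
    observed = mean(amplified) - mean(diploid)
    pooled = list(amplified) + list(diploid)
    k = len(amplified)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Relabel samples at random and recompute the difference in means.
        if mean(pooled[:k]) - mean(pooled[k:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical log-scale expression values for a single gene.
amplified = [9.1, 8.7, 9.4, 8.9, 9.0]
diploid = [7.0, 7.2, 6.8, 7.1, 6.9, 7.3]
diff, p_value = permutation_test_greater(amplified, diploid, n_perm=2_000)
print(f"difference in means = {diff:.2f}, p = {p_value:.4f}")
```

Applied gene by gene across a region such as 8q24.3, the resulting p-values would need correction for multiple testing before ranking candidates.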


Analytical Tools and Methods


5. Quality Management System/Information Management System – TCGA and Beyond

Laura Aume

The TCGA Quality Management System/Information Management System (QMS/IMS) has grown and evolved over the past 5 years. As the TCGA enters its next stage, it is a natural progression to apply what has been learned during the process of creating a QMS for TCGA to other Center for Cancer Genomics (CCG) programs such as Exceptional Drug Responders, ALCHEMIST, and the Cancer Drivers Discovery Project. The expanded version of the QMS/IMS features separate portals for each program and, in some cases, will allow users to view information across programs. The QMS/IMS currently includes interactive workflow diagrams, a document repository for SOPs, nonconformance event reporting, training modules, quality indicator reporting with drill-down capabilities, an interface for the TCGA components to report on their progress toward their current TCGA Goals and Objectives, and automated monthly progress reports. The QMS/IMS also features some new capabilities, such as an aliquot accounting capability that allows TCGA personnel to track the status of aliquots that have been shipped by the BCR, and a data user report that enables TCGA to see the types of data that are being downloaded by disease and geographic location.


1. MMiRNA-Tar: A Correlation and Target Prediction Analysis Tool for microRNA and mRNA Expression Datasets

Yongsheng Bai

Indiana State University, Terre Haute, Indiana

We developed a web interface tool, MMiRNA-Tar (http://bioinf1.indstate.edu/MMiRNA-Tar), that can calculate and plot the correlation of expression for mRNA-microRNA pairs across samples or through a time course for a list of pairs under different target prediction confidence cutoff criteria. We have tested our MMiRNA-Tar tool using The Cancer Genome Atlas (TCGA) Bladder Urothelial Carcinoma (BLCA) datasets and identified many microRNAs which were correlated to mRNAs of several previously reported bladder cancer risk genes. Our results also showed that one gene could be targeted by multiple microRNAs in bladder cancer datasets from TCGA, and vice versa. Through analyzing BLCA datasets, we have identified additional genes targeted by these microRNAs.

We think MMiRNA-Tar provides researchers with a convenient tool to visualize the correlation between microRNAs and mRNAs and to predict their targeting relationships. We believe that correlating expression profiles for microRNAs and mRNAs offers a complementary approach for elucidating these relationships.
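As a rough illustration of the kind of calculation described here, the sketch below correlates expression across samples for predicted microRNA-mRNA pairs that pass a confidence cutoff. It is not MMiRNA-Tar's code: the use of Pearson correlation, the function names, the confidence scale, and the toy values are all assumptions made for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlate_pairs(mirna_expr, mrna_expr, predicted_pairs, min_confidence):
    """Correlate expression for predicted pairs above a confidence cutoff.

    mirna_expr / mrna_expr: name -> expression values across the same samples
    predicted_pairs: iterable of (mirna, mrna, confidence) from a target
    prediction tool (hypothetical format)
    Returns (mirna, mrna, r) tuples, most negative correlation first, since
    a microRNA that represses its target is expected to be anti-correlated.
    """
    kept = [
        (mirna, mrna, pearson(mirna_expr[mirna], mrna_expr[mrna]))
        for mirna, mrna, confidence in predicted_pairs
        if confidence >= min_confidence
    ]
    return sorted(kept, key=lambda t: t[2])

# Hypothetical toy data across four samples.
mirna_expr = {"miR-X": [1.0, 2.0, 3.0, 4.0]}
mrna_expr = {"GENE_1": [8.0, 6.0, 4.0, 2.0], "GENE_2": [1.0, 3.0, 2.0, 5.0]}
pairs = [("miR-X", "GENE_1", 0.9), ("miR-X", "GENE_2", 0.4)]
hits = correlate_pairs(mirna_expr, mrna_expr, pairs, min_confidence=0.5)
```

Raising or lowering `min_confidence` corresponds to the "different target prediction confidence cutoff criteria" mentioned in the abstract.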


7. The ICGC-TCGA DREAM Somatic Mutation Calling Challenge: Current Status and Future Directions

Paul C. Boutros1, Adam D. Ewing2, Kathleen E. Houlahan1, Kyle Ellrott2, Yin Hu3, Anna Y.W. Lee1, Minjeong Ko1, Amit Deshwar4, Takafumi N. Yamaguchi1, J. Christopher Bare3, Kristen Deng3, Cristian Caloian1, Christine P’ng1, Daryl Waggott1, Veronica Y. Sabelnykova1, ICGC-TCGA DREAM Somatic Mutation Calling Challenge Participants, Michael R. Kellen3, Paul Spellman5, Thea C. Norman3, David Haussler2, Stephen H. Friend3, Justin Guinney3, Peter Van Loo6, David Wedge6, Quaid D. Morris4, Gustavo Stolovitzky7, Adam A. Margolin5, Joshua M. Stuart2

1Ontario Institute for Cancer Research, Toronto, Ontario, Canada; 2University of California, Santa Cruz, Santa Cruz, California; 3Sage Bionetworks, Seattle, Washington; 4University of Toronto, Toronto, Ontario, Canada; 5Oregon Health & Science University, Portland, Oregon; 6Sanger Research Institute, Hinxton, United Kingdom; 7IBM Computational Biology Centre, New York The detection of somatic mutations from cancer genome sequences is a major bottleneck to the routine implementation of clinical-sequencing and to the discovery of mutations associated with patient survival and response to therapy. Benchmarking somatic mutation detection algorithms is complicated by the lack of gold standards, extensive resource requirements and difficulties in sharing personal genomic information. To resolve these issues, we launched the ICGC-TCGA DREAM Somatic Mutation Calling Challenge -- a crowd-sourced benchmark of somatic mutation detection algorithms. We report here:

detailed analysis of single nucleotide and structural variants in simulated genomes

analysis of SNVs in real tumour genomes (including a detailed comparison of aligner effects)

design characteristics of a cloud-based challenge studying tumoural heterogeneity

design of an RNA-Seq data-analysis challenge

The results of our crowd-sourced challenge on somatic variant detection from whole-genome sequencing are striking. Across 14 tumours, the WGS analysis Challenge received over 3,200 separate submissions. We show distinctive error profiles for different mutation calling algorithms and for different tumours. An exhaustive comparison of aligners demonstrates that most algorithms achieve the highest accuracy with recent versions of the bwa-mem aligner, and that there is a consistent, but moderate, improvement from the use of GATK-based indel-realignment and variant base-quality recalibration. We also demonstrate clear synergy between aligners and variant-callers, with specific pairs of tools working best together. Our structural variant analysis highlights the challenges in scoring these variants, and provides hints of the specific genomic features associated with errors. Further, we show the efficacy of ensemble pipeline methods in providing stable improvements over the best individual algorithms. Finally, we discuss the similarities between simulated and real tumour genomes, and the differences in error profiles resulting from each, along with our efforts to further enhance the state of the art in tumour genome simulation, before reporting on the design and progress of two new challenges aimed at evaluating methods for assessing somatic sub-clonal heterogeneity and RNA-Seq data-analysis.
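To make the idea of benchmarking and ensembling concrete, the sketch below scores call sets against a known truth set and combines callers by simple voting. The abstract does not say how the Challenge's ensembles were built or scored, so majority voting and the precision/recall/F1 metrics are assumptions, and the variants are contrived toy data chosen so that the ensemble wins.

```python
from collections import Counter

def score(calls, truth):
    """Return (precision, recall, F1) of a call set against a truth set."""
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

def majority_vote(call_sets, min_votes):
    """Keep variants reported by at least `min_votes` of the callers."""
    votes = Counter(variant for calls in call_sets for variant in calls)
    return {variant for variant, n in votes.items() if n >= min_votes}

# Variants are (chromosome, position, ref, alt) tuples; all values are toy data.
t1, t2, t3, t4 = [("chr1", 100 * i, "A", "T") for i in range(1, 5)]
f1_, f2_, f3_ = [("chr2", 100 * i, "G", "C") for i in range(1, 4)]
truth = {t1, t2, t3, t4}

caller_a = {t1, t2, t3, f1_}   # each caller misses one true variant
caller_b = {t1, t2, t4, f2_}   # and reports one caller-specific false positive
caller_c = {t1, t3, t4, f3_}

ensemble = majority_vote([caller_a, caller_b, caller_c], min_votes=2)
```

Here each individual caller reaches an F1 of 0.75, while the two-of-three vote recovers the truth set exactly because the false positives are not shared between callers. Real error profiles are correlated across tools, so the gain in practice is smaller.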


8. Next-Generation Clustered Heat Maps (NG-CHMs) for Interactive Exploration of Patterns in TCGA Data

Bradley M. Broom1, Michael Ryan1,2, Chris Wakefield1, David Kane3, Rehan Akbani1, John N. Weinstein1

1The University of Texas MD Anderson Cancer Center’s TCGA Genome Data Analysis Center, Houston, Texas; 2In Silico Solutions, Fairfax, Virginia; 3Santeon, Inc., Fairfax, Virginia

Clustered heat maps (CHMs), which we introduced into ‘omic’ biology (1,2) in the mid-1990s, are the ubiquitous way to visualize molecular profile data (3-8). But they have generally been static images (9). What we wanted for fluent exploration of TCGA and other omic datasets was a dynamically interactive heat map environment – one in which the user could zoom and navigate without loss of resolution, link out to various sources of information, query the statistics behind each pixel, re-color on the fly, link to interactive chromosomal ideograms, compute pathway and Gene Ontology relationships, produce high-resolution graphics for publication, and store all metadata necessary to reproduce the map months or years later. Accordingly, we have used a tiling technology like the ones that power popular map sites on the web to produce heat maps with those capabilities. A living compendium of hundreds of NG-CHMs for the TCGA project can be explored at http://bioinformatics.mdanderson.org/TCGA/NGCHMPortal/. For testing purposes, an interactive web interface for building NG-CHMs using either example data or your own data is available at http://bioinformatics.mdanderson.org/testchm/. An implementation of the NG-CHM system based on Docker can also be downloaded for use on your own systems, including virtual machines hosted by commercial cloud providers. This downloadable NG-CHM system includes an intuitive web interface for building your own NG-CHMs, but it can also be used in conjunction with a high-level R library for constructing full-featured NG-CHMs such as those in our TCGA NG-CHM Compendium.
The NG-CHM project page available at http://bioinformatics.mdanderson.org/main/NG-CHM:Overview includes links to a detailed user guide, introductory videos, tutorials, the downloadable NG-CHM system, the NG-CHM R library, example scripts, and other support materials.

References

1. Weinstein JN … Paull KD. Stem Cells 12; 13, 1994.
2. Weinstein JN … Paull KD. Science 275; 343, 1997.
3. Myers T … Weinstein JN. Electrophoresis 18; 467, 1997.
4. Eisen MB … Botstein D. Proc. Natl. Acad. Sci. U.S.A. 14863, 1998.
5. Golub TR … Lander ES. Science 286; 531, 1999.
6. Ross DT … Brown PA. Nature Genetics 24; 227, 2000.
7. Scherf U … Weinstein JN. Nature Genetics 24; 236, 2000.
8. Zeeberg BR … Weinstein JN. BMC Bioinformatics 6; 168, 2005.
9. Weinstein JN. Science 319; 1772, 2008.

This work was supported in part by NCI Grant No. U24CA143883 (TCGA Genome Data Analysis Center), by the Mary K. Chapman Foundation, by the Michael and Susan Dell Foundation honoring Lorraine Dell, and by the NCI Cancer Center Support Grant to UT MD Anderson Cancer Center.


9. Stacked Predictive Sparse Coding for Classification of Distinct Regions in Tumor Histopathology

Hang Chang, Yin Zhou, Paul Spellman, Bahram Parvin

Life Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California

Image-based classification of histology sections, in terms of distinct components (e.g., tumor, stroma, normal), provides a series of indices for tumor composition. Furthermore, aggregation of these indices, from each whole slide image (WSI) in a large cohort, can provide predictive models of the clinical outcome. However, performance of the existing techniques is hindered by the large technical variations and biological heterogeneities that are always present in a large cohort. In this presentation, we propose a system that automatically learns a series of basis functions for representing the underlying spatial distribution using stacked predictive sparse decomposition (PSD). The learned representation is then fed into the spatial pyramid matching framework (SPM) with a linear SVM classifier. The system has been evaluated for classification of (a) distinct histological components for two cohorts of tumor types, and (b) colony organization of normal and malignant cell lines in 3D cell culture models. Throughput has been increased through the use of graphics processing units (GPUs), and evaluation indicates superior performance compared with previous research.


10. Insights Into Somatic Mutation-Driven Cancer Genome Evolution: A Study of 3,000 Cancer Genomes Across 9 Cancer Types

Feixiong Cheng1, Zhongming Zhao1,2,3

1Department of Biomedical Informatics, Vanderbilt University School of Medicine, Nashville, Tennessee; 2Center for Quantitative Sciences, Vanderbilt University Medical Center, Nashville, Tennessee; 3Department of Cancer Biology, Vanderbilt University School of Medicine, Nashville, Tennessee

Cancer development and progression result from somatic evolution through the accumulation of genomic alterations, including point mutations, deletions, gene fusions, gene amplifications, and chromosomal rearrangements. The effects of those alterations on the fitness of somatic cells lead to evolutionary adaptations such as increased cell proliferation, angiogenesis, and altered anticancer drug responses. Additionally, genomic instabilities, such as chromosomal instability and microsatellite instability, have been recognized as a hallmark of cancer for several decades. However, there are few general mathematical models to quantitatively examine how perturbations of a single gene shape adaptive evolution of the cancer genome. Moreover, distinguishing cancer genomic instabilities from massive genetic polymorphisms and non-genetic events is a major challenge in cancer research. Massive genomic alterations present researchers with a dilemma: does this genomic instability contribute to cancer, or is it simply a byproduct of cellular processes gone awry? Thus, quantifying whether the perturbation of any single gene in a cancer genome is sufficient to shape adaptive cancer genome evolution would help us better understand the fitness of somatic cells through genomic alterations. In this study, we developed the gene gravity model to study the evolution of cancer genomes by incorporating the transcriptional and somatic mutation profiles of ~3,000 tumors across 9 cancer types from The Cancer Genome Atlas (TCGA) into a broad gene network.
We found that mutations of a cancer driver gene tend to uniquely cause cancer genome instability and shape adaptive cancer genome evolution by inducing mutations in other genes. Importantly, this functional consequence is often generated by the combined effect of genetic and epigenetic (e.g., chromatin regulation) alterations. In addition to the above fundamental findings, we identified six new cancer genes (AHNAK, COL11A1, DDX3X, FAT4, STAG2, and SYNE1), each of which significantly increased cancer genome mutation rates. Finally, we provided statistical evidence that aneuploidy is a common genetic mark of cancer, due to a higher risk of genomic instability uniquely induced by cancer driver genes on the X chromosome in comparison to those on autosomes. In summary, we presented novel insights on the genomic instability that propels adaptive cancer genome evolution. Moreover, we provided, for the first time, statistical evidence that aneuploidy is a common genetic factor of cancer. We believe this work helps to illustrate the functional consequences and evolutionary characteristics of somatic mutations during tumorigenesis by driving adaptive cancer genome evolution.


11. Gene Expression Signatures of Pathway Activity Derived Using TCGA Data and Known Biology

Patrick Danaher

NanoString Technologies Gene expression signatures measuring the activity of biological pathways promise greater analytic stability, biological interpretability and association with clinical outcomes than single genes. We propose a method to harness the abundance of data and biological knowledge that characterize modern genomics to train pathway signatures with improved biological interpretability and clinical relevance. Pathway signatures are typically derived using unsupervised methods, as gold standard measurements of pathway activity are rare. For example, a pathway’s data is often summarized using its first principal component. However, if pathway activity is not the most prominent signal in the data, unsupervised methods like PCA will train signatures targeting the wrong aspect of pathway biology, making their interpretation uncertain. We take a “biologically supervised” approach, using known biology to inform unsupervised methods. The link to biology buttresses the case for specific interpretations of our pathway signatures, for example the claim that our Wnt signaling signature captures the Wnt pathway genes’ responses to pathway activity rather than some other factor. Method: Varying pathway activity can be reasonably expected to drive substantial variability in pathway gene expression. In addition, other factors may exert similarly broad influence on pathway gene expression. These few important factors will combine to drive the leading eigenvectors of the pathway’s expression data. While the effects of pathway activity may not correspond to the first eigenvector, they will be largely contained in the subspace of these leading eigenvectors. Our method uses previously known biology to find the desired signal within that subspace. Specifically, the scientist uses a literature search, a KEGG pathway diagram or earlier experimental results to define an initial guess at a pathway signature – a vector of weights for each gene. 
This initial, highly inaccurate guess is then projected onto the subspace of the leading eigenvectors to generate the final signature. If the model holds and the initial guess is near to the subspace, then the signature will both be consistent with known biology and lie in the subspace where we expect the truth resides. Application: For a wide variety of cancer-related pathways, we use literature searches to define initial guesses at pathway signatures. We use 24 TCGA RNASeq datasets to update these guesses into final pathway activity signatures. These signatures define subtypes within cancers, predict survival in TCGA datasets and predict cell lines’ responses to a large panel of drugs.
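The projection step in the Method paragraph can be sketched with NumPy. This is a minimal illustration under stated assumptions rather than the author's implementation: the leading eigenvectors are taken from an SVD of the gene-centered expression matrix, the number of retained eigenvectors is a user-chosen parameter, the result is scaled to unit length, and the simulated data and loadings are hypothetical.

```python
import numpy as np

def pathway_signature(expr, initial_guess, n_components):
    """Project an initial signature guess onto the leading eigenvectors.

    expr: samples x genes expression matrix for the pathway's genes
    initial_guess: one weight per gene, e.g. from literature or a diagram
    n_components: how many leading eigenvectors span the subspace (assumed)
    """
    expr = np.asarray(expr, dtype=float)
    guess = np.asarray(initial_guess, dtype=float)
    centered = expr - expr.mean(axis=0)
    # Rows of vt are eigenvectors of the gene-gene covariance matrix,
    # ordered by decreasing variance explained.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components]                # orthonormal rows
    projected = basis.T @ (basis @ guess)    # orthogonal projection
    return projected / np.linalg.norm(projected)

# Simulated example: two latent factors drive six genes.
rng = np.random.default_rng(0)
activity = rng.normal(size=100)              # pathway activity per sample
other = rng.normal(size=100)                 # an unrelated broad factor
loadings = np.array([1.0, 0.8, 0.6, 0.0, 0.0, 0.0])
other_loadings = np.array([0.5, -0.5, 0.5, 1.5, 1.5, 1.5])
expr = (np.outer(activity, loadings) + np.outer(other, other_loadings)
        + 0.1 * rng.normal(size=(100, 6)))

rough_guess = np.array([1.0, 1.0, 1.0, 0.0, 0.0, 0.0])  # "these genes go up"
signature = pathway_signature(expr, rough_guess, n_components=2)
```

Because the signature lies inside the chosen subspace, projecting it again leaves it unchanged, which is a convenient sanity check on the implementation.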


12. Petabyte-Scale Cancer Genomics in The Cloud

Brandi Davis-Dusenbery, Zeynep Onder, Devin Locke, Deniz Kural

The advent of next generation sequencing has transformed our ability to generate genomic data. Today, cancer researchers have access to petabytes of multi-dimensional information from thousands of patients. However, analysis of this information only becomes more challenging as the amount of data continues to increase. This difficulty is exemplified when we consider data generated by the efforts of The Cancer Genome Atlas (TCGA) network. Simply downloading the complete TCGA repository would require several weeks with a highly optimized network connection. Once downloaded, integrated analysis of this data remains out of reach for any researcher without access to the largest institutional compute clusters. The Cancer Genomics Cloud (CGC) Pilots project seeks to directly address these challenges by co-localizing data with the computational resources to analyze it. The project was born out of the recognition that as the biological research enterprise grows increasingly computationally intensive, new approaches are required to support effective data discovery, storage, computation, and collaboration. As one of three pilots, our platform will enable researchers to securely leverage the power of cloud computing to gain biologically relevant and actionable insights from massive public datasets including TCGA. Reproducible analysis of public (including controlled and open-access TCGA data) and private data can be performed using both application programming and graphical user interfaces. Additionally, a robust software development kit (SDK) utilizes Docker containers and the Common Workflow Language to enable tool developers to readily deploy tools and pipelines on the cloud.
This presentation will highlight the functionalities of the platform, as well as our approach to optimized computation, data mining, and visualization solutions that will enable cancer researchers to address the challenges associated with analysis of petabyte-scale datasets and beyond. The CGC will be released to the community for evaluation and feedback by the end of 2015; interested researchers can sign up at cancergenomicscloud.org


13. Improving Search, Access, and Analysis of Cancer Genomic Data From Multiple Collections of TCGA and ICGC Protected Data

Francisco M. De La Vega1, Michael Ainsworth2, Dai-Ying Wu1, Tal Shmaya1, James Wiley2, Akshay Patel2, Raja Hayek2

Annai Systems, Inc., 1Burlingame, CA, 2Carlsbad, CA

Finding and accessing the ever-growing volumes of cancer genomic data generated by an ever-growing number of large international projects can be difficult for researchers and clinicians. Even after available data is identified and access is approved, the tasks of extracting and running analyses on the data can overwhelm available IT resources such as compute power, storage and networks. Here, we show a set of tools to improve search, access and analysis of genomic data across two large collections of protected cancer genomics data distributed across seven data centers around the world – The Cancer Genome Atlas (TCGA) and the ICGC Pan-Cancer Analysis of Whole Genomes (PCAWG) project. TCGA has been generating data for nearly 9 years and has created an enormous collection of “omics” data on about 10,000 patients. On the other hand, the International Cancer Genome Consortium today has data on over 3,600 additional patients and has the ambitious goal of more than doubling the sample size of TCGA. The datasets generated by each of these consortia are available to the research community, but subject to access restrictions for the raw data, and reside in disparate environments, making query and analysis of this data difficult. The PCAWG project encompasses both TCGA and ICGC studies and comprises whole-genome data from over 2,200 patients stored in six data centers around the world. We show how administrators, scientists and clinicians are empowered by shortcuts to controlling, finding, accessing and analyzing genomic data through the use of the tools we developed.
For administrators charged with making protected data from complex projects like PCAWG available, Annai-GNOS provides reliably secure genomic data transfer via Annai’s GeneTorrent protocol, data synchronization across multiple data centers, and enforcement of metadata standards and access controls across multiple data centers. GNOS enables the project leaders to make protected data securely available only to approved researchers around the world. For the scientist, the GNOS metadata search interface covers all six data centers in the PCAWG project, eliminating the need to search each data center separately. GeneTorrent allows approved researchers to achieve fast, reliable downloads regardless of data location. In addition, we provide GTFuse, a tool that allows approved researchers to remotely mount raw genomic data files and easily apply commonly used tools, such as SAMtools, to manipulate and slice those files, transporting only the necessary data blocks securely via GeneTorrent and saving significant time and storage resources. Finally, we describe our new ShareSeq Genomic Resource that brings compute to the data. This allows researchers to combine and analyze all the data from TCGA and ICGC (data of the latter project being resident in the system) in an expedient and cost-effective manner.


14. Creating Portable Workflows for TCGA Multicenter Calling

Kyle Ellrott, David Haussler, Josh Stuart

The MC3 (Multi-Center Mutation Calling Multi-cancer Completion) effort is a project to apply a set of variant calling tools from multiple research institutes consistently across all of the TCGA DNA exome samples. These uniformly produced variant predictions will provide a rich dataset across 33 tumor types for the PanCan Atlas project. This effort includes variant callers from the University of California Santa Cruz, Washington University in St. Louis, the Broad, Baylor and MD Anderson. The project comprises two major components. The first is the production of a set of BAMs with uniform GATK preparation steps, including indel realignment and Base Quality Score Recalibration (BQSR). This dataset includes 14,000 BAMs (~208 TB) originally produced by the Broad Institute since 2011, another 6,000 BAMs (~115 TB) that have been re-processed by the Broad, and another 7,000 left to be processed. In the second component, once a uniform set of BAMs has been produced, tools including Radia, MuSE, Varscan, Somatic Sniper and Pindel will be applied to produce a consistent set of variant calls. These tools have been adapted to run using a containerization technology called Docker, so issues related to installation and package dependencies are taken care of. The interface from the tools to the workflow system uses the Galaxy tool wrapper syntax. This technology is being used to run the exact same pipelines at multiple institutions for both the MC3 and ICGC/TCGA PCAWG variant calling projects. Because this work is based on containerization technology and public standards, these pipelines can quickly be adapted for new environments and uses. They can be deployed on cloud-based VMs or bare-metal clusters. The exact code installation used for analysis can be archived and can easily be re-deployed for re-running analysis at a later date.
The work related to this project is being made available as open source, so other institutions will be able to deploy similar analyses on their own data.


15. Reconstructing Hi-C Data Using Long-Range Correlations in Epigenetic Data

J.P. Fortin1, K.D. Hansen1,2

1Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland; 2McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins School of Medicine, Baltimore, Maryland

A Hi-C experiment produces a genome-wide contact matrix whose entries estimate how often two distinct loci interact with each other. Analysis of Hi-C contact matrices has shown that at a gross scale, the genome can be divided into two compartments – closed and open – and that this compartmentalization is cell-type specific. Recent work has shown that 36% of these compartments change during stem cell differentiation. Here we show that genome compartments can be reliably estimated using DNA methylation data from the Illumina 450k platform, an inexpensive and popular methylation microarray. To do so, we show that the long-range correlations of methylation levels are substantially higher for two loci that belong to the "closed" compartment ("closed-closed" interaction) than for the two other types of interactions ("open-open" and "open-closed" interactions). By applying principal component analysis to the methylation correlation matrix, we can estimate where the "closed-closed" interactions occur and thereby obtain the genome compartmentalization at a 100 kb resolution. We show that we are able to recover differences between cell types. Using the TCGA tumor samples assayed on the 450k platform, we predict and examine the genome compartments in more than 10 cancer types.
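The approach outlined in this abstract (bin the probes, correlate bins across samples, apply principal component analysis) can be sketched as follows. This is a simplified illustration, not the authors' published procedure: averaging probes within fixed 100 kb bins, splitting bins by the sign of the first principal component score, and labelling the more strongly inter-correlated group as "closed" are assumptions made for the example, and the data are simulated.

```python
import numpy as np

def bin_methylation(positions, beta, bin_size=100_000):
    """Average probe-level methylation (probes x samples) within fixed bins."""
    beta = np.asarray(beta, dtype=float)
    bins = np.asarray(positions) // bin_size
    labels = np.unique(bins)
    binned = np.vstack([beta[bins == b].mean(axis=0) for b in labels])
    return labels, binned

def estimate_compartments(binned):
    """Label each bin 'closed' or 'open' from a bins x samples matrix."""
    corr = np.corrcoef(binned)               # bin-by-bin correlation
    centered = corr - corr.mean(axis=0)      # PCA centers each column
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    positive = u[:, 0] * s[0] > 0            # sign of first PC score

    def mean_within(mask):
        # Mean off-diagonal correlation among the bins selected by mask.
        n = int(mask.sum())
        if n < 2:
            return -np.inf
        sub = corr[np.ix_(mask, mask)]
        return (sub.sum() - n) / (n * (n - 1))

    # The sign of a principal component is arbitrary, so orient it using the
    # abstract's observation that closed-closed correlations are the highest.
    closed = positive if mean_within(positive) >= mean_within(~positive) else ~positive
    return np.where(closed, "closed", "open")

# Simulated example: 6 bins share a common signal across 200 samples
# (mimicking "closed"), 6 bins vary independently (mimicking "open").
rng = np.random.default_rng(1)
shared = rng.normal(size=200)
closed_bins = shared + 0.1 * rng.normal(size=(6, 200))
open_bins = rng.normal(size=(6, 200))
labels = estimate_compartments(np.vstack([closed_bins, open_bins]))
```

A production analysis would work per chromosome and would likely smooth the component along the genome. Neither step is shown here.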


16. Scalable Visualization of Billions of Histological Objects Using the Cancer Digital Slide Archive

M. Nalisnik1, C. Vaughn1, W.D. Dunn5, L.A.D. Cooper1,3,4, D.A. Gutman3,5

1Departments of Biomedical Informatics, 2Pathology and Laboratory Medicine, 3Winship Cancer Institute, 4Biomedical Engineering, 5Neurology, Emory University/Georgia Institute of Technology, Atlanta, Georgia

Introduction: The Cancer Genome Atlas (TCGA) contains a large collection of whole-slide images (WSIs) of tumor histological slides. These data are a rich source of information, and when integrated with genomic and clinical data, they enable insight into important issues like tumor microenvironment, heterogeneity, and the subclassification of disease. Advances in image analysis technology enable these data to be mined to extract quantitative morphologic features describing microanatomy, such as cell nuclei and blood vessels, and to correlate these data with clinical outcomes and various genomic descriptors. We developed the Cancer Digital Slide Archive (CDSA) (http://cancer.digitalslidearchive.net/) to facilitate better integration of WSIs into integrated cancer studies by enabling browser-based visualization of whole-slide images. We recently extended this resource to enable users to visualize the boundaries of billions of objects generated by image analysis. This proof-of-concept demonstration unlocks the potential to rapidly review the output of segmentation algorithms.

Methods: We applied a nuclear segmentation algorithm to 6,000 images of hematoxylin and eosin (H&E) permanent sections from the GBM, LGG, LUAD, LUSC, PRAD, and SKCM cohorts. A total of 4 billion objects were delineated in these images and stored in a MongoDB database for rapid retrieval and visualization. To facilitate visualization of these boundaries, we enhanced the CDSA to generate scalable vector graphics in real time, illustrating nuclear boundaries with color rendering in the user’s field of view.
To make this functionality responsive, we take advantage of spatial locality by loading objects in adjacent fields in anticipation of panning events.

Results: We have demonstrated the feasibility of analysis, loading, and visualization of segmentation results produced on 6,000+ whole-slide digital images. We have noted a very small (<500 ms) delay when loading the initial field, and no noticeable delay when panning thereafter.

Conclusions: The enhanced architecture of the CDSA enables scalable visualization of billions of histological entities, enabling better use of quantitative image analysis results in integrated cancer studies. As this framework can easily be modified to support the visualization of analysis results from multiple algorithms simultaneously and to highlight regions of overlap/variance by proper color coding, our model provides an efficient mechanism for quality control and analysis. We will be releasing this technology in the future through GitHub.
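The prefetching strategy mentioned in the Methods (loading objects in adjacent fields in anticipation of panning) can be illustrated with a small cache. This is a sketch of the general technique only, not the CDSA code: the tile grid, the least-recently-used eviction policy, and the `fetch_tile` callable standing in for a database query are all hypothetical.

```python
from collections import OrderedDict

class TileCache:
    """LRU cache of object boundaries keyed by (col, row) field index."""

    def __init__(self, fetch_tile, capacity=64):
        self.fetch_tile = fetch_tile   # hypothetical: (col, row) -> boundaries
        self.capacity = capacity
        self.tiles = OrderedDict()

    def _load(self, key):
        if key in self.tiles:
            self.tiles.move_to_end(key)          # mark as recently used
            return self.tiles[key]
        data = self.fetch_tile(key)
        self.tiles[key] = data
        if len(self.tiles) > self.capacity:
            self.tiles.popitem(last=False)       # evict least recently used
        return data

    def view(self, col, row):
        """Return boundaries for the visible field and prefetch its neighbors."""
        current = self._load((col, row))
        for dc in (-1, 0, 1):
            for dr in (-1, 0, 1):
                if (dc, dr) != (0, 0) and col + dc >= 0 and row + dr >= 0:
                    self._load((col + dc, row + dr))
        return current

# Usage with a stand-in fetch function that records what was requested.
fetched = []

def fake_fetch(key):
    fetched.append(key)
    return [f"boundary@{key}"]

cache = TileCache(fake_fetch)
cache.view(5, 5)     # loads the visible field plus its 8 neighbors
cache.view(6, 5)     # panning right: only the 3 newly adjacent fields load
```

After the first view, a pan to any adjacent field is served from the cache, which is consistent with the small initial delay and unnoticeable panning delay reported in the Results.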


17. FireCloud - The Firehose Analysis Platform in The Cloud

Gad Getz1, Anthony Philippakis1, Matthew Trunnell1, Chet Birger1, Megan Hanna1, Gordon Saksena1, Alex Ramos1, Doug Voet1, Kristian Cibulskis1, David An1, Nils Homer1, Adam Kiezun1, Michael Noble1, Ian Poynter1, Eric Banks1, David Patterson2, Matt Massie2, Timothy Danford2, David Haussler3, Benedict Paten3, Hannes Schmidt3, Mark Diekhans3, Nadine Gassner3

1Broad Institute, Cambridge, Massachusetts; 2University of California, Berkeley, Berkeley, California; 3University of California, Santa Cruz, Santa Cruz, California

Broad/UC collaborate to build a cloud-based analysis platform with co-located TCGA data for public use.

The cost of DNA sequencing has dropped more than one-million-fold over the last decade, and we are entering an era where, in principle, it will be possible to discover the genetic basis of disease and treatment response. Three challenges prevent achieving this goal: (i) processing massive sequence datasets requires a costly computational infrastructure; (ii) the current generation of methods cannot scale to the petabytes of data currently in existence, let alone to the exabytes that will come; and (iii) data is being collected and stored in silos, prohibiting sharing and collaborative projects. Nowhere is this situation more true than in the field of cancer genomics. Large-scale sequencing efforts such as TCGA have begun to elucidate the genetic pathogenesis of cancer, and the development of several successful targeted chemotherapies has provided proof-of-principle that identifying driver mutations can be translated into therapeutics. In order to enter an era of true “precision medicine,” however, we will need to succeed not only in collecting appropriately consented samples and sequencing genomes, but also in creating sophisticated information technologies capable of storing, analyzing, and sharing genomic data.
The goal of the NCI Cancer Genomics Cloud Pilots is to develop solutions to these challenges. As one of the three Cloud Pilot awardees, a team of experts from the Broad Institute, University of California at Santa Cruz and University of California at Berkeley is building FireCloud, a platform for analysis of cancer genomes. FireCloud strives to empower researchers, democratize access, enable sharing, and facilitate collaboration by building a robust scalable platform that is accessible to the community at large. It is modelled on Firehose, the cancer genome analysis platform built by the Getz lab at the Broad Institute, which supports both small groups and major projects (TCGA, GDAC, GTEx). Firehose provides a workspace environment with access control which securely tracks and manages data, metadata, algorithms, job execution and results. Firehose captures provenance for each run (method versions, timestamps, input and output files), thus allowing reproducible science. It is flexible and robust and can be run at scale, as has been demonstrated on numerous large-scale production efforts (e.g. TCGA Analysis Working Groups, Genome Data Analysis Center). FireCloud will contain much of the Firehose functionality but will also contain novel and scalable paradigms for distributed data storage and computation, various ‘stores’ (read store, variant store, signal store), and co-located TCGA data. Like Firehose, FireCloud will be built to support the work of the general community of cancer genome researchers, including analysts, production managers and tool developers. We will work closely with our users to demonstrate FireCloud’s capabilities and to train and engage the community. This effort is firmly rooted in the data-sharing principles set forth by the Global Alliance for Genomics and Health (GA4GH), making it both technology-driven and mission-driven from its inception. This work is both open source and non-profit.
It is our hope that this platform will, in time, grow into a resource not only for the analysis of cancer genomes, but for all forms of genomic data.


18. Validating a New Somatic Mutation Caller Using TCGA Data

Arun Ahuja, Ryan Williams, Tim O’Donnell, Jeff Hammerbacher

Icahn School of Medicine at Mount Sinai, New York, New York

Our lab has implemented a simple somatic mutation caller called Guacamole and has used it to compete in the ICGC-TCGA DREAM Mutation Calling challenge. While the synthetic data provided by the challenge was useful for early development of our somatic mutation caller, we’d like to test Guacamole on validated variant calls obtained from the sequencing of tissue samples from real cancer patients before using calls from Guacamole in a research or clinical project. The TCGA provides a wonderful resource to build such a validation data set for a new somatic mutation caller. We’ve downloaded all TCGA MAF files, extracted the variants with Validation_Status == “Valid” or “Invalid”, searched CGHub for the BAM files which match the sample barcode of the validated variants, and pulled all of that data into our cluster. We then separated the variant calls and supporting reads into training, test, and validation sets and characterized the performance of many somatic mutation callers under a variety of parameter settings when run on this validation data set. We’d like to present our experiences obtaining validation data as well as our results validating a variety of somatic mutation callers, including Guacamole, against this data set.
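
The extraction step described in this abstract can be sketched roughly as follows. This is a minimal illustration and not Guacamole's actual code: it assumes a tab-separated MAF with the standard Hugo_Symbol, Tumor_Sample_Barcode, and Validation_Status columns. The function names, the choice to split by whole sample, the 60/20/20 fractions, and the barcodes in the demo are all hypothetical.

```python
import csv
import random

VALIDATED = {"Valid", "Invalid"}


def validated_variants(maf_text):
    """Yield MAF rows whose Validation_Status is 'Valid' or 'Invalid'."""
    # MAF files may start with '#' comment lines; skip them and blank lines.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    for row in csv.DictReader(lines, delimiter="\t"):
        if row.get("Validation_Status") in VALIDATED:
            yield row


def split_by_sample(rows, fractions=(0.6, 0.2, 0.2), seed=0):
    """Assign whole samples to train/test/validation (an assumed design choice).

    Splitting by sample keeps all variants of one tumor in the same set.
    """
    samples = sorted({r["Tumor_Sample_Barcode"] for r in rows})
    random.Random(seed).shuffle(samples)
    cut1 = int(len(samples) * fractions[0])
    cut2 = cut1 + int(len(samples) * fractions[1])
    assignment = {}
    for i, sample in enumerate(samples):
        assignment[sample] = "train" if i < cut1 else "test" if i < cut2 else "validation"
    out = {"train": [], "test": [], "validation": []}
    for r in rows:
        out[assignment[r["Tumor_Sample_Barcode"]]].append(r)
    return out


# Made-up demo data; the barcodes are placeholders, not real TCGA samples.
DEMO_MAF = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tTumor_Sample_Barcode\tValidation_Status",
    "TP53\tTCGA-AA-0001-01\tValid",
    "KRAS\tTCGA-AA-0002-01\tUntested",
    "EGFR\tTCGA-AA-0003-01\tInvalid",
    "PTEN\tTCGA-AA-0001-01\tValid",
])

rows = list(validated_variants(DEMO_MAF))
splits = split_by_sample(rows)
```

Both "Valid" and "Invalid" rows are kept, since invalidated calls serve as negative examples when scoring a caller.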


19. Pipeline and Tools for RNA-Seq-Based Fusion Transcript Expression Estimation in Cancer

Isaac Joseph

University of California, Berkeley, Berkeley, California

In cancer, chromosomal breakage and rejoining is common, which can lead to the construction of fusion genes (FGs): chimeric genes consisting of regions from two separate constituent genes. The products of FGs, fusion proteins, are important in both tumorigenesis and tumor maintenance, and their oncogenic properties are often thought to be due to expressional deregulation. To our knowledge, no high-throughput tools currently exist that can assess FG expression. However, several high-throughput tools exist to estimate relative expression of normal gene transcript isoforms based on ambiguously-mapping short Illumina RNA sequencing reads. In addition, several high-throughput tools exist to find genomic chromosomal breakage junctions leading to FGs. Based on expression estimation and FG-finding tools, we create a novel tool and associated pipeline to estimate the relative expression of FG transcript isoforms. Working with The Cancer Genome Atlas’ Low Grade Glioma Analysis Working Group, we show our method’s successful identification of an expressionally deregulated fusion gene transcript isoform in one representative patient.


20. FocalScan: Scanning for Altered Genes Based on Coordinated DNA and RNA Change

Joakim Karlsson

Department of Medical Biochemistry, Institute of Cell Biology, University of Gothenburg, Gothenburg, Sweden

Background: In cancer, cells acquire the ability to replicate quickly and uncontrollably, avoiding cell death and achieving tissue invasive capacity. The evolution of these mechanisms is made possible by genomic instability. As such, to improve understanding and treatment of cancer it is relevant to look for genomic changes subject to selection in tumors. Selection may manifest in the form of recurrent amplifications and deletions in size-limited (focal) genomic regions, which will lead to transcriptional activation or inactivation of driver or suppressor genes. Methods have been developed to identify such focally altered regions, but the exact genes targeted by these events are often unclear. Several algorithms are also available to integrate transcription and copy-number data, with the purpose of discovering individual genes activated or inactivated by copy-number changes, but these tools severely lack specificity.

Principal Findings: We developed FocalScan, a tool designed to simultaneously uncover patterns of focal DNA alteration and coordinated transcription, thus drawing strengths from both principles. The software scans the genome and yields as output a ranked list of tentative drivers or suppressors in cancer. FocalScan is designed to work with RNA-seq data and can also be used in a “gene-agnostic” mode that scans the genome without the aid of a gene annotation, enabling identification of previously unannotated and putatively functional elements including long non-coding RNAs. Application of the method to a large breast cancer dataset from The Cancer Genome Atlas, and evaluation of the enrichment of cancer genes in the resulting list, preliminarily suggests that performance is considerably better than existing tools for DNA/RNA integration. Additionally, the annotation-independent analysis option led to the discovery of a novel putative lncRNA in a focally amplified region.

Conclusions: FocalScan enables annotation-independent analysis of focally altered genomic regions with respect to coordinated changes in copy number and transcription, narrowing down the search space for critical cancer genes.


21. GenePool: A Cloud-Based Technology for Rapidly Data Mining Large-Scale, Patient-Derived Cancer Genomic Cohorts Including The Cancer Genome Atlas

Tod M. Klingler, Emanuele Altieri, Jeff Hanson, Edie J. Hovermale, Anish Kejariwal, Maria Soong, Adin Stein, Antoaneta Vladimirova, Mike Wood, Richard D. Goold, Sandeep Sanga

Station X, Inc., San Francisco, California

The Cancer Genome Atlas is a large, multi-center project that is characterizing the molecular profiles of samples from patients with difficult-to-treat cancers. Data that has been generated in this project includes exome, whole genome, RNA-seq, miRNA-seq, copy number, methylation and protein expression datasets from more than 10,000 cancer patients. While germline variants and read-level data are controlled access (i.e. available by application and approval of dbGaP), much of the remaining “summary” level data is open-access. We have downloaded the open-access TCGA datasets, matched them with the available patient and sample data, and imported them into GenePool®. GenePool (https://stationx.mygenepool.com/) is a cloud-based system for the secure storage, management, analysis, visualization, interpretation, and sharing of large-scale human genomics data. The GenePool platform enables rapid interrogation of massive amounts of data generated by current laboratory methods and was designed to meet the needs of life scientists and clinicians engaged in research and clinical activities where making sense of patient-derived genomics samples is paramount. GenePool is designed to perform at a scale aligned with the sequencing capacity of the next generation of high-throughput sequencing machines. The software solution provides users with an intuitive interface, an easy way to store and select genomics datasets according to sample-associated metadata for analysis, and an automated system for performing routine, well-characterized genomics workflows.
Results are presented with multiple options for annotation, sorting and data export, with a visualization tool that facilitates browsing of genomic data for biomarker identification. We will demonstrate how GenePool removes the data download, management, and computing burdens often faced by researchers and clinicians working with sequencing data, particularly large-scale projects exemplified by data generated for The Cancer Genome Atlas. Through the context of a number of use-cases, we will present GenePool as a powerful software platform providing best-in-class data management, analysis, and collaboration for quickly deriving value from the large amounts of patient-derived sequencing data. In particular, we will demonstrate how GenePool can be used to conduct disease-specific integrative analyses, cross-tumor analyses, derive actionable clinical insights, and conduct biological validation with The Cancer Genome Atlas.


22. Dynamic Analyses of Alternative Polyadenylation From RNA-Seq (DaPars) Reveal Molecular Mechanisms and Functional Consequences of 3′ UTR Shortening Across TCGA Tumor Types

Zheng Xia, Wei Li

Division of Biostatistics, Dan L. Duncan Cancer Center and Department of Molecular and Cellular Biology, Baylor College of Medicine, Houston, Texas

Alternative polyadenylation (APA) is emerging as a pervasive mechanism in the regulation of more than 70% of human genes. By changing the position of the polyA site, APA can either shorten or extend 3′ UTRs that contain many important cis-regulatory elements, such as miRNA binding sites. The role of APA in human cancer is largely overlooked owing to the lack of a genome-wide APA profiling method. To overcome this limitation, we recently developed DaPars for the de novo identification of APA from standard RNA-seq. When applied to 358 TCGA Pan-Cancer tumor/normal pairs across 7 cancer types, DaPars reveals 1,346 genes with recurrent and tumor-specific APA, most of which (91%) have shorter 3′ UTRs in tumors that can avoid microRNA-mediated repression, including glutaminase (GLS), a key metabolic enzyme for tumor proliferation. Interestingly, selected APA events add strong prognostic power beyond common clinical and molecular variables, suggesting their potential as novel prognostic biomarkers. Finally, our results implicate CFIm25 as a master regulator of 3′ UTR shortening that links APA to glioblastoma tumor suppression. These results underscore the power of innovative bioinformatics analyses that can derive novel biological insights from existing big data in genomics.


23. OncoLand: An Integration Platform for Data Management, Analysis, and Visualization of TCGA Datasets

Matthew Newman, Jason Lu, Vivienne Zhang, Gary Ge, John Hu, Jack Liu

Omicsoft Corporation, Cary, North Carolina

The rich data generated by The Cancer Genome Atlas (TCGA) project provides a rare opportunity to comprehensively define the genomic changes occurring in cancer. Along with clinical data, the TCGA -omic data represents a vast resource for investigators at both academic and commercial institutions for exploration. To facilitate data mining and integration, Omicsoft has developed OncoLand, a client-server based software solution for storage, analysis, and visualization of cancer-related -omic data, including DNA mutation, expression, copy number, methylation and gene fusions detected from next generation sequencing (NGS). OncoLand enables investigators to easily query and navigate a gene or sets of genes of interest in multiple tumors across data from different platforms. A sample metadata management system provides the ability to incorporate both the publicly curated sample level and clinical metadata, as well as user-defined groupings or subtypes. In addition, the built-in Omicsoft Genome Browser allows a detailed view of coverage, gene fusions and alternatively spliced isoforms. OncoLand contains a variety of modules for integrated analysis and data exploration, including correlation of mutations to gene expression. For cell line datasets, it allows for the incorporation of drug response measurement data for gene-drug association analysis. Furthermore, OncoLand provides analytic tools for processing users’ own data through the same pipelines as Omicsoft used for TCGA. This allows users to easily add public data and their own internal data, spanning clinic and laboratory, to their instance of the platform, providing an easy solution for rapid data expansion.


24. The CRAVAT and MuPIT Web-Services for Pathogenicity Prediction, Annotation, and 3D-Visualization of Mutations Including Integrated Comparative TCGA Analysis

David Masica

Johns Hopkins University and Johns Hopkins School of Medicine, Baltimore, Maryland; In Silico Solutions, Falls Church, Virginia

The Cancer-Related Analysis of Variants Toolkit¹ (CRAVAT; Figure 1A) was designed with an easy-to-use interface to facilitate the high-throughput assessment and prioritization of genes and variants important for tumorigenesis. CRAVAT provides predictive scores for germline variants, somatic mutations and relative gene importance, as well as annotations from published literature and databases. Mutation Position Imaging Toolbox (MuPIT) interactive² (Figure 1B) is a browser-based application for single-nucleotide variants (SNVs), which automatically maps the genomic coordinates of SNVs onto the coordinates of available three-dimensional (3D) protein structures. The application is designed for interactive browser-based visualization of the putative functional relevance of SNVs by biologists who are not necessarily experts in either bioinformatics or protein structure. Users may submit batches of several thousand SNVs and review all protein structures that cover the SNVs. Alternatively, when one or more variants in a gene annotated by CRAVAT or the UCSC Xena browser³ can be visualized in MuPIT, those tools provide easy link-outs. MuPIT automatically maps all available TCGA mutations onto available crystal structures, which is useful for comparison with user-provided mutations or for viewing separately; importantly, TCGA mutations can be displayed on a tissue-specific basis (see for instance, Figure 1B). MuPIT also provides visual annotation of functionally important positions, such as binding sites and putative mutation “hotspots”.

Figure 1: Example of CRAVAT and MuPIT web-service interfaces using a single mutation in the FGFR1 gene. In A, a user has entered the chromosomal coordinates for a single missense mutation in the Input field. In the Analysis field, the CHASM classifier for cancer driver analysis has been selected. CHASM classifiers are tissue specific and trained, in part, using TCGA data (in this example, the Lung-Squamous-Cell classifier was selected). After entering an email address and clicking Submit, the user is emailed the pathogenicity prediction and detailed functional annotation, in both Excel and text formats. These results also include links to view mutated residue positions in context of the corresponding crystal structures, via the MuPIT website (B). In B, a crystal structure of the FGF2-FGFR1 dimer is shown with the user-provided mutation displayed as bright-green spheres (same mutation shown once in each of two dimers). This page includes the option to display TCGA mutations, by specific tissue, which is useful for comparison with the user-provided mutation(s). In B, mutations from the TCGA’s lung squamous-cell (LUSC) study are displayed as magenta spheres; for clarity, mutations from other TCGA subtypes are not displayed in B. The user can also display mutation clusters from any TCGA tissue type for which significant clusters were detected (e.g., mutation “hotspots”). This example also shows MuPIT utilities for toggling “on” and “off” different structural elements, as well as adjusting rendering options such as color and style. Additional display options, such as the highlighting of ligand-binding sites and biomolecular interfaces, are available on the MuPIT website (only partial functionality is shown in Figure 1B).

CRAVAT (http://www.cravat.us) and MuPIT (http://mupit.icm.jhu.edu) are freely available for non-profit use. Web APIs are available for programmatic access to CRAVAT annotations and MuPIT visualizations.


References

1. Douville, Christopher, et al. "CRAVAT: cancer-related analysis of variants toolkit." Bioinformatics 29.5 (2013): 647-648.

2. Niknafs, Noushin, et al. "MuPIT interactive: webserver for mapping variant positions to annotated, interactive 3D structures." Human genetics 132.11 (2013): 1235-1243.

3. Goldman, Mary, et al. "The UCSC Cancer Genomics Browser: update 2015." Nucleic acids research (2014): gku1073.


25. Subclonal Hierarchy Inference From Somatic Mutations: Automatic Reconstruction of Cancer Evolutionary Trees From Multi-Region Next Generation Sequencing

Noushin Niknafs

Johns Hopkins University, Baltimore, Maryland

Recent improvements in next-generation sequencing of tumor samples and the ability to identify somatic mutations at low allelic fractions have opened the way for new approaches to model the evolution of individual cancers. The power and utility of these models are increased when tumor samples from multiple sites are sequenced. Temporal ordering of the samples may provide insight into the etiology of both primary and metastatic lesions and rationalizations for tumor recurrence and therapeutic failures. Additional insights may be provided by temporal ordering of evolving subclones, that is, cellular subpopulations with unique mutational profiles. We present a new modular framework based on a rigorous statistical hypothesis test to infer the SubClonal Hierarchy from Somatic Mutations (SCHISM). Our framework decouples the problems of mutation cellularity estimation and temporal ordering, and can thus be flexibly combined with existing tools addressing either of these problems. The SCHISM framework includes tools to interpret hypothesis test results, which inform phylogenetic tree construction, and we introduce the first genetic algorithm designed for this purpose. The utility of our framework is demonstrated in simulations and by application to data from three published multi-region tumor sequencing studies of (murine) small cell lung cancer, acute myeloid leukemia, and chronic lymphocytic leukemia. Using a number of different configurations of tools in the SCHISM framework, we were able to identify subclonal phylogenies that were either identical to or inclusive of the phylogenies reconstructed by study authors using manual expert curation. SCHISM can be applied to TCGA data, especially in patients with multiple characterized tumor samples, to study the genomic underpinnings of tumor evolution, and possibly therapeutic resistance.

SCHISM performance on a simulated patient phylogeny: (A) subclonal phylogeny; (B) mutation cluster cellularities across 7 simulated biopsy samples; (C) mutation cluster precedence order violation matrix, marking unsupported lineage relationships in red; (D) an ensemble of maximum-fitness subclonal phylogeny trees after 50 generations of the genetic algorithm.
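
The idea behind a precedence order violation matrix can be illustrated with a small sketch. An ancestral mutation cluster should be at least as prevalent as any of its descendants in every sample, so a candidate parent-child pair is unsupported if the child's cellularity exceeds the parent's somewhere. The code below is a simplified stand-in and not the SCHISM implementation: it replaces the statistical hypothesis test described in the abstract with a fixed tolerance, and the function name, the tolerance value, and the cellularities are made up for illustration.

```python
def precedence_violations(cellularity, tolerance=0.05):
    """Flag ordered (parent, child) cluster pairs that break lineage precedence.

    cellularity: dict mapping cluster name -> list of cellularities, one per sample.
    Returns a dict mapping (parent, child) -> True if `parent` cannot precede
    `child`, i.e. the child is more prevalent than the parent (beyond
    `tolerance`) in at least one sample.
    """
    clusters = sorted(cellularity)
    violations = {}
    for parent in clusters:
        for child in clusters:
            if parent == child:
                continue
            violations[(parent, child)] = any(
                c > p + tolerance
                for p, c in zip(cellularity[parent], cellularity[child])
            )
    return violations


# Hypothetical cellularities for three mutation clusters across three samples.
demo = {
    "I": [0.9, 0.8, 1.0],
    "II": [0.5, 0.3, 0.6],
    "III": [0.2, 0.4, 0.1],
}
matrix = precedence_violations(demo)
```

In this toy input, cluster I may precede both II and III, while II and III cannot be ancestors of one another because each exceeds the other in some sample. A tree search such as the genetic algorithm mentioned above would then only need to consider topologies consistent with the supported pairs.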


26. Exploring Summary Measures of Intra-Tumor Heterogeneity

Irina Ostrovnaya, Esther Drill

Memorial Sloan Kettering Cancer Center, New York, New York

It has long been known that tumors are often heterogeneous, potentially harboring multiple subclones with different populations of mutations. Recently, many methods have emerged that attempt to infer the subclonal structure from one or more tumors from the same patient using next generation sequencing data. Here we will consider a simpler problem of summarizing intra-tumor heterogeneity (ITH) by a single measure that can be further correlated with tumor characteristics and clinical outcome. We assume a single tumor per patient was sequenced using exome sequencing and variant allele frequencies (VAF) of somatic mutations and copy number data are available. The questions arising from this problem that we will address are: (1) Can we properly quantify ITH using only VAFs, or do we need to estimate cellular prevalences of the mutations? (2) Is information about non-functional mutations useful, and do VAFs have different patterns among functional and non-functional mutations? (3) How does tumor purity affect the considered ITH measures, especially if purity is correlated with the outcome? We will consider a range of statistical summaries of VAF distribution, including median, standard deviation, entropy, skewness coefficient, number of modes and others. Our goal is to test the hypothesis that higher ITH is associated with worse tumor prognosis or more aggressive tumor features. We will test this hypothesis and evaluate the proposed measures of ITH using TCGA clear cell kidney cancer data and other TCGA data from the cancer types where VAFs were available.
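
Several of the summaries named in this abstract are simple to compute, as the following sketch shows. It is an illustration under stated assumptions rather than the authors' method: VAFs are used as given, with no correction for purity or copy number, entropy is taken over a histogram with an arbitrarily chosen 10 bins, and the function name and example values are hypothetical.

```python
import math
import statistics


def vaf_summaries(vafs, bins=10):
    """Return simple summary statistics of a list of VAFs in [0, 1]."""
    n = len(vafs)
    mean = statistics.fmean(vafs)
    sd = statistics.pstdev(vafs)  # population standard deviation
    # Skewness coefficient: third standardized moment (0 if all VAFs are equal).
    skew = sum((v - mean) ** 3 for v in vafs) / (n * sd ** 3) if sd > 0 else 0.0
    # Shannon entropy (bits) of the VAF histogram; bin count is an assumption.
    counts = [0] * bins
    for v in vafs:
        counts[min(int(v * bins), bins - 1)] += 1
    entropy = sum(-(c / n) * math.log2(c / n) for c in counts if c)
    return {
        "median": statistics.median(vafs),
        "sd": sd,
        "skewness": skew,
        "entropy": entropy,
    }


# Made-up VAFs: two equally sized groups of mutations at 10% and 50%.
two_groups = vaf_summaries([0.1, 0.1, 0.5, 0.5])
# Made-up VAFs: every mutation at the same frequency.
one_group = vaf_summaries([0.5, 0.5, 0.5, 0.5])
```

The single-group example has zero spread and zero entropy, while the two-group example scores higher on both. Measures of this kind therefore track the number of distinct VAF groups, although purity and copy number also shift VAFs, which is the concern raised in questions (1) and (3).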


27. FireBrowse: Mining the Firehose of TCGA

Michael S. Noble, Katherine Huang, Kane Hadley, Benjamin Alexander, David I. Heiman, Gad Getz

The Cancer Genome Atlas has generated an enormously rich set of genomic data for the research community. As one of many software tools developed to extract knowledge from that data, the Broad Institute GDAC Firehose has helped systematize, simplify, and democratize access to TCGA in several ways: by regularly aggregating data into versioned packages suitable for immediate analysis; automatically performing a complex set of integrative bioinformatic analyses upon them; and making the results available through biologist-friendly and literature-citable online reports, and en masse through firehose_get. In the summer of 2014 we debuted the companion FireBrowse portal, which makes it easy to find any of the thousands of TCGA datasets or Firehose analysis result reports in just 2 clicks, with no typing or foreknowledge of TCGA nomenclature; a data synopsis of every disease cohort is also given, with precise aliquot counts displayed for each TCGA data type in a modern graphical user interface (GUI). Powering the FireBrowse GUI is a RESTful application-programming interface (API) that offers over 20 functions and is completely open for public use. The API supports bulk or fine-grained access to sample-level data, metadata, and Firehose analyses, returned in forms suitable for either scientists (TSV, CSV) or programmers (JSON). This provides cancer researchers the most complete and current set of automated MutSig and GISTIC results, among many others; sample data indexed by gene, miR, or barcode; and a diverse set of clinical parameters for each TCGA patient.
In this work we discuss recent advances in FireBrowse, including: extensions to its RESTful API; the fbget suite of wrappers to that API, which enable FireBrowse data and results to be easily used within powerful user analysis environments such as SciPy/IPython; an mRNASeq gene expression viewer; and iCoMut, for interactive visualization of co-occurrence between statistically significant mutations and copy number alterations, expression clusters, and clinical data--with context-sensitive outlinks to TumorPortal, Wikipedia, COSMIC, and other important WWW resources.


28. PathSeq Analysis on TCGA Data for Pathogen Discovery

Chandra Sekhar Pedamallu¹,², Akinyemi Ojesina¹,², Ami Bhatt¹,², Susan Bullman¹,², Fujiko Duke¹,², Peter Carr², Aruna Ramachandran¹,², Gad Getz¹, Matthew Meyerson¹,²

¹Broad Institute, Cambridge, Massachusetts; ²Department of Medical Oncology and Center for Cancer Genome Discovery, Dana-Farber Cancer Institute, Boston, Massachusetts; ³Department of Pathology, Harvard Medical School, Boston, Massachusetts

An estimated 15-20% of all malignancies are caused by infectious agents, such as human papillomavirus (HPV), Epstein-Barr virus (EBV) or Helicobacter pylori (H. pylori). The development of massively parallel next generation sequencing led to the discovery of Merkel cell polyomavirus, and more recently a significant enrichment of Fusobacterium nucleatum (F. nucleatum) and related Fusobacterium species in colorectal carcinomas compared to non-neoplastic colon tissue from the same patients. However, these studies involved a relatively small number of samples and there are likely many additional cancer-associated microbes that have yet to be discovered. TCGA sequencing data, which include mRNASeq, WGS (low pass and high pass), and WES from over 20 human cancer types, provide an unprecedented opportunity for cancer-associated pathogen discovery. Using PathSeq, software for pathogen discovery by computational subtraction, we have analyzed and are analyzing various TCGA sequencing data for the presence of viral, fungal or bacterial sequences. We anticipate our analysis may provide novel insights into how pathogens contribute to human cancer. Moreover, Pancancer PathSeq analysis could lead to the discovery of novel cancer-associated pathogens. Our preliminary results for three recent TCGA cancer types will be discussed.
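
The principle of computational subtraction is to discard reads that look like host sequence and keep the remainder as candidate non-host reads. The toy sketch below shows that idea only. It is not how PathSeq works internally: exact k-mer sharing is used here as a crude stand-in for read alignment against human references, and the function names, the k-mer length, the threshold, and the sequences are all invented for illustration.

```python
def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def subtract_host(reads, host_reference, k=8, max_shared=0.5):
    """Keep reads whose fraction of k-mers found in the host is <= max_shared."""
    host_kmers = kmers(host_reference, k)
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:  # read shorter than k: cannot be assessed, drop it
            continue
        shared = len(read_kmers & host_kmers) / len(read_kmers)
        if shared <= max_shared:
            kept.append(read)
    return kept


# Invented sequences: a short stand-in "host" reference and two reads.
HOST = "ACGTACGTTAGCCGATAGCTAGGCTA"
host_like_read = HOST[2:14]               # drawn from the host reference
foreign_read = "TTTTTGGGGGCCCCCAAAAA"     # shares no 8-mers with HOST
candidates = subtract_host([host_like_read, foreign_read], HOST)
```

In a real analysis the surviving reads would next be compared against microbial reference sequences, and reads matching nothing known would be the candidates for novel organisms.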


29. The ISB Cancer Genomics Cloud: Leveraging Google Cloud Platform for TCGA Analysis

Sheila Reynolds, Natalie Tasman, Michael Miller, Phyliss Lee, Kelly Iverson, Kalle Leinonen, Zack Rodebaugh, Lesley Wilkerson, Preston Holmes, Nicole Deflaux, Dennis Ai, John Lucena, Simon Fung, Sandeep Namburi, Yan Zhang, Walter Dula, David Pot, Jonathan Bingham, Ilya Shmulevich

The ISB Cancer Genomics Cloud (ISB-CGC) is one of three pilot projects funded by the National Cancer Institute with the goal of democratizing access to the TCGA data by substantially lowering the barriers to accessing and computing over this rich dataset. The ISB-CGC is a cloud-based platform that will serve as a large-scale data repository for TCGA data, while also providing the computational infrastructure and interactive exploratory tools necessary to carry out cancer genomics research at unprecedented scales. The ISB-CGC will also facilitate collaborative research by allowing scientists to share data, analyses, and insights in a cloud environment. The ISB-CGC will provide interactive and programmatic access to the TCGA data, leveraging many aspects of Google Cloud Platform including BigQuery, Compute Engine, and App Engine. Open-access clinical and biospecimen information for all TCGA patients and samples, combined with the Level-3 TCGA data and a variety of genomic reference and platform-annotation sources will be stored in BigQuery, enabling fast SQL-like queries against the entire dataset. Controlled-access DNA and RNA sequence data will be available to dbGaP-authorized users in the original BAM and FASTQ file formats, and using the Global Alliance for Genomics and Health (GA4GH) API.
The ISB-CGC aims to serve the needs of a broad range of cancer researchers ranging from scientists or clinicians who prefer to use an interactive web-based application to access and explore the rich TCGA dataset, to computational scientists who want to write their own custom scripts using languages such as R or Python, accessing the data through APIs, to algorithm developers who want to spin up thousands of virtual machines to analyze hundreds of terabytes of sequence data. The ISB-CGC will allow scientists to interactively define and compare cohorts, examine the underlying molecular data for specific genes or pathways of interest, and share insights with collaborators around the globe. All registered ISB-CGC users will automatically qualify for Google Cloud Platform credits that can be used to upload their own datasets into Google Cloud Storage, and to perform analyses using existing or custom pipelines. If you are interested in learning more about the ISB-CGC or would like to propose specific scientific use-cases to our development team, please visit us at http://cgc.systemsbiology.net/. This project has been funded in whole or in part with Federal funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400007C.


30. Improved Resolution of Allelic Imbalance in TCGA

Paul Scheet
The University of Texas MD Anderson Cancer Center, Houston, Texas

Genomic instability plays a well-documented and central role in carcinogenesis. Acquired chromosomal alterations that result in allelic imbalance (AI), such as loss of heterozygosity (LOH) or duplications, have been used to identify genomic regions that harbor tumor suppressor genes or oncogenes and to characterize tumor stage and progression. One challenge in detecting regions of AI is that subtle forms of imbalance, or low proportions of DNA harboring the chromosomal alteration, may be typical due to stromal contamination or heterogeneity in the tumor genome itself. Here we apply hapLOH, a powerful statistical technique based on haplotype information (Vattathil & Scheet, 2013, Gen Res 23:152) that leverages the joint distribution of within-sample “B allele” frequencies (BAF) from SNP microarrays, to multiple data sets in TCGA (PAAD, SKCM, BRCA, with analyses of OV and COAD almost complete) to elucidate numerous discrepancies with published sets of copy number changes. For example, on the pancreatic data set (PAAD), our calls differ substantially from the working TCGA calls. We examine categories where the two call sets differ. Visual inspection of the BAF and total intensity (log R ratio) plots indicates that our calls have greater sensitivity for very subtle events and a lower rate of false positives (e.g. in highly aberrant tumor genomes, which calibrate poorly in the standard TCGA calls). Our analyses are especially useful for detecting copy-neutral LOH, since these changes do not alter the log R ratio data, and may be combined with existing TCGA calls to provide complementary information and increased resolution of acquired chromosomal alterations.
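As a rough illustration of why low proportions of aberrant DNA produce only subtle signals, the sketch below computes the expected BAF and log R ratio at a germline-heterozygous SNP when only a fraction of cells carries an event. This is standard mixture arithmetic and not the hapLOH method itself; the function name, parameter names, and the two event types are assumptions chosen for illustration.

```python
import math

def expected_signals(aberrant_fraction, event):
    """Expected (BAF, log R ratio) at a germline-heterozygous (AB) SNP.

    aberrant_fraction: proportion of cells carrying the event (0..1).
    event: "cn_loh"   -> aberrant cells are BB (copy-neutral LOH), or
           "deletion" -> aberrant cells retain a single B copy.
    Normal cells are AB with two copies. All names are illustrative.
    """
    f = aberrant_fraction
    if event == "cn_loh":
        b_copies = 2 * f + 1 * (1 - f)   # B copies averaged over cells
        total = 2.0                      # total copy number is unchanged
    elif event == "deletion":
        b_copies = 1 * f + 1 * (1 - f)   # one B copy in every cell
        total = 1 * f + 2 * (1 - f)      # aberrant cells have lost one copy
    else:
        raise ValueError("event must be 'cn_loh' or 'deletion'")
    baf = b_copies / total
    log_r_ratio = math.log2(total / 2.0)  # relative to a diploid baseline
    return baf, log_r_ratio

# With 10% aberrant cells, copy-neutral LOH shifts BAF only slightly
# and leaves the log R ratio at zero.
print(expected_signals(0.10, "cn_loh"))
print(expected_signals(0.10, "deletion"))
```

Under these assumptions, copy-neutral LOH is invisible in the log R ratio at any aberrant fraction, which is consistent with the abstract's point that BAF-based analysis adds complementary information.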

33. MAD Bayes for Tumor Heterogeneity – Feature Allocation With Exponential Family Sampling

Yanxun Xu
The University of Texas at Austin, Austin, Texas

We propose small-variance asymptotic approximations for inference on tumor heterogeneity (TH) using next-generation sequencing data. Understanding TH is an important and open research problem in biology. The lack of appropriate statistical inference is a critical gap in existing methods that the proposed approach aims to fill. We build on a hierarchical model with an exponential family likelihood and a feature allocation prior. The proposed implementation of posterior inference generalizes similar small-variance approximations proposed by Kulis and Jordan (2012) and Broderick et al. (2012b) for inference with Dirichlet process mixture and Indian buffet process prior models under normal sampling. We show that the new algorithm can successfully recover latent structures of different haplotypes and subclones and is orders of magnitude faster than available Markov chain Monte Carlo samplers. The latter are practically infeasible for high-dimensional genomics data. The proposed approach is scalable, easy to implement and benefits from the flexibility of Bayesian nonparametric models. More importantly, it provides a useful tool for applied scientists to estimate cell subtypes in tumor samples.
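For readers unfamiliar with small-variance asymptotics, the sketch below shows the simplest member of that family: a DP-means style clustering loop of the kind described by Kulis and Jordan (2012) for Dirichlet process mixtures under normal sampling. It is not the feature-allocation algorithm proposed in this abstract; the function name, the penalty parameter `lam`, and the toy data are assumptions for illustration only.

```python
def dp_means(points, lam, max_iter=100):
    """DP-means style clustering: k-means with a penalty for new clusters.

    points: list of equal-length numeric tuples.
    lam: a point whose squared Euclidean distance to every current
         center exceeds lam opens a new cluster at that point.
    Returns (centers, assignments). Illustrative sketch only.
    """
    dim = len(points[0])
    n = len(points)
    # Start with a single cluster at the global mean.
    centers = [tuple(sum(p[d] for p in points) / n for d in range(dim))]
    assignments = [0] * n
    for _ in range(max_iter):
        changed = False
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            k = min(range(len(centers)), key=dists.__getitem__)
            if dists[k] > lam:
                centers.append(tuple(p))  # open a new cluster at this point
                k = len(centers) - 1
            if assignments[i] != k:
                assignments[i] = k
                changed = True
        # Drop empty clusters, relabel, and recompute centers as means.
        used = sorted(set(assignments))
        relabel = {old: new for new, old in enumerate(used)}
        assignments = [relabel[a] for a in assignments]
        centers = []
        for k in range(len(used)):
            members = [p for p, a in zip(points, assignments) if a == k]
            centers.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            )
        if not changed:
            break
    return centers, assignments

# Toy data with two well-separated groups.
toy = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centers, labels = dp_means(toy, lam=1.0)
print(len(centers), labels)
```

The appeal of this family of methods is that the number of clusters is not fixed in advance, yet each pass is a deterministic assignment and update step rather than a sampling sweep.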

35. Integrating TCGA and Investigator-Generated Data Using the UCSC Xena Platform

Brian Craft, Mary Goldman, Melissa Cline, Mark Diekhans, Teresa Swatloski, David Haussler, Jingchun Zhu
Genomics Institute, University of California, Santa Cruz, Santa Cruz, California

With the advent of cancer genome analysis, there is an enormous need for an integrative computational approach to understand the functional impact of the genomic aberrations that drive and characterize cancers. This requires mechanisms to aggregate and visualize both public and investigator-generated data on cancer genomes, transcriptomes, epigenomes and more. Extending the UCSC Cancer Genomics Browser, we are developing the UCSC Xena platform to achieve this. UCSC's Xena is a data server-based platform that stores functional genomics data and serves them in response to data requests in real time and with minimal informatics overhead. Examples of these data requests include data visualization, integration and further downstream analysis. Xena can easily be installed on a laptop, or on servers behind a firewall. The same server platform is installed at UCSC to serve TCGA open access data. It currently serves 882 datasets and 32143 samples from 34 different TCGA cancer types. Types of hosted datasets include copy number, somatic mutation, DNA methylation, gene- and exon-level expression, miRNA expression, protein expression, PARADIGM pathway inference, and phenotype data. Our automated pipeline updates TCGA data periodically in the Xena database, ensuring we are visualizing the most recent data available. Additionally, our pipeline ingests TCGA phenotype data and further derives overall and recurrence-free survival variables, allowing users to perform survival analysis. We built a web-based client called Xena Browser to access and visualize data hosted across multiple Xena servers while maintaining data privacy. This functionality allows viewing and interpretation of one's own data (e.g. stored on a private Xena) in the context of a large collection of cancer genomics datasets (e.g. TCGA data stored at UCSC Xena Server). The outcome is a platform for researchers to store and analyze their datasets in an interoperable manner. We are integrating with other tools such as MuPIT (which enables visualization of somatic mutations on three-dimensional protein structures). Xena is being developed to leverage the Galaxy software as the underlying workflow engine to connect with the myriad of bioinformatics tools. Integrating these tools provides researchers with a workflow with strong analysis and visualization capabilities, and brings sophisticated computational analyses within the reach of non-computational scientists.

36. Integrating Clinical and Molecular Data for Survival Prediction in TCGA

Bin Zhu, Nan Song, Ronglai Shen, Veera Baladandayuthapani, Katerina Kechris, Hongyu Zhao, and the SAMSI Data Integration Group

Large-scale comprehensive molecular profiling of tumor samples has revealed substantial inter-patient heterogeneity in genomic background and tumor characteristics in unprecedented detail. However, very few studies have systematically examined the effect of integrating multiple platforms for patient survival prediction, or the performance of molecular predictors in relation to clinico-pathological variables such as age, stage, and tumor size. We present a kernel-fusion Cox regression framework for integrating clinical and molecular data at the genomic, epigenomic, transcriptomic, and proteomic level to achieve enhanced precision in survival prediction. The kernel fusion approach allows powerful non-linear discrimination based on multiple molecular data types (somatic mutation, DNA copy number, DNA methylation, mRNA expression, and protein expression) and multiple views (e.g., gene, isoform, exon expression). In addition, our method facilitates the incorporation of prior knowledge, such as pathway information, and permits probability inference of t-year survival for future patients. In this poster, we present an example using the TCGA lung adenocarcinoma (LUAD) data set. We demonstrate that a) integrating multiple molecular platforms yields better prediction accuracy than using any single platform alone; b) for complex tumor types such as LUAD, kernel methods effectively aggregate numerous small effects toward better prediction. Taken together, we present a framework that provides multi-layered deep learning of TCGA data sets to significantly improve survival prediction.
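The abstract does not say how the per-platform kernels are combined. One common choice, assumed here purely for illustration, is a convex combination of Gram matrices, one per data type. The sketch below builds Gaussian (RBF) kernels for two hypothetical data types over the same three patients and fuses them; the function names, weights, `gamma` values, and toy measurements are all made up, and the downstream Cox regression step is not shown.

```python
import math

def rbf_gram(rows, gamma):
    """Gram matrix of the Gaussian (RBF) kernel over a list of feature vectors."""
    return [
        [math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))) for y in rows]
        for x in rows
    ]

def fuse_kernels(grams, weights):
    """Convex combination of same-sized Gram matrices (one per data type)."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(grams[0])
    return [
        [sum(w * g[i][j] for w, g in zip(weights, grams)) for j in range(n)]
        for i in range(n)
    ]

# Hypothetical measurements for three patients on two platforms.
expression = [[0.1, 2.0], [0.2, 1.9], [3.0, 0.1]]
methylation = [[0.8], [0.7], [0.1]]

fused = fuse_kernels(
    [rbf_gram(expression, gamma=0.5), rbf_gram(methylation, gamma=2.0)],
    weights=[0.6, 0.4],
)
for row in fused:
    print([round(v, 3) for v in row])
```

Because a non-negative weighted sum of valid kernels is itself a valid kernel, the fused matrix can be used wherever a single-platform patient similarity matrix would be.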

37. Evaluating the Signature of FFPE Through Genomic Characterization of Patient-Matched Frozen and Formalin Fixed, Paraffin-Embedded Tissues; Focus on Whole Exome Sequencing

Erik Amuda1, Harsha Doddapaneni, Kyle Richard Covington, Nipun Kakkar, Liu Xi, Donna Marie Morton, Donna Marie Muzny, David Wheeler, Andy Mungall, Andy Chu, Richard Corbett, Payal Sipahimalani, Reanne Bowlby, Denise Brooks, Gordon Robertson, Bradley Murray, Carrie Sougnez, Petar Stojanov, Michael Lawrence, Raktim Sinha, Jorge Reis-Filho, Nicholas Schultz, Cyriac Kandoth, Raymond Lim, Charlotte Ng, Marc Ladanyi, Roy Tarnuzzer, Jean Claude Zenklusen, Natalie Bir, Jessica Frick, Jay Bowen, Julie Gastier-Foster, Katherine Hoadley, Wei Zhao, Dan Weisenberger, Moiz Bootwalla, Christina Curtis, Toshinori Hinoue, Peter W. Laird, Chris Miller, Mike McLellan, and Bob Fulton
1Nationwide Children's Hospital

Under the auspices of The Cancer Genome Atlas (TCGA) initiative, we sought to test and implement methods that could enable the use of DNA and RNA samples extracted from FFPE specimens in large scale genomics projects. Our aims were i) to develop a method for optimal extraction of DNA and RNA from FFPE samples, maximizing the analyte integrity obtained and allowing for co-isolation of nucleic acids; ii) to assess the performance of state-of-the-art platforms for whole exome sequencing, copy number profiling, RNA and miRNA sequencing, and whole genome methylation analysis; iii) to define the best practices for the analysis of these platforms with data obtained from nucleic acids extracted from FFPE samples using data obtained from matched fresh-frozen samples as the ‘gold standard’; and iv) to report on the limitations imposed by the nature of FFPE samples and provide guidelines for the optimal use of current genomic and transcriptomic methods for the analysis of FFPE samples. To achieve these aims, TCGA procured a series of cancers from 38 patients, from which matched frozen and FFPE tumor samples and germline DNA extracted from blood leukocytes were available.
These samples were subjected to whole exome, RNA and miRNA massively parallel sequencing, array profiling (SNP6 and methylation), and analyzed using the ‘best practice’ approaches implemented in the TCGA affiliated characterization centers. For all comparisons, the results obtained from the analysis of frozen specimens were adopted as the ‘gold standard’. In this progress report, our findings from whole exome sequencing will be presented. Specifically, results will be shown to (1) describe the repertoire of artifactual genetic alterations caused by formalin fixation and paraffin embedding (i.e. FFPE signature) through an in depth analysis of the results obtained with a minimally filtered variant calling pipeline, (2) define the residual signature of FFPE following stringent filtering achieved through multi-center calling, and (3) summarize the concordance in variant detection between paired FFPE and FF specimens from the perspective of the whole exome sequencing as well as within selected genomic regions of known biological/clinical significance.

Biological Validation

39. Somatic Copy Number Amplification and Activating Somatic Mutations of EZH2 Y641 Drive Melanoma by Epigenetic Remodeling and Silencing of Tumor Suppressors

Jessamy Tiffen1, Stuart J. Gallagher1, Peter Hersey1, Fabian V. Filipp2
1Melanoma Research Group, Kolling Institute of Medical Research, University of Sydney, St. Leonards, New South Wales, Australia; 2Systems Biology and Cancer Metabolism, Program for Quantitative Systems Biology, University of California, Merced, Merced, California

The histone methyltransferase enhancer of zeste homolog 2 (EZH2) is the catalytic subunit of the polycomb repressive complex 2 (PRC2) and is associated with methylation-dependent gene repression. We investigated the genomic and epigenomic landscape of PRC2 components in melanoma and found recurring somatic mutations as well as somatic copy number amplification of EZH2 in the TCGA SKCM dataset. These structural alterations provide PRC2 with enhanced activity in melanoma, causing hypermethylation and transcriptional silencing. Central to oncogenic PRC2 action are somatic gain-of-function driver mutations in the active site of the SET domain of EZH2 at Y641. Somatic mutations of EZH2 are mutually exclusive with mutations in DNMT3A or DNMT3B, emphasizing the role of de novo DNA methylation by PRC2. Hypermethylated CpG markers in patients with gain-of-function mutations of EZH2 are enriched in tumor suppressors, the MAPK pathway, and chromatin remodelers. Upregulation of CDKN1A by inhibition of EZH2 in mutant cell lines provides mechanistic insight into oncogenic EZH2 action. EZH2 presents itself as an epigenetic cancer driver by modulating chromatin remodelers and silencing tumor suppressors. Increased survival of patients with low EZH2 expression, and responsiveness of melanoma cell lines to small molecule inhibition, encourage targeting of EZH2 in treatment of melanoma.

40. Validation and Extension Efforts at The Genome Institute at Washington University School of Medicine

R. Fulton, C. Miller, M. McLellan, L. Ding, R. Wilson, and The Genome Institute Faculty and Staff
The Genome Institute at Washington University, St. Louis, Missouri

As the TCGA project winds down, the discovery phase of the projects is ending, and validation/extension efforts are underway. This presentation features the validation and extension process as well as active project examples at The Genome Institute at Washington University. This workflow provides significant depth of coverage per site, and enables the target lists to be extended across large numbers of samples very efficiently. These methods can be employed for variant validation, extension or both, and can provide data for SNVs, indels, and SV. With the significant depth of coverage, clonality analyses and confidence in low VAF calls are enhanced, and graphical displays of these data are sharpened. Here we highlight several examples of this process and the resulting data.

41. SDHD Promoter Mutation and Function in Melanoma

Tongwu Zhang1, Mai Xu1, Christine Lee1, Chris Schmidt2, Nicholas K. Hayward2, Kevin M. Brown1
1Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Gaithersburg, Maryland; 2QIMR Berghofer Medical Research Institute, Brisbane, Australia

SDHD encodes succinate dehydrogenase complex subunit D, an integral membrane protein. SDH is the only enzyme that participates in both the TCA cycle and the electron transport chain. Inherited or somatic mutations in subunits B, C or D of SDH, which result in HIF1a stabilization, have been associated with several types of cancers including pheochromocytoma, paraganglioma, renal cell carcinoma, and papillary thyroid cancers. In addition, SDHD downregulation is observed in gastric and colon carcinoma. Recently, recurrent SDHD promoter mutations were reported exclusively in melanoma samples at a frequency of ~5%. These mutations are predicted to disrupt consensus ETS-transcription factor binding sites, and were found to be associated with both reduced SDHD gene expression and poor prognosis. However, whether SDH is more widely dysregulated in melanoma remains unclear. We identified recurrent SDHD promoter mutations in our large panel of melanoma cell lines at the same frequency previously reported (~5%). Notably, SDHD promoter mutations tend toward co-occurrence with NF1 mutations (OR = 3.0, P = 0.07). Consistent with predicted loss of ETS-transcription factor binding, we observe decreased luciferase activity in constructs with mutations of the SDHD core promoter compared to wild-type. We hypothesize that SDHD dysregulation will contribute to the transition of cellular metabolism from TCA cycle to glycolysis, facilitating cell survival and sustained growth in the presence of aberrant MAPK pathway signaling. Our preliminary data show that increased MAPK signaling through introduction of oncogenic V600E BRAF leads to an increase in SDHD expression in primary melanocytes. shRNA-mediated knockdown of SDHD decreases cellular ATP generation, depletes MITF, and causes cell death in melanocytes, effects that are partially rescued via expression of oncogenic BRAF. Moving forward, we are working to uncover the downstream functional consequences of SDH dysregulation, particularly the interaction of SDH activity, aberrant MAPK pathway signaling, and cellular phenotype.

77. Altered Inflammatory and Death Pathways in Head and Neck Cell Lines Model Genomic and Expression Signatures Identified in The Cancer Genome Atlas

Xinping Yang1, Hui Cheng1, Anthony Saleh1, Shaleeka Cornelius1, Emine Guven-Maiorov2, Ozlem Keskin2, Attila Gursoy2, Ruth Nussinov3, Carter Van Waes1, Zhong Chen1
1Tumor Biology Section and Clinical Genomics Unit, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, Bethesda, Maryland; 2Center for Computational Biology and Bioinformatics, Koc University, Istanbul, Turkey; 3Cancer and Inflammation Program, Leidos Biomedical Research Inc., National Cancer Institute, Frederick, Maryland

The Cancer Genome Atlas (TCGA) project has investigated 279 HNSCC tissue specimens, and uncovered significant genomic alterations of key molecules involved in inflammation, NF-κB, and death pathways. These genetic alterations include amplification of FADD (11q13) and BIRC2/3 (11q22), mutations of caspase 8 and RIPKs in HPV(-), or deletion of TRAF3 in HPV(+) HNSCC tissues. Using bioinformatic analyses and protein structural and interactive tools, we identified ~60 proteins that may potentially interact with these genetically altered molecules involved in the inflammation and death pathways. More than 90% of HNSCC tissues studied in TCGA exhibited expression or genetic alterations of these genes, with FADD amplification and/or overexpression ranking first with 37%. We also searched this group of genes among more than 20 major cancer types investigated by TCGA, among which HNSCC ranked the highest for the alterations. To further investigate and identify experimental models for functional studies of these genetic and phenotypic alterations, we have performed whole transcriptome (RNA-seq) and genome-wide exome DNA sequencing (exome DNA-seq) in 15 HPV(-) and 11 HPV(+) HNSCC lines, and compared them with three normal human oral mucosa lines and 8 matched blood samples. We have identified gene amplifications and overexpression among many of the molecules involved in inflammation, NF-κB and death pathways in cell lines, which were observed in the HNSCC TCGA project. Among the top 22 genes identified from TCGA with the alteration rate greater than 8% in cancer tissues, we found consistent expression patterns of ~77% of molecules in our cell lines. To further test the function of these molecules, we established HNSCC cell lines stably transfected with a vector that contains NF-κB transcription factor response elements upstream of the β-lactamase reporter gene, whose activity is measured with a fluorescence resonance energy transfer (FRET) substrate that, when cleaved by β-lactamase, yields blue fluorescence. Using these NF-κB reporter cell lines, large RNAi screening assays have been performed to assess the regulatory and signaling molecules involved in NF-κB and death pathways to model the findings from TCGA. The functional and mechanistic validation of these molecules is under way, which could provide precise molecular targets of diagnosis and prognosis for further preclinical and clinical investigation in HNSCC.

Supported by NIDCD intramural projects ZIA-DC-000016, 73 and 74; and funded under contract HHSN261200800001E

Cross-Tumor Analyses

42. Mining The Cancer Genome Atlas for Transcriptional Networks Regulated by the TP53, TP63, and TP73 Family of Members at Different Stages of Tumorigenesis

Hussein A. Abbas1, Cristian Coarfa2, Elsa R. Flores1
1The University of Texas MD Anderson Cancer Center, Department of Molecular and Cellular Oncology, Houston, Texas; 2Baylor College of Medicine, Department of Molecular and Cellular Biology, Houston, Texas

Background and Objectives: TP53 plays a critical role in tumor suppression via transcriptional activation of genes involved in multiple cellular processes including apoptosis, cell cycle arrest, senescence, and metabolism. TP53 is part of a larger family of genes known as the p53 family, which includes TP63 and TP73. The functional roles of these two new family members in cancer have been confusing due to the existence of multiple isoforms and the lack of antibodies that distinguish between them. Both TP63 and TP73 have isoforms that can be categorized into the TA isoforms, which are thought to act primarily as tumor suppressors, and ∆N isoforms, which are thought to act as oncogenes. Our lab has genetically engineered conditional deletions of each of these isoforms to gain a clear understanding of their roles in developmental biology and cancer. Only recently has our group found that the deletion of the oncogenic forms of p63 and p73, ∆Np63 and ∆Np73, respectively, in p53-null mice and p53-mutant human cancer cell lines can induce regression of tumors via upregulation of IAPP and metabolic reprogramming. This finding further strengthens the argument for interplay among p53 family members and sheds light on targeting specific p63 and p73 isoforms to therapeutically target the p53 pathway. A well-defined understanding of the cross talk between the p53 family members in human cancers is still needed to effectively treat cancers with alterations in these pathways.

Methods: We have begun to utilize the data of the 33 tumors and 22 normal tissues of The Cancer Genome Atlas (TCGA) in order to characterize the expression and role of TP53, TP63 and TP73 and their isoforms at different stages of tumorigenesis. We will also use genomic signatures from cells derived from the genetically engineered mouse models generated in our lab in order to derive gene signatures of TP53, TP63 and TP73 and their isoforms for comparison to tumors at different stages in TCGA data.

Results: There is differential expression of TAp63 and ∆Np63 across all TCGA tumors and normal tissues. Specifically, ∆Np63α had the highest expression in cancers and normal tissues. We also identified two clusters of tumors that have differential expression of the TAp63α and TAp63γ isoforms. Via gene set enrichment analysis, we found that the ∆Np63 signature generated in our lab from keratinocytes, which are cells that phenotypically resemble pluripotent stem cells, is downregulated at higher stages or grades of bladder carcinoma (BLCA), kidney renal clear cell carcinoma (KIRC), kidney renal papillary cell carcinoma (KIRP) and low grade gliomas (LGG) compared to lower stages and grades. We computed ∆Np63 signature scores for TCGA cohorts. We found that higher stages of BLCA, KIRC, KIRP and LGG have scores significantly closer to ∆Np63 null than lower stages. This finding further corroborates that the ∆Np63 signature is significantly less represented at higher stages of these cancers. In characterizing pathways altered in at least 3 out of the 4 tumors (BLCA, KIRC, KIRP and LGG) as well as in ∆Np63 null keratinocytes, there was significant enrichment for cell cycle processes, apoptosis, extracellular matrix reorganization and organ development pathways. We also found that compared to normal tissues the ∆Np63 signature is upregulated in BLCA, breast cancer (BRCA), colon adenocarcinoma (COAD), lung adenocarcinoma and squamous cell carcinoma (LUAD and LUSC), thyroid cancer (THCA) and uterine corpus endometrioid carcinoma (UCEC), suggesting a distinctive role of ∆Np63 at different stages. On the other hand, TAp63 signatures were downregulated across 17/33 tumors and upregulated in 2/33 tumors. Further analysis of this pathway is underway.

Conclusion and Future Work: We showed differential expression of TP63 isoforms across TCGA tumor and normal tissues. While the ∆Np63 signature was highly expressed in tumors compared to normal tissues, suggesting an oncogenic role, ∆Np63 expression decreases significantly at higher stages in BLCA, KIRC, KIRP and LGG. The latter finding suggests a tumor progression model in which ∆Np63 expression is required for initiation and its loss is required for progression of these tumors, which is consistent with the stem cell signature of ∆Np63 deficient keratinocytes and suggests a possible epithelial to mesenchymal/stem cell state required for tumor progression driven by loss of ∆Np63. Interestingly, 3 of these tumors (BLCA, KIRC and KIRP) are of genitourothelial origin. The differential expression of TAp63 across different stages is also interesting and requires further investigation. We are currently generating the TP53 and TP73 isoform signatures in order to identify tumors that are driven or inhibited by these family members. The overarching goal is to build a complete map of signatures regulated by the TP53 family members to target this pathway therapeutically.

44. Pan-Cancer Transcriptomic Analysis Associates Long Non-Coding RNAs With Key Driver Mutational Events

Arghavan Ashouri, Niklas Dahr, Erik Larsson
Institute of Biomedicine, The Sahlgrenska Academy, University of Gothenburg, Gothenburg, Sweden

Thousands of long non-coding RNAs (lncRNAs) lie interspersed with coding genes across the genome. While most lncRNAs have unknown functions, a small subset has been implicated as effectors in oncogenic pathways. Here, we made use of transcriptome and exome sequencing data from >7000 human tumors, spanning >20 types of cancers, to identify lncRNAs that are induced or repressed upon mutation of key oncogenic drivers or suppressors. Our screen confirms known coding and non-coding members of oncogenic pathways such as TP53 signaling, but also associates a large number of new lncRNAs with relevant pathways. The associations were often highly reproducible across cancer types, and while many lncRNAs were co-expressed with their protein-coding hosts or neighbors, a subset of the pathway-associated lncRNAs were intergenic and independently regulated. Additionally, by systematically examining regions of focal copy-number change that lack obvious protein-coding targets, we were able to pinpoint several lncRNAs that are putative targets of recurrent genomic alteration in tumors. In summary, we provide a comprehensive map of lncRNA transcriptional alterations in relation to key driver mutational events in cancer, supporting the view that numerous uncharacterized lncRNAs could contribute to the effectuation of oncogenic programs in cancer.

45. A Versatile Site-Specific Model of the Neutral Mutation Probability for Whole-Genome Cancer Data

Johanna Bertl

Understanding the mutational process in cancer cells is crucial to distinguish driver mutations, responsible for the initiation and progress of cancer, from passenger mutations. The high somatic mutation rate in cancer cells and the heterogeneity of the process on different levels make this a challenging question: whole-genome pan-cancer analyses have shown that the mutation pattern differs fundamentally not only between different cancer types, but also between patients and along the genome. With the increasing availability of whole-genome DNA sequence data from cancer cells, typically paired with data obtained from healthy tissue, efficient and scalable analysis methods are called for. Population genetic approaches to detect regions under positive selection are often not directly applicable: the outcome of the evolutionary process that takes place in the cancer tissue is usually only observed in a single biopsy. Therefore, methods have been developed that model the neutral mutation probability in windows or specific genomic elements, based on local genomic characteristics. Alternative approaches study the functional impact of individual mutations and the clustering of mutations along the genome. Here, we model the somatic mutation probability in cancer cells by considering not only heterogeneity of mutation rates between patients and tissue types, but also covariates that describe the functional relevance of sites and epigenetic factors like methylation and replication timing. Our statistical framework is flexible enough to include both patient- and site-specific covariates as well as interactions between them. Modeling the mutation probabilities at single sites allows us to study the mutational process at multiple resolutions, so we can analyze regions of varying size and different types of genomic elements within our framework.
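The abstract does not give the model's functional form. As a minimal sketch, assuming a logistic link (an assumption, not the author's stated method), a per-site mutation probability can be written as a function of a patient effect, site covariates, and an optional interaction term, and regional expectations follow by summing over sites. All names, coefficients, and covariate values below are hypothetical.

```python
import math

def site_mutation_probability(intercept, patient_effect, site_covariates,
                              site_coefficients, interaction=0.0):
    """Probability that one site is mutated in one patient (logistic link).

    The linear predictor adds a baseline, a patient-specific effect,
    site-level covariate effects (e.g. replication timing, methylation),
    and an optional patient-by-site interaction. Hypothetical sketch.
    """
    eta = intercept + patient_effect + interaction
    eta += sum(b * x for b, x in zip(site_coefficients, site_covariates))
    return 1.0 / (1.0 + math.exp(-eta))

def expected_mutations(probabilities):
    """Expected mutation count in a region: the sum of per-site probabilities."""
    return sum(probabilities)

# Made-up coefficients: covariates are (replication timing, methylation).
coefficients = [0.8, 0.5]
early_site = site_mutation_probability(-13.0, 0.4, [0.1, 0.0], coefficients)
late_site = site_mutation_probability(-13.0, 0.4, [0.9, 1.0], coefficients)
print(early_site, late_site)

# Aggregating single-site probabilities gives a region-level expectation.
print(expected_mutations([early_site] * 500 + [late_site] * 500))
```

Working at the level of single sites means the same fitted probabilities can be summed over a gene, a regulatory element, or a fixed window, which is the multi-resolution property the abstract describes.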

46. Cx43 Regulation in Breast Cancer: Insight From TCGA Database

Melanie Busby1, Isabelle Plante1, Michael Hallett2
1INRS-Institut Armand-Frappier, Laval, Quebec, Canada; 2Breast Cancer Informatics Group, McGill University, Montreal, Quebec, Canada

Gap junctions are transmembrane channels formed by a family of proteins called connexins (Cxs), allowing direct communication between the cytoplasm of two adjacent cells. The mammary gland is formed by a double-layered epithelium. Cx43 is expressed by the outer layer of basal cells, but also potentially by the inner layer of luminal cells. Cellular interactions within each layer (homotypic), as well as between luminal and myoepithelial cells (heterotypic), are necessary for the polarization and full differentiation of the epithelium. Cxs have long been considered tumor suppressors. However, this paradigm has recently been challenged, as their role seems to be tissue- and stage-dependent. Thus, Cxs are now considered conditional tumor suppressors. Conflicting evidence has been published regarding the role of Cx43 in breast cancer. Our recent data suggested that Cx43 is more involved in later stages of progression and in metastasis. The goal of this study is to use TCGA and other public databases to (1) investigate how Cx43 is associated with cancer progression; (2) understand the differences in Cx43 regulation in different breast cancer molecular subtypes. Our analysis showed that Cx43 expression is associated with a good prognosis in luminal cancers but with a bad prognosis in Her2 breast cancer. The expression of Cx43 is closely related to methylation in luminal tumours, but poorly correlated with the expression of other genes. Conversely, in Her2 breast cancer, methylation does not closely follow Cx43 expression, but Cx43 is strongly correlated with genes associated with extracellular matrix remodelling (e.g. collagens, laminins, SPARC, ADAMs), involved in epithelial-mesenchymal transition (EMT) (e.g. Twist1, SNAI2, Zeb1, Zeb2) or associated with the mammary stem cell signature. These results indicate that the causes and consequences of Cx43 protein expression might diverge across the molecular subtypes of breast cancer. They therefore highlight the importance of studying different molecular subtypes of breast cancer as distinct diseases. These results also suggest that, in Her2 breast cancers, Cx43 is linked to tumour progression and that it could also be involved in the generation of breast cancer stem cells via EMT. Since EMT and cancer stem cells are associated with cancer progression and metastasis, together, these results will not only contribute to understanding the role of Cx43 in breast cancer but will also help in understanding the context in which cancer stem cells evolve and proliferate. Further studies will use normal and cancer tissues as well as bioinformatics analysis to determine (1) whether Cx43 is expressed in the stroma of normal tissues and of different tumour subtypes; (2) how Cx43 varies in the different breast cancer subtypes at the protein level; (3) the precise localization of Cx43 protein within the different cell populations of the mammary gland, especially in stem cells; and (4) the molecular landscape with which Cx43 is associated in these different contexts.

47. Distinct Patterns of Somatic Genome Alterations in Lung Adenocarcinomas and Squamous Cell Carcinomas Joshua Campbell, Jeremiah Wala, Angela Brooks, Bradley Murray, Mike Lawrence, Marcin Imielinski, Alice Berger, Guangwu Guo, Andrew Cherniack, Roel Verhaak, Gad Getz, Matthew Meyerson, Govindan Ramaswamy Lung adenocarcinomas (ADC) and lung squamous cell carcinomas (SqCC) are the most common types of lung cancer and remain major causes of death worldwide despite advances in smoking cessation, early detection, and targeted and immunological therapies. We examined exome sequences and copy number profiles of 619 lung ADC and 405 lung SqCC tumors to understand their similarities and differences and to identify previously undiscovered drivers of lung carcinogenesis. The majority of recurrently altered genes were unique to each lung tumor type. Additionally, the somatic profiles of lung SqCCs were more similar to those of head and neck squamous and bladder urothelial carcinomas than to those of lung ADCs. Novel significantly mutated genes included PPP3CA (Calcineurin), DOT1L, and FTSJD1 in lung ADC and RASA1 in lung SqCC. Additionally, novel amplification peaks were observed in lung SqCC containing YES1 and miR-205. Joint analysis revealed additional significantly mutated genes including ERCC5, associated with a higher mutational burden, and amplification peaks containing MAPK1 and CCND3. By manual review of known hotspots, additional canonical mutations were found in KRAS, EGFR, and ERBB2 in 16 tumors. Furthermore, recurrent misaligned reads were observed in 10 tumors at EGFR exon 19. An assembly-based approach was used to identify complex in-frame deletions coupled with non-template base insertions in these tumors. Analysis of lung adenocarcinomas without known driver receptor tyrosine kinase/Ras pathway oncogenes revealed mutations in SOS1, VAV1, RASA1, and ARHGAP35, as well as amplification peaks near FGFR1/WHSC1L1, PDGFRA/KIT/KDR, and MAPK1.
These results demonstrate that the underlying mechanisms of carcinogenesis are largely different between lung ADCs and SqCCs and suggest that distinct targeted therapeutic strategies will be required for treatment of these diseases.

48. Mutation Pattern Analysis in Cancer Kyle R. Covington, Eve Shinbrot, David A. Wheeler Genomes are under constant mutational stress caused by environmental mutagens, errors in replication, and defects in mutation repair. Identifying the mutational processes occurring in cancer, and their sources, is critical for understanding the biology of the disease. We generated a set of 21 distinct mutation patterns in cancer and have applied mutation signature analysis to the TCGA and other cancer datasets, revealing common and divergent mutational patterns across cancer. Many of these can be assigned to specific mutational processes while others may be caused by similar biological stresses on the genome. We focused on unraveling the mutational processes in gut-tube-derived tissues, including mixed colorectal tumors (POLE, MSI, MSS). Class A POLE mutants clearly separated into the TCT-TAT signature cluster, while class B POLE mutants were grouped with the MSI-high group. This approach was used to score POLE exonuclease domain mutations, allowing us to identify several mutations of unknown significance as class A POLE mutants. We have found that these signatures are not merely descriptive, but have key biological and clinical implications for tumor biology. We identified signatures in peri-ampullary and hepatocellular cancers that were associated with poor survival even after correcting for common clinical features. These data indicate that mutation signature analysis can be of clinical utility.
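As an illustration of how a sample might be matched to a pattern such as the TCT-TAT cluster mentioned above, the sketch below tallies substitutions by trinucleotide context and compares the resulting spectrum with a reference pattern by cosine similarity. This is a generic approach, not the authors' pipeline: the pyrimidine-centred labelling, the cosine measure, and the reference weights in `pole_like` are all assumptions chosen for the example.

```python
from collections import Counter
import math

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def revcomp(seq):
    return "".join(COMPLEMENT[b] for b in reversed(seq))

def channel(context, alt):
    """Label a substitution by its pyrimidine-centred context, e.g. 'TCT>TAT'."""
    if context[1] in "AG":  # purine reference base: report the opposite strand
        context, alt = revcomp(context), COMPLEMENT[alt]
    return f"{context}>{context[0]}{alt}{context[2]}"

def spectrum(mutations):
    """Count mutations per channel; each mutation is (trinucleotide, alt_base)."""
    return Counter(channel(ctx, alt) for ctx, alt in mutations)

def cosine(a, b):
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical reference patterns with illustrative weights only.
pole_like = {"TCT>TAT": 0.8, "TTT>TGT": 0.2}
other = {"ACG>ATG": 0.6, "CCG>CTG": 0.4}

# Toy sample: the same event reported on either strand lands in one channel.
sample = spectrum([("TCT", "A")] * 4 + [("AGA", "T")] * 3 + [("TTT", "G")])
similarity_pole = cosine(sample, pole_like)
similarity_other = cosine(sample, other)
```

Folding both strands into one label keeps the spectrum at 96 channels, so a TCT>TAT event and its reverse-complement AGA>ATA count as the same signal.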

49. Comprehensive Analysis of PI3K/Akt/mTOR Pathway Alterations in Human Cancers Chad J. Creighton1, Rehan Akbani4, Michael Ittmann1, David J. Kwiatkowski2,3, Han Liang4, Gordon Mills4

1Baylor College of Medicine, Houston, Texas; 2Broad Institute, Cambridge, Massachusetts; 3Brigham and Women’s Hospital, Harvard Medical School, Boston, Massachusetts; 4The University of Texas MD Anderson Cancer Center, Houston, Texas The PI3K/Akt/mTOR pathway is frequently activated in cancer by mutation or copy number alteration. The aim of our study is to take a comprehensive look at the entire pathway and its components in over 10,000 human cancers profiled by TCGA. This includes mTOR and TSC1/2 and other components downstream of PI3K/Akt (e.g. RHEB, NPRL2, NPRL3, and DEPDC5), as well as AMPK/LKB1/NF2. In general many of the types of analyses regarding PI3K/Akt/mTOR that were previously featured in the TCGA marker studies (each study focusing on an individual cancer type) can be extended to the entire multi-cancer cohort. Analysis is currently ongoing. Initial observations were made using preliminary datasets (available in the public domain) representing ~11,123 human cancers (including 6106 with whole exome sequencing or WES, 10843 with copy number profiles by SNP array, 3467 with RPPA proteomic profiles, and 9005 with RNA-seq profiles). For a set of over 20 genes with known roles in PI3K/Akt/mTOR pathway, approximately 25% of tumors surveyed had nonsilent mutations involving at least one gene. Genes frequently mutated included ones previously found to have statistically significant mutation patterns (Lawrence et al. Nature 2014), including PIK3CA (13% of tumors with nonsilent mutations in our analysis), PTEN (7%), PIK3R1 (3%), MTOR (3%), TSC1 (1%), STK11 (1%), and PPP2R1A (1%). Genes with focal gains or losses previously found to be significantly targeted by copy alteration (Zack et al. Nat Genet. 2013) included PIK3CA (10% tumors with focal gain in our analysis), AKT1 (3% focal gain), AKT3 (5% focal gain), RICTOR (3% focal gain), STK11 (9% focal loss), and PTEN (7% focal loss). 
Using RPPA proteomic data, 3467 tumor profiles were scored for activation of PI3K/Akt and for activation of TSC/mTOR, using pathway knowledge-based signatures defined previously (Akbani et al. Nat Comm. 2014). Of the 10 cancer types surveyed, glioblastomas showed the highest activation levels of both PI3K/Akt and TSC/mTOR, while both clear cell renal cell carcinoma and endometrial carcinomas showed higher PI3K/Akt activation levels without correspondingly higher TSC/mTOR activation levels. Overall, PI3K/Akt and TSC/mTOR proteomic scores were significantly correlated across human cancers, though on the order of 20% of tumors surveyed showed higher TSC/mTOR activation levels without higher PI3K/Akt levels. PTEN mutations and lower PTEN protein levels were associated with higher PI3K/Akt activation but not with TSC/mTOR activation. A previously defined gene transcription signature of PI3K/Akt/mTOR (Creighton et al., Breast Cancer Research, 2010) was also used to score tumors based on mRNA expression patterns, and the signature was associated with higher PI3K/Akt proteomic scoring. In survival analyses that made corrections for tumor type, TSC/mTOR proteomic score, PTEN copy loss, and STK11 copy loss were each associated with poor overall patient survival. Future work will incorporate additional TCGA data as they become available, including additional WES and RPPA data in particular.

50. Integrative Analysis of Transcriptome Variation in Uterine Carcinosarcoma and Comparison to Sarcoma and Endometrial Carcinoma Natalie Davidson1, Kjong-Van Lehmann2, Andre Kahles2, Alexendar Perez1, Julia Vogt2, Gunnar Rätsch2 1Weill Cornell Medical College, 2Memorial Sloan Kettering Cancer Center, New York, New York Large-scale cancer genomics has made a huge impact on cancer research, not only allowing tumor types to be characterized in unprecedented molecular detail, but also making joint analysis of multiple tumor types much easier. Tumor types are commonly classified on the basis of their histological features, which influences potential treatment strategies. While histological observations can give a general view of the properties of a tissue, molecular analyses can give a more fine-grained view into the state of the tissue. Furthermore, molecular analyses can better compare and contrast different tumor types to find differences and similarities in mutation, expression, and splicing patterns. Here we present a transcriptomic analysis of uterine carcinosarcoma (UCS) in relation to endometrial carcinoma (UCEC), sarcoma (SARC) and normal uterine tissue. Previously, uterine carcinosarcoma was classified as a uterine sarcoma due to histological features and aggressive behavior, but was recently reclassified as an endometrial carcinoma. The histological similarities between uterine carcinosarcoma and both sarcoma and endometrial carcinoma warrant an in-depth analysis of all three cancer types. We used RNA-seq data from The Cancer Genome Atlas to understand splicing and expression similarities and differences between all three tumor types. Furthermore, we performed a differential transcriptome analysis of uterine carcinosarcoma against normal uterine samples from GTEx to find genes with tumor-specific splicing and expression patterns. Through our analysis of both splicing and expression patterns of the three cancer types and normal tissue, we found that uterine carcinosarcoma most closely resembles endometrial carcinoma and not sarcoma.
Our results are consistent across multiple different analysis strategies. We also discovered a subset of genes that best discriminate uterine carcinosarcoma and endometrial carcinoma from sarcoma, as well as uterine carcinosarcoma and sarcoma from endometrial carcinoma. These genes were found by applying a random forest algorithm as a feature selection method for the task of discriminating between the sets of cancer types. We find that the expression of the gene EPCAM alone can discriminate uterine carcinosarcoma and endometrial carcinoma from sarcoma in most samples. We found no other gene with similar predictive ability in distinguishing uterine carcinosarcoma and sarcoma from endometrial carcinoma. In our comparison against normal tissue, we found 709 differentially expressed genes and 88 differentially spliced events between normal uterine tissue and uterine carcinosarcoma samples. We also found a set of introns that are recurrently spliced in a way that was not observed in any normal tissue samples. This work demonstrates conceptual strategies to investigate the transcriptomic profile across multiple cancer types. The similarities and differences we found between sarcomas, endometrial carcinomas, and uterine carcinosarcomas may not only be of interest for a deeper mechanistic understanding of the development and progression of uterine carcinosarcoma, but may also serve as potential tumor markers or opportunities for the development of new and targeted drug therapies.

51. microRNA Regulation of Molecular Pathways as a Generic Mechanism and as a Core Disease Phenotype Oncotarget Sol Efroni Bar-Ilan University, Ramat Gan, Israel The presentation includes findings described in the above paper as well as additional unpublished and published findings. The role of microRNAs as key regulators of a wide variety of fundamental cellular processes, such as apoptosis, differentiation, proliferation and cell cycle, is increasingly recognized in most aspects of biology and biomedicine. Results from multiple microRNA studies over multiple pathway networks led us to hypothesize that microRNAs target molecular pathways. As we show here, this is a network-wide phenomenon. The work presented uses statistical tools that show how single microRNAs target molecular pathways. We demonstrate that this targeting could not be the result of random associations and cannot be the result of the sheer number of microRNA targets. Furthermore, the strongest evidence for the association between microRNA and pathway is a demonstration of the way in which this network behavior associates with cancer phenotypes. In our analyses we study ten different types of cancer involving thousands of samples (from TCGA), and show that the identified microRNA–pathway associations demonstrate a clinical affiliation and an ability to stratify patients. The work presented here shows the first evidence for a mechanism of generic microRNA-pathway regulation. This regulation is tightly associated with clinical phenotype. The presented approach may catalyze targeted treatment through exposure of hidden regulatory mechanisms and a systems-medicine view of clinical observation.

52. PhenoNet: Identification of Key Networks Associated With Disease Phenotype Bioinformatics Sol Efroni Bar-Ilan University, Ramat Gan, Israel The paper is based on TCGA data. Motivation: At the core of transcriptome analyses of cancer is a challenge to detect molecular differences affiliated with disease phenotypes. This approach has led to remarkable progress in identifying molecular signatures and in stratifying patients into clinical groups. Yet, despite this progress, many of the identified signatures are not robust enough to be clinically used and not consistent enough to provide a follow-up on molecular mechanisms. Results: To address these issues, we introduce PhenoNet, a novel algorithm for the identification of pathways and networks associated with different phenotypes. PhenoNet uses two types of input data: gene expression data (RMA, RPKM, FPKM, etc.) and phenotypic information, and integrates these data with curated pathways and protein–protein interaction information. Comprehensive iterations across all possible pathways and subnetworks result in the identification of key pathways or subnetworks that distinguish between the two phenotypes. Availability and implementation: Matlab code is available upon request.

53. Mining a Comprehensive Database of TCGA Splice Junctions to Discover Cis Genetic Drivers of Alternative Splicing Jacob Feala, Laura Corson, Ping Zhu, Lihua Yu H3 Biomedicine, Inc., Cambridge, Massachusetts A growing number of alternatively spliced isoforms of cancer genes, including PKM1/2, MCL1-L/S, and RON, have been found to be associated with oncogenic or tumor-suppressing activities. This has sparked numerous efforts to mine TCGA RNA-seq data for cancer-specific splice forms that can serve as potential drug targets or biomarkers. We adopted three parallel, complementary analyses. In approach 1, we identify isoforms that are up-regulated in cancer vs. normal. In approach 2, we adapt the Cancer Outlier Profile Analysis (COPA) approach to identify isoform outliers in a subset of tumors. In approach 3, we correlate recurrent alternative splicing events with nearby mutations likely to drive alternate isoform usage. Our pipeline reprocesses RNA-seq data from the raw FASTQ files, due to the fact that the TCGA RNASeqV2 pipeline publishes only a restricted list of known isoforms, exons, and splice junctions in its Level 3 expression datasets, leaving out novel splice forms (such as the well-known MET exon 14 skipping event) that do not belong to a standard gene model. Therefore, we recomputed splice junction counts from patients across 10 TCGA cohorts using the STAR RNA-seq aligner, and then calculated a "junction percent" score analogous to an exon-based "percent spliced-in" or PSI. Junction percent scores were used as input to our three approaches. In approach one, we perform a grouped t-test of the junction percent metric in tumor vs normal samples for each splice junction. We discover a large number of tumor-specific isoforms, validate their expression in the raw alignment data, and explore pathways enriched in the top hits. The top hits had strong overlap with top hits of a similar study based on isoform- rather than junction-level quantifications.
In approach three, we correlate junction percent to allele frequency of mutations within 50bp of each splice site. Reassuringly, our results re-discovered the MET exon 14 skipping event, and we found several new low-frequency mutations in known cancer genes (such as ERBB2 and KRAS) that induce a nearby aberrant splicing event. A correlation between U2AF1 hotspot mutations and alternative U2AF1 splice form was also discovered. Splice junction data were stored and analyzed in Amazon Redshift, a cloud-based, column-store data warehouse enabling rapid analytics on large datasets. A web app provides visualizations and lists of top hits from data mining algorithms.
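The "junction percent" score is not defined precisely in the abstract, so the sketch below assumes one plausible reading: the reads supporting a junction as a percentage of all junction reads that share its donor site. It then shows the two comparisons described above on made-up numbers, with Welch's t statistic standing in for the grouped t-test of approach one and a Pearson correlation against mutation allele frequency for approach three. The junction identifiers and every value are hypothetical.

```python
from collections import defaultdict
from statistics import mean, variance
import math

def junction_percent(counts):
    """counts: {(donor, acceptor): reads} for one sample.
    Returns each junction's share (in percent) of its donor's junction reads."""
    donor_totals = defaultdict(int)
    for (donor, _), n in counts.items():
        donor_totals[donor] += n
    return {
        junction: 100.0 * n / donor_totals[junction[0]]
        for junction, n in counts.items()
        if donor_totals[junction[0]] > 0
    }

def welch_t(a, b):
    """Welch's t statistic for two groups of junction percent values."""
    return (mean(a) - mean(b)) / math.sqrt(
        variance(a) / len(a) + variance(b) / len(b)
    )

def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# One toy sample: a canonical junction and a skipping junction from one donor.
jp = junction_percent({("chr7:d1", "chr7:a1"): 90, ("chr7:d1", "chr7:a2"): 10})

# Approach one (toy): junction percent of the skipping junction per sample.
tumor, normal = [10.0, 30.0, 25.0, 40.0], [1.0, 2.0, 0.0, 3.0]
t_stat = welch_t(tumor, normal)

# Approach three (toy): allele frequency of a nearby mutation vs junction percent.
allele_freq = [0.0, 0.1, 0.3, 0.5]
skip_percent = [1.0, 8.0, 31.0, 52.0]
r = pearson(allele_freq, skip_percent)
```

Normalising per donor site keeps the score comparable across samples with different sequencing depth, which is the property the PSI analogy suggests.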

54. Pan-Cancer Analysis of Mutagenesis by APOBEC Cytidine Deaminases Kin Chan1, Steven A. Roberts1,2, Joan F. Sterling1, Natalie A. Saini1, Leszek J. Klimczak3, Ewa P. Malc4, Jaegil Kim5, Hailei Zhang5, Harindra Arachchi5, Juok Cho5, David Heiman5, Michael Noble5, David J. Kwiatkowski5,6, David Fargo3, Piotr A. Mieczkowski4, Gad Getz5,7, Dmitry A. Gordenin1

1Genome Integrity & Structural Biology Laboratory and 3Integrative Bioinformatics Group, National Institute of Environmental Health Sciences, NIH, Research Triangle Park, North Carolina; 2School of Molecular Biosciences, Washington State University, Pullman, Washington; 4Carolina Center for Genome Sciences, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina; 5Broad Institute, Cambridge, Massachusetts; 6Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts; 7Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts The elucidation of mutagenic processes that shape cancer genomes is a fundamental problem whose solution promises insights into new treatment, diagnostic, and prevention strategies. We and others previously identified mutation clusters and genome-wide mutagenesis bearing the single-strand DNA (ssDNA)-specific APOBEC cytidine deaminase signature at 5′-tC-3′ motifs across many cancer types. Because of the high abundance of APOBEC mutagenesis in several cancer types, we have developed a statistical approach capable of identifying cancer samples with an APOBEC mutagenesis pattern based on WGS as well as on WES mutation data. This analysis, called Pattern of Mutagenesis by APOBEC Cytidine Deaminases (P-MACD), has been recently integrated into the Broad Institute GDAC Firehose and is currently available for Firehose standard and AWG-customized runs. It allows users to explore correlations of APOBEC mutagenesis with multiple clinical and molecular features, e.g., gene expression and hotspots in significantly mutated genes (SMGs). Our approach also enabled us to highlight APOBEC3A (A3A) as the most likely source of the majority of APOBEC-induced mutations in tumors. It is thought that A3A and APOBEC3B (A3B) are the deaminases most likely to play a leading mutagenic role in cancers, but the identity of the APOBEC enzyme(s) responsible has remained unknown.
In addressing this question we explored similarities of mutation motifs in TCGA and ICGC tumor samples with distinct APOBEC3A and APOBEC3B signature motifs defined by separate expression of these enzymes in yeast models. Specifically, we compared the prevalence of A3A- and A3B-specific mutation signatures in samples enriched with a common component of both signatures. We found that cancer genomes with statistically significant, but low, enrichment for common APOBEC signature mutations usually had an A3B-like signature. Strikingly, cancer genomes with high enrichment for common component of APOBEC signature mutations almost always had an A3A-like signature. We propose that there is a background level of A3B-mediated mutagenesis in many cancers, but in strongly mutated cancers, A3A-mediated mutagenesis apparently dwarfs the A3B background. The A3B background is detectable only in samples without significant A3A signature mutagenesis. While A3B is likely to be acting in the background of more cancers, A3A is by far the more prolific deaminase in terms of sheer numbers of mutations induced. As such, A3A should be considered a more important target for development of novel diagnostic and treatment strategies.
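To make the idea of motif enrichment concrete, the sketch below computes how over-represented mutated cytosines are in the 5′-tC-3′ context relative to how often that context occurs in the surrounding sequence. It is not the P-MACD implementation, whose exact statistic and motif definition are not given in the abstract. The ratio used here, the handling of both strands, and the toy sequence are assumptions for illustration.

```python
def count_motifs(seq):
    """Count cytosines on either strand and those sitting in a 5'-TC-3' context.
    A forward-strand 'GA' is a 'TC' on the reverse strand."""
    c_total = seq.count("C") + seq.count("G")
    tc_total = seq.count("TC") + seq.count("GA")
    return c_total, tc_total

def in_tc_context(seq, pos):
    """True if the C (or G, for the reverse strand) at pos lies in a tC motif."""
    base = seq[pos]
    if base == "C":
        return pos > 0 and seq[pos - 1] == "T"
    if base == "G":
        return pos + 1 < len(seq) and seq[pos + 1] == "A"
    return False

def tc_enrichment(seq, mutated_positions):
    """(share of C mutations in tC context) / (share of cytosines in tC context)."""
    c_total, tc_total = count_motifs(seq)
    c_muts = [p for p in mutated_positions if seq[p] in "CG"]
    tc_muts = [p for p in c_muts if in_tc_context(seq, p)]
    if not c_muts or not tc_total:
        return float("nan")
    return (len(tc_muts) / len(c_muts)) / (tc_total / c_total)

# Toy uppercase sequence and 0-based mutated positions (made up).
sequence = "ATCGGTCAAGACCTTCAGGC"
mutations = [2, 6, 15, 9, 12]
enrichment = tc_enrichment(sequence, mutations)
```

A value near 1 means mutations fall in tC contexts about as often as expected by chance; a real analysis would also attach a significance test and correct for multiple samples.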

55. Detection of Trans and Cis Splicing QTLs Through Large-Scale Cancer Genome Analysis Kjong-Van Lehmann1, André Kahles1, Cyriac Kandoth1, William Lee1, Nikolaus Schultz1, Oliver Stegle2, Gunnar Rätsch1 1Memorial Sloan Kettering Cancer Center, New York, New York; 2European Bioinformatics Institute, Hinxton, Cambridge, United Kingdom The comprehensive survey of molecular characteristics provided by The Cancer Genome Atlas (TCGA) enables large scale analyses across multiple cancers. However, sophisticated tools for the joint analysis of the thousands of samples that tackle the cancer specific challenges are needed. In an effort to enable joint analysis, we have re-aligned and re-analyzed RNA and whole exome sequencing data of ~4,000 individuals across 11 cancer types in a uniform manner. We used the newly developed open source SplAdder pipeline to count gene expression as well as annotate and quantify a comprehensive set of alternative splicing events. We identified threefold more high confidence alternative splicing events than annotated in the GENCODE annotation, which reflect cancer-specific and tissue-specific splicing variation. Comparisons to matching tissue normal samples confirm a ~20% increase of splicing complexity in tumor samples. We have identified sets of genes with splicing changes that recurrently occur in tumor samples (>10%) but are virtually never observed in normal samples or ENCODE cell lines (<0.5%) and could be possible targets for new drugs. While population structure is one of the most severe confounding factors in the analysis of quantitative trait loci (QTL), tumor samples open up many new additional challenges. Tumor-specific somatic mutations and recurrence patterns as well as sample heterogeneity can lead to spurious associations. Thus, we have developed a new strategy to perform a common variant association study using linear mixed models on tumor samples enabling us to account for tumor specific genotypic and phenotypic heterogeneity in addition to population structure.
Due to sample size constraints, many previous QTL studies have been limited to the analysis of cis-associated variants. The large sample size available from TCGA enables us to overcome this limitation and discover trans-associated variants as well. We can demonstrate that we find cis-associations for ~10% of the analyzed genes, of which a large fraction replicates across tissue and cancer types. We also confirm recently reported trans-associations in the splice factors U2AF1 as well as SF3B1.

56. Comprehensive Assessment of Cancer Missense Mutation Clustering in Protein Structures Atanas Kamburov, Michael Lawrence, Ignaty Leshchiner, Paz Polak, Kasper Lage, Todd Golub, Eric Lander, Gad Getz Harvard University, Boston, Massachusetts Cancer genome projects are continuously enlarging the catalog of somatic mutations found in tumors. New insights about cancer genes and mechanisms are buried in these data and may be revealed through integration with complementary information such as the (also growing) catalog of protein structures. Here we present a systematic analysis of the recently published PanCancer compendium of mutations from 4,742 tumors in the context of human protein structures currently available in the PDB. We aimed to detect proteins showing significant three-dimensional (3-D) mutation clustering, and structurally resolved molecular interactions showing enrichment of interface mutations. Such mutational patterns may arise from positive selection of cancer mutations and thus can implicate new candidate proteins and interactions in cancer. In addition to several previously known cancer proteins, we found significant 3-D mutation clustering in the kinetochore component NUF2 whose mutations may potentially lead to improper chromosome segregation and to aneuploidy. Importantly, our results suggest that clustering of missense mutations is a feature of tumor suppressors and is not exclusive to oncoproteins, as commonly believed. By directly, systematically testing interaction interfaces between proteins and different other partner molecules (proteins, ligands, DNA and RNA) for enrichment of mutations, we found several interactions likely perturbed in cancer, including FBXW7-CCNE1, HRAS-RASA1, CUL4B-CAND1, OGT-HCFC1, PPP2R1A-PPP2R5C/PPP2R2A, DICER1-Mg2+, MAX-DNA and SRSF2-RNA.
Routine comprehensive analysis of somatic mutations and protein structures will generate a stream of hypotheses regarding a likely role in cancer of specific mutated protein residues and interactions that can feed into systematic follow-up experimental efforts.
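One simple way to test for the kind of 3-D clustering described above is a permutation test on the mean pairwise distance between mutated residues. The abstract does not say which statistic or null model the authors used, so the sketch below is a generic stand-in: the distance statistic, the uniform random null, and the toy coordinates (residues spaced 3.8 units apart on a line) are all assumptions. Real coordinates would come from a PDB structure.

```python
import itertools
import math
import random

def mean_pairwise_distance(coords, residues):
    """Average Euclidean distance over all pairs of the given residues."""
    pairs = list(itertools.combinations(residues, 2))
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)

def clustering_p_value(coords, mutated, n_perm=2000, seed=0):
    """Fraction of random residue sets at least as compact as the observed set."""
    rng = random.Random(seed)
    observed = mean_pairwise_distance(coords, mutated)
    residues = list(coords)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(residues, len(mutated))
        if mean_pairwise_distance(coords, sample) <= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0

# Toy structure: 30 residues on a line (hypothetical coordinates).
coords = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 31)}
mutated_residues = [10, 11, 12, 13]
p_value = clustering_p_value(coords, mutated_residues)
```

Because the statistic uses spatial rather than sequence distance, residues that are far apart in the primary sequence but adjacent in the fold would still register as a cluster.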

60. Virus Analysis in Head and Neck, Bladder, Cervical, and Esophageal Cancers Michael Parfenov, Angeliki Pantazi, Angela Hadjipanayis, Lixing Yang, Semin Lee, Alexei Protopopov, Harvard GCC Team, Peter Park, Jon Seidman, Raju Kucherlapati Harvard GCC Team, Harvard Medical School, Brigham and Women’s Hospital, Boston, Massachusetts; The University of Texas MD Anderson Cancer Center, Houston, Texas We used low pass paired-end whole genome sequencing data to analyze TCGA samples for the presence of viral sequences. To detect viruses, we aligned reads against a set of 4322 viral reference genomes and calculated the percentage of covered viral genome and the number of virus copies per cell. The most abundantly represented virus was Human papillomavirus (HPV). We also examined the physical status of the viral DNA to assess their integration into the host genome at nucleotide resolution by detecting chimeric viral-human reads. Then we comprehensively characterized HPV-positive tumors including association with clinical features, DNA methylation analysis, expression of both human and viral genes, and structural variation analysis of the HPV genome. We analyzed 150 head and neck tumors, 72 bladder tumors, 52 cervical, and 49 esophageal cancers. Among them we detected the presence of a wide range of viruses including different types of HPV, Human herpesvirus types 1, 2, 4, 5, 6A, 6B, and 7, and BK polyomavirus. Of the 102 virus positive cases across four cancer types, eighty five (83%) had HPV. None of the normal samples were HPV-positive. Esophageal cancer was the only tumor type that did not have HPV, whereas 23% head and neck, 4% bladder, and 90% cervical tumors were HPV-positive. Fifty-nine cases (25 head and neck, 2 bladder, and 32 cervical tumors) had integration of the viral genome into one or more locations in the human genome with statistical enrichment for genic regions. 
Many of the integration events occurred in genes that are known to be tumor suppressors or oncogenes. Head and neck and cervical cancers share several common integration target loci such as the RAD51B gene involved in DNA repair and cell cycle control and the tumor protein p63 regulated 1 gene, TPRG1. However, cervical cancer demonstrated its own unique “integration signature”: multiple integrations in the chr8q24.21 region involving the MYC and PVT1 genes. In head and neck cancer, however, we did not detect any integrations in chromosome 8. Integrations had a significant impact on the host genome and were associated with variations in DNA copy number (for example, in the cases of RAD51B, NR4A2 and ETS2 genes), mRNA transcript levels and splicing (in the cases of ETS2 and PDL1), and both inter- and intrachromosomal rearrangements (in the cases of RAD51B, KLF5, TP63 and TPRG1) that were confirmed by FISH analysis for the RAD51B case. Cancers with integrated vs. nonintegrated HPV displayed different patterns of both human and viral gene expression. Moreover, we showed for the first time that head and neck tumors, but not cervical tumors, demonstrated significant differences in DNA methylation signatures between integration-positive and integration-negative tumors. In the tumors that did not have integrated virus, the tumor suppressor genes IRX4 and BARX2 were hypermethylated and underexpressed, whereas the SIM2 and CTSE genes, associated with cancer progression, showed hypomethylation and increased expression. Our data demonstrate an important role of HPV infection in the development of head and neck, cervical, and bladder cancers, but do not support viral oncogenic potential for esophageal carcinoma. Our results show the mechanisms by which HPV interacts with the human genome beyond expression of viral oncoproteins and suggest that specific integration events are essential for viral oncogenesis.
Our analysis with whole genome, transcriptome, and methylation profiling supports the idea that the developmental mechanisms of integration-positive and negative tumors are significantly different.
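The two per-virus quantities named in this abstract, the percentage of the viral genome covered and the number of virus copies per cell, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the half-open interval convention, and the diploid normalization (viral depth divided by per-haploid human depth) are assumptions made for illustration only.

```python
def covered_fraction(intervals, genome_length):
    """Fraction of a viral genome covered by at least one aligned read.

    intervals: (start, end) alignment positions, 0-based and half-open.
    """
    covered = 0
    covered_up_to = 0  # rightmost position already counted
    for start, end in sorted(intervals):
        start = max(start, covered_up_to)  # skip bases counted earlier
        end = min(end, genome_length)
        if end > start:
            covered += end - start
            covered_up_to = end
    return covered / genome_length


def copies_per_cell(viral_bases, viral_length, human_bases, human_length, ploidy=2):
    """Rough viral copy number per cell (assumed normalization).

    viral_bases / human_bases: total aligned bases on each genome.
    human_length: haploid human genome length; ploidy=2 assumes diploid cells.
    """
    viral_depth = viral_bases / viral_length
    human_depth = human_bases / human_length  # depth summed over all copies
    return ploidy * viral_depth / human_depth


# Toy numbers, not data from the study.
reads = [(0, 100), (50, 150), (300, 400)]
print(covered_fraction(reads, 1000))                # 0.25
print(copies_per_cell(80_000, 8_000, 1.5e10, 3e9))  # 4.0
```

Tumor purity and aneuploidy would both shift the copies-per-cell estimate, so the diploid assumption is only a starting point.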


61. Master Regulator and Network Diffusion Analysis Reveals Convergent Cancer Driver Programs Across Pan-Cancer Samples

Evan Paull, Vlado Uzunangelov, Joshua M. Stuart

Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California

Methods: To identify pathway-based subtypes, we created a ‘network signature’ for each sample in a collection of 20 tumor types that are a subset of the TCGA PanCanAtlas dataset. To do this, we first identified the major transcription factors active in each sample using the VIPER algorithm [2]. VIPER was run using as input a broad compendium of literature-based and predicted sets of transcription factor (TF) to target gene interactions. Networks were then constructed by connecting predicted driver mutations and copy number alterations to the activated TFs using our previously developed network diffusion approach, TieDIE [1]. We quantify the explanatory power of the networks using a novel permutation-based test that randomizes the chosen driver events from samples of different tissue types. Clustering of the network signatures reveals groups of samples with similar transcriptional and regulatory patterns that could be explained by a convergent (but non-overlapping) set of genomic alterations.

Results: We found that the inferred protein activities identify similar patterns of TF activation and repression, related to established oncogenic and tumor-suppressor programs, across tissue types. However, in spite of the convergent trend observed in the inferred TF activity, we found that mutation and copy number events specific to each sample were significantly better at explaining the inferred TF activity in the corresponding samples than randomly chosen genomic events from samples of other tissue types. This suggests that our algorithm is finding paths that connect driver events through established interactions in the protein signaling layer and that facilitate either activation or inhibition of transcriptional regulators that control a large fraction of the observed expression signature. In addition, clustering of the TieDIE network signatures summarizes diverse patterns of mutation and copy-number alterations that lead to convergent transcriptional profiles, allowing us to ascribe function to novel genomic events from the network-transformed transcriptional and genomic profiles of the corresponding samples. We tested this hypothesis with a leave-one-out cross validation framework and found that we can recover the functional impact of many known driver events with this approach, suggesting that the combination of transcriptional data, genomic data and pathway context may complement existing methods of driver detection based on genomic sequence. The results of this analysis not only allow for prioritization of mutated or copy-number altered genes, but also identify ‘linking’ genes that may be essential for relaying growth signals (or blocking survival signals) from proteins with altered function or activity. These linking proteins may be essential to the cancer and may be of clinical interest. Because prioritization of these potential vulnerabilities is done on a per-sample basis, our platform is able to function as a prototype for future personalized medicine solutions.

References

1. Paull, Evan O., et al. "Discovering causal pathways linking genomic events to transcriptional states using Tied Diffusion Through Interacting Events (TieDIE)." Bioinformatics 29.21 (2013): 2757-2764.

2. Alvarez, Mariano J., Federico Giorgi, and Andrea Califano. "Using viper, a package for Virtual Inference of Protein-activity by Enriched Regulon analysis." (2014).
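The permutation-based test in abstract 61 compares a sample's own driver events with events drawn from other tissue types. The sketch below shows only the generic shape of such a test, an empirical p-value over random draws. It is not the TieDIE procedure: `score_fn`, `null_pool`, and the toy scoring rule are hypothetical stand-ins for a network-based explanatory score.

```python
import random


def permutation_p_value(observed_score, score_fn, null_pool, n_events,
                        n_permutations=1000, seed=0):
    """Empirical p-value: how often random events score at least as well.

    score_fn:  hypothetical function mapping a list of events to a score.
    null_pool: events drawn from samples of other tissue types (assumed input).
    n_events:  number of events to draw per permutation.
    """
    rng = random.Random(seed)
    at_least_as_good = 0
    for _ in range(n_permutations):
        sampled = rng.sample(null_pool, n_events)
        if score_fn(sampled) >= observed_score:
            at_least_as_good += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_good + 1) / (n_permutations + 1)


# Toy example: the score counts events that touch a fixed set of TF-linked genes.
linked = {"TP53", "MYC"}
score = lambda events: len(linked.intersection(events))
pool = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
p = permutation_p_value(score(["TP53", "MYC"]), score, pool, n_events=2)
print(p)  # no pool event is TF-linked, so p equals 1 / 1001
```

The add-one correction is a common convention for permutation tests; the abstract does not say which estimator was used.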


62. A Pan-Cancer Catalogue of Cancer Driver Protein Interaction Interfaces

Eduard Porta-Pardo, Thomas Hrabe, Adam Godzik

Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, La Jolla, California

Despite the importance of protein-protein interaction (PPI) interfaces in maintaining the integrity of all cellular pathways, the role of mutations on these interfaces as cancer drivers, though known for specific examples, has not been systematically studied. We analyzed missense somatic mutations in a pan-cancer cohort of 5,989 tumors from 23 projects of The Cancer Genome Atlas (TCGA) for enrichment on PPI interfaces using e-Driver, an algorithm that analyzes the mutation pattern of specific protein regions such as PPI interfaces. We identified 128 PPI interfaces enriched in somatic cancer mutations. Twenty-eight of them are found in well-established cancer driver genes, particularly those in critical network positions, showing that their mechanism of action involves altering PPI interfaces. The remaining 100 represent novel cancer driver predictions. Integrating these findings with clinical information, we show examples of how tumors driven by the same gene can have different behaviors, including patient outcomes, depending on which specific interfaces are mutated.
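One simple way to test whether missense mutations are enriched on a region such as a PPI interface is a one-sided binomial test against the region's share of the protein length. The sketch below illustrates that idea under this assumption; it is not a reimplementation of e-Driver, and the function and parameter names are hypothetical.

```python
from math import comb


def binomial_enrichment_p(mutations_in_region, total_mutations,
                          region_length, protein_length):
    """One-sided binomial p-value for mutation enrichment in a region.

    Null hypothesis (assumed): each mutation lands in the region with
    probability region_length / protein_length.
    """
    p = region_length / protein_length
    return sum(
        comb(total_mutations, k) * p**k * (1 - p) ** (total_mutations - k)
        for k in range(mutations_in_region, total_mutations + 1)
    )


# Toy example: 12 of 20 mutations fall in an interface spanning 10% of the protein.
print(binomial_enrichment_p(12, 20, region_length=50, protein_length=500))
```

In a pan-cancer scan over many interfaces, the resulting p-values would also need a multiple-testing correction.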


63. Putative Driver Roles of RNA Editing in Tumorigenesis Based on Editome Analysis of 9 Major Cancer Types

Si Qiu and Xion Heng

BGI-Shenzhen, China

RNA editing is emerging as an important player in tumorigenesis through post-transcriptional modification. Here, we developed a robust computational approach to accurately identify RNA edits using paired tumor-normal transcriptome data. This allows us to profile the editome landscape in 504 patients across 9 major cancer types in The Cancer Genome Atlas. Based on editome-wide analysis, we identified significant global hyper-editing in bladder urothelial carcinoma (BLCA), breast invasive carcinoma (BRCA), colon adenocarcinoma (COAD), head and neck squamous cell carcinoma (HNSC), and thyroid carcinoma (THCA), and significant global hypo-editing in kidney renal clear cell carcinoma (KIRC). In HNSC and lung adenocarcinoma (LUAD), patients with high editing frequency in the tumor have a poor prognosis. To explore the role of editing in tumorigenesis, we designed a method and developed a bioinformatic pipeline for detecting cancerous differential editing frequency sites (DESs), which have significantly higher or lower editing frequency in tumor than in normal tissue. We identified 28,403 cancerous DESs and found that non-synonymous cancerous DESs are enriched in reported cancer-related genes. In addition, pathway analysis shows that non-synonymous and 3’UTR cancerous DESs are enriched in cancer pathways in a complementary way. Taken together, RNA editing appears to play a putative driver role in tumorigenesis. Our work enhances the understanding of the RNA editome across 9 major cancer types and shows its potential for future clinical application.


65. A New Molecular Signature Approach for Prediction of Driver Cancer Pathways From Transcriptional Data

D.S. Rykunov, H. Li, A. Usilov, E.E. Schadt, B.A. Reva

Icahn School of Medicine at Mount Sinai, New York, New York

Assigning cancer patients to the most effective treatments requires an understanding of the molecular basis of their disease. While DNA-based molecular profiling approaches have flourished over the past several years to transform our understanding of driver pathways across a broad range of tumors, a systematic characterization of key driver pathways based on RNA data has not been undertaken. Here we introduce a new approach to predict the status of driver cancer pathways based on weighted sums of gene expression values, or signature functions, derived from RNA sequencing data. To identify the driver cancer pathways of interest, we mined DNA variant data from TCGA and nominated driver alterations in seven major cancer pathways in breast, ovarian, and colon cancer tumors. The activation status of these driver pathways was then characterized using RNA sequencing data by constructing signature functions in training datasets and then testing the accuracy of the signatures in test datasets. The signature functions perform well in separating tumors with nominated active pathways from tumors with no genomic signs of activation (average AUC of 0.83), systematically exceeding the accuracies obtained by the SVM method that we employed as a control approach. A typical pathway signature is composed of ~20 biomarker genes that are unique to a given pathway and cancer type. Our results confirm that driver genomic alterations are distinctively displayed at the transcriptional level and that transcriptional signatures can generally provide an alternative to DNA sequencing methods in detecting specific driver pathways.
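A signature function of the kind described in abstract 65, a weighted sum of gene expression values scored by AUC, can be sketched as follows. The gene names, weights, and expression values are invented for illustration, and the abstract does not specify how the weights are derived, so this shows only the scoring and evaluation steps.

```python
def signature_score(expression, weights):
    """Weighted sum of expression values over the signature genes."""
    return sum(w * expression[gene] for gene, w in weights.items())


def auc(scores_active, scores_inactive):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    wins = 0.0
    for a in scores_active:
        for b in scores_inactive:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_active) * len(scores_inactive))


# Hypothetical two-gene signature and toy expression profiles.
weights = {"GENE_A": 0.5, "GENE_B": -1.0}
active = [{"GENE_A": 8.0, "GENE_B": 1.0}, {"GENE_A": 6.0, "GENE_B": 0.5}]
inactive = [{"GENE_A": 2.0, "GENE_B": 2.0}, {"GENE_A": 3.0, "GENE_B": 1.0}]

active_scores = [signature_score(t, weights) for t in active]
inactive_scores = [signature_score(t, weights) for t in inactive]
print(auc(active_scores, inactive_scores))  # 1.0 for this toy data
```

An AUC of 0.5 corresponds to a signature that separates the two groups no better than chance, which is the baseline for reading the reported average of 0.83.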


66. A New Approach for Prediction of Molecular Signatures of Outcome in Cancer

D.S. Rykunov, E.E. Schadt, B.A. Reva

Icahn School of Medicine at Mount Sinai, New York, New York

Stratification of cancer patients into different risk groups is one of the key tasks in the development of personalized cancer medicine. Driven by the hypothesis that the aggressiveness of cancer (and disease outcome) is associated with distinct genomic and transcriptional features, we developed a molecular signature approach for prediction of disease outcome given a transcriptional or genomic profile of a tumor. The signatures of outcome were derived from transcriptional profiles of TCGA tumors with available survival information.

In the first step, the algorithm evaluates genes as candidate biomarkers and stratifies a given training set of tumors into two survival classes with respect to individual biomarkers. To this end, for each of ~20K genes, tumors of a training set are sorted by the gene’s expression level (or by the number of genomic alterations, i.e., mutations and copy number variations) to determine the maximal difference in survival (estimated by the logrank test) over all possible separations of tumors into two classes by the gene’s expression level (or number of genomic alterations). In the second step, the top biomarker candidates are combined into a signature function: a weighted sum of expression values (or numbers of genomic alterations). The signature function is used as a collective biomarker: tumors are sorted by the value of the signature function and the two most distinct survival classes are determined by considering all possible two-class separations. In constructing the signature function, we assumed that each of the individual biomarkers is a “weak” marker that differentiates the more aggressive and less aggressive forms of disease. Under this assumption, the biomarker weights in the signature function can be computed analytically.

Because there is no a priori division of tumors into survival groups, stratification will always be specific to the given data. To increase the robustness of survival stratification, we randomly split an original set of tumor profiles into two approximately equal sets for training and testing. Then, a survival signature derived on a training set was tested by how well it separated tumors of a test set into two survival classes; this was repeated ~8,000 times for robustness. The multiple randomized tests make it possible to (i) rank candidate biomarkers by association with survival and by frequency of occurrence, and (ii) compute a probability for each tumor to be found in a poor (or better) survival class in the averaged “environment” of other tumors for both training and test sets. The computed probabilities were used to stratify tumors into two survival classes.

We applied the signature approach to transcriptional profiles of head and neck, ovarian and breast cancers and obtained very distinct separation of tumors into poor and better survival classes. In particular, the probability-driven separation of tumors of test sets produced P values of 2×10⁻¹² for head and neck squamous cell carcinoma (263 tumors); 5×10⁻¹¹ for ovarian serous cystadenocarcinoma (196 tumors); 6×10⁻⁷ for the estrogen receptor-positive and HER2-negative subtype of breast cancer (327 tumors); and 2×10⁻⁵ for the triple-negative breast cancer subtype (81 tumors); the fractions of the survival classes in the studied cancers were comparable. The P values of the survival difference obtained for the combined signatures are substantially lower than any of the P values obtained for individual genes. This illustrates the power of the general approach of combining individual biomarkers into a consistent signature of outcome.
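The first step described in abstract 66, sorting tumors by one gene's expression and scanning every two-class separation for the largest logrank difference, can be sketched in plain Python. This is an illustrative sketch, not the authors' code: the function names are hypothetical, ties in expression are ignored, and the p-value uses the standard chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt


def logrank_chi2(times, events, in_group1):
    """Two-group logrank chi-square statistic (1 degree of freedom).

    times: follow-up times; events: 1 if death observed, 0 if censored;
    in_group1: booleans marking membership of the first group.
    """
    observed1 = expected1 = variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)                       # at risk
        n1 = sum(1 for x, g in zip(times, in_group1) if g and x >= t)
        d = sum(1 for x, e in zip(times, events) if e and x == t)  # deaths
        d1 = sum(1 for x, e, g in zip(times, events, in_group1)
                 if e and g and x == t)
        observed1 += d1
        expected1 += d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return (observed1 - expected1) ** 2 / variance


def best_split(expression, times, events):
    """Scan all splits of tumors sorted by expression; return the best one.

    Returns (k, chi2, p): the k lowest-expression tumors form group 1.
    """
    order = sorted(range(len(expression)), key=lambda i: expression[i])
    best_chi2, best_k = 0.0, None
    for k in range(1, len(order)):
        low = set(order[:k])
        chi2 = logrank_chi2(times, events, [i in low for i in range(len(order))])
        if chi2 > best_chi2:
            best_chi2, best_k = chi2, k
    return best_k, best_chi2, erfc(sqrt(best_chi2 / 2))


# Toy data: the three low-expression tumors survive much longer.
k, chi2, p = best_split([1, 2, 3, 4, 5, 6], [10, 11, 12, 1, 2, 3], [1] * 6)
print(k, round(chi2, 2), p < 0.05)  # 3 5.05 True
```

Because the split is chosen to maximize the statistic, the p-value of the best split is optimistic on the data used to find it, which is one reason repeated training and test splits of the kind the abstract describes are needed.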


67. TCGA SpliceSeq: An Interactive Exploration of Splicing Variation Across TCGA Tumor Types

Michael C. Ryan1,2, James Cleland2, Wing Chung Wong2, RyangGuk Kim2, John N. Weinstein1

1The University of Texas MD Anderson Cancer Center, Houston, Texas; 2In Silico Solutions, Fairfax, Virginia

TCGA’s RNASeq databases represent one of the largest collections of cancer transcriptomes ever assembled. RNASeq technology combined with computational tools like SpliceSeq [1] provides a comprehensive, detailed view of alternative mRNA splicing. Aberrant splicing patterns have been implicated in tumor development, dedifferentiation, and metastasis. Several therapeutic agents designed to target or modify transcript spliceforms in tumor cells are in early-stage development. TCGA SpliceSeq is a web-based resource that provides a quick, user-friendly method for exploring the alternative splicing patterns of TCGA tumors. Percent Spliced In (PSI) values for splice events on representative samples from 24 different tumor types, including adjacent normal samples when available, have been loaded into TCGA SpliceSeq. We are currently using a high-performance cluster to load the entire collection of TCGA RNASeq samples. Investigators can interrogate genes of interest to them or search for the genes that show the strongest variation across selected tumor types or between tumor and adjacent normal samples. The interface presents intuitive graphical representations of splicing patterns, read counts, and various statistical summaries, including percent spliced in. Selected data can be downloaded for inclusion in integrative analyses. TCGA SpliceSeq is freely available for academic, government, or commercial use at http://projects.insilico.us.com/TCGASpliceSeq.

Reference

1. Ryan MC, Cleland J, Kim R, Wong WC, Weinstein JN. SpliceSeq: A Resource for Analysis and Visualization of RNA-Seq Data on Alternative Splicing and Its Functional Impacts. Bioinformatics, 10.1093, 2012.

This project is made possible by the following funding sources:

NCI/NIH Grant no. U24CA143883 (MD Anderson TCGA Genome Data Analysis Center)

The Michael & Susan Dell Foundation: The Lorraine Dell Program in Bioinformatics for Personalization of Cancer Medicine

The H.A. & Mary K. Chapman Foundation

The MD Anderson Cancer Center Support Grant
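The Percent Spliced In (PSI) value used throughout abstract 67 is commonly defined as the share of reads supporting inclusion of a splice event among all reads informative for that event. The sketch below uses that common definition with raw read counts. It is an assumption for illustration: SpliceSeq's own calculation may normalize reads differently, and the function name is hypothetical.

```python
def percent_spliced_in(inclusion_reads, exclusion_reads):
    """PSI as inclusion / (inclusion + exclusion); None when no reads inform the event."""
    total = inclusion_reads + exclusion_reads
    if total == 0:
        return None
    return inclusion_reads / total


# Toy read counts for one exon-skip event in a tumor and its adjacent normal.
tumor_psi = percent_spliced_in(30, 10)
normal_psi = percent_spliced_in(10, 30)
print(tumor_psi, normal_psi, tumor_psi - normal_psi)  # 0.75 0.25 0.5
```

The tumor-minus-normal difference is one simple way to rank events by splicing variation between tumor and adjacent normal samples.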


68. Cross-Tumor Analysis and Characterization of Somatic and Germline Mutations in The Cancer Genome Atlas

Sandeep Sanga, Tod M. Klingler

Station X, Inc., San Francisco, California

Our genomes are going to be an integral part of our future health care. On this promise, exome and genome sequencing are rapidly being integrated into the practice of medicine, presenting opportunities in characterizing rare diseases, managing patient-specific treatment such as in cancer, prenatal screening for disease risk, and identifying novel pharmacogenomics biomarkers. Recently, the American College of Medical Genetics and Genomics (ACMG) published its recommendations for reporting incidental findings in clinical exome and genome sequencing. They recommend that laboratories performing germline clinical sequencing seek and report mutations of the specified classes or types in a specified set of genes. Likewise, characterization of somatic mutations in tumor samples is helping to guide treatment strategy by linking cancer genotypes to clinically actionable knowledge. The clinical utility of reporting mutations based on the ACMG recommendations remains unclear until more data is available.

Here, in an effort to help assess the clinical utility of the ACMG guidelines for reporting on incidental findings, we apply the recommended guidelines to more than 4,000 germline exome samples currently available as part of The Cancer Genome Atlas, and report our findings. Likewise, in an effort to identify somatic markers unique to cancer indications, we profiled the somatic mutations for primary tumors across more than 10,000 patients representing more than 25 different cancer indications that are available as part of The Cancer Genome Atlas. We focus this somatic mutation profiling on cancer-associated genes as defined by the Cancer Gene Census maintained by the Catalogue of Somatic Mutations in Cancer (COSMIC), and apply cluster analysis to identify patterns unique to cancer types as well as common across indications. In order to accomplish these analyses rapidly across thousands of samples, we leverage the regularly updated version of The Cancer Genome Atlas data that Station X (San Francisco, CA) maintains in GenePool® (https://stationx.mygenepool.com/), a cloud-based system for the secure storage, management, analysis, visualization, interpretation, and sharing of large-scale human genomics data.


69. The Landscape of T Cell Infiltration in Human Cancer and Its Association With Antigen Presenting Gene Expression

Yasin Şenbabaoğlu1, Andrew G. Winer2, Ron S. Gejman3,11, Ming Liu4, Augustin Luna1, Irina Ostrovnaya5, Nils Weinhold1, William Lee1,6, Samuel D. Kaffenberger2, Ying-Bei Chen7, Martin Voss8, Paul Russo2, Jonathan A. Coleman2, Victor E. Reuter7, Timothy A. Chan6,9,11, David A. Scheinberg3,10,11, Ming O. Li4, James J. Hsieh8,9, Chris Sander1, A. Ari Hakimi1,2

1Computational Biology Center, 2Urology Service, Department of Surgery, 3Molecular Pharmacology and Chemistry Program, 4Immunology Program, 5Department of Epidemiology and Biostatistics, 6Department of Radiation Oncology, 7Department of Pathology, 8Genitourinary Oncology, Department of Medicine, 9Human Oncology and Pathogenesis Program, 10Department of Medicine, 11Weill Cornell Medical College, Memorial Sloan Kettering Cancer Center, New York, New York

Infiltrating T cell subsets in the tumor microenvironment play crucial roles in the competing processes of antitumor immune response and tumor-induced immunosuppression. However, the infiltration level of distinct T cell subsets and their association with the expression of antigen presenting machinery (APM) genes remain poorly characterized across human cancers. Here, we defined novel mRNA-based T cell infiltration scores (TIS) to profile infiltration levels in 19 tumor types and identified clear cell renal cell carcinoma (ccRCC) as the highest for TIS and among the highest for the correlation between TIS and APM expression. To further characterize the immune infiltration in ccRCC, we computationally determined the infiltration levels of 24 adaptive and innate immunity cell types in a discovery cohort of 415 patients profiled by The Cancer Genome Atlas consortium and validated our findings in an independent ccRCC cohort of 101 patients. An integrated analysis revealed three clusters of tumors primarily separated by levels of T cell infiltration and APM gene expression, but unlikely to be driven by recurrent driver mutations, copy-number alterations or tumor neo-antigens. Specific T cell infiltration assessment revealed Th17 cells and the CD8+ T/Treg ratio to be pro-survival, whereas Th2 cells and Tregs were associated with negative outcome. Our analysis identifies cancer type-specific immune infiltrates and ccRCC-specific associations between immune infiltrates and prognosis, and opens up the opportunity to utilize computational immune decomposition in a variety of clinically meaningful scenarios.


110. A Comprehensive Guide for Managing Large-Scale Collaborative Genomics Research Projects

Sheth, Margi

Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030

Globally applicable best-practice guidelines for managing large-scale collaborative genomics projects have been established using lessons learned from the successes and challenges of The Cancer Genome Atlas (TCGA). As the cost of genomic sequencing decreases, more and more researchers are leveraging genomic data to inform the biology of disease. The amount of genomic data generated is growing exponentially, and protocols need to be established for the long-term storage, dissemination, and regulation of these data for research. The authors aim to create a comprehensive guide to managing research projects involving genomic data, as learned through the evolution of the TCGA program over the last decade. This project was primarily carried out in the US, but the impact and lessons learned can be applied to an international audience.

The guide will serve to:

• Establish a framework for managing large-scale genomic research projects involving multiple collaborators
• Describe lessons learned through TCGA to prepare for potential roadblocks
• Evaluate policy considerations that are needed to avoid pitfalls
• Recommend strategies to make project management more efficient
• Educate readers on practical considerations and stakeholder applications regarding each step of the project

The guide will cover operational procedures, policy considerations, and lessons learned through TCGA on topics such as:

• Sample acquisition
• Data generation
• Data storage and dissemination
• Data analysis efforts
• Quality control, auditing and reporting
• Formation of analysis working groups for consortium publications

Analysis of TCGA’s programmatic and policy decisions since 2006 provides insight into successful practices. Collaborative spirit, vital to its success, was maintained through incentivizing participation in analysis working groups, publishing with a single network author, and allowing participants to gain early access to project data. TCGA was managed centrally by NIH offices, which streamlined project management activities overall. Sample and clinical data quality was maintained by evaluation of tissue provider practices through a review board, use of a central repository for sample receipt and distribution, and the use of a multi-stage payment plan per sample enrolled. Streamlined data analysis, storage, and dissemination occurred through a tightly controlled data coordination center, which, among other activities, provided to the public the precise datasets used for each analysis publication.

This work was done in collaboration with Jiashan Zhang and Jean C. Zenklusen.

The Cancer Genome Atlas Program Office, The National Cancer Institute, The National Institutes of Health, Bethesda MD 20892


70. Detection and Characterization of Gene Fusions in Liver and Kidney Cancer

Eve Shinbrot, Kyle Covington, Liu Xi, Richard R. Gibbs, David A. Wheeler

Human Genome Sequencing Center, Baylor College of Medicine, Houston, Texas

Gene fusions are important mediators of carcinogenesis. Fusion genes can produce a functional effect by overexpression of oncogenes, inactivation of tumor suppressor genes, or by creating novel functions for a protein. They present a therapeutic target and also constitute important diagnostic and prognostic markers. In this study we identified RNA fusions from RNA-seq data in kidney (KIRP, KICH, KIRC) and liver (including bile duct) cancers and contrasted the fusions that occur in cancers of these organ systems. We identified both known and novel gene fusions for each cancer subtype.

In LIHC we identified the known DNAJB1-PRKACA fusions, a marker for fibrolamellar HCC (FL-HCC), a subtype that comprises 2% of the TCGA cohort. FL-HCC is an extremely rare liver cancer (< 1%), occurs in younger patients, does not respond to chemotherapy, and contains few coding mutations. The DNAJB1-PRKACA fusion is found in 80% or more of this subtype and carries both prognostic and therapeutic implications: treatment of these patients is different from standard HCC. Two of the HCC patients with DNAJB1-PRKACA were reclassified as a result of this finding. Other important fusions we have identified in this data set include TCLF7 fusions (previously identified in CRC) and MET, ERBB2, FGFR2, TERT, XRCC6/5, and MAML3 fusions. We also found fusions involving tumor suppressor (TS) genes; these are out of frame and would lead to inactivation of the TS genes. Thus, fusion events are an important mechanism of TS knockout, corroborating what was shown in the LAML cancers. We are currently investigating these fusions and their role in LIHC. We also identified fusion transcripts characteristic of cholangiocarcinoma. Provisionally, these tumors were reassigned to CHOL, although they may eventually be designated an intermediate subtype.

In TCGA KIRP, fusions involving the microphthalmia (MiT) family genes (TFE3, TFEB, MITF) are found in 1-5% of sporadic RCC tumors. Tumors containing TFE3 fusions often contain mutations in chromatin genes without other known drivers. We found eight tumors (5%) that contained MiT family fusions. Four of those harbor previously identified fusion partners of TFE3 (3 partnered with PRCC and 1 with SFPQ). The two remaining TFE3 fusions were with novel partners, RBM10 and DVL2. In addition, we found two TFEB gene fusions, both with novel gene partners. We also found oncogenic kinase fusions (MET, ALK, and FGFR3) in this data set. These fusions provide evidence for mechanisms of oncogenic activation in addition to single base pair substitutions and have both prognostic and potential therapeutic relevance to this project.


71. Comparing Cancer Cell Lines and Tumor Samples by Genomic Profiles

Rileen Sinha1,2, Nikolaus Schultz1, Chris Sander1

1Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York; 2Department of Genetics and Genomic Sciences, Icahn Institute of Genomics and Multiscale Biology, New York, New York

Cancer cell lines are often used in laboratory experiments as models of tumours, although they can have substantially different genetic and epigenetic profiles compared to tumours. We have developed a general computational method, TumorComparer, to systematically quantify similarities and differences between tumour materials using detailed genetic and molecular profiles. The comparisons can be flexibly tailored by placing a higher weight on functional alterations of interest (‘weighted similarity’). In a first pan-cancer application, we have compared 260 cell lines from the Cancer Cell Line Encyclopaedia (CCLE) and 1914 tumours of six different cancer types from The Cancer Genome Atlas (TCGA), using weights to emphasize genomic alterations that frequently recur in tumours. We report the potential suitability of particular cell lines as tumour models and identify unsuitable outlier cell lines for each of the six cancer types. In future, the weighted similarity method can be applied in a clinical setting to compare patient profiles, when sufficient data is available to combine cancer genomic profiles with appropriately weighted clinical attributes, such as diagnosis, treatment regimen and response to therapy.
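The idea of a ‘weighted similarity’ between a cell line and a tumour can be illustrated with a very small sketch: the weighted fraction of features on which two profiles agree. This is an assumed, simplified form chosen for illustration and is not the TumorComparer formula. The feature names, weights, and the use of discrete alteration calls are all hypothetical.

```python
def weighted_similarity(profile_a, profile_b, weights):
    """Weighted fraction of features on which two profiles agree.

    profile_a, profile_b: discrete alteration calls per feature
                          (for example 1 = altered, 0 = wild type).
    weights: non-negative importance of each feature, e.g. higher for
             alterations that recur frequently in tumours.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not all be zero")
    agreeing = sum(w for a, b, w in zip(profile_a, profile_b, weights) if a == b)
    return agreeing / total


# Hypothetical features: [TP53 mutation, KRAS mutation, MYC amplification]
cell_line = [1, 0, 1]
tumour = [1, 1, 1]
weights = [2.0, 1.0, 1.0]  # TP53 weighted more heavily in this toy example
print(weighted_similarity(cell_line, tumour, weights))  # 0.75
```

With uniform weights the same pair would score 2/3, so the weighting raises the similarity here because the profiles agree on the most heavily weighted feature.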


73. Analysis of Somatic Mutation in miRNA Across 30 Different Cancer Types
Yumeng Wang
Baylor College of Medicine, Houston, Texas

MicroRNAs (miRNAs) are small, non-coding RNAs that function in post-transcriptional gene regulation through interactions with target mRNAs. miRNAs can block protein translation and initiate mRNA cleavage and degradation [1]. Mutations in these small molecules could alter miRNA behavior and profoundly affect gene regulatory pathways. In this project, we systematically analyzed somatic mutations in miRNA regions across 30 different cancer types using mutation data from The Cancer Genome Atlas (TCGA), the International Cancer Genome Consortium (ICGC), and other public resources [2]. Overall, miRNA mutation is a rare event in tumors. Of roughly 13,000,000 somatic mutations analyzed, only 737 were located in mature miRNA regions. Each cancer type has a slightly different miRNA mutation frequency, but no significant enrichment was observed in any specific type. At the level of individual miRNAs, recurrent mutations were detected in several miRNAs, including hsa-miR-124-3p, hsa-miR-142-3p, and hsa-miR-518a-5p, which have been reported as potential contributors to cancer development [3-5]. These recurrent mutations came from various cancer types, suggesting a vital role in fundamental cell proliferation and survival. However, because miRNA mutations are infrequent, no mutation recurred more than 10 times among the 7,042 samples we analyzed from the Alexandrov et al. dataset [2]. Therefore, no miRNA mutation hotspot was identified. In conclusion, after comprehensively identifying all miRNA mutations, we found that miRNA mutation is uncommon in the 30 cancer types we studied, and no significantly recurrent mutation was found in any miRNA or cancer subtype. However, miRNAs with relatively high recurrence are interesting targets worth further investigation, and they may play an important role in unraveling gene regulatory pathways in tumors.

References
1. Valencia-Sanchez, Marco Antonio, et al. "Control of translation and mRNA degradation by miRNAs and siRNAs." Genes & development 20.5 (2006): 515-524.
2. Alexandrov, Ludmil B., et al. "Signatures of mutational processes in human cancer." Nature (2013).
3. Xu, Xianglai, et al. "MicroRNA-124-3p inhibits cell migration and invasion in bladder cancer cells by targeting ROCK1." J Transl Med 11.276 (2013): 10-1186.
4. Lin, Rui‐Jun, et al. "MiR‐142‐3p as a potential prognostic biomarker for esophageal squamous cell carcinoma." Journal of surgical oncology 105.2 (2012): 175-182.
5. Baffa, Raffaele, et al. "MicroRNA expression profiling of human metastatic cancers identifies cancer gene targets." The Journal of pathology 219.2 (2009): 214-221.
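
The recurrence check described in abstract 73 can be illustrated with a minimal sketch. It assumes mutation calls are available as (sample, miRNA) pairs; the function names and toy input are hypothetical and are not taken from the study. Only the totals 737 and 13,000,000 come from the abstract.

```python
from collections import Counter

def recurrence_counts(mutations):
    """Count distinct samples carrying a mutation in each mature miRNA.

    `mutations` is an iterable of (sample_id, mirna_name) pairs. A sample is
    counted once per miRNA even if it carries several mutations there.
    """
    unique_pairs = set(mutations)
    return Counter(mirna for _sample, mirna in unique_pairs)

def hotspots(counts, min_samples):
    """Return miRNAs mutated in more than `min_samples` samples."""
    return sorted(m for m, n in counts.items() if n > min_samples)

# Hypothetical toy input, not data from the study.
toy_mutations = [
    ("S1", "hsa-miR-124-3p"),
    ("S2", "hsa-miR-124-3p"),
    ("S2", "hsa-miR-124-3p"),  # duplicate call in the same sample
    ("S3", "hsa-miR-142-3p"),
]
toy_counts = recurrence_counts(toy_mutations)
toy_hotspots = hotspots(toy_counts, min_samples=10)  # threshold from the abstract

# Share of analyzed somatic mutations located in mature miRNA regions,
# using the totals quoted in the abstract.
mature_fraction = 737 / 13_000_000
```

With the quoted totals, `mature_fraction` is roughly 0.0057%, which is in line with the abstract's statement that miRNA mutation is a rare event.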

74. Integrated Analysis of Pan-Cancer Gene Fusion from RNA-Seq Data
Jiayin Wang1, Ken Chen5, Michael D. McLellan1, Michael C. Wendl1, Li Ding1,2,3,4

1The Genome Institute, 2Department of Medicine, 3Department of Genetics, 4Siteman Cancer Center, Washington University in St. Louis, St. Louis, Missouri; 5Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, Texas

Gene fusions comprise an important class of driver event in the development of cancer, with important prognostic implications and valuable potential as therapeutic targets in personalized approaches to precision medicine. Large-scale data sets from The Cancer Genome Atlas (TCGA) offer opportunities for identifying both known and novel gene-fusion events. In particular, the matched nature of the tumor/normal data allows for integrated analyses of fusions with other types of genetic variants. Toward this end, we analyzed the TCGA RNA-Seq data from 3,322 cases across 12 cancer types. We employed BreakFusion, TopHat2 (TopHat-Fusion), and ChimeraScan to detect gene fusion events, ultimately identifying 233 candidates, of which 14 were associated with known cancer genes, including ERG fusions with partners AGXT2L2, NDRG1, and TMPRSS2. These preliminary candidates will be further vetted in order to develop an automated fusion detection and analysis pipeline and to extend the analysis to all available cases across more cancer types. Integrated analysis of gene fusions with other data, including copy number, expression, mutation, and methylation data, can be utilized to identify treatments that optimize patient outcomes.
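
One way to combine the output of several fusion callers is sketched below. The abstract does not say how the three tools' calls were merged or filtered, so the tool-support threshold, function names, and toy calls are assumptions made purely for illustration.

```python
from collections import defaultdict

def merge_fusion_calls(calls_by_tool):
    """Map each (5' gene, 3' gene) pair to the set of tools that reported it."""
    support = defaultdict(set)
    for tool, calls in calls_by_tool.items():
        for five_prime, three_prime in calls:
            support[(five_prime, three_prime)].add(tool)
    return dict(support)

def candidates_with_known_genes(support, known_genes, min_tools=1):
    """Keep fusions with enough tool support that involve a known cancer gene."""
    return sorted(
        pair for pair, tools in support.items()
        if len(tools) >= min_tools and known_genes.intersection(pair)
    )

# Hypothetical toy calls; gene pairs and tool outputs are illustrative only.
toy_calls = {
    "BreakFusion": [("TMPRSS2", "ERG"), ("GENE_A", "GENE_B")],
    "TopHat-Fusion": [("TMPRSS2", "ERG")],
    "ChimeraScan": [("NDRG1", "ERG"), ("TMPRSS2", "ERG")],
}
toy_support = merge_fusion_calls(toy_calls)
toy_known = candidates_with_known_genes(toy_support, known_genes={"ERG"}, min_tools=2)
```

Requiring support from at least two tools is a common way to trade sensitivity for specificity; whether the authors applied such a rule is not stated.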

76. Continental Ancestry Inference and Admixture Deconvolution of TCGA and ICGC Cohorts From Whole Genome Sequencing Data of the Germline
Dai-Ying Wu1, Suyash Shringarpure2, Tal Shmaya1, Genevieve Wojcik2, Christopher Gignoux2, The PCAWG Network, Carlos D. Bustamante2, Francisco M. De La Vega1,2

1Annai Systems Inc., Burlingame, California; 2Stanford University School of Medicine, Stanford, California

Numerous studies have shown a relationship between ancestry and penetrance of disease susceptibility variants. Population-specific variants can cause changes in the pharmacokinetics and effectiveness of drugs, leading different populations to respond differently to cancer treatments. Unfortunately, ethnicity is often not present in the metadata of research cohorts, and when it is present, it is self-reported and often inaccurate. We aim to explore the relationship between ancestry, cancer etiology, and outcomes using over 2,200 samples from TCGA and ICGC for which whole genome sequencing data of the germline is available. Samples were sequenced with Illumina technology as part of the TCGA and ICGC projects, and realigned to the GRCh37 (hg19) human reference with BWA-MEM as part of the uniform alignment workflow for the ICGC PanCancer Analysis of Whole Genomes (PCAWG) Project. We performed variant calling from these BAMs using the rtgVariant software and completed this task within two months using fewer than 300 CPU cores on the Annai-ShareSeq Genomic Resource. We developed a maximum likelihood method that uses a previously selected set of 4,235 ancestry-informative SNPs and their frequencies across continental populations (the Americas, Africa, East Asia, Europe, and South Asia) to extend and verify the broad ethnic categories provided by the TCGA project (the ICGC DCC does not have this metadata). For individuals of recent admixed ancestry (e.g., African Americans), we apply the RFMix software, a fast discriminative modeling approach, to identify ancestral chromosomal segments along the genome. We then annotate these germline variants to identify potentially deleterious variants across cancer genes of clinical interest, and report allele frequencies across these genes, stratified by continental ancestry.

Ancestry information derived from sequence data, including the separation of ancestry segments in admixed individuals, is an important covariate to control for in the rare-variant association analyses to be carried out in the PCAWG project and beyond. In addition, calculating the allele frequency of genomic variants is a critical step in determining variant pathogenicity during clinical genetic testing. A key limitation in the field is the lack of population control data of known (or estimated) ethnicity. We believe the data we generated here will add to the growing resources of allele frequencies for clinical genomics applications, enabling clinicians to make better assessments of pathogenicity when diagnosing rare genetic diseases.
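
A maximum likelihood assignment from ancestry-informative SNPs might look like the sketch below. The abstract does not give the likelihood model, so this assumes independent SNPs, Hardy-Weinberg genotype probabilities within each population, and assignment to a single population (no admixture). Population labels, frequencies, and genotypes are hypothetical.

```python
import math

def log_likelihood(genotypes, alt_freqs):
    """Log-likelihood of genotypes under one population's allele frequencies.

    `genotypes` holds alternate-allele counts (0, 1, or 2) per SNP and
    `alt_freqs` the matching alternate-allele frequencies. SNPs are assumed
    independent and in Hardy-Weinberg equilibrium (an assumption of this sketch).
    """
    total = 0.0
    for g, p in zip(genotypes, alt_freqs):
        p = min(max(p, 1e-6), 1 - 1e-6)  # avoid log(0) for fixed alleles
        total += math.log(math.comb(2, g)) + g * math.log(p) + (2 - g) * math.log(1 - p)
    return total

def most_likely_population(genotypes, freqs_by_population):
    """Return the best-scoring population and all log-likelihoods."""
    scores = {
        pop: log_likelihood(genotypes, freqs)
        for pop, freqs in freqs_by_population.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical frequencies for three SNPs in two placeholder populations.
toy_freqs = {
    "POP_A": [0.9, 0.1, 0.8],
    "POP_B": [0.1, 0.9, 0.2],
}
toy_genotypes = [2, 0, 2]
best_pop, pop_scores = most_likely_population(toy_genotypes, toy_freqs)
```

For admixed individuals this single-label assignment is not adequate, which is why the abstract turns to local-ancestry inference with RFMix for that case.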

78. Dissecting the Clinical Prognostic and Predictive Utility of Cancer Genomic Data Across Tumor Types
Yuan Yuan, Eliezer M. Van Allen, Larsson Omberg, Levi A. Garraway, Adam A. Margolin, Gad Getz, Han Liang

The Cancer Genome Atlas (TCGA) project represents the largest effort to systematically characterize the molecular profiles of human cancers. Over the last several years, TCGA has generated tremendous amounts of genomic, transcriptomic, epigenetic, and proteomic data from a large number of patient samples in multiple cancer types using consistent high-throughput characterization technologies. A central question for the cancer research community is how to use these large-scale molecular data to guide cancer care. We investigated two potential clinical applications of TCGA data: predicting patient prognosis and identifying clinically actionable alterations. For four TCGA cohorts, we systematically evaluated the power of molecular data, with or without clinical variables, in predicting patient survival; to facilitate further community efforts, we established an open-access model evaluation platform. We demonstrated that under certain conditions incorporating molecular data with clinical variables can improve predictive power. Across 12 cancer types, we identified 10,281 somatic alterations in clinically relevant genes in 2,928 out of 3,277 patients (89.4%), revealing recurrent and potentially targetable alterations not revealed in single-tumor datasets. Our study represents the first systematic and comprehensive analysis evaluating the prognostic power of different types of molecular data across multiple cancer types. In addition, the clinically actionable alterations we identified may inform novel clinical trial design and treatment choice.

79. Zodiac: A Comprehensive Depiction of Genetic Interactions in Cancer by Integrating TCGA Data
Yitan Zhu1, Yanxun Xu2, Donald L. Helseth, Jr3, Kamalakar Gulukota3, Shengjie Yang1, Lorenzo L. Pesce4, Riten Mitra5, Peter Müller2, Subhajit Sengupta1, Wentian Guo6, Jonathan C. Silverstein1, Ian Foster4, Nigel Parsad1, Kevin P. White7,8, Yuan Ji1,9

1Center for Biomedical Research Informatics, NorthShore University HealthSystem, Evanston, Illinois; 2Department of Mathematics, The University of Texas at Austin, Austin, Texas; 3Center for Molecular Medicine, NorthShore University HealthSystem, Evanston, Illinois; 4Computation Institute, The University of Chicago and Argonne National Laboratory, Chicago, Illinois; 5Department of Bioinformatics and Biostatistics, University of Louisville, Louisville, Kentucky; 6School of Public Health, Fudan University, Shanghai, People’s Republic of China; 7Institute for Genomics and Systems Biology, The University of Chicago and Argonne National Laboratory, Chicago, Illinois; 8Department of Human Genetics and Department of Ecology and Evolution, The University of Chicago, Chicago, Illinois; 9Department of Health Studies, The University of Chicago, Chicago, Illinois

Genetic interactions play a critical role in cancer development. Existing knowledge about cancer genetic interactions is incomplete and, in particular, lacks evidence derived from large-scale cancer genomic data. The Cancer Genome Atlas (TCGA) produces multimodal measurements across genomic features of thousands of tumors, which provide an unprecedented opportunity to investigate the interplay of genes in cancer. We introduce Zodiac, a computational tool and resource to integrate existing knowledge about cancer genetic interactions with new information contained in TCGA data. Zodiac evolves existing knowledge by treating it as a prior graph, integrating it with a likelihood model derived from a Bayesian graphical model based on TCGA data, and producing a posterior graph as updated, data-enhanced knowledge. In short, Zodiac realizes “Prior interaction map + TCGA data → Posterior interaction map.” We performed such integration and knowledge evolution for about 200 million pairs of genes and produced a database allowing customized search for interplay between any genes of interest.

Equally important, Zodiac provides data processing and analysis tools that allow users to customize the prior networks and update the genetic pathways of interest to them. Zodiac is publicly available at www.compgenome.org/ZODIAC. The inferred genetic interactions in Zodiac recapitulate and extend existing knowledge and provide evidence supporting novel hypotheses about cancer molecular interactions, such as transcriptional regulation and signaling cascades. We expect Zodiac will be used by the cancer genetics community and will facilitate cancer research in a variety of settings.
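
The "prior + data → posterior" idea can be illustrated for a single gene pair with Bayes' rule. This is only a toy sketch: Zodiac's actual Bayesian graphical model is not described in the abstract, and the function name and numbers below are hypothetical.

```python
def posterior_edge_probability(prior, lik_edge, lik_no_edge):
    """Posterior probability that an interaction (edge) exists.

    `prior` is the prior probability of the edge from existing knowledge,
    `lik_edge` the likelihood of the observed data if the edge exists, and
    `lik_no_edge` the likelihood if it does not.
    """
    evidence = prior * lik_edge + (1.0 - prior) * lik_no_edge
    if evidence == 0.0:
        raise ValueError("data have zero likelihood under both hypotheses")
    return prior * lik_edge / evidence

# Hypothetical numbers: a weakly supported prior edge, data favoring the edge.
toy_posterior = posterior_edge_probability(prior=0.2, lik_edge=0.8, lik_no_edge=0.1)
```

Here a prior of 0.2 is raised to about 0.67 because the data are eight times more likely under the edge hypothesis than without it.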

106. Characterization of the Usage of The Serine Metabolic Network in Human Cancer
M. Mehrmohamadi1, X. Liu2, A.A. Shestov2, J.W. Locasale3

1Field of Genomics, Genetics, and Development, Department of Molecular Biology and Genetics, 2Division of Nutritional Sciences, 3Field of Genomics, Genetics, and Development, Department of Molecular Biology and Genetics, Division of Nutritional Sciences, Cornell University, Ithaca, New York

The serine, glycine, one-carbon (SGOC) metabolic network is implicated in cancer pathogenesis, but its general functions are unknown. We carried out a computational reconstruction of the SGOC network and then characterized its expression across thousands of cancer tissues. Pathways including methylation and redox metabolism exhibited heterogeneous expression indicating a strong context dependency of their usage in tumors. From an analysis of coexpression, simultaneous up- or downregulation of nucleotide synthesis, NADPH, and glutathione synthesis was found to be a common occurrence in all cancers. Finally, we developed a method to trace the metabolic fate of serine using stable isotopes, high-resolution mass spectrometry, and a mathematical model. Although the expression of single genes didn't appear indicative of flux, the collective expression of several genes in a given pathway allowed for successful flux prediction. Altogether, these findings identify expansive and heterogeneous functions for the SGOC metabolic network in human cancer.

Disease Specific Integrative Analyses

80. Mbatch: Detection, Diagnosis, and Correction of Batch Effects in TCGA Data
Rehan Akbani, Nianxiang Zhang, Anna Unruh, Tod D. Casasent, Chris Wakefield, James M. Melott, Bradley M. Broom, John N. Weinstein
Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center (MD Anderson TCGA Genome Data Analysis Center), Houston, Texas

Batch effects constitute a major technical danger in any large-scale omic project such as TCGA. To address that problem, we have developed Mbatch, a suite of algorithms and visualization tools for detection, diagnosis, and correction of batch effects. Mbatch is available at http://bioinformatics.mdanderson.org/tcgambatch/ or as a downloadable application. We have been using Mbatch in conjunction with almost all of TCGA’s Disease-specific and Pan-Cancer Analysis Working Groups to provide quality-control information in a timely fashion. That has proved reassuring in some cases and has detected issues that needed attention in others. One component of Mbatch, PCA-plus, provides an enhanced form of principal components analysis, including (i) a “dispersion separability criterion” that quantifies batch effects, (ii) improvements in the visualization of PCA results by introducing group centroids, and (iii) the ability to assess trend effects. A hierarchical clustering algorithm allows side-by-side comparison of batch effects within sample subgroups. We will present some examples to indicate the relative importance of batch effects that arise at the level of tissue source site, batch ID, shipping date, and plate number. Other possible sources, for example, in the GSCs or GCCs, can be analyzed similarly insofar as the data to do so are available. The data analyzed include mRNA expression, microRNA expression, DNA copy number, and DNA methylation for all tumor types whose data are on the TCGA website.

Statistical algorithms cannot distinguish between technical and biological batch effects, of course, so that distinction will continue to depend on the marshalling of available evidence on a case-by-case basis. The MD Anderson GDAC periodically analyzes all TCGA level 3 data for batch effects. The results are available on the Mbatch website noted above. The website features dynamic PCA plots that enable zooming, panning, and resizing of plots, as well as identification of sample IDs via mouse-over functionality. An information box displays several types of batch information about the selected sample, including its tissue source site, shipping date, processing BCR, and GSC/GCC. Most of the Mbatch computations are in R. Computationally intensive aspects of the code are parallelized. The output consists of static PNG files, as well as SVG files for dynamic visualization. The SVG and supplemental files are uploaded to the website on a periodic basis. The JavaScript D3 library is used for zooming and panning, and the website itself uses the JavaScript Dojo toolkit. All of the components are tightly integrated to support multiple browsers and operating systems. In sum, the Mbatch website provides a dynamic, interactive resource for detailed batch effects analysis and correction of TCGA data sets. This work is supported by Grant Number U24CA143883 from the National Cancer Institute and supported in part by a gift from the Mary K. Chapman Foundation and a grant from the Michael & Susan Dell Foundation.
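
The abstract does not define the dispersion separability criterion. One plausible reading, shown here strictly as an assumption, is a ratio of between-batch dispersion (spread of batch centroids around the overall centroid) to within-batch dispersion (spread of samples around their own batch centroid). The function names and toy coordinates are hypothetical.

```python
import math
from collections import defaultdict

def centroid(points):
    """Component-wise mean of a list of equal-length coordinate lists."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def dispersion_ratio(points, batches):
    """Between-batch over within-batch dispersion (assumed formulation).

    Larger values mean batches are better separated relative to their
    internal spread. Fails with ZeroDivisionError if every sample sits
    exactly on its batch centroid.
    """
    groups = defaultdict(list)
    for point, batch in zip(points, batches):
        groups[batch].append(point)
    overall = centroid(points)
    between = 0.0
    within = 0.0
    for members in groups.values():
        center = centroid(members)
        weight = len(members) / len(points)
        between += weight * squared_distance(center, overall)
        within += weight * sum(squared_distance(p, center) for p in members) / len(members)
    return math.sqrt(between) / math.sqrt(within)

# Hypothetical 2-D coordinates (e.g., first two principal components).
toy_points = [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]]
toy_batches = ["batch1", "batch1", "batch2", "batch2"]
toy_ratio = dispersion_ratio(toy_points, toy_batches)
```

In the toy data the batch centroids are each 2 units from the overall centroid while samples sit 1 unit from their own centroid, so the ratio is 2.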

81. Improved Prediction of Post-Surgical Survival in Glioblastoma Through Integrative Genomic and Epigenomic Analysis
Spyridon Bakas, Bilwaj Gaonkar, Christos Davatzikos
Department of Radiology, University of Pennsylvania, Philadelphia, Pennsylvania

Hypothesis: Integration of genomic and epigenomic data should yield substantial improvement in predicting clinical variables of interest, as compared to using either type of data in isolation. In this work, this hypothesis is validated by using a combination of information from gene expression calls, miRNA, and DNA methylation to predict the length of post-resection survival of patients with glioblastoma (GBM), via machine learning methods. Materials and Method: A subset of 186 patients with GBM was selected from The Cancer Genome Atlas (TCGA). The selection was performed to include patients with joint availability of data for gene expression calls, miRNA, and DNA methylation. The median age for the subset at time of diagnosis was 58 (range 10-76) years and the median post-resection survival was 343.5 (range 26-1338) days. After assessing the distribution of survival for the chosen population, low and high survival groups were distinguished, comprising 57 patients with survival below the 33rd percentile (<182 days) and 56 patients with survival above the 67th percentile (>422 days). A Support Vector Machine (SVM) classifier with a linear kernel was used to estimate the survival prediction accuracy between these two survival groups. 10-fold cross validation was used to test the predictive models on new patient data. The SVM classifier was trained on four configurations: the gene expression, miRNA, and DNA methylation data individually, as well as their combination. Results: Comparison of the accuracy obtained for the individual sets of data with the accuracy obtained for their combination indicates the superiority of the latter. Specifically, the accuracy for the gene expression data (Agilent G4502A) is 61.95%, for the miRNA expression data (Agilent Human microRNA 8x15K) is 63.72%, and for the DNA methylation data (Illumina Infinium Human Methylation BeadChip 27) is 66.37%, while for the combination of all three data sources it is 70.80%. Conclusion: The combination of genomic and epigenomic data allows for better prediction of survival as compared to using any one type of dataset individually. While combining data increases the complexity of the analysis, the boost in signal outweighs the increase in noise when predicting survival.
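
The grouping and cross-validation setup in abstract 81 can be sketched without the classifier itself. The cutoffs (182 and 422 days) and group sizes (57 and 56) come from the abstract; the function names, the interleaved fold assignment, and the toy survival values are assumptions for illustration, and no SVM is trained here.

```python
def survival_group(days, low_cutoff=182, high_cutoff=422):
    """Label a patient using the survival cutoffs quoted in the abstract (days)."""
    if days < low_cutoff:
        return "low"
    if days > high_cutoff:
        return "high"
    return None  # middle of the distribution, excluded from the two-class task

def k_fold_indices(n_samples, k=10):
    """Split sample indices 0..n_samples-1 into k interleaved test folds."""
    return [list(range(start, n_samples, k)) for start in range(k)]

def accuracy_percent(n_correct, n_total):
    """Classification accuracy as a percentage rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)

# Hypothetical survival times in days, chosen to cover all three outcomes.
toy_days = [26, 150, 343, 500, 1338]
toy_groups = [survival_group(d) for d in toy_days]

# 57 low-survival plus 56 high-survival patients, as stated in the abstract.
folds = k_fold_indices(57 + 56, k=10)
```

As a side observation, the reported accuracies match whole-patient counts out of 113 (for example, 80/113 rounds to 70.80%), which is consistent with, though not proof of, evaluation on the 113 patients in the two groups.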

82. Molecular Profiling of Epithelial to Mesenchymal Transition in High-Grade and Low-Grade Gliomas
Orieta Celiku, Anita Tandle, Kevin Camphausen, Uma Shankavaram
National Institutes of Health, Bethesda, Maryland

Gliomas are the most common malignant brain tumors. Grade IV glioma, glioblastoma (GBM), arises primarily de novo and has a particularly poor prognosis, with less than 5% of patients surviving 5 years after diagnosis. Processes that mimic Epithelial to Mesenchymal Transition (EMT), a common process in organ development, wound healing, and tissue remodeling, are responsible for disorganization of the extracellular matrix, tumor cell motility, and the highly invasive nature of GBM, and are thought to be responsible for resistance to radiotherapy and various therapeutic agents [1]. By comparison, lower grade gliomas (LGGs) show fewer mesenchymal characteristics and prolonged survival. Targeting or reversing the EMT signature is a promising avenue in the development of molecular therapies for GBM. We use TCGA data to study EMT in gliomas and report on differences between LGGs and GBM, molecular markers associated with increased survival, potential therapeutic targets, and drug candidates for these targets. We use a recently proposed 64-gene EMT signature [2] to stratify a cohort of primary untreated GBMs and grade II LGGs (astrocytoma and oligodendroglioma) into samples overexpressing the EMT signature (High EMT cohort) and those with low expression of the signature (Low EMT cohort). The High EMT and Low EMT cohorts were analyzed for differential expression at the RNASeq level, enrichment of biological pathways for differentially expressed genes, differences in transcription factor regulatory networks (using PANDA [3]), differential methylation of probes close to CpG islands (using R's methyAnalysis [4]), and mutations (using cBioPortal [5]). The highest ranked enriched pathways include Extracellular Matrix Organization and a number of Signal Transduction pathways.

High EMT samples show increased expression of genes encoding collagens (type IV, VI, IX), matrix metalloproteinases, laminins (beta and gamma chains), and extracellular signaling. The most frequent alteration in High EMT samples was EGFR amplification. Hypermethylation of Low EMT samples, consistent with the G-CIMP phenotype known to give LGGs better overall survival, shows a protective effect via suppression of genes involved in extracellular matrix organization.

References
1. Yan YR, Xie Q, Li F, Zhang Y, Ma JW, Xie SM, et al. Epithelial-to-mesenchymal transition is involved in BCNU resistance in human glioma cells. Neuropathology. 2014;34(2):128-34. doi: 10.1111/neup.12062. PubMed PMID: 24112388.
2. Kim H, Watkinson J, Varadan V, Anastassiou D. Multi-cancer computational analysis reveals invasion-associated variant of desmoplastic reaction involving INHBA, THBS2 and COL11A1. BMC Med Genomics. 2010;3:51. doi: 10.1186/1755-8794-3-51. PubMed PMID: 21047417; PubMed Central PMCID: PMC2988703.
3. Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS One. 2013;8(5):e64832. doi: 10.1371/journal.pone.0064832. PubMed PMID: 23741402; PubMed Central PMCID: PMC3669401.
4. Du P, Bourgon R. methyAnalysis: DNA methylation data analysis and visualization. R package version 1.8.0. 2014.
5. Cerami E, Gao J, Dogrusoz U, Gross BE, Sumer SO, Aksoy BA, et al. The cBio cancer genomics portal: an open platform for exploring multidimensional cancer genomics data. Cancer Discov. 2012;2(5):401-4. doi: 10.1158/2159-8290.CD-12-0095. PubMed PMID: 22588877.
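
Stratifying samples by a gene signature, as in abstract 82, could be done as in the sketch below. The abstract does not state the scoring or the cutoff, so this assumes a mean per-gene z-score and a median split. The two gene names appear in the title of reference [2]; the expression values and sample names are hypothetical.

```python
import statistics

def signature_scores(expression, signature_genes):
    """Mean z-score across signature genes for each sample.

    `expression` maps gene -> {sample: value}. Genes missing from the data
    or with zero variance are skipped.
    """
    samples = sorted(next(iter(expression.values())))
    totals = {s: 0.0 for s in samples}
    used = 0
    for gene in signature_genes:
        if gene not in expression:
            continue
        values = [expression[gene][s] for s in samples]
        mean = statistics.mean(values)
        sd = statistics.pstdev(values)
        if sd == 0:
            continue
        for s in samples:
            totals[s] += (expression[gene][s] - mean) / sd
        used += 1
    if used == 0:
        raise ValueError("no usable signature genes in the expression data")
    return {s: totals[s] / used for s in samples}

def split_by_median(scores):
    """Assign 'High' to samples scoring above the median, 'Low' otherwise."""
    cutoff = statistics.median(scores.values())
    return {s: ("High" if v > cutoff else "Low") for s, v in scores.items()}

# Hypothetical expression values for four samples and two signature genes.
toy_expression = {
    "COL11A1": {"A": 1.0, "B": 2.0, "C": 8.0, "D": 9.0},
    "THBS2": {"A": 2.0, "B": 1.0, "C": 9.0, "D": 8.0},
}
toy_scores = signature_scores(toy_expression, ["COL11A1", "THBS2", "NOT_MEASURED"])
toy_cohorts = split_by_median(toy_scores)
```

A median split yields balanced cohorts but is sensitive to samples near the cutoff; tertiles or a clustering-based split are alternatives the authors may have used instead.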

83. Identifying Genetic Drivers and Alterations in HPV-Negative and HPV-Positive HNSCC Cell Lines with High-Throughput Whole Exome DNA and Transcriptome RNA Sequencing
Hui Cheng1, Xinping Yang1, Han Si1, Anthony Saleh1, Jamie Coupar1, Robert L Ferris2, Wendell G. Yarbrough3, Mark E. Prince4, Thomas E. Carey4, Carter Van Waes1, Zhong Chen1

1Tumor Biology Section and Clinical Genomics Unit, Head and Neck Surgery Branch, National Institute on Deafness and Other Communication Disorders, Bethesda, Maryland; 2Division of Head and Neck Surgery, Departments of Otolaryngology, Radiation Oncology, and Immunology, University of Pittsburgh Cancer Institute, Pittsburgh, Pennsylvania; 3Department of Surgery, Division of Otolaryngology, Molecular Virology Research Program, Smilow Cancer Hospital, Yale Cancer Center, Yale University Medical School, New Haven, Connecticut; 4Cancer Biology Program, Program in the Biomedical Sciences, Rackham Graduate School, University of Michigan, Ann Arbor, Michigan

Head and neck squamous cell carcinoma (HNSCC) is among the cancer types with the highest frequencies of genetic alterations, including mutation and copy number variation (CNV). Recently, The Cancer Genome Atlas (TCGA) has profiled over 279 HNSCC tumors and generated a comprehensive genomic characterization of HNSCC. This has led to an urgent need for a panel of HNSCC cell line models with genomic alterations representative of those found by TCGA. We performed whole exome DNA sequencing (exome DNA-seq) and transcriptome RNA sequencing (RNA-seq) on 15 HPV-negative and 11 HPV-positive HNSCC lines, which were compared with 3 normal human oral mucosa lines and 8 matched blood samples. Exome DNA-seq and RNA-seq were performed on the ABI SOLiD platform with an average depth of 87X and 44X, respectively. Using an in-house analysis pipeline, we determined the CNVs and single nucleotide variants (SNVs) from DNA-seq so that they could be compared with the genomic alterations found in TCGA and cross-validated against the SNVs identified in our RNA-seq. Using the software CONTRA (COpy Number Targeted Resequencing Analysis), we identified chromosome losses in 3p, 5q, 8p, 9p, and 18q and gains in 3q, 7p, and 11q in a significant portion of cell lines, consistent with previous karyotype and TCGA CNV studies.

Integrative analysis of CNV by exome-seq and gene expression by RNA-seq in these cell lines revealed a significant positive correlation in multiple oncogenes, including PIK3CA, TP63, CCND1, FADD, BIRC2, and YAP1, in concordance with TCGA results. We established a workflow to determine deleterious mutations and somatic mutations using the software ANNOVAR, in combination with the functional prediction tool Mutation Assessor and the Sanger Institute’s somatic mutation database, COSMIC, in order to characterize legacy and newer HNSCC lines without and with matched samples. We identified a median of 1588 potentially deleterious and/or somatic mutations for each cell line. The genes most recurrently mutated with functional impact in TCGA, including TP53, FAT1, and NOTCH1, are also frequently mutated in the cell lines. Many of the genomic alterations identified converge on the networks we previously defined in HNSCC, including the PI3K/AKT/mTOR, NFκB, and RAS/MAPK pathways. Our findings suggest that these cell lines can serve as HNSCC models for mechanistic and therapeutic studies, and thereby provide a valuable resource for the wider biomedical research community.

84. Somatic Copy Number Alterations in Esophageal Cancer
Andrew Cherniack1, Juliann Shih1, Bradley Murray1, Carrie Sougnez1, Gordon Saksena1, Adam Bass1,2, Matthew Meyerson1,2, and the TCGA Esophageal AWG

1Broad Institute, Cambridge, Massachusetts; 2Dana-Farber Cancer Institute, Boston, Massachusetts

Esophageal cancer is the eighth most common cancer type worldwide and results in over 400,000 deaths annually. These tumors fall into two major histological subtypes, esophageal adenocarcinoma (EAC) and esophageal squamous-cell carcinoma (SCC). EAC is most prevalent in western countries and occurs primarily in the lower third of the esophagus, while SCC is the most common form in Asian countries and usually occurs in the distal and mid esophageal regions. As part of the TCGA esophageal analysis working group, we profiled 82 EACs and 94 SCCs for somatic copy number alterations (SCNAs) on SNP 6.0 arrays. Levels of aneuploidy are similar between the two histological types, but overall patterns of SCNAs differ. SCC tumors have patterns of chromosomal and focal alterations that are similar to those of HPV-negative head and neck tumors and other squamous tumors. These tumors are characterized by an idiosyncratic squamous pattern of 3p loss and 3q gain along with focal alterations that include gain of CCND1 and loss of CDKN2A. Other focal losses in SCC include PARD3 and VGLL4, which are also present in other types of squamous tumors. Patterns of SCNAs in EAC tumors were similar to those found in gastric CIN tumors. Recurrent focal alterations shared by EAC and gastric CIN tumors include amplifications of ERBB2, MYC, VEGFA, GATA4, and KRAS as well as deletions in fragile site genes such as WWOX and tumor suppressors such as CDKN2A, RUNX1, and SMAD4. Taken together, this analysis indicates that EAC and SCC are molecularly distinct tumors that are more similar to histologically related tumors in adjacent tissues than to each other.

85. Correlation of Methylation, Expression, and Mutation Patterns With Overall Survival in Low-Grade Glioma and Glioblastoma Multiforme TCGA Cohorts
Stephen W. Clark1, Sandeep Sanga2

1Vanderbilt University, Neuro-Oncology Division, Nashville, Tennessee; 2Station X Inc., San Francisco, California

Aberrant and distinguishing molecular patterns have been widely described in the literature for brain cancers, particularly Glioblastoma Multiforme, which is the most common adult brain cancer. Previous studies have defined subtypes of brain cancer with differing survival profiles based on presence or absence of IDH1/IDH2 somatic mutations, and have further shown these subtypes to correlate with methylation patterns. More specifically, CpG island methylator phenotype (CIMP) tumors appear to be established in brain tumors by somatic mutations in IDH1/IDH2 and loss-of-function mutations in TET/TET2. These so-called G-CIMP tumors tend to have a hypermethylated phenotype and are associated with better survival. Driven by a hypothesis that methylation patterns in brain tumors correlate with survival, we mine the Low Grade Glioma and Glioblastoma Multiforme cohorts of The Cancer Genome Atlas to search for novel, survival-associated biomarkers. We focus our multi-omic analysis on the more aggressive phenotype defined by a lack of IDH1/IDH2 somatic mutations and hypomethylation in genes associated with G-CIMP. We will present our findings with particular attention to the methylation, expression, and mutation patterns of genes involved with DNA methylation including the DNMT and TET families.

86. Evaluating the Prognostic Significance of Protein-Protein Interactions Through

Integrated Analysis of TCGA and High-Throughput Screening and Data

Sahar Harati2, Josue D Moran3, Andrei Ivanov1,4, Zenggang Li1,4, Yuhong Du1,4, Margaret A. Johns1,4, Fadlo R. Khuri5,6, Haian Fu1,4,6, Carlos S. Moreno2,3,6, Lee A.D. Cooper2,6,7

1Emory CTD2 Center and Chemical Biology Discovery Center, 2Departments of Biomedical Informatics, 3Pathology and Laboratory Medicine, 4Pharmacology, 5Hematology and Medical Oncology, 6Winship Cancer Institute, 7Biomedical Engineering, Emory University/Georgia Institute of Technology, Atlanta, Georgia

Introduction: Protein-protein interactions (PPIs) are critical links in cancer signaling pathways and are increasingly recognized as potential therapeutic targets for novel, highly specific anticancer compounds. The increasing availability of genome-scale high-throughput screening (HTS) datasets presents opportunities to explore the essentiality of PPIs and to identify promising therapeutic targets with computational approaches. Genomic profiles of TCGA patient samples make it possible to evaluate the clinical significance of cancer-essential PPIs, both to prioritize PPIs for therapeutic development and to improve understanding of disease biology. We developed a computational approach for identifying cancer-essential PPIs from gene-essentiality shRNA screens and for evaluating their prognostic value in patient samples. We applied our software to a public HTS database to discover PPIs essential in lung adenocarcinomas, and evaluated their prognostic value in the TCGA lung NSCLC squamous cell carcinoma (LUSC) cohort.

Methods: We analyzed shRNA gene-essentiality profiles from 216 cell lines in the ACHILLES database [reference]. A context-specific PPI network was created for each cell line by using genetic alterations to re-wire the topology of a prior-knowledge superpathway containing 2186 proteins and 11488 PPIs. Since gene-essentiality measurements represent the sum total loss of multiple PPIs, we devised a method that uses context-specific topologies to deconvolve unknown PPI essentialities from gene-essentiality measurements. PPIs from lung adenocarcinoma-derived lines were then ranked using the Kolmogorov-Smirnov statistic to identify the most essential PPIs in lung adenocarcinomas. To evaluate the clinical significance of these PPIs, we used mutational, CNV and mRNA expression profiles to infer the absence of PPIs in 175 TCGA LUSC patients. The prognostic significance of individual PPIs was evaluated. An aggregate PPI essentiality score was also calculated for each patient using the 50 most essential PPIs in LUSC cell lines.

Results: We found the aggregate PPI score to be significant (logrank p=4.8e-2), predicting poorer outcomes for patients with aggregate PPI essentiality scores below the median. Analysis of individual PPIs identified several prognostic genes that had multiple essential PPIs, including SMAD7 (22 PPIs), paxillin (PXN, 19 PPIs), TADA2B (13 PPIs), fibrinogen (FGB, 10 PPIs), and EPS8 (9 PPIs). These genes are critical for TGF-β and EGFR signaling, adhesion to the extracellular matrix and invasion, as well as transcriptional regulation and chromatin modification. Moreover, several previous studies have determined that some of these genes are prognostic markers that may confer sensitivity to cisplatin and play roles in invasion and metastasis.

Conclusions: Here we describe a novel approach to inferring essential PPIs from HTS data and evaluating their clinical significance using TCGA patient samples. This framework could be used to better prioritize PPIs for drug target development.

Reference
Cowley, Weir & Vazquez, et al. Parallel genome-scale loss of function screens in 216 cancer cell lines for the identification of context-specific genetic dependencies. Nature Scientific Data 1, Article number: 140035. September 30, 2014.
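The Kolmogorov-Smirnov ranking step in the Methods can be illustrated with a minimal sketch. It assumes, hypothetically, that each PPI has one set of essentiality scores in the cell lines of interest and another in all remaining lines, and that PPIs are ranked by the unsigned two-sample statistic; the abstract does not state these details, and the function names and data layout below are illustrative only.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The supremum of |F_a - F_b| is reached at one of the observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


def rank_ppis(scores_in_group, scores_elsewhere):
    """Return PPI names ordered by decreasing KS statistic (hypothetical layout:
    both arguments map a PPI name to a list of essentiality scores)."""
    stats = {
        ppi: ks_statistic(scores_in_group[ppi], scores_elsewhere[ppi])
        for ppi in scores_in_group
    }
    return sorted(stats, key=stats.get, reverse=True)


# Hypothetical scores, for illustration only.
group = {"A-B": [-2.0, -1.8, -1.5], "C-D": [-0.1, 0.2, 0.0]}
others = {"A-B": [0.1, -0.2, 0.3, 0.0], "C-D": [-0.1, 0.1, 0.0, 0.2]}
print(rank_ppis(group, others))
```

A signed variant of the statistic would be needed to separate PPIs that are more essential in the group from those that are less essential; the unsigned form is used here only for brevity.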


87. Prognostic Value of Quantitative Histologic Features in Lower Grade Gliomas

W.D. Dunn1, M. Nalisnik1, J. Kong1, D.A. Gutman1,2,4, D.J. Brat1,3,4, L.A.D. Cooper1,4,5

1Departments of Biomedical Informatics, 2Neurology, 3Pathology and Laboratory Medicine, 4Winship Cancer Institute, 5Biomedical Engineering, Emory University/Georgia Institute of Technology, Atlanta, Georgia

Introduction: Comprehensive genomic analysis by the TCGA has dramatically improved our understanding of glioma brain tumors. For Lower Grade Gliomas (LGGs), the emerging molecular classification divides these tumors into three clinically relevant molecular classes based on the status of isocitrate dehydrogenase (IDH) mutations and chromosome 1p/19q co-deletions (IDHmut-non-codel, IDHmut-codel, IDHwt). This classification is supplanting the traditional, but highly subjective, practice of classifying LGGs based on histopathologic criteria. To investigate the role that histology can play in aiding future prognostic evaluation of LGGs, we developed a novel quantitative image analysis pipeline to measure prognostic cues in whole slide digital images of glioma tissues.

Methods: We developed an algorithm to generate statistical models of cellular arrangements in gliomas and used this approach to extract features describing the cellular density of 417 LGG and 346 GBM tumors (382 million total cells). Following identification of cell nuclei with an image analysis algorithm, each tumor was analyzed to learn the parameters of mixture point processes that describe cellular locations, effectively separating the tissue into neoplastic or normal regions with varying densities. These density parameters were then correlated with a variety of molecular and clinical data such as molecular classifications, tumor grade, and gene expression. Finally, the parameters were used as variables to construct survival models, along with various clinical and genomic variables, in order to evaluate their prognostic value.

Results: Tumor density parameters (TDPs) showed significant differences across both grade and molecular subtype defined by IDH mutations and co-deletion of chromosomes 1p/19q. TDPs varied consistently between grades (II-IV) when all histological types were included (ANOVA p=5.5e-19) as well as when only astrocytomas were included (ANOVA p=2.4e-14). Focusing on LGGs with 1p/19q intact, we found the more aggressive IDH wild-type (IDHwt) tumors to be denser than IDH mutant (IDHmut-non-codel) tumors (ANOVA p=1.0e-9). TDPs were also prognostic within molecular classes and grades for grade III IDHwt (logrank p=0.027) and grade III IDHmut-non-codel tumors (logrank p=0.0013). When included in proportional hazards models with age and IDH status, TDPs performed comparably to human grading (AUC 0.86 vs. 0.87, respectively) and improved prediction over models based on IDH status and age alone (likelihood ratio test p=0.01). Pathway-level gene expression correlates of TDPs, including hypoxia, cell cycle, cell motility, and cadherin and integrin signaling, further validated the biological relevance of TDP measurements.

Conclusions: Quantitative features of tissue architecture can capture important prognostic information from histologic sections of gliomas, and perform comparably to human grading in the TCGA cohort. The significance of TDPs within grade and LGG molecular subtype, as well as outcome modeling experiments, suggests that TDPs capture information that provides added prognostic value. Additional features with pathologic relevance beyond cellularity are needed to further improve results.
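The likelihood ratio comparison of nested survival models reported in the Results can be sketched as follows. This is a minimal illustration that assumes a single added parameter (one degree of freedom); the abstract does not say how many TDP variables entered the model, and the log-likelihood values in the example are hypothetical.

```python
import math


def likelihood_ratio_pvalue(loglik_reduced, loglik_full):
    """P-value of a likelihood ratio test for nested models that differ by
    exactly ONE parameter (assumption: 1 degree of freedom)."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat <= 0:
        return 1.0
    # For a chi-square variable with 1 df, P(X > stat) = erfc(sqrt(stat / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


# Hypothetical partial log-likelihoods: age + IDH status vs. age + IDH status + one TDP.
print(likelihood_ratio_pvalue(loglik_reduced=-512.4, loglik_full=-509.1))
```

With more than one added variable the reference distribution changes to a chi-square with the corresponding degrees of freedom, which a statistics library would normally supply.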


88. microRNA-Gene Association as a Prognostic Biomarker in Cancer Exposes Disease Mechanisms

Sol Efroni

Bar-Ilan University, Ramat Gan, Israel

The presentation includes findings described in the above paper as well as additional unpublished and published findings. The paper is based on TCGA data.

The transcriptional networks that regulate gene expression, and modifications to these networks, are at the core of the cancer phenotype. MicroRNAs, a well-studied species of small non-coding RNA molecules, have been shown to have a central role in regulating gene expression as part of this transcriptional network. Further, microRNA deregulation is associated with cancer development and with tumor progression. Glioblastoma multiforme (GBM) is the most common, aggressive and malignant primary tumor of the brain and is associated with one of the worst 5-year survival rates among all human cancers. To study the transcriptional network and its modifications in GBM, we utilized gene expression, microRNA sequencing, whole genome sequencing and clinical data from hundreds of patients from different datasets. Using these data and a novel microRNA-gene association approach that we introduce, we have identified unique microRNAs and their associated genes. What makes them unique is the quantifiable association between microRNA and gene expression levels, which we show stratifies patients into clinical subgroups with high statistical significance. Importantly, this stratification goes unobserved by other methods and is not associated with other subsets or phenotypes within the data. To assess the robustness of the approach, we show that the findings hold in unrelated datasets. Among the set of identified microRNA-gene associations, we closely study the example of MAF and hsa-miR-330-3p, and show how their co-behavior stratifies patients into prognostic clinical groups and how whole genome sequences tell us more about a specific genomic variation as a possible basis for differences between patients. We argue that these identified associations may indicate previously unexplored specific disease control mechanisms and may be used as a basis for further study and for possible therapeutic intervention.


89. Somatic Mutations and Copy Number Aberrations in Drug-Resistant Acute Myeloid Leukemia

Samuli Eldfors1, Mika Kontro2, Kimmo Porkka2, Olli Kallioniemi1, Caroline Heckman1

1Institute for Molecular Medicine Finland, University of Helsinki; 2Hematology Research Unit Helsinki, University of Helsinki and Helsinki University Central Hospital Cancer Center, Department of Hematology, Helsinki, Finland

While the majority of acute myeloid leukemia (AML) patients respond to induction chemotherapy, disease recurrence and drug resistance are common. Mutations underlying AML pathogenesis have been extensively characterized by the TCGA consortium by sequencing 200 AML samples obtained at diagnosis. However, mutations driving disease progression and drug resistance in relapsed AMLs are not well characterized. In addition, identification of somatic mutations in relapsed AML is complicated by interference from donor cell variants present in those patients who have received an allogeneic hematopoietic stem cell transplant (alloHSCT). In this study we sought to identify mutations and copy number aberrations associated with the development of drug-resistant AML, and at the same time to develop methods to identify and filter out donor variants.

We analyzed samples from therapy-resistant AMLs by exome sequencing (n=31). All patients had received prior chemotherapy and a subset had relapsed after receiving an alloHSCT (n=6). Five of the patients had secondary AML that had developed after treatment for an earlier hematologic malignancy. Mutation and copy number aberration frequencies were compared to those in 236 diagnosis-phase AMLs (n=36 sequenced in-house, n=200 AMLs from TCGA). Donor-derived germline variants in chimeric samples from patients relapsing after alloHSCT were identified with a bioinformatic methodology utilizing the dbSNP population variant database. Somatic mutations called from chimeric samples were filtered for common population variants present in the donor’s genome. Rare donor-derived population variants that have not been previously described were identified as variants that were not present in the patient’s germline genome and that had tumor variant allele frequencies similar to those of the common donor-derived variants. We estimated the level of chimerism in the tumor samples based on the variant allele frequencies of all donor-derived variants.

Our results suggest that AML progression and drug resistance may be caused by strengthening of aberrant signaling through pathways already affected by a mutation present at diagnosis. In drug-resistant AMLs both copies of WT1 are typically affected by mutations, in contrast to diagnosis-phase samples, which more often carry heterozygous mutations. We have also observed multiple mutations affecting the FLT3 pathway in relapse samples. We also show that donor-derived germline variants can be identified and filtered from exome sequence data.
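The donor-variant filtering logic described above can be sketched in a few lines. This is only an illustration under stated assumptions: the dictionary keys, the allele-frequency tolerance, and the rule that the donor fraction is roughly twice the median allele frequency (which presumes heterozygous donor variants) are all hypothetical and are not taken from the abstract.

```python
from statistics import median


def split_donor_variants(variants, vaf_tolerance=0.05):
    """Separate calls from a chimeric sample into common donor variants,
    rare donor variants, and remaining somatic candidates.

    Each variant is a dict with hypothetical keys: 'id', 'vaf' (tumor variant
    allele frequency), 'in_dbsnp' (bool) and 'in_patient_germline' (bool).
    Returns (common_donor, rare_donor, somatic, chimerism_estimate).
    """
    foreign = [v for v in variants if not v["in_patient_germline"]]
    common_donor = [v for v in foreign if v["in_dbsnp"]]
    novel = [v for v in foreign if not v["in_dbsnp"]]
    if not common_donor:
        return [], [], novel, None

    donor_vaf = median(v["vaf"] for v in common_donor)
    # Novel calls whose VAF matches the common donor variants are treated as rare donor variants.
    rare_donor = [v for v in novel if abs(v["vaf"] - donor_vaf) <= vaf_tolerance]
    somatic = [v for v in novel if abs(v["vaf"] - donor_vaf) > vaf_tolerance]

    # Assumption: donor variants are heterozygous, so donor fraction is about 2 x VAF.
    all_donor_vafs = [v["vaf"] for v in common_donor + rare_donor]
    chimerism = min(1.0, 2.0 * median(all_donor_vafs))
    return common_donor, rare_donor, somatic, chimerism
```

Homozygous donor variants would appear at roughly the donor fraction itself rather than half of it, so a fuller treatment would model both genotypes instead of taking a single median.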


90. Cancer Systems Biology of TCGA SKCM: Efficient Detection of Genomic Drivers in Melanoma

Lauren Edwards, Rohit Gupta, Jian Guan, Fabian V. Filipp

Systems Biology and Cancer Metabolism, Program for Quantitative Systems Biology, University of California, Merced, Merced, California

We characterized the mutational and transcriptional landscape of human skin cutaneous melanoma (SKCM) using data obtained from The Cancer Genome Atlas (TCGA) project. We analyzed next-generation sequencing data on somatic copy number alterations and somatic mutations in 303 metastatic melanomas. We were able to confirm preeminent drivers of melanoma as well as identify new melanoma genes. The TCGA SKCM study confirmed a dominance of somatic BRAF mutations in 50% of patients. The mutational burden of melanoma patients is an order of magnitude higher than that of other TCGA cohorts. A multi-step filter enriched somatic mutations while accounting for recurrence, conservation, and basal rate. Thus, this filter can serve as a paradigm for analysis of genome-wide next-generation sequencing data of large cohorts with a high mutational burden. Analysis of TCGA melanoma data using such a multi-step filter discovered novel and statistically significant potential melanoma driver genes. In the context of the Pan-Cancer study we report a detailed analysis of the mutational landscape of BRAF and other drivers across cancer tissues. Integrated analysis of somatic mutations, somatic copy number alterations, low pass copy numbers, and gene expression of the melanogenesis pathway shows coordination of proliferative events by Gs-protein and cyclin signaling at a systems level.


92. Integrative Analysis of HSF1 in Breast Cancer

Yesim Gökmen-Polar1, Susan Perkins3, Sunil Badve1,2,4

Departments of 1Pathology and Laboratory Medicine, 2Medicine, 3Biostatistics, Indiana University School of Medicine, Indianapolis, Indiana; 4Indiana University Melvin and Bren Simon Cancer Center, Indianapolis, Indiana

Heat shock transcription factor 1 (HSF1), a key regulator of the heat-shock response, is deregulated in a number of cancers. HSF1 can mediate cancer cell survival and metastasis. High levels of HSF1 have been associated with poor prognosis in breast cancer. To validate the nature of HSF1 upregulation in breast cancer, we assessed copy number alterations (CNAs), mutational status, and methylation status in The Cancer Genome Atlas (TCGA) breast cancer data set. CNAs of HSF1 were observed in 146 (15%) of 962 breast tumors (14.8% amplifications; 0.2% homozygous deletions). In addition, only 0.2% of patients had mutations, and these were not located in major domains. Tumors more often showed HSF1 upregulation (25.6%) than downregulation (0.1%). Analysis of 737 cases with methylation data (HM450) showed a weak inverse correlation of methylation with the expression of HSF1. Analysis of HSF1 protein expression by immunohistochemistry (IHC; New England BioLabs) in a large series of breast cancer cases with Oncotype Dx scores (n≈450) showed a weak correlation of protein expression with predicted outcomes. HSF1 mRNA levels were not associated with outcomes in TCGA data. However, in microarray datasets, high HSF1 levels in ER-positive tumors were significantly associated with shorter overall survival (OS; P=0.00045) and relapse-free survival (RFS; P=0.0057). In multivariable analysis, HSF1 remained a significant, independent prognostic factor in ER-positive breast cancer. A prognostic impact was not observed in ER-negative tumors.

In conclusion, the mRNA expression levels of HSF1 in ER-positive breast cancer are associated with both shorter relapse-free and overall survival in publicly available microarray datasets. This prognostic impact could not be confirmed by IHC or by TCGA analysis using mRNA, methylation or mutation data. Cautious interpretation of data from publicly available microarray datasets is necessary in determining the biological impact of genes/proteins.


93. Cancer Systems Biology in Melanoma to Identify Key Driver Genes in TCGA SKCM Dataset

Rohit Gupta, Fabian Filipp

University of California, Merced, Merced, California

We characterized the mutational landscape of human skin cutaneous melanoma (SKCM) using data obtained from The Cancer Genome Atlas (TCGA) project. We analyzed next-generation sequencing data obtained from the Cancer Genome Hub at UCSC to characterize the signature of somatic copy number alterations and somatic mutations in 303 metastatic melanomas. As a result, we were able to confirm preeminent drivers of melanoma as well as identify new melanoma genes. The TCGA SKCM study confirmed a dominance of somatic BRAF mutations in 50% of patients. The mutational burden of melanoma patients is an order of magnitude higher than that of other TCGA cohorts. A multi-step filter enriched somatic mutations while accounting for recurrence, conservation, and basal rate. Thus, this filter can serve as a paradigm for analysis of genome-wide next-generation sequencing data of large cohorts with a high mutational burden. Analysis of TCGA melanoma data using such a multi-step filter discovered novel and statistically significant potential melanoma driver genes. We also ran an unbiased analysis using all metabolic genes without our multi-step filter and observed significant overlap in the driver genes reported. The driver genes obtained were subjected to mutual exclusivity analysis to assess the scope and extent to which a specific gene can drive cancer progression. The mutual exclusivity study also confirmed that genes identified as drivers through our multi-step filter had a pattern of mutational mutual exclusivity among each other. In the context of the Pan-Cancer study we report a detailed analysis of the mutational landscape of BRAF, DPYD and other drivers across cancer tissues. Integrated analysis of somatic mutations, somatic copy number alterations, low pass copy numbers, and gene expression of the melanogenesis pathway shows coordination of proliferative events by Gs-protein and cyclin signaling at a systems level.
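A mutual exclusivity analysis of the kind mentioned above is often framed as a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below shows that formulation; the abstract does not say which test was used, so this is an assumed stand-in, and the counts in the example are hypothetical.

```python
from math import comb


def mutual_exclusivity_pvalue(n_samples, n_a, n_b, n_both):
    """Probability of seeing n_both or fewer co-mutated samples if genes A and B
    were mutated independently (lower tail of the hypergeometric distribution).

    n_samples: cohort size; n_a, n_b: samples mutated in A and in B;
    n_both: samples mutated in both genes.
    """
    lowest = max(0, n_a + n_b - n_samples)  # smallest overlap that is possible at all
    favourable = sum(
        comb(n_a, k) * comb(n_samples - n_a, n_b - k)
        for k in range(lowest, n_both + 1)
    )
    return favourable / comb(n_samples, n_b)


# Hypothetical counts: 303 tumors, 150 mutated in gene A, 80 in gene B, 5 in both.
print(mutual_exclusivity_pvalue(303, 150, 80, 5))
```

A small p-value indicates fewer co-occurring mutations than independence would predict; when many gene pairs are tested, the p-values would also need a multiple-testing correction.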


94. Cancer-BioBin: Binning Somatic Mutations Based on Biological Knowledge for Predicting Survival Outcome

Dokyoon Kim, Ruowang Li, Scott M. Dudek, John R. Wallace, Marylyn D. Ritchie

Center for Systems Genomics, Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, Pennsylvania

Enormous efforts in whole exome and genome sequencing of hundreds to thousands of patients have provided the landscape of somatic genomic alterations in many cancer types, making it possible to distinguish between driver mutations and passenger mutations. Driver mutations show strong associations with cancer clinical outcomes such as survival. However, due to the heterogeneity of tumors, somatic mutation profiles are exceptionally sparse, whereas other types of genomic data such as miRNA or gene expression contain much more complete data for all genomic features, with quantitative values measured in each patient. To overcome the extreme sparseness of somatic mutation profiles and allow for the discovery of combinations of somatic mutations that may predict cancer clinical outcomes, we propose a novel approach, Cancer-BioBin, for binning somatic mutations based on existing biological knowledge. Through analysis of the renal cell carcinoma dataset from The Cancer Genome Atlas (TCGA), we identified combinations of somatic mutation burden based on pathways, protein families, evolutionarily conserved regions, and regulatory regions associated with survival. In addition, Kaplan-Meier survival analysis of the validation dataset demonstrated that somatic mutation burden based on biological knowledge showed significant associations with cancer prognosis in renal cell carcinoma (log-rank P = 3 × 10⁻¹⁵). Given the heterogeneity of cancer, a binning strategy for somatic mutation profiles based on biological knowledge will be valuable for improving prognostic biomarkers and potentially for tailoring therapeutic strategies by identifying combinations of driver mutations.
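The binning idea can be illustrated with a short sketch that collapses a sparse per-gene mutation profile into per-bin burden counts. This is not the Cancer-BioBin software; the function name, the data layout, and the gene-to-bin assignments in the example are hypothetical and serve only to show the aggregation step.

```python
from collections import defaultdict


def bin_mutation_burden(patient_mutations, gene_to_bins):
    """For each patient, count how many mutated genes fall into each knowledge-based bin.

    patient_mutations: dict mapping patient ID -> iterable of mutated gene symbols.
    gene_to_bins: dict mapping gene symbol -> list of bin names (e.g. pathways).
    """
    burden = {}
    for patient, genes in patient_mutations.items():
        counts = defaultdict(int)
        for gene in set(genes):  # count each mutated gene once per patient
            for bin_name in gene_to_bins.get(gene, ()):
                counts[bin_name] += 1
        burden[patient] = dict(counts)
    return burden


# Illustrative (hypothetical) bin assignments and mutation calls.
bins = {"VHL": ["hypoxia"], "PBRM1": ["chromatin"], "SETD2": ["chromatin"], "BAP1": ["chromatin"]}
calls = {"P1": ["VHL", "PBRM1"], "P2": ["SETD2", "BAP1", "PBRM1"], "P3": []}
print(bin_mutation_burden(calls, bins))
```

The resulting per-bin counts are denser than the per-gene profile and could then be used as covariates in a survival model or to split patients for a log-rank comparison.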


96. Characterization of Somatic Copy Number Drivers in Cervical Carcinomas

Brad Murray, Andrew Cherniack, Akinyemi Ojesina, Gordon Mills, Janet Rader, Matthew Meyerson, and TCGA CESC Analysis Working Group

Somatic copy number alterations in 180 CESC tumors were determined with SNP 6.0 arrays. There was an average of 82 copy number alterations per tumor, fewer than in serous ovarian (#) and endometrial (#) carcinomas but more than in endometrioid endometrial carcinomas. Analysis of focal amplifications and deletions performed with the GISTIC2.0 algorithm revealed 26 focal amplifications and 37 focal deletions, along with 23 whole arms that were recurrently altered. Recurrent focal amplifications were identified at 3q26.31 (TERC, MECOM), 3q28 (TP63), 7p11.2 (EGFR), 8q24.21 (MYC, PVT1), 9p24.1 (CD274, PDCD1LG2), 11q22.1 (YAP1), 13q22.1 (KLF5), 16p13.13 (BCAR4) and 17q12 (ERBB2). Recurrent deletions were identified at 3p24.1 (TGFBR2), 10q23.31 (PTEN), and 18q21.2 (SMAD4). Notably, this analysis discovered novel cervical cancer driver genes, including the therapeutic targets of immune inhibitors CD274 (PD-L1) and PDCD1LG2 (PD-L2), as well as the novel lincRNA BCAR4, which has been linked to metastasis, anti-estrogen resistance, and lapatinib sensitivity in breast cancer. BCAR4 is unexpressed in normal cervix and is expressed in cervical cancer only in the presence of amplifications. Unsupervised clustering of somatic copy number alterations revealed two groups of tumors, one with a high rate of copy number alterations (> 100 events) and one with fewer (p < 0.0001). Interestingly, these groups also showed significant clinical and molecular differences. The CN high cluster was largely composed of squamous tumors infected with HPV16 and contained significantly more tumors with YAP1 amplifications (p < 0.0001). The CN low cluster contained the majority of adenocarcinomas and of tumors infected with HPV18, and showed a novel deletion of TGFBR2.


97. Examining Overall Mutation Frequency in Breast Cancer Patients Who Carry Germline Variants in DNA Repair Genes

Hyrum S. Eddington, Stephen R. Piccolo

Department of Biology, Brigham Young University, Provo, Utah

Introduction: Germline variants in well-known breast-cancer susceptibility genes, including BRCA1 and BRCA2, can compromise a cell’s ability to repair DNA damage that inevitably arises during cell replication and upon exposure to mutagens. Consequently, one might expect that patients harboring germline variants in these genes would experience higher overall mutation rates in non-tumor cells than individuals who do not carry such variants. An increase in the overall number of mutations may increase the chance that normal cells will transform into cancer cells and thus may be correlated with a younger age at diagnosis. However, to our knowledge, these assumptions have not been evaluated across the genome in a large population-based cohort.

Methods: Although The Cancer Genome Atlas (TCGA) focuses primarily on providing insights about molecular-level aberrations that influence tumorigenesis, this resource also provides extensive data representing germline DNA variation. We downloaded exome-sequencing data representing germline variation for 611 breast-cancer patients from TCGA. We identified variants in these samples using Burrows-Wheeler Aligner (v0.6.1) and Genome Analysis Toolkit (v2.3.4) and filtered the data to include only variants that Variant Effect Predictor estimates to have a MODERATE or HIGH effect on protein sequence and that occur relatively rarely in the population (<5%).

Results: We are summarizing the variants at the patient level and evaluating whether patients who carry a variant in BRCA1 or BRCA2 have a relatively high number of germline variants compared to individuals who do not carry a variant in these genes. We are also comparing the overall frequency of germline variants against the patient’s age at diagnosis. Our poster will provide a summary of these findings.

Discussion: This approach may serve as a biomarker to predict whether individuals will develop breast cancer at a relatively young age. In addition, for patients who do not carry a known pathogenic mutation in BRCA1 or BRCA2, a relatively high overall number of germline mutations may indicate that DNA repair mechanisms have been compromised by other genomic aberrations; knowledge of these aberrations would be informative for identifying additional risk variants that are relevant for the general population. At the time of breast-cancer treatment, it may also be useful to know whether DNA repair mechanisms have been compromised because this status may influence the degree to which patients respond to targeted treatments and cancer immunotherapies.
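The variant filter described in the Methods (predicted MODERATE or HIGH impact, population frequency below 5%) and the per-patient summary can be sketched as follows. The dictionary keys and the example records are hypothetical; only the impact labels and the 5% threshold come from the abstract.

```python
def keep_variant(variant, max_population_freq=0.05):
    """True if the variant has MODERATE or HIGH predicted impact and is rare.

    variant: dict with hypothetical keys 'impact' (impact label as a string)
    and 'population_freq' (allele frequency between 0 and 1).
    """
    return (
        variant["impact"] in {"MODERATE", "HIGH"}
        and variant["population_freq"] < max_population_freq
    )


def count_rare_damaging(variants_by_patient):
    """Per-patient count of variants that pass the filter."""
    return {
        patient: sum(1 for v in variants if keep_variant(v))
        for patient, variants in variants_by_patient.items()
    }


# Hypothetical records, for illustration only.
cohort = {
    "patient_1": [
        {"gene": "BRCA1", "impact": "HIGH", "population_freq": 0.001},
        {"gene": "TTN", "impact": "MODERATE", "population_freq": 0.20},
    ],
    "patient_2": [
        {"gene": "ATM", "impact": "LOW", "population_freq": 0.002},
    ],
}
print(count_rare_damaging(cohort))
```

The per-patient counts could then be compared between carriers and non-carriers of BRCA1 or BRCA2 variants, or plotted against age at diagnosis, as the Results section outlines.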


99. Analysis of TCGA Data Under the Prism of miRNA Isoforms Leads to New Insights Into Breast Cancer Biology

Aristeidis G. Telonis, Phillipe Loher, Eric Londin, Isidore Rigoutsos

Computational Medicine Center, Sidney Kimmel Medical College at Thomas Jefferson University, Philadelphia, Pennsylvania

We analyzed breast cancer (BRCA) and adjacent normal breast samples from The Cancer Genome Atlas (TCGA) repository from the standpoint of miRNA isoforms (isomiRs). The analyzed datasets represent all hormone profiles and include patients from two races (white and black). The choice of datasets allowed us to carry out subtype-specific and race-specific comparisons and to track the dependence of the isomiR profiles on these variables. Our analysis revealed a very large repertoire of active, distinct miRNA isoforms in breast cancer that are produced from genomic loci already in miRBase as well as from many of our recently reported novel miRNA loci. Importantly, we found that the specifics of which isomiRs will be produced by a given miRNA precursor arm depend on the subtype of breast cancer as well as on the race of the breast cancer patient. This dependence is strikingly evident in our analysis of triple negative breast cancer and normal TCGA datasets from white and black patients. From the standpoint of potential downstream targets, we found that isomiRs with even slightly different sequences can have radically different targetomes in BRCA, a finding that is concordant with the messenger RNA data that are available for the TCGA samples we analyzed. From the standpoint of information content, we found that the isomiR profiles can easily differentiate between the Luminal A and Luminal B BRCA subtypes. Lastly, our analyses also show that the abundance of an isomiR in BRCA depends strongly on the location of the isomiR’s 5′ endpoint within the miRNA precursor, which in turn suggests the existence of currently unknown mechanisms that are involved in isomiR biogenesis.

These findings make an important contribution to our understanding of the post-transcriptional regulatory layer in breast cancer. They may also underlie previously reported differential outcomes among BRCA patients, which persist after socioeconomic factors have been accounted for. As such, isomiRs might provide a platform that further facilitates the transition of miRNAs from basic research to clinical use.


101. TERT Fusion Transcript in Hepatocellular Carcinoma

Shogo Yamamoto, Kenji Tatsuno, Hiroki Ueda, Genta Nagae, Hiroyuki Aburatani

Genome Science Division, RCAST, University of Tokyo, Tokyo, Japan

We have sequenced approximately 500 exomes of hepatocellular carcinoma (HCC) and matched normal liver tissue and recently reported [reference] that a variety of TERT alterations, such as TERT promoter mutation, focal copy number gain at the TERT locus, and hepatitis B virus (HBV) integration at the TERT locus, were the most frequent gene alterations (>65% of total cases) in HCC. We have also analyzed transcripts of HCC and normal liver tissue and confirmed that TERT expression was higher than in normal liver tissue in all HCC samples, even in cases without these TERT alterations. Gene fusion is known as a gene regulation mechanism in several cancers. We analyzed TERT-fusion transcripts using RNA-seq data of HCC and identified TERT-fusion transcripts in 4 of 152 HCC cases. In the fusion-positive cases, the TERT expression level was ~100 times higher than in normal liver and 30-60 times higher than in HCCs without gene fusion. We also analyzed TERT-fusion transcripts in 250 liver cancer RNA-seq datasets from TCGA and identified 4 cases with a TERT-fusion transcript. Among them, one gene was recurrently identified as a TERT fusion partner both in the Japanese HCC data and in the TCGA data, suggesting a functional role for this fusion gene in HCC tumorigenesis.

Reference
Totoki et al., Nat Genet. 2014 Dec;46(12):1267-73.


103. Non-Coding RNA Profiling of Colon Cancer Reveals a Distinct Profile for High-Grade and Metastatic Tumors: A Potential Role for Mitochondrial tRNAs and Small Nucleolar RNAs in Locally Invasive and Metastatic Tumors

Lai Xu1, Joseph Ziegelbauer2, Rong Wang3, Wells W. Wu3, Rong-Fong Shen3, Hartmut Juhl4, Yaqin Zhang1, Amy Rosenberg1

1OBP/DBRR-III, CDER, FDA, Silver Spring, Maryland; 2HIV/AIDS Malignancy Branch, NCI, Bethesda, Maryland; 3Facility for Biotechnology Resources, CBER, FDA, Silver Spring, Maryland; 4Indivumed GmbH, Hamburg, Germany

Purpose: To gain insight into the potential role of non-coding RNAs (ncRNAs) in the biological characteristics of colorectal carcinoma (CRC), we evaluated both non-coding and coding RNAs in paired samples of normal mucosa and tumor from 15 patients with CRC. The samples were collected and stored under stringent conditions, thereby minimizing warm ischemic time.

Experimental Design: We focused particularly on distinctions between high grade tumors and tumors with known metastases versus lower grade tumors, performing RNA-Seq analysis, which quantifies transcript abundance and identifies novel transcripts.

Results: In comparing tumors to healthy control mucosa, we found the following: (1) a distinct signature of mitochondrial encoded transfer RNAs (MT-tRNAs) and small nucleolar RNAs (snoRNAs) in CRC with lymph node (LN) metastases; (2) a MT-tRNA/snoRNA signature for high vs low grade tumors; (3) individualized MT-tRNA/snoRNA and microRNA (miRNA) fingerprints for each patient’s tumor; and (4) miRNA/mRNA signatures of hypoxia and a shift to glycolytic metabolism in all CRCs, regardless of grade and stage.

Conclusions: These findings could potentially improve upon anatomically and histologically based tumor staging systems, improve prognostic predictions, and identify potential targets for cancer therapeutics.


105. Deciphering Genomic Underpinnings of Quantitative MRI-Based Radiomic Phenotypes of Invasive Breast Carcinoma

Yitan Zhu1, Hui Li2, Wentian Guo3, Karen Drukker2, Shengjie Yang1, Li Lan2, Maryellen L. Giger2, Yuan Ji1,4 for the TCIA Breast Cancer Group

1Research Institute, NorthShore University HealthSystem, Evanston, Illinois; 2Department of Radiology, The University of Chicago, Chicago, Illinois; 3School of Public Health, Fudan University, Shanghai, People’s Republic of China; 4Department of Health Studies, The University of Chicago, Chicago, Illinois

Magnetic Resonance Imaging (MRI) has been routinely used for diagnosis and assessment of breast cancer. Despite its wide application in clinical practice, the relationship between the observed tumor MRI phenotypes and the genomic mechanisms of tumorigenesis remains under-explored, largely due to a lack of data on both imaging and genomics for the same tumors. We combined data from The Cancer Genome Atlas (TCGA) and The Cancer Imaging Archive (TCIA), which included quantitatively extracted MRI phenotypes of 91 breast invasive carcinomas and their multi-layer genomic data. Gene set enrichment analysis and regression analysis were performed to identify associations between tumor MRI radiomic phenotypes and various genomic and molecular subtypes of tumors. Patient groups defined by radiomic phenotypes and genomic platforms were also tested for association with tumor pathological stages and molecular receptor status using Fisher’s exact test. Significant associations (adjusted p-value ≤ 0.1) were identified between radiomic phenotypes (characterizing tumor size, shape, margin, enhancement texture, and blood flow kinetics) and genomic features involved in multiple molecular regulation layers (including pathway gene expressions, pathway copy number variations, gene somatic mutations, miRNA expressions, and protein expressions). Transcriptional activities of various genetic pathways were predominantly positively associated with tumor size and blurred tumor margin. miRNA activity was significantly associated with tumor size and enhancement textures, but not with phenotypes describing tumor shape, margin, and blood flow kinetics. Patient groups defined by radiomic phenotypes were associated with tumor T stage and overall stage (p-values ≤ 0.072). Patient groups defined by genomic platforms were associated with the status of progesterone and estrogen receptors (p-value ≤ 4.27×10⁻⁵) and with pathological stages (p-values ≤ 0.056). We present these findings as a resource shedding light on the connection between underlying genetic mechanisms and observed tumor radiomic phenotypes, which forms a basis for future studies using non-invasive MRI techniques for accurate cancer diagnosis and prognosis.
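The "adjusted p-value ≤ 0.1" criterion implies a multiple-testing correction across the many radiomic-genomic pairs. The abstract does not name the method; the sketch below assumes the Benjamini-Hochberg procedure as one common choice, and the p-values in the example are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four radiomic-genomic associations.
raw = [0.01, 0.04, 0.03, 0.5]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p <= 0.1]
print(adj, significant)
```

Under this assumption, the threshold of 0.1 bounds the expected fraction of false discoveries among the reported associations rather than the chance of any single false positive.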