CBIO243: Principles of Cancer Systems Biology

Sylvia Plevritis, PhD, Course Director

Melissa Ko, Teaching Assistant

Fuad Nijim, CCSB Program Manager

March 31, 2014

Goals of CBIO243

• Introduce major principles of cancer systems biology that integrate experimental and computational biology.

• Gain familiarity with methods to analyze high-dimensional and highly-multiplexed data in order to synthesize biologically and clinically relevant insights and generate hypotheses for functional testing.

Biological Sciences:
• Cancer Biology
• Hematology
• Immunology
• Genetics
• etc.

Computational Sciences:
• Bioinformatics
• Engineering
• Computer Science
• Physics
• Statistics
• etc.

CSB

Approach: Integrative Analysis

Cancer Research Goal:
• Drug Targets
• Drug Resistance
• Combination Therapies
• Tumor Evolution
• Cancer Drivers
• Metastasis
• Tumor Heterogeneity
• Cancer Stem Cells
• EMT
• Personalized Medicine
• Biomarkers
• Other ______

Experimental Sciences:
• Sequencing
• Methylation
• Gene Expression
• CNV
• TMA
• Proteomics
• Single Cell Analysis
• LCM, Sorted Cells
• Drug Screening
• Other ______

Computational Sciences:
• Statistical Regression
• Machine Learning
• Bayesian Analysis
• Boolean Analysis
• ODE/PDE
• Network Reconstruction
• Pathway Analysis
• Other ______

Functional Validation

Components of Cancer Systems Biology

Topics Covered

• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine

Grading

• Weekly paper review/class participation (30%)

• Project Presentations (20%)

• Final Project Report (50%): 6-7 page written report and oral presentation demonstrating the understanding of key concepts in cancer systems biology research.

Weekly Reading Review

• Summarize objective/hypothesis, the data, the controls, results and the published interpretations.

• Discuss whether the authors' conclusions were justified, and suggest improved analyses and/or future research.

• Describe the relevance to cancer systems biology, and any gaps in training that make it hard to fully understand the paper.

First Reading Assignment

• Chuang, H.-Y., Lee, E., Liu, Y.-T., Lee, D., & Ideker, T. (2007). Network-based classification of breast cancer metastasis. Molecular Systems Biology.

• Akavia, U. D., Litvin, O., Kim, J., Sanchez-Garcia, F., Kotliar, D., Causton, H. C., Pochanard, P., et al. (2010). An Integrated Approach to Uncover Drivers of Cancer. Cell, 143(6), 1005–1017.

Background Material

• Overview of Cancer
– Hanahan D, Weinberg RA. Hallmarks of Cancer: The Next Generation, Cell 144(5), 2011.
• Overview of Molecular Biology
– Kimball’s Biology Pages
– http://home.comcast.net/~john.kimball1/BiologyPages

Background Material

• Visualization of Genomic Data
– Schroeder MP, et al, Visualizing multidimensional cancer genomics data, Genome Medicine, 5:9, 2013
• Overview of Programming
– R/Bioconductor
  • http://www.r-project.org/
  • www.cyclismo.org/tutorial/R/
– Python
  • http://www.python.org/
  • https://developers.google.com/edu/python/

Center for Cancer Systems Biology (ccsb.stanford.edu)

• Monthly Seminar Series
– GENOMIC BIOMARKERS OF CANCER PREVENTION AND TREATMENT
– Friday April 11th at 11 am (Alway Building, Room M114)
– Andrea Bild, Department of Pharmacology and Toxicology, University of Utah

• Annual Symposium (Friday October 17, 2014)

• R25T Training Grant
– Two year postdoctoral training fellowship

Cancer as a Complex System

Pienta et al, Ecological Therapy for Cancer: Defining Tumors Using an Ecosystem Paradigm Suggests New Opportunities for Novel Cancer Treatments, Translational Oncology, 2008, 1(4):158-164.

Multiscale View of Cancer
• Genes and proteins
• Complex signaling and regulatory networks
• Multiple cellular processes
• Micro-environment
• Host systems
• Environmental factors
• Population dynamics

Initiation Progression Metastasis Recurrence

Time - Progression

Hanahan, D., & Weinberg, R. A. (2011). Hallmarks of Cancer: The Next Generation. Cell, 144(5), 646–674.

Hallmarks of Cancer

http://www.cell.com/image/S0092-8674(11)00127-9?imageId=gr2&imageType=hiRes

Network types
• Protein-protein
• Protein-DNA
• miRNA-RNA
• Transcriptional (expression) networks
• Signaling networks

Sachs et al. http://www.sciencemag.org/content/308/5721/523.full

The Multiscale Challenge

• Many components and interactions of the “cancer system” are known

• How global dynamics and phenotypic properties arise from local interactions is not well understood

http://circ.ahajournals.org/content/123/18/1996/F5.expansion.html

Goals of Cancer Systems Biology Research

• To derive a comprehensive understanding of cancer’s complexity by integrating diverse information to:
– Identify cellular networks and cell-cell interactions that drive cancer initiation and progression
– Identify potential therapeutic targets and mechanisms of action

Principles in Cancer Systems Biology Research

• Cancer networks are dynamic and respond to genetic variants, epigenetics and the microenvironment

• Tumors may not be a random collection of malignant cells but cells that may be related through processes of developmental biology

Cancer Systems Biology The Past

Experimentation Computation

Cancer Systems Biology The Present

Experimentation Computation

Cancer Systems Biology The Future

Experimentation

Computation

Objective: Identify genes and networks differentially expressed in lymphoma transformation

• Glas et al. “Gene expression profiling in follicular lymphoma to assess clinical aggressiveness and to guide the choice of treatment.” Blood 2005

– 24 paired samples (12 FL/12 DLBCL)
– 88 FL/DLBCL arrays
  » 30 DLBCL
  » 40 FL-transforming (FL_t)
  » 18 FL-non-transforming (FL_nt)

FL DLBCL

Identify differentially expressed genes

• Average Fold Change (AFC)
– Pro: Easy
– Con: Does not account for variance
• p-value, based on t-test statistic
– Pro: Easy, accounts for variance
– Con: Does not account for the problem of multiple hypothesis testing

[Figure: scatter plot of -Log10(p-value) versus Log2(Average Fold Change)]
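The trade-off between fold change and a t-test can be illustrated with a minimal Python sketch. The expression values below are hypothetical, and Welch's t statistic is used as one reasonable choice of test statistic; it is an assumption, not necessarily the statistic used in the lecture. Both genes have the same log2 fold change, but only the low-variance gene yields a large t statistic.

```python
import math
from statistics import mean, variance

def log2_fold_change(group_a, group_b):
    """Difference of mean log2 expression (B minus A); inputs are already log2-scaled."""
    return mean(group_b) - mean(group_a)

def welch_t(group_a, group_b):
    """Welch's t statistic for two groups with possibly unequal variances."""
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_b) - mean(group_a)) / standard_error

# Hypothetical log2 expression values for two genes in FL and DLBCL samples.
steady_gene = {"FL": [5.0, 5.1, 4.9, 5.0], "DLBCL": [6.0, 6.1, 5.9, 6.0]}
noisy_gene = {"FL": [3.0, 7.0, 4.0, 6.0], "DLBCL": [4.0, 8.0, 5.0, 7.0]}

for name, gene in [("steady", steady_gene), ("noisy", noisy_gene)]:
    lfc = log2_fold_change(gene["FL"], gene["DLBCL"])
    t_stat = welch_t(gene["FL"], gene["DLBCL"])
    print(f"{name}: log2 fold change = {lfc:.2f}, t = {t_stat:.2f}")
```

Turning the t statistic into a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would be needed for that step.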

Significance Analysis of Microarrays (SAM)

http://www-stat.stanford.edu/~tibs/

[Figure: SAM plot; axes: expected, observed]

Addressing the problem of Multiple Hypothesis Testing:

Suppose we measure 10,000 genes and nothing changes.

At the 1% significance level, about 100 genes would still be selected as differentially expressed, and all of them would be false positives.

SAM corrects for this by computing the False Discovery Rate, based on permutation testing.
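A rough Python sketch of a permutation-based false discovery rate estimate follows. It is a simplification under stated assumptions: it uses a plain difference in means rather than SAM's own statistic, and the data, threshold, and function names are hypothetical. Shuffling the sample labels breaks any real association, so the number of genes called on shuffled data estimates how many calls are expected by chance.

```python
import random
from statistics import mean

def mean_difference(values, labels):
    """Difference in mean expression between group 1 and group 0."""
    group0 = [v for v, lab in zip(values, labels) if lab == 0]
    group1 = [v for v, lab in zip(values, labels) if lab == 1]
    return mean(group1) - mean(group0)

def count_calls(data, labels, threshold):
    """Number of genes whose absolute mean difference reaches the threshold."""
    return sum(abs(mean_difference(row, labels)) >= threshold for row in data)

def permutation_fdr(data, labels, threshold, n_permutations=20, seed=0):
    """Estimate FDR as (average calls on shuffled labels) / (calls on real labels)."""
    rng = random.Random(seed)
    observed = count_calls(data, labels, threshold)
    shuffled = list(labels)
    null_counts = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        null_counts.append(count_calls(data, shuffled, threshold))
    expected_false = mean(null_counts)
    fdr = min(1.0, expected_false / observed) if observed else 0.0
    return fdr, observed, expected_false

# Hypothetical data: 500 genes, 5 FL and 5 DLBCL samples; only 25 genes truly change.
rng = random.Random(1)
labels = [0] * 5 + [1] * 5
data = []
for gene_index in range(500):
    shift = 3.0 if gene_index < 25 else 0.0
    data.append([rng.gauss(0.0, 1.0) + shift * lab for lab in labels])

fdr, observed, expected_false = permutation_fdr(data, labels, threshold=1.5)
print(f"called: {observed}, expected false calls: {expected_false:.1f}, FDR ~ {fdr:.2f}")
```

Raising the threshold lowers the number of calls and, usually, the estimated FDR, which is the trade-off the user tunes in this kind of analysis.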

GoMiner
• Identify enrichment in Gene Ontology (GO) terms, based on a hierarchy describing biological process, cellular component, and molecular function

Genes significantly differentially expressed in compact vs. non-compact tumors are related to cell death, cell-to-cell signaling and interaction, cellular assembly and organization, DNA replication, and cellular movement

http://discover.nci.nih.gov/gominer/
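Enrichment of a GO term in a gene list is commonly scored with a one-sided hypergeometric (Fisher's exact) test. The sketch below assumes that formulation; whether it matches the tool's exact procedure is an assumption, and the gene counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(total_genes, genes_in_term, selected_genes, selected_in_term):
    """Probability of seeing at least `selected_in_term` term members
    when `selected_genes` genes are drawn at random without replacement."""
    upper = min(genes_in_term, selected_genes)
    favorable = sum(
        comb(genes_in_term, k) * comb(total_genes - genes_in_term, selected_genes - k)
        for k in range(selected_in_term, upper + 1)
    )
    return favorable / comb(total_genes, selected_genes)

# Hypothetical counts: 10,000 genes measured, 100 annotated to a GO term,
# 200 genes called differentially expressed, 10 of which carry the term.
# By chance alone about 200 * 100 / 10000 = 2 would be expected.
p_enriched = hypergeom_enrichment_p(10000, 100, 200, 10)
print(f"enrichment p-value: {p_enriched:.2e}")
```

Because many GO terms are tested at once, these p-values would also need a multiple-testing correction.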

Gene set enrichment analysis (GSEA)

• Evaluate enrichment of curated gene sets, such as
– Pathways
– Genes that share a motif
– Genes at a similar chromosomal location
– Computationally predicted gene sets
– Your own favorite list of genes

• Evaluating related genes together adds statistical power

• http://broad.mit.edu/gsea
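The core of GSEA is a running-sum statistic computed over a ranked gene list. The sketch below shows an unweighted version only; the published method also weights hits by their correlation with the phenotype and assesses significance by permutation, both of which are omitted here. Gene names and the ranking are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up at genes in the set and down otherwise;
    the score is the running sum with the largest absolute value.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_misses = len(ranked_genes) - n_hits
    if n_hits == 0 or n_misses == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hits if is_hit else -1.0 / n_misses
        if abs(running) > abs(best):
            best = running
    return best

# Hypothetical ranking from most up-regulated (g1) to most down-regulated (g10) in DLBCL.
ranked = [f"g{i}" for i in range(1, 11)]
es_top = enrichment_score(ranked, {"g1", "g2", "g3"})      # set concentrated at the top
es_bottom = enrichment_score(ranked, {"g8", "g9", "g10"})  # set concentrated at the bottom
print(f"top set ES = {es_top:.2f}, bottom set ES = {es_bottom:.2f}")
```

A set whose members are scattered through the ranking gives a score near zero, while a set clustered at either end gives a score near +1 or -1.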

GSEA on Lymphoma Data

• Myc targets up-regulated, in agreement with Myc up-regulation found by SAM

• GSEA detects ~200 sets of differentially expressed genes at low FDR
– Many metabolic pathways up-regulated in DLBCL
– Myc target genes significant

• In general, GSEA produces many “generic” gene sets
– many metabolic
– many a consequence of aggressive phenotype
– no graphical view of pathways

[Figure: FL vs. DLBCL; legend: UP, DOWN]

Overlap expression levels on canonical pathways

IPA, Ingenuity Pathway Analysis (www.ingenuity.com)

Cellular assembly & organization network

• Expand network using interactions from the literature

• Visualization using cellular localization

IPA links to literature

Protein-protein Interaction Networks

http://string-db.org

String-db.org - example
• DNA repair genes

BARD1     FANCL   POLD3    TOPBP1
BLM       FEN1    POLE     TREX1
BRCA1     GMNN    POLE2    UNG
BRIP1     ING2    PRIM2A   USP1
DCLRE1A   MLH3    RAD51A
DCLRE1B   MSH2    RAD54B
DDX11     MSH5    RECQL4
DNA2L     MSH6    RFC3
EXO1      PARP2   RFC4
FANCG     PCNA    RPA2

Inferring Gene Regulatory Networks

Useful non-technical review: “Computational methods for discovering gene networks from expression data” Lee & Tzou

Single gene focus is limiting

[Figure: gene A expression (induced/repressed) across individuals, FL vs. DLBCL]

Gene interaction is more powerful

[Figure: gene A and gene B expression (induced/repressed) across individuals, FL vs. DLBCL; annotation: A UP, B DOWN]

Interaction of gene clusters

[Figure: Module X and Module Y expression (induced/repressed) across individuals, FL vs. DLBCL; annotation: X UP, Y DOWN]

Inferring Gene Regulation

[Figure: expression matrix of genes (gene1 through geneN) by samples, with genes grouped into Module1, Module2, Module3]

[Figure: average expression of each module (Mod1, Mod8, Mod3, Mod6) across samples]
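Summarizing a module by its average expression per sample can be sketched in a few lines of Python. The gene names, module assignments, and values are hypothetical; how genes are assigned to modules in the first place is not covered here.

```python
from statistics import mean

def module_averages(expression, modules):
    """Average the expression of each module's genes, sample by sample.

    expression: gene name -> list of values, one per sample
    modules:    module name -> list of gene names in that module
    """
    n_samples = len(next(iter(expression.values())))
    return {
        module: [mean(expression[gene][s] for gene in genes) for s in range(n_samples)]
        for module, genes in modules.items()
    }

# Hypothetical expression values for three genes across three samples.
expression = {
    "gene1": [1.0, 2.0, 3.0],
    "gene2": [3.0, 4.0, 5.0],
    "gene3": [10.0, 10.0, 10.0],
}
modules = {"Module1": ["gene1", "gene2"], "Module2": ["gene3"]}

averages = module_averages(expression, modules)
for module, profile in averages.items():
    print(module, profile)
```

Working with module averages reduces thousands of genes to a handful of profiles, which is what makes the later regression against regulators and survival tractable.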

Key Idea of Regulatory Module Networks
• Look for a set of regulatory factors that, in combination, predict a gene’s expression level
• Regulatory factors can include:
– mRNA level of regulatory proteins
– Genotypic factors (SNPs, CNVs)
– Epigenetic factors (methylation status)
– TF binding (measured by ChIP-seq)
– …

• Factors that robustly predict a target’s expression across different experiments are inferred to be its regulators

Segal et al., Nature Genetics 2003

Transcription factors, signal transduction proteins, mRNA binding proteins, chromatin modification factors, …

Computationally Derived Regulatory Module

A group of co-expressed genes is driven by a computationally derived transcriptional regulatory program, which is built from a candidate list of regulators.

[Figure: regulatory program (Gene A On/Off, Gene B On/Off) shown with the module genes]

Segal E et al, Nature Genetics 2003.
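One way to picture how a regulatory program picks a regulator is a single On/Off split: for each candidate regulator, find the threshold that best separates samples with low and high module expression. The Python sketch below is a deliberate simplification and not the published algorithm, which combines several regulators; regulator names and values are hypothetical.

```python
from statistics import mean

def split_error(regulator_levels, module_expression, threshold):
    """Squared error when samples are split into Off (<= threshold) and On (> threshold)."""
    off = [m for r, m in zip(regulator_levels, module_expression) if r <= threshold]
    on = [m for r, m in zip(regulator_levels, module_expression) if r > threshold]
    return sum((m - mean(off)) ** 2 for m in off) + sum((m - mean(on)) ** 2 for m in on)

def best_regulator(candidates, module_expression):
    """Return (regulator name, threshold, error) for the best single On/Off split."""
    best = None
    for name, levels in candidates.items():
        # Dropping the largest level guarantees both sides of the split are non-empty.
        for threshold in sorted(set(levels))[:-1]:
            error = split_error(levels, module_expression, threshold)
            if best is None or error < best[2]:
                best = (name, threshold, error)
    return best

# Hypothetical data: mRNA levels of two candidate regulators and one module's
# average expression across six samples.
candidates = {
    "TF_A": [0.1, 0.2, 0.1, 0.9, 1.0, 0.8],
    "TF_B": [0.5, 0.9, 0.1, 0.4, 0.8, 0.2],
}
module_expression = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9]

name, threshold, error = best_regulator(candidates, module_expression)
print(f"best regulator: {name}, On when level > {threshold}, squared error = {error:.3f}")
```

Applying the same search again within the On and Off groups would grow the single split into a small tree over several regulators.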

Core module network of FL transformation

Gentles A et al, Blood 2009

Integration with survival data
• Module A is the single module most predictive of survival by Cox regression (bad prognosis in FL)

• Define a linear predictor of survival:
– LPS = 1.14*ModuleA + 0.72*GFL3027 - 1.35*GFL2738
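The linear predictor is straightforward to evaluate per patient. In the Python sketch below the coefficients come from the slide, while the patient names and input values are hypothetical.

```python
def linear_predictor_score(module_a, gfl3027, gfl2738):
    """LPS = 1.14*ModuleA + 0.72*GFL3027 - 1.35*GFL2738 (coefficients from the slide)."""
    return 1.14 * module_a + 0.72 * gfl3027 - 1.35 * gfl2738

# Hypothetical module expression values (ModuleA, GFL3027, GFL2738) for three patients.
patients = {
    "patient_1": (1.0, 1.0, 1.0),
    "patient_2": (2.0, 0.5, 0.2),
    "patient_3": (0.1, 0.3, 1.5),
}

scores = {pid: linear_predictor_score(*values) for pid, values in patients.items()}

# A higher score means a larger weighted contribution from the modules with positive coefficients.
for pid, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{pid}: LPS = {score:.2f}")
```

Patients can then be grouped by score, for example above and below a cutoff, before comparing survival between groups.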

Bad Part: ESC like expression

Good Part: TGFB signaling

Gentles A et al, Blood 2009

Survival based on LPS

Gentles A et al, Blood 2009

DATABASES

• TCGA
• CCLE
• ENCODE

The Cancer Genome Atlas (TCGA)

• Phase I: Initiated in 2005 by the National Cancer Institute and National Human Genome Research Institute to catalog genetic mutations causing cancer, using genome sequencing; focused on GBM, lung and ovarian cancer

• Phase II: Expanded to 20-25 different cancer types, complementing genome sequencing with genomic characterization, including gene expression profiling, copy number variation, DNA methylation, miRNA

TCGA: Cancer measured at multiple scales

– mRNA & miRNA expression
– Copy number
– DNA Methylation
– Mutation (NGS)
– Pathology images
– Medical Images
– Treatment
– Survival Outcome

TCGA Cancer Types

[Figure: bar chart of Number of Patients with Samples (axis 0 to 1000) for each cancer type]

• Acute Myeloid Leukemia [LAML]
• Bladder Urothelial Carcinoma [BLCA]
• Brain Lower Grade Glioma [LGG]
• Breast invasive carcinoma [BRCA]
• Cervical squamous cell carcinoma and endocervical adenocarcinoma [CESC]
• Colon adenocarcinoma [COAD]
• Esophageal carcinoma [ESCA]
• Glioblastoma multiforme [GBM]
• Head and Neck squamous cell carcinoma [HNSC]
• Kidney Chromophobe [KICH]
• Kidney renal clear cell carcinoma [KIRC]
• Kidney renal papillary cell carcinoma [KIRP]
• Liver hepatocellular carcinoma [LIHC]
• Lung adenocarcinoma [LUAD]
• Lung squamous cell carcinoma [LUSC]
• Lymphoid Neoplasm Diffuse Large B-cell Lymphoma [DLBC]
• Ovarian serous cystadenocarcinoma [OV]
• Pancreatic adenocarcinoma [PAAD]
• Prostate adenocarcinoma [PRAD]
• Rectum adenocarcinoma [READ]
• Sarcoma [SARC]
• Skin Cutaneous Melanoma [SKCM]
• Stomach adenocarcinoma [STAD]
• Thyroid carcinoma [THCA]
• Uterine Corpus Endometrioid Carcinoma [UCEC]

TCGA Organization

TSS: Tissue Source Sites
BCR: Biospecimen Core Resources
DCC: Data Coordinating Center
GCC: Genome Characterization Centers
GSC: Genome Sequencing Centers
CGHub: Cancer Genomics Hub
GDACs: Genome Data Analysis Centers

Major TCGA Publications
• Comprehensive molecular characterization of human colon and rectal cancer. Nature. 487 (7407):330-337, 2012.
– Mutations in ARID1A, SOX9, FAM123B/WTX; IGF2; mutations in WNT pathway

• Comprehensive genomic characterization of squamous cell lung cancers. Nature. 489 (7417):519-525, 2012.

• Comprehensive molecular portraits of human breast tumors. Nature. 490 (7418):61-70, 2012.
– Mutations in ESR1, GATA3, FOXA1, XBP1, and cMYB

• Integrated genomic analyses of ovarian carcinoma. Nature. 474 (7353):609-615, 2011.
– Mutations in TP53 occurred in 96% of the cases studied; mutations in BRCA1 and BRCA2 occurred in 21% of the cases

• An integrated genomic analysis identifies clinically relevant subtypes of glioblastoma characterized by abnormalities in PDGFRA, IDH1, EGFR and NF1. Cancer Cell. 17 (1):98-110, 2010.

• Identification of a CpG Island Methylator Phenotype that Defines a Distinct Subgroup of Glioma. Cancer Cell. 17 (5):510-522, 2010.

• Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature. 455 (7216):1061-1068, 2008.
– Mutations in NF1, ERBB2, TP53, PIK3R1

UCSC Cancer Browser – Chromosome View

https://genome-cancer.ucsc.edu

UCSC Cancer Browser – Gene View

Cancer Browser – Survival Analysis

Cancer Cell Line Encyclopedia (CCLE)

• The Cancer Cell Line Encyclopedia (CCLE) project is a collaboration between the Broad Institute and Novartis to conduct a genetic and pharmacologic characterization of a large panel of human cancer cell lines.

• It aims to link distinct drug responses to genomic patterns and to translate cell line integrative genomics into cancer patient stratification.

• It provides public access to analysis and visualization of DNA copy number, mRNA expression and mutation data for about 1000 cell lines.

http://www.broadinstitute.org/ccle/home

Cellular Information Processing

ENCODE

http://genome.ucsc.edu/ENCODE/index.html

Summary

• Basic principles of molecular biology of cancer
• Experimental high-throughput technologies
• Design of perturbation studies, including drug screening
• Overview of publicly available datasets, including GEO, TCGA, CCLE, and ENCODE
• Online biocomputational tools, including selected accessible tools from the NCI Center for Bioinformatics
• Network reconstruction from genomic data
• Application of systems biology to identifying drug targets
• Application of systems biology to personalized medicine