Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha


Page 1: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Microarrays and Cancer: Segal et al.

CS 466

Saurabh Sinha

Page 2: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Genomics and pathology

• Genomics provides high-throughput measurements of molecular mechanisms
  – Microarrays, ChIP-on-chip, etc.

• Genomics may provide the molecular underpinnings of pathology, in a highly comprehensive manner
  – Revolutionize the diagnosis and management of diseases, including cancer

Page 3: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Prior applications to cancer

• Gene expression measurements have been applied to cancer diagnosis

• Measure each gene’s expression in several normal tissue samples, and several pathological (diseased) samples

• Find subset of genes differentially expressed in the two sample groups

• If such “gene signatures” of particular cancer types are found, they can become the basis of tests for malignancy
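
The comparison of the two sample groups can be sketched as below. This is a minimal illustration, not the procedure of any particular study: the function name `differential_genes`, the log2 fold-change cutoff, and the toy data are all assumptions, and real analyses would also use a statistical test to account for noise.

```python
import math
from statistics import mean

def differential_genes(normal, disease, min_log2_fold=1.0):
    """Return genes whose mean expression differs between the two groups.

    normal, disease: dict mapping gene name -> list of positive expression
    values, one per sample. The cutoff (hypothetical) is in log2 units,
    so 1.0 means a 2-fold change in either direction.
    """
    signature = {}
    for gene in normal:
        log2_fold = math.log2(mean(disease[gene]) / mean(normal[gene]))
        if abs(log2_fold) >= min_log2_fold:
            signature[gene] = log2_fold
    return signature

# Toy data (made up for illustration): geneA is higher in disease, geneB is not.
normal = {"geneA": [10.0, 12.0, 11.0], "geneB": [50.0, 48.0, 52.0]}
disease = {"geneA": [40.0, 44.0, 48.0], "geneB": [49.0, 51.0, 50.0]}
signature = differential_genes(normal, disease)
print(signature)
```

Only geneA passes the cutoff here; the hard threshold is exactly what the next slide criticizes.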

Page 4: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

We want better …

• Genes may be differentially expressed, but not enough to cross certain thresholds used in the analysis

• Analyzing the data on a gene-by-gene basis is error prone -- microarray data has inherent noise

• Finding the genes involved in one type of cancer is only the first step; it does not reveal the underlying processes

Page 5: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Part 1: Cancer modules

Page 6: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

A “module” level view

• Many methods use “gene modules” (sets of genes) as basic blocks for analysis

• Instead of trying to find changes in individual gene expression profiles, look for entire sets of genes whose expression profiles change together

Page 7: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

The study of Mootha et al.

• Showed that expression of “oxidative phosphorylation” genes (a particular set of genes) is reduced in diabetic muscle

• Signal not very strong when looking at individual genes, but highly significant when looking at the “gene module”
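
One way to see why pooling genes helps, under a simplifying assumption that is not from the slides: suppose each of the n genes in a module shifts by the same small amount δ, with independent noise of standard deviation σ per gene. Averaging over the module shrinks the noise by a factor of √n:

```latex
z_{\text{gene}} = \frac{\delta}{\sigma},
\qquad
z_{\text{module}} = \frac{\delta}{\sigma / \sqrt{n}} = \sqrt{n}\,\frac{\delta}{\sigma}
```

For illustration, with δ/σ = 0.3 and n = 100, a single gene sits at 0.3 standard deviations while the module average sits at 3. Genes in a real module are correlated, so the actual gain is smaller, and this simple average is only a stand-in for the statistic used in the study.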

Page 8: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Source: Nature Genetics 37, S38 - S45 (2005)

[Figure: axes are Disease tissue (Diabetes mellitus type 2) and Normal tissue (Normal tolerance to glucose). Grey: all genes. Red: oxidative phosphorylation genes.]

Page 9: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Segal et al.: Methodology

• Compile a large collection of cancer-related microarrays
  – microarrays measuring gene expression in cancer tissues or normal tissue

• Compile a large collection of gene sets (modules) from earlier studies

• Identify gene sets (modules) induced or repressed in a microarray

• Identify modules induced in several arrays, or repressed in several arrays

• Check if these arrays are enriched in some clinical annotation

Page 10: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify gene sets (modules) induced or repressed in a microarray

• Given expression value $E_{g,m}$ of each gene g in the microarray experiment m

• Compute average expression $E_g$ of the gene g over all microarrays

• If $E_{g,m}$ is 2-fold greater than $E_g$, call gene g induced in array m

• Categorize each gene as being induced or not-induced in the array.

Source: Nature Genetics 36, 1090 - 1098 (2004)
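
The induced / not-induced call above can be sketched as follows. The function name `call_induced` and the toy values are hypothetical, and the sketch assumes raw (not log-transformed) positive expression values and reads "2-fold greater" as "at least twice the average".

```python
def call_induced(expression, fold=2.0):
    """Label each gene as induced or not in each array.

    expression: dict mapping gene name -> list of expression values,
    one per microarray (assumed positive and not log-transformed).
    Returns dict mapping gene name -> list of booleans (True = induced).
    """
    calls = {}
    for gene, values in expression.items():
        average = sum(values) / len(values)  # E_g: mean over all arrays
        calls[gene] = [value >= fold * average for value in values]
    return calls

# Toy data (made up): g1 spikes in the last array, g2 is flat.
expression = {"g1": [1.0, 1.0, 1.0, 9.0], "g2": [5.0, 5.0, 5.0, 5.0]}
calls = call_induced(expression)
print(calls)
```

The set of genes marked True in a given array is the "Induced" set used in the hypergeometric test on the following slides.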

Page 11: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify gene sets (modules) induced or repressed in a microarray

• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k

• Hypergeometric test(N,n,m,k):

• If a set of m genes was chosen at random (sampling w/o replacement), what is the probability that the intersection would be larger than or equal to k?

[Figure: the sets All genes, Module, Induced, and their Intersection]

Page 12: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify gene sets (modules) induced or repressed in a microarray

• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k

• Hypergeometric test(N,n,m,k):

• Sum over i>=k: If a set of m genes was chosen at random (sampling w/o replacement), what is the probability that the intersection would be equal to i?

[Figure: the sets All genes, Module, Induced, and their Intersection]

Page 13: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify gene sets (modules) induced or repressed in a microarray

• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k

• Hypergeometric test(N,n,m,k):

$$\Pr(\text{intersection} \ge k) = \sum_{i \ge k} \frac{\binom{n}{i}\binom{N-n}{m-i}}{\binom{N}{m}}$$

“p-value” of the Hypergeometric test

[Figure: the sets All genes, Module, Induced, and their Intersection]
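
The sum above can be computed directly. The sketch below uses only the Python standard library; the function name `hypergeom_pvalue` is an assumption, and the argument names follow the N, n, m, k of the slide.

```python
from math import comb

def hypergeom_pvalue(N, n, m, k):
    """Probability that a random set of m genes (drawn without replacement
    from N genes) shares at least k genes with a module of size n."""
    upper = min(n, m)  # the intersection cannot exceed either set
    total = comb(N, m)  # number of ways to choose the m genes
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, upper + 1))
    return tail / total

# Small worked case: 10 genes, module of 4, 3 induced, 2 in the intersection.
p = hypergeom_pvalue(N=10, n=4, m=3, k=2)
print(p)
```

In the worked case the tail counts are 36 (for i = 2) and 4 (for i = 3) out of 120 possible sets, so p = 40/120. For genome-scale N, a library routine for the hypergeometric distribution would be the more practical choice.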

Page 14: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify gene sets (modules) induced or repressed in a microarray

• |All genes| = N
• |Module| = n
• |Induced| = m
• |Intersection| = k

• Hypergeometric test(N,n,m,k)

• If the “p-value” is very small, then we infer that the intersection is “statistically significant”, i.e., the module is induced in the microarray

• Similarly, define when a module is repressed in a microarray

[Figure: the sets All genes, Module, Induced, and their Intersection]

Page 15: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Segal et al.: Methodology

• Compile a large collection of cancer-related microarrays
  – microarrays measuring gene expression in cancer tissues or normal tissue

• Compile a large collection of gene sets (modules) from earlier studies

• Identify gene sets (modules) induced or repressed in a microarray

• Identify modules induced in several arrays, or repressed in several arrays

• Check if these arrays are enriched in some clinical annotation

Page 16: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Source: Nature Genetics 36, 1090 - 1098 (2004)

Page 17: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Segal et al.: Methodology

• Compile a large collection of cancer-related microarrays
  – microarrays measuring gene expression in cancer tissues or normal tissue

• Compile a large collection of gene sets (modules) from earlier studies

• Identify gene sets (modules) induced or repressed in a microarray

• Identify modules induced in several arrays, or repressed in several arrays

• Check if these arrays are enriched in some clinical annotation

Page 18: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Identify modules induced in several arrays, or repressed in several arrays

Source: Nature Genetics 36, 1090 - 1098 (2004)

Page 19: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Segal et al.: Methodology

• Compile a large collection of cancer-related microarrays
  – microarrays measuring gene expression in cancer tissues or normal tissue

• Compile a large collection of gene sets (modules) from earlier studies

• Identify gene sets (modules) induced or repressed in a microarray

• Identify modules induced in several arrays, or repressed in several arrays

• Check if these arrays are enriched in some clinical annotation

Page 20: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Check if these arrays are enriched in some clinical annotation

Source: Nature Genetics 36, 1090 - 1098 (2004)
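
A sketch of this enrichment check, under the assumption (not stated on this slide) that the same hypergeometric test is reused with arrays in place of genes: N = all arrays, n = arrays carrying the annotation, m = arrays in which the module is induced, k = their overlap. The names `annotation_enrichment` and `hypergeom_pvalue` and the toy annotations are hypothetical.

```python
from math import comb

def hypergeom_pvalue(N, n, m, k):
    """Upper-tail hypergeometric probability, as on the earlier slides."""
    tail = sum(comb(n, i) * comb(N - n, m - i) for i in range(k, min(n, m) + 1))
    return tail / comb(N, m)

def annotation_enrichment(induced_arrays, annotations):
    """For each clinical label, test whether the arrays in which a module
    is induced carry that label more often than expected by chance.

    induced_arrays: set of array ids (must all appear in annotations).
    annotations: dict mapping array id -> set of clinical labels.
    Returns dict mapping label -> p-value.
    """
    all_arrays = set(annotations)
    labels = set().union(*annotations.values())
    results = {}
    for label in labels:
        labelled = {a for a in all_arrays if label in annotations[a]}
        overlap = len(labelled & induced_arrays)
        results[label] = hypergeom_pvalue(
            len(all_arrays), len(labelled), len(induced_arrays), overlap
        )
    return results

# Toy annotations (made up): the module is induced only in leukemia arrays.
annotations = {f"a{i}": {"leukemia"} for i in range(1, 5)}
annotations.update({f"a{i}": {"normal"} for i in range(5, 9)})
results = annotation_enrichment({"a1", "a2", "a3"}, annotations)
print(results)
```

A small p-value for a label corresponds to one coloured cell of the module map on the next slide.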

Page 21: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Segal et al.: Cancer “module maps”

Source: Nature Genetics 37, S38 - S45 (2005)

Red(m,c): Microarrays in which module m was overexpressed (induced) are enriched in condition c

Green: Microarrays in which module m was underexpressed (repressed) are enriched in condition c

Rows and columns are not in an arbitrary order. They have been “clustered” to display similar rows (or columns) together

Page 22: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Insights from cancer module map

• Some modules are activated or repressed across many tumor types. Such modules could be related to general tumorigenic processes

• Some modules specifically activated or repressed in certain tumor types or stages of tumor progression

Page 23: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

From modules to regulation

• A module map shows the transcriptional changes underlying cancer

• Transcriptional changes are a result of transcription factors and their binding sites

• A deeper understanding of cancer would come from finding out which transcription factors and binding sites led to the transcriptional changes

Page 24: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Part 2: Cis-regulatory elements

Page 25: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Genomics and gene regulation

• Such knowledge comes from genomics data

• ChIP-chip studies identify which transcription factors bind which DNA sequences

• Analysis of DNA sequence, using known binding site motifs, gives us putative binding sites

• Cross-species conservation also tells us something about possible locations of binding sites

Page 26: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Cis-regulatory analysis

• Identify a set of genes whose promoters contain the same binding sites
  – Such a set of genes is likely to be regulated by the same TF
  – Often called a “regulatory module”

• Earlier studies mined microarrays for “co-expressed” genes, then used motif finding algorithms to discover their shared binding sites

Page 27: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Cis-regulatory analysis

• Another approach (Segal et al. 2003) tried to solve the problem in an integrated manner

• Find a set of genes such that
  – their expression profiles are similar (microarrays)
  – they share the same binding sites (sequence)

• Joint learning of “regulatory module” from two very different types of data: microarray and sequence
  – An important theme in current bioinformatics

Page 28: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Cis-regulatory analysis

• Connection between gene expression and cis-regulatory elements (binding sites) also explored in Beer & Tavazoie.

• Found rules on combinations and locations of binding sites that would cause the gene to be over- or under-expressed

Page 29: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

• The binding sites “RRPE” and “PAC” must occur within 240 bp and 140 bp of gene start

• Genes containing both motifs, following certain rules on location, are tightly co-regulated

• Genes containing any one motif, or both in incorrect positional configuration, have close to random expression

Source: Nature Genetics 37, S38 - S45 (2005)
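
The positional rule can be expressed as a simple check. This sketch assumes the two distances on the slide apply respectively (RRPE within 240 bp, PAC within 140 bp of gene start); the function name `follows_rule` and the example distances are hypothetical, and the study's full rules may include further constraints.

```python
def follows_rule(rrpe_distance, pac_distance, rrpe_max=240, pac_max=140):
    """Check whether a gene has both motifs within the allowed distances.

    rrpe_distance, pac_distance: distance in bp from the gene start to the
    motif occurrence, or None if the motif is absent from the promoter.
    """
    if rrpe_distance is None or pac_distance is None:
        return False  # a single motif alone does not satisfy the rule
    return rrpe_distance <= rrpe_max and pac_distance <= pac_max

print(follows_rule(200, 100))   # both motifs present and close enough
print(follows_rule(200, 300))   # PAC too far from the gene start
print(follows_rule(None, 100))  # RRPE absent
```

Genes for which the check holds would be the ones expected to be tightly co-regulated.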

Page 30: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Eukaryotes

• These studies have mostly focused on yeast (which is a eukaryote, but has a small, compact genome)

• Not much work of this type in the larger, more complex genomes of metazoans (e.g., humans, rodents, fruit flies)

• The genome is not compact; may not suffice to look at sequence right next to a gene. Intergenic regions are long, and cis-regulatory signals may not be close to gene

Page 31: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

One study in humans

• HeLa cells are an “immortal” cell line derived from cervical cancer cells in a person who died in 1951.
  – Used extensively in studying cancer

• Method of Segal et al. (joint learning of regulatory modules from gene expression and sequence data) applied to these cells

Page 32: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

One study in humans

• Gene expression data used: microarrays measuring gene expression during the cell cycle in HeLa cells

• Sequence: 1000 bp promoters (upstream) of human genes
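
Extracting an upstream promoter can be sketched as below. The function name `upstream_promoter`, the 0-based coordinate convention, and the toy sequence are assumptions for illustration; the study's own extraction pipeline is not described on the slide.

```python
# Translation table for complementing a DNA string (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def upstream_promoter(chromosome, tss, strand, length=1000):
    """Return up to `length` bp immediately upstream of a gene start.

    chromosome: DNA string; tss: 0-based index of the gene start;
    strand: "+" or "-". For minus-strand genes, upstream lies at higher
    coordinates and the result is reverse-complemented so that it reads
    5' to 3' toward the gene.
    """
    if strand == "+":
        return chromosome[max(0, tss - length):tss]
    region = chromosome[tss + 1:tss + 1 + length]
    return region.translate(COMPLEMENT)[::-1]

# Toy sequence with a short window so the result is easy to check by eye.
chromosome = "ACGTACGTAC"
print(upstream_promoter(chromosome, tss=5, strand="+", length=3))
print(upstream_promoter(chromosome, tss=5, strand="-", length=3))
```

With the default length of 1000, each gene contributes the 1000 bp window that is then searched for shared motifs.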

Page 33: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Result of analysis: Two motifs found to be shared by this set of genes. The genes have similar expression profiles. One of the identified motifs (NFAT) is known to be involved in the cell cycle

Source: Nature Genetics 37, S38 - S45 (2005)

Page 34: Microarrays and Cancer Segal et al. CS 466 Saurabh Sinha

Summary

• The common theme is to analyze sets of genes, and relate their common expression patterns to cancer types or to presence of cis-regulatory motifs

• Search algorithms may be required to identify some of these features