Microarrays & Gene Expression Analysis


Contents

• DNA microarray technique

• Why measure gene expression

• Clustering algorithms

• Relation to Cancer

• SAGE

• SBH – Sequencing By Hybridization


DNA Microarrays

1. Developed around 1987.

2. Employ methods previously exploited in the immunoassay context: specific binding and labeling techniques.

3. Two types of probes (http://www.gene-chips.com/):
Format I: probe cDNA (500~5,000 bases long) is immobilized on a solid surface such as glass. Widely considered to have been developed at Stanford University; traditionally called DNA microarrays.
Format II: an array of oligonucleotide probes (20~80-mers) is synthesized either in situ (on-chip) or by conventional synthesis followed by on-chip immobilization. Developed at Affymetrix, Inc.; many companies now manufacture oligonucleotide-based chips using alternative in-situ synthesis or deposition technologies. Historically called DNA chips.


DNA Microarray Technique

1. The microarray is made of a small piece of glass (1x1 or 2x2 cm).

2. Thousands to millions of spots are placed on it; each spot holds many (n) copies of one DNA probe, a short (8-30 bases) single-stranded sequence called an oligo.

3. A probe on the array will bind its complementary target if it is present in the solution washing the chip.

4. When the array surface is scanned with a laser, fluorescent labels attached to the targets reveal which probes are bound.
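The binding rule in step 3 can be sketched as a string test: a probe is "highlighted" when some labeled target contains the probe's reverse complement. This is a simplified sketch assuming exact Watson-Crick pairing only; real hybridization also tolerates some mismatches and depends on temperature and probe structure. The function names and example sequences are hypothetical.

```python
# Simplified model of hybridization on an array (exact complementarity only).

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def bound_probes(probes: list[str], targets: list[str]) -> list[str]:
    """Return the probes whose binding site occurs in at least one target."""
    hits = []
    for probe in probes:
        site = reverse_complement(probe)  # target stretch this probe pairs with
        if any(site in target for target in targets):
            hits.append(probe)
    return hits


if __name__ == "__main__":
    # Hypothetical probes and a hypothetical labeled target.
    print(bound_probes(["ACGT", "TTTT", "GGCA"], ["CCAAAAGG"]))  # ['TTTT']
```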


Use of DNA Microarrays

1. Identify a query sequence: the sequence is hybridized to an array containing suitable probes.
   1. Point mutations (SNPs) or other mutations: the array contains probes that match segments of both the normal and the mutated sequences.
   2. An unknown sequence (SBH): the array contains all possible k-mers (e.g., all 4^6 = 4,096 6-mers).

2. Gene expression analysis: which genes are expressed, and under what conditions?
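For the point-mutation case, one way to picture "probes that match the normal and mutated sequences" is a probe pair centered on the mutation site. The sketch below assumes exact matching and a symmetric window of `half` bases on each side; `snp_probe_pair`, `call_allele` and the example sequence are hypothetical illustrations, not a real probe-design tool.

```python
# Sketch: a normal/mutant probe pair around a point mutation, then an allele call.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def snp_probe_pair(reference: str, pos: int, alt: str, half: int = 4) -> tuple[str, str]:
    """Probes complementary to the normal and mutated window centred on `pos`."""
    start, end = pos - half, pos + half + 1
    if start < 0 or end > len(reference):
        raise ValueError("mutation too close to the end of the sequence")
    normal = reference[start:end]
    mutant = normal[:half] + alt + normal[half + 1:]
    return reverse_complement(normal), reverse_complement(mutant)


def call_allele(sample: str, normal_probe: str, mutant_probe: str) -> str:
    """Report which probe(s) would bind the sample (exact matching only)."""
    normal_hit = reverse_complement(normal_probe) in sample
    mutant_hit = reverse_complement(mutant_probe) in sample
    if normal_hit and mutant_hit:
        return "both"
    if normal_hit:
        return "normal"
    if mutant_hit:
        return "mutant"
    return "none"


if __name__ == "__main__":
    ref = "AAACCCGGGTTT"  # hypothetical reference; mutation G->A at index 6
    normal_p, mutant_p = snp_probe_pair(ref, 6, "A", half=2)
    print(call_allele(ref, normal_p, mutant_p))             # normal
    print(call_allele("AAACCCAGGTTT", normal_p, mutant_p))  # mutant
```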


DNA Microarray Methodology: Flash animation
http://www.bio.davidson.edu/biology/courses/genomics/chip/chip.html

Why Measure Gene Expression

1. Determines which genes are induced or repressed in response to a developmental phase or an environmental change.

2. Sets of genes whose expression rises and falls under the same conditions are likely to have related functions.

3. Features such as a common regulatory motif can be detected within co-expressed genes.

4. A pattern of gene expression may be used as an indicator of abnormal cellular regulation.
• A useful tool for cancer diagnosis.
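Point 1 is often operationalized with a log ratio between two conditions. The sketch below assumes a 2-fold change (|log2 ratio| >= 1) as the cutoff; that threshold, the function name and the gene names are illustrative assumptions, and real analyses normally add replicates and a statistical test.

```python
import math


def classify_genes(control: dict[str, float], treated: dict[str, float],
                   min_log2: float = 1.0) -> dict[str, str]:
    """Label each gene induced/repressed/unchanged from its log2 expression ratio."""
    calls = {}
    for gene, base in control.items():
        log_ratio = math.log2(treated[gene] / base)
        if log_ratio >= min_log2:
            calls[gene] = "induced"
        elif log_ratio <= -min_log2:
            calls[gene] = "repressed"
        else:
            calls[gene] = "unchanged"
    return calls


if __name__ == "__main__":
    # Hypothetical expression levels for three genes.
    print(classify_genes({"g1": 100.0, "g2": 100.0, "g3": 100.0},
                         {"g1": 400.0, "g2": 25.0, "g3": 120.0}))
```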


Clustering Co-expressed Genes

1. Find genes whose expression rises and falls under the same conditions.

2. Methods include:
   1. Hierarchical clustering.
   2. Self organizing maps.
   3. Support vector machines (SVMs); strictly a supervised classification method, trained on genes of known function.


Hierarchical Clustering

• Cluster analysis and display of genome-wide expression patterns. Michael B. Eisen, Paul T. Spellman, Patrick O. Brown, and David Botstein. PNAS 1998;95(25):14863.
http://www.pnas.org/cgi/content/full/95/25/14863

• Relationships among objects (genes) are represented by a tree whose branch lengths reflect the degree of similarity between the objects, as assessed by a pairwise similarity function.

• The computed trees can be used to order genes in the original data table, so that genes or groups of genes with similar expression patterns are adjacent.


Figure (GeneExplorer): zoom into the clustered expression display, with pointers to GeneCards and UniGene entries.


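Similarity Metric

• The gene similarity metric is a form of correlation coefficient.

• Let $G_i$ equal the (log-transformed) primary data for gene $G$ in condition $i$. For any two genes $x$ and $y$ observed over a series of $N$ conditions, a similarity score can be computed as follows:

$$S(x,y) = \frac{1}{N}\sum_{i=1}^{N} \frac{(x_i - \bar{x})(y_i - \bar{y})}{\sigma_x \sigma_y}$$

where $\bar{x}$ and $\bar{y}$ are the means of the observations on genes $x$ and $y$, and $\sigma_x = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2}$ (likewise $\sigma_y$) are their standard deviations. With this normalization, $S(x,y)$ is the Pearson correlation coefficient and lies between -1 and 1.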

• An agglomerative (pairwise average-linkage) method is used to build the corresponding tree.
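The similarity score above can be computed directly; this sketch uses population standard deviations (dividing by N), so the result is the Pearson correlation. The function name is illustrative, missing values are not handled, and a constant profile (zero standard deviation) raises an error.

```python
import math


def similarity(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two expression profiles over N conditions."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("profiles must be non-empty and of equal length")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    std_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / n)
    std_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / n)
    if std_x == 0 or std_y == 0:
        raise ValueError("constant profile: correlation undefined")
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    return cov / (std_x * std_y)


if __name__ == "__main__":
    print(similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # ~1.0 (same shape)
    print(similarity([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # ~-1.0 (opposite)
```

On Python 3.10 or later, `statistics.correlation(x, y)` should give the same value and can serve as a cross-check.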


Tree Creation

• For any set of n genes, a similarity matrix is computed using the metric described above.

• The matrix is scanned to identify the highest value (representing the most similar pair of genes).

• A node is created joining these two genes, and a gene expression profile is computed for the node by averaging the observations of the joined elements (missing values are omitted, and the two joined elements are weighted by the number of genes they contain).

• The similarity matrix is updated with this new node replacing the two joined elements, and the process is repeated n-1 times until only a single element remains.
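The tree-creation loop can be sketched as follows, assuming complete data (the missing-value rule is not implemented) and Pearson similarity as the metric. Similarities are cached in a dictionary that plays the role of the matrix; ties go to the first pair found. `build_tree` and the node-label format are hypothetical choices for illustration.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n * sx * sy)


def build_tree(profiles: dict[str, list[float]]) -> list[tuple[str, str, float]]:
    """Merge the most similar pair n-1 times; return (left, right, similarity)."""
    # Active nodes: name -> (averaged profile, number of genes it contains).
    nodes = {name: (list(p), 1) for name, p in profiles.items()}
    names = list(nodes)
    sim = {(a, b): pearson(nodes[a][0], nodes[b][0])
           for i, a in enumerate(names) for b in names[i + 1:]}
    merges = []
    while len(nodes) > 1:
        (a, b), s = max(sim.items(), key=lambda item: item[1])  # most similar pair
        (pa, na), (pb, nb) = nodes.pop(a), nodes.pop(b)
        # Node profile: average weighted by the number of genes on each side.
        merged = [(na * u + nb * v) / (na + nb) for u, v in zip(pa, pb)]
        label = f"({a},{b})"
        # Update the "matrix": drop rows for a and b, add a row for the new node.
        sim = {pair: v for pair, v in sim.items() if a not in pair and b not in pair}
        for other in nodes:
            sim[(other, label)] = pearson(nodes[other][0], merged)
        nodes[label] = (merged, na + nb)
        merges.append((a, b, s))
    return merges


if __name__ == "__main__":
    toy = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8.5],
           "g3": [4, 3, 2, 1], "g4": [8, 6, 4, 1]}  # hypothetical profiles
    for left, right, s in build_tree(toy):
        print(left, right, round(s, 3))
```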


Five separate clusters are indicated by colored bars and by identical coloring of the corresponding region of the dendrogram. The sequence-verified named genes in these clusters contain multiple genes involved in (A) cholesterol biosynthesis, (B) the cell cycle, (C) the immediate-early response, (D) signaling and angiogenesis, and (E) wound healing and tissue remodeling. These clusters also contain named genes not involved in these processes and numerous uncharacterized genes.


Self Organizing Maps

• K-means method (a simpler relative of SOMs, in which the centers are not arranged on a grid): the number of clusters, k, is fixed.

• Each gene gi (i = 1..n) is represented by its expression in d experiments, i.e., as a point in d dimensions.

• Randomly choose k centers, c1, .., ck; each ci is a point in d dimensions.

• The protocol:
1. Join each gi to its closest center.
2. Compute new centers: the new center ci' is the center of mass of all points joined to ci.
3. Repeat the steps until convergence, or until you're pleased with the results.
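The protocol above is Lloyd's k-means iteration. In this sketch the "random" starting centers are k distinct genes drawn with a seeded generator (an assumption; the slide does not say how centers are chosen), distances are squared Euclidean, and a center that loses all its genes is simply left in place.

```python
import random


def sq_dist(p: list[float], q: list[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: list[tuple[float, ...]], k: int, seed: int = 0,
           max_iter: int = 100) -> tuple[list[int], list[list[float]]]:
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k distinct genes as starts
    labels: list[int] = []
    for _ in range(max_iter):
        # Step 1: join each gene to its closest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # converged: no gene changed cluster
            break
        labels = new_labels
        # Step 2: each new center is the center of mass of its genes.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


if __name__ == "__main__":
    genes = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy data
    print(kmeans(genes, 2)[0])
```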


Relation to Cancer

• Tumors result from disruptions of growth regulation. Although most tumors are treated with general anti-proliferative drugs, they exhibit remarkable clinical heterogeneity, which remains a major challenge in the successful management of cancer.

• Clinical heterogeneity in tumors likely reflects unrecognized molecular heterogeneity. Because of the logical connection between gene expression patterns and phenotype, there is likely a direct connection between the gene expression patterns of tumors and their clinical phenotype.


Towards a clinically relevant taxonomy of Cancer

• Access archived clinical tumor samples taken at or near diagnosis from patients with well-characterized subsequent clinical histories.

• Use DNA arrays to measure gene expression in these samples.

• Look for new molecularly defined groups within or between previously recognized groups of tumors, especially groups with increased clinical homogeneity.

• Look for direct associations between molecular and clinical properties of tumors.


Cancer Gene Expression

• The suggested procedure has been used to classify several types of cancer, or cancerous versus normal cells:
   • Breast cancer.
   • AML and ALL.
   • Melanoma.
   • Lymphoma.
   • …
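The slides do not say which classifier these studies used. As a stand-in, here is a nearest-centroid classifier, a deliberately simple technique swapped in for illustration: each class is summarized by its mean expression profile, and a new sample gets the label of the closest mean. The two-gene profiles in the example are toy numbers, not data from any study.

```python
import math


def train_centroids(samples: list[list[float]], labels: list[str]) -> dict[str, list[float]]:
    """Mean expression profile per class label."""
    groups: dict[str, list[list[float]]] = {}
    for profile, label in zip(samples, labels):
        groups.setdefault(label, []).append(profile)
    return {label: [sum(col) / len(profiles) for col in zip(*profiles)]
            for label, profiles in groups.items()}


def classify(profile: list[float], centroids: dict[str, list[float]]) -> str:
    """Label of the centroid closest (Euclidean distance) to the profile."""
    return min(centroids, key=lambda label: math.dist(profile, centroids[label]))


if __name__ == "__main__":
    # Toy two-gene profiles; labels borrowed from the AML/ALL example.
    cents = train_centroids([[5.0, 1.0], [6.0, 1.0], [1.0, 5.0], [1.0, 6.0]],
                            ["AML", "AML", "ALL", "ALL"])
    print(classify([5.0, 0.5], cents))  # AML
```

Real tumor classification works with thousands of genes and few samples, so gene selection and cross-validation matter far more than in this toy.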


Example: Melanoma

• Molecular classification of cutaneous malignant melanoma by gene expression profiling. Nature 2000 Aug 3;406(6795):536-40.

• The study discovered a subset of melanomas, identified by mathematical analysis of gene expression in a series of samples.


• Remarkably, many genes underlying the classification of this subset are differentially regulated in invasive melanomas that form primitive tubular networks in vitro, a feature of some highly aggressive metastatic melanomas.

• Global transcript analysis can identify unrecognized subtypes of cutaneous melanoma and predict experimentally verifiable phenotypic characteristics that may be of importance to disease progression.


Detection of Regulatory Motifs

• A group of co-expressed genes is likely to be co-regulated during transcription.

• Transcription initiation is mediated by regulatory proteins that usually bind upstream of the transcription start site.

• These regulatory proteins bind conserved regulatory motifs: short DNA sequences.

• The upstream region of co-expressed genes can be searched for a common regulatory motif.
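A naive version of this search keeps only the k-mers that occur exactly in every upstream region. This is a sketch assuming exact matches and no background statistics; dedicated motif finders allow mismatches and score motifs against a background model. The example upstream sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_motifs(upstream: list[str], k: int) -> set[str]:
    """k-mers present (exactly) in every upstream region."""
    common = kmers(upstream[0], k)
    for seq in upstream[1:]:
        common &= kmers(seq, k)
    return common


if __name__ == "__main__":
    # Hypothetical upstream regions of three co-expressed genes.
    regions = ["AATGACTCAT", "CCTGACTCAG", "TGACTCATTT"]
    print(shared_motifs(regions, 7))  # {'TGACTCA'}
```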


Other Applications: Predictive Tools

• There is a correlation between co-expression and related gene function. “Inferring subnetworks from perturbed expression profiles.” Bioinformatics. 2001 Jun;17 Suppl 1:S215-S224.

• There is a correlation between co-expression and protein-protein interaction. “Correlation between transcriptome and interactome mapping data from Saccharomyces cerevisiae.” Nat Genet. 2001 Dec;29(4):482-6.

• However, gene expression (mRNA level) correlates poorly with protein expression.


Correlation between gene and protein expression

Ideker et al., Science 2001


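Design & Probe Selection

• Sensitivity: probes need to hybridize to their targets. For example, they need to avoid highly structured regions of the target molecule.

• Specificity: probes must not hybridize to the wrong targets (cross-hybridization). To this end:
– design probes to be long enough for statistical protection (a longer probe is less likely to match another transcript by chance);
– search databases to explicitly avoid cross-hybridization to known foreign mRNA.

• Mismatch control: each probe is paired with a control probe that differs by a single base (typically the central one), so that nonspecific hybridization can be estimated.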
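The three design rules can be sketched in a few functions. Assumptions, all labeled in the comments: "statistical protection" is estimated as L / 4^k chance occurrences of a specific k-mer in L random bases (uniform, independent bases, one strand); the cross-hybridization screen compares the probe's target site (the sequence it is meant to detect) against other transcripts with a Hamming-distance cutoff; and the mismatch control complements the central base. All function names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expected_chance_hits(probe_length: int, sequence_length: int) -> float:
    """Approximate chance occurrences of one specific k-mer in random DNA.

    Assumes uniform, independent bases and counts one strand only.
    """
    return sequence_length / 4 ** probe_length


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def cross_hybridization_hits(site: str, transcripts: dict[str, str],
                             max_mismatches: int = 1) -> list[tuple[str, int]]:
    """(transcript, offset) pairs where `site` matches within the mismatch limit."""
    k = len(site)
    hits = []
    for name, seq in transcripts.items():
        for i in range(len(seq) - k + 1):
            if hamming(site, seq[i:i + k]) <= max_mismatches:
                hits.append((name, i))
    return hits


def mismatch_control(probe: str) -> str:
    """Control probe: same sequence with the central base complemented."""
    mid = len(probe) // 2
    return probe[:mid] + probe[mid].translate(COMPLEMENT) + probe[mid + 1:]


if __name__ == "__main__":
    print(expected_chance_hits(25, 3_000_000_000))  # tiny: 25-mers rarely recur
    others = {"t1": "AAAACGTAAA", "t2": "CCCCGGGG"}  # hypothetical transcripts
    print(cross_hybridization_hits("ACGT", others, max_mismatches=1))
    print(mismatch_control("AAAAA"))  # AATAA
```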


Other Challenges

• Analyze the scanned image to infer expression levels from red-to-green ratios, subtract the background, check for outliers, etc.

• Infer causal relations between genes – regulatory networks.
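For a two-color spot, the first of these steps reduces to background subtraction followed by a log ratio. This sketch assumes local background intensities are already measured per channel and floors corrected intensities at 1 so the logarithm stays defined; `log_ratio` and the floor value are illustrative choices, not a standard pipeline.

```python
import math


def log_ratio(red: float, green: float, red_bg: float, green_bg: float,
              floor: float = 1.0) -> float:
    """log2(red/green) for one spot after background subtraction."""
    r = max(red - red_bg, floor)      # corrected red (e.g. Cy5) intensity
    g = max(green - green_bg, floor)  # corrected green (e.g. Cy3) intensity
    return math.log2(r / g)


if __name__ == "__main__":
    print(log_ratio(1100, 600, 100, 100))  # 1.0: red twice as bright as green
```

Working in log space makes a 2-fold increase and a 2-fold decrease symmetric (+1 and -1), which is also why the similarity metric above is applied to log-transformed data.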


SAGE – Serial Analysis of Gene Expression
http://www.ncbi.nlm.nih.gov/SAGE/

• An experimental technique designed to obtain a quantitative measure of gene expression.

• Short (~10-20 base) “tags” are produced, taken from the sequence immediately adjacent to the 3' end of the 3'-most NlaIII restriction site.

• SAGE does not measure the expression level of a gene directly; it counts a "tag" that represents the gene's transcription product.


SAGE Technique

1. Extract unique tag sequences from mRNA molecules (tags are ~10-20 bases long).
2. Concatenate the tags into a long sequence.
3. Sequence the resulting molecule and infer expression levels from tag frequencies.

• Advantage: an unbiased and inclusive analysis of the transcriptome.

• Sequencing errors are especially problematic for tags, because of their short length.

• Of roughly 1.5 million transcript sequences stored in GenBank, only about 180,000 are well characterized and could be represented by tags.
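Step 3 (inferring levels from frequencies) amounts to counting tags. This sketch assumes a simplified concatemer in which fixed-length tags are laid end to end; real SAGE concatemers consist of ditags separated by anchoring-enzyme sites, which a full parser would have to handle. The function names are hypothetical.

```python
from collections import Counter


def count_tags(concatemer: str, tag_length: int = 10) -> Counter:
    """Split a (simplified) concatemer into fixed-length tags and count them."""
    if len(concatemer) % tag_length:
        raise ValueError("concatemer length is not a multiple of tag_length")
    tags = [concatemer[i:i + tag_length] for i in range(0, len(concatemer), tag_length)]
    return Counter(tags)


def tag_frequencies(counts: Counter) -> dict[str, float]:
    """Relative abundance of each tag, used as a proxy for expression level."""
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


if __name__ == "__main__":
    counts = count_tags("AAAATTTTAAAA", tag_length=4)  # toy 4-base tags
    print(counts.most_common())  # [('AAAA', 2), ('TTTT', 1)]
```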


http://www.sagenet.org/


Figure: SAGE comparison of colon cancer vs normal colon (A: colon cancer; B: normal colon). Source: http://www.ncbi.nlm.nih.gov/SAGE/index.cgi


Figure: using chips for sequencing. The query CATAGTA and its overlapping 3-mers CAT, ATA, TAG, AGT, GTA.

SBH – Sequencing by Hybridization

A method for sequencing; in fact, the original motivation for DNA microarrays.

• A chip containing all k-mers is produced.

• The query sequence is hybridized to the chip.

• Example: a chip of all 3-mers contains 4^3 = 64 probes. Hybridizing the query CATAGTA to it highlights 5 probes.
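The example can be checked by enumerating the chip and the query's k-mers. For simplicity this sketch treats a probe as highlighted when the query itself contains that k-mer, ignoring the fact that physically the probe binds the complementary strand; `all_kmers` and `highlighted_probes` are hypothetical names.

```python
from itertools import product


def all_kmers(k: int) -> list[str]:
    """Every DNA k-mer: the probe set of a complete SBH chip."""
    return ["".join(p) for p in product("ACGT", repeat=k)]


def highlighted_probes(query: str, k: int) -> list[str]:
    """Chip probes whose k-mer occurs in the query (its k-mer spectrum)."""
    present = {query[i:i + k] for i in range(len(query) - k + 1)}
    return [probe for probe in all_kmers(k) if probe in present]


if __name__ == "__main__":
    print(len(all_kmers(3)))                    # 64
    print(highlighted_probes("CATAGTA", 3))     # the 5 highlighted 3-mers
    print(len(all_kmers(6)))                    # 4096, the 6-mer chip
```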


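SBH Protocol

• Knowing the start and end of the query sequence, and the set of highlighted k-mers, the query sequence is reconstructed.

• Example: start = CAT, end = GTA, highlighted group = {CAT, ATA, TAG, AGT, GTA}.
CAT: the next k-mer must start with AT → ATA, giving CATA
ATA: the next k-mer must start with TA → TAG, giving CATAG
TAG: the next k-mer must start with AG → AGT, giving CATAGT
AGT: the next k-mer must start with GT → GTA (the end), giving CATAGTA

• Problems:
• Reconstruction is not always unique: the same k-mer may be followed by several k-mers, e.g., CAT may be followed by ATA or ATG.
• Hybridization results contain errors (false positive and false negative probes).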
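The step-by-step extension above can be written as a backtracking search that uses every highlighted k-mer exactly once and returns all consistent sequences, which also exposes the non-uniqueness problem. This sketch assumes error-free hybridization and no repeated k-mers in the query; the search is exponential in the worst case, and the usual efficient formulation instead treats k-mers as edges of a de Bruijn graph and looks for an Eulerian path. `reconstruct` is a hypothetical name.

```python
def reconstruct(spectrum: set[str], start: str, end: str) -> list[str]:
    """All sequences that start with `start`, end with `end`, and use each k-mer once."""
    results: list[str] = []

    def extend(path: list[str], unused: set[str]) -> None:
        if not unused:
            if path[-1] == end:
                # Each k-mer after the first contributes its last base.
                results.append(path[0] + "".join(kmer[-1] for kmer in path[1:]))
            return
        overlap = path[-1][1:]  # the next k-mer must start with this (k-1)-mer
        for kmer in sorted(unused):
            if kmer[:-1] == overlap:
                extend(path + [kmer], unused - {kmer})

    extend([start], set(spectrum) - {start})
    return results


if __name__ == "__main__":
    print(reconstruct({"CAT", "ATA", "TAG", "AGT", "GTA"}, "CAT", "GTA"))  # ['CATAGTA']
    seq = "TATCCATGGATA"  # a toy query whose 3-mer spectrum is ambiguous
    spec = {seq[i:i + 3] for i in range(len(seq) - 2)}
    print(reconstruct(spec, "TAT", "ATA"))  # two different reconstructions
```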