39
Swiss Institute of Bioinformatics Institut Suisse de Bioinformatique LF-2001.11 Introduction to DNA Microarrays DNA Microarrays and DNA chips resources on the web

Introduction to DNA Microarrays

  • Upload
    henry

  • View
    71

  • Download
    0

Embed Size (px)

DESCRIPTION

Introduction to DNA Microarrays. DNA Microarrays and DNA chips resources on the web. INTRODUCTION. Microarray analysis is a new technology that allows scientists to simultaneously detect thousands of genes in a small sample and to analyze the expression of those genes. - PowerPoint PPT Presentation

Citation preview

Page 1: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Introduction to

DNA Microarrays

DNA Microarrays and DNA chips resources on

the web

Page 2: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Microarray analysis is a new technology that allows scientists to simultaneously detect thousands of genes in a small sample and to analyze the expression of those genes.

Microarrays are simply ordered sets of DNA molecules of known sequence. Usually rectangular shaped, they can consist of a few hundred to hundreds of thousands of sets. Each individual sequence goes on the array at precisely defined location.

INTRODUCTION

Page 3: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Identification of complex genetic diseases Drug discovery and toxicology studies Mutation/polymorphism detection (SNP’s) Pathogen analysis Differing expression of genes over time, between tissues, and

disease states Preventive medicine Specific genotype (population) targeted drugs More targeted drug treatments – AIDS Genetic testing and privacy

Potential application domains

Page 4: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The technique

Based on already known methods, such as fluorescence and hybridization. High throughput miniaturized method.

It's main purpose is to compare gene transcription levels in two or more different kinds of cells.

- Microarrays

- DNA chips

- SAGE

- Beads (liquid chip)

Page 5: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The challenge

The big revolution here is in the "micro" term. New slides will contain a survey of the human genome on a 2 cm2 chip! The use of this large-scale method tends to create phenomenal amounts of data, that have then to be analyzed, processed and stored.

As the technique is quite new, analyzing the data is still a problem, and nothing is standardized yet. A few databases and on-line repositories are coming out, and the future standard will probably be chosen among them.

This is a job for… Bioinformatics !

Page 6: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

General overview

Making the chip Experiment design, sequence selection, collection maintenance, PCR, spotting, printing, synthesis

Probe hybridization Probe purification, labelling, hybridization, washing

Scanning and image treatment Fluorescence correction, find spots, background

Analysing the data Filtering, normalisation Clustering (hierarchical, centroid, SPC)

Representation, storage Graphics, databases, web public resources

} wet lab

Page 7: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : making the chip

1- Designing the chip : choosing genes of interest for the experiment and/or select the samples

- Selection of sequences that represent the investigated genes.

- Finding sequences, usually in the EST database.

- Problems : sequencing errors, alternative splicing, chimeric sequences, contamination…

Page 8: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

2- Spotting the sequences on the substrate

- Substrate : usually glass, but also nylon membranes, plastic, ceramic…

- Sequences : cDNA (500-5000 nucleotides, dna chips), oligonucleotides (20~80-mer oligos, oligo chips), genomic DNA ( ~50’000 bases)

- Printing methods : microspotting, ink-jetting (for dna chips) or in-situ printing, photolithography (for oligos, Affymetrix method)

THE EXPERIMENT : making the chip

Page 9: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Microspotting and ink-jetting

THE EXPERIMENT : making the chip

Page 10: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The microspotting is done by a robot called “arrayer”

THE EXPERIMENT : making the chip

Page 11: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Oligo-spotting (Affymetrix method)

THE EXPERIMENT : making the chip

Page 12: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : hybridization

Sample preparation

- Extracting DNA (for genomic studies) or mRNA (for gene expressions studies) from the two or more samples to compare.

- Making cDNAs with extracts, and labeling them with different fluorochromes to allow direct comparison. (Cy-3, Cy-5, DIG…)

- Some techniques use radiolabeling

Page 13: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : hybridization

Probes are overlaid on the chip, put in a hybridization chamber, and then washed.

Page 14: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : generating data

Chip scanning

- Fluorescence measurements are made with scanning laser fluorescence microscope that scans the slide, illuminating each DNA spot and measuring fluorescence for each dye separately. It creates one red and one green image.

- The two images are then superimposed to give a virtual result of RNA ratio in both samples

Page 15: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : generating data

Page 16: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

1- Samples

2- Extracting mRNA

3- Labeling

4- Hybridizing

5- Scanning

6- Visualizing

Page 17: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Examples of images

Affymetrix chip Stanford array

Page 18: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Image analysis- These fluorescence measures are then used to determine the ratio, and in turn the relative abundance, of the sequence of each specific gene in the two mRNA or DNA samples.

- This analysis is performed by a software such as “Scanalyze”, available at : http://rana.lbl.gov/EisenSoftware.htm

or “Spotfinder” from TIGR

- The files created can then be submitted to further analysis

THE EXPERIMENT : generating data

Page 19: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

THE EXPERIMENT : making sense of the data

Although the visual image of a microarray panel is alluring, its information content, per se, is still not human readable.

How to visualize, organize and explore the meaning of information consisting of several million measurements of expression of thousands of genes under thousands of conditions?…

Page 20: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Data mining depends on the questions which are asked. The most frequent question is to find sets of genes that have correlated expression profiles (belonging to the same biological process and/or co-regulated), or to divide conditions to groups with similar gene expression profiles (for example divide drugs according to their effect on gene expression).

The method used to answer these questions is called CLUSTERING.

THE EXPERIMENT : making sense of the data

Page 21: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Clustering data

•Input: N data points, Xi, i=1,2,…,N (the color ratios measured with Scanalyze, for example) in a D dimensional space. N and D will be either genes and conditions for gene clustering, or conditions and genes for condition clustering.•Goal: Find “natural” groups or clusters. •Note: according to the method, the number of clusters will be fixed from the beginning (centroid clustering) or determined after the analysis (hierarchical clustering)

Page 22: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Clustering data

Before clustering, a few steps to “clean the data” are necessary (normalization, filtering)

Clustering methods (examples) :

1- Agglomerative Hierarchical

2- Centroids: K-means or SOM

3- Super-Paramagnetic Clustering

For a good introduction on different clustering techniques, read the article from Gavin Sherlock “Analysis of large-scale gene expression data” in Current Opinion in Immunology 2000, 12:201-205 (pdf)

http://www.isrec.isb-sib.ch/~vpraz/chips/Sherlock.pdf

Page 23: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

52 41 3

Agglomerative Hierarchical Clustering

3

1

4 2

5

Distance between joined clusters

Dendrogram The dendrogram induces a linear ordering of the data points

The dendrogram induces a linear ordering of the data points

Page 24: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

1- The similarity measure between two genes (or experiments)

Before doing a such clustering, one has to define two things:

2- The distance measure between the new cluster and the others

Single Linkage: distance between closest pair.

Complete Linkage: distance between farthest pair.

Average Linkage: distance between cluster centers

Centered correlation

Uncentered correlation

Absolute correlation

Euclidean

Agglomerative Hierarchical Clustering

Page 25: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Centroid methods - K-means

Iteration = 0

•Start with random position of K centroids.•Assign points to centroids•Move centroids to centerof assigned points•Iterate until centroids are stable

Page 26: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Iteration = 1

Centroid methods - K-means

•Start with random position of K centroids.•Assign points to centroids•Move centroids to centerof assigned points•Iterate until centroids are stable

Page 27: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Iteration = 3

Centroid methods - K-means

•Start with random position of K centroids.•Assign points to centroids•Move centroids to centerof assigned points•Iterate until centroids are stable

Page 28: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Self-organizing Maps

- Choose a number of partitions

- Assign a random reference vector to each partition.

- Pick a gene randomly and assign it to its most similar reference vector.

- Adjust that reference vector is so that it is more similar to the chosen gene.

- Adjust the other reference vectors.

- Repeat thousands of times until partitions are stable.

A self-organizing map.

Page 29: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The idea behind SPC is based on the physical properties of dilute magnets.

Calculating correlation between magnet orientations at different temperatures (T).

T=Low

Super-Paramagnetic Clustering (SPC) M.Blatt, S.Weisman and E.Domany (1996) Neural Computation

Page 30: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The idea behind SPC is based on the physical properties of dilute magnets.

Calculating correlation between magnet orientations at different temperatures (T).

T=High

Super-Paramagnetic Clustering (SPC) M.Blatt, S.Weisman and E.Domany (1996) Neural Computation

Page 31: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

The algorithm simulates the magnets behavior at a range of temperatures and calculates their correlation

The temperature (T) controls the resolution

T=Intermediate

Super-Paramagnetic Clustering (SPC) M.Blatt, S.Weisman and E.Domany (1996) Neural Computation

Page 32: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Clustering data

•M. Eisen’s programs for clustering and display of results (Cluster, TreeView)

–Predefined set of normalizations and filtering–Agglomerative, K-means, 1D SOM

•Matlab–Agglomerative, public m-files.

•Dedicated software packages (SPC)•Web sites: e.g. http://ep.ebi.ac.uk/EP/EPCLUST/•Statistical programs (SPSS, SAS, S-plus)•And much more…

Available clustering tools

Page 33: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Clustering data

The final data representation is then a big matrix with rows being the genes and columns representing the different experiments. To keep the image coherent with the scan output, the ratio numbers calculated by Scanalyze are transformed back in color spots on a green-red based scale.

Page 34: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Clustering data

Another way to represent these data is a graph showing the gene’s expression variation during the different experiments

Expression variation of nine genes along the 19 experiments from Lyer et al. (Fibroblast response to serum stimulation)

Page 35: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Expression Profiler Online clustering and analysis tools (EBI)

GenEx Database, repository and analysis tools (NCGR)

MAExplorer MicroArray Explorer for data mining Gene Expression, free download

ArrayDB Downloadable tools, short online demo

MAXD Downloadable data warehouse and visualisation for expression data

Jexpress Java tools for gene expression data analysis, free download

Eisen Lab Michael Eisen's suite for image quantitation and data analysis (Scanalyze, Cluster, TreeView). Downloadable.

Web resources : data analysis tools

Page 36: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

SMD The Stanford Microarray Database

Chip DB Searchable database on gene expression (MIT)

ExpressDB Public queries of E. coli and yeast data

GEO Gene expression data repository and online resource (NCBI)

RAD RNA Abundance Database

Expression Connection Saccharomyces Genome Database expression data retrieval

EpoDB Expression information retrieval for one gene at a time

yMGV Public queries of yeast data

Web resources : public databases

Page 37: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

AMAD Downloadable web driven database system

ArrayExpress Public data deposition and public queries (EBI)

maxdSQL Downloadable data warehouse and visualization environment

GXD Mouse expression data storage and integration

GeNet Distribution and visualization of gene expression data from any organism

Web resources : public databases

Page 38: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Drosophila microarray project Drosophila Metamorphosis Time Course Database

Samson Lab Yeast Transcriptional Profiling Experiments

SageMap NCBI SAGE data and analysis tools

NCI60 cancer project Supplement to Ross et al. (Nat Genet., 2000).

Serum-response Supplement to Lyer et al.(1999) Science 283:83-87

Breast cancer Supplement to Perou et al. Nature 406:747-752(2000)

Cancer Molecular Pharmacology Integration of large databases on gene expression and molecular pharmacology.

Web resources : public databases

Page 39: Introduction to  DNA Microarrays

Swiss Institute of BioinformaticsInstitut Suisse de Bioinformatique

LF-2001.11

Leung’s Links page & software info

Davison’s DNA Microarray Methodology - Flash Animation

gene-chips Overview of the technique, papers…

Chips & microassays General information

SMD guide Stanford's links page, very complete

Introduction Online introduction to microarrays (EBI)

Brown Lab Guide Microarrays protocols and arrayer construction.

Web resources : general information