DNA Microarrays Paper 2010



<p>Chemometrics and Intelligent Laboratory Systems 104 (2010) 28–52</p> <p>Contents lists available at ScienceDirect</p> <p>Chemometrics and Intelligent Laboratory Systems</p> <p>Journal homepage: www.elsevier.com/locate/chemolab</p> <p>An introduction to DNA microarrays for gene expression analysis</p> <p>Tobias K. Karakach a, Robert M. Flight b,c, Susan E. Douglas a, Peter D. Wentzell b,*</p> <p>a Institute of Marine Biosciences, National Research Council of Canada, 1411 Oxford Street, Halifax, Nova Scotia, Canada B3H 3Z1</p> <p>b Department of Chemistry, Dalhousie University, Halifax, Nova Scotia, Canada B3H 4J3</p> <p>c Department of Neuroscience Training, University of Louisville, Louisville, Kentucky, 40203, USA</p> <p>Article info</p> <p>Article history: Received 24 November 2009; received in revised form 5 April 2010; accepted 6 April 2010; available online 29 April 2010</p> <p>Keywords: DNA microarray; GeneChip; Gene expression; Experimental design</p> <p>Abstract</p> <p>This tutorial presents a basic introduction to DNA microarrays as employed for gene expression analysis, approaching the subject from a chemometrics perspective. The emphasis is on describing the nature of the measurement process, from the platforms used to a few of the standard higher-level data analysis tools employed. Topics include experimental design, detection, image processing, measurement errors, ratio calculation, background correction, normalization, and higher-level data processing. The objective is to present the chemometrician with as clear a picture as possible of an evolving technology so that the strengths and limitations of DNA microarrays are appreciated. Although the focus is primarily on spotted, two-color microarrays, a significant discussion of single-channel, lithographic arrays is also included. © 2010 Elsevier B.V. All rights reserved.</p> <p>1. 
Introduction The rise of chemometrics as an important sub-discipline of analytical measurement science paralleled the rapid growth of analytical instrumentation capable of providing higher orders of multivariate data and the associated demand for new kinds of information. Some twenty years later, the biological sciences are undergoing a similar revolution resulting from new measurement technologies, and the need for effective data analysis tools is just as pressing. Since the beginning of the 1990s, molecular biology has moved toward high throughput measurements and data, similar to the transition in the analytical chemistry field in the early 1970s. The move toward high throughput technologies in molecular biology is concomitant with the advent of the huge amounts of genome information and the need to utilize it in understanding complex molecular interactions in biological systems. This is a consequence of the recognition that even simple cellular activities are the result of well-orchestrated molecular networks that control the cell and that these cannot be fully understood by studying one component at a time, but only through a comprehensive integration of the entire molecular machinery controlling the cell. Predictably, analysis of the data generated by high throughput measurements has necessitated more complex mathematical approaches that had not previously been available to molecular biologists. Chemometrics has an important role to play in this regard, since at their core these are analytical measurements and amenable to the tools that have been developed by chemometricians over many years. The application of those tools, however, requires a clear understanding of the nature of these new measurements and the challenges they pose.</p> <p>* Corresponding author. E-mail address: peter.wentzell@dal.ca (P.D. Wentzell). 0169-7439/$ - see front matter © 2010 Elsevier B.V. All rights reserved. doi:10.1016/j.chemolab.2010.04.003</p> <p>
There are many different high throughput measurement technologies currently employed by molecular biologists, including DNA sequencing and LC-MS (and derivatives), but one of the more ubiquitous tools in use is the DNA microarray. DNA microarrays are popular due to their unique ability to query the mRNA expression levels of thousands of genes (potentially all of the genes in an organism) simultaneously with relatively high specificity, providing a snapshot in time of the overall gene expression of the system under study. However, there are some important considerations to take into account when one is using DNA microarrays or analyzing DNA microarray data. Although this topic has been previously reviewed in other fields [1–5], this tutorial introduces the technology, and various issues related to the analysis of the resulting data, to an analytical chemistry audience. It begins by providing a brief biological background necessary to appreciate the experimental underpinnings of the technology and an overview of the methods used in manufacturing DNA microarrays. Later sections provide a detailed introduction to the measurement process of DNA microarrays in the context of the DNA microarray experiment workflow, starting with the experimental design and following through data acquisition and processing. In addition, the pre-processing applied to the data before final analysis is discussed. Finally, the methods used to analyze the resulting data are briefly considered.</p> <p>The primary technological platform treated in this paper is the spotted DNA microarray, with a secondary focus on Affymetrix arrays (see Section 3 for a description of the microarray types). This is largely because the authors have more extensive experience with data from the former, and because much of the research available in the literature has been published on spotted microarrays. It is also important to note from the outset that the emphasis of this tutorial is on the nature of microarray measurements and the experimental procedures used to obtain them, rather than on the data analysis techniques applied to the final data sets. Chemometricians are well-versed in the tools of the trade, but less familiar with the strengths, limitations, and peculiarities of high throughput biological measurements. Readers looking for a primer on higher-level analysis of transcriptomics data are likely to be disappointed (they should visit [6] for a listing of papers describing DNA microarray analysis methods), but it is hoped that those who wish to gain a fundamental understanding of the measurement workflow will find what they need to venture into the field of microarray analysis with confidence.</p> <p>2. Biological background and motivation A simplified view of the flow of information in a cell would show information traversing from the genes (DNA) to messenger RNA (mRNA) to proteins, which can subsequently act on DNA, mRNA, metabolites, or other proteins. To produce the required proteins, the gene must be transcribed into mRNA by RNA polymerases, and the mRNA can then be translated by ribosomes into protein (see Fig. 1). Depending on the cell type and its biological state, specific proteins will be expressed at different levels. Therefore, if one can measure the complement of all expressed proteins then this will provide information about the current state of the cell. Given the explicit relationship between gene expression (transcription) and protein translation, knowledge of mRNA levels may provide an indirect route to this knowledge. 
For example, comparing the gene expression between diseased and healthy cells could allow the determination of the molecular basis of disease. Alternatively, measuring gene expression as a function of a serial process would allow the determination of molecular changes over time (cell cycle) or with changing dosage (drugs/metabolite response). Consequently, three options are available for investigating the molecular dynamics of the cell: analyzing the variations in (1) the complete set of proteins in the cell (proteomics), (2) the complete set of mRNA transcripts that leads to the production of these proteins (transcriptomics), or (3) the complete set of metabolites generated by the proteins (metabolomics). Although research in proteomics and metabolomics has been ongoing for many years, both fields still suffer from a lack of standardized methodologies and poor reproducibility. This is partly a result of the heterogeneous properties of the molecules being measured. In the case of proteomics, different amino acid sequences lead to a wide variety of protein types, making it difficult to design standard protocols for performing measurements on the entire protein complement. Metabolomics likewise suffers from the wide diversity of chemical properties of different metabolites. The relatively homogeneous nature of mRNA, and the development of capture methods based on complementary base pairing, has led to the very mature field of transcriptomics using DNA microarrays. In addition, in many cases mRNA levels are a reasonable proxy for protein amounts, allowing one to make a rational inference regarding the level of protein expression based on the levels of mRNA expression. There are, however, exceptions where protein expression is controlled post-transcriptionally by other factors. Transcriptomics generally utilizes DNA microarrays, small slides to which are attached hundreds to tens of thousands of molecules of DNA [7]. 
The DNA is able to bind complementary sequences created from mRNA transcripts, facilitating the quantitation of various mRNA transcripts in the cell. This process is illustrated schematically in Fig. 2. DNA microarrays allow molecular biologists to monitor the levels of mRNA transcripts for tens of thousands of genes simultaneously, thereby giving them a window into the inner workings of the genome at the transcriptional level. Microarrays have impacted the study of numerous diseases, the regulation of many biological mechanisms, as well as the cell cycle of various organisms [1]. The methods by which DNA microarrays are constructed and used, however, can take various forms.</p> <p>3. DNA microarrays A microarray consists of a series of miniaturized chemical recognition sites onto which binding reagents, capable of distinguishing complementary molecules, have been attached. Pirrung defined a microarray as a flat solid support that bears multiple probe sites containing distinct chemical reagents with the capacity to recognize matching molecules unambiguously [8]. Thus, in principle, if complementary molecules in a complex mixture were modified with fluorophores, for instance, and allowed to interact with the probes, the molecules could be interrogated simultaneously to determine their respective concentrations. This definition is similar to the classic definition of multianalyte chemical sensors, notwithstanding the different measurement environments and detection systems. It also restricts a microarray to be a miniaturized assay without specifying, explicitly, the chemical reagents that constitute the probes. In the case of DNA microarrays, the probes are DNA oligomers that are allowed to interact with labeled complementary DNA strands. This has led to a wide variety of DNA microarray types, although there are two general classes. 
The first category encompasses microarrays on which a single-stranded DNA (ssDNA) oligomer probe is synthesized directly on the substrate (in situ synthesis). The second category encompasses microarrays on which a ssDNA oligomer or dsDNA (double-stranded DNA) amplicon probe is deposited on the substrate, and these are commonly referred to as spotted arrays. These different methods of generating DNA microarrays lead to some important considerations in the data analysis, and so a basic primer on the synthesis and detection of target binding for both is provided in the next sections.</p> <p>Fig. 1. Overview of the process of transcribing DNA to mRNA, which is translated into proteins that are then able to act on metabolites. It should be noted that this simple model ignores many of the complexities in the process, such as alternative splicing of mRNA, miRNA silencing and the effect of post-translational modifications on proteins.</p> <p>Fig. 2. (a) Spotted microarray experimental set-up. mRNA extracts (targets) from cells under two distinct physiological conditions are reverse transcribed to cDNA and then labeled with different fluorescent dyes, e.g., Cy3 and Cy5. Equal amounts of the dye-labeled targets are combined and applied to a glass substrate onto which cDNA amplicons or oligomers (probes) are immobilized. (b) Scanned image of an Atlantic salmon cDNA microarray [7].</p> <p>3.1. In situ synthesis Among the most popular arrays where the DNA oligomers are synthesized in situ are the Affymetrix arrays, known as GeneChips. In a GeneChip, a photolithographic mask is used to determine the probe position on the array at which photo-induced deprotection of a previously deposited functionalized nucleotide occurs, in order to attach the subsequent nucleotide to the growing oligomer [9]. 
Due to possible failure of photo-induced deprotection at each step of the synthesis, GeneChips contain short probes (25 nucleotides long), with multiple probe sequences for each target of interest. These make up what are known as probe sets, which contain both perfect matches (PM) for the sequence of interest and probes with a single base mismatch (MM) at the middle position to allow determination of non-specific target binding. The use of photolithographic techniques to produce the arrays leads to very reproducible, extremely regular probe regions on the array surface. However, this same strategy makes it more expensive to produce custom arrays, and Affymetrix has concentrated on producing arrays for widely used organisms, although the selection of organisms for which arrays exist has expanded considerably in recent years.</p> <p>Nimblegen arrays are similar to GeneChips in that they use photo-induced deprotection of previously deposited functionalized nucleotides to add to the growing oligomer. However, in the case of the Nimblegen arrays, a digital micromirror device (DMD) is used to direct the light that causes photo-induced deprotection [10]. This has the advantage of not requiring the fabrication of new photolithographic masks for new array designs, as is necessary for GeneChips. Another important difference in the Nimblegen technology is the use of longer oligonucleotides, 60mers in contrast to the 25mers used by Affymetrix. In theory, this allows for greater specificity of hybridization of the targets to the probes on the slide, with less chance of cross-hybridization between the target sequences. The use of the DMD allows Nimblegen to achieve high densities, while easily allowing one to create customized arrays.</p> <p>Another method of in situ oligomer synthesis uses addressable electrodes to cause deprotection...</p>
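<p>The role of the PM and MM probes in a GeneChip probe set, as described in Section 3.1, can be illustrated with a small calculation. The following is a minimal Python sketch that assumes a plain average of the PM minus MM differences as the probe-set summary. That choice is an assumption made for illustration only and is not presented as the summarization used by Affymetrix software or by the authors. The function name probe_set_signal and all intensity values are hypothetical.</p>

```python
from statistics import mean


def probe_set_signal(pm, mm):
    """Return the average PM - MM difference over the probe pairs of one probe set.

    pm and mm are equal-length sequences of intensities: pm[i] is the
    perfect-match probe and mm[i] is its single-base mismatch partner.
    Subtracting MM is a crude correction for non-specific binding.
    """
    if len(pm) != len(mm) or not pm:
        raise ValueError("pm and mm must be non-empty and of equal length")
    return mean(p - m for p, m in zip(pm, mm))


# Hypothetical intensities for one probe set with four probe pairs.
pm = [1200.0, 950.0, 1430.0, 1010.0]
mm = [300.0, 280.0, 410.0, 350.0]
signal = probe_set_signal(pm, mm)
print(signal)
```

<p>A plain average is sensitive to outlying probe pairs and can be negative when MM exceeds PM, so it should be read as a way to see what the PM/MM pairing provides rather than as a recommended estimator.</p>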

