Funct Integr Genomics (2006) 6: 202–211
DOI 10.1007/s10142-006-0025-4

ORIGINAL PAPER

Arnis Druka . Gary Muehlbauer . Ilze Druka . Rico Caldo . Ute Baumann . Nils Rostoks . Andreas Schreiber . Roger Wise . Timothy Close . Andris Kleinhofs . Andreas Graner . Alan Schulman . Peter Langridge . Kazuhiro Sato . Patrick Hayes . Jim McNicol . David Marshall . Robbie Waugh

An atlas of gene expression from seed to seed through barley development

Received: 15 December 2005 / Revised: 16 January 2006 / Accepted: 23 January 2006 / Published online: 18 March 2006
© Springer-Verlag 2006

A. Druka . I. Druka . N. Rostoks . D. Marshall . R. Waugh (*)
Scottish Crop Research Institute, Invergowrie, Dundee, DD2 5DA, Scotland, UK
e-mail: [email protected]
Tel.: +44-1382-568584
Fax: +44-1382-568587

G. Muehlbauer
Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN 55108, USA

R. Caldo . R. Wise
Department of Plant Pathology, Iowa State University, Ames, IA 50011-1020, USA

R. Caldo . R. Wise
Center for Plant Responses to Environmental Stresses, Iowa State University, Ames, IA 50011-1020, USA

U. Baumann . A. Schreiber . P. Langridge
University of Adelaide, Plant Science, Waite Campus, PMB 1, Glen Osmond, SA 5064, Australia

R. Wise
Corn Insects and Crop Genetics Research, USDA-ARS, Iowa State University, Ames, IA 50011-1020, USA

T. Close
Department of Botany and Plant Sciences, University of California, Riverside, CA 92521-0124, USA

A. Kleinhofs
Department of Crop and Soil Sciences, Washington State University, Pullman, WA 99164-6420, USA

A. Kleinhofs
Department of Genetics and Cell Biology, Washington State University, Pullman, WA 99164-6420, USA

A. Graner
Institut für Pflanzengenetik und Kulturpflanzenforschung, Correnstraße 2, 06466 Gatersleben, Germany

A. Schulman
MTT/BI Plant Genomics Laboratory, University of Helsinki, Helsinki, Finland

A. Schulman
MTT Agrifood Research Finland, P.O. Box 56, 00014 Helsinki, Finland

K. Sato
Research Institute for Bioresources, Okayama University, Kurashiki 710-0046, Japan

P. Hayes
Department of Crop and Soil Science, Oregon State University, Corvallis, OR 97331, USA

J. McNicol
BioSS Office, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, Scotland, UK

Abstract Assaying relative and absolute levels of gene expression in a diverse series of tissues is a central step in the process of characterizing gene function and a necessary component of almost all publications describing individual genes or gene family members. However, throughout the literature, such studies lack consistency in genotype, tissues analyzed, and growth conditions applied, and, as a result, the body of information that is currently assembled is fragmented and difficult to compare between different studies. The development of a comprehensive platform for assaying gene expression that is available to the entire research community provides a major opportunity to assess whole biological systems in a single experiment. It also integrates detailed knowledge and information on individual genes into a unified framework that provides both context and resource to explore their contributions in a broader biological system. We have established a data set that describes the expression of 21,439 barley genes in 15 tissues sampled throughout the development of the barley cv. Morex grown under highly controlled conditions.


Rather than attempting to address a specific biological question, our experiment was designed to provide a reference gene expression data set for barley researchers; a gene expression atlas and a comparative data set for those investigating genes or regulatory networks in other plant species. In this paper we describe the tissues sampled and their transcriptomes, and provide summary information on genes that are either specifically expressed in certain tissues or show correlated expression patterns across all 15 tissue samples. Using specific examples and an online tutorial, we describe how the data set can be interrogated for patterns and levels of barley gene expression and how the resulting information can be used to generate and/or test specific biological hypotheses.

Keywords Barley . Development . Gene expression

Introduction

Profiling transcript abundance is currently the most efficient, scalable, and informative method for investigating complex biological systems at the molecular level and for identifying candidate genes involved in specific biological functions. For example, genes controlling the floral induction pathway in Arabidopsis have been identified by examining transcript abundance profiles in developing shoot apices in flowering mutants (Schmid et al. 2003), and floral organ identity mutants have been used to investigate spatial patterns of gene expression during flower development (Wellmer et al. 2004). Elegant studies using separated cell populations have led to the identification of candidate ABA-regulated genes involved in guard-cell function (Leonhardt et al. 2004) and transcripts specific to individual root cell types and developmental zones (Birnbaum et al. 2003). Other biological processes including plant–pathogen interactions (Caldo et al. 2004), seed development and grain filling (Hunter et al. 2002; Ruuska et al. 2002; Zhu et al. 2003), response to environmental stresses (Chen et al. 2002; Cheong et al. 2002; Rossel et al. 2002; Cooper et al. 2003), hormonal control of development (Che et al. 2002; Goda et al. 2002), gravitropism (Moseyko et al. 2002), cell cycle (Menges et al. 2002), and circadian rhythms (Harmer et al. 2000) have all been addressed by this powerful approach.

RNA profiling experiments that catalogue gene expression in diverse tissues and developmental stages under a single experimental design have been performed in Arabidopsis (Czechowski et al. 2005; Schmid et al. 2005), rice (Zhu et al. 2003), and maize (Cho et al. 2002). While the immediate value of these catalogues is largely descriptive, their true worth will likely emerge over time as detailed information from single or groups of genes emerges and as reliable methodologies for cross-platform and cross-species RNA profiling data comparisons become routine.

The recently developed Affymetrix Barley1 GeneChip that contains at least 21,439 genes (Close et al. 2004) is a platform technology that facilitates extensive and globally comparable transcript profiling experiments in this crop. In this paper, we present a data set from the spring barley cv. Morex that is derived from assaying gene expression in 15 tissues from eight key stages during the development of barley grown under highly controlled conditions. To facilitate future comparisons, the experimental design and the procedures employed are described according to MIAME standards (Brazma et al. 2001) with the tissues carefully mapped to the current cereal plant growth and anatomical ontologies (Ware et al. 2002). To encourage use of the data, we have established a simple Web-based tutorial describing how it can be accessed and searched to reveal the expression of individual or groups of genes. We use examples to describe the descriptive and predictive potential of the data set and suggest that electronic analysis may, in many instances, substitute for the need to perform single gene-based expression analyses as part of the gene characterization process. We propose that this collection of replicated transcript profiles acts as a reference and a catalyst for future hypothesis-driven studies in barley and provides a baseline for comparative transcriptomics with other grasses including wheat, maize, and rice. For systematic analyses, the data set is publicly available through both BarleyBase/PLEXdb (Shen et al. 2005) and ArrayExpress (Parkinson et al. 2005).

Materials and methods

Plant material

The barley reference genotype cv. Morex (15 tissues) was used throughout. Morex is a US Midwestern six-rowed malting variety. Tissues were selected to represent the major stages of barley plant development (Fig. 1a). They were mapped to the ‘Cereal plant development ontology’ and ‘Plant anatomy ontology’ terms developed by Gramene (http://www.plantontology.org/). Plant growth conditions and tissue collection descriptions have been deposited in the ‘Protocols’ submission sections in BarleyBase (http://www.barleybase.org) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress). In brief, plants were propagated in Microclima 1000 growth chambers (Snijders Scientific B.V., Tilburg, Holland) with 16 h of light (17°C, 337–377 μmol m−2 s−1 light intensity measured by Sky Quantium light sensor), 8 h of darkness (12°C), and 80% humidity.

Three different tissue preparation methods were used. Coleoptiles, mesocotyls, and seminal roots were dissected from 2-day-old embryos from seeds germinated in the dark between sheets of water-soaked filter paper. Leaves, crowns, and roots were obtained from 10-cm-long seedlings (10–12 d after planting) grown in pots with vermiculite. Mature tissues were obtained from plants grown in pots with sterilized potting soil. Planting and collection of tissues were carried out at the same time of day. Tissues were collected directly into liquid nitrogen by pooling material from seven to ten individual plants per biological sample. Three independent biological samples (type-I replicates) represented each tissue type.

Molecular methods

RNA isolation, labeling, and hybridization protocols were deposited in the ‘Protocols’ submission sections in BarleyBase (http://www.barleybase.org) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress). A Trizol (Invitrogen, Carlsbad, CA, USA) RNA isolation protocol was used to extract total RNA. RNA quality was checked by micro-chromatography using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA). Probe synthesis, labeling, and hybridization were performed according to the manufacturer’s protocols (Affymetrix, Santa Clara, CA, USA) at the Iowa State University GeneChip Core facility (Caldo et al. 2004).

Data management

The whole data set, including DAT, CEL, and CHP files from the 45 Barley1 GeneChip hybridizations, can be downloaded or accessed and analyzed using Web-based tools available in BarleyBase (http://www.barleybase.org, experiment: BB3) or from ArrayExpress (http://www.ebi.ac.uk/arrayexpress; experiment: E-AFMX-3) using the associated analysis tool ‘Expression Profiler’ (http://www.ebi.ac.uk/expressionprofiler).

Fig. 1 Anatomical, ontological, and mRNA-based comparative classification of the transcriptomes of 15 barley tissues representing the key stages of cereal development. a Tissues used in this study. INF Inflorescence, PST pistil, BRC bracts (lemma, palea, and glumes), ANT anthers, CAR5 caryopsis 5 DAP, CAR10 caryopsis 10 DAP, CAR16 caryopsis 16 DAP, END22 caryopsis without embryo, DEM22 embryo, GEM mesocotyl, COL coleoptile, RAD seminal root, CRO crown, LEA leaf (partially shown), and ROO root (partially shown). b (left) Correlation analysis of normalized transcript abundance values from 22,840 probe sets and 45 biological conditions. A, B, and C are true biological replicates of the 15 tissues sampled. Average linkage distance was used to construct the dendrogram. b (right) Normalized transcript abundance profiles of selected photosynthesis-related (red) and histone (blue) genes


Data analysis

Two algorithms, MAS 5.0 and RMA, were used to calculate integrated probe set intensity values. For the subsequent analysis presented in this manuscript, only the MAS 5.0 data set was used, though the data analyzed by both approaches are deposited in BarleyBase (http://www.barleybase.org, experiment: BB3). MAS 5.0 values were imported into GeneSpring 6.1 software (Silicon Genetics, CA, USA), which was used for the rest of the analysis. Normalized or relative hybridization signal values were obtained in two steps: each measurement was first divided by the 50th percentile of all measurements in that sample, and the resulting value for each probe set was then divided by the median of its measurements in all samples.
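The two-step scaling described above can be sketched in a few lines. This is a minimal illustration only, not the GeneSpring implementation: the function name `normalize`, the dictionary layout, and the toy values are assumptions introduced here, and the sketch assumes that no sample or probe set has a median of zero.

```python
from statistics import median

def normalize(signal):
    """Two-step median scaling of a probe-set x sample table.

    signal: dict mapping a probe-set ID to a list of raw signal values,
            one per sample, in the same sample order for every probe set.
    (Hypothetical data layout, chosen for illustration.)
    """
    probe_sets = list(signal)
    n_samples = len(signal[probe_sets[0]])

    # Step 1 (per sample): divide by the 50th percentile of that sample.
    sample_medians = [median(signal[p][j] for p in probe_sets)
                      for j in range(n_samples)]
    per_sample = {p: [signal[p][j] / sample_medians[j] for j in range(n_samples)]
                  for p in probe_sets}

    # Step 2 (per probe set): divide by the median across all samples.
    normalized = {}
    for p, values in per_sample.items():
        m = median(values)
        normalized[p] = [v / m for v in values]
    return normalized

# Toy example: three probe sets measured in three samples.
raw = {"a": [10, 20, 40], "b": [20, 40, 80], "c": [30, 60, 120]}
print(normalize(raw))
```

After both steps, a value of 1 means that a probe set sits at its own median level in that sample, which is why the relative profiles of different genes can be compared on a common scale.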

Differentially expressed genes were identified using one-way ANOVA with all available error estimates from the GeneSpring 6.1 (Silicon Genetics, CA, USA) cross-gene error model (including type-I replicates). The p-value cut-off was 0.05, and the Benjamini and Hochberg multiple testing false discovery rate was set at 5.0%. We analyzed separately 15 tissue types (three type-I replicates each) from the barley cv. Morex (45 GeneChips).
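For readers unfamiliar with the Benjamini and Hochberg procedure, the adjustment step can be sketched as follows. The sketch assumes that one ANOVA p-value per probe set has already been computed elsewhere; the function names and the example p-values are hypothetical, and the GeneSpring cross-gene error model is not reproduced.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is never larger than the one ranked above it.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

def significant(pvalues, fdr=0.05):
    """Flag the tests that pass the chosen false discovery rate."""
    return [q <= fdr for q in benjamini_hochberg(pvalues)]

# Hypothetical p-values for four probe sets.
print(significant([0.01, 0.04, 0.03, 0.5]))
```

Note that a raw p-value below 0.05 does not guarantee that a probe set survives the correction; in the toy input only the first one does.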

The background hybridization signal value ‘63’ from these 45 GeneChips representing the 15 cv. Morex tissue types was determined by using ANOVA on the subset of 45 spiked-in GeneChip controls. Probe sets reporting absolute signal values more than two times the background (>126, p<0.05) were termed ‘expressed’. To classify tissue types according to their transcript populations, we used hierarchical clustering (Johnson 1967) and QT-Clust (Heyer et al. 1999). A correlation matrix was obtained using standard correlation, and a tree was constructed based on average linkage distance measures.
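The ‘expressed’ call and the tree-building step can be illustrated with the short sketch below. It is written under stated assumptions: Pearson correlation is used as the similarity measure (the ‘standard correlation’ of GeneSpring may be defined differently), the distance is taken as 1 minus the correlation, and the sample names and profiles are invented for the example. Profiles must not be constant, otherwise the correlation is undefined.

```python
from math import sqrt

BACKGROUND = 63               # background signal reported in the text
THRESHOLD = 2 * BACKGROUND    # 126; signals above this are called 'expressed'

def is_expressed(signal):
    return signal > THRESHOLD

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_linkage(profiles):
    """Agglomerative clustering of samples with average linkage.

    profiles: dict mapping a sample name to its list of normalized values.
    Returns the merges in order, each as (cluster_a, cluster_b, distance).
    """
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise distances between the two clusters.
                d = sum(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Invented profiles: two replicates each of two tissues.
samples = {
    "LEA_A": [1, 2, 3, 4], "LEA_B": [1.1, 2, 3.2, 3.9],
    "ROO_A": [4, 3, 2, 1], "ROO_B": [3.9, 3.1, 2, 1.2],
}
for a, b, d in average_linkage(samples):
    print(a, b, round(d, 3))
```

In this toy input the replicates of each tissue merge first, which is the behaviour one would hope to see for true biological replicates.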

Transcripts that were specifically up-regulated in individual tissues were identified by applying basic relational operations to the subset of differentially accumulating transcripts. The class of ‘specific’ transcripts for a particular tissue type was identified by intersecting all pair-wise comparisons involving that tissue; it therefore comprises the transcripts that accumulate differentially in the selected tissue compared to every other tissue. From this list, genes that are up-regulated in a particular tissue type were identified and defined as ‘specifically up-regulated’. This subset can be partitioned further by selecting those with values two-, five-, or n-fold higher in a particular tissue compared to any other. From the differentially expressed list, genes that have absolute values >126 in one tissue type and <126 in the remaining 14 tissues were termed ‘exclusive’.
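The relational operations described above map naturally onto set operations. The following sketch is one possible reading of the procedure, not the authors' code: the data structures, function names, tissue subset, and probe set IDs (`p1` to `p3`) are hypothetical.

```python
THRESHOLD = 126  # twice the background signal of 63

def specifically_up_regulated(pairwise_diff, means, tissue):
    """Probe sets differential in every comparison involving `tissue`
    and higher in `tissue` than in every other tissue.

    pairwise_diff: dict mapping frozenset({tissue_a, tissue_b}) to the set
                   of probe sets that differ between the two tissues.
    means: dict mapping a probe set to a dict of tissue -> mean signal.
    """
    others = [t for t in next(iter(means.values())) if t != tissue]
    specific = set.intersection(
        *(pairwise_diff[frozenset((tissue, o))] for o in others))
    return {p for p in specific
            if all(means[p][tissue] > means[p][o] for o in others)}

def exclusive(differential, means, tissue):
    """Probe sets above threshold in `tissue` and below it everywhere else."""
    return {p for p in differential
            if means[p][tissue] > THRESHOLD
            and all(v < THRESHOLD for t, v in means[p].items() if t != tissue)}

# Invented example with three tissues and three probe sets.
means = {
    "p1": {"ANT": 500, "LEA": 50, "ROO": 40},
    "p2": {"ANT": 300, "LEA": 200, "ROO": 290},
    "p3": {"ANT": 100, "LEA": 400, "ROO": 90},
}
pairwise_diff = {
    frozenset(("ANT", "LEA")): {"p1", "p2", "p3"},
    frozenset(("ANT", "ROO")): {"p1"},
    frozenset(("LEA", "ROO")): {"p2", "p3"},
}
print(specifically_up_regulated(pairwise_diff, means, "ANT"))
print(exclusive({"p1", "p2", "p3"}, means, "ANT"))
```

A fold-change filter (two-, five-, or n-fold) could be added by replacing the simple `>` comparison with a ratio test.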

Barley expressed sequence tag (EST) homologs of genes from other species were identified by homology searches using the tBlastX function available from the NCBI Web site (http://www.ncbi.nlm.nih.gov). Probe sets were assigned to the matching ESTs by using information from the HarvEST 1.16 database (http://harvest.ucr.edu/Barley1.htm). All pertinent annotation and expression information for any probe set can be accessed at http://barleybase.org/barley1contig.php or http://www.plexdb.org/modules.php?name=PD_probeset&page=annotation.php&genechip=Barley1.

Results

Tissues

We identified 15 tissues from eight developmental stages that we considered broadly representative of the focus of major academic and commercial activities throughout the research community (Fig. 1a). A precise and detailed description of the tissues and how they were sampled is given at http://barleygenome.net/affy_ref/WEB_TISSUES/tissue_types.htm, which contains links to the relevant plant growth stage and plant anatomy ontologies. Each tissue is complex and contains a number of different cell types that are individually specialized to perform a range of functions. The transcript abundance profiles we have derived are an average across all component cell types. As a result, we encourage further studies, such as in situ hybridizations using individual genes, to more specifically address given biological questions.

Classification of the barley tissues based on their transcriptomes

We determined the transcript levels expressed from 21,439 different barley genes (Close et al. 2004) using the Barley1 GeneChip. Relationships between the tissues were established by applying hierarchical clustering to the entire data set. This analysis was performed to evaluate the performance of the probe level algorithm and normalization procedure and, by presenting the information as a dendrogram, to illustrate graphically how the tissues are related on the basis of their transcriptomes. Without exception, the tissue replicates (labeled A, B, and C for each tissue in Fig. 1b) clustered in terminal clades, exhibiting significantly higher correlation than observed between tissues. Variation in the transcript populations was, therefore, sufficient to differentiate the biological samples. As normalized integrated signal values of all probe sets were used for clustering and yielded results that were consistent with our expectations (see below), we concluded that, in general terms, the data transformation procedure used was appropriate and did not require additional data processing steps.

The basal discriminating factor in the dendrogram is the presence or absence of chlorophyll-containing cell types. This tree structure was supported by the transcript abundance profiles of selected groups of genes representing two major physiological processes: photosynthesis (GO:0015979) and cell proliferation (GO:0008283) (Fig. 1b, relative expression level line graphs) (Ware et al. 2002). Consistent with our expectations, photosynthesis-associated genes (red line profiles) were up-regulated in chlorophyll-containing tissues and genes associated with cell proliferation (blue line profiles) were up-regulated in tissues containing mitotically active cells. As expected, secondary and tertiary branches separated clades that are more related anatomically and functionally than ontologically. Thus, tissues from the mature flower (pistil, anthers, and bracts) are on distant clades despite being dissected from the same floret. We conclude, therefore, that the relative position a tissue occupies on the dendrogram reflects how its transcriptome is programmed to fulfill the specific demands of its biological role.

Patterns of gene expression

A total of 18,481 of the 22,700 probe sets on the Barley1 array detected transcripts that were present at greater than twice background in at least one tissue. In individual tissues, the number of expressed genes varied from 10,189 in anthers to 14,805 in the crown (Table 1, ‘expressed’). Some 14,943 probe sets recorded relative expression levels that varied by more than fourfold between at least two tissues (Table 1, ‘informative’) with the remaining 3,538 probe sets designated as constitutive. We used two approaches to partition the informative probe set information into co-regulated groups: (1) supervised partitioning, based on combinatorial relationships between differentially expressed genes (also known as classification, discriminant analysis, class prediction, or supervised pattern recognition) and (2) unsupervised partitioning, based on the QT-Clust clustering algorithm (also known as cluster analysis, class discovery, and unsupervised pattern recognition) (Dudoit and Fridlyand 2003). The combined use of both approaches placed data from 9,485 different probe sets into co-regulated groups:

1. Supervised partitioning identifies transcripts that accumulate at a higher level in one particular tissue compared with the others (we call these specifically up-regulated). We observed a total of 5,764 specifically up-regulated transcripts, ranging from six in 16 DAP caryopses to 1,591 in anthers (Table 1, ‘up-regulated’). As expected, the number of specifically up-regulated transcripts was influenced by the similarities between the groups of tissues. Thus, as the transcriptomes of the developing caryopses are significantly more similar to each other than to any other group of tissues (see Fig. 1b), the number of specifically up-regulated transcripts identified in individual developing caryopsis tissues is correspondingly lower. A total of 650 probe sets detected transcripts that were expressed in only a single tissue type, ranging from 0 in 16 DAP caryopsis to 251 in anthers (Table 1, ‘exclusive’). Many of the specifically up-regulated transcripts encode putative orthologues of proteins with previously described functions in the tissues in which they predominate. For example, a putative orthologue of the protein RAFTIN was exclusively expressed in the anther tissues. RAFTIN encodes a structural protein of Ubisch bodies, a characteristic of the secretory tapetum, that participate in the degradation of tapetal cell walls and transportation of Ca2+ from the tapetum to the pollen surface. RAFTIN is essential for late pollen development in cereals and appears to be monocot-specific with no obvious orthologue in the Arabidopsis genome (Wang et al. 2003).

2. Unsupervised QT-Clust clustering assembled 6,592 probe sets into 332 groups (correlation >0.9, n>10, df=14). The ten most highly populated clusters contained 109–194 co-expressed genes while a major group of 175 clusters contained 15–23 genes (Fig. 2). An inspection of the annotations of individual cluster group members revealed considerable enrichment for functionally correlated transcripts. For example, among several predominant histone-enriched clusters (e.g., >50% histones), there are putative barley orthologues of functionally well-characterized genes associated with aspects of cell proliferation (e.g., in chromatin remodeling) including PROLIFERA (PRL) (Springer et al. 1995), WEE1 (Ferreira et al. 1993; Sun et al. 1999), TITAN (Liu and Meinke 1998; Liu et al. 2002; Tzafrir et al. 2002), ARGONAUTE1 (AGO1) (Lynn et al. 1999), and DECREASED DNA METHYLATION (DDM1) (Singer et al. 2001).
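The idea behind quality-threshold clustering, as used in the second approach, can be conveyed with a simplified sketch. This is not the GeneSpring or Heyer et al. (1999) implementation: it uses plain Pearson correlation rather than a jackknifed measure, treats the minimum pairwise correlation within a cluster as the quality criterion, and runs in a time that is only practical for small inputs. All names and example profiles are assumptions made for illustration, and profiles must not be constant.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def qt_clust(profiles, min_corr=0.9, min_size=10):
    """Simplified quality-threshold clustering of expression profiles.

    profiles: dict mapping a probe-set ID (string) to its expression profile.
    Every pair of members in a returned cluster correlates above min_corr.
    """
    cache = {}

    def r(a, b):
        key = (a, b) if a < b else (b, a)
        if key not in cache:
            cache[key] = pearson(profiles[a], profiles[b])
        return cache[key]

    remaining = set(profiles)
    clusters = []
    while remaining:
        best = []
        for seed in sorted(remaining):
            candidate = [seed]
            pool = remaining - {seed}
            while pool:
                # Pick the gene whose weakest link to the candidate is strongest.
                nxt = max(sorted(pool),
                          key=lambda g: min(r(g, m) for m in candidate))
                if min(r(nxt, m) for m in candidate) <= min_corr:
                    break
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break  # no remaining candidate reaches the minimal cluster size
        clusters.append(best)
        remaining -= set(best)
    return clusters

# Invented profiles: two co-regulated groups and one unrelated probe set.
toy = {
    "a1": [1, 2, 3, 4, 5], "a2": [2, 4, 6, 8, 10.5], "a3": [1, 2.1, 2.9, 4, 5.2],
    "b1": [5, 4, 3, 2, 1], "b2": [10, 8, 6.2, 4, 2], "b3": [5.1, 4, 3, 2.2, 1],
    "x": [1, 5, 1, 5, 1],
}
print(qt_clust(toy, min_corr=0.9, min_size=3))
```

Probe sets that cannot be placed in a sufficiently large, sufficiently tight group are left unclustered, which is consistent with only part of the informative probe sets being assigned to groups.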

Of the total 9,485 transcripts placed into co-regulated groups, 2,871 were common to both analytical approaches. The roughly symmetrical partitioning illustrates the complementarity of the classification procedures used to describe the transcript variability contained within the data set. These analyses, together with the raw data, effectively represent an atlas of gene expression from seed to seed during normal barley development.
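The counts quoted in this section are mutually consistent, which can be checked with simple inclusion-exclusion arithmetic. The short script below uses only numbers stated in the text; the variable names are ours.

```python
# Counts taken from the text.
specifically_up_regulated = 5764   # supervised partitioning
qt_clustered = 6592                # unsupervised QT-Clust
common = 2871                      # placed by both approaches

# Inclusion-exclusion: size of the union of the two groups.
union = specifically_up_regulated + qt_clustered - common
supervised_only = specifically_up_regulated - common
unsupervised_only = qt_clustered - common
print(union, supervised_only, unsupervised_only)

# Informative plus constitutive probe sets give the expressed total.
informative = 14943
constitutive = 3538
print(informative + constitutive)
```

The union matches the 9,485 probe sets reported, with 2,893 placed by the supervised approach alone and 3,721 by QT-Clust alone, and the informative and constitutive classes add up to the 18,481 expressed probe sets.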

Table 1 Number of probe sets in 15 tissues reporting expressed, informative, specifically up-regulated, and exclusive transcripts based on supervised partitioning

Tissue   Expressed genes   Informative   Specifically up-regulated   Exclusive
CRO      14,805            11,506        71                          9
ROO      14,346            11,027        349                         61
GEM      13,904            10,742        269                         15
RAD      13,825            10,587        287                         6
PST      13,817            10,609        39                          4
CAR5     13,782            10,607        84                          16
INF      13,638            10,329        476                         12
COL      13,250            10,106        372                         5
CAR10    13,230            10,245        78                          15
BRC      12,856            10,027        262                         21
LEA      12,508            9,734         1,295                       176
DEM22    12,153            7,954         330                         51
CAR16    11,453            8,864         6                           0
END22    10,345            7,954         255                         8
ANT      10,189            7,706         1,591                       251
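As a quick consistency check, the per-tissue counts in Table 1 can be summed and compared with the totals quoted in the text (5,764 specifically up-regulated and 650 exclusive probe sets). The values below are transcribed from the table; the dictionary layout is simply a convenient assumption.

```python
# tissue: (expressed, informative, specifically up-regulated, exclusive)
table1 = {
    "CRO": (14805, 11506, 71, 9),
    "ROO": (14346, 11027, 349, 61),
    "GEM": (13904, 10742, 269, 15),
    "RAD": (13825, 10587, 287, 6),
    "PST": (13817, 10609, 39, 4),
    "CAR5": (13782, 10607, 84, 16),
    "INF": (13638, 10329, 476, 12),
    "COL": (13250, 10106, 372, 5),
    "CAR10": (13230, 10245, 78, 15),
    "BRC": (12856, 10027, 262, 21),
    "LEA": (12508, 9734, 1295, 176),
    "DEM22": (12153, 7954, 330, 51),
    "CAR16": (11453, 8864, 6, 0),
    "END22": (10345, 7954, 255, 8),
    "ANT": (10189, 7706, 1591, 251),
}

total_up = sum(row[2] for row in table1.values())
total_exclusive = sum(row[3] for row in table1.values())
most_expressed = max(table1, key=lambda t: table1[t][0])
least_expressed = min(table1, key=lambda t: table1[t][0])
print(total_up, total_exclusive, most_expressed, least_expressed)
```

Both column sums agree with the text, and the extremes of the ‘expressed’ column correspond to the crown and the anthers, as stated.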


Variation in the accumulation of regulatory factor transcripts

According to sequence-homology-based annotations, there are at least 1,059 probe sets on the Barley1 GeneChip representing regulatory factor (RF) genes. Approximately 30% correspond to transcription factors, with the remainder represented mostly by protein kinases. We examined how variation in the abundance of RF transcripts was correlated with the other genes on the array. Only 393 of the RF probe sets were classified as informative, and they partitioned proportionally among transcript groups identified by supervised and unsupervised analyses. We found that unique sets of RFs accumulated in different barley tissues, with the anthers and inflorescence accumulating the most, 135 and 131, respectively. There was a clear trend between the number of up-regulated RF genes and the total number of genes expressed in a given tissue (r=0.97), which is consistent with transcriptional diversity being linked to the differential accumulation of RF transcripts.

To display the dynamic changes in the informative RF gene transcripts, we plotted their abundance in each tissue separately along with their distribution in the remaining 14 (Fig. 3). Different tissues accumulated specific suites of RFs, with most individual RF transcripts up-regulated in several. For example, RFs associated with late endosperm development can be identified in caryopses 10–22 DAP (Fig. 3a). The developing embryo (part of the 22 DAP caryopsis) accumulated many RFs that were also up-regulated in the inflorescence. Furthermore, there is a clear overlap in the RFs expressed in the inflorescence and the coleoptile (Fig. 3b,c). These observations are consistent with a model where subsets of a suite of common regulatory modules form the core components of regulatory networks, with additional specific modules incorporated to control the highly specialized functions essential for morphological and functional differentiation of the sampled tissues.

Using the data set to generate or test biological hypotheses

Given that the data are largely descriptive, we took the following approach to demonstrate how they could be used to generate or test biological hypotheses. First, we identified strong candidate barley homologs of a wide range of genes that have well-defined molecular and cellular functions in other systems. We examined their levels and patterns of gene expression across tissues and identified groups of co-regulated genes at various correlation stringencies (r=0.85–1.0). We subdivided these into those with known (putative) functions and hypothetical and unknown genes, then inspected the annotations of the well-characterized co-regulated genes and the literature associated with them to determine their putative functional relatedness. We considered that the identification of multiple functionally related genes would significantly strengthen any inferences we made regarding hypothetical and/or unknown barley genes in any given co-regulated group. One of the examples we followed using this approach is presented here.
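The search for genes co-regulated with a chosen ‘bait’ gene at a given stringency can be sketched as a simple correlation query. The sketch assumes Pearson correlation across tissue profiles; the function names, probe set identifiers, and profile values are hypothetical and are not taken from the data set.

```python
from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def co_regulated(profiles, bait, min_corr=0.9):
    """Return (probe set, r) pairs whose profile correlates with the bait
    at or above min_corr, sorted from the strongest correlation down.

    profiles: dict mapping a probe-set ID to its normalized values,
              one value per tissue, in the same tissue order throughout.
    """
    target = profiles[bait]
    hits = [(p, pearson(target, values))
            for p, values in profiles.items() if p != bait]
    return sorted((h for h in hits if h[1] >= min_corr), key=lambda h: -h[1])

# Invented five-tissue profiles with hypothetical probe-set names.
profiles = {
    "bait_gene": [0.2, 0.3, 4.0, 5.0, 0.4],
    "candidate_1": [0.1, 0.4, 3.5, 5.5, 0.3],
    "candidate_2": [3.0, 2.5, 0.2, 0.1, 2.8],
    "candidate_3": [1.0, 1.1, 0.9, 1.0, 1.2],
}
for probe_set, r in co_regulated(profiles, "bait_gene", min_corr=0.9):
    print(probe_set, round(r, 3))
```

Lowering `min_corr` towards 0.85 widens the list of candidates, whereas values close to 1.0 return only the most tightly co-regulated genes.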

AGAMOUS (AG) is a C-function floral organ identity MADS-box gene that is responsible for the formation of stamens and carpels in the wild-type Arabidopsis flower (Yanofsky et al. 1990). Curiously, no AG-like mutants have been identified in barley, even after many decades of mutagenesis and phenotypic characterization (Jerry Franckowiac, personal communication). In maize, AGAMOUS function is thought to be shared by ZMM2 and ZAG1, two sub-functionalized paralogs that display overlapping and partially redundant functions. A phylogenetic analysis of AG-like sequences has shown that ZMM2 (Mena et al. 1996) is very closely related at the DNA sequence level to HvAG2 (Schmitz et al. 2000), suggesting that HvAG2 is likely its functional ortholog. The expression atlas shows that HvAG2 mRNA accumulates not only in developing barley seeds, but also in the floral tissues, immature inflorescence, bracts, and pistil (Fig. 4). AGAMOUS function in Arabidopsis promotes the continued differentiation of non-meristematic cells. Targets of AGAMOUS remain relatively poorly defined and only a few, including SHATTERPROOF2 (Liljegren et al. 2000), a MADS-box protein that controls seed dispersal, have been described. Strict clustering (r=0.9) identified 11 transcripts with similar relative abundance patterns to HvAG2 (Table 2). These included transcripts encoding HvAG1, the putative functional ortholog of maize ZAG1. The remainder includes six uncharacterized MADS-box and regulatory genes, including one that is highly homologous to TaMADS5, a putative wheat ortholog of SHATTERPROOF2, along with four hypothetical proteins from rice. These data allow us to propose the hypothesis: ‘some or all of the (eleven) transcripts co-regulated with HvAG2 are components of the C-function transcriptional network that promotes wild-type seed development in barley’. A number of strategies could clearly be used to test this. The salient point is that the data set provides strong leads, frequently (as in this case) with biological support, that could be explored to further investigate a wide range of developmental, physiological, or biochemical processes in this large-genome monocotyledonous crop.

Fig. 2 The frequency of co-regulated gene cluster sizes identified by unsupervised QT-Clust clustering. Left ordinate axis (frequency) relates to the bar graph and shows the number of independent clusters containing the number of genes indicated on the abscissa axis (QT bin). Associated cumulative values are shown on the right ordinate axis (%) and they relate to the line graph. Correlation >0.9 and a minimal final cluster size of 10 were the parameters used for clustering

Discussion

mRNA profiling experiments are generally designed to identify sets of informative candidate genes that are involved in a specific biological process. As a consortium representing the barley genomics community, we set out instead to assemble a high-quality, highly descriptive data set that would act as a publicly accessible point of reference for future studies and as a catalyst for comparative genomics and hypothesis-driven research. The data set was derived from 15 tissues from the barley cv. Morex and can be accessed at BarleyBase (http://www.barleybase.org) and ArrayExpress (http://www.ebi.ac.uk/arrayexpress), and represents an atlas of gene expression throughout development from seed to seed.

A previous thorough evaluation of the Barley1 GeneChip platform gives us a high degree of confidence that the data presented in this study are representative and accurately reflect the quantitative variation in mRNA levels found in the tissues that we analyzed. At a technical level, the array was extensively tested by Close et al. (2004) and has subsequently been shown to provide information-rich data when used to investigate specific biological systems (Caldo et al. 2004). The measured absolute and relative levels of transcript abundance have been compared to expression measures derived from both gene-specific (e.g., quantitative real-time polymerase chain reaction, G. Fincher/R. Burton, personal communication) and general profiling technologies (e.g., serial analysis of gene expression) using either the same or comparable RNA samples (Ibrahim et al. 2005). In each case, the data were remarkably consistent.

In our experiment, we recorded the abundance of transcripts from 18,481 expressed barley genes across the tissues analyzed, representing approximately 85% of the probe sets on the Barley1 GeneChip. The absolute and relative levels and patterns of transcript abundance in these tissues provide an expression-based annotation for the genes represented by these probe sets and supplement the available gene sequence and homology-based annotations of the respective genes. Although only a small number of barley genes have been studied in any detail, homology-based annotations currently provide putative functions for approximately two thirds of the genes on the Barley1 GeneChip. Two important classes of genes are those designated as hypothetical (i.e., those that have no reasonable homology-based functional annotation) and those that encode regulatory factors. The Barley1 GeneChip contains at least 6,789 and 1,059 probe sets representing hypothetical genes and regulatory factors, respectively. Our results provide transcript-abundance-based annotation for at least 3,756 and 393 of these, respectively. We suggest that the information on transcript abundance, tissue distribution, and co-regulation with functionally annotated genes will facilitate the development or testing of biological hypotheses and guide the design of further experimental strategies required for functional validation.

Fig. 3 Regulatory factor transcript levels in different tissues. a Distribution plots of the relative abundance of RF transcripts that are up-regulated in caryopses at 16 DAP (CAR16) are shown in comparison to their expression in the other 14 tissues. The gray box shows the relative expression level of 56 RF genes that are up-regulated >1.5-fold in CAR16. Each gene is represented as a small black box positioned on the vertical axis at its relative expression value (e.g., in CAR16, all black boxes are >1.5-fold up-regulated). In the 14 other plots, the relative expression level of the same 56 RF genes is shown. Six tissues encompassing stages and tissues from the developing grain and four tissues containing a significant proportion of the developing endosperm are boxed to highlight the overlap in RF expression in related tissues. b Distribution plots of RFs in the developing embryo (DEM22), coleoptile (COL), and immature inflorescence (INF) are highlighted. As in (a), transcripts >1.5-fold up-regulated (dark gray background) were identified in each tissue and their relative expression level in each of the other 14 tissues plotted. Comparing the relative distribution of black boxes on the vertical axes highlights similarities and differences between the highlighted tissue (DEM22, COL, and INF) and the remaining 14. c The number of common and tissue-specific RFs in DEM22, COL, and INF presented as a Venn diagram that illustrates the modularity of RF expression in vegetative and floral meristems
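The selection step described in the Fig. 3 legend (pick the transcripts that are more than 1.5-fold up-regulated in one tissue, then inspect the same transcripts across the other tissues) can be sketched as follows. This is a minimal illustration under stated assumptions, not the workflow used in the study: it assumes relative expression values have already been normalized so that 1.0 means no change, and the probe-set names, the four-tissue subset, and all numbers are invented for the example.

```python
from typing import Dict, List

# Hypothetical relative expression values (normalized so that 1.0 = baseline).
# Probe-set names and numbers are invented for illustration only.
TISSUES: List[str] = ["CAR16", "DEM22", "COL", "INF"]
EXPRESSION: Dict[str, List[float]] = {
    "probe_A": [2.4, 1.1, 0.6, 0.9],
    "probe_B": [1.6, 1.7, 0.8, 0.7],
    "probe_C": [0.9, 0.5, 2.1, 1.8],
    "probe_D": [1.2, 1.0, 1.0, 1.1],
}


def upregulated_in(tissue: str, threshold: float = 1.5) -> List[str]:
    """Return probe sets whose relative expression in `tissue` exceeds `threshold`."""
    column = TISSUES.index(tissue)
    return sorted(p for p, values in EXPRESSION.items() if values[column] > threshold)


def profile_across_tissues(probes: List[str]) -> Dict[str, Dict[str, float]]:
    """For the selected probe sets, map each tissue name to its relative expression."""
    return {p: dict(zip(TISSUES, EXPRESSION[p])) for p in probes}


# Select on one tissue, then look at the same probe sets everywhere else.
selected = upregulated_in("CAR16")
profiles = profile_across_tissues(selected)
```

Repeating the selection for several tissues and intersecting the resulting lists would give the counts of common and tissue-specific transcripts that a Venn diagram such as Fig. 3c summarizes.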

Using examples, we have illustrated how the gene expression atlas could be used to design further experiments to test specific biological hypotheses. We are collectively of the opinion that making the data from this experiment publicly available and easily accessible will provide a point of reference for those working on individual genes or groups of genes in barley and a portal for those working on orthologous genes in other species to investigate transcriptional activity throughout normal stages of development. To facilitate this level of exploitation, we have developed a series of Web-accessible tutorials to guide those who are unfamiliar with barley and barley genomics resources through the process of querying levels and patterns of expression in this data set. The tutorials (http://germinate.scri.ac.uk/barley/tutorials) represent an extension of those described previously by Shen et al. (2005). We will update and expand these tutorials in response to the availability of new data sources and possibilities for comparative inference.

It is important to recognize that the expression atlas has at least two significant limitations: the spatial resolution over which we have monitored gene expression is low and we have used only a single barley cultivar. Drea et al. (2005) overcame the spatial resolution problem in wheat using microdissection and linear amplification of extracted RNA combined with high-throughput RNA in situ hybridization. They identified genes with novel expression patterns during the early stages of wheat grain development as well as some with apparently conserved functions in seed development between Arabidopsis and wheat (data available at http://bioinf.scri.sari.ac.uk/cgi-bin/insitu/home). Casson et al. (2005) illustrated nicely the power of laser capture microdissection coupled with linear mRNA amplification to profile gene expression in very small numbers of cells. In relation to cultivar specificity, it is becoming increasingly apparent that allele-specific expression level polymorphism is relatively common in biological systems (e.g., see Gompel et al. 2005; Wittkopp et al. 2004). The levels and profiles of expression across tissues of some genes in cv. Morex are indeed different in other cultivars (Druka et al., in preparation). It will, therefore, be prudent to bear this in mind and test experimentally if appropriate for the questions being asked.

Fig. 4 Relative expression of HvAG2 across 15 biological samples. The graph shows the pattern of gene expression in the QT-Clust cluster containing HvAG2 (black) and 11 co-regulated genes (gray) (r>0.9). The best tBlastX annotations for each of these co-regulated genes are presented along with their respective e-values

Table 2 Sequence-homology-based annotations of the genes identified by similarity to the abundance profile of HvAG2 (probe set Contig3831_at)

Best BlastX hit                                       BlastX hit e-value  t-test p-value in pistil
AGAMOUS-like protein 2 HvAG2 (Hordeum vulgare)        1.00E−127           2.40E−07
Zinc finger with coiled-coil domain 2 (Homo sapiens)  1.00E−11            8.80E−05
Hypothetical protein (Oryza sativa)                   3.00E−24            1.40E−06
MADS-box protein 9 (Hordeum vulgare)                  1.00E−140           1.40E−05
MADS box protein 5 (Triticum aestivum)                1.00E−111           9.60E−05
Putative PEP carboxylase (Oryza sativa)               1.00E−119           2.60E−05
Hypothetical protein OSJNBa0044M19.12 (Oryza sativa)  3.00E−04            9.00E−06
Hypothetical protein OSJNBa0035M09.6 (Oryza sativa)   8.00E−22            1.20E−05
AGAMOUS-like protein 1 HvAG1 (Hordeum vulgare)        1.00E−117           4.20E−06
Hypothetical protein (Oryza sativa)                   3.00E−24            3.60E−05
RNA-binding protein, putative (Arabidopsis thaliana)  2.00E−31            5.70E−05
Putative oligopeptide transporter (Oryza sativa)      2.00E−69            1.20E−04

The t-test p-value measures the probability of the likelihood that the difference between normality (normalized value of 1) and the measured relative signal is actually less than indicated
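The idea behind Fig. 4 and Table 2, finding genes whose abundance profile across samples tracks that of a query gene such as HvAG2 at r>0.9, can be illustrated with a plain Pearson correlation filter. This is a simplified sketch and an assumption on our part: the study reports using QT-Clust (Heyer et al. 1999), which this code does not implement, and the gene names and five-sample profiles below are invented for the example.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def coregulated(query: str, profiles: Dict[str, List[float]], r_min: float = 0.9) -> List[str]:
    """Return genes (other than `query`) whose profile correlates with it at r > r_min."""
    target = profiles[query]
    return sorted(
        gene for gene, profile in profiles.items()
        if gene != query and pearson(target, profile) > r_min
    )


# Invented relative-expression profiles over five hypothetical samples.
PROFILES: Dict[str, List[float]] = {
    "query_gene": [1.0, 1.2, 4.0, 3.5, 0.8],
    "gene_1": [0.9, 1.1, 3.8, 3.6, 0.7],  # tracks the query closely
    "gene_2": [3.0, 2.5, 0.5, 0.6, 2.8],  # roughly opposite pattern
    "gene_3": [1.0, 1.0, 1.1, 0.9, 1.0],  # essentially flat
}

partners = coregulated("query_gene", PROFILES)
```

A threshold filter of this kind only ranks genes against one query; a clustering method additionally constrains every member of a cluster to be similar to the others, so the two approaches need not return identical gene lists.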

We believe that exciting results will undoubtedly emerge from leveraging the community-designed 22K Barley1 GeneChip to address a wide range of biological questions. Resources such as BarleyBase/PLEXdb (http://barleybase.org/, http://plexdb.org/; Shen et al. 2005) and HarvEST (http://harvest.ucr.edu/) will allow users to perform hypothesis-building or testing queries from multiple interlinked sources, e.g., a particular gene, a protein class, or EST entries coupled with transcript abundance data from a variety of crop and model plant species. The experiments we have presented here will form a key component of this and will, we propose, contribute towards the development of a wide range of intelligent hypotheses driving a new generation of functional experiments.

Acknowledgements We thank Alvis Brazma, Dan Nettleton, Tom Freeman, and John Quackenbush for valuable conceptual input on the art of microarray data analysis; Julie Dickerson and Lishuang Shen for help with submission to BarleyBase; Philippe Rocca-Serra for help with submission to ArrayExpress; and Doreen Ware and Pankaj Jaiswal for assistance with plant ontologies. Sarah Jackson and Yusuke Komishi from GeneSpring are acknowledged for helpful advice and excellent technical support during the data analysis and presentation. Funding for this experiment was provided by Scottish Executive Environment and Rural Affairs Department (Grant No. IGD12397 to RW); BBSRC (Grant No. ISIS 1107 to RW and GJM); USDA Initiative for Future Agriculture and Food Systems (IFAFS) 01-52100-11346 to AK, RPW, TJC, and GJM; USDA-NRI 02-35300-12619 to RPW; USDA-NRI 02-35300-12548 to TJC; USDA-CSREES North American Barley Genome Project funds to RPW, GJM, AK, TJC, and PH; McKnight Landgrant Professorship (University of Minnesota) for sabbatical leave to GJM; BMBF Plant Genome Program ‘GABI’ (Grants No. 0312282 and 0312271) to AG; and TEKES (National Technology Agency of Finland) and Boreal Plant Breeding to AS.

References

Birnbaum K, Shasha DE, Wang JY, Jung JW, Lambert GM, Galbraith DW, Benfey PN (2003) A gene expression map of the Arabidopsis root. Science 302:1956–1960

Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, Aach J, Ansorge W, Ball CA, Causton HC, Gaasterland T, Glenisson P, Holstege FC, Kim IF, Markowitz V, Matese JC, Parkinson H, Robinson A, Sarkans U, Schulze-Kremer S, Stewart J, Taylor R, Vilo J, Vingron M (2001) Minimum information about a microarray experiment (MIAME) - toward standards for microarray data. Nat Genet 29:365–371

Caldo RA, Nettleton D, Wise RP (2004) Interaction-dependent gene expression in Mla-specified response to barley powdery mildew. Plant Cell 16:2514–2528

Casson S, Spencer M, Walker K, Lindsey K (2005) Laser capture microdissection for the analysis of gene expression during embryogenesis of Arabidopsis. Plant J 42:111–123

Che P, Gingerich DJ, Lall S, Howell SH (2002) Global and hormone-induced gene expression changes during shoot development in Arabidopsis. Plant Cell 14:2771–2785

Chen W, Provart NJ, Glazebrook J, Katagiri F, Chang HS, Eulgem T, Mauch F, Luan S, Zou G, Whitham SA, Budworth PR, Tao Y, Xie Z, Chen X, Lam S, Kreps JA, Harper JF, Si-Ammour A, Mauch-Mani B, Heinlein M, Kobayashi K, Hohn T, Dangl JL, Wang X, Zhu T (2002) Expression profile matrix of Arabidopsis transcription factor genes suggests their putative functions in response to environmental stresses. Plant Cell 14:559–574

Cheong YH, Chang HS, Gupta R, Wang X, Zhu T, Luan S (2002) Transcriptional profiling reveals novel interactions between wounding, pathogen, abiotic stress, and hormonal responses in Arabidopsis. Plant Physiol 129:661–677

Cho Y, Fernandes J, Kim S-H, Walbot V (2002) Gene-expression profile comparisons distinguish seven organs of maize. Genome Biol 3:1–16

Close TJ, Wanamaker SI, Caldo RA, Turner SM, Ashlock DA, Dickerson JA, Wing RA, Muehlbauer GJ, Kleinhofs A, Wise RP (2004) A new resource for cereal genomics: 22K barley GeneChip comes of age. Plant Physiol 134:960–968

Cooper B, Clarke JD, Budworth P, Kreps J, Hutchison D, Park S, Guimil S, Dunn M, Luginbuhl P, Ellero C, Goff SA, Glazebrook J (2003) A network of rice genes associated with stress response and seed development. Proc Natl Acad Sci U S A 100:4945–4950

Czechowski T, Stitt M, Altmann T, Udvardi MK, Scheible W-R (2005) Genome-wide identification and testing of superior reference genes for transcript normalization in Arabidopsis. Plant Physiol 139:5–17

Drea S, Leader DJ, Arnold B, Shaw P, Dolan L, Doonan JH (2005) Systematic spatial analysis of gene expression during wheat caryopsis development. Plant Cell 17:2172–2185

Dudoit S, Fridlyand J (2003) Rules classification in microarray experiments. In: Speed T (ed) Statistical analysis of gene expression. Chapman & Hall/CRC, pp 93–158

Ferreira PC, Hemerly AS, Van Montagu M, Inze D (1993) A protein phosphatase 1 from Arabidopsis thaliana restores temperature sensitivity of a Schizosaccharomyces pombe cdc25ts/wee1-double mutant. Plant J 4:81–87

Goda H, Shimada Y, Asami T, Fujioka S, Yoshida S (2002) Microarray analysis of brassinosteroid-regulated genes in Arabidopsis. Plant Physiol 130:1319–1334

Gompel N, Prud’homme B, Wittkopp PJ, Kassner VA, Carroll SB (2005) Chance caught on the wing: cis-regulatory evolution and the origin of pigment patterns in Drosophila. Nature 433:481–487

Harmer SL, Hogenesch JB, Straume M, Chang HS, Han B, Zhu T, Wang X, Kreps JA, Kay SA (2000) Orchestrated transcription of key pathways in Arabidopsis by the circadian clock. Science 290:2110–2113

Heyer LJ, Kruglyak S, Yooseph S (1999) Exploring expression data: identification and analysis of coexpressed genes. Genome Res 9:1106–1115

Hunter BG, Beatty MK, Singletary GW, Hamaker BR, Dilkes BP, Larkins BA, Jung R (2002) Maize opaque endosperm mutations create extensive changes in patterns of gene expression. Plant Cell 14:2591–2612

Ibrahim AF, Hedley PE, Cardle L, Kruger W, Marshall DF, Muehlbauer GJ, Waugh R (2005) A comparative analysis of transcript abundance using SAGE and Affymetrix arrays. Funct Integr Genomics 5:163–174

Johnson SC (1967) Hierarchical clustering schemes. Psychometrika 32:241–254

Leonhardt N, Kwak JM, Robert N, Waner D, Leonhardt G, Schroeder JI (2004) Microarray expression analyses of Arabidopsis guard cells and isolation of a recessive abscisic acid hypersensitive protein phosphatase 2C mutant. Plant Cell 16:596–615

Liljegren SJ, Ditta GS, Eshed Y, Savidge B, Bowman JL, Yanofsky MF (2000) SHATTERPROOF MADS-box genes control seed dispersal in Arabidopsis. Nature 404:766–770


Liu CM, Meinke DW (1998) The titan mutants of Arabidopsis are disrupted in mitosis and cell cycle control during seed development. Plant J 16:21–31

Liu CM, McElver J, Tzafrir I, Joosen R, Wittich P, Patton D, Van Lammeren AA, Meinke D (2002) Condensin and cohesin knockouts in Arabidopsis exhibit a titan seed phenotype. Plant J 29:405–415

Lynn K, Fernandez A, Aida M, Sedbrook J, Tasaka M, Masson P, Barton MK (1999) The PINHEAD/ZWILLE gene acts pleiotropically in Arabidopsis development and has overlapping functions with the ARGONAUTE1 gene. Development 126:469–481

Mena M, Ambrose BA, Meeley RB, Briggs SP, Yanofsky MF, Schmidt RJ (1996) Diversification of C-function activity in maize flower development. Science 274:1537–1540

Menges M, Hennig L, Gruissem W, Murray JA (2002) Cell cycle-regulated gene expression in Arabidopsis. J Biol Chem 277:41987–42002

Moseyko N, Zhu T, Chang HS, Wang X, Feldman LJ (2002) Transcription profiling of the early gravitropic response in Arabidopsis using high-density oligonucleotide probe microarrays. Plant Physiol 130:720–728

Parkinson H, Sarkans U, Shojatalab M, Abeygunawardena N, Contrino S, Coulson R, Farne A, Lara GG, Holloway E, Kapushesky M, Lilja P, Mukherjee G, Oezcimen A, Rayner T, Rocca-Serra P, Sharma A, Sansone S, Brazma A (2005) ArrayExpress - a public repository for microarray gene expression data at the EBI. Nucleic Acids Res 33:D553–D555 (Database issue)

Rossel JB, Wilson IW, Pogson BJ (2002) Global changes in gene expression in response to high light in Arabidopsis. Plant Physiol 130:1109–1120

Ruuska SA, Girke T, Benning C, Ohlrogge JB (2002) Contrapuntal networks of gene expression during Arabidopsis seed filling. Plant Cell 14:1191–1206

Schmid M, Uhlenhaut NH, Godard F, Demar M, Bressan R, Weigel D, Lohmann JU (2003) Dissection of floral induction pathways using global expression analysis. Development 130:6001–6012

Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Scholkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37(5):501–506

Schmitz J, Franzen R, Ngyuen TH, Garcia-Maroto F, Pozzi C, Salamini F, Rohde W (2000) Cloning, mapping and expression analysis of barley MADS-box genes. Plant Mol Biol 42:899–913

Shen L, Gong J, Caldo RA, Nettleton D, Cook D, Wise RP, Dickerson JA (2005) BarleyBase - an expression profiling database for plant genomics. Nucleic Acids Res 33:D614–D618 (Database issue)

Singer T, Yordan C, Martienssen RA (2001) Robertson’s mutator transposons in A. thaliana are regulated by the chromatin-remodeling gene Decrease in DNA Methylation (DDM1). Genes Dev 15:591–602

Springer PS, McCombie WR, Sundaresan V, Martienssen RA (1995) Gene trap tagging of PROLIFERA, an essential MCM2-3-5-like gene in Arabidopsis. Science 268:877–880

Sun Y, Dilkes BP, Zhang C, Dante RA, Carneiro NP, Lowe KS, Jung R, Gordon-Kamm WJ, Larkins BA (1999) Characterization of maize (Zea mays L.) Wee1 and its activity in developing endosperm. Proc Natl Acad Sci U S A 96:4180–4185

Tzafrir I, McElver JA, Liu CM, Yang LJ, Wu JQ, Martinez A, Patton DA, Meinke DW (2002) Diversity of TITAN functions in Arabidopsis seed development. Plant Physiol 128:38–51

Wang A, Xia Q, Xie W, Datla R, Selvaraj G (2003) The classical Ubisch bodies carry a sporophytically produced structural protein (RAFTIN) that is essential for pollen development. Proc Natl Acad Sci U S A 100:14487–14492

Ware DH, Jaiswal P, Ni J, Yap IV, Pan X, Clark KY, Teytelman L, Schmidt SC, Zhao W, Chang K, Cartinhour S, Stein LD, McCouch SR (2002) Gramene, a tool for grass genomics. Plant Physiol 130:1606–1613

Wellmer F, Riechmann JL, Alves-Ferreira M, Meyerowitz EM (2004) Genome-wide analysis of spatial gene expression in Arabidopsis flowers. Plant Cell 16:1314–1326

Wittkopp PJ, Haerum BK, Clark AG (2004) Evolutionary changes in cis and trans gene regulation. Nature 430:85–88

Yanofsky MF, Ma H, Bowman JL, Drews GN, Feldmann KA, Meyerowitz EM (1990) The protein encoded by the Arabidopsis homeotic gene agamous resembles transcription factors. Nature 346:35–39

Zhu T, Budworth P, Chen W, Provart NJ, Chang HS, Guimil S, Wenpei Su, Estes B, Zou G, Wang X (2003) Transcriptional control of nutrient partitioning during rice grain filling. Plant Biotechnol J 1:59–70
