Bioinformatics
CORNET: A User-Friendly Tool for Data Mining and Integration1[W]
Stefanie De Bodt*, Diana Carvajal, Jens Hollunder, Joost Van den Cruyce, Sara Movahedi, and Dirk Inzé
Department of Plant Systems Biology, Flanders Institute for Biotechnology, and Department of Plant Biotechnology and Genetics, Ghent University, 9052 Ghent, Belgium
As an overwhelming amount of functional genomics data have been generated, the retrieval, integration, and interpretation of these data need to be facilitated to enable the advance of (systems) biological research. For example, gathering and processing microarray data that are related to a particular biological process is not straightforward, nor is the compilation of protein-protein interactions that were identified through diverse approaches and are spread over numerous partially overlapping databases. However, these tasks are essential to address the following questions. Does a group of differentially expressed genes show similar expression in diverse microarray experiments? Was an identified protein-protein interaction previously detected by other approaches? Are the interacting proteins encoded by genes with similar expression profiles and localization? We developed CORNET (for CORrelation NETworks) as an access point to transcriptome, protein interactome, and localization data and functional information on Arabidopsis (Arabidopsis thaliana). It consists of two flexible and versatile tools, namely the coexpression tool and the protein-protein interaction tool. The ability to browse and search microarray experiments using ontology terms and the incorporation of personal microarray data are distinctive features of the microarray repository. The coexpression tool enables either the alternate or simultaneous use of diverse expression compendia, whereas the protein-protein interaction tool searches experimentally and computationally identified protein-protein interactions. Different search options are implemented to enable the construction of coexpression and/or protein-protein interaction networks centered around multiple input genes or proteins. Moreover, networks and associated evidence are visualized in Cytoscape. Localization is visualized in pie charts, thereby allowing multiple localizations per protein. CORNET is available at http://bioinformatics.psb.ugent.be/cornet.
A large number of experiments have been performed to unravel molecular mechanisms underlying diverse biological processes active in the model plant Arabidopsis (Arabidopsis thaliana). For instance, in microarray experiments, diverse tissues from wild-type plants as well as mutant or transgenic plants are sampled at different developmental stages and treated with numerous compounds. Although enormous amounts of data have been generated, it remains a hurdle to sift through the heterogeneous information to find data relevant to a particular biological question. Data resulting from profiling studies are stored in different formats in various databases. Microarray data and, in particular, corresponding meta-data (e.g. sampled tissue, time point, treatment) are stored in an unstructured manner, which complicates data retrieval and interpretation. Also, proteomics data, such as protein-protein interaction (PPI) data, are dispersed over several databases in somewhat different formats. Although some efforts have been made, such as setting up MIAME (for Minimum Information About a Microarray Experiment) and MIAPE (for Minimum Information About a Proteomics Experiment), these systems are not (yet) generally employed (Taylor et al., 2007; Brazma, 2009). Using transcript profiling data, one can investigate how genes are expressed, when genes are active and/or differentially expressed, and which other genes show similar expression profiles. Integration of microarray data with PPI data can, for instance, lead to the identification of protein complexes and/or coregulated genes, a better understanding of a group of differentially expressed genes, and the prediction of putative functions for unknown genes (Brown et al., 2005; Gachon et al., 2005; Lisso et al., 2005; Rautengarten et al., 2005; Usadel et al., 2009).
Databases and tools such as ACT (Manfield et al., 2006), AtCOECiS (Vandepoele et al., 2009), ATTED-II (Obayashi et al., 2007, 2009), Bio-Array Resource (BAR; Toufighi et al., 2005), CressExpress (Srinivasasainagendra et al., 2008), CSB.DB (Steinhauser et al., 2004), GeneCAT (Mutwil et al., 2008), Genevestigator (Zimmermann et al., 2004), Plant Gene Expression Database (Horan et al., 2008), and PRIMe (Akiyama et al., 2008) have been developed with the aim to easily find similarity between genes based on their expression, and databases such as IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), MINT (Chatr-aryamontri et al., 2007), BIND (Bader et al., 2003), BAR Arabidopsis Interactions Viewer (Geisler-Lee et al., 2007), and AtPID (Cui et al., 2008) have been built to easily retrieve PPIs. In the next paragraph, we describe the tools that have been developed to enable small-scale coexpression studies rather than the numerous large-scale coexpression studies often involving clustering. For a more detailed discussion on the coexpression tools as well as other databases and tools developed for hypothesis generation in plant biology, we point to two recent reviews (Brady and Provart, 2009; Usadel et al., 2009).
1 This work was supported by Ghent University (Bijzonder Onderzoeksfonds Methusalem project no. BOF08/01M00408), the Interuniversity Attraction Poles Programme (grant nos. IUAP VI/25 [BioMaGNet] and VI/33) initiated by the Belgian State Science Policy Office, the European Union 6th Framework Programme (AGRON-OMICS grant no. LSHG–CT–2006–037704), the Institute for the Promotion of Innovation by Science and Technology in Flanders-funded industrial collaboration project with BASF Cropscience, and the Research Foundation-Flanders (postdoctoral fellowship to S.D.B.).
* Corresponding author; e-mail [email protected]
The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Stefanie De Bodt ([email protected]).
[W] The online version of this article contains Web-only data.
www.plantphysiol.org/cgi/doi/10.1104/pp.109.147215
Plant Physiology®, March 2010, Vol. 152, pp. 1167–1179, www.plantphysiol.org © 2010 American Society of Plant Biologists
Current coexpression tools allow the visualization of gene expression profiles and/or the search for genes that are coexpressed with one or more genes of interest. To identify coexpression, these tools employ a measure, such as the Pearson correlation coefficient, a correlation rank (Obayashi et al., 2009), or linear regression (Srinivasasainagendra et al., 2008), followed by either applying an absolute cutoff or selecting the top x most correlated genes. The implementation of coexpression analysis is well advanced, with some tools that provide a flexible choice of input expression data sets. For instance, BAR Expression Angler allows the use of different types of expression data sets, among them the AtGenExpress compendia (Schmid et al., 2005; Toufighi et al., 2005; Kilian et al., 2007; Goda et al., 2008), whereas CressExpress allows the selection of microarray experiments based on tissue types (Srinivasasainagendra et al., 2008). Intuitively, the use of different expression data sets can yield different degrees of expression correlation between genes because some genes might behave similarly under certain conditions and differently under others. In other words, condition-dependent and condition-independent coexpression analyses have to be distinguished (Usadel et al., 2009). Therefore, a flexible and efficient compilation of the expression data sets used to calculate expression correlation needs to be enabled.
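The scoring step described above (a correlation measure followed by an absolute cutoff or a top x selection) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any of the cited tools: the gene names, the expression values, and the function names `pearson` and `coexpressed` are all hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles.
    Assumes neither profile is constant (non-zero variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def coexpressed(query, profiles, min_r=None, top_x=None):
    """Rank all other genes by correlation with `query`, then apply an
    absolute cutoff (min_r), a relative cutoff (top_x), or both."""
    scores = [(gene, pearson(profiles[query], values))
              for gene, values in profiles.items() if gene != query]
    scores.sort(key=lambda item: item[1], reverse=True)
    if min_r is not None:
        scores = [(g, r) for g, r in scores if r >= min_r]
    if top_x is not None:
        scores = scores[:top_x]
    return scores

# Toy expression values over five hypothetical conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.1, 6.0, 8.2, 10.0],  # rises with geneA
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of geneA
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],   # loosely related to geneA
}

strict_hits = coexpressed("geneA", profiles, min_r=0.9)  # absolute cutoff
top_two = coexpressed("geneA", profiles, top_x=2)        # relative cutoff
```

With these toy values the absolute cutoff of 0.9 keeps only geneB, whereas the top 2 selection also returns geneD, which shows how the two kinds of threshold can give different gene lists for the same query.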
Whereas coexpression analysis is widely implemented, only a few tools provide additional functionalities, such as retrieval of PPIs, functions, pathways, and cis-regulatory elements, and network visualization. The following tools have incorporated PPI data from one or more of the above-mentioned PPI databases. The output of the BAR Expression Angler displays Gene Ontology (GO) functional categories and PPI data from the BAR Arabidopsis Interactions Viewer (Toufighi et al., 2005; Geisler-Lee et al., 2007). ATTED-II provides PPIs, Kyoto Encyclopedia of Genes and Genomes pathway information, and cis-regulatory elements in addition to coexpression links (Obayashi et al., 2009). Virtual Plant provides a network analysis tool that compiles PPI data (BIND, interolog detection, and AtPID), microRNA:RNA associations, enzymatic reactions (both primary and secondary), and regulatory links based on binding site occurrence (Gutierrez et al., 2007).
To a large extent, the representation of the output determines the accessibility and interpretability of the results. The aforementioned tools came up with different solutions to represent coexpression and interaction data. In most tools, the output is in tabular format (such as in CressExpress [Srinivasasainagendra et al., 2008]). Although this format has many advantages for the advanced user who can import the results in other software tools, it does not allow immediate inspection of the results by less experienced users. With the BAR Expression Angler, the viewing and downloading of results are possible in both text and matrix formats (Toufighi et al., 2005), and with the DataMetaFormatter, functional classification of the coexpressed genes and PPIs are displayed on a clickable map of the matrix of coexpression data linking to other BAR tools. The BAR Arabidopsis Interactions Viewer allows the export of PPI networks to Cytoscape sif format. ATTED-II generates a network representation of the results (Obayashi et al., 2009). Although intuitively very comprehensive, the network views are static, ruling out visualization and exploration of large networks (Obayashi et al., 2007). In addition, network visualization is only possible in within-query gene searches. Only these small networks can be downloaded in tab-delimited, Pajek or Cytoscape sif formats (Shannon et al., 2003; de Nooy et al., 2005). In the latest version of ATTED-II, precalculated networks of particular genes can be viewed using the Google Maps API (Obayashi et al., 2009). PRIMe allows coexpression analysis of multiple genes, provides the results in network files that can be viewed in dedicated software, such as Pajek (de Nooy et al., 2005) or Biolayout (Goldovsky et al., 2005), and thus allows the exploration of larger networks (Akiyama et al., 2008). The network analysis tool of Virtual Plant visualizes the resulting networks in Cytoscape Web Start (Gutierrez et al., 2007).
Taking into account all these features, we developed a new user-friendly tool for data mining and integration, with the acronym CORNET (for CORrelation NETworks), that is accessible through http://bioinformatics.psb.ugent.be/cornet. We collected the majority of the currently available microarray expression data; corresponding meta-data describing sampled tissues, treatments, and time points of sampling; PPI data; localization data; and functional information in a central database. A user-friendly interface allows one to query the database, enabling coexpression analysis through a multitude of search options addressing diverse biological questions. Several predefined expression data sets, such as global compendia representing diverse experimental conditions as well as tissue- or treatment-specific expression data sets, are provided. In addition, the user can compile expression data sets from public as well as private microarray data or can upload personal processed expression data sets. Directed selection of microarray experiments is possible, as all meta-data are described with standardized ontology terms and stored in the database. Not only is it possible to calculate expression correlation based on one particular data set, but coexpression also can be assessed simultaneously among several expression data sets. PPI networks can be reconstructed with both experimentally identified and computationally predicted data. Moreover, coexpression and PPI networks can be integrated. CORNET generates a comprehensive visualization that provides a bird’s eye view of the results and the different degrees of reliability of the extracted information. The tool makes use of Cytoscape Web Start, which has the advantage that all functionalities of Cytoscape itself and numerous plug-ins can be exploited to further explore the constructed networks (Shannon et al., 2003).
RESULTS AND DISCUSSION
Primarily, CORNET is composed of two tools, namely the coexpression tool and the PPI tool, constructing coexpression and PPI networks, respectively (Fig. 1). Both tools can be used autonomously but can also be used consecutively to build a network of coexpression links as well as PPIs. Additionally, localization and functional information (GO terms and protein domain information) can be displayed on the constructed networks.
Annotation of Microarray Experiments
All expression data available at Gene Expression Omnibus (Barrett and Edgar, 2006) and resulting from experiments carried out on Affymetrix ATH1 arrays were incorporated into the CORNET database. As the meta-data of the microarray experiments available in public databases are very unstructured and hard to process automatically, and as information on growth conditions, treatments, sampled tissues, and genotypes is difficult to retrieve, we described the meta-data of the microarray experiments by manually assigning ontology terms. Existing ontologies were exploited to avoid confusion and redundancy. Plant Ontology (Plant Structures and Plant Growth and Developmental Stages; Bruskiewich et al., 2002; Pujar et al., 2006; Ilic et al., 2007; Avraham et al., 2008), Plant Environmental Conditions (www.gramene.org), and MGED Ontology (Whetzel et al., 2006) were used (Supplemental Fig. S1). The final aim of this ontology-based annotation was the automatic and comprehensive retrieval of microarray experiments, similar to the selection of microarray experiments in Genevestigator (Zimmermann et al., 2004). With these microarray data, we compiled different, so-called predefined expression compendia (see “Materials and Methods”). Compendium 1 is a set of microarray experiments covering diverse conditions but somewhat biased toward growth and development. Compendium 2 also contains diverse conditions, but biases toward particular design types (see “Materials and Methods”) are reduced as much as possible. In addition, highly redundant experiments are removed, resulting in a set of approximately 100 experiments. Finally, we compiled several specific expression compendia (abiotic stress, biotic stress, development, flower, genetic modification, hormone, leaf, root, seed, and abiotic plus biotic stress data sets) using the design types and ontology terms (see “Materials and Methods”). The user can temporarily upload personal raw expression data and annotate and incorporate the data into user-defined expression data sets (see “Materials and Methods”).
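As a rough illustration of why term-based annotation makes experiment retrieval automatic, the following Python sketch selects experiments by the terms attached to them. It is only a sketch under assumptions: the experiment identifiers, the term strings, and the function `select_experiments` are hypothetical, the terms are plain labels rather than real ontology identifiers, and term hierarchies are ignored.

```python
# Hypothetical annotation table: experiment id -> set of descriptive terms.
experiments = {
    "exp01": {"root", "salt stress", "wild type"},
    "exp02": {"leaf", "drought stress", "wild type"},
    "exp03": {"root", "auxin treatment", "mutant"},
    "exp04": {"seed", "wild type"},
}

def select_experiments(annotations, require_all=(), require_any=()):
    """Return ids of experiments that carry every term in require_all and,
    if require_any is non-empty, at least one of its terms."""
    selected = []
    for exp_id, terms in sorted(annotations.items()):
        if not set(require_all) <= terms:
            continue
        if require_any and not (set(require_any) & terms):
            continue
        selected.append(exp_id)
    return selected

# A tissue-specific data set and a treatment-specific data set.
root_set = select_experiments(experiments, require_all=["root"])
stress_set = select_experiments(
    experiments, require_any=["salt stress", "drought stress"])
```

Because the selection is expressed as a query over terms rather than as a hand-picked list, the same query can be rerun unchanged whenever new annotated experiments are added.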
Coexpression Tool
Using the coexpression tool, genes with similar expression profiles in a number of experimental conditions can be identified. When performing a coexpression analysis, first, one needs to decide on the input expression data that will be used. The user can either select the predefined expression compendia or compile user-defined data sets. In the latter option, the user is directed to the “Browse experiments” page, where a set of microarray experiments can be assembled using ontology terms that describe the meta-data (Supplemental Fig. S1). The ontology terms allow an easily reproducible and intuitive selection of the microarray experiments without going through each individual experiment. Users should keep in mind that user-defined expression data sets should be large enough to enable reliable calculation of the correlation coefficients (Usadel et al., 2009). In the next step, the coexpression tool page is displayed, where one or more genes can be introduced for coexpression analysis (step 1; Supplemental Fig. S1). Subsequently, one or more of the predefined, previously generated user-defined, or personal, preprocessed expression data sets needs to be selected (step 2). Below, more details are given on the importance of different expression compendia. Next, one can choose to calculate either Pearson or Spearman correlation coefficients (Spearman can only be chosen when one expression compendium is selected; see “Materials and Methods”). Then, thresholds to limit the number of results and search options, referring to different biological questions, can be chosen (Fig. 2). Either an absolute or a relative threshold or both can be chosen, namely a correlation coefficient threshold and/or a number of the most highly coexpressed genes (top x), respectively. In the case of multiple compendia, each compendium is treated separately in a first step, and subsequently, these results are combined, depending on the “all/at least” parameter. When coexpression in at least one compendium is chosen (at least = 1), all coexpression links found in the different compendia are reported (union). When coexpression needs to hold true for multiple expression compendia (at least > 1), the intersection of the individual results is reported. In addition, average, minimum, and maximum correlation coefficients over all expression compendia that meet the chosen thresholds are reported. The search options entail the calculation of expression correlation in a pairwise manner between given genes or a list of gene pairs uploaded as a tab-delimited file (“Pairwise correlations”), between one or more given genes and all genes in the genome [“Correlation of query gene(s) with neighbors”], and between genes that show coexpression with the given gene(s) (“Correlations between neighbors”; Fig. 2). These search options are more flexible and intuitive than the functionalities of most existing coexpression tools, which allow the search for coexpressed genes of one or a few query genes. Evidently, thresholds need to be chosen carefully when the two latter search options are selected because huge amounts of results can be generated. We advise starting off by choosing a relatively high correlation coefficient threshold (such as 0.9) or the top 10 genes and gradually lowering the stringency. Finally, one can integrate localization information and/or PPIs using the PPI tool as discussed below.
Figure 1. Functionalities of CORNET.
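The three search options of the coexpression tool differ only in which gene pairs are tested against the threshold. The Python sketch below makes that difference concrete. It assumes that correlations are already available in a lookup table; the function `build_network`, the mode names, and the gene labels are hypothetical and do not reflect how CORNET itself is implemented.

```python
from itertools import combinations

def build_network(queries, corr, threshold, mode):
    """Return a set of edges (frozensets of two genes) for one search option.

    corr maps frozenset({gene1, gene2}) -> correlation coefficient.
    mode is "pairwise", "neighbors", or "neighbors+between".
    """
    if mode not in ("pairwise", "neighbors", "neighbors+between"):
        raise ValueError(f"unknown mode: {mode}")

    def linked(a, b):
        return corr.get(frozenset((a, b)), 0.0) >= threshold

    edges = set()
    if mode == "pairwise":
        # Only pairs among the query genes themselves are tested.
        for a, b in combinations(sorted(queries), 2):
            if linked(a, b):
                edges.add(frozenset((a, b)))
        return edges

    # Each query gene is tested against every other gene in the data set.
    genes = set().union(*corr.keys())
    neighbors = set()
    for q in queries:
        for g in genes - {q}:
            if linked(q, g):
                edges.add(frozenset((q, g)))
                neighbors.add(g)
    if mode == "neighbors+between":
        # Additionally test all pairs among the retrieved neighbors.
        for a, b in combinations(sorted(neighbors - set(queries)), 2):
            if linked(a, b):
                edges.add(frozenset((a, b)))
    return edges

# Toy correlation table with hypothetical gene labels.
corr = {
    frozenset(("Q1", "Q2")): 0.95,
    frozenset(("Q1", "N1")): 0.92,
    frozenset(("Q1", "N2")): 0.91,
    frozenset(("N1", "N2")): 0.93,
    frozenset(("Q2", "N3")): 0.50,
}
pairwise = build_network(["Q1", "Q2"], corr, 0.9, "pairwise")
with_neighbors = build_network(["Q1"], corr, 0.9, "neighbors")
with_between = build_network(["Q1"], corr, 0.9, "neighbors+between")
```

Even in this toy table the neighbor-based options return more edges than the pairwise option, which is consistent with the advice to start from a stringent threshold when those options are selected.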
Selection of Multiple Microarray Expression Compendia
When comparing the expression profiles of two genes, the input microarray data are expected to influence the observations (Usadel et al., 2009). For instance, due to pleiotropic functions or the combinatorial nature of cis-regulation, some genes may coexpress under certain conditions but differ in expression under others and/or coexpress with other genes under these conditions. A global measure of coexpression, such as the Pearson correlation coefficient, uses all conditions in the input expression compendium. Thus, depending on the nature of the studied genes and the interest of the user, different input expression compendia can be imagined, some being subsets of each other. For instance, when looking for genes that are similar to a drought stress-responsive gene, an expression compendium representing abiotic stress conditions can be used to identify specific and relevant relations.
To investigate the variability of expression correlation over different expression compendia, several subsets of expression data, namely the abiotic stress, biotic stress, development, flower, genetic modification, hormone, leaf, root, seed, and abiotic plus biotic stress sets, as well as three global compendia, namely the AtGenExpress compendium, Compendium 1, and Compendium 2, were considered (Table I; see “Materials and Methods”) and are provided as predefined compendia in CORNET. We observe that expression correlation can vary with the input expression compendia independently from the chosen threshold (Fig. 3). Overall, few coexpression links held true when taking into account multiple compendia. Twenty-five percent to 30% of all gene pairs are correlated based on one expression compendium, while about 10% of the correlated gene pairs also show correlation in three other expression compendia, and this number gradually drops when more expression compendia are taken into account. Only 0.01% to 0.05% of the gene pairs show coexpression in all 14 compendia simultaneously (Fig. 3). The majority of the genes that show coexpression over all compendia encode ribosomal proteins or are involved in photosynthesis (Pearson correlation coefficient threshold of 0.9). The coexpressed genes have a high average expression level and are highly variable across conditions. Based on this study, we can conclude that, when performing coexpression analyses, expression correlation can vary significantly when using different expression conditions. Consequently, coexpression tools need to enable the estimation of coexpression over diverse expression compendia or in specific expression compendia. The identification of coexpression under specific conditions can be employed to seek genes with characteristics common to genes of interest (such as similar expression upon abiotic stress treatment). In contrast, coexpression analysis can be carried out with a collection of expression compendia, representing diverse conditions, and lead to the identification of those conditions in which the genes of interest show similar expression patterns. For genes with limited functional information, expression compendia delivering high correlations hint at possible functional activities of the unknown genes. Using CORNET, this can either be done using the predefined expression compendia or user-defined compendia individually or by considering several predefined data sets. In the latter, coexpression links found in all selected compendia or in at least x compendia are reported (see above). As such, coexpression in, for instance, root only, leaf only, or both root and leaf can be studied in one analysis.
Figure 2. Search options of coexpression and PPI tools. A, Correlations or interactions of query gene/protein with neighbors. B, Correlations or interactions of query gene/protein with neighbors plus correlations or interactions between neighbors. C, Pairwise correlations or interactions. D, Pairwise correlations or interactions plus correlations or interactions of query genes/proteins with neighbors. E, Pairwise correlations or interactions plus correlations or interactions of query genes/proteins with neighbors plus correlations between neighbors.
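Combining per-compendium results with an "at least x compendia" rule can be sketched as follows in Python. This is an illustration under assumptions rather than CORNET's code: the function `combine_compendia`, the compendium names, and the correlation values are hypothetical, and a link is kept when it is supported by at least `at_least` compendia, so the strict intersection corresponds to setting `at_least` to the number of selected compendia.

```python
def combine_compendia(links_per_compendium, at_least=1):
    """Combine per-compendium coexpression links.

    links_per_compendium maps compendium name -> {gene pair: r}, holding
    only links that already passed the per-compendium thresholds. Gene
    pairs are tuples with a consistent gene order. Returns a summary for
    every pair found in at least `at_least` compendia.
    """
    support = {}
    for name, links in links_per_compendium.items():
        for pair, r in links.items():
            support.setdefault(pair, {})[name] = r

    combined = {}
    for pair, by_name in support.items():
        if len(by_name) >= at_least:
            values = list(by_name.values())
            combined[pair] = {
                "compendia": sorted(by_name),
                "min": min(values),
                "max": max(values),
                "mean": sum(values) / len(values),
            }
    return combined

# Toy links from two hypothetical tissue-specific compendia.
links = {
    "root": {("g1", "g2"): 0.95, ("g1", "g3"): 0.91},
    "leaf": {("g1", "g2"): 0.93, ("g2", "g4"): 0.90},
}
union = combine_compendia(links, at_least=1)        # root only, leaf only, or both
in_both = combine_compendia(links, at_least=2)      # root and leaf
root_only = {pair for pair, info in union.items()
             if info["compendia"] == ["root"]}
```

Keeping the list of supporting compendia next to each link is what allows root-only, leaf-only, and shared links to be separated after a single analysis.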
PPI Tool
As for the coexpression tool, the PPI tool needs one or more proteins as input (step 1; Supplemental Fig. S1). Next, different PPI databases can be chosen (step 2; see “Materials and Methods”) to extract only experimentally identified PPIs, only computationally predicted interactions, or both. As in the coexpression tool, different search options can be selected, namely search for PPI in a pairwise manner (“Pairwise interactions”), search for proteins that interact with the given protein(s) [“Interactions of query protein(s) with neighbors”], and/or search if the proteins that interact with the given protein(s) also interact (“Interactions between neighbors”; step 3; Fig. 2). Finally, also here, integration of coexpression and localization information is possible.
Integration of Coexpression and PPI Networks
CORNET allows the integration of coexpression and PPI networks, which can be approached in two ways. One can start with the coexpression tool, identifying genes that coexpress with each other, and subsequently test if the corresponding proteins interact and if these proteins interact with other proteins or vice versa. One needs to keep in mind that the order of the analyses affects the results when choosing the option “Correlations/interactions with neighbors.” For instance, when a coexpression analysis is followed by a PPI search, all coexpressed genes of the first analysis are used as input for the PPI search and genes that do not show coexpression with other genes are not included as input for the PPI tool. Conversely, when first performing a PPI search, only proteins for which interactions have been found will be used as input for the subsequent coexpression analysis. Depending on the question to be addressed, one or both approaches can be opted for.
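The effect of the analysis order can be made concrete with a small Python sketch. It is a toy model under assumptions: the edge sets, the node labels, and the helpers `neighbors_of` and `chained` are hypothetical, and each step simply keeps the edges that touch the current input nodes, passing on only the nodes for which a link was found.

```python
def neighbors_of(seeds, edges):
    """Return (edges touching a seed, all nodes on those edges)."""
    hit = {e for e in edges if seeds & set(e)}
    nodes = set().union(*hit) if hit else set()
    return hit, nodes

def chained(queries, first, second):
    """Run a neighbor search on `first`, then feed only the nodes that
    received a link into a neighbor search on `second`."""
    first_edges, nodes = neighbors_of(set(queries), first)
    second_edges, _ = neighbors_of(nodes, second)
    return first_edges, second_edges

# Hypothetical toy networks.
coexpression = {("A", "B"), ("A", "C")}
ppi = {("A", "D"), ("B", "E"), ("D", "F")}

coexp_first = chained(["A"], coexpression, ppi)
ppi_first = chained(["A"], ppi, coexpression)
```

Starting from coexpression, gene B enters the PPI search and brings in the B-E interaction; starting from PPIs, B is never used as input for an interaction search, so that interaction is absent from the final network.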
Figure 3. Variability of expression correlation. Only a few genes are coexpressed based on multiple expression compendia. Expression correlation is defined by a Pearson correlation threshold of 0.9 (blue), 0.8 (red), top 1% most correlated gene pairs (green), and top 10% most correlated gene pairs (purple).
Table I. Predefined expression compendia
Expression Compendium | Description | No. of Experiments
AtGenExpress | All experiments performed by AtGenExpress | 425
Compendium 1 | Collection of microarray experiments oriented toward growth, development, and cell cycle studies | 454
Compendium 2 | Collection of microarray experiments; very similar experiments were removed; similar numbers of experiments of each design type | 111
Abiotic stress | Abiotic stress series (cold, drought, genotoxic, heat, osmotic, oxidative, salt, UV-B, wounding) | 256
Biotic stress | Biotic stress series (Botrytis, Pseudomonas, Phytophthora, etc.) | 69
Abiotic plus biotic stress | Combination of the abiotic and biotic stress data sets | 336
Hormone treatment | Hormone treatment series (abscisic acid, brassinosteroids, GA, cytokinin, etc., and inhibitors) | 140
Development | Developmental series (different tissues, developmental stages, developmental mutants) | 135
Flower | Microarray experiments in which floral tissues are sampled | 72
Leaf | Microarray experiments in which leaf tissues are sampled | 212
Root | Microarray experiments in which root tissues are sampled | 258
Seed | Microarray experiments in which seed tissues are sampled | 83
Whole plant | Microarray experiments in which whole plants are sampled | 85
Genetic modification | Microarray experiments in which transgenic lines are profiled (gene overexpression [knockin], gene knockout, transient transgene expression) | 313
When integrating coexpression and PPI data, the degree of coexpression of genes encoding interacting proteins can be studied using CORNET. A global study of all experimentally identified PPIs showed a relatively low concordance between coexpression and interactions (see Fig. 2 in De Bodt et al., 2009). This observation can be confirmed using the different expression compendia available in CORNET. The mean correlation coefficient for experimentally identified PPIs ranges between 0.13 and 0.21. As previously mentioned in the literature, this low degree of expression similarity is probably due to the transient nature of PPIs.
Visualization of Coexpression and PPI Networks
For network visualization, the existing software Cytoscape was chosen (Shannon et al., 2003; Fig. 1) because its functionalities allow browsing and zooming into the constructed networks, a visual as well as textual representation of diverse attributes (e.g. correlation coefficient, localization databases; see Frequently Asked Questions [FAQ] page and VizMapper in Cytoscape) and further exploration and analysis of the networks. The degree of expression correlation is represented by the color of the edges (or correlations) in the coexpression network, where, for instance, blue edges correspond to highly correlated expression profiles. In case multiple expression compendia were selected in the coexpression tool, the minimum, maximum, or average Pearson correlation coefficient (as requested by the user) is displayed. However, all retrieved Pearson correlation coefficients are reported in the text output and as composite attributes (data set coefficients) in Cytoscape. PPIs are depicted by black edges, whereas their reliability can be assessed through the width (number of data sources) and the style (detection method) of the edges (or interactions) in the PPI network. Experimentally identified interactions are represented by solid lines, while computationally predicted interactions are represented by dashed lines. The shape of the nodes (or genes/proteins) depicts the nature of the gene/protein as query or as neighbor in either the coexpression or PPI tool. All attributes can be displayed in the lower data panel of Cytoscape by clicking on the Select Attributes button (see FAQ) and copied and/or exported in tabular format at any time.
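The mapping from edge attributes to edge appearance described above can be written down as a small Python sketch. It does not call any Cytoscape API: the record layout, the function names `edge_style` and `to_sif`, the node and database labels, and the 0.9 boundary between blue and red edges are all assumptions made for illustration. The only external convention used is the simple interaction format (sif), in which each line lists a source node, an interaction type, and a target node.

```python
def edge_style(edge):
    """Map one edge record to illustrative visual attributes."""
    if edge["type"] == "coexpression":
        # The colour boundary is an assumption; the text only states that
        # blue marks highly correlated expression profiles.
        color = "blue" if edge["r"] >= 0.9 else "red"
        return {"color": color, "width": 1, "line": "solid"}
    # PPI edges: black, width = number of supporting data sources,
    # solid for experimental and dashed for predicted interactions.
    line = "solid" if edge["method"] == "experimental" else "dashed"
    return {"color": "black", "width": len(edge["sources"]), "line": line}

def to_sif(edges):
    """Serialise edges as tab-separated source, type, target lines."""
    return "\n".join(
        f'{e["source"]}\t{e["type"]}\t{e["target"]}' for e in edges)

# Hypothetical edge records with placeholder node and database names.
edges = [
    {"source": "P1", "target": "P2", "type": "ppi",
     "method": "experimental", "sources": ["dbA", "dbB"]},
    {"source": "P1", "target": "P3", "type": "ppi",
     "method": "predicted", "sources": ["dbC"]},
    {"source": "P2", "target": "P3", "type": "coexpression", "r": 0.95},
]
styles = [edge_style(e) for e in edges]
sif_text = to_sif(edges)
```

Encoding reliability in width and line style, as in this sketch, keeps colour free to carry the correlation strength of coexpression edges.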
Case Study: DELLA Network
To demonstrate the functionalities of CORNET and, particularly, the use of multiple expression compendia in the coexpression tool, we have investigated the molecular context of DELLA proteins. The Arabidopsis genome encodes five DELLA proteins (REPRESSOR OF GA [RGA], GIBBERELLIC ACID INSENSITIVE [GAI], RGA-LIKE1 [RGL1], RGL2, and RGL3) that act as negative regulators of the GA signaling pathway. DELLA proteins bind to the GIBBERELLIN INSENSITIVE DWARF1 (GID1) receptor in the presence of GA. The GA-GID1-DELLA complex is then targeted by an SCF-E3 ligase (SLEEPY), resulting in ubiquitinylation and degradation of DELLA by the 26S proteasome, thereby relieving DELLA-mediated repression of GA responses. DELLAs are involved in diverse processes such as flower development, seed germination, leaf growth, and abiotic stress response (Tyler et al., 2004; Achard et al., 2006; Schwechheimer, 2008; Achard and Genschik, 2009). Although the functional differences between the DELLAs are not completely clear, RGA and GAI have been shown to repress stem elongation (Dill and Sun, 2001; King et al., 2001), RGL2 inhibits seed germination (Lee et al., 2002), and RGA, RGL1, and RGL2 together regulate floral development (Cheng et al., 2004; Tyler et al., 2004; Yu et al., 2004). Whereas RGA and GAI are highly expressed in most tissues, RGL1, RGL2, and RGL3 are mainly expressed in germinating seeds, young seedlings, and flowers, indicating that these signaling molecules might be transcriptionally regulated (Tyler et al., 2004). Thus, DELLA proteins are highly pleiotropic and serve as an ideal test case.
Different functions of the DELLA proteins surface when employing particular expression compendia (Fig. 4). First, we have generated a network using Compendium 2, which represents diverse conditions and gives a global estimate of coexpression. Subsequently, experimentally identified PPIs were added between the resulting proteins and all other proteins (neighbors) as well as between those other interacting proteins (neighbors). Two main groups of highly connected genes could be delineated in this network (Fig. 4A). The degree of correlation between the genes of both groups clearly differed (blue edges [group I] versus red edges [group II]). Group I consists of RGL2 and RGL3 and several LEA (for late embryogenesis abundant) genes with roles in seed development, while group II contains GAI, RGA, and RGL1, some hormone-related genes such as BRASSINOSTEROID INSENSITIVE1 and AFB5 (auxin F-box), and CLIP-associating protein-related genes involved in mitosis, growth, and protein stability, as well as many other genes. Both groups are connected to each other through PPI links between the DELLAs and the GID receptors, SLEEPY1, PHYTOCHROME B, PHYTOCHROME INTERACTING FACTOR3, and PIF4, interactions that have been identified by various experimental approaches (Supplemental Files S1, “Features” attribute in Cytoscape; Dill et al., 2004; Fu et al., 2004; Griffiths et al., 2006; Nakajima et al., 2006; Achard et al., 2007; Ariizumi et al., 2008; de Lucas et al., 2008; Feng et al., 2008). Next, we investigated how this network varied with different, more specific expression compendia, namely the abiotic stress and hormone treatment compendia on the one hand (Fig. 4B) and the leaf and seed compendia on the other hand (Fig. 4C; see “Materials and Methods”), again followed by a search for experimentally identified PPIs. Strikingly, a high number of new coexpression links and coexpressed genes were identified that had not been found with Compendium 2 (Fig. 4, B and C; Supplemental Tables S1 and S2). Nevertheless, the two groups of genes and especially the degree of expression correlation between the respective genes were still demarcated, although some coexpression links between the two groups appeared. When examining the expression compendia that supported the different coexpression links, an additional distinction between group I and group II could be noticed. Group I links were mainly retrieved using the hormone treatment and the seed compendia, while some group II links were found using the abiotic stress and leaf compendia (Fig. 4, B and C). In other words, genes in each group had similar expression patterns in particular tissues and conditions and probably had associated functions primarily in those tissues and conditions. Through CORNET, the expression profiles of these genes for the different expression compendia could be explored to further pinpoint the possible similarities and functions of the two groups. Group I genes showed similar and high expression in the imbibition stage (the first stage of seed germination) as well as in later stages of embryo development. In addition, these genes were highly expressed in gibberellic acid and abscisic acid experiments, in which seeds or embryos were sampled, as well as in methyl jasmonate and cytokinin experiments, with sampling of stamens and shoots, respectively.
Figure 4. DELLA networks generated by CORNET using multiple compendia. The DELLA network based on Compendium 2 (A) shows two groups of coexpressed genes where group I is highly coexpressed (blue edges) and group II shows a lower degree of coexpression (red edges). The networks based on Compendium 2 plus abiotic stress plus hormone treatment compendia (B) and Compendium 2 plus leaf plus seed compendia (C) reveal new coexpression links and new coexpressed genes. Red edges represent coexpression links found in the hormone (B) and seed (C) compendia, and green edges represent expression links found in the abiotic stress (B) and leaf (C) compendia. Edges starting and ending at the same node represent homodimerization of proteins.
De Bodt et al.
1174 Plant Physiol. Vol. 152, 2010
In line with this, Cao et al. (2006), investigating transcript profiling results from ga1-3 and ga1-3 gai-t6 rga-t2 rgl1-1 rgl2-1 seeds and flowers, observed that the set of genes presumably regulated by DELLAs for seed germination is largely distinct from the set of DELLA-regulated genes involved in floral development. This observation suggests that GA-mediated seed germination and floral development are under the control of distinct DELLA-dependent transcriptomes. Our analysis suggests a similar distinction between the sets of genes involved in processes ongoing in seed and leaf and between sets of genes involved in hormone and abiotic stress-related processes.
CONCLUSION
We have developed CORNET, a tool for the construction of coexpression and PPI networks and their functional annotation in Arabidopsis (bioinformatics.psb.ugent.be/cornet). With this tool, we aim at providing biologists with the means to investigate the associations between genes and between encoded proteins. Thereby, we provide the ability to better understand the functional context of a gene, leading to function prediction of unknown genes or prediction of (indirect) regulatory interactions between known genes and proteins.
In the case study of DELLA proteins, coexpression and PPI networks demonstrate the importance of careful coexpression analysis. The correct selection of both expression compendia and adequate search options and thresholds can greatly enhance the power of coexpression tools to uncover new associations between genes and between genes and processes/microarray conditions. Moreover, the comprehensive visualization, as illustrated by Figure 4, allows a bird's-eye view of the constructed networks, instantly revealing those associations without any requirement for bioinformatics skills, such as scripting or complicated database queries.

Table III. Number of microarray experiments in each design type

Design Type                                  No. of Experiments
Stimulus_or_stress_design                    13
Abiotic_stress_design                        255
Biotic_stress_design                         322
Compound_treatment_design                    23
Hormone_treatment_design                     140
Genetic_modification_design                  314
Growth_condition_design                      69
Time_series_design                           357
Circadian_rhythm_design                      15
Development_or_differentiation_design        134
Individual_genetic_characteristics_design    3
Organism_part_comparison_design              121
Strain_or_line_design                        22
Translational_bias_design                    2

Table II. Statistics on the CORNET database

Data                                           No.
Microarray
  Arrays                                       3,055
  Experiments                                  1,209
  Series                                       200
  Experiments with two replicates              634
  Experiments with more than two replicates    575
PPIs
  Experimentally identified interactions       4,302
  Computationally predicted interactions       89,181
  Total interactions                           93,109
Localization
  Experimentally identified localizations      14,135
  Computationally predicted localizations      11,310
  Total localizations                          18,824
In the future, we plan to add different data types as they become available. First, we foresee the incorporation of cis-regulatory elements, which can be tightly linked with the coexpression results, as implemented in AtCOECiS (Vandepoele et al., 2009). Second, through comparative genomics approaches, constructed networks can be transferred to other plant species, such as poplar (Populus species), tomato (Solanum lycopersicum), rice (Oryza sativa), and crops of interest. Last, the inclusion of protein-DNA (or transcription factor-target) interactions will be very valuable, as such data are generated through chromatin immunoprecipitation (ChIP)-chip and ChIP-seq or indirectly through the analysis of transcriptome data.
MATERIALS AND METHODS
CORNET Troubleshooting
CORNET can be accessed through the following URL: http://bioinformatics.psb.ugent.be/cornet. The tool is fully functional in Firefox and Safari browsers. First-time users might need to accept a security certificate before accessing the Web site. The site is ideally viewed at 1,280 × 1,024 resolution. You need to allow pop-ups in your browser before clicking the “go” button. After calculations and database queries, Cytoscape will start automatically from the Web. In other words, Cytoscape does not have to be installed on your computer. However, to enable the Cytoscape WebStart, an up-to-date version of Java is required. More details can be found on our FAQ page.
Construction of the Microarray Database
All expression data available at Gene Expression Omnibus (Barrett and Edgar, 2006) and resulting from experiments with Affymetrix ATH1 arrays were uploaded in the CORNET database (Table II). Only experiments where two or more replicates were performed are included. Meta-data were described using ontology terms (for more details, see “Results and Discussion”). However, in some instances, these ontologies did not suffice to describe the microarray experiments. Existing ontologies were extended as necessary to allow a more detailed description of tissues, transgenic lines, and experimental designs (indicated by “EXT”). The experimental designs proved very valuable in the compilation of specific expression compendia (e.g. development_or_differentiation_design, genetic_modification_design, compound_treatment_design, abiotic_stress_design, biotic_stress_design, time_series_design, hormone_treatment_design; see below; Table III).
Microarray Experiment Browsing and Retrieval
Microarray experiments and their meta-data can be browsed through CORNET (Browse experiments). Using a tree-based representation, the ontology terms can be browsed and selected in order to compile specific sets of microarray experiments. For each microarray experiment, the different replicates and the link between control and treatment arrays are shown. On the one hand, raw data together with their annotation can be downloaded. On the other hand, processed data can be downloaded or used as input for the coexpression tool. The microarray data are processed with the Robust Multiarray Average procedure implemented in BioConductor (Irizarry et al., 2003a, 2003b; Gautier et al., 2004; Gentleman et al., 2004) using an alternative Chip Description File that takes into account possible cross-hybridization (tinesath1cdf; Casneuf et al., 2007).
Uploading Personal Expression Data
CORNET allows the incorporation of personal expression data (“Upload”). Personal data can be uploaded temporarily, processed with Robust Multiarray Average, and downloaded for later use. In addition, the uploaded data can be combined with public expression data in user-defined data sets and subsequently downloaded for later use and/or used as input for the coexpression tool. The submitter can access these personal data for 24 h after submission. The personal data are linked to the session identifier of the browser.
Predefined Expression Compendia
The coexpression tool of CORNET makes use of the following predefined expression compendia. First, an expression data set (“AtGenExpress”) including the publicly available AtGenExpress compendia (such as abiotic stress, pathogen, development, and hormones) is provided (Schmid et al., 2005; Kilian et al., 2007; Goda et al., 2008). Second, we have compiled two different microarray compendia, which cover diverse conditions and consequently can be used to estimate an overall degree of coexpression taking into account all conditions. The first compendium (Compendium 1) covers diverse conditions that mainly deal with plant development and growth. The second compendium (Compendium 2) was built to reduce the bias toward particular conditions as much as possible. Moreover, the redundant information that microarray experiments can generate is assessed by calculating the correlation between experiments rather than between genes. A cutoff of 0.99 is used to identify redundant experiments. From each group of redundant experiments, one or more experiments are selected, taking the experiments with the highest number of replicates and the noncontrol experiments. Subsequently, an equal number of experiments of each type of condition (according to the assigned “design” terms) is selected. In addition to these “global” compendia, we have compiled specific expression data sets using the design and ontology terms mentioned above. As such, expression compendia were generated that are specific to certain tissues (leaf, root, flower, seed, and whole plant) or certain treatments (abiotic stress, biotic stress, and hormone treatment), compendia of experiments in which development is perturbed, or in which genes are modified (overexpression, knockout, or silencing lines; Tables I and III). Expression profiles for particular genes can be viewed in automatically generated line graphs.
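The redundancy filter described for Compendium 2 can be illustrated with a minimal Python sketch. It is not the CORNET implementation: the function names, the grouping of experiments into connected components of the "correlation above 0.99" graph, and the choice of exactly one representative per group (the one with the most replicates) are assumptions made for illustration. The text states only that experiment-to-experiment correlations with a cutoff of 0.99 are used and that experiments with the highest number of replicates are preferred.

```python
import numpy as np


def redundant_groups(expr, cutoff=0.99):
    """Group experiments (columns of expr, genes in rows) that are redundant.

    Two experiments are linked when their Pearson correlation exceeds the
    cutoff; groups are the connected components of these links (assumption).
    """
    corr = np.corrcoef(expr, rowvar=False)  # experiment x experiment matrix
    n = corr.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if corr[i, j] > cutoff:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def pick_representatives(groups, n_replicates):
    """Keep, per group, the experiment with the most replicates.

    Ties are broken in favor of the lowest experiment index (assumption).
    """
    return [max(g, key=lambda i: (n_replicates[i], -i)) for g in groups]
```

For example, with three experiments of which the first two are perfectly correlated, `redundant_groups` returns two groups, and `pick_representatives` keeps the better-replicated member of the redundant pair.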
Coexpression Tool
To quantify the similarity in expression profiles, the commonly used Pearson and Spearman correlation coefficients can be calculated. Pearson is a parametric method based on actual expression values, while Spearman is a nonparametric method based on ranks. Both measures range from −1 (anticorrelation) through 0 (no correlation) to 1 (correlation). When only one expression compendium is chosen for coexpression analysis, correlation coefficients are calculated in real time. However, when more than one expression compendium is chosen, it is no longer feasible to do the calculations in real time. Instead, the correlation coefficients are extracted from the database containing precalculated Pearson correlation coefficients higher than 0.6 or lower than −0.6 (this limitation is due to space and time constraints). Arabidopsis Genome Initiative codes are used to describe the genes. Only genes represented on the Affymetrix ATH1 array are taken into account in the coexpression analysis (in contrast to the PPI tool, where all proteins are taken into account). In total, 20,777 genes are accounted for using an alternative Chip Description File that takes into account possible cross-hybridization (tinesath1cdf; Casneuf et al., 2007). There is no limit to the number of genes that can be given as query.
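The difference between the two correlation measures, and the effect of the 0.6 threshold, can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the gene labels used in the usage note are hypothetical, and the code is not taken from CORNET. Spearman is computed here as the Pearson correlation of average ranks, which is the standard definition.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation of two expression profiles (actual values)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    r = np.empty(len(values))
    r[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tie = values == v
        r[tie] = r[tie].mean()
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def strong_links(profiles, threshold=0.6):
    """Gene pairs whose Pearson r is above threshold or below -threshold.

    profiles maps a gene identifier to its expression profile; all profiles
    must cover the same experiments in the same order.
    """
    genes = sorted(profiles)
    links = []
    for i, a in enumerate(genes):
        for b in genes[i + 1:]:
            r = pearson(profiles[a], profiles[b])
            if abs(r) > threshold:
                links.append((a, b, r))
    return links
```

A monotonic but nonlinear relationship, such as `[1, 2, 3, 4, 5]` against `[1, 4, 9, 16, 25]`, gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1. Calling `strong_links` on a small dictionary of hypothetical profiles keeps only the pairs that pass the threshold in either direction.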
PPI Tool
We have assembled currently available experimentally identified PPIs for Arabidopsis (Arabidopsis thaliana) from BIND (Bader et al., 2003), IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), MINT (Chatr-aryamontri et al., 2007), and The Arabidopsis Information Resource (Rhee et al., 2003), the predicted PPIs from BAR (Geisler-Lee et al., 2007) and AtPID (Cui et al., 2008), and the filtered (high-stringency) and predicted (low-stringency) interactions identified in our own study (De Bodt et al., 2009; Table II). To be able to grasp the reliability of the PPIs, we distinguish between experimental and predicted PPIs and indicate the different data sources (database, experiment type, evidence code, and PubMed identifier) as edge attributes in Cytoscape (see “Results and Discussion”; Shannon et al., 2003). Where possible, gene names are mapped on the nodes of the network. Alternatively, gene descriptions are shown. Gene names and full gene descriptions downloaded from The Arabidopsis Information Resource are stored as “description” and “descriptionLong” attributes in Cytoscape.
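Combining partially overlapping PPI sources while keeping track of the evidence can be sketched as follows. This is a hypothetical illustration, not the CORNET code: the record layout, the attribute names "sources" and "evidence", and the rule that an interaction counts as experimental as soon as one source reports it experimentally are all assumptions.

```python
from collections import defaultdict


def merge_ppis(records):
    """Merge PPI records from several sources into undirected edges.

    records: iterable of (protein_a, protein_b, source, kind) tuples, where
    kind is "experimental" or "predicted". A-B and B-A are the same edge,
    and A-A (homodimerization) is allowed.
    """
    collected = defaultdict(lambda: {"sources": set(), "kinds": set()})
    for a, b, source, kind in records:
        key = tuple(sorted((a, b)))  # order-independent edge key
        collected[key]["sources"].add(source)
        collected[key]["kinds"].add(kind)

    merged = {}
    for key, attrs in collected.items():
        merged[key] = {
            "sources": sorted(attrs["sources"]),
            # Assumption: any experimental record outranks predictions.
            "evidence": "experimental"
            if "experimental" in attrs["kinds"]
            else "predicted",
        }
    return merged
```

Feeding in records such as `("P1", "P2", "dbA", "experimental")` and `("P2", "P1", "dbB", "predicted")`, with placeholder protein and database names, yields one edge that lists both sources and is labeled experimental.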
Gene Information
To allow easy interpretation of the constructed networks, we add localization data and functional information to the genes. The localization data are a collection of both experimental and predicted localizations retrieved from SUBA (Heazlewood et al., 2007), IPSort (Bannai et al., 2002), LocTree (Nair and Rost, 2005), MITOPRED (Guda et al., 2004), MitoProt (Claros, 1995), MultiLoc (Høglund et al., 2006), PeroxP (Emanuelsson et al., 2003), Predotar (Small et al., 2004), SubLoc (Chen et al., 2006), TargetP (Emanuelsson et al., 2007), and WoLF_PSORT (Horton et al., 2007; Table II). These localization data are depicted in pie charts, allowing multiple localizations for one gene. The fractions of the pie chart are based on the fraction of databases in which a particular localization was found. As for PPIs, sources for localization data are reported in the Cytoscape attributes. In addition to the localization data, we integrate InterPro protein domain information (Hunter et al., 2009) and GO Biological Process and GO Molecular Function data (Harris et al., 2004), which can be viewed as node attributes in Cytoscape (see “Results and Discussion”; Shannon et al., 2003).
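How pie-chart fractions can be derived from several localization sources is sketched below. The exact normalization used by CORNET is not specified in the text, so this sketch assumes that each slice is proportional to the number of sources reporting that localization and that slices are scaled to sum to 1. The function name and the source labels in the usage note are hypothetical.

```python
from collections import Counter


def pie_fractions(localizations_by_source):
    """Pie-chart slice sizes for one gene.

    localizations_by_source maps a source name to the localizations it
    reports for the gene. Each source counts at most once per localization.
    """
    counts = Counter()
    for locs in localizations_by_source.values():
        for loc in set(locs):
            counts[loc] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    # Assumption: slices are normalized so that they sum to 1.
    return {loc: n / total for loc, n in counts.items()}
```

With three placeholder sources, of which all three report "nucleus" and one also reports "cytosol", the gene is drawn with a three-quarter nucleus slice and a one-quarter cytosol slice.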
Supplemental Data
The following materials are available in the online version of this article.
Supplemental Figure S1. Screen shots of CORNET.
Supplemental Table S1. Group I genes in the DELLA network (Fig. 4A).
Supplemental Table S2. Group II genes in the DELLA network (Fig. 4B).
Supplemental Files S1. Cytoscape session files of the DELLA networks
(http://bioinformatics.psb.ugent.be/supplementary_data/stbod/cornet/).
ACKNOWLEDGMENTS
We thank Klaas Vandepoele and Fabio Fiorani for fruitful discussions, our colleagues for helpful comments on the tool, Stijn Vermeir, Thomas Van Parys, and Lieven Baeyens for technical assistance, and Martine De Cock for help in preparing the manuscript. We acknowledge three anonymous reviewers for helpful comments on the manuscript.
Received September 8, 2009; accepted January 4, 2010; published January 6, 2010.
LITERATURE CITED
Achard P, Cheng H, De Grauwe L, Decat J, Schoutteten H, Moritz T, Van
Der Straeten D, Peng J, Harberd NP (2006) Integration of plant
responses to environmentally activated phytohormonal signals. Science
311: 91–94
Achard P, Genschik P (2009) Releasing the brakes of plant growth: how
GAs shutdown DELLA proteins. J Exp Bot 60: 1085–1092
Achard P, Liao L, Jiang C, Desnos T, Bartlett J, Fu X, Harberd NP (2007)
DELLAs contribute to plant photomorphogenesis. Plant Physiol 143:
1163–1172
Akiyama K, Chikayama E, Yuasa H, Shimada Y, Tohge T, Shinozaki K,
Hirai MY, Sakurai T, Kikuchi J, Saito K (2008) PRIMe: a Web site that
assembles tools for metabolomics and transcriptomics. In Silico Biol 8:
339–345
Ariizumi T, Murase K, Sun TP, Steber CM (2008) Proteolysis-independent
downregulation of DELLA repression in Arabidopsis by the gibberellin
receptor GIBBERELLIN INSENSITIVE DWARF1. Plant Cell 20:
2447–2459
Avraham S, Tung CW, Ilic K, Jaiswal P, Kellogg EA, McCouch S, Pujar A,
Reiser L, Rhee SY, Sachs MM, et al (2008) The Plant Ontology Database:
a community resource for plant structure and developmental stages
controlled vocabulary and annotations. Nucleic Acids Res 36:
D449–D454
Bader GD, Betel D, Hogue CWV (2003) BIND: the Biomolecular Interac-
tion Network Database. Nucleic Acids Res 31: 248–250
Bannai H, Tamada Y, Maruyama O, Nakai K, Miyano S (2002) Extensive
feature detection of N-terminal protein sorting signals. Bioinformatics
18: 298–305
Barrett T, Edgar R (2006) Gene expression omnibus: microarray data
storage, submission, retrieval, and analysis. Methods Enzymol 411:
352–369
Brady SM, Provart NJ (2009) Web-queryable large-scale data sets for
hypothesis generation in plant biology. Plant Cell 21: 1034–1051
Brazma A (2009) Minimum Information About a Microarray Experiment
(MIAME): successes, failures, challenges. ScientificWorldJournal 9:
420–423
Brown DM, Zeef LAH, Ellis J, Goodacre R, Turner SR (2005) Identification
of novel genes in Arabidopsis involved in secondary cell wall formation
using expression profiling and reverse genetics. Plant Cell 17: 2281–2295
Bruskiewich R, Coe EH, Jaiswal P, McCouch S, Polacco M, Stein L,
Vincent L, Ware D (2002) The plant ontology consortium and plant
ontologies. Comp Funct Genomics 3: 137–142
Cao D, Cheng H, Wu W, Soo HM, Peng J (2006) Gibberellin mobilizes
distinct DELLA-dependent transcriptomes to regulate seed germination
and floral development in Arabidopsis. Plant Physiol 142: 509–525
Casneuf T, Van de Peer Y, Huber W (2007) In situ analysis of cross-
hybridisation on microarrays and the inference of expression correla-
tion. BMC Bioinformatics 8: 461
Chatr-aryamontri A, Ceol A, Montecchi Palazzi L, Nardelli G, Schneider
MV, Castagnoli L, Cesareni G (2007) MINT: the Molecular INTeraction
database. Nucleic Acids Res 35: D572–D574
Chen H, Huang N, Sun Z (2006) SubLoc: a server/client suite for protein
subcellular location based on SOAP. Bioinformatics 22: 376–377
Cheng H, Qin L, Lee S, Fu X, Richards DE, Cao D, Luo D, Harberd NP,
Peng J (2004) Gibberellin regulates Arabidopsis floral development via
suppression of DELLA protein function. Development 131: 1055–1064
Claros MG (1995) MitoProt, a Macintosh application for studying mito-
chondrial proteins. Comput Appl Biosci 11: 441–447
Cui J, Li P, Li G, Xu F, Zhao C, Li Y, Yang Z, Wang G, Yu Q, Li Y, et al (2008)
AtPID: Arabidopsis thaliana protein interactome database. An integra-
tive platform for plant systems biology. Nucleic Acids Res 36:
D999–D1008
De Bodt S, Proost S, Vandepoele K, Rouze P, Van de Peer Y (2009)
Predicting protein-protein interactions in Arabidopsis thaliana through
integration of orthology, gene ontology and co-expression. BMC
Genomics 10: 288
de Lucas M, Daviere JM, Rodrıguez-Falcon M, Pontin M, Iglesias-Pedraz
JM, Lorrain S, Fankhauser C, Blazquez MA, Titarenko E, Prat S (2008)
A molecular framework for light and gibberellin control of cell elonga-
tion. Nature 451: 480–484
de Nooy W, Mrvar A, Batagelj V (2005) Exploratory Social Network
Analysis with Pajek. Cambridge University Press, Cambridge, UK
Dill A, Sun TP (2001) Synergistic derepression of gibberellin signaling by
removing RGA and GAI function in Arabidopsis thaliana. Genetics 159:
777–785
Dill A, Thomas SG, Hu J, Steber CM, Sun TP (2004) The Arabidopsis F-box
protein SLEEPY1 targets gibberellin signaling repressors for gibberellin-
induced degradation. Plant Cell 16: 1392–1405
Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating
proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2:
953–971
Emanuelsson O, Elofsson A, von Heijne G, Cristobal S (2003) In silico
prediction of the peroxisomal proteome in fungi, plants and animals. J
Mol Biol 330: 443–456
Feng S, Martinez C, Gusmaroli G, Wang Y, Zhou J, Wang F, Chen L, Yu L,
Iglesias-Pedraz JM, Kircher S, et al (2008) Coordinated regulation of
Arabidopsis thaliana development by light and gibberellins. Nature
451: 475–479
Fu X, Richards DE, Fleck B, Xie D, Burton N, Harberd NP (2004) The
Arabidopsis mutant sleepy1gar2-1 protein promotes plant growth by
increasing the affinity of the SCFSLY1 E3 ubiquitin ligase for DELLA
protein substrates. Plant Cell 16: 1406–1418
Gachon CMM, Langlois-Meurinne M, Henry Y, Saindrenan P (2005)
Transcriptional co-regulation of secondary metabolism enzymes in
Arabidopsis: functional and evolutionary implications. Plant Mol Biol
58: 229–245
Gautier L, Cope L, Bolstad BM, Irizarry RA (2004) affy: analysis of
Affymetrix GeneChip data at the probe level. Bioinformatics 20: 307–315
Geisler-Lee J, O’Toole N, Ammar R, Provart NJ, Millar AH, Geisler M
(2007) A predicted interactome for Arabidopsis. Plant Physiol 145:
317–329
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S,
Ellis B, Gautier L, Ge Y, Gentry J, et al (2004) Bioconductor: open
software development for computational biology and bioinformatics.
Genome Biol 5: R80
Goda H, Sasaki E, Akiyama K, Maruyama-Nakashita A, Nakabayashi K,
Li W, Ogawa M, Yamauchi Y, Preston J, Aoki K, et al (2008) The
AtGenExpress hormone and chemical treatment data set: experimental
design, data evaluation, model data analysis and data access. Plant J 55:
526–542
Goldovsky L, Cases I, Enright AJ, Ouzounis CA (2005) BioLayout(Java):
versatile network visualisation of structural and functional relation-
ships. Appl Bioinformatics 4: 71–74
Griffiths J, Murase K, Rieu I, Zentella R, Zhang ZL, Powers SJ, Gong F,
Phillips AL, Hedden P, Sun TP, et al (2006) Genetic characterization
and functional analysis of the GID1 gibberellin receptors in Arabidopsis.
Plant Cell 18: 3399–3414
Guda C, Fahy E, Subramaniam S (2004) MITOPRED: a genome-scale
method for prediction of nucleus-encoded mitochondrial proteins.
Bioinformatics 20: 1785–1794
Gutierrez RA, Lejay LV, Dean A, Chiaromonte F, Shasha DE, Coruzzi GM
(2007) Qualitative network models and genome-wide expression data
define carbon/nitrogen-responsive molecular machines in Arabidopsis.
Genome Biol 8: R7
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck
K, Lewis S, Marshall B, Mungall C, et al (2004) The Gene Ontology
(GO) database and informatics resource. Nucleic Acids Res 32:
D258–D261
Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH (2007)
SUBA: the Arabidopsis Subcellular Database. Nucleic Acids Res 35:
D213–D218
Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S,
Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, et al
(2004) IntAct: an open source molecular interaction database. Nucleic
Acids Res 32: D452–D455
Høglund A, Donnes P, Blum T, Adolph HW, Kohlbacher O (2006)
MultiLoc: prediction of protein subcellular localization using N-terminal
targeting sequences, sequence motifs and amino acid composition. Bio-
informatics 22: 1158–1165
Horan K, Jang C, Bailey-Serres J, Mittler R, Shelton C, Harper JF, Zhu JK,
Cushman JC, Gollery M, Girke T (2008) Annotating genes of known
and unknown function by large-scale coexpression analysis. Plant
Physiol 147: 41–57
Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ,
Nakai K (2007) WoLF PSORT: protein localization predictor. Nucleic
Acids Res 35: W585–W587
Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork
P, Das U, Daugherty L, Duquenne L, et al (2009) InterPro: the integra-
tive protein signature database. Nucleic Acids Res 37: D211–D215
Ilic K, Kellogg EA, Jaiswal P, Zapata F, Stevens PF, Vincent LP, Avraham
S, Reiser L, Pujar A, Sachs MM, et al (2007) The plant structure
ontology, a unified vocabulary of anatomy and morphology of a flower-
ing plant. Plant Physiol 143: 587–599
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP (2003a)
Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res
31: e15
Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf
U, Speed TP (2003b) Exploration, normalization, and summaries of high
density oligonucleotide array probe level data. Biostatistics 4: 249–264
Kilian J, Whitehead D, Horak J, Wanke D, Weinl S, Batistic O, D’Angelo
C, Bornberg-Bauer E, Kudla J, Harter K (2007) The AtGenExpress
global stress expression data set: protocols, evaluation and model data
analysis of UV-B light, drought and cold stress responses. Plant J 50:
347–363
King KE, Moritz T, Harberd NP (2001) Gibberellins are not required for
normal stem growth in Arabidopsis thaliana in the absence of GAI and
RGA. Genetics 159: 767–776
Lee S, Cheng H, King KE, Wang W, He Y, Hussain A, Lo J, Harberd NP,
Peng J (2002) Gibberellin regulates Arabidopsis seed germination via
RGL2, a GAI/RGA-like gene whose expression is up-regulated follow-
ing imbibition. Genes Dev 16: 646–658
Lisso J, Steinhauser D, Altmann T, Kopka J, Mussig C (2005) Identifica-
tion of brassinosteroid-related genes by means of transcript co-response
analyses. Nucleic Acids Res 33: 2685–2696
Manfield IW, Jen CH, Pinney JW, Michalopoulos I, Bradford JR, Gilmartin
PM, Westhead DR (2006) Arabidopsis Co-expression Tool (ACT): Web
server tools for microarray-based gene expression analysis. Nucleic Acids
Res 34: W504–W509
Mutwil M, Øbro J, Willats WGT, Persson S (2008) GeneCAT: novel
Webtools that combine BLAST and co-expression analyses. Nucleic
Acids Res 36: W320–W326
Nair R, Rost B (2005) Mimicking cellular sorting improves prediction of
subcellular localization. J Mol Biol 348: 85–100
Nakajima M, Shimada A, Takashi Y, Kim YC, Park SH, Ueguchi-Tanaka
M, Suzuki H, Katoh E, Iuchi S, Kobayashi M, et al (2006) Identification
and characterization of Arabidopsis gibberellin receptors. Plant J 46:
880–889
Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II
provides coexpressed gene networks for Arabidopsis. Nucleic Acids
Res 37: D987–D991
Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, Saeki M,
Shibata D, Saito K, Ohta H (2007) ATTED-II: a database of co-expressed
genes and cis elements for identifying co-regulated gene groups in
Arabidopsis. Nucleic Acids Res 35: D863–D869
Pujar A, Jaiswal P, Kellogg EA, Ilic K, Vincent L, Avraham S, Stevens P,
Zapata F, Reiser L, Rhee SY, et al (2006) Whole-plant growth stage
ontology for angiosperms and its application in plant biology. Plant
Physiol 142: 414–428
Rautengarten C, Steinhauser D, Bussis D, Stintzi A, Schaller A, Kopka J,
Altmann T (2005) Inferring hypotheses on functional relationships of
genes: analysis of the Arabidopsis thaliana subtilase gene family. PLoS
Comput Biol 1: e40
Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-
Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The
Arabidopsis Information Resource (TAIR): a model organism database
providing a centralized, curated gateway to Arabidopsis biology, re-
search materials and community. Nucleic Acids Res 31: 224–228
Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004)
The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 32:
D449–D451
Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M,
Scholkopf B, Weigel D, Lohmann JU (2005) A gene expression map of
Arabidopsis thaliana development. Nat Genet 37: 501–506
Schwechheimer C (2008) Understanding gibberellic acid signaling: are we
there yet? Curr Opin Plant Biol 11: 9–15
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N,
Schwikowski B, Ideker T (2003) Cytoscape: a software environment for
integrated models of biomolecular interaction networks. Genome Res
13: 2498–2504
Small I, Peeters N, Legeai F, Lurin C (2004) Predotar: a tool for rapidly
screening proteomes for N-terminal targeting sequences. Proteomics 4:
1581–1590
Srinivasasainagendra V, Page GP, Mehta T, Coulibaly I, Loraine AE
(2008) CressExpress: a tool for large-scale mining of expression data
from Arabidopsis. Plant Physiol 147: 1004–1016
Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M
(2006) BioGRID: a general repository for interaction datasets. Nucleic
Acids Res 34: D535–D539
Steinhauser D, Usadel B, Luedemann A, Thimm O, Kopka J (2004) CSB.
DB: a comprehensive systems-biology database. Bioinformatics 20:
3647–3651
Taylor CF, Paton NW, Lilley KS, Binz PA, Julian RK Jr, Jones AR, Zhu W,
Apweiler R, Aebersold R, Deutsch EW, et al (2007) The minimum
information about a proteomics experiment (MIAPE). Nat Biotechnol
25: 887–893
Toufighi K, Brady SM, Austin R, Ly E, Provart NJ (2005) The Botany Array
Resource: e-northerns, expression angling, and promoter analyses.
Plant J 43: 153–163
Tyler L, Thomas SG, Hu J, Dill A, Alonso JM, Ecker JR, Sun TP (2004)
DELLA proteins and gibberellin-regulated seed germination and floral
development in Arabidopsis. Plant Physiol 135: 1008–1019
Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M,
Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression
tools for plant biology: opportunities for hypothesis generation and
caveats. Plant Cell Environ 32: 1633–1651
Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y
(2009) Unraveling transcriptional control in Arabidopsis using cis-
regulatory elements and coexpression networks. Plant Physiol 150:
535–546
Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game
L, Heiskanen M, Morrison N, Rocca-Serra P, et al (2006) The MGED
Ontology: a resource for semantics-based description of microarray
experiments. Bioinformatics 22: 866–873
Yu H, Ito T, Zhao Y, Peng J, Kumar P, Meyerowitz EM (2004) Floral
homeotic genes are targets of gibberellin signaling in flower develop-
ment. Proc Natl Acad Sci USA 101: 7827–7832
Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W (2004)
GENEVESTIGATOR: Arabidopsis microarray database and analysis
toolbox. Plant Physiol 136: 2621–2632