Bioinformatics

CORNET: A User-Friendly Tool for Data Mining and Integration1[W]

Stefanie De Bodt*, Diana Carvajal, Jens Hollunder, Joost Van den Cruyce, Sara Movahedi, and Dirk Inzé

Department of Plant Systems Biology, Flanders Institute for Biotechnology, and Department of Plant Biotechnology and Genetics, Ghent University, 9052 Ghent, Belgium

As an overwhelming amount of functional genomics data have been generated, the retrieval, integration, and interpretation of these data need to be facilitated to advance (systems) biological research. For example, gathering and processing microarray data related to a particular biological process is not straightforward, nor is compiling protein-protein interactions, identified through diverse approaches, from numerous partially overlapping databases. However, these tasks are unavoidable when addressing questions such as the following. Does a group of differentially expressed genes show similar expression in diverse microarray experiments? Was an identified protein-protein interaction previously detected by other approaches? Are the interacting proteins encoded by genes with similar expression profiles and localization? We developed CORNET (for CORrelation NETworks) as an access point to transcriptome, protein interactome, and localization data and functional information on Arabidopsis (Arabidopsis thaliana). It consists of two flexible and versatile tools, namely the coexpression tool and the protein-protein interaction tool. The ability to browse and search microarray experiments using ontology terms and the incorporation of personal microarray data are distinctive features of the microarray repository. The coexpression tool enables either the alternate or the simultaneous use of diverse expression compendia, whereas the protein-protein interaction tool searches experimentally and computationally identified protein-protein interactions. Different search options are implemented to enable the construction of coexpression and/or protein-protein interaction networks centered on multiple input genes or proteins. Moreover, networks and associated evidence are visualized in Cytoscape. Localization is visualized in pie charts, thereby allowing multiple localizations per protein. CORNET is available at http://bioinformatics.psb.ugent.be/cornet.

A large number of experiments have been performed to unravel the molecular mechanisms underlying diverse biological processes in the model plant Arabidopsis (Arabidopsis thaliana). In microarray experiments, for instance, diverse tissues from wild-type as well as mutant or transgenic plants are sampled at different developmental stages and treated with numerous compounds. Although enormous amounts of data have been generated, sifting through this heterogeneous information to find data relevant to a particular biological question remains a hurdle. Data resulting from profiling studies are stored in different formats in various databases. Microarray data and, in particular, the corresponding meta-data (e.g. sampled tissue, time point, treatment) are stored in an unstructured manner, which complicates data retrieval and interpretation. Likewise, proteomics data, such as protein-protein interaction (PPI) data, are dispersed over several databases in somewhat different formats. Although standards such as MIAME (for Minimum Information About a Microarray Experiment) and MIAPE (for Minimum Information About a Proteomics Experiment) have been set up, they are not (yet) generally employed (Taylor et al., 2007; Brazma, 2009). Using transcript profiling data, one can investigate how genes are expressed, when genes are active and/or differentially expressed, and which other genes show similar expression profiles. Integrating microarray data with PPI data can, for instance, lead to the identification of protein complexes and/or coregulated genes, a better understanding of a group of differentially expressed genes, and the prediction of putative functions for unknown genes (Brown et al., 2005; Gachon et al., 2005; Lisso et al., 2005; Rautengarten et al., 2005; Usadel et al., 2009).

Databases and tools such as ACT (Manfield et al., 2006), AtCOECiS (Vandepoele et al., 2009), ATTED-II (Obayashi et al., 2007, 2009), Bio-Array Resource (BAR; Toufighi et al., 2005), CressExpress (Srinivasasainagendra et al., 2008), CSB.DB (Steinhauser et al., 2004), GeneCAT (Mutwil et al., 2008), Genevestigator (Zimmermann et al., 2004), Plant Gene Expression Database (Horan et al., 2008), and PRIMe (Akiyama et al., 2008) have been developed to easily find similarity between genes based on their expression, and databases such as IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), MINT (Chatr-aryamontri et al., 2007), BIND (Bader et al., 2003), BAR Arabidopsis Interactions Viewer (Geisler-Lee et al., 2007), and AtPID (Cui et al., 2008) have been built to easily retrieve PPIs. In the next paragraphs, we describe the tools that have been developed to enable small-scale coexpression studies, rather than the numerous large-scale coexpression studies that often involve clustering. For a more detailed discussion of the coexpression tools as well as other databases and tools developed for hypothesis generation in plant biology, we refer to two recent reviews (Brady and Provart, 2009; Usadel et al., 2009).

1 This work was supported by Ghent University (Bijzonder Onderzoeksfonds Methusalem project no. BOF08/01M00408), the Interuniversity Attraction Poles Programme (grant nos. IUAP VI/25 [BioMaGNet] and VI/33) initiated by the Belgian State Science Policy Office, the European Union 6th Framework Programme (AGRONOMICS grant no. LSHG–CT–2006–037704), an industrial collaboration project with BASF Cropscience funded by the Institute for the Promotion of Innovation by Science and Technology in Flanders, and the Research Foundation-Flanders (postdoctoral fellowship to S.D.B.).

* Corresponding author; e-mail [email protected].

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Stefanie De Bodt ([email protected]).

[W] The online version of this article contains Web-only data.

www.plantphysiol.org/cgi/doi/10.1104/pp.109.147215

Plant Physiology®, March 2010, Vol. 152, pp. 1167–1179, www.plantphysiol.org © 2010 American Society of Plant Biologists

Current coexpression tools allow the visualization of gene expression profiles and/or the search for genes that are coexpressed with one or more genes of interest. To identify coexpression, these tools employ a measure, such as the Pearson correlation coefficient, a correlation rank (Obayashi et al., 2009), or linear regression (Srinivasasainagendra et al., 2008), followed by either applying an absolute cutoff or selecting the top x most correlated genes. The implementation of coexpression analysis is well advanced, and some tools provide a flexible choice of input expression data sets. For instance, BAR Expression Angler allows the use of different types of expression data sets, among them the AtGenExpress compendia (Schmid et al., 2005; Toufighi et al., 2005; Kilian et al., 2007; Goda et al., 2008), whereas CressExpress allows the selection of microarray experiments based on tissue types (Srinivasasainagendra et al., 2008). Intuitively, the use of different expression data sets can yield different degrees of expression correlation between genes, because some genes might behave similarly under certain conditions and differently under others. In other words, condition-dependent and condition-independent coexpression analyses have to be distinguished (Usadel et al., 2009). Therefore, the expression data sets used to calculate expression correlation need to be compiled flexibly and efficiently.
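The generic procedure described above, a correlation measure followed by either an absolute cutoff or a top-x selection, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of any of the cited tools: expression data are assumed to be a dictionary mapping hypothetical gene identifiers to profiles over the same conditions, and all function names are our own. Spearman correlation is computed as the Pearson correlation of the ranks, which is its standard definition.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when a profile is constant
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpressed(query, matrix, method="pearson", cutoff=None, top_x=None):
    """Genes correlated with `query`, filtered by an absolute cutoff and/or a top-x limit.

    Only positive correlations pass the cutoff here; how strong negative
    correlations should be treated is a design choice not fixed by the text.
    """
    corr = pearson if method == "pearson" else spearman
    scores = []
    for gene, profile in matrix.items():
        if gene == query:
            continue
        r = corr(matrix[query], profile)
        if not math.isnan(r):
            scores.append((gene, r))
    scores.sort(key=lambda item: item[1], reverse=True)
    if cutoff is not None:
        scores = [(g, r) for g, r in scores if r >= cutoff]
    if top_x is not None:
        scores = scores[:top_x]
    return scores


# Hypothetical toy profiles over five conditions.
matrix = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [1.0, 3.0, 2.0, 5.0, 4.0],
}
print(coexpressed("geneA", matrix, cutoff=0.9))  # [('geneB', 1.0)]
print(coexpressed("geneA", matrix, top_x=2))     # [('geneB', 1.0), ('geneD', 0.8)]
```

The absolute cutoff keeps every gene above a fixed coefficient regardless of how many there are, whereas top x always returns a bounded list; combining both, as done here when both arguments are given, caps the output while still excluding weak correlations.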

In contrast to coexpression analysis, only a few tools provide additional functionalities, such as the retrieval of PPIs, functions, pathways, and cis-regulatory elements, and network visualization. The following tools have incorporated PPI data from one or more of the above-mentioned PPI databases. The output of the BAR Expression Angler displays Gene Ontology (GO) functional categories and PPI data from the BAR Arabidopsis Interactions Viewer (Toufighi et al., 2005; Geisler-Lee et al., 2007). ATTED-II provides PPIs, Kyoto Encyclopedia of Genes and Genomes pathway information, and cis-regulatory elements in addition to coexpression links (Obayashi et al., 2009). VirtualPlant provides a network analysis tool that compiles PPI data (BIND, interolog detection, and AtPID), microRNA:RNA associations, enzymatic reactions (both primary and secondary), and regulatory links based on binding site occurrence (Gutierrez et al., 2007).

To a large extent, the representation of the output determines the accessibility and interpretability of the results. The aforementioned tools have come up with different solutions to represent coexpression and interaction data. In most tools, the output is in tabular format (such as in CressExpress [Srinivasasainagendra et al., 2008]). Although this format has many advantages for advanced users, who can import the results into other software tools, it does not allow less experienced users to inspect the results immediately. With the BAR Expression Angler, results can be viewed and downloaded in both text and matrix formats (Toufighi et al., 2005), and with the DataMetaFormatter, the functional classification of the coexpressed genes and the PPIs are displayed on a clickable map of the coexpression data matrix that links to other BAR tools. The BAR Arabidopsis Interactions Viewer allows the export of PPI networks in Cytoscape sif format. ATTED-II generates a network representation of the results (Obayashi et al., 2009). Although intuitively very comprehensive, the network views are static, ruling out the visualization and exploration of large networks (Obayashi et al., 2007). In addition, network visualization is only possible in within-query gene searches, and only these small networks can be downloaded in tab-delimited, Pajek, or Cytoscape sif formats (Shannon et al., 2003; de Nooy et al., 2005). In the latest version of ATTED-II, precalculated networks of particular genes can be viewed using the Google Maps API (Obayashi et al., 2009). PRIMe allows coexpression analysis of multiple genes and provides the results in network files that can be viewed in dedicated software, such as Pajek (de Nooy et al., 2005) or Biolayout (Goldovsky et al., 2005), and thus allows the exploration of larger networks (Akiyama et al., 2008). The network analysis tool of VirtualPlant visualizes the resulting networks in Cytoscape Web Start (Gutierrez et al., 2007).
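Because several of the tools above exchange networks as Cytoscape sif (simple interaction format) files, a small sketch of that format may help. Each sif line lists a source node, an interaction type, and a target node; tab delimiters are used here so that node names may contain spaces. The interaction labels `co` and `pp`, the gene names, and the `to_sif` function are hypothetical choices for illustration and are not part of any of the tools discussed.

```python
def to_sif(edges):
    """Serialize (source, interaction_type, target) triples as tab-delimited sif lines.

    Edges are treated as undirected: a link already written in one direction
    is skipped when it reappears in the opposite direction.
    """
    seen = set()
    lines = []
    for source, kind, target in edges:
        key = (kind, frozenset((source, target)))
        if key in seen:
            continue
        seen.add(key)
        lines.append(f"{source}\t{kind}\t{target}")
    return "".join(line + "\n" for line in lines)


edges = [
    ("geneA", "co", "geneB"),  # coexpression link
    ("geneB", "co", "geneA"),  # same link, reported in reverse
    ("geneA", "pp", "geneC"),  # protein-protein interaction
]
print(to_sif(edges), end="")
```

Because the interaction type is stored on every line, coexpression and PPI links kept in one file can still be told apart after import.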

Taking all these features into account, we developed a new user-friendly tool for data mining and integration, named CORNET (for CORrelation NETworks), which is accessible through http://bioinformatics.psb.ugent.be/cornet. We collected the majority of the currently available microarray expression data; the corresponding meta-data describing sampled tissues, treatments, and time points of sampling; PPI data; localization data; and functional information in a central database. A user-friendly interface allows one to query the database, enabling coexpression analysis through a multitude of search options that address diverse biological questions. Several predefined expression data sets are provided, such as global compendia representing diverse experimental conditions as well as tissue- or treatment-specific expression data sets. In addition, users can compile expression data sets from public as well as private microarray data or can upload their own processed expression data sets. Directed selection of microarray experiments is possible, as all meta-data are described with standardized ontology terms and stored in the database. Not only can expression correlation be calculated based on one particular data set, but coexpression can also be assessed simultaneously across several expression data sets. PPI networks can be reconstructed with both experimentally identified and computationally predicted data. Moreover, coexpression and PPI networks can be integrated. CORNET generates a comprehensive visualization that provides a bird’s eye view of the results and of the different degrees of reliability of the extracted information. The tool makes use of Cytoscape Web Start, which has the advantage that all functionalities of Cytoscape itself and numerous plug-ins can be exploited to further explore the constructed networks (Shannon et al., 2003).

RESULTS AND DISCUSSION

CORNET is primarily composed of two tools, namely the coexpression tool and the PPI tool, which construct coexpression and PPI networks, respectively (Fig. 1). Both tools can be used autonomously but can also be used consecutively to build a network of coexpression links as well as PPIs. Additionally, localization and functional information (GO terms and protein domain information) can be displayed on the constructed networks.

Annotation of Microarray Experiments

All expression data available at Gene Expression Omnibus (Barrett and Edgar, 2006) and resulting from experiments carried out on Affymetrix ATH1 arrays were incorporated into the CORNET database. Because the meta-data of the microarray experiments available in public databases are very unstructured and hard to process automatically, and because information on growth conditions, treatments, sampled tissues, and genotypes is difficult to retrieve, we described the meta-data of the microarray experiments by manually assigning ontology terms. Existing ontologies were exploited to avoid confusion and redundancy: Plant Ontology (Plant Structures and Plant Growth and Developmental Stages; Bruskiewich et al., 2002; Pujar et al., 2006; Ilic et al., 2007; Avraham et al., 2008), Plant Environmental Conditions (www.gramene.org), and MGED Ontology (Whetzel et al., 2006) were used (Supplemental Fig. S1). The final aim of this ontology-based annotation was the automatic and comprehensive retrieval of microarray experiments, similar to the selection of microarray experiments in Genevestigator (Zimmermann et al., 2004). With these microarray data, we compiled different, so-called predefined expression compendia (see “Materials and Methods”). Compendium 1 is a set of microarray experiments covering diverse conditions but somewhat biased toward growth and development. Compendium 2 also contains diverse conditions, but biases toward particular design types (see “Materials and Methods”) are reduced as much as possible. In addition, highly redundant experiments are removed, resulting in a set of approximately 100 experiments. Finally, we compiled several specific expression compendia (abiotic stress, biotic stress, development, flower, genetic modification, hormone, leaf, root, seed, and abiotic plus biotic stress data sets) using the design types and ontology terms (see “Materials and Methods”). Users can temporarily upload their own raw expression data and annotate and incorporate these data into user-defined expression data sets (see “Materials and Methods”).
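To illustrate why ontology-based annotation supports directed and reproducible selection of experiments, the following sketch selects experiments annotated with a query term or with any of its more specific descendant terms. The tiny is-a hierarchy, the term names, the experiment identifiers, and the function names are all hypothetical; neither the actual Plant Ontology nor the CORNET database schema is reproduced here.

```python
from collections import deque

# Hypothetical is-a hierarchy: child term -> parent terms.
PARENTS = {
    "rosette leaf": ["leaf"],
    "cauline leaf": ["leaf"],
    "leaf": ["shoot system"],
    "flower": ["shoot system"],
    "primary root": ["root"],
}

# Hypothetical experiment annotations: experiment id -> ontology terms.
EXPERIMENTS = {
    "exp1": {"rosette leaf", "drought"},
    "exp2": {"cauline leaf", "cold"},
    "exp3": {"primary root", "drought"},
    "exp4": {"flower", "heat"},
}


def descendants(term):
    """Return `term` plus every term that is (transitively) more specific than it."""
    children = {}
    for child, parents in PARENTS.items():
        for parent in parents:
            children.setdefault(parent, []).append(child)
    found, queue = {term}, deque([term])
    while queue:
        for child in children.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def select_experiments(*terms):
    """Experiments matching every query term (the term itself or one of its descendants)."""
    expanded = [descendants(t) for t in terms]
    return sorted(
        exp for exp, annotation in EXPERIMENTS.items()
        if all(annotation & group for group in expanded)
    )


print(select_experiments("leaf"))             # ['exp1', 'exp2']
print(select_experiments("leaf", "drought"))  # ['exp1']
```

Expanding a query term to its descendants is what lets a single, general term such as "leaf" retrieve experiments annotated only with more specific terms, without inspecting each experiment individually.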

Coexpression Tool

Using the coexpression tool, genes with similar expression profiles across a number of experimental conditions can be identified. When performing a coexpression analysis, one first needs to decide on the input expression data. The user can either select the predefined expression compendia or compile user-defined data sets. In the latter case, the user is directed to the “Browse experiments” page, where a set of microarray experiments can be assembled using ontology terms that describe the meta-data (Supplemental Fig. S1). The ontology terms allow an easily reproducible and intuitive selection of the microarray experiments without going through each individual experiment. Users should keep in mind that user-defined expression data sets must be large enough to enable reliable calculation of the correlation coefficients (Usadel et al., 2009). In the next step, the coexpression tool page is displayed, where one or more genes can be entered for coexpression analysis (step 1; Supplemental Fig. S1). Subsequently, one or more of the predefined, previously generated user-defined, or personal preprocessed expression data sets need to be selected (step 2). More details on the importance of different expression compendia are given below. Next, one can choose to calculate either Pearson or Spearman correlation coefficients (Spearman can only be chosen when one expression compendium is selected; see “Materials and Methods”). Then, thresholds to limit the number of results, as well as search options corresponding to different biological questions, can be chosen (Fig. 2). Either an absolute threshold, a relative threshold, or both can be chosen, namely a correlation coefficient threshold and/or a number of the most highly coexpressed genes (top x), respectively. In the case of multiple compendia, each compendium is first treated separately, and the results are subsequently combined depending on the “all/at least” parameter.
When coexpression in at least one compendium is chosen (at least = 1), all coexpression links found in the different compendia are reported (union). When coexpression needs to hold true for multiple expression compendia (at least > 1), the intersection of the individual results is reported. In addition, the average, minimum, and maximum correlation coefficients over all expression compendia that meet the chosen thresholds are reported. The search options entail the calculation of expression correlation in a pairwise manner between given genes or within a list of gene pairs uploaded as a tab-delimited file (“Pairwise correlations”), between one or more given genes and all genes in the genome [“Correlation of query gene(s) with neighbors”], and between genes that show coexpression with the given gene(s) (“Correlations between neighbors”; Fig. 2). These search options are more flexible and intuitive than the functionalities of most existing coexpression tools, which only allow the search for genes coexpressed with one or a few query genes. Evidently, thresholds need to be chosen carefully when either of the two latter search options is selected, because huge numbers of results can be generated. We advise starting with a relatively high correlation coefficient threshold (such as 0.9) or the top 10 genes and gradually lowering the stringency. Finally, localization information and/or PPIs can be integrated using the PPI tool, as discussed below.

Figure 1. Functionalities of CORNET.
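The multi-compendium logic described above can be sketched as follows: each compendium is thresholded separately, links are kept if they pass in at least k compendia, and the average, minimum, and maximum coefficients over the passing compendia are reported. Reading the “all/at least” parameter as a count k (k = 1 gives the union; k equal to the number of selected compendia gives the intersection) is our interpretation of the text, and the function name and input layout are hypothetical. Only the absolute threshold is shown; a top-x filter would be applied per compendium before combining.

```python
def combine_compendia(per_compendium, cutoff, at_least):
    """Combine per-compendium coexpression results with an "at least k" rule.

    per_compendium: {compendium name: {(gene1, gene2): correlation coefficient}}
    Returns {(gene1, gene2): (n_compendia, average, minimum, maximum)}, where the
    statistics are taken over the compendia in which the pair reaches `cutoff`.
    """
    passing = {}
    for table in per_compendium.values():
        for pair, r in table.items():
            if r >= cutoff:
                # Sort the pair so that (g1, g2) and (g2, g1) count as one link.
                passing.setdefault(tuple(sorted(pair)), []).append(r)
    return {
        pair: (len(rs), sum(rs) / len(rs), min(rs), max(rs))
        for pair, rs in passing.items()
        if len(rs) >= at_least
    }


# Hypothetical correlations of query gene g1 in two compendia.
results = {
    "root": {("g1", "g2"): 0.95, ("g1", "g3"): 0.70, ("g1", "g4"): 0.92},
    "leaf": {("g1", "g2"): 0.91, ("g3", "g1"): 0.93, ("g1", "g4"): 0.40},
}
union = combine_compendia(results, cutoff=0.9, at_least=1)
intersection = combine_compendia(results, cutoff=0.9, at_least=len(results))
print(sorted(union))         # [('g1', 'g2'), ('g1', 'g3'), ('g1', 'g4')]
print(sorted(intersection))  # [('g1', 'g2')]
```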

Selection of Multiple Microarray Expression Compendia

When comparing the expression profiles of two genes, the input microarray data are expected to influence the observations (Usadel et al., 2009). For instance, due to pleiotropic functions or the combinatorial nature of cis-regulation, some genes may be coexpressed under certain conditions but differ in expression under others and/or be coexpressed with other genes under these conditions. A global measure of coexpression, such as the Pearson correlation coefficient, uses all conditions in the input expression compendium. Thus, depending on the nature of the studied genes and the interest of the user, different input expression compendia can be envisaged, some being subsets of others. For instance, when looking for genes that are coexpressed with a drought stress-responsive gene, an expression compendium representing abiotic stress conditions can be used to identify specific and relevant relations.
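A small numerical illustration may make this dependence on the input compendium concrete. In the invented profiles below (not CORNET data), two genes are perfectly correlated across three stress conditions and perfectly anticorrelated across three control conditions, so the Pearson correlation computed over all six conditions is zero even though the stress-only subset shows perfect coexpression. The condition names and the `restrict` helper are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented profiles: both genes rise together under stress,
# but move in opposite directions under control conditions.
conditions = ["stress1", "stress2", "stress3", "control1", "control2", "control3"]
gene_x = [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
gene_y = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]


def restrict(profile, prefix):
    """Keep only the values measured under conditions whose name starts with `prefix`."""
    return [v for c, v in zip(conditions, profile) if c.startswith(prefix)]


r_all = pearson(gene_x, gene_y)
r_stress = pearson(restrict(gene_x, "stress"), restrict(gene_y, "stress"))
r_control = pearson(restrict(gene_x, "control"), restrict(gene_y, "control"))
print(f"all: {r_all:.2f}, stress: {r_stress:.2f}, control: {r_control:.2f}")
# all: 0.00, stress: 1.00, control: -1.00
```

A global compendium would thus hide a relation that a stress-specific compendium reveals, which is the situation the drought example above describes.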

To investigate the variability of expression correlation across different expression compendia, several subsets of expression data, namely the abiotic stress, biotic stress, development, flower, genetic modification, hormone, leaf, root, seed, and abiotic plus biotic stress sets, as well as three global compendia, namely the AtGenExpress compendium, Compendium 1, and Compendium 2, were considered (Table I; see “Materials and Methods”); these are provided as predefined compendia in CORNET. We observe that expression correlation varies with the input expression compendia, independently of the chosen threshold (Fig. 3). Overall, few coexpression links held true when multiple compendia were taken into account. Twenty-five percent to 30% of all gene pairs are correlated based on one expression compendium, while about 10% of the correlated gene pairs also show correlation in three other expression compendia, and this number gradually drops as more expression compendia are taken into account. Only 0.01% to 0.05% of the gene pairs show coexpression in all 14 compendia simultaneously (Fig. 3). The majority of the genes that show coexpression across all compendia encode ribosomal proteins or are involved in photosynthesis (Pearson correlation coefficient threshold of 0.9). These coexpressed genes have a high average expression level and are highly variable across conditions. Based on this study, we conclude that expression correlation can vary significantly depending on the expression conditions used in a coexpression analysis. Consequently, coexpression tools need to enable the estimation of coexpression over diverse expression compendia or in specific expression compendia. The identification of coexpression under specific conditions can be employed to seek genes with characteristics common to genes of interest (such as similar expression upon abiotic stress treatment).
In contrast, coexpression analysis can be carried out with a collection of expression compendia representing diverse conditions, leading to the identification of those conditions in which the genes of interest show similar expression patterns. For genes with limited functional information, the expression compendia that deliver high correlations hint at possible functional activities of these unknown genes. In CORNET, this can be done either by using the predefined or user-defined expression compendia individually or by considering several predefined data sets together. In the latter case, coexpression links found in all selected compendia or in at least x compendia are reported (see above). As such, coexpression in, for instance, root only, leaf only, or both root and leaf can be studied in a single analysis.

Figure 2. Search options of coexpression and PPI tools. A, Correlations or interactions of query gene/protein with neighbors. B, Correlations or interactions of query gene/protein with neighbors plus correlations or interactions between neighbors. C, Pairwise correlations or interactions. D, Pairwise correlations or interactions plus correlations or interactions of query genes/proteins with neighbors. E, Pairwise correlations or interactions plus correlations or interactions of query genes/proteins with neighbors plus correlations between neighbors.
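The variability analysis summarized in Figure 3 can be sketched as follows: for every gene pair, count the compendia in which it reaches the correlation threshold (inclusive), and then compute the fraction of correlated pairs, meaning those passing in at least one compendium, that remain correlated in at least n compendia. The toy correlation tables and function names are hypothetical, and only an absolute threshold is shown; the top 1% and top 10% variants in Figure 3 would instead rank the pairs within each compendium.

```python
from collections import Counter


def support_counts(per_compendium, cutoff):
    """For each gene pair, the number of compendia in which it reaches `cutoff`."""
    counts = Counter()
    for table in per_compendium.values():
        for pair, r in table.items():
            if r >= cutoff:
                counts[tuple(sorted(pair))] += 1
    return counts


def fraction_conserved(counts, n):
    """Fraction of correlated pairs (>= 1 compendium) that are correlated in >= n compendia."""
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c >= n) / len(counts)


# Hypothetical correlation tables for three compendia.
tables = {
    "c1": {("a", "b"): 0.95, ("a", "c"): 0.91, ("b", "c"): 0.50, ("c", "d"): 0.92},
    "c2": {("a", "b"): 0.93, ("a", "c"): 0.20, ("b", "c"): 0.90, ("c", "d"): 0.10},
    "c3": {("a", "b"): 0.97, ("a", "c"): 0.30, ("b", "c"): 0.40, ("c", "d"): 0.15},
}
counts = support_counts(tables, cutoff=0.9)
for n in range(1, len(tables) + 1):
    print(n, fraction_conserved(counts, n))
# 1 1.0
# 2 0.25
# 3 0.25
```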

PPI Tool

As for the coexpression tool, the PPI tool needs one or more proteins as input (step 1; Supplemental Fig. S1). Next, different PPI databases can be chosen (step 2; see “Materials and Methods”) to extract only experimentally identified PPIs, only computationally predicted interactions, or both. As in the coexpression tool, different search options can be selected, namely a search for PPIs in a pairwise manner (“Pairwise interactions”), a search for proteins that interact with the given protein(s) [“Interactions of query protein(s) with neighbors”], and/or a search for interactions among the proteins that interact with the given protein(s) (“Interactions between neighbors”; step 3; Fig. 2). Finally, here too, coexpression and localization information can be integrated.
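The three PPI search options, together with the choice of experimental and/or predicted evidence, can be sketched on a simple edge list. This is an illustration only: the protein names, the evidence labels, and the `search_ppi` function are hypothetical, and the same set logic would apply to coexpression links.

```python
def search_ppi(edges, queries, evidence=("experimental", "predicted"),
               pairwise=True, with_neighbors=True, between_neighbors=False):
    """Select undirected interactions according to the chosen search options.

    edges: iterable of (protein1, protein2, evidence_type) tuples.
    Returns a set of alphabetically sorted (protein1, protein2) tuples.
    """
    queries = set(queries)
    links = {frozenset((a, b)) for a, b, kind in edges if kind in evidence and a != b}
    selected = set()
    # "Pairwise interactions": both partners are query proteins.
    if pairwise:
        selected |= {link for link in links if link <= queries}
    # Links between a query protein and a non-query partner (a "neighbor").
    query_links = {link for link in links if link & queries and not link <= queries}
    neighbors = set().union(*query_links) - queries
    if with_neighbors:
        selected |= query_links
    # "Interactions between neighbors": both partners are neighbors.
    if between_neighbors:
        selected |= {link for link in links if link <= neighbors}
    return {tuple(sorted(link)) for link in selected}


edges = [
    ("P1", "P2", "experimental"),
    ("P1", "N1", "experimental"),
    ("P2", "N2", "predicted"),
    ("N1", "N2", "experimental"),
    ("N2", "X", "experimental"),  # X is two steps away from the queries
]
print(sorted(search_ppi(edges, ["P1", "P2"], evidence=("experimental",))))
# [('N1', 'P1'), ('P1', 'P2')]
print(sorted(search_ppi(edges, ["P1", "P2"], between_neighbors=True)))
# [('N1', 'N2'), ('N1', 'P1'), ('N2', 'P2'), ('P1', 'P2')]
```

Restricting the evidence to experimental data drops the predicted P2-N2 link, which in turn removes N2 from the neighbor set; this is why the database choice in step 2 also shapes the result of the neighbor-based options.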

Integration of Coexpression and PPI Networks

CORNET allows the integration of coexpression and PPI networks, which can be approached in two ways. One can start with the coexpression tool, identifying genes that are coexpressed with each other, and subsequently test whether the corresponding proteins interact with each other or with other proteins, or vice versa. Note that the order of the analyses will yield different results when the option “Correlations/interactions with neighbors” is chosen. For instance, when a coexpression analysis is followed by a PPI search, all coexpressed genes of the first analysis are used as input for the PPI search, and genes that do not show coexpression with other genes are not included as input for the PPI tool. Conversely, when a PPI search is performed first, only proteins for which interactions have been found will be used as input for the subsequent coexpression analysis. Depending on the question to be addressed, either or both approaches can be chosen.
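The order dependence described above can be made concrete with a small sketch: expanding the query set through coexpression links first and then through PPIs need not give the same network as the reverse order, because each step only passes on the genes it actually found. The gene names and links are hypothetical, and the `expand` and `two_step` helpers are an illustrative simplification of the “with neighbors” option, not CORNET’s implementation.

```python
def expand(links, genes):
    """'With neighbors' search: the links touching `genes`, plus every gene they contain.

    Genes without any link drop out and are not passed on to the next step.
    """
    found = {link for link in links if link & genes}
    return found, set().union(*found)


def two_step(first, second, queries):
    """Run a 'with neighbors' search on `first`, then feed the genes it found into `second`."""
    links1, genes = expand(first, set(queries))
    links2, _ = expand(second, genes)
    return {tuple(sorted(link)) for link in links1 | links2}


coexpression = {frozenset(p) for p in [("G1", "G2"), ("G3", "G4")]}
ppi = {frozenset(p) for p in [("G1", "G5"), ("G2", "G6"), ("G3", "G7")]}
queries = {"G1", "G7"}

print(sorted(two_step(coexpression, ppi, queries)))
# [('G1', 'G2'), ('G1', 'G5'), ('G2', 'G6')]
print(sorted(two_step(ppi, coexpression, queries)))
# [('G1', 'G2'), ('G1', 'G5'), ('G3', 'G4'), ('G3', 'G7')]
```

Starting with coexpression, query gene G7 has no coexpression partner and is therefore never passed to the PPI step; starting with PPIs, G7 is retained through its interaction with G3, whose coexpression partner G4 then enters the network.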

Figure 3. Variability of expression correlation. Only a few genes are coexpressed based on multiple expression compendia. Expression correlation is defined by a Pearson correlation threshold of 0.9 (blue), 0.8 (red), the top 1% most correlated gene pairs (green), and the top 10% most correlated gene pairs (purple).

Table I. Predefined expression compendia

| Expression Compendium | Description | No. of Experiments |
| --- | --- | --- |
| AtGenExpress | All experiments performed by AtGenExpress | 425 |
| Compendium 1 | Collection of microarray experiments oriented toward growth, development, and cell cycle studies | 454 |
| Compendium 2 | Collection of microarray experiments; very similar experiments were removed; similar numbers of experiments of each design type | 111 |
| Abiotic stress | Abiotic stress series (cold, drought, genotoxic, heat, osmotic, oxidative, salt, UV-B, wounding) | 256 |
| Biotic stress | Biotic stress series (Botrytis, Pseudomonas, Phytophthora, etc.) | 69 |
| Abiotic plus biotic stress | Combination of the abiotic and biotic stress data sets | 336 |
| Hormone treatment | Hormone treatment series (abscisic acid, brassinosteroids, GA, cytokinin, etc., and inhibitors) | 140 |
| Development | Developmental series (different tissues, developmental stages, developmental mutants) | 135 |
| Flower | Microarray experiments in which floral tissues are sampled | 72 |
| Leaf | Microarray experiments in which leaf tissues are sampled | 212 |
| Root | Microarray experiments in which root tissues are sampled | 258 |
| Seed | Microarray experiments in which seed tissues are sampled | 83 |
| Whole plant | Microarray experiments in which whole plants are sampled | 85 |
| Genetic modification | Microarray experiments in which transgenic lines are profiled (gene overexpression [knockin], gene knockout, transient transgene expression) | 313 |

De Bodt et al.

1172 Plant Physiol. Vol. 152, 2010


Figure 4. (Legend appears on following page.)

CORNET: Coexpression and Interaction Networks


When coexpression and PPI data are integrated, CORNET can be used to study the degree of coexpression of genes encoding interacting proteins. A global study of all experimentally identified PPIs showed a relatively low concordance between coexpression and interactions (see Fig. 2 in De Bodt et al., 2009). This observation can be confirmed with the different expression compendia available in CORNET: the mean correlation coefficient for experimentally identified PPIs ranges between 0.13 and 0.21. As previously noted in the literature, this low degree of expression similarity is probably due to the transient nature of PPIs.
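To compute a summary of this kind for one compendium, one could average the Pearson coefficients over the gene pairs that encode experimentally identified interactors. The sketch below assumes a hypothetical in-memory layout (gene identifier mapped to an expression vector) and invented toy data; it is not CORNET's code and only illustrates the averaging.

```python
import math
from statistics import fmean
from typing import Dict, Iterable, List, Optional, Tuple


def pearson(x: List[float], y: List[float]) -> Optional[float]:
    """Pearson correlation of two profiles; None if either profile is constant."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        return None
    return sum(a * b for a, b in zip(dx, dy)) / denom


def mean_ppi_correlation(expression: Dict[str, List[float]],
                         ppis: Iterable[Tuple[str, str]]) -> float:
    """Mean Pearson correlation over gene pairs encoding interacting proteins.

    Pairs with a gene lacking an expression profile (e.g. not on the array)
    or with a constant profile are skipped; raises StatisticsError if no
    pair remains.
    """
    values = []
    for gene_a, gene_b in ppis:
        if gene_a in expression and gene_b in expression:
            r = pearson(expression[gene_a], expression[gene_b])
            if r is not None:
                values.append(r)
    return fmean(values)


# Hypothetical toy data: three genes over three arrays, three reported PPIs.
expression = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0, 6.0], "C": [3.0, 2.0, 1.0]}
ppis = [("A", "B"), ("A", "C"), ("A", "X")]  # "X" has no profile and is skipped
print(mean_ppi_correlation(expression, ppis))  # 0.0: one correlated, one anticorrelated pair
```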

Visualization of Coexpression and PPI Networks

For network visualization, the existing software Cytoscape was chosen (Shannon et al., 2003; Fig. 1) because its functionalities allow browsing and zooming into the constructed networks, a visual as well as textual representation of diverse attributes (e.g. correlation coefficient, localization databases; see the Frequently Asked Questions [FAQ] page and VizMapper in Cytoscape), and further exploration and analysis of the networks. In the coexpression network, the degree of expression correlation is represented by the color of the edges (correlations); for instance, blue edges correspond to highly correlated expression profiles. When multiple expression compendia were selected in the coexpression tool, the minimum, maximum, or average Pearson correlation coefficient (as requested by the user) is displayed. However, all retrieved Pearson correlation coefficients are reported in the text output and as composite attributes (data set coefficients) in Cytoscape. PPIs are depicted by black edges, and their reliability can be assessed through the width (number of data sources) and the style (detection method) of the edges (interactions) in the PPI network: experimentally identified interactions are represented by solid lines, whereas computationally predicted interactions are represented by dashed lines. The shape of the nodes (genes/proteins) indicates whether a gene/protein is a query or a neighbor in either the coexpression or the PPI tool. All attributes can be displayed in the lower data panel of Cytoscape by clicking the Select Attributes button (see FAQ) and can be copied and/or exported in tabular format at any time.
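The attribute logic described above can be summarized as a small mapping function. The sketch is illustrative only: the attribute names (`color`, `width`, `line`), the one-unit-per-source width, and the `PpiEvidence` class are assumptions made for this example, not Cytoscape's VizMapper API or CORNET's internal format.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Dict, List

# How several per-compendium coefficients are collapsed to one displayed value.
SUMMARIES = {"minimum": min, "maximum": max, "average": fmean}


def displayed_correlation(coefficients: Dict[str, float], mode: str) -> float:
    """Summarize one coexpression edge's coefficients (compendium -> Pearson r)."""
    return SUMMARIES[mode](list(coefficients.values()))


@dataclass
class PpiEvidence:
    sources: List[str]   # databases reporting the interaction (hypothetical layout)
    experimental: bool   # True for experimentally identified interactions


def ppi_edge_style(evidence: PpiEvidence) -> Dict[str, object]:
    """Hypothetical visual attributes for one PPI edge."""
    return {
        "color": "black",
        "width": len(set(evidence.sources)),  # more distinct sources, wider edge
        "line": "solid" if evidence.experimental else "dashed",
    }


coefficients = {"root": 0.8, "leaf": 0.9}
style = ppi_edge_style(PpiEvidence(["IntAct", "BioGRID", "IntAct"], experimental=True))
```

Keeping every coefficient in a dictionary mirrors the way all values remain available as composite attributes, while only the summary chosen by the user drives the edge color.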

Case Study: DELLA Network

To demonstrate the functionalities of CORNET and, particularly, the use of multiple expression compendia in the coexpression tool, we investigated the molecular context of DELLA proteins. The Arabidopsis genome encodes five DELLA proteins (REPRESSOR OF GA [RGA], GIBBERELLIC ACID INSENSITIVE [GAI], RGA-LIKE1 [RGL1], RGL2, and RGL3) that act as negative regulators of the GA signaling pathway. DELLA proteins bind to the GIBBERELLIN INSENSITIVE DWARF1 (GID1) receptor in the presence of GA. The GA-GID1-DELLA complex is then targeted by an SCF-E3 ligase (SLEEPY), resulting in ubiquitinylation and degradation of DELLA by the 26S proteasome, thereby relieving DELLA-mediated repression of GA responses. DELLAs are involved in diverse processes such as flower development, seed germination, leaf growth, and abiotic stress response (Tyler et al., 2004; Achard et al., 2006; Schwechheimer, 2008; Achard and Genschik, 2009). Although the functional differences between the DELLAs are not completely clear, RGA and GAI have been shown to repress stem elongation (Dill and Sun, 2001; King et al., 2001), RGL2 inhibits seed germination (Lee et al., 2002), and RGA, RGL1, and RGL2 together regulate floral development (Cheng et al., 2004; Tyler et al., 2004; Yu et al., 2004). Whereas RGA and GAI are highly expressed in most tissues, RGL1, RGL2, and RGL3 are mainly expressed in germinating seeds, young seedlings, and flowers, indicating that these signaling molecules might be transcriptionally regulated (Tyler et al., 2004). Thus, DELLA proteins are highly pleiotropic and serve as an ideal test case.

Different functions of the DELLA proteins surface when particular expression compendia are employed (Fig. 4). First, we generated a network using Compendium 2, which represents diverse conditions and gives a global estimate of coexpression. Subsequently, experimentally identified PPIs were added between the resulting proteins and all other proteins (neighbors), as well as among those interacting neighbors. Two main groups of highly connected genes could be delineated in this network (Fig. 4A). The degree of correlation among the genes clearly differed between the two groups (blue edges [group I] versus red edges [group II]). Group I consists of RGL2 and RGL3 and several LEA (for late embryogenesis abundant) genes with roles in seed development, while group II contains GAI, RGA, and RGL1, some hormone-related genes such as BRASSINOSTEROID INSENSITIVE1 and AFB5 (auxin F-box), and CLIP-associating protein-related genes involved in mitosis, growth, and protein stability, as well as many other genes. Both groups are connected to each other through PPI links between the DELLAs and the GID receptors, SLEEPY1, PHYTOCHROME B, PHYTOCHROME INTERACTING FACTOR3, and PIF4, interactions that have been identified by various experimental approaches (Supplemental Files S1, “Features” attribute in Cytoscape; Dill et al., 2004; Fu et al., 2004; Griffiths et al., 2006; Nakajima et al., 2006; Achard et al., 2007; Ariizumi et al., 2008; de Lucas et al., 2008; Feng et al., 2008). Next, we investigated how this network varied with different, more specific expression compendia, namely the abiotic stress and hormone treatment compendia on the one hand (Fig. 4B) and the leaf and seed compendia on the other hand (Fig. 4C; see “Materials and Methods”), again followed by a search for experimentally identified PPIs. Strikingly, a high number of new coexpression links and coexpressed genes were identified that had not been found with Compendium 2 (Fig. 4, B and C; Supplemental Tables S1 and S2). Nevertheless, the two groups of genes, and especially the degree of expression correlation between their respective genes, remained demarcated, although some coexpression links between the two groups appeared. When examining the expression compendia that supported the different coexpression links, an additional distinction between group I and group II could be noticed: group I links were mainly retrieved using the hormone treatment and the seed compendia, while some group II links were found using the abiotic stress and leaf compendia (Fig. 4, B and C).

In other words, the genes in each group had similar expression patterns in particular tissues and conditions and probably have associated functions primarily in those tissues and conditions. Through CORNET, the expression profiles of these genes in the different expression compendia could be explored to further pinpoint the possible similarities and functions of the two groups. Group I genes showed similar and high expression in the imbibition stage (the first stage of seed germination) as well as in later stages of embryo development. In addition, these genes were highly expressed in gibberellic acid and abscisic acid experiments in which seeds or embryos were sampled, as well as in methyl jasmonate and cytokinin experiments with sampling of stamens and shoots, respectively.

Figure 4. DELLA networks generated by CORNET using multiple compendia. The DELLA network based on Compendium 2 (A) shows two groups of coexpressed genes, where group I is highly coexpressed (blue edges) and group II shows a lower degree of coexpression (red edges). The networks based on Compendium 2 plus abiotic stress plus hormone treatment compendia (B) and Compendium 2 plus leaf plus seed compendia (C) reveal new coexpression links and new coexpressed genes. Red edges represent coexpression links found in the hormone (B) and seed (C) compendia, and green edges represent coexpression links found in the abiotic stress (B) and leaf (C) compendia. Edges starting and ending at the same node represent homodimerization of proteins.

Accordingly, when investigating transcript profiling results from ga1-3 and ga1-3 gai-t6 rga-t2 rgl1-1 rgl2-1 seeds and flowers, Cao et al. (2006) observed that the set of genes presumably regulated by DELLAs for seed germination overlaps little with, and is largely distinct from, the set of DELLA-regulated genes involved in floral development. This observation suggests that GA-mediated seed germination and floral development are under the control of distinct DELLA-dependent transcriptomes. Our analysis suggests a similar distinction between the sets of genes involved in processes ongoing in seed and leaf, and between the sets of genes involved in hormone- and abiotic stress-related processes.

CONCLUSION

We have developed CORNET, a tool for the construction of coexpression and PPI networks and their functional annotation in Arabidopsis (bioinformatics.psb.ugent.be/cornet). With this tool, we aim to provide biologists with the means to investigate associations between genes and between the proteins they encode. It thereby helps to better understand the functional context of a gene, leading to function prediction for unknown genes or to prediction of (indirect) regulatory interactions between known genes and proteins.

In the case study of DELLA proteins, the coexpression and PPI networks demonstrate the importance of careful coexpression analysis. Both the correct selection of expression compendia and the choice of adequate search options and thresholds can greatly enhance the power of coexpression tools to uncover new associations between genes and between genes and processes/microarray conditions. Moreover, the comprehensive visualization, as illustrated by Figure 4, gives a bird’s-eye view of the constructed networks, so that these associations can be discovered immediately, without requiring bioinformatics skills such as scripting or complicated database queries.

Table III. Number of microarray experiments in each design type

| Design Type | No. of Experiments |
| --- | --- |
| Stimulus_or_stress_design | 13 |
| Abiotic_stress_design | 255 |
| Biotic_stress_design | 322 |
| Compound_treatment_design | 23 |
| Hormone_treatment_design | 140 |
| Genetic_modification_design | 314 |
| Growth_condition_design | 69 |
| Time_series_design | 357 |
| Circadian_rhythm_design | 15 |
| Development_or_differentiation_design | 134 |
| Individual_genetic_characteristics_design | 3 |
| Organism_part_comparison_design | 121 |
| Strain_or_line_design | 22 |
| Translational_bias_design | 2 |

Table II. Statistics on the CORNET database

| Category | Data | No. |
| --- | --- | --- |
| Microarray | Arrays | 3,055 |
| Microarray | Experiments | 1,209 |
| Microarray | Series | 200 |
| Microarray | Experiments with two replicates | 634 |
| Microarray | Experiments with more than two replicates | 575 |
| PPIs | Experimentally identified interactions | 4,302 |
| PPIs | Computationally predicted interactions | 89,181 |
| PPIs | Total interactions | 93,109 |
| Localization | Experimentally identified localizations | 14,135 |
| Localization | Computationally predicted localizations | 11,310 |
| Localization | Total localizations | 18,824 |

In the future, we plan to add different data types as they become available. First, we foresee the incorporation of cis-regulatory elements, which can be tightly linked with the coexpression results, as has been done in AtCOECiS (Vandepoele et al., 2009). Second, through comparative genomics approaches, the constructed networks can be transferred to other plant species, such as poplar (Populus species), tomato (Solanum lycopersicum), rice (Oryza sativa), and crops of interest. Last, the inclusion of protein-DNA (or transcription factor-target) interactions will be very valuable, as such data are generated directly through chromatin immunoprecipitation (ChIP)-chip and ChIP-seq or indirectly through the analysis of transcriptome data.

MATERIALS AND METHODS

CORNET Troubleshooting

CORNET can be accessed through the following URL: http://bioinformatics.psb.ugent.be/cornet. The tool is fully functional in Firefox and Safari browsers. First-time users might need to accept a security certificate before accessing the Web site. The site is ideally viewed at 1,280 × 1,024 resolution. Pop-ups must be allowed in your browser before clicking the “go” button. After calculations and database queries, Cytoscape starts automatically from the Web; in other words, Cytoscape does not have to be installed on your computer. However, an up-to-date version of Java is required to enable the Cytoscape WebStart. More details can be found on our FAQ page.

Construction of the Microarray Database

All expression data available at Gene Expression Omnibus (Barrett and Edgar, 2006) and resulting from experiments with Affymetrix ATH1 arrays were uploaded in the CORNET database (Table II). Only experiments with two or more replicates are included. Meta-data were described using ontology terms (for more details, see “Results and Discussion”). However, in some instances, these ontologies did not suffice to describe the microarray experiments. Existing ontologies were extended as necessary to allow a more detailed description of tissues, transgenic lines, and experimental designs (indicated by “EXT”). The experimental designs proved very valuable in the compilation of specific expression compendia (e.g. development_or_differentiation_design, genetic_modification_design, compound_treatment_design, abiotic_stress_design, biotic_stress_design, time_series_design, hormone_treatment_design; see below; Table III).

Microarray Experiment Browsing and Retrieval

Microarray experiments and their meta-data can be browsed through CORNET (Browse experiments). Using a tree-based representation, the ontology terms can be browsed and selected to compile specific sets of microarray experiments. For each microarray experiment, the different replicates and the link between control and treatment arrays are shown. Raw data together with their annotation can be downloaded; processed data can be downloaded or used as input for the coexpression tool. The microarray data are processed with the Robust Multiarray Average procedure implemented in BioConductor (Irizarry et al., 2003a, 2003b; Gautier et al., 2004; Gentleman et al., 2004) using an alternative Chip Description File that takes into account possible cross-hybridization (tinesath1cdf; Casneuf et al., 2007).

Uploading Personal Expression Data

CORNET allows the incorporation of personal expression data (“Upload”). Personal data can be uploaded temporarily, processed with Robust Multiarray Average, and downloaded for later use. In addition, the uploaded data can be combined with public expression data in user-defined data sets, which can subsequently be downloaded for later use and/or used as input for the coexpression tool. The personal data are linked to the session identifier of the browser, and the submitter can access them for 24 h after submission.

Predefined Expression Compendia

The coexpression tool of CORNET makes use of the following predefined expression compendia. First, an expression data set (“AtGenExpress”) including the publicly available AtGenExpress compendia (such as abiotic stress, pathogen, development, and hormones) is provided (Schmid et al., 2005; Kilian et al., 2007; Goda et al., 2008). Second, we have compiled two different microarray compendia that cover diverse conditions and consequently can be used to estimate an overall degree of coexpression across all conditions. The first compendium (Compendium 1) covers diverse conditions that mainly deal with plant development and growth. The second compendium (Compendium 2) was built to reduce the bias toward particular conditions as much as possible. Moreover, redundancy among microarray experiments is assessed by calculating the correlation between experiments rather than between genes. A cutoff of 0.99 is used to identify redundant experiments. From each group of redundant experiments, one or more experiments are selected, favoring the experiments with the highest number of replicates and the noncontrol experiments. Subsequently, an equal number of experiments of each type of condition (according to the assigned “design” terms) is selected. In addition to these “global” compendia, we have compiled specific expression data sets using the design and ontology terms mentioned above. As such, expression compendia were generated that are specific to certain tissues (leaf, root, flower, seed, and whole plant) or certain treatments (abiotic stress, biotic stress, and hormone treatment), as well as compendia of experiments in which development is perturbed or in which genes are modified (overexpression, knockout, or silencing lines; Tables I and III).

Expression profiles for particular genes can be viewed in automatically generated line graphs.
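The Compendium 2 selection steps can be sketched as follows. This is a simplified, hypothetical reconstruction: the text does not state which correlation measure, grouping procedure, or tie-breaking order is used, so the sketch assumes Pearson correlation between experiment profiles, single-linkage grouping of experiments correlated at 0.99 or more, a single representative per group (noncontrol first, then most replicates), and truncation of every design type to the size of the smallest one. All names are invented for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from statistics import fmean
from typing import List


@dataclass
class Experiment:
    name: str
    profile: List[float]   # one value per gene (assumed layout)
    replicates: int
    is_control: bool
    design: str            # design term, e.g. "abiotic_stress_design"


def pearson(x: List[float], y: List[float]) -> float:
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else 0.0


def redundant_groups(exps: List[Experiment], cutoff: float = 0.99) -> List[List[Experiment]]:
    """Group experiments linked by correlation >= cutoff (single linkage, union-find)."""
    parent = list(range(len(exps)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(exps)):
        for j in range(i + 1, len(exps)):
            if pearson(exps[i].profile, exps[j].profile) >= cutoff:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, exp in enumerate(exps):
        groups[find(i)].append(exp)
    return list(groups.values())


def representative(group: List[Experiment]) -> Experiment:
    """Prefer noncontrol experiments, then the one with the most replicates."""
    return max(group, key=lambda e: (not e.is_control, e.replicates))


def balanced_compendium(exps: List[Experiment], cutoff: float = 0.99) -> List[Experiment]:
    reps = [representative(g) for g in redundant_groups(exps, cutoff)]
    by_design = defaultdict(list)
    for exp in reps:
        by_design[exp.design].append(exp)
    # Keep the same number of experiments for every design type.
    k = min(len(v) for v in by_design.values())
    selected = []
    for design in sorted(by_design):
        selected.extend(sorted(by_design[design], key=lambda e: e.name)[:k])
    return selected
```

The quadratic pairwise loop is acceptable for a few thousand experiments; a production version would more likely compute the experiment-by-experiment correlation matrix in one vectorized step.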

Coexpression Tool

To quantify the similarity in expression profiles, the commonly used Pearson and Spearman correlation coefficients can be calculated. Pearson is a parametric method based on actual expression values, while Spearman is a nonparametric method based on ranks. Both measures range from −1 (anticorrelation) over 0 (no correlation) to 1 (correlation). When only one expression compendium is chosen for coexpression analysis, correlation coefficients are calculated in real time. However, when more than one expression compendium is chosen, it is no longer feasible to do the calculations in real time. Instead, the correlation coefficients are extracted from a database containing precalculated Pearson correlation coefficients higher than 0.6 and lower than −0.6 (this limitation is due to space and time constraints).
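The difference between the two coefficients, and the ±0.6 storage rule for precalculated values, can be illustrated in a few lines. The sketch computes Spearman's coefficient as the Pearson coefficient of average ranks (a standard definition); the function names are hypothetical and the code is not taken from CORNET.

```python
import math
from statistics import fmean
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Parametric: uses the expression values themselves (NaN for constant profiles)."""
    mx, my = fmean(x), fmean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return sum(a * b for a, b in zip(dx, dy)) / denom if denom else float("nan")


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Nonparametric: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def keep_in_precalculated_table(r: float, cutoff: float = 0.6) -> bool:
    """Mirror the stated storage rule: keep r > 0.6 or r < -0.6."""
    return r > cutoff or r < -cutoff


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic but nonlinear in x
```

For this monotonic but nonlinear pair, `spearman(x, y)` is exactly 1 while `pearson(x, y)` is slightly below 1, which is the practical difference between rank-based and value-based similarity.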

Arabidopsis Genome Initiative codes are used to describe the genes. Only genes represented on the Affymetrix ATH1 array are taken into account in the coexpression analysis (in contrast to the PPI tool, where all proteins are taken into account). In total, 20,777 genes are covered using an alternative Chip Description File that takes into account possible cross-hybridization (tinesath1cdf; Casneuf et al., 2007). There is no limit to the number of genes that can be given as query.

PPI Tool

We have assembled the currently available experimentally identified PPIs for Arabidopsis (Arabidopsis thaliana) from BIND (Bader et al., 2003), IntAct (Hermjakob et al., 2004), BioGRID (Stark et al., 2006), DIP (Salwinski et al., 2004), MINT (Chatr-aryamontri et al., 2007), and The Arabidopsis Information Resource (Rhee et al., 2003), the predicted PPIs from BAR (Geisler-Lee et al., 2007) and AtPID (Cui et al., 2008), and the filtered (high-stringency) and predicted (low-stringency) interactions identified in our own study (De Bodt et al., 2009; Table II). To help users gauge the reliability of the PPIs, we distinguish between experimental and predicted PPIs and indicate the different data sources (database, experiment type, evidence code, and PubMed identifier) as edge attributes in Cytoscape (see “Results and Discussion”; Shannon et al., 2003). Where possible, gene names are mapped onto the nodes of the network; otherwise, gene descriptions are shown. Gene names and full gene descriptions downloaded from The Arabidopsis Information Resource are stored as “description” and “descriptionLong” attributes in Cytoscape.

Gene Information

To allow easy interpretation of the constructed networks, we add localization data and functional information to the genes. The localization data are a collection of both experimental and predicted localizations retrieved from SUBA (Heazlewood et al., 2007), IPSort (Bannai et al., 2002), LocTree (Nair and Rost, 2005), MITOPRED (Guda et al., 2004), MitoProt (Claros, 1995), MultiLoc (Høglund et al., 2006), PeroxP (Emanuelsson et al., 2003), Predotar (Small et al., 2004), SubLoc (Chen et al., 2006), TargetP (Emanuelsson et al., 2007), and WoLF_PSORT (Horton et al., 2007; Table II). These localization data are depicted in pie charts, allowing multiple localizations for one gene. The sizes of the pie chart slices are based on the fraction of databases in which a particular localization was found. As for PPIs, sources for localization data are reported in the Cytoscape attributes. In addition to the localization data, we integrate InterPro protein domain information (Hunter et al., 2009) and GO Biological Process and GO Molecular Function data (Harris et al., 2004), which can be viewed as node attributes in Cytoscape (see “Results and Discussion”; Shannon et al., 2003).
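One plausible reading of the pie-chart rule is sketched below: count, per compartment, how many sources report it, and rescale the counts so that the slices sum to 1. The rescaling and the input layout are assumptions, and the example compartments are invented; only the source names come from the list above.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Set


def pie_fractions(predictions: Dict[str, Set[str]]) -> Dict[str, Fraction]:
    """Slice sizes for one protein's localization pie chart.

    `predictions` maps each localization source to the compartments it reports.
    Each slice is proportional to the number of sources reporting that
    compartment; slices are rescaled to sum to 1 (an assumption of this sketch).
    """
    counts = Counter(loc for locs in predictions.values() for loc in locs)
    total = sum(counts.values())
    return {loc: Fraction(n, total) for loc, n in counts.items()}


# Hypothetical predictions for one protein.
predictions = {
    "SUBA": {"nucleus"},
    "TargetP": {"chloroplast"},
    "WoLF_PSORT": {"nucleus"},
}
slices = pie_fractions(predictions)
```

Using `Fraction` keeps the slice sizes exact, so they sum to exactly 1 before being converted to angles for drawing.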

Supplemental Data

The following materials are available in the online version of this article.

Supplemental Figure S1. Screen shots of CORNET.

Supplemental Table S1. Group I genes in the DELLA network (Fig. 4A).

Supplemental Table S2. Group II genes in the DELLA network (Fig. 4B).

Supplemental Files S1. Cytoscape session files of the DELLA networks (http://bioinformatics.psb.ugent.be/supplementary_data/stbod/cornet/).

ACKNOWLEDGMENTS

We thank Klaas Vandepoele and Fabio Fiorani for fruitful discussions, our colleagues for helpful comments on the tool, Stijn Vermeir, Thomas Van Parys, and Lieven Baeyens for technical assistance, and Martine De Cock for help in preparing the manuscript. We acknowledge three anonymous reviewers for helpful comments on the manuscript.

Received September 8, 2009; accepted January 4, 2010; published January 6, 2010.

LITERATURE CITED

Achard P, Cheng H, De Grauwe L, Decat J, Schoutteten H, Moritz T, Van Der Straeten D, Peng J, Harberd NP (2006) Integration of plant responses to environmentally activated phytohormonal signals. Science 311: 91–94

Achard P, Genschik P (2009) Releasing the brakes of plant growth: how GAs shutdown DELLA proteins. J Exp Bot 60: 1085–1092

Achard P, Liao L, Jiang C, Desnos T, Bartlett J, Fu X, Harberd NP (2007) DELLAs contribute to plant photomorphogenesis. Plant Physiol 143: 1163–1172

Akiyama K, Chikayama E, Yuasa H, Shimada Y, Tohge T, Shinozaki K, Hirai MY, Sakurai T, Kikuchi J, Saito K (2008) PRIMe: a Web site that assembles tools for metabolomics and transcriptomics. In Silico Biol 8: 339–345

Ariizumi T, Murase K, Sun TP, Steber CM (2008) Proteolysis-independent downregulation of DELLA repression in Arabidopsis by the gibberellin receptor GIBBERELLIN INSENSITIVE DWARF1. Plant Cell 20: 2447–2459

Avraham S, Tung CW, Ilic K, Jaiswal P, Kellogg EA, McCouch S, Pujar A, Reiser L, Rhee SY, Sachs MM, et al (2008) The Plant Ontology Database: a community resource for plant structure and developmental stages controlled vocabulary and annotations. Nucleic Acids Res 36: D449–D454

Bader GD, Betel D, Hogue CWV (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res 31: 248–250

Bannai H, Tamada Y, Maruyama O, Nakai K, Miyano S (2002) Extensive feature detection of N-terminal protein sorting signals. Bioinformatics 18: 298–305

Barrett T, Edgar R (2006) Gene expression omnibus: microarray data storage, submission, retrieval, and analysis. Methods Enzymol 411: 352–369

Brady SM, Provart NJ (2009) Web-queryable large-scale data sets for hypothesis generation in plant biology. Plant Cell 21: 1034–1051

Brazma A (2009) Minimum Information About a Microarray Experiment (MIAME): successes, failures, challenges. ScientificWorldJournal 9: 420–423

Brown DM, Zeef LAH, Ellis J, Goodacre R, Turner SR (2005) Identification of novel genes in Arabidopsis involved in secondary cell wall formation using expression profiling and reverse genetics. Plant Cell 17: 2281–2295

Bruskiewich R, Coe EH, Jaiswal P, McCouch S, Polacco M, Stein L, Vincent L, Ware D (2002) The plant ontology consortium and plant ontologies. Comp Funct Genomics 3: 137–142

Cao D, Cheng H, Wu W, Soo HM, Peng J (2006) Gibberellin mobilizes distinct DELLA-dependent transcriptomes to regulate seed germination and floral development in Arabidopsis. Plant Physiol 142: 509–525

Casneuf T, Van de Peer Y, Huber W (2007) In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. BMC Bioinformatics 8: 461

Chatr-aryamontri A, Ceol A, Montecchi Palazzi L, Nardelli G, Schneider MV, Castagnoli L, Cesareni G (2007) MINT: the Molecular INTeraction database. Nucleic Acids Res 35: D572–D574

Chen H, Huang N, Sun Z (2006) SubLoc: a server/client suite for protein subcellular location based on SOAP. Bioinformatics 22: 376–377

Cheng H, Qin L, Lee S, Fu X, Richards DE, Cao D, Luo D, Harberd NP, Peng J (2004) Gibberellin regulates Arabidopsis floral development via suppression of DELLA protein function. Development 131: 1055–1064

Claros MG (1995) MitoProt, a Macintosh application for studying mitochondrial proteins. Comput Appl Biosci 11: 441–447

Cui J, Li P, Li G, Xu F, Zhao C, Li Y, Yang Z, Wang G, Yu Q, Li Y, et al (2008) AtPID: Arabidopsis thaliana protein interactome database. An integrative platform for plant systems biology. Nucleic Acids Res 36: D999–D1008

De Bodt S, Proost S, Vandepoele K, Rouze P, Van de Peer Y (2009) Predicting protein-protein interactions in Arabidopsis thaliana through integration of orthology, gene ontology and co-expression. BMC Genomics 10: 288

de Lucas M, Daviere JM, Rodríguez-Falcon M, Pontin M, Iglesias-Pedraz JM, Lorrain S, Fankhauser C, Blazquez MA, Titarenko E, Prat S (2008) A molecular framework for light and gibberellin control of cell elongation. Nature 451: 480–484

de Nooy W, Mrvar A, Batagelj V (2005) Exploratory Social Network Analysis with Pajek. Cambridge University Press, Cambridge, UK

Dill A, Sun TP (2001) Synergistic derepression of gibberellin signaling by removing RGA and GAI function in Arabidopsis thaliana. Genetics 159: 777–785

Dill A, Thomas SG, Hu J, Steber CM, Sun TP (2004) The Arabidopsis F-box protein SLEEPY1 targets gibberellin signaling repressors for gibberellin-induced degradation. Plant Cell 16: 1392–1405

Emanuelsson O, Brunak S, von Heijne G, Nielsen H (2007) Locating proteins in the cell using TargetP, SignalP and related tools. Nat Protoc 2: 953–971

Emanuelsson O, Elofsson A, von Heijne G, Cristobal S (2003) In silico prediction of the peroxisomal proteome in fungi, plants and animals. J Mol Biol 330: 443–456

Feng S, Martinez C, Gusmaroli G, Wang Y, Zhou J, Wang F, Chen L, Yu L, Iglesias-Pedraz JM, Kircher S, et al (2008) Coordinated regulation of Arabidopsis thaliana development by light and gibberellins. Nature 451: 475–479

Fu X, Richards DE, Fleck B, Xie D, Burton N, Harberd NP (2004) The Arabidopsis mutant sleepy1gar2-1 protein promotes plant growth by increasing the affinity of the SCFSLY1 E3 ubiquitin ligase for DELLA protein substrates. Plant Cell 16: 1406–1418

Gachon CMM, Langlois-Meurinne M, Henry Y, Saindrenan P (2005)

Transcriptional co-regulation of secondary metabolism enzymes in

Arabidopsis: functional and evolutionary implications. Plant Mol Biol

58: 229–245

Gautier L, Cope L, Bolstad BM, Irizarry RA (2004) affy: analysis of

Affymetrix GeneChip data at the probe level. Bioinformatics 20: 307–315

Geisler-Lee J, O’Toole N, Ammar R, Provart NJ, Millar AH, Geisler M

(2007) A predicted interactome for Arabidopsis. Plant Physiol 145:

317–329

Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S,

Ellis B, Gautier L, Ge Y, Gentry J, et al (2004) Bioconductor: open

software development for computational biology and bioinformatics.

Genome Biol 5: R80

Goda H, Sasaki E, Akiyama K, Maruyama-Nakashita A, Nakabayashi K,

Li W, Ogawa M, Yamauchi Y, Preston J, Aoki K, et al (2008) The

AtGenExpress hormone and chemical treatment data set: experimental

design, data evaluation, model data analysis and data access. Plant J 55:

526–542

Goldovsky L, Cases I, Enright AJ, Ouzounis CA (2005) BioLayout(Java):

versatile network visualisation of structural and functional relation-

ships. Appl Bioinformatics 4: 71–74

Griffiths J, Murase K, Rieu I, Zentella R, Zhang ZL, Powers SJ, Gong F,

Phillips AL, Hedden P, Sun TP, et al (2006) Genetic characterization

and functional analysis of the GID1 gibberellin receptors in Arabidopsis.

Plant Cell 18: 3399–3414

Guda C, Fahy E, Subramaniam S (2004) MITOPRED: a genome-scale

method for prediction of nucleus-encoded mitochondrial proteins.

Bioinformatics 20: 1785–1794

Gutierrez RA, Lejay LV, Dean A, Chiaromonte F, Shasha DE, Coruzzi GM

(2007) Qualitative network models and genome-wide expression data

define carbon/nitrogen-responsive molecular machines in Arabidopsis.

Genome Biol 8: R7

Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck

K, Lewis S, Marshall B, Mungall C, et al (2004) The Gene Ontology

(GO) database and informatics resource. Nucleic Acids Res 32:

D258–D261

Heazlewood JL, Verboom RE, Tonti-Filippini J, Small I, Millar AH (2007) SUBA: the Arabidopsis Subcellular Database. Nucleic Acids Res 35: D213–D218

Hermjakob H, Montecchi-Palazzi L, Lewington C, Mudali S, Kerrien S, Orchard S, Vingron M, Roechert B, Roepstorff P, Valencia A, et al (2004) IntAct: an open source molecular interaction database. Nucleic Acids Res 32: D452–D455

Høglund A, Dönnes P, Blum T, Adolph HW, Kohlbacher O (2006) MultiLoc: prediction of protein subcellular localization using N-terminal targeting sequences, sequence motifs and amino acid composition. Bioinformatics 22: 1158–1165

Horan K, Jang C, Bailey-Serres J, Mittler R, Shelton C, Harper JF, Zhu JK, Cushman JC, Gollery M, Girke T (2008) Annotating genes of known and unknown function by large-scale coexpression analysis. Plant Physiol 147: 41–57

Horton P, Park KJ, Obayashi T, Fujita N, Harada H, Adams-Collier CJ, Nakai K (2007) WoLF PSORT: protein localization predictor. Nucleic Acids Res 35: W585–W587

Hunter S, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Das U, Daugherty L, Duquenne L, et al (2009) InterPro: the integrative protein signature database. Nucleic Acids Res 37: D211–D215

Ilic K, Kellogg EA, Jaiswal P, Zapata F, Stevens PF, Vincent LP, Avraham S, Reiser L, Pujar A, Sachs MM, et al (2007) The plant structure ontology, a unified vocabulary of anatomy and morphology of a flowering plant. Plant Physiol 143: 587–599

Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP (2003a) Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 31: e15

Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP (2003b) Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4: 249–264

Kilian J, Whitehead D, Horak J, Wanke D, Weinl S, Batistic O, D’Angelo C, Bornberg-Bauer E, Kudla J, Harter K (2007) The AtGenExpress global stress expression data set: protocols, evaluation and model data analysis of UV-B light, drought and cold stress responses. Plant J 50: 347–363

King KE, Moritz T, Harberd NP (2001) Gibberellins are not required for normal stem growth in Arabidopsis thaliana in the absence of GAI and RGA. Genetics 159: 767–776

Lee S, Cheng H, King KE, Wang W, He Y, Hussain A, Lo J, Harberd NP, Peng J (2002) Gibberellin regulates Arabidopsis seed germination via RGL2, a GAI/RGA-like gene whose expression is up-regulated following imbibition. Genes Dev 16: 646–658

Lisso J, Steinhauser D, Altmann T, Kopka J, Müssig C (2005) Identification of brassinosteroid-related genes by means of transcript co-response analyses. Nucleic Acids Res 33: 2685–2696

Manfield IW, Jen CH, Pinney JW, Michalopoulos I, Bradford JR, Gilmartin PM, Westhead DR (2006) Arabidopsis Co-expression Tool (ACT): Web server tools for microarray-based gene expression analysis. Nucleic Acids Res 34: W504–W509

Mutwil M, Øbro J, Willats WGT, Persson S (2008) GeneCAT: novel Webtools that combine BLAST and co-expression analyses. Nucleic Acids Res 36: W320–W326

Nair R, Rost B (2005) Mimicking cellular sorting improves prediction of subcellular localization. J Mol Biol 348: 85–100

Nakajima M, Shimada A, Takashi Y, Kim YC, Park SH, Ueguchi-Tanaka M, Suzuki H, Katoh E, Iuchi S, Kobayashi M, et al (2006) Identification and characterization of Arabidopsis gibberellin receptors. Plant J 46: 880–889

Obayashi T, Hayashi S, Saeki M, Ohta H, Kinoshita K (2009) ATTED-II provides coexpressed gene networks for Arabidopsis. Nucleic Acids Res 37: D987–D991

Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, Saeki M, Shibata D, Saito K, Ohta H (2007) ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Res 35: D863–D869

Pujar A, Jaiswal P, Kellogg EA, Ilic K, Vincent L, Avraham S, Stevens P, Zapata F, Reiser L, Rhee SY, et al (2006) Whole-plant growth stage ontology for angiosperms and its application in plant biology. Plant Physiol 142: 414–428

Rautengarten C, Steinhauser D, Büssis D, Stintzi A, Schaller A, Kopka J, Altmann T (2005) Inferring hypotheses on functional relationships of genes: analysis of the Arabidopsis thaliana subtilase gene family. PLoS Comput Biol 1: e40

Rhee SY, Beavis W, Berardini TZ, Chen G, Dixon D, Doyle A, Garcia-Hernandez M, Huala E, Lander G, Montoya M, et al (2003) The Arabidopsis Information Resource (TAIR): a model organism database providing a centralized, curated gateway to Arabidopsis biology, research materials and community. Nucleic Acids Res 31: 224–228

Salwinski L, Miller CS, Smith AJ, Pettit FK, Bowie JU, Eisenberg D (2004) The Database of Interacting Proteins: 2004 update. Nucleic Acids Res 32: D449–D451

Schmid M, Davison TS, Henz SR, Pape UJ, Demar M, Vingron M, Schölkopf B, Weigel D, Lohmann JU (2005) A gene expression map of Arabidopsis thaliana development. Nat Genet 37: 501–506

Schwechheimer C (2008) Understanding gibberellic acid signaling: are we there yet? Curr Opin Plant Biol 11: 9–15

Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T (2003) Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 13: 2498–2504

Small I, Peeters N, Legeai F, Lurin C (2004) Predotar: a tool for rapidly screening proteomes for N-terminal targeting sequences. Proteomics 4: 1581–1590

Srinivasasainagendra V, Page GP, Mehta T, Coulibaly I, Loraine AE (2008) CressExpress: a tool for large-scale mining of expression data from Arabidopsis. Plant Physiol 147: 1004–1016

Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M (2006) BioGRID: a general repository for interaction datasets. Nucleic Acids Res 34: D535–D539

Steinhauser D, Usadel B, Luedemann A, Thimm O, Kopka J (2004) CSB.DB: a comprehensive systems-biology database. Bioinformatics 20: 3647–3651

Taylor CF, Paton NW, Lilley KS, Binz PA, Julian RK Jr, Jones AR, Zhu W, Apweiler R, Aebersold R, Deutsch EW, et al (2007) The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 25: 887–893



Toufighi K, Brady SM, Austin R, Ly E, Provart NJ (2005) The Botany Array Resource: e-northerns, expression angling, and promoter analyses. Plant J 43: 153–163

Tyler L, Thomas SG, Hu J, Dill A, Alonso JM, Ecker JR, Sun TP (2004) DELLA proteins and gibberellin-regulated seed germination and floral development in Arabidopsis. Plant Physiol 135: 1008–1019

Usadel B, Obayashi T, Mutwil M, Giorgi FM, Bassel GW, Tanimoto M, Chow A, Steinhauser D, Persson S, Provart NJ (2009) Co-expression tools for plant biology: opportunities for hypothesis generation and caveats. Plant Cell Environ 32: 1633–1651

Vandepoele K, Quimbaya M, Casneuf T, De Veylder L, Van de Peer Y (2009) Unraveling transcriptional control in Arabidopsis using cis-regulatory elements and coexpression networks. Plant Physiol 150: 535–546

Whetzel PL, Parkinson H, Causton HC, Fan L, Fostel J, Fragoso G, Game L, Heiskanen M, Morrison N, Rocca-Serra P, et al (2006) The MGED Ontology: a resource for semantics-based description of microarray experiments. Bioinformatics 22: 866–873

Yu H, Ito T, Zhao Y, Peng J, Kumar P, Meyerowitz EM (2004) Floral homeotic genes are targets of gibberellin signaling in flower development. Proc Natl Acad Sci USA 101: 7827–7832

Zimmermann P, Hirsch-Hoffmann M, Hennig L, Gruissem W (2004) GENEVESTIGATOR: Arabidopsis microarray database and analysis toolbox. Plant Physiol 136: 2621–2632

CORNET: Coexpression and Interaction Networks

Plant Physiol. Vol. 152, 2010 1179