

e-Laboratory for Interdisciplinary Collaborative Data Mining

http://www.e-lico.eu
An EU-FP7 Collaborative Project (2009-2012)
Theme ICT-4.4: Intelligent Content and Semantics

The RapidMiner Plugin for Taverna: bringing Data Mining Tools to Bioinformatics Workflows

Simon Jupp¹, James Eales¹, Simon Fischer², Sebastian Land², Rishi Ramgolam¹, Alan Williams¹ and Robert Stevens¹

Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing data. Data analysis involves gathering and processing data, possibly from many sources, even before the analysis for the central biological question takes place.

• Taverna (http://www.taverna.org.uk) is a workflow management system that allows bioinformaticians to create data pipelines involving distributed Web services and other forms of tool; these workflows gather and manage data in order to perform analyses that ask biological questions.
• RapidMiner (RM, http://rapid-i.com) is an open source, cross-platform application, released under the AGPLv3, that brings a large suite of data processing, visualization and data mining tools to bear upon tables of data, such as those that can be gathered by Taverna.

Through the RM plugin for Taverna we have combined the ability to gather and process data from many molecular biological sources with RM’s data mining capabilities to provide a powerful tool for scientific analysis.

• More than 350 data mining operators exposed as services in Taverna.
• Operators for data transformation, pre-processing, machine learning, text-mining, reporting and visualisation.
• Powered by the RapidAnalytics enterprise server.
• Data repository browser supporting upload, download and sharing of large data files.
• Data passing by reference: operators are executed on data stored in the repository, for increased scalability with large datasets.
• Web interface allowing the generation of reports and results.
• Generation and retrieval of metadata.
• See workflows demonstrating some typical bioinformatics tasks at http://www.myexperiment.org/groups/402
• For more information, software and instructions, please visit http://www.e-lico.eu/TavernaRM
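
The idea of data passing by reference listed above can be illustrated with a minimal Python sketch. It is not the plugin's actual API: the `Repository` class, the `run_operator` helper, the two toy operators and the `/demo/...` locations are all hypothetical names chosen for illustration. The sketch only shows the general pattern, in which each step receives and returns a repository location, so the table itself stays in the repository until a result is explicitly downloaded.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# A table is modelled as a list of rows; cells may be missing (None).
Table = List[Dict[str, Optional[float]]]


@dataclass
class Repository:
    """Hypothetical in-memory stand-in for a server-side data repository."""
    _store: Dict[str, Table] = field(default_factory=dict)

    def upload(self, location: str, table: Table) -> str:
        # Store the table and hand back only its location (the reference).
        self._store[location] = table
        return location

    def download(self, location: str) -> Table:
        return self._store[location]


def run_operator(repo: Repository,
                 operator: Callable[[Table], Table],
                 input_ref: str,
                 output_ref: str) -> str:
    """Apply an operator to repository data; only references cross the boundary."""
    repo.upload(output_ref, operator(repo.download(input_ref)))
    return output_ref


def drop_missing(table: Table) -> Table:
    # Toy pre-processing operator: discard rows that contain a missing value.
    return [row for row in table if all(v is not None for v in row.values())]


def scale_expression(table: Table) -> Table:
    # Toy transformation operator: divide "expression" by its maximum.
    max_value = max(row["expression"] for row in table)
    return [{**row, "expression": row["expression"] / max_value} for row in table]


repo = Repository()
raw = repo.upload("/demo/raw", [
    {"gene_id": 1.0, "expression": 4.0},
    {"gene_id": 2.0, "expression": None},
    {"gene_id": 3.0, "expression": 8.0},
])

# The workflow chains operators by passing locations, not tables.
clean = run_operator(repo, drop_missing, raw, "/demo/clean")
scaled = run_operator(repo, scale_expression, clean, "/demo/scaled")

# Data is transferred only when the final result is requested.
result = repo.download(scaled)
```

In this sketch the workflow side only ever holds short strings such as `/demo/clean`, which is why the pattern scales to large datasets: the cost of moving a table is paid once, at download time, rather than at every step.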

Licence: GNU Lesser General Public Licence (LGPL) 2.1
Source code: http://taverna.googlecode.com/svn/unsorted/taverna-elico/

¹University of Manchester (UK), ²Rapid-I GmbH (Germany), University of Zurich, University of Geneva (Switzerland), Inserm (France), Josef Stefan Institute (Slovenia), National Hellenic Research Foundation (Greece), Poznan University of Technology (Poland), Ruder Boskovic Institute (Croatia)