BIOMAJ: WORKFLOW ENGINE DEDICATED TO BIO-DATA SYNCHRONIZATION AND PROCESSING
Sana Anam Roll # 3003
Bs(Hons) Botany 3rd semester Eve
Submitted to Inam ul Haq
University of Education
CONTENT
INTRODUCTION
BACKGROUND OF BIOMAJ
APPLICATION
BIOMAJ PROVIDES
CONCLUSION
REFERENCES
INTRODUCTION
In biocomputing, analyses almost always rely on databanks. Any biocomputing site therefore needs to manage these invaluable databanks, which hold a huge amount of information (usually several terabytes), are spread over various international sites, and must be kept in a consistent format (several different standards are still in use).
BACKGROUND OF BIOMAJ
The BioMAJ project came out of the work of three teams in 2005: INRIA Rennes, and INRA Toulouse and Jouy-en-Josas. At the time, no free application met users' requirements. The closest was citrina, developed by Josh Goodman (from Washington University's GMOD project). It was a promising prototype, but still quite far from the application required, and it had not been updated since 2004.
In 2006, these teams (INRIA and INRA) developed a new engine called BioMAJ. Although it was based on citrina 0.51, nearly all the code was rewritten, and the application's architecture and functions were completely rethought and considerably extended.
During 2007, the application was tested on the three sites involved in the project to make it more robust and suitable
APPLICATION
Synchronization:
Multiple remote protocols (ftp, sftp, http, rsync, local copy)
Data transfer integrity check
Release versioning using an incremental approach
Multithreading
Data extraction (gzip, tar, bzip)
Data tree directory normalization
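The integrity check and extraction steps listed above can be illustrated with a minimal Python sketch. It is not BioMAJ's own code: the function names `md5_of` and `verify_and_extract`, the choice of MD5, and the idea that the expected checksum is published next to the archive are all assumptions made for illustration.

```python
import gzip
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_and_extract(archive: Path, expected_md5: str, out_dir: Path) -> Path:
    """Check a downloaded .gz file against its checksum, then decompress it.

    Hypothetical helper: raises ValueError if the transfer looks corrupted.
    """
    if md5_of(archive) != expected_md5:
        raise ValueError(f"integrity check failed for {archive.name}")
    out_dir.mkdir(parents=True, exist_ok=True)
    # "bank.fasta.gz" -> "bank.fasta"
    target = out_dir / archive.stem
    with gzip.open(archive, "rb") as src, target.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

Checking before extracting means a corrupted transfer is rejected before anything is written to the output directory.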
Pre- and post-processing:
Advanced workflow description (D.A.G.) using an easy, normalized syntax language
Post-process indexation for various bioinformatics software (blast, srs, fastacmd, readseq, etc.)
Easy integration of personal scripts for bank post-processing automation
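To show what describing post-processing as a directed acyclic graph (D.A.G.) means in practice, here is a small Python sketch. The step names and the dictionary layout are hypothetical and do not reproduce BioMAJ's own workflow syntax. The sketch uses the standard-library `graphlib` module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical post-processes: each step maps to the steps it depends on.
post_processes = {
    "extract": set(),
    "index_blast": {"extract"},
    "index_srs": {"extract"},
    "publish": {"index_blast", "index_srs"},
}


def execution_order(dag):
    """Return one valid sequential order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


def parallel_batches(dag):
    """Group steps into batches whose members could run concurrently."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

With the graph above, the two indexing steps fall into the same batch, so they could be run in parallel once extraction has finished.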
Supervision:
Administration web interface
Repository statistics
Mail alerts for update cycle supervision
BIOMAJ PROVIDES
A reliable workflow engine that can download remote data automatically and intelligently
(error correction, synchronization of local and remote data), apply formatting to this data and
put it into production (make the data available for all users and/or applications).
A group of predefined workflows for the main biological banks.
A library of indexing scripts (formatting for biological data).
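The synchronization of local and remote data mentioned above can be sketched as a comparison of two file listings. This is an illustrative Python sketch under stated assumptions, not BioMAJ's actual algorithm: the `FileInfo` record, the `plan_sync` function, and the rule "same size and modification time means unchanged" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical listing entry: size in bytes and modification time."""
    size: int
    mtime: float


def plan_sync(remote, local):
    """Decide what to do with each file, given remote and local listings.

    Both arguments map a file name to a FileInfo. A file is downloaded when
    it is missing locally or differs from the remote copy, kept when it is
    identical, and deleted when it no longer exists on the remote site.
    """
    download = sorted(n for n, info in remote.items() if local.get(n) != info)
    keep = sorted(n for n, info in remote.items() if local.get(n) == info)
    delete = sorted(n for n in local if n not in remote)
    return {"download": download, "keep": keep, "delete": delete}
```

Only the files in the `download` list need to be transferred, which is the point of an incremental update: unchanged files are reused from the previous release.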
CONCLUSION
BioMAJ provides flexibility in managing banks of sequences on a site while allowing for rapid implementation of new workflows by simply creating a bank description file.
REFERENCES
Website: http://biomaj.genouest.org/
Authors: David Allouche, Olivier Filangi, Romaric Sabas, Olivier Sallou ([email protected])