BIOMAJ: WORKFLOW ENGINE DEDICATED TO BIO-DATA SYNCHRONIZATION AND PROCESSING

Sana Anam, Roll # 3003
BS (Hons) Botany, 3rd semester (Evening)
Submitted to Inam ul Haq
University of Education

CONTENT

INTRODUCTION
BACKGROUND OF BIOMAJ
APPLICATION
BIOMAJ PROVIDES
CONCLUSION
REFERENCES

INTRODUCTION

In biocomputing, analyses almost always rely on databanks. Any biocomputing site therefore needs to manage these invaluable databanks, which hold a huge amount of information (usually several terabytes) and are spread over various international sites, and to keep them in a consistent format, since several different standards are still in use.

BACKGROUND OF BIOMAJ

The BioMAJ project came out of the work of three teams in 2005: INRIA Rennes, INRA Toulouse and Jouy-en-Josas.

At the time, no free application met users' requirements. The closest was citrina, developed by Josh Goodman (from Washington University's GMOD project). It was a promising prototype, but still quite far from the application required, and it had not been updated since 2004.

In 2006, these teams (INRIA and INRA) developed a new engine called BioMAJ. Although it was based on citrina 0.51, nearly all the code was rewritten, and the application's architecture and functions were completely rethought and considerably extended.

During 2007, the application was tested on the three sites involved in the project to make it more robust and suitable.

APPLICATION

Synchronization:

  • Multiple remote protocols (ftp, sftp, http, rsync, local copy)
  • Data transfer integrity check
  • Release versioning using an incremental approach
  • Multi-threading
  • Data extraction (gzip, tar, bzip)
  • Data tree directory normalization
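To make the idea of synchronizing local and remote data more concrete, here is a minimal sketch in Python. It is not BioMAJ's actual code: the `FileInfo` type, the `plan_sync` function and the file names are all hypothetical, and the rule used (re-download when a file is missing, differs in size, or is older than the remote copy) is only one plausible assumption about how such a check could work.

```python
from typing import Dict, List, NamedTuple


class FileInfo(NamedTuple):
    """Hypothetical description of one file in a listing."""
    size: int      # size in bytes
    mtime: float   # last-modification time, seconds since the epoch


def plan_sync(remote: Dict[str, FileInfo],
              local: Dict[str, FileInfo]) -> Dict[str, List[str]]:
    """Compare a remote listing with a local one and decide what to do."""
    # Download files that are missing locally, differ in size, or are stale.
    download = sorted(
        name for name, info in remote.items()
        if name not in local
        or local[name].size != info.size
        or local[name].mtime < info.mtime
    )
    # Remote files that need no download are already up to date locally.
    keep = sorted(name for name in remote if name not in download)
    # Local files that no longer exist remotely can be removed.
    delete = sorted(name for name in local if name not in remote)
    return {"download": download, "keep": keep, "delete": delete}


# Hypothetical listings used only for illustration.
remote = {
    "a.fasta.gz": FileInfo(100, 20.0),
    "b.fasta.gz": FileInfo(250, 30.0),
    "c.fasta.gz": FileInfo(75, 10.0),
}
local = {
    "a.fasta.gz": FileInfo(100, 20.0),
    "b.fasta.gz": FileInfo(200, 15.0),
    "old.fasta.gz": FileInfo(10, 1.0),
}
plan = plan_sync(remote, local)
print(plan)
```

Working from listings rather than from the files themselves keeps the decision step separate from the transfer step, so the same plan could in principle be carried out over any of the protocols listed above.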

Pre & Post processing:

  • Advanced workflow description (D.A.G.) using an easy, normalized syntax
  • Post-process indexation for various bioinformatics software (blast, srs, fastacmd, readseq, etc.)
  • Easy integration of personal scripts for bank post-processing automation
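Describing post-processing as a D.A.G. (directed acyclic graph) means that each step names the steps it depends on, and the engine works out a valid execution order. The sketch below illustrates this with Python's standard `graphlib` module (Python 3.9 or later). The step names and the `execution_order` helper are hypothetical and do not reflect BioMAJ's own syntax or implementation.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical post-processes: each step maps to the set of steps
# that must finish before it can start.
post_processes = {
    "extract": set(),
    "blast_index": {"extract"},
    "srs_index": {"extract"},
    "publish": {"blast_index", "srs_index"},
}


def execution_order(dag):
    """Return one valid order in which every step runs after its dependencies."""
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(dag).static_order())


order = execution_order(post_processes)
print(order)
```

Steps with no dependency between them, such as the two indexing steps here, may appear in either order, which is also what would allow an engine to run them in parallel.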

Supervision:

  • Administration web interface
  • Repository statistics
  • Mail alerts for the update cycle supervision

BIOMAJ PROVIDES

  • A reliable workflow engine that can download remote data automatically and intelligently (error correction, synchronization of local and remote data), apply formatting to this data and put it into production (make the data available to all users and/or applications).
  • A group of predefined workflows for the main biological banks.
  • A library of indexing scripts (formatting for biological data).

CONCLUSION

BioMAJ provides flexibility in managing banks of sequences on a site while allowing for rapid implementation of new workflows by simply creating a bank description file.

REFERENCES

Website: http://biomaj.genouest.org/

Authors: David Allouche, Olivier Filangi, Romaric Sabas, Olivier Sallou