19
A High-Throughput A High-Throughput Bioinformatics Bioinformatics Distributed Distributed Computing Platform Computing Platform UTE OF INFORMATION TECHNOLOGY (IIT), UNIVERSITY OF 1 19-09-2012 19-09-2012 A high-throughput bioinformatics distributed computing

A High Throughput Bioinformatics Distributed Computing Platform

Embed Size (px)

DESCRIPTION

A Research Paper review presentation on "A High throughput bioinformatics distribute computing platform", presented by Md. Habibur Rahman, BIT0216, Institute of Information Technology University of Dhaka.

Citation preview

Page 1: A High Throughput Bioinformatics Distributed Computing Platform

A High-Throughput A High-Throughput Bioinformatics Distributed Bioinformatics Distributed

Computing PlatformComputing Platform

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

1119-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 2: A High Throughput Bioinformatics Distributed Computing Platform

Presented by-Presented by-

Md. Habibur RahmanMd. Habibur Rahman

BIT 0216BIT 0216

Institute of Information TechnologyInstitute of Information Technology

University of DhakaUniversity of Dhaka

BangladeshBangladesh

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

2219-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 3: A High Throughput Bioinformatics Distributed Computing Platform

The contributors of the paper The contributors of the paper

Thomas M. Keane, Andrew J. Page, James O. McInerney, Thomas M. Keane, Andrew J. Page, James O. McInerney, and Thomas J. Naughtonand Thomas J. Naughton

Bioinformatics and Pharmacogenomics Laboratory, Bioinformatics and Pharmacogenomics Laboratory, National University of Ireland, Maynooth, Co. Kildare, National University of Ireland, Maynooth, Co. Kildare,

IrelandIreland

Department of Computer Science, National University of Department of Computer Science, National University of Ireland, Maynooth, Co. Kildare, IrelandIreland, Maynooth, Co. Kildare, Ireland

Homepage: http://www.cs.nuim.ie/distibutedHomepage: http://www.cs.nuim.ie/distibuted

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

3319-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 4: A High Throughput Bioinformatics Distributed Computing Platform

PublicationsPublications

18th IEEE Symposium on Computer-Based Medical System (CBMS’05)

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

4419-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 5: A High Throughput Bioinformatics Distributed Computing Platform

Suitability of Bioinformatics to Suitability of Bioinformatics to Distributed ComputingDistributed Computing

A Class of Algorithmic Parallelism

referred to as coarse-grained parallelism.

High compute-to-data ratio.

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

5519-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 6: A High Throughput Bioinformatics Distributed Computing Platform

Topic and Problem Overview Topic and Problem Overview

Demand for high performance computing has increased dramatically in the area of bioinformatics due to rapid increase in the size of genomic databases.

Traditional database search algorithm was not feasible to perform full search of a large database in a reasonable time.

Feasibility of heuristic algorithm but reduction of sensitivity of search.

Evolutionary biology, phylogenetic tree and greedy heuristic algorithm.

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

6619-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 7: A High Throughput Bioinformatics Distributed Computing Platform

Proposed solutionProposed solution

o According to the writers of the paper---

“We present a general-

purpose programmable distributed computing platform suitable

for deployment in a typical university environment where many

semi-idle desktop PC’s are connected via a network”

The system is fully cross-platform. Two distributed bioinformatics applications:

i) DSEARCHii) DPRml

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

7719-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 8: A High Throughput Bioinformatics Distributed Computing Platform

Proposed solution(cont.)Proposed solution(cont.)

o Java Distributed Computing platform

- Client Server model

- Server controls the resources (database, algorithm

or computer hardware)

- The model is divided into three separate pieces of

software: server, client and remote interface.

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

8819-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 9: A High Throughput Bioinformatics Distributed Computing Platform

Proposed solution(cont.)Proposed solution(cont.)

Fig: Diagram of the complete system

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

9919-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 10: A High Throughput Bioinformatics Distributed Computing Platform

Installation and Deployment

- Consists of three executable JAR files corresponding to

the server, client and remote interface.

- Run the client as a low priority background service.

- Hardware specification: At least Pentium IV processor

- OS compatibility: Windows, Sun Solaris, Mac OSX and

Linux.

Proposed solution(cont.)Proposed solution(cont.)

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

101019-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Page 11: A High Throughput Bioinformatics Distributed Computing Platform

DPRmlDPRml- - Distributed Phylogeny Reconstruction by maximum likelihood

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

111119-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Previous situation:

Maximum likelihood evolution is one the most accurate techniques

for reconstructing phylogenies.

Developed parallel ML programs for reconstructing large and

accurate phylogenetic trees.

Implemented in platform specific language

Page 12: A High Throughput Bioinformatics Distributed Computing Platform

19-09-201219-09-2012 1212

DPRml (cont.)DPRml (cont.)- - Distributed Phylogeny Reconstruction by maximum likelihood

After the development of distributed computing platform:

One of the most general and powerful likelihood-based phylogenetic

tree building program.

Used proven tree building algorithm and phylogenetic Analysis

Library

Possibility of multiple phylogenetic computation.

Platform independent ML program.

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

A high-throughput bioinformatics distributed computing platform

Page 13: A High Throughput Bioinformatics Distributed Computing Platform

19-09-201219-09-2012 1313

DPRml (cont.)DPRml (cont.)- - Distributed Phylogeny Reconstruction by maximum likelihood

Speed up Testing:

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

A high-throughput bioinformatics distributed computing platform

Fig. Speedup achieved by running 6 simultaneous DPRml problems using between 1-40 semi-idle processors.

Page 14: A High Throughput Bioinformatics Distributed Computing Platform

DSEARCHDSEARCH

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

141419-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

Fully cross-platform parallel database search program.

Operates in a master slave environment.

Splitting the database into fixed sized units that are subsequently

searched on the donor machines.,

Page 15: A High Throughput Bioinformatics Distributed Computing Platform

19-09-201219-09-2012 1515

DSEARCH (cont.)DSEARCH (cont.)

Speed up Testing:

Fig. Speedup achieved by DSEARCH running on between 1-80 semi-idle processors.

Using-

- FASTA database file,

- A FASTA query

sequence file.

- A searching scheme

- A configuration file.

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

A high-throughput bioinformatics distributed computing platform

Page 16: A High Throughput Bioinformatics Distributed Computing Platform

My criticism and future work to doMy criticism and future work to do

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

A high-throughput bioinformatics distributed computing platform

161619-09-201219-09-2012

No detail description about how the applications works on the

distributed computing platform.

If we don’t get the spare clock cycle of the semi-idle pc then the

system will not give us the best result.

Failure of interconnected network of the desktop-pc’s will reduce

the performance.

To improve and expand the range of bioinformatics applications for

the system.

Page 17: A High Throughput Bioinformatics Distributed Computing Platform

ConclusionConclusion

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

171719-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

“There should not have any conclusion of

research work, It is a continual process and it will

be continued for the betterment of the human

being.”

Page 18: A High Throughput Bioinformatics Distributed Computing Platform

ANY QUESTION?ANY QUESTION?

19-09-201219-09-2012 1818

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA

A high-throughput bioinformatics distributed computing platform

Page 19: A High Throughput Bioinformatics Distributed Computing Platform

191919-09-201219-09-2012

A high-throughput bioinformatics distributed computing platform

INS

TIT

UTE O

F IN

FOR

MA

TIO

N T

EC

HN

OLO

GY (

IIT),

U

NIV

ER

SIT

Y O

F D

HA

KA