EGEE-II INFSO-RI-031688, Enabling Grids for E-sciencE, www.eu-egee.org
EGEE and gLite are registered trademarks INFSO-RI-508833

The Wisdom Environment

Vincent Bloch CNRS-IN2P3

ACGRID School

Hanoi (Vietnam) November 8th, 2007

Credits: Jean Salzemann

WISDOM initiative

• WISDOM initiative aims to demonstrate the relevance and the impact of the grid approach to address drug discovery for neglected and emerging diseases.

• First achieved experiences:
  – Summer 2005: Wide In Silico Docking On Malaria (WISDOM)
  – Spring 2006: Accelerate drug design against H5N1 neuraminidase
  – Winter 2006: Second data challenge on Malaria

• Partners:
  – Grid infrastructures: EGEE, Auvergrid, TWGrid, EELA, EuChinaGrid, EuMedGrid
  – European projects: Embrace, BioinfoGrid, EGEE
  – Institutes and associations: Fraunhofer SCAI, Academia Sinica of Taiwan, ITB, Unimo University, LPC, CMBA, CERN-ARDA, HealthGrid

Challenges for high throughput virtual docking
Example: data challenge against H5N1 NA

[Diagram: workflow of the data challenge]
• Millions of chemical compounds available in laboratories
• In vitro high-throughput screening: 1$/compound, nearly impossible
• Inputs:
  – 300,000 chemical compounds: ZINC & chemical combinatorial library
  – Target (PDB): neuraminidase (8 structures)
• Molecular docking (Autodock): ~100 CPU years, 600 GB data
• Data challenge on EGEE, Auvergrid, TWGrid: ~6 weeks on ~2000 computers
• Hits sorting and refining
• In vitro screening of 100 hits
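The figures on this slide can be cross-checked with a little arithmetic. The sketch below is only a rough estimate: it assumes one CPU per computer and uses the rounded values from the slide, so the resulting utilisation is indicative and is not a number reported by the data challenge itself. The function names are illustrative.

```python
# Back-of-the-envelope check using the rounded figures from the slide.
CPU_YEARS = 100        # ~100 CPU years of docking
COMPUTERS = 2000       # ~2000 computers (assumed: one CPU each)
ELAPSED_WEEKS = 6      # ~6 weeks of wall-clock time
DAYS_PER_YEAR = 365.25


def ideal_days(cpu_years: float, computers: int) -> float:
    """Wall-clock days needed if every computer were busy all the time."""
    return cpu_years * DAYS_PER_YEAR / computers


def utilisation(cpu_years: float, computers: int, elapsed_days: float) -> float:
    """Fraction of the available CPU time that was spent computing."""
    return ideal_days(cpu_years, computers) / elapsed_days


if __name__ == "__main__":
    ideal = ideal_days(CPU_YEARS, COMPUTERS)
    used = utilisation(CPU_YEARS, COMPUTERS, ELAPSED_WEEKS * 7)
    print(f"ideal duration: {ideal:.1f} days")
    print(f"implied utilisation: {used:.0%}")
```

Under these assumptions the ideal duration is about 18 days, against roughly 42 days observed, which is consistent with the point made later in the talk that the grid process introduces significant delays.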

Issues for the grid-enabled high throughput virtual docking

• Computer-based in silico screening can help to identify the most promising leads for biological tests
  – Involves whole databases (ZINC)
  – Reduces the cost of the trial-and-error approach

• In silico docking is well suited to grid deployment
  – CPU-intensive application
  – Huge amount of output
  – Embarrassingly parallel

• Issues of a large-scale grid deployment
  – The rate of submitted jobs must be carefully monitored
  – The amount of transferred data impacts grid performance
  – The grid process introduces significant delays
  – Licensed software requires a license distribution strategy on the grid

Grid tools used during the data challenges

• WISDOM
  – a workflow of grid job handling: automated job submission, status check and report, error recovery
  – push model job scheduling
  – batch mode job handling
  – http://wisdom.eu-egee.fr

• DIANE
  – a framework for applications with master-worker model
  – pull mode job scheduling
  – interactive mode job handling with flexible failure recovery feature
  – http://cern.ch/diane

Grid components interacting with WISDOM

• The WMS:
  – The user submits jobs via the Workload Management System
  – The goal of the WMS is the distributed scheduling and resource management in a Grid environment.
  – What does it allow Grid users to do?
    • To submit their jobs
    • To get information about their status
    • To cancel them
    • To retrieve their output
  – The WMS tries to optimize the usage of resources as well as execute user jobs as fast as possible

WMS Components

WMS is currently composed of the following parts:
• User Interface (UI): access point for the user to the WMS
• Resource Broker (RB): the broker of Grid resources, responsible for finding the "best" resources to which jobs are submitted
• Job Submission Service (JSS): provides a reliable submission system
• Information Index (BDII): a server (based on LDAP) which collects information about Grid resources; used by the Resource Broker to rank and select resources
• Logging and Bookkeeping services (LB): store job information available for users to query

Grid components interacting with WISDOM

• DMS: Data Management System
  – The user can store files on the grid through the DMS.
  – The goal of the DMS is to virtualize data on the grid and guarantee the security, integrity, and reliability of the data
  – What it allows Grid users to do:
    • Copy files on the Grid
    • Register files on the Grid with a logical name
    • Store and manage metadata related to a file
    • Replicate files on the Grid
    • Delete files on the Grid
    • Retrieve files from the Grid

DMS Components

• LFC (LCG File Catalogue):
  – It is used to register files on the grid
  – LFC provides functionalities to give logical names to files and organize them in directories

• GridFTP:
  – Low-level file transfer protocol
  – Secured and reliable

• AMGA:
  – It is a grid interface for relational databases
  – Can be used to store metadata
  – Can be used as a file catalogue
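The idea of a file catalogue (logical names organised in directories, each pointing to one or more physical replicas) can be illustrated with a small in-memory stand-in. This is not the LFC API: the class, its methods, and the example paths and hosts are all hypothetical and only show the concept.

```python
from collections import defaultdict


class ToyFileCatalogue:
    """In-memory stand-in for a file catalogue: logical name -> replica URLs."""

    def __init__(self):
        self._replicas = defaultdict(set)

    def register(self, lfn: str, replica_url: str) -> None:
        """Attach a physical replica to a logical file name."""
        self._replicas[lfn].add(replica_url)

    def replicas(self, lfn: str) -> list:
        """Return the known replicas of a logical file name, sorted."""
        return sorted(self._replicas.get(lfn, ()))

    def list_dir(self, directory: str) -> list:
        """List the logical names stored under a directory prefix."""
        prefix = directory.rstrip("/") + "/"
        return sorted(name for name in self._replicas if name.startswith(prefix))

    def unregister(self, lfn: str, replica_url: str) -> None:
        """Forget one replica; drop the entry when no replica is left."""
        known = self._replicas.get(lfn)
        if known is None:
            return
        known.discard(replica_url)
        if not known:
            del self._replicas[lfn]


if __name__ == "__main__":
    catalogue = ToyFileCatalogue()
    lfn = "/grid/myvo/results/out1.tar.gz"            # hypothetical logical name
    catalogue.register(lfn, "gsiftp://se1.example.org/data/out1.tar.gz")
    catalogue.register(lfn, "gsiftp://se2.example.org/data/out1.tar.gz")
    print(catalogue.replicas(lfn))
    print(catalogue.list_dir("/grid/myvo/results"))
```

Keeping several replicas behind one logical name is what lets a job fall back to another storage element when one of them is unavailable.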

Other components interacting with WISDOM

• VOMS (Virtual Organisation Membership Service)
  – Stores information concerning VOs and roles

• FlexLm floating license server

• Web Portals
  – Can be used to visualize statistics or results

• Remote Database Servers
  – Can be used to store some information remotely (results, metadata, etc.)

WISDOM technology

• WISDOM has been specifically developed around the EGEE middleware (LCG-2.7, gLite).

• It uses a Java multithreaded submission engine

• Main scripts are written in Perl

• Job-related scripts are written in shell script (bash)

• Future environment will include
  – Web Services technology (WS-I profile)
  – Java and Python AMGA clients
  – All the code written in Java
  – Security and fine-grained ACLs

WISDOM ENVIRONMENT (1/2)

• 2 main scripts:
  – wisdom_submit:
    • submits the jobs with a Java multithreaded submission engine
    • stores the job IDs and command lines in a database
  – wisdom_status:
    • checks the status of jobs regularly
    • handles the resubmission of failed, aborted and cancelled jobs
    • reads the IDs from the wisdom_submit database
    • stores the job IDs in a table, to avoid overwriting the wisdom_submit files, along with other parameters:
      job number
      a submission job status (unsubmitted, submitted, done)
      job submission count

The process loops until all the jobs of the instance are finished.
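The follow-up behaviour described for wisdom_status (check regularly, resubmit failed, aborted and cancelled jobs, count submissions, loop until every job is done) can be sketched as below. This is a minimal illustration and not the actual Perl implementation: `JobRecord`, `submit`, `follow_up` and the `poll` callback are hypothetical names, the grid-side state labels are illustrative, and a real script would wait between polling rounds.

```python
from dataclasses import dataclass

# Grid-side states that trigger a resubmission (illustrative labels).
RESUBMIT_STATES = {"Failed", "Aborted", "Cancelled"}


@dataclass
class JobRecord:
    number: int                  # job number within the instance
    state: str = "unsubmitted"   # unsubmitted, submitted, done
    submissions: int = 0         # job submission count


def submit(job: JobRecord) -> None:
    """Stand-in for a real submission: only updates the bookkeeping."""
    job.state = "submitted"
    job.submissions += 1


def follow_up(jobs, poll, max_rounds=100) -> bool:
    """Poll every unfinished job and resubmit it when the grid reports a failure.

    `poll(job)` is supplied by the caller and returns the grid-side state.
    Returns True once every job is done, False if max_rounds is exhausted.
    """
    for job in jobs:
        if job.state == "unsubmitted":
            submit(job)
    for _ in range(max_rounds):
        pending = [job for job in jobs if job.state != "done"]
        if not pending:
            return True
        for job in pending:
            grid_state = poll(job)
            if grid_state == "Done":
                job.state = "done"
            elif grid_state in RESUBMIT_STATES:
                submit(job)
    return all(job.state == "done" for job in jobs)


if __name__ == "__main__":
    seen = {}

    def flaky_poll(job):
        # Every job is reported aborted once, then done.
        seen[job.number] = seen.get(job.number, 0) + 1
        return "Aborted" if seen[job.number] == 1 else "Done"

    instance = [JobRecord(n) for n in range(1, 4)]
    print(follow_up(instance, flaky_poll), instance)
```

The `max_rounds` bound is an addition for the sketch, so that a permanently failing job cannot keep the loop running forever.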

WISDOM ENVIRONMENT (2/2)

• Several features:
  – No input and output sandboxes in jobs.
    • All the target files, ligands and software are copied dynamically from the SE to the WN to offload the RB I/O.
    • FlexX outputs and Grid outputs are saved on several SEs through LFC and GridFTP
  – Job JDLs and scripts are generated just before each submission to take the wisdom.conf modifications into account (CE and RB blacklists, job submission frequency); they are deleted afterwards to save disk space.
  – Dynamic insertion of docking results and statistics in databases, which allows real-time visualisation of the data challenge (DC) status.
  – wisdom_status can be stopped at any given time and restarted: it saves its own memory environment, so it can be restarted after a crash.
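The stop-and-restart property of wisdom_status amounts to checkpointing the job table to disk. The slide does not say how the state is saved, so the sketch below is only one possible approach, assuming a JSON file written atomically; `save_state`, `load_state`, the file name and the job name are hypothetical.

```python
import json
import os
import tempfile


def save_state(path: str, jobs: dict) -> None:
    """Write the job table to a temporary file, then rename it over `path`.

    The rename makes the update atomic: a crash in the middle of a write
    leaves the previous checkpoint intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(jobs, handle)
    os.replace(tmp_path, path)


def load_state(path: str) -> dict:
    """Return the last checkpoint, or an empty table on a first start."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        checkpoint = os.path.join(workdir, "status.json")   # hypothetical file name
        table = {"EXAMPLEJ1": {"state": "submitted", "submissions": 2}}
        save_state(checkpoint, table)
        print(load_state(checkpoint))
```

Saving after every polling round keeps the amount of repeated work after a crash down to a single round.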

Instance Definition

• The instance is a set of jobs grouped according to different criteria.

• The instance is unique and has its own name
• The instance is submitted entirely on the grid, then it is followed up
• The instance name is by default:
    <TARGET><PARAMETER><DATABASE>

• The instance's jobs are named after the instance:
    <INSTANCE NAME>J<number of the job>
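The naming convention can be written down in a few lines. The sketch assumes plain concatenation without separators, as the templates suggest, and job numbering starting at 1, which the slide does not state; the example values other than ZINC are hypothetical.

```python
def instance_name(target: str, parameter: str, database: str) -> str:
    """Default instance name: <TARGET><PARAMETER><DATABASE>."""
    return f"{target}{parameter}{database}"


def job_name(instance: str, job_number: int) -> str:
    """Job name: <INSTANCE NAME>J<number of the job>."""
    return f"{instance}J{job_number}"


def job_names(instance: str, count: int) -> list:
    """Names of all jobs of an instance (numbering from 1 is an assumption)."""
    return [job_name(instance, number) for number in range(1, count + 1)]


if __name__ == "__main__":
    name = instance_name("TARGETA", "PARAM1", "ZINC")   # hypothetical values
    print(name)
    print(job_names(name, 3))
```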

WISDOM deployment

[Diagram: WISDOM deployment; labels as extracted]
• GRID: Grid services (RB, RLS…), Grid resources (CE, SE), application components (software, database)
• Installer: installation
• Tester: test the grid
• wisdom_execution: workload definition, job submission, job monitoring, job bookkeeping, fault tracking, fault fixing, job resubmission
• User: set of jobs
• Other elements: collection, accounting data, supervisor, web site, database, license server

WISDOM Integration example

[Diagram: WISDOM integration example; labels as extracted]
• User Interface:
  – wisdom_submit: submits the jobs (through the WMS)
  – wisdom_status: checks job status, resubmits
• WMS, DMS/GFTP
• CEs & WNs: FlexX job
• SEs: inputs (structure file, compounds file), outputs (output file)
• HealthGrid server and web site: WISDOM DB, output DB, docking information, statistics
• FlexLm servers: FlexX licenses

Environment architecture (1/2)

• wisdom.conf
  – the file that defines the configuration of the instance

• wisdom_submit.sh
  – the execution script that launches the instance submission

• wisdom_submit.pl
  – the Perl script of the execution process, which submits the instance

• wisdom_status.sh
  – the execution script that launches the instance status checking

• wisdom_status.pl
  – the Perl script of the status checking

Environment architecture (2/2)

• bin/flexx.sh (the FlexX script that is run by the jobs)
• bin/mt-job-submit (the execution script of the multi-threaded submission engine)
• bin/MTJobSubmitter.jar (the jar file of the multi-threaded Java submission engine)
• bin/checkit.sh (a script used at the end of jobs to check the status of the job and store everything on the grid)
• bin/lfc_env.sh (a script to set up the environment variables for LFC)
• input/<DATABASE>/db_urls* (there are several files: 1000, 2000, 3000, 4000… Each of these files holds the SFNs of the database subset replicas. They are used in case of failure of the LFC server.)
• edg_wl_ui_config/* (this directory holds all the configuration files of the resource brokers)

• Files need to be edited according to the application and the VO!

Simplified grid workflow for WISDOM

[Diagram: simplified grid workflow; labels as extracted]
• User interface: WISDOM production system
• Inputs: compounds database, parameter settings, target structures, software
• Resource Broker: jobs
• Site1 and Site2: each with a Computing Element and a Storage Element
• Data flows: subsets (to the sites), results and statistics

WISDOM and Security

• Instances are submitted by a given user
  – All the jobs of the instance belong to the same user
  – Resources depend on the user's VO

• Output files
  – Stored on the VO storage elements and registered with the VO LFC.
  – If the LFC fails, files are stored in the VO-writable directory on a given storage element

This implies that:
• Users must follow up the execution, and need to renew their proxy if necessary
• Stored files are, a priori, available to all the VO members

The new environment

• Web Services interface
  – Better interoperability
  – Everything is controlled through a small set of operations (no more modification of the files is required)

• Dynamic storing/querying of the results and job information on the Grid using the AMGA metadata management system

• Improved fault tolerance

• Improved flexibility
  – New applications can be deployed more easily
  – As well as the corresponding data

• Secured and multi-user

The new environment

• Entirely developed in Java
  – No need to use text files to pass information between the submission and the monitoring of the jobs
  – Improved fault tolerance
  – Improved flexibility
  – New applications are easier to deploy

• Improved monitoring of the grid resources
  – Uses its own ranking based on BDII information
  – Takes into account the number of jobs submitted to the sites to avoid overloads (the jobs are sent where the free CPUs are)
  – Takes into account job failures and their causes (the "bad" sites are penalized)
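A site ranking of this kind can be sketched as follows. The slide names the three ingredients (BDII information, jobs already sent, failures) but not the formula, so the scoring function here is purely an assumption made for illustration, and `Site`, `score` and `pick_site` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    free_cpus: int          # as published by the information system
    sent_jobs: int = 0      # jobs we already submitted to this site
    failures: int = 0       # our jobs that failed there
    successes: int = 0      # our jobs that finished there


def score(site: Site) -> float:
    """Higher is better: free CPUs not yet claimed by us, scaled by reliability."""
    available = max(site.free_cpus - site.sent_jobs, 0)
    finished = site.failures + site.successes
    failure_rate = site.failures / finished if finished else 0.0
    return available * (1.0 - failure_rate)


def pick_site(sites) -> Site:
    """Choose the best-scoring site for the next job."""
    return max(sites, key=score)


if __name__ == "__main__":
    sites = [
        Site("ce-a.example.org", free_cpus=100, sent_jobs=90),
        Site("ce-b.example.org", free_cpus=50, failures=1, successes=1),
        Site("ce-c.example.org", free_cpus=40),
    ]
    best = pick_site(sites)
    print(best.name, score(best))
```

Subtracting the jobs already sent matters because published free-CPU counts lag behind submissions; without it, a burst of jobs would all target the same site.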

The new environment

• Uses AMGA for job and data monitoring
  – Improved monitoring and statistics
  – Dynamic storage and query of the data and results
  – Allows the "Pull Model"

• Web Services interface
  – Better interoperability
  – Eases access to the environment: everything is controlled through a small set of operations

New environment process

• Retrieve BDII information concerning the CEs (number of CPUs, free CPUs, …)
• Define a workload according to the CE information
• Initialize the VOMS proxy
• Generate the job JDLs
• Submit the jobs using multithreaded submission
• Until all the jobs are successful:
  – Check the status of the jobs using multithreaded check
  – Resubmit jobs if needed
  – Re-initialize the VOMS proxy if needed
  – Update instance information in AMGA
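Two of these steps, defining a workload from the CE information and submitting with several threads, can be sketched as below. The proportional split is an assumption (the slide only says the workload follows the CE information), the real engine is written in Java, and `define_workload`, `submit_all` and the `submit_one` callback are hypothetical names; no grid command is invoked.

```python
from concurrent.futures import ThreadPoolExecutor


def define_workload(n_jobs: int, free_cpus: dict) -> dict:
    """Split n_jobs across CEs in proportion to their free CPUs.

    Uses the largest-remainder method so the shares add up to n_jobs exactly.
    """
    total = sum(free_cpus.values())
    if total == 0:
        raise ValueError("no free CPUs reported")
    shares = {ce: n_jobs * cpus / total for ce, cpus in free_cpus.items()}
    plan = {ce: int(share) for ce, share in shares.items()}
    leftover = n_jobs - sum(plan.values())
    by_remainder = sorted(shares, key=lambda ce: shares[ce] - plan[ce], reverse=True)
    for ce in by_remainder[:leftover]:
        plan[ce] += 1
    return plan


def submit_all(plan: dict, submit_one, threads: int = 8) -> list:
    """Call submit_one(ce, index) for every planned job from a thread pool."""
    tasks = [(ce, index) for ce, count in plan.items() for index in range(count)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(lambda task: submit_one(*task), tasks))


if __name__ == "__main__":
    # Hypothetical free-CPU counts, as they might be read from the BDII.
    plan = define_workload(7, {"ce-a": 60, "ce-b": 30, "ce-c": 10})
    print(plan)
    print(submit_all(plan, lambda ce, index: f"{ce}-job{index}"))
```

Threads suit this step because each submission spends most of its time waiting on the remote service rather than computing.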

New Environment Architecture

Result Database Schema (autodock)

[Entity-relationship diagram; attributes are grouped in extraction order, and the 1,1 / 1,n cardinalities cannot be assigned to specific relations]
• hits: hits_id, rank, simulation_id, energy_level, run, mean_energy, Cluster_count
• simulation: dlg_file, simulation_id, Target_id, Ligand_id, Histogram_file
• file: coordinates_blob, hits_id
• ligands: Ligand_id, Library_id, name, pdbq_file, mass, psa, logp, donor, acceptor, Logd7_4, ring, rb, far, atoms, refra
• library: Library_name, library_id
• target: name, Target_id, pdbqs, maps
• project: description, Project_id, Program_name, Program_version, Program_options
• job (entity); further attributes whose entity is unclear in the extraction: Project_id, coordinates_file, Agent_id

Monitoring schema

Pull Model

• Instead of sending a task with the job, the job retrieves a task from the task database while running

• The job performs tasks as long as it is running
• Pros:
  – No need to define a workload before the job submission
  – No need to have all the jobs running
  – When a job fails, only the last task needs to be recomputed

• Cons:
  – Need to store the results on the fly
  – No access to the output sandbox
  – Retrieving a task can increase the job overhead
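The pull model can be illustrated with a minimal local simulation in which threads play the role of running jobs and a thread-safe queue stands in for the task database (AMGA in the new environment). All names here are hypothetical, and the sketch only shows the control flow: pull a task, compute, store the result immediately, repeat.

```python
import queue
import threading


def agent(tasks: queue.Queue, results: queue.Queue, compute) -> None:
    """A running job: keep pulling tasks until none are left."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        # Store each result on the fly, so a later crash loses at most one task.
        results.put((task, compute(task)))


def run_pull_model(task_list, compute, n_agents: int = 4) -> dict:
    """Run n_agents concurrent agents over task_list and collect the results."""
    tasks, results = queue.Queue(), queue.Queue()
    for task in task_list:
        tasks.put(task)
    agents = [
        threading.Thread(target=agent, args=(tasks, results, compute))
        for _ in range(n_agents)
    ]
    for thread in agents:
        thread.start()
    for thread in agents:
        thread.join()
    collected = {}
    while not results.empty():
        task, value = results.get()
        collected[task] = value
    return collected


if __name__ == "__main__":
    # Squaring a number stands in for docking one compound.
    print(run_pull_model(range(10), lambda x: x * x, n_agents=3))
```

No workload is defined up front: a fast agent simply pulls more tasks than a slow one, which is the first advantage listed above.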

Questions?