eScience and Grid Tools and techniques for the next generation scientist Professor Brian Vinter Head of the Copenhagen eScience Center


Page 1: eScience and Grid Tools and techniques for the next generation scientist

eScience and Grid

Tools and techniques for the next generation scientist

Professor Brian Vinter
Head of the Copenhagen eScience Center

Page 2: eScience and Grid Tools and techniques for the next generation scientist

eScience

«The next 10 to 20 years will see computational science firmly embedded in the fabric of science – the most profound development in the scientific method in over three centuries.»

US Department of Energy 2003.

Page 3: eScience and Grid Tools and techniques for the next generation scientist

Mega-Science

The next scientific period will be dominated by Mega-Science projects
• 10^4 researchers on a single project
• Extreme data production
• Highly integrated collaboration between different groups of scientists

Examples
• CERN LHC
• ALMA
• Mars project

Page 4: eScience and Grid Tools and techniques for the next generation scientist

Data Production

1997: Total data worldwide approx. 12 exabytes (incl. documents, film, TV, pictures, …) (1)

1999: 2-3 exabytes of data produced (2)

2002: Approx. 5 exabytes of data produced (2)

1 Exabyte = 1000 Petabytes
1 Petabyte = 1000 Terabytes
1 Terabyte = 1000 Gigabytes
1 Gigabyte = 1000 Megabytes

Global data availability doubles every 4-5 years.

1) http://www.lesk.com/mlesk/ksg97/ksg.html
2) http://www.sims.berkeley.edu/research/projects/how-much-info-2003/
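The unit ladder and the doubling claim can be turned into numbers with a little arithmetic. The sketch below is illustrative only: the function names are invented for this example, and it uses the decimal (1000-based) units listed on the slide.

```python
# Decimal storage units, as listed on the slide (1 EB = 1000 PB, and so on).
UNITS = {"MB": 10**6, "GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def convert(value, src, dst):
    """Convert a data volume between decimal units, e.g. exabytes to petabytes."""
    return value * UNITS[src] / UNITS[dst]


def annual_growth(doubling_years):
    """Yearly growth factor implied by a given doubling time in years."""
    return 2 ** (1 / doubling_years)


if __name__ == "__main__":
    print(convert(5, "EB", "PB"))  # 5 exabytes expressed in petabytes
    for years in (4, 5):
        rate = (annual_growth(years) - 1) * 100
        print(f"doubling every {years} years = about {rate:.0f}% growth per year")
```

A doubling time of 4-5 years corresponds to roughly 15-19% growth per year.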

Page 5: eScience and Grid Tools and techniques for the next generation scientist

Modeling and simulation

eScience Components

Page 6: eScience and Grid Tools and techniques for the next generation scientist

Modeling and simulation

Data acquisition and handling

eScience Components

Page 7: eScience and Grid Tools and techniques for the next generation scientist

eScience Components

Modeling and simulation

Data acquisition and handling

Visualization

Page 8: eScience and Grid Tools and techniques for the next generation scientist

eScience Components

Modeling and simulation

Data acquisition and handling

Visualization

HPC and Grid

Page 9: eScience and Grid Tools and techniques for the next generation scientist

Why is it getting more difficult?

[Figure: molecular systems of 54, 442, and 1372 molecules]

Page 10: eScience and Grid Tools and techniques for the next generation scientist

System sizes and time scales

[Figure: characteristic system sizes and process time scales]

System size (number of atoms)
• 1: H
• 2: O2
• 10: Single peptide
• 1000: Biomimetic compound
• 10^4: Biopolymers
• 10^5: Proteins
• 10^6: Ribosomes

Process time (seconds)
• 10^-15: Photo-ionization
• 10^-12: Proton transfer
• 10^-9
• 10^-6
• 10^-3: Protein folding
• 1
• 10^3: This seminar

Page 11: eScience and Grid Tools and techniques for the next generation scientist

Nano-modeling

Extremely CPU- and data-intensive algorithms
Complex structure calculations
Multiple days of execution even on a supercomputer
Runs on both PCs and supercomputers

Page 12: eScience and Grid Tools and techniques for the next generation scientist

eScience and Bio/Med

We expect very good results from eScience in biology and medicine

The foremost advantages will come from introducing a mathematical causal understanding of biological systems
• Bioinformatics is already doing this

An emerging field: Systems Biology
• Systems Medicine is also starting internationally

Page 13: eScience and Grid Tools and techniques for the next generation scientist

Calculations in treatment

Computational methods are already important in medical planning

• Radiation planning
• Bypass flow modeling
• Robotic surgery
• …

Page 14: eScience and Grid Tools and techniques for the next generation scientist

Every human is unique
Also at the genetic level

In our genome, which is written with the alphabet ACGT, we have a number of micro mutations called single nucleotide polymorphisms (SNPs)

These SNPs are often without consequence but
• Some make us sick
• Some are indicators of a faulty gene
• Others influence our reception of a drug

The last complication makes it very hard to make drugs for the general population

We want to move from commodity medicine to custom tailored drugs

Personalized medicine

Page 15: eScience and Grid Tools and techniques for the next generation scientist

An example

Approx. 60% of today's medicines are metabolized by cytochrome P450 enzymes
• Some people have highly efficient P450 while others have very slow and inefficient P450
• Knowledge of a patient's P450 level will allow us to dose medicine to the individual much more efficiently

This is already in early use

Page 16: eScience and Grid Tools and techniques for the next generation scientist

And this is eScience how?

Developing a drug is not a linear process

The human genome is written with billions of letters
• Any person has millions of SNP mutations
• Finding the SNP that has an effect is a highly complex computational task
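To make the scale concrete, the simplest possible step, spotting single-letter differences between one genome and a reference, might look like the sketch below. It assumes two already aligned sequences of equal length; the function name and the toy sequences are invented for illustration. Deciding which of the millions of differences actually has an effect is the expensive part and is not shown.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for every single-letter difference.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (pos, ref_base, alt_base)
        for pos, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


if __name__ == "__main__":
    reference = "ACGTACGTAC"  # toy data, not a real genome fragment
    sample = "ACGTTCGTAA"
    for pos, ref_base, alt_base in find_snps(reference, sample):
        print(f"position {pos}: {ref_base} -> {alt_base}")
```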

Page 17: eScience and Grid Tools and techniques for the next generation scientist

eScience and geology

Geology and hydrology, too, have been using computational methods for a long time

There are very interesting aspects in combining different methods
• e.g. including biological systems in the models
• Inverse mapping of seismic data

It turns out that we use the same techniques in medicine
• And soon in industry

Page 18: eScience and Grid Tools and techniques for the next generation scientist

Grid

Minimum intrusion Grid

Page 19: eScience and Grid Tools and techniques for the next generation scientist

Minimum intrusion Grid

[Diagram showing Users, Resources, and the GRID]

Page 20: eScience and Grid Tools and techniques for the next generation scientist

Processing plants

Like the power grid, the computing Grid has many types of power producers
• High yield power plants (fossil fuel, nuclear, …)
  • Supercomputers and large farms
• Low yield producers (windmills, etc.)
  • Individual PCs and game consoles
• Very low yield producers (solar panels, etc.)
  • Web browsers

Page 21: eScience and Grid Tools and techniques for the next generation scientist

One Click

Page 22: eScience and Grid Tools and techniques for the next generation scientist

Interactive Applications

Page 23: eScience and Grid Tools and techniques for the next generation scientist

VGrids

Best thing since sliced bread

VGrids are Virtual Organizations in MiG
They are a dead easy way to create collaborations

• Share files
• Share resources
• Private entry page
• Public Web-page

Page 24: eScience and Grid Tools and techniques for the next generation scientist

Portals

VOs can generate their own private entry pages, including application portals

Page 25: eScience and Grid Tools and techniques for the next generation scientist

Files in VGrids

A user keeps her personal home-directory regardless of which VGrid she works in

But VGrids have a common directory where only members of the VGrid are allowed
• These are represented as directories in the user's home-directory

VGrid owners can create sub-VGrids

Page 26: eScience and Grid Tools and techniques for the next generation scientist

Examples

eScience on Grid

Page 27: eScience and Grid Tools and techniques for the next generation scientist

GeneRecon

GeneRecon seeks to identify genetic factors behind hereditary diseases

The overall idea is to compare two sets of genomes
• One where the disease is observed
• One where the disease is not observed

Approx. 1000 individuals in each set

GeneRecon is developed at the Bioinformatics Research Center, Århus University

Page 28: eScience and Grid Tools and techniques for the next generation scientist

GeneRecon

The Algorithm is a Markov-chain Monte Carlo method
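For readers unfamiliar with the technique, a generic Markov-chain Monte Carlo sampler (random-walk Metropolis) can be sketched in a few lines. This is not GeneRecon's model: the target density is a standard normal chosen purely for illustration, and all names and parameters are assumptions.

```python
import math
import random


def log_target(x):
    """Log-density (up to a constant) of a standard normal; a stand-in target."""
    return -0.5 * x * x


def metropolis(n_samples, step=1.0, start=0.0, seed=0):
    """Random-walk Metropolis: propose a nearby point, accept with prob min(1, ratio)."""
    rng = random.Random(seed)
    x = start
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        log_ratio = log_target(proposal) - log_target(x)
        # Accept if the proposal is more likely, otherwise with probability exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected proposal repeats the current state
    return samples


if __name__ == "__main__":
    chain = metropolis(20_000)
    mean = sum(chain) / len(chain)
    var = sum((v - mean) ** 2 for v in chain) / len(chain)
    print(f"sample mean {mean:.2f}, sample variance {var:.2f}")
```

Each chain is sequential, but independent chains share no state, so many of them can run on separate machines at once.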

A test run consists of approx. 30,000 individual tests
• One test runs from 1 to 10 days on a PC
• In total no less than 82 CPU years

MiG hosted the execution on Grid and got the execution down below a month
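The figures on this slide can be checked with a little arithmetic. The sketch assumes the lower bound of one day per test and takes "below a month" to mean 30 days; both are simplifications made for illustration.

```python
TESTS = 30_000          # individual tests in one run (from the slide)
MIN_DAYS_PER_TEST = 1   # lower bound on the runtime of one test on a PC
DAYS_PER_YEAR = 365
WALL_CLOCK_DAYS = 30    # assumption: "below a month" taken as 30 days

cpu_days = TESTS * MIN_DAYS_PER_TEST
cpu_years = cpu_days / DAYS_PER_YEAR
# Average number of PC-equivalents that must be busy to finish within the limit.
avg_concurrency = cpu_days / WALL_CLOCK_DAYS

print(f"at least {cpu_years:.0f} CPU years")
print(f"about {avg_concurrency:.0f} tests running concurrently on average")
```

Under these assumptions the lower bound of 82 CPU years follows directly, and finishing within a month requires on the order of a thousand tests in flight at any time.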

Page 29: eScience and Grid Tools and techniques for the next generation scientist

Statistics

[Table: min, avg, and max of queue time, execution time, and total time; extracted values: 0.01, 2.08, 5546, 101, 505392, 678]

1315 jobs were submitted to Grid at the same time
0 jobs were lost

First result
• 2:04:44

Last result
• 28 days, 5:42:54

Page 30: eScience and Grid Tools and techniques for the next generation scientist

Groundwater modeling on Funen

[Plot: aggregated objective function (11.0 to 18.0) versus number of model evaluations (0 to 1000)]

Calibration of the Assens model:
• 1 model evaluation = 30 min
• 920 model evaluations = 19 days
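The 19-day figure is simply the number of evaluations times the time per evaluation. A minimal sketch of that arithmetic follows; the function name and the `workers` parameter are invented for illustration, and perfectly even sharing among workers is an idealization that a real calibration, where later evaluations depend on earlier ones, will not fully reach.

```python
def calibration_days(evaluations, minutes_per_evaluation, workers=1):
    """Wall-clock days for a calibration, assuming perfectly even sharing among workers."""
    total_minutes = evaluations * minutes_per_evaluation / workers
    return total_minutes / (60 * 24)


if __name__ == "__main__":
    print(f"{calibration_days(920, 30):.1f} days on one PC")
```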

Page 31: eScience and Grid Tools and techniques for the next generation scientist

AUTOCAL OfficeGRID

Days to hours

[Diagram: one Master and four Clients]

[Plot: objective function (1.5 to 5.0) versus time in hours (0 to 100) for AUTOCAL (1 PC) and AUTOCAL OfficeGRID (10 PCs)]
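The Master/Client layout can be sketched as a master that hands candidate parameter sets to a pool of clients and keeps the best result. This is not AUTOCAL's actual algorithm or interface: the objective function is a toy stand-in, the function names are hypothetical, and local threads stand in for the office PCs.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate_model(params):
    """Hypothetical stand-in for one 30-minute model run; returns an objective value."""
    x, y = params
    return (x - 1.0) ** 2 + (y + 2.0) ** 2


def calibrate(candidates, n_clients=10):
    """Master: hand candidate parameter sets to clients and keep the best result."""
    with ThreadPoolExecutor(max_workers=n_clients) as clients:
        scores = list(clients.map(evaluate_model, candidates))
    best_score, best_params = min(zip(scores, candidates))
    return best_params, best_score


if __name__ == "__main__":
    # A coarse grid of candidate parameter sets; a real calibration would propose
    # new candidates based on earlier scores.
    grid = [(x * 0.5, y * 0.5) for x in range(-6, 7) for y in range(-6, 7)]
    print(calibrate(grid))
```

Because each evaluation is independent of the others in the same batch, the wall-clock time shrinks roughly with the number of clients, which is where "days to hours" comes from.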

Page 32: eScience and Grid Tools and techniques for the next generation scientist

Drug Design

Molecular docking is a time-consuming calculation, which this project performs in two steps

The first step is a coarse calculation that can eliminate molecules that won't dock
• This process can run on PCs and PS3s; a lot of work is being done towards efficient utilization of the CELL CPU for molecular docking

The molecules that survive the first step are then modeled more precisely at quantum level on classic supercomputers and clusters
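The two-step structure can be sketched as a cheap filter followed by an expensive ranking of the survivors. Both scoring functions and the molecule records below are hypothetical placeholders, not the project's docking code; a real pipeline would run the first stage on PCs and consoles and the second on supercomputers.

```python
def coarse_score(molecule):
    """Cheap, approximate docking score (hypothetical); lower is better."""
    return molecule["size_mismatch"]


def precise_score(molecule):
    """Expensive, accurate score (hypothetical stand-in for a quantum-level model)."""
    return molecule["size_mismatch"] + molecule["binding_penalty"]


def screen(molecules, coarse_cutoff, keep=1):
    """Stage 1 drops molecules that clearly will not dock; stage 2 ranks the survivors."""
    survivors = [m for m in molecules if coarse_score(m) <= coarse_cutoff]
    ranked = sorted(survivors, key=precise_score)
    return [m["name"] for m in ranked[:keep]]


if __name__ == "__main__":
    library = [  # toy data, invented for illustration
        {"name": "A", "size_mismatch": 0.2, "binding_penalty": 0.9},
        {"name": "B", "size_mismatch": 0.4, "binding_penalty": 0.1},
        {"name": "C", "size_mismatch": 3.0, "binding_penalty": 0.0},
    ]
    print(screen(library, coarse_cutoff=1.0, keep=2))
```

The point of the design is that the expensive function is only ever called on the small set that passes the cheap one.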

Page 33: eScience and Grid Tools and techniques for the next generation scientist

SeGrid

Still a proposal

The idea is to share sensitive data through Grid and use the Grid technology to manage access control and automatic anonymization

Page 34: eScience and Grid Tools and techniques for the next generation scientist

More information

www.eScience.dk
Portal for KU's eScience activities

www.migrid.org
Portal for the Minimum intrusion Grid

www.rcuk.ac.uk/escience/
The very ambitious UK eScience program