The Grid Data Warehouse

The Grid Data Warehouse. Description of prototype work in progress by AstroGrid. Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04. AstroGrid: the UK Virtual Observatory.

A PPARC funded project

The Grid Data Warehouse

Description of prototype work in progress by AstroGrid. Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04.

04-02-2004 GDW description: access-grid lecture 2

AstroGrid: the UK Virtual Observatory

Seven UK astronomy departments collaborating to build a Virtual Observatory (VO) for the use of the entire astronomical community.

IVOA: the community of VO projects

Purpose of the virtual observatory

To combine data from all sources into a data grid.

Data grid

Private files

Archives

Live feeds

Bibliographies

Data sets can be images (mainly in files) or tabular (mainly in RDBMS).

Example of VO use

“Find brown dwarf candidates: combine optical (e.g. APM catalogue) and IR (e.g. 2MASS) data to select by colour. Combine multi-epoch data to determine proper motions; select high-PM fraction of colour-selected sample. Then use that sample to…”

Optical archive

IR archive

2nd epoch

Colour sample

Refined sample

3rd epoch
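
The selection chain in this example (colour sample, then refined sample) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows are made up, the V-K cut of 4.5 is borrowed from the query quoted later in the lecture, and the epoch baseline and proper-motion threshold are arbitrary values chosen for the sketch.

```python
import math

# Hypothetical, made-up rows: (object id, V magnitude, K magnitude,
# RA and Dec in degrees at epoch 1, RA and Dec in degrees at epoch 2).
CATALOGUE = [
    ("obj-a", 21.0, 15.5, 150.00000, 2.00000, 150.00010, 2.00005),
    ("obj-b", 18.0, 16.5, 150.10000, 2.10000, 150.10000, 2.10000),
    ("obj-c", 22.0, 16.0, 150.20000, 2.20000, 150.20000, 2.20000),
]

BASELINE_YEARS = 10.0         # assumed time between the two epochs
COLOUR_CUT = 4.5              # V-K threshold (assumption, see lead-in)
PM_CUT_ARCSEC_PER_YR = 0.02   # illustrative proper-motion threshold


def proper_motion(ra1, dec1, ra2, dec2, years):
    """Small-angle proper motion in arcsec/yr between two epochs."""
    d_ra = (ra2 - ra1) * math.cos(math.radians(0.5 * (dec1 + dec2)))
    d_dec = dec2 - dec1
    return math.hypot(d_ra, d_dec) * 3600.0 / years


# Step 1: colour selection on the combined optical + IR magnitudes.
colour_sample = [row for row in CATALOGUE if row[1] - row[2] > COLOUR_CUT]

# Step 2: keep only the high proper-motion part of the colour sample.
refined_sample = [
    row[0] for row in colour_sample
    if proper_motion(*row[3:], BASELINE_YEARS) > PM_CUT_ARCSEC_PER_YR
]
print(refined_sample)
```

In a real VO workflow each step would draw on a different archive; here everything sits in one list so that the logic of the two cuts is visible.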

VO as collection of web sites: no good

Each site has different query protocol

Results go only to the browser, not to an RDBMS or on to reprocessing

Results are in HTML etc., not machine-readable

Basic web sites are not sufficient for the VO.

Grid metaphor: electricity supply

Loadsa complex equipment

Simple delivery to consumer

Get your power from any supplier: commodity

Commodities in astronomy data grid

Common s/w on desktop

Algorithms

Archives

Writeable Storage

Registry of resources

(Processors)

Bulk data transport; machine-readable results; combined inside grid

Metadata transport

AstroGrid topology

Portal

Registry

Algorithms

Writeable storage

Archives

Workflow

Difficult RDBMS operations

“Select objects with V-K > 4.5…” (i.e. find ‘red’ objects).

U, B, V, R

Optical archive service

IR archive service

J, H, K

?

No std. way of combining DBs.

No std. way of storing results in RDBMS

?
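
To make the difficulty concrete: with V held by one service and K by another, the client has to fetch both result sets and combine them itself. The sketch below shows such a client-side join in Python. The rows and the shared `obj_id` key are assumptions made for brevity; real catalogues rarely share identifiers and usually need a positional cross-match.

```python
# Hypothetical result sets, as two independent archive services might
# return them. The shared "obj_id" key is an assumption for this sketch.
optical_rows = [
    {"obj_id": 1, "V": 21.0},
    {"obj_id": 2, "V": 18.0},
    {"obj_id": 3, "V": 22.5},
]
infrared_rows = [
    {"obj_id": 1, "K": 15.5},
    {"obj_id": 2, "K": 16.5},
    {"obj_id": 4, "K": 14.0},
]


def red_objects(optical, infrared, cut=4.5):
    """Join the two result sets on obj_id and keep rows with V-K > cut."""
    # Index the IR rows by id so each optical row needs one lookup.
    k_by_id = {row["obj_id"]: row["K"] for row in infrared}
    return [
        row["obj_id"]
        for row in optical
        if row["obj_id"] in k_by_id
        and row["V"] - k_by_id[row["obj_id"]] > cut
    ]


print(red_objects(optical_rows, infrared_rows))
```

Both result sets have to travel to the client in full before the cut can be applied, and the answer ends up in client memory rather than in a database table.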

Need for data warehouse

Join across internet

[Diagram: several RDBMS nodes]

Join inside warehouse DB

1000x speed gains
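
A minimal sketch of the "join inside warehouse DB" idea, using Python's standard `sqlite3` module with an in-memory database standing in for the warehouse. The table names, column names and rows are hypothetical, and the sketch does not attempt to demonstrate the speed gain quoted on the slide; it only shows that the join and the colour cut become a single SQL statement once both tables live in one database.

```python
import sqlite3

# An in-memory SQLite database stands in for the warehouse DB.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE optical (obj_id INTEGER PRIMARY KEY, v REAL)")
warehouse.execute("CREATE TABLE infrared (obj_id INTEGER PRIMARY KEY, k REAL)")

# Hypothetical rows, copied in from the source archives ahead of the query.
warehouse.executemany("INSERT INTO optical VALUES (?, ?)",
                      [(1, 21.0), (2, 18.0), (3, 22.5)])
warehouse.executemany("INSERT INTO infrared VALUES (?, ?)",
                      [(1, 15.5), (2, 16.5), (4, 14.0)])

# The join and the colour cut now run inside one database engine.
red = warehouse.execute(
    """
    SELECT o.obj_id, o.v - i.k AS v_minus_k
    FROM optical AS o
    JOIN infrared AS i ON i.obj_id = o.obj_id
    WHERE o.v - i.k > 4.5
    ORDER BY o.obj_id
    """
).fetchall()
print(red)
```

The result could equally be written back with `CREATE TABLE ... AS SELECT`, so that it stays in the database for the next step of a workflow.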

GDW topology extends AstroGrid

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

GDW people

Kona Andrews (Cambridge)

Elizabeth Auden (MSSL)

Martin Hill (Edinburgh)

Tony Linde (Leicester)

Clive Page (Leicester)

Guy Rixon (Cambridge)

Noel Winstanley (Jodrell Bank)

Current system

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Link not implemented yet

DB tables preloaded; read-only DB

Link temporarily redirected

Next system (3Q2004)

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Limited choice

Links implemented properly (GridFTP)

Two dedicated installations inside AstroGrid; multi-user

Ultimate system (2005+)

Portal

File storage

Archive

Workflow

Registry

Warehouse controller

Grid-DB (OGSA-DAI)

AstroGrid

UK e-Science grid / EGEE

One node per user; any storage node

Assessment

Basic idea is sound

Coding of GDW was quite simple

Very difficult to get it all integrated

Problems with OGSA-DAI:

Performance

Data-size limits

Can’t get higher functions to work yet

Proceed?

Yes; need to experiment further

Still expect to get science out of it

Can one use it?

Beta testers invited

Wait for release of “Iteration 4.1” system (soon!)

Wait for release of “Iteration 5” system (3Q2004) to see GDW useful for science

AstroGrid final release is at the end of 2004

http://wiki.astrogrid.org/bin/view/Astrogrid/BetaTesting

That’s all folks!
