A PPARC funded project. The Grid Data Warehouse: description of prototype work in progress by AstroGrid. Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04.

Page 1: The Grid Data Warehouse

A PPARC funded project

The Grid Data Warehouse

Description of prototype work in progress by AstroGrid.

Access-Grid lecture to Universities of Leeds and Sheffield by Guy Rixon on 2004-02-04

Page 2: The Grid Data Warehouse

04-02-2004 GDW description: access-grid lecture 2

AstroGrid: the UK Virtual Observatory

Seven UK astronomy departments collaborating to build a Virtual Observatory (VO) for the use of the entire astronomical community.

Page 3: The Grid Data Warehouse

IVOA: the community of VO projects

Page 4: The Grid Data Warehouse

Purpose of the virtual observatory

To combine data from all sources into a data grid.

Data grid

Private files

Archives

Live feeds

Bibliographies

Data sets can be images (mainly in files) or tabular (mainly in RDBMS).

Page 5: The Grid Data Warehouse

Example of VO use

“Find brown dwarf candidates: combine optical (e.g. APM catalogue) and IR (e.g. 2MASS) data to select by colour. Combine multi-epoch data to determine proper motions; select high-PM fraction of colour-selected sample. Then use that sample to…”

Optical archive

IR archive

2nd epoch

Colour sample

Refined sample

3rd epoch
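
The selection described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not AstroGrid code: the field names (`v_mag`, `k_mag`, `epoch1`, `epoch2`), the V-K colour cut of 4.5 (borrowed from a later slide) and the 100 mas/yr proper-motion cut are all hypothetical, and the proper motion uses a simple small-angle approximation between two epochs.

```python
import math

def proper_motion_mas_per_yr(ra1_deg, dec1_deg, t1_yr, ra2_deg, dec2_deg, t2_yr):
    """Total proper motion (milliarcseconds/year) from two epochs.

    Small-angle approximation: the RA shift is scaled by cos(dec)
    so that both offsets are true angles on the sky.
    """
    dt = t2_yr - t1_yr
    mean_dec = math.radians(0.5 * (dec1_deg + dec2_deg))
    d_ra = (ra2_deg - ra1_deg) * math.cos(mean_dec)
    d_dec = dec2_deg - dec1_deg
    shift_mas = math.hypot(d_ra, d_dec) * 3.6e6  # degrees -> milliarcseconds
    return shift_mas / dt

def select_candidates(objects, colour_min=4.5, pm_min=100.0):
    """Keep objects that pass the colour cut, then the proper-motion cut.

    Both thresholds are illustrative assumptions.
    """
    picked = []
    for obj in objects:
        colour = obj["v_mag"] - obj["k_mag"]  # optical minus IR magnitude
        if colour <= colour_min:
            continue  # not red enough: drop before the PM step
        pm = proper_motion_mas_per_yr(*obj["epoch1"], *obj["epoch2"])
        if pm > pm_min:
            picked.append(obj["name"])
    return picked

# Made-up sample: each epoch is (RA in degrees, Dec in degrees, year).
sample = [
    {"name": "A", "v_mag": 20.0, "k_mag": 14.0,
     "epoch1": (10.0, 0.0, 1990.0), "epoch2": (10.0005, 0.0, 2000.0)},
    {"name": "B", "v_mag": 15.0, "k_mag": 14.0,
     "epoch1": (11.0, 0.0, 1990.0), "epoch2": (11.0005, 0.0, 2000.0)},
    {"name": "C", "v_mag": 20.0, "k_mag": 14.0,
     "epoch1": (12.0, 0.0, 1990.0), "epoch2": (12.0, 0.0, 2000.0)},
]
candidates = select_candidates(sample)
print(candidates)
```

The two cuts are applied in the order the slide gives: colour first, then proper motion on the colour-selected sample.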

Page 6: The Grid Data Warehouse

VO as collection of web sites: no good

Each site has different query protocol

Results only go to the browser, not to an RDBMS or on to reprocessing

Results in HTML etc. are not machine-readable

Basic web sites are not sufficient for the VO.

Page 7: The Grid Data Warehouse

Grid metaphor: electricity supply

Loadsa complex equipment

Simple delivery to consumer

Get your power from any supplier: commodity

Page 8: The Grid Data Warehouse

Commodities in astronomy data grid

Common s/w on desktop

Algorithms

Archives

Writeable Storage

Registry of resources

(Processors)

Bulk data transport; machine-readable results; combined inside grid

Metadata transport

Page 9: The Grid Data Warehouse

AstroGrid topology

Portal

Registry

Algorithms

Writeable storage

Archives

Workflow

Page 10: The Grid Data Warehouse

Difficult RDBMS operations

“Select objects with V-K > 4.5…” (i.e. find ‘red’ objects).

Optical archive service: U, B, V, R

IR archive service: J, H, K

No std. way of combining DBs.

No std. way of storing results in RDBMS

Page 11: The Grid Data Warehouse

Need for data warehouse

[Diagram: several RDBMS nodes, joined either across the internet or inside a single warehouse DB]

Join across internet

Join inside warehouse DB

1000x speed gains
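
The warehouse idea can be sketched with Python's standard `sqlite3` module: copy the relevant tables from each archive into one local database, then run the join there. This is a toy illustration under stated assumptions, not the GDW implementation (which the slides describe as built on OGSA-DAI). The table names, column names and magnitudes are made up, the archives are stand-in in-memory databases, and objects are matched on a shared `obj_id`, whereas real catalogues would normally need a positional cross-match.

```python
import sqlite3

def make_archive(table, columns, rows):
    """Stand-in for a remote archive: an in-memory DB holding one catalogue."""
    db = sqlite3.connect(":memory:")
    db.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    marks = ", ".join("?" for _ in columns)
    db.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return db

def load_into_warehouse(warehouse, archive, table, columns):
    """Bulk-copy one archive table into the warehouse DB."""
    col_list = ", ".join(columns)
    rows = archive.execute(f"SELECT {col_list} FROM {table}").fetchall()
    warehouse.execute(f"CREATE TABLE {table} ({col_list})")
    marks = ", ".join("?" for _ in columns)
    warehouse.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)

def find_red_objects(warehouse, min_v_minus_k=4.5):
    """Join the two catalogues inside the warehouse and apply the colour cut."""
    query = """
        SELECT o.obj_id, o.v_mag - i.k_mag AS v_minus_k
        FROM optical AS o
        JOIN infrared AS i ON o.obj_id = i.obj_id
        WHERE o.v_mag - i.k_mag > ?
        ORDER BY o.obj_id
    """
    return warehouse.execute(query, (min_v_minus_k,)).fetchall()

# Hypothetical archives with made-up magnitudes.
optical_archive = make_archive(
    "optical", ["obj_id", "v_mag"], [(1, 20.0), (2, 15.0), (3, 19.5)])
ir_archive = make_archive(
    "infrared", ["obj_id", "k_mag"], [(1, 14.0), (2, 13.0), (3, 15.5)])

warehouse = sqlite3.connect(":memory:")
load_into_warehouse(warehouse, optical_archive, "optical", ["obj_id", "v_mag"])
load_into_warehouse(warehouse, ir_archive, "infrared", ["obj_id", "k_mag"])

red_objects = find_red_objects(warehouse)
print(red_objects)
```

Once both tables sit in the same database, the join is an ordinary local SQL query, and its result could be kept as a further table in the warehouse for later steps.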

Page 12: The Grid Data Warehouse

GDW topology extends AstroGrid

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Page 13: The Grid Data Warehouse

GDW people

Kona Andrews (Cambridge)

Elizabeth Auden (MSSL)

Martin Hill (Edinburgh)

Tony Linde (Leicester)

Clive Page (Leicester)

Guy Rixon (Cambridge)

Noel Winstanley (Jodrell Bank)

Page 14: The Grid Data Warehouse

Current system

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Link not implemented yet

DB tables preloaded; read-only DB

Link temporarily redirected

Page 15: The Grid Data Warehouse

Next system (3Q2004)

Portal

File storage

Archive

Workflow

Registry

Grid-DB (OGSA-DAI)

Warehouse controller

Grid-DB (OGSA-DAI)

Limited choice

Links implemented properly (GridFTP)

Two dedicated installations inside AstroGrid; multi-user

Page 16: The Grid Data Warehouse

Ultimate system (2005+)

Portal

File storage

Archive

Workflow

Registry

Warehouse controller

Grid-DB (OGSA-DAI)

AstroGrid

UK e-Science grid / EGEE

One node per user; any storage node

Page 17: The Grid Data Warehouse

Assessment

Basic idea is sound

Coding of GDW was quite simple

Very difficult to get it all integrated

Problems with OGSA-DAI: performance; data-size limits; can’t get higher functions to work yet

Proceed? Yes; need to experiment further. Still expect to get science out of it

Page 18: The Grid Data Warehouse

Can one use it?

Beta testers invited

Wait for release of “Iteration 4.1” system (soon!)

Wait for release of “Iteration 5” system (3Q2004) to see GDW useful for science

AstroGrid final release is at the end of 2004

http://wiki.astrogrid.org/bin/view/Astrogrid/BetaTesting

Page 19: The Grid Data Warehouse

That’s all folks!