VO Sandpit, November 2009 Environmental Data Archival: Practices and Benefits Graham Parton graham.parton@stfc.ac.uk Royal Meteorological Society SIG Meeting, BAS, 5th October 2011: Transmission, presentation and archiving of meteorological data


VO Sandpit, November 2009

Environmental Data Archival: Practices and Benefits

Graham Parton graham.parton@stfc.ac.uk

Royal Meteorological Society SIG Meeting, BAS, 5th October 2011:

Transmission, presentation and archiving of meteorological data


Overview

What is data archival?

Why do it?

How do we do it within CEDA?


What do we call “data archival”?

Placing data into a repository which is:

• Backed up

• Robust (identify data corruptions)

• Catalogued

• Recognised repository
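The “Robust (identify data corruptions)” property is commonly met by storing checksums. The sketch below records a SHA-256 digest for every file under a directory and later re-computes them to flag files that are corrupted or missing. The slide does not say which checksum algorithm or tooling CEDA uses, so SHA-256 and the function names `sha256_of`, `build_manifest` and `find_corrupted` are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by its POSIX-style relative path."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_corrupted(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return manifest entries whose file is missing or no longer matches its checksum."""
    return [
        rel_path
        for rel_path, expected in manifest.items()
        if not (root / rel_path).is_file() or sha256_of(root / rel_path) != expected
    ]
```

In practice the manifest would itself be stored, and backed up, separately from the data it describes, with `find_corrupted` run on a schedule.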


Why archive data?

• Making data public: openness of results and repeatability are essential for scientific rigour

• Place to share data with project participants

• Re-purposing data

• Additional services (often for free!)

• May be required for legal reasons

• Secure

• Get credit

And because if you don’t…


Why archive data?


Scale of CEDA operations

• >100,000,000 files holding ~1 PB of data

• ~38,000,000 files downloaded since October 2010

• 19,000+ registered users, of which ~3,600 are currently ‘active’ users

• 250+ datasets

• 26 staff

• Responsible for [logos on the original slide] + other services and projects (e.g. UKCIP, CMIP5 partner)

… i.e. we are highly reliant on scripted systems and a well-structured archive
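The figures on this slide imply some rough averages. The arithmetic below assumes decimal units (1 PB = 10^15 bytes) and, for the download rate, that roughly one year separates October 2010 from this October 2011 talk; both are assumptions, and the “>” and “~” figures are approximate.

```python
# Back-of-envelope figures derived from the numbers on this slide.
total_bytes = 10**15          # "~1 PB of data", decimal petabyte assumed
file_count = 100_000_000      # ">100,000,000 files" (a lower bound)
downloads = 38_000_000        # "~38,000,000 files downloaded since October 2010"
days_elapsed = 365            # ASSUMPTION: roughly one year up to the October 2011 talk

# The file count is a lower bound, so this is an upper bound on the mean file size.
mean_file_size_mb = total_bytes / file_count / 10**6
downloads_per_day = downloads / days_elapsed

print(f"mean file size: at most ~{mean_file_size_mb:.0f} MB")
print(f"downloads: ~{downloads_per_day:,.0f} files per day")
```

A mean of around 10 MB per file and on the order of 100,000 downloads a day are volumes nobody could manage by hand, which is the slide's point about scripted systems.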


[Diagram: CEDA archive workflow. Components: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive, Backup, Catalogue (metadata), External discovery service, Web service (download, view, discovery), External Users]
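As a toy model of the flow in this diagram, the sketch below moves a file from an arrivals area into a per-dataset archive directory, keeps a backup copy and appends a catalogue record. The class `ToyArchive`, the directory layout and the catalogue fields are hypothetical; a real ingest system would also handle retries, permissions, metadata extraction and multiple archive volumes.

```python
import hashlib
import shutil
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CatalogueEntry:
    """Hypothetical minimal catalogue record for one archived file."""
    dataset: str
    archive_path: str
    sha256: str


@dataclass
class ToyArchive:
    """Toy model of Arrivals -> Ingest -> Archive + Backup + Catalogue (illustrative only)."""
    arrivals: Path
    archive: Path
    backup: Path
    catalogue: list[CatalogueEntry] = field(default_factory=list)

    def ingest(self, filename: str, dataset: str) -> CatalogueEntry:
        source = self.arrivals / filename
        # Checksum before moving, so the archived copy can be verified later.
        # (read_bytes loads the whole file: fine for a sketch, not for multi-GB files.)
        checksum = hashlib.sha256(source.read_bytes()).hexdigest()

        # Move the file out of arrivals into a per-dataset directory in the archive.
        target = self.archive / dataset / filename
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))

        # Keep an independent copy in the backup store.
        backup_copy = self.backup / dataset / filename
        backup_copy.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(target, backup_copy)

        # Record the file so discovery and data services can find it.
        entry = CatalogueEntry(dataset, target.relative_to(self.archive).as_posix(), checksum)
        self.catalogue.append(entry)
        return entry
```

Storing the checksum in the catalogue entry means the same record can later be used to verify both the archived and the backup copy.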


Data Preparation

[Diagram: the archive workflow, showing Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest and Archive]


Data Preparation

• Data Management Plans including delivery schedules

• Conditions of Use/Licensing

• Support suppliers in data preparation

• Capture supporting documentation (formats, calibration information, flight logs, etc.)

• File naming and archive structure

• Set up ingest routes
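One way to make “file naming and archive structure” concrete is a convention that encodes what, which instrument and when in every name, so a file can be identified without opening it. The pattern below (`<project>_<instrument>_<YYYYMMDD>_v<N>.<ext>`) and both functions are hypothetical examples, not CEDA's actual convention.

```python
import re
from datetime import date, datetime

# Hypothetical convention: <project>_<instrument>_<YYYYMMDD>_v<N>.<ext>
PART = r"[a-z0-9-]+"
NAME_PATTERN = re.compile(
    rf"^(?P<project>{PART})_(?P<instrument>{PART})_(?P<day>\d{{8}})_v(?P<version>\d+)\.(?P<ext>[a-z0-9]+)$"
)


def make_filename(project: str, instrument: str, day: date, version: int, ext: str = "nc") -> str:
    """Build a name that states project, instrument and a sortable ISO-order date."""
    for part in (project, instrument):
        if not re.fullmatch(PART, part):
            raise ValueError(f"use lower-case letters, digits and hyphens only: {part!r}")
    return f"{project}_{instrument}_{day:%Y%m%d}_v{version}.{ext}"


def parse_filename(name: str) -> dict:
    """Recover the fields from a conforming name; reject anything else."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"does not follow the naming convention: {name!r}")
    fields = match.groupdict()
    fields["day"] = datetime.strptime(fields["day"], "%Y%m%d").date()
    fields["version"] = int(fields["version"])
    return fields
```

A parser that rejects non-conforming names can run automatically on an ingest route, so a name such as “sw010203” is caught before it reaches the archive.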


Data Preparation - File structure

Take the bad data challenge… file “sw010203”

What are these data? Guess surface winds, but on what day?

What are the units? Any convention?

How do we read the file? Is this spatial or temporal data?

… 1440 pairs of data in a file

4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9 4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3 4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5 5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6 4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6 4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0 4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5 5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4 5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5 4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4
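To show how little the bad file gives a reader, the sketch below does the only thing the file supports unambiguously: splitting the numbers into pairs. The interpretation in the comments (one-minute sampling because 1440 = 24 × 60, speed then direction) is guesswork, which is exactly the problem the slide describes; `read_pairs` and `implied_interval_minutes` are illustrative names.

```python
def read_pairs(text: str) -> list[tuple[float, float]]:
    """Split a whitespace-separated stream of numbers into consecutive pairs."""
    values = [float(token) for token in text.split()]
    if len(values) % 2:
        raise ValueError("odd number of values: even the pairing assumption fails")
    return list(zip(values[0::2], values[1::2]))


def implied_interval_minutes(n_pairs: int, span_hours: float = 24.0) -> float:
    """Sampling interval IF the file spans span_hours; nothing in the file confirms that."""
    return span_hours * 60 / n_pairs


# First three pairs of "sw010203", copied from the slide.
pairs = read_pairs("4.31 155.3 3.92 136.1 5.15 140.2")

# Guesswork the file cannot settle: 1440 pairs over an assumed 24 h would be one value
# per minute; the first column might be a speed (units?) and the second a direction
# (degrees? "from" or "towards"?), and "010203" could be read in several date orders.
interval = implied_interval_minutes(1440)
```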


Supported Formats

Highly structured metadata

Standard Names
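Assuming “Standard Names” refers to a controlled vocabulary such as the CF conventions' standard name table (the slide itself does not say), a metadata check might look like the sketch below. It flags variables lacking a `standard_name` or `units` attribute, or using a name outside a tiny hard-coded excerpt of the table; a real checker would load the full, versioned table. The variable names and units in the example are hypothetical.

```python
# A tiny, hard-coded excerpt of the CF standard name table, enough for this example.
KNOWN_STANDARD_NAMES = {"wind_speed", "wind_from_direction", "air_temperature"}
REQUIRED_ATTRIBUTES = ("standard_name", "units")


def metadata_problems(variables: dict[str, dict[str, str]]) -> list[str]:
    """List missing required attributes and unrecognised standard names, per variable."""
    problems = []
    for var_name, attrs in variables.items():
        for attr in REQUIRED_ATTRIBUTES:
            if attr not in attrs:
                problems.append(f"{var_name}: missing {attr}")
        standard_name = attrs.get("standard_name")
        if standard_name is not None and standard_name not in KNOWN_STANDARD_NAMES:
            problems.append(f"{var_name}: unknown standard_name {standard_name!r}")
    return problems


# Wind columns described (hypothetically) so that software can interpret them.
described = {
    "speed": {"standard_name": "wind_speed", "units": "m s-1"},
    "direction": {"standard_name": "wind_from_direction", "units": "degree"},
}
# The same columns as they might arrive: one unlabelled, one with a free-text label.
undescribed = {"col1": {}, "col2": {"standard_name": "surface winds"}}
```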


Data Discovery

[Diagram: the archive workflow, showing Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive, Catalogue (metadata), External discovery service, Web service (discovery) and External Users]


CEDA Catalogue


NERC Data Discovery Service

data-search.nerc.ac.uk


CEDA Document Repository

cedadocs.badc.rl.ac.uk


Citations for Data Creators: DOIs

Citation (and DOI)

Data Citation and DOI… but only if the data are in a recognised repository
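A dataset with a DOI can be cited much like a paper. The sketch below assembles a citation string loosely modelled on the DataCite recommended form (creators, year, title, publisher, resolvable DOI). The field names, the shallow DOI check and all example values (including the placeholder DOI) are assumptions, not CEDA's citation format.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Hypothetical subset of the metadata a repository holds for a citable dataset."""
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str

    def citation(self) -> str:
        # DOIs have the form "10.<prefix>/<suffix>"; this is only a shallow sanity check.
        prefix, _, suffix = self.doi.partition("/")
        if not prefix.startswith("10.") or not suffix:
            raise ValueError(f"not a DOI: {self.doi!r}")
        authors = "; ".join(self.creators)
        return f"{authors} ({self.year}): {self.title}. {self.publisher}. https://doi.org/{self.doi}"
```

Because the DOI resolves through a persistent link, the citation keeps working even if the repository reorganises its web pages.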


Data Services

[Diagram: the archive workflow, showing Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive, Catalogue (metadata), External discovery service, Web service (download, view, discovery) and External Users]


Visualisation Services


Visualisation Services

ISIC Video Wall


Visualisation Services


Processing Services

CEDA WPS: ceda-wps2.badc.rl.ac.uk/ui/home

Chain services together

Download result

Job either run straight away or sent to run on a backend service
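The two ideas on this slide, chaining services and choosing between immediate and backend execution, can be sketched with plain Python functions. This is not the OGC WPS protocol or CEDA's implementation; the toy step functions, the -999 fill value and the size threshold in `Dispatcher` are all hypothetical.

```python
from collections import deque
from typing import Callable, Sequence

Step = Callable[[list[float]], list[float]]
FILL_VALUE = -999.0  # hypothetical missing-data marker


def drop_missing(values: list[float]) -> list[float]:
    return [v for v in values if v != FILL_VALUE]


def double(values: list[float]) -> list[float]:
    return [2 * v for v in values]


def chain(steps: Sequence[Step]) -> Step:
    """Compose steps so each one's output becomes the next one's input."""
    def run(values: list[float]) -> list[float]:
        for step in steps:
            values = step(values)
        return values
    return run


class Dispatcher:
    """Run small jobs straight away; queue larger ones for a backend (hypothetical policy)."""

    def __init__(self, max_inline_size: int = 1000) -> None:
        self.max_inline_size = max_inline_size
        self.backend_queue: deque = deque()

    def submit(self, job: Step, values: list[float]) -> tuple[str, object]:
        if len(values) <= self.max_inline_size:
            return ("done", job(values))
        self.backend_queue.append((job, values))
        return ("queued", len(self.backend_queue))
```

Keeping each step a small function with the same input and output type is what makes arbitrary chains possible; the dispatcher only decides where a chain runs, not what it does.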


Processing Services

Trajectory Service


OPeNDAP Service with security layer

• Navigable and scriptable interface to the archive

• CEDA has applied a security shell using OpenID technology

• Gives a powerful subsetting service for large datasets
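OPeNDAP subsetting works by appending a constraint expression to a dataset URL, so only the requested slice travels over the network. The sketch below builds a DAP2-style request using the `[start:stride:stop]` hyperslab syntax (stop inclusive) and counts how many values it selects. The dataset URL and variable name in the example are placeholders, authentication through CEDA's OpenID layer is not shown, and some HTTP clients also require the square brackets to be percent-encoded.

```python
def hyperslab_size(ranges: list[tuple[int, int, int]]) -> int:
    """Number of values selected by (start, stride, stop) index ranges, stop inclusive."""
    size = 1
    for start, stride, stop in ranges:
        if stride < 1 or stop < start:
            raise ValueError(f"bad range {(start, stride, stop)}")
        size *= (stop - start) // stride + 1
    return size


def opendap_subset_url(dataset_url: str, variable: str,
                       ranges: list[tuple[int, int, int]], response: str = "ascii") -> str:
    """DAP2-style request for a hyperslab of one variable, e.g. var[0:1:23][10:2:20]."""
    hyperslab_size(ranges)  # validate the ranges before building the URL
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)
    return f"{dataset_url}.{response}?{variable}{hyperslab}"
```

Because the server does the slicing, a client can pull a few hundred values out of a multi-gigabyte file without downloading the rest.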


What’s on the horizon?

Continue to develop visualisation and data processing services

Data volumes are becoming too large to move around

Hosting services – provide virtual environments for people to work on the data without downloading

From Petascale to Exascale

But all this NEEDS well-structured data that uses standards-driven metadata and formats
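A rough transfer-time estimate shows why moving petascale data around is impractical. The function below assumes a link sustaining a given bit rate (1 Gbit/s in the example, an assumption rather than a figure from the talk) and folds all protocol overheads into a single efficiency factor.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_bytes: float, link_bits_per_second: float, efficiency: float = 1.0) -> float:
    """Days needed to copy volume_bytes over a link running at a fraction of its nominal rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    seconds = volume_bytes * 8 / (link_bits_per_second * efficiency)
    return seconds / SECONDS_PER_DAY


one_petabyte = 10**15        # decimal petabyte, in bytes
one_gigabit_per_s = 10**9    # ASSUMED sustained link speed
print(f"~{transfer_days(one_petabyte, one_gigabit_per_s):.0f} days")
```

At a sustained 1 Gbit/s, ~1 PB takes about three months to copy; an exabyte would take roughly a thousand times longer, which is why bringing the computation to the data becomes attractive.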


Take Home Messages

Team Digital Preservation Video

• Plan for data management

• Tap into standards when preparing data

• Get data catalogued for data discovery

• Placing data in supported repositories earns recognition for the effort of preparing them

• A suite of additional services adds value to existing data