VO Sandpit, November 2009
Environmental Data Archival: Practices and Benefits
Graham Parton [email protected]
Royal Meteorological Society SIG Meeting, BAS, 5th October 2011:
Transmission, presentation and archiving of meteorological data
Overview
What is data archival?
Why do it?
How do we do it within CEDA?
What do we call “data archival”?
Placing data into a repository which is:
• Backed up
• Robust (identify data corruptions)
• Catalogued
• Recognised repository
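One common way to "identify data corruptions" is to record a checksum for each file at ingest and recompute it later. The sketch below illustrates that idea only; it is not a description of CEDA's actual tooling, and the function names and manifest layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_corrupted(manifest: dict[str, str], root: Path) -> list[str]:
    """List files that are missing or whose digest differs from the manifest.

    `manifest` maps a path relative to `root` to the digest recorded at ingest
    (a hypothetical layout chosen for this sketch).
    """
    bad = []
    for rel_path, expected in sorted(manifest.items()):
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad
```

Running `find_corrupted` on a schedule against the archive and its backups would flag any file whose contents have silently changed since ingest.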
Why archive data?
• Making data public - Openness of the result and repeatability are essential for scientific rigor
• Place to share data with project participants
• Re-purposing data
• Additional services (often for free!)
• May be required for legal reasons
• Secure
• Get credit
And because if you don’t….
Scale of CEDA operations
• >100,000,000 files holding ~1 PB of data
• ~38,000,000 files downloaded since October 2010
• 19,000+ registered users, of which ~3,600 are currently ‘active’ users
• 250+ datasets
• 26 staff
• Also responsible for other services and projects (e.g. UKCIP, CMIP5 partner)
… i.e. we are highly reliant on scripted systems and a well-structured archive
[Diagram: CEDA data flow. Labels: Data Suppliers, 3rd Party Data Providers, Arrivals, Ingest, Archive, Backup, Catalogue (metadata), External discovery service, Web service (download, view, discovery), External Users]
Data Preparation
• Data Management Plans including delivery schedules
• Conditions of Use/Licensing
• Support suppliers in data preparation
• Capture supporting documentation (formats, calibration information, flight logs, etc.)
• File naming and archive structure
• Set up ingest routes
Data Preparation - File structure
Take the bad data challenge…. File “sw010203”
• What are these data? Guess surface winds, but on what day?
• What are the units? Any convention?
• How do we read the file? Is this spatial or temporal data?
... 1440 pairs of data in a file
4.31 155.3 3.92 136.1 5.15 140.2 4.23 137.1 4.75 150.2 4.71 137.9 4.35 146.5 4.52 138.0 4.83 153.7 5.40 145.8 4.63 141.0 4.90 137.3 4.31 143.3 4.58 157.0 4.94 141.7 4.65 143.1 4.63 143.0 4.88 149.5 5.42 148.5 4.92 140.4 4.04 146.7 3.92 151.5 5.02 135.3 5.06 151.6 4.65 152.3 4.31 168.8 3.79 145.3 5.92 152.9 5.02 145.8 4.77 161.6 4.79 144.1 4.60 147.5 5.33 150.1 4.81 141.0 6.02 146.9 4.38 149.0 4.42 142.5 4.58 133.4 4.35 150.5 4.96 149.8 5.56 143.4 5.08 148.5 5.19 141.6 4.40 142.4 4.10 152.6 5.02 134.0 4.94 142.9 5.27 144.4 5.38 141.5 5.88 144.8 6.00 140.1 4.75 158.3 5.08 148.1 5.46 163.5 4.27 150.8 4.69 138.8 5.71 144.0 5.21 138.8 5.00 132.4 5.06 144.4
Supported Formats
Highly structured metadata
Standard Names
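To make the contrast with the "sw010203" challenge concrete, here is a minimal sketch of the same numbers wrapped in self-describing metadata. Everything the bare file leaves unanswered is filled in here as an explicit assumption: that the pairs are wind speed and wind direction, the units, the start date, and a one-minute time step (1440 pairs would match the minutes in a day). The names `wind_speed` and `wind_from_direction` are standard names from the CF conventions; the JSON layout and the `describe` function are hypothetical and used only for illustration.

```python
import json

# First three pairs copied from the "sw010203" listing.
RAW_PAIRS = [(4.31, 155.3), (3.92, 136.1), (5.15, 140.2)]


def describe(pairs, start_time, step_seconds):
    """Wrap bare value pairs in metadata so the record explains itself."""
    return {
        "title": "Surface wind observations",  # assumed content
        "time": {"start": start_time, "step_seconds": step_seconds},
        "variables": [
            {
                "standard_name": "wind_speed",
                "units": "m s-1",  # assumed units
                "values": [pair[0] for pair in pairs],
            },
            {
                "standard_name": "wind_from_direction",
                "units": "degree",  # assumed units
                "values": [pair[1] for pair in pairs],
            },
        ],
    }


# The start date is a guess: "010203" could be read in several ways.
record = describe(RAW_PAIRS, "2003-02-01T00:00:00Z", 60)
print(json.dumps(record, indent=2))
```

With this information stored alongside the values, a reader no longer has to guess the variable, the units, or whether the series is temporal.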
Data Discovery
CEDA Catalogue
NERC Data Discovery Service: data-search.nerc.ac.uk
CEDA Document Repository: cedadocs.badc.rl.ac.uk
Citations for Data Creators: DOIs
Citation (and DOI)
Data Citation and DOI… but only if in a recognised repository
Data Services
Visualisation Services
ISIC Video Wall
Processing Services
CEDA WPS: ceda-wps2.badc.rl.ac.uk/ui/home
• Chain services together
• Download result
• Job either run straight away or sent to run on a backend service
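The two ways of running a job can be sketched as request URLs, assuming the service follows the OGC WPS 1.0.0 key-value-pair encoding (the slides do not say which version or encoding the CEDA WPS uses). The endpoint, the process identifier `subset` and its inputs are hypothetical; only the parameter names (`service`, `version`, `request`, `Identifier`, `DataInputs`, `storeExecuteResponse`, `status`) come from the WPS 1.0.0 specification.

```python
from urllib.parse import urlencode

ENDPOINT = "https://wps.example.org/wps"  # hypothetical endpoint


def wps_request_url(endpoint, request, extra=None):
    """Build a WPS 1.0.0 key-value-pair request URL."""
    query = {"service": "WPS", "version": "1.0.0", "request": request}
    query.update(extra or {})
    return f"{endpoint}?{urlencode(query)}"


def execute_url(endpoint, process_id, inputs, run_async=False):
    """Build an Execute request; run_async asks the server to queue the job."""
    extra = {
        "Identifier": process_id,
        "DataInputs": ";".join(f"{key}={value}" for key, value in inputs.items()),
    }
    if run_async:
        # Store the response and report status so the client can poll for it.
        extra["storeExecuteResponse"] = "true"
        extra["status"] = "true"
    return wps_request_url(endpoint, "Execute", extra)


print(wps_request_url(ENDPOINT, "GetCapabilities"))
print(execute_url(ENDPOINT, "subset", {"variable": "wind_speed"}))
print(execute_url(ENDPOINT, "subset", {"variable": "wind_speed"}, run_async=True))
```

A synchronous request returns the result in the response itself, while the asynchronous form returns a status document that the client checks until the result is ready to download.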
Processing Services: Trajectory Service
OPeNDAP Service with security layer
• Navigable and scriptable interface to the archive
• CEDA has applied a security shell using “Open ID” technology
• Gives a powerful sub-setting service for large datasets
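As an illustration of what scriptable sub-setting looks like, the sketch below builds a DAP2-style constraint expression, where each dimension takes a `[start:stride:stop]` hyperslab with an inclusive stop index. The dataset URL and the variable name are hypothetical, and the authentication step required by the security layer is not shown.

```python
def hyperslab(start, stop, stride=1):
    """Format one dimension of a DAP2 array subset (stop index is inclusive)."""
    return f"[{start}:{stride}:{stop}]"


def subset_url(dataset_url, variable, slices, response="ascii"):
    """Build a URL requesting part of one variable from an OPeNDAP dataset.

    `slices` holds one (start, stop) or (start, stop, stride) tuple per dimension.
    `response` is the DAP2 response suffix, e.g. "ascii", "dods", "dds" or "das".
    """
    constraint = variable + "".join(hyperslab(*dims) for dims in slices)
    return f"{dataset_url}.{response}?{constraint}"


# Hypothetical dataset: first ten time steps at a single grid point.
url = subset_url(
    "https://opendap.example.org/data/winds.nc",
    "wind_speed",
    [(0, 9), (0, 0)],
)
print(url)
```

Only the requested slice is transferred, which is what makes the approach attractive for large datasets. Some HTTP clients require the square brackets in the query to be percent-encoded.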
What’s on the horizon?
Continue to develop visualisation and data processing services
Increasing data volumes becoming too large to move around
Hosting services – provide virtual environments for people to work on the data without downloading
From Petascale to Exascale
But all this NEEDS well-structured data that uses standards-driven metadata and formats
Take Home Messages
Team Digital Preservation Video
• Plan for data management
• Tap into standards when preparing data
• Get data catalogued for data discovery
• Data in supported repositories leads to recognition for efforts preparing data
• A suite of additional services adds value to existing data