Data Standards Workflow

Extract, Transform, Load, Provide

• Raw data: store raw data in subversion to keep track of history (OpenEarthRawData)
• Scripts: add meta information; script to convert raw data into netcdf
• Database: stored files (netcdf) accessible through the web (OpenEarth OPeNDAP)
• Charts & Maps: tools and websites (OpenEarthTools)

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Add metadata

• Use the INSPIRE metadata form to store information about the dataset.
• http://www.inspire-geoportal.eu/inspireEditor.htm
• Click launch editor

In the editor:

• Validation: turn validation on
• File identification: location in subversion (micore)
• Quality: history of your data.
• Constraints: please fill in limitations of use.

Store in course/Pcnumber/inspire_description.xml

Save metadata file

1. Save metadata file (local)
2. Add to subversion (local)
3. Commit => metadata into subversion (remote)

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

What is netcdf

• Data format defined by Unidata
• Data store used for coverage data and multidimensional data
• CF metadata convention

[Figure: coordinate axes X, Y, Z and T]

• An array-based data structure for storing multidimensional data
• N-dimensional coordinate systems
  • X coordinate (e.g. longitude)
  • Y coordinate (e.g. latitude)
  • Z coordinate (e.g. altitude)
  • Time dimension
  • … other dimensions
• Variables: support for multiple variables
  • Temperature, humidity, pressure, salinity, etc.
• Geometry: implicit or explicit
  • Regular grid (implicit)
  • Irregular grid
  • Points

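Storing Multidimensional Data

As a table (32 numbers):

X Y Z Q
1 1 1 0.5
1 1 2 0.3
1 2 1 0.6
1 2 2 0.1
2 1 1 0.4
2 1 2 0.2
2 2 1 0.9
2 2 2 0.3

As an array (14 numbers): the coordinate axes X = 1, 2; Y = 1, 2; Z = 1, 2 are stored once, and Q is stored as a 2 x 2 x 2 array (columns: X, rows: Y).

Z = 1:
0.5 0.4
0.6 0.9

Z = 2:
0.3 0.2
0.1 0.3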
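The count of 32 versus 14 numbers on the "Storing Multidimensional Data" slide can be checked with a few lines of Matlab. This is only a sketch: the variable names (tableData, nTable, nArray) are made up for the illustration, and the values are the ones from the slide.

```matlab
% Table layout: one row per grid point, columns X, Y, Z, Q (8 rows x 4 columns).
tableData = [ ...
    1 1 1 0.5
    1 1 2 0.3
    1 2 1 0.6
    1 2 2 0.1
    2 1 1 0.4
    2 1 2 0.2
    2 2 1 0.9
    2 2 2 0.3];

% Array layout: each coordinate axis is stored once, Q is indexed as Q(x, y, z).
x = [1 2];
y = [1 2];
z = [1 2];
Q = NaN(numel(x), numel(y), numel(z));
for row = 1:size(tableData, 1)
    Q(tableData(row, 1), tableData(row, 2), tableData(row, 3)) = tableData(row, 4);
end

% Count the stored numbers in both layouts.
nTable = numel(tableData);                          % 8 rows * 4 columns
nArray = numel(x) + numel(y) + numel(z) + numel(Q); % 3 axes of 2 values + 8 values
fprintf('table: %d numbers, array: %d numbers\n', nTable, nArray);
```

The saving grows with the grid size, because the table repeats every coordinate for every point while the array stores each axis only once.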

Data Model

Data model for netcdf and others. Also usable for hdf, opendap, grib, etc. See the java library for details.

ArcGis

ArcGis also reads and writes netcdf files.

Your favorite text editor

xml representation of a netcdf file

Other Tools

NCO

# diff
ncdiff -v time file1.nc file2.nc
# compression & packing
ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)
# selecting variables by regex
ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.

IDV

Very useful

Web hyperslabs, cool!

Not so stable.

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

Write script

• Read raw data
  • Read header line
  • Read data
  • Read all data
  • Create function to read all data
  • Use function in Matlab
• Raw data into empty netcdf file
  • Create empty netcdf file
  • Add dimensions and variables
  • Store variables
• Read values

Reading raw data into memory

• Use one of the following matlab functions to read the file data into an array
  • fscanf

Example: Transect.txt file

Location: OpenEarthRawData\course\example\raw

1999 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951
…
200 -2415 210 -2995 220 -3595 99999999999 99999999999
2000 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221
-110 8231 -105 9841 -100 10971 -95 12171 -90 12951

• Header line: year, number of points
• Points: X Z X Z …. 9999999

Read header line

>> fid = fopen('..\raw\transect.txt')
fid = 15

>> header = fscanf(fid, '%d', 2)
header =
  2000
  58

>> year = header(1)
year = 2000

>> npoint = header(2)
npoint = 58

Read data

% read header
header = fscanf(fid, '%d', 2);
year = header(1);
% store year in time
time(i) = year;
npoint = header(2);
% read data
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors
data = data';

>> % read data
>> data = fscanf(fid, '%d', npoint*2)
data =
  -150
  3741
  -140
  3581
  -135
  …

>> data = reshape(data, [2, npoint])
data =
  Columns 1 through 7
  -150 -140 -135 -130 …
  3741 3581 3531 3541 …

>> % use column vectors
>> data = data'
data =
  -150 3741
  -140 3581
  -135 3531
  …

Read all data

% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the last data line
    fgetl(fid);
    i = i + 1;
end
% close the file when done
fclose(fid);

Create a function

function transect = readtransect(filename)
% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen(filename);
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the last data line
    fgetl(fid);
    i = i + 1;
end
% close the file when done
fclose(fid);
transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end

Use the new function

>> data = readtransect('..\raw\transect.txt')

data =
    series: [3x58 double]
    distance: [58x1 double]
    time: [3x1 double]
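The reader on the slides hard-codes 3 years and 58 points. A possible variation, shown here only as a sketch, lets the arrays grow while reading and stops when no complete header is left. The test file, its values and the output names are made up for the illustration; only the layout (header with year and number of points, X Z pairs, filler values at the end of the last line) follows the example file.

```matlab
% Write a small two-year test file in the transect layout (made-up values).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '1999 3\n-10 500 0 300 10 100 99999999999 99999999999\n');
fprintf(fid, '2000 3\n-10 520 0 310 10 90 99999999999 99999999999\n');
fclose(fid);

% Read the file without knowing the number of years in advance.
time = [];
series = [];
distance = [];
fid = fopen(filename);
while true
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break   % no complete header left: end of file
    end
    npoint = header(2);
    data = fscanf(fid, '%d', npoint * 2);
    data = reshape(data, [2, npoint])';   % rows: points, columns: X, Z
    time(end + 1, 1) = header(1);         % append the year
    series(end + 1, :) = data(:, 2)';     % append one row of heights
    distance = data(:, 1);
    fgetl(fid);                           % skip the filler values on the last line
end
fclose(fid);
delete(filename);

transect = struct('series', series, 'distance', distance, 'time', time);
disp(transect)
```

Growing arrays inside a loop is slower than preallocating, so for large files the preallocated version from the slides is the better choice once the sizes are known.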

Loading data into netcdf

• What does a netcdf file look like
• Required meta information

Netcdf file: transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}

Create an empty netcdf file

>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {

dimensions:

variables:

}

Add dimensions

>> nc_add_dimension(outputfile, 'coastward', 58)
>> nc_add_dimension(outputfile, 'time', 3)
>> nc_dump(outputfile)
netcdf transect.nc {

dimensions:
coastward = 58 ;
time = 3 ;

variables:
}

help nc_add_dimension

Add variables

% the dimension and variable names match the dump and the nc_varput calls
coastwardVariable = struct(...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, coastwardVariable);
timeVariable = struct(...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
    );
nc_addvar(outputfile, timeVariable);
heightVariable = struct(...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, heightVariable);
nc_dump(outputfile)

help nc_addvar

Result

netcdf transect.nc {

dimensions:
coastward = 58 ;
time = 3 ;

variables:
float coastward_distance(coastward), shape = [58]
    coastward_distance:unit = "metre"
float year(time), shape = [3]
    year:unit = "year"
float height(time,coastward), shape = [3 58]
    height:unit = "metre"

}

Store variables

nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)

help nc_varput

Result: Netcdf file transect.nc

netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
    coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
    year = 1999, 2000, 2001 ;
    height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
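The create, add and store steps use the nc_* helper functions shown on the slides. For comparison, the same file layout can probably also be produced with the netCDF functions built into recent Matlab releases (nccreate, ncwriteatt, ncwrite, ncread, ncdisp). This is a swapped-in alternative, not the method of the course, and the data below are made up: 3 years and 4 points instead of 58.

```matlab
% Sketch with Matlab's built-in netCDF functions; all data values are made up.
outputfile = [tempname '.nc'];

distance = single([-10; 0; 10; 20]);      % 4 points along the transect
years    = single([1999; 2000; 2001]);    % 3 survey years
height   = single([5.0 3.0 1.0 -1.0
                   5.2 3.1 0.9 -1.2
                   5.1 2.9 0.8 -1.3]);    % size (time, coastward) = [3 4]

% nccreate defines a variable and creates its dimensions if they do not exist yet.
nccreate(outputfile, 'coastward_distance', ...
    'Dimensions', {'coastward', 4}, 'Datatype', 'single');
nccreate(outputfile, 'year', ...
    'Dimensions', {'time', 3}, 'Datatype', 'single');
nccreate(outputfile, 'height', ...
    'Dimensions', {'time', 3, 'coastward', 4}, 'Datatype', 'single');

% Same attribute name as on the slides ('unit').
ncwriteatt(outputfile, 'coastward_distance', 'unit', 'metre');
ncwriteatt(outputfile, 'year', 'unit', 'year');
ncwriteatt(outputfile, 'height', 'unit', 'metre');

% Store the values.
ncwrite(outputfile, 'coastward_distance', distance);
ncwrite(outputfile, 'year', years);
ncwrite(outputfile, 'height', height);

% Show the file structure and read one variable back.
ncdisp(outputfile);
heightRead = ncread(outputfile, 'height');
```

Here one nccreate call replaces the separate "add dimension" and "add variable" steps, which is the main difference from the workflow on the slides.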

Read values

surface(nc_varget(outputfile, 'height')')

[Figure: surface plot of height; axis ticks 1 to 3, 0 to 60 and -5000 to 15000]

Store in netcdf

• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention

CF convention

Standard used by USGS, NOAA, Arcgis, GDAL

Climate and Forecast (CF) Convention
http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html

Initially developed for
• Climate and forecast data
• Atmosphere, surface and ocean model-generated data

• Also used for observational datasets
• CF is the most widely used convention for geospatial netCDF data.

Improve output

• Store extra attributes
  • Title
  • Author
  • Standard_name
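The extra attributes could be added as in the sketch below. It uses Matlab's built-in ncwriteatt and ncreadatt rather than the nc_* helpers from the slides, and it works on a small made-up file. The title and author strings are placeholders, and the choice of the CF standard name altitude for the height variable is an assumption that has to be checked against the vertical reference of the actual data. Note that CF spells the unit attribute as units.

```matlab
% Sketch: add CF-style attributes to a small made-up file.
outputfile = [tempname '.nc'];
nccreate(outputfile, 'height', ...
    'Dimensions', {'time', 3, 'coastward', 4}, 'Datatype', 'single');

% Global attributes: the location '/' addresses the file itself.
ncwriteatt(outputfile, '/', 'title', 'Example transect heights');  % placeholder text
ncwriteatt(outputfile, '/', 'author', 'A. Author');                % placeholder text

% Variable attributes.
ncwriteatt(outputfile, 'height', 'standard_name', 'altitude');     % assumed standard name
ncwriteatt(outputfile, 'height', 'units', 'm');                    % CF uses 'units'

% Read the attributes back to check them.
titleRead = ncreadatt(outputfile, '/', 'title');
nameRead  = ncreadatt(outputfile, 'height', 'standard_name');
fprintf('%s, standard_name = %s\n', titleRead, nameRead);
```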

Transform

• Add metadata
• Store in netcdf
• Save script in subversion

Save script

1. Save script (local, using matlab https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/)
2. Add to subversion (local)
3. Commit => script into subversion (remote)