ECMWF – WMO Metadata Workshop – Beijing, Sep 2005
Experience with the WMO core metadata in the SIMDAT/VGISC project
Baudouin Raoult
ECMWF
The SIMDAT/VGISC project
SIMDAT
  EU-funded GRID project
  7 technologies: Grid infrastructure, Virtual Organisation, Ontologies, Analysis Services, Workflows, Distributed data access, Knowledge Services
  4 activities: Automotive, Aerospace, Pharmacy and Meteorology
Meteorology activity: build a Virtual GISC (V-GISC)
  Partners: DWD, UKMO, Météo-France, EUMETSAT, ECMWF
V-GISC infrastructure
V-GISC Conceptual view
Through the Distributed Portal, users search for and retrieve data and subscribe to services, subject to authentication and authorization
The Virtual Database Service provides a single view of the partners’ databases
VGISC Distributed Architecture
Why do we need metadata (in this project)?
Create a catalogue (discovery metadata)
  Searchable (keyword, geographical location, time range)
  Browsable (directory hierarchy)
Implement the V-GISC (service metadata)
  Describe where the data resides (physical location)
  Describe how to request the data
  Describe the data format (useful for offering a list of transformations, e.g. sub-sampling of gridded data, plots or format conversions)
  Describe associated data policies
Study of the WMO core
Starting point:
  XML files available on the WMO web site
  XML files from an earlier DWD prototype
  Trying to describe the ECMWF archive (1.3 × 10^10 GRIB fields)
XML Root element
<p:piTimeseries xmlns:p="http://www.wmo.ch/web/www/metadata/piTimeseries" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.wmo.ch/web/www/metadata" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata http://www.dwd.de/UNIDART/metadata/WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/metadata/piTimeseries http://www.dwd.de/UNIDART/metadata/WMO19115_piTimeseries_schema.xsd">
or
<metaData xmlns="http://www.wmo.ch/web/www/metadata" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:fc="http://www.wmo.ch/web/www/featurecatalogue" xsi:schemaLocation="http://www.wmo.ch/web/www/metadata/../WMO19115_metadata_v0_2.xsd http://www.wmo.ch/web/www/featurecatalogue/./featurecat/iso19110.xsd">
Namespaces are a nightmare to use (especially using XPath when there is a default namespace)
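The default-namespace problem can be reproduced with Python's standard `xml.etree.ElementTree` (our choice of toolkit, not necessarily the one used in the project). The sample is a trimmed, hypothetical document in the WMO namespace: the unprefixed path silently matches nothing, and the query only works once an arbitrary prefix (here `wmo`) is bound to the namespace URI.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical document: the default namespace applies to every
# unprefixed element, including <descriptiveKeywords>.
DOC = """<metaData xmlns="http://www.wmo.ch/web/www/metadata">
  <identificationInfo>
    <descriptiveKeywords>Temperature</descriptiveKeywords>
    <descriptiveKeywords>Wind</descriptiveKeywords>
  </identificationInfo>
</metaData>"""

WMO_NS = "http://www.wmo.ch/web/www/metadata"

root = ET.fromstring(DOC)

# ElementTree stores tags as "{namespace-uri}localname", so a plain name
# never matches an element that lives in the default namespace.
naive = root.findall(".//descriptiveKeywords")

# XPath has no notion of a default namespace: a prefix ("wmo", chosen
# arbitrarily) must be bound to the URI and used in every step of the path.
ns = {"wmo": WMO_NS}
qualified = [e.text for e in root.findall(".//wmo:descriptiveKeywords", ns)]

print(len(naive))   # 0
print(qualified)    # ['Temperature', 'Wind']
```

Since Python 3.8, ElementTree also accepts the `{*}` wildcard (`.//{*}descriptiveKeywords`), which avoids the prefix but matches the local name in any namespace.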
XML Keywords
<descriptiveKeywords>Russian Federation</descriptiveKeywords><descriptiveKeywords>Moscow region</descriptiveKeywords><descriptiveKeywords>Temperature</descriptiveKeywords><descriptiveKeywords>Clouds</descriptiveKeywords><descriptiveKeywords>Meteorology</descriptiveKeywords><descriptiveKeywords>Observation</descriptiveKeywords><descriptiveKeywords>Pressure</descriptiveKeywords><descriptiveKeywords>Rainfall</descriptiveKeywords><descriptiveKeywords>Snow</descriptiveKeywords><descriptiveKeywords>Snowfall</descriptiveKeywords><descriptiveKeywords>Weather</descriptiveKeywords><descriptiveKeywords>Wind</descriptiveKeywords><descriptiveKeywords>Phenomenon</descriptiveKeywords>
Or…
<descriptiveKeywords>EARTH SCIENCE > Cryosphere > Sea Ice</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Atmosphere</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Oceans</descriptiveKeywords><descriptiveKeywords>EARTH SCIENCE > Solid Earth</descriptiveKeywords><descriptiveKeywords>ocean, atmosphere, ice, land</descriptiveKeywords>
Or…
<descriptiveKeywords>METAR aviation hourly weather observation temperature dew point precipitation amount visibility cloud amount type height weather runway colour state</descriptiveKeywords>
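Indexing these three styles means folding them into one flat list of terms. A minimal sketch, assuming our own heuristic rather than any WMO rule: `>` marks a GCMD-style hierarchy, commas separate independent keywords, and anything else is one multi-word keyword unless a `separator` is supplied for free text (the idea of a configurable separator is borrowed from the `separator` attribute of the V-GISC keyword indexing behaviour).

```python
def normalize_keywords(values, separator=None):
    """Turn raw <descriptiveKeywords> strings into de-duplicated index terms.

    Heuristic (an assumption, not part of the WMO core):
      * "A > B > C"   -> GCMD-style path: every level becomes a term
      * "a, b, c"     -> comma-separated list: every item becomes a term
      * anything else -> one term, unless `separator` is given, in which
                         case the text is split on it (free-text keywords)
    """
    terms = {}  # a dict keeps insertion order and drops duplicates
    for value in values:
        if ">" in value:
            parts = value.split(">")
        elif "," in value:
            parts = value.split(",")
        elif separator is not None:
            parts = value.split(separator)
        else:
            parts = [value]
        for part in parts:
            term = " ".join(part.split()).lower()  # trim and collapse blanks
            if term:
                terms.setdefault(term, None)
    return list(terms)


print(normalize_keywords(["Russian Federation", "Temperature", "Wind"]))
# ['russian federation', 'temperature', 'wind']
print(normalize_keywords(["EARTH SCIENCE > Cryosphere > Sea Ice",
                          "EARTH SCIENCE > Atmosphere",
                          "ocean, atmosphere, ice, land"]))
# ['earth science', 'cryosphere', 'sea ice', 'atmosphere', 'ocean', 'ice', 'land']
print(normalize_keywords(["METAR aviation hourly weather"], separator=" "))
# ['metar', 'aviation', 'hourly', 'weather']
```

Note that no single rule handles all three samples well: splitting the first style on spaces would break "Russian Federation", while not splitting the METAR text leaves one unusable term.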
XML Geographical extent
<geographicElement> <polygon> <point> <latitude>50.78</latitude> <longitude>6.1</longitude> </point> </polygon></geographicElement>
Or…
<geographicElement><geographicIdentifier gazetteer="http://www.wmo.ch/web/www/ois/volume-a/vola-home.htm">
CCCC2</geographicIdentifier>
</geographicElement>
Or…
<geographicElement><boundingBox>
<westBoundLongitude>-126.3</westBoundLongitude><eastBoundLongitude>-126.3</eastBoundLongitude><southBoundLatitude>39.9</southBoundLatitude><northBoundLatitude>39.9</northBoundLatitude>
</boundingBox></geographicElement>
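Before geographic extents can be indexed, these three encodings have to be reduced to one representation. A minimal sketch, assuming a bounding box (west, south, east, north) is the common form and that named places are resolved through a caller-supplied gazetteer dictionary (hypothetical; we know no coordinates for the `CCCC2` placeholder, so it stays unresolved). The polygon case takes the envelope of its points, which is wrong for shapes crossing the 180° meridian.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (west, south, east, north), degrees


def extent_to_bbox(geo: ET.Element, gazetteer: Dict[str, BBox]) -> Optional[BBox]:
    """Reduce one <geographicElement> to a bounding box, whatever its encoding."""
    box = geo.find("boundingBox")
    if box is not None:
        return (float(box.findtext("westBoundLongitude")),
                float(box.findtext("southBoundLatitude")),
                float(box.findtext("eastBoundLongitude")),
                float(box.findtext("northBoundLatitude")))
    points = geo.findall("polygon/point")
    if points:
        # Envelope of the points; naive for polygons crossing 180°.
        lats = [float(p.findtext("latitude")) for p in points]
        lons = [float(p.findtext("longitude")) for p in points]
        return (min(lons), min(lats), max(lons), max(lats))
    ident = geo.find("geographicIdentifier")
    if ident is not None and ident.text:
        # A place name only becomes an extent if the gazetteer knows it.
        return gazetteer.get(ident.text.strip())
    return None


SAMPLES = [
    "<geographicElement><polygon><point><latitude>50.78</latitude>"
    "<longitude>6.1</longitude></point></polygon></geographicElement>",
    "<geographicElement><geographicIdentifier gazetteer="
    "'http://www.wmo.ch/web/www/ois/volume-a/vola-home.htm'>CCCC2"
    "</geographicIdentifier></geographicElement>",
    "<geographicElement><boundingBox>"
    "<westBoundLongitude>-126.3</westBoundLongitude>"
    "<eastBoundLongitude>-126.3</eastBoundLongitude>"
    "<southBoundLatitude>39.9</southBoundLatitude>"
    "<northBoundLatitude>39.9</northBoundLatitude>"
    "</boundingBox></geographicElement>",
]

GAZETTEER: Dict[str, BBox] = {}  # hypothetical lookup table, empty here

for sample in SAMPLES:
    print(extent_to_bbox(ET.fromstring(sample), GAZETTEER))
# (6.1, 50.78, 6.1, 50.78)
# None
# (-126.3, 39.9, -126.3, 39.9)
```

Both the single-point polygon and the bounding box above collapse to a degenerate box, i.e. the same station location written two different ways.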
XML Temporal extent
<temporalElement><beginDateTime>0100-01-01</beginDateTime><endDateTime>0299-12-31</endDateTime><dataFrequency>monthly</dataFrequency><dataFrequency>daily</dataFrequency>
</temporalElement>
Or…
<temporalElement><referenceDateTime>2004-02-05T00:00:00</referenceDateTime><beginDateTime>2004-02-05T06:00:00</beginDateTime><endDateTime>2004-02-05T06:00:00</endDateTime>
</temporalElement>
Or…
<referenceDate><date>2004-01-28</date><dateType>creationDate</dateType>
</referenceDate>
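The temporal encodings can likewise be mapped onto a single (start, end) period. A sketch under stated assumptions: `<referenceDateTime>` and `<dataFrequency>` are ignored, a missing `<endDateTime>` means a single instant, and a `<referenceDate>` is indexed as a zero-length period even when its `<dateType>` is `creationDate`, which is exactly the kind of judgement call the flexible format forces on the reader. `datetime.fromisoformat` needs Python 3.7 or later and accepts the four-digit years of the first example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import Optional, Tuple

Period = Tuple[datetime, datetime]


def parse_period(elem: ET.Element) -> Optional[Period]:
    """Map <temporalElement> or <referenceDate> onto a (start, end) period."""
    if elem.tag == "temporalElement":
        begin = elem.findtext("beginDateTime")
        end = elem.findtext("endDateTime") or begin  # no end: a single instant
        # <referenceDateTime> and <dataFrequency> are ignored in this sketch.
        if begin:
            return datetime.fromisoformat(begin), datetime.fromisoformat(end)
    elif elem.tag == "referenceDate":
        day = elem.findtext("date")
        if day:
            # Treated as a zero-length period whatever its <dateType> says.
            moment = datetime.fromisoformat(day)
            return moment, moment
    return None


climate = ET.fromstring(
    "<temporalElement><beginDateTime>0100-01-01</beginDateTime>"
    "<endDateTime>0299-12-31</endDateTime>"
    "<dataFrequency>monthly</dataFrequency></temporalElement>")
forecast = ET.fromstring(
    "<temporalElement><referenceDateTime>2004-02-05T00:00:00</referenceDateTime>"
    "<beginDateTime>2004-02-05T06:00:00</beginDateTime>"
    "<endDateTime>2004-02-05T06:00:00</endDateTime></temporalElement>")
created = ET.fromstring(
    "<referenceDate><date>2004-01-28</date>"
    "<dateType>creationDate</dateType></referenceDate>")

for elem in (climate, forecast, created):
    start, end = parse_period(elem)
    print(start.isoformat(), end.isoformat())
```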
Repetition of XML elements (means extension)
<dataExtent><verticalElement>
<minimumValue>3.5</minimumValue><maximumValue>992.5</maximumValue><unitOfMeasure>mb</unitOfMeasure>
</verticalElement></dataExtent><dataExtent>
<geographicElement><boundingBox>
<westBoundLongitude>-180</westBoundLongitude><eastBoundLongitude>+180</eastBoundLongitude><southBoundLatitude>-90</southBoundLatitude><northBoundLatitude>+90</northBoundLatitude>
</boundingBox><geographicIdentifier
gazetteer="http://gcmd.gsfc.nasa.gov/Resources/valids/location.html">Global</geographicIdentifier>
</geographicElement></dataExtent><dataExtent>
<temporalElement><beginDateTime>1900-01-01</beginDateTime><endDateTime>1999-12-31</endDateTime><dataFrequency>monthly</dataFrequency><dataFrequency>daily</dataFrequency>
</temporalElement></dataExtent>
Repetition of XML elements (means redefinition)
<dataExtent>
<description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector S</description><geographicElement>
<boundingBox><westBoundLongitude>-180</westBoundLongitude><eastBoundLongitude>-60</eastBoundLongitude><southBoundLatitude>0</southBoundLatitude><northBoundLatitude>90</northBoundLatitude>
</boundingBox></geographicElement>
</dataExtent>
<dataExtent><description>Global Grid 2.5 degree latitude and 2.5 degree longitude steps, 6 sectors, one sector per GRIB bulletin Sector T</description><geographicElement>
<boundingBox><westBoundLongitude>-60</westBoundLongitude><eastBoundLongitude>60</eastBoundLongitude><southBoundLatitude>0</southBoundLatitude><northBoundLatitude>90</northBoundLatitude>
</boundingBox></geographicElement>
</dataExtent>
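A parser sees the same thing in both cases: several sibling `<dataExtent>` elements. One possible heuristic, which is our assumption and not something the WMO core specifies, is to look at what each `<dataExtent>` contains: if each describes a different kind of extent (vertical, geographic, temporal), read them as facets of one extent; if the same kind appears more than once, read them as alternatives (such as the GRIB sectors) and cover their union. The `<identificationInfo>` parent in the samples is also assumed.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Optional, Tuple

KINDS = ("verticalElement", "geographicElement", "temporalElement")


def classify_extents(parent: ET.Element) -> str:
    """Guess whether sibling <dataExtent> elements extend or redefine each other."""
    seen = Counter()
    for extent in parent.findall("dataExtent"):
        for kind in KINDS:
            if extent.find(kind) is not None:
                seen[kind] += 1
    if not seen:
        return "empty"
    # The same kind of extent given twice is read as alternatives (e.g. sectors).
    return "redefinition" if max(seen.values()) > 1 else "extension"


def union_bbox(parent: ET.Element) -> Optional[Tuple[float, float, float, float]]:
    """Envelope (west, south, east, north) of all sector bounding boxes."""
    boxes = parent.findall("dataExtent/geographicElement/boundingBox")
    if not boxes:
        return None

    def edge(tag):
        return [float(b.findtext(tag)) for b in boxes]

    return (min(edge("westBoundLongitude")), min(edge("southBoundLatitude")),
            max(edge("eastBoundLongitude")), max(edge("northBoundLatitude")))


def bbox_extent(w, e, s, n):
    """Build one <dataExtent> holding a bounding box (test helper)."""
    return ("<dataExtent><geographicElement><boundingBox>"
            f"<westBoundLongitude>{w}</westBoundLongitude>"
            f"<eastBoundLongitude>{e}</eastBoundLongitude>"
            f"<southBoundLatitude>{s}</southBoundLatitude>"
            f"<northBoundLatitude>{n}</northBoundLatitude>"
            "</boundingBox></geographicElement></dataExtent>")


# Trimmed versions of the two examples above.
extension = ET.fromstring(
    "<identificationInfo>"
    "<dataExtent><verticalElement><minimumValue>3.5</minimumValue>"
    "</verticalElement></dataExtent>" + bbox_extent(-180, 180, -90, 90) +
    "<dataExtent><temporalElement><beginDateTime>1900-01-01</beginDateTime>"
    "</temporalElement></dataExtent></identificationInfo>")
sectors = ET.fromstring(
    "<identificationInfo>" + bbox_extent(-180, -60, 0, 90)
    + bbox_extent(-60, 60, 0, 90) + "</identificationInfo>")

print(classify_extents(extension))  # extension
print(classify_extents(sectors))    # redefinition
print(union_bbox(sectors))          # (-180.0, 0.0, 60.0, 90.0)
```

The heuristic fails as soon as a provider mixes both intentions in one document, which is why a written convention would be preferable to any guess.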
Findings
A flexible format, which leads to a lack of consistency
Different ways to encode geographical extents, keywords and temporal extents
Missing information (for the V-GISC):
  To create a directory
  To locate the data
  To create retrieval requests
  To describe available transformations
  To implement data policies
Findings (cont.)
Seems to be designed for human consumption
  Free text in XML elements
    • <distributionInfo>
    • <dataQualityInfo>
Not scalable
  Some documents may change frequently (hourly?)
  Some documents are orders of magnitude larger than the data itself
  Cannot represent very large archives with small granularity
SIMDAT/VGISC problem
Each site has its own practices
  We have to be ready for variability in the XML
  We will have to handle XML from other WMO programmes
We need to handle tens of thousands of documents
  Lots of repeated information
  We need fast search
We need to automatically:
  Index the keywords, the geographical extent and the temporal extent
  Create a browsable directory (similar to NCAR’s Community Data Portal)
  Locate and retrieve the data
  Implement the data policy
Solution: split XML documents into fragments
WMO core metadata is structured
Some parts are shared among many documents:
  All metadata share the Core part
  All UKMO metadata share the Owner part
  All SYNOPs (should) share the same description
  All observations at Heathrow have the same location
  The date part is variable but is very small
[Figure: one fragment per part: WMO (Core), UKMO (Owner), Synop (Data type), Heathrow (Station, geographical extent), 2005-10-12 (Date, temporal extent)]
XML fragments are hierarchically linked
[Figure: hierarchy of linked fragments: WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12]
Fragments: advantages
Factorizing commonalities into static fragments
  Reduces the size of XML documents
  Indexing is done once
Avoids redundancy of information
  Faster searches
Frequently updated documents are small
  Manageable
  Scalable
The complete XML document can be rebuilt
  For exchange outside the V-GISC
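A minimal sketch of rebuilding a complete document from linked fragments, assuming each fragment lists its parents as `<inherit>` entries inside `<v:vgisc>` (the layout used by the V-GISC fragment examples in this presentation) and that merging simply concatenates the `<identificationInfo>` children of all ancestors, ancestors first. The in-memory `STORE`, the `make_fragment` helper and the fragment ids are hypothetical; the prototype kept fragments as text in MySQL.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Optional

NS = {"v": "http://www.vgisc.org/"}


def make_fragment(frag_id: str, parents: Iterable[str], body: str) -> ET.Element:
    """Build a fragment in the style of the V-GISC examples (hypothetical helper)."""
    inherits = "".join(f"<inherit>{p}</inherit>" for p in parents)
    return ET.fromstring(
        "<metaData xmlns:v='http://www.vgisc.org/'>"
        f"<v:vgisc><id>{frag_id}</id>{inherits}</v:vgisc>"
        f"<identificationInfo>{body}</identificationInfo></metaData>")


def ancestry(frag_id: str, store: Dict[str, ET.Element],
             seen: Optional[List[str]] = None) -> List[str]:
    """frag_id and all its ancestors, ancestors first, each listed once.

    Assumes the hierarchy is acyclic.
    """
    if seen is None:
        seen = []
    if frag_id not in seen:
        for parent in store[frag_id].findall("v:vgisc/inherit", NS):
            ancestry(parent.text.strip(), store, seen)
        seen.append(frag_id)
    return seen


def recompose(frag_id: str, store: Dict[str, ET.Element]) -> ET.Element:
    """Rebuild a complete document; the <v:vgisc> blocks are left out."""
    doc = ET.Element("metaData")
    info = ET.SubElement(doc, "identificationInfo")
    for fid in ancestry(frag_id, store):
        source = store[fid].find("identificationInfo")
        # Deep copies, so the stored fragments are never modified.
        info.extend([copy.deepcopy(child) for child in source])
    return doc


# Hypothetical fragment ids, modelled on the WMO / Synop / Heathrow example.
STORE = {
    "urn:int.wmo.core": make_fragment(
        "urn:int.wmo.core", [],
        "<descriptiveKeywords>Meteorology</descriptiveKeywords>"),
    "urn:int.wmo.synop": make_fragment(
        "urn:int.wmo.synop", ["urn:int.wmo.core"],
        "<descriptiveKeywords>Synop</descriptiveKeywords>"),
    "urn:heathrow": make_fragment(
        "urn:heathrow", ["urn:int.wmo.core"],
        "<descriptiveKeywords>Heathrow</descriptiveKeywords>"),
    "urn:heathrow.synop.20051012": make_fragment(
        "urn:heathrow.synop.20051012", ["urn:int.wmo.synop", "urn:heathrow"],
        "<referenceDate><date>2005-10-12</date></referenceDate>"),
}

full = recompose("urn:heathrow.synop.20051012", STORE)
print(ET.tostring(full, encoding="unicode"))
```

The shared Core fragment is reached through two parents but contributes only once, which is the point of the de-duplicating `ancestry` walk.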
Indexing of XML fragments
[Figure: the fragment hierarchy (WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12), with keywords, geographical extent and temporal extent indexed from its fragments]
Prototype implementation
  XML fragments are stored as “text”
    Fragment table
    Hierarchy table
  Indexed at insertion time
    Keywords table
    Locations table
    Periods table
    Directory table
  Implemented with MySQL
    With OpenGIS extension
    With text search extension
  Indexes are “inherited”
    OO approach
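The table layout and the idea of inherited indexes can be sketched with Python's built-in `sqlite3` standing in for MySQL (so without the OpenGIS and text search extensions). Table and column names are our guesses based on the list above, not the prototype's schema. A fragment matches a keyword if it, or any of its ancestors, was indexed with that keyword; a recursive query walks the hierarchy table downwards from the indexed fragments.

```python
import sqlite3
from typing import Iterable, List, Optional, Tuple

SCHEMA = """
CREATE TABLE fragment  (id TEXT PRIMARY KEY, xml TEXT NOT NULL);
CREATE TABLE hierarchy (child TEXT NOT NULL, parent TEXT NOT NULL);
CREATE TABLE keyword   (fragment TEXT NOT NULL, word TEXT NOT NULL);
CREATE TABLE period    (fragment TEXT NOT NULL, start TEXT NOT NULL, stop TEXT NOT NULL);
"""

# Fragments carrying the keyword, plus every descendant that inherits it.
# UNION (not UNION ALL) drops duplicates reached through several parents.
# ISO 8601 date strings compare chronologically, so plain <= works.
SEARCH = """
WITH RECURSIVE matching(id) AS (
    SELECT fragment FROM keyword WHERE word = ?
    UNION
    SELECT h.child FROM hierarchy AS h JOIN matching AS m ON h.parent = m.id
)
SELECT m.id FROM matching AS m JOIN period AS p ON p.fragment = m.id
WHERE p.start <= ? AND p.stop >= ?
ORDER BY m.id
"""


def insert_fragment(db: sqlite3.Connection, frag_id: str, xml: str,
                    parents: Iterable[str] = (), keywords: Iterable[str] = (),
                    period: Optional[Tuple[str, str]] = None) -> None:
    """Store a fragment and index it once, at insertion time."""
    db.execute("INSERT INTO fragment VALUES (?, ?)", (frag_id, xml))
    db.executemany("INSERT INTO hierarchy VALUES (?, ?)", [(frag_id, p) for p in parents])
    db.executemany("INSERT INTO keyword VALUES (?, ?)", [(frag_id, k) for k in keywords])
    if period is not None:
        db.execute("INSERT INTO period VALUES (?, ?, ?)", (frag_id, *period))


def search(db: sqlite3.Connection, word: str, first: str, last: str) -> List[str]:
    """Fragments with `word` (own or inherited) whose period overlaps [first, last]."""
    return [row[0] for row in db.execute(SEARCH, (word, last, first))]


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
insert_fragment(db, "urn:int.wmo.core", "<metaData/>", keywords=["meteorology"])
insert_fragment(db, "urn:int.wmo.synop", "<metaData/>",
                parents=["urn:int.wmo.core"], keywords=["synop"])
insert_fragment(db, "urn:heathrow", "<metaData/>",
                parents=["urn:int.wmo.core"], keywords=["heathrow"])
for day in ("2005-10-12", "2005-10-13"):
    insert_fragment(db, "urn:heathrow.synop." + day.replace("-", ""), "<metaData/>",
                    parents=["urn:int.wmo.synop", "urn:heathrow"], period=(day, day))

print(search(db, "heathrow", "2005-10-12", "2005-10-12"))
# ['urn:heathrow.synop.20051012']
print(search(db, "meteorology", "2005-10-01", "2005-10-31"))
# ['urn:heathrow.synop.20051012', 'urn:heathrow.synop.20051013']
```

The daily fragments carry no keywords of their own, yet they are found through the keywords of their parents: the small, frequently added fragments only add a period row.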
Object Oriented Approach - Behaviours
[Figure: the fragment hierarchy (WMO, UKMO, Synop, Heathrow, Heathrow Synop, Heathrow Synop 2005-10-12), with indexing behaviours attached to its fragments]
  Index <geographicElement><boundingBox> as geography
  Index <featureAttribute><membrName> as keyword
  Index <referenceDate><date> as period
  Index <descriptiveKeywords> as keyword
Fragment properties - Behaviours
Only the owner of the data knows how to:
  Describe the data (indexing information)
  Request the data (create an internal request)
  Extract a subset of the data (define an interface to extract a subset)
Ancillary metadata can be associated with each fragment to describe how to index, request and sub-select the data
Behaviours are inherited
  Object-oriented approach
Behaviours example: indexing
<indexing class="XPathKeywordIndexer" separator=" "><xpath>//identificationInfo/descriptiveKeywords</xpath>
</indexing>
<indexing class="XPathBoundingBoxIndexer"><xpath>//identificationInfo/dataExtent/geographicElement/boundingBox</xpath></indexing>
<indexing class="XPathPolygonIndexer"><xpath>//identificationInfo/dataExtent/geographicElement/polygon</xpath>
</indexing>
<indexing class="XPathDateIndexer"><xpath>//identificationInfo/referenceDate/date</xpath>
</indexing>
<indexing class="XPathPeriodIndexer"><xpath>//identificationInfo/dataExtent/temporalElement</xpath><xpath>//identificationInfo/referenceDate/period</xpath>
</indexing>
<indexing class="XPathDirectoryIndexer"><xpath>//identificationInfo/topicCategory</xpath>
</indexing>
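The `class` attribute names the indexer to run on the nodes selected by each `<xpath>`. A minimal sketch of how such behaviour descriptions could be interpreted, assuming a registry that maps class names to Python functions: only the class names and XPaths come from the behaviours above, the `<behaviours>` wrapper and the function bodies are hypothetical, and for brevity the sample document has no default namespace. ElementTree supports a subset of XPath and refuses absolute paths on an element, so `//a/b` is evaluated as `.//a/b`.

```python
import xml.etree.ElementTree as ET


def select_nodes(doc, xpath):
    # ElementTree refuses absolute paths on an element, so "//a/b" is
    # evaluated as ".//a/b": descendants of the document's root element.
    return doc.findall("." + xpath if xpath.startswith("/") else xpath)


def keyword_indexer(elems, separator=None, **_):
    words = []
    for elem in elems:
        words.extend(w for w in (elem.text or "").split(separator) if w)
    return [("keyword", w.lower()) for w in words]


def bbox_indexer(elems, **_):
    edges = ("westBoundLongitude", "southBoundLatitude",
             "eastBoundLongitude", "northBoundLatitude")
    return [("geography", tuple(float(b.findtext(t)) for t in edges)) for b in elems]


def date_indexer(elems, **_):
    return [("period", (e.text.strip(), e.text.strip())) for e in elems if e.text]


# Class names from the behaviour definitions; implementations are hypothetical.
INDEXERS = {
    "XPathKeywordIndexer": keyword_indexer,
    "XPathBoundingBoxIndexer": bbox_indexer,
    "XPathDateIndexer": date_indexer,
}


def run_behaviours(behaviours: ET.Element, doc: ET.Element):
    """Apply every <indexing> behaviour to a document and collect index entries."""
    entries = []
    for rule in behaviours.findall("indexing"):
        indexer = INDEXERS.get(rule.get("class"))
        if indexer is None:
            continue  # e.g. XPathPolygonIndexer is not implemented in this sketch
        options = {k: v for k, v in rule.attrib.items() if k != "class"}
        for xpath in rule.findall("xpath"):
            entries.extend(indexer(select_nodes(doc, xpath.text.strip()), **options))
    return entries


BEHAVIOURS = ET.fromstring("""<behaviours>
  <indexing class="XPathKeywordIndexer" separator=" ">
    <xpath>//identificationInfo/descriptiveKeywords</xpath></indexing>
  <indexing class="XPathBoundingBoxIndexer">
    <xpath>//identificationInfo/dataExtent/geographicElement/boundingBox</xpath></indexing>
  <indexing class="XPathDateIndexer">
    <xpath>//identificationInfo/referenceDate/date</xpath></indexing>
</behaviours>""")

DOC = ET.fromstring("""<metaData><identificationInfo>
  <descriptiveKeywords>ERA-40 reanalysis</descriptiveKeywords>
  <dataExtent><geographicElement><boundingBox>
    <westBoundLongitude>-180</westBoundLongitude><eastBoundLongitude>180</eastBoundLongitude>
    <southBoundLatitude>-90</southBoundLatitude><northBoundLatitude>90</northBoundLatitude>
  </boundingBox></geographicElement></dataExtent>
  <referenceDate><date>2005-06-29</date></referenceDate>
</identificationInfo></metaData>""")

for entry in run_behaviours(BEHAVIOURS, DOC):
    print(entry)
```

Because the behaviours are data rather than code, a site with unusual XML can attach different XPaths to its own fragments without touching the indexers.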
<vgisc> extension
A <vgisc> element from the “http://www.vgisc.org/” namespace is embedded in all the fragments
It contains all the information needed to implement the V-GISC that is not defined by the WMO core, because it is not relevant outside the scope of the V-GISC:
  Internal unique ID
  Hierarchy relationship
  Physical location (which V-GISC node holds the data)
  Information used to create data requests
  Information used to create web pages
It is removed when the full XML document is recomposed for use outside the V-GISC
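Removing the extension before export is a plain tree operation. A minimal sketch with Python's `xml.etree.ElementTree` (our choice of toolkit); the input is a copy of the Akrotiri SYNOP fragment used in this presentation, and `strip_vgisc` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

VGISC_NS = "http://www.vgisc.org/"


def strip_vgisc(doc: ET.Element) -> ET.Element:
    """Remove every <v:vgisc> element (in place) before the document leaves the V-GISC."""
    tag = f"{{{VGISC_NS}}}vgisc"
    # Collect (parent, child) pairs first: removing while iterating is unsafe.
    doomed = [(parent, child) for parent in doc.iter()
              for child in parent if child.tag == tag]
    for parent, child in doomed:
        parent.remove(child)
    return doc


FRAGMENT = """<metaData xmlns:v='http://www.vgisc.org/'><v:vgisc>
<id>urn:akrotiri.synop.land.second.record.20050629</id>
<inherit>urn:akrotiri</inherit><inherit>urn:int.wmo.synop.land.second.record</inherit>
<location>ecmwf.obs</location>
</v:vgisc><identificationInfo>
<referenceDate><date>2005-06-29</date></referenceDate></identificationInfo>
</metaData>"""

public = strip_vgisc(ET.fromstring(FRAGMENT))
print(ET.tostring(public, encoding="unicode"))
```

ElementTree only declares namespaces that are still in use when serialising, so the exported document no longer mentions the V-GISC namespace at all.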
Fragment example
<metaData xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <id>urn:akrotiri.synop.land.second.record.20050629</id>
    <inherit>urn:akrotiri</inherit>
    <inherit>urn:int.wmo.synop.land.second.record</inherit>
    <location>ecmwf.obs</location>
  </v:vgisc>
  <identificationInfo>
    <referenceDate>
      <date>2005-06-29</date>
    </referenceDate>
  </identificationInfo>
</metaData>
Variables and Requests
Some datasets have too many items
  Impossible to describe every one of them
  But describing the whole dataset is simple
Some datasets are very homogeneous
  E.g. same parameters for a long period of time
  This can be described in a compact form (<beginDateTime> and <endDateTime>)
  But we still need to specify that individual dates can be requested by the user
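One way to honour both requirements is to keep the compact `<beginDateTime>`/`<endDateTime>` description and derive the individual items on demand. A minimal sketch, assuming daily granularity (the step size is our assumption; the WMO core only records the bounds, plus an optional `<dataFrequency>`): membership tests and counts are computed without materialising the list of dates.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Iterator


@dataclass(frozen=True)
class DateRange:
    """Compact description of a homogeneous dataset: one item per day."""
    begin: date
    end: date  # inclusive

    def __contains__(self, day: date) -> bool:
        return self.begin <= day <= self.end

    def __len__(self) -> int:
        return (self.end - self.begin).days + 1

    def __iter__(self) -> Iterator[date]:
        # Individual dates are generated lazily, only when needed.
        for offset in range(len(self)):
            yield self.begin + timedelta(days=offset)


# Bounds as they would appear in <beginDateTime> and <endDateTime>
era = DateRange(date.fromisoformat("1980-01-01"), date.fromisoformat("1990-12-31"))
print(len(era))                  # 4018 individual dates
print(date(1985, 6, 15) in era)  # True: the user may request this single date
print(date(1991, 1, 1) in era)   # False
```

Two dates in the metadata stand for 4018 requestable days (11 years, three of them leap years).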
Variables and requests (cont.)
Associate two elements with an XML fragment:
  <request>: holds information specific to generating a valid request to the data repository
  <variable>: holds information on how to create a web interface to let the user select items from the dataset
Web portal
  We use the WMO core for discovery
  We use the <variable> element to present selection dialogues to the user
Fragment example: ECMWF Reanalysis
<metadata xmlns:v='http://www.vgisc.org/'>
  <v:vgisc>
    <id>urn:int.ecmwf.era40.sfc</id>
    <inherit>urn:int.wmo.core</inherit>
    <location>ecmwf.mars</location>
    <request>
      <class>e4</class>
      <levtype>sfc</levtype>
      <database>marser</database>
    </request>
    <variables>
      <date type='date'>
        <startDate>1980-01-01</startDate>
        <endDate>1990-12-31</endDate>
      </date>
      <param title='Parameter' multiple='1' type='enum'>
        <value>2t</value>
        <value>msl</value>
      </param>
      <time title='Base time' multiple='1' type='enum'>
        <value>0000</value>
        <value>0600</value>
        <value>1200</value>
        <value>1800</value>
      </time>
    </variables>
  </v:vgisc>
  <identificationInfo>
    <descriptiveKeywords>ECMWF 40 Years reanalysis ERA40 ERA-40 in GRIB</descriptiveKeywords>
    <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory>
    <dataExtent>
      <temporalElement>
        <beginDateTime>1980-01-01</beginDateTime>
        <endDateTime>1990-12-31</endDateTime>
      </temporalElement>…
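To show how `<request>` and `<variables>` could work together, here is a sketch that reads a trimmed, well-formed copy of the `<v:vgisc>` block of this fragment, checks a user's selection against the declared variables and merges it with the fixed request keys. The validation rules (enum values, the `multiple` flag, date bounds), the `/` joining convention and the `build_request` name are our assumptions; this is neither the V-GISC code nor the exact MARS request syntax.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Trimmed, well-formed copy of the <v:vgisc> block of the ERA-40 fragment.
ERA40 = ET.fromstring("""<v:vgisc xmlns:v='http://www.vgisc.org/'>
  <id>urn:int.ecmwf.era40.sfc</id><location>ecmwf.mars</location>
  <request><class>e4</class><levtype>sfc</levtype><database>marser</database></request>
  <variables>
    <date type='date'><startDate>1980-01-01</startDate><endDate>1990-12-31</endDate></date>
    <param title='Parameter' multiple='1' type='enum'><value>2t</value><value>msl</value></param>
    <time title='Base time' multiple='1' type='enum'><value>0000</value><value>0600</value>
      <value>1200</value><value>1800</value></time>
  </variables>
</v:vgisc>""")


def build_request(vgisc: ET.Element, selection: dict) -> dict:
    """Merge the fixed <request> keys with a user selection checked against <variables>."""
    request = {key.tag: key.text for key in vgisc.find("request")}
    for var in vgisc.find("variables"):
        chosen = selection.get(var.tag)
        if chosen is None:
            raise ValueError(f"no value selected for {var.tag!r}")
        values = chosen if isinstance(chosen, list) else [chosen]
        if len(values) > 1 and var.get("multiple") != "1":
            raise ValueError(f"{var.tag!r} accepts a single value")
        if var.get("type") == "enum":
            allowed = [v.text for v in var.findall("value")]
            rejected = [v for v in values if v not in allowed]
            if rejected:
                raise ValueError(f"{var.tag!r} does not offer {rejected}")
        elif var.get("type") == "date":
            first = date.fromisoformat(var.findtext("startDate"))
            last = date.fromisoformat(var.findtext("endDate"))
            if not all(first <= date.fromisoformat(v) <= last for v in values):
                raise ValueError(f"{var.tag!r} must lie in {first}..{last}")
        # Several values are joined with "/" (an assumed convention).
        request[var.tag] = "/".join(values)
    return request


print(build_request(ERA40, {"date": "1985-06-15", "param": ["2t", "msl"], "time": "1200"}))
```

The same `<variables>` description can drive both the selection dialogue in the portal and this server-side check, so the two cannot drift apart.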
Directory structure
Problem: create a browsable hierarchy of topics, like the “Google directory” (see NCAR’s Community Data Portal)
Not to be confused with the internal “fragment hierarchy”, which is not exposed to the end user
Currently using the element <topicCategory>
  <topicCategory>NWP Outputs > ECMWF > 40 years reanalysis</topicCategory>
The same product can appear in several locations of the directory
  <topicCategory>Observations > By Type > Profile > Temp Land</topicCategory>
  <topicCategory>Observations > By Region > Asia > China</topicCategory>
Usage should be recommended by WMO
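Building the browsable directory from such paths amounts to inserting each `A > B > C` string into a tree. A sketch, assuming `>` is the only level separator and that a product is simply listed at every leaf its `<topicCategory>` entries point to; the category paths are the ones quoted above, while the product names and helper functions are hypothetical.

```python
from typing import Iterable, List, Tuple


def new_node() -> dict:
    return {"children": {}, "products": []}


def build_directory(products: Iterable[Tuple[str, List[str]]]) -> dict:
    """Build a browsable tree from "A > B > C" <topicCategory> paths.

    A product with several <topicCategory> entries is listed at several leaves.
    """
    root = new_node()
    for product, categories in products:
        for category in categories:
            node = root
            for level in (part.strip() for part in category.split(">")):
                node = node["children"].setdefault(level, new_node())
            node["products"].append(product)
    return root


def show(node: dict, depth: int = 0) -> None:
    """Print the directory as an indented outline."""
    for name, child in sorted(node["children"].items()):
        suffix = f"  {child['products']}" if child["products"] else ""
        print("  " * depth + name + suffix)
        show(child, depth + 1)


# Hypothetical product names; the category paths are the ones quoted above.
directory = build_directory([
    ("ERA-40 surface fields", ["NWP Outputs > ECMWF > 40 years reanalysis"]),
    ("TEMP reports from China", ["Observations > By Type > Profile > Temp Land",
                                 "Observations > By Region > Asia > China"]),
])
show(directory)
```

The tree is only as consistent as the labels: "Temp Land" and "TEMP land" would become two sibling nodes, which is why a WMO-recommended topic hierarchy matters.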
Conclusion
The approach taken in the V-GISC should help us support the large variety of XML documents
Nevertheless, the standard is too flexible
  A lot of programming is required to support all possible variations
The WMO must provide “best practices” guidelines
  How to encode a point in time, how to encode ranges, …
A topic hierarchy must be defined, to create the directory
WMO core metadata need only contain sufficient information for discovery
  The rest can be implemented as a series of local extensions, as long as they are not exported or exchanged