38
INTERCHANGE TECHNOLOGY FOR APPLICATIONS TO FACILITATE GENERIC ACCESS TO HETEROGENOUS DATA FORMATS IEEE International Geoscience and Remote Sensing Symposium, 2002 RAHUL RAMACHANDRAN INFORMATION TECHNOLOGY AND SYSTEMS CENTER UNIVERSITY OF ALABAMA IN HUNTSVILLE

INTERCHANGE TECHNOLOGY FOR APPLICATIONS TO FACILITATE GENERIC

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

INTERCHANGE TECHNOLOGY FOR APPLICATIONS TO FACILITATE GENERIC

ACCESS TO HETEROGENOUS DATA FORMATS

IEEE International Geoscience and Remote Sensing Symposium, 2002

RAHUL RAMACHANDRANINFORMATION TECHNOLOGY AND SYSTEMS CENTER

UNIVERSITY OF ALABAMA IN HUNTSVILLE

Earth Science Data Characteristics• Different formats,

types and structures (18 and counting for Atmospheric Science alone!)

• Different states of processing ( raw, calibrated, derived, modeled or interpreted )

• Enormous volumes

• Data usability problem

HDF HDF-EOS

netCDFASCII

Binary GRIB

Data Usability Problem

DATA FORMAT 1

DATA FORMAT 1

DATA FORMAT 2

DATA FORMAT 2

DATA FORMAT 3

DATA FORMAT 3

APPLICATION

READER 1 READER 2

FORMATCONVERTER

• Specialized code for every format• Difficult to assimilate new data types

• Enforce a Standard Data Format• Not practical for legacy datasets

ESML Solution

ESML LIBRARY

APPLICATION

ESMLFILE

ESMLFILE

ESMLFILE

DATA FORMAT 1

DATA FORMAT 1

DATA FORMAT 2

DATA FORMAT 2

DATA FORMAT 3

DATA FORMAT 3

• Data interoperability for applications• “Define Once, Use Anywhere” philosophy

What is ESML?

• Earth Science Markup Language (ESML)• Specialized markup language for Earth Science metadata

based on XML• Machine-readable and -interpretable representation of the

structure and content of any data file, regardless of data format

• External metadata files that can be generated by either data producer or data consumer (at collection, data set, and/or granule level)

• ESML will provide the benefits of a standard, self-describing data format (like HDF, HDF-EOS, netCDF,geoTIFF, …) without the cost of data conversion

• Basis for core Interchange Technology

Components of the Interchange Technology

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

DATAFORMAT1

ESMLFILE

ESMLFILE

ESMLFILE

ESMLSCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLEDITOR

INTERCHANGETECHNOLOGY

Components of the Interchange Technology

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

DATAFORMAT1

ESMLEDITORESML

SCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLFILE

ESMLFILE

ESMLFILE

ESML COSISTS OF:

MARKUPS/DESCRIPTIONS

Components of the Interchange Technology

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

DATAFORMAT1

ESMLEDITORESML

SCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLFILE

ESMLFILE

ESMLFILE

ESML COSISTS OF:

RULES FOR THE MARKUPS

Components of the Interchange Technology

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

DATAFORMAT1

ESMLEDITORESML

SCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLFILE

ESMLFILE

ESMLFILE

ESML COSISTS OF:

MIDDLEWARE FOR AUTOMATION

Components of the Interchange Technology

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

DATAFORMAT1

ESMLFILE

ESMLFILE

ESMLFILE

ESMLSCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLEDITOR

INTERCHANGETECHNOLOGY

Scientists/Application Developers Role

DATAFORMAT1

DATAFORMAT2

DATAFORMAT3

OTHER FORMATS

ESMLFILE

ESMLFILE

ESMLFILE

ESMLSCHEMA

ESML LIBRARY

ESMLDATA

BROWSER

ADaM DATA MININGSYSTEM

OTHER APPLICATIONS

ESMLEDITOR

DATA PRODUCERSOR CONSUMERS

INTERCHANGETECHNOLOGYAPPLICATION

DEVELOPERS

ESML Schema

• Content metadata describe the contents of a file in human-readable and machine-readable terms. An example of this would be ECS or FGDC metadata.

• Syntactic metadata describe the structure of the file in machine-readable and -interpretable terms. HDF-EOS and HDF provide this mechanism for HDF-EOS files.

• Semantic metadata, describe the contents of a file in machine-readable terms such that an application can interpret the data in an intelligent manner. Semantic metadata is embedded in the Syntactic

ESML Content Metadata

• Content metadata are mainly used for human knowledge and web-searching

• Content metadata in ESML are derived from FGDC and ECS metadata sets

• Tools are planned for converting existing ECS metadata into ESML

Syntactic Metadata in Earth Science Markup Language

• Free Formats:– Descriptions of the structures of data files in two

basic data formats, Binary and ASCII• Self-Describing Formats:

– Descriptions of the structures of self-describing data files through their internal metadata

• HDF-EOS

• Other data formats– GRIB, McIDAS

• Formats to be added• HDF, CDF, netCDF, etc

ASCII Format in ESML Schema

Semantic Metadata in ESML• Goals

– Get the right data – Navigating the data

• Figure out within the data what is latitude, longitude, time and/or coordinate system

– Reading the data in actual scientific units, for example: • Data stored as value to save space • Equation is applied on processing

• Embedded in Syntactic metadata• Provides semantic descriptions for Time, Geo (Lat/Lon) and Data

fields• Provides capabilities to define equation for data conversion for Time,

Geo and Data fields• e.g. Y=mx+b where m is scale and b is offset

Need For Semantic Metadata: Example

TMI Data Product README File

The data values between 0 and 250 need to be scaled to obtain meaningful geophysical data.To scale the data, multiply by the scale factors listed below:

T: multiply by 6.0 to get minute of day (0 – 1440) S: multiply by 0.15 AND subtract 3 to get SST (-3 - 34.5 C)W: multiply by 0.15 to get 10 m winds (0 - 37.5 m/s)V: multiply by 0.3 to get water vapor (0 - 75 mm)L: multiply by 0.01 to get cloud liquid water (0 - 2.5 mm)R: multiply by 0.1 to get rain rate (0 - 25 mm/hr)

Semantic Metadata for TMI Products<SyntacticMetaData>

<Binary><BinaryStructure geoInfo="ByProjection" instances="1">

<Projection LowRight_X="360" LowRight_Y="-40" UpLeft_X="0" UpLeft_Y="40">

<Geographic latOffset="-0.125" lonOffset="-0.125"/></Projection><Array occurs="320"><Array occurs="1440"><Field name="SST" type="UInt8" order="LittleEndian" size="8"><Data unit="C " equation="0.15 * X -3"/>

</Field></Array>

</Array>...

</SyntacticMetaData>

Writing an ESML File (1)

ESML MARKUPFOR THE DATA FILE

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1"><Field name="SizeX" format="%d">

<Attribute/></Field><Field name="SizeY" format="%d">

<Attribute/></Field><Array occurs="4">

<Array occurs="5"><Field name="BrightnessTemp" format="%f"><Data unit="Degrees Kelvin"/>

</Field></Array>

</Array></AsciiStructure>

</Ascii></SyntacticMetaData>

</a:ESML>

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

SIMPLE ASCIIDATA FILE

Writing an ESML File (2)

<a:ESML><SyntacticMetaData>

SIMPLE ASCIIDATA FILE

ONLY THESTRUCTURE

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

Writing an ESML File (3)

<a:ESML><SyntacticMetaData><Ascii>

DESCRIBE THEFORMAT

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

SIMPLE ASCII DATA FILE

Writing an ESML File (4)

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1">

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

ENTIRE FILE CONTENTSINTO 1 LOGICAL

STRUCTURE

SIMPLE ASCII DATA FILE

Writing an ESML File (5)

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1"><Field name="SizeX" format="%d">

<Attribute/></Field>

DEFINE THE FIRST FIELD IN THE FILE:

HEADER INFORMATION

SIMPLE ASCII DATA FILE

Writing an ESML File (6)

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1"><Field name="SizeX" format="%d">

<Attribute/></Field><Field name="SizeY" format="%d">

<Attribute/></Field>

DEFINE THE SECOND FIELD IN THE FILE:

HEADER INFORMATION

SIMPLE ASCII DATA FILE

Writing an ESML File (7)

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1"><Field name="SizeX" format="%d">

<Attribute/></Field><Field name="SizeY" format="%d">

<Attribute/></Field><Array occurs="4">

<Array occurs="5"><Field name="BrightnessTemp" format="%d"><Data unit="Degrees Kelvin"/>

</Field></Array>

</Array>

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

DEFINE THE DATA FIELD IN THE FILE:PROVIDE SIZE AND

FORMAT INFORMATION

SIMPLE ASCII DATA FILE

Writing an ESML File (8)

<a:ESML><SyntacticMetaData><Ascii><AsciiStructure geoInfo="NoGeoInfo" instances="1"><Field name="SizeX" format="%d">

<Attribute/></Field><Field name="SizeY" format="%d">

<Attribute/></Field><Array occurs="4">

<Array occurs="5"><Field name="BrightnessTemp" format="%d"><Data unit="Degrees Kelvin"/>

</Field></Array>

</Array></AsciiStructure>

</Ascii></SyntacticMetaData>

</a:ESML>

CLOSE ALL THETAGS: ESML FILE

IS READY

451 2 3 4 56 7 8 9 10

11 12 13 14 1516 17 18 19 20

SIMPLE ASCII DATA FILE

Advantages of using ESML• Scientist (Data Producer/Consumer)

– ESML will let them use virtually any data format in their applications

– ESML files are external description files that can be easily created, modified and viewed by any text editor

– ESML has a few simple concepts which can be used to describe numerous data sets

– An ESML file can be seen as a set of instructions to the application on how to read and understand a data file

– If the format of the data changes for whatever reason (e.g., newversion of data set) no software changes are required, just a new ESML file.

• Does that mean a scientist has to write an ESML file for every data file? – No, in fact the beauty of ESML is that it allows scientist to write

ONE ESML file to describe MANY data files that are structural and semantically similar

Advantages of using ESML

• Data Archiving Centers (Data Producers) – ESML files can be used to store not only the structural

and semantic information about data sets but also content or search metadata

– Since ESML files are independent separate files, they can be generated on the fly utilizing metadata databases as datasets are ordered

– Centers can archive data in its native formats and not have to store them in any “selected” format

– Centers can now also “ESMLize” all their legacy datasets with minimal efforts

– The existing legacy datasets now become a more valuable data resource for scientists, because they can be used more efficiently and effectively

Advantages of using ESML

• Application Developers– By using the ESML library, developers can build

“ESML enabled” applications!– ONE single reader component can read all the various

data formats instead of having separate reader module for different formats

– URL access• Application users can access data stored at different site online

without requiring to download the data files to their machines– These features make applications much more powerful

and flexible– Plus, ESML library is intuitive and easy to use

Advantages of using ESML• Other Advantages

– In addition to allowing applications to read the data, ESML provides scientist nifty features such as:

– Semantic tagging• Assigning meaning to different data fields that are present • Identify geolocation fields • Users can modify tags to access different data fields based on their

needs and requirements– Slicing and Dicing data

• Selecting or subsetting a part of data by changing the ESML definition

– Preprocessing data • Scientists can specify a transformation equation to be performed when

reading the data• Application receives the preprocessed the data• Example - Data that are stored as packed storage format and require

to be converted into some scientific quantity

Tools/Products Status

Ability to browse data files using the ESML description files

ESML Data Browser

Ability to write and validate ESML descriptionsESML Editor

•WINDOWS and LINUX•URL access•Handles ASCII, Binary (McIDAS), HDF-EOS, GRIB files•Handles preprocessing, wild card and symbols

ESML Library(C++/Java JNI )

•Defines ASCII, Binary (McIDAS), HDF-EOS, GRIB formats•Provides preprocessing, wild card, symbols and semantic tagging capabilities

ESML Schema

FeaturesTools/Products

ESML Editor

ESML Data Browser

ESML Enabled Applications (current and planned)

• ADaM Data Mining System• ESML Pilot Study – building a

MODIS/CERES collocation web service• Web Map and Coverage Servers for Passive

Microwave data sets• General Purpose Subsetter• Space Time Tool Kit• DODS server

Summary• ESML is NOT a new data format• ESML enables independently developed applications

and services to effectively utilize wide variety of distributed, heterogeneous data products

• ESML is simple enough that end-users (scientists) can create their own ESML for on-hand datasets (new or legacy)

ESML Web Page

• URL: esml.itsc.uah.eduesml.itsc.uah.edu• Post latest products, news, presentations, papers• Schema and related documents available to all• ESML Library, ESML Editor and ESML Data

Browser available

Demo

• ESML Editor• Data Browser

– Read multiple formats• National Lightning Detection Network (ASCII)• Advanced Microwave Sounding Unit-A (HDF-EOS)

• Equation to preprocess the data• Semantic capabilities – slice and dice the data

All the ESML tools/products are available at: http://esml.itsc.uah.edu/

Questions?