Development of the Next Generation PDS Data Standards PDS4 Earth and Space Science Informatics Workshop J. Steven Hughes PDS4 Data Design Working Group August 2-4, 2010



Topics

• Introduction
• Design Goals
• Key Architectural Concepts
• Data Driven Development

Copyright 2009 California Institute of Technology. Government sponsorship acknowledged.

PDS 2010 Architecture

Why upgrade the PDS Data Standards?

• The current PDS data standards (PDS3) were developed in the late 1980s to define the concepts and terms needed for archiving science data in the planetary science domain.
• The data standards were innovative for their time; however, after almost two decades of use:
  • Ambiguity had crept in
  • Data formats had become obsolete
  • Usability software had become difficult to maintain
• These issues have caused significant problems for PDS operations, data providers, and end-users.

Deliverables

• Information Model
  • The Information Model defines object classes, including data structures, formats, and products as well as data sets, documents, software, and missions.
• Data Dictionary
  • Model - The Data Dictionary Model provides the schema for the data dictionary.
  • Content - The Data Dictionary documents the data elements used in the Information Model.
• Standards Reference
  • The Standards Reference documents the overall standards architecture.
• Grammar Options
  • XML is the working grammar of the archive.

Design Goals

• Simplified Data Formats

• Long-term Stability in the Archive (Data structures should not become obsolete)

• Efficient Archive Preparation for Data Providers

• Efficient Data Service Development

• Enhanced Data Dictionary

Key Features

• Four base formats for all archived information
• Physical data segments map directly to logical segments
• Documents, software and ancillary data treated as rigorously as observational data
• Keyword content sorted into independent classes
• Hierarchical data dictionary with delegated authorities

Base Formats

All the data we deal with can be broken down into one or more of the base formats.

• Arrays

• Tables

• Parseable byte streams

• Encoded byte streams

Data Dictionary

• All keywords grouped into classes

• Separate (or partitioned) dictionaries to distribute authority

• Strict central control over structural descriptions and universally required sections

Data Dictionary - Logical View

ISO/IEC 11179:2003 Volume:3 Metadata Registry Specification

Governance
• Registration Authority
• Steward
• Namespace

Common

Discipline / External Source

Local Data Dictionaries (Mission)
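
The partitioning described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions, not the PDS4 dictionary implementation: the class names `Namespace` and `DataDictionary`, the namespace ids `pds` and `mpf`, the steward labels, and the definitions are all hypothetical. Each namespace records its steward, and data elements are keyed by namespace plus name, so a mission dictionary can define its own terms without colliding with the common dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    """One partition of the dictionary, owned by a single steward."""
    namespace_id: str
    steward: str
    elements: dict = field(default_factory=dict)  # element name -> definition


class DataDictionary:
    """Hypothetical registry of namespaces (an assumption, not PDS4 code)."""

    def __init__(self):
        self._namespaces = {}

    def add_namespace(self, namespace_id, steward):
        if namespace_id in self._namespaces:
            raise ValueError(f"namespace {namespace_id!r} already registered")
        self._namespaces[namespace_id] = Namespace(namespace_id, steward)

    def define(self, namespace_id, name, definition, actor):
        ns = self._namespaces[namespace_id]
        # Delegated authority: only the steward may change its own partition.
        if actor != ns.steward:
            raise PermissionError(f"{actor!r} is not steward of {namespace_id!r}")
        if name in ns.elements:
            raise ValueError(f"{namespace_id}:{name} is already defined")
        ns.elements[name] = definition

    def lookup(self, namespace_id, name):
        return self._namespaces[namespace_id].elements[name]


dictionary = DataDictionary()
dictionary.add_namespace("pds", steward="central")       # common dictionary
dictionary.add_namespace("mpf", steward="mission_team")  # local (mission) dictionary
dictionary.define("pds", "unit", "Unit of measure for a value.", actor="central")
dictionary.define("mpf", "unit", "Mission-specific unit label.", actor="mission_team")
```

Keying on the namespace keeps the two `unit` definitions distinct, and the steward check is one simple way to model delegated authority; a real registry would also track the registration authority and versions.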

All Products Are Equal

All products are treated with equal rigor in labelling and documenting.

• Ensures the ability to cross-reference throughout the archive holdings

• Supports interface selection and packaging options for users

• Necessary for tracking and processing formats that may require migration in future

OAIS Information Object

• The OAIS* Information Object unifies digital, conceptual and physical objects and their descriptions
  Figure labels: Data Object (Digital, Physical, Conceptual); Representation (Structural, Semantic)

• A product is a uniquely defined package of related information objects
  • Data Product, Software, Document

• A data set is a collection of products

* Open Archival Information System

Data Product Components

Registry Object & Web Resource

Classification

Product

Description Combinations

Data Object Description

Structure

Data

PDS4 Core Concept Map


Data Driven Development Process

• The ontology defines the things in the domain and their relationships.

• A Data Dictionary defines data elements.

• The report writer uses the ontology and data dictionary to export and translate the information model into various notations and languages.

• Updates to the ontology are reflected in the artifacts automatically.
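
The report-writer step can be sketched in Python. This is a minimal illustration, assuming the model is held as plain Python objects; the names `Attribute`, `ModelClass`, and `write_xsd`, and the `urn:example:dd` namespace URI, are hypothetical and not part of the PDS4 tooling. The attribute names and cardinalities are the ones shown for `Array_Axis_Type` in the schema example in this deck.

```python
from dataclasses import dataclass
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
DD_NS = "urn:example:dd"  # placeholder namespace URI, assumed for this sketch


@dataclass(frozen=True)
class Attribute:
    name: str
    min_occurs: int
    max_occurs: int


@dataclass(frozen=True)
class ModelClass:
    name: str
    attributes: tuple


def write_xsd(model_class):
    """Translate one model class into an XML Schema complexType."""
    lines = [
        f'<xsd:schema xmlns:xsd="{XSD_NS}" xmlns:dd="{DD_NS}">',
        f'  <xsd:complexType name="{model_class.name}_Type">',
        "    <xsd:sequence>",
    ]
    for attr in model_class.attributes:
        lines.append(
            f'      <xsd:element name="{attr.name}" type="dd:{attr.name}_Type" '
            f'minOccurs="{attr.min_occurs}" maxOccurs="{attr.max_occurs}"/>'
        )
    lines += ["    </xsd:sequence>", "  </xsd:complexType>", "</xsd:schema>"]
    return "\n".join(lines)


ARRAY_AXIS = ModelClass(
    name="Array_Axis",
    attributes=(
        Attribute("elements", 1, 1),
        Attribute("name", 1, 1),
        Attribute("scale_type", 0, 1),
        Attribute("sequence_number", 1, 1),
        Attribute("unit", 0, 1),
    ),
)

schema_text = write_xsd(ARRAY_AXIS)
ET.fromstring(schema_text)  # raises ParseError if the output is not well-formed
print(schema_text)
```

Because the schema text is derived entirely from `ARRAY_AXIS`, changing a cardinality in the model and rerunning the writer regenerates the artifact, which is the property the last bullet describes.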

Image Grayscale


Target Body


Example XML Schema - Image Grayscale

<xsd:complexType name="Image_Grayscale_Type">
  <!-- Structure_Base_Type:Array_Base -->
  <xsd:sequence>
    <xsd:element name="local_identifier" type="dd:local_identifier_Type" minOccurs="1" maxOccurs="1 …
    <xsd:element name="comment" type="dd:comment_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="axes" type="dd:Array_2D_axes_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="axis_order" type="dd:Image_Grayscale_axis_order_Type" minOccurs="1" maxOccur …
    <xsd:element name="object_encoding_type" type="dd:Array_Base_object_encoding_type_Type" minOccu …
    <xsd:element name="Data_Location" type="pds:Data_Location_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="Array_Axis" type="pds:Array_Axis_Type" minOccurs="2" maxOccurs="2"> </xsd:element>
    <xsd:element name="Array_Element" type="pds:Array_Element_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="Data_Location_Type">…

<xsd:complexType name="Array_Axis_Type">
  <xsd:sequence>
    <xsd:element name="elements" type="dd:elements_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="name" type="dd:name_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="scale_type" type="dd:scale_type_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="sequence_number" type="dd:sequence_number_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="unit" type="dd:unit_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>

<xsd:complexType name="Array_Element_Type">
  <xsd:sequence>
    <xsd:element name="data_type" type="dd:data_type_Type" minOccurs="1" maxOccurs="1"> </xsd:element>
    <xsd:element name="scaling_factor" type="dd:scaling_factor_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="unit" type="dd:unit_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
    <xsd:element name="value_offset" type="dd:value_offset_Type" minOccurs="0" maxOccurs="1"> </xsd:element>
  </xsd:sequence>
</xsd:complexType>

Example XML Labels - Image Grayscale

<Image_Grayscale>
  <local_identifier>MPFL_M_IMP_IMAGE</local_identifier>
  <axes>2</axes>
  <axis_order>FIRST_INDEX_FASTEST</axis_order>
  <object_encoding_type>BINARY</object_encoding_type>
  <Data_Location>
    <file_local_identifier>F09128.IMG</file_local_identifier>
    <offset>1</offset>
  </Data_Location>
  <Array_Axis>
    <elements>248</elements>
    <name>LINE</name>
    <sequence_number>1</sequence_number>
  </Array_Axis>
  <Array_Axis>
    <elements>256</elements>
    <name>SAMPLE</name>
    <sequence_number>2</sequence_number>
  </Array_Axis>
  <Array_Element>
    <data_type>SignedMSB4</data_type>
  </Array_Element>
</Image_Grayscale>
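
To show how a label like the one above can drive software, the following Python sketch reads the label with the standard-library XML parser and derives the array shape and expected data size. It is a hedged illustration, not a PDS tool: the function name `describe_image` is hypothetical, and the sketch assumes that `SignedMSB4` denotes a 4-byte signed integer stored most significant byte first.

```python
import xml.etree.ElementTree as ET

LABEL = """<Image_Grayscale>
  <local_identifier>MPFL_M_IMP_IMAGE</local_identifier>
  <axes>2</axes>
  <axis_order>FIRST_INDEX_FASTEST</axis_order>
  <object_encoding_type>BINARY</object_encoding_type>
  <Data_Location>
    <file_local_identifier>F09128.IMG</file_local_identifier>
    <offset>1</offset>
  </Data_Location>
  <Array_Axis>
    <elements>248</elements>
    <name>LINE</name>
    <sequence_number>1</sequence_number>
  </Array_Axis>
  <Array_Axis>
    <elements>256</elements>
    <name>SAMPLE</name>
    <sequence_number>2</sequence_number>
  </Array_Axis>
  <Array_Element>
    <data_type>SignedMSB4</data_type>
  </Array_Element>
</Image_Grayscale>"""

# Assumption: SignedMSB4 is a 4-byte signed integer, most significant byte first.
BYTES_PER_ELEMENT = {"SignedMSB4": 4}


def describe_image(label_xml):
    """Return file name, axis names, shape, data type and data size from a label."""
    root = ET.fromstring(label_xml)
    axes = sorted(
        root.findall("Array_Axis"),
        key=lambda axis: int(axis.findtext("sequence_number")),
    )
    if len(axes) != int(root.findtext("axes")):
        raise ValueError("axes count does not match the number of Array_Axis elements")
    shape = tuple(int(axis.findtext("elements")) for axis in axes)
    data_type = root.findtext("Array_Element/data_type")
    element_count = 1
    for length in shape:
        element_count *= length
    return {
        "file": root.findtext("Data_Location/file_local_identifier"),
        "axis_names": tuple(axis.findtext("name") for axis in axes),
        "shape": shape,
        "data_type": data_type,
        "size_bytes": element_count * BYTES_PER_ELEMENT[data_type],
    }


info = describe_image(LABEL)
print(info["axis_names"], info["shape"], info["size_bytes"])
# prints: ('LINE', 'SAMPLE') (248, 256) 253952
```

The sketch deliberately leaves `offset` and `axis_order` uninterpreted, since the slides do not define their semantics.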

Definition of Center Longitude

Identification Area


Cross Reference Area


Digital Objects

Registry Configuration File - Associations

<!-- AssociationType definitions -->

<rim:RegistryObject xsi:type="rim:ClassificationNodeType" code="has_browse" parent="urn:oasis:names:tc:ebxml-regrep:classificationScheme:Ass…
    lid="urn:nasa:pds:profile:regrep:AssociationType:has_browse"
    id="urn:nasa:pds:profile:regrep:AssociationType:has_browse">
  <rim:Name>
    <rim:LocalizedString charset="UTF-8" value="has_browse"/>
  </rim:Name>
</rim:RegistryObject>

<rim:RegistryObject xsi:type="rim:ClassificationNodeType" code="has_calibration" parent="urn:oasis:names:tc:ebxml-regrep:classificationScheme:Ass…
    lid="urn:nasa:pds:profile:regrep:AssociationType:has_calibration"
    id="urn:nasa:pds:profile:regrep:AssociationType:has_calibration">
  <rim:Name>
    <rim:LocalizedString charset="UTF-8" value="has_calibration"/>
  </rim:Name>
</rim:RegistryObject>

Query Model - Semantic Browser

Identifiable

• The Identifiable model defines objects that can be registered into a registry and stored into a repository.
  • Based on ISO 15000-3-ebXML RIM, Dublin Core; W3C:XML/Schema

• Each Identifiable has a globally unique immutable identifier, a logical identifier for grouping versions, and all names that might have been assigned to the object.

• Identifiables can be located and retrieved by a single query against a federated registry system.
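
The three identifier roles and the federated query can be sketched in Python. This is an illustrative model only, not the PDS registry or the ebXML RIM API: the class names `Identifiable` and `Registry`, the function `federated_find`, and the `urn:example:lid:image` identifier are hypothetical, and the sketch assumes a UUID is an acceptable stand-in for the globally unique identifier.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: identifiers cannot be reassigned after creation
class Identifiable:
    lid: str           # logical identifier shared by all versions of an object
    version: int
    names: tuple = ()  # any names that have been assigned to the object
    guid: str = field(default_factory=lambda: uuid.uuid4().urn)  # globally unique id


class Registry:
    """One registry node; a federation is simply a list of these."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        if obj.guid in self._objects:
            raise ValueError(f"{obj.guid} is already registered")
        self._objects[obj.guid] = obj

    def find(self, lid):
        return [obj for obj in self._objects.values() if obj.lid == lid]


def federated_find(registries, lid):
    """Run one logical query across every registry, ordered by version."""
    hits = [obj for registry in registries for obj in registry.find(lid)]
    return sorted(hits, key=lambda obj: obj.version)


node_a, node_b = Registry(), Registry()
node_a.register(Identifiable("urn:example:lid:image", 1, ("F09128.IMG",)))
node_b.register(Identifiable("urn:example:lid:image", 2))

found = federated_find([node_a, node_b], "urn:example:lid:image")
print([obj.version for obj in found])  # prints: [1, 2]
```

Grouping by `lid` while keying storage on `guid` lets two versions of the same logical object live on different nodes and still be returned by a single query.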

Partial Graph of Center Longitude

Schedule and Progress Chart

• Data Dictionary Model: ISO/IEC 11179 adopted
• Generic Product Model: Designed and in testing
• Fundamental Structures: Designed and in testing
• Data Formats: Initial set designed and in testing
• Data Element Nomenclature: Rules drafted
• Data Dictionary: Clean-up started
• Context Model: Design started
• XML/Schema: Designed and in testing
• Discipline Models: Initial set designed; more needed
• PDS4 Standards Reference, Tutorials, Concept of Operations, DPH: In progress; dependent on model

Timeline labels: Jan 2010; Sys Rev/MC; Sep; Jul; Acc Rev

Benefits of the PDS4 Data Model

• The data model is managed in an ontology modeling tool.
• The model is formally defined.
• The model can be validated and tested.
• Defines a few simple fundamental data structures.
• Fundamental data structures may be extended and combined to form more complex data formats.
• The overall architecture is model driven.
• Disentangles the model from its implementation.
• Model can evolve over time as research domain changes.
• Drives the generation of documentation, label schema, and other model dependent artifacts.
• The data dictionary uses a standard data dictionary model.

Backup


Proposed IPDA Data Standards Project

• Identify the core elements of the PDS4 data standards

• Develop a process for maintaining alignment between the IPDA and the PDS

Unique Namespace

Positioning the PDS for the Future

• Support for Advanced Technologies
  • Service Oriented Architectures; Semantic Searches, Text and Facet Based Searches; Machine reasoning; Automatic classification; Logical Consistency Checking.
  • Federated Registries: Unique Identification, Versioning, Federated queries, federated replication, federated linking, configuration management, subscribe/notification, logging.

• Support for Interoperability
  • Shared Ontology across Planetary Science Disciplines; Shared ontologies within Science Disciplines
  • Standard Data Dictionary Schema
  • Namespace partitioning; classification schemes; registration authority, submitter, steward

• Standards Based
  • ISO/IEC 11179-MDR; ISO 14721:2003-OAIS; ISO 15000-3-RIM; ISO/IEC 19502-MOF; ISO 639-RDF; OWL_DL; ISO/IEC 19501-UML; Dublin Core; W3C:XML/Schema; ISO 11404-Data Types; ISO 8601-Time

• Model Driven Implementation Philosophy
  • Metadata can be used in ways not yet envisioned.

• Supported Implementation Languages
  • XML, PVL, ODL, RDF/XML, OWL/XML, YADL

• Modeling Approach
  • Ontology - Object-Oriented semantics including class hierarchy, class inheritance; named and typed associations; class, attribute and value cardinalities; network and recursive.