Distributed Databases and metadata. G. Bégni, H. Makhmara - MEDIAS-France. July 18, 2004, ENVIROMIS Tomsk. Aims of the presentation: understanding the principles of metadata and databases; making the scientific community aware of the efforts expected in terms of data documentation.
Distributed Databases and metadata
G. Bégni, H. Makhmara - MEDIAS-France
July 18, 2004
ENVIROMIS Tomsk
Aims of the presentation
Understanding the principles of metadata and databases.
Making the scientific community aware of the efforts expected in terms of data documentation.
Highlighting the positive impacts of such efforts.
Demonstrating the need for an easy way to access distributed databases.
Approach
Presentation of the AMMA context and its constraints: status of the problem.
Reflection on a solution.
Abstract description of the various elements that make up the solution.
Selection and justification of standards and techniques.
Assessment of selections.
AMMA context
Scientific level: multi-disciplinary, multi-scale.
Technical level: multi-format, multi-volume, multi-structure, multi-location.
Cultural level: multi-language, multi-usage, multi-possibilities.
Constraints involved
Providing the various communities with the best-suited access to data (language, medium, cost, services…).
Guaranteeing the durability of data wherever they are produced.
Ensuring the durability of services as time goes by (technological developments).
Access services
Easy web interface for data search and location (geographical, temporal, thematic, keywords).
Transparent service to access heterogeneous distributed data (possibilities of compiling…).
Homogeneous documentation for heterogeneous data in order to optimise their exploitation.
Data durability
Multiple and systematic back-up procedure.
Data transparency in relation to technological changes (hardware, software).
Transparent data exploitation as time goes by.
A solution
Fully defined back-up process.
Data storage in standardised formats.
Clear data documentation for future exploitation.
Service durability
Services should not depend on any proprietary or « exotic » software.
The quality of a service should not deteriorate as technology changes.
A solution
Services based on standards.
Services based on « open source ».
To sum up:
Standardise storage.
Standardise services.
Standardise exploitation.
However, some data formats cannot be standardised (satellite imaging).
Neither can the related services.
Principles applied
Every item liable to be standardised should be standardised.
There should be a system gateway based on standards only.
Every item that cannot be standardised should be described in a standardised way.
A standard for each element
Data storage: ANSI/ISO SQL, XML.
Data description: FGDC-STD-001-1998 or ISO 19115.
Service description: W3C SOAP.
Catalogue: ISO 23950 (ANSI/NISO Z39.50).
Data description: Metadata
Formed from a Greek root (« meta »).
What surpasses, encompasses a subject, a science. (Le Robert Dictionary)
Denoting a nature of a higher order or more fundamental kind. (Oxford Talking Dictionary)
English: metadata
French: métadonnées
Literally speaking, metadata are data about data.
To be more precise, they are structured sets of information that describe resources.
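To make "structured sets of information that describe resources" concrete, here is a minimal Python sketch of one metadata record. The class name, field names and all values are hypothetical and chosen for illustration only; they are not taken from FGDC-STD-001-1998 or ISO 19115, which define far richer element sets.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class MetadataRecord:
    """Hypothetical minimal record: it describes a dataset, it is not the data."""
    title: str
    abstract: str
    keywords: list = field(default_factory=list)
    bounding_box: tuple = (0.0, 0.0, 0.0, 0.0)  # west, south, east, north (degrees)
    start_date: str = ""                        # ISO 8601 date
    end_date: str = ""
    contact: str = ""


# Illustrative values only; no real dataset is described here.
record = MetadataRecord(
    title="Example rainfall series",
    abstract="Daily rain-gauge totals for a hypothetical station network.",
    keywords=["rainfall", "West Africa"],
    bounding_box=(-20.0, 0.0, 20.0, 25.0),
    start_date="2004-01-01",
    end_date="2004-12-31",
    contact="data.manager@example.org",
)

# Serialise the record so that it can be stored or exchanged as text.
record_json = json.dumps(asdict(record), indent=2)
print(record_json)
```

Because the record is structured, a catalogue can index and compare its fields (keywords, extent, dates) without opening the data files themselves.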
Metadata standards
Metadata have always existed.
An effort of world-wide standardisation has been under way for several years.
Several (georeferenced) standards:
1. Content Standard for Digital Geospatial Metadata: FGDC-STD-001-1998.
2. ISO 19115, since the end of 2002.
FGDC is a de facto standard.
Advantages
Homogeneous presentation.
Pooled developments.
Possibility of automating data processing.
Comparison of examples:
1. GeoConnections Portal, Canada: http://geodiscover.cgdi.ca
2. Portal on desertification monitoring (OSS/Medias/SCOT): http://geooss.oss.org.tn/geooss
Efforts asked from data providers
Be aware of standards.
Endeavour to describe data as completely as possible.
Use data exchange formats as simple and consistent as possible.
----------------------------------------
Data providers do not have to care about the technical or formal aspects of standards.
Database managers will provide them with easy and user-friendly tools to describe their data.
AMMA INFORMATION SYSTEM ARCHITECTURE
[Diagram: MetaCatalog (portal to the AMMA I.S.); meta database (ISO 19115 and/or FGDC); three data sources (DB AMMASAT, DB LOP, DB SOP), each reached through an exchange protocol.]
1. Search by criteria (user-friendly interface)
2. Query metadata
3. Retrieve metadata
4. Choose datasets and query data
5. Locate and query datasets from relevant data sources
6. Retrieve datasets
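The numbered steps of the architecture can be sketched in Python as follows. This is a toy, in-memory illustration under stated assumptions: the `Entry` class, the `search` and `fetch` functions, the dataset identifiers and all values are hypothetical, and the "exchange protocol" is simulated by plain dictionary look-ups rather than any real network call.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """Hypothetical metadata entry held by the meta database."""
    dataset_id: str
    source: str          # which database holds the actual data
    keywords: frozenset
    year: int


# Meta database: descriptions only, no data (illustrative entries).
META_DB = [
    Entry("sat-001", "DB AMMASAT", frozenset({"satellite", "rainfall"}), 2004),
    Entry("lop-014", "DB LOP", frozenset({"rainfall", "gauge"}), 2003),
    Entry("sop-007", "DB SOP", frozenset({"aerosol"}), 2004),
]

# Data sources; each would sit behind its own exchange protocol (simulated here).
SOURCES = {
    "DB AMMASAT": {"sat-001": [1.2, 0.0, 3.4]},
    "DB LOP": {"lop-014": [0.5, 0.7]},
    "DB SOP": {"sop-007": [0.11]},
}


def search(keyword, year=None):
    """Steps 1-3: search by criteria, query and retrieve matching metadata."""
    return [e for e in META_DB
            if keyword in e.keywords and (year is None or e.year == year)]


def fetch(entries):
    """Steps 5-6: locate each chosen dataset in its source and retrieve it."""
    return {e.dataset_id: SOURCES[e.source][e.dataset_id] for e in entries}


hits = search("rainfall")                      # steps 1-3
chosen = [e for e in hits if e.year == 2004]   # step 4: the user chooses datasets
data = fetch(chosen)                           # steps 5-6
print(data)
```

The point of the split is that the user only ever talks to the metadata layer; where each dataset physically lives is resolved from the metadata entry, not by the user.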
Technical diagram
[Diagram. Components shown: catalogue service (any user); edition service (data provider); metadata creation and validation through web forms or XML import; XML records; Zebra indexer; Zebra server; Z39.50; YAZ, PHP, ZOOM; ZAP client; other catalogues (GCMD, FGDC Clearinghouse).]
Characteristics
Management of multi-standard metadata: ISO 19115, FGDC, DIF, if an XML schema is available.
Transparent to the data provider.
Transparent to the user.
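One way a catalogue could handle several metadata standards transparently is to inspect the root element of each incoming XML record. The Python sketch below illustrates that idea only; it is not the mechanism used by the system described here. The root element names in `ROOT_TO_STANDARD` are assumptions based on common XML encodings of these standards, and a real system would validate each record against the corresponding XML schema instead.

```python
import xml.etree.ElementTree as ET

# Assumed root element names (local part, namespace stripped) for each standard.
ROOT_TO_STANDARD = {
    "MD_Metadata": "ISO 19115",
    "metadata": "FGDC",
    "DIF": "DIF",
}


def detect_standard(xml_text):
    """Return the standard suggested by the root element, or None if unknown."""
    root = ET.fromstring(xml_text)
    local_name = root.tag.split("}")[-1]  # drop a '{namespace}' prefix if present
    return ROOT_TO_STANDARD.get(local_name)


# Tiny illustrative records, far from complete or valid documents.
fgdc = "<metadata><idinfo><citation/></idinfo></metadata>"
iso = '<MD_Metadata xmlns="http://www.isotc211.org/2005/gmd"/>'

print(detect_standard(fgdc))
print(detect_standard(iso))
```

Once the standard is known, the record can be routed to the matching schema and index configuration, so neither the provider nor the user has to state which standard is in use.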
Data access services
Médias-France is developing generic data access services.
These services have to be self-describing, registered, and have well-known interfaces.
For the moment, we focus our efforts on software permitting access to geographically distant databases (distributed databases).
Principle
Each service is registered within a directory server.
Each data source declares what data it serves.
A web portal is used by scientists to locate and request data from different sources.
Data is sent back to the user in a standardised format.
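The registration principle can be sketched in a few lines of Python. The `Directory` class, its method names and the source names are hypothetical stand-ins for the directory server and data sources; a real deployment would use a network-accessible registry rather than an in-memory dictionary.

```python
class Directory:
    """Hypothetical in-memory stand-in for the directory server."""

    def __init__(self):
        self._services = {}  # service name -> set of declared data types

    def register(self, name, served):
        """A data source registers itself and declares what it serves."""
        self._services[name] = set(served)

    def locate(self, data_type):
        """Return the names of all registered sources serving this data type."""
        return sorted(name for name, served in self._services.items()
                      if data_type in served)


directory = Directory()
directory.register("source-a", ["rainfall", "temperature"])
directory.register("source-b", ["rainfall"])
directory.register("source-c", ["aerosol"])

# The portal asks the directory where a given kind of data can be requested.
print(directory.locate("rainfall"))
```

Because sources declare their own content, adding a new database only requires one more registration; the portal code does not change.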
Implementation
Data sources are PostgreSQL databases, flat files or other RDBMS systems.
Each data server is a DODS servlet (Distributed Oceanographic Data System).
The servlet container is Apache Tomcat.
Metadata are in XML files.
Prospects
Develop Web services based on the W3C SOAP recommendation.
Implement a directory service for services.
Hope to share development efforts with other organisations, within the framework of international projects (funded by EC, INTAS…).
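As a rough idea of what a SOAP-based data request might look like, the Python sketch below builds a SOAP 1.2 envelope with the standard library. The envelope namespace is the one defined by the W3C SOAP 1.2 recommendation; the application namespace `http://example.org/amma/data`, the `GetDataset` operation and the `datasetId` parameter are purely hypothetical and do not describe any existing service.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/amma/data"              # hypothetical service namespace


def build_request(dataset_id):
    """Wrap a hypothetical GetDataset call in a SOAP 1.2 envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, f"{{{APP_NS}}}GetDataset")
    ET.SubElement(call, f"{{{APP_NS}}}datasetId").text = dataset_id
    return ET.tostring(envelope, encoding="unicode")


request_xml = build_request("lop-014")
print(request_xml)
```

The message is plain XML, so any platform able to parse XML and speak HTTP can act as client or server, which fits the requirement that services should not depend on proprietary software.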