CONSOLIDATION OF METADATA IN THE FIELD OF ENVIRONMENTAL SCIENCES
Evgeny Vyazilov, All-Russia Research Institute of Hydrometeorological Information – World Data Centre, Obninsk. E-mail: [email protected]@meteo.ru
CITES-2005, Novosibirsk, March 19-23 2005
http://www.oceaninfo.ru
Content
• Necessity of metadata creation
• A brief history of metadata development and characteristics of the best-known metadata systems for various metadata objects
• Metadata structure, using ESIMO as an example
• Documentary data as sources of metadata
• Problems of metadata development and use
• Metadata as a basis for monitoring the condition of information resources in the field of the environment
• Metadata aggregation
• Examples of output forms
• Prospects of metadata development
• Conclusion
The word “metadata” on the Internet
Search engine    Russian    English
Google           30100      10700000
Yandex           193084     120302
Yahoo            2790       4710000
MSN              43795      1632490
787 requests on Yandex in the last month
Characteristics of environmental data collection and processing systems
• Large data volumes: hundreds of Tb (tens of thousands of files)
• High intensity of data set updating: up to 1 Tb per year
• Large number of data sources: tens of thousands of sea expeditions, more than 10 000 hydrometeorological stations and posts
• Small volume of output products, issued periodically
• Variety of queries: from simple requests for information about data to climatic estimates of the environment's impact on economic objects
• Variety of distribution forms (tables, diagrams, maps, directories), which requires the use of various software
• Need for on-line access to data
• Many steps of data processing: observation, collection, cataloguing, applied processing, ...
Definitions
Metadata are auxiliary information about data that helps in data processing.
Semantic metadata: the logical characteristics of data. They provide information on the sources and location of data, and also on who observed, where, and with what; when and how the data were received; on what medium and in what format they are stored; what software was used to enter the data; how they were checked; etc.
Syntactic metadata: information on where data are located on a network or disk. They describe the structure, allowable values, ways of representing them, relationships with other data, distribution, and other data characteristics that help to access the data, interpret them correctly, and use them.
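The two kinds of metadata defined above can be illustrated with a minimal sketch. The class names, field names, FTP address, and value range below are hypothetical and chosen only for illustration; they are not taken from ESIMO or any other system named in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SemanticMetadata:
    """Logical description: who observed, where, when, and how data are stored."""
    observer: str       # who made the observation
    region: str         # where it was made
    period: str         # when it was made
    medium: str         # storage medium
    data_format: str    # storage format
    checks: List[str] = field(default_factory=list)  # quality checks applied


@dataclass
class SyntacticMetadata:
    """Physical description: where the data are located and how to read them."""
    location: str                           # network or disk address
    fields: Dict[str, Tuple[float, float]]  # parameter -> allowable value range

    def is_allowed(self, name: str, value: float) -> bool:
        """Check a value against the allowable range of its parameter."""
        low, high = self.fields[name]
        return low <= value <= high


# Hypothetical example records.
sem = SemanticMetadata(
    observer="Example Institute", region="Barents Sea", period="1995",
    medium="disk", data_format="ASCII", checks=["range check"],
)
syn = SyntacticMetadata(
    location="ftp://example.org/cruise42.dat",
    fields={"water_temperature": (-2.0, 40.0)},
)
print(syn.is_allowed("water_temperature", 12.5))
```

The split mirrors the definitions: the semantic record answers questions a person would ask about the data, while the syntactic record holds what a program needs to locate and interpret them.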
Necessity of metadata
Metadata are needed not by the creator of a database but by those who use environmental data.
• Distributed nature of data centres and platforms
• Variety of observing platforms, parameters, and methods of obtaining them
• Variety of methods of recording observations (hard copies, technical media)
• Existing paper catalogues, lists, and information obtained from technical media are already of little help in data search
• Metadata are the basis for the transition to paperless information processing (the future "Data source – Decision makers" chain)
• Metadata make it possible to navigate a large flow of data about the natural environment more quickly
• In many cases data holders are not interested in having information from their databases used by departments, ministries, economic entities, or the population
The requirements for metadata
• Metadata should help to answer questions: what data exist, from where and when the data arrived in the storage, who the author is, when and by whom they were changed, and in what structure they are stored
• Various metadata are necessary: information on the organizations collecting and storing data, data sets, exchange formats, processing software, etc.
• Fine-grained data selection requires metadata attributes that are not present in the initial files, for example quantitative characteristics of data flows
• All metadata objects should be stored in one scheme
• Metadata should make it possible to find the physical storage address of data from their logical characteristics
• The more complete the metadata base, the more effectively the data can be used
• Creating metadata bases should be a duty of each project, programme, or expedition connected with data acquisition
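The requirement that metadata lead from logical characteristics to a physical storage address can be sketched as a simple catalogue lookup. This is only an illustration under assumed names: the catalogue records, the attribute names, the file paths, and the function `find_addresses` are all hypothetical.

```python
# Hypothetical metadata catalogue: each record pairs logical characteristics
# (parameter, region, year) with the physical address of the data.
CATALOGUE = [
    {"parameter": "water temperature", "region": "Barents Sea", "year": 1995,
     "address": "/archive/ocean/barents/temp_1995.dat"},
    {"parameter": "salinity", "region": "Barents Sea", "year": 1995,
     "address": "/archive/ocean/barents/sal_1995.dat"},
    {"parameter": "water temperature", "region": "Black Sea", "year": 1998,
     "address": "/archive/ocean/black/temp_1998.dat"},
]


def find_addresses(catalogue, **criteria):
    """Return the addresses of all records matching every given criterion."""
    return [
        record["address"]
        for record in catalogue
        if all(record.get(key) == value for key, value in criteria.items())
    ]


print(find_addresses(CATALOGUE, region="Barents Sea", parameter="salinity"))
```

The user never needs to know the directory layout: the query is expressed in terms of parameter, region, and year, and the catalogue supplies the address.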
Features of metadata
• The volume of metadata is relatively large. For example, the bases describing the coverage of a given area by oceanographic observations are estimated at 1 Gb
• One-time input of information at the initial loading of metadata, with subsequent modification and repeated use over a fairly long period
• Relatively low updating activity, both in frequency and in volume
• Need to centralize general information about data and to decentralize local, detailed information about data
Documentary data as a source of metadata
• Formalized description of data sets and databases
• Description of data sources (organizations, observing platforms, projects)
• Description of the data storage format
• List of parameters
• Description of data checking methods
• Completeness of a file relative to the initial medium or observation programme
• Descriptions of the observing programmes (projects)
• Description of measurement methods and equipment used
• List of logical units of storage (cruises, squares, geographical areas, etc.) with the number of observations indicated
• Description of software
• List of publications based on this data set
• Classifiers and codifiers used
Where are metadata created?
• Making observations: network, observation methods, methods of determining environmental parameters, measuring systems
• Means of making observations: research vessels (RVs), stations, satellites, buoys
• Data collection: technology, transfer formats, description of transmitted complete data sets
• Data accumulation: description of data sets; organizations (suppliers, owners, users); formats for acquisition, storage, and exchange of data; observing projects; RV cruises; information on parameters, codifiers, technologies, and data control methods
• Interdepartmental and international data exchange: formats, description of complete data sets, observing projects and programmes
• Data storage and protection: processing technology, processing methods, data control and analysis, software, algorithms for calculating parameters, observation coverage
• Modelling: models, methods, formats of output data
• Data distribution: products (analyses, bulletins, monthly journals, year-books, directories, forecasts, Web sites), software and hardware
• Decision support: impacts of the environment on economic objects, ways of preventing those impacts
Structure of metadata
Data sources
• information on organizations
• information on all RVs and on working RVs
• information on RV cruises
• information on hydrometeorological stations
• information on satellites
• information on observation networks
• information on experts
• information on observing projects
• information on hydrometeorological equipment
• information on Web sites
Information resources
• information on data sets and databases
• Web resources
• information products
Instances of data
• information on observations, profiles, terms
• information on time series
• information on grid data
• information on text and graphic information
Syntactic metadata
• descriptions of data formats
• dictionaries and codifiers
• information on models and software
• dictionary of terms
• list of abbreviations used
Spatial metadata
• information on maps
• information on shape-files
• information on attribute data
• …
Place and role of metadata in data management
[Diagram. Elements: Metadata; General description of database, formats; Data sources (organizations, observing platforms); Logical units of processing (cruises, squares, stations, etc.); Database]
Connections between IR instance descriptions and metadata objects
• within one metadata object: between instances of that object, for example several projects of one programme in an IR description
• between different metadata objects: for example, the description of a data set should be accompanied by complete information on organizations, experts, platforms, formats, and projects
• between objects of one type in distributed storage: for example, information about IRs maintained by different organizations
Metadata at various levels of data management
• Local: observing platform (a single organization). Detailed information on data sets (databases) is needed, such as information on RV cruises and their state (in processing, on what medium, etc.) and on coverage for various parameters
• Regional: project, expedition (corporation). Information on each data set and on each unit of collection, accounting, and data exchange (cruise, monthly data flow from a coastal station)
• National: information on organizations, data sets, processing software, formats and exchange at country level, observing platforms, observing networks, etc.
• International: information on international agreements and on data sets transferred to international exchange, including information on cruises and stations, data exchange formats, and processing software
At all levels of management there is both reference information of a common class (information on data sets, data sources, formats, etc.) and information specific to each level.
Metadata at various stages of processing
• Collection: as a rule, metadata are stored together with the initial data
• Primary processing: information on observing platforms is extracted
• Analysis of the information: various metadata objects arise
• Decision support: information on products and the rules for their release, and also possible types of queries and solvable tasks
Metadata standards
• ISO 19115 – International Standard for Geographic Information
• ISO 19139 – XML Schema, extension of the ISO metadata schema
• ISO 3166 – Geographical regions and countries
XML standards
• Dublin Core – description of Web resources and bibliographic information
• RDF – description of resources with complex, hierarchical connections
• LOM – description of educational resources (Learning Object Metadata)
• XMI – XML Metadata Interchange
• UDDI – Universal Description, Discovery and Integration
• WSDL – Web Services Description Language
• DCML – Data Center Markup Language Framework Specification
• SWEET – Semantic Web for Earth and Environmental Terminology
• OWL – Web Ontology Language
• ESML – Earth Science Markup Language
Other standards
• CDI – Sea Search project (EC)
• DIF – Directory Interchange Format (GCMD, NASA)
• Standard for Digital Geospatial Metadata – US Federal Geographic Data Committee
• EDMED – EC standard for marine data
• ROSCOP – IOC standard for cruise data
Effective data management requires aggregated metadata
• Condition of IR: aggregated characteristics of databases and their number
• Condition of observation networks: the equipment and measuring systems used are an important indicator of the quality of an observation network
• Characteristics of information flows: distribution of data over the main regions of the globe, with the supplying organizations indicated
• Distribution of information by medium and by level of data collection and exchange
• Number of queries executed, with trends in users' information needs indicated, etc.
Characteristics of IR as the result of processing metadata bases
• Metadata make it possible to run analytical queries and obtain aggregated characteristics, i.e. to analyse the receipt of data from various organizations.
• For example, one can obtain:
- Number of data sets by organization and region
- Number of RV cruises received in 1990–2000 from various countries and organizations of Russia; number of RV cruises received in 1990–2000 by kind of observation
- Number of stations by square, period, parameter, etc.
• Such information supports data management with counts of cruises and stations by area, department, etc.
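An analytical query of this kind can be sketched in a few lines. The cruise records, vessel names, and the function `count_cruises` below are invented for illustration; a real metadata base would hold such records in database tables rather than in a list.

```python
from collections import Counter

# Hypothetical cruise metadata records.
CRUISES = [
    {"vessel": "RV-A", "country": "Russia", "year": 1991, "observation": "hydrology"},
    {"vessel": "RV-B", "country": "Russia", "year": 1998, "observation": "meteorology"},
    {"vessel": "RV-C", "country": "Ukraine", "year": 1995, "observation": "hydrology"},
    {"vessel": "RV-D", "country": "Ukraine", "year": 2003, "observation": "hydrology"},
]


def count_cruises(cruises, key, first_year, last_year):
    """Count cruises within [first_year, last_year], grouped by the given attribute."""
    return Counter(
        cruise[key]
        for cruise in cruises
        if first_year <= cruise["year"] <= last_year
    )


by_country = count_cruises(CRUISES, "country", 1990, 2000)
by_kind = count_cruises(CRUISES, "observation", 1990, 2000)
print(by_country)
print(by_kind)
```

The same function serves both example queries from the slide (cruises by country and cruises by kind of observation) because the grouping attribute is a parameter.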
Metadata aggregation
[Diagram arranged by aggregation level. Elements:
- Quantity of logical units of data (RVs, expeditions, observing platforms, structures, parameters) for the period of observation, by geo-object and organization
- Information on data set, IR
- Information on data sources (observing networks, platforms HMS, RVs, satellites, etc.)
- Information on observation at a point
- Quantity of observations per unit of data collection (stations, time) for the observation period
- Quantity of stations, profiles, levels for every parameter]
Characteristics of aggregated information at various stages of processing
• Making observations: number of information sources (RVs, coastal stations, buoys, etc.), volumes of information received from one source (urgent, daily, monthly, annual)
• Data sorting in the data centre: by time (daily, monthly, annual), by space (stations, cruises, territory)
• Data processing: volume of information processed, processing time
• Distribution of information: volume of output information; periodicity of presentation (day, week, ten-day period, month, quarter, year); spatial grouping of the data (region, water area); sorting of the data (in time, in space, or in both simultaneously)
Metadata aggregation for various levels of management
• Higher organizations: general information on DB condition, IR updating, portal visits
• Management: general information on DB condition, IR updating, condition of metadata bases, various IR
• Users: general information on the DB and subschemas
• Application programmers (application developers): general information on DB condition and on subschemas, detailed information on tables
• DB administrators, application developers: general information on the DB and on subschemas, detailed information on tables, special information on rights and roles for subschemas, tables, and parameters
IR monitoring is
• operational observation of data flows in order to prepare information and analytical materials and to support decisions on the development of IR, DBs, and applications, with the aim of improving information services
The basic idea
DB administrators should be able to receive, at any moment and from any place, metadata on the quantitative characteristics of DBs and on their updating.
The basis of monitoring is metadata and applications developed for obtaining information about the bases of initial data.
Functions of IR monitoring
• Obtaining information on the contents and volumes of individual subschemas (objects) and tables, at various degrees of aggregation
• Obtaining information on data coverage of individual areas for any kind of data and any single parameter
• Visualization of any kind of metadata as tables, by search criteria
• Information and analytical tasks in IR management (estimating volumes, coverage of individual subject domains and geographical areas, forecasting IR development, etc.)
• Estimation of visits, overall and to individual IRs
• Distribution of information and analytical materials
Basic approach to creating the IR monitoring system
• Maximum use of previously created applications, for example:
- Obtaining information on RV cruises
- Data coverage
- Work with the Unified Dictionary of Parameters (UDP)
- Metadata search
- Reference information on DB subschemas
History of the development of information systems for metadata
• 1969. RIHMI-WDC. Descriptions of hydrometeorological data sets
• 1971. RIHMI-WDC, VNIIMORGEO. Metadata on expeditions, profiles, stations (multi-level data structure)
• 1977. IOC UNESCO (MEDI). Metadata on organizations, catalogue of stations, information on data sets held in various countries, detailed descriptions of files
• 1984. RIHMI-WDC. Complex metadata (data sets, cruises, projects, coverage, etc.)
• 1987. WMO. INFOCLIMA. Bibliography and information on data sets
• 1997. RIHMI-WDC. Electronic directory
• 1998. UA, MGI. Metadata (projects, experts, information on cruises)
• 1999. RIHMI-WDC. ESIMO metadata system (Internet)
• 2002. EU. The SeaSearch project. 4 metadata objects (data sets, cruises, projects, CDI)
• 2004. RIHMI-WDC. Pilot IOC project. XML scheme for metadata
The best-known metadata systems
• CIESIN (http://www.ciesin.org) - Consortium for International Earth Science Information Network
• EDMED (http://www.bodc.ac.uk/services/edmed/ukmed.html) - information on data sets (4 thousand)
• GCMD (http://gcmd.nasa.gov) - information on data sets (10 thousand)
• INFOTERRA (http://www.unep.org/infoterra/welcome.htm) - The Global Environmental Information Exchange Network
• Oceanic (http://ships.cms.udel.edu) - information on research vessel programmes
• OceanPortal (http://oceanportal.org) - 5 thousand web sites
• OceanExpert (http://ioc3.unesco.org/oceanexpert) - information on more than 10 thousand experts
• ROSCOP (http://www.ices.dk) - information on RV cruises under announced national programmes, about 10 thousand expeditions
• ACOD (RIHMI-WDC) (http://data.oceaninfo.ru/cruisecat/index.jsp) - catalogue of cruises (33 thousand expeditions)
• EOSDIS (http://eosdismain.gsfc.nasa.gov/eosinfo/EOSDIS_Site/)
Structure and connections of metadata objects (example from oceanography)
[Diagram. Metadata objects shown: Data sources (organizations, data sets, Web resources); Formats; Observation coverage; Observing networks; Platforms; Experts; Observing projects, programmes; Equipment, observation methods; Information on cruises; Coastal stations; RVs; Buoys; Satellites; Data sets, databases; Information on spatial data; Software, models; Parameters; Codifiers; ….]
Methods of metadata consolidation, so that metadata can be obtained from one source
• Develop a uniform metadata scheme for all objects
• For each metadata object, create Java classes that can be used in any other metadata object (organizations, experts, parameters, etc.)
• Include the Java classes in the appropriate applications to build output forms for other metadata objects
• For each metadata object, provide a list of the web site addresses of external systems holding such metadata
• Organize automated transfer of search criteria to other metadata systems
• Create a common list of data sets and DBs with references to their descriptions on the sites of the organizations that own the data (NESDIS)
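The idea of one reusable class per metadata object can be sketched as follows. The slides refer to Java classes; this illustration uses Python instead, and every class, field, and value in it is hypothetical rather than taken from the ESIMO code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Organization:
    """Metadata object described once and reused by other metadata objects."""
    code: str
    name: str

    def render(self) -> str:
        """Build the output form fragment for this organization."""
        return f"{self.name} [{self.code}]"


@dataclass
class DataSet:
    title: str
    owner: Organization  # reference to the shared object, not a copied description

    def render(self) -> str:
        return f"Data set: {self.title}; owner: {self.owner.render()}"


@dataclass
class Cruise:
    vessel: str
    year: int
    operator: Organization  # the same class serves a different metadata object

    def render(self) -> str:
        return f"Cruise: {self.vessel}, {self.year}; operator: {self.operator.render()}"


# One organization record feeds the output forms of two different objects.
org = Organization(code="ORG1", name="Example Institute")
print(DataSet("Sea temperature", org).render())
print(Cruise("RV-A", 1995, org).render())
```

Because both output forms call the same `Organization.render`, the organization is described in one place only, which is the duplication the consolidation approach aims to remove.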
Scheme of data search with the help of metadata
[Diagram. Elements: List of IR; Menu; Maps; Dictionary; Extents description of IR; IR (DB, ftp, html, gif, txt); Information resource description, joined with metadata objects (experts, organizations, observing networks, projects); Information on RV cruises; Data coverage; Platforms (RVs, satellites, stations); Applications]
Examples of consolidation on the ESIMO web portal
http://data.oceaninfo.ru/resource/index.jsp
[Screenshots: Links list; UDP; IR description; Experts]
Implementation of metadata bases
State of ESIMO metadata
• IR descriptions: 600
• Instances: 15 000 in the file system, several million in the DB
• Organizations participating in IR updating: 30; shipowners: 100
• Parameters: 600
• Metadata objects: 15, in total more than 40 thousand
• Codifiers: 160
Forms of distribution
[Screenshots: Information on data sets; Information on organizations; Information on coastal stations; Information on equipment]
Forms of distribution (continued)
[Screenshots: Information on projects; Information on links; Electronic Guide for ESIMO IR; UDP]
Metadata as a basis of IR monitoring on the ESIMO web portal
[Screenshots: Information on DB subschemas; Information on IR and portal visits]
Applications for data and metadata aggregation
[Screenshots: Hydrometeorological ship data, meteorological buoys; BATHY and TESAC messages]
Information on the number of IR instances
Use of metadata and results of IR monitoring
• Guide "Information resources about the condition of the World Ocean", http://data.oceaninfo.ru/resource/spr/index.htm
• Portal visitors (separate information), http://data.oceaninfo.ru/resource/portal/administrativestatistic?action=showStatatisticByProfile
• Monitoring of DIRS (detailed information), http://data.oceaninfo.ru/monitoring/index.jsp
• Push delivery according to a list (DIRS updates and visits)
• Link analysis for the state of servers and communication channels
• Analysis of portal state (notification by telephone)
Prospects of metadata development
Metadata as the basis of the virtual data centre
• Overcome the fragmentation of information systems and resources by transforming them into a unified structure
• Create a system based on the ideas of a virtual data centre and service-oriented architecture
• Develop a semantic network, i.e. a uniform namespace for each subject domain, as the basis for describing the structure of data exchanged between applications
• IR provider:
- Has a catalogue of web services based on the UDDI, WSDL, and SOAP standards for working with methods
- Maintains a conceptual XML scheme of the information stored and processed by the virtual centre
- Organizes interaction between web services (metadata exchange)
• IR authors: maintain web services, including metadata and codifiers
• Users: receive information on any metadata object at any moment and from any place
Conclusion
• A number of information systems do not meet users' requirements for the completeness, availability, and integrity of the metadata they contain
• In recent years new metadata objects have continued to be created
• The basic shortcoming of many systems is the duplication of individual sections of descriptions across metadata objects (information on organizations, experts, platforms, etc.)
• A unified complex of technologically interconnected information systems with metadata
• The necessary metadata objects have been identified, which makes it possible to divide responsibility for creating them, to create them systematically according to their importance and necessity, to consolidate them on various web portals, and to organize comprehensive information support with data
• The approaches developed have sped up the integration and consolidation of metadata from various sources
• Metadata collection systems are currently centralized, but in the coming years, as services are created, it will be possible to speak of distributed metadata storage
• Metadata are widely used by DB administrators, and without metadata consolidation it is very difficult to organize effective operation
Thank you for your attention!