Page 1: M. Diepenbroek (MARUM), M. Lautenschlager (MPI-M), E. Paliouras (DLR), H. Grobe (AWI) CODATA General Assembly, Berlin 10.11.2004 World Data Center Cluster

M. Diepenbroek (MARUM), M. Lautenschlager (MPI-M), E. Paliouras (DLR), H. Grobe (AWI)

CODATA General Assembly, Berlin 10.11.2004

World Data Center Cluster „Earth System Research“: an approach for a common data infrastructure in geosciences

WDC-MARE, WDC-RSAT, WDC-TERRA

WDC-Climate (Candidate)

Page 2

[Figure: Global increase in publications in empirical sciences, 1970 to 2010; series "Publications" and "Data" (the data series is marked "?"); y-axis 0 to 40, units not given]

Page 3

Main problems

• The poor availability of scientific data hampers complex and large-scale approaches in research

• Scientific results cannot be verified without the data underlying publications

• Reproducing data is often more expensive than archiving and reusing it

Page 4

The World Data Center (WDC) System of the International Council for Science (ICSU)

• founded during the International Geophysical Year (IGY) 1957-58

• long-term funding and maintenance by their host countries on behalf of the international science community

• WDC status is peer reviewed by international research institutes and programmes and by funding organisations

• accept data from national and international scientific or monitoring programs as resources permit

• all data held in WDCs are generally available to science

• scope of data collected: solar, geophysical, environmental, and human dimensions data, especially for monitoring changes in the geosphere and biosphere

• at present 52 centers in 12 countries

Page 5

The WDC cluster „Earth System Research“

Founded in April 2003 in Oberpfaffenhofen

Members:

• WDC-MARE – World Data Center for Marine Environmental Sciences (AWI, MARUM)

• WDC-C – World Data Center for Climate (MPI, Hamburg)

• WDC-RSAT – World Data Center for Remote Sensing (DLR, Oberpfaffenhofen)

• WDC-TERRA – World Data Center of the Lithosphere (candidate) (GeoForschungsZentrum, Potsdam)

Page 6

WDC cluster „Earth System Research“

Data and thematic coverage: atmosphere, land, models, ocean

Page 7

Activities & characteristics of the WDC cluster

Long-term archiving facilities
• Clear commission as data libraries
• Data management infrastructure, expertise, and manpower
• Long-term commitment and funding

Peer review for scientific data
• Completeness of data set descriptions (metadata)
• Validity of methods used
• Data values (precision, sequence, and ranges)
• Data publication based on citable data entities having persistent identifiers (DOI)

User-friendly and reliable systems for data retrieval and distribution
• General unrestricted online access
• Offline products (e.g. data collections, DVD)

Fostering common standards and protocols

Clear commitment to the rules for good scientific practice!
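The peer-review criterion for data values (precision, sequence, and ranges) lends itself to partial automation. The Python sketch below is a minimal illustration under stated assumptions, not the cluster's actual review procedure: the `ParameterSpec` declaration, the check functions, and the example limits are all hypothetical. Values are handled as decimal strings so that the number of reported decimal places can be compared exactly with the declared precision.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class ParameterSpec:
    """Hypothetical declaration of what one data column may contain."""
    name: str
    minimum: Decimal
    maximum: Decimal
    decimals: int  # declared precision: maximum number of decimal places


def check_values(spec: ParameterSpec, values: list[str]) -> list[str]:
    """Range and precision checks for one column of reported values."""
    problems = []
    for i, raw in enumerate(values):
        value = Decimal(raw)
        # Range check: the value must lie inside the declared interval.
        if not spec.minimum <= value <= spec.maximum:
            problems.append(f"{spec.name}[{i}]={raw} outside [{spec.minimum}, {spec.maximum}]")
        # Precision check: no more decimal places than declared.
        exponent = value.as_tuple().exponent
        places = -exponent if exponent < 0 else 0
        if places > spec.decimals:
            problems.append(f"{spec.name}[{i}]={raw} has {places} decimals, declared {spec.decimals}")
    return problems


def check_sequence(axis: list[float]) -> list[str]:
    """Sequence check: a time or depth axis should be strictly increasing."""
    return [
        f"position {i}: {axis[i]} does not follow {axis[i - 1]}"
        for i in range(1, len(axis))
        if axis[i] <= axis[i - 1]
    ]


# Hypothetical example: a sea-surface temperature column.
sst = ParameterSpec("SST", Decimal("-2"), Decimal("35"), decimals=2)
print(check_values(sst, ["12.50", "40.1", "3.141"]))
print(check_sequence([0.0, 10.0, 10.0, 20.0]))
```

Checks like these can only flag candidates for a reviewer; whether an out-of-range value is an error or a genuine extreme remains a scientific judgement.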

Page 8

WDC cluster – milestones

WDC infrastructure
• Metadata profile (ISO 19115, subset compatible with Dublin Core and ISO 690)
• Metadata catalogues based on common protocols (ISO, W3C, OGC)
• Common internet portal (search engine)
• Cost models to support long-term archiving at universities and in scientific projects

Data publication
• Migration of metadata into library catalogues and direct access to WDC archives
• Common search of scientific data and literature
• Peer review for scientific data
• Acceptance as citable publication through ISI
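To illustrate what a metadata profile "subset compatible with Dublin Core" can mean in practice, the sketch below projects a record onto simple Dublin Core elements with Python's standard `xml.etree.ElementTree`. The crosswalk table is a hypothetical mapping made for illustration (attribute names borrowed from the example metadata in the Data Publication slides), not the published mapping of the WDC profile; only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk: profile attribute -> simple Dublin Core element.
CROSSWALK = {
    "DOI": "identifier",
    "creator": "creator",
    "publisher": "publisher",
    "title": "title",
    "language": "language",
    "resourceType": "type",
    "publicationDate": "date",
    "description": "description",
    "format": "format",
}


def to_dublin_core(record: dict[str, str]) -> ET.Element:
    """Project a profile record onto simple Dublin Core elements.

    Attributes not listed in CROSSWALK (e.g. size, edition) are not
    carried over in this sketch.
    """
    root = ET.Element("metadata")
    for attribute, element in CROSSWALK.items():
        if attribute in record:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = record[attribute]
    return root


record = {
    "DOI": "10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    "creator": "Monika Esch",
    "publisher": "WDCC, World Data Center for Climate",
    "resourceType": "Dataset",
    "size": "614190228 Bytes",  # no Dublin Core element in this crosswalk
}
print(ET.tostring(to_dublin_core(record), encoding="unicode"))
```

The element order follows the crosswalk table, so identical records always serialize identically, which keeps catalogue harvesting and comparison simple.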

Page 9

Data Publication: Problem and Solution

• Shortcomings in data provision and interdisciplinary use
– Rules of good scientific practice are not observed in all cases.
– Data sources are widely unknown.
– Data are archived without context.
– Data cannot be cited as independent entities.

• Method of solution: publication of primary data as independent entities
– Persistent identifier with a global resolving mechanism for data archive and context referencing (scientific data model at archive level)
– Integration into library catalogues so that data can be found together with articles
– STD-DOI application profile: metadata kernel + items for electronic publication (interface between scientific data archives and libraries)
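A persistent identifier only helps if it can be resolved globally. As a minimal sketch, assuming the public DOI proxy at `https://doi.org/` as the resolver (the slides themselves list a TIB handle server), a DOI can be split into prefix and suffix and turned into a resolvable URL. The function names `split_doi` and `resolver_url` are hypothetical.

```python
from urllib.parse import quote


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into prefix and suffix at the first '/'.

    A DOI has the form '10.<registrant code>/<suffix>'; an optional
    'doi:' label in front is ignored.
    """
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str, proxy: str = "https://doi.org/") -> str:
    """Build a URL for a web proxy of the resolution system (assumed: doi.org)."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path, keeping '/'.
    return f"{proxy}{prefix}/{quote(suffix, safe='/')}"


print(split_doi("doi:10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM"))
print(resolver_url("10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM"))
```

Because the resolver is a parameter, the same identifier can be sent to a different proxy without changing the identifier itself, which is the point of separating name from location.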

Page 10

Data Publication: Credits in Science

• "Citation Index": Scientific efficiency is "measured" by publications.

• Extra work for data publication is currently not acknowledged.
– Data processing, context documentation, quality assurance.

• Recommendation: Data publications should be included in the standard scientific "Citation Index".
– Motivation of the individual scientist.
– Connection between person and primary dataset.

• Citable data publications
– support the rules of good scientific practice.
– encourage interdisciplinary data utilisation.
– make data searchable in library catalogues together with articles.
– close the gap between scientific literature and related data sources.

Page 11

Data Publication: Metadata for primary data 1

Attribute: Example

1. DOI: 10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM

2. identifier: URN:TIB:10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM

3. creator: Monika Esch (Author)

4. publisher: WDCC, World Data Center for Climate

5. title: Climate Projection for the next Century calculated by the Global Climate Model ECHAM4-OPYC using the SRES B2 IPCC Scenario

6. language: en

7. StructuralType: Digital

8. mode: Abstract

9. resourceType: Dataset

Page 12

Data Publication: Metadata for primary data 2

Attribute: Example

10.-12. registration information: 10.1594 (RA) / 1 (issue no.) / 2004-07-18 (issue date)

13. creationDate: 2001-12-31

14. publicationDate: 2004-07-18

15. description: These data represent results from the ECHAM4/OPYC climate model running the SRES-B2 scenario. The database tables contain monthly mean time series of ……

16. publicationPlace: Hamburg

17. size: 614190228 bytes

18. format: GRIB

19. edition: 1

20. relatedDOIs: (none)
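The attributes listed above map naturally onto a record type from which a data citation can be generated. The sketch below covers only a subset of them; the class name `PrimaryDataRecord`, the snake_case field names, and the author-year citation layout are illustrative assumptions, not the STD-DOI specification. The record is frozen so that, like published primary data, it cannot be modified once created.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PrimaryDataRecord:
    """Hypothetical record covering a subset of the metadata attributes."""
    doi: str
    creator: str
    publisher: str
    title: str
    publication_date: date
    resource_type: str = "Dataset"
    language: str = "en"
    data_format: str = ""
    size_bytes: int = 0
    edition: int = 1
    related_dois: tuple[str, ...] = ()

    def citation(self) -> str:
        """Render an author-year citation; the layout is an assumption."""
        return (
            f"{self.creator} ({self.publication_date.year}): {self.title}. "
            f"{self.publisher}. doi:{self.doi}"
        )


record = PrimaryDataRecord(
    doi="10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    creator="Monika Esch",
    publisher="WDCC, World Data Center for Climate",
    title=(
        "Climate Projection for the next Century calculated by the Global "
        "Climate Model ECHAM4-OPYC using the SRES B2 IPCC Scenario"
    ),
    publication_date=date(2004, 7, 18),
    data_format="GRIB",
    size_bytes=614190228,
)
print(record.citation())
```

Attempting `record.edition = 2` raises an error; in this model a revised dataset would be a new record rather than an edit of the published one.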

Page 13

Data Publication: Criteria for Persistent Identifier Allocation

• The critical points are securing data quality and a stable connection between identifier and data entity

– Allocation is restricted to syntax control and completeness, i.e. expert data description and long-term archiving

– Scientific quality assurance is expected of the author and will be reviewed during the allocation process.

– Like published articles, published primary data cannot be changed.

– Both a stable connection between identifier reference and data entity and the long-term availability of the primary data are essential and must be ensured (e.g. by the ICSU WDCs)
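The allocation step described above checks syntax and completeness, not scientific content. A minimal sketch of such a check follows; the tuple of mandatory attributes and the simplified DOI pattern (`10.` plus a 4 to 9 digit registrant code, a slash, and a non-empty suffix) are assumptions for illustration, not the rules actually applied by the registration agency.

```python
import re

# Simplified DOI syntax (an assumption): '10.' + 4-9 digit registrant code,
# optional dotted sub-codes, '/', and a non-empty suffix without whitespace.
DOI_PATTERN = re.compile(r"10\.\d{4,9}(\.\d+)*/\S+")
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

# Hypothetical minimal set of mandatory attributes.
REQUIRED = ("doi", "creator", "publisher", "title", "publicationDate", "resourceType")


def allocation_problems(record: dict[str, str]) -> list[str]:
    """List reasons why an identifier cannot be allocated yet.

    Only syntax and completeness are checked; scientific quality
    assurance stays with the author.
    """
    problems = [
        f"missing attribute: {name}"
        for name in REQUIRED
        if not record.get(name, "").strip()
    ]
    doi = record.get("doi", "")
    if doi and not DOI_PATTERN.fullmatch(doi):
        problems.append(f"malformed DOI: {doi}")
    published = record.get("publicationDate", "")
    if published and not DATE_PATTERN.fullmatch(published):
        problems.append(f"publicationDate not in YYYY-MM-DD form: {published}")
    return problems


complete = {
    "doi": "10.1594/WDCC/IPCC_EH4_OPYC_SRES_B2_MM",
    "creator": "Monika Esch",
    "publisher": "WDCC, World Data Center for Climate",
    "title": "Climate Projection for the next Century ...",
    "publicationDate": "2004-07-18",
    "resourceType": "Dataset",
}
print(allocation_problems(complete))
print(allocation_problems(dict(complete, doi="1594/WDCC", creator="")))
```

Returning a list of problems instead of a yes/no answer lets the agency send the author everything that must be fixed in one round.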

Page 14

Data Publication:

[Diagram: DFG project "Publication and Citation of Scientific Primary Data". Components: International DOI Foundation; TIB Hannover (registration agency); Global Handle System; DDB URN node; TIB-ORDER library catalogue; data providers GFZ (geophysics), M&D/MPIM (climate models), and MARUM/AWI (observations), each with data storage and long-term archiving, two of them in a WDC]
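The architecture rests on the promise that an identifier keeps pointing at the same data entity even when archives move. The toy registry below models that rule in memory; it is hypothetical and does not use the Handle System or any DOI API. Recording a SHA-256 fingerprint at registration is an added assumption, used here to make "the same entity" checkable: the location may change, the registered content may not. The identifier and URLs in the example are invented.

```python
import hashlib


class ToyRegistry:
    """Hypothetical in-memory stand-in for a registration agency."""

    def __init__(self) -> None:
        # doi -> (current location, SHA-256 fingerprint of the registered content)
        self._entries: dict[str, tuple[str, str]] = {}

    def register(self, doi: str, url: str, content: bytes) -> None:
        """Bind a new identifier to a location and to the content found there."""
        if doi in self._entries:
            raise ValueError(f"{doi} is already registered")
        self._entries[doi] = (url, hashlib.sha256(content).hexdigest())

    def resolve(self, doi: str) -> str:
        """Return the current location of the data entity."""
        return self._entries[doi][0]

    def move(self, doi: str, new_url: str) -> None:
        """Point the identifier at a new archive location; the content stays bound."""
        _, fingerprint = self._entries[doi]
        self._entries[doi] = (new_url, fingerprint)

    def verify(self, doi: str, content: bytes) -> bool:
        """Check that retrieved content is the entity originally registered."""
        return hashlib.sha256(content).hexdigest() == self._entries[doi][1]


# Hypothetical identifier and locations.
registry = ToyRegistry()
registry.register("10.1594/WDCC/EXAMPLE", "https://archive-a.example.org/data", b"GRIB...")
registry.move("10.1594/WDCC/EXAMPLE", "https://archive-b.example.org/data")
print(registry.resolve("10.1594/WDCC/EXAMPLE"))
print(registry.verify("10.1594/WDCC/EXAMPLE", b"GRIB..."))
```

Since `register` refuses an existing identifier, one way to publish a corrected dataset in this model is a new identifier, linked through attributes such as edition and relatedDOIs.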

Page 15

Further information

• Project webpage: http://www.std-doi.de
• TIB Handle Server: http://doi.tib-hannover.de:8000
• DOI Foundation: http://www.doi.org
• URN registration of the DDB: http://www.persistent-identifier.de