An introduction to data publications
Kirsten Elger
Deutsches GeoForschungsZentrum GFZ, Potsdam, [email protected]
Research Data
• Research data are essential for scientific research
• Many datasets, e.g. observational data, are irreplaceable
• With the advent of the internet, the way research datasets are collected, managed, and archived has changed significantly
So extensive and dangerous a work
Eleven nations established 14 principal research stations across the Polar Regions. 12 were in the Arctic, along with at least 13 auxiliary stations. Over 700 men incurred the dangers of Arctic service to establish and relieve these stations between 1881 and 1884.
Observations on: meteorology, geomagnetism, auroral phenomena, ocean currents, tides, structure and motion of ice and atmospheric electricity
Geological field work in 1995…
GPS values
data ”publication“ in 1995…
…and after the end of the project?
• The bad case: the PhD student or postdoc takes the data with him or her (on a floppy disc or CD) and, years later, throws everything away
• Slightly better: the data are submitted (in digital or analogue form) to a departmental computer, with or without a data description (depending on the time and motivation of the respective scientist)
What happens when the professor or lab PI retires?
• Who takes care of the hard drives with the old data?
• Who takes care of paper copies of maps or other datasets?
• How long may rock samples be archived after the scientist has left?
Research Data Today
Thanks to the internet …
• many datasets are available online
• very fast data access, even to large datasets
• online access to journal articles
• online-only journals are coming of age
• real-time data
Real-time data
example: climate station in Alaska (air, surface, shallow ground temperatures)
Source: Permafrost Lab, UAF, Fairbanks, http://permafrost.gi.alaska.edu/
GEOFON earthquake information service
GEOFON Live Seismograms
NOAA (National Oceanic and Atmospheric Administration):
• Synoptic meteorological records of the first IPY in digital form (surface air temperature, sea level pressure, 1-year time series)
• extensive documentary image collection
• Overview of IPY reports
• Posters
• Available online for download:
www.arctic.noaa.gov/aro/ipy-1
… as a consequence
• With the advent of the digital era and the internet, data sets increasingly grow in size and complexity
• Data reuse and data mining are becoming more and more important
• Metadata portals (with automatically generated standardised metadata) are more and more important for data discovery
• There is an increasing number of data repositories for all types of research data
• There is an increasing expectation from the scientific community, funding agencies and the public that publicly funded research results and data be made freely and openly accessible without any constraints.
Politics…
Priority Initiative "Digital Information" of the Alliance of German Science Organisations: The availability and reuse of digital information includes access to research data that is as free of charge and as open as possible.
2003 Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities: "Open access contributions include original scientific research results, raw data and metadata, …"
Digital Agenda of the German Federal Government 2013-2017
• Open science, the unrestricted access to scientific publications and cultural heritage, is an ongoing and future trend in the scientific landscape worldwide. Research publications and other digital objects such as research data and scientific software will thus be publicly available on the internet.
• The Helmholtz Association was one of the initial signatories of the „Berlin Declaration on Open Access to Knowledge in the Sciences and Humanities“ in 2003. This commitment towards open access was then formally approved by its Assembly of Members (assembly of the directors of the Helmholtz Centres): „Publications from the Helmholtz Association shall in future, without exception, be available free of charge, as far as no conflicting agreement with publishers or others exists.“ (Resolution of the Assembly of Members, 27 September 2004).
Helmholtz Open Science
Obstacles to sharing
• too much work with no benefit
• data publications were deleted from reference lists by journal editors
• „they mis-interpret or mis-use my data“
• „someone will publish MY data before me“
• Do I have to share ALL my data? © www.aukeherrema.nl
PRIVATE DOMAIN
SHARED DOMAIN
PERMANENT DOMAIN
PUBLIC DOMAIN
Domains of research data
Think about data sharing from the very beginning!
How to make intelligent openness standard?
• Data must be accessible and readily located
• Data must be intelligible for those who wish to scrutinize them
• They must be assessable so that judgments can be made about their reliability and the competence of those who created them
• They must be usable by others
• For data to meet these requirements it must be supported by explanatory metadata (data about data)
Science as an open enterprise (2012) The Royal Society Science Policy Centre report 02/12 ISBN: 978-0-85403-962-3
The practice of science: Open inquiry is at the heart of the scientific enterprise. Publication of scientific theories - and of the experimental and observational data on which they are based - permits others to identify errors, to support, reject or refine theories and to reuse data for further understanding and knowledge. Science’s powerful capacity for self-correction comes from this openness to scrutiny and challenge.
Intelligent Openness (Royal Soc. London 2012)
There is a need for …
• Researchers‘ willingness to publish their data
• Technical solutions to facilitate data availability, access and reuse
• Recognition and credits for data producers
Data publication with DOI
• persistent
• citable
• with metadata
DataCite and Digital Object Identifiers (DOI) for Data
STD DOI "Publikation und Zitierbarkeit von Primärdaten" (Publication and Citability of Primary Data; DFG project 2004-2009, partners: TIB, DKRZ, PANGAEA, DLR, GFZ); DOI for research data; DataCite
What is a DOI?
• Digital Object Identifier
• A unique and permanent identifier for digital objects
• “Signpost” to the URL with the dataset and its description = landing page
• Persistent = long term data access guaranteed by the publisher
• With metadata
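The "signpost" idea can be sketched in a few lines of Python. This is a minimal illustration, not an official client: it assumes the public resolver at https://doi.org/, the DOI `10.1234/example.dataset` is a made-up placeholder, and the function names and the simplified validation pattern are hypothetical.

```python
import re
from urllib.parse import quote

# Simplified shape check (an assumption, not the full DOI syntax):
# "10.", a registrant code of 4-9 digits, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Common ways a DOI is written in reference lists.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Strip a leading resolver URL or 'doi:' label and check the shape."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {raw!r}")
    return doi


def doi_to_url(raw: str) -> str:
    """Build the resolver URL that redirects to the dataset's landing page."""
    return "https://doi.org/" + quote(normalise_doi(raw), safe="/")


# Hypothetical DOI, used only to show the mapping.
print(doi_to_url("doi:10.1234/example.dataset"))
```

The identifier itself never changes; only the URL registered behind it is updated when a landing page moves, which is what makes the citation persistent.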
Metadata and Metadata
Metadata for data discovery: example DOI landing page
[Figure: annotated landing page showing title, citation, description/abstract, keywords, spatial coverage, related work, download of data files, standardised metadata]
Metadata and Metadata
Metadata for data discovery: author, title, description, keywords, spatial/temporal domain, ...
Structural metadata (for reuse): formats, methodology, sources…
Definition of data labels
Metadata and Metadata
Metadata for data discovery: author, title, description, keywords, spatial/temporal domain, ...
Structural metadata (for reuse): formats, methodology, sources, processing steps, …
Administrative metadata: metadata related to the use, management, and encoding processes of digital objects over a period of time. Includes technical metadata: versions, checksum, timestamp, …
A comprehensive data description is essential for data reuse and should always be available before a DOI registration
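The three metadata groups can be illustrated with a small Python sketch. The class and field names below are hypothetical and do not follow any particular metadata schema; the sketch only shows how discovery, structural and administrative metadata differ, and how a completeness check could run before a DOI is registered.

```python
import hashlib
from dataclasses import dataclass, field, fields
from datetime import datetime, timezone


@dataclass
class DiscoveryMetadata:
    """Helps others find the dataset."""
    authors: list[str] = field(default_factory=list)
    title: str = ""
    description: str = ""
    keywords: list[str] = field(default_factory=list)
    spatial_coverage: str = ""
    temporal_coverage: str = ""


@dataclass
class StructuralMetadata:
    """Helps others reuse the dataset."""
    file_format: str = ""
    methodology: str = ""
    sources: list[str] = field(default_factory=list)
    processing_steps: list[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Technical bookkeeping for long-term management."""
    version: str
    checksum_sha256: str
    timestamp: str


def describe_file(content: bytes, version: str) -> AdministrativeMetadata:
    """Derive technical metadata (checksum, UTC timestamp) from file content."""
    return AdministrativeMetadata(
        version=version,
        checksum_sha256=hashlib.sha256(content).hexdigest(),
        timestamp=datetime.now(timezone.utc).isoformat(),
    )


def missing_fields(record) -> list[str]:
    """List the fields of a metadata record that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


# Made-up record: only two of six discovery fields are filled in.
draft = DiscoveryMetadata(authors=["A. Author"], title="Example dataset")
print(missing_fields(draft))
```

A repository could refuse DOI registration while `missing_fields` returns a non-empty list, which is one way to enforce that the description exists first.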
There are different possibilities for data publication
Examples of data publication 1: data supplements to scientific articles
Links to datasets
Link to original article with data description
Examples of data publication 2: Data Journals
Peer-reviewed articles with the description of datasets or collections, etc.
3. Data Reports – GFZ examples
Institutional Report Series have long traditions as important sources of information. Today: persistently online accessible and citable with DOI…
GFZ: Data Reports
• Flexible format – “enhanced data description“
• standardised templates for each discipline, internal review
• Project-specific design if required
Coalition on Publishing Data in the Earth and Space Sciences
GOAL: OPEN DATA in the EARTH and SPACE SCIENCES
STATEMENT OF COMMITMENT
• To promote metadata information and domain standards, […], to help simplify and standardize deposition and reuse.
• To promote referencing of data sets using the Joint Declaration of Data Citation Principles, in which citations of data sets should be included within reference lists.
• To include in research papers concise statements indicating where data reside and clarifying availability.
• To promote and implement links to data sets in publications and corresponding links to journals in data facilities via persistent identifiers. (January 2015)
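What a data citation in a reference list could look like can be sketched as follows. This is a hedged illustration: the element order (creators, year, title, publisher, DOI link) is a common pattern for dataset citations, the function name is hypothetical, and all values in the example are made up.

```python
def format_data_citation(creators, year, title, publisher, doi):
    """Assemble a reference-list entry for a dataset from its core metadata."""
    names = "; ".join(creators)
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Entirely fictitious dataset, for illustration only.
print(
    format_data_citation(
        creators=["Doe, J.", "Roe, R."],
        year=2015,
        title="Example dataset",
        publisher="Example Repository",
        doi="10.1234/example",
    )
)
```

Because the citation ends in a resolvable DOI link, a journal can link from the article to the dataset and a repository can link back, as the last commitment describes.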
SIGNATURES (Nov 2015)
additional signatures welcome
Conclusions
• Data are increasingly recognized as part of the scholarly record; data citation is coming of age.
• Data publications with assigned DOI provide citable and persistent access to research data.
• There is a growing number of data repositories to store and access data (institutional, domain specific, general).
• Data description is essential for reuse
Next step
International Geo Sample Number (IGSN) – unique identifier for physical objects