Data Management at the European Southern Observatory:

Strategies and Challenges

Michael Sterzik, ESO Data Management and Operations Division

European Southern Observatory

- builds and operates state-of-the-art ground-based astronomical facilities
- most productive observatory world-wide
- inter-governmental organisation supported by 16 member states
- involvement of ~1/2 of the astronomical community world-wide

Expert Meeting on Research Data Management and Sharing, Brussels, February 23rd, 2015

Data Challenges in Astronomy

• diversity of data collections
  - multi-mission, multi-messenger
  - multi-wavelength
  - time-domain

• astronomy becomes more and more data-intensive
  - increasing volume and/or increasing complexity

• … depends on infrastructure
  - to store, access, preserve and share
  - to process, analyse, synthesise

• … requires a (widely accepted) ecosystem
  - data standards, representations (“FITS”)
  - metadata definitions
  - interoperability protocols (“VO”)
  - associated software tools
  - information services
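To make the point about data standards and metadata concrete: a FITS header is a sequence of 80-character "cards", with the keyword in columns 1-8 and, for valued keywords, "= " in columns 9-10. The following is only a minimal sketch of reading such cards in plain Python. The function names and the example values are illustrative assumptions, and conventions such as CONTINUE or HIERARCH are deliberately not handled; real work would use an established FITS library.

```python
def _convert(text):
    """Turn the text of a non-string FITS value into a Python object."""
    if text == "T":
        return True
    if text == "F":
        return False
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text or None


def parse_card(card):
    """Parse one 80-character header card into (keyword, value, comment)."""
    card = card.ljust(80)[:80]
    keyword = card[:8].strip()
    if card[8:10] != "= ":
        # Commentary cards (COMMENT, HISTORY) and END carry no value.
        return keyword, None, card[8:].strip()
    body = card[10:].lstrip()
    if body.startswith("'"):
        # Quoted string; a doubled quote '' stands for a literal quote.
        chars, i = [], 1
        while i < len(body):
            if body[i] == "'":
                if body[i + 1:i + 2] == "'":
                    chars.append("'")
                    i += 2
                    continue
                break
            chars.append(body[i])
            i += 1
        value = "".join(chars).rstrip()
        comment = body[i + 1:].partition("/")[2].strip()
    else:
        raw, _, tail = body.partition("/")
        value = _convert(raw.strip())
        comment = tail.strip()
    return keyword, value, comment


def parse_header(block):
    """Collect valued keywords from a header block until the END card."""
    header = {}
    for start in range(0, len(block), 80):
        keyword, value, _ = parse_card(block[start:start + 80])
        if keyword == "END":
            break
        if value is not None:
            header[keyword] = value
    return header


# Illustrative cards only; the values are made up for this sketch.
telescope = "'ESO-VLT-U1'"
cards = [
    f"{'SIMPLE':<8}= {'T':>20} / conforms to the FITS standard",
    f"{'NAXIS':<8}= {'2':>20} / number of data axes",
    f"{'TELESCOP':<8}= {telescope:<20} / illustrative value",
    f"{'EXPTIME':<8}= {'30.5':>20} / illustrative value, seconds",
    "END",
]
example_block = "".join(card.ljust(80) for card in cards)
print(parse_header(example_block))
```

Because the metadata sits in a fixed, self-describing text layout, archives and tools from different observatories can read each other's files without prior agreement beyond the standard itself.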

Data Policy of ESO http://www.eso.org/sci/observing/policies/Cou996-rev.pdf

• Council document on VLT/VLTI Science Operations Policy (2004)
  - access to the facility (general observing time, Guaranteed Time)
  - data rights: proprietary period (in general one year)
  - data access: public through an electronic archive
  - data products: level 0 (quality control) and level 1 (science grade) by ESO
  - data analysis: higher level by community

• … necessary for defining a data management plan
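The proprietary-period rule lends itself to a small worked example. The sketch below assumes the period is counted in whole calendar years from some start date; the slide does not say which date starts the clock, so `start` and both function names are hypothetical.

```python
from datetime import date


def public_release_date(start, proprietary_years=1):
    """Date on which data become public, counting whole calendar years.

    `start` is an assumption of this sketch: the slide only states that
    the proprietary period is in general one year.
    """
    target_year = start.year + proprietary_years
    try:
        return start.replace(year=target_year)
    except ValueError:
        # 29 February with a non-leap target year: fall back to 28 February.
        return start.replace(year=target_year, day=28)


def is_public(start, today, proprietary_years=1):
    """True once the proprietary period has elapsed."""
    return today >= public_release_date(start, proprietary_years)


print(public_release_date(date(2014, 2, 23)))
print(is_public(date(2014, 2, 23), date(2015, 2, 22)))
```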

Research cycle Astronomy

mapping research into a work/data flow

[Figure: data-flow diagram. Box labels: Phase I, Program Preparation, Long Term Schedule, Phase II, Observing Block Preparation, Short Term Schedule, OB Execution, Quality Control, Archive I/O, Data Enhancement, Phase III. Flow labels: Metadata, raw data, data products.]

“Business model” of Astronomy

• periodic (every 6 months) review of observing proposals: oversubscription rate 3-5
• best ideas/proposals win observing time after peer review
• Principal Investigators enjoy proprietary data rights for one year
• afterwards: data go public, pressure to publish increases

Volume versus Value

• raw data are useless for direct scientific analysis
• data sets must be complete, calibratable, and pass quality control
• data processing (“reduction”) to create data products
  - quality control, master calibrations (Level 0)
  - transformation from instrumental to physical units (Level 1)
  - combination of observations: deep and wide maps or spectra (Level 2)
  - catalogs, high-level data products after scientific analysis (Level 3)

[Figure: arrows labelled “Volume” and “Effort, Value” alongside the product levels]
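The four product levels can be written down as a small data model. This is a sketch only: the enum and the `producer` helper are hypothetical names, and the split between ESO and the community simply restates the data policy slide (levels 0 and 1 by ESO, higher levels by the community).

```python
from enum import IntEnum


class ProductLevel(IntEnum):
    """Data product levels as listed on the slide."""
    L0 = 0  # quality control, master calibrations
    L1 = 1  # transformation from instrumental to physical units
    L2 = 2  # combination of observations: deep and wide maps or spectra
    L3 = 3  # catalogs, high-level products after scientific analysis


def producer(level):
    """Who generates a given level, following the data policy slide."""
    return "ESO" if level <= ProductLevel.L1 else "community"


for level in ProductLevel:
    print(level.name, producer(level))
```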

Strategies for the Generation of Content: Science Data Products @ ESO

• In-house generation of Data Products (IDPs):
  - quality ensured through a standardised process of data acquisition for science and calibration data
  - near-real-time quality control process ensures certified master calibrations (L0)
  - unattended processing through certified pipelines
  - goal: science-grade data for all popular instruments (L1)

• External Data Products (EDPs):
  - provided by public surveys and large programs (deliverables)
  - programs specifically selected for high legacy value
  - most use dedicated (non-ESO) user pipelines (e.g. CASU)
  - goal: advanced products (L2, L3)
  - perspective: user community at large contributes EDPs
    • quality assurance: published datasets only?
    • acknowledgement: specific DOI?

Phase 3 process for EDPs

1. Data preparation
2. User's data validation
3. Data release definition
4. Data transfer to ESO
5. Automatic release validation
6. Content validation
7. Archival storage
8. Data publication

[Diagram labels: P.I. / data provider; “closing” the data release]

ESO defines the required data format, provides dedicated tools, user documentation and direct support for Phase 3 data providers.

The data provider, i.e. the survey P.I., is responsible for the quality of the reduced data products and the associated data release documentation.
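The eight numbered Phase 3 steps form a strictly ordered sequence, which can be sketched as a tiny state tracker. The class and method names are hypothetical, and the sketch does not attempt to say which party performs which step, since the diagram does not survive in enough detail to tell.

```python
PHASE3_STEPS = (
    "Data preparation",
    "User's data validation",
    "Data release definition",
    "Data transfer to ESO",
    "Automatic release validation",
    "Content validation",
    "Archival storage",
    "Data publication",
)


class DataRelease:
    """Tracks how far a data release has progressed through Phase 3."""

    def __init__(self, name):
        self.name = name
        self.completed = 0  # number of steps finished so far

    @property
    def next_step(self):
        """Name of the next pending step, or None when all are done."""
        if self.completed < len(PHASE3_STEPS):
            return PHASE3_STEPS[self.completed]
        return None

    def complete(self, step):
        """Mark `step` as done; steps must be completed in order."""
        if step != self.next_step:
            raise ValueError(f"expected {self.next_step!r}, got {step!r}")
        self.completed += 1

    @property
    def published(self):
        return self.completed == len(PHASE3_STEPS)


release = DataRelease("example survey release")  # illustrative name
for step in PHASE3_STEPS:
    release.complete(step)
print(release.published)
```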

Data Collections: Phase 3 Releases http://www.eso.org/sci/observing/phase3/data_releases.html

ESO Data Portal: acknowledgements

… DOIs?

Linking Publications and Data

• Scientific return is a KPI for a research infrastructure
  - data is the primary scientific return
  - bibliometrics widely used as a proxy for
    • quantity (number)
    • quality (citations)
    • merit (quality/cost)

• managerial tool
  - assess impact of science policies, implementation, operations, …

• Community access to publications and data
  - enables active archive research
  - allows authorship to be recognised (in principle)

• the impact/value of data archive research is not always appreciated (not many data metrics)
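The three bibliometric proxies named on this slide reduce to simple arithmetic. The sketch below uses a hypothetical record type and made-up numbers; taking total citations as "quality" and dividing by cost for "merit" follows the slide's wording, not any official ESO metric.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FacilityRecord:
    """Hypothetical per-facility record for illustrating the proxies."""
    name: str
    citations: List[int]  # citation count of each refereed paper
    cost: float           # operations cost, arbitrary units

    @property
    def quantity(self):
        """Number of papers."""
        return len(self.citations)

    @property
    def quality(self):
        """Total citations across all papers."""
        return sum(self.citations)

    @property
    def merit(self):
        """Quality per unit cost."""
        return self.quality / self.cost


example = FacilityRecord("facility X", [10, 0, 5], cost=3.0)  # made-up data
print(example.quantity, example.quality, example.merit)
```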

Comparative (Inter-facility) Bibliometrics

U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf

ESO and other Observatories

In order to put ESO’s research output into context, we give an overview of the total numbers of publications of major observatories for the publication years 1996 to 2014 (if already available). Note that some facilities date back further than that; their early years are not included in this graph. The most simplistic way of comparing facilities is to look at the numbers of publications. Obviously, this favors large institutions with many facilities over smaller ones. A more meaningful investigation should normalize the numbers in some way, for instance by number of observing hours, by actual share of data used in the papers (as many scientific articles use data from more than one observatory), or by budget (telescope construction costs and maintenance).

When comparing publication statistics among different observatories, it is essential to assess the selection criteria applied by each observatory. To the best of our knowledge, the observatories shown in this graph include only papers that actually use observational data from their facilities (as opposed to merely referencing them). All papers were published in refereed journals.

The statistics shown in Fig. 3 and Table 3 were obtained as follows:

- ESO total, VLT/VLTI, La Silla, ESO survey telescope, APEX (ESO time), ALMA (Europe time): ESO Telescope Bibliography (http://telbib.eso.org)
- Chandra: Chandra Bibliographic Statistics (http://cxc.harvard.edu/cda/bibstats/bibstats.html ‘Refereed Chandra Science Papers’ and http://cxc.harvard.edu/cda/bibstats/plots/Current/Papers_by-year.txt)
- Gemini: ADS (Filters / Select References In, http://esoads.eso.org/abstract_service.html#jousel)
- HST: HST Publication Statistics (http://archive.stsci.edu/hst/bibliography/pubstat.html)
- ING: Isaac Newton Group of Telescopes paper counts (http://catserver.ing.iac.es/service/biblio/tablecount.php) for WHT, INT, and JKT
- Keck: Keck Science Bibliography (http://www2.keck.hawaii.edu/library/keck_papers.html)
- NRAO: NRAO Publication Statistics (http://www.nrao.edu/library/pubstats.shtml)
- Spitzer: Spitzer Bibliographical Database (http://sohelp2.ipac.caltech.edu/bibsearch/)
- Subaru: ADS (Filters / Select References In, http://esoads.eso.org/abstract_service.html#jousel)
- Swift: statistics provided by Sandra Savaglio, Max-Planck-Institute for Extraterrestrial Physics, Garching, Germany
- XMM: XMM-Newton in the Journals (http://heasarc.gsfc.nasa.gov/docs/xmm/xmmbib.html). Number of publications per year provided by Norbert Schartel, ESA, Madrid, Spain

ESO Library, Karl-Schwarzschild-Strasse 2, 85748 Garching near Munich, Germany / http://telbib.eso.org

[Figure: Publications of major observatories by year. x-axis: publication year (1996 to 2014); y-axis: number of publications (0 to 900). Series: ESO total, HST, Spitzer, VLT/VLTI, NRAO, Chandra, XMM, Swift, La Silla, Keck, Gemini, ING (may contain duplicates), Subaru, ESO survey tel., APEX (ESO), ALMA (Eur.)]

Fig. 3: Refereed publications by ESO and other observatories (as of Feb. 2015). Thick lines: ESO facilities. Thin lines: other ground-based facilities. Dashed lines: space-based facilities.

Please note that selection criteria for inclusion or exclusion of papers vary among observatories.

• useful to estimate the global performance of a facility (observatory)
• but biased, depends on how librarians count the “contributions”
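How the "contributions" are counted changes the picture considerably. The sketch below contrasts whole counting (every facility used in a paper gets full credit) with fractional counting, and then normalises by observing hours as the quoted text suggests. Splitting a paper equally among its facilities is an assumption of this sketch; the "actual share of data used" would need per-paper weights that are not available here. All names and numbers are hypothetical.

```python
from collections import defaultdict


def whole_counts(papers):
    """Each facility used in a paper gets a full count of 1."""
    counts = defaultdict(float)
    for facilities in papers:
        for facility in set(facilities):
            counts[facility] += 1.0
    return dict(counts)


def fractional_counts(papers):
    """Each paper is split equally among the facilities it uses (assumption)."""
    counts = defaultdict(float)
    for facilities in papers:
        unique = set(facilities)
        for facility in unique:
            counts[facility] += 1.0 / len(unique)
    return dict(counts)


def per_observing_hour(counts, hours):
    """Normalise counts by observing hours; facilities without hours are skipped."""
    return {f: counts[f] / hours[f] for f in counts if hours.get(f)}


# Made-up example: three papers using facilities A, B and C.
papers = [{"A"}, {"A", "B"}, {"A", "B", "C"}]
print(whole_counts(papers))
print(fractional_counts(papers))
print(per_observing_hour(whole_counts(papers), {"A": 3000.0, "B": 1000.0}))
```

With whole counting the totals add up to more than the number of papers, while fractional counts always sum to the number of papers, which is one reason rankings can differ between the two schemes.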

Intra-facility bibliometrics

• useful to assess the relative performance of different components

Intra-facility bibliometrics

• useful to assess the science policy and its implementation

[Figure: UT Productivity. x-axis: years between program scheduled and publication (0 to 10); y-axis: number of publications per program (0.0 to 2.0). Curves: All (Average), Large (Large Programs), GTO (Guaranteed Time), DDT (Discretionary Time), TOO (Target of Opportunity)]

Links to data and publications telbib.eso.org

All original data archive.eso.org

Content aggregation and exploration cdsportal.u-strasbg.fr

Archival data enabled publication fraction

U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf

[Figure annotations: start of facility operations; start of archive population with DP; archive services; interoperability]

Archival data enabled publication fraction

U. Grothkopf et al., http://www.eso.org/sci/libraries/edocs/ESO/ESOstats.pdf

[Figure panel: HST. Annotations: start of facility operations; start of archive population with DP; archive services; interoperability]

… and costs? as a fraction of total RI operations costs

• data archive operations
  - archive infrastructure TCO (1 PB, 3 safe copies): 0.3-1%
  - content management (production, curation, assurance): ~10%

• data generation
  - facility time for calibrations: ~4%

• lacking resources for data archive developments …
  - apply existing frameworks and standards as much as possible!
  - do not re-invent archive infrastructure!
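As a rough worked example, the quoted shares can be added up. This assumes the three items are non-overlapping fractions of the same total and treats the approximate values (~10%, ~4%) as point estimates; both are assumptions of the sketch, not statements from the slide.

```python
# Shares of total RI operations costs in percent, as (low, high) ranges.
COST_SHARES_PERCENT = {
    "archive infrastructure TCO": (0.3, 1.0),
    "content management": (10.0, 10.0),             # "~10%" taken as a point value
    "facility time for calibrations": (4.0, 4.0),   # "~4%" taken as a point value
}


def total_share(shares):
    """Sum the low and high ends of all ranges (assumes no overlap)."""
    low = sum(lo for lo, _ in shares.values())
    high = sum(hi for _, hi in shares.values())
    return low, high


low, high = total_share(COST_SHARES_PERCENT)
print(f"data-related share: about {low:.1f}% to {high:.1f}% of operations costs")
```

Under these assumptions the data-related effort comes to roughly 14 to 15% of operations costs, dominated by content management rather than by storage infrastructure.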

Summary

• astronomy as data-intensive science
  - diversity of datasets with respect to quality, quantity, and origin
  - international data standards largely applied (IAU)

• mapping the research process into a data-flow process
  - allows a systemic approach to reach consistent data quality
  - adds value through the generation of data products
  - adds value through the sharing of data products

• astronomical data archives
  - are an essential science resource
  - benefit (= science return + …) > costs

• challenges ahead
  - recognise content, services, interoperability as priority
  - incentives for data providers