Disciplinary RDM

Presentation given at the Jisc "Directions for RDM in UK unis" event on 6-7th November 2014 at the Moller Centre in Cambridge

Page 1: Disciplinary RDM

Research Data Management from a disciplinary perspective

Sarah Jones, Digital Curation Centre

[email protected] Twitter: @sjDCC

Stéphane Goldstein, Research Information Network

[email protected] Twitter: @stephgold7

Page 2: Disciplinary RDM

Disclaimer

Practice varies greatly by discipline and sub-discipline so it’s hard to generalise

Apologies for any sweeping statements and groupings that don’t fit your model

Image credit: Sweep by Judy Van der Velden CC-BY-NC-ND www.flickr.com/photos/judy-van-der-velden/6757403261

Page 3: Disciplinary RDM

Case studies on disciplinary practice

RIN Information Seeking and Sharing Behaviour www.rin.ac.uk/our-work/using-and-accessing-information-resources

– Life sciences
– Humanities
– Physical sciences

RIN Open Science Case studies www.rin.ac.uk/our-work/data-management-and-curation/open-science-case-studies

SCARP case studies www.dcc.ac.uk/resources/case-studies/scarp

Knowledge Exchange Incentives and motivations for sharing research data (forthcoming)

RLUK research data typology (more from Stéphane)

Page 4: Disciplinary RDM

Groups and disciplines

Arts & Humanities
– Creative arts, languages, philosophy, archaeology…

Social Science
– Economics, history, politics, business, psychology...

Sciences & Engineering
– Physics, astronomy, earth sciences, computing…

Life Sciences
– Biology, ecology, medical and veterinary science…

Page 5: Disciplinary RDM

Arts & Humanities

Outputs may not be termed ‘data’ e.g. sketches, writing, performance, artefacts, ‘work’

Focus on literary outputs & manuscripts in some disciplines

More use of standard tools e.g. Word, Excel – less likely to adapt technologies to fit

Arguably lower awareness and uptake of RDM overall

Page 6: Disciplinary RDM

Creative Arts

Several RDM projects in the creative arts e.g. Kultivate, KAPTUR, VADS4R, CAiRO training...

Resistance to term ‘data’ – too scientific

Importance of personal websites for profile as work is also conducted outside of academia

Visual Arts Data Service - www.vads.ac.uk

Institutional repositories at arts schools accept a broader range of outputs and display content more visually to fill the void e.g. http://research.gold.ac.uk

Page 7: Disciplinary RDM

Sonic Arts Research Unit

Collaboration with IR as a result of losing data

Tension between providing access in a visual / usable way and preserving data

Still use soundcloud and personal websites for access, but these link to ‘master’ copy of data held in IR for preservation

www.dcc.ac.uk/resources/developing-rdm-services/repository-radar

Page 8: Disciplinary RDM

Digital Humanities

Intentional creation of resources rather than just data as by-product of research process

More use of standards e.g. XML & TEI in language resources, image standards and capture quality for digitisation, Dublin Core metadata…

Often include technical experts in project team

Links with cultural heritage collections

Negotiating copyright often a major issue

Sustainability a big challenge

Page 9: Disciplinary RDM

Mapping Edinburgh’s Social History

Overlays historical maps with all kinds of open data to chart how the town has changed through time

Uses open source tools

Allows you to overlay maps

Picks up on common themes

www.mesh.ed.ac.uk

Page 10: Disciplinary RDM

Social Sciences

Greater awareness and acceptance of RDM by community

Methodology is as much a factor in determining difference as discipline

Nature of data often poses challenges for sharing

Lots of reuse of large survey data

Established metadata standards e.g. Data Documentation Initiative (DDI)

Strong international data centre infrastructure

Page 11: Disciplinary RDM

Public health

Ethics predominant concern
– How to negotiate consent
– How to store, transfer & handle data securely
– How to anonymise and share data

Data integration / linking and curation of longitudinal studies are major concerns, as data is added to over decades

Need for data havens to help control access to data – role for unis e.g. Grampian Data Safe Haven

UK Data Service - http://ukdataservice.ac.uk

Page 12: Disciplinary RDM

Twenty-07: Public health study

Longitudinal study following 4510 people from West of Scotland over 20 years to investigate the reasons for differences in health

Undertook interviews, questionnaires, physical measurements, blood samples etc

Strict access controls and guidelines for data collection

Data managed within the MRC Social and Public Health Sciences Unit and accessible under a data sharing agreement - http://2007study.sphsu.mrc.ac.uk/Revised-Data-Sharing-Policy-has-been-launched.html

Page 13: Disciplinary RDM

Life Sciences

Funders arguably more demanding in terms of data sharing policy

Sharing can be problematic / resisted given the nature of the data, fear of misuse or loss of control over IPR

Data sharing agreements and access committees more common

Data integration & mining key drivers

Research is well-resourced so greater capacity to fund local solutions and tools for RDM during projects

Page 14: Disciplinary RDM

Genetics

Vast quantities of data and rapid growth
– DNA sequence data is doubling every 6-8 months

Well established public databases for gene sequences e.g. GenBank www.ncbi.nlm.nih.gov/genbank
– However even this is on short-term project funding!

Need accession number to publish so driver for sharing and established workflow

European Data Infrastructure projects too e.g. ELIXIR
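
As a rough worked figure for the growth rate quoted on this slide: assuming (and this is only an assumption, since real growth rates drift) that the doubling time $T$ in months stays constant, the annual growth factor $g$ and the five-year growth follow directly. The symbols $T$ and $g$ are introduced here for illustration and are not from the slides.

```latex
\[
g = 2^{12/T}, \qquad
T = 6 \;\Rightarrow\; g = 2^{2} = 4, \qquad
T = 8 \;\Rightarrow\; g = 2^{1.5} \approx 2.8
\]
\[
\text{Over five years: } 2^{60/6} = 2^{10} = 1024
\quad\text{versus}\quad 2^{60/8} = 2^{7.5} \approx 181
\]
```

So a doubling time of 6-8 months means holdings grow roughly three- to four-fold each year.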

Page 15: Disciplinary RDM

Neuroscience

Large data volumes due to use of medical imaging

Moving towards larger cohort studies integrating wider range of data types, which strains the balance with ethical requirements around personal data

Costs of data gathering and advances in analysis technology are making the field more data intensive, with greater use of computational methods

Small interdisciplinary teams provide the human infrastructure for RDM, but historically low funder investment in data management at lab level

Disciplinary archives are immature, which has encouraged a tendency for labs to treat longitudinal datasets as intellectual capital

Page 16: Disciplinary RDM

OMERO – Open Microscopy Environment

Monash e-Research Centre helps groups to adopt (and if needed adapt) existing technological solutions

Partnered with a research group to implement OMERO, a secure central repository to help researchers organise, analyse and share images

Resulting tool more sustainable as tailored to specific community need

www.dcc.ac.uk/resources/developing-rdm-services/improving-rdm-monash

Page 17: Disciplinary RDM

Science & Engineering

Large scale can mean RDM is built in as standard and sharing part of workflow e.g. facilities science

Often early adopters and advocates of new technologies e.g. the Grid, wikis & arXiv in particle physics

Archiving established in some cases as data can’t be recreated e.g. NERC data centres for Earth Sciences

Commercial sensitivities can place restrictions on sharing in some fields

Industry partners

Page 18: Disciplinary RDM

Mechanical Engineering

Several RDM projects at Bath e.g. ERIM, REDm-MED

Concept of repository well established in industrial engineering – Product Lifecycle Management (PLM) systems

Preservation issues as data is challenging e.g. CAD files

Less information sharing than other disciplines
– Commercial sensitivities preclude sharing
– Consultancy-style research can lead to internal-only results
– Data generated from private systems, so less applicable to others

Page 19: Disciplinary RDM

Crystallography

X-ray examinations, images and videos of crystal structures, chemical crystallography diffraction images

Established metadata standards e.g. Crystallographic Information Framework (CIF)

Advocates of open science and use of related tools
– UsefulChem - http://usefulchem.wikispaces.com
– LabTrove - www.labtrove.org

eCrystals Archive and Crystallography Open Database (COD)

National Crystallography Service - www.ncs.ac.uk

Page 20: Disciplinary RDM

Astronomy

Established data standards (e.g. FITS and NOA) maintained by community

Access to facilities requires the deposit of raw data, although this can be embargoed

International data centres e.g. Sloan Digital Sky Survey - www.sdss.org

Large volumes of data so transfer can be difficult

Few IPR issues compared to other disciplines

Data products are not always shared

Page 21: Disciplinary RDM

Galaxy Zoo

Citizen Science project started to classify a million galaxies imaged by the Sloan Digital Sky Survey

Over 50 million classifications in the first year, contributed by more than 150,000 people

Classifications were as good as those from professional astronomers

Further projects in astronomy, climatology, biology, humanities… www.galaxyzoo.org

Page 22: Disciplinary RDM

Research data typology

Commissioned by RLUK

Aim: to help librarians improve their ability to engage with researchers on RDM matters; and to enable them to acquire a better understanding of the needs of researchers

A resource structured around a suggested typology of research data, looking at different ways in which data might be categorised

Page 23: Disciplinary RDM

Broad data types

1. How do researchers generate and process data, and for what purpose?

1.1 Method of creation and collection of research data: where the data comes from

1.2 Readiness of research data: extent to which data has been processed

1.3 Use of research data: researchers' main purpose for accessing and using data

2. In what file formats, media and volumes do researchers generate data?

2.1 Medium and format for research data: objects in which data is captured and recorded, electronic storage and file types

2.2 Electronic data volumes: size of files (this is subjective, and based largely on the perception of researchers)

3. How do researchers manage and store their data?

3.1 Storage of research data: where and how data is kept

3.2 Types of metadata: not an exhaustive list, but these are widely-recognised metadata standards

3.3 Metadata standards

3.4 Degree of openness: founded on Royal Society's categorisation of 'intelligent openness'

3.5 Licensing of research data: legal rights appertaining to the use of the data
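
One way to picture the typology as a "scaffold" is as a set of named facets against which a dataset can be described. The sketch below is a minimal illustration only, not the RLUK resource itself: the `FACETS` keys reuse the headings listed above, while the `DatasetDescription` class, its methods, and the example values (loosely paraphrased from the Twenty-07 slide) are hypothetical.

```python
from dataclasses import dataclass, field

# Facet headings copied from the typology outline above.
FACETS = {
    "1.1": "Method of creation and collection of research data",
    "1.2": "Readiness of research data",
    "1.3": "Use of research data",
    "2.1": "Medium and format for research data",
    "2.2": "Electronic data volumes",
    "3.1": "Storage of research data",
    "3.2": "Types of metadata",
    "3.3": "Metadata standards",
    "3.4": "Degree of openness",
    "3.5": "Licensing of research data",
}


@dataclass
class DatasetDescription:
    """Hypothetical record describing one dataset against the typology."""
    name: str
    values: dict = field(default_factory=dict)

    def tag(self, facet: str, value: str) -> None:
        # Reject facets that are not part of the typology outline.
        if facet not in FACETS:
            raise KeyError(f"unknown typology facet: {facet}")
        self.values[facet] = value

    def missing(self) -> list:
        # Facets not yet described, in typology order.
        return [f for f in FACETS if f not in self.values]


# Illustrative values only, paraphrased from the Twenty-07 slide.
example = DatasetDescription("Twenty-07")
example.tag("1.1", "interviews, questionnaires, physical measurements")
example.tag("3.4", "accessible under a data sharing agreement")

for facet in example.missing():
    print(f"{facet} {FACETS[facet]}: not yet described")
```

Listing the undescribed facets is one simple way a librarian might use such a structure to prompt a conversation with a researcher about what is still unknown.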

Page 24: Disciplinary RDM

An expandable resource

A scaffold onto which disciplinary examples can be hung

Dynamic resource: community input (from librarians, but maybe others too?), crowdsourcing

Turning it into an online interactive tool

Refreshing, curating, adapting the resource

Basic introduction at http://www.powtoon.com/show/fZDm1s0W6TI/research-data-typology-for-rluk-draft/

Page 25: Disciplinary RDM

Conclusions

Lots of work still to do!

Domains differ in all respects: data, methods, key RDM concerns, level of infrastructure and support…

Differences exist at sub-discipline level

Need to understand the area

Developing and using RLUK’s typology

Page 26: Disciplinary RDM

How to plug the gaps?

Dozens of different repositories or databases specialising in sub-domains or data types, but still major gaps
– Shared services?
– Institutional services – specialising rather than generic?
– Role of publishers and learned societies?
– Funder calls for domain specific infrastructure?
– Unis to support ground-up development of tools / services?

How can the sector help domain-specific solutions to mature and thrive?