Software Sustainability Institute, www.software.ac.uk. Where does it go from here? The Place of Software in Digital Repositories. 12 July 2012, OR2012, Edinburgh. Neil Chue Hong (@npch)

Where does it go from here? The role of software in digital repositories

DESCRIPTION

The open repositories community has made great strides in recent years in addressing interoperability, policy and providing the arguments for open access and sharing. One aspect of open research which has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software. In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues which currently present a barrier to the deposit and use of software in open repositories.

Page 1: Where does it go from here? The role of software in digital repositories

Software Sustainability Institute

www.software.ac.uk

Where does it go from here? The Place of Software in Digital Repositories

12 July 2012, OR2012, Edinburgh

Neil Chue Hong (@npch)

Page 2: Where does it go from here? The role of software in digital repositories

Software is pervasive in research

Page 3: Where does it go from here? The role of software in digital repositories

The Software Sustainability Institute

A national facility for building better software

• Better software enables better research

• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption

• Providing the expertise and services needed to negotiate to the next stage

• Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…

Supported by EPSRC Grant EP/H043160/1

Page 4: Where does it go from here? The role of software in digital repositories

Software Sustainability: preservation vs sustainability

Image courtesy of Mortati under CC-by-nc-nd

Image courtesy of London Permaculture under CC-by-nc-sa license

Preservation?

Sustainability?

Page 5: Where does it go from here? The role of software in digital repositories

Why are you considering software sustainability?

Achieve legal compliance

Create heritage value

Enable continued access to data

Encourage software reuse

Purpose

JISC-funded, with Curtis+Cartwright

http://www.software.ac.uk/resources/preserving-software-resources

Page 6: Where does it go from here? The role of software in digital repositories

How are you going to choose the right approach?

Preservation (techno-centric)

Emulation (data-centric)

Migration (functionality-centric)

Transition (process-centric)

Hibernation (knowledge-centric)

Deprecation

Approach

Page 7: Where does it go from here? The role of software in digital repositories

Software Carpentry

• Helping scientists be more productive by teaching them basic computing skills

• How to use repositories properly is a key skill

• http://software-carpentry.org

Page 8: Where does it go from here? The role of software in digital repositories

Just the Nature of the problem?

Maintenance is not fun

Hacking is fun

Statistics courtesy of Greg Wilson, Software Carpentry, from Nature article

Published online 13 October 2010 | Nature 467, 775-777 (2010) doi:10.1038/467775a

Page 9: Where does it go from here? The role of software in digital repositories

“Re-” is the new black

Page 10: Where does it go from here? The role of software in digital repositories

Publication only

Method Provenance (link data and code)

Data

Method Documentation

Method Execution

Same State

Replay

Reconstruct

Refresh

New State

Rerun

Repeat

Reproduce with new Data

Reproduce with new Method

Repair

Data Provenance

Recover

Repurpose

Reuse

Review

Good enough to Verify

Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online

Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227

Borgman CL, The Conundrum of Sharing Research Data, J ASIS&T 2012

Slide from Carole Goble, JCDL 2012

Page 11: Where does it go from here? The role of software in digital repositories

The most important: Reward

• How do we reward people for important software contributions?

• Traditionally: publish a research paper that happens to mention software
  - Can we provide more direct, acceptable software citations?

• A Research Software Impact Manifesto
  - http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software

NB Authorship is hard

Page 12: Where does it go from here? The role of software in digital repositories

Isn’t software just data?

http://beyond-impact.org/?p=175

Page 13: Where does it go from here? The role of software in digital repositories

What do we choose to keep:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?

What’s the minimum citable part?

Boundary

Page 14: Where does it go from here? The role of software in digital repositories

Algorithm

Function

Program

Library / Suite / Package

Granularity

Page 15: Where does it go from here? The role of software in digital repositories

Versioning

Personal v1

Personal v2

Personal v3

Personal v2a

Public v1

Personal v3a

Personal v2a

Public v2

Public v3

Why do we version?
- To indicate a change
- To allow sharing
- To confer special status

Page 16: Where does it go from here? The role of software in digital repositories

Backup, Sharing, Archiving

Page 17: Where does it go from here? The role of software in digital repositories

Differing roles, different repositories

Backup, sharing, archiving

Timescales, Policy, Licensing

Ingest, Metadata, Assurance

Page 18: Where does it go from here? The role of software in digital repositories

Software Metapapers

• Create a complete scholarly record
  - including “standard” publication, method, dataset and models, and software, e.g. modelling and simulation, statistical analysis
  - Enable replay, reproduction and reuse

• Pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
  - This is a software metapaper
  - Peer-review the metadata, not the software

• Journal of Open Research Software: http://openresearchsoftware.metajnl.com/

See: http://openresearchsoftware.metajnl.com/faq/ and the work by B. Matthews et al: The Significant Properties of Software: A Study
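
To make the "metadata record linked to a stored copy" idea concrete, here is a minimal sketch in Python. Every field name, the class name `SoftwareMetapaper` and the example repository URL are assumptions made up for illustration; they are not the schema used by the Journal of Open Research Software.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record describing one deposited software instance."""

    title: str
    authors: List[str]
    version: str
    license_id: str       # a short licence identifier, e.g. "Apache-2.0"
    repository_url: str   # where the copy of the software itself is stored
    related_outputs: List[str] = field(default_factory=list)  # papers, datasets

    def to_json(self) -> str:
        # Serialise the record so it can be reviewed and archived on its own,
        # separately from the software it points to.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; the URL is a placeholder, not a real deposit.
record = SoftwareMetapaper(
    title="Example simulation code",
    authors=["A. Researcher"],
    version="1.0",
    license_id="Apache-2.0",
    repository_url="https://repository.example.org/software/example-sim-1.0",
)
print(record.to_json())
```

The record, not the code it points to, is what would go through peer review in this approach.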

Page 19: Where does it go from here? The role of software in digital repositories

An acceptable repository

• Metapaper references an instance of software, stored in a “suitable” repository
  - Clear access / deposit / preservation policy
  - Adherence to standards
  - Ability to easily “transfer”
  - Sustainability of hosting organisation
  - Ability to monitor, check integrity (obsolescence?)

• We may be storing
  - Binaries, source code (as text or archived), virtual machines(!)
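
One way a repository might "check integrity" of a stored binary, source archive or virtual machine image is a fixity check against a checksum recorded at deposit time. The sketch below assumes SHA-256 and uses hypothetical function names; the slides do not prescribe any particular mechanism.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks so that
    large deposits (e.g. virtual machine images) need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_checksum: str) -> bool:
    """Compare the file's current digest with the one recorded at deposit."""
    return sha256_of(path) == recorded_checksum.strip().lower()
```

A check like this detects corruption of the stored bytes; it says nothing about obsolescence, which needs separate monitoring.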

Page 20: Where does it go from here? The role of software in digital repositories

Potential for confusion

• ‘The right license for all parts of the scholarly record’
  - Victoria Stodden, Enabling Reproducible Research: Open Licensing for Scientific Innovation

• Commonly used OSI approved licenses include:
  - Apache License, 2.0 (Apache-2.0)
  - BSD 3-Clause “New” or “Revised” license (BSD-3-Clause)
  - BSD 2-Clause “Simplified” or “FreeBSD” license (BSD-2-Clause)
  - GNU General Public License (GPL)
  - GNU Library or “Lesser” General Public License (LGPL)
  - MIT license (MIT)
  - Mozilla Public License 2.0 (MPL-2.0)
  - Common Development and Distribution License (CDDL-1.0)
  - Eclipse Public License (EPL-1.0)

• Does enabling the deposit of software just confuse those already depositing publications/data?

Page 21: Where does it go from here? The role of software in digital repositories

5 Stars of Software?

• Do we need a 5 stars for software?
  - Existence – there is accurate metadata that defines the software
  - Availability – you can access and run the software
  - Openness – the software has an open, permissive license
  - Assured – the software provides ways of assuring its correctness
  - Linked – the related data, dependencies and papers are indicated

c.f.
5 Stars of Linked Data (Berners-Lee)
5 Stars of Online Journals (Shotton)
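
The five criteria above could be applied as a simple checklist. The sketch below is one possible reading, not an agreed scheme: it assumes each star is awarded independently (unlike the cumulative Linked Data stars), and the class and attribute names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SoftwareDeposit:
    """Hypothetical yes/no answers for one deposited piece of software."""

    has_accurate_metadata: bool        # Existence
    can_be_accessed_and_run: bool      # Availability
    has_open_license: bool             # Openness
    provides_correctness_checks: bool  # Assured
    links_related_items: bool          # Linked


def stars(deposit: SoftwareDeposit) -> Tuple[int, List[str]]:
    """Return the number of stars earned and the names of the criteria met."""
    criteria = [
        ("Existence", deposit.has_accurate_metadata),
        ("Availability", deposit.can_be_accessed_and_run),
        ("Openness", deposit.has_open_license),
        ("Assured", deposit.provides_correctness_checks),
        ("Linked", deposit.links_related_items),
    ]
    earned = [name for name, met in criteria if met]
    return len(earned), earned


# Illustrative deposit: described, runnable and openly licensed, but with no
# tests and no links to related data or papers.
example = SoftwareDeposit(True, True, True, False, False)
count, earned = stars(example)
print(count, earned)
```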

Page 22: Where does it go from here? The role of software in digital repositories

Take home points

1) Researchers are developing more software than ever, and trying to do it better

2) They want to be rewarded for creating a complete scholarly record – this includes software

3) We still don’t know the best way to shift from one repository role to another when it comes to software!
  - Backup -> sharing -> archiving