DESCRIPTION
The open repositories community has made great strides in recent years in addressing interoperability and policy, and in providing the arguments for open access and sharing. One aspect of open research which has come to prominence is the importance of software as a fundamental part of reproducible research, which in turn raises issues around the preservation of software. In this short presentation, I will describe some of the work that the Software Sustainability Institute (SSI) has been doing to address the structural and policy issues which currently present a barrier to the deposit and use of software in open repositories.
Software Sustainability Institute
www.software.ac.uk
Where does it go from here? The Place of Software in Digital Repositories
12 July 2012, OR2012, Edinburgh
Neil Chue Hong (@npch)
[email protected]
Software is pervasive in research
The Software Sustainability Institute
A national facility for building better software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Software reviews and refactoring, collaborations to develop your project, guidance and best practice on software development, project management, community building, publicity and more…
Supported by EPSRC Grant EP/H043160/1
Software Sustainability: preservation vs sustainability
Image courtesy of Mortati under CC-by-nc-nd
Image courtesy of London Permaculture under CC-by-nc-sa license
Preservation?
Sustainability?
Why are you considering software sustainability?
Achieve legal compliance
Create heritage value
Enable continued access to data
Encourage software reuse
Purpose
JISC-funded, with Curtis+Cartwright
http://www.software.ac.uk/resources/preserving-software-resources
How are you going to choose the right approach?
Preservation (techno-centric)
Emulation (data-centric)
Migration (functionality-centric)
Transition (process-centric)
Hibernation (knowledge-centric)
Deprecation
Approach
Software Carpentry
• Helping scientists be more productive by teaching them basic computing skills
• How to use repositories properly is a key skill
• http://software-carpentry.org
Just the Nature of the problem?
Maintenance is not fun
Hacking is fun
Statistics courtesy of Greg Wilson, Software Carpentry, from Nature article
Published online 13 October 2010 | Nature 467, 775-777 (2010) doi:10.1038/467775a
“Re-” is the new black
Publication only
Method Provenance (link data and code)
Data
Method Documentation
Method Execution
Same State
Replay
Reconstruct
Refresh
New State
Rerun
Repeat
Reproduce with new Data
Reproduce with new Method
Repair
Data Provenance
Recover
Repurpose
Reuse
Review
Good enough To Verify
Drummond C, Replicability is not Reproducibility: Nor is it Good Science, online
Peng RD, Reproducible Research in Computational Science, Science 2 Dec 2011: 1226-1227
Christine L. Borgman, The Conundrum of Sharing Research Data, J ASIS&T 2012
Slide from Carole Goble, JCDL 2012
The most important: Reward
• How do we reward people for important software contributions?
• Traditionally: publish a research paper that happens to mention software
- Can we provide more direct, acceptable software citations?
• A Research Software Impact Manifesto
http://www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software
NB Authorship is hard
Isn’t software just data?
http://beyond-impact.org/?p=175
What do we choose to keep:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?
What’s the minimum citable part?
Boundary
Algorithm
Function
Program
Library / Suite / Package
…
Granularity
Versioning
Personal v1
Personal v2
Personal v3
Personal v2a
Public v1
Personal v3a
Personal v2a
Public v2
Public v3
Why do we version?
- To indicate a change
- To allow sharing
- To confer special status
Backup, Sharing, Archiving
Differing roles, different repositories
backup sharing archiving
Timescales, Policy, Licensing
Ingest, Metadata, Assurance
Software Metapapers
• Create a complete scholarly record including “standard” publication, method, dataset and models, and software
- e.g. modelling and simulation, statistical analysis
- Enable replay, reproduction and reuse
• Pragmatic approach is to create a metadata record for the software, and link it to a copy of the software in some storage infrastructure
- This is a software metapaper
- Peer-review the metadata, not the software
• Journal of Open Research Software: http://openresearchsoftware.metajnl.com/
See: http://openresearchsoftware.metajnl.com/faq/ and the work by B. Matthews et al: The Significant Properties of Software: A Study
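As a rough illustration of the "metadata record linked to a stored copy" idea above, the following Python sketch models such a record. The class name, the field names and the example URL are all hypothetical choices made for this illustration; they are not the schema used by the Journal of Open Research Software.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class SoftwareMetapaper:
    """Hypothetical metadata record describing one deposited software instance."""
    title: str
    version: str
    license: str
    repository_url: str  # where the copy of the software is stored (assumed field)
    authors: list = field(default_factory=list)
    related_papers: list = field(default_factory=list)
    related_datasets: list = field(default_factory=list)

    def missing_fields(self):
        """Return the names of fields that are still empty, for a reviewer to query."""
        return [name for name, value in asdict(self).items() if not value]

    def to_json(self):
        """Serialise the record so it can be stored alongside the deposit."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Illustrative values only; none of these refer to a real deposit.
record = SoftwareMetapaper(
    title="Example simulation code",
    version="1.0",
    license="MIT",
    repository_url="https://repository.example.org/deposit/123",
    authors=["A. Researcher"],
)
print(record.missing_fields())
print(record.to_json())
```

Reviewing a record like this, rather than the code itself, is one way to read "peer-review the metadata, not the software": the empty fields show a reviewer what still needs to be linked.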
An acceptable repository
• Metapaper references an instance of software, stored in a “suitable” repository
- Clear access / deposit / preservation policy
- Adherence to standards
- Ability to easily “transfer”
- Sustainability of hosting organisation
- Ability to monitor, check integrity (obsolescence?)
• We may be storing
- Binaries, source code (as text or archived), virtual machines(!)
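The "ability to monitor, check integrity" requirement could be met in many ways. One minimal sketch, assuming the repository records a SHA-256 digest for each file at deposit time, is a periodic fixity check like the one below. The function names and the manifest layout are assumptions made for this example, not part of any particular repository platform.

```python
import hashlib
import os
import tempfile


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_fixity(manifest):
    """Return the paths whose current digest differs from the recorded one.

    `manifest` maps file path -> digest recorded at deposit time (assumed layout).
    Files that have gone missing are reported as failures too.
    """
    failures = []
    for path, recorded in manifest.items():
        if not os.path.exists(path) or sha256_of(path) != recorded:
            failures.append(path)
    return failures


# Demonstration with a temporary file standing in for a deposited artefact.
with tempfile.TemporaryDirectory() as workdir:
    artefact = os.path.join(workdir, "model.tar")
    with open(artefact, "wb") as handle:
        handle.write(b"original bytes")
    manifest = {artefact: sha256_of(artefact)}
    intact = check_fixity(manifest)    # nothing has changed yet
    with open(artefact, "wb") as handle:
        handle.write(b"corrupted bytes")
    damaged = check_fixity(manifest)   # digest no longer matches

print(intact, damaged)
```

A check like this only detects bit-level change. It says nothing about obsolescence, which is why the slide flags that as a separate question.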
Potential for confusion
• ‘The right license for all parts of the scholarly record’
- Victoria Stodden, Enabling Reproducible Research: Open Licensing for Scientific Innovation
• Commonly used OSI approved licenses include:
- Apache License, 2.0 (Apache-2.0)
- BSD 3-Clause “New” or “Revised” license (BSD-3-Clause)
- BSD 2-Clause “Simplified” or “FreeBSD” license (BSD-2-Clause)
- GNU General Public License (GPL)
- GNU Library or “Lesser” General Public License (LGPL)
- MIT license (MIT)
- Mozilla Public License 2.0 (MPL-2.0)
- Common Development and Distribution License (CDDL-1.0)
- Eclipse Public License (EPL-1.0)
• Does enabling the deposit of software just confuse those already depositing publications/data?
5 Stars of Software?
• Do we need a 5 stars for software?
- Existence – there is accurate metadata that defines the software
- Availability – you can access and run the software
- Openness – the software has an open, permissive license
- Assured – the software provides ways of assuring its correctness
- Linked – the related data, dependencies and papers are indicated
c.f.
5 Stars of Linked Data (Berners-Lee)
5 Stars of Online Journals (Shotton)
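To make the five proposed criteria concrete, here is a small Python sketch that scores a deposit against them. It assumes each criterion is an independent yes/no check, which the slide does not specify (the Linked Data stars, by contrast, are cumulative), and the dictionary keys are hypothetical names chosen for this example.

```python
# The five criteria from the slide, as hypothetical dictionary keys.
STAR_CRITERIA = ["existence", "availability", "openness", "assured", "linked"]


def star_rating(deposit):
    """Count how many of the five criteria a deposit is marked as satisfying."""
    return sum(1 for name in STAR_CRITERIA if deposit.get(name, False))


def unmet_criteria(deposit):
    """List the criteria a deposit does not yet satisfy, in slide order."""
    return [name for name in STAR_CRITERIA if not deposit.get(name, False)]


# Illustrative deposit: described, runnable and openly licensed,
# but without tests or links to related data and papers.
example = {
    "existence": True,
    "availability": True,
    "openness": True,
    "assured": False,
}

print(star_rating(example))
print(unmet_criteria(example))
```

Treating the stars as independent makes the gaps visible: a deposit can be open without being assured, and the list of unmet criteria tells a depositor what to add next.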
Take home points
1) Researchers are developing more software than ever, and trying to do it better
2) They want to be rewarded for creating a complete scholarly record – this includes software
3) We still don’t know the best way to shift from one repository role to another when it comes to software!
Backup -> sharing -> archiving