
Page 1: Doing Science in the Digital Age

Software, Skills and Sociology

http://dx.doi.org/10.6084/m9.figshare.957527

TGAC Science Symposia series, 11 March 2014
Neil Chue Hong (@npch), Software Sustainability Institute
ORCID: 0000-0002-8876-7606 | [email protected]

Software Sustainability Institute
www.software.ac.uk

Unless otherwise indicated slides licensed under

Supported by

Project funding from

Page 2: Four Paradigms of Research

Empirical

Theoretical

Computational

Data Exploration

Page 3: Water Swap Reaction Coordinate

A water-swap reaction coordinate for the calculation of absolute protein-ligand binding free energies. Woods CJ, Malaisree M, Hannongbua S, Mulholland AJ. J. Chem. Phys. (2011) vol. 134, 054114. http://dx.doi.org/10.1063/1.3519057

Page 4: Pleiotropic loci

Selection at pleiotropic loci underlies disease co-occurrence in human populations. Navarro, Haley, Karosas et al. Submitted to Nature Genetics

Page 5: Behind every great piece of science…

#go through each SNP of interest
for(my $x = 0; $x < scalar @pos; $x++)
{
    #and then each downstream SNP of interest
    for(my $y = $x+1; $y < scalar @pos; $y++)
    {
        #if SNPs within our chosen distance (500kb) and both present in the haplotypes file
        if((!($trait[$x] eq $trait[$y])) &&
           (abs($pos[$x] - $pos[$y]) <= 500000) &&
           (exists($legArrayPos{$pos[$x]})) &&
           (exists($legArrayPos{$pos[$y]})))
        {
            my $snp1ArrayPos = "";
            my $snp2ArrayPos = "";
            my $snp1All = "";
            my $snp2All = "";

            #create output file for this SNP pair
            my $filename = "ConditionedResults2/$chr[$x].$pos[$x]-$pos[$y].EHH.GBR.2.txt";
            print "$filename\n";
            unless (-e $filename)
            {
                open(OUT, ">$filename");

                #####################CHANGE THESE IF NOT FOCUSING ON SECOND SNP#########################
                my $start = $pos[$y]-500000;
                if ($start < 1) { $start = 1; }
                my $end = $pos[$y]+500000;
                if ($end > $chrLengths{$chr[$x]}) { $end = $chrLengths{$chr[$x]}; }
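As an illustration only, the pair selection and window clamping in the excerpt above could be pulled into small, testable functions. This is a minimal sketch, not the authors' code: the helper names clamp_window and candidate_pairs, their parameters and the example data are all hypothetical, and the haplotype-file lookup (%legArrayPos) and the file output are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: clamp a window of +/- $half_width bases around $pos
# to the valid range 1..$chr_length.
sub clamp_window {
    my ($pos, $half_width, $chr_length) = @_;
    my $start = $pos - $half_width;
    $start = 1 if $start < 1;
    my $end = $pos + $half_width;
    $end = $chr_length if $end > $chr_length;
    return ($start, $end);
}

# Hypothetical helper: return index pairs [$x, $y] with $x < $y whose traits
# differ and whose positions lie within $max_dist bases of each other.
sub candidate_pairs {
    my ($pos, $trait, $max_dist) = @_;
    my @pairs;
    for my $x (0 .. $#{$pos}) {
        for my $y ($x + 1 .. $#{$pos}) {
            next if $trait->[$x] eq $trait->[$y];
            next if abs($pos->[$x] - $pos->[$y]) > $max_dist;
            push @pairs, [$x, $y];
        }
    }
    return @pairs;
}

# Made-up example data, for illustration only.
my @pos   = (100_000, 400_000, 2_000_000);
my @trait = ('A', 'B', 'A');

for my $pair (candidate_pairs(\@pos, \@trait, 500_000)) {
    my ($x, $y) = @$pair;
    # Window centred on the second SNP of the pair, as in the excerpt.
    my ($start, $end) = clamp_window($pos[$y], 500_000, 3_000_000);
    print "pair $pos[$x]-$pos[$y]: window $start..$end\n";
}
```

Keeping the selection rule and the clamping in separate functions means the 500kb distance is passed in once rather than repeated, and each piece can be checked on small inputs before it is run on real data.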

Page 6: The modern researcher…

• … worries about:
    Data management and analysis
    Reproducible research
    Scalable simulations
    Integration of models and workflows
    Collaboration

Picture of Otto Stern courtesy of Emilio Segre Visual Archives

Where do they learn how to do this?

Page 7: Observation 1: Software is pervasive across research

Corollary: software is bleeding edge and long-tail

Demanding users are coming from arts + humanities, economics, and social science as well as sciences

Page 8: Observation 2: A culture of re-use rather than re-invention is not widespread

Corollary: we have wasted effort and increased siloing

Page 9: Observation 3: Many people are “embarrassed” about software

Corollary: something is broken in the way we regard, recognise and reward software

Page 10: The Research Cycle

[Diagram: a cycle with the stages Create, Test, Interpret, Publish and Revise; the research outputs are Paper, Data and Software]

Research is a continuous cycle.

When we publish we are contributing to the body of knowledge.

Page 11: Research/Reuse/Reward Cycle

[Diagram: two linked cycles. Research: Create, Test, Interpret, Publish, Revise. Reuse: Index, Identify, Cite, Reward]

Reuse is also a cycle. We build our research on the work of others.

Reward mechanisms should encourage reuse.

Page 12: The current process

[Diagram: Start research, Write software, Use software, Produce results, Publish research paper (which mentions software and data), Release data, Release software]

This process is simple but does not reward production or reuse of good software and data.

It also has a long contribution cycle.

Page 13: A better process?

[Diagram: Start research, Identify existing software, Write software, Adapt/extend software, Use software, Produce results, Release data, Release software, Publish software paper, Publish data paper, Publish research paper (which references software and data papers)]

Software and data papers are needed as proxies for rewarding reuse.

But it enables a shorter contribution cycle for data and software.

Page 14: Boundary

What do we choose to identify:
- Workflow?
- Software that runs workflow?
- Software referenced by workflow?
- Software dependencies?

What’s the minimum citable part?

Page 15: Granularity

Algorithm

Function

Program

Library / Suite / Package

Page 16: Versioning

[Diagram: version history with personal versions (Personal v1, v2, v2a, v3, v3a) and public releases (Public v1, v2, v3)]

Why do we version?
- To indicate a change
- To allow sharing
- To confer special status

Page 17: Authorship

• Which authors have had what impact on each version of the software?
• Who had the largest contribution to the scientific results in a paper?

http://beyond-impact.org/?p=175

OGSA-DAI projects statistics from Ohloh

Page 18: Observation 4: This is all getting just a little confusing

Corollary: maybe we need to get on to firmer conceptual ground

Page 19: The Foundations of Digital Research

[Diagram labels: Software (three times), Re-usable, Re-producible]

www.software.ac.uk/ software-evaluation-guide resources/guides software-carpentry training

www.rse.ac.uk

www.software.ac.uk/blog/2012-11-09-craftsperson-and-scholar

software.ac.uk/blog/2012-08-16-what-research-software-community-and-why-should-you-care

www.software.ac.uk/blog/2011-05-02-publish-or-be-damned-alternative-impact-manifesto-research-software

Prlić A, Procter JB (2012) Ten Simple Rules for the Open Development of Scientific Software. PLoS Comput Biol 8(12): e1002802. doi:10.1371/journal.pcbi.1002802

Wilson G, et al. (2014) Best Practices for Scientific Computing. PLoS Biol 12(1): e1001745. doi:10.1371/journal.pbio.1001745

Page 20: Gap 1: Software Skills Training

[Chart: training offerings placed on two axes, Basic to Advanced and Programming Focussed (tools) to Research Focussed (methods). Offerings shown: Software Carpentry, Programming 101, Programming 201, Summer Schools, HPC Short Courses, Advanced HPC Training, Doctoral Training, MSc in HPC / scientific computing]

Who fills this gap?

Page 21: Gap 2: Lack of recognition and reward

• There is an anachronism in the way we conduct and recognise research
    REF references software as an output but it is still not easy to get recognition – peer review fails
• Software careers
    Researchers who use software
    Researcher-Developers
    Research Software Engineers
    Research Software Support
    Research Systems Providers

Page 22: Gap 3: Software Maturity and Management

[Chart: software proliferation against time, with phases labelled Customisation, Innovation and Consolidation]

Not all software should make it to the next stage

Management changes through time, requiring planning

Page 23: Standing on the shoulders of giants

• “If I have seen further it is by standing on the shoulders of giants” Isaac Newton

• As researchers we are honour-bound to share our knowledge so that all may benefit

Page 24: Observation 5: Most of the issues are not technical, they’re social

Corollary: we can do something to change them

Page 25: Career Paths in UK

[Diagram: UK STEM graduate career paths. Labels: PhD students, Early Career Research, Permanent Research Staff, Professor, Non-university Research (industry, government etc.), Careers outside academic sector]

Source: The Scientific Century, Royal Society, 2010 (revised to reflect first stage clarification from “What Do PhD’s Do?” study)

Page 26: We are science

Hear us roar!

Picture by Tamako the Jaguar

Page 27: Shake up the system

• “Swim or drown” is not an efficient learning method

• “Publish or perish” is not an effective reward mechanism

• “Becoming a Professor” is not a scalable career path

• “I’ll just have to do it myself” is not a modern way of doing science

Page 28: The Software Sustainability Institute

A national facility for cultivating world-class research through software
• Better software enables better research
• Software reaches boundaries in its development cycle that prevent improvement, growth and adoption
• Providing the expertise and services needed to negotiate to the next stage
• Developing the policy and tools to support the community developing and using research software

Better software

Better research

Supported by EPSRC Grant EP/H043160/1

Page 29: Campaigning for careers

http://www.rse.ac.uk/

Page 30: Nurturing a training community

• Bringing together 39+ organisations with interest in e-Infrastructure training

• Raising issues and enablers with RCUK, BIS

software.ac.uk/policy

Page 31: SSI Fellows 2014

• 2014: 16 fellows

• 2013: 15 fellows

• 2012: 10 fellows

• Range of subjects, career stages

software.ac.uk/fellows

Page 32: Welcome to the CW14

The Role of Software in Reproducible Research

6th Collaborations Workshop, Oxford
26-28th March 2014

Organised by the Software Sustainability Institute
Sponsored by Microsoft Research and GitHub

#CollabW14
software.ac.uk/cw14

Page 33: Publicise your software

http://openresearchsoftware.metajnl.com

http://dx.doi.org/10.6084/m9.figshare.942289

Page 34: What you can do now

• Read the Best Practices for Scientific Computing http://dx.doi.org/10.1371/journal.pbio.1001745

• Release your code and publish it in a journal http://bit.ly/softwarejournals

• Learn new software skills and pass them on to others http://www.software-carpentry.org/

• Ask for software and data if you’re reviewing a paper

• Forge a career in research, and change it for those coming behind you

• The DOI for this presentation: 10.6084/m9.figshare.957257

• The Software Sustainability Institute is a collaboration between the universities of Edinburgh, Manchester, Oxford and Southampton. Supported by EPSRC Grant EP/H043160/1.