Herding Cats: Managing Open Source Projects and Communities
Peter Rice, 5th November 2013

tranSMART Community Meeting 5-7 Nov 13 - Session 2: Herding Cats

Peter Rice, Imperial College London. Bioinformatics in academia was an early adopter of the open source approach to software projects, after first trying commercialisation and proprietary approaches. A selection of projects highlights the issues that arose and how they were successfully resolved.

Herding Cats

• Managing an open source community is like herding cats.

• 'Cat, come with me.' 'Nenni!' said the Cat. 'I am the Cat who walks by himself, and all places are alike to me. I will not come. But all the same, he followed.' (Rudyard Kipling, Just So Stories)

A brief history of open source software

– EMACS
– Linux
– GPL
– Apache

Open source licensing

• Source code provided
• Users free to inspect, modify and redistribute
• Restrictions may be applied
• Freedoms may be guaranteed
• Several licenses may be combined
  – If they are compatible

GNU General Public License

• Originally written for the EMACS editor and the GNU project
• Based on copyleft
  – A copyright holder usually restricts rights
  – Under the GPL, the copyright holder requires all further distributions to ensure free access
  – No further restrictions may be imposed
  – "Free as in speech, not as in beer"

Lesser General Public License

• The full GNU General Public License is difficult to combine with other licenses, as the whole binary is then covered by the GPL.
• The Lesser (Library) GPL only preserves the interface and requires the LGPL library source code to be made available.
• Applications can be under any license
• GPL code requires unlinked interfaces (APIs)

Other Open Source Licenses

• Apache 2.0 allows modified code to use another license (including proprietary). The "indemnity clause" can look scary but is safe.
• The Perl Artistic License has issues with redistributed code
• The original BSD license imposed a "restriction" requiring acknowledgement of the original authors; several "modified BSD" versions remove it

A brief history of bioinformatics

• 1980 Staden package
  – in support of Fred Sanger
• 1982 EMBL/GenBank
  – Free sequence databases, later also SwissProt
• 1984 Genetics Computer Group
  – Free (initially) sequence analysis package
• 1990 Sequence Retrieval System
• 1990 BLAST
• 1997 EMBOSS

Copyright and Ownership

• The Staden package was developed from 1987 to 2003 by Rodger Staden at the MRC-funded Laboratory of Molecular Biology
• To get a copy of the software, users mailed a cheque for £100 to the Medical Research Council
• In 2003, renewal of funding was rejected

Copyright and Ownership

• The software was still owned by the funders
• The authors had no right to apply for alternative funding
• … nor did anyone else
• Two years later it was formally re-released as open source, but the developers had left.

Multiple licensing

• The HMMER package provides standard Hidden Markov Model applications for multiple alignments of protein sequences
• HMMER 2 had a dual licensing model
  – GNU General Public License
  – Commercial license
• Only one of these can include third-party contributions: the commercial license cannot.

From academia to commercial

• The Sequence Retrieval System was developed by Thure Etzold as a PhD project, then at EMBL Heidelberg and the European Bioinformatics Institute.
• LION Bioscience in Cambridge started up to maintain and develop SRS commercially
• LION merged with competitors (e.g. NetGenics)

From academia to commercial

• NetGenics software was withdrawn
  – Customers had to purchase an SRS license instead
• LION merged with BioWisdom
• BioWisdom merged with Instem
• Lesson: commercial software is high quality, well supported, but can disappear at any time.
• Open source software avoids this risk

Branching

• BLAST was developed at NCBI as a successor to FASTA
• Development split into BLAST and WU-BLAST (Washington University), which provided new features
• WU-BLAST in turn became commercial AB-BLAST

Competition

• BLAST and the NCBI Toolkit were an early example of open source bioinformatics
• Most software at the time was commercial
• In 1990 the commercial providers wrote to Congress asking for withdrawal of funding for NCBI software because it competed with US industry.
• They failed.

Competition

• The GCG package was developed by the Genetics Computer Group at the University of Wisconsin
• One of the most cited papers in biology
  – If you change more than 25% of the code, you can remove the GCG copyright
• Changed to an annual source code license model
• Extensions (EGCG) distributed as source code by EMBL Heidelberg and then by Sanger

Competition

• Social scientists have reported in detail on GCG as an example of the development of bioinformatics.
• IntelliGenetics Inc objected to GCG's competition as unfair
• Wisconsin spun off GCG Inc
• Software license fee doubled
• Usage continued
• EGCG developed to 50% of the GCG code base

Competition

• GCG Inc looked for a new owner
• Source code deemed to be their major asset
• Source code distribution was withdrawn
• Increased fee for source code
• Very restrictive terms of distribution
• EGCG was abandoned with 150 applications
• EMBOSS written from scratch to replace both
  – GPL/LGPL licensing
  – Created by the former EGCG community

Competition

• So, to summarise
  – 1984 GCG started as open source
  – 1990 Became GCG Inc
  – 1997 Acquired by Oxford Molecular
  – 2000 EMBOSS 1.0 released as open source

Harvey, M. and McMeekin, A. (2007) "Public or Private Economies of Knowledge? Turbulence in the Biological Sciences"

Managing an Open Source Community

• The developers are only the beginning
  – Users
  – Installers
  – Technical authors
  – Helpdesk and support
  – Communication
  – Quality assurance
  – Competitors

Contributions by developers

• New source code, new functionality
• Maintaining source code
  – Bug fixes, coding standards
• Interfaces
  – APIs, third party integration
• Competitors (including open source)
  – New features and functionality
  – Integration and active collaboration

Contributions (continued)

• Branches
  – Someone needs to merge branches
• Original developers should agree to help
• Often merged by others wanting to use new features
  – Ideally, merge with a single core
  – Useful to merge any set of branches
  – Combine with test suite(s)

Contributions (continued)

• New data types
• ETL procedures
• Standards
• Project-specific requirements

Contributions (continued)

• Documentation
  – Users are good at writing/updating manuals
• Training
  – Shared examples with public data and common standards
• Support
• Feature requests

Hosting solutions

• Git: GitHub etc.
• SourceForge
• Open Bioinformatics Foundation
• Locally hosted solutions:
  – CVS or Subversion
• Wiki
  – Documentation: developers, users, installers

Coordination

• Projects need a coordinator
  – Linux: Linus Torvalds
  – Emacs: Richard Stallman
  – GCG: John Devereux
  – EMBOSS: Peter Rice

Coordination

• Maintaining a standard code base
  – github.com/transmart
• Tracking branches and modified copies elsewhere
• Selecting best solutions from available branches
• Merging conflicting changes
• Continuous testing

Communication

• Community meetings (London, Amsterdam, Paris, …) for developers and users
• Regular technical developer meetings / TCs
• Mailing lists
  – Provide a useful archive
• Trackers (JIRA, Pivotal, …)
  – Defining tasks/issues and resolving them
• Wiki
  – Community documentation

Efficiency

• Quality assurance
  – More tests are always helpful
• Automated documentation
  – Creating screenshots from test outputs
• Create tests for documented examples
• Automated update when results change
• Ensure documented functionality still functions
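One way to create tests for documented examples, so that documented functionality is checked on every test run, is Python's standard doctest module: it runs the examples written in a docstring and compares their output with what the documentation says. This is a minimal sketch under that assumption; the gc_content function is a hypothetical illustration, not code from tranSMART or EMBOSS. If the function's behaviour changes, the documented example fails until the documentation is brought up to date.

```python
import doctest


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence.

    Hypothetical example function: the examples below are both user
    documentation and tests that doctest executes.

    >>> gc_content("ATGC")
    0.5
    >>> gc_content("ggcc")
    1.0
    >>> gc_content("")
    0.0
    """
    seq = seq.upper()
    if not seq:
        return 0.0  # avoid dividing by zero for an empty sequence
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    # Run every docstring example in this module and report the results,
    # so a change in behaviour shows up as a failing documented example.
    results = doctest.testmod()
    print(f"{results.attempted} examples run, {results.failed} failed")
```

Running the file directly executes the documented examples; the same call can be added to a continuous testing job so documentation and code cannot silently drift apart.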

Cat incentives

• In a small community, sanctions work
  – Financial penalties for breaking the code
  – Small fines for bugs
  – Fines put back into the community, e.g. funding Xmas drinks

Cat treats

• Acknowledge contributions
• Benefit from sharing code in other branches
• Developers need to support one another
  – Put out any flame wars
• Involve the user community
  – Encourage non-developers to contribute
• Keep everything public
  – Support the community
  – Attract new cats