Presented at the WikiSym and OpenSym joint conference in Hong Kong on August 7, 2013.
The Era of Open
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
WikiSym+OpenSym Aug 7, 2013 1
The Era of Open Has The Potential to Deinstitutionalize
Daniel Hulshizer/Associated Press
An Example of That Potential: The Story of Meredith
http://fora.tv/2012/04/20/Congress_Unplugged_Phil_Bourne
Deinstitutionalization Vs Conservatism
It Starts with the Metrics of Success
[Adapted from Carole Goble]
Committee on Academic Promotions
• What Counts
  – Money
  – Grants
  – Papers
  – Teaching
  – Service
• What Does Not
  – Sharing data
  – Sharing software
  – Open access
  – Collaboration
  – Patents
  – Startups
Getting Ahead as a Computational Biologist in Academia. PLOS Comp. Biol.
Interim Solution: Use the Traditional Reward System
The Wikipedia Experiment – Topic Pages
Identify areas of Wikipedia that relate to the journal and are missing or only stubs
Develop a Wikipedia page in the sandbox
Have a Topic Page Editor Review the page
Publish the copy of record with associated rewards
Release the living version into Wikipedia
MOOCs Are Another Form of Disruption
In Short, Most Academic Institutions Have Yet to Embrace the Open Digital Enterprise They Surely Will Become
• Anyone, anything, anytime
• publication access, data, models, source code, resources, transparent methods, standards, formats, identifiers, APIs, licenses, education, policies
• “accessible, intelligible, assessable, reusable”
http://royalsociety.org/policy/projects/science-public-enterprise/report/
[Carole Goble]
Business Models Rule
• The Internet demanded new business models to support scholarly communication
• Open access was one such sustainable model:
  – Began with the community
  – Was driven by new organizations (PLOS, BMC, F1000, eLife, Dryad, Mendeley, etc.)
  – Was NOT driven by academic institutions
  – Was driven by policies and funders
One Metric of Change: Multidisciplinary Open Access Mega Journal
• This year PLOS ONE will publish over 30,000 papers!
This Disruption Got Us Thinking About…
• A paper as only one form of knowledge discovery
• The use of interaction and rich media from which to learn and actually do science
• Reproducibility
• Reward structures
• Better management of the research lifecycle
P.E. Bourne 2005 In the Future will a Biological Database Really be Different from a Biological Journal? PLOS Comp. Biol. 1(3) e34
Better Management of the Research Lifecycle is Not a New Concept
“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995
datasets, data collections, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software, infrastructure, compilers, hardware
Morin et al. Shining Light into Black Boxes. Science 13 April 2012: 336(6078) 159-160
Ince et al. The case for open computer programs. Nature 482, 2012
[Carole Goble]
The Research Lifecycle
IDEAS – HYPOTHESES – EXPERIMENTS – DATA – ANALYSIS – COMPREHENSION – DISSEMINATION
[Diagram: tools and resources spanning the lifecycle: Authoring Tools, Lab Notebooks, Data Capture, Software Repositories, Analysis Tools, Visualization, Scholarly Communication, Commercial & Public Tools, Git-like Resources, By Discipline, Data Journals, Discipline-Based Metadata Standards, Community Portals, Institutional Repositories, New Reward Systems, Commercial Repositories, Training]
[Same diagram, overlaid with enabling approaches: automate (workflows, pipeline & service integrative frameworks); pool, share & collaborate (web systems); nanopub; semantics & ontologies; machine-readable documentation; scientific software engineering; CSSE]
[Carole Goble]
Why is This Important to Me Personally?
• My wife is being treated for stage 1 breast cancer
• This highlights for me the disparity between what is happening in the lab and what is happening in the clinic
  – In the lab, cancer is a personalized and treatable condition
  – In the clinic, we are still equally “poisoning” patients with drugs first introduced 10-20 years ago
http://sagecongress.org/Presentations/Sommer.pdf
[Josh Sommer]
Most Laboratories
• We are the long tail
• Goodbye to the student is goodbye to the data
• Very few of us have complied (or will comply) with the data management plans we write into grants
• Too much software is unusable
S. Veretnik, J.L. Fink, and P.E. Bourne 2008 Computational Biology Resources Lack Persistence and Usability. PLoS Comp. Biol. 4(7): e1000136
Today’s Research Lifecycle is Digitally Fragmented at Best
• Proof:
  – I can't immediately reproduce the research in my own laboratory
    • It took an estimated 280 hours for an average user to approximately reproduce the paper
  – Workflows are maturing and becoming helpful
  – Data and software versions and accessibility prevent exact reproducibility
Daniel Garijo et al. 2013 Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome PLOS ONE under review.
At the Same Time The Disruption Continues
G8 open data charter http://opensource.com/government/13/7/open-data-charter-g8
And the Disruption Continues
• In the US alone...
  – March 2012: OSTP commits $200M to Big Data
  – OSTP demands sharing plans by August 2013
  – GBMF/Sloan provide institutional awards for data science
  – NCBI considers data catalog and MyBibliography
Where Will It End?
First We Should Ask What It Is We Wish to Accomplish
Here is What I Want – The Paper As Experiment
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB, which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB

1. User clicks on thumbnail
2. Metadata and a web services call provide a renderable image that can be annotated
3. Selecting a feature provides a database/literature mashup
4. That leads to new papers
PLoS Comp. Biol. 2005 1(3) e34
Here is What I Want – Knowledge Push
• Each evening the lab's “Evernote” notebooks are scanned for commonalities in the day's activities. These serve as seeds for a deep search of the research lifecycle content on the web that has become available since the last search. Results are ranked and presented for consideration over coffee the next morning
http://www.discoveryinformaticsinitiative.org/diw2012
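The nightly knowledge push can be pictured as three steps: extract terms shared across the day's notebook entries, use those terms as seeds, and rank items published since the last search by how many seeds they match. The following is a minimal sketch under stated assumptions: notebook entries are plain strings, candidate items have already been fetched into a dict, and term overlap stands in for a real ranking model. The names `find_seeds` and `rank_new_items` are hypothetical and are not part of Evernote or any search service.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical, deliberately tiny stop-word list; a real system would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "for", "with", "we", "is"}

def tokenize(text: str) -> set[str]:
    """Lower-cased set of words in one entry, minus stop words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS}

def find_seeds(entries: list[str], min_entries: int = 2) -> set[str]:
    """Terms that appear in at least `min_entries` of the day's notebook entries."""
    counts: Counter[str] = Counter()
    for entry in entries:
        counts.update(tokenize(entry))  # each term counted at most once per entry
    return {term for term, n in counts.items() if n >= min_entries}

def rank_new_items(seeds: set[str],
                   items: dict[str, tuple[datetime, str]],
                   last_search: datetime) -> list[str]:
    """Rank items published after `last_search` by the number of seed terms they contain."""
    scored = []
    for item_id, (published, text) in items.items():
        if published <= last_search:
            continue  # already covered by an earlier nightly search
        score = len(seeds & tokenize(text))
        if score > 0:
            scored.append((score, item_id))
    # Highest overlap first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored]

# Hypothetical example data for one evening's run.
entries = [
    "Docked ligand against kinase structure, binding pocket unclear",
    "Kinase assay repeated; ligand affinity lower than expected",
]
items = {
    "preprint-1": (datetime(2013, 8, 7), "New kinase ligand binding pocket analysis"),
    "dataset-2": (datetime(2013, 8, 7), "Protein structure dataset for membrane transporters"),
    "paper-3": (datetime(2013, 8, 1), "Kinase ligand screening results"),
}
seeds = find_seeds(entries)
ranked = rank_new_items(seeds, items, last_search=datetime(2013, 8, 6))
print(ranked)  # ['preprint-1']
```

Here `paper-3` is skipped because it predates the last search, and `dataset-2` shares no seed terms. A production version would swap the overlap score for proper relevance ranking.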
Will End With …
• Infrastructure:
  – Science, Nature, Cell and megajournals all “open access”
  – An array of coupled institutional repositories
  – A central repository: PubMed Central
  – Open software in full support of the research lifecycle
  – The research lifecycle in the cloud
Will End With …
• Sociologically:
  – An end to “build it and they will come”
  – Alternative metrics accepted by the community
  – Alternative reward systems that recognize the realities of today’s scholarship, namely:
    • Open data availability
    • Software availability
    • Collaborative research
We Have a Way to Go: Consider the Life Sciences
• Good News
  – We have NCBI/EBI
  – Publishers are starting to embrace data
  – Workflows in support of the research lifecycle are catching on
• Bad News
  – Sustainability remains a noun, not a verb
  – Data are organized by type, not by the questions asked (silos)
  – Tenure committees are still in the dark ages
What Can We Do As a Community?
Build Trust
Data
Trust in the data and the derived knowledge
What I Have Learned About Trust 1/2
• Trust is like compound interest
• Comes from listening
• Comes from engaging the community in every aspect of the process
• Comes from data consistency and level of annotation
• Comes from responsiveness
• Comes from the quality of the delivery service
What I Have Learned About Trust 2/2
• Quality begets trust
  – Quality requires data models/ontologies
• Quality requires people
  – Annotators are the unsung heroes
• Trust requires provenance & versioning
• Trust requires explaining that all data and knowledge are not created equal
Beyond Building Trust What Else Can We Do?
Think Globally Act Locally
• Support emergent community commons/portals
• Be involved in the support and development of metadata standards
• Contribute to workflow development etc. to drive an open research lifecycle
• Educate your mentors on the importance of open science and scholarly communication
• Write software thinking of an App model
Understand That All Data/Knowledge Are NOT Created Equal
• We need to understand how data are used
• Sustainability is not about more money from the funding agencies; it's about business models
• Reductionism is not a dirty word
• We need to do more with the long tail
On the Future of Genomic Data. Science 11 February 2011: vol. 331 no. 6018 728-729
Recognize That Institutions Must Play a Greater Role
• We need institutional data/knowledge sharing plans
• We need data/information scientists to be better recognized by institutions: it's not all about papers, and this implies new metrics
Learn from the App Store
• The App model
  – Think of it operating on a content base rather than a mobile device
  – Simple and consistent user interface
  – Needs to pass some quality control
  – Has a reward
• The App+ Model
  – Apps interoperate through a generic workflow interface
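One way to picture the App+ model is a set of small apps that all implement the same interface over a shared content base, pass a quality check, and can then be chained by a workflow. The following is a hypothetical sketch only: `Content`, `App`, `WordCountApp`, `KeywordFlagApp`, `passes_quality_control` and `run_workflow` are illustrative names, not part of any existing app store or workflow system, and the quality check is a trivial stand-in for real review.

```python
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class Content:
    """Hypothetical record from a content base, passed from app to app."""
    text: str
    annotations: dict[str, str] = field(default_factory=dict)

class App(Protocol):
    """Assumed generic workflow interface: a name plus run(content) -> content."""
    name: str
    def run(self, content: Content) -> Content: ...

@dataclass
class WordCountApp:
    name: str = "word-count"
    def run(self, content: Content) -> Content:
        # Adds an annotation in place and returns the same record.
        content.annotations["word_count"] = str(len(content.text.split()))
        return content

@dataclass
class KeywordFlagApp:
    keyword: str
    name: str = "keyword-flag"
    def run(self, content: Content) -> Content:
        found = self.keyword.lower() in content.text.lower()
        content.annotations[f"mentions_{self.keyword}"] = "yes" if found else "no"
        return content

def passes_quality_control(app: object) -> bool:
    """Stand-in for app-store review: must expose a string name and a callable run()."""
    return isinstance(getattr(app, "name", None), str) and callable(getattr(app, "run", None))

def run_workflow(apps: list[App], content: Content) -> Content:
    """Chain apps so that each app's output becomes the next app's input."""
    for app in apps:
        if not passes_quality_control(app):
            raise ValueError(f"app rejected by quality control: {app!r}")
        content = app.run(content)
    return content

result = run_workflow(
    [WordCountApp(), KeywordFlagApp(keyword="PDB")],
    Content(text="Structures were retrieved from the PDB and annotated"),
)
print(result.annotations)  # {'word_count': '8', 'mentions_PDB': 'yes'}
```

The design choice here is structural typing: any object with a `name` and a `run` method can join a workflow, which is roughly what a generic workflow interface between apps would require.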
In Summary
• Open science is a means to accelerate the rate of discovery
• Disruption has begun, but there is great inertia in the system
• All of us are stakeholders and capable of invoking further positive change
• We need to get institutions and more scientists involved…
Acknowledgements
www.force11.org
pbourne@ucsd.edu
• Force11 Manifesto
• Fourth Paradigm: Data-Intensive Scientific Discovery http://research.microsoft.com/en-us/collaboration/fourthparadigm/