Transcript
Page 1: Open Data - Where Do We Stand from a Researcher's Perspective?

Open Data – Where Do We Stand from a Researcher's Perspective?

Philip E. Bourne

University of California San Diego

[email protected]

Page 2: Open Data - Where Do We Stand from a Researcher's Perspective?

My Perspective …

• Mine is a biomedical sciences perspective

• My lab distributes for free data equivalent to ¼ of the Library of Congress every month

• I am a supporter of open access (provided there is a business/sustainability model) and founding editor in chief of PLOS Computational Biology

• I am co-founder of SciVee Inc. and believe innovation comes from open access to knowledge

• I recently became UCSD’s AVC of Innovation, which is giving me a more institutional perspective

I Readily Acknowledge Each Discipline is Different

Page 3: Open Data - Where Do We Stand from a Researcher's Perspective?

My General Opinion: Where Does the Open Access Debate Stand Today?

• It's not a question of “if” but a question of “when” and “how” for most disciplines

• We are at the tip of the iceberg in our ability to use OA content

• OA will gain momentum in an increasingly knowledge-based economy

Page 4: Open Data - Where Do We Stand from a Researcher's Perspective?

The State of Play: UC Open Access Policy Debate: Opt Out vs Opt In

• For

– Publicly funded research should be public

– Institutional perspective: The open provision of data and knowledge derived from these data appears to be an unidentified asset at this time

• Against

– Cost to some disciplines

– Impact on societies

– Journal quality re promotion

– Extra work

– Administration

– UC as “Big Brother”

Page 5: Open Data - Where Do We Stand from a Researcher's Perspective?

We will come back to this, but first let us explore why open knowledge is so important (to me at least)

Page 6: Open Data - Where Do We Stand from a Researcher's Perspective?

Open Data May Save Lives?

[Figure: Structure Summary page activity for H1N1 influenza-related structures, January 2008 to July 2010. Structures shown: 1RUZ (1918 H1 hemagglutinin) and 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir).]

* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm

Page 7: Open Data - Where Do We Stand from a Researcher's Perspective?

Open Science Can Accelerate the Scientific Process…

For some people the change may be too slow to save their life

Page 8: Open Data - Where Do We Stand from a Researcher's Perspective?

Josh Sommer – A Remarkable Young Man

Co-founder & Executive Director of the Chordoma Foundation

http://sagecongress.org/Presentations/Sommer.pdf

Page 9: Open Data - Where Do We Stand from a Researcher's Perspective?

Chordoma

• A rare form of brain cancer

• No known drugs

• Treatment – surgical resection followed by intense radiation therapy

http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG

Page 10: Open Data - Where Do We Stand from a Researcher's Perspective?

http://sagecongress.org/Presentations/Sommer.pdf

Page 11: Open Data - Where Do We Stand from a Researcher's Perspective?

http://sagecongress.org/Presentations/Sommer.pdf

Page 12: Open Data - Where Do We Stand from a Researcher's Perspective?

http://sagecongress.org/Presentations/Sommer.pdf

Page 13: Open Data - Where Do We Stand from a Researcher's Perspective?

Adapted: http://sagecongress.org/Presentations/Sommer.pdf

If I have seen further it is only by standing on the shoulders of giants

Isaac Newton

From Josh’s point of view the climb up just takes too long

> 15 years and > $850M to be more precise

Page 14: Open Data - Where Do We Stand from a Researcher's Perspective?

http://sagecongress.org/Presentations/Sommer.pdf

Page 15: Open Data - Where Do We Stand from a Researcher's Perspective?

http://sagecongress.org/Presentations/Sommer.pdf

Page 16: Open Data - Where Do We Stand from a Researcher's Perspective?

http://fora.tv/2010/04/23/Sage_Commons_Josh_Sommer_Chordoma_Foundation

Page 17: Open Data - Where Do We Stand from a Researcher's Perspective?

The Story of Meredith

Page 18: Open Data - Where Do We Stand from a Researcher's Perspective?

What Does Meredith Tell Us?

• The Wikipedia / Khan Academy / YouTube generation knows no bounds

• Bounds are too often imposed by tradition rather than what makes the most sense

• Another example of an underexploited asset at this time?

Page 19: Open Data - Where Do We Stand from a Researcher's Perspective?

Another Way of Thinking About the Implications of What Josh and Meredith Represent Is the Need for New Forms of Knowledge Management and Access

Let's Explore This Notion with an Emphasis on Data

Page 20: Open Data - Where Do We Stand from a Researcher's Perspective?

The Silos of Data & Knowledge Are Starting to Coalesce

Is a Biological Database Really Different than a Biological Journal?

PLoS Comp. Biol. 2005 1(3) e34

Page 21: Open Data - Where Do We Stand from a Researcher's Perspective?

The Silos of Data & Knowledge Are Starting to Coalesce

• Supplemental information has exploded

• Data journals are emerging

• The use of rich media is increasing

• Software and other processes are becoming available

• Databases are now knowledgebases

• Science can be done on the fly

• Biocuration is a respectable career

PLoS Comp. Biol. 2008. 4(7): e1000136

Page 22: Open Data - Where Do We Stand from a Researcher's Perspective?

Where Does That Take Us?

• A paper is an artifact of a previous era

• It is not the logical end product of eScience, hence:

– Work is omitted

– Article vs supplement is a mess

– Visualization may be limited

– Interaction and enquiry are non-existent

– Rich media can help, but barriers remain

Page 23: Open Data - Where Do We Stand from a Researcher's Perspective?

Where Does That Take Us? Data Sharing Policies

• From the NSF:

• Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections and other supporting materials created or gathered in the course of work under NSF grants. Grantees are expected to encourage and facilitate such sharing. See Award & Administration Guide (AAG) Chapter VI.D.4.

Page 24: Open Data - Where Do We Stand from a Researcher's Perspective?

Big Data is Off…

• March 2012 OSTP commits $200M to Big Data

• NSF, DOD, NIH all announce programs

• GBMF think tank leads to soon-to-be-announced institutional awards

Page 25: Open Data - Where Do We Stand from a Researcher's Perspective?

Where Does That Take Us? Add into the Mix:

• Reproducibility – It really is a myth!

• Maintainability – DNA doubles in 5 months

• Usability – Go ahead and try!

• Reward – Tenure for data – no way

Notwithstanding, dreams do emerge … Here is mine

Page 26: Open Data - Where Do We Stand from a Researcher's Perspective?

The Knowledge and Data Cycle

0. Full text of PLoS papers stored in a database

1. A link brings up figures from the paper

2. Clicking the paper figure retrieves data from the PDB which is analyzed

3. A composite view of journal and database content results

4. The composite view has links to pertinent blocks of literature text and back to the PDB

Here is What I Want

1. User clicks on thumbnail

2. Metadata and a webservices call provide a renderable image that can be annotated

3. Selecting a feature provides a database/literature mashup

4. That leads to new papers

PLoS Comp. Biol. 2005 1(3) e34

Page 27: Open Data - Where Do We Stand from a Researcher's Perspective?

The Knowledge Economy Begins

Immunology Literature

Cardiac Disease Literature

Page 28: Open Data - Where Do We Stand from a Researcher's Perspective?

Simultaneously Discovery Informatics Emerges

• Google will not suffice as a scientific knowledge discovery tool

• Google is broad but shallow

• Science is cross-disciplinary, narrower and deeper

Page 29: Open Data - Where Do We Stand from a Researcher's Perspective?

NSF Discovery Informatics Workshop

• Discoveries surpass an individual's ability - need intelligent tools

• Need to increase connections between knowledge and data

• Need to combine diverse human abilities

Discovery informatics - computer scientists, domain scientists, social scientists - http://www.isi.edu/~gil/diw2012/NSFDiscoveryInformatics2012-FinalReport.pdf

Page 30: Open Data - Where Do We Stand from a Researcher's Perspective?

This is Just the Beginning of Discovery Informatics

• Each evening the lab's “Evernote” notebooks are scanned for commonalities from the day's activities. These are seeds in a deep search of the web for knowledge and data that has become available since last searched. Results are ranked and presented for consideration over coffee the next morning

http://www.discoveryinformaticsinitiative.org/diw2012
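
The nightly scenario on this slide can be sketched in a few lines. This is only an illustration under stated assumptions, not the workflow the talk has in mind: the notes and candidate items are made-up in-memory records (no Evernote or web search API is called), "commonalities" are approximated as terms that appear in at least two notes, and ranking is plain term overlap. The names seed_terms and rank_new_items are hypothetical.

```python
import re
from collections import Counter
from datetime import date

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "with", "on", "is", "was", "we"}

def tokens(text):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]

def seed_terms(notes, min_notes=2):
    """Terms occurring in at least `min_notes` separate notes: the day's commonalities."""
    counts = Counter()
    for note in notes:
        counts.update(set(tokens(note)))  # count each term once per note
    return {term for term, n in counts.items() if n >= min_notes}

def rank_new_items(seeds, items, last_searched):
    """Rank items that became available after `last_searched` by seed-term overlap."""
    scored = []
    for item in items:
        if item["available"] <= last_searched:
            continue  # already seen in an earlier search
        overlap = seeds & set(tokens(item["text"]))
        if overlap:
            scored.append((len(overlap), item["title"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best overlap first, ties by title
    return [title for _, title in scored]

# Made-up notebook entries and candidate items, for illustration only.
notes = [
    "Docked zanamivir against neuraminidase structure 3B7E",
    "Neuraminidase binding site looks flexible near zanamivir",
    "Ordered pipette tips",
]
items = [
    {"title": "New neuraminidase zanamivir complex",
     "text": "neuraminidase zanamivir complex resolved",
     "available": date(2012, 5, 2)},
    {"title": "Old neuraminidase review",
     "text": "neuraminidase review",
     "available": date(2012, 1, 1)},
    {"title": "Hemagglutinin dataset",
     "text": "hemagglutinin sequences and neuraminidase annotations",
     "available": date(2012, 5, 3)},
]

seeds = seed_terms(notes)
ranking = rank_new_items(seeds, items, last_searched=date(2012, 5, 1))
print(ranking)
```

A real system would need far richer text analysis than term overlap; the point of the sketch is only the shape of the loop: extract seeds, search what is new, rank, present.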

Page 31: Open Data - Where Do We Stand from a Researcher's Perspective?

Unimaginable Connections Made Automatically Through RDF Descriptions

http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html

Page 32: Open Data - Where Do We Stand from a Researcher's Perspective?

Before We Get Too Heady Let's Look at the Realities of the Situation from My Perspective

• Data repositories are broken

• There is a “high noon” effect

• NCBI has been a wonderful model to date…

Page 33: Open Data - Where Do We Stand from a Researcher's Perspective?

Data/Institutional Repositories

• “Build it and they will come” fails most of the time

• Institutional repository is an oxymoron

• NCBI works because:

– It is an act of the US Congress

– It has strong leadership

– It has a monopoly on the literature

– It has IT thought out over many years

Innkeeper at the Roach Motel, D. Salo 2008

http://muse.jhu.edu/journals/library_trends/v057/57.2.salo.html

Page 34: Open Data - Where Do We Stand from a Researcher's Perspective?

Data/Institutional Repositories

• “High Noon” Effect

– Publishers make knowledge in very difficult, but at least knowledge out, albeit limited, is consistent, intuitive and easy to use

– Data repositories make data in and data out very difficult – they strive to be different when in fact users want them to be the same

Page 35: Open Data - Where Do We Stand from a Researcher's Perspective?

Data and Journals

• That journals are thinking about data is good

• Dryad etc. are welcome but a stopgap measure

• Fully functional data journals will not occur without a change to the reward system

• Data papers can help shift the reward system

• Are PLoS Topic Pages a sign?

Page 36: Open Data - Where Do We Stand from a Researcher's Perspective?

Interim Solution: Use the Traditional Reward System

The Wikipedia Experiment – Topic Pages

Identify areas of Wikipedia that relate to the journal that are missing or are stubs

Develop a Wikipedia page in the sandbox

Have a Topic Page Editor review the page

Publish the copy of record with associated rewards

Release the living version into Wikipedia

Page 37: Open Data - Where Do We Stand from a Researcher's Perspective?

Think Globally, Act Locally: What Can Our Institutions Do Now to Move Us in the Right Direction?

Page 38: Open Data - Where Do We Stand from a Researcher's Perspective?

Institutional Response

• Have repositories that are useful

– Use common standards

– Are vetted by the community

– Are fully open and searchable

• Reward all forms of scholarship

• Leverage the asset …

Page 39: Open Data - Where Do We Stand from a Researcher's Perspective?

Most Laboratories

• We are the long tail

• Goodbye to the student is goodbye to the data

• Very few of us have complied (or will comply) with the data management plans we write into grants

Page 40: Open Data - Where Do We Stand from a Researcher's Perspective?

UCSD Dropbox

• Simple!!!!

• Can drop large files easily

• Asks for limited metadata and permissions to “discover”

• Has guaranteed quality of service and security not available in the cloud

• Is the data management plan and charged against grants

• Is a rich campus corpus open to discovery informatics

Page 41: Open Data - Where Do We Stand from a Researcher's Perspective?

The UCSD Dropbox Discovery Environment

• Scenarios:

– Fosters known collaborations through simplified data exchange

– Discovers new collaborators through the same or related data elements

– A corpus whose intrinsic value is as yet unknown

Page 42: Open Data - Where Do We Stand from a Researcher's Perspective?

What Do I Want by 2020 or Earlier as a Researcher?

• Answer biological questions not just retrieve data

• Understand all there is to know about the availability and quality of a unit of biological data

• Operate on data in a way that is simpler, more productive, and reproducible

Page 43: Open Data - Where Do We Stand from a Researcher's Perspective?

What Do We Need to Do to Get There? A Data Registry?

• Individual repositories register their metadata which includes access statistics, commentary etc. – DataCite is a beginning

• Identify identical data objects and their respective metadata for comparative analysis

• Funders support registration

• Publishers support registration
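
As a rough illustration of the registry idea above, the sketch below lets repositories register an object together with its metadata and then groups registrations that refer to the same data object. It rests on assumptions the slide does not make: "identical" is taken to mean byte-identical content, detected with a SHA-256 checksum; the DataRegistry class, the repository names and the identifiers are all hypothetical; and nothing here uses or reflects the DataCite service.

```python
import hashlib
from collections import defaultdict

class DataRegistry:
    """Toy in-memory registry keyed by a checksum of the data object (an assumption)."""

    def __init__(self):
        self._records = defaultdict(list)

    def register(self, repository, identifier, content, **metadata):
        """Record one repository's copy of a data object plus its metadata."""
        digest = hashlib.sha256(content).hexdigest()
        record = {"repository": repository, "identifier": identifier, **metadata}
        self._records[digest].append(record)
        return digest

    def duplicates(self):
        """Data objects registered more than once, with every copy's metadata."""
        return {d: recs for d, recs in self._records.items() if len(recs) > 1}

# Made-up registrations for illustration only.
registry = DataRegistry()
registry.register("repo-a", "a:001", b"ATGCGT", downloads=120)
registry.register("repo-b", "b:917", b"ATGCGT", downloads=15)
registry.register("repo-b", "b:918", b"TTTTTT", downloads=3)

shared = registry.duplicates()
for digest, records in shared.items():
    total = sum(r["downloads"] for r in records)
    print(digest[:12], [r["identifier"] for r in records], total)
```

Byte-level checksums miss the harder case of the same data stored in different formats; comparing metadata across copies, as the slide suggests, is where a registry would add value beyond this.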

Page 44: Open Data - Where Do We Stand from a Researcher's Perspective?

What Do We Need to Do to Get There? An App+ Store?

• The App model

– Think of it operating on a content base rather than a mobile device

– Simple and consistent user interface

– Needs to pass some quality control

– Has a reward

• The App+ Model

– Apps interoperate through a generic workflow interface
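
One way to picture apps interoperating through a generic workflow interface is shown below. It is a minimal sketch, assuming the simplest possible contract: every app takes a record (a dictionary) and returns an enriched record, so any app can be chained after any other. The two apps, their names and the sequence string are invented for illustration; no real database is queried.

```python
from typing import Callable, Dict, List

# Hypothetical contract: an app maps a record to an enriched record.
App = Callable[[Dict], Dict]

def fetch_structure(record: Dict) -> Dict:
    """Stand-in for a retrieval app; returns a made-up sequence, no database call."""
    return {**record, "sequence": "MKTAYIAK"}

def sequence_length(record: Dict) -> Dict:
    """Analysis app that relies only on the shared record contract."""
    return {**record, "length": len(record["sequence"])}

def run_workflow(apps: List[App], record: Dict) -> Dict:
    """Pass the record through each app in order."""
    for app in apps:
        record = app(record)
    return record

result = run_workflow([fetch_structure, sequence_length], {"id": "example-1"})
print(result)
```

Because every app shares one input and output shape, quality control and reward (the other App model requirements on this slide) could be attached per app without changing how apps are composed.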

Page 45: Open Data - Where Do We Stand from a Researcher's Perspective?

In Summary

• We have at hand the means to accelerate the rate of discovery

• To do so we need to place more value on the data, the individuals that produce it and the institutions that maintain it

• We are all stakeholders in this endeavor

• Here is one way to get involved….

Page 46: Open Data - Where Do We Stand from a Researcher's Perspective?

Get Involved: FORCE11

• Tools and Resource catalog

• Article database in Mendeley

• Discussion Forum via Google

• Blogs courtesy of blog sites and RSS feeds

• Web site via Drupal

• Announcements via Twitter

http://force11.org

Page 47: Open Data - Where Do We Stand from a Researcher's Perspective?

General References

• Force11 Manifesto

• Fourth Paradigm: Data Intensive Scientific Discovery http://research.microsoft.com/enus/collaboration/fourthparadigm/

