Measuring progress toward a cultural norm of shared (and reused!) biomedical research data Heather Piwowar Department of Biomedical Informatics University of Pittsburgh

NESCent visit: Measuring progress toward a cultural norm of shared (and reused!) biomedical research data

DESCRIPTION

Preliminary work and future directions in measuring biomedical research data sharing

Page 1: NESCent visit:  Measuring progress toward a cultural norm of shared (and reused!) biomedical research data

Measuring progress toward a cultural norm of shared (and reused!) biomedical research data

Heather Piwowar

Department of Biomedical Informatics, University of Pittsburgh

Page 4

Sharing research data

PAST MEDICAL HISTORY:

Past medical history showed she had superficial phlebitis times two in the past, had non-insulin dependent diabetes mellitus for four years.

She had been hypothyroid for three years.

HISTORY OF PRESENT ILLNESS:

The patient is a 58-year-old female, …

http://upload.wikimedia.org/wikipedia/commons/7/76/PeptideMSMS.jpg; http://en.wikipedia.org/wiki/Image:Helices.png; http://en.wikipedia.org/wiki/Image:Heatmap.png; http://en.wikipedia.org/wiki/Image:Microarray2.gif; http://zellig.cpmc.columbia.edu/medlee/demo/; http://www.plosone.org/article/fetchArticle.action?articleURI=info:doi/10.1371/journal.pone.0000441

Page 8

Shared data benefits science

• Verify
• Understand
• Extend
• Explore
• Combine
• Synergize
• Train
• Reduce

Page 9

But... costly for authors

• Find
• Organize
• Document
• Deidentify
• Format
• Decide
• Ask
• Submit
• Answer questions
• Worry about mistakes being found
• Worry about data being misinterpreted
• Worry about being scooped
• Forgo money and IP and prestige???

Page 10

As a result, policy makers have spent lots of time and money ....

http://www.flickr.com/photos/tonivc/2283676770/

http://www.flickr.com/photos/johnnyvulkan/381941233/

Page 11

... on initiatives, requests, requirements, and tools

NIH data sharing plan requirement

Journal requirements

Public databases

Data sharing grids like BIRN and caBIG

Data formatting standards

Editorials, letters to the editor, discussion....

Page 13

lots of data sharing!

http://www.genome.jp/en/db_growth.html

Page 14

but how much isn’t shared?

what isn’t shared?

who isn’t sharing it?

why not?

what can we do about it?

how much does it matter?

Page 15

you can not manage what you do not measure

http://www.flickr.com/photos/archeon/2941655917/

Page 16

research questions

1. Is there benefit for those who share?

2. Do journal policies increase rates of sharing?

3. What other factors are correlated with sharing and withholding data?

Page 17

microarray data

http://en.wikipedia.org/wiki/DNA_microarray

http://en.wikipedia.org/wiki/Image:Heatmap.png

http://commons.wikimedia.org/wiki/File:DNA_double_helix_vertikal.PNG

Page 18

microarray data

Page 19

http://www.flickr.com/photos/sunrise/35819369/

1. Is there benefit for those who share?

Page 20

currency of value?

Citations.

$50!

Diamond, Arthur M. What is a Citation Worth? The Journal of Human Resources (1986) vol. 21 (2) pp. 200-215

Page 21

Prior work focused on the citation advantage of an open access publishing model.

Our question: are articles that share their raw research data cited more than articles that don’t?

Page 22

dataset: 85 cancer microarray trials published in 1999-2003, as identified by Ntzani and Ioannidis (2003)

citations: ISI Web of Science Citation index, citations from 2004-2005

data sharing locations: Publisher and lab websites, microarray databases, WayBack Internet Archive, Oncomine

statistics: Multivariate linear regression

Page 23

Note: log scale

Page 24

In multivariate regression, we found studies that had made their data publicly available received 69% more citations than similar studies that did not share their data (95% confidence interval: 18% to 143%)

Piwowar, Day and Fridsma (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308
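
The arithmetic behind a percentage like this can be sketched as follows. The sketch assumes the regression was fitted to log-transformed citation counts (suggested by the "log scale" note on the plot), so a coefficient b on the data-sharing indicator corresponds to a multiplicative change of exp(b). The coefficient values are back-calculated from the percentages on the slide for illustration; they are not taken from the paper.

```python
import math

def percent_change(log_coef: float) -> float:
    """Convert a coefficient on a natural-log outcome into a percent change."""
    return (math.exp(log_coef) - 1.0) * 100.0

def log_coef_from_percent(pct: float) -> float:
    """Inverse mapping: the log-scale coefficient implied by a percent change."""
    return math.log(1.0 + pct / 100.0)

# Back-calculated from the slide (69%, CI 18% to 143%); illustrative only.
estimate = log_coef_from_percent(69)
ci_low = log_coef_from_percent(18)
ci_high = log_coef_from_percent(143)

print(f"implied coefficient: {estimate:.3f} (CI {ci_low:.3f} to {ci_high:.3f})")
print(f"percent change: {percent_change(estimate):.0f}%")
```

Because the interval is symmetric on the log scale, it looks lopsided once converted back to percentages, which is why 69% does not sit midway between 18% and 143%.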

Page 25

future work

• collect a larger dataset for citation analysis (stay tuned)

• investigate other datatypes

• examine citation context

Page 26

http://www.flickr.com/photos/ryanr/142455033/

2. Do journal data sharing policies increase sharing?

Page 27

“An inherent principle of publication is that others should be able to replicate and build upon the authors' published claims. Therefore, a condition of publication in a Nature journal is that authors are required to make materials, data and associated protocols available in a publicly accessible database …”

http://www.nature.com/authors/editorial_policies/availability.html

http://www.nature.com/nature/journal/v453/n7197/index.html

Page 28

Prior work examined data sharing policies in biomedicine, but these reviews are now dated, consider a variety of resources, and don’t correlate policy to behaviour.

McCain. Science Communication, Vol. 16, No. 4. (1 June 1995), pp. 403-431

NAS. Sharing Publication-Related Data and Materials. (2003), p. 33

Page 29

Our aim: look at data sharing policies within the Instructions to Authors statements of 70 journals, as they apply to gene expression microarray data.

Page 30

content of data sharing policies

Very diverse policies in terms of:
• statements of policy motivation
• datatype-specific policies
• requested vs. required
• data location
• data format
• data completeness
• timeliness of sharing
• consequences for not sharing
• exceptions

Page 31

strength of data sharing policies

No applicable policy (43%)

Weak policy (24%)
• should, recommend, request
• must, but without database accession number

Strong policy (33%)
• must, required, condition of publication
• requires database accession number
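
The three-level rubric can be approximated with a keyword rule, sketched below. This is an illustration only: the slide does not say how policies were coded, and the "microarray" applicability check, the substring matching, and the example excerpts are all assumptions made for this sketch.

```python
STRONG_TERMS = ("must", "required", "condition of publication")
WEAK_TERMS = ("should", "recommend", "request")

def classify_policy(text: str) -> str:
    """Return 'none', 'weak', or 'strong' for an Instructions to Authors excerpt."""
    lowered = text.lower()
    # Assumption: a policy is "applicable" only if it mentions microarray data.
    if "microarray" not in lowered:
        return "none"
    has_strong = any(term in lowered for term in STRONG_TERMS)
    has_weak = any(term in lowered for term in WEAK_TERMS)
    wants_accession = "accession number" in lowered
    # Strong: mandatory wording plus a database accession number.
    if has_strong and wants_accession:
        return "strong"
    # Weak: advisory wording, or mandatory wording without an accession number.
    if has_strong or has_weak:
        return "weak"
    return "none"

# Hypothetical excerpts, written for this example.
examples = [
    "Authors must deposit microarray data and supply the accession number.",
    "Authors must make microarray data available to readers.",
    "We recommend that microarray data be deposited in a public database.",
    "Manuscripts should be double spaced.",
]
for excerpt in examples:
    print(classify_policy(excerpt), "<-", excerpt)
```

Simple substring matching like this will misfire on real policy text, so a rule of this kind is better suited to flagging candidates for manual review than to final coding.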

Page 32

strength of data sharing policies: multivariate associations

[Diagram: “Journal has a data sharing policy?” shown with Impact Factor; Open Access?; Society Publisher?; and the subject categories Biochemistry & Molecular Biology and Oncology]

Page 33

strength of data sharing policies: associated with impact factor

High-impact journals tend to have a strong data-sharing policy

Page 34

data sharing policies: associated with amount of sharing

For each of the 70 journals, we measured the percent of articles that were cited from within GEO and ArrayExpress.

We considered this a proxy for percent of articles with shared data.

Page 35

data sharing policies: associated with amount of sharing

[Diagram: “% of articles with shared data” shown with Having a data-sharing policy?; Impact Factor; Open Access?; Society Publisher?; and the subject categories Genetics & Heredity and Multidisciplinary Sciences]

Page 36

• our corpus of “gene expression microarray” articles may have included some that reused data and did not themselves produce primary data

• these results should be considered preliminary, pending a more precise filter (stay tuned)

http://www.flickr.com/photos/vlastula/300102949/

Page 37

future work on journal policies

• use a more precise filter to isolate data producing articles and thereby understand the absolute levels of data sharing

• investigate other datatypes

• look at associations with reviewer instructions and opinions

Page 38

future work on funder policies

• are they effective? (stay tuned)

• what do people propose in data sharing plans? Do they do what they propose? Why not?

• quantify the perceived worth of data sharing plans and accomplishments in funding and promotion decisions

Page 39

http://www.flickr.com/photos/cogdog/123072/

3. What other factors are correlated with sharing and withholding data?

Page 40

Prior work has focused on surveys and studies of intention.

Our aim: measure associations between observed data sharing behaviour and environmental variables

Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.
Blumenthal et al. Acad Med. 2006.

Page 41

pilot dataset

Ochsner et al. manually reviewed 20 journals for 2007:

400 studies

200 shared their microarray data

Ochsner et al. (2008). Much room for improvement in deposition rates of expression microarray datasets. Nature Methods, 5(12), 991.

Page 42

pilot variables

Is research data shared after publication?

• Funder mandates
• Journal impact factor
• Investigator “experience”
• Journal mandates

Page 43

funder mandates

NIH 2003 Data Sharing Requirement

Requires a data sharing plan

for studies funded after October 2003

that receive more than $500 000 in direct funding per year

Page 44

funder mandates

Assumed data sharing requirement was applicable if:

the NIH grant numbers associated with the PubMed entry had $750 000 in total funding in any year since 2004

plus

an NIH grant number with a leading “1” or “2” since 2004
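
The applicability rule can be written as a small predicate, sketched below. The record layout (`GrantYear`), the reading of "$750 000" as "at least $750 000", the inclusion of 2004 itself, and the made-up grant numbers are all assumptions for illustration; the slide does not specify them.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class GrantYear:
    grant_number: str   # full NIH grant number; values below are made up
    fiscal_year: int
    total_funding: int  # US dollars awarded in that fiscal year

def sharing_plan_assumed(grant_years: List[GrantYear]) -> bool:
    """Apply the slide's two conditions to the grants linked to one article."""
    recent = [g for g in grant_years if g.fiscal_year >= 2004]
    # Condition 1: some year since 2004 with at least $750 000 in total funding.
    big_award = any(g.total_funding >= 750_000 for g in recent)
    # Condition 2: some grant number since 2004 with a leading "1" or "2".
    new_or_renewal = any(g.grant_number[:1] in ("1", "2") for g in recent)
    return big_award and new_or_renewal

# Hypothetical records for a single article.
article_grants = [
    GrantYear("1R01XX000000-01", 2005, 800_000),
    GrantYear("5R01XX000000-02", 2006, 820_000),
]
print(sharing_plan_assumed(article_grants))
```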

Page 45

author experience

Publication history and impact proxy

First and last authors:

• years since first paper
• h-index (the largest number N such that an author has N papers cited at least N times)
• a-index
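
A minimal sketch of the two indices, given one author's per-paper citation counts. The h-index follows the definition on the slide. The slide does not define the a-index; the sketch assumes the common definition (mean citations of the papers in the h-core), which may differ from the one used in the study.

```python
def h_index(citation_counts):
    """Largest N such that N papers have at least N citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

def a_index(citation_counts):
    """Assumed definition: mean citations of the h most-cited papers."""
    h = h_index(citation_counts)
    if h == 0:
        return 0.0
    core = sorted(citation_counts, reverse=True)[:h]
    return sum(core) / h

# Hypothetical citation counts for one author's five papers.
counts = [10, 8, 5, 4, 3]
print(h_index(counts))  # 4: four papers have at least 4 citations each
print(a_index(counts))  # 6.75: (10 + 8 + 5 + 4) / 4
```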

Page 46

author experience

Author publication history:

Author name disambiguation: Author-ity web service. Torvik & Smalheiser. (2009). Author Name Disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data, 3(3):11.

Citation counts:

Derived h-index (pubmedi citation indices):

Page 48

stats

Univariate odds ratios

Multivariate logistic regression
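
For a single yes/no variable, a univariate odds ratio comes from a 2x2 table. The sketch below uses the standard Wald 95% confidence interval on the log odds ratio; the counts are hypothetical and are not results from the pilot.

```python
import math

def odds_ratio(a, b, c, d):
    """Odds ratio and Wald 95% CI for a 2x2 table.

    a = factor present, data shared      b = factor present, not shared
    c = factor absent, data shared       d = factor absent, not shared
    All four cells must be non-zero.
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se_log)
    high = math.exp(math.log(ratio) + 1.96 * se_log)
    return ratio, low, high

# Hypothetical counts: 100 studies with the factor, 100 without.
ratio, low, high = odds_ratio(60, 40, 40, 60)
print(f"OR = {ratio:.2f}, 95% CI {low:.2f} to {high:.2f}")
```

An interval that excludes 1 is the usual criterion for calling the univariate association statistically significant at the 5% level.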

Page 49

results of pilot

Is research data shared after publication?

• Funder mandates
• Journal impact factor
• Investigator “experience”
• Journal mandates

[Legend: Statistically significant / Not statistically significant]

Page 50

33%

results of pilot

Pages 51 to 56

results of pilot

Page 57

More samples, more variables

http://www.flickr.com/photos/krcla/2069243613/

PhD dissertation

Page 58

More samples:

Developed and evaluated automated methods to:

• Identify studies that generate datasets that could potentially be shared

• Determine which of these have in fact been shared

Page 59

To identify studies that generate datasets,

use a query on the full text of published articles:

("gene expression" AND microarray AND cell AND rna) AND (rneasy OR trizol OR "real-time pcr") NOT ("tissue microarray*" OR "cpg island*")
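
The boolean logic of this query can be illustrated by evaluating it against an article's text, as sketched below. This is not how the full-text search portals run the query: the sketch assumes case-insensitive whole-word matching with no stemming, and treats a trailing `*` as a prefix match. The sample sentences are invented.

```python
import re

REQUIRED = ("gene expression", "microarray", "cell", "rna")   # all must appear
ANY_OF = ("rneasy", "trizol", "real-time pcr")                # at least one
EXCLUDED_PREFIXES = ("tissue microarray", "cpg island")       # trailing * in the query

def has_phrase(text, phrase):
    """True if the phrase occurs as whole words."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text) is not None

def has_prefix(text, phrase):
    """True if the phrase occurs, optionally followed by more letters."""
    return re.search(r"\b" + re.escape(phrase), text) is not None

def matches_query(full_text):
    text = full_text.lower()
    return (
        all(has_phrase(text, term) for term in REQUIRED)
        and any(has_phrase(text, term) for term in ANY_OF)
        and not any(has_prefix(text, term) for term in EXCLUDED_PREFIXES)
    )

# Invented sentences for illustration.
producer = ("Gene expression was profiled on a microarray. "
            "Total RNA from each cell line was extracted with TRIzol.")
print(matches_query(producer))
print(matches_query(producer + " A tissue microarray was also stained."))
```

The second group of terms (extraction kits and real-time PCR) acts as evidence of wet-lab work, which is what separates articles that produced data from those that only discuss microarrays.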

Page 60

To determine which articles have shared data,

use a query on the full text of published articles:

pubmed_gds[filter] and query ArrayExpress
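
One way to apply the PubMed filter programmatically is through the NCBI E-utilities ESearch endpoint, sketched below. The sketch only builds the request URL and makes no network call. The filter name is copied from the slide, the PubMed IDs are placeholders, and the ArrayExpress lookup is not covered.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gds_filter_url(pmids):
    """Build an ESearch URL asking which of the given PubMed IDs pass the filter."""
    id_clause = " OR ".join(f"{pmid}[pmid]" for pmid in pmids)
    term = f"({id_clause}) AND pubmed_gds[filter]"
    params = {"db": "pubmed", "term": term, "retmax": len(pmids)}
    return ESEARCH_URL + "?" + urlencode(params)

# Placeholder IDs for illustration only, not articles from the study.
url = gds_filter_url([11111111, 22222222])
print(url)
```

Articles returned by such a query would be counted as having shared data in GEO; the rest would still need to be checked against ArrayExpress.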

Page 61

More variables:

Use PubMed and a variety of other internet resources...

Page 62

Funder: funded by NIH?; size of grant; sharing plan req’d?; funded by non-NIH?

Journal: impact factor; strength of policy; open access?; number of microarray studies published

Investigator: years since first paper; h-index; a-index; previously shared?; previously reused?; gender

Institution: sector; size; impact rank; country

Study: humans?; mice?; plants?; cancer?; clinical trial?; number of authors; year

Page 63

stats

Univariate odds ratios

Multivariate logistic regression

Exploratory factor analysis

Page 64

http://www.flickr.com/photos/skrb/2427171774/

results?

Page 65

research questions

1. Is there benefit for those who share?

2. Do journal policies increase rates of sharing?

3. What other factors are correlated with sharing and withholding data?

Page 66

what’s next?

Page 67

future work previously mentioned...

• citation analysis of larger cohort

• journal policies with refined filter

• beyond microarray data

• deeper into journal and funder policies

• and, finally....

Page 68

Reuse.

http://www.flickr.com/photos/boitabulle/3668162701/

Page 69

who reuses data?

when?

why aren’t they?

which datasets are most likely to be reused?

what can we do about it?

how many datasets could be reused but aren’t?

why?

who doesn’t?

Page 70

One possible reuse research agenda

1. Inventory reuse acknowledgement patterns

2. Build full-text and metadata filters to identify instances of data reuse

3. Analyze patterns in data reuse choices

4. Survey data producers and data consumers to augment with intentions and perspectives

Page 71

Resources

• GEO list of reuse articles (currently 618)

• Previous work in citation context classification

• Amazon Mechanical Turk for annotation

• Experimental Philosophy for insight into cultural norms

• ...

Teufel et al. (2006) Automatic classification of citation function. EMNLP.

Page 72

Stakeholders

• readers
• reusers
• authors
• editors
• reviewers
• funders
• database designers, maintainers, curators
• patients, subjects, or populations

For their perspectives, and also to design studies that have actionable results for these groups

Page 73

Data sharing plan

I post my data, code, and statistical scripts at http://www.dbmi.pitt.edu/piwowar

Share yours too!

http://www.flickr.com/photos/myklroventine/892446624/

Page 74

thank you

Dept of Biomedical Informatics at U of Pittsburgh

NLM for training grant funding

Open science online community and those who release their articles, datasets and photos openly

Dr Wendy Chapman for her support and feedback

Page 76

“Does anyone want your data?

That’s hard to predict […] After all, no one ever knocked on your door asking to buy those figurines collecting dust in your cabinet before you listed them on eBay.

Your data, too, may simply be awaiting an effective matchmaker.”

Got data? Nature Neuroscience 10, 931 (2007)

Page 77

variables

Journal mandates

Page 80

Correlates with self-reported data withholding

Blumenthal et al. Acad Med. 2006

• industry involvement
• perceived competitiveness of field
• male
• sharing discouraged in training
• human participants
• academic productivity

[Bar chart; horizontal axis from 0 to 3]

Page 81

Self-reported reasons for data withholding

Campbell et al. JAMA 2002.

• sharing is too much effort
• want student or jr faculty to publish more
• they themselves want to publish more
• cost
• industrial sponsor
• confidentiality
• commercial value of results

[Bar chart; horizontal axis from 0% to 80%]

Page 82

Prevalence of data withholding via surveys

• self-reported denying a request in last 3 years
• trainees self-reported denying a request
• been denied access to data, materials, code
• authors “not able to retrieve raw data”
• not willing to release data

[Bar chart; horizontal axis from 0% to 40%]

Campbell et al. JAMA. 2002.
Kyzas et al. J Natl Cancer Inst. 2005.
Vogeli et al. Acad Med. 2006.
Reidpath et al. Bioethics 2001.