Heather Piwowar @researchremix DataONE postdoc with NESCent and Dryad #idcc11 A future where data attribution Counts some photos NC, SA

a future where data citation Counts


Presentation by Heather Piwowar at International Digital Curation Conference #idcc


Page 1: a future where data citation Counts

Heather Piwowar @researchremix DataONE postdoc with NESCent and Dryad

#idcc11

A future where data attribution Counts

some photos NC, SA

Page 2: a future where data citation Counts

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

If I have seen farther it is by standing on the shoulders of giants, said Isaac Newton and others before him.

While historians speculate that Isaac Newton was actually being sarcastic,

Page 3: a future where data citation Counts

http://www.flickr.com/photos/jsmjr/62443357/

most of us would agree that science progresses by standing on the shoulders of those who came before. Or by kneeling on their backs. Or clambering up their work any other way we can.

Page 4: a future where data citation Counts

http://www.flickr.com/photos/camilleharrington/3587294608/

Many of us believe that when we share our research output, not only as published research descriptions, but also in the form of open datasets and methods, we are, in effect, making our shoulders broader.

Page 5: a future where data citation Counts

http://www.flickr.com/photos/rkuhnau/3318245976/

All of a sudden, a lot more people can build on our work.

Page 6: a future where data citation Counts

http://www.flickr.com/photos/conformpdx/1796399674/

Researchers can climb higher than otherwise possible,

Page 7: a future where data citation Counts

http://www.flickr.com/photos/rkuhnau/3317418699/

and jump up and down on our findings to make sure they are really stable.

Page 8: a future where data citation Counts

http://www.flickr.com/photos/zemlinki/261617721/

It allows contributions from places we may never have expected,

Page 9: a future where data citation Counts

http://www.flickr.com/photos/tracenmatt/3020786491/

and investigators can explore places they never could have on their own.

Page 10: a future where data citation Counts

http://www.flickr.com/photos/the-o/2078239333/

In short, our broad-shouldered research can make a contribution that far exceeds its original role.

Page 11: a future where data citation Counts

This is a great story, right? And why we are all here.

But it is also a great metaphor for the problem.

Page 12: a future where data citation Counts

http://www.flickr.com/photos/davemurr/4592014327/

What exactly do broad shoulders get the individual researcher?

Pain!

Because a few citations, as much as we'd like to think otherwise, aren't enough to offset the hard work and the Fear, Uncertainty, and Doubt that accompany uploading a dataset in the current culture.

Page 13: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

Nobody looks at the supporting structure of an impressive tower. We are all busy ogling the top. That means these people? These ones with the shoulders? They've got nothing.

Page 14: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

everyone is looking at this guy

Page 15: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

Not this one. He’s not getting any fame or glory here; he isn’t making great strides in his career.

Page 16: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

ok, maybe this guy gets some citations. Not enough.

Page 17: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

everyone is looking at this guy

Page 18: a future where data citation Counts

http://www.flickr.com/photos/supersam5/216868485/

This person

Page 19: a future where data citation Counts

http://www.flickr.com/photos/commissariat/4829261601/in/faves-30112411@N02/

Somebody else gets to be top dog. And I think a lot of researchers actually believe that by making their shoulders broader they enable others to become top dog at their expense.

Page 20: a future where data citation Counts

http://www.flickr.com/photos/sunrise/35819369/

A few citations aren’t enough to overcome that fear.

Page 21: a future where data citation Counts

Gleditsch et al. 2003. Posting Your Data: Will You Be Scooped or Will You Be Famous? International Studies Perspectives 4(1): 89–97.

Piwowar et al. 2007. Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE.

Ioannidis et al. 2009. Repeatability of Published Microarray Gene Expression Analyses. Nature Genetics 41: 149–155.

Pienta et al. 2010. NSR Social Science Secondary Use. Michigan IR.

Henneken et al. 2011. Linking to Data – Effect on Citation Rates in Astronomy. ESO.

Sears 2011. Data Sharing Effect on Article Citation Rate in Paleoceanography. AGU.

Don't get me wrong, I'm a fan of studies that show a citation benefit for sharing data :) . But it won't be enough.

Page 22: a future where data citation Counts

http://www.flickr.com/photos/bfhoyt/4606049592/

If it were, we'd have researchers knocking down the doors of our IR for the 10-minute job of sending in their preprints. They aren't doing that.

Page 23: a future where data citation Counts
Page 24: a future where data citation Counts
Page 25: a future where data citation Counts

but....

Page 26: a future where data citation Counts
Page 27: a future where data citation Counts
Page 28: a future where data citation Counts

http://www.flickr.com/photos/davemurr/4592014327/

What exactly do broad shoulders get the individual researcher?

Pain!

Because a few citations, as much as we'd like to think otherwise, aren't enough to offset the hard work and the Fear, Uncertainty, and Doubt that accompany uploading a dataset in the current culture.

Page 29: a future where data citation Counts

So.

So.

What to do about it? How to change the culture?

Page 30: a future where data citation Counts

We need to facilitate deep recognition of the labour of dataset creation.

We need to facilitate deep recognition of the labour of dataset creation. Hat tip to John Wilbanks.

Ok let me say that again because it is so important

We need to facilitate deep recognition of the labour of dataset creation.

Page 31: a future where data citation Counts

http://www.flickr.com/photos/g_kat26/4255119413/

Let's dig in to how these groups do impact tracking now, and how they'd like to do it in the future.

Page 32: a future where data citation Counts

http://www.flickr.com/photos/joshb/25983792

How do researchers value their own contributions now?

Page 33: a future where data citation Counts

http://www.flickr.com/photos/europedistrict/5692787622/

Data repositories, who we might view as perhaps personal trainers.

Page 34: a future where data citation Counts

http://www.flickr.com/photos/digitaljourney/5767535618/

and funders, the ones who pay for all of the gym equipment

Page 35: a future where data citation Counts

Researchers

Page 36: a future where data citation Counts

Investigators, today, can list research products on a CV. This can include datasets.

Page 37: a future where data citation Counts

Investigators, today, can list research products on a CV. This can include datasets.

Page 38: a future where data citation Counts

http://total-impact.org

A CV is sort of bland, don't you think? It has no context of use.

One version of a more useful future comes from a tool called total-Impact. It continues a project that started as a hackathon at the Open Society Foundation workshop Beyond Impact, organized by Cameron Neylon here in the UK last spring. Jason Priem, a few other people, and I have been working on it.

Page 39: a future where data citation Counts

http://total-impact.org

total-Impact aggregates citation metrics and also non-traditional metrics, for traditional research products like articles.

Page 40: a future where data citation Counts

http://total-impact.org

You can drill in.

The metrics are citations, but also altmetrics. PLoS has done some of the groundbreaking work in this space with article-level citations, but a lot of other metrics are available also... various indications that others have found your research worth bookmarking, or blogging, or referencing on Wikipedia.

Page 41: a future where data citation Counts

http://total-impact.org

Also non-traditional research products like datasets.

It doesn't currently look for dataset identifiers in public R packages, but it could, for example, as indication of use.

This makes a “live CV” if you will, giving post-publication context to research output.
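As a rough illustration of looking for dataset identifiers as an indication of use, here is a minimal Python sketch. It is not how total-Impact works. It assumes Dryad-style DOIs (the 10.5061/dryad prefix), the function name find_dataset_dois is hypothetical, and the sample text is a made-up stand-in for a file from a public R package.

```python
import re

# Assumption: we only look for Dryad-style DOIs (prefix 10.5061/dryad).
DRYAD_DOI = re.compile(r"10\.5061/dryad\.[A-Za-z0-9.]+")

def find_dataset_dois(text):
    """Return the distinct Dryad-style DOIs in text, in order of first appearance."""
    found = []
    for match in DRYAD_DOI.finditer(text):
        doi = match.group(0).rstrip(".")  # drop a sentence-ending period
        if doi not in found:
            found.append(doi)
    return found

# Hypothetical snippet standing in for source code from a public R package.
sample = """
# Data from http://dx.doi.org/10.5061/dryad.18.
dat <- read.csv("dryad18.csv")  # see doi:10.5061/dryad.18
"""

print(find_dataset_dois(sample))
```

Each package that mentions an identifier could then count as one indication of use for that dataset, alongside citations and downloads.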

Page 42: a future where data citation Counts

http://total-impact.org

This is where citations would go. More on that later.

Page 43: a future where data citation Counts

Repositories

Repositories, today,

Page 44: a future where data citation Counts

http://dx.doi.org/10.5061/dryad.18

can look at graphs of their deposit counts.

Many know their own download statistics; some share these with their authors or the public.

Page 45: a future where data citation Counts

http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/3131/utilization

As a result of intensive manual digging, some have metrics about how many times their datasets have been mentioned in the literature.

Page 46: a future where data citation Counts

http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/3131/utilization

They have details about what was downloaded.

Page 47: a future where data citation Counts

http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/3131/utilization

In cases where logons are required to get the data, repositories have information about who is downloading. These stats are from ICPSR for one dataset, and they are publicly available.

Page 48: a future where data citation Counts

I'll splash by a few graphs of preliminary research findings.... come find me or my blog if you want more info.

Using manual annotation we are starting to be able to estimate third party reuse. In terms of raw numbers, with extrapolations

Page 49: a future where data citation Counts

Teasing out use by the original authors from use by third parties who probably only got access to the data because of the repository. Tools that support data citation will help with this.

Page 50: a future where data citation Counts

We have observed reuse of at least 35% of GEO datasets submitted in 2005.

And we can look at the distribution of data use across all of the datasets in the repository. Is it 1% of the datasets that drive all the use? Nope, it looks like use is often distributed across a broad population of datasets.
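The "is it 1% of the datasets that drive all the use?" question can be checked with a few lines of Python. This is only a sketch under stated assumptions: the reuse counts below are invented for illustration and are not the GEO figures, and the function names fraction_reused and top_share are hypothetical.

```python
def fraction_reused(reuse_counts):
    """Fraction of datasets with at least one observed reuse."""
    if not reuse_counts:
        return 0.0
    return sum(1 for count in reuse_counts if count > 0) / len(reuse_counts)

def top_share(reuse_counts, top_fraction=0.01):
    """Fraction of all observed reuse that comes from the most-reused datasets."""
    total = sum(reuse_counts)
    if total == 0:
        return 0.0
    ranked = sorted(reuse_counts, reverse=True)
    top_n = max(1, round(len(ranked) * top_fraction))  # at least one dataset
    return sum(ranked[:top_n]) / total

# Invented example: 100 datasets, one reuse count per dataset.
counts = [0] * 65 + [1] * 20 + [2] * 10 + [3] * 5

print(f"datasets reused at least once: {fraction_reused(counts):.0%}")
print(f"share of reuse from the top 1%: {top_share(counts):.1%}")
```

A top-1% share close to 100% would mean a handful of datasets drive all the use; a small share means use is spread across a broad population of datasets.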

Page 51: a future where data citation Counts

Piwowar, Vision, Whitlock (2011) Data archiving is a good investment. Nature letter to the editor: 473, p285.

http://researchremix.wordpress.com/2011/05/19/nature-letter/

This sort of information is very valuable for repositories when they want to make their case.

As I said, right now we can get some of this information through a lot of painful manual searching across the internet. Data citations will help reduce some of this burden.

Page 52: a future where data citation Counts

Indispensable

What repositories really want, though -- correct me if I’m wrong -- is to show that they are indispensable. That they generate new, profound science not otherwise possible. That they are a great financial investment in scientific progress. This requires knowing more than just a citation count; it requires knowing the context of reuse. This means we need access to the full text of the paper that cites the data.

Page 53: a future where data citation Counts

Funders

What about funders?

Page 54: a future where data citation Counts

http://www.flickr.com/photos/n2artscapes/3527520456/

They want to know the impact the data had on society. Did it facilitate innovation, reduce discrimination, create jobs, save the rainforest, increase our GDP?

That kind of tracking is beyond what any of us know how to do yet :)

We're going to need digital tracking technology that, as far as I know, isn't available yet but that I'm sure people are working on. Google Analytics meets digital RFID tags.... I dunno... but I do know we need it. Furthermore, we need these digital tracking mechanisms to be affordable and open, to facilitate mashups.

Page 55: a future where data citation Counts

Ok, so with that sort of future vision for tracking, what do we, as a scholarly ecosystem, need to power this future world?

Page 56: a future where data citation Counts

innovation and experimentation

We need innovation and experimentation.

Page 57: a future where data citation Counts

http://www.flickr.com/photos/jo-h/2688026447/

We need 1000 flowers blooming

We need solutions that are open and generative

We need data that is open and generative

I don't have all the answers, but here is part of it:

Page 58: a future where data citation Counts

open access to citation data

We can't just rely on Scopus, Thomson, and Google Scholar.

Those are only three players. They are good at what they do and have been invaluable, but they can't possibly be as nimble as a whole bunch of startups.

It is taking them a long time to come out with a data tracking tool. Why? Probably because they have an ambitious vision and need time to fit it into their other product offerings. That isn’t a bad thing... but at the same time, some of the rest of us would be happy with iterating on a quick and dirty solution.

We need more competition in this space. The barrier to entry is extraordinarily high because of course reference lists are almost all behind copyright and paywalls.... but open access publications give us a toehold.

Page 59: a future where data citation Counts

open access to full text

Open access to full text.

Open access also gives us a toehold into citation context information.

A citation to a dataset tells us that the dataset played some role in that new research paper. What role? Was it used to validate a new method? Detect errors? Was it combined with other datasets to solve a problem that was otherwise intractable? The answers to these questions are fundamental to what funders and others need to know about impact.

It won't be easy to derive them from the text of the paper, but I strongly believe it is possible.
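To make the idea of citation context concrete, here is a deliberately naive Python sketch: pull out the sentences in an open access full text that mention a dataset identifier, and guess the role from cue words. Everything in it is an assumption for illustration: the cue-word lists, the function name citation_contexts, and the example paper text are all made up, and real papers would need something much more robust than keyword matching.

```python
import re

# Hypothetical cue words for each kind of reuse; a toy list, not a validated one.
ROLE_CUES = {
    "validation": ("validat", "benchmark"),
    "error detection": ("error", "discrepanc"),
    "combination": ("combined", "pooled", "meta-analysis"),
}

def citation_contexts(full_text, dataset_id):
    """Return (sentence, guessed_roles) for each sentence mentioning dataset_id."""
    # Split after sentence-ending punctuation that is followed by whitespace,
    # so the periods inside a DOI do not break a sentence apart.
    sentences = re.split(r"(?<=[.!?])\s+", full_text.strip())
    contexts = []
    for sentence in sentences:
        if dataset_id in sentence:
            lowered = sentence.lower()
            roles = [role for role, cues in ROLE_CUES.items()
                     if any(cue in lowered for cue in cues)]
            contexts.append((sentence, roles))
    return contexts

# Made-up full text standing in for an open access paper.
paper = ("We combined our samples with the dataset doi:10.5061/dryad.18 "
         "to increase power. The results were robust to outliers.")

for sentence, roles in citation_contexts(paper, "10.5061/dryad.18"):
    print(roles, "<-", sentence)
```

Even a crude pass like this only works when the full text is open, which is the point of this slide.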

Page 60: a future where data citation Counts

open access to other metrics

Open access to other use.

We need broad-based metrics... not just citations, but blog posts about data, slides that include R and STATA tutorials about data, bookmarks to data on bookmarking sites. Altmetrics. If you run a data repository, make your download stats publicly available. We frankly don't know what all of this info means yet, but we didn't know what citations to papers meant 50 years ago either. We'll all figure it out; the more data the better.

Page 61: a future where data citation Counts

here’s what each of us needs to do

Page 62: a future where data citation Counts

1. raise our expectations

raise our expectations

Page 63: a future where data citation Counts

http://www.flickr.com/photos/quinnanya/2055471833

what can and should be open and able to be mashed up

what each of us can do to make a difference

what we must do

Page 64: a future where data citation Counts

2. raise our voices

raise our voices

Page 65: a future where data citation Counts
Page 66: a future where data citation Counts

3. get excited and make things

here’s what each of us needs to do

Page 68: a future where data citation Counts

1. raise our expectations
2. raise our voices
3. get excited and make things

here’s what each of us needs to do

Page 69: a future where data citation Counts

http://www.flickr.com/photos/huzzahvintage/4577075021/

These things will make shoulders that get noticed wherever they go, and that get recognition when they make a dramatic impact.

Page 70: a future where data citation Counts

A future where data attribution Counts

Page 71: a future where data citation Counts

A future about what kind of impact a dataset makes, not just a citation number.

Page 72: a future where data citation Counts

http://www.flickr.com/photos/myklroventine/892446624/

The future is

The future is open.

Page 73: a future where data citation Counts

Open data. Open data about our data.

Page 74: a future where data citation Counts

thank you

Todd Vision, Jonathan Carlson, Estephanie Sta Maria, Jason Priem, total-Impact and Beyond Impact

Dryad and DataONE teams

The open science online community and those who release their articles, datasets and photos openly

blog: ResearchRemix.wordpress.com

@researchremix

Page 75: a future where data citation Counts

1. raise our expectations
2. raise our voices
3. get excited and make things