Data Citation from the perspective of tracking data reuse


DESCRIPTION

Presentation by Heather Piwowar at DataCite 2011 Summer Meeting, Aug 24 2011.


Data Citation Challenges and Opportunities

from the perspective of Tracking Data Reuse

Heather Piwowar

DataONE postdoc with NESCent and Dryad

@researchremix

DataCite Summer Meeting

August 2011

http://www.metmuseum.org/toah/ho/09/euwf/ho_24.45.1.htm

http://www.flickr.com/photos/jsmjr/62443357/

http://www.flickr.com/photos/camilleharrington/3587294608/

http://www.flickr.com/photos/rkuhnau/3318245976/

http://www.flickr.com/photos/conformpdx/1796399674/

http://www.flickr.com/photos/rkuhnau/3317418699/

http://www.flickr.com/photos/zemlinki/261617721/

http://www.flickr.com/photos/tracenmatt/3020786491/

http://www.flickr.com/photos/the-o/2078239333/

We have observed reuse of at least 35% of GEO datasets submitted in 2005.

Piwowar, Vision, Whitlock (2011) Data archiving is a good investment. Nature 473, 285

http://researchremix.wordpress.com/2011/05/19/nature-letter/

Tracking 1k

10 * 100 = 1000

[Figure: chart with a 0% to 100% axis for GEO, Pangaea, and TreeBASE. Legend: attribution in reference, attribution in footnote, attribution in table, attribution in text.]

[Figure: number of data reuses (axis 0 to 250) by year, 2004 to 2011, for GEO, Pangaea, and TreeBASE, with a moving average for each repository.]

https://notebooks.dataone.org/tracking1000datasets/

Piwowar, Carlson, Vision (2011) Beginning to track 1000 datasets from public repositories into the published literature. ASIS&T poster.

My research blog:

ResearchRemix.wordpress.com

http://www.flickr.com/photos/myklroventine/892446624/

Sarah Judson, Data citation in the wild IDCC 2010 poster.

A best-practice solution!

#1

Lack of tool support for our best practice

We need more diversity

We need more players

We need start-ups

Abstracts are open. Ref lists should be too.

#2

Our best practice doesn’t scale to mega-reuse

[Figure: chart with a 0% to 100% axis and a 0 to 30 axis labeled "Number of datasets referenced by articles that reused GEO data, cumulative".]

But wait!

#2

Our best practice can scale to mega-reuse (if we work at it)

Another place where having a few big players is a bottleneck

Open reference lists.

#3a

Adoption of best practices erodes incentives in the short term

~70% in multivariate analysis

#3b

Data citations only matter if they are valued

Please don’t tweet or publicize this next bit...

Early results from an ongoing survey.

n=538

[Figure: share of responses (axis 0% to 50%) on a scale from strongly disagree through neutral to strongly agree. Statement: "Pleased others can build on my work more easily".]

[Figure: share of responses (axis 0% to 50%) on a scale from strongly disagree through neutral to strongly agree. Statement: "Pleased I will get more citations".]

Do not publicize

[Figure: share of responses (axis 0% to 50%) on a scale from strongly disagree through neutral to strongly agree. Statement: "Pleased it will be valued by my funder".]

[Figure: share of responses (axis 0% to 50%) on a scale from strongly disagree through neutral to strongly agree. Statement: "Pleased it will be valued by my promotion or tenure committee".]


Top-down

Bottom-up


DataCite!

thank you

Todd Vision, Jonathan Carlson, Estephanie Sta Maria, Nicholas Weber, Sarah Judson, Valerie Enriquez

Jason Priem and Beyond Impact

Dryad and DataONE teams

The open science online community and those who release their articles, datasets and photos openly

blog: ResearchRemix.wordpress.com

No consistent practice

Sarah Judson, Data citation in the wild IDCC 2010 poster.

We reviewed 500 articles in six major evolution and ecology journals for evidence of data citation.


In 2009, 116 articles cited ORNL DAAC data.

Finding these articles took 70-80 hours

across at least 12 resources, all chosen from a deep understanding of this specific research domain

then the full text of all the hits was manually reviewed

Valerie Enriquez interview with James Kidder

http://openwetware.org/wiki/DataONE:Notebook/Reuse_of_repository_data
