Communicating with Data 2010 Annual Meeting

COMMUNICATING WITH DATA: NEW ROLES FOR RESEARCHERS, PUBLISHERS AND LIBRARIES

MacKenzie Smith

Associate Director for Technology, MIT Libraries

Science Commons Research Fellow, Creative Commons

CrossRef Annual Meeting ©2010, MIT

That Was Then

The world’s first hard drive (5 MB), IBM Almaden Research Center, 1952-1954

This Is Now

Current-capacity hard drive (>2 TB), Google Data Center, 2010

How Much Information?

“IDC research shows that the digital universe — information that is either created, captured, or replicated in digital form — was 281 exabytes in 2007. In 2011, the amount of digital information produced in the year should equal nearly 1,800 exabytes, or 10 times that produced in 2006. The compound annual growth rate between now and 2011 is expected to be almost 60%”

The Diverse and Exploding Digital Universe, 2008 IDC White Paper
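The “almost 60%” growth figure can be checked against the two endpoints in the quote. A minimal Python sketch, assuming the 281 exabytes (2007) and roughly 1,800 exabytes (2011) are comparable annual totals four years apart; `cagr` is our own helper name, not anything defined in the IDC paper.

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# IDC figures quoted above, in exabytes (assumed comparable annual totals).
rate = cagr(281.0, 1800.0, 2011 - 2007)
print(f"Implied CAGR 2007-2011: {rate:.1%}")  # about 59%
```

The result, roughly 59% per year, is consistent with IDC's “almost 60%”.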

How Much Information?

Sequence Submissions to DNA Data Bank of Japan, 1993-2005

What Is Research Data?

Observational e.g. sensor, telemetry, survey, sample data

Experimental e.g. genetic sequences, chromatograms

Simulation e.g. climate, economic, 3-D models

Media e.g. images, audio, video

Derived/compiled e.g. text/data mining, compiled databases

Often expensive or impossible to reproduce

What Is Research Data?

Text e.g. flat text files, Word, PDF

Numerical e.g. SPSS, Stata, Excel, MySQL

Media e.g. jpeg, tiff, dicom, mpeg, quicktime

Models e.g. 3D, statistical

Software e.g. Java, C programs

Domain-specific e.g. FITS in astronomy, CIF in chemistry

Instrument-specific e.g. Olympus confocal microscope

Not always in neat packages like books

What Do Researchers Do With Data?

Analyze (e.g. process, visualize)

Share

Review (evaluate methods)

Annotate

Cite

Re-use (reproduce results)

Re-purpose (e.g. integrate)

Data Sharing Innovations

New-fangled Hybrid Articles: integrate text, data and tools

Enhanced PDFs

Linked Open Data: access to data via Web standards to encourage large-scale interoperability

“Data Papers”
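Linked Open Data expresses data as RDF statements whose subjects and properties are named by HTTP URIs, which is what lets independently published datasets be combined. A minimal Python sketch that emits N-Triples using Dublin Core Terms (`http://purl.org/dc/terms/`); the dataset and article URIs under `example.org` are hypothetical placeholders, and the helper functions are our own.

```python
# Dublin Core Terms namespace (a widely used metadata vocabulary).
DCTERMS = "http://purl.org/dc/terms/"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Quote a plain string literal, escaping characters N-Triples reserves."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def to_ntriples(triples: list[tuple[str, str, str]]) -> str:
    """Serialize (subject, predicate, object) terms, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


# Hypothetical dataset and article URIs, for illustration only.
dataset = iri("http://example.org/dataset/42")
triples = [
    (dataset, iri(DCTERMS + "title"), literal("Example dataset")),
    (dataset, iri(DCTERMS + "license"),
     iri("https://creativecommons.org/publicdomain/zero/1.0/")),
    (dataset, iri(DCTERMS + "isReferencedBy"), iri("http://example.org/article/7")),
]
print(to_ntriples(triples))
```

Because every term is a URI, another dataset that refers to the same dataset URI can be merged with this one simply by concatenating the statements; a production system would normally use an RDF library rather than hand-built strings.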

Issues in Data Curation

Storage: very large scale

Metadata: what standard to use?

Provenance: research methods

Identifiers: scalability, persistence

Preservation: format diversity (see the formats under “What Is Research Data?”)

Sharing: laws confusing, not interoperable

Data Sharing Trends

“The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.” NIH grant proposal guide

Similar data management and sharing mandates from the US NSF and other funding agencies worldwide

Journals mandating deposit (e.g. Journal of Evolutionary Biology)

Data Interoperability

IPR and data licenses

Much data is not copyrightable, since facts cannot be copyrighted

The UK, EU and some other countries have sui generis database rights

National laws are not “interoperable”

Big problem for international scientific collaborations and data re-purposing

BWIN presentation ©2010, MIT

Libraries and Data

Established curation for some data types

statistical (Harvard-MIT Data Center)

geospatial (Geodata Repository)

bioinformatics (via NLM NCBI)

digital media (e.g. images, videos)

datasets (institutional repository digital archives)

Libraries and Data

Applies to both faculty-authored and externally-acquired data

Consultation services (in-person, via website)

Liaise with data archives (e.g. ICPSR)

Develop (meta)data standards (e.g. DDI)

Manage and preserve data

Robotics Data in DSpace@MIT

The Library:

Defined local taxonomy for metadata values

Customized metadata records

Adapted/simplified deposit workflow

Loaded data from previous repository

Added CC0 licenses

Review of new deposits done by community
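A local taxonomy lets metadata values be checked against a controlled list at deposit time. The sketch below is hypothetical throughout: the taxonomy terms, the record fields and the validation rules are illustrative assumptions, not DSpace's API or the actual DSpace@MIT configuration.

```python
from dataclasses import dataclass, field

# Hypothetical local taxonomy for a robotics collection (illustrative only).
LOCAL_TAXONOMY = {"manipulation", "localization", "mapping", "navigation"}
CC0_URI = "https://creativecommons.org/publicdomain/zero/1.0/"


@dataclass
class DepositRecord:
    """Hypothetical deposit record with a few customized metadata fields."""
    title: str
    subjects: list[str] = field(default_factory=list)
    rights_uri: str = ""


def validate(record: DepositRecord) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    unknown = sorted(s for s in record.subjects if s not in LOCAL_TAXONOMY)
    if unknown:
        problems.append(f"subjects not in local taxonomy: {unknown}")
    if record.rights_uri != CC0_URI:
        problems.append("rights must be CC0")
    return problems


ok = DepositRecord("Indoor laser scans", ["mapping", "localization"], CC0_URI)
bad = DepositRecord("", ["SLAM"], "")
print(validate(ok))   # []
print(validate(bad))
```

Checks like these could run in a simplified deposit workflow, leaving the community reviewers to judge content rather than formatting.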

New Roles for Scholarly Data Communication

Researcher’s Role: Data Provision

e.g. Sage Commons

“The Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics.

The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology.  Users will also be contributors that advance the knowledge base and tools through their cumulative participation.

The public access mission of the Sage Commons requires the development of a new strategic and legal framework to protect the rights of contributors while providing widespread access to integrative genomics resources.”

Library’s Role: Data Curation

Data organization and annotation e.g. ontologies and metadata

Data archiving, preservation e.g. perpetual access

Outreach and support to local researchers
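One routine task behind long-term archiving is fixity checking: recording a checksum when data is ingested and re-verifying it later to detect silent corruption. This is a common preservation technique, not one the slide prescribes. A minimal Python sketch using SHA-256 from the standard library; the manifest format (relative path mapped to hex digest) is our own assumption.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return files that are missing or whose contents no longer match."""
    return [
        rel
        for rel, expected in manifest.items()
        if not (root / rel).is_file() or sha256_of(root / rel) != expected
    ]


# Demo on a throwaway directory: ingest, then simulate silent corruption.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data.csv").write_text("x,y\n1,2\n")
    manifest = build_manifest(root)
    before = verify(root, manifest)
    (root / "data.csv").write_text("x,y\n1,3\n")
    after = verify(root, manifest)

print(before, after)  # [] ['data.csv']
```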

Publisher’s Role: Data Accreditation

Require data deposit to archives

Publish data journals

Manage peer review (quality control)

Provide credit for data publishing (evolution of promotion & tenure system)

Data Papers Revisited

“a formal publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it.”

1. Organize peer-review, establish quality-control measures

2. Create citable entity

3. Establish cross-linking mechanisms with traditional papers, to enforce separation of concerns (methodology vs analysis)

4. Specify required documentation to make data re-usable, re-purposable

5. Apply a standard, interoperable legal license (CC0 or PDDL with normative attribution, CC BY with URI attribution)

6. Ensure archiving strategy in place

Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010, http://neurocommons.org/report/data-publication.pdf
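Recommendations 2 through 6 above can be checked mechanically before acceptance; recommendation 1 (peer review) still needs human judgment. A minimal Python sketch, assuming a hypothetical submission record: the field names, the license labels and the example DOIs are our own illustration, not a schema from the working paper.

```python
from dataclasses import dataclass, field

# Licenses named in recommendation 5, written as simple labels.
ACCEPTED_LICENSES = {"CC0", "PDDL", "CC BY"}


@dataclass
class DataPaperSubmission:
    """Hypothetical submission record; field names are illustrative only."""
    identifier: str = ""                        # persistent ID, e.g. a DOI
    related_articles: list[str] = field(default_factory=list)
    documentation: str = ""                     # description needed for re-use
    license: str = ""
    archive: str = ""                           # where the data will be kept


def unmet_recommendations(sub: DataPaperSubmission) -> list[str]:
    """List the mechanically checkable recommendations (2-6) that are not met."""
    checks = [
        (bool(sub.identifier.strip()), "2: no citable identifier"),
        (bool(sub.related_articles), "3: no links to traditional papers"),
        (bool(sub.documentation.strip()), "4: no re-use documentation"),
        (sub.license in ACCEPTED_LICENSES, "5: license is not CC0, PDDL or CC BY"),
        (bool(sub.archive.strip()), "6: no archiving arrangement"),
    ]
    return [message for passed, message in checks if not passed]


complete = DataPaperSubmission(
    identifier="10.1234/example-dataset",       # hypothetical DOI
    related_articles=["10.1234/example-article"],
    documentation="Column definitions, units and collection methods",
    license="CC0",
    archive="Institutional repository",
)
draft = DataPaperSubmission(identifier="10.1234/example-dataset",
                            license="All rights reserved")

print(unmet_recommendations(complete))  # []
print(unmet_recommendations(draft))
```

Keeping the checks as a list of (condition, message) pairs makes it easy to add or drop a requirement without touching the reporting logic.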

Questions?
