COMMUNICATING WITH DATA: NEW ROLES FOR RESEARCHERS,
PUBLISHERS AND LIBRARIES
MacKenzie Smith
Associate Director for Technology, MIT Libraries
Science Commons Research Fellow, Creative Commons
CrossRef Annual Meeting ©2010, MIT
The world’s first hard drive (5 MB), IBM Almaden Research Center, 1952-1954
That Was Then
Current-capacity hard drive (>2 TB), Google Data Center, 2010
This is Now
How Much Information?
“IDC research shows that the digital universe —information that is
either created, captured, or replicated in digital form — was 281
exabytes in 2007. In 2011, the amount of digital information
produced in the year should equal nearly 1,800 exabytes, or 10
times that produced in 2006. The compound annual growth rate
between now and 2011 is expected to be almost 60%”
The Diverse and Exploding Digital Universe, 2008 IDC White Paper
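The quoted growth rate can be checked against the two figures in the passage. The sketch below assumes the 281 exabyte figure for 2007 and the 1,800 exabyte figure for 2011 are comparable endpoints, which the quote implies but does not state outright; the function name is illustrative only.

```python
def compound_annual_growth_rate(start, end, years):
    """Return the constant yearly growth rate that turns `start` into `end`."""
    return (end / start) ** (1 / years) - 1


# Figures taken from the IDC quote above (exabytes).
rate = compound_annual_growth_rate(start=281, end=1800, years=2011 - 2007)
print(f"{rate:.1%}")  # roughly 59%, consistent with "almost 60%"
```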
How Much Information?
Sequence Submissions to DNA DataBank of Japan 1993-2005
What Is Research Data?
Observational e.g. sensor, telemetry, survey, sample data
Experimental e.g. genetic sequences, chromatograms
Simulation e.g. climate, economic, 3-D models
Media e.g. images, audio, video
Derived/compiled e.g. text/data mining, compiled databases
Often expensive or impossible to reproduce
What Is Research Data?
Text e.g. flat text files, Word, PDF
Numerical e.g. SPSS, STATA, Excel, MySQL
Media e.g. jpeg, tiff, dicom, mpeg, quicktime
Models e.g. 3D, statistical
Software e.g. Java, C programs
Domain-specific e.g. FITS in astronomy, CIF in chemistry
Instrument-specific e.g. Olympus con-focal microscope
Not always in neat packages like books
What Do Researchers Do With Data?
Analyze (e.g. process, visualize)
Share
Review (evaluate methods)
Annotate
Cite
Re-use (reproduce results)
Re-purpose (e.g. integrate)
Data Sharing Innovations
New-fangled Hybrid Articles
Integrate text, data and tools
Enhanced PDFs
Linked Open Data
Access to data via Web standards to encourage large-scale interoperability
“Data Papers”
Issues in Data Curation
Storage: very large scale
Metadata: what standard to use?
Provenance: research methods
Identifiers: scalability, persistence
Preservation: see slide #5 on formats
Sharing: laws confusing, not interoperable
Data Sharing Trends
“The NIH expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers.”
NIH grant proposal guide
Similar data management and sharing mandates from US NSF, other funding agencies worldwide
Journals mandating deposit (e.g. Journal of Evolutionary Biology)
Data Interoperability
IPR and data licenses
Lots of data not copyrightable since facts cannot be copyrighted
UK, EU, some other countries have sui generis data rights
Laws not “interoperable”
Big problem for international scientific collaborations and data re-purposing
Libraries and Data
Established curation for some data types
statistical (Harvard-MIT Data Center)
geospatial (Geodata Repository)
bioinformatics (via NLM NCBI)
digital media (e.g. images, videos)
datasets (IR digital archives)
Libraries and Data
Applies to both faculty-authored and externally-acquired data
Consultation services (in-person, via Website)
Liaise with data archives (e.g. ICPSR)
Develop (meta)data standards (e.g. DDI)
Manage and preserve data
Robotics Data in DSpace@MIT
The Library:
Defined local taxonomy for metadata values
Customized metadata records
Adapted/simplified deposit workflow
Loaded data from previous repository
Added CC0 licenses
Review of new deposits done by community
New Roles for Scholarly Data Communication
Researcher’s Role: Data Provision
e.g. Sage Commons
“The Sage Commons is a novel information platform being built by an international partnership of researchers and stakeholders to define the molecular basis of disease and guide the development of effective human therapeutics and diagnostics.
The Sage Commons will be used to integrate diverse molecular mega-data sets, to build predictive bionetworks and to offer advanced tools proven to provide unique new insights into human disease biology. Users will also be contributors that advance the knowledge base and tools through their cumulative participation.
The public access mission of the Sage Commons requires the development of a new strategic and legal framework to protect the rights of contributors while providing widespread access to integrative genomics resources.”
Library’s Role: Data Curation
Data organization and annotation e.g. ontologies and metadata
Data archiving, preservation e.g. perpetual access
Outreach and support to local researchers
Publisher’s Role: Data Accreditation
Require data deposit to archives
Publish data journals
Manage peer review (quality control)
Provide credit for data publishing (evolution of promotion & tenure system)
Data Papers Revisited
“a formal publication whose primary purpose is to expose and describe data, as opposed to analyze and draw conclusions from it.”
1. Organize peer-review, establish quality-control measures
2. Create citable entity
3. Establish cross-linking mechanisms with traditional papers, to enforce separation of concerns (methodology vs analysis)
4. Specify required documentation to make data re-usable, re-purposable
5. Apply standard interoperable legal license (CC0 or PDDL with normative attribution, CC-BY with URI attribution)
6. Ensure archiving strategy in place
Jonathan Rees, Recommendations for independent scholarly publication of data sets, Creative Commons Working Paper, March 2010,
http://neurocommons.org/report/data-publication.pdf
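The six recommendations above can be read as a checklist that a publisher or repository might run against each data paper before release. The following is only a sketch under that reading: the `DataPaperRecord` class, its field names, the short license labels, and the placeholder DOI are all hypothetical and do not come from the cited working paper or from any real schema.

```python
from dataclasses import dataclass, field
from typing import List

# Licenses named in recommendation 5; the short labels are illustrative.
ACCEPTED_LICENSES = {"CC0", "PDDL", "CC-BY"}


@dataclass
class DataPaperRecord:
    """Hypothetical description of one data paper (not a real schema)."""
    identifier: str = ""                 # citable entity, e.g. a DOI
    peer_reviewed: bool = False          # quality-control step completed
    linked_papers: List[str] = field(default_factory=list)
    documentation: str = ""              # what a re-user needs to know
    license: str = ""
    archive: str = ""                    # where the data is preserved


def unmet_recommendations(record: DataPaperRecord) -> List[str]:
    """List the recommendations (in the order above) the record fails."""
    checks = {
        "peer review": record.peer_reviewed,
        "citable identifier": bool(record.identifier),
        "cross-links to traditional papers": bool(record.linked_papers),
        "documentation for re-use": bool(record.documentation),
        "standard license": record.license in ACCEPTED_LICENSES,
        "archiving strategy": bool(record.archive),
    }
    return [name for name, ok in checks.items() if not ok]


# Placeholder values for illustration only.
draft = DataPaperRecord(
    identifier="doi:10.0000/example", peer_reviewed=True, license="CC0"
)
print(unmet_recommendations(draft))
```

For this draft the function reports the three items still missing: cross-links, documentation, and an archiving strategy. Simple presence checks are used on purpose; judging whether documentation is actually sufficient for re-use remains a task for reviewers.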
Questions?