Prepared for Second Open Economics International Workshop June 2013 “Not-bad” Practices for Sharing Economics Data Dr. Micah Altman <[email protected]> Director of Research, MIT Libraries Non-Resident Senior Fellow, The Brookings Institution

Best Practices for Sharing Economics Data

Presented at OKFN/MIT/Sloan 2nd Open Economics International Workshop. http://openeconomics.net/events/workshop-june-2013/

Page 1: Best Practices for Sharing Economics Data

Prepared for

Second Open Economics International Workshop

June 2013

“Not-bad” Practices for Sharing Economics Data

Dr. Micah Altman <[email protected]>

Director of Research, MIT Libraries
Non-Resident Senior Fellow, The Brookings Institution

Page 2: Best Practices for Sharing Economics Data

DISCLAIMER
These opinions are my own; they are not the opinions of MIT, Brookings, any of the project funders, nor (with the exception of co-authored previously published work) my collaborators.

Secondary disclaimer:

“It’s tough to make predictions, especially about the future!”

-- Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, etc.

Page 3: Best Practices for Sharing Economics Data

Collaborators & Co-Conspirators

• Jonathan Crabtree, Merce Crosas, Gary King, Michael McDonald, Nancy McGovern, Salil Vadhan & many others

• Research Support
Thanks to the Library of Congress, the National Science Foundation, IMLS, the Sloan Foundation, the Joyce Foundation, the Massachusetts Institute of Technology, & Harvard University.

Page 4: Best Practices for Sharing Economics Data

Related Work
• Altman (2013) Data Citation in The Dataverse Network®. In Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop.
• National Digital Stewardship Alliance, 2013 (Forthcoming), 2014 National Agenda for Digital Stewardship.
• M. Altman, Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist. 72(1): 169-182
• M. Altman, 2008, "A Fingerprint Method for Verification of Scientific Data", in Advances in Systems, Computing Sciences and Software Engineering (Proceedings of the International Conference on Systems, Computing Sciences and Software Engineering 2007), Springer-Verlag.
• M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data”, D-Lib, 13, 3/4 (March/April).

Most reprints available from: informatics.mit.edu

Page 5: Best Practices for Sharing Economics Data

‘Not Bad’ Practices

Page 6: Best Practices for Sharing Economics Data

The Lifecycle and Institutional Ecology of Data

Some Trends
• Shifting Evidence Base
• High Performance Collaboration (here comes everybody…)
• More Data
• Publish, then Filter
• More Learners
• More Open

Page 7: Best Practices for Sharing Economics Data

Why not ‘best’ practices?
• Few models for systematic valuation of data
– how much will data X be worth to community Y at time Z?
See: National Digital Stewardship Alliance, 2013 (Forthcoming), 2014 National Agenda for Digital Stewardship. Library of Congress
• Optimality of practices is generally strongly dependent on operational context
• Context of data sharing very dynamic
– change in publication models
– change in evidence base
– change in data management methodologies
– change in policies
• Paucity of evidence to establish data practices as best:
– Descriptive: adoption, compliance
– Predictive: association of best practices & desired outcomes
– Causal: intervention with best practices linked to improvement

Best practices neither best nor practiced.

Page 8: Best Practices for Sharing Economics Data

Why ‘not bad’ practices?

• Avoid clearly bad practices
• Document operational and tacit knowledge
• Elicit assumptions
• Provide basis for auditing, evaluation, and improvement

Page 9: Best Practices for Sharing Economics Data

Probably Not Bad Practices

Page 10: Best Practices for Sharing Economics Data

Types of Practices
• Analytic practices
– Lifecycle analysis
– Requirements analysis
• Policy practices
– Data dissemination policies
– Data citation policies
– Reproducibility policies
• Technical practices
– Sharing technologies
– Reproducibility technologies

Page 11: Best Practices for Sharing Economics Data

Core Dimensions of Shared Information Infrastructure

• Stakeholder incentives
– recognition; citation; payment; compliance; services
• Dissemination
– access to metadata; documentation; data
• Access control
– authentication; authorization; rights management
• Provenance
– chain of control; verification of metadata, bits, semantic content
• Persistence
– bits; semantic content; use
• Legal protection
– rights management; consent; record keeping
• Usability for…
– discovery; deposit; curation; administration; annotation; collaboration
• Economic model
– valuation models; cost models; business models
• Trust model
– verification; transparency; enforcement

See: King 2007; ICSU 2004; NSB 2005; Schneier 2011

Page 12: Best Practices for Sharing Economics Data

Modeling

[Figure: the research data lifecycle and its stakeholders.
Lifecycle stages: Creation/Collection; Storage/Ingest; Processing; Internal Sharing; Analysis; External dissemination/publication; Re-use; Long-term access.
Stakeholders: Scholarly Publishers; Researchers; Data Archives/Publisher; Research Sponsors; Data Sources/Subjects; Consumers; Service/Infrastructure Providers; Research Organizations.]

Page 13: Best Practices for Sharing Economics Data

Legal Constraints

[Figure: diagram of overlapping legal constraints on data sharing.
Region labels: Contract; Intellectual Property; Access Rights; Confidentiality.
Items shown: Copyright; Fair Use; DMCA; Database Rights; Moral Rights; Intellectual Attribution; Trade Secret; Patent; Trademark; Common Rule (45 CFR 46); HIPAA; FERPA; EU Privacy Directive; Privacy Torts (Invasion, Defamation); Rights of Publicity; Sensitive but Unclassified; Potentially Harmful (Archeological Sites, Endangered Species, Animal Testing, …); Classified; FOIA; CIPSEA; State Privacy Laws; EAR; State FOI Laws; Journal Replication Requirements; Funder Open Access; Contract; License; Click-Wrap TOU; ITAR; Export Restrictions.]

Page 14: Best Practices for Sharing Economics Data

Data Dissemination Policies - How
• License: Creative Commons

Version 4.0 of the Creative Commons licenses
– Legally well crafted
– Avoids attribution stacking – attribution through links
– Handles sui-generis database rights, licensee rights to publicity, etc.
– Machine actionable

See: wiki.creativecommons.org/4.0

• Confidentiality

Deidentification & public use files insufficient.

– Need multiple modes of access, including protected access to confidential data.

See: National Research Council. 2005. Expanding access to research data: Reconciling risks and opportunities. Washington, DC: The National Academies Press.

Vadhan, S., et al. 2010. “Re: Advance Notice of Proposed Rulemaking: Human Subjects Research Protections”. Available from: http://dataprivacylab.org/projects/irb/Vadhan.pdf

Page 15: Best Practices for Sharing Economics Data

Data Dissemination Policy - When

• Timeliness [NRC Recommendations]
– Sharing data should be a regular practice.
– Investigators should share their data by the time of publication of initial major results of analyses of the data except in compelling circumstances.
– Data relevant to public policy should be shared as quickly and widely as possible.
– Plans for data sharing should be an integral part of a research plan whenever data sharing is feasible.

Fienberg, et al. (eds). 1985. Sharing Research Data. Washington, DC: The National Academies Press.

Page 16: Best Practices for Sharing Economics Data

Data Dissemination Policy - Where

• With journals. Follow NISO supplementary materials: http://www.niso.org/workrooms/supplementalrecommendations
• With sustainable, well-known, collaboratively-stewarded repositories
– Example: data-pass.org

Also see: M. Altman, Adams, M., Crabtree, J., Donakowski, D., Maynard, M., Pienta, A., & Young, C. 2009. "Digital preservation through archival collaboration: The Data Preservation Alliance for the Social Sciences." The American Archivist. 72(1): 169-182

Page 17: Best Practices for Sharing Economics Data

Data Citation Policies
• Data Citation First Principles (Harvard Workshop, NRC Report, CODATA Forthcoming)
– Data citations should be treated as first-class objects of publication
– At minimum, all data necessary to understand, assess, and extend conclusions in scholarly work should be cited.

See: Altman, Micah. “Data Citation in The Dataverse Network.” Developing Data Attribution and Citation Practices and Standards: Report from an International Workshop. Ed. Paul F. Uhlir. National Academies Press, 2012.
M. Altman and G. King. 2007. “A Proposed Standard for the Scholarly Citation of Quantitative Data”, D-Lib, 13, 3/4 (March/April).

• Data-PASS recommendations
– Minimal elements: author, date, title, persistent id
– Location: must appear with other elements
– Recommended: fixity information, such as Universal Numeric Fingerprint

See: data-pass.org/citations.html
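
The minimal citation elements listed above can be captured in a small record type. The following is only a sketch: the class name `DataCitation`, the rendering order, and the punctuation are assumptions made for illustration, not a format prescribed by Data-PASS, and the fixity string is carried through without being validated as a real UNF.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record holding the minimal elements named on the slide."""
    author: str
    date: str
    title: str
    persistent_id: str
    fixity: Optional[str] = None  # optional; e.g. a UNF string, not validated here

    def render(self) -> str:
        # Refuse to render if any of the four minimal elements is blank.
        for name in ("author", "date", "title", "persistent_id"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing required citation element: {name}")
        # The persistent id is rendered together with the other elements.
        text = f'{self.author}. {self.date}. "{self.title}". {self.persistent_id}'
        if self.fixity:
            text += f" {self.fixity}"
        return text
```

With placeholder values such as `DataCitation("Doe, Jane", "2013", "Example Survey Data", "hdl:0000/EXAMPLE")`, calling `render()` returns one string containing all four elements; leaving any of them blank raises `ValueError` rather than producing an incomplete citation.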

Page 18: Best Practices for Sharing Economics Data

Reproducibility Policies
• Science
– “Unpublished data and personal communications. Citations to unpublished data and personal communications cannot be used to support claims in a published paper. Papers will be held for publication until all "in press" citations are published.”
– “Data and materials availability All data necessary to understand, assess, and extend the conclusions of the manuscript must be available to any reader of Science. All computer codes involved in the creation or analysis of data must also be available to any reader of Science.”
• Support for publishing replication
– Registered replication reports: http://www.psychologicalscience.org/index.php/replication
– ICMJE Clinical Trials Registration: http://www.icmje.org/publishing_10register.html
– Journals of Negative/Null Results

Page 19: Best Practices for Sharing Economics Data

Policies are not Self-Enforcing/Sustaining

• Technical and financial sustainability must be planned, to ensure long term access
See: National Science Board, Long-Lived Digital Data Collections: Enabling Research and Education in the 21st Century. NSF. http://www.nsf.gov/pubs/2005/nsb0540/nsb0540.pdf
• Long-term access requires initial investment in data preparation
– Capture tacit knowledge, create metadata
– Transfer to stable formats

Page 20: Best Practices for Sharing Economics Data

Compliance with Data Sharing Policies is often Low

The Lifecycle and Institutional Ecology of Data

Compliance is low even in best examples of journals

Checking compliance is labor-intensive without citation and repository standards

[See Glandon 2011; McCullough et al. 2008]

BORGMAN, Lyone unevenness in data sharing practices
Page 21: Best Practices for Sharing Economics Data

Technical Infrastructure Examples
• CKAN
– Open Source
– Established
– Built on Drupal platform
– http://ckan.org/
• Dataverse Network
– http://thedata.org
– Open Source
– Flexible archival models
– Semantic Fixity (UNF) [Altman 2008]
• MyExperiment
– http://www.myexperiment.org/
– Long lasting
– Archives complete workflows to produce results

Technical Criteria
• Long term access
– Replication, independence
• Verifiability and fixity
• Provenance
• Workflows/code
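
To make the "verifiability and fixity" criterion concrete, a bit-level fixity check might look like the sketch below. It deliberately uses a plain SHA-256 file digest from the Python standard library, which is a swapped-in technique and not the UNF: a UNF is described above as semantic fixity, whereas a file hash changes whenever the bytes change, even if the data content is the same. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_fixity(path: Path, recorded_digest: str) -> bool:
    """Compare the file's current digest with one recorded at deposit time."""
    return sha256_of_file(path) == recorded_digest.strip().lower()
```

A repository could record the digest when a file is deposited and rerun `verify_fixity` during later audits; a `False` result signals that the stored bits no longer match what was deposited.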

Page 22: Best Practices for Sharing Economics Data

Final Observations
• Best practices aren’t…
– document context of practice & measure desired outcomes
• Not-bad practice starts with analysis…
– lifecycle; requirements; sustainability; predicted costs and benefits
• Effective data sharing requires policies:
– dissemination, citation, replication, auditing
• Effective data sharing requires infrastructure:
– For verifiability, provenance, workflows/code, & long term access
• Policies are not self-enforcing
– combine incentives, transparency, auditing, & evaluation

Page 23: Best Practices for Sharing Economics Data

Additional Bibliography (Selected)
• McCullough, B.D., Kerry Anne McGeary, and Teresa D. Harrison. "Do Economics Journal Archives Promote Replicable Research?" Canadian Journal of Economics 41, no. 4 (2008).
• Schneier, Bruce, 2012, Liars and Outliers. Wiley.
• Borgman, Christine. “The Conundrum of Research Sharing.” Journal of the American Society for Information Science and Technology (2011): 1-40.
• Glandon P., 2011. Report on the American Economic Review Data Availability Compliance Project. http://www.aeaweb.org/aer/2011_Data_Compliance_Report.pdf
• King, Gary. 2007. An Introduction to the Dataverse Network as an Infrastructure for Data Sharing. Sociological Methods and Research 36: 173–199
NSB
• International Council For Science (ICSU) 2004. ICSU Report of the CSPR Assessment Panel on Scientific Data and Information. Report.

Page 24: Best Practices for Sharing Economics Data

Questions?

E-mail: [email protected]
micahaltman.com
Twitter: @drmaltman