The Challenges of Making Data Travel

Sabina Leonelli
Exeter Centre for the Study of Life Sciences (Egenis) & Department of Sociology, Philosophy and Anthropology
University of Exeter
@sabinaleonelli
www.datastudies.eu



Outline

• The Potential of Open Data

• Data Journeys:
– Challenges of collection
– Challenges of re-use
– Challenges of openness
– The Open Data divide

• Conclusions


Openness in Science

Long history of openness as a key norm for science: public scrutiny, transparency and reproducibility of results define what science is, how it works and what counts as a research output

Equally long history of reasons why it does not work in practice:
• Trust system where scrutiny is delegated to specialists
• Long paths from data generation to discovery
• Strong incentives provided by commercialisation and competition, with associated intellectual property regimes around research results (and conflicting interests of research sponsors and institutions)

• Practical difficulties in disseminating and reproducing data, software, techniques and materials, vis-à-vis research articles

• Publication regime itself increasingly commercialised


What makes Open Data valuable now?

• Potential to improve:
– pathways to and quality of discoveries
– uptake of new technologies
– collaborative efforts across disciplines, nations and areas of expertise
– research evaluation, debate and transparency
– appropriate valuation of research components beyond papers and patents
– fight against fraud, low quality and duplication of efforts
– legitimacy of science and public trust
– public understanding and participation

• Open Data as a platform to debate what counts as science, scientific infrastructures and scientific governance, and how results should be credited and disseminated

• Making data open means making them mobile and useful across sites, contexts and uses, which poses major challenges to realising that potential

• My concern: examining conditions under which the potential of data as evidence for scientific claims can be realised sustainably in the long term


Researching Data Journeys

Investigating the conceptual/material/institutional labor involved in making data travel from sites of production to sites of (re-)use

• Digital data infrastructures as sites for data movements and integration across a wide variety of sources and perspectives

• Situations of data uptake and re-use in developed and developing world (ongoing studies in UK, USA, Kenya, South Africa)

• Methods: history, philosophy and social studies of science
– Archival research
– Ethnographies and interviews on attitudes to openness, curation practices and re-use
– Collaboration with researchers

• Policy involvement:
– Lead for Open Science working group of the Global Young Academy (e.g. Access to Open Software Survey – Nigeria, Ghana, Bangladesh)
– Chair of ongoing Open Data consultation across European Young Academies (YAs)


Research Data Management Across Disciplines

Scientific realms under investigation:
• model organism research: data on different aspects of the same organism
• plant science: environmental, phenotypic and omics data
• biomedicine: clinical, crowdsourced and biological data
• oceanography: geological, geographical, meteorological and biological data
• archaeology, particle physics, climate science, economics

Parameters of comparison:
• Subject matter (complex objects versus simplified models)
• Data source (one or multiple disciplines)
• Data production mode (centralised vs dispersed; highly automated vs system-specific)
• Data types (ease of dissemination and analysis, size, relation to software)
• Publication cultures and collaborative ethos
• Geographical locations, types and sources of funding involved
• Availability of relevant data (and other) infrastructures
• Ethical concerns and regulation


A simple case


[Figure: a sequence fragment (CGCCGCCAC), CyVerse and other databases]


Challenges of Collection

Data sharing needs to be extensive, comprehensive, global and long-term. This requires:
• Habitual data donation: a challenge to current credit systems and research practices, given the considerable labor involved (NB: when adopted as a community ethos, it gives a huge boost to research)

• Adequate standards & guidelines for data formatting: problematic given large diversity of methods & terminologies

• Well-organised databases: intelligent and labor-intensive curation to avoid ‘data dumps’

• Sharing of related materials: requires reliable stock centres and collections, which are rarely available and rarely well coordinated with databases

• Diversity of data types: current emphasis is on cheap and easy quantitative measurements

• Sustainability in time:
– commitment to data infrastructures beyond the short term
– continuous updates of data standards and classifications to keep up with shifts in technology and knowledge


Challenges of Re-Use

• Qualitative results: very limited re-use*. Why?

• Misalignment between IT solutions and research questions/needs/situations; problems with access to related software

• Substantive disagreement over data management:
– methods, terminologies and standards involved in data production and interpretation
– what counts as data in the first place (data as a relational category)

• Re-use is often linked to participation in developing data infrastructures, which is rarely the case for busy practitioners; there is also a gap in skills

• Conflation of the epistemic and economic value of data: the wish to capitalise on past investments risks encouraging conservatism (building on old data instead of pursuing new questions independently of which data are available)


Challenges of Openness

• Semantic ambiguity: openness means different things to different people, even in the same discipline (e.g. free of license, free of ownership, under a CC-BY license, common good, good enough to share, unrestricted access and/or use, accessible without payment, unclear/open to interpretation); explicit debate is key

• Problematic implementation: research ethos, career structures & incentives lag behind; strong disincentives in competitive fields; publication pressure leads to information control

• IP: confusion around which modes of intellectual property apply, and to whom (individual researchers, labs, projects, networks, universities, funders)

• Social & ethical concerns: data as tokens of personal identity

• Universities and the state: confusion around Open Data policies and perceived tensions with metrics of excellence and impact (e.g. in the UK)


The Open Data Divide

High-resource bias: richer labs struggle to comply, while poorer labs are left behind and/or choose not to participate

• databases mostly display outputs of top English-speaking labs, which have the funds to curate contents, the visibility to determine dissemination formats/procedures, and the resources and confidence to build on data donated by others

• involvement of poor/unfashionable labs, scientists in low- and middle-income countries, and non-scientists remains low and at the ‘receiving’ end

• few provisions for situations of systematic disadvantage (e.g. lack of infrastructures and online access, funding, governmental support, expertise, materials; teaching demands; power cuts and transport delays) and vulnerability (e.g. where access to a resource/location is what gives competitive edge, as in archaeology, botany)

• low-resourced researchers are reluctant to contribute, as they fear it will undermine rather than increase their international credibility


Conclusions

1. Open Data (OD) is Neither Quick Nor Cheap

2. Open to What and When?

3. Link between OD and Access to Software

4. Estimating Prospective Value vs Preserving Open-Endedness

Meanings of openness in the Oxford English Dictionary:

1. ‘free’ (of …)

2. ‘accessible, exposed, unrestricted’

3. ‘available, reusable’

4. ‘flexible, unpredictable, uncertain, unsettled’

Policy and scientific discourse centres on meanings 1–3, yet meaning 4 is crucial to science


Steps Forward: Researchers, Institutions, Funders and Learned Societies

• Current data collections are very limited in scope and difficult to re-use by outsiders

• Careful consideration needs to be given to what is disseminated, why, how and with which priority and time-line

• Need to promote:
– data curation as an integral part of research, since being involved in developing databases is key to effective data re-use
– critical discussions about what counts as data and openness in each research community / centre / project, taking account of specific ethical, legal and political concerns

• Crucial role of learned societies and funders in informing researchers as well as policy-makers of shifting needs, resources and constraints for each field

• Beware of the term “sharing”: it suggests, but does not entail, reciprocity and common ground


With thanks to the Exeter Data Studies Group:
Brian Rappert
Louise Bezuidenhout
Ann Kelly
Niccolo Tempini
Gregor Halfmann
Rachel Ankeny

Main reference: Leonelli, Sabina (2016, in press) Data-Centric Biology: A Philosophical Study. Chicago, IL: The University of Chicago Press.

For other relevant publications, see www.datastudies.eu, @DataScienceFeed

This research was funded by the European Research Council under the European Union's Seventh Framework Programme (FP7/2007-2013) / ERC grant agreement n° 335925; the UK Economic and Social Research Council (ESRC), grant number ES/F028180/1; and the Leverhulme Trust, grant award RPG-2013-153.