Codes, Clouds & Constellations: Open Science in the Data Decade


Presentation given at the CNI Meeting, Baltimore in April 2010.


A centre of expertise in digital information management

www.ukoln.ac.uk

UKOLN is supported by:

Codes, Clouds & Constellations: Open Science in the Data Decade

Dr Liz Lyon, Director, UKOLN, University of Bath, UK

Associate Director, UK Digital Curation Centre

CNI Meeting, Baltimore, April 2010


This work is licensed under a Creative Commons Licence: Attribution-ShareAlike 2.0

1. Scaling to Share

2. Publication and Attribution

3. Pathways to Participation

4. Institutions and Informatics

http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/publications.html#november-2009

• 2010 Perspectives

• November 2009

• Consultation

• eResearch Australasia slides: http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html#2009-november-australasia

• Progress, Prospects?

Scaling to Share

Human Genome printed http://www.flickr.com/photos/johnjobby/2252981353/sizes/l/

From the laboratory bench...

...to a national crystallography service...

...to Diamond Light Source

• “Bridging the chasm” between the local laboratory bench and large scale facilities

• Develop Integrated Information Model

• Use cases and Inter-disciplinary Pilots

• Cost-benefit analysis: before and after

http://www.ukoln.ac.uk/projects/I2S2/

| | Diamond Light Source | National Crystallography Service (NCS) | Local Earth Sciences Lab, University of Cambridge |
|---|---|---|---|
| Function | International service - multiple communities | UK service - multiple institutions. Also uses Diamond | Lone researcher at institution - uses NCS and ISIS large-scale facility |
| Administration | Peer-reviewed proposal required | Paper-based records – experiments, safety ERA, instrument time | Multiple proposals, multiple forms |
| Metadata | Core Scientific MetaData Model | eBank/eCrystals schema | ? |
| Identifiers | Beam-line number | DOI, InChI | ? |
| Workflow | Formulaic and bespoke | Formulaic, unrecorded | Complex, unrecorded |
| Software | In-house scripts | In-house scripts + open-source suite | In-house scripts + open-source suite |
| Raw data | In-house GDA store | ATLAS data-store | Laptop / local server |
| Derived data | Taken offsite on laptop / USB stick | eCrystals repository | Laptop / local server / USB stick |

Technology race to market

$1000 genome in <15 minutes ...by 2013?

...data deluge challenges....

• Large-scale data storage that is:
  – Cost-effective (rent on-demand)
  – Secure (privacy and IPR)
  – Robust and resilient
  – Low entry barrier / ease-of-use
  – Has data-handling / transfer / analysis capability

• Move sequencing out of genome centres

• “....analyse an entire human genome in a single day sitting with a laptop at your local Starbucks.”

...cloud services?

...data clouds in the media

Clients in the cloud

Post-genome decade

Human genomes: >24 published & almost 200 unpublished

“P4 medicine: predictive, personalised, preventive, participatory.”

Leroy Hood – Institute for Systems Biology

• Each patient’s genome sequenced

• Your genome is the basis of your medical record

• New predictive models of health and disease

• Individualised treatments focusing on preventative therapies

Image from Scientific American

Genome scale network biology

Genomic data as a commodity

• Sage Bionetworks: Integrative genomics

• Develop predictive models of disease: liver / breast / colon cancer, diabetes, obesity

• Open data in the Sage Commons

• Human and mouse: clinical and genetics data

• Congress San Francisco 23-24 April 2010

Stephen Friend

They have shared their data….

Heather Piwowar

…but many researchers don’t share…

…and are reluctant to re-use data…

Publication and Attribution

http://www.flickr.com/photos/digitalfemme57/3271063366/

Calls for action, new metrics

• Journal

• Article

• Workflow

• Data

• Annotation

• Concept

Macro

Micro / Nano

Attribution granularity

... complexity challenges...

Citing network models

• Multiple data sources

• Many standards

• Workflow integration

• User requirements

• Service functionality?

Pathways to Participation

http://www.flickr.com/photos/lemontwist/502860137/sizes/o/

Continuum of Openness

Open access

Closed access

Participation

Lone scholar

Professional, experts

Volunteers, interested amateurs

Citizen science

“dark data”

Creative Commons Attribution-Non-Commercial-Share Alike 2.0

Data Informatics: Logistics dilemma

Professional scientist

Citizens

Capability

Capacity

Data scientists, LIS

Peer production

Volunteers, interested amateurs

Community curation

Creative Commons Attribution-Non-Commercial-Share Alike 2.0

Professional scientist

Observations

Audit

Preservation

Ontologies

Metadata schema

Annotation

Data management plans

Selection & Appraisal

Data cleansing

Training

Visualisation

Peer Production

Using gaming to drive curation

| Professional scientists | Enthusiastic amateurs |
|---|---|
| Training | Citizen scientist |
| Standards and ethics | Local: natural history, environ. |
| Peer-review | Global: astronomy |
| Organisational support | Self-supporting |

Citizen science...

Privacy issues?

… “participatory urbanism”?

“You have zero privacy anyway. Get over it”

Scott McNealy, CEO Sun Microsystems, 1999

Working with science professionals

...cultural challenges for faculty?

Institutions and Informatics

University of Edinburgh Informatics Forum http://www.flickr.com/photos/chris_malcolm/2638210422/sizes/l/

Open Science at Web-Scale Report 2009

Institutional response: High Throughput Biology

• North Carolina universities

• Cyber-infrastructure project

• Data cloud across three campuses

• “regional”

• Policy & practice

New data support structures

Facilitating team science

- Future Chips

- Biocomputation & Bioinformatics

- Tetherless World

- Integrative Systems Biology

- Graphic designers?

- Animators?

- Social scientists?

- Legal experts?

Embedding data informatics education

...for faculty & LIS...

Take homes

1. Data sharing requires pragmatic solutions

2. Attribution granularity & citation complexity

3. We need “the crowd”

4. Institutional strategies embrace informatics

5. The prospects are transformational...

http://www.flickr.com/photos/29170077@N05/4412360636/

Slides will be available at: http://www.ukoln.ac.uk/ukoln/staff/e.j.lyon/presentations.html

http://www.dcc.ac.uk/
