computationinstitute.org www.globusonline.org Research data management as a service Ian Foster [email protected]

Research Data Management as a Service

DESCRIPTION

This presentation is by Ian Foster, director of the Computation Institute at The University of Chicago. It was given at the Great Plains Network Annual Meeting on May 29, 2013. For more information on Globus Online, visit globusonline.org. "What would a Dropbox for science look like?" asks Foster. "It should be trivial to collect, move, sync, share, analyze, annotate, publish, search, backup, and archive Big Data. But in reality it's often very challenging." Globus Online, a software-as-a-service (SaaS) offering for research data management, is designed to address these problems. This slideshow explains how Globus Online does so for universities and laboratories around the world.

Page 1: Research Data Management as a Service

Research data management as a service

Ian Foster [email protected]

Page 2: Research Data Management as a Service

High energy physics

Molecular biology

Cosmology

Genetics

Metagenomics

Linguistics

Economics

Climate change

Visual arts

Page 3: Research Data Management as a Service

What would a “dropbox for science” look like?

Page 4: Research Data Management as a Service

[Figure: data moves among a Registry, Staging Store, Ingest Store, Analysis Store, Community Store, Archive, and Mirror; warnings along the way read “Quota exceeded”, “Expired credentials”, “Network failed. Retry.”, and “Permission denied”]

It should be trivial to Collect, Move, Sync, Share, Analyze, Annotate, Publish, Search, Backup, & Archive BIG DATA … but in reality it’s often very challenging
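The failures pictured on this slide are not all alike: a network failure is usually transient and worth retrying, while a permission, quota, or credential error will keep failing until a person intervenes. The sketch below shows one common way to encode that distinction, retrying only transient errors with exponential backoff. The exception classes and `run_with_retry` are hypothetical names chosen for illustration, not part of any Globus API.

```python
import time


class TransferError(Exception):
    """Base class for the failure modes on the slide (hypothetical names)."""


class NetworkFailure(TransferError):
    """Transient: a retry may succeed."""


class PermissionDenied(TransferError):
    """Permanent: retrying will not help."""


class QuotaExceeded(TransferError):
    """Permanent until space is freed."""


class ExpiredCredentials(TransferError):
    """Permanent until the user re-authenticates."""


# Only these failure types are treated as transient.
RETRYABLE = (NetworkFailure,)


def run_with_retry(operation, max_attempts=5, base_delay=0.5):
    """Call operation(); retry transient failures with exponential backoff.

    Permanent failures, and the final transient one, propagate to the caller.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RETRYABLE:
            if attempt == max_attempts:
                raise
            # Wait base_delay, 2*base_delay, 4*base_delay, ... between attempts.
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage: a transfer that hits two network failures before succeeding.
attempts = []


def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise NetworkFailure("Network failed. Retry.")
    return "done"


print(run_with_retry(flaky_transfer, base_delay=0))  # done
```

Surfacing permanent errors immediately, rather than retrying them, is what lets a service tell the user something actionable ("Permission denied") instead of silently spinning.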

Page 5: Research Data Management as a Service

• Collect  • Move  • Sync  • Share  • Analyze  

• Annotate  • Publish  • Search  • Backup  • Archive  

BIG  DATA  …for

Page 6: Research Data Management as a Service

• Collect  • Move  • Sync  • Share  • Analyze  

• Annotate  • Publish  • Search  • Backup  • Archive  

• Collect  • Move  • Sync  • Share

Capabilities delivered using a Software-as-a-Service (SaaS) model

Page 7: Research Data Management as a Service

Page 8: Research Data Management as a Service

Data Source → Data Destination

1. User initiates transfer request
2. Globus Online moves/syncs files
3. Globus Online notifies user
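The three-step transfer flow can be sketched in a few lines. The sketch assumes that "sync" means copying only files that are missing at the destination or whose size or checksum differs; the `Listing` type, `files_to_sync`, `TransferRequest`, and `run_transfer` are hypothetical names, and the copy is simulated by updating a dictionary rather than moving bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical listing format: relative path -> (size_in_bytes, checksum).
Listing = Dict[str, Tuple[int, str]]


def files_to_sync(source: Listing, destination: Listing) -> List[str]:
    """Paths missing at the destination or differing in size or checksum."""
    return sorted(path for path, meta in source.items()
                  if destination.get(path) != meta)


@dataclass
class TransferRequest:
    """Step 1: what the user submits (hypothetical shape)."""
    source: Listing
    destination: Listing
    notify: Callable[[str], None]
    copied: List[str] = field(default_factory=list)


def run_transfer(request: TransferRequest) -> List[str]:
    # Step 2: move/sync only what is needed; the copy is simulated here.
    for path in files_to_sync(request.source, request.destination):
        request.destination[path] = request.source[path]
        request.copied.append(path)
    # Step 3: notify the user when the work is done.
    request.notify(f"Transfer complete: {len(request.copied)} file(s) copied")
    return request.copied


# Usage: b.h5 changed and c.h5 is new, so only those two are copied.
src = {"a.h5": (10, "x1"), "b.h5": (20, "y1"), "c.h5": (5, "z1")}
dst = {"a.h5": (10, "x1"), "b.h5": (20, "old")}
messages: List[str] = []
print(run_transfer(TransferRequest(src, dst, notify=messages.append)))  # ['b.h5', 'c.h5']
print(messages)  # ['Transfer complete: 2 file(s) copied']
```

Because the user only submits the request, the service is free to do the comparison, the copying, and any retries on its own schedule and report back at the end.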

Page 9: Research Data Management as a Service

Data Source

1. User A selects file(s) to share; selects user/group, sets share permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file
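The point of this slide is that sharing changes who may read a file, not where the file lives. A minimal sketch of that idea, assuming a simple per-path access list whose entries are user names or `group:`-prefixed group names; `SharedEndpoint` and its methods are hypothetical and do not reflect how Globus Online actually stores permissions.

```python
from collections import defaultdict
from typing import Dict, Set


class SharedEndpoint:
    """Hypothetical in-place sharing: files stay on the owner's endpoint and
    sharing only adds access-control entries; nothing is copied to the cloud."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        self.files: Dict[str, bytes] = {}
        # path -> principals (user names or "group:<name>") granted read access
        self.acl: Dict[str, Set[str]] = defaultdict(set)
        # "group:<name>" -> member user names
        self.groups: Dict[str, Set[str]] = defaultdict(set)

    def share(self, requester: str, path: str, principal: str) -> None:
        """Step 1: the owner grants a user or group read access to a file."""
        if requester != self.owner:
            raise PermissionError(f"{requester} does not own {path}")
        if path not in self.files:
            raise FileNotFoundError(path)
        self.acl[path].add(principal)

    def can_read(self, user: str, path: str) -> bool:
        """Step 2: check the tracked grants, direct or via a group."""
        if user == self.owner:
            return True
        principals = self.acl.get(path, set())
        return user in principals or any(
            user in self.groups.get(p, set()) for p in principals)

    def read(self, user: str, path: str) -> bytes:
        """Step 3: the recipient reads the file where it already lives."""
        if not self.can_read(user, path):
            raise PermissionError(f"{user} may not read {path}")
        return self.files[path]


# Usage: User A shares a file with a group that User B belongs to.
ep = SharedEndpoint(owner="userA")
ep.files["/data/run1.h5"] = b"..."
ep.groups["group:lab"].add("userB")
ep.share("userA", "/data/run1.h5", "group:lab")
print(ep.read("userB", "/data/run1.h5"))      # b'...'
print(ep.can_read("userC", "/data/run1.h5"))  # False
```

Granting access to a group rather than to individuals means later membership changes take effect without touching the file's access list.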

Page 10: Research Data Management as a Service

Early adoption is encouraging

Page 11: Research Data Management as a Service

Early adoption is encouraging

• 8,000 registered users; >100 daily
• ~16 PB moved; ~1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
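The headline figures support a quick back-of-the-envelope check. Assuming decimal units (1 PB = 10^15 bytes) and a 365-day year, 99.9% availability allows about 8.76 hours of downtime per year, and ~16 PB across ~1B files averages about 16 MB per file. Both inputs are approximate, and a mean says nothing about how widely file sizes vary.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years
PB = 10**15                # decimal petabyte, in bytes (assumption)


def allowed_downtime_hours(availability: float) -> float:
    """Hours per year a service may be down and still meet `availability`."""
    return (1 - availability) * HOURS_PER_YEAR


def mean_file_size_bytes(total_bytes: int, file_count: int) -> int:
    """Average file size, rounded down to whole bytes."""
    return total_bytes // file_count


print(f"{allowed_downtime_hours(0.999):.2f} h/year")        # 8.76 h/year
print(mean_file_size_bytes(16 * PB, 10**9) / 10**6, "MB")  # 16.0 MB
```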

Page 12: Research Data Management as a Service

Globus Online already does a lot

Globus Toolkit

Sharing Service

Transfer Service

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 13: Research Data Management as a Service

We are also adding capabilities

Globus Toolkit

Sharing Service

Transfer Service

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 14: Research Data Management as a Service

We are also adding capabilities

Globus Toolkit

Sharing Service

Transfer Service

Dataset Services

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Page 15: Research Data Management as a Service

Expanding Globus Online services

•  Ingest and publication – Imagine a Dropbox that not only replicates, but also extracts metadata, catalogs, converts
•  Cataloging – Virtual views of data based on user-defined and/or automatically extracted metadata
•  Computation – Associate computational procedures, orchestrate application, catalog results, record provenance
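The ingest idea, a drop folder that also catalogs what it receives, can be sketched as a small pipeline: infer a format, run a format-specific metadata extractor, and append a catalog entry. The extension-to-format table, the `ingest` function, the `/incoming/` path, and the stand-in extractor are all hypothetical; a real HDF5 extractor would read attributes from the file itself.

```python
import os
from typing import Callable, Dict, List

Metadata = Dict[str, str]

# Hypothetical extension -> format table used to "infer type".
FORMATS = {".h5": "HDF5", ".hdf5": "HDF5", ".nc": "NetCDF", ".fits": "FITS"}


def infer_format(path: str) -> str:
    """Guess a file's format from its extension (case-insensitive)."""
    return FORMATS.get(os.path.splitext(path)[1].lower(), "unknown")


def ingest(path: str, owner: str, extract: Callable[[str], Metadata],
           catalog: List[Metadata]) -> Metadata:
    """Infer the format, run a metadata extractor, and catalog the result."""
    entry: Metadata = {"subject": path, "owner": owner,
                       "format": infer_format(path)}
    entry.update(extract(path))  # extractor-supplied tags are added last
    catalog.append(entry)
    return entry


def fake_extract(path: str) -> Metadata:
    """Stand-in extractor returning fixed tags; a real one would open the file."""
    return {"type": "3dtomo", "beamline": "2BM"}


# Usage: ingest one file into an in-memory catalog.
catalog: List[Metadata] = []
entry = ingest("/incoming/mydata42.h5", "Francesco", fake_extract, catalog)
print(entry["format"], entry["type"])  # HDF5 3dtomo
```

Passing the extractor in as a function keeps the pipeline itself format-agnostic; supporting a new format means registering a new extractor, not changing `ingest`.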

Page 16: Research Data Management as a Service

Catalog as a service

Approach
•  Hosted user-defined catalogs
•  Based on tag model <subject, name, value>
•  Optional schema constraints
•  Integrated with other Globus services

Three REST APIs
•  /query/: Retrieve subjects
•  /tags/: Create, delete, retrieve tags
•  /tagdef/: Create, delete, retrieve tag definitions

Builds on USC Tagfiler project (C. Kesselman et al.)
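To make the tag model concrete, here is an in-memory sketch that stores <subject, name, value> triples, treats an optional per-tag value type as its "schema constraint", and answers queries that must match every given tag. The method names loosely echo the /tagdef/, /tags/, and /query/ resources, but they are illustrative assumptions, not the actual Tagfiler or Globus catalog API.

```python
from typing import Dict, List, Optional, Set, Tuple


class TagCatalog:
    """In-memory sketch of a catalog of <subject, name, value> triples."""

    def __init__(self) -> None:
        # tag name -> required value type (None means unconstrained)
        self.tagdefs: Dict[str, Optional[type]] = {}
        self.triples: Set[Tuple[str, str, object]] = set()

    # Roughly /tagdef/: create and delete tag definitions.
    def define_tag(self, name: str, value_type: Optional[type] = None) -> None:
        self.tagdefs[name] = value_type

    def delete_tagdef(self, name: str) -> None:
        del self.tagdefs[name]
        self.triples = {t for t in self.triples if t[1] != name}

    # Roughly /tags/: create, delete, retrieve tags on a subject.
    def add_tag(self, subject: str, name: str, value: object) -> None:
        if name not in self.tagdefs:
            raise KeyError(f"undefined tag: {name}")
        required = self.tagdefs[name]
        if required is not None and not isinstance(value, required):
            raise TypeError(f"{name} expects {required.__name__}")
        self.triples.add((subject, name, value))

    def delete_tag(self, subject: str, name: str, value: object) -> None:
        self.triples.discard((subject, name, value))

    def tags(self, subject: str) -> Dict[str, List[object]]:
        out: Dict[str, List[object]] = {}
        for s, n, v in self.triples:
            if s == subject:
                out.setdefault(n, []).append(v)
        return out

    # Roughly /query/: subjects matching every (name, value) constraint.
    def query(self, **constraints: object) -> List[str]:
        subjects = {s for s, _, _ in self.triples}
        for name, value in constraints.items():
            subjects &= {s for s, n, v in self.triples
                         if n == name and v == value}
        return sorted(subjects)


# Usage with the tags from the tomography example.
cat = TagCatalog()
for tag_name in ("owner", "type", "format", "beamline"):
    cat.define_tag(tag_name, str)
example = {"owner": "Francesco", "type": "3dtomo",
           "format": "HDF5", "beamline": "2BM"}
for tag_name, tag_value in example.items():
    cat.add_tag("mydata42", tag_name, tag_value)
print(cat.query(type="3dtomo", format="HDF5"))  # ['mydata42']
```

Storing everything as triples keeps the schema open: a new kind of metadata needs only a new tag definition, not a new table.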

Page 17: Research Data Management as a Service

[Figure: tomography example. Dataset mydata42 carries the tags owner: Francesco, type: 3dtomo, format: HDF5, beamline: 2BM. Steps shown under the headings Organization and Orchestration: define dataset, infer type, extract metadata, populate catalog(s); locate datasets, access files, transfer/schedule, analyze, catalog derived products, record provenance; annotate, share, browse, search]
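The later steps in this workflow (locate datasets, catalog derived products, record provenance) can be sketched by giving each derived product a `derived_from` tag that points at its input, so that lineage is just a walk along those links. The `catalog` dictionary, `locate`, `record_derived`, `lineage`, and the derived product names are hypothetical; only the mydata42 tags come from the slide.

```python
from typing import Dict, List

# Hypothetical catalog: subject -> tags (one value per tag, for brevity).
catalog: Dict[str, Dict[str, str]] = {
    "mydata42": {"owner": "Francesco", "type": "3dtomo",
                 "format": "HDF5", "beamline": "2BM"},
}


def locate(**constraints: str) -> List[str]:
    """Locate datasets whose tags match every constraint."""
    return sorted(s for s, tags in catalog.items()
                  if all(tags.get(k) == v for k, v in constraints.items()))


def record_derived(product: str, source: str, procedure: str,
                   **tags: str) -> None:
    """Catalog a derived product together with where it came from."""
    catalog[product] = {**tags, "derived_from": source, "procedure": procedure}


def lineage(subject: str) -> List[str]:
    """Follow derived_from links back to the original dataset."""
    chain = [subject]
    while True:
        parent = catalog.get(chain[-1], {}).get("derived_from")
        if parent is None or parent in chain:  # stop at the root or on a cycle
            return chain
        chain.append(parent)


# Usage: two hypothetical analysis steps applied to the located dataset.
for ds in locate(type="3dtomo", beamline="2BM"):
    record_derived(f"{ds}-recon", ds, "reconstruct", format="HDF5")
record_derived("mydata42-recon-seg", "mydata42-recon", "segment", format="HDF5")
print(lineage("mydata42-recon-seg"))
# ['mydata42-recon-seg', 'mydata42-recon', 'mydata42']
```

Because provenance is recorded as ordinary tags, the same search that locates raw data can also find every product derived from it.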

Page 18: Research Data Management as a Service

Our challenge: Sustainability

We are a non-profit service provider to the non-profit research community

Page 19: Research Data Management as a Service

Globus Online Provider Plans

Support ongoing operations

Offer value-added capabilities

Engage more closely with users

Page 20: Research Data Management as a Service

Provider Plans offer…

•  Provider endpoints with sharing
•  Multiple GridFTP servers per endpoint
•  Branded web sites
•  Alternate identity provider
•  Usage reporting
•  MSS optimizations
•  Operations monitoring and management
•  Input into and access to product roadmap

Starting at $20k per year

Page 21: Research Data Management as a Service

Thanks to great colleagues and collaborators

•  Steve Tuecke, Rachana Ananthakrishnan, Kyle Chard, Raj Kettimuthu, Ravi Madduri, Tanu Malik, and many others at Argonne & UChicago

•  Carl Kesselman, Karl Czajkowski, Rob Schuler, and others at USC/ISI

•  Birali Runesha and others at UChicago Research Computing Center

Page 22: Research Data Management as a Service

Thank you to our sponsors!