www.ci.anl.gov www.ci.uchicago.edu Hosted services for managing shared cyberinfrastructure Ian Foster Argonne National Laboratory & The University of Chicago Joint work with Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Mattias Lidman, Steven Tuecke, and others

GENI Engineering Conference -- Ian Foster

DESCRIPTION

I was invited to talk at the 18th GENI Engineering Conference (http://groups.geni.net/geni/wiki/GEC18Agenda) on experiences in the Grid community with creating and operating large shared infrastructures. I chose to focus on our experiences using Software as a Service (SaaS: aka Cloud) to reduce barriers to the use of the capabilities required to create and operate virtual organizations.

Citation preview

Page 1: GENI Engineering Conference -- Ian Foster

www.ci.anl.gov www.ci.uchicago.edu

Hosted services for managing shared cyberinfrastructure

Ian Foster, Argonne National Laboratory & The University of Chicago

Joint work with Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Mattias Lidman, Steven Tuecke, and others

GENI Engineering Conference, NYC, October 28, 2013

Page 2: GENI Engineering Conference -- Ian Foster

Using cloud services to accelerate discovery

Ian Foster, Argonne National Laboratory & The University of Chicago

Joint work with Rachana Ananthakrishnan, Josh Bryan, Kyle Chard, Mattias Lidman, Steven Tuecke, and others

GENI Engineering Conference, NYC, October 28, 2013

Page 3: GENI Engineering Conference -- Ian Foster

Cyberinfrastructure

• “a technological and sociological solution to the problem of efficiently connecting laboratories, data, computers, and people with the goal of enabling derivation of novel scientific theories and knowledge” [Wikipedia]

• AKA eScience, eResearch, Computer Supported Collaborative Work, Grid, …

Page 4: GENI Engineering Conference -- Ian Foster

“The Anatomy of the Grid,” 2001

The … problem that underlies the Grid concept is coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations. The sharing that we are concerned with is not primarily file exchange but rather direct access to computers, software, data, and other resources, as is required by a range of collaborative problem-solving and resource-brokering strategies emerging in industry, science, and engineering. This sharing is, necessarily, highly controlled, with resource providers and consumers defining clearly and carefully just what is shared, who is allowed to share, and the conditions under which sharing occurs. A set of individuals and/or institutions defined by such sharing rules form what we call a virtual organization (VO).

Page 5: GENI Engineering Conference -- Ian Foster

Large Hadron Collider

Grid technology accelerates discovery

Higgs discovery “only possible because of the extraordinary achievements of … grid computing” (Rolf Heuer, CERN DG)

Page 6: GENI Engineering Conference -- Ian Foster

http://gstat2.grid.sinica.edu.tw/gstat/vo/atlas/

LHC Computing Grid “virtual organizations”

Page 7: GENI Engineering Conference -- Ian Foster
Page 8: GENI Engineering Conference -- Ian Foster

Complexity in research is large and growing

Research steps over time:

• Run experiment
• Collect data
• Move data
• Check data
• Annotate data
• Share data
• Find similar data
• Link to literature
• Analyze data
• Publish data

Page 9: GENI Engineering Conference -- Ian Foster

Process automation for discovery

Research steps over time:

• Run experiment
• Collect data
• Move data
• Check data
• Annotate data
• Share data
• Find similar data
• Link to literature
• Analyze data
• Publish data

Discovery IT as a service

Page 10: GENI Engineering Conference -- Ian Foster

First: File transfer as a service

Data Source; Data Destination

1. User initiates transfer request
2. Globus Online moves and syncs files
3. Globus Online notifies user

Easy, fast, reliable, available, secure
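The three-step flow on this slide can be illustrated with a small local simulation. This is only a sketch under stated assumptions: it copies between two local directories rather than remote endpoints, it does not use any Globus Online API, and the function names `checksum`, `sync_directory`, and `run_transfer` are hypothetical names chosen for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def checksum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_directory(source: Path, destination: Path) -> list[str]:
    """Copy files that are missing or different at the destination.

    Returns the relative paths that were copied, so a repeated call
    with nothing changed returns an empty list.
    """
    copied = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = destination / relative
        if dst_file.exists() and checksum(dst_file) == checksum(src_file):
            continue  # already in sync, nothing to move
        dst_file.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(src_file, dst_file)
        # Verify the copy before reporting it as transferred.
        if checksum(dst_file) != checksum(src_file):
            raise IOError(f"checksum mismatch for {relative}")
        copied.append(relative.as_posix())
    return copied


def run_transfer(source: Path, destination: Path) -> str:
    """Step 1: accept the request; step 2: sync; step 3: build a notification."""
    copied = sync_directory(source, destination)
    return f"Transfer complete: {len(copied)} file(s) copied"
```

Comparing checksums before and after copying is one simple way to get "sync" and "reliable" behavior in a sketch like this: unchanged files are skipped, and a corrupted copy is reported instead of silently accepted.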

Page 11: GENI Engineering Conference -- Ian Foster
Page 12: GENI Engineering Conference -- Ian Foster

Early adoption is encouraging

Page 13: GENI Engineering Conference -- Ian Foster

Early adoption is encouraging

• 12,000 registered users; >150 daily
• >25 PB moved; >1B files
• 10x (or better) performance vs. scp
• 99.9% availability
• Entirely hosted on Amazon
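To put the 99.9% availability figure in concrete terms, the short calculation below converts an availability fraction into the downtime it permits. It is a back-of-the-envelope sketch: the function name `max_downtime_hours` is hypothetical, and the measurement period behind the slide's figure is not stated, so a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def max_downtime_hours(availability: float,
                       period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the downtime allowed by an availability fraction over a period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


# 99.9% availability over an assumed 365-day year
yearly = max_downtime_hours(0.999)
```

Under that assumption, 99.9% availability corresponds to roughly 8.8 hours of downtime per year.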

Page 14: GENI Engineering Conference -- Ian Foster

Next: Share big data from existing storage

1. User A selects file(s) to share, selects user or group, and sets permissions
2. Globus Online tracks shared files; no need to move files to cloud storage!
3. User B logs in to Globus Online and accesses shared file

Example permissions on the data source:

• File X: Users A, B: RW
• Directory Y: Group G: R
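The example permissions on this slide (file X readable and writable by users A and B, directory Y readable by group G) can be modeled with a small access check. This is a minimal sketch, not the Globus Online sharing implementation: the `SharePolicy` structure, the `is_allowed` function, the `/data/...` paths, and user C's membership in group G are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharePolicy:
    """Access rules for one shared path (hypothetical structure)."""
    user_perms: dict = field(default_factory=dict)   # user -> "R" or "RW"
    group_perms: dict = field(default_factory=dict)  # group -> "R" or "RW"


def is_allowed(policies, groups_of, user, path, action):
    """Return True if `user` may perform `action` ("R" or "W") on `path`.

    A rule on a directory also covers everything beneath that directory.
    """
    user_groups = groups_of.get(user, set())
    for shared_path, policy in policies.items():
        prefix = shared_path.rstrip("/") + "/"
        if path != shared_path and not path.startswith(prefix):
            continue  # this rule does not cover the requested path
        if action in policy.user_perms.get(user, ""):
            return True
        for group, perms in policy.group_perms.items():
            if group in user_groups and action in perms:
                return True
    return False


# The slide's example, with made-up paths and one assumed group member.
policies = {
    "/data/X": SharePolicy(user_perms={"A": "RW", "B": "RW"}),
    "/data/Y": SharePolicy(group_perms={"G": "R"}),
}
groups_of = {"C": {"G"}}  # assumption: user C belongs to group G
```

With these rules, `is_allowed(policies, groups_of, "C", "/data/Y/run1.dat", "R")` is true because C is in group G, while the same user cannot write to directory Y or read file X. The files themselves stay on the original storage; only the rules are tracked.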

Page 15: GENI Engineering Conference -- Ian Foster

Globus Online is SaaS for science

Globus Nexus (Identity, Group, Profile)

Sharing Service

Transfer Service

Globus Toolkit

Globus Connect

SaaS

Page 16: GENI Engineering Conference -- Ian Foster

We are now expanding to a platform

Globus Nexus (Identity, Group, Profile)

Sharing Service

Transfer Service

Globus Toolkit

Globus Online APIs

Globus Connect

SaaS

PaaS

Page 17: GENI Engineering Conference -- Ian Foster

Globus Toolkit

Sharing Service

Transfer Service

Globus Nexus (Identity, Group, Profile)

Globus Online APIs

Globus Connect

Globus Online: Platform-as-a-Service

Page 18: GENI Engineering Conference -- Ian Foster

The identity challenge in science

• Research communities often need to
– Assign identities to their users
– Manage user profiles
– Organize users into groups for authorization

• Obstacles to high-quality implementations
– Complexity of associated security protocols
– Creation of identity silos
– Multiple credentials for users
– Reliability, availability, scalability, security

Page 19: GENI Engineering Conference -- Ian Foster

Sharing Service

Transfer Service

Globus Toolkit

Globus Online APIs

Globus Connect

Streamline collaborative tool development

Globus Nexus (Identity, Group, Profile)

Globus Nexus (Identity, group, & profile management)

Custom Web Application

• Allows developers to focus on core application logic

• Simplifies integration with campus infrastructure

Page 20: GENI Engineering Conference -- Ian Foster

Nexus provides four key capabilities

• Identity provisioning
– Create, manage Globus identities
• Identity hub
– Link with other identities; use to authenticate to services
• Group hub
– User-managed groups; groups can be used for authorization
• Profile management
– User-managed attributes; can use in group admission

Key points:
1) Outsource identity, group, profile management
2) REST API for flexible integration
3) Intuitive, customizable Web interfaces

Page 21: GENI Engineering Conference -- Ian Foster

Identity provisioning

• Globus Nexus can act as an identity provider (IDP) for a project
– User management, email validation…

• DOE Systems Biology Knowledge Base (kBase) is an example of such a project
– ~400 identities to date

Page 22: GENI Engineering Conference -- Ian Foster

Identity hub

• Link identities from other federated IDP(s) with a Nexus identity
– E.g., InCommon/Campus (SAML), Google (OpenID), XSEDE (OAuth MyProxy), IGTF-certified X.509 CA, SSH
• Use linked identity to authenticate to Nexus
– E.g., use campus identity, XSEDE identity (via OAuth)
• Leverage Nexus federated IDP to 3rd-party services
– Via OAuth or LDAP
– E.g., to Jira, Zendesk, Drupal, Confluence
• Have Nexus cache delegated credentials
– X.509, via CILogon and MyProxy
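The linking idea on this slide can be sketched as a lookup table from external identities to one hub identity. The code below is an illustrative model only, not the Nexus API: the `IdentityHub` class and its method names are hypothetical, and it assumes the external provider (campus SAML, XSEDE OAuth, and so on) has already verified the login before `authenticate` is called.

```python
class IdentityHub:
    """Map identities from external providers onto one hub identity."""

    def __init__(self):
        self._owner = {}   # (provider, external_id) -> hub username
        self._linked = {}  # hub username -> set of (provider, external_id)

    def create_identity(self, username):
        """Provision a new hub identity with no linked identities yet."""
        if username in self._linked:
            raise ValueError(f"identity {username!r} already exists")
        self._linked[username] = set()

    def link(self, username, provider, external_id):
        """Attach an external identity to an existing hub identity."""
        if username not in self._linked:
            raise KeyError(f"unknown identity {username!r}")
        key = (provider, external_id)
        owner = self._owner.get(key)
        if owner is not None and owner != username:
            raise ValueError("external identity is linked to another user")
        self._owner[key] = username
        self._linked[username].add(key)

    def authenticate(self, provider, external_id):
        """Return the hub username for a verified external login, or None."""
        return self._owner.get((provider, external_id))

    def linked_identities(self, username):
        """Return the external identities linked to a hub identity."""
        return sorted(self._linked[username])
```

Keeping the mapping in both directions makes the two common questions cheap: "who is this external login?" at authentication time, and "which identities has this user linked?" when choosing credentials for a service.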

Page 23: GENI Engineering Conference -- Ian Foster

Identity management

Page 24: GENI Engineering Conference -- Ian Foster

Identity hub: Biomedical science

• Dr. Smith creates a Nexus id, via BIRN project interface
• Dr. Smith links campus id and XSEDE id
• Dr. Smith can then:
– Authenticate to BIRN with campus id
– Query catalog (Nexus/BIRN id)
– Request data transfer from BIRN to campus (Nexus and campus ids)
– Request transfer from BIRN to XSEDE (Nexus and XSEDE ids)
– Repeat these tasks: use cached credentials

(BIRN=Biomedical Informatics Research Network)

Diagram labels: BIRN Gateway; BIRN; Campus (SAML); XSEDE (OAuth)

Campus identity
Name: Dr. Smith
Email: [email protected]

Nexus identity
Name: Dr. Smith
Email: [email protected]
Linked id: Campus
Linked id: XSEDE

XSEDE identity
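The last step in the scenario, repeating tasks with cached credentials, can be sketched as a small cache whose entries expire. This is a hypothetical illustration rather than how Nexus stores delegated credentials: the `CredentialCache` class, its method names, and the lifetime value used in the example are assumptions, and a real service would also need to protect the stored credentials.

```python
import time


class CredentialCache:
    """Hold short-lived delegated credentials per (user, provider)."""

    def __init__(self, clock=time.time):
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # (user, provider) -> (credential, expires_at)

    def put(self, user, provider, credential, lifetime_seconds):
        """Store a credential that stays valid for `lifetime_seconds`."""
        expires_at = self._clock() + lifetime_seconds
        self._entries[(user, provider)] = (credential, expires_at)

    def get(self, user, provider):
        """Return a still-valid credential, or None if missing or expired."""
        entry = self._entries.get((user, provider))
        if entry is None:
            return None
        credential, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[(user, provider)]  # drop the stale entry
            return None
        return credential


# Example: cache a placeholder credential for 12 hours (assumed lifetime).
cache = CredentialCache()
cache.put("smith", "xsede", "<delegated credential>", 12 * 3600)
```

A caller would try `cache.get("smith", "xsede")` first and fall back to a fresh login with the linked identity only when it returns `None`.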

Page 25: GENI Engineering Conference -- Ian Foster

Use linked identity

Page 26: GENI Engineering Conference -- Ian Foster

Group hub

• User-managed group creation, management
• Flexible control over admission policies and visibility
• Groups can be used in authorization decisions

Example: kBase
• Every kBase user added to kbase_users
• Subgroups also created
• Groups used for access control
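A rough model of user-managed groups with an admission policy, used for an authorization decision, might look like the following. It is a sketch under assumptions, not the Nexus group API: the `Group` class, the `auto_approve` flag, and the `authorize` helper are hypothetical, and the status names borrow the membership categories (active, pending, invited, rejected, suspended) that appear in the usage figures elsewhere in this deck.

```python
from enum import Enum


class Status(Enum):
    ACTIVE = "active"
    PENDING = "pending"
    INVITED = "invited"
    REJECTED = "rejected"
    SUSPENDED = "suspended"


class Group:
    """A user-managed group with a simple admission policy."""

    def __init__(self, name, auto_approve=False):
        self.name = name
        self.auto_approve = auto_approve  # admission policy (assumed flag)
        self.members = {}                 # user -> Status

    def request_join(self, user):
        """Apply the admission policy to a join request."""
        current = self.members.get(user)
        if current in (Status.ACTIVE, Status.SUSPENDED):
            return current  # a new request does not change these states
        status = Status.ACTIVE if self.auto_approve else Status.PENDING
        self.members[user] = status
        return status

    def decide(self, user, approve):
        """Let a group administrator approve or reject a pending request."""
        if self.members.get(user) is not Status.PENDING:
            raise ValueError(f"{user} has no pending request")
        self.members[user] = Status.ACTIVE if approve else Status.REJECTED

    def is_member(self, user):
        """Only active members count for authorization."""
        return self.members.get(user) is Status.ACTIVE


def authorize(user, allowed_groups):
    """Grant access if the user is an active member of any allowed group."""
    return any(group.is_member(user) for group in allowed_groups)
```

In the kBase example, a group such as `kbase_users` could use automatic admission while subgroups require approval; an application then calls `authorize` with the groups permitted to use a resource.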

Page 27: GENI Engineering Conference -- Ian Foster

Group membership interface

Page 28: GENI Engineering Conference -- Ian Foster

Branded sites

• Open Science Grid
• University of Chicago
• XSEDE
• DOE kBase
• Indiana University
• University of Exeter
• Globus Online
• NERSC
• NIH BIRN

Page 29: GENI Engineering Conference -- Ian Foster

Implementation and deployment

Diagram components:

• Elastic Load Balancer
• Three Nexus instances, each labeled REST API and Web
• Monitoring
• Logging
• OSSEC

Page 30: GENI Engineering Conference -- Ian Foster

Globus Nexus usage as of 9/13

• >12,000 users and 4977 linked identities

• 557 groups totaling:
– 1638 active members
– 229 pending or invited members
– 162 rejected or suspended members
• Largest group (kbase) has 402 members

[Chart: total users over time, Nov-10 through Aug-13; y-axis 0 to 14,000]

[Chart: users in group, per group; y-axis 1 to 1000]

Page 31: GENI Engineering Conference -- Ian Foster

Identities and groups in XSEDE

• Proposal: Replace current ad-hoc systems with Globus Nexus identity and group service
– Reduce complexity, reduce cost, increase capability

• Careful process of documentation and review
– “Architecture and development requirements: User and identity management”
– “User management proposal: Affected use cases”
– “User management proposal: Motivating stories”
– “Proposal: Refactoring XSEDE identity and group capabilities”

• Hope to reach closure by end of 2013

Page 32: GENI Engineering Conference -- Ian Foster

Cloud services to accelerate discovery

Accelerate discovery and innovation worldwide by providing research IT as a service

Leverage software-as-a-service to
• provide millions of researchers with unprecedented access to powerful tools;
• enable a massive shortening of cycle times in time-consuming research processes; and
• reduce research IT costs dramatically via economies of scale

Page 33: GENI Engineering Conference -- Ian Foster

Thanks to ...

U.S. DEPARTMENT OF ENERGY

Page 34: GENI Engineering Conference -- Ian Foster

Thank you! Questions?

[email protected]@uchicago.edu

www.globusonline.org