
The Discovery Cloud: Accelerating Science via Outsourcing and Automation


DESCRIPTION

Director's Colloquium at Los Alamos National Laboratory, September 18, 2014. We have made much progress over the past decade toward harnessing the collective power of IT resources distributed across the globe. In high-energy physics, astronomy, and climate, thousands work daily within virtual computing systems with global scope. But we now face a far greater challenge: Exploding data volumes and powerful simulation tools mean that many more--ultimately most?--researchers will soon require capabilities not so different from those used by such big-science teams. How are we to meet these needs? Must every lab be filled with computers and every researcher become an IT specialist? Perhaps the solution is rather to move research IT out of the lab entirely: to leverage the “cloud” (whether private or public) to achieve economies of scale and reduce cognitive load. In this talk, I explore the past, current, and potential future of large-scale outsourcing and automation for science.

Citation preview

Page 1: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

accelerating science via outsourcing and automation

Ian Foster Argonne National Laboratory and University of Chicago

[email protected]

ianfoster.org

The Discovery Cloud!

Page 2: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

The discovery process: Iterative and time-consuming

Pose question

Design experiment

Collect data

Analyze data

Identify patterns

Hypothesize explanation

Test hypothesis

Publish results

Page 3: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

We've got no money, so we've got to think

Ernest Rutherford

Page 4: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Civilization advances by extending the number of important operations which we can perform without thinking about them

Alfred North Whitehead (1911)

Page 5: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

About 85% of my “thinking” time was spent getting into a position to think, to make a decision, to learn something I needed to know

J.C.R. Licklider, 1960

Page 6: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Automation is required to apply more sophisticated methods at larger scales

Page 7: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Outsourcing is needed to achieve economies of scale in the use of automated methods

Automation is required to apply more sophisticated methods at larger scales

Page 8: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Outsourcing and automation: (1) The Grid

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to computational capabilities

Foster and Kesselman, 1998

Page 9: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Higgs discovery “only possible because of the extraordinary achievements of … grid computing”—Rolf Heuer, CERN DG

10s of PB, 100s of institutions, 1000s of scientists, 100Ks of CPUs, Bs of tasks

Page 10: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Outsourcing and automation: (2) The Cloud

Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction

NIST, 2011

Page 11: The Discovery Cloud: Accelerating Science via Outsourcing and Automation


Page 12: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Tripit exemplifies process automation

Me: Book flights; Book hotel

Other services: Record flights; Suggest hotel; Record hotel; Get weather; Prepare maps; Share info; Monitor prices; Monitor flight

Time

Page 13: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

How the “business cloud” works

Platform services

Database, analytics, application, deployment, workflow, queuing

Auto-scaling, Domain Name Service, content distribution

Elastic MapReduce, streaming data analytics

Email, messaging, transcoding. Many more.

Infrastructure services

Computing, storage, networking

Elastic capacity

Multiple availability zones

Page 14: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

The Intelligence Cloud

Page 15: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Process automation for science

Run experiment

Collect data

Move data

Check data

Annotate data

Share data

Find similar data

Link to literature

Analyze data

Publish data

Time

Automate and outsource: the Discovery Cloud
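
As a rough illustration of what automating this chain of steps could look like, here is a minimal Python sketch. Only the step names come from the slide. The `Dataset` class, `make_step`, `run_pipeline`, and the `valid` flag are hypothetical placeholders, not part of Globus or any other service named in this talk.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Step names are taken from the slide; everything else is hypothetical.
STEPS = [
    "Run experiment", "Collect data", "Move data", "Check data",
    "Annotate data", "Share data", "Find similar data",
    "Link to literature", "Analyze data", "Publish data",
]


@dataclass
class Dataset:
    name: str
    valid: bool = True                      # stand-in for a real data check
    history: List[str] = field(default_factory=list)


def make_step(label: str) -> Callable[[Dataset], Dataset]:
    """Build a placeholder step that only records that it ran."""
    def step(ds: Dataset) -> Dataset:
        # The "Check data" step is the only one that can fail in this sketch.
        if label == "Check data" and not ds.valid:
            raise ValueError(f"{ds.name}: data check failed")
        ds.history.append(label)
        return ds
    return step


def run_pipeline(ds: Dataset, labels: List[str] = STEPS) -> Dataset:
    """Run every step in order, with no manual hand-off between steps."""
    for label in labels:
        ds = make_step(label)(ds)
    return ds


if __name__ == "__main__":
    result = run_pipeline(Dataset("scan-001"))
    print(" -> ".join(result.history))
```

In a real deployment each placeholder would call an outsourced service. The point of the sketch is only that the researcher starts the chain once instead of performing each hand-off manually.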

Page 16: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Analysis

Staging Ingest

Community Repository

Archive Mirror

Registry

Next-gen genome sequencer

Telescope

In millions of labs worldwide, researchers struggle with massive data, advanced software, complex protocols, burdensome reporting

Globus research data management services

www.globus.org

Simulation

Page 17: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

“I need to easily, quickly, and reliably mirror [portions of] my data to other places.”

Research Computing HPC Cluster

Lab Server

Campus Home Filesystem

Desktop Workstation

Personal Laptop

XSEDE Resource

Public Cloud

Page 18: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

“I need to easily and securely share my data with colleagues.”

Page 19: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

“I need to get data from a scientific instrument to my analysis server.”

Next Gen Sequencer

Light Sheet Microscope

MRI

Advanced Light Source

Page 20: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Globus transfer & sharing; identity & group management; data discovery & publication

25,000 users, 60 PB and 3B files transferred, 8,000 endpoints
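
To put these totals in perspective, the short calculation below derives a few averages from the figures on the slide. It assumes decimal units (1 PB = 10^15 bytes) and reads "3B" as 3 billion. The averages are illustrative only, since the slide says nothing about how transfers are distributed across files, endpoints, or users.

```python
# Figures from the slide; decimal (SI) units are an assumption.
PB = 10**15
TB = 10**12
MB = 10**6

bytes_transferred = 60 * PB
files_transferred = 3 * 10**9   # "3B files", read as 3 billion
endpoints = 8_000
users = 25_000

# Simple averages; real usage is almost certainly far from uniform.
avg_file_mb = bytes_transferred / files_transferred / MB
avg_per_endpoint_tb = bytes_transferred / endpoints / TB
avg_per_user_tb = bytes_transferred / users / TB

print(f"Average file size:    {avg_file_mb:.1f} MB")
print(f"Average per endpoint: {avg_per_endpoint_tb:.1f} TB")
print(f"Average per user:     {avg_per_user_tb:.1f} TB")
```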

Page 21: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

The Globus Galaxies platform: Science as a service

Globus Galaxies platform

Tool and workflow execution, publication, discovery, sharing; identity management; data management; task scheduling

Infrastructure services

EC2, EBS, S3, SNS, Spot, Route 53, Cloud Formation

Ematter materials science

FACE-IT

PDACS

Page 22: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Flexible, scalable, affordable genomics analysis for all biologists

Ravi Madduri, Paul Davé, Dina Sulakhe, Alex Rodriguez

Page 23: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Globus Genomics

Data Management

Sequencing Centers; Seq Center; Research Lab; Public Data; Storage; Local Cluster/Cloud

Connections to Globus Genomics: FTP, SCP, HTTP, others

Globus provides a
• High-performance
• Fault-tolerant
• Secure
file transfer service between all data endpoints

Data Analysis

Globus Genomics on Amazon EC2

Galaxy Data Libraries

Fastq; Ref Genome; Alignment; Variant Calling; Picard; GATK

Galaxy-based workflow management
• Globus integrated within Galaxy
• Web-based UI
• Drag-and-drop workflow creation
• Easily modify workflows with new tools
• Analytical tools are automatically run on the scalable compute resources when possible

Page 24: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

It’s proving popular

Dobyns Lab

Cox Lab

Volchenboum Lab

Olopade Lab

Nagarajan Lab

Page 25: The Discovery Cloud: Accelerating Science via Outsourcing and Automation


2.5 million core hours used in first six months of 2014
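
For scale, the calculation below converts that figure into an average number of cores kept busy around the clock. It assumes "first six months" means January 1 through June 30, 2014. Actual usage was presumably bursty rather than constant, so this is only an average.

```python
from datetime import date

core_hours = 2_500_000  # figure from the slide

# Assumed period: January 1 through June 30, 2014 (end date is exclusive).
start, end = date(2014, 1, 1), date(2014, 7, 1)
hours_in_period = (end - start).days * 24

# Average number of cores in continuous use over the period.
avg_cores = core_hours / hours_in_period

print(f"Hours in period: {hours_in_period}")
print(f"Average cores in continuous use: {avg_cores:.0f}")
```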

Page 26: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Costs are remarkably low

Pricing includes
• Estimated compute
• Storage (one month)
• Globus Genomics platform usage
• Support

Page 27: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

metagenomics.anl.gov

Data service as community resource

Page 28: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

kbase.us

Page 29: The Discovery Cloud: Accelerating Science via Outsourcing and Automation
Page 30: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Linking simulation and experiment to study disordered structures

Diffuse scattering images from Ray Osborn et al., Argonne

Material composition: La 60%, Sr 40%

Sample

Experimental scattering

Simulated structure

Simulated scattering

Detect errors (secs to mins)

Select experiments (mins to hours)

Simulations driven by experiments (mins to days)

Knowledge base: past experiments; simulations; literature; expert knowledge

Contribute to knowledge base

Knowledge-driven decision making

Evolutionary optimization

Page 31: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

New data, computational capabilities, and methods create opportunities and challenges

Integrate data movement, management, workflow, and computation to accelerate data-driven applications

Integrate statistics/machine learning to assess many models and calibrate them against 'all' relevant data

New computer facilities enable on-demand computing and high-speed analysis of large quantities of data

Page 32: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

A lab-wide data architecture and facility


Page 33: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Immediate assessment of alignment quality in near-field high-energy diffraction microscopy

Before

After

Hemant Sharma, Justin Wozniak, Mike Wilde, Jon Almer

Page 34: The Discovery Cloud: Accelerating Science via Outsourcing and Automation


One APS data node: 125 destinations

Page 35: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Same node (1 Gbps link)

Page 36: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Accelerate discovery via automation and outsourcing

And at the same time:
– Enhance reproducibility
– Encourage entrepreneurial science
– Democratize access and contributions
– Enhance collaboration

The Discovery Cloud!

Page 37: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

My work is supported by:

U.S. DEPARTMENT OF ENERGY

Page 38: The Discovery Cloud: Accelerating Science via Outsourcing and Automation

Questions?

[email protected]