Data Laden Clouds
Trends and Insights
Roger Barga, PhD, Architect
eXtreme Computing Group, MSR
barga@microsoft.com
http://research.microsoft.com/barga


Page 2: Presentation Outline

• Technology trends
  • Riding the exponentials
• Convergent invisibility
  • NUIs and computing on behalf
• Client+Cloud experiences
  • Opportunity for Data and Analytics
• Cloud infrastructure challenges
  • Packaging, hardware, software, security
• Thoughts on the future

Page 3: It’s Easy To Forget That Not Very Long Ago …

• There were few or no experiences with …
  • Web sites, email, spam, phishing, computer viruses
  • e-commerce, digital photography or telephony
• Cell phones were rare and expensive
• A portable cassette player was still cool
• HiFi was more common than WiFi
• A “friend” was someone you actually knew

The future depends on vision and context …

Page 4: Computing Eras: Paucity To Plethora

• Mainframe Era
• Pre-PC Era (1980)
• PC Era (1995)
• Internet Era (2000)
• Consumer Era (Today+)

21st century implicit and natural computing
• Increasingly natural interfaces
• Embedded intelligence in everyday objects
• Ubiquitous network access and cloud services

Page 5: What Has Changed?

• System on a chip designs
  • Powerful mobile devices
• Graphics processing units
  • High quality graphics
• Explosive data growth
  • Ubiquitous sensors and media
• Inexpensive embedded computing
  • Everyday smart objects, …
• Wireless spectrum pressure
  • Mobile device growth
• New software models
  • Social networks, clients+clouds

[Figure: system-on-a-chip block diagrams labeled Server, Desktop, and Mobile, built from LPIA x86 cores, OoO x86 cores, 1 MB caches, GPUs, DRAM controllers, PCIe controllers, NoC links, and custom acceleration]

Page 6: Orders of Magnitude Always Matter

• Multidisciplinary challenges are the present and future
  … and the tools must empower, not frustrate
• A computation task has four characteristic demands:
  • Networking – delivering questions and answers
  • Computation – transforming information to produce new information
  • Data access – access to information needed by the computation
  • Data storage – long term storage of information
• The ratios among these and their costs are critical

New applications and systems will arise
… if we create the right environment
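
One way to make the four demands and their ratios concrete is to price each one for a given task and see which dominates. The sketch below is only an illustration: the `TaskDemands` fields, the units, and every price in `PRICES` are hypothetical and do not come from the talk or from any provider.

```python
from dataclasses import dataclass


@dataclass
class TaskDemands:
    """The four demands of one computation task (units are illustrative)."""
    network_gb: float         # networking: delivering questions and answers
    cpu_hours: float          # computation
    data_access_gb: float     # data read by the computation
    storage_gb_months: float  # long term storage


# Hypothetical unit prices in USD, chosen only to make the example run.
PRICES = {
    "network_gb": 0.10,
    "cpu_hours": 0.12,
    "data_access_gb": 0.01,
    "storage_gb_months": 0.15,
}


def cost_breakdown(task):
    """Return the cost of each demand for one task."""
    return {name: getattr(task, name) * price for name, price in PRICES.items()}


def dominant_demand(task):
    """Return the name of the demand that costs the most for this task."""
    costs = cost_breakdown(task)
    return max(costs, key=costs.get)


# A data scan that ships a lot of bytes but computes little.
scan = TaskDemands(network_gb=500, cpu_hours=10, data_access_gb=500, storage_gb_months=0)
# A simulation that computes a lot on very little data.
simulation = TaskDemands(network_gb=1, cpu_hours=1000, data_access_gb=4, storage_gb_months=4)

print(dominant_demand(scan), dominant_demand(simulation))
```

Under these made-up prices the scan is dominated by networking and the simulation by computation, which is the kind of ratio that would decide where each task should run.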

Page 7: Imagine a Future Where …

• Your car drives and navigates for you
  … and also parks the car (already a feature on some cars)
• Your sound system only plays music you love
  … because it knows about every song you’ve ever heard
• Your phone only rings when you want to answer
  … because it knows your emotional state and social context
• All your family memories are recorded automatically
  … via MEMS-based sensors and solid state storage
• Your body calls an ambulance when you’re ill
  … via implanted, biologically powered diagnostic sensors
• Your DNA sample and lifestyle determine personalized treatment
  … because genotype-phenotype models are specific
• Your office adjusts its behavior to your needs
  … because it knows what you want to do

Page 8: Successful Technologies Are Invisible

Page 9:

[Timeline figure, with a TODAY marker]
• DOS (1981-1995): Spreadsheets, Word processors
• GUI (1985-present): Desktop publishing, Multimedia
• INTERNET (1993-present): Email, Web browsers
• CLIENT+CLOUD (2006-present): Location-based apps, Social networks
• NUI (2012-…): Anticipatory, Human-centric

Page 10: Enhanced GUI Versus NUI

Gestures, Voice, Environment, Context, Tasks, Expressions
Multi-touch, Speech, Handwriting, Single Touch

Page 12: What Is An Application?

An FFT?
  • No, it’s an algorithm
A rendering pipeline?
  • No, it’s a software library
A feature recognition system?
  • No, it’s a building block
Our notion of “application” is increasingly complex
  • Many integrated and interoperating components
Our tools must enable creativity accordingly, creating experiences

[Image: Microsoft Kinect]

Page 13:

Working at your command

Working on your behalf

Page 14: Create the Experience

Fixed, Portable, Specialty, Mobile

The Cloud

The Clients

Intelligent Objects

Page 15: The Future of Experiences

[Figure labels]
• Client
• Cloud
• Hybrid Experiences
• Augmented Interaction
• Context Awareness
• Environment Awareness
• Anticipatory Processing
• Adaptive Behavior
• Public Data Services
• Trust & Security Services
• Private Data Services
• Sensory Inputs

Page 16: What’s A Cloud?

Computer Rooms: Cloud COGS Matter

Page 17: Microsoft’s Data Center Evolution And Economics

[Timeline figure, 2005 to 2010]
• Generation 1: Data Center Co-Location
• Generation 2: Quincy and San Antonio
• Generation 3: Chicago and Dublin
• Generation 4: Modular Data Center

Deployment Scale Unit: Server, Rack, Containers, IT PAC, Facility PAC

Capacity; Density & Deployment; Scalability & Sustainability; Time to Market; Lower TCO

Page 18: Discovery and Innovation in 2020

In the last two decades advances in computing technology, from processing speed to network capacity and the Internet, have revolutionized the way scientists work.

From sequencing genomes to monitoring the Earth's climate, many recent scientific advances would not have been possible without a parallel increase in computing power – and with revolutionary technologies such as the quantum computer edging towards reality, what will the relationship between computing and science bring us over the next 15 years?

http://research.microsoft.com/towards2020science

Page 19: Lack of Broad Access

[Figure: of 70M scientists and engineers, 55M have little to no access to high performance data-intensive capacity; the remaining segments are labeled 14M and 1M, and the chart shows an 80% / 20% split]

Page 20: New Bytes of Information in 2010

1.2 x 10^21

Source: IDC, as reported in The Economist, Feb 25, 2010

Page 21:

• By 2016 the new Large Synoptic Survey Telescope in Chile will acquire 140 terabytes in 5 days, more than Sloan acquired in 10 years
• In 2000 the Sloan Digital Sky Survey collected more data in its 1st week than was collected in the entire history of astronomy
• The Large Hadron Collider at CERN generates 40 terabytes of data every second

Sources: The Economist, Feb ‘10; IDC
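
To put the quoted volumes on a common footing, the short calculation below converts them to sustained rates. It assumes decimal terabytes (10^12 bytes), which the slide does not specify, and it simply takes the quoted figures at face value.

```python
TB = 10**12                      # decimal terabyte (an assumption, see above)
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(total_bytes, days):
    """Average rate needed to move total_bytes in the given number of days."""
    return total_bytes / (days * SECONDS_PER_DAY)


# Large Synoptic Survey Telescope: 140 terabytes in 5 days.
lsst_rate = bytes_per_second(140 * TB, days=5)

# Large Hadron Collider: 40 terabytes every second, expressed per day.
lhc_tb_per_day = 40 * SECONDS_PER_DAY

print(f"LSST: about {lsst_rate / 10**6:.0f} MB/s sustained")
print(f"LHC: {lhc_tb_per_day:,} TB per day at 40 TB/s")
```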

Page 22: Economics of Storage

[Chart, 2000 to 2010, figures in USD per gigabyte]
• Disk storage (per gigabyte): $44.56 falling to $0.07
• Web storage (per gigabyte): $1,250 falling to $0.15

But remember … free storage is like free puppies

Source: Wired Magazine, April 2010

Page 23: Social Implications of the Data Deluge

• Hypothesis-driven
  • “I have an idea, let me verify it.”
• Exploratory
  • “What correlations can I glean?”
• Different tools and techniques
  • Rapid exploration of alternatives
  • Data volume and complexity are assets
  • … and challenges
• Simplicity really matters

Page 24: Let Researchers Be Researchers…

Most researchers do not want to be system administrators

They don’t want to learn to use supercomputers

They want to focus on their research

They use standard tools: spreadsheets, statistical packages, desktop visualization

Programming = modifying a few parameters in a trusted scripting language

Page 25: Let Researchers Be Researchers…

BUT …

The data deluge means they must solve problems 10,000 times the capacity of their desktop

Research is now interdisciplinary

Sharing access to large data collections and analysis tools is the future

A paradigm shift is coming

Page 27: NCBI BLAST

BLAST (Basic Local Alignment Search Tool)
  • One of the most important software tools in bioinformatics
  • Identifies similarity between bio-sequences

Computationally intensive
  • Large number of pairwise alignment operations
  • A typical BLAST run can take 700 to 1,000 CPU hours

For most biologists, there are two choices for running large jobs
  • Build a local cluster
  • Submit jobs to NCBI or EBI (long job queue times)

Page 28: R. palustris as a platform for H2 production

Identify key drivers for producing hydrogen, a promising alternative fuel – understand R. palustris well enough to be able to improve its H2 production;

Characterize a population of strains and use integrative genomics approaches to dissect the molecular networks of H2 production;

BLAST to query 16 strains to sort out genetic relationships
  • Each strain, estimated ~5,000 proteins
  • Jobs kicked off NCBI clusters before completion
  • Against NCBI non-redundant proteins in ~30 min
  • Against ~5,000 proteins from another strain < 30 sec
  • Publishable result in one day for roughly $150.

Eric Schadt, Pac Bio, and Sam Phattarasukol, Harwood Lab, UW

Page 29: NCBI BLAST on Windows Azure

• Parallel BLAST engine on Azure
• Query-segmentation data-parallel pattern
  • split the input sequences
  • query partitions in parallel
  • merge results together when done
• Follows the general suggested application model for Windows Azure
  • Web Role + Queue + Worker
• With three special considerations
  • Batch job management
  • Task parallelism on an elastic Cloud
  • Large data-set management
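
The query-segmentation pattern (split, query in parallel, merge) can be sketched on a single machine as follows. This is not the AzureBLAST code: `run_blast_stub` is a hypothetical stand-in for a real BLAST invocation, a thread pool stands in for the Web Role + Queue + Worker deployment, and the FASTA splitting is deliberately naive (it assumes `>` appears only at the start of a record).

```python
from concurrent.futures import ThreadPoolExecutor


def split_fasta(text, seqs_per_partition):
    """Split FASTA text into partitions of at most seqs_per_partition records."""
    records = [">" + r for r in text.split(">") if r.strip()]
    return ["".join(records[i:i + seqs_per_partition])
            for i in range(0, len(records), seqs_per_partition)]


def run_blast_stub(partition):
    """Hypothetical stand-in for one BLAST task: one result line per query."""
    headers = [line[1:].strip() for line in partition.splitlines()
               if line.startswith(">")]
    return [f"{header}\tno-hit" for header in headers]


def parallel_query(text, seqs_per_partition=2, workers=4):
    """Split the input, query the partitions in parallel, merge the results."""
    partitions = split_fasta(text, seqs_per_partition)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in partition order, so the merge is a flatten.
        per_partition = list(pool.map(run_blast_stub, partitions))
    return [line for result in per_partition for line in result]


if __name__ == "__main__":
    sample = ">a\nACGT\n>b\nGGCC\n>c\nTTAA\n"
    print(parallel_query(sample))
```

Because each partition is queried independently, the only coordination points are the split at the start and the merge at the end, which is what makes the pattern a good fit for a queue of independent tasks.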

Page 30: AzureBLAST Task-Flow

A simple split/join pattern

Leverage the multiple cores of one instance
  • argument “–num_threads” of NCBI-BLAST

Task granularity
  • Large partitions: load imbalance
  • Small partitions: unnecessary overheads
    • NCBI-BLAST overhead
    • Data transferring overhead

Best practice: use test runs to profile, then set the partition size to mitigate the overhead

[Figure: a splitting task fans out to several BLAST tasks, which feed a merging task]
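
The granularity trade-off can be illustrated with a toy cost model: every task pays a fixed overhead, and workers process tasks in rounds, so very small partitions waste time on overhead while very large ones leave workers idle. The model and all numbers below are assumptions made for illustration; in practice the per-sequence time and the overhead would come from the profiling test runs the slide recommends.

```python
import math


def estimated_makespan(n_seqs, part_size, workers, sec_per_seq, overhead_sec):
    """Upper-bound estimate of total wall-clock seconds for one job.

    Assumes equal-cost sequences and that workers pick up tasks in rounds.
    """
    tasks = math.ceil(n_seqs / part_size)
    rounds = math.ceil(tasks / workers)
    return rounds * (overhead_sec + part_size * sec_per_seq)


def best_partition_size(n_seqs, workers, sec_per_seq, overhead_sec, candidates):
    """Pick the candidate partition size with the smallest estimated makespan."""
    return min(candidates,
               key=lambda size: estimated_makespan(
                   n_seqs, size, workers, sec_per_seq, overhead_sec))


if __name__ == "__main__":
    # Hypothetical profile: 2 s per sequence, 60 s fixed overhead per task.
    for size in (10, 100, 1000, 5000):
        print(size, estimated_makespan(10_000, size, 16, 2, 60))
```

With these made-up inputs the estimate falls from 5,040 s at 10 sequences per task to 1,820 s at 100, then rises again as fewer tasks than workers remain.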

Page 31: Micro-Benchmarks Inform Design

Task size vs. Performance
  • Benefit of the warm cache effect
  • 100 sequences per partition is the best choice

Instance size vs. Performance
  • Super-linear speedup with larger worker instances
  • Primarily due to the larger memory capacity

Task Size/Instance Size vs. Cost
  • Extra-large instances generated the best and the most economical throughput
  • They fully utilize the resource

Page 32: All-Against-All Experiment

Discovering Homologs
  • BLAST Uniref100, non-redundant protein sequence database
  • Discover the interrelationships of known protein sequences

“All against All” query
  • The database is also the input query
  • The protein database is large (4.2 GB size)
  • Total of 9,865,668 sequences to be queried
  • Theoretically, about 100 trillion (9,865,668²) sequence comparisons!

Performance estimation
  • Estimated completion: 3,216,731 minutes (6.1 years) on an 8 core VM

One of the biggest BLAST jobs as far as we know
  • This scale of experiment is usually infeasible for most researchers

Page 33: Our Approach

• Allocated a total of ~4000 instances
  • 475 extra-large VMs (8 cores per VM)
• 8 deployments of AzureBLAST
  • Each deployment has its own co-located storage service
• Divide 10 million sequences into multiple segments
  • Each will be submitted to one deployment as one job for execution
  • 300,000 tasks on 3500 cores on Azure (70,000 bp or 35 sequences per task)
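
A quick back-of-the-envelope check of the quoted figures is sketched below. The ideal run time assumes perfect linear scaling across all 475 VMs with no overhead or failures, which is an assumption of this sketch and not a result reported in the talk.

```python
SEQUENTIAL_MINUTES = 3_216_731  # quoted estimate for a single 8 core VM
VMS = 475                       # extra-large VMs allocated
TASKS = 300_000
SEQS_PER_TASK = 35

MINUTES_PER_DAY = 60 * 24

# The single-VM estimate expressed in years.
years_on_one_vm = SEQUENTIAL_MINUTES / (MINUTES_PER_DAY * 365)

# Idealized wall-clock time if the work divided perfectly across the VMs.
ideal_days = SEQUENTIAL_MINUTES / VMS / MINUTES_PER_DAY

# Sequences covered by the task decomposition.
seqs_covered = TASKS * SEQS_PER_TASK

print(f"{years_on_one_vm:.1f} years on one VM")
print(f"{ideal_days:.1f} days on {VMS} VMs (ideal scaling)")
print(f"{seqs_covered:,} sequences across {TASKS:,} tasks")
```

The task decomposition covers roughly 10.5 million sequences, consistent with the "10 million sequences" on the slide.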

Page 34: Cloud System Upgrades

North Europe Data Center: 34,256 tasks processed in total

All 62 nodes lost tasks and then came back together. This is an update domain

[Chart annotations: ~30 mins; ~6 nodes in one group]

Page 35: Failures Happen

West Europe Datacenter: 30,976 tasks were completed, and then the job was killed

35 Nodes experience blob writing failure at the same time

Reasonable guess: Fault Domain is working

Page 36: Release of BLAST on Windows Azure

• Open source release of NCBI BLAST on Windows Azure (CTP)
  • Installation guide and user's guide;
  • Developer's guide in preparation;
  • Support for the release for the next year – feature requests, fixes, …
  • Free access to NCBI reference data sets on Windows Azure, auto update; http://research.microsoft.com/azure
• Software can be installed and used immediately, customized for your institution (logos, private database, group databases), and extended from source
• Releasing result data from “all-against-all” run
  • BLAST Uniref100, non-redundant protein sequence database
  • Discover the interrelationships of known protein sequences
  • Available Dec. 1st, 2010.

Page 37: Microsoft Client+Cloud Partnership

• Azure cloud services
  • Storage and computing
• Tier one support
  • Hardware and Azure software
• Hosted data sets
  • Multidisciplinary data analysis
• Technical engagement team
  • Community collaborations
  • Application support

One step of a worldwide program

Page 38: HPC and Clouds: Twins Separated At Birth

• Similar technology issues
  • Node and system architectures
  • Communication fabrics
  • Storage systems and analytics
  • Physical plant and operations
  • Programming models
  • Reliability and resilience
• Differing culture and sociology
  • Design and operations
  • Management and philosophy

Page 39: Cloud/HPC Hardware Comparison

Predominant differences today
  • Network architecture and SAN storage

Dan Reed’s hypothesis
  • Convergence is coming

Attribute         HPC                    Cloud
Processor         High-end x86           x86
Memory/Node       1-8 GB                 8 GB+
Local Disk        Scratch only           Permanent storage
SAN Storage       Common                 Rare
Tertiary Storage  Common                 Rare
Interconnect      Infiniband or 10 GigE  1 GigE/10 GigE
Network           Flat                   Hierarchical

Page 40: Cloud Scaling: Lessons for HPC Exascale

• Environmental responsibility
  • Managing under a 100 MW envelope
  • Adaptive systems management
• Provisioning 100,000 servers
  • Hardware: at most one week after delivery
  • Software: at most a few hours
• Resilience during a blackout/disaster
  • Data center failure
  • Service rollover for 20M customers
• Programming the entire facility
  • Power, environmentals, provisioning
  • Component tracking, resilience, …

Page 41: Rethinking Node Architecture

Consistency
  • Weak consistency is good
Component failure
  • Failure as a first class object
Systemic resilience
  • Upgrade during operation
  • Never go down

[Figure: Windows Azure platform diagram. Labels: Applications; Windows Azure; Live Services; .NET Services; SQL Azure; Windows Mobile; Windows Vista/XP; Windows Server; Others; and, for Windows Azure itself, Compute, Storage, Fabric, Config, Application]

Page 42: Rethinking LAN/WAN Networking

• Break the LAN hierarchy
  • Multiple paths, commodity components
  • High bisection bandwidth
• We build WAN islands, not continents
  • Isolated facilities with limited connectivity
• Change the landscape
  • Serious, multiple terabit WANs
  • Many lambdas entering a facility
  • Fused node/LAN/WAN infrastructure

Page 43: Rethinking Packaging and Cooling

• People and hardware need not mix
  • Hardware cooling standards are conservative
  • Reliable at high temperature/humidity
• Optimize for efficiency
  • Cooling is (often) unnecessary
  • Design for ambient environments
• Energy reliability is (often) unnecessary
  • Design for power outages
• Use larger building blocks
  • Accept component failures

[Figure axes: Temperature, Humidity]

Page 44: Rethinking Reliability: Fail In Place

• Factory sealed units (FRUs)
• Over-provisioned for failure
• Dynamic reconfiguration
• Real-time, adaptive control

[Figure: performance versus elapsed time, with a utility threshold and a desired lifetime marked. Labels: Truncated Life Models; Performance and Failure Data; Markov Performability Models; TCO and Provisioning]
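
As a rough illustration of "over-provisioned for failure", the sketch below asks how many components a sealed unit should ship with so that enough of them are still working at the end of its desired lifetime. It uses a plain binomial model with independent failures, which is a much simpler technique than the Markov performability models named on the slide, and the failure probability and target are hypothetical.

```python
from math import comb


def survival_probability(units, needed, p_fail):
    """P(at least `needed` of `units` survive), failures independent."""
    p_ok = 1.0 - p_fail
    return sum(comb(units, k) * p_ok**k * p_fail**(units - k)
               for k in range(needed, units + 1))


def units_to_provision(needed, p_fail, target):
    """Smallest unit count that keeps `needed` units alive with prob >= target."""
    units = needed
    while survival_probability(units, needed, p_fail) < target:
        units += 1
    return units


if __name__ == "__main__":
    # Hypothetical: 10 working disks required, 10% chance that a disk fails
    # during the desired lifetime, 99% confidence of never opening the unit.
    print(units_to_provision(needed=10, p_fail=0.1, target=0.99))
```

The spare units are the price of never servicing the hardware; whether that is cheaper than on-site repair is the TCO and provisioning question in the figure.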

Page 45: Rethinking Energy Provisioning

• Power redundancy is a major cost
  • Batteries to supply up to 15 minutes
• Use multiple sites, based on energy cost and carbon footprint
  • Electrical grid, solar, wind, fuel cell, …
  • Workload dispatching based on models
• Real-time optimization and prediction
  • Workload demand
  • Weather and seasonal models
  • Auction-based energy pricing
  • Infrastructure: UPS, optical fiber and computing
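
Dispatching work across sites by energy cost and carbon footprint could look something like the sketch below. The site names, prices, carbon intensities, and the single combined cost function are all hypothetical; a real dispatcher would also fold in the demand, weather, and auction-pricing models listed above.

```python
# Hypothetical sites; none of these figures come from the talk.
SITES = [
    {"name": "grid-site", "usd_per_kwh": 0.06, "kg_co2_per_kwh": 0.50, "spare_kw": 200},
    {"name": "wind-site", "usd_per_kwh": 0.08, "kg_co2_per_kwh": 0.02, "spare_kw": 150},
    {"name": "full-site", "usd_per_kwh": 0.05, "kg_co2_per_kwh": 0.60, "spare_kw": 0},
]


def pick_site(sites, carbon_price_per_kg):
    """Pick the site with the lowest combined energy and carbon cost per kWh.

    Sites with no spare capacity are skipped.
    """
    def effective_cost(site):
        return site["usd_per_kwh"] + carbon_price_per_kg * site["kg_co2_per_kwh"]

    candidates = [site for site in sites if site["spare_kw"] > 0]
    return min(candidates, key=effective_cost)["name"]


if __name__ == "__main__":
    print(pick_site(SITES, carbon_price_per_kg=0.0))   # energy price only
    print(pick_site(SITES, carbon_price_per_kg=0.05))  # carbon is priced in
```

With carbon ignored the cheaper grid-powered site wins; once carbon carries a price, the wind-powered site becomes the better choice.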

Page 46: Some Research/Design Thoughts

• Draw the right bounding box
  • It defines the problem you solve
• Understand your workload
  • Use only the hardware you need
• Metrics reward and punish
  • Choose carefully what you measure
• Embrace component failure
  • Hardware is cheap and readily recyclable
• Machines and people do not mix well
  • Consider sealing hardware at the factory
• Engage multidisciplinary solutions
  • Mechanical, electrical, economic, social …
• Culture shapes behavior
  • Implicit versus explicit costs

Page 47: End-to-End Perspective

• END-TO-END TRUST
• INTELLIGENT MANAGEMENT
• GLOBAL POLICY FRAMEWORK
• HOLISTIC DESIGN
• NEW EXPERIENCES

Page 48: Questions?

© 2010 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.