Scalable Elastic System Architecture (SESA)

Dan Schatzberg, Boston University; Jonathan Appavoo, Boston University; Orran Krieger, VMware; Eric Van Hensbergen, IBM Research Austin


Dan Schatzberg, Jonathan Appavoo, Orran Krieger, and Eric Van Hensbergen. Scalable elastic systems architecture. In Proceedings of the ASPLOS Runtime Environment/Systems, Layering, and Virtualized Environments (RESoLVE) Workshop. ASPLOS, March 2011.


Page 1: Scalable Elastic Systems Architecture (SESA)

Scalable Elastic System Architecture (SESA)

Dan Schatzberg, Boston University

Jonathan Appavoo, Boston University

Orran Krieger, VMware

Eric Van Hensbergen, IBM Research Austin

Page 2: Scalable Elastic Systems Architecture (SESA)

The goal

Perform more computation with fewer resources

Page 3: Scalable Elastic Systems Architecture (SESA)

Fixed Resources

• Hardware as a fixed resource

• Focus on reducing computation’s need for hardware resources

• Multiplex hardware resources for different computations

Page 4: Scalable Elastic Systems Architecture (SESA)

Elastic Resources

• Cloud Computing

• Pay as you go hardware

• Focus on providing hardware to the computation that requires it

Page 5: Scalable Elastic Systems Architecture (SESA)

Time to scale hardware

[Figure, built up over slides 5 to 7: a timeline. Labels, in order: Days, Fixed Hardware, Minutes, Cloud Computing, Elastic Applications, Milliseconds, ?]

Page 8: Scalable Elastic Systems Architecture (SESA)

Interactive HPC

• Medical imaging application

• interactive

• 1 megapixel image

• quadratic memory consumption - ~14TB

Page 9: Scalable Elastic Systems Architecture (SESA)

Interactive HPC

• Fixed Hardware

• Purchase a cluster

Page 10: Scalable Elastic Systems Architecture (SESA)

Interactive HPC

• Cloud Computing

• Allocate a cluster

• Maintain interactivity

• 650+ EC2 instances: $8,000 per 8-hour day

Page 11: Scalable Elastic Systems Architecture (SESA)

Can we do better?

Page 12: Scalable Elastic Systems Architecture (SESA)

Where we’re starting

Treat elasticity as a first-class system characteristic

Page 13: Scalable Elastic Systems Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

1. Top-Down Demand

2. Bottom-Up Support

3. Modularity

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES

Page 14: Scalable Elastic Systems Architecture (SESA)

Top-Down Demand

[Figure, built up over slides 14 to 19. Labels: System Interface, Software, Hardware.]

Page 20: Scalable Elastic Systems Architecture (SESA)

Events as Load

• Treat a service request as an event that is dispatched to resources

• As events occur, load increases

• As events are handled, load decreases

• Each layer being event-driven forces demand to flow top-down
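The load model on this slide can be sketched as a queue whose length is the load. This is only an illustration under assumed names (`LoadTracker`, `on_event`, `handle_next` are hypothetical and do not come from SESA): an arriving request raises the load, and handling it lowers the load again.

```python
from collections import deque


class LoadTracker:
    """Hypothetical model: load is the number of events awaiting handling."""

    def __init__(self):
        self.pending = deque()

    @property
    def load(self):
        return len(self.pending)

    def on_event(self, request):
        # An arriving service request is an event, so load increases.
        self.pending.append(request)

    def handle_next(self, handler):
        # Handling the oldest event removes it, so load decreases.
        request = self.pending.popleft()
        return handler(request)


tracker = LoadTracker()
for name in ("req-a", "req-b", "req-c"):
    tracker.on_event(name)
load_after_arrivals = tracker.load

result = tracker.handle_next(str.upper)
load_after_handling = tracker.load
```

A layer built this way only reacts to events pushed from above, which is one reading of why demand flows top-down.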

Page 21: Scalable Elastic Systems Architecture (SESA)

Bottom-Up Support

[Figure, built up over slides 21 to 23. Labels: System Interface, Hardware, Allocate/Deallocate Resources.]

Page 24: Scalable Elastic Systems Architecture (SESA)

Elastic Interface

• Support elasticity by interfacing via allocation and deallocation of physical or logical resources

• Each layer is constructed by being explicit with respect to resource consumption

• Be explicit with respect to time to meet a request
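One way to picture such an interface is a layer whose only operations are `allocate` and `deallocate`, where every request states how many units it needs and by when. The sketch below is an assumption for illustration: `ElasticLayer`, `Grant`, the unit counts, and the per-unit timings are all hypothetical and are not taken from the SESA prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """Result of an allocation: what was granted and how long it takes."""
    units: int
    ready_in_ms: float


class ElasticLayer:
    """Hypothetical layer that is explicit about resources and time."""

    def __init__(self, name, free_units, ms_per_unit, lower=None):
        self.name = name
        self.free_units = free_units
        self.ms_per_unit = ms_per_unit
        self.lower = lower  # the layer below, asked when this one runs short

    def estimate_ms(self, units):
        # Time to hand out `units`, including any help needed from below.
        shortfall = max(0, units - self.free_units)
        below = self.lower.estimate_ms(shortfall) if shortfall and self.lower else 0.0
        return units * self.ms_per_unit + below

    def allocate(self, units, deadline_ms):
        shortfall = max(0, units - self.free_units)
        if shortfall and self.lower is None:
            raise RuntimeError(f"{self.name}: out of resources")
        ready_in_ms = self.estimate_ms(units)
        if ready_in_ms > deadline_ms:
            raise TimeoutError(f"{self.name}: needs {ready_in_ms} ms")
        if shortfall:
            # Grow this layer by allocating the shortfall from the layer below.
            self.free_units += self.lower.allocate(shortfall, deadline_ms).units
        self.free_units -= units
        return Grant(units, ready_in_ms)

    def deallocate(self, grant):
        # Simplification: units return to this layer, not to the one below.
        self.free_units += grant.units


hardware = ElasticLayer("hardware", free_units=8, ms_per_unit=2.0)
libos = ElasticLayer("libos", free_units=2, ms_per_unit=0.1, lower=hardware)

grant = libos.allocate(5, deadline_ms=10.0)
libos.deallocate(grant)
```

Because the deadline is checked before any state changes, a request that cannot be met in time fails without consuming resources.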

Page 25: Scalable Elastic Systems Architecture (SESA)

Modularity

[Figure, built up over slides 25 to 28. Labels: System Interface, Hardware.]

Page 29: Scalable Elastic Systems Architecture (SESA)

Object model

• Objects can take advantage of:

• the semantics of their request patterns

• the lifetime of an instance

• the occupancy w.r.t. memory, processing and communication

• We can optimize for elasticity by taking advantage of modularity in a system

Page 30: Scalable Elastic Systems Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION : SESA

1. EBB’s : Elastic Building Blocks

2. SEE: Scalable Elastic Executive - A LibOS

3. EPIC: Events as Interrupts

4. PROTOTYPE & CHALLENGES

Page 31: Scalable Elastic Systems Architecture (SESA)

Architecture Overview

[Figure, built up over slides 31 to 39. Labels, in order of appearance: Hardware; SSD, FAWN; Partitioning; Kittyhawk; System Software; HAL; Applications; ?]

Page 40: Scalable Elastic Systems Architecture (SESA)

SESA

[Figure: the SESA stack. Labels, top to bottom:]

• SE APP/SERVICE

• Component Layer: EBB Namespace

• LibOS Layer: SEExecutive (SEE, SEE, SEE)

• Hardware Abstraction Layer: SEMachine (SEHAL, SEHAL, SEHAL)

• Partitioning Layer: Elastic Partition of Nodes (VM/Node, VM/Node, VM/Node)

The figure also carries the label System Software Layers.

Page 41: Scalable Elastic Systems Architecture (SESA)

EBB’s

EBB NameSpace

A new Component Model for expressing and encapsulating fine-grain elasticity.

The Next Generation of Clustered Objects.

Page 42: Scalable Elastic Systems Architecture (SESA)

Clustered Objects (CO)

[Figure: a counter object with inc(), val(), and dec() operations. Labels: Memory, Processors, and eight representatives (C), each offering inc, val, and dec.]

dref(ctr)->inc();

Page 43: Scalable Elastic Systems Architecture (SESA)

What did we learn?

• Event-driven architecture for lazy and dynamic instantiation of resources

• Mechanism to create scalable software
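The lazy instantiation described here can be illustrated with the counter from the Clustered Objects slide. This is a simplified sketch, not the original implementation: `ClusteredCounter` is a hypothetical name, and the explicit `cpu` argument stands in for whatever the slide's `dref(ctr)` resolves implicitly.

```python
class ClusteredCounter:
    """Hypothetical counter with one representative per processor."""

    def __init__(self):
        self._reps = {}  # processor id -> local count, filled in lazily

    def _rep(self, cpu):
        # The first access from a processor creates its representative.
        if cpu not in self._reps:
            self._reps[cpu] = [0]
        return self._reps[cpu]

    def inc(self, cpu):
        # Updates touch only the caller's own representative.
        self._rep(cpu)[0] += 1

    def dec(self, cpu):
        self._rep(cpu)[0] -= 1

    def val(self):
        # Reading gathers across every representative that exists so far.
        return sum(rep[0] for rep in self._reps.values())


ctr = ClusteredCounter()
ctr.inc(cpu=0)
ctr.inc(cpu=0)
ctr.inc(cpu=3)
ctr.dec(cpu=3)

total = ctr.val()
representatives = sorted(ctr._reps)
```

Only processors 0 and 3 ever touched the counter, so only two representatives were created; updates stay local, and the cost of sharing is paid only in `val()`.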

Page 44: Scalable Elastic Systems Architecture (SESA)

Elastic Building Blocks

• Programming Model for Elastic and Scalable Components

• Span multiple nodes

• Built-in on-demand nature: encapsulates policies for both allocation and deallocation of resources
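A building block of this kind might look like the sketch below, in which the component itself decides when a node's representative is created and when it is released. Everything here is an assumption made for illustration (`ElasticBlock`, `Shard`, the tick-based idle limit); the slides do not specify the EBB interface.

```python
class Shard:
    """Hypothetical per-node representative that stores items."""

    def __init__(self, node):
        self.node = node
        self.items = []

    def add(self, item):
        self.items.append(item)
        return len(self.items)


class ElasticBlock:
    """Hypothetical component that spans nodes and owns its own policies."""

    def __init__(self, make_rep, idle_limit):
        self.make_rep = make_rep      # allocation policy: how to build a representative
        self.idle_limit = idle_limit  # deallocation policy: ticks a representative may sit idle
        self.reps = {}                # node -> (representative, tick of last use)
        self.now = 0

    def call(self, node, method, *args):
        if node not in self.reps:
            # On demand: the first call from a node allocates its representative.
            self.reps[node] = (self.make_rep(node), self.now)
        rep, _ = self.reps[node]
        self.reps[node] = (rep, self.now)
        return getattr(rep, method)(*args)

    def tick(self):
        # Advance time and release representatives idle for too long.
        self.now += 1
        for node, (_, last_used) in list(self.reps.items()):
            if self.now - last_used > self.idle_limit:
                del self.reps[node]


block = ElasticBlock(Shard, idle_limit=2)
block.call("node-a", "add", "x")
block.call("node-b", "add", "y")
nodes_after_calls = sorted(block.reps)

block.tick()
count_on_a = block.call("node-a", "add", "z")
block.tick()
block.tick()
nodes_after_idle = sorted(block.reps)
```

The caller never allocates or frees anything explicitly; the block grows onto `node-b` when it is used there and shrinks back once that node goes idle.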

Page 45: Scalable Elastic Systems Architecture (SESA)

SEE

SEExecutive

SEE SEE SEE

A Distributed Library OS Model designed to enable Elastic Software within the context of legacy environments.

Next Generation of Libra

Page 46: Scalable Elastic Systems Architecture (SESA)

Libra

[Figure 1. Proposed system architecture. Labels: Controller Partition, Application Partitions, App, JVM, DB, Libra, General Purpose OS, Gateway, Hypervisor.]

nels, hypervisors run other operating systems with few or no modifications [27, 5, 42]. By running an operating system (the controller) in a hypervisor partition, applications can be migrated incrementally to an exokernel system. In the first step of the migration, the application runs in its own partition instead of as an operating system process but accesses system services on the controller remotely through the 9P distributed filesystem protocol [32]. Later, as needed, these relatively expensive remote calls are replaced by efficient library implementations of the same abstractions.

Proponents of exokernels object to hypervisors because “[hypervisors] confine specialized operating systems and associated processes to isolated virtual machines...sacrificing a single view of the machine” [25]. However, in our approach, a controller partition provides a single view of the machine: applications running in other partitions correspond to processes at the controller. This is true even for applications that run on other machines in a cluster. Section 2 explains how Libra achieves this unified machine view.

This paper makes three main contributions:

• The hypervisor approach for transforming existing systems into high-performance, specialized systems.

• A case study in which Nutch, a large distributed application that includes a commercial JVM (IBM’s J9 [4]), is transformed into such a system.

• Examples of Nutch-specific optimizations that result in good performance.

The rest of the paper is organized as follows. Section 2 explains the overall architecture we are proposing. Section 3 describes the implementation of Libra and our port of J9 to Libra. Details of our port of Nutch to J9/Libra, including specializations that improve its throughput, are described in Section 4. Section 5 evaluates J9/Libra’s performance on various benchmarks, including Nutch, the standard SPECjvm98 [39] and SPECjbb2000 [38] benchmarks, and two microbenchmarks. Section 6 discusses related work. Finally, Section 7 outlines future work and Section 8 concludes the paper.

2. Design

This section explains our proposed architecture, which is depicted in Figure 1. At the bottom of the software stack, a hypervisor hosts a controller partition and one or more application partitions. The hypervisor interacts directly with the hardware to provision hardware resources among partitions, providing high-level memory, processor, and device management.

Above the hypervisor, a general-purpose operating system such as Linux runs as a “controller” partition. The controller is the point of administrative control for the system and provides a familiar environment for both users and applications. Each application partition is launched from the controller by a script that invokes the hypervisor to create a new partition and load the application into it. This script also launches a gateway server that permits the application to access the controller’s resources, services, and environment.

The gateway server is an extended version of Inferno [31], which is a compact operating system that can run on other operating systems. Inferno creates a private, file-like namespace that contains services such as the user’s console, the controller’s filesystem, and the network (see Figure 2). The application accesses this namespace remotely via the 9P resource sharing protocol [30], which runs over a shared-memory transport established between the controller and application partitions. A more detailed description of our extensions to Inferno and a preliminary performance analysis of the transport is available in another paper [20].

Note that nothing in the architecture requires applications to access all resources through the gateway, because the hypervisor allows resources and peripherals to be dedicated to an application partition and accessed directly. Alternatively, applications can use 9P to access resources across the network, either directly or through the gateway. Facilities for redundant resource servers, fail over, and automated recovery have also been explored [16].

As in an exokernel system, each application is linked with a library operating system (libOS). However, unlike in exokernel systems, the libOS focuses only on performance-critical services; other services are obtained from the controller through the gateway. For example, Libra, the libOS described in this paper, contains a thread library implementation but accesses files remotely.

This approach not only reduces the cost of developing a libOS but also reduces the cost of administering new partitions. Because applications share filesystems, network configuration, and system configuration with the controller, administering an additional application partition is cheaper than administering an additional operating system partition. Also, because 9P gateways run as the user who launched the application, they inherit the same permissions and limitations, taking advantage of existing security mechanisms and resource policies.

Finally, because we use a hypervisor instead of an exokernel to multiplex system resources, applications can use privileged execution modes and instructions. This enables optimizations and eases migration, because a libOS can be simply a pared-down, general-purpose operating system. The combination of supervisor-mode capability and 9P for access to remote services is flexible: application partitions can be like traditional operating systems, self-contained with statically-allocated hardware resources; like microkernel applications, with system services spread across various protection domains; or like a hybrid of the two architectures that makes sense for the particular workload being executed.

3. J9/Libra Implementation

This section describes the implementation of Libra and the port of the J9 [4] virtual machine to this new platform. It was a somewhat atypical porting process, since we were simultaneously porting J9 to the Libra abstractions while designing and implementing new Libra abstractions to support J9’s execution. We begin by introducing the relevant aspects of J9 and briefly describing our porting and debugging methodology. Next, we discuss the major Libra subsystems needed by J9. Finally we highlight the major limitations of our current implementation.

3.1 J9 Overview

J9 is one of IBM’s production JVMs and is deployed on more than a dozen major platforms ranging from cell phones to zSeries mainframes. Because it runs on such a diverse set of platforms, a great deal of effort has been invested by the J9 team in defining

Architecture

Page 47: Scalable Elastic Systems Architecture (SESA)

Libra

[Figure, built up over slides 47 to 52. Labels: X86 Linux Front Ends, 9p, PowerPC Blades: Libra Workers, Pool of Libra Partitions. The front-end shell prompts show the following commands.]

$ java -cp my.jar

$ for ((i=0;i<44;i++)); do java -cp my.jar & done

Page 53: Scalable Elastic Systems Architecture (SESA)

What did we learn?

• Specialized environment for each application

• Lightweight system layer implementing services for performance

• General purpose OS for non-performance critical services

Page 54: Scalable Elastic Systems Architecture (SESA)

SEE: A LibOS for SESA

• Distributed LibOS that can elastically span nodes

• Instances cooperate to support the allocation and deallocation of EBB’s

• Enables compatibility with Front End nodes via a unified 9p namespace

[Figure: Scalable Elastic Executive (SEE). Labels: EBB Infrastructure; FS Name Space Protocol (9p); per-node EBB manifestations; locality aware memory allocator; event dispatcher; inter-node communication protocols and primitives; SEHAL.]

Page 55: Scalable Elastic Systems Architecture (SESA)

SEMachines and EPICs

SEHAL SEHAL SEHAL

SEMachine Hardware Abstraction Layer : EPIC

Page 56: Scalable Elastic Systems Architecture (SESA)

Programmable Interrupt Controller

[Figure, built up over slides 56 to 58: a row of bits inside the controller, with the labels Source, Interrupt, and Execution. In the last build the labels Interrupt and Execution are replaced by Event and Action.]

Page 59: Scalable Elastic Systems Architecture (SESA)

Elastic Programmable Interrupt Controller

[Figure, built up over slides 59 and 60: a longer row of bits, then several rows of bits separated by ellipses. Labels: Source, Event, Action.]

Page 61: Scalable Elastic Systems Architecture (SESA)

Elastic Programmable Interrupt Controller

• Programmed by the SEE

• Provides the minimum requirement of elastic applications - mapping load to resources

• Portable layer

• Take advantage of network features such as broadcast and multicast
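The role described in these bullets, a table programmed by the executive that maps events to actions on nodes, could be sketched as follows. The class and method names (`Epic`, `program`, `unprogram`, `raise_event`) and the event name are hypothetical; the slides describe the idea but not an API.

```python
class Epic:
    """Hypothetical elastic interrupt controller: routes events to actions on nodes."""

    def __init__(self):
        self.routes = {}  # event id -> list of (node, action)

    def program(self, event, node, action):
        # Called by the executive (the SEE on the slide) to bind an event
        # to an action on a particular node.
        self.routes.setdefault(event, []).append((node, action))

    def unprogram(self, event, node):
        # Removing a node from a route is how the mapping shrinks again.
        remaining = [(n, a) for n, a in self.routes.get(event, []) if n != node]
        self.routes[event] = remaining

    def raise_event(self, event, payload):
        # One target behaves like a unicast interrupt, several like multicast.
        return {node: action(payload) for node, action in self.routes.get(event, [])}


epic = Epic()
epic.program("matrix.mul", "node-0", lambda x: x * 2)
epic.program("matrix.mul", "node-1", lambda x: x + 1)
multicast = epic.raise_event("matrix.mul", 10)

epic.unprogram("matrix.mul", "node-1")
unicast = epic.raise_event("matrix.mul", 10)
unrouted = epic.raise_event("no.such.event", 10)
```

Adding or removing a route is the whole mechanism for mapping load (events) to resources (nodes), which keeps the layer small enough to be portable.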

Page 62: Scalable Elastic Systems Architecture (SESA)

OUTLINE

1. THE PROBLEM

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES

Page 63: Scalable Elastic Systems Architecture (SESA)

PROTOTYPE APP

Kittyhawk

Elastic Matrix Cache

ElasticMatrixOps

Sage*

OL

SESA SAGE SERVICES

SEE: EBB’s + EH

AL

Traditional HW

Advanced HW

Page 64: Scalable Elastic Systems Architecture (SESA)

Challenges and Discussion

Page 65: Scalable Elastic Systems Architecture (SESA)

OUTLINE

1. THE PROBLEM

1. Pay as you go computing

2. Insufficient systems support for elasticity

2. OBSERVATIONS

3. OUR TAKE ON A SOLUTION

4. PROTOTYPE & CHALLENGES

Page 66: Scalable Elastic Systems Architecture (SESA)

Pay as you go hardware

[Figure, built up over slides 66 to 68. Labels: Provider, Consumer, Software, Request.]

Page 69: Scalable Elastic Systems Architecture (SESA)

Elastic Website

[Figure, built up over slides 69 to 71. Labels: Provider, Consumer, Load Balancer.]

Page 72: Scalable Elastic Systems Architecture (SESA)

Other Elastic Applications

• Analytics

• Batch computation

• Stream processing

Page 73: Scalable Elastic Systems Architecture (SESA)

What’s the problem?

• Allocation/Boot-time

• Programmability

Page 74: Scalable Elastic Systems Architecture (SESA)

Medical Imaging Application

• Megapixel image

• Quadratic algorithm

• (1 mil pixels * 4 bytes/pixel)^2 ~ 14 TB

• On Amazon EC2 ~ $8000 per day
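The arithmetic behind these figures can be checked directly. The sketch below uses only numbers from the slides (1 megapixel, 4 bytes per pixel, 650 instances, $8,000 per 8-hour day); the per-instance values are derived here and are not quoted prices or instance sizes. The quoted ~14 TB corresponds to binary terabytes: 1.6 × 10^13 bytes is 16 TB in decimal units and roughly 14.6 TiB.

```python
PIXELS = 1_000_000       # "megapixel image"
BYTES_PER_PIXEL = 4

image_bytes = PIXELS * BYTES_PER_PIXEL
quadratic_bytes = image_bytes ** 2       # the slide's (pixels * bytes/pixel)^2

tb_decimal = quadratic_bytes / 10**12    # decimal terabytes
tib_binary = quadratic_bytes / 2**40     # binary terabytes

# Figures from the Interactive HPC slide; 650 is a lower bound ("650+").
INSTANCES = 650
DOLLARS_PER_DAY = 8000
HOURS_PER_DAY = 8

# Derived values, assuming memory and cost are spread evenly over instances.
gib_per_instance = quadratic_bytes / INSTANCES / 2**30
dollars_per_instance_hour = DOLLARS_PER_DAY / (INSTANCES * HOURS_PER_DAY)

print(f"{tb_decimal:.1f} TB, {tib_binary:.2f} TiB")
print(f"{gib_per_instance:.1f} GiB per instance")
print(f"${dollars_per_instance_hour:.2f} per instance-hour")
```

Under these assumptions each instance would hold a little under 23 GiB and cost about $1.54 per hour.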

Page 75: Scalable Elastic Systems Architecture (SESA)

Snowflock

[Figure, built up over slides 75 to 77. Labels: Provider, Consumer.]

Page 78: Scalable Elastic Systems Architecture (SESA)

Distributing an Object

Non-Distributed Object Instance

[Figure: a single object instance. Labels: Region List Lock (L), Region List (R0, R1, R2), Other Data Structures.]

Page 79: Scalable Elastic Systems Architecture (SESA)

[Figure: the distributed version of the object. Labels: Root, Rep0, Rep1, Rep2, each shown with a lock (L) and region entries (R0, R1, R2).]

Page 80: Scalable Elastic Systems Architecture (SESA)

Elastic Programmable Interrupt Controller

1 11 0

Event

Action

Source