
Page 1: Orca internals 101


Orca internals 101

Jeff Chase

Orcafest 5/28/09

Page 2: Orca internals 101

Summary of Earlier Talks

• Factor actors/roles along the right boundaries.
  – stakeholders, innovation, tussle
• Open contracts with delegation
  – resource leases
• Orca as control plane for GENI:
  – Aggregates are first-class (“authority”)
  – Slice controllers are first-class (“SM”)
  – Clearinghouse brokers enable policies under GSC direction

Page 3: Orca internals 101

For more…

• For more on all that, see the slides tacked onto the end of this presentation.

Page 4: Orca internals 101


Orca Internals 101

• Leasing core: “shirako”

• Plugins and control flow

• Actor concurrency model

• Lease state machines

• Resource representations

Page 5: Orca internals 101

Actors: The Big Picture

[Figure: the three actors and their protocol exchanges. The Slice Controller sends a request to the Broker and receives a ticket; it redeems the ticket with the Authority, which returns a lease. The Authority delegates its resources to the Broker.]

Page 6: Orca internals 101

Actors: The Big Picture

[Same figure as the previous slide, annotated:]

• Integrate experiment control tools here (e.g., Gush and DieselNet tools, by XMLRPC to a generic slice controller).
• Integrate substrate here, with authority-side handler plugins.
• The request/ticket/redeem/lease exchanges are inter-actor RPC calls made automatically: you should not have to mess with them.

Page 7: Orca internals 101

Terminology

• Slice controller == slice manager == service manager == guest controller
  – Blur the distinction between the “actor” and the controller module that it runs.
• Broker == clearinghouse (or a service within a clearinghouse)
  – == “agent” (in SHARP, and in some code)
• Authority == aggregate manager
  – Controls some portion of substrate for a site or domain under a Management Authority

Page 8: Orca internals 101

Slice Controllers

• Separate application/environment demand management from resource arbitration.

[Figure: Experiment Manager → Slice Controller → Site Authority. The slice controller monitors the guest and obtains/renews leases to meet demand. Aggregates/authorities monitor resources, arbitrate access, and perform placement of guest requests onto resources. Experiment control tools (e.g., Gush and DieselNet tools) attach, e.g., with XMLRPC to a generic slice controller.]

Page 9: Orca internals 101

ProtoGENI?

Possibility: export ProtoGENI XMLRPC from a generic slice controller.

From a poster on protogeni.net (we should be able to support it; one thing to consider is how close it is to ProtoGENI, which is not that complicated, something like):

• GetCredential() from slice authority
• CreateSlice() goes to slice authority
• Register() slice with clearinghouse
• ListComponents() goes to CH and returns list of AMs and CMs
• DiscoverResources() to AM or CM returns rspecs
• RequestTicket goes straight to an AM
• RedeemTicket also goes to the AM
• StartSliver to the AM after redeem: “bring sliver to a running state”
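Calls like these are plain XML-RPC over HTTP, so a generic slice controller could exercise them with nothing beyond the JDK. A minimal sketch, assuming a hypothetical endpoint URL and hand-rolling the envelope (this is not the actual ProtoGENI client API):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hedged sketch: POST one bare XML-RPC call (GetCredential) to a
// slice authority. The endpoint URL is a placeholder, and real calls
// would carry parameters and credentials.
public class XmlRpcSketch {
    public static void main(String[] args) throws Exception {
        String body =
            "<?xml version=\"1.0\"?>" +
            "<methodCall><methodName>GetCredential</methodName>" +
            "<params/></methodCall>";
        URL url = new URL("https://slice-authority.example.org/xmlrpc");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "text/xml");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        System.out.println("HTTP status: " + conn.getResponseCode());
    }
}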

Page 10: Orca internals 101

Brokers and Ticketing

[Figure: same actor diagram — the Slice Controller gets a ticket from the Broker, redeems it with the Authority for a lease, and the site delegates resources to the Broker.]

• Sites delegate control of resources to a broker.
  – Intermediary/middleman
• Factor allocation policy out of the site.
  – Broker arbitrates resources under its control.
  – Sites retain placement policy.
• “Federation”
  – Site autonomy
  – Coordinated provisioning

SHARP [SOSP 2003], w/ Vahdat, Schwab

Page 11: Orca internals 101

Actor structure: symmetry

• Actors are RPC clients and servers.
• Recoverable: commit state changes.
  – Pluggable DB layer (e.g., mysql)
• Common structures/classes in all actors:
  – A set of slices, each with a set of leases (ReservationSet) in various states.
  – Different “*Reservation*” classes with different state machine transitions.
  – Generic resource encapsulator (ResourceSet and IConcreteSet).
  – Common kernel: Shirako leasing core.

Page 12: Orca internals 101

Actors and containers

• Actors run within containers.
  – JVM, e.g., Tomcat
  – Per-container actor registry and keystore
  – Per-container mysql binding
• Actor management interface
  – Useful for GMOC and portals
  – Not yet remoteable
• Portal attaches to container.
  – Tabs for different actor “roles”
  – Dynamically loadable controllers and views (“Automat”)

Page 13: Orca internals 101

“Automat” Portal

[Screenshot of the Automat portal.]

Page 14: Orca internals 101

Shirako Kernel

Snippet from the “developer setup guide” on the web. The paths changed in the RENCI code base: prefix with core/trunk.

Page 15: Orca internals 101

Shirako Kernel Events

The actor kernel (“core”) maintains state and processes events:
– Local initiate: start request from local actor
  • E.g., from portal command, or a policy
– Incoming request from remote actor
– Management API
  • E.g., from portal or GMOC
– Timer tick
– Other notifications come through tick or protocol APIs

Page 16: Orca internals 101

Pluggable Resources and Policies

[Figure: the Leasing Core (instantiate guests, monitoring, state storage/recovery, negotiate contract terms, event handling, lease groups) at the center, flanked by Policy Modules (“Controllers”) on one side and Resource Handlers and Drivers, which configure resources, on the other.]

Page 17: Orca internals 101

Kernel control flow

• All ops come through KernelWrapper and Kernel (.java).
  – Wrapper: validate request and access
• Most operations pertain to a single lease state machine (FSM).
• But many access global state, e.g., alloc resources from shared substrate.
  – Kernel: execute op with a global “core” lock
  – Nonblocking core, at least in principle

Page 18: Orca internals 101

Kernel control flow

• Acquire core lock.
• Invoke *Reservation* class to transition lease FSM.
• Release core lock.
• Commit new state to actor DB.
• Execute async tasks, e.g., “service*” methods to invoke plugins, handlers.
• Ticks probe for completions of pending async tasks (see the sketch below).
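The same sequence in skeletal Java, with invented names (Kernel, Reservation, ActorDatabase are stand-ins here, not the actual Shirako classes):

// Illustrative sketch of the per-operation control flow above.
class Kernel {
    private final Object coreLock = new Object();
    private final ActorDatabase db;

    Kernel(ActorDatabase db) { this.db = db; }

    void process(Operation op, Reservation r) {
        synchronized (coreLock) {     // acquire the global core lock
            r.transition(op);         // drive the lease FSM forward
        }                             // release the lock before any I/O
        db.commit(r);                 // make the new state recoverable
        r.serviceAsyncTasks();        // e.g., "service*" handler upcalls
    }

    void tick() {
        // Timer tick: probe for completions of pending async tasks.
    }
}

interface Operation {}
interface Reservation { void transition(Operation op); void serviceAsyncTasks(); }
interface ActorDatabase { void commit(Reservation r); }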

Page 19: Orca internals 101

Lease State Machine

[Figure: lease state machines for the three actors across the lease term. Service Manager: form resource request → Nascent → Ticketed → Joining → Active → Extending → Active → Closed. Broker: request ticket → Ticketed → Extending, with return ticket/update ticket messages. Site Authority: redeem ticket → Priming → Active → Extending, with return lease/update lease messages and a close handshake at the end. Annotations:]

• Broker policy selects resource types and sites, and sizes unit quantities.
• Site policy assigns concrete resources to match the ticket.
• Initialize resources when the lease begins (e.g., install nodes); resources join the guest application; the guest uses the resources.
• The guest may continue to extend the lease by mutual agreement; the reservation may change size (“flex”) on extend.
• Teardown/reclaim resources after the lease expires, or on guest-initiated close.
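The service-manager-side states in the figure map naturally onto an enum; a hedged sketch (the actual *Reservation* classes encode richer joint ticket/lease state than this):

// Illustrative lease states as seen by the service manager.
enum LeaseState {
    NASCENT,    // request formed; ticket requested from the broker
    TICKETED,   // broker returned a ticket
    JOINING,    // ticket redeemed; resources joining the guest
    ACTIVE,     // guest uses the resources for the lease term
    EXTENDING,  // ticket/lease extension in flight
    CLOSED      // term expired or guest closed; resources reclaimed
}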

Page 20: Orca internals 101

Handlers

• Invocation upcalls through ResourceSet/ConcreteSet on relevant lease transitions.
  – Authority: setup/teardown
  – Slice controller: join/leave
  – Unlocked “async task” upcalls
• Relevant property sets are available to these handler calls.
  – For resource type, configuration, local/unit
  – Properties ignored by the core
• ConcreteSet associated with ShirakoPlugin.
  – E.g., COD manages “nodes” with IP addresses, invokes handlers in Ant.
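The upcall surface can be pictured as a small interface; a hedged sketch (the name and signatures are illustrative, not the Orca API — real handlers are resolved through the ConcreteSet and may be Ant scripts):

import java.util.Properties;

// Authority-side handlers implement setup/teardown; slice-controller
// handlers implement join/leave. The property sets carry type,
// configuration, and per-unit values that the core itself ignores.
interface ResourceHandler {
    void setup(Properties config, Properties unit);     // lease begins
    void teardown(Properties config, Properties unit);  // lease ends
    void join(Properties config, Properties unit);      // unit joins guest
    void leave(Properties config, Properties unit);     // unit leaves guest
}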

Page 21: Orca internals 101

Drivers

• Note that a handler runs within the actor.
• So how to run setup/teardown code on the component itself?
• How to run join/leave code on the sliver?
• Option 1: handler invokes management interfaces, e.g., XMLRPC, SNMP, ssh.
• Option 2: invoke custom driver in a NodeAgent with secure SOAP.
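Option 1 can be as plain as shelling out from the handler; a minimal sketch, assuming key-based ssh access to a placeholder host (host name and script path are invented):

import java.io.IOException;

// Hedged sketch: a handler runs setup code on the component itself
// by invoking ssh, one of the management interfaces named above.
class SshSetup {
    static int runRemote(String host, String command)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder("ssh", host, command)
                .inheritIO()   // stream remote output to this console
                .start();
        return p.waitFor();    // nonzero exit: the setup step failed
    }

    public static void main(String[] args) throws Exception {
        int rc = runRemote("node1.example.org", "/opt/guest/install.sh");
        System.out.println("exit status: " + rc);
    }
}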

Page 22: Orca internals 101

Example: VM instantiation

[Figure: sequence of handler and driver invocations to instantiate a VM.]

Page 23: Orca internals 101

TL1 Driver Framework

• General TL1 (command line) framework
  – Substrate component command/response
• What to “expect”?
  – XML file

Page 24: Orca internals 101

Orca: Actors and Protocols

[Figure: protocol flow among the three actor cores, steps numbered [1]–[8] in the original. The guest core formulates requests and sends ticket/extendTicket to the broker; the broker core allocates or extends against its inventory and calendar and replies with updateTicket; the guest then redeems tickets, sending redeem/extendLease to the authority, whose core assigns resources from its pools and replies with updateLease.]

Page 25: Orca internals 101

Policy Plugin Points

[Figure: plugin points across the three actors. The service manager (for the guest application) runs an application resource request policy over a leasing API, plus a join/leave handler for the service and a lease event interface. The broker plugs in policies for resource selection and provisioning behind its broker service interface. The site authority (host resource pool) runs an assignment policy and setup/teardown handlers for resources behind a leasing service interface, with lease status notifications back to the guest. Negotiation between policy plugins covers allocation and configuration; properties are used to guide the negotiation.]

Page 26: Orca internals 101

Property Lists

[Figure: the same plugin diagram, annotated with the property lists exchanged between actors. Broker policy examples: FCFS, priority, economic.]

• Request properties: elastic, deferrable
• Resource properties: machine.memory, machine.clockspeed
• Configuration properties: image.id, public.key
• Unit properties: host.ip, host.key
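These lists are ordinary key/value sets; a sketch of the four categories as java.util.Properties (keys follow the slide, values are invented):

import java.util.Properties;

class PropertyListsDemo {
    public static void main(String[] args) {
        Properties request = new Properties();   // guest -> broker
        request.setProperty("elastic", "true");
        request.setProperty("deferrable", "true");

        Properties resource = new Properties();  // broker describes offerings
        resource.setProperty("machine.memory", "4096");
        resource.setProperty("machine.clockspeed", "2400");

        Properties config = new Properties();    // guest -> authority
        config.setProperty("image.id", "debian-base");
        config.setProperty("public.key", "ssh-rsa AAAA...");

        Properties unit = new Properties();      // authority -> guest, per unit
        unit.setProperty("host.ip", "10.0.0.7");
        unit.setProperty("host.key", "ssh-rsa BBBB...");

        System.out.println(request);
        System.out.println(resource);
        System.out.println(config);
        System.out.println(unit);
    }
}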

Page 27: Orca internals 101

Messaging Model

• Proxies maintained in actor registry.
  – Asynchronous RPC
  – Upcall to kernel for incoming ops
  – Downcall from lease FSM for outgoing
• Local, SOAP w/ WS-Security, etc.
  – WSDL protocols, but incomplete
• Integration (e.g., XMLRPC)
  – Experiment manager calls into slice controller module
  – Substrate ops through authority-side handler

Page 28: Orca internals 101

The end, for now

• The presentation trails off here.
• What follows are slides from previous presentations dealing more with the concepts and rationale of Orca, and its use for GENI.

Page 29: Orca internals 101

NSF GENI Initiative

[Figure: experiments (guests occupying slices) are embedded onto a sliverable GENI substrate (contributing domains/aggregates), cast as wind tunnel, petri dish, and observatory.]

Page 30: Orca internals 101

Dreaming of GENI

[Figure: an optical switch component (with channel, band, switch port, and fiber ID ρ) and its GID; an aggregate Management Authority (MA); and the NSF GENI clearinghouse with its component registry and usage policy engine.]

1. CM self-generates GID: public and private keys.
2. CM sends GID to MA; out-of-band methods are used to validate that the MA is willing to vouch for the component. CM delegates to the MA the ability to create slices.
3. MA (because it has sufficient credentials) registers name, GID, URIs, and some descriptive info.
4. MA delegates rights to NSF GENI so that NSF GENI users can create slices.

Notes:
• Identity and authorization are decoupled in this architecture. GIDs are used for identification only; credentials are used for authorization. I.e., the GID says only who the component is and nothing about what it can do or who can access it.
• Assumes the aggregate MA already has credentials permitting access to the component registry.

Aaron Falk, GPO BBN (http://groups.geni.net/)

Page 31: Orca internals 101


Slivers and Slices

Aaron Falk, GPO BBN

Page 32: Orca internals 101

GENI as a Programmable Substrate

• Diverse and evolving collection of substrate components.
  – Different owners, capabilities, and interfaces
• A programmable substrate is an essential platform for R&D in network architecture at higher layers.
  – Secure and accountable routing plane
  – Authenticated traffic control (e.g., free of DOS and spam)
  – Mobile social networking w/ “volunteer” resources
  – Utility networking
  – Deep introspection and adaptivity
  – Virtual tunnels and bandwidth-provisioned paths

Page 33: Orca internals 101

Some Observations

• The Classic Internet is “just an overlay”.
  – GENI is underlay architecture (“underware”).
• Incorporate edge resources: “cloud computing” + sliverable network.
• Multiple domains (MAD): not a “Grid”, but something like dynamic peering contracts.
  – Decouple services from substrate; manage the substrate; let the services manage themselves.
• Requires predictable (or at least “discoverable”) allocations for reproducibility.
  – QoS at the bottom or not at all?

Page 34: Orca internals 101

Breakable Experimental Network (BEN)

• BEN is an experimental fiber facility.
• Supports experimentation at metro scale.
  – Distributed applications researchers
  – Networking researchers
• Enabling disruptive technologies.
  – Not a production network
• Shared by the researchers at the three Triangle Universities.
  – Coarse-grained time sharing is the primary mode of usage.
  – Assumes some experiments must be granted exclusive access to the infrastructure.

Page 35: Orca internals 101

Resource Control Plane

[Figure: a resource control plane layered over hardware nodes hosting VMs, beneath middleware, cloud apps, services, and other guests.]

Open Resource Control Architecture (Orca):
• Contract model for resource peering/sharing/management
• Programmatic interfaces and protocols
• Automated lease-based allocation and assignment
• Share substrate among dynamic “guest” environments
• http://www.cs.duke.edu/nicl/

Page 36: Orca internals 101

The GENI Control Plane

• Programmable substrate elements
• Dynamic end-to-end sliver allocation + control
  – Delegation of authority etc.
  – Instrumentation (feedback)
• Resource representation and exchange
  – Defining the capabilities of slivers
  – “network virtual resource”
• Foundation for discovery
  – Of resources, paths, topology

[Figure: three slivers a, b, c with resource vectors ra=(8,4), rb=(4,8), rc=(4,4) plotted against CPU shares and bandwidth shares (→16).]

Page 37: Orca internals 101

Define: Control Plane

GGF+GLIF: "Infrastructure and distributed intelligence that controls the establishment and maintenance of connections in the network, including protocols and mechanisms to disseminate this information; and algorithms for automatic delivery and on-demand provisioning of an optimal path between end points.”

s/connections/slices/
s/optimal path/embedded slices/
provisioning += and programmed instantiation

Page 38: Orca internals 101


Key Questions

• Who are the entities (actors)?

• What are their roles and powers?

• Whom do they represent?

• Who says what to whom?

• What innovation is possible within each entity, or across entities?

Control plane defines “the set of entities that interact to establish, maintain, and release resources and provide… [connection, slice] control functions”.

Page 39: Orca internals 101


Design Tensions

• Governance vs. freedom

• Coordination vs. autonomy

• Diversity vs. coherence

• Assurance vs. robustness

• Predictability vs. efficiency

• Quick vs. right

• Inclusion vs. entanglement

• Etc. etc. …

Page 40: Orca internals 101

Design Tensions

• What is standardized vs. what is open to innovation?
• How can GENI be open to innovation in components/management/control?
  – We want it to last a long time.
  – Innovation is what GENI is for.
• Standardization vs. innovation
  – Lingua Franca vs. Tower of Babel

Page 41: Orca internals 101

Who Are the Actors?

• Principle #1: Entities (actors) in the architecture represent the primary stakeholders.
  1. Resource owners/providers (site or domain)
  2. Slice owners/controllers (guests)
  3. The facility itself, or resource scheduling services acting on its behalf
• Others (e.g., institutions) are primarily endorsing entities in the trust chains.

Page 42: Orca internals 101

Control Plane

[Figure: infrastructure providers contribute resources; brokering intermediaries (clearinghouse) mediate between them and guests such as a cloud service, a network service, etc. Plug guests, resources, and management policies into the “cloud”.]

Page 43: Orca internals 101

Contracts

• Principle #2: provide pathways for contracts among actors.
  – Accountability [SHARP, SOSP 2003]
• Be open with respect to what promises an actor is permitted to make.
  – Open innovation for contract languages and tools
  – Yes, need at least one LCD
    • Rspec > HTML 1.0
    • Lingua Franca vs. Tower of Babel
• Resource contracts are easier than service/interface contracts.

Page 44: Orca internals 101

Rules for Resource Contracts

• Don’t make promises you can’t keep… but don’t hide power. [Lampson]
• There are no guarantees, ever.
  – Have a backup plan for what happens if “assurances” are not kept.
• Provide sufficient power to represent what promises the actor is explicitly NOT making.
  – E.g., temporary donation of resources
  – Best effort, probabilistic overbooking, etc.
• Incorporate time: start/expiration time.
  – Resource contracts are leases (or tickets).

Page 45: Orca internals 101

Leases

• Foundational abstraction: resource leases
• Contract between provider (site) and guest
  – Bind a set of resource units from a site to a guest
  – Specified term (time interval)
  – Automatic extends (“meter feeding”)
  – Various attributes

[Figure: the guest sends a request; the provider site grants a signed lease:]

<lease>
  <issuer> Site’s public key </issuer>
  <signed_part>
    <holder> Guest’s public key </holder>
    <rset> resource description </rset>
    <start_time> … </start_time>
    <end_time> … </end_time>
    <sn> unique ID at Site </sn>
  </signed_part>
  <signature> Site’s signature </signature>
</lease>

Page 46: Orca internals 101

Network Description Language?

<ndl:Interface rdf:about="#tdm3.amsterdam1.netherlight.net:501/3">
  <ndl:name>tdm3.amsterdam1.netherlight.net:501/3</ndl:name>
  <ndl:connectedTo rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1"/>
  <ndl:capacity rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.244E+9</ndl:capacity>
</ndl:Interface>
<ndl:Interface rdf:about="http://networks.internet2.edu/manlan/manlan.rdf#manlan:if1">
  <rdfs:seeAlso rdf:resource="http://networks.internet2.edu/manlan/manlan.rdf"/>

<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ndl="http://www.science.uva.nl/research/sne/ndl#"
         xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#">
<!-- Description of Netherlight -->
<ndl:Location rdf:about="#Amsterdam1.netherlight.net">
  <ndl:name>Netherlight Optical Exchange</ndl:name>
  <geo:lat>52.3561</geo:lat>
  <geo:long>4.9527</geo:long>
</ndl:Location>
<!-- TDM3.amsterdam1.netherlight.net -->
<ndl:Device rdf:about="#tdm3.amsterdam1.netherlight.net">
  <ndl:name>tdm3.amsterdam1.netherlight.net</ndl:name>
  <ndl:locatedAt rdf:resource="#Amsterdam1.netherlight.net"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/1"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/2"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/3"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:501/4"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/1"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/2"/>
  <ndl:hasInterface rdf:resource="#tdm3.amsterdam1.netherlight.net:502/3"/>

Page 47: Orca internals 101

Delegation

• Principle #3: Contracts enable delegation of powers.
  – Delegation is voluntary and provisional.
• It is a building block for creating useful concentrations of power.
  – Creates a potential for governance
  – Calendar scheduling, reservation
  – Double-edged sword?
• Facility can Just Say No.

Page 48: Orca internals 101

Aggregation

• Principle #4: aggregate the resources for a site or domain.
  – Primary interface is domain/site authority.
• Abstraction/innovation boundary
  – Keep components simple
  – Placement/configuration flexibility for owner
  – Mask unscheduled outages by substitution
  – Leverage investment in technologies for site/domain management

Page 49: Orca internals 101

BEN fiberplant

• Combination of NCNI fiber and campus fiber
• Possible fiber topologies:

[Figure: candidate fiber topologies.]

Page 50: Orca internals 101

Infinera DTN

• PIC-based solution
• 100Gbps DLM (digital line module)
  – Circuits provisioned at 2.5G granularity
• Automatic optical layer signal management (gain control etc.)
• GMPLS-based control plane
• Optical express
  – All-optical node bypass

Page 51: Orca internals 101

Experimentation on BEN

• Extend Orca to enable slivering of network elements:
  – Fiber switches
  – DWDM equipment
  – Routers
• Adapt mechanisms to enable flexible description of network slices.
  – NDL
• Demonstrate end-to-end slicing on BEN.
  – Create realistic slices containing compute, storage, and network resources.
  – Run sample experiments on them.

Page 52: Orca internals 101

BEN Usage

• Experimental equipment connected to the BEN fiberplant at BEN points-of-presence.
• Use MEMS fiber switches to switch experimental equipment in and out.
  – Based on the experiment schedule
• By nature of the facility, experiments running on it may be disruptive to the network.
• BEN points of presence are located at the RENCI engagement sites and the RENCI anchor site.

Page 53: Orca internals 101

BEN Redux

• Reconfigurable optical plane
  – We will be seeking out opportunities to expand the available fiber topology.
• Researcher equipment access at all layers
  – From dark fiber up
• Coarse-grained scheduling
• Researcher-controlled
• No single-vendor lock-in
• Equipment with exposable APIs
• Connectivity with substantial non-production resources

Page 54: Orca internals 101

Elements of Orca Research Agenda

• Automate management inside the cloud.
  – Programmable guest setup and provisioning
• Architect a guest-neutral platform.
  – Plug in new guests through protocols; don’t hard-wire them into the platform.
• Design flexible security into an open control plane.
• Enforce fair and efficient sharing for elastic guests.
• Incorporate diverse networked resources and virtual networks.
• Mine instrumentation data to pinpoint problems and select repair actions.
• Economic models and sustainability.

Page 55: Orca internals 101

Leasing Virtual Infrastructure

The hardware infrastructure consists of pools of typed “raw” resources distributed across sites:
– e.g., CPU shares, memory, etc.: “slivers”
– storage server shares [Jin04]
– measured, metered, independent units
– varying degrees of performance isolation

Policy agents control negotiation/arbitration:
– programmatic, service-oriented leasing interfaces
– lease contracts

[Figure: repeats the guest/provider lease exchange and signed <lease> XML from the earlier “Leases” slide, and the sliver vectors ra=(8,4), rb=(4,8), rc=(4,4) from “The GENI Control Plane”.]

Page 56: Orca internals 101

Summary

• Factor actors/roles along the right boundaries.
  – stakeholders, innovation, tussle
• Open contracts with delegation
• Specific recommendations for GENI:
  – Aggregates are first-class entities.
  – Component interface: permit innovation.
  – Clearinghouse: enable policies under GSC direction.

Page 57: Orca internals 101

Modularize Innovation

• Control plane design should enable local innovation within each entity.
• Can GENI be a platform for innovation of platforms? Management services?
  – How to carry forward the principle that PlanetLab calls “unbundled management”?
• E.g., how to evolve standards for information exchange and contracts.
  – Lingua Franca or Tower of Babel?

Page 58: Orca internals 101

Slices: Questions

• What “helper” tools/interfaces must we have, and what do they require from the control plane?
• Will GENI enable research on new management services and control plane?
  – If software is the “secret weapon”, what parts of the platform are programmable/replaceable?
• Co-allocation/scheduling of an end-to-end slice?
  – What does “predictable and repeatable” mean?
  – What assurances are components permitted to offer?
• What level of control/stability do we assume over the substrate?

Page 59: Orca internals 101

Focus questions

• Specify/design the “core services”:
  – Important enough and hard enough to argue about
  – Must be part of facilities planning
  – Directly motivated by usage scenarios
  – Deliver maximum bang for ease-of-use
  – User-centric, network-centric
• Enable a flowering of extensions/plugins.
  – Find/integrate technology pieces of value
• What requirements do these services place on other WGs?
