36
HPTS Keynote, October 200 HPTS Keynote, October 200 Invariant Boundaries Invariant Boundaries Dr. Eric A. Brewer Dr. Eric A. Brewer Professor, UC Berkeley Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi Co-Founder & Chief Scientist, Inktomi

Our Perspective

  • Upload
    rey

  • View
    30

  • Download
    0

Embed Size (px)

DESCRIPTION

Invariant Boundaries Dr. Eric A. Brewer Professor, UC Berkeley Co-Founder & Chief Scientist, Inktomi. Our Perspective. Inktomi builds two distributed systems: Global Search Engines Distributed Web Caches Based on scalable cluster & parallel computing technology - PowerPoint PPT Presentation

Citation preview

Page 1: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Invariant BoundariesInvariant Boundaries

Dr. Eric A. BrewerDr. Eric A. BrewerProfessor, UC BerkeleyProfessor, UC Berkeley

Co-Founder & Chief Scientist, InktomiCo-Founder & Chief Scientist, Inktomi

Page 2: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Our PerspectiveOur Perspective

Inktomi builds two Inktomi builds two distributed systems:distributed systems:– Global Search EnginesGlobal Search Engines– Distributed Web CachesDistributed Web Caches

Based on scalable Based on scalable cluster & parallel cluster & parallel computing technologycomputing technology

But very little use of But very little use of classic DS research...classic DS research...

Page 3: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

““Distributed Systems” don’t work...Distributed Systems” don’t work...

There exist working DS:There exist working DS:– Simple protocols: DNS, WWWSimple protocols: DNS, WWW– Inktomi search, Content Delivery NetworksInktomi search, Content Delivery Networks– Napster, Verisign, AOLNapster, Verisign, AOL

But these are not classic DS:But these are not classic DS:– Not distributed objectsNot distributed objects– No RPCNo RPC– No modularityNo modularity– Complex ones are single owner (except phones)Complex ones are single owner (except phones)

Page 4: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Concept: Invariant BoundariesConcept: Invariant Boundaries

Claim: we don’t understand boundariesClaim: we don’t understand boundaries Solution: make “Invariant Boundary” explictitSolution: make “Invariant Boundary” explictit Goal: simpler, easier, faster to correctnessGoal: simpler, easier, faster to correctness

InvariantHolds

Invariantmay not hold

Page 5: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Three Basic IssuesThree Basic Issues

Where is the state?Where is the state?

Consistency vs. AvailabilityConsistency vs. Availability

Communication BoundariesCommunication Boundaries

Page 6: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Santa Clara ClusterSanta Clara Cluster

• Very uniform• No monitors• No people• No cables

• Working power• Working A/C• Working BW

Page 7: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Boundaries for Kinds of StateBoundaries for Kinds of State

StatelessStateless– Front endsFront ends

Immutable stateImmutable state

Soft State (rebuild on restart)Soft State (rebuild on restart)

Durable Single-writer (e.g. user data)Durable Single-writer (e.g. user data)– Emissaries, horizontal partitioningEmissaries, horizontal partitioning

Durable MWDurable MW– Fiefdoms, DBMSFiefdoms, DBMS

Page 8: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Persistent State is HARDPersistent State is HARD

Classic DS focus on the computation, not the dataClassic DS focus on the computation, not the data– this is WRONG, computation is the easy partthis is WRONG, computation is the easy part

Data centers exist for a reasonData centers exist for a reason– can’t have consistency or availability without themcan’t have consistency or availability without them

Other locations are for caching only:Other locations are for caching only:– proxies, basestations, set-top boxes, desktopsproxies, basestations, set-top boxes, desktops– phones, PDAs, …phones, PDAs, …

Distributed systems can’t ignore locationDistributed systems can’t ignore location

Invariant Boundary is Invariant Boundary is smallsmall

Page 9: Our Perspective

APActive Proxy:Bootstraps thin devices into infrastructure, runs mobile code

AP

Workstations & PCs

Berkeley Ninja ArchitectureBerkeley Ninja Architecture

Base: Scalable, highly-available platform for persistent-state services

Internet

PDAs(e.g. IBM Workpad)Cellphones, Pagers, etc.

Page 10: Our Perspective

APActive Proxy:Bootstraps thin devices into infrastructure, runs mobile code

AP

Workstations & PCs

Berkeley Ninja ArchitectureBerkeley Ninja Architecture

Base: Scalable, highly-available platform for persistent-state services

Internet

PDAs(e.g. IBM Workpad)Cellphones, Pagers, etc.

Consistent Shared

Soft-state, immutable state

Single user

Page 11: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Three Basic IssuesThree Basic Issues

Where is the state?Where is the state?

Consistency vs. AvailabilityConsistency vs. Availability

Communication BoundariesCommunication Boundaries

Page 12: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Data is only consistent *inside*Data is only consistent *inside*

Data is not consistent(“reference” data)

Consistent A

Page 13: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Data is only consistent *inside*Data is only consistent *inside*

Consistent A

Data is not consistent(“reference” data)

Consistent B

Page 14: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

The CAP TheoremThe CAP Theorem

Consistency Availability

Tolerance to network

Partitions

Theorem: You can have at most two of these invariantsfor any shared-data system

Page 15: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

The CAP TheoremThe CAP Theorem

Consistency Availability

Tolerance to network

Partitions

Theorem: You can have at most two of these invariantsfor any shared-data system

Corollary: consistency boundary must choose A or P

Page 16: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Forfeit PartitionsForfeit Partitions

Consistency Availability

Tolerance to network

Partitions

ExamplesExamples Single-site databasesSingle-site databases Cluster databasesCluster databases LDAPLDAP FiefdomsFiefdoms

TraitsTraits 2-phase commit2-phase commit cache validation protocolscache validation protocols The “inside”The “inside”

Page 17: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Forfeit AvailabilityForfeit Availability

Consistency Availability

Tolerance to network

Partitions

ExamplesExamples Distributed databasesDistributed databases Distributed lockingDistributed locking Majority protocolsMajority protocols

TraitsTraits Pessimistic lockingPessimistic locking Make minority partitions Make minority partitions

unavailableunavailable

Page 18: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Forfeit ConsistencyForfeit Consistency

Consistency Availability

Tolerance to network

Partitions

ExamplesExamples CodaCoda Web cachingeWeb cachinge DNSDNS EmissariesEmissaries

TraitsTraits expirations/leasesexpirations/leases conflict resolutionconflict resolution OptimisticOptimistic The “outside”The “outside”

Page 19: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

ACID vs. BASEACID vs. BASE

ACIDACID Strong consistencyStrong consistency

IsolationIsolation

Focus on “commit”Focus on “commit”

Nested transactionsNested transactions

Availability?Availability?

Conservative (pessimistic)Conservative (pessimistic)

Difficult evolutionDifficult evolution(e.g. schema)(e.g. schema)

““small” Invariant Boundarysmall” Invariant Boundary

The “inside”The “inside”

BASEBASE Weak consistencyWeak consistency

– stale data OKstale data OK

Availability firstAvailability first

Best effortBest effort

Approximate answers OKApproximate answers OK

Aggressive (optimistic)Aggressive (optimistic)

““Simpler” and fasterSimpler” and faster

Easier evolution (XML)Easier evolution (XML)

““wide” Invariant Boundarywide” Invariant Boundary

Outside consistency boundaryOutside consistency boundary

but it’s a spectrum

Page 20: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Consistency Boundary Summary Consistency Boundary Summary

Can have consistency & availability within a Can have consistency & availability within a cluster. No partitions within boundary!cluster. No partitions within boundary!

OS/Networking better at A than COS/Networking better at A than C

Databases better at C than ADatabases better at C than A

Wide-area databases can’t have bothWide-area databases can’t have both

Disconnected clients can’t have bothDisconnected clients can’t have both

Page 21: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Three Basic IssuesThree Basic Issues

Where is the state?Where is the state?

Consistency vs. AvailabilityConsistency vs. Availability

Communication BoundariesCommunication Boundaries

Page 22: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

The BoundaryThe Boundary

The interface between two modulesThe interface between two modules– client/server, peers, libaries, etc…client/server, peers, libaries, etc…

Basic boundary = the procedure callBasic boundary = the procedure call

– thread traverses the boundarythread traverses the boundary– two sides are in the same address spacetwo sides are in the same address space

What invariants don’t hold across?What invariants don’t hold across?

C S

Page 23: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Different Address SpacesDifferent Address Spaces

What if the two sides are NOT in the same What if the two sides are NOT in the same address space?address space?– IPC or LRPCIPC or LRPC

Can’t do pass-by-reference (pointers)Can’t do pass-by-reference (pointers)– Most IPC screws this up: pass by value-resultMost IPC screws this up: pass by value-result– There are TWO copies of args not oneThere are TWO copies of args not one

What if they share some memory?What if they share some memory?– Can pass pointers, but…Can pass pointers, but…– Need synchronization between client/serverNeed synchronization between client/server– Not all pointers can be passedNot all pointers can be passed

Page 24: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Partial FailurePartial Failure

Can the two sides fail independently?Can the two sides fail independently?– RPC, IPC, LRPCRPC, IPC, LRPC

Can’t be transparent (like RPC) !!Can’t be transparent (like RPC) !!

New exceptions (other side gone)New exceptions (other side gone)

Idempotent calls?Idempotent calls?– Use Transaction Ids (to solve replay problem)Use Transaction Ids (to solve replay problem)

Reclaim local resourcesReclaim local resources– e.g. kernels leak sockets over time => reboote.g. kernels leak sockets over time => reboot

RPC tries to hide these issues (but fails)RPC tries to hide these issues (but fails)

Use Level 4/7 switches to hide failures?Use Level 4/7 switches to hide failures?

Page 25: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Resource AllocationResource Allocation

How to reclaim resources allocated for client?How to reclaim resources allocated for client?– Usually timeout exception cleans upUsually timeout exception cleans up– Release locks? (must track them!)Release locks? (must track them!)– How to avoid long delays while holding resources?How to avoid long delays while holding resources?

How long to remember client?How long to remember client?– Delayed responses (past timeout) must be ignoredDelayed responses (past timeout) must be ignored

Problem with leases:Problem with leases:– Great for servers, but…Great for servers, but…– Client’s lease may expire mid operationClient’s lease may expire mid operation– Hard to make client updates atomic with multiple leases (2PC?)Hard to make client updates atomic with multiple leases (2PC?)– Which things have leases? (can be hidden)Which things have leases? (can be hidden)

Page 26: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Trust the other side?Trust the other side?

What if we don’t trust the other side?What if we don’t trust the other side?– Or partial trust (legal contract), or malicious?Or partial trust (legal contract), or malicious?– Have to check args, no pointer passingHave to check args, no pointer passing

Limited release of information (leaks)Limited release of information (leaks)

Kernels get this right:Kernels get this right:– copy/check argscopy/check args– use opaque references (e.g. File Descriptors)use opaque references (e.g. File Descriptors)

Most systems do notMost systems do not– TCP, Napster, web browsersTCP, Napster, web browsers

Security boundaries tend to be explicitSecurity boundaries tend to be explicit– Holes come from services!Holes come from services!

Page 27: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Multiplexing clients?Multiplexing clients?

Does the server have to:Does the server have to:– Deal with high concurrency?Deal with high concurrency?– Say “no” sometimes (graceful degradation)Say “no” sometimes (graceful degradation)– Treat clients equally (fairness)Treat clients equally (fairness)– Bill for resources (and have audit trail)Bill for resources (and have audit trail)– Isolate clients performance, data, ….Isolate clients performance, data, ….

These all affect the boundary definitionThese all affect the boundary definition

Page 28: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Boundary evolution?Boundary evolution?

Can the two sides be updated independently? Can the two sides be updated independently? (NO)(NO)

The DLL problem...The DLL problem...

Boundaries need versionsBoundaries need versions

Negotiation protocol for upgrade?Negotiation protocol for upgrade?

Promises of backward compatibility?Promises of backward compatibility?

Affects naming too (version number)Affects naming too (version number)

Page 29: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Example: protocols vs. APIsExample: protocols vs. APIs

Protocols have been more successful than APIsProtocols have been more successful than APIs

Some reasons:Some reasons:– protocols are pass by valueprotocols are pass by value– protocols designed for partial failureprotocols designed for partial failure– not trying to look like local procedure callsnot trying to look like local procedure calls– explicit state machine, rather than call/returnexplicit state machine, rather than call/return

(this exposes exceptions well)(this exposes exceptions well)

Protocols still not good at trust, billing, evolutionProtocols still not good at trust, billing, evolution

Page 30: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Example: XMLExample: XML

XML doesn’t solve any of these issuesXML doesn’t solve any of these issues

It is RPC with an extensible type systemIt is RPC with an extensible type system

It makes evolution better?It makes evolution better?– two sides need to agree on schematwo sides need to agree on schema– can ignore stuff you don’t understandcan ignore stuff you don’t understand– Must agree on meaning, not just tagsMust agree on meaning, not just tags

Can mislead us to ignore/postpone the real issuesCan mislead us to ignore/postpone the real issues

Page 31: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Example: servicesExample: services

Claim: you can’t magically convert a class to a Claim: you can’t magically convert a class to a serviceservice

Behavior depends on the boundaries that callers Behavior depends on the boundaries that callers cross…. cross…. – Trusted? Multiplexed? Partial failure? Namespaces?Trusted? Multiplexed? Partial failure? Namespaces?

Shouldn’t TRY to be transparentShouldn’t TRY to be transparent

Instead: make it easier to state boundary Instead: make it easier to state boundary assumptions (and check them)assumptions (and check them)

Page 32: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Annotated InterfacesAnnotated Interfaces

IDL can annotate interfaces:IDL can annotate interfaces:– ““Timeout” => client may not respondTimeout” => client may not respond– ““Trusted” => no malicious clientsTrusted” => no malicious clients– ““Malicious’ => client may be hostileMalicious’ => client may be hostile– ““Multiplexed” => many simultaneous callersMultiplexed” => many simultaneous callers– ““Idempotent”Idempotent”– Etc.Etc.

Could be checked at statically in some cases, Could be checked at statically in some cases, dynamically in othersdynamically in others

Annotations + Boundaries => fewer bugsAnnotations + Boundaries => fewer bugs

Page 33: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Lessons for ApplicationsLessons for Applications

Make boundaries very explicitMake boundaries very explicit– Not just client/serverNot just client/server– Independent systemsIndependent systems– Third-party softwareThird-party software– Third-party services (RPC to vendors/partners)Third-party services (RPC to vendors/partners)

Have a few big modules…Have a few big modules…– Otherwise too many boundaries, and no invariantsOtherwise too many boundaries, and no invariants– Examples: Apache, Oracle, Inktomi search engineExamples: Apache, Oracle, Inktomi search engine– Big Modules are well supportedBig Modules are well supported– Big Modules justify their cost (Lampson)Big Modules justify their cost (Lampson)

Page 34: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

Partial checklistPartial checklist

What is shared? (namespace, schema?)What is shared? (namespace, schema?)

What kind of state in each boundary?What kind of state in each boundary?

How would you evolve an API?How would you evolve an API?

Lifetime of references? Expiration impact?Lifetime of references? Expiration impact?

Graceful degradation as modules go down?Graceful degradation as modules go down?

External persistent names?External persistent names?

Consistency semantics and boundary?Consistency semantics and boundary?

Page 35: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

ConclusionsConclusions

Most systems are fragileMost systems are fragile Root causes:Root causes:

– False transparencyFalse transparency: assuming locality, trust, privacy…: assuming locality, trust, privacy…– ImplicitlyImplicitly changingchanging boundaries: consistency, partial failure, … boundaries: consistency, partial failure, …

Some of the causes:Some of the causes:– focus on computation, not datafocus on computation, not data– ignoring location distinctionsignoring location distinctions– overestimating consistency boundaryoverestimating consistency boundary– degraded boundaries (RPC is not a PC)degraded boundaries (RPC is not a PC)

Invariant Boundaries Invariant Boundaries – Help understanding, documentationHelp understanding, documentation– Simplify detectionSimplify detection– Simpler and easier than full specifications (but weaker)Simpler and easier than full specifications (but weaker)

Page 36: Our Perspective

HPTS Keynote, October 2001HPTS Keynote, October 2001

ACID vs. BASEACID vs. BASE

DBMS research is about ACID (mostly)DBMS research is about ACID (mostly)

But we forfeit “C” and “I” for availability, graceful But we forfeit “C” and “I” for availability, graceful degradation, and performancedegradation, and performance

This tradeoff is fundamental.This tradeoff is fundamental.

BASE:BASE:– BBasically asically AAvailablevailable– SSoft-stateoft-state– EEventual consistencyventual consistency