A BRIEF INTRODUCTION TO HIGH ASSURANCE CLOUD COMPUTING WITH ISIS2
Ken Birman, Cornell University
Slide 2
About the Lecturer: an introduction to the lecturer.
Slide 3
Ken Birman: researcher in high assurance computing since joining Cornell in 1982 (PhD, U.C. Berkeley). Currently Cornell's N. Rama Rao Professor of Computer Science. ACM Fellow; winner of the IEEE Tsutomu Kanai Award. Built the distributed software infrastructure used for a decade by the New York Stock Exchange, and still used in the French Air Traffic Control System, the US Navy AEGIS, and several other mission-critical systems. Contact information at http://www.cs.cornell.edu/ken
Slide 4
Segment I: The Cloud Landscape. Introducing terminology; an informal description of goals.
Slide 5
High Assurance and the Cloud. Cloud computing is the new universal standard: a technology for federating network services. It makes data easy to share, is deeply integrated with web pages, and supports a wide range of media types. But the cloud can't offer high assurance today! A wave of sensitive applications is approaching (areas like mHealth, the smart power grid, eBanking, smart cars...). They need strong guarantees... what can we do to help?
Slide 6
How does today's cloud work? Client platform: browsers and apps, which are programs that exploit a stripped-down browser API. The Internet transports the data. Data centers run web services that produce the pages we see, stream videos, etc.
Slide 7
Each step embodies weaknesses. The client system is vulnerable to loss of connectivity, compromise by downloaded code, and infection by viruses and worms. The Internet layer is potentially unreliable: the mapping of domain names to IP addresses is very complex (a consequence of the cloud's need to steer traffic), network reliability is much lower than it needs to be, and it is much too easy to snoop on traffic or attack connections. The Web Services infrastructure can fail or reconfigure abruptly, forcing the client to reconnect.
Slide 8
Recipe for high assurance. Design a system to fail only in safe ways: nobody gets hurt, but perhaps the system reports that it has gone offline. Then do everything practical to enhance reliability, consistency, security, and other needed properties. Today: focus on the web services running in the cloud data center.
Slide 9
Tradeoffs in cloud space. The properties we need are in tension! Snappy response: every 100ms matters. Elasticity: load varies suddenly and dramatically, so service replication levels need to vary accordingly. Consistency: if distinct service replicas talk to multiple clients about something, they don't say contradictory things. Fault-tolerance: if a replica crashes, the cloud self-heals. Attack-tolerance: the service is very hard to attack. Security: authenticated clients are limited to performing authorized actions in accordance with a policy. Privacy: I can control who uses my data and how. (On the slide, the first two properties are labeled "Required" and the remaining ones "Often weak or lacking".)
Slide 10
Today's cloud: As fast as possible. In the race to offer the fastest possible services to the largest possible number of clients, today's cloud often gives up on other assurance properties. In some sense the cloud is insecure and inconsistent by design! ...but does it have to be that way?
Slide 11
Tomorrow: A high assurance cloud! A single system needs to tell multiple kinds of assurance stories, and not all in the same way. An mHealth application needs to reassure the user that it is trustworthy, needs to help the developer make the right choices, must implement complex protocols correctly, and must be a good citizen on the cloud data center.
Slide 12
Segment II: Examples. A few slides each on some challenging problems. Each needs the cloud... but each needs some form of strong assurance guarantee too.
Slide 13
Example 1: Power grid. Today's power grid has serious issues. Wasteful: as much as 15% of power is lost just moving it around, and a great deal of renewable energy (solar, wind, tides) is lost because of poor integration with the standard grid. Rigid: ideally, the grid should adapt and move parcels of power much as the Internet moves packets. Dumb: even when it is obvious that we could optimize behavior, the grid uses old, inefficient techniques. Goal: a smart power grid!
Slide 14
How a small power grid operates. Power flows like water: it follows the path of least resistance, governed by Kirchhoff's laws. Power enters at every generator and exits at every load. Hierarchical structure: primary power busses, and secondary, smaller local feeds. (Figure: the 10-generator, 39-bus New England system.)
Slide 15
Technology to enable a smart grid. We'll need to monitor power loads, frequency, and current in real time, reliably and securely; use this data to estimate the state of the grid and to predict its evolution over time; and use those predictions to plan control actions: increase/decrease generation, borrow reactive power from neighboring regions, adapt pricing, etc. Ultimately the grid will become a new kind of network. But it must also be safe, efficient, and secure against mishaps and even attacks!
Slide 16
Even mundane problems can hurt. California: repeated episodes of market manipulation aimed at increasing profits for companies such as Enron that speculated on pricing. Multi-state and multi-national rolling outages cause turmoil for air traffic and ground traffic, and cause telephone outages. Will smartness also make the grid more fragile? What about the risk of cyberattacks?
Slide 17
Control of the smart power grid. Suppose that a cloud control system speaks with two voices. In physical infrastructure settings, the consequences can be very costly. (Figure: one voice says "Switch on the 50KV Canadian bus" while the other says "Canadian 50KV bus going offline"; the next build of the slide shows the result: "Bang!")
Slide 19
Power grid summary. To make the grid smart, we need to monitor at a massive scale and use that data to initiate control actions. But for this to be safe, we need more than fast response and elasticity. We also need security (so that attackers can't take the grid down)... and consistency (as we just saw)... and fault-tolerance (since power systems often experience failures of various kinds).
Slide 20
Example 2: mHealth. A term for everything outside the doctor's office (though it might be linked to electronic health records). The goal is to make your life better and healthier: encourage activity, discourage poor nutrition choices, help patients with chronic conditions manage their complex medical devices and medications, and offer caregivers a window into the patient's health so that the patient can maintain independence.
Slide 21
What properties are needed in remote medical care systems? (Figure: a home healthcare application connected through the cloud infrastructure to an integrated glucose monitor and insulin pump that receives instructions wirelessly, a motion sensor and fall-detector, and a medication station that tracks and dispenses pills; a healthcare provider monitors large numbers of remote patients.)
Slide 22
Durability... scalability... fast response. Need: strong consistency and durability for data. (Figure: a note flows through the cloud infrastructure into the patient records DB: "Mrs. Marsh has been dizzy. Her stomach is upset and she hasn't been eating well, yet her blood sugars are high. Let's stop the oral diabetes medication and increase her insulin, but we'll need to monitor closely for a week.")
Slide 23
What do these terms mean? Consistency: even if accessed by multiple users concurrently, the data looks like a single database. This sounds like it should obviously be true, but when the data is spread over multiple computers that don't coordinate their actions, consistency can easily be violated. For example, perhaps machine 1 shows updates that machine 2 never saw. Perhaps machine 3 sees all the updates but has the order confused. Each of these cases can cause serious inconsistencies.
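The two failure cases above can be made concrete with a toy replica. This is a minimal C# sketch, not Isis2 code: `ToyReplica` and its string-encoded updates (`add:N`, `double`) are hypothetical. Because these updates do not commute, a replica that misses one update, or applies the same updates in a different order, ends up in a different state.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy replica holding a single integer; not Isis2 code.
public sealed class ToyReplica
{
    public int X { get; private set; }

    // "add:N" adds N; "double" multiplies by 2. These updates do not commute,
    // so the final value depends on which updates were applied and in what order.
    public void Apply(string update)
    {
        if (update == "double")
            X *= 2;
        else if (update.StartsWith("add:", StringComparison.Ordinal))
            X += int.Parse(update.Substring(4));
        else
            throw new ArgumentException("unknown update: " + update);
    }

    // Builds a fresh replica (X starts at 0) and applies the updates in the given order.
    public static ToyReplica Replay(IEnumerable<string> updates)
    {
        var replica = new ToyReplica();
        foreach (var u in updates) replica.Apply(u);
        return replica;
    }
}
```

With the updates add:3, double, add:1, machine 1 ends at 7; machine 2, which never saw double, ends at 4; machine 3, which saw all three updates but applied add:1 first and add:3 last, ends at 5.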
Slide 24
What do these terms mean? Durability: even if system components crash and then recover later, data will not be lost. Updates complicate things: before the update occurs, clearly it isn't durable; after the update is finished, it must have a durable effect. The question to pose: exactly when did it need to become durable? The usual answer: if the effect of an update survives a crash, then the update itself should also survive the crash.
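One common way to satisfy this rule is write-ahead logging: record the update on stable storage before its effect becomes visible, and replay the log after a crash. The sketch below is a minimal illustration, assuming an in-memory `StableLog` class that stands in for a disk; both `StableLog` and `DurableStore` are hypothetical names, and this is not how Isis2 implements persistence.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for stable storage: its contents survive a simulated crash.
public sealed class StableLog
{
    private readonly List<(string Key, double Value)> _entries = new();
    public void Append(string key, double value) => _entries.Add((key, value));
    public IReadOnlyList<(string Key, double Value)> Entries => _entries;
}

// Hypothetical store: in-memory state backed by a write-ahead log.
public sealed class DurableStore
{
    private readonly StableLog _log;
    private Dictionary<string, double> _memory = new();

    public DurableStore(StableLog log) { _log = log; Recover(); }

    // Log first, then apply: any effect a reader can observe is already logged.
    public void Update(string key, double value)
    {
        _log.Append(key, value);
        _memory[key] = value;
    }

    public bool TryGet(string key, out double value) => _memory.TryGetValue(key, out value);

    // A simulated crash loses only the volatile, in-memory state.
    public void Crash() => _memory = new Dictionary<string, double>();

    // Replaying the log in order rebuilds every effect that was ever visible.
    public void Recover()
    {
        _memory = new Dictionary<string, double>();
        foreach (var (k, v) in _log.Entries) _memory[k] = v;
    }
}
```

The ordering inside `Update` is the whole point: if the two lines were swapped, a crash between them could lose an update whose effect someone had already seen.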
Slide 25
Scalability: as we make the system larger, performance remains good. The system needs to support large numbers of clients and run on large numbers of cloud computing systems. Fast response: queries shouldn't be delayed for long, and updates should have rapid effect on the data.
Slide 26
Guarantees versus best effort. Today's cloud systems work well in all of these ways, but without providing strong guarantees except in certain very specialized cases, like Google's new Spanner database. Our challenge: can normal people who aren't on Google's Spanner development team also create trustworthy cloud computing solutions?
Slide 27
mHealth summary. The needs of the system vary depending on which part of the system we focus on. In our example, some aspects need durability in the sense of a logged database update, while others might accept durability through in-memory replication. This illustrates one of many such tradeoffs; if we had more time, we could identify a number of additional issues of this kind.
Slide 28
How the Cloud Was Built. It is very hard to create software to run in cloud computing systems: everything must be automated, and you must follow many rules and use many packages. So open source tools have become popular. Examples: Hadoop (an open-source implementation of MapReduce), ZooKeeper, GraphLab, Pregel, Vowpal Wabbit, global file systems like GFS, etc. In this short class we will focus on process group tools and will use Isis2 as our main example.
Slide 29
An obsession with speed... At very large scale, either a thing is extremely fast or it is unacceptably slow. So everything we do must be shaped by speed! High assurance is not an option if the solution would be dramatically slower. For example, the cloud computing community avoids databases; this is why it founded the NoSQL movement (storage, but with weaker guarantees than a SQL database). Similarly, we must keep speed in mind at all times!
Slide 30
Concept: Critical paths. To understand speed, understand the limiting factors. This forces us to think about critical paths.
Slide 31
What limits responsiveness? Top priority: the delay until a client receives a reply. The critical path traces the actions that contribute to this delay. (Figure: a client sends "Update the monitoring and alarms criteria for Mrs. Marsh as follows..." to a service instance and receives "Confirmed". The response delay seen by the end-user would include Internet latencies and the service response delay.)
Slide 32
Critical path with complex services? When we replicate information but want to be sure the data won't be lost, the critical path extends into the replicas. (Figure: the same update and "Confirmed" reply as on the previous slide; the response delay seen by the end-user includes Internet latencies and the service response delay, and the critical path now runs from the service instance through its replicas.)
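A back-of-envelope model makes the point: stages on the critical path that run one after another add up, while a stage that waits for several replicas in parallel costs as much as the slowest of them. This minimal sketch assumes the service waits for every replica before confirming; `CriticalPath.ResponseMs` is a hypothetical helper and the millisecond figures in the usage note are illustrative, not measurements.

```csharp
using System;
using System.Linq;

// Hypothetical latency model of the critical path; not part of Isis2.
public static class CriticalPath
{
    // Sequential stages add up; a stage that waits for several replicas in
    // parallel costs as much as its slowest replica (assuming the service
    // waits for every replica before replying).
    public static double ResponseMs(double internetMs, double serviceMs, double[] replicaAckMs)
    {
        double replicaStage = replicaAckMs.Length == 0 ? 0.0 : replicaAckMs.Max();
        return internetMs + serviceMs + replicaStage;
    }
}
```

For example, with 40 ms of Internet latency and 5 ms of service time, waiting for replicas that acknowledge after 2, 3 and 12 ms raises the response delay from 45 ms to 57 ms: a single slow replica sets the pace.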
Slide 33
Why do critical paths matter? When we build complex systems, it is hard to imagine how they will behave when we run them. By thinking about the critical, performance-limiting paths, we can focus our attention on specific elements rather than on the whole system. By avoiding delays on the critical path, we bring benefits to the whole system!
Slide 34
There are many critical applications. Cloud-hosted systems to control transportation (think of Google's smart cars): the cars have autonomy, but they depend on data from the cloud and would face a much harder challenge if that data couldn't be trusted. Banking systems: today's online banking systems are growing, and as this happens, more and more security issues arise. Process control: chemical refineries, manufacturing plants...
Slide 35
And they come with similar stories. In each case we can identify properties that are absolutely needed for a cloud deployment, and properties that are absolutely needed for safety. Beyond that, there may be other assurance properties that a particular use case doesn't need. The challenge will be to analyze each application, and then to translate its needs into cloud solutions.
Slide 36
Segment III: Consistency. We'll drill down on the tradeoffs between durability and consistency. Many cloud system builders believe that consistency isn't possible (the CAP theorem), yet consistency underlies so many other guarantees. We'll also introduce the virtual synchrony model.
Slide 37
We're going to drill down on data and service replication. Replication is at the center of cloud computing: with many replicas, a service can handle many clients, and those replicas need as much of the critical data as possible to be local. So replication is a key technology. It even underlies security: we need to replicate the policy database and the certificates that identify principals (clients, servers, etc.).
Slide 38
Consistency for replication. There are many ways to replicate information, but it becomes tricky if the data, or even the service, evolves over time. Replication of changing data can leave a confusing mess if a request encounters stale versions. In some situations these errors can harm the client; in others, they could cause security violations.
Slide 39
What do we mean by consistency? A consistent distributed system will often have many components, but users observe behavior indistinguishable from that of a single-component reference system. Our power system example illustrated a form of inconsistency: one voice says "Switch on the 50KV Canadian bus", the other says "Canadian 50KV bus going offline"... Bang!
Slide 40
Theory of Consistency. There are some famous impossibility results. Fischer, Lynch and Paterson: the FLP theorem proves that, in an asynchronous system, any correct fault-tolerant protocol strong enough to solve consensus (a form of agreement) can also be wedged, that is, prevented from ever deciding, by certain sequences of failures and message delays. But those sequences turn out to be very rare. Brewer's CAP theorem posits that you can have only two of {Consistency, Availability, Partition tolerance}. But the proof holds only for a service running in a WAN, not for one in a single data center.
Slide 41
Relate consistency to speed? How costly is strong consistency? The cloud computing community debates this topic! It is a very contemporary question. We usually pose it in connection with replicating data. Strongly consistent data is guaranteed to be correct and current: can cloud systems afford strong consistency? Weakly consistent data is best effort, and can have mistakes. Facebook, eBay, and Google all make wide use of weak consistency.
Slide 42
We will learn more about these topics. In today's lecture we won't drill down, but in lecture 4 we will look more closely at these theoretical questions. Mathematics is a valuable tool for cloud computing: by mapping computing ideas onto mathematics, we can reason more rigorously. Yet we will also find that some of the existing theory has limitations of its own.
Slide 43
Segment IV: Isis2. How does consistency look to the end user? What is it like to program with a powerful high assurance library like Isis2?
Slide 44
Isis2 System. A prebuilt technology that automates many of the hard tasks involved in replicating services and the data on which they depend. It targets cloud computing settings and is available in open source from isis2.codeplex.com. It is intended to be easy to use, but it is still at an early stage of development.
Slide 45
Isis2 System. Isis2 is a C# library (but callable from any .NET language) offering replication techniques for cloud computing developers. It is based on a model that fuses the virtual synchrony and state machine replication models. Research challenges center on creating protocols that function well despite cloud events: elasticity (sudden scale changes), potentially heavily loaded nodes, high node failure rates, concurrent (multithreaded) apps, long scheduling delays and resource contention, bursts of message loss, the need for very rapid response times, and a community skeptical of assurance properties.
Slide 46
Isis2 makes the developer's life easier. Benefits of using a formal model: the formal model permits us to achieve correctness. Isis2 is too complex to use formal methods as a development tool, but the model does facilitate debugging (model checking). Think of Isis2 as a collection of modules, each with rigorously stated properties. Importance of sound engineering: the Isis2 implementation needs to be fast, lean, and easy to use. The developer must see Isis2 as easier to use than building from scratch. We seek great performance under "cloudy" conditions, and are forced to anticipate many styles of use.
Slide 47
Isis2 makes the developer's life easier.

    // UPDATE and LOOKUP are request codes defined by the application (not shown)
    Group g = new Group("myGroup");
    Dictionary<string,double> Values = new Dictionary<string,double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.Join();
    g.Send(UPDATE, "Harry", 20.75);
    List<double> resultlist = new List<double>();
    nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);

The code first sets up the group; Join makes this entity a member (state transfer isn't shown). Then it can multicast and query. The runtime calls back to the delegates as events arrive. It is easy to request security (g.SetSecure) and persistence. The consistency model dictates the ordering seen for event upcalls and the assumptions the user can make.
Slide 52
Concept: A multi-query. Our lookup is multicast to the group, and all members respond. This is a chance for parallelism: each member can do part of the job, e.g., search 1/n-th of a database, which reduces response delays. (Figure: a front end multicasts "Lookup Harry in the Ithaca phone directory" to n replicas, each of which returns the names with "Harry" in them; with n replicas we ideally get an n-times speedup!)
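The multi-query pattern can be simulated in a single process: each of n hypothetical members scans the directory entries whose index is congruent to its rank modulo n, and a front end merges the replies. This minimal sketch uses .NET tasks in place of group members; `MultiQuery` is a hypothetical name, the sketch does not use the Isis2 `Query` call, and it says nothing about how Isis2 gathers replies.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Single-process simulation of a multi-query; not the Isis2 Query API.
public static class MultiQuery
{
    // Member 'rank' of 'n' scans only the entries whose index is congruent
    // to rank mod n, i.e. roughly 1/n of the directory.
    public static List<string> SearchSlice(IReadOnlyList<string> directory, string term, int rank, int n)
    {
        var hits = new List<string>();
        for (int i = rank; i < directory.Count; i += n)
            if (directory[i].Contains(term)) hits.Add(directory[i]);
        return hits;
    }

    // The "front end" sends the query to all n members in parallel and merges their replies.
    public static async Task<List<string>> QueryAll(IReadOnlyList<string> directory, string term, int n)
    {
        var replies = await Task.WhenAll(
            Enumerable.Range(0, n).Select(rank => Task.Run(() => SearchSlice(directory, term, rank, n))));
        return replies.SelectMany(r => r).OrderBy(s => s, StringComparer.Ordinal).ToList();
    }
}
```

Whatever n is, the merged answer equals what one sequential scan would return; only the work per member shrinks to about 1/n of the entries.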
Slide 53
Our example was overly simple: it didn't show the state transfer code, which corresponds to the white arrows in the time-line figure. In Isis2 we have a way to make checkpoints. State transfer: some active member makes a checkpoint, and the joiner loads the state from it. The code looks like the other operations in our example. Checkpoints can also be used to save group state during periods when all members are inactive.
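Stripped of the networking, state transfer amounts to copying state from an active member into the joiner. In this minimal sketch, `MemberState`, `MakeCheckpoint` and `LoadCheckpoint` are hypothetical names, not the Isis2 checkpoint API. In a virtually synchronous system the transfer is tied to the view change that admits the joiner, so that updates delivered after that view change reach the joiner normally; the sketch does not model that timing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-member state; method names are illustrative, not Isis2's.
public sealed class MemberState
{
    public Dictionary<string, double> Values { get; private set; } = new();

    // An active member captures a copy of its state as a checkpoint...
    public Dictionary<string, double> MakeCheckpoint() => new Dictionary<string, double>(Values);

    // ...and the joiner initializes itself from that checkpoint before
    // handling any later updates. Copying again keeps the two members'
    // dictionaries independent.
    public void LoadCheckpoint(Dictionary<string, double> checkpoint) =>
        Values = new Dictionary<string, double>(checkpoint);
}
```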
Slide 54
Adding security: just one line!

    Group g = new Group("myGroup");
    Dictionary<string,double> Values = new Dictionary<string,double>();
    g.ViewHandlers += delegate(View v) {
        Console.Title = "myGroup members: " + v.members;
    };
    g.Handlers[UPDATE] += delegate(string s, double v) {
        Values[s] = v;
    };
    g.Handlers[LOOKUP] += delegate(string s) {
        g.Reply(Values[s]);
    };
    g.SetSecure(myKey);    // the one added line
    g.Join();
    g.Send(UPDATE, "Harry", 20.75);
    List<double> resultlist = new List<double>();
    nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);

The code first sets up the group; Join makes this entity a member (state transfer isn't shown). Then it can multicast and query, and the runtime calls back to the delegates as events arrive. It is just as easy to request security, persistence, tunnelling over TCP... The consistency model dictates the ordering seen for event upcalls and the assumptions the user can make.
Slide 55
Some uses for process groups: to replicate data maintained by the members in memory; to replicate actions taken on an external service, such as a replicated database; to ensure that all replicas are configured the same way; to coordinate the processing of requests and balance the load; to parallelize processing by having each group member do part of the work; and to provide fault-tolerance via a backup scheme.
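One reason groups help with load balancing is that every member sees the same membership view, so each can decide locally, with no extra messages, which member owns a request. This minimal sketch assumes integer request ids and string member names; `ViewBalancer.Owner` is a hypothetical helper, not an Isis2 API. It sorts the view before indexing, so the answer does not depend on the order in which a member happens to list the others.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical view-based request assignment; not an Isis2 API.
public static class ViewBalancer
{
    // Every member holding the same view computes the same owner for a request.
    // Integer ids are used deliberately: string.GetHashCode is randomized per
    // process in .NET Core, so it must not be used to agree across machines.
    public static string Owner(IReadOnlyList<string> viewMembers, int requestId)
    {
        if (viewMembers.Count == 0) throw new InvalidOperationException("empty view");
        var ordered = viewMembers.OrderBy(m => m, StringComparer.Ordinal).ToList();
        int index = ((requestId % ordered.Count) + ordered.Count) % ordered.Count;
        return ordered[index];
    }
}
```

When the view changes, every member switches to the new assignment at the same logical point, which is exactly what a consistent membership service provides.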
Slide 56
Isis2 Summary. A library that you can invoke from a normal program written in a normal way. It does the work of creating groups, sending multicasts, and ensuring that the consistency model will be enforced. The developer just tells it what to do: she thinks about a parallel distributed application. Virtual synchrony eliminates many hard problems.
Slide 57
Why not build it yourself from scratch? (Figure: the Isis2 architecture. An Isis2 user object calls the Isis2 library, which hosts group instances and the multicast protocols: Send, CausalSend, OrderedSend, SafeSend, Query... Other components shown include flow control, the membership oracle, the large group layer, TCP tunnels (overlay), Dr. Multicast, platform security, reliable sending, fragmentation, group security, the "sense runtime environment" component, a self-stabilizing bootstrap protocol, socket management/send/receive, a message library, wrapped locks, and bounded buffers. The membership oracle handles group membership and views, driven by reports of suspected failures from other group members.) The sandbox itself is mostly composed of convergent protocols that use probabilistic methods. SafeSend and Send are two of the protocol components hosted over what we call the large-scale properties sandbox. The sandbox addresses issues like flow control, security, etc., and all protocols share and benefit from those properties. These systems are complex, especially if you want to run on platforms like EC2. By using Isis2 you inherit 30 years of research on how to make it work.
Slide 58
Why focus on Isis2? This is a good question to ask. In fact, we could focus on any of a number of other technologies, including other multicast products such as Spread, JGroups, C-Ensemble... But Isis2 is open source and specifically designed for cloud settings. (Also, Ken built it!) So, since our class is short, we will look at Isis2 examples.
Slide 59
Segment V: Performance. Can Isis2 applications achieve the kinds of scalable performance and elasticity required in large cloud deployments?
Slide 60
Revisit our notion of consistency. Let's look again at our mHealth example. We want the best possible performance, but we also want to be sure that the application is safe for this kind of use. We need consistency, yet we also need snappy response and elasticity, especially in the monitoring component: after all, it continuously monitors huge numbers of patients. What limits scalability?
Slide 61
Speed of updates. Isis2 offers many ways to do updates: RawSend, Send, CausalSend, OrderedSend, SafeSend. Each has different consistency/durability guarantees. As a developer, you'll want to use the fastest option that is still safe in your setting... hence you will need to understand how each works... and how fast each solution will be. Today we'll just look at this superficially.
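The ordering guarantees differ partly in how long a receiver must hold messages back before delivering them. As one illustration, a textbook way to obtain a total order is for a sequencer to stamp each multicast with a sequence number, and for every receiver to deliver strictly in sequence order. This minimal sketch shows only the receiver's holdback queue; `OrderedReceiver` is a hypothetical name, this is not a description of how Isis2 implements OrderedSend, and it ignores failures and membership changes.

```csharp
using System;
using System.Collections.Generic;

// Receiver side of a sequencer-based total order (textbook technique, not Isis2's protocol).
public sealed class OrderedReceiver
{
    private readonly Dictionary<long, string> _holdback = new();
    private long _next = 0;

    // Messages in the order they were delivered to the application.
    public List<string> Delivered { get; } = new();

    // Messages may arrive in any order; only the next expected number is delivered.
    public void Receive(long seq, string message)
    {
        if (seq < _next || _holdback.ContainsKey(seq)) return; // ignore duplicates
        _holdback[seq] = message;
        while (_holdback.TryGetValue(_next, out var m))
        {
            _holdback.Remove(_next);
            Delivered.Add(m);
            _next++;
        }
    }
}
```

Two receivers that get the same stamped messages in different arrival orders still deliver them in the same order, which is the property the slide calls consistency.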
Slide 62
Example: Speed of updates. Isis2 offers several ways to do updates (we will examine them more carefully later). They have big performance implications. But speed can have more than one definition!
Slide 63
Isis2: Send vs. in-memory SafeSend. Send scales best, but SafeSend with in-memory (rather than disk) logging and small numbers of acceptors isn't terrible.
Slide 64
Latency vs. ops/second. Latency: the delay before an external user sees an action. Ops/second: total throughput. For most purposes, systems like Isis2 offer basic performance of about 1000 ops/second. But by grouping requests into batches of ~50 requests each, services that can support ~50,000 ops/second are feasible. Building them is challenging, but we won't focus on that engineering topic in these lectures.
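The arithmetic behind the batching claim is simple: if each multicast carries a batch of about 50 requests, then about 1000 multicasts per second correspond to about 50,000 operations per second. This minimal sketch assumes batching does not lower the multicast rate, so the product is an upper bound; `Batching` is a hypothetical helper, not part of Isis2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical batching helper; not part of Isis2.
public static class Batching
{
    // Groups queued requests into batches of at most batchSize, so that one
    // multicast can carry many application-level operations.
    public static List<List<T>> IntoBatches<T>(IEnumerable<T> requests, int batchSize)
    {
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));
        var batches = new List<List<T>>();
        var current = new List<T>(batchSize);
        foreach (var r in requests)
        {
            current.Add(r);
            if (current.Count == batchSize)
            {
                batches.Add(current);
                current = new List<T>(batchSize);
            }
        }
        if (current.Count > 0) batches.Add(current); // final, partially filled batch
        return batches;
    }

    // Upper-bound estimate: multicast rate times operations carried per multicast.
    public static int OpsPerSecond(int multicastsPerSecond, int batchSize) =>
        multicastsPerSecond * batchSize;
}
```

The price is latency: a request may wait for its batch to fill (or for a timeout) before it is sent, which is one reason such services are hard to build well.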
Slide 65
Jitter: how steady are latencies? (Cornell (Birman): No distribution restrictions.) The spread of latencies is much better (tighter) with Send: the 2-phase SafeSend protocol is sensitive to scheduling delays.
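Jitter can be quantified in several ways; one simple choice is the standard deviation of the measured latencies. This minimal sketch computes the population standard deviation of a set of samples; `Jitter.StdDev` is a hypothetical helper, and it is not necessarily the metric behind the slide's graph.

```csharp
using System;
using System.Linq;

// Hypothetical jitter metric: population standard deviation of latency samples.
public static class Jitter
{
    public static double StdDev(double[] samplesMs)
    {
        double mean = samplesMs.Average(); // throws on an empty array
        return Math.Sqrt(samplesMs.Select(x => (x - mean) * (x - mean)).Average());
    }
}
```

Two protocols can have the same mean latency yet very different jitter: samples of 9, 10 and 11 ms are far steadier than 2, 10 and 18 ms.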
Slide 66
Flush delay as a function of shard size. Flush is fairly fast if we only wait for acks from 3-5 members, but slow if we wait for all members. Isis2 lets the developer set the threshold.
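Why the threshold matters: if a flush returns as soon as `threshold` members have acknowledged, its delay is the threshold-th smallest acknowledgement delay, whereas waiting for all members means waiting for the slowest one. This is a simplified model with made-up delays; `FlushModel` is a hypothetical name and this is not the Isis2 flush implementation.

```csharp
using System;
using System.Linq;

// Simplified model of a flush that waits for 'threshold' acknowledgements; not Isis2 code.
public static class FlushModel
{
    // The flush completes when the threshold-th fastest ack arrives.
    public static double FlushDelayMs(double[] ackDelaysMs, int threshold)
    {
        if (threshold < 1 || threshold > ackDelaysMs.Length)
            throw new ArgumentOutOfRangeException(nameof(threshold));
        return ackDelaysMs.OrderBy(d => d).ElementAt(threshold - 1);
    }
}
```

With hypothetical ack delays of 9, 1.5, 30, 2, 1 and 2.2 ms, waiting for 3 acks takes 2 ms, but waiting for all 6 takes 30 ms: a single straggler dominates as the group grows.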
Slide 67
So I want Send+Flush, right? The problem is that the different solutions offer different guarantees. The fastest solutions have weaker guarantees, and using them safely involves understanding these properties in order to decide whether they are good enough for the desired purpose. But there are subtle issues we don't have time to discuss in today's lecture. We will revisit them tomorrow.
Slide 68
Raw speed isn't the whole story! When building a system such as this, we need to look at performance but also at steady behavior. Here's an example of a problem we ran into when doing the experiments I just showed you. As we'll see, Isis2 had an instability. We think we've fixed it, but it illustrates an important point.
Slide 69
The experiment we did. We made a timeline picture from left to right. One node (the bottom one) sends multicasts; the others log the time of receipt. We graphed the delay, sorted from slowest (top) to fastest (bottom). Here's what we saw.
Slide 71
As the application ran, it slowed down! At first the system was fast: even the slowest nodes at the top had short delays. But within a few multicasts they slowed down. Then something reset them and they sped up again. We tracked it down to a problem with garbage collection in our system; modifying that protocol helped smooth things out.
Slide 76
Summary of insights from the example. Tools like Isis2 enable us to build cloud-scale, replication-based services with strong guarantees. But today, at least, they demand a lot from the developer, who needs to really understand the choices and their implications. As Isis2 evolves, this problem will be reduced: the system will eventually automate many decisions, including picking the right update primitives for you.
Slide 77
Segment VI: Conclusions. We've scratched the surface, but there is much more to be explored. Cornell's high assurance researchers are creating solutions for tomorrow's demanding applications.
Slide 78
Key take-away points. Cloud computing, today, isn't very friendly to high assurance applications. This is a problem because those applications are increasingly forced to migrate to the cloud for reasons of cost, scalability, or just because the cloud is the dominant paradigm today. But we can already use tools like Isis2 to solve these problems, and as such tools become easier to work with, the community able to build these solutions will grow.
Slide 79
Key take-away points. With Isis2 we can easily create programs that run on cloud platforms like EC2, or even on Android mobile devices. They form into groups and coordinate or replicate data or actions via group primitives. The concept is powerful and easily visualized. But tuning and doing sophisticated fault-tolerance remain challenging. In the remaining lectures we will explore these issues.
Slide 80
The last word... The word on the street is that cloud computing will rule, but that the cloud can't do high assurance. But the word in the hallways at Cornell differs! We see Isis2 as our proof-by-demonstration that it can be done. Even so, the engineering challenge remains enormous.
Slide 81
Learning more. Stay in the class: we'll show you how! Download the Isis2 system from isis2.codeplex.com. There you can access the user's manual and the code itself (currently v2.xxx, a very stable release), and we maintain a discussion and issues board there.
Slide 82
Learning more. My textbook covers this topic in depth: Guide to Reliable Distributed Systems: Building High-Assurance Applications and Cloud-Hosted Services. Ken Birman. Springer Verlag, February 2012. A paper focused entirely on today's topic is: Overcoming CAP with Consistent Soft-State Replication. Kenneth P. Birman, D. Freedman, Q. Huang and Patrick Dowell. IEEE Computer Magazine (special issue on The Growing Impact of the CAP Theorem). Volume 45, No. 2, pp. 50-58, February 2012. You can download a copy from: http://www.cs.cornell.edu/projects/quicksilver/pubs.html