WHAT IS THE ISIS2 LIBRARY?
Ken Birman, Cornell University
Isis2 is a technology for replication. It solves the coherent replication/caching problem
Like MapReduce, it is intended for use by a programmer
Like Spark, it is fairly complex, because the issues are subtle when you look at them carefully
Isis2 library
A large, multithreaded service that runs inside your program using its own threads and data structures
It implements a wide variety of cutting-edge distributed computing protocols and algorithms
The outcome of 25 years of research!
A very complex, sophisticated “operating system” for distributed and cloud computing
But you access it via a very simple library interface that hides as much of this complexity as feasible
A short summary
What does it do?
Everyone knows how to write programs
Here’s one in C#:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
namespace ConsoleApplication2 {
    class Program {
        static void Main(string[] args) {
            Console.WriteLine("Hello World");
        }
    }
}
Everyone knows how to write programs
Here’s one in C++/CLI:
// HelloWorld.cpp : main project file.
#include "stdafx.h"
using namespace System;
int main(array<System::String ^> ^args)
{
    Console::WriteLine(L"Hello World");
    return 0;
}
Suppose you want to build a cloud service
It will run on many nodes on a cloud platform
It will hold data collected as your system is running
Perhaps it keeps track of where people are
Then it can answer questions about location, such as “where are my friends right now?”
Web applications send location updates and perform queries
Determinism
Many programs are deterministic
Program state is determined by some sequence of events that mutate the state
Given the sequence, the state is easily computed
When building distributed services
Some people don’t assume determinism
But it is easier to work with the “state machine” approach, in which programs are identical and fully deterministic
State Machine Approach (Lamport)
You take some deterministic event-driven program
Replicate it by making N identical copies
Apply the same sequence of events to all N copies
... they will stay in identical states!
... and you can spread read-only operations evenly: any copy can respond identically to any other copy!
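The state machine idea can be sketched in plain C#, without Isis2: a small deterministic object (the hypothetical LocationTable class below, which is not part of Isis2) is copied N times, and the same event sequence is applied to every copy. This is only an illustration that assumes the events are already in one agreed order; delivering them in identical order to copies on real machines is exactly the job Isis2 takes on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical deterministic object (not part of Isis2): its state depends only on
// the sequence of update events applied to it.
public sealed class LocationTable
{
    // SortedDictionary keeps keys in a fixed order, so even enumeration is deterministic.
    private readonly SortedDictionary<string, string> locations = new SortedDictionary<string, string>();

    // Update event: changes the state.
    public void Apply(string person, string place) => locations[person] = place;

    // Lookup: read-only, so any replica can answer it (null if the person is unknown).
    public string Lookup(string person) => locations.TryGetValue(person, out var place) ? place : null;

    // Canonical text form of the state, used to compare replicas.
    public string Snapshot() => string.Join(";", locations.Select(kv => kv.Key + "=" + kv.Value));
}

public static class StateMachineDemo
{
    // Makes n copies and applies the identical event sequence, in the identical order, to each.
    public static List<LocationTable> Replicate(int n, IEnumerable<(string Person, string Place)> events)
    {
        var replicas = Enumerable.Range(0, n).Select(_ => new LocationTable()).ToList();
        foreach (var (person, place) in events)
            foreach (var replica in replicas)
                replica.Apply(person, place);
        return replicas;
    }
}
```

The choice of SortedDictionary is deliberate: the .NET documentation leaves the enumeration order of an ordinary Dictionary unspecified, which is the kind of hidden non-determinism that can make "identical" copies drift apart.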
Is Determinism Feasible?
Modern programming languages make non-determinism “inevitable”
Threads, and libraries that use them
Input that might arrive in unpredictable orders
Other events such as failures, exceptions...
But we can still build a deterministic “object” – a class that has update and query / lookup operations, and lives inside a program
Key idea in this short course? Small, deterministic objects
We build normal programs, which might not be deterministic
But inside them are deterministic objects and those have identical replicated states over a set of copies
We arrange to deliver the identical events, in the identical order, and the copies advance through identical states
Events
They could be updates... but could also be
Group membership changes
Failures
In the version of state machine replication of interest to us, all of these are just “events” delivered to the deterministic objects that we replicate
Using the membership
Suppose a system has N members, 0... N-1 (each member’s “rank”)
We can replicate identical states
But at the same time, we can ask the different members to play distinct roles
Example: Search a database with K*N entries
Member 0 searches entries 0... K-1
Member 1 searches entries K... 2K-1, and so on
Search is a form of “lookup”, and if the members search identical replicas, the N responses reflect exactly one copy of each sub-result! They add up to “1 search of the database”
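The rank-based split can be sketched as follows, assuming the database is an in-memory int array with exactly K*N entries (PartitionedSearch and its methods are hypothetical names for illustration, not Isis2 calls). Each member searches only its own slice; concatenating the N partial results in rank order reproduces one search of the whole database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for the rank-based search example.
public static class PartitionedSearch
{
    // Member 'rank' searches entries rank*K through (rank+1)*K - 1.
    public static (int First, int Last) SliceFor(int rank, int k) => (rank * k, (rank + 1) * k - 1);

    // One member's contribution: indices of matching entries in its own slice only.
    public static List<int> SearchSlice(int[] db, int rank, int k, Func<int, bool> match)
    {
        var (first, last) = SliceFor(rank, k);
        var hits = new List<int>();
        for (int i = first; i <= last; i++)
            if (match(db[i])) hits.Add(i);
        return hits;
    }

    // Combining the N partial results, in rank order, equals one search of the whole database.
    public static List<int> SearchAll(int[] db, int n, Func<int, bool> match)
    {
        if (n <= 0 || db.Length % n != 0)
            throw new ArgumentException("database must hold exactly K*N entries");
        int k = db.Length / n;
        return Enumerable.Range(0, n).SelectMany(rank => SearchSlice(db, rank, k, match)).ToList();
    }
}
```

In a real group, each SearchSlice call would run on a different member and the caller would gather the N replies; here they run in one process only to show that the slices cover every entry exactly once.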
Without Isis2
With no help from a system such as Isis2, you will need to do many things by hand
Run your program on the cloud, with N copies
Track the status of each copy, and adapt as they crash, or restart, or the cloud balances load
Send in updates and share them within the copies
Send in queries and compute the responses
Follow the various rules imposed by cloud operators
This makes your task very hard
With Isis2
Today’s cloud developers often work with prebuilt technologies to make their lives simpler and easier
ZooKeeper: a coordination service from Yahoo with a file-system-like interface, used for sharing small files in a reliable way
Cassandra: a distributed key-value storage system
MapReduce and Hadoop: computing tools to split jobs into small parts and combine the results
Isis2 is a library for writing distributed programs, and we will focus on it
A short class like this won’t have time to look at the full range of cloud computing options. At Cornell we teach such a class, but it takes us 14 weeks
By focusing on Isis2 we can be more thorough
Isis2 is for distributed computing
Rather than building one program that will run “by itself”, we can use Isis2 to build a program that will
Talk to clients over a network (client-server model)
Collaborate with other servers (peer computing model)
Execute on a cluster or cloud or in the Internet WAN
In this class we will focus on applications running on a cloud platform such as Amazon EC2
You can “simulate” such a setup on your own laptop! So you can experiment with Isis2 while we talk about it
DSN, June 2013 (Budapest)
Isis2 System
Revisits an old model (a personal favorite!)
Core functionality: groups of objects
… fault-tolerance, speed (parallelism), coordination
Intended for use in very large-scale settings
The local object instance functions as a gateway
Read-only operations are performed on local state
Update operations update all the replicas
[Figure: a new process sends “join myGroup” and receives a state transfer; updates are multicast to every member of myGroup]
Terminology we’ve used
Process group: A term for a collection of programs that are all running (perhaps on different machines, perhaps on the same machine) and that use Isis2
Each process group has a name (you pick it) You can have multiple groups in one application
Message: Data encoded to be sent between programs
State transfer: Data to initialize a new group member
Update: Any action that changes the shared data
Lookup: Any action that only queries the data
Multicast: A message sent to every group member
How would this map to our use case?
We will use a process group to maintain the locations of our users
Updates will be done by multicast to the group
Queries will be done by asking a group member to look up the location data for a person
Scalability
We could replicate all the data at every member, which works until the number of users gets very large
Once the data gets huge we will need to split it into subsets. The approach is called “sharding”
Why would we use Isis2
Prebuilt tools such as Isis2 simplify the mapping of a concept such as “location tracking service” to code that we can run on the actual cloud
It does many of the hard jobs for us
Our programming task becomes easier as a result
Developer still designs the solution and builds it, but many of the hardest tasks are “automated”
Isis2 Functionality
It offers a wide range of basic functions
Multicast and multi-queries: replicated state
SafeSend: Paxos, but I offer other flavors too
Lock-based synchronization
Distributed hash tables
Persistent storage…
Execution model: “virtually synchronous state machine replication”, formalized jointly with Dahlia & Robbert
Easily integrated with application-specific logic
Example: Cloud-Hosted Service
[Figure: a standard Web Services method invocation reaches some service with members A, B, C and D; the distributed request that updates group “state” is sent with SafeSend to every member, each of which keeps a DB replica, and the response is returned]
Use Paxos to replicate an external database
Example: Cloud-Hosted Service
[Figure: the same request path, but each member (A, B, C, D) keeps an in-memory collection; the update is sent with Send, g.Flush() is called, and the response is returned]
Cheaper multicast+Flush suffices with in-memory replicas
Isis2 System Summary
Elasticity (sudden scale changes)
Potentially heavy loads
High node failure rates
Concurrent (multithreaded) apps
Long scheduling delays, resource contention
Bursts of message loss
Need for very rapid response times
Community skeptical of “assurance properties”
C# library (callable from any .NET language) offering wide variety of replication tools and pre-built algorithms for cloud computing developers
Based on a model that fuses virtual synchrony and state machine replication models
Research challenges center on creating protocols that function well despite cloud “events”
Isis2 also deals with “pragmatics”
Flow control
Building and using TCP mesh overlays when UDP/IPMC aren’t permitted
Moving big objects out of band to avoid congesting in-band communication
Batched join/leave to reduce impact of churn in large groups
256-bit AES security
Various persistency features (logging, etc)
Flexible ways to integrate with external databases and other storage systems
Dr. Multicast algorithm to manage scarce IPMC resources
... etc
Isis2 makes developer’s life easier
const int UPDATE = 0, LOOKUP = 1;   // request codes: small integers in the range 0..127
Group g = new Group("myGroup");
Dictionary<string,double> Values = new Dictionary<string,double>();
g.ViewHandlers += delegate(View v) {
    Console.Title = "myGroup members: " + string.Join(", ", v.members);   // members is an array
};
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
g.Join();
g.Send(UPDATE, "Harry", 20.75);
List<double> resultlist = new List<double>();
int nr = g.Query(ALL, LOOKUP, "Harry", EOL, resultlist);
First sets up group
Join makes this entity a member. State transfer isn’t shown
Then can multicast, query. Runtime callbacks to the “delegates” as events arrive
Easy to request security (g.SetSecure), persistence
“Consistency” model dictates the ordering seen for event upcalls and the assumptions the user can make
Isis2 makes developer’s life easier
Requesting security adds one line, before g.Join(), to the example above:
g.SetSecure(myKey);
g.Join();
Easy to request security, persistence, tunnelling on TCP...
Lots of stuff inside...
[Figure: several Isis2 user objects sit on top of the Isis2 library, which contains: group instances and multicast protocols (Send, CausalSend, OrderedSend, SafeSend, Query...); flow control; the membership Oracle (group membership, views, reports of suspected failures, links to other group members); the large group layer; TCP tunnels (overlay); Dr. Multicast; platform security and group security; reliable sending and fragmentation; sensing of the runtime environment; a self-stabilizing bootstrap protocol; socket management/send/receive; a message library; “wrapped” locks; and bounded buffers]
New: Big Data Analysis Infrastructure
All of them can use Isis2
Styles of programs
Modern computing: Many styles
Console application:
Program receives arguments on the command line and by interacting with the “console user”
Prints output to the console window
Often used when developing a program that might later run in different styles
GUI program
Application employs a GUI library
Lays out a windowed application with buttons, menus, visual regions that contain images, etc
Attaches handlers to perform actions, like responding to mouse clicks (“event” handlers)
When a GUI program is launched, it creates a window (a “frame”), sets it up and renders it
Then sits in a special method waiting for events
Client server program
Program must be registered as a server
Now client systems can connect to it and send it requests
These days the “Web Services” approach is common
Requests are encoded as web pages, and so are replies
Clients connect via TCP and use the HTTP protocol to send the requests to the server and receive replies
Usually a library handles the connections and you only write the code to handle the requests
Cloud computing program
Many servers to support lots of clients
The servers all share some form of database
The cloud load-balances work so that performance remains high even if the number of clients is huge
Cloud programs are like...
Servers, but they often only serve one client at a time
This is the easiest model to implement
Virtualization allows cloud computers to host many such servers and leverage multicore
One computer might run many copies
GUI programs
We favor an “event-oriented” style of computing
But there is no private console or terminal and no GUI. The similarity is because of the event computing model.
Sharing data
Normally, cloud servers share databases and files
For example, during the night Google computes big files with web index information, such as answers to queries with 1, 2, 3 ... n terms
These index files are stored in a big file system: GFS
The servers have access to the files as needed
Ideally, caching is used to avoid accessing the file or database servers heavily
Databases: Use a slightly fancier “snapshot isolation” approach
Updates
Updates are typically sent to “back end” servers for processing and applied to the system state there
The client systems have in-memory caches but they don’t do in-memory updates to the server state
Update example
UpdateLocation(“John Smith”, now at 12 Main Street, ...)
Web application sends data
Process group retains it and responds to queries
From this we can build apps:
“Who could join me for golf?”
Query example
Query(“Are any of my friends near the downtown golf course?”)
Web application sends question
Request is “load balanced” on the service elements
Service is a big database
Most of the cases we’ll consider are like this example
The cloud gets a lot of “power” from simple ideas
Our focus in these lectures will mostly be on services that take updates and lookup requests and keep the data in memory, or in files associated with the server members
Even for this simple case many issues arise
Two kinds of questions
How hard is it to scale my solution up?
For example, to divide the data so that subsets of my group handle each subset of the data?
Otherwise, the update rate will limit scalability
How hard is it to guarantee properties?
Consistency
Security
Fault-tolerance
A process group
Update ordering is key to consistency
If we treat membership changes as a kind of update, we can address fault-tolerance
By allowing our groups to have a subgroup structure, we can subdivide data for scalability
E.g.: Red: Locations for people with names A-H, Green: I-Z
[Figure: a new process sends “join myGroup” and receives a state transfer; updates are multicast to the members of myGroup]
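The A-H / I-Z split can be written as a deterministic mapping from a name to a subgroup. This is a minimal sketch: the NameSharding class is hypothetical, and numbering the Red subgroup 0 and the Green subgroup 1 is an assumption for illustration. What matters is that every member computes the same mapping, so an update for "Harry" always reaches the Red subgroup.

```csharp
using System;

// Hypothetical mapping for the slide's example: subgroup 0 ("Red") holds names A-H,
// subgroup 1 ("Green") holds names I-Z. Every member must compute the same answer.
public static class NameSharding
{
    public const int Red = 0;
    public const int Green = 1;

    public static int SubgroupFor(string name)
    {
        if (string.IsNullOrEmpty(name))
            throw new ArgumentException("name must be non-empty", nameof(name));
        char first = char.ToUpperInvariant(name[0]);
        if (first < 'A' || first > 'Z')
            throw new ArgumentException("name must start with a letter A-Z", nameof(name));
        return first <= 'H' ? Red : Green;
    }
}
```

If you later switch to hash-based sharding, avoid string.GetHashCode(): on .NET Core and later it is randomized per process, so different members could disagree about where a key lives. A fixed hash function computed over the string's characters avoids that problem.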
Our goals
Learn to use Isis2 to solve problems of this kind, and related problems that are based on the same ideas
A full-length class would look at many other cloud computing technologies
But in the time we have, our choice is to be very superficial for many technologies or reasonably detailed about just one
And so we’re focusing on just the one
Copying it to your system and building it
Obtaining the Isis2 library
What is the Isis2 Library?
Solving the kinds of problems we’ve looked at is hard
Nobody wants to invent the solutions and implement them “from scratch” each time they are needed
Isis2 packages common functions into a standard and easily used form
It can be used on many systems: Windows, Linux, Amazon EC2 or Eucalyptus, even Android
And in many settings: WAN, cloud, small clusters
Isis2 is a library
It provides pre-built methods you can call from your program in C#, C++ or Python
But there are some limitations
Isis2 itself was written in C# using .NET and so is not so easy to use from non-.NET languages
The version of C++ we support is “C++/CLI for .NET”
The version of Python is “IronPython for .NET”
On Windows there are 41 additional languages and in fact any of them would work. But only these three are available on Linux as well
How can you obtain this library?
You’ll visit Isis2.codeplex.com and (normally) will register for updates via email
Then download Isis.cs: source code for Isis2
Also download Isis2.doc: a programming manual
And Documentation.chm: “compiled” html with per-API documentation
Working on Linux? If so, you should also download and install Mono, and build it. You’ll use the “dmcs” compiler, so read the small Isis document explaining how this is done
Now you are ready to use the system
Our lectures will use C#
Do you know Java? C# began as a close relative of the Java language! Then it evolved in small ways, over a long time
Also, the JUtil methods evolved into the ones in .NET
What if you only know C or C++? C# looks like these languages but has automated garbage collection and some additional operators
Ask Professor Birman if you have any questions about the code we look at during this class!
IDE for C#
Many people use Eclipse IDE, but this isn’t an option for programs in C#
Instead we recommend Visual Studio if you are a Windows user, or MonoDevelop if you use Linux
Both have good development and debugging help
How do you install Isis2?
Some systems have elaborate install scripts
With Isis2 most people just use Isis.cs as a component of their solution!
Perhaps your program consists of myClass.cs and Isis.cs
Visual Studio will rebuild Isis.cs when it builds myClass
Some people prefer to compile Isis.cs into a dll
This is also feasible and works very well
You can even use it by hand in Linux
Suppose you prefer to develop with vim from the bash shell in Linux
Then you can edit your program myClass.cs with vim
Then use dmcs myClass.cs Isis.cs to compile it
So you have many options!
Parameters
Many parameters control the behavior of the system. Most should not be changed
But there is a list of parameters that you may need to set, and those will often need to be adjusted
In Linux you use the bash shell “export” command
In Windows you use the “set” command
These help Isis2 start up correctly
We’ll be more detailed
Step by Step
Step by step
1. You write the code and compile the program
2. You launch copies over time
   a. The first copy needs ~45 seconds to initialize Isis2
   b. The subsequent copies can join rapidly
3. Each copy creates local Group objects and associates handlers with them: view handlers and event handlers
4. Then it calls g.Join() and then can use g.Send() and other primitives
Compilation
We discussed previously
Creates a version of your program that can be launched...
Directly, on Windows: myProg.exe
On Linux, by running “mono myProg.exe”
You can also “wrap” the program with Mono...
Doing this gives a single self-contained image, as on Windows
Find instructions on Isis2.codeplex.com
Initial launch
The first copy of your program will search for the Isis ORACLE service
It starts by checking to see which network interfaces are available. You can tell it which to use by setting the environment variable ISIS_NETWORK_INTERFACES.
Then it sends broadcasts using the ISIS_PORTa port number, searching for other instances of Isis
The first to start initiates the ORACLE. Others join later
This takes 45 seconds unless you use Isis.Start(true), but this works ONLY if you launch just one copy first
Roles of ORACLE
Invisible to the application programmer
The first ISIS_ORACLE_SIZE members form a service to manage Isis process groups
If you wish this service can be told to run only on machines listed in ISIS_HOSTS
ISIS_TCP_ONLY, ISIS_UNICAST_ONLY
Some systems allow UDP but not UDP multicast. For these specify ISIS_UNICAST_ONLY=true.
The system will be a bit slower
Some require pure use of TCP and no UDP. For these specify ISIS_HOSTS and ISIS_TCP_ONLY
You must start Isis on a machine in ISIS_HOSTS first, but then can run copies on other machines
The system will be much slower
Failure sensing
Isis will monitor the applications using it
If something becomes unresponsive, eventually Isis will notify groups that it has failed
Membership changes and a “new view” event occurs, and you can take actions
“Startup” thread
Isis2 monitors the thread that called Isis.Start()
If that thread exits, Isis2 threads also exit
The goal is to shut the library down if your program finishes and terminates
But what should the startup thread do if it has no work and you DON’T want to exit? It calls Isis.WaitForever();
Registering a view handler
When membership changes, a new view event occurs
Every member sees the same view, at the same time, and each can use its own “rank” in the members list to take action
The new view lists: viewid, members array, departed, joined
The final view is delivered when a program leaves a group or the group “terminates”; it shows members.Length=0
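Because every member sees the same view, each can derive its role from its rank without exchanging extra messages. A minimal sketch, assuming the member list is represented as a string array (real Isis2 views list member addresses, not strings) and that work items are numbered; RankRoles is a hypothetical helper, not an Isis2 class.

```csharp
using System;

// Hypothetical helpers: members are shown as strings, whereas Isis2 views list addresses.
public static class RankRoles
{
    // This member's rank is its position in the view's member list (-1 if absent).
    public static int RankOf(string[] members, string me) => Array.IndexOf(members, me);

    // Deterministic assignment: work item i (i >= 0) belongs to the member of rank i mod N.
    // All members see the same view, so all of them compute the same owner.
    public static string OwnerOf(string[] members, int item)
    {
        if (item < 0) throw new ArgumentOutOfRangeException(nameof(item));
        return members[item % members.Length];
    }

    // The final view delivered on leaving or termination has no members.
    public static bool IsFinalView(string[] members) => members.Length == 0;
}
```

A view handler would call these with the new view's member list; since every member runs the same code on the same list, they all agree on who owns each item.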
Register a view handler
What really happens?
If a process joins, leaves or crashes, the ORACLE service is informed
The ORACLE computes the new view
Then it broadcasts the new view to the members
Then they each do upcalls to event handlers
This ensures consistency
You can also “watch” group members
Suppose that process B sees some event and realizes that process A will handle it
Probably because of their ranks in the group view
Both see the same group view when the event occurs
Then B may want to watch A so that if A fails before finishing event handling, B can take over
A would signal that it has finished using a multicast
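The watch pattern can be sketched as bookkeeping that a member keeps while another member works. In this hypothetical version (WatchList is not an Isis2 API), the member ranked just after the owner watches it; the owner's "finished" multicast clears the entry, and a new view in which the owner has departed hands its unfinished events to the watcher. The sketch assumes the watcher itself survives that view change.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical bookkeeping for the "watch" pattern (not an Isis2 API).
// The owner handles an event, the next member in rank order watches it,
// and the owner's "finished" multicast clears the entry.
public sealed class WatchList
{
    private readonly Dictionary<int, string> pending = new Dictionary<int, string>(); // event id -> owner

    public static string WatcherOf(string[] members, string owner)
    {
        int rank = Array.IndexOf(members, owner);
        if (rank < 0) throw new ArgumentException("owner is not in this view", nameof(owner));
        return members[(rank + 1) % members.Length];
    }

    public void Assigned(int eventId, string owner) => pending[eventId] = owner;

    // Called when the owner's "finished" multicast is delivered.
    public void Done(int eventId) => pending.Remove(eventId);

    // Called on a new view: returns the unfinished events whose owner departed and which
    // 'me' was watching in the old view; 'me' becomes their new owner.
    public List<int> TakeOver(string[] oldMembers, string[] newMembers, string me)
    {
        var mine = pending
            .Where(p => !newMembers.Contains(p.Value) && WatcherOf(oldMembers, p.Value) == me)
            .Select(p => p.Key)
            .OrderBy(id => id)
            .ToList();
        foreach (int id in mine) pending[id] = me;
        return mine;
    }
}
```

Because views are delivered to everyone in the same order, every member running this code reaches the same conclusion about who takes over, so no extra agreement step is needed.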
Partitioning Failures
Suppose a machine loses connectivity?
The main network will think it has failed
New views are reported and the system “moves on”
The disconnected node will eventually sense that it has lost connectivity and will throw a partitioning exception
It must not try to rejoin Isis; this option is not allowed
But it can save data for a new program to load
Poison failures
Suppose a machine is temporarily overloaded
The main system may think it has crashed
But it is actually slow, yet alive
Such a machine will receive a “poison” message
This causes the Isis programs to exit
This is better than waiting for them to recover, which could stop the whole cloud service
Saving data for next time
Isis has a logging service
You need to register methods to
Create a checkpoint
Load state from a checkpoint
Then you call g.SetPersistent()
Isis stores state in “data” files in the folder “chkpts”
What is a checkpoint?
You can save any state you wish
You can call SendChkpt as many times as needed
int istuff; double dstuff;
g.MakeChkpt += (Isis.ChkptMaker)delegate(View nv) {
    g.SendChkpt(istuff);   // Checkpoint a single integer
    g.SendChkpt(dstuff);   // Checkpoint a single floating point value
    g.EndOfChkpt();        // Finished making the checkpoint
};
g.LoadChkpt += (loadichkpt)delegate(int what) {
    IsisSystem.WriteLine(name + ": Got integer checkpoint: istuff=" + what);
    istuff = what;
};
g.LoadChkpt += (loaddchkpt)delegate(double what) {
    IsisSystem.WriteLine(name + ": Got double checkpoint: dstuff=" + what);
    dstuff = what;
};
Steps
The MakeChkpt method is called from time to time in your program
You can control exactly when this will happen
That updates the log files
Later, after restart, the LoadChkpt method(s) will be called to reload the saved state
The checkpoint will be loaded into the NEXT instance that runs
Why did we register two loaders?
Isis2 is polymorphic
Each method can be defined many times with different type signatures
As events occur, upcalls are done to the ones that match
In our examples we had just one argument to SendChkpt(), but we could have given many:
g.SendChkpt(x, y, z, ....);
Any data type is allowed, but you must register user-defined types with Isis first
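One way to picture "upcalls to the ones that match" is a table of handlers keyed by parameter type. This is a hypothetical sketch of the idea, not how Isis2 implements it: TypedHandlers is an invented name, it handles single-argument values only, and it matches exact runtime types (no base classes or interfaces).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of type-signature matching (not Isis2's implementation):
// several handlers share one name, and a delivered value goes to the handler whose
// parameter type equals the value's runtime type.
public sealed class TypedHandlers
{
    private readonly Dictionary<Type, Action<object>> byType = new Dictionary<Type, Action<object>>();

    public void Register<T>(Action<T> handler) => byType[typeof(T)] = value => handler((T)value);

    // Returns false when no registered handler has a matching type signature.
    public bool Deliver(object value)
    {
        if (value == null || !byType.TryGetValue(value.GetType(), out var handler))
            return false;
        handler(value);
        return true;
    }
}
```

Registering one loader for int and one for double mirrors the two LoadChkpt registrations in the checkpoint example: an int value reaches the int loader and a double reaches the double loader. Supporting several arguments per call would mean keying the table on the full list of parameter types rather than on a single type.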
If my program crashes, that must be an Isis bug, right?
Debugging
Isis2 sometimes throws exceptions
Mostly because of user errors that confuse it
But simple crashes can look like Isis2 bugs
The issue is that Isis2 runs many threads
And many threads do lots of calls before delivering events to the user, so the stack may be complex
Worse, delegate methods have no “names” and this can make it hard to realize where the crash happened
State transfer is a checkpoint too!
If the checkpoint methods are defined, Isis2 will ask for a checkpoint just as a new member joins
The old member makes the checkpoint
The new member loads it
This initializes the joining member
[Figure: a new member joins myGroup and receives a state transfer; updates are multicast to the members]
Multicast operations
Each group has “handlers” for a set of multicast and query events it defines
Each such event should be given a name, like UPDATE or LOOKUP. These would normally be small integers in the range 0..127
For each such name you can define a handler, like this:
g.Handlers[UPDATE] += delegate(string s, double v) {
    Values[s] = v;
};
g.Handlers[LOOKUP] += delegate(string s) {
    g.Reply(Values[s]);
};
Notice that...
Each handler has a “type signature” defined by its arguments: int, string, etc.
The handlers are methods. They can be declared in separate procedures or can be in-line delegates.
They can access local variables
Each process would have its own private copy
Your process will run multiple times
Each instance has its own copy of the address space
Then we can send a multicast
Isis2 has many “flavors”
This example uses SafeSend, which is a version of Leslie Lamport’s famous Paxos protocol
It guarantees total order: every group member sees these messages in identical order
All receive every message (this is a “learner” behavior for Paxos). Even the sender receives a copy!
If desired, the log of messages can persist even across crashes.
g.SafeSend(UPDATE, "Harry", 20.75);
What does multicast do?
Picks some “moment in time” and delivers to every group member at that time
This is guaranteed even if the sender crashes
Order is also guaranteed
If persistence is requested, SafeSend also logs each message to remember it across failures
And if group membership is changing, SafeSend always uses the most current group view
Each of these guarantees simplifies your challenge when you write code using SafeSend
81
Other “flavors” of multicast
g.SafeSend() – Paxos
g.OrderedSend() – Like SafeSend but for in-memory data. “Optimistic delivery”
g.CausalSend() – Rarely used, implements causal ordering. If x → y (x causally precedes y), then delivers x before y
g.Send() – FIFO: if some process sends x, then y, delivers x before y
g.RawSend() – like UDP multicast. Very weak guarantees.
We’ll revisit the roles each of these can play later!
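The FIFO guarantee listed for g.Send() can be sketched with per-sender sequence numbers and a holdback queue. This is a hypothetical illustration (FifoReceiver is an invented name), not Isis2's protocol: it assumes each sender numbers its messages 0, 1, 2, ... and it ignores failures, retransmission and view changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical FIFO delivery sketch: a receiver holds back any message that arrives
// early and delivers each sender's messages strictly in sequence-number order.
public sealed class FifoReceiver
{
    private readonly Dictionary<string, int> nextSeq = new Dictionary<string, int>();
    private readonly Dictionary<string, Dictionary<int, string>> held =
        new Dictionary<string, Dictionary<int, string>>();

    // Accepts one message in arrival order; returns whatever can now be delivered, in order.
    public List<string> Receive(string sender, int seq, string body)
    {
        if (!nextSeq.ContainsKey(sender))
        {
            nextSeq[sender] = 0;
            held[sender] = new Dictionary<int, string>();
        }
        var delivered = new List<string>();
        int expected = nextSeq[sender];
        if (seq < expected) return delivered;        // duplicate of something already delivered
        var queue = held[sender];
        queue[seq] = body;                           // hold back until its predecessors arrive
        while (queue.TryGetValue(expected, out var next))
        {
            queue.Remove(expected);
            delivered.Add(next);
            expected++;
        }
        nextSeq[sender] = expected;
        return delivered;
    }
}
```

Each sender is tracked independently, which is exactly why FIFO is weaker than the total order SafeSend provides: messages from different senders may still be delivered in different orders at different members.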
Queries
An Isis2 query is a multicast that awaits replies
A reply could be sent by 1, K or ALL group members
The caller must indicate how many replies are desired
Each reply is collected into a list for that reply type
Pattern: 1 to N, then N to 1
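The fan-out/fan-in pattern can be simulated in a single process. In this hypothetical C# sketch, SimulatedGroup stands in for a group, each registered function stands in for a member's LOOKUP handler, and wantReplies plays the role of the 1 / K / ALL argument. None of these names are part of Isis2.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-process stand-in for a group answering a query.
public class SimulatedGroup
{
    private readonly List<Func<string, double>> lookupHandlers = new List<Func<string, double>>();

    public void AddMember(Func<string, double> lookupHandler) => lookupHandlers.Add(lookupHandler);

    public int MemberCount => lookupHandlers.Count;

    // Returns how many replies were collected (like nr on slide 82).
    public int Query(int wantReplies, string key, List<double> results)
    {
        int collected = 0;
        foreach (var handler in lookupHandlers)     // 1 to N: every member sees the request
        {
            double reply = handler(key);
            if (collected < wantReplies)            // N to 1: keep only the replies we asked for
            {
                results.Add(reply);
                collected++;
            }
        }
        return collected;
    }
}
```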
Isis2 makes developer’s life easier
84
85
This example used g.Reply
Also available:
g.NullReply() – Member doesn’t contribute any value, but the caller won’t wait for it (useful with ALL)
g.NoReply() – A risky option: like NullReply, but no message of any kind is sent to the caller
Query can also specify an Isis “Timeout”: new Timeout(delay_ms, action), where action is TO_NULLREPLY, TO_FAILURE or TO_ABORT
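This hypothetical C# simulation shows how a caller waiting for ALL replies is affected by the three reply styles and by the timeout action. The enums and ReplyCollector are invented for illustration. The behavior given to each timeout action is an assumption read from the names TO_NULLREPLY, TO_FAILURE and TO_ABORT: treat the silent member as a null reply, report it as failed, or abandon the query. It is not documented Isis2 behavior.

```csharp
using System;
using System.Collections.Generic;

// Illustrative types only; the names merely mirror g.Reply / g.NullReply / g.NoReply and TO_*.
public enum ReplyKind { Reply, NullReply, NoReply }
public enum TimeoutAction { NullReply, Failure, Abort }

public static class ReplyCollector
{
    // Returns the values gathered, or null if the query was aborted.
    // 'failed' receives the index of each member that would be reported as failed (assumed).
    public static List<double> CollectAll(
        IList<(ReplyKind kind, double value)> replies, TimeoutAction onTimeout, List<int> failed)
    {
        var values = new List<double>();
        for (int i = 0; i < replies.Count; i++)
        {
            switch (replies[i].kind)
            {
                case ReplyKind.Reply:
                    values.Add(replies[i].value);                        // contributes a value
                    break;
                case ReplyKind.NullReply:
                    break;                                               // answered, no value
                case ReplyKind.NoReply:
                    // No message ever arrives from this member; only the timeout resolves it.
                    if (onTimeout == TimeoutAction.Abort) return null;       // give up on the query
                    if (onTimeout == TimeoutAction.Failure) failed.Add(i);   // report member as failed
                    break;                                               // otherwise: like a null reply
            }
        }
        return values;
    }
}
```

Without any timeout, a NoReply member would leave an ALL query waiting indefinitely, which is why the slide calls it risky.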
86
Other forms of Query
Like multicast: g.SafeQuery, g.OrderedQuery, g.CausalQuery, g.Query, g.RawQuery
There are also other options for collecting responses
We’ve seen the “list of each reply object” version
You can also collect responses as a byte array, or request that Isis2 call a method you supply with the reply objects
87
What happens in an application that experiences many “events” all at the same time?
When does State Transfer occur?
Isis2 has a strong consistency model: a new form of virtual synchrony.
88
Virtual synchrony is a “consistency” model:
Membership epochs begin when a new configuration is installed and reported by delivery of a new “view” and associated state
Protocols run “during” a single epoch: rather than overcome failure, we reconfigure when a failure occurs
[Figure: a non-replicated reference execution (A=3, B=7, then B = B-A and A=A+1) shown next to a synchronous execution and a virtually synchronous execution of the same updates on members p, q, r, s and t, time axis 0 to 70]
89
Notice that...
State transfer “seems” to occur at the instant when a new view is delivered (all prior multicasts have already been performed)
This means that the member preparing the state has the correct values for the state variables needed by the joining member!
It is “safe” to send this state
If desired, there is a way for you to specify which member will send state to each joining process
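A small C# sketch can show why the snapshot is correct. Replica, Snapshot and InstallSnapshot are hypothetical names, not the Isis2 state-transfer API. An existing member copies its state when the view that adds the joiner is delivered. The joiner installs that copy before it handles any multicast of the new view.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replica used only to illustrate state transfer at a view change.
public class Replica
{
    public Dictionary<string, double> Values { get; private set; } = new Dictionary<string, double>();

    public void ApplyUpdate(string name, double value) => Values[name] = value;

    // On an existing member, at delivery of the view that adds the joiner:
    // every multicast ordered before that view has already been applied here.
    public Dictionary<string, double> Snapshot() => new Dictionary<string, double>(Values);

    // On the joiner, before it sees any multicast of the new view.
    public void InstallSnapshot(Dictionary<string, double> snapshot) =>
        Values = new Dictionary<string, double>(snapshot);
}
```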
90
... and also notice that....
We “think” of multicasts as being totally ordered
In fact, we tend to think of every group as if it only used the SafeSend protocol with its most conservative parameters (“DiskLogger” mode)
But the way we think about it may not be how the group actually behaves at runtime
We will learn more about this in the next lecture
91
... and last but also important
Failure discovery and handling is standardized and strongly consistent too
Isis2 has many ways to sense failures, and you can add methods of your own
When a failure is sensed, the whole system “drops” that member and all group views are revised
Any multicasts underway are finalized even if the failure disrupted sending them. This is invisible to users
These guarantees are very powerful!
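Membership epochs and view revision can be modeled with an immutable view object. In this C# sketch, GroupView, Drop and RankOf are hypothetical names, not the Isis2 View class. A failure produces the next view with the failed member removed. In Isis2 the system installs the same successor view at every survivor. Here that agreement is modeled simply as a pure function of the old view.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a membership epoch.
public sealed class GroupView
{
    private readonly List<string> members;

    public int ViewId { get; }
    public IReadOnlyList<string> Members => members;

    public GroupView(int viewId, IEnumerable<string> members)
    {
        ViewId = viewId;
        this.members = members.ToList();
    }

    // Successor epoch after a failure: survivors keep their relative order.
    public GroupView Drop(string failed) =>
        new GroupView(ViewId + 1, members.Where(m => m != failed));

    // Rank = position in the agreed membership list, or -1 if not a member.
    public int RankOf(string member) => members.IndexOf(member);
}
```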
92
But what happens here?
93
Optimistic early delivery
As we will see, some Isis2 multicast options are very fast, much faster than SafeSend
The execution on the left is “pure SafeSend”; the execution on the right used “optimistic early delivery”
When we enable optimistic early delivery, we gain a big speedup, but we could see runs that cannot arise with SafeSend
We will need to understand whether an application can safely use this kind of faster multicast
94
Examples using views, multicast, query
Writing simple Isis2 programs
95
Three examples
1. Data fully replicated within a group
   a. Load-balanced query
   b. Parallel query with every member participating
2. Data replicated in an external database; the group is used as a “front end”
96
Load-balanced query
We run K copies of our program
We inform the cluster or cloud management layer that these are replicas of some service
External clients (web browsers or web services applications) issue requests
The management layer balances load over the members
97
Load-balanced query
Each request thus arrives at a “primary” handler: some group member who hosts the endpoint of the TCP connection from the client
This handler can use OrderedSend or SafeSend to replicate updates, and it can access local variables to respond to read-only requests
It does NOT use Query to respond to read-only requests! All copies are identical, so it would get K identical replies!
98
Correctness properties
State transfer put each group member in the identical initial state
OrderedSend and SafeSend deliver each multicast to every member, in the identical order
Thus every member does the same updates, in the same order, and stays in the identical state
This is why any member can use its local data to respond to a query!
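The argument above is the state-machine-replication idea, and it fits in a few lines of C#. StateMachine is a hypothetical class. Replicas that start empty and apply the same deterministic updates in the same order end in the same state. A replica that sees the same updates in a different order can diverge.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical replica whose state changes only through Apply, deterministically.
public class StateMachine
{
    private readonly Dictionary<string, double> values = new Dictionary<string, double>();

    public void Apply(string name, double value) => values[name] = value;

    // A read-only request answered from local state, with no Query needed.
    public double Lookup(string name) => values[name];
}
```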
99
OrderedSend vs SafeSend
Use OrderedSend if the data is in memory
As we will see, this is one situation in which optimistic early delivery is generally safe and appropriate
Use SafeSend (enabling the “DiskLogger” durability method) for data maintained in an external file or database
As we will see, if data is external, one cannot use optimistic early delivery
Summary: OrderedSend is faster but inappropriate for external data; SafeSend is slower but safe
100
Parallel Query
Suppose that we want to have all K members of our group respond to each query
Example: with your “Google Glass” eyeglasses you see an incredibly lovely girl. But who is she? You need to search all the images in Facebook Beijing, and in a hurry, before she leaves the party
But there are more than 10M photos to scan!
With a group of 250 members, each could search just 40,000 entries for you (10,000,000 / 250 = 40,000). This will be faster
101
Parallel Query
Suppose that we want to have all K members of our group respond to each query
Try this:
Use Query to replicate the read-only request to all group members
Each member uses view.GetMyRank() to learn its rank: 0 ... K-1
Divide the database into K equal-sized parts; the member with rank i searches part i
The query response list “covers” the full database!
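The rank-based split can be written as a pure function of the rank, the view size K and the number of entries n. This C# sketch assumes entries are numbered 0..n-1. Here rank and memberCount stand in for the values a member would get from its view (for example via view.GetMyRank()). The handling of an n that is not a multiple of K is our own choice. With n = 10,000,000 and K = 250, each slice holds exactly 40,000 entries, matching the estimate on the previous slide.

```csharp
using System;

// Hypothetical helper: which entries does the member with this rank search?
public static class Partition
{
    // Half-open range [start, end). When n % memberCount != 0, the lowest ranks
    // each take one extra entry so that the slices still cover 0..n-1 exactly once.
    public static (long start, long end) SliceFor(int rank, int memberCount, long n)
    {
        if (memberCount <= 0 || rank < 0 || rank >= memberCount)
            throw new ArgumentOutOfRangeException(nameof(rank));
        long baseSize = n / memberCount;
        long extra = n % memberCount;
        long start = rank * baseSize + Math.Min(rank, extra);
        long size = baseSize + (rank < extra ? 1 : 0);
        return (start, start + size);
    }
}
```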
102
Why is this correct and safe?
Correctness centers on the “group view”
In Isis2, any multicast is delivered at a moment when all group members agree on the contents of the view
Every member sees the same view
Since the membership list is part of the view, members compute rankings consistently: everyone agrees that such-and-such a member has rank i
Thus, every image in the database will be assigned to exactly one member
103
What if a failure occurs?
When the request was initiated, we used a Query to the group and asked for ALL replies
How can we know whether we received the proper number of responses?
One option: have the members send back their rank, and also how big the view was
Thus we would get “result 0 of 3, 1 of 3, 2 of 3”
List<string> resultlist = new List<string>();
nr = g.Query(ALL, LOOKUP, ImageData, EOL, resultlist);                  // version 1: best matches only
g.Reply(myIndex, myCount, myBestMatch);                                  // version 2: each member also sends its rank and view size
nr = g.Query(ALL, LOOKUP, ImageData, EOL, indexes, counts, resultlist);  // one list per reply field
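The caller's completeness test might look like this hypothetical C# helper. It assumes that, for each reply, the indexes and counts lists hold the responder's rank and the view size it saw. The result is accepted only if all replies report the same view size and every rank appears exactly once.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical check for the "result i of n" idea.
public static class Coverage
{
    public static bool IsComplete(IList<int> indexes, IList<int> counts)
    {
        if (indexes.Count == 0 || indexes.Count != counts.Count) return false;
        int size = counts[0];
        if (indexes.Count != size) return false;            // too few or too many replies
        var seen = new bool[size];
        for (int i = 0; i < indexes.Count; i++)
        {
            if (counts[i] != size) return false;            // responders saw different views
            int rank = indexes[i];
            if (rank < 0 || rank >= size || seen[rank]) return false;
            seen[rank] = true;
        }
        return true;                                        // every slice answered exactly once
    }
}
```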
104
Understanding
It may seem wasteful to send these extra data. Why not look at the membership before the query?
In fact we could do so, this way: check the membership, Query ALL members, and if the number of replies isn’t as expected, try again
But Isis2 does not actually promise to deliver the multicast in the same view that its sender saw just before sending. Views can change “at any time” without warning
Thus this second option can loop even when there are no failures
105
Data in an external database
Use SafeSend to transmit updates
Enable the DiskLogger durability method
After applying each update, call the DiskLogger “Done” method
On recovery, DiskLogger will replay updates, and you must make sure each is applied exactly once, in order, to the database
106
This case is tricky
We will need to discuss it in more detail
The problem is a famous issue with Paxos-type protocols, including SafeSend
The protocol itself will remember updates, but when a database node crashes and recovers, the protocol will redeliver the same messages again, in the same order
As a result, the database must protect against replay
The issue cannot be avoided when using Paxos-type protocols for database replication
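One common way to protect against replay is to record the sequence number of the last applied update atomically with the data, and to skip anything at or below it. This C# sketch is a hypothetical illustration of that idea. FakeDatabase is not DiskLogger or any Isis2 type. The sketch assumes each redelivered update carries a gap-free sequence number from the log.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical database wrapper illustrating exactly-once, in-order application.
public class FakeDatabase
{
    public Dictionary<string, double> Rows { get; } = new Dictionary<string, double>();
    public long LastApplied { get; private set; } = -1;

    // Returns true if the update was applied, false if it was a replayed duplicate.
    public bool ApplyOnce(long seq, string name, double value)
    {
        if (seq <= LastApplied) return false;          // already applied before the crash
        if (seq != LastApplied + 1)
            throw new InvalidOperationException("gap in update sequence");
        Rows[name] = value;                            // in a real system, the row and
        LastApplied = seq;                             // LastApplied must commit atomically
        return true;
    }
}
```

An overwrite like Rows[name] = value would survive replay anyway. The check matters for non-idempotent updates, such as incrementing a balance, which a replay would otherwise apply twice.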
107
Summary
We have seen the basic Group features of Isis2
We have looked at the basis for consistency in the system: an order-based model called virtual synchrony
We explored simple ways to use groups to replicate data and handle large numbers of queries