Improving Robustness in Distributed Systems
Per Bergqvist
Erlang User Conference 2001
(courtesy CellPoint Systems AB)
Design base
Cluster of cooperating hosts
Erlang and C
COTS hardware based
Unix based (i.e. Solaris or Linux)
10/100/1000Base-T back plane ("system area network")
Cluster
Shared, distributed system configuration
Each host has ONE cluster controller
Cluster controllers dispatch and supervise worker tasks
Master cluster controller: holds the configuration database (persistent replica)
Slave cluster controller: gets its configuration from the master cluster controllers
Cluster is DOWN when all master cluster controllers are inaccessible
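The availability rule in the last point can be sketched in a few lines of Python. This is only an illustration: the `Controller` class, its `role` and `reachable` fields, and the host names are assumptions, not part of the original system.

```python
from dataclasses import dataclass


@dataclass
class Controller:
    host: str
    role: str        # "master" or "slave" (hypothetical labels)
    reachable: bool  # outcome of some liveness check, not modelled here


def cluster_is_up(controllers):
    """The cluster is UP while at least one master controller is reachable."""
    return any(c.role == "master" and c.reachable for c in controllers)


controllers = [
    Controller("host-a", "master", False),
    Controller("host-b", "master", True),
    Controller("host-c", "slave", True),
]
print(cluster_is_up(controllers))
```

Reachable slave controllers do not count: under this rule a cluster made up only of slaves is DOWN, because no configuration database is accessible.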
Typical system
[Figure: system diagram with the labels Firewall, Switch, Traffic, and Control]
Cluster Key Benefits
Single system view
Enforces decoupling of parts of O&M from actual traffic processing
Implementing a cluster
Cluster->Host->Node->NodeData
Cluster global parameters
Subscription mechanisms for conf. changes
Mnesia as configuration database on master cluster controllers
Home-brewed configuration distribution to slave controllers (NOT using Mnesia)
(Worker) node supervision
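A minimal sketch of a hierarchical configuration store with change subscriptions, assuming paths shaped like the Cluster->Host->Node->NodeData hierarchy above. The `ConfigStore` class and its methods are hypothetical and written in Python for illustration; the original system was built in Erlang and C, and its actual interface is not shown in the slides.

```python
from collections import defaultdict


class ConfigStore:
    """Toy configuration tree keyed by (cluster, host, node, key) paths."""

    def __init__(self):
        self._data = {}
        self._subscribers = defaultdict(list)  # path prefix -> callbacks

    def subscribe(self, prefix, callback):
        """Call callback(path, value) for every change at or below prefix."""
        self._subscribers[tuple(prefix)].append(callback)

    def set(self, path, value):
        path = tuple(path)
        self._data[path] = value
        # Notify the subscribers of every prefix of the changed path.
        for i in range(len(path) + 1):
            for callback in self._subscribers.get(path[:i], []):
                callback(path, value)

    def get(self, path):
        return self._data.get(tuple(path))


store = ConfigStore()
seen = []
store.subscribe(("cluster1", "host-a"), lambda path, value: seen.append((path, value)))
store.set(("cluster1", "host-a", "node1", "port"), 4369)
store.set(("cluster1", "host-b", "node1", "port"), 4370)
print(seen)
```

Only the change under `host-a` reaches the subscriber. Subscribing to the empty prefix `()` would deliver every change in the cluster, which is one way to model cluster global parameters.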
Mnesia gotchas
First distributed node startup
Disallow writes when not all replicas are accessible
Use a timeout on table load, then force load
... BUT ...
TCP based distribution
Network partitioning
Network parameters
Align TCP retransmission intervals with Erlang heartbeats
Align TCP and IP rerouting parameters
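To make the first point concrete, the sketch below compares a TCP retransmission schedule with the Erlang heartbeat window. Erlang's `net_ticktime` kernel parameter defaults to 60 seconds, and an unresponsive node is detected within that time plus or minus 25%. The TCP figures (3 s initial retransmission timeout, 60 s cap, plain doubling) are assumptions chosen for illustration and are not taken from the presentation or from any particular Solaris or Linux default.

```python
def retransmit_times(initial_rto, max_rto, deadline):
    """Times (in seconds) at which TCP would retransmit before the deadline,
    with the timeout doubling from initial_rto up to max_rto."""
    times, t, rto = [], 0.0, initial_rto
    while t + rto <= deadline:
        t += rto
        times.append(t)
        rto = min(rto * 2, max_rto)
    return times


def tick_window(net_ticktime):
    """Erlang reports a silent node as down within net_ticktime +/- 25%."""
    return 0.75 * net_ticktime, 1.25 * net_ticktime


min_detect, max_detect = tick_window(60)  # 60 s is the default net_ticktime
attempts = retransmit_times(initial_rto=3.0, max_rto=60.0, deadline=min_detect)
print(min_detect, max_detect, attempts)
```

Under these assumed values only four retransmissions fit before the earliest point at which the heartbeat can declare the node down, and the last one lands exactly on that boundary. A shorter initial timeout gives TCP more chances to recover from a brief interruption before the node is reported lost.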
Typical system II: Dual back plane
[Figure: dual back plane system diagram with the labels Firewall, Switch, Traffic, and Control]
Erlang multi-homing problem
[Figure: Host A, Host B, and Host C]
Multi-home Erlang w/ TCP
Add an alias interface to the loopback i/f
Patch TCP distribution to bind to the alias
Publish the alias interface via (all wanted) real hw i/f's
- Method 1: Static routes and gratuitous/proxy ARP
- Method 2: Use a new (routing) protocol
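The "bind to alias" step amounts to fixing the local address of each outgoing connection before connecting, so that peers see the alias rather than a physical interface address. The sketch below shows that socket-level idea in Python. It is not the actual patch to the Erlang TCP distribution, and `connect_from` and `demo` are hypothetical helper names.

```python
import socket


def connect_from(source_ip, dest_ip, dest_port, timeout=5.0):
    """Open a TCP connection whose local address is source_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # Port 0 lets the kernel pick an ephemeral port on the chosen address.
        sock.bind((source_ip, 0))
        sock.connect((dest_ip, dest_port))
    except OSError:
        sock.close()
        raise
    return sock


def demo():
    """Loopback demonstration; a real setup would pass the alias address."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]
    client = connect_from("127.0.0.1", "127.0.0.1", port)
    server_side, peer = listener.accept()
    local = client.getsockname()
    for s in (client, server_side, listener):
        s.close()
    return local, peer


print(demo())
```

The address the server sees for its peer equals the address the client bound to. Because that address belongs to the loopback alias and not to a physical interface, it stays valid when a link fails.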
ARP method
Implement a utility to:
- broadcast unsolicited ARP responses
- respond to ARP requests for the alias i/f address
Add static routes on all far end systems
NOTE: all real i/f's need to be on the same IP subnet
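As an illustration of the first task, the sketch below builds the bytes of an unsolicited (gratuitous) ARP reply for an alias address. The MAC and IP values are placeholders, and the function name is hypothetical. Actually transmitting the frame needs a raw link-layer socket and elevated privileges, which is platform specific and left out here.

```python
import socket
import struct


def gratuitous_arp_reply(mac, ip):
    """Build an Ethernet frame carrying an unsolicited ARP reply for ip."""
    hw = bytes.fromhex(mac.replace(":", ""))
    addr = socket.inet_aton(ip)
    broadcast = b"\xff" * 6
    # Ethernet header: destination, source, EtherType 0x0806 (ARP).
    ethernet = broadcast + hw + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                # hardware type: Ethernet
        0x0800,           # protocol type: IPv4
        6, 4,             # hardware and protocol address lengths
        2,                # operation: reply
        hw, addr,         # sender MAC and IP
        broadcast, addr,  # target MAC and IP (same IP marks it gratuitous)
    )
    return ethernet + arp


frame = gratuitous_arp_reply("02:00:00:00:00:01", "192.0.2.10")
print(len(frame), frame.hex())
```

The sender and target IP fields carry the same alias address, which is what makes the reply gratuitous: receivers update their ARP caches to map the alias to the advertising interface's MAC address.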
New routing protocol
Broadcast (Ethernet frames) what you have, including interface priority
Let the far end select path based on what/when they receive
Far end dynamically sets up host routes
Use short retransmission intervals
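The far-end selection step might look like the sketch below. The slides do not describe the message format or the selection rule, so everything here is an assumption: the `Advert` fields, the "higher priority wins" convention, the hold time after which an advertisement counts as stale, and the example addresses.

```python
from dataclasses import dataclass


@dataclass
class Advert:
    alias: str          # advertised alias address
    via: str            # real interface address the advert arrived from
    priority: int       # higher is preferred (assumed convention)
    received_at: float  # local receive time in seconds


def select_route(adverts, alias, now, hold_time):
    """Pick the next hop for alias: the highest-priority advert that is
    still fresh, with the most recently received one breaking ties."""
    fresh = [a for a in adverts
             if a.alias == alias and now - a.received_at <= hold_time]
    if not fresh:
        return None
    best = max(fresh, key=lambda a: (a.priority, a.received_at))
    return best.via


adverts = [
    Advert("10.0.0.1", "192.168.1.1", priority=10, received_at=100.0),
    Advert("10.0.0.1", "192.168.2.1", priority=5, received_at=100.9),
]
print(select_route(adverts, "10.0.0.1", now=101.0, hold_time=1.5))
print(select_route(adverts, "10.0.0.1", now=102.0, hold_time=1.5))
```

At time 101.0 both paths are fresh and the higher-priority one is chosen. At 102.0 the preferred path has gone silent for longer than the hold time, so the route falls back to the second interface. Short retransmission intervals permit a short hold time, which in turn bounds the failover delay.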
Erlang multi-homing resolved?
[Figure: Host A, Host B, and Host C]
Summing up
Erlang can support multi-homing with some additional work
By using a loopback alias i/f, link failure becomes a routing problem (peer-peer association is kept intact)
Solaris TCP/IP stack parameters are:
- hard to find (only in out-of-date app. notes)
- hard to set "right"
- host global
A distribution mechanism with built-in support for multi-homing is preferred