Improving Robustness in Distributed Systems Per Bergqvist [email protected] Erlang User Conference 2001 (courtesy CellPoint Systems AB)


Page 1: Improving Robustness in Distributed Systems

Improving Robustness in Distributed Systems

Per Bergqvist [email protected]

Erlang User Conference 2001

(courtesy CellPoint Systems AB)

Page 2: Improving Robustness in Distributed Systems

Design base

Cluster of cooperating hosts

Erlang and C

COTS hardware based

Unix based (i.e. Solaris or Linux)

10/100/1000Base-T back plane ("system area network")

Page 3: Improving Robustness in Distributed Systems

Cluster

Shared, distributed system configuration

Each host has ONE cluster controller

- dispatches and supervises worker tasks

Master cluster controller: holds the configuration database (persistent replica)

Slave cluster controller: gets its configuration from the master cluster controllers

Cluster is DOWN when all master cluster controllers are inaccessible
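
The DOWN rule above can be written as a one-line predicate. This is a minimal sketch, not the original Erlang implementation; the `Controller` record, the `Role` names, and the host names are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    MASTER = "master"  # holds a persistent replica of the configuration database
    SLAVE = "slave"    # fetches its configuration from a master


@dataclass(frozen=True)
class Controller:
    host: str          # hypothetical host name
    role: Role
    reachable: bool    # result of whatever liveness check the cluster uses


def cluster_is_up(controllers):
    """The cluster is up while at least one master controller is reachable."""
    return any(c.role is Role.MASTER and c.reachable for c in controllers)


controllers = [
    Controller("host-a", Role.MASTER, reachable=False),
    Controller("host-b", Role.MASTER, reachable=True),
    Controller("host-c", Role.SLAVE, reachable=True),
]
print(cluster_is_up(controllers))  # True: host-b is still a reachable master
```

Reachable slave controllers do not count: without a master there is no authoritative configuration to serve.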

Page 4: Improving Robustness in Distributed Systems

Typical system

[Figure: typical system. Labels: Firewall, Switch, Traffic, Control]

Page 5: Improving Robustness in Distributed Systems

Cluster Key Benefits

Single system view

Enforces decoupling of parts of O&M from actual traffic processing

Page 6: Improving Robustness in Distributed Systems

Implementing a cluster

Cluster->Host->Node->NodeData

Cluster global parameters

Subscription mechanisms for configuration changes

Mnesia as configuration database on master cluster controllers

Home-brewed configuration distribution to slave controllers (NOT using Mnesia)

(Worker) node supervision
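
A subscription mechanism over the Cluster->Host->Node->NodeData hierarchy could look roughly like the sketch below. It is an illustration only: `ConfigStore`, its methods, and the path tuples are hypothetical names, and the real system was written in Erlang on top of Mnesia.

```python
from collections import defaultdict


class ConfigStore:
    """Toy configuration store keyed by (cluster, host, node, key) paths."""

    def __init__(self):
        self._data = {}
        self._subscribers = defaultdict(list)

    def subscribe(self, prefix, callback):
        """Call callback(path, old, new) for every change under prefix."""
        self._subscribers[tuple(prefix)].append(callback)

    def set(self, path, value):
        path = tuple(path)
        old = self._data.get(path)
        self._data[path] = value
        if old == value:
            return  # nothing changed, so nobody is notified
        # Copy the items so a callback may subscribe without breaking iteration.
        for prefix, callbacks in list(self._subscribers.items()):
            if path[:len(prefix)] == prefix:
                for callback in list(callbacks):
                    callback(path, old, value)


store = ConfigStore()
seen = []
store.subscribe(("cluster1", "host-a"), lambda path, old, new: seen.append((path, old, new)))
store.set(("cluster1", "host-a", "node1", "port"), 4000)  # notifies the subscriber
store.set(("cluster1", "host-b", "node1", "port"), 4001)  # other host: no notification
```

Subscribing on a path prefix lets a worker node follow only its own part of the tree, while an O&M tool can subscribe at the cluster level.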

Page 7: Improving Robustness in Distributed Systems

Mnesia gotchas

First distributed node startup

Disallow writes when not all replicas are accessible

Use a timeout on table load, then force load
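
The last two points can be sketched as follows. This is a language-neutral illustration in Python, not Mnesia code; every name in it is hypothetical. Mnesia itself offers `mnesia:wait_for_tables/2` (which takes a timeout) and `mnesia:force_load_table/1`; whether the original system used exactly these calls is an assumption.

```python
import time


class ReplicaSetUnavailable(Exception):
    """Raised when a write is attempted while some replica is unreachable."""


def guarded_write(table, key, value, replicas, is_reachable):
    """Apply the write only if every replica host is currently reachable."""
    missing = [r for r in replicas if not is_reachable(r)]
    if missing:
        raise ReplicaSetUnavailable(f"unreachable replicas: {missing}")
    table[key] = value


def load_table(try_load, force_load, timeout_s, poll_s=0.01):
    """Wait up to timeout_s for a normal load, then fall back to a forced load."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if try_load():
            return "loaded"
        time.sleep(poll_s)
    force_load()  # accept the local copy even though peers never answered
    return "forced"
```

Refusing writes while a replica is missing trades availability for consistency: no replica can fall behind, so a later forced load cannot bring back stale data.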

Page 8: Improving Robustness in Distributed Systems

... BUT ...

TCP based distribution

Network partitioning

Page 9: Improving Robustness in Distributed Systems

Network parameters

Align TCP retransmission intervals w/ Erlang heartbeats

Align TCP and IP rerouting parameters
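
Why the alignment matters can be seen with a little arithmetic. The sketch below assumes a TCP stack whose retransmission timeout doubles after each attempt up to a cap; the initial and maximum values are illustrative, not Solaris or Linux defaults. The heartbeat window follows the documented Erlang rule that a silent node is declared down between T - T/4 and T + T/4, where T is `net_ticktime` (60 s by default).

```python
def retransmit_times(initial_rto_s, max_rto_s, attempts):
    """Offsets (seconds after the first send) of each retransmission,
    assuming the RTO doubles after every attempt up to max_rto_s."""
    times, elapsed, rto = [], 0.0, initial_rto_s
    for _ in range(attempts):
        elapsed += rto
        times.append(elapsed)
        rto = min(rto * 2, max_rto_s)
    return times


def tick_window(net_ticktime_s):
    """Erlang declares a silent peer down between T - T/4 and T + T/4."""
    return (net_ticktime_s * 0.75, net_ticktime_s * 1.25)


def retries_before(times, deadline_s):
    """Number of retransmissions sent strictly before deadline_s."""
    return sum(1 for t in times if t < deadline_s)


times = retransmit_times(1.0, 60.0, 6)
earliest_down, latest_down = tick_window(60)
print(times)                                  # [1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
print(earliest_down, latest_down)             # 45.0 75.0
print(retries_before(times, earliest_down))   # 5
```

With these example values TCP is silent between 31 s and 63 s. If the network heals at 35 s, the peer can still be declared down at 45 s because nothing is resent in time. Shorter, capped retransmission intervals close that gap.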

Page 10: Improving Robustness in Distributed Systems

Typical system II: Dual back plane

[Figure: typical system with dual back plane. Labels: Firewall, Switch, Traffic, Control]

Page 11: Improving Robustness in Distributed Systems

Erlang multi-homing problem

[Figure: Host A, Host B, Host C]

Page 12: Improving Robustness in Distributed Systems

Multi-home Erlang w/ TCP

Add an alias interface to the loopback i/f

Patch TCP distribution to bind to the alias

Publish the alias interface via (all wanted) real hw i/f's

- Method 1: Static routes and gratuitous/proxy ARP

- Method 2: Use new (routing) protocol
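
The "bind to alias" step amounts to choosing the local address of a TCP socket before connecting, so the peer sees the alias address no matter which physical interface carries the packets. The sketch below shows the idea with plain sockets; it is not the Erlang distribution patch, and `connect_from` is a hypothetical helper. In a real setup `source_ip` would be the loopback alias address.

```python
import socket


def connect_from(source_ip, dest_addr, timeout_s=2.0):
    """Open a TCP connection whose local address is source_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout_s)
    try:
        sock.bind((source_ip, 0))  # port 0: let the OS pick an ephemeral port
        sock.connect(dest_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

Because the connection is identified by the alias address on both ends, a link failure changes only the route taken, not the TCP association itself.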

Page 13: Improving Robustness in Distributed Systems

ARP method

Implement a utility to:

- broadcast unsolicited ARP responses

- respond to ARP requests for the alias i/f address

Add static routes on all far end systems

NOTE: all real i/f's need to be on the same IP subnet
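
An unsolicited ARP response is an ordinary ARP reply sent to the Ethernet broadcast address, announcing "this IP address is at this MAC address". The sketch below only builds the 42-byte frame. Sending it requires a raw socket and privileges, which is platform specific and left out. The MAC and IP values are made-up examples, and filling the target hardware address with the broadcast address is one common convention, not something the slides specify.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806
ARP_REPLY = 2


def gratuitous_arp_reply(mac, ip):
    """Build an Ethernet frame carrying an unsolicited ARP reply for ip."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    ip_bytes = socket.inet_aton(ip)
    ethernet = struct.pack("!6s6sH", BROADCAST_MAC, mac_bytes, ETHERTYPE_ARP)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,              # hardware type: Ethernet
        0x0800,         # protocol type: IPv4
        6, 4,           # hardware and protocol address lengths
        ARP_REPLY,
        mac_bytes, ip_bytes,      # sender: the interface publishing the alias
        BROADCAST_MAC, ip_bytes,  # target IP equals sender IP (gratuitous)
    )
    return ethernet + arp


frame = gratuitous_arp_reply("02:00:00:aa:bb:01", "192.0.2.10")
print(len(frame))  # 42
```

Receivers that cache the alias address update their ARP entry to the advertised MAC, which is how traffic is steered to a surviving interface.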

Page 14: Improving Robustness in Distributed Systems

New routing protocol

Broadcast (Ethernet frames) what you have, including interface priority

Let the far end select path based on what/when they receive

Far end dynamically sets up host routes

Use short retransmission intervals
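
Far-end path selection based on "what/when they receive" could be sketched as below. The slides do not give the message format or the selection rule, so the `Advert` fields, the "lower priority value wins" rule, and the staleness limit are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    alias_ip: str       # loopback alias address being published
    via_if: str         # local interface the broadcast arrived on
    priority: int       # assumption: lower value means preferred
    received_at: float  # local receive time in seconds


def select_path(adverts, alias_ip, now, max_age_s):
    """Pick the interface for the host route to alias_ip, or None if no
    advertisement for it is fresh enough."""
    fresh = [
        a for a in adverts
        if a.alias_ip == alias_ip and now - a.received_at <= max_age_s
    ]
    if not fresh:
        return None
    # Best priority first; among equals, the most recently heard one.
    return min(fresh, key=lambda a: (a.priority, -a.received_at)).via_if


adverts = [
    Advert("10.0.0.1", "eth0", priority=1, received_at=10.0),
    Advert("10.0.0.1", "eth1", priority=2, received_at=10.4),
]
print(select_path(adverts, "10.0.0.1", now=10.5, max_age_s=1.0))  # eth0
print(select_path(adverts, "10.0.0.1", now=11.2, max_age_s=1.0))  # eth1
```

Short retransmission intervals keep `max_age_s` small, so a dead link is noticed and the host route is moved quickly.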

Page 15: Improving Robustness in Distributed Systems

Erlang multi-homing resolved?

[Figure: Host A, Host B, Host C]

Page 16: Improving Robustness in Distributed Systems

Summing up

Erlang can support multi-homing with some additional work

By using a loopback alias i/f, link failure becomes a routing problem (peer-peer association is kept intact)

Solaris TCP/IP stack parameters are:

- hard to find (only in out-of-date app. notes)

- hard to set "right"

- host global

A distribution mechanism with built-in support for multi-homing is preferred

Page 17: Improving Robustness in Distributed Systems

Erlang Distribution over SCTP

Per Bergqvist et al. [email protected]

Erlang User Conference 2002