
Dependable Distributed Applications

Dependable Systems 2014

Lena Herscheid, Dr. Peter Tröger


Frameworks + Programming Models

Hanmer, Robert. Patterns for Fault Tolerant Software. John Wiley & Sons, 2013.

Introduction to Fault Tolerant CORBA. http://cnb.ociweb.com/cnb/CORBANewsBrief-200301.html

Erlang/OTP http://www.erlang.org/doc/


FT-CORBA

• Extension of the CORBA standard with commonly used fault tolerance patterns

• Fault model: node crash faults

• Replication
  • Object level: ReplicationManager + ReplicaFactory
  • Logical singletons: a group of replicas (object group) appears as a single object
  • warm / cold passive → high recovery time
  • active / active_with_voting → high multicast time

• Fault detection
  • FaultDetector + FaultNotifier
  • Are assumed to be inherently fault tolerant

• Failure recovery
  • Apply the log of updates to the replica, depending on the replica type (see the sketch after this list)

• Implementations
  • Replication in the ORB: Electra, TAO, Orbix+Isis, …
  • Replication through CORBA objects: DOORS, AQuA, OGS, …
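The recovery step above can be pictured as a primary that logs updates and a backup that replays that log when it takes over (warm/cold passive). A minimal Python sketch of the idea; the class and method names are illustrative, not part of the FT-CORBA interfaces:

```python
class Replica:
    """Toy passive replica: applies logged updates to its state."""
    def __init__(self):
        self.state = {}

    def apply(self, update):
        key, value = update
        self.state[key] = value


class Primary(Replica):
    """The primary executes updates and keeps a log for recovery."""
    def __init__(self):
        super().__init__()
        self.log = []

    def execute(self, update):
        self.apply(update)
        self.log.append(update)   # shipped to backups periodically (warm passive)


def recover(backup, log):
    """On primary failure, replay the update log on the promoted backup."""
    for update in log:
        backup.apply(update)
    return backup


# Usage sketch
primary = Primary()
primary.execute(("x", 1))
primary.execute(("y", 2))
backup = recover(Replica(), primary.log)
assert backup.state == {"x": 1, "y": 2}
```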


Erlang/OTP

• Erlang programming language: fault tolerance as a design principle
  • Isolated, lightweight processes (managed by the VM)

• Programming model: asynchronous message passing

• “Let it crash” policy
  • Processes terminate with error codes

• Monitoring processes are expected to do recovery

• Transparent distribution of processes (by VM)

• Open Telecom Platform (OTP) framework
  • Common patterns in concurrent, distributed Erlang programs

• Modules can implement behaviours (gen_server, gen_fsm, supervisor, …)


Erlang/OTP

[Figure: example supervision tree; one_for_one restart strategy]
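The one_for_one strategy restarts only the child that crashed and leaves its siblings running. A rough Python model of that behaviour (not Erlang; the Supervisor class and the restart limit are assumptions for illustration):

```python
import traceback

class Supervisor:
    """Toy one_for_one supervisor: restart only the child that failed."""
    def __init__(self, child_specs, max_restarts=3):
        self.child_specs = child_specs      # name -> zero-argument callable
        self.max_restarts = max_restarts
        self.restart_counts = {name: 0 for name in child_specs}

    def run_child(self, name):
        try:
            self.child_specs[name]()
        except Exception:
            traceback.print_exc()
            self.handle_exit(name)

    def handle_exit(self, name):
        if self.restart_counts[name] < self.max_restarts:
            self.restart_counts[name] += 1
            print(f"one_for_one: restarting only {name}")
            self.run_child(name)            # siblings are not touched
        else:
            raise RuntimeError(f"{name} exceeded restart intensity; supervisor gives up")


# Usage sketch: a flaky worker next to a healthy one
attempts = {"count": 0}
def flaky_worker():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ValueError("let it crash")

sup = Supervisor({"flaky": flaky_worker, "steady": lambda: None})
sup.run_child("steady")
sup.run_child("flaky")   # crashes twice, is restarted each time, then succeeds
```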


Fault Tolerant Coordination Services

Burrows, Mike. "The Chubby lock service for loosely-coupled distributed systems." Proceedings of the 7th Symposium on Operating Systems Design and Implementation. USENIX Association, 2006.

Hunt, Patrick, et al. "ZooKeeper: Wait-free Coordination for Internet-scale Systems." USENIX Annual Technical Conference. Vol. 8. 2010.


Motivation

• Distributed algorithms are notoriously hard to implement correctly

• Leader election / consensus need to be inherently fault tolerant

• Decoupling of algorithmic and data redundancy
  • Storage nodes usually need a higher degree of replication
  • Consistency constraints
  • High recovery costs

• Decision making should be lightweight
  • Fast recovery
  • Low latency requirement


Ongaro, Diego, and John Ousterhout. "In search of an understandable consensus algorithm." USENIX Annual Technical Conference. 2014.

Chubby

• Google’s distributed lock service

• Goal: easily add consensus / leader election to existing applications (see the sketch after this list)

• Lock service: simple interface for distributed decision making
  • “a generic electorate that allows a client system to make decisions correctly when less than a majority of its own members are up”

• Serves small files so elected primaries can easily distribute parameters

• Client notification on events (such as lock expiry → new leader election)

• A Chubby cell consists of 5 replicas running Paxos
  • Automatic failover within a configured machine pool
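Leader election on top of a lock service boils down to: whoever holds the lock is primary, and it writes its identity into a small file that everyone else reads. A hedged sketch against a hypothetical client API (LockServiceClient, acquire, write_file and watch are invented names, not Chubby's actual interface):

```python
def try_become_primary(client, my_address):
    """Toy leader election over a Chubby-like lock service.

    `client` is a hypothetical LockServiceClient; its methods are assumptions
    used only to illustrate the pattern.
    """
    lock = client.acquire("/service/leader_lock", blocking=False)
    if lock is not None:
        # We won the election: advertise ourselves in a small file,
        # which replicas and clients read to find the current primary.
        client.write_file("/service/primary_address", my_address)
        return True
    # Someone else is primary; watch the lock to retry on expiry.
    client.watch("/service/leader_lock",
                 callback=lambda event: try_become_primary(client, my_address))
    return False
```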


Zookeeper

• “Because Coordinating Distributed Systems is a Zoo”

• Distributed configuration + coordination service
  • Used for leader election, message queuing, synchronization (see the sketch after this list)

• Provides a file-system-like namespace for coordination data (<= 1 MB per znode)
  • Kept in memory
  • State-based service: no change history

• Guaranteed absolute order of updates
  • Client watch events are triggered in the same order as Zookeeper sees the updates

• Throughput of read requests scales with #servers

• Throughput of write requests decreases with #servers
  • Consensus on all updates
  • ~50k updates per second
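A minimal sketch of how coordination data and a classic leader-election recipe look from a client, using the third-party kazoo Python client (assumed to be installed, with a ZooKeeper ensemble assumed at 127.0.0.1:2181):

```python
# Minimal sketch using the third-party `kazoo` ZooKeeper client (an assumption;
# any ZooKeeper client exposing create/get_children would do).
from kazoo.client import KazooClient

zk = KazooClient(hosts="127.0.0.1:2181")
zk.start()

# Coordination data lives in a file-system-like namespace of small znodes.
zk.ensure_path("/app/config")
zk.set("/app/config", b"feature_x=on")

# Classic leader-election recipe: ephemeral, sequential znodes.
me = zk.create("/app/election/candidate-", b"node-a",
               ephemeral=True, sequence=True, makepath=True)
candidates = sorted(zk.get_children("/app/election"))
is_leader = me.split("/")[-1] == candidates[0]
print("I am the leader" if is_leader else "I am a follower")

zk.stop()
```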


Distributed Storage

Chang, Fay, et al. "Bigtable: A distributed storage system for structured data." ACM Transactions on Computer Systems (TOCS) 26.2 (2008): 4.

Corbett, James C., et al. "Spanner: Google's globally distributed database." ACM Transactions on Computer Systems (TOCS) 31.3 (2013): 8.

HDFS architecture guide. http://hadoop.apache.org/common/docs/current/hdfs_design.pdf (2008).

DeCandia, Giuseppe, et al. "Dynamo: amazon's highly available key-value store." ACM SIGOPS Operating Systems Review. Vol. 41. No. 6. ACM, 2007.


Design Choices

• When to resolve conflicts?
  • On read
  • On write

• Who resolves conflicts?
  • Application: data-model-aware resolution policies possible
  • Storage system: application transparency, but less powerful

• ACID vs BASE

• PACELC trade-offs (if Partitioned, trade Availability vs. Consistency; Else, Latency vs. Consistency)

• Data partitioning algorithm


ACID vs BASE (Brewer. PODC keynote. 2000)

ACID: Atomic, Consistent, Isolated, Durable

• Transactions

• Strong consistency

• Pessimistic/conservative replication

BASE: Basically Available, Soft-state, Eventual consistency

• Best Effort

• Weak consistency

• Optimistic replication


Modern distributed storage systems

• Geo-replication
  • Latency issues
  • Consistency models need to take locality into account

• Shift towards tuneable, relaxed consistency models
  • Application-specific configuration
  • Fault tolerance increasingly also a DevOps problem

• Always available, low latency, partition tolerance, scalability (ALPS)
  • Availability before consistency
  • Most ALPS systems offer eventual consistency

• NoSQL movement
  • Relational DBMS are hard to make consistent and available
  • Denormalized data is easier to replicate


Self-Healing

• How (and when) to handle diverging replicas with eventual consistency?

• Read repair
  • Quorum met, but not all replicas agreed → inconsistency detected! (see the sketch after this list)
  • Force the minority to update their copy

• Active Anti-Entropy (AAE)
  • Continuously running background process
  • Difference detection using hash trees
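A minimal sketch of the read-repair idea above, assuming a toy versioned key-value replica interface (the Replica class and its methods are illustrative, not any particular system's API):

```python
def read_with_repair(replicas, key, read_quorum):
    """Toy read repair: read a quorum, detect stale copies, push the newest value back."""
    # Collect (version, value) answers from enough replicas to meet the quorum.
    answers = {r: r.get(key) for r in replicas[:read_quorum]}

    # The highest version wins; lower versions are stale.
    newest_version, newest_value = max(answers.values(), key=lambda vv: vv[0])

    # Repair: force the minority holding stale copies to update.
    for r, (version, _) in answers.items():
        if version < newest_version:
            r.put(key, newest_version, newest_value)

    return newest_value


class Replica:
    """Illustrative in-memory replica with versioned values."""
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key, (0, None))
    def put(self, key, version, value):
        self.store[key] = (version, value)


# Usage sketch: one replica missed the latest write.
a, b, c = Replica(), Replica(), Replica()
for r in (a, b):
    r.put("cart", 2, ["book", "pen"])
c.put("cart", 1, ["book"])
assert read_with_repair([a, b, c], "cart", read_quorum=3) == ["book", "pen"]
assert c.get("cart")[0] == 2   # the stale replica was repaired
```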


BigTable

• Google’s distributed database

• Designed to handle petabytes of distributed data

• Non-relational data model: “multi-dimensional sparse maps”

• GQL: subset of SQL

• Building Blocks
  • Google File System (GFS) for raw storage
  • Chubby for master election
  • Custom MapReduce implementation for writing data


Google Spanner

• Data is replicated across spanservers in different data centres

• Data model: semi-relational

• “Externally consistent” transactions (linearizable consistency for R/W transactions)

• Timestamped transactions, using Paxos


[Figure: effect of killing the Paxos leader]

Google Spanner / TrueTime

• Instead of relying on NTP, data centres have their own atomic clocks

• GPS-based time negotiation
  • Periodic consensus on time → reliable, but uncertain, global clock

• Interval-based time (uncertainty representation)
  • The longer since the last synchronization point, the higher the uncertainty (see the sketch below)
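A toy model of the interval clock and of Spanner's commit-wait rule (a commit timestamp is only exposed once it is guaranteed to lie in the past on every clock). The class below illustrates the idea only; it is not Google's TrueTime API, and the epsilon value is an arbitrary assumption:

```python
import time

class ToyTrueTime:
    """Illustrative TrueTime-style clock: now() returns an uncertainty interval."""
    def __init__(self, epsilon_seconds=0.007):
        self.epsilon = epsilon_seconds                  # assumed clock uncertainty bound

    def now(self):
        t = time.time()
        return (t - self.epsilon, t + self.epsilon)     # [earliest, latest]

    def after(self, t):
        """True once t is guaranteed to have passed on every clock."""
        earliest, _ = self.now()
        return earliest > t


def commit_wait(tt, commit_timestamp):
    """Spanner-style commit wait: delay visibility until the timestamp is surely in the past."""
    while not tt.after(commit_timestamp):
        time.sleep(0.001)


# Usage sketch: take the latest possible current time as the commit timestamp,
# then wait roughly 2 * epsilon before acknowledging the commit.
tt = ToyTrueTime()
_, commit_ts = tt.now()
commit_wait(tt, commit_ts)
print("transaction visible at a timestamp that is now strictly in the past")
```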


HDFS


• Standard storage system behind Hadoop

• Replication of equal size file blocks on DataNodes

• Central coordinating NameNode
  • Maintains metadata: namespace tree, mapping of blocks to DataNodes
  • Metadata kept in memory
  • Monitors DataNodes by receiving heartbeats

• DataNode failure → NameNode detects it and re-replicates the affected blocks on another node (see the sketch below)

• NameNode is a single point of failure (before Hadoop 2.0.0)
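A rough sketch of the heartbeat-driven failure handling described above; the timeout, class name and data structures are assumptions for illustration, not Hadoop's actual implementation:

```python
import time

class ToyNameNode:
    """Illustrative NameNode-like metadata service: tracks heartbeats and block locations."""
    def __init__(self, heartbeat_timeout=30.0, replication_factor=3):
        self.heartbeat_timeout = heartbeat_timeout
        self.replication_factor = replication_factor
        self.last_heartbeat = {}          # datanode -> timestamp of last heartbeat
        self.block_locations = {}         # block id -> set of datanodes holding it

    def heartbeat(self, datanode):
        self.last_heartbeat[datanode] = time.time()

    def check_datanodes(self):
        """Declare silent DataNodes dead and re-replicate their blocks elsewhere."""
        now = time.time()
        dead = {dn for dn, ts in self.last_heartbeat.items()
                if now - ts > self.heartbeat_timeout}
        live = set(self.last_heartbeat) - dead
        for block, holders in self.block_locations.items():
            holders -= dead
            while len(holders) < self.replication_factor and live - holders:
                target = (live - holders).pop()   # placement policy omitted in this sketch
                holders.add(target)               # i.e. schedule a copy of `block` to `target`
        return dead
```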

High Availability HDFS

• HBase runs on top of HDFS: open source BigTable implementation

Dynamo

• Amazon’s distributed key-value store

• Designed for scalability and high availability

• Assumptions
  • Most operations do not span multiple data items → no need for a fully relational DBMS
  • Poor write availability is worse than inconsistency

• Always writeable
  • Conflict resolution happens upon reads (see the sketch after this list)
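Dynamo tracks causality between object versions with vector clocks and hands genuinely concurrent versions back to the application on read (e.g. merging shopping carts). A small sketch of that comparison and merge; all names and the merge policy are illustrative:

```python
def descends(clock_a, clock_b):
    """True if the version with clock_a causally follows (or equals) clock_b."""
    return all(clock_a.get(node, 0) >= count for node, count in clock_b.items())


def resolve_on_read(versions):
    """Drop versions that are causally dominated; any concurrent leftovers are a real
    conflict that the application must merge (here: union, like merging shopping carts)."""
    survivors = [
        (clock, value) for clock, value in versions
        if not any(other is not clock and descends(other, clock)
                   for other, _ in versions)
    ]
    if len(survivors) == 1:
        return survivors[0][1]
    return sorted(set().union(*(set(v) for _, v in survivors)))   # app-level merge


# Usage sketch: two clients updated the same cart via different coordinator nodes.
v1 = ({"A": 2, "B": 1}, ["book"])            # dominated by v2
v2 = ({"A": 3, "B": 1}, ["book", "pen"])
v3 = ({"A": 2, "B": 2}, ["book", "lamp"])    # concurrent with v2
print(resolve_on_read([v1, v2, v3]))          # -> ['book', 'lamp', 'pen']
```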


Riak

• Distributed key-value store programmed in Erlang

• Design based on the Dynamo paper (ring-based key placement sketched below)


[Figures: replication configuration, ring-based consistent hashing, Erlang supervision tree]
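Like Dynamo, Riak places keys on a hash ring and stores each key on the next N distinct nodes clockwise. A compact sketch of that placement rule; the number of virtual nodes and the use of SHA-1 follow the Dynamo/Riak design, but the code itself is illustrative:

```python
import bisect
import hashlib

def ring_position(value):
    """Map a string onto the hash ring (a 160-bit space, as with a SHA-1 ring)."""
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

def build_ring(nodes, vnodes_per_node=8):
    """Assign several virtual nodes per physical node to smooth the key distribution."""
    return sorted((ring_position(f"{node}-{i}"), node)
                  for node in nodes for i in range(vnodes_per_node))

def preference_list(ring, key, n_replicas=3):
    """First vnode clockwise from the key coordinates it; replicas follow on distinct nodes."""
    positions = [pos for pos, _ in ring]
    start = bisect.bisect(positions, ring_position(key)) % len(ring)
    chosen = []
    for i in range(len(ring)):
        node = ring[(start + i) % len(ring)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == n_replicas:
            break
    return chosen

# Usage sketch
ring = build_ring(["riak1", "riak2", "riak3", "riak4"])
print(preference_list(ring, "user:42"))   # e.g. ['riak3', 'riak1', 'riak4']
```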

Cassandra

• Distributed NoSQL DBMS

• Designed for performance and scalability

• Eventual consistency, configurable
  • Hinted handoff for availability

• Gossip protocol for failure detection

• Configurable replication + partitioning
  • NetworkTopologyStrategy: data-centre aware


N/R/W consistency calculator: http://www.ecyrd.com/cassandracalculator/
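The calculator linked above explores the classic N/R/W relationship: with N replicas, a read of R copies and a write of W copies overlap in at least one up-to-date replica whenever R + W > N. A tiny illustration of that arithmetic:

```python
def quorum_properties(n, r, w):
    """Classic replica-quorum arithmetic: overlapping read/write sets give strong consistency."""
    return {
        "strongly_consistent_reads": r + w > n,   # every read set intersects the latest write set
        "no_conflicting_writes": w + w > n,       # any two write sets intersect
        "reads_survive_failures": n - r,          # replicas that may be down and reads still succeed
        "writes_survive_failures": n - w,
    }

# Usage sketch: Cassandra-style QUORUM reads and writes with 3 replicas.
print(quorum_properties(n=3, r=2, w=2))
# {'strongly_consistent_reads': True, 'no_conflicting_writes': True,
#  'reads_survive_failures': 1, 'writes_survive_failures': 1}
```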

The Reality of Distributed Failures…

• human operation mistakes
• data corruption is rarely part of the failure model
• unforeseen (hence unmodelled) error propagation chains
• dynamically changing failure probabilities
• nested failures during recovery routines

