16
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association AIFB, eOrganization Research Group www.kit.edu Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage Services David Bermbach (KIT) Liang Zhao, Sherif Sakr (NICTA & UNSW) © David Bermbach

Comprehensive Consistency Benchmarking

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association

AIFB, eOrganization Research Group

www.kit.edu

Towards Comprehensive Measurement of Consistency Guarantees for Cloud-Hosted Data Storage ServicesDavid Bermbach (KIT)Liang Zhao, Sherif Sakr (NICTA & UNSW)

© David Bermbach

AIFB – eOrganization Research Group2

Motivation

Developing applications on top of only eventually consistent cloudstorage services is difficult and requires a change in mindsetsMost applications can tolerate a certain degree of uncertainty as longas they know about the extentIn some cases eventual consistency is just not acceptable but changingthe behavior of storage systems requires knowledge of their guaranteeinternaConsistency guarantees are affected by additional influence factors(e.g., failures, workload, geo-distribution, …); this should be analyzed in a comprehensive consistency benchmark.

David Bermbach

AIFB – eOrganization Research Group3

What is consistency?

Two different definitions“C“ from ACID is not what distributed systems see as consistency

Basic idea: If all replicas are identical and all ordering guarantees of theconsistency model are observed, then we call the datastore consistentEventual Consistency: In the absence of updates all replicas eventuallyconverge towards a consistent state

David Bermbach

AIFB – eOrganization Research Group4

What is consistency? (cont.)

Two perspectives: data-centric vs. client-centric

David Bermbach02.09.2013

AIFB – eOrganization Research Group5

What is consistency? (cont.)

Two perspectives: data-centric vs. client-centric

Two dimensions:Staleness: How far are replicas lagging behindOrdering: How many requests are executed out of order

David Bermbach02.09.2013

AIFB – eOrganization Research Group6

Metrics

StalenessIn terms of time (t-Visibility)In terms of versions (k-Staleness)

OrderingProbability of violations of Monotonic Read ConsistencyProbability of violations of Monotonic Write ConsistencyProbability of violations of Read Your Writes Consistency

David Bermbach02.09.2013

AIFB – eOrganization Research Group7

Challenges

Accuracy and MeaningfulnessChoose appropriate metricsConsider accuracy of the measurement approach

WorkloadsSystem loadActual measurement workload (highly volatile reaction, so keep it simple)

Geo-replicationSingle site vs. multi site deployments vs. anything in between

FailuresCloud service vs. self-hosted systemCrash-stop, crash-recover, byzantine

Multi-tenancyCross-effects between tenants

David Bermbach02.09.2013

AIFB – eOrganization Research Group8

Architecture

02.09.2013 David Bermbach

AIFB – eOrganization Research Group9

Architecture

02.09.2013

YCSB

Not yet implemented

e.g., Simian Army

Based on Bermbach & Tai, MW4SOC 2011

David Bermbach

AIFB – eOrganization Research Group10

Experiments

Study effects ofdifferent workloadsgeo-distribution

on consistency guarantees ofCassandra: (3,1,1) quorumMongoDB: Master-Slave

David Bermbach02.09.2013

AIFB – eOrganization Research Group11

Setup

Storage system deployed on 3 medium EC2 instances running in oneAZ (eu-west-1a), multiple AZ (eu-west) or multiple regions (eu-west, us-west, asia-pacific)YCSB running on xlarge instanceConsistency measurement running on 5 small instances per testLoad balancer routes requests to the closest replicaMeasure consistency

Without additional workloadWith 80% reads and 20% writes in parallel (YCSB)With 20% reads and 80% writes in parallel (YCSB)

David Bermbach02.09.2013

AIFB – eOrganization Research Group12

Staleness Results for Cassandra

David Bermbach02.09.2013

Single-AZ

Multi-AZ

Multi-region

<1ms for 98% ofrequests

AIFB – eOrganization Research Group13

Staleness Results for MongoDB

David Bermbach02.09.2013

Single-AZ Multi-AZ Multi-region

AIFB – eOrganization Research Group14

Effects of parallel workload

No effect except for multi-region Cassandra:

David Bermbach02.09.2013

Workload 1 Workload 2

AIFB – eOrganization Research Group15

Conclusion

A comprehensive consistency benchmark shoulduse meaningful and accurate (fine-grained & continuous) metricsconsider effects of workloads, geo-distribution, multi-tenancy and failurespreferably reuse and integrate popular, proven solutions

Increased geo-distribution results in larger staleness values for bothMongoDB and CassandraIncreased workloads seem to have little effect on consistency as longas CPU utilization is below 100%

David Bermbach02.09.2013

AIFB – eOrganization Research Group16

Thank you for your attention

Questions?

David Bermbach02.09.2013

[email protected]

This paper was joint work withLiang Zhao and Sherif Sakr atNICTA and UNSW:=> [email protected]=> [email protected]