
Corpus collapsum: Partition tolerance testing of Galera with Docker and NetEm


Raghavendra Prabhu

Percona

randomsurfer, wnohang.net, rdprabhu, ronin13

The Title

Split Brain?

Split brain

Introduction

Seed quotes..

“‘Network is reliable’ - a fallacy of distributed systems.”

“A distributed system is one in which the failure of a computer you didn’t even know existed can render your own computer unusable.” - Leslie Lamport

“Never attribute to malice that which is adequately explained by stupidity.” - Hanlon’s Razor

“Never attribute to Byzantine failure that which can be explained by ill node(s).” - Me

Raghavendra Prabhu (Percona) Corpus collapsum 20 February, 2015 5 / 68


20000 feet view

Actors

▶ Database - WSREP/PXC
▶ Plugin - Galera
▶ Traffic control
  ♦ Traffic Control - tc
  ♦ NetEm


Actors

▶ Containers - Docker
▶ Load
  ♦ Generators - Sysbench, RQG
▶ Network
  ♦ Dnsmasq
  ♦ nsenter


Actors

▶ Jenkins
  ♦ Build flow and CI
▶ Storage
  ♦ Why

Distributed Systems Testing: A Kobayashi Maru

Cheat on CAP!

Details

Rationale

▶ The ‘P’ in CAP
▶ WAN scalability
▶ Real Reason - fun!
▶ Tolerance to latency variance


Rationale

▶ Failures in warehouses.
▶ Not quorum, but consensus.
▶ Real world networks and synchronous replication
  - Delay
  - Partition
  - Non-graceful exits

Galera

Galera

▶ Data-centric approach
▶ Extended Virtual Synchrony
▶ Causality and Synchronous
▶ Flow control and temporal Synchrony

Galera

▶ Latency
  - Global ordering
  - Certification and not apply
  - Communication overhead
▶ Layers
  - Replication
  - Certification
  - Group communication
▶ Isolation
  - REPEATABLE-READ
  - SNAPSHOT-ISOLATION

Where did it start

Where did it start

▶ Bug! https://bugs.launchpad.net/galera/+bug/1274192
▶ Loss of PC
▶ Crash
▶ HAT

One can bring the whole down

Tests

▶ Chaos testing
▶ Flow control with sysbench
▶ Network Loss
▶ Future

There is no higher menace than distributed systems testing

NetEm

▶ Initial setup
  - Bridge
  - Egress only
  - IFB
  - Present state
▶ NetEm
  - tc qdisc buckets
  - packet loss, delay, corruption, duplication, reordering
  - nsenter
▶ Future
  - Docker exec
  - Rocket ACI
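A minimal sketch of how a netem qdisc might be attached inside a container's network namespace with nsenter. It only builds the command lines and does not run them. The helper names, the `eth0` device, and the PID are assumptions for illustration and are not taken from the pxc-docker scripts. A root netem qdisc shapes egress traffic only, which matches the "Egress only" point above.

```python
from typing import List, Optional


def netem_add_cmd(pid: int, dev: str = "eth0",
                  loss_pct: Optional[float] = None,
                  delay_ms: Optional[int] = None) -> List[str]:
    """Build an argv that adds a root netem qdisc in the net namespace of `pid`."""
    if loss_pct is None and delay_ms is None:
        raise ValueError("specify at least one of loss_pct or delay_ms")
    cmd = ["nsenter", "--target", str(pid), "--net",
           "tc", "qdisc", "add", "dev", dev, "root", "netem"]
    if delay_ms is not None:
        cmd += ["delay", f"{delay_ms}ms"]
    if loss_pct is not None:
        cmd += ["loss", f"{loss_pct}%"]
    return cmd


def netem_del_cmd(pid: int, dev: str = "eth0") -> List[str]:
    """Build an argv that detaches the root qdisc again."""
    return ["nsenter", "--target", str(pid), "--net",
            "tc", "qdisc", "del", "dev", dev, "root"]
```

The PID would typically come from `docker inspect --format '{{.State.Pid}}' <container>`, and the resulting lists can be handed to `subprocess.run` as root.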

Tests: Chaos testing

▶ Nodes killed at random around sysbench
▶ Less than half of nodes are chosen
▶ docker inspect && SIGKILL
▶ Configurable sleep && retry
  ♦ Snapshot/Incremental State Transfer
    - Composability of transactional databases
▶ docker restart && repeat
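The victim selection above can be sketched as follows, assuming "less than half" means a strict minority so that the survivors can still form a primary component. The function names and container names are hypothetical, and the commands are only built here, not executed.

```python
import random
from typing import List, Optional


def pick_victims(nodes: List[str],
                 rng: Optional[random.Random] = None) -> List[str]:
    """Pick a random, non-empty set of nodes smaller than half the cluster."""
    if rng is None:
        rng = random.Random()
    max_victims = (len(nodes) - 1) // 2  # largest k with 2 * k < len(nodes)
    if max_victims < 1:
        raise ValueError("need at least 3 nodes to kill a strict minority")
    return rng.sample(nodes, rng.randint(1, max_victims))


def inspect_pid_cmd(container: str) -> List[str]:
    """argv that prints the host PID of the container's main process."""
    return ["docker", "inspect", "--format", "{{.State.Pid}}", container]


def sigkill_cmd(pid: int) -> List[str]:
    """argv that sends SIGKILL to that PID (a non-graceful exit)."""
    return ["kill", "-KILL", str(pid)]


def restart_cmd(container: str) -> List[str]:
    """argv that brings the killed node back so it rejoins via state transfer."""
    return ["docker", "restart", container]
```

Passing a seeded `random.Random` makes a failing run repeatable, which helps when a particular kill pattern exposes a bug.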

Tests: Network Loss

▶ Loss nodes
▶ Detach/Keep qdisc
▶ Reconciliation
▶ Sanity checks
▶ Formation of PC || time to recover

The Flow

Basic Flow

Jenkins, Build images, Start Dnsmasq, Bootstrap

Load/Sysbench, SST/Others, Pre-sanity, nsenter/netem


Basic Flow

RR sysbench, Detach/Keep

Sanity check, Reconciliation

Post sanity, Core trace

Cleanup, Collect logs


Parameters

▶ Sysbench
▶ Segment
▶ Reconciliation period
▶ Loss nodes


Plumbing the pressure

Parameters

▶ NetEm
▶ Qdisc detach
▶ fsync
▶ Shutdown


Containers!

Docker

▶ Why not virtualize
  ♦ Occam
  ♦ Namespaces
▶ Simplicity
  ♦ Network
▶ Logical scalability
  ♦ One application per node

Docker

▶ Portability
  - Qualitative behavior.
▶ Reproducibility
  - Makes it deterministic
▶ Configurable and CI
  - Byproducts

Docker

▶ QEMU vis-à-vis Docker
▶ Scalability
  ♦ Performance
  ♦ Feature
▶ Abstraction of channels

Container Networking

▶ Linking didn’t help
▶ Dnsmasq to rescue!
  ♦ Hosts file and volumes
  ♦ SIGHUP and refresh
▶ Potential issues
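A rough sketch of the "hosts file, then SIGHUP" refresh: the hosts file that dnsmasq reads is rewritten whenever a node's address changes, and dnsmasq is then signalled to reload it. The function names, file path, and addresses are assumptions for illustration only.

```python
import os
import signal
from typing import Dict


def render_hosts(addresses: Dict[str, str]) -> str:
    """Render hostname -> IP pairs as hosts-file lines, sorted for stable output."""
    return "".join(f"{ip}\t{name}\n" for name, ip in sorted(addresses.items()))


def refresh_dnsmasq(hosts_path: str, addresses: Dict[str, str],
                    dnsmasq_pid: int) -> None:
    """Rewrite the hosts file in place, then ask dnsmasq to re-read it."""
    with open(hosts_path, "w") as fh:
        fh.write(render_hosts(addresses))
    # dnsmasq clears its cache and reloads its hosts files on SIGHUP.
    os.kill(dnsmasq_pid, signal.SIGHUP)
```

For example, `render_hosts({"node1": "172.17.0.2"})` yields a single `172.17.0.2<TAB>node1` line that could be written to a file shared with the dnsmasq container through a volume.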

Testing methods

Overview

▶ Transient noise
▶ Lasting ’sickness’
▶ Sick nodes
▶ Dead members

Method I

▶ Qdisc is detached after load
▶ Objective
  - Time to recover of full cluster
▶ Done with a larger subset

Method II

▶ Qdisc is kept till the end
▶ Objective
  - Formation of primary component
▶ Comparatively smaller set
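Both objectives can be measured by polling a node's wsrep status until it reports a primary component of the expected size. The sketch below assumes a caller-supplied `read_status` callable that returns the `SHOW STATUS LIKE 'wsrep_%'` rows as a dict. That callable and the function name are hypothetical, while `wsrep_cluster_status` and `wsrep_cluster_size` are standard Galera status variables.

```python
import time
from typing import Callable, Dict, Optional


def wait_for_recovery(read_status: Callable[[], Dict[str, str]],
                      expected_size: int,
                      timeout_s: float = 300.0,
                      poll_s: float = 1.0,
                      clock: Callable[[], float] = time.monotonic,
                      sleep: Callable[[float], None] = time.sleep) -> Optional[float]:
    """Return seconds until a Primary cluster of expected_size is seen, else None."""
    start = clock()
    while clock() - start <= timeout_s:
        status = read_status()
        if (status.get("wsrep_cluster_status") == "Primary"
                and int(status.get("wsrep_cluster_size", 0)) == expected_size):
            return clock() - start
        sleep(poll_s)
    return None
```

For Method I, `expected_size` would be the full cluster size after the qdisc is detached. For Method II it would be the number of nodes left unaffected while the qdisc stays in place.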

Observations

▶ Post sanity types
  - Why
▶ Which method is more pertinent
▶ State transfer issues
  - Beginning
  - During re-emergence

Observations

▶ Direct load to affected nodes
▶ Partition external to system
▶ Logs
  - journalctl
  - Streaming?

Other noises

▶ Aim
▶ Fsync
  - libeatmydata
  - Variance
▶ Correlation with network
▶ How with Docker
  - LD_PRELOAD
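One way the LD_PRELOAD approach might look with Docker: the container is started with libeatmydata preloaded so that fsync calls become no-ops on selected nodes. The library path, image name, and helper name below are assumptions, and the path in particular varies between distributions.

```python
from typing import List

# Assumed install path inside the image; adjust for the distribution in use.
EATMYDATA_LIB = "/usr/lib/libeatmydata.so"


def docker_run_cmd(image: str, name: str, no_fsync: bool) -> List[str]:
    """Build a `docker run` argv, optionally preloading libeatmydata."""
    argv = ["docker", "run", "-d", "--name", name]
    if no_fsync:
        # -e sets an environment variable for the container's main process.
        argv += ["-e", f"LD_PRELOAD={EATMYDATA_LIB}"]
    return argv + [image]
```

Enabling it on only some nodes is one way to introduce the fsync variance listed above.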

System Load

Load generation

▶ Sysbench
  - Generation
  - Reconnect on partition
▶ Sockets chosen
  - Load on affected nodes
▶ Distribution of Load
  - RR with socat
  - Native sysbench support
  - HAProxy?

Load generation

▶ Nature of data/load
  - DDL
▶ RQG in future
  - Fuzz testing

The Fix

Strike Out!

Eviction

▶ STONITH
▶ Permanent eviction
▶ ’N’ strikes & out!
  - Timers - evs parameters
  - wsrep_evs_delayed and wsrep_evs_evict_list

Eviction

▶ Aim
▶ Quorum required
  - Why? - Not shoot each other
  - Non-PC nodes also.
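The "'N' strikes & out" idea with a quorum requirement can be illustrated with a toy model. This is not Galera's actual EVS implementation: the class, its methods, and the counting rule are simplifying assumptions. A node is evicted only once a majority of members have each seen it delayed N times, so a single confused node cannot shoot the others.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class StrikeTracker:
    """Toy model of quorum-based eviction after N 'delayed' reports."""

    def __init__(self, members: Iterable[str], strikes_to_evict: int) -> None:
        self.members = set(members)
        self.strikes_to_evict = strikes_to_evict
        # strikes[suspect][observer] = times observer saw suspect as delayed
        self.strikes: Dict[str, Dict[str, int]] = defaultdict(
            lambda: defaultdict(int))

    def report_delayed(self, observer: str, suspect: str) -> None:
        """Record one strike from observer against suspect."""
        if (observer in self.members and suspect in self.members
                and observer != suspect):
            self.strikes[suspect][observer] += 1

    def evict_list(self) -> Set[str]:
        """Nodes that a majority of members have struck out."""
        quorum = len(self.members) // 2 + 1
        evicted = set()
        for suspect, by_observer in self.strikes.items():
            voters = sum(1 for count in by_observer.values()
                         if count >= self.strikes_to_evict)
            if voters >= quorum:
                evicted.add(suspect)
        return evicted
```

In a three-node cluster with N = 2, two strikes from one observer are not enough. The suspect appears in `evict_list()` only after a second observer also reports it twice.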


Coredumps with Docker

▶ Breakdown of abstraction
▶ Lack of isolation
▶ What was done
  - Volumes
  - core_pattern & sysctl
  - suid and ulimit
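A hedged sketch of how the three pieces might fit together: host-wide sysctls for the core pattern and setuid dumps, a volume so the core file lands somewhere collectable, and an unlimited core ulimit on the container. The directory names, image name, and helper names are assumptions, and `--ulimit` needs a Docker version that supports it. `kernel.core_pattern` is a kernel-wide setting rather than a per-container one, which is the lack of isolation noted above.

```python
from typing import List


def host_core_setup_cmds(core_dir: str = "/cores") -> List[List[str]]:
    """Host-side sysctl calls; both settings apply to every container."""
    return [
        ["sysctl", "-w", f"kernel.core_pattern={core_dir}/core.%e.%p"],
        ["sysctl", "-w", "fs.suid_dumpable=2"],
    ]


def docker_run_with_cores(image: str, name: str, host_core_dir: str,
                          core_dir: str = "/cores") -> List[str]:
    """`docker run` argv with unlimited core size and a volume for core files."""
    return ["docker", "run", "-d", "--name", name,
            "--ulimit", "core=-1",
            "-v", f"{host_core_dir}:{core_dir}",
            image]
```

The core pattern path is resolved inside the crashing process's mount namespace, so the same directory is mounted into each container here.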

WAN Segments

▶ How they work
▶ Simulates data center
▶ Random allocation - latency multiplier
▶ Joiner starvation
▶ Donor selection
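As an illustration of random segment allocation, the sketch below assigns each node to a simulated data center and derives a per-node delay from its segment number. `gmcast.segment` is the Galera provider option for segments. The function names and the multiplier rule are assumptions, since the slide does not spell out how the latency multiplier is applied.

```python
import random
from typing import Dict, List


def assign_segments(nodes: List[str], segments: int,
                    rng: random.Random) -> Dict[str, int]:
    """Randomly place each node in one of `segments` simulated data centers."""
    return {node: rng.randrange(segments) for node in nodes}


def provider_options(segment: int) -> str:
    """Galera provider option string that tags a node with its segment."""
    return f"gmcast.segment={segment}"


def egress_delay_ms(segment: int, base_ms: int) -> int:
    """Assumed rule: higher-numbered segments get proportionally more delay."""
    return base_ms * (segment + 1)
```

The option string would go into `wsrep_provider_options`, and the computed delay could feed a netem `delay` setting for that node.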

Epilogue

The code

▶ Github:
  - https://github.com/percona/pxc-docker
  - https://github.com/percona/percona-xtradb-cluster/
  - https://github.com/percona/galera
▶ Jenkins:
  - http://jenkins.percona.com/job/PXC-5.6-netem/
  - http://jenkins.percona.com/job/PXC-5.6-bench/
  - http://jenkins.percona.com/job/PXC-5.6-chaos/
▶ Contributions/testing/bugs welcome!

Code: todo

▶ Docker automated builds
▶ Orchestration
▶ Docker
  ♦ Injection
  ♦ Signal proxying

Code: todo

▶ => Proof of concept to a framework =>
▶ Run it bare - CoreOS, Atomic
▶ Overlay with etcd/fleet/libswarm

Future work

Future work

▶ Fault injection
  ♦ Memory
    - Poisoned memory
  ♦ Disk
    - libeatmydata
    - Opposite
    - ENOSPC

Fault injection

▶ CPU
  - NUMA?
  - Hotplug
▶ More network
  - corruption, duplication, reordering, rate-limit
  - Better distribution
  - Other shaping

Worst case improves Average case

Future work

▶ Disturb cluster more!
  - Membership changes
    * Manual eviction
    * Pull the cord!
  - Corrupt nodes
▶ Introduce inconsistencies
  - Consistency voting
  - Silent corruptions

Eventual consistency

▶ CAP
▶ Latency factor
▶ Is Galera EC? No!
  - ACIDs only, No BASE
▶ Bounded Staleness
  - PBS
▶ ACID and CAP
▶ Instrumentation
▶ Lambda architecture

Further Reading

▶ Worst-Case Distributed Systems Design
▶ HAT, not CAP: Introducing Highly Available Transactions
▶ Bridging the Gap: Opportunities in Coordination-Avoiding Databases
▶ Linearizability versus Serializability

We are Hiring Too!

▶ Looking for build engineer - Packaging and Jenkins/CI are your strengths and you are a Linux geek. Bonus points if you are a Linux distro user/contributor/maintainer.
▶ Senior C/C++ developer - if Linux userspace development and databases (and distributed systems) is your thing.
▶ Apply here: http://percona.theresumator.com/.

About/Contact - HA compliant

▶ /me: Raghavendra Prabhu, Product Lead, Percona XtraDB Cluster, Percona.
▶ Slides will be at slideshare.net/slidunder.
▶ About.me: raghavendra.prabhu
▶ Keybase.io: rdprabhu
▶ Presentation under CC BY-SA 4.0

Image Credits

▶ http://galeracluster.com/documentation-webpages/
▶ https://en.wikipedia.org/wiki/Network_theory
▶ https://upload.wikimedia.org/wikipedia/commons/6/60/Corpus_callosum.png
▶ http://www.thebarrow.org/Neurological_Services/Epilepsy/204354
▶ https://flic.kr/p/9J6GNu
▶ http://schauerte.me/data.html
▶ https://secure.flickr.com/photos/brewbooks/7780990192
▶ https://www.flickr.com/photos/kwerfeldein/2649294869
▶ https://secure.flickr.com/photos/mindmob/51951632
▶ https://secure.flickr.com/photos/arenamontanus/2227769907
▶ https://www.flickr.com/photos/markop/477199204
▶ https://www.flickr.com/photos/gcwest/281385801
▶ https://www.flickr.com/photos/29233640@N07/13466208953
▶ https://www.flickr.com/photos/bob_in_thailand/9782777742/
▶ http://ok-panic.net/art/jeff/dennis.jpg
▶ https://www.facebook.com/sciencedump/photos/a.296290153732762.90161.111815475513565/985102638184840/?type=1
▶ http://upload.wikimedia.org/wikipedia/commons/0/05/Sna_large.png
▶ http://background-kid.com/background-images-light-blue-color.html