
Scalability and microservice dogfooding in Weave Cloud

Alfonso Acosta <fons@weave.works>
Software Engineer

@2opremio

Introductions

Outline
• What is Weaveworks/Scope/Weave Cloud?
• First Weave Cloud architecture iteration
• Second architecture iteration
• Performance bottlenecks
• Golang nuances

Scope
github.com/weaveworks/scope
cloud.weave.works/demo
github.com/microservices-demo/microservices-demo

Scope standalone

[Diagram: Scope Probes (host1, host2, … hostn) send Reports (CRDT-like semantics) to the Scope App; the App sends Controls back to the Probes.]

First Weave Cloud iteration (beta preview)

• Nov 2015
• MVP:
– Multiuser
– Authenticated
– ASAP

And we came up with ...
• Invite-based: manual approvals
• Small wrapper around OSS Scope
– Authentication layer (users service)
– Dedicated, lazily provisioned per-user app instances
– Multiplexing + provisioner service (the app-mapper service)
• Deployed in AWS
• Managed by Docker Swarm + Terraform

And we came up with ...

[Diagram: Weave Cloud on Docker Swarm: the users service authenticates User1 … UserN; the app-mapper routes each user's probes (P1, P2) to that user's dedicated Scope App instance (App1 … AppN).]

What went wrong?
• Swarm: insufficient API + buggy
• Per-user Scope app mapping
– Pets, not cattle
– Single point of failure per user
– Wasted resources
– Painful upgrades

How did we fix it?

Kubernetes
• Rich set of abstractions (maybe too rich)
• Zero-downtime deployments
• Strong open community
• Not without drawbacks (being addressed)
– Steep learning curve
– Installation/upgrades are painful outside GKE

http://blog.kubernetes.io/2015/12/how-Weave-built-a-multi-deployment-solution-for-Scope-using-Kubernetes.html
Kubeadm: https://github.com/kubernetes/kubernetes/pull/30360

Horizontally-scalable Scope App
• Bag of Scope App "cattle"
• Any user can connect to any of them
– Easy to scale/deploy

• Specialized Scope App services, by function:
– Collection: stores reports
– Query: obtains reports (latency-sensitive)
– Control: applies actions on probe resources
– Pipe: bidirectional data communication App<->Probe

Horizontally-scalable Scope App

DEMO

Horizontally-scalable Scope App
• Specialized storage per service:
– Collection/Query: Dynamo, S3, Memcached, NATS
– Control/Pipe: SQS/Consul (rendezvous and data communication probes<->apps<->UI)
• Conscious lock-in
– Easily replaceable by OSS alternatives ...
– … with a non-negligible maintenance cost

But we were still sad
• Very, very sad: query latency > 4s (99th percentile)
• Combination of:
– Big, unoptimized reports (>10MB uncompressed msgpack, 0.3 Hz per probe)
– Bad use of immutable (persistent) data structures in Golang
• Good for reasoning
• Garbage collection was killing us

Mitigations
• Faster report decoding:
– Custom, compile-time-generated msgpack codecs
• Better use of immutable (persistent) DS:
– Removed unnecessary just-in-case Copy() calls
– Improved external map library (10x hash speedup)
• Better choice of EC2 instance types
– 3x fewer machines with 4x the cores: same price, considerably lower latency

Longer-term solutions
• Mitigations got us under 200ms
• They won't cut it in the long run
– Vertical scaling of query won't last
– Optimize report format: delta reports?
– Get rid of persistent DS?
– Report-merging service?

Persistent Data Structures
• Operations never mutate older versions/references of the DS
– Easy to reason about
– No locks needed
• Needs garbage collection to discard old versions
– Large amounts of garbage
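A minimal sketch of why persistent updates generate so much garbage, using a plain Go map as a stand-in (real persistent maps, including the one Scope uses, share structure between versions so they copy far less, but superseded nodes still become garbage):

```go
package main

import "fmt"

// set implements the persistent-update contract on a plain map:
// it returns a new version with the binding added and never touches
// the old one. Every call copies the whole map, so each superseded
// version becomes garbage that only the GC can reclaim.
func set(m map[string]int, k string, v int) map[string]int {
	out := make(map[string]int, len(m)+1)
	for key, val := range m {
		out[key] = val
	}
	out[k] = v
	return out
}

func main() {
	v1 := map[string]int{}
	v2 := set(v1, "reports", 1)
	fmt.Println(len(v1), len(v2)) // 0 1: the old version is untouched
}
```

Under a high update rate this allocation pattern is exactly what makes GC behaviour, rather than raw CPU, the limiting factor described on the next slide.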

Dealing with persistent DS garbage
• Haskell Garbage Collector
– Mark and Sweep, Stop the World, Generational, Compacting
– Uses purity: new data NEVER points to younger values
– High throughput
• Golang Garbage Collector
– Mark and Sweep, Concurrent (tri-color) with small STW periods
– Low latency
– Escape analysis needs improvements

Golang example

// FNV1a hash
func hashKey(key string) uint64 {
    hash := offset64
    for _, codepoint := range key {
        hash ^= uint64(codepoint)
        hash *= prime64
    }
    return hash
}

Optimization I

// FNV1a hash
func hashKey(key string) uint64 {
    hash := offset64
    for _, codepoint := range []byte(key) {
        hash ^= uint64(codepoint)
        hash *= prime64
    }
    return hash
}

Optimization II

// FNV1a hash
func hashKey(key string) uint64 {
    hash := offset64
    for _, codepoint := range bytesView(key) {
        hash ^= uint64(codepoint)
        hash *= prime64
    }
    return hash
}

Optimization II

func bytesView(v string) []byte {
    if len(v) == 0 {
        return zeroByteSlice
    }
    sx := (*unsafeString)(unsafe.Pointer(&v))
    bx := unsafeSlice{sx.Data, sx.Len, sx.Len}
    return *(*[]byte)(unsafe.Pointer(&bx))
}

Optimization II

type unsafeString struct {
    Data uintptr
    Len  int
}

type unsafeSlice struct {
    Data uintptr
    Len  int
    Cap  int
}

var zeroByteSlice = []byte{}
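Since Go 1.20 the standard library offers a supported way to build the same zero-copy view without hand-rolled header structs; a sketch, with the same caveat as the slide's version (the returned slice must never be written to, since strings are immutable):

```go
package main

import (
	"fmt"
	"unsafe"
)

// bytesView returns a []byte aliasing the string's bytes without
// copying, using the unsafe.StringData/unsafe.Slice helpers added
// in Go 1.20. Mutating the result is undefined behaviour.
func bytesView(v string) []byte {
	if len(v) == 0 {
		return nil
	}
	return unsafe.Slice(unsafe.StringData(v), len(v))
}

func main() {
	fmt.Println(string(bytesView("weave"))) // weave
}
```

Unlike the header-punning version on the slide, this form is recognised by the garbage collector and race detector, and it survives changes to the runtime's internal string and slice layouts.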

Questions?
