Scalability and microservice dogfooding in Weave Cloud
Alfonso Acosta <[email protected]>
Software Engineer
@2opremio

Introductions

Outline
• What is Weaveworks/Scope/Weave Cloud?
• First Weave Cloud architecture iteration
• Second architecture iteration
• Performance bottlenecks
• Golang nuances

Scope
• github.com/weaveworks/scope
• cloud.weave.works/demo
• github.com/microservices-demo/microservices-demo

Scope standalone
[Diagram: a Scope App connected to Scope Probes on host1, host2, ..., hostn, exchanging Controls and Reports (CRDT-like semantics)]

First Weave Cloud iteration (beta preview)
• Nov 2015
• MMMMMMMMVP:
  – Multiuser
  – Authenticated
  – ASAP

And we came up with ...
• Invite-based: manual approvals
• Small wrapping around OSS Scope
  – Authentication layer (users service)
  – Dedicated, lazy-provisioned user app instances
  – Multiplexing+provisioner service (app-mapper service)
• Deployed in AWS
• Managed by Docker Swarm+Terraform

And we came up with ...
[Diagram: Weave Cloud (Docker Swarm) containing the users service, the Appmapper and app instances App1 ... AppN; User1 ... UserN each run probes P1 and P2]

What went wrong?
• Swarm: insufficient API + buggy
• Per-user Scope app mapping
  – Pets, not cattle
  – Single point of failure per user
  – Resources wasted
  – Painful upgrades

How did we fix it?

Kubernetes
• Rich set of abstractions (maybe too rich)
• Zero-downtime deployments
• Strong open community
• Not without drawbacks (being addressed)
  – Steep learning curve
  – Installation/upgrades are painful outside GKE

http://blog.kubernetes.io/2015/12/how-Weave-built-a-multi-deployment-solution-for-Scope-using-Kubernetes.html
Kubeadm: https://github.com/kubernetes/kubernetes/pull/30360

Horizontally-scalable Scope App
• Bag of Scope App "cattle"
• Any user can connect to any of them
  – Easy to scale/deploy
• Specialized Scope-App services, by function:
  – Collection: stores reports
  – Query: obtains reports (latency-sensitive)
  – Control: applies actions on probe resources
  – Pipe: bidirectional data communication App<->Probe

Horizontally-scalable Scope App

DEMO

Horizontally-scalable Scope App
• Specialized storage per service:
  – Collection/Query: Dynamo, S3, Memcached, NATS
  – Control/Pipe: SQS/Consul (rendezvous and data communication probes<->apps<->UI)
• Conscious lock-in
  – Easily replaceable by OSS alternatives ...
  – ... with a non-negligible maintenance cost

But we were still sad
• Very, very sad: query latency > 4s (99th percentile)
• Combination of:
  – Big, unoptimized reports (>10MB uncompressed msgpack, 0.3 Hz per probe)
  – Bad use of immutable (persistent) data structures in Golang
    • Good for reasoning
    • Garbage collection was killing us

Mitigations
• Faster report decoding:
  – Custom, compile-time-generated msgpack codecs
• Better use of immutable (persistent) DS:
  – Remove unnecessary just-in-case Copy() calls
  – Improved external map library (10x hash speedup)
• Better choice of EC2 instance types
  – 3x fewer machines with 4x the cores: same price, considerably less latency

Longer term solutions
• Mitigations got us under 200ms
• They won't cut it in the long run
  – Vertical scaling of query won't last
  – Optimize report format: delta reports?
  – Get rid of persistent DS?
  – Report merging service?

Persistent Data Structures
• Operations never mutate older versions/references of the DS
  – Easy to reason about
  – No locks needed
• Needs Garbage Collection to discard old versions
  – Large amounts of Garbage

Dealing with persistent DS garbage
• Haskell Garbage Collector
  – Mark and Sweep, Stop the World, Generational, Compacting
  – Uses purity: immutable data NEVER points to younger values
  – High throughput
• Golang Garbage Collector
  – Mark and Sweep, Concurrent (tri-color) with small STW periods
  – Low latency
  – Escape analysis needs improvements
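
For readers unfamiliar with escape analysis, the following sketch shows the decision the Go compiler makes: a value whose address outlives the function call is heap-allocated (and later garbage-collected), while a purely local value can stay on the stack. The point type, sink variable and both functions are hypothetical; testing.AllocsPerRun from the standard library is used only to count allocations.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is a package-level variable. Storing a pointer in it makes the
// pointed-to value outlive the call, so it must live on the heap.
var sink *point

// escapes causes one heap allocation per call.
func escapes() {
	sink = &point{1, 2}
}

// staysLocal never lets a reference to p leave the function,
// so p does not need a heap allocation.
func staysLocal() int {
	p := point{1, 2}
	return p.x + p.y
}

func main() {
	heap := testing.AllocsPerRun(1000, escapes)
	stack := testing.AllocsPerRun(1000, func() { _ = staysLocal() })
	fmt.Printf("escapes: %.0f allocs/op, staysLocal: %.0f allocs/op\n", heap, stack)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape decisions, which helps locate allocations that feed the garbage collector.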

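Golang example

// FNV-1a 64-bit constants (not shown on the slide); typed so that
// hash below is a uint64.
const (
	offset64 uint64 = 14695981039346656037
	prime64  uint64 = 1099511628211
)

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	// Ranging over a string decodes UTF-8 and yields runes, not bytes.
	for _, codepoint := range key {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}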

Optimization I

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	// Iterate over the raw bytes instead of decoded runes.
	for _, codepoint := range []byte(key) {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}

Optimization II

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	// bytesView exposes the string's bytes without copying them.
	for _, codepoint := range bytesView(key) {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}

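Optimization II

import "unsafe"

// bytesView returns a []byte that aliases the memory of v without copying.
// The result must be treated as read-only.
func bytesView(v string) []byte {
	if len(v) == 0 {
		return zeroByteSlice
	}
	sx := (*unsafeString)(unsafe.Pointer(&v))
	bx := unsafeSlice{sx.Data, sx.Len, sx.Len}
	return *(*[]byte)(unsafe.Pointer(&bx))
}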

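Optimization II

// unsafeString mirrors the in-memory layout of a string header.
type unsafeString struct {
	Data uintptr
	Len  int
}

// unsafeSlice mirrors the in-memory layout of a slice header.
type unsafeSlice struct {
	Data uintptr
	Len  int
	Cap  int
}

var zeroByteSlice = []byte{}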

Questions?