Scalability and microservice dogfooding in Weave Cloud
Alfonso Acosta <[email protected]>
Software Engineer
@2opremio
Introductions
Outline
• What is Weaveworks/Scope/Weave Cloud?
• First Weave Cloud architecture iteration
• Second architecture iteration
• Performance bottlenecks
• Golang nuances
Scope
github.com/weaveworks/scope
cloud.weave.works/demo
github.com/microservices-demo/microservices-demo
Scope standalone

[Diagram: a single Scope App connected to Scope Probes on host1..hostn. Probes send Reports (CRDT-like semantics) to the App; the App sends Controls back to the Probes]
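The report semantics above can be illustrated with a toy merge: like a CRDT, merging is commutative, associative and idempotent, so reports can be duplicated or arrive in any order and any app instance can combine them. The Report shape and the last-writer-wins rule below are illustrative assumptions, not Scope's actual report format.

```go
package main

import "fmt"

// Report is a hypothetical, simplified stand-in for a Scope report:
// a map from node ID to a timestamped value.
type Report map[string]Entry

type Entry struct {
	Value     string
	Timestamp int64 // logical clock; newer wins
}

// Merge combines two reports. It is commutative, associative and
// idempotent, so reports may arrive in any order or be duplicated.
func Merge(a, b Report) Report {
	out := make(Report, len(a)+len(b))
	for k, v := range a {
		out[k] = v
	}
	for k, v := range b {
		if cur, ok := out[k]; !ok || v.Timestamp > cur.Timestamp {
			out[k] = v
		}
	}
	return out
}

func main() {
	r1 := Report{"host1;nginx": {"running", 1}}
	r2 := Report{"host1;nginx": {"stopped", 2}, "host2;redis": {"running", 1}}

	// Merge order does not matter.
	fmt.Println(Merge(r1, r2)["host1;nginx"].Value) // stopped
	fmt.Println(Merge(r2, r1)["host1;nginx"].Value) // stopped
}
```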
First Weave Cloud iteration (beta preview)
• Nov 2015
• MMMMMMMMVP:
 – Multiuser
 – Authenticated
 – ASAP
And we came up with ...
• Invite-based: manual approvals
• Small wrapping around OSS Scope
 – Authentication layer (users service)
 – Dedicated, lazy-provisioned user app instances
 – Multiplexing+provisioner service (app-mapper service)
• Deployed in AWS
• Managed by Docker Swarm+Terraform
And we came up with ...

[Diagram: Weave Cloud on Docker Swarm — the users service and the app-mapper service route each user (User1..UserN, each with probes P1, P2) to a dedicated Scope app instance (App1..AppN)]
What went wrong?
• Swarm: insufficient API + buggy
• Per-user Scope app mapping
 – Pets, not cattle
 – Single point of failure per user
 – Resources wasted
 – Painful upgrades
How did we fix it?
Kubernetes
• Rich set of abstractions (maybe too rich)
• Zero-downtime deployments
• Strong open community
• Not without drawbacks (being addressed)
 – Steep learning curve
 – Installation/upgrades are painful outside GKE
http://blog.kubernetes.io/2015/12/how-Weave-built-a-multi-deployment-solution-for-Scope-using-Kubernetes.html
Kubeadm: https://github.com/kubernetes/kubernetes/pull/30360
Horizontally-scalable Scope App
• Bag of Scope App "cattle"
• Any user can connect to any of them
 – Easy to scale/deploy
• Specialized Scope App services, by function:
 – Collection: stores reports
 – Query: obtains reports (latency-sensitive)
 – Control: applies actions on probe resources
 – Pipe: bidirectional data communication App<->Probe
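The collection/query split can be sketched with an in-memory stand-in for the real storage backends. The Store type and report payloads below are illustrative only; the point is that the write path (collection) and the latency-sensitive read path (query) are separate code paths that can be deployed and scaled independently.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Store is an in-memory stand-in for the real shared backends
// (DynamoDB/S3/Memcached), so any stateless app instance can serve
// either path.
type Store struct {
	mu      sync.Mutex
	reports []string // opaque report payloads
}

// Collect stores an incoming probe report (write path).
func (s *Store) Collect(report string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.reports = append(s.reports, report)
}

// Query returns a merged view for the UI (latency-sensitive read path).
func (s *Store) Query() string {
	s.mu.Lock()
	defer s.mu.Unlock()
	return strings.Join(s.reports, " + ")
}

func main() {
	var s Store
	s.Collect("report-from-host1")
	s.Collect("report-from-host2")
	fmt.Println(s.Query()) // report-from-host1 + report-from-host2
}
```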
Horizontally-scalable Scope App
DEMO
Horizontally-scalable Scope App
• Specialized storage per service:
 – Collection/Query: Dynamo, S3, Memcached, NATS
 – Control/Pipe: SQS/Consul (rendezvous and data communication probes<->apps<->UI)
• Conscious lock-in
 – Easily replaceable by OSS alternatives ...
 – … with a non-negligible maintenance cost
But we were still sad
• Very, very sad: query latency > 4s (99th percentile)
• Combination of:
 – Big, unoptimized reports (>10MB uncompressed msgpack, 0.3 Hz per probe)
 – Bad use of immutable (persistent) data structures in Golang
  • Good for reasoning
  • Garbage collection was killing us
Mitigations
• Faster report decoding:
 – Custom, compile-time-generated msgpack codecs
• Better use of immutable (persistent) DS:
 – Remove unnecessary just-in-case Copy() calls
 – Improved external map library (10x hash speedup)
• Better choice of EC2 instance types
 – 3x fewer machines with 4x cores: same price, considerably less latency
Longer-term solutions
• Mitigations got us under 200ms
• They won't cut it in the long run
 – Vertical scaling of query won't last
 – Optimize report format: delta reports?
 – Get rid of persistent DS?
 – Report merging service?
Persistent Data Structures
• Operations never mutate older versions/references of the DS
 – Easy to reason about
 – No locks needed
• Needs garbage collection to discard old versions
 – Large amounts of garbage
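A toy persistent map makes the trade-off concrete: updates never mutate old versions, but each one allocates a copy that the garbage collector must eventually reclaim. (A real persistent map shares structure instead of copying wholesale; this naive version is only an illustration.)

```go
package main

import "fmt"

// Counts is a toy persistent map: Set never mutates the receiver, it
// returns a fresh copy with the update applied. Old versions remain
// valid (easy reasoning, no locks), but every update allocates a full
// copy, which becomes garbage once the old version is dropped.
type Counts map[string]int

func (m Counts) Set(k string, v int) Counts {
	out := make(Counts, len(m)+1)
	for key, val := range m {
		out[key] = val
	}
	out[k] = v
	return out
}

func main() {
	v1 := Counts{}.Set("connections", 1)
	v2 := v1.Set("connections", 2)

	// The old version is untouched by the update that produced v2.
	fmt.Println(v1["connections"], v2["connections"]) // 1 2
}
```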
Dealing with persistent DS garbage
• Haskell Garbage Collector
 – Mark and Sweep, Stop the World, Generational, Compacting
 – Uses purity: old data NEVER points to younger values
 – High throughput
• Golang Garbage Collector
 – Mark and Sweep, Concurrent (tri-color) with small STW periods
 – Low latency
 – Escape analysis needs improvements
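Escape analysis is what decides whether a value lands on the stack (free to reclaim) or on the heap (work for the collector). A small illustration, with illustrative names, using testing.AllocsPerRun to count heap allocations per call:

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// sink is a package-level variable: storing a pointer in it forces
// the pointed-to value to escape to the heap.
var sink *point

// heapAlloc: &p outlives the call, so escape analysis must
// heap-allocate p — one allocation (future garbage) per call.
func heapAlloc() {
	p := point{1, 2}
	sink = &p
}

var valueSink point

// stackAlloc copies the value instead; the struct can live on the
// stack and costs the garbage collector nothing.
func stackAlloc() {
	valueSink = point{1, 2}
}

func main() {
	fmt.Println(testing.AllocsPerRun(1000, heapAlloc))  // 1 allocation per call
	fmt.Println(testing.AllocsPerRun(1000, stackAlloc)) // 0 allocations
}
```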
Golang example

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	for _, codepoint := range key {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}
Optimization I

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	for _, codepoint := range []byte(key) {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}
Optimization II

// FNV1a hash
func hashKey(key string) uint64 {
	hash := offset64
	for _, codepoint := range bytesView(key) {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}
Optimization II

func bytesView(v string) []byte {
	if len(v) == 0 {
		return zeroByteSlice
	}
	sx := (*unsafeString)(unsafe.Pointer(&v))
	bx := unsafeSlice{sx.Data, sx.Len, sx.Len}
	return *(*[]byte)(unsafe.Pointer(&bx))
}
Optimization II

type unsafeString struct {
	Data uintptr
	Len  int
}

type unsafeSlice struct {
	Data uintptr
	Len  int
	Cap  int
}

var zeroByteSlice = []byte{}
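A quick sanity check on the optimizations: for ASCII keys the byte-based variant produces the same hash as the original, while for multi-byte UTF-8 keys the results differ, because ranging over a string yields runes, not bytes. A self-contained sketch using the standard FNV-1a constants (the function names are illustrative):

```go
package main

import "fmt"

const (
	offset64 uint64 = 14695981039346656037 // standard FNV-1a offset basis
	prime64  uint64 = 1099511628211        // standard FNV-1a prime
)

// hashKeyRunes is the original version: ranging over a string decodes
// UTF-8 runes, which costs extra work per character.
func hashKeyRunes(key string) uint64 {
	hash := offset64
	for _, codepoint := range key {
		hash ^= uint64(codepoint)
		hash *= prime64
	}
	return hash
}

// hashKeyBytes is Optimization I: ranging over []byte(key) visits raw
// bytes, and the compiler typically elides the copy for this pattern.
func hashKeyBytes(key string) uint64 {
	hash := offset64
	for _, b := range []byte(key) {
		hash ^= uint64(b)
		hash *= prime64
	}
	return hash
}

func main() {
	fmt.Println(hashKeyRunes("host1;nginx") == hashKeyBytes("host1;nginx")) // true: ASCII
	fmt.Println(hashKeyRunes("héllo") == hashKeyBytes("héllo"))             // false: runes != bytes
}
```

Note that in today's Go the unsafe bytesView trick is rarely needed: the plain []byte(key) range form avoids the allocation on its own.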
Questions?