Fighting with scale
things that “Getting started” doesn’t cover
Adam Dubiel
We’re building a New World
We used to have these..
..and now we have these
Microservices are needy
discovery
messaging
deployment
metrics
monitoring
storage
Need for automation
env business
Choosing the right tools can be hard
and then came docker..
Metrics - graphite
building blocks
frontend
dashboard
aggregator
storage
building blocks
graphite-web
tessera
carbon
whisper
building blocks
graphite-web
tessera
carbon
ceres
building blocks
graphite-web
tessera
carbon
cyanite
frontend
dashboard
influxdb
building blocks
“getting started” setup
graphite-web
tessera
carbon
whisper
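A minimal sketch of what the “getting started” setup ingests: carbon's plaintext protocol takes one `<path> <value> <timestamp>` line per data point. The metric path and the helper name `carbon_line` below are illustrative assumptions, not taken from the talk.

```python
import time


def carbon_line(path, value, timestamp=None):
    # Format one data point for carbon's plaintext protocol:
    # "<metric path> <value> <unix timestamp>" terminated by a newline.
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


# Hypothetical metric path, fixed timestamp for reproducibility.
line = carbon_line("services.checkout.requests", 42, timestamp=1400000000)
print(line, end="")
```

In a real setup this line would be written to the carbon listener over TCP; the host and port depend on the deployment and are left out here.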
1st iteration
1st iteration - overall
1st iteration - memory
1st iteration - cause
2nd iteration
OTHER TECH
2nd iteration - memory
2nd iteration - overall
not only soft, drivers too
Metrics - StatsD
architecture
Host1
Host2
Graphite
metric.host1.rate
metric.host2.rate
OK
architecture
Host1
Host2
Graphite
metric.rate
metric.rate
CLASH!
architecture
Host1
Host2
Graphite
metric.rate
metric.rate
OK
StatsD
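The three architecture slides can be read as: two hosts writing the same `metric.rate` straight to Graphite clash, while an aggregator in between merges them into one value. The sketch below illustrates that idea with a toy counter aggregator. It parses the StatsD line format `name:value|type` and handles only plain counters (no sample rates, timers or gauges); the class name `Aggregator` is an assumption made for illustration, and real StatsD adds prefixes and per-second rates that are omitted here.

```python
from collections import defaultdict


def parse_statsd(packet):
    # StatsD line format: "<name>:<value>|<type>", e.g. "metric.rate:1|c".
    name, rest = packet.split(":", 1)
    value, metric_type = rest.split("|", 1)
    return name, float(value), metric_type


class Aggregator:
    """Toy stand-in for StatsD: sums counters from all hosts per metric name."""

    def __init__(self):
        self.counters = defaultdict(float)

    def receive(self, packet):
        name, value, metric_type = parse_statsd(packet)
        if metric_type == "c":  # only plain counters are handled in this sketch
            self.counters[name] += value

    def flush(self, timestamp):
        # Emit one Graphite data point per metric, then start a new interval.
        lines = [f"{name} {total} {timestamp}"
                 for name, total in sorted(self.counters.items())]
        self.counters.clear()
        return lines


agg = Aggregator()
agg.receive("metric.rate:1|c")  # sent by Host1
agg.receive("metric.rate:2|c")  # sent by Host2
lines = agg.flush(1400000000)
print(lines)
```

Graphite sees a single `metric.rate` series, so the two hosts no longer overwrite each other.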
problem?
how to scale?
StatsD
StatsD
StatsD proxy
how to scale proxy…?
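For several StatsD instances to aggregate correctly, every packet for a given metric name has to reach the same instance. One common way to get that property is a consistent hash ring keyed by metric name. The sketch below shows the routing idea only; the node addresses, the `HashRing` class and the choice of MD5 are assumptions, not a description of how StatsD proxy is implemented. It does not answer the open question of scaling the proxy itself.

```python
import bisect
import hashlib


def _hash(key):
    # Stable hash so every proxy process computes the same ring.
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, virtual_nodes=100):
        # Each node is placed on the ring many times to even out the load.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(virtual_nodes)
        )
        self._keys = [h for h, _ in self._ring]

    def node_for(self, metric_name):
        # First ring position clockwise from the metric's hash, wrapping around.
        idx = bisect.bisect(self._keys, _hash(metric_name)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical StatsD instances behind the proxy.
nodes = ["statsd-1:8125", "statsd-2:8125", "statsd-3:8125"]
ring = HashRing(nodes)
target = ring.node_for("metric.rate")
print(target)
```

Because the mapping depends only on the metric name and the node list, any number of stateless proxies could share the same ring configuration.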
Messaging - Hermes
architecture
Hermes Frontend
Hermes Consumer
Kafka is great!
performant
fault-tolerant
robust
but what about the SLA?
strict SLA, p99 < 100ms
stable p999 would be great
cluster manipulations can be painful
and beware of abusive clients
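To make the p99 and p999 targets concrete, here is a small sketch that computes nearest-rank percentiles over latency samples. The sample data is made up for illustration; it is shaped so that p99 looks healthy while p999 exposes the slow tail.

```python
import math


def percentile(samples, p):
    # Nearest-rank percentile: the smallest sample such that at least
    # p percent of all samples are less than or equal to it.
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented latencies in milliseconds: mostly fast, a few slow outliers.
latencies_ms = [5] * 990 + [80] * 8 + [900] * 2

p99 = percentile(latencies_ms, 99)
p999 = percentile(latencies_ms, 99.9)
print(p99, p999)
```

A service can meet “p99 < 100ms” comfortably and still show a p999 that is far worse, which is why a stable p999 is listed as a separate wish.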
1st iteration - Kafka leaders
Kafka does not distribute leaders uniformly
simple take-first algorithm
no tools to make it happen
go with #!/usr/bin/python
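The talk does not show the script itself, so the following is only a guess at the kind of thing such a script could do: reorder each partition's replica list so that preferred leaders (the first replica) are spread evenly across brokers, and emit the result in the `{"version": 1, "partitions": [...]}` JSON layout used by Kafka's partition reassignment tooling. The topic name, broker ids and function name are hypothetical.

```python
import json
from collections import Counter


def rebalance_preferred_leaders(assignments):
    # assignments: {(topic, partition): [broker ids]}.
    # The first broker in each list is treated as the preferred leader.
    leader_count = Counter()
    plan = []
    for (topic, partition), replicas in sorted(assignments.items()):
        # Pick the replica that leads the fewest partitions so far
        # (ties broken by broker id), instead of always taking the first.
        leader = min(replicas, key=lambda b: (leader_count[b], b))
        leader_count[leader] += 1
        reordered = [leader] + [b for b in replicas if b != leader]
        plan.append({"topic": topic, "partition": partition,
                     "replicas": reordered})
    return {"version": 1, "partitions": plan}


# Hypothetical cluster state: broker 1 is first everywhere ("take-first").
current = {
    ("events", 0): [1, 2, 3],
    ("events", 1): [1, 3, 2],
    ("events", 2): [1, 2, 3],
}
plan = rebalance_preferred_leaders(current)
print(json.dumps(plan, indent=2))
```

The replica sets stay the same, only their order changes, so no data has to move between brokers; a preferred leader election would still be needed for the new order to take effect.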
2nd iteration - buffering
simple solution to complex problem
use Kafka producer buffer
we can operate without Kafka for 1h!
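How large the producer buffer must be to survive an hour without Kafka is simple arithmetic: message rate times average message size times outage duration. The traffic figures below are invented for illustration and are not Allegro's numbers; in the Kafka Java producer the corresponding setting is `buffer.memory`.

```python
def required_buffer_bytes(messages_per_second, avg_message_bytes, outage_seconds):
    # Every message produced during the outage has to fit in the buffer.
    return messages_per_second * avg_message_bytes * outage_seconds


# Assumed traffic: 1000 msg/s of 1 KiB each, buffered for one hour.
needed = required_buffer_bytes(
    messages_per_second=1000,
    avg_message_bytes=1024,
    outage_seconds=3600,
)
print(needed)
```

The estimate is per producer instance and ignores batching overhead, so a real configuration would want some headroom on top.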
and when we thought we had it all covered.. bam!
corrupt message logs
Kafka High Level Consumer can’t skip corrupt parts
no easy solution
time pressure - 24h retention
#!/usr/bin/python to the rescue!
Lessons learned
Production traffic is only on.. Production
prepare for the Unknown
..but mostly for problems
scripts are your friends when time is the enemy
centralisation sounds appealing
(but seriously, don’t)
focus on
know-how
agility
ability to create clusters on demand
isolate
allegrotech.io
@allegrotechblog