More Datacenters, More Problems
Multi-Tier Architectures
Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming
Who Am I?
Kafka, Samza, and Zookeeper SRE at LinkedIn
Site Reliability Engineering
– Administrators
– Architects
– Developers
Keep the site running, always
Data @ LinkedIn is Hiring!
Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)
Join us in working on cutting-edge stream processing infrastructure
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels
LinkedIn Data Infrastructure Meetup
– Where: LinkedIn @ 2061 Stierlin Court, Mountain View
– When: May 11th at 6:30 PM
– Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup
Kafka At LinkedIn
1100+ Kafka brokers
Over 32,000 topics
350,000+ Partitions
875 Billion messages per day
185 Terabytes In
675 Terabytes Out
Peak Load (whole site)
– 10.5 Million messages/sec
– 18.5 Gigabits/sec Inbound
– 70.5 Gigabits/sec Outbound

1800+ Kafka brokers
Over 95,000 topics
1,340,000+ Partitions
1.3 Trillion messages per day
330 Terabytes In
1.2 Petabytes Out
Peak Load (single cluster)
– 2 Million messages/sec
– 4.7 Gigabits/sec Inbound
– 15 Gigabits/sec Outbound
What Will We Talk About?
Tiered Cluster Architecture
Multiple Datacenters
Performance Concerns
Conclusion
Tiered Cluster Architecture
One Kafka Cluster
Single Cluster – Remote Clients
Single Cluster – Spanning Datacenters
Multiple Clusters – Local and Remote Clients
Multiple Clusters – Message Aggregation
Why Not Direct?
Network Concerns
– Bandwidth
– Network partitioning
– Latency

Security Concerns
– Firewalls and ACLs
– Encrypting data in transit

Resource Concerns
– A misbehaving application can swamp production resources
What Do We Lose?
You may lose message ordering
– Mirror maker breaks apart message batches and redistributes them

You may lose key-to-partition affinity (a sketch follows below)
– Mirror maker will partition based on the key
– Differing partition counts in source and target will result in differing distribution
– Mirror maker does not (without work) honor custom partitioning
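To make the affinity problem concrete, here is a minimal sketch of how the Java client's default partitioner places a keyed message: it hashes the key with murmur2 and takes the result modulo the partition count. The key and the partition counts below are hypothetical, chosen only to show that the same key can map to different partition numbers when the counts differ:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionAffinityDemo {
    // Mirrors the default partitioner's placement for keyed messages:
    // toPositive(murmur2(keyBytes)) % numPartitions
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "member-12345";  // hypothetical message key
        // Source cluster: 8 partitions; aggregate cluster: 12 partitions.
        System.out.printf("source (8):     partition %d%n", partitionFor(key, 8));
        System.out.printf("aggregate (12): partition %d%n", partitionFor(key, 12));
        // Unless the two counts match, the same key can land on a different
        // partition number in each cluster, breaking consumer assumptions.
    }
}
```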
Multiple Datacenters
Why Multiple Datacenters?
Disaster Recovery
Geolocalization
Legal / Political
Planning For the Worst
Keep your datacenters identical
– Snowflake services are hard to fail over
Services that need to run only once in the infrastructure need to coordinate (a sketch follows below)
– Zookeeper across sites can be used for this (at least 3 sites)
– Moving from one Kafka cluster to another is hard
– KIP-33 (time-based log indexing) will help, letting consumers resume from a timestamp on the other cluster, since offsets do not match across clusters
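As a hedged illustration of that run-once coordination, here is a minimal leader-election sketch using Apache Curator's LeaderLatch recipe against a ZooKeeper ensemble spanning three sites. Curator itself, the hostnames, and the latch path are my assumptions for illustration; the talk only prescribes ZooKeeper across at least three sites:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SingletonService {
    public static void main(String[] args) throws Exception {
        // Ensemble spanning three sites, so it survives the loss of any one.
        String zkConnect = "zk-dc1:2181,zk-dc2:2181,zk-dc3:2181"; // hypothetical hosts
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();

        try (LeaderLatch latch = new LeaderLatch(client, "/services/reporting-job")) {
            latch.start();
            latch.await();   // blocks until this instance is elected leader
            runTheJob();     // only one instance across all sites reaches this
        } finally {
            client.close();
        }
    }

    private static void runTheJob() { /* the work that must run exactly once */ }
}
```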
What about AWS (or Google Cloud)?
– Disaster recovery can be accomplished via availability zones
Planning For the Best
Not much different than designing for disasters
Consider having a limited number of aggregate clusters
– Every copy of a message costs you money
– Have 2 downstream (or super) sites that process aggregate messages
– Push results back out to all sites
– Good place for stream processing, like Samza
What about AWS (or Google Cloud)?
– Geolocalization can be accomplished via regions
Segregating Data
All the data, available everywhere
The right data, only where we need it
Topics are (usually) the smallest unit we want to deal with
– Mirroring a topic between clusters is easy
– Filtering that mirror is much harder
Have enough topics that consumers do not have to throw away messages they don't need
Retaining Data
Retain data only as long as you have to
Move data away from the front lines as quickly as possible
Performance Concerns
Buy The Book!
Early Access available now.
Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting.
Also discusses stream processing and other use cases.
Kafka Cluster Sizing
How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O
How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters (for example, four local clusters of 10 brokers each suggests a 40-broker aggregate; a sizing sketch follows below)
– May have additional concerns with lots of consumers
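A rough back-of-envelope for the disk-bound case, since retention usually dominates local-cluster sizing. All input numbers are hypothetical, and real sizing must also check network, CPU, and consumer count as noted above:

```java
// A hedged sizing sketch: total disk = inbound rate * retention * replication.
public class ClusterSizing {
    public static void main(String[] args) {
        double inboundMBps     = 200.0;  // peak inbound, MB/sec (hypothetical)
        double retentionHours  = 72.0;   // how long messages are kept
        int    replication     = 3;      // replicas per partition
        double diskPerBrokerTB = 8.0;    // usable disk per broker

        // MB/sec * seconds of retention * replication, converted to TB
        double totalTB = inboundMBps * retentionHours * 3600 * replication / 1_000_000.0;
        int brokers = (int) Math.ceil(totalTB / diskPerBrokerTB);

        System.out.printf("need ~%.1f TB total -> at least %d brokers%n", totalTB, brokers);
        // For the aggregate cluster, start from this count multiplied by the
        // number of local clusters feeding it, then adjust for consumer load.
    }
}
```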
Topic Configuration
Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keeping partition sizes manageable
Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters
Message Retention
– If aggregate is where you really need the messages, retain them in local only long enough to cover mirror maker problems (a sketch follows below)
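A sketch of both ideas using the Java AdminClient, which postdates this talk (it arrived in Kafka 0.11). The topic name, partition counts, and the 4-hour retention value are all hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTieredTopics {
    public static void main(String[] args) throws Exception {
        int localPartitions = 12;  // partitions per local cluster
        int localClusters   = 4;   // local clusters feeding the aggregate

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "agg-kafka:9092"); // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // Aggregate partition count = local partitions * number of local clusters
            NewTopic aggregate = new NewTopic(
                    "page-views", localPartitions * localClusters, (short) 3);
            admin.createTopics(List.of(aggregate)).all().get();
        }

        // The matching local-cluster topic (created against each local cluster's
        // AdminClient the same way) keeps messages only long enough to ride out
        // a mirror maker outage -- here, a 4-hour retention override:
        NewTopic local = new NewTopic("page-views", localPartitions, (short) 3)
                .configs(Map.of("retention.ms", "14400000"));
        System.out.println("local topic spec: " + local);
    }
}
```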
Where Do I Put Mirror Makers?
Best practice was to keep the mirror maker local to the target cluster
– TLS can make this a new game
In the datacenter with the produce cluster
– Fewer issues from networking
– Significant performance hit when using TLS on consume

In the datacenter with the consume cluster
– Highest performance TLS
– Potential to consume messages and drop on produce
Mirror Maker Sizing
Number of servers and streams
– Size the number of servers based on the peak bytes per second
– Co-locate mirror makers
– Run more mirror maker instances than you need, so one failure does not stop the pipeline
– Use multiple consumer and producer streams
Other tunables to look at (a sketch follows below)
– Partition assignment strategy
– In-flight requests per connection
– Linger time
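These tunables map to standard client configuration keys. A hedged sketch of how they might appear in the mirror maker's consumer and producer configs; the hosts and values are hypothetical starting points, not recommendations:

```java
import java.util.Properties;

public class MirrorMakerTuning {
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "source-kafka:9092");       // hypothetical
        p.put("group.id", "mirror-maker-aggregate");
        // Round-robin assignment spreads partitions evenly across the
        // consumer streams instead of chunking them topic by topic.
        p.put("partition.assignment.strategy",
              "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        return p;
    }

    static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "aggregate-kafka:9092");    // hypothetical
        // More in-flight requests keep a high-latency WAN pipe full,
        // at the cost of possible reordering when a request is retried.
        p.put("max.in.flight.requests.per.connection", "5");
        // A small linger lets the producer fill larger batches before sending.
        p.put("linger.ms", "10");
        p.put("batch.size", "65536");
        return p;
    }
}
```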
Segregation of Topics
Not all topics are created equal
High Priority Topics
– Topics that change search results
– Topics used for hourly or daily reporting
Low Latency Topics
Run a separate mirror maker for these topics (a sketch follows below)
– One bloated topic won’t affect reporting
– Restarting the mirror maker takes less time
– Less time to catch up when you fall behind
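The separation is essentially a whitelist: each mirror maker instance subscribes only to its own slice of topics. A minimal sketch of the consume side using a recent Java client; the group name, topic regex, and hosts are hypothetical:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PriorityMirrorConsumer {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "source-kafka:9092");  // hypothetical
        p.put("group.id", "mirror-maker-priority");       // separate group per tier
        p.put("key.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        p.put("value.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(p)) {
            // Whitelist only the high-priority topics; the bulk pipeline runs
            // as a second instance with the complementary pattern.
            consumer.subscribe(Pattern.compile("^(search-updates|reporting)\\..*"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> { /* hand off to the producer side */ });
            }
        }
    }
}
```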
Conclusion
Broker Improvements
Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings
JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure
Mirror Maker Improvements
Identity Mirror Maker (a sketch follows below)
– Messages in source partition 0 get produced directly to partition 0
– No decompression of message batches
– Keeps key-to-partition affinity, supporting custom partitioning
– Requires mirror maker to maintain downstream partition counts
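The core of the identity behavior is small: pin each record to the same partition number on the target instead of running it back through a partitioner. A minimal sketch with a recent Java client; note it still decompresses records (the real design also ships batches whole) and skips the batching and offset-management details a real mirror maker needs:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdentityMirror {
    // Copy every record to the same-numbered partition on the target cluster.
    // Assumes the target topic has at least as many partitions as the source.
    static void mirror(KafkaConsumer<byte[], byte[]> consumer,
                       KafkaProducer<byte[], byte[]> producer) {
        while (true) {
            for (ConsumerRecord<byte[], byte[]> rec : consumer.poll(Duration.ofMillis(500))) {
                // Specifying the partition explicitly bypasses the partitioner,
                // so key-to-partition affinity and custom partitioning survive.
                producer.send(new ProducerRecord<>(
                        rec.topic(), rec.partition(), rec.key(), rec.value()));
            }
        }
    }
}
```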
Multi-Consumer Mirror Maker
– A single mirror maker that consumes from more than one cluster
– Reduces the number of copies of mirror maker that need to be running
– Forces a produce-local architecture, however
Administrative Improvements
Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths
Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)
End-to-end availability monitoring
Getting Involved With Kafka
http://kafka.apache.org
Join the mailing lists
– [email protected]
– [email protected]
irc.freenode.net - #apache-kafka
Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
Contribute code