More Datacenters, More Problems
Multi-Tier Architectures
Todd Palino
Staff Site Reliability Engineer
LinkedIn, Data Infrastructure Streaming
Who Am I?
Kafka, Samza, and Zookeeper SRE at LinkedIn
Site Reliability Engineering
– Administrators
– Architects
– Developers
Keep the site running, always
Data @ LinkedIn is Hiring!
Streams Infrastructure
– Kafka pub/sub ecosystem
– Stream Processing Platform built on Apache Samza
– Next Generation change capture technology (incubating)
Join us in working on cutting-edge stream processing infrastructure
– Please contact [email protected]
– Software developers and Site Reliability Engineers at all levels
LinkedIn Data Infrastructure Meetup
– Where: LinkedIn @ 2061 Stierlin Court, Mountain View
– When: May 11th at 6:30 PM
– Registration: http://www.meetup.com/LinkedIn-Data-Infrastructure-Meetup
Kafka At LinkedIn
1100+ Kafka brokers
Over 32,000 topics
350,000+ Partitions
875 Billion messages per day
185 Terabytes In
675 Terabytes Out
Peak Load (whole site)
– 10.5 Million messages/sec
– 18.5 Gigabits/sec Inbound
– 70.5 Gigabits/sec Outbound

1800+ Kafka brokers
Over 95,000 topics
1,340,000+ Partitions
1.3 Trillion messages per day
330 Terabytes In
1.2 Petabytes Out
Peak Load (single cluster)
– 2 Million messages/sec
– 4.7 Gigabits/sec Inbound
– 15 Gigabits/sec Outbound
What Will We Talk About?
Tiered Cluster Architecture
Multiple Datacenters
Performance Concerns
Conclusion
Tiered Cluster Architecture
One Kafka Cluster
Single Cluster – Remote Clients
Single Cluster – Spanning Datacenters
Multiple Clusters – Local and Remote Clients
Multiple Clusters – Message Aggregation
Why Not Direct?
Network Concerns
– Bandwidth
– Network partitioning
– Latency

Security Concerns
– Firewalls and ACLs
– Encrypting data in transit

Resource Concerns
– A misbehaving application can swamp production resources
What Do We Lose?
You may lose message ordering
– Mirror maker breaks apart message batches and redistributes them

You may lose key-to-partition affinity (a sketch follows below)
– Mirror maker will partition based on the key
– Differing partition counts in source and target will result in differing distribution
– Mirror maker does not (without work) honor custom partitioning
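To make the affinity problem concrete, here is a minimal sketch of how the Java client's default partitioner places a keyed message: it hashes the key with murmur2 and takes the result modulo the partition count. The key and the partition counts below are hypothetical, chosen only to show that the same key can map to different partition numbers when the counts differ:

```java
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.utils.Utils;

public class PartitionAffinityDemo {
    // Mirrors the default partitioner's placement for keyed messages:
    // toPositive(murmur2(keyBytes)) % numPartitions
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        String key = "member-12345";  // hypothetical message key
        // Source cluster: 8 partitions; aggregate cluster: 12 partitions.
        System.out.printf("source (8):     partition %d%n", partitionFor(key, 8));
        System.out.printf("aggregate (12): partition %d%n", partitionFor(key, 12));
        // Unless the two counts match, the same key can land on a different
        // partition number in each cluster, breaking consumer assumptions.
    }
}
```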
Multiple Datacenters
Why Multiple Datacenters?
Disaster Recovery
Geolocalization
Legal / Political
Planning For the Worst
Keep your datacenters identical
– Snowflake services are hard to fail over
Services that need to run only once in the infrastructure need to coordinate (a sketch follows below)
– Zookeeper across sites can be used for this (at least 3 sites)
– Moving from one Kafka cluster to another is hard
– KIP-33 (time-based log indexing) will help, letting consumers resume from a timestamp on the other cluster, since offsets do not match across clusters
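As a hedged illustration of that run-once coordination, here is a minimal leader-election sketch using Apache Curator's LeaderLatch recipe against a ZooKeeper ensemble spanning three sites. Curator itself, the hostnames, and the latch path are my assumptions for illustration; the talk only prescribes ZooKeeper across at least three sites:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class SingletonService {
    public static void main(String[] args) throws Exception {
        // Ensemble spanning three sites, so it survives the loss of any one.
        String zkConnect = "zk-dc1:2181,zk-dc2:2181,zk-dc3:2181"; // hypothetical hosts
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkConnect, new ExponentialBackoffRetry(1000, 3));
        client.start();

        try (LeaderLatch latch = new LeaderLatch(client, "/services/reporting-job")) {
            latch.start();
            latch.await();   // blocks until this instance is elected leader
            runTheJob();     // only one instance across all sites reaches this
        } finally {
            client.close();
        }
    }

    private static void runTheJob() { /* the work that must run exactly once */ }
}
```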
What about AWS (or Google Cloud)?
– Disaster recovery can be accomplished via availability zones
Planning For the Best
Not much different than designing for disasters
Consider having a limited number of aggregate clusters
– Every copy of a message costs you money
– Have 2 downstream (or super) sites that process aggregate messages
– Push results back out to all sites
– Good place for stream processing, like Samza
What about AWS (or Google Cloud)?
– Geolocalization can be accomplished via regions
Segregating Data
All the data, available everywhere
The right data, only where we need it
Topics are (usually) the smallest unit we want to deal with
– Mirroring a topic between clusters is easy
– Filtering that mirror is much harder
Have enough topics that consumers do not have to throw away messages they don't need
Retaining Data
Retain data only as long as you have to
Move data away from the front lines as quickly as possible
Performance Concerns
Buy The Book!
Early Access available now.
Covers all aspects of Kafka, from setup to client development to ongoing administration and troubleshooting.
Also discusses stream processing and other use cases.
Kafka Cluster Sizing
How big for your local cluster?
– How much disk space do you have?
– How much network bandwidth do you have?
– CPU, memory, disk I/O
How big for your aggregate cluster?
– In general, multiply the number of brokers by the number of local clusters (for example, four local clusters of 10 brokers each suggests a 40-broker aggregate; a sizing sketch follows below)
– May have additional concerns with lots of consumers
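A rough back-of-envelope for the disk-bound case, since retention usually dominates local-cluster sizing. All input numbers are hypothetical, and real sizing must also check network, CPU, and consumer count as noted above:

```java
// A hedged sizing sketch: total disk = inbound rate * retention * replication.
public class ClusterSizing {
    public static void main(String[] args) {
        double inboundMBps     = 200.0;  // peak inbound, MB/sec (hypothetical)
        double retentionHours  = 72.0;   // how long messages are kept
        int    replication     = 3;      // replicas per partition
        double diskPerBrokerTB = 8.0;    // usable disk per broker

        // MB/sec * seconds of retention * replication, converted to TB
        double totalTB = inboundMBps * retentionHours * 3600 * replication / 1_000_000.0;
        int brokers = (int) Math.ceil(totalTB / diskPerBrokerTB);

        System.out.printf("need ~%.1f TB total -> at least %d brokers%n", totalTB, brokers);
        // For the aggregate cluster, start from this count multiplied by the
        // number of local clusters feeding it, then adjust for consumer load.
    }
}
```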
Topic Configuration
Partition Counts for Local
– Many theories on how to do this correctly, but the answer is “it depends”
– How many consumers do you have?
– Do you have specific partition requirements?
– Keeping partition sizes manageable
Partition Counts for Aggregate
– Multiply the number of partitions in a local cluster by the number of local clusters
– Periodically review partition counts in all clusters
Message Retention
– If aggregate is where you really need the messages, retain them in local only long enough to cover mirror maker problems (a sketch follows below)
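A sketch of both ideas using the Java AdminClient, which postdates this talk (it arrived in Kafka 0.11). The topic name, partition counts, and the 4-hour retention value are all hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTieredTopics {
    public static void main(String[] args) throws Exception {
        int localPartitions = 12;  // partitions per local cluster
        int localClusters   = 4;   // local clusters feeding the aggregate

        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "agg-kafka:9092"); // hypothetical

        try (AdminClient admin = AdminClient.create(props)) {
            // Aggregate partition count = local partitions * number of local clusters
            NewTopic aggregate = new NewTopic(
                    "page-views", localPartitions * localClusters, (short) 3);
            admin.createTopics(List.of(aggregate)).all().get();
        }

        // The matching local-cluster topic (created against each local cluster's
        // AdminClient the same way) keeps messages only long enough to ride out
        // a mirror maker outage -- here, a 4-hour retention override:
        NewTopic local = new NewTopic("page-views", localPartitions, (short) 3)
                .configs(Map.of("retention.ms", "14400000"));
        System.out.println("local topic spec: " + local);
    }
}
```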
Where Do I Put Mirror Makers?
Best practice was to keep the mirror maker local to the target cluster
– TLS can make this a new game
In the datacenter with the produce cluster
– Fewer issues from networking
– Significant performance hit when using TLS on consume

In the datacenter with the consume cluster
– Highest performance TLS
– Potential to consume messages and drop on produce
Mirror Maker Sizing
Number of servers and streams
– Size the number of servers based on the peak bytes per second
– Co-locate mirror makers
– Run more mirror maker instances than you need, so one failure does not stop the pipeline
– Use multiple consumer and producer streams
Other tunables to look at (a sketch follows below)
– Partition assignment strategy
– In-flight requests per connection
– Linger time
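These tunables map to standard client configuration keys. A hedged sketch of how they might appear in the mirror maker's consumer and producer configs; the hosts and values are hypothetical starting points, not recommendations:

```java
import java.util.Properties;

public class MirrorMakerTuning {
    static Properties consumerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "source-kafka:9092");       // hypothetical
        p.put("group.id", "mirror-maker-aggregate");
        // Round-robin assignment spreads partitions evenly across the
        // consumer streams instead of chunking them topic by topic.
        p.put("partition.assignment.strategy",
              "org.apache.kafka.clients.consumer.RoundRobinAssignor");
        return p;
    }

    static Properties producerProps() {
        Properties p = new Properties();
        p.put("bootstrap.servers", "aggregate-kafka:9092");    // hypothetical
        // More in-flight requests keep a high-latency WAN pipe full,
        // at the cost of possible reordering when a request is retried.
        p.put("max.in.flight.requests.per.connection", "5");
        // A small linger lets the producer fill larger batches before sending.
        p.put("linger.ms", "10");
        p.put("batch.size", "65536");
        return p;
    }
}
```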
Segregation of Topics
Not all topics are created equal
High Priority Topics
– Topics that change search results
– Topics used for hourly or daily reporting
Low Latency Topics
Run a separate mirror maker for these topics (a sketch follows below)
– One bloated topic won’t affect reporting
– Restarting the mirror maker takes less time
– Less time to catch up when you fall behind
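The separation is essentially a whitelist: each mirror maker instance subscribes only to its own slice of topics. A minimal sketch of the consume side using a recent Java client; the group name, topic regex, and hosts are hypothetical:

```java
import java.time.Duration;
import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PriorityMirrorConsumer {
    public static void main(String[] args) {
        Properties p = new Properties();
        p.put("bootstrap.servers", "source-kafka:9092");  // hypothetical
        p.put("group.id", "mirror-maker-priority");       // separate group per tier
        p.put("key.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        p.put("value.deserializer",
              "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(p)) {
            // Whitelist only the high-priority topics; the bulk pipeline runs
            // as a second instance with the complementary pattern.
            consumer.subscribe(Pattern.compile("^(search-updates|reporting)\\..*"));
            while (true) {
                ConsumerRecords<byte[], byte[]> records =
                        consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> { /* hand off to the producer side */ });
            }
        }
    }
}
```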
Conclusion
Broker Improvements
Namespaces
– Namespace topics by datacenter
– Eliminate local clusters and just have aggregate
– Significant hardware savings
JBOD Fixes
– Intelligent partition assignment
– Admin tools to move partitions between mount points
– Broker should not fail completely with a single disk failure
Mirror Maker Improvements
Identity Mirror Maker (a sketch follows below)
– Messages in source partition 0 get produced directly to partition 0
– No decompression of message batches
– Keeps key-to-partition affinity, supporting custom partitioning
– Requires mirror maker to maintain downstream partition counts
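The core of the identity behavior is small: pin each record to the same partition number on the target instead of running it back through a partitioner. A minimal sketch with a recent Java client; note it still decompresses records (the real design also ships batches whole) and skips the batching and offset-management details a real mirror maker needs:

```java
import java.time.Duration;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class IdentityMirror {
    // Copy every record to the same-numbered partition on the target cluster.
    // Assumes the target topic has at least as many partitions as the source.
    static void mirror(KafkaConsumer<byte[], byte[]> consumer,
                       KafkaProducer<byte[], byte[]> producer) {
        while (true) {
            for (ConsumerRecord<byte[], byte[]> rec : consumer.poll(Duration.ofMillis(500))) {
                // Specifying the partition explicitly bypasses the partitioner,
                // so key-to-partition affinity and custom partitioning survive.
                producer.send(new ProducerRecord<>(
                        rec.topic(), rec.partition(), rec.key(), rec.value()));
            }
        }
    }
}
```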
Multi-Consumer Mirror Maker
– A single mirror maker that consumes from more than one cluster
– Reduces the number of copies of mirror maker that need to be running
– Forces a produce-local architecture, however
Administrative Improvements
Multiple cluster management
– Topic management across clusters
– Visualization of mirror maker paths
Better client monitoring
– Burrow for consumer monitoring
– No open source solution for producer monitoring (audit)
End-to-end availability monitoring
Getting Involved With Kafka
http://kafka.apache.org
Join the mailing lists
– [email protected]
– [email protected]
irc.freenode.net - #apache-kafka
Meetups
– Apache Kafka - http://www.meetup.com/http-kafka-apache-org
– Bay Area Samza - http://www.meetup.com/Bay-Area-Samza-Meetup/
Contribute code