Kafka internals
David Gruzman, www.nestlogic.com
What is Kafka?
Why is it so interesting? Is it just "yet another queue" with better performance?
It is not a queue, although it can be used as one. Let's look at it as a database / storage technology.
Data model
● Ordered, partitioned collection of key-values
● Key is optional
● Values are opaque
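As a toy sketch of this data model (plain Python, not Kafka's actual storage format): each partition is an append-only, offset-ordered sequence of records with an optional key and an opaque value, and the offset is simply the record's position in the log.

```python
# A toy model of one Kafka partition: an append-only, offset-ordered log
# of (offset, key, value) records. Keys are optional, values are opaque bytes.
partition_log = []

def append(key, value):
    """Append a record; its offset is its position in the log."""
    offset = len(partition_log)
    partition_log.append((offset, key, value))
    return offset

append(b"user-1", b"clicked")      # offset 0
append(None, b"heartbeat")         # offset 1, key is optional
append(b"user-1", b"purchased")    # offset 2

print(partition_log[2])            # (2, b'user-1', b'purchased')
```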
High level architecture
Role of the broker
The broker handles reads and writes. It forwards messages for replication. It compacts its own log replica independently, without affecting any other copies.
Role of the controller
Handles "cluster wide events" delivered via Zookeeper (kept in sync with the Zookeeper registries):
● Broker list changes (registration, failure)
● Leader election
● Topic changes (deleted, added, number of partitions changed)
● Tracking partition replicas
Kafka controller
Zookeeper role
- Kafka controller registration
- List of topics and partitions
- Partition states
- Broker registration (id, host, port)
- Consumer registration & subscriptions
Partitions
- Each partition has a leader broker and N followers.
- Consumers and producers work with leaders only.
- The partition is the main scaling mechanism within a topic.
- The producer selects the target partition via a Partitioner implementation (balancing across the topic's available partitions).
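A minimal sketch of what such a Partitioner does (the real Kafka default partitioner hashes keys with murmur2; md5 here is only for illustration, and round-robin for keyless records is one common strategy):

```python
import hashlib
from itertools import count

_round_robin = count()  # shared counter for keyless records

def pick_partition(key, num_partitions):
    """Toy Partitioner: hash the key when present, round-robin otherwise.
    Hashing guarantees the same key always lands in the same partition,
    which is what preserves per-key ordering."""
    if key is None:
        return next(_round_robin) % num_partitions
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# The same key is always routed to the same partition.
assert pick_partition(b"user-1", 6) == pick_partition(b"user-1", 6)
```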
Access Pattern
● Writers write data in massive streams. The data is already "ordered" (this ordering is reused).
● Readers consume data, each one from some position, sequentially.
Write path
[Diagram: the producer writes to the elected leader broker; follower brokers replicate from it; each broker persists to its local storage.]
Read path
Reads always happen via the partition leader. Kafka helps balance consumers within a group: each topic partition can be read by a single consumer at a time, so the same message is never read simultaneously by several consumers within the same group.
Data transfer efficiency
1. Sequential disk access - optimal disk utilization
2. Zero copy - saves CPU cycles
3. Compression - saves network bandwidth
Compression
Up to Kafka 0.7, compressed messages were a special kind handled entirely by the clients (producer and consumer); the broker was not involved. Starting with 0.8.1, the broker repackages compressed messages in order to support logical offsets.
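The repackaging idea can be sketched in plain Python (the newline framing and gzip here are illustrative, not Kafka's wire format): the producer ships a batch as one compressed wrapper message, and the broker unpacks it so each inner message gets its own logical offset.

```python
import gzip

# Producer side: pack a batch of messages into one compressed wrapper message.
batch = [b"msg-0", b"msg-1", b"msg-2"]
wrapper = gzip.compress(b"\n".join(batch))

# Broker side (0.8.1+ behaviour, sketched): unpack the batch so that each
# inner message can be assigned its own logical offset.
base_offset = 100
unpacked = gzip.decompress(wrapper).split(b"\n")
with_offsets = [(base_offset + i, m) for i, m in enumerate(unpacked)]

print(with_offsets[0])  # (100, b'msg-0')
```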
Indexing
- As data flows into the broker, it is "indexed".
- Index files are stored alongside "segments".
- Segments are the files holding the data.
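In the spirit of Kafka's sparse .index files, the index maps an occasional offset to its byte position in the segment; a lookup binary-searches the index and then scans forward from that position. A toy version (illustrative numbers, not real file contents):

```python
import bisect

# Toy sparse index for one segment: (relative_offset, byte_position) pairs,
# one entry every N records, kept sorted by offset.
index = [(0, 0), (100, 4096), (200, 8192), (300, 12288)]
offsets = [entry[0] for entry in index]

def locate(target_offset):
    """Return the byte position from which a linear scan for
    target_offset should start."""
    i = bisect.bisect_right(offsets, target_offset) - 1
    return index[i][1]

print(locate(250))  # 8192: start scanning the segment file at that byte
```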
Consumer API Levels
● Low level API: work with partitions and offsets
● High level API: work with topics; automatic offset management and load balancing
This can be rephrased as:
● Low level API: database
● High level API: queue
Layers of functionality
Levels of abstraction
Offset management
Prior to 0.8.1, holding offset metadata was purely Zookeeper's responsibility. Starting with 0.8.1 there is a dedicated offset manager service. It runs inside the broker, uses a special topic to store offsets, and caches them in memory as an optimization. We can choose which mechanism to use.
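Conceptually, the offset manager is a small keyed store: committed offsets keyed by (group, topic, partition), which a restarted consumer reads to know where to resume. A broker-free sketch (the in-memory dict stands in for the special offsets topic plus cache):

```python
# Toy offset manager: committed offsets keyed by (group, topic, partition).
# In real Kafka this state lives in a special topic with an in-memory cache.
committed = {}

def commit(group, topic, partition, offset):
    """Record the consumer's progress for one partition."""
    committed[(group, topic, partition)] = offset

def fetch(group, topic, partition):
    """Where a restarted consumer in this group should resume from."""
    return committed.get((group, topic, partition), 0)

commit("analytics", "clicks", 0, 42)
print(fetch("analytics", "clicks", 0))  # 42 - resume here after a restart
print(fetch("analytics", "clicks", 1))  # 0  - nothing committed yet
```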
Key Compaction
Kafka is capable of storing only the latest value per key. That is not a queue - it is a table. This capability makes it possible to store the whole state, not just the latest X days of the historical data flow (in contrast to the auto-deletion approach).
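A toy compaction pass makes the "it is a table" point concrete: only the latest record per key survives, and (as in real Kafka) a null value acts as a tombstone that deletes the key.

```python
# Toy log compaction: keep only the latest value per key, turning a
# queue of updates into a table of current state.
log = [
    (0, b"user-1", b"v1"),
    (1, b"user-2", b"v1"),
    (2, b"user-1", b"v2"),
    (3, b"user-2", None),   # None value = tombstone: delete the key
]

def compact(records):
    latest = {}
    for offset, key, value in records:
        if value is None:
            latest.pop(key, None)   # tombstone removes the key entirely
        else:
            latest[key] = (offset, value)
    # Emit the surviving records in offset order.
    return sorted((off, k, v) for k, (off, v) in latest.items())

print(compact(log))  # [(2, b'user-1', b'v2')]
```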
Performance
Why is it so fast?
1. The network and disk formats of messages are the same; all that needs to be done is an append.
2. Local storage is used.
3. No attempts to cache / optimize.
Something big happens
We have a new world of real-time data processing needs. In many cases that means streams. For many years I thought it was just counters to be calculated before saving data into HDFS for the "real work". Now I see it quite differently.
Naive use of Kafka
Possible simple solution...
Kafka as NoSQL
- Sync replication as the resilience model
- Single master per partition
- Opaque data
- Compactions
- Optimized for reads in the same order as the writes were done
- Optimized for massive writes
- Optimized for temporal data
Compute
Samza's and Kafka Streams' relation to Kafka is like MapReduce's and Spark's relation to HDFS. Kafka became a medium on top of which we build computational layers. It has to be said - there is no data locality. Samza and Kafka Streams solve common problems.
State, recovery approach
Both Samza and Kafka Streams took the approach that has served RDBMS for a long time: snapshot + redo log. They force stateful stream processing applications to follow this paradigm.
NestLogic case
What are we doing? Why do we need Kafka / Spark? How has Kafka helped us?
Statistical analysis of data segments
First shot - Spark
What was the problem
- All data has to be processed. We might not have enough resources to process a particular - huge - segment.
- Spark shuffle is challenging when the data is bigger than RAM.
- We are moving toward more "real time" and streaming.
- We do not want to pay for sorting.
Kafka as shuffle engine
What we learned - flexibility
● We can re-run the "reduce" stage several times.
● Kafka clients can wait for a connection to be re-established with no timeouts, so we can repair a failed Kafka resource leader and the job will proceed.
● We can run separate clusters for map and reduce: flexibility to select their sizes. It saves us some money.
● We can now use different technologies for map and reduce. We are about to replace the map stage (transformations) with ImpalaToGo.
● It can be much faster (as happened in our case).
● Significantly fewer RAM problems. It became true stream processing.
What we learned - cont
More precise resource management. We can look at the size of the shuffle data and the number of groups (available from Kafka cluster metadata), and only then decide on the size of the "reduce" cluster. We can interleave the map and reduce stages because there is no sorting requirement.
Is it a universal solution?
● If you need dozens of different, concurrent jobs: Yarn + Spark is probably the best.
● If you need a single job to run smoothly and want flexibility with it: our approach comes into play.
So, what do we do?
We help to distinguish your data by its nature, present it, and help to decide what should be done with each part.
As data scientists...
We believe that checking the statistical homogeneity of data is very important.
As business people...
- Do not count an attack as popularity
- Do not count fraud as profit
- Do not count a bug as lack of interest
And most important:
- Work hard to distinguish all of the above
As big data experts
It is not simple to achieve. It took a lot of effort to get good results, orchestrate the operation, etc. We believe you get better utilization of your big data, data science, and devops resources.
How it looks
NestLogic incWe work hard to help you to Know your data.
Thank you for your attention
Contact us
Contact us at [email protected] or via our site www.nestlogic.com
Helper slides
State is in RocksDB
RocksDB was selected. A few quick facts:
1. Developed by Facebook, based on LevelDB
2. Single node
3. C++ library
4. HBase ideas: sorting, snapshots, transaction logs
5. My speculation - the transaction log is what "glues" it to Kafka Streams
Rebalancing - part 1
- One of the brokers is elected as the coordinator for a subset of the consumer groups. It is responsible for triggering rebalancing attempts for those groups on consumer group membership changes or subscribed topic partition changes.
- It is also responsible for communicating the resulting partition-consumer ownership configuration to all consumers of a group undergoing a rebalance operation.
Rebalancing - part 2
- On startup or on coordinator failover, the consumer sends a ClusterMetadataRequest to any of the brokers in the "bootstrap.brokers" list. In the response, it receives the location of the coordinator for its group.
- The consumer sends a RegisterConsumer request to its coordinator broker. In the response, it receives the list of topic partitions that it should own.
- At this point, group management is done and the consumer starts fetching data and (optionally) committing offsets for the list of partitions it owns.
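The ownership lists handed out above can be sketched as a range-style assignment (one real strategy among several): the topic's partitions are split into contiguous ranges across the group's consumers, sorted for determinism, so every partition is owned by exactly one consumer.

```python
# Toy range-style assignment a coordinator could compute during a rebalance.
def assign(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous ranges,
    one range per consumer; earlier consumers absorb the remainder."""
    consumers = sorted(consumers)          # deterministic ordering
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)
        assignment[c] = list(range(start, start + n))
        start += n
    return assignment

print(assign(["c1", "c2", "c3"], 7))
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
```

Note that each partition appears in exactly one consumer's list, which is precisely the "single consumer per partition within a group" rule from the read path.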
Consumer balancing
This is the capability to balance load and fail over between consumers in the same group. The Kafka consumer communicates with the coordinator broker for this. The coordinator broker info is stored in ZK and is available from any broker. These mechanisms are reused in Kafka Streams.
Co-ordinator broker, part 1
1. Reads the list of groups it manages and their membership information from ZK.
2. If the discovered membership is alive (according to ZK), waits for the consumers in each of those groups to re-register with it.
3. Performs failure detection for all consumers in a group. Consumers marked as dead by the coordinator's failure detection protocol are removed from the group, and the coordinator marks the rebalance for a group completed by communicating the new partition ownership to the remaining consumers in the group.
Co-ordinator broker, part 2
4. The coordinator tracks topic partition changes for all topics that any consumer group has registered interest in. If it detects a new partition for any topic, it triggers a rebalance operation (by killing the consumers' socket connections with itself). The creation of new topics can also trigger a rebalance, as consumers can register for topics before they are created.
Consumer and Co-ordinator
Log compaction
Any read from offset 0 to any offset Q, where Q > P, that completes within a configurable SLA will see the final state of all keys as of time Q. The log head is always a single segment (default 1 GB).
Material used
About compression: http://www.confluent.io/blog/compression-in-apache-kafka-is-now-34-percent-faster
Change in message offsets: https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
Configuration: https://cwiki.apache.org/confluence/display/KAFKA/Kafka+data+structures+in+Zookeeper