
[System Design] How LinkedIn Solved Its Log Aggregation Problem: A Case Study Of Kafka

Nishant Tanwar 🕉️
6 min read · Mar 8, 2024


In this article we will go over the essential parts of the research paper “Kafka: a Distributed Messaging System for Log Processing”.

Kafka was developed at LinkedIn for collecting and delivering high volumes of log data with low latency.

1) The Start

Back in 2011, large amounts of “log” data were being generated at internet-scale companies. Two types of data were generated:

  1. User Activity Events: This includes logins, page views, clicks, “likes”, sharing, comments, and search queries.
  2. Operational Metrics: This includes service call stack, call latency, errors, and system metrics such as CPU, memory, network, and disk utilization on each machine.

This data can then be utilized for:

  1. Search Relevance
  2. Recommendations, which may be driven by item popularity.
  3. Ad targeting and reporting.
  4. Security applications that protect against abusive behaviors such as spam or unauthorized data scraping.
  5. Newsfeed features that aggregate user status updates for their “friends” or “connections” to read.

This production and real-time use of log data created new challenges for data systems, because its volume is orders of magnitude larger than the “real” data.

Hence, out of this need, Kafka was born. It was originally created by Jay Kreps, Neha Narkhede, and Jun Rao at LinkedIn, and was subsequently open sourced in early 2011.

🎉 Fun Fact: Jay Kreps chose to name the software after the author Franz Kafka because it is “a system optimized for writing”, and he liked Kafka’s work.

Why could existing systems not fulfil LinkedIn’s needs?

Some of the existing systems at the time were:

  1. ActiveMQ (http://activemq.apache.org/)
  2. IBM WebSphere MQ (http://www-01.ibm.com/software/integration/wmq/)
  3. Oracle Enterprise Messaging Service (http://www.oracle.com/technetwork/middleware/ias/index-
  4. TIBCO Enterprise Message Service (http://www.tibco.com/products/soa/messaging/)
  5. Java Message Service (http://download.oracle.com/javaee/1.3/jms/tutorial/1_3_1-fcs/doc/jms_tutorialTOC.html.)

They were not a good fit for log processing for the following reasons:

  1. There is a mismatch in the features offered by enterprise systems. Those systems often focus on a rich set of delivery guarantees. For example, IBM WebSphere MQ has transactional support that allows an application to insert messages into multiple queues atomically. The Java Message Service allows each individual message to be acknowledged after consumption, potentially out of order. Such guarantees are overkill for collecting log data. For instance, losing a few page-view events occasionally is not the end of the world.
  2. Many systems do not focus as strongly on throughput (throughput is often measured in messages per second (MPS) or events per second (EPS); it indicates the system’s capacity to handle the incoming stream of messages, process them, and deliver them to their intended destinations) as their primary design constraint. For example, at that time JMS had no API to allow the producer to explicitly batch multiple messages into a single request. This means that each message requires a full TCP/IP round trip, which is not feasible for the throughput requirements of this domain (a producer batching sketch follows this list).
  3. There was no easy way to partition and store messages on multiple machines.
  4. Many messaging systems assume near-immediate consumption of messages, so the queue of unconsumed messages is always fairly small. Their performance degrades significantly if messages are allowed to accumulate, as in the case of offline consumers such as data warehousing applications that do periodic large loads rather than continuous consumption.
  5. Lastly, most of these systems use a “push” model in which the broker forwards data to the consumers. At LinkedIn a “pull” model was more suitable, since a consumer can retrieve messages at the maximum rate it can sustain and avoid being flooded by messages faster than it can handle. The pull model also makes it easy to rewind a consumer (we will discuss this in later sections).
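
To make the batching point concrete, here is a minimal sketch using the modern Kafka Java client rather than the 2011 API the paper describes; the broker address, topic name, and tuning values are made-up examples.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");               // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        // Accumulate up to 64 KB per partition, or wait up to 10 ms, so that many
        // messages share one request instead of one TCP round trip each.
        props.put("batch.size", "65536");
        props.put("linger.ms", "10");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 1_000; i++) {
                // send() is asynchronous; records are grouped into batches in the background
                producer.send(new ProducerRecord<>("page-views", "user-" + i, "viewed /home"));
            }
        } // close() flushes any records still sitting in a batch
    }
}
```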

In the next sections we will discuss the guarantees Kafka provides and how they make sense for a distributed messaging and log-aggregation system.

2) Kafka Architecture and Design Principles

In Kafka terms, a stream of messages of a particular type is defined by a topic.

In the LinkedIn example, all the messages generated from page views can be grouped together under one topic.

A producer can publish messages to a topic.

The published messages are then stored at a set of servers called brokers.

A consumer can subscribe to one or more topics from the brokers and consume the subscribed messages by pulling the data from the broker.

A message here just contains a payload of bytes. A user can choose their favorite serialization method to encode this message.
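
Since the broker treats each message as an opaque byte payload, the encoding is entirely up to the application. Below is a minimal sketch of that idea using the modern Java client’s Serializer interface; the PageView type and its tab-separated layout are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.serialization.Serializer;

// Hypothetical event type for the page-view example.
class PageView {
    final String userId;
    final String url;

    PageView(String userId, String url) {
        this.userId = userId;
        this.url = url;
    }
}

// Kafka only ever sees the bytes returned here, so any encoding
// (JSON, Avro, Protocol Buffers, ...) could be plugged in instead.
class PageViewSerializer implements Serializer<PageView> {
    @Override
    public byte[] serialize(String topic, PageView event) {
        return (event.userId + "\t" + event.url).getBytes(StandardCharsets.UTF_8);
    }
}
```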

To subscribe to a topic, a consumer first creates one or more message streams for the topic. The messages published to that topic will be evenly distributed into these sub-streams. Each stream provides an iterator interface over the continual stream of messages being produced. The consumer then iterates over every message in the stream and processes the payload of the message. Unlike traditional iterators, the message-stream iterator never terminates. If there are currently no more messages to consume, the iterator blocks until new messages are published to the topic.
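
The original consumer exposed this as a blocking iterator; in today’s Java client the same never-ending consumption loop looks roughly like the sketch below. The broker address, topic name, and group id are assumed values.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class StreamConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        props.put("group.id", "page-view-readers");          // assumed consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));        // assumed topic name
            while (true) {                                    // the stream never terminates
                // poll() blocks for up to the timeout when no new messages are available
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```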

In the overall architecture of Kafka, a cluster consists of multiple brokers. To balance load, a topic is divided into multiple partitions and each broker stores one or more of these partitions. Multiple producers and consumers can publish and consume messages at the same time. We will cover this in more detail in the upcoming sections.
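
To make the partitioning concrete, the sketch below creates a topic split across several partitions using the Java AdminClient; the topic name, partition count, and replication factor are illustrative choices, not values from the paper.

```java
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreatePartitionedTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (AdminClient admin = AdminClient.create(props)) {
            // 4 partitions to spread load across brokers; replication factor 1 for brevity
            NewTopic pageViews = new NewTopic("page-views", 4, (short) 1);
            admin.createTopics(List.of(pageViews)).all().get();
        }
    }
}
```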

2.1) Efficiency on a Single Partition

Key decisions that were made to make the system efficient:

Simple Storage: Kafka has a very simple storage layout. Each partition corresponds to a logical log. Physically, a log is implemented as a set of same-sized segment files (e.g. 1 GB). Every time a producer publishes a message to a partition, the broker simply appends the message to the last segment file.

Unlike other messaging systems, messages stored in Kafka do not have a message id. Instead, each message is addressed by its logical offset in the log.

Each pull request from the consumer contains the offset of the message from which the consumption begins and an acceptable number of bytes to fetch.

Each broker keeps in memory a sorted list of offsets, including the offset of the first message in every segment file. The broker locates the segment file where the requested message resides by searching the offset list and sends the data back to the consumer. After a consumer receives a message, it computes the offset of the next message to consume and uses it in the next pull request.
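
This lookup can be pictured as a search over the sorted first offsets of the segment files. The following sketch is not Kafka’s actual broker code, only a simplified model of the idea:

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Simplified model of a partition's log: each segment file is keyed by the
// offset of its first message, mirroring the broker's in-memory offset list.
public class SegmentLookupSketch {
    private final NavigableMap<Long, String> segmentsByFirstOffset = new TreeMap<>();

    public void addSegment(long firstOffset, String fileName) {
        segmentsByFirstOffset.put(firstOffset, fileName);
    }

    // The segment containing a requested offset is the one with the greatest
    // first offset that is less than or equal to the requested offset.
    public String segmentFor(long requestedOffset) {
        Map.Entry<Long, String> entry = segmentsByFirstOffset.floorEntry(requestedOffset);
        if (entry == null) {
            throw new IllegalArgumentException("offset " + requestedOffset + " is older than the retained log");
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        SegmentLookupSketch log = new SegmentLookupSketch();
        log.addSegment(0L, "00000000000000000000.log");
        log.addSegment(100_000L, "00000000000000100000.log");
        System.out.println(log.segmentFor(123_456L)); // prints 00000000000000100000.log
    }
}
```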

Efficient Transfer: A producer can send a set of messages in a single request. Although the end consumer iterates one message at a time, under the covers each pull request also retrieves multiple messages.

Stateless Broker: Unlike other messaging systems, in Kafka the information about how much each consumer has consumed is not maintained by the broker but by the consumer itself.

Therefore, the decision to delete messages does not depend on consumers. Rather, a simple time-based SLA policy is used to determine the retention period: a message is automatically deleted after the retention period has passed.

There is an important side benefit of this design. A consumer can deliberately rewind back to an old offset and re-consume data.
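
With the modern Java client, such a rewind looks roughly like this sketch; the topic, partition number, and target offset are assumptions for illustration.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RewindConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "reprocessing-job");          // assumed consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("page-views", 0); // assumed topic/partition
            consumer.assign(List.of(partition));
            // The broker keeps messages for the whole retention window regardless of what
            // has been consumed, so the consumer can simply jump back to an old offset.
            consumer.seek(partition, 42_000L);
            consumer.poll(Duration.ofSeconds(1)); // re-consumes from offset 42000 onwards
        }
    }
}
```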

2.2) Delivery Guarantees

In general, Kafka only guarantees at-least-once delivery. Exactly-once delivery typically requires two-phase commits and is not necessary for these applications.

Kafka guarantees that messages from a single partition are delivered to a consumer in order.
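
A common way to get at-least-once behaviour with the modern Java client is to commit offsets only after processing, so a crash causes re-delivery rather than loss. A rough sketch, with the broker address, group id, and topic assumed:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class AtLeastOnceConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "warehouse-loader");          // assumed consumer group
        props.put("enable.auto.commit", "false");           // commit only after processing
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("page-views"));       // assumed topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);                         // application-specific work
                }
                // Committing after processing means a crash before this line causes the
                // batch to be re-delivered on restart: at-least-once, not exactly-once.
                consumer.commitSync();
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.println(record.value());
    }
}
```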

This is the end of the article.
