[System Design] How LinkedIn Solved Its Log Aggregation Problem: A Case Study Of Kafka
6 min read · Mar 8, 2024
In this article we will go over the essential parts of the research paper “Kafka: a Distributed Messaging System for Log Processing”.
Kafka was developed at LinkedIn for collecting and delivering high volumes of log data with low latency.
1) The Start
Back in 2011, large amounts of “log” data were being generated at internet-scale companies. Two types of data were produced:
- User Activity Events: logins, page views, clicks, “likes”, sharing, comments, and search queries.
- Operational Metrics: service call stack, call latency, errors, and system metrics such as CPU, memory, network, and disk utilization on each machine.
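To make the two log types concrete, here is a minimal sketch of what such records might look like before being published to a pipeline like Kafka. The field names and values are illustrative assumptions, not schemas from the paper.

```python
import json
import time

# Hypothetical user activity event; field names are illustrative.
user_activity_event = {
    "type": "page_view",
    "user_id": "u-12345",
    "page": "/jobs",
    "timestamp": time.time(),
}

# Hypothetical operational metric from one machine.
operational_metric = {
    "type": "system_metric",
    "host": "app-host-01",
    "cpu_util": 0.73,    # fraction of CPU in use
    "mem_util": 0.55,    # fraction of memory in use
    "latency_ms": 12.4,  # latency of a service call
}

# In practice, each record would be serialized (e.g., to JSON)
# and published to a topic for downstream consumers.
print(json.dumps(user_activity_event))
print(json.dumps(operational_metric))
```

Records like these arrive continuously and at high volume, which is exactly the delivery problem Kafka was built to solve.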
This data can then be used for:
- Search Relevance
- Recommendations, which may be driven by item popularity.
- Ad targeting and reporting.
- Security applications that protect against abusive behaviors such as spam or unauthorized data…