The Kafka project, which was created at LinkedIn and adopted by Apache as a top-level project in 2012, is a publish-subscribe distributed messaging system. This post provides an overview of Kafka, focusing on the ideas of producers, topics, brokers, and consumers.
Introduction to Kafka
Kafka is a high-throughput, partitioned, scalable log system written in Scala. It was originally created at LinkedIn to handle the site’s live activity feeds and data streams, and it was later made open source so that other organizations could adopt it as well. As in other messaging systems, messages are written to and read from the server, but a Kafka cluster does this at much higher throughput.
Kafka is a “publish-subscribe distributed messaging system” rather than a “queue” system: a message sent by a producer is broadcast to every subscribed consumer group instead of being delivered to just one consumer.
Architecture of Kafka
Now that we have reviewed its history, let’s look at the architecture of Kafka. These are the fundamental terms associated with the Kafka architecture: producer, broker, topic, consumer, and so on.
Producer:
Different producers such as applications, DBMSs, and NoSQL stores write data into the Kafka cluster. The Kafka cluster is made up of many “brokers”; in layman’s terms, each “broker” is a “server”. Each message can be assigned a key, which ensures that all messages with the exact same key reach the same partition. The producer keeps sending messages to the Kafka cluster without waiting for acknowledgement. This asynchronous way of producing messages into the Kafka cluster is what gives Kafka its incredible speed, which is a necessity in today’s social media world.
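As a minimal sketch of such a producer using Kafka’s Java client, the snippet below sends a keyed message in fire-and-forget mode (acks=0), matching the “no waiting for acknowledgement” behaviour described above. The broker address localhost:9092, the topic name user-activity, and the key user-42 are placeholder values chosen for the example.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class SimpleProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "0");                              // fire-and-forget: do not wait for acknowledgement

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All messages with the same key ("user-42") land in the same partition of "user-activity".
            producer.send(new ProducerRecord<>("user-activity", "user-42", "clicked-home-page"));
        }
    }
}
```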
Topic:
Messages of the same type are grouped into a ‘Topic’. A Topic is similar to a folder in a file system: messages are published to a Topic, and each Topic is split into one or more partitions.
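To make the Topic-and-partition relationship concrete, here is a small sketch that creates a topic with the Java AdminClient. The topic name user-activity, the partition count of 3, and the replication factor of 1 are assumed example values.

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            // A topic with 3 partitions and a replication factor of 1 (example values).
            NewTopic topic = new NewTopic("user-activity", 3, (short) 1);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```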
Brokers:
Kafka’s “broker”, as it is called, is very similar to a traditional message broker: it holds the messages written by the producer until they are consumed by the consumer.
The Kafka cluster has many “brokers”, or “servers”, and each broker hosts partitions. As mentioned, each partition belongs to a Topic. The messages received by the brokers are stored on the brokers for ‘n’ days (a retention period that can be configured), and they are discarded after those ‘n’ days expire. Kafka doesn’t verify that each consumer has read the messages.
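As an illustration of the configurable retention just described, the sketch below sets a topic-level retention.ms of seven days through the AdminClient. The topic name and the seven-day value are assumptions made for the example.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        try (AdminClient admin = AdminClient.create(props)) {
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "user-activity");
            // Keep messages for 7 days (in milliseconds); after that the broker discards them.
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms", String.valueOf(7L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            Map<ConfigResource, Collection<AlterConfigOp>> update = new HashMap<>();
            update.put(topic, Collections.singletonList(setRetention));
            admin.incrementalAlterConfigs(update).all().get();
        }
    }
}
```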
Consumer:
The messages are then read by the consumers after the producers have sent them to the Kafka brokers. Each “consumer” (or “consumer group”) subscribes to one or more “topics” and reads from the “partitions” of those topics. If one broker goes down, the other brokers take over its partitions and keep everything running smoothly.
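A minimal consumer sketch with the Java client is shown below. The group.id activity-readers, the broker address, and the topic name are example assumptions; consumers sharing the same group.id form one consumer group and split the topic’s partitions between them.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class SimpleConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");    // placeholder broker address
        props.put("group.id", "activity-readers");           // consumers with this id form one consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("user-activity"));
            while (true) {
                // Poll the brokers; each record carries the partition and offset it was read from.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                            record.partition(), record.offset(), record.key(), record.value());
                }
            }
        }
    }
}
```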
ZooKeeper:
ZooKeeper is responsible for coordinating all the components of the Kafka cluster. The producer hands each message to the partition’s “leader” broker, which replicates it onto the other brokers. Kafka has been adopted by many organizations, including LinkedIn, Yahoo!, Twitter, Pinterest, and Tumblr.
This post provided an overview of Kafka, followed by its architecture. As time passes, Kafka will likely be adopted by even more organizations.
More information is available at kafka.apache.org.