A while ago I wrote an introduction to Kafka without touching on its technical side or its use cases. There are a couple of terms to become familiar with in this post. I used an image I downloaded from the Internet to explain them.
There are four core APIs (Application Programming Interfaces) we need to know:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
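To make the Producer, Consumer and Streams roles a little more concrete, here is a toy, in-memory sketch. It is not the Kafka client API. The real clients (for example the Java `KafkaProducer`, `KafkaConsumer` and Kafka Streams libraries) talk to brokers over the network. `ToyBroker`, `publish`, `read` and `uppercase_processor` are hypothetical names chosen only for illustration.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka broker (not a real API)."""

    def __init__(self):
        # Each topic name maps to an append-only list of records.
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        # Producer role: append the record to the end of the topic's log.
        self.topics[topic].append(record)

    def read(self, topic, offset):
        # Consumer role: return every record from `offset` onward.
        return self.topics[topic][offset:]


def uppercase_processor(broker, in_topic, out_topic, offset=0):
    """Toy stream processor: consume from one topic, transform, produce to another."""
    records = broker.read(in_topic, offset)
    for record in records:
        broker.publish(out_topic, record.upper())
    # Return the next offset to read from, so processing can resume later.
    return offset + len(records)


broker = ToyBroker()
for word in ["hello", "kafka"]:
    broker.publish("words", word)                               # Producer API role

next_offset = uppercase_processor(broker, "words", "shouted")  # Streams API role
print(broker.read("shouted", 0))                                # Consumer API role
print(next_offset)
```

Running it prints `['HELLO', 'KAFKA']` and then `2`. The processor only reads the input topic and appends to a different output topic. That mirrors the "input streams to output streams" idea above.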
We can run Kafka on a single server (node) or in cluster mode with multiple nodes. Each Kafka server is called a broker. Producers are processes that publish data, as a stream of records, to Kafka topics on the brokers. In other words, they push messages into Kafka.
A consumer pulls records from one or more Kafka topics and processes them. You can think of Kafka as having publishers, topics and subscribers. Topics can be split into partitions, which enables parallel consumption. Messages are replicated across brokers for fault tolerance. When a consumer in a group fails, Kafka rebalances its partitions among the remaining consumers.
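A rough sketch of how partitions enable parallel consumption is shown here. Records with the same key go to the same partition, and a group's partitions are spread across its consumers. This is a simplification under stated assumptions:

* `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner uses.
* `assign` is a hypothetical round-robin helper. Kafka ships several real assignment strategies.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    # Same key -> same partition, so records for one key stay in order.
    # crc32 is a stand-in; Kafka's default partitioner hashes keys with murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign(partitions, consumers):
    """Round-robin partitions over the consumers of one group (simplified, hypothetical)."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
print(assign(partitions, ["c1", "c2"]))  # {'c1': [0, 2], 'c2': [1, 3]}
# If c2 fails, a rebalance hands all partitions to the remaining consumer.
print(assign(partitions, ["c1"]))        # {'c1': [0, 1, 2, 3]}
print(choose_partition("user-42", len(partitions)))
```

Within one group, each partition is read by exactly one consumer at a time. This is why adding consumers (up to the number of partitions) increases parallelism.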
In Kafka, messages are written to a topic, which maintains an append-only log (or multiple logs, one for each partition). Subscribers read from this log and derive their own representations of the data. Put simply, you can think of it as an "activity" log.
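The "activity log" idea can be sketched with a toy append-only log. Each record's offset is its position in the log. Two independent subscribers each keep their own offset and build a different view from the same records. `PartitionLog` and its methods are hypothetical illustration names, not Kafka classes.

```python
class PartitionLog:
    """Toy append-only log for a single partition (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        # The offset of a record is simply its position in the log.
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int, max_records: int = 10):
        # Reading never removes records, so other subscribers are unaffected.
        return self._records[offset:offset + max_records]


log = PartitionLog()
for event in [("alice", "login"), ("bob", "login"), ("alice", "logout")]:
    log.append(event)

# Subscriber 1: counts events per user, tracking its own offset.
counts, offset_1 = {}, 0
for user, _action in log.read_from(offset_1):
    counts[user] = counts.get(user, 0) + 1
    offset_1 += 1

# Subscriber 2: keeps the latest action per user, with a separate offset.
latest, offset_2 = {}, 0
for user, action in log.read_from(offset_2):
    latest[user] = action
    offset_2 += 1

print(counts)  # {'alice': 2, 'bob': 1}
print(latest)  # {'alice': 'logout', 'bob': 'login'}
```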
The main parts of a Kafka system:
- Broker: Handles all requests from clients (produce, consume, and metadata) and keeps data replicated within the cluster. There can be one or more brokers in a cluster.
- Zookeeper: A separate coordination service that keeps the state of the cluster (brokers, topics, users).
- Producer: Sends records to a broker.
- Consumer: Consumes batches of records from the broker.
In the self-paced course I took, the instructor shared some use cases for Kafka:
- Messaging system
- Activity tracking
- Application logs gathering
- Stream processing with Spark or the Kafka Streams API
- Decoupling system dependencies
- Integration with Spark, Flink, Hadoop, Storm and other Big Data technologies
With proper configuration, Kafka can ensure that acknowledged messages are not lost even when a broker fails. In this entry, I covered the fundamentals of Apache Kafka and some important terms: producers, consumers, topics, brokers and Zookeeper. I will save a deeper explanation of topics, brokers and Zookeeper for another blog entry.
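The post does not spell out what "proper configuration" means. Settings commonly recommended for durability are a replication factor of 3 and `min.insync.replicas=2`. On the producer side they are `acks=all` and `enable.idempotence=true`, and unclean leader election is disabled. The sketch collects those real config keys in a plain Python dict. It then models, in a simplified way, the rule that with `acks=all` a write fails when fewer than `min.insync.replicas` replicas are in sync. `acks_all_write_succeeds` is a hypothetical helper, not a Kafka API, and the values are typical choices rather than the only valid ones.

```python
# Settings commonly recommended for durability. The keys are real Kafka config
# names; the values are typical choices, not the only valid ones.
DURABLE_SETTINGS = {
    "default.replication.factor": 3,          # broker: copies of each new partition
    "min.insync.replicas": 2,                 # broker/topic: replicas that must have a write
    "unclean.leader.election.enable": False,  # broker/topic: never promote an out-of-sync replica
    "acks": "all",                            # producer: wait for all in-sync replicas
    "enable.idempotence": True,               # producer: retries do not create duplicates
}


def acks_all_write_succeeds(in_sync_replicas: int, min_insync_replicas: int) -> bool:
    """Simplified model of the acks=all rule (hypothetical helper, not a Kafka API).

    With acks=all the producer gets an error instead of an acknowledgement
    when fewer than min.insync.replicas replicas are in sync.
    """
    return in_sync_replicas >= min_insync_replicas


rf = DURABLE_SETTINGS["default.replication.factor"]
min_isr = DURABLE_SETTINGS["min.insync.replicas"]
for failed_brokers in range(rf):
    alive = rf - failed_brokers
    print(failed_brokers, acks_all_write_succeeds(alive, min_isr))
# 0 True
# 1 True
# 2 False
```

Consider this simplified model, which assumes every live replica is in sync. With these values a partition keeps accepting writes after one broker failure. After a second failure it refuses writes rather than acknowledging data held by only one copy.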