Thursday, August 20, 2020

KAFKA


What is Kafka?  

Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. It was initially conceived as a messaging queue, because Kafka is based on the abstraction of a distributed commit log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from a messaging queue into a full-fledged event streaming platform.

To be precise, Kafka is used for publishing and subscribing to events. The basic concepts of Kafka are comparable to those of traditional messaging systems: producers publish messages onto topics, and consumers subscribe to topics and receive messages.

Excursus

• A topic is a category to which messages are published. Topics consist of at least one partition.

• Partitions contain an ordered sequence of messages. The messages are stored in regular files.

• The order of messages within a given partition is the same across the whole cluster (“totally ordered”). All producers and all consumers see messages of one partition in the same order.

• There is, however, no order between messages on different partitions. A total order of messages in a topic can be achieved by having only one partition for this topic.

• With multiple partitions, each partition may be consumed by a different consumer of the same consumer group. Kafka guarantees that each partition is assigned to exactly one consumer of a group. A partition may of course be assigned to multiple consumers, each belonging to a different consumer group. Producers choose which partition messages are published to; this can be done round-robin or based on a domain-specific message key (see the sketch below).
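As a quick sketch of the last point, key-based partitioning can be tried from the console once a broker is running (see the installation steps below); the topic name "orders" and the key "user42" are made up for this illustration:

# "orders" and "user42" are illustrative names; assumes a broker on localhost:9092
$ /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 3 --topic orders
$ /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic orders --property "parse.key=true" --property "key.separator=:"
> user42:first order
> user42:second order

Because the default partitioner hashes the message key, both records land on the same one of the three partitions, so their relative order is preserved for user42 even though the topic as a whole has no total order.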

Simple Testing


1. Download Kafka from the mirror site below:

 http://ftp.nluug.nl/internet/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz
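For example, assuming wget is installed (any Apache mirror serves the same archive), the file can be fetched straight into /opt:

# assumes wget is available; -P sets the download directory
wget -P /opt http://ftp.nluug.nl/internet/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz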


2. Enter the /opt directory and extract the archive:

cd /opt

tar -xvf kafka_2.12-2.2.0.tgz

 

3. Create a symlink called /opt/kafka that points to the newly created /opt/kafka_2.12-2.2.0 directory, to make our lives easier.

ln -s /opt/kafka_2.12-2.2.0 /opt/kafka


4. Create a non-privileged user that will run both the ZooKeeper and Kafka services:

useradd kafka


5. Set the new user as the owner of the whole directory tree we extracted, recursively:

     chown -R kafka:kafka /opt/kafka* 


6. Create the unit file /etc/systemd/system/zookeeper.service with the following content:

[Unit]
Description=zookeeper
After=syslog.target network.target

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh

[Install]
WantedBy=multi-user.target

 

Note that we do not need to write the version number three times because of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service, that contains the following lines of configuration:

[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service

[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh

[Install]
WantedBy=multi-user.target
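Optionally, we can lint both unit files before loading them; this sketch assumes systemd-analyze, which ships with systemd:

systemd-analyze verify /etc/systemd/system/zookeeper.service /etc/systemd/system/kafka.service  # silent output means both files parsed cleanly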

 

7. Reload systemd so that it reads the new unit files:

 systemctl daemon-reload

 

8. We can start our new services (in this order):

 systemctl start zookeeper

 systemctl start kafka

 

If all goes well, systemd should report a running state for both services, similar to the outputs below:

# systemctl status zookeeper.service
  zookeeper.service - zookeeper
   Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:44:37 CET; 6s ago
 Main PID: 11628 (java)
    Tasks: 23 (limit: 12544)
   Memory: 57.0M
   CGroup: /system.slice/zookeeper.service
           11628 java -Xmx512M -Xms512M -server [...]

 

# systemctl status kafka.service
  kafka.service - Apache Kafka
   Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
   Active: active (running) since Thu 2019-01-10 20:45:11 CET; 11s ago
 Main PID: 11949 (java)
    Tasks: 64 (limit: 12544)
   Memory: 322.2M
   CGroup: /system.slice/kafka.service
           11949 java -Xmx1G -Xms1G -server [...]

 

9. Optionally, we can enable automatic start on boot for both services:

# systemctl enable zookeeper.service

# systemctl enable kafka.service
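As an extra sanity check, we can confirm that both daemons are listening on their default ports, 2181 for ZooKeeper and 9092 for Kafka; this sketch assumes the ss tool from iproute2 is installed:

# ss -ltn | grep -E ':(2181|9092)'  # assumes iproute2's ss; both ports should show LISTEN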

 

To test functionality, we'll connect to Kafka with one producer and one consumer client. The messages provided by the producer should appear on the console of the consumer. But before that we need a medium for the two to exchange messages on. We create a new channel of data, called a topic in Kafka's terms, where the producer will publish and the consumer will subscribe. We'll call the topic FirstKafkaTopic, and use the kafka user to create it:

# su - kafka

$ /opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic FirstKafkaTopic
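Optionally, we can verify the topic's settings with the same tool; for the topic just created, it should report a single partition with replication factor 1:

$ /opt/kafka/bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic FirstKafkaTopic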


10. Start a consumer client from the command line that will subscribe to the (at this point empty) topic created in the previous step:

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning

We leave this console open with the client running in it. This is where we will receive the messages we publish with the producer client.


11. In another terminal, we'll start a producer client and publish some messages to the topic we created. First, we can query Kafka for the available topics:

$ /opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181
FirstKafkaTopic

 

12. Then we connect to the topic the consumer is subscribed to, and send a message:

  $ /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic FirstKafkaTopic

 

> new message published by producer from console #2

 

13. At the consumer terminal, the message should appear shortly:

 

$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --from-beginning

 new message published by producer from console #2

If the message appears, our test is successful and our Kafka installation is working as intended. Many clients can produce and consume records on one or more topics in the same way, even with the single-node setup we created in this tutorial.
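To see the consumer-group behavior from the excursus in action, here is a sketch: attach a console consumer with an explicit group, then inspect the group's partition assignment and offsets from another terminal:

# demo-group is an arbitrary group id chosen for this illustration
$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic FirstKafkaTopic --group demo-group
$ /opt/kafka/bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group demo-group

Starting a second consumer with the same --group would split the topic's partitions between the two clients; since FirstKafkaTopic has only one partition, the second consumer would simply stay idle.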
