What is Kafka?
Apache Kafka is a distributed event streaming platform capable of handling trillions of events a day. This was initially conceived as a messaging Queue because Kafka is based on an abstraction of a distributed committed log. Since being created and open sourced by LinkedIn in 2011, Kafka has quickly evolved from messaging queue to a full fledged event streaming platform.
To be precise, Kafka is used for publishing and subscribing events. The basic concepts of Kafka are comparable to traditional messaging systems. Producers publish messages onto topics.Consumers subscribe to topics and receive messages.
Excursus
• A topic is a category to which messages are published. Topics consist of at least one partition.
• Partitions contain an ordered sequence of messages. The messages are stored in regular files.
• The order of messages within a given partition is the same across the whole cluster (“totally ordered”). All producers and all consumers see messages of one partition in the same order.
• There is no order however between messages on different partitions. A total order of messages in a topic can be achieved by only having one partition for this topic.
• With multiple partitions, each partition may be consumed by a different consumer of the same consumer group. Kafka guarantees that each partition will be assigned to exactly one consumer of a group. A partition may of course be assigned to multiple consumers, each part of a different consumer group. Producers choose onto which partition messages are published to. This can be round-robin or a domain based partition.
Simple Testing
- 1. Download Kafka from below ftp site
http://ftp.nluug.nl/internet/apache/kafka/2.2.0/kafka_2.12-2.2.0.tgz
- 2. Enter the /opt
directory, and extract
the archive:
cd /opt
tar -xvf kafka_2.12-2.2.0.tgz
3. Create a symlink called /opt/kafka
that points to the now
created /opt/
kafka_2.12-2.2.0
directory to make our lives easier.
kafka_2.12-2.2.0
/opt/kafka
4. Create a non-privileged user that will
run both zookeeper
and kafka
service.
useradd
kafka
5. Set the new user as owner of the whole directory we extracted, recursively
chown -R kafka:kafka /opt/kafka*
6.Create the unit file /etc/systemd/system/zookeeper.service
with the following content:
[Unit]
Description=zookeeper
After=syslog.target network.target
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh
/opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
[Install]
WantedBy=multi-user.target
Note that we do not need to write the version number three times because
of the symlink we created. The same applies to the next unit file for Kafka, /etc/systemd/system/kafka.service
, that contains the following lines of configuration:
[Unit]
Description=Apache Kafka
Requires=zookeeper.service
After=zookeeper.service
[Service]
Type=simple
User=kafka
Group=kafka
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
[Install]
WantedBy=multi-user.target
- 7. Need
to reload systemd
to get
it read the new unit files:
8. We can start our new services (in this order):
systemctl start kafka
If
all goes well, systemd
should
report running state on both service's status, similar to the outputs below:
# systemctl status zookeeper.service
zookeeper.service -
zookeeper
Loaded: loaded (/etc/systemd/system/zookeeper.service;
disabled; vendor preset: disabled)
Active: active
(running) since Thu 2019-01-10 20:44:37 CET; 6s ago
Main PID: 11628 (java)
Tasks: 23 (limit:
12544)
Memory: 57.0M
CGroup:
/system.slice/zookeeper.service
11628 java -Xmx512M -Xms512M -server
[...]
# systemctl status kafka.service
kafka.service - Apache
Kafka
Loaded: loaded
(/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
Active: active
(running) since Thu 2019-01-10 20:45:11 CET; 11s ago
Main PID: 11949 (java)
Tasks: 64 (limit:
12544)
Memory: 322.2M
CGroup:
/system.slice/kafka.service
11949 java
-Xmx1G -Xms1G -server [...]
- 9. Optionally we can enable
automatic start on boot for both services
# systemctl
enable zookeeper.service
# systemctl
enable kafka.service
To test functionality, we'll connect to
Kafka with one producer and one consumer client. The messages provided by the
producer should appear on the console of the consumer. But before this we need
a medium these two exchange messages on. We create a new channel of data called
topic
in Kafka's terms, where the provider
will publish, and where the consumer will subscribe to. We'll call the topic FirstKafkaTopic
.
We'll use the kafka
user to create the topic:
$ /opt/kafka/bin/kafka-topics.sh --create --zookeeper
localhost:2181 --replication-factor 1 --partitions 1 --topic FirstKafkaTopic
10. Start a consumer client from the command line that will subscribe to the (at this point empty) topic created in the previous step:
/opt/kafka/bin/kafka-console-consumer.sh
--bootstrap-server localhost:9092 --topic
FirstKafkaTopic --from-beginning
11. On another terminal, we start a producer client, and publish some messages to the topic we created. We can query Kafka for available topics:
$ /opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181 FirstKafkaTopic
12. And connect to the one the consumer is subscribed, then send a message:
>
new message publish ed by producer from
console #2
13. At the
consumer terminal, the message should appear shortly:
$ /opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server
localhost:9092 --topic FirstKafkaTopic --from-beginning
new message published by
producer from console #2
Indepth explaination of Kafka thanks for the blog
ReplyDelete