Kafka fails to start

Problem description

Kafka will not start on my local Ubuntu 20.04 machine.

kafka@bablo-HP-ProBook-440-G5:~$ sudo systemctl status kafka
● kafka.service
     Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
     Active: failed (Result: exit-code) since Wed 2021-03-03 12:10:36 IST; 3min 26s ago
    Process: 32197 ExecStart=/bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/>
   Main PID: 32197 (code=exited, status=1/FAILURE)

Mar 03 12:10:33 bablo-HP-ProBook-440-G5 systemd[1]: Started kafka.service.
Mar 03 12:10:36 bablo-HP-ProBook-440-G5 systemd[1]: kafka.service: Main process exited, code=exited, status=1/FAILURE
Mar 03 12:10:36 bablo-HP-ProBook-440-G5 systemd[1]: kafka.service: Failed with result 'exit-code'.

On the same host, the zookeeper service is running fine.

kafka@bablo-HP-ProBook-440-G5:~$ sudo systemctl status zookeeper
● zookeeper.service
     Loaded: loaded (/etc/systemd/system/zookeeper.service; enabled; vendor preset: enabled)
     Active: active (running) since Wed 2021-03-03 11:47:27 IST; 28min ago
   Main PID: 932 (java)
      Tasks: 38 (limit: 9359)
     Memory: 84.1M
     CGroup: /system.slice/zookeeper.service
             └─932 java -Xmx512M -Xms512M -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCI>

Mar 03 11:48:37 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:37,793] INFO Created server with tickTime 3000 minS>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,022] INFO Using org.apache.zookeeper.server.NIOS>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,037] INFO Configuring NIO connection handler wit>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,058] INFO binding to port 0.0.0.0/0.0.0.0:2181 (>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,265] INFO zookeeper.snapshotSizefactor = 0.33 (o>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,279] INFO Snapshotting: 0x0 to /tmp/zookeeper/ve>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,291] INFO Snapshotting: 0x0 to /tmp/zookeeper/ve>
Mar 03 11:48:38 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:38,406] INFO Using checkIntervalMs=60000 maxPerMinu>
Mar 03 11:48:45 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:48:45,679] INFO Creating new log file: log.1 (org.apac>
Mar 03 11:57:26 bablo-HP-ProBook-440-G5 zookeeper-server-start.sh[932]: [2021-03-03 11:57:26,915] WARN fsync-ing the write ahead log in SyncT>

I checked whether port 9092 was already in use. It was not.

kafka@bablo-HP-ProBook-440-G5:~$ sudo lsof -i:9092
kafka@bablo-HP-ProBook-440-G5:~$ 

The command produced no output.

When I opened server.log in /home/kafka/logs, I noticed the following:

[2021-03-03 12:10:34,936] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-03-03 12:10:34,941] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-03-03 12:10:34,944] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-03-03 12:10:34,947] INFO Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-03-03 12:10:34,952] INFO Socket connection established, initiating session, client: /127.0.0.1:40888, server: localhost/127.0.0.1:2181 (org.apache.zookeeper.ClientCnxn)
[2021-03-03 12:10:35,046] INFO Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x10000019ced0002, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
[2021-03-03 12:10:35,052] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-03-03 12:10:35,531] INFO Cluster ID = Tv2VkHIwQeeE9P8qlGcquQ (kafka.server.KafkaServer)
[2021-03-03 12:10:35,538] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID Tv2VkHIwQeeE9P8qlGcquQ doesn't match stored clusterId Some(NojQoU95QE-ubFol4BNjXg) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
        at kafka.server.KafkaServer.startup(KafkaServer.scala:235)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
        at kafka.Kafka$.main(Kafka.scala:82)
        at kafka.Kafka.main(Kafka.scala)
[2021-03-03 12:10:35,540] INFO shutting down (kafka.server.KafkaServer)
[2021-03-03 12:10:35,542] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-03-03 12:10:35,663] INFO Session: 0x10000019ced0002 closed (org.apache.zookeeper.ZooKeeper)
[2021-03-03 12:10:35,663] INFO EventThread shut down for session: 0x10000019ced0002 (org.apache.zookeeper.ClientCnxn)

So these lines appear to hold the clue:

[2021-03-03 12:10:35,538] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID Tv2VkHIwQeeE9P8qlGcquQ doesn't match stored clusterId Some(NojQoU95QE-ubFol4BNjXg) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
        at kafka.server.KafkaServer.startup(KafkaServer.scala:235)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
        at kafka.Kafka$.main(Kafka.scala:82)
        at kafka.Kafka.main(Kafka.scala)

I may be completely wrong here; if so, please tell me. How do I fix this "InconsistentClusterIdException" in Kafka?

Solution

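As I mentioned in the question, my diagnosis pointed to the "InconsistentClusterIdException" as the culprit. A cluster.id with an old value is stored in a file on disk, and it differs from the cluster ID the broker now receives at startup.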
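To see the two conflicting IDs side by side, they can be pulled straight out of the exception line. This is only a sketch: the function name extract_cluster_ids is made up for illustration, and it assumes the message keeps the wording shown in the log above, where the first ID is the one the broker got from ZooKeeper and the second is the one stored on disk.

```shell
# Hypothetical helper: print the two cluster IDs named in an
# InconsistentClusterIdException line. Reads log text on stdin and prints
# "zookeeper=<id> stored=<id>" for every matching line.
extract_cluster_ids() {
  sed -n "s/.*The Cluster ID \([^ ]*\) doesn't match stored clusterId Some(\([^)]*\)).*/zookeeper=\1 stored=\2/p"
}

# Usage (log path taken from the question):
#   extract_cluster_ids < /home/kafka/logs/server.log
```

If the two values differ, the stored one is what has to be reconciled.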

The log also says that the file holding the cluster.id is named "meta.properties", so the next question was where to find that file.
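Kafka keeps meta.properties inside each directory listed under log.dirs in server.properties, so reading that setting is one way to narrow the search. The helper below is a rough sketch: the name kafka_log_dirs is invented, and it assumes plain key=value lines with no line continuations.

```shell
# Hypothetical helper: print the value of log.dirs from a Kafka
# server.properties file, falling back to the single-directory key log.dir.
# The value may be a comma-separated list of directories.
kafka_log_dirs() {
  props=$1
  value=$(sed -n 's/^log\.dirs[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
  if [ -z "$value" ]; then
    value=$(sed -n 's/^log\.dir[[:space:]]*=[[:space:]]*//p' "$props" | tail -n 1)
  fi
  printf '%s\n' "$value"
}

# Usage (config path taken from the ExecStart line in the question):
#   kafka_log_dirs /home/kafka/kafka/config/server.properties
```

When the setting holds a single directory, meta.properties should sit directly inside it.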

The Kafka installation directory contains a logs folder, and at first I assumed meta.properties would be there, but it was not.

So I ran the following command to search the whole filesystem, starting at the root directory, for meta.properties.

kafka@bablo-HP-ProBook-440-G5:~$ sudo find / -type f -name meta.properties 
[sudo] password for kafka: 
find: ‘/run/user/1000/gvfs’: Permission denied
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-0bd2b2fa548f/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-2dbdd7463fde/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-6adbc7f092cf/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-f6f11dd5b1e3/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-ecf97c2a08bd/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-7bad187baa15/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-e114b322d257/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-de90efb99ab0/meta.properties
/var/lib/docker/volumes/5538f86a82c0c8e4b52b7c99d974f8d7f0dcd7cd5923372fc9526daa7610de77/_data/kafka-logs-5d1ae8504d91/meta.properties
/home/kafka/logs/meta.properties

The last line of the output gives the location of the meta.properties file: /home/kafka/logs/meta.properties. In that file, I simply commented out the line containing the cluster.id key and saved it.
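The same edit can be scripted. The following is a minimal sketch, not the exact steps I ran by hand: comment_out_cluster_id is a made-up name, the backup suffix .bak is my own choice, and it assumes the key starts at the beginning of its line.

```shell
# Hypothetical helper: comment out the cluster.id line in a meta.properties
# file, keeping the untouched original next to it as <file>.bak.
comment_out_cluster_id() {
  meta=$1
  cp -p "$meta" "$meta.bak" || return 1
  # Prefix the matching line with '#'; every other line is copied unchanged.
  sed 's/^cluster\.id[[:space:]]*=/#&/' "$meta.bak" > "$meta"
}

# Usage (path taken from the find output above):
#   sudo sh -c '. ./this_snippet.sh; comment_out_cluster_id /home/kafka/logs/meta.properties'
```

The file name this_snippet.sh in the usage line is a placeholder. Copying the .bak file back over meta.properties undoes the change.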

After that, I stopped zookeeper and then restarted both zookeeper and kafka.
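As a sketch, the restart sequence looks roughly like this, using the unit names kafka and zookeeper from the systemctl output in the question. The run wrapper and the DRY_RUN variable are illustrative additions so the commands can be previewed before they are executed.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop both services, then bring ZooKeeper up before Kafka.
restart_kafka_stack() {
  run sudo systemctl stop kafka
  run sudo systemctl stop zookeeper
  run sudo systemctl start zookeeper
  run sudo systemctl start kafka
  # --no-pager prints the status directly instead of opening a pager.
  run sudo systemctl --no-pager status kafka
}

# Usage:
#   DRY_RUN=1; restart_kafka_stack   # preview only
#   DRY_RUN=0; restart_kafka_stack   # actually restart
```

ZooKeeper is started first because the broker connects to it during startup, as the server.log excerpt shows.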

That was it. Kafka started without problems again.