Artemis:服务器停止发出错误消息,因为在健康的网络中网络不正常

问题描述

我们在Red Hat Enterprise Linux Server 7.7(Linux 3.10.0-1062.el7.x86_64 x86_64)上使用Apache Artemis 2.14.0版本。但是服务器启动失败,提示:

AMQ201001: Network is unhealthy,stopping service ActiveMQServerImpl

我们配置了ping命令,并且ping成功返回,没有明显的延迟。

Ping输出

      ping 10.30.12.21
      PING 10.30.12.21 (10.30.12.21) 56(84) bytes of data.
      64 bytes from 10.30.12.21: icmp_seq=1 ttl=63 time=0.337 ms
      64 bytes from 10.30.12.21: icmp_seq=2 ttl=63 time=0.380 ms

Ping配置

      <network-check-period>5000</network-check-period>
      <network-check-timeout>1000</network-check-timeout>
      <network-check-list>10.30.12.21,10.30.12.22</network-check-list>
      <network-check-ping-command>ping -c 1 -t %d %s</network-check-ping-command>

两个ping ips均可正常访问。如果我评论ping配置服务器正常。

调试日志

2020-08-12 09:15:17,439 INFO  [org.apache.activemq.artemis.logs] AMQ201001: Network is unhealthy,stopping service ActiveMQServerImpl::serverUUID=8108eab2-dbff-11ea-83e1-eeeeeeeeeeee
2020-08-12 09:15:17,440 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Stopping discovery. There's an exception just as a trace where it happened: java.lang.Exception: trace
        at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.stop(DiscoveryGroup.java:154) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.doClose(ServerLocatorImpl.java:1346) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.close(ServerLocatorImpl.java:1320) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.ClusterController.stop(ClusterController.java:138) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.ClusterManager.stop(ClusterManager.java:307) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1362) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1097) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:1033) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stop(ActiveMQServerImpl.java:876) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$1.stop(ActiveMQServerImpl.java:360) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.NetworkHealthCheck.run(NetworkHealthCheck.java:296) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.ActiveMQScheduledComponent$2.run(ActiveMQScheduledComponent.java:306) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.ActiveMQScheduledComponent$3.run(ActiveMQScheduledComponent.java:316) [artemis-commons-2.14.0.jar:2.14.0]
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [java.base:]
        at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [java.base:]
        at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [java.base:]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [java.base:]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [java.base:]
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.14.0.jar:2.14.0]

2020-08-12 09:15:17,443 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] endpoing closed
2020-08-12 09:15:17,443 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] receiving 65535
2020-08-12 09:15:17,443 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Received nodeID 393759d8-dc4e-11ea-a68e-eeeeeeeeeeee with originatingID =
2020-08-12 09:15:17,443 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Received 0 discovery entry elements
2020-08-12 09:15:17,444 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] changed = false
2020-08-12 09:15:17,444 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Calling notifyAll
2020-08-12 09:15:17,450 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] receiving 65535
2020-08-12 09:15:17,450 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] ignoring original NodeID8108eab2-dbff-11ea-83e1-eeeeeeeeeeee receivedID = 8108eab2-dbff-11ea-83e1-eeeeeeeeeeee
2020-08-12 09:15:17,450 DEBUG [org.apache.activemq.artemis.core.remoting.server.impl.RemotingServiceImpl] RemotingServiceImpl::removing connection ID b29aee05
2020-08-12 09:15:17,501 DEBUG [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] Couldn't find any bindings for address=activemq.notifications on message=CoreMessage[messageID=38,durable=true,userID=null,priority=0,timestamp=Wed Aug 12 09:15:17 IST 2020,expiration=0,address=activemq.notifications,size=277,properties=TypedProperties[_AMQ_NotifType=BROADCAST_GROUP_STOPPED,name=my-broadcast-group,_AMQ_NotifTimestamp=1597203917490]]@1627279258
2020-08-12 09:15:17,501 DEBUG [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] Message CoreMessage[messageID=38,_AMQ_NotifTimestamp=1597203917490]]@1627279258 is not going anywhere as it didn't have a binding on address:activemq.notifications
2020-08-12 09:15:17,506 DEBUG [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] Couldn't find any bindings for address=activemq.notifications on message=CoreMessage[messageID=40,size=267,properties=TypedProperties[_AMQ_NotifType=CLUSTER_CONNECTION_STOPPED,name=my-cluster,_AMQ_NotifTimestamp=1597203917506]]@1325505535
2020-08-12 09:15:17,506 DEBUG [org.apache.activemq.artemis.core.postoffice.impl.PostOfficeImpl] Message CoreMessage[messageID=40,_AMQ_NotifTimestamp=1597203917506]]@1325505535 is not going anywhere as it didn't have a binding on address:activemq.notifications
2020-08-12 09:15:17,507 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Stopping discovery. There's an exception just as a trace where it happened: java.lang.Exception: trace
        at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.stop(DiscoveryGroup.java:154) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.doClose(ServerLocatorImpl.java:1346) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.close(ServerLocatorImpl.java:1320) [artemis-core-client-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl.closeLocator(ClusterConnectionImpl.java:467) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl.access$300(ClusterConnectionImpl.java:77) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.core.server.cluster.impl.ClusterConnectionImpl$1.run(ClusterConnectionImpl.java:452) [artemis-server-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:42) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.OrderedExecutor.doTask(OrderedExecutor.java:31) [artemis-commons-2.14.0.jar:2.14.0]
        at org.apache.activemq.artemis.utils.actors.ProcessorBase.executePendingTasks(ProcessorBase.java:65) [artemis-commons-2.14.0.jar:2.14.0]
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [java.base:]
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [java.base:]
        at org.apache.activemq.artemis.utils.ActiveMQThreadFactory$1.run(ActiveMQThreadFactory.java:118) [artemis-commons-2.14.0.jar:2.14.0]

2020-08-12 09:15:17,511 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] endpoing closed
2020-08-12 09:15:17,523 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] receiving 65535
2020-08-12 09:15:17,523 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Received nodeID 8108eab2-dbff-11ea-83e1-eeeeeeeeeeee with originatingID =
2020-08-12 09:15:17,523 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Received 0 discovery entry elements
2020-08-12 09:15:17,523 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] changed = false

更新

记录网络不正常消息

 2020-08-12 09:15:17,stopping service ActiveMQServerImpl::serverUUID=8108eab2-dbff-11ea-83e1-eeeeeeeeeeee
 2020-08-12 09:15:17,440 DEBUG [org.apache.activemq.artemis.core.cluster.DiscoveryGroup] Stopping discovery. There's an exception just as a trace where it happened: java.lang.Exception: trace
    at org.apache.activemq.artemis.core.cluster.DiscoveryGroup.stop(DiscoveryGroup.java:154) [artemis-core-client-2.14.0.jar:2.14.0]
    at org.apache.activemq.artemis.core.client.impl.ServerLocatorImpl.doClose(ServerLocatorImpl.java:1346) [artemis-core-client-2.14.0.jar:2.14.0]

更新2

TRACE的日志org.apache.activemq.artemis.core.server.NetworkHealthCheck

2020-08-13 22:37:31,453 INFO  [org.apache.activemq.artemis] AMQ241001: HTTP Server started at http://10.30.122.241:8161
2020-08-13 22:37:31,454 INFO  [org.apache.activemq.artemis] AMQ241002: Artemis Jolokia REST API available at http://10.30.122.241:8161/console/jolokia
2020-08-13 22:37:31,454 INFO  [org.apache.activemq.artemis] AMQ241004: Artemis Console available at http://10.30.122.241:8161/console
2020-08-13 22:37:33,806 TRACE [org.apache.activemq.artemis.core.server.NetworkHealthCheck] purePing on canonical address 10.30.12.20
2020-08-13 22:37:33,807 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] executing ping:: ping -c 1 -t 1 10.30.12.20
2020-08-13 22:37:33,828 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] PING 10.30.12.20 (10.30.12.20) 56(84) bytes of data.
2020-08-13 22:37:33,828 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] From 10.30.122.253 icmp_seq=1 Time to live exceeded
2020-08-13 22:37:33,828 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:33,828 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] --- 10.30.12.20 ping statistics ---
2020-08-13 22:37:33,828 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] 1 packets transmitted,0 received,+1 errors,100% packet loss,time 0ms
2020-08-13 22:37:33,829 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:33,829 TRACE [org.apache.activemq.artemis.core.server.NetworkHealthCheck] purePing on canonical address 10.30.12.21
2020-08-13 22:37:33,830 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] executing ping:: ping -c 1 -t 1 10.30.12.21
2020-08-13 22:37:33,835 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] PING 10.30.12.21 (10.30.12.21) 56(84) bytes of data.
2020-08-13 22:37:33,836 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] From 10.30.122.253 icmp_seq=1 Time to live exceeded
2020-08-13 22:37:33,836 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:33,836 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] --- 10.30.12.21 ping statistics ---
2020-08-13 22:37:33,836 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] 1 packets transmitted,838 INFO  [org.apache.activemq.artemis.logs] AMQ201001: Network is unhealthy,stopping service ActiveMQServerImpl::serverUUID=45ae05a9-dd86-11ea-8ee5-eeeeeeeeeeee
2020-08-13 22:37:33,875 INFO  [org.apache.activemq.artemis.core.server] AMQ221029: stopped bridge $.artemis.internal.sf.my-cluster.869c8d37-dc5c-11ea-aa39-eeeeeeeeeeee
2020-08-13 22:37:34,060 INFO  [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 2.14.0 [45ae05a9-dd86-11ea-8ee5-eeeeeeeeeeee] stopped,uptime 11.724 seconds
2020-08-13 22:37:39,061 TRACE [org.apache.activemq.artemis.core.server.NetworkHealthCheck] purePing on canonical address 10.30.12.20
2020-08-13 22:37:39,061 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] executing ping:: ping -c 1 -t 1 10.30.12.20
2020-08-13 22:37:39,065 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] PING 10.30.12.20 (10.30.12.20) 56(84) bytes of data.
2020-08-13 22:37:39,066 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] From 10.30.122.253 icmp_seq=1 Time to live exceeded
2020-08-13 22:37:39,066 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:39,066 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] --- 10.30.12.20 ping statistics ---
2020-08-13 22:37:39,068 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] 1 packets transmitted,time 0ms
2020-08-13 22:37:39,068 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:39,068 TRACE [org.apache.activemq.artemis.core.server.NetworkHealthCheck] purePing on canonical address 10.30.12.21
2020-08-13 22:37:39,069 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] executing ping:: ping -c 1 -t 1 10.30.12.21
2020-08-13 22:37:39,075 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] PING 10.30.12.21 (10.30.12.21) 56(84) bytes of data.
2020-08-13 22:37:39,075 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] From 10.30.122.253 icmp_seq=1 Time to live exceeded
2020-08-13 22:37:39,075 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck]
2020-08-13 22:37:39,075 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] --- 10.30.12.21 ping statistics ---
2020-08-13 22:37:39,075 DEBUG [org.apache.activemq.artemis.core.server.NetworkHealthCheck] 1 packets transmitted,time 0ms

ping地址有1跳。

traceroute to 10.30.12.21 (10.30.12.21),30 hops max,60 byte packets
  1  10.30.122.253 (10.30.122.253)  1.472 ms  1.515 ms  1.527 ms
  2  10.30.12.21 (10.30.12.21)  0.282 ms  0.262 ms  0.231 ms

解决方法

我怀疑您使用的ping命令(即ping -c 1 -t %d %s)对于您的操作系统或用例实际上无效。要确认这一点,您需要为TRACE激活org.apache.activemq.artemis.core.server.NetworkHealthCheck日志记录,重现问题,然后检查日志。有关如何为特定记录器启用TRACE记录的详细信息,请参见this question

一旦有了日志,就很容易看到实际的问题了。之后,您可以根据需要调整ping命令。

相关问答

错误1:Request method ‘DELETE‘ not supported 错误还原:...
错误1:启动docker镜像时报错:Error response from daemon:...
错误1:private field ‘xxx‘ is never assigned 按Alt...
报错如下,通过源不能下载,最后警告pip需升级版本 Requirem...