MariaDB HAProxy deadlock

Problem description

I'm trying to run Zabbix with HAProxy, Keepalived, MariaDB, and a Galera cluster. Can you help me?

  • 3 MariaDB cluster database servers.
  • 2 Keepalived and HAProxy servers.
  • 1 Zabbix server.

Logs:

Error in query [COMMIT] [Deadlock found when trying to get lock; try restarting transaction]

log2:

Sep 28 13:59:41 db2 mysqld[18273]: 2020-09-28 13:59:41 101 [Warning] Aborted connection 101 to db: 'mydb' user: 'myuser' host: '192.168.1.107' (Got an error reading communication packets)

log3:

Sep 28 14:01:58 hake haproxy[16785]: 192.168.1.103:37036 [28/Sep/2020:14:01:58.229] galera_cluster_frontend galera_cluster_backend/db2 1/1/8 295 -- 14/14/13/4/0 0/0
Sep 28 14:01:59 hake haproxy[16785]: 192.168.1.103:36960 [28/Sep/2020:14:01:09.283] galera_cluster_frontend galera_cluster_backend/db1 1/0/50008 364 cD 13/13/12/4/0 0/0
Sep 28 14:01:59 hake haproxy[16785]: 192.168.1.103:37038 [28/Sep/2020:14:01:59.492] galera_cluster_frontend galera_cluster_backend/db3 1/0/27 2550 -- 14/14/13/4/0 0/0

log4:

Sep 28 14:05:29 hake Keepalived_vrrp[19536]: opening file '/etc/keepalived/keepalived.conf'.
Sep 28 14:05:29 hake Keepalived_vrrp[19536]: WARNING - default user 'keepalived_script' for script execution does not exist - please create.
Sep 28 14:05:29 hake Keepalived_vrrp[19536]: Truncating auth_pass to 8 characters
Sep 28 14:05:29 hake Keepalived_vrrp[19536]: Security VIOLATION - scripts are being executed but script_security not enabled.
Sep 28 14:05:29 hake Keepalived_vrrp[19536]: Using LinkWatch kernel netlink reflector...
Sep 28 14:05:29 hake Keepalived_vrrp[19536]: VRRP_Script(chk_haproxy) succeeded
Sep 28 14:05:30 hake Keepalived_vrrp[19536]: VRRP_Instance(LB_VIP) Transition to MASTER STATE
Sep 28 14:05:30 hake Keepalived_vrrp[19536]: VRRP_Instance(LB_VIP) Changing effective priority from 101 to 103
Sep 28 14:05:31 hake Keepalived_vrrp[19536]: VRRP_Instance(LB_VIP) Entering MASTER STATE
Sep 28 14:05:31 hake Keepalived_vrrp[19536]: SMTP connection ERROR to [127.0.0.1]:25.

Solution

The HAProxy log gives you a hint:

Sep 28 14:01:59 hake haproxy[16785]: 192.168.1.103:36960 [28/Sep/2020:14:01:09.283] galera_cluster_frontend galera_cluster_backend/db1 1/0/50008 364 cD 13/13/12/4/0 0/0

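As you can see, the session state at disconnect is reported as cD. According to the HAProxy documentation, this means:

The client did not send nor acknowledge any data for as long as the "timeout client" delay. This is often caused by network failures on the client side, or the client simply leaving the net uncleanly.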
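To read that log line field by field, here is a minimal Python sketch. It assumes HAProxy's default TCP log format, where the backend/server field is followed by the Tw/Tc/Tt timers (in milliseconds), the byte count, and the two-character termination state. The function name parse_tcp_log and the regular expression are illustrative, not part of HAProxy.

```python
import re

# The log line quoted in the answer, copied verbatim.
LINE = ("Sep 28 14:01:59 hake haproxy[16785]: 192.168.1.103:36960 "
        "[28/Sep/2020:14:01:09.283] galera_cluster_frontend "
        "galera_cluster_backend/db1 1/0/50008 364 cD 13/13/12/4/0 0/0")

# Assumption: default TCP log format, i.e.
#   frontend backend/server Tw/Tc/Tt bytes_read termination_state ...
PATTERN = re.compile(
    r"(?P<frontend>\S+) (?P<backend>[^/\s]+)/(?P<server>\S+) "
    r"(?P<tw>-?\d+)/(?P<tc>-?\d+)/(?P<tt>\+?\d+) "
    r"(?P<bytes>\+?\d+) (?P<state>\S\S) "
)

def parse_tcp_log(line):
    """Extract server, timers (ms), and termination state from one line."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a TCP-format HAProxy log line")
    return {
        "server": m.group("server"),
        "queue_ms": int(m.group("tw")),    # Tw: time spent waiting in queues
        "connect_ms": int(m.group("tc")),  # Tc: time to connect to the server
        "total_ms": int(m.group("tt")),    # Tt: total session duration
        "state": m.group("state"),         # first char: cause, second: phase
    }

entry = parse_tcp_log(LINE)
print(entry["server"], entry["total_ms"], entry["state"])
```

In the cD state, the first character (c) says the client-side timeout expired and the second (D) says the session was in the data phase. The total session time here is 50008 ms, just over 50 seconds, which would be consistent with a timeout client of about 50 seconds; the configuration was not posted, so that is only an inference.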

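So either your client really has gone away, or something changed on the network between the MySQL client and HAProxy, or (more likely) the timeout client setting in your HAProxy configuration is too short.

In that case, you can increase this timeout so that existing connections stay open longer. Don't set it too high, though: connections that really are dead or were aborted uncleanly should still be detected and reclaimed eventually.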

Especially in TCP mode, it is usually best to set timeout client and timeout server to the same value.
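As a rough way to check that advice against a configuration, the Python sketch below reads the timeout lines from a config fragment and compares the client and server values. The fragment and its values are hypothetical (the original haproxy.cfg was not posted) and are not a recommendation; the helper names to_ms and read_timeouts are made up for this example. It also ignores section boundaries, so a later timeout line simply overrides an earlier one.

```python
import re

# Hypothetical haproxy.cfg fragment; the values are illustrative only.
SAMPLE_CFG = """
defaults
    mode tcp
    timeout connect 5s
    timeout client  10m
    timeout server  10m
"""

# HAProxy time suffixes; a bare number is interpreted as milliseconds.
UNIT_MS = {"us": 0.001, "ms": 1, "s": 1000,
           "m": 60_000, "h": 3_600_000, "d": 86_400_000}

def to_ms(value):
    """Convert an HAProxy time value such as '50000' or '10m' to ms."""
    m = re.fullmatch(r"(\d+)(us|ms|s|m|h|d)?", value)
    if m is None:
        raise ValueError(f"unrecognised time value: {value!r}")
    return int(m.group(1)) * UNIT_MS[m.group(2) or "ms"]

def read_timeouts(cfg_text):
    """Collect 'timeout <name> <value>' lines, ignoring sections."""
    timeouts = {}
    for line in cfg_text.splitlines():
        parts = line.split("#", 1)[0].split()  # drop trailing comments
        if len(parts) == 3 and parts[0] == "timeout":
            timeouts[parts[1]] = to_ms(parts[2])
    return timeouts

timeouts = read_timeouts(SAMPLE_CFG)
matched = timeouts.get("client") == timeouts.get("server")
print(timeouts, "client/server match:", matched)
```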