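MariaDB ColumnStore not starting with CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO

Problem description

We set up MariaDB ColumnStore on a cluster of 2 EC2 instances on AWS. It suddenly stopped working, with the status shown below. Nobody is able to log in to the database.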

mcsadmin getsystemstatus

System and Module statuses

Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        ACTIVE                       Mon Jul  2 12:48:50 2021

Module um1    DEGRADE                      Mon Jul  2 12:48:46 2021
Module pm1    ACTIVE                       Mon Jul  2 12:49:00 2021

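We shut down the database and restarted it, after which we got the status below. We also stopped the database and rebooted the EC2 instance on which UM1 is installed, but the status is still the same.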

mcsadmin getsystemstatus

System and Module statuses

Component     Status                       Last Status Change
------------  --------------------------   ------------------------
System        Failed                       Mon Jul  2 12:48:50 2021

Module um1    Failed                       Mon Jul  2 12:48:46 2021
Module pm1    ACTIVE                       Mon Jul  2 12:49:00 2021

The log file shows the following errors:

cat debug.log
Jul  5 12:46:46 ip-172-16-10-27 ProcessMonitor[1340]: 46.697394 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
Jul  5 12:46:46 ip-172-16-10-27 ProcessMonitor[1340]: 46.697776 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr,STATUS_UPDATE only performed
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.698421 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700241 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700590 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 0 PID = 0
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700705 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.703311 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DMLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.713661 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.713814 |0|0|0| D 18 CAL0000: STOPPING Process: DDLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.714097 |0|0|0| D 18 CAL0000: StatusUpdate of Process DDLProc State = 0 PID = 0
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.714255 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DDLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.716726 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DDLProc
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726181 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DDLProc\*
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726308 |0|0|0| D 18 CAL0000: STOPPING Process: ExeMgr
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726504 |0|0|0| D 18 CAL0000: StatusUpdate of Process ExeMgr State = 0 PID = 0
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726747 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ExeMgr
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.729845 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ExeMgr
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.737917 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ExeMgr\*
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738046 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMWorkerNode
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738268 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMWorkerNode State = 0 PID = 0
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738478 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMWorkerNode
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.741687 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMWorkerNode
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750193 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: workernode\*
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750320 |0|0|0| D 18 CAL0000: STOPPING Process: ServerMonitor
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750523 |0|0|0| D 18 CAL0000: StatusUpdate of Process ServerMonitor State = 0 PID = 0
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750747 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ServerMonitor
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.754830 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ServerMonitor
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.761855 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ServerMonitor\*
Jul  5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.797456 |0|0|0| D 18 CAL0000: BRM reset_locks script run
Jul  5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.756148 |0|0|0| D 18 CAL0000: Successfully ran DBRM clearShm
Jul  5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.843432 |0|0|0| D 18 CAL0000: Stop MysqL Process
Jul  5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.844815 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr,return status = 0
Jul  5 12:46:57 ip-172-16-10-27 ProcessMonitor[1340]: 57.584789 |0|0|0| I 18 CAL0000: MSG RECEIVED: Shutdown Module request...
Jul  5 12:46:57 ip-172-16-10-27 ProcessMonitor[1340]: 57.584894 |0|0|0| I 18 CAL0000: SHUTDOWNMODULE: ACK back to ProcMgr,return status = 0
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.318944 |0|0|0| I 18 CAL0000:
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319014 |0|0|0| I 18 CAL0000: **********Process Monitor Started**********
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319042 |0|0|0| D 18 CAL0000:
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319066 |0|0|0| D 18 CAL0000: **********Process Monitor Started**********
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.323329 |0|0|0| D 18 CAL0000: Message Thread started ..
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324161 |0|0|0| D 18 CAL0000: Cloud setting = amazon-vpc
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324244 |0|0|0| D 18 CAL0000: PORTS: um1_ProcessMonitor/8800
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324817 |0|0|0| D 18 CAL0000: amazonIPCheck function called
Jul  5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.330280 |0|0|0| D 18 CAL0000: getEC2InstanceIpAddress called to get status for Module 'um1' / Instance i-06c95252257960c5e
Jul  5 12:47:43 ip-172-16-10-27 ProcessMonitor[30970]: 43.699670 |0|0|0| D 18 CAL0000: Module is Running: 'um1' / Instance 'i-06c95252257960c5e' current IP didn't change.
Jul  5 12:47:43 ip-172-16-10-27 ProcessMonitor[30970]: 43.699754 |0|0|0| D 18 CAL0000: getEC2InstanceIpAddress called to get status for Module 'pm1' / Instance i-07c64d7c545727c0b
Jul  5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.047175 |0|0|0| D 18 CAL0000: Module is Running: 'pm1' / Instance 'i-07c64d7c545727c0b' current IP didn't change.
Jul  5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.047255 |0|0|0| D 18 CAL0000: amazonIPCheck function successfully completed
Jul  5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.048717 |0|0|0| D 18 CAL0000: createDataDirs called
Jul  5 12:47:55 ip-172-16-10-27 ProcessMonitor[30970]: 55.071178 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul  5 12:48:06 ip-172-16-10-27 ProcessMonitor[30970]: 06.092980 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul  5 12:48:17 ip-172-16-10-27 ProcessMonitor[30970]: 17.142692 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul  5 12:48:25 ip-172-16-10-27 ProcessMonitor[30970]: 25.672306 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:25 ip-172-16-10-27 ProcessMonitor[30970]: 25.672802 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:26 ip-172-16-10-27 ProcessMonitor[30970]: 26.674408 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:26 ip-172-16-10-27 ProcessMonitor[30970]: 26.674851 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.675776 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.675946 |0|0|0| D 18 CAL0000: Successfull return from distributeConfigFile
Jul  5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.676099 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.678249 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.678585 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680066 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680341 |0|0|0| D 18 CAL0000: Successfull return from distributeProcessFile
Jul  5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680407 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.684042 |0|0|0| D 18 CAL0000: StatusUpdate of Process ProcessMonitor State = 1 PID = 30970
Jul  5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.685284 |0|0|0| D 18 CAL0000: MysqLd Monitoring Thread started ..
Jul  5 12:48:30 ip-172-16-10-27 ProcessMonitor[30970]: 30.681526 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:30 ip-172-16-10-27 ProcessMonitor[30970]: 30.681966 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:31 ip-172-16-10-27 ProcessMonitor[30970]: 31.682826 |0|0|0| I 18 CAL0000: MSG RECEIVED: Configure Module
Jul  5 12:48:31 ip-172-16-10-27 ProcessMonitor[30970]: 31.686100 |0|0|0| I 18 CAL0000: CONfigURE: ACK back to ProcMgr,return status = 0
Jul  5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.715876 |0|0|0| D 18 CAL0000: SYstem STATUS = 9
Jul  5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.716466 |0|0|0| D 18 CAL0000: Child Process Monitoring Thread started ..
Jul  5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.718262 |0|0|0| D 18 CAL0000: processInitComplete Successfully Called
Jul  5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.746537 |0|0|0| I 18 CAL0000: MSG RECEIVED: Get Calpont Software Info
Jul  5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.746642 |0|0|0| I 18 CAL0000: GETSOFTWAREINFO: ACK back to ProcMgr with 1.2.21
Jul  5 12:48:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.747287 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul  5 12:48:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.747666 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul  5 12:48:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.757970 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start All process request...
Jul  5 12:48:46 ip-172-16-10-27 oamcpp[30970]: 46.848238 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.848362 |0|0|0| C 18 CAL0000: STARTALL: MysqL Failed to start,start-module failure
Jul  5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.849567 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr,return status = 1
Jul  5 13:07:32 ip-172-16-10-27 ProcessMonitor[30970]: 32.151395 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul  5 13:07:33 ip-172-16-10-27 oamcpp[30970]: 33.391268 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 13:07:33 ip-172-16-10-27 ProcessMonitor[30970]: 33.391440 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul  5 13:08:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.464215 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul  5 13:08:46 ip-172-16-10-27 oamcpp[30970]: 46.696043 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 13:08:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.696223 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul  5 13:11:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.103421 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul  5 13:11:29 ip-172-16-10-27 oamcpp[30970]: 29.342288 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 13:11:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.342473 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul  5 13:11:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.393199 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start process request on: MysqLd
Jul  5 13:11:40 ip-172-16-10-27 oamcpp[30970]: 40.480796 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 13:11:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.480996 |0|0|0| I 18 CAL0000: START: ACK back to ProcMgr,return status = 0
Jul  5 13:12:03 ip-172-16-10-27 ProcessMonitor[30970]: 03.279250 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start process request on: MysqLd
Jul  5 13:12:04 ip-172-16-10-27 oamcpp[30970]: 04.368315 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO
Jul  5 13:12:04 ip-172-16-10-27 ProcessMonitor[30970]: 04.368507 |0|0|0| I 18 CAL0000: START: ACK back to ProcMgr,return status = 0
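
Most of the excerpt above is informational or debug traffic, so it may help to filter it down to the lines that report a failure. The following is a minimal sketch, not part of ColumnStore: it assumes the layout seen in this excerpt (process name and PID, a seconds field, `|0|0|0|`, a single-letter level, a numeric subsystem, then `CAL0000:`), and it assumes that `E` and `C` are the levels worth looking at. The names `LINE_RE` and `problem_lines` are hypothetical.

```python
import re

# Layout inferred from the debug.log excerpt (an assumption, not a documented format):
# "<syslog prefix> <proc>[<pid>]: <secs> |n|n|n| <LEVEL> <subsys> CAL<nnnn>: <message>"
LINE_RE = re.compile(
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: \S+ \|\d+\|\d+\|\d+\| "
    r"(?P<sev>[A-Z]) \d+ CAL\d+: ?(?P<msg>.*)$"
)


def problem_lines(lines, severities=("E", "C")):
    """Yield (process, level, message) for lines whose level is in `severities`."""
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group("sev") in severities:
            yield m.group("proc"), m.group("sev"), m.group("msg")


# Three lines taken from the excerpt, using the error text as it appears in the title.
sample = [
    "Jul  5 12:48:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.757970 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start All process request...",
    "Jul  5 12:48:46 ip-172-16-10-27 oamcpp[30970]: 46.848238 |0|0|0| E 08 CAL0000: ***mysql.pid FILE SIZE EQUALS ZERO",
    "Jul  5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.849567 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr,return status = 1",
]

for proc, sev, msg in problem_lines(sample):
    print(proc, sev, msg)
```

To run it against the real file, pass an open file object instead of `sample`, for example `problem_lines(open("debug.log"))`.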

The system info gives the following status:

MariaDB ColumnStore Process statuses

Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      um1       ACTIVE            Mon Jul  5 12:48:39 2021       30970
ServerMonitor       um1       INITIAL
DBRMWorkerNode      um1       INITIAL
ExeMgr              um1       INITIAL
DDLProc             um1       INITIAL
DMLProc             um1       INITIAL
MysqLd              um1       Failed            Mon Jul  5 12:48:46 2021

ProcessMonitor      pm1       ACTIVE            Mon Jul  5 12:48:18 2021        2941
ProcessManager      pm1       ACTIVE            Mon Jul  5 12:48:25 2021        3200
DBRMControllerNode  pm1       ACTIVE            Mon Jul  5 12:48:55 2021        3694
ServerMonitor       pm1       ACTIVE            Mon Jul  5 12:48:57 2021        3723
DBRMWorkerNode      pm1       ACTIVE            Mon Jul  5 12:48:57 2021        3751
PrimProc            pm1       ACTIVE            Mon Jul  5 12:49:01 2021        3824
WriteEngineserver   pm1       ACTIVE            Mon Jul  5 12:49:02 2021        3856

Active Alarm Counts: Critical = 2,Major = 1,Minor = 5,Warning = 0,Info = 0
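
To see at a glance which processes are not up, the status table above can be reduced to its non-ACTIVE rows. This is a rough sketch under the assumption that each data row starts with a process name, a module name of the form `um<N>` or `pm<N>`, and a status word, as in the output shown here; `non_active` is a hypothetical helper, not an mcsadmin feature.

```python
def non_active(status_text):
    """Return (process, module, status) for table rows whose status is not ACTIVE."""
    rows = []
    for line in status_text.splitlines():
        parts = line.split()
        if len(parts) < 3:
            continue
        module = parts[1]
        # Only data rows have a module such as um1 or pm1 in the second column;
        # this skips the header, the dashed separator and the alarm summary.
        if module[:2] in ("um", "pm") and module[2:].isdigit():
            if parts[2].upper() != "ACTIVE":
                rows.append((parts[0], module, parts[2]))
    return rows


# A few rows copied from the process status output in this post.
sample_status = """\
Process             Module    Status            Last Status Change        Process ID
------------------  ------    ---------------   ------------------------  ----------
ProcessMonitor      um1       ACTIVE            Mon Jul  5 12:48:39 2021       30970
DMLProc             um1       INITIAL
MysqLd              um1       Failed            Mon Jul  5 12:48:46 2021
PrimProc            pm1       ACTIVE            Mon Jul  5 12:49:01 2021        3824
Active Alarm Counts: Critical = 2,Major = 1,Minor = 5,Warning = 0,Info = 0
"""

for row in non_active(sample_status):
    print(row)
```

On the output in this post, every non-ACTIVE row belongs to um1, which matches the module status reported earlier.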

Can someone help resolve this issue?

I have checked this link (MariaDB Columnstore not starting when innodb uncommented in my.cnf), and the lines mentioned there are already commented out in my.cnf.

Thanks,
Ganesh
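
For anyone who wants to repeat that check mechanically, here is a small sketch that lists the lines of a my.cnf that mention InnoDB and are not commented out. Which exact lines the linked question refers to is not stated in this post, so matching on the substring `innodb` is an assumption, and `uncommented_innodb_lines` is a hypothetical helper name. The sample text is made up for illustration.

```python
def uncommented_innodb_lines(cnf_text):
    """Return (line_number, line) for active lines that mention innodb."""
    hits = []
    for number, raw in enumerate(cnf_text.splitlines(), start=1):
        line = raw.strip()
        # Option files treat lines starting with '#' or ';' as comments.
        if not line or line.startswith(("#", ";")):
            continue
        if "innodb" in line.lower():
            hits.append((number, line))
    return hits


# Illustrative content only, not the poster's actual configuration.
sample_cnf = """\
[mysqld]
#innodb_buffer_pool_size=1G
innodb_file_per_table=1
; innodb note
port=3306
"""

print(uncommented_innodb_lines(sample_cnf))
```

An empty result for the real file would be consistent with the statement that the relevant lines are commented out.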

Solution

No working solution to this problem has been found yet.
