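Problem description
We have MariaDB ColumnStore set up on a cluster of 2 EC2 instances in AWS. It suddenly stopped working, with the status shown below. Nobody is able to log in to the database.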
mcsadmin getsystemstatus
System and Module statuses
Component Status Last Status Change
------------ -------------------------- ------------------------
System ACTIVE Mon Jul 2 12:48:50 2021
Module um1 DEGRADE Mon Jul 2 12:48:46 2021
Module pm1 ACTIVE Mon Jul 2 12:49:00 2021
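We shut down the database and restarted it, after which we got the status below. We also stopped the database and rebooted the EC2 instance on which UM1 is installed, but the status is still the same: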
mcsadmin getsystemstatus
System and Module statuses
Component Status Last Status Change
------------ -------------------------- ------------------------
System Failed Mon Jul 2 12:48:50 2021
Module um1 Failed Mon Jul 2 12:48:46 2021
Module pm1 ACTIVE Mon Jul 2 12:49:00 2021
cat debug.log
Jul 5 12:46:46 ip-172-16-10-27 ProcessMonitor[1340]: 46.697394 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
Jul 5 12:46:46 ip-172-16-10-27 ProcessMonitor[1340]: 46.697776 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr,STATUS_UPDATE only performed
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.698421 |0|0|0| I 18 CAL0000: MSG RECEIVED: Stop All process request...
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700241 |0|0|0| D 18 CAL0000: STOPPING Process: DMLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700590 |0|0|0| D 18 CAL0000: StatusUpdate of Process DMLProc State = 0 PID = 0
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.700705 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DMLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.703311 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DMLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.713661 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DMLProc\*
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.713814 |0|0|0| D 18 CAL0000: STOPPING Process: DDLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.714097 |0|0|0| D 18 CAL0000: StatusUpdate of Process DDLProc State = 0 PID = 0
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.714255 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DDLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.716726 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DDLProc
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726181 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: DDLProc\*
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726308 |0|0|0| D 18 CAL0000: STOPPING Process: ExeMgr
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726504 |0|0|0| D 18 CAL0000: StatusUpdate of Process ExeMgr State = 0 PID = 0
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.726747 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ExeMgr
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.729845 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ExeMgr
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.737917 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ExeMgr\*
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738046 |0|0|0| D 18 CAL0000: STOPPING Process: DBRMWorkerNode
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738268 |0|0|0| D 18 CAL0000: StatusUpdate of Process DBRMWorkerNode State = 0 PID = 0
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.738478 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device DBRMWorkerNode
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.741687 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device DBRMWorkerNode
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750193 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: workernode\*
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750320 |0|0|0| D 18 CAL0000: STOPPING Process: ServerMonitor
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750523 |0|0|0| D 18 CAL0000: StatusUpdate of Process ServerMonitor State = 0 PID = 0
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.750747 |0|0|0| D 18 CAL0000: Send CLEAR Alarm ID 25 on device ServerMonitor
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.754830 |0|0|0| D 18 CAL0000: Send SET Alarm ID 21 on device ServerMonitor
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.761855 |0|0|0| D 18 CAL0000: Pkill Process just to make sure: ServerMonitor\*
Jul 5 12:46:47 ip-172-16-10-27 ProcessMonitor[1340]: 47.797456 |0|0|0| D 18 CAL0000: BRM reset_locks script run
Jul 5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.756148 |0|0|0| D 18 CAL0000: Successfully ran DBRM clearShm
Jul 5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.843432 |0|0|0| D 18 CAL0000: Stop MysqL Process
Jul 5 12:46:48 ip-172-16-10-27 ProcessMonitor[1340]: 48.844815 |0|0|0| I 18 CAL0000: STOPALL: ACK back to ProcMgr,return status = 0
Jul 5 12:46:57 ip-172-16-10-27 ProcessMonitor[1340]: 57.584789 |0|0|0| I 18 CAL0000: MSG RECEIVED: Shutdown Module request...
Jul 5 12:46:57 ip-172-16-10-27 ProcessMonitor[1340]: 57.584894 |0|0|0| I 18 CAL0000: SHUTDOWNMODULE: ACK back to ProcMgr,return status = 0
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.318944 |0|0|0| I 18 CAL0000:
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319014 |0|0|0| I 18 CAL0000: **********Process Monitor Started**********
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319042 |0|0|0| D 18 CAL0000:
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.319066 |0|0|0| D 18 CAL0000: **********Process Monitor Started**********
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.323329 |0|0|0| D 18 CAL0000: Message Thread started ..
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324161 |0|0|0| D 18 CAL0000: Cloud setting = amazon-vpc
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324244 |0|0|0| D 18 CAL0000: PORTS: um1_ProcessMonitor/8800
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.324817 |0|0|0| D 18 CAL0000: amazonIPCheck function called
Jul 5 12:47:42 ip-172-16-10-27 ProcessMonitor[30970]: 42.330280 |0|0|0| D 18 CAL0000: getEC2InstanceIpAddress called to get status for Module 'um1' / Instance i-06c95252257960c5e
Jul 5 12:47:43 ip-172-16-10-27 ProcessMonitor[30970]: 43.699670 |0|0|0| D 18 CAL0000: Module is Running: 'um1' / Instance 'i-06c95252257960c5e' current IP didn't change.
Jul 5 12:47:43 ip-172-16-10-27 ProcessMonitor[30970]: 43.699754 |0|0|0| D 18 CAL0000: getEC2InstanceIpAddress called to get status for Module 'pm1' / Instance i-07c64d7c545727c0b
Jul 5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.047175 |0|0|0| D 18 CAL0000: Module is Running: 'pm1' / Instance 'i-07c64d7c545727c0b' current IP didn't change.
Jul 5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.047255 |0|0|0| D 18 CAL0000: amazonIPCheck function successfully completed
Jul 5 12:47:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.048717 |0|0|0| D 18 CAL0000: createDataDirs called
Jul 5 12:47:55 ip-172-16-10-27 ProcessMonitor[30970]: 55.071178 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul 5 12:48:06 ip-172-16-10-27 ProcessMonitor[30970]: 06.092980 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul 5 12:48:17 ip-172-16-10-27 ProcessMonitor[30970]: 17.142692 |0|0|0| D 18 CAL0000: error return from distributeConfigFile,waiting for Active ProcMgr to start
Jul 5 12:48:25 ip-172-16-10-27 ProcessMonitor[30970]: 25.672306 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:25 ip-172-16-10-27 ProcessMonitor[30970]: 25.672802 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:26 ip-172-16-10-27 ProcessMonitor[30970]: 26.674408 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:26 ip-172-16-10-27 ProcessMonitor[30970]: 26.674851 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.675776 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.675946 |0|0|0| D 18 CAL0000: Successfull return from distributeConfigFile
Jul 5 12:48:27 ip-172-16-10-27 ProcessMonitor[30970]: 27.676099 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.678249 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.678585 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680066 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680341 |0|0|0| D 18 CAL0000: Successfull return from distributeProcessFile
Jul 5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.680407 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.684042 |0|0|0| D 18 CAL0000: StatusUpdate of Process ProcessMonitor State = 1 PID = 30970
Jul 5 12:48:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.685284 |0|0|0| D 18 CAL0000: MysqLd Monitoring Thread started ..
Jul 5 12:48:30 ip-172-16-10-27 ProcessMonitor[30970]: 30.681526 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:30 ip-172-16-10-27 ProcessMonitor[30970]: 30.681966 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:31 ip-172-16-10-27 ProcessMonitor[30970]: 31.682826 |0|0|0| I 18 CAL0000: MSG RECEIVED: Configure Module
Jul 5 12:48:31 ip-172-16-10-27 ProcessMonitor[30970]: 31.686100 |0|0|0| I 18 CAL0000: CONfigURE: ACK back to ProcMgr,return status = 0
Jul 5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.715876 |0|0|0| D 18 CAL0000: SYstem STATUS = 9
Jul 5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.716466 |0|0|0| D 18 CAL0000: Child Process Monitoring Thread started ..
Jul 5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.718262 |0|0|0| D 18 CAL0000: processInitComplete Successfully Called
Jul 5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.746537 |0|0|0| I 18 CAL0000: MSG RECEIVED: Get Calpont Software Info
Jul 5 12:48:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.746642 |0|0|0| I 18 CAL0000: GETSOFTWAREINFO: ACK back to ProcMgr with 1.2.21
Jul 5 12:48:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.747287 |0|0|0| I 18 CAL0000: MSG RECEIVED: Update Calpont Config file
Jul 5 12:48:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.747666 |0|0|0| I 18 CAL0000: UPDATECONfigFILE: Completed
Jul 5 12:48:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.757970 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start All process request...
Jul 5 12:48:46 ip-172-16-10-27 oamcpp[30970]: 46.848238 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.848362 |0|0|0| C 18 CAL0000: STARTALL: MysqL Failed to start,start-module failure
Jul 5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.849567 |0|0|0| I 18 CAL0000: STARTALL: ACK back to ProcMgr,return status = 1
Jul 5 13:07:32 ip-172-16-10-27 ProcessMonitor[30970]: 32.151395 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul 5 13:07:33 ip-172-16-10-27 oamcpp[30970]: 33.391268 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 13:07:33 ip-172-16-10-27 ProcessMonitor[30970]: 33.391440 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul 5 13:08:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.464215 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul 5 13:08:46 ip-172-16-10-27 oamcpp[30970]: 46.696043 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 13:08:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.696223 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul 5 13:11:28 ip-172-16-10-27 ProcessMonitor[30970]: 28.103421 |0|0|0| I 18 CAL0000: MSG RECEIVED: Restart process request on MysqLd
Jul 5 13:11:29 ip-172-16-10-27 oamcpp[30970]: 29.342288 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 13:11:29 ip-172-16-10-27 ProcessMonitor[30970]: 29.342473 |0|0|0| I 18 CAL0000: RESTART: ACK back to ProcMgr,return status = 0
Jul 5 13:11:39 ip-172-16-10-27 ProcessMonitor[30970]: 39.393199 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start process request on: MysqLd
Jul 5 13:11:40 ip-172-16-10-27 oamcpp[30970]: 40.480796 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 13:11:40 ip-172-16-10-27 ProcessMonitor[30970]: 40.480996 |0|0|0| I 18 CAL0000: START: ACK back to ProcMgr,return status = 0
Jul 5 13:12:03 ip-172-16-10-27 ProcessMonitor[30970]: 03.279250 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start process request on: MysqLd
Jul 5 13:12:04 ip-172-16-10-27 oamcpp[30970]: 04.368315 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO
Jul 5 13:12:04 ip-172-16-10-27 ProcessMonitor[30970]: 04.368507 |0|0|0| I 18 CAL0000: START: ACK back to ProcMgr,return status = 0
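Most of the log above is routine I and D traffic, so the few lines that matter are easy to miss. Below is a minimal sketch of a filter that keeps only the E and C lines. It is not a ColumnStore tool: the regular expression is inferred from the lines pasted above, the reading of the one-letter field as a severity (E for error, C for critical) is an assumption, and `filter_severe` is a hypothetical helper name.

```python
import re

# Pattern inferred from the debug.log lines above. Treating the single
# letter before the two-digit number as a severity is an assumption.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]:\s+"
    r"[\d.]+\s+\|\d+\|\d+\|\d+\|\s+"
    r"(?P<sev>[A-Z])\s+\d+\s+CAL\d+:\s*"
    r"(?P<msg>.*)$"
)


def filter_severe(lines, severities=("E", "C")):
    """Return (timestamp, process, severity, message) for matching lines."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("sev") in severities:
            hits.append(
                (m.group("ts"), m.group("proc"), m.group("sev"), m.group("msg"))
            )
    return hits


# Three lines copied from the log above.
sample = [
    "Jul 5 12:48:45 ip-172-16-10-27 ProcessMonitor[30970]: 45.757970 |0|0|0| I 18 CAL0000: MSG RECEIVED: Start All process request...",
    "Jul 5 12:48:46 ip-172-16-10-27 oamcpp[30970]: 46.848238 |0|0|0| E 08 CAL0000: ***MysqL.pid FILE SIZE EQUALS ZERO",
    "Jul 5 12:48:46 ip-172-16-10-27 ProcessMonitor[30970]: 46.848362 |0|0|0| C 18 CAL0000: STARTALL: MysqL Failed to start,start-module failure",
]

for ts, proc, sev, msg in filter_severe(sample):
    print(ts, proc, sev, msg)
```

To run it against the real file, pass `open("debug.log")` instead of `sample`. On the log above it would leave only the pid-file error and the STARTALL failure.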
The process status output shows the following:
MariaDB ColumnStore Process statuses
Process Module Status Last Status Change Process ID
------------------ ------ --------------- ------------------------ ----------
ProcessMonitor um1 ACTIVE Mon Jul 5 12:48:39 2021 30970
ServerMonitor um1 INITIAL
DBRMWorkerNode um1 INITIAL
ExeMgr um1 INITIAL
DDLProc um1 INITIAL
DMLProc um1 INITIAL
MysqLd um1 Failed Mon Jul 5 12:48:46 2021
ProcessMonitor pm1 ACTIVE Mon Jul 5 12:48:18 2021 2941
ProcessManager pm1 ACTIVE Mon Jul 5 12:48:25 2021 3200
DBRMControllerNode pm1 ACTIVE Mon Jul 5 12:48:55 2021 3694
ServerMonitor pm1 ACTIVE Mon Jul 5 12:48:57 2021 3723
DBRMWorkerNode pm1 ACTIVE Mon Jul 5 12:48:57 2021 3751
PrimProc pm1 ACTIVE Mon Jul 5 12:49:01 2021 3824
WriteEngineserver pm1 ACTIVE Mon Jul 5 12:49:02 2021 3856
Active Alarm Counts: Critical = 2,Major = 1,Minor = 5,Warning = 0,Info = 0
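To see at a glance which processes are not up, the table above can be reduced to its non-ACTIVE rows. The following is a rough sketch only: it assumes each data row starts with the process name, the module name, and the status, and that module names look like `um1` or `pm1` as in this output. `not_active` is a hypothetical helper, not part of mcsadmin.

```python
import re

# Assumption: module names follow the um<N> / pm<N> pattern seen above.
MODULE_RE = re.compile(r"(um|pm)\d+")


def not_active(status_lines):
    """Return (process, module, status) for rows whose status is not ACTIVE."""
    rows = []
    for line in status_lines:
        fields = line.split()
        # Skip titles, the header, separators, and the alarm summary line.
        if len(fields) < 3 or not MODULE_RE.fullmatch(fields[1]):
            continue
        process, module, status = fields[0], fields[1], fields[2]
        if status.upper() != "ACTIVE":
            rows.append((process, module, status))
    return rows


# A few rows copied from the status output above.
sample = [
    "Process Module Status Last Status Change Process ID",
    "ProcessMonitor um1 ACTIVE Mon Jul 5 12:48:39 2021 30970",
    "ServerMonitor um1 INITIAL",
    "MysqLd um1 Failed Mon Jul 5 12:48:46 2021",
    "PrimProc pm1 ACTIVE Mon Jul 5 12:49:01 2021 3824",
]

for process, module, status in not_active(sample):
    print(process, module, status)
```

For the full table above this would list the five um1 processes still in INITIAL plus the failed MysqLd entry, which matches what the log suggests: the um1 start sequence stops at the MysqLd step.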
Can anyone help resolve this issue?
I have checked this link (MariaDB Columnstore not starting when innodb uncommented in my.cnf), and the lines it mentions are already commented out in "my.cnf".
Thanks, Ganesh