Problem description
Operating system: SUSE Linux Enterprise Server 12 SP3
Bash version: 4.3.42(1)-release (x86_64-suse-linux-gnu)
bash RPM version: bash-4.3-82.1.x86_64
node1:/var/log # rpm -q bash
bash-4.3-82.1.x86_64
node2:/var/log # bash --version
GNU bash, version 4.3.42(1)-release (x86_64-suse-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
node2:/var/log # lsb_release -a
LSB Version: n/a
Distributor ID: SUSE
Description: SUSE Linux Enterprise Server 12 SP3
Release: 12.3
Codename: n/a
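I have a service named ce1800v. Its main process is ce1800vd.sh.
Every second, it runs the command cat /proc/${process_id}/stat.
If the command's exit status is non-zero, or the process state is Z (zombie), I restart the service.
Here is part of the code: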
process_info=$(cat "/proc/${process_id}/stat" 2>/dev/null)
if [ $? -ne 0 ]; then
    error_info="Get process ${process_id} info Failed."
else
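For comparison, here is a minimal sketch of the same check written without cat or command substitution. It is not the original script: the function name get_process_state, the global variable process_state, and the optional proc_root argument are all hypothetical. It relies on the read builtin, so bash does not fork a child for the check. Whether that avoids the wait_for message on this bash build is an assumption that would need testing.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script).
# On success, stores the one-letter process state in the global
# variable process_state and returns 0. Returns 1 if the stat file
# cannot be read. proc_root is an assumed optional argument that
# defaults to /proc.
get_process_state() {
    local pid=$1 proc_root=${2:-/proc}
    local stat_line rest
    process_state=""

    # read is a shell builtin, so no child process is forked here.
    { IFS= read -r stat_line < "${proc_root}/${pid}/stat"; } 2>/dev/null || return 1

    # The command name (field 2) is wrapped in parentheses and may itself
    # contain spaces or ")", so drop everything up to the last ") ".
    rest=${stat_line##*) }

    # The state is the first field after the command name.
    process_state=${rest%% *}
}
```

It could be called as get_process_state "${process_id}", with a non-zero return corresponding to the "info Failed" branch and process_state equal to Z corresponding to the zombie case.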
However, for no apparent reason, the check command fails with this output (in /var/log/messages):
/usr/ce1800v/bin/ce1800vd.sh: line 315: wait_for: No record of process 30388
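I searched a lot and found that bash 4.3.42 has a bug with the same symptom, but with a different trigger.
I have not set the lastpipe option.
The problem occurs almost every day, but with no regular pattern.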
node2:/var/log # zgrep "wait_for" messages-202103*
messages-20210309.xz:2021-03-08T23:44:49.798309+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 35598
messages-20210310.xz:2021-03-09T11:51:31.799274+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 10211
messages-20210310.xz:2021-03-09T20:20:29.798511+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 1280: wait_for: No record of process 50678
messages-20210311.xz:2021-03-10T02:29:22.119447+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 9656
messages-20210311.xz:2021-03-10T11:14:25.802595+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 18307
messages-20210311.xz:2021-03-10T12:43:49.806703+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 525: wait_for: No record of process 20453
messages-20210311.xz:2021-03-10T13:06:15.804019+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 39166
messages-20210311.xz:2021-03-10T17:11:51.672758+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 25945
messages-20210311.xz:2021-03-10T19:07:41.514515+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 1280: wait_for: No record of process 55102
messages-20210311.xz:2021-03-10T21:22:01.810217+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 31511
messages-20210311.xz:2021-03-10T23:41:52.429946+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 526: wait_for: No record of process 42884
messages-20210312.xz:2021-03-11T00:26:53.845105+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 18142
messages-20210312.xz:2021-03-11T05:23:53.346956+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 31932
messages-20210312.xz:2021-03-11T08:01:19.918593+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 1280: wait_for: No record of process 39101
messages-20210312.xz:2021-03-11T09:02:23.804787+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 46670
messages-20210312.xz:2021-03-11T10:57:43.229498+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 526: wait_for: No record of process 2109
messages-20210312.xz:2021-03-11T12:10:57.874531+08:00 node2 ce1800vd.sh[3290]: /usr/ce1800v/bin/ce1800vd.sh: line 462: wait_for: No record of process 35847
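To see whether the failures cluster around particular places in the script, the hits can be tallied by script line number. The following is only a sketch; the function name count_wait_for_by_line is hypothetical, and it assumes the message format shown in the log lines above.

```shell
#!/bin/bash
# Hypothetical helper: reads log lines on stdin and prints
# "<script line number> <count>" for every wait_for message,
# sorted by script line number.
count_wait_for_by_line() {
    sed -n 's/.*: line \([0-9][0-9]*\): wait_for: No record of process .*/\1/p' |
        sort -n |
        uniq -c |
        awk '{ print $2, $1 }'
}
```

It can be fed from the same search, for example: zgrep "wait_for" messages-202103* | count_wait_for_by_line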
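I tried upgrading bash on my CentOS machine from 4.2.46(2)-release (x86_64-redhat-linux-gnu) to 4.4.0(1)-release (x86_64-unknown-linux-gnu), but I could not reproduce the problem there. So I think there may be some other cause.
I am confused now and do not know how to continue the analysis.
I would like to know why this happens.
Thanks!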
Solution
Operating system: SUSE Linux Enterprise Server 12 SP3
Bash version: 4.3.42(1)-release (x86_64-suse-linux-gnu)
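It looks like you are using a rather old bash RPM package. Please check the installed RPM with
rpm -q bash
The initial SLES 12 SP3 release shipped 4.3-82.1.
The latest version in the update channel is 4.3-83.23.1. That version reports itself as
GNU bash, version 4.3.48(1)-release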
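Please first make sure you have a valid subscription and the update repositories attached, then run
zypper -v up bash
By the way, if even your bash package is outdated, your system is probably missing more updates, so running
zypper -v up
is recommended.
Finally, since SLES 12 SP3 is no longer supported, consider migrating to a newer release such as SLES 12 SP5 or SLES 15 SP2.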
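After updating, it may be worth confirming which bash actually runs the service script. Here is a minimal sketch of a version comparison; the function name version_ge is hypothetical, and 4.3.48 is simply the version mentioned above, not a version confirmed to fix the problem.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if dotted version $1 is greater than
# or equal to dotted version $2 (both given as major.minor.patch).
version_ge() {
    local have_major have_minor have_patch
    local want_major want_minor want_patch
    IFS=. read -r have_major have_minor have_patch <<< "$1"
    IFS=. read -r want_major want_minor want_patch <<< "$2"

    # Compare component by component; the first difference decides.
    (( have_major != want_major )) && { (( have_major > want_major )); return; }
    (( have_minor != want_minor )) && { (( have_minor > want_minor )); return; }
    (( have_patch >= want_patch ))
}

# Example use inside a script:
#   current="${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}.${BASH_VERSINFO[2]}"
#   version_ge "$current" 4.3.48 || echo "bash $current is older than 4.3.48" >&2
```

BASH_VERSINFO is a standard bash array holding the major, minor, and patch numbers of the running shell, so the check reflects the interpreter that is executing the script rather than the installed package.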