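Problem description
I have an application that creates a zombie process in my cluster every 1-2 seconds. My application uses Process, but only when it receives a specific command, and that is not happening right now.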
String command = "helm install release xxx";
LOGGER.debug("handle Install request : command [{}]", command);
waitFornormalTermination(Runtime.getRuntime().exec(command), INSTALL_TIMEOUT, TimeUnit.SECONDS, name);

private void waitFornormalTermination(Process process, int timeout, TimeUnit unit, String release) throws Exception {
    try {
        // Wait for the command to finish, up to the given timeout
        if (!process.waitFor(timeout, unit)) {
            throw new TimeoutException("Timeout while executing " + process.info().commandLine().orElse(null));
        }
        if (process.exitValue() != 0) {
            String errorStreamOutput = IoUtils.toString(process.getErrorStream(), StandardCharsets.UTF_8);
            if (errorStreamOutput != null && errorStreamOutput.contains("release: not found")) {
                throw new ReleaseNotFoundException(release);
            }
            throw new Exception("Process termination was abnormal, exit value: [" + process.exitValue() + "], command: [" + process.info().commandLine().orElse(null) + "] error returned: [" + errorStreamOutput + "]");
        }
    } finally {
        // This part was added to simplify the code, but every process is destroyed like this before exiting the method
        process.destroy();
    }
}
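For comparison, here is a minimal sketch of one way to run an external command so that it is always waited on, even after a timeout. It is not the code from my application: the names CommandRunner, Result and run are hypothetical, and it assumes a Linux host with sh available. The output is redirected to a temporary file so the child can never block on a full pipe, and after a forced kill the method still waits for the child, because destroy() only sends a signal and does not by itself collect the exit status.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not part of the original application.
final class CommandRunner {

    /** Exit code plus combined stdout/stderr of a finished command. */
    static final class Result {
        final int exitCode;
        final String output;

        Result(int exitCode, String output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result run(List<String> command, long timeout, TimeUnit unit)
            throws IOException, InterruptedException, TimeoutException {
        // Send output to a file so the child never blocks on a full pipe buffer.
        Path log = Files.createTempFile("cmd-", ".log");
        Process process = null;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)   // merge stderr into stdout
                    .redirectOutput(log.toFile())
                    .start();
            process.getOutputStream().close();   // the child sees EOF on stdin

            if (!process.waitFor(timeout, unit)) {
                throw new TimeoutException("Timeout while executing " + command);
            }
            String output = new String(Files.readAllBytes(log), StandardCharsets.UTF_8);
            return new Result(process.exitValue(), output);
        } finally {
            if (process != null && process.isAlive()) {
                process.destroyForcibly();
                process.waitFor();               // wait until the killed child has actually exited
            }
            Files.deleteIfExists(log);
        }
    }

    public static void main(String[] args) throws Exception {
        Result r = run(Arrays.asList("sh", "-c", "echo hello; exit 0"), 5, TimeUnit.SECONDS);
        System.out.println("exit=" + r.exitCode + " output=" + r.output.trim());
    }
}
```

Note that this only covers children started by the JVM itself. Processes started inside the container by something else are not visible to this code at all.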
Here is what I did:
#1 - I added the destroy() call to my code.
#1b - I built and published the image.
#2 - I killed my pod in my cluster.
#3 - My pod was recreated with the new image.
#4 - I looked at the node where I had zombies (it is the same node my application runs on).
I killed the java process that was generating the zombies. I had over 12,000 zombies; now I'm back down to 4200.
#5 - I ran: ps aux | grep 'Z' | wc -l
in a loop to see whether new zombies appear, and yes, they are still increasing.
Now I have this: root@test-pcl111:~# ps aux | grep 'Z' | wc -l
4487
I ran this in another terminal to see whether there is any activity:
kubectl logs iep-iep-codec-staging-7596fccd85-jkn68 --follow
The zombies still increase every 1-2 seconds even though there is no activity on my side, apart from a few periodic REST calls (polling from other applications). At this point I am not calling the method that creates a new Process(..).
What am I missing?
Edit: I created a small script that prints the zombies on your node, grouped by the application that owns them.
#!/bin/bash
# Count zombies per parent PID: each line of the file is "<count> <ppid>"
ps -eo ppid,comm | grep "<defunct>" | awk '{print $1}' | sort | uniq -c > /tmp/zombie.file
Files="/tmp/zombie.file"
Lines=$(cat "$Files" | tr -s ' ' | cut -d ' ' -f2,3)
i=0
# The words alternate between a zombie count and the parent PID it belongs to
for Line in $Lines
do
    if [[ $i -eq 0 ]]
    then
        echo "Zombies found = $Line"
        i=1
    else
        ps -f "$Line"
        i=0
    fi
done
echo " "
echo " "
echo "Running docker containers are "
# that line was to grep only our containers from our private repo
#docker ps | grep private-repository
echo " "
echo " "
echo "the PID of those docker containers"
for value in $(docker ps | grep private-repository | cut -d ' ' -f1); do
    docker inspect --format '{{ .State.Pid }}' "$value"
done
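The per-parent count that the script derives from ps can also be read straight from /proc, which avoids the over-counting of ps aux | grep 'Z' (that pattern can also match the VSZ column header and the grep process itself). The following is only a sketch under assumptions: it assumes a Linux /proc layout where each /proc/<pid>/stat line has the form "pid (comm) state ppid ...", and the names ZombieCounter, stateAndParent and zombiesByParent are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper that mirrors the "zombies per parent" part of the script.
final class ZombieCounter {

    /**
     * Extracts {state, ppid} from one /proc/<pid>/stat line, or returns null if
     * the line is malformed. The command name sits in parentheses and may itself
     * contain spaces or ')', so the fields are taken after the LAST ')'.
     */
    static String[] stateAndParent(String statLine) {
        int close = statLine.lastIndexOf(')');
        if (close < 0) {
            return null;
        }
        String[] fields = statLine.substring(close + 1).trim().split("\\s+");
        return fields.length >= 2 ? new String[] {fields[0], fields[1]} : null;
    }

    /** Returns parent PID -> number of zombie children found under procRoot. */
    static Map<Integer, Integer> zombiesByParent(Path procRoot) throws IOException {
        Map<Integer, Integer> counts = new TreeMap<>();
        // Entries whose name starts with a digit are per-process directories.
        try (DirectoryStream<Path> pids = Files.newDirectoryStream(procRoot, "[0-9]*")) {
            for (Path pidDir : pids) {
                String stat;
                try {
                    stat = new String(Files.readAllBytes(pidDir.resolve("stat")),
                            StandardCharsets.UTF_8);
                } catch (IOException e) {
                    continue; // the process went away between listing and reading
                }
                String[] parsed = stateAndParent(stat);
                if (parsed != null && "Z".equals(parsed[0])) {
                    counts.merge(Integer.parseInt(parsed[1]), 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        zombiesByParent(Paths.get("/proc")).forEach((ppid, n) ->
                System.out.println(n + " zombie(s) under parent PID " + ppid));
    }
}
```

Run on the node itself, the parent PIDs it prints can then be compared with the container PIDs that docker inspect reports.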
Edit
I have the same problem with Containerd. It looks like the problem is with the exec probes.