Problem description
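I have recently been studying dumb-init and, if I understand it correctly, it is trying to:
- run as PID 1, acting as a simple init system (reaping zombie processes)
- proxy/forward signals (which bash does not do)
Both here and here mention that bash is able to reap zombie processes, so I am trying to verify this, but I cannot get it to work.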
First I wrote a simple Go program that spawns 10 zombie processes:
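package main

import (
	"fmt"
	"os"
	"os/exec"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	// SIGKILL cannot be caught, so only SIGINT and SIGTERM are registered.
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	go func() {
		for i := 0; i < 10; i++ {
			// Start without Wait: each sleep becomes a zombie once it exits.
			sleepCmd := exec.Command("sleep", "1")
			_ = sleepCmd.Start()
		}
	}()
	fmt.Println("awaiting signal")
	sig := <-sigs
	fmt.Println()
	fmt.Printf("received %s, exiting\n", sig.String())
}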
Build an image for it:
FROM golang:1.15-alpine3.12 as builder
WORKDIR /
COPY . .
RUN go build -o main main.go
FROM alpine:3.12
RUN apk --no-cache --update add dumb-init bash
WORKDIR /
COPY --from=builder /main /
COPY --from=builder /entrypoint.sh /
RUN chmod +x /entrypoint.sh
ENTRYPOINT ["/main"]
If I run docker run -d <image>, it works as expected, and I can see 10 zombie processes in ps:
vagrant@vagrant:/vagrant/dumb-init$ ps aux | grep sleep
root 4388 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4389 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4390 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4391 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4392 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4393 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4394 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4395 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4396 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
root 4397 0.0 0.0 0 0 ? Z 13:54 0:00 [sleep] <defunct>
Step 2 is to verify that bash really is able to reap processes, so I changed the ENTRYPOINT of my Docker image to entrypoint.sh, which simply wraps my program with bash:
#!/bin/bash
/cLever
If I run ps inside the container, the zombie processes are still hanging around:
/ # ps
PID USER TIME COMMAND
1 root 0:00 {entrypoint.sh} /bin/bash /entrypoint.sh
7 root 0:00 /cLever
13 root 0:00 [sleep]
14 root 0:00 [sleep]
15 root 0:00 [sleep]
16 root 0:00 [sleep]
17 root 0:00 [sleep]
18 root 0:00 [sleep]
19 root 0:00 [sleep]
20 root 0:00 [sleep]
21 root 0:00 [sleep]
22 root 0:00 [sleep]
31 root 0:00 /bin/sh
39 root 0:00 ps
Thanks for your help.
Solution
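I wrote a small demo in C that helps show that bash does reap zombie processes, and what it looks like when they are not reaped.
First, a definition. A zombie process is a process that has finished its work and produced an exit code. The kernel keeps its process-table entry around, waiting for the parent process to collect the exit code.
To produce a zombie, the parent has to ignore the child's exit (never call wait, and do nothing in response to SIGCHLD).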
Reaping zombies
The following C code creates two zombie processes: one belonging to the main process and one belonging to the first child process.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    printf("Starting Program!\n");
    int pid = fork();
    if (pid == 0)
    {
        pid = fork(); // Create a child zombie
        if (pid == 0) {
            printf("Zombie process %i of the child process\n", getpid());
            exit(10);
        } else {
            printf("Child process %i is running!\n", getpid());
            sleep(10); // wait 10s
            printf("Child process %i is exiting!\n", getpid());
            exit(0);
        }
    }
    else if (pid > 0)
    {
        pid = fork();
        if (pid == 0) {
            printf("Zombie process %i from the parent process\n", getpid());
        } else {
            printf("Parent process %i...\n", getpid());
            sleep(5);
            printf("Parent process will crash with segmentation failt!\n");
            int* p = 0;
            // As written this only reassigns the pointer (no dereference), so
            // nothing faults; the process ends via exit(-1) below, which is
            // the "Exit 255" seen in the output.
            p = 10;
        }
    }
    else perror("fork()");
    exit(-1);
}
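I also created a Docker container to compile this file and the related files. The whole project is available in the following git repository
After building and running the demo, the console shows the following output: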
root@d2d87f4aafbc:/zombie# ./zombie & ps -eaf --forest
[1] 8
Starting Program!
Parent process 8...
Zombie process 11 from the parent process
Child process 10 is running!
Zombie process 12 of the child process
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 8 1 0 10:43 pts/0 00:00:00 ./zombie
root 10 8 0 10:43 pts/0 00:00:00 \_ ./zombie
root 12 10 0 10:43 pts/0 00:00:00 | \_ [zombie] <defunct>
root 11 8 0 10:43 pts/0 00:00:00 \_ [zombie] <defunct>
root 9 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
root@d2d87f4aafbc:/zombie# Parent process will crash with segmentation failt!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 10 1 0 10:43 pts/0 00:00:00 ./zombie
root 12 10 0 10:43 pts/0 00:00:00 \_ [zombie] <defunct>
root 13 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
[1]+ Exit 255 ./zombie
root@d2d87f4aafbc:/zombie# Child process 10 is exiting!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:43 pts/0 00:00:00 /bin/bash
root 14 1 0 10:43 pts/0 00:00:00 ps -eaf --forest
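The main process (PID 8) creates two child processes:
- a child (PID 10), which creates a zombie child (PID 12) and then sleeps for 10 seconds;
- a child (PID 11) that becomes a zombie.
After creating them, the parent sleeps for 5 seconds and then terminates (the code is meant to trigger a segmentation fault; in the output above it exits with status 255), leaving its zombie behind.
When the main process terminates, PID 11 is inherited by bash and cleaned up (reaped). PID 10 is still working (sleeping counts as work for a process), so bash leaves it alone; because PID 10 never calls wait, PID 12 remains a zombie.
After another 5 seconds, PID 10 finishes sleeping and exits. Bash reaps PID 10 and inherits PID 12, which it then reaps as well.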
Leaving zombies
Another C application simply runs bash as a child process, so that the application itself is PID 1; it ignores zombies.
# docker run -ti --rm test /zombie/ignore
root@b9d49363cb57:/zombie# ./zombie & ps -eaf --forest
[1] 10
Starting Program!
Parent process 10...
Zombie process 13 from the parent process
Child process 12 is running!
Zombie process 14 of the child process
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 10 8 0 11:18 pts/0 00:00:00 \_ ./zombie
root 12 10 0 11:18 pts/0 00:00:00 | \_ ./zombie
root 14 12 0 11:18 pts/0 00:00:00 | | \_ [zombie] <defunct>
root 13 10 0 11:18 pts/0 00:00:00 | \_ [zombie] <defunct>
root 11 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root@b9d49363cb57:/zombie# pParent process will crash with segmentation failt!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 15 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root 12 1 0 11:18 pts/0 00:00:00 ./zombie
root 14 12 0 11:18 pts/0 00:00:00 \_ [zombie] <defunct>
root 13 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
[1]+ Exit 255 ./zombie
root@b9d49363cb57:/zombie# Child process 12 is exiting!
ps -eaf --forest
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:18 pts/0 00:00:00 /zombie/ignore
root 7 1 0 11:18 pts/0 00:00:00 sh -c /bin/bash
root 8 7 0 11:18 pts/0 00:00:00 \_ /bin/bash
root 16 8 0 11:18 pts/0 00:00:00 \_ ps -eaf --forest
root 12 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root 13 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root 14 1 0 11:18 pts/0 00:00:00 [zombie] <defunct>
root@b9d49363cb57:/zombie#
So now 3 zombies are left dangling in the system.