如何使用 bash 在 docker 容器中收割僵尸进程

问题描述

最近我正在学习 dumb-init,如果我没记错的话,它正在尝试:

  1. 作为 PID1 运行,就像一个简单的初始化系统(收割僵尸进程)
  2. 信号代理/转发(bash不这样做)

herehere 中,他们都提到 bash 能够收割僵尸进程,所以我正在尝试验证这一点,但无法使其工作。

首先我写了一个简单的 Go 程序,它产生了 10 个僵尸进程:

func main() {
    sigs := make(chan os.Signal,1)

    signal.Notify(sigs,syscall.SIGINT,syscall.SIGTERM,syscall.SIGKILL)

    go func() {
        for i := 0; i < 10; i++ {
            sleepCmd := exec.Command("sleep","1")
            _ = sleepCmd.Start()
        }
    }()

    fmt.Println("awaiting signal")
    sig := <-sigs
    fmt.Println()
    fmt.Printf("received %s,exiting\n",sig.String())
}

为它构建一个镜像:

FROM golang:1.15-alpine3.12 as builder

workdir /

copY . .

RUN go build -o main main.go

FROM alpine:3.12

RUN apk --no-cache --update add dumb-init bash

workdir /
copY --from=builder /main /
copY --from=builder /entrypoint.sh /
RUN chmod +x /entrypoint.sh

ENTRYPOINT ["/main"]

如果我运行 docker run -d <image>,它会按预期工作,我可以在 ps 中看到 10 个僵尸进程:

vagrant@vagrant:/vagrant/dumb-init$ ps aux | grep sleep
root      4388  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4389  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4390  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4391  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4392  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4393  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4394  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4395  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4396  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>
root      4397  0.0  0.0      0     0 ?        Z    13:54   0:00 [sleep] <defunct>

第 2 步是验证 bash 是否真的能够收获进程,所以我将我的 docker 镜像 ENTRYPOINT 更新为 entrypoint.sh,它只是用 bash 包装了我的程序:

#!/bin/bash

/cLever

如果我在容器中运行 ps,僵尸进程仍然挂在那里:

/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 {entrypoint.sh} /bin/bash /entrypoint.sh
    7 root      0:00 /cLever
   13 root      0:00 [sleep]
   14 root      0:00 [sleep]
   15 root      0:00 [sleep]
   16 root      0:00 [sleep]
   17 root      0:00 [sleep]
   18 root      0:00 [sleep]
   19 root      0:00 [sleep]
   20 root      0:00 [sleep]
   21 root      0:00 [sleep]
   22 root      0:00 [sleep]
   31 root      0:00 /bin/sh
   39 root      0:00 ps

尝试了其他一些方法,但仍然无法弄清楚如何正确获取僵尸进程。

感谢您的帮助。

解决方法

我在 c 中编写了一个小演示,可以帮助证明 bash 已经收获了僵尸进程,以及如果他没有收获会是什么样子。

先解释一下僵尸进程的定义。僵尸进程是已经完成工作并生成退出代码的进程。资源由内核保留,等待父进程收集退出代码。

要生成僵尸,父级需要忽略子级的退出(不要发出 wait 并忽略 SIGCHLD)。

收割僵尸

以下 c 代码正在创建两个僵尸进程。一个属于主进程,一个属于第一个子进程。

#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <pthread.h>
#include <sys/wait.h>
#include <unistd.h>

int main()
{
    printf("Starting Program!\n");

    int pid = fork();
    if (pid == 0)
    {
        pid = fork(); // Create a child zombie
        if (pid == 0) {
            printf("Zombie process %i of the child process\n",getpid());
            exit(10);
        } else {
            printf("Child process %i is running!\n",getpid());
            sleep(10);  // wait 10s
            printf("Child process %i is exiting!\n",getpid());
            exit(0);
        }
    }
    else if (pid > 0)
    {
        pid = fork();
        if (pid == 0) {
            printf("Zombie process %i from the parent process\n",getpid());
        } else {
            printf("Parent process %i...\n",getpid());
            sleep(5);
            printf("Parent process will crash with segmentation failt!\n");
            int* p = 0;
            p = 10;
        }
    }
    else perror("fork()");
    exit(-1);
}

我还创建了一个 docker 容器来编译文件和子文件。整个项目可在以下 git repository

运行构建和演示后,控制台中显示以下打印输出:

root@d2d87f4aafbc:/zombie# ./zombie & ps -eaf --forest
[1] 8
Starting Program!
Parent process 8...
Zombie process 11 from the parent process
Child process 10 is running!
Zombie process 12 of the child process
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root           8       1  0 10:43 pts/0    00:00:00 ./zombie
root          10       8  0 10:43 pts/0    00:00:00  \_ ./zombie
root          12      10  0 10:43 pts/0    00:00:00  |   \_ [zombie] <defunct>
root          11       8  0 10:43 pts/0    00:00:00  \_ [zombie] <defunct>
root           9       1  0 10:43 pts/0    00:00:00 ps -eaf --forest
root@d2d87f4aafbc:/zombie# Parent process will crash with segmentation failt!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root          10       1  0 10:43 pts/0    00:00:00 ./zombie
root          12      10  0 10:43 pts/0    00:00:00  \_ [zombie] <defunct>
root          13       1  0 10:43 pts/0    00:00:00 ps -eaf --forest
[1]+  Exit 255                ./zombie
root@d2d87f4aafbc:/zombie# Child process 10 is exiting!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 10:43 pts/0    00:00:00 /bin/bash
root          14       1  0 10:43 pts/0    00:00:00 ps -eaf --forest

主进程 (PID 8) 创建两个子进程。

  • 一个子节点 (PID 10),它将创建一个僵尸子节点 (PID 12) 并睡眠 10 秒。
  • 一个会变成僵尸的孩子(PID 11)。

进程创建后,父进程会休眠 5s 并创建分段错误,留下僵尸进程。

当主进程终止时,PID 11 由 bash 继承并被清理(收割)。 PID 10 仍在工作(睡眠是进程的一种工作)他被 bash 留下,因为 PID 11 没有调用 wait,PID 12 仍然是僵尸。

5 秒后,PID 11 完成睡眠并退出。 Bash 收获并继承了 PID 12,之后 bash 收获了 PID 12

离开僵尸

另一个 c 应用程序只是将 bash 作为子进程执行,让它成为 PID 1,他将忽略僵尸。

# docker run -ti --rm test /zombie/ignore
root@b9d49363cb57:/zombie# ./zombie & ps -eaf --forest
[1] 10
Starting Program!
Parent process 10...
Zombie process 13 from the parent process
Child process 12 is running!
Zombie process 14 of the child process
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          10       8  0 11:18 pts/0    00:00:00      \_ ./zombie
root          12      10  0 11:18 pts/0    00:00:00      |   \_ ./zombie
root          14      12  0 11:18 pts/0    00:00:00      |   |   \_ [zombie] <defunct>
root          13      10  0 11:18 pts/0    00:00:00      |   \_ [zombie] <defunct>
root          11       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root@b9d49363cb57:/zombie# pParent process will crash with segmentation failt!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          15       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root          12       1  0 11:18 pts/0    00:00:00 ./zombie
root          14      12  0 11:18 pts/0    00:00:00  \_ [zombie] <defunct>
root          13       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
[1]+  Exit 255                ./zombie
root@b9d49363cb57:/zombie# Child process 12 is exiting!
ps -eaf --forest
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 11:18 pts/0    00:00:00 /zombie/ignore
root           7       1  0 11:18 pts/0    00:00:00 sh -c /bin/bash
root           8       7  0 11:18 pts/0    00:00:00  \_ /bin/bash
root          16       8  0 11:18 pts/0    00:00:00      \_ ps -eaf --forest
root          12       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root          13       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root          14       1  0 11:18 pts/0    00:00:00 [zombie] <defunct>
root@b9d49363cb57:/zombie#

所以现在,我们系统中还剩下 3 个僵尸,悬而未决。

相关问答

Selenium Web驱动程序和Java。元素在(x,y)点处不可单击。其...
Python-如何使用点“。” 访问字典成员?
Java 字符串是不可变的。到底是什么意思?
Java中的“ final”关键字如何工作?(我仍然可以修改对象。...
“loop:”在Java代码中。这是什么,为什么要编译?
java.lang.ClassNotFoundException:sun.jdbc.odbc.JdbcOdbc...