Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container

Problem description

We are trying to create a pod, but it stays in the ContainerCreating state for a long time.

This is the output we get from kubectl describe pod:

Name:           demo-6c59fb8f77-9x6sr
Namespace:      default
Priority:       0
Node:           k8-slave2/10.0.0.5
Start Time:     Wed, 23 Dec 2020 10:16:23 +0000
Labels:         app=demo
                pod-template-hash=6c59fb8f77
Annotations:    <none>
Status:         Pending
IP:
IPs:            <none>
Controlled By:  ReplicaSet/demo-6c59fb8f77
Containers:
  private-docker-registry:
    Container ID:
    Image:          private-docker-registry:5000/mahin/mof-docker-demo:v1
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-p94zw (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  default-token-p94zw:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-p94zw
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
  

Events:
  Type     Reason                  Age                  From               Message
  ----     ------                  ----                 ----               -------
  Normal   Scheduled               10m                  default-scheduler  Successfully assigned default/demo-6c59fb8f77-9x6sr to k8-slave2
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "95e72bfc6f6c13de7f5c96eb76b012c2e6639ca03f4c2f270b23ed1a09b90413" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "566370012e4a1d32af2ef9035ff64d743cd81f36f25d2724e7b033e393b8247e" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7d499e40f572cfc29ecfb44f8376493df56a44213b1c1e9333b65499a0c288cd" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "53241e64de1e4470712b4061e2c82f44916d654bc532f8f1d12e5d5d4e136914" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "fd168faab4546f988dc38fc56df2f71cf80c922e86d3f869be15a43f08328f99" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e578afe329abb0cba64802dfa480e00f2bbbb8c80be537791c24a31c853eb62f" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "a3cb32dba55907ca907fc4f38f7ca05ef6db10a6af2dd1fa3c4db166e4ab9ffe" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m                  kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7e4368ba8ec460b3c94de24ab0a04b6c799eb28df885cbbacfc3bb3ffa8c1e67" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Warning  FailedCreatePodSandBox  10m (x4 over 10m)    kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c4aaa8f8cd2dc1eff788baf04774c4ecc845568d00ed1b386df311ec224eb6f3" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory
  Normal   SandboxChanged          56s (x551 over 10m)  kubelet            Pod sandbox changed, it will be killed and re-created.



azureuser@k8-master:~$ kubectl get pods --all-namespaces
NAMESPACE              NAME                                         READY   STATUS              RESTARTS   AGE
default                demo-6c59fb8f77-2jq6k                        0/1     ContainerCreating   0          5m23s
kube-system            coredns-f9fd979d6-q8s9b                      1/1     Running             2          27h
kube-system            coredns-f9fd979d6-qnm4j                      1/1     Running             2          27h
kube-system            etcd-k8-master                               1/1     Running             2          27h
kube-system            kube-apiserver-k8-master                     1/1     Running             3          27h
kube-system            kube-controller-manager-k8-master            1/1     Running             3          27h
kube-system            kube-flannel-ds-kqz4t                        0/1     CrashLoopBackOff    92         27h
kube-system            kube-flannel-ds-szqzn                        1/1     Running             3          27h
kube-system            kube-flannel-ds-v9q47                        0/1     CrashLoopBackOff    142        27h
kube-system            kube-proxy-4mb47                             1/1     Running             2          27h
kube-system            kube-proxy-54m9b                             1/1     Running             2          27h
kube-system            kube-proxy-wdxfz                             1/1     Running             1          27h
kube-system            kube-scheduler-k8-master                     1/1     Running             3          27h
kubernetes-dashboard   dashboard-metrics-scraper-7b59f7d4df-zmlvs   0/1     ContainerCreating   0          27h
kubernetes-dashboard   kubernetes-dashboard-665f4c5ff-cnsvn         0/1     ContainerCreating   0          6h3m

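To fix the flannel CrashLoopBackOff we reset kubeadm, but after a while the problem came back.

We are currently running one master node and two worker nodes.

My cluster details are as follows:

azureuser@k8-master:~$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://52.150.11.168:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: REDACTED
    client-key-data: REDACTED

Docker version: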

azureuser@k8-master:~$ sudo docker version
[sudo] password for azureuser: 
Client:
 Version:           19.03.6
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        369ce74a3c
 Built:             Wed Oct 14 19:00:27 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          19.03.6
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       369ce74a3c
  Built:            Wed Oct 14 16:52:50 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.3.3-0ubuntu1~18.04.2
  GitCommit:
 runc:
  Version:          spec: 1.0.1-dev
  GitCommit:
 docker-init:
  Version:          0.18.0
  GitCommit:

kubeadm version:

azureuser@k8-master:~$ kubeadm version
kubeadm version: &version.Info{Major:"1",Minor:"19",GitVersion:"v1.19.4",GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f",GitTreeState:"clean",BuildDate:"2020-11-11T13:15:05Z",GoVersion:"go1.15.2",Compiler:"gc",Platform:"linux/amd64"}

Flannel crashes whenever I try to schedule a pod.

Answer

Background

I believe your problem is caused by the two Flannel CNI pods that are in the CrashLoopBackOff state.

Your error

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8eee497a2176c7f5782222f804cc63a4abac7f4a2fc7813016793857ae1b1dff" network for pod "demo-6c59fb8f77-9x6sr": networkPlugin cni failed to set up pod "demo-6c59fb8f77-9x6sr_default" network: open /run/flannel/subnet.env: no such file or directory

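says that the pod cannot be created because the /run/flannel/subnet.env file is missing. In the Flannel GitHub documentation you can find:

Flannel runs a small, single binary agent called flanneld on each host, and is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space.

In other words, to work properly a Flannel pod must be running on every node, because it holds that node's subnet information. From your output I can see that only 1 of the 3 Flannel pods is working correctly.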

NAMESPACE              NAME                                         READY   STATUS              RESTARTS   AGE
...
kube-system            kube-flannel-ds-kqz4t                        0/1     CrashLoopBackOff    92         27h
kube-system            kube-flannel-ds-szqzn                        1/1     Running             3          27h
kube-system            kube-flannel-ds-v9q47                        0/1     CrashLoopBackOff    142        27h

If a pod is scheduled onto a node where the flannel pod is not working, it will not be created because of CNI network issues. Besides your demo pod, the kubernetes-dashboard pods are stuck in ContainerCreating with the same problem.
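To spot the broken flannel pods quickly, the pod listing can be filtered. The following is only a minimal sketch: `unhealthy_flannel_pods` is a hypothetical helper name, and it assumes the column layout of `kubectl get pods --all-namespaces` shown in the question (namespace, name, ready, status, restarts). The sample rows are copied from that output.

```shell
# Hypothetical helper: reads `kubectl get pods --all-namespaces` output on stdin
# and prints name, status and restart count of every flannel pod that is not Running.
unhealthy_flannel_pods() {
  awk '$2 ~ /^kube-flannel-ds-/ && $4 != "Running" { print $2, $4, $5 }'
}

# Sample rows taken from the question's output.
sample='kube-system   kube-flannel-ds-kqz4t   0/1   CrashLoopBackOff   92    27h
kube-system   kube-flannel-ds-szqzn   1/1   Running            3     27h
kube-system   kube-flannel-ds-v9q47   0/1   CrashLoopBackOff   142   27h'

printf '%s\n' "$sample" | unhealthy_flannel_pods
```

Against a live cluster the same filter would be used as `kubectl get pods --all-namespaces | unhealthy_flannel_pods`. Adding `-o wide` to the kubectl command shows which node each pod runs on without changing the columns the filter reads.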

Conclusion

Your demo pod was scheduled but cannot start, because Kubernetes hits a network problem related to the flannel configuration file (...network: open /run/flannel/subnet.env: no such file or directory).
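Whether the file exists can be checked directly on an affected node, for example over SSH. This is a minimal sketch: `check_subnet_env` is a hypothetical helper, and the default path is the one from the error message. flanneld writes this file once it has obtained its subnet lease, so a missing or empty file suggests that flanneld never got that far on this node.

```shell
# Hypothetical helper: reports whether flannel's subnet file exists and is non-empty.
# Run it on the node itself; the optional argument overrides the default path.
check_subnet_env() {
  path="${1:-/run/flannel/subnet.env}"
  if [ -s "$path" ]; then
    echo "present: $path"
  else
    echo "missing: $path"
    return 1
  fi
}

# Check the default location and add a hint when the file is absent.
check_subnet_env || echo "flanneld has probably not obtained a subnet lease on this node"
```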

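For pods that are only 27 hours old, the restart counts of your flannel pods are very high. You have to find out why and fix it. The cause could be insufficient resources, network problems in the underlying infrastructure, or many other things. Once all flannel pods work correctly, you should no longer see this error.

Solution

You have to get the flannel pods working correctly on every node.

Additional troubleshooting details

For a detailed investigation, please provide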

$ kubectl describe pod kube-flannel-ds-kqz4t -n kube-system
$ kubectl describe pod kube-flannel-ds-v9q47 -n kube-system

The logs would also be helpful

$ kubectl logs kube-flannel-ds-kqz4t -n kube-system
$ kubectl logs kube-flannel-ds-v9q47 -n kube-system

Please also replace the output of kubectl get pods --all-namespaces with that of kubectl get pods -o wide -A, and add the output of kubectl get nodes -o wide.
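The requested commands can be generated in one go for any number of pods. This is a rough sketch: `flannel_diag_commands` is a hypothetical helper that only prints the commands, so they can be reviewed before running. The extra `kubectl logs --previous` line is my addition, since the previous container instance is often the one that holds the crash output for a pod in CrashLoopBackOff.

```shell
# Hypothetical helper: prints the diagnostic commands for the given flannel pods.
# Nothing is executed here; the output is plain text that can be reviewed first.
flannel_diag_commands() {
  ns="kube-system"
  for pod in "$@"; do
    echo "kubectl describe pod $pod -n $ns"
    echo "kubectl logs $pod -n $ns"
    echo "kubectl logs $pod -n $ns --previous"
  done
  echo "kubectl get pods -o wide -A"
  echo "kubectl get nodes -o wide"
}

flannel_diag_commands kube-flannel-ds-kqz4t kube-flannel-ds-v9q47
```

To actually run them and keep the results, the output could be piped into a shell, for example `flannel_diag_commands kube-flannel-ds-kqz4t kube-flannel-ds-v9q47 | sh > flannel-diag.txt 2>&1`; the file name is arbitrary.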

With this information it should be possible to determine the root cause of the flannel pod problem, and I will edit this answer with an exact solution.