Problem description
I recently noticed that many Pods in my GKE cluster are being killed:
k get events -n kube-system | grep "Stopping container calico-node"
14m normal Killing pod/calico-node-26rmv Stopping container calico-node
29m normal Killing pod/calico-node-2bz2c Stopping container calico-node
10m normal Killing pod/calico-node-2mjkt Stopping container calico-node
26m normal Killing pod/calico-node-2srrt Stopping container calico-node
34m normal Killing pod/calico-node-2vwz9 Stopping container calico-node
23m normal Killing pod/calico-node-4fqdf Stopping container calico-node
31m normal Killing pod/calico-node-4hj2h Stopping container calico-node
14m normal Killing pod/calico-node-4w9fr Stopping container calico-node
7m normal Killing pod/calico-node-56ns7 Stopping container calico-node
10m normal Killing pod/calico-node-5mxjh Stopping container calico-node
32m normal Killing pod/calico-node-65zmr Stopping container calico-node
7m38s normal Killing pod/calico-node-66bnz Stopping container calico-node
19m normal Killing pod/calico-node-66kx4 Stopping container calico-node
32m normal Killing pod/calico-node-6bctr Stopping container calico-node
38m normal Killing pod/calico-node-6gq9b Stopping container calico-node
29m normal Killing pod/calico-node-6hjk5 Stopping container calico-node
15m normal Killing pod/calico-node-6kn67 Stopping container calico-node
27m normal Killing pod/calico-node-6q6cp Stopping container calico-node
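To see whether these kills happen all at once or are spread out over time, the ages in the event listing above can be converted to seconds and sorted. The following is a minimal POSIX sh/awk sketch under stated assumptions: the helper names `age_to_seconds` and `sort_kills_by_age` are hypothetical, the input is assumed to use the default `kubectl get events` column order (LAST SEEN, TYPE, REASON, OBJECT, MESSAGE), and ages are assumed to use only the d/h/m/s units.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper names): order the calico-node "Killing"
# events by how long ago they happened, to see how spread out they are.

# Convert a kubectl-style age such as "7m38s", "14m" or "2h5m" to seconds.
# Assumption: the age only uses the d/h/m/s units; for anything else awk
# prints nothing and the function returns a non-zero status.
age_to_seconds() {
  printf '%s\n' "$1" | awk '
    {
      total = 0; num = ""
      for (i = 1; i <= length($0); i++) {
        ch = substr($0, i, 1)
        if (ch ~ /[0-9]/)   { num = num ch }
        else if (ch == "d") { total += num * 86400; num = "" }
        else if (ch == "h") { total += num * 3600;  num = "" }
        else if (ch == "m") { total += num * 60;    num = "" }
        else if (ch == "s") { total += num;         num = "" }
        else { exit 1 }
      }
      print total
    }'
}

# Read event lines ("LAST SEEN TYPE REASON OBJECT MESSAGE...") on stdin and
# print "<seconds ago> <object>", most recent event first (smallest age first).
sort_kills_by_age() {
  while read -r age _type _reason object _rest; do
    [ -n "$age" ] || continue   # skip blank lines
    printf '%s %s\n' "$(age_to_seconds "$age")" "$object"
  done | sort -n
}
```

For example, assuming `k` in the command above is an alias for `kubectl`: `kubectl get events -n kube-system | grep "Stopping container calico-node" | sort_kills_by_age`. The `grep` already removes the header line, so every remaining line is parsed as an event.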
Some of these Pods run on node pools that do not have any autoscaling enabled.
Looking at the logs, the last entries I see in the pod are (newest first):
2021-05-10 11:23:03  "plugins": [
2021-05-10 11:23:03  "cniVersion": "0.3.1",
2021-05-10 11:23:03  "name": "k8s-pod-network",
2021-05-10 11:23:03  CNI config: {
2021-05-10 11:23:03  Using CNI config template from CNI_NETWORK_CONFIG environment variable.
2021-05-10 11:23:03  /host/secondary-bin-dir is non-writeable,skipping
2021-05-10 11:23:03  CNI plugin version: v3.8.8-1-gke.2
2021-05-10 11:23:03  Wrote Calico CNI binaries to /host/opt/cni/bin
2021-05-10 11:23:03  ls: cannot access '/calico-secrets': No such file or directory
2021-05-10 11:22:53  No Calico CNI spec template is specified. Exiting (0)...
2021-05-10 11:22:53  Calico Network Policy is enabled
2021-05-10 11:22:53  Calico network policy config: true
How can I investigate the possible cause further? Since this is GKE, I don't have access to /var/log/kube-scheduler.log.
In GCP Logging, all I see is
I have already checked: there are no OOM errors.