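Problem description
I'm new to K8s, so I can't get to the bottom of this problem. Last week I used kubeadm to install a cluster with 1 master and 2 worker nodes on CentOS:
kubectl get nodes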
NAME STATUS ROLES AGE VERSION
ardl-k8latam01 Ready control-plane,master 7d2h v1.20.0
ardl-k8latam02 Ready <none> 7d2h v1.20.0
ardl-k8latam03 Ready <none> 7d2h v1.20.0
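It worked fine at first, but it started failing after I began using Helm (I don't know whether that is related).
Now I can't run any deployment, and there are many pods stuck in the "Terminating" state that never finish. Here I tried kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml as an example: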
[root@ardl-k8latam01 ~]# kubectl get all --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default pod/nginx-deployment-66b6c48dd5-2xt7b 1/1 Terminating 0 19h
default pod/nginx-deployment-66b6c48dd5-5cttk 1/1 Terminating 0 19h
default pod/nginx-deployment-66b6c48dd5-8bz2f 0/1 Pending 0 18h
default pod/nginx-deployment-66b6c48dd5-dksqx 1/1 Terminating 0 19h
default pod/nginx-deployment-66b6c48dd5-fj9kl 0/1 Pending 0 18h
default pod/nginx-deployment-66b6c48dd5-j4hqv 0/1 Pending 0 18h
kube-system pod/calico-kube-controllers-bcc6f659f-bgmkb 1/1 Running 0 18h
kube-system pod/calico-kube-controllers-bcc6f659f-pksws 1/1 Terminating 0 7d21h
kube-system pod/calico-node-fns6d 0/1 Running 2 7d21h
kube-system pod/calico-node-t854c 1/1 Running 0 7d21h
kube-system pod/calico-node-vbsdr 1/1 Running 0 7d21h
kube-system pod/coredns-74ff55c5b-gw8j2 1/1 Running 1 18h
kube-system pod/coredns-74ff55c5b-xhvqb 1/1 Terminating 0 7d21h
kube-system pod/coredns-74ff55c5b-xr9mb 1/1 Terminating 0 7d21h
kube-system pod/coredns-74ff55c5b-zhhkx 1/1 Running 1 18h
kube-system pod/etcd-ardl-k8latam01 1/1 Running 2 7d21h
kube-system pod/kube-apiserver-ardl-k8latam01 1/1 Running 4 7d21h
kube-system pod/kube-controller-manager-ardl-k8latam01 1/1 Running 2 7d21h
kube-system pod/kube-proxy-2lmpb 1/1 Running 0 7d21h
kube-system pod/kube-proxy-fchv8 1/1 Running 2 7d21h
kube-system pod/kube-proxy-xks7h 1/1 Running 0 7d21h
kube-system pod/kube-scheduler-ardl-k8latam01 1/1 Running 2 7d21h
kube-system pod/metrics-server-68b849498d-6q74v 1/1 Terminating 0 7d20h
kube-system pod/metrics-server-68b849498d-7lpz8 0/1 Pending 0 18h
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default service/dashboardlb ClusterIP 10.100.82.105 <none> 8001/TCP 7d20h
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 7d21h
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 7d21h
kube-system service/metrics-server ClusterIP 10.101.85.63 <none> 443/TCP 7d20h
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
kube-system daemonset.apps/calico-node 3 3 0 3 0 beta.kubernetes.io/os=linux 7d21h
kube-system daemonset.apps/kube-proxy 3 3 1 3 1 kubernetes.io/os=linux 7d21h
NAMESPACE NAME READY UP-TO-DATE AVAILABLE AGE
default deployment.apps/nginx-deployment 0/3 3 0 18h
kube-system deployment.apps/calico-kube-controllers 1/1 1 1 7d21h
kube-system deployment.apps/coredns 2/2 2 2 7d21h
kube-system deployment.apps/metrics-server 0/1 1 0 7d20h
NAMESPACE NAME DESIRED CURRENT READY AGE
default replicaset.apps/nginx-deployment-66b6c48dd5 3 3 0 18h
kube-system replicaset.apps/calico-kube-controllers-bcc6f659f 1 1 1 7d21h
kube-system replicaset.apps/coredns-74ff55c5b 2 2 2 7d21h
kube-system replicaset.apps/metrics-server-68b849498d 1 1 0 7d20h
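The listing above does not show which node each stuck pod is on. A minimal sketch of how the pods could be grouped by node and status follows. The helper name pods_by_node_status is hypothetical, and the column positions assume the kubectl v1.20 layout of `kubectl get pods --all-namespaces -o wide --no-headers` (NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE ...); other kubectl versions may print the RESTARTS column differently and shift the fields.

```shell
#!/usr/bin/env bash
# Group pods by node and status, one "NODE STATUS COUNT" line per group.
# Reads `kubectl get pods --all-namespaces -o wide --no-headers` output on stdin.
# Assumption: field 4 is STATUS and field 8 is NODE (kubectl v1.20 layout).
pods_by_node_status() {
  awk '{ count[$8 " " $4]++ }
       END { for (key in count) print key, count[key] }' | LC_ALL=C sort
}

# Usage (needs a working kubeconfig):
#   kubectl get pods --all-namespaces -o wide --no-headers | pods_by_node_status
```

If most of the Terminating pods turn out to sit on a single node, that would suggest looking at that node's kubelet, calico-node and kube-proxy pods first.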
In the cluster-info dump I get:
==== START logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
Request log error: the server rejected our request for an unknown reason (get pods second-app-deployment-7f794d896f-q6zn5)
==== END logs for container second-node of pod default/second-app-deployment-7f794d896f-q6zn5 ====
With describe:
[root@ardl-k8latam01 testwordpress]# kubectl describe pod nginx-deployment-66b6c48dd5-5cttk
Name: nginx-deployment-66b6c48dd5-5cttk
Namespace: default
Priority: 0
Node: ardl-k8latam02/10.48.41.12
Start Time: Fri, 18 Dec 2020 17:06:57 -0300
Labels: app=nginx
pod-template-hash=66b6c48dd5
Annotations: <none>
Status: Pending
IP:
IPs: <none>
Controlled By: ReplicaSet/nginx-deployment-66b6c48dd5
Containers:
nginx:
Container ID:
Image: nginx:1.14.2
Image ID:
Port: 80/TCP
Host Port: 0/TCP
State: Waiting
Reason: ContainerCreating
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9rnk6 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9rnk6:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9rnk6
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 22m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to set up pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host, failed to clean up sandbox container "044a2201b141e6679570d0f0ec3b1967b2a5bf0b230fa5058ed2bc6711eba55e" network for pod "nginx-deployment-66b6c48dd5-5cttk": networkPlugin cni failed to teardown pod "nginx-deployment-66b6c48dd5-5cttk_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: dial tcp 10.96.0.1:443: connect: no route to host]
Normal Scheduled 21m default-scheduler Successfully assigned default/nginx-deployment-66b6c48dd5-5cttk to ardl-k8latam02
Normal SandboxChanged 2m27s (x93 over 22m) kubelet Pod sandbox changed, it will be killed and re-created.
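The event above says the CNI plugin on ardl-k8latam02 cannot reach 10.96.0.1:443, which is the ClusterIP of the kubernetes Service in the listing earlier. A minimal sketch of a connectivity check that could be run on that node follows. The helper name probe_tcp and the APISERVER_IP variable are hypothetical; port 6443 is the kubeadm default for the API server and is an assumption about this cluster.

```shell
#!/usr/bin/env bash
# Try to open a TCP connection to HOST PORT, giving up after 3 seconds.
# Uses bash's built-in /dev/tcp redirection plus coreutils `timeout`.
probe_tcp() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port}"
  else
    echo "unreachable: ${host}:${port}"
    return 1
  fi
}

# Usage on the affected node:
#   probe_tcp 10.96.0.1 443          # Service IP taken from the error message
#   probe_tcp "$APISERVER_IP" 6443   # hypothetical: the master's own address
```

If the master's own address answers but the Service IP does not, the kube-proxy pod on that node would be a reasonable next thing to inspect, since the kube-proxy DaemonSet reports only 1 of 3 pods ready.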
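I also tried restarting the worker nodes and the master, but nothing changed. When I try to "describe" a "Terminating" pod, it tells me the pod does not exist.
Is my problem related to Calico?
How can I dig deeper into "Request log error: the server rejected our request for an unknown reason"?
How should I continue investigating?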