Kubernetes DNS lookup not working from worker node - connection timed out; no servers could be reached

Problem description

I have built a new Kubernetes v1.20.1 cluster with a single master and a single node, using the Calico CNI.

I deployed a busybox pod in the default namespace.

# kubectl get pods busybox -o wide
NAME      READY   STATUS    RESTARTS   AGE   IP             NODE        NOMINATED NODE   READINESS GATES
busybox   1/1     Running   0          12m   10.203.0.129   node02      <none>           <none>

nslookup does not work

# kubectl exec -ti busybox -- nslookup kubernetes.default
Server:    10.96.0.10
Address 1: 10.96.0.10

nslookup: can't resolve 'kubernetes.default'

The cluster is running RHEL 8 with the latest updates.

I followed these steps: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/

The nslookup command cannot reach the name server

# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached

command terminated with exit code 1

resolv.conf file

# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local 
nameserver 10.96.0.10
options ndots:5
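
For readers unfamiliar with `ndots`, here is a minimal sketch of how a glibc-style resolver would expand `kubernetes.default` with the search list above. `expand_query` is a hypothetical helper written for illustration, not part of any tool; it ignores absolute names with a trailing dot, and other resolvers (for example musl or the one in busybox) may order the attempts differently.

```shell
#!/bin/sh
# Print the names a glibc-style resolver would try, in order.
# Arguments: <name> <ndots> <search domain>...
expand_query() {
  name="$1"; ndots="$2"; shift 2
  # Count the dots in the name; $(( )) strips any padding wc may add.
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "$dots" -ge "$ndots" ]; then
    # Enough dots: the literal name is tried before the search list.
    echo "$name"
  fi
  for domain in "$@"; do
    echo "$name.$domain"
  done
  if [ "$dots" -lt "$ndots" ]; then
    # Too few dots: the literal name is tried only after the search list.
    echo "$name"
  fi
}

# Values taken from the resolv.conf shown above.
expand_query kubernetes.default 5 \
  default.svc.cluster.local svc.cluster.local cluster.local
```

With these values the second candidate, `kubernetes.default.svc.cluster.local`, is the one that should resolve. Every candidate is still sent to 10.96.0.10, so none of them can succeed while that server is unreachable.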

The DNS pods are running

# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME                      READY   STATUS    RESTARTS   AGE
coredns-74ff55c5b-472vx   1/1     Running   1          85m
coredns-74ff55c5b-c75bq   1/1     Running   1          85m

DNS pod logs

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
coredns-1.7.0
linux/amd64,go1.14.4,f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
coredns-1.7.0
linux/amd64,go1.14.4,f59c03d

The service is defined

# kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   86m

I can see the endpoints of the DNS pods

# kubectl get endpoints kube-dns --namespace=kube-system
NAME       ENDPOINTS                                               AGE
kube-dns   10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more...   86m

I enabled logging, but I see no traffic reaching the DNS pods

# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
coredns-1.7.0
linux/amd64,go1.14.4,f59c03d

I can ping the DNS pod

# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms

nmap shows the ports as filtered

# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT     STATE    SERVICE
53/tcp   filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper

Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds
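
Since ICMP reaches the DNS pod but port 53 shows as filtered, it can help to repeat the same lookup against the service IP and against a CoreDNS pod IP directly. The sketch below does that; `compare_dns_servers` is a hypothetical helper name, and it assumes the pod's image ships an nslookup that accepts the server as a second argument.

```shell
#!/bin/sh
# Run one lookup against several DNS servers from inside a single pod.
# Arguments: <pod> <name> <server IP>...
compare_dns_servers() {
  pod="$1"; name="$2"; shift 2
  for server in "$@"; do
    echo "== $name via $server =="
    # nslookup takes the server to query as an optional second argument.
    kubectl exec "$pod" -- nslookup "$name" "$server" \
      || echo "lookup via $server failed"
  done
}

# Example, using the addresses from the output above:
# compare_dns_servers dnsutils kubernetes.default 10.96.0.10 10.203.0.5
```

If the pod IP answers while the service IP does not, the service path (kube-proxy) is the likelier suspect. If both fail, traffic to port 53 is probably being dropped between the nodes.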

If I schedule the pod on the master node, nslookup works and nmap shows the port as open

# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server:         10.96.0.10
Address:        10.96.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

bash-5.0# nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).

PORT   STATE SERVICE
53/tcp open  domain

Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds

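Why does nslookup not work from pods running on the worker node? How can I fix this?

I rebuilt the servers twice and still see the same issue.

Thanks

Update: adding the kubeadm config file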

# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  kubeletExtraArgs:
    cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.0.0.0/14
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"

Solution

First of all, note that both the Calico and the kubeadm documentation list CentOS/RHEL 7+ as the supported versions.

By default, RHEL 8 uses nftables instead of iptables. We can still use iptables, but on RHEL 8 "iptables" actually uses the kernel's nft framework in the background (see "Running Iptables on RHEL 8").

9.2.1. nftables replaces iptables as the default network packet filtering framework
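
To see which backend the iptables binary on a node uses, you can inspect `iptables --version`: iptables 1.8 and later append `(nf_tables)` or `(legacy)` to the version string. A small sketch follows; `iptables_backend` is a hypothetical helper, and the sample string at the end is illustrative rather than taken from the cluster in the question.

```shell
#!/bin/sh
# Classify the output of `iptables --version`.
# Argument: the version string, e.g. "iptables v1.8.4 (nf_tables)".
iptables_backend() {
  case "$1" in
    *"(nf_tables)"*) echo "nf_tables" ;;
    *"(legacy)"*)    echo "legacy" ;;
    # Older releases print no backend suffix at all.
    *)               echo "unknown" ;;
  esac
}

# On a node you would run: iptables_backend "$(iptables --version)"
# Illustrative sample input:
iptables_backend "iptables v1.8.4 (nf_tables)"
```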

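I believe nftables may be causing this network issue, because the nftables adoption page states:

Kubernetes does not support nftables yet.

Note: for now I strongly recommend using RHEL 7 instead of RHEL 8.

With that in mind, here is some information that may help you on RHEL 8.
I reproduced your issue and found a solution that works for me.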

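First, I opened the ports required by Calico. They are listed under "Network requirements" in the Calico documentation.
As a workaround, I then reverted to the old iptables backend on all cluster nodes. You can do this by setting FirewallBackend to iptables in /etc/firewalld/firewalld.conf.
Finally, I restarted firewalld to make the new rules active.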
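
The three steps above could be scripted roughly as follows. This is a sketch under assumptions, not the exact commands I ran: `set_firewall_backend` and `apply_workaround` are hypothetical helper names, 179/tcp is only Calico's BGP port (the full list depends on your Calico setup), and `sed -i` assumes GNU sed, as shipped with RHEL.

```shell
#!/bin/sh
# Set FirewallBackend in a firewalld.conf-style file, adding the key if absent.
# Arguments: <config file> <backend>
set_firewall_backend() {
  conf="$1"; backend="$2"
  if grep -q '^FirewallBackend=' "$conf"; then
    # Replace the existing setting in place (GNU sed).
    sed -i "s/^FirewallBackend=.*/FirewallBackend=$backend/" "$conf"
  else
    echo "FirewallBackend=$backend" >> "$conf"
  fi
}

# Apply the workaround on one node; run as root on every cluster node.
apply_workaround() {
  # Open Calico's BGP port; extend this to match your Calico configuration.
  firewall-cmd --permanent --add-port=179/tcp
  # Switch firewalld back to the iptables backend.
  set_firewall_backend /etc/firewalld/firewalld.conf iptables
  # Restart firewalld so the new backend and rules take effect.
  systemctl restart firewalld
}
```

Nothing runs until `apply_workaround` is called, so the file can be sourced and reviewed before changing a node.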

I then ran nslookup from a Pod on the worker node (kworker), and it seems to work correctly.

root@kmaster:~# kubectl get pod,svc -o wide
NAME      READY   STATUS    RESTARTS   AGE    IP           NODE      NOMINATED NODE   READINESS GATES
pod/web   1/1     Running   0          112s   10.99.32.1   kworker   <none>           <none>

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE     SELECTOR
service/kubernetes   ClusterIP   10.99.0.1    <none>        443/TCP   5m51s   <none>
root@kmaster:~# kubectl exec -it web -- bash
root@web:/# nslookup kubernetes.default
Server:         10.99.0.10
Address:        10.99.0.10#53

Name:   kubernetes.default.svc.cluster.local
Address: 10.99.0.1

root@web:/#