Metrics-Server: Node had no addresses that matched types [InternalIP]

Problem description

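I use Rancher 2.5.8 to manage my Kubernetes clusters. Today I created a new cluster, and everything works as expected except the metrics server. Its status is always "CrashLoopBackOff", and the logs show the following: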

E0519 11:46:39.225804       1 server.go:132] unable to fully scrape metrics: [unable to fully scrape metrics from node worker1: unable to fetch metrics from node worker1: unable to extract connection information for node "worker1": node worker1 had no addresses that matched types [InternalIP],unable to fully scrape metrics from node worker2: unable to fetch metrics from node worker2: unable to extract connection information for node "worker2": node worker2 had no addresses that matched types [InternalIP],unable to fully scrape metrics from node worker3: unable to fetch metrics from node worker3: unable to extract connection information for node "worker3": node worker3 had no addresses that matched types [InternalIP],unable to fully scrape metrics from node main1: unable to fetch metrics from node main1: unable to extract connection information for node "main1": node main1 had no addresses that matched types [InternalIP]]
I0519 11:46:39.228205       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0519 11:46:39.228222       1 shared_informer.go:240] Waiting for caches to sync for RequestHeaderAuthRequestController
I0519 11:46:39.228290       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228301       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0519 11:46:39.228310       1 configmap_cafile_content.go:202] Starting client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.228314       1 shared_informer.go:240] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0519 11:46:39.229241       1 secure_serving.go:197] Serving securely on [::]:4443
I0519 11:46:39.229280       1 dynamic_serving_content.go:130] Starting serving-cert::/tmp/apiserver.crt::/tmp/apiserver.key
I0519 11:46:39.229302       1 tlsconfig.go:240] Starting DynamicServingCertificateController
I0519 11:46:39.328399       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file 
I0519 11:46:39.328428       1 shared_informer.go:247] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file 
I0519 11:46:39.328505       1 shared_informer.go:247] Caches are synced for RequestHeaderAuthRequestController

Does anyone know how I can fix this so that the metrics server stops crashing?

Here is the output of kubectl get nodes worker1 -oyaml:

status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP

Solution

The issue is with the metrics server configuration.

The metrics server is configured with --kubelet-preferred-address-types=InternalIP, but the worker nodes do not list any InternalIP address:

$ kubectl get nodes worker1 -oyaml
[...]
status:
  addresses:
  - address: worker1
    type: Hostname
  - address: 65.21.<any>.<ip>
    type: ExternalIP
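
To check every node at once rather than inspecting them one by one, a small script can scan the JSON that kubectl get nodes -o json prints. The following is only a sketch: the function name nodes_missing_address_type is made up for illustration, and it assumes the usual list shape (items[].metadata.name and items[].status.addresses[].type).

```python
import json
from typing import List


def nodes_missing_address_type(node_list_json: str,
                               address_type: str = "InternalIP") -> List[str]:
    """Return the names of nodes that list no address of the given type.

    node_list_json is expected to be the text printed by
    `kubectl get nodes -o json` (assumption: a List object with "items").
    """
    node_list = json.loads(node_list_json)
    missing = []
    for node in node_list.get("items", []):
        addresses = node.get("status", {}).get("addresses", [])
        # Collect the address types this node advertises, e.g. {"Hostname", "ExternalIP"}.
        present_types = {entry.get("type") for entry in addresses}
        if address_type not in present_types:
            missing.append(node["metadata"]["name"])
    return missing
```

One way to use it is to save the output of kubectl get nodes -o json to a file, read the file into a string, and pass that string to the function. Any node name it returns would trigger the "had no addresses that matched types [InternalIP]" error from the log above.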

One solution is to set --kubelet-preferred-address-types=ExternalIP in the metrics server deployment YAML.

A probably better solution, however, is to configure it the way the official metrics server deployment YAML does (source):

- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

As stated in the metrics-server configuration docs:

--kubelet-preferred-address-types - The priority of node address types to use when determining which address to use to connect to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
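
To make the priority rule concrete, here is a simplified model of how such a preference list could be resolved against a node's address list. It is an illustrative sketch, not metrics-server's actual code: the function name pick_node_address is hypothetical, and 203.0.113.10 is a placeholder standing in for the redacted external IP.

```python
from typing import Dict, List


def pick_node_address(addresses: List[Dict[str, str]],
                      preferred_types: List[str]) -> str:
    """Return the first address whose type appears earliest in preferred_types."""
    for wanted in preferred_types:
        for entry in addresses:
            if entry.get("type") == wanted:
                return entry["address"]
    # Mirrors the situation in the log: nothing matched the configured types.
    raise LookupError(f"no addresses that matched types {preferred_types}")


# Address list shaped like worker1's status.addresses (placeholder IP, assumption).
WORKER1 = [
    {"type": "Hostname", "address": "worker1"},
    {"type": "ExternalIP", "address": "203.0.113.10"},
]

# InternalIP is tried first and skipped, then ExternalIP matches.
chosen = pick_node_address(WORKER1, ["InternalIP", "ExternalIP", "Hostname"])
```

With only ["InternalIP"] in the list, the same call raises LookupError, which corresponds to the crash described in the question. With the three-entry list, the lookup falls through to the external IP.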