OpenShift 4.4 - "oc logs" and "oc exec" fail for pods running on worker nodes

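Problem description

OpenShift 4.4.17 cluster (3 master nodes and 3 worker nodes).

Trying to view the logs of a pod running on a worker node (or to open an exec terminal in it) fails with an error. The same happens in the OpenShift GUI. The same operations on pods running on master nodes work without problems.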

Example 1: pod running on a worker node

$ oc whoami
kube:admin
$ oc get pod -n lamp
NAME                         READY   STATUS    RESTARTS   AGE
lamp-lamp-6c7d9f467d-jsn4t   3/3     Running   0          108d

$ oc logs lamp-lamp-6c7d9f467d-jsn4t httpd -n lamp
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log lamp-lamp-6c7d9f467d-jsn4t))

Example 2: pod running on a master node

$ oc get pod -n openshift-apiserver
NAME                       READY   STATUS    RESTARTS   AGE
apiserver-6d64545f-5lmb8   1/1     Running   0          2d19h
apiserver-6d64545f-hktqd   1/1     Running   0          2d19h
apiserver-6d64545f-kb4qt   1/1     Running   0          2d19h

$ oc logs apiserver-6d64545f-5lmb8 -n openshift-apiserver
copying system trust bundle
I0225 20:41:39.989689       1 requestheader_controller.go:243] (..output omitted..)

Investigating the kubelet on the worker nodes:

The kubelet service is running on every worker node, but

journalctl -u kubelet 

shows these two lines:

Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
logging error output: "Unauthorized"
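
To compare the workers quickly, the two messages above can be filtered out of the kubelet journal. This is only a minimal sketch, assuming a bash shell on the node; `kubelet_auth_errors` is a hypothetical helper name, not an OpenShift tool.

```shell
# Hypothetical helper: reads log lines on stdin and keeps only those that
# point at certificate or authentication failures (case-insensitive match).
kubelet_auth_errors() {
  grep -Ei 'x509: certificate signed by unknown authority|Unauthorized'
}

# Usage on a worker node (shown as a comment, not executed here):
#   journalctl -u kubelet --no-pager | kubelet_auth_errors | tail -n 20
```

If the filter prints nothing on a node, that kubelet is probably not affected by this particular error.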

About the kubeconfig on the worker nodes

Inspecting the contents of /etc/kubernetes/kubeconfig:

- kubelet connects to api-server                --> https://api-int.ocs-cls1.mycompany.lab
- the server passes valid certificate signed by --> kube-apiserver-lb-signer
- certificate-authority-data carries            --> kube-apiserver-lb-signer rootCA

The kubeconfig looks correct.
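
One way to repeat this check is to decode the CA embedded in the kubeconfig and read its subject and issuer. The sketch below assumes the file holds a single `certificate-authority-data` entry on one line and that GNU `base64` is available; `kubeconfig_ca` is a hypothetical helper name.

```shell
# Hypothetical helper: print the PEM CA bundle embedded in a kubeconfig.
# Assumption: exactly one certificate-authority-data entry, on a single line.
kubeconfig_ca() {
  awk '$1 == "certificate-authority-data:" { print $2; exit }' "$1" | base64 -d
}

# Usage on a worker node (shown as a comment, not executed here; needs openssl).
# Note that openssl x509 only reads the first certificate in a bundle:
#   kubeconfig_ca /etc/kubernetes/kubeconfig | openssl x509 -noout -subject -issuer -enddate
```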

Update:

I also noticed these log lines reporting a bad certificate:

$ oc -n openshift-apiserver logs apiserver-6d64545f-5lmb8
log.go:172] http: TLS handshake error from 10.128.0.12:47078: remote error: tls: bad certificate
...
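
To see which clients are being rejected, the handshake errors can be grouped by source address. This is a rough sketch that assumes IPv4 addresses and the log format shown above; `bad_cert_clients` is a hypothetical helper name.

```shell
# Hypothetical helper: count "tls: bad certificate" handshake errors per
# client IP, most frequent first. Reads log lines on stdin.
bad_cert_clients() {
  grep 'tls: bad certificate' \
    | sed -E 's/.*TLS handshake error from ([0-9.]+):[0-9]+:.*/\1/' \
    | sort | uniq -c | sort -rn
}

# Usage (shown as comments, not executed here):
#   oc -n openshift-apiserver logs apiserver-6d64545f-5lmb8 | bad_cert_clients
#   oc get pods -A -o wide | grep 10.128.0.12   # map an IP back to a pod
```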

Update 2:

I also checked the apiserver-loopback-client certificate:

$ curl --resolve apiserver-loopback-client:6443:{IP_MASTER} -v -k https://apiserver-loopback-client:6443/healthz
server certificate verification SKIPPED
*        server certificate status verification SKIPPED
*        common name: apiserver-loopback-client@1614330374 (matched)
*        server certificate expiration date OK
*        server certificate activation date OK
*        certificate public key: RSA
*        certificate version: #3
*        subject: CN=apiserver-loopback-client@1614330374
*        start date: Fri,26 Feb 2021 08:06:13 GMT
*        expire date: Sat,26 Feb 2022 08:06:13 GMT
*        issuer: CN=apiserver-loopback-client-ca@1614330374

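Solution

Try this. In one terminal, run a loop that approves certificate signing requests (CSRs) as they appear: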

while :;do
  sleep 2
  oc get csr -o name | xargs -r oc adm certificate approve
done
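
The loop above approves every CSR it finds. To review what is still waiting first, the table printed by `oc get csr` can be filtered for pending requests. This is a sketch that assumes the last column of that table is the condition; `pending_csrs` is a hypothetical helper name.

```shell
# Hypothetical helper: read `oc get csr` table output on stdin and print the
# names of requests whose last column (CONDITION) is exactly "Pending".
pending_csrs() {
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Usage (shown as comments, not executed here):
#   oc get csr | pending_csrs
#   oc get csr | pending_csrs | xargs -r oc adm certificate approve
```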

In another terminal, ssh to any master node and run:

crictl ps -a | awk '/Running/&&/-cert-syncer/{print $1}' | xargs -r crictl stop
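
Before stopping anything, the same filter can be used as a dry run that only lists the matching container IDs. A minimal sketch, reusing the awk expression from the command above; `cert_syncer_ids` is a hypothetical helper name.

```shell
# Hypothetical helper: read `crictl ps -a` output on stdin and print the IDs
# of running containers whose line contains "-cert-syncer". Stops nothing.
cert_syncer_ids() {
  awk '/Running/ && /-cert-syncer/ { print $1 }'
}

# Usage on a master node (shown as a comment, not executed here):
#   crictl ps -a | cert_syncer_ids
```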