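Problem description
OpenShift 4.4.17 cluster (3 master nodes and 3 worker nodes).
Trying to view the logs of (or exec a terminal into) a pod running on a worker node fails with an error. The same happens in the OpenShift GUI. Doing the same for pods running on master nodes works without problems.
Example 1: pod running on a worker node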
$ oc whoami
kube:admin
$ oc get pod -n lamp
NAME READY STATUS RESTARTS AGE
lamp-lamp-6c7d9f467d-jsn4t 3/3 Running 0 108d
$ oc logs lamp-lamp-6c7d9f467d-jsn4t httpd -n lamp
error: You must be logged in to the server (the server has asked for the client to provide credentials ( pods/log lamp-lamp-6c7d9f467d-jsn4t))
Example 2: pod running on a master node
$ oc get pod -n openshift-apiserver
NAME READY STATUS RESTARTS AGE
apiserver-6d64545f-5lmb8 1/1 Running 0 2d19h
apiserver-6d64545f-hktqd 1/1 Running 0 2d19h
apiserver-6d64545f-kb4qt 1/1 Running 0 2d19h
$ oc logs apiserver-6d64545f-5lmb8 -n openshift-apiserver
copying system trust bundle
I0225 20:41:39.989689 1 requestheader_controller.go:243] (..output omitted..)
Investigating the kubelet on the worker nodes:
The kubelet service is running on every worker node, but
journalctl -u kubelet
shows these two lines:
Unable to authenticate the request due to an error: x509: certificate signed by unknown authority
logging error output: "Unauthorized"
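To pull just these lines out of a noisy kubelet journal, a small filter can help. This is a minimal sketch: the function name kubelet_auth_errors is made up for illustration, and the patterns are simply the two messages quoted above.

```shell
# Hypothetical helper: keep only log lines that hint at
# authentication or certificate trouble. Reads from stdin.
kubelet_auth_errors() {
  grep -E 'x509:|Unauthorized|Unable to authenticate'
}

# Example use on a worker node (assumes access to the systemd journal):
#   journalctl -u kubelet --no-pager --since "1 hour ago" | kubelet_auth_errors
```

Reading from stdin keeps the filter usable on saved log files as well as on live journalctl output.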
About the kubeconfig on the worker nodes:
Looking at the contents of /etc/kubernetes/kubeconfig:
- kubelet connects to api-server --> https://api-int.ocs-cls1.mycompany.lab
- the server passes valid certificate signed by --> kube-apiserver-lb-signer
- certificate-authority-data carries --> kube-apiserver-lb-signer rootCA
The kubeconfig looks correct.
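For anyone who wants to repeat the CA check, here is a rough sketch of how the embedded certificate could be decoded. The helper name kubeconfig_ca_pem is hypothetical, and it assumes the certificate-authority-data key and its base64 value sit on a single line, which is the usual layout but not guaranteed.

```shell
# Hypothetical helper: print the decoded certificate-authority-data
# from a kubeconfig file. $1 is the path to the kubeconfig.
# Assumes "certificate-authority-data: <base64>" is on one line;
# only the first such entry is used.
kubeconfig_ca_pem() {
  awk '$1 == "certificate-authority-data:" {print $2; exit}' "$1" | base64 -d
}

# Example use on a worker node:
#   kubeconfig_ca_pem /etc/kubernetes/kubeconfig | openssl x509 -noout -subject -issuer -enddate
```

Note that openssl x509 only prints the first certificate it reads, so if the data holds a bundle of several CAs the later ones will not show up in that output.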
Update:
I also noticed these log lines reporting a bad certificate:
$ oc -n openshift-apiserver logs apiserver-6d64545f-5lmb8
log.go:172] http: TLS handshake error from 10.128.0.12:47078: remote error: tls: bad certificate
...
Update 2:
I also checked the apiserver-loopback-client certificate:
$ curl --resolve apiserver-loopback-client:6443:{IP_MASTER} -v -k https://apiserver-loopback-client:6443/healthz
* server certificate verification SKIPPED
* server certificate status verification SKIPPED
* common name: apiserver-loopback-client@1614330374 (matched)
* server certificate expiration date OK
* server certificate activation date OK
* certificate public key: RSA
* certificate version: #3
* subject: CN=apiserver-loopback-client@1614330374
* start date: Fri,26 Feb 2021 08:06:13 GMT
* expire date: Sat,26 Feb 2022 08:06:13 GMT
* issuer: CN=apiserver-loopback-client-ca@1614330374
Solution
Try this loop, which approves any outstanding certificate signing requests every two seconds:
while :; do
  sleep 2
  # approve every CSR currently listed; xargs -r skips the call when there are none
  oc get csr -o name | xargs -r oc adm certificate approve
done
In another terminal, ssh to any master node and run:
crictl ps -a | awk '/Running/&&/-cert-syncer/{print $1}' | xargs -r crictl stop
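The approval loop above re-approves every CSR on each pass, including ones that are already issued. If you would rather touch only the pending ones, something like the following might do. It is a sketch under assumptions: pending_csr_names is a made-up helper, and it relies on the CONDITION value being the last column of the oc get csr table, which can differ between versions.

```shell
# Hypothetical helper: read "oc get csr" table output on stdin and
# print the names of requests whose last column is exactly "Pending".
pending_csr_names() {
  awk '$NF == "Pending" {print $1}'
}

# Example use from a machine logged in as cluster admin:
#   oc get csr --no-headers | pending_csr_names | xargs -r oc adm certificate approve
```

Filtering on the table output is fragile compared with querying the API objects directly, so treat it as a convenience for interactive use rather than something to script against.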