Problem description
I am facing an issue where the authentication operator is unstable (it keeps flapping between Available = True and Degraded = True). The operator checks its health against the endpoint https://oauth-openshift.apps.oc.sow.expert/healthz and, at least some of the time, considers it unavailable.
Cluster version:
[root@bastion ~]# oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.1     True        False         44h     Error while reconciling 4.7.1: the cluster operator ingress is degraded
Cluster operator description:
[root@bastion ~]# oc describe clusteroperator authentication
Name: authentication
Namespace:
Labels: <none>
Annotations: exclude.release.openshift.io/internal-openshift-hosted: true
include.release.openshift.io/self-managed-high-availability: true
include.release.openshift.io/single-node-developer: true
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
Creation Timestamp: 2021-03-15T19:54:21Z
Generation: 1
Managed Fields:
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:exclude.release.openshift.io/internal-openshift-hosted:
f:include.release.openshift.io/self-managed-high-availability:
f:include.release.openshift.io/single-node-developer:
f:spec:
f:status:
.:
f:extension:
Manager: cluster-version-operator
Operation: Update
Time: 2021-03-15T19:54:21Z
API Version: config.openshift.io/v1
Fields Type: FieldsV1
fieldsV1:
f:status:
f:conditions:
f:relatedObjects:
f:versions:
Manager: authentication-operator
Operation: Update
Time: 2021-03-15T20:03:18Z
Resource Version: 1207037
Self Link: /apis/config.openshift.io/v1/clusteroperators/authentication
UID: b7ca7d49-f6e5-446e-ac13-c5cc6d06fac1
Spec:
Status:
Conditions:
Last Transition Time: 2021-03-17T11:42:49Z
Message: OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
Reason: AsExpected
Status: False
Type: Degraded
Last Transition Time: 2021-03-17T11:42:53Z
Message: All is well
Reason: AsExpected
Status: False
Type: Progressing
Last Transition Time: 2021-03-17T11:43:21Z
Message: OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps.oc.sow.expert/healthz": EOF
Reason: OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable
Status: False
Type: Available
Last Transition Time: 2021-03-15T20:01:24Z
Message: All is well
Reason: AsExpected
Status: True
Type: Upgradeable
Extension: <nil>
Related Objects:
Group: operator.openshift.io
Name: cluster
Resource: authentications
Group: config.openshift.io
Name: cluster
Resource: authentications
Group: config.openshift.io
Name: cluster
Resource: infrastructures
Group: config.openshift.io
Name: cluster
Resource: oauths
Group: route.openshift.io
Name: oauth-openshift
Namespace: openshift-authentication
Resource: routes
Group:
Name: oauth-openshift
Namespace: openshift-authentication
Resource: services
Group:
Name: openshift-config
Resource: namespaces
Group:
Name: openshift-config-managed
Resource: namespaces
Group:
Name: openshift-authentication
Resource: namespaces
Group:
Name: openshift-authentication-operator
Resource: namespaces
Group:
Name: openshift-ingress
Resource: namespaces
Group:
Name: openshift-oauth-apiserver
Resource: namespaces
Versions:
Name: oauth-apiserver
Version: 4.7.1
Name: operator
Version: 4.7.1
Name: oauth-openshift
Version: 4.7.1_openshift
Events: <none>
When I curl the same endpoint repeatedly from the bastion server, I get two different outcomes: sometimes it fails with the error "OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443", and other times it succeeds, as shown below:
[root@bastion ~]# curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
*   Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443
* Closing connection 0
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to oauth-openshift.apps.oc.sow.expert:443
[root@bastion ~]# curl -vk https://oauth-openshift.apps.oc.sow.expert/healthz
*   Trying 192.168.124.173...
* TCP_NODELAY set
* Connected to oauth-openshift.apps.oc.sow.expert (192.168.124.173) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), [no content] (0):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Request CERT (13):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), [no content] (0):
* TLSv1.3 (OUT), TLS handshake, Certificate (11):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
*  subject: CN=*.apps.oc.sow.expert
*  start date: Mar 15 20:05:53 2021 GMT
*  expire date: Mar 15 20:05:54 2023 GMT
*  issuer: CN=ingress-operator@1615838672
*  SSL certificate verify result: self signed certificate in certificate chain (19), continuing anyway.
* TLSv1.3 (OUT), TLS app data, [no content] (0):
> GET /healthz HTTP/1.1
> Host: oauth-openshift.apps.oc.sow.expert
> User-Agent: curl/7.61.1
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), [no content] (0):
< HTTP/1.1 200 OK
< Cache-Control: no-cache, no-store, max-age=0, must-revalidate
< Content-Type: text/plain; charset=utf-8
< Expires: 0
< Pragma: no-cache
< Referrer-Policy: strict-origin-when-cross-origin
< X-Content-Type-Options: nosniff
< X-Dns-Prefetch-Control: off
< x-frame-options: DENY
< X-Xss-Protection: 1; mode=block
< Date: Wed, 17 Mar 2021 11:49:50 GMT
< Content-Length: 2
<
* Connection #0 to host oauth-openshift.apps.oc.sow.expert left intact
ok
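To get a feel for how often the route actually fails, the intermittent curl results above can be quantified with a small loop. This is a diagnostic sketch of mine; the `probe` helper and its counters are not part of the original setup:

```shell
# Probe a URL n times and print "ok=<successes> fail=<failures>".
# curl's %{http_code} is 000 when the connection never completed
# (e.g. the SSL_ERROR_SYSCALL case seen above).
probe() {
  url=$1; n=$2; ok=0; fail=0; i=0
  while [ "$i" -lt "$n" ]; do
    code=$(curl -sk -o /dev/null -w '%{http_code}' --max-time 5 "$url") || true
    if [ "$code" = "200" ]; then ok=$((ok + 1)); else fail=$((fail + 1)); fi
    i=$((i + 1))
  done
  echo "ok=$ok fail=$fail"
}

# Example, run on the bastion against the flapping route:
# probe https://oauth-openshift.apps.oc.sow.expert/healthz 10
```

A failure rate that roughly matches the fraction of backends without a listener is a useful hint when a round-robin balancer sits in front of the route.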
On the bastion server I host the HAProxy load balancer and a Squid proxy that lets the internal installation reach the internet.
The HAProxy configuration is as follows:
[root@bastion ~]# cat /etc/haproxy/haproxy.cfg
#---------------------------------------------------------------------
# Example configuration for a possible web application. See the
# full configuration options online.
#
# https://www.haproxy.org/download/1.8/doc/configuration.txt
#
#---------------------------------------------------------------------
#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
# to have these messages end up in /var/log/haproxy.log you will
# need to:
#
# 1) configure syslog to accept network log events. This is done
# by adding the '-r' option to the SYSLOGD_OPTIONS in
# /etc/sysconfig/syslog
#
# 2) configure local2 events to go to the /var/log/haproxy.log
# file. A line like the following can be added to
# /etc/sysconfig/syslog
#
# local2.* /var/log/haproxy.log
#
log 127.0.0.1 local2
chroot /var/lib/haproxy
pidfile /var/run/haproxy.pid
maxconn 4000
user haproxy
group haproxy
daemon
# turn on stats unix socket
stats socket /var/lib/haproxy/stats
# utilize system-wide crypto-policies
#ssl-default-bind-ciphers PROFILE=SYSTEM
#ssl-default-server-ciphers PROFILE=SYSTEM
#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
mode tcp
log global
option tcplog
option dontlognull
option http-server-close
#option forwardfor except 127.0.0.0/8
option redispatch
retries 3
timeout http-request 10s
timeout queue 1m
timeout connect 10s
timeout client 1m
timeout server 1m
timeout http-keep-alive 10s
timeout check 10s
maxconn 3000
# Control Plane config - external
frontend api
bind 192.168.124.174:6443
mode tcp
default_backend api-be
# Control Plane config - internal
frontend api-int
bind 10.164.76.113:6443
mode tcp
default_backend api-be
backend api-be
mode tcp
balance roundrobin
# server bootstrap 10.94.124.2:6443 check
server master01 10.94.124.3:6443 check
server master02 10.94.124.4:6443 check
server master03 10.94.124.5:6443 check
frontend machine-config
bind 10.164.76.113:22623
mode tcp
default_backend machine-config-be
backend machine-config-be
mode tcp
balance roundrobin
# server bootstrap 10.94.124.2:22623 check
server master01 10.94.124.3:22623 check
server master02 10.94.124.4:22623 check
server master03 10.94.124.5:22623 check
# apps config
frontend https
mode tcp
bind 10.164.76.113:443
default_backend https
frontend http
mode tcp
bind 10.164.76.113:80
default_backend http
frontend https-ext
mode tcp
bind 192.168.124.173:443
default_backend https
frontend http-ext
mode tcp
bind 192.168.124.173:80
default_backend http
backend https
mode tcp
balance roundrobin
server storage01 10.94.124.6:443 check
server storage02 10.94.124.7:443 check
server storage03 10.94.124.8:443 check
server worker01 10.94.124.15:443 check
server worker02 10.94.124.16:443 check
server worker03 10.94.124.17:443 check
server worker04 10.94.124.18:443 check
server worker05 10.94.124.19:443 check
server worker06 10.94.124.20:443 check
backend http
mode tcp
balance roundrobin
server storage01 10.94.124.6:80 check
server storage02 10.94.124.7:80 check
server storage03 10.94.124.8:80 check
server worker01 10.94.124.15:80 check
server worker02 10.94.124.16:80 check
server worker03 10.94.124.17:80 check
server worker04 10.94.124.18:80 check
server worker05 10.94.124.19:80 check
server worker06 10.94.124.20:80 check
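One thing worth checking with a round-robin `backend https` like the one above is whether every listed node really terminates the route: if some nodes have no router pod listening on 443, connections balanced to them die early, which shows up as exactly this kind of intermittent EOF / SSL_ERROR_SYSCALL. Below is a sketch that pins the route hostname to one backend IP at a time; the `check_backend` helper is hypothetical, and the IPs are copied from the config above:

```shell
# Try the oauth route against one specific backend, bypassing HAProxy's
# round-robin, and report OK/FAIL for that node. --resolve forces the
# route hostname to connect to the given IP on port 443.
check_backend() {
  host=$1; ip=$2
  if curl -sk --max-time 5 --resolve "$host:443:$ip" \
       "https://$host/healthz" >/dev/null 2>&1; then
    echo "$ip OK"
  else
    echo "$ip FAIL"
  fi
}

# Example (run on the bastion, IPs from the 'backend https' section):
# for ip in 10.94.124.6 10.94.124.7 10.94.124.8 10.94.124.15 10.94.124.16 \
#           10.94.124.17 10.94.124.18 10.94.124.19 10.94.124.20; do
#   check_backend oauth-openshift.apps.oc.sow.expert "$ip"
# done
```

If only the nodes that run router pods answer OK, restricting the backend to those nodes (or relying on HAProxy's `check` to drop the dead ones faster) would stop the flapping; that is one hypothesis to test, not a confirmed diagnosis.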
And here is the Squid proxy configuration:
[root@bastion ~]# cat /etc/squid/squid.conf
#
# Recommended minimum configuration:
#
# Example rule allowing access from your local networks.
# Adapt to list your (internal) IP networks from where browsing
# should be allowed
acl localnet src 0.0.0.1-0.255.255.255 # RFC 1122 "this" network (LAN)
acl localnet src 10.0.0.0/8 # RFC 1918 local private network (LAN)
acl localnet src 100.64.0.0/10 # RFC 6598 shared address space (CGN)
acl localnet src 169.254.0.0/16 # RFC 3927 link-local (directly plugged) machines
acl localnet src 172.16.0.0/12 # RFC 1918 local private network (LAN)
acl localnet src 192.168.0.0/16 # RFC 1918 local private network (LAN)
acl localnet src fc00::/7 # RFC 4193 local private network range
acl localnet src fe80::/10 # RFC 4291 link-local (directly plugged) machines
acl SSL_ports port 443
acl Safe_ports port 80 # http
acl Safe_ports port 21 # ftp
acl Safe_ports port 443 # https
acl Safe_ports port 70 # gopher
acl Safe_ports port 210 # wais
acl Safe_ports port 1025-65535 # unregistered ports
acl Safe_ports port 280 # http-mgmt
acl Safe_ports port 488 # gss-http
acl Safe_ports port 591 # filemaker
acl Safe_ports port 777 # multiling http
acl CONNECT method CONNECT
#
# Recommended minimum Access Permission configuration:
#
# Deny requests to certain unsafe ports
#http_access deny !Safe_ports
# Deny CONNECT to other than secure SSL ports
#http_access deny CONNECT !SSL_ports
# Only allow cachemgr access from localhost
http_access allow localhost manager
http_access deny manager
# We strongly recommend the following be uncommented to protect innocent
# web applications running on the proxy server who think the only
# one who can access services on "localhost" is a local user
#http_access deny to_localhost
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
# Example rule allowing access from your local networks.
# Adapt localnet in the ACL section to list your (internal) IP networks
# from where browsing should be allowed
http_access allow localnet
http_access allow localhost
# And finally deny all other access to this proxy
http_access deny all
# Squid normally listens to port 3128
http_port 3128
http_port 10.164.76.113:3128
# Uncomment and adjust the following to add a disk cache directory.
#cache_dir ufs /var/spool/squid 100 16 256
# Leave coredumps in the first cache dir
coredump_dir /var/spool/squid
#
# Add any of your own refresh_pattern entries above these.
#
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern . 0 20% 4320
Can someone help me resolve this connectivity issue when hitting the application endpoints?
EDIT:
I am getting the following errors in the console pod logs:
[root@bastion cp]# oc logs -n openshift-console console-6697f85d68-p8jxf
W0404 14:59:30.706793 1 main.go:211] Flag inactivity-timeout is set to less then 300 seconds and will be ignored!
I0404 14:59:30.706887 1 main.go:288] cookies are secure!
E0404 14:59:31.221158 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:41.690905 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 14:59:52.155373 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:02.618751 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:13.071041 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:23.531058 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:33.999953 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:44.455873 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
E0404 15:00:54.935240 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc.sow.expert/oauth/token Failed: Head "https://oauth-openshift.apps.oc.sow.expert": EOF
I0404 15:01:05.666751 1 main.go:670] Binding to [::]:8443...
I0404 15:01:05.666776 1 main.go:672] using TLS
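Intermittent EOFs can also come from name resolution rather than from the router itself, so it is worth confirming that the route hostname resolves consistently, both on the bastion and from inside the cluster. A hedged sketch; the `check_dns` helper is mine, and the in-cluster variant assumes `oc` access and a generic UBI image:

```shell
# Report whether a hostname resolves via the system resolver.
check_dns() {
  name=$1
  if getent hosts "$name" >/dev/null; then
    echo "$name resolves"
  else
    echo "$name does not resolve"
  fi
}

# Run a few times on the bastion; inconsistent answers point at DNS:
# for i in 1 2 3 4 5; do check_dns oauth-openshift.apps.oc.sow.expert; done
#
# And from inside the cluster (hypothetical pod name and image):
# oc run dns-test --rm -it --restart=Never \
#   --image=registry.access.redhat.com/ubi8/ubi-minimal -- \
#   getent hosts oauth-openshift.apps.oc.sow.expert
```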
Solution
I just solved this issue. To check whether you have the same problem, run:
oc logs -n openshift-console console-xxxxxxx-yyyyy
and check whether you have a message like this:
error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.oc4.tt.testing/oauth/token failed: Head "https://oauth-openshift.apps.oc4.tt.testing": dial tcp: lookup oauth-openshift.apps.oc4.tt.testing on 172.30.0.10:53: no such host
In my case I was deploying through libvirt, and libvirt handles part of the DNS resolution. I had already added this entry to the libvirt network, but I had to delete it and add it back:
WORKER_IP=192.168.126.51
virsh net-update oc4-xxxx delete dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
virsh net-update oc4-xxxx add dns-host "<host ip='$WORKER_IP'><hostname>oauth-openshift.apps.oc4.tt.testing</hostname></host>"
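Since `virsh net-update` must be passed matching XML for the delete and the re-add, it can help to build the `<host>` record once and reuse it. A small sketch; `dns_host_xml` is a helper of mine, and the network name and IP are the placeholders from the commands above:

```shell
# Build the libvirt <host> DNS record once so the delete and the re-add
# use byte-identical XML.
dns_host_xml() {
  ip=$1; hostname=$2
  printf "<host ip='%s'><hostname>%s</hostname></host>" "$ip" "$hostname"
}

# Example with the network name and IP from the commands above:
# WORKER_IP=192.168.126.51
# REC=$(dns_host_xml "$WORKER_IP" oauth-openshift.apps.oc4.tt.testing)
# virsh net-update oc4-xxxx delete dns-host "$REC"
# virsh net-update oc4-xxxx add dns-host "$REC"
```

Afterwards, `oc get clusteroperator authentication` should settle at Available=True once the operator's next health check against the route succeeds.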