Nginx ingress controller random 404s - possibly not recognizing the correct certificate on each request?

Problem description

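I'm trying to expose an endpoint through an Nginx ingress on Kubernetes. The same configuration seems to work for another controller deployment in the same cluster, but here I get seemingly random 404 responses mixed in with the expected ones.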

Configuration of the ingress-nginx-controller deployment, modified from the Helm chart:

apiVersion: v1
kind: Service
metadata:
  annotations:
    cloud.google.com/load-balancer-type: Internal
  labels:
    helm.sh/chart: ingress-nginx-3.23.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: "0.44.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: kube-system
spec:
  type: LoadBalancer
  ports:
    - name: http
      port: 80
      protocol: TCP
      targetPort: http
    - name: https
      port: 443
      protocol: TCP
      targetPort: https
  selector:
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/component: controller
---
# Source: ingress-nginx/templates/controller-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    helm.sh/chart: ingress-nginx-3.23.0
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/version: "0.44.0"
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/component: controller
  name: ingress-nginx-controller
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/component: controller
  replicas: 1
  revisionHistoryLimit: 10
  minReadySeconds: 0
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
        app.kubernetes.io/instance: ingress-nginx
        app.kubernetes.io/component: controller
    spec:
      dnsPolicy: ClusterFirst
      containers:
        - name: controller
          image: "k8s.gcr.io/ingress-nginx/controller:v0.44.0@sha256:3dd0fac48073beaca2d67a78c746c7593f9c575168a17139a9955a82c63c4b9a"
          imagePullPolicy: IfNotPresent
          lifecycle:
            preStop:
              exec:
                command:
                - /wait-shutdown
          args:
            - /nginx-ingress-controller
            - --publish-service=$(POD_NAMESPACE)/ingress-nginx-controller
            - --election-id=ingress-controller-leader
            - --ingress-class=nginx
            - --configmap=$(POD_NAMESPACE)/ingress-nginx-controller
            - --validating-webhook=:8443
            - --validating-webhook-certificate=/usr/local/certificates/cert
            - --validating-webhook-key=/usr/local/certificates/key
            - --default-ssl-certificate=kube-system/nginx-certificates # custom per environment; the secret must be created
          securityContext:
            capabilities:
                drop:
                - ALL
                add:
                - NET_BIND_SERVICE
            runAsUser: 101
            allowPrivilegeEscalation: true
          env:
            - name: POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: LD_PRELOAD
              value: /usr/local/lib/libmimalloc.so
            - name: GODEBUG
              value: x509ignoreCN=0
          livenessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 5
          readinessProbe:
            httpGet:
              path: /healthz
              port: 10254
              scheme: HTTP
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 1
            successThreshold: 1
            failureThreshold: 3
          ports:
            - name: http
              containerPort: 80
              protocol: TCP
            - name: https
              containerPort: 443
              protocol: TCP
            - name: webhook
              containerPort: 8443
              protocol: TCP
          volumeMounts:
            - name: webhook-cert
              mountPath: /usr/local/certificates/
              readOnly: true
          resources:
            requests:
              cpu: 100m
              memory: 90Mi
      nodeSelector:
        kubernetes.io/os: linux
      serviceAccountName: ingress-nginx
      terminationGracePeriodSeconds: 300
      volumes:
        - name: webhook-cert
          secret:
            secretName: ingress-nginx-admission

Ingress configuration (service names/endpoints changed for this post):

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/proxy-body-size: 50m
    ingress.kubernetes.io/proxy-request-buffering: "off"
    kubernetes.io/ingress.class: nginx
    nginx.ingress.kubernetes.io/backend-protocol: HTTPS
    nginx.ingress.kubernetes.io/default-backend: test-endpoint-svc
    nginx.ingress.kubernetes.io/proxy-body-size: 50m
    nginx.ingress.kubernetes.io/proxy-http-version: "1.1"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    nginx.ingress.kubernetes.io/ssl-passthrough: "false"
  labels:
    app: test-endpoint
  name: test-endpoint
  namespace: default
spec:
  backend:
    serviceName: test-endpoint-svc
    servicePort: 443
  rules:
  - host: test.internal
    http:
      paths:
      - backend:
          serviceName: test-endpoint-svc
          servicePort: 443
        path: /
  tls:
  - hosts:
    - test.internal
    secretName: nginx-certificates

Here is sample working output from curl -k -vvv -u <user>:<password> https://test.internal:
* Rebuilt URL to: https://test.internal/
*   Trying <correct ip>...
* TCP_NODELAY set
* Connected to test.internal (<correct ip>) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), Server hello (2):
* TLSv1.3 (IN), [no content] (0):
* TLSv1.3 (IN), Encrypted Extensions (8):
* TLSv1.3 (IN), Certificate (11):
* TLSv1.3 (IN), CERT verify (15):
* TLSv1.3 (IN), Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), [no content] (0):
* TLSv1.3 (OUT), Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=test.internal; O=test.internal
*  start date: Mar  4 00:53:27 2021 GMT
*  expire date: Mar  4 00:53:27 2022 GMT
*  issuer: CN=test.internal; O=test.internal
*  SSL certificate verify result: self signed certificate (18), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* TLSv1.3 (OUT), TLS app data, [no content] (0):
* Server auth using Basic with user '<user>'
* Using Stream ID: 1 (easy handle 0x55a3643114c0)
* TLSv1.3 (OUT), [no content] (0):
> GET / HTTP/2
> Host: test.internal
> Authorization: Basic <password>
> User-Agent: curl/7.61.1
> Accept: */*
> 
* TLSv1.3 (IN), Newsession Ticket (4):
* TLSv1.3 (IN), [no content] (0):
* Connection state changed (MAX_CONCURRENT_STREAMS == 128)!
* TLSv1.3 (OUT), [no content] (0):
< HTTP/2 200 
< date: Thu, 04 Mar 2021 01:05:43 GMT
< content-type: application/json; charset=UTF-8
< content-length: 533
< strict-transport-security: max-age=15724800; includeSubDomains
<expected response>

The same curl call attempted half a second later:

* Rebuilt URL to: https://test.internal/
*   Trying <correct ip>...
* TCP_NODELAY set
* Connected to test.internal (<correct ip>) port 443 (#0)
* ALPN, server accepted to use h2
* Server certificate:
*  subject: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  start date: Feb  5 20:51:55 2021 GMT
*  expire date: Feb  5 20:51:55 2022 GMT
*  issuer: O=Acme Co; CN=Kubernetes Ingress Controller Fake Certificate
*  SSL certificate verify result: unable to get local issuer certificate (20), [no content] (0):
* Server auth using Basic with user <user>
* Using Stream ID: 1 (easy handle 0x560637cb34c0)
* TLSv1.3 (OUT), [no content] (0):
< HTTP/2 404 
< date: Thu, 04 Mar 2021 01:05:44 GMT
< content-type: text/html
< content-length: 146
< strict-transport-security: max-age=15724800; includeSubDomains
< 
<html>
<head><title>404 Not Found</title></head>
<body>
<center><h1>404 Not Found</h1></center>
<hr><center>nginx</center>
</body>
</html>
* TLSv1.3 (IN), [no content] (0):
* Connection #0 to host test.internal left intact
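
The two transcripts above differ in two places: the certificate subject and the HTTP status. Below is a minimal sketch of a shell filter that reduces verbose curl output to just those two fields, which makes it easier to see whether every 404 coincides with the fake certificate. The function name summarize_curl is hypothetical, and the filter assumes the verbose line format shown in this post.

```shell
#!/bin/sh
# summarize_curl (hypothetical helper): read `curl -v` output on stdin and
# print "<status> | <certificate subject>" for one request.
summarize_curl() {
  awk '
    # Certificate subject line, e.g. "*  subject: CN=test.internal; O=test.internal"
    /^\* +subject:/ { sub(/^\* +subject: */, ""); subject = $0 }
    # Response status line, e.g. "< HTTP/2 404"
    /^< HTTP\//     { status = $3 }
    END             { print status " | " subject }
  '
}

# Usage (verbose output goes to stderr, so swap it into the pipe):
#   curl -k -sS -v -u <user>:<password> https://test.internal 2>&1 >/dev/null | summarize_curl
```

If the fake certificate and the 404 always appear together, that would suggest the failing requests never reach the server block configured for test.internal.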

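I've tried various changes to the annotations on the ingress, adding/removing the default host, and adding/removing the GODEBUG environment variable on the controller. There doesn't seem to be any pattern to when these calls succeed versus return 404, and I'm reluctant to dig into turning on 404 logging because it requires a custom template (https://github.com/kubernetes/ingress-nginx/issues/4856). The nginx-certificates secret exists in both the kube-system and default namespaces, and was generated using openssl.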
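
To check whether there really is no pattern, one option is to repeat the request many times and count the status codes. This is a minimal sketch, not part of the original setup: the function names probe and tally are hypothetical, and the one-second pause between requests is an arbitrary choice.

```shell
#!/bin/sh
# probe N URL [extra curl args...]: send N requests, print one status code per line.
probe() {
  n=$1
  url=$2
  shift 2
  i=0
  while [ "$i" -lt "$n" ]; do
    # -k skips certificate verification, as in the original curl call
    curl -k -s -o /dev/null -w '%{http_code}\n' "$@" "$url"
    i=$((i + 1))
    sleep 1
  done
}

# tally: read status codes on stdin, print "<code>: <count>" per distinct code.
tally() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# Usage:
#   probe 20 https://test.internal/ -u <user>:<password> | tally
```

A roughly even split between 200 and 404 would hint that requests alternate between two different backends, while a skewed or time-dependent split would point elsewhere.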

What is going on here?

Solution

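I'm developing a React application that is deployed with Kubernetes. In my experience, seeing a 404 page (that is, the fact that some response comes back at all) means the deployment itself is working.

In my case, whenever I got a 404, the problem was in the frontend code. So you should check your frontend, in particular the routing configuration.

Hope this gives you some direction.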