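nginx-ingress worker processes restart constantly

Problem description

I recently upgraded my ingress controller to kubernetes-ingress v1.10.0. The ingresses seem to route traffic correctly, but after checking the pod logs I noticed that a huge number of notices is being generated: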

2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2748
2021/02/10 09:40:23 [notice] 19#19: worker process 2748 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2745
2021/02/10 09:40:23 [notice] 19#19: worker process 2745 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
W0210 09:40:23.416499       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416812       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.416912       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2735
2021/02/10 09:40:23 [notice] 19#19: worker process 2735 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2737
2021/02/10 09:40:23 [notice] 19#19: worker process 2737 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2742 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2746
2021/02/10 09:40:23 [notice] 19#19: worker process 2746 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2744
2021/02/10 09:40:23 [notice] 19#19: worker process 2744 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2740
2021/02/10 09:40:23 [notice] 19#19: worker process 2740 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2736
2021/02/10 09:40:23 [notice] 19#19: worker process 2736 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2741
2021/02/10 09:40:23 [notice] 19#19: worker process 2734 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2741 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2739
2021/02/10 09:40:23 [notice] 19#19: worker process 2739 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2738
2021/02/10 09:40:23 [notice] 19#19: worker process 2738 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2743
2021/02/10 09:40:23 [notice] 19#19: worker process 2743 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2749
2021/02/10 09:40:23 [notice] 19#19: worker process 2749 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2747
2021/02/10 09:40:23 [notice] 19#19: worker process 2747 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [warn] 2718#2718: *6697105 an upstream response is buffered to a temporary file /var/cache/nginx/proxy_temp/6/79/0000214796 while reading upstream,client: xxxx,server: xxxx,request: "GET /xxxx HTTP/1.1",upstream: "xxxx",host: "xxxx",referrer: "xxxx"
2021/02/10 09:40:23 [notice] 2769#2769: signal process started
2021/02/10 09:40:23 [notice] 19#19: signal 1 (SIGHUP) received from 2769,reconfiguring
2021/02/10 09:40:23 [notice] 19#19: reconfiguring
2021/02/10 09:40:23 [notice] 19#19: using the "epoll" event method
2021/02/10 09:40:23 [notice] 19#19: start worker processes
2021/02/10 09:40:23 [notice] 19#19: start worker process 2770
2021/02/10 09:40:23 [notice] 19#19: start worker process 2771
2021/02/10 09:40:23 [notice] 19#19: start worker process 2772
2021/02/10 09:40:23 [notice] 19#19: start worker process 2773
2021/02/10 09:40:23 [notice] 19#19: start worker process 2774
2021/02/10 09:40:23 [notice] 19#19: start worker process 2775
2021/02/10 09:40:23 [notice] 19#19: start worker process 2776
2021/02/10 09:40:23 [notice] 19#19: start worker process 2777
2021/02/10 09:40:23 [notice] 19#19: start worker process 2778
2021/02/10 09:40:23 [notice] 19#19: start worker process 2779
2021/02/10 09:40:23 [notice] 19#19: start worker process 2780
2021/02/10 09:40:23 [notice] 19#19: start worker process 2781
2021/02/10 09:40:23 [notice] 19#19: start worker process 2782
2021/02/10 09:40:23 [notice] 19#19: start worker process 2783
2021/02/10 09:40:23 [notice] 19#19: start worker process 2784
2021/02/10 09:40:23 [notice] 19#19: start worker process 2785
90.114.22.230 - - [10/Feb/2021:09:40:23 +0000] "GET /xxxx HTTP/1.1" 200 352910 "xxxx" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:84.0) Gecko/20100101 Firefox/84.0" "-"
2021/02/10 09:40:23 [notice] 2753#2753: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: gracefully shutting down
2021/02/10 09:40:23 [notice] 2755#2755: exiting
2021/02/10 09:40:23 [notice] 2753#2753: exiting
2021/02/10 09:40:23 [notice] 2762#2762: gracefully shutting down
2021/02/10 09:40:23 [notice] 2760#2760: exiting
2021/02/10 09:40:23 [notice] 2766#2766: gracefully shutting down
2021/02/10 09:40:23 [notice] 2762#2762: exiting
2021/02/10 09:40:23 [notice] 2766#2766: exiting
2021/02/10 09:40:23 [notice] 2759#2759: gracefully shutting down
2021/02/10 09:40:23 [notice] 2759#2759: exiting
2021/02/10 09:40:23 [notice] 2763#2763: gracefully shutting down
2021/02/10 09:40:23 [notice] 2761#2761: gracefully shutting down
2021/02/10 09:40:23 [notice] 2767#2767: gracefully shutting down
2021/02/10 09:40:23 [notice] 2763#2763: exiting
2021/02/10 09:40:23 [notice] 2767#2767: exiting
2021/02/10 09:40:23 [notice] 2761#2761: exiting
2021/02/10 09:40:23 [notice] 2760#2760: exit
2021/02/10 09:40:23 [notice] 2753#2753: exit
2021/02/10 09:40:23 [notice] 2766#2766: exit
2021/02/10 09:40:23 [notice] 2764#2764: gracefully shutting down
2021/02/10 09:40:23 [notice] 2764#2764: exiting
2021/02/10 09:40:23 [notice] 2752#2752: gracefully shutting down
2021/02/10 09:40:23 [notice] 2752#2752: exiting
2021/02/10 09:40:23 [notice] 2763#2763: exit
2021/02/10 09:40:23 [notice] 2762#2762: exit
2021/02/10 09:40:23 [notice] 2764#2764: exit
2021/02/10 09:40:23 [notice] 2759#2759: exit
2021/02/10 09:40:23 [notice] 2755#2755: exit
2021/02/10 09:40:23 [notice] 2752#2752: exit
2021/02/10 09:40:23 [notice] 2767#2767: exit
2021/02/10 09:40:23 [notice] 2761#2761: exit
2021/02/10 09:40:23 [notice] 2758#2758: gracefully shutting down
2021/02/10 09:40:23 [notice] 2758#2758: exiting
2021/02/10 09:40:23 [notice] 2756#2756: gracefully shutting down
2021/02/10 09:40:23 [notice] 2756#2756: exiting
2021/02/10 09:40:23 [notice] 2758#2758: exit
2021/02/10 09:40:23 [notice] 2756#2756: exit
2021/02/10 09:40:23 [notice] 2765#2765: gracefully shutting down
2021/02/10 09:40:23 [notice] 2765#2765: exiting
2021/02/10 09:40:23 [notice] 2757#2757: gracefully shutting down
2021/02/10 09:40:23 [notice] 2757#2757: exiting
2021/02/10 09:40:23 [notice] 2754#2754: gracefully shutting down
2021/02/10 09:40:23 [notice] 2754#2754: exiting
2021/02/10 09:40:23 [notice] 2754#2754: exit
2021/02/10 09:40:23 [notice] 2765#2765: exit
2021/02/10 09:40:23 [notice] 2757#2757: exit
I0210 09:40:23.604803       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",Namespace:"xxxx",Name:"xxxx",UID:"82a71705-194e-4919-a7e2-a511d52c1a7a",APIVersion:"networking.k8s.io/v1beta1",ResourceVersion:"77919848",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.604873       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"10246997-07ae-41e1-b811-0ec630647f3b",ResourceVersion:"182677830",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605520       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"d628825f-1b06-4719-b4b0-4d971b8c0a54",ResourceVersion:"182677778",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605557       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"4b7b1fa1-1d7d-41a5-9d97-5f5aee52ade7",ResourceVersion:"182678922",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605569       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"b86b8b8e-82b9-40d0-b02d-073db557c0e1",ResourceVersion:"182678955",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
I0210 09:40:23.605577       1 event.go:282] Event(v1.ObjectReference{Kind:"Ingress",UID:"585ccdee-9807-442e-9b4f-7d1a97264216",ResourceVersion:"182677754",FieldPath:""}): type: 'Normal' reason: 'AddedOrUpdated' Configuration for xxxx/xxxx was added or updated
W0210 09:40:23.614001       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614213       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
W0210 09:40:23.614304       1 listers.go:79] can not retrieve list of objects using index : Index with name namespace does not exist
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2763
2021/02/10 09:40:23 [notice] 19#19: worker process 2755 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2763 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2767 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2766
2021/02/10 09:40:23 [notice] 19#19: worker process 2752 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2753 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2766 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2756
2021/02/10 09:40:23 [notice] 19#19: worker process 2756 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2758 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2759 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2760 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2761 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2762 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: worker process 2764 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2754
2021/02/10 09:40:23 [notice] 19#19: worker process 2754 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received
2021/02/10 09:40:23 [notice] 19#19: signal 17 (SIGCHLD) received from 2765
2021/02/10 09:40:23 [notice] 19#19: worker process 2765 exited with code 0
2021/02/10 09:40:23 [notice] 19#19: signal 29 (SIGIO) received

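This seems to loop forever, and very quickly, on all pods. I deployed my controller using these manifests and recreated the default server secret as mentioned in the release notes.

The controller arguments are: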
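
To put a number on "very quickly", the reloads in a saved copy of the log can be counted per second. The following is only a minimal sketch: it assumes each reload shows up as the master-process line `signal 1 (SIGHUP) received`, as in the excerpt above, and the function name `reloads_per_second` is made up for illustration.

```python
import re
from collections import Counter

# Matches the line the nginx master process logs on each reload, e.g.
# 2021/02/10 09:40:23 [notice] 19#19: signal 1 (SIGHUP) received from 2769,reconfiguring
RELOAD_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \[notice\] \d+#\d+: "
    r"signal 1 \(SIGHUP\) received"
)


def reloads_per_second(lines):
    """Return a Counter mapping each timestamp (1 s resolution) to its reload count."""
    counts = Counter()
    for line in lines:
        match = RELOAD_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing it the lines of a file saved from `kubectl logs` gives one entry per second of log time; a healthy controller should only reload when an Ingress, Secret, or ConfigMap it watches actually changes.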

args:
  - -nginx-configmaps=$(POD_NAMESPACE)/nginx-config
  - -default-server-tls-secret=$(POD_NAMESPACE)/default-server-secret
  - -global-configuration=$(POD_NAMESPACE)/nginx-configuration
  - -report-ingress-status
  - -enable-prometheus-metrics
  - -enable-snippets

Here is the content of my nginx-config ConfigMap:

data:
  client-max-body-size: 50m
  proxy-read-timeout: 5m
  server-tokens: "False"

Any idea what is going on there and how to fix it?

Edit

After more research, I found that two of my ingresses are being updated constantly:

Name:             xxxx
Namespace:        xxxx
Address:          
Default backend:  default-http-backend:80 (<none>)
TLS:
  xxxx terminates xxxx
Rules:
  Host  Path  Backends
  ----  ----  --------
  *     *     default-http-backend:80 (<none>)
Annotations:
  ingress.kubernetes.io/ssl-redirect:                true
  kubectl.kubernetes.io/last-applied-configuration:  {"apiVersion":"extensions/v1beta1","kind":"Ingress","metadata":{"annotations":{"ingress.kubernetes.io/ssl-redirect":"true","kubernetes.io/ingress.class":"nginx","nginx.org/mergeable-ingress-type":"master"},"labels":{"app.kubernetes.io/component":"xxxx","app.kubernetes.io/instance":"xxxx","app.kubernetes.io/name":"xxxx","app.kubernetes.io/part-of":"xxxx","argocd.argoproj.io/instance":"xxxx"},"name":"xxxx","namespace":"xxxx"},"spec":{"rules":[{"host":"xxxx"}],"tls":[{"hosts":["xxxx"],"secretName":"xxxx"}]}}

  kubernetes.io/ingress.class:       nginx
  nginx.org/mergeable-ingress-type:  master
Events:
  Type    Reason          Age                       From                      Message
  ----    ------          ----                      ----                      -------
  Normal  AddedOrUpdated  3m5s (x2600127 over 6d)   nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  2m12s (x2599793 over 6d)  nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated
  Normal  AddedOrUpdated  66s (x2600182 over 6d)    nginx-ingress-controller  Configuration for xxxx/xxxx was added or updated

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    ingress.kubernetes.io/ssl-redirect: "true"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"extensions/v1beta1","secretName":"xxxx"}]}}
    kubernetes.io/ingress.class: nginx
    nginx.org/mergeable-ingress-type: master
  creationTimestamp: "2021-01-18T09:55:07Z"
  generation: 1
  labels:
    app.kubernetes.io/component: xxxx
    app.kubernetes.io/instance: xxxx
    app.kubernetes.io/name: xxxx
    app.kubernetes.io/part-of: xxxx
    argocd.argoproj.io/instance: xxxx
  name: xxxx
  namespace: xxxx
  resourceVersion: "182677754"
  selfLink: /apis/extensions/v1beta1/namespaces/xxxx/ingresses/xxxx
  uid: 585ccdee-9807-442e-9b4f-7d1a97264216
spec:
  rules:
  - host: xxxx
  tls:
  - hosts:
    - xxxx
    secretName: xxxx
status:
  loadBalancer:
    ingress:
    - {}

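My environment is managed by ArgoCD, but after checking its logs it does not look like the updates come from ArgoCD. I wonder whether the updates are related to the -report-ingress-status option.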
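
For scale, the event counters in the describe output above can be turned into an average rate. This is a rough sketch only: it assumes the `(xN over <duration>)` format with a single-unit duration such as `6d`, and `events_per_second` is a name invented here. Since kubectl rounds the duration, the result is approximate.

```python
import re

# Matches the counter in the Age column of `kubectl describe`,
# e.g. "3m5s (x2600127 over 6d)". Only single-unit durations are handled.
AGE_RE = re.compile(r"\(x(\d+) over (\d+)([smhd])\)")
SECONDS_PER_UNIT = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def events_per_second(age_column):
    """Average event rate implied by a '(xN over <duration>)' counter."""
    match = AGE_RE.search(age_column)
    if match is None:
        raise ValueError(f"no event counter in {age_column!r}")
    count, amount, unit = match.groups()
    return int(count) / (int(amount) * SECONDS_PER_UNIT[unit])
```

For the first event line, `events_per_second("3m5s (x2600127 over 6d)")` works out to 2600127 / 518400, which is roughly 5 `AddedOrUpdated` events per second for that one entry.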

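Edit 2: I removed -report-ingress-status and it did not change anything.

Solution

I don't know the real root cause, but I deleted all of the TLS secrets, certificates, and ingresses that were being constantly updated, and recreated them. That solved the problem.

Several incidents happened before this issue and may be related to it: 2 of my 3 ingress nodes failed, and a wrong CRD was applied during the upgrade before being quickly fixed.

That's all I can say for now, but deleting the resources related to the constantly updated ingresses and recreating them does solve the problem.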
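
To find which ingresses are the ones being rewritten, two snapshots of `kubectl get ingress --all-namespaces -o json` taken a few seconds apart can be compared by `resourceVersion`. The helper below is a hypothetical sketch, not part of any tool: the function names are made up, and it only assumes the standard List output with `metadata.namespace`, `metadata.name`, and `metadata.resourceVersion` on each item.

```python
import json


def load_snapshot(path):
    """Load one saved `kubectl get ingress --all-namespaces -o json` output."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resource_versions(ingress_list):
    """Map 'namespace/name' to resourceVersion for a parsed Ingress List object."""
    return {
        f"{item['metadata']['namespace']}/{item['metadata']['name']}":
            item["metadata"]["resourceVersion"]
        for item in ingress_list.get("items", [])
    }


def changed_ingresses(before, after):
    """Return the sorted keys whose resourceVersion differs between two snapshots."""
    old, new = resource_versions(before), resource_versions(after)
    return sorted(key for key in old.keys() & new.keys() if old[key] != new[key])
```

Any write to an object bumps its `resourceVersion`, so an Ingress that shows up in the result while nobody is deploying is being modified by something in the cluster, and is a candidate for the delete-and-recreate fix described above.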