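Problem description
I created a startup probe and made it always fail. That should cause the pod to be killed and restarted, but it doesn't. I see one event for the startup probe failure (and none after it), yet the pod shows as 1/1 Running. When I run the Helm test, it passes!
Kubernetes version: 1.19.4
When I check the events, I get: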
4m44s Normal SuccessfulCreate replicaset/mysqlpod-5957645967 Created pod: mysqlpod-5957645967-fj95t
4m44s Normal ScalingReplicaSet deployment/mysqlpod Scaled up replica set mysqlpod-5957645967 to 1
4m44s Normal Scheduled pod/mysqlpod-5957645967-fj95t Successfully assigned data-layer/mysqlpod-5957645967-fj95t to minikube
4m43s Normal Created pod/mysqlpod-5957645967-fj95t Created container mysql
4m43s Normal Pulled pod/mysqlpod-5957645967-fj95t Container image "mysql:5.6" already present on machine
4m43s Normal Started pod/mysqlpod-5957645967-fj95t Started container mysql
4m41s Warning Unhealthy pod/mysqlpod-5957645967-fj95t Startup probe failed: Warning: Using a password on the command line interface can be insecure.
mysqladmin: connect to server at 'localhost' failed
error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Checking the pod (using --watch), I see:
NAME                            READY   STATUS    RESTARTS   AGE
mysql-db-app-5957645967-fj95t   0/1     Running   0          7m18s
mysql-db-app-5957645967-fj95t   1/1     Running   0          7m43s
Note that it has zero restarts.
My deployment has:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "mysqlapp.name" . }}
  namespace: {{ quote .Values.metadata.namespace }}
spec:
  replicas: {{ .Values.deploymentSpecs.replicas }}
  selector:
    matchLabels:
      {{- include "mysqlapp.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      labels:
        {{- include "mysqlapp.selectorLabels" . | nindent 8 }}
    spec:
      containers:
        - image: "{{ .Values.image.name }}:{{ .Values.image.tag }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          name: {{ .Values.image.name }}
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-credentials
                  key: db-password
          ports:
            - containerPort: {{ .Values.ports.containerPort }}
              name: {{ .Values.image.name }}
          startupProbe:
            exec:
              command:
                - /bin/sh
                - -c
                - mysqladmin ping -u wrong -pwrong
            periodSeconds: {{ .Values.startupProbe.periodSeconds }}
            timeoutSeconds: {{ .Values.startupProbe.timeoutSeconds }}
            successThreshold: {{ .Values.startupProbe.successThreshold }}
            failureThreshold: {{ .Values.startupProbe.failureThreshold }}
Note the "- mysqladmin ping -u wrong -pwrong" line above.
values.yaml:
metadata:
  namespace: data-layer
myprop: value
deploymentSpecs:
  replicas: 1
  labels:
    app: db-service
image:
  name: mysql
  pullPolicy: IfNotPresent
  tag: "5.6"
ports:
  containerPort: 3306
startupProbe:
  periodSeconds: 10
  timeoutSeconds: 2
  successThreshold: 1
  failureThreshold: 5
Even after waiting 5 minutes, I can still run the test (which uses a MySQL client to access the database) and it works fine! Why doesn't this fail?
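Solution
It doesn't fail because, as it turns out, the ping command returns exit status 0 even when the user/password is wrong, as long as it can reach the server.
The mysqladmin documentation describes ping as follows:
Check whether the server is available. The return status from mysqladmin is 0 if the server is running, 1 if it is not. This is 0 even in case of an error such as Access denied, because this means that the server is running but refused the connection, which is different from the server not running.
That also explains the single Unhealthy event: the first probe ran before mysqld had created its socket, so it could not connect and failed. Once the server was up, every later probe exited 0 despite the wrong credentials, and the pod went to 1/1 Running.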
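An exec probe is judged only by the exit status of its command, which is why the wrong credentials made no difference. The sketch below illustrates that rule without needing a MySQL server. The helper probe_result and the two fake_ping_* functions are hypothetical stand-ins written for this example; they are not part of mysqladmin or Kubernetes.

```shell
# Hypothetical helper: judges a command the way an exec probe does,
# looking only at its exit status and ignoring anything it prints.
probe_result() {
  if "$@" >/dev/null 2>&1; then
    echo "success"
  else
    echo "failure"
  fi
}

# Stand-in for "mysqladmin ping" when the server is up but rejects the login
# (assumption for illustration): prints an error, still exits 0.
fake_ping_server_up() {
  echo "Access denied for user 'wrong'" >&2
  return 0
}

# Stand-in for "mysqladmin ping" when the server cannot be reached
# (assumption for illustration): prints an error and exits 1.
fake_ping_server_down() {
  echo "Can't connect to local MySQL server" >&2
  return 1
}

probe_result fake_ping_server_up     # prints: success
probe_result fake_ping_server_down   # prints: failure
```

The error text on stderr plays no part in the result, so a probe that must reject bad credentials needs a command whose exit status reflects them.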
To force a failure and a restart, you can use:
mysqladmin ping -u root -p${MYSQL_ROOT_PASSWORD} --host fake
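With a command that really fails on every attempt, the container is restarted once the startup probe has used up its failure budget, which is roughly periodSeconds multiplied by failureThreshold. A small sketch of that arithmetic using the values from the question follows. The function name startup_budget_seconds is made up for this example, and it assumes initialDelaySeconds is left at its default of 0.

```shell
# Hypothetical helper: approximate time a container gets to pass its
# startup probe before it is killed.
# $1 = periodSeconds, $2 = failureThreshold
startup_budget_seconds() {
  echo $(( $1 * $2 ))
}

# Values from the question's values.yaml: periodSeconds=10, failureThreshold=5
startup_budget_seconds 10 5   # prints: 50
```

So with these settings a restart should show up in the RESTARTS column within about a minute, well inside the 5 minutes waited in the question.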