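Problem description
I am trying to set up the AWS X-Ray DaemonSet in an EKS Kubernetes cluster.
The problem is that the "xray-daemon" pod fails to start and ends up in the "CrashLoopBackOff" state.
When I inspect the daemonset pods, the events show the following error: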
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  20s                default-scheduler  Successfully assigned default/xray-daemon-wcnzt to ip-172-34-169-37.ap-south-1.compute.internal
  Normal   Pulling    15s (x2 over 19s)  kubelet            Pulling image "amazon/aws-xray-daemon:latest"
  Normal   Pulled     12s (x2 over 16s)  kubelet            Successfully pulled image "amazon/aws-xray-daemon:latest"
  Normal   Created    12s (x2 over 16s)  kubelet            Created container xray-daemon
  Warning  Failed     12s (x2 over 16s)  kubelet            Error: failed to start container "xray-daemon": Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: exec: "/usr/bin/xray": stat /usr/bin/xray: no such file or directory: unknown
Steps to reproduce:
First, I created an IAM service account with the AWSXRayDaemonWriteAccess policy attached:
eksctl create iamserviceaccount \
  --name xray-daemon \
  --namespace default \
  --cluster eksdemo1 \
  --attach-policy-arn arn:aws:iam::aws:policy/AWSXRayDaemonWriteAccess \
  --approve \
  --override-existing-serviceaccounts
Then I tried to create the xray daemonset from this "xray-k8s-daemonset.yml" file:
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: xray-daemon
  name: xray-daemon
  namespace: default
  # Update IAM Role ARN created for X-Ray access
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::12345678999:role/eksctl-eksdemo1-addon-iamserviceaccount-defa-Role1-1FIM8S2K6D404
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: xray-daemon
  namespace: default
spec:
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: xray-daemon
  template:
    metadata:
      labels:
        app: xray-daemon
    spec:
      serviceAccountName: xray-daemon
      volumes:
      - name: config-volume
        configMap:
          name: "xray-config"
      containers:
      - name: xray-daemon
        image: amazon/aws-xray-daemon
        command: ["/usr/bin/xray", "-c", "/aws/xray/config.yaml"]
        resources:
          requests:
            cpu: 256m
            memory: 32Mi
          limits:
            cpu: 512m
            memory: 64Mi
        ports:
        - name: xray-ingest
          containerPort: 2000
          hostPort: 2000
          protocol: UDP
        - name: xray-tcp
          containerPort: 2000
          hostPort: 2000
          protocol: TCP
        volumeMounts:
        - name: config-volume
          mountPath: /aws/xray
          readOnly: true
---
# Configuration for AWS X-Ray daemon
apiVersion: v1
kind: ConfigMap
metadata:
  name: xray-config
  namespace: default
data:
  config.yaml: |-
    TotalBufferSizeMB: 24
    Socket:
      UDPAddress: "0.0.0.0:2000"
      TCPAddress: "0.0.0.0:2000"
    Version: 2
---
# k8s service definition for AWS X-Ray daemon headless service
apiVersion: v1
kind: Service
metadata:
  name: xray-service
  namespace: default
spec:
  selector:
    app: xray-daemon
  clusterIP: None
  ports:
  - name: xray-ingest
    port: 2000
    protocol: UDP
  - name: xray-tcp
    port: 2000
    protocol: TCP
I am sure the role ARN is correct (I checked it in the AWS console).
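For anyone trying to reproduce this, the manifest can be applied and the pod events read back roughly as follows. This is only a sketch: the helper names apply_xray_manifest and describe_xray_pods are made up for illustration, while the file name, namespace and label are the ones used in the post.

```shell
# Hypothetical helper names; file name, namespace and label come from the post.

# Create (or update) the ServiceAccount, DaemonSet, ConfigMap and Service.
apply_xray_manifest() {
  kubectl apply -f xray-k8s-daemonset.yml
}

# List the daemon pods, then print their details including the Events section.
describe_xray_pods() {
  kubectl get pods --namespace default --selector app=xray-daemon
  kubectl describe pods --namespace default --selector app=xray-daemon
}
```

Running apply_xray_manifest followed by describe_xray_pods against the cluster should show the pod status and an Events section like the one quoted in the problem description.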
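Solution
This problem was fixed by using the "amazon/aws-xray-daemon:3.2.0" Docker image.
It seems that the pod template in "xray-k8s-daemonset.yml" no longer works with the latest version of the "amazon/aws-xray-daemon" image.
Starting with image version 3.3.0, there were some changes that break the xray daemonset deployment in "xray-k8s-daemonset.yml".
So I replaced the image "amazon/aws-xray-daemon" with "amazon/aws-xray-daemon:3.2.0" in "xray-k8s-daemonset.yml".
Version 3.2.0 works fine and the problem is fixed.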
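The image replacement described above can also be scripted instead of edited by hand. The following is a minimal sketch: pin_xray_image is a hypothetical helper name, and it assumes the manifest contains a single "image: amazon/aws-xray-daemon" line, with or without a tag.

```shell
# Hypothetical helper: reads a manifest on stdin and writes it to stdout with
# the amazon/aws-xray-daemon image pinned to tag 3.2.0. An existing tag
# (for example :latest) is replaced; all other lines pass through unchanged.
pin_xray_image() {
  sed -E 's|^([[:space:]]*image:[[:space:]]*amazon/aws-xray-daemon)(:[^[:space:]]*)?[[:space:]]*$|\1:3.2.0|'
}
```

It could be used as `pin_xray_image < xray-k8s-daemonset.yml > xray-k8s-daemonset-pinned.yml`, followed by `kubectl apply -f xray-k8s-daemonset-pinned.yml`; the name of the output file is arbitrary.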
P.S. It is still unclear how to make the xray daemonset work with the latest version of the "amazon/aws-xray-daemon" image.
So other solutions are still welcome.
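One way to investigate this, offered as a sketch and not a confirmed fix: the error message says /usr/bin/xray does not exist in the container, so the binary may have moved in newer images. Comparing the entrypoint recorded in the two image tags would show which path each image expects. show_image_entrypoint is a hypothetical helper name, and a local Docker installation is assumed.

```shell
# Hypothetical helper: pull an image and print the Entrypoint and Cmd stored
# in its configuration as JSON. Assumes a local Docker daemon is available.
show_image_entrypoint() {
  docker pull "$1" >/dev/null &&
    docker image inspect --format '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' "$1"
}

# Example calls (not executed here):
#   show_image_entrypoint amazon/aws-xray-daemon:3.2.0
#   show_image_entrypoint amazon/aws-xray-daemon:latest
```

If the entrypoint path differs between the two tags, the "command" field in the DaemonSet pod template would presumably need to point at the new path instead of /usr/bin/xray.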