Problem description
I have the following default alert rule in the Prometheus Operator:
- alert: KubePodNotReady
  annotations:
    message: Pod {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod {{`}}`}} has been in a non-ready state for longer than 15 minutes.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready
  expr: |-
    sum by (namespace,pod) (
      max by(namespace,pod) (
        kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}
      ) * on(namespace,pod) group_left(owner_kind) topk by(namespace,pod) (
        1,max by(namespace,pod,owner_kind) (kube_pod_owner{owner_kind!="Job"})
      )
    ) > 0
  for: 15m
  labels:
    severity: warning
I can get the pod labels with the following expressions:
kube_pod_info * on(namespace,pod) group_left kube_pod_labels{label_teamname="example"}
kube_pod_info * on(namespace,pod) group_left(label_teamname) kube_pod_labels
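However, I'm not sure how to update the alert rule so that it shows the label. I tried simply adding the label without editing the expression: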
labels:
  severity: warning
  teamname: '{{ $labels.label_teamname }}'
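but it didn't work.
Do I need to change the expression to include the team name in the alert? If so, please suggest how to change the following expression: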
expr: |-
  sum by (namespace,pod) (
    max by(namespace,pod) (
      kube_pod_status_phase{job="kube-state-metrics",phase=~"Pending|Unknown"}
    ) * on(namespace,pod) group_left(owner_kind) topk by(namespace,pod) (
      1,max by(namespace,pod,owner_kind) (kube_pod_owner{owner_kind!="Job"})
    )
  ) > 0
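Solution
This expression worked for me. The original expression ends in sum by (namespace,pod), so its result keeps only the namespace and pod labels and $labels.label_teamname is empty. Wrapping it in parentheses and joining it with kube_pod_labels on namespace and pod, using group_left(label_teamname), copies the team label onto every alerting series. Because kube_pod_labels always has the value 1, the multiplication leaves the alert value unchanged: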
(sum by (namespace,pod) (
  max by(namespace,pod) (
    kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}
  ) * on(namespace,pod) group_left(owner_kind) topk by(namespace,pod) (
    1,max by(namespace,pod,owner_kind) (kube_pod_owner{owner_kind!="Job"})
  )
) > 0) * on(namespace,pod) group_left(label_teamname) kube_pod_labels
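A sketch of the complete rule with the joined expression in place. It assumes, as the {{`{{`}} escaping in the default rule suggests, that the rule is rendered through a Helm chart template, so the teamname label uses the same escaping; outside Helm the value would be written as '{{ $labels.label_teamname }}'. The explicit teamname label is optional: every series returned by the joined expression already carries label_teamname, and that label is attached to the alert as-is.

```yaml
# Sketch only: assumes a Helm chart template, hence the escaped braces
# copied from the default rule.
- alert: KubePodNotReady
  annotations:
    message: Pod {{`{{`}} $labels.namespace {{`}}`}}/{{`{{`}} $labels.pod {{`}}`}} has been in a non-ready state for longer than 15 minutes.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodnotready
  expr: |-
    (sum by (namespace,pod) (
      max by(namespace,pod) (
        kube_pod_status_phase{job="kube-state-metrics",namespace=~".*",phase=~"Pending|Unknown"}
      ) * on(namespace,pod) group_left(owner_kind) topk by(namespace,pod) (
        1,max by(namespace,pod,owner_kind) (kube_pod_owner{owner_kind!="Job"})
      )
    ) > 0) * on(namespace,pod) group_left(label_teamname) kube_pod_labels
  for: 15m
  labels:
    severity: warning
    # Optional: copies the existing label_teamname label to a shorter name.
    teamname: '{{`{{`}} $labels.label_teamname {{`}}`}}'
```

One consequence of the join: a pod that has no matching kube_pod_labels series drops out of the result, so it would not trigger this alert at all.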