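Problem description
I have three namespaces: monitoring, with prometheus-operator; rabbitmq, with the RabbitMQ queue manager and prometheus-adapter; and worker, with an application that only creates inputs for the RabbitMQ pods. I want to use a Horizontal Pod Autoscaler (HPA) to scale the worker pods (in the worker namespace) using the metrics of the queue "task_queue" from the RabbitMQ pods (in the rabbitmq namespace). All of these metrics are collected by prometheus-operator (in the monitoring namespace) and show up in the Prometheus frontend:
Querying "rabbitmq_queue_messages" at prometheus-url:8080/graph: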
rabbitmq_queue_messages{durable="true",endpoint="metrics",instance="x.x.x.x:9419",job="rabbitmq-server",namespace="rabbitmq",pod="rabbitmq-server-0",queue="task_queue",service="rabbitmq-server",vhost="/"}
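The same series can also be fetched outside the web UI through Prometheus' HTTP API (/api/v1/query). The Python sketch below only builds the request URL for an instant query filtered to task_queue. The host and port are the ones from the question and will differ per cluster, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Host and port as used in the question; replace with your Prometheus address.
PROMETHEUS_URL = "http://prometheus-url:8080"


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a GET URL for Prometheus' instant-query endpoint (/api/v1/query)."""
    # urlencode percent-encodes the braces, quotes and commas in the PromQL expression.
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


# Restrict the series to task_queue in the rabbitmq namespace.
PROMQL = 'rabbitmq_queue_messages{namespace="rabbitmq",queue="task_queue"}'

if __name__ == "__main__":
    print(instant_query_url(PROMETHEUS_URL, PROMQL))
```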
RabbitMQ, prometheus-operator and prometheus-adapter were installed from Helm charts.
RabbitMQ (values.yaml contains the password and enables metrics on port 9419 for scraping):
helm install --namespace rabbitmq rabbitmq-server stable/rabbitmq \
--set extraPlugins=rabbitmq_prometheus \
-f charts/default/rabbitmq/values.yaml
Prometheus-adapter:
helm upgrade --install --namespace rabbitmq prometheus-adapter stable/prometheus-adapter \
--set prometheus.url="http://pmt-server-prometheus-oper-prometheus.monitoring.svc" \
--set prometheus.port="9090"
Prometheus-operator:
helm upgrade --install --namespace monitoring pmt-server stable/prometheus-operator \
--set prometheusOperator.createCustomResource=false \
-f charts/default/values.yaml
Prometheus values.yaml:
prometheus:
  additionalServiceMonitors:
    - name: rabbitmq-svc-monitor
      selector:
        matchLabels:
          app: rabbitmq
      namespaceSelector:
        matchNames:
          - rabbitmq
      endpoints:
        - port: metrics
          interval: 10s
          path: /metrics
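The additionalServiceMonitors entry above selects Services in the listed namespaces whose labels contain every matchLabels pair. Its port: metrics refers to the Service port by name. The following is a minimal Python model of that selection logic. The Service labels in it are hypothetical, because the real labels depend on the RabbitMQ chart.

```python
def matches_labels(match_labels: dict, labels: dict) -> bool:
    """matchLabels semantics: every key/value pair must be present on the object."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def service_selected(monitor: dict, service: dict) -> bool:
    """Is this Service picked up by the ServiceMonitor? Namespace first, then labels."""
    in_namespace = service["namespace"] in monitor["namespaceSelector"]["matchNames"]
    return in_namespace and matches_labels(monitor["selector"]["matchLabels"], service["labels"])


# Mirrors the additionalServiceMonitors entry from values.yaml.
MONITOR = {
    "name": "rabbitmq-svc-monitor",
    "selector": {"matchLabels": {"app": "rabbitmq"}},
    "namespaceSelector": {"matchNames": ["rabbitmq"]},
}

# Hypothetical Service labels; check the real ones with
# kubectl get svc -n rabbitmq --show-labels
RABBIT_SVC = {"namespace": "rabbitmq",
              "labels": {"app": "rabbitmq", "release": "rabbitmq-server"}}

if __name__ == "__main__":
    print(service_selected(MONITOR, RABBIT_SVC))  # True
```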
The custom metric is available:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/rabbitmq/services/rabbitmq-server/rabbitmq_queue_messages" | jq .
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/custom.metrics.k8s.io/v1beta1/namespaces/rabbitmq/services/rabbitmq-server/rabbitmq_queue_messages"
  },
  "items": [
    {
      "describedObject": {
        "kind": "Service",
        "namespace": "rabbitmq",
        "name": "rabbitmq-server",
        "apiVersion": "/v1"
      },
      "metricName": "rabbitmq_queue_messages",
      "timestamp": "2020-08-20T12:15:39Z",
      "value": "0",
      "selector": null
    }
  ]
}
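The response above is a MetricValueList in which each item is keyed by the namespace and name of the described object. The Python sketch below reads a trimmed copy of that response into that shape. It is an illustration only, not client code from any Kubernetes library. The value field is a Kubernetes quantity string, so non-integer results may come back in milli-units such as "1500m".

```python
import json

# The response from the question, trimmed to the fields relevant here.
RESPONSE = """
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {"kind": "Service", "namespace": "rabbitmq",
                          "name": "rabbitmq-server", "apiVersion": "/v1"},
      "metricName": "rabbitmq_queue_messages",
      "timestamp": "2020-08-20T12:15:39Z",
      "value": "0"
    }
  ]
}
"""


def object_metric_values(body: str) -> dict:
    """Map (namespace, name) of each described object to its raw quantity string."""
    data = json.loads(body)
    return {
        (item["describedObject"]["namespace"], item["describedObject"]["name"]): item["value"]
        for item in data["items"]
    }


if __name__ == "__main__":
    print(object_metric_values(RESPONSE))  # {('rabbitmq', 'rabbitmq-server'): '0'}
```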
Here is my hpa.yaml:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: rabbitmq-queue-worker-hpa
  namespace: worker
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 50
  metrics:
    - type: Object
      object:
        metric:
          name: rabbitmq_queue_messages
        describedObject:
          apiVersion: "/v1"
          kind: Service
          name: rabbitmq-server.rabbitmq.svc.cluster.local
        target:
          type: Value
          value: 100
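With target type: Value on an Object metric, the HPA divides the metric by the target and scales the current replica count by that ratio. It skips changes within a tolerance, which is 0.1 by default and set by --horizontal-pod-autoscaler-tolerance, and it clamps the result to minReplicas/maxReplicas. The Python sketch below is a simplified model of that calculation. It omits the real controller's handling of pod readiness and stabilization windows.

```python
import math

# Default of kube-controller-manager's --horizontal-pod-autoscaler-tolerance.
TOLERANCE = 0.1


def desired_replicas(current_replicas: int, current_value: float, target_value: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Simplified HPA calculation for an Object metric with target type Value."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(ratio * current_replicas)
    # Clamp to the bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 250 messages in task_queue against a target of 100, starting from 1 worker.
    print(desired_replicas(1, 250, 100, min_replicas=1, max_replicas=50))  # 3
```

The ratio multiplies the current replica count. So if the queue did not drain, the same 250 messages would ask for ceil(2.5 × 3) = 8 replicas on the next pass.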
But the HPA does not work, as kubectl describe shows:
kubectl describe hpa/rabbitmq-queue-worker-hpa -n worker
Name: rabbitmq-queue-worker-hpa
Namespace: worker
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: rabbitmq-scaling-demo-app
             meta.helm.sh/release-namespace: worker
CreationTimestamp: Thu, 20 Aug 2020 08:42:32 -0300
Reference: Deployment/worker
Metrics: ( current / target )
"rabbitmq_queue_messages" on Service/rabbitmq-server.rabbitmq.svc.cluster.local (target value): <unknown> / 100
Min replicas: 1
Max replicas: 50
Deployment pods: 1 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededGetScale the HPA controller was able to get the target's current scale
ScalingActive False FailedGetObjectMetric the HPA was unable to compute the replica count: unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: the server could not find the metric rabbitmq_queue_messages for services rabbitmq-server.rabbitmq.svc.cluster.local
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedComputeMetricsReplicas 60m (x12 over 63m) horizontal-pod-autoscaler invalid metrics (1 invalid out of 1), first error is: failed to get object metric value: unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
Warning FailedGetObjectMetric 60m (x13 over 63m) horizontal-pod-autoscaler unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
Warning FailedGetObjectMetric 58m (x3 over 58m) horizontal-pod-autoscaler unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: the server is currently unable to handle the request (get services.custom.metrics.k8s.io rabbitmq-server.rabbitmq.svc.cluster.local)
Warning FailedGetObjectMetric 2m59s (x218 over 57m) horizontal-pod-autoscaler unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: the server could not find the metric rabbitmq_queue_messages for services rabbitmq-server.rabbitmq.svc.cluster.local
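I believe the HPA is trying to find the RabbitMQ service in the worker namespace ("Service on worker ..."):
Warning FailedGetObjectMetric 60m (x13 over 63m) horizontal-pod-autoscaler unable to get metric rabbitmq_queue_messages: Service on worker rabbitmq-server.rabbitmq.svc.cluster.local/unable to fetch metrics from custom metrics API: no custom metrics API (custom.metrics.k8s.io) registered
But the service is in the rabbitmq namespace. I tried both the RabbitMQ service FQDN (rabbitmq-server.rabbitmq.svc.cluster.local) and just the service name (rabbitmq-server). What am I missing? Is there a way to make this work? The point is that I have another project with more than 10 namespaces, and all of them use the same RabbitMQ server (in the rabbitmq namespace). Putting them all into one namespace would be a nightmare. Thanks.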
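In autoscaling/v2beta2, describedObject only has apiVersion, kind and name. It has no namespace field, so the HPA controller looks the object up in the HPA's own namespace, which matches the "Service on worker ..." part of the events. The Python sketch below models the resulting custom metrics API path. It is a simplified reconstruction based on the kubectl get --raw call above. It also shows that an FQDN is used verbatim as an object name rather than resolved through DNS.

```python
CUSTOM_METRICS_API = "/apis/custom.metrics.k8s.io/v1beta1"


def object_metric_path(hpa_namespace: str, resource: str, object_name: str, metric: str) -> str:
    """Path for an Object metric; the namespace comes from the HPA, not from the object."""
    return f"{CUSTOM_METRICS_API}/namespaces/{hpa_namespace}/{resource}/{object_name}/{metric}"


if __name__ == "__main__":
    # What the HPA in the worker namespace asks for (no such Service exists there).
    print(object_metric_path("worker", "services",
                             "rabbitmq-server.rabbitmq.svc.cluster.local",
                             "rabbitmq_queue_messages"))
    # What the manual kubectl get --raw call in the question asked for (it works).
    print(object_metric_path("rabbitmq", "services", "rabbitmq-server",
                             "rabbitmq_queue_messages"))
```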
EDIT 1: my custom metrics config.yaml:
prometheus:
  url: http://pmt-server-prometheus-oper-prometheus.monitoring.svc
  port: 9090
rbac:
  create: true
serviceAccount:
  create: true
service:
  port: 443
logLevel: 6
rules:
  custom:
    - seriesQuery: 'rabbitmq_queue_messages{namespace!="",service!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
          service: {resource: "service"}
      metricsQuery: sum(<<.Series>>{<<.LabelMatchers>>,queue="task_queue"}) by (<<.GroupBy>>)
Then I installed the adapter Helm chart with that file:
helm upgrade --install --namespace rabbitmq prometheus-adapter stable/prometheus-adapter -f config.yaml
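prometheus-adapter fills in the metricsQuery template for each request:
- <<.Series>> becomes the series name.
- <<.LabelMatchers>> becomes the namespace and object matchers derived from the resources.overrides mapping.
- <<.GroupBy>> becomes the resource label.

The Python string replacement below is a simplified stand-in for the adapter's Go templating. It assumes plain equality matchers, and the real adapter may emit regex matchers (for example service=~"...") instead.

```python
def render_metrics_query(template: str, series: str,
                         label_matchers: list, group_by: list) -> str:
    """Fill prometheus-adapter style placeholders with plain string substitution."""
    return (template.replace("<<.Series>>", series)
            .replace("<<.LabelMatchers>>", ",".join(label_matchers))
            .replace("<<.GroupBy>>", ",".join(group_by)))


# metricsQuery from config.yaml.
TEMPLATE = 'sum(<<.Series>>{<<.LabelMatchers>>,queue="task_queue"}) by (<<.GroupBy>>)'

if __name__ == "__main__":
    # Assumed matchers for a request about Service rabbitmq-server in namespace rabbitmq.
    print(render_metrics_query(TEMPLATE, "rabbitmq_queue_messages",
                               ['namespace="rabbitmq"', 'service="rabbitmq-server"'],
                               ["service"]))
```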
And this is the HPA description when the HPA is created in the rabbitmq namespace:
Name: rabbitmq-queue-worker-hpa
Namespace: rabbitmq
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: rabbitmq-scaling-demo-app
             meta.helm.sh/release-namespace: worker
CreationTimestamp: Fri, 21 Aug 2020 08:45:25 -0300
Reference: Deployment/worker
Metrics: ( current / target )
resource cpu on pods (as a percentage of request): <unknown> / 80%
Min replicas: 1
Max replicas: 50
Deployment pods: 0 current / 0 desired
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale False FailedGetScale the HPA controller was unable to get the target's current scale: deployments/scale.apps "worker" not found
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedGetScale 9s (x17 over 4m11s) horizontal-pod-autoscaler deployments/scale.apps "worker" not found
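This output is the mirror image of the first problem. scaleTargetRef is also resolved in the HPA's namespace, so the scale subresource of the worker Deployment is requested under rabbitmq, where no such Deployment exists. The following is a toy Python model of both lookups using the namespaces from the question. Its lookup table is a hypothetical stand-in for the API server.

```python
# Where each referenced object actually lives in the question's cluster (toy API server).
OBJECT_NAMESPACES = {
    ("deployments", "worker"): "worker",
    ("services", "rabbitmq-server"): "rabbitmq",
}


def resolvable(hpa_namespace: str, resource: str, name: str) -> bool:
    """A reference inside the HPA only resolves if the object is in the HPA's namespace."""
    return OBJECT_NAMESPACES.get((resource, name)) == hpa_namespace


def scale_subresource_path(hpa_namespace: str, name: str) -> str:
    """Scale subresource of an apps/v1 Deployment, looked up in the HPA's namespace."""
    return f"/apis/apps/v1/namespaces/{hpa_namespace}/deployments/{name}/scale"


def check_hpa(hpa_namespace: str) -> dict:
    """Which of the HPA's two object references would resolve from this namespace."""
    return {
        "scaleTargetRef": resolvable(hpa_namespace, "deployments", "worker"),
        "describedObject": resolvable(hpa_namespace, "services", "rabbitmq-server"),
    }


if __name__ == "__main__":
    for ns in ("worker", "rabbitmq"):
        print(ns, scale_subresource_path(ns, "worker"), check_hpa(ns))
```

In this model, no single namespace satisfies both references at once.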