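Problem description
I can set up the following both through the console and as a CloudFormation template:
- a scalable target associated with my ALB,
- a CPU target tracking scaling policy,
- an ALBRequestCountPerTarget target tracking policy.
This all works well. Creating the policies in my CloudFormation template also takes care of creating the associated scale-out and scale-in alarms.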
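Problem: the automatically created alarms only fire after 3 breaching datapoints within the previous 3 periods of 60 seconds. So if a sudden load comes in, it takes 3 minutes for the ECS service to scale out. That is too long for me; I want it to scale out as quickly as possible. Also, judging from the documentation, the minimum period for the ALB RequestCountPerTarget metric seems to be 60 seconds:
"Only a period greater than 60s is supported for metrics in the "AWS/" namespaces"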
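To make the timing concrete, the behaviour described above corresponds to alarm settings along the following lines. This is only a sketch using the standard `AWS::CloudWatch::Alarm` property names; the values are inferred from the description above, not exported from the actual auto-created alarm.

```yaml
# Assumed timing settings of the auto-created HIGH alarm (values inferred
# from the behaviour described above, not copied from the real alarm).
Period: 60             # seconds per datapoint
EvaluationPeriods: 3   # look at the previous 3 datapoints
DatapointsToAlarm: 3   # all 3 must breach before the alarm fires
# Resulting delay before scale-out: 3 x 60 s = 180 s
```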
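Manual solution: I can go into the console, open the CloudWatch service, find the HIGH and LOW alarms that were created for me, and edit the HIGH alarm (the one that triggers the scale-out). There I can set the alarm's "Period" to 60 seconds, "DatapointsToAlarm" to 1 (trigger the scale-out action as soon as one datapoint breaches), "EvaluationPeriods" to 1 (only consider the previous 60-second period), and "Threshold" to 500 (if more than 500 requests hit my ALB in the past 60 seconds, add capacity, i.e. scale out).
To test this, I used JMeter to send a large number of requests. I could see the alarm fire within a minute or so, and my ECS service adjusted its desired running task count. This all works well.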
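Expressed with the corresponding `AWS::CloudWatch::Alarm` properties, the console tweak above would look roughly like this. The values are the ones from the manual fix; the `ComparisonOperator` is an assumption based on the "more than 500 requests" wording.

```yaml
# Console settings from the manual fix, written as CloudFormation alarm
# properties. Fragment only: it belongs under an alarm's Properties.
Period: 60                                 # 60-second datapoints
EvaluationPeriods: 1                       # only the previous period
DatapointsToAlarm: 1                       # fire on the first breach
Threshold: 500                             # requests in the last 60 s
ComparisonOperator: GreaterThanThreshold   # assumed from "more than 500"
```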
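But we are all supposed to write infrastructure as code (IaC) these days, right? So I want the console tweaks above to be part of my CloudFormation template. This is where the problem starts. What I did: I defined my own scale-out and scale-in alarms in the template (see the bottom of the template below), with the scaling policy as their action.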
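What happens:
- the alarm goes into the ALARM state within 60 seconds,
- the policy tries to run the action (scale out), but fails with an error:
Failed to execute action
arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster/my-ecs-service:policyName/alb-requests-per-target-per-minute.
Received error: ""
I don't know what this means. All it tells me is that the alarm fired correctly and that an attempt was made to act on it and scale out, but that attempt failed (without giving any error message!).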
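I compared this failure with other, successful actions [the actions in the alarms that AWS creates automatically when the auto scaling policy is created = the ones that fire at t=3 minutes]. The only difference I can see is that the action ARN in the error message is missing a "createdBy" suffix. That suffix does seem to be appended to the action ARN when the "default" alarms (the ones provisioned automatically when CloudFormation creates the auto scaling policy) trigger the action:
Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/tpg-cce-cpu-target-tracking-scaling-policy:createdBy/59b3e5ac-81ae-490f-8ecb-00241506a15e
Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868
Note the difference above (the "createdBy" part is missing from the policy ARN when the action is triggered by my custom alarm). But I don't know how to obtain it: in CloudFormation I use Ref to get the policy ARN, and nothing mentions appending some kind of "createdBy" to the end of it. (Note that this may not be the problem at all. I am just listing what I have found so far, and this is the only difference I have spotted, so it may well be a red herring.)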
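Another clue, maybe, shows up when I look at the alarms in the CloudWatch section of the AWS console:
- I can edit the CloudWatch HIGH alarm that AWS created automatically when I created the policy,
- I cannot edit my custom "scale out" alarm (the one at the bottom of the CloudFormation template below). The error shown in the console when I try to edit my custom alarm is:
Cannot edit myCloudFormationStack-ALBRequestsScaleOutAlarm-E4VY9ZOJ5DOF as it is an Auto Scaling alarm with target tracking scaling policies.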
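Another difference between my alarm and the ones created automatically with the policy: when I go to CloudWatch -> Alarms in the AWS console and open the alarm details, the "Actions" section looks different. For the automatically provisioned alarms I see:
When in alarm, use the action arn:aws:autoscaling:us-east-1:MYACCOUNTID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868
But for my own alarm (defined in the CloudFormation template below), the alarm details (Actions) show:
When in alarm, use policy alb-requests-per-target-per-minute (Keep metric ALBRequestCountPerTarget at target value 1000.)
Here is my full CloudFormation template: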
AWSTemplateFormatVersion: '2010-09-09'
Description: ECS task definition, service, and hooks it up to the ALB via a Target Group
# IMPORTANT: this needs the first CloudFormation layers in place (see the imports below)
Parameters:
  ContainerImageIdParam:
    Description: The ECR container image ID and tag to deploy
    Type: String
    Default: MYACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/myapp:v10
  JDBCUrlParam:
    Description: The JDBC URL to the RDS database (use the Route53 DNS entry to your database, and NOT the AWS URL)
    Type: String
    Default: jdbc-secretsmanager:mysql://mysql.MYPIERRednS.org:3306/MYDATABASE?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC
Resources:
  Task:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: myapp
      Cpu: 512
      Memory: 1024
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !ImportValue ECSTaskExecutionRole
      TaskRoleArn: !ImportValue ECSTaskRole
      ContainerDefinitions:
        - Name: myapp-container
          Image: !Ref ContainerImageIdParam
          Cpu: 512
          Memory: 1024
          Environment:
            - Name: JDBC_DB_URL
              Value: !Ref JDBCUrlParam
            - Name: JDBC_DB_DRIVER_CLASS
              Value: com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
            - Name: JDBC_DB_USERNAME
              Value: dev/myapp/mysql
            - Name: JDBC_DB_PASSWORD
              Value: notUsedButECSWillErrorIfMissing
            - Name: DB_NUM_THREADS
              Value: '10'
          PortMappings:
            - ContainerPort: 9000
              Protocol: tcp
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/myapp
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: myapp-app
  Service:
    Type: AWS::ECS::Service
    DependsOn: ListenerRule
    Properties:
      ServiceName: myapp-service # todo if we remove the name, one will automatically be generated
      TaskDefinition: !Ref Task
      Cluster: !ImportValue ECSCluster
      LaunchType: FARGATE
      DesiredCount: 1 # set this to 0 if CloudFormation has issues creating this stack (otherwise takes 3 hours and then fails/timeouts)
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 0
      HealthCheckGracePeriodSeconds: 30
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !ImportValue Privatesubnet1
            - !ImportValue Privatesubnet2
          SecurityGroups:
            - !ImportValue ECSServiceSecurityGroup
      LoadBalancers:
        - ContainerName: myapp-container
          ContainerPort: 9000
          TargetGroupArn: !Ref TargetGroup
  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: myapp-tg
      VpcId: !ImportValue VPC
      Port: 9000
      Protocol: HTTP
      Matcher:
        HttpCode: 200-299
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: /myapp-svc/index.html
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 6
      TargetType: ip
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30
  ListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !ImportValue LoadBalancerListenerHTTPS
      Priority: 20
      Conditions:
        - Field: path-pattern
          Values:
            - /myapp-svc/*
      Actions:
        - TargetGroupArn: !Ref TargetGroup
          Type: forward
  ECSAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 6
      MinCapacity: 1
      ResourceId: !Join ["/", [service, !ImportValue ECSCluster, !GetAtt Service.Name]] # service/clusterName/serviceName = service/ecs-cluster-myapp/myapp-service
      RoleARN: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService'
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs
  CPUUtilizationAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: myapp-cpu-target-tracking-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        DisableScaleIn: true # disable scale in for this policy to give ALBRequestPolicy the priority on scale in decisions
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        TargetValue: 50 # Average 50% CPU utilization
  ServiceScalingPolicyALB:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-per-target-per-minute
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 1000
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Join
            - '/'
            - - !ImportValue EcsloadBalancerFullName
              - !GetAtt TargetGroup.TargetGroupFullName
  # NOTE: the ALB RequestCountPerTarget metric alarms are automatically
  # created when we use that policy. But if we want a different evaluation period,
  # we need to define our own alarms. So, the new scale IN/OUT alarms are included below.
  # SCALE OUT ALARM: if the total (SUM) of ALB requests per target is ABOVE the
  # threshold a certain number of times in the past period, THEN send "scale out" alarm.
  ALBRequestsScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB # Only a period greater than 60s is supported for metrics in the "AWS/" namespaces
      ActionsEnabled: true
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      # OKActions: []
      # InsufficientDataActions: []
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60 # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 1 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints above threshold needed to generate an alarm
      Threshold: 1000 # alarm threshold: more than 1000 requests
      Unit: None
      ComparisonOperator: GreaterThanThreshold
  # SCALE IN ALARM: if the total (SUM) of ALB requests per target is BELOW the
  # threshold a certain number of times in the past period, THEN send "scale in" alarm.
  ALBRequestsScaleInAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB
      Statistic: Sum
      Period: 60 # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 5 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints below threshold needed to generate an alarm
      Threshold: 500 # alarm threshold: less than 500 requests
      Unit: None
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      OKActions:
        - !Ref ServiceScalingPolicyALB
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      ComparisonOperator: LessThanThreshold
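Q1: What am I doing wrong? How do I specify the ACTION in CloudFormation so that my alarm triggers the same action as the one triggered by the automatically provisioned AWS alarms (the ones created when I create the policy)?
Q2: Is there a way to view the ACTION in the AWS console? I guess AWS hides this behind the scenes (maybe a Lambda or something?).
Q3: Does anyone have another way to do this? Maybe with step scaling? I would also like to trigger in under 60 seconds, so maybe I should move away from target tracking?
If anyone has a working sample CloudFormation template that can trigger within a minute or less based on the number of requests hitting the ALB, that would be great :) I have posted that part as a separate question (how to trigger in less than a minute): ECS Fargate autoscaling more rapidly?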
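Regarding Q3, here is a minimal sketch of what a step scaling variant might look like. It is an untested assumption, not a confirmed fix for the failing stack. The console error quoted above suggests that alarms attached to a target tracking policy are managed by Auto Scaling itself, so the sketch pairs a self-defined alarm with a `StepScaling` policy instead. `ALBStepScaleOutPolicy`, `ALBStepScaleOutAlarm`, the policy name, and the step size of one task are hypothetical. The fragment is meant to sit under `Resources:` next to `ECSAutoScalingTarget` and `TargetGroup` from the template above.

```yaml
# Sketch only: hypothetical resources to place under Resources: in the
# template above. Names and step sizes are illustrative assumptions.
ALBStepScaleOutPolicy:
  Type: AWS::ApplicationAutoScaling::ScalingPolicy
  Properties:
    PolicyName: alb-requests-step-scale-out   # hypothetical name
    PolicyType: StepScaling
    ScalingTargetId: !Ref ECSAutoScalingTarget
    StepScalingPolicyConfiguration:
      AdjustmentType: ChangeInCapacity
      Cooldown: 30
      MetricAggregationType: Average
      StepAdjustments:
        - MetricIntervalLowerBound: 0   # any breach of the alarm threshold
          ScalingAdjustment: 1          # add one task (assumed step size)

ALBStepScaleOutAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    Namespace: AWS/ApplicationELB
    MetricName: RequestCountPerTarget
    Statistic: Sum
    Dimensions:
      - Name: LoadBalancer
        Value: !ImportValue EcsloadBalancerFullName
      - Name: TargetGroup
        Value: !GetAtt TargetGroup.TargetGroupFullName
    Period: 60                # smallest period noted above for AWS/ metrics
    EvaluationPeriods: 1
    DatapointsToAlarm: 1
    Threshold: 1000           # same value as in the custom alarm above
    ComparisonOperator: GreaterThanThreshold
    AlarmActions:
      - !Ref ALBStepScaleOutPolicy   # Ref returns the scaling policy ARN
```

A matching scale-in policy would presumably use a negative `ScalingAdjustment` with `MetricIntervalUpperBound: 0` and its own `LessThanThreshold` alarm.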