CloudFormation ECS Fargate autoscaling with target tracking: custom alarm with 1 datapoint in 1 minute: Failed to execute action

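Problem description

I can set up the following, both through the console and with a CloudFormation template:

a scalable target associated with my ALB,
a CPU target tracking scaling policy,
an ALBRequestCountPerTarget target tracking scaling policy.

This all works fine. Creating the policies in my CloudFormation template also takes care of creating the associated scale-out and scale-in alarms.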

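Problem: the automatically created alarms only fire after 3 datapoints breach the threshold within the previous 3 periods of 60 seconds. So if a sudden load comes in, it takes 3 minutes before the ECS service scales out. That is too long for me; I want it to scale out as fast as possible. And, from the documentation, the minimum period for the ALB RequestCountPerTarget metric appears to be 60 seconds:

Only a period greater than 60s is supported for metrics in the "AWS/" namespaces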

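Manual workaround: I can go into the console, open the CloudWatch service, find the HIGH and LOW alarms that were created for me, and edit the HIGH alarm (the one that triggers scale-out). There I can set the alarm's "Period" to 60 seconds, "DatapointsToAlarm" to 1 (trigger the scale-out action as soon as a single datapoint breaches), "EvaluationPeriods" to 1 (only consider the previous 60-second period), and "Threshold" to 500 (if there were more than 500 requests on my ALB in the past 60 seconds, add capacity = scale out).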

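To test this, I used JMeter to send a burst of requests, and I could see the alarm go off within a minute or so and my ECS service adjust its desired running task count. This all works fine.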

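But now, we are all supposed to write infrastructure as code (IaC), right? So I wanted to include the console tweaks above in my CloudFormation template. That is where the problem occurs. What I did:

I added two new alarms to my CloudFormation template: one for HIGH (scale out) and one for LOW (scale in),
these two new alarms point to the existing scaling policy.

What happens:

the alarm goes into the ALARM state within 60 seconds,
the policy tries to run the action (scale out), but fails with this error: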

Failed to execute action
arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster/my-ecs-service:policyName/alb-requests-per-target-per-minute. Received error: ""

I don't know what this means, other than: the alarm fired properly, we tried to act on it and scale out, but it failed (and no error message is given!).

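I tried comparing this error with other, successful actions [the actions in the alarms that AWS creates automatically when the autoscaling policy is created = the ones that fire at t=3 minutes]. The only difference I can see is that the action ARN in the error message seems to be missing a "createdBy" suffix, which does appear on the action ARN when the "default" alarms (the ones provisioned automatically when CloudFormation creates the autoscaling policy) trigger the action: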

Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/tpg-cce-cpu-target-tracking-scaling-policy:createdBy/59b3e5ac-81ae-490f-8ecb-00241506a15e

Successfully executed action arn:aws:autoscaling:us-east-1:MY-ACCOUNT-ID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-ecs-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868

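Notice the difference above (the createdBy suffix is missing from the policy ARN when the action is triggered by my custom alarm). But I don't know how to obtain it, because in CloudFormation I use Ref to get the policy ARN, and nothing mentions appending some sort of "createdBy" to the end of the policy ARN. (Note that this may not be the problem at all. I am just listing what I have found so far, and this is the only difference I have found, so it may well be a red herring.)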

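Another clue, perhaps, is what I see when I look at the alarms in CloudWatch in the AWS console:

I can edit the CloudWatch HIGH alarm that AWS created automatically when I created the policy,
I cannot edit my custom "scale out" alarm (the one at the bottom of the CloudFormation template below). The console error when I try to edit my custom alarm is:

Cannot edit myCloudFormationStack-ALBRequestsScaleOutAlarm-E4VY9ZOJ5DOF as it is an Auto Scaling alarm with target tracking scaling policy.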

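Another difference between my alarm and the one created automatically with the policy: when I look at the alarm details under CloudWatch -> Alarms in the AWS console, the "Actions" section looks different. For the automatically provisioned alarm, I see:

When in alarm, execute this action arn:aws:autoscaling:us-east-1:MYACCOUNTID:scalingPolicy:51f0a780-28d5-4005-9681-84244912954d:resource/ecs/service/my-ecs-cluster-cce/my-service:policyName/alb-requests-per-target-per-minute:createdBy/ffacb0ac-2456-4751-b9c0-b909c66e9868

But for my own alarm (defined in the CloudFormation template below), I see this in the alarm details (Actions):

When in alarm, use policy alb-requests-per-target-per-minute (Keep metric ALBRequestCountPerTarget at target value 1000.)

Here is my complete CloudFormation template: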

AWSTemplateFormatVersion: '2010-09-09'
Description: ECS task definition, service, and hooks it up to the ALB via a Target Group

# IMPORTANT: this needs the first Cloudformation layers in place (see the imports below)

Parameters:
  ContainerImageIdParam:
    Description: The ECR container image ID and tag to deploy
    Type: String
    Default: MYACCOUNTID.dkr.ecr.us-east-1.amazonaws.com/myapp:v10

  JDBCUrlParam:
    Description: The JDBC URL to the RDS database (use the Route53 DNS entry to your database, and NOT the AWS URL)
    Type: String
    Default: jdbc-secretsmanager:mysql://mysql.MYPIERRednS.org:3306/MYDATABASE?useSSL=false&allowPublicKeyRetrieval=true&serverTimezone=UTC

Resources:
  Task:
    Type: AWS::ECS::TaskDefinition
    Properties:
      Family: myapp
      Cpu: 512
      Memory: 1024
      NetworkMode: awsvpc
      RequiresCompatibilities:
        - FARGATE
      ExecutionRoleArn: !ImportValue ECSTaskExecutionRole
      TaskRoleArn: !ImportValue ECSTaskRole
      ContainerDefinitions:
        - Name: myapp-container
          Image: !Ref ContainerImageIdParam
          Cpu: 512
          Memory: 1024
          Environment:
            - Name: JDBC_DB_URL
              Value: !Ref JDBCUrlParam
            - Name: JDBC_DB_DRIVER_CLASS
              Value: com.amazonaws.secretsmanager.sql.AWSSecretsManagerMySQLDriver
            - Name: JDBC_DB_USERNAME
              Value: dev/myapp/MysqL
            - Name: JDBC_DB_PASSWORD
              Value: notUsedButECSWillErrorIfMissing
            - Name: DB_NUM_THREADS
              Value: 10
          PortMappings:
            - ContainerPort: 9000
              Protocol: tcp
          LogConfiguration:
            LogDriver: awslogs
            Options:
              awslogs-group: /ecs/myapp
              awslogs-region: !Ref AWS::Region
              awslogs-stream-prefix: myapp-app

  Service:
    Type: AWS::ECS::Service
    DependsOn: ListenerRule
    Properties:
      ServiceName: myapp-service # todo if we remove the name, one will automatically be generated
      TaskDefinition: !Ref Task
      Cluster: !ImportValue ECSCluster
      LaunchType: FARGATE
      DesiredCount: 1 # set this to 0 if cloudformation has issues creating this stack (otherwise takes 3 hours and then fails/timeouts)
      DeploymentConfiguration:
        MaximumPercent: 200
        MinimumHealthyPercent: 0 
      HealthCheckGracePeriodSeconds: 30
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: ENABLED
          Subnets:
            - !ImportValue Privatesubnet1
            - !ImportValue Privatesubnet2
          SecurityGroups:
            - !ImportValue ECSServiceSecurityGroup
      LoadBalancers:
        - ContainerName: myapp-container
          ContainerPort: 9000
          TargetGroupArn: !Ref TargetGroup

  TargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      Name: myapp-tg
      VpcId: !ImportValue VPC
      Port: 9000
      Protocol: HTTP
      Matcher:
        HttpCode: 200-299
      HealthCheckIntervalSeconds: 30
      HealthCheckPath: /myapp-svc/index.html
      HealthCheckProtocol: HTTP
      HealthCheckTimeoutSeconds: 10
      HealthyThresholdCount: 2
      UnhealthyThresholdCount: 6
      TargetType: ip
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 30

  ListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      ListenerArn: !ImportValue LoadBalancerListenerHTTPS
      Priority: 20
      Conditions:
        - Field: path-pattern
          Values:
            - /myapp-svc/*
      Actions:
        - TargetGroupArn: !Ref TargetGroup
          Type: forward

  ECSAutoScalingTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    Properties:
      MaxCapacity: 6
      MinCapacity: 1
      ResourceId: !Join ["/", [service, !ImportValue ECSCluster, !GetAtt Service.Name]] # service/clusterName/serviceName = service/ecs-cluster-myapp/myapp-service
      RoleARN: !Sub 'arn:aws:iam::${AWS::AccountId}:role/aws-service-role/ecs.application-autoscaling.amazonaws.com/AWSServiceRoleForApplicationAutoScaling_ECSService'
      ScalableDimension: ecs:service:DesiredCount
      ServiceNamespace: ecs

  cpuutilizationAutoScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: myapp-cpu-target-tracking-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        DisableScaleIn: true # disable scale in for this policy to give ALBRequestPolicy the priority on scale in decisions
        PredefinedMetricSpecification:
          PredefinedMetricType: ECSServiceAverageCPUUtilization
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        TargetValue: 50 # Average 50% CPU utilization


  ServiceScalingPolicyALB:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-per-target-per-minute
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 1000
        ScaleInCooldown: 300
        ScaleOutCooldown: 30
        PredefinedMetricSpecification:
          PredefinedMetricType: ALBRequestCountPerTarget
          ResourceLabel: !Join
            - '/'
            - - !ImportValue EcsloadBalancerFullName
              - !GetAtt TargetGroup.TargetGroupFullName

  # NOTE: the ALB RequestCountPerTarget metric alarms are automatically
  # created when we use that policy. But if we want a different evaluation period,
  # we need to define our own alarms. So, the new scale IN/OUT alarms are included below.

  # SCALE OUT ALARM: if the total (SUM) of ALB requests per target is ABOVE the
  # threshold a certain number of times in the past period, THEN send "scale out" alarm.
  ALBRequestsScaleOutAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB # Only a period greater than 60s is supported for metrics in the "AWS/" namespaces
      ActionsEnabled: true
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      # OKActions: []
      # InsufficientDataActions: []
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60     # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 1 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints above threshold needed to generate an alarm
      Threshold: 1000  # alarm threshold: more than 1000 requests
      Unit: None
      ComparisonOperator: GreaterThanThreshold

  # SCALE IN ALARM: if the total (SUM) of ALB requests per target is BELOW the
  # threshold a certain number of times in the past period, THEN send "scale in" alarm.
  ALBRequestsScaleInAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      MetricName: RequestCountPerTarget
      Namespace: AWS/ApplicationELB
      Statistic: Sum
      Period: 60     # evaluation period (in seconds) = 1 datapoint
      EvaluationPeriods: 5 # number of previous periods to take into account
      DatapointsToAlarm: 1 # number of datapoints below threshold needed to generate an alarm
      Threshold: 500  # alarm threshold: less than 500 requests
      Unit: None
      AlarmActions:
        - !Ref ServiceScalingPolicyALB
      OKActions:
        - !Ref ServiceScalingPolicyALB
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      ComparisonOperator: LessThanThreshold

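Q1: What am I doing wrong? How do I specify the ACTION in CloudFormation so that my alarm triggers the same action as the one triggered by the automatically provisioned AWS alarms (the ones created automatically when I create the policy)?

Q2: Is there a way to see the ACTION in the AWS console? I guess AWS hides this behind the scenes (maybe a Lambda or something?).

Q3: Does anyone have another way to do this? Maybe with step scaling? I would also like to trigger in under 60 seconds, so maybe I should move away from target tracking?

If anyone has a working sample of a CloudFormation template that triggers in a minute or less based on the number of requests to the ALB, that would be great :) I have put that part in a separate question (how to trigger in less than a minute): ECS Fargate autoscaling more rapidly?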
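
As one possible direction for Q3, here is a minimal, untested sketch of the step scaling variant. It rests on an assumption worth verifying against the Application Auto Scaling documentation: the alarms of a target tracking policy are created and managed by the service itself, whereas a step scaling policy is meant to be invoked by alarms you define. The resource names ALBRequestsStepScaleOutPolicy and ALBRequestsStepScaleOutAlarm, the policy name, and the step sizes are hypothetical; ECSAutoScalingTarget, TargetGroup and the EcsloadBalancerFullName import are taken from the template above.

```yaml
# Sketch only, not a tested template. Names marked "hypothetical" do not
# appear in the original template; the step sizes are illustrative.
Resources:
  ALBRequestsStepScaleOutPolicy:            # hypothetical name
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: alb-requests-step-scale-out   # hypothetical name
      PolicyType: StepScaling
      ScalingTargetId: !Ref ECSAutoScalingTarget
      StepScalingPolicyConfiguration:
        AdjustmentType: ChangeInCapacity
        Cooldown: 30
        MetricAggregationType: Average
        StepAdjustments:
          # Bounds are offsets from the alarm threshold (1000 below).
          - MetricIntervalLowerBound: 0
            MetricIntervalUpperBound: 1000
            ScalingAdjustment: 1   # metric in [1000, 2000): add 1 task
          - MetricIntervalLowerBound: 1000
            ScalingAdjustment: 2   # metric >= 2000: add 2 tasks

  ALBRequestsStepScaleOutAlarm:             # hypothetical name
    Type: AWS::CloudWatch::Alarm
    Properties:
      Namespace: AWS/ApplicationELB
      MetricName: RequestCountPerTarget
      Statistic: Sum
      Dimensions:
        - Name: LoadBalancer
          Value: !ImportValue EcsloadBalancerFullName
        - Name: TargetGroup
          Value: !GetAtt TargetGroup.TargetGroupFullName
      Period: 60             # minimum period for metrics in "AWS/" namespaces
      EvaluationPeriods: 1   # look at the last period only
      DatapointsToAlarm: 1   # one breaching datapoint is enough
      Threshold: 1000
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref ALBRequestsStepScaleOutPolicy   # Ref returns the policy ARN
```

A scale-in counterpart would mirror this with a second StepScaling policy using a negative ScalingAdjustment and MetricIntervalUpperBound: 0, driven by a LessThanThreshold alarm. This sketch still reacts no faster than one 60-second period, since that is the smallest period the quoted error message allows for metrics in the "AWS/" namespaces.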

amazon-cloudformation amazon-ecs amazon-web-services aws-auto-scaling aws-fargate