CloudWatch alarm: strange OK / ALARM transitions

Problem description

I have set up an AWS CloudWatch alarm for one of my AWS Lambda functions, as follows:

import boto3

cloudwatch_client = boto3.client("cloudwatch")

# alert_topic is the ARN of the SNS topic notified when the alarm fires
cloudwatch_client.put_metric_alarm(
    AlarmName="BobbyErrors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Statistic="Sum",
    Threshold=1,
    Period=60 * 60,  # evaluate over one-hour windows
    EvaluationPeriods=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    Dimensions=[{"Name": "FunctionName", "Value": "Bobby"}],
    ActionsEnabled=True,
    AlarmActions=[alert_topic],
)

Given the following invocation history:

<NO MORE EXECUTIONS>
2020-10-15T16:37:41.145Z  OK
2020-10-15T16:33:54.203Z  OK
2020-10-15T16:31:10.373Z  OK
2020-10-15T16:30:07.680Z  ERROR
2020-10-15T16:28:57.371Z  OK

I see the following alarm state transitions, which I don't fully understand (why does the alarm enter the ALARM state multiple times):

2020-10-15 17:31:53 State update    Alarm updated from In alarm to OK
2020-10-15 16:38:53 Action  Successfully executed action arn:aws:sns:xxx
2020-10-15 16:38:53 State update    Alarm updated from OK to In alarm
2020-10-15 16:31:53 State update    Alarm updated from In alarm to OK
2020-10-15 15:38:53 Action  Successfully executed action arn:aws:sns:xxx
2020-10-15 15:38:53 State update    Alarm updated from OK to In alarm
2020-10-15 15:31:53 State update    Alarm updated from In alarm to OK
2020-10-15 14:31:53 Action  Successfully executed action arn:aws:sns:xxx
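
For reference, a transition log like the one above can be pulled programmatically via the CloudWatch describe_alarm_history API. A minimal sketch, reusing the cloudwatch_client from the snippet above:

# List recent state transitions for the alarm, most recent first
history = cloudwatch_client.describe_alarm_history(
    AlarmName="BobbyErrors",
    HistoryItemType="StateUpdate",  # state changes only, not action executions
    ScanBy="TimestampDescending",
)
for item in history["AlarmHistoryItems"]:
    print(item["Timestamp"], item["HistorySummary"])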

I believe the root cause is how CloudWatch handles missing data (i.e. TreatMissingData="notBreaching"): with a one-hour period and no new invocations after 16:37, the alarm keeps re-evaluating the last data points.

Note
A particular case of this behavior is that CloudWatch alarms might repeatedly
re-evaluate the last set of data points for a period of time after the metric
has stopped flowing. This re-evaluation might cause the alarm to change state
and re-execute actions, if it had changed state immediately prior to the metric
stream stopping. To mitigate this behavior, use shorter periods.
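
A minimal sketch of the mitigation this note suggests: re-create the alarm with a much shorter period, so the window in which the last data points keep being re-evaluated shrinks from hours to minutes. The one-minute value below is an assumption; it should be chosen to match how often the function actually runs.

# Same alarm as above, but with a one-minute period so the window of
# re-evaluated data points after the metric stops flowing stays small
cloudwatch_client.put_metric_alarm(
    AlarmName="BobbyErrors",
    Namespace="AWS/Lambda",
    MetricName="Errors",
    Statistic="Sum",
    Threshold=1,
    Period=60,  # shorter period instead of 60 * 60, per the note above
    EvaluationPeriods=1,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    TreatMissingData="notBreaching",
    Dimensions=[{"Name": "FunctionName", "Value": "Bobby"}],
    ActionsEnabled=True,
    AlarmActions=[alert_topic],
)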

Question:

What would the correct alarm configuration be so that the alarm triggers only once?
