问题描述
我有一个处理队列中项目的工作。当队列为空时,它会在几秒钟内完成。但是,如果队列中有项目,有时需要几分钟(或几小时)才能完成。
我希望它在失败后自动重试(等待 5 分钟),除非上次成功的作业是在 1 小时前。如果失败并且最后一次成功是在 1 小时前,我希望它通知我们(通过通知插件)
我该怎么做?
解决方法
您可以创建一个“监控作业”,使用 Rundeck API 步骤中的“嵌入”inline-script 来验证您的“主要作业”的状态。
此作业有一个内联脚本步骤,用于验证主作业的最新状态并计算最近成功执行与当前时间之间的分钟数,全部通过 API,脚本需要 jq 和 dateutils。如果发生这种情况,请禁用主作业的计划并使用“失败时”notification
发送通知电子邮件HelloWorld 作业(“主作业”,如您所说的要监控的作业,有 5 分钟的重试时间,我认为这是计划作业)。
<joblist>
<job>
<defaultTab>nodes</defaultTab>
<description></description>
<executionEnabled>true</executionEnabled>
<id>3fc55b67-b2a1-4e92-b949-7ed464041993</id>
<loglevel>INFO</loglevel>
<name>HelloWorld</name>
<nodeFilterEditable>false</nodeFilterEditable>
<plugins />
<retry delay='5m'>1</retry>
<schedule>
<month month='*' />
<time hour='17' minute='15' seconds='0' />
<weekday day='*' />
<year year='*' />
</schedule>
<scheduleEnabled>true</scheduleEnabled>
<sequence keepgoing='false' strategy='node-first'>
<command>
<exec>eho "hello world"</exec>
</command>
</sequence>
<uuid>3fc55b67-b2a1-4e92-b949-7ed464041993</uuid>
</job>
</joblist>
和“监控”作业一样,该作业使用 Rundeck API 验证内联脚本上的所有条件:
<joblist>
<job>
<defaultTab>nodes</defaultTab>
<description></description>
<executionEnabled>true</executionEnabled>
<id>7a4b117c-75cc-4b49-a504-703e27941170</id>
<loglevel>INFO</loglevel>
<name>Monitor</name>
<nodeFilterEditable>false</nodeFilterEditable>
<notification>
<onfailure>
<email attachLog='true' attachLogInFile='true' recipients='devops@example.net' subject='job failure' />
</onfailure>
</notification>
<notifyAvgDurationThreshold />
<plugins />
<schedule>
<month month='*' />
<time hour='*' minute='0/15' seconds='0' />
<weekday day='*' />
<year year='*' />
</schedule>
<scheduleEnabled>true</scheduleEnabled>
<sequence keepgoing='false' strategy='node-first'>
<command>
<description>Monitor the main job</description>
<fileExtension>.sh</fileExtension>
<script><![CDATA[# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')
# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')
# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)
# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff $latest_succeeded_execution_date $current_date -f "%M")
# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"
# If it fails and the last success was more than 1 hour ago,we want it to notify us (via notification plugin)
if [ $latest_execution_status = 'failed' ] && [ $time_between_latest_succeeded_and_current_time -gt 60 ]; then
# disable main job schedule
curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json'
# with exit 1 this job fails and the notification "on failure" is triggered
exit 1
else
echo "all ok!"
fi
# all done.]]></script>
<scriptargs />
<scriptinterpreter>/bin/bash</scriptinterpreter>
</command>
</sequence>
<uuid>7a4b117c-75cc-4b49-a504-703e27941170</uuid>
</job>
</joblist>
这里有单独需要的脚本:
# get the latest execution
latest_execution_status=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[].status')
# now get the latest successful execution date
latest_succeeded_execution_date=$(curl -s --location --request GET 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/executions?status=succeeded&max=1' \
--header 'Accept: application/json' \
--header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' \
--header 'Content-Type: application/json' | jq -r '.executions | .[]."date-ended".date' | sed 's/.$//')
# the current date (to compare with the latest execution date)
current_date=$(date --iso-8601=seconds)
# comparing the two dates using dateutils
time_between_latest_succeeded_and_current_time=$(dateutils.ddiff $latest_succeeded_execution_date $current_date -f "%M")
# just for debug
echo "LATEST EXECUTION STATE: $latest_execution_status"
echo "LATEST SUCCEEDED EXECUTION DATE: $latest_succeeded_execution_date"
echo "CURRENT DATE: $current_date"
echo "MINUTES SINCE LATEST SUCCEEDED EXECUTION: $time_between_latest_succeeded_and_current_time minutes"
# If it fails and the last success was more than 1 hour ago,we want it to notify us (via notification plugin)
if [ $latest_execution_status = 'failed' ] && [ $time_between_latest_succeeded_and_current_time -gt 60 ]; then
# disable main job schedule
curl -s --location --request POST 'http://localhost:4440/api/38/job/3fc55b67-b2a1-4e92-b949-7ed464041993/schedule/disable' --header 'Accept: application/json' --header 'X-Rundeck-Auth-Token: 6pSjAdcnNInfJ3Y8PHoC6EN7KGzjecEe' --header 'Content-Type: application/json'
# with exit 1 this job fails and the notification "on failure" is triggered
exit 1
else
echo "all ok!"
fi
# all done.
当然可以使用选项参数化作业 ID、主机等参数。