AWS Data Pipeline restore to a DynamoDB table fails with "is in status 'CANCELLED' with reason 'Job terminated'"

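Problem description

I have configured an AWS Data Pipeline to export a DynamoDB table to an S3 bucket in another account (using the template). The export works fine, but I ran into problems when restoring the backup into a new table in the second account (also using the import template).

The source I followed for this task: https://aws.amazon.com/premiumsupport/knowledge-center/data-pipeline-account-access-dynamodb-s3/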

  1. I can see that AWS Data Pipeline is restoring data into the new table (I am not sure whether all of the data has been restored), but the execution status is CANCELED.

  2. The activity log repeatedly shows this message:

EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'

2.a) and then the cancellation:

EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'

  3. If I go to the dependency "EmrClusterForLoad", I see the following. This is the complete log (only a few lines were left out; the error from point 2 is near the end):
07 Sep 2020 12:52:04,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Executing: amazonaws.datapipeline.activity.EmrActivity@1d0415c
07 Sep 2020 12:52:04,887 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 private.com.amazonaws.services.datapipeline.factory.S3ClientFactory: Returning cached AmazonS3Client for the region [eu-west-1]
07 Sep 2020 12:52:04,945 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.EmrActivity: EMR transform starting.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client waiting for cluster to enter ready state for jobflow id 'j-11620944P11II'.
07 Sep 2020 12:52:04,957 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client checking if cluster is ready for jobflow with id 'j-11620944P11II'.
07 Sep 2020 12:52:05,141 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client reports that cluster with jobflow id 'j-11620944P11II' is ready.
07 Sep 2020 12:52:05,200 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrClient: EMR client adding steps with request '{JobFlowId: j-11620944P11II,Steps: [{Name: df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1,ActionOnFailure: CONTINUE,HadoopJarStep: {Properties: [],Jar: s3://dynamodb-dpl-eu-west-1/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,Args: [org.apache.hadoop.dynamodb.tools.DynamoDBImport,s3://dynamodb-backup-imported/2020-09-06-12-19-11/,blabla-test6,0.25]}}]}'
07 Sep 2020 12:53:05,352 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'RUNNING' with reason 'null'
07 Sep 2020 13:48:08,772 [WARN] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job flow named 'df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59' with jobFlowId 'j-11620944P11II' is in status 'WAITING' because of the step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' failures 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: EMR job '@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' with jobFlowId 'j-11620944P11II' is in  status 'WAITING' and reason 'Cluster ready after last step completed.'. Step 'df-06812232H5PDR4VVK472_@TableLoadActivity_2020-09-07T12:45:59_Attempt=1' is in status 'CANCELLED' with reason 'Job terminated'
07 Sep 2020 13:48:08,772 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps stderr logs for cluster with AMI null
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.LogMessageUtil: Returning tail errorMsg :
07 Sep 2020 13:48:08,777 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.cluster.EmrUtil: Collecting steps logs for cluster with AMI/ReleaseLabel emr-5.23.0
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelperFactory: Getting the helper for version 2.8.3
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Uploading step log details
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: path to step logss3n://srh-data-export-int2/df-06812232H5PDR4VVK472/EmrClusterForLoad/@EmrClusterForLoad_2020-09-07T12:45:59/@EmrClusterForLoad_2020-09-07T12:45:59_Attempt=1/j-11620944P11II/steps
07 Sep 2020 13:48:08,778 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: step log file /mnt/taskRunner/output/logs/df-06812232H5PDR4VVK472/TableLoadActivity/@TableLoadActivity_2020-09-07T12:45:59/@TableLoadActivity_2020-09-07T12:45:59_Attempt=1/hadoop.jobs.log
07 Sep 2020 13:48:08,782 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done uploading hadoop log details
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Field value updated 
07 Sep 2020 13:48:08,842 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.activity.mapreduce.EMRActivityHelper: Done updating the field with value 
07 Sep 2020 13:48:08,844 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.HeartBeatService: Finished waiting for heartbeat thread @TableLoadActivity_2020-09-07T12:45:59_Attempt=1
07 Sep 2020 13:48:08,845 [INFO] (TaskRunnerService-df-06812232H5PDR4VVK472_@EmrClusterForLoad_2020-09-07T12:45:59-0) df-06812232H5PDR4VVK472 amazonaws.datapipeline.taskrunner.TaskPoller: Work EmrActivity took 56:4 to complete

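The execution details also show:

@failureReason Resource timeout due to terminateAfter configuration
@status TIMEDOUT

enter image description here

  4. My activity is the one in the image, and its Step field has this configuration:

s3://dynamodb-dpl-#{myDDBRegion}/emr-ddb-storage-handler/4.11.0/emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,org.apache.hadoop.dynamodb.tools.DynamoDBImport,#{input.directoryPath},#{output.tableName},#{output.writeThroughputPercent}

enter image description here

  5. Resource configuration:

    enter image description here

I would like to understand what I am missing here and how to fix it, or whether this is a bug. I can see some data in the new table, so the restore works at least partially, but a step that the log reports as RUNNING and then cancelled suggests that the restore may not have completed.

I then read somewhere that the message status 'WAITING' and reason 'Cluster ready after last step completed' is related to adding an optional field that is not available in the Architect view. The failure reason I get is @failureReason Resource timeout due to terminateAfter configuration.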
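For readers unfamiliar with the template, the Step field is a single comma-separated string: the JAR, the main class, and then the arguments. The sketch below splits the resolved step that appears in the log above into labeled parts. The function name and the labels are illustrative (they mirror the template placeholders `input.directoryPath`, `output.tableName`, and `output.writeThroughputPercent`) and are not part of any AWS API.

```python
def describe_step(step: str) -> dict:
    """Split a Data Pipeline EMR step string into labeled parts."""
    jar, main_class, *args = step.split(",")
    # Hypothetical labels, named after the template placeholders.
    labels = ["input_directory", "table_name", "write_throughput_ratio"]
    return {"jar": jar, "main_class": main_class, **dict(zip(labels, args))}


# Resolved step values copied from the "adding steps" log line.
step = (
    "s3://dynamodb-dpl-eu-west-1/emr-ddb-storage-handler/4.11.0/"
    "emr-dynamodb-tools-4.11.0-SNAPSHOT-jar-with-dependencies.jar,"
    "org.apache.hadoop.dynamodb.tools.DynamoDBImport,"
    "s3://dynamodb-backup-imported/2020-09-06-12-19-11/,blabla-test6,0.25"
)

for name, value in describe_step(step).items():
    print(f"{name}: {value}")
```

The last argument is the write throughput ratio, which the log shows at the template default of 0.25.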

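Solution

Answering my own question

The problem was that I had set the Terminate After field. I did so because the warning message shown in the image below suggested it. I set Terminate After to 1 hour, reasoning that the file to import is only 9.6 MB: how long could it take to process such a small file?

enter image description here

As it turns out, importing that small file takes about 5 hours.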
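The log in the question is consistent with this. As a rough check, the sketch below parses two timestamps copied from that log (the step submission and the cancellation) and compares the 1-hour Terminate After value with the roughly 5-hour import. The variable names are illustrative, and the comparison only shows that the limit was far shorter than the job; it does not model exactly when Data Pipeline starts counting the timeout.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the Task Runner log lines in the question.
LOG_TIME_FORMAT = "%d %b %Y %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp such as '07 Sep 2020 12:52:05,200'."""
    return datetime.strptime(stamp, LOG_TIME_FORMAT)


step_submitted = parse_log_time("07 Sep 2020 12:52:05,200")  # "adding steps"
step_cancelled = parse_log_time("07 Sep 2020 13:48:08,772")  # "Job terminated"
elapsed = step_cancelled - step_submitted

terminate_after = timedelta(hours=1)  # value entered in Terminate After
observed_import = timedelta(hours=5)  # approximate duration reported above

print("step ran for:", elapsed)
print("import fits within Terminate After:", observed_import <= terminate_after)
```

The step had been running for a little under an hour when it was cancelled, which matches a 1-hour limit on the cluster rather than a failure inside the import itself.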

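Findings:

To shorten the import, I increased myDDBWriteThroughputRatio from 0.25 to 0.95. I had not touched this parameter at first because 0.25 is the template default. The AWS documentation sometimes simplifies things a lot, and in many cases you have to find out by trial and error.

After changing that value, the import takes about an hour. That is better than 5 hours, but still slow considering we are talking about only 9.6 MB.

enter image description here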
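These two numbers are roughly what one would expect if the import is limited by write throughput. The sketch below is a back-of-the-envelope estimate under the assumption that import time is inversely proportional to the write throughput ratio and that the table's write capacity stays the same. The function name is made up for illustration.

```python
def scaled_duration_hours(baseline_hours: float,
                          baseline_ratio: float,
                          new_ratio: float) -> float:
    """Estimate import time after changing the write throughput ratio.

    Assumes the import is write-bound, so duration scales as 1 / ratio.
    """
    for ratio in (baseline_ratio, new_ratio):
        if not 0 < ratio <= 1:
            raise ValueError("ratio must be in the range (0, 1]")
    return baseline_hours * baseline_ratio / new_ratio


# 5 hours at the default ratio of 0.25, re-run at 0.95.
estimate = scaled_duration_hours(5.0, 0.25, 0.95)
print(f"estimated import time at 0.95: {estimate:.2f} hours")
```

The estimate comes out at roughly 1.3 hours, in line with the observed "about an hour". Under the same assumption, raising the write capacity available to the table would shorten the import further.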

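The message I saw in the log, is in status 'WAITING' and reason 'Cluster ready after last step completed.', worried me a little because I am not familiar with this tool. It is not an error message, though. As someone from AWS explained, it only means the following:

If you see that the EMR cluster is in the WAITING state with "Cluster ready after last step completed", it means that the cluster has executed the first request it received and is waiting for the next request/activity to be executed on the cluster.

These are my findings. I hope this helps someone else.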


In my case, it worked after I changed the capacity to on-demand.
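Assuming "capacity" here refers to the capacity mode of the target DynamoDB table, the switch can be sketched with boto3's `update_table` call and `BillingMode="PAY_PER_REQUEST"`. The helper names below are illustrative, and the table name is the one from the log in the question.

```python
def on_demand_request(table_name: str) -> dict:
    """Build the update_table arguments that select on-demand capacity."""
    return {"TableName": table_name, "BillingMode": "PAY_PER_REQUEST"}


def switch_to_on_demand(client, table_name: str):
    """Apply the request with any object exposing update_table().

    In practice `client` would be boto3.client("dynamodb").
    """
    return client.update_table(**on_demand_request(table_name))


print(on_demand_request("blabla-test6"))
```

With AWS credentials configured, the call would be `switch_to_on_demand(boto3.client("dynamodb", region_name="eu-west-1"), "blabla-test6")`. The region is taken from the log and is otherwise an assumption.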