Problem description
I am trying to run the first example workflow (identity-wf) from the Apache Oozie book on my 5-node cluster.
- hadoop1 / NameNode, ResourceManager
- hadoop2 / SecondaryNameNode
- hadoop3 / DataNode, NodeManager (3 GB RAM)
- hadoop4 / DataNode, NodeManager (3 GB RAM)
- hadoop5 / DataNode, NodeManager (3 GB RAM)
- hadoop6 / Oozie server
The Hadoop version is 2.10.1 and the Oozie version is 5.2.1.
workflow.xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="identity-WF">
  <parameters>
    <property>
      <name>jobTracker</name>
    </property>
    <property>
      <name>nameNode</name>
    </property>
    <property>
      <name>exampleDir</name>
    </property>
  </parameters>
  <start to="identity-MR"/>
  <action name="identity-MR">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <prepare>
        <delete path="${exampleDir}/data/output"/>
      </prepare>
      <configuration>
        <property>
          <name>mapred.mapper.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityMapper</value>
        </property>
        <property>
          <name>mapred.reducer.class</name>
          <value>org.apache.hadoop.mapred.lib.IdentityReducer</value>
        </property>
        <property>
          <name>mapred.input.dir</name>
          <value>${exampleDir}/data/input</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${exampleDir}/data/output</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.map.java.opts</name>
          <value>-verbose</value>
        </property>
        <property>
          <name>oozie.launcher.mapreduce.map.memory.mb</name>
          <value>512</value>
        </property>
        <property>
          <name>oozie.launcher.yarn.app.mapreduce.am.resource.mb</name>
          <value>512</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="success"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>The Identity Map-Reduce job Failed!</message>
  </kill>
  <end name="success"/>
</workflow-app>
The container is created and transitions to the RUNNING state, but it eventually times out.
ResourceManager log
2021-05-10 13:40:36,733 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from NEW to ALLOCATED
2021-05-10 13:40:36,734 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1620621418881_0001 CONTAINERID=container_1620621418881_0001_01_000001 RESOURCE=<memory:2048,vCores:1> QUEUENAME=default
2021-05-10 13:40:36,736 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.ParentQueue: assignedContainer queue=root usedCapacity=0.083333336 absoluteUsedCapacity=0.083333336 used=<memory:2048,vCores:1> cluster=<memory:24576,vCores:24>
2021-05-10 13:40:36,736 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Allocation proposal accepted
2021-05-10 13:40:36,771 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Sending NMToken for nodeId : hadoop3:34683 for container : container_1620621418881_0001_01_000001
2021-05-10 13:40:36,786 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from ALLOCATED to ACQUIRED
2021-05-10 13:40:36,787 INFO org.apache.hadoop.yarn.server.resourcemanager.security.NMTokenSecretManagerInRM: Clear node set for appattempt_1620621418881_0001_000001
2021-05-10 13:40:36,787 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Storing attempt: AppId: application_1620621418881_0001 AttemptId: appattempt_1620621418881_0001_000001 MasterContainer: Container: [ContainerId: container_1620621418881_0001_01_000001,AllocationRequestId: 0,Version: 0,NodeId: hadoop3:34683,NodeHttpAddress: hadoop3:8042,Resource: <memory:2048,vCores:1>,Priority: 0,Token: Token { kind: ContainerToken,service: 192.168.35.67:34683 },ExecutionType: GUARANTEED,]
2021-05-10 13:40:36,812 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
2021-05-10 13:40:36,830 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
2021-05-10 13:40:36,842 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Launching masterappattempt_1620621418881_0001_000001
2021-05-10 13:40:36,993 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie IP=192.168.35.80 OPERATION=Get Applications Request TARGET=ClientRMService RESULT=SUCCESS
2021-05-10 13:40:37,035 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Setting up container Container: [ContainerId: container_1620621418881_0001_01_000001,] for AM appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,037 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Create AMRMToken for ApplicationAttempt: appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,048 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Creating password for appattempt_1620621418881_0001_000001
2021-05-10 13:40:37,619 INFO org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher: Done launching container Container: [ContainerId: container_1620621418881_0001_01_000001,
2021-05-10 13:40:37,619 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
2021-05-10 13:40:37,620 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: update the launch time for applicationId: application_1620621418881_0001,attemptId: appattempt_1620621418881_0001_000001launchTime: 1620621637619
2021-05-10 13:40:37,620 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: Updating info for app: application_1620621418881_0001
2021-05-10 13:40:37,704 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING
2021-05-10 13:46:44,212 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.AbstractYarnScheduler: Release request cache is cleaned up
2021-05-10 13:51:35,713 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=oozie IP=192.168.35.80 OPERATION=Get Applications Request TARGET=ClientRMService RESULT=SUCCESS
2021-05-10 13:53:39,032 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: Expired:appattempt_1620621418881_0001_000001 Timed out after 600 secs
2021-05-10 13:53:39,034 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: Updating application attempt appattempt_1620621418881_0001_000001 with final state: FAILED, and exit status: -1000
2021-05-10 13:53:39,035 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from LAUNCHED to FINAL_SAVING on event = EXPIRE
2021-05-10 13:53:39,036 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Unregistering app attempt : appattempt_1620621418881_0001_000001
2021-05-10 13:53:39,036 INFO org.apache.hadoop.yarn.server.resourcemanager.security.AMRMTokenSecretManager: Application finished,removing password for appattempt_1620621418881_0001_000001
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000001 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: The number of failed attempts is 1. The max attempts is 2
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.ApplicationMasterService: Registering app attempt : appattempt_1620621418881_0001_000002
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.attempt.RMAppAttemptImpl: appattempt_1620621418881_0001_000002 State change from NEW to SUBMITTED on event = START
2021-05-10 13:53:39,037 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler: Application Attempt appattempt_1620621418881_0001_000001 is done. finalState=FAILED
2021-05-10 13:53:39,042 INFO org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl: container_1620621418881_0001_01_000001 Container Transitioned from RUNNING to KILLED
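Two details in the ResourceManager log above correspond to yarn-site.xml settings. "Timed out after 600 secs" matches the default value of yarn.am.liveness-monitor.expiry-interval-ms (600000 ms), the interval within which an application master must report to the ResourceManager. The reported cluster=<memory:24576> across three NodeManagers works out to 8192 MB each, which is the Hadoop 2.x default of yarn.nodemanager.resource.memory-mb, even though each node is listed with 3 GB RAM. The fragment below is only a sketch: the first value is the default, and 2048 is an illustrative assumption, not a value taken from this cluster's configuration.

```xml
<!-- yarn-site.xml (sketch; values are assumptions, see the note above) -->
<configuration>
  <property>
    <!-- Default is 600000 ms, which matches "Timed out after 600 secs". -->
    <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
    <value>600000</value>
  </property>
  <property>
    <!-- Illustrative value for a node with 3 GB RAM; the log suggests 8192 is in effect. -->
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>
```

Raising the expiry interval would only delay the failure; it is shown here to explain where the 600 seconds comes from, not as a fix.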
Container log
log4j: Trying to find [container-log4j.properties] using context classloader sun.misc.Launcher$AppClassLoader@7852e922.
log4j: Using URL [jar:file:/opt/hadoop-2.10.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.10.1.jar!/container-log4j.properties] for automatic log4j configuration.
log4j: Reading configuration from URL jar:file:/opt/hadoop-2.10.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.10.1.jar!/container-log4j.properties
log4j: Hierarchy threshold set to [ALL].
log4j: Parsing for [root] with value=[INFO,CLA,EventCounter].
log4j: Level token is [INFO].
log4j: Category root set to INFO
log4j: Parsing appender named "CLA".
log4j: Parsing layout options for "CLA".
log4j: Setting property [conversionPattern] to [%d{ISO8601} %p [%t] %c: %m%n].
log4j: End of parsing for "CLA".
log4j: Setting property [containerLogFile] to [syslog].
log4j: Setting property [totalLogFileSize] to [1048576].
log4j: Setting property [containerLogDir] to [/var/hadoop/yarn/logs/userlogs/application_1620621418881_0002/container_1620621418881_0002_01_000001].
log4j: setFile called: /var/hadoop/yarn/logs/userlogs/application_1620621418881_0002/container_1620621418881_0002_01_000001/syslog,true
log4j: setFile ended
log4j: Parsed "CLA" options.
log4j: Parsing appender named "EventCounter".
log4j: Parsed "EventCounter" options.
log4j: Parsing for [org.apache.hadoop.mapreduce.task.reduce] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapreduce.task.reduce set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapreduce.task.reduce=[false]
log4j: Setting additivity for "org.apache.hadoop.mapreduce.task.reduce" to false
log4j: Parsing for [org.apache.hadoop.mapred.Merger] with value=[INFO,CLA].
log4j: Level token is [INFO].
log4j: Category org.apache.hadoop.mapred.Merger set to INFO
log4j: Parsing appender named "CLA".
log4j: Appender "CLA" was already parsed.
log4j: Handling log4j.additivity.org.apache.hadoop.mapred.Merger=[false]
log4j: Setting additivity for "org.apache.hadoop.mapred.Merger" to false
log4j: Finished configuring.
Launcher AM configuration loaded
Executing Oozie Launcher with tokens:
Kind: YARN_AM_RM_TOKEN,Service:,Ident: (appAttemptId { application_id { id: 2 cluster_timestamp: 1620621418881 } attemptId: 1 } keyId: -1917584279
How can I find out what the problem is? Thanks.
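The container log ends right after the launcher prints its YARN_AM_RM_TOKEN, and the ResourceManager log shows an attempt expiring after 600 seconds. One thing worth ruling out, offered as an assumption rather than a diagnosis, is that the configuration handed to the launcher does not name the ResourceManager host: yarn.resourcemanager.hostname defaults to 0.0.0.0, and yarn.resourcemanager.scheduler.address is derived from it with port 8030. A sketch of the relevant yarn-site.xml entry, using hadoop1 from the node list above:

```xml
<!-- yarn-site.xml (sketch; hadoop1 is taken from the node list in the question) -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
</configuration>
```

If this turns out to be relevant, the same value would likely also need to be present in the Hadoop configuration that the Oozie server on hadoop6 loads, not only on the cluster nodes.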