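Problem description
I am triggering a spark-submit job from an Oozie workflow on the Cloudera 6.2.1 platform, but the YARN container fails with exit codes -104 and 143. Below is a log snippet: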
Application application_1596360900040_33869 Failed 2 times due to AM Container for appattempt_1596360900040_33869_000002 exited with exitCode: -104
... some more logs printing jar dependencies ...
1001/lib/hadoop/client/xz-1.6.jar:/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p3757.1951001/lib/hadoop/client/xz.jar -Xmx8G org.apache.spark.deploy.SparkSubmit --master yarn --deploy-mode client --conf spark.yarn.am.memory=8G --conf spark.driver.memory=8G --conf spark.yarn.am.memoryOverhead=820 --conf spark.driver.memoryOverhead=820 --conf spark.executor.memoryOverhead=3280 --conf spark.sql.broadcastTimeout=3600 --num-executors 4 --executor-cores 8 --executor-memory 16G --principal username --keytab username.keytab main.py
[2020-08-14 05:30:26.153]Container killed on request. Exit code is 143
[2020-08-14 05:30:26.167]Container exited with a non-zero exit code 143.
The spark-submit parameters are as follows:
spark2-submit \
--master yarn \
--deploy-mode client \
--num-executors 4 \
--executor-cores 8 \
--executor-memory 16G \
--driver-memory 8G \
--principal ${user_name} \
--keytab ${user_name}.keytab \
--conf spark.sql.broadcastTimeout=3600 \
--conf spark.executor.memoryOverhead=3280 \
--conf spark.driver.memoryOverhead=820 \
--conf spark.yarn.am.memory=8G \
--conf spark.yarn.am.memoryOverhead=820 \
main.py
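I tried different combinations of executor, driver, and application master memory, but every one of them produced the same error.
Solution
The problem was resolved by changing the deploy mode from client to cluster. I was triggering the Spark job from an Oozie application, so in client mode the driver was launched inside the Oozie launcher JVM. The log above shows this: the failing AM container is the one running org.apache.spark.deploy.SparkSubmit with -Xmx8G, so the launcher container had to hold the whole driver heap, which is presumably why YARN killed it for exceeding its memory limit. To avoid this, I set the mode to cluster so that the driver runs in its own YARN container.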
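For reference, here is a minimal sketch of the cluster-mode submission, based on the command from the question. The only required change is --deploy-mode. Dropping the two spark.yarn.am.* settings is my own assumption: per the Spark documentation they apply only to client mode, while in cluster mode the application master hosts the driver and is sized by the driver memory settings. The submit_cluster function and the DRY_RUN switch are hypothetical helpers added so the command can be reviewed before it is run.

```shell
# Placeholder principal; in the original workflow this comes from ${user_name}.
user_name="${user_name:-username}"

# By default only print the command. Run with DRY_RUN= (empty) to actually submit.
DRY_RUN="${DRY_RUN-echo}"

# Same arguments as in the question, except:
#   * --deploy-mode is cluster, so the driver runs in a YARN container
#     instead of inside the Oozie launcher JVM
#   * spark.yarn.am.memory / spark.yarn.am.memoryOverhead are omitted
#     (assumption: they are client-mode settings and not needed here)
submit_cluster() {
  $DRY_RUN spark2-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --executor-cores 8 \
    --executor-memory 16G \
    --driver-memory 8G \
    --principal "${user_name}" \
    --keytab "${user_name}.keytab" \
    --conf spark.sql.broadcastTimeout=3600 \
    --conf spark.executor.memoryOverhead=3280 \
    --conf spark.driver.memoryOverhead=820 \
    main.py
}

submit_cluster
```

With the default dry run this just echoes the full spark2-submit command line. If the job is started from an Oozie Spark action rather than a shell action, the equivalent change would be made in the action's mode setting instead of on the command line.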