Spark tasks fail to start executing

Problem description

I am running a job in spark-shell with the following options:

--num-executors 15 
--driver-memory 15G 
--executor-memory 7G 
--executor-cores 8 
--conf spark.yarn.executor.memoryOverhead=2G 
--conf spark.sql.shuffle.partitions=500 
--conf spark.sql.autoBroadcastJoinThreshold=-1 
--conf spark.executor.memoryOverhead=800
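
For orientation, here is a rough back-of-the-envelope sketch of what these flags imply for parallelism. It assumes the default `spark.task.cpus=1` and that YARN actually grants all 15 executors; the object name `SubmitMath` is made up for illustration, and the per-task heap figure is only a crude upper bound because Spark reserves part of the heap for its own use.

```scala
// Rough resource arithmetic for the submit flags above (illustrative only).
object SubmitMath {
  val numExecutors = 15       // --num-executors
  val executorCores = 8       // --executor-cores
  val executorMemoryGiB = 7   // --executor-memory
  val shufflePartitions = 500 // spark.sql.shuffle.partitions

  // Tasks that can run at the same time, assuming spark.task.cpus=1.
  val taskSlots: Int = numExecutors * executorCores

  // Scheduling waves needed for one shuffle stage (ceiling division).
  val shuffleWaves: Int = (shufflePartitions + taskSlots - 1) / taskSlots

  // Total executor heap across the cluster, excluding memory overhead.
  val totalExecutorHeapGiB: Int = numExecutors * executorMemoryGiB

  // Crude upper bound on heap per concurrently running task.
  val heapPerTaskGiB: Double = executorMemoryGiB.toDouble / executorCores

  def main(args: Array[String]): Unit = {
    println(s"task slots:          $taskSlots")
    println(s"shuffle waves:       $shuffleWaves")
    println(s"total executor heap: $totalExecutorHeapGiB GiB")
    println(s"heap per task:       $heapPerTaskGiB GiB")
  }
}
```

With 8 cores sharing a 7G heap, each running task has well under 1 GiB to work with, which may matter for a memory-heavy join.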

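The job is stuck and never really gets going. The code performs a cross join with a filter condition on a large dataset of 270m rows. I increased the number of partitions of the large table (270m) and the small table (100000) to 16000, and I have converted it to a broadcast variable.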
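
The question does not include the actual code, so the following is only a minimal sketch of what a filtered cross join with an explicitly broadcast small table might look like in the DataFrame API. The column names (`value`, `lo`, `hi`), the toy data, and the object and function names are all assumptions made for illustration. The explicit `broadcast` hint is used because `spark.sql.autoBroadcastJoinThreshold=-1` turns off automatic broadcasting only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object BroadcastCrossJoinSketch {
  // Joins every row of a "large" table against a broadcast "small" table,
  // keeping only the pairs that satisfy a non-equi filter condition.
  def filteredCrossJoin(spark: SparkSession): DataFrame = {
    import spark.implicits._

    // Hypothetical stand-ins for the 270m-row and 100000-row tables.
    val large = Seq((1, 10), (2, 25), (3, 40)).toDF("id", "value")
    val small = Seq((0, 20), (20, 50)).toDF("lo", "hi")

    // The broadcast hint ships the small table to every executor, so the
    // large table does not have to be shuffled for the join.
    large.join(broadcast(small), $"value" >= $"lo" && $"value" < $"hi")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("broadcast-cross-join-sketch")
      .master("local[*]") // local mode, for trying the sketch out only
      .getOrCreate()

    val joined = filteredCrossJoin(spark)
    joined.explain() // inspect which physical join Spark picked
    joined.show()
    spark.stop()
  }
}
```

Calling `explain()` on the real job would show whether the plan uses a broadcast join or falls back to a cartesian product, which is worth checking before tuning partition counts.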

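I have attached Spark UI screenshots for the job.

So should I reduce the number of partitions or increase the number of executors? Any ideas?

Thanks for your help.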

![spark ui 1][1] ![spark ui 2][2] ![spark ui 3][3]

After 10 hours:

Status: Tasks: 7341/16936 (16624 failed)

Checking the container error logs gives:

RM Home
NodeManager
Tools
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

![UI 1 after 50 completed][4] ![UI 2 after 50 completed][5]

  [1]: https://i.stack.imgur.com/nqcys.png
  [2]: https://i.stack.imgur.com/S2vwL.png
  [3]: https://i.stack.imgur.com/81FUn.png
  [4]: https://i.stack.imgur.com/h5MTa.png
  [5]: https://i.stack.imgur.com/yDfKF.png


apache-spark scala spark-ui task