PySpark: TaskMemoryManager: Failed to allocate a page

Problem description

I get an error when running a Spark job on a server in standalone cluster mode.

The error looks like this:

WARN TaskMemoryManager: Failed to allocate a page (x bytes), try again.

where x can be:

Some suggested solutions:

My Spark job is meant to:

join some tables (3 to 4),
apply some cleaning functions,
save the result (1 DataFrame, at most 300 MB) to HDFS

htop after running the job:

My server specs:

Memory: 31 GB
CPUs: 8
Cores per socket: 8

My configuration (pseudocode):

spark.sql.execution.arrow.enabled: True
spark.driver.maxResultSize: 0
spark.driver.memory: 15g
spark.executor.memory: 15g
spark.dynamicAllocation.enabled: True
spark.shuffle.service.enabled: True
spark.network.timeout: 10000001
spark.executor.heartbeatInterval: 10000000
spark.sql.crossJoin.enabled: True
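
For reference, settings like these could be applied from PySpark roughly as follows. This is only a minimal sketch, not the code actually used in the question: the names SPARK_CONF and build_session and the application name are hypothetical, and the values are copied as given in the question.

```python
# Settings copied from the question. Spark conf values are passed as strings.
SPARK_CONF = {
    "spark.sql.execution.arrow.enabled": "true",
    "spark.driver.maxResultSize": "0",  # 0 means no limit
    "spark.driver.memory": "15g",
    "spark.executor.memory": "15g",
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.network.timeout": "10000001",
    "spark.executor.heartbeatInterval": "10000000",
    "spark.sql.crossJoin.enabled": "true",
}


def build_session(app_name="example-job", conf=SPARK_CONF):
    """Create (or reuse) a SparkSession with the given settings.

    app_name is a placeholder, not taken from the question.
    """
    # Imported here so the dict above can be inspected without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

Calling build_session() returns a session with these settings; keeping them in a plain dict makes it easy to change one value at a time while testing.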

Notes:

This used to work without problems with the setup above and a larger dataset (about 300 GB)
I am still a beginner

I have tried:

stop-all.sh (Hadoop & Spark)
changing the configuration to spark.executor.memory: 10g
adding some settings: spark.sql.autoBroadcastJoinThreshold: -1 and spark.sql.broadcastTimeout: 3000
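
A quick way to sanity-check memory settings like these against the machine is to add them up. The sketch below assumes that the driver and a single executor share the 31 GB server, which the question suggests but does not state outright. The helper names are hypothetical, and JVM memory outside the heap is ignored.

```python
def to_gib(size):
    """Convert a Spark-style size string such as '15g' or '512m' to GiB."""
    units = {"k": 1 / 1024**2, "m": 1 / 1024, "g": 1, "t": 1024}
    return float(size[:-1]) * units[size[-1].lower()]


def heap_budget(driver, executor, total_gib):
    """Return (requested heap, memory left for everything else) in GiB."""
    requested = to_gib(driver) + to_gib(executor)
    return requested, total_gib - requested


# Original settings: 15g driver + 15g executor on the 31 GB server.
print(heap_budget("15g", "15g", 31))  # (30.0, 1.0)
# After lowering spark.executor.memory to 10g.
print(heap_budget("15g", "10g", 31))  # (25.0, 6.0)
```

Under these assumptions the original settings leave about 1 GB for the operating system and every other process on the machine, and the 10g variant leaves about 6 GB.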

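Solution

After observing many test scenarios and trying a lot of things,

restarting the server solved the problem. I know this is not a perfect solution, but it did work.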

apache-spark hdfs out-of-memory pyspark taskmanager