Problem description
I am seeing the following error while writing a CSV:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure:
Task 9560 in stage 21.0 failed 4 times, most recent failure:
Lost task 9560.3 in stage 21.0 (TID 88857, .., executor 12):
java.io.FileNotFoundException: File does not exist: <hdfs dependent table location>/000017_0
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
  at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:131)
  ...
Caused by: java.io.FileNotFoundException: File does not exist: <hdfs dependent table location>/000017_0
It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
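For context, this error typically appears when a table's underlying HDFS files are replaced after Spark has already captured the table's file listing. A hypothetical reproduction sketch (dependent_db.dep_table stands in for the real dependent table):
# Spark records the table's file listing when the DataFrame is planned.
df = spark.table("dependent_db.dep_table")
# If another job rewrites the table in the meantime, e.g.
#   INSERT OVERWRITE TABLE dependent_db.dep_table SELECT ...
# the files Spark recorded (such as 000017_0) are deleted from HDFS.
# Any action executed afterwards still reads the stale listing and fails
# with the FileNotFoundException shown above.
df.count()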
The write statement:
df.write.format(format).mode(mode).saveAsTable("{}.{}".format(runtime_db, table_name))
The df above is built from data in dependent tables and goes through some transformations before the write statement above (the dependent table location is the one you see in the error). I know that REFRESH TABLE updates a table's metadata, but does it make sense to refresh the metadata of all the dependent tables before running the final action that writes the CSV?
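REFRESH TABLE (or its PySpark equivalent, spark.catalog.refreshTable) invalidates Spark's cached metadata and file listing for a table, so refreshing every dependent table shortly before the final action is a reasonable defensive step when those tables may be rewritten while the job runs. A minimal sketch, assuming the dependent table names are known (the names and the build_df helper below are hypothetical):
# Invalidate cached metadata/file listings for each dependent table.
dependent_tables = ["dependent_db.dep_table_a", "dependent_db.dep_table_b"]
for t in dependent_tables:
    spark.catalog.refreshTable(t)  # same effect as spark.sql("REFRESH TABLE " + t)
# Recreate the DataFrame after the refresh, as the error message advises:
# a plan built against the old file listing is not repaired by the refresh.
df = build_df(spark)  # hypothetical helper that re-applies the transformations
df.write.format(format).mode(mode).saveAsTable("{}.{}".format(runtime_db, table_name))
Note that a refresh only protects against stale metadata; if an upstream job overwrites a dependent table while this job is still scanning it, the race remains and individual tasks can still fail mid-stage.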