Appending a PySpark DataFrame to an existing CSV file in Azure Blob Storage using Azure Databricks

Problem description

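I have an xls file that I load from an Azure Storage container into Azure Databricks. I have mounted the blob container as a mount point in order to read this file. After converting it to CSV, I need to append it to a master file. I tried the code below: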

final_df.select(cols).toPandas().to_csv(outfile,mode='a',header=False,index=False)

I also tried the Spark DataFrameWriter:

final_df.select(cols).coalesce(1).write.csv(outfile,mode='append',header=False)

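In both cases I get either no error at all or an "operation not supported" error, and the data is not appended to the master data file. I have looked at some posts, but they all suggest using Python's built-in open() function. I tried a similar approach: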

p_df = final_df.toPandas()
with open(outfile,'a') as fd:
  p_df.to_csv(fd)

But this fails with an "Operation not supported" OSError:

OSError                                   Traceback (most recent call last)
OSError: [Errno 95] Operation not supported

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<command-3424188746178105> in <module>
      1 p_df = final_df.toPandas()
      2 with open(outfile,'a') as fd:
----> 3   p_df.to_csv(fd)

OSError: [Errno 95] Operation not supported

Is there a way to append new data to a master CSV file from an Azure Databricks notebook?
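One possible workaround, offered as a sketch rather than a confirmed fix: the Errno 95 appears to come from the mounted storage not supporting append-mode writes, while whole-file writes usually do work. Under that assumption, the master file can be copied to the driver's local disk, appended to there, and then written back in one pass. The helper name append_rows_via_local_copy is hypothetical, and it only uses the Python standard library.

```python
import csv
import os
import shutil
import tempfile


def append_rows_via_local_copy(master_path, rows):
    """Append rows to a CSV stored somewhere that rejects mode='a'.

    Assumption: master_path supports whole-file reads and writes
    (for example a /dbfs/mnt/... path), but not in-place appends.
    """
    fd, local_path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        # 1. Copy the current master file to local disk, if it exists.
        if os.path.exists(master_path):
            shutil.copyfile(master_path, local_path)

        # 2. Check whether the existing data ends with a line break,
        #    so the first new row does not get glued to the last old one.
        needs_newline = False
        if os.path.getsize(local_path) > 0:
            with open(local_path, "rb") as f:
                f.seek(-1, os.SEEK_END)
                needs_newline = f.read(1) not in (b"\n", b"\r")

        # 3. Append on local disk, where append mode is supported.
        with open(local_path, "a", newline="") as f:
            if needs_newline:
                f.write("\n")
            csv.writer(f, lineterminator="\n").writerows(rows)

        # 4. Overwrite the master file with the combined result.
        shutil.copyfile(local_path, master_path)
    finally:
        os.remove(local_path)
```

In the notebook this might be called as append_rows_via_local_copy(outfile, final_df.select(cols).collect()), assuming outfile is the local-file form of the mount path (starting with /dbfs/). Like toPandas(), collect() pulls every row onto the driver, so this only suits data that fits in driver memory. The copy-back step rewrites the whole master file, so two notebooks appending at the same time could overwrite each other's rows.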


append azure azure-databricks azure-storage-blobs python