How can I make sure a DataFrame has finished writing with pandas.to_csv()?

Problem description

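I've been writing a small script that queries a database and returns the results. I then use pandas.to_csv() to write them to a temporary CSV file, and upload that CSV file to a cloud location. My problem is making sure that pandas.to_csv() has finished writing before the temporary CSV file is uploaded. The only way I have found to make sure the data reaches the temp file before the upload is to keep the

print(temp.tell())

line in the example below. If I comment it out, no data is uploaded.

Example code: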

import os
import tempfile

# customer_name, current_date, remove_temporary_file() and upload_file()
# are defined elsewhere in the script (not shown in the question).
def write_to_temporary_csv_file(df, file_name, token, folder_id):
    with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False) as temp:
        print("DataFrame: ", df)
        df.to_csv(temp, index=False, encoding='utf-8')
        print("temp.tell() size: ", temp.tell())
        print("File size: ", str(round((os.stat(temp.name).st_size / 1024), 2)), "kb")
        new_file_path = tempfile.gettempdir() + '/' + customer_name + '_' + file_name + '_' + current_date + '.csv'

        ## If the renamed temp file already exists, remove it so it can be recreated
        remove_temporary_file(new_file_path)
        os.link(temp.name, new_file_path)
        upload_response = upload_file(token, folder_id, new_file_path)

        ## Remove both the temp file and the renamed temp file
        remove_temporary_file(temp.name)
        remove_temporary_file(new_file_path)

Image 1 (with temp.tell() included)

Image 2 (with temp.tell() commented out)

Solution

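I think this is because you keep the file open for as long as you are inside the with block. to_csv() writes into the file object's buffer, and that buffer is only guaranteed to reach the file on disk when the file is flushed or closed, so os.link() and the upload can end up seeing an empty file. Your print(temp.tell()) line most likely works because, in CPython, calling tell() on a text-mode file flushes pending writes as a side effect. Moving everything after to_csv() out of the with block closes the file before the upload.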
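To make the buffering visible, here is a minimal pandas-free sketch, assuming CPython (the flush inside tell() is an implementation detail of its io module, not a documented guarantee). It writes plain CSV text instead of calling to_csv() and measures the file's size on disk at each step; ROWS is made-up sample data.

```python
import os
import tempfile

ROWS = "id,name\n1,a\n2,b\n"  # made-up sample CSV text, 16 bytes

with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False, newline="") as temp:
    temp.write(ROWS)
    # The text is still in Python's write buffer, so the file on disk is empty.
    size_while_buffered = os.path.getsize(temp.name)

    # In CPython, tell() on a text file flushes pending writes before it
    # reports the position: the side effect the question was relying on.
    temp.tell()
    size_after_tell = os.path.getsize(temp.name)

    temp.write(ROWS)  # this second copy sits in the buffer again
    size_before_close = os.path.getsize(temp.name)

# Leaving the with block closes the file, which flushes whatever is left.
size_after_close = os.path.getsize(temp.name)
os.remove(temp.name)  # delete=False, so the file has to be removed by hand

print(size_while_buffered, size_after_tell, size_before_close, size_after_close)
```

The sizes show that nothing reaches the file until a flush happens, whether it is triggered by tell(), by an explicit flush(), or by closing the file.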

import os
import tempfile

# customer_name, current_date, remove_temporary_file() and upload_file()
# are defined elsewhere in the script, as in the question.
def write_to_temporary_csv_file(df, file_name, token, folder_id):
    # pandas ignores encoding= for an already-open text handle, so set it (and newline='') on the file
    with tempfile.NamedTemporaryFile(mode='w', suffix='.csv', delete=False,
                                     encoding='utf-8', newline='') as temp:
        print("DataFrame: ", df)
        df.to_csv(temp, index=False)

    # Exiting the with block closes the file, which flushes the CSV data to disk.
    # temp.tell() would raise ValueError here because the file is closed, so check the size on disk instead.
    print("File size: ", str(round((os.stat(temp.name).st_size / 1024), 2)), "kb")
    new_file_path = os.path.join(tempfile.gettempdir(), customer_name + '_' + file_name + '_' + current_date + '.csv')

    ## If the renamed temp file already exists, remove it so it can be recreated
    remove_temporary_file(new_file_path)
    os.link(temp.name, new_file_path)
    upload_response = upload_file(token, folder_id, new_file_path)

    ## Remove both the temp file and the renamed temp file
    remove_temporary_file(temp.name)
    remove_temporary_file(new_file_path)
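The same "close first, then upload" pattern can be isolated into a small function. This is a sketch under assumptions: write_then_upload is a hypothetical name, and the upload parameter is a stand-in for the question's upload_file() (which is not shown), so the example only passes it the path of a complete, closed file. The newline="" argument follows the pandas to_csv() documentation for non-binary file objects.

```python
import os
import tempfile
from typing import Any, Callable

import pandas as pd


def write_then_upload(df: pd.DataFrame, upload: Callable[[str], Any]) -> Any:
    """Write df to a temporary CSV, close it, then pass its path to upload().

    upload is a hypothetical stand-in for the question's upload_file().
    """
    # pandas asks for newline="" when to_csv() writes to a text handle;
    # the encoding is set on the handle rather than on to_csv().
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", delete=False, newline="", encoding="utf-8"
    ) as temp:
        df.to_csv(temp, index=False)
    # The file is closed here, so everything pandas wrote has been flushed.
    try:
        return upload(temp.name)
    finally:
        os.remove(temp.name)  # delete=False means cleanup is our job


if __name__ == "__main__":
    frame = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
    # Hypothetical "uploader" that only reports how many bytes it would send.
    print(write_then_upload(frame, os.path.getsize))
```

If the file has to stay open while it is uploaded, calling temp.flush() right after to_csv() has the same effect for this purpose; os.fsync(temp.fileno()) additionally forces the data onto physical storage.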