Problem Description
I want to use DataLakeServiceClient (from the azure.storage.filedatalake package) to create a CSV file and then repeatedly append to it. The initial create/write itself works fine.
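A minimal version of that first create/write might look like the following sketch (this is a reconstruction, not the asker's exact code; the connection string is a placeholder, and `initial_write` is just a hypothetical wrapper name — the SDK import sits inside it so the snippet stays self-contained):

```python
def initial_write():
    # Imported here so the sketch can be read without azure-storage-file-datalake installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    # Placeholder connection string: substitute your own account name and key.
    connect_str = "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)

    file_system_client = datalake_service_client.get_file_system_client("test")
    directory_client = file_system_client.create_directory("test")

    # create_file overwrites any existing file of the same name.
    file_client = directory_client.create_file("FileName.csv")
    data = "Test1"
    file_client.append_data(data, offset=0, length=len(data))
    file_client.flush_data(len(data))
```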
Assuming the next chunk to append is data = """Test2""", how should I set offset and call flush_data?
Thanks.
Solution
First, you are calling directory_client.create_file(myfile) every time, which creates a fresh (empty) file on each run, so your code never actually appends anything.
Second, you need a check for whether the file already exists: if it does, use the get_file_client method to get a client for the existing file; if it does not, use the create_file method. The full code is below (on my side I tested with a .txt file):
from azure.storage.filedatalake import DataLakeServiceClient
from azure.core.exceptions import ResourceNotFoundError

connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)

myfilesystem = "test"
myfolder = "test"
myfile = "FileName.txt"

file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
# create_directory is idempotent and returns a DataLakeDirectoryClient.
directory_client = file_system_client.create_directory(myfolder)

data = "Test2"
print("length of data is " + str(len(data)))
try:
    # get_file_properties raises ResourceNotFoundError if the file does not exist.
    file_client = directory_client.get_file_client(myfile)
    filesize_previous = file_client.get_file_properties().size
    print("length of current file is " + str(filesize_previous))
    # Append at the current end of the existing content.
    file_client.append_data(data, offset=filesize_previous, length=len(data))
except ResourceNotFoundError:
    # The file does not exist yet: create it and write from offset 0.
    file_client = directory_client.create_file(myfile)
    filesize_previous = 0
    print("length of current file is " + str(filesize_previous))
    file_client.append_data(data, offset=0, length=len(data))
# Flush with the total file length after the append.
file_client.flush_data(filesize_previous + len(data))
This works fine on my side; please give it a try. (The above is only a sample — you can design it better and more concisely.)
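The check-then-append pattern above can be wrapped in a small reusable helper. This is only a sketch (`append_to_file` is a hypothetical name, not part of the SDK); it relies on the same DataLakeDirectoryClient/DataLakeFileClient methods as the code above, and falls back to a stub exception class so the snippet can be read without azure-core installed:

```python
try:
    from azure.core.exceptions import ResourceNotFoundError
except ImportError:
    # Stub so the sketch is readable without azure-core installed.
    class ResourceNotFoundError(Exception):
        pass

def append_to_file(directory_client, file_name, data):
    """Append str `data` to file_name, creating the file if it does not exist.

    Returns the file size after the flush.
    """
    file_client = directory_client.get_file_client(file_name)
    try:
        # Raises ResourceNotFoundError when the file does not exist yet.
        offset = file_client.get_file_properties().size
    except ResourceNotFoundError:
        file_client = directory_client.create_file(file_name)
        offset = 0
    file_client.append_data(data, offset=offset, length=len(data))
    # flush_data takes the total length of the file after the append.
    file_client.flush_data(offset + len(data))
    return offset + len(data)
```

The key invariant, as in the answer's code, is that append_data writes at offset = current file size and flush_data is then called with offset + len(data), the new total length.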