Azure Functions binding for Azure Data Lake in Python

Problem description

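I have a requirement: from an Azure Function, I want to connect to my Azure Data Lake Storage Gen2 (ADLS) account, read a file, process it with Python (pyspark), and then write it back to Azure Data Lake. So both my input and output bindings would be ADLS. Is there an ADLS binding for Azure Functions in Python? Could anyone offer suggestions?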

Thanks, Ando D

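Solution

Update:

1. To read the data, you can use the blob input binding.

2. To write the data, however, you cannot use the blob output binding, because the underlying objects are different. Azure Functions also does not support an ADLS output binding, so the write logic has to be coded in the body of the function.

This document lists the binding types that Azure Functions supports: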

https://docs.microsoft.com/en-us/azure/azure-functions/functions-triggers-bindings?tabs=csharp#supported-bindings

Here is a simple code example:

import logging

import azure.functions as func
from azure.storage.filedatalake import DataLakeServiceClient

def main(req: func.HttpRequest, inputblob: func.InputStream) -> func.HttpResponse:
    # Replace the account name and key with your own values.
    connect_str = "DefaultEndpointsProtocol=https;AccountName=0730bowmanwindow;AccountKey=xxxxxx;EndpointSuffix=core.windows.net"
    datalake_service_client = DataLakeServiceClient.from_connection_string(connect_str)
    myfilesystem = "test"
    myfile = "FileName.txt"
    file_system_client = datalake_service_client.get_file_system_client(myfilesystem)
    # create_file creates an empty file and returns a client for it.
    file_client = file_system_client.create_file(myfile)
    # Read the content supplied by the blob input binding. Keep it as bytes
    # so that the length passed to append_data is a byte count.
    data = inputblob.read()
    logging.info("length of data is %d", len(data))
    # The file was just created, so appending starts at offset 0.
    filesize_previous = 0
    logging.info("length of current file is %d", filesize_previous)
    file_client.append_data(data, offset=filesize_previous, length=len(data))
    # flush_data commits the appended data; its argument is the final file length.
    file_client.flush_data(filesize_previous + len(data))
    return func.HttpResponse(
        "This is a test." + data.decode("utf-8"), status_code=200
    )
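The append-then-flush sequence in the example can be factored into a small helper, which also makes the offset bookkeeping explicit when the payload is written in several pieces. This is only a minimal sketch and not part of the original answer: the helper name `write_in_chunks` and the `chunk_size` parameter are hypothetical, and `file_client` is assumed to be a `DataLakeFileClient` for a freshly created, empty file (for example, the object returned by `create_file`).

```python
def write_in_chunks(file_client, data: bytes, chunk_size: int = 4 * 1024 * 1024) -> int:
    """Append `data` to an empty ADLS file in chunks, then commit it.

    `file_client` is assumed to expose append_data(data, offset=..., length=...)
    and flush_data(offset), as DataLakeFileClient does.
    Returns the total number of bytes written.
    """
    offset = 0
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        # Each chunk is staged at the current end of the uncommitted data.
        file_client.append_data(chunk, offset=offset, length=len(chunk))
        offset += len(chunk)
    # Nothing is visible in the file until flush_data is called with the
    # final length of the file.
    file_client.flush_data(offset)
    return offset
```

In the function above, this would replace the `append_data` and `flush_data` calls with `write_in_chunks(file_client, data)`.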

Original answer:

I think the following documents will help you:

How to read:

https://docs.microsoft.com/en-us/azure/azure-functions/functions-bindings-storage-blob-input?tabs=csharp

How to write:

https://docs.microsoft.com/en-us/python/api/azure-storage-file-datalake/azure.storage.filedatalake.datalakeserviceclient?view=azure-python

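By the way, do not use the blob output binding. Reading can be done through a binding, but writing cannot. (The Blob storage service and the Data Lake service are based on different objects. Reading a file with the blob input binding is perfectly fine, but please do not write files with the blob output binding, because it cannot create objects based on the Data Lake service.)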
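The split described here (read through the binding, write through the SDK) might be structured as follows. This is a hedged sketch under assumptions that are not in the original answer: the app setting name `ADLS_CONNECTION_STRING`, the helper names, and the uppercase transform (a stand-in for real processing) are all hypothetical. It uses `upload_data(..., overwrite=True)` from `azure-storage-file-datalake`, which writes the whole file in one call rather than separate create, append, and flush steps.

```python
import os


def process(data: bytes) -> bytes:
    # Placeholder transformation; the real processing logic would go here.
    return data.upper()


def get_file_system_client(file_system: str):
    # Imported lazily so the other helpers can be exercised without the SDK.
    from azure.storage.filedatalake import DataLakeServiceClient

    # ADLS_CONNECTION_STRING is a hypothetical app setting name.
    service = DataLakeServiceClient.from_connection_string(
        os.environ["ADLS_CONNECTION_STRING"]
    )
    return service.get_file_system_client(file_system)


def read_process_write(input_bytes: bytes, file_system_client, path: str) -> int:
    """Process bytes read via the blob input binding and write them to ADLS.

    Returns the number of bytes written.
    """
    result = process(input_bytes)
    file_client = file_system_client.get_file_client(path)
    # overwrite=True replaces the file if it already exists.
    file_client.upload_data(result, overwrite=True)
    return len(result)
```

Inside the function, `input_bytes` would come from `inputblob.read()`, and the client from something like `get_file_system_client("test")`.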

Let me know whether the documents above help; if not, I will update this answer with a simple Python example.

azure azure-data-lake azure-functions python