Read Excel data from Azure Blob and convert it to CSV using a Python Azure Function

Problem description

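I want to deploy an Azure Function that does the following:

  1. Read Excel data from an Azure blob into a stream object, instead of downloading it to the VM.
  2. Read that data into a DataFrame.

I need help reading the Excel file into a DataFrame. How should I update the placeholder download_file_path so that it reads the Excel data?
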
    import pandas as pd 
    import os 
    import io
    from azure.storage.blob import BlobClient,BlobServiceClient,ContentSettings
        
    connectionstring="XXXXXXXXXXXXXXXX" 
    excelcontainer = "excelcontainer"        
    excelblobname="Resource.xlsx" 
    sheet ="Resource" 
            
    blob_service_client =BlobServiceClient.from_connection_string(connectionstring)
    download_file_path =os.path.join(excelcontainer)
    blob_client = blob_service_client.get_blob_client(container=excelcontainer,blob=excelblobname)
    with open(download_file_path,"rb") as f:
       data_bytes = f.read()
    df =pd.read_excel(data_bytes,sheet_name=sheet,encoding = "utf-16")

Solution

If you want to read an Excel file from an Azure blob with pandas, you have two options:

  1. Generate a SAS token for the blob, then access the blob through its URL with the SAS token appended:

    from datetime import datetime, timedelta

    import azure.functions as func
    import pandas as pd
    from azure.storage.blob import BlobSasPermissions, generate_blob_sas

    def main(req: func.HttpRequest) -> func.HttpResponse:
        account_name = 'andyprivate'
        account_key = 'h4pP1fe76*****A=='
        container_name = 'test'
        blob_name = 'sample.xlsx'
        # Create a read-only SAS token that expires in one hour
        sas = generate_blob_sas(
            account_name=account_name,
            container_name=container_name,
            blob_name=blob_name,
            account_key=account_key,
            permission=BlobSasPermissions(read=True),
            expiry=datetime.utcnow() + timedelta(hours=1),
        )

        # pandas reads the workbook directly from the SAS-authorized URL
        blob_url = f'https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas}'
        df = pd.read_excel(blob_url)
        print(df)
        ......
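
If the blob name contains spaces or other reserved characters, the f-string above would produce an invalid URL. Below is a minimal sketch of a helper that percent-encodes the blob name before appending the token. The name `build_blob_sas_url` and the sample values are hypothetical and not part of the original answer; the token is assumed to come from `generate_blob_sas`.

```python
from urllib.parse import quote


def build_blob_sas_url(account_name: str, container_name: str,
                       blob_name: str, sas_token: str) -> str:
    """Compose a blob URL that carries a SAS token as its query string.

    Hypothetical helper for illustration only.
    """
    # Percent-encode the blob name, keeping "/" so virtual folders survive.
    encoded_blob_name = quote(blob_name, safe="/")
    # Tolerate a token that was copied with a leading "?".
    token = sas_token.lstrip("?")
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container_name}/{encoded_blob_name}?{token}"
    )


if __name__ == "__main__":
    # Placeholder values; a real token would come from generate_blob_sas.
    print(build_blob_sas_url("myaccount", "test",
                             "reports/Q1 sample.xlsx",
                             "sv=placeholder&sig=placeholder"))
```

The resulting string can be passed to `pd.read_excel` in the same way as `blob_url` above.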


  2. Download the blob:

    import io

    import azure.functions as func
    import pandas as pd
    from azure.storage.blob import BlobServiceClient

    def main(req: func.HttpRequest) -> func.HttpResponse:
        account_name = 'andyprivate'
        account_key = 'h4pP1f****='

        blob_service_client = BlobServiceClient(account_url=f'https://{account_name}.blob.core.windows.net/', credential=account_key)
        blob_client = blob_service_client.get_blob_client(container='test', blob='sample.xlsx')
        # Download the blob content into memory; nothing is written to disk
        downloader = blob_client.download_blob()
        # Wrap the bytes in a file-like object so pandas can read them
        df = pd.read_excel(io.BytesIO(downloader.readall()))
        print(df)
        ....
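
Neither option shows the CSV step mentioned in the title. Here is a minimal sketch of it, assuming the DataFrame has already been loaded by one of the options above. The helper names `dataframe_to_csv_bytes` and `upload_csv` and all of their parameters are hypothetical; `upload_blob(..., overwrite=True)` is the `BlobClient` call from azure-storage-blob v12.

```python
import io

import pandas as pd


def dataframe_to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 CSV bytes without touching disk."""
    buffer = io.StringIO()
    df.to_csv(buffer, index=False)  # drop the pandas index column
    return buffer.getvalue().encode("utf-8")


def upload_csv(df: pd.DataFrame, connection_string: str,
               container_name: str, csv_blob_name: str) -> None:
    """Upload the CSV form of df to a blob (hypothetical helper)."""
    # Imported here so the conversion helper above also works
    # in environments where the Azure SDK is not installed.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob_client = service.get_blob_client(container=container_name,
                                          blob=csv_blob_name)
    # overwrite=True replaces an existing blob with the same name
    blob_client.upload_blob(dataframe_to_csv_bytes(df), overwrite=True)


if __name__ == "__main__":
    # Stand-in data; in the function this would be the df read from Excel.
    sample = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
    print(dataframe_to_csv_bytes(sample).decode("utf-8"))
```

Inside `main`, something like `upload_csv(df, connectionstring, "csvcontainer", "Resource.csv")` would write the result back to storage; the container and blob names in that call are placeholders.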

