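Problem description
I want to deploy an Azure Function that does the following:
- Read Excel data from an Azure blob into a stream object, rather than downloading it to the VM.
- Read that data into a DataFrame.
I need help reading the Excel file into a DataFrame. How should I update the placeholder download_file_path so that the Excel data is read?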
import pandas as pd
import os
import io
from azure.storage.blob import BlobClient, BlobServiceClient, ContentSettings

connectionstring = "XXXXXXXXXXXXXXXX"
excelcontainer = "excelcontainer"
excelblobname = "Resource.xlsx"
sheet = "Resource"

blob_service_client = BlobServiceClient.from_connection_string(connectionstring)
download_file_path = os.path.join(excelcontainer)
blob_client = blob_service_client.get_blob_client(container=excelcontainer, blob=excelblobname)

with open(download_file_path, "rb") as f:
    data_bytes = f.read()
df = pd.read_excel(data_bytes, sheet_name=sheet, encoding="utf-16")
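Solution
To read an Excel file from Azure Blob Storage with pandas, you have two options:
- Generate a SAS token for the blob, then access it through the blob URL with the SAS token appended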
from datetime import datetime, timedelta

import azure.functions as func
import pandas as pd
from azure.storage.blob import BlobSasPermissions, generate_blob_sas


def main(req: func.HttpRequest) -> func.HttpResponse:
    account_name = 'andyprivate'
    account_key = 'h4pP1fe76*****A=='
    container_name = 'test'
    blob_name = "sample.xlsx"

    # Generate a read-only SAS token that expires in one hour
    sas = generate_blob_sas(
        account_name=account_name,
        container_name=container_name,
        blob_name=blob_name,
        account_key=account_key,
        permission=BlobSasPermissions(read=True),
        expiry=datetime.utcnow() + timedelta(hours=1),
    )

    # pandas can read the workbook directly from the SAS URL
    blob_url = f'https://{account_name}.blob.core.windows.net/{container_name}/{blob_name}?{sas}'
    df = pd.read_excel(blob_url)
    print(df)
    ......
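One detail in the SAS option is easy to miss: the blob name is interpolated into the URL as-is, which only works while it contains no spaces or other reserved characters. A minimal sketch of a URL builder that percent-encodes the blob name is shown here. The helper name `build_blob_sas_url` is hypothetical (it is not part of the Azure SDK), and the sketch assumes the SAS token is already a valid query string, as returned by `generate_blob_sas`.

```python
from urllib.parse import quote


def build_blob_sas_url(account_name, container_name, blob_name, sas_token):
    """Build a blob URL with a SAS token (hypothetical helper, not an SDK call)."""
    # quote() percent-encodes spaces and reserved characters but leaves "/"
    # intact by default, so virtual folders in the blob name are preserved.
    encoded_blob = quote(blob_name)
    # Tokens copied from elsewhere sometimes start with "?"; drop it so the
    # URL does not end up with "??".
    token = sas_token.lstrip("?")
    return (
        f"https://{account_name}.blob.core.windows.net/"
        f"{container_name}/{encoded_blob}?{token}"
    )
```

The resulting string can then be passed to `pd.read_excel` in place of the hand-built `blob_url`.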
- Download the blob
import io

import azure.functions as func
import pandas as pd
from azure.storage.blob import BlobServiceClient


def main(req: func.HttpRequest) -> func.HttpResponse:
    account_name = 'andyprivate'
    account_key = 'h4pP1f****='
    blob_service_client = BlobServiceClient(account_url=f'https://{account_name}.blob.core.windows.net/', credential=account_key)
    blob_client = blob_service_client.get_blob_client(container='test', blob='sample.xlsx')

    # Download the blob into memory and wrap the bytes in a file-like object for pandas
    downloader = blob_client.download_blob()
    df = pd.read_excel(io.BytesIO(downloader.readall()))
    print(df)
    ....
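Applied to the question's code, the download option means dropping `download_file_path` and `open()` entirely and handing pandas an in-memory stream instead. The following is a rough sketch rather than a definitive implementation. The function name `read_excel_blob` and its `parse` hook are hypothetical, added only so the helper can be tried without a storage account. `blob_client` is assumed to be an `azure.storage.blob.BlobClient` like the one the question already creates. The `encoding` argument from the question is left out, since an .xlsx workbook is a binary file rather than encoded text.

```python
import io


def read_excel_blob(blob_client, sheet_name=0, parse=None):
    """Download a blob into memory and parse it as an Excel workbook.

    blob_client: any object exposing download_blob().readall(),
                 for example azure.storage.blob.BlobClient.
    parse:       hypothetical hook for swapping the parser;
                 defaults to pandas.read_excel.
    """
    if parse is None:
        # Imported lazily so the helper can be exercised without pandas.
        import pandas as pd
        parse = pd.read_excel
    # readall() returns the blob content as bytes; nothing is written to disk.
    data = blob_client.download_blob().readall()
    # BytesIO gives pandas the file-like object it expects.
    return parse(io.BytesIO(data), sheet_name=sheet_name)
```

With the names from the question, the call would look like `df = read_excel_blob(blob_client, sheet_name=sheet)`. Reading .xlsx files this way assumes an Excel engine such as openpyxl is installed alongside pandas.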