How to extract data from large NetCDF files stored on an Azure file share and send it to an Azure web page

Problem description

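I downloaded about 40 years of weather data (wind, waves, etc.) and stored the data files in NetCDF format on an Azure file share. In total I have stored about 8 TB of data. Each weather parameter, for example wind speed over the entire surface of the Earth for one year, is stored in a single file of about 35 GB.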

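Next, I developed a simple Azure website using Python and the Dash package, where users can define a location (latitude, longitude), select a weather parameter and a date range, and submit a request. See the screenshot of the website below: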

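Now I would like to run a script once the user clicks the submit button that extracts the specified data, saves it to a CSV file, and provides a download link to that file.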

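The Azure Storage File Share client library for Python (azure-storage-file-share) allows connecting to the file share and downloading files. Since one year of data is a 35 GB file, downloading each year's file just to extract a single grid point is not an option.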

Is there any way I can run a script directly on the Azure file share to extract the required data and then retrieve it from the web page?

I am trying to avoid having to extract the data from the NetCDF files and push it into an SQL database that the website can easily access.

Solution

You can mount the file share in an Azure VM. Here is an example of how someone did the same thing locally: Read NetCDF file from Azure file storage
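Once the share is mounted, the NetCDF files can be opened like ordinary local files, and a library that reads lazily only pulls the requested grid point from disk. The following is a minimal sketch of that idea, not a tested implementation. It assumes xarray, dask and netCDF4 are installed, that the files follow a hypothetical naming scheme `<parameter>_<year>.nc` under the mount directory, and that the coordinates are called `latitude`, `longitude` and `time`; adjust these to match the actual files.

```python
import os


def yearly_files(mount_dir, var_name, start_year, end_year):
    """List one NetCDF path per year (hypothetical naming: <var>_<year>.nc)."""
    return [
        os.path.join(mount_dir, f"{var_name}_{year}.nc")
        for year in range(start_year, end_year + 1)
    ]


def extract_point_to_csv(mount_dir, var_name, lat, lon, start, end, csv_path):
    """Write the time series of the grid point nearest (lat, lon) to csv_path.

    start and end are ISO date strings such as "1999-01-01".
    """
    # Imported here so the path helper above has no third-party dependency.
    import xarray as xr

    paths = yearly_files(mount_dir, var_name, int(start[:4]), int(end[:4]))
    # open_mfdataset opens the files lazily; values are read only when needed.
    with xr.open_mfdataset(paths, combine="by_coords") as ds:
        # ASSUMPTION: coordinates are named "latitude", "longitude" and "time".
        point = ds[var_name].sel(latitude=lat, longitude=lon, method="nearest")
        series = point.sel(time=slice(start, end))
        series.to_dataframe().to_csv(csv_path)
    return csv_path
```

A web request handler could then call something like `extract_point_to_csv("/mnt/weather", "wind", 52.0, 4.5, "1999-01-01", "2001-12-31", "/tmp/out.csv")` and serve the resulting file, where `/mnt/weather` is a hypothetical mount point.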

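Alternatively, you may want to look at Azure Blob storage instead. As far as I know, Azure file storage is really meant as a replacement for a network file share, like the ones on a LAN. Azure Blob storage, on the other hand, is better suited to streaming large files like these from the cloud. The Azure Blob storage overview has a good set of examples for when to use which.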

Here is an example of how to download a blob, from the Azure Blob Python reference:

import os

# blob_client, local_path and local_file_name are assumed to be defined earlier.
# Download the blob to a local file.
# Add 'DOWNLOAD' before the .txt extension so you can see both files in the data directory.
download_file_path = os.path.join(local_path, local_file_name.replace('.txt', 'DOWNLOAD.txt'))
print("\nDownloading blob to \n\t" + download_file_path)

with open(download_file_path, "wb") as download_file:
    # readall() loads the entire blob into memory before writing it out.
    download_file.write(blob_client.download_blob().readall())
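For files in the tens of gigabytes, holding the whole blob in memory with `readall()` is unlikely to work well. A possible variation, sketched here under the assumption that `blob_client` is an azure-storage-blob v12 `BlobClient`, writes the blob to disk chunk by chunk using the `chunks()` iterator of the downloader. The function name `download_blob_in_chunks` is made up for this illustration.

```python
def download_blob_in_chunks(blob_client, dest_path):
    """Stream a blob to dest_path chunk by chunk; return the bytes written."""
    written = 0
    with open(dest_path, "wb") as dest:
        # chunks() yields the blob piece by piece instead of one large bytes object.
        for chunk in blob_client.download_blob().chunks():
            dest.write(chunk)
            written += len(chunk)
    return written
```

`download_blob()` also accepts `offset` and `length` keyword arguments to fetch only a byte range, which may help if the position of the needed data inside the file is known. This still transfers raw bytes rather than a decoded grid point, so it does not by itself replace opening the file with a NetCDF library.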
