Best function to pull multiple NetCDF files from a server, index, loop, and save the files?

Problem description

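I'm new to programming. I'm trying to adapt a script that was written to pull .txt data files so that it now pulls NetCDF files from an HTTP server, downloads them, renames them, and saves them locally (as well as to another server location). I've pasted the basic code below, including the actual buoy data filenames for the NetCDF files. I believe the problem is in the URL request line. I've tried urllib.request.urlopen and urllib.request.urlretrieve, and both give errors.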

    import os
    import urllib
    import urllib.error
    import urllib.request
    import shutil
    import netCDF4
    import requests

    # Web links for the spectra and wave data files
    webSpectra = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/swden/41004/41004w9999.nc'

    webWave = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'

    # Set the save location for each file
    saveloc = 'saveSpectra41004w9999.nc'
    saveloc2 = 'saveWave41004h9999.nc'

    # Perform the pull
    try:
        # This is the call that raises the TypeError shown below
        urllib.request.urlopen(webSpectra, saveloc)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 spectra file not available')
        print(exception)

    try:
        urllib.request.urlopen(webWave, saveloc2)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 wave file not available')
        print(exception)
    print('Pulling data for 41004')
    # count and stationIndex are not defined in this excerpt
    print('Percent complete ' + str(round(100 * (count / len(stationIndex)))))

    print('Done')

My error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-5e5ebd26fe46> in <module>
     59     # perform pull
     60     try:
---> 61         urllib.request.urlopen(webSpectra,saveloc)
     62     except urllib.error.HTTPError as exception:
     63         print('Station: 41004 spectra file not available')

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in urlopen(url,data,timeout,cafile,capath,cadefault,context)
    221     else:
    222         opener = _opener
--> 223     return opener.open(url,data,timeout)
    224 
    225 def install_opener(opener):

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in open(self,fullurl,data,timeout)
    522         for processor in self.process_request.get(protocol,[]):
    523             meth = getattr(processor,meth_name)
--> 524             req = meth(req)
    525 
    526         response = self._open(req,data)

/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in do_request_(self,request)
   1277                 msg = "POST data should be bytes, an iterable of bytes, " \
   1278                       "or a file object. It cannot be of type str."
-> 1279                 raise TypeError(msg)
   1280             if not request.has_header('Content-type'):
   1281                 request.add_unredirected_header(

TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

Solution

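It looks like you simply want to download the files. You can do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/), which downloads the file to a temporary location. From there you can export to xarray or pandas, or just save the file.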

The following code works for one file:

    import nctoolkit as nc
    ds = nc.open_url('https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc')
    # convert to xarray dataset
    ds_xr = ds.to_xarray()
    # convert to pandas dataframe
    df = ds.to_dataframe()
    # save to location
    ds.to_nc("outfile.nc")

If the approach above does not work for you, for example because of dependency issues, you can use urllib instead:

    import urllib.request
    url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
    # the second argument is the local path the file is written to
    urllib.request.urlretrieve(url, '/tmp/temp.nc')

...and the file is saved locally.
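To cover several stations and both file types, the urlretrieve call can be wrapped in a small loop. The sketch below is only an illustration: `build_url`, `download_station`, and the `retrieve` and `backup_dir` parameters are hypothetical names, and it assumes every station follows the same URL pattern as the two 41004 URLs in the question, which may not hold for all buoys.

```python
import os
import shutil
import urllib.error
import urllib.request

BASE_URL = "https://dods.ndbc.noaa.gov/thredds/fileServer/data"


def build_url(kind, station, suffix):
    """Build a file URL, assuming the pattern of the two URLs in the question."""
    return f"{BASE_URL}/{kind}/{station}/{station}{suffix}9999.nc"


def download_station(station, out_dir, backup_dir=None,
                     retrieve=urllib.request.urlretrieve):
    """Download the spectra and wave files for one station.

    Returns the list of local paths that were saved. backup_dir, if given,
    must be an existing directory that receives a copy of each file.
    """
    saved = []
    for kind, suffix, label in (("swden", "w", "Spectra"),
                                ("stdmet", "h", "Wave")):
        url = build_url(kind, station, suffix)
        target = os.path.join(out_dir, f"save{label}{station}{suffix}9999.nc")
        try:
            retrieve(url, target)
        except urllib.error.URLError as exc:  # HTTPError is a subclass of URLError
            print(f"Station: {station} {label.lower()} file not available")
            print(exc)
            continue
        if backup_dir is not None:
            shutil.copy2(target, backup_dir)  # second copy, e.g. on a mounted share
        saved.append(target)
    return saved


# Example use (performs real network requests, so it is left commented out):
# stations = ["41004"]
# for count, station in enumerate(stations, start=1):
#     download_station(station, out_dir=".")
#     print("Percent complete " + str(round(100 * count / len(stations))))
```

Passing the download function in as `retrieve` is just a convenience so the loop can be exercised without touching the network; calling `urllib.request.urlretrieve` directly would work equally well.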

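As I understand it, you should open() a local file and write the response into it, rather than passing the filename to urlopen. The second argument of urlopen is `data`, the body of a POST request, so passing a str filename there is what triggers the TypeError.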
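If you would rather stay with urlopen, one way to do this is to pass only the URL and copy the response into a file you open yourself. This is a minimal sketch; `fetch_to_file` and its `timeout` default are hypothetical choices, not part of the original script.

```python
import shutil
import urllib.request


def fetch_to_file(url, path, timeout=60):
    """Stream the body of a GET request into a local file and return its path."""
    # Only the URL is passed positionally: a second positional argument would
    # be treated as POST data, which is what caused the TypeError in the question.
    with urllib.request.urlopen(url, timeout=timeout) as response, \
            open(path, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return path
```

Because the response is copied in chunks, the whole file never has to sit in memory, which can matter for larger NetCDF files.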

netcdf python urllib