使用循环将可变数据从多个 NetCDF 写入新的组合 NetCDF 文件时出现问题

问题描述

我有许多 NetCDF 文件,其中包含三个我感兴趣的变量,我想单独提取这些变量,然后创建一个新的 NetCDF 数据集进行写入。我还有一个文本列表,其中包含我需要添加为每个变量的坐标的站名。现在,我的数据不正确,时间坐标也没有被正确拉出。 RI 使用 netCDF4.Dataset 创建我的合并数据集,并使用 xarray 打开我的单个 NetCDF 文件xarray.open_dataset 中,因此重新采样数据更容易。我的问题很可能与我的 for 循环有关,因为我没有大量使用循环来提取数据或创建 NetCDF 数据集的经验。这个问题必须是我拉我的时间变量的地方,因为我有一个部分我试图从以前的脚本中重用。我的代码如下:

import pandas as pd
import matplotlib.pyplot as plt
import netCDF4 as nc
import matplotlib.dates as mdates
import numpy as np
import xarray as xr
import os

# List directory where observation files are located.
NDBCobsdir = '/work/Observation-Data/NDBC/wave/April_1_2021/'

# # List directory for where to write the output files.  
obsOutDir = '/work/Pre-Processing/obs-nc-files/April_1_2021/'
    
# Name output files for observation data.
nameWaveObs = 'waveObs.nc'
if os.path.exists(obsOutDir+nameWaveObs):
    os.remove(obsOutDir+nameWaveObs)
waveObsNC = nc.Dataset(obsOutDir+'waveObs.nc','w')

## Specify time range and interval
init = '2021-04-01 3:00:00 AM'
end = '2021-04-02 0:00:00 AM'
interval = '3H'

# Create empty NetCDF file to write merged data to 
longitude = waveObsNC.createDimension('longitude',None) 
latitude = waveObsNC.createDimension('latitude',None) 
station = waveObsNC.createDimension('station',None)  
time = waveObsNC.createDimension('time',None)  
ncTPD = waveObsNC.createVariable('tp',np.float32,('time','station',))  
ncHSD = waveObsNC.createVariable('hs',)) 
ncMWD = waveObsNC.createVariable('mwd',))
nctimes = waveObsNC.createVariable('time',np.float64,))
ncstations = waveObsNC.createVariable('station',('station',))
nclatitudes = waveObsNC.createVariable('latitude',))
nclongitudes = waveObsNC.createVariable('longitude',))
ncTPD.units = 'seconds'
ncHSD.units = 'meters'
ncMWD.units = '°'

# Create list of all obs files 
NDBCFiles = sorted(os.listdir(NDBCobsdir))

# Put all obs files in one list,starting with NDBC for getting time range
listObsFiles = []

# Appending remaining NDBC files to list of obs files. 
for file in NDBCFiles:
    if file.endswith('9h2021.nc'):
        listObsFiles.append(file)

# create list of station names for dataset writing 
stationIndex = []
for i in listObsFiles:
    if i.endswith('9h2021.nc'):
        statID = i[8:13]
        stationIndex.append(statID)
 
# Using this from a prevIoUs script but don't think it is right
timeRange = pd.date_range(init,end,freq=interval)
timeSeries = pd.DataFrame(data=timeRange[:])
timeSeries[0] = timeSeries[0].dt.strftime('%Y-%m-%d %r')

# Open each station NetCDF file individually,looping through to pull data variables needed
for obsFile in listObsFiles:
    count = listObsFiles.index(obsFile)
    if obsFile.endswith('9h2021.nc'): # NDBC files
        ncFileName = os.path.split(obsFile)[1]
        curFile = xr.open_dataset(NDBCobsdir+obsFile)
        
        # Resample data to desired interval
        resample = curFile.resample(time=interval).mean()
        
        # Pull vars of interest
        tpObs = resample.variables['average_wpd']
        tpObs = np.squeeze(tpObs)
        tpObs[tpObs == 99.00] = np.nan; tpObs[tpObs == 999.00] = np.nan
        hsObs = resample.variables['wave_height']
        hsObs = np.squeeze(hsObs)
        hsObs[hsObs == 99.00] = np.nan; hsObs[hsObs == 999.00] = np.nan
        mwdobs = resample.variables['mean_wave_dir']
        mwdobs = np.squeeze(mwdobs)

        lonObs = curFile.variables['longitude'][0]
        lonObs = float(lonObs.data)
               
        latObs = curFile.variables['latitude'][0]
        latObs = float(latObs.data)
                
        # Pull time data from curFile
        t = resample.variables['time']
                
        ncTPD[:,count]=tpObs
        ncHSD[:,count]=hsObs
        ncMWD[:,count]=mwdobs
        
        nctimes=timeSeries
        ncstations[:]=stationIndex
        nclongitudes[:,count]=lonObs
        nclatitudes[:,count]=latObs

waveObsNC.close()

当我打印 stationIndex 时,我得到(为了简化只有两个,使用 200+):

['44100','51000']

要自己重现这个,wave NetCDF 数据文件位于: 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/44100/44100h.nc',其中“44100”与我正在使用的站号一致。

我的输出没有错误,但我的时间数据没有正确写入并且我的变量中有错误的数据值:

Dimensions:
station: 2time: 16
Coordinates:
time
(time)
float64
9.969e+36 9.969e+36 ... 9.969e+36
array([9.96921e+36,9.96921e+36,9.96921e+36])
station
(station)
float32
4.41e+04 5.1e+04
array([44100.,51000.],dtype=float32)
Data variables:
tp
(time,station)
timedelta64[ns]
...
hs
(time,station)
float32
...
mwd
(time,station)
float32
...
latitude
(time,station)
float32
...
longitude
(time,station)
float32
...

解决方法

暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!

如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。

小编邮箱:dio#foxmail.com (将#修改为@)