xarray MissingDimensionsError when reading a remote dataset (NBM)

Problem description

When reading a remote NBM dataset (https://vlab.ncep.noaa.gov/web/mdl/nbm), I get an xarray.core.variable.MissingDimensionsError. I'm sure I'm missing some argument in open_dataset.

You can view the structure of the data here: https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD.html. Running ncdump -h https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD shows the full structure.

Variables that use time1:

  • Precipitation_type_surface_probability_between_1p0_and_2
import xarray as xr
url = "https://thredds-jumbo.unidata.ucar.edu/thredds/dodsC/grib/NCEP/NBM/CONUS/TwoD"
ds = xr.open_dataset(url)

If you drop this variable, the same error is raised for the next dimension:

ds = xr.open_dataset(url,drop_variables="time1")
xarray.core.variable.MissingDimensionsError: 'time2' has more than 1-dimension and the same name as one of its dimensions ('reftime4','time2'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.
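The rule stated in the error message can be sketched in plain Python: a variable is rejected when it has more than one dimension and shares its name with one of them. The helper name `find_conflicting_variables` and the `var_dims` mapping below are hypothetical and only illustrate the check; the dimension tuples for time1 and time2 are copied from the two error messages, and the reftime entry is an assumed ordinary 1-D coordinate.

```python
from typing import Dict, List, Tuple


def find_conflicting_variables(var_dims: Dict[str, Tuple[str, ...]]) -> List[str]:
    """Return the names of variables that are multi-dimensional and
    share their name with one of their own dimensions."""
    return sorted(
        name
        for name, dims in var_dims.items()
        if len(dims) > 1 and name in dims
    )


# Hypothetical mapping of variable name -> dimension names.
# time1 and time2 follow the error messages; reftime is an assumed
# 1-D coordinate that does not trigger the rule.
var_dims = {
    "time1": ("reftime", "time1"),
    "time2": ("reftime4", "time2"),
    "reftime": ("reftime",),
}

conflicts = find_conflicting_variables(var_dims)
print(conflicts)
```

With netCDF4, a mapping like this could be built as `{name: var.dimensions for name, var in nc.variables.items()}`, and the resulting list passed to `drop_variables` in one call instead of dropping names one error at a time. Note that dropping these variables also discards the valid-time information they hold.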

Full traceback

Traceback (most recent call last):
  File "<stdin>",line 1,in <module>
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py",line 575,in open_dataset
    ds = maybe_decode_store(store,chunks)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py",line 471,in maybe_decode_store
    ds = conventions.decode_cf(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/conventions.py",line 600,in decode_cf
    ds = Dataset(vars,attrs=attrs)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/dataset.py",line 630,in __init__
    variables,coord_names,dims,indexes,_ = merge_data_and_coords(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py",line 467,in merge_data_and_coords
    return merge_core(
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py",line 594,in merge_core
    collected = collect_variables_and_indexes(aligned)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/merge.py",line 278,in collect_variables_and_indexes
    variable = as_variable(variable,name=name)
  File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/core/variable.py",line 154,in as_variable
    raise MissingDimensionsError(
xarray.core.variable.MissingDimensionsError: 'time1' has more than 1-dimension and the same name as one of its dimensions ('reftime','time1'). xarray disallows such variables because they conflict with the coordinates used to label dimensions.

Local testing

wget https://ftp.ncep.noaa.gov/data/nccf/com/blend/prod/blend.20210214/00/core/blend.t00z.core.f001.co.grib2

Solution

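If you want to access these "TwoD" datasets from THREDDS Forecast Model Run Collection (FMRC) virtual datasets in Xarray, you can first slice them with the NetCDF library and then pass the sliced variables to Xarray. If you wrap the NetCDF variable with Dask, you can stay lazy.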

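Here is an example that extracts a "best time series" for the last 60 values of HRRR, but using the 1-hour forecast data (instead of the default "analysis" 0-hour forecast that the FMRC best time series uses):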

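import netCDF4
import xarray as xr
import dask.array as dsa  # aliased as dsa so the DataArray below cannot shadow the module
import hvplot.xarray  # registers the .hvplot accessor used for plotting

url = 'https://thredds.unidata.ucar.edu/thredds/dodsC/grib/NCEP/HRRR/CONUS_2p5km/TwoD'
nc = netCDF4.Dataset(url)
# Wrap the remote NetCDF variable in a Dask array so that slicing stays lazy
arr = dsa.from_array(nc['Temperature_height_above_ground'])
tau = 1  # index of the forecast offset to extract (the 1-hour forecast)
# Last 60 model runs at forecast offset tau
temp = xr.DataArray(arr[-60:, tau, :, :], dims=['time', 'y', 'x'], name='temp')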

Here is a time-series plot showing that it works: [image: time-series plot]
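To see why fixing the forecast offset turns the problematic 2-D time coordinate into an ordinary 1-D time axis, here is a minimal sketch using only the standard library. The run times and offsets are made up for illustration and are not read from the THREDDS server; `valid_time` plays the role of a coordinate such as time1 with dimensions (reftime, time1).

```python
from datetime import datetime, timedelta

# Hypothetical model run times, one run per hour (illustrative values only).
reftimes = [datetime(2021, 2, 14, hour) for hour in range(3)]
# Hypothetical forecast offsets, in hours, available for every run.
offsets = [0, 1, 2]

# 2-D valid-time table: valid_time[i][j] = reftimes[i] + offsets[j] hours.
valid_time = [[ref + timedelta(hours=h) for h in offsets] for ref in reftimes]

tau = 1  # index into offsets; here it selects the 1-hour forecast
# Taking column tau from every run yields a 1-D series of valid times.
series_times = [row[tau] for row in valid_time]
print(series_times)
```

Each entry of `series_times` is a run time shifted by the chosen offset, so the result can label a single `time` dimension. The slice `arr[-60:, tau, :, :]` in the answer applies the same idea to the data values.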