Size and order change after converting a pandas DataFrame to an xarray Dataset

Problem description

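I am trying to export a DataFrame to a netCDF file. As far as I know, I can use xarray.Dataset.to_netcdf for that, so I first have to convert the DataFrame to an xarray Dataset. This is what I am doing: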

ypredicted_df = pd.DataFrame(ypredicted,index=ytest.index,columns=ytest.columns.values)
ypredicted_ds = ypredicted_df.to_xarray()  # to ds
ypredicted_ds.to_netcdf(os.path.join(output_path,'ypredicted_wholescene_highres_' + str(max_features) + '.nc'))

ypredicted is an ndarray. When I print ypredicted_df and ypredicted_ds.to_dataframe() to check whether anything changed, I see that the size and part of the order have changed:

print(ypredicted_df)

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 13.671611  4.871557
          13.671977  4.168761
          13.672342  1.421363
          13.672708  1.761741
          13.673073  2.938208

[5979909 rows x 1 columns]



print(ypredicted_ds.to_dataframe())

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 12.281465  3.387909
          12.281099  3.021199
          12.280734  2.889664
          12.280369  3.197318
          12.280003  2.702418

[7441114 rows x 1 columns]

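The two DataFrames differ in size (5979909 rows before, 7441114 rows after).
The last rows are in a different order (lon is descending there, while in the first rows it is ascending).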

I already checked whether some NaNs had been introduced, but the size did not change when I dropped them.

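Can anyone explain what is happening here? Why is the DataFrame different after the round trip from pandas DataFrame to xarray Dataset? Is there another way to keep the DataFrame unchanged, or can I export the DataFrame to netCDF directly?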

Thanks for your help :)

Update:

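I tried dropping the NaNs again, and now the size is the same, but the order is still wrong. I do not know yet whether that has any effect when I plot the data.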

print(ypredicted_ds.to_dataframe().dropna(how='any'))

                       ST_B10
lat       lon
50.684918 13.282882 -0.213598
          13.283247  0.521064
          13.283613  0.162646
          13.283978  0.090892
          13.284343 -0.060037
...                       ...
51.397346 12.281465  3.387909
          12.281099  3.021199
          12.280734  2.889664
          12.280369  3.197318
          12.280003  2.702418

[5979909 rows x 1 columns]
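
A small sketch of what seems to be going on, using hypothetical toy data rather than the real scene: DataFrame.to_xarray unstacks the (lat, lon) MultiIndex into two dimensions, so the Dataset covers every lat/lon combination and fills the pairs that never occurred with NaN. Converting back therefore yields more rows, and dropping the NaN rows and sorting the index should recover the original frame. The values and coordinates below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 3 distinct latitudes and 3 distinct longitudes,
# but only 5 of the 9 possible (lat, lon) pairs are actually present.
pairs = [(50.0, 13.2), (50.0, 13.3), (50.1, 13.3), (50.2, 13.1), (50.2, 13.2)]
index = pd.MultiIndex.from_tuples(pairs, names=["lat", "lon"])
df = pd.DataFrame({"ST_B10": np.arange(5, dtype=float)}, index=index)

# to_xarray() unstacks the MultiIndex into a full lat x lon grid.
ds = df.to_xarray()
grid_cells = ds.sizes["lat"] * ds.sizes["lon"]

# Converting back gives one row per grid cell, including the NaN-filled ones.
round_trip = ds.to_dataframe()

# Dropping the filled-in cells and sorting the index restores the original rows.
restored = round_trip.dropna(how="any").sort_index()

print(len(df), grid_cells, len(round_trip), len(restored))
```

If the same thing holds for the real data, the 7441114 rows would be the full grid, and the difference to 5979909 would be grid cells that have no prediction.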

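For plotting, however, I need a Dataset, because I have not found a way to plot the DataFrame. So I still need to remove the NaNs from the Dataset. I found xarray.Dataset.dropna, but I cannot get it to work: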

The first thing I tried was:

ypredicted_ds.dropna(how='any')

Error message:

Traceback (most recent call last):
  File "script_randomforest_dem.py",line 174,in <module>
    output_path_identifier,3)
  File "/lustre/scratch2/ws/1/stwa779b-master/04_workspace/randomforest/randomforest.py",line 104,in randomforest
    print(dif_ds.dropna(how='any'))
TypeError: dropna() missing 1 required positional argument: 'dim'

Then I tried:

ypredicted_ds.dropna('lon',how='any').to_dataframe()

Output:

Empty DataFrame
Columns: [ST_B10]
Index: []
(5979909,1)

ypredicted_ds.dropna('lat',1)

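Neither of them worked. I can imagine that when dropping NaNs along lon or lat, every lon or lat contains at least one NaN, so the Dataset ends up empty. Does anyone know how to use xarray's ds.dropna()?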
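
To illustrate that suspicion, here is a minimal sketch on a hypothetical 3 x 3 grid (the values are made up). Dataset.dropna works along one dimension at a time: with how="any" it removes every label along that dimension that contains at least one NaN, while how="all" removes only labels that are entirely NaN.

```python
import numpy as np
import xarray as xr

# Hypothetical grid in which every lon column contains at least one NaN
# and only the last lat row is entirely NaN.
values = np.array([
    [np.nan, 1.0, 2.0],
    [3.0, np.nan, 4.0],
    [np.nan, np.nan, np.nan],
])
ds = xr.Dataset(
    {"ST_B10": (("lat", "lon"), values)},
    coords={"lat": [50.0, 50.1, 50.2], "lon": [13.1, 13.2, 13.3]},
)

# how="any": each lon column has a NaN somewhere, so nothing is left.
any_dropped = ds.dropna("lon", how="any")

# how="all": only the lat row that is NaN everywhere is removed.
all_dropped = ds.dropna("lat", how="all")

print(dict(any_dropped.sizes), dict(all_dropped.sizes))
```

On a gridded Dataset the remaining NaN cells cannot be removed one by one, because the grid has to stay rectangular; they can only be trimmed as whole rows or columns.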

And how should I plot it? For completeness, this is how I tried to plot the Dataset:

import matplotlib.pyplot as plt

fig,ax = plt.subplots(figsize=(10,7))
ax.imshow(ypredicted_ds['ST_B10'],cmap=cmap)
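
One possible alternative, sketched on hypothetical toy data: sort the Dataset by its coordinates with Dataset.sortby and let xarray's own DataArray.plot draw the 2D field, which uses the lat/lon coordinates for the axes and leaves NaN cells blank. The colormap name "viridis" and the non-interactive Agg backend are assumptions made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no display available
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr

# Hypothetical small grid with unsorted coordinates and some NaN cells.
values = np.array([
    [np.nan, 1.0, 2.0],
    [3.0, np.nan, 4.0],
])
ds = xr.Dataset(
    {"ST_B10": (("lat", "lon"), values)},
    coords={"lat": [50.1, 50.0], "lon": [13.3, 13.1, 13.2]},
)

# Put both coordinates into ascending order before plotting.
ds_sorted = ds.sortby(["lat", "lon"])

fig, ax = plt.subplots(figsize=(10, 7))
# For 2D data, DataArray.plot draws a pcolormesh and adds a colorbar by default.
mesh = ds_sorted["ST_B10"].plot(ax=ax, cmap="viridis")
```

The figure can then be written out with fig.savefig, for example.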


netcdf pandas python python-xarray