Converting a DataArray to a DataFrame

Problem description

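Is there a simple way to convert an xarray DataArray into a pandas DataFrame where I can specify which dimensions become the index and which become the columns? For example, suppose I have a DataArray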

import xarray as xr
weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={'date': ['Thursday', 'Friday'],
            'time': ['Morning', 'Afternoon']},
)

which gives:

<xarray.DataArray 'weather' (date: 2, time: 2)>
array([['Sunny', 'Windy'],
       ['Rainy', 'Foggy']], dtype='<U5')
Coordinates:
  * date     (date) <U8 'Thursday' 'Friday'
  * time     (time) <U9 'Morning' 'Afternoon'

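Now suppose I want to turn this into a pandas DataFrame indexed by date, with time as the columns. I can do this by calling .to_dataframe() and then calling .unstack() on the resulting DataFrame: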

>>> weather.to_dataframe().unstack()
           weather        
time     Afternoon Morning
date                      
Friday       Foggy   Rainy
Thursday     Windy   Sunny

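However, pandas sorts the labels: instead of Morning followed by Afternoon, I get Afternoon followed by Morning (and the dates likewise come out as Friday, Thursday). I was hoping there would be an API like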

weather.to_dataframe(index_dims=[...],column_dims=[...])

that would do this reshaping for me, without my having to reorder the index and columns afterwards.
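Until such an API exists, the reshaping can be wrapped in a small helper. The sketch below is one possible approach, assuming it is acceptable to reorder after unstacking: `to_wide_dataframe` is a hypothetical name (not part of xarray), and it simply transposes the DataArray, unstacks the column dimensions with pandas, then reindexes both axes with the coordinate values so their original order is restored.

```python
import pandas as pd
import xarray as xr


def to_wide_dataframe(da, index_dims, column_dims):
    """Hypothetical helper (not an xarray API): reshape `da` into a wide DataFrame.

    `index_dims` become the row index and `column_dims` become the columns;
    together they must list every dimension of `da`.
    """
    # Transpose so the long Series' MultiIndex levels follow
    # index_dims + column_dims.
    series = da.transpose(*index_dims, *column_dims).to_series()
    # Move the column dimensions out of the row MultiIndex into the columns.
    wide = series.unstack(list(column_dims))

    def original_order(dims):
        # Labels in the order the coordinates appear in the DataArray.
        if len(dims) == 1:
            return pd.Index(da[dims[0]].values, name=dims[0])
        return pd.MultiIndex.from_product(
            [da[d].values for d in dims], names=list(dims)
        )

    # unstack may sort the labels, so restore the coordinate order explicitly.
    return wide.reindex(index=original_order(index_dims),
                        columns=original_order(column_dims))


weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)
wide = to_wide_dataframe(weather, index_dims=["date"], column_dims=["time"])
print(wide)
```

The reindex step is what keeps Morning before Afternoon and Thursday before Friday, regardless of whether pandas sorted the labels during unstack.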

Solution

xarray 0.16.1 added a dim_order argument to .to_dataframe. Does this do what you need?

xr.DataArray.to_dataframe(
    self,
    name: Hashable = None,
    dim_order: List[Hashable] = None,
) -> pandas.core.frame.DataFrame
Docstring:
Convert this array and its coordinates into a tidy pandas.DataFrame.

The DataFrame is indexed by the Cartesian product of index coordinates
(in the form of a :py:class:`pandas.MultiIndex`).

Other coordinates are included as columns in the DataFrame.

Parameters
----------
name
    Name to give to this array (required if unnamed).
dim_order
    Hierarchical dimension order for the resulting dataframe.
    Array content is transposed to this order and then written out as flat
    vectors in contiguous order, so the last dimension in this list
    will be contiguous in the resulting DataFrame. This has a major
    influence on which operations are efficient on the resulting
    dataframe.

    If provided, must include all dimensions of this DataArray. By default, dimensions are sorted according to the DataArray dimensions order.
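As a rough illustration of how dim_order combines with unstack for the question's example, here is a sketch assuming xarray 0.16.1 or later. dim_order controls the order of the MultiIndex levels; the docstring says nothing about the order of the labels inside each level, so the sketch reindexes with the coordinate values as a safeguard rather than relying on version-specific behaviour.

```python
import xarray as xr

weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)

# dim_order (xarray >= 0.16.1) sets the order of the MultiIndex levels:
# here "time" becomes the outer level and "date" the inner one.
long_df = weather.to_dataframe(dim_order=["time", "date"])
print(long_df.index.names)

# unstack() moves the named level into the columns; "date" stays as the index.
wide_df = long_df["weather"].unstack("time")

# Reindex with the coordinate values so the labels keep their original order
# (Thursday before Friday, Morning before Afternoon).
wide_df = wide_df.reindex(index=weather["date"].values,
                          columns=weather["time"].values)
print(wide_df)
```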