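Problem description
Is there a simple way to convert an xarray DataArray to a pandas DataFrame where I can specify which dimensions become the index and which become the columns? For example, suppose I have a DataArray: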
import xarray as xr

weather = xr.DataArray(
    name='weather',
    data=[['Sunny', 'Windy'], ['Rainy', 'Foggy']],
    dims=['date', 'time'],
    coords={'date': ['Thursday', 'Friday'], 'time': ['Morning', 'Afternoon']},
)
This results in:
<xarray.DataArray 'weather' (date: 2, time: 2)>
array([['Sunny', 'Windy'],
       ['Rainy', 'Foggy']], dtype='<U5')
Coordinates:
  * date     (date) <U8 'Thursday' 'Friday'
  * time     (time) <U9 'Morning' 'Afternoon'
Suppose I now want to move this into a pandas DataFrame indexed by date, with time as the columns. I can do that by calling .to_dataframe() and then .unstack() on the resulting DataFrame:
>>> weather.to_dataframe().unstack()
           weather
time     Afternoon Morning
date
Friday       Foggy   Rainy
Thursday     Windy   Sunny
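However, pandas sorts the labels, so instead of Morning followed by Afternoon I get Afternoon followed by Morning (and Friday before Thursday). I was hoping there would be an API like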
weather.to_dataframe(index_dims=[...],column_dims=[...])
which would do this reshaping for me, without my having to reorder the index and columns afterwards.
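For reference, the manual reordering step that the question hopes to avoid can be sketched roughly as follows. This is only a minimal sketch, not an xarray feature: it unstacks the `time` level and then uses pandas `reindex` with the DataArray's own coordinate values to restore the original order. The variable name `wide` is purely illustrative.

```python
import xarray as xr

weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)

# Long-format Series indexed by (date, time), then pivot `time` into columns.
wide = weather.to_dataframe()["weather"].unstack("time")

# Restore the coordinate order stored on the DataArray, whatever order
# unstack() produced for the row and column labels.
wide = wide.reindex(
    index=weather["date"].values,
    columns=weather["time"].values,
)

print(wide)
```

Reindexing against the coordinate values keeps the sketch independent of how pandas happens to order labels during the reshape.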
Solution
In xarray 0.16.1, dim_order was added to .to_dataframe. Does this fit your needs?
xr.DataArray.to_dataframe(
    self,
    name: Hashable = None,
    dim_order: List[Hashable] = None,
) -> pandas.core.frame.DataFrame
Docstring:
Convert this array and its coordinates into a tidy pandas.DataFrame.
The DataFrame is indexed by the Cartesian product of index coordinates
(in the form of a :py:class:`pandas.MultiIndex`).
Other coordinates are included as columns in the DataFrame.
Parameters
----------
name
    Name to give to this array (required if unnamed).
dim_order
    Hierarchical dimension order for the resulting dataframe.
    Array content is transposed to this order and then written out as flat
    vectors in contiguous order, so the last dimension in this list
    will be contiguous in the resulting DataFrame. This has a major
    influence on which operations are efficient on the resulting
    dataframe.

    If provided, must include all dimensions of this DataArray. By default,
    dimensions are sorted according to the DataArray dimensions order.
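
As a rough illustration of the docstring above (assuming xarray 0.16.1 or later), `dim_order` sets the order of the MultiIndex levels in the long-format result. Note that it does not by itself choose which dimension ends up as columns; a reshape such as `.unstack()` is still needed for that. The name `long_df` is illustrative only.

```python
import xarray as xr

weather = xr.DataArray(
    name="weather",
    data=[["Sunny", "Windy"], ["Rainy", "Foggy"]],
    dims=["date", "time"],
    coords={"date": ["Thursday", "Friday"], "time": ["Morning", "Afternoon"]},
)

# Ask for `time` as the outer index level and `date` as the inner one.
# dim_order must list every dimension of the DataArray.
long_df = weather.to_dataframe(dim_order=["time", "date"])

print(list(long_df.index.names))
print(long_df)
```

With the default (no `dim_order`), the levels follow the DataArray's own dimension order, here `date` then `time`.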