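Problem description
Some might say this should be two separate questions, but they are closely related, so I'm asking both here.
1. Making MultiIndex columns
I have three DataFrames: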
import pandas as pd
data_large = pd.DataFrame({"name":["a","b","c"],"sell":[10,60,50],"buy":[20,30,40]})
data_mini = pd.DataFrame({"name":["b","c","d"],"sell":[60,20,10],"buy":[30,50,40]})
data_topix = pd.DataFrame({"name":["a","b","c"],"sell":[10,80,0],"buy":[70,30,40]})
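First, though, I want to give each of them MultiIndex columns, with the product (e.g. Nikkei225Large) as the top level and sell/buy as the second level.
This is what I tried, but it doesn't work as expected: the name column ends up under the Nikkei225Large level.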
iterables = [['Nikkei225Large'],['name','buy','sell']]
index_large = pd.MultiIndex.from_product(iterables,names=['product','sell_buy'])
data_large.columns = index_large
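A minimal sketch of one way to build these columns, assuming the goal is for name to become the row index rather than a data column. Two things seem to go wrong in the attempt above: name is still a regular column, so it receives a product label too, and assigning .columns relabels positionally, so the list name, buy, sell (versus the frame's actual order name, sell, buy) would label the sell data as buy. The helper with_product_level is hypothetical, introduced only for illustration.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})


def with_product_level(df, product):
    """Hypothetical helper: move 'name' into the row index and put `product` above the other columns."""
    out = df.set_index("name")
    # from_product pairs the single product label with every remaining column,
    # in the frame's own order (sell, buy), so labels stay aligned with the data
    out.columns = pd.MultiIndex.from_product(
        [[product], out.columns], names=["product", "sell_buy"]
    )
    return out


large_mi = with_product_level(data_large, "Nikkei225Large")
print(large_mi)
```

Building the second level from out.columns instead of a hand-written list avoids the ordering mismatch entirely.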
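2. Joining multiple DataFrames with MultiIndex columns (e.g. with reduce)
Next, I want to do a full outer join of the three DataFrames on the name column. The expected output has one row per name and a sell/buy column pair under each product.
For now, I'm just joining them with reduce, as below, but I'd like to use MultiIndex columns instead.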
import pandas as pd
from functools import reduce
dfs = {0: data_large,1: data_mini,2: data_topix}
def agg_df(dfList):
    # full outer join on the 'name' column; overlapping column names get _x/_y suffixes
    df_agged = reduce(lambda left,right: pd.merge(left,right,on='name',how='outer'),dfList)
    return df_agged
df_final = agg_df(dfs.values())
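Once each frame has name as its index and a product level on its columns, the same reduce pattern can merge on the index instead of on a column. A sketch under these assumptions: to_multiindex is a hypothetical helper, and the Topix values are the ones that appear in the output table in the answer.

```python
from functools import reduce

import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

frames = {"Nikkei225Large": data_large, "Nikkei225Mini": data_mini, "Topix": data_topix}


def to_multiindex(df, product):
    """Hypothetical helper: index by 'name' and add `product` as the outer column level."""
    out = df.set_index("name")
    out.columns = pd.MultiIndex.from_product(
        [[product], out.columns], names=["product", "sell_buy"]
    )
    return out


prepared = [to_multiindex(df, product) for product, df in frames.items()]

# Full outer join on the shared 'name' index. The product level keeps every
# column label unique, so merge never needs to add _x/_y suffixes.
df_final = reduce(
    lambda left, right: pd.merge(left, right, left_index=True, right_index=True, how="outer"),
    prepared,
)
print(df_final)
```

Joining on the index rather than on='name' is what makes this work: name is no longer a column, so it cannot clash with the two-level column labels.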
Any help would be greatly appreciated!
Solution
IIUC, you can use pd.concat with the keys parameter:
df_out = pd.concat([dfi.set_index('name') for dfi in [data_large,data_mini,data_topix]],keys=['Nikkei225Large','Nikkei225Mini','Topix'],axis=1)\
.rename_axis(index=['Name'],columns=['product','buy_sell'])
Output:
product   Nikkei225Large  Nikkei225Mini   Topix
buy_sell    sell     buy    sell     buy    sell     buy
Name
a           10.0    20.0     NaN     NaN    10.0    70.0
b           60.0    30.0    60.0    30.0    80.0    30.0
c           50.0    40.0    20.0    50.0     0.0    40.0
d            NaN     NaN    10.0    40.0     NaN     NaN
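A variant of the same idea, as a sketch: pd.concat also accepts a dict of DataFrames, in which case the dict keys become the outer column level, so each product name stays paired with its frame in one place. The frames dict is a name introduced here for illustration, and the Topix data is the one implied by the output table above.

```python
import pandas as pd

data_large = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 60, 50], "buy": [20, 30, 40]})
data_mini = pd.DataFrame({"name": ["b", "c", "d"], "sell": [60, 20, 10], "buy": [30, 50, 40]})
data_topix = pd.DataFrame({"name": ["a", "b", "c"], "sell": [10, 80, 0], "buy": [70, 30, 40]})

# Illustrative mapping from product name to its DataFrame
frames = {"Nikkei225Large": data_large, "Nikkei225Mini": data_mini, "Topix": data_topix}

# With a dict, pd.concat uses the keys as the outer column level, which is
# equivalent to passing the list of frames together with keys=[...].
df_out = pd.concat(
    {product: df.set_index("name") for product, df in frames.items()},
    axis=1,
).rename_axis(index="Name", columns=["product", "buy_sell"])
print(df_out)
```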