Python reindex does not show column values

Problem description

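I am using Python 3.7 and pandas in JupyterLab, and I am trying to use reindex to add the missing rows to a MultiIndex table (the index should be the Cartesian product of the Col1 and Col2 values).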

import pandas as pd

df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2012'],
                    'result': [1, 1, 1, 1, 1, 1]})

start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])

df2 = df2.reindex(newIndex, columns=['result'])
print(df2)

The output is:

           result
Col1 Col2        
10   2012     NaN
     2013     NaN

Can someone tell me how to keep the values of the result column?

Solution

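You need to create a MultiIndex from Col1 and Col2 with DataFrame.set_index, so that reindex aligns on that MultiIndex. Otherwise reindex compares the new MultiIndex against the default RangeIndex (0, 1, 2, ...); no labels match, so every value in the output column is NaN: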

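import pandas as pd

# last year is 2016 (not 2012), so every (Col1, Col2) pair is unique
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
                    'result': [1, 1, 1, 1, 1, 1]})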

start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])
print(newIndex)
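MultiIndex([(10, '2012'),
            (10, '2013'),
            (10, '2014'),
            (10, '2015'),
            (10, '2016'),
            (11, '2012'),
            (11, '2013'),
            (11, '2014'),
            (11, '2015'),
            (11, '2016'),
            (12, '2012'),
            (12, '2013'),
            (12, '2014'),
            (12, '2015'),
            (12, '2016')],
           names=['Col1', 'Col2'])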

df2 = df2.set_index(['Col1', 'Col2']).reindex(newIndex, columns=['result'])
print(df2)
           result
Col1 Col2        
10   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     NaN
11   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     1.0
12   2012     NaN
     2013     1.0
     2014     1.0
     2015     1.0
     2016     NaN
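The same fix can be wrapped in a small reusable sketch. complete_grid and matching_labels are hypothetical helper names chosen for illustration, not pandas APIs, and the sketch assumes exactly two key columns, the first holding integers so that range can fill the gaps. matching_labels makes the explanation above concrete: the default RangeIndex shares no labels with the product index, while the frame indexed by (Col1, Col2) shares all six. The duplicate check is an added safeguard: reindex raises a ValueError when the existing index contains duplicate labels, which would happen after set_index if a (Col1, Col2) pair appeared more than once.

```python
import numpy as np
import pandas as pd


def complete_grid(df, keys, value_col, fill_value=np.nan):
    """Reindex df[value_col] onto the full product of the two key columns.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    # reindex requires unique labels: duplicated (Col1, Col2) pairs make it
    # raise ValueError, so fail early with a clearer message.
    if df.duplicated(keys).any():
        raise ValueError(f"duplicate key combinations in {keys}")

    first, second = keys
    start, end = df[first].min(), df[first].max()
    new_index = pd.MultiIndex.from_product(
        [range(start, end + 1), df[second].sort_values().unique()],
        names=keys,
    )

    # Move the keys into the index so reindex aligns on (Col1, Col2) tuples
    # rather than on the default RangeIndex.
    return df.set_index(keys).reindex(new_index, columns=[value_col],
                                      fill_value=fill_value)


def matching_labels(df, new_index):
    """Count the labels of new_index that already exist in df.index."""
    return len(set(df.index) & set(new_index))


df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
                    'result': [1, 1, 1, 1, 1, 1]})
keys = ['Col1', 'Col2']

grid = complete_grid(df2, keys, 'result')
print(matching_labels(df2, grid.index))                  # 0: RangeIndex labels are 0..5
print(matching_labels(df2.set_index(keys), grid.index))  # 6: every original row matches
print(grid['result'].notna().sum())                      # 6
```

Passing fill_value=0 (a standard reindex parameter) fills the missing combinations with 0 instead of NaN.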