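Problem description
I am using Python 3.7 and pandas in JupyterLab, and I am trying to use reindex to add the missing rows to a MultiIndex table (the index should be the Cartesian product of the Col1 and Col2 values).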
import pandas as pd
df2 = pd.DataFrame({'Col1': [10,11,12,12,12,11],'Col2': ['2012','2012','2013','2014','2015','2012'],'result': [1,1,1,1,1,1]})
start,end = df2['Col1'].min(),df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product( [range(start,end+1),Col2_values ],names=['Col1','Col2'],)
df2 = df2.reindex( newIndex,columns=['result'] )
print(df2)
This displays:
           result
Col1 Col2
10   2012     NaN
     2013     NaN
Can someone tell me how to keep the values of the result column?
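Solution
Use set_index to turn Col1 and Col2 into a MultiIndex before calling reindex, so that reindex matches one MultiIndex against another. Otherwise the DataFrame still has its default RangeIndex (labels 0 through 5 for six rows), none of its labels match the new index, and the result column comes out as all NaN: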
import pandas as pd
# the last year is 2016
df2 = pd.DataFrame({'Col1': [10,11,12,12,12,11],'Col2': ['2012','2012','2013','2014','2015','2016'],'result': [1,1,1,1,1,1]})
start,end = df2['Col1'].min(),df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product( [range(start,end+1),Col2_values ],names=['Col1','Col2'],)
print (newIndex)
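MultiIndex([(10, '2012'),
            (10, '2013'),
            (10, '2014'),
            (10, '2015'),
            (10, '2016'),
            (11, '2012'),
            (11, '2013'),
            (11, '2014'),
            (11, '2015'),
            (11, '2016'),
            (12, '2012'),
            (12, '2013'),
            (12, '2014'),
            (12, '2015'),
            (12, '2016')],
           names=['Col1', 'Col2'])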
df2 = df2.set_index(['Col1','Col2']).reindex( newIndex,columns=['result'] )
print(df2)
           result
Col1 Col2
10   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     NaN
11   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     1.0
12   2012     NaN
     2013     1.0
     2014     1.0
     2015     1.0
     2016     NaN
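To see why set_index matters here, it can help to look at the index type before and after the call. The following is a minimal sketch, assuming the same six-row data as in the answer above. The names df, indexed, new_index and filled are illustrative only, and fill_value=0 is an optional extra that is not part of the original answer; it is shown in case zeros are preferred over NaN for the added rows.

```python
import pandas as pd

# Same six rows as in the answer above (assumed data)
df = pd.DataFrame({
    'Col1': [10, 11, 12, 12, 12, 11],
    'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
    'result': [1, 1, 1, 1, 1, 1],
})

# A freshly built frame is labelled by position, not by (Col1, Col2)
print(type(df.index).__name__)        # RangeIndex

# After set_index the labels are (Col1, Col2) tuples, which reindex can match
indexed = df.set_index(['Col1', 'Col2'])
print(type(indexed.index).__name__)   # MultiIndex

# Full Cartesian product of the Col1 range and the distinct Col2 values
new_index = pd.MultiIndex.from_product(
    [range(df['Col1'].min(), df['Col1'].max() + 1),
     sorted(df['Col2'].unique())],
    names=['Col1', 'Col2'],
)

# fill_value replaces the NaN placeholder in rows that did not exist before
filled = indexed.reindex(new_index, fill_value=0)
print(filled.shape)                   # (15, 1)
```

With fill_value=0 the result column should stay an integer column, whereas the NaN placeholders in the answer's output force it to float (hence 1.0 rather than 1).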