Problem description
I am trying to add a summary row with the total rainfall, but I also need to add the average temperature:
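data = [
    # None marks a value missing from the post; 'area' for rows without one
    # is inferred from the three-rows-per-area layout.
    {'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
    {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': None},
    {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
    {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
    {'year': None, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
    {'year': None, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
    {'year': None, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
    {'year': None, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
    {'year': None, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38},
]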
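This works, but I also need to show the average temperature, and I don't know how to add it while keeping it in the same summary row. This is just an example, but I need the same arrangement in a real-world project.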
import pandas as pd
df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    _df.loc['summary'] = _df[['rainfall']].sum()  # <- How do I add a 2nd column that's not another 'sum'?
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
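Example image of what I need (I have filled in the values in green to show what I need the code to do).
Thanks.
If you would like to use it, my code is available as a Jupyter notebook on GitHub: Pandas Summary Jupyter Notebook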
Solution
You can try this:
import pandas as pd
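data = [
    # None marks a value missing from the post; 'area' for rows without one
    # is inferred from the three-rows-per-area layout.
    {'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
    {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': None},
    {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
    {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
    {'year': None, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
    {'year': None, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
    {'year': None, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
    {'year': None, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
    {'year': None, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38},
]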
df = pd.DataFrame.from_dict(data)
container = []
for label, _df in df.groupby(['area']):
    # One summary row per area: total rainfall and mean temperature
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': 'mean'})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
df
Output:
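As a variation on the loop above, the per-area summaries can also be computed in a single `groupby(...).agg(...)` call and then interleaved with the detail rows. This is only a sketch under assumptions: the `records` list is hypothetical sample data (not the data from the question), and unlike the loop version the summary rows keep their `area` value, which is what lets a stable sort place each one after its group.

```python
import pandas as pd

# Hypothetical sample records, used only for illustration.
records = [
    {"year": 2019, "area": "a", "rainfall": 10, "temperature": 20},
    {"year": 2020, "area": "a", "rainfall": 30, "temperature": 22},
    {"year": 2019, "area": "b", "rainfall": 5, "temperature": 30},
    {"year": 2020, "area": "b", "rainfall": 7, "temperature": 34},
]
df = pd.DataFrame(records)

# One row per area: total rainfall and mean temperature (named aggregation).
summary = df.groupby("area", as_index=False).agg(
    rainfall=("rainfall", "sum"),
    temperature=("temperature", "mean"),
)
summary.index = ["summary"] * len(summary)

# Detail rows come first in the concat, so a stable sort on 'area'
# leaves each summary row directly after the rows of its group.
combined = pd.concat([df, summary]).sort_values("area", kind="mergesort")
print(combined)
```

Each aggregation is declared once per column, so adding another summarised column means adding one more keyword argument to `agg`.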
Edit
Following a follow-up request to replace the average temperature with a constant, here is the modified code:
import pandas as pd
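data = [
    # None marks a value missing from the post; 'area' for rows without one
    # is inferred from the three-rows-per-area layout.
    {'year': 2020, 'area': 'new-hills', 'rainfall': 100, 'temperature': 20},
    {'year': 2021, 'area': 'new-hills', 'rainfall': 110, 'temperature': None},
    {'year': 2019, 'area': 'new-hills', 'rainfall': 111, 'temperature': 19},
    {'year': 2020, 'area': 'cape-town', 'rainfall': 70, 'temperature': 25},
    {'year': None, 'area': 'cape-town', 'rainfall': 80, 'temperature': 23},
    {'year': None, 'area': 'cape-town', 'rainfall': 75, 'temperature': 24},
    {'year': None, 'area': 'mumbai', 'rainfall': 200, 'temperature': 37},
    {'year': None, 'area': 'mumbai', 'rainfall': 170, 'temperature': 39},
    {'year': None, 'area': 'mumbai', 'rainfall': 180, 'temperature': 38},
]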
my_constants = [10,20,30]
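def map_constant(x, v):
    # Calling a Series method here appears intended to make element-wise
    # application fail, so that pandas passes the whole column and a single
    # value is returned; the result of x.mean() itself is discarded.
    x.mean()
    return v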
df = pd.DataFrame.from_dict(data)
container = []
for i, group in enumerate(df.groupby(['area'])):
    label, _df = group
    # Total rainfall, plus the i-th constant in place of the mean temperature
    _df.loc['summary'] = _df.agg({'rainfall': 'sum', 'temperature': (lambda x: map_constant(x, my_constants[i]))})
    container.append(_df)
df_summary = pd.concat(container)
df = df_summary.fillna('')
df
Output:
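One thing to keep in mind with the positional `my_constants[i]` lookup: `groupby` sorts the group keys by default, so the constants are paired with the areas in sorted order rather than in the order they appear in the data. A possible alternative, sketched below under assumptions, keys the constants by area name and writes the two summary cells directly instead of going through `agg`. The `records` list and the `constants_by_area` dict are hypothetical and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample records, used only for illustration.
records = [
    {"year": 2019, "area": "a", "rainfall": 10, "temperature": 20},
    {"year": 2020, "area": "a", "rainfall": 30, "temperature": 22},
    {"year": 2019, "area": "b", "rainfall": 5, "temperature": 30},
    {"year": 2020, "area": "b", "rainfall": 7, "temperature": 34},
]
df = pd.DataFrame(records)

# Hypothetical constants, looked up by area name rather than by position.
constants_by_area = {"a": 10, "b": 20}

parts = []
for area, group in df.groupby("area"):
    group = group.copy()  # work on a copy, not on the groupby slice
    # Setting a cell under a new row label adds the 'summary' row.
    group.loc["summary", "rainfall"] = group["rainfall"].sum()
    group.loc["summary", "temperature"] = constants_by_area[area]
    parts.append(group)

result = pd.concat(parts)
print(result.fillna(""))
```

Looking the constant up by name means the pairing no longer depends on how `groupby` orders the groups.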