Problem description
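I have a DataFrame representing customer check-ins (visits) at restaurants. `year`
is simply the year in which the check-in at the restaurant took place.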
import pandas as pd

# one row per check-in (the two lists must have the same length)
data = {
    'restaurant_id': ['--1UhMGODdWsrMastO9DZw'] * 6 + ['--6MefnULPED_I942VcFNA'] * 4,
    'year': ['2016'] * 4 + ['2017'] * 2 + ['2011'] * 2 + ['2012'] * 2,
}
df = pd.DataFrame(data, columns=['restaurant_id', 'year'])
# total number of checkins per restaurant
d = df.groupby('restaurant_id')['year'].count().to_dict()
df['nb_checkin'] = df['restaurant_id'].map(d)
grouped = df.groupby(["restaurant_id"])
avg_annual_visits = grouped["year"].count() / grouped["year"].nunique()
avg_annual_visits = avg_annual_visits.rename("avg_annual_visits")
df = df.merge(avg_annual_visits,left_on="restaurant_id",right_index=True)
df.head(10)
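As a side note, the `map` and `merge` steps above can be collapsed with `groupby(...).transform`, which broadcasts each group's result back onto every row of that group. This is a minimal sketch, assuming the same ten-row sample data; the numbers it produces are the same as those from the `map`/`merge` version.

```python
import pandas as pd

# Sample check-ins (assumed: same ten rows as in the question)
data = {
    "restaurant_id": ["--1UhMGODdWsrMastO9DZw"] * 6 + ["--6MefnULPED_I942VcFNA"] * 4,
    "year": ["2016"] * 4 + ["2017"] * 2 + ["2011"] * 2 + ["2012"] * 2,
}
df = pd.DataFrame(data)

g = df.groupby("restaurant_id")["year"]
# total check-ins per restaurant, repeated on every row of that restaurant
df["nb_checkin"] = g.transform("count")
# total check-ins divided by the number of distinct years with at least one visit
df["avg_annual_visits"] = df["nb_checkin"] / g.transform("nunique")
print(df)
```

Using `transform` avoids building the intermediate dict `d` and the separate `avg_annual_visits` Series, and it keeps the original row order of `df`.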
From here, I'm not sure how to write what I want in pandas. Please ask if anything needs clarification.
Thanks!
Solution
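I think you want to do:
counts = df.groupby('restaurant_id')['year'].value_counts()
# Series.std(level=...) was removed in pandas 2.0; group on the index level instead
counts.groupby(level='restaurant_id').std()
The output of counts, i.e. the total number of visits per restaurant per year: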
restaurant_id           year
--1UhMGODdWsrMastO9DZw  2016    4
                        2017    2
--6MefnULPED_I942VcFNA  2011    2
                        2012    2
Name: year, dtype: int64
and the output of std, i.e. the standard deviation of the yearly visit counts per restaurant:
restaurant_id
--1UhMGODdWsrMastO9DZw    1.414214
--6MefnULPED_I942VcFNA    0.000000
Name: year, dtype: float64
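Putting it together: a sketch, assuming the same ten-row sample data, that computes both the mean and the standard deviation of the per-year counts and attaches them back to each check-in row. The mean of the per-year counts equals the question's `count() / nunique()`, because the counts sum to the total number of check-ins and there is one count per distinct year. The column names `avg_annual_visits` and `std_annual_visits` are illustrative choices.

```python
import pandas as pd

# Sample check-ins (assumed: same ten rows as in the question)
data = {
    "restaurant_id": ["--1UhMGODdWsrMastO9DZw"] * 6 + ["--6MefnULPED_I942VcFNA"] * 4,
    "year": ["2016"] * 4 + ["2017"] * 2 + ["2011"] * 2 + ["2012"] * 2,
}
df = pd.DataFrame(data)

# Check-ins per (restaurant, year); years with zero visits do not appear at all
counts = df.groupby("restaurant_id")["year"].value_counts()

# Aggregate over the restaurant_id index level (sample std, ddof=1)
per_restaurant = counts.groupby(level="restaurant_id").agg(["mean", "std"])
per_restaurant.columns = ["avg_annual_visits", "std_annual_visits"]

# Attach the per-restaurant statistics to every check-in row
df = df.merge(per_restaurant, left_on="restaurant_id", right_index=True)
print(per_restaurant)
```

Note that `std` uses the sample standard deviation (`ddof=1`), which is why the first restaurant's counts of 4 and 2 give 1.414214; pass `ddof=0` via a lambda if the population standard deviation is wanted instead.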