Pandas groupby across multiple DataFrames

Problem

I have two datasets to work with:

ID Date       Amount   
1  2020-01-02 1000
1  2020-01-09 200
1  2020-01-08 400

The second dataset gives, for each ID, its most frequent day of the week and its most frequent week of the month (there are many such IDs):

ID Pref_Day_Of_Week_A Pref_Week_Of_Month_A 
1  3                 2

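For ID 1, Thursday is the most frequent day of the week (3, counting Monday as 0), and the 2nd week of the month is the most frequent week of the month.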
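
Reading 3 as Thursday assumes the numbering that pandas' `dt.dayofweek` uses, where Monday is 0 and Sunday is 6. A quick check on the three sample dates, as a minimal sketch:

```python
import pandas as pd

# Sample dates copied from the first dataset.
dates = pd.Series(pd.to_datetime(["2020-01-02", "2020-01-09", "2020-01-08"]))

# dt.dayofweek numbers the days Monday=0 ... Sunday=6.
day_numbers = dates.dt.dayofweek.tolist()
day_names = dates.dt.day_name().tolist()
print(list(zip(day_numbers, day_names)))
```

Two of the three dates fall on day 3, which is why 3 is the preferred day for ID 1.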

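For every ID, I want the sum of all amounts that fall on its most frequent day of the week, and the sum of all amounts that fall in its most frequent week of the month (hence the need for groupby):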

ID Amount_On_Pref_Day Amount_Pref_Week
1  1200               600

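I would appreciate it if someone could help me compute this DataFrame with Pandas. For reference, I use this function to find the week of the month for a given date: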

#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    
    Parameters: 
        dates (pd.Series): Series of dates.
    Returns: 
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1,unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1
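
To make the week numbering concrete, here is a small check of the function above on the three sample dates. It relies on `dt.weekday` starting weeks on Monday, so 1 January 2020 (a Wednesday) falls in week 1 and the week beginning Monday 6 January is week 2.

```python
import pandas as pd

def weekinmonth(dates):
    """Week number within the month, with weeks starting on Monday."""
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit="d")
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1

# Sample dates copied from the first dataset.
dates = pd.Series(pd.to_datetime(["2020-01-02", "2020-01-09", "2020-01-08"]))
weeks = weekinmonth(dates).tolist()
print(weeks)  # [1, 2, 2]
```

Week 2 occurs twice, which matches the preferred week of 2 given for ID 1.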

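Solution

The idea is to keep only the rows that match the preferred dayofweek or week, aggregate each with sum, and finally join the two results with concat: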

import pandas as pd

#https://stackoverflow.com/a/64192858/2901002
def weekinmonth(dates):
    """Get week number in a month.
    
    Parameters: 
        dates (pd.Series): Series of dates.
    Returns: 
        pd.Series: Week number in a month.
    """
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1,unit='d')
    return (dates.dt.day-1 + firstday_in_month.dt.weekday) // 7 + 1



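# df is the first dataset (ID, Date, Amount)
df['Date'] = pd.to_datetime(df['Date'])
df['dayofweek'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6
df['week'] = weekinmonth(df['Date'])

# Most frequent value per ID (the smallest one if several are tied)
f = lambda x: x.mode().iat[0]
df1 = df.groupby('ID', as_index=False).agg(
    Pref_Day_Of_Week_A=('dayofweek', f),
    Pref_Week_Of_Month_A=('week', f),
)

# After the rename, merge joins on ID plus dayofweek (or week),
# so only rows on the preferred day (or in the preferred week) remain
s1 = df1.rename(columns={'Pref_Day_Of_Week_A': 'dayofweek'}).merge(df).groupby('ID')['Amount'].sum()
s2 = df1.rename(columns={'Pref_Week_Of_Month_A': 'week'}).merge(df).groupby('ID')['Amount'].sum()

# Put both sums side by side in one DataFrame
df2 = pd.concat([s1, s2], axis=1, keys=('Amount_On_Pref_Day', 'Amount_Pref_Week'))
print(df2)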
    Amount_On_Pref_Day  Amount_Pref_Week
ID                                      
1                 1200               600
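
The code above derives the preferences from `df` itself. If the preference table from the question is already available as a separate frame, one possible variant (a sketch, not the method used above) merges it onto the transactions and sums the amounts under boolean masks. The names `prefs`, `merged`, `on_day`, `in_week`, and `result` are illustrative, and the sample values are copied from the question.

```python
import pandas as pd

def weekinmonth(dates):
    """Week number within the month, with weeks starting on Monday."""
    firstday_in_month = dates - pd.to_timedelta(dates.dt.day - 1, unit="d")
    return (dates.dt.day - 1 + firstday_in_month.dt.weekday) // 7 + 1

# First dataset: transactions.
df = pd.DataFrame({
    "ID": [1, 1, 1],
    "Date": pd.to_datetime(["2020-01-02", "2020-01-09", "2020-01-08"]),
    "Amount": [1000, 200, 400],
})
# Second dataset: preferred day of week and week of month per ID.
prefs = pd.DataFrame({
    "ID": [1],
    "Pref_Day_Of_Week_A": [3],
    "Pref_Week_Of_Month_A": [2],
})

# Attach each ID's preferences to its transactions.
merged = df.merge(prefs, on="ID", how="left")

# True where the transaction falls on the preferred day / in the preferred week.
on_day = merged["Date"].dt.dayofweek.eq(merged["Pref_Day_Of_Week_A"])
in_week = weekinmonth(merged["Date"]).eq(merged["Pref_Week_Of_Month_A"])

# Zero out non-matching amounts, then sum per ID.
result = (
    merged.assign(
        Amount_On_Pref_Day=merged["Amount"].where(on_day, 0),
        Amount_Pref_Week=merged["Amount"].where(in_week, 0),
    )
    .groupby("ID")[["Amount_On_Pref_Day", "Amount_Pref_Week"]]
    .sum()
)
print(result)
```

Because non-matching rows contribute 0 instead of being filtered out, an ID whose preferred day or week never occurs in its transactions still appears in the result.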