I have a dataframe with the following structure:
    Debtor ID    | Accountrating | AccountratingDate | AmountOutstanding | AmountPastDue
    John SNow    | Closed        | 2017-03-01        | 0                 | 0
    John SNow    | Delayed       | 2017-04-22        | 2000              | 500
    John SNow    | Closed        | 2017-05-23        | 0                 | 0
    John SNow    | Delayed       | 2017-07-15        | 6000              | 300
    Sarah Parker | Closed        | 2017-02-01        | 0                 | 0
    Edward Hall  | Closed        | 2017-05-01        | 0                 | 0
    Douglas Core | Delayed       | 2017-01-01        | 1000              | 200
    Douglas Core | Delayed       | 2017-06-01        | 1000              | 400
What I want to achieve is:
    Debtor ID    | Incidents of delay | TheMostRecentOutstanding | TheMostRecentPastDue
    John SNow    | 2                  | 6000                     | 300
    Sarah Parker | 0                  | 0                        | 0
    Edward Hall  | 0                  | 0                        | 0
    Douglas Core | 2                  | 1000                     | 400
Counting the incidents of delay is straightforward:
    import pandas as pd

    # Flag rows with a positive past-due amount, then count the flags per debtor
    df_account["pastDuebool"] = df_account["AmountPastDue"] > 0
    new_df = pd.DataFrame(index=df_account.groupby("Debtor ID").groups.keys())
    new_df["Incidents of delay"] = df_account.groupby("Debtor ID")["pastDuebool"].sum()
I'm struggling to extract the most recent outstanding and past-due amounts. My code looks like this:
    new_df["TheMostRecentOutstanding"] = df_account.loc[
        df_account[df_account["Accountrating"] == 'Delayed']
        .groupby('Debtor ID')["AccountratingDate"].idxmax(),
        "AmountOutstanding"]
    new_df["TheMostRecentPastDue"] = df_account.loc[
        df_account[df_account["Accountrating"] == 'Delayed']
        .groupby('Debtor ID')["AccountratingDate"].idxmax(),
        "AmountPastDue"]
But both assignments produce a Series of all NaN values. Can someone help me see what I'm doing wrong?
Solution
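The all-NaN result looks like an index-alignment problem: df_account.loc[...] returns rows labelled with df_account's original row numbers (the values idxmax produced), while new_df is indexed by Debtor ID, so pandas finds no matching labels when assigning the column. Below is a minimal sketch that keeps the idxmax approach, assuming the sample data above with AccountratingDate converted via pd.to_datetime; the names delayed, idx, raw, latest and broken are illustrative only, and relabelling with set_index is just one possible fix:

```python
import pandas as pd

# Sample data from the question; dates parsed so idxmax compares real dates
df_account = pd.DataFrame({
    "Debtor ID": ["John SNow"] * 4 + ["Sarah Parker", "Edward Hall"] + ["Douglas Core"] * 2,
    "Accountrating": ["Closed", "Delayed", "Closed", "Delayed",
                      "Closed", "Closed", "Delayed", "Delayed"],
    "AccountratingDate": pd.to_datetime([
        "2017-03-01", "2017-04-22", "2017-05-23", "2017-07-15",
        "2017-02-01", "2017-05-01", "2017-01-01", "2017-06-01"]),
    "AmountOutstanding": [0, 2000, 0, 6000, 0, 0, 1000, 1000],
    "AmountPastDue": [0, 500, 0, 300, 0, 0, 200, 400],
})

# idx is indexed by Debtor ID; its values are row labels of df_account
delayed = df_account[df_account["Accountrating"] == "Delayed"]
idx = delayed.groupby("Debtor ID")["AccountratingDate"].idxmax()

# .loc returns rows labelled by the original row numbers, not by Debtor ID
raw = df_account.loc[idx, ["AmountOutstanding", "AmountPastDue"]]

# Assigning raw directly: no label matches the Debtor ID index, so all NaN
broken = pd.DataFrame(index=df_account["Debtor ID"].unique())
broken["TheMostRecentOutstanding"] = raw["AmountOutstanding"]

# Fix: relabel the rows with the Debtor IDs before assigning
latest = raw.set_index(idx.index)
new_df = pd.DataFrame(index=df_account["Debtor ID"].unique())
new_df["TheMostRecentOutstanding"] = latest["AmountOutstanding"]
new_df["TheMostRecentPastDue"] = latest["AmountPastDue"]

# Debtors without any delay have no row in latest; fill them with 0
new_df = new_df.fillna(0).astype(int)
print(new_df)
```

Debtors with no "Delayed" rows are simply missing from latest, which is why the fillna(0) step is needed afterwards.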
You can try this:
    df.sort_values('AccountratingDate')\
      .query('Accountrating == "Delayed"')\
      .groupby('Debtor ID')[['Accountrating','AmountOutstanding','AmountPastDue']]\
      .agg({'Accountrating':'count','AmountOutstanding':'last','AmountPastDue':'last'})\
      .reindex(df['Debtor ID'].unique(),fill_value=0)\
      .reset_index()
Output:
          Debtor ID  Accountrating  AmountOutstanding  AmountPastDue
    0     John SNow              2               6000            300
    1  Sarah Parker              0                  0              0
    2   Edward Hall              0                  0              0
    3  Douglas Core              2               1000            400
Details:
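1. Sort the dataframe by AccountratingDate so that, within each debtor, the last record is the most recent one. (The dates here are ISO-formatted strings, which sort chronologically; in other formats, convert them with pd.to_datetime first.)
2. Filter the dataframe down to the rows where Accountrating equals 'Delayed'.
3. Group by Debtor ID, select the columns to aggregate, and call agg with a dictionary that says how to aggregate each column: count the delayed rows, and take the last (most recent) AmountOutstanding and AmountPastDue.
4. Reindex with the unique values of Debtor ID, filling in 0 for debtors without any delay.
5. Finally, reset the index so that Debtor ID becomes a column again.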
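To get the column names asked for in the question, add a rename step:

    df.sort_values('AccountratingDate')\
      .query('Accountrating == "Delayed"')\
      .groupby('Debtor ID')[['Accountrating','AmountOutstanding','AmountPastDue']]\
      .agg({'Accountrating':'count','AmountOutstanding':'last','AmountPastDue':'last'})\
      .reindex(df['Debtor ID'].unique(),fill_value=0)\
      .rename(columns={'Accountrating':'Incidents of delay',
                       'AmountOutstanding':'TheMostRecentOutstanding',
                       'AmountPastDue':'TheMostRecentPastDue'})\
      .reset_index()

Output:

          Debtor ID  Incidents of delay  TheMostRecentOutstanding  TheMostRecentPastDue
    0     John SNow                   2                      6000                   300
    1  Sarah Parker                   0                         0                     0
    2   Edward Hall                   0                         0                     0
    3  Douglas Core                   2                      1000                   400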
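On pandas 0.25 or later, named aggregation can produce the final column names directly, which makes the separate rename step unnecessary. This is a sketch of that variant (a swapped-in technique, not the answer's original code), assuming the question's sample data and converting AccountratingDate with pd.to_datetime so the sort does not depend on the string format; the variable name result is illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Debtor ID": ["John SNow"] * 4 + ["Sarah Parker", "Edward Hall"] + ["Douglas Core"] * 2,
    "Accountrating": ["Closed", "Delayed", "Closed", "Delayed",
                      "Closed", "Closed", "Delayed", "Delayed"],
    "AccountratingDate": ["2017-03-01", "2017-04-22", "2017-05-23", "2017-07-15",
                          "2017-02-01", "2017-05-01", "2017-01-01", "2017-06-01"],
    "AmountOutstanding": [0, 2000, 0, 6000, 0, 0, 1000, 1000],
    "AmountPastDue": [0, 500, 0, 300, 0, 0, 200, 400],
})
# Parse the dates so sorting is chronological regardless of string format
df["AccountratingDate"] = pd.to_datetime(df["AccountratingDate"])

result = (
    df.sort_values("AccountratingDate")          # most recent row last per debtor
      .query('Accountrating == "Delayed"')       # keep only delayed rows
      .groupby("Debtor ID")
      .agg(**{
          # named aggregation: output column = (input column, function)
          "Incidents of delay": ("Accountrating", "count"),
          "TheMostRecentOutstanding": ("AmountOutstanding", "last"),
          "TheMostRecentPastDue": ("AmountPastDue", "last"),
      })
      .reindex(df["Debtor ID"].unique(), fill_value=0)  # debtors with no delay -> 0
      .reset_index()
)
print(result)
```

The ** unpacking is only needed because the desired names contain spaces; with identifier-style names the keyword arguments could be written directly.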