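I have a pandas DataFrame that I want to group by email, taking the maximum date for each group while keeping the status column. However, status is not one of the groupby keys.
Example: given the following DataFrame df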
+-------+------------+----------+
| email | status     | date     |
+-------+------------+----------+
| test1 | viewed     | 01/07/18 |
| test1 | not viewed | 03/07/18 |
| test2 | not viewed | 02/07/18 |
| test2 | viewed     | 01/07/18 |
| test3 | not viewed | 03/07/18 |
| test3 | viewed     | 04/07/18 |
+-------+------------+----------+
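I am using the following code, which returns the maximum date per email but drops the status column. I want to keep status and don't know how.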
df.groupby(['email']).aggregate({'date': 'max'})
Desired output:
+-------+------------+----------+
| email | status     | date     |
+-------+------------+----------+
| test1 | not viewed | 03/07/18 |
| test2 | not viewed | 02/07/18 |
| test3 | viewed     | 04/07/18 |
+-------+------------+----------+
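For reference, here is a minimal sketch of one possible way to get that output. It is not necessarily the only or the best approach. It assumes the date column holds strings in day/month/year format (the example data does not make this certain), so the dates are parsed before comparison. The idea is to find the row label of the latest date within each email group with `idxmax`, then select those whole rows so that status comes along. The names `parsed`, `latest_idx`, and `result` are only illustrative.

```python
import pandas as pd

# Rebuild the example DataFrame from the question.
df = pd.DataFrame({
    "email": ["test1", "test1", "test2", "test2", "test3", "test3"],
    "status": ["viewed", "not viewed", "not viewed", "viewed", "not viewed", "viewed"],
    "date": ["01/07/18", "03/07/18", "02/07/18", "01/07/18", "03/07/18", "04/07/18"],
})

# Assumption: the strings are day/month/year. Parsing avoids comparing
# dates lexicographically as plain strings.
parsed = pd.to_datetime(df["date"], format="%d/%m/%y")

# For each email, the index label of the row with the latest date.
latest_idx = parsed.groupby(df["email"]).idxmax()

# Select the full rows, so status is kept alongside email and date.
result = df.loc[latest_idx].reset_index(drop=True)
print(result)
```

If two rows for the same email share the same latest date, `idxmax` picks the first one, so this sketch returns exactly one row per email.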