Find the difference and percent difference with consecutive but odd number of dates

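Question

I have a dataset, df, in which I want to find the difference (diff) and the percent difference. I want to start at the earliest date and compare its value with the value at the next date: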

 id    date         value

 1     11/01/2020   10
 2     11/01/2020   5
 1     10/01/2020   20
 2     10/01/2020   30
 1     09/01/2020   15
 2     09/01/2020   10
 3     11/01/2020   5

Desired output

  id    date          diff   percent


  1     10/01/2020    5       33                 
  1     11/01/2020   -10     -50
  2     10/01/2020    20      200               
  2     11/01/2020   -25   -83.33
  3     11/01/2020     0       0

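I want to look at one group at a time and compare each value with the next one to find the difference and the percent change.

For example,

ID 1, from 09/01/2020 to 10/01/2020: the value goes from 15 to 20, a difference of 5 and a change of 33%.

From 10/01/2020 to 11/01/2020: the value goes from 20 to 10, a difference of -10 and a change of -50%.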
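The arithmetic in this example is the usual percent-change formula, (new - old) / old * 100. A minimal sketch for ID 1 follows; `percent_change` is a hypothetical helper name used only for illustration and is not part of pandas.

```python
def percent_change(old, new):
    """Return (difference, percent change) when going from old to new."""
    diff = new - old
    return diff, round(diff / old * 100, 2)

# ID 1 in chronological order: 15 -> 20 -> 10
values = [15, 20, 10]
changes = [percent_change(a, b) for a, b in zip(values, values[1:])]
print(changes)  # [(5, 33.33), (-10, -50.0)]
```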

This is what I am doing:

a['date'] = pd.to_datetime(a['date'])
grouped = a.sort_values('date').groupby(['id'])

output = pd.DataFrame({
    'date': grouped['date'].agg(lambda x: x.iloc[-1]).values,
    'diff': grouped['value'].agg(lambda x: x.diff().fillna(0).iloc[-1]).values,
    'percentdiff': grouped['value'].agg(lambda x: x.pct_change().fillna(0).iloc[-1] * 100).values,
    'type': grouped['id'].agg(lambda x: x.iloc[0]).values
})

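However, I noticed that some values are missing from my output.

Is it possible to achieve the desired output? Perhaps I need a loop that goes back to the previous date row and compares it with the next date row?

Any suggestion is appreciated.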

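Answer

Here is one way to solve the problem, assuming I understand your logic correctly.

The idea is to use shift within each group to get the previous value, then compute the difference and the percentage from it: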

# dates are strings here; sorting them as text is chronological only because
# they share one zero-padded format and year (otherwise use pd.to_datetime)
result = (
    df.sort_values(["id", "date", "value"])
    .assign(
        # group size, used later to decide whether to drop the first row
        counter=lambda x: x.groupby("id").date.transform("size"),
        date_shift=lambda x: x.groupby("id").date.shift(1),
        value_shift=lambda x: x.groupby("id").value.shift(1),
        diff=lambda x: x.value - x.value_shift,
        percent=lambda x: x["diff"].div(x.value_shift).mul(100).round(2),
    )
    # drop each group's first row (it holds nulls from `shift`),
    # unless the group has only one row, in which case that row is kept
    .query("not (date_shift.isna() and counter > 1)")
    # keep "date" so the result matches the table below
    .loc[:, ["id", "date", "diff", "percent"]]
    .fillna(0)
)

result



   id   date        diff    percent
2   1   10/01/2020   5.0     33.33
0   1   11/01/2020  -10.0   -50.00
3   2   10/01/2020   20.0    200.00
1   2   11/01/2020  -25.0   -83.33
6   3   11/01/2020   0.0     0.00
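As an alternative sketch (not part of the answer above), the same rows can probably be produced with `groupby(...).diff()` and `groupby(...).shift()` directly, without the helper columns. It assumes the dates are in month/day/year format and parses them with `pd.to_datetime`, so the sort stays chronological across years; the names `out`, `size` and `first_of_many` are illustrative only. For comparison, the attempt in the question aggregates each group with `iloc[-1]`, so it can only ever return the last row per id.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 1, 2, 1, 2, 3],
    "date": ["11/01/2020", "11/01/2020", "10/01/2020", "10/01/2020",
             "09/01/2020", "09/01/2020", "11/01/2020"],
    "value": [10, 5, 20, 30, 15, 10, 5],
})

# Assumption: the date strings are month/day/year.
out = (
    df.assign(date=pd.to_datetime(df["date"], format="%m/%d/%Y"))
      .sort_values(["id", "date"])
)

grouped = out.groupby("id")["value"]
out["diff"] = grouped.diff()                       # value minus previous value
out["percent"] = out["diff"].div(grouped.shift()).mul(100).round(2)

# The first row of each group has no previous value (NaN).
# Drop it, unless it is the only row of its group.
size = out.groupby("id")["id"].transform("size")
first_of_many = out["diff"].isna() & (size > 1)
out = out.loc[~first_of_many, ["id", "date", "diff", "percent"]]

# Single-row groups get 0 for both measures, as in the desired output.
out[["diff", "percent"]] = out[["diff", "percent"]].fillna(0)
print(out)
```

With real datetimes in the `date` column, the printed dates appear in ISO form rather than as the original strings.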
