How do I calculate, for each unique column value, the difference from that value's previous date in a DataFrame?

Problem

I have a df like this:

date       | prod_number | prod_count | prod_factor
2018-01-01 | 1           | 5          | 3
2018-02-01 | 1           | 20         | 3
2018-04-01 | 1           | 10         | 3
2019-09-01 | 2           | 8          | 5
2018-09-02 | 2           | 7          | 5
2018-10-03 | 2           | 10         | 5

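For each prod_number, I want the change in prod_count since the previous date, and then that change multiplied by prod_factor.

The first entry for each prod_number has no earlier value to subtract, so it can be NONE or 0, whichever is easier.

Like this: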

date       | prod_number | prod_count | prod_factor | change      | prod_factor*change
2018-01-01 | 1           | 5          | 3           | NONE/0      | NONE/0
2018-02-01 | 1           | 20         | 3           | 15 # 20-5   | 45  # 3*15
2018-04-01 | 1           | 10         | 3           | -10 # 10-20 | -30 # 3*-10

2019-09-01 | 2           | 8          | 5           | NONE/0      | NONE/0
2018-09-02 | 2           | 7          | 5           | -1 # 7-8    | -5  # 5*-1
2018-10-03 | 2           | 10         | 5           | 3 # 10-7    | 15  # 5*3

How can I do this with pandas?

Solution

Use groupby.diff, then multiply the two columns:

df['change'] = df.groupby('prod_number')['prod_count'].diff()
df['prod_factor*change'] = df['change'] * df['prod_factor']

         date  prod_number  prod_count  prod_factor  change  prod_factor*change
0  2018-01-01            1           5            3     NaN                 NaN
1  2018-02-01            1          20            3    15.0                45.0
2  2018-04-01            1          10            3   -10.0               -30.0
3  2019-09-01            2           8            5     NaN                 NaN
4  2018-09-02            2           7            5    -1.0                -5.0
5  2018-10-03            2          10            5     3.0                15.0
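
Since the question says 0 is acceptable for the first row of each product, here is a minimal self-contained sketch of the same approach that replaces the NaN values with 0. The DataFrame is rebuilt from the question's sample; the `fillna(0).astype(int)` step is an addition for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [
        ["2018-01-01", 1, 5, 3],
        ["2018-02-01", 1, 20, 3],
        ["2018-04-01", 1, 10, 3],
        ["2019-09-01", 2, 8, 5],
        ["2018-09-02", 2, 7, 5],
        ["2018-10-03", 2, 10, 5],
    ],
    columns=["date", "prod_number", "prod_count", "prod_factor"],
)

# diff() within each group leaves NaN on the group's first row;
# fillna(0) turns that into 0, and astype(int) restores integer dtype.
df["change"] = df.groupby("prod_number")["prod_count"].diff().fillna(0).astype(int)
df["prod_factor*change"] = df["change"] * df["prod_factor"]
print(df)
```

Note that diff() follows the existing row order within each group. If "previous" should strictly mean the previous date, the rows would need to be sorted by prod_number and date first, which would change the result for any group whose rows are not already in date order.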

You can also use np.where together with diff():

import pandas as pd
import numpy as np

# Sample data from the question
df = pd.DataFrame(
    [['2018-01-01', 1, 5, 3],
     ['2018-02-01', 1, 20, 3],
     ['2018-04-01', 1, 10, 3],
     ['2019-09-01', 2, 8, 5],
     ['2018-09-02', 2, 7, 5],
     ['2018-10-03', 2, 10, 5]],
    columns=['date', 'prod_number', 'prod_count', 'prod_factor'])

df['change'] = np.where(
    df['prod_number'].diff() == 0,  # condition: same prod_number as the row above
    df['prod_count'].diff(),        # value if true: difference from the row above
    0                               # otherwise (first row of a product): 0
)

         date  prod_number  prod_count  prod_factor  change
0  2018-01-01            1           5            3     0.0
1  2018-02-01            1          20            3    15.0
2  2018-04-01            1          10            3   -10.0
3  2019-09-01            2           8            5     0.0
4  2018-09-02            2           7            5    -1.0
5  2018-10-03            2          10            5     3.0
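
The np.where version compares each row with the one directly above it, so it relies on rows with the same prod_number being adjacent, and it stops short of the multiplication the question asks for. The following is a minimal sketch under those observations: the interleaved rows are hypothetical (not the question's data), and the stable sort is an added step to make same-product rows adjacent before applying the same idea.

```python
import numpy as np
import pandas as pd

# Hypothetical rows where the two products are interleaved (not the question's data).
df = pd.DataFrame(
    {
        "prod_number": [1, 2, 1, 2],
        "prod_count": [5, 8, 20, 7],
        "prod_factor": [3, 5, 3, 5],
    }
)

# A stable sort makes rows of the same product adjacent
# while keeping their original relative order.
df = df.sort_values("prod_number", kind="stable").reset_index(drop=True)

# True where the row above belongs to the same product.
same_product = df["prod_number"].diff() == 0
df["change"] = np.where(same_product, df["prod_count"].diff(), 0)
df["prod_factor*change"] = df["change"] * df["prod_factor"]
print(df)
```

The groupby.diff approach does not need this sort, because it computes the difference within each group regardless of how the rows are interleaved.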