How to check for n positive values within a pandas group

Problem description

I have a dataframe that looks like this:

import pandas as pd
df = pd.DataFrame({'a': ['cust1']*4 + ['cust2']*4 + ['cust3']*4, 'year': [2017,2018,2019,2020]*3, 'amt': [2,0,4,'NaN',2,2,3,3,3,2,'NaN',5]})

        a  year  amt
0   cust1  2017    2
1   cust1  2018    0
2   cust1  2019    4
3   cust1  2020  NaN
4   cust2  2017    2
5   cust2  2018    2
6   cust2  2019    3
7   cust2  2020    3
8   cust3  2017    3
9   cust3  2018    2
10  cust3  2019  NaN
11  cust3  2020    5

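For each group in column "a", I need to check whether column "amt" contains at least 3 positive values. The resulting dataframe should look like this: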

        a  year  amt   cond
0   cust1  2017    2  False
1   cust1  2018    0  False
2   cust1  2019    4  False
3   cust1  2020  NaN  False
4   cust2  2017    2   True
5   cust2  2018    2   True
6   cust2  2019    3   True
7   cust2  2020    3   True
8   cust3  2017    3   True
9   cust3  2018    2   True
10  cust3  2019  NaN   True
11  cust3  2020    5   True

The following logic applies:

cust1 = False (only 2 positive values: 2017 and 2019)

cust2 = True (4 positive values)

cust3 = True (3 positive values)
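
As a quick sanity check on the counts listed above, the positive values per customer can be tallied directly. This is only a minimal sketch: it rebuilds the example frame from the table with NaN as a real missing value, and the name `positive_counts` is illustrative, not part of the original post.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the table (NaN as a real missing value).
df = pd.DataFrame({
    'a': ['cust1'] * 4 + ['cust2'] * 4 + ['cust3'] * 4,
    'year': [2017, 2018, 2019, 2020] * 3,
    'amt': [2, 0, 4, np.nan, 2, 2, 3, 3, 3, 2, np.nan, 5],
})

# gt(0) is False for both 0 and NaN, so only strictly positive amounts count.
positive_counts = df['amt'].gt(0).groupby(df['a']).sum()
print(positive_counts)
```

For this data the tally is 2 for cust1, 4 for cust2 and 3 for cust3, which matches the expected True/False values.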

Solution

Let's try transform with 'sum':

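import numpy as np
# Turn the 'NaN' strings into real missing values
df = df.replace('NaN', np.nan)
# Count positive values per group, broadcast the count to every row, and require at least 3
df['cond'] = df.amt.gt(0).groupby(df['a']).transform('sum') > 2
df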
Out[62]: 
        a  year  amt   cond
0   cust1  2017  2.0  False
1   cust1  2018  0.0  False
2   cust1  2019  4.0  False
3   cust1  2020  NaN  False
4   cust2  2017  2.0   True
5   cust2  2018  2.0   True
6   cust2  2019  3.0   True
7   cust2  2020  3.0   True
8   cust3  2017  3.0   True
9   cust3  2018  2.0   True
10  cust3  2019  NaN   True
11  cust3  2020  5.0   True
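
To make the individual steps of the transform approach explicit, here is a sketch that wraps them in a small helper with the threshold as a parameter. The function name `has_n_positive` and its arguments are hypothetical, and it swaps `replace` for `pd.to_numeric(..., errors='coerce')` so that the column is numeric before the comparison; that substitution is my assumption, not something the answer above prescribes.

```python
import pandas as pd

def has_n_positive(df, group_col, value_col, n):
    """Return a boolean Series: True where the row's group has at least n positive values."""
    # Coerce strings such as 'NaN' to real missing values so the comparison is numeric.
    values = pd.to_numeric(df[value_col], errors='coerce')
    # Boolean mask of strictly positive values; NaN compares as False.
    is_positive = values.gt(0)
    # transform('sum') broadcasts each group's count back to every row of that group.
    counts = is_positive.groupby(df[group_col]).transform('sum')
    return counts.ge(n)

# Example frame from the question, with 'NaN' stored as a string.
df = pd.DataFrame({
    'a': ['cust1'] * 4 + ['cust2'] * 4 + ['cust3'] * 4,
    'year': [2017, 2018, 2019, 2020] * 3,
    'amt': [2, 0, 4, 'NaN', 2, 2, 3, 3, 3, 2, 'NaN', 5],
})
df['cond'] = has_n_positive(df, 'a', 'amt', n=3)
print(df)
```

Because `transform` returns a result aligned with the original index, the boolean column can be assigned straight back to the frame without a merge.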

Alternatively, I would suggest a for loop over the rows. You then either modify the dataset in place or build a new one.

for i in range(df.shape[0]):
    # Your algorithm goes here (you only need to select the row and the operation you want to perform)
    pass
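
One way the placeholder loop could be filled in for this question is sketched below. This is not the answerer's code: the two-pass structure and the names `counts` and `n_required` are assumptions made for illustration, and the frame is rebuilt from the question's table with NaN as a real missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': ['cust1'] * 4 + ['cust2'] * 4 + ['cust3'] * 4,
    'year': [2017, 2018, 2019, 2020] * 3,
    'amt': [2, 0, 4, np.nan, 2, 2, 3, 3, 3, 2, np.nan, 5],
})

n_required = 3  # hypothetical name for the threshold

# First pass: count strictly positive amounts per customer.
counts = {}
for i in range(df.shape[0]):
    customer = df['a'].iloc[i]
    amount = df['amt'].iloc[i]
    counts.setdefault(customer, 0)
    # NaN > 0 evaluates to False, so missing values are skipped automatically.
    if amount > 0:
        counts[customer] += 1

# Second pass: build the new column from the per-customer counts.
df['cond'] = [counts[df['a'].iloc[i]] >= n_required for i in range(df.shape[0])]
print(df)
```

A row-by-row loop like this is easy to follow, but on large frames it is generally much slower than a vectorized groupby.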