Analyzing two columns grouped by a different column

Problem description

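I'm trying to check two different "if" conditions on the data frame below, but both conditions have to be evaluated per group.

For each "Text_1": if more than one "Text_2" has a "Relation" of "COVER", and the "Type" values of those "Text_2" rows differ, the output should be "Y"; otherwise it should be "N".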

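import pandas as pd

df = pd.DataFrame({
    'ID': [1,1,1,1,1,1,2,2,2],
    'Text_1': ['ABCDEF','ABCDEF','ABCDEF','GHIJKL','GHIJKL','GHIJKL','MNOPQR','MNOPQR','MNOPQR'],
    'Text_2': ['ABC','BCD','GHI','JKL','XYZ','RST','MNO','PQR','XYZ'],
    'Relation': ['COVER','COVER','UNRELATED','COVER','UNRELATED','UNRELATED','COVER','COVER','UNRELATED'],
    'Type': ['NAME','PLACE','PLACE','NAME','PLACE','THING','PLACE','PLACE','PLACE']})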
   ID  Text_1 Text_2   Relation   Type
0   1  ABCDEF    ABC      COVER   NAME
1   1  ABCDEF    BCD      COVER  PLACE
2   1  ABCDEF    GHI  UNRELATED  PLACE
3   1  GHIJKL    JKL      COVER   NAME
4   1  GHIJKL    XYZ  UNRELATED  PLACE
5   1  GHIJKL    RST  UNRELATED  THING
6   2  MNOPQR    MNO      COVER  PLACE
7   2  MNOPQR    PQR      COVER  PLACE
8   2  MNOPQR    XYZ  UNRELATED  PLACE

This is what the output should look like:

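df_output = pd.DataFrame({
    'ID': [1,1,1,1,1,1,2,2,2],
    'Text_1': ['ABCDEF','ABCDEF','ABCDEF','GHIJKL','GHIJKL','GHIJKL','MNOPQR','MNOPQR','MNOPQR'],
    'Text_2': ['ABC','BCD','GHI','JKL','XYZ','RST','MNO','PQR','XYZ'],
    'Relation': ['COVER','COVER','UNRELATED','COVER','UNRELATED','UNRELATED','COVER','COVER','UNRELATED'],
    'Type': ['NAME','PLACE','PLACE','NAME','PLACE','THING','PLACE','PLACE','PLACE'],
    'Output': ['Y','Y','N','N','N','N','N','N','N']})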
   ID  Text_1 Text_2   Relation   Type Output
0   1  ABCDEF    ABC      COVER   NAME      Y
1   1  ABCDEF    BCD      COVER  PLACE      Y
2   1  ABCDEF    GHI  UNRELATED  PLACE      N
3   1  GHIJKL    JKL      COVER   NAME      N
4   1  GHIJKL    XYZ  UNRELATED  PLACE      N
5   1  GHIJKL    RST  UNRELATED  THING      N
6   2  MNOPQR    MNO      COVER  PLACE      N
7   2  MNOPQR    PQR      COVER  PLACE      N
8   2  MNOPQR    XYZ  UNRELATED  PLACE      N

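Solution

First mask the COVER rows, then use groupby with transform to count the unique Text_2 and Type values within each Text_1. Use np.where(condition, if true, else) to apply the condition. Finally, drop the temporary columns.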

   import numpy as np

   # Mask the rows whose Relation is 'COVER'
   m=df.Relation=='COVER'
   # Temporary columns: number of unique Text_2 and Type values per Text_1.
   # Only the COVER rows get a count; all other rows are NaN.
   df[['temp1','temp2']]=df[m].groupby('Text_1')[['Text_2','Type']].transform('nunique')
   # Mark 'Y' only where both counts are 2 (for this data, the same as "more than 1"),
   # then drop the temporary columns
   df=df.assign(output=np.where((df.temp1==2)&(df.temp2==2),'Y','N')).drop(columns=['temp1','temp2'])

   ID  Text_1 Text_2   Relation   Type output
0   1  ABCDEF    ABC      COVER   NAME      Y
1   1  ABCDEF    BCD      COVER  PLACE      Y
2   1  ABCDEF    GHI  UNRELATED  PLACE      N
3   1  GHIJKL    JKL      COVER   NAME      N
4   1  GHIJKL    XYZ  UNRELATED  PLACE      N
5   1  GHIJKL    RST  UNRELATED  THING      N
6   2  MNOPQR    MNO      COVER  PLACE      N
7   2  MNOPQR    PQR      COVER  PLACE      N
8   2  MNOPQR    XYZ  UNRELATED  PLACE      N
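
The answer tests for counts equal to 2, which matches this sample. Since the question asks for "more than one", a slightly more general variant is sketched below. It is only an illustration: the function name flag_cover_groups is hypothetical, the comparison is changed to > 1, and reindex is used instead of temporary columns.

```python
import numpy as np
import pandas as pd


def flag_cover_groups(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'Output' column of 'Y'/'N' (hypothetical helper)."""
    cover = df[df["Relation"] == "COVER"]
    # Per Text_1 group, count distinct Text_2 and Type values among COVER rows.
    counts = cover.groupby("Text_1")[["Text_2", "Type"]].transform("nunique")
    # Assumption: "multiple" means more than one, so compare with > 1 rather than == 2.
    is_y = (counts["Text_2"] > 1) & (counts["Type"] > 1)
    # Non-COVER rows are missing from is_y; treat them as False.
    is_y = is_y.reindex(df.index, fill_value=False)
    return df.assign(Output=np.where(is_y, "Y", "N"))


df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2, 2],
    "Text_1": ["ABCDEF"] * 3 + ["GHIJKL"] * 3 + ["MNOPQR"] * 3,
    "Text_2": ["ABC", "BCD", "GHI", "JKL", "XYZ", "RST", "MNO", "PQR", "XYZ"],
    "Relation": ["COVER", "COVER", "UNRELATED", "COVER", "UNRELATED",
                 "UNRELATED", "COVER", "COVER", "UNRELATED"],
    "Type": ["NAME", "PLACE", "PLACE", "NAME", "PLACE",
             "THING", "PLACE", "PLACE", "PLACE"],
})

result = flag_cover_groups(df)
print(result["Output"].tolist())
# ['Y', 'Y', 'N', 'N', 'N', 'N', 'N', 'N', 'N']
```

With > 1, a group that has three COVER rows spread over two or more types would also be marked "Y", which the == 2 test would miss.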

How it works

Step 1: Build the mask

m=df.Relation=='COVER'
print(m)

0     True
1     True
2    False
3     True
4    False
5    False
6     True
7     True
8    False

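Step 2: Create temporary columns holding the unique counts of Text_2 and Type per Text_1

df[['temp1','temp2']]=df[m].groupby('Text_1')[['Text_2','Type']].transform('nunique')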
print(df)

   ID  Text_1 Text_2   Relation   Type  temp1  temp2
0   1  ABCDEF    ABC      COVER   NAME    2.0    2.0
1   1  ABCDEF    BCD      COVER  PLACE    2.0    2.0
2   1  ABCDEF    GHI  UNRELATED  PLACE    NaN    NaN
3   1  GHIJKL    JKL      COVER   NAME    1.0    1.0
4   1  GHIJKL    XYZ  UNRELATED  PLACE    NaN    NaN
5   1  GHIJKL    RST  UNRELATED  THING    NaN    NaN
6   2  MNOPQR    MNO      COVER  PLACE    2.0    1.0
7   2  MNOPQR    PQR      COVER  PLACE    2.0    1.0
8   2  MNOPQR    XYZ  UNRELATED  PLACE    NaN    NaN
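
The NaN values above come from index alignment: the transform runs only on the masked rows, and assigning its result back to the full frame leaves the other rows empty. A small sketch on made-up data (the four-row frame below is an assumption for illustration, not the question's data):

```python
import pandas as pd

df = pd.DataFrame({
    "Text_1": ["A", "A", "A", "B"],
    "Text_2": ["x", "y", "z", "w"],
    "Relation": ["COVER", "COVER", "UNRELATED", "COVER"],
})

m = df["Relation"] == "COVER"
# transform returns one value per masked row, keeping the original index labels (0, 1, 3).
counts = df[m].groupby("Text_1")["Text_2"].transform("nunique")
# Assignment aligns on the index, so row 2 (not COVER) has no match and becomes NaN.
df["temp1"] = counts
print(df)
```

Because a NaN is introduced, the count column ends up as float, which is why the counts print as 2.0 and 1.0.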

Step 3: Apply the condition by combining the result above with np.where

df=df.assign(output=np.where((df.temp1==2)&(df.temp2==2),'Y','N')).drop(columns=['temp1','temp2'])
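
The non-COVER rows need no special handling here because a comparison against NaN evaluates to False, so np.where sends them to the "N" branch. A minimal sketch, using standalone Series as stand-ins for the temp1 and temp2 columns:

```python
import numpy as np
import pandas as pd

# Stand-ins for the temporary columns; NaN marks a row that was not COVER.
temp1 = pd.Series([2.0, np.nan, 1.0, 2.0])
temp2 = pd.Series([2.0, np.nan, 1.0, 1.0])

# NaN == 2 is False, so the second row can never satisfy the condition.
condition = (temp1 == 2) & (temp2 == 2)
output = np.where(condition, "Y", "N")
print(output.tolist())
# ['Y', 'N', 'N', 'N']
```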
