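Problem description
I am trying to check two different "if" conditions on the DataFrame below, and both conditions have to be evaluated per group.
For each "Text_1", if more than one "Text_2" has the "Relation" "COVER" and those "COVER" rows have different "Type" values, the output for those rows should be "Y"; otherwise it should be "N".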
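import pandas as pd
df = pd.DataFrame({
'ID': [1,1,1,1,1,1,2,2,2],'Text_1': ['ABCDEF','ABCDEF','ABCDEF','GHIJKL','GHIJKL','GHIJKL','MNOPQR','MNOPQR','MNOPQR'],'Text_2': ['ABC','BCD','GHI','JKL','XYZ','RST','MNO','PQR','XYZ'],'Relation': ['COVER','COVER','UNRELATED','COVER','UNRELATED','UNRELATED','COVER','COVER','UNRELATED'],'Type': ['NAME','PLACE','PLACE','NAME','PLACE','THING','PLACE','PLACE','PLACE']})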
ID Text_1 Text_2 Relation Type
0 1 ABCDEF ABC COVER NAME
1 1 ABCDEF BCD COVER PLACE
2 1 ABCDEF GHI UNRELATED PLACE
3 1 GHIJKL JKL COVER NAME
4 1 GHIJKL XYZ UNRELATED PLACE
5 1 GHIJKL RST UNRELATED THING
6 2 MNOPQR MNO COVER PLACE
7 2 MNOPQR PQR COVER PLACE
8 2 MNOPQR XYZ UNRELATED PLACE
This is what the output should look like:
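df_output = pd.DataFrame({
'ID': [1,1,1,1,1,1,2,2,2],'Text_1': ['ABCDEF','ABCDEF','ABCDEF','GHIJKL','GHIJKL','GHIJKL','MNOPQR','MNOPQR','MNOPQR'],'Text_2': ['ABC','BCD','GHI','JKL','XYZ','RST','MNO','PQR','XYZ'],'Relation': ['COVER','COVER','UNRELATED','COVER','UNRELATED','UNRELATED','COVER','COVER','UNRELATED'],'Type': ['NAME','PLACE','PLACE','NAME','PLACE','THING','PLACE','PLACE','PLACE'],'Output': ['Y','Y','N','N','N','N','N','N','N']})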
ID Text_1 Text_2 Relation Type Output
0 1 ABCDEF ABC COVER NAME Y
1 1 ABCDEF BCD COVER PLACE Y
2 1 ABCDEF GHI UNRELATED PLACE N
3 1 GHIJKL JKL COVER NAME N
4 1 GHIJKL XYZ UNRELATED PLACE N
5 1 GHIJKL RST UNRELATED THING N
6 2 MNOPQR MNO COVER PLACE N
7 2 MNOPQR PQR COVER PLACE N
8 2 MNOPQR XYZ UNRELATED PLACE N
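Solution
First, mask the rows where Relation is COVER. Then use groupby with transform to count, within each Text_1, the unique Text_2 and Type values among the masked rows. Use np.where(condition, value if true, value otherwise) to apply the condition. Finally, drop the temporary columns.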
import numpy as np
m = df.Relation == 'COVER'  # mask the 'COVER' rows
# Temporary columns: per Text_1, the number of unique Text_2 and Type values among the COVER rows. Rows outside the mask get NaN.
df[['temp1', 'temp2']] = df[m].groupby('Text_1')[['Text_2', 'Type']].transform('nunique')
# Mark 'Y' only where both counts exceed 1. Comparisons against NaN are False, so non-COVER rows get 'N'.
df = df.assign(output=np.where((df.temp1 > 1) & (df.temp2 > 1), 'Y', 'N')).drop(columns=['temp1', 'temp2'])
ID Text_1 Text_2 Relation Type output
0 1 ABCDEF ABC COVER NAME Y
1 1 ABCDEF BCD COVER PLACE Y
2 1 ABCDEF GHI UNRELATED PLACE N
3 1 GHIJKL JKL COVER NAME N
4 1 GHIJKL XYZ UNRELATED PLACE N
5 1 GHIJKL RST UNRELATED THING N
6 2 MNOPQR MNO COVER PLACE N
7 2 MNOPQR PQR COVER PLACE N
8 2 MNOPQR XYZ UNRELATED PLACE N
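The same logic can also be wrapped in a function that avoids temporary columns. This is only a sketch under the assumption that "multiple" means "more than one"; the helper name flag_cover is hypothetical and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 1, 1, 1, 1, 2, 2, 2],
    'Text_1': ['ABCDEF'] * 3 + ['GHIJKL'] * 3 + ['MNOPQR'] * 3,
    'Text_2': ['ABC', 'BCD', 'GHI', 'JKL', 'XYZ', 'RST', 'MNO', 'PQR', 'XYZ'],
    'Relation': ['COVER', 'COVER', 'UNRELATED', 'COVER', 'UNRELATED',
                 'UNRELATED', 'COVER', 'COVER', 'UNRELATED'],
    'Type': ['NAME', 'PLACE', 'PLACE', 'NAME', 'PLACE', 'THING',
             'PLACE', 'PLACE', 'PLACE'],
})


def flag_cover(frame):
    """Hypothetical helper: return 'Y'/'N' per row for the COVER rule."""
    cover = frame['Relation'].eq('COVER')
    # Unique Text_2 and Type counts per Text_1, computed on COVER rows only.
    counts = (frame[cover]
              .groupby('Text_1')[['Text_2', 'Type']]
              .transform('nunique'))
    # Non-COVER rows are absent from `counts`; give them a count of 0.
    counts = counts.reindex(frame.index, fill_value=0)
    passed = counts['Text_2'].gt(1) & counts['Type'].gt(1)
    return passed.map({True: 'Y', False: 'N'})


result = df.assign(output=flag_cover(df))
print(result['output'].tolist())
# ['Y', 'Y', 'N', 'N', 'N', 'N', 'N', 'N', 'N']
```

Filling the missing rows with 0 instead of NaN keeps the count columns as integers, which makes the final comparison a little easier to read.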
How it works
Step 1: Mask
m=df.Relation=='COVER'
print(m)
0 True
1 True
2 False
3 True
4 False
5 False
6 True
7 True
8 False
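To see what the mask does when used for selection, here is a small sketch on the Relation column alone (the variable names relation and kept are illustrative):

```python
import pandas as pd

relation = pd.Series(
    ['COVER', 'COVER', 'UNRELATED', 'COVER', 'UNRELATED',
     'UNRELATED', 'COVER', 'COVER', 'UNRELATED'],
    name='Relation',
)

m = relation == 'COVER'   # boolean Series, True where the row is COVER
kept = relation[m]        # boolean indexing keeps only the True rows

print(kept.index.tolist())  # [0, 1, 3, 6, 7]
print(int(m.sum()))         # 5
```

The original index labels survive the selection, which is what later lets the per-group counts line up with the full DataFrame.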
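Step 2: Create temporary columns holding, for each Text_1, the number of unique Text_2 and Type values among the COVER rows
df[['temp1','temp2']]=df[m].groupby('Text_1')[['Text_2','Type']].transform('nunique')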
print(df)
ID Text_1 Text_2 Relation Type temp1 temp2
0 1 ABCDEF ABC COVER NAME 2.0 2.0
1 1 ABCDEF BCD COVER PLACE 2.0 2.0
2 1 ABCDEF GHI UNRELATED PLACE NaN NaN
3 1 GHIJKL JKL COVER NAME 1.0 1.0
4 1 GHIJKL XYZ UNRELATED PLACE NaN NaN
5 1 GHIJKL RST UNRELATED THING NaN NaN
6 2 MNOPQR MNO COVER PLACE 2.0 1.0
7 2 MNOPQR PQR COVER PLACE 2.0 1.0
8 2 MNOPQR XYZ UNRELATED PLACE NaN NaN
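It may help to compare the per-group counts with what transform returns. The sketch below rebuilds only the columns needed for the counts; per_group and per_row are illustrative names.

```python
import pandas as pd

# Only the columns needed for the counts, copied from the example data.
df = pd.DataFrame({
    'Text_1': ['ABCDEF'] * 3 + ['GHIJKL'] * 3 + ['MNOPQR'] * 3,
    'Text_2': ['ABC', 'BCD', 'GHI', 'JKL', 'XYZ', 'RST', 'MNO', 'PQR', 'XYZ'],
    'Relation': ['COVER', 'COVER', 'UNRELATED', 'COVER', 'UNRELATED',
                 'UNRELATED', 'COVER', 'COVER', 'UNRELATED'],
    'Type': ['NAME', 'PLACE', 'PLACE', 'NAME', 'PLACE', 'THING',
             'PLACE', 'PLACE', 'PLACE'],
})

cover = df[df['Relation'] == 'COVER']

# One row per Text_1: distinct Text_2 and Type values among COVER rows.
per_group = cover.groupby('Text_1')[['Text_2', 'Type']].nunique()
print(per_group)

# transform('nunique') gives the same numbers, repeated on every COVER row
# and indexed like `cover`, so it can be aligned back onto `df`.
per_row = cover.groupby('Text_1')[['Text_2', 'Type']].transform('nunique')
print(per_row)
```

Because per_row keeps the row index of the COVER rows (0, 1, 3, 6, 7), assigning it back to the full DataFrame leaves NaN in the rows that were masked out.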
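Step 3: Apply the condition by combining the result above with np.where, then drop the temporary columns
df=df.assign(output=np.where((df.temp1>1)&(df.temp2>1),'Y','N')).drop(columns=['temp1','temp2'])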
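The NaN rows need no special handling in this step, because any comparison against NaN evaluates to False. A minimal sketch with hand-written counts (the values are illustrative, not taken from the full table):

```python
import numpy as np
import pandas as pd

temp1 = pd.Series([2.0, 2.0, np.nan, 1.0])  # unique Text_2 counts
temp2 = pd.Series([2.0, 1.0, np.nan, 1.0])  # unique Type counts

# Both counts must exceed 1; NaN > 1 is False, so that row falls to 'N'.
out = np.where((temp1 > 1) & (temp2 > 1), 'Y', 'N')
print(out.tolist())  # ['Y', 'N', 'N', 'N']
```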