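Problem description
In the DataFrame below, I want to create a new column, "reference", that holds the code_num of the store's primary_fruit row for every row whose fruits are associated with it. Rows that are not associated with the primary_fruit should be left blank.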
import pandas as pd

dct = {'Store': ('A','A','A','A','A','A','B','B','B'),
       'code_num': (101,102,103,104,105,106,201,202,203),
       'fruits': ('apple','cherry','cherry,apple','banana','cherry','rambo','apple,cherry','banana','toy')}
df = pd.DataFrame(dct)
fruit_list= ["apple","banana","cherry"]
primary_fruit = 'banana'
print(df)
Store  code_num  fruits
A      101       apple
A      102       cherry
A      103       cherry,apple
A      104       banana
A      105       cherry
A      106       rambo
B      201       apple,cherry
B      202       banana
B      203       toy
Expected DataFrame:
Store  code_num  fruits        reference
A      101       apple         104
A      102       cherry        104
A      103       cherry,apple  104
A      104       banana        104
A      105       cherry        104
A      106       rambo
B      201       apple,cherry  202
B      202       banana        202
B      203       toy
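In my case, I don't want values for 106 and 203, because their fruits are not in "fruit_list".
I tried the code below, but it only fills in the reference for the primary_fruit rows (104 and 202) and leaves everything else blank.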
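# unique_all_parts and first_primary are not defined above; they presumably correspond to fruit_list and primary_fruit
unique_store_id = df.Store.unique()
for store_id in unique_store_id:
    s = (df.Store == store_id) & df['fruits'].isin(unique_all_parts)
    primary_code = df[df['fruits']==first_primary]['code_num']
    df.loc[s,'reference'] = primary_code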
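Thanks for your help :)
Update: @Scott Boston's suggestion works well on the full dataset. On a sliced/diced DataFrame, however, it raises [KeyError: 'None'], and I will have to apply this logic to each store's slice of the DataFrame, where "fruit_list" and "primary_fruit" change from store to store. (I should have said so in the original question, my apologies.) Concept: the reference column should give the code number of each store's primary fruit.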
Solution
Try this:
import pandas as pd

dct = {'Store': ('A','A','A','A','A','A','B','B','B'),
       'code_num': (101,102,103,104,105,106,201,202,203),
       'fruits': ('apple','cherry','cherry,apple','banana','cherry','rambo','apple,cherry','banana','toy')}
df = pd.DataFrame(dct)
fruit_list= ["apple","banana","cherry"]
primary_fruit = 'banana'
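# True for rows where at least one comma-separated fruit is in fruit_list (groupby replaces max(level=...), which pandas 2.0 removed)
m = df.set_index(['Store','code_num'])['fruits'].str.split(',').explode().isin(fruit_list).groupby(level=[0,1], sort=False).max().to_numpy()
# code_num on the primary_fruit rows, NaN elsewhere
df['primary_code'] = df.loc[df['fruits'] == primary_fruit,'code_num']
#Changed this line: broadcast each store's primary code to all of its rows, then blank out rows where m is False
df['reference'] = df.groupby('Store')['primary_code'].transform(lambda x: x.loc[x.first_valid_index()]).where(m,'')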
df_out = df.drop('primary_code',axis=1)
print(df_out)
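The update in the question mentions a KeyError on sliced data. One possible cause, stated here as an assumption, is a slice with no row equal to primary_fruit: first_valid_index() then returns None and the .loc lookup fails. The sketch below is a variant of the approach above, not the original answer's code. The helper name add_reference is hypothetical, and it leaves blanks as <NA> instead of '' so that the column can stay an integer type.

```python
import pandas as pd


def add_reference(df, fruit_list, primary_fruit):
    """Hypothetical helper: return a copy of df with a 'reference' column."""
    out = df.copy()
    # True where at least one comma-separated fruit is in fruit_list.
    in_list = out['fruits'].str.split(',').apply(
        lambda parts: any(p.strip() in fruit_list for p in parts)
    )
    # code_num on the primary-fruit rows, NaN elsewhere.
    primary_code = out['code_num'].where(out['fruits'] == primary_fruit)
    # 'first' picks the first non-null value per store and gives NaN,
    # rather than raising, when a store has no primary-fruit row.
    per_store = primary_code.groupby(out['Store']).transform('first')
    # Nullable integers avoid 104 turning into 104.0; <NA> marks blanks.
    out['reference'] = per_store.where(in_list).astype('Int64')
    return out


df = pd.DataFrame({
    'Store': ('A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B'),
    'code_num': (101, 102, 103, 104, 105, 106, 201, 202, 203),
    'fruits': ('apple', 'cherry', 'cherry,apple', 'banana', 'cherry',
               'rambo', 'apple,cherry', 'banana', 'toy'),
})
fruit_list = ['apple', 'banana', 'cherry']

result = add_reference(df, fruit_list, 'banana')
print(result)

# A slice that contains no 'banana' row: every reference is <NA>, no error.
sliced = add_reference(df[df['code_num'].isin([101, 102])], fruit_list, 'banana')
print(sliced)
```

Since fruit_list and primary_fruit are plain arguments, the same helper can be called once per store slice with that store's own values, and the results combined with pd.concat.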