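Problem description
I need help with a loop that compares two columns from different tables and returns the matches to the first table.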
data1:
|name | revenue |
|-------|---------|
|Alice | 700 |
|Bob | 1000 |
|Gerry | 300 |
|Alex | 600 |
|Kyle | 800 |
data2:
|Name | revenue |
|-------|---------|
|Bob | 900 |
|Gerry | 400 |
Desired result for data1:
|name | revenue | name_result |
|-------|----------|--------------|
|Alice | 700 | |
|Bob | 1000 | Bob |
|Gerry | 300 | Gerry |
|Alex | 600 | |
|Kyle | 800 | |
I tried this code, but I get all empty values:
import pandas as pd
import numpy as np

def group_category(category):
    for name in data['name']:
        if name in data2['Name']:
            return name
        else: name = ''
    return name

data['name_result'] = data['name'].apply(group_category)
Solution
Use:
def group_category(category):
    # Compare against the column's values, not its index labels
    if category in df2['Name'].unique():
        return category
    else:
        return ''

# Finally:
# Because the function is applied to a Series, map() is used here in place of apply()
df1['name_result'] = df1['name'].map(group_category)
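A likely reason the code in the question returns only empty strings: the `in` operator on a pandas Series tests the index labels, not the values, so `name in data2['Name']` is False for every name. The short sketch below illustrates this on a small frame built from the question's data2 table; the variable names `in_series`, `in_index`, `in_values` and `in_unique` are made up for the illustration.

```python
import pandas as pd

# data2 from the question, rebuilt by hand for this illustration
df2 = pd.DataFrame({"Name": ["Bob", "Gerry"], "revenue": [900, 400]})

# `in` on a Series looks at the index labels (0 and 1 here), not the values
in_series = "Bob" in df2["Name"]
in_index = 0 in df2["Name"]

# Testing against the underlying values gives the intended membership check
in_values = "Bob" in df2["Name"].values
in_unique = "Bob" in df2["Name"].unique()

print(in_series, in_index, in_values, in_unique)
```

This is why the function above checks `df2['Name'].unique()` rather than `df2['Name']` itself.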
Alternatively, use isin() and where():
df1['name_result']=df1['name'].where(df1['name'].isin(df2['Name']),'')
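For reference, here is a self-contained sketch of the isin()/where() approach run on the tables from the question. The frames are typed in by hand from the question's data, and the intermediate `mask` variable is only added for readability.

```python
import pandas as pd

# Tables from the question, rebuilt by hand
df1 = pd.DataFrame({
    "name": ["Alice", "Bob", "Gerry", "Alex", "Kyle"],
    "revenue": [700, 1000, 300, 600, 800],
})
df2 = pd.DataFrame({"Name": ["Bob", "Gerry"], "revenue": [900, 400]})

# True where the name in df1 also appears in df2['Name']
mask = df1["name"].isin(df2["Name"])

# Keep the name where the mask is True, otherwise fill with an empty string
df1["name_result"] = df1["name"].where(mask, "")

print(df1)
```

Only the rows for Bob and Gerry keep their name in `name_result`; the others get an empty string, which matches the desired result table in the question. This version is vectorized, so it avoids calling a Python function once per row.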
I found a solution:
df1.loc[df1['name'].isin(df2['name_result'].unique()),'brand'] = 'Adidas Collection'