Problem description
Previously, I matched values across different lists (see the thread "How to get a python lookup to return another column after match").
import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['a cat dog - multiple','grey puppy - narrow term','a cat puppy','reddog - single no spaces','acatdog - multiple no spaces']})
df2 = pd.DataFrame({'broadTerm':['cat','cat','dog','dog'],'NarrowTerm':['cat','kitten','puppy','dog']})
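There are a few problems:
- Matching cells that contain one or more matching values (e.g. the 1st row of the DataFrame)
- Matching values that are not separated by spaces (e.g. the 4th and 5th rows of df)
The basic code is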
df['Animal'] = df['Name'].str.extract(pat = f"({'|'.join(df2.NarrowTerm)})")[0].map(dict(df2.iloc[:,::-1].values))
But this only works for cells with a single hit (when there are several, it returns only the first).
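To make the limitation concrete, here is a minimal sketch that runs the basic approach on the sample data. The names `narrow_to_broad` and `first_hit` are introduced here only for illustration, and `dict(zip(...))` is used as an equivalent way to build the narrow-to-broad lookup.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple',
                            'grey puppy - narrow term',
                            'a cat puppy',
                            'reddog - single no spaces',
                            'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'broadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Lookup from narrow term to broad term (illustrative name)
narrow_to_broad = dict(zip(df2['NarrowTerm'], df2['broadTerm']))

# One capture group holding all narrow terms as alternatives
pattern = f"({'|'.join(df2['NarrowTerm'])})"

# str.extract keeps only the first match in each cell
first_hit = df['Name'].str.extract(pattern)[0]
animal = first_hit.map(narrow_to_broad)

print(first_hit.tolist())  # ['cat', 'puppy', 'cat', 'dog', 'cat']
print(animal.tolist())     # ['cat', 'dog', 'cat', 'dog', 'cat']
```

Rows 0, 2 and 4 each contain two terms, but only the first one survives, so the second animal is lost.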
Solution
We can try findall, then explode:
df['step1'] = df['Name'].str.findall(pat = f"({'|'.join(df2.NarrowTerm)})")
df['animal'] = df['step1'].explode().map(dict(df2.iloc[:,::-1].values)).groupby(level=0).agg(list)
df
Out[63]:
                           Name         step1      animal
0          a cat dog - multiple    [cat, dog]  [cat, dog]
1      grey puppy - narrow term       [puppy]       [dog]
2                   a cat puppy  [cat, puppy]  [cat, dog]
3     reddog - single no spaces         [dog]       [dog]
4  acatdog - multiple no spaces    [cat, dog]  [cat, dog]
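The same findall/explode idea can be hardened a little. The sketch below is one possible variant, not part of the original answer, and it makes three assumptions: narrow terms may contain regex metacharacters (so they are escaped with `re.escape`), one narrow term may be a prefix of another (so longer terms are listed first in the alternation), and some rows may contain no term at all (so the row 'no animal here' is added as a hypothetical test case). The names `narrow_to_broad`, `terms`, `hits` and `broad` are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple',
                            'grey puppy - narrow term',
                            'a cat puppy',
                            'reddog - single no spaces',
                            'acatdog - multiple no spaces',
                            'no animal here']})  # hypothetical row with no match
df2 = pd.DataFrame({'broadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Lookup from narrow term to broad term
narrow_to_broad = dict(zip(df2['NarrowTerm'], df2['broadTerm']))

# Longest terms first, so a longer term wins over a shorter prefix of it;
# re.escape makes each term match literally
terms = sorted(df2['NarrowTerm'], key=len, reverse=True)
pattern = '|'.join(re.escape(t) for t in terms)

# All narrow terms found in each cell, in order of appearance
hits = df['Name'].str.findall(pattern)

# One row per hit, mapped to its broad term; rows without hits become NaN
broad = hits.explode().map(narrow_to_broad)

# Drop the NaN rows, regroup by original index, and restore the full index
df['animal'] = broad.dropna().groupby(level=0).agg(list).reindex(df.index)

print(df[['Name', 'animal']])
```

With this variant, a row without any match ends up as NaN in `animal` rather than as a list containing NaN.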