Python lookup with multiple matches and no-space variants: returning another column after a match

Problem description

Previously, I matched values across different lists (see the thread "How to get a python lookup to return another column after match"). My data now looks like this:

import pandas as pd
import numpy as np
df = pd.DataFrame({'Name':['a cat dog - multiple','grey puppy - narrow term','a cat puppy','reddog - single no spaces','acatdog - multiple no spaces']})
df2 = pd.DataFrame({'broadTerm':['cat','cat','dog','dog'],'NarrowTerm':['cat','kitten','puppy','dog']})

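There are two problems:

  1. Matching cells that contain more than one value (e.g. the first row of df)
  2. Matching values that are not separated by spaces (e.g. rows 4 and 5 of df)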

Basic code

df['Animal'] = df['Name'].str.extract(pat = f"({'|'.join(df2.NarrowTerm)})")[0].map(dict(df2.iloc[:,::-1].values))

But this only works for cells with a single hit (it returns only the first hit).
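
To make the limitation concrete, here is a minimal sketch on two of the sample rows. The names pattern, narrow_to_broad and first_hit are introduced only for illustration and are not part of the original code:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple', 'acatdog - multiple no spaces']})
df2 = pd.DataFrame({'broadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Alternation of all narrow terms, wrapped in one capture group.
pattern = f"({'|'.join(df2.NarrowTerm)})"

# Reversing the column order gives (NarrowTerm, broadTerm) pairs,
# so dict() builds a narrow -> broad lookup.
narrow_to_broad = dict(df2.iloc[:, ::-1].values)

# str.extract keeps only the first match in each cell, so 'dog' is lost in both rows.
first_hit = df['Name'].str.extract(pat=pattern)[0]
print(first_hit.tolist())  # ['cat', 'cat']
```

Note that the missing spaces are not the issue here: the pattern has no word boundaries, so 'acatdog' still matches 'cat'. The issue is that only one match per cell survives.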

How can I modify the code to handle both cases?

Solution

We can try findall to collect every match in each cell, then explode to map each match to its broad term:

df['step1'] = df['Name'].str.findall(pat = f"({'|'.join(df2.NarrowTerm)})")
df['animal'] = df['step1'].explode().map(dict(df2.iloc[:,::-1].values)).groupby(level=0).agg(list)
df
Out[63]: 
                           Name         step1      animal
0          a cat dog - multiple    [cat,dog]  [cat,dog]
1      grey puppy - narrow term       [puppy]       [dog]
2                   a cat puppy  [cat,puppy]  [cat,dog]
3     reddog - single no spaces         [dog]       [dog]
4  acatdog - multiple no spaces    [cat,dog]  [cat,dog]
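
The findall/explode approach above can also be sketched as a self-contained script. This is a variation rather than the answer's exact code. It rests on a few assumptions of mine: terms are escaped with re.escape and sorted longest first (which only matters if one term is a prefix of another, not the case in the sample data), an extra row 'no animal here' is added as a hypothetical example of a cell with no match, and such cells are left as NaN instead of [nan].

```python
import re
import pandas as pd

df = pd.DataFrame({'Name': ['a cat dog - multiple',
                            'grey puppy - narrow term',
                            'a cat puppy',
                            'reddog - single no spaces',
                            'acatdog - multiple no spaces',
                            'no animal here']})  # hypothetical row with no match
df2 = pd.DataFrame({'broadTerm': ['cat', 'cat', 'dog', 'dog'],
                    'NarrowTerm': ['cat', 'kitten', 'puppy', 'dog']})

# Escape regex metacharacters and try longer terms first.
terms = sorted(df2['NarrowTerm'], key=len, reverse=True)
pattern = '|'.join(re.escape(t) for t in terms)
narrow_to_broad = dict(zip(df2['NarrowTerm'], df2['broadTerm']))

# One list of matches per cell; an empty list where nothing matched.
hits = df['Name'].str.findall(pattern)

# explode gives one row per match (same index label as the source row);
# empty lists become NaN, which dropna removes before regrouping.
broad = hits.explode().map(narrow_to_broad).dropna()

# Regroup by the original index; reindex restores rows that had no match as NaN.
df['animal'] = broad.groupby(level=0).agg(list).reindex(df.index)
print(df)
```

Without the dropna step, a cell with no match would end up as [nan] rather than NaN, which is usually harder to filter on afterwards.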