Compare a text column against a defined list and return the first matching string from that list in a new DataFrame column

Problem description

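Suppose I have a coffee shop menu list. I want to take a piece of text and return the quantity and the item name.

menu = ['Cappuccino','Café Latte','Expresso','Macchiato ','Irish coffee ']

Now I want to extract the quantity and the name of the ordered item as it appears in my menu (the first match from the menu).

Sample text: Bring 1 Capputino

Output dataframe:

text               Quantity  match
Bring 1 Capputino  1         Cappuccino

The spelling in the input text will not necessarily be identical to the menu, so the match column should only return the matching entry from the menu list.

I wrote the code below, but it returns NaN in the match column. Any guidance is appreciated.

Code: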

Solution

Take a look at the following:

import re

import pandas as pd

# Lowercase prefixes mapped to the menu item they stand for
menu_map = {'cap': 'Cappuccino', 'caf': 'Café Latte', 'cof': 'Irish coffee', 'cok': 'Cookie', 'cook': 'Cookie'}

order = input('Enter a substring: ')

df = pd.DataFrame({'Text': [order]})
# First run of digits in the text
df['Quantity'] = df.Text.str.extract(r'(\d+)')
# First prefix from menu_map found in the text, ignoring case
df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')', flags=re.IGNORECASE)
# Look up the menu item for the matched prefix
df['Replacement'] = df.Match.str.casefold().map(menu_map)

Result for order == 'Bring 1 Caputino':

               Text Quantity Match Replacement
0  Bring 1 Caputino        1   Cap  Cappuccino

And for order == 'Bring 1 Caxutino':

               Text Quantity Match Replacement
0  Bring 1 Caxutino        1   NaN         NaN

because menu_map contains no pattern that matches 'Caxutino'.
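If misspellings such as 'Caxutino' should still resolve to a menu item, one alternative to hand-written prefixes is fuzzy matching with the standard library's difflib module. This is only a sketch of a different technique, not part of the approach above: the helper name closest_menu_item and the cutoff of 0.6 are assumptions chosen for illustration, and the menu entries are written here without trailing spaces.

```python
import difflib
import re

menu = ['Cappuccino', 'Café Latte', 'Expresso', 'Macchiato', 'Irish coffee']

def closest_menu_item(text, menu, cutoff=0.6):
    """Return (quantity, item) for the first word that resembles a menu entry."""
    number = re.search(r'\d+', text)
    quantity = number.group() if number else None
    # Compare case-insensitively, but return the menu's own spelling.
    lowered = {item.casefold(): item for item in menu}
    for word in text.split():
        hits = difflib.get_close_matches(word.casefold(), list(lowered), n=1, cutoff=cutoff)
        if hits:
            return quantity, lowered[hits[0]]
    return quantity, None

print(closest_menu_item('Bring 1 Caxutino', menu))  # ('1', 'Cappuccino')
```

A lower cutoff accepts worse misspellings but also raises the chance that an unrelated word is mistaken for a menu item, so the value probably needs tuning against real orders. Because the comparison is word by word, multi-word entries such as 'Irish coffee' are unlikely to be matched by this sketch.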

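It seems to me that this is what you are actually looking for? Since you don't want the Replacement column (I only used it for transparency), you can do this: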

df['Match'] = df.Text.str.extract('(' + '|'.join(menu_map) + ')',flags=re.IGNORECASE)
df.Match = df.Match.str.casefold().map(menu_map)
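To see what these two pandas calls do for a single row, the same logic can be written with only the re module. This is an illustrative sketch: the function name match_menu_item is hypothetical, and sorting the keys by length (so a longer prefix is tried before a shorter one) and escaping them are added precautions, not something the code above does.

```python
import re

menu_map = {'cap': 'Cappuccino', 'caf': 'Café Latte', 'cof': 'Irish coffee',
            'cok': 'Cookie', 'cook': 'Cookie'}

def match_menu_item(text, menu_map):
    """Return the menu item for the first prefix found in text, or None."""
    # Longest keys first, escaped so they are treated as literal text.
    keys = sorted(menu_map, key=len, reverse=True)
    pattern = re.compile('|'.join(map(re.escape, keys)), flags=re.IGNORECASE)
    found = pattern.search(text)
    if found is None:
        return None
    # The keys are lowercase, so casefold the matched text before the lookup.
    return menu_map[found.group().casefold()]

print(match_menu_item('Bring 1 Caputino', menu_map))  # Cappuccino
print(match_menu_item('Bring 1 Caxutino', menu_map))  # None
```

Note that the pattern matches anywhere in the text, not only at the start of a word, so a word that merely contains one of the prefixes would also be mapped to a menu item.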

(I don't understand what you are trying to achieve with the for ... if ... part.)

Edit: Now that I understand the for ... if ... part, I would suggest the following approach:

args_dict = {'capu': 'Cappuccino','chap': 'Cappuccino','cof': 'Coffee','coof': 'Coffee','chof': 'Coffee','cok': 'Cookie','chok': 'Cookie','choo': 'Cookie'}

order = order.split()
for i,word in enumerate(order):
    word = word.casefold()
    for key in args_dict:
        if word.startswith(key):
            order[i] = args_dict[key]
            break
order = ' '.join(order)

Or:

args_dict = {('capu','chap'): 'Cappuccino',('cof','coof','chof'): 'Coffee',('cok','chok','choo'): 'Cookie'}

order = order.split()
for i,word in enumerate(order):
    word = word.casefold()
    for keys,replacement in args_dict.items():
        # str.startswith accepts a tuple, so one call tests every prefix in the group
        if word.startswith(keys):
            order[i] = replacement
            break  # stop at the first matching group
order = ' '.join(order)
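The loop above can be wrapped in a function so that it is easy to reuse and to check against a few sample orders. This is a minimal sketch that assumes the tuple-keyed dictionary shown above; the name normalize_order is hypothetical. It relies on str.startswith accepting a tuple of prefixes.

```python
args_dict = {('capu', 'chap'): 'Cappuccino',
             ('cof', 'coof', 'chof'): 'Coffee',
             ('cok', 'chok', 'choo'): 'Cookie'}

def normalize_order(order, args_dict):
    """Replace every word that starts with a known prefix by its menu name."""
    words = order.split()
    for i, word in enumerate(words):
        folded = word.casefold()
        for prefixes, replacement in args_dict.items():
            # str.startswith checks each prefix in the tuple in turn.
            if folded.startswith(prefixes):
                words[i] = replacement
                break  # the first matching group wins
    return ' '.join(words)

print(normalize_order('Bring 1 Caputino and 2 chokies', args_dict))
# Bring 1 Cappuccino and 2 Cookie
```

Words that match no prefix are left unchanged, so quantities and filler words pass through as they were typed.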

