Is there a better way to solve this problem using a dictionary?

Problem description

I am trying to solve the following problem:

A sample of the CSV dataset looks like this (the full dataset has 1,000 rows):

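The requirements are:

Implement an AND condition, e.g. steel keyboard should only match item names that contain both steel and keyboard somewhere (not necessarily in that order)
Implement an OR condition, e.g. steel keyboard should match the item names steel table and wooden keyboard, because each contains one of the search terms
Implement a numeric range query, e.g. steel keyboard with a price between $40 and $70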

I have solved the problem with the approach below, but I suspect a dictionary would make it simpler:

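import pandas as pd


class SimpleSearch:

    def __init__(self, path):
        self.df = pd.read_csv(path)

    def match_keyword(self, pattern):
        # findall treats `pattern` as a single regex, so "steel keyboard" only
        # matches that exact phrase (case-sensitive), not the two words separately
        self.df['matches'] = self.df['name'].str.findall(pattern).apply(lambda x: list(set(x)))

        # Collect the ids of all rows with at least one match
        ids = []
        for i in self.df.itertuples():
            if i.matches != []:
                ids.append(i.id)

        return ids


if __name__ == '__main__':
    path = "random_path/file.csv"
    pattern = "steel keyboard"
    search_obj = SimpleSearch(path)
    print(search_obj.match_keyword(pattern))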

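Is there a simple way to use a dictionary to separate the AND and OR logic? At the moment my solution only handles AND.
What is the best way to handle the numeric range query? I can't think of an approach, so any help would be appreciated.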

Answer

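In the DataFrame below, 3 rows meet both the name criteria (two contain both words, i.e. AND; one contains only one of them, i.e. OR) and the price criterion ([40, 70]):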

>>> df
                       name   price
0   Lightweight Linen Watch   54.56
1               Steel Table   63.88  # OK
2  Keyboard With Steel Keys   48.24  # OK
3           Wooden Keyboard  104.29
4         Small Rubber Lamp   82.69
5       Durable Leather Car    9.88
6            Steel Keyboard   59.45  # OK
7   Fantastic Granite Bench   22.21
8            Apple Keyboard  999.99
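To make the snippets in this answer runnable on their own, the sample frame can be rebuilt from the printout above (the `# OK` markers are annotations, not data). A minimal sketch:

```python
import pandas as pd

# Sample data copied from the printout above: 9 rows, columns "name" and "price"
df = pd.DataFrame(
    {
        "name": [
            "Lightweight Linen Watch",
            "Steel Table",
            "Keyboard With Steel Keys",
            "Wooden Keyboard",
            "Small Rubber Lamp",
            "Durable Leather Car",
            "Steel Keyboard",
            "Fantastic Granite Bench",
            "Apple Keyboard",
        ],
        "price": [54.56, 63.88, 48.24, 104.29, 82.69, 9.88, 59.45, 22.21, 999.99],
    }
)
```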

Solving with pandas

TL;DR

import re

search = "steel keyboard"
search = fr"({'|'.join(search.split())})"  # '(steel|keyboard)'
min_price = 40
max_price = 70

name_result = df["name"].str.findall(search,re.IGNORECASE).apply(len)
price_result = df["price"].between(min_price,max_price)

out = df.loc[(name_result > 0) & price_result]  # price_result is already boolean

>>> out
                       name  price
1               Steel Table  63.88
2  Keyboard With Steel Keys  48.24
6            Steel Keyboard  59.45
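One caveat with the TL;DR pattern: the words are joined into the regex unescaped and without word boundaries, so `keyboard` also matches `Keyboards`, and a term such as `c++` would be read as regex syntax. A hedged variant, assuming whole-word matching is what you want; `build_pattern` is a hypothetical helper name and the names in the Series are made up for illustration:

```python
import re

import pandas as pd


def build_pattern(search: str) -> str:
    """Turn 'steel keyboard' into a whole-word alternation of the terms."""
    # re.escape stops characters like '.', '+' or '(' in user input from being
    # read as regex syntax; the lookarounds require that no word character sits
    # directly before or after a term, so 'keyboard' no longer matches 'Keyboards'.
    terms = "|".join(re.escape(word) for word in search.split())
    return rf"(?<!\w)(?:{terms})(?!\w)"


# Hypothetical names for illustration (not from the question's dataset)
names = pd.Series(["Steel Table", "Steel Keyboards", "Keyboard With Steel Keys"])

pattern = build_pattern("steel keyboard")
counts = names.str.findall(pattern, flags=re.IGNORECASE).str.len()
print(counts.tolist())  # [1, 1, 2]
```

Lookarounds are used instead of `\b` because `\b` would fail after a term that ends in a non-word character such as `+`.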

Name criteria

AND and OR can both be handled in a single pass:

import re
search = "steel keyboard"
search = fr"({'|'.join(search.split())})"

name_result = df["name"].str.findall(search,re.IGNORECASE).apply(len)

>>> pd.concat([df["name"],name_result],axis="columns")
                       name  name
0   Lightweight Linen Watch     0  # no match
1               Steel Table     1  # partial match (ANY of words <- OR)
2  Keyboard With Steel Keys     2  # full match (ALL words <- AND)
3           Wooden Keyboard     1
4         Small Rubber Lamp     0
5       Durable Leather Car     0
6            Steel Keyboard     2
7   Fantastic Granite Bench     0
8            Apple Keyboard     1

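0: no match.
1 to N-1: partial match; at least one word was found.
N: full match; all words were found, where N = len(search.split()).
Because findall counts occurrences rather than distinct words, a name that repeats one term (e.g. "Steel Steel") would also reach N, so == N is only a safe AND test when no term can repeat.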
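If each search term should count at most once, the matches can be de-duplicated (case-insensitively) before counting, which makes `== N` a reliable AND test and `> 0` the OR test. A sketch, where "Steel Steel Lamp" is a made-up name used only to show the difference:

```python
import re

import pandas as pd

words = "steel keyboard".split()
pattern = f"({'|'.join(words)})"  # '(steel|keyboard)', as in the answer

# "Steel Steel Lamp" is a hypothetical name showing why repeats matter
names = pd.Series(["Steel Table", "Keyboard With Steel Keys", "Steel Steel Lamp"])

found = names.str.findall(pattern, flags=re.IGNORECASE)
raw_count = found.str.len()  # counts every occurrence
distinct = found.apply(lambda hits: len({h.lower() for h in hits}))  # counts each term once

or_mask = distinct > 0  # at least one term present
and_mask = distinct == len(words)  # every term present

print(raw_count.tolist())  # [1, 2, 2]
print(distinct.tolist())  # [1, 2, 1]
print(and_mask.tolist())  # [False, True, False]
```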

Price criterion

Much simpler!

min_price = 40
max_price = 70

price_result = df["price"].between(min_price,max_price)
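`Series.between` is inclusive on both ends by default, i.e. equivalent to two explicit comparisons. If the range should exclude its bounds, the `inclusive` argument accepts `"both"`, `"neither"`, `"left"` or `"right"` (string values need pandas 1.3 or later). The 40.0 and 70.0 prices below are hypothetical edge values:

```python
import pandas as pd

# 40.0 and 70.0 are hypothetical prices sitting exactly on the bounds
prices = pd.Series([54.56, 63.88, 40.0, 70.0, 104.29])
min_price, max_price = 40, 70

inclusive = prices.between(min_price, max_price)  # default: 40 <= p <= 70
explicit = (prices >= min_price) & (prices <= max_price)  # same test, spelled out
exclusive = prices.between(min_price, max_price, inclusive="neither")  # 40 < p < 70

print(inclusive.tolist())  # [True, True, True, True, False]
print(exclusive.tolist())  # [True, True, False, False, False]
```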

Finally, apply all the rules together:

out = df.loc[(name_result > 0) & price_result]  # price_result is already boolean

>>> out
                       name  price
1               Steel Table  63.88
2  Keyboard With Steel Keys  48.24
6            Steel Keyboard  59.45
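The combined filter above implements OR. A sketch that lets the caller choose AND or OR without counting regex hits: build one boolean column per term with `str.contains` and reduce the columns with `all` (AND) or `any` (OR). `search_products` is a hypothetical name; `regex=False` means plain substring matching, so `keyboard` would still match `Keyboards`.

```python
import pandas as pd

# Sample frame from the printout above
df = pd.DataFrame(
    {
        "name": [
            "Lightweight Linen Watch", "Steel Table", "Keyboard With Steel Keys",
            "Wooden Keyboard", "Small Rubber Lamp", "Durable Leather Car",
            "Steel Keyboard", "Fantastic Granite Bench", "Apple Keyboard",
        ],
        "price": [54.56, 63.88, 48.24, 104.29, 82.69, 9.88, 59.45, 22.21, 999.99],
    }
)


def search_products(df, search, min_price, max_price, match="or"):
    """Hypothetical helper: filter by name terms (AND/OR) and an inclusive price range."""
    words = search.split()
    if not words or match not in ("and", "or"):
        raise ValueError("need at least one search word and match='and' or 'or'")
    # One boolean column per term; case-insensitive literal substring test
    hits = pd.DataFrame(
        {w: df["name"].str.contains(w, case=False, regex=False) for w in words}
    )
    # AND: every term present in the row; OR: at least one term present
    name_ok = hits.all(axis="columns") if match == "and" else hits.any(axis="columns")
    price_ok = df["price"].between(min_price, max_price)
    return df.loc[name_ok & price_ok]


print(search_products(df, "steel keyboard", 40, 70, match="and"))
```

With `match="and"` only rows 2 and 6 remain; with the default `match="or"` the result is the same three rows as above.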

Solving with `dict`

import re

search = "steel keyboard"
search = fr"({'|'.join(search.split())})"  # '(steel|keyboard)'
search = re.compile(search,re.IGNORECASE)
min_price = 40
max_price = 70

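# {name: price}; assumes df has only the name and price columns and unique names
data = df.set_index("name").squeeze().to_dict()

# OR semantics: search.search(name) succeeds if any of the words appears
out = {name: price for name,price in data.items()
           if search.search(name) and min_price <= price <= max_price}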

>>> out
{'Steel Table': 63.88,'Keyboard With Steel Keys': 48.24,'Steel Keyboard': 59.45}
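This covers the OR case, since `search.search(name)` succeeds as soon as any one word appears. To address the original question of using a dictionary to switch between AND and OR, one option is to map the mode name to Python's built-in `all` or `any` and apply it to one compiled pattern per word. A minimal sketch, with `dict_search` and `COMBINE` as hypothetical names; note that a `{name: price}` dict keeps only one entry per name, so duplicate product names in the CSV would be collapsed:

```python
import re

# name -> price mapping, as produced by df.set_index("name").squeeze().to_dict()
data = {
    "Lightweight Linen Watch": 54.56,
    "Steel Table": 63.88,
    "Keyboard With Steel Keys": 48.24,
    "Wooden Keyboard": 104.29,
    "Small Rubber Lamp": 82.69,
    "Durable Leather Car": 9.88,
    "Steel Keyboard": 59.45,
    "Fantastic Granite Bench": 22.21,
    "Apple Keyboard": 999.99,
}

# The dictionary that separates the two modes: both built-ins reduce an
# iterable of truthy/falsy values (here: match objects or None)
COMBINE = {"and": all, "or": any}


def dict_search(data, search, min_price, max_price, mode="or"):
    """Hypothetical helper: return {name: price} for matching names in the price range."""
    patterns = [re.compile(re.escape(word), re.IGNORECASE) for word in search.split()]
    combine = COMBINE[mode]  # raises KeyError for an unknown mode
    return {
        name: price
        for name, price in data.items()
        if combine(p.search(name) for p in patterns) and min_price <= price <= max_price
    }


print(dict_search(data, "steel keyboard", 40, 70, mode="and"))
```

Calling it with `mode="and"` should return only `Keyboard With Steel Keys` and `Steel Keyboard`; the default `mode="or"` reproduces the three-entry result above.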
