Recursively searching a dictionary for paths to values containing a substring

Problem description

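I'm trying to determine the fastest way to search a nested dictionary with a regular expression and return the path to every occurrence of a matching string. I'm only interested in values that are strings, not in other values that are not explicitly strings. Recursion is not my strong suit. Here is a sample JSON; assume I'm looking for all absolute paths whose value contains "blah":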

d = {'id': 'abcde','key1': 'blah','key2': 'blah blah','nestedlist': [{'id': 'qwerty','nestednestedlist': [{'id': 'xyz','keyA': 'blah blah blah'},{'id': 'fghi','keyZ': 'blah blah blah'}],'anothernestednestedlist': [{'id': 'asdf','keyQ': 'blah blah'},{'id': 'yuiop','keyW': 'blah'}]}]}

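I found the following snippet, but I couldn't get it to return the paths rather than just print them. Beyond that, adding a check along the lines of "if the value is a string and re.search() matches, append the path to a list" shouldn't be hard.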

def search_dict(v,prefix=''):
    
    if isinstance(v,dict):
        for k,v2 in v.items():
            p2 = "{}['{}']".format(prefix,k)
            search_dict(v2,p2)
    elif isinstance(v,list):
        for i,v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix,i)
            search_dict(v2,p2)
    else:
        print('{} = {}'.format(prefix,repr(v)))

Solution

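Both of these answers compute their results eagerly, exhausting the entire input dictionary before returning the first available result (if any). We can use yield from to write a more Pythonic program -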

def search_substr(t = {}, q = ""):
  def loop(t, path):
    if isinstance(t, dict):
      for k, v in t.items():
        yield from loop(v, (*path, k))  # <- recur
    elif isinstance(t, list):
      for k, v in enumerate(t):
        yield from loop(v, (*path, k))  # <- recur
    elif isinstance(t, str):
      if q in t:
        yield path, t                   # <- output a match
  yield from loop(t, ())                # <- init

for (path,value) in search_substr(d,"blah"):
  print(path,value)

Result -

('key1',) blah
('key2',) blah blah
('nestedlist', 0, 'nestednestedlist', 0, 'keyA') blah blah blah
('nestedlist', 0, 'nestednestedlist', 1, 'keyZ') blah blah blah
('nestedlist', 0, 'anothernestednestedlist', 0, 'keyQ') blah blah
('nestedlist', 0, 'anothernestednestedlist', 1, 'keyW') blah
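
To make the laziness concrete, here is a minimal sketch that pulls only the first match with next() and then resumes the same generator. The small sample dict and the names first and rest are illustrative assumptions; search_substr is repeated so the snippet runs on its own.

```python
def search_substr(t, q=""):
    def loop(t, path):
        if isinstance(t, dict):
            for k, v in t.items():
                yield from loop(v, (*path, k))
        elif isinstance(t, list):
            for k, v in enumerate(t):
                yield from loop(v, (*path, k))
        elif isinstance(t, str):
            if q in t:
                yield path, t
    yield from loop(t, ())

# Hypothetical, shortened input used only for this illustration.
sample = {'id': 'abcde', 'key1': 'blah', 'nested': [{'keyA': 'blah blah'}]}

matches = search_substr(sample, "blah")   # nothing has been searched yet
first = next(matches, None)               # walks only as far as the first hit
print(first)                              # (('key1',), 'blah')
rest = list(matches)                      # resumes where the generator paused
print(rest)                               # [(('nested', 0, 'keyA'), 'blah blah')]
```

Passing None as the second argument to next() avoids a StopIteration when nothing matches.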

Note that we test for the substring q in the target t using q in t. If you actually want to use a regexp for this -

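from re import compile

def search_re(t = {}, q = ""):
  def loop(t, re, path):                      # <- add re
    if isinstance(t, dict):
      for k, v in t.items():
        yield from loop(v, re, (*path, k))    # <- carry re
    elif isinstance(t, list):
      for k, v in enumerate(t):
        yield from loop(v, re, (*path, k))    # <- carry re
    elif isinstance(t, str):
      if re.search(t):                        # <- re.search
        yield path, t
  yield from loop(t, compile(q), ())          # <- compile q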
Now we can search using a regular expression -

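for (path, value) in search_re(d, r"[abhl]{4}"):
  print(path, value)

('key1',) blah
('key2',) blah blah
('nestedlist', 0, 'nestednestedlist', 0, 'keyA') blah blah blah
('nestedlist', 0, 'nestednestedlist', 1, 'keyZ') blah blah blah
('nestedlist', 0, 'anothernestednestedlist', 0, 'keyQ') blah blah
('nestedlist', 0, 'anothernestednestedlist', 1, 'keyW') blah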

Let's try another search with a different query -

for (path, value) in search_re(d, r"[dfs]{3}"):
  print(path, value)

('nestedlist', 0, 'anothernestednestedlist', 0, 'id') asdf

Finally, when a query matches nothing, search_substr and search_re yield no results -

print(list(search_re(d,r"zzz")))
# []
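
Since each path is a plain tuple of keys and indices, it can be used to look the value up again or be rendered in subscript notation. The following is a rough sketch; get_path, format_path and the sample dict are hypothetical helpers introduced for illustration and are not part of the answer above.

```python
from functools import reduce
from operator import getitem

def get_path(t, path):
    # Follow each key or index in turn: t[path[0]][path[1]]...
    return reduce(getitem, path, t)

def format_path(path):
    # Render a tuple path as subscript notation, e.g. ['a'][0]['b']
    return "".join("[{!r}]".format(p) for p in path)

# Hypothetical, shortened input used only for this illustration.
sample = {'nestedlist': [{'id': 'qwerty', 'items': [{'keyA': 'blah'}]}]}
path = ('nestedlist', 0, 'items', 0, 'keyA')

print(get_path(sample, path))   # blah
print(format_path(path))        # ['nestedlist'][0]['items'][0]['keyA']
```

Keeping paths as tuples until display time means the same result can serve both lookups and printing.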

You just need to initialize an output list, append the elements found in the current call, and extend it with the results returned by the recursive calls.

Try this:

def search_dict(v,prefix=''):
    result = []
    if isinstance(v,dict):
        for k,v2 in v.items():
            p2 = "{}['{}']".format(prefix,k)
            result.extend(search_dict(v2,p2))
    elif isinstance(v,list):
        for i,v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix,i)
            result.extend(search_dict(v2,p2))
    else:
        result.append('{} = {}'.format(prefix,repr(v)))
    return result
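
As a quick check of what this version returns, here is a small usage sketch. The sample dict and the 'd' prefix are assumptions chosen for illustration; the function is repeated in compact form so the snippet runs on its own.

```python
def search_dict(v, prefix=''):
    result = []
    if isinstance(v, dict):
        for k, v2 in v.items():
            result.extend(search_dict(v2, "{}['{}']".format(prefix, k)))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            result.extend(search_dict(v2, "{}[{}]".format(prefix, i)))
    else:
        result.append('{} = {}'.format(prefix, repr(v)))
    return result

# Hypothetical, shortened input used only for this illustration.
sample = {'id': 'abcde', 'nestedlist': [{'keyA': 'blah'}, {'count': 3}]}

paths = search_dict(sample, prefix='d')
for line in paths:
    print(line)
# d['id'] = 'abcde'
# d['nestedlist'][0]['keyA'] = 'blah'
# d['nestedlist'][1]['count'] = 3
```

Every leaf is reported here, strings or not; nothing is filtered yet.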

Adam.Er8 is right; I just wanted to give a more explicit answer to the question:

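import re

def search_dict(v, re_term, prefix=''):
    re_term = re.compile(re_term)  # returns an already-compiled pattern unchanged
    result = []
    if isinstance(v, dict):
        for k, v2 in v.items():
            p2 = "{}['{}']".format(prefix, k)
            result.extend(search_dict(v2, re_term, prefix = p2))
    elif isinstance(v, list):
        for i, v2 in enumerate(v):
            p2 = "{}[{}]".format(prefix, i)
            result.extend(search_dict(v2, re_term, prefix = p2))
    elif isinstance(v, str) and re.search(re_term, v):
        result.append(prefix)
    return result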
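
One possible variation, shown as a sketch rather than as the answerer's code, compiles the pattern once in an outer function and recurses through an inner helper. The names search_dict_re and walk, as well as the sample dict, are assumptions introduced for illustration.

```python
import re

def search_dict_re(v, pattern):
    compiled = re.compile(pattern)      # compile once, outside the recursion

    def walk(v, prefix):
        result = []
        if isinstance(v, dict):
            for k, v2 in v.items():
                result.extend(walk(v2, "{}['{}']".format(prefix, k)))
        elif isinstance(v, list):
            for i, v2 in enumerate(v):
                result.extend(walk(v2, "{}[{}]".format(prefix, i)))
        elif isinstance(v, str) and compiled.search(v):
            result.append(prefix)       # only matching string values are kept
        return result

    return walk(v, '')

# Hypothetical, shortened input used only for this illustration.
sample = {'id': 'abcde', 'key1': 'blah',
          'nestedlist': [{'keyW': 'blah blah'}, {'n': 7}]}

found = search_dict_re(sample, r"bl.h")
print(found)
# ["['key1']", "['nestedlist'][0]['keyW']"]
```

Non-string leaves such as the integer 7 are skipped, which matches the question's requirement to consider string values only.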