Python nested dict with lists, or list with dicts, to a list of flat dicts for CSV output

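Problem description

I tried searching for similar questions, but none of the ones I found were quite what I need.

I am trying to build a generic function that takes 2 arguments:

an object structure
a list of (nested) paths

and converts all the given paths into a list of flat dictionaries, suitable for output in CSV format.

For example, if I have a structure such as: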

structure = {
    "configs": [
        {
            "name": "config_name", "id": 1, "parameters": [
                {
                    "name": "param name", "description": "my description", "type": "mytype",
                }, {
                    "name": "test", "description": "description 2", "type": "myothertype", "somedata": [
                        'data', 'data2'
                    ]
                }
            ]
        }, {
            "name": "config_name2", "id": 2, "parameters": [
                {
                    "name": "param name", "description": "my description", "type": "mytype2", "somedata": [
                        'data', 'data2'
                    ]
                }, {
                    "name": "test", "description": "description 2", "type": "myothertype2",
                }
            ]
        }
    ]
}

and pass the following list of paths:

paths = [
    'configs.name',  # note that the list index is omitted (i.e. it should be 'configs.XXX.name', where XXX is the element index). This means I want the name entry of every dict in the list of configs
    'configs.0.id',  # similar to the above, but this time I want the ID only from the first config
    'configs.parameters.type'  # I want the type entry of every parameter of every config
]

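From this, the function should generate a list of flat dictionaries. Each entry in the list corresponds to one row of the CSV. Each flat dictionary contains all the selected paths.

For example, in this case I should see: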

result = [
    {"configs.name": "config_name", "configs.0.id": 1, "configs.parameters.type": "mytype"},
    {"configs.name": "config_name", "configs.0.id": 1, "configs.parameters.type": "myothertype"},
    {"configs.name": "config_name2", "configs.parameters.type": "mytype2"},
    {"configs.name": "config_name2", "configs.parameters.type": "myothertype2"}
]

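It needs to be able to do this for any structure passed in that contains nested dicts and lists.

Edit:

~~I tried @Ajax1234's code and it seems to have a bug: in some cases it returns twice as many elements as expected. The bug is demonstrated in the following code:~~

SOLVED: the problem was fixed by @Ajax1234's edit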

import pprint


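def get_val(d, rule, match=None, l_matches=[]):
    # Yield (index_trail, value) for every value reachable via `rule`.
    # `match` limits list traversal to the indices found by an earlier path;
    # an exhausted match ([]) no longer constrains anything.
    if not rule:
        yield (l_matches, d)
    elif isinstance(d, list):
        if rule[0].isdigit() and (not match or match[0] == int(rule[0])):
            yield from get_val(d[int(rule[0])], rule[1:], match=match if match is None else match[1:], l_matches=l_matches + [int(rule[0])])
        elif match is None or not rule[0].isdigit():
            for i, a in enumerate(d):
                if not match or i == match[0]:
                    yield from get_val(a, rule, match=match if match is None else match[1:], l_matches=l_matches + [i])
    else:
        yield from get_val(d[rule[0]], rule[1:], match=match, l_matches=l_matches)

def evaluate(paths, struct, val={}, rule=None):
    # Resolve the paths one at a time, merging each value into the row `val`.
    if not paths:
        yield val
    else:
        k = list(get_val(struct, paths[0].split('.'), match=rule))
        if k:
            for a, b in k:
                yield from evaluate(paths[1:], struct, val={**val, paths[0]: b}, rule=a)
        else:
            # No value for this path under the current index trail: skip it.
            yield from evaluate(paths[1:], struct, val=val, rule=rule)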
         
paths1 = ['configs.id','configs.parameters.name','configs.parameters.int-param'] # works as expected
paths2 = ['configs.parameters.name','configs.id','configs.parameters.int-param'] # prints everything twice

structure = {
    'configs': [
        {
            'id': 1,'name': 'declaration','parameters': [
                {
                    'int-param': 0,'description': 'decription1','name': 'name1','type': 'mytype1'
                },{
                    'int-param': 1,'description': 'description2','list-param': ['param0'],'name': 'name2','type': 'mytype2'
                }
            ]
        }
    ]
}

pprint.PrettyPrinter(2).pprint(list(evaluate(paths2,structure)))

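The output using the paths1 list is:

[ { 'configs.id': 1,
    'configs.parameters.int-param': 0,
    'configs.parameters.name': 'name1'},
  { 'configs.id': 1,
    'configs.parameters.int-param': 1,
    'configs.parameters.name': 'name2'}]

whereas paths2 produces:

[ { 'configs.id': 1,
    'configs.parameters.int-param': 0,
    'configs.parameters.name': 'name1'},
  { 'configs.id': 1,
    'configs.parameters.int-param': 1,
    'configs.parameters.name': 'name1'},
  { 'configs.id': 1,
    'configs.parameters.int-param': 0,
    'configs.parameters.name': 'name2'},
  { 'configs.id': 1,
    'configs.parameters.int-param': 1,
    'configs.parameters.name': 'name2'}]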

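Solution

You can build a lookup function (get_val) that searches for values according to your rules. In addition, this function accepts a list of valid indices (match), which tells it to traverse only the sublists whose index matches. This way, the search function can "learn" from previous searches and only return values located in the sublists that earlier searches have already identified: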

structure = {'configs': [{'name': 'config_name', 'id': 1, 'parameters': [{'name': 'param name', 'description': 'my description', 'type': 'mytype'}, {'name': 'test', 'description': 'description 2', 'type': 'myothertype', 'somedata': ['data', 'data2']}]}, {'name': 'config_name2', 'id': 2, 'parameters': [{'name': 'param name', 'description': 'my description', 'type': 'mytype2', 'somedata': ['data', 'data2']}, {'name': 'test', 'description': 'description 2', 'type': 'myothertype2'}]}]}
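def get_val(d, rule, match=None, l_matches=[]):
    # Yield (index_trail, value) for every value reachable via `rule`.
    # `match` limits list traversal to the indices found by an earlier path;
    # an exhausted match ([]) no longer constrains anything.
    if not rule:
        yield (l_matches, d)
    elif isinstance(d, list):
        if rule[0].isdigit() and (not match or match[0] == int(rule[0])):
            yield from get_val(d[int(rule[0])], rule[1:], match=match if match is None else match[1:], l_matches=l_matches + [int(rule[0])])
        elif match is None or not rule[0].isdigit():
            for i, a in enumerate(d):
                if not match or i == match[0]:
                    yield from get_val(a, rule, match=match if match is None else match[1:], l_matches=l_matches + [i])
    else:
        yield from get_val(d[rule[0]], rule[1:], match=match, l_matches=l_matches)

def evaluate(paths, struct, val={}, rule=None):
    # Resolve the paths one at a time, merging each value into the row `val`.
    if not paths:
        yield val
    else:
        k = list(get_val(struct, paths[0].split('.'), match=rule))
        if k:
            for a, b in k:
                yield from evaluate(paths[1:], struct, val={**val, paths[0]: b}, rule=a)
        else:
            # No value for this path under the current index trail: skip it.
            yield from evaluate(paths[1:], struct, val=val, rule=rule)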

paths = ['configs.name','configs.0.id','configs.parameters.type']
print(list(evaluate(paths,structure)))

Output:

[{'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'mytype'}, {'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'myothertype'}, {'configs.name': 'config_name2', 'configs.parameters.type': 'mytype2'}, {'configs.name': 'config_name2', 'configs.parameters.type': 'myothertype2'}]
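
Since the end goal is CSV output and some rows do not carry every path ('configs.0.id' is absent for the second config), one possible way to write such rows is csv.DictWriter with restval, which fills in missing columns. This is only a sketch: rows_to_csv is a hypothetical helper name, not part of the code above, and the rows are hard-coded here so the snippet runs on its own.

```python
import csv
import io

# Flat rows of the kind evaluate() yields; some rows lack 'configs.0.id'.
rows = [
    {'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'mytype'},
    {'configs.name': 'config_name', 'configs.0.id': 1, 'configs.parameters.type': 'myothertype'},
    {'configs.name': 'config_name2', 'configs.parameters.type': 'mytype2'},
    {'configs.name': 'config_name2', 'configs.parameters.type': 'myothertype2'},
]
paths = ['configs.name', 'configs.0.id', 'configs.parameters.type']


def rows_to_csv(rows, fieldnames):
    """Hypothetical helper: render flat dicts as CSV text, one line per dict."""
    buffer = io.StringIO(newline='')
    # restval is written for any column a row does not contain.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval='')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, paths)
print(csv_text)
```

Using the original paths list as fieldnames keeps the column order the caller asked for. To write to a file instead, open it with newline='' and pass the file object to csv.DictWriter in place of the StringIO buffer.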

Edit: it is best to sort the input paths by their depth in the tree:

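def get_depth(d, path, c=0):
    # Yield the depth at which each value matching `path` sits in `d`.
    if not path:
        yield c
    elif isinstance(d, dict) or path[0].isdigit():
        yield from get_depth(d[path[0] if isinstance(d, dict) else int(path[0])], path[1:], c + 1)
    else:
        # `d` is a list and the path gives no index: descend into every element.
        yield from [i for b in d for i in get_depth(b, path, c)]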

This function finds the depth in the tree at which the path's target value sits. Then, applied to the main code:

structure = {'configs': [{'id': 1,'name': 'declaration','parameters': [{'int-param': 0,'description': 'decription1','name': 'name1','type': 'mytype1'},{'int-param': 1,'description': 'description2','list-param': ['param0'],'name': 'name2','type': 'mytype2'}]}]}
paths1 = ['configs.id','configs.parameters.name','configs.parameters.int-param']
paths2 = ['configs.parameters.name','configs.id','configs.parameters.int-param']
print(list(evaluate(sorted(paths1, key=lambda x: max(get_depth(structure, x.split('.')))), structure)))
print(list(evaluate(sorted(paths2, key=lambda x: max(get_depth(structure, x.split('.')))), structure)))
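
To see why the ordering matters: evaluate hands the index trail found for one path to the next path, so a shallow path such as 'configs.id' that comes after a deeper one shortens that trail, and the following deep path is then matched against every parameter again. Sorting shallowest-first avoids this. The sketch below isolates just the sorting step; get_depth is restated so the snippet runs on its own, while sort_paths_by_depth and the trimmed-down sample structure are hypothetical names introduced for illustration.

```python
def get_depth(d, path, c=0):
    # Yield the depth at which each value matching `path` sits in `d`.
    if not path:
        yield c
    elif isinstance(d, dict) or path[0].isdigit():
        key = path[0] if isinstance(d, dict) else int(path[0])
        yield from get_depth(d[key], path[1:], c + 1)
    else:
        # `d` is a list and the path gives no index: descend into every element.
        for item in d:
            yield from get_depth(item, path, c)


def sort_paths_by_depth(paths, struct):
    """Hypothetical helper: shallowest paths first; ties keep their input order."""
    # sorted() is stable, so paths of equal depth are not reordered.
    return sorted(paths, key=lambda p: max(get_depth(struct, p.split('.'))))


# Trimmed-down sample structure (an assumption made for this example).
sample = {'configs': [{'id': 1, 'parameters': [{'name': 'name1', 'int-param': 0}]}]}
paths2 = ['configs.parameters.name', 'configs.id', 'configs.parameters.int-param']

ordered = sort_paths_by_depth(paths2, sample)
print(ordered)
# ['configs.id', 'configs.parameters.name', 'configs.parameters.int-param']
```

Note that max() raises ValueError if a path matches nothing (for example, when a list along the way is empty), so real input may need a default depth for such paths.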
