How do I split my string into a nested dictionary when the delimiter also appears inside values?

Problem description

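I need to split a string with .split and build a nested dictionary from it, and I used ',' as the delimiter. However, as the data below shows, ',' appears multiple times inside the "Reviewed" field, which causes Python to wrongly treat values as keys. The Reviewed field is a list inside the dictionary.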

A sample of my data:

{"Username": "bkpn1412","dob": "31.07.1983","State": "Oregon","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414","dob": "27.07.1998","State": "Massachusetts","Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434","dob": "08.08.1950","State": "Idaho","Reviewed": []}
{"Username": "hkxj1334","dob": "03.08.1969","State": "Florida","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412","dob": "26.07.2001","State": "Georgia","Reviewed": []}

My current code

#converting list to string using list comprehension
pdict = ' '.join([str(item) for item in products_list]) 
print(type(pdict))

rdict = ' '.join([str(item) for item in reviewers_list]) 
print(type(rdict))

#converting string to list of string
plist  = pdict.split(',')
rlist = rdict.split(',')
print(type(plist))
print(type(rlist))

#list of string to dict
products_dicts = {}
for item in plist:
    t = products_dicts
    for part in item.split(':'):
        t = t.setdefault(part,{})
print(type(products_dicts))

reviewers_dicts = {}
for item in rlist:
    t = reviewers_dicts
    for part in item.split(':'):
        t = t.setdefault(part,{})
print(type(reviewers_dicts))

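I have tried different delimiters, but none of them worked. How can I solve this, preferably without going through a large dataset and manually removing all the unwanted commas?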

The expected output should look something like:

{"Username": "bkpn1412","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}

{"Username": "hkxj1334","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}

Solution

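One way to solve this is to use json.loads from the json module in Python's standard library. Each line of your data is already valid JSON, so there is no need to split it by hand.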
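To see why splitting on ',' goes wrong while json.loads does not, here is a minimal sketch using one record copied from the question. The variable names are only illustrative.

```python
import json

# One record from the question, as a single line of text.
line = (
    '{"Username": "hkxj1334","dob": "03.08.1969","State": "Florida",'
    '"Reviewed": ["f129b1803f447c2b1ce43508fb822810",'
    '"3b0c9bc0be65a3461893488314236116"]}'
)

# Naive splitting cuts inside the "Reviewed" list as well as between fields,
# so the 4 fields come back as 5 pieces.
pieces = line.split(',')
print(len(pieces))

# json.loads understands the nesting, so the list stays in one piece.
record = json.loads(line)
print(record["Reviewed"])
```

The second id in the list ends up as its own piece after the split, which is why it later gets treated as a key.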

Suppose you have a file containing the input data:

inputdata.txt

{"Username": "bkpn1412","DOB": "31.07.1983","State": "Oregon","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414","DOB": "27.07.1998","State": "Massachusetts","Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}
{"Username": "hkxj1334","DOB": "03.08.1969","State": "Florida","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412","DOB": "26.07.2001","State": "Georgia","Reviewed": []}

A parser for this data would be:

import json
filename = "inputdata.txt"
with open(filename) as f:
    for line in f.readlines():
        parsed_data = json.loads(line)
        print(parsed_data)
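The expected output in the question keeps only the Username and Reviewed keys. A possible sketch of that filtering step follows; parse_records and its keep parameter are hypothetical names, not something from the question, and io.StringIO stands in for the real file so the snippet runs on its own.

```python
import io
import json

def parse_records(lines, keep=("Username", "Reviewed")):
    """Parse one JSON object per line and keep only the chosen keys."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines, which json.loads would reject
            continue
        data = json.loads(line)
        records.append({key: data[key] for key in keep if key in data})
    return records

# Stand-in for open("inputdata.txt"); any iterable of lines works.
sample = io.StringIO(
    '{"Username": "bkpn1412","DOB": "31.07.1983","State": "Oregon",'
    '"Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}\n'
    '{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}\n'
)
for record in parse_records(sample):
    print(record)
```

With a real file, you would pass the open file object instead of sample.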

Processing one line at a time (without loading the whole file into memory)

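The readlines call above reads every line into a list first. If you don't want to load the whole file into memory, you can change the logic to use the file object's readline method, which returns one line per call and an empty string at the end of the file.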

import json
filename = "inputdata.txt"
with open(filename) as f:
    line = f.readline()
    while line:
        parsed_data = json.loads(line)
        print(parsed_data)
        line = f.readline()    
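A file object can also be iterated directly, which reads one line at a time just like the readline loop. Here is a small sketch of that variant as a generator; iter_records is a hypothetical helper name, and io.StringIO stands in for the opened file.

```python
import io
import json

def iter_records(lines):
    """Yield one parsed record per non-empty line, reading lazily."""
    for line in lines:  # a file object yields one line per iteration
        if line.strip():
            yield json.loads(line)

# Stand-in for open("inputdata.txt") so the sketch runs anywhere.
fake_file = io.StringIO(
    '{"Username": "jjbd1412","DOB": "26.07.2001","State": "Georgia","Reviewed": []}\n'
    '{"Username": "gqjs4414","DOB": "27.07.1998","State": "Massachusetts",'
    '"Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}\n'
)
for record in iter_records(fake_file):
    print(record["Username"], len(record["Reviewed"]))
```

Because it is a generator, only the current line and its parsed record are held in memory at any time.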

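In this example we use the "with" context manager, which closes the file automatically when the block ends, even if an exception is raised. If you don't want to use the with keyword as a context manager, you must explicitly call the close() method once you have finished processing the file (to avoid resource leaks).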
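For completeness, here is a rough sketch of the explicit close() variant. It writes a small temporary file first so that it is self-contained; the temporary path and record content are made up for illustration.

```python
import json
import os
import tempfile

# Create a throwaway input file so the sketch does not depend on inputdata.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write('{"Username": "jjbd1412","Reviewed": []}\n')

f = open(path)
try:
    records = [json.loads(line) for line in f]
finally:
    f.close()  # runs even if json.loads raises

os.remove(path)
print(records)
print(f.closed)
```

The try/finally pair is what "with" does for you, which is why the with form is usually preferred.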

If you want to learn more about file handling, see the official Python documentation for the open function.