Problem description
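I need to split a string with .split and build a nested dictionary from it, using ',' as the delimiter.
However, as the data below shows, ',' appears more than once in the "Reviewed" field, which causes Python to wrongly treat values as keys. The Reviewed field is a list inside each dictionary.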
Here is a sample of my data:
{"Username": "bkpn1412","dob": "31.07.1983","State": "Oregon","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414","dob": "27.07.1998","State": "Massachusetts","Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434","dob": "08.08.1950","State": "Idaho","Reviewed": []}
{"Username": "hkxj1334","dob": "03.08.1969","State": "Florida","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412","dob": "26.07.2001","State": "Georgia","Reviewed": []}
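To see why splitting on ',' cannot work here, the sketch below splits one of the sample lines naively and compares the result with parsing the same line as JSON. It assumes each line is a self-contained JSON object, as the sample suggests.

```python
import json

# One line copied from the sample data above.
line = '{"Username": "hkxj1334","dob": "03.08.1969","State": "Florida","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}'

# Naive approach: the comma between the two review IDs is treated
# like a field separator, so the Reviewed list is cut in half.
pieces = line.split(',')
print(len(pieces))   # 5 pieces for only 4 fields
print(pieces[-1])    # the orphaned second half of the list

# A JSON parser knows that commas inside [...] separate list items.
record = json.loads(line)
print(len(record))          # 4 fields
print(record["Reviewed"])   # both IDs, still in one list
```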
My current code:
# converting list to string using list comprehension
pdict = ' '.join([str(item) for item in products_list])
print(type(pdict))
rdict = ' '.join([str(item) for item in reviewers_list])
print(type(rdict))

# converting string to list of strings
plist = pdict.split(',')
rlist = rdict.split(',')
print(type(plist))
print(type(rlist))

# list of strings to dict
products_dicts = {}
for item in plist:
    t = products_dicts
    for part in item.split(':'):
        t = t.setdefault(part, {})
print(type(products_dicts))

reviewers_dicts = {}
for item in rlist:
    t = reviewers_dicts
    for part in item.split(':'):
        t = t.setdefault(part, {})
print(type(reviewers_dicts))
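I have tried different delimiters, but none of them worked. How can I solve this, ideally without manually removing all the unwanted commas from what is a large dataset?
The expected output should look something like this: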
{"Username": "bkpn1412","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "hkxj1334","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}
Solution
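One way to solve this is to parse each line with json.loads from Python's standard json module instead of splitting on commas. Each line of the data is a complete JSON object, and a JSON parser knows that the commas inside [...] separate list items, not fields.
Suppose you have a file containing the input data: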
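A minimal sketch of what json.loads gives you for a single line: JSON objects become Python dicts and JSON arrays become lists, so an empty review list is simply []. The malformed string below is a made-up example, included only to show that the parser rejects bad input with json.JSONDecodeError instead of silently producing wrong keys.

```python
import json

good = '{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}'
record = json.loads(good)
# The object is a dict and the "Reviewed" array is a (here empty) list.
print(type(record).__name__, record["Reviewed"])

# Hypothetical malformed line: the list is never closed properly.
bad = '{"Username": "eehe1434", "Reviewed": [}'
error = None
try:
    json.loads(bad)
except json.JSONDecodeError as exc:
    # json.JSONDecodeError is a subclass of ValueError.
    error = exc
    print("invalid line:", exc.msg)
```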
inputdata.txt
{"Username": "bkpn1412","DOB": "31.07.1983","State": "Oregon","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414","DOB": "27.07.1998","State": "Massachusetts","Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}
{"Username": "hkxj1334","DOB": "03.08.1969","State": "Florida","Reviewed": ["f129b1803f447c2b1ce43508fb822810","3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412","DOB": "26.07.2001","State": "Georgia","Reviewed": []}
A parser for this data could look like this:
import json

filename = "inputdata.txt"
with open(filename) as f:
    for line in f.readlines():
        parsed_data = json.loads(line)
        print(parsed_data)
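The question's expected output keeps only Username and Reviewed. Once each line is a dict, that is a matter of picking keys rather than splitting strings. The sketch below assumes that goal; io.StringIO stands in for the real file so the example runs on its own, and username_and_reviews is a hypothetical helper name, not part of any library.

```python
import io
import json

# Stand-in for open("inputdata.txt"): two lines from the sample data.
sample = io.StringIO(
    '{"Username": "bkpn1412","DOB": "31.07.1983","State": "Oregon","Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}\n'
    '{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}\n'
)

def username_and_reviews(lines):
    """Hypothetical helper: yield one dict per JSON line with only Username and Reviewed."""
    for line in lines:
        if not line.strip():  # skip blank lines, e.g. a trailing empty line
            continue
        record = json.loads(line)
        yield {"Username": record["Username"], "Reviewed": record["Reviewed"]}

# Nested lookup: username -> {"Username": ..., "Reviewed": [...]}
reviewers = {row["Username"]: row for row in username_and_reviews(sample)}
for row in reviewers.values():
    print(json.dumps(row))
```

Keying the outer dict by Username gives the nested-dictionary shape the question was after, and the Reviewed lists stay intact.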
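Processing one line at a time (without loading the whole file into memory)
f.readlines() reads every line into a list before the loop starts. If you don't want to load the whole file into memory, you can change the logic to use the file object's readline() method, which reads a single line per call: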
import json

filename = "inputdata.txt"
with open(filename) as f:
    line = f.readline()
    while line:
        parsed_data = json.loads(line)
        print(parsed_data)
        line = f.readline()
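An alternative to the explicit readline() loop is to iterate over the file object itself, which also reads lazily, one line at a time. The sketch writes a small temporary file so it is self-contained; with real data you would open "inputdata.txt" instead.

```python
import json
import os
import tempfile

# Write a small sample file so the sketch runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('{"Username": "eehe1434","DOB": "08.08.1950","State": "Idaho","Reviewed": []}\n')
    tmp.write('{"Username": "jjbd1412","DOB": "26.07.2001","State": "Georgia","Reviewed": []}\n')
    path = tmp.name

usernames = []
# Iterating over the file object yields one line per step,
# without building a list of all lines first.
with open(path) as f:
    for line in f:
        parsed_data = json.loads(line)
        usernames.append(parsed_data["Username"])

os.remove(path)
print(usernames)
```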
In these examples we use the with statement as a context manager; for a good explanation of why to use it, check here.
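If you don't want to use the with keyword as a context manager, you must explicitly call the file's close() method once you are done processing it (to avoid a resource leak).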
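Without a with block, a try/finally is the usual way to make sure close() runs even if json.loads raises on a bad line. This is a sketch using a throwaway temporary file in place of "inputdata.txt".

```python
import json
import os
import tempfile

# Create a throwaway one-line file so the sketch is self-contained.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
out = open(path, "w")
try:
    out.write('{"Username": "gqjs4414","DOB": "27.07.1998","State": "Massachusetts","Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}\n')
finally:
    out.close()

# No with block: finally guarantees close() even if parsing fails.
f = open(path)
try:
    records = [json.loads(line) for line in f]
finally:
    f.close()

os.remove(path)
print(f.closed, records[0]["Username"])
```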
To learn more about file handling, see the official Python documentation for the open() function.