Problem description
Here are 4 JSON files:
[
{
"name": "Apple","year": "2014","rating": "21"
},{
"name": "Pear","year": "2003","rating": ""
},{
"name": "Pineapple","year": "1967","rating": "60"
}]
[
{
"name": "Pineapple","rating": "5.7"
},{
"name": "Apple","year": "1915","rating": "2.3"
},"rating": "3.7"
}
]
[
{
"name": "Apple","rating": "2.55"
}
]
[
{
"name": "APPLE","rating": "+4"
},{
"name": "LEMON","rating": "+3"
}
]
When you search for "Apple" across all 4 files, you want to return 1 name, 1 year, and 4 ratings:
name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating: 21 (from JSON1)
3.7 (from JSON2)
2.55 (from JSON3)
+4 (from JSON4)
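Now suppose JSON3 (or any of the JSONs) has no match for "name: Apple". In that case, return the following instead. Assume there is at least one match in at least one file.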
name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating: 21 (from JSON1)
3.7 (from JSON2)
Not Found (from JSON3)
+4 (from JSON4)
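How would you get this output in Python?
This question is similar to the example code in Python - Getting the intersection of two Json-Files, except that there are 4 files, 1 of the files is missing the year key, and we do not need the values of the rating key.
Here is what I have so far, which only works on the first two JSONs above: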
import json
with open('1.json','r') as f:
    json1 = json.load(f)
with open('2.json','r') as f:
    json2 = json.load(f)
json2[0]['name'] = list(set(json2[0]['name']) - set(json1[0]['name']))
print(json.dumps(json2,indent=2))
I get output from this, but it does not match what I am trying to achieve. For example, here is part of the output:
{
"name": [
"a","n","i","P"
],
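Solution
When you create a set with the set constructor, it expects an iterable and iterates over that object's values to build the set. So when you try to build a set directly from a string, you end up with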
name = set('Apple')
# name = {'A','p','l','e'}
because a string is an iterable of characters. Instead, you want to wrap the string in a list or tuple, like this
name = set(['Apple'])
# name = {'Apple'}
In your case, that would look like
json2[0]['name'] = list(set([json2[0]['name']]) - set([json1[0]['name']]))
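To make the difference concrete, here is a minimal sketch that hard-codes the two names from the first entries of JSON1 and JSON2 instead of loading 1.json and 2.json. The variable names are only for illustration.

```python
# Names taken from the first entry of JSON1 and JSON2 in the question.
name1 = "Apple"
name2 = "Pineapple"

# Passing the strings directly compares individual characters.
char_diff = set(name2) - set(name1)

# Wrapping each string in a list compares whole names.
name_diff = set([name2]) - set([name1])

print(sorted(char_diff))  # ['P', 'a', 'i', 'n']
print(name_diff)          # {'Pineapple'}
```

The first result contains the same four characters as the unexpected output shown in the question.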
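But I still don't think this is really what you are trying to achieve. Instead, I suggest iterating over each JSON file and building your own dictionary, indexed by the names found in the JSON files. Each value in that dictionary would be another dictionary with two keys, rating and year, each holding a list of values. Once the dictionary is built, you have a list of ratings and a list of years for every name, and you can then reduce each list of years to a single value by picking the most frequent year in it.
Here is an example of what the dictionary would look like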
{
"Apple": { "rating": [21,3.7,...],"year": [1915,2014,2014] }
"Pineapple": ...
...
}
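One possible way to build such a dictionary is sketched below. It is an illustration under several assumptions, not a definitive implementation: names are matched case-insensitively, each name keeps one rating slot per file (pre-filled with "Not Found"), a later match in the same file overwrites an earlier one, and empty ratings are ignored. The helper names build_index and lookup are hypothetical, and the sample data is modeled on the question rather than copied from it (the second Apple entry in the second file is an assumption). In practice each inner list would come from json.load.

```python
from collections import Counter
import difflib

def build_index(files):
    """Map each lower-cased name to its spellings, ratings (one per file) and years."""
    index = {}
    for i, entries in enumerate(files):
        for entry in entries:
            key = entry["name"].lower()
            slot = index.setdefault(
                key,
                {"name": [], "rating": ["Not Found"] * len(files), "year": []},
            )
            slot["name"].append(entry["name"])
            if entry.get("rating"):
                # Assumption: a later match in the same file overwrites an earlier one.
                slot["rating"][i] = entry["rating"]
            if entry.get("year"):
                slot["year"].append(entry["year"])
    return index

def lookup(index, term):
    """Return the closest spelling, the most common year and the per-file ratings."""
    term = term.strip()
    slot = index[term.lower()]  # raises KeyError if no file contains the name
    # cutoff=0.0 so the best spelling is returned even when it differs only by case.
    name = difflib.get_close_matches(term, slot["name"], n=1, cutoff=0.0)[0]
    year = Counter(slot["year"]).most_common(1)[0][0] if slot["year"] else None
    return {"name": name, "year": year, "rating": slot["rating"]}

# Illustrative data shaped like the four files in the question.
files = [
    [{"name": "Apple", "year": "2014", "rating": "21"},
     {"name": "Pear", "year": "2003", "rating": ""}],
    [{"name": "Apple", "year": "1915", "rating": "2.3"},
     {"name": "Apple", "year": "2014", "rating": "3.7"}],
    [{"name": "Pineapple", "rating": "2.55"}],  # no Apple here
    [{"name": "APPLE", "rating": "+4"},
     {"name": "LEMON", "rating": "+3"}],
]

result = lookup(build_index(files), "Apple")
print(result["name"])    # Apple
print(result["year"])    # 2014
print(result["rating"])  # ['21', '3.7', 'Not Found', '+4']
```

Keeping one rating slot per file is what makes the "Not Found" placeholder fall out naturally. If every rating from a file should be kept instead, each slot could hold a list rather than a single value.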