Get the intersection of 4 JSON files based on 1-2 common key values? (Python)

Problem description

Below are 4 JSON files:

  • 3 of the JSON files have 3 key fields: name, rating and year
  • 1 JSON file has only 2 key fields: name and rating (no year)
[
  {
    "name": "Apple","year": "2014","rating": "21"
  },{
    "name": "Pear","year": "2003","rating": ""
  },{
    "name": "Pineapple","year": "1967","rating": "60"
  }
]
[
  {
    "name": "Pineapple","rating": "5.7"
  },{
    "name": "Apple","year": "1915","rating": "2.3"
  },"rating": "3.7"
  }
]
[
  {
    "name": "Apple","rating": "2.55"
  }
]
[
  {
    "name": "APPLE","rating": "+4"
  },{
    "name": "LEMON","rating": "+3"
  }
]

When you search all 4 files for "Apple", you want to get back 1 name, 1 year and 4 ratings:

name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating:  21 (from JSON1)
        3.7 (from JSON2)
       2.55 (from JSON3)
         +4 (from JSON4)

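Now suppose JSON3 (or any of the JSON files) has no match for "name: Apple". In that case, return the following instead. Assume there is at least one match in at least one file.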

name: Apple (closest match to search term across all 4 files)
year: 2014 (the MOST COMMON year for Apple across first 3 JSONs)
rating:  21 (from JSON1)
        3.7 (from JSON2)
  Not Found (from JSON3)
         +4 (from JSON4)

How would you get this output in Python?

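This question is similar to the example code in "Python - Getting the intersection of two Json-Files", except that there are 4 files, 1 of the files is missing the year key, and we do not need to intersect on the values of the rating key.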

Here is what I have so far, which only handles two of the JSON files above:

import json

with open('1.json','r') as f:
  json1 = json.load(f)

with open('2.json','r') as f:
  json2 = json.load(f)

json2[0]['name'] = list(set(json2[0]['name']) - set(json1[0]['name']))

print(json.dumps(json2,indent=2))

This produces output, but not the output I am trying to get. For example, here is part of it:

  {
    "name": [
      "a","n","i","P"
    ],

Solution

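When you create a set with the set constructor, it expects an iterable and iterates over that object's values to build the set. So when you build a set directly from a string, you end up with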

name = set('Apple')
# name = {'A','p','l','e'}

because a string is an iterable of characters. Instead, you want to wrap the string in a list or tuple, like this

name = set(['Apple'])
# name = {'Apple'}

In your case that would look like

json2[0]['name'] = list(set([json2[0]['name']]) - set([json1[0]['name']]))
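To see the difference side by side, here is a small sketch. The variable names are made up for illustration; the two strings stand in for json1[0]['name'] and json2[0]['name'].

```python
# Stand-ins for json1[0]['name'] and json2[0]['name'] (illustrative names).
first = "Apple"
second = "Pineapple"

# Bare strings are iterated character by character, so this is a
# difference between two sets of letters.
by_char = set(second) - set(first)

# Wrapping each string in a list makes the whole name the set element.
by_name = set([second]) - set([first])

print(sorted(by_char))   # the leftover letters, as in the question's output
print(sorted(by_name))   # the whole name survives
```

The first result is the same four letters the question shows ("a", "n", "i", "P"), which is why the output looked like a scrambled name.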

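But I still don't think this is really what you are trying to achieve. Instead, I would suggest iterating over each JSON file and building your own dictionary, keyed by the names found in the files. Each value in that dictionary would be another dictionary with two keys, rating and year, each holding a list of values. Once the dictionary is built, you have a list of ratings and a list of years for every name, and you can reduce each year list to a single value by picking the most frequent year in it. Here is an example of what the dictionary would look like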

{
  "Apple": { "rating": [21,3.7,...],"year": [1915,2014,2014] },
  "Pineapple": ...
  ...
}
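
A minimal sketch of that approach follows, under a few assumptions that are mine rather than the question's: names are matched case-insensitively with str.casefold(), the first spelling seen is the one reported, and if a file contains several entries for the same name the last one supplies that file's rating. The ratings are kept in one slot per file (instead of a flat list) so that a missing file can be reported as "Not Found". The helper names build_index and lookup are hypothetical, and the inline data is illustrative; in practice each list would come from json.load().

```python
from collections import Counter

# Illustrative stand-ins for the four parsed files (not a copy of the question's data).
files = [
    [{"name": "Apple", "year": "2014", "rating": "21"},
     {"name": "Pear", "year": "2003", "rating": ""}],
    [{"name": "Apple", "year": "1915", "rating": "2.3"},
     {"name": "Apple", "year": "2014", "rating": "3.7"}],
    [{"name": "Pineapple", "year": "1967", "rating": "60"}],
    [{"name": "APPLE", "rating": "+4"}],   # this file has no year key
]

def build_index(files):
    """Map each case-folded name to its per-file ratings and every year seen."""
    index = {}
    for file_no, records in enumerate(files):
        for record in records:
            key = record["name"].casefold()   # assumption: case-insensitive match
            entry = index.setdefault(key, {
                "name": record["name"],          # first spelling seen
                "rating": [None] * len(files),   # None means no match in that file
                "year": [],
            })
            # Assumption: a later entry in the same file overwrites an earlier one.
            entry["rating"][file_no] = record.get("rating")
            if "year" in record:
                entry["year"].append(record["year"])
    return index

def lookup(index, term):
    """Return name, most common year and one rating per file, or None if unknown."""
    entry = index.get(term.casefold())
    if entry is None:
        return None
    years = Counter(entry["year"])
    return {
        "name": entry["name"],
        "year": years.most_common(1)[0][0] if years else None,
        "rating": [r if r is not None else "Not Found" for r in entry["rating"]],
    }

index = build_index(files)
result = lookup(index, "Apple")
print(result["name"], result["year"], result["rating"])
```

With the sample data above, the third file has no Apple entry, so its slot in the rating list comes back as "Not Found", and the year is the one that occurs most often among the Apple entries that have a year. If "closest match" should mean something looser than a case-insensitive comparison, the key function is the place to change it.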