Split a nested JSON file into two JSON files based on ID?

Problem description

I have a nested JSON file that I load as a Python dict named movies_data, as follows:

with open('project_folder/data_movie_absa.json') as infile:
    movies_data = json.load(infile)

It has the following structure:

{ "review_1": {"tokens": ["Best","show","ever","!"],"movie_user_4": {"aspects": ["O","B_A","O","O"],"sentiments": ["B_S","O"]},"movie_user_6": {"aspects": ["O","O"]}},"review_2": {"tokens": ["Its","a","great","show"],"movie_user_1": {"aspects": ["O","B_A"],"sentiments": ["O","B_S","review_3": {"tokens": ["I","love","this","actor","movie_user_17": {"aspects": ["O","movie_user_23": {"aspects": ["O","review_4": {"tokens": ["Bad","movie"],"O"]}} ... }

It has 3324 key-value pairs (i.e. up to key review_3224). I want to split this file into two JSON files (train_movies.json and test_movies.json) based on a specific list of keys:

test_IDS = ['review_2','review_4']
with open("train_movies.json","w",encoding="utf-8-sig") as outfile_train,open("test_movies.json","w",encoding="utf-8-sig") as outfile_test:
    for review_id,review in movies_data.items():
        if review_id in test_IDS:
            outfile = outfile_test
            outfile.write('{"%s": "%s"}' % (review_id,movies_data[review_id]))
        else:
            outfile = outfile_train
            outfile.write('{"%s": "%s"}' % (review_id,movies_data[review_id]))
    outfile.close()

For test_movies.json I get the following structure:

{"review_2": "{'tokens': ['Its','a','great','show'],'movie_user_4': {'aspects': ['O','O','B_A'],'sentiments': ['O','B_S','O']},'movie_user_6': {'aspects': ['O','O']}}"} {"review_4": "{'tokens': ['Bad','movie'],'movie_user_1': {'aspects': ['O','sentiments': ['B_S','O']}}"}

Unfortunately, this structure has some problems, such as inconsistent quotes (" vs. '), no commas between reviews, etc. As a result, I run into a problem when reading test_movies.json as a JSON file:

with open('project_folder/test_movies.json') as infile:
    testing_data = json.load(infile)

This call fails with an error.

The desired output should have a correct JSON structure, like the original movies_data, so that Python can read it correctly as a dict.

Could you help me correct my Python code?

Thanks in advance!

Solution

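Issue

  • The output string written to the file needs to be created with json.dumps (or written directly with json.dump).
  • Using Python string formatting, i.e. '{"%s": "%s"}' % (review_id,movies_data[review_id]), produces the problems you describe, because it inserts the dict's Python repr (single quotes) instead of JSON.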
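
To make the difference between the two approaches concrete, here is a minimal sketch. The review dict is a made-up sample shaped like the entries in the question, not the real data.

```python
import json

# Hypothetical sample review, shaped like the entries in the question.
review = {"tokens": ["Bad", "movie"], "movie_user_1": {"aspects": ["O", "B_A"]}}

# String formatting embeds Python's repr of the dict (single quotes)
# inside a JSON string value.
formatted = '{"%s": "%s"}' % ("review_4", review)

# json.dumps serializes the same data as real JSON (nested object, double quotes).
dumped = json.dumps({"review_4": review})

parsed_formatted = json.loads(formatted)  # parses, but the value is a str
parsed_dumped = json.loads(dumped)        # the value is a dict again

print(type(parsed_formatted["review_4"]).__name__)
print(parsed_dumped["review_4"] == review)

# Writing several such objects back to back, as the loop in the question does,
# yields a document that is not valid JSON as a whole.
try:
    json.loads(formatted + formatted)
    concatenated_ok = True
except json.JSONDecodeError:
    concatenated_ok = False
print(concatenated_ok)
```

So a single formatted entry happens to parse, but the review comes back as a string rather than a dict, and several entries written one after another cannot be loaded at all.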

Code

import json

# movies_data and test_IDS are defined as in the question.
train,test = {},{}   # Dictionaries for storing training and test data
for review_id,review in movies_data.items():
    if review_id in test_IDS:
        test[review_id] = review
    else:
        train[review_id] = review

# Output test data
with open("test_movies.json","w") as outfile_test:
    json.dump(test,outfile_test)

# Output training data
with open("train_movies.json","w") as outfile_train:
    json.dump(train,outfile_train)
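
The same split can also be wrapped in a small helper. This is only a sketch: split_reviews is a hypothetical name, and the sample dict stands in for the real movies_data. Converting the ID list to a set is an optional choice that keeps membership checks cheap when there are thousands of reviews.

```python
import json

def split_reviews(data, test_ids):
    """Split a dict of reviews into (train, test) by review ID."""
    test_ids = set(test_ids)  # set lookup instead of scanning a list each time
    train = {k: v for k, v in data.items() if k not in test_ids}
    test = {k: v for k, v in data.items() if k in test_ids}
    return train, test

# Hypothetical stand-in for the real movies_data.
movies_data = {
    "review_1": {"tokens": ["Best", "show", "ever", "!"]},
    "review_2": {"tokens": ["Its", "a", "great", "show"]},
    "review_3": {"tokens": ["I", "love", "this", "actor"]},
    "review_4": {"tokens": ["Bad", "movie"]},
}

train, test = split_reviews(movies_data, ["review_2", "review_4"])
print(sorted(train))
print(sorted(test))

# Each part serializes to valid JSON text with json.dumps.
test_json = json.dumps(test)
```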

Result

Input: contents of test.json

{ "review_1": {"tokens": ["Best","show","ever","!"],"movie_user_4": {"aspects": ["O","B_A","O","O"],"sentiments": ["B_S","O"]},"movie_user_6": {"aspects": ["O","O"]}},"review_2": {"tokens": ["Its","a","great","show"],"movie_user_1": {"aspects": ["O","B_A"],"sentiments": ["O","B_S","review_3": {"tokens": ["I","love","this","actor","movie_user_17": {"aspects": ["O","movie_user_23": {"aspects": ["O","review_4": {"tokens": ["Bad","movie"],"O"]}}

}

Output: contents of test_movies.json

{"review_2": {"tokens": ["Its","O"]}}}

Output: contents of train_movies.json

{"review_1": {"tokens": ["Best","O"]}}}
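
To confirm that files written this way load back as dicts, a round-trip check along these lines can help. It is a sketch under assumptions: the sample data is made up, and the files go to a temporary directory rather than the project folder.

```python
import json
import os
import tempfile

# Hypothetical stand-in for the real movies_data.
movies_data = {
    "review_1": {"tokens": ["Best", "show", "ever", "!"]},
    "review_2": {"tokens": ["Its", "a", "great", "show"]},
}
test_ids = {"review_2"}

train = {k: v for k, v in movies_data.items() if k not in test_ids}
test = {k: v for k, v in movies_data.items() if k in test_ids}

with tempfile.TemporaryDirectory() as tmp:
    train_path = os.path.join(tmp, "train_movies.json")
    test_path = os.path.join(tmp, "test_movies.json")

    # Write both parts with json.dump.
    with open(train_path, "w", encoding="utf-8") as f:
        json.dump(train, f)
    with open(test_path, "w", encoding="utf-8") as f:
        json.dump(test, f)

    # Read them back with json.load.
    with open(train_path, encoding="utf-8") as f:
        train_loaded = json.load(f)
    with open(test_path, encoding="utf-8") as f:
        test_loaded = json.load(f)

# Together the two files should reproduce the original dict.
merged = {**train_loaded, **test_loaded}
print(merged == movies_data)
```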