Problem description
I have a nested JSON file that is loaded as a Python dict named movies_data, as follows:
with open('project_folder/data_movie_absa.json') as infile:
    movies_data = json.load(infile)
It has the following structure:
{ "review_1": {"tokens": ["Best","show","ever","!"],"movie_user_4": {"aspects": ["O","B_A","O","O"],"sentiments": ["B_S","O"]},"movie_user_6": {"aspects": ["O","O"]}},"review_2": {"tokens": ["Its","a","great","show"],"movie_user_1": {"aspects": ["O","B_A"],"sentiments": ["O","B_S","review_3": {"tokens": ["I","love","this","actor","movie_user_17": {"aspects": ["O","movie_user_23": {"aspects": ["O","review_4": {"tokens": ["Bad","movie"],"O"]}}
...
}
It has 3324 key-value pairs (i.e. up to key review_3224). I want to split this file into two JSON files (train_movies.json and test_movies.json) based on a specific list of keys:
test_IDS = ['review_2','review_4']
with open("train_movies.json","w",encoding="utf-8-sig") as outfile_train,open("test_movies.json","w",encoding="utf-8-sig") as outfile_test:
    for review_id,review in movies_data.items():
        if review_id in test_IDS:
            outfile = outfile_test
            outfile.write('{"%s": "%s"}' % (review_id,movies_data[review_id]))
        else:
            outfile = outfile_train
            outfile.write('{"%s": "%s"}' % (review_id,movies_data[review_id]))
    outfile.close()
For test_movies.json I get the following structure:
{"review_2": "{'tokens': ['Its','a','great','show'],'movie_user_4': {'aspects': ['O','O','B_A'],'sentiments': ['O','B_S','O']},'movie_user_6': {'aspects': ['O','O']}}"}
{"review_4": "{'tokens': ['Bad','movie'],'movie_user_1': {'aspects': ['O','sentiments': ['B_S','O']}}"}
Unfortunately, this structure has some problems, such as inconsistent quotes (" vs. '), no commas between reviews, etc. As a result, reading test_movies.json as a JSON file raises an error:
with open('project_folder/test_movies.json') as infile:
    testing_data = json.load(infile)
The desired output should have a correct JSON structure, like the original movies_data, so that Python can read it correctly as a dict.
Can you help me correct my Python code?
Thanks in advance!
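Solution
Problem
- The output written to the file needs to be produced by the json module (json.dumps for a string, or json.dump to write straight to the file).
- Using Python string formatting, i.e. '{"%s": "%s"}' % (review_id,movies_data[review_id]), produces the problems you describe: the dict is inserted as its Python repr (single quotes) and wrapped in a string, and each write emits a separate top-level object.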
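The two points above can be checked on a single review. This is a minimal sketch; the `review` value is a trimmed, illustrative entry rather than the real data, and the variable names are made up for the demonstration.

```python
import json

# Illustrative, trimmed review entry (not the full data from the question).
review = {"tokens": ["Bad", "movie"]}

# %-formatting calls str() on the dict, which yields Python repr syntax
# (single quotes); the surrounding "%s" then wraps it in a JSON string.
broken = '{"%s": "%s"}' % ("review_4", review)

# json.dumps serializes the whole mapping as real JSON.
valid = json.dumps({"review_4": review})

print(broken)
print(valid)

# The formatted line happens to parse here, but the value comes back
# as a plain string instead of a nested object.
parsed_broken = json.loads(broken)
parsed_valid = json.loads(valid)
print(type(parsed_broken["review_4"]).__name__)
print(type(parsed_valid["review_4"]).__name__)

# Two objects written back to back do not form one JSON document.
try:
    json.loads(broken + broken)
    concatenated_ok = True
except json.JSONDecodeError:
    concatenated_ok = False
print(concatenated_ok)
```

Building one dict per output file and serializing it once avoids both issues, which is the approach the code in this answer takes.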
Code
import json

# movies_data and test_IDS are defined as in the question
train,test = {},{}  # Dictionaries for storing training and test data
for review_id,review in movies_data.items():
    if review_id in test_IDS:
        test[review_id] = review
    else:
        train[review_id] = review
# Output test
with open("test_movies.json","w") as outfile_test:
    json.dump(test,outfile_test)
# Output training
with open("train_movies.json","w") as outfile_train:
    json.dump(train,outfile_train)
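If the split is needed in more than one place, the loop can be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: `split_by_keys` is a hypothetical name, and `sample` is a trimmed stand-in for movies_data. Converting the ID list to a set is an optional choice that keeps each membership test cheap.

```python
def split_by_keys(data, test_ids):
    """Return (train, test) dicts; keys listed in test_ids go to test."""
    test_ids = set(test_ids)  # set lookup instead of scanning a list
    test = {key: value for key, value in data.items() if key in test_ids}
    train = {key: value for key, value in data.items() if key not in test_ids}
    return train, test


# Trimmed, illustrative stand-in for movies_data.
sample = {
    "review_1": {"tokens": ["Best", "show", "ever", "!"]},
    "review_2": {"tokens": ["Its", "a", "great", "show"]},
    "review_4": {"tokens": ["Bad", "movie"]},
}

train, test = split_by_keys(sample, ["review_2", "review_4"])
print(sorted(train))  # ['review_1']
print(sorted(test))   # ['review_2', 'review_4']
```

Every key ends up in exactly one of the two dicts, so the pair can be passed to json.dump as in the code above.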
Result
Input: contents of test.json
{ "review_1": {"tokens": ["Best","show","ever","!"],"movie_user_4": {"aspects": ["O","B_A","O","O"],"sentiments": ["B_S","O"]},"movie_user_6": {"aspects": ["O","O"]}},"review_2": {"tokens": ["Its","a","great","show"],"movie_user_1": {"aspects": ["O","B_A"],"sentiments": ["O","B_S","review_3": {"tokens": ["I","love","this","actor","movie_user_17": {"aspects": ["O","movie_user_23": {"aspects": ["O","review_4": {"tokens": ["Bad","movie"],"O"]}}
}
Output: contents of test_movies.json
{"review_2": {"tokens": ["Its","O"]}}}
Output: contents of train_movies.json
{"review_1": {"tokens": ["Best","O"]}}}