KeyError: "None of ['index'] are in the columns"

Problem description

Here is a JSON file:

{
    "id": "68af48116a252820a1e103727003d1087cb21a32","article": [
        "by mark duell .","published : .","05:58 est,10 september 2012 .","| .","updated : .","07:38 est,"a pet owner starved her two dogs so badly that one was forced to eat part of his mother 's dead body in a desperate attempt to survive .","the mother died a ` horrendous ' death and both were in a terrible state when found after two weeks of starvation earlier this year at the home of katrina plumridge,31,in grimsby,lincolnshire .","the barely-alive dog was ` shockingly thin ' and the house had a ` nauseating and overpowering ' stench,grimsby magistrates court heard .","warning : graphic content .","horrendous : the male dog,scrappy -lrb- right -rrb-,was so badly emaciated that he ate the body of his mother ronnie -lrb- centre -rrb- to try to survive at the home of katrina plumridge in grimsby,"the suffering was so serious that the female staffordshire bull terrier,named ronnie,died of starvation,nigel burn,prosecuting,told the court last friday .","suspended jail term : the dogs were in a terrible state when found after two weeks of starvation at the home of katrina plumridge,31 -lrb- pictured -rrb- .","the male dog,her son scrappy,was so badly emaciated that he ate her body to try to survive .",],"abstract": [
        "neglect by katrina plumridge saw staffordshire bull terrier ronnie die .","dog 's son scrappy was forced to eat her to survive at grimsby house .","alarm raised by letting agent shocked by ` thinnest dog he 'd ever seen '",]
}

I ran df = pd.read_json('100252.json'), but it fails with: ValueError: arrays must all be same length

Then I tried

with open('100252.json') as json_data: 
    data = json.load(json_data) 

pd.DataFrame.from_dict(data,orient='index').T.set_index('index')

But I get the error KeyError: "None of ['index'] are in the columns"

How can I fix this? I can't tell where my mistake is, which is why I'm asking for help.

Edit

Source: https://huggingface.co/docs/datasets/loading_datasets.html

I want to do something similar to this example from that page:

>>> from datasets import Dataset
>>> import pandas as pd
>>> df = pd.DataFrame({"a": [1,2,3]})
>>> dataset = Dataset.from_pandas(df)

I need to load the JSON file into a DataFrame and then use the datasets library to build a Dataset from pandas.

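Solution

The input to Dataset must be a dict whose values are lists of the same length. So you can either:

  1. Join the sentences into a single string and wrap it in a one-element list.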
from datasets import Dataset
with open('100252.json') as json_data: 
    data = json.load(json_data)

data['id'] = [data['id']]
data['article'] = ["\n".join(data['article'])]
data['abstract'] = ["\n".join(data['abstract'])]

Dataset.from_dict(data)

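Your dataset will contain a single row.

  2. Align the lists, for example by padding the shorter one with empty strings (starting again from the freshly loaded data, not the dict modified in option 1).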
max_len = max([len(data[col]) for col in ['article','abstract'] ])

data['id'] = [data['id']] * max_len
data['article'] = data['article'] + [""] * (max_len - len(data['article'])) 
data['abstract'] = data['abstract'] + [""] * (max_len - len(data['abstract'])) 
Dataset.from_dict(data)
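
The two options above can also be written as small helpers that return a new dict instead of modifying `data` in place. This is only a sketch: the function names `to_single_row` and `to_padded_rows` and the tiny `record` are made up for illustration, and the code stops at building the dict, which would then be handed to `Dataset.from_dict` as in the snippets above.

```python
def to_single_row(record):
    """Option 1: one row, with each list of sentences joined by newlines."""
    return {
        "id": [record["id"]],
        "article": ["\n".join(record["article"])],
        "abstract": ["\n".join(record["abstract"])],
    }


def to_padded_rows(record, fill=""):
    """Option 2: one row per sentence, padding the shorter list with `fill`."""
    cols = ["article", "abstract"]
    max_len = max(len(record[c]) for c in cols)
    out = {"id": [record["id"]] * max_len}
    for c in cols:
        out[c] = record[c] + [fill] * (max_len - len(record[c]))
    return out


# Hypothetical record with the same shape as the JSON file in the question.
record = {
    "id": "abc",
    "article": ["s1 .", "s2 .", "s3 ."],
    "abstract": ["a1 ."],
}

single = to_single_row(record)
padded = to_padded_rows(record)

# Every column has the same length in both results.
print({k: len(v) for k, v in single.items()})
print({k: len(v) for k, v in padded.items()})

# Either dict could then be passed on, e.g. Dataset.from_dict(single).
```

Because the helpers leave `record` untouched, both variants can be built from the same loaded JSON and compared before choosing one.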
