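Problem description
I ran into a problem: I was using JupyterLab, running under Anaconda3, to load data into pandas when my VM suddenly crashed. After restarting it, I found that my code no longer works for some reason. Here is my code: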
awsc = pd.DataFrame()
json_pattern = os.path.join('logs_old/AWSCloudtrailLog/', '*')
file_list = glob.glob(json_pattern)
for file in file_list:
    data = pd.read_json(file, lines=True)
    awsc = awsc.append(data, ignore_index=True)

awsc = pd.concat([awsc, pd.json_normalize(awsc['userIdentity'])], axis=1).drop('userIdentity', 1)
awsc.rename(columns={'type': 'userIdentity_type',
                     'principalId': 'userIdentity_principalId',
                     'arn': 'userIdentity_arn',
                     'accountId': 'userIdentity_accountId',
                     'accessKeyId': 'userIdentity_accessKeyId',
                     'userName': 'userIdentity_userName'}, inplace=True)
When I run the code, it gives me a KeyError like this:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self,key,method,tolerance)
2888 try:
-> 2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
KeyError: 'userIdentity'
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-9-efd1d2e600a5> in <module>
1 # unpack nested json
2
----> 3 awsc = pd.concat([awsc,pd.json_normalize(awsc['userIdentity'])],axis=1).drop('userIdentity',1)
4 awsc.rename(columns={'type':'userIdentity_type',
5 'principalId':'userIdentity_principalId',
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/frame.py in __getitem__(self,key)
2900 if self.columns.nlevels > 1:
2901 return self._getitem_multilevel(key)
-> 2902 indexer = self.columns.get_loc(key)
2903 if is_integer(indexer):
2904 indexer = [indexer]
~/anaconda3/envs/environment/lib/python3.8/site-packages/pandas/core/indexes/base.py in get_loc(self,key,method,tolerance)
2889 return self._engine.get_loc(casted_key)
2890 except KeyError as err:
-> 2891 raise KeyError(key) from err
2892
2893 if tolerance is not None:
KeyError: 'userIdentity'
Output for the dataframe awsc when I run print(awsc.info()):
<class 'pandas.core.frame.DataFrame'>
Index: 0 entries
Empty DataFrameNone
Is there a way to fix this? Is the problem caused by pandas or by Anaconda?
Solution
Using the code from the OP
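- The way the dataframe is created is not correct, and awsc ends up empty. An empty dataframe has no 'userIdentity' column, which is what raises the KeyError.
- Without seeing the files, it is not possible to know whether pd.read_json(file, lines=True) is the correct way to read them.
- pd.json_normalize(awsc['userIdentity']) works on a column of dicts. It is quite likely, though, that the column contains strings.
  - If the dicts are of str type, use ast.literal_eval to convert them to dict type.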
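A quick way to confirm why the lookup fails is sketched below. It assumes the dataframe is empty because glob.glob matched no files (for example, if the notebook's working directory changed after the restart; that is a guess, not something the question confirms). list_log_files is a hypothetical helper introduced for illustration, not part of pandas.

```python
import glob
import os

import pandas as pd


def list_log_files(directory):
    """Return the sorted files in `directory`, failing loudly if there are none.

    Hypothetical helper: it only wraps glob.glob with an explicit check.
    """
    pattern = os.path.join(directory, '*')
    files = sorted(f for f in glob.glob(pattern) if os.path.isfile(f))
    if not files:
        raise FileNotFoundError(
            f'no files match {pattern!r} (current directory: {os.getcwd()})'
        )
    return files


# An empty dataframe has no columns, so any column lookup raises KeyError.
empty = pd.DataFrame()
try:
    empty['userIdentity']
except KeyError as err:
    print('KeyError:', err)
```

Calling list_log_files('logs_old/AWSCloudtrailLog/') before the loop would then stop with a clear message instead of silently producing an empty dataframe.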
import pandas as pd
from ast import literal_eval

# file_list is the list of paths built with glob in the OP

# create a list to add dataframes to
awsc_list = list()

# iterate through the list of files and append each dataframe to awsc_list
for file in file_list:
    awsc_list.append(pd.read_json(file, lines=True))

# concat the files into a single dataframe
awsc = pd.concat(awsc_list).reset_index(drop=True)

# convert the userIdentity column to dict type, if it contains str type
awsc['userIdentity'] = awsc['userIdentity'].apply(lambda x: literal_eval(x) if isinstance(x, str) else x)

# normalize userIdentity
normalized = pd.json_normalize(awsc['userIdentity'], sep='_')

# join awsc with normalized and drop the userIdentity column
awsc = awsc.join(normalized).drop(columns='userIdentity')

# rename the columns
awsc.rename(columns={'type': 'userIdentity_type',
                     'principalId': 'userIdentity_principalId',
                     'arn': 'userIdentity_arn',
                     'accountId': 'userIdentity_accountId',
                     'accessKeyId': 'userIdentity_accessKeyId',
                     'userName': 'userIdentity_userName'}, inplace=True)
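The conversion and normalization steps can be tried on a small in-memory frame first. This is a minimal sketch with made-up rows (the user Bob and the mixed str/dict column are assumptions for illustration), and to_dict is a hypothetical helper. It calls literal_eval only on str values, because literal_eval raises a ValueError when handed an actual dict, and it uses add_prefix in place of a manual rename.

```python
from ast import literal_eval

import pandas as pd

# Illustrative data: one userIdentity stored as a str, one already a dict.
df = pd.DataFrame({
    'eventName': ['StartInstances', 'StopInstances'],
    'userIdentity': [
        "{'type': 'IAMUser', 'userName': 'Alice'}",
        {'type': 'IAMUser', 'userName': 'Bob'},
    ],
})


def to_dict(value):
    """Parse str values into dicts; leave everything else unchanged."""
    return literal_eval(value) if isinstance(value, str) else value


df['userIdentity'] = df['userIdentity'].apply(to_dict)

# Flatten the dicts and prefix the new columns in one step.
normalized = pd.json_normalize(df['userIdentity'].tolist()).add_prefix('userIdentity_')

# Replace the nested column with the flattened columns.
result = df.drop(columns='userIdentity').join(normalized)
print(result.columns.tolist())
```

Prefixing the normalized columns before the join also avoids a name clash if the log already has a top-level column such as type.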
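New code with sample data
- The keys already have the correct names, so no renaming is needed.
- Use pd.json_normalize to read the logs; 'userIdentity' is normalized in the same call, so a second step is not needed.
- See also: Splitting dictionary/list inside a Pandas Column into Separate Columns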
import json
import pandas as pd

# create a list to add dataframes to
awsc_list = list()

# list of files
files_list = ['test.json', 'test2.json']

# read the files
for file in files_list:
    with open(file, 'r', encoding='utf-8') as f:
        data = json.load(f)

    # normalize the file and append it to the list of dataframes
    awsc_list.append(pd.json_normalize(data, 'Records', sep='_'))

# concat the files into a single dataframe
awsc = pd.concat(awsc_list).reset_index(drop=True)
# display(awsc)
| | eventVersion | eventTime | eventSource | eventName | awsRegion | sourceIPAddress | userAgent | userIdentity_type | userIdentity_principalId | userIdentity_arn | userIdentity_accessKeyId | userIdentity_accountId | userIdentity_userName | requestParameters_instancesSet_items | responseElements_instancesSet_items | requestParameters_force | userIdentity_sessionContext_attributes_mfaAuthenticated | userIdentity_sessionContext_attributes_creationDate | requestParameters_keyName | responseElements_keyName | responseElements_keyFingerprint | responseElements_keyMaterial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 2014-03-06T21:22:54Z | ec2.amazonaws.com | StartInstances | us-east-2 | 205.251.233.176 | ec2-api-tools 1.6.12.2 | IAMUser | EX_PRINCIPAL_ID | arn:aws:iam::123456789012:user/Alice | EXAMPLE_KEY_ID | 123456789012 | Alice | [{'instanceId': 'i-ebeaf9e2'}] | [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 0, 'name': 'pending'}, 'previousState': {'code': 80, 'name': 'stopped'}}] | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1.0 | 2014-03-06T21:01:59Z | ec2.amazonaws.com | StopInstances | us-east-2 | 205.251.233.176 | ec2-api-tools 1.6.12.2 | IAMUser | EX_PRINCIPAL_ID | arn:aws:iam::123456789012:user/Alice | EXAMPLE_KEY_ID | 123456789012 | Alice | [{'instanceId': 'i-ebeaf9e2'}] | [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 64, 'name': 'stopping'}, 'previousState': {'code': 16, 'name': 'running'}}] | False | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1.0 | 2014-03-06T17:10:34Z | ec2.amazonaws.com | CreateKeyPair | us-east-2 | 72.21.198.64 | EC2ConsoleBackend,aws-sdk-java/Linux/x.xx.fleetxen Java_HotSpot(TM)_64-Bit_Server_VM/xx | IAMUser | EX_PRINCIPAL_ID | arn:aws:iam::123456789012:user/Alice | EXAMPLE_KEY_ID | 123456789012 | Alice | NaN | NaN | NaN | false | 2014-03-06T15:15:06Z | mykeypair | mykeypair | 30:1d:46:d0:5b:ad:7e:1b:b6:70:62:8b:ff:38:b5:e9:ab:5d:b8:21 | <sensitiveDataRemoved> |
| 3 | 1.0 | 2014-03-06T21:22:54Z | ec2.amazonaws.com | StartInstances | us-east-2 | 205.251.233.176 | ec2-api-tools 1.6.12.2 | IAMUser | EX_PRINCIPAL_ID | arn:aws:iam::123456789012:user/Alice | EXAMPLE_KEY_ID | 123456789012 | Alice | [{'instanceId': 'i-ebeaf9e2'}] | [{'instanceId': 'i-ebeaf9e2', 'currentState': {'code': 0, 'name': 'pending'}, 'previousState': {'code': 80, 'name': 'stopped'}}] | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
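pd.json_normalize with a record path accepts either a single dict or a list of dicts, which is why the same call handles both sample files. A minimal sketch with trimmed-down, in-memory records (the data below is illustrative, not the full sample files, and the variable names are assumptions):

```python
import pandas as pd

# A single JSON object with a 'Records' list, like test2.json.
single = {
    'Records': [
        {'eventName': 'StartInstances',
         'userIdentity': {'type': 'IAMUser', 'userName': 'Alice'}},
    ]
}

# A list of such objects, like test.json.
listed = [
    single,
    {'Records': [
        {'eventName': 'StopInstances',
         'userIdentity': {'type': 'IAMUser', 'userName': 'Alice'},
         'requestParameters': {'force': False}},
    ]},
]

# The same call works for both shapes; nested dicts become prefixed columns.
from_single = pd.json_normalize(single, 'Records', sep='_')
from_list = pd.json_normalize(listed, 'Records', sep='_')

# Keys missing from a record show up as NaN after concatenation.
combined = pd.concat([from_single, from_list], ignore_index=True)
print(combined)
```

Lists nested inside a record (such as instancesSet items) are not expanded by this call; they stay as Python lists in a single column, as in the output above.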
Sample data
test.json
- A list of JSON objects
[{
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser", "principalId": "EX_PRINCIPAL_ID", "arn": "arn:aws:iam::123456789012:user/Alice", "accessKeyId": "EXAMPLE_KEY_ID", "accountId": "123456789012", "userName": "Alice"
        },
        "eventTime": "2014-03-06T21:22:54Z", "eventSource": "ec2.amazonaws.com", "eventName": "StartInstances", "awsRegion": "us-east-2", "sourceIPAddress": "205.251.233.176", "userAgent": "ec2-api-tools 1.6.12.2",
        "requestParameters": {
            "instancesSet": {
                "items": [{"instanceId": "i-ebeaf9e2"}]
            }
        },
        "responseElements": {
            "instancesSet": {
                "items": [{
                    "instanceId": "i-ebeaf9e2",
                    "currentState": {"code": 0, "name": "pending"},
                    "previousState": {"code": 80, "name": "stopped"}
                }]
            }
        }
    }]
}, {
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser", "principalId": "EX_PRINCIPAL_ID", "arn": "arn:aws:iam::123456789012:user/Alice", "accessKeyId": "EXAMPLE_KEY_ID", "accountId": "123456789012", "userName": "Alice"
        },
        "eventTime": "2014-03-06T21:01:59Z", "eventSource": "ec2.amazonaws.com", "eventName": "StopInstances", "awsRegion": "us-east-2", "sourceIPAddress": "205.251.233.176", "userAgent": "ec2-api-tools 1.6.12.2",
        "requestParameters": {
            "instancesSet": {
                "items": [{"instanceId": "i-ebeaf9e2"}]
            },
            "force": false
        },
        "responseElements": {
            "instancesSet": {
                "items": [{
                    "instanceId": "i-ebeaf9e2",
                    "currentState": {"code": 64, "name": "stopping"},
                    "previousState": {"code": 16, "name": "running"}
                }]
            }
        }
    }]
}, {
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser", "principalId": "EX_PRINCIPAL_ID", "arn": "arn:aws:iam::123456789012:user/Alice", "accessKeyId": "EXAMPLE_KEY_ID", "accountId": "123456789012", "userName": "Alice",
            "sessionContext": {
                "attributes": {
                    "mfaAuthenticated": "false", "creationDate": "2014-03-06T15:15:06Z"
                }
            }
        },
        "eventTime": "2014-03-06T17:10:34Z", "eventSource": "ec2.amazonaws.com", "eventName": "CreateKeyPair", "awsRegion": "us-east-2", "sourceIPAddress": "72.21.198.64", "userAgent": "EC2ConsoleBackend,aws-sdk-java/Linux/x.xx.fleetxen Java_HotSpot(TM)_64-Bit_Server_VM/xx",
        "requestParameters": {
            "keyName": "mykeypair"
        },
        "responseElements": {
            "keyName": "mykeypair", "keyFingerprint": "30:1d:46:d0:5b:ad:7e:1b:b6:70:62:8b:ff:38:b5:e9:ab:5d:b8:21", "keyMaterial": "\u003csensitiveDataRemoved\u003e"
        }
    }]
}]
test2.json
- A single JSON object
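{
    "Records": [{
        "eventVersion": "1.0",
        "userIdentity": {
            "type": "IAMUser", "principalId": "EX_PRINCIPAL_ID", "arn": "arn:aws:iam::123456789012:user/Alice", "accessKeyId": "EXAMPLE_KEY_ID", "accountId": "123456789012", "userName": "Alice"
        },
        "eventTime": "2014-03-06T21:22:54Z", "eventSource": "ec2.amazonaws.com", "eventName": "StartInstances", "awsRegion": "us-east-2", "sourceIPAddress": "205.251.233.176", "userAgent": "ec2-api-tools 1.6.12.2",
        "requestParameters": {
            "instancesSet": {
                "items": [{"instanceId": "i-ebeaf9e2"}]
            }
        },
        "responseElements": {
            "instancesSet": {
                "items": [{
                    "instanceId": "i-ebeaf9e2",
                    "currentState": {"code": 0, "name": "pending"},
                    "previousState": {"code": 80, "name": "stopped"}
                }]
            }
        }
    }]
}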