Problem description
Example of my log file format:
--------------- start of log file format ---------------
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1047 mili sec
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed
org.json.JSONException: No value for dataModel
at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]
--------------- end of log file format ---------------
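I am currently processing lines (with admittedly poor coding practices) that begin with a date, like the first 3 lines of the log above.
The problem starts at line 4: my log file contains exception lines that have no date. They are in fact a continuation of the previous line, which does contain the date.
Because the format changes, I don't know how to handle these lines. I want to get the date from the line that precedes the exception lines.
Either I keep the previous line's date in a temporary variable and reuse it whenever the format changes, or I solve it some other way.
The date is mandatory, because I feed the data into ELK, which needs a timestamp for every log entry.
If you have suggestions for cleaner code, please share them as well. I need the Lambda function to run faster.
This is my current logic in Python (I am a complete beginner in Python):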
import boto3
import botocore
import re
import json
import requests
...
...
def handler(event, context):
    print("Starting handler function")
    for record in event['Records']:
        # Get the bucket name and key for the new file
        bucket = record['s3']['bucket']['name']
        print("Bucket name - " + bucket)
        print(date_time)
        key = record['s3']['object']['key']
        print("key name - " + key)
        # Get, read, and split the file into lines
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj['Body'].read()
        lines = body.splitlines()
        # Match the regular expressions to each line and index the JSON
        for line in lines:
            document = {}
            try:
                if line[0] == '2':
                    listOfData = line.split(" - ")
                    date = datetime.strptime(listOfData[0], "%Y-%m-%d %H:%M:%S.%f")
                    timestamp = date.isoformat()
                    level = listOfData[1]
                    classname = listOfData[2]
                    message = listOfData[-1]
                    document = {"timestamps": timestamp, "level": level, "classnames": classname, "messages": message}
                    print(document)
                else:
                    document = {"messages": line}
            except ClientError as e:
                raise e
            r = requests.post(url, auth=awsauth, json=document, headers=headers)
            print(r)
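Additional information: As Ben's answer suggested, when I print this_line, all the lines of the log file are printed correctly as separate entries, but there is a problem with the exception lines in the middle. This is what gets printed: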
{'line content': ['2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webevics.helpr.Helpr Method:lamda] in 2 - Parsng Respns from servr Failed','org.jsn.JSONExcepion: No vlue for dataModel'],'date string': '2020-08-14 05:35:47.655'}
{'line content': ['2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webserics.helpr.Helpr Method:lambda] in 2 - Parsing Respnse from servr Faied','org.jsn.JSONException: No vlue for dataModel','at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]'],'date string': '2020-08-14 05:35:47.655'}
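Two entries are printed here; the first one is useless and the second one is better. Is there a way to keep entries like the first one from appearing, so that only the second one ends up in this_line?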
Solution
I first put the sample data in a file:
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1047 mili sec
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed
org.json.JSONException: No value for dataModel
at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]
2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1099 mili sec
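I added a fourth entry just to verify that my approach works. The Python below uses a regular expression that matches the date format to detect the start of each entry.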
import re

with open('example.txt', 'r') as file_handle:
    file_content = file_handle.read().split("\n")

list_of_lines = []
this_line = {}
for index, line in enumerate(file_content):
    if len(line.strip()) > 0:
        reslt = re.match(r'\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}', line)
        if reslt:  # line starts with a date
            if len(this_line.keys()) > 0:
                # the previous entry is complete, so store it
                list_of_lines.append(this_line)
            date_string = reslt.group(0)
            this_line = {'date string': date_string, 'line content': []}
            this_line['line content'].append(line)
        elif this_line:  # no date: continuation of the current entry
            this_line['line content'].append(line)

# store the last entry, which the loop itself never appends
if len(this_line.keys()) > 0:
    list_of_lines.append(this_line)
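The resulting data structure, list_of_lines, is a list of dictionaries. Each dictionary is one log entry, which may span one or more lines. The two keys in each dictionary are 'date string' and 'line content'.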
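Regarding the follow-up about partial entries showing up: printing this_line inside the loop shows an entry while it is still being built. One possible variation, sketched below, is a generator that hands out an entry only once the next dated line (or the end of the input) proves it is complete. The names DATE_PREFIX and iter_entries are hypothetical, and the sample lines are copied from the log in the question.

```python
import re

# Timestamp prefix as it appears in the sample log, e.g. "2020-08-14 05:35:48.762"
DATE_PREFIX = re.compile(r"\d{4}-\d\d-\d\d \d\d:\d\d:\d\d\.\d{3}")


def iter_entries(lines):
    """Yield one dict per log entry, and only after the entry is complete."""
    current = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        match = DATE_PREFIX.match(line)
        if match:
            if current is not None:
                yield current  # a new dated line closes the previous entry
            current = {"date string": match.group(0), "line content": [line]}
        elif current is not None:
            current["line content"].append(line)  # continuation line
        # undated lines before the first dated line are silently dropped
    if current is not None:
        yield current  # flush the last entry


sample = [
    "2020-08-14 05:35:48.752 - [INFO] - from [Class:com.webservices.services.impl.DataImpl Method:bData] in 20 - Data Single completed in 1047 mili sec",
    "2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed",
    "org.json.JSONException: No value for dataModel",
    "at org.json.JSONObject.get(JSONObject.java:355) ~[android-json-0.0.20131108.vaadin1.jar:0.0.20131108.vaadin1]",
]
entries = list(iter_entries(sample))
for entry in entries:
    print(entry)
```

With this shape, the ERROR entry is seen exactly once, together with both of its exception lines.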
To inspect the output data structure, try running
for line in list_of_lines[2]['line content']:
    print(line)
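Since the goal is to send every entry to ELK with a timestamp, each dictionary can then be turned into one document. The following is only a sketch: the helper name entry_to_document is hypothetical, the field names are taken from the code in the question, and joining the exception lines into the message field is an assumption about how the stack trace should be stored.

```python
from datetime import datetime


def entry_to_document(entry):
    """Build one document from an entry with 'date string' and 'line content' keys."""
    first_line = entry["line content"][0]
    parts = first_line.split(" - ")
    timestamp = datetime.strptime(
        entry["date string"], "%Y-%m-%d %H:%M:%S.%f"
    ).isoformat()
    # Assumption: continuation lines (the stack trace) are appended to the message
    message = "\n".join([parts[-1]] + entry["line content"][1:])
    return {
        "timestamps": timestamp,
        "level": parts[1],
        "classnames": parts[2],
        "messages": message,
    }


# Example entry, shaped like the ERROR item from the sample log
entry = {
    "date string": "2020-08-14 05:35:48.762",
    "line content": [
        "2020-08-14 05:35:48.762 - [ERROR] - from [Class:com.webservices.helper.THelper Method:lambda$0] in 20 - Parsing Response from server Failed",
        "org.json.JSONException: No value for dataModel",
    ],
}
doc = entry_to_document(entry)
print(doc)
```

Each exception then travels with the date of the line that introduced it, so no document is posted without a timestamp.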