Problem description
I have been trying to follow the guidance in this post, with a few tweaks: The best way to filter a log by a dates range in python
My log file looks like this:
2020-Oct-12 13:38:57.742759 -0700 : (some text)
(line of text)
(line of text)
2020-Oct-12 13:38:57.742760 -0700 : (some text)
...
2020-Oct-12 13:57:57.742759 -0700 : (some text)
I tried the following two snippets, but neither of them produces any output. Is something wrong with the way I define the datetime?
myfile = open('DIR_extract.log','w')
with open('DIRDriver.log','r') as f:
    for line in f:
        d = line.split(" ",1)[0]
        if d >= '2020-10-12 13:38:57' and d <= '2020-10-12 13:57:57':
            myfile.write("%s\n" % line)
and
myfile = open('DIR_extract.log','w')
from itertools import dropwhile,takewhile
from_dt,to_td = '2020-10-12 13:38:57','2020-10-12 13:57:57'
with open('DIRDriver.log') as fin:
    of_interest = takewhile(lambda L: L <= to_td,dropwhile(lambda L: L < from_dt,fin))
    for line in of_interest:
        myfile.write("%s\n" % line)
Solution
You are almost there.
d = line.split(" ",1)[0]
returns only the first part of the datetime, for example 2020-Oct-12.
This is because your datetime format differs from the one in the answer you linked: yours has a space between the date and the time.
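To make this concrete, here is a minimal sketch that uses one line from the log above. The variable names are illustrative and not taken from the original code. It also shows a second mismatch: the log spells the month as Oct, while the bounds in the question use 10, so the plain string comparison cannot succeed.

```python
line = "2020-Oct-12 13:38:57.742759 -0700 : (some text)"

# Splitting on the first space keeps only the date part.
date_only = line.split(" ", 1)[0]

# Slicing by the length of a reference timestamp keeps the date and the time.
reference = "2020-Oct-12 13:38:57"
date_and_time = line[:len(reference)]

# The bounds from the question use a numeric month ("10"), the log uses "Oct".
# Strings are compared character by character, and "O" sorts after "1",
# so the value is never <= the upper bound.
lower = "2020-10-12 13:38:57"
upper = "2020-10-12 13:57:57"
in_range = lower <= date_only <= upper

print(date_only)      # 2020-Oct-12
print(date_and_time)  # 2020-Oct-12 13:38:57
print(in_range)       # False
```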
So, to make it work, you need to capture the full date and time portion of each line.
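dt_start = '2020-Oct-12 13:38:57'
dt_end = '2020-Oct-12 13:57:57'
# Number of leading characters that hold the date and time (without microseconds)
str_time_len = len(dt_start)
with open('DIR_extract.log','w+') as myfile:
    with open('DIRDriver.log','r') as f:
        for line in f:
            date_time = line[:str_time_len]
            # Plain string comparison: only reliable within a single month
            if dt_start <= date_time <= dt_end:
                myfile.write(line)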
Assuming the log file contains
2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside
the code above gives
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
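Note that because your log uses the MMM format for months, the code above only works for logs that fall within a single month. Filtering from Jan to Apr, or any similar range, will not work, because 'Jan' > 'Apr' when compared as strings. To handle that, you need to convert the strings to datetime objects.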
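A minimal sketch of the difference, assuming an English or C locale so that %b matches abbreviations such as Jan and Apr. The helper name parse_ts and the two sample timestamps are made up for illustration.

```python
from datetime import datetime

# %b matches the abbreviated month name; this is locale-dependent.
DT_FMT = "%Y-%b-%d %H:%M:%S"

def parse_ts(text):
    """Parse a timestamp such as '2020-Oct-12 13:38:57' (hypothetical helper)."""
    return datetime.strptime(text, DT_FMT)

jan = "2020-Jan-15 08:00:00"
apr = "2020-Apr-15 08:00:00"

# As strings, "J" sorts after "A", so January appears to come after April.
strings_ordered = jan < apr

# As datetime objects, the comparison follows the calendar.
datetimes_ordered = parse_ts(jan) < parse_ts(apr)

print(strings_ordered)    # False
print(datetimes_ordered)  # True
```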
Also, if some log records span multiple lines, you will want to capture all of their lines, not only the ones that start with a datetime.
import re
from datetime import datetime
_start = '2020-Oct-12 13:38:57'
_end = '2020-Oct-12 13:57:57'
dt_fmt = '%Y-%b-%d %H:%M:%S'
dt_reg = r'\d{4}-[A-Za-z]{3}-\d{2}'
dt_start = datetime.strptime(_start,dt_fmt)
dt_end = datetime.strptime(_end,dt_fmt)
str_time_len = len(_start)
# Read from the source log and write the matching records to the extract file
with open('DIRDriver.log','r') as f, open('DIR_extract.log','w') as myfile:
    started = False
    for line in f:
        # Only lines that begin with a date carry a timestamp;
        # continuation lines keep the current started state
        if re.match(dt_reg,line):
            datetime_str = line[:str_time_len]
            dt = datetime.strptime(datetime_str,dt_fmt)
            if not started and dt >= dt_start:
                started = True
            elif started and dt > dt_end:
                break
        if not started:
            continue
        myfile.write(line)
        print(line.strip())
Assuming the log file contains:
2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside
it gives you:
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
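If it helps to try the multi-line logic without touching files, the same idea can be sketched as a generator over any iterable of lines. The function name filter_log and the stricter regular expression are assumptions for illustration, not part of the answer above. The sketch assumes the log is in chronological order and an English or C locale for %b.

```python
import re
from datetime import datetime

DT_FMT = "%Y-%b-%d %H:%M:%S"
# Matches the leading "YYYY-Mon-DD HH:MM:SS" part of a timestamped line.
DT_RE = re.compile(r"\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}")

def filter_log(lines, start, end):
    """Yield every line of the records whose timestamp lies in [start, end]."""
    keep = False
    for line in lines:
        match = DT_RE.match(line)
        if match:
            ts = datetime.strptime(match.group(0), DT_FMT)
            if ts > end:
                break  # past the range; relies on chronological order
            keep = ts >= start
        # Lines without a timestamp inherit the decision of their record.
        if keep:
            yield line

sample = [
    "2020-Oct-12 13:35:57.742759 -0700 : before\n",
    "2020-Oct-12 13:38:57.742759 -0700 : start\n",
    "2020-Oct-12 13:54:57.742759 -0700 : inside\n",
    "(line of text)\n",
    "(line of text)\n",
    "2020-Oct-12 13:57:57.742759 -0700 : end\n",
    "2020-Oct-12 13:59:57.742759 -0700 : outside\n",
]
start = datetime.strptime("2020-Oct-12 13:38:57", DT_FMT)
end = datetime.strptime("2020-Oct-12 13:57:57", DT_FMT)

extracted = list(filter_log(sample, start, end))
print("".join(extracted), end="")
```

With real files, the open file object for DIRDriver.log can be passed as lines, and each yielded line written to DIR_extract.log.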