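Exporting data from a log file between two timestamps in Python

Problem description

I have been trying to follow the guidance in this post, with a few adjustments: The best way to filter a log by a dates range in python

My log file looks like this: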

2020-Oct-12 13:38:57.742759 -0700 : (some text)
(line of text)
(line of text)
2020-Oct-12 13:38:57.742760 -0700 : (some text)
...
2020-Oct-12 13:57:57.742759 -0700 : (some text)

I tried these two snippets, but neither of them produces any output. Is there something wrong with how I define the datetime?

myfile = open('DIR_extract.log','w')
with open('DIRDriver.log','r') as f:
    for line in f:
        d = line.split(" ",1)[0] 
        if d >= '2020-10-12 13:38:57' and d <= '2020-10-12 13:57:57':
            myfile.write("%s\n" % line)

and

myfile = open('DIR_extract.log','w')
from itertools import dropwhile,takewhile
from_dt,to_td = '2020-10-12 13:38:57','2020-10-12 13:57:57'
with open('DIRDriver.log') as fin:
    of_interest = takewhile(lambda L: L <= to_td,dropwhile(lambda L: L < from_dt,fin))
    for line in of_interest:
        myfile.write("%s\n" % line)

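Solution

You are almost there.

d = line.split(" ",1)[0] returns only the first part of the datetime, for example 2020-Oct-12. That is because your datetime format differs from the one in the answer you linked: yours has a space between the date and the time.

So, to make it work, you need to grab the full date and time portion of the line.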
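The difference can be checked on a single line taken from the log in the question. This is only a small illustration; the names date_only, reference and date_and_time are made up for the example.

```python
line = "2020-Oct-12 13:38:57.742759 -0700 : (some text)"

# Splitting on the first space keeps only the date part.
date_only = line.split(" ", 1)[0]

# Slicing by the length of a reference timestamp keeps date and time.
reference = "2020-Oct-12 13:38:57"
date_and_time = line[:len(reference)]

print(date_only)      # 2020-Oct-12
print(date_and_time)  # 2020-Oct-12 13:38:57
```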

dt_start = '2020-Oct-12 13:38:57'
dt_end = '2020-Oct-12 13:57:57'
str_time_len = len(dt_start)

with open('DIR_extract.log','w+') as myfile:
    with open('DIRDriver.log','r') as f:
        for line in f:
            date_time = line[:str_time_len]
            if dt_start <= date_time <= dt_end:
                myfile.write(line)

Assuming the log file contains

2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside

the code above gives

2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
2020-Oct-12 13:57:57.742759 -0700 : end

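Note that because you use the MMM format for months, the code above only works for logs within a single month. Filtering from Jan to Apr, or anything similar, will not work, because the string 'Jan' compares greater than 'Apr'. You will need to convert these strings to datetime objects.

Also, if some log records span multiple lines, you will need to grab all of their lines, not just the ones that start with a datetime.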
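A quick check of the month-ordering problem, as a minimal sketch. The variable names are made up for the example, and it assumes English month abbreviations, since the %b directive is locale-dependent.

```python
from datetime import datetime

fmt = "%Y-%b-%d %H:%M:%S"  # %b parses abbreviated month names such as "Oct"
jan = "2020-Jan-12 13:38:57"
apr = "2020-Apr-12 13:38:57"

# As strings, "Jan" sorts after "Apr" because "J" > "A".
string_order_ok = jan < apr

# As datetime objects, January correctly precedes April.
datetime_order_ok = datetime.strptime(jan, fmt) < datetime.strptime(apr, fmt)

print(string_order_ok)    # False
print(datetime_order_ok)  # True
```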

import re
from datetime import datetime

_start = '2020-Oct-12 13:38:57'
_end = '2020-Oct-12 13:57:57'

dt_fmt = '%Y-%b-%d %H:%M:%S'
dt_reg = r'\d{4}-[A-Za-z]{3}-\d{2}'
dt_start = datetime.strptime(_start, dt_fmt)
dt_end = datetime.strptime(_end, dt_fmt)

str_time_len = len(_start)

# Read from the source log and write the matching records to the extract file.
with open('DIRDriver.log', 'r') as f, open('DIR_extract.log', 'w') as myfile:
    started = False
    for line in f:
        # Only lines that begin with a date start a new record.
        if re.match(dt_reg, line):
            datetime_str = line[:str_time_len]
            dt = datetime.strptime(datetime_str, dt_fmt)
            if not started and dt >= dt_start:
                started = True
            elif started and dt > dt_end:
                break

        # Skip everything before the first record in range.
        if not started:
            continue

        # Continuation lines of a record in range are written as well.
        myfile.write(line)
        print(line.strip())

Assuming the log file contains:

2020-Oct-12 13:35:57.742759 -0700 : before
2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
2020-Oct-12 13:59:57.742759 -0700 : outside

it gives you:

2020-Oct-12 13:38:57.742759 -0700 : start
2020-Oct-12 13:54:57.742759 -0700 : inside
(line of text)
(line of text)
2020-Oct-12 13:57:57.742759 -0700 : end
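
If the filter is needed in more than one place, the same idea could be wrapped in a generator. The following is only a sketch, not part of the original answer: filter_log, DT_FMT, DT_RE and the in-memory sample list are names made up for illustration. It assumes the log is in chronological order and uses English month abbreviations, and like the code above it ignores the fractional seconds and the UTC offset.

```python
import re
from datetime import datetime

DT_FMT = "%Y-%b-%d %H:%M:%S"
DT_RE = re.compile(r"\d{4}-[A-Za-z]{3}-\d{2} \d{2}:\d{2}:\d{2}")


def filter_log(lines, start, end):
    """Yield the lines of every record whose timestamp is in [start, end]."""
    keep = False
    for line in lines:
        match = DT_RE.match(line)
        if match:
            dt = datetime.strptime(match.group(0), DT_FMT)
            if dt > end:
                break  # assumes chronological order: nothing later can match
            keep = dt >= start
        # Lines without a timestamp inherit the decision of their record.
        if keep:
            yield line


sample = [
    "2020-Oct-12 13:35:57.742759 -0700 : before\n",
    "2020-Oct-12 13:38:57.742759 -0700 : start\n",
    "2020-Oct-12 13:54:57.742759 -0700 : inside\n",
    "(line of text)\n",
    "(line of text)\n",
    "2020-Oct-12 13:57:57.742759 -0700 : end\n",
    "2020-Oct-12 13:59:57.742759 -0700 : outside\n",
]

start = datetime.strptime("2020-Oct-12 13:38:57", DT_FMT)
end = datetime.strptime("2020-Oct-12 13:57:57", DT_FMT)

extracted = list(filter_log(sample, start, end))
print("".join(extracted), end="")
```

Because an open file is itself an iterable of lines, a file object could be passed in place of the sample list, and the yielded lines written to the extract file.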