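Problem description
I have a log checker that uses a regular expression to find IP addresses in the lines of a log file. I want to find those lines and count how many times each IP address occurs. For example, given these log lines: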
WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 10.1.1.1 TIMING OUT
WARNING - 10.72.3.1 TIMING OUT
I would like to get output like this:
192.168.1.1 - 3 EVENTS
192.168.1.5 - 2 EVENTS
10.1.1.1 - 1 EVENT
10.72.3.1 - 1 EVENT
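And so on. I am new to Python, so I am still learning what works best for this. So far I open the log file and run a for loop that uses a regex pattern to find the IP in each line, but from there I am a bit lost. Cheers.
Solution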
You can use re.findall here to capture every IP address occurrence, then use a dictionary to count how often each one appears:
import re

inp = """WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 10.1.1.1 TIMING OUT
WARNING - 10.72.3.1 TIMING OUT"""

# Capture the IP address that follows "WARNING - " on each line.
matches = re.findall(r'\bWARNING - (\d+\.\d+\.\d+\.\d+)\b', inp)

d = {}
for elem in matches:
    # dict.get returns 0 for an IP not seen yet, so no try/except is needed.
    d[elem] = d.get(elem, 0) + 1
print(d)
This prints (on Python 3.7+, where dictionaries keep insertion order):
{'192.168.1.1': 3, '192.168.1.5': 2, '10.1.1.1': 1, '10.72.3.1': 1}
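As a possible alternative to the hand-written dictionary, the standard library's collections.Counter does the same tallying and can sort by frequency. The sketch below is only one way to do it: the names IP_PATTERN, count_ips and format_counts are made up for illustration, and the "EVENT"/"EVENTS" wording is copied from the output requested in the question.

```python
import re
from collections import Counter

# Same idea as the pattern above: capture the IP that follows "WARNING - ".
IP_PATTERN = re.compile(r'\bWARNING - (\d+\.\d+\.\d+\.\d+)\b')


def count_ips(text):
    """Return a Counter mapping each matched IP to its number of occurrences."""
    return Counter(IP_PATTERN.findall(text))


def format_counts(counts):
    """Build lines such as '192.168.1.1 - 3 EVENTS', most frequent first."""
    lines = []
    for ip, n in counts.most_common():
        label = "EVENT" if n == 1 else "EVENTS"
        lines.append(f"{ip} - {n} {label}")
    return lines


sample = """WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 192.168.1.5 TIMING OUT
WARNING - 192.168.1.1 TIMING OUT
WARNING - 10.1.1.1 TIMING OUT
WARNING - 10.72.3.1 TIMING OUT"""

for line in format_counts(count_ips(sample)):
    print(line)
```

Counter.most_common() orders entries by descending count, so the busiest IP comes first without a separate sort step.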
Below is a modified variant of my answer at https://stackoverflow.com/a/64220148/6632736.
It assumes the log is in a file, which is read line by line.
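#!/usr/bin/env python3
import re
import sys

def increment(ips: dict, line: str) -> None:
    # Match "<anything> - <IPv4> <rest>" and count the captured address.
    match = re.match(r'^.+?\s+-\s+(?P<ip>\d{1,3}(\.\d{1,3}){3})\s.*$', line)
    if match:
        ip = match.group('ip')
        if ip not in ips:
            ips[ip] = 0
        ips[ip] += 1

def parse_log_file(log: str) -> dict:
    ips = dict()
    with open(log, 'r') as file:
        # Iterating over the file object reads it one line at a time.
        for line in file:
            increment(ips, line)
    return ips

# log is the path to the log file; taking it from the first
# command-line argument is an assumption made so the script runs as-is.
log = sys.argv[1]
for key, value in parse_log_file(log).items():
    print(key, ":", value)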
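One caveat with patterns like \d{1,3}(\.\d{1,3}){3}: they also accept strings that are not valid IPv4 addresses, such as 999.1.1.1. If that matters for your logs, a rough sketch of one option is to let the regex find candidates and then check each one with the standard library's ipaddress module. The names CANDIDATE, is_valid_ipv4 and count_valid_ips are hypothetical, and the sample lines (including the 999.1.1.1 entry) are invented for the demonstration.

```python
import ipaddress
import re
from collections import Counter

# Loose pattern: four dot-separated groups of 1 to 3 digits.
CANDIDATE = re.compile(r'\b\d{1,3}(?:\.\d{1,3}){3}\b')


def is_valid_ipv4(text):
    """Return True only if text parses as an IPv4 address."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        # AddressValueError is a subclass of ValueError.
        return False


def count_valid_ips(lines):
    """Count valid IPv4 addresses found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        for candidate in CANDIDATE.findall(line):
            if is_valid_ipv4(candidate):
                counts[candidate] += 1
    return counts


# Invented sample data; the second line holds an out-of-range octet.
lines = [
    "WARNING - 192.168.1.1 TIMING OUT",
    "WARNING - 999.1.1.1 TIMING OUT",
    "WARNING - 192.168.1.1 TIMING OUT",
]
result = count_valid_ips(lines)
print(result)
```

Because count_valid_ips accepts any iterable of strings, an open file object can be passed in place of the list, which keeps the line-by-line reading described above.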