问题描述
示例= {“主机”:“ 146.204.224.152”, “ user_name”:“ feest6811”, “ time”:“ 21 / Jun / 2019:15:45:24 -0700”, “ request”:“ POST / incentivize HTTP / 1.1”}
sample ='146.204.224.152-Feet6811 [21 / Jun / 2019:15:45:24 -0700]“ POST / incentivize HTTP / 1.1” 302 4622'
当我尝试退出时,我得到[]空列表
re.findall('(\d{3}.\d{3}.\d{3}.\d{3}( - )([\w]*)(\[.*\]))',sample)
解决方法
您当前的正则表达式有很多问题,主要是模式实际上与输入不匹配。之所以如此,是因为您在某些地方缺少空格,并且您也没有正确匹配双引号项。试试这个版本:
sample = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
parts = re.findall(r'^(\d{3}(?:\.\d{3}){3}) - (\S+) \[(.*?)\] "(.*?)".*$',sample)
print(parts)
此打印:
[('146.204.224.152','feest6811','21/Jun/2019:15:45:24 -0700','POST /incentivize HTTP/1.1')]
,
我不确定您要实现什么目标。但是,根据您给出的示例,我修复了您的正则表达式。
固定的正则表达式:(\d{3}.\d{3}.\d{3}.\d{3})\s*\-\s*(\w+)\s+\[(.*?)\]\s+\"(.*?)\"
您可以查看在线工作示例here。
,您的空格不正确。这是生成dict示例的版本:
import re
from pprint import pprint
example = {"host":"146.204.224.152","user_name":"feest6811","time":"21/Jun/2019:15:45:24 -0700","request":"POST /incentivize HTTP/1.1"}
sample = '146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622'
m = re.match(r'(?P<host>\d{3}.\d{3}.\d{3}.\d{3}) - (?P<user_name>\w*) \[(?P<time>.*?)\] \"(?P<request>.*?)\"',sample)
assert example == m.groupdict()
pprint(m.groupdict())
输出:
{'host': '146.204.224.152','request': 'POST /incentivize HTTP/1.1','time': '21/Jun/2019:15:45:24 -0700','user_name': 'feest6811'}