Extracting the word before a character

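Problem description

I am trying to extract any word, delimited by word boundaries, that comes right before Y. I used the (?m) flag to treat each line as a separate record and tried to capture \w+ followed by a lookahead for \s+Y, but I can only print the first match (IMP), not the second one (IMP1).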

print(foo)
this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important

My unsuccessful attempts so far:

>>> m = re.search('(?m).*?(\w+)(?=\s+Y)',foo)
>>> m.groups()
('IMP',)
>>>
>>> m = re.search('(?m)(?<=\s)(\w+)(?=\s+Y)',foo)
>>>

The expected result is:

('IMP','IMP1')

Solution

You can use

\w+(?=[^\S\r\n]+Y\b)

Details:

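\w+ - one or more letters, digits, or underscores
(?=[^\S\r\n]+Y\b) - a positive lookahead requiring that the word is immediately followed by one or more whitespace characters other than CR and LF, and then by Y as a whole word (\b is a word boundary).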
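
To see why the lookahead uses [^\S\r\n]+ rather than \s+, here is a small comparison sketch on the sample text from the question. The variable names loose and strict are only illustrative. Because \s also matches a newline, the \s+ version can look across a line break and pick up the last word of the line that precedes "Y is not important".

```python
import re

foo = (
    "this is IMP Y text\n"
    "and this is also IMP1 Y text\n"
    "this is not so IMP2 N text\n"
    "Y is not important"
)

# \s also matches "\n", so the lookahead can reach the Y that starts the next line.
loose = re.findall(r'\w+(?=\s+Y\b)', foo)

# [^\S\r\n] means "whitespace except CR and LF", so the lookahead stays on one line.
strict = re.findall(r'\w+(?=[^\S\r\n]+Y\b)', foo)

print(loose)   # ['IMP', 'IMP1', 'text']
print(strict)  # ['IMP', 'IMP1']
```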

A Python demo:

import re
foo = "this is IMP Y text\nand this is also IMP1 Y text\nthis is not so IMP2 N text\nY is not important"
print(re.findall(r'\w+(?=[^\S\r\n]+Y\b)',foo))
# => ['IMP', 'IMP1']
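
As for why the attempts in the question return only IMP: re.search stops at the first match no matter which flags are set, and (?m) only changes where ^ and $ can match. The sketch below contrasts re.search with re.findall using the same pattern. Wrapping the result in tuple() is just an assumption about how to reproduce the tuple shown as the expected result.

```python
import re

foo = (
    "this is IMP Y text\n"
    "and this is also IMP1 Y text\n"
    "this is not so IMP2 N text\n"
    "Y is not important"
)

pattern = re.compile(r'\w+(?=[^\S\r\n]+Y\b)')

# search() returns a single match object for the first match only.
first = pattern.search(foo).group()

# findall() scans the whole string and returns every non-overlapping match.
every = tuple(pattern.findall(foo))

print(first)  # IMP
print(every)  # ('IMP', 'IMP1')
```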

Alternatively, try:

(\w+)(?=.Y)

So the complete code would be:

import re

a="""this is IMP Y text
and this is also IMP1 Y text
this is not so IMP2 N text
Y is not important"""


print(re.findall(r"(\w+)(?=.Y)", a))

Output:

['IMP', 'IMP1']
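
Note that (\w+)(?=.Y) is looser than \w+(?=[^\S\r\n]+Y\b): the . accepts any single character except a newline, not only whitespace, and nothing requires Y to be a whole word. Both patterns give the same result on the sample text, but they can diverge on other input. The string below is a made-up example, not taken from the question, chosen only to show the difference.

```python
import re

# Hypothetical input: "Yes" merely starts with Y, and "-" is not whitespace.
sample = "IMP3 Yes\nIMP4-Y"

# Any one non-newline character followed by Y is enough for this pattern.
dot_version = re.findall(r'(\w+)(?=.Y)', sample)

# This pattern needs horizontal whitespace and then Y as a whole word.
boundary_version = re.findall(r'\w+(?=[^\S\r\n]+Y\b)', sample)

print(dot_version)       # ['IMP3', 'IMP4']
print(boundary_version)  # []
```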
