Problem description
Sample raw data:
asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = [email protected]
asdnaslfbasl
asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = [email protected]
asdnaslfbaslasdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = [email protected]
asdnaslfbasl
asdnaslfbasl
asdas wqdqw Start wqqwdsadReceived new email message
Expected output:
asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = [email protected]
asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = [email protected]
I'm new to regex and want to extract every block that contains Email = [email protected], starting from the nearest preceding "Received new email message" line.
I tried:
\b.*Received new email message[\s\S]*?(?=\n.*Email = testa@asd\.com)
but one of the matches it returns runs across several blocks:
asdas wqdqw Start wqqwdsad Received new email message
asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
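Any help pointing me in the right direction is appreciated.
Solution
This does the job (the ^ anchor needs multiline mode, e.g. re.MULTILINE in Python, to match at the start of every line):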
^.+?Received new email message(?:(?!Received new email message)[\s\S])+?Email = testa@asd\.com
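The key part of this pattern is the group (?:(?!Received new email message)[\s\S])+?, which consumes one character at a time and refuses to step onto another header, so a match can never span two blocks. The sketch below spells the same pattern out with re.VERBOSE on a miniature log. The sample text and the address other@asd.com are made up for illustration and are not taken from the data above.

```python
import re

# Hypothetical miniature log; "other@asd.com" is an invented non-matching address.
sample = (
    "x Received new email message\n"
    "a\n"
    "y Email = testa@asd.com\n"
    "noise\n"
    "x Received new email message\n"
    "b\n"
    "y Email = other@asd.com\n"
    "x Received new email message\n"
    "c\n"
    "y Email = testa@asd.com\n"
)

pattern = re.compile(
    r"""
    ^.+?Received\ new\ email\ message      # header line that opens a block
    (?:                                    # then, one character at a time:
        (?!Received\ new\ email\ message)  #   fail if another header starts here
        [\s\S]                             #   otherwise take any character, newlines included
    )+?                                    # lazily, so we stop at the first address
    Email\ =\ testa@asd\.com               # the wanted address closes the match
    """,
    re.MULTILINE | re.VERBOSE,             # MULTILINE lets ^ match at every line start
)

blocks = pattern.findall(sample)
for block in blocks:
    print(block)
    print("---")
```

The middle block is skipped: after its header the engine never finds the wanted address before the lookahead blocks it at the next header, so that attempt fails and the search moves on.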
Code:
import re
string = r'''asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = [email protected]
asdnaslfbasl
asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = [email protected]
asdnaslfbaslasdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = [email protected]
asdnaslfbasl
asdnaslfbasl
asdas wqdqw Start wqqwdsadReceived new email message'''
# The leading ^ is omitted here, so no re.MULTILINE flag is needed.
res = re.findall(r'.+?Received new email message(?:(?!Received new email message)[\s\S])+?Email = testa@asd\.com', string)
print(res)
Output:
['asdas wqdqw Start wqqwdsad Received new email message\nasdasd\nasdas\ndasd\nasd\nasd\nasdasdas Email = [email protected]','asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message\nasdasd\nasdas\ndasd\nasd\nasd\nasdasdasEmail = [email protected]','asdas wqdqw Start wqqwdsadReceived new email message\nasda\nas\nasdasdasEmail = [email protected]']
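If the regex becomes hard to maintain, a line-by-line scan is one possible alternative. This is a different technique from the answer above, shown only as a rough sketch. It assumes that each block ends at its first line containing "Email = " and that the address is the last thing on that line. The function name extract_blocks, its parameters, and the sample data are all hypothetical.

```python
def extract_blocks(text,
                   header="Received new email message",
                   wanted="Email = testa@asd.com"):
    """Collect blocks that start at a header line and end at the wanted address."""
    results = []
    current = None                      # lines of the block being built, if any
    for line in text.splitlines():
        if header in line:
            current = [line]            # a new header discards any unfinished block
        elif current is not None:
            current.append(line)
            if "Email = " in line:      # assumption: the first email line closes the block
                if wanted in line:
                    results.append("\n".join(current))
                current = None          # ignore lines until the next header
    return results


# Hypothetical miniature log; "other@asd.com" is an invented non-matching address.
sample = "\n".join([
    "x Received new email message",
    "a",
    "y Email = testa@asd.com",
    "noise",
    "x Received new email message",
    "b",
    "y Email = other@asd.com",
    "trailing",
    "x Received new email message",
    "c",
    "yEmail = testa@asd.com",
])

blocks = extract_blocks(sample)
print(blocks)
```

Unlike the regex, this version keeps the whole email line and closes a block at the first email line even when the address is a different one, so the two approaches can disagree on inputs with several email lines per block.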