Regex to find the last block of content containing the required email

Problem description

Sample raw data:

asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = testa@asd.com
asdnaslfbasl
asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = testa@asd.com
asdnaslfbaslasdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message

asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = testa@asd.com
asdnaslfbasl
asdnaslfbasl
asdas wqdqw Start wqqwdsadReceived new email message

Expected output

asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = testa@asd.com

asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = testa@asd.com

asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = testa@asd.com

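I'm new to regex. I want to extract every block that contains Email = testa@asd.com. Each block should start at the nearest preceding "Received new email message" line.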

I tried:

\b.*Received new email message[\s\S]*?(?=\n.*Email = testa@asd\.com)

It works well for the first two blocks, but for the third one it gives me:

asdas wqdqw Start wqqwdsad Received new email message

asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
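Below is a minimal Python reproduction of the problem. It assumes a trimmed three-block sample. The hdr1/hdr2/hdr3 header prefixes, the "line a/b/c" filler and the non-matching address other@asd.com are made-up placeholders, not the original data.

The lazy [\s\S]*? stops at the first position where the lookahead succeeds. Nothing stops it from running past another header line on the way there. So the match that starts at hdr2 swallows the whole hdr2 block plus the hdr3 header.

```python
import re

# Hypothetical trimmed sample: only the first and last blocks carry the
# wanted address; "other@asd.com" is a made-up placeholder.
data = (
    "hdr1 Received new email message\n"
    "line a\n"
    "x Email = testa@asd.com\n"
    "hdr2 Received new email message\n"
    "line b\n"
    "x Email = other@asd.com\n"
    "hdr3 Received new email message\n"
    "line c\n"
    "x Email = testa@asd.com\n"
)

# The pattern from the question. [\s\S]*? is lazy, but laziness only means
# "stop at the first place the lookahead succeeds"; it does not forbid
# stepping over another "Received new email message" line to get there.
attempt = r"\b.*Received new email message[\s\S]*?(?=\n.*Email = testa@asd\.com)"

for m in re.findall(attempt, data):
    print(repr(m))
```

The second match printed contains both the other@asd.com line and the hdr3 header. Because the email test sits inside a lookahead, the Email line itself is also left out of every match.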

Any help pointing me in the right direction would be appreciated.

Solution

This does the job (with the multiline flag on, so that ^ matches at the start of each line):

^.+?Received new email message(?:(?!Received new email message)[\s\S])+?Email = testa@asd\.com

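Here is a sketch of how that pattern behaves on a trimmed, made-up sample. The hdr1/hdr2/hdr3 prefixes and other@asd.com are placeholders.

The part (?:(?!Received new email message)[\s\S])+? is often called a tempered token. Before it consumes each character, it checks that a new header does not start at that position. This means a match can never cross into the next block. Because the posted pattern begins with ^, Python needs re.MULTILINE for ^ to match at every line start.

```python
import re

# Hypothetical trimmed sample; "other@asd.com" is a made-up placeholder.
data = (
    "hdr1 Received new email message\n"
    "line a\n"
    "x Email = testa@asd.com\n"
    "hdr2 Received new email message\n"
    "line b\n"
    "x Email = other@asd.com\n"
    "hdr3 Received new email message\n"
    "line c\n"
    "x Email = testa@asd.com\n"
)

pattern = re.compile(
    r"^.+?Received new email message"           # a header line (. never crosses \n)
    r"(?:(?!Received new email message)[\s\S])+?"  # any text that does not start a new header
    r"Email = testa@asd\.com",                  # up to the wanted address
    re.MULTILINE,  # let ^ match at the start of every line
)

for block in pattern.findall(data):
    print(block)
    print("---")
```

The hdr2 attempt fails when the tempered token reaches the hdr3 header. The regex engine then moves on, so only the hdr1 and hdr3 blocks are returned.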

Code:

import re

string = r'''asdas wqdqw Start wqqwdsad Received new email message
asdasd
asdas
dasd
asd
asd
asdasdas Email = testa@asd.com
asdnaslfbasl
asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message
asdasd
asdas
dasd
asd
asd
asdasdasEmail = testa@asd.com
asdnaslfbaslasdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message

asdsa
asdsadasd
asdasdasEmail = [email protected]
asdnaslfbasl
asdas wqdqw Start wqqwdsad Received new email message
asdnaslfbasl
asdasdasEmail = [email protected]
asdas wqdqw Start wqqwdsadReceived new email message
asda
as
asdasdasEmail = testa@asd.com
asdnaslfbasl
asdnaslfbasl
asdas wqdqw Start wqqwdsadReceived new email message'''

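# Each match runs from a header line to the first "Email = testa@asd.com"
# without crossing another "Received new email message" line.
res = re.findall(r'.+?Received new email message(?:(?!Received new email message)[\s\S])+?Email = testa@asd\.com', string)
print(res)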

Output:

['asdas wqdqw Start wqqwdsad Received new email message\nasdasd\nasdas\ndasd\nasd\nasd\nasdasdas Email = testa@asd.com', 'asdas wqdasdwqqeqw Start wqqwdsaadsd Received new email message\nasdasd\nasdas\ndasd\nasd\nasd\nasdasdasEmail = testa@asd.com', 'asdas wqdqw Start wqqwdsadReceived new email message\nasda\nas\nasdasdasEmail = testa@asd.com']
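If the single regex becomes hard to maintain, one alternative is to split the text into blocks first and then filter them. This is a swapped-in technique, not the answer's method. The sketch assumes every block starts on a line containing "Received new email message". The names blocks_with_email and HEADER, and the sample data, are hypothetical. Splitting on a zero-width match like this requires Python 3.7 or later.

```python
import re

HEADER = "Received new email message"


def blocks_with_email(text, email):
    """Hypothetical helper: return each block, from its header line up to
    the end of the first "Email = <email>" occurrence, for blocks that
    contain that address. Assumes every block begins on a line that
    contains HEADER."""
    # Zero-width split at the start of every header line, so each header
    # stays attached to its own block (Python 3.7+).
    parts = re.split(r"^(?=.*" + re.escape(HEADER) + ")", text, flags=re.MULTILINE)
    target = re.compile(r"Email = " + re.escape(email))
    result = []
    for part in parts:
        if HEADER not in part:
            continue  # skip any text before the first header
        m = target.search(part)
        if m:
            # Cut right after the address, like the answer's regex does.
            result.append(part[: m.end()])
    return result


# Made-up sample; "other@asd.com" is a placeholder address.
sample = (
    "hdr1 Received new email message\nline a\nx Email = testa@asd.com\n"
    "hdr2 Received new email message\nline b\nx Email = other@asd.com\n"
    "hdr3 Received new email message\nline c\nx Email = testa@asd.com\n"
)

for block in blocks_with_email(sample, "testa@asd.com"):
    print(block)
    print("---")
```

Each block is isolated before the search runs. As a result, the "don't cross the next header" rule comes from the split rather than being encoded in the pattern, and the email can be passed in as a plain string.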