Match the smallest possible sentence

Problem description

One sentence here,much wow. Another one here. This is O.N.E. example n. 1,a nice one to understand. Hope it's clear Now!

Regex: (?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])

Result: Another one here. This is O.N.E. example n. 1,a nice one to understand.

How can I get This is O.N.E. example n. 1,a nice one to understand. instead? (That is, the smallest sentence that matches the regex.)
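The behaviour described above can be reproduced with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the original question.

```python
import re

text = ("One sentence here,much wow. Another one here. "
        "This is O.N.E. example n. 1,a nice one to understand. "
        "Hope it's clear Now!")

# The pattern from the question. A lazy .+? keeps the match short on the
# right, but the engine still starts at the leftmost position that works.
original = re.compile(r"(?<=\.\s)[A-Z].+?nice one.+?\.(?=\s[A-Z])")

first = original.search(text)
print(first.group(0))
# Another one here. This is O.N.E. example n. 1,a nice one to understand.
```

The match starts at "Another" because that is the first position preceded by a dot and a space where the rest of the pattern can still succeed.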

Solutions

Just insert a greedy .* in front of the expression and read the sentence from capture group 1:

.*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))
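A minimal sketch of how this pattern might be used from Python. The variable names are illustrative; note that the sentence ends up in group 1, not in the whole match.

```python
import re

text = ("One sentence here,much wow. Another one here. "
        "This is O.N.E. example n. 1,a nice one to understand. "
        "Hope it's clear Now!")

# The leading greedy .* consumes as much as it can and then backtracks,
# so the captured sentence starts at the LAST possible ". X" position.
greedy = re.compile(r".*\.\s([A-Z].+?nice one.+?\.(?=\s[A-Z]))")

m = greedy.search(text)
sentence = m.group(1) if m else None
print(sentence)
# This is O.N.E. example n. 1,a nice one to understand.
```

Two caveats under these assumptions: the pattern requires a dot and whitespace before the sentence, so it will not find a sentence that sits at the very start of the text, and . does not cross newlines unless re.DOTALL is passed.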

Here is a different approach: split the whole text, then filter for the piece you want:

import re

s = "One sentence here,much wow. Another one here. This is O.N.E. example n. 1,a nice one to understand. Hope it's clear now!"
# Split after each sentence-ending dot, then keep the first piece containing 'nice one'.
result = [x for x in re.split(r'(?<=\B.\.)\s*', s) if 'nice one' in x][0]
print(result)  # This is O.N.E. example n. 1,a nice one to understand.

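I'm not sure how many edge cases you have, but here I used re.split() with the following pattern: (?<=\B.\.)\s*. This means:

(?<=\B.\.) - Positive lookbehind asserting that the current position comes right after a position where \b (word boundary) does not apply, followed by any single character and a literal dot.
\s* - 0+ whitespace characters.

With the resulting array, checking which element contains the phrase 'nice one' is straightforward.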
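To see why the lookbehind skips the dots in O.N.E. and n. 1, it can help to print the pieces that re.split() produces. This is a small illustrative sketch; the names text and pieces are not from the original answer.

```python
import re

text = ("One sentence here,much wow. Another one here. "
        "This is O.N.E. example n. 1,a nice one to understand. "
        "Hope it's clear now!")

# \B must hold just before the character that precedes the dot, so a dot
# after a single-letter "word" (O. N. E. n.) is not treated as a split point.
pieces = re.split(r"(?<=\B.\.)\s*", text)

for piece in pieces:
    print(repr(piece))
```

For this input the text is cut only after "wow.", "here." and "understand.", giving four pieces. Abbreviations longer than one letter (for example "etc.") would still be split, so this is a heuristic rather than a general sentence splitter.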


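You can avoid matching dots in general, and only allow a dot that follows an uppercase character or a dot that is followed by a whitespace character and a digit.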

(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*\bnice one\b.+?(?=\s[A-Z])

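(?:(?<=\.\s)|^) Assert a . and a whitespace character directly to the left, or the start of the string
[A-Z][^.A-Z]* Match an uppercase character A-Z, then 0+ characters other than a dot or an uppercase character
(?: Non-capturing group
- (?:[A-Z]\.|\.\s\d) Match either A-Z followed by a . or a . followed by a whitespace character and a digit
- [^.A-Z]* Optionally match any characters other than a . or an uppercase character
)* Close the group and optionally repeat it
\bnice one\b.+?(?=\s[A-Z]) Match nice one, then as few characters as possible until a whitespace character followed by an uppercase character is asserted to the right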
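The breakdown above can be tried out with a short Python sketch. The pattern is the one from this answer; the surrounding variable names are illustrative.

```python
import re

text = ("One sentence here,much wow. Another one here. "
        "This is O.N.E. example n. 1,a nice one to understand. "
        "Hope it's clear Now!")

# Same pattern as above, split over two string literals for readability.
pattern = re.compile(
    r"(?:(?<=\.\s)|^)[A-Z][^.A-Z]*(?:(?:[A-Z]\.|\.\s\d)[^.A-Z]*)*"
    r"\bnice one\b.+?(?=\s[A-Z])"
)

found = pattern.search(text)
print(found.group(0) if found else None)
# This is O.N.E. example n. 1,a nice one to understand.
```

Because an ordinary sentence-ending dot cannot be consumed before nice one, the attempts starting at "One" and "Another" fail and the whole match is already the wanted sentence, with no capture group needed.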

