First four words of each sentence, but they must start with word X and end with word Y

Problem description

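I want to filter the first four words of each sentence, where the first word is "This" and the last word is "on". I have been watching YouTube tutorials, but all I have managed to come up with is the following: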

([A-Z](?:[^\s.!?]+(?:\s|\n)){0,4}(?:[^\s.!?]+)?)

Here is an example: This [perception depends on] ...

Solution

You should consider using an NLP package to split the text into sentences. Then apply the following pattern to each sentence:

^This\s+\S+\s+\S+\s+on\b

It matches a string that starts with This, followed by two words made of any non-whitespace characters, followed by the word on.

See the proof.

EXPLANATION

NODE                     EXPLANATION
--------------------------------------------------------------------------------
  ^                        the beginning of the string
--------------------------------------------------------------------------------
  This                     'This'
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \S+                      non-whitespace (all but \n, \r, \t, \f,
                           and " ") (1 or more times (matching the
                           most amount possible))
--------------------------------------------------------------------------------
  \s+                      whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  on                       'on'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
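
As a rough illustration of the split-then-match approach, here is a minimal sketch. The `splitSentences` function is a naive stand-in for a real NLP sentence tokenizer, and both it and `firstFourWords` are hypothetical helper names, not part of any library. The sample text is made up.

```javascript
// Naive sentence splitter: a stand-in for a real NLP sentence tokenizer.
// It splits after ., ! or ? followed by whitespace, so abbreviations
// such as "e.g." will fool it.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// The pattern from the answer: "This", two whitespace-delimited tokens, "on".
const pattern = /^This\s+\S+\s+\S+\s+on\b/;

// Hypothetical helper: collect the matching four-word prefix of each sentence.
function firstFourWords(text) {
  const results = [];
  for (const sentence of splitSentences(text)) {
    const m = sentence.match(pattern);
    if (m) results.push(m[0]);
  }
  return results;
}

const sample =
  "This perception depends on context. That one does not. " +
  "This idea builds on earlier work!";
console.log(firstFourWords(sample));
// → [ 'This perception depends on', 'This idea builds on' ]
```

Because the pattern is anchored with ^, it only works when each sentence is tested on its own, which is why the splitting step comes first.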

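The most basic regular expression would be

/\bThis\s+\w+\s+\w+\s+on\b/

If that does not match anything, perhaps your idea of a "word" character differs from what the regex engine treats as one: in JavaScript, \w covers only ASCII letters, digits, and the underscore.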
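
To make the difference concrete, the sketch below runs the \w-based pattern next to a \S-based variant on a few made-up sentences. The sample strings and the `comparePatterns` helper are assumptions for illustration only.

```javascript
// \w in JavaScript means [A-Za-z0-9_] only, so apostrophes, hyphens and
// accented letters all end a \w+ run. \S accepts any non-whitespace character.
const wordCharPattern = /\bThis\s+\w+\s+\w+\s+on\b/;
const nonSpacePattern = /\bThis\s+\S+\s+\S+\s+on\b/;

// Made-up sample sentences for illustration.
const samples = [
  "This perception depends on context",
  "This user's choice on the form",    // apostrophe is not a \w character
  "This well-known rule on paper",     // hyphen is not a \w character
  "This caf\u00e9 sits on the corner", // é is not a \w character
];

// Hypothetical helper: report which pattern accepts each sample.
function comparePatterns(lines) {
  return lines.map((line) => ({
    line,
    wordChar: wordCharPattern.test(line),
    nonSpace: nonSpacePattern.test(line),
  }));
}

for (const row of comparePatterns(samples)) {
  console.log(`${row.wordChar}\t${row.nonSpace}\t${row.line}`);
}
```

Only the first sample passes the \w-based pattern, while all four pass the \S-based one. The trade-off is that \S+ also swallows trailing punctuation such as commas, so which variant fits depends on what should count as a word.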

(?:^|[.;!?]\s+)(\bThis\W*?(\b\w+\b)\W*?(\b\w+\b)\W*on\b)

Would something like that work? As I understand it, you want four words in a sentence, starting with "This" and ending with "on".
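
One possible way to try this pattern on a whole block of text is with the g flag and `String.prototype.matchAll`, reading the phrase and the two middle words from the capture groups. The `findPhrases` helper and the sample text are assumptions for illustration.

```javascript
// The pattern from the answer, with the g flag added so matchAll can
// walk through every sentence start in the text.
const sentenceStart =
  /(?:^|[.;!?]\s+)(\bThis\W*?(\b\w+\b)\W*?(\b\w+\b)\W*on\b)/g;

// Hypothetical helper: group 1 is the four-word phrase,
// groups 2 and 3 are the two words between "This" and "on".
function findPhrases(text) {
  return [...text.matchAll(sentenceStart)].map((m) => ({
    phrase: m[1],
    middle: [m[2], m[3]],
  }));
}

const text =
  "This perception depends on context. Nothing here. " +
  "This idea builds on earlier work; this one is lowercase.";

for (const { phrase, middle } of findPhrases(text)) {
  console.log(phrase, middle);
}
// → This perception depends on [ 'perception', 'depends' ]
// → This idea builds on [ 'idea', 'builds' ]
```

Unlike the ^-anchored pattern, this one does not need the text to be split into sentences first, since the leading (?:^|[.;!?]\s+) group approximates a sentence boundary. It inherits the same \w limitation noted above.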

javascript regex