Regex replace: replace the text, not the code

Problem description

I have been trying to solve this regex quiz for days and still can't crack it. I'm close, but it still doesn't pass.

Task:

In an HTML page, replace the text micro with &micro;. Oh, and don't break the code: don't replace inside <the tags> or &entities;.

Replace:

  • micro -> &micro;
  • abc micro -> abc &micro;
  • micromicro -> &micro;&micro;
  • &micro;micro -> &micro;&micro;

Don't touch:

  • <tag micro /> -> <tag micro />
  • &micro; -> &micro;
  • &abcmicro123; -> &abcmicro123;
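One way to check any candidate against these requirements is to encode the seven cases as data. The following is a minimal JavaScript sketch; `cases`, `failingCases` and `naive` are names made up for illustration. It shows that a plain global replace only gets the first three cases right.

```javascript
// Test cases from the task: [input, expected output]
const cases = [
  ["micro", "&micro;"],
  ["abc micro", "abc &micro;"],
  ["micromicro", "&micro;&micro;"],
  ["&micro;micro", "&micro;&micro;"],
  ["<tag micro />", "<tag micro />"],
  ["&micro;", "&micro;"],
  ["&abcmicro123;", "&abcmicro123;"],
];

// Runs a candidate replace function on every case and returns the inputs it gets wrong.
function failingCases(replaceFn) {
  return cases
    .filter(([input, expected]) => replaceFn(input) !== expected)
    .map(([input]) => input);
}

// Hypothetical naive attempt: replaces every "micro", even inside tags and entities.
const naive = (s) => s.replace(/micro/g, "&micro;");
console.log(failingCases(naive));
```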

I tried this, but it fails on the last &micro;. What am I missing? Could someone point it out? Thanks in advance!

What I tried:

Regex:

((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro

Replacement:

$1&micro;

Solution

Use the SKIP-FAIL technique, but match whole words:

(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b

See the proof.

Explanation

--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              fail the match ((*F)) and resume searching
                           after the skipped text ((*SKIP))
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
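(*SKIP) and (*F) are PCRE backtracking verbs; JavaScript's RegExp does not support them. A common emulation, sketched below, matches the tags and entities as ordinary alternatives and hands them back unchanged from a replace callback, so only a bare micro is converted. The function name `replaceMicroWholeWord` is hypothetical.

```javascript
// JavaScript has no (*SKIP)(*F), so match the "skip" alternatives too
// and return them unchanged from the callback.
const skipFailWholeWord = /<[^<>]*>|&\w+;|\bmicro\b/g;

function replaceMicroWholeWord(html) {
  return html.replace(skipFailWholeWord, (m) =>
    // Tags and entities come back as-is; only a bare "micro" is converted.
    m === "micro" ? "&micro;" : m
  );
}

[
  "micro", "abc micro", "micromicro", "&micro;micro",
  "<tag micro />", "&micro;", "&abcmicro123;",
].forEach((s) => console.log(s + " -> " + replaceMicroWholeWord(s)));
```

Because of the two \b anchors, this version leaves micromicro unchanged, whereas the task expects &micro;&micro;; removing both anchors makes that case pass as well.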

You could try this:

(?:<.*?>|&\w++;)(*SKIP)(*F)|micro

Replacement string:

&micro;
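The same pattern can be mirrored in JavaScript with the callback emulation of SKIP-FAIL, since JavaScript lacks both the backtracking verbs and possessive quantifiers. In this sketch, &\w++; becomes &\w+;: because \w never matches ;, backtracking cannot produce a different match, so the two behave the same here. `replaceMicro` and `expected` are illustrative names.

```javascript
// Callback emulation of (?:<.*?>|&\w++;)(*SKIP)(*F)|micro.
// No \b here, so "micromicro" is converted too.
const skipFail = /<.*?>|&\w+;|micro/g;

function replaceMicro(html) {
  // Tags and entities are returned untouched; "micro" becomes "&micro;".
  return html.replace(skipFail, (m) => (m === "micro" ? "&micro;" : m));
}

// Input -> expected output, taken from the task description.
const expected = {
  "micro": "&micro;",
  "abc micro": "abc &micro;",
  "micromicro": "&micro;&micro;",
  "&micro;micro": "&micro;&micro;",
  "<tag micro />": "<tag micro />",
  "&micro;": "&micro;",
  "&abcmicro123;": "&abcmicro123;",
};

for (const [input, want] of Object.entries(expected)) {
  const got = replaceMicro(input);
  console.log(input + " -> " + got + (got === want ? "" : "  (expected " + want + ")"));
}
```

Note that . does not match line breaks, so a tag spanning several lines is not skipped unless the s flag is added or <[^<>]*> is used instead.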


var strings = [
    "micro", "abc micro", "micromicro", "&micro;micro", "<tag micro />", "&micro;", "&abcmicro123;"
];
// Negative lookbehind: skip "micro" when an unclosed "<..." or "&..." precedes it
var re = /(?<!(<[^>]*|&[^;]*))(micro)/g;
strings.forEach(function(str) {
    var result = str.replace(re, '&$2;'); // $2 is the captured "micro"
    console.log(str + ' -> ' + result);
});

Console output:

micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;

Explanation:

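  • (?<!...) - a negative lookbehind that rules out micro inside a tag or an entity
  • (<[^>]*|&[^;]*) - the lookbehind's alternatives: an unclosed <... or &... right before micro means it sits inside <...> or &...;
  • (micro) - captures the word to convert (add more alternatives as needed, e.g. (micro|brewery))
  • '&$2;' - the replacement turns the captured word into an entity, &...;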
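The third bullet can be generalized to a list of words by building the pattern at runtime. The sketch below assumes an engine with ES2018 lookbehind support (as in current V8); `toEntities` is a hypothetical helper, and its lookbehind is made non-capturing so the word is $1 rather than $2.

```javascript
// Hypothetical helper: turn each listed word into "&word;" unless it sits
// inside a tag (<...) or an entity (&...;). Requires ES2018 lookbehind.
function toEntities(html, words) {
  // Escape regex metacharacters and try longer words first, so a word that is
  // a prefix of another one does not win the alternation by accident.
  const alternatives = [...words]
    .sort((a, b) => b.length - a.length)
    .map((w) => w.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("|");
  // No capturing group inside the lookbehind, so the word itself is $1.
  const re = new RegExp(`(?<!<[^>]*|&[^;]*)(${alternatives})`, "g");
  return html.replace(re, "&$1;");
}

console.log(toEntities('5 micro m, copy 2024 <b class="copy">', ["micro", "copy"]));
```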