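Problem description
I have been trying to solve this regex quiz for days and still cannot get it to pass. I am close, but one case keeps failing.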
Task:
In an HTML page, replace the text micro with &micro;. Oh, and do not mess up the code: do not replace inside <the tags> or &entities;.
Replace:
- micro -> &micro;
- abc micro -> abc &micro;
- micromicro -> &micro;&micro;
- &micro;micro -> &micro;&micro;
Do not touch:
- <tag micro /> -> <tag micro />
- &micro; -> &micro;
- &abcmicro123; -> &abcmicro123;
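The cases above can be written as a small table-driven check, which makes it easier to see exactly which inputs a candidate pattern gets wrong. This is only a sketch: quizCases and checkCandidate are hypothetical names, and the micro entity is written out literally as &micro;.

```javascript
// The quiz cases as [input, expected output] pairs.
const quizCases = [
  ["micro", "&micro;"],
  ["abc micro", "abc &micro;"],
  ["micromicro", "&micro;&micro;"],
  ["&micro;micro", "&micro;&micro;"],
  ["<tag micro />", "<tag micro />"],
  ["&micro;", "&micro;"],
  ["&abcmicro123;", "&abcmicro123;"],
];

// checkCandidate (hypothetical helper) runs a replace function over every
// case and returns the inputs whose output differs from the expectation.
function checkCandidate(replaceFn) {
  return quizCases
    .filter(function (c) { return replaceFn(c[0]) !== c[1]; })
    .map(function (c) { return c[0]; });
}

// A naive global replace also rewrites "micro" inside tags and entities,
// so every case that contains one is reported as failing.
const naive = function (s) { return s.replace(/micro/g, "&micro;"); };
console.log(checkCandidate(naive));
```

Any candidate, regex-based or not, can be passed in as a function from string to string; an empty array means all seven cases pass.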
I tried this, but it fails on the last &micro;. What am I missing? Can someone point it out? Thanks in advance!
What I have tried:
Regex:
((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro
Replacement:
$1&micro;
Solution
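Use the SKIP-FAIL technique, matching micro as a whole word:
(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b
See proof.
Note that (*SKIP)(*F) is a PCRE/Perl feature, and that the \b word boundaries keep micromicro from matching; drop them if that case has to be replaced as well.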
Explanation
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              skip the match and go on matching from the
                           current location
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
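
The (*SKIP)(*F) verbs are not available in every regex flavor; JavaScript, for instance, does not support them. A minimal sketch of the same idea there, assuming a replacer callback is acceptable: tags and entities are matched first and handed back unchanged, and only a bare micro is replaced. The function name replaceMicro is hypothetical, and \b is left out so that micromicro is covered too.

```javascript
// Emulates the SKIP-FAIL idea without PCRE verbs: the first two alternatives
// consume whole tags and entities, and the callback returns them untouched.
// Only a bare "micro" is turned into the entity.
// replaceMicro is a hypothetical helper name, not part of the answer above.
function replaceMicro(html) {
  return html.replace(/<[^<>]*>|&\w+;|micro/g, function (match) {
    return match === "micro" ? "&micro;" : match;
  });
}

const inputs = [
  "micro", "abc micro", "micromicro", "&micro;micro",
  "<tag micro />", "&micro;", "&abcmicro123;"
];
inputs.forEach(function (s) {
  console.log(s + " -> " + replaceMicro(s));
});
```

Because alternatives are tried left to right at each position, a tag or entity always wins over the micro inside it, which is what the skip step achieves in PCRE.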

You can try the following:
(?:<.*?>|&\w++;)(*SKIP)(*F)|micro
Replacement string:
&micro;
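
// Requires lookbehind support (ES2018 or later).
var strings = [
  "micro", "abc micro", "micromicro", "&micro;micro",
  "<tag micro />", "&micro;", "&abcmicro123;"
];
// Skip any "micro" preceded by an unclosed "<..." or an unterminated "&...".
var re = /(?<!(<[^>]*|&[^;]*))(micro)/g;
strings.forEach(function (str) {
  // $2 is the captured label; wrap it as an entity.
  var result = str.replace(re, '&$2;');
  console.log(str + ' -> ' + result);
});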
Console log output:
micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;
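Explanation:
- (?<!...) - negative lookbehind that excludes micro inside tags or entities
- (<[^>]*|&[^;]*) - the lookbehind pattern, which skips <...> or &...;
- (micro) - captures the label (add more labels as needed, e.g. (micro|brewery))
- '&$2;' - the replacement turns the captured label into an entity &...;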
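
The remark about adding more labels can be sketched as a small helper that builds the pattern from a list of names. The function entityize and its names parameter are hypothetical, and the names are assumed to be plain words with no regex metacharacters. The lookbehind is written without the extra capturing group, so the label ends up in $1 rather than $2.

```javascript
// Builds the lookbehind-based pattern for a list of entity names.
// entityize and `names` are hypothetical, for illustration only.
// Requires an engine with lookbehind support (ES2018 or later).
function entityize(html, names) {
  // Assumption: names contain no regex metacharacters, so no escaping is done.
  const re = new RegExp("(?<!<[^>]*|&[^;]*)(" + names.join("|") + ")", "g");
  // $1 is the matched name; wrap it as an entity.
  return html.replace(re, "&$1;");
}

console.log(entityize('5 micro and 90 deg <img alt="micro" /> &deg;', ["micro", "deg"]));
```

One limitation worth knowing: the lookbehind treats any earlier < without a closing >, or any earlier & without a following ;, as an open tag or entity. A stray character in plain text, as in "a < b micro" or "AT&T micro", therefore suppresses the replacement.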