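Regex to match a word in text, but not inside quotes or comments

Question

I am building an extension for VS Code and using the formatter API to uppercase all keywords.

Suppose I have this code in the editor: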

TYPE MyStruct : STRUCT
        this.var1 : POINTER TO INT; (* Указатель 1 *)
        var2 : POINTER TO INT; (* this is Указатель 2 *)
        sstr: STRING(200) := "This 
            Test this line";    
        sstr: STRING(200) := "Test this line";    
        sstr: STRING(200) := 'Test this line';    
    END_STRUCT
END_TYPE

THIS.MyStruct := 100;

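I want to find every occurrence of the word this, except those inside comments (* ... *) or strings (single- or double-quoted).

My attempt, with the ig flags:

(?<=^([^"'])*)\bthis\b

But it still matches inside comments, and it breaks when a string spans a new line.

Here is my actual code: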

let keywords = [
    'true','false','exit','continue','return','constant','retain','public','private','protected','abstract','persistent','internal','final','of','else','elsif','then','__try','__catch','__finally','__endtry','do','to','by','task','with','using','uses','from','until','or','or_else','and','and_then','not','xor','nor','ge','le','eq','ne','gt','lt','__new','__delete','extends','implements','this','super'
];
let regEx = new RegExp(`\\b(?:${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\\b`,"ig");
text = text.replace(regEx,(match) => {
    return match.toUpperCase();
});

Answer

You need to match the contexts you want to discard, and then match and capture the occurrences of the pattern you need to modify:

/(?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)|\b(true|false|exit|continue|return|constant|retain|public|private|protected|abstract|persistent|internal|final|of|else|elsif|then|__try|__catch|__finally|__endtry|do|to|by|task|with|using|uses|from|until|or|or_else|and|and_then|not|xor|nor|ge|le|eq|ne|gt|lt|__new|__delete|extends|implements|this|super|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b/gi

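See the regex demo.

I changed the first (?: in your pattern to ( so that the matches you expect are captured into Group 1, and added (?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"|\(\*[\s\S]*?\*\)| at the start of the pattern.

  • (?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*" - a position not preceded by a backslash (optionally) followed by any even number of backslashes, then a double-quoted string literal with escape sequence support
  • | - or
  • \(\*[\s\S]*?\*\) - (*, then zero or more characters, as few as possible, then *)

See the JavaScript demo: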
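To see what each of the two "discard" alternatives consumes on its own, they can be tried separately. This is only a small sketch: the sample text and the names stringPattern, commentPattern and sample are made up for illustration and do not come from the question.

```javascript
// Double-quoted string with escape support; the negated classes and [\s\S]
// also match newlines, so a string may span several lines.
const stringPattern = /(?<!\\(?:\\{2})*)"[^"\\]*(?:\\[\s\S][^\\"]*)*"/g;
// (* ... *) comment, matched lazily so it stops at the first closing *).
const commentPattern = /\(\*[\s\S]*?\*\)/g;

// Hypothetical sample: one multi-line string with escaped quotes, two comments.
const sample = 'a := "say \\"this\\"\n and this"; (* this *) b := 1; (* that *)';

const strings = sample.match(stringPattern);
const comments = sample.match(commentPattern);
console.log(strings);   // a single match covering the whole string literal
console.log(comments);  // one match per comment
```

Because the comment pattern is lazy, the two comments are matched separately rather than as one span running from the first (* to the last *).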

const keywords = [
    'true','false','exit','continue','return','constant','retain','public','private','protected','abstract','persistent','internal','final','of','else','elsif','then','__try','__catch','__finally','__endtry','do','to','by','task','with','using','uses','from','until','or','or_else','and','and_then','not','xor','nor','ge','le','eq','ne','gt','lt','__new','__delete','extends','implements','this','super'
];
const regEx = new RegExp(String.raw`(?<!\\(?:\\{2})*)"[^"\\]*(?:\\.[^\\"]*)*"|\(\*.*?\*\)|\b(${keywords.join('|')}|AT|BOOL|BYTE|(?:D|L)?WORD|U?(?:S|D|L)?INT|L?REAL|TIME(?:_OF_DAY)?|TOD|DT|DATE(?:_AND_TIME)?|STRING|ARRAY|ANY)\b`,"igs");
let text = "TYPE MyStruct : STRUCT\n        this.var1 : POINTER TO INT; (* Указатель 1 *)\n        var2 : POINTER TO INT; (* this is Указатель 2 *)\n        sStr: STRING(200) := \"This \n            Test this line\";    \n        sStr: STRING(200) := \"Test this line\";    \n        sStr: STRING(200) := 'Test this line';    \n    END_STRUCT\nEND_TYPE\n\nTHIS.MyStruct := 100;";
text = text.replace(regEx,(match,group) => {
    return group != undefined ? match.toUpperCase() : match;
});
console.log(text);
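
The pattern above discards only double-quoted strings and (* ... *) comments, so a keyword inside a single-quoted string such as 'Test this line' would still be uppercased. One possible way to cover that case is to add a third discard alternative. The sketch below assumes that single-quoted strings use the same backslash escape rule as double-quoted ones, which may not hold for your language, so adjust the escape character if needed. The function name uppercaseKeywords, the shortened keyword list and the input line are illustrative only.

```javascript
// Shortened, illustrative keyword list; substitute the full list from the question.
const keywords = ['this', 'to', 'of', 'and', 'or', 'not'];

// The first three alternatives consume text that must stay untouched;
// only the last alternative captures (Group 1).
const regEx = new RegExp(
    String.raw`(?<!\\(?:\\{2})*)"[^"\\]*(?:\\.[^\\"]*)*"` +   // double-quoted string
    String.raw`|(?<!\\(?:\\{2})*)'[^'\\]*(?:\\.[^\\']*)*'` +  // single-quoted string (assumed same escape rule)
    String.raw`|\(\*.*?\*\)` +                                // (* ... *) comment
    String.raw`|\b(${keywords.join('|')})\b`,                 // keyword, captured
    'gis'
);

function uppercaseKeywords(text) {
    // group is undefined whenever a string or comment alternative matched.
    return text.replace(regEx, (match, group) =>
        group !== undefined ? match.toUpperCase() : match);
}

const input = "this.a := 'keep this'; (* not this *) b := \"or this\"; x := a and b;";
console.log(uppercaseKeywords(input));
// THIS.a := 'keep this'; (* not this *) b := "or this"; x := a AND b;
```

The order of the alternatives matters: strings and comments are tried first at every position, so by the time the keyword alternative gets a chance, any keyword-like text inside them has already been consumed.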