How can I use a negative lookbehind and a negative lookahead together in a single Perl regex?

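Problem description

In a multi-line string, on each line, I want to delete everything from the first unescaped percent sign to the end of the line, with one exception. If the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.

(The string is LaTeX / TeX code, where a percent sign starts a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where the seconds have been commented out of the time string.)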

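The code below almost does it:

It uses a negative lookbehind to leave \% alone
It uses a non-greedy ("ungreedy") match to find the first %, not the last
It uses another negative lookbehind to skip \d\d:\d\d%
But it cannot tell \d\d:\d\d%anything apart from \d\d:\d\d%:\d\d, and skips both.
My attempts to add a negative lookahead have not worked. Is there a way to do this?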

#!/usr/bin/perl
use strict; use warnings;

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

print "original string:\n";
print "$string<<\n";

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\ndelete after any unescaped %\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\nexception for preceding HH:MM\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (!?:\d\d)
        (\%.*)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}


{
    my $tochange = $string;
    # attempt to add negative lookahead
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        (!?:\d\d)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}

Solution

You can use the SKIP/FAIL approach:

\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*

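\d\d:\d\d%:\d\d(*SKIP)(*FAIL)| matches the pattern you want to leave alone; (*SKIP)(*FAIL) then rejects that match and stops the engine from retrying anywhere inside it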
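
A minimal sketch of how this pattern could be dropped into a substitution, reusing the sample text from the question. The helper name strip_comments is hypothetical, and the /x layout with comments is an addition for readability, not part of the one-line pattern above.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Same sample text as in the question; single quotes keep \% literal.
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

# strip_comments is a hypothetical helper name, not from the original post.
sub strip_comments {
    my ($text) = @_;
    $text =~ s{
        \d\d:\d\d%:\d\d (*SKIP)(*FAIL)   # HH:MM%:SS: consume it, then reject it
        |
        (?<!\\) % .*                     # unescaped % and the rest of the line
    }{}gx;
    return $text;
}

print strip_comments($string), "<<\n";
```

On the sample text this should keep `02:59%:02 NO DELETE ` on the second line and cut the third line right after `04:09`. The (*SKIP) and (*FAIL) verbs need Perl 5.10 or later.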

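For example, with the first alternative removed, what remains is the plain comment-stripping pattern:

(?<!\\)%.*

Here the negative lookbehind (?<!\\) asserts that no backslash comes directly before the %, and %.* matches the % and the rest of the line.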
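
Since the question asks about combining a lookbehind with a lookahead, here is one possible lookaround-only sketch. It is an alternative to SKIP/FAIL rather than the answer's own method, and the variable name $stripped is made up for illustration. The idea is to nest a fixed-width lookbehind inside a negative lookahead, so the % is rejected only when HH:MM comes before it and :SS comes after it.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Same sample text as in the question; single quotes keep \% literal.
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

# $stripped is a hypothetical name; copy first so $string stays untouched.
(my $stripped = $string) =~ s{
    (?<!\\) %                 # a % with no backslash directly before it
    (?!                       # unless, seen from just after that %,
        (?<=\d\d:\d\d%)       #   the text ending at the % is HH:MM%
        :\d\d                 #   and the text after it is :SS
    )
    .*                        # remove through the end of the line
}{}gx;

print "$stripped<<\n";
```

Note that a negative lookahead is written (?!...). The (!?:\d\d) in the question's attempts is an ordinary capture group that matches an optional ! followed by :\d\d, which is why those attempts did not behave like a lookahead.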
