How do I perform both a negative lookbehind and a negative lookahead in a single Perl regex?

Problem description

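In a multi-line string, on each line, I want to delete everything from the first unescaped percent sign to the end of the line, with one exception. If the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.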

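(The string is LaTeX / TeX code, where a percent sign starts a comment. I want to treat a comment inside an HH:MM:SS string as a special case, in which the seconds have been commented out of the time string.)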

The code below almost does it:

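  1. It uses a negative lookbehind to leave \% alone.
  2. It uses a non-greedy quantifier to match the first %, not the last one.
  3. It uses another negative lookbehind to skip \d\d:\d\d%.
  4. But it fails to distinguish between \d\d:\d\d%anything and \d\d:\d\d%:\d\d, and skips both.
  5. My attempts to add a negative lookahead were unsuccessful. Is there a way to do this?
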
#!/usr/bin/perl
use strict; use warnings;

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

print "original string:\n";
print "$string<<\n";

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\ndelete after any unescaped %\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        $/${1}/mgx;
    print "\nexception for preceding HH:MM\n";
    print "$tochange<<\n";
}

{
    my $tochange = $string;
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (!?:\d\d)
        (\%.*)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}


{
    my $tochange = $string;
    # attempt to add negative lookahead
    $tochange =~ s/
        (^.*?
        (?<!\d\d:\d\d)
        (?<!\\)
        )
        (\%.*)
        (!?:\d\d)
        $/${1}/mgx;
    print "\nattempt to add negative lookahead\n";
    print "$tochange<<\n";
}

Solution

You can use the (*SKIP)(*FAIL) technique:

\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*
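  • \d\d:\d\d%:\d\d(*SKIP)(*FAIL)| matches the pattern you want to avoid; (*SKIP)(*FAIL) then rejects that match and keeps the engine from retrying at any position inside the skipped text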
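
To see what the pattern actually matches, here is a minimal sketch run against one sample line from the question. The variable names ($line, $re, @matches) are only illustrative, and the /x layout is my own formatting of the same pattern.

```perl
#!/usr/bin/perl
use strict; use warnings;

# One sample line from the question (single quotes keep the backslash literal).
my $line = 'for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me';

# The same pattern, spread out with the x modifier for readability.
my $re = qr/
    \d\d:\d\d%:\d\d (*SKIP)(*FAIL)   # time with commented-out seconds: consume it, then reject it
  |
    (?<!\\) % .*                     # otherwise: an unescaped percent sign through end of line
/x;

# In list context, a global match without capture groups returns every matched substring.
my @matches = $line =~ /$re/g;
print "matched: [$_]\n" for @matches;
```

Only the trailing comment should be reported: the \% is rejected by the lookbehind, and the % inside 02:59%:02 is stepped over by (*SKIP)(*FAIL).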


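For example, once the exception has been stepped over, the second alternative does the actual deletion:

(?<!\\)%.*

It matches a % that is not preceded by a backslash, plus the rest of the line. Replacing that match with an empty string removes the comment.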
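
Putting it together, below is a minimal sketch that applies the substitution to the sample string from the question. Since the question asks specifically about combining a lookbehind with a lookahead, a second variant using only lookarounds is included for comparison; that variant is my own assumption about how it could be written, not part of the answer above, and the variable names $skip_fail and $lookaround are illustrative.

```perl
#!/usr/bin/perl
use strict; use warnings;

# Sample input from the question.
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything  %delete-me
for 10 percent%delete-me';

# Variant 1: SKIP/FAIL. The first alternative steps over HH:MM%:SS,
# the second deletes from an unescaped % to the end of the line.
(my $skip_fail = $string) =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;

# Variant 2 (assumed alternative): lookarounds only. At the % we require
#   - no backslash directly before it, and
#   - NOT (HH:MM directly before it AND :SS directly after it).
(my $lookaround = $string) =~ s/(?<!\\)(?!(?<=\d\d:\d\d)%:\d\d)%.*//g;

print "SKIP/FAIL:\n$skip_fail<<\n\n";
print "lookarounds only:\n$lookaround<<\n";
```

Neither variant needs the /m or /s modifiers: without /s the dot does not match a newline, so .* stops at the end of each line, and /g repeats the substitution on every line. On this sample both variants should give the same result, keeping 02:59%:02 intact while removing everything after 04:09.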