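Problem description
In a multi-line string, on each line, I want to delete everything from the first unescaped percent sign to the end of the line, with one exception: if the unescaped percent sign occurs in the position \d\d:\d\d%:\d\d, I want to leave it alone.
(The string is LaTeX / TeX code, where a percent sign starts a comment. I want to treat a comment inside an HH:MM:SS string as a special case, where the seconds are commented out of the time string.)
The code below almost does it:
- it uses a negative lookbehind to leave \% alone
- it uses "ungreedy" matching to match the first %, not the last
- it uses another negative lookbehind to skip \d\d:\d\d%
- but it fails to distinguish \d\d:\d\d%anything from \d\d:\d\d%\d\d, and skips both
- my attempts to add a negative lookahead did not help
Is there a way to do this?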
#!/usr/bin/perl
use strict; use warnings;
my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';
print "original string:\n";
print "$string<<\n";
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\ndelete after any unescaped %\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
$/${1}/mgx;
print "\nexception for preceding HH:MM\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(!?:\d\d)
(\%.*)
$/${1}/mgx;
print "\nattempt to add negative lookahead\n";
print "$tochange<<\n";
}
{
my $tochange = $string;
# attempt to add negative lookahead
$tochange =~ s/
(^.*?
(?<!\d\d:\d\d)
(?<!\\)
)
(\%.*)
(!?:\d\d)
$/${1}/mgx;
print "\nattempt to add negative lookahead\n";
print "$tochange<<\n";
}
Solution
You can use the SKIP FAIL approach:
\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*
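- \d\d:\d\d%:\d\d(*SKIP)(*FAIL)| matches the pattern you want to avoid, then discards that match and resumes scanning after it
- (?<!\\)%.* is the other alternative: a % that is not directly preceded by a backslash, followed by the rest of the line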
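
To show how the pattern plugs into the substitution from the question, here is a minimal sketch. The helper names strip_tex_comments and strip_tex_comments_lookaround are hypothetical and do not appear in the original post. The second helper is an alternative that uses only lookarounds; it is not part of the answer above and is included only because the question asked about a negative lookahead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (name is an assumption): applies the SKIP/FAIL pattern.
sub strip_tex_comments {
    my ($text) = @_;
    # Alternative 1: consume an HH:MM%:SS timestamp, then (*SKIP)(*FAIL)
    #   rejects the match and restarts scanning after the timestamp.
    # Alternative 2: a % not preceded by a backslash, plus the rest of the line
    #   ("." does not match a newline, so each line is handled separately).
    $text =~ s/\d\d:\d\d%:\d\d(*SKIP)(*FAIL)|(?<!\\)%.*//g;
    return $text;
}

# Hypothetical alternative (an assumption, not from the answer): lookarounds only.
sub strip_tex_comments_lookaround {
    my ($text) = @_;
    # Skip a % only when it is both preceded by HH:MM and followed by :SS.
    $text =~ s/(?<!\\)(?!(?<=\d\d:\d\d)%:\d\d)%.*//g;
    return $text;
}

my $string = 'for 10\% and %delete-me
for 10\% and 2021-03-09 Tue 02:59%:02 NO DELETE %delete-me
for 10\% and 2021-03-09 Tue 04:09%anything %delete-me
for 10 percent%delete-me';

print "SKIP/FAIL:\n", strip_tex_comments($string), "<<\n";
print "\nlookaround only:\n", strip_tex_comments_lookaround($string), "<<\n";
```

On the sample input, both helpers should keep 02:59%:02 intact and cut the 04:09%anything line at its first %. Neither treats a literal backslash written as \\ before a % specially; handling that case would need a different lookbehind.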