Recursion in Perl Regular Expressions

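    Today I saw a post on chinaunix asking:

    Taking the outermost parentheses as level 1, how can one remove exactly one pair of level-2 parentheses while keeping all the other parentheses?

    For example: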

     (((1,2),3),4)   =>  ((1,2),3,4)
     ((1,2),(3,4))   =>  ((1,2),3,4)
                             or
                         (1,2,(3,4))

     (1,(2,(3,4)))   =>  (1,2,(3,4))
   
     Solution 1:

    

     Solution 2:
    

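     $str =~ /
     (\()         # group 1: $1 matches the left parenthesis
     (?=          # the whole thing is a lookahead, so the 1st successful match starts at the 1st left parenthesis, the 2nd at the 2nd, and so on
        (         # group 2: $2 matches the contents of the parentheses plus $3
                (?:        # non-capturing group
                        [^()]              # either a character that is not a parenthesis
                        |
                        (?1)(?2)        # or group 1 followed by a recursion into group 2
                )+
                (\)) # group 3: $3 matches the right parenthesis
        )
      )
      /xg;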
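
     The pattern above only locates each left parenthesis together with its matching right parenthesis; it does not delete anything by itself. Here is a rough sketch of how it might be applied to the original question. The helper name level2_removals and the variable names are my own, and the nesting depth is tracked with a small stack of closing offsets, which is just one possible approach. It assumes the input is well balanced.

```perl
use strict;
use warnings;

# Solution 2's pattern: capture a "(" and, via lookahead, everything up to
# its matching ")" without consuming more than the "(" itself.
my $pair = qr/
    (\()                          # group 1: the left parenthesis
    (?=
        (                         # group 2: contents plus the closing ")"
            (?: [^()] | (?1)(?2) )+
            (\))                  # group 3: the matching right parenthesis
        )
    )
/x;

# Hypothetical helper: return every string obtained by deleting exactly one
# level-2 pair (the outermost pair counts as level 1).
sub level2_removals {
    my ($str) = @_;
    my (@results, @enclosing);    # @enclosing: offsets of ")" not yet passed
    while ($str =~ /$pair/g) {
        my ($open, $close) = ($-[1], $+[2] - 1);
        # Forget pairs that closed before this "(".
        pop @enclosing while @enclosing && $enclosing[-1] < $open;
        push @enclosing, $close;
        next unless @enclosing == 2;          # keep depth-2 pairs only
        my $copy = $str;
        substr($copy, $close, 1, '');         # delete ")" first so $open stays valid
        substr($copy, $open,  1, '');
        push @results, $copy;
    }
    return @results;
}

print "$_\n" for level2_removals('((1,2),(3,4))');
```

     For '((1,2),(3,4))' this prints (1,2,(3,4)) and then ((1,2),3,4), one candidate per level-2 pair.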


-------------------- divider --------------------


     http://perldoc.perl.org/perlre.html describes the regular-expression features that are new in Perl 5.10 and later:

     (?PARNO) (?-PARNO) (?+PARNO) (?R) (?0)

Similar to (??{ code }) except it does not involve compiling any code; instead it treats the contents of a capture buffer as an independent pattern that must match at the current position. Capture buffers contained by the pattern will have the value as determined by the outermost recursion.

PARNO is a sequence of digits (not starting with 0) whose value reflects the paren-number of the capture buffer to recurse to. (?R) recurses to the beginning of the whole pattern. (?0) is an alternate syntax for (?R). If PARNO is preceded by a plus or minus sign then it is assumed to be relative, with negative numbers indicating preceding capture buffers and positive ones following. Thus (?-1) refers to the most recently declared buffer, and (?+1) indicates the next buffer to be declared. Note that the counting for relative recursion differs from that of relative backreferences, in that with recursion unclosed buffers are included.
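
To make the different spellings concrete, here is a small sketch of my own (it is not taken from perlre). It writes a balanced-parentheses check with absolute, relative and whole-pattern recursion, plus one forward reference. The variable names and test strings are illustrative only.

```perl
use strict;
use warnings;

# Absolute: (?1) re-enters capture group 1.
my $abs = qr/\A ( \( (?: [^()]++ | (?1) )* \) ) \z/x;

# Relative: group 1 is still open at this point, and unclosed groups count,
# so (?-1) also re-enters group 1.
my $rel = qr/\A ( \( (?: [^()]++ | (?-1) )* \) ) \z/x;

# Whole pattern: (?R), or equivalently (?0), re-enters the entire pattern.
my $whole = qr/ \( (?: [^()]++ | (?R) )* \) /x;

# Forward: (?+1) runs the next group to be declared, here ( \d+ ).
my $fwd = qr/\A (?+1) - ( \d+ ) \z/x;

for my $s ('(a(b)c)', '(a(b)c') {
    printf "%-8s abs=%d rel=%d\n", $s, ($s =~ $abs ? 1 : 0), ($s =~ $rel ? 1 : 0);
}
print "whole: $&\n" if 'x(a(b)c)y' =~ $whole;
print "fwd:   $1\n" if '12-34' =~ $fwd;
```

The balanced string is accepted by both $abs and $rel and the unbalanced one is rejected by both; $whole picks out (a(b)c), and $fwd leaves 34 in $1. Because (?R) always means the entire pattern, $whole only behaves this way when it is used on its own rather than embedded in a larger regex.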

The following pattern matches a function foo() which may contain balanced parentheses as the argument.

  
  
    my $re = qr{ (                   # paren group 1 (full function)
                  foo
                  (                  # paren group 2 (parens)
                    \(
                      (              # paren group 3 (contents of parens)
                      (?:
                       (?> [^()]+ )  # Non-parens without backtracking
                      |
                       (?2)          # Recurse to start of paren group 2
                      )*
                      )
                    \)
                  )
                )
              }x;

If the pattern was used as follows

  
  
    'foo(bar(baz)+baz(bop))' =~ /$re/
        and print "\$1 = $1\n",
                  "\$2 = $2\n",
                  "\$3 = $3\n";

the output produced should be the following:

  
  
    $1 = foo(bar(baz)+baz(bop))
    $2 = (bar(baz)+baz(bop))
    $3 = bar(baz)+baz(bop)

If there is no corresponding capture buffer defined, then it is a fatal error. Recursing deeper than 50 times without consuming any input string will also result in a fatal error. The maximum depth is compiled into perl, so changing it requires a custom build.
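
A quick way to see the first of these errors is to recurse to a group that does not exist. This is my own sketch, not from the documentation. The pattern is kept in a string (the name $pattern is arbitrary) so that it is compiled at run time, where eval can trap the failure.

```perl
use strict;
use warnings;

# The pattern is held in a string so that it is compiled at run time,
# inside the eval, rather than when the script itself is compiled.
my $pattern = '(a)(?2)';          # only one capture group exists

my $compiled = eval { qr/$pattern/ };
if (defined $compiled) {
    print "compiled without error\n";
} else {
    print "rejected: $@";
}
```

Writing the same pattern as a literal would abort the whole script at compile time, which is why the sketch goes through a string.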

The following shows how using negative indexing can make it easier to embed recursive patterns inside of a qr// construct for later use:

  
  
    my $parens = qr/(\((?:[^()]++|(?-1))*+\))/;
    if (/foo $parens \s+ \+ \s+ bar $parens/x) {
        # do something here...
    }

Note that this pattern does not behave the same way as the equivalent PCRE or Python construct of the same form. In Perl you can backtrack into a recursed group; in PCRE and Python the recursed-into group is treated as atomic. Also, modifiers are resolved at compile time, so constructs like (?i:(?1)) or (?:(?i)(?1)) do not affect how the sub-pattern will be processed.
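
The point about modifiers is easy to try out. In this sketch of mine (the pattern and test strings are made up for illustration), group 1 is compiled case-sensitively, and wrapping the recursion in (?i:...) should not change that, going by the passage quoted above.

```perl
use strict;
use warnings;

# Group 1 is compiled case-sensitively; the (?i:...) around (?1) does not
# change how the recursed group itself matches.
my $re = qr/\A (?: (a) | x (?i: (?1) ) ) \z/x;

for my $s ('xa', 'xA') {
    printf "%s: %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```

Going by the documentation, 'xa' matches and 'xA' does not, because the recursed group still looks for a lowercase a.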
