[LeetCode] Regular Expression Matching

Regular Expression Matching


Implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

The function prototype should be:
bool isMatch(const char *s,const char *p)

Some examples:
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa","a*") → true
isMatch("aa",".*") → true
isMatch("ab",".*") → true
isMatch("aab","c*a*b") → true

The main difficulty of this problem lies in handling "*". LeetCode gives an explanation of the approach:

http://leetcode.com/2011/09/regular-expression-matching.html


Solution:
This looks just like a straightforward string matching problem, doesn’t it? Couldn’t we just match the pattern and the input string character by character? The question is, how do we match a ‘*’?

A natural way is to use a greedy approach; that is, we attempt to match the previous character as many times as we can. Does this work? Let us look at some examples.

s = “abbbc”, p = “ab*c”
Assume we have matched the first ‘a’ in both s and p. When we see “b*” in p, we skip all the b’s in s. Since the last ‘c’ matches on both sides, the string and the pattern match.

s = “ac”, p = “ab*c”
After the first ‘a’, we see that there are no b’s to skip for “b*”. We match the last ‘c’ on both sides and conclude that the string and the pattern match.

It seems that being greedy is good. But how about this case?

s = “abbc”, p = “ab*bbc”
When we see “b*” in p, we would have skipped all the b’s in s. The string and the pattern should match, but no b’s are left in s for the “bb” that follows “b*” in p. Therefore, the greedy approach fails in this case.
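The failure described above can be reproduced with a minimal sketch of the greedy idea. The function name isMatchGreedy is hypothetical and not part of the original article; the matcher is deliberately flawed and only serves to show where greed goes wrong.

```cpp
#include <cassert>

// Greedy matcher (illustration only, intentionally incorrect):
// every "x*" swallows as many matching characters as it can
// and never gives any of them back.
bool isMatchGreedy(const char *s, const char *p) {
    assert(s && p);
    while (*p != '\0') {
        if (*(p + 1) == '*') {
            // Consume every character that "x*" is able to match.
            while (*s != '\0' && (*p == '.' || *p == *s)) ++s;
            p += 2;
        } else {
            // A plain element must match exactly one character.
            if (*s == '\0' || (*p != '.' && *p != *s)) return false;
            ++s;
            ++p;
        }
    }
    // The whole input must be consumed.
    return *s == '\0';
}
```

With this sketch, “abbbc” and “ac” are both accepted against “ab*c”, but “abbc” is rejected against “ab*bbc” even though the two should match, because “b*” has already consumed both b’s.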

One might be tempted to think of a quick workaround. How about counting the number of consecutive b’s in s? If it is greater than or equal to the number of consecutive b’s after “b*” in p, we conclude that they match and continue from there. Otherwise, we conclude that there is no match.

This seems to solve the above problem, but how about this case:
s = “abcbcd”, p = “a.*c.*d”

Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p match the first or the second ‘c’ in s? Unfortunately, there is no way to tell without using some kind of exhaustive search.

We need some kind of backtracking mechanism such that when a match fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.

The recursion breaks down elegantly into the following two cases:

  1. If the next character of p is NOT ‘*’, then the current character of p must match the current character of s. Continue pattern matching with the next character of both s and p.
  2. If the next character of p is ‘*’, then we do a brute-force exhaustive matching of 0, 1, or more repeats of the current character of p, until we cannot match any more characters.

You would need to consider the base case carefully too. That would be left as an exercise to the reader.

A reference solution:
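#include <cassert>

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        assert(s && p);
        // Base case: an empty pattern matches only an empty string.
        if (*p == '\0') return *s == '\0';

        if (*(p + 1) != '*') {
            // Case 1: the next pattern character is not '*', so the current
            // characters must match before both pointers advance.
            if (*s != '\0' && (*p == *s || *p == '.'))
                return isMatch(s + 1, p + 1);
            return false;
        } else {
            // Case 2: "x*". Try zero repeats first, then consume one more
            // matching character from s and try again.
            while (*s != '\0' && (*p == *s || *p == '.')) {
                if (isMatch(s, p + 2)) return true;
                s++;
            }
            // "x*" cannot absorb any more characters; match the rest.
            return isMatch(s, p + 2);
        }
    }
};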
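
The plain recursion can revisit the same pair of suffixes of s and p many times when the pattern contains several stars. As an optional variant that is not part of the original article, the same two cases can be cached per (i, j) position. The names isMatchMemo and matchFrom below are hypothetical, and this is only a sketch under the same assumptions as the reference solution (a well-formed pattern, null-terminated strings).

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

namespace {
// memo[i][j]: -1 = not computed yet, 0 = s+i does not match p+j, 1 = it does.
bool matchFrom(const char *s, const char *p, std::size_t i, std::size_t j,
               std::vector<std::vector<int>> &memo) {
    int &cached = memo[i][j];
    if (cached != -1) return cached == 1;

    bool result;
    if (p[j] == '\0') {
        // Base case: an empty pattern matches only an empty string.
        result = (s[i] == '\0');
    } else {
        // Does the current pattern element match the current character?
        bool first = (s[i] != '\0') && (p[j] == '.' || p[j] == s[i]);
        if (p[j + 1] == '*') {
            // Zero repeats, or consume one character and stay on "x*".
            result = matchFrom(s, p, i, j + 2, memo) ||
                     (first && matchFrom(s, p, i + 1, j, memo));
        } else {
            result = first && matchFrom(s, p, i + 1, j + 1, memo);
        }
    }
    cached = result ? 1 : 0;
    return result;
}
}  // namespace

bool isMatchMemo(const char *s, const char *p) {
    assert(s && p);
    std::size_t n = std::strlen(s), m = std::strlen(p);
    std::vector<std::vector<int>> memo(n + 1, std::vector<int>(m + 1, -1));
    return matchFrom(s, p, 0, 0, memo);
}
```

Each (i, j) pair is computed at most once, so the work is bounded by the product of the two string lengths. The examples from the problem statement, plus the two tricky cases discussed above, can be used to check it.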
