A more efficient string-cleaning regex in PHP

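Okay, I'm hoping someone can help me with a bit of regex.

I am trying to clean up a string.

Basically, I am:

> Replacing all characters except A-Za-z0-9 with a replacement.
> Replacing consecutive duplicates of the replacement with a single instance of the replacement.
> Trimming the replacement from the beginning and end of the string.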

Example input:

(&&(%()$()#&#&%& %%(%$-_The dog jumped over the log*(&)$%&)#)@#%& )& ^)@#)

Required output:

The-dog-jumped-over-the-log

I'm currently using this very disjointed code, and I just know there is a more elegant way to accomplish this.

function clean($string, $replace){

    $ok = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    $ok .= $replace;
    $pattern = "/[^".preg_quote($ok, "/")."]/";

    return trim(preg_replace('/'.preg_quote($replace.$replace).'+/', $replace, preg_replace($pattern, $replace, $string)),$replace);
}

Could a regex-fu master please give me a simpler/more efficient solution?

BotondBalázs and hakre proposed and explained a much better solution:

function clean($string, $replace, $skip=""){
    // Escape $replace and $skip so they are safe inside the character class
    $escaped = preg_quote($replace.$skip, "/");

    // Regex pattern
    // Replace all consecutive occurrences of "Not OK" 
    // characters with the replacement
    $pattern = '/[^A-Za-z0-9'.$escaped.']+/';

    // Execute the regex
    $result = preg_replace($pattern, $replace, $string);

    // Trim and return the result
    return trim($result, $replace);
}
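
As a rough usage sketch, the function above can be called with the sample input from the question. The function is repeated so the snippet runs on its own; the second and third input strings are made up for illustration and do not come from the original post.

```php
<?php
// Copy of the clean() function quoted above, repeated here so the snippet is self-contained.
function clean($string, $replace, $skip = "") {
    $escaped = preg_quote($replace . $skip, "/");
    $pattern = '/[^A-Za-z0-9' . $escaped . ']+/';
    $result  = preg_replace($pattern, $replace, $string);
    return trim($result, $replace);
}

// Sample input from the question (single quotes, so "$" is not interpolated).
$input = '(&&(%()$()#&#&%& %%(%$-_The dog jumped over the log*(&)$%&)#)@#%& )& ^)@#)';

echo clean($input, "-"), "\n";               // The-dog-jumped-over-the-log

// Hypothetical input: "_" is passed in $skip, so underscores are kept.
echo clean("foo_bar!!baz", "-", "_"), "\n";  // foo_bar-baz

// Hypothetical input: a replacement character already present in the input
// is kept as-is, so it can end up next to a freshly inserted one.
echo clean("a -b", "-"), "\n";               // a--b
```

Note that because $replace is part of the allowed character class here, an existing "-" next to a replaced run is not merged with it, as the last call shows. Whether that matters depends on the input you expect.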

Answer:

I'm no "regex ninja", but here is how I would do it.

function clean($string, $replace){
    // Remove all "not OK" characters from the beginning and the end:
    $result = preg_replace('/^[^A-Za-z0-9]+/', '', $string);
    $result = preg_replace('/[^A-Za-z0-9]+$/', '', $result);

    // Replace all consecutive occurrences of "not OK" 
    // characters with the replacement:
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $result);

    return $result;
}

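I suppose this could be simplified further, but when dealing with regexes, clarity and readability are often more important than being clever or writing hyper-optimized code.

Let's see how it works: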

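> /^[^A-Za-z0-9]+/:

> ^ matches the beginning of the string.
> [^A-Za-z0-9] matches any single non-alphanumeric character.
> + means "match one or more of the preceding item".

> /[^A-Za-z0-9]+$/:

> Same as above, but $ matches the end of the string.

> /[^A-Za-z0-9]+/:

> Same as above, but without anchors, so it also matches in the middle of the string.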
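
To make the difference between the three patterns concrete, here is a small sketch that applies each one on its own. The input string is a made-up example, not one from the original post.

```php
<?php
// Hypothetical input with "not OK" characters at the start, in the middle, and at the end.
$input = '##Hello, World!!';

// Anchored with ^: only the leading run is removed.
$start = preg_replace('/^[^A-Za-z0-9]+/', '', $input);
echo $start, "\n";   // Hello, World!!

// Anchored with $: only the trailing run is removed.
$end = preg_replace('/[^A-Za-z0-9]+$/', '', $input);
echo $end, "\n";     // ##Hello, World

// No anchors: every run, wherever it occurs, becomes a single "-".
$all = preg_replace('/[^A-Za-z0-9]+/', '-', $input);
echo $all, "\n";     // -Hello-World-

// Without the "+", each character is replaced individually.
$each = preg_replace('/[^A-Za-z0-9]/', '-', $input);
echo $each, "\n";    // --Hello--World--
```

The last call shows why the + quantifier matters: it is what collapses a run of consecutive "not OK" characters into one replacement.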

Edit: the OP is right, the first two can be replaced with a call to trim():

function clean($string, $replace){
    // Replace all consecutive occurrences of "not OK" 
    // characters with the replacement:
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);

    return trim($result, $replace);
}
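
A quick way to check this shorter version is to run it on a couple of strings. The sketch below repeats the function (with the pattern applied to the $string parameter) so that it runs on its own. Both inputs are illustrative: the first is a shortened variant of the question's sample, the second is made up.

```php
<?php
// Self-contained copy of the trim()-based version, reading from $string.
function clean($string, $replace) {
    // Replace every run of non-alphanumeric characters with the replacement.
    $result = preg_replace('/[^A-Za-z0-9]+/', $replace, $string);
    // Strip the replacement from both ends.
    return trim($result, $replace);
}

echo clean('(&&(%$-_The dog jumped over the log*(&)$%&)#', '-'), "\n"; // The-dog-jumped-over-the-log
echo clean('  a -- b  ', '_'), "\n";                                   // a_b
```

One thing to keep in mind: trim() treats its second argument as a set of characters, not as a substring, so this behaves as described when $replace is a single non-alphanumeric character such as "-" or "_".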
