php – 如何反转Unicode字符串

comment to an answer to this question中暗示了PHP不能反转Unicode字符串.

As for Unicode,it works in PHP
because most apps process it as
binary. Yes,PHP is 8-bit clean. Try
the equivalent of this in PHP: perl
-Mutf8 -e ‘print scalar reverse(“ほげほげ”)’ You will get garbage,
not “げほげほ”. – jrockway

不幸的是,PHPs unicode支持atm是最好的“缺乏”是正确的.这将是hopefully change drastically with PHP6.

PHP MultiByte functions确实提供了处理unicode所需的基本功能,但它不一致,缺少很多功能.其中之一是反转字符串的功能.

我当然想把这个文本翻译成没有其他原因,然后弄清楚是否有可能.我做了一个功能来完成这个巨大的复杂的任务来扭转这个Unicode文本,所以你可以放松一下,直到PHP6.

测试代码

$enc = 'UTF-8';
$text = "ほげほげ";
$defaultEnc = mb_internal_encoding();

echo "Showing results with encoding $defaultEnc.\n\n";

$revnormal = strrev($text);
$revInt = mb_strrev($text);
$revEnc = mb_strrev($text,$enc);

echo "Original text is: $text .\n";
echo "normal strrev output: " . $revnormal . ".\n";
echo "mb_strrev without encoding output: $revInt.\n";
echo "mb_strrev with encoding $enc output: $revEnc.\n";

if (mb_internal_encoding($enc)) {
    echo "\nSetting internal encoding to $enc from $defaultEnc.\n\n";

    $revnormal = strrev($text);
    $revInt = mb_strrev($text);
    $revEnc = mb_strrev($text,$enc);

    echo "Original text is: $text .\n";
    echo "normal strrev output: " . $revnormal . ".\n";
    echo "mb_strrev without encoding output: $revInt.\n";
    echo "mb_strrev with encoding $enc output: $revEnc.\n";

} else {
    echo "\nCould not set internal encoding to $enc!\n";
}
Grapheme功能处理UTF-8字符串比mbstring和PCRE功能更正确/ Mbstring和PCRE可能会中断字符.您可以通过执行以下代码来看到它们之间的差异.
function str_to_array($string)
{
    $length = grapheme_strlen($string);
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

        $ret[] = grapheme_substr($string,$i,1);
    }

    return $ret;
}

function str_to_array2($string)
{
    $length = mb_strlen($string,"UTF-8");
    $ret = [];

    for ($i = 0; $i < $length; $i += 1) {

    $ret[] = mb_substr($string,1,"UTF-8");
}

    return $ret;
}

function str_to_array3($string)
{
    return preg_split('//u',$string,-1,PREG_SPLIT_NO_EMPTY);
}

function utf8_strrev($string)
{
    return implode(array_reverse(str_to_array($string)));
}

function utf8_strrev2($string)
{
    return implode(array_reverse(str_to_array2($string)));
}

function utf8_strrev3($string)
{
    return implode(array_reverse(str_to_array3($string)));
}

// http://www.PHP.net/manual/en/function.grapheme-strlen.PHP
$string = "a\xCC\x8A"  // 'LATIN SMALL LETTER A WITH RING ABOVE' (U+00E5)
         ."o\xCC\x88"; // 'LATIN SMALL LETTER O WITH DIAERESIS'  (U+00F6)

var_dump(array_map(function($elem) { return strtoupper(bin2hex($elem)); },[
  'should be' => "o\xCC\x88"."a\xCC\x8A",'grapheme' => utf8_strrev($string),'mbstring' => utf8_strrev2($string),'pcre' => utf8_strrev3($string)
]));

结果就在这里.

array(4) {
  ["should be"]=>
  string(12) "6FCC8861CC8A"
  ["grapheme"]=>
  string(12) "6FCC8861CC8A"
  ["mbstring"]=>
  string(12) "CC886FCC8A61"
  ["pcre"]=>
  string(12) "CC886FCC8A61"
}

IntlBreakIterator可以使用PHP 5.5(intl 3.0);

function utf8_strrev($str)
{
    $it = IntlBreakIterator::createCodePointInstance();
    $it->setText($str);

    $ret = '';
    $pos = 0;
    $prev = 0;

    foreach ($it as $pos) {
        $ret = substr($str,$prev,$pos - $prev) . $ret;
        $prev = $pos;
    }

    return $ret;  
}

相关文章

统一支付是JSAPI/NATIVE/APP各种支付场景下生成支付订单,返...
统一支付是JSAPI/NATIVE/APP各种支付场景下生成支付订单,返...
前言 之前做了微信登录,所以总结一下微信授权登录并获取用户...
FastAdmin是我第一个接触的后台管理系统框架。FastAdmin是一...
之前公司需要一个内部的通讯软件,就叫我做一个。通讯软件嘛...
统一支付是JSAPI/NATIVE/APP各种支付场景下生成支付订单,返...