Why does the following code behave differently for different multibyte strings?
    echo preg_replace('@(?=\pL)@u', '*', 'م'); // prints: '*م' ✓
    echo preg_replace('@(?=\pL)@u', '*', 'ض'); // prints: '*ض' ✓
    echo preg_replace('@(?=\pL)@u', '*', 'غ'); // prints: '*�*�' ✗
    echo preg_replace('@(?=\pL)@u', '*', 'ص'); // prints: '*�*�' ✗
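The `�` marks suggest that the asterisk was inserted between the two bytes of the letter, leaving malformed UTF-8. A minimal sketch for checking this on your own setup is below; `describe_result()` is a hypothetical helper, not part of the original code, and it assumes the script file itself is saved as UTF-8. It relies on `preg_match()` with the `u` modifier failing on subjects that are not valid UTF-8.

```php
<?php
// Hypothetical helper (not from the original post): report whether a
// replacement result is still well-formed UTF-8 and show its raw bytes.
function describe_result(?string $result): string
{
    if ($result === null) {
        // preg_replace() returns null when the regex engine reports an error.
        return 'preg_replace() reported an error';
    }
    // With the /u modifier, preg_match() returns false for malformed UTF-8.
    $valid = preg_match('//u', $result) === 1;
    return sprintf('%s bytes=%s', $valid ? 'valid' : 'invalid', bin2hex($result));
}

foreach (['م', 'ض', 'غ', 'ص'] as $letter) {
    echo describe_result(preg_replace('@(?=\pL)@u', '*', $letter)), "\n";
}
```

An `invalid` line whose hex dump shows `2a` (the asterisk) both before and between the letter's two bytes would match the broken output in the question.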
Solution
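You also need to include the modifier (Lm, the Unicode "modifier letter" category). The following script iterates over the entire Arabic Unicode block (U+0600 to U+06FF, decimal 1536 to 1791):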
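    <?php
    // Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
    function uchar_2($dec) {
        $utf = chr(192 + (($dec - ($dec % 64)) / 64)); // lead byte: 110xxxxx
        $utf .= chr(128 + ($dec % 64));                // continuation byte: 10xxxxxx
        return $utf;
    }

    $issues = 0;
    $count = 0;
    for ($dec = 1536; $dec <= 1791; $dec++) {
        $char = uchar_2($dec);
        // The pattern is kept as posted; the braced PCRE form of the category is \p{Lm}.
        // Any result that differs from the input is counted as an issue.
        if (preg_replace('@^(?=\pLm)$@u', '*', $char) !== $char) {
            printf("Issue with %s (%s)\n", $dec, $char);
            $issues++;
        }
        $count++;
    }
    printf("Found %d issues in %d rows\n", $issues, $count);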
Without Lm, roughly half of the characters fail.
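The script can use a two-byte encoder because every code point in the Arabic block falls in the range U+0080 to U+07FF. The sketch below restates the arithmetic of `uchar_2()` with bit operations and compares it with PHP's own `\u{...}` string escape (PHP 7 or later); `encode_two_byte()` is a hypothetical name used only for this illustration.

```php
<?php
// Hypothetical restatement of uchar_2() from the answer, using bit operations.
function encode_two_byte(int $codePoint): string
{
    if ($codePoint < 0x80 || $codePoint > 0x7FF) {
        throw new InvalidArgumentException('Not a two-byte UTF-8 code point');
    }
    $lead  = 0xC0 | ($codePoint >> 6);   // 110xxxxx: the top five bits
    $trail = 0x80 | ($codePoint & 0x3F); // 10xxxxxx: the low six bits
    return chr($lead) . chr($trail);
}

// U+0645 (م) is 1605 = 25 * 64 + 5, so the bytes are 192 + 25 and 128 + 5.
echo bin2hex(encode_two_byte(0x0645)), "\n";      // d985
var_dump(encode_two_byte(0x0645) === "\u{0645}"); // bool(true)
```

The range check is an addition for safety; the original `uchar_2()` performs no validation and is only called with values inside the block.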