Best way to convert Unicode special characters to HTML while stripping escape slashes?

Problem description

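I need to convert some HTML text that comes from a database. What I get is a string like this:

<p style=\"font-size: 10px;\">\n<strong>Search for:<\/strong> <span style=\"color:#888888;\">2 to 15 People,\u00b120$ Per Person,Informal,Available on Date<\/span>\n<\/p>

I need to turn it into proper HTML, like this:

<p style="font-size: 10px;">
<strong>Search for:</strong> <span style="color:#888888;">2 to 15 People,&plusmn;20$ Per Person,Available on Date</span>
</p>

There are a few problems here. The first is the slashes: I call stripcslashes before stripslashes, so the C-style escapes such as "\n" are converted first, and then stripslashes removes the backslashes that escape the quotes. But this mangles Unicode escapes such as the ± sign (\u00b1).

I have searched online, and json_decode seems to be the usual trick for this, but I can't use it here because of the kind of string I'm working with. The sample above is only an example; the actual strings are complete HTML pages.

Does anyone have a hint on how I could solve this?

This is what I'm using right now:

$final = urlencode(stripslashes(stripcslashes(html_entity_decode($html,ENT_COMPAT,'UTF-8'))));

Apart from Unicode escapes such as \u00b1, it gives me an almost perfect HTML page.

Solution

I ended up using the solution given by Lawrence Cherone.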

Answer

If I understand you correctly, you just want to unescape \" to " and \/ to /, and maybe a few others.

You can use str_replace() and target more than just \/ in the list of things to replace.
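A minimal sketch of the str_replace() approach, assuming the only escapes that need handling are the three that appear in the sample string (\", \/ and \n). The function name unescape_basic is hypothetical and not taken from the original answer.

```php
<?php
// Hypothetical helper: replaces only the escape sequences seen in the sample.
// Single-quoted PHP strings keep the backslash, so '\\"' is backslash + quote.
function unescape_basic(string $html): string
{
    return str_replace(
        ['\\"', '\\/', '\\n'],   // escaped quote, escaped slash, literal backslash-n
        ['"',   '/',   "\n"],    // plain quote, plain slash, real newline
        $html
    );
}

$raw = '<p style=\"font-size: 10px;\">\n<strong>Search for:<\/strong><\/p>';
echo unescape_basic($raw);
```

Unlike stripcslashes(), this touches only the listed sequences, so a \u00b1 escape passes through unchanged and can be decoded in a separate step. It does not handle an escaped backslash followed by one of these characters; that case would need an extra rule.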

Edit: to fix the Unicode escapes, use the preg_replace_callback code from this answer: How to decode Unicode escape sequences like "\u00ed" to proper UTF-8 encoded characters?
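The code from the linked answer is not reproduced here. The following is a sketch of the same idea, assuming the mbstring extension is available and that UTF-8 output is acceptable; decode_unicode_escapes is a hypothetical name. It decodes each run of \uXXXX escapes as UTF-16BE code units, so surrogate pairs are converted together.

```php
<?php
// Hypothetical helper: turns \uXXXX escape sequences into UTF-8 characters.
function decode_unicode_escapes(string $text): string
{
    $decoded = preg_replace_callback(
        // One or more consecutive \uXXXX escapes (a literal backslash, "u", four hex digits).
        '/(?:\\\\u[0-9a-fA-F]{4})+/',
        static function (array $m): string {
            // Drop the "\u" prefixes, leaving only the hex digits.
            $hex = str_replace('\\u', '', $m[0]);
            // Pack the hex digits into raw bytes and read them as UTF-16BE.
            return mb_convert_encoding(pack('H*', $hex), 'UTF-8', 'UTF-16BE');
        },
        $text
    );

    // preg_replace_callback returns null on failure; fall back to the input.
    return $decoded ?? $text;
}

echo decode_unicode_escapes('2 to 15 People,\u00b120$ Per Person');
```

This emits the ± character itself rather than the &plusmn; entity shown in the question. With a UTF-8 page encoding the two render the same; if an entity is required, the callback could build a numeric character reference instead.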

Result:

https://3v4l.org/fTtLh

html unicode unicode-escapes