Encoding emoji from text that contains escape sequences

Problem description

I am trying to print some text containing emoji, given in the form text = "\\ud83d\\ude04\\n\\u3082\\u3042", so that it comes out as:

# my expected output
# a newline after the emoji, then the Japanese characters
>>>😄
もあ

I have read a related question, but it only solved part of the problem:

Best and clean way to Encode Emojis (Python) from text file

I followed the code from that post and got the following result:

emoji_text = "\\ud83d\\ude04\\n\\u3082\\u3042".encode("latin_1")
output = (emoji_text
  .decode("raw_unicode_escape")
  .encode('utf-16','surrogatepass')
  .decode('utf-16')
)
print(output)

>>>😄\nもあ
# it prints a literal \n instead of a newline

So my question is: how can I convert escape sequences such as \n, \t and \b at the same time as converting the emoji and the text?

Solution

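Using unicode_escape instead of raw_unicode_escape will also decode \n. raw_unicode_escape only interprets \uXXXX and \UXXXXXXXX sequences, whereas unicode_escape handles all the escapes of a Python string literal, including \n, \t and \b. However, if you had a reason for using raw_unicode_escape in the first place, maybe this is not suitable?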
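As a minimal sketch of that change, here is the pipeline from the question with only the codec name swapped. The input is encoded as "ascii" here, which is an assumption that the text contains nothing but ASCII characters and backslash escapes:

```python
# Same input as in the question: literal backslashes, not real escapes.
text = "\\ud83d\\ude04\\n\\u3082\\u3042"

output = (
    text
    .encode("ascii")                    # assumption: input is pure ASCII
    .decode("unicode_escape")           # decodes \uXXXX as well as \n, \t, \b, ...
    .encode("utf-16", "surrogatepass")  # let the two lone surrogates through
    .decode("utf-16")                   # recombine the surrogate pair into one emoji
)
print(output)
```

The two UTF-16 steps are still needed, because unicode_escape leaves \ud83d and \ude04 as two separate surrogate code points rather than a single emoji.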

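Your choice of encoding to "latin-1" is a bit odd, but perhaps there is a reason for that too. Maybe you should encode to "ascii" instead and be prepared to deal with any possible consequences.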
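To illustrate what those consequences might look like, here is a rough sketch. The helper name decode_escapes and its codec parameter are hypothetical and introduced only for this example. It relies on the fact that the initial encode step raises UnicodeEncodeError when the text contains a character the chosen codec cannot represent:

```python
def decode_escapes(text, codec="ascii"):
    """Hypothetical helper: decode backslash escapes, then join surrogate pairs.

    Raises UnicodeEncodeError if `text` contains characters that
    `codec` cannot represent.
    """
    raw = text.encode(codec)
    return (
        raw.decode("unicode_escape")
        .encode("utf-16", "surrogatepass")
        .decode("utf-16")
    )

# A literal non-ASCII character (é) mixed with an escaped one (\u3082).
mixed = "caf\u00e9 \\u3082"

# latin-1 accepts é, and unicode_escape reads non-escape bytes as Latin-1.
via_latin1 = decode_escapes(mixed, "latin-1")

# ascii rejects é at the encode step instead of passing it through.
try:
    decode_escapes(mixed, "ascii")
    ascii_accepted = True
except UnicodeEncodeError:
    ascii_accepted = False
```

Neither codec copes with literal characters above U+00FF (for example an unescaped も in the input). Both raise UnicodeEncodeError in that case, so such input would need different handling.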
