Convert "literal" Unicode characters to their equivalents without accents

Problem description

I have a string containing "literal" Unicode characters as input.

"I want to replace \u00c6 with AE and \u00d5 with O"

Note: \u00c6 = Æ, \u00d5 = Õ

So with my Python script I can easily replace a single character:

>>> print("I want to replace \u00c6 with AE and \u00d5 with O".replace(u"\u00c6","AE"))
I want to replace AE with AE and Õ with O

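But what if I want to replace all of them? (The example only has 2, but imagine we have to search for 50 characters to replace.)

I tried to do the matching with a dictionary, but that doesn't seem to work: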

#input  : "\u00c0 \u00c1 \u00c2 \u00d2 \u00c4 \u00c5 \u00c6 \u00d6"
#output (expected) : "A A A O A A AE O"

import sys

unicode_table = {
   '\u00c0': 'A',#À
   '\u00c1': 'A',#Á
   '\u00c2': 'A',#Â
   '\u00c3': 'A',#Ã
   '\u00c4': 'A',#Ä
   '\u00c5': 'A',#Å
   '\u00c6': 'AE',#Æ
   '\u00d2': 'O',#Ò
   '\u00d3': 'O',#Ó
   '\u00d4': 'O',#Ô
   '\u00d5': 'O',#Õ
   '\u00d6': 'O'   #Ö
   #this may go on much further
}

result = sys.argv[1]

for key in unicode_table:
   #print(key + unicode_table[key])
   result = result.replace(key,unicode_table[key])

print(result)

Output:

[puppet@damageinc python]$ python replace_unicode.py "\u00c0 \u00c1 \u00c2 \u00d2 \u00c4 \u00c5 \u00c6 \u00d6"
\u00c0 \u00c1 \u00c2 \u00d2 \u00c4 \u00c5 \u00c6 \u00d6

Any help is appreciated! Thanks.

Edit: two solutions, thanks to the comments.

1st: decode the escape sequences in the string with unicode_escape:

result = sys.argv[1].encode().decode('unicode_escape')

2nd: use the unidecode module, to avoid reinventing the wheel:

import sys
from unidecode import unidecode

result = sys.argv[1].encode().decode('unicode_escape')
print(unidecode(result))
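
If installing a third-party module is not an option, a similar result can probably be approximated with the standard library alone. The following is only a sketch under stated assumptions: strip_accents and SPECIAL_CASES are hypothetical names, not part of the original post, and the special-case table is illustrative rather than exhaustive. It decomposes each character with unicodedata.normalize and drops the combining marks. Characters such as Æ have no decomposition, so they still need an explicit mapping.

```python
import unicodedata

# Characters with no canonical decomposition need an explicit mapping.
# Illustrative only; extend as needed.
SPECIAL_CASES = str.maketrans({
    "\u00c6": "AE",  # Æ
    "\u00e6": "ae",  # æ
    "\u00d8": "O",   # Ø
    "\u00f8": "o",   # ø
})


def strip_accents(text):
    """Hypothetical helper: return text with combining accents removed."""
    text = text.translate(SPECIAL_CASES)
    # NFD splits e.g. "À" into "A" followed by a combining grave accent.
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents("\u00c0 \u00c1 \u00c2 \u00d2 \u00c4 \u00c5 \u00c6 \u00d6"))
# A A A O A A AE O
```

Unlike unidecode, this leaves characters outside the Latin script untouched, which may or may not be what you want.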

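Solution

Your Python code works as expected. It is your shell that does not render the escape sequences, i.e. the Python script receives the literal six characters "\u00c0" rather than "À", and so on.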
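
To make the difference concrete, here is a small illustration (the variable names are made up for this example). A raw string stands in for what the shell hands to the script, while the normal string literal is what the keys of unicode_table contain:

```python
# What the script receives when the shell passes the argument through untouched:
literal = r"\u00c0"   # six characters: backslash, 'u', '0', '0', 'c', '0'

# What a key in the lookup table actually contains:
actual = "\u00c0"     # one character: À

print(len(literal))        # 6
print(len(actual))         # 1
print(actual in literal)   # False, so str.replace() has nothing to replace
```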

You should try testing it with some actual Unicode strings, or render the escape sequences with e.g. echo -e or printf before passing the string to your script.
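
As an alternative to rendering the escapes in the shell, the script itself could decode them. The sketch below is one possible approach, not the method from the original answer: render_escapes and ESCAPE are hypothetical names, and the pattern only handles four-digit \uXXXX escapes (no surrogate pairs, no \UXXXXXXXX).

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def render_escapes(text):
    """Hypothetical helper: turn literal \\uXXXX sequences into characters."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(render_escapes(r"\u00c0 caf\u00e9 é"))  # À café é
```

In the script from the question this would be used as result = render_escapes(sys.argv[1]). The reason for a regular expression instead of .encode().decode('unicode_escape') is that the latter treats non-escaped bytes as Latin-1, so an input that already contains a real "é" would come out as "Ã©".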

python python-3.x unicode-string
