Problem description
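I have two data frames that I cannot share in full because the data are confidential. I need to merge them on the Label variable, which exists in both data sets and contains some Unicode characters such as č, ž, etc. However, the merge produces more rows than expected. On closer inspection I found that in the first data frame the values containing Unicode characters are shown literally (for example, the label appears as VŽ), while in the second data frame the same labels are shown with Unicode escape codes, so you see V\u008e instead of VŽ. I ran the stri_enc_mark function on both data frames. Here is the code and output for data frame 1: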
stri_enc_mark(unique(data1$Label)) %>% cbind(unique(data1$Label))
Output:
.
[1,] "ASCII" "ZD"
[2,] "ASCII" "RI"
[3,] "ASCII" "PU"
[4,] "ASCII" "ZG"
[5,] "ASCII" "DU"
[6,] NA NA
[7,] "ASCII" "KR"
[8,] "ASCII" "DA"
[9,] "ASCII" "MA"
[10,] "ASCII" "ST"
[11,] "UTF-8" "VŽ"
[12,] "ASCII" "KA"
[13,] "ASCII" "SB"
[14,] "ASCII" "BM"
[15,] "ASCII" "VT"
[16,] "ASCII" "BJ"
[17,] "ASCII" "DJ"
[18,] "ASCII" "OS"
[19,] "ASCII" "SK"
[20,] "ASCII" "GS"
[21,] "UTF-8" "PŽ"
[22,] "UTF-8" "ŠI"
[23,] "UTF-8" "KŽ"
[24,] "ASCII" "Vk"
[25,] "UTF-8" "ŽU"
[26,] "ASCII" "KC"
[27,] "ASCII" "DE"
[28,] "ASCII" "NA"
[29,] "UTF-8" "ČK"
[30,] "ASCII" "KT"
[31,] "ASCII" "IM"
[32,] "ASCII" "VU"
[33,] "ASCII" "NG"
[34,] "ASCII" "VK"
[35,] "ASCII" "OG"
[36,] "ASCII" "SL"
For data frame 2:
stri_enc_mark(unique(data2$Label)) %>% cbind(unique(data2$Label))
Output:
.
[1,] "ASCII" "BJ"
[2,] "ASCII" "BM"
[3,] "UTF-8" "\xc8K"
[4,] "ASCII" "DA"
[5,] "ASCII" "DE"
[6,] "ASCII" "DJ"
[7,] "ASCII" "DU"
[8,] "ASCII" "GS"
[9,] "ASCII" "IM"
[10,] "ASCII" "KA"
[11,] "ASCII" "KC"
[12,] "ASCII" "KR"
[13,] "ASCII" "KT"
[14,] "UTF-8" "K\u008e"
[15,] "ASCII" "MA"
[16,] "ASCII" "NA"
[17,] "ASCII" "NG"
[18,] "ASCII" "OG"
[19,] "ASCII" "OS"
[20,] "ASCII" "PU"
[21,] "UTF-8" "P\u008e"
[22,] "ASCII" "RI"
[23,] "ASCII" "SB"
[24,] "ASCII" "SK"
[25,] "ASCII" "ST"
[26,] "UTF-8" "\u008aI"
[27,] "ASCII" "VK"
[28,] "ASCII" "VU"
[29,] "UTF-8" "V\u008e"
[30,] "ASCII" "ZD"
[31,] "ASCII" "ZG"
[32,] "UTF-8" "\u008eU"
[33,] "ASCII" "VT"
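As far as I can tell, both the literal labels and the labels shown with Unicode escapes are marked as UTF-8. That surprises me, because if so I cannot understand why one data frame shows VŽ and the other shows V\u008e.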
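A short sketch of how the two spellings can be compared may make the puzzle more concrete. It relies only on base R (`utf8ToInt`, `charToRaw`); the variable names are illustrative and not taken from the original data. It suggests that both strings can be valid UTF-8 and still hold different code points: U+017D (Ž) in one case and the control character U+008E in the other.

```r
# Both strings are written with \u escapes so the script does not depend
# on the encoding of the source file.
literal <- "V\u017d"  # "V" followed by U+017D (Ž), as shown in data frame 1
escaped <- "V\u008e"  # "V" followed by U+008E, as printed for data frame 2

# Code points of each string
cp_literal <- utf8ToInt(literal)  # 86 381
cp_escaped <- utf8ToInt(escaped)  # 86 142

# Underlying UTF-8 bytes of each string
bytes_literal <- charToRaw(literal)  # 56 c5 bd
bytes_escaped <- charToRaw(escaped)  # 56 c2 8e

# The strings differ, so a merge treats them as different keys
same <- identical(literal, escaped)

print(cp_literal)
print(cp_escaped)
print(bytes_literal)
print(bytes_escaped)
print(same)
```

Byte 0x8E is Ž in Windows-1250, so one plausible reading (an assumption, not something the outputs above prove) is that the source of data frame 2 was written in Windows-1250 but decoded as Latin-1 on import.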
I want to convert the escaped labels into the literal ones, and I tried the following:
data2 %>%
  mutate(Label = recode(Label, "\xc8K" = "ČK", "K\u008e" = "KŽ", "P\u008e" = "PŽ", "\u008aI" = "ŠI", "V\u008e" = "VŽ", "\u008eU" = "ŽU"))
But this does not work, and I get the following warnings:
Warning messages:
1: unable to translate 'K<U+008E>' to native encoding
2: unable to translate 'P<U+008E>' to native encoding
3: unable to translate '<U+008A>I' to native encoding
4: unable to translate 'V<U+008E>' to native encoding
5: unable to translate '<U+008E>U' to native encoding
So, how do I recode these values correctly?
Solution
No working solution has been found for this problem yet.
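This is not a verified answer, only a sketch of one possible direction. It assumes the labels in data2 came from a Windows-1250 (CP1250) source that was decoded as Latin-1, so that each character's code point equals the original byte. The helper name `fix_cp1250` is hypothetical. It uses only base R and reinterprets those code points as CP1250 bytes.

```r
# Hypothetical helper: reinterpret mis-decoded labels as Windows-1250 text.
# Assumption: every affected character has a code point <= 255 that equals
# the original CP1250 byte (i.e. the data were wrongly decoded as Latin-1).
fix_cp1250 <- function(x) {
  vapply(x, function(s) {
    if (is.na(s)) return(NA_character_)
    cp <- utf8ToInt(s)
    if (anyNA(cp)) {
      # Not valid UTF-8: treat the stored bytes themselves as CP1250
      return(iconv(s, from = "CP1250", to = "UTF-8"))
    }
    if (any(cp > 255L)) {
      # Already contains genuine non-Latin-1 characters such as U+017D
      return(s)
    }
    # Turn code points back into single bytes and decode them as CP1250
    iconv(rawToChar(as.raw(cp)), from = "CP1250", to = "UTF-8")
  }, character(1), USE.NAMES = FALSE)
}

# Example labels modelled on the data frame 2 output
labels <- c("BJ", "K\u008e", "P\u008e", "\u008aI", "V\u008e", "\u008eU", NA)
fixed <- fix_cp1250(labels)
print(fixed)
```

If the assumption holds, something like `data2$Label <- fix_cp1250(data2$Label)` before merging may make the keys match. Where the original file is still available, re-importing it with the correct encoding (for example through the `fileEncoding` argument of `read.csv`) would probably be the cleaner fix.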