chartr error: invalid input in 'utf8towcs'

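Problem description

I need your help, because I get the same error with every method I try. I want to replace special characters in a data frame, such as "áéíóúÁÉÍÓÚýÝ", "àèìòùÀÈÌÒÙ", "âêîôûÂÊÎÔÛ", "ãõÃÕñÑ", "äëïöüÄËÏÖÜÿ" and "çÇ", with plain letters such as "aeiouAEIOU", or with "X" for the ones I want to mask. Thanks!

First I tried this: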

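library(magrittr)  # provides %>%
library(stringr)   # provides str_to_upper(), str_trim(), str_replace_all()

# Clean a surname field: mask unwanted symbols, upper-case, trim, strip accents
trata <- function(Campo) {
  Campo <- Campo %>% chartr('ÇÆ£ØÞß&@Ð', 'XXXXXXXXX', .) %>%
    str_to_upper(locale = "es") %>% str_trim(side = "both") %>%
    str_replace_all("['´`^]", "") %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕÑ', 'AEIOUAEIOUAEIOUAEIOUAAOX', .)
  return(Campo)
}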


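# Same idea for company names; chartr() needs a replacement string and the piped input (.)
trataRS <- function(Campo) {
  Campo <- Campo %>% chartr('ÇÆ£ØÞßÐ', 'XXXXXXX', .) %>% chartr('ÁÉÍÓÚÀÈÌÒÙÄËÏÖÜÂÊÎÔÛÅÃÕ', 'AEIOUAEIOUAEIOUAEIOUAAO', .)
  return(Campo)
}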

Then I apply these functions to the columns:

Base$paterno_originador<-trata(Base$paterno_originador)
Base$razon_originador <- trataRS(Base$razon_originador)

But I get this error:

Error in chartr("ÇÆ£ØÞßÐ","XXXXXXXXX",.) : invalid input 'HÉCTOR' in 'utf8towcs'

So I tried another approach, which I found here from @Alexandre_Lima:

rm_accent <- function(str,pattern="all") {
  if(!is.character(str))
    str <- as.character(str)
  
  pattern <- unique(pattern)
  
  if(any(pattern=="Ç"))
    pattern[pattern=="Ç"] <- "ç"
  
  symbols <- c(
    acute = "áéíóúÁÉÍÓÚýÝ",grave = "àèìòùÀÈÌÒÙ",circunflex = "âêîôûÂÊÎÔÛ",tilde = "ãõÃÕñÑ",umlaut = "äëïöüÄËÏÖÜÿ",cedil = "çÇ"
  )
  
  nudeSymbols <- c(
    acute = "aeiouAEIOUyY",grave = "aeiouAEIOU",circunflex = "AEIOUAEIOU",tilde = "AOAOXX",umlaut = "AEIOUAEIOUX",cedil = "XX"
  )
  
  accentTypes <- c("´","`","^","~","¨","ç")
  
  if(any(c("all","al","a","todos","t","to","tod","todo")%in%pattern)) # option: remove all accents
    return(chartr(paste(symbols,collapse=""),paste(nudeSymbols,collapse=""),str))
  
  for(i in which(accentTypes%in%pattern))
    str <- chartr(symbols[i],nudeSymbols[i],str) 
  
  return(str)
}

But I got a similar error:

Error in chartr(paste(symbols,collapse = ""),: 
  invalid input 'RUÍZ' in 'utf8towcs'

Here is the encoding of the column. The elements that contain special characters are reported as UTF-8:

Encoding(Base$nombre_originador)
[1] "unknown" "UTF-8" "unknown" "UTF-8"

Solution

The fix for the invalid input in 'utf8towcs' error is to set the encoding when you import the .csv file into R.
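To see why the encoding matters, here is a minimal sketch of one plausible way the error can arise. It assumes the file was really Latin-1 but the strings ended up labelled as UTF-8. The value `x` is made up for illustration and is not taken from the original data.

```r
# Hypothetical value: the bytes of "RUÍZ" in Latin-1 (0xCD is "Í" in Latin-1)
x <- rawToChar(as.raw(c(0x52, 0x55, 0xCD, 0x5A)))
Encoding(x) <- "UTF-8"   # wrong label, as a mismatched import might produce

validUTF8(x)             # FALSE: these bytes are not valid UTF-8

# chartr() has to decode the string before translating it, so it is
# expected to stop with an error on the invalid bytes
res <- tryCatch(chartr("\u00cd", "I", x), error = function(e) e)
inherits(res, "error")
```

Running `validUTF8()` on a whole column is a quick way to find the rows that will trip `chartr()`.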

  1. When you import the file with read.csv() or read.delim(), specify encoding = "UTF-8" or encoding = "Latin-1". I tried "Latin-1" and that solved it.

  2. You may also want to check what your system encoding is and match it. You can do this with Sys.getlocale() (and set it with Sys.setlocale()). For example, on my system:

Sys.getlocale()
[1] "en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8"
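As a small sketch of the locale check in point 2, the snippet below looks only at the character-type part of the locale and asks R whether the session is running in UTF-8. The variable names `ctype` and `info` are illustrative.

```r
# Character-type category of the locale, which governs how text is interpreted
ctype <- Sys.getlocale("LC_CTYPE")

# l10n_info() reports, among other things, whether the session uses UTF-8
info <- l10n_info()

ctype
info[["UTF-8"]]
```

If the session is not UTF-8 and the file is, or the other way round, that mismatch is worth resolving before calling chartr().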

An example:

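# Semicolon-separated text file, imported with the encoding set to "Latin-1"
data <- read.delim("input/data/data.txt", sep = ";", encoding = "Latin-1", stringsAsFactors = FALSE)

# Plain .csv file, imported with the default encoding
data <- read.csv("input/data/data.csv", stringsAsFactors = FALSE)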
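The following self-contained sketch illustrates the import approach end to end. It writes a tiny Latin-1 file to a temporary path, reads it back with the encoding declared, and then runs chartr() on it. The file contents and the column name `nombre` are invented for the demonstration. Note that, as far as I know, the documentation for read.table() spells the values that mark strings as "latin1" and "UTF-8", and describes `fileEncoding` as the argument that actually re-encodes the input, so the spelling "latin1" is used here.

```r
# Build a hypothetical Latin-1 file: a header line plus one row, "RUÍZ"
tmp <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("nombre\nRU"), as.raw(0xCD), charToRaw("Z\n")), tmp)

# Declare the encoding at import so the strings are marked as Latin-1
dat <- read.csv(tmp, encoding = "latin1", stringsAsFactors = FALSE)

# Convert the marked strings to UTF-8 so every later step sees valid input
dat$nombre <- enc2utf8(dat$nombre)
all(validUTF8(dat$nombre))

# chartr() can now decode the string and replace the accented letter
clean <- chartr("\u00cd", "I", dat$nombre)
clean

unlink(tmp)
```

If a column has already been loaded with the wrong label, iconv(x, from = "latin1", to = "UTF-8") on the raw strings is another option worth trying, provided the data really is Latin-1.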

Best regards