Convert commas to dots and to numeric, but only for certain variables

Problem description

So I have a df that looks like this: the numeric values use commas instead of dots as the decimal mark, and they are stored as character.

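var0 <- c("There, are commas", "in the text, string", "as, well", "how, can", "i", "fix, this", "thank you")
var1 <- c("50,0", "72,0", "960,0", "1.920,0", "50,0", "50,0", "960,0")
var2 <- c("40,0", "742,0", "9460,0", "1.920,0", "50,0", "50,0", "960,0")
var3 <- c("40,0", "72,0", "90,0", "1,30", "50,0", "50,0", "960,0")
...
var96 <- c("40,0", "742,0", "9460,0", "1.920,0", "50,0", "50,0", "960,0")

df <- data.frame(cbind(var0, var1, var2, var3))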

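I know how to convert each variable by hand with gsub, but as you can see above, I have about 96 of them. On top of that, I have other variables containing text strings and factor levels whose commas must not be converted.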

Any tips on this?

Thanks

Solution

The tidyverse packages are great for this sort of thing.

library(tidyverse)
df <- df %>% 
      # First, remove the dots in your numbers, because otherwise you'll end up
      # with, e.g., "1.920.0". Do this only for the numeric columns so that any
      # dots in the text columns are kept.
      mutate_at(.vars = vars(matches("var[1-3]")), .funs = function(x) gsub("\\.", "", x)) %>% 
      # Next, replace all the commas with dots and convert to numeric. Again,
      # only do this for the columns that don't contain text.
      mutate_at(.vars = vars(matches("var[1-3]")), .funs = function(x) as.numeric(gsub(",", ".", x)))

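Note that for mutate_at I assume that only the column "var0" contains text you want to keep, and I convert every column whose name matches the regular expression "var[1-3]", i.e., the columns with numeric data that use commas instead of dots. You'll need to adjust that regex to your situation; because it is not anchored, "var[1-3]" also matches names such as var10 to var39.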
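In dplyr 1.0 and later, mutate_at() and mutate_all() are superseded by across(). A minimal sketch of the same two steps with across(), assuming dplyr >= 1.0 and using small made-up example data; instead of a name regex it selects columns by their content through a hypothetical helper, looks_numeric(), which treats a column as numeric when every entry contains only digits, dots and commas. Text columns such as var0 are then skipped without having to list their names.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): TRUE when every entry of a column
# consists only of digits, dots and commas.
looks_numeric <- function(x) all(grepl("^[0-9.,]+$", x))

# Small made-up example data in the question's format
df_example <- data.frame(
  var0 = c("There, are commas", "in the text, string", "as, well"),
  var1 = c("50,0", "1.920,0", "72,0"),
  var3 = c("40,0", "1,30", "90,0"),
  stringsAsFactors = FALSE
)

df_example <- df_example %>%
  mutate(across(
    where(looks_numeric),
    # Drop the thousands-separator dots first, then turn the decimal comma
    # into a dot and convert to numeric
    ~ as.numeric(sub(",", ".", gsub(".", "", .x, fixed = TRUE), fixed = TRUE))
  ))

str(df_example)
```

With this selector, a numeric column that contains an empty string would not be selected, because "" does not match the pattern; adjust looks_numeric() if your data has empty cells.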


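Here is a function that replaces the commas with decimal points and removes all other dots, but only if every character in the column is a digit 0-9, a dot, or a comma. Otherwise the column is returned unchanged.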

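commas2dots <- function(x){
  # Leave the column as-is if it contains anything other than digits,
  # dots and commas (i.e. it is a text column)
  if(any(grepl("[^.,[:digit:]]", x))){
    x
  } else {
    # Remove the thousands-separator dots
    y <- gsub("\\.", "", x)
    # Read the strings back in as numbers, with "," as the decimal mark
    tc <- textConnection(y)
    on.exit(close(tc))
    scan(tc, dec = ",", quiet = TRUE)
  }
}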

lapply(df, commas2dots)
#$var0
#[1] "There, are commas"   "in the text, string"
#[3] "as, well"            "how, can"
#[5] "i"                   "fix, this"
#[7] "thank you"
#
#$var1
#[1]   50   72  960 1920   50   50  960
#
#$var2
#[1]   40  742 9460 1920   50   50  960
#
#$var3
#[1]  40.0  72.0  90.0   1.3  50.0  50.0 960.0
#
#$var96
#[1]   40  742 9460 1920   50   50  960

To replace the columns of the data.frame (assigning to df[] keeps the result a data.frame), do the following:

df[] <- lapply(df, commas2dots)
df
#                 var0 var1 var2  var3 var96
#1   There, are commas   50   40  40.0    40
#2 in the text, string   72  742  72.0   742
#3            as, well  960 9460  90.0  9460
#4            how, can 1920 1920   1.3  1920
#5                   i   50   50  50.0    50
#6           fix, this   50   50  50.0    50
#7           thank you  960  960 960.0   960
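One caveat: scan() skips blank lines by default, so if a numeric column contains an empty string, commas2dots() returns a shorter vector that no longer lines up with the other columns. A minimal sketch of a variant that avoids scan() and converts with sub() and as.numeric() instead, assuming empty cells should become NA; commas2dots2() and the small example data are hypothetical. Like the original, it assumes that any dot in a numeric column is a thousands separator.

```r
# Hypothetical variant of commas2dots() that keeps one output value per input
# value: empty strings and NA become NA instead of being dropped.
commas2dots2 <- function(x) {
  chr <- as.character(x)
  present <- !is.na(chr) & chr != ""
  # Leave the column untouched if any non-empty entry has a character other
  # than a digit, a dot or a comma (i.e. it is a text column)
  if (any(grepl("[^.,[:digit:]]", chr[present]))) {
    return(x)
  }
  out <- rep(NA_real_, length(chr))
  # Remove the thousands-separator dots, then make the decimal comma a dot
  cleaned <- sub(",", ".", gsub(".", "", chr[present], fixed = TRUE), fixed = TRUE)
  out[present] <- as.numeric(cleaned)
  out
}

# Made-up example with an empty cell in the numeric column
df2 <- data.frame(
  var0 = c("There, are commas", "in the text, string", "as, well"),
  var1 = c("1.920,0", "", "1,30"),
  stringsAsFactors = FALSE
)
df2[] <- lapply(df2, commas2dots2)
str(df2)
```

Under this sketch, a column made up only of empty strings and NA comes back as an all-NA numeric column.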

Data

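var0 <- c("There, are commas", "in the text, string", "as, well", "how, can", "i", "fix, this", "thank you")
var1 <- c("50,0", "72,0", "960,0", "1.920,0", "50,0", "50,0", "960,0")
var2 <- c("40,0", "742,0", "9460,0", "1.920,0", "50,0", "50,0", "960,0")
var3 <- c("40,0", "72,0", "90,0", "1,30", "50,0", "50,0", "960,0")
var96 <- c("40,0", "742,0", "9460,0", "1.920,0", "50,0", "50,0", "960,0")

df <- data.frame(var0, var1, var2, var3, var96, stringsAsFactors = FALSE)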