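Replacing multiple values in a column in R

Problem

I am trying to create a function that takes two arguments: a continent and the column of the data frame to use. The function should compute the mean of that column for that particular continent and then use it to replace the NAs in that column for that continent. However, I run into trouble when actually replacing the values and keep getting errors. I have tried several approaches, such as replace, replace_na and mutate, but I keep getting errors I can't get rid of. The code works when it is not inside a function, but as soon as I put it into a function I get this error: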

df<-structure(list(location = c("Algeria","Angola","Benin","Botswana","Burkina Faso","Burundi"),iso_code = c("DZA","AGO","BEN","BWA","BFA","BDI"),continent = c("Africa","Africa","Africa"),date = c("2020-09-02","2020-09-02","2020-09-02"),total_cases = c(44833,2654,2145,1733,1375,445),new_cases = c(339,30,9,5,0),new_cases_smoothed = c(372.143,53,4.286,24.429,3.286,2.143),total_deaths = c(1518,108,40,6,55,1),new_deaths = c(8,1,new_deaths_smoothed = c(8.857,0.857,0.143,0.429,total_cases_per_million = c(1022.393,80.751,176.934,736.937,65.779,37.424),new_cases_per_million = c(7.731,0.913,3.827,0.239,new_cases_smoothed_per_million = c(8.487,1.613,0.354,10.388,0.157,0.18),total_deaths_per_million = c(34.617,3.299,2.551,2.631,0.084),new_deaths_per_million = c(0.182,0.03,new_deaths_smoothed_per_million = c(0.202,0.026,0.012,0.182,population = c(43851043,32866268,12123198,2351625,20903278,11890781),population_density = c(17.348,23.89,99.11,4.044,70.151,423.062),median_age = c(29.1,16.8,18.8,25.8,17.6,17.5),aged_65_older = c(6.211,2.405,3.244,3.941,2.409,2.562),aged_70_older = c(3.857,1.362,1.942,2.242,1.358,1.504),gdp_per_capita = c(13913.839,5819.495,2064.236,15807.374,1703.102,702.225),extreme_poverty = c(0.5,NA,49.6,43.7,71.7),cardiovasc_death_rate = c(278.364,276.045,235.848,237.372,269.048,293.068),diabetes_prevalence = c(6.73,3.94,0.99,4.81,2.42,6.05),female_smokers = c(0.7,0.6,5.7,1.6,NA),male_smokers = c(30.4,12.3,34.4,23.9,NA
),handwashing_facilities = c(83.741,26.664,11.035,11.877,6.144),hospital_beds_per_thousand = c(1.9,0.5,1.8,0.4,0.8),life_expectancy = c(76.88,61.15,61.77,69.59,61.58,61.58)),row.names = c(NA,-6L),class = c("tbl_df","tbl","data.frame"
))


fun1 <- function(cont,column)
{
  countries<-df%>%
    filter(continent == cont)
  
  m<-mean(countries[[column]],na.rm=T)

    df[,column]<-ifelse(is.na(df[,column]) & df$continent==cont,m,(df[,column]=df[,column]))
}

fun1("Europe","median_age")

Error during wrapup: Can't recycle input of size 208 to size 1.
Error: no more error handlers available (recursive errors?); invoking 'abort' restart

Answer

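You have a number of problems here. First, you seem to have made a mistake when copying your dput output, so the example code doesn't run. Second, you use the name mean as a variable name inside your function, which is likely to cause confusing bugs later on. Third, your function doesn't return the modified data frame: the assignment on its last line only changes a local copy of df inside the function, so the df in your workspace is left unchanged. Finally, your spacing makes the code hard to read. You have lots of vertical space from blank lines, but you don't separate variable names and operators with spaces. Again, this makes things harder to debug.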
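Addressing those points while keeping the question's string-based interface could look roughly like the sketch below. It assumes a small hypothetical data frame `toy` in place of the real dataset, and `fill_continent_mean` is a hypothetical name. The data frame is passed in and returned explicitly, `[[ ]]` is used so the column comes back as a plain vector for both data frames and tibbles, and the mean is stored under a name that doesn't shadow `mean()`.

```r
# Hypothetical toy data standing in for the question's df (not the real dataset)
toy <- data.frame(
  location   = c("A", "B", "C", "D"),
  continent  = c("Europe", "Europe", "Europe", "Africa"),
  median_age = c(40, NA, 44, NA)
)

# Hypothetical helper: replace NAs in `column` (a string) for rows of
# continent `cont` with that continent's mean; other rows are left untouched.
fill_continent_mean <- function(data, cont, column) {
  in_cont   <- data$continent %in% cont   # FALSE (not NA) if continent is missing
  values    <- data[[column]]             # [[ ]] returns a plain vector
  cont_mean <- mean(values[in_cont], na.rm = TRUE)
  values[in_cont & is.na(values)] <- cont_mean
  data[[column]] <- values
  data                                    # return the modified copy
}

toy <- fill_continent_mean(toy, "Europe", "median_age")
toy$median_age
# [1] 40 42 44 NA
```

Because the result has to be assigned back (`toy <- ...`), nothing outside the function is modified as a side effect. The Africa row keeps its NA because only Europe was filled.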

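If you are using dplyr functions, you can take advantage of quasiquotation to make the code simpler and more intuitive to use. For example, you can write it so that you pass a bare column name instead of having to wrap it in "double quotes":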

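library(dplyr)

# Replace NAs in `col` with the mean of `col` within continent `cont`.
# `col` is passed as a bare (unquoted) column name.
# Note: only the rows for continent `cont` are returned.
fun1 <- function(cont, col)
{
  col <- enquo(col)  # capture the bare column name as a quosure

  filter(df, continent == cont) %>%
    mutate(!!col := replace(!!col, is.na(!!col), mean(!!col, na.rm = TRUE)))
}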

So you can call it like this:

fun1("Africa",new_cases)
#>       location iso_code continent       date total_cases new_cases new_cases_smoothed
#> 1      Algeria      DZA    Africa 2020-09-02       44833       339            372.143
#> 2       Angola      AGO    Africa 2020-09-02        2654        30             53.000
#> 3        Benin      BEN    Africa 2020-09-02        2145         0              4.286
#> 4     Botswana      BWA    Africa 2020-09-02        1733         9             24.429
#> 5 Burkina Faso      BFA    Africa 2020-09-02        1375         5              3.286
#> 6      Burundi      BDI    Africa 2020-09-02         445         0              2.143
#>   total_deaths new_deaths
#> 1         1518          8
#> 2          108          1
#> 3           40          0
#> 4            6          0
#> 5           55          0
#> 6            1          0
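For comparison, the bare-column-name interface that `enquo()` and `!!` provide can also be sketched in base R with `substitute()` and `deparse()`, which turn the unevaluated argument `new_cases` into the string `"new_cases"`. This is a different technique from rlang's quasiquotation: it only handles bare names and does not capture an environment. The data frame `toy` and the function `fill_continent_mean2` are hypothetical, and unlike `fun1` this sketch keeps the rows of every continent.

```r
# Hypothetical toy data (not the question's dataset)
toy <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Europe"),
  new_cases = c(10, NA, 20, NA)
)

# Hypothetical helper: bare column name in, full data frame out
fill_continent_mean2 <- function(data, cont, col) {
  column <- deparse(substitute(col))   # bare symbol new_cases -> "new_cases"
  rows   <- data$continent %in% cont   # rows belonging to the continent
  values <- data[[column]]
  values[rows & is.na(values)] <- mean(values[rows], na.rm = TRUE)
  data[[column]] <- values
  data                                 # all rows are kept, not only `cont`
}

res <- fill_continent_mean2(toy, "Africa", new_cases)
res$new_cases
# [1] 10 15 20 NA
```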

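If you simply want to replace every NA in the numeric columns with the mean of the other countries on the same continent, you don't need a function at all. You can use: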

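# For every column from total_cases to life_expectancy, replace NAs with
# the mean of that column within the row's continent
df <- df %>%
  group_by(continent) %>%
  mutate(across(total_cases:life_expectancy,
                function(x) replace(x, is.na(x), mean(x, na.rm = TRUE)))) %>%
  ungroup()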

This transforms the whole data frame in one step.
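If you would rather avoid dplyr, roughly the same group-wise replacement can be sketched in base R with `ave()`, which applies a function within each group and returns the results in the original row order. The data frame `toy` and the helper `fill_mean` below are hypothetical stand-ins, and the numeric columns are detected with `is.numeric` rather than by a column range.

```r
# Hypothetical toy data (not the question's dataset)
toy <- data.frame(
  continent  = c("Africa", "Africa", "Europe", "Europe"),
  median_age = c(20, NA, 40, 44),
  population = c(NA, 300, 100, NA)
)

# Hypothetical helper: replace NAs in x with the mean of the non-missing values
fill_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

num_cols <- vapply(toy, is.numeric, logical(1))   # which columns are numeric
# ave() runs fill_mean separately within each continent, keeping row order
toy[num_cols] <- lapply(toy[num_cols],
                        function(x) ave(x, toy$continent, FUN = fill_mean))
toy$median_age
# [1] 20 20 40 44
toy$population
# [1] 300 300 100 100
```

As with the dplyr version, if every value of a column is NA within a continent, `mean(x, na.rm = TRUE)` is `NaN`, so those cells become `NaN` rather than staying NA.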