问题描述
我在R中输入了以下代码,并尝试计算从第2列开始的所有列的均值,但出现错误提示:
Error in colMeans(calculate[2:9]) : 'x' must be numeric.
数据:
calculate<-read.csv("Populationdata.csv",header=TRUE)
head(calculate)
ï..State Yr_2000 Yr_2001 Yr_2002 Yr_2003 Yr_2004 Yr_2005 Yr_2006 Yr_2009 Yr_2010
1 Alabama 7,465 14,899 19,683 23,521 26,338 30,894 33,096 35,625 38,965
2 Alaska 13,007 20,887 25,798 29,642 33,568 38,138 42,603 46,778 55,940
3 Arizona 8,854 16,262 20,634 24,988 26,838 31,936 32,935 35,979 39,060
4 Arkansas 7,113 13,779 18,546 21,995 24,289 28,473 31,946 34,723 39,107
5 California 11,021 20,656 24,496 32,149 33,749 39,626 42,325 44,980 52,651
6 Colorado 10,143 18,818 24,865 32,434 34,283 39,491 41,344 45,135 50,410
class(calculate)
#[1] "data.frame"
colMeans(calculate[2:9])
colMeans中的错误(calculate [2:9]):“ x”必须为数字
请帮我解决这个问题。 CSV文件中的实际数据是:
解决方法
假设您以df
的身份上传数据。问题是您的号码中的Sep ,
。您可以将下一个代码与across()
中的dplyr
一起使用,以转换变量,然后计算均值。这里是一种dplyr
的方法:
library(dplyr)
#Code
df %>% mutate(across(Yr_2000:Yr_2010,~as.numeric(gsub(',','',.)))) %>%
select(-1) %>%
summarise_all(mean,na.rm=T)
输出:
Yr_2000 Yr_2001 Yr_2002 Yr_2003 Yr_2004 Yr_2005 Yr_2006 Yr_2009 Yr_2010
1 9600.5 17550.17 22337 27454.83 29844.17 34759.67 37374.83 40536.67 46022.17
使用了一些数据:
#Data
df <- structure(list(ï..State = c("Alabama","Alaska","Arizona","Arkansas","California","Colorado"),Yr_2000 = c("7,465","13,007","8,854","7,113","11,021","10,143"),Yr_2001 = c("14,899","20,887","16,262",779",656","18,818"),Yr_2002 = c("19,683","25,798",634",546","24,496",865"),Yr_2003 = c("23,521","29,642",988","21,995","32,149",434"),Yr_2004 = c("26,338","33,568","26,838",289",749","34,283"),Yr_2005 = c("30,894","38,138","31,936","28,473","39,626",491"),Yr_2006 = c("33,096","42,603",935",946",325","41,344"),Yr_2009 = c("35,625","46,778","35,979",723","44,980","45,135"),Yr_2010 = c("38,965","55,940",060",107","52,651","50,410")),class = "data.frame",row.names = c("1","2","3","4","5","6"))
,
这是基本的R解决方案。首先用小数点代替逗号,然后colMeans
可以正常工作。
df[-1] <- lapply(df[-1],function(x) as.numeric(sub(",","",x)))
colMeans(df[-1])
# Yr_2000 Yr_2001 Yr_2002 Yr_2003 Yr_2004 Yr_2005 Yr_2006 Yr_2009 Yr_2010
# 9600.50 17550.17 22337.00 27454.83 29844.17 34759.67 37374.83 40536.67 46022.17
,
我们可以使用parse_number
,这似乎是一种直接方法,它可以提取数字部分,然后以mean
方法获得across
tidyverse
列
library(dplyr)
df %>%
summarise(across(-1,~ mean(readr::parse_number(.),na.rm = TRUE )))
# Yr_2000 Yr_2001 Yr_2002 Yr_2003 Yr_2004 Yr_2005 Yr_2006 Yr_2009 Yr_2010
#1 9600.5 17550.17 22337 27454.83 29844.17 34759.67 37374.83 40536.67 46022.17
数据
df <- structure(list(ï..State = c("Alabama","6"))