Problem description
Sample data:
library(data.table)
set.seed(1)
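DT <- data.table(
  panelID = sample(50, 50),                                    # Creates a panel ID
  Country = c(rep("Albania", 30), rep("Belarus", 50), rep("Chilipepper", 20)),  # 50 assumed, giving 100 rows in total
  some_NA = sample(0:5, 6),
  some_NA_factor = sample(0:5, 6),                             # size 6 assumed, mirroring some_NA
  Group = c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20), rep(5, 20)),  # 20 rows per group assumed
  Time = rep(seq(as.Date("2010-01-03"), length = 20, by = "1 month") - 1, 5),
  wt = 15 * round(runif(100) / 10, 2),
  Income = round(rnorm(10, -5, 5), 2),                         # sd = 5 and 2 digits assumed
  Happiness = sample(10, 10),
  Sex = round(rnorm(10, 0.75, 0.3), 2),                        # 2 digits assumed
  Age = sample(100, 100),
  Educ = round(rnorm(10, 2))
)
DT[, uniqueID := .I]                                           # Creates a unique ID
DT[DT == 0] <- NA  # assumed step, inferred from the link: https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na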
DT$some_NA_factor <- factor(DT$some_NA_factor)
I want to compute the weighted mean of every numeric column, so I tried:
DT_w <- DT[,lapply(Filter(is.numeric,.SD),function(x) weighted.mean(DT$wt,x,na.rm=TRUE)),by=c("Country","Time")]
But it says:
Error in weighted.mean.default(DT$wt, x, na.rm = TRUE) :
'x' and 'w' must have the same length
I think I may be misunderstanding the syntax. Am I doing this correctly?
Solution
Two problems:
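- Writing DT$wt explicitly refers to the complete wt column of the full table DT, so the by argument has no effect on it. Grouping with by only applies to columns referenced without the DT$ prefix. Within each group, x therefore has the group's length while DT$wt still has all 100 rows, which is what triggers the length error.
- weighted.mean() takes x first and w (the weights) second; you appear to have them reversed.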
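To make both points concrete, here is a minimal sketch on a small hypothetical table (toy, with columns g, x and wt, none of which come from the question). It assumes only that data.table is installed. It shows that a bare column name is subset per group while toy$wt is not, and that swapping the first two arguments of weighted.mean() changes the result.

```r
library(data.table)

# Hypothetical toy table (not the DT from the question)
toy <- data.table(g = c("a", "a", "b"), x = c(1, 2, 3), wt = c(1, 3, 2))

# Point 1: a bare column name is subset per group; toy$wt is always the full column
lens <- toy[, .(n_group = length(wt), n_full = length(toy$wt)), by = g]
print(lens)

# Point 2: weighted.mean(x, w) takes the values first and the weights second
right   <- weighted.mean(c(1, 2), w = c(1, 3))   # (1*1 + 2*3) / (1 + 3)
swapped <- weighted.mean(c(1, 3), w = c(1, 2))   # values and weights exchanged
print(c(right = right, swapped = swapped))

# Both points together: grouped weighted mean of x, weighted by the group's wt
res <- toy[, .(wmean = weighted.mean(x, w = wt)), by = g]
print(res)
```

In group "a" the bare wt has length 2 while toy$wt has length 3, which is the same kind of mismatch that produced the error in the question.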
Fixing both problems:
DT_w <- DT[,lapply(Filter(is.numeric,.SD),function(x) weighted.mean(x,w = wt,na.rm=TRUE)),by=c("Country","Time")]
# runs without errors
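One thing to be aware of is that Filter(is.numeric, .SD) also keeps wt itself, so the result contains a weighted mean of the weights. A possible variation, which is not part of the original answer, is to pick the columns up front with .SDcols. The sketch below uses a hypothetical toy table and a hypothetical helper variable num_cols; wt stays usable inside j even though it is left out of .SDcols.

```r
library(data.table)

# Hypothetical toy data (not the DT from the question)
toy <- data.table(
  Country = c("A", "A", "B", "B"),
  wt      = c(1, 3, 2, 2),
  Income  = c(10, 20, 30, NA),
  Age     = c(40, 50, 60, 70),
  label   = c("p", "q", "r", "s")   # non-numeric, so it is skipped
)

# Numeric columns to average, excluding the weight column itself
num_cols <- setdiff(names(toy)[sapply(toy, is.numeric)], "wt")

# Weighted mean of each selected column per Country, using the group's wt
toy_w <- toy[, lapply(.SD, weighted.mean, w = wt, na.rm = TRUE),
             by = Country, .SDcols = num_cols]
print(toy_w)
```

With na.rm = TRUE, the missing Income value in group "B" is dropped together with its weight before the mean is taken.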