Calculating the weighted mean of all numeric columns

Problem description

Sample data:

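library(data.table)
set.seed(1)
# Arguments marked "assumed" were lost from the source and are filled in so the code runs.
DT <- data.table(panelID = sample(50, 50),                                   # Creates a panel ID
                 Country = c(rep("Albania", 30), rep("Belarus", 50), rep("Chilipepper", 20)),  # 50 assumed (100 rows in total)
                 some_NA = sample(0:5, 6),
                 some_NA_factor = sample(0:5, 6),                            # size 6 assumed
                 Group = c(rep(1, 20), rep(2, 20), rep(3, 20), rep(4, 20), rep(5, 20)),        # 20 per group assumed
                 Time = rep(seq(as.Date("2010-01-03"), length = 20, by = "1 month") - 1, 5),
                 wt = 15 * round(runif(100) / 10, 2),
                 Income = round(rnorm(10, -5, 5), 2),                        # sd and digits assumed
                 Happiness = sample(10, 10),
                 Sex = round(rnorm(10, 0.75, 0.3), 2),                       # digits assumed
                 Age = sample(100, 100),
                 Educ = round(rnorm(10, 0.75, 0.3), 2))                      # mean and sd assumed
DT[, uniqueID := .I]                                                         # Creates a unique ID
DT[DT == 0] <- NA   # assumed step: replace zeros with NA, see https://stackoverflow.com/questions/11036989/replace-all-0-values-to-na
DT$some_NA_factor <- factor(DT$some_NA_factor)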

I want to calculate the weighted mean of all numeric columns, so I tried:

DT_w <- DT[,lapply(Filter(is.numeric,.SD),function(x) weighted.mean(DT$wt,x,na.rm=TRUE)),by=c("Country","Time")]

But it says:

Error in weighted.mean.default(DT$wt, x, na.rm = TRUE) : 
  'x' and 'w' must have the same length

I think I may have misunderstood the syntax. Am I doing this correctly?

Solution

There are two problems:

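  • Writing DT$wt is an explicit reference to the complete wt column of DT, so the by argument has no effect on it. by only subsets columns that are referenced without the DT$ prefix. Every group therefore receives all 100 weights while x holds only that group's rows, which is why the lengths differ.

  • weighted.mean() takes x first and w (the weights) second; you appear to have them reversed.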
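Both points can be checked on a small table. The following is a minimal sketch, not the question's data: the table `toy` and its columns `grp`, `wt` and `val` are hypothetical names made up for illustration. It compares the length of the bare column name with the length of the `$`-prefixed column inside a grouped call, then computes a grouped weighted mean with the arguments in the documented order.

```r
library(data.table)

# Hypothetical toy table (not the question's data): two groups of unequal size.
toy <- data.table(
  grp = c("a", "a", "a", "b", "b"),
  wt  = c(1, 2, 3, 4, 5),
  val = c(10, 20, 30, 40, 50)
)

# Inside a grouped j, the bare name `wt` is the per-group slice,
# while `toy$wt` bypasses the grouping and returns the whole column.
lens <- toy[, .(n_rows     = .N,
                len_bare   = length(wt),
                len_dollar = length(toy$wt)),
            by = grp]
print(lens)

# With x first and w second, both vectors have the group's length.
res <- toy[, .(wmean = weighted.mean(val, w = wt)), by = grp]
print(res)
```

In this sketch `len_bare` follows the group size, whereas `len_dollar` is always the full row count, which is the mismatch the error message reports.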

Fixing both problems:

DT_w <- DT[,lapply(Filter(is.numeric,.SD),function(x) weighted.mean(x,w = wt,na.rm=TRUE)),by=c("Country","Time")]
# runs without errors
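Note that Filter(is.numeric, .SD) also keeps wt itself, so the result includes a weighted mean of the weights. If that is not wanted, one possible variant is to pick the numeric columns up front and pass them through .SDcols. This is a sketch under assumptions rather than part of the original answer: `toy`, `grp`, `label`, `x`, `y` and `num_cols` are hypothetical names used only for illustration.

```r
library(data.table)

# Hypothetical toy table (not the question's data).
toy <- data.table(
  grp   = c("a", "a", "b", "b"),
  label = c("p", "q", "r", "s"),  # non-numeric, should be skipped
  wt    = c(1, 3, 2, 2),
  x     = c(10, 20, NA, 40),
  y     = c(1, 2, 3, 5)
)

# Names of the numeric columns, excluding the weight column itself.
num_cols <- setdiff(names(toy)[vapply(toy, is.numeric, logical(1))], "wt")

# `wt` is still visible in j even though it is not part of .SD.
out <- toy[, lapply(.SD, weighted.mean, w = wt, na.rm = TRUE),
           by = grp, .SDcols = num_cols]
print(out)
```

Here na.rm = TRUE drops missing values in the data column together with their weights; it does not handle missing values in wt, which would still propagate to the result.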