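Problem description
As a follow-up to a recent SO question (see here), I would like to know how to run multiple t-tests on weighted data (package srvyr) in R. I can't get it to run, and I would be glad if someone could help me here. I have added a random sample in the code below.
Thanks a lot!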
#create data
surveydata <- as.data.frame(replicate(1,sample(1:5,1000,rep=TRUE)))
colnames(surveydata)[1] <- "q1"
surveydata$q2 <- sample(6,size = nrow(surveydata),replace = TRUE)
surveydata$q3 <- sample(6,size = nrow(surveydata),replace = TRUE)
surveydata$q4 <- sample(6,size = nrow(surveydata),replace = TRUE)
surveydata$group <- c(1,2)
#replace all values "6" with NA
surveydata[surveydata == 6] <- NA
#add NAs to group 1 in q1
surveydata$q1[which(surveydata$q1==1 & surveydata$group==1)] = NA
surveydata$q1[which(surveydata$q1==2 & surveydata$group==1)] = NA
surveydata$q1[which(surveydata$q1==3 & surveydata$group==1)] = NA
surveydata$q1[which(surveydata$q1==4 & surveydata$group==1)] = NA
surveydata$q1[which(surveydata$q1==5 & surveydata$group==1)] = NA
#add weights
surveydata$weights <- round(runif(nrow(surveydata),min=0.2,max=1.5),3)
#create vector for relevant questions
rquest <- names(surveydata)[1:4]
# create survey design
library(srvyr)
surveydesign <- surveydata %>%
  as_survey_design(strata = group,weights = weights,variables = c("group",all_of(rquest)))
# perform multiple t.test (doesn't work yet)
outcome <- do.call(rbind,lapply(names(surveydesign$variables)[-1],function(i) {
  tryCatch({
    test <- t.test(as.formula(paste(i,"~ survey")),data = surveydesign)
    data.frame(question = i,group1 = test$estimate[1],group2 = test$estimate[2],difference = diff(test$estimate),p_value = test$p.value,row.names = 1)
  },error = function(e) {
    data.frame(question = i,group1 = NA,group2 = NA,difference = NA,p_value = NA,row.names = 1)
  })
}))
Solution
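As I understand it, you have a series of question columns, q1 through q4 in the example. The data carry a weights column that you pass to srvyr. In the data there may be cases where a specific question is entirely NA for one whole group, and you want the results to be generated into a df even when that is true. You want a weighted Student's t-test that makes use of the weights column, not a simple t-test. The only function I found that provides this is weights::wtd.t.test, which does not offer a formula interface but wants vectors fed to it.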
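To make the vector interface concrete, here is a minimal sketch of a single call to weights::wtd.t.test. The values in x, y, wx and wy are made up for illustration and are not taken from the survey data. The layout of the result, with coefficients holding the t value, degrees of freedom and p-value, and additional holding the difference, both weighted means and the standard error, is what the function returned for me and is worth checking against your installed version of the package.

```r
library(weights)

# Toy inputs (invented for illustration): two groups, each with its own weights
x  <- c(1, 2, 3, 4, 5, 3, 2)
wx <- c(0.5, 1.2, 0.8, 1.0, 1.5, 0.3, 0.9)
y  <- c(2, 3, 5, 4, 4, 5)
wy <- c(1.1, 0.4, 0.7, 1.3, 0.6, 1.0)

# samedata = FALSE: x and y come from different respondents
res <- wtd.t.test(x = x, y = y, weight = wx, weighty = wy, samedata = FALSE)

res$coefficients  # t value, degrees of freedom, p-value
res$additional    # difference, weighted mean of x, weighted mean of y, std. error
```

The weighted group means in res$additional should agree with base R's weighted.mean(x, wx) and weighted.mean(y, wy), which is a quick way to confirm that the weights were picked up.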
The steps taken:
- Load the required libraries
library(srvyr)
library(dplyr)
library(rlang)
library(weights)
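- Build a custom function that, for a given variable, removes the NAs, pulls the vectors for x, y, weight and weighty, runs the test, and then extracts the information you want into a df row.
multiple_wt_ttest <- function(i) {
  i <- ensym(i)
  # drop rows where this question is NA, then split by group;
  # fixing the factor levels keeps a group even if all of its rows were dropped
  xxx <- surveydata %>%
    filter(!is.na(!!i)) %>%
    split(factor(.$group, levels = sort(unique(surveydata$group))))
  # values and weights per group (!!i unquotes the column symbol)
  newx <- pull(xxx[[1]], !!i)
  newy <- pull(xxx[[2]], !!i)
  wtx <- pull(xxx[[1]], weights)
  wty <- pull(xxx[[2]], weights)
  test <- wtd.t.test(x = newx, y = newy, weight = wtx, weighty = wty, samedata = FALSE)
  # one row per question: weighted group means, their difference, and the p-value
  data.frame(question = rlang::as_name(i),
             group1 = test$additional[[2]],
             group2 = test$additional[[3]],
             difference = test$additional[[1]],
             p.value = test$coefficients[[3]])
}
- Once we have the function, we can use lapply to apply it column by column (note that it handles the case in q1 where every value for group == 1 is NA).
lapply(names(surveydata)[1:4], multiple_wt_ttest)
#> [[1]]
#> question group1 group2 difference p.value
#> 1 q1 NaN 3.010457 NaN NA
#>
#> [[2]]
#> question group1 group2 difference p.value
#> 1 q2 3.009003 3.071842 -0.06283922 0.515789
#>
#> [[3]]
#> question group1 group2 difference p.value
#> 1 q3 2.985096 2.968867 0.0162288 0.8734034
#>
#> [[4]]
#> question group1 group2 difference p.value
#> 1 q4 2.856255 3.047787 -0.1915322 0.04290471
- Finally, wrap it in do.call and rbind to get the df you want. The data are your surveydata (the gyrations to create them and the rows themselves are not shown here).
do.call(rbind, lapply(names(surveydata)[1:4], multiple_wt_ttest))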
#> question group1 group2 difference p.value
#> 1 q1 NaN 3.010457 NaN NA
#> 2 q2 3.009003 3.071842 -0.06283922 0.51578905
#> 3 q3 2.985096 2.968867 0.01622880 0.87340343
#> 4 q4 2.856255 3.047787 -0.19153218 0.04290471
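The lapply plus do.call(rbind, ...) pattern used above can be easier to follow in isolation. The following is a small base-R sketch of it. The helper summarise_col and the data frame toy are hypothetical names invented for this illustration and are not part of the answer's code.

```r
# Hypothetical helper: returns a one-row data frame for one column name
summarise_col <- function(col, data) {
  data.frame(question = col, mean = mean(data[[col]], na.rm = TRUE))
}

# Toy data (invented for illustration)
toy <- data.frame(q1 = c(1, 2, NA), q2 = c(4, 5, 6))

# lapply returns a list with one 1-row data frame per column ...
rows <- lapply(names(toy), summarise_col, data = toy)

# ... and do.call(rbind, rows) stacks them into a single data frame
result <- do.call(rbind, rows)
result
```

Because every call returns a data frame with the same column names, rbind can stack the rows without any further alignment. That is why multiple_wt_ttest always returns the same five columns, even when a group has no usable values.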