Problem description
If this is my data frame:
> length <- rep(11:17,200)
> mean(length)
[1] 14
> sd(length)
[1] 2.001
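How can I draw a random subsample from the data (length) that has almost the same mean and standard deviation?
Solution
You can draw from length repeatedly until you have found enough samples that meet your requirements. It is not pretty, but it works.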
length <- rep(11:17,200)
# save mean and sd the subsamples should have
aimed_mean <- mean(length)
aimed_sd <- sd(length)
# set number of replications / iterations
n_replication <- 1000
# set size of sample
size_sample <- 40
# set desired number of samples
n_sample <- 3
# set deviation from mean and sd you can accept
deviation_mean <- 1.5
deviation_sd <- 1.5
# create empty container for resulting samples
samples <- list()
# Repeatedly sample from length
i <- 0
sample_count <- 0
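repeat {
  i <- i + 1
  # take a sample from length (without replacement)
  sample_length <- sample(length, size_sample)
  # keep the sample when it is close enough in both mean and sd
  if (abs(aimed_mean - mean(sample_length)) < deviation_mean &&
      abs(aimed_sd - sd(sample_length)) < deviation_sd) {
    sample_count <- sample_count + 1
    # index by the number of accepted samples so the list has no gaps
    samples[[sample_count]] <- sample_length
  }
  # stop after the maximum number of draws or once enough samples are found
  if (i == n_replication || sample_count == n_sample) {
    break
  }
}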
# your samples
samples
# test whether it worked
lapply(samples,function(x){abs(mean(x)-aimed_mean)<deviation_mean})
lapply(samples,function(x){abs(sd(x)-aimed_sd)<deviation_sd})
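The same rejection approach can be wrapped in a function so that the tolerances are easy to tighten. The following is only a minimal sketch: the function name draw_matching_samples, its arguments, the max_iter cap, the tolerance of 0.1 and the seed are all assumptions made for illustration and are not part of the original answer.

```r
# Hypothetical helper: keep drawing subsamples of x and accept only those
# whose mean and sd lie within the given tolerances of the full vector.
draw_matching_samples <- function(x, size, n_sample, tol_mean, tol_sd,
                                  max_iter = 10000) {
  target_mean <- mean(x)
  target_sd <- sd(x)
  accepted <- list()
  for (i in seq_len(max_iter)) {
    candidate <- sample(x, size)  # draw without replacement
    if (abs(mean(candidate) - target_mean) < tol_mean &&
        abs(sd(candidate) - target_sd) < tol_sd) {
      accepted[[length(accepted) + 1]] <- candidate
      if (length(accepted) == n_sample) break
    }
  }
  # may contain fewer than n_sample elements if max_iter was reached
  accepted
}

set.seed(1)  # arbitrary seed, only for reproducibility
x <- rep(11:17, 200)
result <- draw_matching_samples(x, size = 40, n_sample = 3,
                                tol_mean = 0.1, tol_sd = 0.1)
sapply(result, mean)
sapply(result, sd)
```

Tighter tolerances give subsamples closer to the original mean and standard deviation, at the cost of more rejected draws. If the returned list is shorter than n_sample, the tolerances were probably too strict for the chosen max_iter.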