How can I use the sample function in R to recode an age-range column (18-29) into a random number within that range?

Question

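We have the following problem: our dataset has a column that lists the age range of each respondent (e.g. 18-29). We would like to create a new column that gives each person a random number within their age range. To do this we tried combining the recode and sample functions, but it does not work. Can anyone help us? The data come from the R package fivethirtyeight (steak_survey).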

Our code:

library(fivethirtyeight)

#rand_age variable
steak_survey$rad <- recode(steak_survey$age,"'18-29' = sample(18:29,1,replace = TRUE)")

Thank you very much!

Answer

If you don't mind using dplyr, and you only need this for the 18-29 age range, this should work:

library(dplyr)

steak_survey <- steak_survey %>% 
  mutate(rad = if_else(
    age == "18-29",sample(18:29,nrow(.),replace = TRUE),NA_integer_))
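
One likely reason a single call such as sample(18:29, 1) cannot give every respondent a different number is that it draws only one value, which is then reused for every matching row. Drawing as many values as there are rows avoids this. A minimal base-R sketch on a hypothetical toy vector (age, one_draw and per_row are illustrative names, not part of steak_survey):

```r
set.seed(1)  # only to make the illustration reproducible

# Hypothetical toy data standing in for steak_survey$age
age <- c("18-29", "18-29", "30-44", "18-29")

# A single draw: one number, recycled to every matching row
one_draw <- ifelse(age == "18-29", sample(18:29, 1), NA_integer_)

# One draw per row: a vector as long as the data, so matching rows can differ
per_row <- ifelse(age == "18-29",
                  sample(18:29, length(age), replace = TRUE),
                  NA_integer_)
```

Here one_draw holds the same number for all three "18-29" rows, while per_row takes an independent draw for each of them.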

If you want to do this for all age ranges, case_when may be useful (I am assuming a maximum age of 80):

steak_survey <- steak_survey %>% 
  mutate(
    rad = case_when(
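      age == "18-29" ~ sample(18:29, nrow(.), replace = TRUE),
      age == "30-44" ~ sample(30:44, nrow(.), replace = TRUE),
      age == "45-60" ~ sample(45:60, nrow(.), replace = TRUE),
      # "> 60" excludes 60 itself, so start at 61; the cap of 80 is an assumption
      age == "> 60" ~ sample(61:80, nrow(.), replace = TRUE),
      TRUE ~ NA_integer_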
    )
  )
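
If the list of age ranges grows, repeating one branch per range gets tedious. As an alternative sketch in base R (not the approach above, and not tested on steak_survey itself), the bounds can be kept in a small lookup table and one integer drawn per row. The toy data frame df, the table bounds and its columns lo and hi are hypothetical names, and the upper bound of 80 is the same assumption as before:

```r
set.seed(42)  # only to make the illustration reproducible

# Hypothetical toy data standing in for steak_survey
df <- data.frame(
  age = c("18-29", "30-44", "45-60", "> 60", NA, "18-29"),
  stringsAsFactors = FALSE
)

# Lookup table of bounds per category (80 as the maximum age is an assumption)
bounds <- data.frame(
  age = c("18-29", "30-44", "45-60", "> 60"),
  lo  = c(18L, 30L, 45L, 61L),
  hi  = c(29L, 44L, 60L, 80L),
  stringsAsFactors = FALSE
)

# Row-wise bounds; NA where the category is missing or not in the table
idx <- match(df$age, bounds$age)
lo <- bounds$lo[idx]
hi <- bounds$hi[idx]

# One uniform integer in [lo, hi] per row; rows with NA bounds stay NA
df$rad <- as.integer(floor(lo + runif(nrow(df)) * (hi - lo + 1L)))
```

Because match() compares factors by their labels, the same lookup should also work when age is stored as a factor.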
