Adding non-occurring factor levels to a data frame in R

Problem

I have a data frame containing a factor and corresponding values, like this:

df <- data.frame(week = factor(c(1, 2, 49, 50)), occurrences = c(1, 4, 2, 3))


 week occurrences
1    1          1
2    2          4
3   49          2
4   50          3

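I would like to add factor levels for all the "missing" weeks in (1-53), with a corresponding occurrences value of 0. What is the best way to do this? I have to do this for several data frames that may not be "missing" the same levels, so I would like to generalize it into a function.

Solution

You can use rbind() to append the necessary rows to your df. In this example, for clarity, I first create the data frame to be added and then append it. setdiff() returns the numbers that are not currently in your week column. The code uses 1:52; change the upper bound to 53 if your data can contain a week 53: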

df_to_app = data.frame(week = factor(setdiff(1:52,df$week)),occurrences = 0)
df = rbind(df,df_to_app)
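
Since the question asks for something reusable, the same setdiff() + rbind() idea could be wrapped in a function. This is only a sketch: the name add_missing_levels and its arguments are made up for illustration, and it assumes every value already in the key column is among all_levels (anything else would turn into NA).

```r
# Hypothetical helper: add rows with a fill value for every level in
# `all_levels` that does not yet occur in the column named by `key`.
add_missing_levels <- function(data, key, value, all_levels, fill = 0) {
  all_levels <- as.character(all_levels)

  # Work on character keys so rbind() does not have to reconcile factor levels.
  base <- data[c(key, value)]
  base[[key]] <- as.character(base[[key]])

  # Levels that are not present yet (may be empty).
  missing <- setdiff(all_levels, base[[key]])
  extra <- data.frame(missing, rep(fill, length(missing)),
                      stringsAsFactors = FALSE)
  names(extra) <- c(key, value)

  out <- rbind(base, extra)

  # Rebuild the factor so its levels follow the order of `all_levels`,
  # then sort the rows by that order.
  out[[key]] <- factor(out[[key]], levels = all_levels)
  out <- out[order(out[[key]]), , drop = FALSE]
  rownames(out) <- NULL
  out
}

df <- data.frame(week = factor(c(1, 2, 49, 50)), occurrences = c(1, 4, 2, 3))
filled <- add_missing_levels(df, "week", "occurrences", 1:52)
nrow(filled)  # 52
```

Because the column names and the set of levels are arguments, the same call should work for the other data frames too, whichever weeks they happen to be missing.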

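Hope this helps!

Here is an approach using tidyr::complete. First, we need to add the extra levels to the week column; forcats::fct_expand does that. tidyr::complete then fills the data frame with rows for those levels, and the fill = argument says that we want 0 for the new rows.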

library(tidyverse)
df %>%
  mutate(week = fct_expand(week,paste0(1:52))) %>%
  complete(week,fill = list(occurrences = 0))
# A tibble: 52 x 2
   week  occurrences
   <fct>       <dbl>
 1 1               1
 2 2               4
 3 49              2
 4 50              3
 5 3               0
 6 4               0
 7 5               0
 8 6               0
 9 7               0
10 8               0
# … with 42 more rows
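
Note that fct_expand() appends the new levels after the existing ones, which is why the rows above come out as 1, 2, 49, 50, 3, 4, and so on. If you would rather have the weeks in numeric order, one option is to rebuild the factor with an explicit levels vector before calling complete(). A sketch, assuming the same df as in the question; filled_sorted is just an illustrative name:

```r
library(dplyr)
library(tidyr)

df <- data.frame(week = factor(c(1, 2, 49, 50)), occurrences = c(1, 4, 2, 3))

filled_sorted <- df %>%
  # Recreate the factor with all 52 levels in numeric order.
  mutate(week = factor(as.character(week), levels = as.character(1:52))) %>%
  # complete() adds one row per unused level and fills occurrences with 0.
  complete(week, fill = list(occurrences = 0))

nrow(filled_sorted)  # 52
```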

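Or right-join to a data frame that contains all the weeks:

library(dplyr)
library(tidyr)  # replace_na() comes from tidyr, not dplyr
df %>%
  right_join(data.frame(week = as.factor(1:52)), by = "week") %>%
  mutate(occurrences = replace_na(occurrences, 0))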
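
For reference, roughly the same join can be done in base R with merge(). A sketch, assuming the df from the question; the names all_weeks, df_chr and joined are only for illustration. The keys are converted to character first so the differing factor levels do not get in the way, and merge() sorts them as text, hence the reordering at the end.

```r
df <- data.frame(week = factor(c(1, 2, 49, 50)), occurrences = c(1, 4, 2, 3))

# Lookup table with every week, keyed by character.
all_weeks <- data.frame(week = as.character(1:52), stringsAsFactors = FALSE)

df_chr <- df
df_chr$week <- as.character(df_chr$week)

# all.x = TRUE keeps every row of all_weeks; unmatched weeks get NA.
joined <- merge(all_weeks, df_chr, by = "week", all.x = TRUE)
joined$occurrences[is.na(joined$occurrences)] <- 0

# Restore a factor with numerically ordered levels and sort the rows by it.
joined$week <- factor(joined$week, levels = as.character(1:52))
joined <- joined[order(joined$week), ]
rownames(joined) <- NULL
```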
