Problem description
I have 5 data frames, as shown below:
df_mon <- data.frame(mon = as.factor(c(6,7,8,9,10)),number = c(1.11,1.02,0.95,0.92,0.72))
df_year <- data.frame(year = as.factor(c(1,2)),number = c(1.61,0.4))
df_cat <- data.frame(cat = c("A","B","C"),number = c(1.11,…,0.44))
df_bin <- data.frame(bin = as.factor(c(1,2)),number = c(1.42,0.56))
df_cat2 <- data.frame(cat2 = c("A","B","C","D","AA"),number = c(0.11,1.22,1.34,0.88,0.75))
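I need to multiply the numbers in the 'number' column of these data frames with each other. That is, look at every possible combination of the values in the first column of each data frame, then take the corresponding numbers and multiply them together. The final results data frame should look like this (the first 3 rows are filled in):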
results_df <- data.frame(combi = c("mon6_year1_catA_bin1_cat2A","mon6_year1_catA_bin1_cat2B","mon6_year1_catA_bin1_cat2C"),final_number = c(1.11*1.61*1.11*1.42*0.11,1.11*1.61*1.11*1.42*1.22,1.11*1.61*1.11*1.42*1.34))
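We can see that the first column of results_df shows the combination used to calculate final_number. The first example shows that we take the value in the 'number' column for category 6 of df_mon (1.11) and multiply it by: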
- category 1 in df_year (1.61)
- category A in df_cat (1.11)
- category 1 in df_bin (1.42)
- category A in df_cat2 (0.11)
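The answer for this combination is 1.11 x 1.61 x 1.11 x 1.42 x 0.11 ≈ 0.3099. The second row shows the next possible combination, and so on.
I'm not sure how to achieve this, so any help would be appreciated!
Solution
Maybe you can try expand.grid like below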
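The arithmetic for this first row can be checked directly in R, where prod() multiplies all elements of a vector (the names on the vector below are only labels for readability):

```r
# the five 'number' values picked for the combination mon6_year1_catA_bin1_cat2A
picked <- c(mon = 1.11, year = 1.61, cat = 1.11, bin = 1.42, cat2 = 0.11)
first_row <- prod(picked)
print(round(first_row, 4))
# [1] 0.3099
```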
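lst <- list(df_mon, df_year, df_cat, df_bin, df_cat2)
results_df <- data.frame(
  # label each level with its column name (e.g. "mon6"), then paste every combination together
  combi = do.call(
    paste, c(do.call(
      expand.grid, lapply(lst, function(v) paste0(names(v[1]), v[, 1]))
    ), sep = "_")
  ),
  # expand the 'number' columns the same way and multiply across each row
  final_number = Reduce(
    "*", do.call(
      expand.grid, lapply(lst, `[[`, 2)
    )
  )
)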
gives
> head(results_df)
combi final_number
1 mon6_year1_catA_bin1_cat2A 0.30985097
2 mon7_year1_catA_bin1_cat2A 0.28472792
3 mon8_year1_catA_bin1_cat2A 0.26518777
4 mon9_year1_catA_bin1_cat2A 0.25681342
5 mon10_year1_catA_bin1_cat2A 0.20098441
6 mon6_year2_catA_bin1_cat2A 0.07698161
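To see what the two do.call(expand.grid, ...) calls produce, here is the same idea on just two of the data frames (df_year and df_bin, values taken from the question). expand.grid() varies the first factor fastest, and both grids are built from the same list in the same order, which is why the labels and the products line up row by row:

```r
df_year <- data.frame(year = as.factor(c(1, 2)), number = c(1.61, 0.4))
df_bin  <- data.frame(bin = as.factor(c(1, 2)), number = c(1.42, 0.56))
lst <- list(df_year, df_bin)

# every combination of the labelled first columns, e.g. "year1" with "bin1"
label_grid <- do.call(expand.grid, lapply(lst, function(v) paste0(names(v)[1], v[, 1])))
# every combination of the 'number' columns, in the same row order
number_grid <- do.call(expand.grid, lapply(lst, `[[`, 2))

small <- data.frame(
  combi = do.call(paste, c(label_grid, sep = "_")),
  # Reduce("*", ...) walks over the grid's columns and multiplies them element-wise
  final_number = Reduce("*", number_grid)
)
print(small)
```

The four rows run from year1_bin1 (1.61 x 1.42 = 2.2862) to year2_bin2 (0.4 x 0.56 = 0.224).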
Here is an approach using dplyr and tidyr.
library(dplyr)
library(tidyr)
df_all <- df_mon %>%
  full_join(df_year, by = character()) %>% # by = character() ensures cross join (deprecated in dplyr >= 1.1.0 in favour of cross_join())
  full_join(df_cat, by = character()) %>%
  full_join(df_bin, by = character()) %>%
  full_join(df_cat2, by = character()) %>%
  pivot_longer(cols = c(-mon, -year, -cat, -bin, -cat2)) %>% # stack all the 'number' columns into one 'value' column
  group_by(mon, year, cat, bin, cat2) %>%
  summarize(final_number = prod(value), .groups = "keep")
df_all
# A tibble: 300 x 6
# Groups: mon, year, cat, bin, cat2 [300]
mon year cat bin cat2 final_number
<fct> <fct> <chr> <fct> <chr> <dbl>
1 6 1 A 1 A 0.310
2 6 1 A 1 AA 2.11
3 6 1 A 1 B 3.44
4 6 1 A 1 C 3.77
5 6 1 A 1 D 2.48
6 6 1 A 2 A 0.122
7 6 1 A 2 AA 0.833
8 6 1 A 2 B 1.36
9 6 1 A 2 C 1.49
10 6 1 A 2 D 0.978
# ... with 290 more rows
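The full_join(by = character()) steps build a cross join: every row of one table is paired with every row of the next. As a base-R sketch of the same step (not the answer's method), merge() with by = NULL also returns the Cartesian product. The example below chains it over three of the question's data frames with Reduce(); renaming each 'number' column first (to a hypothetical number_<column> name) is our own choice, so that repeated merges do not produce clashing suffixes:

```r
df_mon  <- data.frame(mon = as.factor(c(6, 7, 8, 9, 10)), number = c(1.11, 1.02, 0.95, 0.92, 0.72))
df_year <- data.frame(year = as.factor(c(1, 2)), number = c(1.61, 0.4))
df_bin  <- data.frame(bin = as.factor(c(1, 2)), number = c(1.42, 0.56))
frames <- list(df_mon, df_year, df_bin)

# give every 'number' column a unique (hypothetical) name such as number_mon
frames <- lapply(frames, function(d) {
  names(d)[2] <- paste0("number_", names(d)[1])
  d
})

# by = NULL makes merge() return the Cartesian product of the two data frames
crossed <- Reduce(function(a, b) merge(a, b, by = NULL), frames)

# multiply the number_* columns row by row, the same job prod() does after pivot_longer()
num_cols <- grep("^number_", names(crossed))
crossed$final_number <- Reduce("*", crossed[num_cols])
head(crossed)
```

With five months, two years and two bins the result has 5 x 2 x 2 = 20 rows.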
It keeps the variables from the other data.frames as separate columns for further analysis, but you can create a combi column with a small paste() call.
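A minimal sketch of that paste() step, assuming the five key columns are named as in the output above. The two-row data frame here is a hypothetical stand-in for the summarised result (rows 1 and 6 of the printed tibble); each value is prefixed with its column name and the pieces are joined with "_", giving labels in the same form as the combi column in the question:

```r
# hypothetical stand-in for two rows of the summarised result shown above
res <- data.frame(
  mon = factor(c(6, 6)), year = factor(c(1, 1)), cat = c("A", "A"),
  bin = factor(c(1, 2)), cat2 = c("A", "A"),
  final_number = c(0.310, 0.122)
)
key_cols <- c("mon", "year", "cat", "bin", "cat2")

# paste0() each column name onto its values ("mon" + "6" -> "mon6") ...
labelled <- Map(paste0, key_cols, res[key_cols])
# ... then join the five pieces of each row with "_"
res$combi <- do.call(paste, c(labelled, sep = "_"))
print(res$combi)
```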