Problem description
sample=data.frame(
ID=c(1,1,2,2),D=c('a','b','c','a','c'),AE=c('m','x','w','y','m','f')
)
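I want to count the number of IDs for every possible combination, where a combination consists of any two drugs (D) recorded for an ID together with an AE recorded for that same ID. Please see the image for exactly what I mean.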
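To make the idea of a "combination" concrete, here is a minimal sketch for a single ID in base R. The drug and AE values are hypothetical toy values, and the sketch assumes unordered drug pairs (D1 < D2), which is what the code further down filters for.

```r
# Hypothetical values for ONE ID (illustration only, not real data)
drugs <- c("a", "b", "c")
aes   <- c("m", "x")

# All unordered pairs of drugs: a-b, a-c, b-c
drug_pairs <- t(combn(drugs, 2))

# Cross every drug pair with every AE of the same ID (Cartesian product)
combos <- merge(
  data.frame(D1 = drug_pairs[, 1], D2 = drug_pairs[, 2]),
  data.frame(AE = aes),
  by = NULL
)

print(combos)
```

With three drugs and two AEs, this ID contributes 3 pairs times 2 AEs, so 6 (D1, D2, AE) combinations.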
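Someone kindly gave me code that works perfectly on the small dataset (sample). However, it takes hours on the real dataset (46,000 unique IDs, 1,600 unique D values, and 3,200 unique AE values). In fact, I had to interrupt the session after the code had run for 3 hours without producing any output.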
library(tidyverse)
# Build every (D1, D2, AE) combination that occurs within at least one ID
combinations <- sample %>%
  mutate(D2 = D) %>%
  group_by(ID) %>%
  expand(crossing(D, D2, AE)) %>% # Get all D1, D2, AE combinations within-ID
  filter(D2 > D) %>%              # Deduplicate to unique combinations
  rename(D1 = D) %>%
  ungroup() %>%
  distinct(D1, D2, AE)            # Deduplicate across IDs
# For a given combination of D1, D2, AE; check how many IDs in sample have that combination
count_ids <- function(D1_val, D2_val, AE_val, data) {
  data %>%
    group_by(ID) %>%
    mutate(
      has_D1 = if_else(D1_val %in% D, "D1", "no D1"),
      has_D2 = if_else(D2_val %in% D, "D2", "no D2"),
      has_AE = if_else(AE_val %in% AE, "AE", "no AE")
    ) %>%
    group_by(has_D1, has_D2, has_AE) %>%
    summarise(n_IDs = n_distinct(ID), .groups = "drop") %>%
    list(.) # Wrap in a list so the result fits in a list-column
}
combinations %>%
  mutate(data = list(sample)) %>%
  rowwise() %>%
  mutate(data = count_ids(D1, D2, AE, data)) %>% # Get the ID counts for each combination
  unnest(data) %>%
  mutate(colname = str_c(has_D1, has_D2, has_AE, sep = ", ")) %>% # Create a column name for each possibility of the combinations
  select(-starts_with("has_")) %>%
  pivot_wider(names_from = colname, values_from = n_IDs, values_fill = 0L) # Spread out to wide format
Many thanks in advance for your help.
Solution
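One direction that may help, offered as an unverified sketch rather than a tested fix for the real data: the code in the question rescans the whole dataset once per combination, whereas the number of IDs that contain D1, D2 and AE together can be obtained with joins and a single aggregation. The sketch below uses base R only. The names `toy`, `id_d`, `id_ae`, `pairs`, `triples` and `n_all` are hypothetical, and `toy` is made-up data, not the asker's.

```r
# Hypothetical toy data (illustration only): one row per ID / drug / AE record
toy <- data.frame(
  ID = c(1, 1, 2, 2, 3),
  D  = c("a", "b", "a", "b", "a"),
  AE = c("m", "x", "m", "m", "m"),
  stringsAsFactors = FALSE
)

# Distinct ID-drug and ID-AE incidences
id_d  <- unique(toy[, c("ID", "D")])
id_ae <- unique(toy[, c("ID", "AE")])

# Unordered drug pairs within each ID: self-join on ID, keep D1 < D2.
# The suffixes turn the two "D" columns into "D1" and "D2".
pairs <- merge(id_d, id_d, by = "ID", suffixes = c("1", "2"))
pairs <- pairs[pairs$D1 < pairs$D2, ]

# Attach every AE of the same ID to each drug pair
triples <- merge(pairs, id_ae, by = "ID")

# Each row of `triples` is a unique (ID, D1, D2, AE), so counting rows
# per (D1, D2, AE) counts the IDs that have all three
n_all <- aggregate(ID ~ D1 + D2 + AE, data = triples, FUN = length)
names(n_all)[names(n_all) == "ID"] <- "n_D1_D2_AE"

print(n_all)
```

This only yields the "D1, D2, AE" cell. The other cells could presumably be derived by subtraction from analogous counts, for example the IDs with D1 and D2 but no AE are the count over `pairs` minus `n_D1_D2_AE`. On the real dataset the intermediate `triples` table may still be large, so memory use would need to be checked.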