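Problem description
I have a data frame containing 3 binary variables that relate to time period 1 and 3 corresponding variables that relate to time period 2.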
df <- data.frame("user" = c("a","b","c","d","e"),
                 "item_1_time_1" = c(1,0,0,0,NA),
                 "item_2_time_1" = c(1,1,1,0,NA),
                 "item_3_time_1" = c(0,0,1,1,0),
                 "item_1_time_2" = c(1,0,0,0,NA),
                 "item_2_time_2" = c(1,0,0,0,NA),
                 "item_3_time_2" = c(0,0,1,0,1))
df
  user item_1_time_1 item_2_time_1 item_3_time_1 item_1_time_2 item_2_time_2 item_3_time_2
1    a             1             1             0             1             1             0
2    b             0             1             0             0             0             0
3    c             0             1             1             0             0             1
4    d             0             0             1             0             0             0
5    e            NA            NA             0            NA            NA             1
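I would like to know whether a given observation has a 1 for an item in period 1 but not in period 2. In addition, I would like to know which item was 1 in period 1 but not in period 2 for that observation.
So the ideal output would look like this, where a check value of 0 flags an item that was 1 in period 1 but not in period 2: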
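df2 <- data.frame("user" = c("a","b","c","d","e"),
                  "item_1_time_1" = c(1,0,0,0,NA),
                  "item_2_time_1" = c(1,1,1,0,NA),
                  "item_3_time_1" = c(0,0,1,1,0),
                  "item_1_time_2" = c(1,0,0,0,NA),
                  "item_2_time_2" = c(1,0,0,0,NA),
                  "item_3_time_2" = c(0,0,1,0,1),
                  "item_1_check" = c(1,1,1,1,1),
                  "item_2_check" = c(1,0,0,1,1),
                  "item_3_check" = c(1,1,1,0,1),
                  "item_check" = c(1,0,0,0,1))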
df2
  user item_1_time_1 item_2_time_1 item_3_time_1 item_1_time_2 item_2_time_2 item_3_time_2 item_1_check item_2_check item_3_check item_check
1    a             1             1             0             1             1             0            1            1            1          1
2    b             0             1             0             0             0             0            1            0            1          0
3    c             0             1             1             0             0             1            1            0            1          0
4    d             0             0             1             0             0             0            1            1            0          0
5    e            NA            NA             0            NA            NA             1            1            1            1          1
So far I have tried:
library(tidyverse)
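df2 <- df %>%
  # treat missing values as 0 in both periods
  mutate(across(ends_with('time_2'), ~ replace_na(.x, 0))) %>%
  mutate(across(ends_with('time_1'), ~ replace_na(.x, 0))) %>%
  # 0 when the item was 1 in period 1 but 0 in period 2, otherwise 1
  mutate(item_1_check = if_else(item_1_time_1 == 1 & item_1_time_2 == 0, 0, 1),
         item_2_check = if_else(item_2_time_1 == 1 & item_2_time_2 == 0, 0, 1),
         item_3_check = if_else(item_3_time_1 == 1 & item_3_time_2 == 0, 0, 1)) %>%
  # overall check is 1 only if every per-item check is 1
  mutate(item_check = pmin(item_1_check, item_2_check, item_3_check))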
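I would like to generalize the mutate calls above so that they handle n items rather than just 3. Is it possible to use ends_with('check') in the final mutate? The variable names are consistent apart from the item number and time period.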
Solution
One option is to reshape the data into 'long' format, so that the comparison is written only once regardless of the number of items:
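library(dplyr)
library(tidyr)
df %>%
  # split each name into an item label ('group') and a 'time_1'/'time_2' value column
  pivot_longer(cols = -user, names_to = c('group', '.value'), names_sep = "_(?=time)") %>%
  # treat missing values as 0
  mutate(across(starts_with('time'), ~ replace_na(.x, 0))) %>%
  # TRUE unless the item is 1 in period 1 and 0 in period 2
  transmute(user, check = !(time_1 & !time_2)) %>%
  group_by(user) %>%
  # a user passes only if every item passes
  summarise(check = min(check), .groups = 'drop') %>%
  right_join(df, ., by = 'user') %>%
  select(all_of(names(df)), check)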
#  user item_1_time_1 item_2_time_1 item_3_time_1 item_1_time_2 item_2_time_2 item_3_time_2 check
#1    a             1             1             0             1             1             0     1
#2    b             0             1             0             0             0             0     0
#3    c             0             1             1             0             0             1     0
#4    d             0             0             1             0             0             0     0
#5    e            NA            NA             0            NA            NA             1     1
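The output above carries only the overall check, while the question also asks for one check column per item. A minimal sketch of one way to get both, assuming the same long-format reshape and the item_k_check naming from the desired output in the question. The object names long, per_item, overall and result are illustrative, and coalesce() is swapped in for replace_na() so the original NA values in df are left untouched:

```r
library(dplyr)
library(tidyr)

# Example data, copied from the question.
df <- data.frame(
  user = c("a", "b", "c", "d", "e"),
  item_1_time_1 = c(1, 0, 0, 0, NA),
  item_2_time_1 = c(1, 1, 1, 0, NA),
  item_3_time_1 = c(0, 0, 1, 1, 0),
  item_1_time_2 = c(1, 0, 0, 0, NA),
  item_2_time_2 = c(1, 0, 0, 0, NA),
  item_3_time_2 = c(0, 0, 1, 0, 1)
)

# One row per user and item, with columns time_1 and time_2.
long <- df %>%
  pivot_longer(cols = -user, names_to = c("item", ".value"),
               names_sep = "_(?=time)") %>%
  # NA counts as 0; check is 0 only when the item is 1 in period 1 and 0 in period 2.
  mutate(check = as.numeric(!(coalesce(time_1, 0) == 1 & coalesce(time_2, 0) == 0)))

# Back to wide: one column per item, named item_1_check, item_2_check, ...
per_item <- long %>%
  select(user, item, check) %>%
  pivot_wider(names_from = item, values_from = check,
              names_glue = "{item}_check")

# Overall check: 1 only if every per-item check is 1.
overall <- long %>%
  group_by(user) %>%
  summarise(item_check = min(check), .groups = "drop")

# Attach both to the original data frame.
result <- df %>%
  left_join(per_item, by = "user") %>%
  left_join(overall, by = "user")

result
```

Because the item label is taken from the column names, the same code should work for any number of items, as long as every item has both a time_1 and a time_2 column.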