Problem description
I have the following adjacency table, and I want to convert it to long format.
library(tidyverse)
ID <- c(rep('A',1),rep('B',3))
From <- c('Category_8','Category_3','Category_4','Category_1')
To <- c('Category_1','Category_4','Category_1','Category_3')
have <- tibble(
ID,From,To
)
have
# A tibble: 4 x 3
ID From To
<chr> <chr> <chr>
1 A Category_8 Category_1
2 B Category_3 Category_4
3 B Category_4 Category_1
4 B Category_1 Category_3
ID <- c(rep('A',2),rep('B',4))
process <- c('Category_8','Category_1','Category_3','Category_4','Category_1','Category_3')
want <- tibble(
ID,process
)
# A tibble: 6 x 2
ID process
<chr> <chr>
1 A Category_8
2 A Category_1
3 B Category_3
4 B Category_4
5 B Category_1
6 B Category_3
My attempt is as follows:
have %>%
pivot_longer(!ID,names_to = "x",values_to = "process") %>% dplyr::select(-x)
# A tibble: 8 x 2
ID process
<chr> <chr>
1 A Category_8
2 A Category_1
3 B Category_3
4 B Category_4
5 B Category_4
6 B Category_1
7 B Category_1
8 B Category_3
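Note that rows 4 and 5, and rows 6 and 7, are essentially duplicates. Chaining unique() onto the pipeline above would remove them, but I would also lose the last row, because unique() treats rows 3 and 8 as duplicates, and they should not be treated that way.
There is one more column that I have omitted here: time. I am not sure whether including that column would solve the problem.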
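Solution
Adding a run-length ID after your pipeline solves the problem: consecutive repeats share an ID and are collapsed by unique(), while non-adjacent repeats get different IDs and are kept.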
library(data.table) #for rleid() function
have %>%
pivot_longer(!ID,names_to = "x",values_to = "process") %>% select(-x) %>%
mutate(d = rleid(ID,process)) %>%
unique() %>% select(-d)
ID process
<chr> <chr>
1 A Category_8
2 A Category_1
3 B Category_3
4 B Category_4
5 B Category_1
6 B Category_3
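To see why the run-length ID keeps the final row, it may help to look at what rleid() returns for the stacked values of ID B. This is only a minimal sketch, assuming data.table is installed; `process_b`, `run_id` and `dedup_b` are hypothetical names used for illustration.

```r
library(data.table)  # provides rleid()

# Stacked From/To values for ID "B", in row order
# (rows 3 to 8 of the pivot_longer() output in the question)
process_b <- c("Category_3", "Category_4", "Category_4",
               "Category_1", "Category_1", "Category_3")

# rleid() increments its counter every time the value changes,
# so only *consecutive* repeats share an id
run_id <- rleid(process_b)
run_id
# [1] 1 2 2 3 3 4

# Keeping the first element of each run drops the adjacent repeats
# but keeps the final Category_3, which has its own run id
dedup_b <- process_b[!duplicated(run_id)]
dedup_b
# [1] "Category_3" "Category_4" "Category_1" "Category_3"
```

Because the first and last Category_3 receive ids 1 and 4, unique() no longer sees those two rows as duplicates once the id column is present.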
Alternatively, try creating a logical filter with the help of lag().
library(tidyverse)
have %>%
pivot_longer(!ID,names_to = "x",values_to = "process") %>% dplyr::select(-x) %>%
group_by(ID) %>%
mutate(previous = lag(process)) %>% # Create a variable that holds the value from the previous row within each ID.
replace_na(list(previous = "---")) %>% # Put placeholder text into the first observation of each ID, to remove the NA and enable a comparison.
filter(process != previous) %>% # Keep only the rows where the current value differs from the previous one.
select(-previous)
# A tibble: 6 x 2
# Groups: ID [2]
ID process
<chr> <chr>
1 A Category_8
2 A Category_1
3 B Category_3
4 B Category_4
5 B Category_1
6 B Category_3
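The placeholder step with replace_na() could probably be skipped by using the `default` argument of dplyr's lag(). The following is a sketch of that variant rather than a tested drop-in replacement; it assumes that no real process value is an empty string, and it ends with ungroup() so the result is a plain tibble.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
have <- tibble(
  ID   = c("A", "B", "B", "B"),
  From = c("Category_8", "Category_3", "Category_4", "Category_1"),
  To   = c("Category_1", "Category_4", "Category_1", "Category_3")
)

want <- have %>%
  pivot_longer(!ID, names_to = "x", values_to = "process") %>%
  select(-x) %>%
  group_by(ID) %>%
  # default = "" stands in for the missing previous value on each ID's first row
  # (assumption: "" never occurs as a real process value)
  filter(process != lag(process, default = "")) %>%
  ungroup()

want
```

Grouping by ID before calling lag() matters here: without it, the first row of one ID would be compared against the last row of the previous ID.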