Converting adjacency-matrix (from-to) data to long format

Problem description

I have the following adjacency table, and I want to convert it to long format.

library(tidyverse) # tibble(), pivot_longer(), %>%

ID <- c(rep('A',1),rep('B',3))
From <- c('Category_8','Category_3','Category_4','Category_1')
To <- c('Category_1','Category_4','Category_1','Category_3')

have <- tibble(
  ID,From,To
)
have
# A tibble: 4 x 3
  ID    From       To        
  <chr> <chr>      <chr>     
1 A     Category_8 Category_1
2 B     Category_3 Category_4
3 B     Category_4 Category_1
4 B     Category_1 Category_3

# Desired result
ID <- c(rep('A',2),rep('B',4))
process <- c('Category_8','Category_1','Category_3','Category_4','Category_1','Category_3')

want <- tibble(
  ID,process
)
# A tibble: 6 x 2
  ID    process   
  <chr> <chr>     
1 A     Category_8
2 A     Category_1
3 B     Category_3
4 B     Category_4
5 B     Category_1
6 B     Category_3

My attempt so far:

have %>%
  pivot_longer(!ID,names_to = "x",values_to = "process") %>% dplyr::select(-x)

# A tibble: 8 x 2
  ID    process   
  <chr> <chr>     
1 A     Category_8
2 A     Category_1
3 B     Category_3
4 B     Category_4
5 B     Category_4
6 B     Category_1
7 B     Category_1
8 B     Category_3

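Note that rows 4 and 5, and rows 6 and 7, are essentially duplicates. Chaining unique() onto the pipeline above removes those duplicates, but it also drops the last row, because unique() treats rows 3 and 8 as duplicates, which it should not.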

There is also a time column that I have omitted here. I am not sure whether including that column would solve the problem.

Solutions

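Adding a run-length ID after your pipeline solves the problem: consecutive identical (ID, process) rows share an ID, so unique() removes only adjacent repeats.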

library(data.table) #for rleid() function
have %>%
  pivot_longer(!ID,names_to = "x",values_to = "process") %>% select(-x) %>%
  mutate(d = rleid(ID,process)) %>%
  unique() %>% select(-d)
  ID    process   
  <chr> <chr>     
1 A     Category_8
2 A     Category_1
3 B     Category_3
4 B     Category_4
5 B     Category_1
6 B     Category_3
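
To see why the run-length ID keeps the last row, here is a minimal base-R sketch. `run_id` is a hypothetical helper written for illustration (it is not part of data.table), and the `key` vector simply pastes ID and process together in the order the pivoted rows appear above.

```r
# Hypothetical helper: a base-R stand-in for data.table::rleid()
# applied to a single character vector.
run_id <- function(x) {
  runs <- rle(x)  # lengths and values of consecutive runs
  rep(seq_along(runs$lengths), runs$lengths)
}

# ID and process pasted together, in the pivoted row order
key <- c("A Category_8", "A Category_1", "B Category_3", "B Category_4",
         "B Category_4", "B Category_1", "B Category_1", "B Category_3")

ids <- run_id(key)
ids
# [1] 1 2 3 4 4 5 5 6

# Rows 3 and 8 get different run IDs (3 and 6), so only adjacent repeats drop
kept <- key[!duplicated(ids)]
length(kept)
# [1] 6
```

Because the ID changes whenever the value changes, the second "B Category_3" starts a new run and survives deduplication.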

Alternatively, build a logical filter with the help of lag().

library(tidyverse)
have %>%
    pivot_longer(!ID,names_to = "x",values_to = "process") %>% dplyr::select(-x) %>% 
    group_by(ID) %>% 
    mutate(previous = lag(process)) %>% # Hold the value from the previous row within each ID.
    replace_na(list(previous = "---")) %>% # Put placeholder text in the first row of each group, to remove the NA and enable a comparison.
    filter(process != previous) %>% # Drop rows where the previous value and the current value are the same.
    select(-previous)

# A tibble: 6 x 2
# Groups:   ID [2]
  ID    process   
  <chr> <chr>     
1 A     Category_8
2 A     Category_1
3 B     Category_3
4 B     Category_4
5 B     Category_1
6 B     Category_3
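
Both answers deduplicate after pivoting. As a rough alternative sketch, not taken from either answer, the path can also be built directly as the first From followed by every To within each ID. This assumes the rows of each ID are already in order and that each row's From equals the previous row's To, which holds for the example data but may not hold for yours.

```r
# Example data from the question, as a plain data.frame
have <- data.frame(
  ID   = c("A", "B", "B", "B"),
  From = c("Category_8", "Category_3", "Category_4", "Category_1"),
  To   = c("Category_1", "Category_4", "Category_1", "Category_3")
)

# Assumption: within each ID the rows form one ordered chain,
# so the path is the first From followed by all To values.
paths <- lapply(split(have, have$ID), function(g) {
  data.frame(ID = g$ID[1], process = c(g$From[1], g$To))
})

want <- do.call(rbind, paths)
rownames(want) <- NULL  # drop the "A.1", "B.1", ... row names from rbind
want
```

Unlike the deduplication approaches, this keeps a step that legitimately repeats (for example a row going from Category_1 to Category_1), so which one fits depends on what such rows mean in your data.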
