How to use the `separate` function from stringr to split column names

Problem description

I have data that looks like this:

 [1] "average_on_return_belief_01" "average_on_return_belief_02"
 [3] "average_on_return_belief_03" "average_on_return_belief_04"
 [5] "average_on_return_belief_05" "average_on_return_belief_06"
 [7] "average_on_return_belief_07" "average_on_return_belief_08"
 [9] "average_on_return_belief_09" "average_on_return_belief_10"
[11] "average_on_return_belief_11" "average_on_return_belief_12"
[13] "average_on_send_belief_01"   "average_on_send_belief_02"  
[15] "average_on_send_belief_03"   "average_on_send_belief_04"  
[17] "average_on_send_belief_05"   "average_on_send_belief_06"  
[19] "average_on_send_belief_07"   "average_on_send_belief_08"  
[21] "average_on_send_belief_09"   "average_on_send_belief_10"  
[23] "average_on_send_belief_11"   "average_on_send_belief_12"  
[25] "sender_decision_01"          "sender_decision_02"         
[27] "sender_decision_03"          "sender_decision_04"         
[29] "sender_decision_05"          "sender_decision_06"         
[31] "sender_decision_07"          "sender_decision_08"         
[33] "sender_decision_09"          "sender_decision_10"         
[35] "sender_decision_11"          "sender_decision_12"         
...

I need to use the separate function from stringr (for the names_sep argument of pivot_longer) to split these into two columns: the parameter name and the numeric suffix.

So I have a list of "prefixes":

to_retrieve <- c("average_on_return_belief_", "average_on_send_belief_", "sender_decision_", "return_decision_", "receiver_belief_", "sender_belief_")

but I don't know how to feed them into separate so that it splits inputs such as sender_belief_01.

Any ideas or hints are very welcome.

Solution

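I don't think there is any need to pass to_retrieve explicitly. You can use tidyr's extract, which splits one column into two new columns with a regular expression.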

tidyr::extract(df,cols,c('col1','col2'),'(.*)_(.*)')

#   col1                     col2 
#   <chr>                    <chr>
# 1 average_on_return_belief 01   
# 2 average_on_return_belief 02   
# 3 average_on_return_belief 11   
# 4 average_on_return_belief 12   
# 5 average_on_send_belief   01   
# 6 average_on_send_belief   02   
# 7 average_on_send_belief   11   
# 8 average_on_send_belief   12   
# 9 sender_decision          01   
#10 sender_decision          02   
#11 sender_decision          11   
#12 sender_decision          12   
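
The pattern '(.*)_(.*)' lands on the last underscore because the first (.*) is greedy: it consumes as much of the string as it can while still leaving one "_" for the rest of the pattern to match. A minimal base R sketch of that behaviour, using two of the sample names from the question (the variable names x, prefix and suffix are illustrative only):

```r
# Two of the sample names from the question.
x <- c("average_on_return_belief_01", "sender_decision_12")

# The first (.*) is greedy, so the literal "_" in the pattern ends up
# matching the LAST underscore of each string.
prefix <- sub("(.*)_(.*)", "\\1", x)  # everything before the last "_"
suffix <- sub("(.*)_(.*)", "\\2", x)  # everything after the last "_"

print(data.frame(col1 = prefix, col2 = suffix))
```

If some names might contain no trailing number, a stricter pattern such as '^(.*)_(\\d+)$' is probably a safer choice, since it only matches when the part after the last underscore is made of digits.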

Data

df <- structure(list(cols = c("average_on_return_belief_01","average_on_return_belief_02","average_on_return_belief_11","average_on_return_belief_12","average_on_send_belief_01","average_on_send_belief_02","average_on_send_belief_11","average_on_send_belief_12","sender_decision_01","sender_decision_02","sender_decision_11","sender_decision_12")),row.names = c(NA,-12L),class = c("tbl_df","tbl","data.frame"))

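We can use a regex lookahead in tidyr's separate to split on the underscore (_) that comes immediately before the digits (\\d+) at the end ($) of the string. Because the lookahead (?=...) consumes no characters, only the underscore is removed.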

library(tidyr)
# The column to split (cols) must be given before `into`
separate(df, cols, into = c("col1", "col2"), sep = "_(?=\\d+$)")
# A tibble: 12 x 2
#   col1                     col2 
#   <chr>                    <chr>
# 1 average_on_return_belief 01   
# 2 average_on_return_belief 02   
# 3 average_on_return_belief 11   
# 4 average_on_return_belief 12   
# 5 average_on_send_belief   01   
# 6 average_on_send_belief   02   
# 7 average_on_send_belief   11   
# 8 average_on_send_belief   12   
# 9 sender_decision          01   
#10 sender_decision          02   
#11 sender_decision          11   
#12 sender_decision          12   
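
Since the question mentions the names_sep argument of pivot_longer, the same lookahead pattern can presumably be passed there directly. The sketch below assumes a small wide data frame; the data frame wide, its id column, its values, and the output column names parameter, number and value are all hypothetical and not taken from the question.

```r
library(tidyr)

# Hypothetical wide data: one row per participant, one column per
# parameter/number combination (values are made up for illustration).
wide <- data.frame(
  id = 1:2,
  sender_decision_01 = c(5, 7),
  sender_decision_02 = c(6, 8),
  average_on_send_belief_01 = c(1.5, 2.5),
  average_on_send_belief_02 = c(3.5, 4.5)
)

# Split each column name on the underscore that precedes the trailing digits.
long <- pivot_longer(
  wide,
  cols = -id,
  names_to = c("parameter", "number"),
  names_sep = "_(?=\\d+$)",
  values_to = "value"
)

print(long)
```

With this approach the list of prefixes in to_retrieve is not needed, because the split point is defined by the trailing digits rather than by the prefix.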


Here is a data.table option using tstrsplit, following the same regex pattern as @akrun:

library(data.table)
setDT(df)[, setNames(tstrsplit(cols, "_(?=\\d+$)", perl = TRUE), c("col1", "col2"))]

which gives

                        col1 col2
 1: average_on_return_belief   01
 2: average_on_return_belief   02
 3: average_on_return_belief   11
 4: average_on_return_belief   12
 5:   average_on_send_belief   01
 6:   average_on_send_belief   02
 7:   average_on_send_belief   11
 8:   average_on_send_belief   12
 9:          sender_decision   01
10:          sender_decision   02
11:          sender_decision   11
12:          sender_decision   12
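
tstrsplit is a wrapper that runs strsplit and transposes the result, so the split itself can be checked in base R without any packages. A minimal sketch using two of the sample names (the names x, parts and m are illustrative only):

```r
# Two of the sample names from the question.
x <- c("average_on_return_belief_01", "sender_decision_12")

# perl = TRUE is required for the lookahead (?=...) to be understood.
# strsplit returns a list with one character vector per input string.
parts <- strsplit(x, "_(?=\\d+$)", perl = TRUE)

# Bind the pieces row-wise into a two-column character matrix.
m <- do.call(rbind, parts)
colnames(m) <- c("col1", "col2")
print(m)
```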
