Subset data frame rows by values in row.names in R

Question

I have looked at this question, "Subsetting a data frame based on a logical condition on a subset of rows", and at this page: https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r

I want to subset a data.frame according to specific values in row.names.

data <- data.frame(x1 = c(3,7,1,8,5),# Create example data
                   x2 = letters[1:5],group = c("ga1","ga2","gb1","gc3","gb1"))
data                                                         # Print example data
# x1 x2 group
#  3  a    ga1
#  7  b    ga2
#  1  c    gb1
#  8  d    gc3
#  5  e    gb1

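I want to subset data according to group. One subset should be the rows containing a in their group, one the rows containing b in their group, and one the rows containing c. Maybe something with grepl?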

The result should look like this:

data.a                                                       
# x1 x2 group
#  3  a    ga1
#  7  b    ga2

data.b                                                      
# x1 x2 group
#  1  c    gb1
#  5  e    gb1

data.c
# x1 x2 group
#  8  d    gc3

I would be interested in how to produce even one of these example subsets; otherwise a loop would work too.

I modified the example from here: https://statisticsglobe.com/filter-data-frame-rows-by-logical-condition-in-r

Solution

Extract the values to split on by removing the digits from group:

sub('\\d+','',data$group)
#[1] "ga" "ga" "gb" "gc" "gb"

Then use the result in split to divide the data into groups.

new_data <- split(data,sub('\\d+','',data$group))
new_data
#$ga
#  x1 x2 group
#1  3  a   ga1
#2  7  b   ga2

#$gb
#  x1 x2 group
#3  1  c   gb1
#5  5  e   gb1

#$gc
#  x1 x2 group
#4  8  d   gc3
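
As a small sketch of working with the result, the elements of the list returned by split can be pulled out by name or summarised in one pass. The example data is recreated here so the snippet runs on its own; nothing beyond base R is assumed.

```r
# Recreate the example data from the question
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

# Strip the digits from group and split on what is left
new_data <- split(data, sub('\\d+', '', data$group))

names(new_data)          # group labels used as the list names
new_data$gb              # the "gb" subset as an ordinary data frame
sapply(new_data, nrow)   # number of rows in each subset
```

Each element is a regular data frame, so anything that works on data also works on new_data$ga, new_data$gb and new_data$gc.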

It is better to keep the data in a list, but if you want a separate data frame for each group you can use list2env.

list2env(new_data,.GlobalEnv)
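
If cluttering the global workspace is a concern, one possible variation is to unpack the list into a dedicated environment instead. This is only a sketch: the environment name env and the "data." prefix on the object names are illustrative choices, not part of the original answer.

```r
# Recreate the example data and the split list
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))
new_data <- split(data, sub('\\d+', '', data$group))

# Illustrative renaming so the objects are called data.ga, data.gb, data.gc
names(new_data) <- paste0("data.", names(new_data))

# Assumption: use a fresh environment rather than .GlobalEnv
env <- new.env()
list2env(new_data, envir = env)

ls(env)                       # names of the objects that were created
get("data.ga", envir = env)   # retrieve one subset
```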

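We can use group_split together with str_remove from the tidyverse; str_remove strips the trailing digits from group to form the grouping key.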

library(dplyr)
library(stringr)
data %>% 
    group_split(grp = str_remove(group,"\\d+$"),.keep = FALSE)

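Good question. This solution uses input and output that closely match the request: "I want to subset data according to group. One subset should be the rows containing a in their group,one containing b in their group and one c. Maybe something with grepl?"

The code below takes the data frame provided (named data), finds the matching row indices with grep(), and subsets by group.

Code: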

ga <- grep("ga",data$group)   # separate the data by group type
gb <- grep("gb",data$group)   
gc <- grep("gc",data$group) 

ga1 <- data[ga,]                     # subset ga
gb1 <- data[gb,]                     # subset gb
gc1 <- data[gc,]                     # subset gc

print(ga1)
print(gb1)
print(gc1)

Run on Windows in Jupyter Lab. The output closely matches the output shown in the question.
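
The three grep calls follow the same pattern, so they could also be written as a loop, as the question suggests. A minimal sketch, assuming the prefixes ga, gb and gc are known in advance; the names prefixes and subsets are illustrative only, and grepl is swapped in for grep to get a logical index.

```r
# Recreate the example data from the question
data <- data.frame(x1 = c(3, 7, 1, 8, 5),
                   x2 = letters[1:5],
                   group = c("ga1", "ga2", "gb1", "gc3", "gb1"))

prefixes <- c("ga", "gb", "gc")   # assumed to be known up front
subsets <- list()

for (p in prefixes) {
  # grepl returns TRUE for rows whose group starts with the prefix
  keep <- grepl(paste0("^", p), data$group)
  subsets[[p]] <- data[keep, ]
}

subsets$ga   # rows whose group starts with "ga"
subsets$gb   # rows whose group starts with "gb"
subsets$gc   # rows whose group starts with "gc"
```

Collecting the results in a list keeps the loop short and avoids creating one variable per group by hand.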

