Group a tibble data frame by common values

Question

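After reading the comments, especially the ones about how groups should be merged, I realized that my original request did not make sense. This is the result I actually want to achieve in my program:

I have a small tibble that looks like this (my actual tibble is longer):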

    Group   Person       
     <dbl>    <chr>     
1       1   Person 1.1 
2       2   Person 1.2 
3       2   Person 1.2 
4       3   Person 2.1 
5       4   Person 2.1 
6       4   Person 3.1 
7       5   Person 1.2 
8       5   Person 4.1 
9       6   Person 1.2
10      6   Person 4.2

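I want to split the tibble by group. However, group 2 contains only Person 1.2 (twice), and since Person 1.2 is also in group 5 with Person 4.1 and in group 6 with Person 4.2, I want to remove group 2. In other words: if a group contains only one distinct person, and that person also appears in another group together with someone else, the group in which they are on their own should be removed.

The data frame would then look like this: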

    Group   Person       
    <dbl>    <chr>     
1       1   Person 1.1 
4       3   Person 2.1 
5       4   Person 2.1 
6       4   Person 3.1 
7       5   Person 1.2 
8       5   Person 4.1 
9       6   Person 1.2
10      6   Person 4.2

Reproducible data for the data frame above:

df <- structure(list(
  Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1", "Person 2.1",
             "Person 3.1", "Person 1 .2", "Person 4.1", "Person 1.2", "Person 4.2")),
  spec = structure(list(
    cols = list(
      Group = structure(list(), class = c("collector_double", "collector")),
      Person = structure(list(), class = c("collector_character", "collector"))),
    default = structure(list(), class = c("collector_guess", "collector")),
    skip = 1), class = "col_spec"),
  row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))

Answers

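Based on your edit, I would first find the persons who appear in a group together with someone else (call them persons_with_others), and then filter out every group that contains only one distinct person when that person is one of persons_with_others.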

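library(dplyr)

# Persons who share at least one group with a different person
persons_with_others <- df %>%
  group_by(Group) %>%
  filter(n_distinct(Person) > 1) %>%
  pull(Person) %>%
  unique()

# Drop groups that hold a single distinct person who also appears in a mixed group
df %>%
  group_by(Group) %>%
  filter(!(n_distinct(Person) == 1 & Person %in% persons_with_others))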
# # A tibble: 7 x 2
# # Groups:   Group [4]
#   Group Person     
#   <dbl> <chr>      
# 1     1 Person 1.1 
# 2     4 Person 2.1 
# 3     4 Person 3.1 
# 4     5 Person 1 .2
# 5     5 Person 4.1 
# 6     6 Person 1.2 
# 7     6 Person 4.2

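This result differs from your expected output, but I think it is correct: group 3 is removed because it contains only Person 2.1, and Person 2.1 also appears in group 4 together with another person (Person 3.1).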
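To check this reasoning without dplyr, the same rule can be sketched in base R. This is only an illustration: the data below is typed in from the table printed in the question (an assumption about the real data), and the names n_persons, mixed_groups, drop_row, dropped_groups and result are made up for this example.

```r
# Example data typed in from the table in the question (assumed representative).
df <- data.frame(
  Group  = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2"),
  stringsAsFactors = FALSE
)

# Number of distinct persons in each group, named by group.
n_persons <- tapply(df$Person, df$Group, function(p) length(unique(p)))

# Groups with more than one distinct person, and the persons found in them.
mixed_groups <- names(n_persons)[n_persons > 1]
persons_with_others <- unique(df$Person[as.character(df$Group) %in% mixed_groups])

# A row is dropped when its group holds a single distinct person
# and that person also appears in a mixed group.
solo_row <- as.vector(n_persons[as.character(df$Group)] == 1)
drop_row <- solo_row & df$Person %in% persons_with_others

dropped_groups <- unique(df$Group[drop_row])
result <- df[!drop_row, ]

print(dropped_groups)
print(result)
```

With this data, dropped_groups should contain groups 2 and 3, which matches the explanation above: group 1 survives because Person 1.1 never shares a group with anyone else.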

Here is a base R option, which merges groups that share at least one person:

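# Split the data into one data frame per group
dfs <- split(df, df$Group)
res <- list()
while (length(dfs) > 0) {
  # Start a new merged set from the first remaining group
  S <- dfs[[1]]$Person
  inds <- 1
  for (k in seq_along(dfs)[-1]) {
    # Absorb any later group that shares a person with the set so far
    if (length(intersect(dfs[[k]]$Person, S)) > 0) {
      S <- union(S, dfs[[k]]$Person)
      inds <- c(inds, k)
    }
  }
  # Bind the merged groups together and remove them from the pool
  res[[length(res) + 1]] <- do.call(rbind, dfs[inds])
  dfs <- dfs[-inds]
}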

This gives:

> res
[[1]]
# A tibble: 1 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     1 shape 1 Person 1.1

[[2]]
# A tibble: 4 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     2 shape 5 Person 1.2
2     2 shape 2 Person 1.2
3     5 shape 4 Person 1.2
4     5 shape 1 Person 4.1

[[3]]
# A tibble: 3 x 3
  Group Shape   Person
* <dbl> <chr>   <chr>
1     3 shape 3 Person 2.1
2     4 shape 3 Person 2.1
3     4 shape 6 Person 3.1
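One thing to be aware of: the loop above scans the remaining groups only once per merged set, so a group that is linked only through a group appearing later in the list can end up separate. If chains like that matter, a variant that repeats the scan until nothing new joins might look like the sketch below. The function name merge_linked_groups and the toy data are hypothetical and are not part of the original answer.

```r
# Hypothetical helper: merge groups that are linked, directly or through a
# chain of other groups, by at least one shared person.
merge_linked_groups <- function(df) {
  persons <- split(df$Person, df$Group)   # persons per group, named by group
  remaining <- names(persons)
  merged <- list()
  while (length(remaining) > 0) {
    members <- remaining[1]
    S <- persons[[members]]
    repeat {
      # Groups not yet absorbed that share a person with the current set S.
      candidates <- setdiff(remaining, members)
      shares <- vapply(candidates,
                       function(g) length(intersect(persons[[g]], S)) > 0,
                       logical(1))
      hits <- candidates[shares]
      if (length(hits) == 0) break
      members <- c(members, hits)
      S <- unique(c(S, unlist(persons[hits], use.names = FALSE)))
    }
    merged[[length(merged) + 1]] <- df[as.character(df$Group) %in% members, ]
    remaining <- setdiff(remaining, members)
  }
  merged
}

# Toy data (made up): group 2 is linked to group 1 only through group 3.
toy <- data.frame(
  Group  = c(1, 2, 2, 3, 3, 4),
  Person = c("A", "B", "C", "A", "B", "D"),
  stringsAsFactors = FALSE
)
merged_toy <- merge_linked_groups(toy)
print(merged_toy)
```

On the toy data, groups 1, 2 and 3 should end up in one merged data frame and group 4 in another, whereas a single scan starting from group 1 would leave group 2 on its own.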
