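Problem description
After reading the comments, especially the ones about how groups would be merged, I realized that my original request did not make sense. Here is what I actually want to achieve in my program: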
Group Person
<dbl> <chr>
1 1 Person 1.1
2 2 Person 1.2
3 2 Person 1.2
4 3 Person 2.1
5 4 Person 2.1
6 4 Person 3.1
7 5 Person 1.2
8 5 Person 4.1
9 6 Person 1.2
10 6 Person 4.2
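I want to group the tibble by Group. However, group 2 contains only Person 1.2 (twice), and since Person 1.2 is also in group 5 with Person 4.1 and in group 6 with Person 4.2, I want to remove group 2. In other words: if a group contains only one distinct person, and that person also appears in another group together with someone else, the group in which they are on their own should be removed.
The data frame would then look like this: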
Group Person
<dbl> <chr>
1 1 Person 1.1
4 3 Person 2.1
5 4 Person 2.1
6 4 Person 3.1
7 5 Person 1.2
8 5 Person 4.1
9 6 Person 1.2
10 6 Person 4.2
Reproducible data for the data frame above:
structure(list(Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
    Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1", "Person 2.1",
        "Person 3.1", "Person 1.2", "Person 4.1", "Person 1.2", "Person 4.2")),
    spec = structure(list(
        cols = list(Group = structure(list(), class = c("collector_double", "collector")),
            Person = structure(list(), class = c("collector_character", "collector"))),
        default = structure(list(), class = c("collector_guess", "collector")),
        skip = 1), class = "col_spec"),
    row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
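For convenience, the ten rows of the first table can also be built directly as a plain data frame. This is only a minimal sketch: it assumes the readr `spec` attribute does not matter for the filtering, and it uses the name `df` because the answers below refer to the data by that name.

```r
# Rebuild the example data from the first table as a plain data.frame.
# Assumption: the readr "spec" attribute is not needed for the filtering.
df <- data.frame(
  Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2"),
  stringsAsFactors = FALSE
)

# Inspect the structure: 10 observations of 2 variables.
str(df)
```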
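Solution
Based on your edit, I would first find the persons who appear in a group together with someone else (call them persons_with_others), and then filter out the groups that contain only one distinct person when that person is one of persons_with_others.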
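library(dplyr)

# Persons who share at least one group with a different person
persons_with_others <- df %>%
  group_by(Group) %>%
  filter(n_distinct(Person) > 1) %>%
  pull(Person) %>%
  unique()

# Drop single-person groups whose person also appears alongside someone else
df %>%
  group_by(Group) %>%
  filter(!(n_distinct(Person) == 1 & Person %in% persons_with_others))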
# # A tibble: 7 x 2
# # Groups: Group [4]
# Group Person
# <dbl> <chr>
# 1 1 Person 1.1
# 2 4 Person 2.1
# 3 4 Person 3.1
# 4 5 Person 1.2
# 5 5 Person 4.1
# 6 6 Person 1.2
# 7 6 Person 4.2
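This result differs from your expected output, but I think it is correct: group 3 is removed because it contains only Person 2.1, and Person 2.1 also appears in group 4 together with another person (Person 3.1).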
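To check the logic without dplyr, the same filter can be sketched in base R. This is an assumed equivalent rather than part of the original answer; the names `n_persons`, `keep` and `result` are introduced here for illustration, and the data is rebuilt from the first table in the question.

```r
# Example data, rebuilt from the first table in the question.
df <- data.frame(
  Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2"),
  stringsAsFactors = FALSE
)

# For every row, the number of distinct persons in that row's group.
n_persons <- ave(seq_len(nrow(df)), df$Group,
                 FUN = function(i) length(unique(df$Person[i])))

# Persons who share at least one group with a different person.
persons_with_others <- unique(df$Person[n_persons > 1])

# Keep a row unless its group has a single distinct person
# who also appears elsewhere alongside someone else.
keep <- !(n_persons == 1 & df$Person %in% persons_with_others)
result <- df[keep, ]
print(result)
```

On this data the sketch keeps groups 1, 4, 5 and 6, which matches the seven rows in the dplyr output.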
Here is a base R option
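# One data frame per group
dfs <- split(df, df$Group)
res <- list()
while (length(dfs) > 0) {
  # Start a new cluster from the persons of the first remaining group
  S <- dfs[[1]]$Person
  inds <- 1
  for (k in seq_along(dfs)[-1]) {
    # Absorb any later group that shares a person with the cluster so far
    if (length(intersect(dfs[[k]]$Person, S)) > 0) {
      S <- union(S, dfs[[k]]$Person)
      inds <- c(inds, k)
    }
  }
  # Store the merged cluster and remove its groups from the work list
  res[[length(res) + 1]] <- do.call(rbind, dfs[inds])
  dfs <- dfs[-inds]
}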
which gives
> res
[[1]]
# A tibble: 1 x 3
Group Shape Person
* <dbl> <chr> <chr>
1 1 shape 1 Person 1.1
[[2]]
# A tibble: 4 x 3
Group Shape Person
* <dbl> <chr> <chr>
1 2 shape 5 Person 1.2
2 2 shape 2 Person 1.2
3 5 shape 4 Person 1.2
4 5 shape 1 Person 4.1
[[3]]
# A tibble: 3 x 3
Group Shape Person
* <dbl> <chr> <chr>
1 3 shape 3 Person 2.1
2 4 shape 3 Person 2.1
3 4 shape 6 Person 3.1
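The printed output above includes a Shape column that the sample data in the question does not have, so it appears to come from a different version of the data. As a rough sketch, the same loop can be wrapped in a function and applied to the ten rows from the question; `merge_groups` is a hypothetical name introduced here, not part of the original answer.

```r
# Example data, rebuilt from the first table in the question.
df <- data.frame(
  Group = c(1, 2, 2, 3, 4, 4, 5, 5, 6, 6),
  Person = c("Person 1.1", "Person 1.2", "Person 1.2", "Person 2.1",
             "Person 2.1", "Person 3.1", "Person 1.2", "Person 4.1",
             "Person 1.2", "Person 4.2"),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper around the loop above: merges groups that share
# a person, scanning the remaining groups once per cluster.
merge_groups <- function(df) {
  dfs <- split(df, df$Group)
  res <- list()
  while (length(dfs) > 0) {
    S <- dfs[[1]]$Person
    inds <- 1
    for (k in seq_along(dfs)[-1]) {
      if (length(intersect(dfs[[k]]$Person, S)) > 0) {
        S <- union(S, dfs[[k]]$Person)
        inds <- c(inds, k)
      }
    }
    res[[length(res) + 1]] <- do.call(rbind, dfs[inds])
    dfs <- dfs[-inds]
  }
  res
}

res <- merge_groups(df)

# Number of merged clusters and the original groups inside each one.
length(res)
lapply(res, function(d) unique(d$Group))
```

Note that this approach merges groups that share a person rather than deleting any rows, so it answers the earlier "merge" formulation of the question rather than the edited one.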