Finding the minimum value by group and across columns

Problem description

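I am trying to find the minimum value across several columns within each group. A small sample of the data looks like this:

  group cut group_score_1 group_score_2
1     a   1             3           5.0
2     b   2             2           4.0
3     a   0             2           2.5
4     b   3             5           4.0
5     a   2             3           6.0
6     b   1             5           1.0

I want to group by group and, for each group, find the row that contains the smallest score across the two group_score columns, and then get the name of the column that holds that minimum (group_score_1 or group_score_2), so the result should contain one such row per group.

I tried a few ideas and eventually ended up splitting the data into several new data frames, filtering by group, selecting the relevant columns, and finding the minimum in each, but I am sure there is a more efficient way to do it. I am not sure what I am missing.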

Solutions

We can use a data.table approach:

library(data.table)
setDT(df)[df[,.I[which.min(do.call(pmin,.SD))],group,.SDcols = patterns('^group_score')]$V1]
#   group cut group_score_1 group_score_2
#1:     a   0             2           2.5
#2:     b   1             5           1.0
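
The one-liner above packs three steps into a single call. As a rough sketch, it can be unpacked as follows. The names dt, score_cols, row_min, idx and dt_min are introduced here only for illustration, and the data frame is rebuilt from the table in the question so that the block runs on its own.

```r
library(data.table)

# Sample data reconstructed from the table in the question
df <- data.frame(
  group = c("a", "b", "a", "b", "a", "b"),
  cut = c(1L, 2L, 0L, 3L, 2L, 1L),
  group_score_1 = c(3L, 2L, 2L, 5L, 3L, 5L),
  group_score_2 = c(5, 4, 2.5, 4, 6, 1)
)
dt <- as.data.table(df)  # a copy, so df itself is not converted by reference

# Step 1: row-wise minimum across every group_score_* column
score_cols <- grep("^group_score", names(dt), value = TRUE)
row_min <- do.call(pmin, dt[, ..score_cols])
# row_min is 3 2 2 4 3 1

# Step 2: per group, the row number in the full table (.I) of the row
# whose row-wise minimum is smallest
idx <- dt[, .I[which.min(do.call(pmin, .SD))], by = group, .SDcols = score_cols]

# Step 3: subset the table with those row numbers (rows 3 and 6 here)
dt_min <- dt[idx$V1]
```

Note that which.min() returns only the first position on ties, so this keeps exactly one row per group.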

For each group, you can compute the minimum value and then select the rows where that value appears in either of the two columns.

library(dplyr)

df %>%
  group_by(group) %>%
  filter({tmp = min(group_score_1,group_score_2);
          group_score_1 == tmp | group_score_2 == tmp})

#  group   cut group_score_1 group_score_2
#  <chr> <int>         <int>         <dbl>
#1 a         0             2           2.5
#2 b         1             5           1

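The approach above works well when you have only two group_score columns. If you have many such columns, it is impractical to list each one as group_score_1 == tmp | group_score_2 == tmp and so on. In that case, reshape the data to long format, get the cut value that corresponds to the minimum in each group, and join back to the original data. This assumes that cut is unique within each group.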

df %>%
  tidyr::pivot_longer(cols = starts_with('group_score')) %>%
  group_by(group) %>%
  summarise(cut = cut[which.min(value)]) %>%
  left_join(df,by = c("group","cut"))
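
The question also asks for the name of the column that holds the minimum. The long format makes that easy to keep, as in the following sketch. It is a variation on the code above, not part of the original answer: it swaps summarise() for slice_min(), and the column names min_col and min_value are made up for illustration. The data frame is rebuilt from the table in the question.

```r
library(dplyr)
library(tidyr)

# Sample data reconstructed from the table in the question
df <- data.frame(
  group = c("a", "b", "a", "b", "a", "b"),
  cut = c(1L, 2L, 0L, 3L, 2L, 1L),
  group_score_1 = c(3L, 2L, 2L, 5L, 3L, 5L),
  group_score_2 = c(5, 4, 2.5, 4, 6, 1)
)

res <- df %>%
  pivot_longer(
    cols = starts_with("group_score"),
    names_to = "min_col",      # hypothetical name: which score column
    values_to = "min_value"    # hypothetical name: the score itself
  ) %>%
  group_by(group) %>%
  # keep the single smallest score per group; on ties only one row is kept
  slice_min(min_value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  # bring back the original score columns; assumes cut is unique within group
  left_join(df, by = c("group", "cut"))
```

For the sample data, res has one row per group, with min_col equal to group_score_1 for group a and group_score_2 for group b.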

Here is a base R option using pmin + ave + subset:

subset(
  df,
  as.logical(ave(
    do.call(pmin, df[grep("group_score_\\d+", names(df))]),
    group,  # grouping variable, so the minimum is taken within each group
    FUN = function(x) x == min(x)
  ))
)

which gives

  group cut group_score_1 group_score_2
3     a   0             2           2.5
6     b   1             5           1.0
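
As a sketch only, the same base R idea can be split into named steps and extended to record which column holds the minimum. The names score_cols, row_min, keep, res and min_col are hypothetical, and the data frame is rebuilt from the table in the question.

```r
# Sample data reconstructed from the table in the question
df <- data.frame(
  group = c("a", "b", "a", "b", "a", "b"),
  cut = c(1L, 2L, 0L, 3L, 2L, 1L),
  group_score_1 = c(3L, 2L, 2L, 5L, 3L, 5L),
  group_score_2 = c(5, 4, 2.5, 4, 6, 1)
)

score_cols <- grep("^group_score_\\d+$", names(df), value = TRUE)

# Row-wise minimum across the score columns
row_min <- do.call(pmin, df[score_cols])

# TRUE where a row's minimum equals the smallest row minimum in its group
keep <- row_min == ave(row_min, df$group, FUN = min)
res <- df[keep, ]

# For each kept row, the name of the first column holding the minimum
# (min_col is a hypothetical column name)
res$min_col <- score_cols[apply(res[score_cols], 1, which.min)]
```

Unlike which.min() on its own, the comparison against the group minimum keeps every tied row, so a group can contribute more than one row if several rows share the same minimum.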

Data

> dput(df)
structure(list(group = c("a","b","a","b","a","b"),cut = c(1L,2L,0L,3L,2L,1L),group_score_1 = c(3L,2L,2L,5L,3L,5L
),group_score_2 = c(5,4,2.5,4,6,1)),class = "data.frame",row.names = c("1","2","3","4","5","6"))
