Merging specific rows by summing certain columns within a grouping variable

Problem description

The data frame below is a subset of a larger df that contains duplicated information:

df<-data.frame(Caught=c(92,134,92,134),discarded=c(49,47,49,47),Units=c(170,170,220,220),Hours=c(72,72,72,72),Colour=c("red","red","red","red"))

In base R, I would like to get the following:

df_result<-data.frame(Caught=226,discarded=96,Units=390,Hours=72,Colour="red")

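So basically the result is the sum of the unique values of the Caught, discarded and Units columns, while keeping the same Hours and Colour values (Caught = 92 + 134, discarded = 49 + 47, Units = 170 + 220, Hours = 72, Colour = "red").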

However, I intend to do this on a much larger data.frame with many columns. My idea was to apply a function based on the column names:

l <- lapply(df,function(x) {
  if(names(x) %in% c("Caught","discarded","Units"))
    sum(unique(x))
  else
    unique(x)
})
as.data.frame(l)

However, this doesn't work, because I'm not sure how to get a vector's name when using lapply() and similar functions.
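For reference, `names(x)` fails here because `lapply()` passes each column's values to the function without the column name, so inside the function `names(x)` is `NULL`, `NULL %in% ...` is `logical(0)`, and `if()` stops with "argument is of length zero". A minimal sketch of one way to keep the name-based idea, assuming the columns to sum are listed in a hypothetical vector `sum_cols`, is to iterate over `names(df)` and fetch each column with `df[[nm]]`:

```r
# Example data from the question (four rows, so every column has length 4)
df <- data.frame(
  Caught    = c(92, 134, 92, 134),
  discarded = c(49, 47, 49, 47),
  Units     = c(170, 170, 220, 220),
  Hours     = c(72, 72, 72, 72),
  Colour    = c("red", "red", "red", "red")
)

# Hypothetical: the columns whose unique values should be summed
sum_cols <- c("Caught", "discarded", "Units")

# Iterate over the column *names*, so the name is available inside the function
l <- lapply(names(df), function(nm) {
  x <- df[[nm]]
  if (nm %in% sum_cols) sum(unique(x)) else unique(x)
})
names(l) <- names(df)  # lapply() over an unnamed character vector returns an unnamed list
df_result <- as.data.frame(l)
df_result
```

This assumes every column not in `sum_cols` has a single unique value; otherwise `as.data.frame()` would produce more than one row, or fail if the lengths are incompatible.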

I also tried by() and apply(), without success.

Thanks

Solution

In base R, as requested:

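    # Collapse each column: numeric columns -> sum of their unique values,
    # all other columns -> their unique value
    l <- lapply(df, function(n) {
        if (is.numeric(n))
            sum(unique(n))
        else
            unique(n)
    })
    as.data.frame(l)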

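This solution takes advantage of the fact that a data.frame is really just a list of vectors, so lapply() applies the function to each column in turn.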

It produces this:

    #   Caught Discarded Units Hours Colour
    # 1    226        96   390    72    red
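Because `is.numeric()` selects every numeric column, `Hours` is summed as well; the output above shows 72 only because `Hours` has a single unique value. A quick check of this, wrapping the code above in a hypothetical helper `collapse()` and changing `Hours` in a made-up copy of the data:

```r
df <- data.frame(
  Caught    = c(92, 134, 92, 134),
  Discarded = c(49, 47, 49, 47),
  Units     = c(170, 170, 220, 220),
  Hours     = c(72, 72, 72, 72),
  Colour    = c("red", "red", "red", "red")
)

# Hypothetical helper wrapping the lapply() approach above
collapse <- function(d) {
  as.data.frame(lapply(d, function(n) {
    if (is.numeric(n)) sum(unique(n)) else unique(n)
  }))
}

# A data.frame is a list with one element (vector) per column
is.list(df)             # TRUE
length(df) == ncol(df)  # TRUE

collapse(df)$Hours      # 72

# Made-up data: with two distinct Hours values, they are summed too
df2 <- df
df2$Hours <- c(72, 72, 48, 48)
collapse(df2)$Hours     # 120 (72 + 48)
```

If some numeric columns must be kept rather than summed, selecting the columns by name instead of by type avoids this.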

A suggestion:

df <-data.frame(Caught=c(92,134,92,134),Discarded=c(49,47,49,47),Units=c(170,170,220,220),Hours=c(72,72,72,72),Colour=c("red","red","red","red"))

df
#>   Caught Discarded Units Hours Colour
#> 1     92        49   170    72    red
#> 2    134        47   170    72    red
#> 3     92        49   220    72    red
#> 4    134        47   220    72    red


df_results <- data.frame(
  Caught    = sum(unique(df$Caught)),
  Discarded = sum(unique(df$Discarded)),
  Units     = sum(unique(df$Units)),
  Hours     = unique(df$Hours),
  Colour    = unique(df$Colour)
)

df_results
#>   Caught Discarded Units Hours Colour
#> 1    226        96   390    72    red

# Created on 2021-02-23 by the reprex package (v0.3.0.9001)
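Since the title mentions a grouping variable, a possible base-R generalization for a larger data frame, sketched here under the assumption that Hours and Colour identify each group and that every other column should be collapsed with `sum(unique(.))`, is `aggregate()` with a `. ~ Hours + Colour` formula, which avoids listing the summed columns by hand:

```r
df <- data.frame(
  Caught    = c(92, 134, 92, 134),
  Discarded = c(49, 47, 49, 47),
  Units     = c(170, 170, 220, 220),
  Hours     = c(72, 72, 72, 72),
  Colour    = c("red", "red", "red", "red")
)

# Within each Hours/Colour group, sum the unique values of all other columns
df_results <- aggregate(
  . ~ Hours + Colour,
  data = df,
  FUN  = function(x) sum(unique(x))
)
df_results
```

Each distinct Hours/Colour combination becomes one output row, with the grouping columns first. The formula method drops rows containing NA by default (`na.action = na.omit`), which may matter on real data.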

Regards,