How to find the highest value per class and year in R while keeping the basin

Problem description

I have a data.frame containing data for a large number of basins over several years.

I want to use a group_by filter to determine the highest value for each class, while keeping the source basin and year information.

For example, which basin had the highest veg value in 2005? And in 2006? The same goes for the grass class: which basin had the highest grass value in 2005, and which in 2006?

I considered using formattable for a visual inspection, but it performs this min/max analysis without taking the year groups into account.

library(dplyr)
library(tidyverse)

df<-read.table(text="basin  year    veg wet crop    grass   urb water   soil
01  2005    52.64   0   1197.98 524.56  0   0   2.25
01  2009    54.14   0   1171.7  550.51  0   0   1.08
09  2008    9271.08 0   41.66   22190.32    0   5.82    37.34
11  2013    29460.53    0   16489.56    71927.62    437.41  33.1    56.96
04  2017    30831.06    0   5713.81 73876.6 156.75  103.7   29.69
01  2006    47.376  0   1078.182    472.104 0   0   2.025
01  2010    48.726  0   1054.53 495.459 0   0   0.972
09  2009    8343.972    0   37.494  19971.288   0   5.238   33.606
11  2014    26514.477   0   14840.604   64734.858   393.669 29.79   51.264
04  2018    27747.954   0   5142.429    66488.94    141.075 93.33   26.721
01  2007    42.6384 0   970.3638    424.8936    0   0   1.8225
01  2011    43.8534 0   949.077 445.9131    0   0   0.8748
09  2010    7509.5748   0   33.7446 17974.1592  0   4.7142  30.2454
11  2015    23863.0293  0   13356.5436  58261.3722  354.3021    26.811  46.1376
04  2019    24973.1586  0   4628.1861   59840.046   126.9675    83.997  24.0489
05  2005    52.14   0   1169.7  548.51  0   0   0.92
",sep="",header=TRUE)

df %>%
  group_by(year, basin) %>%
  summarise(across(veg:soil, max))

Attempt 2

library(formattable)
# Format the table: shade each column from light (low) to dark (high)
formattable(df, list(
  'veg' = color_tile("#ccf0f0", "#0066cc"), 'wet' = color_tile("#ccf0f0", "#0066cc"),
  'crop' = color_tile("#ccf0f0", "#0066cc"), 'grass' = color_tile("#ccf0f0", "#0066cc"),
  'urb' = color_tile("#ccf0f0", "#0066cc"), 'water' = color_tile("#ccf0f0", "#0066cc"),
  'soil' = color_tile("#ccf0f0", "#0066cc")
))

With formattable I can see the highest values in the strongest color, but formattable cannot group by year.

Solution

Here are two approaches: a dplyr solution and a base R solution.

`dplyr`

This solution uses slice_max to keep the n = 2 highest values within each group.

library(dplyr)

df2 <- df %>%
  arrange(basin,year) %>%
  group_by(basin,year) %>%
  slice_max(order_by = -c(basin,year),n = 2)
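
If the goal is the one stated in the question (the basin with the highest value for each class in each year), one possible sketch is to reshape the classes into long format and then keep the top row per year and class. The names sample_rows, top_by_year, class, and value are illustrative assumptions; the rows are copied from the question's data, limited to the veg and grass columns, with basin kept as text.

```r
library(dplyr)
library(tidyr)

# A few rows copied from the question's data (veg and grass only).
# Assumption: basin is stored as text so the leading zero is kept.
sample_rows <- data.frame(
  basin = c("01", "05", "01", "09", "01", "09"),
  year  = c(2005, 2005, 2009, 2009, 2010, 2010),
  veg   = c(52.64, 52.14, 54.14, 8343.972, 48.726, 7509.5748),
  grass = c(524.56, 548.51, 550.51, 19971.288, 495.459, 17974.1592),
  stringsAsFactors = FALSE
)

top_by_year <- sample_rows %>%
  # One row per basin, year and class
  pivot_longer(veg:grass, names_to = "class", values_to = "value") %>%
  # Within each year and class, keep the single row with the largest value
  group_by(year, class) %>%
  slice_max(value, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  arrange(class, year)

print(top_by_year)
```

Because the basin column travels along with each value in the long format, the result shows which basin holds the maximum. Setting with_ties = FALSE keeps a single row per group even when two basins share the same maximum; dropping it would return all tied basins.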

Base R

This solution uses a helper function to get the two highest values.

fun <- function(x) tail(sort(x),n = 2)

agg <- aggregate(as.matrix(df[-(1:2)]),by = as.list(df[1:2]),fun)
agg <- agg[with(agg,order(basin,year)),]
row.names(agg) <- NULL

The outputs are equal.

all.equal(as.data.frame(df2),agg)
#[1] TRUE
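
For the question as stated (which basin has the highest value of each class in each year), a base R sketch could split the data by year and use which.max to look up the basin for each class. The names sample_rows, classes, and top_base are illustrative assumptions; the rows are copied from the question's data, limited to veg and grass, with basin kept as text.

```r
# A few rows copied from the question's data (veg and grass only).
# Assumption: basin is stored as text so the leading zero is kept.
sample_rows <- data.frame(
  basin = c("01", "05", "01", "09", "01", "09"),
  year  = c(2005, 2005, 2009, 2009, 2010, 2010),
  veg   = c(52.64, 52.14, 54.14, 8343.972, 48.726, 7509.5748),
  grass = c(524.56, 548.51, 550.51, 19971.288, 495.459, 17974.1592),
  stringsAsFactors = FALSE
)

classes <- c("veg", "grass")

# For each year, find the basin holding the maximum of every class
per_year <- lapply(split(sample_rows, sample_rows$year), function(d) {
  data.frame(
    year  = d$year[1],
    class = classes,
    basin = vapply(classes, function(cl) d$basin[which.max(d[[cl]])],
                   character(1), USE.NAMES = FALSE),
    value = vapply(classes, function(cl) max(d[[cl]]),
                   numeric(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
})

top_base <- do.call(rbind, per_year)
row.names(top_base) <- NULL

print(top_base)
```

which.max returns the position of the first maximum, so ties resolve to the basin that appears first within that year.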
