Calculate the annual mean using only the months with the highest values

Problem description

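I have a time series dataset with 11 variables: https://drive.google.com/file/d/1x63IfB429i3JKrheWNiKn0nIg28xyp9R/view?usp=sharing

I am trying to compute the annual mean of the variables (v1:v11). I can do this in R with the tidyr and dplyr packages, specifically with the group_by and summarise functions: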

    library(tidyr)
    library(dplyr)

    tidied_df <- d1 %>%
      # first reshape the data into long, tidy format
      pivot_longer(v1:v11, names_to = "VIs", values_to = "Value") %>%
      drop_na()

    # compute the annual mean and maximum for every plot and variable
    annual_growing_mn_mx <-
      tidied_df %>%
      group_by(Plot_code, VIs, Year) %>%
      summarise(VIs_mn = mean(Value, na.rm = FALSE), VIs_mx = max(Value, na.rm = FALSE))

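However, I now want to compute this mean from only a limited number of months. For example, I want the annual mean to be based on the 3 months with the highest values in a given year. In other words, for each year only the 3 months whose values exceed those of all other months should go into the annual mean. The output should look like this: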

Output example

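One simple approach would be to build a new data frame, by subsetting or filtering, that contains only the 3 highest months of each year. I could then summarise it again to get the annual mean.

I have tried different packages and functions, without success.

Any help is greatly appreciated!

Reproducible example: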

structure(list(X = 1:105,Plot_code = c("AT_Neu","AT_Neu","AT_Neu"),Year = c(2002L,2002L,2003L,2004L,2005L,2006L,2006L),Month = c(1L,1L,10L,11L,12L,2L,3L,4L,5L,6L,7L,8L,9L,4L),v1 = c(NA,NA,0.63,0.62,0.82,0.83,0.73,0.76,0.79,0.8,0.72,0.85,0.66,0.77,0.67,0.74,0.7,0.71,0.78,0.86,0.75,0.69,NA),v2 = c(NA,0.48,0.43,0.54,0.58,0.64,0.56,0.46,0.59,0.5,0.53,0.51,0.57,0.6,0.52,0.68,0.49,0.61,0.65,v3 = c(NA,0.47,0.55,v4 = c(NA,4.45,4.32,10.31,10.69,6.44,7.27,8.59,9.05,6.08,7.32,12.73,4.95,7.81,5.13,6.74,5.7,5.83,8.68,8.19,5.81,13.61,7.04,7.03,12.88,7.02,8.48,8.99,5.99,7.54,5.39,5.62,12.09,9.9,6.79,5.63,v5 = c(NA,v6 = c(NA,0.04,0.02,0.03,v7 = c(NA,0.09,0.08,0.05,0.06,0.07,v8 = c(NA,0.4,0.35,0.39,0.42,0.36,0.41,0.38,0.37,0.32,0.45,v9 = c(NA,0.1,0.11,0.12,0.14,v10 = c(99980800.92,99980800.92,0.22,0.189,0.3404,0.2535,0.2924,0.3337,0.3384,0.2752,0.2856,0.385,0.2088,0.2898,0.2419,0.2508,0.2294,0.2394,0.2698,0.294,0.2583,0.4134,0.2144,0.3015,0.3588,0.2814,0.3195,0.3096,0.2205,0.3174,0.2257,0.2418,0.2829,0.333,0.2232,99980800.92),v11 = c(0,0.788622527,0.66959026,0.953392233,0.973284802,0.836679489,0.908891095,0.961089831,0.893697727,0.982013057,0.732659321,0.81366445,0.761594156,0.857664567,0.990411723,0.93414721,0.944455651,0.92233483,0)),row.names = c(NA,105L),class = "data.frame")

Solution

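Perhaps this is all you need: use across from dplyr. There is no need to reshape twice.

  • across operates on all the required columns at once
  • order(., na.last = F) returns the ascending order of the values, keeping NAs at the beginning. So
  • .[order(., na.last = F)] sorts the vector
  • tail(.[order(., na.last = F)], 3) gives the last three values (the three largest)
  • after that, apply mean as needed

Hope this clarifies the syntax.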
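To make the order / tail / mean chain concrete, here is a minimal sketch on a single vector. The values are made up for illustration and are not taken from the question's data; the second vector shows what happens when fewer than three non-missing values are available.

```r
# Made-up monthly values for one variable in one year (illustration only)
x <- c(NA, 0.63, 0.82, 0.71, NA, 0.79, 0.85)

idx    <- order(x, na.last = FALSE)   # positions of the NAs first, then ascending values
sorted <- x[idx]                      # NA NA 0.63 0.71 0.79 0.82 0.85
top3   <- tail(sorted, 3)             # the three largest values: 0.79 0.82 0.85
top3_mean <- mean(top3, na.rm = TRUE) # mean of those three values
print(top3_mean)

# With only two non-missing values, tail() also picks up an NA,
# which na.rm = TRUE then drops, so the mean uses the two available values.
x_short <- c(NA, NA, 0.5, 0.7)
short_mean <- mean(tail(x_short[order(x_short, na.last = FALSE)], 3), na.rm = TRUE)
print(short_mean)
```

Putting the NAs first is what makes tail() safe here: missing values can only end up among the last three when there are fewer than three real values.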

The code, and its result on the dput data shared subsequently:

    library(dplyr, warn.conflicts = F)

    d1 <- read.csv('C:\\Users\\Acer\\Documents\\d1.csv')

    d1 %>%
      group_by(Plot_code, Year) %>%
      summarise(across(starts_with('v'), ~ mean(tail(.[order(., na.last = F)], 3), na.rm = T)),
                .groups = 'drop')
    #> # A tibble: 44 x 13
    #>    Plot_code  Year    v1    v2    v3    v4    v5     v6     v7    v8     v9
    #>    <chr>     <int> <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>  <dbl> <dbl>  <dbl>
    #>  1 AT_Neu     2002 0.833 0.693 0.64  11.2  0.733 0.0367 0.08   0.483 0.107
    #>  2 AT_Neu     2003 0.78  0.587 0.56   8.23 0.667 0.03   0.0733 0.417 0.12
    #>  3 AT_Neu     2004 0.84  0.7   0.64  11.8  0.72  0.0333 0.0633 0.48  0.103
    #>  4 AT_Neu     2005 0.813 0.663 0.62   9.93 0.717 0.0333 0.0667 0.47  0.123
    #>  5 AT_Neu     2006 0.823 0.67  0.62  10.6  0.697 0.03   0.0633 0.463 0.103
    #>  6 AT_Neu     2007 0.763 0.573 0.553  7.61 0.663 0.03   0.0633 0.413 0.103
    #>  7 AT_Neu     2008 0.793 0.627 0.59   8.66 0.723 0.03   0.06   0.46  0.09
    #>  8 AT_Neu     2009 0.823 0.67  0.62  10.2  0.687 0.02   0.05   0.467 0.0833
    #>  9 AT_Neu     2010 0.81  0.65  0.603  9.41 0.713 0.0267 0.0633 0.46  0.09
    #> 10 AT_Neu     2011 0.783 0.593 0.567  8.24 0.69  0.03   0.08   0.423 0.0967
    #> # ... with 34 more rows, and 2 more variables: v10 <dbl>, v11 <dbl>
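For readers without access to the CSV file, the same idea can be tried on a small data frame. This is only a sketch: the toy data below is invented, and mean_top_n is a hypothetical helper name introduced here to wrap the expression from the answer; it is not part of dplyr.

```r
library(dplyr, warn.conflicts = FALSE)

# Hypothetical helper (not a dplyr function): mean of the n largest
# non-missing values of x, using the order/tail/mean chain from the answer
mean_top_n <- function(x, n = 3) {
  mean(tail(x[order(x, na.last = FALSE)], n), na.rm = TRUE)
}

# Invented toy data: one plot, two years, four months each, two variables
toy <- data.frame(
  Plot_code = "AT_Neu",
  Year  = rep(c(2002L, 2003L), each = 4),
  Month = rep(1:4, times = 2),
  v1 = c(NA, 0.6, 0.8, 0.7, 0.5, 0.9, 0.4, 0.7),
  v2 = c(1, 2, 3, 4, 8, 6, 7, 5)
)

# One row per plot and year; each v-column holds the mean of its top 3 months
res <- toy %>%
  group_by(Plot_code, Year) %>%
  summarise(across(starts_with("v"), mean_top_n), .groups = "drop")

print(res)
```

Note that the top three months are chosen independently for each variable, so v1 and v2 may draw on different months within the same year.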