How to get the difference between groups with a long-format data frame in R?

Problem description

I have a simple data frame with 2 IDs (N = 2) and 2 periods (T = 2), for example:

 year    id    points
   1      1     10
   1      2     12
   2      1     20
   2      2     18

How can I produce the following data frame (preferably with dplyr or any other tidyverse solution)?

 id    points_difference
  1         10
  2         6

Note that the points_difference column is the difference for each ID across time (i.e., T2 - T1).

Also, how can this be generalized to multiple columns and multiple IDs (still with only 2 periods)?

 year    id    points  scores
   1      1      10      7
   1     ...    ...     ...
   1      N      12      8
   2      1      20      9
   2     ...    ...     ...
   2      N      12      9

Solution

If you are on dplyr 1.0.0 (or later), summarise can return multiple rows per group in its output, so this also works when you have more than two periods. You can do:

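library(dplyr)

df %>%
  # sort within each id by year so that diff() runs in time order
  arrange(id, year) %>%
  group_by(id) %>%
  # diff() returns T - 1 values per id, one for each consecutive pair of periods
  summarise(across(c(points, scores), diff, .names = '{col}_difference'))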

#     id points_difference scores_difference
#  <int>             <int>             <int>
#1     1                10                 2
#2     1                -7                 1
#3     2                 6                 2
#4     2                -3                 3
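On dplyr 1.1.0 and later, returning more than one row per group from summarise is deprecated in favour of reframe(). The following is a minimal sketch of the same computation with reframe(); the example data here is an assumption (3 years x 2 ids, with values chosen to be consistent with the output shown above), and the name res is only illustrative.

```r
library(dplyr)

# Assumed example data: 3 years x 2 ids, consistent with the output above.
df <- data.frame(
  year   = c(1L, 1L, 2L, 2L, 3L, 3L),
  id     = c(1L, 2L, 1L, 2L, 1L, 2L),
  points = c(10L, 12L, 20L, 18L, 13L, 15L),
  scores = c(2L, 3L, 4L, 5L, 5L, 8L)
)

res <- df %>%
  arrange(id, year) %>%   # time order within each id
  group_by(id) %>%
  # reframe() allows any number of rows per group and returns an ungrouped result
  reframe(across(c(points, scores), diff, .names = "{col}_difference"))

print(res)
```

With three periods, each id contributes two rows: the T2 - T1 difference followed by the T3 - T2 difference.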

Data

df <- structure(list(year = c(1L, 1L, 2L, 2L, 3L, 3L), id = c(1L, 2L, 1L, 2L, 1L, 2L), points = c(10L, 12L, 20L, 18L, 13L, 15L), scores = c(2L, 3L, 4L, 5L, 5L, 8L)), class = "data.frame", row.names = c(NA, -6L))
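For the strictly two-period case described in the question, a sketch that does not depend on row order is to subtract the period-1 value from the period-2 value by indexing on year. The names df2 and res2 are hypothetical, and the data is the four-row example from the question.

```r
library(dplyr)

# The two-period example from the question (N = 2, T = 2).
df2 <- data.frame(
  year   = c(1L, 1L, 2L, 2L),
  id     = c(1L, 2L, 1L, 2L),
  points = c(10L, 12L, 20L, 18L)
)

res2 <- df2 %>%
  group_by(id) %>%
  # T2 - T1 for each id; assumes exactly one row per id and year
  summarise(across(points, ~ .x[year == 2] - .x[year == 1],
                   .names = "{col}_difference"))

print(res2)
```

Listing more columns inside across(), for example c(points, scores), should extend the same pattern to several value columns, and it yields exactly one row per id, so summarise is sufficient here.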

dataframe dplyr panel-data r tidyverse
