Is there a way in R to select rows by their value and compute proportions per row?

Problem description

I have a data frame that looks like this:

            a b   c   d
1  2005-01-01 0 ... ...
2  2005-02-22 1 ... ...
3  2005-04-02 0 ... ...
4  2005-12-01 3 ... ...
5  2006-03-03 0 ... ...
6  2006-06-08 1 ... ...
7  2006-10-11 0 ... ...
8  2006-12-02 4 ... ...
9  2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16        ... . ... ...

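I would like to select all rows from 2005 and see how many of them have a 0, 1, 2, 3 or 4 in column b, ideally as a proportion. For example, the result would be: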

output:
2005
  0   1   2   3   4
20% 20% 20% 20% 20%

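I tried table(year(DF$a),c=DF$b), but this only gives an overview of the counts for all years, without any proportions. I also tried piping it into a proportion function with %>%, but that did not work.

Does anyone know how to do this?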

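Solution

You can use table and proportions to get the share per year. Passing 1 as the margin argument of proportions computes the shares within each row, so each year sums to 100.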

proportions(table(format(DF$a,"%Y"),DF$b),1) * 100
#         0   1   2   3   4
#  2005  50  25   0  25   0
#  2006  50  25   0   0  25
#  2007  50   0  50   0   0
#  2008  75   0  25   0   0
#  2009 100   0   0   0   0
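Since the question asks for 2005 only, the row for that year can be pulled out of the resulting table by name. The following is a minimal sketch, not the answerer's own code: the data frame is rebuilt from the example table in the question (columns c and d are left out), the name shares_2005 is made up for illustration, and prop.table is used in place of proportions on the assumption that you may be running an R version older than 4.0.0, where proportions is not available.

```r
# Example data rebuilt from the table in the question (columns c and d omitted)
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

# Row-wise shares per year, in percent (prop.table is the older name for proportions)
shares <- prop.table(table(format(DF$a, "%Y"), DF$b), margin = 1) * 100

# Keep only the row for 2005, selected by its row name
shares_2005 <- shares["2005", ]
print(shares_2005)

# Optional: turn the numbers into percent strings such as "50%"
print(setNames(paste0(shares_2005, "%"), names(shares_2005)))
```

Selecting the row after building the full table keeps the columns for values of b that never occur in 2005 (here 2 and 4), which show up as 0.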

Data:

DF <- structure(list(a = structure(c(12784,12836,12875,13118,13210,13307,13432,13484,13596,13609,13906,14110,14134,14225,14391),class = "Date"),b = c(0L,1L,0L,3L,0L,1L,0L,4L,0L,2L,0L,0L,0L,2L,0L),c = rep("...",15),d = rep("...",15)),
row.names = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15"),class = "data.frame")

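Alternatively, you can count the occurrences of each value in b, compute the ratio within each year, and use pivot_wider to get the data in wide format (if needed).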

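library(dplyr)

# DF is the data frame from the question
DF %>%
  count(year = lubridate::year(a), b) %>%   # number of rows per year and value of b
  group_by(year) %>%
  mutate(n = n / sum(n) * 100) %>%          # share within each year, in percent
  arrange(b) %>%                            # so the wide columns come out as 0, 1, 2, 3, 4
  tidyr::pivot_wider(names_from = b, values_from = n, values_fill = 0)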

#   year   `0`   `1`   `2`   `3`   `4`
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  2005    50    25     0    25     0
#2  2006    50    25     0     0    25
#3  2007    50     0    50     0     0
#4  2008    75     0    25     0     0
#5  2009   100     0     0     0     0
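If only 2005 is of interest, the rows can also be filtered before counting. This is a rough sketch under the same assumptions as the example data in the question (rebuilt here without columns c and d); the names res_2005 and pct are made up for illustration, and format is used instead of lubridate::year only to avoid an extra package.

```r
library(dplyr)

# Example data rebuilt from the table in the question (columns c and d omitted)
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

res_2005 <- DF %>%
  filter(format(a, "%Y") == "2005") %>%  # keep only the rows from 2005
  count(b) %>%                           # number of rows per value of b
  mutate(pct = n / sum(n) * 100)         # share of each value, in percent

print(res_2005)
```

Note that this long format lists only the values of b that actually occur in 2005 (0, 1 and 3 here); values with no rows, such as 2 and 4, do not appear at all, whereas the wide table above shows them as 0.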