Is there a way to select rows by their values in R and compute their proportions?

Problem description

I have a data frame that looks like this:


   a          b  c   d
1  2005-01-01 0 ... ...
2  2005-02-22 1 ... ...
3  2005-04-02 0 ... ...
4  2005-12-01 3 ... ...
5  2006-03-03 0 ... ...
6  2006-06-08 1 ... ...
7  2006-10-11 0 ... ...
8  2006-12-02 4 ... ...
9  2007-03-24 0 ... ...
10 2007-04-06 2 ... ...
11 2008-01-28 0 ... ...
12 2008-08-19 0 ... ...
13 2008-09-12 0 ... ...
14 2008-12-12 2 ... ...
15 2009-05-27 0 ... ...
16    ...     . ... ...

I want to select all rows from 2005 and see how many of them have a 0, 1, 2, 3 or 4 in column b, ideally as proportions. For example, the result would be:


output:
2005
0    1    2    3    4
20%  20%  20%  20%  20%

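I tried table(year(DF$a),c=DF$b), but this only gives an overview of the counts for all years, without any proportions or anything similar. I tried piping it into a proportion function with %>%, but that did not work.

Does anyone know how to do this?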

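Solution

You can use table and proportions to get the share per year. Passing 1 as the margin argument of proportions normalizes each row, so the percentages within each year sum to 100. proportions requires R 4.0.1 or later; in older versions prop.table does the same thing.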

proportions(table(format(DF$a,"%Y"),DF$b),1) * 100
#         0   1   2   3   4
#  2005  50  25   0  25   0
#  2006  50  25   0   0  25
#  2007  50   0  50   0   0
#  2008  75   0  25   0   0
#  2009 100   0   0   0   0
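Since the question asks for a single year, one row of that table can be picked out by its row name. The following is a minimal sketch, assuming the example data from the question (columns c and d are left out because they are not used); the names pct, pct_2005 and pct_2005_label are made up for illustration.

```r
# Example data rebuilt from the rows shown in the question (columns c and d omitted).
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

# Row-wise percentages: one row per year, one column per value of b.
# On R versions before 4.0.1, prop.table() can be used in place of proportions().
pct <- proportions(table(format(DF$a, "%Y"), DF$b), 1) * 100

# Select one year by its row name (a character string, not a number).
pct_2005 <- pct["2005", ]

# Optional: append a percent sign for display.
pct_2005_label <- setNames(paste0(pct_2005, "%"), names(pct_2005))
print(pct_2005_label)
```

Indexing with "2005" as a string matters here, because the years are row names produced by format, not numeric positions.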

Data:

DF <- structure(list(a = structure(c(12784,12836,12875,13118,13210,13307,13432,13484,13596,13609,13906,14110,14134,14225,14391),class = "Date"),
b = c(0L,1L,0L,3L,0L,1L,0L,4L,0L,2L,0L,0L,0L,2L,0L),
c = rep("...",15),d = rep("...",15)),
row.names = c("1","2","3","4","5","6","7","8","9","10","11","12","13","14","15"),class = "data.frame")

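Alternatively, you can count the occurrences of each value of b per year, calculate the percentage within each year, and then use pivot_wider to get the data in wide format (if needed).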

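library(dplyr)
DF %>%
  # one row per (year, b) combination, with the number of rows in n
  count(year = lubridate::year(a),b) %>%
  group_by(year) %>%
  # turn the counts into percentages within each year
  mutate(n = n/sum(n) * 100) %>%
  # sort by b so that the wide columns come out in the order 0, 1, 2, ...
  arrange(b) %>%
  tidyr::pivot_wider(names_from = b,values_from = n,values_fill = 0)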

#   year   `0`   `1`   `2`   `3`   `4`
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1  2005    50    25     0    25     0
#2  2006    50    25     0     0    25
#3  2007    50     0    50     0     0
#4  2008    75     0    25     0     0
#5  2009   100     0     0     0     0
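If only one year is of interest, the rows can be filtered before counting. This is a rough sketch in the same dplyr style, assuming the example data from the question (columns c and d omitted); res_2005 and pct are illustrative names, and format is used for the year so that lubridate is not needed.

```r
library(dplyr)

# Example data rebuilt from the rows shown in the question (columns c and d omitted).
DF <- data.frame(
  a = as.Date(c("2005-01-01", "2005-02-22", "2005-04-02", "2005-12-01",
                "2006-03-03", "2006-06-08", "2006-10-11", "2006-12-02",
                "2007-03-24", "2007-04-06", "2008-01-28", "2008-08-19",
                "2008-09-12", "2008-12-12", "2009-05-27")),
  b = c(0L, 1L, 0L, 3L, 0L, 1L, 0L, 4L, 0L, 2L, 0L, 0L, 0L, 2L, 0L)
)

res_2005 <- DF %>%
  filter(format(a, "%Y") == "2005") %>%  # keep only the rows from 2005
  count(b) %>%                           # number of rows per value of b
  mutate(pct = n / sum(n) * 100)         # share of each value, in percent

print(res_2005)
```

In this long format, values of b that never occur in the selected year (here 2 and 4) do not appear at all, whereas the wide table shows them as 0.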
